NAME

sync_reads - resynchronize paired FASTQ files

SYNOPSIS

sync_reads [options] --fwd left_reads --rev right_reads

DESCRIPTION

sync_reads re-synchronizes two FASTQ files of paired reads that have fallen out of sync because reads were removed from each file independently during pre-processing (trimming, filtering, etc.). Here, "in sync" means that both files contain the same number of reads and that the reads at any given position in the two files form a proper pair. The output files contain the matching reads in order, assuming the input files were properly ordered. Unpaired reads can optionally be written to separate files.

Memory usage depends not on the input file size but on the maximum distance between paired reads in the two files, because the read cache is flushed each time a pair is identified. In the worst case (one file contains a single read whose mate is the last read in the other file), memory usage can approach the size of the largest file, but in typical usage it rarely exceeds a few MB regardless of file size.
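
The caching strategy described above can be illustrated with a minimal Python sketch. This is not the program's actual implementation; it only shows why memory is bounded by the distance between mates. The function names (read_fastq, sync_pairs, make_fastq) are hypothetical, and the rule used to derive a pair name from the header (text up to the first whitespace, minus a trailing /1 or /2) is an assumption. Read names are also assumed to be unique within each file.

```python
import io


def read_fastq(handle):
    """Yield (pair_name, record_text) for each four-line FASTQ record."""
    while True:
        lines = [handle.readline() for _ in range(4)]
        if not lines[0].strip():
            return  # end of file (or a trailing blank line)
        # Assumed naming convention: the pair name is the header up to the
        # first whitespace, without the leading '@' or a trailing /1 or /2.
        name = lines[0][1:].split()[0]
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        yield name, "".join(lines)


def flush_until(cache, name):
    """Pop cached records in insertion order up to and including `name`.

    Returns (skipped_records, matching_record).
    """
    skipped = []
    while True:
        key = next(iter(cache))
        record = cache.pop(key)
        if key == name:
            return skipped, record
        skipped.append(record)


def sync_pairs(fwd, rev):
    """Pair up reads from two ordered FASTQ handles.

    Reads one record at a time from each file in turn. Unmatched reads wait
    in a per-file cache; whenever a mate is found, every cached read that
    can no longer be paired is released as a singleton.
    """
    readers = {"fwd": read_fastq(fwd), "rev": read_fastq(rev)}
    caches = {"fwd": {}, "rev": {}}  # dicts preserve insertion order
    out = {"fwd": [], "rev": [], "fwd_singles": [], "rev_singles": []}
    while readers:
        for side, other in (("fwd", "rev"), ("rev", "fwd")):
            if side not in readers:
                continue
            item = next(readers[side], None)
            if item is None:
                del readers[side]  # this file is exhausted
                continue
            name, record = item
            if name in caches[other]:
                # Mate found: reads cached ahead of it in the other file,
                # and everything cached for this file, have no partner.
                skipped, mate = flush_until(caches[other], name)
                out[other + "_singles"] += skipped
                out[side + "_singles"] += caches[side].values()
                caches[side].clear()
                out[side].append(record)
                out[other].append(mate)
            else:
                caches[side][name] = record
    # Whatever is still cached at the end never found a mate.
    for side in ("fwd", "rev"):
        out[side + "_singles"] += caches[side].values()
    return out


def make_fastq(names, mate):
    """Build a small in-memory FASTQ file (demo helper)."""
    return io.StringIO("".join(f"@{n}/{mate}\nACGT\n+\nIIII\n" for n in names))


if __name__ == "__main__":
    result = sync_pairs(make_fastq(["r1", "r2", "r3", "r5"], 1),
                        make_fastq(["r1", "r3", "r4", "r5"], 2))
    print({key: len(records) for key, records in result.items()})
```

For brevity the sketch collects results in lists; a real tool would write each record to its output file as soon as it is released, so that only the caches occupy memory. If the inputs are not in the same order, mates are flushed as singletons before they can be matched, which is the failure mode the ordering requirement warns about.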

IMPORTANT: Reads in input files MUST be in the same order, aside from missing reads, or the output will report many valid pairs as singletons.

OPTIONS

Mandatory

--fwd forward_fastq: Specify FASTQ file containing the first of the trimmed read pairs
--rev reverse_fastq: Specify FASTQ file containing the second of the trimmed read pairs

Optional

--fwd_out filename: Specify output name for synced forward reads
--rev_out filename: Specify output name for synced reverse reads
--fwd_singles_out filename: Specify output name for forward singleton reads
--rev_singles_out filename: Specify output name for reverse singleton reads
--sync_suffix suffix: Specify suffix to add to synced read output file names. It is inserted into the input file name before the final extension (the part after the last period). Default is 'sync'.
--compress gzip|dsrc: Specify type of compression for output files (will compress all output files)
--singles: If given, unpaired reads will be written to separate output files. Default is FALSE.
--singles_suffix suffix: Specify suffix to add to singleton read output file names. It is inserted into the input file name before the final extension (the part after the last period). Default is 'singles'.
--help: Display this usage page
--version: Print version information
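
As an illustration of how the suffix options might map an input file name to an output file name, here is a small Python sketch. The helper name suffixed_name is hypothetical, and joining the suffix with a period is an assumption; the tool's exact naming may differ.

```python
import os


def suffixed_name(path, suffix):
    """Insert `suffix` before the final extension of `path`.

    Assumption: the suffix is joined with a period, so the extension
    (if any) stays at the end of the name.
    """
    root, ext = os.path.splitext(path)
    return f"{root}.{suffix}{ext}"


if __name__ == "__main__":
    print(suffixed_name("sample_R1.fq", "sync"))
    print(suffixed_name("sample_R1.fq", "singles"))
```

Names given explicitly with --fwd_out, --rev_out, --fwd_singles_out or --rev_singles_out would presumably take the place of these derived names.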

CAVEATS AND BUGS

No validation is currently performed on the input files. They are assumed to be in standard FASTQ format, with each read represented by exactly four lines and no extraneous content. CRITICALLY, they are also assumed to be in the same order after accounting for deleted reads (the software will fail miserably if this is not the case).
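
Because the program itself does not validate its input, it can be worth checking the four-line layout beforehand. The following Python sketch is one way to do that; it is a separate, hypothetical helper (check_fastq), not part of sync_reads, and it checks only the record structure, not the read order.

```python
import io


def check_fastq(handle):
    """Return the number of records, or raise ValueError on a malformed one.

    Checks only the strict four-line layout: '@' header, sequence,
    '+' separator, and a quality line as long as the sequence.
    """
    count = 0
    while True:
        header = handle.readline()
        if not header:
            return count  # clean end of file
        seq, plus, qual = [handle.readline() for _ in range(3)]
        record_no = count + 1
        if not header.startswith("@"):
            raise ValueError(f"record {record_no}: header does not start with '@'")
        if not plus.startswith("+"):
            raise ValueError(f"record {record_no}: missing '+' separator line")
        if len(seq.rstrip("\r\n")) != len(qual.rstrip("\r\n")):
            raise ValueError(f"record {record_no}: sequence/quality length mismatch")
        count = record_no


if __name__ == "__main__":
    demo = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r2/1\nGG\n+\nII\n")
    print(check_fastq(demo))
```

A truncated final record is reported as a missing separator or a length mismatch, since the absent lines are read as empty strings.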

Please submit bug reports to the issue tracker in the distribution repository.

AUTHOR

Jeremy Volkening (jeremy.volkening@base2bio.com)

COPYRIGHT AND LICENSE

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.