<<

NAME

rm_gaps - remove gappy columns from multiple alignment

SYNOPSIS

rm_gaps [--cutoff <float>] [--min_len <int>] [--remove_unknown] [--blocks <filename>] < aligned.fasta > out.fasta

DESCRIPTION

Reads a multiple sequence alignment in FASTA format from STDIN, removes gappy columns according to the criteria set by the options below, and prints the trimmed alignment to STDOUT in FASTA format.

PREREQUISITES

Requires the following non-core Perl libraries:

OPTIONS

--cutoff float

Minimum fraction of sequences that must contain a gap at a position for that column to be considered "gappy" and removed (default: 0.1)
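
The cutoff rule can be illustrated with a short Python sketch. This is not the tool's Perl implementation: the function name gappy_columns, the use of '-' as the only gap character, and the inclusive (>=) comparison against the cutoff are all assumptions made for illustration.

```python
def gappy_columns(seqs, cutoff=0.1, gap_chars="-"):
    """Return one boolean per alignment column; True marks a "gappy" column.

    Assumptions (not confirmed by the tool's documentation): '-' is the gap
    character and a column is gappy when its gap fraction is >= cutoff.
    """
    if not seqs:
        return []
    n_cols = len(seqs[0])
    if any(len(s) != n_cols for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    flags = []
    for col in range(n_cols):
        # Count the sequences that have a gap at this column
        n_gaps = sum(1 for s in seqs if s[col] in gap_chars)
        flags.append(n_gaps / len(seqs) >= cutoff)
    return flags


alignment = ["ACG-T", "AC--T", "ACGAT", "ACGTT"]
print(gappy_columns(alignment, cutoff=0.5))
```

With a cutoff of 0.5, only the fourth column (gaps in two of four sequences) is flagged. With the default of 0.1, the third column (one gap in four sequences) would be flagged as well.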

--min_len int

Minimum length of a sequence "island" (that is, the region between "gappy" columns) to be output. Islands shorter than this will be discarded. This option can be used to filter out short aligned stretches (e.g. one or a few nucleotides long) within otherwise poorly aligned regions (default: 1)
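
The island filter can be sketched in Python as follows. The function name kept_islands and the 0-based, end-inclusive coordinates are assumptions chosen for this illustration and do not describe how the tool itself stores positions.

```python
def kept_islands(gappy, min_len=1):
    """Return (start, end) pairs for runs of non-gappy columns.

    `gappy` is a list of booleans (True = gappy column). Coordinates are
    0-based and end-inclusive (an assumption for this sketch). Runs shorter
    than `min_len` are discarded.
    """
    islands = []
    start = None  # start index of the island currently being scanned
    for i, is_gappy in enumerate(gappy):
        if not is_gappy and start is None:
            start = i
        elif is_gappy and start is not None:
            if i - start >= min_len:
                islands.append((start, i - 1))
            start = None
    # Close an island that runs to the end of the alignment
    if start is not None and len(gappy) - start >= min_len:
        islands.append((start, len(gappy) - 1))
    return islands


flags = [False, False, True, False, True, True, False, False, False]
print(kept_islands(flags, min_len=2))
```

In this example the single-column island at index 3 is dropped when min_len is 2, while the islands spanning indices 0 to 1 and 6 to 8 are kept.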

--remove_unknown

If given, unknown bases ('N') will be treated the same as gaps for the purpose of cutoff calculations. In other words, the --cutoff threshold will refer to the minimum fraction of sequences that have either a gap or an unknown base at a position in order for that column to qualify for removal.
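
The effect of this option on the per-column calculation can be sketched in Python. The function name gap_fraction is hypothetical, and the sketch assumes that '-' is the gap character and that only uppercase 'N' counts as unknown; how the tool handles lowercase 'n' is not stated here.

```python
def gap_fraction(column, remove_unknown=False):
    """Fraction of characters in one alignment column that count as gaps.

    Assumptions for this sketch: '-' is the gap character, and only
    uppercase 'N' is treated as an unknown base.
    """
    gap_like = {"-"}
    if remove_unknown:
        gap_like.add("N")
    return sum(1 for ch in column if ch in gap_like) / len(column)


column = "A-NNA"  # one character per sequence at a single position
print(gap_fraction(column))
print(gap_fraction(column, remove_unknown=True))
```

For this column, one of five characters is a gap, so the fraction rises from one fifth to three fifths once the two unknown bases are counted as well.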

--blocks filename

If given, a tab-delimited output file with two columns will be written to the specified filename. Each row represents a block of positions, relative to the input alignment, that is preserved in the trimmed output. The first column contains the start coordinate of the block and the second column contains the end coordinate. This is used by some downstream software to map positions between the stripped and unstripped alignments.
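
As an illustration of how downstream code might use such a file, the following Python sketch maps a position in the trimmed alignment back to the input alignment. It assumes 1-based, end-inclusive coordinates and no header line, neither of which is confirmed by this documentation; the function names read_blocks and trimmed_to_original are hypothetical.

```python
import io


def read_blocks(handle):
    """Parse tab-delimited start/end rows into a list of (start, end) tuples."""
    blocks = []
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        start, end = line.split("\t")
        blocks.append((int(start), int(end)))
    return blocks


def trimmed_to_original(blocks, pos):
    """Map a position in the trimmed alignment to the input alignment.

    Assumes 1-based, end-inclusive coordinates (an assumption of this
    sketch, not a documented property of the file).
    """
    if pos < 1:
        raise ValueError("position must be >= 1")
    offset = 0  # number of trimmed columns covered by earlier blocks
    for start, end in blocks:
        length = end - start + 1
        if pos <= offset + length:
            return start + (pos - offset - 1)
        offset += length
    raise ValueError("position lies beyond the trimmed alignment")


# Two preserved blocks: input columns 1-2 and 7-9
blocks = read_blocks(io.StringIO("1\t2\n7\t9\n"))
print(trimmed_to_original(blocks, 3))
```

Here the third column of the trimmed alignment is the first column of the second block, so it maps back to column 7 of the input.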

CAVEATS AND BUGS

Please submit bug reports to the issue tracker in the distribution repository.

AUTHOR

Jeremy Volkening (jeremy.volkening@base2bio.com)

LICENSE AND COPYRIGHT

Copyright 2014-23 Jeremy Volkening

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.

<<