minimeta - assembler for long-read metagenomic/metatranscriptomic data sets
minimeta --in <reads.fq> --out <consensus.fasta>
Produces a polished consensus assembly from long-read sequencing data using miniasm, racon, and medaka. Software settings are tuned for metagenomic/metatranscriptomic assemblies of variable, sometimes low, coverage.
Requires the following non-core Perl libraries:
Additionally, the following external programs are required for one or more of the optional processing modules (errors will be thrown for missing programs only if that module is requested). All optional dependencies are available in Bioconda.
Path to input reads in FASTx format (required)
Path to existing assembly. If provided, assembly is skipped and only polishing is performed (default: none).
Path to reference FASTA file used by homopolish. Providing this filename also triggers polishing using homopolish (default: none).
Path to write consensus sequence to (as FASTA) [default: STDOUT]
Minimum read coverage required by assembler to keep position (default: 2)
Minimum contig length to keep (default: 1)
If given, final assembly positions with coverage depth below this value will be hard masked with 'N' (default: off)
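As an illustration of what hard masking means here, the following is a minimal sketch that assumes per-position depth is already available as a list of integers. The function name `mask_low_coverage` and its arguments are hypothetical and are not part of minimeta itself.

```python
def mask_low_coverage(seq, depth, min_depth):
    """Return seq with every base whose depth is below min_depth replaced by 'N'.

    Hypothetical helper for illustration only; minimeta's own
    implementation may differ.
    """
    if len(seq) != len(depth):
        raise ValueError("sequence and depth must have the same length")
    return "".join(
        base if d >= min_depth else "N"
        for base, d in zip(seq, depth)
    )


if __name__ == "__main__":
    # Positions with depth 1, 0 and 2 fall below the cutoff of 3.
    print(mask_low_coverage("ACGTACGT", [5, 5, 1, 0, 4, 4, 2, 9], 3))
```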
If given in conjunction with --mask_below, splits contigs at masked regions into smaller pieces (default: off)
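The splitting behavior can be pictured with the sketch below, which cuts a masked contig at every run of 'N' and drops empty pieces. The function name `split_at_masked` is hypothetical, and applying a minimum piece length at this step is an assumption rather than documented behavior.

```python
import re


def split_at_masked(seq, min_len=1):
    """Split a contig at runs of 'N' and keep pieces of at least min_len bases.

    Hypothetical helper for illustration only; the min_len filter is an
    assumption and may not match how minimeta treats split pieces.
    """
    pieces = re.split(r"N+", seq)
    return [p for p in pieces if len(p) >= min_len]


if __name__ == "__main__":
    for piece in split_at_masked("ACGTNNNNGGCCNA"):
        print(piece)
```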
If given in conjunction with --split, only splits low-coverage regions if one or both junctions are at a homopolymer stretch (default: off)
Number of processing threads to use for mapping and polishing (default: 1)
Number of Racon polishing rounds to perform (default: 3)
Number of Medaka polishing rounds to perform (default: 1)
Name of model to be used by medaka_consensus (based on basecalling model used for data) (default: depends on medaka version)
Batch size (medaka_consensus parameter -b) for medaka to use; using a smaller value should reduce memory consumption (default: 100)
For re-assemblies, the maximum length of pseudo-reads to generate as an absolute value; the actual value will be the minimum of this and the value of --shred_max_frac times the actual contig length (default: 2000)
For re-assemblies, the maximum length of pseudo-reads to generate as a fraction of the contig length; the actual value will be the minimum of this and the value of --shred_len (default: 0.66)
For re-assemblies, the target depth of the pseudoreads on each contig; this is used to calculate how many reads to generate (default: 10)
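The interaction of the three shredding settings above can be sketched as follows. The pseudo-read length follows the minimum rule described for --shred_len and --shred_max_frac; the read-count formula (target depth times contig length, divided by read length, rounded up) and the rounding of the fractional length are assumptions, and the names `shred_plan` and `target_depth` are hypothetical.

```python
import math


def shred_plan(contig_len, shred_len=2000, shred_max_frac=0.66, target_depth=10):
    """Return (pseudo-read length, number of pseudo-reads) for one contig.

    Hypothetical helper for illustration only. The read-count formula is
    an assumption; minimeta's actual calculation may differ.
    """
    # Actual length is the smaller of the absolute and fractional limits.
    read_len = min(shred_len, round(shred_max_frac * contig_len))
    read_len = max(read_len, 1)  # guard against zero-length reads on tiny contigs
    # Enough reads to cover the contig to roughly the target depth.
    n_reads = math.ceil(target_depth * contig_len / read_len)
    return read_len, n_reads


if __name__ == "__main__":
    for length in (1000, 10000):
        print(length, shred_plan(length))
```

With the defaults, a 10,000 bp contig is limited by the absolute maximum (2000 bp), while a 1,000 bp contig is limited by the fractional maximum (660 bp).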
Name of model to be used by homopolish. Has no effect if --homopolish is not used. (default: R9.4.pkl)
Don't randomly shuffle input reads prior to assembly (default: shuffle)
Trim long poly-N stretches from reads prior to assembly (default: off)
Perform one or more rounds of pseudo-assembly in order to minimize redundancy. For each round, the existing assembly is shredded into pseudoreads and reassembled.
If this option is given, input reads will be split into chunks of --chunk_size reads and each chunk will be assembled independently. The resulting assemblies will be combined, shredded into pseudoreads, and reassembled.
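The chunking step described above amounts to partitioning the read stream into consecutive groups of at most --chunk_size reads. A minimal sketch, with the hypothetical function name `chunk_reads` and reads represented as plain strings:

```python
from itertools import islice


def chunk_reads(reads, chunk_size):
    """Yield consecutive lists of at most chunk_size reads.

    Hypothetical helper for illustration only; the final chunk may be
    smaller than chunk_size.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    it = iter(reads)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    for chunk in chunk_reads(["r1", "r2", "r3", "r4", "r5"], 2):
        print(chunk)
```

Each chunk would then be assembled independently before the combined result is shredded and reassembled.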
Use a fixed seed for random processes such as shuffling (default: off)
Apply a reduction algorithm to the pre-final assembly to remove redundant contigs (i.e. contigs mostly or completely overlapping with identity above a cutoff specified by --min_ident). Currently this is done using Redundans, which must be installed. (default: off)
Minimum identity (0 to 1) between contigs required to remove shorter contig during redundancy reduction. (default: 0.8)
During all-vs-all mapping, discard minimizers occurring above this frequency. This is the -f parameter to minimap2, and can be useful with high-coverage input datasets that may otherwise consume very large amounts of memory and time. A value between 1000 and 10,000 may be useful in these cases. (default: off)
Don't write status messages to STDERR
Print usage description and exit
Print software version and exit
Please submit bug reports to the issue tracker in the distribution repository.
Jeremy Volkening (jeremy.volkening@base2bio.com)
Copyright 2021-23 Jeremy Volkening
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.