Accurate identification of protein-coding regions in metagenomic sequences is challenging. The MetaGeneMark-2 plugin relies on an innovative approach to the parameter estimation problem that conventional gene finding algorithms face because metagenomic contigs are short and lack the genomic context of their source genome.
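The short Python calculation below makes the parameter estimation problem concrete. It counts the conditional probabilities in a k-th order Markov chain over DNA and compares that number with the amount of data in a single sequence. The chain order of 5, the three-periodic (codon-position) structure and the function names are assumptions chosen for illustration, not parameters taken from MetaGeneMark-2.

```python
# Illustrative arithmetic only: the chain order (5) and the three-periodic
# layout are assumptions chosen for the example, not MetaGeneMark-2 settings.

def markov_parameter_count(order: int, frames: int = 3) -> int:
    """Number of conditional probabilities P(base | previous `order` bases).

    A three-periodic coding model keeps a separate table for each codon
    position, hence the `frames` factor.
    """
    contexts = 4 ** order           # distinct preceding k-mers
    return frames * contexts * 4    # one probability per possible next base


def observations_per_parameter(contig_length: int, order: int, frames: int = 3) -> float:
    """Rough count of nucleotide transitions in a sequence per model parameter."""
    transitions = max(contig_length - order, 0)
    return transitions / markov_parameter_count(order, frames)


for length in (500, 1_000, 5_000_000):
    ratio = observations_per_parameter(length, order=5)
    print(f"{length:>9,} nt: {ratio:8.2f} observations per parameter")
```

Under these assumptions the model has 12,288 conditional probabilities. A 1,000 nt contig then supplies fewer than 0.1 observations per parameter, while a 5 Mb genome-sized sequence supplies about 400. Training such a model on the contig alone is therefore not feasible.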
GENE PROBE Inc., the developer of MetaGeneMark, has been creating and refining algorithms for gene prediction in metagenomic sequences for more than fifteen years. The MetaGeneMark-2 plugin is further optimized for gene finding in anonymous metagenomic sequences. Our tests show that, compared with MetaGeneMark, MetaGeneMark-2 reduces the rate of false-negative predictions (missed genes) by nearly half; for MetaGeneMark this rate was estimated at 2.7%.
MetaGeneMark-2 (a metagenomic gene caller with precomputed sets of model parameters) is an ab initio computational tool designed to predict intronless protein-coding genes in metagenomic sequences. Parameters of the high-order statistical models of protein-coding and non-coding regions are precomputed for each possible sequence composition, as characterized by the sequence GC content. This heuristic method essentially reconstructs the genomic context of a given short anonymous sequence (Zhu et al., 2010*). MetaGeneMark-2 implements the Viterbi algorithm for a hidden semi-Markov model that describes the functional and structural organization of a metagenomic sequence.
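The two steps described above can be sketched in Python: pick a precomputed model from the GC content of the input, then decode the most probable state path with the Viterbi algorithm. This is a toy under explicit assumptions. The table `PRECOMPUTED`, the helper names, the two-state structure and every probability are hypothetical. Emissions are single nucleotides rather than high-order, codon-aware models. The decoder is a plain HMM Viterbi with fixed transition probabilities, not the hidden semi-Markov model used by MetaGeneMark-2, which would also model gene lengths explicitly along with start and stop codons. Input is assumed to contain only A, C, G and T.

```python
import math

# Hypothetical stand-in for the precomputed parameter sets: one toy model per
# integer GC percentage from 30 to 70. Real models are high-order and codon-aware;
# these only hold single-nucleotide emission probabilities for two states.
def toy_model(gc: float) -> dict:
    def composition(g: float) -> dict:
        return {"G": g / 2, "C": g / 2, "A": (1 - g) / 2, "T": (1 - g) / 2}
    # Assumption for the toy: "coding" is slightly GC-richer than "noncoding".
    return {"coding": composition(gc + 0.05), "noncoding": composition(gc - 0.05)}


PRECOMPUTED = {pct: toy_model(pct / 100) for pct in range(30, 71)}
STATES = ("coding", "noncoding")
# Assumed "sticky" transitions: labels rarely switch between neighbouring bases.
TRANS = {s: {t: 0.99 if s == t else 0.01 for t in STATES} for s in STATES}


def gc_content(seq: str) -> float:
    """GC percentage of a sequence (0 for an empty string)."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / max(len(seq), 1)


def select_model(seq: str) -> dict:
    """Heuristic step: pick the precomputed model whose GC bin is closest."""
    gc = gc_content(seq)
    return PRECOMPUTED[min(PRECOMPUTED, key=lambda pct: abs(pct - gc))]


def viterbi(seq: str, model: dict) -> list:
    """Most probable state label per nucleotide (log space; A/C/G/T only)."""
    seq = seq.upper()
    # Uniform initial state probabilities.
    score = {s: math.log(0.5) + math.log(model[s][seq[0]]) for s in STATES}
    backpointers = []
    for base in seq[1:]:
        new_score, pointer = {}, {}
        for t in STATES:
            best = max(STATES, key=lambda s: score[s] + math.log(TRANS[s][t]))
            new_score[t] = score[best] + math.log(TRANS[best][t]) + math.log(model[t][base])
            pointer[t] = best
        score = new_score
        backpointers.append(pointer)
    # Trace back from the best final state.
    state = max(STATES, key=score.get)
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    return path[::-1]


seq = "A" * 50 + "G" * 50 + "A" * 50
labels = viterbi(seq, select_model(seq))
print(labels.count("coding"))
```

With these toy numbers the G-rich middle segment is labelled `coding` and the flanking A runs `noncoding`. The point of the design is that no parameters are estimated from the input itself: the input only selects which precomputed model the decoder uses.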
In addition to the standard mode, “Gene prediction in prokaryotic metagenomes (genetic code 11)”, MetaGeneMark-2 provides a “Gene prediction in eukaryotic metatranscriptomes” mode (genetic code 1).
*Zhu W., Lomsadze A. and Borodovsky M. Ab initio gene identification in metagenomic sequences. Nucleic Acids Research, 2010, Vol. 38, No. 12, e132, doi: 10.1093/nar/gkq275.
For the analysis of complete or draft genomes, GeneMark Gene Finding provides GeneMark.hmm-2, a tool designed for compositionally uniform sequences that belong to a single species. In this case, the parameters of the statistical model can be chosen from a set of species-specific models supplied with the gene finding algorithm. GeneMark.hmm-2 is useful for accurate gene prediction in genomes of close relatives of the species represented in the model library supplied with the plugin. Another option is to use a “universal model” for either a bacterial or an archaeal genome. The universal models are precomputed in the same fashion as the metagenomic models. The theory behind the algorithm is described in the original publications*.
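The model choice described above can be expressed as a small decision function. In this Python sketch, `SPECIES_MODELS`, `UNIVERSAL_MODELS`, `choose_model` and all model names are hypothetical placeholders. They are not the plugin's actual model files or interface. The `species` argument stands for the library entry the user judges to match, or closely resemble, the species of the genome being analyzed.

```python
from typing import Optional

# Hypothetical model library: names and keys are placeholders,
# not the files shipped with the plugin.
SPECIES_MODELS = {
    "Escherichia coli": "model_escherichia_coli",
    "Bacillus subtilis": "model_bacillus_subtilis",
}
UNIVERSAL_MODELS = {
    "bacteria": "universal_bacterial",
    "archaea": "universal_archaeal",
}


def choose_model(species: Optional[str], domain: str) -> str:
    """Prefer a species-specific model (for the species itself or a close
    relative); otherwise fall back to the universal model for the domain."""
    if species is not None and species in SPECIES_MODELS:
        return SPECIES_MODELS[species]
    try:
        return UNIVERSAL_MODELS[domain.lower()]
    except KeyError:
        raise ValueError(f"no universal model for domain {domain!r}") from None


print(choose_model("Escherichia coli", "bacteria"))
print(choose_model(None, "Archaea"))
```

The fallback order mirrors the text: a species-specific model when a suitable relative is in the library, and a universal bacterial or archaeal model otherwise.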
*Lukashin A. and Borodovsky M. GeneMark.hmm: new solutions for gene finding. Nucleic Acids Research, 1998.
*Lomsadze A., Gemayel K., Tang S. and Borodovsky M. Modeling leaderless transcription and atypical genes results in more accurate gene prediction in prokaryotes. Genome Research, 2018, doi: 10.1101/gr.230615.117.