
UHGG database for taxonomic profiling contains sequences with long unintentional stretches of N’s

The Unified Human Gastrointestinal Genome database, UHGG (v2.0.1), available from Download Curated Microbial Reference Database, contains 200+ genomes in which, by mistake, all or large parts of the sequence consist of N’s. Both the sequence list and the taxonomic profiling index are affected.

When this database is used for Taxonomic Profiling, no reads will map to these stretches of N’s. Consequently, the affected strains or species will go undetected by the analysis, or their abundance will be underestimated.

Affected genomes include those of Escherichia coli, Listeria monocytogenes, Pseudomonas aeruginosa, Salmonella enterica, and Staphylococcus aureus.
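
To see which sequences in a reference are dominated by N’s, one option is to export the sequence list to FASTA and compute the fraction of N’s per record. The following Python snippet is a minimal sketch of that idea, not a QIAGEN-provided tool. It assumes a FASTA export is available; the function names, the example records, and the 0.5 threshold are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one first.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def n_fraction(sequence: str) -> float:
    """Return the fraction of positions that are N (case-insensitive)."""
    if not sequence:
        return 0.0
    return sequence.upper().count("N") / len(sequence)


def flag_n_rich(lines: Iterable[str], threshold: float = 0.5) -> Dict[str, float]:
    """Map each header to its N fraction if that fraction is >= threshold.

    The default threshold of 0.5 is an arbitrary, hypothetical cutoff.
    """
    flagged = {}
    for header, sequence in read_fasta(lines):
        fraction = n_fraction(sequence)
        if fraction >= threshold:
            flagged[header] = fraction
    return flagged


# Made-up records for illustration; these are not real UHGG entries.
example = [
    ">genome_A", "ACGTACGT",
    ">genome_B", "NNNNNNNN",
    ">genome_C", "ACNNNNNN",
]
flagged = flag_n_rich(example, threshold=0.5)
for name, fraction in flagged.items():
    print(f"{name}\t{fraction:.2f}")
```

For a real export, an open file handle can be passed in place of the example list, for instance `flag_n_rich(open(path))` with `path` pointing to the exported FASTA file. Reading line by line keeps only one record in memory at a time, which matters for references containing many genomes.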


If you have run Taxonomic Profiling on samples using the UHGG (v2.0.1) taxonomic profiling index, we recommend that you rerun the analysis once a corrected version of the UHGG database has been released. Alternatively, use one of the QMI-PTDB databases, which are also available from Download Curated Microbial Reference Database.

How to check which reference index you used for taxonomic profiling results

Open the taxonomic profiling abundance table output and go to History View by clicking the Show History icon under the View area. Select Taxonomic Profiling. The name of the applied reference index is listed under Parameters. If it reads UHGG (v2.0.1), your results are affected.

Affected database