Taxonomic profiling of whole metagenome sequencing datasets
In a recent study, Willmann and colleagues investigated the impact of ciprofloxacin treatment on the intestinal microbiome. Antimicrobial treatments are known to upset the structure of the intestinal microbiome. Taxonomic profiling can reveal the severity and long term effects of antibiotic treatments on the gut community.
Shotgun metagenomics sequencing is a powerful tool for investigating bio-diversity of microbial communities. Common challenges include unwanted signals from host DNA reducing the sensitivity at which the microorganisms can be detected. In addition, large data volumes and the complex structure of metagenomics data further complicate computational analysis.
Using the original data of Willmann et al. we demonstrate the new taxonomic profiler designed for metagenomics data analysis introduced in CLC Microbial Genomics Module 2.0.
Two healthy male volunteers (S1 and S2) were treated with oral ciprofloxacin (Cp) for six days (500 mg twice daily). Stool samples were collected before treatment (baseline), on day 1, 3 and 6 of treatment, and 2 and 28 days after treatment.
Metagenomic shotgun sequencing was performed on the Illumina HiSeq 2000 platform using a paired-end sequencing approach with a targeted read length of 100 bp and an insert size of 180 bp.
Using QIAGEN CLC Genomics ProSuite, we track the evolution of microbiome diversity during and after treatment by analysing the presence and relative abundance of microbial taxa.
We studied how the community recovers after treatment by comparing microbial diversity and taxonomic composition over time. Furthermore, we show how differential abundance analysis can be used to identify the taxa that are most impacted by Cp treatment.
This application note focuses on taxonomic profiling of whole metagenome shotgun sequencing data and the comparative analysis of microbiome profiles using CLC Microbial Genomics Module. This module is an extension of CLC Genomics Workbench and part of QIAGEN CLC Genomics ProSuite. In this analysis, we used CLC Genomics Workbench (version 10) and CLC Microbial Genomics Module (version 2).
Raw fastq-files and metadata were downloaded directly through CLC Genomics Workbench using the Search for Reads in SRA tool (ENA project number PRJEB10391).
To build a suitable database of reference genomes for taxonomic profiling, we used the Create Microbial Reference Database function. This new tool enables users to easily build and download a reference database containing the latest version of all complete microbial genomes, or meaningful subsets thereof, from NCBI. This reference data manager allows the user to filter reference genomes for assembly quality and taxonomic content. The database used here contained 45,616 microbial genome sequences.
To get started faster CLC Microbial Genomcis Module 2.0 includes 3 preconfigured and ready-to-use Workflows. The Data QC and Taxonomic Profiling workflow performs sequence trimming and taxonomic profiling using the user-build reference database created earlier (Figure 1). To reduce the impact of contamination the workflow includes an option to filter out reads originating from host DNA. The workflow outputs sequencing quality control (QC) reports, and reports from trimming the NGS reads. Microbial community composition is reported in the form of abundance tables and interactive graphics (sunburst plots, stacked bar and stacked area charts) for each sample.
To compare study cohorts or groups of samples abundance profiles can be merged per experiment into a single profile. We merged abundance profiles for each subject and one for all samples combined.
Figure 1. The Data QC and Taxonomic Profiling workflow.
Figure 2. Sunburst chart showing the relative abundance of bacterial within the phylum Firmicutes in S1. As shown in the legend 15% of all taxa identified in this sample group are Firmicutes.
Alpha diversity and rarefaction analysis
Sequencing at sufficient depth is crucial for recovering the complete microbial diversity inherent to a sample. Rarefaction analysis should be performed to assess if sequencing coverage was sufficient. Rarefaction curves plot the number of taxa as a function of the sub-sampled reads. The rarefaction curves in Figure 3, produced with the Alpha Diversity tool, all reach a plateau, indicating that sequencing was performed to a sufficient level.
Several studies have shown the diversity of the gastrointestinal flora remains diminished 8 weeks after treatment [Fouhy et al.] and does not recover completely even 3 months after treatment ends [Panda et al.] – so a complete restoration of the initial diversity is not to be expected within 4 weeks after treatment.
To analyse the impact of Cp treatment on microbiome diversity we have estimated alpha diversity (or diversity withing a sample). The plot in Figure 3 shows that at baseline level individual S1 had a more diverse intestinal microbiome compared to S2. Cp treatment reduced the taxonomic diversity of the gut microbiome severly for subject S1 by day 6. Subject S2 was less affected. Both S1 and S2 recovers almost completely 28 days after treatment. This is in agreement with observations in the original paper: “Distal gut species diversity was profoundly disturbed over the course of Cp exposure in subject 1[…]. The lowest diversity occurred at the last treatment day and in the following 2 days but recovered almost fully 28 days after treatment. Subject 2 generally was much less affected […]”.
Figure 3. Alpha diversity analysis shows that gut microbiome of S1 was profoundly disturbed by Cp treatment. For alpha diversity estimation we use a non-absolute diversity metric. Accurate counts of different taxonomic units can be obtained from the abundance tables.
Beta diversity estimation compares microbial community composition between samples and clusters samples based compositional similarity. To assess the change in microbiome composition across different time-points, we performed beta diversity analysis on the abundance profiles. The Beta Diversity tool offers three diversity measures (Bray-Curtis, Jacquard and euclidian) to calculate distance matrices. The results are shown in Figure 4 and 5, combining the beta-diversity as a bar chart and a PCoA plot for the Bray-Curtis distance. While the bar chart provides details of the relative abundance of taxa, here grouped by phylum, the PCoA plots show the development of microbiome composition over time.
Figure 4. Stacked bar chart showing the gut microbiome diversity of S1 and S2 on phylum level. Samples are grouped by subject, and individual bars are labelled with subject and treatment day.
Subjects S1 and S2 respond to Cp treatment with a similar reduction in the abundance of Proteobacteria. However, for the four other dominating phyla Verrucomicrobia, Firmicutes, Bacteroidetes, and Actinobacteria, subjects S1 and S2 respond differently.
The phylum Verrucomicrobia is only present in S1 at time-point 1 and 2 and recovers at time-point 6. Two weeks after treatment this phylum is present at much higher relative abundance (purple portion of the bar chart) than before the Cp exposure. This agrees with the observation that “Among the five most common phyla, both subjects responded with a similar decrease in abundance of Proteobacteria, but distinct from levels for Verrucomicrobia, Firmicutes, Bacteroidetes, and Actinobacteria” (see Figure. 6 in Willmann et al.)
PCoA analysis of S1 displays the baseline point and point 1 very close together, indicating that on the first day of treatment diversity is mildly affected. However, from day 3 the profile deviate visibly along the first principal coordinate, with day 6 and 8 being most different to the baseline and closer to each other. On day 34, the profile is much closer to baseline again, indicating the partial recovery of the microbial composition after treatment. S2 follow a very similar pattern, but recover to a state much closer to baseline than S1.
Figure 5. PCoA plots of the beta diversity between S1 samples (top) and S2 samples (bottom). Baseline is shown in green, during treatment (day 1, 3 and 6) in blue and post treatment (2 and 28 days after treatment start) in red, and is labeled with the treatment day.
Differential abundance analysis
To determine which taxa were most impacted by exposure to Cp in subjects S1 and S2 we performed differential abundance analysis using the Differential Abundance Analysis tool. CLC Microbial Genomics Module 2.0 feaures state-of-the-art analytics for determining differential abundance between microbial communities. Results can be filtered by fold change or p-values.
Fold-changes at family-level and FDR-corrected p-values are shown in Figure 6. Eubacteriaceae were significantly more abundant in S2, while Oscillospiraceae and Akkermansiaceae were more abundant in S1.
Figure 6. The output of differential abundance analysis on family level, comparing S1 and S2 while correcting for treatment phase.
This study has shown that ciprofloxacin treatment significantly alters the intestinal microbial community structure. While the intestinal microbiota recovers extensively post treatment, the results also highlight that microbiome diversity varies profoundly among healthy individuals.
Here we have demonstrated how use of the new Taxonomic Profiling tool enables the analysis of large shotgun metagenomics sequencing dataset and delivers results that are concordant to findings generated by independent means. Ease of use is ensured through preconfigured Workflows. Built-in statistical tools and data visualisations enable comparisons across multiple samples or sample groups and provide clear and interactive graphical results.
[Willmann et al.] “Antibiotic selection pressure determination through sequence-based metagenomics.” Antimicrobial agents and chemotherapy 59.12 (2015): 7335-7345.
[Panda et al.] “Short-term effect of antibiotics on human gut microbiota.” PloS one 9.4 (2014): e95476.
[Fouhy et al.] “High-throughput sequencing reveals the incomplete, short-term recovery of infant gut microbiota following parenteral antibiotic treatment with ampicillin and gentamicin.” Antimicrobial agents and chemotherapy 56.11 (2012): 5811-5820.
[Zurfluh et al.] “Quinolone resistance mechanisms among extended-spectrum beta-lactamase (ESBL) producing Escherichia coli isolated from rivers and lakes in Switzerland.” PloS one 9.4 (2014): e95864.
For further reading check out our Microbial Genomics Solution.