QIAGEN CLC Single Cell Analysis Module provides tools and workflows for processing scRNA-seq data from raw FASTQ files to clusters of cells with annotated cell types and differentially expressed genes. In addition, functionality for the complete analysis of clonotypes from scTCR-seq data with capabilities to filter, compare and overlay information on dimensionality reduction plots is provided. The module is a part of QIAGEN CLC Genomics Premium, our complete, full-feature package for ‘omics data analysis.
Watch this webinar to learn how to use QIAGEN CLC Single Cell Analysis Module for single-cell gene expression analysis starting from either FASTQ or expression data.
Algorithms in QIAGEN CLC Single Cell Analysis Module have been implemented to scale to large datasets and run on a wide range of hardware. Read more about the implemented algorithms and options in the manual.
Figure 1. The Single Cell Analysis tools. The tools and workflows provide capabilities to analyze both scRNA-seq and scTCR-seq data from FASTQ to expression, cell type prediction and cell clonotypes.
Single Cell Analysis Importers
Several importers for different expression matrix formats, clonotypes as well as accompanying cell and cluster annotations are provided.
Figure 2. Single-cell data importers available through the QIAGEN CLC Single Cell Analysis Module.
Creating the expression data matrix
Two steps are needed to create the expression data matrix: 1) annotating reads with cell barcode and Unique Molecular Index (UMI) information, and 2) mapping the reads to a reference.
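As a rough illustration of the first step, the sketch below splits the start of a read into a cell barcode and a UMI. The 16-base barcode and 12-base UMI layout, the whitelist, and the names `annotate_read` and `ReadTag` are assumptions made for this example; they are not taken from the module.

```python
from typing import NamedTuple, Optional, Set

# Assumed read layout (hypothetical): barcode first, then UMI.
BARCODE_LEN = 16
UMI_LEN = 12


class ReadTag(NamedTuple):
    barcode: str
    umi: str


def annotate_read(read1: str, whitelist: Set[str]) -> Optional[ReadTag]:
    """Return the barcode/UMI pair for a read, or None if it cannot be annotated."""
    if len(read1) < BARCODE_LEN + UMI_LEN:
        return None  # read too short to hold both tags
    barcode = read1[:BARCODE_LEN]
    umi = read1[BARCODE_LEN:BARCODE_LEN + UMI_LEN]
    # Reject unknown barcodes and UMIs containing ambiguous bases.
    if barcode not in whitelist or "N" in umi:
        return None
    return ReadTag(barcode, umi)


whitelist = {"ACGTACGTACGTACGT"}
tag = annotate_read("ACGTACGTACGTACGT" + "TTTTGGGGCCCC" + "AAAA", whitelist)
print(tag)
```

Reads that return `None` would simply be left out of counting in this simplified picture; a real tool may also correct barcodes that are one mismatch away from a whitelisted sequence.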
Read mapping and counting uses a specialized version of the RNA-seq analysis tool from QIAGEN CLC Genomics Workbench. This tool has achieved top ratings in independent benchmarks (Baruzzo et al., 2017; Donato et al., 2021).
The approach maps reads to the transcriptome, genome and provided spike-ins. Mapping to the genome in addition to the transcriptome removes noise. Multi-mapping reads are placed using an expectation-maximization approach. A comprehensive report includes the types of features in the data (mRNA, lncRNA, etc.) and the correlation of expressions with known spike-in concentrations.
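To give an idea of how an expectation-maximization approach can place multi-mapping reads, here is a minimal sketch. It ignores transcript lengths and mapping qualities, and the function name `em_abundances` and the toy data are assumptions; the module's actual algorithm is described in the manual and may differ.

```python
from collections import defaultdict
from typing import Dict, List


def em_abundances(read_hits: List[List[str]], n_iter: int = 50) -> Dict[str, float]:
    """Estimate relative abundances from reads that may hit several transcripts."""
    transcripts = sorted({t for hits in read_hits for t in hits})
    # Start from a uniform guess.
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    for _ in range(n_iter):
        counts: Dict[str, float] = defaultdict(float)
        # E-step: split each read across its hits in proportion to theta.
        for hits in read_hits:
            total = sum(theta[t] for t in hits)
            for t in hits:
                counts[t] += theta[t] / total
        # M-step: re-estimate abundances from the fractional counts.
        theta = {t: counts[t] / len(read_hits) for t in transcripts}
    return theta


# Three reads unique to A, one unique to B, one shared by both.
reads = [["A"], ["A"], ["A"], ["B"], ["A", "B"]]
theta = em_abundances(reads)
print(theta)
```

In this toy case the shared read ends up assigned mostly to A, because the uniquely mapped reads already indicate that A is the more abundant transcript.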
The first part of the analysis includes Quality Control and normalization steps.
Empty droplet detection, as well as doublet removal, is recommended for droplet-based approaches such as 10x Genomics.
Quality Control (QC) steps allow setting thresholds for various criteria for consideration of high-quality cells in downstream analysis. Some of the plots produced are shown below.
Figure 3. Plots from the quality control report.
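The idea of threshold-based QC can be sketched as follows. The criteria (total counts, number of expressed genes, mitochondrial fraction), the threshold values, and the names `CellQC` and `passes_qc` are illustrative assumptions, not the module's defaults.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    barcode: str
    n_counts: int         # total counts in the cell
    n_genes: int          # genes with nonzero expression
    mito_fraction: float  # fraction of counts from mitochondrial genes


def passes_qc(cell: CellQC, min_counts: int = 500, min_genes: int = 200,
              max_mito: float = 0.2) -> bool:
    """Apply hypothetical thresholds; all three criteria must hold."""
    return (cell.n_counts >= min_counts
            and cell.n_genes >= min_genes
            and cell.mito_fraction <= max_mito)


cells = [
    CellQC("cell_1", n_counts=4200, n_genes=1500, mito_fraction=0.05),
    CellQC("cell_2", n_counts=120, n_genes=80, mito_fraction=0.02),    # too few counts
    CellQC("cell_3", n_counts=3000, n_genes=900, mito_fraction=0.45),  # high mito fraction
]
kept = [c.barcode for c in cells if passes_qc(c)]
print(kept)
```

Suitable thresholds depend on the tissue and protocol, so values like these are best chosen after inspecting the QC plots.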
Normalization of data affects all downstream analyses. The implementation is well suited for removing batch effects.
Figure 4. Batch correction: Several clusters are observed for each of two samples. After batch correction, clusters contain a mixture of both samples.
UMAP and tSNE are de facto standards for visualizing single-cell expression data. Our interactive 2D and 3D visualizations can be overlaid with cluster information, cell annotations and gene expressions.
Figure 5. UMAP of single-cell data from 8393 liver cells (MacParland et al., 2018), colored by predicted cell types.
Clustering is performed using the graph-based Leiden algorithm.
Cell-type prediction is traditionally performed on clusters of cells. However, this has the disadvantage that errors in clustering, or simply too coarsely-grained clusters, can lead to imprecise annotations. QIAGEN CLC Single Cell Analysis Module provides a cell classifier that annotates individual cells. The classifier has been trained on large single-cell projects from human and mouse experiments, annotated with the QIAGEN Cell Ontology.
Figure 6. The QIAGEN Cell Ontology browser aids selection of cell types when performing manual curation. The ontology is also supported in the pre-trained classifier.
The plot editor supports expression analysis and manual annotation, and offers many visualization options.
Manual annotation can be performed with just a few clicks by using the Lasso tool. Comprehensive filtering and selection options make it easy to select the correct cells.
Differential gene expression between pairs of selected clusters, or between a cluster and the rest of the cells, can be quickly launched from the UMAP or tSNE plot editor. The expression of the resulting genes can be visualized in various expression plots (volcano plot, heat map, dot plot, violin plot). Differential gene expression results can be used for GO analysis, which can guide additional manual cluster annotation. Pathway analysis can be performed by uploading differential gene expression results to QIAGEN Ingenuity Pathway Analysis (IPA).
Figure 7. Heat Map, Dot Plot and Violin Plot of data where cell types were predicted using our pre-trained classifier.
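As a simplified view of a "cluster versus the rest" comparison, the sketch below computes a log2 fold change of mean expression per gene. It covers only the effect size and no statistical test; the function name, the pseudocount, and the toy values are assumptions and do not describe the statistical model used by the module.

```python
import math
from statistics import mean
from typing import Dict, List


def log2_fold_changes(expr: Dict[str, List[float]], labels: List[str],
                      cluster: str, pseudocount: float = 1.0) -> Dict[str, float]:
    """log2 ratio of mean expression inside `cluster` versus all other cells."""
    in_idx = [i for i, label in enumerate(labels) if label == cluster]
    out_idx = [i for i, label in enumerate(labels) if label != cluster]
    result = {}
    for gene, values in expr.items():
        mean_in = mean(values[i] for i in in_idx)
        mean_out = mean(values[i] for i in out_idx)
        # The pseudocount avoids division by zero for unexpressed genes.
        result[gene] = math.log2((mean_in + pseudocount) / (mean_out + pseudocount))
    return result


# Toy expression values: one list entry per cell, in the same order as `labels`.
expr = {"CD3E": [7.0, 7.0, 1.0, 1.0], "ALB": [0.0, 0.0, 3.0, 3.0]}
labels = ["T", "T", "Hep", "Hep"]
lfc = log2_fold_changes(expr, labels, "T")
print(lfc)
```

A volcano plot combines such fold changes with p-values from a statistical test, which this sketch leaves out.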
QIAGEN CLC Single Cell Analysis Module offers the analysis of clonotypes at the cell level. Clonotypes are predicted from de novo assembled VDJ regions.
Chains are identified and the V, J and CDR3 regions are annotated. Filters on different criteria can be applied, e.g., filter to cells present in the scRNA-seq data, remove non-productive clonotypes, keep only TRA and TRB chains, etc.
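The filters mentioned above can be pictured as simple predicates over clonotype records. The `Clonotype` record, its fields, and `filter_clonotypes` are hypothetical names for this sketch, and the gene names and CDR3 sequences are made-up example values.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set


@dataclass(frozen=True)
class Clonotype:
    barcode: str      # cell barcode the chain was assembled from
    chain: str        # e.g. "TRA", "TRB", "TRG", "TRD"
    v_gene: str
    j_gene: str
    cdr3: str
    productive: bool


def filter_clonotypes(clonotypes: Iterable[Clonotype], rna_barcodes: Set[str],
                      chains: FrozenSet[str] = frozenset({"TRA", "TRB"})) -> List[Clonotype]:
    """Keep productive clonotypes of the chosen chains from cells seen in scRNA-seq."""
    return [c for c in clonotypes
            if c.productive and c.chain in chains and c.barcode in rna_barcodes]


clonotypes = [
    Clonotype("cell_1", "TRB", "TRBV7-2", "TRBJ2-1", "CASSLGF", True),
    Clonotype("cell_1", "TRA", "TRAV12-1", "TRAJ20", "CAVRDF", False),   # non-productive
    Clonotype("cell_2", "TRG", "TRGV9", "TRGJP", "CALWEVF", True),       # chain not kept
    Clonotype("cell_9", "TRB", "TRBV5-1", "TRBJ1-1", "CASSPDF", True),   # no scRNA-seq cell
]
kept = filter_clonotypes(clonotypes, rna_barcodes={"cell_1", "cell_2"})
print([(c.barcode, c.chain) for c in kept])
```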
Clonotypes can also be combined and compared across sample/treatment and other groupings.
Reports summarize the identified clonotypes, with information about diversity, V and J gene usage, CDR3 length distribution and clonotype frequency.
Figure 8. Plots from the immune repertoire reports.
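Summary statistics of this kind can be sketched in a few lines. The Shannon index is used here as one common diversity measure; whether the reports use this or another measure is not stated above, so treat the choice, the function names, and the made-up CDR3 sequences as assumptions.

```python
import math
from collections import Counter
from typing import Dict, List


def clonotype_frequencies(cdr3s: List[str]) -> Dict[str, float]:
    """Fraction of cells carrying each CDR3 sequence."""
    counts = Counter(cdr3s)
    total = sum(counts.values())
    return {cdr3: n / total for cdr3, n in counts.items()}


def shannon_diversity(freqs: Dict[str, float]) -> float:
    """Shannon index (natural log) of a frequency distribution."""
    return -sum(p * math.log(p) for p in freqs.values() if p > 0)


def cdr3_length_distribution(cdr3s: List[str]) -> Counter:
    """Number of cells per CDR3 length."""
    return Counter(len(s) for s in cdr3s)


# One CDR3 amino-acid sequence per cell (illustrative values).
cdr3s = ["CASSLGF", "CASSLGF", "CASSPDRGYEQYF", "CAVRDF"]
freqs = clonotype_frequencies(cdr3s)
diversity = shannon_diversity(freqs)
lengths = cdr3_length_distribution(cdr3s)
print(freqs, diversity, lengths)
```

A repertoire dominated by a single expanded clonotype gives a value near zero, while many equally frequent clonotypes give a high value.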
Finally, the clonotypes can be converted to cell annotations and overlaid in UMAP and tSNE plots.
Figure 9. Top: UMAP plot with cells colored by the predicted V gene for the TRB chain. Clonotyped cells for which the TRB clonotype could not be identified are annotated as “None”. Bottom: UMAP plot with the same cells, colored by the predicted cell type. Note that, as expected, only the T cells have matching clonotypes.
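Converting clonotypes to a per-cell annotation can be thought of as a lookup keyed by cell barcode. In the sketch below, clonotyped cells without a TRB chain receive the label "None", mirroring the figure, and cells with no clonotype at all stay unannotated. The data layout and the function name are assumptions made for the example.

```python
from typing import Dict, List


def trb_v_gene_annotation(cell_barcodes: List[str],
                          v_genes_by_cell: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Map each clonotyped cell to its TRB V gene, or "None" if TRB is missing."""
    annotation = {}
    for barcode in cell_barcodes:
        chains = v_genes_by_cell.get(barcode)
        if chains is None:
            continue  # cell has no clonotype; leave it unannotated
        annotation[barcode] = chains.get("TRB", "None")
    return annotation


# Hypothetical V genes per chain for each clonotyped cell.
v_genes_by_cell = {
    "cell_1": {"TRA": "TRAV12-1", "TRB": "TRBV7-2"},
    "cell_2": {"TRA": "TRAV1-2"},  # TRB chain not identified
}
cells_in_plot = ["cell_1", "cell_2", "cell_3"]
annotation = trb_v_gene_annotation(cells_in_plot, v_genes_by_cell)
print(annotation)
```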
The toolbox offers prebuilt workflows for easy execution starting from FASTQ files, clonotypes and/or an expression matrix. The workflows are designed to perform most of the possible steps in the analysis.
Figure 10. A workflow example starting from an expression matrix. This workflow produces a UMAP plot annotated with automatically predicted cell types and clusters. The workflow also outputs a Dot Plot, Heat Map, and Violin Plot showing the expression of highly variable genes in each cell.
We frequently release updates and improvements, such as bug fixes or new features. To get a complete overview, please visit the latest improvements page.