CLC Single Cell Analysis Module (commercial plugin)
QIAGEN CLC Single Cell Analysis Module enables analysis from raw FASTQ files to clusters of cells with annotated cell types and differentially expressed genes. It is part of our QIAGEN CLC Genomics Premium offering, our complete, full-featured package for ‘omics data analysis.
The algorithms have been implemented to scale to large datasets and run on a wide range of hardware. You can read more about the implemented algorithms and options in the manual.
Figure 1. The Single Cell folder, including the tools needed to analyze scRNA-seq data from FASTQ to expression and cell type prediction.
We also provide several importers for different expression matrix formats, as well as accompanying cell and cluster annotations.
Figure 2. Expression matrix importers available in QIAGEN CLC Single Cell Analysis Module.
Two steps are needed to create the expression data matrix: 1) annotating the reads with cell and UMI information, and 2) mapping the reads.
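As an illustration of the first step, here is a minimal sketch of attaching cell and UMI information to a read pair. It is not the module's implementation. The read layout (a 16-base cell barcode followed by a 12-base UMI at the start of read 1) and the names `annotate_read` and `AnnotatedRead` are assumptions made for this example only.

```python
from typing import NamedTuple, Optional

# Assumed layout (not stated in the text): read 1 starts with a 16-base
# cell barcode followed by a 12-base UMI; read 2 carries the cDNA.
BARCODE_LEN = 16
UMI_LEN = 12


class AnnotatedRead(NamedTuple):
    name: str
    barcode: str
    umi: str
    cdna: str


def annotate_read(name: str, read1: str, read2: str) -> Optional[AnnotatedRead]:
    """Attach the barcode and UMI found in read 1 to the cDNA in read 2.

    Returns None when read 1 is too short or has an ambiguous base (N)
    in the barcode/UMI region.
    """
    tag = read1[: BARCODE_LEN + UMI_LEN]
    if len(tag) < BARCODE_LEN + UMI_LEN or "N" in tag:
        return None
    return AnnotatedRead(name, tag[:BARCODE_LEN], tag[BARCODE_LEN:], read2)
```

Reads that return None would be dropped before mapping. A production tool would typically also correct barcodes against a list of valid barcodes, which this sketch omits.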
Read mapping and counting uses a specialized version of the RNA-seq analysis tool from QIAGEN CLC Genomics Workbench. A previous version of this tool was among the best performers on an independent benchmark (Baruzzo et al., 2017).
The approach maps reads to the transcriptome, the genome and any provided spike-ins. Mapping to the genome in addition to the transcriptome removes noise. Multi-mapping reads are placed using an expectation-maximization approach. A comprehensive report includes the types of features in the data (mRNA, lncRNA, etc.) and the correlation of expression values with known spike-in concentrations.
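To show the idea behind expectation-maximization for multi-mapping reads, here is a simplified sketch. It is a generic textbook-style EM, not the algorithm used in the tool: it ignores transcript length and mapping quality, and the function name `em_abundances` and its parameters are hypothetical.

```python
def em_abundances(read_hits, n_transcripts, n_iter=100):
    """Estimate relative transcript abundances from ambiguous read hits.

    read_hits: list of lists; each inner list holds the indices of the
    transcripts a read maps to equally well. At least one read with at
    least one hit is assumed.
    Returns a list of abundances that sums to 1.
    """
    # Start from a uniform guess.
    theta = [1.0 / n_transcripts] * n_transcripts
    for _ in range(n_iter):
        counts = [0.0] * n_transcripts
        # E-step: split each read across its hits in proportion to theta.
        for hits in read_hits:
            total = sum(theta[t] for t in hits)
            for t in hits:
                counts[t] += theta[t] / total
        # M-step: the new abundances are the normalized fractional counts.
        n = sum(counts)
        theta = [c / n for c in counts]
    return theta
```

With three reads unique to transcript 0, one read unique to transcript 1 and four reads that map to both, the ambiguous reads end up assigned mostly to transcript 0, because the unique reads indicate it is the more abundant of the two.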
The first part of the analysis includes QC and normalization.
Empty droplet detection is recommended for droplet-based approaches such as 10x Genomics.
Quality Control (QC) steps let you set thresholds on various criteria that determine which cells are kept for downstream analysis. Some of the plots produced are shown below.
Figure 3. Plots from the quality control report.
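Threshold-based cell filtering of the kind described above can be sketched as follows. The three criteria (total counts, number of detected genes, mitochondrial fraction) and all default values are assumptions chosen for illustration; they are not the module's criteria or defaults, and `passes_qc` is a hypothetical name.

```python
def passes_qc(counts, mito_genes, min_counts=500, min_genes=200,
              max_mito_fraction=0.2):
    """Decide whether one cell passes simple QC thresholds.

    counts: dict mapping gene name -> UMI count for a single cell.
    mito_genes: set of gene names treated as mitochondrial.
    All thresholds are illustrative values, not recommendations.
    """
    total = sum(counts.values())
    n_genes = sum(1 for c in counts.values() if c > 0)
    # A cell with no counts can never pass, whatever the thresholds.
    if total == 0 or total < min_counts or n_genes < min_genes:
        return False
    mito = sum(c for g, c in counts.items() if g in mito_genes)
    return mito / total <= max_mito_fraction
```

In practice the thresholds would be chosen by inspecting the distributions shown in the QC report rather than fixed in advance.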
Normalization of data affects all downstream analyses. The implementation is well suited for removing batch effects.
Figure 4. Batch correction: Before correction, several separate clusters are observed for each of two samples. After batch correction, clusters contain a mixture of both samples.
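For readers unfamiliar with normalization of count data, the sketch below shows one common, generic scheme: scale each cell to a fixed total and apply a log transform. The text does not say which method the module uses, so this should be read as an example of the concept only. It does not perform batch correction, and `normalize_cell` and `target_sum` are hypothetical names.

```python
import math


def normalize_cell(counts, target_sum=10_000):
    """Library-size normalize one cell and apply log(1 + x).

    counts: dict mapping gene name -> raw count for a single cell.
    target_sum: total that every cell is scaled to (illustrative value).
    """
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {
        gene: math.log1p(c / total * target_sum)
        for gene, c in counts.items()
    }
```

Two cells with the same relative expression but different sequencing depth receive the same normalized values, which is the property that makes cells comparable in downstream steps.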
UMAP and tSNE are de facto standards for visualizing single-cell expression data. Our interactive 2D and 3D visualizations can be overlaid with cluster information, cell annotations and gene expression.
Figure 5. UMAP of single-cell data from 1000 PBMCs (2), colored by predicted cell types.
Figure 6. tSNE plot of more than 1 million neurons (3) colored by intersection of marker genes for neural stem cells.
Clustering is performed using the graph-based Leiden algorithm.
Cell-type prediction is traditionally performed on clusters of cells. This has the disadvantage that errors in clustering, or clustering that is simply too coarse-grained, can lead to imprecise annotations. QIAGEN CLC Single Cell Analysis Module provides a cell classifier that annotates individual cells. The classifier has been trained on large single-cell projects from human and mouse, classified according to the QIAGEN Cell Ontology.
Figure 7. The QIAGEN Cell Ontology browser aids selection of cell types when performing manual curation. The ontology is also supported in the pre-trained classifier.
The plot editor offers expression analysis tools, tools for manual annotation and many visualization options.
Manual cell type annotation can be performed with just a few clicks by using the Lasso tool. Comprehensive filtering and selection options make it easy to select the correct cells.
Differential gene expression between pairs of selected clusters or between a cluster and the rest of the cells can be quickly revealed using the UMAP or tSNE plot editor and visualized in the form of various expression plots (volcano plot, heat map, dot plot, etc.). Differential gene expression can be used for GO analysis, helping to guide additional manual cluster annotation. Pathway analysis can be performed by uploading differential gene expression results to QIAGEN Ingenuity Pathway Analysis (IPA).
Figure 8. Dot plot of data where cell types were predicted using our pre-trained classifier.
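The comparison of a cluster against the rest of the cells can be illustrated with a simple effect-size calculation. The sketch below computes a log2 fold change of mean expression per gene. It is only an illustration under stated assumptions: it performs no statistical test, it is not the model used by the module, and `log2_fold_changes` and `pseudocount` are hypothetical names.

```python
import math


def log2_fold_changes(expr, in_cluster, pseudocount=1.0):
    """Log2 fold change of mean expression, cluster versus all other cells.

    expr: dict mapping gene name -> list of per-cell expression values.
    in_cluster: list of booleans, True for cells in the selected cluster.
    pseudocount: added to both means to avoid division by zero.
    """
    n_in = sum(in_cluster)
    n_out = len(in_cluster) - n_in
    if n_in == 0 or n_out == 0:
        raise ValueError("both groups need at least one cell")
    result = {}
    for gene, values in expr.items():
        mean_in = sum(v for v, m in zip(values, in_cluster) if m) / n_in
        mean_out = sum(v for v, m in zip(values, in_cluster) if not m) / n_out
        result[gene] = math.log2((mean_in + pseudocount) / (mean_out + pseudocount))
    return result
```

Positive values indicate higher mean expression inside the cluster. A volcano plot would pair such fold changes with p-values from a proper statistical test.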
The toolbox offers prebuilt workflows for easy execution from either FASTQ or an expression matrix. Each workflow is designed to perform most of the possible steps in the analysis.
Figure 9. A workflow example starting from an expression matrix. This workflow produces UMAP plots annotated with automatically predicted cell types and clusters. The workflow also outputs a Dot Plot and Heat Map showing the expression of highly variable genes in each cell type and each cluster.