CLC Single Cell Analysis Module (commercial plugin)
QIAGEN CLC Single Cell Analysis Module provides tools and workflows for processing single-cell data. Analyses of scRNA-seq, scVDJ-seq (scTCR-seq and scBCR-seq) and scATAC-seq data are supported.
Analyses can be performed from raw FASTQ or from imported matrices and/or clonotypes.
The module is a part of QIAGEN CLC Genomics Premium, our complete, full-feature package for ‘omics data analysis.
Watch this webinar to learn how to use QIAGEN CLC Single Cell Analysis Module for single-cell gene expression analysis starting from either FASTQ or expression data, and try these tutorials on training a cell type classifier or performing velocity analysis.
Algorithms in QIAGEN CLC Single Cell Analysis Module have been implemented to scale to large datasets and run on a wide range of hardware. Read more about the implemented algorithms and options in the manual.
Figure 1. The tools in Single Cell Analysis Module. The tools provide capabilities to analyze scRNA-seq, scVDJ-seq (scTCR-seq and scBCR-seq) and scATAC-seq data.
Single Cell Analysis Importers
Several importers are provided for different expression and peak matrix formats, clonotypes and accompanying cell and cluster annotations.
Figure 2. Single-cell data importers available through the QIAGEN CLC Single Cell Analysis Module.
QIAGEN CLC Single Cell Analysis Module supports scRNA-seq data analysis.
Creating the expression data matrix
Two steps are needed to create the expression data matrix: 1) Annotating reads with cell barcode and Unique Molecular Index (UMI) information, and 2) mapping the reads to a reference.
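As a minimal sketch of step 1, the snippet below splits a cell barcode and UMI off the start of a read and attaches them to the read name. The 16-base barcode, the 12-base UMI, the function name and the tag format are all assumptions made for illustration. Real layouts depend on the library chemistry, and this is not the module's implementation.

```python
from typing import NamedTuple

# Assumed layout (hypothetical): barcode first, then UMI, at the start of read 1.
BARCODE_LEN = 16
UMI_LEN = 12


class AnnotatedRead(NamedTuple):
    name: str
    barcode: str
    umi: str


def annotate_read(name: str, read1_seq: str) -> AnnotatedRead:
    """Extract cell barcode and UMI from read 1 and tag the read name."""
    needed = BARCODE_LEN + UMI_LEN
    if len(read1_seq) < needed:
        raise ValueError(f"read 1 is shorter than {needed} bases")
    barcode = read1_seq[:BARCODE_LEN]
    umi = read1_seq[BARCODE_LEN:needed]
    # The tag format in the read name is an illustrative convention only.
    return AnnotatedRead(f"{name}_CB:{barcode}_UMI:{umi}", barcode, umi)


read = annotate_read("read_001", "ACGTACGTACGTACGT" + "TTTTGGGGCCCC" + "AAAA")
print(read.barcode, read.umi)
```

Once every read carries its barcode and UMI, reads that share a barcode, UMI and gene can be collapsed to a single molecule during counting.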
Read mapping and counting uses a specialized version of the RNA-seq analysis tool from QIAGEN CLC Genomics Workbench. This tool has achieved top ratings in independent benchmarks (Baruzzo et al., 2017; Donato et al., 2021).
The approach maps reads to the transcriptome, genome and provided spike-ins. Mapping to the genome in addition to the transcriptome removes noise. Multi-mapping reads are placed using an expectation-maximization approach. A comprehensive report includes the types of features in the data (mRNA, lncRNA, etc.) and the correlation of expressions with known spike-in concentrations.
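The idea behind expectation-maximization placement of multi-mapping reads can be illustrated with a deliberately simplified sketch. It ignores transcript length, mapping quality and everything else a production tool would model, and the function name and input format are assumptions, not the module's API.

```python
from collections import defaultdict


def em_assign(reads, n_iter=50):
    """Estimate expected read counts per transcript with a basic EM loop.

    reads: list of sets, each holding the transcripts one read is compatible with.
    Transcript length and mapping quality are ignored in this simplified sketch.
    """
    transcripts = sorted(set().union(*reads))
    # Start from a uniform abundance estimate.
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    counts = {}
    for _ in range(n_iter):
        counts = defaultdict(float)
        # E-step: split each read across its candidates in proportion to theta.
        for candidates in reads:
            total = sum(theta[t] for t in candidates)
            for t in candidates:
                counts[t] += theta[t] / total
        # M-step: re-estimate abundances from the fractional counts.
        theta = {t: counts[t] / len(reads) for t in transcripts}
    return {t: counts[t] for t in transcripts}


# Two reads map uniquely to transcript A; one read maps to both A and B.
reads = [{"A"}, {"A"}, {"A", "B"}]
expected_counts = em_assign(reads)
print(expected_counts)
```

In this toy case the ambiguous read is pulled towards transcript A, because the uniquely mapping reads provide evidence that A is expressed while nothing supports B on its own.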
The first part of the analysis includes quality control and normalization steps.
Empty droplet detection, as well as doublet removal, is recommended for droplet-based approaches such as 10x Genomics.
Quality Control (QC) steps allow setting thresholds for various criteria for consideration of high-quality cells in downstream analysis. Some of the plots produced are shown below.
Figure 3. Plots from the quality control report.
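Threshold-based filtering of this kind can be sketched as follows. The metrics chosen, the field names and the threshold values are hypothetical examples. Suitable values depend on the protocol and tissue, and the criteria offered by the module may differ.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    barcode: str
    total_counts: int    # number of counts assigned to the cell
    genes_detected: int  # number of genes with at least one count
    mito_counts: int     # counts from mitochondrial genes


def passes_qc(cell, min_counts=500, min_genes=200, max_mito_fraction=0.2):
    """Return True if the cell meets all (hypothetical) QC thresholds."""
    if cell.total_counts == 0:
        return False
    if cell.total_counts < min_counts or cell.genes_detected < min_genes:
        return False
    mito_fraction = cell.mito_counts / cell.total_counts
    return mito_fraction <= max_mito_fraction


cells = [
    CellQC("cell_a", total_counts=5000, genes_detected=1800, mito_counts=250),
    CellQC("cell_b", total_counts=300, genes_detected=150, mito_counts=10),
    CellQC("cell_c", total_counts=4000, genes_detected=1500, mito_counts=2000),
]
kept = [cell.barcode for cell in cells if passes_qc(cell)]
print(kept)
```

Here cell_b fails on low counts and cell_c on a high mitochondrial fraction, so only cell_a is kept for downstream analysis.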
Normalization of data affects all downstream analyses. The implementation can remove batch effects.
Figure 4. Batch correction: Several clusters are observed for each of two samples. After batch correction, clusters contain a mixture of both samples.
RNA velocity is a powerful method for analyzing time-resolved phenomena, such as embryogenesis or tissue regeneration. Velocity analysis is performed automatically when starting an analysis from FASTQ, but requires spliced and unspliced counts to be present when starting from an imported matrix. The produced velocity matrix is used for scoring velocity genes and creating phase portraits. The matrix, together with cell clusters and/or annotations, aids the analysis of per-gene contributions to cell transitions.
Figure 5. Top: Browsable phase portrait with learned dynamics and steady-state ratio. There is one plot for each gene. Phase portraits can be overlaid with cluster information, cell annotations and data from gene expression, velocity and peak matrices. Here, the cells are colored by cell type. Bottom: UMAP showing velocity arrows, where cells are colored by the inferred latent time.
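To make the steady-state ratio concrete, the sketch below uses the simplest steady-state formulation of RNA velocity for one gene: the ratio is fitted as the slope of unspliced versus spliced counts through the origin, and the velocity of each cell is its residual from that line. This is an illustrative simplification with assumed function names. The module learns gene dynamics, and its actual model is described in the manual.

```python
def steady_state_ratio(spliced, unspliced):
    """Least-squares slope through the origin of unspliced vs. spliced counts."""
    denom = sum(s * s for s in spliced)
    if denom == 0:
        raise ValueError("no spliced counts for this gene")
    return sum(u * s for u, s in zip(unspliced, spliced)) / denom


def velocities(spliced, unspliced):
    """Per-cell velocity for one gene: unspliced minus ratio times spliced."""
    gamma = steady_state_ratio(spliced, unspliced)
    return [u - gamma * s for u, s in zip(unspliced, spliced)]


# Toy counts for one gene in three cells.
spliced = [2.0, 4.0, 4.0]
unspliced = [1.0, 3.0, 1.0]
gamma = steady_state_ratio(spliced, unspliced)
v = velocities(spliced, unspliced)
print(gamma, v)
```

Cells above the steady-state line in the phase portrait have positive velocity (the gene is being induced), and cells below it have negative velocity (the gene is being repressed).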
UMAP and tSNE are de facto standards for visualizing single-cell expression data. Our interactive 2D and 3D visualizations can be overlaid with cluster information, cell annotations and data from gene expression, velocity and peak matrices.
Figure 6. UMAP of single-cell data from 8393 liver cells (MacParland et al., 2018), colored by predicted cell types.
Manual annotation can be performed with just a few clicks by using the Lasso tool. Comprehensive filtering and selection options make it easy to select the correct cells.
Clustering is performed using the graph-based Leiden algorithm.
Cell-type prediction is traditionally performed on clusters of cells. However, this has the disadvantage that errors in clustering, or simply too coarsely grained clusters, can lead to imprecise annotations. QIAGEN CLC Single Cell Analysis Module provides a cell classifier that annotates individual cells. The classifier has been trained on large single-cell projects from human and mouse experiments, annotated with the QIAGEN Cell Ontology.
Figure 7. The QIAGEN Cell Ontology browser aids selection of cell types when performing manual curation. The ontology is also supported in the pre-trained classifier.
The plot editor enables expression analysis and manual annotation. It has many visualization options.
Differential gene expression between pairs of selected clusters, or between a cluster and the rest of the cells, can be launched quickly from the UMAP or tSNE plot editor. The expression of these genes can then be visualized in various expression plots (volcano plot, heat map, dot plot, violin plot). Differential gene expression results can be used for GO analysis, which can guide additional manual cluster annotation. Pathway analysis can be performed by uploading differential gene expression results to QIAGEN Ingenuity Pathway Analysis (IPA).
Figure 8. Heat Map, Dot Plot and Violin Plot of data where cell types were predicted using our pre-trained classifier.
QIAGEN CLC Single Cell Analysis Module offers analysis of clonotypes at the cellular level from scVDJ-seq (scTCR-seq and scBCR-seq) data. Clonotypes are predicted from de novo assembled contigs.
Chains are identified and the V, D, J, C and CDR3 regions are annotated. The predicted clonotypes can be visualized as alignments between the assembled contigs and the annotated regions. Sankey plots show how the V, D, J, C and CDR3 regions form the clonotypes for the different chains.
Filters on different criteria can be applied, e.g., filter to cells for which scRNA-seq data is available, remove non-productive clonotypes, keep only specific chains, etc.
Clonotypes can also be combined and compared across sample/treatment and other groupings.
Reports summarize the identified clonotypes, with information about diversity, V, D, J and C gene usage, CDR3 length distribution and clonotype frequency.
Figure 9. Plots from the immune repertoire reports for scTCR-seq data.
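As one example of a diversity measure, the sketch below computes the Shannon index from a list of per-cell clonotype labels. Which diversity measures the reports actually contain is not stated here, so the choice of index, the function name and the input format are assumptions.

```python
import math
from collections import Counter


def shannon_diversity(clonotypes):
    """Shannon index (natural log) of clonotype frequencies."""
    counts = Counter(clonotypes)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())


# One dominant clonotype versus an even repertoire (toy labels).
expanded = ["clone_1"] * 8 + ["clone_2", "clone_3"]
even = ["clone_1", "clone_2", "clone_3", "clone_4"]
print(shannon_diversity(expanded), shannon_diversity(even))
```

A repertoire dominated by a single expanded clonotype scores lower than one in which the clonotypes are evenly represented.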
Finally, the clonotypes can be converted to cell annotations and overlaid in UMAP and tSNE plots.
Figure 10. UMAP of scRNA-seq data. Top: Cells are colored by the predicted V gene for the TRB chain from matched scTCR-seq data. Clonotyped cells for which the TRB clonotype could not be identified are annotated as “None”. Bottom: Cells are colored by the predicted cell type. Note that, as expected, only the T cells have matching clonotypes.
QIAGEN CLC Single Cell Analysis Module supports analysis of scATAC-seq data. The analysis pipeline consists of tools for performing deduplication, mapping and peak calling. In addition, nearby genes are identified, transcription factor (TF) motif scans are conducted, and a comprehensive QC report is produced. Read mappings and graph tracks can be split by groups of cells and visualized together in a genome browser. It is also possible to generate UMAP and tSNE plots from the peak matrix.
Figure 11. Peak count analysis showing genome browser view, the UMAP plot calculated on the count matrix, and tables containing information on TF motif binding and nearby genes.
The toolbox offers prebuilt workflows for easy execution from either FASTQ or clonotypes and/or matrices.
Figure 12. The workflows available in QIAGEN CLC Single Cell Analysis Module.
The workflows are designed to perform most of the possible steps in the analysis.
Figure 13. An example of a workflow starting from an expression matrix. This workflow produces a UMAP plot annotated with automatically predicted cell types and clusters. The workflow also outputs a Dot Plot, Heat Map, and Violin Plot showing the expression of highly variable genes in each cell. If spliced and unspliced counts are available, the workflow also produces velocity estimates.
We frequently release updates and improvements, such as new features or bug fixes. To get a complete overview, please visit the latest improvements page.