CLC Single Cell Analysis Module (commercial plugin)
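QIAGEN CLC Single Cell Analysis Module provides tools and workflows for processing single-cell data. The following analyses are supported: gene expression (scRNA-seq), immune repertoire (scVDJ-seq), chromatin accessibility (scATAC-seq) and hashtag analysis.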
Analyses can be performed from raw FASTQ or from imported matrices and/or clonotypes.
The module is part of QIAGEN CLC Genomics Premium, our complete, full-featured package for ’omics data analysis.
Watch this webinar to learn how to use QIAGEN CLC Single Cell Analysis Module for single-cell gene expression analysis starting from either FASTQ or expression data, and try these tutorials on training a cell type classifier or performing velocity analysis.
Algorithms in QIAGEN CLC Single Cell Analysis Module have been implemented to scale to large datasets and run on a wide range of hardware. Read more about the implemented algorithms and options in the manual.
Figure 1. The tools in Single Cell Analysis Module. The tools provide capabilities for gene expression, immune repertoire, chromatin accessibility and hashtag analysis.
Single Cell Analysis Importers
Several importers are provided for different expression and peak matrix formats, clonotypes and accompanying cell annotations and clusters.
Figure 2. Single-cell data importers available through the QIAGEN CLC Single Cell Analysis Module.
Preparing the reads
The first step in any single-cell analysis that starts from reads is to annotate the reads with the cell barcode and, optionally, the Unique Molecular Index (UMI) and hashtag. The annotated reads can then be processed further, as described below, depending on the type of data.
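To illustrate what read annotation involves, here is a minimal sketch that assumes a 10x Genomics Chromium v3-style layout, in which the first 16 bases of read 1 are the cell barcode and the next 12 bases are the UMI, and that assumes a whitelist of valid barcodes is available. The layout, the whitelist and the name `annotate_read` are assumptions for illustration only, not the module's implementation.

```python
from typing import NamedTuple, Optional


class ReadAnnotation(NamedTuple):
    barcode: str
    umi: str


# Assumed read-1 layout (10x Chromium v3-style): 16 nt cell barcode, then 12 nt UMI.
BARCODE_LEN = 16
UMI_LEN = 12


def annotate_read(read1_seq: str, whitelist: set) -> Optional[ReadAnnotation]:
    """Split read 1 into cell barcode and UMI; return None if the read cannot be annotated."""
    if len(read1_seq) < BARCODE_LEN + UMI_LEN:
        return None  # too short to contain both tags
    barcode = read1_seq[:BARCODE_LEN]
    umi = read1_seq[BARCODE_LEN:BARCODE_LEN + UMI_LEN]
    if barcode not in whitelist or "N" in umi:
        return None  # unknown barcode or ambiguous UMI base
    return ReadAnnotation(barcode, umi)


whitelist = {"AAACCCAAGAAACACT"}
read1 = "AAACCCAAGAAACACT" + "GTCAGTCAGTCA" + "TTTTTTTTTT"
print(annotate_read(read1, whitelist))
# ReadAnnotation(barcode='AAACCCAAGAAACACT', umi='GTCAGTCAGTCA')
```

Real pipelines typically also correct barcodes that are one mismatch away from a whitelisted barcode; that step is left out here for brevity.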
QIAGEN CLC Single Cell Analysis Module supports scRNA-seq data analysis.
Creating the expression data matrix
The annotated reads are mapped to a reference. Read mapping and counting uses a specialized version of the RNA-seq analysis tool from QIAGEN CLC Genomics Workbench. This tool has achieved top ratings in independent benchmarks (Baruzzo et al., 2017; Donato et al., 2021).
The approach maps the reads to the transcriptome, the genome and any provided spike-ins. Mapping to the genome in addition to the transcriptome reduces noise. Multi-mapping reads are placed using an expectation-maximization approach. A comprehensive report includes the types of features in the data (mRNA, lncRNA, etc.) and the correlation of expression values with known spike-in concentrations.
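The general idea behind placing multi-mapping reads with expectation-maximization can be sketched as follows. This is a generic textbook version under simplifying assumptions (no transcript-length correction, each read given only as the set of features it maps to); it is not the module's actual algorithm, and `em_abundances` is a hypothetical name.

```python
def em_abundances(read_compat, n_iter=100, tol=1e-8):
    """Estimate expected read counts per feature when some reads map to several features.

    read_compat: list of tuples; each tuple lists the features (e.g. transcript IDs) a read maps to.
    Returns a dict mapping each feature to its estimated expected read count.
    """
    features = sorted({f for hits in read_compat for f in hits})
    theta = {f: 1.0 / len(features) for f in features}  # start from uniform abundances
    n_reads = len(read_compat)
    for _ in range(n_iter):
        counts = {f: 0.0 for f in features}
        # E-step: split each read across its hits in proportion to current abundances.
        for hits in read_compat:
            total = sum(theta[f] for f in hits)
            for f in hits:
                counts[f] += theta[f] / total
        # M-step: new abundances are the expected fraction of reads per feature.
        new_theta = {f: counts[f] / n_reads for f in features}
        delta = max(abs(new_theta[f] - theta[f]) for f in features)
        theta = new_theta
        if delta < tol:
            break
    return {f: theta[f] * n_reads for f in features}


# 3 reads unique to A, 1 unique to B, 4 ambiguous between A and B.
reads = [("A",)] * 3 + [("B",)] + [("A", "B")] * 4
print({f: round(c, 3) for f, c in em_abundances(reads).items()})
# {'A': 6.0, 'B': 2.0}
```

The ambiguous reads end up split 3:1 in favor of A, because the uniquely mapped reads indicate that A is three times as abundant as B.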
The first part of the analysis includes quality control and normalization steps.
Empty droplet detection, as well as doublet removal, is recommended for droplet-based approaches such as 10x Genomics.
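The page does not say which method the module uses for empty droplet detection. As a hedged illustration of the problem only, the sketch below applies a simple count-threshold heuristic: barcodes are called as cells when their total UMI count reaches a fraction of a robust maximum taken among the top expected number of barcodes. The function `call_cells` and its parameters are hypothetical, and doublet removal is not covered.

```python
def call_cells(umi_totals, expected_cells, quantile=0.99, fraction=0.1):
    """Flag barcodes as cells when their UMI total reaches a fraction of a robust maximum.

    umi_totals: dict barcode -> total UMI count.
    expected_cells: rough expected number of cells (an assumption supplied by the user).
    """
    ranked = sorted(umi_totals.values(), reverse=True)[:expected_cells]
    if not ranked:
        return set()
    # Robust "maximum": the value at the given quantile among the top expected_cells barcodes.
    idx = int(round((1 - quantile) * (len(ranked) - 1)))
    threshold = fraction * ranked[idx]
    return {bc for bc, total in umi_totals.items() if total >= threshold}


totals = {"c1": 5000, "c2": 4000, "c3": 3000, "e1": 50, "e2": 10}
print(sorted(call_cells(totals, expected_cells=3)))
# ['c1', 'c2', 'c3']
```

A fixed threshold like this is easy to reason about but can miss small cells with low RNA content; statistical tests that compare each barcode with an ambient RNA profile are a common alternative.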
Quality control (QC) steps let you set thresholds on various criteria that determine which cells are considered high quality and kept for downstream analysis. Some of the plots produced are shown below.
Figure 3. Plots from the quality control report.
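The page does not list the exact QC criteria. The sketch below assumes three commonly used per-cell metrics (total UMIs, number of detected genes and the fraction of UMIs from mitochondrial genes) and filters cells against thresholds. The metric names, the `CellQC` record and the default threshold values are hypothetical examples, not the module's defaults.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    barcode: str
    total_umis: int
    genes_detected: int
    mito_fraction: float  # fraction of the cell's UMIs from mitochondrial genes


def passes_qc(cell, min_umis=500, min_genes=200, max_mito=0.2):
    """Return True if a cell meets all (illustrative) quality thresholds."""
    return (cell.total_umis >= min_umis
            and cell.genes_detected >= min_genes
            and cell.mito_fraction <= max_mito)


cells = [
    CellQC("AAAC", 4200, 1500, 0.05),
    CellQC("AAAG", 300, 150, 0.02),   # too few UMIs and genes
    CellQC("AAAT", 3900, 1200, 0.45), # high mitochondrial fraction
]
kept = [c.barcode for c in cells if passes_qc(c)]
print(kept)
# ['AAAC']
```

In practice, suitable thresholds depend on the tissue and protocol, which is why the QC plots are useful for choosing them.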
Normalization affects all downstream analyses. The implementation can also remove batch effects.
Figure 4. Batch correction. Before batch correction, several clusters are observed for each sample. After batch correction, the clusters contain a mixture of both samples.
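The normalization method used by the module is not described here. As a stand-in illustrating why normalization matters, the sketch below applies the widely used library-size normalization followed by log(1 + x): each cell's counts are scaled to a common total so that cells sequenced to different depths become comparable. The function name and the scale factor of 10,000 are assumptions, and batch correction is not shown.

```python
import math


def log_normalize(counts, scale=10_000):
    """Library-size normalize one cell's counts and apply log(1 + x).

    counts: dict gene -> raw UMI count for a single cell.
    """
    total = sum(counts.values())
    if total == 0:
        return {g: 0.0 for g in counts}  # nothing to scale in an empty cell
    return {g: math.log1p(c / total * scale) for g, c in counts.items()}


shallow = {"CD3E": 10, "MS4A1": 0, "ALB": 90}
deep = {"CD3E": 100, "MS4A1": 0, "ALB": 900}
print(log_normalize(shallow) == log_normalize(deep))
# True
```

The two example cells differ tenfold in sequencing depth but have the same relative expression, so they get identical normalized values.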
RNA velocity is a powerful method for analyzing time-resolved phenomena, such as embryogenesis or tissue regeneration. Velocity analysis is performed automatically when starting an analysis from FASTQ. When starting from an imported matrix, it requires spliced and unspliced counts to be present. The produced velocity matrix is used for scoring velocity genes and creating phase portraits. The matrix, together with cell clusters and/or annotations, aids the analysis of per-gene contributions to the transitioning of cells.
Figure 5. Top: Browsable phase portrait with learned dynamics and steady-state ratio. There is one plot for each gene. Phase portraits can be overlaid with cell annotations and clusters, and data from gene expression, velocity and peak matrices. Here, the cells are colored by cell type. Bottom: UMAP showing velocity arrows, where cells are colored by the inferred latent time.
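The steady-state ratio shown in the phase portraits can be illustrated with the simplest velocity model, the steady-state model. For one gene, it fits a ratio gamma such that unspliced ≈ gamma × spliced and defines each cell's velocity as the residual u − gamma·s. This minimal sketch fits gamma on all cells by least squares through the origin, whereas published implementations usually fit on extreme quantiles. It does not reproduce the module's "learned dynamics", and the function name is hypothetical.

```python
def steady_state_velocity(unspliced, spliced):
    """Per-cell velocity for one gene under the steady-state model.

    Fits the steady-state ratio gamma (u ≈ gamma * s) by least squares through the origin,
    then returns gamma and v = u - gamma * s for each cell.
    """
    denom = sum(s * s for s in spliced)
    if denom == 0:
        return 0.0, [0.0 for _ in spliced]  # gene not expressed: no velocity signal
    gamma = sum(u * s for u, s in zip(unspliced, spliced)) / denom
    return gamma, [u - gamma * s for u, s in zip(unspliced, spliced)]


gamma, v = steady_state_velocity([2.0, 1.0], [1.0, 1.0])
print(gamma, v)
# 1.5 [0.5, -0.5]
```

A positive velocity (more unspliced RNA than expected at steady state) suggests that the gene is being induced in that cell, and a negative velocity suggests that it is being repressed.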
UMAP and tSNE are de facto standards for visualizing single-cell expression data. Our interactive 2D and 3D visualizations can be overlaid with cell annotations and clusters, and data from gene expression, velocity and peak matrices.
Figure 6. UMAP of single-cell data from 8393 liver cells (MacParland et al., 2018), colored by predicted cell types.
Manual annotation can be performed with just a few clicks using the Lasso tool. Comprehensive filtering and selection options make it easy to select the relevant cells. The plot editor supports expression analysis and manual annotation and offers many visualization options.
Clustering is performed using the graph-based Leiden algorithm.
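Leiden is a graph-based method, so it needs a graph of cells and a quality function to optimize. The page does not specify either one. This sketch assumes a k-nearest-neighbour (kNN) graph built from cell coordinates (for example, in a reduced-dimension space) and uses modularity as the quality function. It shows only these two ingredients. It does not implement Leiden's node-moving and refinement phases, and all names are hypothetical.

```python
import math
from collections import defaultdict


def knn_graph(points, k):
    """Build an undirected k-nearest-neighbour graph (Euclidean) as a set of edges (i, j), i < j."""
    edges = set()
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for _, j in dists[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def modularity(edges, labels):
    """Newman-Girvan modularity Q = sum_c [L_c / m - (d_c / 2m)^2] of a partition."""
    m = len(edges)
    degree = defaultdict(int)
    internal = defaultdict(int)  # number of edges inside each community
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
        if labels[i] == labels[j]:
            internal[labels[i]] += 1
    deg_sum = defaultdict(int)  # total degree of each community
    for node, d in degree.items():
        deg_sum[labels[node]] += d
    return sum(internal[c] / m - (deg_sum[c] / (2 * m)) ** 2 for c in deg_sum)


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
edges = knn_graph(points, k=2)
print(round(modularity(edges, [0, 0, 0, 1, 1, 1]), 3))
# 0.5
```

Leiden searches for a partition that scores highly on such a quality function. It also guarantees that the resulting communities are internally connected.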
Cell type prediction is traditionally performed on clusters of cells. However, errors in clustering, or clusters that are simply too coarse-grained, can then lead to imprecise annotations. QIAGEN CLC Single Cell Analysis Module provides a pre-trained classifier that annotates individual cells. The classifier has been trained on large single-cell projects from human and mouse experiments, annotated with the QIAGEN Cell Ontology.
Figure 7. The QIAGEN Cell Ontology browser aids selection of cell types when performing manual curation. The ontology is also supported in the pre-trained classifier.
Differential gene expression analysis between pairs of selected clusters, or between a cluster and the rest of the cells, can be launched quickly from the UMAP or tSNE plot editor. The expression of the resulting genes can be visualized in various expression plots (volcano plot, heat map, dot plot, violin plot). Differential expression results can be used for GO enrichment analysis, which can guide additional manual cluster annotation. Pathway analysis can be performed by uploading differential gene expression results to QIAGEN Ingenuity Pathway Analysis (IPA).
Figure 8. Heat map, dot plot and violin plot of data where cell types were predicted using our pre-trained classifier.
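The statistical model behind the module's differential expression test is not described here. The quantity most expression plots center on, however, is the fold change between a cluster and the rest of the cells. This sketch computes a log2 fold change of mean expression with a pseudocount. It does not compute p-values, and the function name and pseudocount are assumptions.

```python
import math


def log2_fold_changes(expr, in_cluster, pseudocount=1.0):
    """log2 fold change of mean expression: cluster versus all other cells.

    expr: dict gene -> list of per-cell expression values (same cell order for all genes).
    in_cluster: list of booleans marking the cells in the cluster of interest.
    """
    n_in = sum(in_cluster)
    n_out = len(in_cluster) - n_in
    if n_in == 0 or n_out == 0:
        raise ValueError("need cells both inside and outside the cluster")
    result = {}
    for gene, values in expr.items():
        mean_in = sum(v for v, flag in zip(values, in_cluster) if flag) / n_in
        mean_out = sum(v for v, flag in zip(values, in_cluster) if not flag) / n_out
        result[gene] = math.log2((mean_in + pseudocount) / (mean_out + pseudocount))
    return result


expr = {"CD3E": [7, 7, 0, 0], "ALB": [0, 0, 3, 3]}
print(log2_fold_changes(expr, [True, True, False, False]))
# {'CD3E': 3.0, 'ALB': -2.0}
```

The pseudocount keeps the ratio finite for genes that are not detected on one side of the comparison.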
QIAGEN CLC Single Cell Analysis Module offers analysis of clonotypes at the cellular level from scVDJ-seq (scTCR-seq and scBCR-seq) data. Clonotypes are predicted from cell contigs that are de novo assembled from the annotated reads.
Chains are identified and the V, D, J, C and CDR3 regions are annotated. The predicted clonotypes can be visualized as alignments between the assembled contigs and the annotated regions. Sankey plots show how the V, D, J, C and CDR3 regions form the clonotypes for the different chains.
Filters on different criteria can be applied, for example to retain only cells for which scRNA-seq data is available, to remove non-productive clonotypes, or to keep only specific chains.
Clonotypes can also be combined and compared across samples, treatments and other groupings.
Reports summarize the identified clonotypes, with information about diversity, V, D, J and C gene usage, CDR3 length distribution and clonotype frequency.
Figure 9. Plots from the immune repertoire reports.
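The page does not say which diversity measures the reports use. As one common example, the sketch below computes clonotype frequencies and the Shannon diversity index from a mapping of cells to clonotypes. The choice of index and the function name are assumptions.

```python
import math
from collections import Counter


def clonotype_summary(cell_clonotypes):
    """Clonotype frequencies and Shannon diversity (natural log) from a cell -> clonotype mapping."""
    counts = Counter(cell_clonotypes.values())
    n = sum(counts.values())
    freqs = {c: k / n for c, k in counts.items()}
    shannon = -sum(p * math.log(p) for p in freqs.values())
    return freqs, shannon


cells = {"cell1": "A", "cell2": "A", "cell3": "B", "cell4": "C"}
freqs, h = clonotype_summary(cells)
print(freqs, round(h, 3))
# {'A': 0.5, 'B': 0.25, 'C': 0.25} 1.04
```

A repertoire dominated by a few expanded clonotypes gives a low Shannon index, while many equally frequent clonotypes give a high one.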
Finally, the clonotypes can be converted to cell annotations and overlaid in UMAP and tSNE plots from scRNA-seq data.
Figure 10. UMAP of scRNA-seq data. Top: Cells are colored by the predicted V gene for the TRB chain from matched scTCR-seq data. Clonotyped cells for which the TRB clonotype could not be identified are annotated as “None”. Bottom: Cells are colored by the predicted cell type. Note that, as expected, only the T cells have matching clonotypes.
QIAGEN CLC Single Cell Analysis Module supports analysis of scATAC-seq data. Peak calling is performed on deduplicated mappings obtained from the annotated reads. In addition, nearby genes are identified, transcription factor (TF) motif scans are performed, and a comprehensive QC report is produced. Read mappings and graph tracks can be split by groups of cells and visualized together in a genome browser. UMAP and tSNE plots can also be generated from the peak matrix.
Figure 11. Peak count analysis showing genome browser view (top), the UMAP plot calculated on the count matrix (bottom left), and tables containing information on TF motif binding and nearby genes (bottom right).
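A TF motif scan can be illustrated with a position weight matrix (PWM). The motif's position counts are converted to log-odds scores against a uniform background, and every window of a peak sequence is scored. This is a generic sketch, not the module's scanner. It scans the forward strand only, and the function names, pseudocount and score threshold are assumptions.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=0.5, background=0.25):
    """Convert a position count matrix (list of {base: count}) into log2-odds scores."""
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column.get(b, 0) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def scan(sequence, pwm, threshold):
    """Return (position, score) for every forward-strand window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[i][b] for i, b in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


pwm = log_odds_pwm([{"T": 10}, {"G": 10}, {"A": 10}])  # toy motif "TGA"
print([pos for pos, _ in scan("CCTGACC", pwm, threshold=5.0)])
# [2]
```

A full scanner would also score the reverse complement and usually derive the threshold from a p-value rather than a fixed score.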
QIAGEN CLC Single Cell Analysis Module supports hashtag analysis for a wide variety of data types.
Hashtags from annotated reads are mapped to cell annotations. UMAP and tSNE plots produced using scRNA-seq or scATAC-seq data can be colored using these cell annotations, revealing which cells contained which hashtags, and in what amounts. Alternatively, the cell annotations can be used to further demultiplex scRNA-seq, scVDJ-seq and scATAC-seq into samples, if hashtags have been used for sample multiplexing.
Figure 12. UMAP of scRNA-seq data from two samples multiplexed using TotalSeq. Cells are colored by sample. Top left: multiplexed data. Top right: demultiplexed data; red cells could not be assigned to a sample (no matching hashtag). Bottom: batch-corrected demultiplexed data.
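The idea of demultiplexing cells by hashtag can be sketched as follows. Each cell is assigned to its most abundant hashtag when that hashtag is both well supported and dominant. Otherwise the cell is left unassigned, like the red cells in Figure 12. The thresholds, the "Unassigned" label and the function name are hypothetical. Dedicated tools also model background hashtag levels and explicitly call doublets.

```python
def demultiplex(hashtag_counts, min_count=10, min_fraction=0.8):
    """Assign each cell to a sample (hashtag) from its hashtag read counts.

    hashtag_counts: dict cell barcode -> dict hashtag -> count.
    A cell gets its top hashtag if that hashtag has at least min_count reads and at least
    min_fraction of the cell's hashtag reads; otherwise it is labeled "Unassigned".
    """
    assignments = {}
    for cell, counts in hashtag_counts.items():
        total = sum(counts.values())
        if total == 0:
            assignments[cell] = "Unassigned"  # no hashtag reads at all
            continue
        tag, top = max(counts.items(), key=lambda kv: kv[1])
        if top >= min_count and top / total >= min_fraction:
            assignments[cell] = tag
        else:
            assignments[cell] = "Unassigned"
    return assignments


counts = {
    "c1": {"HTO1": 95, "HTO2": 5},   # clearly sample HTO1
    "c2": {"HTO1": 40, "HTO2": 45},  # mixed signal, possibly a doublet
    "c3": {"HTO1": 2, "HTO2": 1},    # too few hashtag reads
}
print(demultiplex(counts))
# {'c1': 'HTO1', 'c2': 'Unassigned', 'c3': 'Unassigned'}
```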
The toolbox offers prebuilt workflows for easy execution from either FASTQ or clonotypes and/or matrices.
Figure 13. The workflows available in QIAGEN CLC Single Cell Analysis Module.
The workflows are designed to perform most of the possible steps in the analysis. For example, gene expression workflows produce a UMAP plot annotated with automatically predicted cell types and clusters, a Dot Plot, Heat Map and Violin Plot showing the expression of highly variable genes in each cell, and velocity estimates if spliced and unspliced counts are available.
We frequently release updates and improvements, such as new features and bug fixes. For a complete overview, please visit the latest improvements page.