Explore an easy way to extend your in-house resources to include integrated single-cell data and analysis
Single-cell sequencing is a powerful technology that offers a focused approach to biomarker and target discovery. If you’re working in oncology research, you most likely use it to develop new diagnostic biomarkers or anti-tumor treatments. If you study the immune system, you may use it to detect individual immune cells or to distinguish among different immune cell groups to propose new targets for disease treatment.
Yet, if you work with single-cell sequencing data, you know that beyond the many possibilities lie many challenges. You probably obtain single-cell RNA-seq (scRNA-seq) data from multiple programs and disease indications, and from both public and internally generated sources. These data are difficult to integrate and align. What’s more, the sheer volume of data and the noise within each dataset make it extremely challenging to draw meaningful conclusions.
QIAGEN Discovery Bioinformatics Services eases the challenges of working with single-cell sequencing data so you can quickly and efficiently answer the questions relevant to your research goals. Our bioinformatics experts supplement your workforce and perform analyses tailored to your interests. We offer a range of support, such as building custom pipelines and server solutions, and we provide hypercare support and training. We take on everything from secondary analysis services to in-depth analysis of biological data.
Need support with content curation? Leave it up to us, whether it’s literature, datasets, pathways or a customized integrated ‘omics data collection based on your internal data (‘internal Land’). We’ll perform deep meta-analysis on data collections and take on the bulk data processing. By working with us, you’ll save time organizing and visualizing internal pipelines and results so you can focus on validating your hypothesis to more quickly make your next big discovery.
Easily query thousands of single-cell datasets to facilitate hypothesis generation
In addition to our basic services, our custom-built single-cell analysis pipelines incorporate cutting-edge public and/or proprietary bioinformatics tools to identify rare cell types, track cell lineage, infer developmental trajectories and determine cell-cell interactions. We do this using data from many single-cell sequencing methods, such as cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq), assay for transposase-accessible chromatin using sequencing (ATAC-seq), single-cell T cell receptor sequencing (scTCR-seq) and single nucleus RNA sequencing (snRNA-seq), among others. Our experts query the OmicSoft Single Cell Land database to answer your specific biological questions, such as:
- How is a particular target gene expressed across cell types?
- How is this target gene expressed across multiple single-cell projects?
- What is the expression signature for a specific disease state?
- What are the different cell types and their abundance across datasets?
- What is the differential gene expression of treatment vs. control samples in a particular cell type (e.g., monocytes)?
Figure 1. Example output showing the abundance of different cell types across datasets.
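As a rough illustration of the cell-type abundance question above, here is a minimal sketch in plain Python. It assumes each cell has already been annotated with a dataset and a cell type. The records, dataset names and the `cell_type_fractions` helper are hypothetical, and the sketch does not use or represent the OmicSoft Single Cell Land interface.

```python
from collections import Counter, defaultdict

# Hypothetical per-cell annotations: (dataset_id, cell_type). Not real data.
cells = [
    ("dataset_A", "T cell"), ("dataset_A", "T cell"),
    ("dataset_A", "monocyte"), ("dataset_A", "B cell"),
    ("dataset_B", "monocyte"), ("dataset_B", "monocyte"),
    ("dataset_B", "T cell"), ("dataset_B", "monocyte"),
]


def cell_type_fractions(records):
    """Return, per dataset, the fraction of cells assigned to each cell type."""
    counts = defaultdict(Counter)
    for dataset_id, cell_type in records:
        counts[dataset_id][cell_type] += 1

    fractions = {}
    for dataset_id, counter in counts.items():
        total = sum(counter.values())
        fractions[dataset_id] = {ct: n / total for ct, n in counter.items()}
    return fractions


fractions = cell_type_fractions(cells)
for dataset_id, by_type in sorted(fractions.items()):
    print(dataset_id, by_type)
```

Reporting fractions rather than raw counts keeps datasets of different sizes comparable, which is usually what an abundance plot across datasets needs.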
Let us help you manage the unique challenges of analyzing scRNA-seq data
General challenges of analyzing scRNA-seq data
We design our pipelines to handle your most common scRNA-seq analysis challenges. These challenges include the relatively small number of sequencing reads per cell, the sparsity of data, limited processing power and cell population heterogeneity. Gene expression profiling by scRNA-seq is also inherently noisier than bulk RNA-seq, making effective data analysis more complicated. To address these challenges of working with scRNA-seq data, we:
- Use a customized unified pipeline script to process scRNA-seq datasets and store the data into internal data lakes called ‘Lands’
- Perform stringent quality control checks to eliminate data from low-quality cells
- Apply batch correction before downstream analysis to avoid incorrect data integration and interpretation caused by batch effects, another major challenge in scRNA-seq data analysis
- Include normalization steps to correctly interpret the results from scRNA-seq data for cell subpopulation identification and differential expression calling
- Use imputation to replace missing data (dropouts) with estimated values to better recover the expression level of each gene
- Employ strategies such as dimensionality reduction or feature selection to deal with the high dimensional nature of scRNA-seq data, which involves thousands of genes and a large number of cells
- Mitigate statistical issues arising from cellular heterogeneity using a set of known markers or algorithms for unsupervised clustering. We incorporate public and proprietary tools in the pipeline, such as Seurat, which can facilitate the identification of cell subpopulations
- Identify distinct subpopulations or groups of cells using differential expression analysis
- Provide a framework to run samples in parallel, using Amazon WorkSpaces to store the data
Figure 2. CellMap employing dimension reduction to explore cell types across samples.
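To make a few of the steps listed above more concrete, here is a minimal sketch of quality control, normalization and feature selection on a toy count matrix. It uses only the Python standard library. The gene names, counts, thresholds and function names are all assumptions chosen for illustration, and the sketch is not the pipeline described in this article. In practice, dedicated tools such as Seurat handle these steps at scale.

```python
import math
from statistics import pvariance

# Toy count matrix: one row of gene counts per cell (hypothetical values).
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
counts = {
    "cell_1": [10, 0, 5, 5],
    "cell_2": [0, 8, 2, 10],
    "cell_3": [0, 0, 1, 0],  # very few counts: likely a low-quality cell
    "cell_4": [4, 4, 2, 10],
}

# Assumed thresholds, for illustration only.
MIN_TOTAL_COUNTS = 5
MIN_GENES_DETECTED = 2
SCALE = 10_000


def passes_qc(row):
    """Keep cells with enough total counts and enough detected genes."""
    detected = sum(1 for c in row if c > 0)
    return sum(row) >= MIN_TOTAL_COUNTS and detected >= MIN_GENES_DETECTED


def normalize(row):
    """Scale each cell to the same library size, then log-transform."""
    total = sum(row)
    return [math.log1p(c / total * SCALE) for c in row]


def top_variable_genes(matrix, gene_names, n_top):
    """Simple feature selection: rank genes by variance across cells."""
    columns = zip(*matrix.values())
    variances = {g: pvariance(col) for g, col in zip(gene_names, columns)}
    return sorted(variances, key=variances.get, reverse=True)[:n_top]


kept = {cell: row for cell, row in counts.items() if passes_qc(row)}
normalized = {cell: normalize(row) for cell, row in kept.items()}
selected = top_variable_genes(normalized, genes, n_top=2)
print(sorted(kept), selected)
```

Ranking genes by variance is only one simple form of feature selection. Dimensionality reduction methods would then operate on the selected genes.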
Metadata curation challenges
Another challenge of scRNA-seq data is metadata curation. Due to a lack of standards for the deposition of cell-level metadata, you could end up spending the majority of your time processing data and curating metadata. Why not let our team of curators handle the metadata? Our team of experts can manually curate single-cell projects and precisely curate cell clusters to help you more quickly and easily answer your biological questions. To do this, we:
- Extract/curate project, technical, sample and clinical metadata from data submissions and publications
- Apply controlled vocabularies and curation standards
- Curate expected results from papers, such as clustering experiments, cell types and key markers/differentially expressed genes
- Add statistical comparisons using the curated metadata
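Applying a controlled vocabulary, as listed above, can be sketched as a lookup that maps free-text labels to standard terms and flags anything unrecognized for manual review. The vocabulary, record fields and function names below are hypothetical and only illustrate the idea. They are not the curation standards used by the team.

```python
# Hypothetical controlled vocabulary: normalized free-text label -> standard term.
CELL_TYPE_VOCAB = {
    "t cell": "T cell",
    "t-cell": "T cell",
    "t lymphocyte": "T cell",
    "mono": "monocyte",
    "monocyte": "monocyte",
    "monocytes": "monocyte",
}


def curate_label(raw_label, vocab=CELL_TYPE_VOCAB):
    """Return the standard term for a raw label, or None if it is unknown."""
    key = " ".join(raw_label.strip().lower().split())
    return vocab.get(key)


def curate_records(records):
    """Split records into curated ones and ones needing manual review."""
    curated, needs_review = [], []
    for record in records:
        term = curate_label(record["cell_type"])
        if term is None:
            needs_review.append(record)
        else:
            curated.append({**record, "cell_type": term})
    return curated, needs_review


# Hypothetical submitted metadata with inconsistent labels.
submitted = [
    {"cell_id": "c1", "cell_type": " T-Cell "},
    {"cell_id": "c2", "cell_type": "Monocytes"},
    {"cell_id": "c3", "cell_type": "unknown blob"},
]
curated, needs_review = curate_records(submitted)
print(curated)
print(needs_review)
```

Routing unknown labels to a review list, rather than guessing, keeps a human curator in the loop for ambiguous cases.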
Differential expression analysis challenges
After identifying the cell types of the scRNA-seq clusters, you’d typically perform differential expression analysis between conditions within particular cell types. While commercial algorithms perform differential expression analysis, the significance of their results is often inflated (the p-values are too small) because each cell is treated as an independent sample. Since cells from the same sample are not independent replicates, we instead aggregate gene expression across the individual cells of each cell type within each sample and compare the resulting sample-level profiles. This is known as pseudo-bulk RNA-seq data. Our team implements this approach to create a custom pseudo-bulk integrated ‘omics data collection (‘Land’) so you can easily explore single-cell data to make accurate comparisons.
Figure 3. Explore pre-computed differential expression of marker genes in different cell types to understand how a target gene is differentially expressed in different cell clusters. In this example output, you can see the differential expression of CD34 across different cell types in normal vs. hematologic cancer samples.
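A pseudo-bulk profile is commonly built by summing counts over all cells of one cell type within one sample, so that the sample becomes the unit of comparison instead of the cell. Here is a minimal sketch of that aggregation step. The sample IDs, counts and the `pseudo_bulk` helper are hypothetical, and the sketch is not the implementation behind the pseudo-bulk Land.

```python
from collections import defaultdict

# Hypothetical per-cell records: (sample_id, condition, cell_type, gene counts).
cells = [
    ("S1", "control", "monocyte", {"CD34": 1, "GENE_X": 4}),
    ("S1", "control", "monocyte", {"CD34": 0, "GENE_X": 6}),
    ("S1", "control", "T cell", {"CD34": 0, "GENE_X": 2}),
    ("S2", "treatment", "monocyte", {"CD34": 3, "GENE_X": 5}),
    ("S2", "treatment", "monocyte", {"CD34": 2, "GENE_X": 1}),
]


def pseudo_bulk(records):
    """Sum counts over the cells of each (sample, condition, cell type) group."""
    totals = defaultdict(lambda: defaultdict(int))
    for sample_id, condition, cell_type, gene_counts in records:
        key = (sample_id, condition, cell_type)
        for gene, count in gene_counts.items():
            totals[key][gene] += count
    return {key: dict(gene_totals) for key, gene_totals in totals.items()}


profiles = pseudo_bulk(cells)
for key, gene_totals in sorted(profiles.items()):
    print(key, gene_totals)
```

Each resulting profile can then be treated like a bulk RNA-seq sample, so differential expression between conditions is tested with samples, not cells, as replicates.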
Deliverables that get you quick insights
On all our projects, we work with you to determine the output and deliverables that best fit your needs. Here are some examples of our deliverables that’ll support your work using single-cell technologies:
- A centralized database that biologists use for disease-focused queries from a customized integrated ‘omics data collection based on your internal data (‘internal Land’)
- Custom scripts enabling the analysis of both human and non-human data
- Support for hypothesis validation by combining information from multiple studies to create a unified picture of biomarkers and disease signatures
- Differential gene expression analysis that assesses changes of a candidate target gene related to a disease
- Signature-based analyses to gain insights into which diseases are relevant for a target gene
- Assessment of relative cell type (cell state) abundance at sample-level resolution for multiple tissue types
- Reports of both differentially expressed genes and gene signatures at cell-type resolution to help you quickly prioritize drug targets, biomarkers and compounds
- Detection of novel cell subpopulations that are responsible for a response to drug treatments (responders vs. resistant cells)
- Identification of subpopulations of cells with variations in gene expression that can provide insight into developmental trajectories (for example, T-helper cell development, B-cell differentiation)
Figure 4. Single-cell project workflow.
Get in touch
By working with QIAGEN Discovery Bioinformatics Services on projects for scRNA-seq processing, you’ll save time and increase accuracy to more quickly gain a robust and insightful understanding of complex disease. We’ll take on the aspects of your bioinformatics that are most challenging and limiting for you. Our customized support empowers you to more readily uncover the biological mechanisms in your data, to accelerate and drive your next big discovery.
Would you like to reduce the burden of working with scRNA-seq data to more quickly reach your next biomarker or target discovery? Learn more and request a consultation about our range of bioinformatics services that’ll extend and scale your in-house resources. Contact us today at firstname.lastname@example.org to get your custom single-cell project started. Together, let’s unravel the biological discoveries hidden in single cells.