
QIAGEN OmicSoft Lands 2022R1 Release Notes

Invitation to request new data curation

The OmicSoft team is now taking requests for new OncoLand and DiseaseLand expression projects to curate for upcoming releases, which will be included as part of your subscription.

Let us know if there are important datasets that you would like to see curated and represented in the Lands. Public expression studies for human, mouse and rat (from GEO, SRA and ArrayExpress) will be evaluated. Compatible platforms include single-cell transcriptomics, bulk RNA-seq and commercial expression arrays from Affymetrix, Illumina and Agilent. Please contact us at omicsoft.support@qiagen.com for more information.

We are also evaluating proteomics datasets for curation starting later this year.

Updated Land versions

OmicSoft continues to reprocess our most popular Land databases, moving them from Human genome B37 to Human genome B38 with Gencode Gene Model version 33. We recommend that you use the latest version ("B38_GC33") when available, because it contains the newest and most comprehensively curated data. Lands available in B38_GC33 include OncoHuman, GTEx, TCGA, Blueprint, CCLE, HumanDisease and TRACERx, in addition to the latest Single Cell Land data.

Flat file "text dumps"

If your subscription includes access to OmicSoft Land "text dump" flat-file exports, you can request the latest data in the form of a series of indexed tab-delimited tables. These files are perfect for large exploratory meta-analyses and machine learning studies. If you are interested in accessing the latest data via flat files, ask your OmicSoft account administrator to request the links.
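The Python sketch below is a rough illustration of working with the flat files. It reads two tab-delimited tables and joins them on a shared sample identifier. The file contents and column names (SampleID, DiseaseState, GeneName, Value) are hypothetical placeholders. Check the text-dump files you receive for their real layout and key columns.

```python
import csv
import io
from typing import Dict, List, TextIO


def read_table(handle: TextIO) -> List[Dict[str, str]]:
    """Read a tab-delimited table whose first row is a header."""
    return list(csv.DictReader(handle, delimiter="\t"))


def join_on(left: List[Dict[str, str]], right: List[Dict[str, str]], key: str) -> List[Dict[str, str]]:
    """Inner-join two tables on a shared key column (e.g. a sample identifier)."""
    index = {row[key]: row for row in left}  # assumes `key` is unique in `left`
    joined = []
    for row in right:
        match = index.get(row[key])
        if match is not None:  # rows without a match in `left` are dropped
            joined.append({**match, **row})
    return joined


# Hypothetical example tables; real text-dump file names and columns may differ.
samples_tsv = "SampleID\tDiseaseState\nS1\tbreast cancer\nS2\tnormal control\n"
values_tsv = "SampleID\tGeneName\tValue\nS1\tCD34\t12.5\nS2\tCD34\t3.1\nS3\tCD34\t7.0\n"

samples = read_table(io.StringIO(samples_tsv))
values = read_table(io.StringIO(values_tsv))
merged = join_on(samples, values, key="SampleID")
for row in merged:
    print(row["SampleID"], row["DiseaseState"], row["Value"])
```

For files on disk, open each one with `open(path, newline="", encoding="utf-8")` and pass the handle to `read_table`. For very large tables, a dataframe library may be more practical than in-memory lists of dictionaries.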

OncoLand Updates

OncoHuman_B38_GC33

Figure 1. New samples in OncoHuman, grouped by DiseaseState and colored by TissueCategory.

 

  • 2022R1 is the first release of OncoHuman, our new Land that brings together solid cancer and hematology studies.
  • All data are processed on B38_GC33 and feature several metadata changes.
  • The 2022R1 content adds 73 projects and 1696 samples, in addition to the projects you would find in OncoGEO_B37.

Solid cancers

This release adds studies on cancers of the gastrointestinal system, central nervous system, prostate, breast, ovary, bones and joints (osteo-articular) and head and neck, as well as on melanoma.

The datasets explore gene-expression profiles associated with the tumor microenvironment and with drug-induced gene modulation. They include paired tumor–nontumor samples, metastases, 3D cultures, organoids and xenograft models. Projects using gene-editing tools (CRISPR-Cas9) are also included.

Highlighted solid cancer studies

  • Tumor microenvironment: GSE80333, GSE137245, GSE153713, GSE149327, GSE128405
  • 3D culture: GSE148483 (cells cultured on patient-derived scaffolds), GSE147147 (exploring the brain tumor microenvironment in a reproducible and scalable system by developing a rapid three-dimensional bioprinting method), GSE155547 (the effect of platelets on the ovarian metastasis microenvironment in a 3D multicellular model of high-grade serous ovarian cancer)
  • Paired tumor–nontumor samples: GSE147704 (comparative transcriptome analysis of endemic and epidemic Kaposi's sarcoma lesions), GSE105130
  • CRISPR-Cas9: GSE141605, GSE148372, GSE163646

Hematology

This release includes studies exploring gene-expression profiles associated with the mechanisms of action of immunomodulators in vitro, paired pre- and post-treatment samples and CRISPR-Cas9 genome editing. It also includes projects from Hemap, the online resource for interactive exploration of hematopoietic cancer data.

Highlighted Hematology studies

  • CRISPR-Cas9: GSE163817, GSE134173
  • Paired pre–post treatment: GSE2842
  • Hemap: GSE19681 (investigating the role of Hsa21-encoded miR-125b-2 in the pathogenesis of trisomy 21-associated megakaryoblastic leukemia), GSE2842, GSE10258, GSE7538, GSE18866, GSE9250, GSE12902, GSE8685, GSE8687, GSE11118
  • In vitro immunotherapy studies: GSE8685, GSE8687

 

All projects added: GSE101209, GSE10258, GSE105083, GSE105130, GSE105439, GSE106272, GSE109319, GSE11118, GSE112221, GSE114326, GSE114856, GSE115853, GSE119688, GSE120647, GSE124189, GSE128405, GSE12902, GSE131792, GSE132215, GSE132233, GSE132624, GSE134173, GSE137245, GSE137528, GSE138581, GSE140077, GSE141116, GSE141444, GSE141605, GSE142719, GSE147147, GSE147704, GSE148372, GSE148444, GSE148483, GSE149327, GSE152312, GSE153713, GSE155547, GSE159493, GSE160401, GSE162945, GSE163639, GSE163646, GSE163817, GSE18832, GSE18866, GSE19681, GSE2842, GSE65867, GSE70926, GSE71519, GSE71520, GSE7538, GSE77314, GSE78025, GSE80333, GSE82110, GSE83479, GSE84023, GSE86518, GSE8685, GSE8687, GSE9250, GSE95499, GSE97098

ENCODE_RNAbinding

The Encyclopedia of DNA Elements (ENCODE) Consortium is an ongoing international collaboration of research groups funded by the National Human Genome Research Institute (NHGRI). The goal of ENCODE is to build a comprehensive parts list of functional elements in the human genome. These include elements that act at the protein and RNA levels, as well as regulatory elements that control cells and circumstances in which a gene is active.

ENCODE_RNAbinding_B38_GC33 allows in-depth exploration of the splicing and gene-expression impacts of the loss of hundreds of RNA- and DNA-associated genes, using data generated by the ENCODE Consortium.

This release contains RNA-seq experiments of 1122 samples (561 ENCODE experiments) for two popular cell lines (K562 and HepG2) after shRNA knockdown targeting various proteins:

  • RNA-binding proteins
  • Transcription factors
  • Cofactors
  • DNA repair proteins
  • Chromatin remodeler proteins
  • RNA-polymerase complex
  • DNA replication proteins

Key metadata columns for ENCODE_RNAbinding

  • SampleID: GEO identifier associated with each sample; uniquely identifies each assayed sample
  • CellLine: identifies each sample as being assayed in HepG2 or K562
  • TargetCategory: category of the knocked-down gene, such as "chromatin remodeler", "cofactor", "DNA replication", "RNA binding protein" or "transcription factor"; useful for finding assays for functionally related genes
  • Transfection: identifies the specific gene that was knocked down (e.g., "FXR1 shRNA", "TRA2A shRNA") or "control shRNA"; useful for finding assays targeting individual genes. In most cases, include the "control shRNA" samples as a baseline for comparison.
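The Transfection and CellLine columns can be combined to select one knockdown together with its controls. The Python sketch below is a minimal, hypothetical illustration. It works on an in-memory list of sample records rather than on the Land itself. The Sample class, the placeholder sample IDs and the use of the string "NA" for the control TargetCategory are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Sample:
    sample_id: str        # SampleID
    cell_line: str        # CellLine, e.g. "K562" or "HepG2"
    target_category: str  # TargetCategory; "NA" for controls (assumed representation)
    transfection: str     # Transfection, e.g. "FXR1 shRNA" or "control shRNA"


def knockdown_and_controls(samples: Iterable[Sample], gene: str, cell_line: str) -> List[Sample]:
    """Return samples in one cell line that knock down `gene`, plus the control shRNA samples."""
    wanted = {f"{gene} shRNA", "control shRNA"}
    return [
        s for s in samples
        # Compare cell-line names case-insensitively so "HepG2" also matches "HEPG2".
        if s.cell_line.upper() == cell_line.upper() and s.transfection in wanted
    ]


# Placeholder records, not real ENCODE samples.
example = [
    Sample("sample_1", "K562", "RNA binding protein", "FXR1 shRNA"),
    Sample("sample_2", "K562", "NA", "control shRNA"),
    Sample("sample_3", "HEPG2", "RNA binding protein", "FXR1 shRNA"),
]
print([s.sample_id for s in knockdown_and_controls(example, "FXR1", "K562")])
```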

Key visualizations

The Sample Distribution View displays the number of samples available. By default, samples are grouped primarily by TargetCategory (the functional class of the targeted gene).

Figure 2. Sample Distribution View of ENCODE_RNAbinding

After searching for a gene, the default visualization is the Gene FPKM View, which plots the expression of the specified gene in each assay. It may be useful to filter or trellis by cell type (CellLine: K562 or HEPG2). You can then compare expression in the different knockdown experiments with expression in control samples (TargetCategory: NA; Transfection: control shRNA).

 

Figure 3. Gene FPKM View in K562 cells treated with shRNAs, grouped by TargetCategory

Because many of these gene knockdowns affect splicing, the Transcript FPKM Views are often useful. Use the Transcript FPKM (Individual Chart) View to see each transcript's measured FPKM per sample. Use the Transcript FPKM (Multi-transcript Chart) View for an overview of average expression across groups.

 

Figure 4. Transcript FPKM View (Individual Chart) of one splice variant of NASP, before filtering and regrouping

At this point, you may want to focus on certain shRNAs or TargetCategories, such as "RNA binding proteins". Filter for (TargetCategory=RNA binding protein | NA), then set the Profile column to "Transfection" to visualize the details of which shRNA knockdowns led to significant down-regulation or up-regulation of specific transcript variants compared to control shRNA.

 

Figure 5. Transcript FPKM View (Individual Chart) of one splice variant of NASP, after filtering for TargetCategory="RNA binding protein" and "NA", and regrouping on "Transfection".
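The filter-and-regroup step above has three parts. It keeps samples whose TargetCategory is "RNA binding protein" or NA, groups them by Transfection, and compares each group with the control shRNA group. The Python sketch below shows that logic on hypothetical (TargetCategory, Transfection, FPKM) records for a single transcript. It reports a log2 ratio of group means with an assumed pseudocount of 1. It is not a significance test and does not reproduce OmicSoft's statistics.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Each record: (TargetCategory, Transfection, transcript FPKM) for one sample.
Record = Tuple[str, str, float]


def log2_vs_control(records: Iterable[Record],
                    categories: Tuple[str, ...] = ("RNA binding protein", "NA"),
                    control: str = "control shRNA",
                    pseudocount: float = 1.0) -> Dict[str, float]:
    """Filter by TargetCategory, group by Transfection, compare each group mean to the control mean."""
    groups = defaultdict(list)
    for category, transfection, fpkm in records:
        if category in categories:             # like filtering for (RNA binding protein | NA)
            groups[transfection].append(fpkm)  # like setting the Profile column to Transfection
    means = {t: sum(v) / len(v) for t, v in groups.items()}
    if control not in means:
        raise ValueError(f"no {control!r} samples left after filtering")
    base = means[control]
    # log2 of (group mean + pseudocount) / (control mean + pseudocount)
    return {t: math.log2((m + pseudocount) / (base + pseudocount)) for t, m in means.items()}


# Hypothetical records; "GENE_A shRNA" is a placeholder knockdown.
records = [
    ("NA", "control shRNA", 3.0),
    ("NA", "control shRNA", 5.0),
    ("RNA binding protein", "FXR1 shRNA", 1.0),
    ("transcription factor", "GENE_A shRNA", 20.0),
]
print(log2_vs_control(records))
```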

 

The Exon Junction Summary provides details of the relative usage of both Known and Novel detected exon junctions. Use the "ExonJunction" filters to select Known and/or Novel junctions. It may be helpful to filter for certain transfections, such as those that showed transcript-level differences.

 

Figure 6. Exon Junction Summary View of control samples and two transfection experiments for factors that showed transcript-level differences for NASP.
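One common way to express relative junction usage is the share of reads supporting each junction among all junctions that share the same start coordinate. The Python sketch below uses that definition as an assumption, and OmicSoft's exact calculation may differ. It ignores strand and labels each junction Known or Novel according to whether it appears in an annotated set. The coordinates and read counts are made up.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Junction = Tuple[str, int, int]  # (chromosome, start, end) of the intron a junction spans


def junction_usage(counts: Dict[Junction, int], known: Set[Junction]) -> Dict[Junction, Tuple[float, str]]:
    """Fraction of reads for each junction among all junctions sharing its start coordinate,
    labeled Known or Novel by membership in an annotated junction set."""
    totals = defaultdict(int)
    for (chrom, start, _end), reads in counts.items():
        totals[(chrom, start)] += reads
    result = {}
    for junction, reads in counts.items():
        chrom, start, _end = junction
        total = totals[(chrom, start)]
        fraction = reads / total if total else 0.0
        result[junction] = (fraction, "Known" if junction in known else "Novel")
    return result


# Made-up junctions: two alternative ends from one start, and one unshared junction.
counts = {("chr1", 100, 200): 30, ("chr1", 100, 300): 10, ("chr1", 500, 600): 5}
usage = junction_usage(counts, known={("chr1", 100, 200)})
for junction, (fraction, status) in usage.items():
    print(junction, round(fraction, 2), status)
```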

 

DiseaseLand Updates

HumanDisease

 

Figure 7. New samples in HumanDisease, grouped by DiseaseCategory and colored by TissueCategory (Normal Control and Disease Control Samples are hidden).

 

This release of HumanDisease adds 5086 samples and 1169 comparisons from 99 unique project IDs.

Highlighted HumanDisease studies

  • Different T-cell types from various tissues: GSE133822, GSE159437, GSE22501
  • NK cells: GSE116178, GSE154919, GSE165849, GSE158485, GSE89020
  • Immune-mediated diseases (arthritis, pemphigus, psoriasis, dermatomyositis): GSE151897, GSE153015, GSE154474, GSE154988, GSE38064, GSE68689, GSE106893, GSE138746, GSE80785, E-MEXP-3890, GSE93776, GSE135004, GSE150954, GSE90152, E-MEXP-2681, GSE11971
  • Muscular system (muscle-tissue profiling, physical exercise, muscular disease): GSE1300, GSE1295, GSE8441, GSE12648, GSE1017, GSE10760, E-TABM-206, GSE11686, GSE10685, GSE13070, GSE13205, GSE15090, GSE21164, GSE21496, GSE24235, GSE28998, GSE40645, GSE47881, GSE5110, GSE7014, GSE80
  • Infectious disease: GSE12108, GSE13205, GSE119749, GSE13670, GSE9927, GSE11199
  • Ophthalmology (eye-tissue profiling, eye disease): GSE29402, GSE58331, GSE71320, GSE89827
  • Hemap collection: GSE12108, GSE13670, GSE5679, GSE7247, GSE7509, GSE8658, GSE8668, GSE4984, GSE7874, GSE9101, GSE13762, GSE9916, GSE11864, GSE22501, GSE9927, GSE11199

All projects added: E-MEXP-2681, E-MEXP-3890, E-TABM-206, GSE1017, GSE101988, GSE102737, GSE10361, GSE10685, GSE106893, GSE10760, GSE108350, GSE11199, GSE112594, GSE115112, GSE116178, GSE11686, GSE116899, GSE118106, GSE11864, GSE119501, GSE11971, GSE119749, GSE120226, GSE120502, GSE12108, GSE124284, GSE12648, GSE1295, GSE129921, GSE1300, GSE130038, GSE13070, GSE131503, GSE131527, GSE13205, GSE133822, GSE135004, GSE135251, GSE13670, GSE13762, GSE138734, GSE138746, GSE142206, GSE146028, GSE15090, GSE150954, GSE151875, GSE151897, GSE153015, GSE154474, GSE154919, GSE154988, GSE155322, GSE157840, GSE158485, GSE159437, GSE161549, GSE165849, GSE1724, GSE173808, GSE18583, GSE21164, GSE21496, GSE22501, GSE24235, GSE28998, GSE29402, GSE38064, GSE40645, GSE47881, GSE4984, GSE5110, GSE5679, GSE57178, GSE57662, GSE58331, GSE67427, GSE68689, GSE7014, GSE71320, GSE7247, GSE7509, GSE7874, GSE80, GSE80785, GSE8441, GSE85761, GSE8658, GSE8668, GSE89020, GSE89827, GSE90152, GSE9101, GSE93776, GSE9916, GSE9927, GSE99999, PRJNA512027, SRP151738

MouseDisease

 

Figure 8. New samples in MouseDisease, grouped by DiseaseState and colored by TissueCategory (Normal Control and Disease Control Samples are hidden).

This release adds 1556 samples and 782 comparisons from 49 projects.

Highlighted MouseDisease studies

  • Aging: GSE110980, GSE110981, GSE110982, GSE110978, GSE110979, GSE117762, GSE117763, GSE57528, GSE56772, GSE57583
  • Liver disease: GSE162863, GSE162869, GSE166488, GSE166867, GSE167032, GSE167034, GSE167033, GSE162276, GSE138778, GSE160020
  • Endocrine and metabolic diseases (diabetes mellitus): GSE141782, GSE124394
  • Kidney disease (diabetic nephropathy): GSE139987, GSE145301

 

All projects added: E-MTAB-8566, GSE104342, GSE106720, GSE109776, GSE110384, GSE110978, GSE110979, GSE110980, GSE110981, GSE110982, GSE112453, GSE113727, GSE113943, GSE116485, GSE117736, GSE117762, GSE117763, GSE121646, GSE124394, GSE124670, GSE133878, GSE138778, GSE139601, GSE139987, GSE141492, GSE141782, GSE144838, GSE145301, GSE145720, GSE155460, GSE156895, GSE158807, GSE160020, GSE160021, GSE162276, GSE162863, GSE162869, GSE166488, GSE166867, GSE167032, GSE167033, GSE167034, GSE167216, GSE169275, GSE179417, GSE56772, GSE57528, GSE57583, GSE84948

Single Cell Land Updates

To access the latest Single Cell Land data in your subscription, you must use OmicSoft Suite v11.6 or later.

New Single Cell Protocol expands Landable projects

Starting with this release, a new set of "Lite" Lands is available (their names include "Lite"). Lite Lands consist of UMI-based projects with cell-level cell-type annotations from the authors. These annotations let OmicSoft curators define cell clusters containing exactly the same cells that the authors identified.

Project and sample metadata are still fully curated, and ClusterCellTypes use the OmicSoft CellType ontology. Gene-expression values are extracted from the submitted data matrix. OmicSoft defines which samples are included in the dimension-reduction CellMaps, and the authors' cell annotations define the clusters.

 

Figure 9. More datasets can now be incorporated into the Single Cell Land framework with the new Lite protocol.
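The Python sketch below makes the Lite protocol more concrete. It groups cells into clusters using author-provided labels and keeps only the samples selected for a CellMap. It then attaches a curated cell type to each cluster. This is an illustration only. The record layout, the label-to-cell-type mapping and the "unmapped" placeholder are hypothetical and do not reflect OmicSoft's internal pipeline.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# (cell barcode, sample ID, author-provided cell-type label)
Annotation = Tuple[str, str, str]


def clusters_from_author_labels(annotations: Iterable[Annotation],
                                included_samples: Set[str],
                                curated_type: Dict[str, str]) -> Dict[str, dict]:
    """Group cells into clusters by author label, keeping only samples chosen for the CellMap,
    and attach a curated cell type to each cluster."""
    cells: Dict[str, List[str]] = defaultdict(list)
    for barcode, sample_id, label in annotations:
        if sample_id in included_samples:
            cells[label].append(barcode)
    return {
        label: {
            # Labels without a curated mapping are flagged rather than guessed.
            "ClusterCellType": curated_type.get(label, "unmapped"),
            "cells": barcodes,
        }
        for label, barcodes in cells.items()
    }


# Hypothetical annotations; sample "s3" is not part of the CellMap.
annotations = [
    ("AAAC", "s1", "T cells"),
    ("AAAG", "s1", "B cells"),
    ("AAAT", "s2", "T cells"),
    ("CCCA", "s3", "T cells"),
]
clusters = clusters_from_author_labels(annotations, {"s1", "s2"}, {"T cells": "T cell"})
print(clusters)
```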

 

New single cell datasets of note

 

Figure 10. Millions of new cells from 40 new projects are available in Single Cell Land.

New datasets that may be of interest:

  • Tabula Sapiens, single-cell profiling of 24 organs
  • NYSCF, iPSC-derived astrocytes
  • Multiple datasets profiling breast cancers
  • Profiling of pancreatic adenocarcinoma subtypes

Full list of projects added in this latest release: GSE106960, GSE123046, GSE126836, GSE127465, GSE138707, GSE139186, E-MTAB-6308, E-MTAB-8007, GSE101207, GSE108291, GSE110949, GSE117403, GSE117570, GSE119212, GSE124887, GSE124888, GSE125188, GSE129007, GSE129308, GSE132802, GSE137829, GSE138709, GSE138852, GSE139324, GSE140231, GSE142784, GSE145633, GSE150132, GSE151087, GSE153889, GSE154778, GSE157277, GSE162726, GSE92495, GSE97168, PRJCA001063, PRJEB39602, GSE114725, NYSCF, TabulaSapiens

Tabula Sapiens (HumanUmiLite_B38_GC33) is a strong complement to the Human Cell Landscape (HCL_B38), providing a cell atlas across 24 organs.

 

Figure 11. Restricted expression of CD34 within lung tissue from Tabula Sapiens. CD34 is up-regulated in multiple identified endothelial cell types compared with other cells in the lung tissue (TabulaSapiens CellMap 11). CD34 expression is shown in a subset of cells (top panel), whose curated ClusterCellType indicates that they are endothelial cells within lung tissue (middle panel). Statistical comparisons pre-computed for each cluster against the other clusters in CellMap 11 show up-regulation in the curated endothelial cell types versus the others (bottom panel).

 

Another cell profiling project of interest is GSE116470, which profiles nine different brain regions from adult mice, allowing detailed exploration of expression variability in the central nervous system.

 

Figure 12. Expression of Sox2 in astrocytes (light blue, top panels) in the hippocampus (left panels) and frontal cortex (right panels) of profiled mouse brains (GSE116470).

Did you know?

OmicSoft Lands include over 118,000 statistical comparisons based on our modeling of the curated data. Some of these contrasts are simple, with only one factor in the model. In other cases, a more complex model is created, with two, three or more covariates included!

The "ComparisonContrast" field describes the contrast in these multi-factor models using this syntax:

Controlled Factor 1:Controlled Factor 2:Contrasted Factor 3 => Controlled Level 1:Controlled Level 2 -> Case vs Control for Factor 3

For example:

TreatmentStatus:Response:TreatmentHistory => nivolumab:no response -> ipilimumab vs none

 

Figure 13. Expression of DAZ2 in GSE91061, grouped by TreatmentStatus+Response+TreatmentHistory. Samples included in the comparison of TreatmentHistory ipilimumab vs none, within TreatmentStatus nivolumab and Response no response, are colored blue.
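Because the ComparisonContrast syntax is regular, it can be split into its parts programmatically, for example when working with comparison metadata in bulk. The Python sketch below parses the multi-factor form described above. It assumes that factor names and levels contain no colons. It also assumes that single-factor contrasts, which may be written differently, are handled elsewhere.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Contrast:
    controlled: Dict[str, str]  # controlled factor -> the level it is held at
    factor: str                 # the contrasted factor (last factor before "=>")
    case: str
    control: str


def parse_comparison_contrast(text: str) -> Contrast:
    """Parse 'F1:F2:Contrasted => L1:L2 -> Case vs Control' into its parts."""
    text = text.replace("\u2192", "->")  # accept a Unicode arrow as well as "->"
    try:
        factors_part, rest = text.split("=>", 1)
        levels_part, comparison = rest.split("->", 1)
        case, control = comparison.split(" vs ", 1)
    except ValueError:
        raise ValueError(f"unrecognized ComparisonContrast: {text!r}") from None
    factors = [f.strip() for f in factors_part.split(":")]
    levels = [lvl.strip() for lvl in levels_part.split(":")] if levels_part.strip() else []
    if len(levels) != len(factors) - 1:
        raise ValueError("expected one level per controlled factor")
    return Contrast(dict(zip(factors[:-1], levels)), factors[-1], case.strip(), control.strip())


print(parse_comparison_contrast(
    "TreatmentStatus:Response:TreatmentHistory => nivolumab:no response -> ipilimumab vs none"
))
```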