Latest improvements for QIAGEN OmicSoft Lands

Release date: 2023-07-19

OmicSoft Lands Release 2023R2


  • New cancer proteomics-focused database “ClinicalProteomicTumor”
  • OncoHuman now includes all legacy Hematology_B37 and MetastaticCancer_B37 datasets
  • New datasets in Human, Mouse, and Rat Disease databases, including new cellular differentiation profiling datasets

Get the most out of your OmicSoft data subscription

Request new data curation

The OmicSoft team invites requests for new OncoLand and DiseaseLand expression projects to curate for upcoming releases. These will be included as part of your subscription.

Let us know which important datasets you would like curated and represented in the Lands. Public expression studies for human, mouse, and rat (GEO, SRA, or ArrayExpress) will be evaluated. Compatible platforms include single-cell transcriptomics, bulk RNA-seq, and commercial expression arrays from Affymetrix®, Illumina®, and Agilent®, as well as mass spectrometry proteomics datasets. Please email ts-bioinformatics@qiagen.com for more information.

Use the latest Land database versions

We recommend using the latest version (“B38_GC33”) when available, as this is where you will find the newest and most comprehensively curated data. This video demonstrates how to add new Land databases to your dedicated server.

B38_GC33 Land databases include:

  • Oncology Projects – OncoHuman (including legacy Pediatrics, Hematology, and Metastatic Cancer databases) and ClinicalProteomicTumor
  • Oncology Consortia – BeatAML, CGCI, expO, METABRIC, TARGET, TCGA, TRACERx
  • Non-Oncology Projects – HumanDisease
  • Normal Tissue profiling – GTEx, Blueprint
  • Cell Line profiling – ATCC, CCLE, CellLine (GSK, NCI, Pfizer)
  • Single-cell Lands (UMI and non-UMI)

Use flat file downloads of Land databases

If your subscription includes access to OmicSoft Land “text dump” flat file exports, you can request the latest data as a series of indexed tab-delimited tables, either directly or through the MyQDI web interface (https://my.qiagendigitalinsights.com).

These files are well suited to large exploratory meta-analyses and machine learning (ML) studies.
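Because the exports are plain tab-delimited tables, they can be loaded with standard tooling before any meta-analysis. The sketch below uses only the Python standard library to read a table and count samples per metadata value. The table contents, project names, and column layout are hypothetical placeholders (DiseaseState and TissueCategory are borrowed from the field names used in this release note); the actual file names and columns depend on your delivery, so treat this as an assumption-laden starting point rather than a description of the export format.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt of a tab-delimited sample-metadata table.
# Real export file names and column layouts may differ.
EXAMPLE_TSV = (
    "SampleId\tProjectName\tDiseaseState\tTissueCategory\n"
    "S1\tProjectA\tbreast cancer\tBreast\n"
    "S2\tProjectA\tnormal control\tBreast\n"
    "S3\tProjectB\tlung cancer\tLung\n"
)


def load_table(handle):
    """Read a tab-delimited table into a list of dicts keyed by the header row."""
    return list(csv.DictReader(handle, delimiter="\t"))


def count_by(rows, field):
    """Count how many rows carry each value of one metadata field."""
    return Counter(row[field] for row in rows)


# For a real export, replace io.StringIO(...) with
# open(path, newline="", encoding="utf-8").
rows = load_table(io.StringIO(EXAMPLE_TSV))
print(count_by(rows, "DiseaseState"))
print(count_by(rows, "TissueCategory"))
```

Reading rows as dictionaries keeps the code independent of column order, which is convenient when tables from several Lands are combined.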

Attend live and on-demand webinars

QIAGEN®'s expert Field Application Scientists routinely hold online training sessions for beginning and advanced users of OmicSoft Lands data, showcasing how to use these resources to answer scientific questions. See the upcoming webinars, as well as recordings of previous webinars, here: https://digitalinsights.qiagen.com/webinars-and-events/

Update to the latest OmicSoft Suite version for the latest features

Methylation data in Land databases such as the new Clinical Proteomic Tumor Land are supported starting with OmicSoft Suite v12.4. To take advantage of these new data, please update your OmicSoft Suite environment to v12.4 or later. An upcoming release of TCGA will include methylation array data, which will also require v12.4 or later.

New Database – Clinical Proteomic Tumor Land

Starting with the 2023R2 release, QIAGEN OmicSoft is introducing a new Land database focused on cancer proteomics. The initial release includes three cancer proteomics projects from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) database, with 509 samples, 254 statistical comparisons on both RNA-seq and protein expression, and hundreds of clinical metadata fields. Data types include mass spectrometry (MS), RNA-seq, miRNA-seq, DNA-seq mutation and copy number variation (CNV) profiling, and methylation data.

  • “Integrated Proteogenomic Characterization of Clear Cell Renal Cell Carcinoma”
    • PDC000127 (PMID 31675502)
    • 194 samples with multi-omics measurements and 54 statistical comparisons at the RNA and protein level
  • “Proteogenomic and metabolomic characterization of human glioblastoma”
    • PDC000204 (PMID 33577785)
    • 99 samples with multi-omics measurements and 116 statistical comparisons at the RNA and protein level
  • “Proteogenomic characterization of pancreatic ductal adenocarcinoma”
    • PDC000270 (PMID 34534465)
    • 216 samples with multi-omics measurements and 84 statistical comparisons at the RNA and protein level

To aid exploration of these datasets, definitions of the metadata fields relevant to these studies can be found in this ClinicalProteomicTumor Definitions File, as well as in tooltips within OmicSoft Studio.

Figure 1. Correlation of statistical comparisons of CD8+ inflamed (interferon-γ signaling) vs CD8− inflamed (platelet degranulation) tumors in clear cell renal cell carcinoma (ccRCC) from PDC000127, at the RNA and protein level. (A) Differentially regulated genes at the RNA level, calculated with DESeq2. (B) Differentially regulated genes at the protein level, calculated with a general linear model. The top up-regulated proteins were selected to illustrate concordance with RNA-level up-regulation. (C) Comparison of direction and magnitude between RNA (X-axis) and protein (Y-axis) differential expression. Genes measured only at the RNA level are plotted with a fold change of 0 on the Y-axis.
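To make the panel C convention concrete, here is a minimal sketch of pairing RNA-level and protein-level log2 fold changes, with RNA-only genes assigned a protein fold change of 0 as described in the caption. The gene names and values are hypothetical placeholders, not data from PDC000127, and the "direction concordance" summary (share of genes measured at both levels whose fold changes have the same sign) is an illustrative metric chosen for this example, not necessarily what OmicSoft reports.

```python
# Hypothetical log2 fold changes for illustration only; these are not
# values from PDC000127.
RNA_LFC = {"GENE_A": 2.1, "GENE_B": 1.4, "GENE_C": -0.8, "GENE_D": 0.5}
PROTEIN_LFC = {"GENE_A": 1.2, "GENE_B": 0.9, "GENE_C": 0.3}  # GENE_D not detected by MS


def pair_fold_changes(rna_lfc, protein_lfc):
    """Return (gene, rna_lfc, protein_lfc) triples for every RNA-level gene.

    Genes without a protein measurement get a protein fold change of 0.0,
    mirroring how RNA-only genes are placed on the Y-axis in panel C.
    """
    return [(gene, rna, protein_lfc.get(gene, 0.0)) for gene, rna in sorted(rna_lfc.items())]


def direction_concordance(rna_lfc, protein_lfc):
    """Fraction of genes measured at both levels whose fold changes share a sign.

    A product of exactly zero is counted as not concordant.
    """
    shared = [gene for gene in rna_lfc if gene in protein_lfc]
    if not shared:
        return float("nan")
    agreeing = sum(1 for gene in shared if rna_lfc[gene] * protein_lfc[gene] > 0)
    return agreeing / len(shared)


pairs = pair_fold_changes(RNA_LFC, PROTEIN_LFC)
print(pairs)
print(f"Direction concordance: {direction_concordance(RNA_LFC, PROTEIN_LFC):.2f}")
```

Excluding RNA-only genes from the concordance fraction avoids counting the placeholder 0 as real disagreement, even though those genes still appear on the plot.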


Figure 2. Expression of GBP2 at the RNA and protein level differentiates CD8+ vs other subtypes of ccRCC. (A) Expression of GBP2 at the RNA level, grouped by genetic subtype and OncoSampleType (Primary tumor vs Solid Tissue Normal). GBP2 is elevated across ccRCC tumor subtypes but is highest in the CD8+ inflamed subtype. (B) Expression of GBP2 at the protein level, grouped by genetic subtype and OncoSampleType (Primary tumor vs Solid Tissue Normal). Similar to the RNA measurements, GBP2 protein is elevated across ccRCC tumor subtypes but is highest in the CD8+ inflamed subtype. (C) Correlation of RNA-seq (log2 FPM) and MS (log2 ratio) expression of GBP5 across pancreatic cancer, glioblastoma, and kidney cancer datasets.


OncoHuman: Curated oncology-focused omics studies

Figure 3. New samples in OncoHuman_B38_GC33, grouped by DiseaseState and colored by TissueCategory.


This release adds 3746 samples and 874 comparisons from 22 datasets.

Highlighted studies and topics:

  • Head and neck cancer: GSE67614
  • Lung cancer: GSE175601 and GSE115002
  • Breast cancer: GSE180284, GSE216333, and GSE180775
  • Uterine cancer: GSE120490
  • Esophageal cancer: GSE137867
  • Liver cancer: GSE151412, GSE174570, GSE63898, and GSE183349
  • Colorectal cancer: GSE222202, GSE207194, GSE147571, and GSE37892
  • Prostate cancer: GSE193500 and GSE168718
  • Melanoma: GSE157738
  • Soft tissue sarcoma: GSE159847 and GSE159848
  • Biomarkers of disease progression, prognosis, and response to treatment: GSE180775 (TNBC), GSE120490 (uterine), GSE147571 (colorectal), and GSE175601 and GSE115002 (lung)
  • Patient-derived xenograft experiments: GSE216333 (breast) and GSE193500 (prostate)
  • Treatment studies: GSE180775, GSE207194, GSE137867, GSE151412, GSE67614 and GSE186341
  • Large study (1726 samples) with RNA-seq profiles of cell lines perturbed with 32 kinase inhibitors: GSE186341

In this release, a dataset of interest to many researchers (GSE186341) includes RNA-seq profiles of cell lines perturbed with 32 kinase inhibitors; this dataset served as the basis for a DREAM Challenge to assess computational algorithms for de novo drug polypharmacology predictions. In OncoHuman, profiles for 1728 samples and 703 statistical comparisons of kinase inhibitor responses are ready for exploration. Use these data to interrogate cell line-specific responses to this set of kinase inhibitors.


Figure 4. Gene expression of RABL6 24 hours after treatment with gefitinib in nine profiled cell lines from GSE186341.


Metastatic Cancer and Hematology legacy datasets integrated into OncoHuman

OncoHuman now has all compatible datasets previously found in Hematology_B37 (blood cancers) and MetastaticCancer_B37 (metastasis), meaning you can find nearly all curated oncology datasets in the OncoHuman database. All datasets uplifted from legacy databases have been reviewed and curated to meet the latest curation standards.

HumanDisease – Curated Non-cancer disease projects

Figure 5. New samples in HumanDisease_B38_GC33, grouped by DiseaseState and colored by TissueCategory.


This release adds 2329 samples and 654 comparisons from 52 datasets, including studies on:

  • CNS diseases (epilepsy, multiple sclerosis): GSE94744, GSE71058, GSE63808, and GSE196575
  • Eye diseases (glaucoma, macular degeneration): GSE2378, GSE142591, GSE146641, and GSE118167
  • Immune mediated diseases – rheumatoid arthritis: ERP114936, ERP136392, ERP117716, and ERP108327
  • Immune mediated diseases – inflammatory bowel disease: GSE186582
  • Immune mediated diseases – Sjögren syndrome, sarcoidosis, dermatomyositis, anaphylaxis: GSE154926, GSE169146, GSE100152, and GSE210331
  • Biomarkers of disease stage or progression: GSE136411 (multiple sclerosis) and GSE186582 (Crohn's disease)
  • Studies focused on cellular differentiation (neuronal, adipose, muscular, blood, and other cells): GSE131169, GSE144052, GSE183266, GSE171101, GSE202440, GSE205976, GSE143453, GSE162883, GSE164644, GSE158578, GSE137255, GSE140914, GSE147404, GSE131697, GSE137800, GSE124392, GSE124173, and GSE206088
    • Projects focused on cellular differentiation can be found using the OriginCell and CellType metadata fields
  • Treatment studies: ERP116751, GSE174389, GSE142591, ERP136392, and GSE169146


Figure 6. Profiled cell types derived from induced pluripotent stem cells (iPSC) and embryonic stem cells (ESC) in HumanDisease, including newly added projects. By filtering for OriginCell = “embryonic stem cells (ESC), induced pluripotent stem cells (iPSC)” and grouping on the curated CellTypeCategory and CellType parameters, you can explore all cell types derived from either ESCs or iPSCs.
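The filter-then-group workflow described for Figure 6 can be sketched outside OmicSoft Studio, for example on flat-file metadata. In this minimal Python example, the sample records, the exact OriginCell strings, and the CellTypeCategory/CellType values are hypothetical placeholders; only the field names come from this release note, and the curated vocabulary in the Land may differ.

```python
from collections import defaultdict

# Assumed OriginCell values; the curated strings in HumanDisease may differ.
STEM_CELL_ORIGINS = {
    "embryonic stem cells (ESC)",
    "induced pluripotent stem cells (iPSC)",
}

# Hypothetical sample metadata records for illustration only.
SAMPLES = [
    {"SampleId": "S1", "OriginCell": "induced pluripotent stem cells (iPSC)",
     "CellTypeCategory": "neural cell", "CellType": "neuron"},
    {"SampleId": "S2", "OriginCell": "embryonic stem cells (ESC)",
     "CellTypeCategory": "neural cell", "CellType": "neuron"},
    {"SampleId": "S3", "OriginCell": "induced pluripotent stem cells (iPSC)",
     "CellTypeCategory": "adipose cell", "CellType": "adipocyte"},
    {"SampleId": "S4", "OriginCell": "primary tissue",
     "CellTypeCategory": "connective tissue cell", "CellType": "fibroblast"},
]


def group_stem_cell_derived(samples):
    """Keep ESC/iPSC-derived samples and group their IDs by (CellTypeCategory, CellType)."""
    groups = defaultdict(list)
    for sample in samples:
        if sample.get("OriginCell") in STEM_CELL_ORIGINS:
            key = (sample["CellTypeCategory"], sample["CellType"])
            groups[key].append(sample["SampleId"])
    return dict(groups)


for (category, cell_type), ids in sorted(group_stem_cell_derived(SAMPLES).items()):
    print(f"{category} / {cell_type}: {ids}")
```

Grouping on the (category, cell type) pair keeps related cell types together while still separating, for example, different neuronal subtypes within one category.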

As part of our standard review process, comparisons for the following already-Landed projects were revised and can be found by an updated "OSModifiedDate":

  • GSE48354, E-MEXP-3787, GSE46888, GSE71645, GSE138734, GSE22501, GSE129852, GSE135467, GSE181549


MouseDisease – Curated Non-cancer disease model projects in mouse

This release adds 811 samples and 981 comparisons from 27 datasets, including studies on:

  • Body map profiles of normal tissue (brain, gastrointestinal, reproductive, endocrine, and cardiovascular): GSE219045
  • Studies focused on cellular differentiation (retina, mammary gland, osteoblast, and muscular cells): GSE126370, GSE148667, GSE99399, GSE149083, GSE131369, GSE115499, GSE115369, GSE144160, GSE182848, GSE115774, GSE110434, GSE154991, GSE168139, and GSE104560 (projects focused on cellular differentiation can be found by using OriginCell and CellType)
  • Eye diseases (glaucoma, macular degeneration): GSE26299, GSE3554, GSE191077, GSE184160, and GSE189555
  • Immune mediated diseases – anaphylaxis: GSE215184

As part of our standard review process, comparisons for the following already-Landed projects were revised and can be found by an updated "OSModifiedDate": GSE130102

RatDisease – Curated Non-cancer disease model projects in rat

This release adds 502 samples and 1061 comparisons from 20 datasets, including studies on:

  • Body map (cardiovascular, brain, gastrointestinal, endocrine, respiratory, and reproductive tissue): GSE219045
  • CNS diseases (neuropathy, depressive disorder): PRJNA313202, GSE194289, and GSE183386
  • Peripheral nerve injury: GSE177037 and GSE201025
  • Cardiovascular disease – heart failure: GSE151253 and GSE186247
  • Cardiovascular disease – ischemic disease: GSE184674 and GSE177078
  • Cardiovascular disease – systemic and pulmonary hypertension: GSE194067, GSE160914, and GSE188348
  • Eye disease – cataract: GSE186248, GSE194074, and GSE194317


