
Latest improvements for QIAGEN OmicSoft Lands

QIAGEN OmicSoft Lands 1.2

Release date: 2022-10-31

QIAGEN OmicSoft Lands release notes 2022R3

In this release: Major TARGET Pediatric Cancer update; integration of all Pediatrics Land data into OncoHuman_B38_GC33; over 1000 new statistical comparisons derived from GTEx data and new projects in HumanDisease and MouseDisease.

Reminder: Get the most out of your OmicSoft subscription

Invitation to request new data curation

The OmicSoft team invites requests for new OncoLand, DiseaseLand and Single Cell Land expression projects to curate for upcoming releases. These will be included as part of your subscription.

Let us know if there are important datasets that you would like to see curated and represented in the Lands. Public expression studies for Human, Mouse and Rat (from GEO, SRA or ArrayExpress) will be evaluated. Compatible platforms include single-cell transcriptomics, bulk RNA-seq and commercial expression arrays from Affymetrix, Illumina and Agilent. Please email omicsoft.support@qiagen.com for more information.

We are also evaluating proteomics datasets for curation starting later this year.

Updated Land versions

Updates to Lands data are being processed on Human genome B38 and GENCODE Gene Model, version 33 (“B38_GC33” Lands), including OncoHuman, HumanDisease, GTEx, Blueprint, CCLE, ENCODE_RNAbinding, TCGA, TARGET and TRACERx, in addition to the latest Single Cell Lands.

We recommend using the B38_GC33 version when available, as this is where you’ll find the newest and most comprehensively curated data.

Flat file “text dumps”

If your subscription includes access to OmicSoft Land “text dump” flat-file exports, you can request the latest data as a series of indexed, tab-delimited tables. These files are well suited to large exploratory meta-analyses and machine learning (ML) studies. To get access to the latest data as flat files, ask your OmicSoft account administrator to request the download links. Keep an eye out for an announcement of new Lands APIs for powerful cross-database queries.
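As a rough illustration of working with tab-delimited exports, the following Python sketch parses one table with the standard `csv` module and indexes its rows by a key column. The column names (`SampleId`, `ProjectName`, `DiseaseState`) and the in-memory sample text are hypothetical placeholders; the actual file names, columns and keys in a text dump may differ, so check the files you receive.

```python
import csv
import io
from typing import Dict, Iterable, List


def read_tsv(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse tab-delimited text (header row first) into one dict per row."""
    return list(csv.DictReader(lines, delimiter="\t"))


def index_by(rows: List[Dict[str, str]], key: str) -> Dict[str, Dict[str, str]]:
    """Build a lookup table keyed on one column; later duplicates overwrite earlier ones."""
    return {row[key]: row for row in rows}


# Hypothetical excerpt of a sample-metadata table (column names are placeholders).
demo = io.StringIO(
    "SampleId\tProjectName\tDiseaseState\n"
    "S1\tProjectA\tcontrol\n"
    "S2\tProjectA\tasthma\n"
)
samples = index_by(read_tsv(demo), "SampleId")
print(samples["S2"]["DiseaseState"])  # asthma
```

For a real export, pass an open file handle instead, e.g. `open(path, newline="")`, as the `csv` module documentation recommends.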

OncoLand — TARGET update and new OncoHuman data

TARGET — Major update of multi-omics data for pediatric cancers

The Therapeutically Applicable Research to Generate Effective Treatments (TARGET) program applies a comprehensive genomic approach to determine molecular changes that drive childhood cancers.

For this release, TARGET data have been reprocessed for Human Genome B38 and GENCODE gene model, version 33. As of publication, the data correspond to the latest TARGET data release (August 24, 2020).

The updated data include 6127 samples across 19 diseases (a 50% increase in sample count), 317 curated metadata fields incorporating information from 41 publications reviewed for this update, and 98 statistical comparisons.

Data types available for exploration in this update include Copy Number Variation (2719 samples), RNA-seq (1491), DNA-seq somatic mutation (2026) and miRNA-seq (2425).

Figure 1. Distribution of samples in TARGET_B38_GC33, grouped by DiseaseState. Bars are color coded according to OncoSampleType, which includes 19 pediatric cancers.

As part of this major update to TARGET Land, the OmicSoft team modeled nearly 100 statistical comparisons between groups of samples, revealing differences associated with genetic subtype, gender, histology, metastatic status and more.

Figure 2. Differential expression of genes from patients with acute lymphocytic leukemia (ALL) grouped by genetic subtype (TAL1 subtype vs HOXA subtype) reveals expression-based markers differentiating these populations (top panel); significantly higher expression of PlexinD1 was observed for ALL patients with HOXA genetic subtype compared to those with the TAL1 genetic subtype (bottom panel).

OncoHuman — Integrated database of thousands of curated oncology ‘omics datasets

Figure 3. Distribution of samples added to the latest release of OncoHuman. Samples are grouped by Disease Category and color coded according to Tissue Category.

This release adds 2426 samples and 230 comparisons from 43 datasets spanning the following therapeutic areas (a dataset may be listed under more than one area):

  • Dermatology: GSE141465, GSE184398
  • Endocrinology, metabolism, bone: GSE112202, GSE138198, GSE153659, GSE184398, GSE9195
  • Gastroenterology: GSE155887, GSE184398, GSE21293, GSE41568, GSE45168, GSE47404, GSE51021, GSE57303, GSE59948, GSE67508, GSE87410, GSE88802
  • Hematology, coagulation: GSE151774, GSE158438
  • Immunomodulators: GSE151774, GSE155887
  • Neurology: GSE171197, GSE184398
  • General oncology: E-MTAB-62, GSE112202, GSE119400, GSE12093, GSE122698, GSE125113, GSE126548, GSE12763, GSE13787, GSE138198, GSE141465, GSE143152, GSE151072, GSE151774, GSE153659, GSE155887, GSE1561, GSE158438, GSE167573, GSE168845, GSE171197, GSE174167, GSE175648, GSE184398, GSE20318, GSE21293, GSE3744, GSE41568, GSE42749, GSE45168, GSE47404, GSE51021, GSE57303, GSE57422, GSE59948, GSE6596, GSE67508, GSE69630, GSE7880, GSE87410, GSE88802, GSE9195, GSE98979
  • Pulmonology: GSE119400, GSE122698, GSE125113, GSE126548, GSE184398, GSE20318, GSE42749, GSE57422, GSE69630, GSE7880, GSE87410, GSE98979
  • Renal disease: GSE167573, GSE168845, GSE175648, GSE184398
  • Urologic: GSE184398

Datasets or comparisons removed or reprocessed:

No datasets were removed this release.

As part of our standard review process, comparisons for the following already-Landed projects were revised. These comparisons can be identified by the content of “OSModifiedDate”:

GSE33939 GPL570, GSE6566 GPL570, GSE145062 GPL20301, GSE102886 GPL10558, GSE114453 GPL20301, GSE11151 GPL570, GSE15824 GPL570, GSE5824 GPL96, GSE16354 GPL570, GSE6740 GPL96, GSE138077 GPL20795, GSE115853 GPL20795, GSE140101 GPL16791, GSE25087 GPL570, GSE10797 GPL571, GSE51798 GPL570, GSE162894 GPL24676
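If you work with exported comparison tables, revised comparisons can be picked out by filtering on `OSModifiedDate`. The Python sketch below assumes that the field holds ISO-8601 dates (YYYY-MM-DD) and that an empty value means the comparison was never revised; both are assumptions, as are the illustrative `ComparisonId` column and the cutoff date used in the example.

```python
from datetime import date
from typing import Dict, List


def revised_since(comparisons: List[Dict[str, str]], cutoff: date) -> List[Dict[str, str]]:
    """Return comparisons whose OSModifiedDate falls on or after `cutoff`.

    Assumes OSModifiedDate holds an ISO-8601 date (YYYY-MM-DD); rows with an
    empty or missing value are treated as never revised.
    """
    revised = []
    for row in comparisons:
        value = row.get("OSModifiedDate", "").strip()
        if value and date.fromisoformat(value) >= cutoff:
            revised.append(row)
    return revised


# Hypothetical rows; "ComparisonId" is an illustrative column name.
rows = [
    {"ComparisonId": "C1", "OSModifiedDate": "2022-09-15"},
    {"ComparisonId": "C2", "OSModifiedDate": ""},
    {"ComparisonId": "C3", "OSModifiedDate": "2021-03-02"},
]
print([r["ComparisonId"] for r in revised_since(rows, date(2022, 7, 1))])  # ['C1']
```

In practice the cutoff would be the date you last synchronized your local copy of the data.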

Pediatrics data integration into OncoHuman

To enable better cross-project analysis of oncology projects including pediatric cancers, Pediatrics Land has been recurated and incorporated into OncoHuman_B38_GC33.

This release includes 9384 samples and 542 comparisons from 144 datasets originally found in Pediatrics_B37.

Highlights of the recuration effort

  • Revised curation of fields to maintain consistency between projects within OncoHuman, including DiseaseState, DiseaseStatus, RelapseStatus, MetastasisStatus, CellDescription, SampleOrigin, ConcurrentDisease, ExperimentGroup, OncoSampleType, DiseaseSubtype, Survival columns, TissueRegion and GeneticSubtype
    • DiseaseStatus describes the cancer status (OncoSampleType) at the time the sample was collected from the subject
    • MetastasisStatus describes whether the patient providing the sample had developed metastasis by the time of follow-up
    • RelapseStatus describes whether the patient providing the sample had experienced a recurrence after a period of improvement, as of the time of follow-up
    • MutationStatus and MutationType define the mutation status of a given gene (MutationStatus[GeneName]) and the type of mutation identified for the gene (MutationType[GeneName]), available only when provided by the authors.
    • Revised DiseaseState unifies “normal control” and “disease control” values under “control”
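When comparing older Pediatrics_B37 annotations with OncoHuman_B38_GC33, legacy DiseaseState values may need the same unification. A minimal Python sketch, assuming rows are dictionaries with a `DiseaseState` key and that case-insensitive matching of the two legacy labels is acceptable (the `SampleId` column and example values are hypothetical):

```python
from typing import Dict, List

# Legacy labels that the revised curation merges into a single "control" value.
LEGACY_CONTROL_LABELS = {"normal control", "disease control"}


def unify_disease_state(value: str) -> str:
    """Map a legacy control label to "control"; leave all other values unchanged."""
    return "control" if value.strip().lower() in LEGACY_CONTROL_LABELS else value


def harmonize(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return copies of sample rows with DiseaseState unified (input is not modified)."""
    return [{**row, "DiseaseState": unify_disease_state(row["DiseaseState"])} for row in rows]


rows = [
    {"SampleId": "S1", "DiseaseState": "normal control"},
    {"SampleId": "S2", "DiseaseState": "neuroblastoma"},
]
print([r["DiseaseState"] for r in harmonize(rows)])  # ['control', 'neuroblastoma']
```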

Integrated projects from Pediatrics_B37

  • Hematologic cancer: GSE10172; GSE10609; GSE10792; GSE108088; GSE13351; GSE13425; GSE13576; GSE14062; GSE14286; GSE14471; GSE17195; GSE17459; GSE17703; GSE19143; GSE19475; GSE19577; GSE20910; GSE2191; GSE26281; GSE26713; GSE2677; GSE27237; GSE28460; GSE28703; GSE29326; GSE29986; GSE30392; GSE32962; GSE33315; GSE34670; GSE35504; GSE39816; GSE41621; GSE42001; GSE42038; GSE42056; GSE42221; GSE42765; GSE43176; GSE43209; GSE45249; GSE46170; GSE4698; GSE47051; GSE50999; GSE52891; GSE52991; GSE55876; GSE55877; GSE56488; GSE56599; GSE57795; GSE58290; GSE60926; GSE61999; GSE635; GSE63988; GSE64905; GSE66638; GSE67684; GSE69346; GSE74299; GSE7440; GSE74460; GSE75461
  • Nervous system cancer: E-TABM-1107; GSE100427; GSE108088; GSE109401; GSE12907; GSE12992; GSE13267; GSE14295; GSE16155; GSE17714; GSE18271; GSE19404; GSE21166; GSE26576; GSE28238; GSE28409; GSE29683; GSE29684; GSE30074; GSE32374; GSE34280; GSE3446; GSE34824; GSE35133; GSE35493; GSE37382; GSE37384; GSE38330; GSE39182; GSE39218; GSE42762; GSE43392; GSE44971; GSE47407; GSE49243; GSE50385; GSE51020; GSE54720; GSE5675; GSE59983; GSE60899; GSE63296; GSE67851; GSE68956; GSE70576; GSE73066; GSE74195; GSE77947; GSE83266; GSE8596; GSE86574; GSE89446; GSE90689
  • Musculoskeletal system cancer: GSE100427; GSE108088; GSE12865; GSE14827; GSE16088; GSE34620; GSE37371; GSE40018; GSE40021; GSE45544; GSE73166; GSE74970; GSE8596; GSE92689
  • Urinary system cancer: GSE10320; GSE108088; GSE11024; GSE11482; GSE2712; GSE53224; GSE68956; GSE90633
  • Gastrointestinal system cancer: GSE108088; GSE75271; GSE75284; GSE83518

Pediatrics_B37 datasets or comparisons excluded from integration into OncoHuman

  • Datasets that were redundant: GSE74183 GPL17586 (redundant with GSE75461), GSE2351 GPL96 (redundant with GSE635 GPL96), GSE28497 GPL96 (redundant with GSE33315 GPL96), GSE29686 GPL570 (superseries, replaced by GSE29684 GPL570)
  • Content that was re-assigned to HumanDisease: GSE34309 GPL571, GSE19919 GPL6480
  • Content already in OncoHuman: GSE37418 GPL570, GSE22139 GPL570
  • Methylation content (will be added to future release): GSE102994 GPL13534, GSE36278 GPL13534, GSE44684 GPL13534, GSE49377 GPL13534, GSE52556 GPL13534, GSE54719 GPL13534, GSE56600 GPL13534, GSE61044 GPL13534, GSE73801 GPL13534, GSE77241 GPL13534, GSE92577 GPL13534, GSE95486 GPL13534
  • Very old or unsupported data: E-TABM-1107 GPL6801

DiseaseLand – New datasets

HumanDisease

Figure 4. Distribution of samples added in the latest release of HumanDisease, grouped by Disease Category (excluding normal control and disease control) and color coded by Tissue.

This release adds 3484 samples and 1037 comparisons from 123 datasets, including studies on

  • Cardiovascular disease (atherosclerotic disease, cardiomyopathy of various causes, heart failure, pulmonary hypertension): GSE109048, GSE111782, GSE112630, GSE118882, GSE120567, GSE120836, GSE120895, GSE124026, GSE125126, GSE125990, GSE126198, GSE130036, GSE131793, GSE132651, GSE143953, GSE152669, GSE153555, GSE155495, GSE159243, GSE175739, GSE188238, GSE159610, GSE144932, GSE193776, GSE194079, GSE194080, GSE160145
  • Endocrinology, metabolism, bone (including Type 1 DM): GSE111006, GSE111010, GSE111016, GSE151066, GSE158292, GSE163731, GSE150411
  • Gastroenterology (celiac disease, hepatic disease): GSE146441, GSE126409, GSE164266
  • Infectious disease (viral): GSE141498, GSE155925, GSE155986, GSE164366, GSE166337, GSE24132, GSE135192
  • Neurology: GSE137143, GSE145348, GSE145349
  • Pulmonology: E-MTAB-5029, ERP136980, GSE149413, GSE43402
  • Rheumatology (arthritis): GSE168505, GSE171652, GSE176199, GSE176223, GSE183531, GSE185064
  • Immune cell biology (T cells mainly derived from peripheral blood, cord blood, gut, tonsil and normal and diseased tissues): GSE17851, GSE20934, GSE1460, GSE158439, GSE138851, GSE166327, GSE119732, GSE174779, GSE164276, GSE175550, GSE162051, GSE125916, GSE151073, GSE99374, GSE81408, GSE69090, GSE28200, GSE17354, GSE122941, GSE129906, GSE131743, GSE137380, GSE144108, GSE154928, GSE163260, GSE164086, GSE23663, GSE33374, GSE62095, GSE62096, GSE62097, GSE93902, GSE105095, GSE129251, GSE129356, GSE132799, GSE135452, GSE135936, GSE145527, GSE146438, GSE151204
  • Aging (of muscle tissue, immune system): GSE111010, GSE111006, GSE60216

Datasets or comparisons removed or reprocessed for HumanDisease:

No datasets were removed this release.

As part of our standard review process, comparisons for the following already-Landed projects were revised, and can be identified by the content of “OSModifiedDate”: GSE40240 GPL6244, GSE11909 GPL96, GSE11909 GPL97, GSE35957 GPL570, GSE10161 GPL96, GSE42771 GPL570, GSE66385 GPL16791, GSE74158 GPL10558, GSE101508 GPL10558, GSE13699 GPL6883, GSE4172 GPL570, GSE120852 GPL16791, GSE95038 GPL16686, GSE18876 GPL5175, GSE115348 GPL20301, GSE145358 GPL18573, GSE144826 GPL17692, GSE7247 GPL570, GSE9927 GPL570, GSE11908 GPL96, GSE11908 GPL97, E-MTAB-4377 GPL15433

MouseDisease

Figure 5. Distribution of samples added in the latest release of MouseDisease, grouped by Disease and color coded by Tissue.

This release adds 350 samples and 1037 comparisons from 18 datasets, including studies on

  • Cardiovascular disease (myocarditis, ischemic stroke): GSE155423, GSE107983
  • Endocrinology, metabolism, bone (dyslipidemias, obesity): GSE131348, GSE138810, GSE146470, GSE147412, GSE151182, GSE156254, GSE157201, GSE163652, GSE159882, GSE168676, GSE184836
  • Infectious disease (viral): GSE111861
  • Neurology: GSE111031, E-MTAB-10601
  • Pulmonology (COPD): GSE119257

Datasets or comparisons removed or reprocessed:

No datasets were removed this release.

As part of our standard review process, comparisons for the following already-Landed projects were revised, and can be identified by the content of “OSModifiedDate”: GSE101823 GPL17021, GSE101823 GPL21103, GSE103908 GPL19057, GSE113924 GPL19057, GSE125015 GPL21493, GSE126642 GPL17791, GSE131914 GPL21103, GSE132298 GPL6246, GSE13379 GPL1261, GSE139601 GPL16570, GSE147034 GPL17021, GSE28043 GPL6246, GSE35751 GPL1261, GSE35758 GPL1261, GSE35761 GPL1261, GSE35763 GPL1261, GSE35765 GPL1261, GSE38688 GPL6885, GSE40395 GPL4134, GSE41095 GPL1261, GSE42880 GPL13112, GSE50855 GPL10787, GSE55096 GPL1261, GSE60186 GPL1261, GSE60414 GPL11180, GSE61847 GPL6246, GSE62169 GPL16570, GSE65094 GPL1261, GSE77720 GPL13112, GSE85339 GPL13912, GSE95739 GPL13112

GTEx — Over 1000 new statistical comparisons, plus proteomics data

In the latest update, comparisons were constructed between groups of samples defined by detailed tissue information (TissueDetail_GTEx), sex (Gender) and age (AgeRange[years]). These comparisons revealed statistically significant differences in expression between the following groups:

TissueDetail_GTEx vs others

  • Within a tissue
  • Within a tissue + sex
  • Within a tissue + age range
  • Within a tissue + sex + age range

Male vs female

  • Within a tissue
  • Within a tissue + age range

Age range vs others

  • Within a tissue
  • Within a tissue + sex
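The grouping logic behind these comparison types can be sketched as a "level vs others" split of one factor inside each stratum defined by other fields. This Python sketch only builds the sample groups; it does not reproduce the OmicSoft statistical model. The `SampleId` and `Tissue` columns, the example values and the `min_group` threshold are hypothetical; `TissueDetail_GTEx`, `Gender` and `AgeRange[years]` are the fields named above.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Sample = Dict[str, str]
Comparison = Tuple[str, List[str], List[str]]  # (label, test sample IDs, reference sample IDs)


def one_vs_rest(samples: Sequence[Sample], factor: str, strata: Sequence[str],
                min_group: int = 1) -> List[Comparison]:
    """Build "level vs others" comparisons of `factor` inside each stratum.

    Samples are grouped by their values in the `strata` columns, so
    strata=["Tissue", "Gender"] keeps each comparison within one tissue and one sex.
    A comparison is kept only if both groups have at least `min_group` samples.
    """
    by_stratum: Dict[Tuple[str, ...], List[Sample]] = defaultdict(list)
    for s in samples:
        by_stratum[tuple(s[c] for c in strata)].append(s)

    comparisons: List[Comparison] = []
    for key in sorted(by_stratum):
        group = by_stratum[key]
        where = ", ".join(f"{c}={v}" for c, v in zip(strata, key)) or "all samples"
        for level in sorted({s[factor] for s in group}):
            test = [s["SampleId"] for s in group if s[factor] == level]
            ref = [s["SampleId"] for s in group if s[factor] != level]
            if len(test) >= min_group and len(ref) >= min_group:
                comparisons.append((f"{factor}={level} vs others | {where}", test, ref))
    return comparisons


# Hypothetical samples; "SampleId" and "Tissue" are illustrative column names.
samples = [
    {"SampleId": "a", "Tissue": "Skin", "TissueDetail_GTEx": "Sun Exposed", "Gender": "male", "AgeRange[years]": "20-29"},
    {"SampleId": "b", "Tissue": "Skin", "TissueDetail_GTEx": "Not Sun Exposed", "Gender": "male", "AgeRange[years]": "20-29"},
    {"SampleId": "c", "Tissue": "Skin", "TissueDetail_GTEx": "Sun Exposed", "Gender": "female", "AgeRange[years]": "60-69"},
    {"SampleId": "d", "Tissue": "Skin", "TissueDetail_GTEx": "Not Sun Exposed", "Gender": "female", "AgeRange[years]": "60-69"},
]

# "Age range vs others, within a tissue"
for label, test, ref in one_vs_rest(samples, "AgeRange[years]", ["Tissue"]):
    print(label, test, ref)
# AgeRange[years]=20-29 vs others | Tissue=Skin ['a', 'b'] ['c', 'd']
# AgeRange[years]=60-69 vs others | Tissue=Skin ['c', 'd'] ['a', 'b']
```

For example, `one_vs_rest(samples, "Gender", ["Tissue", "AgeRange[years]"])` corresponds to "male vs female, within a tissue and age range". Because every level is tested against the rest, a two-level factor such as Gender yields both directions (female vs male and male vs female), which a real design would usually reduce to one.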

Figure 6. Schematic of sample groups used in new GTEx statistical comparisons. Comparisons were generated between groups defined along one or more axes: Male vs Female (X-axis), Tissue Detail (Y-axis) and age range (Z-axis).

Figure 7. In the analysis of sun-exposed skin, CXCL9 expression is significantly down-regulated for 20–29-year-old males vs other groups (top panel). Sample-level expression shows a trend of increased CXCL9 expression in sun-exposed skin in older age groups (bottom panel).

In addition, 201 samples with Proteomics data (mass spectrometry) were integrated, allowing exploration and comparison of protein-level data.

Figure 8. CD44 (cell-surface adhesion receptor) RNA-seq expression vs protein detection across tissues in GTEx, plotted using the OmicSoft Studio function “RNA-seq Expression=>MS integration”.

Processing pipeline and curation protocol changes

Changes to curation protocol: New AgeSummary metadata field

The AgeSummary field, which aggregates subject age-related information from multiple curated columns, has been added to relevant projects to aid discovery and visualization of age-related data.

  • Age – Used whenever the source data contain multiple units in a single column
  • Age[unit] – Used when the source data specify the time unit in the column header
    • Age[days]
    • Age[months]
    • Age[weeks]
    • Age[years]
  • AgeRange – Used when age values are provided as ranges within a project; used especially for modeling statistical comparisons between groups
    • AgeRange[days]
    • AgeRange[months]
    • AgeRange[weeks]
    • AgeRange[years]
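To illustrate how a single summary could be derived from these columns, the Python sketch below picks the first non-empty value in an assumed order of preference (Age[unit], then AgeRange[unit], then the free-text Age column) and appends the unit parsed from the column header. The output format and the preference order are assumptions for illustration, not the documented AgeSummary rules.

```python
import re
from typing import Dict, Optional

# Matches curated column names such as "Age[years]" or "AgeRange[months]".
AGE_COLUMN = re.compile(r"^(Age|AgeRange)\[(days|weeks|months|years)\]$")


def age_summary(sample: Dict[str, str]) -> Optional[str]:
    """Combine age-related columns into one summary string (hypothetical format).

    Assumed preference: Age[unit], then AgeRange[unit], then the free-text Age
    column, which may already mix units. Returns None if no age data are present.
    """
    point: Optional[str] = None
    ranged: Optional[str] = None
    for column, raw in sample.items():
        match = AGE_COLUMN.match(column)
        value = raw.strip()
        if match is None or not value:
            continue
        field, unit = match.groups()
        if field == "Age" and point is None:
            point = f"{value} {unit}"
        elif field == "AgeRange" and ranged is None:
            ranged = f"{value} {unit}"
    free_text = sample.get("Age", "").strip() or None
    return point or ranged or free_text


print(age_summary({"AgeRange[years]": "20-29", "DiseaseState": "control"}))  # 20-29 years
print(age_summary({"Age": "6 months"}))  # 6 months
```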