
Latest improvements for QIAGEN OmicSoft Lands


QIAGEN OmicSoft Lands 1.2

Release date: 2023-02-07

QIAGEN OmicSoft Lands release notes 2022R4

Here is a quick overview of what is new in this release:

  • Hundreds of ‘omics studies have been added to OncoHuman and DiseaseLand.
  • CCLE has been updated to DepMap 2022Q2.
  • BeatAML, METABRIC and CellLine Lands have been updated to Human Genome 38.
  • New Single Cell Land datasets and new CellType vs Others comparisons have been added.

How to get the most out of your OmicSoft subscription

Request new data curation

The OmicSoft team invites requests for new OncoLand and DiseaseLand expression projects to curate for upcoming releases. These will be included as part of your subscription.

Let us know which important datasets you would like curated and represented in the Lands. Public expression studies for Human, Mouse and Rat (GEO, SRA or ArrayExpress) will be evaluated. Compatible platforms include single-cell transcriptomics, bulk RNA-seq and commercial expression arrays from Affymetrix, Illumina and Agilent. Please email omicsoft.support@qiagen.com for more information.

We are also evaluating proteomics datasets for curation starting later this year.

Use the latest Land database versions

We recommend using the latest version (“B38_GC33”) when available, as this is where you’ll find the newest and most comprehensively curated data.

B38_GC33 Lands include OncoHuman, HumanDisease, GTEx, Blueprint, CCLE, ENCODE_RNAbinding, TCGA, TARGET, TRACERx, METABRIC, CellLines and BeatAML, in addition to the latest Single Cell Lands.

Please reach out to your OmicSoft Server administrator to remind them to download the latest Land databases if you notice any missing databases. This video provides a concise explanation of what your OmicSoft Server administrator will do.

Use flat-file downloads of Land databases

If your subscription includes access to OmicSoft Land “text dump” flat-file exports, you can request the latest data in the form of a series of indexed tab-delimited tables or through the MyQDI web interface.

These files are perfect for larger exploratory meta-analysis and ML studies. If you are interested in access to the latest data via flat file, ask your OmicSoft account administrator to request the links.
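As a rough illustration of how tab-delimited tables like these might be used in a meta-analysis, here is a minimal Python sketch. The two tables, their column names (SampleID, DiseaseCategory, Gene, Value) and all values are hypothetical placeholders, not the actual export layout.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-ins for two exported tables; real exports will have
# different file names, columns and contents.
SAMPLE_TSV = "SampleID\tDiseaseCategory\nS1\tbreast cancer\nS2\tnormal control\n"
EXPR_TSV = "SampleID\tGene\tValue\nS1\tERBB3\t5.2\nS2\tERBB3\t1.1\n"


def read_tsv(text):
    """Parse tab-delimited text into a list of dict rows keyed by header."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def mean_by_category(samples, expression, gene):
    """Average one gene's values per DiseaseCategory, joining on SampleID."""
    category = {row["SampleID"]: row["DiseaseCategory"] for row in samples}
    values = defaultdict(list)
    for row in expression:
        if row["Gene"] == gene and row["SampleID"] in category:
            values[category[row["SampleID"]]].append(float(row["Value"]))
    return {cat: sum(vals) / len(vals) for cat, vals in values.items()}


result = mean_by_category(read_tsv(SAMPLE_TSV), read_tsv(EXPR_TSV), "ERBB3")
print(result)
```

With real exports, the in-memory strings would be replaced by files opened from disk, and the column names adjusted to match the delivered tables.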

Attend live and on-demand webinars

QIAGEN's expert Field Application Scientists routinely hold online trainings for beginning and advanced users of OmicSoft Lands data, showcasing how to use these resources to answer scientific questions. See the upcoming webinars, as well as recordings of previous webinars, here: https://digitalinsights.qiagen.com/webinars-and-events/

Learn more about OmicSoft’s new APIs

OmicSoft Lands are now searchable with new Python and R APIs that use SQL syntax to search across all databases and find exactly the data you are interested in. Learn more in this blog post.

Figure 1. Schematic of the new OmicSoft Lands API. Python or R API clients submit SQL queries to the OmicSoft Lands API query engine, which searches all curated data in the Lands collection. Matching datasets are quickly returned to the client environment for analysis.
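To give a feel for the kind of SQL-style search described here, the sketch below runs a query against an in-memory SQLite table. It does not use the OmicSoft API: Python's built-in sqlite3 module stands in for the query engine, and the table name, column names and rows are all hypothetical.

```python
import sqlite3

# sqlite3 is only a stand-in for the query engine; the schema and rows
# below are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE samples (land TEXT, sample_id TEXT, disease_category TEXT)"
)
conn.executemany(
    "INSERT INTO samples VALUES (?, ?, ?)",
    [
        ("OncoHuman_B38_GC33", "S1", "breast cancer"),
        ("CellLine_B38_GC33", "S2", "breast cancer"),
        ("CellLine_B38_GC33", "S3", "breast cancer"),
        ("CCLE_B38_GC33", "S4", "lung cancer"),
    ],
)

# One query searches across rows from several Lands at once.
rows = conn.execute(
    "SELECT land, COUNT(*) FROM samples "
    "WHERE disease_category = ? GROUP BY land ORDER BY land",
    ("breast cancer",),
).fetchall()
conn.close()
print(rows)
```

The point of the sketch is the query shape (filter on a curated metadata field, then group by Land); the actual client calls and schema are described in the API documentation.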

OncoHuman Updates

Figure 2. New samples in OncoHuman_B38_GC33, grouped by DiseaseCategory.

Summary

This release adds 6926 samples and 432 comparisons from 95 datasets. In addition, this release incorporates 3316 samples and 584 comparisons from 67 datasets of Hematology_B37 Land, which were revised and aligned to current curation standards and added to OncoHuman.

Highlighted dataset

  • Leucegene project: GSE49601 GPL11154, GSE62190 GPL11154 and GSE67039 GPL11154 were revised to include new metadata.
  • Colorectal peritoneal metastases study GSE190609, with paired primary tumor samples, was added.

Figure 3. Heatmap of RNA-seq expression of the top 109 differentially expressed genes between primary colorectal cancers and paired peritoneal metastases. Samples were grouped by OncoSampleType (Metastatic vs Primary Tumor) and SubjectID to confirm consistent differential expression between the groups, which are indicated with colored bars at the bottom border.

Highlighted topics

  • Breast cancer: GSE199135, GSE167213, GSE193542, GSE210399, GSE165914, GSE178708, GSE157284, GSE162228
  • Ovarian cancer: GSE191231, GSE193875, GSE201203, GSE117765, GSE190902, GSE195984
  • Colorectal cancer: GSE167395, PRJNA814344, GSE179979, GSE146587, GSE161023, GSE164541, GSE196006, GSE162960, GSE159216, GSE190609, GSE180440, GSE200427, PRJEB41875, GSE128213, GSE178120, GSE183202, GSE132024, GSE209746, GSE158559, GSE106584, GSE170999, GSE183984, GSE197802, GSE18088, GSE80606, PRJNA816986, GSE157004
  • Gastric cancer: GSE84433, GSE84426, GSE183136
  • Hematologic cancers (ALL, AML, MM and others): GSE148658, GSE137768, GSE115895, GSE115464, GSE95648, GSE165405, GSE138803, GSE138659, GSE37389, GSE150372, GSE127180, GSE39041, GSE114085, GSE147931, GSE174537, GSE72213, GSE138717, PRJEB30312
  • Lung cancer: GSE74777, GSE141755, GSE142186, GSE162353, GSE133518
  • Relapsed or refractory disease (various cancers): GSE195933, GSE151594, GSE183817, GSE171806, GSE162095
  • Drug efficacy studies (mostly treated vs control): GSE151594, GSE146362, GSE155559, GSE202434, GSE157982, GSE171806, GSE193542, GSE210399, GSE191231, GSE201203, GSE183984, GSE84433, GSE84426, GSE183136, GSE196038, GSE150372, GSE152755, GSE120844, GSE138717, GSE133518, GSE199107
  • Studies containing paired samples (primary tumor–metastasis, tumor–nontumor): GSE162228, GSE146587, GSE164541, GSE196006, GSE190609, GSE180440, GSE200427, GSE128213, GSE167488, GSE38476
  • Studies investigating the value of different prognostic biomarkers: GSE197802, GSE18088, GSE183136, GSE74777, GSE138717, GSE99420
  • Other additions: GSE174302, GSE182824, GSE176559, GSE142514, GSE54268, GSE54267, GSE54266, GSE54265, GSE54264, GSE51984, GSE169038, GSE178631

 

Removed or reprocessed datasets or comparisons

No datasets were removed from this release.

As part of our standard review process, comparisons for the following already-Landed projects were revised and can be identified with an updated “OSModifiedDate”:

GSE68468 GPL96, GSE61723 GPL16686, GSE147043 GPL18573, GSE26980 GPL5175, GSE94035 GPL11154, GSE13906 GPL570

BeatAML — New data and metadata

This update of the BeatAML study integrates genomic and drug response data published in Bottomly et al., 2022 as well as proteomic data published in Gosline et al., 2022, increasing the utility of these data for those studying AML and related cancers.

CCLE — DepMap updates and updated data dictionary

CCLE_B38_GC33 has been updated to include updates from DepMap 2022Q2, including 28 new cell lines. A new metadata dictionary is available: https://resources.omicsoft.com/downloads/land/CCLE/CCLE_B38_GC33_DataDictionary.xlsx

New metadata fields describe the following parameters:

  • ParentID[DepMap] and SubjectID[DepMap]: The parental cell line IDs and/or the SubjectID from which the cell line was derived.
  • DiseaseState[Cellosaurus] and DiseaseState[Cellosaurus][NCItCode]: The DiseaseState associated with each cell line, as annotated in the Cellosaurus repository.
  • CellDescription: This now includes model manipulation strategies.

Additional OmicSoft CV fields added this release: CellLine, CellType, AgeCategory, GeneDependency[XPR1][PMID: 35437317], PairingType, PairingStatus, AgeSummary, SampleMaterial, SampleType, Molecule

CellLine Land — Integrating GSK, NCI and Pfizer cell-line profiling datasets

This release includes the new CellLine_B38_GC33 Land, which combines and updates the Lands CellLine_GSK_B37, CellLine_NCI_B37 and CellLine_Pfizer_B37 with new metadata and the latest standards. Use these data as a complement to CCLE_B38_GC33 to explore ‘omics and metadata information for nearly 2000 cell lines.

Figure 4. Cell lines available in CellLine Land CellLine_B38_GC33. CellLine Land combines multi-omics data from three profiling projects (GSK, NCI and Pfizer) that are grouped on the Y-axis and colored according to the curated Histology metadata field.

METABRIC — Breast cancer multi-omics study

Data from the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) trial have been re-analyzed and re-curated on Human Genome 38 and Gencode.V33. The Land contains 4128 samples with curated metadata: CNV (1992 samples) and microarray Expression Intensity Probes (2136 samples).

Figure 5. Distribution of METABRIC samples grouped by GeneticSubtype, and colored according to OncoSampleType.

DiseaseLand updates

Figure 6. Distribution of new samples added to HumanDisease, MouseDisease and RatDisease (excluding controls), grouped by DiseaseCategory.

HumanDisease

This release of HumanDisease adds 3064 samples and 541 comparisons from 75 datasets.

Highlighted topics

  • Aging: GSE164012, GSE150137, GSE148219
  • CNS diseases (bipolar disorder, epilepsy, schizophrenia): GSE134497, GSE7624, GSE62699, GSE191248, GSE174704, GSE165604, GSE133534, GSE121376, GSE119290, GSE93577, GSE26629
  • Immune-cell profiling: GSE174284, GSE128163, GSE81975, GSE55843, GSE184784, GSE168642, GSE151079, GSE112923
  • Diabetes mellitus: GSE203346, GSE164416, GSE159984, GSE157988, GSE156903, GSE193273, GSE164338, GSE113969, GSE156248, GSE166502, GSE166467, GSE161355, GSE156061
  • Immune-mediated diseases (IBD, psoriasis): GSE207022, GSE206285, GSE201397
  • Viral diseases (HIV, CoV, mononucleosis, Zika): GSE132228, GSE85599, GSE152418, GSE151453, GSE144585, GSE168658
  • Transplant: GSE192444 (the value of monitoring cfDNA to assess transplant rejection), GSE146495, GSE145780
  • Metabolic diseases (dyslipidemia, obesity): GSE126352, GSE156247, GSE197285, GSE159955, GSE144414
  • Kidney disease: GSE163603, GSE175759
  • Other additions: GSE111977, GSE117887, GSE12293, GSE125805, GSE125999, GSE128367, GSE129247, GSE134048, GSE134555, GSE140844, GSE141136, GSE143692, GSE158312, GSE159337, GSE159924, GSE163244, GSE181076, GSE181258, GSE182875, GSE53667, GSE54112, PRJEB20634, PRJNA736745, syn21861227

Removed or reprocessed datasets or comparisons

No datasets were removed from this release.

As part of our standard review process, comparisons for the following already-Landed projects were revised, and can be identified with an updated “OSModifiedDate”:

GSE30933 GPL6255, GSE35957 GPL570, GSE74988 GPL11532, GSE137619 GPL21290, GSE7036 GPL570, GSE33070 GPL6244, GSE16031 GPL6106, GSE16031 GPL6097

 

MouseDisease

This update of MouseDisease adds 1499 samples and 564 comparisons from 62 datasets.

Highlighted topics

  • Atherosclerosis: GSE163657, GSE193118, GSE164517, GSE90835, GSE104914, GSE143162, GSE94044, GSE120565, GSE109259, GSE102558, GSE179952, GSE116569, GSE118463, GSE191044, GSE180649, GSE93954
  • Stroke: GSE173714, GSE173713, GSE137482, GSE116878, GSE128623
  • Metabolic disease or dyslipidemia: GSE102072, GSE120120, GSE136792, GSE136797, GSE135734, GSE125946
  • Liver disease: GSE179394
  • Kidney disease: GSE69556
  • Diabetes mellitus: GSE153431, GSE142204
  • Embryo or fetal tissue profiling: GSE55966, GSE72491, GSE33979
  • Adult tissue profiling: GSE74747, GSE33141, GSE67991, GSE53105, GSE65388, GSE77997, GSE63810 (brain, heart, kidney, liver, lung, skin, spleen, testis, thymus, eye or retina, pancreas, bone, hair follicle)
  • Other additions: GSE12293, GSE126481, GSE134005, GSE140369, GSE162660, GSE163060, GSE164672, GSE185734, GSE186971, GSE190156, GSE190812, GSE22131, GSE45278, GSE58261, GSE60243, GSE68155, GSE68283, GSE68284, GSE72095, GSE72165, PRJEB20634

Removed or reprocessed datasets or comparisons

No datasets were removed from this release.

As part of our standard review process, comparisons for the following already-Landed projects were revised, and can be identified with an updated “OSModifiedDate”: GSE40156 GPL8321, GSE10000 GPL8321, GSE29752 GPL11002

RatDisease

This RatDisease update adds 492 samples and 202 comparisons from 20 datasets.

Highlighted topics

  • Brain injury and stroke: GSE115614, GSE171144, GSE148350, GSE162072
  • Immune-cell profiling: GSE156188
  • Metabolic disease, obesity: GSE149829, GSE176298
  • Osteoarthritis: GSE99021
  • Tissue profiling: retina GSE133563 (glaucoma model), GSE110675; muscle GSE118825 (aging), GSE162565 (muscle repair)
  • Cardiovascular disease: GSE130102, GSE159722, GSE107551, GSE135172
  • Other additions: GSE114031, GSE131012, GSE141650, GSE147732

Removed or reprocessed datasets or comparisons

No datasets were removed from this release.

As part of our standard review process, comparisons for the following already-Landed projects were revised, and can be identified with an updated “OSModifiedDate”: GSE57800 GPL1355, GSE57805 GPL1355, GSE57815 GPL1355, GSE57811 GPL1355

Single Cell Lands

The latest release of Single Cell Lands adds 77 new projects with 4294 comparisons to HumanUmi_B38_GC33 and 18 new projects with 664 new comparisons to MouseUmi_B38_GC33.

In addition, all data from HumanUmiLite_B38_GC33 and MouseUmiLite_B38_GC33 are now within HumanUmi_B38_GC33 and MouseUmi_B38_GC24; you can safely delete the “Lite” Lands.

New to this release are curated “cell type” comparisons, which complement the comparisons between computationally identified clusters to reveal differences between distinct cell types.

For example, comparisons can be found between Schwann cells and other cell types in a CellMap-dimension reduction analysis. In each comparison, all clusters curated as “Schwann cell” in a CellMap were grouped together and compared to all other cells in the CellMap. This analysis reveals genes that are particularly up- or down-regulated in Schwann cells, such as ERBB3. Other visualizations, such as the Percentage Cells Expressing and Gene Expression Overlay Views, can confirm that this is supported by evidence from multiple projects.
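The grouping logic behind a "cell type vs others" comparison can be sketched in a few lines of Python. This is an illustration only: the statistical method used for the Land comparisons is not described here, so the sketch simply takes a difference of mean expression values (assumed to be on a log scale), and the field names and numbers are invented.

```python
from statistics import mean


def cell_type_vs_others(cells, cell_type):
    """Difference in mean expression: cells of `cell_type` minus all others.

    Every cluster curated with the given cell type is pooled into one group,
    and everything else in the same CellMap forms the comparison group.
    """
    target = [c["expr"] for c in cells if c["cell_type"] == cell_type]
    others = [c["expr"] for c in cells if c["cell_type"] != cell_type]
    if not target or not others:
        raise ValueError("both groups need at least one cell")
    return mean(target) - mean(others)


# Hypothetical cells from one CellMap, with one gene's expression value each.
cells = [
    {"cluster": 1, "cell_type": "Schwann cell", "expr": 4.0},
    {"cluster": 2, "cell_type": "Schwann cell", "expr": 6.0},
    {"cluster": 3, "cell_type": "fibroblast", "expr": 1.0},
    {"cluster": 4, "cell_type": "macrophage", "expr": 2.0},
]

diff = cell_type_vs_others(cells, "Schwann cell")
print(diff)
```

Note that clusters 1 and 2 are pooled because they share a curated cell type, which is what distinguishes this comparison from a cluster-vs-cluster comparison.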

Figure 7. After searching for pre-computed comparisons between Schwann cells and other cells, the top up-regulated genes were visualized with the Significant Genes table, revealing genes including ERBB3, CADM3 and CD9.

Figure 8. Searching for ERBB3 expression across selected datasets that included Schwann cells reveals consistently high expression across studies, as shown in the Percentage Expressing Cells plot in which cells are grouped by ClusterCellType and ProjectName.

Figure 9. The Gene Expression Overlay plot reveals up-regulation of ERBB3 in multiple CellMaps (top panel) with curated Schwann Cell populations (bottom panel). Blue arrows indicate curated Schwann Cell clusters, which were compared to all other cells in the same CellMap in the new Cell Type vs Others comparisons.

Revisions to the OmicSoft curation protocol

Project.StudyRevision is a new Project metadata field that captures significant differences between the OmicSoft representation of a dataset and the original dataset (i.e., as found in the sources). These significant differences do not include standard transformations, such as data reprocessing through OS pipelines, metadata formatting due to the use of controlled vocabularies or application of the curation protocol.

For example, this field will be used by curators to capture changes caused by additional input from the authors regarding the dataset, after they were contacted to clarify metadata inconsistencies.

Whenever landed metadata has been altered by the addition of author input, Project.StudyRevision will contain “Authors Contacted” and Project.Comments will end with the same string (“Authors Contacted”) followed by a short explanation of the revisions made according to the information received from the authors.
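For users who work with exported project metadata, the convention above could be checked with a small helper like the following Python sketch. The function name, the separator handling and the example strings are assumptions made for illustration; only the "Authors Contacted" marker and the two field names come from the protocol described above.

```python
MARKER = "Authors Contacted"


def author_revision_note(study_revision, comments):
    """Return the explanation that follows the marker in Project.Comments.

    Returns None when Project.StudyRevision does not carry the marker, or
    when the marker is missing from the comments. The exact separator after
    the marker is not specified, so leading spaces and colons are dropped.
    """
    if MARKER not in study_revision:
        return None
    _, found, note = comments.rpartition(MARKER)
    if not found:
        return None
    return note.lstrip(" :").rstrip() or None


# Hypothetical field values for one project.
note = author_revision_note(
    "Authors Contacted",
    "Ages curated from the paper. Authors Contacted: tissue labels for two samples were corrected.",
)
print(note)
```

Using rpartition picks up the last occurrence of the marker, which matches the statement that Project.Comments ends with the marker followed by the explanation.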

DiseaseState curation for CCLE_B38_GC33 2022R4 uses CCLE sources (repository and paper) rather than the cell line descriptions defined in the OmicSoft ontology, to maintain consistency with CCLE terminology.