Release date: 2023-07-19
The OmicSoft team invites requests for new OncoLand and DiseaseLand expression projects to curate for upcoming releases. These will be included as part of your subscription.
Let us know which important datasets you would like curated and represented in the Lands. Public expression studies for human, mouse, and rat (GEO, SRA, or ArrayExpress) will be evaluated. Compatible platforms include single-cell transcriptomics, bulk RNA-seq, commercial expression arrays from Affymetrix®, Illumina®, and Agilent®, and mass spectrometry proteomics. Please email ts-bioinformatics@qiagen.com for more information.
We recommend using the latest version (“B38_GC33”) when available, as this is where you will find the newest and most comprehensively curated data. This video demonstrates how to add new Land databases to your dedicated server.
B38_GC33 Land databases include:
If your subscription includes access to OmicSoft Land “text dump” flat file exports, you can request the latest data in the form of a series of indexed tab-delimited tables, or through the MyQDI web interface https://my.qiagendigitalinsights.com.
These files are well suited to large exploratory meta-analyses and machine learning (ML) studies.
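As a rough illustration of working with tab-delimited exports, the sketch below reads a table with Python's standard `csv` module and counts samples per metadata value. The inline table, the `SampleID` column, and the helper names `load_table` and `count_by` are assumptions made for this example; actual export file names and column layouts may differ.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt of a sample metadata table. Real exports will have
# different file names, more columns, and many more rows.
SAMPLE_TSV = (
    "SampleID\tDiseaseState\tTissueCategory\n"
    "S1\ttumor\tkidney\n"
    "S2\ttumor\tkidney\n"
    "S3\tnormal control\tkidney\n"
)

def load_table(handle):
    """Read a tab-delimited table into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))

def count_by(rows, column):
    """Count rows per distinct value of one metadata column."""
    return Counter(row[column] for row in rows)

# For a real export, replace the StringIO object with open(path, newline="").
rows = load_table(io.StringIO(SAMPLE_TSV))
counts = count_by(rows, "DiseaseState")
print(counts.most_common())
```

For larger tables, the same pattern can be applied while streaming rows instead of loading the whole file into memory.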
QIAGEN®'s expert Field Application Scientists routinely hold online trainings for beginning and advanced users of OmicSoft Lands data, showcasing how to utilize these resources to answer scientific questions. See the upcoming webinars, as well as recordings of previous webinars, here: https://digitalinsights.qiagen.com/webinars-and-events/
Methylation data in Land databases such as the new Clinical Proteomic Tumor Land are supported starting with OmicSoft Suite v12.4. To take advantage of these new data, please update your OmicSoft Suite environment to v12.4+. An upcoming release of TCGA will include Methylation array data, which will also require v12.4+.
Starting with the 2023R2 release, QIAGEN OmicSoft is introducing a new Land database focused on cancer proteomics. The initial release includes three cancer proteomics projects from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) database, with 509 samples, 254 statistical comparisons on both RNA-seq and protein expression, and hundreds of clinical metadata fields. Data types include Mass Spectrometry (MS), RNA-seq, miRNA-seq, DNA-seq mutation- and CNV profiling, and methylation data.
To aid exploration of these datasets, definitions of the metadata fields relevant to these studies can be found in this ClinicalProteomicTumor Definitions File, as well as in tooltips within OmicSoft Studio.
Figure 1. Correlation of statistical comparisons of CD8+ inflamed (interferon-γ signaling) vs CD8− inflamed (platelet degranulation) tumors in clear cell Renal Cell Carcinoma (ccRCC) from PDC000127, at the RNA and protein level. (A) Differentially regulated genes at the RNA level, calculated by DESeq2. (B) Differentially regulated genes at the protein level, calculated by a general linear model. Top up-regulated proteins were selected to illustrate concordance with RNA-level up-regulation. (C) Comparison of direction and magnitude between RNA (X-axis) and protein (Y-axis) differential expression. Genes measured only at the RNA level are plotted with a fold-change = 0 on the Y-axis.
Figure 2. Expression of GBP2 at the RNA and protein level differentiates CD8+ vs other subtypes of ccRCC. (A) Expression of GBP2 at the RNA level, grouped by genetic subtype and OncoSampleType (Primary tumor vs Solid Tissue Normal). GBP2 is elevated across ccRCC tumor subtypes but is highest in the CD8+ inflamed subtype. (B) Expression of GBP2 at the protein level, grouped by genetic subtype and OncoSampleType (Primary tumor vs Solid Tissue Normal). As with the RNA measurements, GBP2 protein is elevated across ccRCC tumor subtypes but is highest in the CD8+ inflamed subtype. (C) Correlation of RNA-seq (log2 FPM) and MS (log2 ratio) expression of GBP5 across pancreatic cancer, glioblastoma, and kidney cancer datasets.
Figure 3. New samples in OncoHuman_B38_GC33, grouped by DiseaseState and colored by TissueCategory.
This release adds 3746 samples and 874 comparisons from 22 datasets on the following topics:
Highlighted studies and topics:
This release includes a dataset of interest to many researchers (GSE186341), containing RNA-seq profiles of cell lines perturbed with 32 kinase inhibitors; this dataset served as the basis for a DREAM Challenge to assess computational algorithms for de novo drug polypharmacology predictions. In OncoHuman, profiles for 1728 samples and 703 statistical comparisons of kinase inhibitor responses are ready for exploration. Use these data to interrogate cell line-specific responses to this set of kinase inhibitors.
Figure 4. Gene expression of RABL6 24 hours after treatment with gefitinib in nine profiled cell lines from GSE186341.
OncoHuman now has all compatible datasets previously found in Hematology_B37 (blood cancers) and MetastaticCancer_B37 (metastasis), meaning you can find nearly all curated oncology datasets in the OncoHuman database. All datasets uplifted from legacy databases have been reviewed and curated to meet the latest curation standards.
Figure 5. New samples in HumanDisease_B38_GC33, grouped by DiseaseState and colored by TissueCategory.
This release adds 2329 samples and 654 comparisons from 52 datasets, including studies on:
Figure 6. Profiled cell types derived from induced pluripotent stem cells (iPSC) and embryonic stem cells (ESC) in HumanDisease, including newly added projects. Filtering for OriginCell = “embryonic stem cells (ESC), induced pluripotent stem cells (iPSC)”, and grouping on the curated CellTypeCategory and CellType parameters, all derived cell types from either ESCs or iPSCs can be explored.
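The filter-and-group step described in the caption can be sketched outside the Land interface as follows. This is only an illustration: the field names OriginCell, CellTypeCategory, and CellType come from the caption, while the `SampleID` field, the sample records, and the helper `group_derived_cell_types` are invented for the example.

```python
from collections import defaultdict

# Hypothetical sample records; the values are invented for illustration.
samples = [
    {"SampleID": "A", "OriginCell": "induced pluripotent stem cells (iPSC)",
     "CellTypeCategory": "neuron", "CellType": "motor neuron"},
    {"SampleID": "B", "OriginCell": "embryonic stem cells (ESC)",
     "CellTypeCategory": "neuron", "CellType": "motor neuron"},
    {"SampleID": "C", "OriginCell": "embryonic stem cells (ESC)",
     "CellTypeCategory": "cardiomyocyte", "CellType": "cardiomyocyte"},
    {"SampleID": "D", "OriginCell": "fibroblast",
     "CellTypeCategory": "fibroblast", "CellType": "dermal fibroblast"},
]

# The two OriginCell values selected in the caption's filter.
STEM_CELL_ORIGINS = {
    "embryonic stem cells (ESC)",
    "induced pluripotent stem cells (iPSC)",
}

def group_derived_cell_types(records, origins=STEM_CELL_ORIGINS):
    """Keep records whose OriginCell is in `origins`, then group sample IDs
    by (CellTypeCategory, CellType)."""
    groups = defaultdict(list)
    for rec in records:
        if rec["OriginCell"] in origins:
            key = (rec["CellTypeCategory"], rec["CellType"])
            groups[key].append(rec["SampleID"])
    return dict(groups)

grouped = group_derived_cell_types(samples)
for (category, cell_type), ids in sorted(grouped.items()):
    print(category, cell_type, len(ids))
```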
As part of our standard review process, comparisons for the following already-Landed projects were revised, and can be found by an updated "OSModifiedDate":
This release adds 811 samples and 981 comparisons from 27 datasets, including studies on:
As part of our standard review process, comparisons for the following already-Landed projects were revised and can be found by an updated "OSModifiedDate": GSE130102
This release adds 502 samples and 1061 comparisons from 20 datasets, including studies on:
If you have further questions, please contact your local QIAGEN® representative or contact our Technical Support Center at www.qiagen.com/support/technical-support