
Release date: 2020-06-26
In this release: Major TARGET Pediatric Cancer update; integration of all Pediatrics Land data into OncoHuman_B38_GC33; over 1000 new statistical comparisons derived from GTEx data and new projects in HumanDisease and MouseDisease.
The OmicSoft team invites requests for new OncoLand, DiseaseLand and Single Cell Land expression projects to curate for upcoming releases. These will be included as part of your subscription.
Let us know if there are important datasets that you would like to see curated and represented in the Lands. Public expression studies for human, mouse and rat (GEO, SRA or ArrayExpress) will be evaluated. Compatible data types include single-cell transcriptomic projects, bulk RNA-seq projects and commercial expression arrays from Affymetrix, Illumina and Agilent. Please email omicsoft.support@qiagen.com for more information.
We are also evaluating proteomics datasets for curation starting later this year.
Updates to Lands data are being processed on Human genome B38 and GENCODE Gene Model, version 33 (“B38_GC33” Lands), including OncoHuman, HumanDisease, GTEx, Blueprint, CCLE, ENCODE_RNAbinding, TCGA, TARGET and TRACERx, in addition to the latest Single Cell Lands.
We recommend using the B38_GC33 version when available, as this is where you’ll find the newest and most comprehensively curated data.
If your subscription includes access to OmicSoft Land “text dump” flat file exports, you can request the latest data in the form of a series of indexed tab-delimited tables. These files are perfect for larger exploratory meta-analysis and ML studies. If you are interested in access to the latest data via flat file, ask your OmicSoft account administrator to request the links. Keep an eye out for an announcement of new Lands APIs for powerful cross-database queries.
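As an illustration of how such tab-delimited exports might be consumed, the following minimal Python sketch loads a table and indexes its rows by one column. The column names (SampleID, DiseaseState, Tissue) and the in-memory example rows are assumptions made for this example; the actual file names and columns depend on the export you receive.

```python
import csv
import io


def load_tsv(handle, key):
    """Read a tab-delimited table and index its rows by the `key` column."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[key]: row for row in reader}


# Tiny in-memory stand-in for an exported table.
# Column names and values are hypothetical, for illustration only.
example = io.StringIO(
    "SampleID\tDiseaseState\tTissue\n"
    "S1\tneuroblastoma\tadrenal gland\n"
    "S2\tWilms tumor\tkidney\n"
)

samples = load_tsv(example, key="SampleID")
print(samples["S2"]["DiseaseState"])
```

For a real export, the same function can be given an open file handle, for example `open(path, newline="", encoding="utf-8")`, in place of the in-memory text.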
The Therapeutically Applicable Research to Generate Effective Treatments (TARGET) program applies a comprehensive genomic approach to determine molecular changes that drive childhood cancers.
For this release, TARGET data have been reprocessed for Human Genome B38 and GENCODE gene model, version 33. As of publication, this corresponds to the latest data release, August 24, 2020.
Data updates include a 50% increase in the number of samples (6127 samples) for 19 diseases, 317 curated metadata fields, including information from 41 publications reviewed for this update, and 98 statistical comparisons.
Data types available for exploration in this update include Copy Number Variation (2719 samples), RNA-seq (1491), DNA-seq somatic mutation (2026) and miRNA-seq (2425).
Figure 1. Distribution of samples in TARGET_B38_GC33, grouped by DiseaseState. Bars are color coded according to OncoSampleType, which includes 19 pediatric cancers.
As part of this major update to TARGET Land, nearly 100 statistical comparisons between groups of samples were modeled by the OmicSoft team, revealing differences between genetic subtype, gender, histology, metastatic status and more.
Figure 2. Differential expression of genes from patients with acute lymphocytic leukemia (ALL) grouped by genetic subtype (TAL1 subtype vs HOXA subtype) reveals expression-based markers differentiating these populations (top panel); significantly higher expression of PlexinD1 was observed for ALL patients with HOXA genetic subtype compared to those with the TAL1 genetic subtype (bottom panel).
Figure 3. Distribution of samples added to the latest release of OncoHuman. Samples are grouped by Disease Category and color coded according to Tissue Category.
This release adds 2426 samples and 230 comparisons from 43 datasets included in the following therapeutic areas:
As part of our standard review process, comparisons for the following already-Landed projects were revised. These comparisons can be identified by the content of “OSModifiedDate”:
GSE33939 GPL570, GSE6566 GPL570, GSE145062 GPL20301, GSE102886 GPL10558, GSE114453 GPL20301, GSE11151 GPL570, GSE15824 GPL570, GSE5824 GPL96, GSE16354 GPL570, GSE6740 GPL96, GSE138077 GPL20795, GSE115853 GPL20795, GSE140101 GPL16791, GSE25087 GPL570, GSE10797 GPL571, GSE51798 GPL570, GSE162894 GPL24676
To enable better cross-project analysis of oncology projects including pediatric cancers, Pediatrics Land has been recurated and incorporated into OncoHuman_B38_GC33.
This release includes 9384 samples and 542 comparisons from 144 datasets originally found in Pediatrics_B37.
Figure 4. Distribution of samples added in the latest release of HumanDisease, grouped by Disease Category (excluding normal control and disease control) and color coded by Tissue.
This release adds 3484 samples and 1037 comparisons from 123 datasets, including studies on
As part of our standard review process, comparisons for the following already-Landed projects were revised, and can be identified by the content of “OSModifiedDate”: GSE40240 GPL6244, GSE11909 GPL96, GSE11909 GPL97, GSE35957 GPL570, GSE10161 GPL96, GSE42771 GPL570, GSE66385 GPL16791, GSE74158 GPL10558, GSE101508 GPL10558, GSE13699 GPL6883, GSE4172 GPL570, GSE120852 GPL16791, GSE95038 GPL16686, GSE18876 GPL5175, GSE115348 GPL20301, GSE145358 GPL18573, GSE144826 GPL17692, GSE7247 GPL570, GSE9927 GPL570, GSE11908 GPL96, GSE11908 GPL97, E-MTAB-4377 GPL15433
Figure 5. Distribution of samples added in the latest release of Mouse Disease, grouped by Disease and color coded by Tissue.
This release adds 350 samples and 1037 comparisons from 18 datasets, including studies on
As part of our standard review process, comparisons for the following already-Landed projects were revised, and can be identified by the content of “OSModifiedDate”: GSE101823 GPL17021, GSE101823 GPL21103, GSE103908 GPL19057, GSE113924 GPL19057, GSE125015 GPL21493, GSE126642 GPL17791, GSE131914 GPL21103, GSE132298 GPL6246, GSE13379 GPL1261, GSE139601 GPL16570, GSE147034 GPL17021, GSE28043 GPL6246, GSE35751 GPL1261, GSE35758 GPL1261, GSE35761 GPL1261, GSE35763 GPL1261, GSE35765 GPL1261, GSE38688 GPL6885, GSE40395 GPL4134, GSE41095 GPL1261, GSE42880 GPL13112, GSE50855 GPL10787, GSE55096 GPL1261, GSE60186 GPL1261, GSE60414 GPL11180, GSE61847 GPL6246, GSE62169 GPL16570, GSE65094 GPL1261, GSE77720 GPL13112, GSE85339 GPL13912, GSE95739 GPL13112
In the latest update, comparisons were constructed between samples using detailed tissue information (TissueDetail_GTEx), sex (Gender) and age (AgeRange[years]). These comparisons revealed statistically relevant differences in expression for the following contrasts:
TissueDetail_GTEx vs others
Male vs female
Age range vs others
Figure 6. Schematic of sample groups used in new GTEx statistical comparisons. Comparisons were generated between groups defined along one or more axes: Male vs Female (X-axis), Tissue Detail (Y-axis) and age range (Z-axis).
Figure 7. In the analysis of sun-exposed skin, CXCL9 expression is significantly down-regulated for 20–29 year-old males vs other groups (top panel). Sample-level expression shows a trend of increased expression of CXCL9 in sun-exposed skin in older age groups (bottom panel).
In addition, 201 samples with Proteomics data (mass spectrometry) were integrated, allowing exploration and comparison of protein-level data.
Figure 8. CD44 (cell-surface adhesion receptor) RNA-seq expression vs protein detection across tissues in GTEx was plotted using the OmicSoft Studio function “RNA-seq Expression=>MS integration”.
The AgeSummary field, which aggregates subject age-related information from multiple curated columns, has been added to relevant projects to aid discovery and visualization of age-related data.
The OmicSoft team invites requests for new OncoLand and DiseaseLand expression projects to curate for upcoming releases, as part of your subscription.
OmicSoft continues to reprocess our most popular Land databases to move from Human genome B37 to Human genome B38 and Gencode Gene Model version 33. We recommend using the latest version (“B38_GC33”) when available, as this is where you’ll find the newest and most comprehensively curated data. Lands available in B38_GC33 include OncoHuman, GTEx, TCGA, Blueprint, CCLE, HumanDisease and TRACERx, in addition to the latest Single Cell Lands.
Figure 1. Distribution of samples added in the latest release of OncoHuman, grouped by Disease Category and colored by Tissue Category.
This release of OncoHuman_B38_GC33 adds 2954 samples and 404 comparisons from 69 projects, including studies on:
As a reminder, all new oncology-focused project requests will be added to the integrated OncoHuman database, including hematology, pediatrics and metastatic data.
Figure 2. RNA-seq expression of key differentially expressed genes between genetic subtypes of pancreatic ductal adenocarcinoma, from study E-MTAB-6830. Pre-computed comparisons from this study were used to identify top differentially expressed genes; these genes were then searched to generate a Gene FPKM heatmap, sorted by the curated field GeneticSubType.
Figure 3. Distribution of samples added in the latest release of Human Disease, grouped by Disease Category and colored by Tissue Category.
This release of HumanDisease_B38_GC33 adds 7555 samples and 1293 comparisons from 99 project IDs, including studies on:
Figure 4. Differential expression of 30 genes identified as being consistently differentially regulated between pairs of treatments in inflammatory bowel disease peripheral blood mononuclear cells (PBMCs) from GSE137680. Extensive pre-computed comparisons between groups of samples treated with various stimuli enable quick identification and refinement of signatures for treatments.
Figure 5. Distribution of samples added to MouseDisease in the latest release, grouped by Disease State and colored by Tissue Category.
This release adds 774 samples and 1731 comparisons from 32 project IDs, including studies on:
Figure 6. Distribution of samples added to RatDisease in the latest release, grouped by Tissue Category and colored by Disease Category.
This release adds 264 samples and 98 comparisons from 17 project IDs, including studies on:
Our Field Application Scientists routinely host online webinars on the basics of QIAGEN OmicSoft data exploration and advanced use cases to help you answer your scientific questions more quickly.
Browse upcoming webinars for OmicSoft and IPA.
Explore our extensive repository of webinars and video tutorials.
Have a topic of interest that you would love to see covered in a webinar? Email your requests to ts-bioinformatics@qiagen.com.
We’d love to meet you at our IPA and OmicSoft user meeting in Boston on September 22 and 23. Check out our registration page for more information and to save your spot.
We are currently planning an IPA and OmicSoft user meeting in London in October. Keep an eye out for your invitation to arrive next month, and reach out to your sales rep with questions or to express interest.
Review the full details of the OmicSoft 2022R2 release here.
Release notes on all past releases are found here.
Learn more about the QIAGEN OmicSoft portfolio here.
The OmicSoft team is now taking requests for new OncoLand and DiseaseLand expression projects to curate for upcoming releases, which will be included as part of your subscription.
Let us know if there are important datasets that you would like to see curated and represented in the Lands. Public (GEO, SRA and ArrayExpress) expression studies for human, mouse and rat will be evaluated. Compatible data types include single-cell transcriptomic projects, bulk RNA-seq projects and commercial expression arrays from Affymetrix, Illumina and Agilent. Please contact us at omicsoft.support@qiagen.com for more information.
OmicSoft continues to reprocess our most popular Land databases for the move from Human genome B37 to Human genome B38 and Gencode Gene Model version 33. We recommend that you use the latest version ("B38_GC33") when available, as you will find the newest and most comprehensively curated data here. Lands available in B38_GC33 include OncoHuman, GTEx, TCGA, Blueprint, CCLE, HumanDisease and TRACERx, in addition to the latest Single Cell Land data.
If your subscription includes access to OmicSoft Land "text dump" flat-file exports, you can request the latest data in the form of a series of indexed tab-delimited tables. These files are perfect for large exploratory meta-analyses and machine learning studies. If you are interested in accessing the latest data via flat files, ask your OmicSoft account administrator to request the links.
Figure 1. New samples in OncoHuman, grouped by DiseaseState and colored by TissueCategory.
This release adds studies on cancers of the gastrointestinal system, central nervous system, prostate, breast, ovary, osteo-articular system, and head and neck, as well as on melanoma.
The datasets explore the gene-expression profiling associated with the influence of tumor microenvironment, drug-induced gene modulation, tumor–nontumor paired samples, metastases, 3D cultures, organoids and xenograft models. Projects focusing on gene editing tools (CRISPR-Cas9) are also included.
Included in this release are studies that explore the gene-expression profiling associated with the mechanisms of action of immunomodulators in vitro, pre–post treatment paired samples and CRISPR-Cas9 genome editing as well as projects included in the online resource for interactive exploration of hematopoietic cancer data (Hemap).
All projects added: GSE101209, GSE10258, GSE105083, GSE105130, GSE105439, GSE106272, GSE109319, GSE11118, GSE112221, GSE114326, GSE114856, GSE115853, GSE119688, GSE120647, GSE124189, GSE128405, GSE12902, GSE131792, GSE132215, GSE132233, GSE132624, GSE134173, GSE137245, GSE137528, GSE138581, GSE140077, GSE141116, GSE141444, GSE141605, GSE142719, GSE147147, GSE147704, GSE148372, GSE148444, GSE148483, GSE149327, GSE152312, GSE153713, GSE155547, GSE159493, GSE160401, GSE162945, GSE163639, GSE163646, GSE163817, GSE18832, GSE18866, GSE19681, GSE2842, GSE65867, GSE70926, GSE71519, GSE71520, GSE7538, GSE77314, GSE78025, GSE80333, GSE82110, GSE83479, GSE84023, GSE86518, GSE8685, GSE8687, GSE9250, GSE95499, GSE97098
The Encyclopedia of DNA Elements (ENCODE) Consortium is an ongoing international collaboration of research groups funded by the National Human Genome Research Institute (NHGRI). The goal of ENCODE is to build a comprehensive parts list of functional elements in the human genome, including elements that act at the protein and RNA levels as well as regulatory elements that control the cells and circumstances in which a gene is active.
ENCODE_RNAbinding_B38_GC33 allows in-depth exploration of the splicing and gene-expression impacts of loss of hundreds of RNA- and DNA-associated genes, generated as part of the ENCODE Consortium.
This release contains RNA-seq experiments of 1122 samples (561 ENCODE experiments) for two popular cell lines (K562 and HEPG2) after shRNA knockdown targeting various proteins:
The Sample Distribution View displays the number of samples available. Default primary grouping is performed by TargetCategory (the functional class of the targeted gene).
Figure 2. Sample Distribution View of ENCODE_RNAbinding
After searching for a gene, the default visualization is the Gene FPKM View, which plots the expression of the specified gene in each assay. It may be useful to filter or trellis to plot one of two cell types (use CellLine: K562 or HEPG2) to see differences in expression in different knockdown experiments compared to control samples (TargetCategory: NA; Transfection: control shRNA).
Figure 3. Gene FPKM view in K562 cells treated by shRNAs, grouped by TargetCategory
Because many of these gene knockdowns affect splicing, the Transcript FPKM Views will frequently be useful. Use the Transcript FPKM (Individual Chart) View to see details of each transcript's measured FPKM per sample, or the Transcript FPKM (Multi-transcript Chart) to get an overview of average expression across groups.
Figure 4. Transcript FPKM View (Individual Chart) of one splice variant of NASP, before filtering and regrouping
At this point, you may want to focus on certain shRNAs or TargetCategories, such as "RNA binding proteins". Filter for (TargetCategory=RNA binding protein | NA), then set the Profile column to "Transfection" to visualize the details of which shRNA knockdowns led to significant down-regulation or up-regulation of specific transcript variants compared to control shRNA.
Figure 5. Transcript FPKM View (Individual Chart) of one splice variant of NASP, after filtering for TargetCategory="RNA binding protein" and "NA", and regrouping on "Transfection".
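The same filter-and-regroup step can be reproduced outside the viewer on exported sample-level data. The sketch below keeps rows whose TargetCategory is "RNA binding protein" or "NA" and averages FPKM per Transfection. The field names TargetCategory and Transfection come from the text above; the FPKM column name, the shRNA labels and all values are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-sample rows for one transcript; values are illustrative only.
rows = [
    {"TargetCategory": "NA", "Transfection": "control shRNA", "FPKM": 10.0},
    {"TargetCategory": "NA", "Transfection": "control shRNA", "FPKM": 12.0},
    {"TargetCategory": "RNA binding protein", "Transfection": "shRNA A", "FPKM": 4.0},
    {"TargetCategory": "RNA binding protein", "Transfection": "shRNA A", "FPKM": 6.0},
    {"TargetCategory": "transcription factor", "Transfection": "shRNA B", "FPKM": 11.0},
]


def mean_fpkm_by_transfection(rows, keep_categories):
    """Filter rows by TargetCategory, then average FPKM within each Transfection."""
    groups = defaultdict(list)
    for row in rows:
        if row["TargetCategory"] in keep_categories:
            groups[row["Transfection"]].append(row["FPKM"])
    return {name: mean(values) for name, values in groups.items()}


result = mean_fpkm_by_transfection(rows, {"RNA binding protein", "NA"})
for transfection, value in sorted(result.items()):
    print(transfection, value)
```

Keeping the "NA" category in the filter retains the control shRNA samples, so each knockdown can be read against the control average.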
The Exon Junction Summary provides details of the relative usage of both Known and Novel detected exon junctions. Use the "ExonJunction" filters to select Known and/or Novel junctions. It may be helpful to filter according to certain transfections, such as those that showed transcript-level differences.
Figure 6. Exon Junction Summary View of control samples and two transfection experiments for factors that showed transcript-level differences for NASP.
Figure 7. New samples in HumanDisease, grouped by DiseaseCategory and colored by TissueCategory (Normal Control and Disease Control Samples are hidden).
This release of HumanDisease adds 5086 samples and 1169 comparisons from 99 unique project IDs.
All projects added: E-MEXP-2681, E-MEXP-3890, E-TABM-206, GSE1017, GSE101988, GSE102737, GSE10361, GSE10685, GSE106893, GSE10760, GSE108350, GSE11199, GSE112594, GSE115112, GSE116178, GSE11686, GSE116899, GSE118106, GSE11864, GSE119501, GSE11971, GSE119749, GSE120226, GSE120502, GSE12108, GSE124284, GSE12648, GSE1295, GSE129921, GSE1300, GSE130038, GSE13070, GSE131503, GSE131527, GSE13205, GSE133822, GSE135004, GSE135251, GSE13670, GSE13762, GSE138734, GSE138746, GSE142206, GSE146028, GSE15090, GSE150954, GSE151875, GSE151897, GSE153015, GSE154474, GSE154919, GSE154988, GSE155322, GSE157840, GSE158485, GSE159437, GSE161549, GSE165849, GSE1724, GSE173808, GSE18583, GSE21164, GSE21496, GSE22501, GSE24235, GSE28998, GSE29402, GSE38064, GSE40645, GSE47881, GSE4984, GSE5110, GSE5679, GSE57178, GSE57662, GSE58331, GSE67427, GSE68689, GSE7014, GSE71320, GSE7247, GSE7509, GSE7874, GSE80, GSE80785, GSE8441, GSE85761, GSE8658, GSE8668, GSE89020, GSE89827, GSE90152, GSE9101, GSE93776, GSE9916, GSE9927, GSE99999, PRJNA512027, SRP151738
Figure 8. New samples in MouseDisease, grouped by DiseaseState and colored by TissueCategory (Normal Control and Disease Control Samples are hidden).
This release adds 1556 samples and 782 comparisons from 49 projects.
All projects added: E-MTAB-8566, GSE104342, GSE106720, GSE109776, GSE110384, GSE110978, GSE110979, GSE110980, GSE110981, GSE110982, GSE112453, GSE113727, GSE113943, GSE116485, GSE117736, GSE117762, GSE117763, GSE121646, GSE124394, GSE124670, GSE133878, GSE138778, GSE139601, GSE139987, GSE141492, GSE141782, GSE144838, GSE145301, GSE145720, GSE155460, GSE156895, GSE158807, GSE160020, GSE160021, GSE162276, GSE162863, GSE162869, GSE166488, GSE166867, GSE167032, GSE167033, GSE167034, GSE167216, GSE169275, GSE179417, GSE56772, GSE57528, GSE57583, GSE84948
To access the latest Single Cell Land data in your subscription, you must use at least OmicSoft Suite v11.6.
Starting with this release, a new set of "Lite" Lands is available (they have "Lite" in the title). These Lite Lands consist of UMI-based projects with cell-level cell-type annotations from authors, enabling OmicSoft curators to define cell clusters with exactly the same cells that the authors identified.
The project and sample metadata are still fully curated, and the ClusterCellTypes use the OmicSoft CellType ontology. The gene-expression values are extracted from the data submission matrix, OmicSoft defines the samples to be included in dimension reduction CellMaps, and the author cell annotations define the clusters.
Figure 9. More datasets can now be incorporated into the Single Cell Land framework with the new Lite protocol.
Figure 10. Millions of new cells from 40 new projects are available in Single Cell Land.
New datasets that may be of interest:
Full list of projects added in this latest release: GSE106960, GSE123046, GSE126836, GSE127465, GSE138707, GSE139186, E-MTAB-6308, E-MTAB-8007, GSE101207, GSE108291, GSE110949, GSE117403, GSE117570, GSE119212, GSE124887, GSE124888, GSE125188, GSE129007, GSE129308, GSE132802, GSE137829, GSE138709, GSE138852, GSE139324, GSE140231, GSE142784, GSE145633, GSE150132, GSE151087, GSE153889, GSE154778, GSE157277, GSE162726, GSE92495, GSE97168, PRJCA001063, PRJEB39602, GSE114725, NYSCF, TabulaSapiens
Tabula Sapiens (HumanUmiLite_B38_GC33) provides a great complement to Human Cell Landscape (HCL_B38) as a Cell Atlas across 24 subtypes.
Figure 11. Restricted expression of CD34 within lung tissue from Tabula Sapiens. CD34 is up-regulated in multiple identified endothelial cell types compared to other cells in the lung tissue (TabulaSapiens CellMap11). CD34 expression is visualized in a subset of cells (top panel), with curated ClusterCellType, indicating that these were a group of endothelial cells all within lung tissue (middle panel). Statistical comparisons pre-computed for each cluster against other clusters in CellMap 11 reveal up-regulation of curated endothelial cell types vs others (bottom panel).
Another cell profiling project of interest is GSE116470, which profiles nine different brain regions from adult mice, allowing detailed exploration of expression variability in the central nervous system.
Figure 12. Expression of Sox2 in astrocytes (light blue, top panels), hippocampus (left panels) and frontal cortex (right panels) of profiled mouse brains (GSE116470).
OmicSoft Lands include over 118,000 statistical comparisons based on our modeling of the curated data. Sometimes these contrasts are simple, with only one factor in the model. A more complex model is created in other cases, with two, three or more covariates included in the model!
The "ComparisonContrast" field reflects these complex models for the contrast with this syntax:
Controlled Factor 1:Controlled Factor 2:Contrasted Factor 3 => Controlled Level 1:Controlled Level 2 -> Case vs Control for Factor 3
For example:
TreatmentStatus:Response:TreatmentHistory => nivolumab:no response -> ipilimumab vs none
Figure 13. Expression of DAZ2 in GSE91061, grouped by TreatmentStatus+Response+Treatment History. Samples included in Comparison contrast of TreatmentHistory:ipilimumab vs none in TreatmentStatus:nivolumab and Response:none colored blue.
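For programmatic work with exported comparison tables, a ComparisonContrast string in the multi-factor form shown above can be split into its parts. The following is a minimal sketch, not an OmicSoft utility: the function name and the returned dictionary keys are hypothetical, and it assumes that factor and level names contain no colons and that the contrast uses " vs " as its separator.

```python
def parse_comparison_contrast(text):
    """Split 'F1:F2:F3 => L1:L2 -> case vs control' into its components.

    Assumes the multi-factor form only; names must not contain ':'.
    """
    # Accept the typographic arrow as an alternative to '->'.
    text = text.replace("→", "->")
    factors_part, rest = text.split("=>", 1)
    levels_part, contrast_part = rest.split("->", 1)

    factors = [name.strip() for name in factors_part.split(":")]
    levels = [name.strip() for name in levels_part.split(":")]
    case, control = [name.strip() for name in contrast_part.split(" vs ", 1)]

    # The last factor is the one being contrasted; the others are held fixed.
    *controlled, contrasted = factors
    if len(controlled) != len(levels):
        raise ValueError("number of controlled factors and levels differ")

    return {
        "controlled": dict(zip(controlled, levels)),
        "contrasted_factor": contrasted,
        "case": case,
        "control": control,
    }


parsed = parse_comparison_contrast(
    "TreatmentStatus:Response:TreatmentHistory => nivolumab:no response -> ipilimumab vs none"
)
print(parsed)
```

For the example string, the controlled factors are TreatmentStatus (nivolumab) and Response (no response), and the contrast is ipilimumab vs none for TreatmentHistory.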
The QIAGEN OmicSoft Team invites you to request new OncoLand, DiseaseLand and Single Cell Land expression projects to be curated for upcoming releases, which will be included as part of your subscription.
Let us know if there are important datasets that you would like to have curated and represented in the Lands. Public (GEO, SRA and ArrayExpress) expression studies for human, mouse and rat will be evaluated. Compatible data types include single-cell transcriptomic projects, bulk RNA-seq projects and commercial expression arrays from Affymetrix, Illumina and Agilent. Contact us at omicsoft.support@qiagen.com for more information.
The QIAGEN OmicSoft Team continues to reprocess our most popular Land databases to move from human genome B37 to human genome B38 and Gencode Gene Model version 33. We recommend that you use the latest version (“B38_GC33”) as soon as it becomes available, as this version contains the newest and most comprehensively curated data.
Lands available in B38_GC33 include GTEx, TCGA, Blueprint, CCLE, HumanDisease and TRACERx, in addition to the latest Single Cell Lands. Upcoming B38_GC33 Lands include OncoGEO and TARGET.
If your subscription includes access to QIAGEN OmicSoft Land “text dump” flat-file exports, you can request the latest data in the form of a series of indexed, tab-delimited tables. These files are perfect for larger exploratory meta-analysis and machine learning studies. If you are interested in accessing the latest data via flat file, ask your QIAGEN OmicSoft account administrator to request the links.
New projects were added to OncoGEO (including Cancer Moonshot consortium projects) and Hematology, and a comprehensive update of CCLE metadata was performed. In the next release, OncoGEO will be available on human genome B38/GenCode.33.
The latest release of CCLE Land includes hundreds of new samples, new data types and extensive new curated metadata fields sourced from DepMap and other sources.
Figure 1. CCLE Cell Lines, grouped on the Y-axis by DiseaseCategory and subgrouped by CancerType.
The 2021R4 release of CCLE_B38_GC33 increases the number of cell lines profiled from 1114 to 1805 cell lines, corresponding to the 2021R3 DepMap release. All data use Human.B38 and OmicSoft Gencode.V33 as the reference genome and gene model.
The QIAGEN OmicSoft Team of scientists spent over 1300 hours reviewing the latest available data from over 40 publications and applied our curation standards to standardize the newest information about cell-line origins and features. New metadata fields have clearly explained ToolTips to describe the purpose of the field. Download the full dictionary here: https://resources.omicsoft.com/downloads/land/CCLE/CCLE_B38_GC33_2021R4_Updates.xlsx
Key columns, such as DiseaseState, are updated to the latest OmicSoft controlled vocabularies and standards that have been used in other QIAGEN OmicSoft Lands. These standards are based on the curation of information from DepMap, CCLE publications, and original papers. Additional metadata columns can be harnessed to select general cancer types or specific histological types.
Tissue and DiseaseLocation[PrimarySite] were added to reflect the best current information about the tissue of origin for each cell line.
Finally, CancerType[Cellosaurus] captures the disease classification of these cell lines from Cellosaurus – Expasy, which reflects the observed discrepancies between various sources.
Two comment fields (Comment and Comment[PMID22460905]) provide additional free-text notes from QIAGEN OmicSoft curators that were added during the review process.
Important note: With the latest update, cell lines now use DepMapID instead of the deprecated CCLE ID as SampleID. DepMapID has been adopted by Cellosaurus and other resources for easier cross-reference.
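If you maintain your own annotations keyed by the older CCLE-style identifiers, they will need to be re-keyed to DepMapID to line up with the updated SampleID. A minimal sketch of such a remapping follows; the mapping table, the identifiers and the Sensitivity field are placeholders invented for illustration, and a real lookup would have to come from an authoritative cross-reference.

```python
def remap_sample_ids(records, id_map):
    """Replace each record's SampleID using id_map; collect IDs with no match."""
    remapped, unmatched = [], []
    for record in records:
        new_id = id_map.get(record["SampleID"])
        if new_id is None:
            unmatched.append(record["SampleID"])
            continue
        # Copy the record so the caller's data is left untouched.
        remapped.append({**record, "SampleID": new_id})
    return remapped, unmatched


# Placeholder identifiers, not real cell lines.
id_map = {"CELLA_LUNG": "ACH-999001", "CELLB_BREAST": "ACH-999002"}
records = [
    {"SampleID": "CELLA_LUNG", "Sensitivity": 0.42},
    {"SampleID": "CELLC_SKIN", "Sensitivity": 0.87},
]

remapped, unmatched = remap_sample_ids(records, id_map)
print(remapped)
print(unmatched)
```

Reporting unmatched identifiers separately, rather than dropping them silently, makes it easier to spot cell lines that need manual review.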
This release adds 9666 new samples and 408 new comparisons from 54 projects (54 unique project IDs), focusing on prostate, lung, breast, gastrointestinal system, thyroid, pancreas, kidney and central nervous system cancers, as well as melanoma.
This release encompasses studies that explore gene expression profiles associated with drug-induced gene modulation, patient outcome, tumor/non-tumor paired samples, metastases and organoids, or that focus on alternative splicing. The release also includes studies that investigate gene signatures as a prognostic factor and insights into the tumor micro-environment.
Figure 2. Distribution of samples by DiseaseState in OncoGEO 2021R4.
Figure 3. Find Cancer Moonshot consortium projects using the Project-level “Keywords” filter.
This release adds 2291 new samples and 226 new comparisons from 19 projects (17 unique project IDs), focusing on leukemia and different subtypes of lymphoma.
Included in this release are studies that explore the gene expression profiling associated with the mechanisms of action of certain drugs in vitro, pre-post treatment paired samples, CAR-T cell therapy and CRISPR-Cas9 genome editing.
Figure 4. New samples in Hematology Land profiled by DiseaseState, sub-grouped by Project ID. A large number of projects focusing on DLBCL were added to this release.
Figure 5. New samples in HumanDisease, grouped by DiseaseState and colored by TissueCategory (Normal Control and Disease Control samples are hidden).
This release adds 2651 new samples and 1404 comparisons from 80 projects (79 unique project IDs).
This release includes studies on the following:
Figure 6. New samples in MouseDisease, grouped by DiseaseState and colored by Tissue (Normal Control and Disease Control samples are hidden).
This release adds 1750 new samples and 1678 comparisons from 45 projects and includes studies of the following:
QIAGEN OmicSoft Studio includes powerful functions for importing and exploring “measurement data”, quantitative data for samples in Lands.
For CCLE Land, extensive data (drug sensitivity, metabolomics, proteomics) from studies performed on cancer cell lines can be added and analyzed alongside CCLE 'omics data, including the following:
Adding new measurement data is straightforward, but it can only be done by an OmicSoft Server administrator.
If you use CCLE Land but don’t see measurement data, talk to your administrator and review this wiki page for information: http://www.arrayserver.com/wiki/index.php?title=Manage_Measurement_Data_in_ArrayLand.
We recently released a major update to TCGA Land. With this update, we released the TCGA metadata dictionary, a lookup guide that you may find useful. It is available in both PDF and XLSX formats. The links and an explanation of how to use the dictionary are in the whitepaper titled "Navigating TCGA metadata".
In this content release, OncoLand and DiseaseLand added hundreds of new projects, and Human Disease data are now available on Human Genome B38!
If you don't see a Land of interest listed under "Select Land", please ask your OmicSoft Server administrator to check the Cloud Land Publishing function for available data.
The OmicSoft team is inviting requests for new OncoLand or DiseaseLand expression projects to curate for upcoming releases, which will be included as part of your subscription.
Let us know if there are important datasets that you would like to see curated and represented in the Lands. Public (GEO, SRA or ArrayExpress) expression studies for human, mouse, and rat will be evaluated. Compatible data types include single-cell transcriptomic projects, bulk RNA-seq projects and commercial expression arrays from Affymetrix, Illumina and Agilent. Please email omicsoft.support@qiagen.com for more information.
Figure 1. Distribution of disease samples in OncoGEO and Hematology update by Disease Category.
This release adds 2058 new samples and 399 new comparisons from 63 projects, focusing on breast, melanoma, lung, central nervous system and stomach cancer.
Some highlights in this release are studies that explore the following:
This release adds 1830 new samples from 44 unique project IDs with 222 comparisons, focusing on subtypes of leukemia, lymphoma and myeloma.
Some highlights in this release are studies that explore the following:
Figure 2. The comprehensive update of TCGA metadata included the review of over 1200 files, the definition of over 1000 fields, the unification and grouping of hundreds of columns, the update of fields representing TCGA publication results and the curation of hundreds of treatment labels.
We are in the final stages of a comprehensive update of TCGA Land (TCGA_B38_GC33).
Look for comprehensive metadata field definitions and tooltips, improved metadata field names, curated treatment information, additional marker paper and PanCanAtlas cluster information, and more!
Figure 3. Distribution of disease samples in the latest release of DiseaseLand by Disease Category.
With this release the Human Disease collection is now available on Human Genome B38/GenCode version 33. All new content requests will be added to HumanDisease_B38_GC33.
If you don't see HumanDisease_B38_GC33, be sure to ask your OmicSoft Server administrator to use "Publish Cloud Land" to select the new Land.
This release adds 7749 new samples and 2297 comparisons from 102 unique project IDs.
This release includes the following:
This release adds 1200 new samples and 357 comparisons from 59 projects, including studies on the following:
This release adds 383 new samples and 125 comparisons from 13 projects on nervous system disease, cardiovascular and metabolic diseases and aging.
With the latest Single Cell Lands content update, new datasets on ophthalmology, oncology, neurology, gastroenterology, endocrinology, dermatology and more are now available.
Figure 4. Human UMI datasets in the latest release, plotting the number of cell clusters with different cell types (colored) by tissue.
Looking ahead to the next release, expect 55 additional projects with 96 "cell map" dimension reduction datasets, profiling 3.2 million cells from 847 samples.
Our new "Single Cell Lite" protocol for integrating pre-quantified datasets with full manual curation enables us to bring in datasets without raw data. Key datasets to be integrated include Tabula Sapiens (UMI and nonUMI) profiling normal tissue expression in humans, and Allen Mouse Brain Atlas (GSE116470).
In OmicSoft curation, we annotate in vivo and in vitro treatment studies in different columns.
OmicSoft curates controlled vocabulary terms from PubChem, NCIT, DrugBank, ChemSpider and the company web site of the treatment source.
Using the SubjectTreatment and TreatmentStatus columns, you can group and subset in vivo treatment studies to reveal interesting patterns in the data, for example, by showing pre-treatment gene expression between patients with differential response to a treatment.
Figure 5. KCNB2 is up-regulated in pre- and post-treatment samples of pancreatic ductal adenocarcinoma from GSE131050. All samples are curated as SubjectTreatment=5-fluorouracil;irinotecan;leucovorin;oxaliplatin;PF-4136309. Pre-treatment samples are identified by TreatmentStatus=none, post-treatment samples are identified by TreatmentStatus=5-fluorouracil;irinotecan;leucovorin;oxaliplatin;PF-4136309. Treatment response is indicated by Response (partial response or stable disease).
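The same grouping can be sketched outside the Land interface, for example on sample metadata taken from a flat-file export. This is a minimal illustration rather than OmicSoft code: the sample records, treatment names and the `pretreatment_by_response` helper are invented, and only the column names SubjectTreatment, TreatmentStatus and Response, and the convention that pre-treatment samples carry TreatmentStatus=none, come from the text above.

```python
from collections import defaultdict

# Hypothetical sample metadata; IDs and treatment names are invented for illustration.
samples = [
    {"SampleID": "S1", "SubjectTreatment": "drugA", "TreatmentStatus": "none", "Response": "partial response"},
    {"SampleID": "S2", "SubjectTreatment": "drugA", "TreatmentStatus": "none", "Response": "stable disease"},
    {"SampleID": "S3", "SubjectTreatment": "drugA", "TreatmentStatus": "drugA", "Response": "partial response"},
    {"SampleID": "S4", "SubjectTreatment": "drugB", "TreatmentStatus": "none", "Response": "partial response"},
    {"SampleID": "S5", "SubjectTreatment": "drugA", "TreatmentStatus": "none", "Response": "partial response"},
]


def pretreatment_by_response(rows, subject_treatment):
    """Group pre-treatment sample IDs by Response for one in vivo treatment."""
    groups = defaultdict(list)
    for row in rows:
        # Keep only subjects that eventually received the treatment of interest.
        if row["SubjectTreatment"] != subject_treatment:
            continue
        # Pre-treatment samples are those not yet exposed to the treatment.
        if row["TreatmentStatus"] != "none":
            continue
        groups[row["Response"]].append(row["SampleID"])
    return dict(groups)


groups = pretreatment_by_response(samples, "drugA")
for response, sample_ids in sorted(groups.items()):
    print(response, sample_ids)
```

The resulting groups could then be used to compare baseline expression of a gene of interest between responders and non-responders.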
In this content release, OncoLand and DiseaseLand added hundreds of new projects.
If you don't see a Land of interest listed under "Select Land", please ask your OmicSoft Server administrator to check the Cloud Land Publishing function for available data.
GTEx_B38_GC33 now has comparisons revealing the top up- and down-regulated genes for 52 tissues.
Figure 1. Search for a gene and see in which tissues it is enriched; use “Specify Profile Columns” to change to Case.tissueDetail_GTEx or similar, color by Case.TissueCategory.
Figure 2. Discover co-enriched genes with the Comparison Correlation View to find additional genes that are enriched or depleted in a similar pattern as your gene of interest.
Figure 3. Browse and filter the comparisons to find the most up- and down-regulated genes (the top genes are so significant that they are compressed to the top).
Figure 4. Distribution of new oncology-focused samples in OncoGEO 2021R2, grouped on the Y-axis by Tissue and colored by DiseaseState.
This release adds 5246 new samples and 598 new comparisons from 106 projects, focusing on melanomas, breast, liver and pancreatic cancers. Included in this release are studies that explore the gene expression profiling associated with the mechanisms of action of various drugs both in vivo and in vitro, pre- vs. post-treatment paired samples, tumor vs. non-tumor paired samples, potential biomarkers and gene signatures that could predict patient outcome, xenograft models and drug resistance.
Highlighted OncoGEO projects:
Figure 5. Distribution of new hematologic cancer-focused samples in Hematology 2021R2, grouped on the Y-axis by DiseaseState and colored by CellType.
This release adds 1231 new samples and 364 new comparisons from 30 projects, focusing on different subtypes of lymphoma and leukemia. The most highly represented subtypes are acute myeloid leukemia (LAML), followed by acute lymphoblastic leukemia (ALL) and chronic myeloid leukemia (CML). Included in this release are studies that explore the mechanisms of action of certain drugs both in vivo and in vitro and the discovery of potential biomarkers and gene signatures that could predict drug resistance.
Figure 6. Distribution of new Disease-focused samples in HumanDisease 2021R2, grouped on the Y-axis by DiseaseState and colored by Tissue. Normal control and Disease Control samples were hidden.
This release adds 15,358 new samples and 2099 comparisons from 126 projects, including a collection of studies on immune-mediated diseases (systemic lupus erythematosus, Sjogren’s syndrome, psoriasis and others), amyotrophic lateral sclerosis and obesity, as well as new studies on respiratory diseases, infectious diseases, muscular dystrophy and nervous system diseases.
Highlighted HumanDisease projects
Figure 7. Distribution of new Disease model samples in MouseDisease 2021R2, grouped on the Y-axis by DiseaseState and colored by Tissue.
This release adds 1589 new samples and 1413 comparisons from 28 projects on vaccines, degenerative diseases of the CNS, aging, and metabolic and immune-mediated diseases.
Highlighted MouseDisease projects
With the latest Single Cell Lands content update, new datasets on ophthalmology, oncology, neurology, gastroenterology, endocrinology, dermatology and more are now available.
Figure 8. Human UMI datasets in 2021R2, plotting the number of clusters with different cell types (colored) by tissue.
Figure 9. Human UMI datasets in 2021R2, plotting the number of clusters with different cell types (colored) by tissue.
Figure 10. Mouse UMI datasets in 2021R2, plotting the number of clusters with different cell types (colored) by tissue.
Figure 11. Mouse non-UMI datasets in 2021R2, plotting the number of clusters with different cell types (colored) by tissue.
With this content release, QIAGEN OncoLand and QIAGEN DiseaseLand provide hundreds of new projects. If you don't see a Land of interest listed under "Select Land", please ask your QIAGEN OmicSoft Server administrator to check the Cloud Land Publishing function for available data.
In case you missed it
OmicSoft is in the process of re-analyzing all of our Human Lands on Human.B38 genome and the OmicsoftGenCode.V33 model. Several reprocessed Lands have been released, and the most up-to-date versions of the relevant Lands can be identified by the B38_GC33 suffix. Updated Lands include Blueprint_B38_GC33, CCLE_B38_GC33, GTEx_B38_GC33, TCGA_B38_GC33 and TRACERx_B38_GC33, as well as the controlled-access DLBCL_NCI_B38_GC33 Land.
QIAGEN OncoLand highlights
TCGA is now available based on alignment to GenCodeV33. With this Land, you can now build the latest VirtualLands, such as the popular CCLE.GTEx.TCGA VirtualLands. Later this year, we will update the extensive TCGA metadata to TCGA_B38_GC33, as well as comparisons between tumor samples that have key mutations in oncogenes and tumor suppressor genes vs samples that do not have these key mutations.
Figure 1. BMP2 expression in tissue samples from CCLE, GTEx and TCGA, using the latest Human.B38/Gencode.V33 releases. The Y-axis is profiled on Tissue Category, SourceLand and Tumor or Normal.
In this release, we added 6591 new samples and 618 comparisons from 99 projects, with a focus on GI, reproductive and male urogenital cancers. Included in these are samples from studies of gene expression characterization of metastatic lesions (in which some cases are paired with primary tumors), pre- and post-treatment paired samples, explorations of the prognostic value of particular gene signatures, the effects of established treatments, comparisons of alternate therapeutic strategies and drug resistance.
Figure 2. Distribution of new oncology-focused samples in OncoGEO 2021R1, grouped on the Y-axis by Tissue and color-coded according to DiseaseState.
With 1303 new samples and 250 comparisons from 54 projects, this release adds Hodgkin and non-Hodgkin lymphoma, leukemia and myeloma samples, with experiments that explore the mechanisms of action of specific drugs, the discovery of potential biomarkers and the gene signatures that could predict patient outcome.
Figure 3. Distribution of new hematologic cancer-focused samples in Hematology 2021R1, grouped on the Y-axis by DiseaseState and color coded by CellType.
QIAGEN DiseaseLand highlights
This release contains datasets exploring the following: obesity, diabetes, immune-mediated diseases, vaccines (transcriptional response induced by influenza, BCG and Hantavirus vaccines in human and mouse) and vaccine adjuvants, viral and bacterial diseases, cellular-stress response and compound profiling (including genotoxicity studies). We've also added several profiling studies of the eye (cornea and retina).
This release adds 3969 new samples and 949 comparisons from 108 projects, including a collection of studies on Zika virus and detailed profiling of eye expression, as well as new studies on cardiovascular, musculoskeletal and nervous system diseases.
Figure 4. Distribution of new disease-focused samples in HumanDisease 2021R1, grouped on the Y-axis by DiseaseState and color coded by Tissue. Normal Control and Disease Control samples were hidden.
This release adds 1559 new samples and 543 comparisons from 52 projects, with a focus on cellular stress in normal tissues and cells (GSE118660, GSE35681, GSE49598, GSE700, GSE84450, GSE90070, GSE54581, GSE29929, GSE11496, GSE11684, GSE122507).
Figure 5. Distribution of new disease-model samples in MouseDisease 2021R1, grouped on the Y-axis by DiseaseState and color coded by Tissue. Normal Control and Disease Control samples were hidden.
This release adds 2617 new samples and 2329 comparisons from 14 projects that focus on in vivo (SubjectTreatment metadata) and in vitro (Treatment metadata) compound-profiling and toxicity studies, including GSE119122, GSE119129, GSE119133, GSE144219, GSE119933, GSE129814 and GSE122184.
Figure 6. Subset of the distribution of new in vivo compound-profiling or toxicity samples, which are grouped on the Y-axis by SubjectTreatment.
Figure 7. Distribution of new in vitro compound-profiling or toxicity samples, grouped on the Y-axis by Treatment.
In addition to the unparalleled collection of normal tissue and blood expression data that can be found in GTEx and Blueprint, HumanDisease, MouseDisease, and RatDisease contain thousands of Normal Control samples from other tissue-profiling projects (i.e., projects not focused on comparing disease vs normal). These projects provide a complement for tissues (including fetal tissues) that are not covered by GTEx or Blueprint and that focus on precise definitions of samples.
To find these projects, use the Project filter tab Disease and select "Normal Control", which will include only studies that are focused on normal tissues.
If you don't select for "Normal Control" projects, Normal Control samples will be returned from thousands of additional studies that included both disease and normal tissues.
Subsequently, you can filter out any remaining disease samples with the Sample filter tab "Disease", selecting "Normal Control" (you can include "Disease Control" as well). Learn more about the difference between Normal Control and Disease Control.
At this point, you will probably want to group by Tissue to see the available tissues, and use the Sample level filter "TissueCategory" to hide hematopoietic and lymphoid system samples.
Finally, set the sample-level Treatment filter to "No Info" and "None" to eliminate samples that were treated.
To quickly apply these filters the next time you want to explore normal tissues, be sure to save this combination of filters by clicking “Manage Filters”.
Figure 8. Save your filters to quickly apply them in future sessions.
After applying these filters (or your saved filter set), you can perform searches to explore patterns of expression across diverse normal tissue samples.
Figure 9. Microarray expression of SerpinB6 across samples from Normal Control projects.
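The filter sequence described above can also be expressed as a small script, which may help when applying the same logic to exported sample metadata. The sketch below rests on several assumptions: the rows are invented, the key `ProjectDisease` is a hypothetical stand-in for the project-level Disease filter, and the category label "hematopoietic and lymphoid system" is assumed to match the TissueCategory value being hidden.

```python
from collections import Counter

# Hypothetical metadata rows; keys and values are illustrative, not an export schema.
rows = [
    {"SampleID": "N1", "ProjectDisease": "Normal Control", "Disease": "Normal Control",
     "Tissue": "retina", "TissueCategory": "eye", "Treatment": "None"},
    {"SampleID": "N2", "ProjectDisease": "Normal Control", "Disease": "Normal Control",
     "Tissue": "blood", "TissueCategory": "hematopoietic and lymphoid system", "Treatment": "No Info"},
    {"SampleID": "N3", "ProjectDisease": "Normal Control", "Disease": "Normal Control",
     "Tissue": "liver", "TissueCategory": "digestive system", "Treatment": "dexamethasone"},
    {"SampleID": "N4", "ProjectDisease": "psoriasis", "Disease": "Normal Control",
     "Tissue": "skin", "TissueCategory": "integumentary system", "Treatment": "None"},
    {"SampleID": "N5", "ProjectDisease": "Normal Control", "Disease": "Disease Control",
     "Tissue": "retina", "TissueCategory": "eye", "Treatment": "No Info"},
]

UNTREATED = {"No Info", "None"}
HIDDEN_CATEGORY = "hematopoietic and lymphoid system"  # assumed label


def select_normal_tissue_samples(rows, include_disease_control=False):
    """Apply the project, sample, tissue-category and treatment filters in turn."""
    allowed = {"Normal Control"}
    if include_disease_control:
        allowed.add("Disease Control")
    kept = []
    for row in rows:
        if row["ProjectDisease"] != "Normal Control":  # project-level filter
            continue
        if row["Disease"] not in allowed:              # sample-level filter
            continue
        if row["TissueCategory"] == HIDDEN_CATEGORY:   # hide blood and lymphoid samples
            continue
        if row["Treatment"] not in UNTREATED:          # drop treated samples
            continue
        kept.append(row)
    return kept


def count_by_tissue(rows):
    """Count samples per tissue, mirroring the 'group by Tissue' step."""
    return Counter(row["Tissue"] for row in rows)


selected = select_normal_tissue_samples(rows, include_disease_control=True)
print(count_by_tissue(selected))
```

Keeping the four conditions as separate checks mirrors the order of the filter tabs and makes it easy to relax one of them, for example to keep hematopoietic samples.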
In this content release, OncoLand and DiseaseLand added hundreds of new projects. If you don't see a Land of interest listed under "Select Land", please ask your OmicSoft Server administrator to check the Cloud Land Publishing function for available data.
The latest versions of OmicSoft GTEx, Blueprint and CCLE Lands were released and mapped to Human.B38 and OmicsoftGenCode.V33. Be sure to use these Lands to get the most up-to-date data!
In addition, TRACERx_B38_GC33 (multi-omics non-small cell lung cancer) and DLBCL_NCI_B38_GC33 (diffuse large B-cell lymphoma, controlled-access application required) are available on the latest gene model.
In this release, there are 4110 new samples and 1245 new comparisons from 102 projects added to OncoGEO, focusing on renal clear cell carcinoma, hepatocellular carcinoma, glioblastoma, colon and colorectal cancers, cervix carcinoma and breast carcinoma.
Figure 1. Distribution of new oncology-focused samples in OncoGEO 2020R4, grouped on the Y-axis by Tissue and colored by DiseaseState.
We added 1822 samples and 557 comparisons from 55 projects in this release, with new studies on multiple myeloma, acute myeloid leukemia, diffuse large B-cell lymphoma, chronic lymphocytic leukemia and more.
Figure 2. Distribution of new hematologic cancer-focused samples in Hematology 2020R4, grouped on the Y-axis by DiseaseState and colored by CellType.
In this release, we added 361 samples and 121 comparisons from 19 projects to OncoMouse_B38, with new studies relevant to chronic lymphocytic leukemia, myelodysplastic syndrome, multiple myeloma, mantle cell lymphoma and more.
Figure 3. Distribution of new oncology model samples in OncoMouse 2020R4, grouped on the Y-axis by DiseaseState and colored by Tissue.
In this release, we added 7579 samples and 3234 comparisons from 137 projects to HumanDisease_B37. Among the many diseases covered in the new projects, a particular focus was on viral infection, including further studies on COVID-19, MERS and HIV, as well as studies on schizophrenia, autism spectrum disorder, bipolar disorder, celiac disease, diabetes and more.
Figure 4. Distribution of new Disease-focused samples in HumanDisease 2020R4, grouped on the Y-axis by DiseaseState and colored by Tissue. Normal control and Disease Control samples were hidden.
With 4364 samples and 994 comparisons from 97 projects, MouseDisease_B38 has new content on allergy, Alzheimer's disease, autism spectrum, chronic kidney disease, graft-vs-host disease, Huntington's disease, toxoplasmosis and diabetes.
Figure 5. Distribution of new Disease model samples in MouseDisease 2020R4, grouped on the Y-axis by DiseaseState and colored by Tissue. Normal control and Disease Control samples were hidden.
In this content release, we added hundreds of new projects to OncoLand and DiseaseLand, and a new Land focused on non-small cell lung cancer (NSCLC). If you are not able to access a Land of interest to you, please ask your OmicSoft Server administrator to check the Cloud Land Publishing function for available data.
The TRACERx (TRAcking Cancer Evolution through therapy (Rx)) study focuses on the progression of NSCLC.
In this new TRACERx_B38_GC33 Land, 447 samples from 100 patients, with somatic mutation, copy number, clinical covariates and survival data, are available for analysis.
Figure 1. Sample distribution of lung samples in TRACERx_B38_GC33. Using the filters for Sample Origin (excluding peripheral blood and lymph node samples) and Sampling Time (excluding post-treatment samples) and grouping on Histology, the number of samples from each subtype of NSCLC is displayed. Multiple tumor regions (up to 8) were sampled per tumor.
Figure 2. Differential mutation frequency in pre-treatment invasive vs. squamous NSCLC samples in TRACERx Land. After filtering for pre-treatment lung samples with histology indicating either invasive or squamous adenocarcinoma NSCLC, a Sample Set was generated to compare the two histologies for mutation frequencies with the OmicSoft Lands "Sample Grouping to Mutation" function. Among the top mutations found enriched in one group vs. the other, TP53, PIK3CA, CDKN2A and many other genes were more frequently mutated in squamous (green) samples, whereas KRAS and AMER3 were more frequently mutated in invasive (blue) samples.
Lands continue to be updated to the new OmicsoftGenCode.V33 gene model on the Human.B38 genome, with Blueprint (normal blood cell type expression) and CCLE (cancer cell line expression) updated in this release. Look for the "B38_GC33" suffix to find these latest data; your QIAGEN OmicSoft Administrator will need to add these Lands to your OmicSoft Server with Publish Cloud Lands.
CCLE_B38_GC33 also includes a significant update to available data, with new RNA-seq, mutation, copy number and protein data, along with the DepMap CRISPR/RNAi gene dependency data.
In this release, 3314 samples from 65 projects were added to OncoGEO, with a focus on immune checkpoint therapies targeting the PD-1 pathway and CTLA4, CNS cancers, female reproductive cancers, liver, prostate and colorectal cancers, lung cancers and skin cancers.
Figure 3. Sample distribution of OncoGEO 2020R3 additions, filtering out disease control and normal control samples.
In this release, we added 6781 samples from 109 projects to Hematology_B37. New studies for a variety of leukemias and lymphomas were added.
MMRC update
MMRC-related projects (ProjectIDs GSE26760, GSE26849, and MMRC) were updated with new metadata to enhance the interpretation of these datasets. For ProjectID MMRC, the columns Translocation[IGH], Cytogenetics, Gender, AgeAtDiagnosis[years], SampleMaterial, CellType, CellMarkers, and CellPurity were added. For ProjectIDs GSE26760 and GSE26849, the columns DiseaseHistory, PatientStatus, SampleMaterial, and CellPurity were added, and the HeavyChainClass and LightChainClass columns were merged into ImmunoglobulinClass.
Figure 4. Sample distribution of Hematology 2020R3 additions, filtering out normal control samples.
OncoMouse - disease areas
In this release, we added 511 samples from 24 projects to OncoMouse_B38, with new studies relevant to anti-PD1/PD-L1 and anti-CTLA4 immunotherapy agents, female reproductive cancers of the breast and ovary, lung cancers, and kidney and bladder cancers.
Figure 5. Sample distribution of OncoMouse 2020R3 additions, filtering out disease control and normal control samples.
Highlights:
In this release, we added 7021 samples from 133 projects to HumanDisease_B37. Among the many diseases covered in the new projects, a particular focus was on aging-related gene expression changes in the brain, eye, immune system, liver, muscle, skin and more (use the project filter Keywords to find "aging" studies).
In addition, new studies relevant to coronavirus research were added (COVID-19, SARS, MERS, ARDS and other complications), as well as Alzheimer's Disease, Huntington's Disease, Parkinson's Disease, arthritis, asthma, chronic obstructive pulmonary disease (COPD) and skin disorders.
Figure 6. Sample distribution of HumanDisease 2020R3 additions, filtering out disease control and normal control samples.
MouseDisease
With 2362 samples from 83 projects, MouseDisease_B38 has new content on aging, Alzheimer's Disease and Parkinson's Disease models, immune-related diseases such as graft-vs-host disease and lupus, as well as skin diseases.
Figure 7. Sample distribution of MouseDisease 2020R3 additions, filtering out disease control and normal control samples.
RatDisease
In RatDisease_B6, we added 637 samples from 21 projects, with studies focused on aging, cirrhosis, Alzheimer's disease and Parkinson's disease.
Figure 8. Sample distribution of RatDisease 2020R3 additions.
Our latest Land content updates, released July 1, bring you new datasets, ready to be explored to discover patterns of gene and transcript expression across normal and disease tissues. Check out the new projects added to HumanDisease, MouseDisease and OncoGEO, and the thousands of new normal tissue samples in GTEx_B38.
GTEx_B38 V8 - First Land on GenCode.V33
With 2020R2, we released our first Land on Human_B38/OmicSoftGenCode.V33, with over 16,000 RNA-seq samples profiling normal tissue expression.
To maintain compatibility with older Virtual Lands that include GTEx_B38, we released this update as GTEx_B38_GC33 (B38 refers to Human Genome version B38; GC33 refers to GenCode Version 33).
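For scripts that need to tell Land versions apart, the suffix convention can be parsed programmatically. The sketch below is an assumption-laden illustration: the regular expression and the `parse_land_name` helper are not part of OmicSoft, and the pattern only covers names of the form `Name_B<genome>` with an optional `_GC<version>` suffix, as seen in the Land names mentioned in these notes.

```python
import re

# Assumed pattern: <name>_B<genome build>[_GC<GenCode version>]
LAND_NAME = re.compile(r"(?P<name>.+?)_(?P<genome>B\d+)(?:_GC(?P<gencode>\d+))?")


def parse_land_name(land):
    """Split a Land name into base name, genome build and GenCode version.

    Returns None when the name does not follow the assumed convention.
    """
    match = LAND_NAME.fullmatch(land)
    if match is None:
        return None
    gencode = match.group("gencode")
    return {
        "name": match.group("name"),
        "genome": match.group("genome"),
        "gencode": int(gencode) if gencode is not None else None,
    }


for land in ["GTEx_B38_GC33", "GTEx_B38", "DLBCL_NCI_B38_GC33"]:
    print(land, parse_land_name(land))
```

A helper like this makes it straightforward to prefer the GC33 version of a Land when both the older and newer versions are installed.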
Figure 1. Gene FPKM of ACE2 across 16,963 samples from GTEx_B38_GC33.
This Land has been added automatically to hosted servers; if you have an onsite Land installation, please use Cloud Land Publishing to add it to your collection.
Figure 2. GTEx_B38_GC33 and other Lands, ready to be installed to the Land collection.
We will continue to release updated versions on this new genome and gene model, starting with the most popular Lands. We will continue to use the OmicSoft Aligner (OSA) and RSEM quantification; a benchmark white paper is in progress.
DiseaseLand
DiseaseLand content highlights:
Coronavirus-related research: In this release, we added 1119 samples and 920 comparisons from 23 projects to HumanDisease, and 357 samples and 203 comparisons from 11 projects to MouseDisease. These provide insights into coronavirus infection, associated lung damage, treatment and immune response.
New data: With the latest release, we've added the following data:
Figure 3. Sample distribution of new data added to HumanDisease_B37 in 2020R2.
HumanDisease Projects:
MouseDisease Projects:
OncoLand
With the latest update to OncoGEO, we added 4622 new samples and 832 comparisons from 112 projects.
Areas of focus:
Figure 4. Sample distribution of new data added to OncoGEO_B37 in 2020R2.
Note to OmicSoft Server Administrators
If you haven't restarted your Land server recently, consider doing this during a period of low usage. We've released several new improvements, and this also ensures that the latest files have been synchronized.
The OncoLand and DiseaseLand 2020R1 release is out! Servers should automatically update during low-traffic periods overnight.
To enable the fastest release of data, this release was delivered in two batches: GTEx_B37, OncoGEO and HumanDisease were released on April 24, 2020; OncoMouse, MouseDisease and RatDisease were released on May 11, 2020.
OncoLand has several new Lands available, so be sure to check them out! If you do not see them in your OncoLand collection, please contact your OmicSoft Server administrator to add the Lands to your server.
GTEx_B37 has 8,711 new RNA-seq samples, for a total of 16,964 RNA-seq samples. GTEx_B38 is scheduled to be updated to GTEx V8 with 2020R2.
Figure 1: Sample distribution of GTEx samples across tissues, colored by whether they were added in the latest release.
The Tissue metadata column now uses OmicSoft's controlled vocabularies, making it simpler to build virtual Lands. GTEx metadata terms can be found in Tissue_GTEx and TissueDetail_GTEx.
New projects in OncoLand 2020R1
Figure 2: New projects in OncoGEO and OncoMouse.
New projects in DiseaseLand 2020R1
Figure 3: New projects in Human, Mouse, and Rat Disease.
Figure 4: Comparisons from E-MTAB-6814, a developmental map of the human transcriptome across 7 tissues. Similar datasets are in MouseDisease (E-MTAB-6798) and RatDisease (E-MTAB-6811). In the Comparisons Distribution View, the ProjectName filter was used to find E-MTAB-6814. Comparison groups were specified by "Specify Histogram Columns: Case.ExperimentGroup", and subgrouped with "Specify Group Column: Case.ExperimentGroup".
This release includes several minor improvements. Please review these latest improvements and update if any would be useful for your research.
For example, if your data were run across multiple lanes, and the output files for Read1 are saved as "Batch2_1_S1_L001_R1_001.fastq.gz" in multiple directories (each directory holding data from one lane), you can ensure proper file pairing by specifying the order with "Add List" or during sample registration, and by selecting "pair files in order" when specifying alignment options.
In this example, "Pair Files In Order" will take all the files for Sample201 in the listed order, and properly pair those in folders "aRename" and "bRename".
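To illustrate why listing order matters when file names repeat across lane directories, here is a minimal sketch of order-based pairing. It is one plausible reading of the option, not the OmicSoft implementation: the `pair_files_in_order` helper is hypothetical, the assumption that consecutive entries form a Read1/Read2 pair is ours, and the Read2 file name is inferred from the usual Illumina naming pattern rather than taken from the text.

```python
def pair_files_in_order(files):
    """Pair consecutive entries of an ordered file list as (read1, read2).

    Assumes the list is already in the intended order, with each Read1 file
    immediately followed by its Read2 mate.
    """
    if len(files) % 2 != 0:
        raise ValueError("Expected an even number of files for paired-end data")
    return [(files[i], files[i + 1]) for i in range(0, len(files), 2)]


# Hypothetical listing for one sample: the same file names occur in two
# lane directories, so only the listed order keeps the mates together.
sample_files = [
    "aRename/Batch2_1_S1_L001_R1_001.fastq.gz",
    "aRename/Batch2_1_S1_L001_R2_001.fastq.gz",
    "bRename/Batch2_1_S1_L001_R1_001.fastq.gz",
    "bRename/Batch2_1_S1_L001_R2_001.fastq.gz",
]

for read1, read2 in pair_files_in_order(sample_files):
    print(read1, "<->", read2)
```

Pairing by position avoids any attempt to match mates by file name, which would be ambiguous here because both directories contain identically named files.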
Server improvements:
Bug fixes:
A new version of QIAGEN OmicSoft Suite has been released. Please review the latest improvements and update your OmicSoft Server at the next available opportunity to take advantage of these new features included in version 10.2.7!
With version 10.2.1, we are proud to support Docker images in "External Scripts". This is considered an advanced feature for OmicSoft "power users" who want to extend their OmicSoft Suite capabilities beyond the tools integrated into the software. Because of the wide variety of tools that can run in Docker images, OmicSoft Support cannot provide debugging support for each tool, but will be happy to answer questions about External Scripts syntax and to provide tutorials and example scripts. The QIAGEN Discovery Services team can also work with you to build full pipelines and workflows using External Scripts and Docker images for a variety of bioinformatics needs.
To support External Scripts on AWS, you will need to use an updated AMI. Please visit http://www.arrayserver.com/wiki/index.php?title=Build_AWS_Ubuntu_AMI_for_OmicSoft_Cloud_Computing
To support Docker in External Scripts on your onsite OmicSoft Server installation, please install Docker v19.
More useful resources:
A full log of all the changes is located in the Help menu of the Analysis tab in OmicSoft Studio.
Land visualizations: CRISPR/RNAi dependency screen data with multi-'omics integration views. Directly explore correlations of expression, mutation and gene dependency data in the CCLE Land, now updated with DepMap data. Use the “Add Measurement Data” function to bring in additional data such as drug sensitivity and metabolomic data.
Cloud Analysis: Map S3 buckets from multiple AWS accounts, including on master/analytic server setups. More flexible cloud configurations allow you to map buckets from collaborators and other shared buckets with your access/secret keys.
In case you missed it: Find out what was included in the 10.1 release (October 2019)
Cloud analysis: Spot Instance support. AWS spot instances use idle EC2 resources, which can be requested at significant cost savings over on-demand instances.
Single-cell analysis: Improved importing of Single Cell Expression Matrices. Merge memory-efficient Zero-Inflated Matrix (ZIM) data from multiple samples to compare single-cell data from multiple experiments.
.NET 4.5 Framework: Update from .NET 3.5 framework.
IPA integration: Multi-identifier uploads. Now you can specify up to five molecule identifier columns in your inference table when uploading from OmicSoft to IPA. This feature is especially useful for metabolomic studies.
Human disease updates: 48 new projects, with a focus on amyotrophic lateral sclerosis, Alzheimer’s disease, Huntington’s disease and HIV.
Mouse disease updates: 50 new projects, with a focus on models of amyotrophic lateral sclerosis, Alzheimer’s disease and Huntington’s disease.
In case you missed it: We added 67 projects and 1285 samples, with a focus on ophthalmology.
New Land: OncoMouse. Oncology-focused studies in mouse models, with 48 projects in the initial release.
OncoGEO updates: 68 new projects, with a focus on cancers of the reproductive system, GI system, respiratory system, urinary system, skin and CNS.
CCLE update: CRISPR/RNAi screen data have been integrated into CCLE Lands, enabling new multi-'omics comparisons.
In case you missed it: We added a new Land. BeatAML includes RNA-seq, DNA-seq and ex-vivo drug responses for over 500 patients.
OncoGEO/hematology added 58 new projects and 3653 samples, with an emphasis on hematologic cancers.
Land Explorer now supports web-based access to Land data with over 100 visualizations for expression, fusions, protein levels and mutations. Custom visualizations can easily be shared with colleagues using customized web links. Access and explore all the Land data that are important to you.
The Sample Explorer and Comparison Explorer pages summarize data across every Land in interactive plots. Use filters to identify the samples of interest to you and discover every Land with data relevant to your research.