In this content release, we added hundreds of new projects to OncoLand and DiseaseLand, and a new Land focused on non-small cell lung cancer (NSCLC). If you are not able to access a Land of interest to you, please ask your OmicSoft Server administrator to check the Cloud Land Publishing function for available data.
The TRACERx (TRAcking Cancer Evolution through therapy (Rx)) study focuses on the progression of NSCLC.
In this new TRACERx_B38_GC33 Land, 447 samples from 100 patients, with somatic mutation, copy number, clinical covariates and survival data, are available for analysis.
Figure 1. Sample distribution of lung samples in TRACERx_B38_GC33. Using the filters for Sample Origin (excluding peripheral blood and lymph node samples) and Sampling Time (excluding post-treatment samples) and grouping on Histology, the number of samples from different subtypes of NSCLC is displayed. Multiple tumor regions (up to 8) were sampled per tumor.
Figure 2. Differential mutation frequency in pre-treatment invasive vs. squamous NSCLC samples in TRACERx Land. After filtering for pre-treatment lung samples with histology indicating either invasive adenocarcinoma or squamous cell carcinoma, a Sample Set was generated to compare mutation frequencies between the two histologies with the OmicSoft Lands "Sample Grouping to Mutation" function. Among the top mutations enriched in one group vs. the other, TP53, PIK3CA, CDKN2A and many other genes were more frequently mutated in squamous (green) samples, whereas KRAS and AMER3 were more frequently mutated in invasive (blue) samples.
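To make the idea of a group-vs-group mutation frequency comparison concrete, here is a minimal sketch using a two-sided Fisher's exact test on a 2x2 table of mutated and non-mutated samples. This is an illustration only: the statistic used by the "Sample Grouping to Mutation" function is not stated here, so the choice of Fisher's exact test, the function name and the counts below are all assumptions.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    a = mutated samples in group 1, b = non-mutated samples in group 1,
    c = mutated samples in group 2, d = non-mutated samples in group 2.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x mutated samples in group 1
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_obs * (1 + 1e-9)
    )


# Hypothetical counts (not taken from the TRACERx Land):
# a gene mutated in 30 of 40 samples in one group vs. 10 of 40 in the other.
p_value = fisher_exact_two_sided(30, 10, 10, 30)
print(f"p = {p_value:.2e}")
```

A small p-value suggests the gene is mutated at different frequencies in the two groups; when many genes are tested at once, a multiple-testing correction would normally follow.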
Lands continue to be updated to the new OmicSoftGenCode.V33 gene model on the Human.B38 genome, with Blueprint (normal blood cell type expression) and CCLE (cancer cell line expression) updated in this release. Look for the "B38_GC33" suffix to find these latest data; your QIAGEN OmicSoft Administrator will need to add these Lands to your OmicSoft Server with Publish Cloud Lands.
CCLE_B38_GC33 also includes a significant update to available data, with new RNA-seq, mutation, copy number and protein data, along with the DepMap CRISPR/RNAi gene dependency data.
In this release, 3314 samples from 65 projects were added to OncoGEO, with a focus on immune checkpoint therapies targeting the PD-1 pathway and CTLA4, CNS cancers, female reproductive cancers, and liver, prostate, colorectal, lung and skin cancers.
Figure 3. Sample distribution of OncoGEO 2020R3 additions, filtering out disease control and normal control samples.
In this release, we added 6781 samples from 109 projects to Hematology_B37. New studies for a variety of leukemias and lymphomas were added.
MMRC-related projects (ProjectIDs GSE26760, GSE26849, and MMRC) were updated with new metadata to enhance the interpretation of these datasets. For ProjectID MMRC, the columns Translocation[IGH], Cytogenetics, Gender, AgeAtDiagnosis[years], SampleMaterial, CellType, CellMarkers, and CellPurity were added. For ProjectIDs GSE26760 and GSE26849, the columns DiseaseHistory, PatientStatus, SampleMaterial, and CellPurity were added, and the HeavyChainClass and LightChainClass columns were merged into ImmunoglobulinClass.
Figure 4. Sample distribution of Hematology 2020R3 additions, filtering out normal control samples.
OncoMouse - disease areas
In this release, we added 511 samples from 24 projects to OncoMouse_B38, with new studies relevant to anti-PD1/PD-L1 and anti-CTLA4 immunotherapy agents, Female Reproductive Cancers of Breast and Ovary, lung cancers, and kidney and bladder cancers.
Figure 5. Sample distribution of OncoMouse 2020R3 additions, filtering out disease control and normal control samples.
In this release, we added 7021 samples from 133 projects to HumanDisease_B37. Among the many diseases covered in the new projects, a particular focus was on aging-related gene expression changes in the brain, eye, immune system, liver, muscle, skin and more (use the project filter Keywords to find "aging" studies).
In addition, new studies relevant to coronavirus research were added (COVID-19, SARS, MERS, ARDS and other complications), as well as Alzheimer's Disease, Huntington's Disease, Parkinson's Disease, arthritis, asthma, chronic obstructive pulmonary disease (COPD) and skin disorders.
Figure 6. Sample distribution of HumanDisease 2020R3 additions, filtering out disease control and normal control samples.
With 2362 samples from 83 projects, MouseDisease_B38 has new content on aging, Alzheimer's Disease and Parkinson's Disease models, immune-related diseases such as graft-vs-host disease and lupus, as well as skin diseases.
Figure 7. Sample distribution of MouseDisease 2020R3 additions, filtering out disease control and normal control samples.
In RatDisease_B6, we added 637 samples from 21 projects, with studies focused on aging, cirrhosis, Alzheimer's disease and Parkinson's disease.
Figure 8. Sample distribution of RatDisease 2020R3 additions.
Our latest Land content updates, released July 1, bring you new datasets, ready to be explored to discover patterns of gene and transcript expression across normal tissue and disease. Check out the new projects added to HumanDisease, MouseDisease and OncoGEO, and the thousands of new normal tissue samples in GTEx_B38.
GTEx_B38 V8 - First Land on GenCode.V33
With 2020R2, we released our first Land on Human_B38/OmicSoftGenCode.V33, with over 16,000 RNA-seq samples profiling normal tissue expression.
To maintain compatibility with older Virtual Lands that include GTEx_B38, we released this update as GTEx_B38_GC33 (B38 refers to Human Genome version B38; GC33 refers to GenCode Version 33).
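As a small illustration of this naming convention, the sketch below splits a Land name into its base name, genome build and optional gene model suffix. The helper name and the regular expression are assumptions made for illustration; they are not part of OmicSoft software.

```python
import re

# Assumed pattern: <Name>_B<genome build>[_GC<GenCode version>]
LAND_NAME = re.compile(r"(?P<name>.+?)_(?P<genome>B\d+)(?:_(?P<model>GC\d+))?")


def parse_land_name(land):
    """Return (base name, genome build, gene model or None) for a Land name."""
    match = LAND_NAME.fullmatch(land)
    if match is None:
        raise ValueError(f"Unrecognized Land name: {land!r}")
    return match.group("name"), match.group("genome"), match.group("model")


for land in ("GTEx_B38_GC33", "GTEx_B38", "HumanDisease_B37"):
    print(land, "->", parse_land_name(land))
```

Lands without a GC suffix return None for the gene model, which makes it easy to tell the older and newer versions of the same Land apart.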
Figure 1. Gene FPKM of ACE2 across 16,963 samples from GTEx_B38_GC33.
This Land has been added automatically to hosted servers; if you have an onsite Land installation, please use Cloud Land Publishing to add it to your collection.
Figure 2. GTEx_B38_GC33 and other Lands, ready to be installed to the Land collection.
We will continue to release updated versions on this new genome and gene model, starting with the most popular Lands. We will continue to use the OmicSoft Aligner (OSA) and RSEM quantification; a benchmark white paper is in progress.
DiseaseLand content highlights:
Coronavirus-related research: In this release, we added 1119 samples and 920 comparisons from 23 projects to HumanDisease, and 357 samples and 203 comparisons from 11 projects to MouseDisease. These provide insights into coronavirus infection, associated lung damage, treatment and immune response.
New data: With the latest release, we've added the following data:
Figure 3. Sample distribution of new data added to HumanDisease_B37 in 2020R2.
With the latest update to OncoGEO, we added 4622 new samples and 832 comparisons from 112 projects.
Areas of focus:
Figure 4. Sample distribution of new data added to OncoGEO_B37 in 2020R2.
Note to OmicSoft Server Administrators
If you haven't restarted your Land server recently, consider doing this during a period of low usage. We've released several new improvements, and this also ensures that the latest files have been synchronized.
The OncoLand and DiseaseLand 2020R1 release is out! Servers should automatically update during low-traffic periods overnight.
To make the data available as quickly as possible, this release was delivered in two batches: GTEx_B37, OncoGEO and HumanDisease were released on April 24, 2020; OncoMouse, MouseDisease and RatDisease were released on May 11, 2020.
OncoLand has several new Lands available. Be sure to check them out! If you do not see these in your OncoLand collection, please contact your OmicSoft Server administrator to add the Lands to your server.
GTEx_B37 has 8,711 new RNA-seq samples, for a total of 16,964 RNA-seq samples. GTEx_B38 is scheduled to be updated to GTEx V8 with 2020R2.
Figure 1: Sample distribution of GTEx samples across tissues, colored by whether they were added in the latest release.
The Tissue metadata column now uses OmicSoft's controlled vocabularies, making it simpler to build virtual Lands. GTEx metadata terms can be found in Tissue_GTEx and TissueDetail_GTEx.
New projects in OncoLand 2020R1
Figure 2: New projects in OncoGEO and OncoMouse.
New projects in DiseaseLand 2020R1
Figure 3: New projects in Human, Mouse, and Rat Disease.
Figure 4: Comparisons from E-MTAB-6814, a developmental map of the human transcriptome across 7 tissues. Similar datasets are in MouseDisease (E-MTAB-6798) and RatDisease (E-MTAB-6811). In the Comparisons Distribution View, the ProjectName filter was used to find E-MTAB-6814. Comparison groups were specified by "Specify Histogram Columns: Case.ExperimentGroup", and subgrouped with "Specify Group Column: Case.ExperimentGroup".
This release includes several minor improvements. Please review these latest improvements and update if any would be useful for your research.
For example, if your data were run across multiple lanes, and the output files for Read1 are saved as "Batch2_1_S1_L001_R1_001.fastq.gz" in multiple directories (each directory holding data from one lane), you can ensure proper file pairing by specifying the order with "Add List" or during sample registration, and by selecting "pair files in order" when specifying alignment options.
In this example, "Pair Files In Order" will take all the files for Sample201 in the listed order, and properly pair those in folders "aRename" and "bRename".
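The sketch below illustrates why order-based pairing helps when the same file name appears in several directories. It assumes that the listed order alternates Read1 and Read2 files, so that each consecutive pair of entries forms one read pair; this is an assumption about the listing, not a description of how OmicSoft implements the option. The Read2 file names and the function name are hypothetical.

```python
def pair_files_in_order(files):
    """Pair consecutive entries of an ordered file list as (Read1, Read2)."""
    if len(files) % 2 != 0:
        raise ValueError("Expected an even number of files for paired-end data")
    return [(files[i], files[i + 1]) for i in range(0, len(files), 2)]


# Hypothetical listing for Sample201: one folder per lane, with identical
# file names in each folder, so pairing by name alone would be ambiguous.
sample201_files = [
    "aRename/Batch2_1_S1_L001_R1_001.fastq.gz",
    "aRename/Batch2_1_S1_L001_R2_001.fastq.gz",
    "bRename/Batch2_1_S1_L001_R1_001.fastq.gz",
    "bRename/Batch2_1_S1_L001_R2_001.fastq.gz",
]

for read1, read2 in pair_files_in_order(sample201_files):
    print(read1, "<->", read2)
```

Because pairing relies only on position in the list, each Read1 file stays matched to the Read2 file from the same folder.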
A new version of QIAGEN OmicSoft Suite has been released. Please review the latest improvements and update your OmicSoft Server at the next available opportunity to take advantage of these new features included in version 10.2.7!
With version 10.2.1, we are proud to support Docker images in "External Scripts". This is considered an advanced feature for OmicSoft "power users" who want to extend their OmicSoft Suite capabilities beyond the tools integrated into the software. Because of the wide variety of tools that can run in Docker images, OmicSoft Support cannot provide debugging support for each tool, but will be happy to answer questions about External Scripts syntax and to provide tutorials and example scripts. The QIAGEN Discovery Services team can also work with you to build full pipelines and workflows using External Scripts and Docker images for a variety of bioinformatics needs.
To support External Scripts on AWS, you will need to use an updated AMI. Please visit http://www.arrayserver.com/wiki/index.php?title=Build_AWS_Ubuntu_AMI_for_OmicSoft_Cloud_Computing
To support Docker in External Scripts on your onsite OmicSoft Server installation, please install Docker v19.
More useful resources:
A full log of all the changes is located in the Help menu of the Analysis tab in OmicSoft Studio.
Land visualizations: CRISPR/RNAi dependency screen data with multi-'omics integration views. Directly explore correlations of expression, mutation and gene dependency data in the CCLE Land, now updated with DepMap data. Use the “Add Measurement Data” function to bring in additional data such as drug sensitivity and metabolomic data.
Cloud Analysis: Map S3 buckets from multiple AWS accounts, including on master/analytic server setups. More flexible cloud configurations allow you to map buckets from collaborators and other shared buckets with your access/secret keys.
In case you missed it: Find out what was included in the 10.1 release (October 2019)
Cloud analysis: Spot Instance support. AWS spot instances use idle EC2 resources, which can be requested at significant cost savings over on-demand instances.
Single-cell analysis: Improved importing of Single Cell Expression Matrices. Merge memory-efficient Zero-Inflated Matrix (ZIM) data from multiple samples to compare single-cell data from multiple experiments.
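To show the general idea behind merging sparse single-cell matrices from several samples, here is a minimal sketch that stores only non-zero counts and prefixes each cell barcode with its sample ID. It uses plain Python dictionaries and does not represent the ZIM file format or OmicSoft's import logic; all names and counts are hypothetical.

```python
def merge_samples(samples):
    """Merge per-sample sparse count tables into one table.

    samples: {sample_id: {cell_barcode: {gene: count}}}
    Returns (merged, genes), where merged maps "sample_id:barcode" to a
    dictionary of non-zero counts and genes is the sorted union of genes.
    """
    merged = {}
    genes = set()
    for sample_id, cells in samples.items():
        for barcode, counts in cells.items():
            # Keep only non-zero entries so that zeros cost no memory
            nonzero = {gene: n for gene, n in counts.items() if n != 0}
            merged[f"{sample_id}:{barcode}"] = nonzero
            genes.update(nonzero)
    return merged, sorted(genes)


def expression(merged, cell, gene):
    """Look up a count; entries that were never stored are zeros."""
    return merged[cell].get(gene, 0)


# Hypothetical counts for two samples that share a cell barcode
samples = {
    "S1": {"AAAC": {"ACE2": 3, "TP53": 0}, "AAAG": {"KRAS": 1}},
    "S2": {"AAAC": {"TP53": 5}},
}
merged, genes = merge_samples(samples)
print(genes)
print(expression(merged, "S2:AAAC", "ACE2"))
```

Prefixing barcodes with the sample ID keeps cells from different experiments distinct even when the barcodes collide.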
.NET 4.5 Framework: Update from .NET 3.5 framework.
IPA integration: Multi-identifier uploads. Now you can specify up to five molecule identifier columns in your inference table when uploading from OmicSoft to IPA. This feature is especially useful for metabolomic studies.
Human disease updates: 48 new projects, with a focus on amyotrophic lateral sclerosis, Alzheimer’s disease, Huntington’s disease and HIV.
Mouse disease updates: 50 new projects, with a focus on models of amyotrophic lateral sclerosis, Alzheimer’s disease and Huntington’s disease.
In case you missed it: We added 67 projects and 1285 samples, with a focus on ophthalmology.
New Land: OncoMouse. Oncology-focused studies in mouse models, with 48 projects in the initial release.
OncoGEO updates: 68 new projects, with a focus on cancers of the reproductive system, GI system, respiratory system, urinary system, skin and CNS.
CCLE update: CRISPR/RNAi screen data have been integrated into CCLE Lands, enabling new multi-'omics comparisons.
In case you missed it: We added a new Land. BeatAML includes RNA-seq, DNA-seq and ex-vivo drug responses for over 500 patients.
OncoGEO/hematology added 58 new projects and 3653 samples, with an emphasis on hematologic cancers.
Land Explorer now supports web-based access to Land data with over 100 visualizations for expression, fusions, protein levels and mutations. Custom visualizations can easily be shared with colleagues using customized web links. Access and explore all the Land data that are important to you.
The Sample Explorer and Comparison Explorer pages summarize data across every Land in interactive plots. Use filters to identify the samples of interest to you and discover every Land with data relevant to your research.