New Pattern Search instantly discovers relevant QIAGEN OmicSoft analyses
The new Pattern Search capability in IPA builds on Analysis Match by enabling you to instantly scan >96,000 OmicSoft analyses starting with any collection of genes in a network or pathway as your “query”. With Pattern Search, you can answer questions, such as:
Start with any network or pathway from an analysis, or simply place genes of interest into a new My Pathway, then either overlay a dataset or manually color the genes with the MAP paint bucket to “activate” or “inhibit” them. Click the “Pattern Search” button to discover which OmicSoft analyses have similar or different gene expression patterns. You can subsequently create a gene heatmap to examine how the analyses match your query gene by gene.
For example, a set of gene signatures representing the consensus transcriptional effects of certain classes of drugs or chemicals in cell lines has been published (PMID 30552330). Figure 1 shows a My Pathway in which the HDAC inhibitor signature from the paper has been added. Pattern Search compares the added set of up- and down-regulated genes to the significantly differentially expressed genes in the >96,000 OmicSoft datasets.
Figure 1: A simple gene pattern (as a query) in a My Pathway. This signature represents the consensus transcriptional effects of HDAC inhibitors in cell lines, which the authors derived using LINCS project data (PMID 30552330). The genes indicated in red are expected to be up-regulated and in green down-regulated by HDAC inhibitors. Clicking the highlighted “Pattern Search” button searches the >96,000 OmicSoft analyses for genes with a matching or anti-matching pattern of up- or down-regulation, as shown in Figure 2.
The query pattern closely matches a number of LINCS datasets from cancer cell lines that have been treated with various HDAC inhibitors, such as belinostat and mocetinostat (see Figure 2). This pattern is expected for a positive control, as the paper used data from the LINCS project as input to originally create the signatures.
This result indicates that the z-score algorithm used for matching effectively finds the appropriate datasets. Note that this z-score algorithm simply evaluates whether the “analysis-ready” genes in each OmicSoft analysis directionally match the query signature pattern. In other words, genes that are either activated/upregulated (red) or inhibited/downregulated (green) in both the query and the analysis contribute positively to the z-score.
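The directional matching described above can be illustrated with a minimal sketch. This is not the actual IPA implementation: the function name, the gene names and the data layout are hypothetical, and two details are assumptions that the text does not state, namely that genes regulated in opposite directions contribute negatively and that the net count is normalized by the square root of the number of shared genes.

```python
import math


def directional_match_score(query, analysis):
    """Score how well an analysis directionally matches a query pattern.

    query:    dict mapping gene -> +1 (up/activated) or -1 (down/inhibited)
    analysis: dict mapping gene -> +1 or -1, restricted to the significantly
              differentially expressed ("analysis-ready") genes

    ASSUMPTIONS (not taken from the release notes): opposite-direction genes
    count as -1, and the net count is divided by sqrt(number of shared genes).
    """
    shared = [gene for gene in query if gene in analysis]
    if not shared:
        return 0.0  # nothing to compare, so no evidence either way
    net = sum(1 if query[gene] == analysis[gene] else -1 for gene in shared)
    return net / math.sqrt(len(shared))


# Hypothetical query pattern: two genes expected up, two expected down.
query = {"GENE_A": 1, "GENE_B": 1, "GENE_C": -1, "GENE_D": -1}

# An analysis with the same directions gives a positive score (a match);
# one with all directions reversed gives a negative score (an anti-match).
matching = dict(query)
anti_matching = {gene: -direction for gene, direction in query.items()}

print(directional_match_score(query, matching))       # 2.0
print(directional_match_score(query, anti_matching))  # -2.0
```

Under these assumptions, a strongly positive score corresponds to a matching analysis and a strongly negative score to an anti-matching one, while genes that are absent from either the query or the analysis are ignored.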
Figure 2: The HDAC inhibitor signature strongly matches the expression changes seen after HDAC-inhibitor treatment of various cell lines in LINCS. As expected for this query signature, the Pattern Search algorithm recovered the corresponding LINCS datasets. These LINCS matches can be filtered out to explore unexpected matches as well (see Figure 3).
After the LINCS datasets were filtered out to focus on other dataset sources, matches were detected to other experiments involving treatment with HDAC inhibitors, such as SAHA treatment of cell lines for neuroendocrine tumor, colon cancer and bladder cancer (see Figure 3). There were also matches to datasets unrelated to HDAC inhibitors. For example, one match is for a dataset involving ixazomib treatment of non-resistant control cells. Ixazomib, a proteasome 20S subunit beta 5 inhibitor, has been shown to act synergistically with HDAC inhibitors in Hodgkin and T-cell lymphomas (PMID 31452195). Thus, the Pattern Search results uncovered datasets that are unexpected but biologically relevant for follow-up.
Figure 3: The HDAC inhibitor signature strongly matches non-LINCS datasets as well. Filtering the results shown in Figure 2 to exclude selected dataset sources identified unexpected matches that could be interesting to follow up. In this example, the HDAC inhibitor gene expression pattern matched analyses of other treatments, such as ixazomib.
From the Activity Plot (such as those in Figures 2 or 3), you can easily create a gene heatmap to examine exactly which genes were significantly differentially expressed in the matching or anti-matching analyses. Figure 4 shows a gene heatmap with the query analysis in the leftmost column and several matching HDAC inhibitor-treatment analyses in the other columns.
Figure 4: The gene heatmap shows the details of how the query signature matches selected analyses. In this example, the HDAC inhibitor signature is shown in the leftmost column as solid red or green squares, indicating genes that are expected to be up-regulated or down-regulated, respectively, in the matching analyses. In the other columns, color intensity indicates the actual log-fold changes in a few matching analyses from the OmicSoft collection. Analyses were selected according to their use of various cell lines treated with SAHA. The selected analysis, highlighted in blue, displays its associated metadata in the table on the right. Note that none of the matching analyses is a perfect match for the query, but there are clear similarities among the differentially expressed genes.
It is important to note that Pattern Search works with up- and down-regulated genes in the query as well as with the inferred activities of nodes produced by Molecule Activity Predictor (MAP). For example, you can use a Regulator Effects network as a query. Such a network displays up- and down-regulated target genes from your dataset (indicated in red and green, respectively) and also indicates activation and inhibition of the inferred upstream regulators with orange and blue, respectively. In this case, activated upstream regulators are treated as up-regulated (red) and inhibited upstream regulators are treated as down-regulated (green) in the query pattern. Figure 5 illustrates this query and the results.
Figure 5: Pattern Search uses both known and inferred activity in the query. In this example, a Regulator Effects network serves as the query. For the purpose of pattern matching, the upstream regulators SNAI1 and F2R are treated as upregulated and CCN5 as downregulated. Functions, such as the epithelial-mesenchymal transition, are not used in the query as they will never match the OmicSoft datasets, which consist solely of molecules, such as genes or proteins.
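The way a Regulator Effects network is translated into a query pattern can be sketched as follows. This is only an illustration of the rules described above, not IPA code: the node dictionaries, the field names "kind" and "state", and the target gene names are hypothetical, while the regulators SNAI1, F2R and CCN5 and the excluded function are taken from the Figure 5 example.

```python
def build_query_pattern(nodes):
    """Convert network nodes into a gene -> direction query pattern.

    Each node is a dict with hypothetical fields:
      "name":  node label
      "kind":  "molecule" or "function"
      "state": "up", "down", "activated", "inhibited" or None

    Measured and inferred activity are merged: "up" and "activated" map to +1,
    "down" and "inhibited" map to -1. Function nodes are skipped because the
    analyses being searched contain only molecules.
    """
    direction_by_state = {"up": 1, "activated": 1, "down": -1, "inhibited": -1}
    pattern = {}
    for node in nodes:
        if node["kind"] != "molecule":
            continue  # functions and diseases never match, so leave them out
        direction = direction_by_state.get(node["state"])
        if direction is not None:  # uncolored nodes carry no direction
            pattern[node["name"]] = direction
    return pattern


# Regulators follow the Figure 5 example; the target genes are placeholders.
network_nodes = [
    {"name": "SNAI1", "kind": "molecule", "state": "activated"},
    {"name": "F2R", "kind": "molecule", "state": "activated"},
    {"name": "CCN5", "kind": "molecule", "state": "inhibited"},
    {"name": "TARGET_1", "kind": "molecule", "state": "up"},
    {"name": "TARGET_2", "kind": "molecule", "state": "down"},
    {"name": "TARGET_3", "kind": "molecule", "state": None},
    {"name": "epithelial-mesenchymal transition", "kind": "function",
     "state": "activated"},
]

print(build_query_pattern(network_nodes))
```

In this sketch, the inferred regulators and the measured target genes end up in one pattern with the same two directions, which mirrors the statement that activated regulators are treated as up-regulated and inhibited regulators as down-regulated.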
Pattern Search is the newest of several different approaches to augment your biological interpretation in IPA by using the OmicSoft expression analyses. Figure 6 highlights the rationale behind each of these approaches.
Figure 6: The three different features of the “Analysis Match” capability in IPA.
Note that Pattern Search and the other features shown in Figure 6 require an Analysis Match license for your IPA account. If this license is not currently active for you, contact customer support or your local QIAGEN Account Manager for more information about gaining access to this feature.
Comparison Analysis from Activity Plot results
You can now create a Comparison Analysis directly from any Activity Plot, including plots in Pattern Search, as shown in Figure 7.
Figure 7: Create a Comparison Analysis directly from selected rows in the Activity Plot table. In this example, the Activity Plot (left) was used to search for matching OmicSoft Lands datasets in which the upstream regulator NFE2L2 was significantly involved. The “comparisoncategory” column was filtered to include only datasets that were collected from treated samples. After selecting the top five results in the Activity Plot table and clicking the “View Comparison” button, a heat map (right) was calculated using the Hierarchical Clustering options for both the Canonical Pathways and the analyses.
Newly revised Land Explorer links in Gene Views improve navigation
The Land Explorer links section in IPA Gene Views now provides direct links to additional Lands and views in Land Explorer and the available data sources are labeled more clearly. For example, there are now links to survival plots and to sources such as ICGC (International Cancer Genome Consortium) that had not been directly linked before (Figure 8).
Figure 8: Revised Land Explorer links section in Gene Views. The links are categorized by source, type, consortia, etc. Now you can navigate directly to additional views and Lands.
Figure 9: A Kaplan–Meier survival curve is now just one click away from a Gene View in IPA. This survival curve was generated from the FOXM1 Gene View by clicking the “TARGET” link in the Survival by Expression row in the Oncology Consortia column shown in Figure 8. This view shows the survival rate over time and is categorized based on the expression of this gene.
Improve publications by customizing font size in networks and pathways
To communicate the importance of certain nodes or tags in a network or pathway, the font size can be selectively increased or decreased. These changes can easily be made by selecting one or more nodes or tags and using the new font size button, as shown below in Figure 10.
Figure 10: Now you can enlarge the font on just one node or tag using the new font size button. Adjusting the font size lets you emphasize the importance of selected nodes or tags and communicate more clearly with your pathway images.
New Canonical Signaling Pathway
Updates to four existing Canonical Signaling Pathways
>300,000 new findings (bringing the total in IPA to over 8.1 million), including the following:
>238,000 expert findings
>6600 protein–protein interaction findings from BioGRID
>2200 protein–protein interaction findings from IntAct
>1000 findings from Clinical Genome Resource (ClinGen, a new source for this release)
>51,000 cancer mutation findings from ClinVar
>12,700 findings from the Mouse Genome Database (MGD)
>5600 findings from the Online Mendelian Inheritance in Man (OMIM)
>100 Gene Ontology findings
>1300 target-to-disease findings from ClinicalTrials.gov
>1800 drug-to-disease findings from ClinicalTrials.gov
~350 newly mappable chemicals
>6000 new datasets (for a total of >96,000) are now available in Analysis Match, Activity Plot and Pattern Search
Table 1: OmicSoft analysis content in Analysis Match, Activity Plot and Pattern Search. More than 6000 new analyses have been added in this release.