A new method of matching to other analyses has been developed that directly scores the analysis-ready genes from your analysis
of interest against those in each analysis in the OmicSoft repository in Ingenuity® Pathway Analysis (IPA®). This contrasts with
the original method in Analysis Match, which scores the overlap among Upstream Regulators, Canonical Pathways, etc.,
between the query and the other analyses.
We call this new method Dataset Matching because the matching occurs at the level of the dataset genes that go into each
analysis. The new score appears in the rightmost column in the Analysis Match table, adjacent to the original overall z-score
column. The new method can be more precise than the prior matching method. In addition, it can be used to match extremely
small datasets: those with fewer than 100 genes, and even as few as 10–20 genes. While this method is powerful, it may
offer fewer opportunities to discover analyses that are related at more distant “biological” levels but not as closely at the gene
level.
Figure 1 shows snippets of Analysis Match tables for the same analysis, sorted first by the original score and then by the
new score. The lower panel (where the matches are sorted by the new score) returns what appear to be closer matches to the
cardiomyocyte versus embryonic stem cell query analysis than the original method (shown in the top panel).
Figure 1. Analysis Match results sorted by the original score (top) and new score (bottom). The red arrows indicate analyses that are not from the expected muscle
or heart tissue. The lower table indicates that the new scoring method tends to return fewer of these unexpected tissues than the original method.
The set of genes that overlap between the query analysis and the matching ones can be seen by first creating a heatmap as
shown in Figure 2 (after selecting analyses that you wish to compare with your query), then clicking on a heatmap square of
interest in the row labeled “Analysis-ready genes”.
Figure 2. Heatmap of the top forty matching analyses. Each orange-colored square in the top row of the heatmap represents the z-score for that analysis versus the query, based on matches between the sets of analysis-ready genes. The bright orange square at the far left is the “self” match between the analysis-ready (AR) genes from the query and the query itself, which is shown in the pink-colored column. Note that this coloration is distinct from the orange coloring representing positive activation z-scores for the biological entities (e.g., Upstream Regulators) that are shown in the rest of the heatmap. Clicking on one of the squares will open a pathway in the adjacent pane that displays the genes that overlap between the query’s AR genes and the matching analysis (shown in more detail in Figure 3).
Clicking on a heatmap square will open a pathway displaying the set of analysis-ready genes that overlap between the query and the matching analysis. You can then open the pathway in a new window, and if desired, add an overlay of the query analysis as shown in Figure 3.
Figure 3. 250 genes match between the cardiomyocyte analysis and its best matching analysis. The “cardiomyocytes versus embryonic stem cell” analysis (derived from GSE47948, PMID: 22981692) strongly matched an analysis that examined myotubes differentiated for one day versus embryonic stem cells (GSE63136, PMID: 25801824). This pathway view was created by clicking the heatmap square and then manually overlaying the query analysis using the Analyses, Datasets & Lists feature in the Overlay tool. All 250 genes change in the same direction in the two analyses (i.e., either upregulated in both analyses or downregulated in both analyses). In contrast, the 10th best match has 214 genes in common with the cardiomyocyte query analysis, and 10 of those genes have a mismatch in direction (not shown).
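The counts quoted in the Figure 3 caption (shared genes, and how many of them agree in direction) can be illustrated with a short Python sketch. This is not IPA's Dataset Matching score, whose formula is not given here. The function name `overlap_concordance` and the expression values are hypothetical and serve only to show the bookkeeping.

```python
def overlap_concordance(query, other):
    """Count shared genes and how many change in the same direction.

    query, other: dicts mapping a gene symbol to a signed, nonzero
    expression value (for example a log ratio). Illustrative only;
    this is not IPA's scoring formula.
    """
    shared = sorted(set(query) & set(other))
    # A gene is concordant when it is up in both or down in both.
    same = [g for g in shared if (query[g] > 0) == (other[g] > 0)]
    return {
        "shared": len(shared),
        "same_direction": len(same),
        "opposite_direction": len(shared) - len(same),
    }


# Toy values, made up for this sketch (not taken from GSE47948 or GSE63136).
query = {"MYH6": 3.1, "TNNT2": 2.7, "ACTC1": 2.2, "POU5F1": -4.0, "NANOG": -3.5}
match = {"MYH6": 1.9, "TNNT2": 2.0, "POU5F1": -2.8, "NANOG": 0.6, "DES": 1.4}

print(overlap_concordance(query, match))
```

In this toy case four genes are shared, and one of them (NANOG) changes in opposite directions in the two analyses.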
As mentioned above, the new scoring method often works on small datasets, where there are typically too few genes to generate
robust Upstream Regulator, Causal Network, Canonical Pathway, and Disease and Function signatures to match to other
analyses.
As an example, Figure 4 shows the Analysis Match results for an analysis of the top 10 genes (by p-value and fold change)
from the cardiomyocyte dataset.
Figure 4. The new scoring method using a small dataset. The analysis of a 10-gene dataset from the cardiomyocytes versus embryonic stem cells matches the expected types of analyses.
The new Dataset Match scoring method is complementary to the original scoring method, and we hope you make interesting
discoveries with it!
A year ago, approximately 1500 disease and phenotype networks were created with machine learning (ML) techniques and made available in IPA. These “ML Disease Pathways” (originally called “Inferred Networks”) contain well-known genes and proteins that impact the diseases and phenotypes displayed in each network, together with molecules inferred by machine learning that are not yet known to be involved, or whose relationship to these outcomes was not yet curated, in the IPA Knowledge Base (Krämer, et al. 2022). These pathways are searchable in IPA by keyword, and you can view and overlay data onto them, but until now they were not scored against datasets in Core Analyses.
Now when you run a Core Analysis, these ML pathways are automatically scored against your dataset by z-score and p-value, and the results offer an opportunity to discover potentially novel relationships between your analysis and diseases and phenotypes. As an example, Figure 5 shows the results for the ML pathways scored against the transcriptional profile of simvastatin-treated human HUVEC cells (expression data derived from GSE85799).
Figure 5. ML Disease Pathways scored against simvastatin-treated rats (liver). The most significant result by Fisher’s Exact Test (right-tailed) is “Severe sepsis”.
Double-clicking on the bar for severe sepsis brings up its pathway diagram, as shown in Figure 6. The expression pattern from the overlaid simvastatin treatment (red or green nodes), combined with the effects on neighboring nodes predicted with the Molecule Activity Predictor (orange or blue nodes), indicates that this drug may decrease sepsis. Interestingly, the Chem View page for simvastatin in IPA indicates that the drug is in a phase 4 clinical trial in sepsis (though not severe sepsis as in this example).
Figure 6. The severe sepsis ML pathway overlaid with simvastatin differential expression data. IPA predicts that simvastatin may decrease severe sepsis.
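For readers unfamiliar with the right-tailed Fisher’s Exact Test named in the Figure 5 caption, the following Python sketch computes the probability of seeing at least the observed overlap between a pathway and a dataset by chance. It is a generic textbook calculation, not IPA's implementation. The function name, the parameter names and the choice of gene universe are assumptions made for illustration.

```python
from math import comb


def fisher_right_tailed(overlap, pathway_size, dataset_size, universe_size):
    """Right-tailed Fisher's exact test p-value, P(X >= overlap).

    X is the number of pathway genes found among dataset_size genes drawn
    from a universe of universe_size genes, pathway_size of which belong
    to the pathway (hypergeometric distribution). The universe is an
    assumption of this sketch; the reference set IPA uses is not
    described here.
    """
    max_k = min(pathway_size, dataset_size)
    total = comb(universe_size, dataset_size)
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, dataset_size - k)
        for k in range(overlap, max_k + 1)
    )
    return tail / total


# Hypothetical numbers: 3 of 5 dataset genes fall in a 5-gene pathway,
# out of a universe of 20 genes.
p_value = fisher_right_tailed(overlap=3, pathway_size=5, dataset_size=5, universe_size=20)
print(f"{p_value:.4f}")
```

A smaller p-value means the overlap is less likely to arise by chance. The test is right-tailed because only over-representation of pathway genes in the dataset is counted as evidence.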
To make it simple to choose which Canonical Pathways to include in a chart, an auto-complete box has been added to the Customize Chart dialog box. If you wish to exclude a certain pathway, just start typing a word in its name, then uncheck it when you see it in the results. On the other hand, if you want to quickly focus on just one or a handful of pathways, you can uncheck the Select All checkbox first, then type text related to the pathway to find those you want to include, and finally, select their checkboxes. Figure 7 shows an example of the latter case, where the user only wants to show actin-related pathways in the chart.
Figure 7. Quickly focus on Canonical Pathways of interest in the Customize Chart dialog box. Uncheck the Select All checkbox first (as shown) if you wish to search for and show a small number of pathways in the chart.
>29,000 protein-protein interaction findings from BioGRID
>407,000 cancer mutation findings from ClinVar
>1,800 target-to-disease findings from ClinicalTrials.gov
>1,700 drug-to-disease findings from ClinicalTrials.gov
>800 Gene Ontology findings
>220 mappable chemicals
>3,800 Lipid Maps IDs
If you have further questions, please contact your local QIAGEN® representative or contact our Technical Support Center at
www.qiagen.com/support/technical-support