
Latest improvements for IPA


What’s new in the QIAGEN® Ingenuity Pathway Analysis Spring Release (2023)

Identify potential cell types based on the set of genes on networks and pathways

Ingenuity Pathway Analysis (IPA®) can now predict cell types associated with the genes on your network or pathway. The prediction is based on an enrichment calculation for the set of genes on your pathway canvas versus sets of genes that are known to be expressed relatively highly in particular cell types. The underlying cell type expression data comes from The Human Protein Atlas (www.proteinatlas.org/).

Figure 1 shows a screenshot of the new Cells and Tissues overlay applied to a network derived from expression data for a natural killer single-cell cluster (human fetal liver, PMID 31597962). As expected, the overlay indicates that the network is enriched in natural killer cell genes (P value: 2.04E-20).

Figure 1. Enrichment of natural killer cell-enriched genes on a network. An overlay tag (labeled “CT: natural killer cells”) was added to the pathway after the genes on the network were found to be enriched in genes expressed relatively highly in natural killer cells. CT stands for “Cells or Tissues”. The underlying cell-type-enriched gene sets are defined as genes expressed in one cell type at more than three times the median expression across all other cell types in The Human Protein Atlas collection.
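The caption's definition of a cell-type-enriched gene (expression in one cell type above three times the median across all other cell types) can be sketched as follows. This is an illustration of the stated rule only, not The Human Protein Atlas or IPA pipeline; the function name, the gene-to-cell-type layout, and the expression values are hypothetical.

```python
from statistics import median


def cell_type_enriched_genes(
    expr: dict[str, dict[str, float]], fold: float = 3.0
) -> dict[str, set[str]]:
    """For each cell type, return the genes whose expression there is more than
    `fold` times the median expression across all *other* cell types."""
    enriched: dict[str, set[str]] = {}
    for gene, by_cell in expr.items():
        for cell, value in by_cell.items():
            enriched.setdefault(cell, set())
            others = [v for c, v in by_cell.items() if c != cell]
            if others and value > fold * median(others):
                enriched[cell].add(gene)
    return enriched


# Hypothetical expression values: gene -> cell type -> expression level.
expr = {
    "NCAM1": {"NK cells": 40.0, "T cells": 5.0, "B cells": 1.0, "monocytes": 2.0},
    "ACTB": {"NK cells": 100.0, "T cells": 110.0, "B cells": 95.0, "monocytes": 105.0},
}
result = cell_type_enriched_genes(expr)
print(result)  # NCAM1 is enriched in NK cells only; ACTB is enriched nowhere
```

Using the median of the other cell types, rather than the maximum, means a gene can still count as enriched in one cell type even if a single other cell type also expresses it moderately.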

The cell types are organized into three major branches of the Ingenuity Ontology: physiological system, eukaryotic cells, and gross anatomical part. A specific cell type typically appears in two or three of these branches. In the example of Figure 1, natural killer cells are found under the immune system (within the physiological system branch) and, as shown in Figure 2, also under blood cells in the eukaryotic cells branch.

Figure 2. Natural killer cells are categorized under the eukaryotic cells branch of the Ingenuity Ontology as well.

 

Causally score My Pathways in Core Analysis

With this new capability in IPA, you can set a pattern of activated or inhibited genes on a My Pathway, which IPA can then score by comparing that pattern to the differential expression of the analysis-ready molecules in your dataset. In doing so, IPA can predict whether the My Pathway is activated or inhibited in the context of your dataset. You can set the activation state (red or green) of each node manually with the red or green paint bucket in the MAP (Molecule Activity Predictor) feature, by overlaying an analysis or a dataset, or by combining the paint buckets with an overlaid analysis or dataset.

Figure 3 shows an example of a My Pathway created in IPA depicting several key epithelial–mesenchymal transition-related genes and biological functions. The gene nodes have been colored with the MAP paint buckets (red for activated and green for inhibited). Once the pathway has been saved and approved for scoring, it can be scored in future Core Analyses.

Figure 3. A custom My Pathway with nodes assigned by the user as activated (red) or inhibited (green). This pathway can be saved and scored in any future Core Analysis. Note that any orange or blue coloring of molecules, and any diseases or functions on the pathway, are not saved as part of the pattern used for scoring.

Scoring uses a z score algorithm similar to the one used for Canonical Pathways: IPA compares the up- or downregulated states of the analysis-ready molecules in your dataset with the activity states (red or green) of the matching molecules on each saved My Pathway. Figure 4 shows the My Pathways tab for a Core Analysis of expression data from claudin-low breast cancer cell lines ratioed to luminal cell lines (PMID 20813035).

IPA predicts that the custom EMT “My Pathway” is activated in the aggressive cancer lines, which is the expected result for these cells. The z score is positive because the actual expression direction in the dataset (shown in the fourth column in the table in Figure 4) matches the expected direction assigned in the saved My Pathway (displayed in the seventh column in Figure 4, labeled “Expected”).
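The comparison described above can be sketched as an unweighted z score, assuming the simple form z = (N_consistent - N_inconsistent) / sqrt(N), where N counts the pathway molecules that have a direction in the dataset. IPA's actual implementation may weight or filter molecules differently, so treat this as an illustration of the sign logic rather than IPA's code. The function name, gene pattern, and fold changes are hypothetical and are not the contents of Figure 3.

```python
from math import sqrt


def my_pathway_z_score(expected: dict[str, int], observed: dict[str, float]) -> float:
    """Unweighted z score for a saved My Pathway pattern (simplified sketch).

    expected: gene -> +1 (painted red, activated) or -1 (painted green, inhibited)
    observed: gene -> signed fold change of analysis-ready molecules in the dataset
    """
    total = 0
    n = 0
    for gene, direction in expected.items():
        change = observed.get(gene)
        if change is None or change == 0:
            continue  # not analysis-ready or no direction: assumed not to count
        n += 1
        # +1 when the observed direction matches the expected one, -1 otherwise
        total += 1 if (change > 0) == (direction > 0) else -1
    return total / sqrt(n) if n else 0.0


# Hypothetical EMT-style pattern and dataset values.
pattern = {"SNAI1": 1, "ZEB1": 1, "VIM": 1, "CDH1": -1}
dataset = {"SNAI1": 1.8, "ZEB1": 2.2, "VIM": 3.1, "CDH1": -2.5, "GAPDH": 0.05}
print(my_pathway_z_score(pattern, dataset))  # 2.0: all four directions match
```

A positive score means the dataset mostly agrees with the painted pattern (predicted activation). If CDH1 were instead upregulated, three molecules would match and one would not, giving (3 - 1) / sqrt(4) = 1.0.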

Figure 4. Causally scoring a My Pathway. The My Pathway named “EMT key TF” shown in Figure 3 has been scored in a Core Analysis and is indicated with the orange bar above. The orange color indicates the pathway is predicted to be activated in this expression analysis of aggressive breast cancer cell lines. As shown above the table, the z score for the pathway is 2.646.

This new capability provides you with the tools to create any pathway you can imagine and find out how it is impacted in your experimental setting. The genes on the pathway do not need to be connected by relationships. You can also start from a Canonical Pathway or other IPA pathway and modify it to create your My Pathway.

 

Set the User Dataset as the reference set when uploading a dataset

When analyzing a dataset, the most precise definition of the “universe” of genes to use in statistical calculations is the one that most closely matches the set of genes that you measured (or could measure) in your experimental setting. For example, if you are analyzing a panel of 400 genes, then the universe or “reference set” should be those 400 genes (or better yet, the subset of those genes that are measurable in the experimental conditions at hand). It would be statistically incorrect to set the reference set to all genes in the genome if you know you can only measure changes in those 400.

Similarly, if you are performing whole-transcriptome RNA-seq on mouse kidney tissue, the reference set would ideally be all genes in your experiment that you could reliably measure, for example, those whose RPKM values pass a threshold (such as RPKM > 1) in at least one sample. That way, the universe is set to “mouse kidney-expressed genes” rather than all possible genes in the genome, some of which are not expressed in mouse kidney.
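The filtering rule in the RNA-seq example (keep genes with RPKM above a threshold in at least one sample) can be sketched as below. This runs outside IPA; the function name and the RPKM values are hypothetical, and the resulting gene list is the kind of list you might upload in full and then designate as the User Dataset reference set.

```python
def expressed_reference_set(rpkm: dict[str, list[float]], threshold: float = 1.0) -> set[str]:
    """Return genes whose RPKM exceeds `threshold` in at least one sample."""
    return {gene for gene, values in rpkm.items() if any(v > threshold for v in values)}


# Hypothetical RPKM values for three mouse kidney samples.
rpkm = {
    "Umod": [210.4, 185.0, 240.7],  # clearly expressed
    "Slc34a1": [0.4, 1.6, 0.9],     # passes in one sample only, still kept
    "Alb": [0.1, 0.0, 0.3],         # below threshold everywhere, excluded
}
reference = expressed_reference_set(rpkm)
print(sorted(reference))  # ['Slc34a1', 'Umod']
```

Requiring the threshold in only one sample keeps genes that are expressed under some conditions but not others, which is usually what you want in a universe of "measurable" genes.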

IPA has always let you upload the entire set of detectable molecules and then, when creating an analysis, set the User Dataset as the reference set. However, it was easy to forget that setting when creating the analysis, which effectively used the entire genome as the universe instead. In this release of IPA, you can set the reference set to User Dataset during dataset upload, when you are more likely to remember to set it correctly.

Figure 5 shows the new upload setting.

Figure 5. Setting the reference set to User Dataset during dataset upload.

This new feature should reduce the chance of accidentally using a less-than-ideal reference set in your analyses.

Please remember that you should not use the “User Dataset” reference set option if your dataset contains only the significantly differentially expressed genes from your experiment. Unless you apply more stringent cutoffs at analysis time, the statistics will be incorrect in that case, because the analysis-ready genes would be identical to the reference set. The statistics are designed to detect enrichment within a smaller set of genes drawn from the larger universe of possible genes.
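To make the effect of the reference set concrete, the sketch below computes a right-tailed hypergeometric p-value (equivalent to a one-sided Fisher's exact test), which is assumed here to approximate how overlap significance is calculated; it is not IPA's code. The function name and gene counts are hypothetical, and for simplicity the pathway has the same 40 members in both universes, although in a whole-genome universe it would usually have more.

```python
from math import comb


def overlap_p_value(k: int, n: int, K: int, N: int) -> float:
    """Right-tailed hypergeometric p-value P(X >= k).

    N: genes in the reference set (universe)
    K: pathway genes present in the reference set
    n: analysis-ready (e.g., significant) genes
    k: overlap between the analysis-ready genes and the pathway
    """
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


# Hypothetical: 50 significant genes, 15 of which fall in a 40-gene pathway.
p_panel = overlap_p_value(k=15, n=50, K=40, N=400)       # universe = 400-gene panel
p_genome = overlap_p_value(k=15, n=50, K=40, N=20_000)   # universe = whole genome
print(f"panel universe:  p = {p_panel:.2e}")
print(f"genome universe: p = {p_genome:.2e}")

# If the analysis-ready list *is* the reference set (n == N), every pathway gene
# is necessarily in the list (k == K), so no enrichment can ever be detected.
p_degenerate = overlap_p_value(k=40, n=400, K=40, N=400)
print(f"list == universe: p = {p_degenerate}")  # p = 1.0
```

The expected overlap is 50 × 40 / 400 = 5 genes with the panel universe but only 50 × 40 / 20,000 = 0.1 with the genome universe, so the genome universe makes the same 15-gene overlap look far more significant than it is. The last case reflects the warning above about uploading only significant genes.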

 

Legend for Graphical Summary

A legend specific to the Graphical Summary (a tab in Core Analysis) appears in the top right corner of the screen when you view that tab, as shown in Figure 6.

Figure 6. Graphical Summary legend. The legend appears at the top right. A high-resolution copy can be downloaded from the help portal for inclusion in publications.

Other software improvements

  • The View Details menu item (available when right-clicking in a pathway canvas on nodes or in the white space between them) has been moved to the top of the right-click menu for easier access.
  • Now when you right-click and re-run an analysis, the prior filters and cutoffs are automatically applied upon opening. You no longer need to click the Recalculate button to update the analysis-ready molecule counts when first opening it. Of course, if you make changes to any filters or cutoffs, you will still need to press the Recalculate button to see the effects those changes make.
  • Clicking on a pathway name in the Canonical Pathways Summary tab now opens the correct tab in a Core Analysis.
  • Double-clicking a horizontal bar in a bar chart now opens the corresponding pathway diagram.
  • A mapping issue has been corrected in which a gene symbol such as C9 could be inadvertently assigned to a KEGG ID (“phosphate” in this case) instead of the correct complement protein.
  • A discrepancy between the results of “Grow to Canonical Pathway” and “Overlay Canonical Pathway” for groups and complexes has been resolved, so both approaches now produce the same results. In prior releases, the overlay tag method was too promiscuous because it used the individual members of groups or complexes to find overlaps. For example, if you asked “Which Canonical Pathways overlay onto the APC-FZR1 complex (i.e., which pathways contain that complex)?”, the tag method returned the correct pathways but also additional pathways that contained only the component FZR1 or APC.
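The difference between the two matching rules described in the last item can be shown with a toy model: matching on the complex node itself versus also matching on its individual members. This is not IPA's internal code, and the function name, pathway names, and memberships are hypothetical.

```python
def pathways_containing(
    node: str,
    pathways: dict[str, set[str]],
    members: dict[str, set[str]],
    expand_members: bool,
) -> list[str]:
    """List pathways whose node sets overlap the query node.

    With expand_members=True, a complex or group also matches pathways that
    contain any of its individual members (the old, overly broad behavior).
    """
    query = {node}
    if expand_members:
        query |= members.get(node, set())
    return sorted(name for name, nodes in pathways.items() if nodes & query)


# Hypothetical pathways and complex membership.
pathways = {
    "Pathway 1": {"APC-FZR1", "CDC20"},
    "Pathway 2": {"FZR1", "CDK1"},
    "Pathway 3": {"APC"},
    "Pathway 4": {"TP53"},
}
members = {"APC-FZR1": {"APC", "FZR1"}}

exact = pathways_containing("APC-FZR1", pathways, members, expand_members=False)
broad = pathways_containing("APC-FZR1", pathways, members, expand_members=True)
print(exact)  # ['Pathway 1']
print(broad)  # ['Pathway 1', 'Pathway 2', 'Pathway 3']
```

Only the exact rule answers the question "which pathways contain that complex"; the expanded rule also returns pathways that merely share a component.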

 

Content updates

Explore new areas with four new and four updated pathways

New pathways

  • ABRA Signaling Pathway
  • Activin Inhibin Signaling Pathway
  • DHCR24 Signaling Pathway
  • WNK Renal Signaling Pathway

Existing pathways updated to include an activity pattern

  • RAR Activation
  • Role of Tissue Factor in Cancer
  • Serotonin Receptor Signaling
  • Transcriptional Regulatory Network in Embryonic Stem Cells

 

Addition of >263,000 new findings (bringing the total in IPA to over 12.3 million)

  • >66,000 Expert findings
  • >24,000 protein–protein interaction findings from BioGRID
  • >500 protein–protein interaction findings from IntAct
  • >122,000 cancer mutation findings from ClinVar
  • >650 Gene Ontology findings
  • >800 gene-to-disease associations from Online Mendelian Inheritance in Man (OMIM)
  • >1,800 cancer mutation findings from COSMIC
  • >350 gene-to-disease findings from ClinGen
  • >900 target-to-disease findings from ClinicalTrials.gov
  • >1,100 drug-to-disease findings from ClinicalTrials.gov
  • >300 gene-to-disease or phenotype associations from the Mouse Genome Database (MGD or “Jax”)
  • >42,500 gene-to-cell type findings from The Human Protein Atlas (THPA)

 

135,641 expression datasets will be available in mid-April 2023 (10,714 added)

Breakdown of the OmicSoft datasets by land (mid-April 2023)

Land Explorer links for TARGET have been updated

Links on Gene Views for TARGET (Therapeutically Applicable Research to Generate Effective Treatments) for childhood cancer have been updated to point to the B38 GC33 gene model in Land Explorer, rather than the older B38 data.

If you have further questions, please contact your local QIAGEN representative or contact our Technical Support Center at www.qiagen.com/support/technical-support.