Ingenuity Pathway Analysis (IPA®) can now predict cell types associated with the genes on your network or pathway. The prediction is based on an enrichment calculation for the set of genes on your pathway canvas versus sets of genes that are known to be expressed relatively highly in particular cell types. The underlying cell type expression data comes from The Human Protein Atlas (www.proteinatlas.org/).
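The enrichment calculation can be illustrated with a right-tailed hypergeometric test, which is the one-sided form of Fisher's exact test. This is a minimal sketch under stated assumptions: this section does not say which test IPA uses for the cell type overlay, and the function name, gene identifiers and set sizes below are hypothetical.

```python
from math import comb


def overlap_p_value(universe: set[str], canvas_genes: set[str], cell_type_genes: set[str]) -> float:
    """Right-tailed hypergeometric P(X >= k) for the canvas/cell-type overlap (illustrative)."""
    canvas = canvas_genes & universe         # genes on the pathway canvas
    cell_type = cell_type_genes & universe   # genes enriched in one cell type
    N = len(universe)                        # all genes that could have been drawn
    K = len(cell_type)                       # cell-type genes in the universe
    n = len(canvas)                          # size of the canvas gene set
    k = len(canvas & cell_type)              # observed overlap
    # Sum the probabilities of seeing k or more overlapping genes by chance.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


# Hypothetical example: a 20,000-gene universe, 200 NK-enriched genes,
# and a 50-gene network that contains 20 of them.
universe = {f"G{i}" for i in range(20_000)}
nk_genes = {f"G{i}" for i in range(200)}
network = {f"G{i}" for i in range(20)} | {f"G{i}" for i in range(1_000, 1_030)}
print(overlap_p_value(universe, network, nk_genes))
```

Only about 0.5 overlapping genes would be expected by chance here (50 × 200 / 20,000), so an overlap of 20 yields a very small p-value.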
Figure 1 shows a screenshot of the new Cells and Tissues overlay applied to a network derived from expression data from a natural killer single cell cluster (from human fetal liver, PMID 31597962). As expected, the overlay indicates that the network is enriched in natural killer cell genes (P value: 2.04E-20).
Figure 1. Enrichment of natural killer cell-enriched genes on a network. An overlay tag (labeled “CT: natural killer cells”) was added to the pathway after the genes on the network were found to be enriched in genes expressed relatively highly in natural killer cells. CT stands for “Cells or Tissues”. The underlying cell-type-enriched gene sets are defined as genes whose expression in one cell type is more than three times the median expression across all other cell types in The Human Protein Atlas collection.
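The three-times-the-median rule can be written in a few lines. This sketch assumes a simple in-memory table of expression values per gene and cell type; the dictionary layout, function name and values are hypothetical, and this is not the Human Protein Atlas processing pipeline. Note that under this rule a gene can qualify as enriched in more than one cell type.

```python
from statistics import median


def cell_type_enriched_genes(expr: dict[str, dict[str, float]], fold: float = 3.0) -> dict[str, set[str]]:
    """Group genes by the cell types in which they exceed fold x median of all other cell types.

    expr[gene][cell_type] -> expression value (hypothetical layout).
    """
    enriched: dict[str, set[str]] = {}
    for gene, by_cell in expr.items():
        for cell, value in by_cell.items():
            others = [v for c, v in by_cell.items() if c != cell]
            if others and value > fold * median(others):
                enriched.setdefault(cell, set()).add(gene)
    return enriched


# Hypothetical expression values.
expr = {
    "GNLY": {"NK cell": 900.0, "T cell": 20.0, "B cell": 5.0, "monocyte": 10.0},
    "ACTB": {"NK cell": 500.0, "T cell": 480.0, "B cell": 510.0, "monocyte": 495.0},
}
print(cell_type_enriched_genes(expr))  # {'NK cell': {'GNLY'}}
```

GNLY qualifies for NK cells (900 > 3 × 10), whereas the uniformly expressed ACTB qualifies for no cell type.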
The cell types are organized into three major branches of the Ingenuity Ontology: physiological system, eukaryotic cells, and gross anatomical part. A given cell type typically appears in two or three of these branches. In the example in Figure 1, natural killer cells appear under the immune system (within the physiological system branch) and, as shown in Figure 2, also under blood cells in the eukaryotic cells branch.
Figure 2. Natural killer cells are categorized under the eukaryotic cells branch of the Ingenuity Ontology as well.
With this new capability in IPA, you can set a pattern of activated or inhibited genes on a My Pathway, which IPA can then score by comparing that pattern with the differential expression of the analysis-ready molecules in your dataset. In so doing, IPA can predict whether the My Pathway is activated or inhibited in the context of your dataset. The activation state (red or green) of each node can be set by overlaying an analysis or dataset, manually with the red or green paint bucket in the MAP (Molecule Activity Predictor) feature, or by combining the paint buckets with an overlaid analysis or dataset.
Figure 3 shows an example of a My Pathway created in IPA depicting several key epithelial–mesenchymal transition-related genes and biological functions. The gene nodes have been colored with the MAP paint buckets (red for activated and green for inhibited). Once the pathway has been saved and approved for scoring, the pathway can be scored in the context of future Core Analyses.
Figure 3. A custom My Pathway with nodes assigned by the user as activated (red) or inhibited (green). This pathway can be saved and scored in any future Core Analysis. Note that orange or blue coloring of molecules, and any diseases or functions on the pathway, are not saved as part of the pathway pattern for scoring purposes.
Scoring uses a z score algorithm similar to the one used for Canonical Pathways: it compares the up- or downregulated states of the analysis-ready molecules in your dataset with the activity states (red or green) of the matching molecules on each saved My Pathway. Figure 4 shows the My Pathways tab for a Core Analysis of expression data from claudin-low breast cancer cell lines ratioed to luminal cell lines (PMID 20813035).
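A simplified version of such a z score can be sketched as follows. This assumes an unweighted formula, z = (consistent molecules − inconsistent molecules) / √(matched molecules), chosen for illustration; IPA's production scoring may weight or filter molecules differently. The function name, gene directions and fold changes are hypothetical.

```python
from math import sqrt
from typing import Optional


def pattern_z_score(expected: dict[str, int], fold_change: dict[str, float]) -> Optional[float]:
    """Unweighted z score: +1 per consistent molecule, -1 per inconsistent one.

    expected: +1 for nodes painted red (activated), -1 for green (inhibited).
    fold_change: signed fold changes of analysis-ready molecules in the dataset.
    """
    votes = []
    for gene, direction in expected.items():
        fc = fold_change.get(gene)
        if fc is None or fc == 0:
            continue  # molecule not measured, or no direction to compare
        votes.append(1 if (fc > 0) == (direction > 0) else -1)
    if not votes:
        return None  # no overlap with the dataset: no score
    return sum(votes) / sqrt(len(votes))


# Hypothetical EMT pattern and dataset fold changes.
emt_pattern = {"SNAI1": 1, "SNAI2": 1, "TWIST1": 1, "ZEB1": 1, "CDH1": -1}
dataset = {"SNAI1": 2.1, "SNAI2": 1.4, "TWIST1": 3.0, "ZEB1": 1.8, "CDH1": -4.2, "GAPDH": 0.1}
print(pattern_z_score(emt_pattern, dataset))  # all 5 consistent: 5 / sqrt(5) ≈ 2.236
```

A positive score means the dataset mostly agrees with the painted pattern; a negative score means it mostly runs in the opposite direction.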
IPA predicts that the custom EMT “My Pathway” is activated in the aggressive cancer lines, which is the expected result for these cells. The z score is positive because the actual expression direction in the dataset (shown in the fourth column in the table in Figure 4) matches the expected direction assigned in the saved My Pathway (displayed in the seventh column in Figure 4, labeled “Expected”).
Figure 4. Causally scoring a My Pathway. The My Pathway named “EMT key TF” shown in Figure 3 has been scored in a Core Analysis and is indicated with the orange bar above. The orange color indicates the pathway is predicted to be activated in this expression analysis of aggressive breast cancer cell lines. As shown above the table, the z score for the pathway is 2.646.
This new capability provides you with the tools to create any pathway you can imagine and find out how it is impacted in your experimental setting. The genes on the pathway do not need to be connected by relationships. You can also modify a Canonical Pathway or other IPA pathway as your starting point for your My Pathway.
When analyzing a dataset, the most precise definition of the “universe” of genes to use in statistical calculations is the one that most closely matches the set of genes that you measured (or could measure) in your experimental setting. For example, if you are analyzing a panel of 400 genes, then the universe or “reference set” should be those 400 genes (or better yet, the subset of those genes that are measurable in the experimental conditions at hand). It would be statistically incorrect to set the reference set to all genes in the genome if you know you can only measure changes in those 400.
Or, for example, if you are performing whole transcriptome RNA-seq from mouse kidney tissue, then the reference set would ideally be the set of all genes in your experiment that you could reliably measure, for example, those with RPKM values that passed some threshold in at least one sample (e.g., RPKM > 1). That way, the universe is set to “mouse kidney-expressed genes” rather than all possible genes in the genome, some of which are not expressed in mouse kidney.
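Here is a minimal sketch of building such a reference set. It assumes an RPKM table held as a dictionary that maps each gene symbol to its per-sample values; the function name, gene names and numbers are hypothetical, and in practice you would read these values from your own quantification output.

```python
def expressed_reference_set(rpkm: dict[str, list[float]], threshold: float = 1.0) -> set[str]:
    """Genes whose RPKM exceeds the threshold in at least one sample."""
    return {gene for gene, values in rpkm.items() if any(v > threshold for v in values)}


# Hypothetical RPKM values for three mouse kidney samples.
rpkm = {
    "Umod": [850.2, 910.7, 790.0],
    "Slc34a1": [120.5, 98.3, 130.1],
    "Alb": [0.2, 0.0, 0.4],
    "Cd19": [0.1, 1.3, 0.0],
}
reference = expressed_reference_set(rpkm)
print(sorted(reference))  # ['Cd19', 'Slc34a1', 'Umod']
```

Cd19 passes because one of its samples exceeds the threshold. Whether a single-sample criterion is strict enough for your experiment is a judgment call.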
IPA has always enabled you to upload the entire set of detectable molecules and then when analyzing the data, to set the User Dataset as the reference set. However, it was easy to forget to use that setting when creating the analysis, resulting in effectively using the entire genome as the universe instead. In this release of IPA, you can set the reference set to User Dataset during dataset upload instead, when you are more likely to remember to set it correctly.
Figure 5 shows the new upload setting.
Figure 5. Setting the reference set to User Dataset reference during dataset upload.
This new feature should reduce the chance of accidentally using a less-than-ideal reference set in your analyses.
Please remember that you should not use the “User Dataset” reference set option if your dataset contains only the significantly differentially expressed genes from your experiment. In that case, unless you apply more stringent cutoffs at analysis time, the statistics will be incorrect, because the analysis-ready genes and the reference set are identical. The statistics are designed to detect enrichment within a smaller set of genes drawn from the universe of possible genes.
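To show why the choice of universe matters, here is a minimal sketch using a right-tailed hypergeometric test (the one-sided form of Fisher's exact test; this section does not name the test, so that choice is an assumption). The function name and all counts are hypothetical: a 400-gene panel, a 40-gene pathway that lies entirely on the panel, and 100 significant genes, 15 of which fall on the pathway.

```python
from math import comb


def enrichment_p(N: int, K: int, n: int, k: int) -> float:
    """Right-tailed hypergeometric P(X >= k).

    N: universe (reference set) size   K: pathway genes in the universe
    n: analysis-ready genes            k: analysis-ready genes on the pathway
    """
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


# Same observed overlap, evaluated against three different universes.
p_panel = enrichment_p(N=400, K=40, n=100, k=15)        # universe = the 400 measured genes
p_genome = enrichment_p(N=20_000, K=40, n=100, k=15)    # universe = whole genome (too large)
p_degenerate = enrichment_p(N=100, K=15, n=100, k=15)   # universe = the significant genes only
print(p_panel, p_genome, p_degenerate)
```

With the whole genome as the universe, the same overlap appears far more significant than it is. When the significant genes serve as their own universe, every overlap is certain, so p = 1 and no enrichment can be detected.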
A legend specific to the Graphical Summary (a tab in Core Analysis) appears in the top right corner of the screen when you view that tab, as shown in Figure 6.
Figure 6. Graphical Summary legend. The legend appears at the top right. A high-resolution copy can be downloaded from the help portal for inclusion in publications.
Links on Gene Views for TARGET (Therapeutically Applicable Research to Generate Effective Treatments) for childhood cancer have been updated to point to the B38 GC33 gene model in Land Explorer, rather than the older B38 data.
If you have further questions, please contact your local QIAGEN representative or contact our Technical Support Center at www.qiagen.com/support/technical-support.
Now you can open your Core Analysis as soon as it begins processing and view your results in real time as they become available. To open the analysis as soon as it is submitted, double-click the Core Analysis icon in the Project Manager. If the analysis icon shows a clock, the analysis is in the queue for processing, and you can view its position in the queue by double-clicking the icon. Once the analysis is running, the file icon gradually fills with green to indicate progress. Individual tabs appear as processing starts, and each tab name changes from gray to black font when its processing is complete. Some results tabs, such as Canonical Pathways, My Pathways, Tox Lists and My Lists, are usually ready in less than one minute, so you can start exploring your results right away.
Figure 1 shows a screenshot of the Summary tab, which has been re-purposed as a dashboard, showing the status of each tab while the analysis is running. Once the analysis is completely finished, the Summary tab reverts to summarizing the overall results.
Figure 1. Start viewing your Core Analysis results even as it is running. The Summary tab initially shows the status of each tab and reverts to summarizing the results once the overall analysis is finished.
Figure 2 shows the completed Canonical Pathways tab in the context of a partially complete analysis. This completed tab is fully active and usable even before the other processes (grayed out tabs) are ready.
Figure 2. The Canonical Pathways tab completes quickly. This tab is fully usable even while other tabs are still running on IPA servers.
Publishers prefer compact representations of data to save page space in journals. Now, when bar charts are displayed with the horizontal option (i.e., horizontal rather than vertical bar orientation), the bars are spaced more closely than before, and the chart labels are right-aligned, as shown in Figure 3.
Figure 3. New compact layout for horizontal bar charts. The bars are now tightly spaced when the horizontal display option is chosen. Note that, for this image, font sizes have been increased to 20 points in the Customize Chart dialog.
In this release, some tabs in Core Analysis have been consolidated to be more intuitive and space-saving. The Canonical Pathways tab and the My Pathways tab have been brought together as sub-tabs under a new Pathways tab, and the Molecules tab has been moved to the rightmost position. Now, all the tabs can be seen by default without enlarging the analysis window.
Now you can right-click any node in a pathway or network to see its corresponding Gene View, Chem View or Disease View web page. For example, right-clicking a gene node and selecting “View Details” displays the Gene View page in your default web browser, as shown in Figure 4.
Figure 4. Right-click a node in a network or pathway and select "View Details" to view the details for the corresponding gene, chemical, disease or function.
~96,000 expert findings
~10,000 protein–protein interaction findings from BioGRID
~8000 cancer-mutation findings from ClinVar
~4000 Gene Ontology findings
~1200 target-to-disease findings from ClinicalTrials.gov
~1200 drug-to-disease findings from ClinicalTrials.gov
~150 gene-to-disease or phenotype associations from the Mouse Genome Database (MGD or “Jax”)
~200 newly mappable chemicals
Bubble charts present multi-variate data in an easily understood graphical form that can help explain your data to your audience.
In this release, you can now show individual Canonical Pathway names along either axis.
Figure 1 shows a bubble chart that plots the predicted effects of an Nrf2-activating compound on Canonical Pathway activity in mouse kidney. The pathways are sorted by significance (Benjamini-Hochberg-corrected, right-tailed Fisher’s exact test, which tests the significance of the overlap of dataset genes with each pathway) and colored according to predicted activation. The bubbles grow larger with the number of genes that overlap each pathway.
The most significant pathways appear at the right of the x-axis. In this dataset, they are mostly activated and also have the largest number of genes overlapping the dataset (as indicated by bubble size).
One advantage of this view (in contrast to the default IPA bar-chart view) is the compactness of the bubble chart. Every significant pathway is displayed in the single image in Figure 1. Bar charts tend to be much wider than bubble charts. With bubble charts, a third variable can be displayed in addition to p-value and z-score. In Figure 1, the circle size indicates the size of the overlap for each pathway.
Figure 1. Nrf2-activator Canonical Pathway scores plotted as pathway name vs. significance. The colors indicate the z-score (see legend at top right), and the size of the bubble increases with the number of overlapping genes. Therefore, the large orange bubbles at the right of the chart represent pathways that are statistically significant, are predicted to be activated, and have many genes overlapping the dataset. The gene expression data are from kidneys of mice fed the Nrf2-activating chemical CDDO-me (Shelton, L.M., et al., 2015, Kidney Int. 88:1261. PMC4676083).
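For readers unfamiliar with the Benjamini-Hochberg correction mentioned above, this sketch shows the standard step-up procedure applied to a list of raw p-values. It is the textbook method, shown for illustration only. The input values are hypothetical, and IPA's implementation details are not described here.

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices from smallest to largest p
    adjusted = [0.0] * m
    running_min = 1.0  # caps values at 1 and enforces monotonicity
    # Walk from the largest p-value down, taking the cumulative minimum of p * m / rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical pathway p-values
print(benjamini_hochberg(raw))   # approximately [0.02, 0.04, 0.04, 0.02]
```

The adjusted values preserve the ranking of the raw p-values while controlling the false discovery rate across all the pathways tested.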
There are other ways to plot your data so that pathway names appear on an axis. For example, you can create a chart that emphasizes the categorization of the pathways (see Figure 2).
Figure 2. Canonical Pathway scores from single-cell data plotted as pathway name versus pathway category. This view helps you see how the pathways and their scores cluster into categories. The analyzed data are from the natural killer (NK) cell cluster from developing human liver. The data were re-processed with the CLC Workbench single-cell module from data published in Popescu, D.-M., et al. (2019) Decoding human fetal liver haematopoiesis. Nature 574:365. The analysis in IPA is derived from the gene expression of the NK single-cell cluster (compared with the rest of the clusters). As listed in the legend (top right), the bubbles are colored according to z-score, and the bubble size is based on the -log(p-value): the largest bubbles have the most significant Benjamini-Hochberg-corrected right-tailed Fisher’s exact test scores. For example, the Natural Killer Cell Signaling and T Cell Receptor Signaling pathways in the lower left corner are predicted to be activated in these NK cells. These pathways are members of the Cellular Immune Response category (as shown on the y-axis).
You can now toggle the height of the column headers in heatmaps to show the full analysis names, as shown in Figure 3. The taller headers are present in exported image files as well.
Figure 3. Expand column headers in comparison heatmaps to see the entire analysis names. This expansion applies both to Comparison Analysis heatmaps and Analysis Match heatmaps.
Explore new areas with three new and eight updated pathways
Updated to include an activity pattern
>1.5 million new findings (bringing the total in IPA to over 11.8 million)
~60,000 expert findings
~995,000 findings from TargetScan (mouse)
~45,000 protein–protein interaction findings from BioGRID
~5000 Gene Ontology findings
~2000 target-to-disease findings from ClinicalTrials.gov
~2100 drug-to-disease findings from ClinicalTrials.gov
~2300 protein–protein interaction findings from IntAct
~1000 findings from the Online Mendelian Inheritance in Man (OMIM)
~600 gene-to-disease or phenotype associations from the Mouse Genome Database (MGD or “Jax”)
~18,500 findings from the Human Metabolome Database (HMDB)
~550,000 findings from The Human Protein Atlas (THPA)
Visualize your data in new ways with Canonical Pathway bubble charts
Bubble charts present multi-variate data in an easily understood graphical form that can help explain your data to your audience. Now, you can display and export bubble charts of your Core Analysis Canonical Pathway scores in QIAGEN Ingenuity Pathway Analysis (IPA). For example, Figure 1 shows a bubble chart that plots the effect of an NRF2 activator on Canonical Pathway activity. In this chart, the scores are organized by pathway category and colored according to predicted activation.
In this example, the bubble size is related to the number of genes that overlap each pathway. The figure shows that this NRF2 activator turns on multiple pathways related to xenobiotic metabolism, toxicity and cellular stress (see bottom right area of Figure 1). These pathways also exhibit a higher number of overlapping genes relative to other pathways in the chart.
Figure 1. NRF2 activator Canonical Pathway scores arranged by category. The colors indicate the z-score, and bubble size corresponds to the number of overlapping genes. The large orange bubbles represent pathways that are statistically significant, predicted to be activated, and have many overlapping genes from the dataset. Gene expression data from Shelton, L.M. et al. (2015) Kidney Int. 88:1261. PMC4676083.
There are several ways to plot your data with the new bubble charts. For example, you can create a chart similar to the one in Figure 2 to spotlight the highly significant and activated pathways with many overlapping genes (see top right of chart).
Figure 2. NRF2 activator pathway data plotted to highlight the most significant pathways that are activated or inhibited. Pathway bubbles near the top of the chart are the most significant. The blue bubbles towards the left are inhibited, and the orange bubbles towards the right are activated. As in the other figures, the size of the bubbles is proportional to the number of overlapping genes.
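The encoding described for Figure 2 (x position from the z-score, y position from significance, size from overlap, color from predicted direction) can be sketched as a small data transformation. The function and field names, pathway names and scores are hypothetical, and using -log10(p) for the y-axis is an assumption. Placing pathways without a z-score at x = 0 and coloring them gray is also an assumption of this sketch.

```python
from math import log10
from typing import NamedTuple, Optional


class Bubble(NamedTuple):
    pathway: str
    x: float     # activation z-score: left = inhibited, right = activated
    y: float     # -log10(p-value): higher = more significant
    size: int    # number of dataset genes overlapping the pathway
    color: str   # orange = predicted activated, blue = predicted inhibited


def to_bubble(pathway: str, z: Optional[float], p: float, overlap: int) -> Bubble:
    """Map one pathway's scores to bubble-chart coordinates (illustrative)."""
    if z is None or z == 0:
        color, x = "gray", 0.0   # assumption: no directional prediction is drawn at x = 0
    elif z > 0:
        color, x = "orange", z
    else:
        color, x = "blue", z
    return Bubble(pathway, x, -log10(p), overlap, color)


# Hypothetical scores for two pathways.
activated = to_bubble("Pathway A", z=3.2, p=1e-12, overlap=45)
inhibited = to_bubble("Pathway B", z=-2.0, p=1e-3, overlap=8)
print(activated)
print(inhibited)
```

Any plotting library can then draw each Bubble at (x, y) with a marker scaled by size.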
Visualize OmicSoft single-cell data in Land Explorer
Now you can easily explore single-cell expression for any gene in the public data curated by OmicSoft. These single-cell views are available via new links in IPA Gene Views (Figure 3).
Figure 3. New links for single-cell views in Gene View (highlighted in the red box).
Improve the readability of your networks with new node-label placement
The labels of nodes (e.g., gene names, disease names) in IPA networks can now be positioned below their node shapes to make them easier to read, especially when they otherwise would be superimposed over dark fill colors. Figure 4 below shows two examples of IPA networks in which the node labels are positioned below their corresponding shapes.
Figure 4. Interpret your networks more readily with the node labels placed below the node shapes. Examples are shown for a Graphical Summary (left) and for an interaction network in which the data values are also displayed (right).
Expand your research with support for the upload of new species datasets
Now you can analyze datasets from a wider range of species in IPA, including crab-eating macaque, pig and Chinese hamster ovary (CHO) cells. IPA now supports an additional 11 species for a total of 25 supported species.
See this help article for the full list of 25 species.
Take advantage of inferred disease and phenotype networks with Search
When investigating a disease, it is useful to understand the key genes involved and how they interact to drive the occurrence or severity of the condition. To this end, a large library of Disease and Phenotype Networks has been created by leveraging an unsupervised machine learning (ML) model of the literature-derived QIAGEN Knowledge Graph (QKG).
Each network in the collection focuses on a single disease or phenotype and contains key genes and impacted biological functions, as well as the relationships between them that drive the condition. In addition, a colored pattern of predicted activation is overlaid to show how the activation or inhibition of genes leads to the disease.
The intent is to provide a relatively small snapshot of the primary factors involved. The network does not contain all molecules known to be related to the disease in the QKG; including all molecules would often result in an unreadable, densely connected network with hundreds, if not thousands, of nodes. Instead, the ML algorithm prioritizes the most important genes and functions and generates networks of reasonable sizes (~50 nodes on average) that provide a good overview in a comprehensible manner.
This release provides >1,500 of these networks, which are also now fully discoverable in Search. As shown in Figure 5, you can search for these networks in the Pathways and Lists tab using terms, such as diseases, phenotypes or gene names.
Figure 5. Discover IPA’s recently developed Disease and Phenotype networks via Search. In the top panel, the disease term "cholestasis" has been used to search, whereas, in the bottom panel, the gene name "ABCB4" has been used. Each row in the results is a specific network. Clicking a blue hyperlink in the result will open the corresponding network.
To make such prioritizations, the algorithm uses unsupervised gene and function embeddings derived from causal relationships in the QIAGEN Knowledge Base (QKB). Unlike many ML applications in biology, the algorithm does not train on differential expression or other forms of raw data; instead, it leverages the causal associations that experts have curated from the biomedical literature into the QKB for more than 20 years. More details about the approach are available in our recently published paper: “Mining hidden knowledge: Embedding models of cause-effect relationships curated from the biomedical literature”, Krämer, A., et al. (2022) Bioinformatics Advances.
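To give a sense of how embeddings can be used to prioritize genes, here is a minimal sketch that ranks genes by cosine similarity to a disease vector. This is a generic technique, not the algorithm described in the paper above. The function names, three-dimensional vectors and gene names are hypothetical, and real embeddings have many more dimensions.

```python
from math import sqrt


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_genes(disease_vec: list[float], gene_vecs: dict[str, list[float]], top_n: int) -> list[str]:
    """Return the top_n genes whose embeddings point most nearly in the disease's direction."""
    scored = sorted(
        ((cosine(disease_vec, vec), gene) for gene, vec in gene_vecs.items()),
        reverse=True,
    )
    return [gene for _, gene in scored[:top_n]]


# Hypothetical 3-dimensional embeddings.
disease = [1.0, 0.0, 0.0]
genes = {"GENE_A": [0.9, 0.1, 0.0], "GENE_B": [0.0, 1.0, 0.0], "GENE_C": [0.7, 0.7, 0.0]}
print(rank_genes(disease, genes, top_n=2))  # ['GENE_A', 'GENE_C']
```

Because the ranking depends only on the vectors, a gene can score highly even if it has no direct link to the disease, which is one way such an approach can surface candidate associations.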
The results are generated algorithmically without further curation by human experts. Each network generally includes well-known participants in the disease and predicts new associations not previously present in the QKB. Some of these predictions may be opportunities for novel discoveries.
Speed up your work by opening Canonical Pathways directly from links in Gene Views
Clicking a Canonical Pathway name in a Gene View will now open the pathway in the IPA client, as shown in Figure 6. The gene of interest will be highlighted in the opened pathway.
Figure 6. Canonical Pathway links on Gene Views. Clicking a link opens the pathway diagram and highlights the corresponding gene of interest (i.e., the gene represented by the clicked Gene View).
Reduced runtimes for Core Analysis
With this update, Core Analyses complete more quickly than before. Improving IPA performance is a priority for the IPA team at QIAGEN, and this is the first of several performance improvements planned for upcoming releases.
Option to turn Molecule Activity Predictor (MAP) off by default
Now you can turn off MAP prediction by default (Preferences > Application Preferences > Graph Appearance). MAP is normally turned on by default, so that, if you open a pathway from an analysis, the MAP color overlay is automatically enabled. Now you can turn off MAP prediction globally and still have the option to use it in an “on demand” fashion with the MAP option in the Overlay menu.
Single sign-on availability
The latest version of the IPA client launcher (https://analysis.ingenuity.com/pa/installer/select) enables signing on using your institution’s single sign-on (SSO) service. With SSO, you no longer need to maintain a separate password for IPA or perform multi-factor authentication (e.g., requesting an emailed code); you simply use the same institutional password that you already use for other systems at work. This service is free of charge. For more information, visit https://apps.ingenuity.com/ingsso/ssoInstructions.
Explore new areas with 10 new Canonical Pathways
>450,000 new findings (bringing the total in IPA to over 10.3 million)
~45,000 Expert findings
~400,000 cancer mutation findings from ClinVar
~18,000 protein–protein interaction findings from BioGRID
~2100 target-to-disease findings from ClinicalTrials.gov
~1900 drug-to-disease findings from ClinicalTrials.gov
~800 Gene Ontology findings
~300 protein–protein interaction findings from IntAct
~180 gene-to-disease or phenotype associations from the Mouse Genome Database (MGD or “Jax”)
~40 chemical-to-cancer findings from the Chemical Carcinogenesis Research Information System (CCRIS)
~175 newly mappable chemicals
118,293 expression datasets are now available (6,858 added)
This release offers a new source of data, ENCODE RNA binding, which contains RNA-seq experiments of 1122 samples from two popular cell lines (K562 and HepG2) after shRNA knockdown targeting various proteins (Van Nostrand, E.L., et al. (2020) Nature 583:711; https://www.nature.com/articles/s41586-020-2077-3).
These are represented by 486 comparison datasets in IPA. You can also look up the absolute expression or the differential expression for a gene of interest among these shRNA treatments. Look for the links entitled “ENCODE RNA-associated gene knockdown” in any Gene View in IPA.
Improved ease of use and increased search capabilities
The Search tool in IPA is now easier to use because all search options have been consolidated into the main toolbar. Furthermore, you can speed up your research by finding entities (i.e., genes, chemicals, diseases and biological functions) of interest inside Canonical Pathways and Tox Lists. You can also search for your own custom pathways and lists either by name or by the entities within them, even in notes you have added to the pathways.
For example, you can search for a gene name, such as “FASLG”, and find all the pathways and lists in which the gene participates (Figure 1).
Figure 1. A search for FASLG in “Pathways and Lists” returned 57 Canonical Pathways and Tox Lists in which FASLG is a participant. In prior releases of IPA, this query would not have returned results, as the prior search tool searched pathway names and categories only, and did not search entities contained within the pathways or lists.
Likewise, a search for a disease or function term will find Canonical Pathways or Tox Lists in which the term appears either as a node on the pathway or in the pathway’s name (Figure 2).
Figure 2. Searching for a disease node within pathways and lists finds one match in the name of one pathway and several matches inside another pathway.
For Canonical Pathways, the search automatically “expands” groups and complexes to look within them for genes and protein names. For example, a search for RAF, ARAF, BRAF or RAF1 would return a Canonical Pathway that contains the group “RAF”. Note, however, that, when you open the pathway, you will not immediately see the ARAF, BRAF or RAF1 nodes as they are members of the RAF group, which appears as a single node in the diagram. You can manually expand the RAF node to view these molecules using the “Expand Members/Membership” option available in the Graph Options button in the My Pathways toolbar or from the menu that appears when you right-click the group.
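The group expansion described above can be sketched as a search over pathway nodes in which each group node also contributes its members. The function name, pathway contents and group membership below are simplified, hypothetical examples, and this is not IPA's search implementation. Node and member names are matched exactly (ignoring case), and pathway names are matched by substring.

```python
def find_pathways(query: str, pathways: dict[str, list[str]], groups: dict[str, list[str]]) -> list[str]:
    """Return pathways whose name contains the query or whose (expanded) nodes match it."""
    q = query.casefold()
    hits = []
    for name, nodes in pathways.items():
        entities = set()
        for node in nodes:
            entities.add(node.casefold())
            # Expand a group or complex node into its member genes.
            entities.update(member.casefold() for member in groups.get(node, ()))
        if q in entities or q in name.casefold():
            hits.append(name)
    return hits


# Toy, hypothetical pathway contents.
pathways = {
    "ERK/MAPK Signaling": ["RAF", "MEK", "ERK1/2", "KRAS"],
    "Death Receptor Signaling": ["FASLG", "FAS", "CASP8"],
}
groups = {"RAF": ["ARAF", "BRAF", "RAF1"], "MEK": ["MAP2K1", "MAP2K2"]}
print(find_pathways("BRAF", pathways, groups))  # ['ERK/MAPK Signaling']
```

As in IPA, the match on BRAF comes from the RAF group, so the pathway diagram itself still shows a single RAF node until you expand it.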
~83,000 Expert findings
~187,000 protein–protein interaction findings from BioGRID
~4,600 gene-to-cancer-type findings from Catalogue of Somatic Mutations in Cancer (COSMIC)
~1,800 target-to-disease findings from ClinicalTrials.gov
~1,800 drug-to-disease findings from ClinicalTrials.gov
~1,000 newly mappable chemicals
~950 gene-to-disease associations from Online Mendelian Inheritance in Man (OMIM)
~600 protein–protein-interaction findings from IntAct
~300 gene-to-disease or -phenotype associations from the Mouse Genome Database (MGD or “Jax”)
~180 chemical-to-cancer findings from the Chemical Carcinogenesis Research Information System (CCRIS)
~14,000 cancer-mutation findings from ClinVar
~600 target-to-disease findings from ClinicalTrials.gov
~800 drug-to-disease findings from ClinicalTrials.gov
~55 newly mappable chemicals
~1,000,000 RNA expression-to-cell-type findings from The Human Protein Atlas (THPA). In this release, this new content source is used only for filtering in Tissue and Cell Lines filters and includes 49 cell types.
Explore previews of disease and phenotype networks
Explore >1500 new networks, which integrate key molecules that impact a single disease and its associated phenotypes. Machine learning was used to create disease, phenotype and function networks by mining the QIAGEN Knowledge Base. Whereas many genes in each network are known players in the disease, some genes are inferred to impact the disease and may represent novel participants in the disease or its etiology.
Figure 1: Example of a Disease and Phenotype network. Machine learning techniques and other heuristics were used to prioritize key genes that impact psoriasis and to associate the disease with potential phenotypes. The nodes have been colored to indicate their predicted activity in the network: red and green nodes indicate increased or decreased activity in the disease state, respectively. To make it easier to see the molecules and other nodes, the relationship lines in the network have been faded using the “Fade Edges” feature of PathTracer (described below, see Figure 5).
For example, in the psoriasis network shown in Figure 1, the TANK gene is predicted to be activated in the disease state. Yet this gene, highlighted in Figure 2, is not directly connected to psoriasis or its phenotypes in the Knowledge Base at this time. Intriguingly, a search of the literature finds a possible connection between TANK and psoriasis: the authors state that “ubiquitination events involving UBAC1 and TANK should be considered within the molecular mechanisms that modulate the physiological function of CARMA2sh and of its psoriasis-linked mutants. Future work will further address this aspect.” (Mazzone, P. et al. 2020).
Figure 2: New psoriasis network with PathTracer applied. The network shows that TANK is connected to other genes, but is not directly connected to psoriasis or other phenotypes.
The networks are experimental and are not meant to comprehensively catalog every gene associated with a particular disease. Rather, these networks have been constructed to visualize a human-readable set of the most important genes causally connected to the disease and phenotypes and to one another. The networks may help identify genes with similar regulatory patterns that implicate them as potentially important in the disease. Note that all the relationships in the network are supported by findings from the Knowledge Base, and that these networks were not inferred using any expression datasets. In particular, the red and green coloring in the networks is derived from prediction, not expression data.
The methodology used to create these Disease and Phenotype networks in IPA is described in this submitted manuscript: https://www.biorxiv.org/content/10.1101/2021.10.07.463598v1.full.pdf
Disease names in the networks can be browsed alphabetically in the “Disease and Phenotype Networks” subfolder within the Project Manager “Libraries” folder, as shown in Figure 3. Double-click an icon to view the network of interest.
Figure 3: Location of the preview for the Disease and Phenotype Networks in the Project Manager.
Currently, these networks are not searchable in IPA, nor are they scored in analyses. You can, however, 1) overlay your own analyses or datasets on the networks, 2) copy them to your own project folders and approve them for p-value scoring in your own analyses or 3) click the Pattern Search button to explore whether OmicSoft analyses have an expression pattern that matches or anti-matches a particular network.
For a list of the networks and to look up which genes and functions are in them, see this Excel spreadsheet.
We would appreciate any feedback you have about these networks. Send your feedback with the “Provide Feedback” link at the top of the main IPA window.
Improvements to enhance interpretation in IPA
Figure 4: Direct link to QIAGEN Land Explorer.
Please note that, although a user with any IPA license type can launch this sample browser view, an IPA Analysis Match Explorer license is needed to drill down further into Land Explorer.
Contact support or your QIAGEN customer solutions manager if you wish to inquire about upgrading to a license that includes QIAGEN OmicSoft Land Explorer.
Figure 5: PathTracer can now be used to fade all the relationships (edges) in a network. Fading the edges can make it easier to see the nodes on the pathway, as shown in this Graphical Summary network.
Figure 6. Gene names have been standardized to their official symbol as used elsewhere in IPA (right panel).
Figure 7: MAP is now enabled by default. Opening a pathway from an analysis will automatically show the orange (increasing) and blue (decreasing) prediction coloring of nodes as shown in the right panel.
The latest IPA client installer is required to launch IPA
The client installer for IPA has been updated and now includes OpenJDK Java (version 11.0.2) instead of Oracle Java. The installer supports two-factor authentication (2FA), which all users are now required to use.
Go to https://analysis.ingenuity.com/pa/installer/select to download and install the new IPA client installer on your computer at your earliest convenience. The page hosts installer packages for both Windows PCs and Macs.
Note that rare problems can occur with some computer hardware and OpenJDK, causing a gray summary screen in Core Analyses, a white area after searching, or other display issues. As a workaround, Customer Support can provide a 2FA-enabled version of the IPA installer that uses Oracle Java instead of OpenJDK. Please contact Customer Support at firstname.lastname@example.org if you experience these or any other issues with the new installer.
Six new Canonical Signaling Pathways
Activity pattern added and content updated for two pathways
Activity pattern added for one pathway
>56,000 new findings (bringing the total in IPA to over 8.5 million), including:
~41,000 Expert findings
~14,000 cancer mutation findings from ClinVar
~600 target-to-disease findings from ClinicalTrials.gov
~800 drug-to-disease findings from ClinicalTrials.gov
~55 newly mappable chemicals
>6700 new expression datasets (for a total of >109,000) available in Analysis Match, Activity Plot and Pattern Search
Note that the recent TCGA metadata upgrade from OmicSoft is not yet included in the TCGA comparisons found in IPA and will be added in the Q1 2022 IPA release.
There are several new ways to customize your bar charts to make them easier to present and publish.
Canonical pathways for a breast cancer dataset filtered to display only pathways containing the ETS1 gene.
The unfiltered Canonical Pathway bar chart for the breast cancer dataset is arranged so that the pathways with the highest positive z-scores (predicted activation) are at the left.
The figure above was modified to increase the font size of the bar and y-axis labels.
The client installer for IPA has been updated and now includes OpenJDK Java (version 11.0.2) instead of Oracle Java. The installer supports two-factor authentication (2FA), and all users will be required to use 2FA as of the December release of IPA.
The new installer will be required to launch the upcoming December release of IPA. Download the installer from https://analysis.ingenuity.com/pa/installer/select. This page hosts installer packages for both Windows PCs and Macs. Note that a 32-bit version of OpenJDK is not available, so Windows users with a 32-bit operating system must instead download the Oracle Java version (also available from the link above). After downloading, install the new IPA client installer on your computer at your earliest convenience.
If you are unable to upgrade to the new installer, you can use Web Start with Oracle Java to launch IPA. Web Start cannot, however, be used on a Mac with an M1 chip, because Oracle does not supply a version of Java Web Start that is compatible with M1 Macs. Therefore, M1 Mac users must upgrade to the new installer to run the December release of IPA.
Please contact Customer Support at email@example.com if you experience any issues with the new installer.
Six new Canonical Signaling Pathways
Activity pattern added and content updated for two pathways
Content updated for one pathway
>325,000 new findings (bringing the total in IPA to over 8.4 million), including the following:
~143,000 Expert findings
~66,600 protein–protein interaction findings from BioGRID
~400 protein–protein findings from IntAct
~12,000 findings from COSMIC
~86,350 cancer mutation findings from ClinVar
~12,000 findings from the Mouse Genome Database (MGD)
~1430 findings from the Online Mendelian Inheritance in Man (OMIM)
~1800 Gene Ontology findings
~1400 target-to-disease findings from ClinicalTrials.gov
~1800 drug-to-disease findings from ClinicalTrials.gov
~300 newly mappable chemicals
Over 102,000 OmicSoft analyses are currently being computed on our servers and will be released once computation completes in the first week of October 2021.
The more than 6000 additional analyses include a new Land repository called “Normal Cells and Tissues”. In this release, this Land contains 52 comparisons, based on RNA-seq data from the GTEx consortium, in which a set of samples from a single normal human tissue type is compared to a pool of samples taken from a large variety of other normal tissue samples.
IPA’s Graphical Summary of the “Brain – Hippocampus vs Others” comparison.
Normal tissue comparisons are useful for looking at tissue-specific markers and, for example, for matching with single-cell analyses.
“Brain – Hippocampus vs Others” analysis is matched against all single-cell data in IPA.
For Land Explorer users, the GTEx differential expression is available for each gene on IPA Gene Views.
New Pattern Search instantly discovers relevant QIAGEN OmicSoft analyses
The new Pattern Search capability in IPA builds on Analysis Match by enabling you to instantly scan >96,000 OmicSoft analyses starting with any collection of genes in a network or pathway as your “query”. With Pattern Search, you can answer questions, such as:
Start with any network or pathway from an analysis, or simply place genes of interest into a new My Pathway, then either overlay a dataset or manually color the genes with the MAP paint bucket to “activate” or “inhibit” them. Click the “Pattern Search” button to discover which OmicSoft analyses have similar or different gene expression patterns. You can subsequently create a gene heatmap to examine how the analyses match your query gene by gene.
For example, a set of gene signatures representing the consensus transcriptional effects of certain classes of drugs or chemicals in cell lines has been published (PMID 30552330). Figure 1 shows a My Pathway in which the HDAC inhibitor signature from the paper has been added. Pattern Search compares the added set of up- and down-regulated genes to the significantly differentially expressed genes in the >96,000 OmicSoft datasets.
Figure 1: A simple gene pattern (as a query) in a My Pathway. This signature represents the consensus transcriptional effects of HDAC inhibitors in cell lines, which the authors derived using LINCS project data (PMID 30552330). The genes indicated in red are expected to be up-regulated and in green down-regulated by HDAC inhibitors. Clicking the highlighted “Pattern Search” button searches the >96,000 OmicSoft analyses for genes with a matching or anti-matching pattern of up- or down-regulation, as shown in Figure 2.
The query pattern closely matches a number of LINCS datasets from cancer cell lines that have been treated with various HDAC inhibitors, such as belinostat and mocetinostat (see Figure 2). This pattern is expected for a positive control, as the paper used data from the LINCS project as input to originally create the signatures.
This result indicates that the z-score algorithm, used for matching, effectively finds the appropriately matching datasets. Note that this z-score algorithm simply evaluates whether the “analysis-ready” genes in each OmicSoft analysis directionally match the query signature pattern. In other words, genes that are either activated/upregulated (red) or inhibited/downregulated (green) in both the query and the analysis contribute positively to the z-score.
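The directional matching described above can be illustrated with a minimal sketch. The release notes do not give the exact formula, so this example assumes an unweighted directional z-score: each gene shared by the query and an analysis contributes +1 if its direction agrees and -1 if it disagrees, and the sum is divided by the square root of the number of shared genes. The function name and the +1/-1 encoding are illustrative assumptions, not part of IPA.

```python
import math


def directional_z_score(query, analysis):
    """Unweighted directional z-score between a query pattern and one analysis.

    query:    dict mapping gene -> +1 (up/activated, red) or -1 (down/inhibited, green)
    analysis: dict mapping gene -> +1 or -1 for the significantly differentially
              expressed, analysis-ready genes of one analysis (assumed encoding)
    Returns None when the two share no genes.
    """
    shared = [gene for gene in query if gene in analysis]
    if not shared:
        return None
    # Agreement in direction contributes +1, disagreement contributes -1.
    total = sum(query[gene] * analysis[gene] for gene in shared)
    return total / math.sqrt(len(shared))


query = {"G1": 1, "G2": 1, "G3": -1, "G4": -1}            # hypothetical genes
analysis = {"G1": 1, "G2": 1, "G3": -1, "G4": 1, "G5": 1}  # G5 is not in the query
print(directional_z_score(query, analysis))  # 1.0
```

Under this assumption, a strongly positive score indicates a matching analysis and a strongly negative score an anti-matching one; IPA's actual scoring may differ in details such as weighting.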
Figure 2: The HDAC signature strongly matches the expression after LINCS HDAC-inhibitor treatment for various cell lines. As expected, the Pattern Search algorithm recovered the matches to the varieties of LINCS datasets that would be expected from this query signature. These LINCS matches can be filtered to explore unexpected matches as well (see Figure 3).
When the LINCS datasets were filtered out to focus on other dataset sources, matches were detected to other experiments involving treatment with HDAC inhibitors, such as SAHA treatment of cell lines for neuroendocrine tumor, colon cancer and bladder cancer (see Figure 3). There were also matches to datasets unrelated to HDAC inhibitors. For example, one match is for a dataset involving ixazomib treatment of non-resistant control cells. Ixazomib, a proteasome 20S subunit beta 5 inhibitor, has been shown to act synergistically with HDAC inhibitors in Hodgkin and T-cell lymphomas (PMID 31452195). Thus, Pattern Search uncovered datasets that are unexpected but biologically relevant and worth following up.
Figure 3: The HDAC signature strongly matches non-LINCS datasets as well. Filtering the data sources used in Figure 2 to exclude selected dataset sources identified unexpected results that could be interesting to follow up. In this example, the HDAC-treated gene expression pattern matched with analyses of other treatments, such as ixazomib.
From the Activity Plot (such as those in Figures 2 or 3), you can easily create a gene heatmap to examine exactly which genes were significantly differentially expressed in the matching or anti-matching analyses. Figure 4 shows a gene heatmap with the query analysis in the leftmost column and several matching HDAC inhibitor-treatment analyses in the other columns.
Figure 4: The gene heatmap shows the details of how the query signature matches selected analyses. In this example, the HDAC signature is shown in the leftmost column as solid red or green squares, indicating genes that are expected to be up-regulated or down-regulated, respectively, in the matching analyses. In the other columns, color intensity indicates the actual log fold changes in a few matching analyses from the OmicSoft collection, which were selected because they used various cell lines treated with SAHA. The selected analysis, highlighted in blue, displays its associated metadata in the table on the right. Note that none of the matching analyses is a perfect match for the query, but there are clear similarities among the differentially expressed genes.
It is important to note that Pattern Search works with up- and down-regulated genes in the query as well as with the inferred activities of nodes produced by Molecule Activity Predictor (MAP). For example, you can use a Regulator Effects network as a query; such a network displays up- and down-regulated target genes from your dataset (indicated in red and green, respectively) and also shows activation and inhibition of the inferred upstream regulators in orange and blue, respectively. In this case, activated upstream regulators are treated as up-regulated (red) and inhibited upstream regulators are treated as down-regulated (green) in the query pattern. Figure 5 illustrates this query and the results.
Figure 5: Pattern Search uses both known and inferred activity in the query. In this example, a Regulator Effects network is used as the query. For the purpose of pattern matching, the upstream regulators SNAI1 and F2R are treated as up-regulated and CCN5 as down-regulated. Functions, such as the epithelial-mesenchymal transition, are not used in the query because they will never match the OmicSoft datasets, which consist solely of molecules, such as genes or proteins.
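How a mixed network is reduced to a query can be sketched as follows. The node tuples, the node type value "function" and the color names are hypothetical stand-ins for what a Regulator Effects network displays; the mapping itself (red or orange treated as up, green or blue treated as down, functions skipped) follows the description above.

```python
# Hypothetical mapping from node coloring to query direction.
COLOR_TO_DIRECTION = {
    "red": 1,     # measured up-regulation in the dataset
    "orange": 1,  # MAP-predicted activation, treated as up-regulated
    "green": -1,  # measured down-regulation in the dataset
    "blue": -1,   # MAP-predicted inhibition, treated as down-regulated
}


def build_query(nodes):
    """Turn (name, node_type, color) tuples into a {name: +1/-1} query.

    Function nodes are skipped because the OmicSoft analyses contain only
    molecules; uncolored nodes (color None) carry no direction and are skipped too.
    """
    query = {}
    for name, node_type, color in nodes:
        if node_type == "function":
            continue
        direction = COLOR_TO_DIRECTION.get(color)
        if direction is not None:
            query[name] = direction
    return query


network = [
    ("SNAI1", "molecule", "orange"),
    ("F2R", "molecule", "orange"),
    ("CCN5", "molecule", "blue"),
    ("epithelial-mesenchymal transition", "function", "orange"),  # hypothetical color
]
print(build_query(network))  # {'SNAI1': 1, 'F2R': 1, 'CCN5': -1}
```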
Pattern Search is the newest of several different approaches to augment your biological interpretation in IPA by using the OmicSoft expression analyses. Figure 6 highlights the rationale behind each of these approaches.
Figure 6: The three different features of the “Analysis Match” capability in IPA.
Note that Pattern Search and the other features shown in Figure 6 require an Analysis Match license for your IPA account. If this license is not currently active for you, contact customer support or your local QIAGEN Account Manager for more information about gaining access to this feature.
Comparison Analysis from Activity Plot results
You can now create a Comparison Analysis directly from any Activity Plot, including plots in Pattern Search, as shown in Figure 7.
Figure 7: Create a Comparison Analysis directly from selected rows in the Activity Plot table. In this example, the Activity Plot (left) was used to search for matching OmicSoft Lands datasets in which the upstream regulator NFE2L2 was significantly involved. The “comparisoncategory” column was filtered to include only datasets that were collected from treated samples. After selecting the top five results in the Activity Plot table and clicking the “View Comparison” button, a heat map (right) was calculated using the Hierarchical Clustering options for both the Canonical Pathways and the analyses.
Newly revised Land Explorer links in Gene Views improve navigation
The Land Explorer links section in IPA Gene Views now provides direct links to additional Lands and views in Land Explorer and the available data sources are labeled more clearly. For example, there are now links to survival plots and to sources such as ICGC (International Cancer Genome Consortium) that had not been directly linked before (Figure 8).
Figure 8: Revised Land Explorer links section in Gene Views. The links are categorized by source, type, consortia, etc. Now you can navigate directly to additional views and Lands.
Figure 9: A Kaplan–Meier survival curve is now just one click away from a Gene View in IPA. This survival curve was generated from the FOXM1 Gene View by clicking the “TARGET” link in the Survival by Expression row in the Oncology Consortia column shown in Figure 8. This view shows the survival rate over time and is categorized based on the expression of this gene.
Improve publications by customizing font size in networks and pathways
To communicate the importance of certain nodes or tags in a network or pathway, you can selectively increase or decrease their font size. Simply select one or more nodes or tags and use the new font size button, as shown in Figure 10.
Figure 10: Now you can enlarge the font on just one node or tag using the new font size button. Emphasizing important nodes or tags with a larger font helps your pathway images communicate more clearly.
New Canonical Signaling Pathway
Updates to four existing Canonical Signaling Pathways
>300,000 new findings (bringing the total in IPA to over 8.1 million), including the following:
>238,000 expert findings
>6600 protein–protein interaction findings from BioGRID
>2200 protein–protein findings from IntAct
>1000 findings from Clinical Genome Resource (ClinGen, a new source for this release)
>51,000 cancer mutation findings from ClinVar
>12,700 findings from the Mouse Genome Database (MGD)
>5600 findings from the Online Mendelian Inheritance in Man (OMIM)
>100 Gene Ontology findings
>1300 target-to-disease findings from ClinicalTrials.gov
>1800 drug-to-disease findings from ClinicalTrials.gov
~350 newly mappable chemicals
>6000 new datasets (for a total of >96,000) are now available in Analysis Match, Activity Plot and Pattern Search
Table 1: OmicSoft analysis content in Analysis Match, Activity Plot and Pattern Search. More than 6000 new analyses have been added in this release.
Easily find relevant analyses in the 90,000+ dataset Analysis Match collection
Searching for analyses and datasets of interest to view or overlay is now much easier and more accurate with filterable and customizable metadata columns. Simply enter keywords of interest in a project search, and then use the filters above the metadata columns to narrow the results to exactly what you need.
Data Search and Analysis has new options for searching and filtering. This example shows a search to find mouse liver studies of NASH.
Elucidate biology by finding enriched subcellular locations
The subcellular locations of proteins can provide clues to their function and role(s) in the cell. For example, proteins found in the mitochondria will have different roles than those found in lysosomes.
Now you can automatically discover and annotate the detailed subcellular locations of the molecules in any network or pathway.
A pathway configured to show subcellular localization information.
Detail of the above pathway showing proteins that are localized to the Golgi apparatus.
Other application improvements
New appearance of the Gene View.
New QIAGEN OmicSoft Single Cell Land and SARS-CoV-2 datasets are included with the more than 90,000 datasets that are now available in Analysis Match.
Two new Canonical Signaling Pathways
Addition of Activity Patterns to four existing Canonical Signaling Pathways
One updated Canonical Signaling Pathway
>120,000 new findings (bringing the total in IPA to over 7.8 million), including the following:
In this release, several features in IPA have been improved for ease of use. An important aspect of IPA is the ability to overlay and visualize data on a network or pathway. Now, you can simultaneously see more rows in the ‘Matching molecules’ table and more easily display node bar charts on a network or pathway.
In the ‘Matching molecules’ table in ‘Overlay’ > ‘Analyses, Datasets, & Lists’, the default view now shows only the overlaid expression or phosphorylation measurement type (such as ‘fold change’) rather than all the available measurement types in the dataset. This modification enables you to view more rows of the table and is especially useful when the dataset has several measurement types (such as ‘p-value’, ‘FDR’ and ‘Intensity’). Additionally, when you add a second analysis or dataset, the small bar charts (node charts) now appear next to the nodes in the network or pathway by default (see Figure 1).
Figure 1: Streamlined overlay of ‘Analyses, Datasets, & Lists’ content. The defaults and controls for data overlay have been changed to simplify your work, allowing you to view more rows at once by default. Now, only the measurement that is overlaid on the network or pathway (such as ‘log ratio’ or ‘fold change’) is shown in the ‘Matching molecules’ table at the left. You can click the gear icon to add more measurement types for display. In addition, by default, node bar charts will appear automatically on the network or pathway at the right when multiple analyses or datasets are overlaid. In prior releases, you had to first select one or more rows in the ‘Matching molecules’ table to show the bar charts in the network.
Another improvement in this release allows you to visualize a miRNA–mRNA network directly from ‘microRNA Target Filter’ with one click. Select one or more rows in the dataset and click the new ‘Display as Network’ button (see Figure 2) to create a ‘My Pathway’, which automatically shows the miRNAs connected to their targets.
Figure 2: New ‘Display as Network’ button in ‘microRNA Target Filter’. Visualize rows in ‘microRNA Target Filter’ as a network by selecting one or more rows and clicking the ‘Display as Network’ button. The miRNAs are connected to their targets with miT and E edges (miRNA targeting and expression, respectively).
To ensure consistent network visualization in IPA, the ‘Display as Network’ button is also available in the ‘Causal Networks’ and ‘Regulator Effects’ tabs in ‘Core Analyses’.
Search and filtering of ‘Analysis Match’ analyses enhanced with added metadata
Now there are more ways to find and filter relevant analyses: ‘PubMed ID’ and ‘Therapeutic Area’ fields have been added to QIAGEN OmicSoft datasets and analyses. In Figure 3, a PubMed ID (PMID) is used in the ‘Dataset and Analysis Search’.
Figure 3: Searching for OmicSoft datasets using PMIDs. Enter only the integer portion of the PMID to search, as shown.
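Because the search expects only the integer portion of a PMID, a citation copied as “PMID: 31597962” needs to be trimmed first. The hypothetical helper below sketches that normalization, assuming inputs look like “PMID: 12345”, “PMID12345” or a bare number; it is not part of IPA.

```python
import re


def pmid_search_term(text):
    """Reduce a PMID such as 'PMID: 31597962' or 'pmid 31597962' to '31597962'."""
    match = re.fullmatch(r"\s*(?:PMID\s*:?\s*)?(\d+)\s*", text, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a recognizable PMID: {text!r}")
    return match.group(1)


print(pmid_search_term("PMID: 31597962"))  # 31597962
```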
You can also search the OmicSoft ‘Therapeutic Area’ metadata with many keywords, such as neurology, rheumatology or endocrinology.
Three new Canonical Signaling Pathways
Addition of Activity Patterns to three existing Canonical Signaling Pathways
>104,000 new findings (bringing the total in IPA to over 7.7 million), including the following:
~24,000 expert findings
~44,000 cancer-mutation findings from ClinVar
~18,000 protein–protein-interaction findings from BioGRID
~2,300 Gene Ontology findings
~750 drug-to-disease findings from ClinicalTrials.gov
~600 target-to-disease findings from ClinicalTrials.gov
~31,000 protein–protein-interaction findings from IntAct
~500 newly mappable chemicals
>6900 new datasets for a total of >80,000 in Analysis Match and Activity Plot
Table 1: OmicSoft analysis content in ‘Analysis Match’ and ‘Activity Plot’. More than 6900 new analyses have been added in this release.
The new look of this IPA release includes a redesigned Quick Start window and highly informative short videos to help orient you with quick overviews of IPA’s key features. The new look brings sharper graphics, new iconography and crisper fonts to IPA.
The new look for IPA. Almost all of the functionality is the same as in the prior version; only the appearance has changed.
Please note that Mac users must switch to using the native command key modifier for Mac OS when using IPA. For example, to select all nodes in a network, press ⌘-A on the keyboard for Mac (instead of control-A, used for Windows).
Although any recent IPA installer is compatible with this release, we recommend downloading the latest IPA installer to obtain the new app icon: https://analysis.ingenuity.com/pa/installer/select. Note that IPA requires Java 8 or higher (which is built into this installer) to launch.
The IPA Quick Start window has been reimagined to provide helpful explanations and links so that you can use IPA more effectively right away. You will find sections on how to analyze various data types, links to the top help articles, new case studies, shortcuts to the main functions in IPA, how to contact Support and more.
The new Quick Start window. Headers at the left of the Quick Start give you an overview of the available content; click a header to view its content. To open this window, go to Help > Quick Start or press control-9 on your keyboard for Windows (⌘-9 for Mac).
New short videos can be accessed from the Core Analysis tabs. To view a video, click the video play button icon, which appears in the right corner of most tabs. As an example, here is a video for Upstream Regulators in IPA. A new overview video is also available for the My Pathway window.
The new IPA Graphical Summary provides you with a quick overview of the major biological themes in your IPA Core Analysis and illustrates how these concepts relate to one another. This feature selects and connects a subset of the most significant entities predicted in the analysis, creating a coherent and comprehensible synopsis of the analysis. The Graphical Summary can include entities such as canonical pathways, upstream regulators, diseases and biological functions. The algorithm constructs the summary using machine learning techniques to prioritize and connect entities and infers relationships to connect entities not yet connected by findings in the QIAGEN Knowledge Graph. These inferred relationships help you visualize related biological activities.
The Graphical Summary tab is available in the Core Analysis window. The example below shows the Graphical Summary result for an expression analysis of aggressive “claudin-low” breast cancer cell lines compared (i.e., ratio’ed) to less aggressive luminal A type breast cancer cell lines. In this dataset, which is based on PMID 20813035, the epithelial-mesenchymal transition (EMT) in claudin-low cells has been activated by specific transcription factors, such as ZEB1, SNAI1 and SNAI2, and the cells exhibit tissue-invasive tendencies — all of which are well represented in the generated summary:
The new Graphical Summary presents a synopsis of the top results from your Core Analysis as a small network. The algorithm for this feature uses a number of methods, including machine learning, to select and connect the top predicted entities. Analyses run prior to this release will take a little longer to open the first time they are viewed as IPA generates and saves the network presented in the Graphical Summary.
The summary takes a number of factors into account and is described in more detail in this help article. If you wish, you can generate a new summary with more or fewer nodes, and these changes will be saved automatically with your analysis.
This release enables you to add any number of “sticky” notes to a network or pathway to help you capture ideas or details of interest. Simply add notes to a pathway or network and save as a “My Pathway”. This tool is helpful for explaining the components of a custom pathway to others.
Communicate more effectively by adding your own notes to pathways and networks. Make notes for yourself or to share with others. You can attach notes to individual molecules and relationships or use them to provide general information about the pathway. Save the annotated pathway or share it with your colleagues.
Four new Canonical Signaling Pathways
Addition of Activity Patterns to two existing Canonical Signaling Pathways
Approximately 425,000 new findings (bringing the total in IPA to over 7.6 million)
OmicSoft analysis content in Analysis Match and Activity Plot. More than 7,700 new analyses have been added in this release.
Now you can explore the predicted activity of a single IPA entity such as an Upstream Regulator, Causal Network, Disease/Function, or Canonical Pathway across your own analyses (in addition to the >65,000 from OmicSoft).
Figure 1 shows an Activity Plot for the upstream regulator NFE2L2 (also known as Nrf2). Two of the selected analyses (green dots) are highlighted with a red box. These are analyses from the user’s own projects that predict high activation of NFE2L2, and they also appear in the table below the chart. The results suggest that NFE2L2 can be activated in hepatocellular carcinoma (HCC) tumors, because the datasets were derived from expression data from patients with HCC (GSE33294).
This new Activity Plot feature will help you identify interesting and relevant analyses you have run previously in QIAGEN IPA.
Figure 1: Activity Plot for NFE2L2 as an upstream regulator.
Each dot in the plot represents the scores for NFE2L2 in an individual analysis, where the dot’s position represents its z-score (x-axis) and p-value (y-axis) in the analysis. NFE2L2 is predicted to be activated in an analysis when its z-score is ≥2. Note that several analyses have been selected (green dots), and two of the selected analyses are from the user’s projects. The Activity Plot feature is included with Analysis Match licenses in IPA.
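As a sketch of how the plotted scores could be read programmatically, the example below classifies each analysis by the entity's z-score. The activation threshold of 2 comes from the text above; the symmetric cut-off of -2 for inhibition, the EntityScore record and its fields are assumptions made for illustration only.

```python
from dataclasses import dataclass

ACTIVATION_THRESHOLD = 2.0  # z-score >= 2 means "predicted activated"


@dataclass
class EntityScore:
    analysis_name: str
    from_my_projects: bool  # True for the user's own analyses (hypothetical field)
    z_score: float
    p_value: float


def predicted_state(z_score, threshold=ACTIVATION_THRESHOLD):
    """Classify one dot of an Activity Plot by its z-score (x-axis position)."""
    if z_score >= threshold:
        return "activated"
    if z_score <= -threshold:  # symmetric inhibition cut-off: an assumption
        return "inhibited"
    return "no prediction"


def highly_activated(scores, threshold=ACTIVATION_THRESHOLD):
    """Analyses in which the entity is predicted activated, strongest first."""
    hits = [s for s in scores if predicted_state(s.z_score, threshold) == "activated"]
    return sorted(hits, key=lambda s: s.z_score, reverse=True)
```

Keeping only the activated entries whose from_my_projects flag is True would mirror the kind of selection highlighted in Figure 1.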
Other enhancements to launching and using Activity Plot
Figure 2: Improved hover panel in the Analysis Match heatmap.
The Comparison Contrast field is shown as the second line in the panel. This example indicates that the analysis is a comparison of prednisone treatment vs. none.
QIAGEN Coronavirus Networks
Recently, Gordon et al. identified human host proteins that interact with SARS-CoV-2 viral proteins using an affinity-purification mass spectrometry screen. We extended this work using the QIAGEN Knowledge Base by connecting the SARS-CoV-2 host proteins to biological functions or diseases likely affected by viral infection, in order to explore how the virus may interfere with various host cell functions and to identify additional drug targets and other genes that could potentially be modulated as therapy for COVID-19. The results are presented as interactive network visualizations that allow exploration of the underlying experimental evidence, and they are available to the scientific community through the Coronavirus Network Explorer.
In this release of IPA we have made the same networks available within the IPA app in a new My Pathways folder in a Project folder called QIAGEN Coronavirus Networks:
Figure 3: QIAGEN Coronavirus Networks folder in IPA. Seventy specially constructed coronavirus-host protein networks are now available in IPA. If you wish to score these networks in your Core Analyses, please copy them to your own Project Folders and approve them for scoring. Note you will be scoring overlap of your datasets to the host proteins, not to the viral proteins.
Note that information about the viral proteins and their connections to the host genes in these networks cannot be viewed by double-clicking their nodes, as IPA currently supports only human, mouse and rat identifiers.
Six new Canonical Signaling Pathways
Addition of Activity Patterns to three existing Canonical Signaling Pathways
~64,000 new findings (bringing the total to over 7.25 million findings), including:
169 newly mappable chemicals
COVID-19 is now in the disease ontology with findings from ClinicalTrials.gov
Figure 4: BioProfiler displaying a disease search of ‘COVID-19’. There are four related COVID-19 terms now in the QIAGEN Ontology: COVID-19, Mild COVID-19, Severe COVID-19, and Critical COVID-19. Currently they are backed with findings from ClinicalTrials.gov.
>65,000 OmicSoft Analyses available in Analysis Match and Activity Plot
Table 1: OmicSoft analysis content in Analysis Match and Activity Plot.
Now you can visualize and explore the activity of a single QIAGEN IPA entity, such as an Upstream Regulator, Causal Network, Canonical Pathway, Disease or Function, across >60,000 OmicSoft Land analyses. The Activity Plot is a novel approach that helps you gain insights into an IPA entity by exploring its predicted biological activity across thousands of datasets that represent disease conditions, drug or other treatments, knockouts and much more in the Analysis Match database. Please note that the Activity Plot feature is included with Analysis Match licenses in IPA.
With this new capability, you can answer questions, such as “Which treatments are predicted to inhibit the epithelial to mesenchymal transition (EMT)? What disease states activate the ILK Signaling Pathway? In which cancer types is STAT3 activated as an upstream regulator?”
The figure below shows the predicted activity of the EMT function across >60,000 analyses in the Analysis Match database. EMT is significant in >3500 analyses (represented by the dots in the plot and indicated in the plot title). A subset of analyses with strong inhibition of EMT (z-score < –2) were selected and further filtered for those of the comparison type “Treatment vs. Control” (green dots in the image).
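The selection described above (z-score below -2, then restricted to the “Treatment vs. Control” comparison type) amounts to a simple filter over the plotted analyses. This is a minimal sketch assuming each analysis is a dictionary with hypothetical keys name, z_score and comparison_type; in IPA the same steps are performed interactively in the plot and its table.

```python
def strong_inhibitors(analyses, z_cutoff=-2.0, comparison_type="Treatment vs. Control"):
    """Analyses with strong predicted inhibition (z < z_cutoff) of the given
    comparison type, most strongly inhibited first.

    Each analysis is a dict with hypothetical keys 'name', 'z_score' and
    'comparison_type'.
    """
    selected = [
        a for a in analyses
        if a["z_score"] < z_cutoff and a["comparison_type"] == comparison_type
    ]
    return sorted(selected, key=lambda a: a["z_score"])
```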
Many of the compounds that are predicted to inhibit EMT are kinase inhibitors, such as erlotinib, selumetinib, AZD8330, KIN001-043 and others. Selumetinib is a MEK inhibitor and a known inhibitor of EMT (PMID: 28179307). Interestingly, a top-scoring analysis that does not involve a compound is an siRNA knockdown of the Q61R NRAS activating mutation, underscoring that an NRAS mutation can drive EMT and that this effect can be reversed by knocking down expression of the mutated gene. Finally, the HDAC inhibitor pracinostat scored strongly and was recently shown to reverse EMT in a breast cancer cell line (PMID: 32109485). Data mining using the IPA Activity Plot may help you discover novel inhibitors of EMT or of other diseases and functions.
An Activity Plot for an upstream regulator is shown below. In the plot, 49 analyses for which SMAD4 is predicted to be activated as an upstream regulator are highlighted in green.
You can also run a quick computation to evaluate whether any particular metadata values are significantly enriched in the selected analyses compared to all of the unselected analyses. Notably, TGF, TGF beta and TGF beta1 (all synonyms of TGF-β) were identified as metadata terms significantly enriched in the analyses that activate SMAD4. In each of these analyses, the cells or tissue had been exposed to TGF-β, and SMAD4 was predicted to be activated. This result confirms published research that has identified SMAD4 as a “central mediator” of TGF-β signaling (PMID: 29483830).
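The release notes do not state which statistical test the metadata enrichment computation uses. One common choice for this kind of selected-versus-unselected comparison is a one-sided (right-tailed) Fisher's exact test, which equals the upper tail of the hypergeometric distribution; the sketch below assumes that test, and its function and argument names are hypothetical.

```python
from math import comb


def enrichment_p_value(selected_with_term, selected_total, all_with_term, all_total):
    """Right-tailed hypergeometric p-value (one-sided Fisher's exact test).

    selected_with_term: selected analyses annotated with the metadata term
    selected_total:     selected analyses in total
    all_with_term:      analyses annotated with the term across the whole plot
    all_total:          analyses in the whole plot
    Returns the probability of seeing at least selected_with_term annotated
    analyses in a random selection of size selected_total.
    """
    k, n, K, N = selected_with_term, selected_total, all_with_term, all_total
    if not (0 <= k <= min(n, K) and n <= N and K <= N):
        raise ValueError("inconsistent counts")
    # math.comb returns 0 when asked to choose more items than exist,
    # so impossible outcomes drop out of the sum automatically.
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return favorable / comb(N, n)
```

A small p-value suggests the term occurs among the selected analyses more often than chance would predict; when many terms are tested at once, some form of multiple-testing correction is usually advisable.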
To enable deeper exploration across ‘omics data for individual genes, expression correlation across genes and visualization of the expression details of Analysis Match datasets, this release brings a large expansion of the integration between QIAGEN IPA and QIAGEN Land Explorer. Now you can seamlessly jump from IPA into more granular sample- and gene-level details in Land Explorer, the web-based portal to OmicSoft’s massive Lands databases of curated disease ‘omics data (>500,000 samples). With this capability, you can navigate from a gene of interest in IPA to quickly discover its tissue or cell expression, the diseases and treatments that cause it to be up- or down-regulated, the cancers in which it is frequently mutated, the effect of mutations on patient survival and much more.
You can easily answer questions, such as “Is ALAS2 expressed in a certain type of blood cell or cell line? In which types of viral infection is IRF7 upregulated? Is the expression of IRF7 and CXCL10 correlated, and, if so, in what tissues, cell types or disease conditions? In which cancer types is SMAD4 most often mutated, and how does that affect patient survival?”
Links to help you easily answer these questions have been added to IPA Gene Views, connecting you directly to the relevant visualization in Land Explorer. Note that accessing these links requires a Land Explorer license. However, as part of this IPA release you are automatically enrolled in a free 30-day trial of Land Explorer starting March 29th 2020.
Each OmicSoft link in a Gene View leads to a particular data visualization in Land Explorer. For example, the figure below shows the default Land Explorer view when clicking the HumanDisease link in the OmicSoft Differential Expression section for IRF7. Each dot in the visualization represents an analysis corresponding to an Analysis Match dataset, and the dot’s size and position reflect the statistical significance and fold change of IRF7 in that analysis. Note that many of these comparisons involve treatments and other perturbations.
You can use filters to limit the results to the datasets that are relevant to you. For example, filtering reveals that IRF7 has been observed to be up-regulated in several types of human infection, in particular Dengue hemorrhagic fever and influenza.
Another option is to examine the expression of IRF7 in various hematopoietic cells by clicking the BluePrint link from the Gene View. IRF7 is expressed most abundantly in neutrophils.
You can also create visualizations in Land Explorer, such as survival plots and gene–gene correlation plots; the correlation plot for IRF7 and CSF3 is shown below. The two genes are clearly expressed in a similar manner; for example, they are both highly expressed in macrophages. In contrast, IRF7 is uniquely expressed in certain memory effector T cells, and CSF3 is present in a particular pancreatic cancer cell line that does not express IRF7.
The figure below is a survival plot for SMAD4 indicating that mutations in this gene are associated with shorter survival in patients with pancreatic adenocarcinoma.
Land Explorer offers many more visualizations linked from IPA than can be shown here. Please visit this page for more information: https://digitalinsights.qiagen.com/products-overview/discovery-insights-portfolio/content-exploration-and-databases/qiagen-omicsoft-land-explorer/
Finally, IPA users with both Analysis Match and Land Explorer licenses can navigate to a volcano plot in Land Explorer for each underlying Analysis Match dataset with just one click from Project Search results or from the Analysis Match tab.
Now you can explore fusion-gene biology in cancer by adding known fusion genes to your networks and pathways. There are approximately 500 fusion genes available in IPA today, with Gene Views and interactive nodes that can participate in networks and pathways.
The image below displays a small network created by adding the BCR-ABL1 fusion gene to a blank pathway. A subset of other molecules, pathways and diseases were added to extend the network upstream and downstream of the fusion gene, and the Molecule Activity Predictor (MAP) tool was used to simulate adding the drug imatinib to the system. The same types of relationships used for “standard” genes are used for fusion genes, as shown in the figure.
Fusion genes have an associated Gene View, such as the one for NPM1-ALK:
Several Canonical Pathways contain fusion genes; for example, Chronic Myeloid Leukemia Signaling is shown below.
Five new Canonical Signaling Pathways
Addition of Activity Patterns to 10 existing Canonical Signaling Pathways
Now you can quickly discover expected and unexpected commonalities among sets of analyses of interest in Analysis Match using a new capability that detects statistically significant associations in their metadata. For example: Are the analyses that match yours often derived from a particular tissue type, disease state or treatment? Do they tend to derive from a particular mouse strain, or from cells with specific cell surface markers? This approach can help you easily identify similarities among matching analyses that may previously have been hidden.
QIAGEN IPA scans more than 90 metadata fields from the set of repository-based analyses that you select in Analysis Match and performs a calculation to detect potential enrichment among their metadata. Figure 1A shows an Analysis Match result filtered for analyses that strongly match (or anti-match) an analysis of gemfibrozil-treated rats. Gemfibrozil is a classical PPAR agonist. Selecting the matching set (the analyses in the red dotted box in Figure 1A) and then clicking the Evaluate Metadata button generates p-values calculated using a right-tailed Fisher’s Exact Test. The results are displayed in a table like the one shown in Figure 1B. The most significant term among the selected analyses is ‘PPAR agonists’ in the case.subjecttreatment field (p-value = 6.98E-08). Other examples of overrepresented terms are ‘white adipose cell’ and ‘preadipocyte’ in the case.celltype field.
Note that the case.subjecttreatment and case.celltype fields are not shown in the Analysis Match table by default. This illustrates how the new feature sifts through and surfaces metadata that may initially be hidden due to space constraints in the user interface (UI).
Figure 1: New feature in Analysis Match to discover commonalities among analyses of interest via shared metadata. Figure 1A shows Analysis Match results for the transcriptomics analysis of the liver of rats that were treated with the PPAR-alpha agonist gemfibrozil (RNA-seq data from PMID 25150839). The table has been filtered to retain only the strongest matching (average matching percentage > 43) or anti-matching analyses (average matching percentage < -43). The matching analyses enclosed in the red dotted box were selected and the ‘Evaluate Metadata’ button was clicked. Figure 1B shows the results of the enrichment calculation, where the term ‘PPAR agonists’ in the ‘case.subjecttreatment’ field was found to be highly enriched (p-value = 6.98E-08) among the matching analyses. This level of significance arose because three of the 18 selected analyses shared the ‘PPAR agonists’ term, whereas only nine analyses in the entire set of over 57,000 analyses in the Analysis Match repository carry that term. Other examples of overrepresented terms are ‘white adipose cell’ and ‘preadipocyte’ in the ‘case.celltype’ field.
The analyses that were identified as being treated with “PPAR agonists” were specifically treated with tesaglitazar, fenofibrate, or rosiglitazone, which are well-known PPAR agonists.
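The counts from the Figure 1 caption (three of 18 selected analyses carry the term, versus nine in the whole repository) can be plugged into a right-tailed Fisher’s exact test, which is the upper tail of a hypergeometric distribution. The sketch below is a generic standard-library implementation, not QIAGEN IPA’s code. It assumes a background of exactly 57,000 analyses, so the value it prints need not reproduce the 6.98E-08 reported in Figure 1B, whose precise background set is not stated here.

```python
from math import comb


def right_tailed_fisher(k, n_selected, k_background, n_background):
    """Right-tailed Fisher's exact test for metadata enrichment.

    k            : selected analyses carrying the term
    n_selected   : number of selected analyses
    k_background : analyses in the whole repository carrying the term
    n_background : analyses in the whole repository
    Returns P(X >= k), where X is hypergeometric.
    """
    upper = min(k_background, n_selected)
    hits = sum(comb(k_background, i) * comb(n_background - k_background, n_selected - i)
               for i in range(k, upper + 1))
    return hits / comb(n_background, n_selected)


# Counts from the Figure 1 caption; 57,000 is an assumed round background size.
p = right_tailed_fisher(3, 18, 9, 57_000)
print(f"{p:.2e}")
```

Exact integer arithmetic (`math.comb`) avoids underflow in the intermediate terms; only the final ratio is converted to a float.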
The metadata results table can be filtered to focus on certain fields or terms of interest. In Figure 2, the metadata evaluation results are narrowed to show only fields involving the ‘case’ samples (rather than the controls).
Figure 2: Filtering the metadata results table. You can filter the results data to focus on certain types of fields or values, such as fields involving the cases rather than the controls.
Note that the computation only considers the metadata in the repository-based analyses. It does not evaluate any metadata that you may have entered for any of your own analyses.
The Build > Grow > Diseases & Functions feature is a powerful way to add biological context to a pathway or network. However, its calculation of statistical over-representation is computationally expensive and often takes 30–60 seconds. In the past, after performing the first ‘Grow to Diseases & Functions’ operation on a network, QIAGEN IPA would repeat the calculation immediately any time nodes were added to or removed from the network, forcing you to wait for updated statistical results with each change. Now you control when to perform the calculation using the new Recalculate button (Figure 3). You can make numerous changes and, when ready, determine which diseases and functions are statistically relevant.
Figure 3: Recalculate over-representation of Diseases & Functions on demand. Now you can make multiple additions or subtractions to the network or pathway before performing the computationally expensive overlap calculation.
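The difference between recomputing after every edit and recomputing on demand can be illustrated with a dirty-flag cache. The `DeferredEnrichment` class below is a hypothetical sketch, not how QIAGEN IPA is implemented; `compute` stands in for the expensive over-representation calculation.

```python
class DeferredEnrichment:
    """Hypothetical sketch: cache an expensive over-representation result and
    recompute it only when explicitly asked, not on every network edit."""

    def __init__(self, compute):
        self._compute = compute  # expensive step: frozenset of node names -> result
        self._nodes = set()
        self._result = None
        self._stale = True
        self.calls = 0  # how many times the expensive step actually ran

    def add(self, node):
        self._nodes.add(node)
        self._stale = True  # mark the cached result out of date; do not recompute

    def remove(self, node):
        self._nodes.discard(node)
        self._stale = True

    @property
    def is_stale(self):
        return self._stale

    def recalculate(self):
        """Counterpart of a 'Recalculate' button: run the expensive step only if
        the network changed since the last run."""
        if self._stale:
            self._result = self._compute(frozenset(self._nodes))
            self.calls += 1
            self._stale = False
        return self._result


# Stand-in for the expensive calculation: it just counts the nodes.
panel = DeferredEnrichment(lambda nodes: len(nodes))
for gene in ["TNF", "IL6", "STAT3"]:
    panel.add(gene)
panel.remove("IL6")
print(panel.recalculate(), panel.calls)  # 2 1
```

Four edits were made, but the expensive step ran only once, when it was requested.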
QIAGEN IPA now supports the upload of .csv dataset files. Some upstream software, such as the 10x Genomics Loupe Cell Browser, exports comma-separated data files, which QIAGEN IPA can now import directly.
Three new Canonical Signaling Pathways
Addition of Activity Patterns to six existing Canonical Signaling Pathways
Now you can enjoy nearly 175,000 new findings (with a total of over 7 million findings), as well as ~350 newly mappable chemicals, including:
Increase data interpretation power and simplify pathway modeling by adding interactive nodes representing Canonical Pathways to networks or pathways. These newly available nodes correspond 1:1 with the Canonical Pathways that have always been in IPA and behave similarly to disease or function nodes. You can connect them to molecules that are part of each particular Canonical Pathway and simulate the effect of activating or inhibiting these molecules on the pathway as a whole. The activity simulation is available only for pathways that have a Pathway Activity Pattern.
Figure 1 shows the TNFR1 signaling pathway as a node on a ‘My Pathway’ connected to its primary activating ligand TNF. IPA’s MAP tool was used to “activate” TNF (shown in red), predicting this would lead to activation of the TNFR1 pathway (shown in orange).
Figure 1. An example of a Canonical Pathway displayed as an interactive node in IPA (connected to its principal activating ligand for purposes of illustration).
Each pathway can be linked to the full collection of genes that make up that pathway by using the Build > Grow tool, starting with the pathway node. The interactive pathway diagram that accompanies each Canonical Pathway can be visualized by double-clicking the pathway icon. Note that when you use the Grow tool to go from a pathway node to genes, all genes that are part of that pathway are added, including those that are members of groups and complexes. If you wish to find the genes that are used when scoring a pathway against your dataset, it is best to search for that pathway with the search engine and add both the pathway and its nodes to a new ‘My Pathway’. This method also shows the groups and complexes that belong to the pathway but are not included in scoring; these nodes can be removed with the Build > Trim tool.
Canonical Pathway nodes can be added to any network to increase interpretability. Figure 2 shows an example of adding Canonical Pathway nodes to an interaction network from a Core Analysis of stem cells differentiating to cardiomyocytes, indicating that several of the molecules in the network are activators of the apelin endothelial signaling and paxillin signaling pathways.
Figure 2. Two Canonical Pathways manually added to an interaction network. Using Build > Grow, Canonical Pathways were added to a pre-existing interaction network from a Core Analysis.
You can also add Canonical Pathways to Regulator Effects networks or include them inside other Canonical Pathways, as shown in Figures 3 and 4, respectively.
Figure 3. Two Canonical Pathways manually added to a Regulator Effects network. Using Build > Grow, Canonical Pathways were added to a pre-existing Regulator Effects network from a Core Analysis. These pathways are predicted to be activated due to the increased activity of the molecules colored in red in the network.
Figure 4. Canonical Pathway manually added inside another Canonical Pathway. Using Build > Grow, Canonical Pathways can be added inside another Canonical Pathway. The MAP tool coloring indicates the added pathway is inhibited (blue color) with this overlaid dataset.
Over 90 pathways in IPA have an existing pathway embedded within them, represented as a single node. Previously, these were shown using a non-interactive node. Now, these “pathways on pathways” are interactive and their activity can be predicted. Figure 5 shows a portion of the CDC42 signaling pathway that embeds two Canonical Pathways (ERK/MAPK signaling and SAPK/JNK signaling) which are predicted to be activated downstream of the CDC42 pathway.
Figure 5. Canonical Pathways already existing inside another Canonical Pathway. Over 90 pathways in IPA already have one or more Canonical Pathways embedded within them. The MAP tool in IPA was turned on to predict the effect of activating c-RAF and the JNK protein family on each of the connected pathways.
When using Build tools such as Grow and Connect, sometimes you need to repeatedly perform the same operations on every network or pathway that you open. For example, you might need to always Grow upstream to transcription regulators. Now, you can make the appropriate selections in the various Build filters and save them as defaults. From then on, the Build tools will apply your saved settings each time you use them. You can always reset your custom settings to “factory defaults” when needed.
Figure 6. The new “save as preferences” option in the context of the Grow tool. The node types ligand-dependent nuclear receptor and transcription regulator have been saved as defaults. Now, whenever a pathway or network is opened, the Grow tool will add only molecules of those types. The Build Preferences panel in IPA’s Application Preferences displays your saved settings, as shown in Figure 7.
Figure 7. The new Build Filters preferences. The panel is located in File > Preferences > Application Preferences.
Explore new Canonical Pathways related to two important research topics
Enhance your research and discovery of the mechanisms driving the development of lupus and cancer immunotherapy with these new pathway maps:
Now, from the Overlay tool, you can search for analyses and datasets to overlay onto networks and pathways, rather than browsing manually in the Project Manager tree. Furthermore, Analysis Match analyses and datasets from OmicSoft are included in the search results for users with a QIAGEN IPA Analysis Match license.
This workflow enables rapid visual assessment of any analysis or dataset on the pathway or network of interest. The figure below shows the Interferon Signaling Pathway overlain with expression data from mouse lung infected with an Influenza A virus vs. uninfected lung (analysis from GEO dataset GSE36328 as processed by OmicSoft for Analysis Match).
In this release, many more columns in the Core Analysis tabs are filterable, which will help you narrow down and focus your results. Furthermore, now you can use the less than (<) or greater than (>) symbols to tailor the results.
The new filters can be found in the following tabs: Upstream Analysis, Diseases & Functions, Regulator Effects, Networks, and Molecules. Some of the filters also appear in Comparison Analyses.
Metadata values that differ between case and control are now displayed in a table at the top of the metadata panel in the Project Search results. An example is shown in the figure below.
Also, specific fields that are important in understanding the dataset (such as the organism, tissue and platform) have been extracted into a small section called “Comparison Context” that follows the case/control table.
If you have chosen to use the standard QIAGEN IPA case and control metadata keys for your datasets, they will also be automatically displayed in a table and placed into the Comparison Context section.
We are excited to introduce brand new features in the QIAGEN IPA Spring 2019 Release:
QIAGEN IPA can now improve the success rate of mapping identifiers in your datasets by evaluating more than one column of gene or chemical IDs. Assign up to five columns of IDs, and QIAGEN IPA will scan from left to right across the columns of identifiers and stop (for that row) when it successfully maps an ID.
Mapping across multiple columns of IDs is especially valuable in the case of metabolite (chemical) IDs. Figure 1 shows a dataset during the upload process with four columns of metabolite IDs, which resulted in more rows being mapped than when using any one identifier column alone.
Figure 1: Assigning multiple columns of IDs during upload to increase mapping coverage. This dataset has four columns of IDs that are assigned for mapping. The dataset summary tab is automatically updated each time a new ID column is assigned and its source(s) chosen. In this example, 344 chemicals (rows) were mapped using only one column (HMDB), but when all of the ID columns (HMDB, PubChem, CAS, and KEGG) were used together, QIAGEN IPA scanned from left to right and was able to map 379 of the rows.
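The left-to-right rule can be expressed in a few lines. This is a minimal sketch assuming a hypothetical `lookup` dictionary from identifiers to known chemicals, with toy identifiers; IPA’s real mapping also involves choosing an ID source for each column, which is not modeled here.

```python
def map_rows(rows, id_columns, lookup):
    """Scan each row's ID columns from left to right and stop at the first
    identifier found in `lookup`; rows with no hit map to None."""
    mapped = []
    for row in rows:
        hit = None
        for column in id_columns:
            identifier = row.get(column)
            if identifier and identifier in lookup:
                hit = lookup[identifier]
                break  # first successful mapping wins for this row
        mapped.append(hit)
    return mapped


# Toy identifiers and knowledge base, purely for illustration.
lookup = {"hmdb_1": "chemical A", "kegg_1": "chemical A", "kegg_2": "chemical B"}
rows = [
    {"HMDB": "hmdb_1", "KEGG": "kegg_1"},
    {"HMDB": "hmdb_unknown", "KEGG": "kegg_2"},
    {"HMDB": "", "KEGG": "kegg_unknown"},
]
print(map_rows(rows, ["HMDB"], lookup))          # ['chemical A', None, None]
print(map_rows(rows, ["HMDB", "KEGG"], lookup))  # ['chemical A', 'chemical B', None]
```

As in the Figure 1 example, adding a second column rescues a row that the first column alone could not map, without changing rows that already mapped.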
QIAGEN IPA now calculates a Benjamini-Hochberg (B-H) corrected p-value for Upstream Regulators and for Causal Networks, increasing the statistical stringency of these results in Core Analyses. The B-H procedure corrects for multiple testing by controlling the false discovery rate; this matters because the more statistical tests you run, the greater the chance that you will observe a false positive result. Figure 2 shows the Upstream Regulator tab in a Core Analysis with the new B-H column. Note that these new p-values won’t be present for any analysis that you ran prior to this release. Please re-run previous analyses to calculate the values.
Figure 2: The Upstream Regulator tab now has an optional “B-H corrected p-value” column. The column is not shown by default; to display it, click the Customize Table button, then tick the B-H corrected p-value checkbox. In this example, note that the B-H corrected p-values for these regulators are approximately 0.01, whereas the uncorrected p-values are approximately three orders of magnitude smaller (more significant).
B-H p-values have been available in QIAGEN IPA for Canonical Pathways and for Diseases and Functions for several years; however, the values were not easily accessible for the latter. An optional B-H column is now available in the Diseases & Functions tab, as shown below:
Figure 3: The Diseases & Functions tab now has an optional “B-H corrected p-value” column in the table. The column is not shown by default; to display it, click the Customize Table button, then tick the B-H p-value checkbox.
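To show what the correction does to a list of p-values, the sketch below implements the standard Benjamini-Hochberg step-up adjustment. It is a generic textbook implementation, not QIAGEN IPA’s code, and the input values are made up.

```python
def benjamini_hochberg(pvalues):
    """Standard Benjamini-Hochberg adjusted p-values, returned in input order.

    For p-values sorted ascending with 1-based rank i among m tests,
    adj_i = min over j >= i of (p_j * m / j), capped at 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # starting at 1.0 also caps every adjusted value at 1
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

For these four made-up values the adjusted p-values are approximately 0.02, 0.04, 0.04 and 0.02: each is at least as large as the raw value, which is why B-H columns look less significant than the uncorrected ones.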
The Diseases & Functions TreeMap can be visualized using the B-H corrected p-value. The rectangles can be colored by and/or sized by the -log of the B-H p-value, as shown below in Figure 4.
Figure 4: The Diseases & Functions TreeMap can be visualized using the B-H corrected p-value. Use the menus (highlighted above) to color and/or size the heatmap by the -log of the B-H p-value.
The B-H statistics are also available in Comparison Analysis for your analyses that are run (or re-run) after this release, and are calculated for all Analysis Match analyses as well.
The Help menu in QIAGEN IPA now has a quick link to a set of video tutorials to help you get started with QIAGEN IPA. Topics range from how to format and upload your data and how to analyze it, to how the p-values in QIAGEN IPA are calculated:
Figure 5: New Video Tutorials link in QIAGEN IPA’s Help menu. The link leads to a new Help Portal page with a set of videos to help you use QIAGEN IPA.
We are excited to introduce brand new features in the QIAGEN IPA Winter 2018 Release:
Now you can examine detailed expression patterns across human tissues directly from QIAGEN IPA’s Isoform Views. QIAGEN IPA now offers access to a lite version of OmicSoft Land Explorer. With this new feature, you can view interactive plots of gene expression in 51 different human tissues from the GTEx project, at both the gene level and the level of individual splice variants. You can filter the view for a particular tissue, or filter on metadata such as tissue donor age or gender. You can also download the detailed sample-level expression data for the gene.
QIAGEN IPA users can access the lite version of Land Explorer at no extra cost, and access does not require registration or manual sign-in. For broader access to hundreds of thousands of samples from healthy and disease tissue, please take a tour of the full OmicSoft Land Explorer (launching soon!).
Figures 1-3 demonstrate how you can access a lite version of Land Explorer via QIAGEN IPA for no extra cost. The figures show how the FABP4-201 isoform of FABP4 (the longest protein-coding isoform of the Fatty Acid Binding Protein 4 gene) is expressed at higher levels in adipose and breast tissues than in other tissues.
Figure 1. Navigate to sample-level human tissue expression for human genes via links in the Isoform View. Click the link (shown in the red box) to open, from QIAGEN IPA, the Land Explorer web page that plots the expression of the isoforms (splice variants) of a human gene in 51 different human tissues. Gene-level expression is also available in Land Explorer.
Figure 2. View of isoform-level expression of FABP4 in human tissue samples. The underlying RNA-seq data were reprocessed by OmicSoft (a QIAGEN company) from raw FASTQ files obtained from the GTEx consortium and represent the expression of the isoforms of a particular gene in >8000 samples, each harvested from one of 51 different human tissues. Each chart displays the expression for one human transcript ID (either RefSeq or Ensembl, as shown above), where each circle represents the quantity of RNA (in FPKM) in one tissue sample. The pink bars show a box plot that summarizes the distribution of FPKM in that tissue or set of tissues.
The plot can be switched to show gene-level expression as well, as shown below in Figure 3.
Figure 3: Land Explorer Views can be switched to show gene-level rather than isoform-level expression. (1) The menu at the top middle of the screen can be used to switch to “Gene FPKM” as shown. (2) There are a number of filters available as well in the Add Filter menu. (3) Note that by default the tissues are grouped into similar types. For example, there is initially just one “row” for brain as shown above. Use the Grouping menu to choose “Tissue Detail Type” to expand to show all the individual tissues.
Create and open QIAGEN IPA Comparison Analyses much more quickly, and add statistical stringency to your Comparison Analyses with the Benjamini–Hochberg correction. B-H corrected p-values are now available for display and filtering in the Canonical Pathways and Diseases & Functions tabs, as shown below in Figure 5.
Figure 4: Comparison Analyses can now be created and reopened more quickly than in prior releases.
Figure 5: Benjamini-Hochberg corrected p-values are now available in Comparison Analysis for display and filtering. In both the Canonical Pathways tab and the Diseases & Functions tab, you can color the heatmap squares by B-H p-value and can use the filter as shown to hide rows that don’t meet a particular cutoff that you enter.
• FAT10 Cancer Signaling Pathway
• T Cell Exhaustion Signaling Pathway
~38,500 new Expert findings
~400 new ExpertAssist findings
~50,800 new cancer mutation disease association findings from COSMIC
~1300 new ontology findings from GO
~2100 new disease-to-target findings from ClinicalTrials.gov
~1500 new drug-to-disease findings from ClinicalTrials.gov
~9000 new protein-protein interactions from the BioGRID database
~700 new protein-protein interactions from the IntAct database
~160 new mouse knockout-to-phenotype findings from MGD (JAX Labs)
~150 newly mappable chemicals
The Analysis Match repositories will be updated in QIAGEN IPA on Jan 4th, 2019. There will be over 3,500 new Analysis Match datasets in this release, as outlined in Table 1.
Analysis Match enhances interpretation and drives discovery by placing your dataset in the context of thousands of QIAGEN IPA analyses that have been processed from public-source data using Array Suite.
Powered by QIAGEN IPA Advanced Analytics, Analysis Match automatically identifies the analyses of curated datasets that have significant similarities and differences, enabling you to compare results, validate interpretation and better understand causal connections between diseases, genes, and networks of upstream regulators.
Table 1: >52,000 datasets will be available in QIAGEN IPA Analysis Match in this release (on Jan 4th, 2019).
QIAGEN IPA Core Analysis now opens much more quickly! Just double click the analysis icon as usual and the analysis will open into a ready-state much faster than in prior releases.
Note: A change has been made in the information that is displayed in the molecules tab. The tab now lists all of the molecules in the original dataset and indicates (in bold in the Symbol column) those that are "analysis ready", meaning they passed filters and cut-offs and were therefore submitted for analysis.
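The distinction between all rows of the original dataset and the analysis-ready subset can be sketched as a per-row flag. The cut-off values (absolute fold change of at least 1.5, p-value of at most 0.05), the row fields and the gene symbols are hypothetical; in IPA, the actual filters and cut-offs are whatever you set when creating the analysis.

```python
def flag_analysis_ready(rows, fc_cutoff=1.5, p_cutoff=0.05):
    """Return (symbol, is_ready) for every row in the original dataset.
    A row is treated as analysis-ready when it mapped to a known molecule and
    passes the (hypothetical) fold-change and p-value cut-offs."""
    flagged = []
    for row in rows:
        ready = (row["mapped"]
                 and abs(row["fold_change"]) >= fc_cutoff
                 and row["p_value"] <= p_cutoff)
        flagged.append((row["symbol"], bool(ready)))
    return flagged


# Made-up example rows; negative fold changes denote down-regulation.
rows = [
    {"symbol": "GENE1", "mapped": True, "fold_change": 2.3, "p_value": 0.001},
    {"symbol": "GENE2", "mapped": True, "fold_change": -1.2, "p_value": 0.010},
    {"symbol": "GENE3", "mapped": False, "fold_change": 4.0, "p_value": 0.0001},
    {"symbol": "GENE4", "mapped": True, "fold_change": -3.1, "p_value": 0.020},
]
for symbol, ready in flag_analysis_ready(rows):
    # Asterisks stand in for the bold type used in the Molecules tab.
    print(f"**{symbol}**" if ready else symbol)
```

Every row is still listed, mirroring the Molecules tab; only the marking differs.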
There are nearly 1,800 new Analysis Match datasets in this release (see below).
Table 1: >49,000 datasets are available in QIAGEN IPA Analysis Match in this QIAGEN IPA release. *Redundant LIMMA- and Voom-based datasets have been removed from the RatDisease repository. GLM and DESeq2 are now used exclusively for microarray and RNA-seq data, respectively.
QIAGEN IPA can now predict metabolic activities in a dataset using its entire collection of more than 300 metabolic pathways. The prediction is based on the set of up- and down-regulated molecules in your dataset and the directionality of the metabolic pathway itself. See Figure 1 below, which shows the Canonical Pathways tab in a Core Analysis, with metabolic pathways marked with red arrows. The orange color of the bars indicates that these pathways are predicted to have increased activity in this dataset.
Figure 1: The Canonical Pathways tab in a Core Analysis highlighting metabolic pathways with orange bars. These pathways are predicted to be activated in this analysis of kidney RNA of mice treated with the NRF2 activator CDDO-me (vs. DMSO, PMID 26422507).
The methodology QIAGEN IPA uses to predict the metabolic activity from a dataset in Core Analysis is described here. QIAGEN IPA can predict metabolic activity from your differential gene expression dataset, differential metabolomics dataset, or a dataset where you have concatenated both differential gene expression and differential metabolite concentrations into one “observation”.
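The idea of scoring direction consistency can be made concrete with a simplified, unweighted activation z-score of the kind IPA documents for its causal analyses: each measured molecule contributes +1 if its observed direction matches the direction expected when the pathway is active and -1 otherwise, and the sum is divided by the square root of the number of contributing molecules. Whether the metabolic pathway scoring uses exactly this form is an assumption here (see the methodology referenced above for the actual method), and the molecule names and directions are made up.

```python
from math import sqrt


def activation_z(observed, expected):
    """Unweighted activation z-score sketch.

    observed: molecule -> +1 (measured up) or -1 (measured down)
    expected: molecule -> +1 or -1, the direction expected if the pathway is active
    Only molecules present in both mappings contribute.
    """
    shared = [m for m in expected if m in observed]
    if not shared:
        return 0.0
    total = sum(1 if observed[m] == expected[m] else -1 for m in shared)
    return total / sqrt(len(shared))


# Hypothetical pathway members: enzymes and a product expected to rise,
# a substrate expected to fall, when the pathway is active.
expected = {"enz_a": +1, "enz_b": +1, "subs_s": -1, "prod_x": +1, "enz_c": +1}
observed = {"enz_a": +1, "enz_b": +1, "subs_s": -1, "prod_x": -1}
print(activation_z(observed, expected))  # 1.0
```

Three consistent molecules and one inconsistent one give (3 - 1) / sqrt(4) = 1.0; the unmeasured `enz_c` does not contribute. A positive score points toward activation and a negative one toward inhibition.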
The metabolic pathway activity scores contribute to Canonical Pathway signatures in Analysis Match*, as shown below in Figure 2.
Figure 2: Heatmap in Analysis Match filtered to show only the Canonical Pathway scores. Several metabolic pathways participate in the signature as shown.
*Analysis Match requires additional licensing. Please contact us at AdvancedGenomicsSupport@qiagen.com for info.
There are 1,100+ new analyses for Analysis Match in this release, bringing the total available in QIAGEN IPA to >8,000. This includes two new repositories, RatDisease (under DiseaseLand) and Pediatrics (under OncoLand). Table 1 compares the repositories and their respective sizes in this release versus the prior one.
Table 1: Comparison of the number of datasets and repositories in this release (green color) to the prior release (red color). There are over 1,100 new datasets and their corresponding analyses in the current release.
Analysis Match* automatically discovers other QIAGEN IPA Core Analyses with similar (or opposite) biological results as compared to yours, to help confirm your interpretation of the results or to provide unexpected insights into underlying shared biological mechanisms across experimental situations. QIAGEN IPA matches your analysis against other analyses you have created (in your Project Manager) as well as thousands of other human and mouse expression analyses curated from public sources. This “analysis-to-analysis” matching is based on shared patterns of Canonical Pathways, Upstream Regulators, Causal Networks, and Diseases and Functions.
In this release, improvements to Analysis Match make it easier to control which of the Lands are used in the matching, and the detailed results in the heat map are easier to interpret and to follow up. You can now manually add experiment metadata to your own datasets to label them more clearly in the Analysis Match table and to find them using Project Search.
Fig 1. Filtering the Analysis Match results by source (Land). Use the enhanced Project menu in the Analysis Match tab to choose which Lands you would like to use for matching. Click on one or on multiple repository names to select them. You can also include your own projects by expanding the My Projects tree and clicking on your project’s name(s). Alternatively, use the radio button to switch to a free-text search by project name (i.e., Land name). MetastaticCancer is a new Land in this release.
Fig 2. New option in the Analysis Match heatmap to indicate signature entities that are NOT significant in the other analyses. The heat map in Analysis Match is constructed by showing all the signature entities from the analysis you opened (the analysis of interest), using color to represent each entity’s z-score in that analysis as well as in the other analyses you selected when you created the heatmap. However, although the heat map square for a particular entity in another analysis may be colored orange or blue, its underlying z-score may be too small to be considered significant. Now you can mark such instances as insignificant, as shown above. In this example, a threshold of “2” was entered in the “Insignificance Threshold” field to label with a dot those heat map squares whose z-score magnitude is smaller than that threshold (i.e., |z| < 2), enabling you to visually ignore the insignificant z-scores.
Fig 3. Explore a signature entity’s underlying network by clicking on a heatmap square. By clicking on a heatmap square in the Analysis Match heatmap, you can now display its underlying network or pathway. As shown above in part A, clicking on the ACKR2 heat map square in the first column displays its network in the right panel. The molecules from the dataset are shown in the Molecules tab (part B above), and clicking on the name of an analysis in the header of the heatmap will display a tab showing the analysis’ metadata (if it has any) in the Metadata tab (part C above). See Figure 6 below to learn how you can enter metadata for your own datasets.
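The dot rule amounts to a threshold on the magnitude of each z-score. The function below is a hypothetical text rendering of one heat map row, not part of IPA: a bullet is appended to squares whose |z| falls below the threshold, and `None` stands for an entity with no score in that analysis.

```python
def render_row(z_scores, threshold=2.0):
    """Format one heat map row as text. Cells with |z| < threshold get a
    trailing bullet, meaning 'colored, but not significant'."""
    cells = []
    for z in z_scores:
        if z is None:
            cells.append("n/a")  # entity not scored in this analysis
        elif abs(z) < threshold:
            cells.append(f"{z:+.1f}•")  # insignificant: mark with a dot
        else:
            cells.append(f"{z:+.1f}")
    return " | ".join(cells)


print(render_row([3.1, -2.5, 1.2, -0.4, None]))
# +3.1 | -2.5 | +1.2• | -0.4• | n/a
```

Using the absolute value means a strongly negative (blue) score such as -2.5 is kept as significant, while weak scores of either sign are marked.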
Fig 4. Conveniently select a set of signature entities in the Analysis Match heatmap for further exploration via the row or column dendrograms. To explore a set of related signature entities, select them as a group by clicking on their dendrogram. For example, the top-most cluster of entities (rows) was clicked to select a group of related signature entities. The selected group can be sent to a new pathway or a new list by using the buttons along the top of the heat map. Or the selection can first be modified by command-clicking (Mac) or control-clicking (Windows).
Fig 5. Select a set of analyses for further exploration in a full Comparison Analysis. Select a set of related analyses by clicking on their cluster in the column dendrogram. As shown above, a cluster of analyses (columns) was selected by clicking on the portion of the dendrogram above them. You can then view the analyses more fully by clicking on the View Comparison button. A Comparison Analysis can display up to 20 analyses. To stay within that limit, you can first modify the selection by command-clicking (Mac) or control-clicking (Windows) until it contains 20 or fewer analyses.
Now you can annotate your uploaded datasets with information that will help you quickly find those datasets (or analyses created from them) using project search, or help you to remember details about them when interpreting the results of their analysis. This is especially useful in the context of Analysis Match, where metadata from the dataset can be displayed in columns in the Analysis Match tab.
When you upload your dataset, you can enter relevant metadata about it in the QIAGEN IPA user interface. For example, you could annotate it using existing OmicSoft fields such as “case.disease” or “case.tissue” by typing in values such as “asthma” or “lung”. You could also create your own custom fields. For instance, you could create a new field called “eNotebook record” and enter a clickable hyperlink to an internal online record about the experiment that produced the dataset. Or you could create a field called “Collaborators” and enter the names of colleagues involved with that dataset. The metadata you add to a dataset is automatically propagated to any Core Analysis created from it. Keep in mind that the metadata you enter is for your purposes only, and QIAGEN IPA does not use it to influence the analysis results. Figure 6 shows how you can enter metadata for a dataset.
Figure 6. Entering metadata for a dataset. You can use existing keys from OmicSoft or create a custom field, as shown above. In this instance, a new field called “Hyperlink to paper” was created and a hyperlink was pasted in (control-v). Other metadata, such as tissue type and disease state, was added as well. The metadata will propagate to any Core Analysis created from this dataset.
Figure 7. Searching for datasets and analyses using the metadata you entered for the dataset. In this example, an analysis was found using the keyword “GSE11352”, which had been entered as metadata in the OmicSoft field “projectname” for the dataset. The search also returned OncoGEO analyses with the same GSE number.
Metadata can be added or edited either before or after saving the dataset file. You can also insert metadata at the top of the dataset text or Excel file itself before you upload it, by following instructions here. This is especially useful when you wish to enter a large amount of metadata, or when you have many similarly derived datasets that share most of the same metadata. In this release, you can edit the uploaded metadata in the Metadata tab, either during upload or after saving and re-opening the dataset.
*Analysis Match requires additional licensing. Please contact us for info.
QIAGEN IPA now gives you more flexibility to use your creativity to build and modify networks and pathways. You can globally select nodes on pathways by additional criteria to take further actions on the nodes. Specifically, you can highlight or select nodes by their overlay and by their connectivity. For example, if you have overlaid expression fold change values, you can first select only the up-regulated genes and move them all at once to a different place on the network canvas, and do the same for the down-regulated nodes. Or you can select all the unconnected nodes and delete them. Or you could highlight the most highly connected nodes in the network.
Figure 8. Highlighting or selecting nodes via their overlay. The Highlight menu in the Overlay tools has been renamed to “Highlight or Select” because you now have the choice to either highlight or to select nodes meeting your criteria. Highlighting means coloring the borders of the nodes purple (the “Outline” option in the menu at the bottom right of the window) or filling them with a dark blue color (the “Fill” option in the menu at the bottom right of the window). Selecting means coloring their borders blue (using the “Select” option in the menu at the bottom right of the window) to put them in a state where you can do further actions on them, for example deleting them or moving them around on the pathway canvas as a group. In the example above, nodes with no values in the overlaid dataset (i.e. white colored nodes) are selected as a group.
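The overlay-based selection criteria can be modeled as simple set filters. This is a conceptual sketch only. In QIAGEN IPA, selection is done interactively in the “Highlight or Select” menu, and the function name, criterion strings, and node values below are hypothetical.

```python
from typing import Dict, Optional, Set

def select_by_overlay(overlay: Dict[str, Optional[float]], criterion: str) -> Set[str]:
    """Return node names matching one overlay criterion.

    overlay maps node name -> overlaid value (e.g. fold change). None means the
    node has no value in the overlaid dataset (a white node on the canvas).
    """
    if criterion == "up":
        return {n for n, v in overlay.items() if v is not None and v > 0}
    if criterion == "down":
        return {n for n, v in overlay.items() if v is not None and v < 0}
    if criterion == "no_value":
        return {n for n, v in overlay.items() if v is None}
    raise ValueError(f"unknown criterion: {criterion!r}")

# Hypothetical overlay values for four nodes.
overlay = {"TP53": 2.4, "MYC": -1.8, "EGFR": None, "JUN": 3.0}
print(sorted(select_by_overlay(overlay, "up")))
print(select_by_overlay(overlay, "no_value"))
```

Selecting the "no_value" set corresponds to the example in Figure 8, where the white nodes are selected as a group.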
Figure 9. Highlighting or selecting nodes via their connectivity. The new Node Connectivity filter selects nodes by how many other nodes on the network or pathway they are connected to. As shown above, nodes connected to >6 other nodes were selected. As a result, the 3 most highly connected nodes (“hubs”) were selected.
Figure 10. Trimming nodes via their connectivity. The Node Connectivity filter is also available in Trim and Keep in the Build menu. In this example, the Node Connectivity filter is used in the Trim tool to remove all unconnected nodes.
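Both uses of the Node Connectivity filter, selecting hubs (Figure 9) and trimming unconnected nodes (Figure 10), depend on counting each node’s distinct neighbors. The sketch below assumes that edge direction is ignored and that a node’s connectivity is the number of distinct other nodes it touches. This is an illustration, not IPA’s code, and the network is hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

def neighbor_counts(nodes: Iterable[str], edges: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Number of distinct other nodes each node is connected to (edge direction ignored)."""
    neighbors: Dict[str, Set[str]] = {n: set() for n in nodes}
    for a, b in edges:
        if a == b:
            continue  # a self-loop does not connect a node to another node
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return {n: len(nbrs) for n, nbrs in neighbors.items()}

def select_hubs(nodes, edges, more_than: int = 6) -> Set[str]:
    """Nodes connected to more than `more_than` other nodes (cf. Fig 9)."""
    return {n for n, k in neighbor_counts(nodes, edges).items() if k > more_than}

def trim_unconnected(nodes, edges) -> Set[str]:
    """Nodes kept after removing those with no connections (cf. Fig 10)."""
    return {n for n, k in neighbor_counts(nodes, edges).items() if k > 0}

# Hypothetical network: hub H linked to seven nodes, plus an isolated node Z.
spokes = list("ABCDEFG")
nodes = ["H", *spokes, "Z"]
edges = [("H", s) for s in spokes]
print(select_hubs(nodes, edges))
print(sorted(trim_unconnected(nodes, edges)))
```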
Separate up and down cutoffs must now be entered (rather than a single absolute value) for directional measurement types such as fold change or log ratio. This gives you more control over the makeup of the set of molecules that QIAGEN IPA analyzes from your dataset, as compared to using a single absolute cutoff. Figure 11 below shows an example of this.
Figure 11. Set separate up and down cutoffs for Core Analysis. When you set up a Core Analysis with a cutoff for a directional measurement (one with both positive and negative values, such as fold change or log ratio), you must now enter separate negative and positive cutoffs. In the example above, cutoffs of -1.5 (down) and 3 (up) are used for Expr Fold Change. This means that genes with expression fold changes >-1.5 and <3 will not be used in the analysis. The counts of “down genes” and “up genes” that survive the cutoffs are displayed next to the Recalculate button and are indicated in the image above with red arrows.
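The effect of separate down and up cutoffs can be sketched as a filter that also reports the “down” and “up” counts. This is a minimal sketch. It assumes that genes exactly at a cutoff are kept, which is inferred from the “>-1.5 and <3 will not be used” wording. The function name and fold changes are hypothetical.

```python
from typing import Dict, Tuple

def apply_directional_cutoffs(
    fold_changes: Dict[str, float], down_cutoff: float, up_cutoff: float
) -> Tuple[Dict[str, float], int, int]:
    """Keep genes at or beyond either cutoff; return (kept genes, #down, #up).

    Genes strictly between down_cutoff and up_cutoff are excluded.
    """
    kept = {g: fc for g, fc in fold_changes.items() if fc <= down_cutoff or fc >= up_cutoff}
    n_down = sum(1 for fc in kept.values() if fc <= down_cutoff)
    n_up = sum(1 for fc in kept.values() if fc >= up_cutoff)
    return kept, n_down, n_up

# Hypothetical fold changes.
fold_changes = {"G1": -2.0, "G2": -1.2, "G3": 1.8, "G4": 3.0, "G5": 4.5, "G6": -1.5}
kept, n_down, n_up = apply_directional_cutoffs(fold_changes, down_cutoff=-1.5, up_cutoff=3.0)
print(sorted(kept), n_down, n_up)
```

With a single absolute cutoff, the two directions would be forced to use the same magnitude. Separate values let you keep, for example, more down-regulated genes than up-regulated ones.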
September 30, 2017
Analysis Match* automatically discovers other QIAGEN IPA Core Analyses with similar (or opposite) biological results as compared to yours, to help confirm your interpretation of the results or to provide unexpected insights into underlying shared biological mechanisms. It matches your analysis against other analyses you have created (in your Project Manager) as well as thousands of other human and mouse expression analyses curated from public sources. This “analysis-to-analysis” matching is based on shared patterns of Canonical Pathways, Upstream Regulators, Causal Networks, and Diseases and Functions.
With this new capability, you can:
The analyses included in Analysis Match were generated in QIAGEN IPA from more than 6,000 highly curated and quality-controlled human and mouse disease and oncology datasets re-processed from SRA, GEO, ArrayExpress, TCGA and more. These datasets were generated by OmicSoft, a company recently acquired by QIAGEN. They are the “comparisons” found in DiseaseLand and OncoLand, representing various contrasts such as disease vs. normal, treatment vs. non-treatment, and much more.
Figure 1 shows the new Analysis Match tab from one of QIAGEN IPA’s Example Analyses, which is based on expression data derived from mouse lung exposed to welding fumes. The results in the figure have been filtered to show only the highest-scoring results against all the analyses in the OmicSoft repository within QIAGEN IPA. Of the more than 6,000 analyses in the repository, 125 had an overall score of >60% or <-60%, corresponding to strongly similar or dissimilar patterns, respectively. You can filter the results further in a number of ways, for example by type of comparison, disease state, or tissue. Keyword filtering is possible because OmicSoft has extensively annotated each analysis using a controlled vocabulary, which can be displayed in columns as shown in Figure 1. Because of screen space limitations, QIAGEN IPA shows only a few columns by default.
The analyses are matched based on a set of signatures that are created for each analysis, namely one signature for the Canonical Pathways, one for Upstream Regulators, one for Causal Networks, and one for Diseases and Functions. Each signature is used independently to match against other analyses, and an overall average is computed.
*Analysis Match requires additional licensing. Please contact us for info.
Fig 1. Analysis Match tab displaying matching analyses. By default, the analyses are ranked from most similar to least similar based on the overall similarity score (the right-most column shown above). The analyses are matched using a set of signatures created for each analysis: one each for Canonical Pathways, Upstream Regulators, Causal Networks, and Diseases and Functions. Each signature is used independently to match against other analyses. In the image above, each of the first four colored columns at the right shows the percentage similarity of one signature type to the analysis you opened. Fuchsia indicates similarity (shown here) and cyan indicates dissimilarity (not shown here). The first scoring column (“CP”) is the match for the Canonical Pathway signature, the second (“UR”) is for Upstream Regulators, the third (“CN”) is for Causal Networks, and the last (“DE”) is for Downstream Effects (i.e., Diseases and Functions). The final column shown above is the average of those four signature matches. More detail on the signature scoring algorithm can be found here. Note that some of the columns normally shown by default in the Analysis Match tab have been hidden in this figure.
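The overall score and the ±60% filter described above can be illustrated with a short sketch. It assumes that the overall score is an unweighted mean of the four per-signature percentages and that all four are present. IPA’s actual handling, such as weighting or missing signatures, is described in the scoring documentation linked above and is not reproduced here. The analysis names and scores are hypothetical.

```python
from statistics import mean
from typing import Dict, List

# Signature types in the order shown in the Analysis Match tab.
SIGNATURES = ("CP", "UR", "CN", "DE")

def overall_score(scores: Dict[str, float]) -> float:
    """Unweighted mean of the four per-signature similarity percentages (assumption)."""
    return mean(scores[s] for s in SIGNATURES)

def strong_matches(analyses: Dict[str, Dict[str, float]], cutoff: float = 60.0) -> List[str]:
    """Rank analyses by overall score (most similar first), keeping |score| > cutoff."""
    ranked = sorted(analyses, key=lambda name: overall_score(analyses[name]), reverse=True)
    return [name for name in ranked if abs(overall_score(analyses[name])) > cutoff]

# Hypothetical per-signature scores (percent; negative = dissimilar).
analyses = {
    "analysis_A": {"CP": 80, "UR": 90, "CN": 70, "DE": 60},
    "analysis_B": {"CP": 10, "UR": 20, "CN": -5, "DE": 15},
    "analysis_C": {"CP": -70, "UR": -65, "CN": -60, "DE": -75},
}
print(strong_matches(analyses))
```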
As shown in Figure 1, the analysis with the best overall match from the repository is an expression analysis from mouse lung exposed to heat killed influenza virus (from GSE41684), which has strong similarity across all 4 signature types. The next step is to explore the signatures themselves across all or a subset of matching analyses, to understand in more detail which “entities” (the set of upstream regulators, canonical pathways, etc.) drove the similarity scoring. In this example, the matching analyses were further filtered to limit to the repository folder called “MouseDisease” which retained 75 of the analyses, and a heatmap was created by clicking the View as Heatmap button. Figure 2 shows this heatmap, where the rows are the entities from the four signatures with columns for the 75 similar (and dissimilar) analyses. The z-score for each entity from each analysis is represented in the cells with an orange or blue color (for positive and negative z-score respectively).
Fig 2. The heatmap of the signatures vs. the matching analyses reveals similarities and differences. The “4 hr lung” analysis (highlighted in pink above) by definition has a significant z-score for every entity listed in the left column, because those entities represent the union of all 4 types of signatures derived for that analysis. The other selected analyses are shown for reference and may or may not have a significant z-score for each entity. The rows and columns were clustered using agglomerative clustering with Euclidean distance and average linkage (UPGMA).
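The row and column clustering named in the caption (agglomerative, Euclidean distance, average linkage) can be sketched in pure Python. This is a small illustrative implementation, not IPA’s code. It assumes a complete z-score matrix with no missing values and uses hypothetical rows. Columns would be clustered the same way on the transposed matrix. For real data, a library routine such as SciPy’s `scipy.cluster.hierarchy.linkage` with `method="average"` and `metric="euclidean"` does the same job far more efficiently.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Tuple

Cluster = FrozenSet[str]

def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def upgma(rows: Dict[str, Sequence[float]]) -> List[Tuple[Cluster, Cluster, float]]:
    """Agglomerative clustering with Euclidean distance and average (UPGMA) linkage.

    Returns the merge history as (cluster_1, cluster_2, merge_distance).
    """
    # Distances between every pair of original rows.
    dist = {frozenset((a, b)): euclidean(rows[a], rows[b]) for a, b in combinations(rows, 2)}

    def linkage(c1: Cluster, c2: Cluster) -> float:
        # Average linkage: mean distance over all cross-cluster row pairs.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters: List[Cluster] = [frozenset([name]) for name in rows]
    merges: List[Tuple[Cluster, Cluster, float]] = []
    while len(clusters) > 1:
        # Merge the closest pair of current clusters.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((c1, c2, linkage(c1, c2)))
        clusters = [c for c in clusters if c != c1 and c != c2] + [c1 | c2]
    return merges

# Hypothetical z-score rows (entities x analyses).
rows = {"A": [0.0, 0.0], "B": [0.0, 1.0], "C": [5.0, 5.0], "D": [5.0, 6.0]}
for c1, c2, d in upgma(rows):
    print(sorted(c1), sorted(c2), round(d, 2))
```

The merge history is what a dendrogram draws. Rows that merge early at small distances, such as A with B and C with D here, appear as tight clusters in the heatmap.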
The heatmap is filterable, so you can focus on the types of entities that interest you. Figure 3 shows the heatmap filtered for upstream regulators that are classified as transcription regulators. The clustering of the rows reveals which transcription regulators have similar patterns across the analyses. The clustering of the columns shows which analyses are most closely correlated with one another based on the underlying transcription regulator pattern.
Fig. 3. Analysis Match Heatmap filtered to show only upstream regulators which are classified as transcription regulators. The heatmap offers several filters to enable you to explore the nature of the signatures. Clicking on a column header for an analysis in the repository displays its metadata at the right side of the window as shown.
The clustering of the entities (the rows) can reveal interesting similarities among them. For example, after the prior filter is removed to show all the entities, Figure 4 shows that the drug bexarotene clusters closely with the “PPAR/RXR activation” canonical pathway, within a larger cluster containing CR1L, ALDH1A2, SUMO1, and ABCB4. Bexarotene is an RXRA and RXRB agonist, which provides a rationale for why it correlates tightly with this pathway in the heatmap. SUMO1 is a regulator of PPAR activity, but it is less clear why the other entities appear in this cluster. This observation could open interesting avenues of investigation.
Fig. 4. Heatmap showing a cluster that contains both an upstream regulator and a canonical pathway. Tight clustering of entities may reveal correlations of biological interest.
You can select entities (except canonical pathways) and send them to a My Pathway for further analysis, for example to connect nodes together or to discover drugs that target them.
Another valuable way to use the OmicSoft analysis repository is to start by finding analyses of interest by using QIAGEN IPA’s Dataset and Analysis Search by entering keywords such as disease name or tissue. Figure 5 below shows a search for human asthma analyses but excluding those involving albuterol. From search results like these, you can double click to open an analysis, or select up to 20 to visualize in a full comparison analysis.
Fig 5. Discovering analyses of interest using Dataset and Analysis Search. The query “human AND asthma NOT albuterol” finds 136 analyses with those keywords in the OmicSoft repository in QIAGEN IPA. Double-click to open one or create a Comparison Analysis with up to 20. Metadata about the selected analysis (or analyses) is displayed on the right side of the search screen.
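The query in Figure 5 combines required and excluded keywords. The sketch below shows one way such a query could be evaluated against metadata text. It is a hypothetical simplification, not IPA’s search engine. It supports only AND and NOT (no OR and no parentheses) and matches whole words case-insensitively. The records are made up.

```python
import re
from typing import Dict, List

def matches(query: str, text: str) -> bool:
    """Evaluate a query of terms joined by AND / NOT (no OR, no parentheses).

    Terms are matched as whole words, case-insensitively.
    """
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    required: List[str] = []
    excluded: List[str] = []
    negate = False
    for token in query.split():
        if token == "AND":
            continue
        if token == "NOT":
            negate = True
            continue
        (excluded if negate else required).append(token.lower())
        negate = False
    return all(t in words for t in required) and not any(t in words for t in excluded)

def search(query: str, records: Dict[str, str]) -> List[str]:
    """Return IDs of records whose metadata text satisfies the query."""
    return [rid for rid, text in records.items() if matches(query, text)]

# Hypothetical metadata strings for three analyses.
records = {
    "a1": "human asthma bronchial epithelium",
    "a2": "human asthma albuterol treatment",
    "a3": "mouse asthma lung",
}
print(search("human AND asthma NOT albuterol", records))
```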
The repository of datasets and analyses is stored in QIAGEN IPA’s Libraries folder in the Project Manager, as shown in Figure 6. Note that these are read-only and cannot be exported out of QIAGEN IPA.
Fig 6. OmicSoft repository in QIAGEN IPA with over 6,000 datasets and their corresponding analyses. The repository is read-only and cannot be exported out of QIAGEN IPA.
Analysis Match combines literature-powered causal analytics from QIAGEN IPA with a massive dataset collection provided by OmicSoft, creating a unique opportunity for you to make biological discoveries.
Changes in the phosphorylation states of proteins provide an important regulatory mechanism in mammalian cells. Now you can get more from your phosphoproteomics datasets in QIAGEN IPA with a new Phosphorylation Core Analysis*.
Discover upstream regulators and causal network master regulators that may be driving the changes in phosphorylation levels of the proteins in your phosphoproteomics dataset. These results provide testable hypotheses by identifying potential upstream signaling cascades from the phosphorylation patterns in your dataset.
To illustrate this new feature, we analyzed a phosphoproteomics experiment obtained from the literature, in which insulin was applied to starved mouse adipocytes that had been differentiated from 3T3-L1 cells in vitro (PMC3690479). Phosphorylated proteins were isolated from the cells by the authors during a time course of 15 seconds to 1 hour.
As shown below in Figure 1, after 15 seconds of insulin exposure a characteristic phosphorylation pattern is established in these adipocytes. This is highlighted by QIAGEN IPA predicting insulin (gene symbol Ins1 below) as one of the top activated upstream regulators.
Fig 1. Upstream Regulator Analysis. The pattern of differentially phosphorylated proteins in the dataset of insulin-treated cells was used to predict the responsible upstream molecules.
Fig 2. The Ins1 Upstream Regulator network at the 15-second time point. Insulin is a top upstream regulator predicted to be “activated” based on the pattern of phosphorylation of insulin targets in adipocytes treated with insulin for 15 seconds. Proteins with red fill color have increased phosphorylation relative to the untreated control, and green nodes have relative decreases in phosphorylation. Clicking on the badge next to each protein displays the differentially phosphorylated peptides that were uploaded in the dataset (as shown for the insulin receptor, INSR).
Figure 2 indicates there is a positive phosphorylation relationship (orange line) between Ins1 and GAB1. This is supported by a paper that showed that in differentiated 3T3-L1 cells, insulin can increase the phosphorylation of GAB1. For the relationship between Ins1 and STAT3, a different paper showed that insulin can increase the phosphorylation of Stat3 in RAW 264.7 cells (see Figure 3 below).
Fig 3. Examples of phosphorylation findings curated from the literature in the QIAGEN Knowledge Base. Both indicate that insulin can increase a target protein’s phosphorylation (indirectly through unspecified mediators).
Causal Network Analysis predicts regulatory networks to explain phosphorylation changes exhibited in a dataset. Causal Network Analysis enables the discovery of novel regulatory mechanisms by expanding upstream analysis to include regulators that do not yet have known “direct” connections to the targets in your dataset.
For example, stimulating adipocytes with insulin is predicted to activate the master regulator FLT1 (also known as the vascular endothelial growth factor receptor 1) after 15 seconds of exposure. In this causal hypothesis, FLT1 is predicted to drive the activity of nine other regulators which in turn drive changes in the phosphorylation of a larger number of dataset proteins as shown below in Figure 4.
Fig 4. Causal Network Analysis. FLT1 is predicted to activate or inhibit several intermediate regulators leading to the changes in phosphorylation in dataset proteins.
If you’re an existing customer, launch QIAGEN IPA from your desktop and check out the new features. If you need to install QIAGEN IPA, click here.
Changes in the phosphorylation states of proteins are an important regulatory mechanism in cells. Now you can get more from your phosphoproteomics datasets in QIAGEN IPA with improvements to phosphorylation data upload and visualization.
Last September, the QIAGEN IPA Fall Release added a new data type to QIAGEN IPA to support the upload of protein or gene IDs along with the corresponding phosphorylation increases or decreases, represented as fold change (or log ratio). With this December release, you can now upload the corresponding individual phospho sites for display on networks and pathways. These sites can be represented with any text you wish. Examples include the actual phosphorylated peptide (e.g., _FSSS(ph)QPEPR_, as shown in Figure 1 below), just a residue number (e.g., Y347), or any combination of text and numbers.
1) Visualize multiple differentially phosphorylated sites (phospho peptides) on networks and pathways.
Fig 1. Display multiple phospho sites from an uploaded “phospho” dataset. Top image: The small badge at the top right of the node indicates how many phospho sites are in the dataset or that passed your cutoffs in an analysis (depending on whether a dataset or analysis is overlaid). In this example, two phospho peptides for Chk1 passed the analysis cutoff for Phospho Fold Change. Clicking the badge shows the differential phosphorylation as a heat map alongside the phosphorylated peptide sites (if uploaded in the dataset). Bottom image: Example of phosphorylation sites uploaded in the dataset (right column).
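The badge count can be thought of as the number of uploaded sites per protein that pass the analysis cutoff. The sketch below models this grouping. It assumes a single symmetric cutoff applied to the absolute phospho fold change, and the protein IDs, site labels, and values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Site = Tuple[str, float]  # (site label, phospho fold change)

def phospho_badges(rows: List[Tuple[str, str, float]], cutoff: float) -> Dict[str, List[Site]]:
    """Group phospho sites by protein, keeping sites with |fold change| >= cutoff.

    The length of each list is the number a badge would show for that protein.
    """
    by_protein: Dict[str, List[Site]] = defaultdict(list)
    for protein, site, fold_change in rows:
        if abs(fold_change) >= cutoff:
            by_protein[protein].append((site, fold_change))
    return dict(by_protein)

# Hypothetical uploaded rows: (protein ID, site label, phospho fold change).
rows = [
    ("PROT1", "S12", 2.5),
    ("PROT1", "Y347", -3.0),
    ("PROT1", "T5", 1.1),
    ("PROT2", "S9", 1.2),
]
badges = phospho_badges(rows, cutoff=1.5)
for protein, sites in badges.items():
    print(protein, len(sites), sites)
```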
2) Easily identify the proteins on networks and pathways where QIAGEN IPA predicts that increased phosphorylation inhibits their activity, or that decreased phosphorylation increases their activity. The activity of certain proteins is more likely to be inhibited by phosphorylation than activated by it. In the example below, the Molecule Activity Predictor, with phospho data overlaid, indicates this with blue or orange halos that show the predicted activity.
Fig 2. MAP (Molecule Activity Predictor) now uses colored halos around nodes on networks and pathways to indicate the activity of proteins that are inhibited by phosphorylation. Phosphorylation fold change data has been overlaid on CFL1 and GSK3B. CFL1 has increased phosphorylation in this dataset, and MAP indicates with a blue halo that its activity is inhibited. GSK3B has decreased phosphorylation in the dataset, and MAP indicates with an orange halo that it is likely activated. The full list of proteins for which phosphorylation is expected to be inhibitory is available here in the QIAGEN IPA help portal.
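For proteins on the inhibitory-phosphorylation list, the halo logic described above reduces to an inversion of the phosphorylation direction. The sketch below captures only that rule. It is not MAP itself, which also propagates predictions through the network. The membership set includes just the two proteins from Figure 2, and the fold-change values are hypothetical.

```python
from typing import Optional, Set

def phospho_halo(protein: str, phospho_fold_change: float,
                 inhibited_by_phosphorylation: Set[str]) -> Optional[str]:
    """Return the predicted activity for proteins whose phosphorylation is inhibitory.

    Returns None for proteins outside the set (or with no change); this
    simplified rule makes no prediction for them.
    """
    if protein not in inhibited_by_phosphorylation or phospho_fold_change == 0:
        return None
    # More phosphorylation -> less activity, and vice versa.
    if phospho_fold_change > 0:
        return "inhibited (blue halo)"
    return "activated (orange halo)"

# CFL1 and GSK3B as in Fig 2; the fold-change values are hypothetical.
inhibitory = {"CFL1", "GSK3B"}
print(phospho_halo("CFL1", 2.0, inhibitory))
print(phospho_halo("GSK3B", -1.8, inhibitory))
```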
Get more from your phosphoproteomics datasets in QIAGEN IPA. If you’re an existing customer, launch QIAGEN IPA from your desktop and check out the new features. If you need to install QIAGEN IPA, click here.
RNA sequencing technologies can generate datasets with thousands of differentially spliced transcripts. IsoProfiler helps you determine which isoforms have interesting biological properties relevant to your research project.
Fig 1. Overview of IsoProfiler, with highlights indicating the new features. IsoProfiler can visualize one or more transcript-level RNA sequencing datasets in a single view, and you can filter and sort to focus on isoforms with biologically relevant attributes. The top right table shows each gene in your dataset with its associated transcripts and expression data. When a gene is selected, the bottom right table shows the isoform-level details for that gene. 1) A new column in the top table displays the diseases and functions known to be associated with each gene at the gene level (as well as at the isoform level). This may help you identify the specific isoforms in your experiment that drive the known gene-level associations. 2) New filters let you search for specific gene names, or for specific disease and function terms that are pertinent to your dataset(s). See Figure 2 for additional details. 3) New dynamically resizable schematics of the isoforms of the selected gene are now displayed in the lower table, so you can see the overall splicing pattern of each transcript.
Fig 2. Gene-level Disease or Function filter in IsoProfiler. Simply start typing in the text box to focus the list down to relevant filters. In this example, “epith” has been typed, which instantly limits the list of filters to terms like “chemotaxis of epithelial cells”, etc. The same type of filter is now also provided for isoform-level diseases and functions.
IsoProfiler is available in QIAGEN IPA with Advanced Analytics.
Enhance your multi-omics research by uploading simplified phosphoproteomics datasets to QIAGEN IPA for overlay onto networks and pathways. As a first step toward better support for understanding phosphorylation state and the associated biology, this release of QIAGEN IPA introduces a new “phospho” measurement type. Overlay phosphorylation and expression profiles on networks and pathways to identify key areas where phosphorylation is affecting the biological activity of the encoded proteins.
If you have performed both gene expression and phosphoproteomics profiling, you can visualize both of these data types simultaneously as bar charts on networks and pathways. Figure 3 below shows the upstream regulator MAPK1 which QIAGEN IPA predicted to be activated by alpha-toxin (hemolysin) treatment of S9 cells. This prediction was based on a Core Analysis of the gene expression data after exposure to the toxin. The expression data shows that MAPK1 is not itself differentially expressed, but overlaying the accompanying phosphoproteomics dataset on the MAPK1 network provides a possible mechanism for its activation—MAPK1’s phosphorylation level is increased which is likely to activate it and lead to the observed expression changes downstream. In Figure 3, you can see in contrast that JUN is both upregulated and exhibits higher protein phosphorylation after the treatment.
Fig 3. Upstream Regulator Network for MAPK1 with expression and phosphorylation data overlaid. MAPK1 is differentially phosphorylated, which may explain its predicted activation as a regulator of the expression of the genes connected to it in the network. In contrast, JUN is both phosphorylated and differentially expressed. The microarray and phosphoproteomics data used in this figure was obtained from http://dx.doi.org/10.1371/journal.pone.012208
Fig 1. Overview of IsoProfiler. Visualize one or more transcript-level RNA sequencing datasets, and filter and sort to focus on isoforms with biologically relevant attributes. The top table shows each gene and its associated transcripts. The bottom table shows isoform-level details for one gene at a time, based on the row you select in the top table. Click on the plus (+) sign in the left filter panel to display filter options that can be added. In the example shown above, the dataset is filtered for isoforms with fold change less than -2 or greater than +2, and only genes whose isoforms are both up- and down-regulated in the dataset are shown. Transcripts are represented as circles in the Expression Patterns column in the top table. Green circles indicate down-regulated transcripts, and pink or red circles indicate up-regulated transcripts. If you have included at least one expression abundance column (for example, RPKM) in your dataset, the size of each circle represents that abundance, with larger circles indicating higher-abundance transcripts.
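The two filters combined in the example above, a fold-change threshold and a mixed-direction gene filter, can be sketched as a two-pass operation. This is an illustrative model, not IsoProfiler’s code, and the gene and isoform IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Isoform = Tuple[str, float]  # (isoform ID, fold change)

def mixed_direction_genes(
    rows: List[Tuple[str, str, float]], threshold: float = 2.0
) -> Dict[str, List[Isoform]]:
    """Keep isoforms with fold change < -threshold or > +threshold, then keep only
    genes that have at least one up- and one down-regulated passing isoform."""
    passing: Dict[str, List[Isoform]] = defaultdict(list)
    for gene, isoform, fold_change in rows:
        if fold_change < -threshold or fold_change > threshold:
            passing[gene].append((isoform, fold_change))
    return {
        gene: isoforms
        for gene, isoforms in passing.items()
        if any(fc > 0 for _, fc in isoforms) and any(fc < 0 for _, fc in isoforms)
    }

rows = [
    ("GENE1", "GENE1-201", 3.5),
    ("GENE1", "GENE1-202", -2.8),
    ("GENE2", "GENE2-201", 4.0),
    ("GENE2", "GENE2-202", -1.5),  # fails the threshold, so GENE2 is only "up"
]
print(mixed_direction_genes(rows))
```

The order of the two passes matters. Applying the fold-change threshold first means a gene qualifies only if its up- and down-regulated isoforms both pass the cutoff.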
Fig 2. Compare up to 20 transcript-level datasets in IsoProfiler. In this example, human endometrioid endometrial carcinoma (EEC) and hepatocellular carcinoma (HCC) RNA-seq datasets are compared. The results are shown after IsoProfiler was used to set expression value cutoffs, filter for protein-coding isoforms, and keep only those genes whose isoforms in the dataset have known disease and function associations.
Drill down into the “IsoProfiler Findings” view to explore details about the isoforms that have disease or biological function findings captured from the literature. To do this, select one or more rows (or all rows) in the top table and click the IsoProfiler Findings button at the top of the table. This opens a special window, as shown in Figure 3. Only isoforms with disease or function associations appear in this window. In this table, you can filter on findings-level details using the funnels (filters) above each column.
Fig 3. Explore the details of isoform-level disease and function associations. Filter and explore the findings that connect isoforms to disease and functions.
IsoProfiler is part of Advanced Analytics.
Identify significant trends in genes involved in a pathway or network across conditions such as time or dose. Elucidate possible mechanisms behind gene expression results by combining variant gain- or loss-of-function data with expression data. Visualize multiple ‘omics datasets simultaneously on QIAGEN IPA networks and pathways.
Fig 1. Three RNA-seq time points taken during in vitro mouse cardiomyocyte development overlaid on the Integrin Signaling Pathway (zoomed in).
As the cells differentiate from embryonic stem cells into beating cardiomyocytes in vitro, a number of genes on this pathway are progressively upregulated. Several genes in the myosin subunit regulatory light chain family are upregulated over the time course. The new bar charts can show multiple measurements and datasets at one time to give you more insight into the details of the differential expression. In this example both the RNA-seq fold change and the intensity (RPKM) across the three analyses are shown. From this visualization, one can deduce that Myl7 becomes much more highly expressed than Myl2 (RPKM ~3800 vs ~115), even though Myl7 has a lower fold change than Myl2 (~955 vs. ~19,149). The fold changes alone don’t reveal this level of detail across the time points.
QIAGEN IPA also presents the multi-dataset / multi-measurement results in a table view that can be exported. Figure 2 shows an example of a portion of that table.
Fig 2. Clearly identify trends across genes, conditions, and datasets with the exportable table view.
The same genes shown in Figure 1 above are shown here in the new table view within the Overlay Datasets, Analyses & Lists tool, though in this table a line is drawn to connect the bars when possible to help visualize patterns.
Elucidate possible mechanisms driving gene expression results by simultaneously overlaying both gene expression analysis and variant loss/gain datasets on a pathway or network. In this way you can see which genes are differentially expressed and harbor potentially deleterious variants.
Fig 3. Uncover possible mechanisms driving gene expression results. RNA-seq gene expression data from three hepatocellular carcinoma (HCC) patients was used to predict that the NONO protein is inhibited. Expression from the three patients was processed in Biomedical Genomics Workbench (BxWB) and then analyzed in QIAGEN IPA, which led to the prediction of NONO inhibition using Causal Network Analysis. Variants were also called on the transcript sequences from these patients using BxWB and analyzed using Ingenuity Variant Analysis. All three patients were found to have potentially deleterious frameshift and missense variants in the NONO gene. Data from both BxWB and Variant Analysis were exported directly to QIAGEN IPA. The three green bars in Figure 3 correspond to predicted loss of function variants for each of the patients, and the red bar indicates that the expression was upregulated in the patients, perhaps as a compensatory mechanism for loss of function. NONO has been found to be mutated in a number of cancer types.
Quickly see which diseases, functions, and phenotypes are associated with differentially expressed isoforms in your RNA-seq experiment using QIAGEN IPA’s new IsoProfiler (BETA). Get early access to IsoProfiler as part of Advanced Analytics.
Simply filter to determine if certain isoforms (splice variants and their products) are known to drive a disease or process. For example, Figure 1 shows isoforms driving metastatic processes in a human breast cancer RNA-seq dataset.
Fig 1. IsoProfiler results. The table displays all the isoforms that have a curated relationship to a biological function, phenotype, or disease. In this example, the table has been filtered to display the isoforms known to be involved in metastasis. This isoform of ADAM12 is upregulated in the dataset, providing an avenue of experimental inquiry – perhaps this short form is responsible for the aggressiveness of these breast cancer cells.
Fig 2. ADAM12 isoform view shows that a shorter isoform, ADAM12S, is upregulated in the breast cancer cells, with a fold change of 66.3.
Import genetic gain/loss information for a set of genes and predict the variant effect on diseases, functions, phenotypes and canonical pathways. QIAGEN IPA now supports a new data type for gain or loss of function variants that result from genome or transcriptome sequencing data.
Overlay Gain or Loss of function variant values onto genes on networks and pathways to display their effects on genes and use MAP (Molecule Activity Predictor) to compute the impact on neighboring connected genes.
Fig 3. Gain or Loss of function variants (green-colored nodes indicating loss of function variant) in genes on the ERK5 Signaling Pathway could lead to increased cell survival and decreased gene expression in this endometrioid endometrial carcinoma analysis.
Combining Gain or Loss of Function variant data with expression data unlocks the ability to investigate whether upstream regulator predictions based on expression data may in fact derive from variants that activate or inactivate the regulator itself.
Using Upstream Regulator Analysis, if there are cases where an upstream molecule has been predicted to be activated or inhibited, you can quickly discover if the gene for that regulator has a corresponding gain or loss of function variant.
Fig 4. Upstream regulator analysis of an endometrioid endometrial cancer patient (tumor vs. normal adjacent tissue). The result shows that the NFKBIA protein is predicted to be an inhibited upstream regulator AND has a likely loss-of-function variant (see red box above). This is consistent with, and may explain, the predicted loss of its activity as an upstream regulator.
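The cross-check described above, which looks for predicted regulators whose own gene carries a variant pointing in the same direction, amounts to a join between two tables. The sketch below is a hypothetical model, not an IPA feature or API. It assumes that regulator names coincide with gene symbols and that variant effects are labeled simply “gain” or “loss”. Only the NFKBIA pairing comes from Figure 4, and the other entries are made up.

```python
from typing import Dict, List

# Variant effects that would be consistent with each predicted regulator state.
CONSISTENT = {("inhibited", "loss"), ("activated", "gain")}

def variant_supported_regulators(predicted: Dict[str, str],
                                 variant_effects: Dict[str, str]) -> List[str]:
    """predicted: regulator -> "activated"/"inhibited";
    variant_effects: gene -> "gain"/"loss" (from a variant dataset)."""
    return sorted(
        gene for gene, state in predicted.items()
        if (state, variant_effects.get(gene)) in CONSISTENT
    )

predicted = {"NFKBIA": "inhibited", "MAPK1": "activated", "TP53": "inhibited"}
variant_effects = {"NFKBIA": "loss", "TP53": "gain"}
print(variant_supported_regulators(predicted, variant_effects))
```

A regulator whose variant points the opposite way (here, a hypothetical gain-of-function call for an inhibited regulator) is not reported. Such a mismatch may still be worth inspecting separately.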