Now you can quickly discover expected and unexpected commonalities among sets of analyses of interest in Analysis Match using a new capability that detects statistically significant associations in their metadata. For example: Are the analyses that match yours often derived from a particular tissue type, disease state or treatment? Do they tend to derive from a particular mouse strain, or from cells with specific cell surface markers? This approach can help easily identify similarities among matching analyses that may have been previously hidden.
QIAGEN IPA scans across more than 90 metadata fields from the set of repository-based analyses that you select in Analysis Match and performs a calculation to detect potential enrichment among their metadata. Figure 1A shows an Analysis Match result filtered for analyses that strongly match (or anti-match) an analysis of gemfibrozil-treated rats. Gemfibrozil is a classical PPAR agonist. Selecting the matching set (those in the red dotted box in Figure 1A) and then clicking the Evaluate Metadata button generates p-values that are calculated using a right-tailed Fisher’s Exact Test. The results are displayed in a table like the one shown in Figure 1B. The most significant term among the selected analyses in Figure 1B is ‘PPAR agonists’ in the case.subjecttreatment field, with p-value = 6.98E-08. Other examples of overrepresented terms are ‘white adipose cell’ and ‘preadipocyte’ in the case.celltype field. Note that the case.subjecttreatment and case.celltype fields are not shown in the Analysis Match table by default, which highlights that this new feature surfaces metadata that may initially be hidden due to space constraints in the user interface (UI).
Figure 1: New feature in Analysis Match to discover commonalities among analyses of interest via shared metadata. Figure 1A shows Analysis Match results for the transcriptomics analysis of the liver of rats that were treated with the PPAR-alpha agonist gemfibrozil (RNA-seq data from PMID 25150839). The table has been filtered to retain only the strongest matching (average matching percentage >43) or anti-matching analyses (average matching percentage < -43). The matching analyses enclosed in the red dotted box were selected and the ‘Evaluate Metadata’ button was chosen. Figure 1B shows the results of the enrichment calculation, where the term ‘PPAR agonists’ was found to be highly enriched (p-value = 6.98E-08) among the matching analyses in the ‘case.subjecttreatment’ field.
This level of significance arose because, of the 18 analyses that were selected, three shared the ‘PPAR agonists’ term, while only nine analyses in the entire set of over 57,000 analyses in the Analysis Match repository carry that term.
The analyses that were identified as being treated with “PPAR agonists” were specifically treated with tesaglitazar, fenofibrate, or rosiglitazone, which are well-known PPAR agonists.
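To make the enrichment calculation concrete, here is a minimal sketch of a right-tailed Fisher's Exact Test written as a hypergeometric tail sum. The function name and the use of exactly 57,000 as the background size are assumptions for illustration. The text says only "over 57,000", and the background QIAGEN IPA actually uses for each field is not stated, so this sketch should not be expected to reproduce the reported p-value of 6.98E-08.

```python
from math import comb

def right_tailed_fisher(k, n, K, N):
    """P(X >= k) when n analyses are selected from N, of which K carry the term.

    This is the hypergeometric upper tail, which is what a right-tailed
    Fisher's Exact Test computes for a 2x2 table.
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

# Counts taken from the example above; N = 57,000 is an assumed round number.
p = right_tailed_fisher(k=3, n=18, K=9, N=57_000)
print(f"right-tailed p-value: {p:.2e}")
```

The point of the sketch is the shape of the calculation: a term that is rare in the repository but shared by several selected analyses yields a very small p-value.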
The metadata results table can be filtered to focus on certain fields or terms of interest. In Figure 2, the metadata evaluation results are narrowed to show only fields involving the ‘case’ samples (rather than the controls).
Figure 2: Filtering the metadata results table. You can filter the results data to focus on certain types of fields or values, such as fields involving the cases rather than the controls. Note that the computation only considers the metadata in the repository-based analyses. It does not evaluate any metadata that you may have entered for any of your own analyses.
The Build > Grow > Diseases & Functions feature is a powerful way to add biological context to a pathway or network. However, its calculation of statistical over-representation is computationally expensive and often takes 30–60 seconds. In the past, after performing the first ‘Grow to Diseases & Functions’ operation on a network, QIAGEN IPA would repeat the calculation immediately any time nodes were added to or removed from the network, forcing you to wait for updated statistical results with each change. Now you control when to perform the calculation using the new Recalculate button (Figure 3). You can make numerous changes, and when ready, determine which diseases and functions are statistically relevant.
Figure 3: Recalculate over-representation of Diseases & Functions on demand. Now you can make multiple additions or subtractions to the network or pathway before performing the computationally expensive overlap calculation.
QIAGEN IPA now supports direct upload of .csv dataset files, such as the comma-separated data files exported by upstream software like the 10x Genomics Loupe Cell Browser.
Increase data interpretation power and simplify pathway modeling by adding interactive nodes representing Canonical Pathways to networks or pathways. These newly available nodes correspond 1:1 with the Canonical Pathways that have always been in IPA and behave similarly to disease or function nodes. You can connect them to molecules that are part of each particular Canonical Pathway and simulate the effect of activating or inhibiting these molecules on the pathway as a whole. The activity simulation is available only for pathways that have a Pathway Activity Pattern.
Figure 1 shows the TNFR1 signaling pathway as a node on a ‘My Pathway’ connected to its primary activating ligand TNF. IPA’s MAP tool was used to “activate” TNF (shown in red), predicting this would lead to activation of the TNFR1 pathway (shown in orange).
Figure 1. An example of a Canonical Pathway displayed as an interactive node in IPA (connected to its principal activating ligand for purposes of illustration).
Each pathway can be linked to the full collection of genes that make up that pathway by using the Build > Grow tool, starting with the pathway node. The interactive pathway diagram that accompanies each Canonical Pathway can be visualized by double-clicking the pathway icon. Note that when using the Grow tool to go from a pathway node to genes, all genes that are part of that pathway are added to the pathway, including those that are members of groups and complexes. If you wish to find all the genes that are included in a pathway for scoring against your dataset, it is best to use the search engine to search for that pathway, and add both the pathway and the nodes to a new ‘My Pathway’. This method will show the groups and complexes that belong to the pathway but are not included in scoring. These nodes can be removed with the Build > Trim tool.
Canonical Pathway nodes can be added to any network to increase interpretability. Figure 2 shows an example of adding Canonical Pathway nodes to an interaction network from a Core Analysis of stem cells differentiating to cardiomyocytes, indicating that several of the molecules in the network are activators of the apelin endothelial signaling and paxillin signaling pathways.
Figure 2. Two Canonical Pathways manually added to an interaction network. Using Build > Grow, Canonical Pathways were added to a pre-existing interaction network from a Core Analysis.
You can also add Canonical Pathways to Regulator Effects networks or include them inside other Canonical Pathways, as shown in Figures 3 and 4, respectively.
Figure 3. Two Canonical Pathways manually added to a Regulator Effects network. Using Build > Grow, Canonical Pathways were added to a pre-existing Regulator Effects network from a Core Analysis. These pathways are predicted to be activated due to the increased activity of the molecules colored in red in the network.
Figure 4. Canonical Pathway manually added inside another Canonical Pathway. Using Build > Grow, Canonical Pathways can be added inside another Canonical Pathway. The MAP tool coloring indicates the added pathway is inhibited (blue color) with this overlaid dataset.
Over 90 pathways in IPA have an existing pathway embedded within them, represented as a single node. Previously, these were shown using a non-interactive node. Now, these “pathways on pathways” are interactive and their activity can be predicted. Figure 5 shows a portion of the CDC42 signaling pathway that embeds two Canonical Pathways (ERK/MAPK signaling and SAPK/JNK signaling) which are predicted to be activated downstream of the CDC42 pathway.
Figure 5. Canonical Pathways already existing inside another Canonical Pathway. Over 90 pathways in IPA already have one or more Canonical Pathways embedded within them. The MAP tool in IPA was turned on to predict the effect of activating c-RAF and the JNK protein family on each of the connected pathways.
When using Build tools such as Grow and Connect, sometimes you need to repeatedly perform the same operations on every network or pathway that you open. For example, you might need to always Grow upstream to transcription regulators. Now, you can make the appropriate selections in the various Build filters and save them as defaults. From that moment on, each new Build tool you use will remember your saved settings. You can always reset your custom settings back to “factory defaults” when needed.
Figure 6. The new “save as preferences” in the context of the Grow tool. The node types of ligand-dependent nuclear receptor and transcription regulator have been saved as defaults. Now, whenever a pathway or network is opened, the Grow tool will add molecules only of that type. The Build Preferences panel in IPA’s Application Preferences will show your saved settings as shown in Figure 7.
Figure 7. The new Build Filters preferences. The panel is located in File > Preferences > Application Preferences.
Explore new Canonical Pathways related to two important research topics
Enhance your research and discovery of the mechanisms driving the development of lupus and cancer immunotherapy with these new pathway maps:
Now, from the Overlay tool, you can search for analyses and datasets to overlay onto networks and pathways, rather than manually browsing the Project Manager tree. Furthermore, Analysis Match analyses and datasets from OmicSoft are included in the search results for users with an Analysis Match QIAGEN IPA license.
This workflow enables rapid visual assessment of any analysis or dataset on the pathway or network of interest. The figure below shows the Interferon Signaling Pathway overlain with expression data from mouse lung infected with an Influenza A virus vs. uninfected lung (analysis from GEO dataset GSE36328 as processed by OmicSoft for Analysis Match).
In this release, many more columns in the Core Analysis tabs are filterable, which will help you narrow down and focus your results. Furthermore, now you can use the less than (<) or greater than (>) symbols to tailor the results.
The new filters can be found in the following tabs: Upstream Analysis, Diseases & Functions, Regulator Effects, Networks, and Molecules. Some of the filters also appear in Comparison Analyses.
Metadata values that differ between case and control are now displayed in a table at the top of the metadata panel in the Project Search results. An example is shown in the figure below.
Also, specific fields that are important in understanding the dataset (such as the organism, tissue and platform) have been extracted into a small section called “Comparison Context” that follows the case/control table.
If you have chosen to use the standard QIAGEN IPA case and control metadata keys for your datasets, they will also be automatically displayed in a table and placed into the Comparison Context section.
We are excited to introduce brand new features in the QIAGEN IPA Spring 2019 Release:
QIAGEN IPA can now improve your success in mapping the identifiers in your datasets by evaluating more than one column of gene or chemical IDs. Assign up to five columns of IDs, and QIAGEN IPA will scan from left to right across the columns of identifiers and stop (for that row) when it successfully maps an ID.
Mapping across multiple columns of IDs is especially valuable in the case of metabolite (chemical) IDs. Figure 1 shows a dataset during the upload process with four columns of metabolite IDs, which resulted in more rows being mapped than when using any one identifier column alone.
Figure 1: Assigning multiple columns of IDs during upload to increase mapping coverage. This dataset has four columns of IDs that are assigned for mapping. The dataset summary tab is automatically updated each time a new ID column is assigned and its source(s) chosen. In this example, 344 chemicals (rows) were mapped using only one column (HMDB), but when all of the ID columns (HMDB, PubChem, CAS, and KEGG) were used together, QIAGEN IPA scanned from left to right and was able to map 379 of the rows.
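The left-to-right scan described above can be sketched as follows. This is an illustration only: the function name, the placeholder identifiers, and the `known` set standing in for the identifiers QIAGEN IPA can map are all hypothetical.

```python
from typing import Optional, Sequence, Set

def map_row(ids: Sequence[Optional[str]], known_ids: Set[str]) -> Optional[str]:
    """Scan a row's ID columns left to right and return the first mappable ID."""
    for candidate in ids:
        if candidate and candidate in known_ids:
            return candidate  # stop at the first successful mapping for this row
    return None  # no column in this row could be mapped

# Placeholder rows with four ID columns (e.g. HMDB, PubChem, CAS, KEGG).
rows = [
    ("HMDB_1", "CID_1", None, None),    # maps via the first column
    (None, "CID_2", "CAS_2", None),     # maps via the second column
    ("HMDB_X", None, None, "KEGG_3"),   # first ID is unknown, maps via the fourth
    (None, None, None, None),           # stays unmapped
]
known = {"HMDB_1", "CID_1", "CID_2", "KEGG_3"}  # hypothetical mappable IDs

mapped_first_only = sum(map_row(r[:1], known) is not None for r in rows)
mapped_all_columns = sum(map_row(r, known) is not None for r in rows)
print(mapped_first_only, mapped_all_columns)
```

As in the metabolite example, using every assigned column can only map the same number of rows or more than any single column alone.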
QIAGEN IPA now calculates a Benjamini-Hochberg (B-H) corrected p-value for Upstream Regulators and for Causal Networks, increasing the statistical stringency of these results in Core Analyses. The B-H p-value corrects for multiple testing: the fact that the more statistical tests you run, the greater the chance that you will observe a false positive result. Figure 2 shows the Upstream Regulator tab in a Core Analysis with the new B-H column. Note that these new p-values won’t be present for any analysis that you ran prior to this release. Please re-run previous analyses to calculate the values.
Figure 2: Upstream Regulator tab now has an optional “B-H corrected p-value” column. The column is not shown by default, and you must click the Customize Table button, then tick the B-H corrected p-value checkbox to display the column. In this example, note that the B-H p-values for these regulators are at a B-H statistical significance of ~ 0.01, whereas the standard p-values are approximately three orders of magnitude more significant.
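For readers unfamiliar with the correction, here is a minimal sketch of the standard Benjamini-Hochberg step-up adjustment. It is a textbook version, not QIAGEN IPA's implementation. The function name is hypothetical, and the set of tests over which IPA applies the correction is not described here.

```python
def benjamini_hochberg(pvalues):
    """Return B-H adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is capped
    # by the adjusted values of all larger p-values (keeps them monotone).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

Because each raw p-value is scaled by the number of tests divided by its rank, adjusted values are never smaller than the raw ones, which is why the B-H column is more conservative than the standard p-value column.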
B-H p-values have been available in QIAGEN IPA for Canonical Pathways and for Diseases and Functions for several years; however, the values were not easily accessible for the latter. An optional B-H column is now available in the Diseases & Functions tab as shown below:
Figure 3: The Diseases & Functions tab now has an optional “B-H corrected p-value” column in the table. The column is not shown by default, and you must click the Customize Table button, then tick the B-H p-value checkbox to display the column.
The Diseases & Functions TreeMap can be visualized using the B-H corrected p-value. The rectangles can be colored by and/or sized by the -log of the B-H p-value, as shown below in Figure 4.
Figure 4: The Diseases & Functions TreeMap can be visualized using the B-H corrected p-value. Use the menus (highlighted above) to color and/or size the heatmap by the -log of the B-H p-value.
The B-H statistics are also available in Comparison Analysis for your analyses that are run (or re-run) after this release, and are calculated for all Analysis Match analyses as well.
The Help menu in QIAGEN IPA now has a quick link to a set of video tutorials to help you get started with QIAGEN IPA. The topics range from how to format and upload your data, to how to analyze your data, to how the p-values in QIAGEN IPA are calculated:
Figure 5: New Video Tutorials link in QIAGEN IPA’s Help menu. The link leads to a new Help Portal page with a set of videos to help you use QIAGEN IPA.
We are excited to introduce brand new features in the QIAGEN IPA Winter 2018 Release:
Now you can examine detailed expression patterns across human tissues directly from QIAGEN IPA’s Isoform Views. QIAGEN IPA now offers access to a lite version of OmicSoft Land Explorer. With this new feature, you can view interactive plots of gene expression in 51 different human tissues from the GTEx project, both at the gene level and for individual splice variants. You can filter the view for a particular tissue, or filter on metadata, such as tissue donor age or gender. You can also download the detailed sample-level expression data for the gene.
QIAGEN IPA users can access the lite version of Land Explorer at no extra cost, and access does not require registration or manual sign-in. For broader access to hundreds of thousands of samples from healthy and disease tissue, please take a tour of the full OmicSoft Land Explorer (launching soon!).
Figures 1-3 demonstrate how you can access a lite version of Land Explorer via QIAGEN IPA for no extra cost. The figures show how the FABP4-201 isoform of FABP4 (the longest protein-coding isoform of the Fatty Acid Binding Protein 4 gene) is expressed at higher levels in adipose and breast tissues than in other tissues.
Figure 1. Navigate to sample-level human tissue expression for human genes via links in isoform view. Click the link (shown in the red box) to view Land Explorer via the QIAGEN IPA web page that plots the expression of the isoforms (splice variants) of a human gene in 51 different human tissues. Gene-level expression is also available in Land Explorer.
Figure 2. View of human isoform-level expression in human tissue samples for FABP4. The underlying RNA-seq data were reprocessed by OmicSoft (a QIAGEN company) from raw fastq files obtained from the GTEx consortium, and represent the expression of the isoforms of a particular gene in >8000 samples, each harvested from one of 51 different human tissues. Each chart displays the expression for one human transcript ID (either RefSeq, or Ensembl as shown above) where each circle represents the quantity of RNA (in FPKM) in one particular tissue sample. The pink bars show a box plot that summarizes the distribution of FPKM in that tissue or set of tissues.
The plot can be switched to show gene-level expression as well, as shown below in Figure 3.
Figure 3: Land Explorer Views can be switched to show gene-level rather than isoform-level expression. (1) The menu at the top middle of the screen can be used to switch to “Gene FPKM” as shown. (2) There are a number of filters available as well in the Add Filter menu. (3) Note that by default the tissues are grouped into similar types. For example, there is initially just one “row” for brain as shown above. Use the Grouping menu to choose “Tissue Detail Type” to expand to show all the individual tissues.
Create and open QIAGEN IPA Comparison Analyses much more quickly and add statistical stringency to your Comparison Analyses with the Benjamini–Hochberg correction. B-H corrected p-values are now available for display and filtering in the Canonical Pathways and Diseases & Functions tabs, as shown below in Figure 5.
Figure 4: Comparison Analyses can now be created and reopened more quickly than in prior releases.
Figure 5: Benjamini-Hochberg corrected p-values are now available in Comparison Analysis for display and filtering. In both the Canonical Pathways tab and the Diseases & Functions tab, you can color the heatmap squares by B-H p-value and can use the filter as shown to hide rows that don’t meet a particular cutoff that you enter.
• FAT10 Cancer Signaling Pathway
• T Cell Exhaustion Signaling Pathway
• ~38,500 new Expert findings
• ~400 new ExpertAssist findings
• ~50,800 new cancer mutation disease association findings from COSMIC
• ~1300 new ontology findings from GO
• ~2100 new disease-to-target findings from ClinicalTrials.gov
• ~1500 new drug-to-disease findings from ClinicalTrials.gov
• ~9000 new protein-protein interactions from the BioGRID database
• ~700 new protein-protein interactions from the IntAct database
• ~160 new mouse knockout-to-phenotype findings from MGD (JAX Labs)
• ~150 newly mappable chemicals
The Analysis Match repositories will be updated in QIAGEN IPA on Jan 4th, 2019. There will be over 3,500 new Analysis Match datasets in this release, as outlined in Table 1.
Analysis Match enhances interpretation and drives discovery by placing your dataset in the context of thousands of QIAGEN IPA analyses that have been processed from data from public sources using Array Suite.
Powered by QIAGEN IPA Advanced Analytics, Analysis Match automatically identifies the analyses of curated datasets that have significant similarities and differences, enabling you to compare results, validate interpretation and better understand causal connections between diseases, genes, and networks of upstream regulators.
Table 1: >52,000 datasets will be available in QIAGEN IPA Analysis Match in this release (on Jan 4th, 2019).
QIAGEN IPA Core Analysis now opens much more quickly! Just double click the analysis icon as usual and the analysis will open into a ready-state much faster than in prior releases.
Note: A change has been made in the information that is displayed in the molecules tab. The tab now lists all of the molecules in the original dataset and indicates (in bold in the Symbol column) those that are "analysis ready", meaning they passed filters and cut-offs and were therefore submitted for analysis.
There are nearly 1,800 new Analysis Match datasets in this release (see below).
Analysis Match enhances interpretation and drives discovery by placing your dataset in the context of thousands of QIAGEN IPA analyses that have been processed from public sources using Array Suite. Powered by QIAGEN IPA Advanced Analytics, Analysis Match automatically identifies the analyses of curated datasets that have significant similarities and differences, enabling you to compare results, validate interpretation and better understand causal connections between diseases, genes, and networks of upstream regulators.
Table 1: >49,000 datasets are available in QIAGEN IPA Analysis Match in this QIAGEN IPA release. *Redundant LIMMA and Voom-based datasets have been removed in the RatDisease repository. Now GLM and DESeq2 are used exclusively for microarray and RNA-seq data, respectively.
QIAGEN IPA can now predict metabolic activities in a dataset using its entire collection of more than 300 metabolic pathways. The prediction is based on the set of up- and down-regulated molecules in your datasets and the directionality of the metabolic pathway itself. Figure 1 below shows the Canonical Pathways tab in a Core Analysis, with metabolic pathways marked with red arrows. The orange color of the bars indicates that they are predicted to have increased activity in this dataset.
Figure 1: The Canonical Pathways tab in a Core Analysis highlighting metabolic pathways with orange bars. These pathways are predicted to be activated in this analysis of kidney RNA of mice treated with the NRF2 activator CDDO-me (vs. DMSO, PMID 26422507).
The methodology QIAGEN IPA uses to predict the metabolic activity from a dataset in Core Analysis is described here. QIAGEN IPA can predict metabolic activity from your differential gene expression dataset, differential metabolomics dataset, or a dataset where you have concatenated both differential gene expression and differential metabolite concentrations into one “observation”.
The metabolic pathway activity scores contribute to Canonical Pathway signatures in Analysis Match*, as shown below in Figure 2.
Figure 2: Heatmap in Analysis Match filtered to show only the Canonical Pathway scores. Several metabolic pathways participate in the signature as shown.
*Analysis Match requires additional licensing. Please contact us at AdvancedGenomicsSupport@qiagen.com for info.
There are 1,100+ new analyses for Analysis Match in this release, bringing the total available in QIAGEN IPA to >8,000. This includes two new repositories, RatDisease (under DiseaseLand) and Pediatrics (under OncoLand). Table 1 compares the repositories and their respective sizes in this release versus the prior one.
Table 1: Comparison of the number of datasets and repositories in this release (green color) to the prior release (red color). There are over 1,100 new datasets and their corresponding analyses in the current release.
Analysis Match* automatically discovers other QIAGEN IPA Core Analyses with similar (or opposite) biological results as compared to yours, to help confirm your interpretation of the results or to provide unexpected insights into underlying shared biological mechanisms across experimental situations. QIAGEN IPA matches your analysis against other analyses you have created (in your Project Manager) as well as thousands of other human and mouse expression analyses curated from public sources. This “analysis-to-analysis” matching is based on shared patterns of Canonical Pathways, Upstream Regulators, Causal Networks, and Diseases and Functions.
In this release, improvements to Analysis Match enable you to more easily control which of the Lands are used in the matching, and the detailed results in the heat map are more easily interpreted and available for follow up. You can now manually add experiment metadata to your own datasets to label them more clearly in the Analysis Match table and to find them using Project Search.
Fig 1. Filtering the Analysis Match results by source (Land). Use the enhanced Project menu in the Analysis Match tab to choose which Lands you would like to use for matching. Click on one or on multiple repository names to select them. You can also include your own projects by expanding the My Projects tree and clicking on your project’s name(s). Or use the radio button and switch to doing a free text search by project name (i.e. Land name). MetastaticCancer is a new Land in this release.
Fig 2. New option in the Analysis Match heatmap to indicate signature entities that are NOT significant in the other analyses. The heat map in Analysis Match is constructed by showing all the signature entities from the analysis you opened (the analysis of interest), using color to represent each entity’s z-score in that analysis as well as in the other analyses you selected when you created the heatmap. However, although the heat map square for a particular entity in another analysis may be colored orange or blue, its underlying z-score may be too small to be considered significant. Now you can mark such instances as insignificant as shown above. In this example, a threshold of “2” was entered in the “Insignificance Threshold” field to label with a dot those heat map squares which have a smaller value than that threshold (i.e. <2), enabling you to visually ignore the insignificant z-scores.
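The marking rule in Fig 2 can be sketched in a few lines. The function name is hypothetical, and comparing the absolute z-score against the threshold is an assumption: the caption says only that squares with "a smaller value than that threshold" are dotted, while z-scores can be negative (blue) as well as positive (orange).

```python
def mark_insignificant(zscores, threshold=2.0):
    """Return a grid of booleans: True where a heat map square would get a dot.

    Assumes significance is judged on the absolute z-score.
    """
    return [[abs(z) < threshold for z in row] for row in zscores]

# Rows are signature entities, columns are analyses (toy values).
grid = [
    [2.5, -1.2],
    [0.3, -3.1],
]
dots = mark_insignificant(grid, threshold=2.0)
print(dots)
```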
Fig 3. Explore a signature entity’s underlying network by clicking on a heatmap square. By clicking on a heatmap square in the Analysis Match heatmap, you can now display its underlying network or pathway. As shown above in part A, clicking on the ACKR2 heat map square in the first column displays its network in the right panel. The molecules from the dataset are shown in the Molecules tab (part B above), and clicking on the name of an analysis in the header of the heatmap will display the analysis’ metadata (if it has any) in the Metadata tab (part C above). See Figure 6 below to see how you can enter metadata for your own datasets.
Fig 4. Conveniently select a set of signature entities in the Analysis Match heatmap for further exploration via the row or column dendrograms. To explore a set of related signature entities, select them as a group by clicking on their dendrogram. For example, the top-most cluster of entities (rows) was clicked to select a group of related signature entities. The selected group can be sent to a new pathway or a new list by using the buttons along the top of the heat map. Or the selection can first be modified by command-clicking (Mac) or control-clicking (Windows).
Fig 5. Select a set of analyses for further exploration in a full Comparison Analysis. Select a set of related analyses by clicking on their cluster in the column dendrogram. As shown above, a cluster of analyses (columns) was selected by clicking on the portion of the dendrogram above them. The analyses can then be viewed more fully by clicking on the View Comparison button. Up to 20 analyses can be viewed in a Comparison Analysis. The selection can first be modified by command-clicking (Mac) or control-clicking (Windows) to limit it to 20 or fewer.
Now you can annotate your uploaded datasets with information that will help you quickly find those datasets (or analyses created from them) using project search, or help you to remember details about them when interpreting the results of their analysis. This is especially useful in the context of Analysis Match, where metadata from the dataset can be displayed in columns in the Analysis Match tab.
When you upload your dataset, you can enter relevant metadata about it in the QIAGEN IPA user interface. For example, you could annotate it by leveraging existing OmicSoft fields such as “case.disease” or “case.tissue” and typing in values such as “asthma” or “lung”, or create your own custom fields to annotate. For example, you could create a new field called “eNotebook record” and enter a clickable hyperlink that points to an internal online record about the experiment that led to that dataset, or create a field called “Collaborators” and put in names of colleagues involved with that dataset. The metadata you add to a dataset is automatically propagated to any Core Analysis created from it. Keep in mind that the metadata you enter is for your purposes only, and is not used by QIAGEN IPA to influence the analysis results. Figure 6 shows how you can enter metadata for a dataset.
Figure 6. Entering metadata for a dataset. Existing keys from OmicSoft can be used, or you can create a custom field as shown above. In this instance, a new field called “Hyperlink to paper” was created and a hyperlink was pasted in (control-v). Other metadata was added as well such as tissue type, disease state etc. The metadata will propagate to any Core Analysis created from this dataset.
Figure 7. Searching for datasets and analyses using the metadata you entered for the dataset. In this example, an analysis was found using the keyword “GSE11352”, which had been entered as metadata in the OmicSoft field “projectname” for the dataset. In this example, there are also OncoGEO analyses with that same GSE#.
Metadata can be added or edited either before or after saving the dataset file. It is also possible to insert metadata at the top of the dataset text or Excel file itself before you upload it, by following instructions here. This is especially useful when you wish to enter a large amount of metadata or if you have many similarly derived datasets that have mostly the same metadata. In this release, you can edit that uploaded metadata in the metadata tab (during upload), or after saving and re-opening it.
*Analysis Match requires additional licensing. Please contact us for info.
QIAGEN IPA now gives you more flexibility to use your creativity to build and modify networks and pathways. You can globally select nodes on pathways by additional criteria to take further actions on the nodes. Specifically, you can highlight or select nodes by their overlay and by their connectivity. For example, if you have overlaid expression fold change values, you can first select only the up-regulated genes and move them all at once to a different place on the network canvas, and do the same for the down-regulated nodes. Or you can select all the unconnected nodes and delete them. Or you could highlight the most highly connected nodes in the network.
Figure 8. Highlighting or selecting nodes via their overlay. The Highlight menu in the Overlay tools has been renamed to “Highlight or Select” because you now have the choice to either highlight or to select nodes meeting your criteria. Highlighting means coloring the borders of the nodes purple (the “Outline” option in the menu at the bottom right of the window) or filling them with a dark blue color (the “Fill” option in the menu at the bottom right of the window). Selecting means coloring their borders blue (using the “Select” option in the menu at the bottom right of the window) to put them in a state where you can do further actions on them, for example deleting them or moving them around on the pathway canvas as a group. In the example above, nodes with no values in the overlaid dataset (i.e. white colored nodes) are selected as a group.
Figure 9. Highlighting or selecting nodes via their connectivity. The new Node Connectivity filter is used to select nodes via how connected they are to other nodes on the network or pathway. As shown above, nodes connected to >6 other nodes were selected. This resulted in the 3 most highly connected nodes (“hubs”) being selected.
Figure 10. Trimming nodes via their connectivity. The Node Connectivity filter is also available in Trim and Keep in the Build menu. In this example, the Node Connectivity filter is used in the Trim tool to remove all unconnected nodes.
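The connectivity-based selection and trimming shown in Figures 9 and 10 can be thought of as simple degree filters on a graph. The following is a minimal sketch of that idea, not QIAGEN IPA's implementation; the function names, the node names, and the example network are all hypothetical.

```python
def node_degrees(nodes, edges):
    """Return {node: number of distinct neighbors}, keeping isolated nodes at 0."""
    neighbors = {n: set() for n in nodes}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops when counting neighbors
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return {n: len(adj) for n, adj in neighbors.items()}


def select_more_connected_than(degrees, threshold):
    """Nodes connected to more than `threshold` other nodes (candidate hubs)."""
    return sorted(n for n, d in degrees.items() if d > threshold)


def trim_unconnected(degrees):
    """Nodes that remain after removing every node with no connections."""
    return sorted(n for n, d in degrees.items() if d > 0)


# Hypothetical network: one hub with seven neighbors plus one isolated node.
nodes = ["HUB"] + [f"N{i}" for i in range(1, 8)] + ["LONER"]
edges = [("HUB", f"N{i}") for i in range(1, 8)]

degrees = node_degrees(nodes, edges)
hubs = select_more_connected_than(degrees, 6)  # mirrors the ">6" filter in Figure 9
kept = trim_unconnected(degrees)               # mirrors the Trim example in Figure 10
```

Here the ">6" threshold is applied as a strict inequality, matching the wording of the Figure 9 caption.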
Separate up and down cutoffs must now be entered (rather than a single absolute value) for directional measurement types such as fold change or log ratio. This gives you more control over the makeup of the set of molecules that QIAGEN IPA analyzes from your dataset, as compared to using a single absolute cutoff. Figure 11 below shows an example of this.
Figure 11. Set separate up and down cutoffs for Core Analysis. When you set up a Core Analysis and use a cutoff for a directional measurement (one with both positive and negative values, like fold change or log ratio), you must now enter separate negative and positive cutoff values. In the example shown above, cutoffs of -1.5 and 3 are used for Expr Fold Change for down and up, respectively. This means that genes with expression fold changes >-1.5 and <3 will not be used in the analysis. Notice that the counts of “down genes” vs. “up genes” that survive the cutoffs are displayed next to the recalculate button and indicated in the image above with red arrows.
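The effect of separate up and down cutoffs can be illustrated with a short sketch. It assumes signed fold change values and uses the -1.5 and 3 cutoffs from the example above; the gene names and values are made up for illustration, and the function is hypothetical rather than part of QIAGEN IPA.

```python
def apply_directional_cutoffs(fold_changes, down_cutoff=-1.5, up_cutoff=3.0):
    """Split genes into those passing the down cutoff and those passing the up cutoff.

    Genes with fold changes strictly between the two cutoffs are dropped,
    as described in the Figure 11 caption.
    """
    down = {g: fc for g, fc in fold_changes.items() if fc <= down_cutoff}
    up = {g: fc for g, fc in fold_changes.items() if fc >= up_cutoff}
    return down, up


# Hypothetical signed fold changes.
example = {"GeneA": -2.0, "GeneB": -1.2, "GeneC": 2.5, "GeneD": 3.0, "GeneE": 4.1}

down_genes, up_genes = apply_directional_cutoffs(example)
print(f"down genes: {len(down_genes)}, up genes: {len(up_genes)}")
```

With a single absolute cutoff of 1.5, GeneC would also have been kept; the asymmetric cutoffs exclude it while still retaining GeneA.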
Analysis Match* automatically discovers other QIAGEN IPA Core Analyses with similar (or opposite) biological results as compared to yours, to help confirm your interpretation of the results or to provide unexpected insights into underlying shared biological mechanisms. It matches your analysis against other analyses you have created (in your Project Manager) as well as thousands of other human and mouse expression analyses curated from public sources. This “analysis-to-analysis” matching is based on shared patterns of Canonical Pathways, Upstream Regulators, Causal Networks, and Diseases and Functions.
With this new capability, you can:
The analyses included in Analysis Match were generated in QIAGEN IPA from more than 6,000 highly curated and quality-controlled human and mouse disease and oncology datasets re-processed from SRA, GEO, ArrayExpress, TCGA and more. These datasets were generated by QIAGEN’s recently acquired company, OmicSoft, and are the “comparisons” found in DiseaseLand and OncoLand representing various contrasts between disease and normal, treatment vs. non-treatment and much more.
Figure 1 shows the new Analysis Match tab from one of QIAGEN IPA’s Example Analyses based on the expression data derived from mouse lung exposed to welding fumes. The results in the figure have been filtered to show only the highest scoring results against all the analyses in the OmicSoft repository within QIAGEN IPA. Of the more than 6,000 in the repository, 125 analyses had an overall score of >60% or <-60%, corresponding to strongly similar or dissimilar patterns, respectively. You can further filter the results in a number of ways, for example by type of comparison, by disease state, tissue, and much more. The keyword filtering is possible because each analysis has been extensively annotated by OmicSoft using a controlled vocabulary which can be displayed in columns as shown in Figure 1. Only a few columns are shown in QIAGEN IPA by default due to screen space limitations.
The analyses are matched based on a set of signatures that are created for each analysis, namely one signature for the Canonical Pathways, one for Upstream Regulators, one for Causal Networks, and one for Diseases and Functions. Each signature is used independently to match against other analyses, and an overall average is computed.
Fig 1. Analysis Tab displaying matching analyses. By default, the analyses are ranked from most similar to least similar based on the overall similarity score (the right-most column shown above). The analyses are matched based on a set of signatures that are created for each analysis, namely for Canonical Pathways, Upstream Regulators, Causal Networks, and Diseases and Functions. Each signature is used independently to match against other analyses. In the image above, each of the first four colored columns at the right represents the percentage similarity of each type of signature to the analysis you opened. The fuchsia color indicates similarity (shown here) and cyan color indicates dissimilarity (not shown here). The first scoring column (“CP”) is the match for the Canonical Pathway signature, the second (“UR”) is for Upstream Regulators, the third (“CN”) is for Causal Networks and the last (“DE”) is for Downstream Effects (i.e. Diseases and Functions). The final column shown above is the average of those four signature matches. More detail on the signature scoring algorithm can be found here. Note that some of the columns normally shown by default in the Analysis Match tab have been hidden in this figure.
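The overall score described above can be sketched as a plain average of the four signature match percentages, followed by the >60% / <-60% filter used in the example. This is only an illustration: it assumes an unweighted arithmetic mean, and the analysis names and percentages are hypothetical. It does not reproduce the signature scoring algorithm itself.

```python
def overall_score(cp, ur, cn, de):
    """Average of the four signature match percentages (CP, UR, CN, DE)."""
    return (cp + ur + cn + de) / 4


def strong_matches(analyses, threshold=60):
    """Keep analyses whose overall score is above +threshold or below -threshold."""
    scored = [(name, overall_score(*scores)) for name, scores in analyses.items()]
    strong = [(n, s) for n, s in scored if s > threshold or s < -threshold]
    # Rank from most similar to most dissimilar.
    return sorted(strong, key=lambda item: item[1], reverse=True)


# Hypothetical signature percentages in the order (CP, UR, CN, DE).
analyses = {
    "analysis_A": (80, 70, 60, 50),
    "analysis_B": (40, 30, 20, 10),
    "analysis_C": (-70, -65, -60, -65),
}

ranked = strong_matches(analyses)
```

Negative percentages stand for dissimilar (anti-matching) signatures, so analysis_C is retained alongside analysis_A while the weakly matching analysis_B is filtered out.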
As shown in Figure 1, the analysis with the best overall match from the repository is an expression analysis from mouse lung exposed to heat killed influenza virus (from GSE41684), which has strong similarity across all 4 signature types. The next step is to explore the signatures themselves across all or a subset of matching analyses, to understand in more detail which “entities” (the set of upstream regulators, canonical pathways, etc.) drove the similarity scoring. In this example, the matching analyses were further filtered to limit to the repository folder called “MouseDisease” which retained 75 of the analyses, and a heatmap was created by clicking the View as Heatmap button. Figure 2 shows this heatmap, where the rows are the entities from the four signatures with columns for the 75 similar (and dissimilar) analyses. The z-score for each entity from each analysis is represented in the cells with an orange or blue color (for positive and negative z-score respectively).
Fig 2. The heatmap of the signatures vs. the matching analyses reveals similarities and differences. The “4 hr lung” analysis (highlighted in pink above) by definition has a significant z-score for every entity that is listed in the left column, because those entities represent the union of all 4 types of signatures derived for that analysis. The other selected analyses are shown for reference and may or may not have a significant z-score for each entity. The rows and columns were clustered using agglomerative clustering with Euclidean distance and average linkage (UPGMA linkage).
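For readers unfamiliar with the clustering named in the caption, the following is a small pure-Python sketch of agglomerative clustering with Euclidean distance and average (UPGMA) linkage. It is a generic textbook version written for clarity, not the code used to produce the heatmap, and the z-score vectors are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def upgma(vectors):
    """Agglomerative clustering with average linkage.

    Returns the merges in order as (members_a, members_b, distance), where
    members are indices into `vectors`.
    """
    clusters = {i: [i] for i in range(len(vectors))}
    next_id = len(vectors)
    merges = []
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                # Average linkage: mean of all pairwise distances between members.
                total = sum(
                    euclidean(vectors[p], vectors[q])
                    for p in clusters[a]
                    for q in clusters[b]
                )
                dist = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        dist, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), dist))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges


# Hypothetical z-scores: four entities (rows) measured across two analyses.
rows = [[2.0, 2.1], [2.0, 2.0], [-1.0, -1.2], [-1.1, -1.0]]
merges = upgma(rows)
```

In this toy input the two positive rows merge first, then the two negative rows, and the two resulting clusters are joined last. The same procedure applied to the transposed matrix clusters the columns.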
The heatmap is filterable to enable you to focus on the types of entities of interest to you. Figure 3 shows the heatmap filtered for upstream regulators which are classified as transcription regulators. The clustering of the rows reveals which transcription regulators have similar patterns across the analyses, whereas the clustering of the columns shows which analyses are most closely correlated to one another based on the underlying transcriptional regulator pattern.
Fig. 3. Analysis Match Heatmap filtered to show only upstream regulators which are classified as transcription regulators. The heatmap offers several filters to enable you to explore the nature of the signatures. Clicking on a column header for an analysis in the repository displays its metadata at the right side of the window as shown.
The clustering of the entities (the rows) can reveal interesting similarities among them. For example, after removing the prior filter in order to show all the entities, Figure 4 shows that the drug bexarotene clusters closely with the “PPAR/RXR activation” canonical pathway in a larger cluster containing CR1L, ALDH1A2, SUMO1, and ABCB4. Bexarotene is an RXRA and RXRB agonist, which provides a rationale for why it correlates tightly with this pathway in the heatmap. SUMO1 is a regulator of PPAR activity. It is less clear why the other entities appear in this cluster, an observation that could suggest interesting avenues of investigation.
Fig 4. Heatmap showing a cluster which contains both an upstream regulator and a canonical pathway. Tight clustering of entities may reveal correlations that may be of biological interest.
You can select and send entities (except Canonical Pathways) to a My Pathway for further analysis, for example to connect nodes together or to discover drugs that target them.
Another valuable way to use the OmicSoft analysis repository is to start by finding analyses of interest with QIAGEN IPA’s Dataset and Analysis Search, entering keywords such as a disease name or tissue. Figure 5 below shows a search for human asthma analyses that excludes those involving albuterol. From search results like these, you can double-click to open an analysis, or select up to 20 to visualize in a full comparison analysis.
Fig 5. Discovering analyses of interest using Dataset and Analysis Search. The query “human AND asthma NOT albuterol” finds 136 analyses with those keywords in the OmicSoft repository in QIAGEN IPA. Double-click to open one or create a Comparison Analysis with up to 20. Metadata about the selected analysis (or analyses) is displayed on the right side of the search screen.
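The logic of a query such as “human AND asthma NOT albuterol” can be sketched as a keyword filter over analysis metadata. This is a simplified illustration under the assumption that each analysis carries a flat list of keywords; the record structure, identifiers, and function name are hypothetical and do not describe how QIAGEN IPA's search is implemented.

```python
def matches_query(keywords, required, excluded):
    """True if every required term is present and no excluded term is present."""
    terms = {k.lower() for k in keywords}
    has_all = all(r.lower() in terms for r in required)
    has_excluded = any(e.lower() in terms for e in excluded)
    return has_all and not has_excluded


# Hypothetical analysis records with controlled-vocabulary keywords.
records = [
    {"id": "analysis_1", "keywords": ["human", "asthma", "lung"]},
    {"id": "analysis_2", "keywords": ["human", "asthma", "albuterol"]},
    {"id": "analysis_3", "keywords": ["mouse", "asthma"]},
]

hits = [
    r["id"]
    for r in records
    if matches_query(r["keywords"], required=["human", "asthma"], excluded=["albuterol"])
]
```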
The repository of datasets and analyses is stored in QIAGEN IPA’s Libraries folder in the Project Manager as shown in Figure 6. Note that these are read-only and cannot be exported out of QIAGEN IPA.
Fig 6. OmicSoft repository in QIAGEN IPA with over 6,000 datasets and corresponding analyses. The repository is read-only and cannot be exported out of QIAGEN IPA.
Analysis Match combines literature-powered causal analytics from QIAGEN IPA with a massive dataset collection provided by OmicSoft, creating a unique opportunity for you to make biological discoveries.
Changes in the phosphorylation states of proteins provide an important regulatory mechanism in mammalian cells. Now you can get more from your phosphoproteomics datasets in QIAGEN IPA with a new Phosphorylation Core Analysis*.
Discover upstream regulators and causal network master regulators that may be driving the changes in phosphorylation levels of the proteins in your phosphoproteomics dataset. These results provide testable hypotheses by identifying potential upstream signaling cascades from the phosphorylation patterns in your dataset.
To illustrate this new feature, we analyzed a phosphoproteomics experiment obtained from the literature, in which insulin was applied to starved mouse adipocytes that had been differentiated from 3T3-L1 cells in vitro (PMC3690479). Phosphorylated proteins were isolated from the cells by the authors during a time course of 15 seconds to 1 hour.
As shown below in Figure 1, after 15 seconds of insulin exposure a characteristic phosphorylation pattern is established in these adipocytes: QIAGEN IPA predicts insulin (gene symbol Ins1 below) to be one of the top activated upstream regulators.
Fig 1. Upstream Regulator Analysis. The pattern of differentially phosphorylated proteins in the dataset of insulin-treated cells was used to predict the responsible upstream molecules.
Fig 2. The Ins1 Upstream Regulator network in the 15 second time point. Insulin is a top upstream regulator predicted to be “activated” based on the pattern of phosphorylation of insulin targets in adipocytes treated with insulin for 15 seconds. Proteins with red fill color have increased phosphorylation relative to the untreated control, and the green nodes have relative decreases in phosphorylation. Clicking on the badge next to each protein displays the differentially phosphorylated peptides that were uploaded in the dataset (as shown for the insulin receptor, INSR).
Figure 2 indicates there is a positive phosphorylation relationship (orange line) between Ins1 and GAB1. This is supported by a paper that showed that in differentiated 3T3-L1 cells, insulin can increase the phosphorylation of GAB1. For the relationship between Ins1 and STAT3, a different paper showed that insulin can increase the phosphorylation of Stat3 in RAW 264.7 cells (see Figure 3 below).
Fig 3. Examples of phosphorylation findings curated from the literature in the QIAGEN Knowledge Base. Both indicate that insulin can increase a target protein’s phosphorylation (indirectly through unspecified mediators).
Causal Network Analysis predicts regulatory networks to explain phosphorylation changes exhibited in a dataset. Causal Network Analysis enables the discovery of novel regulatory mechanisms by expanding upstream analysis to include regulators that do not yet have known “direct” connections to the targets in your dataset.
For example, stimulating adipocytes with insulin is predicted to activate the master regulator FLT1 (also known as the vascular endothelial growth factor receptor 1) after 15 seconds of exposure. In this causal hypothesis, FLT1 is predicted to drive the activity of nine other regulators which in turn drive changes in the phosphorylation of a larger number of dataset proteins as shown below in Figure 4.
Fig 4. Causal Network Analysis. FLT1 is predicted to activate or inhibit several intermediate regulators leading to the changes in phosphorylation in dataset proteins.
If you’re an existing customer, launch QIAGEN IPA from your desktop and check out the new features. If you need to install QIAGEN IPA, click here.
Changes in the phosphorylation states of proteins are an important regulatory mechanism in cells. Now you can get more from your phosphoproteomics datasets in QIAGEN IPA with improvements to phosphorylation data upload and visualization.
Last September the QIAGEN IPA Fall Release added a new data type to QIAGEN IPA to support the upload of protein or gene IDs along with corresponding phosphorylation increases or decreases represented as fold change (or log ratio). With this December release you can now upload the corresponding individual phospho sites for display on networks and pathways. These can be represented with any text you wish, such as the actual phosphorylated peptide (e.g. _FSSS(ph)QPEPR_ as shown in Figure 1 below), just a residue number (e.g. Y347), or any combination of text and numbers.
1) Visualize multiple differentially phosphorylated sites (phospho peptides) on networks and pathways.
Fig 1. Display multiple phospho sites from an uploaded “phospho” dataset. Top image: The small badge at the top right of the node indicates how many phospho sites are in the dataset or that passed your cutoffs in an analysis (depending on whether a dataset or analysis is overlaid). In this example, two phospho peptides for Chk1 passed the analysis cutoff for Phospho Fold Change. Clicking the badge shows the differential phosphorylation as a heat map alongside the phosphorylated peptide sites (if uploaded in the dataset). Bottom image: Example of phosphorylation sites uploaded in the dataset (right column).
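The badge count described in the caption can be pictured as grouping phospho sites by protein and counting those that pass the cutoffs. The sketch below assumes dataset rows of the form (protein, site label, signed phospho fold change); the protein names, fold changes, and cutoffs are hypothetical, and the site labels simply reuse the free-text examples mentioned above.

```python
from collections import defaultdict


def phospho_badges(rows, down_cutoff, up_cutoff):
    """Group the phospho sites that pass the cutoffs by protein.

    The number of sites kept for a protein corresponds to its badge count.
    """
    badges = defaultdict(list)
    for protein, site, fold_change in rows:
        if fold_change <= down_cutoff or fold_change >= up_cutoff:
            badges[protein].append((site, fold_change))
    return dict(badges)


# Hypothetical rows; site labels can be any text (peptide, residue number, etc.).
rows = [
    ("ProteinA", "_FSSS(ph)QPEPR_", 2.4),
    ("ProteinA", "Y347", -3.1),
    ("ProteinA", "S12", 1.1),
    ("ProteinB", "T55", 1.3),
]

badges = phospho_badges(rows, down_cutoff=-2.0, up_cutoff=2.0)
badge_counts = {protein: len(sites) for protein, sites in badges.items()}
```

Proteins with no site passing the cutoffs, such as ProteinB here, simply get no badge in this sketch.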
2) Easily identify the proteins on networks and pathways where QIAGEN IPA predicts that increases in phosphorylation inhibit their activity or where decreases in phosphorylation increase their activity. The activity of certain proteins is more likely to be inhibited by phosphorylation than activated by it. In the example below, the Molecule Activity Predictor (MAP), with overlaid phospho data, uses blue or orange halos to indicate the predicted activity.
Fig 2. MAP (Molecule Activity Predictor) now uses colored halos around nodes on networks and pathways to indicate the activity for proteins which are inhibited by phosphorylation. Phosphorylation fold change data has been overlaid on CFL1 and GSK3B. CFL1 has increased phosphorylation in this dataset and MAP indicates that its activity is inhibited with the blue halo. GSK3B has decreased phosphorylation in the dataset and MAP indicates that it is likely activated using the orange halo. The full list of proteins where phosphorylation is expected to be inhibitory is available here in the QIAGEN IPA help portal.
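The halo rule described in the caption can be written down as a small lookup. This sketch covers only proteins whose activity is inhibited by phosphorylation and assumes a signed phospho fold change (positive means increased phosphorylation). The set below contains just the two proteins from the example, not the full list from the help portal, and the function name is hypothetical.

```python
# Only the two example proteins; the complete list lives in the help portal.
INHIBITED_BY_PHOSPHORYLATION = {"CFL1", "GSK3B"}


def predicted_halo(protein, phospho_fold_change):
    """Return the predicted activity halo for a phosphorylation-inhibited protein.

    Returns None for proteins outside the set or when there is no change.
    """
    if protein not in INHIBITED_BY_PHOSPHORYLATION or phospho_fold_change == 0:
        return None
    if phospho_fold_change > 0:
        return "blue (activity inhibited)"
    return "orange (activity likely increased)"


# Hypothetical signed fold changes matching the directions shown in the figure.
cfl1_halo = predicted_halo("CFL1", 2.5)     # increased phosphorylation
gsk3b_halo = predicted_halo("GSK3B", -1.8)  # decreased phosphorylation
```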
Get more from your phosphoproteomics datasets in QIAGEN IPA. If you’re an existing customer, launch QIAGEN IPA from your desktop and check out the new features. If you need to install QIAGEN IPA, click here.
RNA sequencing technologies can generate datasets with thousands of differentially spliced transcripts. IsoProfiler helps you determine which isoforms have interesting biological properties relevant to your research project.
Fig 1. Overview of IsoProfiler, with highlights indicating the new features. IsoProfiler can visualize one or more transcript-level RNA sequencing datasets in a single view and enables you to filter and sort to focus on isoforms that have biologically relevant attributes. The top right table shows each gene in your dataset with its associated transcripts and expression data. When a gene is selected, the bottom right table shows the specific isoform-level details for that gene. 1) A new column displaying diseases and functions known to be associated at the gene level (as well as at the isoform level) has been added to the top table. This may help you identify the specific isoforms in your experiment that drive the known gene-level associations. 2) New filters have been added to search for specific gene names or specific disease and function terms that are pertinent to your dataset(s). See Figure 2 for additional details. 3) New dynamically re-sizable schematics of the isoforms are now displayed in the lower table for the selected gene, enabling you to see the overall splicing pattern of each transcript.
Fig 2. Gene-level Disease or Function filter in IsoProfiler. Simply start typing in the text box to focus the list down to relevant filters. In this example, “epith” has been typed, which instantly limits the list of filters to terms like “chemotaxis of epithelial cells”, etc. The same type of filter is now also provided for isoform-level diseases and functions.
IsoProfiler is available in QIAGEN IPA with Advanced Analytics.
Enhance your multi-omics research approaches by uploading simplified phosphoproteomics datasets to QIAGEN IPA for overlay onto networks and pathways. In a first step to better support the understanding of phosphorylation state and the associated biology, a new “phospho” measurement type is being introduced with this release of QIAGEN IPA. Overlay phosphorylation and expression profiles on networks and pathways to identify key areas where phosphorylation is impacting the biological activity of the encoded proteins.
If you have performed both gene expression and phosphoproteomics profiling, you can visualize both of these data types simultaneously as bar charts on networks and pathways. Figure 3 below shows the upstream regulator MAPK1 which QIAGEN IPA predicted to be activated by alpha-toxin (hemolysin) treatment of S9 cells. This prediction was based on a Core Analysis of the gene expression data after exposure to the toxin. The expression data shows that MAPK1 is not itself differentially expressed, but overlaying the accompanying phosphoproteomics dataset on the MAPK1 network provides a possible mechanism for its activation—MAPK1’s phosphorylation level is increased which is likely to activate it and lead to the observed expression changes downstream. In Figure 3, you can see in contrast that JUN is both upregulated and exhibits higher protein phosphorylation after the treatment.
Fig 3. Upstream Regulator Network for MAPK1 with expression and phosphorylation data overlaid. MAPK1 is differentially phosphorylated, which may explain its predicted activation as a regulator of the expression of the genes connected to it in the network. In contrast, JUN is both phosphorylated and differentially expressed. The microarray and phosphoproteomics data used in this figure was obtained from http://dx.doi.org/10.1371/journal.pone.012208
RNA sequencing technologies can generate datasets with thousands of differentially spliced transcripts. IsoProfiler helps you determine which isoforms have interesting biological properties relevant to your research project.
Fig 1. Overview of IsoProfiler. Visualize one or more transcript-level RNA sequencing datasets; filter and sort to focus on isoforms that have biologically relevant attributes. The top table shows each gene and its associated transcripts, while the bottom table shows isoform-level details for one gene at a time (based on the row you select in the top table). Click on the plus (+) sign in the left filter panel to display filter options that can be added. In the example shown above, the dataset is filtered for isoforms with fold change less than -2 or greater than +2, and only shows genes where isoforms are both up and down regulated in the dataset. Transcripts are represented as circles in the Expression Patterns column in the top table, with green circles indicating downregulated transcripts and pink or red circles indicating upregulated transcripts. The size of the circles represents the abundance of expression (for example RPKM) if you have included at least one such column in your dataset; larger circles indicate more abundant transcripts.
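The filter combination used in the example (isoforms with fold change below -2 or above +2, restricted to genes that have isoforms in both directions) can be sketched as follows. The row layout, gene and transcript names, and values are hypothetical, and this is an illustration of the logic rather than IsoProfiler's own code.

```python
def genes_with_opposing_isoforms(isoforms, cutoff=2.0):
    """Genes that have at least one isoform above +cutoff and one below -cutoff."""
    up, down = set(), set()
    for gene, _transcript, fold_change in isoforms:
        if fold_change > cutoff:
            up.add(gene)
        elif fold_change < -cutoff:
            down.add(gene)
    return sorted(up & down)


# Hypothetical transcript-level rows: (gene, transcript, signed fold change).
isoforms = [
    ("GENE1", "GENE1-T1", 3.5),
    ("GENE1", "GENE1-T2", -2.4),
    ("GENE2", "GENE2-T1", 5.0),
    ("GENE2", "GENE2-T2", 1.2),
    ("GENE3", "GENE3-T1", -3.0),
]

opposing = genes_with_opposing_isoforms(isoforms)
```

Only GENE1 is retained here: GENE2 has one isoform that passes the cutoff but no downregulated isoform, and GENE3 has no upregulated isoform.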
Fig 2. Compare up to 20 transcript-level datasets in IsoProfiler. In this example, human endometrioid endometrial carcinoma (EEC) and hepatocellular carcinoma (HCC) RNA-seq datasets are compared. The results are shown after using IsoProfiler to set expression value cutoffs, filter for protein-coding isoforms, and keep only those genes with isoforms in the dataset that have known disease and function associations.
Drill down into the “IsoProfiler Findings” view to explore the details about the isoforms that have disease or biological function findings captured from the literature. This is done by selecting rows (or all rows) in the top table and clicking the IsoProfiler Findings button at the top of the table. This will open a special window as shown in Figure 3. Only isoforms with disease or function associations will appear in this window. This table enables filtering on findings-level details using the funnels, or filters, above each column.
Fig 3. Explore the details of isoform-level disease and function associations. Filter and explore the findings that connect isoforms to disease and functions.
IsoProfiler is part of Advanced Analytics.
Identify significant trends in genes involved in a pathway or network across conditions such as time or dose and elucidate possible mechanisms driving gene expression results with both variant gain or loss of function and expression results. Visualize multiple ‘omics datasets simultaneously on QIAGEN IPA networks and pathways.
Fig 1. Three RNA-seq time points taken during in vitro mouse cardiomyocyte development overlaid on the Integrin Signaling Pathway (zoomed in).
As the cells differentiate from embryonic stem cells into beating cardiomyocytes in vitro, a number of genes on this pathway are progressively upregulated. Several genes in the myosin subunit regulatory light chain family are upregulated over the time course. The new bar charts can show multiple measurements and datasets at one time to give you more insight into the details of the differential expression. In this example both the RNA-seq fold change and the intensity (RPKM) across the three analyses are shown. From this visualization, one can deduce that Myl7 becomes much more highly expressed than Myl2 (RPKM ~3800 vs ~115), even though Myl7 has a lower fold change than Myl2 (~955 vs. ~19,149). The fold changes alone don’t reveal this level of detail across the time points.
QIAGEN IPA also presents the multi-dataset / multi-measurement results in a table view that can be exported. Figure 2 shows an example of a portion of that table.
Fig 2. Clearly identify trends across genes, conditions, and datasets with the exportable table view.
The same genes shown in Figure 1 above are shown here in the new table view within the Overlay Datasets, Analyses & Lists tool, though in this table a line is drawn to connect the bars when possible to help visualize patterns.
Elucidate possible mechanisms driving gene expression results by simultaneously overlaying both gene expression analysis and variant loss/gain datasets on a pathway or network. In this way you can see which genes are differentially expressed and harbor potentially deleterious variants.
Fig 3. Uncover possible mechanisms driving gene expression results. RNA-seq gene expression data from three hepatocellular carcinoma (HCC) patients was used to predict that the NONO protein is inhibited. Expression from the three patients was processed in Biomedical Genomics Workbench (BxWB) and then analyzed in QIAGEN IPA, which led to the prediction of NONO inhibition using Causal Network Analysis. Variants were also called on the transcript sequences from these patients using BxWB and analyzed using Ingenuity Variant Analysis. All three patients were found to have potentially deleterious frameshift and missense variants in the NONO gene. Data from both BxWB and Variant Analysis were exported directly to QIAGEN IPA. The three green bars in Figure 3 correspond to predicted loss of function variants for each of the patients, and the red bar indicates that the expression was upregulated in the patients, perhaps as a compensatory mechanism for loss of function. NONO has been found to be mutated in a number of cancer types.
Quickly see which diseases, functions, and phenotypes are associated with differentially expressed isoforms in your RNA-seq experiment using QIAGEN IPA’s new IsoProfiler (BETA). Get early access to IsoProfiler as part of Advanced Analytics.
Simply filter to determine if certain isoforms (splice variants and their products) are known to drive a disease or process. For example, Figure 1 shows isoforms driving metastatic processes in a human breast cancer RNA-seq dataset.
Fig 1. IsoProfiler results. The table displays all the isoforms that have a curated relationship to a biological function, phenotype, or disease. In this example, the table has been filtered to display the isoforms known to be involved in metastasis. This isoform of ADAM12 is upregulated in the dataset, providing an avenue of experimental inquiry – perhaps this short form is responsible for the aggressiveness of these breast cancer cells.
Fig 2. ADAM12 isoform view shows that a shorter isoform, ADAM12S, is upregulated in the breast cancer cells, with a fold change of 66.3.
Import genetic gain/loss information for a set of genes and predict the variant effect on diseases, functions, phenotypes and canonical pathways. QIAGEN IPA now supports a new data type for gain or loss of function variants that result from genome or transcriptome sequencing data.
Overlay Gain or Loss of function variant values onto genes on networks and pathways to display their effects on genes and use MAP (Molecule Activity Predictor) to compute the impact on neighboring connected genes.
Fig 3. Gain or Loss of function variants (green-colored nodes indicating loss of function variant) in genes on the ERK5 Signaling Pathway could lead to increased cell survival and decreased gene expression in this endometrioid endometrial carcinoma analysis.
Combining Gain or Loss of Function variant data with expression data unlocks the ability to investigate whether upstream regulator predictions based on expression data may in fact derive from variants that activate or inactivate the regulator itself.
Using Upstream Regulator Analysis, if there are cases where an upstream molecule has been predicted to be activated or inhibited, you can quickly discover if the gene for that regulator has a corresponding gain or loss of function variant.
Fig 4. Upstream regulator analysis of an endometrioid endometrial cancer patient (tumor vs. normal adjacent tissue). The result shows that the NFKBIA protein is predicted to be an inhibited upstream regulator AND has a likely loss of function (see red box above), which corresponds with and may explain the predicted loss of its activity as an upstream regulator.
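The check described above, whether a predicted upstream regulator also carries a variant that points in the same direction, can be sketched as a simple join between two tables. The sketch assumes that a loss of function variant is concordant with predicted inhibition and a gain of function variant with predicted activation; the data structures and function name are hypothetical, and the values other than NFKBIA are made up.

```python
# Assumed pairing of predicted activity state with the variant type that could explain it.
CONCORDANT_VARIANT = {"activated": "gain", "inhibited": "loss"}


def concordant_regulators(predictions, variants):
    """Regulators whose predicted state matches the direction of their own variant.

    predictions: {gene: "activated" | "inhibited"}
    variants:    {gene: "gain" | "loss"}
    """
    return sorted(
        gene
        for gene, state in predictions.items()
        if gene in variants and variants[gene] == CONCORDANT_VARIANT.get(state)
    )


# NFKBIA mirrors the Figure 4 example; the other entries are hypothetical.
predictions = {"NFKBIA": "inhibited", "REG_X": "activated", "REG_Y": "inhibited"}
variants = {"NFKBIA": "loss", "REG_X": "loss"}

explained = concordant_regulators(predictions, variants)
```

A regulator such as REG_X, predicted to be activated but carrying a loss of function variant, is not reported because the variant does not explain the prediction.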