Sample to Insight
Maximizing the Biological Interpretation of Gene, Transcript & Protein Expression Data with IPA (Ingenuity Pathway Analysis)
Qian Dong, Ph.D.
Field Application Scientist
To install the IPA client:
1. Go to our website: http://www.ingenuity.com/products/ipa
2. Click on ‘LOGIN’
3. Click on ‘NEW: INSTALL IPA CLIENT’ under Ingenuity Pathway Analysis
What is Ingenuity Pathway Analysis [IPA]?
[Diagram: Sample to Insight workflow]
Sample Prep → Assay → Data → Statistics on raw data → Variants/Genes/Proteins of Interest → Biological interpretation → Hypothesis generation
Diagram labels: Pathway Analysis; Upstream Analysis; ‘Primary’, ‘Secondary’, ‘Tertiary’
QIAGEN Sample to Insight
[Diagram: Sample to Insight workflow]
Sample Prep → Assay → Data → Sequence-Level Statistics → Biology of Interest (Genes, Variants, etc.) → Annotation & Comparative (Statistical) Analysis → Annotation & Biological Interpretation
Diagram labels: Upstream Analysis; ‘Primary’, ‘Secondary’, ‘Tertiary’
Sample to Insight: Secondary, Tertiary analysis
Data Analysis
Interpretation
RNA-seq
IPA gives insight to data
[Diagram: ‘Omics data → IPA analysis → Pathways, Functions, Regulators, Mechanisms]
• Given a large set of genes/proteins that are activated/inhibited/affected:
• Which physiological processes are being affected?
• What specific pathways are likely being perturbed?
• What upstream regulators are involved?
IPA gives insight to data
• Given a gene/protein/compound:
• What other molecules does it interact with?
• What side effects/processes is it associated with?
• What compounds affect its activity?
• Given a disease/process of interest:
• Which genes/proteins/metabolites are good biomarker candidates?
• Which are promising treatment targets?
The Ingenuity Knowledge Base
The Ingenuity Ontology
Ingenuity Findings
Ingenuity® Expert Findings – Manually curated Findings that are reviewed, from the full text, rich with contextual details, and derived from top journals.
Ingenuity® ExpertAssist Findings – Automated text Findings that are reviewed, from abstracts, timely, and cover a broad range of publications.
Ingenuity Modeled Knowledge
Ingenuity® Expert Knowledge – Content we model, such as pathways, toxicity lists, etc.
Ingenuity® Supported Third Party Information – Content areas include Protein-Protein, miRNA, biomarker, clinical trial information, and others.
Directional Finding (Example)
[Figure: an annotated finding. Labeled fields: gene, zygosity, effect on disease, species, evidence, mutation type, disease]
Inferred: activity of the molecule in this finding (decreased)
These findings power our analytics as well as our pathway-building functionality.
Peer-reviewed publications citing QIAGEN’s Ingenuity products
14,311 publications and growing!
[Bar chart: cumulative publication count by year]
2003: 1
2004: 18
2005: 104
2006: 297
2007: 686
2008: 1,306
2009: 2,306
2010: 3,786
2011: 5,608
2012: 7,704
2013: 9,972
2014: 12,513
Thru Jul 2015: 14,311
Training Agenda
SEARCHING THE KNOWLEDGEBASE:
– BIOPROFILER
– ISOPROFILER
– SEARCH & EXPLORE
DATASET ANALYSIS:
– CORE ANALYSIS
– COMPARISON ANALYSIS
Resources
Tutorials:
http://ingenuity.force.com/ipa/IPATutorials
Help and support:
http://ingenuity.force.com/ipa/IPA2SupportPage
Whitepapers:
http://www.ingenuity.com/products/ipa#/?tab=resources
Training Webinars:
http://www.ingenuity.com/science/training
Searching the knowledge base
BioProfiler: Quickly profile a disease or phenotype by understanding its associated genes and compounds
IsoProfiler: Quickly profile isoforms in datasets
Search and explore: build your own pathway with the knowledge base
BioProfiler: Quickly profile a disease, phenotype, or function
Get access to Ingenuity’s Knowledge Base: use the KB to identify a list of compelling genes/proteins/compounds that have demonstrated some specified activity.
• Filter down to genes known to be causally associated with Alzheimer’s
• Which genes, when decreased in activity, increase liver cholestasis?
• What types of genetic evidence support this?
BioProfiler: Examples
Targets of toxicity:
Which genes when [decreased] in activity [increase] [liver cholestasis]? What types of [genetic] evidence support this?
Target discovery:
What [heterozygous knockouts] in [mouse] can [decrease] [asthma]?
Which drugs or which targets have been in late stage clinical trials or approved to decrease [diabetes]?
Biomarker research:
Which genes are potential [diagnosis OR prognosis] biomarkers of [breast cancer] and are [upregulated] in breast cancer?
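The bracketed terms in these example queries behave like filter values applied to curated findings. A minimal Python sketch of that idea follows. It assumes a hypothetical in-memory list of finding records: the `Finding` fields, the `profile` helper, and the placeholder molecules (`GENE_A` and so on) are illustrative only and do not represent the Knowledge Base schema, the BioProfiler interface, or real findings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical finding record; field names are assumptions."""
    molecule: str            # placeholder identifier, not a real gene
    molecule_activity: str   # "increased" or "decreased"
    effect: str              # "increases" or "decreases" the phenotype
    phenotype: str
    evidence: str            # e.g. "genetic" or "pharmacological"


# Placeholder records for illustration only.
FINDINGS = [
    Finding("GENE_A", "decreased", "increases", "liver cholestasis", "genetic"),
    Finding("GENE_B", "increased", "increases", "liver cholestasis", "genetic"),
    Finding("GENE_C", "decreased", "decreases", "asthma", "genetic"),
    Finding("GENE_D", "decreased", "increases", "liver cholestasis", "pharmacological"),
]


def profile(findings, *, phenotype, molecule_activity, effect, evidence=None):
    """Return the sorted molecules whose findings match every filter given."""
    hits = {
        f.molecule
        for f in findings
        if f.phenotype == phenotype
        and f.molecule_activity == molecule_activity
        and f.effect == effect
        and (evidence is None or f.evidence == evidence)
    }
    return sorted(hits)


# "Which genes when [decreased] in activity [increase] [liver cholestasis]?"
# restricted to [genetic] evidence.
print(profile(
    FINDINGS,
    phenotype="liver cholestasis",
    molecule_activity="decreased",
    effect="increases",
    evidence="genetic",
))
```

Dropping the `evidence` argument widens the result to every evidence type, which mirrors relaxing one bracketed term in a query.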
BioProfiler
Identify possible Drug Targets for Breast Cancer:
Which proteins/genes, when decreased in activity, are shown to decrease the breast cancer phenotype in mice?
Searching the knowledge base
BioProfiler: Quickly profile a disease or phenotype by understanding its associated genes and compounds
IsoProfiler: Quickly profile biological functions of isoforms in datasets
Search and explore: build your own pathway with the knowledge base
IsoProfiler
Filter to determine whether certain isoforms (splice variants and their products) are known to drive a disease or process
Search & Explore
Building networks based solely on current literature
Organize existing data: Visualize what is currently known in the literature and databases.
‘What is known to be affected by the SNAI1 gene?’
‘Is SNAI1 associated with breast cancer?’
Maximize biological interpretation: Once you have obtained a subset of genes from Core Analysis, further explore the biological network. (This will be demonstrated in the afternoon.)
‘What pathways are potentially regulated by the predicted upstream regulator TGFB1?’
‘What drugs target the ILK signaling pathway, which is perturbed according to my RNA-seq data?’
Case study: breast cancer
[Figure labels: Luminal breast cancer; HER2-enriched; Basal; Mesenchymal / stem cell-like breast cancer; luminal cell lines; claudin-low cell lines; Epithelial to Mesenchymal Transition]
SNAI1 is overexpressed in claudin-low cell lines
Questions:
Given that SNAI1 is overexpressed in claudin-low cell lines:
• What genes/proteins are known to be affected by SNAI1 activity?
• Are these SNAI1-affected molecules involved in a common
biological function/process?
• If so, what would be the ultimate impact of SNAI1
activation/suppression on that process?
SNAI1 downstream effect modeling
Search and Explore
[Figure: modeled effect on EMT when activating expression of SNAI1]
Searching the knowledge base
Search and explore: build your own pathway with the knowledge base
‘What is known to be affected by the SNAI1 gene?’
BioProfiler: Quickly profile a disease or phenotype by understanding its associated genes and compounds
‘Which kinases, when decreased in activity, are shown to decrease the breast cancer phenotype in mice?’
IsoProfiler: Quickly profile isoforms in datasets
Core Analysis
Case study: breast cancer
[Figure labels: Luminal breast cancer; HER2-enriched; Basal; Mesenchymal / stem cell-like breast cancer; luminal cell lines; claudin-low cell lines; Epithelial to Mesenchymal Transition]
Dataset: ratio of claudin-low to luminal, 5 vs 5 cell lines, RNA-Seq data
Given the large dataset, you may want to ask these questions:
How are claudin-low and luminal cell lines different regarding genes and pathways?
What known biological pathways appear most significantly affected by the genes in my dataset?
What genes within a pathway are changing in expression, and what effect might that change have on the pathway?
Can I identify a drug or drug target?
What are the downstream effects of EMT? Can I learn more about the biology of claudin-low cell lines? What cellular functions are affected?
Can I create hypotheses that explain what may be occurring upstream to cause particular phenotypic or functional outcomes downstream?
How are the genes in my dataset connected to each other? Can I visualize the gene network?
IPA core analysis: data-derived networks
Canonical Pathway Analysis
Predicts pathways that are changing based on your dataset
Predicts directional effects on pathway molecules not in the dataset (MAP overlay tool)
Core Analysis components: Regulator Effects, Diseases and Functions, Canonical Pathways, Upstream Regulator Analysis, Networks
Upstream Regulator Analysis
Predicts activated/inhibited regulators responsible for observed data
Predicts master regulators (causal network)
Diseases and Functions Analysis
Predicts the directional biological effects (cellular processes, biological function) of the gene/protein set
– “Increase in cell cycle”
– “Decrease in apoptosis”
Regulator Effects
Identifies specific hypotheses: upstream regulator pathways leading to a downstream phenotype.
Networks
Identifies gene networks within the dataset.
Data upload
Three file formats:
• Excel spreadsheet (single sheet only)
• Tab-delimited text file
• Cuffdiff file
One ID column and a header row
Multiple observations or a single observation
[Screenshot: Dataset with Multiple Observations]
Supported identifiers for data upload
Supported identifier types, by category:
Vendor IDs: Affymetrix, Agilent, ABI, Codelink, Illumina, Ingenuity
Gene: Entrez Gene (LocusLink)*, GenBank, Gene Symbol-human (HUGO/HGNC, EG), Gene Symbol-mouse (EG), Gene Symbol-rat (EG), GI Number, UniGene
Protein: GenPept, International Protein Index (IPI), UniProt/Swiss-Prot Accession
RNA-Seq: Ensembl, RefSeq, UCSC (hg18), UCSC (hg19)
MicroRNA: miRBase (mature), miRBase (stemloop)
SNP: Affy SNP IDs, dbSNP
Chemical: CAS Registry Number, HMDB, KEGG, PubChem CID
Data Upload Format Examples
Typical value types that are uploaded to IPA:
Identifier list
+ differential expression
+ significance statistic
+ RPKM (maximum RPKM between experimental condition and control recommended for RNA-seq)
+ variant gain/loss
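The value types listed above can be assembled into a tab-delimited upload file with one ID column and a header row. The Python sketch below is one way to do that, under stated assumptions: the column names, the `build_upload_rows` and `to_tab_delimited` helpers, the pseudocount of 1.0, and the placeholder records are all hypothetical and are not prescribed by IPA.

```python
import csv
import io
import math

# Hypothetical column names; they are not required by IPA.
COLUMNS = ["ID", "log2_ratio", "p_value", "max_rpkm"]
# Assumption: a pseudocount keeps the ratio defined for unexpressed genes.
PSEUDOCOUNT = 1.0


def build_upload_rows(records):
    """Derive differential expression and maximum RPKM for each record."""
    rows = []
    for rec in records:
        case, control = rec["rpkm_case"], rec["rpkm_control"]
        rows.append({
            "ID": rec["id"],
            "log2_ratio": math.log2((case + PSEUDOCOUNT) / (control + PSEUDOCOUNT)),
            "p_value": rec["p_value"],
            # Maximum RPKM between experimental condition and control.
            "max_rpkm": max(case, control),
        })
    return rows


def to_tab_delimited(rows):
    """Serialize rows as tab-delimited text with a single header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder identifiers and values, for illustration only.
records = [
    {"id": "GENE_1", "rpkm_case": 7.0, "rpkm_control": 1.0, "p_value": 0.001},
    {"id": "GENE_2", "rpkm_case": 0.0, "rpkm_control": 3.0, "p_value": 0.04},
]
rows = build_upload_rows(records)
print(to_tab_delimited(rows))
```

Keeping the maximum RPKM as its own column makes it possible to filter out genes that are barely expressed in both conditions before interpreting their ratios.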
[The next slide repeats the list of value types with a ‘Required’ callout.]
IPA Upstream Regulator Analysis
Every possible TF & Upstream Regulator in the Ingenuity Knowledge Base is analyzed
[Figure: a TF/UR and eight downstream genes, annotated with three rows]
Differential gene expression (uploaded data): ↑ or ↓ for each gene
Literature-based effect the TF/UR has on downstream genes: + or - for each gene
Predicted activation state of TF/UR for each gene: 1 = consistent with activation of UR; -1 = consistent with inhibition of UR
In the example, seven genes score 1 and one gene scores -1:
z-score = (7-1)/√8 = 2.12 (= predicted activation)
• z-score is a statistical measure of the match between expected relationship direction and observed gene expression
• z-score > 2 or < -2 is considered significant
Note that the actual z-score is weighted by the underlying findings, the relationship bias, and dataset bias
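The arithmetic in this example can be reproduced with a short Python sketch. It implements only the simplified, unweighted form shown on the slide, (7-1)/√8; the weighting by underlying findings, relationship bias, and dataset bias mentioned in the note is not modeled. The function name and the particular per-gene signs are illustrative assumptions chosen to give seven consistent genes and one inconsistent gene.

```python
import math


def activation_z_score(literature_effects, observed_directions):
    """Unweighted activation z-score for one upstream regulator.

    literature_effects: +1 if the regulator is reported to increase the
        target, -1 if it is reported to decrease it.
    observed_directions: +1 if the target is up in the dataset, -1 if down.
    """
    if len(literature_effects) != len(observed_directions):
        raise ValueError("inputs must have the same length")
    if not literature_effects:
        raise ValueError("at least one target is required")
    # +1 when the observation matches an activated regulator, -1 otherwise.
    states = [e * d for e, d in zip(literature_effects, observed_directions)]
    return sum(states) / math.sqrt(len(states)), states


# Illustrative signs for eight targets (assumed, not read off the slide).
effects = [+1, +1, +1, -1, +1, +1, -1, +1]
observed = [+1, -1, +1, -1, +1, +1, -1, +1]

z, states = activation_z_score(effects, observed)
print(states)
print(round(z, 2))
```

With the threshold from the slide, a value above 2 would be read as predicted activation and a value below -2 as predicted inhibition.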
Causal Network Analysis
Advanced Analytics
Alternate method of predicting upstream regulators, based on causal relationships and allowing multiple interaction steps to gene expression changes
Identify potential novel master regulators of your gene expression by creating pathways of literature-based relationships
Expands predictions to include indirect upstream regulators not in mechanistic networks
[Figure panel A, Upstream Regulators: Upstream Regulator; Targets in the dataset]
[Figure panel B, Causal Networks: Master Regulator; TF; Targets in the dataset]
Scoring: causal connection to disease, phenotype, function, or gene of interest
[Screenshots: Upstream regulator; Causal network]
Regulator Effects
Hypotheses for how activated or inhibited upstream regulators cause downstream effects on biology
[Diagram: Upstream Regulator Analysis (Upstream Regulator → Targets in the dataset) combined with Downstream Effects Analysis (Targets in the dataset → Disease or Function)]
Simplest Regulator Effects result (first iteration): Upstream Regulator → Targets in the dataset → Disease or Function
Algorithm:
• Causally consistent networks score higher
• The algorithm runs iteratively to merge additional regulators with diseases and functions
• Displays a relationship between the regulator and disease/function if it exists
IPA core analysis: data-derived networks
Question: How are claudin-low and luminal cell lines different regarding genes and known pathways?
Question: What upstream molecules are regulating the changes of EMT? Can I identify a drug or drug target?
[Venn diagram: genes/proteins affected in dataset; known targets of UR; overlap = molecules detected in dataset AND targets of UR]
The overlap is scored with an enrichment p-value
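An enrichment p-value for an overlap like this one can be sketched in Python as a right-tailed Fisher's exact test, which is equivalent to the upper tail of a hypergeometric distribution. The slide does not name the test, so the choice of test is an assumption here, as are the function name, the parameter names, and the toy counts.

```python
from math import comb


def overlap_p_value(background, regulator_targets, dataset_hits, overlap):
    """Probability of seeing at least `overlap` shared molecules by chance.

    background: size of the reference set of molecules
    regulator_targets: known targets of the regulator within the background
    dataset_hits: molecules affected in the dataset within the background
    overlap: molecules that are both dataset hits and regulator targets
    """
    max_overlap = min(regulator_targets, dataset_hits)
    if not 0 <= overlap <= max_overlap:
        raise ValueError("overlap is out of range for the given set sizes")
    if max(regulator_targets, dataset_hits) > background:
        raise ValueError("set sizes cannot exceed the background")
    total = comb(background, dataset_hits)
    # Sum the hypergeometric probabilities for overlap, overlap + 1, ...
    tail = sum(
        comb(regulator_targets, k)
        * comb(background - regulator_targets, dataset_hits - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Toy counts for illustration only.
print(overlap_p_value(background=10, regulator_targets=4, dataset_hits=3, overlap=2))
```

A small p-value says the overlap is larger than expected by chance. It carries no direction, so it complements the activation z-score rather than replacing it.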
Question: What are the downstream effects of EMT? Can I learn more about the biology of EMT? What cellular functions are affected?
Question: Can I create hypotheses that explain what may be occurring upstream to cause particular phenotypic or functional outcomes downstream?
Hypothesis: SNAI1/Mek/ZEB1 regulate EMT by activating or repressing ten genes in the dataset.
Question: How are the genes in my dataset connected to each other? Can I visualize the gene network?
Tools are interconnected
Search and Explore: Build, Overlay and other tools
BioProfiler and IsoProfiler: Profiling genes, isoforms, phenotypes and diseases
Core Analysis: Pathways, diseases, regulators enrichment
Core Data Analysis: Summary
Canonical Pathway Analysis
Predicts pathways that are changing based on your dataset
Predicts directional effects on pathway molecules not in the dataset (MAP overlay tool)
Upstream Regulator Analysis
Predicts activated/inhibited regulators responsible for observed data
Predicts master regulators
Diseases and Functions Analysis
Predicts the directional biological effects (cellular processes, biological function) of the gene/protein set
– “Increase in cell cycle”
– “Decrease in apoptosis”
Regulator Effects
Identifies specific hypotheses: upstream regulator pathways leading to a downstream phenotype.
Networks
Identifies gene networks within the dataset.
Comparison Analysis
Compare data from multiple observation datasets
Timecourse: 12hr, 24hr, 48hr
Different treatment: drug 1, drug 2, drug 3
Same treatment, different models: human cells, mouse model 1/2/3
Questions:
Across three different core analyses/observations, are any canonical pathways, upstream regulators, or diseases and functions in common?
How are they different?
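The two questions above amount to set comparisons across analyses. The Python sketch below assumes each core analysis has been exported to a mapping from pathway name to activation z-score; that export shape, the `compare_observations` helper, and the placeholder pathway names and scores are all hypothetical.

```python
def compare_observations(results):
    """Find entities shared by every observation and unique to each one.

    results: dict mapping observation name -> dict of entity -> z-score
    """
    if not results:
        raise ValueError("at least one observation is required")
    names = list(results)
    entity_sets = {name: set(results[name]) for name in names}
    shared = set.intersection(*entity_sets.values())
    unique = {
        name: entity_sets[name]
        - set().union(*(entity_sets[other] for other in names if other != name))
        for name in names
    }
    return shared, unique


# Placeholder pathway names and z-scores, for illustration only.
results = {
    "12hr": {"Pathway A": 2.4, "Pathway B": -2.1},
    "24hr": {"Pathway A": 3.0, "Pathway B": -2.6, "Pathway C": 2.2},
    "48hr": {"Pathway A": 3.3, "Pathway C": 2.9, "Pathway D": -2.3},
}

shared, unique = compare_observations(results)
print(sorted(shared))
for name, entities in unique.items():
    print(name, sorted(entities))
```

For a time course, listing the z-scores of the shared entities in time order shows whether an effect strengthens or fades across observations.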
Data Upload: one or multiple Excel files
MCF7 breast cancer cell line treated with estrogen: 12hr, 24hr, 48hr
Gene heatmap
Sort
microRNA Target Filter
microRNA analysis in IPA
Core analysis directly on microRNA IDs:
Canonical pathways affected by microRNA
Upstream regulators that act on microRNA
Downstream diseases and functions affected by microRNA
Derive regulator effects networks
Little is known about pathways that involve microRNAs and phenotypes associated with microRNAs directly.
Much is known about which genes your microRNAs target!
Analyze known gene targets of expressed/repressed microRNAs
microRNA target filter workflow
Case Study: Metastatic melanoma
Experimental conditions: IGR37 metastatic melanoma cell line, expressed as a ratio to the NHEM normal skin cell line
Same cells:
– miRNA differential expression (miRNA expression profiling)
– mRNA differential expression (microarray)
Questions:
1. What are the potential miRNA targets?
2. Do any of those miRNA targets overlap with the mRNA data?
3. What is the potential impact of miRNA differential expression?
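Questions 1 and 2 can be illustrated with a small Python sketch that pairs each measured miRNA with those of its known targets that also appear in the mRNA data. Keeping only pairs with opposite expression directions is a commonly used filter and an assumption here; the slides do not state it. The function name, the target map, and all identifiers and values are placeholders.

```python
def pair_mirna_with_targets(mirna_log_ratios, target_map, mrna_log_ratios):
    """Return (miRNA, target, miRNA log ratio, mRNA log ratio) tuples.

    Only targets measured in the mRNA data are kept, and only when the
    miRNA and the mRNA change in opposite directions.
    """
    pairs = []
    for mirna, mirna_lr in mirna_log_ratios.items():
        for target in target_map.get(mirna, ()):
            mrna_lr = mrna_log_ratios.get(target)
            if mrna_lr is None:
                continue  # target was not measured in the mRNA dataset
            if mirna_lr * mrna_lr < 0:
                pairs.append((mirna, target, mirna_lr, mrna_lr))
    return sorted(pairs)


# Placeholder identifiers and log ratios, for illustration only.
mirna_log_ratios = {"miR-X": 2.0, "miR-Y": -1.5}
target_map = {
    "miR-X": ["GENE_1", "GENE_2", "GENE_3"],
    "miR-Y": ["GENE_2", "GENE_4"],
}
mrna_log_ratios = {"GENE_1": -1.2, "GENE_2": 0.8, "GENE_4": -0.5}

for pair in pair_mirna_with_targets(mirna_log_ratios, target_map, mrna_log_ratios):
    print(pair)
```

The resulting target list is the kind of gene set that could then be carried into a core analysis to address question 3.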
microRNA Target Filter: identify relevant gene targets
[Screenshot columns: miRNA measured; Relationship; Target mRNA; Target info]
Identify target relationships with best evidence
Identify pathways and downstream gene functions with Core Analysis
Visualize miRNA-mRNA relationships
ADVANCED ANALYTICS
– CAUSAL NETWORK
– BIOPROFILER
– ISOPROFILER
– RELATIONSHIP EXPORT
Relationship Export
Ingenuity Pathway Analysis
Learn about genes, proteins, and metabolites and explore their relationships with biological processes and diseases in the Ingenuity Knowledge Base™
Discover and explore known relationships between drugs and targets
Derive biological pathways, regulatory mechanisms, and interaction networks from your ‘omics data
Identify changes in pathways and regulatory mechanisms across experimental conditions (time course, dose response, etc.)
Compare pathways and regulators across experimental groups
QUESTIONS?
CONTACTS:
General: [email protected]