microarray data analysis in genespring gx 11 · · 2016-08-30microarray data analysis in...


Microarray Data Analysis in GeneSpring GX 11

Agenda

Genome Browser

GO

GSEA

Data Visualization Options

GeneSpring GX Genome Browser

Flexible and User-friendly Genome Browser

Scatter Plot

Histogram

Profile Plot

Annotation Tracks

GeneSpring GX 11 Genome Browser

Navigation:

• Intelligent navigation by jumping to the next entity of interest (copy number region, SNP in list, etc.)

• Zoom in by dragging a box in the Chromosome Navigator or within the tracks

• Pan around the region of interest

• Search for an entity of interest by annotation (gene symbol, coordinates, etc.)

GeneSpring GX 11 Genome Browser

• Multiple samples or conditions can be displayed as individual tracks or “merged” in the same track

• Data from different experiment types can be displayed in same browser and “merged”

• Plot raw and normalized intensity values, copy number, LOD, and other list associated values

• Select multiple annotation tracks to be displayed (e.g., miRNA, CpG islands, CNVs from DGV, etc.)

Merge tracks 1 and 2

Genome Browser Terminology

• Organism: Species (Scientific and Common Name, Taxonomy ID)

• Build: Chromosomal number, size, and optional banding patterns (cytoband file)

• Features: Annotation Tracks

Hierarchy is

Organism

Build (one or more per organism)

Feature (one or more per build)
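
The Organism > Build > Feature hierarchy can be pictured as nested records. The sketch below is only an illustration of that containment; the class and field names are assumptions made here and are not GeneSpring GX's internal data model. The example values (hg19, the cytoband file name) are illustrative as well.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Feature:
    """An annotation track, e.g. CpG islands or miRNA (hypothetical model)."""
    name: str


@dataclass
class Build:
    """A genome build: chromosome sizes plus an optional cytoband file."""
    name: str
    chromosome_sizes: Dict[str, int]
    cytoband_file: Optional[str] = None
    features: List[Feature] = field(default_factory=list)


@dataclass
class Organism:
    """A species; it owns one or more builds."""
    scientific_name: str
    common_name: str
    taxonomy_id: int
    builds: List[Build] = field(default_factory=list)


# Illustrative values only: one organism, one build, one annotation track.
human = Organism("Homo sapiens", "human", 9606)
hg19 = Build("hg19", {"chr1": 249_250_621}, cytoband_file="cytoBand.txt")
hg19.features.append(Feature("CpG islands"))
human.builds.append(hg19)
```

Each level only makes sense inside its parent, which is why an organism and a build have to exist before any annotation track can be added.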

Adding Information to Genome Browser

• Create the organism if not present.

Human, mouse, rat, C. elegans available from Agilent server.

• Create a [new] build using a cytoband file.

• Add annotation tracks (.BED files or advanced import), either by drag and drop (text files) or through the menus.

Where to get the data? UCSC is a good source.

http://www.genome.ucsc.edu
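
Before importing a .BED file as an annotation track it can help to check what it contains. The following is a minimal sketch of reading the first four BED columns (chromosome, start, end, name) outside GeneSpring GX. The function name and the sample text are hypothetical; the only format facts relied on are that BED starts are 0-based and ends are exclusive.

```python
from typing import List, NamedTuple


class BedInterval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    name: str


def parse_bed(text: str) -> List[BedInterval]:
    """Parse BED text, skipping comments and track/browser header lines."""
    intervals = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        # Prefer tab-separated fields; fall back to any whitespace.
        fields = line.split("\t") if "\t" in line else line.split()
        if len(fields) < 3:
            raise ValueError(f"BED line needs at least 3 columns: {line!r}")
        name = fields[3] if len(fields) > 3 else ""
        intervals.append(BedInterval(fields[0], int(fields[1]), int(fields[2]), name))
    return intervals


# Hypothetical sample content, not taken from a real UCSC track.
EXAMPLE = "track name=example\nchr1\t100\t200\tfeatureA\nchr2\t0\t50\n"
parsed = parse_bed(EXAMPLE)
```

Because the end coordinate is exclusive, the length of a feature is simply `end - start`.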

UCSC: Source for Genome Browser Data

http://hgdownload.cse.ucsc.edu/downloads.html

Useful Manipulations

• Overlay or merge tracks (select the tracks to be overlaid, then merge them).

• Resize tracks to fit in the window (right-click in the track window and choose Change Track Size).

Why would I use this?

The Genome Browser is a visualization tool for annotations as well as data.

For expression studies, you might view your expression data in the context of chromosomal location (up- or down-regulated genes that are physically adjacent might be due to chromosomal aberrations).

Does your down-regulated gene have a known CpG island?

Visual thinkers?

GeneSpring GX 11:

Results Interpretations

GO Analysis (Function, Process, Location)

GSEA (Gene Set Enrichment Analysis)

GSA (Gene Set Analysis)

Pathway Analysis

Find Significant Pathways (BioPAX.org)

Link to IPA (Ingenuity Systems)

Extract relations via NLP (mine literature)

MeSH Pathway Builder

GO Analysis

Gene Ontology

Gene Ontology is a controlled vocabulary that describes gene products in terms of their associated cellular components, molecular functions, and biological processes in a species-independent manner:

• Cellular Component: Where does the gene product act? Ex: cytoplasm, extracellular matrix, etc.

• Molecular Function: What is the activity of the gene product? Ex: insulin receptor binding, drug transport activity, etc.

• Biological Process: A series of events accomplished by one or more ordered assemblies of molecular functions. Ex: cell division, cell motility, etc.

GeneSpring GX is packaged with GO terms and their relationships, as supplied in files from the Gene Ontology Consortium

• Ontology files are periodically updated and provided as data updates in the tool (Annotations > Update Technology Annotations > From Agilent Server)

Is there a significant enrichment of my genes of interest in a particular GO term?

Differentially expressed genes identified from statistical analysis

GO Analysis Enrichment Scoring

The p-value calculated for each GO term indicates the likelihood that your genes of interest fell into that category, just by chance

If 400 out of 40,000 entities on the array were found in the cell division category, what is the likelihood that you would find 40 out of 400 entities in the input Entity List in the cell division category just by chance?

GeneSpring GX uses the standard hypergeometric distribution and the following information to compute the p-value:

• Number of entities in the Entity List with the particular GO term and its children

• Number of entities with the GO term on entire array

• The total number of entities in the Entity List

• The total number of entities on array

For an unbiased calculation, multiple probes corresponding to the same Entrez ID (gene) are collapsed before enrichment scoring is performed
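
The cell division question above can be worked through with the four counts listed. This is a minimal sketch of a one-sided hypergeometric tail probability using only the Python standard library. The function and argument names are chosen here for illustration, and GeneSpring GX's exact computation (for example, how child terms and collapsed probes are counted) is not reproduced.

```python
from math import comb


def enrichment_p_value(hits_in_list, list_size, hits_on_array, array_size):
    """P(X >= hits_in_list) when list_size entities are drawn without
    replacement from an array holding hits_on_array annotated entities."""
    max_hits = min(hits_on_array, list_size)
    total = comb(array_size, list_size)
    favourable = sum(
        comb(hits_on_array, k) * comb(array_size - hits_on_array, list_size - k)
        for k in range(hits_in_list, max_hits + 1)
    )
    # Exact integer arithmetic; converted to a float only at the end.
    return favourable / total


# Slide example: 400 of 40,000 array entities are annotated with the term,
# and 40 of the 400 entities in the Entity List carry it.
p = enrichment_p_value(hits_in_list=40, list_size=400,
                       hits_on_array=400, array_size=40_000)
```

By chance alone about 4 of the 400 list entities would be expected in the category (400 × 400 / 40,000), so observing 40 gives a very small p-value.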

GO Ontology

Gene X

• Multiple null hypotheses are tested (one for each term tested), so a multiple-testing correction should be applied

• Hypotheses are not independent, as GO terms are related within a hierarchy (each GO term can have multiple parents)

• GeneSpring GX applies the Benjamini-Yekutieli correction, which takes into account the dependency among the GO categories
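
For reference, the textbook Benjamini-Yekutieli adjustment scales each ordered p-value by m·c(m)/rank, where m is the number of tests and c(m) = 1 + 1/2 + ... + 1/m, and then enforces monotonicity. The sketch below implements that textbook form; whether GeneSpring GX follows it step for step is an assumption, and the function name is made up for this example.

```python
def benjamini_yekutieli(p_values):
    """Return BY-adjusted p-values in the same order as the input."""
    m = len(p_values)
    if m == 0:
        return []
    c_m = sum(1.0 / k for k in range(1, m + 1))  # harmonic-sum penalty
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is the
    # minimum over all equal-or-larger ranks (keeps the result monotone).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        candidate = p_values[i] * m * c_m / rank
        running_min = min(running_min, candidate)
        adjusted[i] = running_min
    return adjusted


adjusted = benjamini_yekutieli([0.5, 0.001])
```

The extra factor c(m) is what distinguishes this procedure from Benjamini-Hochberg and makes it valid under arbitrary dependence between tests, at the cost of being more conservative.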

GO Analysis Results

Gene Set Enrichment Analysis

GSEA (Gene Set Enrichment Analysis)

GSEA interrogates genome-wide expression profiles from samples belonging to two different classes (e.g., normal and tumor) and determines whether genes in a pre-defined gene set correlate with the class distinction.

Useful for hypothesis generation (before analysis) as well as post-statistical analysis.

Reference: Subramanian et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. PNAS. September 30, 2005, 10.1073

GSEA

GSEA can use either BROAD lists or any lists that you create.

Download gene sets at: http://www.broad.mit.edu/gsea/

Broad Institute has defined five categories of gene sets:

• C1: Grouped by cytogenetic location.

• C2: Functional lists. ~1000 gene lists corresponding to pathways or functional processes (for example, genes that are both involved in inflammatory response can be in the same list).

• C3: Regulation lists. Grouped by promoter analysis: genes regulated by the same motif (the transcription factor may or may not be known). Includes cases where genes simply share the same binding motif and are therefore assumed to be co-regulated.

• C4: Proximity to known oncogenes and tumor suppressors. For example, all the neighbors of BRCA.

• C5: GO gene sets

Gene Symbol is a required annotation field.


GSEA Method

1. Rank genes based on the correlation between their expression intensities and the class distinction

• Genes that differ most in their expression between the two classes will appear at the top and bottom of the list

• Assumption is that genes related to the phenotypic distinction of the classes will tend to be found at the top and bottom of the list

2. Calculate an enrichment score (ES) to reflect the degree of overrepresentation of genes in a particular gene set at the top and bottom of the entire ranked list

3. Derive a p-value for the ES to estimate its significance level

4. Adjust the p-value for multiple testing
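
Steps 1 and 2 can be sketched in a few lines. The example below ranks genes by a signal-to-noise score and computes a weighted running-sum enrichment score in the style described by Subramanian et al. It is an illustration under assumptions (sample standard deviation, weight exponent 1, at least two samples per class with non-zero spread), not GeneSpring GX's implementation. The permutation p-value and multiple-testing steps (3 and 4) are not shown, and the gene names and values are made up.

```python
from statistics import mean, stdev


def signal_to_noise(class_a, class_b):
    """Difference of class means scaled by the summed standard deviations."""
    return (mean(class_a) - mean(class_b)) / (stdev(class_a) + stdev(class_b))


def rank_genes(expression):
    """expression maps gene -> (values in class A, values in class B)."""
    scores = {g: signal_to_noise(a, b) for g, (a, b) in expression.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked, scores


def enrichment_score(ranked, scores, gene_set, weight=1.0):
    """Maximum deviation from zero of the weighted running sum."""
    hits = [g for g in ranked if g in gene_set]
    if not hits or len(hits) == len(ranked):
        raise ValueError("gene set must match some, but not all, ranked genes")
    hit_norm = sum(abs(scores[g]) ** weight for g in hits)
    miss_step = 1.0 / (len(ranked) - len(hits))
    running, best = 0.0, 0.0
    for g in ranked:
        if g in gene_set:
            running += abs(scores[g]) ** weight / hit_norm  # step up on a hit
        else:
            running -= miss_step                            # step down on a miss
        if abs(running) > abs(best):
            best = running
    return best


# Made-up expression values for four hypothetical genes in two classes.
expression = {
    "g1": ([5, 6], [1, 2]),
    "g2": ([1, 2], [5, 6]),
    "g3": ([3, 4], [3, 4]),
    "g4": ([4, 6], [2, 3]),
}
ranked, scores = rank_genes(expression)
es_top = enrichment_score(ranked, scores, {"g1", "g4"})
es_bottom = enrichment_score(ranked, scores, {"g2"})
```

A gene set concentrated at the top of the ranked list yields a positive ES, and one concentrated at the bottom yields a negative ES.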

GSEA Input Parameters

The All Entities list can be used as the input list

Multiple pairs of conditions can be analyzed at the same time

GSEA Input Parameters

Minimum number of matching genes: minimum number of genes that must match between the Gene Set and the input Entity List for the Gene Set to be tested

Maximum number of permutations: maximum number of permutations to be performed for p-value computation

Search Options: Entity Lists can be used as Gene Sets. Click Advanced Search and Next>> to search for an Entity List to use

BROAD Gene Sets: choose the Gene Sets to use for analysis

Any Entity List in GeneSpring GX can be used as a Gene Set

Identifiers Necessary for GSEA

Technology must contain Gene Symbol

Columns that must be marked in a custom technology to perform GSEA:

• Annotation file must contain a column (Column X) containing Gene Symbol

– Column X must be marked “Gene Symbol”

– Select the “Gene Symbol” mark from the drop-down menu while creating the custom technology.

Gene Set Analysis (GSA)

GSEA and GSA share the same idea: it is more powerful to take a genome-wide approach by ranking genes based on their correlation with the phenotypes being tested and checking whether any gene set is enriched at the top or bottom of this ranked list of genes.

The approaches they use to determine whether there is enrichment differ in a few key ways:

1. GSA differs from GSEA in the method of calculating the test statistic

2. GSA uses a different approach for estimating false discovery rates

3. GSA can handle multiple classes

The algorithm and the computation of the associated metric are detailed in the paper

http://www-stat.stanford.edu/~tibs/ftp/GSA.pdf




Pathway Analysis


Two types of pathway analysis in GeneSpring GX:

1. Pathway Enrichment Analysis (statistical)

“Find Significant Pathways” tool (via BioPAX format pathways)

2. Network Analysis (visualization)

“Pathway Analysis” tool

Survey relationships from published literature OR view networks based on experimental data

Importing and Visualizing BioPAX Pathways

BioPAX (Biological Pathway Exchange) is a standard pathway data exchange format.

Pathways in the BioPAX format have the extension .owl

Allows GeneSpring users to import pathway data from KEGG, Reactome, and many other standard pathway sites.

A database for any organism of interest can be created using BioPAX files, e.g., rice, zebrafish, chimpanzee, dog

Find Significant Pathways Tool

Is there a significant enrichment of my genes of interest in a particular pathway?

Analysis will be performed on every pathway that has been imported into GeneSpring GX and every pathway created in GeneSpring GX


• Click Finish to add all significant pathways into the currently active experiment

• Double Click on a pathway to open it in the viewer

Pathway Viewer

Layout of proteins can be changed – 6 options, including cellular view

Adding Pathways to Experiment

Any pathways imported into GeneSpring GX can be searched for and subsequently added to active experiment

• Search > Pathways

• Select Pathways to add

• Click on Add selected pathways to active experiment icon

Figure labels: “Add selected pathways to active experiment” icon; selected pathways

Pathway Visualization

Using Pathways to View PubMed Abstract Relationships

Create and view pathways based on MeSH terms.

Start with an empty pathway, add terms, and expand the network.

Use the NLP tool to extract information from particular articles (full-length) and then display relationships.

Viewing Pathways from Experimental Data

Use Network Building (Pathway Analysis) to answer questions like:

How do the differentially regulated genes relate to each other? Are they directly related? Or are they related via intermediate proteins?

What are the common regulators of this set of genes?

Which small molecules might interact with a gene or set of genes?

Relation from NLP inference

UBE2L3 up-regulates the expression of MT1G. VEGFA modulates this up-regulation of expression.


Example – advanced network building

Find all “protein targets” in the given list that are modified by “drugs”

Solution:

Algorithm: Expand the given list of proteins

Filter:

• Entity type = Small molecules

• Relation type = Binding, Regulation, Expression

Small molecules (drugs)

Proteins
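
The Expand-then-filter recipe in this example can be mimicked on a toy relation table. The sketch below assumes relations are stored as (source, relation type, target) triples with a separate entity-type lookup. That storage layout, the function name, and the placeholder entity names are all assumptions for illustration and say nothing about how the GeneSpring interaction database is organized.

```python
def drug_targets(proteins, relations, entity_types,
                 relation_types=("Binding", "Regulation", "Expression")):
    """Map each input protein to the small molecules related to it."""
    proteins = set(proteins)
    allowed = set(relation_types)
    targets = {}
    for source, rel_type, target in relations:
        if rel_type not in allowed:
            continue  # relation-type filter
        # Check both directions so either end may be the protein.
        for protein, other in ((source, target), (target, source)):
            if protein in proteins and entity_types.get(other) == "Small molecule":
                targets.setdefault(protein, set()).add(other)  # entity-type filter
    return targets


# Placeholder entities; none of these are real relations.
entity_types = {
    "protein_1": "Protein", "protein_2": "Protein", "protein_3": "Protein",
    "compound_x": "Small molecule", "compound_y": "Small molecule",
}
relations = [
    ("compound_x", "Binding", "protein_1"),
    ("protein_1", "Binding", "protein_2"),
    ("compound_y", "Transport", "protein_2"),
    ("protein_3", "Regulation", "compound_y"),
]
found = drug_targets(["protein_1", "protein_2", "protein_3"], relations, entity_types)
```

Proteins that end up with no entry have no small-molecule neighbor under the chosen relation types.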

Essential constituents of a network

• Nodes (molecular and biological entities)

• Relations (biological relations between entities)

• Edges (regulatory effects of a node on another)


Relationships and Entity Types

Pathway Viewer

Proteins with blue halo are in currently selected list

Hover over proteins or connections for information

Pathway Viewer

Expression information can be overlaid on the pathway

• Right-click and select Properties

• Choose the interpretation overlay and the appropriate interpretation

Fold Change Data Overlay

• Fold Change data may also be overlaid on pathways.

• Right click in open space of pathway.

• Choose Overlay Properties and select an entity list that contains FC.

• Overlay Column: change to FC Absolute.

Pathway Relation Database

The general computer requirements are the same as GeneSpring GX, but there are additional space requirements.

Database Size:

Infrastructure database 150 MB

Interaction databases: full install ~4 GB

Relations and entities are organized into separate databases for each organism

• Human- >1.4 million relations

• Mouse- 674,725 relations

• Rat- 767,296 relations

• Drosophila- 82,090 relations

• C. elegans- 43,122 relations

• Yeast- 94,992 relations

• E. coli- 10,876 relations

• Arabidopsis- 23,918 relations

More than 16 million abstracts were parsed.

NLP generated relations

NLP extraction pipeline: The majority of the GeneSpring pathway database relations are derived from published PubMed abstracts using text mining.

Text-mining pipeline (figure): PubMed input sentence → entity recognition against a dictionary (proteins, enzymes, etc.) → tagged sentence → syntax and semantics → inferencing (apply grammar rules to derive interactions) → molecular and process/function interactions

Pathway Analysis – Getting Started

Install the interaction database for organism of interest

Methods of Pathway Analysis in GeneSpring GX 11

Simple Analysis

• This option allows the user to explore the most common functionalities of a pathway analysis.

Advanced Analysis

• This option allows the user to explore all the functionalities of a pathway analysis in detail and change the settings at every step of the analysis, as required.

Building Pathways

Algorithms:

Direct – connects nodes in the given list

Expand – stepwise expansion of an initial set of nodes to include the first-degree neighbors

Shortest connect – traces a continuous path among all nodes in a list using the minimum number of steps

Filters:

Quality – confidence score assigned to the relations

Connectivity – number of neighbors of each node (this filter can be used to rank the neighboring entities and limit the number of entities in a pathway view)

• Total number of neighbors (Global)

• Number of neighbors within a given network (Local)

Type – Class of entities or relations

• Entity types (proteins, small molecules, enzymes, etc.)

• Relation types (binding, protein modifications, expression, transport, etc.)

Pathways are built using a combination of Algorithms and Filter parameters
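
To make the algorithm-plus-filter idea concrete, here is a small sketch of Direct and Expand on an undirected toy network, with a Quality threshold and a local Connectivity ranking. Relations are assumed to be (entity, entity, quality score) triples; the function names, the score scale, and the tie-breaking rule are illustrative choices and do not describe GeneSpring GX internals.

```python
from collections import defaultdict


def direct(nodes, relations, min_quality=0.0):
    """Relations whose two ends are both in the given list."""
    nodes = set(nodes)
    return [r for r in relations
            if r[0] in nodes and r[1] in nodes and r[2] >= min_quality]


def expand(seeds, relations, min_quality=0.0, max_new=None):
    """Add first-degree neighbors, ranked by how many seeds they touch."""
    seeds = set(seeds)
    links = defaultdict(set)  # new neighbor -> seed nodes it connects to
    for a, b, quality in relations:
        if quality < min_quality:
            continue  # Quality filter
        if a in seeds and b not in seeds:
            links[b].add(a)
        elif b in seeds and a not in seeds:
            links[a].add(b)
    # Local Connectivity filter: most-connected neighbors first, ties by name.
    ranked = sorted(links, key=lambda n: (-len(links[n]), n))
    if max_new is not None:
        ranked = ranked[:max_new]
    return seeds | set(ranked)


# Toy network with made-up confidence scores.
relations = [("A", "B", 0.9), ("A", "X", 0.8), ("B", "X", 0.7),
             ("B", "Y", 0.2), ("Y", "Z", 0.9)]
```

With seeds A and B, `expand` reaches X and Y but not Z, because Z is a second-degree neighbor; a second call on the expanded set would take the next step.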

Simple Pathway Analysis

Predefined Algorithms determine the type of analysis performed:

• Direct Interactions: Finds relations that connect the entities in the selected entity list.

• Network Targets: Finds downstream entity targets that connect two or more entities from the original list.

• Network Regulators: Finds upstream entity regulators that connect to two or more entities from the original list.

• Shortest Connect: Finds the smallest set of relations that will connect all entities in a given list into a single network.
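
Network Targets and Network Regulators are mirror images of each other on a directed network. The sketch below assumes relations are (regulator, target) pairs and that only entities outside the original list are reported; both points, along with the function names, are assumptions made for this illustration.

```python
from collections import defaultdict


def network_targets(entities, directed_relations, min_links=2):
    """Downstream entities regulated by at least min_links list members."""
    entities = set(entities)
    upstream = defaultdict(set)  # candidate target -> list members regulating it
    for regulator, target in directed_relations:
        if regulator in entities and target not in entities:
            upstream[target].add(regulator)
    return {t: regs for t, regs in upstream.items() if len(regs) >= min_links}


def network_regulators(entities, directed_relations, min_links=2):
    """Upstream entities regulating at least min_links list members."""
    flipped = [(target, regulator) for regulator, target in directed_relations]
    return network_targets(entities, flipped, min_links)


# Toy directed relations (regulator, target) with placeholder names.
directed = [("A", "T"), ("B", "T"), ("A", "U"),
            ("R", "A"), ("R", "B"), ("S", "A")]
targets = network_targets({"A", "B"}, directed)
regulators = network_regulators({"A", "B"}, directed)
```

Here T is reported as a shared target and R as a shared regulator, while U and S are dropped because each connects to only one entity in the list.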

Simple Pathway Analysis

Preview pane for pathway shows if any relationships were found and how many new entities were added.

Click Next to save Pathway in Experiment

Modifying Pathway

• Save Pathway View

• Selection Mode

• Zoom Mode

• Pan Mode

• Zoom to fit view to visible area

• Zoom Selected Region

• Zoom In

• Zoom Out

• Select All

• Remove unlinked entities from the view

• Copy Selection

• Paste Selection

• Undo

• Redo

• Change Layout

Viewing Relationships and References

Double-click on any node to discover more about the relationship and source of information

Advanced Pathway Analysis

In Advanced Analysis, there are 3 algorithms to choose from:

• Direct: Finds relations that connect the entities in the selected entity list.

• Expand: Expands the existing network to include the first-degree neighbors of the selected entities.

• Shortest Connect: Finds the smallest set of relations that would connect a set of entities into a single network. Note that some intermediate entities may be introduced in this process.
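
Finding the truly smallest connecting set of relations is a hard problem in general (it is the Steiner tree problem), so the sketch below uses a common greedy heuristic instead: grow a connected set and repeatedly attach the nearest unconnected entity by a breadth-first shortest path. This is a stand-in technique chosen for illustration, not the algorithm GeneSpring GX uses, and it may return more relations than the true minimum. Relations are assumed to be undirected pairs.

```python
from collections import deque


def shortest_connect(entities, relations):
    """Greedy approximation: returns (nodes, edges) joining all entities."""
    adjacency = {}
    for a, b in relations:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    remaining = set(entities)
    if not remaining:
        return set(), set()
    start = min(remaining)
    connected, edges = {start}, set()
    remaining.discard(start)
    while remaining:
        # Multi-source BFS from everything connected so far.
        parent = {node: None for node in connected}
        queue = deque(sorted(connected))
        found = None
        while queue:
            node = queue.popleft()
            if node in remaining:
                found = node
                break
            for neighbor in sorted(adjacency.get(node, ())):
                if neighbor not in parent:
                    parent[neighbor] = node
                    queue.append(neighbor)
        if found is None:
            raise ValueError("entities cannot be joined into a single network")
        # Walk back along the path, adding intermediates as needed.
        node = found
        while parent[node] is not None:
            edges.add(frozenset((node, parent[node])))
            connected.add(node)
            node = parent[node]
        remaining.discard(found)
    return connected, edges


# Toy network: M is an intermediate on the short route between A and B.
toy = [("A", "M"), ("M", "B"), ("B", "C"), ("A", "Q"), ("Q", "R"), ("R", "C")]
nodes, edges = shortest_connect({"A", "B", "C"}, toy)
```

In the toy network the result contains M even though it was not in the input list, which mirrors the note that intermediate entities may be introduced.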


Direct Interactions – Choose the types of relationships that are of interest

Expand Interactions – Choose the types of relationships and entities that are of interest

Shortest Connect – Choose the types of relationships and entities that are of interest

Deleting Pathways

Deleting a pathway will remove the pathway from the database within GeneSpring

Choose “Remove pathway” if you don’t want to delete the pathway permanently

Identifiers Necessary for Viewing Pathways and the Find Similar Pathways Tool

Technology must contain Entrez Gene ID and/or SwissProt

Columns that must be marked in a custom technology to view pathways and use the Find Similar Pathways tool:

• Annotation file must contain a column X containing Entrez Gene IDs and/or a column Y containing SwissProt IDs

– Column X must be marked “Entrez Gene ID”

– Column Y must be marked “SwissProt”

Data Visualization

Easier way of selecting multiple Entity Lists for Venn Diagram

Entity List Selection window for Venn Diagram automatically opens to display all Entity Lists for all open experiments

Multiple Entity Lists can be selected from window at once (Ctrl click) to display in Venn Diagram

Entity Lists can also be dragged and dropped into Venn Diagram

Select Entity Lists from window OR drag and drop Entity Lists

Combine Entity List Associated Values from Multiple Entity Lists Using Venn Diagram

Thank you!
