SeqExpress: Introduction

Upload: nancy-chandler

Post on 11-Jan-2016


Page 1: SeqExpress: Introduction. Features Visualisation Tools  Data: gene expression, gene function and gene location.  Analysis: probability models, hierarchies

SeqExpress: Introduction

Page 2

Features

• Visualisation Tools
  • Data: gene expression, gene function and gene location.
  • Analysis: probability models, hierarchies and clusters.
• Analysis Tools
  • Cluster analysis, refinement and validation.
  • Using mixture modelling.
  • Graphs and Hierarchies.
• Data Tools
  • Data Import/Export tools (remote access of GEO, local access of tab separated and MAGE format).
  • Data Integration: optional underlying data and annotation database.
  • Data Manipulation.

Page 3

SeqExpress: Visualisation Tools

Page 4

Visualisations

Data Visualisation: Gene Expression; Gene Variance; Gene Function/Ontology; and Chromosome Features.

Analysis Visualisations: Hierarchies/Graphs; Probabilistic Methods; and Cluster Comparison.

Page 5

Gene Expression

Scatter Plots, Parallel Plots

Also: Histograms, Annotation lists and Gene Tables

Page 6

Gene Variance

Gene Spectrums, Gene Clouds

Page 7

Gene Ontology Visualisations

TreeMaps

Graphs

Tables

Page 8

Chromosome Feature Visualisations

Page 9

Data Analysis

Probability Models

Dendrograms

Cluster Comparison

Page 10

Example: Viewing Clusters

A cluster has been selected in the gene tab. The genes are then selected in a scatter plot, a parallel plot and the histogram.

Page 11

Example: Gene Function Selection

The binding term has been selected from the results of an ontology term search. The binding term is then automatically selected in the Function tab, as well as the open Tree Map visualisation. All genes that have been annotated with the binding term are also selected in the parallel plot.

Page 12

Example: Genome Location

A combined expression profile and location-based cluster analysis has been performed and the results viewed. The parallel plot shows the similar expression profiles, whilst the two genome views show the locale of the genes. The genome view in the middle is set to auto-zoom, and so shows the locale in detail.

Page 13

Example: Data Analysis

A series of models has been generated, and the genes with a high probability of belonging to one of the models have been selected in the model viewer. The corresponding locations of the genes and their expression profiles are then shown.

Page 14

Summary

• A number of visualisations are available to support a variety of tasks:
  • Expression
  • Ontology (plus pathway and protein-protein interaction)
  • Location
  • Hierarchies
  • Cluster comparison
  • Variance
  • Probability theory
• Visualisations are inter-linked.

Page 15

SeqExpress: Analysis Tools

Page 16

Analysis Tools 1: Clusters, Hierarchies and Concepts

Clustering:
• Distance based
• Refinement (ontology or model based)
• Validation (C-Index)
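The C-Index used for validation is commonly computed as C = (S - S_min) / (S_max - S_min), where S is the sum of the within-cluster pairwise distances and S_min and S_max are the sums of the same number of smallest and largest pairwise distances overall. The following is a minimal sketch under that standard definition, assuming Euclidean distance; the function name `c_index` and the toy points are illustrative and are not taken from SeqExpress.

```python
from itertools import combinations
import math

def c_index(points, labels):
    """Return the C-index of a clustering; values near 0 mean compact clusters."""
    pairs = list(combinations(range(len(points)), 2))
    dists = [math.dist(points[i], points[j]) for i, j in pairs]
    # Distances between points that share a cluster label.
    within = [d for d, (i, j) in zip(dists, pairs) if labels[i] == labels[j]]
    n_w = len(within)
    if n_w == 0 or n_w == len(dists):
        raise ValueError("need both within-cluster and between-cluster pairs")
    ordered = sorted(dists)
    s = sum(within)
    s_min = sum(ordered[:n_w])    # best case: the n_w smallest distances
    s_max = sum(ordered[-n_w:])   # worst case: the n_w largest distances
    if s_max == s_min:
        raise ValueError("all candidate distance sums are equal")
    return (s - s_min) / (s_max - s_min)

# Toy data (made up): two tight pairs of points that are far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = c_index(points, [0, 0, 1, 1])   # clusters follow the tight pairs
bad = c_index(points, [0, 1, 0, 1])    # clusters cut across the tight pairs
print(good, bad)
```

A clustering that groups the nearby points together scores lower than one that splits them, which is the property a validation step relies on.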

Hierarchies: SDD*, Hierarchical

Projection:
• Covariance*: $\mathrm{eigen}(\mathrm{covar}(A))$ or $A = U S V^{T}$
• Co-occurrence*: $P(g,e) = P(g)\sum_{z} P(e \mid z)\,P(z \mid g)$

*Used for global/enterprise-wide information retrieval
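The two covariance routes named above, an eigen-decomposition of the covariance matrix and a singular value decomposition of the data matrix, give the same projection once the data are mean-centred. The sketch below illustrates this with NumPy; it assumes rows are genes and columns are experiments, and the random matrix is made-up data rather than anything from SeqExpress.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 3))      # toy matrix: 6 genes x 3 experiments
Ac = A - A.mean(axis=0)          # centre each experiment (column)

# Route 1: eigen-decomposition of the covariance between experiments.
cov = np.cov(Ac, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]            # largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Route 2: SVD of the centred matrix, Ac = U S V^T.
U, S, Vt = np.linalg.svd(Ac, full_matrices=False)
svd_eigvals = S**2 / (Ac.shape[0] - 1)       # singular values -> variances

# Project the genes onto the first two components by either route.
proj_eig = Ac @ eigvecs[:, :2]
proj_svd = U[:, :2] * S[:2]

print(np.allclose(eigvals, svd_eigvals))
print(np.allclose(np.abs(proj_eig), np.abs(proj_svd)))  # equal up to sign
```

The projections can differ by a sign per component, because eigenvectors are only defined up to sign; the comparison therefore uses absolute values.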

Page 17

Cluster Distances

• Expression: Pearson, Cosine, Euclidean, Manhattan.
• Function: Information theory: $2 N_3 / (N_1 + N_2 + 2 N_3)$
• Location: Intra gene distance, distance to feature.

[Figure: ontology graph with nodes TERM:1 to TERM:6, annotated with the labels Y, Z and 1 to 4.]
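The distance measures listed for expression and function can be sketched in a few lines. The code below is illustrative only: turning Pearson and cosine similarity into distances as 1 minus the similarity is an assumption, and so is the reading of the function formula, where N1 and N2 are taken as the path lengths from two ontology terms to their common ancestor and N3 as the depth of that ancestor. All function names are hypothetical.

```python
import math

def pearson_distance(x, y):
    """1 - Pearson correlation (assumes neither profile is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)

def cosine_distance(x, y):
    """1 - cosine of the angle between the profiles (assumes non-zero vectors)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / norm

def euclidean_distance(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan_distance(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def term_similarity(n1, n2, n3):
    """The slide's formula 2*N3 / (N1 + N2 + 2*N3); inputs as assumed above."""
    return 2 * n3 / (n1 + n2 + 2 * n3)

# Two made-up expression profiles with the same shape but different scale.
x, y = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
print(pearson_distance(x, y), cosine_distance(x, y))
print(euclidean_distance(x, y), manhattan_distance(x, y))
print(term_similarity(1, 1, 2))
```

Pearson and cosine ignore the difference in scale between the two profiles, while Euclidean and Manhattan do not, which is why the choice of measure changes which genes cluster together.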

Page 18

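SAGE: Semi Discrete Decomposition

$A = X D Y^{T}$

\[
\begin{pmatrix} 3 & 3 & 0 & 0 \\ 3 & 3 & 5 & 5 \\ 0 & 0 & 5 & 5 \end{pmatrix}
=
\begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 5 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{pmatrix}^{T}
\]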

• Immunity to outliers
• Uses local density
• Describes both experiments and genes
• Hierarchical description
• Stencils mean that fold-in is possible
• Highly scalable
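The numbers on this slide can be checked directly. Reading the four number blocks as Y (4 x 3), D (3 x 3, diagonal), X (3 x 3) and A (3 x 4) is an interpretation of the extracted layout, but under that reading the product X D Y^T reproduces A exactly, as the sketch below confirms with NumPy.

```python
import numpy as np

# Matrices as read from the slide (the assignment of blocks to names is an assumption).
X = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 0, 1]])
D = np.diag([3, 3, 5])
Y = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1],
              [0, 0, 1]])
A = np.array([[3, 3, 0, 0],
              [3, 3, 5, 5],
              [0, 0, 5, 5]])

reconstructed = X @ D @ Y.T

# The same product written as a sum of weighted rank-one "stencils":
# each term is d_k * outer(x_k, y_k).
terms = [D[k, k] * np.outer(X[:, k], Y[:, k]) for k in range(D.shape[0])]

print(np.array_equal(reconstructed, A))
print(np.array_equal(sum(terms), A))
```

Writing the decomposition as a sum of rank-one terms shows how each term marks a block of rows and columns with a single weight, which is what lets the decomposition describe both dimensions of the matrix at once.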

Page 19

Analysis Tools 2: Models and Graphs

Graphs: Two-factor analysis using (1) Graph Connectivity and (2) Edge Length.

Models: N-factor analysis using the product rule: P(A,B|C)=P(A|BC)*P(B|C).

Multi-factor analysis identifies complex features within the data (e.g. genes which both have a similar expression profile and are located on the same part of a chromosome).
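The product rule quoted above, P(A,B|C) = P(A|B,C) P(B|C), can be verified numerically on any joint distribution. The sketch below builds a small random joint table over three discrete variables and checks both sides; the variables are abstract placeholders (for instance A could stand for an expression pattern and B for a chromosome region), not quantities defined by SeqExpress.

```python
import numpy as np

rng = np.random.default_rng(1)
joint = rng.random((2, 3, 2))     # made-up table P(A, B, C); axes are A, B, C
joint /= joint.sum()              # normalise so the entries sum to 1

p_c = joint.sum(axis=(0, 1))      # P(C), shape (2,)
p_bc = joint.sum(axis=0)          # P(B, C), shape (3, 2)

p_ab_given_c = joint / p_c        # P(A, B | C)
p_b_given_c = p_bc / p_c          # P(B | C)
p_a_given_bc = joint / p_bc       # P(A | B, C)

# Right-hand side of the product rule.
rhs = p_a_given_bc * p_b_given_c

print(np.allclose(p_ab_given_c, rhs))
```

Applying the rule repeatedly is what allows an N-factor model to be built up from one conditional factor at a time.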

Page 20

Models: Discovery

Different models can be found, and altered using energy parameters and tempering.

Page 21

[Figure: four panels, each listing Unsupervised Clusters alongside Regulatory Modules, with every cluster and module labelled by its size. Modules are named by function (e.g. "Ribosomal and phosphate metabolism", size 32; "mRNA, rRNA and tRNA processing", size 42; "Respiration and carbon regulation", size 53; "Unknown (sub-telomeric)", size 87). Panel labels: Spline (beta 0.1), Linear (beta 0.6), Cosine (beta 1.1), Normal (beta 0.1).]

Page 22

Models: Usage

• Cluster generation: high probabilities equate to cluster membership.
• Fitting data: use normal tissues to fit models to genes, use disease tissues to fit genes to models. Changed behaviour equates to likelihood of model transition.
• Combining models: complex feature identification (given feature X on condition Y).
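The idea that high probabilities equate to cluster membership can be illustrated with a two-component mixture. This is a minimal sketch, assuming one-dimensional normal components with known parameters and an arbitrary 0.9 threshold; the gene names, values and helper names are hypothetical and do not describe how SeqExpress fits its models.

```python
import math

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def membership(x, components):
    """Posterior probability of each model given x.

    components is a list of (weight, mean, sd) tuples.
    """
    weighted = [w * normal_pdf(x, m, s) for w, m, s in components]
    total = sum(weighted)
    return [v / total for v in weighted]

components = [(0.5, 0.0, 1.0), (0.5, 4.0, 1.0)]         # two made-up models
values = {"geneA": 0.2, "geneB": 3.9, "geneC": 2.0}     # made-up measurements
threshold = 0.9                                         # arbitrary cut-off

clusters = {k: [] for k in range(len(components))}
for gene, x in values.items():
    post = membership(x, components)
    best = max(range(len(post)), key=post.__getitem__)
    if post[best] >= threshold:       # only confident genes join a cluster
        clusters[best].append(gene)

print(clusters)
```

A gene that sits midway between two models has no high-probability model and so stays unassigned, which is one way a probabilistic approach differs from a hard partition.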

Page 23

Graph: Discovery

Graph connectivity equates to:
• MST of expression values
• Sub-graphs of the gene ontology
• Chromosome relationship

Edge distance equates to:
• Expression distance
• Network (ontology) distance
• Linear chromosomal distance

Graph partitioned:
• regular (using Metis)
• irregular (Min/Max)
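A minimum spanning tree (MST) of expression values, the first connectivity option above, can be built with Prim's algorithm over pairwise distances. The sketch below assumes Euclidean distance between profiles and uses made-up one-dimensional data; it is an illustration of the technique, not the SeqExpress implementation, and the function name is hypothetical.

```python
import math

def minimum_spanning_tree(profiles):
    """Prim's algorithm on the complete graph of Euclidean distances.

    Returns a list of (i, j, distance) edges, where i is already in the tree.
    """
    n = len(profiles)
    if n == 0:
        return []
    # best[j] = (distance, i): cheapest known link from the tree to node j.
    best = {j: (math.dist(profiles[0], profiles[j]), 0) for j in range(1, n)}
    edges = []
    while best:
        j = min(best, key=lambda k: best[k][0])   # closest node outside the tree
        d, i = best.pop(j)
        edges.append((i, j, d))
        # The new node may offer cheaper links to the remaining nodes.
        for k in best:
            dk = math.dist(profiles[j], profiles[k])
            if dk < best[k][0]:
                best[k] = (dk, j)
    return edges

# Made-up expression values: two close pairs separated by a gap.
profiles = [(0.0,), (1.0,), (5.0,), (6.0,)]
edges = minimum_spanning_tree(profiles)
print(edges)
```

Cutting the longest edge of the resulting tree separates the two close pairs, which hints at how connectivity and edge length together can drive a partition.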

Page 24

Analysis: Summary

• Desktop analysis.
• Number of techniques available.
• Techniques can be customised for different data sets (e.g. organism, array type).
• Borrows heavily from Information Retrieval.
• Probabilistic techniques show most promise.

Page 25

SeqExpress: Data Tools

Page 26

Data Analysis

• Data Import/Export tools: remote access of GEO (one click access); import tab separated and MAGE format; export tab separated and Bioconductor format.
• Data Integration: data and annotation database. Automatic and configurable annotation mapping (e.g. SAGE tag to locuslink (entrez gene?) to unigene).
• Data Manipulation: transformation, filtering and constraining.
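A configurable annotation mapping of the kind described above (SAGE tag to LocusLink to UniGene) amounts to chaining lookup tables. The sketch below shows one way to express that chain; the identifiers are placeholders rather than real accession numbers, and the `compose` helper is hypothetical rather than part of SeqExpress.

```python
def compose(*mappings):
    """Chain lookup tables so a key is translated through each in turn."""
    def lookup(key):
        for mapping in mappings:
            key = mapping.get(key)
            if key is None:          # the chain breaks if any step has no entry
                return None
        return key
    return lookup

# Placeholder tables; real ones would be loaded from the annotation database.
tag_to_locus = {"TAG_A": "LOCUS_1", "TAG_B": "LOCUS_2"}
locus_to_unigene = {"LOCUS_1": "UNIGENE_1"}

tag_to_unigene = compose(tag_to_locus, locus_to_unigene)
print(tag_to_unigene("TAG_A"))
print(tag_to_unigene("TAG_B"))
```

Keeping each step as its own table means the chain can be reconfigured, for example by swapping the middle table, without touching the data being annotated.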

Page 27

Data Integration: GEO

Page 28

Data Integration: Annotation Builder

Page 29

SeqExpress: Summary

Page 30

Summary

• Written in C#, is free and runs under Windows.
• Not associated with any academic institution, funding body or commercial organisation.
• Development is still ongoing.
• Plan to develop to the Expression Application Class Specification.
• Looking for employment in Seattle…