Gene expression data in VectorBase

Fotis Kafatos, George Christophides, Bob MacCallum & Seth Redmond

Imperial College London

(thanks also to EBI, Sanger and ND)

Outline

1. Project goals

2. What’s currently available

3. Current challenges and future plans

Project goals

• For vector biologists:
  – Easy access to gene expression data
  – Consistent data processing
• For array specialists:
  – ArrayExpress submission
  – Advanced analysis tools
  – Array annotation

[Diagram: bulk loader, expression data, storage & analysis]

• BASE: BioArray Software Environment

• http://base.thep.lu.se/

• Open source, active development and user community

• LIMS, data storage, export and analysis

• Web-based, user/group access control

• BASE 2.x adoption will bring Affy support

Data submission

• Community submission guidelines available
• First batch of experiments loaded by us
• Bulk data loader
• Sample/experiment annotation requires intervention from curators

[Diagram: bulk loader, expression data, storage & analysis, ArrayExpress (‘public’ storage)]

• Data held in BASE is largely MIAME compliant

• Script for semi-automated export in TAB2MAGE format

• One experiment submitted so far

[Diagram: bulk loader, expression data, storage & analysis, ArrayExpress (‘public’ storage), data summaries]

• BASE web interface offers powerful and extendable analysis environment

• Can be used for multi-site collaborations on pre-publication data

• Steep learning curve/not 100% intuitive

• Not easily linked to

• We provide simpler views so the casual user can quickly draw biological inferences

Standardised data

All displayed data is processed in the same way:

1. Poor quality spots removed
   • Currently using submitted spot flags

2. Normalisation
   • “lowess” for two-colour experiments
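
The two processing steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the pipeline VectorBase runs: the spot fields `red`, `green` and `flagged` are hypothetical names, and the smoother is a simplified lowess (tricube-weighted local linear fit, without the robustness iterations of the full method). Flagged spots are dropped, each remaining spot is converted to M = log2(R/G) and A = ½·log2(R·G), and the fitted intensity-dependent trend is subtracted from M.

```python
import math

def ma_values(spots):
    """Drop flagged or non-positive spots; return (A, M) pairs for the rest."""
    pairs = []
    for s in spots:
        if s["flagged"] or s["red"] <= 0 or s["green"] <= 0:
            continue
        m = math.log2(s["red"] / s["green"])        # log ratio
        a = 0.5 * math.log2(s["red"] * s["green"])  # mean log intensity
        pairs.append((a, m))
    return pairs

def local_fit(a0, pairs, frac):
    """Tricube-weighted linear fit of M on A, evaluated at an observed A value a0."""
    k = min(len(pairs), max(2, math.ceil(frac * len(pairs))))
    dists = sorted(abs(a - a0) for a, _ in pairs)
    h = dists[k - 1] or 1e-12  # bandwidth: distance to the k-th nearest spot
    sw = swa = swm = swaa = swam = 0.0
    for a, m in pairs:
        u = abs(a - a0) / h
        if u >= 1.0:
            continue  # outside the local window
        w = (1.0 - u ** 3) ** 3
        wa = w * a
        sw += w
        swa += wa
        swm += w * m
        swaa += wa * a
        swam += wa * m
    denom = sw * swaa - swa * swa
    if abs(denom) < 1e-12:
        return swm / sw  # degenerate window: fall back to the weighted mean
    slope = (sw * swam - swa * swm) / denom
    intercept = (swm - slope * swa) / sw
    return intercept + slope * a0

def lowess_normalise(spots, frac=0.5):
    """Return (A, normalised M) for every unflagged spot."""
    pairs = ma_values(spots)
    return [(a, m - local_fit(a, pairs, frac)) for a, m in pairs]
```

Calling `lowess_normalise(spots)` on a list of such records returns one `(A, M)` pair per retained spot. A larger `frac` gives a smoother trend and a smaller one follows local intensity bias more closely.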

[Diagram: bulk loader, expression data, storage & analysis, ArrayExpress (‘public’ storage), data summaries, probe mapping]

• 3 probe types

• 6 array designs

• Mapping handled via Ensembl pipeline:
  – Oligo: exonerate
  – PCR: e-PCR
  – cDNA: exonerate2genes

[Diagram: genomic data, automatic annotation, genome browser; bulk loader, expression data, storage & analysis, ArrayExpress (‘public’ storage), data summaries, probe mapping; GFF3]

contigview

featureview

[Diagram: bulk loader, expression data, storage & analysis, ArrayExpress (‘public’ storage), genomic data, automatic annotation, genome browser, data summaries, probe mapping, data mining; audiences: vector biologists, array biologists, genome biologists]

BioMart

• Beta version currently available
  – http://base.vectorbase.org:9999/biomart/martview
• Improvements still needed:
  – Experiment annotations
  – Alignments (i.e. handle split alignments)
• Federation with current marts
• Integration with new data?

Current challenges and future plans

• How do you want to query?

• CVs & ontologies

• APIs

• Community submission

• Manual annotation

Querying strategy

• What do you want to query on?
  – Fetch all genes upregulated under condition X
  – Fetch all experiments with gene X and condition Y
  – Fetch all probes with expression similar to probe X
• All essentially boil down to:
  – Define probe (genes etc)
  – Define significant expression
    • ANOVA?
    • Up/down-regulation WRT what?
  – Define experimental conditions
    • Sample annotation
    • Experimental design
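
To make the three ingredients concrete, here is a minimal Python sketch of the first query ("fetch all genes upregulated under condition X"). Everything in it is an assumption made for illustration: the `Measurement` record, its field names and the fixed log2 threshold are hypothetical, and the threshold merely stands in for whatever definition of significant expression is eventually chosen.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Measurement:
    probe: str         # probe/reporter identifier
    gene: str          # gene the probe maps to ("" if unmapped)
    condition: str     # experimental condition from the sample annotation
    log2_ratio: float  # expression relative to the experiment's reference

def genes_upregulated(measurements, condition, min_log2=1.0):
    """Genes with at least one mapped probe at or above the threshold under `condition`."""
    return sorted({
        m.gene
        for m in measurements
        if m.gene and m.condition == condition and m.log2_ratio >= min_log2
    })
```

The other two example queries have the same shape: a probe or gene filter, a significance test and a condition filter. This is why consistent sample annotation matters as much as the expression values themselves.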

[Diagram: bulk loader, expression data, storage & analysis, ArrayExpress (‘public’ storage), CV / ontology, genomic data, automatic annotation, genome browser, data summaries, probe mapping, data mining; audiences: vector biologists, array biologists, genome biologists]

[Diagram: storage & analysis, ‘public’ storage, genome browser, data summaries, data mining, bulk loader, expression data, genomic data, automatic annotation, CV / ontology, ArrayExpress, probe mapping; API labels: Array API? AE API? e! API, MartJ / MQL]

Array API

Perl / Java objects for retrieval / handling of array data

– Dual purpose:
  • Consistency & efficiency of VB expression website
  • Computational access to VB data for all
– Objects must be:
  • General, DB-independent
  • Compatible with pre-existing Bio API (BioPerl / BioJava)
– NB: a pre-existing solution may already exist:
  • ArrayExpress API?
  • BioPerl-Expression?
  • MAGE-OM-stk

• http://neuron.cse.nd.edu/vectorbase/index.php/Array_API_proposal
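
The "general, DB-independent" requirement can be illustrated with a small interface sketch. The proposal targets Perl and Java; Python is used here only for brevity. All class and method names below are hypothetical and are not taken from the Array API proposal or from any existing library. The point is that calling code depends on an abstract source, so the storage backend can change without affecting it.

```python
from abc import ABC, abstractmethod

class ArrayDataSource(ABC):
    """Hypothetical backend-neutral interface: callers never see the database."""

    @abstractmethod
    def probes_for_gene(self, gene_id):
        """Return the identifiers of probes mapped to gene_id."""

    @abstractmethod
    def expression(self, probe_id):
        """Return {experiment_id: value} for one probe."""

class InMemorySource(ArrayDataSource):
    """Toy backend; a database-backed class would expose the same two methods."""

    def __init__(self, probe_to_gene, values):
        self._probe_to_gene = dict(probe_to_gene)
        self._values = dict(values)

    def probes_for_gene(self, gene_id):
        return sorted(p for p, g in self._probe_to_gene.items() if g == gene_id)

    def expression(self, probe_id):
        return dict(self._values.get(probe_id, {}))

def gene_expression(source, gene_id):
    """Works with any ArrayDataSource: {probe_id: {experiment_id: value}}."""
    return {p: source.expression(p) for p in source.probes_for_gene(gene_id)}
```

Both the expression website and external scripts could then call `gene_expression` against whichever source is configured, which is the dual purpose the slide describes.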

Community data submission

• Carrot?
  – Help with ArrayExpress submission
  – Analysis tools
  – Dissemination
• Stick?
  – Outreach (courses, conferences)
  – Networking

GE data manual annotators

• Gene-build designed arrays
  – Negative evidence less compelling
• EST clone-based arrays
  – http://tinyurl.com/vlkwo

Longer term plans

Host-parasite GE data integration & analysis

GE-clusters “upstream” regions regulatory elements, upstream TFs

RNAi phenotypes

Images

CVs & ontologies

• Integrate MGED and specialist ontologies for
  – Body parts
  – Developmental stages
  – Disease processes
  – …

• Allows comparison across experiments with similar experimental conditions

BioMart

Most biomarts:

• Gene-based
• Mostly ‘binary’ data
  – e.g. a gene either has a signal domain or doesn’t
• Easily linked with other (gene-based) biomarts

VB Biomart:

• Probe-based
  – Many probes not aligned
• Expression data less clear
  – e.g. define ‘differential expression’
• Exports gene/transcript IDs for linking to other Marts

Clustering

• A priority?
• Easy to do on reporter level within experiments
• Harder to do at gene level across all experiments
  – Binary gene profile: “yes/no differentially expressed in experiment”?
• Amazon-style links to “genes which may have similar expression profiles”?
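
One way the binary gene profile idea could work is sketched below in Python. It is an assumption-laden illustration rather than a planned implementation: each gene is represented by the set of experiments in which it was called differentially expressed, and Jaccard similarity is swapped in as the comparison measure (the slides do not name one). Gene and experiment identifiers are placeholders.

```python
def jaccard(a, b):
    """Shared experiments divided by all experiments in either set (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similar_genes(profiles, gene, top=3):
    """Rank the other genes by similarity of their binary profiles to `gene`.

    profiles maps gene ID -> set of experiment IDs where the gene was called
    differentially expressed. Genes with no overlap are left out.
    """
    target = profiles[gene]
    scored = [(jaccard(target, p), g) for g, p in profiles.items() if g != gene]
    scored.sort(key=lambda t: (-t[0], t[1]))  # best score first, ties by gene ID
    return [(g, s) for s, g in scored[:top] if s > 0]
```

The ranked list returned by `similar_genes` is what an "Amazon-style" link on a gene page would display. Because profiles are reduced to yes/no calls, the result depends entirely on how differential expression is defined in each experiment.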

BASE 2.x

• Adoption delayed, now in progress

• Brings Affymetrix support

• Cleaner/modern interface

• Better API (Java)