cross-species data integration

Post on 10-May-2015

DESCRIPTION

Centre for Molecular Biology and Neuroscience, Rikshospitalet-Radiumhospitalet, Oslo, Norway, October 25, 2006

TRANSCRIPT

Cross-species data integration

Lars Juhl Jensen

EMBL Heidelberg

Lars Juhl Jensen

promoter analysis

Jensen et al., Bioinformatics, 2000

genome visualization

Pedersen et al., Journal of Molecular Biology, 2000

protein function prediction

data integration

Jensen et al., Drug Discovery Today: Targets, 2004

cell cycle

temporal interaction network

de Lichtenberg et al., Science, 2005

cross-species comparison

Jensen et al., Nature, 2006

STRING

373 proteomes

Genome Reviews

RefSeq

Ensembl

model organism databases

functional interactions

genomic context methods

gene neighborhood

gene fusion

phylogenetic profiles

Cell

Cellulosomes

Cellulose

correct interactions

wrong associations

gene neighborhood

sum of intergenic distances
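As a rough illustration of the "sum of intergenic distances" raw score, the sketch below adds up the gaps between consecutive genes in a run. The function name, the (start, end) coordinate tuples, and the choice to count overlapping genes as a gap of zero are assumptions made for this example, not details given on the slides.

```python
def sum_intergenic_distances(genes):
    """Sum the gaps between consecutive genes in a run.

    genes: list of (start, end) coordinates (hypothetical input format).
    Overlapping genes contribute a gap of zero (an assumption).
    """
    ordered = sorted(genes)
    total = 0
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        total += max(0, next_start - prev_end - 1)
    return total


# Three genes separated by gaps of 20 and 50 bases.
run = [(100, 400), (421, 900), (951, 1500)]
print(sum_intergenic_distances(run))  # 70
```

A smaller sum means the genes sit closer together, so for this raw score lower values indicate stronger neighborhood evidence.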

gene fusion

sequence similarity

phylogenetic profiles

SVD (Singular Value Decomposition)

Euclidean distance
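The distance step for phylogenetic profiles can be sketched as follows. Each profile is assumed to be a presence/absence vector over the same ordered list of genomes. The SVD step named on the slide is left out here, so this is the plain Euclidean distance on raw profiles, not the full procedure. All names are hypothetical.

```python
import math


def profile_distance(profile_a, profile_b):
    """Euclidean distance between two phylogenetic profiles.

    Each profile lists, per genome, whether the gene is present (1)
    or absent (0). Both must cover the same genomes in the same order.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(profile_a, profile_b)))


gene_x = [1, 1, 0, 1, 0]
gene_y = [1, 1, 0, 1, 0]  # same pattern as gene_x
gene_z = [0, 0, 1, 0, 1]  # opposite pattern

print(profile_distance(gene_x, gene_y))  # 0.0
print(profile_distance(gene_x, gene_z))
```

Genes with similar presence/absence patterns end up with a small distance, which is the raw evidence for a functional link.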

raw quality scores

not comparable

sum of intergenic distances

sequence similarity

Euclidean distance

benchmarking

calibrate vs. gold standard

raw quality scores

probabilistic scores
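One way to turn raw scores into probabilistic ones is to rank the predictions, cut the ranking into bins, and measure what fraction of each bin agrees with the gold standard. The sketch below does that under simplifying assumptions: higher raw scores are better, and every pair missing from the gold standard counts as wrong. The function and parameter names are hypothetical, and the slides do not specify the actual calibration procedure.

```python
def calibrate(scored_pairs, gold_standard, bin_size):
    """Estimate the fraction of correct pairs per bin of raw score.

    scored_pairs: list of (pair, raw_score), higher score assumed better.
    gold_standard: set of pairs treated as true (everything else counts
    as false, which is a simplification).
    Returns a list of (mean_raw_score, fraction_correct), best bin first.
    """
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)
    curve = []
    for i in range(0, len(ranked), bin_size):
        chunk = ranked[i:i + bin_size]
        hits = sum(1 for pair, _ in chunk if pair in gold_standard)
        mean_score = sum(score for _, score in chunk) / len(chunk)
        curve.append((mean_score, hits / len(chunk)))
    return curve


# Toy data with made-up scores.
pairs = [(("A", "B"), 0.9), (("A", "C"), 0.8),
         (("B", "D"), 0.3), (("C", "D"), 0.1)]
gold = {("A", "B"), ("A", "C"), ("B", "D")}
for mean_score, fraction in calibrate(pairs, gold, bin_size=2):
    print(round(mean_score, 2), fraction)
```

The resulting curve maps any raw score to an estimated probability, which is what makes scores from different evidence types comparable.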

curated knowledge

KEGG (Kyoto Encyclopedia of Genes and Genomes)

Reactome

MIPS (Munich Information Center for Protein Sequences)

STKE (Signal Transduction Knowledge Environment)

primary experimental data

many sources

many parsers

physical protein interactions

BIND (Biomolecular Interaction Network Database)

GRID (General Repository for Interaction Datasets)

MINT (Molecular Interactions Database)

DIP (Database of Interacting Proteins)

HPRD (Human Protein Reference Database)

merge data by publication

topology-based scores

von Mering et al., Nucleic Acids Research, 2005

co-expression

GEO (Gene Expression Omnibus)

correlation coefficient
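The slide does not say which correlation coefficient is used. Assuming the Pearson coefficient, the co-expression score for two genes measured across the same experiments can be sketched like this (names and data are hypothetical):

```python
import math


def pearson(x, y):
    """Pearson correlation of two expression vectors of equal length.

    Raises ZeroDivisionError if either vector is constant.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Made-up expression values across four experiments.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]  # rises with gene_a
gene_c = [8.0, 6.0, 4.0, 2.0]  # falls as gene_a rises

print(pearson(gene_a, gene_b))
print(pearson(gene_a, gene_c))
```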

literature mining

different gene identifiers

synonyms lists

MEDLINE

SGD (Saccharomyces Genome Database)

The Interactive Fly

OMIM (Online Mendelian Inheritance in Man)

co-mentioning
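Co-mentioning can be illustrated by counting how many abstracts mention both genes of a pair, after mapping names to identifiers with a synonym list. The sketch below is deliberately naive: it matches single-word names only and uses invented gene identifiers, names, and abstracts.

```python
from collections import Counter
from itertools import combinations


def count_comentions(abstracts, synonyms):
    """Count abstracts in which two genes are mentioned together.

    abstracts: list of text strings.
    synonyms: dict mapping a gene identifier to its known names
    (single-word names only in this sketch).
    """
    counts = Counter()
    for text in abstracts:
        cleaned = text.lower().replace(",", " ").replace(".", " ")
        tokens = set(cleaned.split())
        mentioned = sorted(
            gene for gene, names in synonyms.items()
            if any(name.lower() in tokens for name in names)
        )
        for pair in combinations(mentioned, 2):
            counts[pair] += 1
    return counts


# Hypothetical identifiers and names.
synonyms = {"GENE1": ["abc1", "abcA"], "GENE2": ["xyz2"]}
abstracts = [
    "ABC1 interacts with XYZ2.",
    "The abcA gene and XYZ2 are co-regulated.",
    "XYZ2 was deleted.",
]
print(count_comentions(abstracts, synonyms)[("GENE1", "GENE2")])  # 2
```

Resolving both names of GENE1 to one identifier is what the synonym lists are for: without them the two abstracts would be counted as evidence for different genes.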

NLP (Natural Language Processing)

Gene and protein names

Cue words for entity recognition

Verbs for relation extraction

[nxgene The GAL4 gene]

[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]

calibrate vs. gold standard

combine all evidence
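Once every evidence type has a probabilistic score, the scores for one protein pair can be merged into a single value. The slides do not give the formula. A common choice, assumed here, treats the evidence types as independent and computes the probability that at least one of them is right. The function name is hypothetical.

```python
def combine_scores(scores):
    """Combine per-evidence probabilities assuming independence.

    Returns 1 - product(1 - p) over all scores, the probability that
    at least one line of evidence is correct.
    """
    remaining = 1.0
    for p in scores:
        if not 0.0 <= p <= 1.0:
            raise ValueError("scores must be probabilities in [0, 1]")
        remaining *= 1.0 - p
    return 1.0 - remaining


# Two independent lines of evidence, each 50% reliable.
print(combine_scores([0.5, 0.5]))  # 0.75
```

Under this assumption several weak but independent signals can add up to a confident association, while a single strong signal is never weakened by the others.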

spread over many species

transfer by orthology

von Mering et al., Nucleic Acids Research, 2005
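Transfer by orthology can be pictured as rewriting each scored pair from a source species in terms of the target species' orthologs. The sketch below assumes a simple one-to-one ortholog table and copies scores unchanged. A real scheme would also have to handle many-to-many orthology and down-weight transferred evidence, neither of which is attempted here. All identifiers are made up.

```python
def transfer_by_orthology(interactions, orthologs):
    """Map scored protein pairs from a source to a target species.

    interactions: dict {(protein_a, protein_b): score} in the source.
    orthologs: dict source protein -> target protein (one-to-one,
    a simplifying assumption).
    Pairs with a missing ortholog are dropped; if several source pairs
    map to the same target pair, the highest score is kept.
    """
    transferred = {}
    for (a, b), score in interactions.items():
        if a in orthologs and b in orthologs:
            pair = tuple(sorted((orthologs[a], orthologs[b])))
            transferred[pair] = max(score, transferred.get(pair, 0.0))
    return transferred


source = {("y1", "y2"): 0.9, ("y1", "y3"): 0.7}
ortholog_table = {"y1": "h1", "y2": "h2"}  # y3 has no known ortholog
print(transfer_by_orthology(source, ortholog_table))
```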

two modes

orthologous groups

von Mering et al., Nucleic Acids Research, 2005

fuzzy orthology

von Mering et al., Nucleic Acids Research, 2005

Bayesian scoring scheme

Bork et al., Current Opinion in Structural Biology, 2005

predicting “mode of action”

Jensen et al., Drug Discovery Today: Targets, 2004

NetworKIN

the idea

mass spectrometry

phosphorylation sites

in vivo

kinases are unknown

sequence motifs

kinase families

overprediction

in vitro

protein networks

STRING

context

in vivo

the algorithm
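The idea of pairing sequence motifs with network context can be sketched as a two-step lookup: the motif narrows a phosphorylation site down to a kinase family, and the network picks the family member most closely associated with the substrate. This is a simplified reading of the slides, not the published algorithm. All names and scores below are invented.

```python
def assign_kinase(motif_scores, kinase_families, network_scores):
    """Pick a kinase for one phosphorylation site.

    motif_scores: dict kinase family -> motif match score for the site.
    kinase_families: dict kinase -> family it belongs to.
    network_scores: dict kinase -> association score with the substrate.
    Returns the best-connected kinase of the best-matching family,
    or None if no kinase of that family has network evidence.
    """
    best_family = max(motif_scores, key=motif_scores.get)
    candidates = [
        kinase for kinase, family in kinase_families.items()
        if family == best_family and kinase in network_scores
    ]
    if not candidates:
        return None
    return max(candidates, key=network_scores.get)


motif_scores = {"famA": 0.8, "famB": 0.3}
kinase_families = {"A1": "famA", "A2": "famA", "B1": "famB"}
network_scores = {"A1": 0.2, "A2": 0.9, "B1": 0.95}
print(assign_kinase(motif_scores, kinase_families, network_scores))  # A2
```

B1 has the strongest network link but belongs to the wrong family, so it is not considered. The motif alone cannot separate A1 from A2, and the network context settles the choice.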

benchmarking

Phospho.ELM

ATM signaling

ATM phosphorylates Rad50

summary

integration

high-throughput data

computational methods

biological discoveries

hypotheses

highly specific

testable

Acknowledgments

The STRING team (EMBL)

– Christian von Mering

– Berend Snel

– Martijn Huynen

– Sean Hooper

– Samuel Chaffron

– Julien Lagarde

– Mathilde Foglierini

– Peer Bork

Literature mining project (EML Research)

– Jasmin Saric

– Rossitza Ouzounova

– Isabel Rojas

Cell cycle project (CBS)

– Ulrik de Lichtenberg

– Thomas Skøt Jensen

– Søren Brunak

The NetworKIN project

– Rune Linding

– Gerard Ostheimer

– Francesca Diella

– Karen Colwill

– Jing Jin

– Rob Russell

– Michael Yaffe

– Tony Pawson
