
Joint EBI-Wellcome Trust Summer School, 14-18 June 2010. Concepts, historical milestones & the central place of bioinformatics in modern biology: a European perspective.


Page 1: Joint EBI-Wellcome Trust

Joint EBI-Wellcome Trust

Summer School, 14-18 June 2010

Page 2: Joint EBI-Wellcome Trust

04/21/23 2

Concepts, historical milestones & the central place of bioinformatics in modern biology: a European perspective

Teresa K. Attwood, University of Manchester

Page 3: Joint EBI-Wellcome Trust

Concepts, historical milestones & the central place of bioinformatics in modern biology: a personal perspective from a European


Page 5: Joint EBI-Wellcome Trust

Overview

• Where the concept of bioinformatics originated
• Some key milestones & key people
• Its place in ‘the new biology’

Page 6: Joint EBI-Wellcome Trust

Disclaimer

• Bear in mind that this is a personal view
• That it’s hard
  – to step out of a situation & look back in
    • & remain objective
  – to separate the European & American histories
• Observers from different perspectives will see & tell the story differently!
• So this is just my perspective…
  – & it’s bound up with sequences & dbs

Page 7: Joint EBI-Wellcome Trust

Origin of bioinformatics

• The origins of bioinformatics are rooted in sequence analysis
• And driven by the desire to
  – collect sequences
  – annotate them
  – & analyse them
    • systematically (i.e., using computers)!

The concept ‘bioinformatics’ was barely known before 1990…

Page 8: Joint EBI-Wellcome Trust

Key milestones

[Timeline figure, 1950-2020. Labels: insulin; ribonuclease; Dayhoff Atlas; ARPAnet]

Page 9: Joint EBI-Wellcome Trust

Margaret Dayhoff, 1925-1983

• Pioneered development of computer methods to compare protein sequences
  – & to derive evolutionary histories from alignments
• Particularly interested in deducing evolutionary connections from sequence evidence

Page 10: Joint EBI-Wellcome Trust

Margaret Dayhoff

• Collected all the known protein sequences
  – made them available to the scientific community
• In 1965, she compiled a book
  – the 1st Atlas of Protein Sequence and Structure

Page 11: Joint EBI-Wellcome Trust

Margaret Dayhoff

Page 12: Joint EBI-Wellcome Trust

Key milestones

[Timeline figure, 1950-2020. Labels: insulin; ribonuclease; Dayhoff Atlas; ARPAnet; 65 sequences; Auto protein sequencers; email; DNA sequencing; PDB; Auto DNA sequencing; Internet; 7 structures]

Page 13: Joint EBI-Wellcome Trust

Data overload in the USA

Page 14: Joint EBI-Wellcome Trust

Data overload in the USA

Page 15: Joint EBI-Wellcome Trust

Data overload in Europe

• The data overload problem had also been noticed in Europe
• The solution was to create the 1st nucleotide sequence database
  – this was the EMBL databank
    • this preceded the 1st release of GenBank by ~6 months

Page 16: Joint EBI-Wellcome Trust

Key milestones

[Timeline figure, 1950-2020. Labels: insulin; ribonuclease; Dayhoff Atlas; ARPAnet; 65 sequences; Auto protein sequencers; email; DNA sequencing; PDB; Auto DNA sequencing; EMBL, GenBank; 568 sequences; PIR-PSD; 859 sequences; Internet; 7 structures]

Page 17: Joint EBI-Wellcome Trust

Enter Amos Bairoch

• A crazy postgrad student in Switzerland
  – interested in space exploration & the search for ET life
• His project was to develop software to analyse protein & nucleotide sequences
  – PC/Gene

Page 18: Joint EBI-Wellcome Trust

Amos Bairoch

• He published his 1st paper in 1982
• A letter to the Biochemical Journal (BJ) suggesting the use of checksums to “facilitate the detection of typographical & keyboard errors”
  – a true computer nerd!
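The checksum idea can be sketched in a few lines. The slide does not say which checksum scheme the letter proposed, so this example uses CRC-32 from Python's standard library purely as a stand-in; the function name and the sequence are made up for illustration.

```python
import zlib


def sequence_checksum(sequence: str) -> str:
    """Return an 8-hex-digit CRC-32 of a sequence.

    Whitespace and letter case are ignored, so layout differences
    introduced while retyping do not change the checksum.
    CRC-32 is an assumption here, not the scheme from the 1982 letter.
    """
    normalised = "".join(sequence.split()).upper()
    return format(zlib.crc32(normalised.encode("ascii")), "08X")


# Hypothetical sequence as it might appear in print.
published = "MKTAYIAKQRQISFVKSHFSRQ"
# Retyped with different spacing and case, but no typing errors.
retyped_ok = "mktayiakqr qisfvkshfs rq"
# Retyped with a single wrong residue at the end (Q -> G).
retyped_typo = "MKTAYIAKQRQISFVKSHFSRG"

print(sequence_checksum(published) == sequence_checksum(retyped_ok))    # True
print(sequence_checksum(published) == sequence_checksum(retyped_typo))  # False
```

If a checksum is printed alongside a sequence, anyone retyping it can recompute the value and spot a keyboard error at once, which is the use the letter had in mind.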

Page 19: Joint EBI-Wellcome Trust

Amos Bairoch

• Why did he do this?
• In the process of developing PC/Gene, he typed in >1,000 protein sequences
  – some from the literature, most from the Atlas
    • by 1981, this was a large book & several supplements, & listed 1,660 proteins
    • it was not then available electronically

Page 20: Joint EBI-Wellcome Trust

Amos Bairoch

• In 1983, he acquired a computer tape of the EMBL databank
  – this was version 2, with 811 sequences
• In 1984, he received the 1st available computer tape copy of the Atlas
  – (which quickly became the PIR-PSD)
  – but he was deeply unhappy with the PIR format

Page 21: Joint EBI-Wellcome Trust

Amos Bairoch

• So he decided to convert the PIR database into the semi-structured format of EMBL
  – part manually & part automatically
  – the result was PIR+
  – it was distributed as part of PC/Gene (now commercial)
• In summer 1986, he decided to release the database independently of PC/Gene
  – so that it would be available to all, free of charge

Page 22: Joint EBI-Wellcome Trust

Amos Bairoch

• The new database was called Swiss-Prot
• The 1st release was made on 21 July 1986
  – the exact number of entries is unknown, as he can’t find the original floppy disks!

Page 23: Joint EBI-Wellcome Trust

Key milestones

[Timeline figure, 1950-2020. Labels: insulin; ribonuclease; Dayhoff Atlas; ARPAnet; 65 sequences; Auto protein sequencers; email; DNA sequencing; PDB; Auto DNA sequencing; EMBL, GenBank; 568 sequences; PIR; DDBJ, Swiss-Prot; 859 sequences; ~3,900 sequences; PROSITE; PRINTS; 58 entries; 30 entries; Internet; 7 structures]

Page 24: Joint EBI-Wellcome Trust

Global data overload

• The number of sequences was growing
• The number of structures was growing
• So was the number of protein family signatures
• Two extraordinary developments had yet to take place
  – what were they?

Page 25: Joint EBI-Wellcome Trust

Key milestones

[Timeline figure, 1950-2020. Labels: insulin; ribonuclease; Dayhoff Atlas; ARPAnet; 65 sequences; Auto protein sequencers; email; DNA sequencing; PDB; Auto DNA sequencing; EMBL, GenBank; 568 sequences; PIR; DDBJ, Swiss-Prot; 859 sequences; ~3,900 sequences; PROSITE; PRINTS; 58 entries; 30 entries; Internet; 7 structures; www; FlyBase]

Page 26: Joint EBI-Wellcome Trust

Key milestones

[Timeline figure, 1950-2020. Labels: insulin; ribonuclease; Dayhoff Atlas; ARPAnet; 65 sequences; Auto protein sequencers; email; DNA sequencing; PDB; Auto DNA sequencing; EMBL, GenBank; 568 sequences; PIR; DDBJ, Swiss-Prot; 859 sequences; ~3,900 sequences; PROSITE; PRINTS; 58 entries; 30 entries; Internet; 7 structures; HT DNA sequencing; www; H.influenzae genome; M.jannaschii genome; S.cerevisiae genome; D.melanogaster genome; H.sapiens genome; C.elegans genome; FlyBase; Pfam; InterPro; 2,423 entries; TrEMBL; 70,000 sequences]

Page 27: Joint EBI-Wellcome Trust

Original InterPro partners

[Diagram labels: InterPro; Pfam; Profiles; ProDom; PRINTS; Prosite]

Page 28: Joint EBI-Wellcome Trust

What is InterPro?

“InterPro is an integrated documentation resource for protein families, domains & sites. By uniting databases that use different methodologies & a varying degree of biological information, InterPro capitalises on their individual strengths, producing a powerful integrated database & diagnostic tool.”

Page 29: Joint EBI-Wellcome Trust

The vision?

• Naïvely, we wanted to make life easier!
• We aimed to
  – simplify & rationalise protein family analysis
  – centralise & streamline the annotation process
    • & reduce manual annotation burdens
  – &, in the wake of all the genome projects, to facilitate automatic functional annotation of uncharacterised proteins

In fact (& now with 11 partners) we made life a lot harder! But that’s another story…

Page 30: Joint EBI-Wellcome Trust

Key milestones

[Timeline figure, 1950-2020. Labels: insulin; ribonuclease; Dayhoff Atlas; ARPAnet; 65 sequences; Auto protein sequencers; email; DNA sequencing; PDB; Auto DNA sequencing; EMBL, GenBank; 568 sequences; PIR; DDBJ, Swiss-Prot; 859 sequences; ~3,900 sequences; PROSITE; PRINTS; 58 entries; 30 entries; Internet; 7 structures; HT DNA sequencing; www; H.influenzae genome; M.jannaschii genome; S.cerevisiae genome; D.melanogaster genome; H.sapiens genome; C.elegans genome; FlyBase; Pfam; InterPro; 2,423 entries; TrEMBL; 70,000 sequences]

Page 31: Joint EBI-Wellcome Trust

Key milestones

[Timeline figure, 1950-2020. Labels: insulin; ribonuclease; Dayhoff Atlas; ARPAnet; 65 sequences; Auto protein sequencers; email; DNA sequencing; PDB; Auto DNA sequencing; EMBL, GenBank; 568 sequences; PIR; DDBJ, Swiss-Prot; 859 sequences; ~3,900 sequences; PROSITE; PRINTS; 58 entries; 30 entries; Internet; 7 structures; HT DNA sequencing; www; H.influenzae genome; M.jannaschii genome; S.cerevisiae genome; D.melanogaster genome; H.sapiens genome; C.elegans genome; FlyBase; InterPro; Pfam; TrEMBL; 70,000 sequences; UniProt; 2,423 entries]

Page 32: Joint EBI-Wellcome Trust

Key milestones

[Timeline figure, 1950-2020. Labels: insulin; ribonuclease; Dayhoff Atlas; ARPAnet; 65 sequences; Auto protein sequencers; email; DNA sequencing; PDB; Auto DNA sequencing; EMBL, GenBank; 568 sequences; PIR; DDBJ, Swiss-Prot; 859 sequences; ~3,900 sequences; PROSITE; PRINTS; 58 entries; 30 entries; Internet; 7 structures; HT DNA sequencing; www; H.influenzae genome; M.jannaschii genome; S.cerevisiae genome; D.melanogaster genome; H.sapiens genome; C.elegans genome; FlyBase; InterPro; Pfam; TrEMBL; 70,000 sequences; UniProt; 2,423 entries; 10,867,798 sequences; 185,231,366 sequences; ENA; 517,100 sequences]

Page 33: Joint EBI-Wellcome Trust

Key milestones

[Timeline figure, 1950-2020. Labels: insulin; ribonuclease; Dayhoff Atlas; ARPAnet; 65 sequences; Auto protein sequencers; email; DNA sequencing; PDB; Auto DNA sequencing; EMBL, GenBank; 568 sequences; PIR; DDBJ, Swiss-Prot; 859 sequences; ~3,900 sequences; PROSITE; PRINTS; 58 entries; 30 entries; Internet; 7 structures; HT DNA sequencing; www; H.influenzae genome; M.jannaschii genome; S.cerevisiae genome; D.melanogaster genome; H.sapiens genome; C.elegans genome; FlyBase; InterPro; Pfam; TrEMBL; 70,000 sequences; UniProt; 2,423 entries; 10,867,798 sequences; ENA; 517,100 sequences; 185,231,366 sequences; hundreds more; billions more; hundreds more]

Page 34: Joint EBI-Wellcome Trust

The central place of bioinformatics in modern biology

• Hopefully, this potted history speaks for itself
• In the last 30 years, bioinformatics has given us
  – the first ‘complete’ catalogues of DNA & protein sequences
    • including genomes & proteomes of organisms across the Tree of Life
  – software to analyse biological data on an unprecedented scale
  – & hence tools to help understand
    • more about evolutionary processes in general
    • our place on the Tree of Life in particular
    • &, ultimately, more about health & disease
• It isn’t a panacea, but its contribution has been huge

Page 35: Joint EBI-Wellcome Trust

Recommended reading

A.B.Richon. A short history of bioinformatics (http://www.netsci.org/Science/Bioinform/feature06.html)

A.Bairoch (2000) Serendipity in bioinformatics, the tribulations of a Swiss bioinformatician through exciting times. Bioinformatics, 16(1), 48-64.

M.Ashburner (2006) Won for all – How the Drosophila genome was sequenced. Cold Spring Harbor Laboratory Press.

B.J.Strasser (2008) GenBank – Natural history in the 21st century? Science, 322, 537-538.