Concepts, historical milestones & the central place of bioinformatics in modern biology: a European perspective

07/04/22 1 Teresa K.Attwood University of Manchester


Post on 16-Jan-2016


TRANSCRIPT

Page 1: Concepts, historical milestones & the central place of bioinformatics in modern biology:

Teresa K. Attwood, University of Manchester

Page 2

• Where the term bioinformatics originated

• Where the ‘modern’ concept originated

• Some key events & folk

• Its place in ‘the new biology’

Page 3

• The origin of the term ‘bioinformatics’ has been attributed to Paulien Hogeweg

– Dutch theoretical biologist

• She & colleague Ben Hesper coined the term in the early ‘70s, defining it as

– “the study of informatic processes in biotic systems”

• Hogeweg, P. (2011) The roots of bioinformatics in theoretical biology. PLoS Computational Biology, 7(3), e1002021

• The term failed to gain traction for ~20 years

Page 4

• The origins of the ‘modern’ concept of bioinformatics are rooted in sequence analysis

• Driven by the desire to

– collect

– annotate

– & analyse sequence data

• systematically (i.e., using computers)!

This concept of ‘bioinformatics’ was barely known pre 1990…

Page 5

[Timeline figure, 1950 to 2010, marking: insulin, ribonuclease, Dayhoff Atlas, CSD]

GIVEQCCASVCSLYQLENYCN

FVNQHLCGSHLVEALYLVCGERGFFYTPKA

Page 6

• Pioneer of computer methods to compare proteins

– & to derive evolutionary histories from alignments

• Particular interest in deducing evolutionary connections from sequence evidence

Page 7

• Collected all the known protein sequences

– made them available to the scientific community

• In 1965, she compiled a book

– Atlas of Protein Sequence & Structure

Page 8

“There is a tremendous amount of information regarding the evolutionary history and biochemical function implicit in each sequence and the number of known sequences is growing explosively. We feel it is important to collect this significant information, correlate it into a unified whole and interpret it”

M.O. Dayhoff to C. Berkley, February 27, 1967

Strasser, B. (2008) “GenBank – Natural history in the 21st century?” Science, 322, 537-538

Page 9

[Timeline figure, 1950 to 2010, marking: insulin, ribonuclease, Dayhoff Atlas, CSD, ARPAnet, PDB, auto protein sequencers, DNA sequencing (two markers), auto DNA sequencing. Counts shown: 65, 7]

Exam 1

What pernicious, life-changing development occurred in 1971?

Page 10

“the rate limiting step in the process of nucleic acid sequencing is now shifting from data acquisition towards the organization and analysis of that data”

Gingeras, T.R. & Roberts, R.J. (1980) “Steps toward Computer Analysis of Nucleotide Sequences,” Science, 209, 1322-1328

Page 11

“a centralized data bank [is] essential for the efficient use of nucleic acid sequence information”

C. Anderson, Minutes, 1980

Page 12

• While the US debated where to locate a new centralised resource, EMBL acted…

• The 1st internationally funded, public ‘central’ nucleotide sequence database was thus European

– the EMBL data library, Heidelberg

• preceded the 1st release of GenBank by ~6 months

Attwood, T.K. et al. (2011) Concepts, Historical Milestones & the Central Place of Bioinformatics in Modern Biology: A European Perspective. In Bioinformatics - Trends & Methodologies, Intech Online Publishers

Page 13

• Copies of the EMBL data library & GenBank were being maintained in Cambridge

– together with their search tools, etc.

• An integrated system gave access to the dbs & tools

– “this system is presently being used by over 30 researchers in 8 departments in the University & in local research institutes. These users can keep in touch with each other via the MAIL command”!

Page 14

[Timeline figure, 1950 to 2010, as on page 9, adding: email, Internet, EMBL, GenBank, PIR. Counts shown: 65, 7, 568, 859]

Page 15

• A crazy postgrad student in Switzerland

– interested in space exploration & the search for ET life

• His project was to develop s/w to analyse protein & nucleotide sequences

– PC/Gene

Page 16

• Published his 1st paper in 1982

– a letter to the BJ

• Suggested use of checksums

– “to facilitate detection of typographical & keyboard errors”
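The checksum idea can be illustrated with a short sketch. The slides do not describe the scheme proposed in the 1982 letter, so this example swaps in CRC-32 from Python's standard `zlib` module purely for illustration; the function names `sequence_checksum` and `verify` are hypothetical.

```python
import zlib


def sequence_checksum(sequence: str) -> str:
    """Return an 8-digit hexadecimal CRC-32 of a sequence.

    Whitespace and letter case are ignored, so reformatting a
    sequence does not change its checksum. CRC-32 is an assumption
    made for this sketch, not the scheme from the 1982 letter.
    """
    normalised = "".join(sequence.split()).upper()
    return format(zlib.crc32(normalised.encode("ascii")), "08X")


def verify(sequence: str, expected: str) -> bool:
    """Check a (possibly retyped) sequence against a stored checksum."""
    return sequence_checksum(sequence) == expected.upper()


original = "GIVEQCCASVCSLYQLENYCN"   # example protein sequence
stored = sequence_checksum(original)  # published alongside the sequence

retyped = "GIVEQCCASVCSLYQLENYNC"    # last two residues transposed by mistake
print(verify(original, stored))  # True
print(verify(retyped, stored))   # False
```

Storing the checksum next to each entry lets anyone who retypes or transfers a sequence confirm that nothing was altered along the way.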

Page 17

• Why?

• Alongside PC/Gene, he needed to supply a db

• The Atlas wasn’t available electronically

– typed in >1,000 protein sequences

– some from the literature

– most from the Atlas

• by 1981, this was a large book, plus several supplements, listing 1,660 proteins

Page 18

• In 1983, he acquired a computer tape of the EMBL Data Library

– version 2, with 811 sequences

• In 1984, he received the 1st available computer tape copy of the Atlas

– (which became known as the PIR-PSD)

– but… he disliked the PIR format

Page 19

• So he converted the PIR database into the semi-structured format of EMBL

– part manually & part automatically

• The result was PIR+

– & was distributed as part of PC/Gene (now commercial)

• In summer 1986, he finally released the database independently of PC/Gene

– to make it available to all, free of charge

Page 20

• This new database was called Swiss-Prot

• 1st released on 21 July 1986

– the exact number of entries is unknown, as he lost the original floppy disks!

Page 21

• As part of his work on PC/Gene, he created another key database

– diagnostic tool for characterising protein families

• 1st released March 1989, with 58 entries

– this was PROSITE

• Philosophy of his approach

– coupling high quality data analysis with manual annotation

Page 22

PRINTS

[IVM]-[AS]-L-W-S-L-V2-L-A-[IV]-E-R-Y-[IV]3-C-K-P-M PROSITE
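A pattern like the one on this slide can be turned into a regular expression and scanned against sequences. The sketch below assumes that `V2` and `[IV]3` are repeat counts flattened during extraction, written `V(2)` and `[IV](3)` in standard PROSITE syntax. The helper `prosite_to_regex` and the test sequence are hypothetical, and only the basic syntax elements are handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchor at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchor at the C-terminus
            suffix, element = "$", element[:-1]
        # Split an element such as "V(2)" or "x(2,4)" into core and repeat.
        match = re.fullmatch(r"([^()]+)(?:\((\d+)(?:,(\d+))?\))?", element)
        if match is None:
            raise ValueError(f"cannot parse pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":                  # any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # any residue except those listed
        repeat = ""
        if low and high:
            repeat = "{" + low + "," + high + "}"
        elif low:
            repeat = "{" + low + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


# Slide pattern, with the assumed repeat counts restored.
SLIDE_PATTERN = "[IVM]-[AS]-L-W-S-L-V(2)-L-A-[IV]-E-R-Y-[IV](3)-C-K-P-M"
regex = re.compile(prosite_to_regex(SLIDE_PATTERN))

sequence = "GGMALWSLVVLAIERYVVVCKPMGG"  # made-up test sequence
hit = regex.search(sequence)
print(hit.start(), hit.group())
```

Square brackets list the residues allowed at a position, so a single pattern can describe a conserved motif across many members of a protein family.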

Page 23

• Database annotation…

[Figure labels: Database maintenance, Database annotation, Nirvana]

Page 24

“It is quite depressive to think that we are spending millions in grants for people to perform experiments, produce new knowledge, hide this knowledge in often badly written text and then spend some more millions trying to second guess what the authors really did and found”

Bairoch, A. (2009) The future of annotation/biocuration. Nature Precedings

Page 25

[Timeline figure, 1950 to 2010, as on page 14, adding: Swiss-Prot, PROSITE, PRINTS. Counts shown: 65, 7, 568, 859, 3,900]

Page 26

• The number of sequences was growing

• The number of structures was growing

• The number of protein family signatures was growing

Exam 2

Two extraordinary developments had yet to take place. What were they?

Page 27

[Timeline figure, 1950 to 2010, as on page 25, adding: HT DNA sequencing, www, H. influenzae genome, S. cerevisiae genome, D. melanogaster genome, H. sapiens genome, C. elegans genome, FlyBase, TrEMBL, Pfam, InterPro. Counts shown: 65, 7, 568, 859, 3,900, 105,000, 2,423]

Page 28

[Figure: InterPro, shown with ProDom, PRINTS, Prosite, PANTHER, SMART, HAMAP, PIRSF, TIGRFAM, SUPERFAMILY, Gene3D, Pfam, Profiles]

Page 29

[Timeline figure, 1950 to 2010, as on page 27, adding: ENA, UniProt, ELIXIR, SIB, EBI, EMBnet, NCBI. Counts shown: 65, 7, 568, 859, 3,900, 105,000, 2,423, >500B, 36.0M]

Page 30

[Timeline figure, 1950 to 2010, as on page 29, with added labels: thousands more, billions more, hundreds more]

Page 31

[Growth chart]

Red line: growth of EMBL since its inception

Green line: growth of manually annotated Swiss-Prot

Blue line: growth of PDB

By 2020, NGS & 3Gen technologies will be producing data a million times faster than the current rate

Values marked on the chart: 282 M, 540 K, 35 M, 84 K

Page 32

• Hopefully, this potted history speaks for itself

• In the last 30 years, bioinformatics has given us

– the first ‘complete’ catalogues of DNA & protein sequences

• including genomes & proteomes of organisms across the Tree of Life

– software to analyse biological data on an unprecedented scale

– & hence tools to help understand

• more about evolutionary processes in general

• our place on the Tree of Life in particular

• &, ultimately, more about health & disease

• It isn’t a panacea, but its contribution has been huge

Page 33

Recommended reading

Richon, A.B. A short history of bioinformatics (http://www.netsci.org/Science/Bioinform/feature06.html)

Bairoch, A. (2000) Serendipity in bioinformatics, the tribulations of a Swiss bioinformatician through exciting times. Bioinformatics, 16(1), 48-64.

Ashburner, M. (2006) Won for all – How the Drosophila genome was sequenced. Cold Spring Harbor Lab. Press

Strasser, B.J. (2008) GenBank – Natural history in the 21st century? Science, 322, 537-538.

Attwood, T.K., Gisel, A., Eriksson, N-E. & Bongcam-Rudloff, E. (2011) Concepts, Historical Milestones and the Central Place of Bioinformatics in Modern Biology: A European Perspective