Introduction to Genomics and the Tree of Life Friday, October 22, 2010 (part 1) Monday, October 25, 2010 (part 2) Genomics 260.605.01 J. Pevsner pevsner@kennedykrieger.org


Page 1: Introduction to Genomics and the Tree of Life Friday, October 22, 2010 (part 1) Monday, October 25, 2010 (part 2) Genomics 260.605.01 J. Pevsner pevsner@kennedykrieger.org

Introduction to Genomics and the Tree of Life

Friday, October 22, 2010 (part 1)
Monday, October 25, 2010 (part 2)

Genomics 260.605.01
J. Pevsner

pevsner@kennedykrieger.org

Page 2:

Copyright notice

Many of the images in this PowerPoint presentation are from Bioinformatics and Functional Genomics, 2nd edition, by J. Pevsner (© 2009 by Wiley-Blackwell).

These images and materials may not be used without permission from the publisher (instructors, email me at pevsner@kennedykrieger.org).

Visit http://www.bioinfbook.org

Page 3:

Announcements: where/when we meet

We meet 3 times a week, from 10:30 to 11:50 am:

W4013 (lecture/discussion and occasional computer lab)

Page 4:

Announcements: book, website

Textbook: Bioinformatics and Functional Genomics (2nd edition, Wiley-Blackwell, 2009) by J. Pevsner, ISBN 978-0-470-08585-1.

• We’ll cover chapters 13-20 in this course
• For those who don’t want to buy a copy, I will share pdfs of all the chapters with the class
• You can buy a copy at the website www.bioinfbook.org and get a nice discount ($80). It’s $80 at Amazon.com.
• The JHU bookstore may have copies.
• Welch Library may have copies

Book’s website: www.bioinfbook.org
Course website: http://www.bioinfbook.org/genomics.php or visit www.bioinfbook.org/chapter 13

Page 5:

Outline of this course

Introduction to genomics
Viruses
Bacteria and archaea (Egbert Hoiczyk)
Eukaryotes

The eukaryotic chromosome
Fungi; yeast functional genomics (Jef Boeke)
Protozoans (David Sullivan)
Nematodes (Al Scott)
Mosquitoes (George Dimopoulos)
Rodents: mouse and rat
Primates
The human genome (Dave Valle)
Human disease

Page 6:

Outline of today’s lecture

Introduction: 5 perspectives, history of life

Genome-sequencing projects: chronology

Genome analysis: criteria, resequencing, metagenomics

DNA sequencing technologies: Sanger, 454, Solexa

Process of genome sequencing: centers, repositories

Genome annotation: features, prokaryotes, eukaryotes

Page 7:

Five approaches to genomics

As we survey the tree of life, consider these perspectives:

Approach I: cataloguing genomic information
Genome size; number of chromosomes; GC content; isochores; number of genes; repetitive DNA; unique features of each genome

Approach II: cataloguing comparative genomic information
Orthologs and paralogs; COGs; lateral gene transfer

Approach III: function; biological principles; evolution
How genome size is regulated; polyploidization; birth and death of genes; neutral theory of evolution; positive and negative selection; speciation

Approach IV: Human disease relevance

Approach V: Bioinformatics aspects
Algorithms, databases, websites

Page 519
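
Approach I lists GC content and isochores among the features to catalogue. As a rough illustration, the sketch below computes the GC fraction of a sequence and of sliding windows along it. The function names, the window settings, and the toy sequence are assumptions made for this example; they do not come from the course or the textbook.

```python
def gc_content(sequence: str) -> float:
    """Return the fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return gc / total


def windowed_gc(sequence: str, window: int, step: int) -> list[float]:
    """GC fraction in sliding windows, a crude view of regional GC variation."""
    return [
        gc_content(sequence[start:start + window])
        for start in range(0, len(sequence) - window + 1, step)
    ]


toy = "ATGCGCGCATATATATGGCC"  # made-up sequence, not a real genome
print(gc_content(toy))
print(windowed_gc(toy, window=10, step=5))
```

On a real genome the windows would typically span thousands of bases or more; the tiny window here only keeps the example readable.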

Page 8:

Two projects for this course

Option [1] Select a genome and describe it in detail.

Option [2] Select a gene and describe it in detail.

For each, follow the five approaches just outlined, and apply the principles that we learn in this course.

Page 9:

Reading: Webb Miller et al. (2004) Comparative genomics

Introduction
Lessons learned from comparative genomics
  What have we learned about genes by comparing genomic sequences?
  What have we learned about regulation?
  About 5% of the human genome is under purifying selection
  Positively regulated regions
  Mechanisms and history of mammalian evolution
  Nonuniformity of neutral evolutionary rates within species
  Nonuniformity of evolution along the branches of phylogeny
Learning more from existing data
  Choice of species
  Choice of tools
Future of comparative genomics

Page 10:

Levels of analysis in genomics

level        topics                  databases
DNA          genes, chromosomes      GenBank
RNA          ESTs, ncRNA             UniGene, GEO
protein      ORFs, composition       UniProt
complexes    binary, multimeric      BIND
pathways                             COGs, KEGG
organelles
organs
individuals  variation and disease   HapMap
species      speciation              TaxBrowser; SGD
genus                                JAX mouse
phylum                               FishBase
kingdom                              TOL

Page 11:

Definitions of terms

Genomics is the study of genomes (the DNA comprising an organism) using the tools of bioinformatics.

Bioinformatics is the study of proteins, genes, and genomes using computer algorithms and databases.

Systematics is the scientific study of the kinds and diversity of organisms and of any and all relationships among them.

Classification is the ordering of organisms into groups on the basis of their relationships. The relationships may be evolutionary (phylogenetic) or may refer to similarities of phenotype (phenetic).

Taxonomy is the theory and practice of classifying organisms.

Page 12:

Outline of today’s lecture

Introduction: 5 perspectives, history of life: trees

Genome-sequencing projects: chronology

Genome analysis: criteria, resequencing, metagenomics

DNA sequencing technologies: Sanger, 454, Solexa

Process of genome sequencing: centers, repositories

Genome annotation: features, prokaryotes, eukaryotes

Page 13:

Pace (2001) described a tree of life based on small subunit rRNA sequences.

This tree shows the main three branches described by Woese and colleagues.

Fig. 13.1
Page 521

Page 14:

Introduction: Systematics

Ernst Haeckel (1834-1919), a supporter of Darwin, published a tree of life (1879) including Monera (formless clumps, later named bacteria).

Page 520

Page 15:

Five kingdom system (Haeckel, 1879)

[Figure labels: plants, animals, monera, fungi, protists, protozoa, invertebrates, vertebrates, mammals]

Page 516

Page 16:

Introduction: Systematics

Chatton (1937) distinguished prokaryotes (bacteria that lack nuclei) from eukaryotes (having nuclei).

Whittaker (1969) and others described the five-kingdom system: animals, plants, protists, fungi, and monera.

In the 1970s and 1980s, Carl Woese and colleagues described the archaea, thus forming a tree of life with three main branches: archaea, bacteria, eukaryotes.

Page 520

Page 17:

Whittaker RH (1969) New concepts of kingdoms of organisms. Evolutionary relations are better represented by new classifications than by the traditional two kingdoms. Science. 163(863):150-60.

Whittaker (1969): The two-kingdom system as it might have appeared in the early 1900s

Plantae Animalia

Page 18:

Whittaker RH (1969) New concepts of kingdoms of organisms. Evolutionary relations are better represented by new classifications than by the traditional two kingdoms. Science. 163(863):150-60.

The Copeland four-kingdom system of the 1930s-1950s

[Figure labels: Monera, Metaphyta, Metazoa, Protoctista; axis labels: Prokaryotic, Eukaryotic, Unicellular, Multicellular]

Page 19:

Whittaker RH (1969) New concepts of kingdoms of organisms. Evolutionary relations are better represented by new classifications than by the traditional two kingdoms. Science. 163(863):150-60.

Whittaker (1969): The five-kingdom system

Plantae Fungi Animalia

Monera

Protista

Levels:
prokaryotic (Monera)
eukaryotic unicellular
eukaryotic multicellular

Page 20:

Molecular sequences as basis of trees

Historically, trees were generated primarily using characters provided by morphological data. Molecular sequence data are now commonly used, including sequences (such as small-subunit RNAs) that are highly conserved.

Visit the European Small Subunit Ribosomal RNA database for 20,000 SSU rRNA sequences.

Page 523

Page 21:

Pace (2001) described a tree of life based on small subunit rRNA sequences.

This tree shows the main three branches described by Woese and colleagues. It is the best currently accepted model of the tree of life.

Fig. 13.1
Page 521

Page 22:

Tree of life from David Hillis’ lab (based on ~3000 rRNAs)

http://www.zo.utexas.edu/faculty/antisense/Download.html

[Figure labels: animals, plants, fungi, protists, bacteria, archaea; "you are here"]

10-10

Page 23:

http://www.zo.utexas.edu/faculty/antisense/Download.html

you are here

Tree of life from David Hillis’ lab (based on ~3000 rRNAs)

10-10

Page 24:

Ribosomal RNA Database

Ribosomal Database Project
http://rdp.cme.msu.edu/index.jsp

Santos, S. R. and Ochman H. Identification and phylogenetic sorting of bacterial lineages with universally conserved genes and proteins. Environmental Microbiology. 2004 Jul;6(7):754-9.

► Download fusA (translation elongation factor 2 [EF-2])
► Obtain DNA in the fasta format
► Align by ClustalW in MEGA
► Create a neighbor-joining tree

Page 524
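
The fusA workflow on this slide runs in MEGA. To give a feel for the input a distance method such as neighbor-joining starts from, here is a minimal sketch that parses aligned FASTA text and computes pairwise p-distances (the proportion of differing sites). It is not MEGA's implementation. The taxon names and sequences are invented for illustration, and the tree-building step itself is left out.

```python
from itertools import combinations


def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA-formatted text into {header: sequence}."""
    records: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, skipping gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in compared) / len(compared)


# Hypothetical aligned sequences (not real fusA data)
aligned = """\
>taxon_A
ATGGCTCGTA
>taxon_B
ATGGCTCGTT
>taxon_C
ATGACT-GAA
"""
records = parse_fasta(aligned)
distances = {
    (n1, n2): p_distance(records[n1], records[n2])
    for n1, n2 in combinations(records, 2)
}
for pair, d in distances.items():
    print(pair, round(d, 3))
```

A neighbor-joining program would take a matrix like `distances` (usually after correcting for multiple substitutions) and repeatedly join the pair of taxa that minimizes total branch length.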

10-10

Page 25:
Page 26:

European Small Subunit Ribosomal RNA database (http://www.psb.ugent.be/rRNA/ssu/)

10-10

Page 27:

Neighbor-joining tree of ~150 fusA (GTPase) DNA sequences

[Figure: tree whose leaves are labeled by species and strain (for example, E coli K12 fusA); scale bar 0.05. Labeled groups: Rickettsia, Treponema, Mycobacterium, Aquifex aeolicus, Yersinia pestis, Clostridium, Mycoplasma, Bac. anthracis]

Page 28:

Fig. 15.1
Page 603

Page 29:
Page 30:

Eukaryotes (Baldauf et al. 2000)

Fig. 18.1
Page 730

Page 31:

Outline of today’s lecture

Introduction: 5 perspectives, history of life: time lines

Genome-sequencing projects: chronology

Genome analysis: criteria, resequencing, metagenomics

DNA sequencing technologies: Sanger, 454, Solexa

Process of genome sequencing: centers, repositories

Genome annotation: features, prokaryotes, eukaryotes

Page 32:

History of life on earth

4.55 BYA     formation of earth (violent 100 MY period)
4.4-3.8 BYA  last ocean-evaporating impacts
3.9 BYA      oldest dated rocks
3.8 BYA      sun brightened to 70% of today’s luminosity

Ammonia, methane, or carbon dioxide atmosphere.
Earliest life: RNA, protein

Source: Schopf J.W. (ed.), Life’s Origins (U. Calif. Press, 2002)
Page 521

Page 33:

History of life on earth: two major eons

Precambrian eon: extends from the formation of the planet to the appearance of fossils of hard-shelled animals 550 MYA

Phanerozoic eon: from Cambrian explosion to the present

[Timeline axis: 4, 3, 2, 1 BYA]

Source: Schopf J.W. (ed.), Life’s Origins (U. Calif. Press, 2002)

Page 34:

[Timeline, billions of years ago (BYA); axis: 4, 3, 2, 1, 0]

Eons: Hadean eon, Archean eon, Proterozoic eon, Phanerozoic eon

Labels: origin of life; earliest fossils; origin of eukaryotes; plant/animal; fungi/animal; insects

Page 522

Page 35:

[Timeline, millions of years ago (MYA); axis: 1000, 500, 100, 0]

Eons: Proterozoic eon, Phanerozoic eon

Labels: deuterostome/protostome; echinoderm/chordate; Cambrian explosion; land plants; insects; Age of Reptiles ends

Page 522

Page 36:

[Timeline, millions of years ago (MYA); axis: 100, 50, 10, 0]

Labels: mass extinction; dinosaurs extinct, mammalian radiation; human/chimp divergence

Page 522

Page 37:

[Timeline, millions of years ago (MYA); axis: 10, 5, 1, 0]

Labels: Homo sapiens/chimp divergence; emergence of Homo erectus; earliest stone tools; Australopithecus (Lucy)

Page 522

Page 38:

[Timeline, years ago; axis: 1,000,000, 500,000, 100,000, 0]

Labels: Homo erectus emerges in Africa; Mitochondrial Eve

Page 523

Page 39:

[Timeline, years ago; axis: 100,000, 50,000, 10,000, 0]

Labels: Neanderthal and Homo erectus disappear; emergence of anatomically modern H. sapiens

Page 523

Page 40:

[Timeline, years ago; axis: 10,000, 5,000, 1,000, 0]

Labels: “Ice Man” from Alps; earliest pyramids; Aristotle

Page 523

Page 41:

[Timeline, years ago; axis: 1,000, 500, 100, 0]

Labels: algebra; calculus; Darwin, Mendel; Gutenberg

Page 523

Page 42:

Page 524

Today’s continents derive from earlier land masses (Laurasia, Gondwana), affecting evolution of species

Page 43:

Outline of today’s lecture

Introduction: 5 perspectives, history of life: time lines

Genome-sequencing projects: chronology

Genome analysis: criteria, resequencing, metagenomics

DNA sequencing technologies: Sanger, 454, Solexa

Process of genome sequencing: centers, repositories

Genome annotation: features, prokaryotes, eukaryotes

Page 44:

Chronology of genome sequencing projects

We will next summarize the major achievements in genome sequencing projects from a chronological perspective.

Page 525

Page 45:

Chronology of genome sequencing projects

1976: first viral genome
Fiers et al. sequence bacteriophage MS2 (3,569 base pairs, accession NC_001417).

1977:
Sanger et al. sequence bacteriophage φX174. This virus is 5,386 base pairs (encoding 11 genes). See accession J02482; NC_001422.

Page 527

Page 46:

Entrez nucleotide record for bacteriophage φX174 (graphics display)

Fig. 13.5
Page 528

Page 47:

Chronology of genome sequencing projects

1981
Human mitochondrial genome
16,500 base pairs (encodes 13 proteins, 2 rRNA, 22 tRNA)
Today (10/10), over 2200 mitochondrial genomes sequenced

1986
Chloroplast genome
156,000 base pairs (most are 120 kb to 200 kb)

Page 527

Page 48:

mitochondrion

chloroplast

Lack mitochondria (?)

Page 49:

http://www.ncbi.nlm.nih.gov/genomes/ORGANELLES/organelles.html

Entrez Genomes organelle resource at NCBI

10-10

Page 50:

There are ~2500 sequenced eukaryotic organelle genomes (10/10)

Page 51:

http://www-lecb.ncifcrf.gov/mitoDat/

MitoDat: resource for organelle genomes

“This database is dedicated to the nuclear genes specifying the enzymes, structural proteins, and other proteins, many still not identified, involved in mitochondrial biogenesis and function. MitoDat highlights predominantly human nuclear-encoded mitochondrial proteins.”

Not updated recently.

10-10

Page 52:

http://www.mitomap.org/

MitoMap: resource for organelle genomes

10-10

Page 53:

It is possible to map mutations in human mitochondrial DNA that are responsible for disease

Page 54:

Chronology of genome sequencing projects

1995: first genome of a free-living organism, the bacterium Haemophilus influenzae

Page 530

Page 55:

1995: genome of the bacterium Haemophilus influenzae is sequenced

Fig. 13.7
Page 531

Page 56:

How to find information about a genome: NCBI → All databases → Genome → follow link to Bacteria

Page 57:

Overview of bacterial complete genomes (2000) n=30

Page 58:

Overview of bacterial complete genomes (2010) n=3,330
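
The two overview slides give n=30 for 2000 and n=3,330 for 2010. As a back-of-the-envelope illustration, the sketch below turns those two counts into a fold increase and an approximate doubling time. It assumes smooth exponential growth between the two dates, which is a simplification made for this example and not a claim from the lecture.

```python
import math

genomes_2000 = 30     # count shown on the 2000 overview slide
genomes_2010 = 3330   # count shown on the 2010 overview slide
years = 2010 - 2000

fold_increase = genomes_2010 / genomes_2000

# Assumption: growth was exponential at a constant rate over the decade.
doubling_time_years = years * math.log(2) / math.log(fold_increase)

print(f"fold increase: {fold_increase:.0f}")
print(f"approximate doubling time: {doubling_time_years:.2f} years")
```

Under that assumption the count rose 111-fold, which corresponds to a doubling roughly every year and a half.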

Page 59:

You can find functional annotation through the COGs database (Clusters of Orthologous Genes)

Fig. 12.9
Page 411

Page 60:

Click the circle to access the genome sequence

Page 61:

You can find functional annotation through the COGs database (Clusters of Orthologous Genes)

Entrez Genome view of H. influenzae (October 2009)

Page 62:

Click the circle to access the genome sequence

Genes are color-coded according to the COGs scheme

Page 63:

Chronology of genome sequencing projects

1996: first eukaryotic genome

The complete genome sequence of the budding yeast Saccharomyces cerevisiae was reported. We will describe this genome soon.

Also in 1996, TIGR reported the sequence of the first archaeal genome, Methanococcus jannaschii.

Page 532

Page 64:

1996: a yeast genome is sequenced

Page 65:

To learn about a genome of interest, visit NCBI → TaxBrowser → Genome Projects

Page 66:

To learn about a genome of interest, follow the TaxBrowser → Genome Projects links

Size (in megabases), number of chromosomes are given here

Page 67:
Page 68:

To place the sequencing of the yeast genome in context, these are the eukaryotes…

Page 69:

Tree of eukaryotes (Baldauf et al. 2000)

Fungi

Page 70:

Chronology of genome sequencing projects

1997: More bacteria and archaea
Escherichia coli
4.6 megabases, 4200 proteins (38% of unknown function)

1998: first multicellular organism
Nematode Caenorhabditis elegans
97 Mb; 19,000 genes.

1999: first human chromosome
Chromosome 22 (49 Mb, 673 genes)

Page 532
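
The sizes and gene counts quoted on this slide make for a quick comparison of gene density. The short sketch below divides each count by the genome size in megabases. The helper name `genes_per_megabase` is hypothetical, and treating the E. coli protein count as a gene count is an assumption made for this illustration.

```python
# (name, size in megabases, gene or protein count), as quoted on the slide
genomes = [
    ("Escherichia coli", 4.6, 4200),
    ("Caenorhabditis elegans", 97, 19000),
    ("Human chromosome 22", 49, 673),
]


def genes_per_megabase(size_mb: float, gene_count: int) -> float:
    """Crude gene density: annotated genes divided by genome size in Mb."""
    return gene_count / size_mb


for name, size_mb, count in genomes:
    print(f"{name}: {genes_per_megabase(size_mb, count):.0f} genes per Mb")
```

By this measure the bacterial genome is far more gene-dense than the nematode genome, which in turn is denser than human chromosome 22.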

Page 71:

See the article by Webb Miller et al. (2004), “Comparative genomics” for a discussion of annotation and analysis progress made since 1998

Page 72:

1999: Human chromosome 22 sequenced

Page 73:

1999: Human chromosome 22 sequenced

49 Mb
701 genes

Page 74:

Chronology of genome sequencing projects

2000:
Fruitfly Drosophila melanogaster (13,000 genes)
Plant Arabidopsis thaliana
Human chromosome 21

2001: draft sequence of the human genome (public consortium and Celera Genomics)

Page 534

Page 75:

To explore human chromosome 21 at NCBI:
Find MapViewer
Choose human
Click chromosome 21

Page 76:

2000

Page 77:

2001 draft human genome sequence
2002 S. pombe (just 4,800 genes)
2004 “finished” human genome
2007 first individual human genome
2009 1000 Genomes Project

Page 78:

Outline of Monday’s lecture (Chapter 13)

Introduction: 5 perspectives, history of life: time lines

Genome-sequencing projects: chronology

Genome analysis: criteria, resequencing, metagenomics

DNA sequencing technologies: Sanger, 454, Solexa

Process of genome sequencing: centers, repositories

Genome annotation: features, prokaryotes, eukaryotes