practical bioinformatics tools for understanding evolution robert latek, phd bioinformatics and...

33
Practical Practical Bioinformatics Tools Bioinformatics Tools For For Understanding Evolution Understanding Evolution Robert Latek, PhD Bioinformatics and Research Computing Whitehead Institute for Biomedical Research

Upload: annice-pearson

Post on 18-Dec-2015

221 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: Practical Bioinformatics Tools For Understanding Evolution Robert Latek, PhD Bioinformatics and Research Computing Whitehead Institute for Biomedical Research

Practical Bioinformatics Practical Bioinformatics ToolsToolsForFor

Understanding EvolutionUnderstanding Evolution

Robert Latek, PhDBioinformatics and Research Computing

Whitehead Institute for Biomedical Research

Page 2: Practical Bioinformatics Tools For Understanding Evolution Robert Latek, PhD Bioinformatics and Research Computing Whitehead Institute for Biomedical Research

WIBR Bioinformatics, © Whitehead Institute 2004 2

AimsAims

• Examine Techniques For Describing Examine Techniques For Describing Evolutionary RelationshipsEvolutionary Relationships

• Learn To Apply Bioinformatics Tools Learn To Apply Bioinformatics Tools To Study EvolutionTo Study Evolution

• Question, Interrupt, Discuss, SuggestQuestion, Interrupt, Discuss, Suggest

Page 3: Practical Bioinformatics Tools For Understanding Evolution Robert Latek, PhD Bioinformatics and Research Computing Whitehead Institute for Biomedical Research

WIBR Bioinformatics, © Whitehead Institute 2004 3

Bioinformatics ?Bioinformatics ?• DefinitionDefinition

– Integration of computational and biological methods to promote biological discovery

– Combination of Biology, Statistics, CS, Clinical Research

• PurposePurpose– Predict, Decipher, Visualize

• MethodologyMethodology– Data Mining and Comparisons

Data Visualization

MSRKGPRAEVCADCSAPDPGWASISRGVLVCDECCSVHRLGRHISIVKHLRHSAWPPTLLQMVHTLASNGANSIWEHSLLDPAQVQSGRRKAN

G. Bell

Page 4: Practical Bioinformatics Tools For Understanding Evolution Robert Latek, PhD Bioinformatics and Research Computing Whitehead Institute for Biomedical Research

WIBR Bioinformatics, © Whitehead Institute 2004 4

Bioinformatics :-)Bioinformatics :-)• Biological Comparisons (Evolutionary Analysis)Biological Comparisons (Evolutionary Analysis)

– How closely/distantly related are two populations?

• Gene Function PredictionGene Function Prediction– How and why does Gene X function/malfunction?

• Pharmaceutical Design & In Silico TestingPharmaceutical Design & In Silico Testing

Page 5: Practical Bioinformatics Tools For Understanding Evolution Robert Latek, PhD Bioinformatics and Research Computing Whitehead Institute for Biomedical Research

WIBR Bioinformatics, © Whitehead Institute 2004 5

Bioinformatics@WIBioinformatics@WI• Bioinformatics and Research ComputingBioinformatics and Research Computing

• Collaboration, Consultation, Education Collaboration, Consultation, Education in Bioinformatics and Graphicsin Bioinformatics and Graphics

• Provide hardware, commercial/custom Provide hardware, commercial/custom software tools, training, and software tools, training, and bioinformatics expertisebioinformatics expertise

Decipher Predict

Page 6: Practical Bioinformatics Tools For Understanding Evolution Robert Latek, PhD Bioinformatics and Research Computing Whitehead Institute for Biomedical Research

WIBR Bioinformatics, © Whitehead Institute 2004 6

Discussion MapDiscussion Map

• Relationships Among Groups Of GenesRelationships Among Groups Of Genes– Comparing Sequences– Building Sequence Families

• Sequence Conservation During EvolutionSequence Conservation During Evolution– Aligning Multiple Sequences

• Evolutionary DiagramsEvolutionary Diagrams– Tracing The Descent From Common Ancestors– Growing Phylogenetic Trees

Page 7: Practical Bioinformatics Tools For Understanding Evolution Robert Latek, PhD Bioinformatics and Research Computing Whitehead Institute for Biomedical Research

WIBR Bioinformatics, © Whitehead Institute 2004 7

Evolutionary AnalysisEvolutionary Analysis

• DefinitionDefinition– The use of phylogeny to reveal relationships among

sets of genes

• PurposePurpose– To utilize information about common ancestors to

predict gene function and regulation

• MethodologyMethodology– Compare properties between genes/organisms and

identify commonalities and differences– Organization of genes into a evolutionary diagrams– Sequence by sequence comparisons

Page 8: Practical Bioinformatics Tools For Understanding Evolution Robert Latek, PhD Bioinformatics and Research Computing Whitehead Institute for Biomedical Research

WIBR Bioinformatics, © Whitehead Institute 2004 8

Sequence-Based Sequence-Based ComparisonsComparisons

• Identify sequences within an organism that are related Identify sequences within an organism that are related to each other and/or across different speciesto each other and/or across different species– Within: Fetal and adult hemoglobin– Across : Human and chimpanzee hemoglobin

• Generate an evolutionary history of related genesGenerate an evolutionary history of related genes• Locate insertions, deletions, and substitutions that Locate insertions, deletions, and substitutions that

have occurred during evolutionhave occurred during evolution

CREATE CREASE -RELAPSE

GREASER

(C) Cysteine(R) Arginine(E) Glutamate(A) Alanine(T) Threonine(S) Serine(L) Leucine(P) Proline(G) Glycine

[Ancestor] [Progenitors]

Page 9: Practical Bioinformatics Tools For Understanding Evolution Robert Latek, PhD Bioinformatics and Research Computing Whitehead Institute for Biomedical Research

WIBR Bioinformatics, © Whitehead Institute 2004 9

Homology & SimilarityHomology & Similarity• HomologyHomology

– Conserved sequences arising from a common ancestor– Orthologs: homologous genes that share a common

ancestor in the absence of any gene duplication (Mouse and Human Hemoglobin)

– Paralogs: genes related through gene duplication (one gene is a copy of another - Fetal and Adult Hemoglobin)

• SimilaritySimilarity– Genes that share common sequences but are not

necessarily related

Page 10: Practical Bioinformatics Tools For Understanding Evolution Robert Latek, PhD Bioinformatics and Research Computing Whitehead Institute for Biomedical Research

WIBR Bioinformatics, © Whitehead Institute 2004 10

Sequences As ModulesSequences As Modules

• Proteins are derived from a limited Proteins are derived from a limited number of basic building blocks number of basic building blocks (Modules)(Modules)

• Evolution has shuffled these modules Evolution has shuffled these modules giving rise to a diverse repertoire of giving rise to a diverse repertoire of protein sequencesprotein sequences

• As a result, proteins can share a global As a result, proteins can share a global relationships or local relationship relationships or local relationship specific to a particular DOMAINspecific to a particular DOMAIN

Global

Local

Page 11: Practical Bioinformatics Tools For Understanding Evolution Robert Latek, PhD Bioinformatics and Research Computing Whitehead Institute for Biomedical Research

WIBR Bioinformatics, © Whitehead Institute 2004 11

Sequence DomainsSequence DomainsModules Define Functional/Structural DomainsModules Define Functional/Structural Domains

Page 12: Practical Bioinformatics Tools For Understanding Evolution Robert Latek, PhD Bioinformatics and Research Computing Whitehead Institute for Biomedical Research

WIBR Bioinformatics, © Whitehead Institute 2004 12

Sequence FamiliesSequence Families

• Definition Definition – Group of sequences that share a common function

and/or structure, that are potentially derived from a common ancestor (set of homologous sequences)

• Building A Family Building A Family – Domains are used to group different sequences into

common families

Page 13: Practical Bioinformatics Tools For Understanding Evolution Robert Latek, PhD Bioinformatics and Research Computing Whitehead Institute for Biomedical Research

WIBR Bioinformatics, © Whitehead Institute 2004 13

Defining A Sequence Defining A Sequence FamilyFamily

Family A

Family B

Family D Family E

Family C

Page 14: Practical Bioinformatics Tools For Understanding Evolution Robert Latek, PhD Bioinformatics and Research Computing Whitehead Institute for Biomedical Research

WIBR Bioinformatics, © Whitehead Institute 2004 14

Sequence Family Sequence Family ResourcesResources

• Search and Browse Family DatabasesSearch and Browse Family Databases

• PFAMPFAM– http://pfam.wustl.edu/

>src

MGSNKSKPKDASQRRRSLEPAENVHGAGGGAFPASQTPSKPASADGHRGPSAAFAPAAAEPKLFGGFNSSDTVTSPQRAGPLAGGVTTFVALYDYESRTETDLSFKKGERLQIVNNTEGDWWLAHSLSTGQTGYIPSNYVAPSDSIQAEEWYFGKITRRESERLLLNAENPRGTFLVRESETTKGAYCLSVSDFDNAKGLNVKHYKIRKLDSGGFYITSRTQFNSLQQLVAYYSKHADGLCHRLTTVCPTSKPQTQGLAKDAWEIPRESLRLEVKLGQGCFGEVWMGTWNGTTRVAIKTLKPGTMSPEAFLQEAQVMKKLRHEKLVQLYAVVSEEPIYIVTEYMSKGSLLDFLKGETGKYLRLPQLVDMAAQIASGMAYVERMNYVHRDLRAANILVGENLVCKVADFGLARLIEDNEYTARQGAKFPIKWTAPEAALYGRFTIKSDVWSFGILLTELTTKGRVPYPGMVNREVLDQVERGYRMPCPPECPESLHDLMCQCWRKEPEERPTFEYLQAFLEDYFTSTEPQYQPGENL

Page 15: Practical Bioinformatics Tools For Understanding Evolution Robert Latek, PhD Bioinformatics and Research Computing Whitehead Institute for Biomedical Research

WIBR Bioinformatics, © Whitehead Institute 2004 15

Sequence Family Sequence Family ResourcesResources

• NCBI Family Database ResourcesNCBI Family Database Resources• Conserved Domain DatabaseConserved Domain Database

– http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=cdd

• Conserved Domain Architecture Retrieval ToolConserved Domain Architecture Retrieval Tool– http://www.ncbi.nlm.nih.gov/BLAST/

Page 16: Practical Bioinformatics Tools For Understanding Evolution Robert Latek, PhD Bioinformatics and Research Computing Whitehead Institute for Biomedical Research

WIBR Bioinformatics, © Whitehead Institute 2004 16

Discussion MapDiscussion Map

• Relationships Among Groups Of GenesRelationships Among Groups Of Genes– Comparing Sequences– Building Sequence Families

• Sequence Conservation During EvolutionSequence Conservation During Evolution– Aligning Multiple Sequences

• Evolutionary DiagramsEvolutionary Diagrams– Tracing The Descent From Common Ancestors– Growing Phylogenetic Trees

Page 17: Practical Bioinformatics Tools For Understanding Evolution Robert Latek, PhD Bioinformatics and Research Computing Whitehead Institute for Biomedical Research

WIBR Bioinformatics, © Whitehead Institute 2004 17

Multiple Sequence Multiple Sequence Alignments Alignments

• Place residues in columns that Place residues in columns that are derived from a common are derived from a common ancestral residueancestral residue

• Identify Identify MatchesMatches, , MismatchesMismatches, , and and GapsGaps

• MSA can reveal sequence MSA can reveal sequence patternspatterns– Demonstration of homology

between >2 sequences– Identification of functionally

important sites– Protein function prediction– Structure prediction

CRE-A-TE-

CRE-A-SE-

-RELAPSE-

GRE-A-SER

CREATE

CREASE

GREASER

RELAPSE

123456789

SeqA

SeqB

SeqC

SeqD

Page 18: Practical Bioinformatics Tools For Understanding Evolution Robert Latek, PhD Bioinformatics and Research Computing Whitehead Institute for Biomedical Research

WIBR Bioinformatics, © Whitehead Institute 2004 18

Global vs. Local Global vs. Local AlignmentsAlignments

• GlobalGlobal– Search for alignments, matching over

entire sequences

• LocalLocal– Examine regions of sequence for

conserved segments

• Both Consider: Matches, Mismatches, Both Consider: Matches, Mismatches, GapsGaps

Page 19: Practical Bioinformatics Tools For Understanding Evolution Robert Latek, PhD Bioinformatics and Research Computing Whitehead Institute for Biomedical Research

WIBR Bioinformatics, © Whitehead Institute 2004 19

Global Sequence AlignmentsGlobal Sequence Alignments

Yeast Prion-Like Proteins

Page 20: Practical Bioinformatics Tools For Understanding Evolution Robert Latek, PhD Bioinformatics and Research Computing Whitehead Institute for Biomedical Research

WIBR Bioinformatics, © Whitehead Institute 2004 20

How To Make A Global How To Make A Global MSAMSA

• On The WebOn The Web– http://pir.georgetown.edu/pirwww/search/multaln.html

• On Your ComputerOn Your Computer– ClustalX: http://www-igbmc.u-strasbg.fr/BioInfo/ClustalX/

Page 21: Practical Bioinformatics Tools For Understanding Evolution Robert Latek, PhD Bioinformatics and Research Computing Whitehead Institute for Biomedical Research

WIBR Bioinformatics, © Whitehead Institute 2004 21

MSA Example SequencesMSA Example Sequences

>KSYK_HUMANFFFGNITREEAEDYLVQGGMSDGLYLLRQSRNYLGGFALSVAHGRKAHHYTIERELNGTYAIAGGRTHASPADLCHYH

>ZA70_HUMANWYHSSLTREEAERKLYSGAQTDGKFLLRPRKEQGTYALSLIYGKTVYHYLISQDKAGKYCIPEGTKFDTLWQLVEYL

>KSYK_PIGWFHGKISRDESEQIVLIGSKTNGKFLIRARDNGSYALGLLHEGKVLHYRIDKDKTGKLSIPGGKNFDTLWQLVEHY

>MATK_HUMANWFHGKISGQEAVQQLQPPEDGLFLVRESARHPGDYVLCVSFGRDVIHYRVLHRDGHLTIDEAVFFCNLMDMVEHY

>CSK_CHICKWFHGKITREQAERLLYPPETGLFLVRESTNYPGDYTLCVSCEGKVEHYRIIYSSSKLSIDEEVYFENLMQLVEHY

>CRKL_HUMANWYMGPVSRQEAQTRLQGQRHGMFLVRDSSTCPGDYVLSVSENSRVSHYIINSLPNRRFKIGDQEFDHLPALLEFY

>YES_XIPHEWYFGKLSRKDTERLLLLPGNERGTFLIRESETTKGAYSLSLRDWDETKGDNCKHYKIRKLDNGGYYITTRTQFMSLQMLVKHY

>FGR_HUMANWYFGKIGRKDAERQLLSPGNPQGAFLIRESETTKGAYSLSIRDWDQTRGDHVKHYKIRKLDMGGYYITTRVQFNSVQELVQHY

>SRC_RSVPWYFGKITRRESERLLLNPENPRGTFLVRKSETAKGAYCLSVSDFDNAKGPNVKHYKIYKLYSGGFYITSRTQFGSLQQLVAYY

Standard FASTA Sequence Format

Page 22: Practical Bioinformatics Tools For Understanding Evolution Robert Latek, PhD Bioinformatics and Research Computing Whitehead Institute for Biomedical Research

WIBR Bioinformatics, © Whitehead Institute 2004 22

MSA Example ResultMSA Example Result

YES_XIPHE WYFGKLSRKDTERLLLLPGNERGTFLIRESETTKGAYSLSLRDWDETKGDNCKHYKIRKLFGR_HUMAN WYFGKIGRKDAERQLLSPGNPQGAFLIRESETTKGAYSLSIRDWDQTRGDHVKHYKIRKLSRC_RSVP WYFGKITRRESERLLLNPENPRGTFLVRKSETAKGAYCLSVSDFDNAKGPNVKHYKIYKLMATK_HUMAN WFHGKISGQEAVQQLQPPED--GLFLVRESARHPGDYVLCVS-----FGRDVIHYRVLHRCSK_CHICK WFHGKITREQAERLLYPPET--GLFLVRESTNYPGDYTLCVS-----CEGKVEHYRIIYSCRKL_HUMAN WYMGPVSRQEAQTRLQGQRH--GMFLVRDSSTCPGDYVLSVS-----ENSRVSHYIINSLZA70_HUMAN WYHSSLTREEAERKLYSGAQTDGKFLLRPRK-EQGTYALSLI-----YGKTVYHYLISQDKSYK_PIG WFHGKISRDESEQIVLIGSKTNGKFLIRAR--DNGSYALGLL-----HEGKVLHYRIDKDKSYK_HUMAN FFFGNITREEAEDYLVQGGMSDGLYLLRQSRNYLGGFALSVA-----HGRKAHHYTIERE :: . : :: : * :*:* * : * : ** :

YES_XIPHE DNGGYYITTRTQFMSLQMLVKHYFGR_HUMAN DMGGYYITTRVQFNSVQELVQHYSRC_RSVP YSGGFYITSRTQFGSLQQLVAYYMATK_HUMAN -DGHLTIDEAVFFCNLMDMVEHYCSK_CHICK -SSKLSIDEEVYFENLMQLVEHYCRKL_HUMAN PNRRFKIGDQE-FDHLPALLEFYZA70_HUMAN KAGKYCIPEGTKFDTLWQLVEYLKSYK_PIG KTGKLSIPGGKNFDTLWQLVEHYKSYK_HUMAN LNGTYAIAGGRTHASPADLCHYH * . : .

Page 23: Practical Bioinformatics Tools For Understanding Evolution Robert Latek, PhD Bioinformatics and Research Computing Whitehead Institute for Biomedical Research

WIBR Bioinformatics, © Whitehead Institute 2004 23

Discussion MapDiscussion Map

• Relationships Among Groups Of GenesRelationships Among Groups Of Genes– Comparing Sequences– Building Sequence Families

• Sequence Conservation During EvolutionSequence Conservation During Evolution– Aligning Multiple Sequences

• Evolutionary DiagramsEvolutionary Diagrams– Tracing The Descent From Common Ancestors– Growing Phylogenetic Trees

Page 24: Practical Bioinformatics Tools For Understanding Evolution Robert Latek, PhD Bioinformatics and Research Computing Whitehead Institute for Biomedical Research

WIBR Bioinformatics, © Whitehead Institute 2004 24

Phylogenetic TreesPhylogenetic Trees

• A Graph Representing The A Graph Representing The Evolutionary History Of SequencesEvolutionary History Of Sequences– Relationship of sequences to one

another (How everything is connected)– Dissect the order of appearance of

insertions, deletions, and mutations

• Identify Related Sequences, Predict Identify Related Sequences, Predict Function, Observe Epidemiology Function, Observe Epidemiology (Analyze changes in viral strains)(Analyze changes in viral strains)

A

B

C

D

SimpleTree

Page 25: Practical Bioinformatics Tools For Understanding Evolution Robert Latek, PhD Bioinformatics and Research Computing Whitehead Institute for Biomedical Research

WIBR Bioinformatics, © Whitehead Institute 2004 25

Tree ShapesTree Shapes

Rooted Un-rooted

Branches intersect at NodesLeaves are the topmost branches

A

B

C

D

A

B

C

D

A

B

C

D

Page 26: Practical Bioinformatics Tools For Understanding Evolution Robert Latek, PhD Bioinformatics and Research Computing Whitehead Institute for Biomedical Research

WIBR Bioinformatics, © Whitehead Institute 2004 26

Tree CharacteristicsTree Characteristics

• Tree PropertiesTree Properties– Clade: all the descendants of a common

ancestor represented by a node

– Distance: number of changes that have taken place along a branch

• Tree TypesTree Types

– Cladogram: shows the branching order of nodes

– Phylogram: shows branching order and distances

A

B

C

D

.035

.009

.057

.044

.012

.016

Phylogram

Page 27: Practical Bioinformatics Tools For Understanding Evolution Robert Latek, PhD Bioinformatics and Research Computing Whitehead Institute for Biomedical Research

WIBR Bioinformatics, © Whitehead Institute 2004 27

Tree Building MethodsTree Building Methods

• Group Most Common SequencesGroup Most Common Sequences– Find the tree that changes one sequence into all of the

others by the least number of steps – Sequences with the smallest number of differences have

the shortest distance between them and are called:

“related taxa”

Page 28: Practical Bioinformatics Tools For Understanding Evolution Robert Latek, PhD Bioinformatics and Research Computing Whitehead Institute for Biomedical Research

WIBR Bioinformatics, © Whitehead Institute 2004 28

Tree Building MethodsTree Building Methods

A

B

A

B

C

DF

A

B

D C E

F2 11

A

B

F E

C

D

2 1

3

E

ABCDEF

CDEF

A

B

C

D

EF

F

A

B

C

D

E

A

B

C

D

E

F

1 2 3 4 5

Page 29: Practical Bioinformatics Tools For Understanding Evolution Robert Latek, PhD Bioinformatics and Research Computing Whitehead Institute for Biomedical Research

WIBR Bioinformatics, © Whitehead Institute 2004 29

Example Evolutionary Example Evolutionary TreesTrees

• Tree Of LifeTree Of Life– http://tolweb.org/tree/phylogeny.html

• Theory Of Human Evolution At The SITheory Of Human Evolution At The SI– http://www.mnh.si.edu/anthro/humanorigins/ha/a_tree.html

Anthropological and Archeological

Page 30: Practical Bioinformatics Tools For Understanding Evolution Robert Latek, PhD Bioinformatics and Research Computing Whitehead Institute for Biomedical Research

WIBR Bioinformatics, © Whitehead Institute 2004 30

How To Build A TreeHow To Build A Tree

• Create AlignmentCreate Alignment– http://pir.georgetown.edu/pirwww/search/multaln.html

• Create TreeCreate Tree– http://www.genebee.msu.su/services/phtree_reduced.html

• Draw TreeDraw Tree– http://iubio.bio.indiana.edu/treeapp/treeprint-form.html

Sequence Based

Page 31: Practical Bioinformatics Tools For Understanding Evolution Robert Latek, PhD Bioinformatics and Research Computing Whitehead Institute for Biomedical Research

WIBR Bioinformatics, © Whitehead Institute 2004 31

MSA and Tree RelationshipMSA and Tree Relationship

• ““The optimal alignment of several sequences can The optimal alignment of several sequences can be thought of as minimizing the number of be thought of as minimizing the number of mutational steps in an evolutionary tree for which mutational steps in an evolutionary tree for which the sequences are the leaves” (Mount, 2001)the sequences are the leaves” (Mount, 2001)

+R

CREATE

CREASE

CREATE

CRE-A-TE-

CRE-A-SE-

-RELAPSE-

GRE-A-SER

SeqA

SeqB

SeqC

SeqD

T to S

C to G GREASE

CREASE

CREATE

+L +P

-G

Page 32: Practical Bioinformatics Tools For Understanding Evolution Robert Latek, PhD Bioinformatics and Research Computing Whitehead Institute for Biomedical Research

WIBR Bioinformatics, © Whitehead Institute 2004 32

Summary ReviewSummary Review

• Relationships Among Groups Of GenesRelationships Among Groups Of Genes– Comparing Sequences– Building Sequence Families

• Sequence Conservation During EvolutionSequence Conservation During Evolution– Aligning Multiple Sequences

• Evolutionary DiagramsEvolutionary Diagrams– Tracing The Descent From Common Ancestors– Growing Phylogenetic Trees

Page 33: Practical Bioinformatics Tools For Understanding Evolution Robert Latek, PhD Bioinformatics and Research Computing Whitehead Institute for Biomedical Research

WIBR Bioinformatics, © Whitehead Institute 2004 33

ReferencesReferences

• Bioinformatics: Sequence and genome Bioinformatics: Sequence and genome Analysis. David W. Mount. CSHL Press, 2001.Analysis. David W. Mount. CSHL Press, 2001.

• Bioinformatics: A Practical Guide to the Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins. Andreas D. Analysis of Genes and Proteins. Andreas D. Baxevanis and B.F. Francis Ouellete. Wiley Baxevanis and B.F. Francis Ouellete. Wiley Interscience, 2001.Interscience, 2001.

• Bioinformatics: Sequence, structure, and Bioinformatics: Sequence, structure, and databanks. Des Higgins and Willie Taylor. databanks. Des Higgins and Willie Taylor. Oxford University Press, 2000.Oxford University Press, 2000.