Evolutionary Analysis - Genome.gov

Current Topics in Genome Analysis 2005

Evolutionary Analysis

Fiona Brinkman
Simon Fraser University, Greater Vancouver, BC, Canada

Why care about Evolutionary Analysis?

What do
• BLAST
• Protein motif searching
• Protein threading
• Multiple sequence alignment
have in common?

Why care about Evolutionary Analysis?

Gene family identification

Gene discovery – inferring gene function, gene annotation

Origins of a genetic disease, characterization of polymorphisms

Why care about Evolutionary Analysis?

Koski LB, Golding GB. The closest BLAST hit is often not the nearest neighbor. J Mol Evol. 2001 Jun;52(6):540-2.

Evolutionary Analysis: Key Concepts

• Foundation of most bioinformatic analyses: evolutionary theory

• Unique versus non-unique characters

• Sequence alignments are important!

• Fundamentals of phylogenetics and interpreting phylogenetic trees (with cautionary notes)

• Overview of some common phylogenetic methods

• Appreciate the need for new algorithms

18th and 19th centuries: The evolution of a theory

• Earth erosion, sediment deposition, strata – present earth conditions provide keys to the past

18th and 19th centuries: The evolution of a theory

• Discoveries of fossils accumulated
– Remains of unknown but still living species that are elsewhere on the planet?
– Cuvier (circa 1800): the deeper the strata, the less similar fossils were to existing species

Part of Darwin’s Theory

• The world is not constant, but changing

• All organisms are derived from common ancestors by a process of branching.

Part of Darwin’s Theory

• This explained…
– Fossil record
– Similarities of organisms classified together (shared traits inherited from common ancestor)
– Similar species in the same geographic region
– Morphological character-based analysis

What is evolution?

• Think – Pair – Share!

• Come up with a definition of evolution that is 6 words or less. Bonus points for 2-3 words!

Characters

• Heritable changes in features (morphology, DNA sequence, etc.)

• The more similar characters you have, the more related you are

• However… characters can be unique and non-unique

Evolution and characters

[Figure: characters changing along a time axis]

A Unique Character: Hair for Mammals

• Hair evolved only once and is “unreversed”
• Presence of hair is a strong indication that an organism is a mammal

Homoplasy: The formation of tails

• Tails evolved independently in the ancestors of frogs and humans
• Presence of a tail: no useful conclusions

Unique and non-unique characters

[Figure: a word changing over time, with changes labelled Non-unique and Unique. Word forms shown: bioinformatics, bioinfortatics, bioinfortatios, oinformatios, informatios, infortation, information]

Unique and non-unique characters

Example: Sequence analysis of functionally similar transporters

All share the same deleted sequence region, which is not found in any other transporter examined to date

Unique character?

Further investigate for possible functional significance, or use for classification

Unique and non-unique characters

Example: Sequence analysis of functionally similar transporters

All have isoleucine at the third position in the sequence, however some other transporters have isoleucine there too, while some other transporters have leucine at that position

Non-unique.

Changes from I → L → I are common (see BLOSUM or PAM matrices). Not a high priority for further analysis of significance and not useful for classification.

Classification according to characters – more characters can be good

          Colour  Skin       Cost  Legs  Feathers  Hair
Beef      red     no         $$$   four  no        hair
Duck      red     yes        $$$   two   yes       no
Pork      white   no         $$    four  no        often
Chicken   white   yes        $     two   yes       no
Tofu      white   sometimes  $     none  no        no

Chicken most similar to Tofu?

Classification according to characters

          Colour  Skin       Cost  Legs  Feathers  Hair
Beef      red     no         $$$   four  no        hair
Duck      red     yes        $$$   two   yes       no
Pork      white   no         $$    four  no        often
Chicken   white   yes        $     two   yes       no
Tofu      white   sometimes  $     none  no        no

Classification according to characters – increasing the number of characters

          Colour  Skin       Cost  Legs  Feathers  Hair
Beef      red     no         $$$   four  no        yes
Duck      red     yes        $$$   two   yes       no
Pork      white   no         $$    four  no        yes
Chicken   white   yes        $     two   yes       no
Tofu      white   sometimes  $     none  no        no

Chicken most similar to Duck?
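
The effect of adding characters can be sketched by counting shared character states. This is only an illustration: the table values are taken from the slide above, but the choice of Colour and Cost as the small starting subset is an assumption, as are the function names.

```python
# Character table from the slide (Hair column as in the final version).
CHARACTERS = ["Colour", "Skin", "Cost", "Legs", "Feathers", "Hair"]
TABLE = {
    "Beef":    ["red",   "no",        "$$$", "four", "no",  "yes"],
    "Duck":    ["red",   "yes",       "$$$", "two",  "yes", "no"],
    "Pork":    ["white", "no",        "$$",  "four", "no",  "yes"],
    "Chicken": ["white", "yes",       "$",   "two",  "yes", "no"],
    "Tofu":    ["white", "sometimes", "$",   "none", "no",  "no"],
}


def shared_characters(a, b, characters):
    """Count the characters in `characters` for which items a and b agree."""
    indices = [CHARACTERS.index(c) for c in characters]
    return sum(TABLE[a][i] == TABLE[b][i] for i in indices)


def most_similar(item, characters):
    """Return the other item sharing the most character states with `item`."""
    others = [name for name in TABLE if name != item]
    return max(others, key=lambda other: shared_characters(item, other, characters))


few = ["Colour", "Cost"]  # assumption: a small starting subset of characters
print(most_similar("Chicken", few))         # Tofu
print(most_similar("Chicken", CHARACTERS))  # Duck
```

With two characters Chicken groups with Tofu; with all six it groups with Duck, which is the point the slides make.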

Evolution and characters – the importance of comparing characters with common origins (homologous)

bioinformatics
bioinformatics
bioinformatios
oinformatios
informatios
information
information

time

Evolution and characters

bioinformatics
bioinformatics
bioinformatios
--oinformatios
---informatios
---information
---information

time

Multiple Sequence Alignment

VTISCTGSSSNIGAG-NHVKWYQQLPG

VTISCTGTSSNIGS--ITVNWYQQLPG

LRLSCSSSGFIFSS--YAMYWVRQAPG

LSLTCTVSGTSFDD--YYSTWVRQPPG

PEVTCVVVDVSHEDPQVKFNWYVDG--

ATLVCLISDFYPGA--VTVAWKADS--

AALGCLVKDYFPEP--VTVSWNSG---

VSLTCLVKGFYPSD--IAVEWESNG--

The sole purpose of multiple sequence alignments is to place homologous positions of homologous sequences into the same column.
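
Because each column is meant to hold homologous positions, an alignment can be read column by column. A minimal sketch, using the eight aligned rows from the slide; the function names are illustrative only.

```python
# The eight aligned sequences shown on the slide.
ALIGNMENT = [
    "VTISCTGSSSNIGAG-NHVKWYQQLPG",
    "VTISCTGTSSNIGS--ITVNWYQQLPG",
    "LRLSCSSSGFIFSS--YAMYWVRQAPG",
    "LSLTCTVSGTSFDD--YYSTWVRQPPG",
    "PEVTCVVVDVSHEDPQVKFNWYVDG--",
    "ATLVCLISDFYPGA--VTVAWKADS--",
    "AALGCLVKDYFPEP--VTVSWNSG---",
    "VSLTCLVKGFYPSD--IAVEWESNG--",
]


def columns(alignment):
    """Return the alignment as a list of columns; rows must have equal length."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return list(zip(*alignment))


def conserved_columns(alignment, gap="-"):
    """Return 0-based indices of gap-free columns containing a single residue."""
    return [i for i, col in enumerate(columns(alignment))
            if gap not in col and len(set(col)) == 1]


print(conserved_columns(ALIGNMENT))  # [4, 20]
```

The two fully conserved columns are the cysteine at index 4 and the tryptophan at index 20.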

Multiple sequence alignments and phylogenetic analysis

• First step in any phylogenetic analysis

• Phylogenetic analysis only as good as the alignment

Garbage in, garbage out!

Clustal: Adding evolutionary theory to multiple sequence alignment

Thompson, J.D., Higgins, D.G. and Gibson, T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research, 22:4673-4680.

Clustal: Incorporating Biology into Sequence Alignment Algorithms

• Matrices varied at different alignment stages according to the divergence of the sequences

• Gap penalties differ for hydrophilic regions to encourage new gaps in potential loop regions

• Gapped positions in early alignments - reduced gap penalties to encourage the opening up of new gaps at these positions

Standard multiple sequence alignment approach (first step for phylogenetic analysis)

• Be as sure as possible that the sequences included are homologous

• Know as much as possible about the gene/protein in question before trying to create an alignment (secondary structure, etc.)

• Start with an automated alignment: preferably one that utilizes some evolutionary theory such as Clustal

• Examine alignment:
– Are you confident that aligned residues/bases evolved from a common ancestor?
– Are domains of the proteins, predicted secondary structures, etc. aligning correctly?

No? May need to edit sequences and redo…

Yes? Move on!

• Note indels (insertions and deletions)
– Possible insights into functionally important regions…

• Use alignment as a basis for subsequent analyses (identify consensus or other pattern recognition, for PSSM, HMM construction, phylogenetic analysis, etc.)

• Remove unreliably aligned regions for phylogenetic analysis

ILPITSPSKEGYESGKAPDEFSSGG
ILPEH--IKDDGELGAAPHSFSTAG
VLPLD-----S--AGRPADSFSAAG
VLPVDR-------DGQARDEYT-VG
VLPVDN-------KGEARDEYT-VG
LLPYDD-------QGRPQDDYSRAG
GIVSRSG---SNFDGEPKDSYGKVG

Delete?
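
One simple way to act on the "Delete?" question is to drop columns that are mostly gaps before building a tree. The sketch below uses the seven rows from the slide; the 50% gap threshold and the function name are assumptions chosen for illustration, not a rule from the lecture.

```python
# The seven aligned rows from the slide (25 columns each).
ROWS = [
    "ILPITSPSKEGYESGKAPDEFSSGG",
    "ILPEH--IKDDGELGAAPHSFSTAG",
    "VLPLD-----S--AGRPADSFSAAG",
    "VLPVDR-------DGQARDEYT-VG",
    "VLPVDN-------KGEARDEYT-VG",
    "LLPYDD-------QGRPQDDYSRAG",
    "GIVSRSG---SNFDGEPKDSYGKVG",
]


def trim_gappy_columns(alignment, max_gap_fraction=0.5, gap="-"):
    """Keep only columns whose fraction of gap characters is at most the threshold.

    The 0.5 default is an arbitrary illustrative cut-off.
    """
    n_rows = len(alignment)
    keep = [i for i, col in enumerate(zip(*alignment))
            if col.count(gap) / n_rows <= max_gap_fraction]
    return ["".join(row[i] for i in keep) for row in alignment]


trimmed = trim_gappy_columns(ROWS)
for row in trimmed:
    print(row)
```

With this threshold six of the 25 columns are removed, all from the gap-rich middle region. Whether a region is "unreliably aligned" remains a judgment call that a fixed threshold only approximates.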

A phylogenetic tree

taxon -- Any named group of organisms – evolutionary theory not necessarily involved.

clade -- A monophyletic taxon (evolutionary theory utilized)

[Figure: tree of Human, Mouse and Fly, with a clade and a node labelled]

A phylogenetic tree with branch lengths

Branch length can be significant… In this case the analysis suggests that the mouse sequence/taxon is slightly more similar to fly than human is to fly (i.e. sum of branches A+B+C is less than sum of A+B+D)

[Figure: tree of Human, Mouse and Fly with branches labelled A, B, C and D; a clade and a node are marked]
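
The comparison of A+B+C with A+B+D is a sum of branch lengths along the path joining two leaves. A small sketch of that sum follows; the numeric branch lengths are invented, and the assignment of A to the fly branch, B to the internal branch, C to mouse and D to human is an assumption about the figure.

```python
# Each entry maps a node to (parent, length of the branch leading to it).
# Lengths are hypothetical; labels A-D follow the assumed figure layout.
BRANCHES = {
    "Fly":      ("root", 0.30),      # branch A
    "ancestor": ("root", 0.10),      # branch B
    "Mouse":    ("ancestor", 0.05),  # branch C
    "Human":    ("ancestor", 0.08),  # branch D
}


def path_to_root(node):
    """Map every node between `node` and the root to its distance from `node`."""
    distances = {node: 0.0}
    total = 0.0
    while node in BRANCHES:
        parent, length = BRANCHES[node]
        total += length
        distances[parent] = total
        node = parent
    return distances


def tree_distance(a, b):
    """Sum of branch lengths along the path joining leaves a and b."""
    up_a, up_b = path_to_root(a), path_to_root(b)
    # The shortest combined route passes through the most recent common ancestor.
    return min(up_a[n] + up_b[n] for n in up_a if n in up_b)


print(tree_distance("Mouse", "Fly") < tree_distance("Human", "Fly"))  # True
```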

Phylogenetic analysis

• Organismal relationships

• Gene/Protein relationships

Organismal relationships

Improving our understanding of organismal relationships

Realization that rates of change are not constant

Improving our understanding of organismal relationships

Better appreciation for what sequences may be suitable for analysis of different degrees of divergence

For the tree of life:

rRNA genes

Multiple genes

“Whole genome” datasets of genes

rRNA genes and multiple suitable genes

Gene/Protein Relationships

Homolog, ortholog, paralog??

Homologs

Have common origins but may or may not have common activity.

Homologous or not?: Often determined by arbitrary threshold level of similarity determined by alignment

Homologs

…have common ancestry, but the way they are related can vary (i.e. the reasons they have diverged into different sequences can vary)

• orthologs - Homologs produced only by speciation. They tend to have similar function.

• paralogs - Homologs produced by gene duplication. They tend to have differing functions.

• xenologs -- Homologs resulting from horizontal gene transfer between two organisms.

Orthologous or paralogous homologs

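[Figure: an early globin gene undergoes gene duplication into an α-chain gene and a ß-chain gene. Human α, cattle α and mouse α are labelled Orthologs (α); cattle ß, human ß and mouse ß are labelled Orthologs (ß); the cattle α and ß genes are labelled Paralogs (cattle); all are labelled Homologs.]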

Orthologs – diverged only after speciation – tend to have similar function

Paralogs – diverged after gene duplication – some functional divergence occurs

Therefore, for linking similar genes between species, or performing “annotation transfer”, identify orthologs

True or False?

A1x is the ortholog in species x of A1y?

A1x is a paralog of A2x?

A1x is a paralog of A2y?

Identifying Gene/Protein Relationships from Phylogenetic trees

• orthologs - Homologs produced only by speciation.
ID: Gene phylogeny matches organismal phylogeny.

• paralogs - Homologs produced by gene duplication.
ID: Multiple copies of homologs in a given species, or genes more/less related than expected by organismal phylogeny.

• xenologs -- Homologs resulting from horizontal gene transfer between two organisms.
ID: Gene phylogeny does not match organismal phylogeny in a tree where most genes do match organismal phylogeny well.

What are the probable orthologs and paralogs of the fly genes BKA and WOOT?

[Figure: the known organismal phylogeny (Chimpanzee, Human, Mouse, Fly, Worm) beside a gene tree containing Chimpanzee gene ABC, Human gene XYZ, Mouse gene LMNOP, Fly gene BKA, Fly gene LOTR, Human gene SOS, Human gene CBA, Mouse gene PONML, Fly gene WOOT and Worm gene LOTRIII]

High Throughput Gene Orthology: How to detect?

• Most common high throughput computational method: Identify reciprocal best BLAST hits (EGO, COGs,…)

Example Problem:

• If making comparisons between human and bovine, for example, the bovine gene dataset is still quite incomplete

• Therefore, current best hit may be a paralog now and the true ortholog not yet sequenced
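
The reciprocal-best-hit idea can be sketched as follows. The gene names and scores are hypothetical stand-ins for parsed BLAST results, and the function names are illustrative; this is not the procedure used by EGO or COGs.

```python
def best_hits(scores):
    """For each query gene, return the subject gene with the highest score."""
    return {query: max(hits, key=hits.get)
            for query, hits in scores.items() if hits}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical similarity scores (for example, BLAST bit scores).
human_vs_cattle = {
    "humanA1": {"cattleA1": 520.0, "cattleA2": 310.0},
    "humanA2": {"cattleA2": 480.0, "cattleA1": 300.0},
    "humanB":  {"cattleA1": 95.0},
}
cattle_vs_human = {
    "cattleA1": {"humanA1": 515.0, "humanA2": 305.0, "humanB": 90.0},
    "cattleA2": {"humanA2": 470.0, "humanA1": 312.0},
}

print(reciprocal_best_hits(human_vs_cattle, cattle_vs_human))
# [('humanA1', 'cattleA1'), ('humanA2', 'cattleA2')]
```

Here humanB is left unpaired because its best cattle hit prefers humanA1. The method can only pair genes that are present in both datasets, which is why an incomplete dataset is a problem.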

[Figure: example trees with leaves cattle, human, cattle, mouse]

Can we improve orthology analysis for linking functionally similar genes?

• One solution: Phylogenetic analysis of all putative human-bovine orthologs, using mouse as an outgroup

• Assumption: Mouse and Human gene datasets are more complete, with more true orthologs identified

[Figure: two trees of cattle, human and mouse, one labelled "Expect (organismal phylogeny)" and the other "Reject"]

2 Forms in 1 Species

[Figure: tree with clades labelled A bacteria, Bunch of bacteria, Bunch of Eukaryotes (twice), Two bacteria and Two Eukaryotes; several lineages are marked +]

Slides from Jonathan Eisen

2 Forms in 1 Species - LGT

[Figure labels: Gene present in common ancestor; Both forms maintained; Red and blue forms diverge]

2 Forms in 1 Species - Gene Loss

[Figure labels: Gene duplicated in common ancestor; Loss; Loss]

Unusual Distribution Pattern

[Figure: tree with two lineages marked +]

Unusual Distribution - LGT

[Figure labels: Gene originates here; Acquires new type of gene]

Unusual Distribution - Gene Loss

[Figure labels: Gene present in ancestor; Gene lost here]

Unusual Distribution - Evolutionary Rate Variation?

[Figure label: Gene too diverged to be found]

Unusual Distribution - Incomplete Data

[Figure label: Gene present in ancestor; lineages marked + and +/-]

Hope for the future
Better sampling of all the species in our world

2004: Environmental genomics sampling takes centre stage

Tyson et al (2004) Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature, 428, 37-43.

Venter et al (2004) Environmental genome shotgun sequencing of the Sargasso Sea. Science, 304, 66-74.

“So….. how do we construct a phylogenetic tree??”

Most common methods

• Parsimony
• Neighbor-joining
• Maximum Likelihood

Parsimony

• “Shortest-way-from-A-to-B” method
• The tree implying the least number of changes in character states (most parsimonious) is the best.

• Note:
– May get more than one tree
– No branch lengths
– Uses all character data
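
Counting the changes a tree implies can be sketched with Fitch's algorithm, one standard way to score a single character on a fixed rooted binary tree. The slides do not name an algorithm, so this choice, the taxon names A to D and the example site are all assumptions made for illustration.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum number of changes).

    tree   : a leaf name (str) or a pair (left_subtree, right_subtree)
    states : dict mapping each leaf name to its character state
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    common = left_set & right_set
    if common:
        # The children can agree on a state, so no extra change is needed here.
        return common, left_changes + right_changes
    # The children disagree: at least one change occurs below this node.
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_score(tree, states):
    """Minimum number of state changes the tree implies for one character."""
    return fitch(tree, states)[1]


# One hypothetical alignment column for four hypothetical taxa.
site = {"A": "G", "B": "G", "C": "T", "D": "T"}
tree1 = (("A", "B"), ("C", "D"))
tree2 = (("A", "C"), ("B", "D"))

print(parsimony_score(tree1, site))  # 1
print(parsimony_score(tree2, site))  # 2
```

For this site the first tree needs one change and the second needs two, so parsimony prefers the first. A full analysis sums such scores over all characters and searches over many trees.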

Neighbor-joining (and other distance matrix methods)
• “speedy-and-popular” method
• distance matrix constructed
• distance estimates the total branch length between a given two species/genes/proteins
• Neighbor-joining approach: Pairing those sequences that are the most alike and using that pair to join to next closest sequence.
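
The pairing step can be sketched as below. The one-line description above is a simplification: in the standard neighbor-joining algorithm the pair to join is the one minimising the criterion Q(i, j) = (n - 2) d(i, j) - Σ d(i, k) - Σ d(j, k), which corrects for unequal rates, and this is not always the pair with the smallest raw distance. The taxon names and distances here are hypothetical.

```python
from itertools import combinations

# Hypothetical pairwise distances between five taxa.
PAIRS = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}


def symmetric(pairs):
    """Expand a dict of pair distances into a symmetric dict-of-dicts matrix."""
    dist = {}
    for (i, j), d in pairs.items():
        dist.setdefault(i, {})[j] = d
        dist.setdefault(j, {})[i] = d
    return dist


def closest_pair(dist):
    """Pair with the smallest raw distance."""
    return min(combinations(sorted(dist), 2), key=lambda p: dist[p[0]][p[1]])


def nj_pair(dist):
    """Pair minimising the neighbor-joining criterion Q."""
    n = len(dist)
    totals = {i: sum(dist[i].values()) for i in dist}

    def q(pair):
        i, j = pair
        return (n - 2) * dist[i][j] - totals[i] - totals[j]

    return min(combinations(sorted(dist), 2), key=q)


D = symmetric(PAIRS)
print(closest_pair(D))  # ('d', 'e')
print(nj_pair(D))       # ('a', 'b')
```

In this matrix d and e are the closest pair, yet neighbor-joining joins a and b first. After joining, the pair is replaced by a new node and the step repeats on the reduced matrix; that part is omitted here.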

Maximum Likelihood
• “Inside-out” approach
• produces trees and then sees if the data could generate that tree.
• gives an estimation of the likelihood of a particular tree, given a certain model of nucleotide substitution.

• Notes:
– All sequence info (including gaps) is used
– Based on a specific model of evolution – gives probability
– Verrrrrrrrrrrry slow (unless topology of tree is known)
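
The idea of scoring a hypothesis by the likelihood of the data under a substitution model can be shown in the smallest possible case: two aligned sequences, where the "tree" is a single branch of length t. The sketch assumes the Jukes-Cantor (JC69) model, which the slides do not specify, and uses hypothetical counts of 100 sites with 10 differences.

```python
import math


def jc69_log_likelihood(t, n_sites, n_diff):
    """Log-likelihood of two aligned sequences separated by t expected
    substitutions per site, under the Jukes-Cantor (JC69) model."""
    e = math.exp(-4.0 * t / 3.0)
    p_same = 0.25 + 0.75 * e  # probability a site keeps its nucleotide
    p_diff = 0.25 - 0.25 * e  # probability of changing to one specific other nucleotide
    # Each site also contributes the equilibrium frequency 1/4 of its first nucleotide.
    return ((n_sites - n_diff) * math.log(0.25 * p_same)
            + n_diff * math.log(0.25 * p_diff))


def ml_distance(n_sites, n_diff, step=0.0001, t_max=2.0):
    """Grid search for the branch length t that maximises the likelihood."""
    grid = [step * k for k in range(1, int(t_max / step) + 1)]
    return max(grid, key=lambda t: jc69_log_likelihood(t, n_sites, n_diff))


def jc69_closed_form(n_sites, n_diff):
    """Analytical JC69 distance, for comparison with the grid search."""
    p = n_diff / n_sites
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


print(round(ml_distance(100, 10), 3), round(jc69_closed_form(100, 10), 3))
```

For real trees the same likelihood is computed over every branch and many candidate topologies, which is where the slowness comes from.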

How reliable is a result?
• Non-parametric bootstrapping
– analysis of a sample of (e.g. 100 or 1000) randomly perturbed data sets.
– perturbation: random resampling with replacement (some characters are represented more than once, some appear once, and some are deleted)
– perturbed data analysed like real data
– number of times that each grouping of species/genes/proteins appears in the resulting profile of cladograms is taken as an index of relative support for that grouping
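
The resampling step can be sketched as follows. The toy alignment is hypothetical, and `one_groups_with_two` is a deliberately crude stand-in for "build a tree and check whether the grouping appears"; a real analysis would rerun the full tree-building method on every replicate.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement, keeping the original width."""
    n_cols = len(alignment[0])
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(row[i] for i in picks) for row in alignment]


def bootstrap_support(alignment, grouping_found, n_replicates=100, seed=0):
    """Fraction of replicates in which grouping_found(replicate) is true."""
    rng = random.Random(seed)
    hits = sum(bool(grouping_found(bootstrap_replicate(alignment, rng)))
               for _ in range(n_replicates))
    return hits / n_replicates


def one_groups_with_two(alignment):
    """Stand-in test: is sequence 1 closer to sequence 2 than to sequence 3?"""
    def hamming(x, y):
        return sum(a != b for a, b in zip(x, y))
    return hamming(alignment[0], alignment[1]) < hamming(alignment[0], alignment[2])


# Hypothetical toy alignment of three sequences.
TOY = [
    "ACGTACGTAC",
    "ACGTACGTTC",
    "ATGTTCGAAC",
]

print(bootstrap_support(TOY, one_groups_with_two, n_replicates=1000))
```

The printed value is the proportion of resampled data sets that recover the grouping, which is how a bootstrap value on a branch is usually read.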

Bootstrapping

The number of times a particular branch is formed in the tree (out of the X times the analysis is done) can be used to estimate its probability, which can be indicated on a consensus tree

High bootstrap values don’t mean that your tree is the true tree!

Alignment and evolutionary assumptions are key

Parametric Bootstrapping

Data are simulated according to the hypothesis being tested.

Phylogenetics – More info

Li, Wen-Hsiung. 1997. Molecular evolution. Sunderland, Mass. Sinauer Associates.
- a good starting book, clearly describing the basis of molecular evolution theory. It is a 1997 book, so is starting to get a bit out of date.

Nei, Masatoshi & Kumar, Sudhir. 2000. Molecular evolution and phylogenetics. Oxford; New York. Oxford University Press.
- a relatively new book, by two very well respected researchers in the field. A bit more in-depth than the previous book, but very useful.

Phylogenetic Tree Construction: Examples of Common Software

PHYLIP
http://evolution.genetics.washington.edu/phylip.html
PAUP
http://paup.csit.fsu.edu/
MEGA 2.1
www.megasoftware.net/
TREEVIEW
http://taxonomy.zoology.gla.ac.uk/rod/treeview.html

Extensive list of software
http://evolution.genetics.washington.edu/phylip/software.html

Challenges

How do we classify?

Computational Challenges

• Need to incorporate more evolutionary theory into the multiple sequence alignment and phylogenetic algorithms used in phylogenetic analysis

• Phylogenetic analyses are computationally intensive – great way to benchmark your CPU speed!

More Challenges

• Increasing the sampling of our genetic world

• More accurately differentiating orthologs, paralogs, and horizontally acquired genes

• How frequent is gene loss, gene duplication, and horizontal gene transfer in genome evolution?

• To what degree can we predict protein/gene function using phylogenetic analysis?

Remember: Evolutionary theory is evolving…