Worth C.L. 2009. Structural and Functional Constraints in the Evolution of Protein Families


  • 7/27/2019 Worth C.L. 2009. Structural and Functional Constraints in the Evolution of Protein Families


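    Although amino acid sequence determines three-dimensional protein structure, sometimes with a little help from chaperones, tertiary structure tends to be better conserved than sequence in evolution1,2. Thus, in homologous families of proteins, functions are often retained and structures are usually very similar even though sequences have diverged. This is even more evident in protein superfamilies, in which overall sequence similarity can be insignificant but structural and functional similarities still provide evidence of distant common ancestry.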

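    Around 40 years ago Kimura and Ohta developed the neutral theory of evolution, which states that most evolutionary changes at the molecular level are caused by neutral drift, the acceptance of selectively neutral mutations3,4. They suggested that mutations that disrupt the existing structure and function of a molecule occur less frequently in evolution than neutral mutations. This was elaborated by Zuckerkandl and colleagues in the functional density hypothesis, which proposes that the rate of evolution is determined by the proportion of all the possible mutations that produce a protein that is functionally equivalent to the wild type5,6. More recently it was found that proteins with many interaction partners evolve more slowly than those with few interaction partners7–9, but this has been disputed9. Analyses of the arrangements of polypeptide chains, often called protein folds, indicate that those that occur frequently tend to adopt regular architectures10.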

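    Insights into the effects of missense mutations on protein folding, structure and function, and thereby into the roles of wild-type amino acids, have been obtained from careful experimental approaches to amino acid substitution, such as site-specific mutagenesis. Such approaches dissect the various contributions of individual side chains in a systematic way. For example, the complex relationships between amino acid substitutions and the folding, stability and activity of proteins such as p53 have been explored by combining molecular biology and physical-organic chemistry11–15. The bacteriophage T4 lysozyme has been used as a model system to investigate the tolerance of proteins to amino acid replacement, insertion and deletion of both single amino acids and longer segments of the polypeptide chain using high-resolution X-ray crystallography16–19. These classical studies have shown that a protein can tolerate substantial changes, consistent with the observations of evolving proteins. Similar experiments have investigated how mutations are tolerated in the active sites of enzymes; for example, the study of mutant β-lactamase enzymes in order to understand the resistance of mutant bacteria to penicillin analogues showed that as penicillins become larger the enzymes evolve larger active sites and become less stable16. This understanding has been exploited in the design of inhibitors. The combination of molecular biology, organic chemistry and state-of-the-art high-throughput screening technologies in directed evolution to generate proteins with tailor-made properties demonstrates that neutral drift can lead to more promiscuous enzymes with broad functions17,18.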

    These experimental studies have provided invaluable quantitative information that is complementary to and largely consistent with the results of comparisons of the sequences and structures of protein families and

    *Biochemistry Department, University of Cambridge, Cambridge, CB2 1GA, UK. Leibniz-Institut für Molekulare Pharmakologie, Campus Berlin-Buch, Berlin, 13125, Germany.

    Correspondence to T.L.B.
    e-mail: [email protected]

    doi:10.1038/nrm2762

    Published online 16 September 2009

    Chaperone

    A protein that assists in the folding or unfolding and the assembly or disassembly of other macromolecular structures.

    Neutral drift

    The process whereby random sampling effects over successive generations give rise to stochastic changes in the allele frequencies within a population.

    β-lactamase

    An enzyme produced by some bacteria that confers resistance to β-lactam antibiotics.

    Structural and functional constraints in the evolution of protein families

    Catherine L. Worth*, Sungsam Gong* and Tom L. Blundell*

    Abstract | High-throughput genomic sequencing has focused attention on understanding differences between species and between individuals. When this genetic variation affects protein sequences, the rate of amino acid substitution reflects both Darwinian selection for functionally advantageous mutations and selectively neutral evolution operating within the constraints of structure and function. During neutral evolution, whereby mutations accumulate by random drift, amino acid substitutions are constrained by factors such as the formation of intramolecular and intermolecular interactions and the accessibility to water or lipids surrounding the protein. These constraints arise from the need to conserve a specific architecture and to retain interactions that mediate functions in protein families and superfamilies.

    REVIEWS

    NATURE REVIEWS | MOLECULAR CELL BIOLOGY VOLUME 10 | OCTOBER 2009 | 709

    © 2009 Macmillan Publishers Limited. All rights reserved


    Constraint

    A structural and dynamic system, or functional factor, that influences the acceptance of amino acid substitutions that occur in divergent protein families. Given that selection occurs at the level of the organism and that individual proteins and the systems in which they evolve are plastic, these constraints tend not to force but rather to restrain the substitutions that occur in evolution.

    Orthologues

    Genes (or gene products) descended from a common ancestral origin that diverged as a result of a speciation event.

    Hydrogen bonding potential

    The capacity of atoms to act as proton donors or acceptors in the formation of hydrogen bonds.

    Jelly roll

    An eight-stranded β-sandwich that is formed by four Greek key motifs, each consisting of four sequential antiparallel β-strands.

    β-propeller

    An all-β protein architecture comprising four to eight blade-shaped β-sheets arranged toroidally around a central axis.

    superfamilies, which [...]. Such comparative analyses of proteins can throw light on these observations by focusing on substitutions at topologically equivalent amino acid positions in families and superfamilies and by integrating the information into local environment-dependent substitution tables. These show that identical amino acids are substituted in different ways, depending on the role of the amino acid in maintaining the protein's structure and functional interactions. What then is the nature of the constraints on amino acid substitutions that give rise to distinct patterns of protein evolution?

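    In this Review we consider amino acid substitutions that have occurred in protein families and superfamilies. We do not discuss the origins of folds or their evolution by additions and alterations of elements of secondary structure, gene duplications and fusions; these have been widely reviewed elsewhere19–22. Neither do we consider constraints arising from the genomic position of the encoding genes, expression patterns, position in biological networks or robustness to translation23 (see BOX 1 for various constraints of protein evolution). Rather, we focus on how the amino acid substitutions during divergent evolution of protein families are constrained by the structure and functional interactions of a protein.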

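    We show that amino acid substitutions can be understood better and predicted more accurately if the three-dimensional environment of the amino acid side chain, known as the local structural environment, is defined in the functional state of the protein, for example in terms of secondary structure, accessibility to the water, lipids or other medium surrounding the protein and formation of hydrogen bonds. In particular, we focus on water-inaccessible polar side chains, which provide strong structural and functional constraints on the evolution of protein families. We show that these can give rise to characteristic architectural motifs resulting from their need to satisfy hydrogen bonding requirements.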

    Comparative analyses of homologous proteins

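    We first try to understand family resemblances before we seek to recognize the unique features of individual family members. This is best achieved by comparing the sequences and structures of members of families and superfamilies (proteins that are homologous or descended from a common ancestor) to be found among the more than fifty thousand proteins for which architectures have been determined at high resolution. We can then define each amino acid position in a protein family in terms of its local structural environment and investigate how structural constraints affect the amino acid substitutions that have been accepted during evolution. One major challenge here is to distinguish orthologues, which have the same functions in different organisms, from paralogues, which result from gene duplication and might have evolved new functions24. For paralogues the constraints will have changed. Generally orthologues are defined on the basis of sequence similarity but this remains a source of uncertainty in comparative analyses.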

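    The first comparisons 40 to 50 years ago of primary and tertiary structures of homologous proteins (globins, serine proteinases and lysozymes) focused on accessibility to water, usually called solvent accessibility, and showed that the solvent-inaccessible cores of proteins tended to be closely packed, more hydrophobic and more conserved than the surface regions25. Analyses of the structures from many protein families (BOX 2) show that this remains a useful generalization. These early analyses also focused on regular secondary structures, such as α-helices and β-sheets, which were immediately recognized to favour particular amino acids, so providing further constraints on evolutionary change26–28.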

    Pauling and colleagues realized that the requirement for the satisfaction of the hydrogen bonding potential of polypeptide main-chain peptide amide (NH) and carbonyl (CO) groups would not only give rise to regular secondary structures29,30 but also make the main chains of proteins more hydrophobic so that they could be buried in the core of a globular protein along with non-polar side chains. It soon became evident that these features of main-chain hydrogen bonding restrict protein architectures to a limited set of super-secondary structures formed by combining secondary structures into globular units, such as β-sandwiches and barrels, jelly rolls, β-propellers, α-helical bundles, Rossmann folds, α/β-barrels and many others. Main-chain hydrogen bonding also has important roles in the formation of complex arches and turns that link α-helices and β-strands31–33.

    Nevertheless, many main-chain peptide CO and NH groups are left unsatisfied in their potential to form hydrogen bonds. An early analysis of hydrogen bonding revealed that ~40% of such groups do not form hydrogen bonds with main-chain atoms of other amino acids34. In general this lack of hydrogen bonding occurs at places where β-strands and α-helices terminate34–38, bulge39,40 or bend41,42, but it is also common in polyproline or irregular, twisted β-strands43,44 and in arches and turns31–33,45,46. The hydrogen bonding potential of these

    Box 1 | Various constraints of protein evolution

    In this Review, we focus on local structural environments of amino acids as major constraints on the possible substitutions of amino acids during protein evolution. We also address the question of the importance of maintaining the function of a protein in imposing constraints, especially where molecular recognition is crucial, such as in enzyme active sites. However, there are many other constraints that are less well understood but provide important pressures in evolution. They include those that arise from DNA packaging and gene splicing and from the requirement for reliable and well-coordinated gene expression94,95,97. For example, ubiquitously expressed proteins tend to evolve more slowly than tissue-specific proteins. In addition, constraints arise from the process of protein folding98,99, from the importance of retaining various conformational changes and flexibility that mediate functions in the cell and from the need to avoid opportunistic interactions (interactions occurring by chance) and amyloid formation (aggregation of misfolded proteins into a highly ordered fibril-like structure)100,101. Furthermore, in order to prevent accumulation of damaging proteins the protein degradation system must be finely controlled, especially for misfolded proteins resulting from mutations102. Recently, it has been found that epigenetic factors, such as DNA methylation and chromatin remodelling, have important roles in the regulation of gene expression103 that eventually affect the evolution of proteins. Hence, an integrated approach is required to comprehensively understand protein evolution23.


    α-helical bundle

    A protein fold consisting of multiple α-helices that are approximately parallel to one another.

    Rossmann fold

    Two repeating βαβαβ supersecondary motifs.

    Distance matrix

    An n × n array that represents the distances between a set of n elements.

    Positive main-chain torsion angle

    A positive dihedral angle around the nitrogen–Cα bonds in the protein main chain. For L-amino acids these torsion angles are generally restricted to negative values owing to steric hindrance from the side chains, but they can be positive when there is no side chain (Gly) or when polar side-chain interactions with the main-chain peptide units stabilize this conformation.

    motifs is satisfied by water molecules or by polar side chains; where the side chains are inaccessible they place a strong constraint on neutral drift.

    Comparisons of homologous proteins show that interaction sites that mediate important functions by binding regulatory proteins, nucleic acids and other ligands also place strong evolutionary constraints on amino acid substitutions47–50. These interaction sites cannot be understood at the level of an isolated protein; rather, different proteins and sometimes other macromolecules associate to form a multicomponent system that serves as a functional unit and places significant constraints on evolutionary change. In insulin, for example, comparative analyses of family members have revealed that amino acid substitutions at the interfaces involved in dimer, hexamer and receptor complex formation have been under strong constraints since the evolution of bony fishes; only the rodent suborder of Hystricomorpha, which includes animals such as the guinea pig and the coypu, has monomeric insulins47. Although the amino acid substitutions that led to the loss of the ability of insulin to hexamerize in Hystricomorpha were first thought to be selectively neutral, it is now thought that they were probably selectively advantageous and provided a means of stably storing insulin, possibly in an environment with a shortage of zinc that prevented the use of zinc insulin hexamers as found in other mammals.

    For enzymes, it is clear that the local environment of catalytic residues in reaction intermediates and transition states must be considered. The need for particular recognition sequences at sites of post-translational modification, of adaptor–template protein interactions and of allosteric effector binding also provides strong constraints. Recently, it has also become evident that many of these sites of molecular interaction or recognition lead to further constraints on the substitution of amino acid residues in the vicinity of protein binding sites but not in immediate contact with a ligand51.

    Conservation and local environment

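    Sequence alignments of homologues of known structure can be used to help quantify the constraints that arise from both protein structure and function in a family of proteins. By defining the local structural environment of amino acid residues (secondary structure, solvent accessibility and formation of hydrogen bonds), distinct patterns of substitutions have been observed52,53. Environment-specific substitution tables (ESSTs) store these substitution data quantitatively in the form of probabilities and they provide information on the existence of each amino acid in a particular environment and the probability of it being substituted by any other amino acid (BOX 3).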
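A minimal sketch of how one environment-specific substitution table could be estimated from counts, assuming the input is a list of (residue observed in this environment, residue at the equivalent position in an aligned homologue) pairs. The function name, the input format and the optional pseudocount are illustrative assumptions; the published ESSTs involve weighting and smoothing steps that this sketch does not reproduce.

```python
from collections import Counter, defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def substitution_probabilities(pairs, pseudocount=0.0):
    """Estimate P(target | source) for one local structural environment.

    `pairs` is an iterable of (source, target) one-letter codes, where `source`
    is a residue observed in the environment and `target` is the residue at the
    equivalent position in an aligned homologue (hypothetical input format).
    """
    counts = defaultdict(Counter)
    for source, target in pairs:
        counts[source][target] += 1

    table = {}
    for source in AMINO_ACIDS:
        row = counts.get(source, Counter())
        total = sum(row.values()) + pseudocount * len(AMINO_ACIDS)
        if total == 0:
            continue  # no observations and no pseudocounts for this residue
        table[source] = {
            target: (row[target] + pseudocount) / total for target in AMINO_ACIDS
        }
    return table


# Toy data, not taken from any real alignment.
toy_pairs = [("S", "S"), ("S", "S"), ("S", "T"), ("S", "A")]
ser_row = substitution_probabilities(toy_pairs)["S"]
print(ser_row["S"], ser_row["T"])  # 0.5 0.25
```

Each row sums to 1, so the diagonal entry can be read as the probability that the residue is conserved in that environment.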

    These ESSTs show that amino acids with side chains that are hydrogen bonded to main-chain NH and CO groups are more conserved than those with side chains that are hydrogen bonded to other side chains. This is particularly evident when side chains are inaccessible to the solvent and when they form hydrogen bonds to main-chain NH groups. This implies that a crucial element in protein structure is the satisfaction of the hydrogen bond donor and acceptor properties of the main-chain NH and CO groups when the protein is folded. When these requirements are not satisfied by secondary structures, hydrogen bonds to side chains might be conserved to meet this requirement.

    Solvent accessibility has a major role. It has long been known that residue conservation in the solvent-inaccessible regions is much higher than in those regions that are solvent accessible54. FIGURE 1 shows the clustering of 64 local structural environments with the UPGMA (unweighted pair group method with arithmetic mean) algorithm55, based on distances among 64 substitution tables (64 × 64 distance matrix), to identify the structural constraints that determine the substitution patterns of amino acids. The distance between two substitution tables was measured by summing the differences in the probability of amino acid substitutions. The matrices for the 64 environments form 3 distinct clusters: 2 are distinguished by solvent accessibility (clusters 1 and 2 in FIG. 1), whereas the third is characterized by the presence of a positive main-chain torsion angle (cluster 3 in FIG. 1).
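The clustering step can be sketched as follows, under two assumptions not spelled out in the text: "summing the differences" is taken to mean the sum of absolute differences over all cells of the two tables, and each table is a nested dictionary keyed by source and target residue. The `upgma` function is a plain textbook implementation that returns only the tree topology (no branch lengths); all function names and the toy tables are hypothetical.

```python
def table_distance(t1, t2):
    """Sum of absolute differences between two substitution tables.

    Tables are dicts of dicts: table[source][target] -> probability.
    Missing cells are treated as 0.
    """
    distance = 0.0
    for source in set(t1) | set(t2):
        row1, row2 = t1.get(source, {}), t2.get(source, {})
        for target in set(row1) | set(row2):
            distance += abs(row1.get(target, 0.0) - row2.get(target, 0.0))
    return distance


def upgma(labels, dist):
    """UPGMA clustering of a symmetric distance matrix (list of lists).

    Returns the tree topology as nested tuples of labels.
    """
    # cluster id -> (subtree, number of leaves)
    clusters = {i: (label, 1) for i, label in enumerate(labels)}
    # pairwise distances keyed by (smaller id, larger id)
    d = {(i, j): dist[i][j]
         for i in range(len(labels)) for j in range(i + 1, len(labels))}
    next_id = len(labels)
    while len(clusters) > 1:
        a, b = min(d, key=d.get)  # closest pair of clusters, a < b
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        del d[(a, b)]
        # size-weighted mean distance from the merged cluster to each other cluster
        for k in clusters:
            d_ak = d.pop((min(a, k), max(a, k)))
            d_bk = d.pop((min(b, k), max(b, k)))
            d[(k, next_id)] = (d_ak * size_a + d_bk * size_b) / (size_a + size_b)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0]


# Invented single-row tables for three environments, for illustration only.
tables = {
    "HASON": {"S": {"S": 0.9, "T": 0.1}},
    "HaSON": {"S": {"S": 0.8, "T": 0.2}},
    "CAson": {"S": {"S": 0.3, "T": 0.7}},
}
names = list(tables)
matrix = [[table_distance(tables[x], tables[y]) for y in names] for x in names]
print(upgma(names, matrix))  # ('CAson', ('HASON', 'HaSON'))
```

For real work, a library routine such as average-linkage hierarchical clustering (which is UPGMA) would normally be preferred over this hand-written loop.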

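    Even in the cluster of environments with positive main-chain torsion angles (see below), solvent accessibility divides the environments into two: accessible and inaccessible. Solvent inaccessibility thus puts constraints on the acceptance of selectively neutral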

    Box 2 | A selection of protein classification databases and similarity search servers

    Insight into evolutionary relationships can be gained by grouping similar proteins. Several classification resources categorize proteins based on their degree of similarity but they differ in definition and method. Nevertheless, there is general agreement on the hierarchical order of overall topology or fold, superfamily, family and individual domains. Many proteins with the same topology will have convergently evolved, but members of superfamilies and families are likely to have arisen from a common ancestor by divergent evolution. SCOP104 and CATH105 are two well-known databases of hierarchical protein structure classification. HOMSTRAD70, PASS2 (REF. 106), Toccata107 and Dali108 provide superimposed and aligned protein families with various annotations at the residue level. CE109 also provides structure comparison and alignment. MMDB provides structure-neighbour calculations such that each structure is linked to related three-dimensional domains110. Sequence-based protein family databases include Pfam111 and InterPro112. InterPro is a consortium of several member databases such as PROSITE113, Pfam, Prints114, ProDom115, SMART116 and TIGRFAMs117. Using curated or computed protein classification schemes, homology detection can be achieved using sequence and/or structure similarity as implemented by Gene3D118, Superfamily119, PhyloFacts120, CDD121, PairsDB122 and SMART. These databases and servers can be useful resources in the study of protein evolution and a comprehensive comparison of them is available in REF. 123.


    [Figure 1: tree of the 64 local structural environments, with leaves labelled by five-letter environment codes (for example HASON, CaSON, PAsoN) and clusters 1, 2 and 3 marked at their branching nodes; ring legend: solvent accessibility (A, a), hydrogen bonds to main-chain NH (N, n), secondary structure (H, E, P, C), hydrogen bonds to main-chain CO (O, o).]

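    cluster, but the clustering pattern is weaker than that of main-chain NH groups. This suggests that the different types of hydrogen bonds have hierarchical effects on the substitution patterns of amino acids: hydrogen bonds between side chains and main-chain NH groups are most influential, followed by hydrogen bonds between main chains, and then those between side chains and main-chain CO groups. When the effects of solvent accessibility, and then of both solvent accessibility and the type of secondary structure, are averaged, the clustering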

    Figure 1 | Clustering of the 64 local structural environments. Trees are constructed on the basis of the 64 × 64 distance matrix. Environments are shown using five-letter code representation: the first letter defines the secondary structure (α-helix (H), β-strand (E), positive main-chain torsion angle (P) and coil (C)), the second defines solvent accessibility (accessible (A) and inaccessible (a)) and the remaining three letters define the existence (upper case) or absence (lower case) of hydrogen bonds from a side chain to another side chain (S and s, third letter), to a main-chain carbonyl group (O and o, fourth letter) and to a main-chain amide group (N and n, fifth letter) (see also BOX 3 for details). Three major clusters are numbered as 1, 2 and 3 on the nodes from which they branch. Around the tree there are four concentric rings, each of which represents a particular structural parameter: the first ring represents solvent accessibility, the second ring represents the existence or absence of hydrogen bonds from a side chain to a main-chain amide group, the third ring represents the type of secondary structure and the fourth ring represents the existence or absence of hydrogen bonds from a side chain to a main-chain carbonyl group. The four concentric rings highlight the hierarchical clustering of the 64 environments by showing which amino acid substitution matrices are similar and which local environments are the major determinants of the substitution patterns. The trees were drawn using iTOL128.
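The five-letter code in FIG. 1 can be generated mechanically, which also shows where the number 64 comes from: 4 secondary structure classes × 2 accessibility states × 2 × 2 × 2 side-chain hydrogen bond flags. This is a sketch; the function and argument names are hypothetical, and deciding whether a residue is accessible or hydrogen bonded (cut-offs, geometric criteria) is outside its scope.

```python
from itertools import product

# Secondary structure classes as in FIG. 1: alpha-helix, beta-strand,
# positive main-chain torsion angle, coil.
SECONDARY_STRUCTURE = ("H", "E", "P", "C")


def environment_code(ss, accessible, hb_side_chain, hb_main_chain_co, hb_main_chain_nh):
    """Build the five-letter environment label, for example 'HaSoN'."""
    if ss not in SECONDARY_STRUCTURE:
        raise ValueError(f"unknown secondary structure class: {ss!r}")
    return (
        ss
        + ("A" if accessible else "a")
        + ("S" if hb_side_chain else "s")
        + ("O" if hb_main_chain_co else "o")
        + ("N" if hb_main_chain_nh else "n")
    )


ALL_ENVIRONMENTS = [
    environment_code(ss, *flags)
    for ss in SECONDARY_STRUCTURE
    for flags in product((True, False), repeat=4)
]

print(len(ALL_ENVIRONMENTS))  # 64
print(environment_code("H", False, True, False, True))  # HaSoN
```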


    van der Waals interaction

    A weak electrostatic interaction that is formed by the fluctuating electron clouds of two atoms.

    retains the same order of hierarchy (see Supplementary information S1a,b (figure)). It is evident that there is a hierarchy in the influence of the eight types of hydrogen bonds from side chains on amino acid substitutions in homologous proteins (see Supplementary information S1c,d (figure)).

    Positive torsion angles constrain protein evolution. In FIG. 1, the matrices for those of the 64 environments that have a positive torsion angle constitute a distinct cluster, whereas other elements of secondary structure are divided by solvent accessibility. A positive torsion angle can be accommodated by a Gly, which has no side chain, but for most other L-amino acids it leads to disallowed interactions between side-chain and main-chain atoms. However, for L-amino acids such as Asp or Asn, interactions between the side-chain CO group and the CO of the main-chain peptide bond can stabilize a positive angle conformation58. Indeed, Gly represents 63% of total amino acids that have a positive torsion angle, followed by Asn (8%) and Asp (5%) (data from ESSTs). In addition, in a positive angle class, solvent-accessible amino acids occur five times as frequently as inaccessible residues, whereas the average ratio of accessible to inaccessible residues is less than or equal to 2.2 for all classes of secondary structure. Hence, the predominance of Gly and polar residues in the cluster of amino acids with a positive torsion angle makes a distinct substitution pattern and eventually a distinct cluster.
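Whether a residue falls into the positive-torsion-angle class depends on a dihedral angle computed from atomic coordinates. The sketch below uses the standard four-point signed dihedral formula; it assumes that the angle meant is φ, defined by the atoms C(i-1), N(i), Cα(i) and C(i), which matches the glossary's description of rotation about the N–Cα bond. Coordinates are plain (x, y, z) tuples, and the helper names are hypothetical.

```python
import math


def _sub(p, q):
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle p0-p1-p2-p3 in degrees, in the range (-180, 180]."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def has_positive_phi(c_prev, n, ca, c):
    """True if phi (C(i-1)-N-CA-C) is positive, as in the 'P' environment class."""
    return dihedral(c_prev, n, ca, c) > 0.0


# Idealized test geometry, not real protein coordinates.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))  # 90.0
```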

    The frequency of occurrence of local environments. Analysis of representative structures59 of protein families shows that ~80% of all amino acids belong to 1 of 11 (out of 64) local environments (see Supplementary information S2 (table)). However, none of these 11 local environments includes any hydrogen bonds from side chains to main-chain NH groups, as expected from the observation that 68.6% of amino acids are non-polar and therefore cannot form hydrogen bonds with their side chains. Only 8.5% of amino acids have a side chain with a proton acceptor group and can therefore make hydrogen bonds from their side chains to main-chain NH groups, the second most important local environmental determinant of substitutions after solvent accessibility (see Supplementary information S3 (table)). These 8.5% of amino acids include Asp, Ser, Asn, Thr, Glu, Gln, Tyr, Met, Cys and His, and among them only Asp, Asn and Ser are over-represented compared with their background propensities in the protein data set. This shows that the distribution of amino acids taking part in hydrogen bonding from side chains to main chains follows the power-law distribution: only a small proportion of amino acids have an important role in the substitution pattern.
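Over-representation relative to background can be expressed as a propensity: a residue's frequency in the subset of interest divided by its frequency in the whole data set, with values above 1 meaning over-represented. A minimal sketch; the function name and the toy sequences are hypothetical and are not the data behind the percentages quoted above.

```python
from collections import Counter


def propensities(subset, background):
    """Frequency of each residue in `subset` divided by its frequency in `background`.

    Both arguments are iterables of one-letter residue codes. Residues absent
    from the background are skipped to avoid division by zero.
    """
    sub_counts, bg_counts = Counter(subset), Counter(background)
    n_sub, n_bg = sum(sub_counts.values()), sum(bg_counts.values())
    return {
        aa: (sub_counts[aa] / n_sub) / (bg_counts[aa] / n_bg)
        for aa in sub_counts
        if bg_counts[aa] > 0
    }


# Toy example: residues making side-chain to main-chain NH hydrogen bonds
# versus all residues in a made-up data set.
print(propensities("DDNS", "DNSAAAAA"))  # {'D': 4.0, 'N': 2.0, 'S': 2.0}
```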

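    We have shown that the degree of amino acid conservation is most affected by solvent accessibility, followed by the presence of hydrogen bonds from side chains to main chains and between main chains. However, there are other types of non-conventional interactions that are highly conserved and have important roles in protein structures and binding regions.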

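    Their importance is discussed in terms of protein stability later in this Review. A further consideration is the extent to which the local environment is conserved in homologous families and therefore can provide constraints on amino acid substitutions. Analyses of protein families and superfamilies show that the most crucial packing arrangements of individual side chains begin to differ when two proteins have less than 30% sequence identity. This is due to relative movements of equivalent secondary structural elements. However, some crucial hydrogen-bonding interactions are retained at much greater levels of sequence divergence.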
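The 30% threshold refers to pairwise sequence identity, for which several definitions are in use. The sketch below assumes identity is counted over aligned columns in which neither sequence has a gap; this is one common convention, not necessarily the one used in the analyses cited. The function name and the gap character are assumptions.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over columns where neither sequence is gapped."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


# Toy alignment: 4 identical residues out of 5 ungapped columns.
pid = percent_identity("ACDE-G", "ACDF-G")
print(pid, pid < 30.0)  # 80.0 False
```

Because the denominator changes with the convention (ungapped columns, shorter sequence length, or full alignment length), the same pair of proteins can fall either side of a 30% cut-off depending on the definition chosen.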

    Satisfaction of hydrogen bonds

    Burying main chains and side chains in the interior of the protein removes them from the solvent and, through the hydrophobic effect, contributes much to the stability of the folded state of a protein. However, it is now clear that a comparable contribution to the stability of the folded protein is made by hydrogen bonding either within α-helices and β-sheets or through side chains forming hydrogen bonds with the unsatisfied NH and CO groups, as noted above. Indeed, hydrogen-bonded side-chain groups occupy smaller volumes than the same groups when not hydrogen bonded. This leads to increased packing density and stronger van der Waals interactions in a protein60, thus making a large, favourable contribution to protein stability and thereby to evolution61.

    Many side chains can make more than one hydrogen bond by acting as both proton donor and acceptor. Surveys of hydrogen bonds in sets of high-resolution protein structures have revealed that the polar atoms of a protein rarely fail to form hydrogen bonds and that they contribute to a hydrogen bond network that stabilizes the protein structure34,62,63. However, most studies that have looked at the satisfaction of hydrogen bonding potential in proteins have focused on main-chain interactions and have grouped side-chain interactions rather than treating each amino acid side chain separately62,63. Recently, an analysis of the hydrogen bonding potential of polar side chains in protein families has been described64. Unlike previous studies of hydrogen bonds in proteins, this study estimated the conservation of these polar residues in order to identify relationships that exist between residue conservation and satisfaction of hydrogen bond potential. Analysis of the sequence variability of buried amino acid residues in protein families shows that buried polar side chains, for which the hydrogen bond capacity is satisfied (that is, they form the full number of hydrogen bonds that they are capable of), are the most conserved amino acid residues in proteins. Buried and satisfied polar side chains are more conserved than non-polar residues and buried polar side chains that are unsatisfied or that do not form any hydrogen bonds.
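The categories compared in that analysis can be made explicit with a small classifier. This is a sketch under stated assumptions: the relative-accessibility cut-off for "buried" is a hypothetical parameter (no value is given in this text), and the number of hydrogen bonds formed and each side chain's capacity are taken as inputs rather than computed from coordinates.

```python
def classify_polar_residue(relative_accessibility, n_hbonds, capacity, buried_cutoff=0.07):
    """Assign a polar side chain to one of the categories discussed in the text.

    relative_accessibility: side-chain accessibility as a fraction of its maximum.
    n_hbonds: number of hydrogen bonds the side chain actually forms.
    capacity: number of hydrogen bonds the side chain could form (its potential).
    buried_cutoff: hypothetical threshold below which a residue counts as buried.
    """
    if relative_accessibility >= buried_cutoff:
        return "accessible"
    if n_hbonds == 0:
        return "buried, no hydrogen bonds"
    if n_hbonds >= capacity:
        return "buried, satisfied"
    return "buried, unsatisfied"


# Toy values, not measurements from any structure.
print(classify_polar_residue(0.02, 3, 3))  # buried, satisfied
print(classify_polar_residue(0.02, 1, 3))  # buried, unsatisfied
```

Tabulating sequence variability separately for each returned category is one way to reproduce the comparison described above on a given set of families.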

    Distinguishing the hydrogen-bonded state of a polar residue's side chain in terms of hydrogen bond satisfaction explains the observed conservation of these polar residues, particularly when the polar residue is buried. When a polar residue is buried and satisfied in terms of


    [Figure 2c: multiple sequence alignment excerpt for 3app, 4ape, 2apr, 1smra, 1mpp, 1am5, 1psn and 2asi, residues 30 to 50 and 240 to 260.]

    Tyr corner motif

    A motif that involves a conserved Tyr within Greek key proteins forming a hydrogen bond with the local protein backbone in an adjacent loop.

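    side-chain hydrogen bonding, it is likely to have been conserved during evolution as it stabilizes the protein structure. Conversely, this same evolutionary pressure for conservation is not exerted on buried polar residues that are not hydrogen bonded or that are unsatisfied. Therefore, satisfaction of the hydrogen bonding potential of polar side chains is a key constraint in protein evolution.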

    Stabilization of protein architecture

    So what kind of protein structures do these buried polar residues maintain? Most analyses of the stabilizing roles of polar side chains on the backbones of protein structures have focused on a particular secondary structural context42,65–67. One such study68 analysed side-chain to side-chain and side-chain to main-chain interactions that were classified according to the position of the atom groups relative to the amino and carboxy termini of α-helices, β-strands and coils. This and other analyses showed that capping residues such as Glu or Asp interact with α-helix dipoles, which are formed by the aggregate effect of individual dipoles from all of the peptide groups in an α-helix and result in a partial positive charge at the α-helix N terminus and a partial negative charge at the C terminus69. Four- and five-residue motifs that begin with a Ser or Thr (ST motif)37 or an Asp or Asn (Asx motif)38 were identified. These motifs form hydrogen bonds from their respective polar side chains to the main-chain atoms of amino acids near the α-helix C terminus. The motifs help to stabilize protein structure when they occur at α-helix N termini but also commonly form independent ST-turns or Asx-turns or feature within β-bulge loops.

    The key role that stabilizing hydrogen bond interactions have in maintaining protein structure is further demonstrated by an example that recurs in proteins: a highly conserved Tyr in the Tyr corner motif of immunoglobulin-like β-sandwich proteins is important for maintaining protein stability67. This is one of the many identified examples of recurring patterns that involve hydrogen bond interactions.

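    Analysis of the HOMSTRAD database70 shows that out of a total of 142 protein families that have 5 members or more, 66 have entirely conserved buried polar residues and these equivalent residues form hydrogen bonds through their side chains to a main-chain atom in each structure. FIGURE 2 shows one such example of conservation of sequence and local structural environment for the aspartic proteinase family. The conservation of these side-chain to main-chain interactions implies that main-chain architecture is a crucial constraint on the evolution of proteins and that the interactions are retained as an essential part of the protein fold. Indeed, in this case it has been recognized that these hydrogen bonds contribute to holding together two domains, which seem to have evolved from identical subunits in an ancestral protein and are now retained in the dimeric retroviral proteinases, such as that from HIV.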

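    What then can be said in more general terms about the architectures in which these interactions have such crucial roles? We now show that apart from capping local secondary structures, they often span elements of the secondary structure, in a way that is reminiscent of the roles of joists, bars or struts that span pillars and posts, and at other times support complex loop structures, like trusses that support the roofs of buildings.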

    Side chains spanning secondary structures. Typical examples of side chains that span secondary structures are provided by Asp residues, which frequently span α-helical N termini by forming hydrogen bonds to the N-terminal main-chain NH groups36,71,72 of an adjacent α-helix. Such roles for Asp residues on α-helices provide strong constraints on their substitution by other amino acids. Arg residues have similar roles, spanning the C termini of α-helices. Thus, a buried Arg that is conserved in all seven members of the ribulose bisphosphate carboxylase family is always found at an equivalent position in an α-helix C terminus and is observed to span two helices, forming hydrogen bonds to the C terminus of the adjacent α-helix (FIG. 3a).

    Figure 2 | Conservation of sequence and local structural environment in the aspartic proteinase family. a | Superimposed cartoon of eight members of the pepsin-like aspartic proteinase family that have two conserved buried Thr residues in topologically equivalent positions (shown in magenta), showing that hydrogen bonding interactions and the architectures that they stabilize are conserved in evolution. b | The two conserved Thr residues in a representative pepsin-like aspartic proteinase family member (Protein Data Bank code 3app). Each Thr forms two hydrogen bonds (shown as grey dashed lines) to main-chain atoms. These residues and the interactions that they form are conserved across the family, implying that the side-chain to main-chain interactions have an important role in the main-chain architecture of these proteins; in fact, the hydrogen bonds formed between the Thr residues and the main chain help to hold the two domains together. c | Selected regions of a multiple sequence alignment of the aspartic proteinases with two conserved DTG motifs (highlighted by black stars). The local structural environment of each residue in the alignment is indicated using JOY annotation124: solvent inaccessible (upper case), solvent accessible (lower case), α-helix (red), β-strand (blue), hydrogen bond to side chain (overlined), hydrogen bond to main-chain amide group (bold), hydrogen bond to main-chain carbonyl group (underlined), disulphide bond (cedilla) and positive main-chain torsion angle (italic). Conserved α-helices and β-strands are indicated by a and b, respectively. All protein structure images were produced using PyMOL.

    R E V I E W S

    NATURE REVIEWS | MOLECULAR CELL BIOLOGY VOLUME 10 | OCTOBER 2009 | 715

    2009 Macmillan Publishers Limited. All rights reserved


    Cation–π interaction

    A noncovalent interaction between an aromatic side chain and a cationic side chain.

    Conserved side chains are also found spanning β-strands. This is often because main-chain atoms in β-strands are not satisfied by intra-β-sheet hydrogen bonds and require side chains to satisfy their hydrogen bonding potential. This is frequently the case for edge strands (β-strands with one or no hydrogen bonding partner strand) or staggered β-strands, for example those in β-barrel structures. For instance, an entirely conserved and buried Asn residue in the picornavirus coat protein family forms hydrogen bonds with main-chain atoms in adjacent edge strands, providing a mechanism to satisfy the hydrogen bonding potential of these main-chain atoms (FIG. 3b).

    Distortions in α-helices also lead to constraints on the substitution of buried polar residues. For example, in the matrix metalloproteinase family a buried Tyr hydrogen bonds to main-chain atoms in a distorted α-helix, probably helping to stabilize the active site His residues in a conformation that is necessary for catalysis (FIG. 3c).

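    Other weak non-covalent interactions, such as aromatic–aromatic73,74, amino–aromatic75,76 and cation–π interactions77, also provide a mechanism for stabilizing protein structure, and therefore lead to additional constraints on amino acid substitutions during divergent protein evolution. An interesting example is found in the glucocorticoid receptor family, in which a conserved Arg forms a cation–π interaction with a conserved and buried Tyr (FIG. 3d). By means of structural, phylogenetic and functional analyses, it was shown that a mutation from Tyr to Arg at position 27 in an ancestral protein of the glucocorticoid receptor must have increased the stability of a crucial part of the receptor78. The authors postulate that although this mutation had no immediate consequence, it created a permissive sequence environment for substitutions that, millions of years later, remodelled the protein and yielded a new function.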
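A cation–π contact such as the Arg–Tyr pair described above can be screened geometrically by measuring the distance from the cationic group to the centre of the aromatic ring. The sketch assumes standard PDB ring atom names for Tyr and an illustrative 6.0 Å cutoff. Both the cutoff and the omission of any ring-orientation test are simplifications chosen here, not criteria taken from the cited studies.

```python
import math

# Aromatic ring atoms of a Tyr (or Phe) side chain, using standard PDB atom names.
RING_ATOMS = ("CG", "CD1", "CD2", "CE1", "CE2", "CZ")


def ring_coords(residue_atoms: dict[str, tuple[float, float, float]]) -> list[tuple[float, float, float]]:
    """Pick the six ring atom coordinates out of a residue's atom-name -> xyz mapping."""
    return [residue_atoms[name] for name in RING_ATOMS]


def centroid(points: list[tuple[float, float, float]]) -> tuple[float, float, float]:
    """Arithmetic mean of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def cation_pi_distance(cation_xyz, ring_xyz) -> float:
    """Distance (Å) from a cationic atom (e.g. Arg CZ or Lys NZ) to the ring centroid."""
    return math.dist(cation_xyz, centroid(ring_xyz))


def is_cation_pi(cation_xyz, ring_xyz, cutoff: float = 6.0) -> bool:
    """Flag a possible cation–π contact using only an assumed distance cutoff."""
    return cation_pi_distance(cation_xyz, ring_xyz) <= cutoff
```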

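    All of the conserved buried polar residues shown in FIG. 3a–d have roles that are in many ways analogous to those of struts or joists in buildings. We conclude that the constraints on amino acid substitution come not only from the local environment but also from the residue's role within the overall architecture and function of the protein.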

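    Side chains supporting coils and turns. In regions of extensive non-regular secondary structure, amino acid residues are often unable to form intra-main-chain hydrogen bonds. Such structures are often supported by polar side chains from secondary structure elements. Examples occur with twisted β-strands, short α-helices and complex loop structures. All of these create architectural requirements that constrain the local environments.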

    For example, the Ca2+-binding, parvalbumin-like proteins have an entirely conserved and buried Asp that forms hydrogen bonds to a coil region (FIG. 4a). A similar interaction is conserved in the interleukin-1-like growth factor family, in which a conserved and buried Ser forms hydrogen bonds to main-chain atoms in type I and type IV β-turns (FIG. 4b); in this case, however, the Ser emerges from a β-strand rather than an α-helix. In both examples, the conserved buried polar residues help to stabilize regions of coil, often in elaborate loop structures that form extended turns and arches. Previous analyses of intra-coil side-chain to main-chain hydrogen bonds revealed that Asp, Ser, Asn and Thr are the polar residues that most commonly form this type of interaction, with 80% of these cases being at solvent-exposed sites68.

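    The alcohol dehydrogenase family has a conserved buried Arg residue that forms hydrogen bonds to polyproline II helices (FIG. 4c). In fact, Arg is the most common polar residue to form hydrogen bonds to main-chain atoms of polyproline II helices43. In these helices, intra-chain hydrogen bonds cannot form owing to the extended nature of the chains, and the three-fold screw rotation symmetry prevents extensive super-secondary interactions of the kind found in β-sheets. Instead, side-chain to main-chain hydrogen bonds satisfy the hydrogen bonding potential of main-chain atoms and contribute to polyproline stability.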

    All of the conserved buried polar residues shown in FIG. 4a–c form multiple side-chain to main-chain interactions, often in structures that resemble the trusses of roof supports and bridges. Buried polar amino acids

    Figure 3 | Conserved buried polar residues spanning secondary structures. a | A buried Arg at an α-helix carboxyl terminus in ribulose bisphosphate carboxylases (Protein Data Bank (PDB) code 1gk8) forming hydrogen bonds to another α-helix C terminus. b | A buried Asn spans β-strands in a β-barrel, forming a hydrogen bond with the main chain of another β-strand in the picornavirus coat protein family (PDB code 1tme). c | A Tyr in the matrix metalloproteinase family that spans α-helices, forming a hydrogen bond to a main-chain group in a second (distorted) α-helix that contains two active site His residues on the opposite face. d | An Arg in the glucocorticoid receptor family that forms a cation–π interaction with a Tyr residue (PDB code 1m2z). Representative structures were chosen for each family based on resolution; residues are coloured by atom type, with buried polar residues shown in magenta. Hydrogen bonds are shown in grey.


    SH3 domain

    (Src homology 3 domain). A small domain that is found in various intracellular or membrane-associated proteins and has a β-barrel fold.

    Euclidean distance

    The straight-line geometric distance between two points in n-dimensional (Euclidean) space.

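    can provide a mechanism for pinning loops in place when main-chain to main-chain interactions are not sufficient. The conservation of these residues and of the interactions that they form implies that they are important for maintaining protein structure, and therefore that they impose strong constraints on amino acid substitutions.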

    Evolutionary pressure on fast folding

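    Residues that ensure fast and correct protein folding are also important for correct function, and this also constrains the evolution of proteins. Folding simulations and sequence design have been used to develop a method for determining the folding nucleus of a protein of known structure. This method has been applied to chymotrypsin inhibitor 2 (REF. 79). The predicted set of folding nucleus residues matched those identified by kinetic studies80, with a clear qualitative correlation observed between site conservation and Φ-values for folding. The Φ-value indicates the importance of a given residue to the structure of the folding nucleus by providing a quantitative measure of the extent to which that residue participates in native-like interactions during the rate-limiting step in folding. The study implies that residues involved in the folding nucleus, which are therefore important for forming the native protein structure, constrain amino acid substitutions.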
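In Φ-value analysis, Φ compares how much a mutation destabilizes the folding transition state with how much it destabilizes the native state: Φ = ΔΔG‡/ΔΔG_eq, where ΔΔG‡ = RT ln(k_f(wt)/k_f(mut)) is obtained from the folding rate constants. Values near 1 indicate native-like interactions at that position in the transition state, and values near 0 indicate a region that is still unstructured. The helper below sketches that arithmetic for a single mutation. It assumes two-state folding, a rate prefactor that is unchanged by the mutation, and energies in kJ/mol.

```python
import math

R = 8.314e-3  # gas constant, kJ mol^-1 K^-1


def phi_value(kf_wt: float, kf_mut: float, ddg_eq: float, temperature: float = 298.15) -> float:
    """Phi = ddG(TS-D) / ddG(N-D) for one mutation (two-state folding assumed).

    kf_wt, kf_mut: folding rate constants (same units) for wild type and mutant.
    ddg_eq: loss of equilibrium stability on mutation, in kJ/mol (positive if destabilizing).
    """
    if ddg_eq == 0:
        raise ValueError("phi is undefined when the mutation does not change stability")
    # Destabilization of the transition state relative to the denatured state.
    ddg_ts = R * temperature * math.log(kf_wt / kf_mut)
    return ddg_ts / ddg_eq
```

For example, a mutation that slows folding tenfold while destabilizing the protein by twice the corresponding transition-state energy gives Φ = 0.5. In practice, Φ becomes unreliable when ΔΔG_eq is small, because the ratio then amplifies measurement error.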

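    Mirny et al. developed the 'conservatism of conservatism' principle for analysing evolutionary signals that are specific to a given fold. That is, they identified conserved amino acid positions in families of proteins that are structurally related to one another but not related by sequence81. This approach identified residues that belong to the folding nucleus of the chemotaxis protein CheY81. Subsequent application to five of the most common protein folds demonstrated that evolutionary pressure towards fast folding and function can also make residues more conserved than expected from their solvent accessibility82. However, not all researchers agree that fast folding constrains the evolution of proteins. For example, Baker and co-workers did not observe a correlation between conservation and experimentally measured Φ-values83. Nevertheless, they did observe that the contribution of individual sequence positions to the transition state structure is significantly correlated among homologous proteins. This indicates that the structure of the folding transition state ensemble seems to be more highly conserved than the specific interactions that stabilize it83.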
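The comparisons above rest on correlating a per-position conservation score with measured Φ-values. In the sketch below, conservation is scored as one minus the Shannon entropy of an alignment column, normalized by ln 20. This is a common choice but an assumption here, not the measure used in refs 79–83. The association is then tested with a Spearman rank correlation, computed as the Pearson correlation of average ranks so that ties are handled.

```python
import math
from collections import Counter


def conservation(column: str) -> float:
    """1 - Shannon entropy / ln(20) for one alignment column; gaps ('-') are ignored.

    Returns 1.0 for an invariant column. Assumes at least one non-gap residue.
    """
    residues = [r for r in column if r != "-"]
    n = len(residues)
    h = -sum((c / n) * math.log(c / n) for c in Counter(residues).values())
    return 1.0 - h / math.log(20)


def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```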

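    Further studies have indicated that poorly and highly conserved residues are equally likely to participate in protein-folding nuclei, reigniting the controversy over whether folding nuclei are evolutionarily conserved84,85. However, these later studies confirmed that the folding nucleus of CheY is significantly conserved. This was the exception in the protein data sets studied, and it is perhaps due to the extraordinarily tight packing of the folding nucleus in CheY76. The folding nuclei of some proteins contain non-native interactions in the transition state that, when weakened, slow folding but do not change protein stability. This is illustrated by a universally conserved Ile in the SH3 domain, which is kinetically but not thermodynamically important in the SH3 domain-containing protein Tyr kinase Src (REF. 86). Therefore, evolutionary constraints on protein structure act both to maintain protein architecture and to maintain correct (and fast) folding.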

    Maintenance of function

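    All of the constraints related to the maintenance of tertiary structure are ultimately functional. However, many functions are mediated through quaternary interactions, either of proteins with other macromolecules in assemblies or with substrates, ligands or allosteric regulators. The effects of these constraints are felt some distance away from the interaction site, but their influence tends to increase closer to the recognition site. To investigate this, the Euclidean distance was measured between every amino acid and the known functional residues, and the degree of conservation was compared according to proximity to functional residues51. The authors showed that the degree of residue conservation is significantly higher for residues near the active site than for those far from it. Hence, geometrical distance from known active sites constitutes another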

    Figure 4 | Conserved buried polar residues supporting coils. a | An Asp forming hydrogen bonds to a coil region in the Ca2+-binding, parvalbumin-like proteins (Protein Data Bank (PDB) code 5pal). b | A Ser forming hydrogen bonds to main-chain atoms in type I and type IV β-turns in interleukin-1-like growth factor family proteins (PDB code 2fgf). c | An Arg forming hydrogen bonds to polyproline II helices in the alcohol dehydrogenases (polyproline interaction on the right) (PDB code 2ohxa). Representative structures were chosen for each family based on resolution; residues are coloured by atom type, with buried polar residues shown in magenta. Hydrogen bonds are shown in grey.


    constraint on amino acid substitutions in protein evolution. It can therefore be used as an additional parameter to define the local structural environment when classifying amino acid substitution patterns.
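The distance analysis described above can be sketched with one representative coordinate per residue (for example its Cα atom, which is an assumption made here). The code computes each residue's shortest Euclidean distance to any known functional residue and then compares mean conservation scores for residues within and beyond a cutoff. The 8 Å cutoff and the input layout are hypothetical choices for illustration only.

```python
import math
from statistics import mean


def min_distance_to_sites(coords, site_indices):
    """For each residue, the smallest Euclidean distance to any functional residue."""
    return [min(math.dist(c, coords[j]) for j in site_indices) for c in coords]


def conservation_near_vs_far(coords, conservation, site_indices, cutoff=8.0):
    """Mean conservation of non-functional residues within, and beyond, `cutoff` Å of a functional site."""
    distances = min_distance_to_sites(coords, site_indices)
    near, far = [], []
    for i, (score, d) in enumerate(zip(conservation, distances)):
        if i in site_indices:
            continue  # exclude the functional residues themselves
        (near if d <= cutoff else far).append(score)
    return mean(near), mean(far)
```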

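    The impact of various functional constraints on the conservation of amino acids in three-dimensional structures has been investigated. These constraints were defined mainly in terms of interactions with other molecules, such as substrates, ligands, nucleic acids and other proteins. Functional residues were excluded (masked) from the sequence alignment, and the degree of residue conservation was measured by discarding the observations of functional residues when evaluating substitution probabilities59. Several masking models were prepared using various combinations of functional residues. These were compared with the non-masking model, which includes functional residues when evaluating substitution probabilities. The average probability of amino acid conservation for the non-masking model was ~1.36% higher than that of the masking models, although the difference was less distinct when enzyme active sites were omitted from masking59. Overall, this shows that functional residues are under greater pressure to be conserved throughout evolution when they are crucially important to the activity of proteins, and thus of selective advantage to the organism.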
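The effect of masking can be illustrated with a much simpler statistic than environment-specific substitution tables: the fraction of aligned residue pairs that are identical, counted with and without the masked (functional) columns. This sketch ignores structural environments and sequence weighting, so it is only a toy version of the comparison in ref. 59. The function names and inputs are hypothetical.

```python
from collections import Counter
from itertools import combinations


def substitution_counts(alignment: list[str], masked=frozenset()) -> Counter:
    """Symmetric counts of aligned residue pairs over all sequence pairs, skipping masked columns and gaps."""
    counts = Counter()
    for col in range(len(alignment[0])):
        if col in masked:
            continue  # masked (functional) column: observations discarded
        for a, b in combinations([seq[col] for seq in alignment], 2):
            if a == "-" or b == "-":
                continue
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def conservation_probability(counts: Counter) -> float:
    """Fraction of counted residue pairs in which the residue is unchanged."""
    total = sum(counts.values())
    same = sum(v for (a, b), v in counts.items() if a == b)
    return same / total
```

In the toy alignment used for testing, masking two invariant columns lowers the conservation probability from 2/3 to 1/3. This follows the same direction as the difference reported above.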

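    Mutations that occur in functional residues can lead to loss of function of the protein, either by disrupting the native structure or by interfering with the interaction with other molecules. However, mutations are sometimes compensated by other mutations occurring in the interacting partner molecule or molecules. This is explained as co-adaptation or co-evolution of interacting protein pairs87,88.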

    Conclusions

    We have discussed how structural and functional factors constrain the evolution of proteins, with both being driven by the maintenance of protein function. Identifying such constraints in protein families can help protein engineering experiments, such as designing enzymes with new functions or stabilizing protein conformations by site-directed mutagenesis. Understanding such factors also allows members of a superfamily to be identified and often allows functionally important interacting regions to be predicted, so providing a valuable annotation of genome sequences in terms of structure and function.

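    In this review we have focused on constraints on the substitution of individual amino acids. Strong constraints arise from the conservation of structure, not only from the maintenance of a hydrophobic core and secondary structure but also from buried, often charged, hydrogen bonds. However, we have also discussed how constraints arise from interactions with other proteins. These interactions are often components of interaction networks that are conserved throughout evolution89, so interacting proteins are under various constraints, such as activity and lifetime90–92. Other factors are also correlated with the rate of protein evolution. For example, expression might be an important factor influencing evolutionary rate93–95, as highly expressed proteins are constrained to accumulate fewer mutations than other proteins to avoid the cost of misfolding. A proper understanding of the constraints on amino acid substitutions is an essential prerequisite for understanding protein evolution, but further insights will depend on integrated and multidisciplinary systems approaches23,96.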

    1. Bajaj, M. & Blundell, T. Evolution and the tertiary structure of proteins. Annu. Rev. Biophys. Bioeng. 13, 453–492 (1984).

    2. Chothia, C. & Lesk, A. M. The relation between the divergence of sequence and structure in proteins. EMBO J. 5, 823–826 (1986).

    This paper quantifies the relationship between sequence variance and structural tolerance.

    3. Kimura, M. Evolutionary rate at the molecular level. Nature 217, 624–626 (1968).

    The first paper to introduce the neutral theory of evolution.

    4. Ohta, T. Slightly deleterious mutant substitutions in evolution. Nature 246, 96–98 (1973).

    Introduces the nearly neutral theory of molecular evolution, a modification of that detailed in reference 3.

    5. Zuckerkandl, E. Evolutionary processes and evolutionary noise at the molecular level. I. Functional density in proteins. J. Mol. Evol. 7, 167–183 (1976).

    6. Zuckerkandl, E. Evolutionary processes and evolutionary noise at the molecular level. II. A selectionist model for random fixations in proteins. J. Mol. Evol. 7, 269–311 (1976).

    7. Fraser, H. B., Hirsh, A. E., Steinmetz, L. M., Scharfe, C. & Feldman, M. W. Evolutionary rate in the protein interaction network. Science 296, 750–752 (2002).

    8. Bloom, J. D. & Adami, C. Apparent dependence of protein evolutionary rate on number of interactions is linked to biases in protein-protein interactions data sets. BMC Evol. Biol. 3, 21 (2003).

    9. Jordan, I. K., Wolf, Y. I. & Koonin, E. V. No simple dependence between protein evolution rate and the number of protein–protein interactions: only the most prolific interactors tend to evolve slowly. BMC Evol. Biol. 3, 1 (2003).

    10. Orengo, C. A. & Thornton, J. M. Protein families and their evolution – a structural perspective. Annu. Rev. Biochem. 74, 867–900 (2005).

    11. Bullock, A. N. et al. Thermodynamic stability of wild-type and mutant p53 core domain. Proc. Natl Acad. Sci. USA 94, 14338–14342 (1997).

    An elegant study that applied techniques initially devised to study the biophysics of protein folding to mutations in the protein p53, demonstrating that most of these changes are destabilizing.

    12. Cañadillas, J. M. et al. Solution structure of p53 core domain: structural basis for its instability. Proc. Natl Acad. Sci. USA 103, 2109–2114 (2006).

    13. Friedler, A., Veprintsev, D. B., Hansson, L. O. & Fersht, A. R. Kinetic instability of p53 core domain mutants: implications for rescue by small molecules. J. Biol. Chem. 278, 24108–24112 (2003).

    14. Joerger, A. C., Allen, M. D. & Fersht, A. R. Crystal structure of a superstable mutant of human p53 core domain. Insights into the mechanism of rescuing oncogenic mutations. J. Biol. Chem. 279, 1291–1296 (2004).

    15. Nikolova, P. V., Henckel, J., Lane, D. P. & Fersht, A. R. Semirational design of active tumor suppressor p53 DNA binding domain with enhanced stability. Proc. Natl Acad. Sci. USA 95, 14675–14680 (1998).

    16. Wang, X., Minasov, G. & Shoichet, B. K. Evolution of an antibiotic resistance enzyme constrained by stability and activity trade-offs. J. Mol. Biol. 320, 85–95 (2002).

    17. Aharoni, A. The evolvability of promiscuous protein functions. Nature Genet. 37, 73–76 (2005).

    An original study on the evolution of new protein functions that shows that the process is driven by mutations having little effect on native function but large effects on promiscuous function.

    18. Aharoni, A. et al. Directed evolution of mammalian paraoxonases PON1 and PON3 for bacterial expression and catalytic specialization. Proc. Natl Acad. Sci. USA 101, 482 (2004).

    19. Andreeva, A. & Murzin, A. G. Evolution of protein fold in the presence of functional constraints. Curr. Opin. Struct. Biol. 16, 399–408 (2006).

    A review of the mechanisms by which a protein fold can evolve whilst maintaining the functional-site structure.

    20. Caetano-Anollés, G., Wang, M., Caetano-Anollés, D. & Mittenthal, J. E. The origin, evolution and structure of the protein world. Biochem. J. 417, 621–637 (2009).

    21. Copley, R. R., Letunic, I. & Bork, P. Genome and protein evolution in eukaryotes. Curr. Opin. Chem. Biol. 6, 39–45 (2002).

    22. Kinch, L. N. & Grishin, N. V. Evolution of protein structures and functions. Curr. Opin. Struct. Biol. 12, 400–408 (2002).

    23. Pal, C., Papp, B. & Lercher, M. J. An integrated view of protein evolution. Nature Rev. Genet. 7, 337–348 (2006).

    A comprehensive review of various approaches to study protein evolution.

    24. Koonin, E. V. Orthologs, paralogs, and evolutionary genomics. Annu. Rev. Genet. 39, 309–338 (2005).

    25. Hubbard, T. J. & Blundell, T. L. Comparison of solvent-inaccessible cores of homologous proteins: definitions useful for protein modelling. Protein Eng. 1, 159–171 (1987).

    26. Garnier, J., Osguthorpe, D. J. & Robson, B. Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. J. Mol. Biol. 120, 97–120 (1978).


    27. Gibrat, J. F., Garnier, J. & Robson, B. Further developments of protein secondary structure prediction using information theory. New parameters and consideration of residue pairs. J. Mol. Biol. 198, 425–443 (1987).

    28. Levin, J. M., Robson, B. & Garnier, J. An algorithm for secondary structure determination in proteins based on sequence similarity. FEBS Lett. 205, 303 (1986).

    29. Pauling, L. & Corey, R. B. Configurations of polypeptide chains with favored orientations around single bonds: two new pleated sheets. Proc. Natl Acad. Sci. USA 37, 729–740 (1951).

    30. Pauling, L., Corey, R. B. & Branson, H. R. The structure of proteins; two hydrogen-bonded helical configurations of the polypeptide chain. Proc. Natl Acad. Sci. USA 37, 205–211 (1951).

    References 29 and 30 provided the first hint that regular secondary structure might form in folded proteins.

    31. Hutchinson, E. G. & Thornton, J. M. A revised set of potentials for β-turn formation in proteins. Protein Sci. 3, 2207–2216 (1994).

    32. Sibanda, B. L., Blundell, T. L. & Thornton, J. M. Conformation of β-hairpins in protein structures. A systematic classification with applications to modelling by homology, electron density fitting and protein engineering. J. Mol. Biol. 206, 759–777 (1989).

    33. Wilmot, C. M. & Thornton, J. M. Analysis and prediction of the different types of β-turn in proteins. J. Mol. Biol. 203, 221–232 (1988).

    34. Baker, E. N. & Hubbard, R. E. Hydrogen bonding in globular proteins. Prog. Biophys. Mol. Biol. 44, 97–179 (1984).

    The first comprehensive survey of hydrogen bonds in high-resolution protein structures.

    35. Presta, L. G. & Rose, G. D. Helix signals in proteins. Science 240, 1632–1641 (1988).

    36. Richardson, J. S. & Richardson, D. C. Amino acid preferences for specific locations at the ends of α helices. Science 240, 1648–1652 (1988).

    37. Wan, W. Y. & Milner-White, E. J. A recurring two-hydrogen-bond motif incorporating a serine or threonine residue is found both at α-helical N termini and in other situations. J. Mol. Biol. 286, 1651–1662 (1999).

    38. Wan, W. Y. & Milner-White, E. J. A natural grouping of motifs with an aspartate or asparagine residue forming two hydrogen bonds to residues ahead in sequence: their occurrence at α-helical N termini and in other situations. J. Mol. Biol. 286, 1633–1649 (1999).

    39. Chan, A. W. E., Hutchinson, E. G. & Thornton, J. M. Identification, classification, and analysis of β-bulges in proteins. Protein Sci. 2, 1574–1590 (1993).

    40. Richardson, J. S., Getzoff, E. D. & Richardson, D. C. The β bulge: a common small unit of nonrepetitive protein structure. Proc. Natl Acad. Sci. USA 75, 2574–2578 (1978).

    41. Barlow, D. J. & Thornton, J. M. Helix geometry in proteins. J. Mol. Biol. 201, 601–619 (1988).

    42. Eswar, N. & Ramakrishnan, C. Secondary structures without backbone: an analysis of backbone mimicry by polar side chains in protein structures. Protein Eng. 12, 447–455 (1999).

    43. Cubellis, M. V., Caillez, F., Blundell, T. L. & Lovell, S. C. Properties of polyproline II, a secondary structure element implicated in protein–protein interactions. Proteins 58, 880–892 (2005).

    44. Stapley, B. J. & Creamer, T. P. A survey of left-handed polyproline II helices. Protein Sci. 8, 587–595 (1999).

    45. Milner-White, E., Ross, B. M., Ismail, R., Belhadj-Mostefa, K. & Poet, R. One type of γ-turn, rather than the other gives rise to chain-reversal in proteins. J. Mol. Biol. 204, 777–782 (1988).

    46. Milner-White, E. J. β-bulges within loops as recurring features of protein structure. Biochim. Biophys. Acta 911, 261–265 (1987).

    47. Blundell, T. L. & Wood, S. P. Is the evolution of insulin Darwinian or due to selectively neutral mutation? Nature 257, 197–203 (1975).

    An early paper discussing the evolution of protein structure and interactions in terms of adaptive processes and neutral mutations.

    48. Guharoy, M. & Chakrabarti, P. Conservation and relative importance of residues across protein–protein interfaces. Proc. Natl Acad. Sci. USA 102, 15447–15452 (2005).

    49. Kisters-Woike, B., Vangierdegom, C. & Mueller-Hill, B. On the conservation of protein sequences in evolution. Trends Biochem. Sci. 25, 419–421 (2000).

    50. Lichtarge, O., Bourne, H. R. & Cohen, F. E. Evolutionarily conserved Gαβγ binding surfaces support a model of the G protein-receptor complex. Proc. Natl Acad. Sci. USA 93, 7507–7511 (1996).

    51. Chelliah, V., Chen, L., Blundell, T. L. & Lovell, S. C. Distinguishing structural and functional restraints in evolution in order to identify interaction sites. J. Mol. Biol. 342, 1487–1504 (2004).

    52. Blundell, T. L. et al. in Methods in Protein Sequence Analysis (eds Jörnvall, H., Höög, J.-O. & Gustavsson, A.-M.) 373–385 (Birkhäuser, Basel, 1991).

    53. Overington, J., Johnson, M. S., Šali, A. & Blundell, T. L. Tertiary structural constraints on protein evolutionary diversity: templates, key residues and structure prediction. Proc. Biol. Sci. 241, 132–145 (1990).

    The first study to quantify structural restraints on amino acid substitutions between homologous proteins, identifying particular patterns of substitution.

    54. Overington, J., Donnelly, D., Johnson, M. S., Šali, A. & Blundell, T. L. Environment-specific amino acid substitution tables: tertiary templates and prediction of protein folds. Protein Sci. 1, 216–226 (1992).

    55. Michener, C. D. & Sokal, R. R. A quantitative approach to a problem in classification. Evolution 11, 130 (1957).

    56. Bloom, J. D., Labthavikul, S. T., Otey, C. R. & Arnold, F. H. Protein stability promotes evolvability. Proc. Natl Acad. Sci. USA 103, 5869–5874 (2006).

    57. Bloom, J. D. et al. Thermodynamic prediction of protein neutrality. Proc. Natl Acad. Sci. USA 102, 606–611 (2005).

    58. Deane, C. M., Allen, F. H., Taylor, R. & Blundell, T. L. Carbonyl–carbonyl interactions stabilize the partially allowed Ramachandran conformations of asparagine and aspartic acid. Protein Eng. 12, 1025–1028 (1999).

    59. Gong, S. & Blundell, T. L. Discarding functional residues from the substitution table improves predictions of active sites within three-dimensional structures. PLoS Comput. Biol. 4, e1000179 (2008).

    60. Schell, D., Tsai, J., Scholtz, J. M. & Pace, C. N. Hydrogen bonding increases packing density in the protein interior. Proteins 63, 278–282 (2006).

    61. Pace, C. N. Polar group burial contributes more to protein stability than nonpolar group burial. Biochemistry 16, 310–313 (2001).

    62. Fleming, P. J. & Rose, G. D. Do all backbone polar groups in proteins form hydrogen bonds? Protein Sci. 14, 1911–1917 (2005).

    63. McDonald, I. K. & Thornton, J. M. Satisfying hydrogen bonding potential in proteins. J. Mol. Biol. 238, 777–793 (1994).

    64. Worth, C. L. & Blundell, T. L. Satisfaction of hydrogen-bonding potential influences the conservation of polar sidechains. Proteins 75, 413–429 (2009).

    65. Eswar, N. & Ramakrishnan, C. Deterministic features of side-chain main-chain hydrogen bonds in globular protein structures. Protein Eng. 13, 227–238 (2000).

    66. Vijayakumar, M., Qian, H. & Zhou, H. X. Hydrogen bonds between short polar side chains and peptide backbone: prevalence in proteins and effects on helix-forming propensities. Proteins 34, 497–507 (1999).

    67. Hamill, S. J., Cota, E., Chothia, C. & Clarke, J. Conservation of folding and stability within a protein family: the tyrosine corner as an evolutionary cul-de-sac. J. Mol. Biol. 295, 641–649 (2000).

    68. Bordo, D. & Argos, P. The role of side-chain hydrogen bonds in the formation and stabilization of secondary structure in soluble proteins. J. Mol. Biol. 243, 504–519 (1994).

    69. Nicholson, H., Anderson, D. E., Dao-pin, S. & Matthews, B. W. Analysis of the interaction between charged side chains and the α-helix dipole using designed thermostable mutants of phage T4 lysozyme. Biochemistry 30, 9816–9828 (1991).

    70. Mizuguchi, K., Deane, C. M., Blundell, T. L. & Overington, J. P. HOMSTRAD: a database of protein structure alignments for homologous families. Protein Sci. 7, 2469–2471 (1998).

    71. Harper, E. T. & Rose, G. D. Helix stop signals in proteins and peptides: the capping box. Biochemistry 32, 7605–7609 (1993).

    72. Serrano, L., Sancho, J., Hirshberg, M. & Fersht, A. R. α-Helix stability in proteins. I. Empirical correlations concerning substitution of side-chains at the N and C-caps and the replacement of alanine by glycine or serine at solvent-exposed surfaces. J. Mol. Biol. 227, 544–559 (1992).

    73. Burley, S. K. & Petsko, G. A. Aromatic–aromatic interaction – a mechanism of protein-structure stabilization. Science 229, 23–28 (1985).

    74. Hunter, C. A., Singh, J. & Thornton, J. M. π–π interactions – the geometry and energetics of phenylalanine–phenylalanine interactions in proteins. J. Mol. Biol. 218, 837–846 (1991).

    75. Burley, S. K. & Petsko, G. A. Amino-aromatic interactions in proteins. FEBS Lett. 203, 139–143 (1986).

    76. Mitchell, J. B. O., Nandi, C. L., McDonald, I. K., Thornton, J. M. & Price, S. L. Amino/aromatic interactions in proteins – is the evidence stacked against hydrogen-bonding? J. Mol. Biol. 239, 315–331 (1994).

    77. Gallivan, J. P. & Dougherty, D. A. Cation–π interactions in structural biology. Proc. Natl Acad. Sci. USA 96, 9459–9464 (1999).

    78. Ortlund, E. A., Bridgham, J. T., Redinbo, M. R. & Thornton, J. W. Crystal structure of an ancient protein: evolution by conformational epistasis. Science 317, 1544–1548 (2007).

    79. Shakhnovich, E., Abkevich, V. & Ptitsyn, O. Conserved residues and the mechanism of protein folding. Nature 379, 96–98 (1996).

    The presentation of a novel computational method for identifying the residues that form the folding nucleus of a protein.

    80. Itzhaki, L. S., Otzen, D. E. & Fersht, A. R. The structure of the transition state for folding of chymotrypsin inhibitor 2 analysed by protein engineering methods: evidence for a nucleation-condensation mechanism for protein folding. J. Mol. Biol. 254, 260–288 (1995).

    Introduced the nucleation–condensation model of protein folding from experimental work in chymotrypsin inhibitor 2.

    81. Mirny, L. A., Abkevich, V. I. & Shakhnovich, E. I. How evolution makes proteins fold quickly. Proc. Natl Acad. Sci. USA 95, 4976–4981 (1998).

    82. Mirny, L. A. & Shakhnovich, E. I. Universally conserved positions in protein folds: reading evolutionary signals about stability, folding kinetics and function. J. Mol. Biol. 291, 177–196 (1999).

    83. Plaxco, K. W. et al. Evolutionary conservation in protein folding kinetics. J. Mol. Biol. 298, 303 (2000).

    84. Larson, S. M., Ruczinski, I., Davidson, A. R., Baker, D. & Plaxco, K. W. Residues participating in the protein folding nucleus do not exhibit preferential evolutionary conservation. J. Mol. Biol. 316, 225–233 (2002).

    85. Tseng, Y. Y. & Liang, J. Are residues in a protein folding nucleus evolutionarily conserved? J. Mol. Biol. 335, 869–880 (2004).

    86. Li, L., Mirny, L. A. & Shakhnovich, E. I. Kinetics, thermodynamics and evolution of non-native interactions in a protein folding nucleus. Nature Struct. Biol. 7, 336–342 (2000).

    87. Kim, W. K., Bolser, D. M. & Park, J. H. Large-scale co-evolution analysis of protein structural interlogues using the global protein structural interactome map (PSIMAP). Bioinformatics 20, 1138–1150 (2004).

    88. Pazos, F. & Valencia, A. Protein co-evolution, co-adaptation and interactions. EMBO J. 27, 2648–2655 (2008).

    89. Park, J. & Bolser, D. Conservation of protein interaction network in evolution. Genome Inform. 12, 135–140 (2001).

    90. Batada, N. N., Hurst, L. D. & Tyers, M. Evolutionary and physiological importance of hub proteins. PLoS Comput. Biol. 2, e88 (2006).

    91. Pal, C., Papp, B. & Hurst, L. D. Genomic function: rate of evolution and gene dispensability. Nature 421, 496–497 (2003).

    92. Wall, D. P. et al. Functional genomic analysis of the rates of protein evolution. Proc. Natl Acad. Sci. USA 102, 5483–5488 (2005).

    93. Choi, J. K., Kim, S. C., Seo, J., Kim, S. & Bhak, J. Impact of transcriptional properties on essentiality and evolutionary rate. Genetics 175, 199–206 (2007).

    94. Drummond, D. A., Bloom, J. D., Adami, C., Wilke, C. O. & Arnold, F. H. Why highly expressed proteins evolve slowly. Proc. Natl Acad. Sci. USA 102, 14338–14343 (2005).

    This paper suggests that the expression level of a protein is related to the demand for exact folding.

    95. Drummond, D. A., Raval, A. & Wilke, C. O. A single determinant dominates the rate of yeast protein evolution. Mol. Biol. Evol. 23, 327–337 (2006).

    96. Zeldovich, K. B. & Shakhnovich, E. I. Understanding protein evolution: from protein physics to Darwinian selection. Annu. Rev. Phys. Chem. 59, 105–127 (2008).

    97. Akashi, H. Gene expression and molecular evolution. Curr. Opin. Genet. Dev. 11, 660–666 (2001).

    REVIEWS

    NATURE REVIEWS | MOLECULAR CELL BIOLOGY VOLUME 10 | OCTOBER 2009 | 719

    © 2009 Macmillan Publishers Limited. All rights reserved


    98. Drummond, D. A. & Wilke, C. O. Mistranslation-induced protein misfolding as a dominant constraint on coding-sequence evolution. Cell 134, 341–352 (2008).

    99. Hamill, S. J., Steward, A. & Clarke, J. The folding of an immunoglobulin-like Greek key protein is defined by a common-core nucleus and regions constrained by topology. J. Mol. Biol. 297, 165 (2000).

    100. Chiti, F. & Dobson, C. M. Protein misfolding, functional amyloid, and human disease. Annu. Rev. Biochem. 75, 333–366 (2006).

    101. Hamada, D. et al. Competition between folding, native-state dimerisation and amyloid aggregation in β-lactoglobulin. J. Mol. Biol. 386, 878–890 (2009).

    102. Goldberg, A. L. Protein degradation and protection against misfolded or damaged proteins. Nature 426, 895–899 (2003).

    103. Wolffe, A. P. & Matzke, M. A. Epigenetics: regulation through repression. Science 286, 481–486 (1999).

    104. Murzin, A. G., Brenner, S. E., Hubbard, T. & Chothia, C. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247, 536–540 (1995).

    Details the first protein hierarchical classification scheme.

    105. Orengo, C. A. et al. CATH: a hierarchic classification of protein domain structures. Structure 5, 1093–1108 (1997).

    106. Bhaduri, A., Pugalenthi, G. & Sowdhamini, R. PASS2: an automated database of protein alignments organised as structural superfamilies. BMC Bioinformatics 5, 35 (2004).

    107. Worth, C. L. et al. A structural bioinformatics approach to the analysis of nonsynonymous single nucleotide polymorphisms (nsSNPs) and their relation to disease. J. Bioinform. Comput. Biol. 5, 1297–1318 (2007).

    108. Holm, L., Kaariainen, S., Rosenstrom, P. & Schenkel, A. Searching protein structure databases with DaliLite v.3. Bioinformatics 24, 2780 (2008).

    109. Shindyalov, I. N. & Bourne, P. E. Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Eng. 11, 739–747 (1998).

    110. Marchler-Bauer, A. et al. MMDB: Entrez's 3D structure database. Nucleic Acids Res. 27, 240–243 (1999).

    111. Finn, R. D. et al. The Pfam protein families database. Nucleic Acids Res. 36, D281–D288 (2008).

    112. Hunter, S. et al. InterPro: the integrative protein signature database. Nucleic Acids Res. 37, D211–D215 (2009).

    113. Hulo, N. et al. The PROSITE database. Nucleic Acids Res. 34, D227–D230 (2006).

    114. Attwood, T. K. et al. PRINTS and its automatic supplement, prePRINTS. Nucleic Acids Res. 31, 400–402 (2003).

    115. Servant, F. et al. ProDom: automated clustering of homologous domains. Brief. Bioinformatics 3, 246–251 (2002).

    116. Schultz, J., Milpetz, F., Bork, P. & Ponting, C. P. SMART, a simple modular architecture research tool: identification of signaling domains. Proc. Natl Acad. Sci. USA 95, 5857–5864 (1998).

    117. Haft, D. H., Selengut, J. D. & White, O. The TIGRFAMs database of protein families. Nucleic Acids Res. 31, 371–373 (2003).

    118. Buchan, D. W. et al. Gene3D: structural assignments for the biologist and bioinformaticist alike. Nucleic Acids Res. 31, 469–473 (2003).

    119. Wilson, D. et al. SUPERFAMILY: sophisticated comparative genomics, data mining, visualization and phylogeny. Nucleic Acids Res. 37, D380–D386 (2009).

    120. Krishnamurthy, N., Brown, D., Kirshner, D. & Sjolander, K. PhyloFacts: an online structural phylogenomic encyclopedia for protein functional and structural classification. Genome Biol. 7, R83 (2006).

    121. Marchler-Bauer, A. et al. CDD: a conserved domain database for interactive domain family analysis. Nucleic Acids Res. 35, D237–D240 (2007).

    122. Heger, A. et al. PairsDB atlas of protein sequence space. Nucleic Acids Res. 36, D276–D280 (2008).

    123. Orengo, C. A., Sillitoe, I., Reeves, G. & Pearl, F. M. G. What can structural classifications reveal about protein evolution? J. Struct. Biol. 134, 145–165 (2001).

    124. Mizuguchi, K., Deane, C. M., Blundell, T. L., Johnson, M. S. & Overington, J. P. JOY: protein sequence-structure representation and analysis. Bioinformatics 14, 617–623 (1998).

    125. Dayhoff, M. O. & Eck, R. V. in Atlas of Protein Sequence and Structure 1967–1968 33–45 (National Biomedical Research Foundation, Silver Spring, Maryland, 1968).

    126. Henikoff, S. & Henikoff, J. G. Amino acid substitution matrices from protein blocks. Proc. Natl Acad. Sci. USA 89, 10915–10919 (1992).

    127. Lee, S. & Blundell, T. L. Ulla: a program for calculating environment-specific amino acid substitution tables. Bioinformatics 25, 1976–1977 (2009).

    128. Letunic, I. & Bork, P. Interactive Tree Of Life (iTOL): an online tool for phylogenetic tree display and annotation. Bioinformatics 23, 127–128 (2007).

    Acknowledgements

    C.L.W. was funded by a Biotechnology and Biological Sciences Research Council studentship. S.G. was supported by the BiO foundation. T.L.B. is funded by the Wellcome Trust.

    DATABASES

    PDB: http://www.rcsb.org/pdb/home/home.do 1gk8 | 1m2z | 1tme | 2fgf | 2ohxa | 3app | 5pal

    FURTHER INFORMATION

    The Blundell group's homepage: http://www-cryst.bioc.cam.ac.uk/

    CATH: http://www.cathdb.info/

    CDD: http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml

    CE: http://cl.sdsc.edu

    Dali: http://ekhidna.biocenter.helsinki.fi/dali/start

    ESSTs: http://www-cryst.bioc.cam.ac.uk/ESST

    Gene3D: http://gene3d.biochem.ucl.ac.uk/Gene3D/

    HOMSTRAD: http://www-cryst.bioc.cam.ac.uk/~homstrad

    InterPro: http://www.ebi.ac.uk/interpro

    JOY: http://www-cryst.bioc.cam.ac.uk/joy

    MMDB: http://www.ncbi.nlm.nih.gov/Structure/MMDB/mmdb.shtml

    PairsDB: http://pairsdb.csc.fi

    PASS2: http://caps.ncbs.res.in/campass/pass2.html

    Pfam: http://pfam.sanger.ac.uk

    PhyloFacts: http://phylogenomics.berkeley.edu/phylofacts

    Prints: http://www.bioinf.man.ac.uk/dbbrowser/PRINTS/

    ProDom: http://prodom.prabi.fr/prodom/current/html/home.php

    PROSITE: http://www.expasy.ch/prosite

    PyMol: http://www.pymol.org

    SCOP: http://scop.mrc-lmb.cam.ac.uk/scop

    SMART: http://smart.embl-heidelberg.de

    Superfamily: http://supfam.cs.bris.ac.uk/SUPERFAMILY

    TIGRFAMs: http://www.jcvi.org/cms/research/projects/tigrfams/overview/

    Toccata: http://www-cryst.bioc.cam.ac.uk/toccata/toccata.php

    Ulla: http://www-cryst.bioc.cam.ac.uk/ulla

    SUPPLEMENTARY INFORMATION

    See online article: S1 (figure) | S2 (table) | S3 (table)

    ALL LINKS ARE ACTIVE IN THE ONLINE PDF
