A Comprehensive Resource for Integrating and Displaying Protein PTMs


  • 8/14/2019 A Comprehensive Resource for Integrating and Displaying Protein PTMs

    1/20

    This Provisional PDF corresponds to the article as it appeared upon acceptance. Fully formatted PDF and full text (HTML) versions will be made available soon.

    A comprehensive resource for integrating and displaying protein post-translational modifications

    BMC Research Notes 2009, 2:111 doi:10.1186/1756-0500-2-111

    Tzong-Yi Lee ([email protected])
    Justin Bo-Kai Hsu ([email protected])
    Wen-Chi Chang ([email protected])
    Ting-Yuan Wang ([email protected])
    Po-Chiang Hsu ([email protected])
    Hsien-Da Huang ([email protected])

    ISSN 1756-0500

    Article type Data Note

    Submission date 18 November 2008

    Acceptance date 23 June 2009

    Publication date 23 June 2009

    Article URL http://www.biomedcentral.com/1756-0500/2/111

    This peer-reviewed article was published immediately upon acceptance. It can be downloaded, printed and distributed freely for any purposes (see copyright notice below).

    Articles in BMC Research Notes are listed in PubMed and archived at PubMed Central.

    For information about publishing your research in BMC Research Notes or any BioMed Central journal, go to

    http://www.biomedcentral.com/info/instructions/

    BMC Research Notes

    2009 Lee et al., licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


    A comprehensive resource for integrating and displaying protein post-translational modifications

    Tzong-Yi Lee1, Justin Bo-Kai Hsu1, Wen-Chi Chang1, Ting-Yuan Wang1, Po-Chiang Hsu2, and Hsien-Da Huang1,3,4,*

    1 Department of Biological Science and Technology, Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsin-Chu 300, Taiwan
    2 Department of Biological Science and Technology, Institute of Biochemical Engineering, National Chiao Tung University, Hsin-Chu 300, Taiwan
    3 Department of Biological Science and Technology, National Chiao Tung University, Hsin-Chu 300, Taiwan
    4 Core Facility for Structural Bioinformatics, National Chiao Tung University, Hsin-Chu 300, Taiwan

    * Corresponding author

    Email addresses:
    TYL: [email protected]
    JBKH: [email protected]
    WCC: [email protected]
    TYW: [email protected]
    PCH: [email protected]
    HDH: [email protected]


    Abstract

    Background

    Protein Post-Translational Modification (PTM) plays an essential role in cellular control mechanisms that adjust protein physical and chemical properties, folding, conformation, stability and activity, thus also altering protein function.

    Findings

    dbPTM (version 1.0), which was developed previously, aimed at a comprehensive collection of protein post-translational modifications. In this updated version (dbPTM 2.0), we developed a PTM database towards an expert system of protein post-translational modifications. The database comprehensively collects experimental and predictive protein PTM sites. In addition, dbPTM 2.0 was extended to a knowledge base comprising the modified sites, solvent accessibility of substrate, protein secondary and tertiary structures, protein domains, protein intrinsic disorder region, and protein variations. Moreover, this work compiles a benchmark to construct evaluation datasets for computational studies that identify PTM sites, such as phosphorylated sites, glycosylated sites, acetylated sites and methylated sites.

    Conclusions

    The current release not only provides sequence-based information, but also annotates structure-based information for protein post-translational modification. The interface is also designed to facilitate access to the resource. This effective database is now freely accessible at http://dbPTM.mbc.nctu.edu.tw/.


    Findings

    Background

    Protein Post-Translational Modification (PTM) plays a critical role in cellular control mechanisms, including phosphorylation for signal transduction, attachment of fatty acids for membrane anchoring and association, glycosylation for changing protein half-life, targeting substrates, and promoting cell-cell and cell-matrix interactions, and acetylation and methylation of histone for gene regulation [1]. Several databases collecting information about protein modifications have been established through high-throughput mass spectrometry in proteomics. UniProtKB/Swiss-Prot [2] collects a large amount of protein modification information with annotation and structure. Phospho.ELM [3], PhosphoSite [4] and Phosphorylation Site Database [5] were developed for accumulating experimentally verified phosphorylation sites. PHOSIDA [6] integrates thousands of high-confidence in vivo phosphorylation sites identified by mass spectrometry-based proteomics in various species. Phospho3D [7] is a database of 3D structures of phosphorylation sites, which stores information retrieved from the phospho.ELM database and is enriched with structural information and annotations at the residue level. O-GLYCBASE [8] is a database of glycoproteins, most of which include experimentally verified O-linked glycosylation sites. UbiProt [9] stores experimental ubiquitylated proteins and ubiquitylation sites, which are implicated in protein degradation through an intracellular ATP-dependent proteolytic system. Moreover, the RESID protein modification database is a comprehensive collection of annotations and structures for protein modifications and cross-links, including pre-, co-, and post-translational modifications [10].

    dbPTM [11] was developed previously to integrate several databases to accumulate known protein modifications, as well as the putative protein modifications predicted by a series of accurate computational tools [12,13]. This updated version of dbPTM was enhanced to become a knowledge base for protein post-translational modifications, which comprises a variety of new features including the modified sites, solvent accessibility of substrate, protein secondary and tertiary structures, protein domains and protein variations. We also collected literature related to PTM, protein conservations and the specificity of substrate site. Especially for protein phosphorylation, the site-specific interactions between catalytic kinases and substrates are provided. Furthermore, a variety of prediction tools have been developed for more than ten PTM types [14], such as phosphorylation, glycosylation, acetylation, methylation, sulfation and sumoylation. This work constructed a benchmark dataset for computational studies of protein post-translational modification. The benchmark dataset can provide a standard for measuring the performance of prediction tools that have been presented for identifying post-translational modification sites of proteins. The web interface of dbPTM was also redesigned and enhanced to facilitate access to the proposed resource.

    Data construction and content

    As shown in Figure 1, the system architecture of the dbPTM 2.0 database comprises three major components: the integration of external PTM databases, the computational identification of PTMs, and the structural and functional annotations of PTMs. We integrated five PTM databases, including UniProtKB/Swiss-Prot (release 55.0) [1], Phospho.ELM (version 7.0) [15], O-GLYCBASE (version 6.0) [8], UbiProt (version 1.0) [9] and PHOSIDA (version 1.0) [6] for obtaining experimental protein modifications. The description and data statistics of these databases are briefly given in Table S1 (see Additional file 1 - Table S1). Additionally, the Human Protein Reference Database (HPRD) [16], which compiles invaluable information relevant to functions and PTMs of human proteins in health and disease, was also integrated.

    For the computational identification of PTMs, the KinasePhos-like method [11-13,17] was applied to identify 20 types of PTM, each of which contains at least 30 experimentally verified PTM sites. The detailed processing flow of KinasePhos-like methods is displayed in Figure S1 (see Additional file 1 - Figure S1). The learned models were evaluated using k-fold cross validation. Table S2 (see Additional file 1 - Table S2) lists the predictive performance of these models. To reduce the number of false positive predictions, the predictive parameters were set to ensure maximal predictive specificity.
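
    The evaluation idea above can be illustrated with a minimal sketch. It is not the KinasePhos-like implementation: the function names, the interleaved fold assignment, and the rule "score at or above the threshold means predicted site" are all assumptions made for illustration. It shows how k-fold index splits can be formed and how a score threshold that maximizes specificity (fewest false positives) might be chosen from a set of candidates.

```python
from typing import List, Sequence, Tuple


def k_fold_indices(n_items: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index splits; every item is tested exactly once."""
    folds = [list(range(i, n_items, k)) for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        splits.append((train, test))
    return splits


def sensitivity_specificity(scores: Sequence[float], labels: Sequence[int],
                            threshold: float) -> Tuple[float, float]:
    """Treat score >= threshold as a predicted PTM site (an assumed convention)."""
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold
        if label and predicted:
            tp += 1
        elif label:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


def most_specific_threshold(scores: Sequence[float], labels: Sequence[int],
                            candidates: Sequence[float]) -> float:
    """Pick the candidate with the highest specificity; ties go to higher sensitivity."""
    def key(threshold: float) -> Tuple[float, float]:
        sens, spec = sensitivity_specificity(scores, labels, threshold)
        return (spec, sens)
    return max(candidates, key=key)
```

    Breaking ties by sensitivity keeps the threshold from being pushed higher than needed once no false positives remain.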

    The statistics of the experimental PTM sites and putative PTM sites in this integrated PTM database are given in Table 1. After removing the redundant PTM sites among the six databases, there are a total of 45,833 experimental PTM sites in this updated version. All experimental PTM sites are further categorized by PTM type. For instance, there are 31,363 experimental phosphorylation sites and 2,080 experimental acetylation sites in the database. In addition to the experimental PTM sites, UniProtKB/Swiss-Prot provides putative PTM sites by using sequence similarity or evolutionary potential. Moreover, KinasePhos-like methods [11-13,17] were adopted to construct the profile hidden Markov models (HMMs) for twenty types of PTMs. These models were applied to identify the potential PTM sites against protein sequences obtained from UniProtKB/Swiss-Prot. As given in Table 1, 2,560,047 sites for all PTM types were identified. The structural and functional annotations of protein modifications were obtained from UniProtKB/Swiss-Prot [18], InterPro [19], Protein Data Bank [20] and RESID [10] (see Additional file 1 - Table S3).
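
    Removing redundant sites across the integrated databases can be pictured with the following sketch. The record layout, the key (protein, position, PTM type), and the source assignments in the usage note are assumptions for illustration; the text does not describe the actual merging procedure in this detail.

```python
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class PTMSite(NamedTuple):
    protein: str    # e.g. a UniProtKB/Swiss-Prot ID
    position: int   # 1-based residue position
    ptm_type: str   # e.g. "Phosphorylation"
    source: str     # database the record came from


SiteKey = Tuple[str, int, str]


def merge_sites(records: Iterable[PTMSite]) -> Dict[SiteKey, Set[str]]:
    """Collapse records describing the same site, remembering every source."""
    merged: Dict[SiteKey, Set[str]] = {}
    for record in records:
        key = (record.protein, record.position, record.ptm_type)
        merged.setdefault(key, set()).add(record.source)
    return merged
```

    For example, if two databases both report phosphorylation at position 892 of IRS1_HUMAN (a made-up assignment of sources), `merge_sites` yields a single entry for that site whose value lists both databases, so the site is counted once while its provenance is preserved.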


    Utility and major improvements

    In order to provide more effective information about protein modifications in this updated version, we extended dbPTM to a knowledge base containing structural properties for PTM sites, PTM-related literature, evolutionary conservation of PTM sites, subcellular localization of modified proteins and the benchmark set for computational studies. Table 2 shows the enhancements and new features supported in this study. First of all, the integrated PTM resource is more comprehensive than the previous dbPTM, with the number of PTM types increasing from 373 to 431. To detect the potential PTM sites in UniProtKB/Swiss-Prot proteins without any PTM annotations, the KinasePhos-like method was applied to 20 PTM types. Especially in protein phosphorylation, more than 60 kinase-specific prediction models were constructed and applied to identify the phosphorylation sites with catalytic kinases.

    Structural properties of PTM sites

    In order to facilitate the investigation of structural characteristics surrounding the PTM sites, protein tertiary structures obtained from the Protein Data Bank [20] are graphically presented by the Jmol program. For proteins with tertiary structures (5% of UniProtKB/Swiss-Prot proteins), the protein structural properties, such as solvent accessibility and secondary structure of residues, were calculated by DSSP [21]. The solvent accessibility and secondary structure of residues for proteins without tertiary structures were predicted by RVP-net [22] and PSIPRED [23], respectively. The intrinsic disorder regions were provided using Disopred2 [24].

    Figure 2 depicts an illustrative example: human Insulin Receptor Substrate 1 (IRS1; UniProtKB/Swiss-Prot ID: IRS1_HUMAN) can interact with the Insulin Receptor (INSR) and is involved in the insulin signaling pathway [25]. Three fragments of the IRS1 protein have tertiary structures in PDB. Structure 1K3A covers the protein region from 891 AA to 902 AA. Two experimental phosphorylation sites, S892 and Y896, are located in this region, and their solvent accessibility and secondary structure can be derived from the tertiary structure. The solvent accessibility and secondary structure in other protein regions without tertiary structures were predicted by the integrated programs RVP-net and PSIPRED, respectively.

    Annotation of catalytic kinases of protein phosphorylation sites

    In addition to the experimental annotations of catalytic kinases of protein phosphorylation, we applied the KinasePhos-like prediction method [11-13,17] for identifying 20 types of PTM. Figure 2 gives an example in which the experimental phosphorylation site S892 of IRS1 was predicted to be catalyzed by the protein kinases MAPK and CDK, with a preference for proline at positions -2 and +1 surrounding the phosphorylation site (position 0). In addition, Y896 is predicted to be catalyzed by the kinase IGF1R, a result consistent with a previous investigation [26]. Moreover, S892 is a protein variation site, which was mapped to a non-synonymous single nucleotide polymorphism (SNP), based on the annotation obtained from dbSNP [27].
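
    The positional convention used above (the modified residue at position 0, flanking residues at negative and positive offsets) can be made concrete with a small sketch. The helper names are hypothetical, and the check below is only a literal test for proline at offsets -2 and +1; the kinase-specific models themselves are profile HMMs, not a fixed pattern.

```python
def residue_at(sequence: str, site: int, offset: int) -> str:
    """Residue at 1-based position site + offset, or '-' beyond the sequence ends."""
    index = site - 1 + offset
    return sequence[index] if 0 <= index < len(sequence) else "-"


def site_window(sequence: str, site: int, flank: int) -> str:
    """Fragment from offset -flank to +flank around the site (position 0)."""
    return "".join(residue_at(sequence, site, o) for o in range(-flank, flank + 1))


def has_proline_at_minus2_plus1(sequence: str, site: int) -> bool:
    """True when proline occupies offsets -2 and +1 relative to the site."""
    return residue_at(sequence, site, -2) == "P" and residue_at(sequence, site, 1) == "P"
```

    With the made-up sequence `AAPASPKK` and the serine at position 5, `site_window(seq, 5, 3)` gives `APASPKK` and the proline check succeeds; the sequence is a toy example, not the IRS1 sequence.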

    Evolutionary conservation of PTM sites

    In order to determine whether a PTM site is conserved among orthologous protein sequences, we integrated the database of Clusters of Orthologous Groups (COGs) [28], which collected 4873 COGs in 66 unicellular genomes and 4852 clusters of eukaryotic orthologous groups (KOGs) in 7 eukaryotic genomes. The ClustalW [29] program was adopted to implement the alignment of multiple protein sequences in each cluster, and the aligned profile is provided in the resource. An experimentally verified acetyllysine located in a conserved protein region indicates an evolutionary influence in which orthologous sites in other species could be involved in the same type of PTM (see Additional file 1 - Figure S2). Furthermore, in the example shown in Figure 2, two experimentally verified phosphorylation sites are conserved.
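
    One simple way to quantify whether a site is conserved in a multiple sequence alignment is sketched below. This is an assumption-laden illustration, not the procedure used in dbPTM: it takes an already computed alignment (for instance ClustalW output converted to equal-length gapped strings), treats the first row as the reference protein, and reports the fraction of rows that carry the same residue in the site's column.

```python
from typing import List


def column_of_position(aligned_reference: str, position: int) -> int:
    """Map a 1-based ungapped position in the reference to a 0-based alignment column."""
    seen = 0
    for column, char in enumerate(aligned_reference):
        if char != "-":
            seen += 1
            if seen == position:
                return column
    raise ValueError("position is beyond the end of the reference sequence")


def site_conservation(alignment: List[str], position: int) -> float:
    """Fraction of aligned sequences (reference included) sharing the reference residue."""
    reference = alignment[0]
    column = column_of_position(reference, position)
    residue = reference[column]
    matches = sum(1 for row in alignment if row[column] == residue)
    return matches / len(alignment)
```

    Mapping through the gapped reference matters because database site positions refer to the ungapped protein sequence, while conservation is read from alignment columns.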

    PTM benchmark dataset for bioinformatics study

    Due to the high throughput of mass spectrometry in proteomics, the experimental substrate sequences of more than ten PTM types, such as phosphorylation, glycosylation, acetylation, methylation, sulfation and sumoylation, were investigated and used for developing prediction tools [14]. To understand the predictive performance of these previously developed tools, it is crucial to have a common standard for evaluating the predictive performance among various prediction tools. Therefore, we constructed a benchmark, which comprises the experimental substrate sequences for each PTM type.

    The process to compile the evaluation sets is described in Figure S3 (see Additional file 1 - Figure S3), based on criteria developed by Chen et al. [30]. To remove redundancy, the protein sequences containing the same type of PTM sites are grouped by a threshold of 30% identity by BLASTCLUST [31]. If the identity of two protein sequences is greater than 30%, we re-aligned the fragment sequences of the substrates by BL2SEQ. If the fragment sequences of two substrates with the same location are identical, only one of the substrates was included in the benchmark dataset. In this way, twenty PTM types containing more than 30 experimental sites were compiled in the benchmark dataset.
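
    The final filtering step can be sketched as follows, under clearly simplifying assumptions. The clusters are taken as given input (in the described pipeline they would come from BLASTCLUST at 30% identity), the BL2SEQ re-alignment is replaced by a plain comparison of fixed-width fragments around each site, and all names and the flank width are hypothetical.

```python
from typing import List, Tuple

Substrate = Tuple[str, str, int]  # (protein id, sequence, 1-based site position)


def site_fragment(sequence: str, site: int, flank: int) -> str:
    """Fragment of 2 * flank + 1 residues centred on the site, padded with '-'."""
    start = site - 1 - flank
    return "".join(
        sequence[i] if 0 <= i < len(sequence) else "-"
        for i in range(start, site + flank)
    )


def nonredundant_sites(clusters: List[List[Substrate]], flank: int = 4) -> List[Tuple[str, int]]:
    """Within each cluster of similar proteins, keep one site per distinct fragment."""
    kept = []
    for cluster in clusters:
        seen = set()
        for protein_id, sequence, site in cluster:
            fragment = site_fragment(sequence, site, flank)
            if fragment not in seen:
                seen.add(fragment)
                kept.append((protein_id, site))
    return kept
```

    Deduplicating only inside a cluster mirrors the described logic that fragment comparison is triggered for proteins that are already similar at the whole-sequence level.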

    Enhanced web interface

    A user-friendly web interface is provided for simple searching, browsing, and downloading of protein PTM data. In addition to database queries by protein name, gene name, UniProtKB/Swiss-Prot ID or accession, it allows the input of protein sequences for similarity search against UniProtKB/Swiss-Prot protein sequences (see Additional file 1 - Figure S4). To provide an overview of PTM types and their modified residues, a summary table is provided for browsing the information and the annotations about the post-translational modification types, which refer to the UniProtKB/Swiss-Prot PTM list (http://www.expasy.org/cgi-bin/lists?ptmlist.txt) and RESID [10].

    Figure 3 shows an example in which users can choose the acetylation of lysine (K) to obtain more detailed information such as the position of the modified amino acid, the location of the modification in the protein sequence, the modified chemical formula, the mass difference, and the substrate site specificity, which is the preference of amino acids surrounding the modification sites. Furthermore, structural information, such as solvent accessibility and secondary structure surrounding the modified sites, is provided. All the experimental PTM sites and putative PTM sites can be downloaded from the web interface.
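
    Substrate site specificity, described above as the preference of amino acids surrounding the modification sites, can be summarized as per-position residue frequencies. The sketch below assumes the input is a list of equal-length fragments centred on the modified residue; the function name and the toy fragments in the usage note are illustrative and not taken from dbPTM.

```python
from collections import Counter
from typing import Dict, List


def position_frequencies(fragments: List[str]) -> List[Dict[str, float]]:
    """For each fragment column, return the relative frequency of every residue."""
    if not fragments:
        return []
    length = len(fragments[0])
    if any(len(fragment) != length for fragment in fragments):
        raise ValueError("all fragments must have the same length")
    profile = []
    for column in range(length):
        counts = Counter(fragment[column] for fragment in fragments)
        profile.append({aa: n / len(fragments) for aa, n in counts.items()})
    return profile
```

    For the toy fragments `AKSPP`, `GKSPA` and `AKTPP`, the column one residue before the centre contains only lysine and the column one residue after it contains only proline, which is the kind of positional preference a frequency profile or sequence logo makes visible.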

    Conclusions

    The proposed server enables both wet-lab biologists and bioinformatics researchers to easily explore information about protein post-translational modifications. This study not only accumulates the experimentally verified PTM sites with relevant literature references, but also computationally annotates twenty types of PTM sites against UniProtKB/Swiss-Prot proteins. As given in Table 2, the proposed knowledge base provides effective information on protein PTMs, including sequence conservation, subcellular localization and substrate specificity, the average solvent accessibility and the secondary structure surrounding the modified site. Moreover, we constructed a PTM benchmark dataset that can be adopted in computational studies for evaluating the predictive performance of various tools for determining PTM sites. Previous investigations have indicated that many protein modifications give rise to binding domains for specific protein-protein interactions that regulate cellular behavior [32]. All the experimental PTM sites and putative PTM sites are available and downloadable in the web interface. Prospective work on dbPTM is to integrate protein-protein interaction data.

    Availability and requirements

    Project name: dbPTM 2.0: A Knowledge Base for Protein Post-Translational Modifications
    Project home page: http://dbPTM.mbc.nctu.edu.tw/
    Operating system(s): Platform-independent
    Programming language: PHP, Perl
    Other requirements: a modern web browser (with CSS and JavaScript support)
    Restrictions to use by non-academics: None


    List of abbreviations

    PTM: Post-Translational Modification; HMMs: hidden Markov models; PDB: Protein Data Bank; SNP: single nucleotide polymorphism.

    Competing interests

    The authors declare that they have no competing interests.

    Authors' contributions

    HDH conceptualized the project. TYL and HDH designed and built the database. TYL, PCH and WCC performed data analysis. TYL and JBKH designed and built the interfaces. TYL, JBKH and TYW compiled a previous version of the database. HDH, TYL and WCC wrote the draft. All authors tested the database and interfaces. All authors read and approved the final manuscript.

    Acknowledgements

    The authors would like to thank the National Science Council of the Republic of China for financially supporting this research under contract No. NSC 95-2311-B-009-004-MY3 and NSC 97-2627-B-009-007. Special thanks for financial support from the National Research Program for Genomic Medicine (NRPGM), Taiwan. This work was also partially supported by MOE ATU. Funding to pay the Open Access publication charges for this article was provided by the National Science Council of the Republic of China and MOE ATU.


    References

    1. Farriol-Mathis N, Garavelli JS, Boeckmann B, Duvaud S, Gasteiger E, Gateau A, Veuthey AL, Bairoch A: Annotation of post-translational modifications in the Swiss-Prot knowledge base. Proteomics 2004, 4(6):1537-1550.

    2. Boeckmann B, Bairoch A, Apweiler R, Blatter MC, Estreicher A, Gasteiger E, Martin MJ, Michoud K, O'Donovan C, Phan I et al: The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res 2003, 31(1):365-370.

    3. Diella F, Gould CM, Chica C, Via A, Gibson TJ: Phospho.ELM: a database of phosphorylation sites--update 2008. Nucleic Acids Res 2008, 36(Database issue):D240-244.

    4. Hornbeck PV, Chabra I, Kornhauser JM, Skrzypek E, Zhang B: PhosphoSite: A bioinformatics resource dedicated to physiological protein phosphorylation. Proteomics 2004, 4(6):1551-1561.

    5. Wurgler-Murphy SM, King DM, Kennelly PJ: The Phosphorylation Site Database: A guide to the serine-, threonine-, and/or tyrosine-phosphorylated proteins in prokaryotic organisms. Proteomics 2004, 4(6):1562-1570.

    6. Gnad F, Ren S, Cox J, Olsen JV, Macek B, Oroshi M, Mann M: PHOSIDA (phosphorylation site database): management, structural and evolutionary investigation, and prediction of phosphosites. Genome Biol 2007, 8(11):R250.

    7. Zanzoni A, Ausiello G, Via A, Gherardini PF, Helmer-Citterich M: Phospho3D: a database of three-dimensional structures of protein phosphorylation sites. Nucleic Acids Res 2007, 35(Database issue):D229-231.

    8. Gupta R, Birch H, Rapacki K, Brunak S, Hansen JE: O-GLYCBASE version 4.0: a revised database of O-glycosylated proteins. Nucleic Acids Res 1999, 27(1):370-372.

    9. Chernorudskiy AL, Garcia A, Eremin EV, Shorina AS, Kondratieva EV, Gainullin MR: UbiProt: a database of ubiquitylated proteins. BMC Bioinformatics 2007, 8:126.

    10. Garavelli JS: The RESID Database of Protein Modifications as a resource and annotation tool. Proteomics 2004, 4(6):1527-1533.

    11. Lee TY, Huang HD, Hung JH, Huang HY, Yang YS, Wang TH: dbPTM: an information repository of protein post-translational modification. Nucleic Acids Res 2006, 34(Database issue):D622-627.

    12. Huang HD, Lee TY, Tzeng SW, Wu LC, Horng JT, Tsou AP, Huang KT: Incorporating hidden Markov models for identifying protein kinase-specific phosphorylation sites. J Comput Chem 2005, 26(10):1032-1041.

    13. Huang HD, Lee TY, Tzeng SW, Horng JT: KinasePhos: a web tool for identifying protein kinase-specific phosphorylation sites. Nucleic Acids Res 2005, 33(Web Server issue):W226-229.

    14. Zhou F, Xue Y, Yao X, Xu Y: A general user interface for prediction servers of proteins' post-translational modification sites. Nat Protoc 2006, 1(3):1318-1321.

    15. Diella F, Cameron S, Gemund C, Linding R, Via A, Kuster B, Sicheritz-Ponten T, Blom N, Gibson TJ: Phospho.ELM: a database of experimentally verified phosphorylation sites in eukaryotic proteins. BMC Bioinformatics 2004, 5(1):79.

    16. Mishra GR, Suresh M, Kumaran K, Kannabiran N, Suresh S, Bala P, Shivakumar K, Anuradha N, Reddy R, Raghavan TM et al: Human protein reference database--2006 update. Nucleic Acids Res 2006, 34(Database issue):D411-414.

    17. Wong YH, Lee TY, Liang HK, Huang CM, Wang TY, Yang YH, Chu CH, Huang HD, Ko MT, Hwang JK: KinasePhos 2.0: a web server for identifying protein kinase-specific phosphorylation sites based on sequences and coupling patterns. Nucleic Acids Res 2007, 35(Web Server issue):W588-594.

    18. Yip YL, Scheib H, Diemand AV, Gattiker A, Famiglietti LM, Gasteiger E, Bairoch A: The Swiss-Prot variant page and the ModSNP database: a resource for sequence and structure information on human protein variants. Hum Mutat 2004, 23(5):464-470.

    19. Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Biswas M, Bradley P, Bork P, Bucher P et al: InterPro: an integrated documentation resource for protein families, domains and functional sites. Brief Bioinform 2002, 3(3):225-235.

    20. Deshpande N, Addess KJ, Bluhm WF, Merino-Ott JC, Townsend-Merino W, Zhang Q, Knezevich C, Xie L, Chen L, Feng Z et al: The RCSB Protein Data Bank: a redesigned query system and relational database based on the mmCIF schema. Nucleic Acids Res 2005, 33(Database issue):D233-237.

    21. Kabsch W, Sander C: Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 1983, 22(12):2577-2637.

    22. Ahmad S, Gromiha MM, Sarai A: RVP-net: online prediction of real valued accessible surface area of proteins from single sequences. Bioinformatics 2003, 19(14):1849-1851.

    23. McGuffin LJ, Bryson K, Jones DT: The PSIPRED protein structure prediction server. Bioinformatics 2000, 16(4):404-405.

    24. Ward JJ, Sodhi JS, McGuffin LJ, Buxton BF, Jones DT: Prediction and functional analysis of native disorder in proteins from the three kingdoms of life. J Mol Biol 2004, 337(3):635-645.

    25. Gustafson TA, He W, Craparo A, Schaub CD, O'Neill TJ: Phosphotyrosine-dependent interaction of SHC and insulin receptor substrate 1 with the NPEY motif of the insulin receptor via a novel non-SH2 domain. Mol Cell Biol 1995, 15(5):2500-2508.

    26. Hers I, Bell CJ, Poole AW, Jiang D, Denton RM, Schaefer E, Tavare JM: Reciprocal feedback regulation of insulin receptor and insulin receptor substrate tyrosine phosphorylation by phosphoinositide 3-kinase in primary adipocytes. Biochem J 2002, 368(Pt 3):875-884.

    27. Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K: dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 2001, 29(1):308-311.

    28. Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov DM, Mazumder R, Mekhedov SL, Nikolskaya AN et al: The COG database: an updated version includes eukaryotes. BMC Bioinformatics 2003, 4:41.

    29. Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 1994, 22(22):4673-4680.


    30. Chen H, Xue Y, Huang N, Yao X, Sun Z: MeMo: a web tool for prediction of protein methylation modifications. Nucleic Acids Res 2006, 34(Web Server issue):W249-253.

    31. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25(17):3389-3402.

    32. Seet BT, Dikic I, Zhou MM, Pawson T: Reading protein modifications with interaction domains. Nat Rev Mol Cell Biol 2006, 7(7):473-483.


    Figures

    Figure 1 - The system architecture of the knowledge base for protein post-translational modification.

    It comprises three major components: integration of external experimental PTM databases, learning and prediction of 20 types of PTM, and annotations of PTM knowledge (more details in the text).

    Figure 2 - A part of the result page on the web interface.

    An example of the graphical presentation of PTM sites and the structural characteristics of the human protein IRS1.

    Figure 3 - An illustrative example to show the catalytic specificity of acetyllysine.


    Tables

    Table 1 - The statistics of experimental PTM sites and putative PTM sites in this study.

    PTM types | Modified residues | No. of experimental sites | No. of putative sites from UniProtKB/Swiss-Prot | No. of HMM-predicted sites in dbPTM
    Phosphorylation | Serine, threonine, tyrosine, and histidine | 31,363 | 36,080 | 1,815,472
    N-linked Glycosylation | Asparagine and lysine | 3,264 | 77,571 | 179,955
    O-linked Glycosylation | Lysine, proline, serine, threonine, and tyrosine | 1,896 | 2,558 | 386,545
    C-linked Glycosylation | Tryptophan | 53 | 52 | 4,015
    Acetylation | N-terminal of some residues and side chain of lysine or cysteine | 2,080 | 5,143 | 1,206
    Amidation | Generally at the C-terminal of a mature active peptide after oxidative cleavage of last glycine | 2,150 | 1,117 | 24,352
    Hydroxylation | Generally of asparagine, aspartate, proline or lysine | 1,033 | 1,074 | 9,743
    Methylation | Generally of N-terminal phenylalanine, side chain of lysine, arginine, histidine, asparagine or glutamate, and C-terminal cysteine | 746 | 2,846 | 18,716
    Pyrrolidone Carboxylic Acid | N-terminal glutamine which has formed an internal cyclic lactam | 598 | 584 | 12,322
    Gamma-Carboxyglutamic Acid | Glutamate | 371 | 361 | 1,924
    Farnesylation | Cysteine | 61 | 216 | 5,349
    Myristoylation | Glycine | 108 | 765 | 10,998
    N-Palmitoylation | Cysteine | 33 | 1,279 | 6,554
    S-Palmitoylation | Cysteine | 177 | 2,303 | 21,287
    Geranyl-geranylation | Cysteine | 47 | 819 | 14,317
    S-diacylglycerol cysteine | Cysteine | 36 | 1,529 | 8,977
    GPI anchoring | C-terminal asparagine, aspartate, and serine | 27 | 681 | -
    Deamidation | Amidated asparagine and glutamine (needs to be followed by a G) | 38 | 26 | 2,022
    Sulfation | Serine, threonine, and tyrosine | 196 | 626 | 15,654
    Sumoylation | Lysine | 77 | 259 | 10,342
    Ubiquitylation | Lysine | 286 | 516 | 8,865
    ADP-ribosylation | Arginine | 3 | 203 | -
    Formylation | Of the N-terminal methionine | 28 | 35 | -
    Citrullination | Arginine | 27 | 91 | -
    Nitration | Tyrosine | 47 | 5 | 1,432
    Bromination | Tryptophan | 18 | 3 | -
    FAD | O-8alpha-FAD tyrosine, Pros-8alpha-FAD histidine, S-8alpha-FAD cysteine, and Tele-8alpha-FAD histidine | 12 | 116 | -
    S-nitrosylation | Cysteine | 9 | 93 | -
    Others | | 1,049 | 2,958 | -
    Total | | 45,833 | 139,909 | 2,560,047


    Table 2 - The enhanced features in this expanding PTM database (dbPTM 2.0).

    Features | Previous PTM database [11] | dbPTM 2.0
    Protein entry | UniProtKB/Swiss-Prot (release 46) | UniProtKB/Swiss-Prot (release 55)
    Experimental PTM resource | UniProtKB/Swiss-Prot, Phospho.ELM, and O-GLYCBASE | UniProtKB/Swiss-Prot, Phospho.ELM, PHOSIDA, HPRD, O-GLYCBASE, and UbiProt
    Computationally predicted PTMs | Phosphorylation, glycosylation, and sulfation | About 25 types of PTM (phosphorylation, glycosylation, sulfation, acetylation, methylation, sumoylation, hydroxylation, etc.)
    Protein structure | Protein Data Bank (PDB) | Protein Data Bank (PDB)
    PTM annotation | RESID (373 PTM annotations) | RESID (431 PTM annotations)
    Structural investigation of PTM sites | - | Solvent accessibility, secondary structure and intrinsic disorder region
    Kinase family annotation | - | KinBase
    Protein domain | InterPro | InterPro
    Protein variation | Swiss-Prot and Ensembl | Swiss-Prot and Ensembl
    Site-specific PTM literature | - | Extracting the PTM-related literature from UniProtKB/Swiss-Prot, Phospho.ELM, HPRD, O-GLYCBASE, and UbiProt
    Substrate specificity | - | Amino acid frequency, solvent accessibility, secondary structure and disorder region surrounding modified sites
    Evolutionary conservation of PTM sites | - | COG and ClustalW
    PTM benchmark set for computational studies | - | Providing the benchmark for constructing PTM test sets to compare the predictive performance of prediction tools
    Relationship between PTM and subcellular localization | - | Analyzing the relationship between PTM and subcellular localization
    Graphical visualization | PTM, solvent accessibility, secondary structure, protein variation, protein domain, and tertiary structure | PTM, solvent accessibility, secondary structure, protein variation, protein domain, tertiary structure, orthologous conserved regions, substrate site specificity and protein interaction network


    Additional files

    Additional file 1
    File format: DOC
    Title: Supplementary figures (S1, S2, S3, and S4) and tables (S1, S2, and S3)
    Description: The data comprise 4 figures and 3 tables. The description of each figure and table is given below.
    Figure S1. The detailed processing flow of KinasePhos-like methods.
    Figure S2. The multiple sequence alignment of orthologous conserved regions.
    Figure S3. The flowchart to remove data redundancy.
    Figure S4. Example of search web pages.
    Table S1. Data statistics of the integrated resources.
    Table S2. The parameters and predictive performance of the trained models with best accuracy for each PTM type.
    Table S3. The list of integrated databases and programs.


    Additional files provided with this submission:

    Additional file 1: dbptm2_bmc rn_additional filet_20090226_wenchi.doc, 1023K
    http://www.biomedcentral.com/imedia/1323127087257761/supp1.doc