research article predicting flavin and nicotinamide...

14
Research Article Predicting Flavin and Nicotinamide Adenine Dinucleotide-Binding Sites in Proteins Using the Fragment Transformation Method Chih-Hao Lu, 1,2 Chin-Sheng Yu, 3,4 Yu-Feng Lin, 5 and Jin-Yi Chen 1 1 Graduate Institute of Molecular Systems Biomedicine, China Medical University, Taichung 40402, Taiwan 2 Graduate Institute of Basic Medical Science, China Medical University, Taichung 40402, Taiwan 3 Department of Information Engineering and Computer Science, Feng Chia University, Taichung 40724, Taiwan 4 Master’s Program in Biomedical Informatics and Biomedical Engineering, Feng Chia University, Taichung 40724, Taiwan 5 Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu 30068, Taiwan Correspondence should be addressed to Chih-Hao Lu; [email protected] Received 16 June 2014; Accepted 21 July 2014 Academic Editor: Hao-Teng Chang Copyright © 2015 Chih-Hao Lu et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. We developed a computational method to identify NAD- and FAD-binding sites in proteins. First, we extracted from the Protein Data Bank structures of proteins that bind to at least one of these ligands. NAD-/FAD-binding residue templates were then constructed by identifying binding residues through the ligand-binding database BioLiP. e fragment transformation method was used to identify structures within query proteins that resembled the ligand-binding templates. By comparing residue types and their relative spatial positions, potential binding sites were identified and a ligand-binding potential for each residue was calculated. Setting the false positive rate at 5%, our method predicted NAD- and FAD-binding sites at true positive rates of 67.1% and 68.4%, respectively. Our method provides excellent results for identifying FAD- and NAD-binding sites in proteins, and the most important is that the requirement of conservation of residue types and local structures in the FAD- and NAD-binding sites can be verified. 1. Background Over the past 12 years, projects involving structural genomics have generated structural data for 12,000 proteins within the Protein Data Bank (PDB) [1]. For most of these proteins, however, biological function is unknown. It is therefore important to develop computational methodologies that can identify a protein’s function from its structure. Many bio- chemical processes depend on interactions between proteins and cofactors, such as metal ions, vitamins, and adenine din- ucleotides, for example, flavin adenine dinucleotide (FAD) and nicotinamide adenine dinucleotide (NAD). Adenine dinucleotides play important roles in many central biological processes, including DNA repair [2, 3], glycolysis, photosyn- thesis, and transcription [47]. By June 2010, 5293 proteins in PDB were annotated “nucleotide binding,” and nucleotides constitute 15% of biologically relevant ligands [8]. ese statistics demonstrate how ubiquitous and essential protein- nucleotide interactions are to biological processes. Although protein-ligand interactions are fundamental to most biochemical reactions, structural information con- cerning these binding sites is still inadequate. Once ligand- binding sites can be predicted from structural data, putative functions can be assigned to these proteins. More complete annotation of protein function will benefit both basic science and the pharmaceutical industry. Mutations or deletions within these ligand-binding domains oſten alter biochemical reactions and are the root causes of many diseases. is makes binding sites attractive targets for drug therapies, including anticancer chemotherapy. In recent years computational methods have been used to identify ligand-binding sites within proteins. ese methods include empirical approaches [9], support vector machines (SVM) [8, 10, 11], random forest Hindawi Publishing Corporation BioMed Research International Volume 2015, Article ID 402536, 13 pages http://dx.doi.org/10.1155/2015/402536

Upload: others

Post on 14-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Research Article Predicting Flavin and Nicotinamide ...downloads.hindawi.com/journals/bmri/2015/402536.pdf · Adenosine-binding of FAD. (e) Phosphate-binding of FAD. (f) Flav in-binding

Research ArticlePredicting Flavin and Nicotinamide AdenineDinucleotide-Binding Sites in Proteins Using the FragmentTransformation Method

Chih-Hao Lu12 Chin-Sheng Yu34 Yu-Feng Lin5 and Jin-Yi Chen1

1Graduate Institute of Molecular Systems Biomedicine China Medical University Taichung 40402 Taiwan2Graduate Institute of Basic Medical Science China Medical University Taichung 40402 Taiwan3Department of Information Engineering and Computer Science Feng Chia University Taichung 40724 Taiwan4Masterrsquos Program in Biomedical Informatics and Biomedical Engineering Feng Chia University Taichung 40724 Taiwan5Institute of Bioinformatics and Systems Biology National Chiao Tung University Hsinchu 30068 Taiwan

Correspondence should be addressed to Chih-Hao Lu chlumailcmuedutw

Received 16 June 2014 Accepted 21 July 2014

Academic Editor Hao-Teng Chang

Copyright copy 2015 Chih-Hao Lu et al This is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

We developed a computational method to identify NAD- and FAD-binding sites in proteins First we extracted from the ProteinData Bank structures of proteins that bind to at least one of these ligands NAD-FAD-binding residue templates were thenconstructed by identifying binding residues through the ligand-binding database BioLiP The fragment transformation methodwas used to identify structures within query proteins that resembled the ligand-binding templates By comparing residue types andtheir relative spatial positions potential binding sites were identified and a ligand-binding potential for each residue was calculatedSetting the false positive rate at 5 our method predicted NAD- and FAD-binding sites at true positive rates of 671 and 684respectively Ourmethod provides excellent results for identifying FAD- andNAD-binding sites in proteins and themost importantis that the requirement of conservation of residue types and local structures in the FAD- and NAD-binding sites can be verified

1 Background

Over the past 12 years projects involving structural genomicshave generated structural data for sim12000 proteins withinthe Protein Data Bank (PDB) [1] For most of these proteinshowever biological function is unknown It is thereforeimportant to develop computational methodologies that canidentify a proteinrsquos function from its structure Many bio-chemical processes depend on interactions between proteinsand cofactors such as metal ions vitamins and adenine din-ucleotides for example flavin adenine dinucleotide (FAD)and nicotinamide adenine dinucleotide (NAD) Adeninedinucleotides play important roles in many central biologicalprocesses including DNA repair [2 3] glycolysis photosyn-thesis and transcription [4ndash7] By June 2010 5293 proteinsin PDBwere annotated ldquonucleotide bindingrdquo and nucleotidesconstitute sim15 of biologically relevant ligands [8] These

statistics demonstrate how ubiquitous and essential protein-nucleotide interactions are to biological processes

Although protein-ligand interactions are fundamentalto most biochemical reactions structural information con-cerning these binding sites is still inadequate Once ligand-binding sites can be predicted from structural data putativefunctions can be assigned to these proteins More completeannotation of protein function will benefit both basic scienceand the pharmaceutical industry Mutations or deletionswithin these ligand-binding domains often alter biochemicalreactions and are the root causes ofmany diseasesThismakesbinding sites attractive targets for drug therapies includinganticancer chemotherapy In recent years computationalmethods have been used to identify ligand-binding siteswithin proteinsThesemethods include empirical approaches[9] support vector machines (SVM) [8 10 11] random forest

Hindawi Publishing CorporationBioMed Research InternationalVolume 2015 Article ID 402536 13 pageshttpdxdoiorg1011552015402536

2 BioMed Research International

[12 13] and artificial neural networks [14] and structure com-parison approaches [15ndash17] These prediction methods canbe divided into two broad categories ones that use protein-sequence information for example amino acid composi-tion position-specific scoring matrix and physicochemicalproperties and ones that use protein-structure informationfor example dihedral angles secondary structure and 3D-structure comparisonThemost effective predictionmethod-ologies however tend to use a combination of sequence andstructure data

The structural genomics initiative resolves 20 new proteinstructures each week and more than 60000 structures havebeen deposited into PDBThe functional surfaces of proteinswhich interact with cofactors tend to be more structurallyconserved than internal structures [18] Residues that forma functional binding region are usually quite close to oneanother when the three-dimensional structure of a proteinis examined In addition binding regions typically constituteonly 10ndash30 of the entire protein [19ndash21] We took advantageof previously generated structural information and usedthe fragment transformation method [22] to identify newbinding sites for the NAD and FAD ligands

2 Results

21 Residues that Bind NAD or FAD To characterize thestructural environment ofNAD-FAD-binding sites we com-pared binding-site residues to whole-protein residues Thethree-dimensional structure of the NADFAD molecule wasdivided into three moieties according to functionWithin thespherical environment of NAD the adenosine-binding sitetypically contained glycine isoleucine tyrosine and asparticacid residues the phosphate-binding site contained glycineisoleucine serine threonine methionine phenylalaninetyrosine tryptophan arginine and histidine residues andthe nicotinamide-binding site contained serine threoninecysteine phenylalanine asparagine tyrosine tryptophanhistidine and asparagine residues For FAD adenosine wasboundby glycine valine cysteine and tryptophan phosphatewas bound by glycine serine and arginine and flavinwas bound by cysteine methionine phenylalanine tyrosinetryptophan and histidine The residue types whose ratio ofbinding-site residues frequency to whole-protein residuesfrequency was greater than 12 were listed above As such thebinding residues were primarily polar residues containingcharged groups amide groups and nucleophilic groups(Figure 1)

We also characterized the types of atoms that werewithin 35 A of the three moieties of each NADFAD ligand(Figure 2) Nicotinamide and flavin moieties were mostcommonly associatedwith nitrogen and oxygen atomswithinthe backbone and side-chains of the protein Phosphatemoieties were commonly bound by backbone and side-chain nitrogen or side-chain oxygen Each ligand moietypreferentially bound certain atoms within certain residues

22 Prediction Performance We chose two criteria to eval-uate the performance of our binding-site predictions per-formance at less than 5 FPR and the Matthews correlation

Table 1 The performance of binding-site predictions at a 5 FPRthreshold

Accuracy () Sensitivity () Specificity () MCCNAD 9346 6709 9508 052FAD 9359 6843 9522 054

Table 2The performance of binding-site predictions at amaximumMCC threshold

Accuracy () Sensitivity () Specificity () MCCNAD 9534 5788 9764 057FAD 9433 6413 9627 055

coefficient (MCC) We used a combination of features thatincluded the number of aligned residues RMSD BLOSUMandDSSPUsing a 5FPR thresholdNAD-binding siteswerepredicted with an accuracy of 9346 a sensitivity of 6709and an MCC of 052 Under these same conditions FAD-binding-site predictions yielded 9359 accuracy 6843sensitivity and an MCC of 054 (Table 1) When MCCswere maximized NAD-binding proteins were identified with9534 accuracy 5788 sensitivity 9764 specificity andan MCC of 057 Under these same conditions FAD-bindingresidues were identified with an accuracy of 9433 asensitivity of 6413 a specificity of 9627 and an MCC of055 (Table 2) These data indicated that our method couldpredict binding residues for these two ligands

23 Comparison with OtherMethods We next compared ourresults with other prediction methodologies For these com-parisons we chose two published methods that use similarcriteria for analyzing these kinds of ligand-protein complexes[10 11] These chosen methods assign binding or nonbindingstatus to each residue within NAD-FAD-binding proteinsBecause these published methods use an equal number ofbinding and nonbinding residues we applied our predictionmethod to a similar dataset to make the results comparableRandom-selection processes were performed five times forall nonbinding residues within ligand-protein complexes togenerate the same scale for binding and nonbinding residueswithin each protein For NAD-binding proteins our methodpredicted binding residues with a sensitivity of 8621 and anMCC of 075 compared with 8613 and 075 for the methoddeveloped by Ansari and Raghava [10] (Table 3) For FAD-binding proteins our method yielded 8568 sensitivity andan MCC of 075 These values compared with the perfor-mance of the publishedmethod (8336 and 066) developedbyMishra and Raghava [11] (Table 4) Ourmethod thereforehas similar performance in NAD-binding sites predicted butbetter in FAD-binding sites However in native proteins thenumber of binding and nonbinding residues should not beequalThe equal numbermodel needs to be further discussed

24 Template Matching Figures 3ndash6 show alignments ofpredicted NAD-FAD-binding proteins and correspondingtemplates Structures within these figures were drawn using

BioMed Research International 3

0005

01015

02025

03035

04Fr

eque

ncy

Type of residue

Adenosine-binding residues Whole-protein residues

GLY

ALA VA

LLE

UIL

ESE

RTH

RCY

SM

ETPR

OPH

ETY

RTR

PLY

SA

RG HIS

ASP

ASN

GLU

GLN

(a)

0005

01015

02025

03035

04

Freq

uenc

y

Type of residue

Phosphate-binding residues Whole-protein residues

GLY

ALA VA

LLE

UIL

ESE

RTH

RCY

SM

ETPR

OPH

ETY

RTR

PLY

SA

RG HIS

ASP

ASN

GLU

GLN

(b)

0005

01015

02025

03035

04

Freq

uenc

y

Type of residue

Nicotinamide-binding residues Whole-protein residues

GLY

ALA VA

LLE

UIL

ESE

RTH

RCY

SM

ETPR

OPH

ETY

RTR

PLY

SA

RG HIS

ASP

ASN

GLU

GLN

(c)

0005

01015

02025

03035

04

Freq

uenc

y

Type of residue

Adenosine-binding residues Whole-protein residues

GLY

ALA VA

LLE

UIL

ESE

RTH

RCY

SM

ETPR

OPH

ETY

RTR

PLY

SA

RG HIS

ASP

ASN

GLU

GLN

(d)

0005

01015

02025

03035

04

Freq

uenc

y

Type of residuePhosphate-binding residues Whole-protein residues

GLY

ALA VA

LLE

UIL

ESE

RTH

RCY

SM

ETPR

OPH

ETY

RTR

PLY

SA

RG HIS

ASP

ASN

GLU

GLN

(e)

0005

01015

02025

03035

04

Freq

uenc

y

Type of residue

Flavin-binding residues Whole-protein residues

GLY

ALA VA

LLE

UIL

ESE

RTH

RCY

SM

ETPR

OPH

ETY

RTR

PLY

SA

RG HIS

ASP

ASN

GLU

GLN

(f)

Figure 1 Amino acid frequencies within NAD-FAD-binding sites Frequencies within NAD-FAD-binding sites (black) are compared withwhole-protein frequencies (white) (a) Adenosine-binding of NAD (b) Phosphate-binding of NAD (c) Nicotinamide-binding of NAD (d)Adenosine-binding of FAD (e) Phosphate-binding of FAD (f) Flavin-binding of FAD The preferred types of amino acids surrounding thedifferent moiety of NADFAD are shown

Table 3 Comparison between the fragment transformation and SVMmethods for predicting NAD-binding-site residues

Accuracy () Sensitivity () Specificity () MCCRandom 1 8746 8645 8848 075Random 2 8723 8579 8867 074Random 3 8738 8565 8911 075Random 4 8746 8691 8801 075Random 5 8738 8625 8851 075Average 8738 8621 8856 075SVM [10] 8725 8613 8837 075

4 BioMed Research International

0

01

02

03

04

05

06

N O SType of atom

Freq

uenc

y

(a)

0

01

02

03

04

05

06

N OType of atom

S

Freq

uenc

y

(b)

0

01

02

03

04

05

06

N OType of atom

S

Freq

uenc

y

(c)

0

01

02

03

04

05

06

N OType of atom

S

Freq

uenc

y

(d)

0

01

02

03

04

05

06

N OType of atom

S

Freq

uenc

y

(e)

0

01

02

03

04

05

06

N OType of atom

S

Freq

uenc

y

(f)

Figure 2 Atom-type frequencies within NAD-FAD-binding sites Frequencies for both backbone (black) and side-chain (white) atoms areshown (a) Adenosine-binding of NAD (b) Phosphate-binding of NAD (c) Nicotinamide-binding of NAD (d) Adenosine-binding of FAD(e) Phosphate-binding of FAD (f) Flavin-binding of FAD The preferred types of atoms surrounding the different moiety of NADFAD areshown

BioMed Research International 5

(a)

(b) (c) (d)

Figure 3 Identification of NAD-binding sites (a) Chain A of D-2-hydroxyisocaproate dehydrogenase (PDB ID1DXY) was the query proteinTemplates were constructed from (b) D-Lactate dehydrogenase (chain A PDB ID3KB6) (c) phosphoglycerate dehydrogenase (chain A PDBID1YBA) and (d) C-terminal-binding proteinbrefeldin A-ADP ribosylated substrate (chain A PDB ID1HKU)

Table 4 Comparison between the fragment transformation and SVMmethods for predicting FAD-binding-site residues

Accuracy () Sensitivity () Specificity () MCCRandom 1 8738 8568 8908 075Random 2 8748 8573 8923 075Random 3 8735 8555 8915 075Random 4 8758 8573 8943 075Random 5 8744 8573 8915 075Average 8745 8568 8921 075SVM [11] 8286 8336 8236 066

PyMOL [23] and color coded light gray for the query proteinblue lines for the ligand hot pink orange and forest sticks foradenosine- phosphate- and nicotinamide-flavin-bindingresidues that are predicted correctly and dark gray sticks fornonbinding residues that are predicted to be binding residuesOur method accurately identified 21 NAD-binding residueswithin chain A of D-2-hydroxyisocaproate dehydrogenase(PDB ID1DXY) [24 25] with ten false positives (Figure 3)Nine nicotinamide-binding residues were identified based onD-Lactate dehydrogenase (chain A PDB ID3KB6) [26 27]

three phosphate-binding residues were identified based onphosphoglycerate dehydrogenase (chain A PDB ID1YBA)[28] five adenosine-binding residues were identified basedon C-terminal-binding proteinbrefeldin A-ADP ribosylatedsubstrate (chain A PDB ID1HKU) [29] and four wereidentified based on other protein templates Ourmethod alsoaccurately predicted 23 NAD-binding residues within chainC of 5-carboxymethyl-2-hydroxymuconate semialdehydedehydrogenase (PDB ID2D4E) with only eight false pos-itives (Figure 4) Nine nicotinamide-binding residues were

6 BioMed Research International

(a)

(b) (c)

Figure 4 Identification of NAD-binding sites (a) Chain C of 5-carboxymethyl-2-hydroxymuconate semialdehyde dehydrogenase (PDBID2D4E) was the query protein Templates were constructed from (b) aldehyde dehydrogenase (chain A PDB ID3B4W) and (c) 1-pyrroline-5-carboxylate dehydrogenase (chain A PDB ID2EHU)

identified based on aldehyde dehydrogenase (chain A PDBID3B4W) three phosphate-binding and eight adenosine-binding residues were identified based on 1-pyrroline-5-carboxylate dehydrogenase (chain A PDB ID2EHU) andthree were identified based on other protein templates

For the FAD-binding proteins our method accuratelypredicted chain A of deoxyribodipyrimidine photolyase(PDB ID1OWL) [30] which contains 24 residues that bindFAD (Figure 5) and only six false positives occurred Threeadenosine-binding residues were identified based on humancryptochrome DASH (chain X PDB ID2IJG) [31 32]six phosphate-binding residues were identified based onphotolyase-like domain of cryptochrome 1 (chain A PDBID1U3C) [33] eleven flavin-binding residues were identifiedbased on photolyase (chain A PDB ID1IQR) [34] and fourwere identified based on other protein templates In addition30 FAD-binding residues were accurately predicted withinchain H of D-amino acid oxidase (PDB ID1DDO) [35]with 14 false positives Five adenosine-binding residues werepredicted based on putidaredoxin reductase (chain B PDB

ID1Q1R) [36 37] three adenosine-binding and nine flavin-binding residues based on D-amino acid oxidase (chain APDB ID1C0I) [38] three phosphate-binding and five flavin-binding residues based on glycine oxidase (chain B PDBID1NG3) [39] and five based on other protein templates(Figure 6)

3 Discussion

Small molecular cofactors (ligands) are essential for cells toperform numerous biological functions NAD and FAD forexample bind to proteins that play critical roles in energytransfer energy storage and signal transduction to name justa few To understand the mechanism by which these ligandsaffect protein function it is important to identify ligand-binding residues within relevant proteins The experimentalidentification of these interacting residues is so difficulthowever that computational methods to accomplish this taskare in high demand

BioMed Research International 7

(a)

(b) (c) (d)

Figure 5 Identification of FAD-binding sites (a) Chain A of deoxyribodipyrimidine photolyase (PDB ID1OWL) was the query proteinTemplates were constructed from (b) human cryptochrome DASH (chain X PDB ID2IJG) (c) photolyase-like domain of cryptochrome 1(chain A PDB ID1U3C) and (d) photolyase (chain A PDB ID1IQR)

Here we developed a structure comparison method thatuses both sequence and structure information to predictNAD-FAD-binding residues within proteins This approachalso provides valuable information concerning the microen-vironment of the protein-ligand interactionThe compositionof NAD-FAD-binding residues that we identified here isgenerally similar to previous studies [10 11] Interestinglyglycine was the most frequent binding residue bindingto NAD through phosphate or adenosine moieties moreoften than through the nicotinamide moiety In contrastarginine preferentially interacted with phosphate moieties

and aspartic acid preferentially interacted with adenosinemoieties of NAD whereas threonine cysteine and histidinebound to nicotinamide The most common residue withinFAD-binding sites was also glycine which preferentiallybound phosphate and adenosine moieties Serine interactedwith phosphate moieties whereas cysteine tyrosine andtryptophan primarily bound to nicotinamide By takingadvantage of this kind of structural information detailsconcerning these critical binding sites may be revealed Toinvestigate the influence of amino acids on prediction per-formance the sensitivity and specificity associated with each

8 BioMed Research International

(a)

(b) (c) (d)

Figure 6 Identification of FAD-binding sites (a) Chain H of D-amino acid oxidase (PDB ID1DDO) was the query protein Templates wereconstructed from (b) putidaredoxin reductase (chain B PDB ID1Q1R) (c) D-amino acid oxidase (chain A PDB ID1C0I) and (d) glycineoxidase (chain B PDB ID1NG3)

0

02

04

06

08

1

Perfo

rman

ce

Type of residue

SensitivitySpecificity

GLY

ALA VA

LLE

UIL

ESE

RTH

RCY

SM

ETPR

OPH

ETY

RTR

PLY

SA

RG HIS

ASP

ASN

GLU

GLN

(a)

0

02

04

06

08

1

Perfo

rman

ce

Type of residue

SensitivitySpecificity

GLY

ALA VA

LLE

UIL

ESE

RTH

RCY

SM

ETPR

OPH

ETY

RTR

PLY

SA

RG HIS

ASP

ASN

GLU

GLN

(b)

Figure 7 Sensitivity and specificity associated with each amino acid in NAD-FAD-binding-site predictions (a) NAD (b) FAD

residue were calculated (Figure 7) For NAD-binding-sitepredictions specificity for each residue was excellent (0927ndash0966) but sensitivity was relatively low for phenylalaninetryptophan arginine and glutaminewhichwere less than 05For FAD-binding sites all residues achieved high specificity(0933ndash0971) and sensitivity (0532ndash0791) It should be noted

that the ratio of NAD-FAD-binding residues to nonbindingresidues is about 1 to 16 in our dataset This large differencemight cause lots of false positives when predicted That is thereason for high specificity and accuracy but low sensitivity inour prediction results Hence the positions of false positiveresidues in sequence were also investigated 20 and 25 of

BioMed Research International 9

false positive residues of NAD- and FAD-binding predictionoccurred next to the true positive residues in sequence Itwas shown that these residues are also located near theligand in the coordinate space If these residues were treatedas true positive residues our prediction results of NAD-binding yielded 7155 sensitivity and 061 MCC at a 5FPR threshold Under the same conditions FAD-binding-site predictions yielded 7334 sensitivity and an MCC of064 Compared with other prediction methods ours did notuse protein evolutionary information but only used proteinstructure and did not need to use equal number dataset fortraining but predicted whole-proteins through comparingstructures of template database Our results yielded excellentprediction performancewhen analyzingNAD-FAD-bindingresidues and thus provide important details concerning thebinding-site microenvironment This approach thereforemay be used to predict putative NAD-FAD-binding proteinsand the specific residues involved in the interaction

4 Methods

41 Overview We extracted structures of proteins boundto NAD or FAD from PDB and constructed a database ofNAD-FAD-binding residue templates Residues that weredefined as binding residues by the ligand-binding databaseBioLiP [40] were included in the template Query proteinstructures were then compared with each template in thedatabase using a ldquoleave-one-outrdquo comparison method Thefragment transformation method [22] was used to alignquery and template structures After comparing the localprotein structure each residue was assigned a score basedon both protein sequence and structure Sequence similaritywas calculated using the BLOSUM62 substitutionmatrix [41]whereas structural similarity was calculated by measuringthe root mean square deviation (RMSD) of the C120572 carbonsfrom local structure alignments and using a secondarystructure substitutionmatrix [22] according to theDictionaryof Secondary Structure of Proteinsrsquo (DSSP) definition ofsecondary structure [42] Residues with an alignment scorethat exceeded a predetermined threshold were predicted tobind NADFADThis method is illustrated in Figure 8

42 NAD-FAD-Binding Proteins and Binding Residue Tem-plates We adopted the same datasets with previous research[10 11] All protein complexes were collected from PDBand had pairwise sequence identity lt40 by using CD-HITProteins chains that are not involved in NADFAD bindingwere excluded Residues that were defined as binding ornonbinding residues by using the ligand-binding databaseBioLiP The main dataset included 184 and 165 polypeptidechains for NAD and FAD respectively Because NAD iscomposed of a nicotinamide moiety an adenosine moietyand a phosphate moiety binding residues were dividedinto three groups nicotinamide binding adenosine bindingand phosphate binding FAD-binding sitessimilarly contain

flavin-binding residues adenosine-binding residues andphosphate-binding residues Groups of residues that con-tained more than or equal to two binding residues wereconsidered a binding residue template (see Figures 9 and 10)

43The Fragment TransformationMethod Weused the frag-ment transformation method to align NAD-FAD-bindingresidues Each residue was treated as an individual unit andwas used to align the query protein 119878 with the bindingtemplate 119879 The structural unit consists of a triplet formedby the NndashC

120572ndashC atoms within a given residue 119878 denotes the

query protein of length 119898 and 119879 denotes the template of 119899residues The query protein 119878 of length119898 and the template 119879of 119899 residues can therefore be expressed in terms of tripletsas 119878 = 120590

1 120590

2 120590

119898 and 119879 = 120591

1 120591

2 120591

119899 where 120590

119894=

(119901

119873 119901

119862120572 119901

119862) 120591119895= (119902

119873 119902

119862120572 119902

119862) and 119901 and 119902 are PDB

coordinates for each atomA matrix of dimensions 119898 times 119899 was then constructed for

the residues of 119878 and 119879 as

119872 =

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

119872

11119872

12sdot sdot sdot 119872

1119899

119872

21119872

22sdot sdot sdot 119872

2119899

sdot sdot sdot sdot sdot sdot sdot sdot sdot sdot sdot sdot

119872

1198981119872

1198982sdot sdot sdot 119872

119898119899

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

(1)

where the element119872119894119895is a rigid-body transformation matrix

that transforms the triplet 120590119894to 120591119895(ie119872

119894119895120590

119894= 120591

119895)

44 Performing Triplet Clustering 119863119894119895119896119897is the Cartesian dis-

tance between the target 120591119897and the transformed triplet119872

119894119895120590

119896

providing a measure of how similarly the triplet pairs (120590119894 120591

119895)

and (120590119896 120591

119897) are oriented This allows clustering of triplet

fragments using the single-linkage algorithm [43] as followsIf for two triplet pairs (120590

119894 120591

119895) and (120590

119896 120591

119897) 119863119894119895119896119897lt 119863

0 119894 = 119896

and 119895 = 119897 then the triplets are clustered Let 1198661and 119866

2

be two clusters with the first containing (120590119894 120591

119895) and (120590

119896 120591

119897)

and the second containing (1205901198941015840 1205911198951015840) and (120590

1198961015840 1205911198971015840) If 119863119894119895

11989610158401198971015840lt

119863

0 then 119866

1and 119866

2are merged to form a new cluster 119866

3

where 1198663= 119866

1cup 119866

2 These procedures are performed

iteratively until no new clusters can be formed For each finalcluster 119866

120583 we can obtain the transformation matrix119872120583

119896119897and

aligned substructure pair 119878120583= ⋃

120590119896isin119866120583

120590

119896and 119879

120583= ⋃

120591119897isin119866120583

120591

119897

where 119866120583

has the minimum Cartesian distance whenusing119872120583

119896119897

45 Scoring Function For each residue 119894 the binding score119862119894

is defined as

119862

119894= MAX120590119894isin119866120583

(120576

120583times 119862

119877

120583times 119862

119861

120583times 119862

119863

120583) (2)

10 BioMed Research International

LYSLYS

GLY

ILE

GLU

GLU

GLY

LEU

SER

TRP

ASN

GLN

Ligand-binding residue template database

0123456

96 146

Query protein

The fragment transformation method

Scoring function

FAD

NAD

minus1

ALA

THR

SER

SER PHE

LYS

SER

SER

MET ARG

VAL

Figure 8 Schematic of the method for predicting NAD-FAD-binding sites

where 120576120583is the number of triplets of 119878

120583(ie the aligned

residues of the query structure)The alignment scores119862119877120583119862119861120583

and 119862119863120583are defined as

119862

119877

120583=

1

1 + RMSD (119878120583 119879

120583)

119862

119861

120583=

BLOSUM (119878120583 119879

120583)

BLOSUM (119879120583 119879

120583)

+ 1

119862

119863

120583=

DSSP (119878120583 119879

120583)

DSSP (119879120583 119879

120583)

+ 1

(3)

where RMSD (119878120583 119879

120583) is the RMSD of all 119862

120572atoms between

119878

120583and 119879

120583 BLOSUM (119878

120583 119879

120583) is the sequence alignment

score between 119878120583and 119879

120583calculated using the BLOSUM62

[41] substitution matrix BLOSUM (119879120583 119879

120583) is the maximum

sequence alignment score of 119879120583 DSSP (119878

120583 119879

120583) represents the

secondary structure alignment score based on a constructionsubstitution matrix [22] using the definition of DSSP [42]between 119878

120583and 119879

120583 and DSSP (119879

120583 119879

120583) is the maximum

secondary structure alignment score of 119879120583 The value of

RMSD (119878120583 119879

120583) should be lt3 A

For each residue 119894 we predict a geometric centerΘ120596119894of the

ligand by Θ120596119894= 119872

120583

119896119897

minus1

119871

120596 where 119871

120596is the geometric center

of the binding template type 120596 in template 119879 120596 representsthe three moieties of NADFAD nicotinamide adenosine

BioMed Research International 11

LYSLYS

GLY

ILEGLU

GLU GLY

LEUSER

TRP

ASN

GLN

(a)

LYS

GLYGLU

GLN

LEU

ASN

Nicotinamide-binding templates

(b)

SER

TRPPhosphate-binding templates

(c)

LYS

GLY

GLU

ILE

Adenosine-binding templates

(d)

Figure 9NAD-binding residue templates (a)The entireNAD-binding template (b)Nicotinamide-binding templates (c) Phosphate-bindingtemplates (d) Adenosine-binding templates

LYS

ARG

MET

ALA

THRSER

SER

SER

SER

PHE

ASN

VAL

(a)

Flavin-binding templates

SER

SER

SER

VAL ARG

THRASN

(b)

Phosphate-binding templates

SER

MET

LYS

(c)

Adenosine-binding templates

SER

ALAPHE

(d)

Figure 10 FAD-binding residue templates (a) The entire FAD-binding template (b) Flavin-binding templates (c) Phosphate-bindingtemplates (d) Adenosine-binding templates

12 BioMed Research International

and phosphate for NAD flavin adenosine and phosphatefor FAD The binding score 119862

119896is added to 119862

119894if the distance

betweenΘ120596119894andΘ120596

1015840

119896is between 3 and 9 A and120596 = 1205961015840 Finally

the normalized binding score 119885119862119894is calculated as

119885

119862

119894=

119862

119894minus 119862

119878119863

119862

(4)

where 119862 and 119878119863119862denote the mean and standard deviation

respectively of the binding score 119862119894

46 Performance Assessment The accuracy of predictingNAD-FAD-binding sites wasdefined as the number of truepositives and true negatives and was evaluated using a leave-one-out approach Accuracy (ACC) the true positive rate(TPR) and the false positive rate (FPR) were calculated usingtrue positive (TP) true negative (TN) false positive (FP) andfalse negative (FN) values as follows

ACC = TP + TNTP + TN + FP + FN

TPR = Sensitivity = TPTP + FN

FPR = 1 minus Specificity = FPFP + TN

(5)

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Authorsrsquo Contribution

Chih-Hao Lu and Chin-Sheng Yu developed and imple-mented the methods Chih-Hao Lu Chin-Sheng Yu Yu-Feng Lin and Jin-Yi Chen carried out the analysis Chih-Hao Lu and Yu-Feng Lin drafted the paper Chih-Hao Lusupervised the work All the authors have read and approvedthe content of the final paper Chih-Hao Lu and Chin-ShengYu contributed equally to this work

Acknowledgment

Thisworkwas supported byGrants from theNational ScienceCouncil (NSC 99-2113-M-039-002-MY2) and China MedicalUniversity (CMU99-N2-02) Taiwan to Chih-Hao Lu Thefunders had no role in study design data collection andanalysis decision to publish or preparation of the paper Theauthors are grateful to Yeong-Shin Lin (National Chiao TungUniversity Taiwan) for invaluable comments

References

[1] H M Berman J Westbrook Z Feng et al ldquoThe protein databankrdquo Nucleic Acids Research vol 28 no 1 pp 235ndash242 2000

[2] A Wilkinson J Day and R Bowater ldquoBacterial DNA ligasesrdquoMolecular Microbiology vol 40 no 6 pp 1241ndash1248 2001

[3] A Burkle ldquoPhysiology and pathophysiology of poly(ADP-ribosyl)ationrdquo BioEssays vol 23 no 9 pp 795ndash806 2001

[4] Q Zhang D W Piston and R H Goodman ldquoRegulation ofcorepressor function by nuclear NADHrdquo Science vol 295 no5561 pp 1895ndash1897 2002

[5] J S Smith and J D Boeke ldquoAn unusual form of transcriptionalsilencing in yeast ribosomalDNArdquoGenes andDevelopment vol11 no 2 pp 241ndash254 1997

[6] R M Anderson K J Bitterman J G Wood et al ldquoManipula-tion of a nuclear NAD+ salvage pathway delays aging withoutaltering steady-state NAD+ levelsrdquo The Journal of BiologicalChemistry vol 277 no 21 pp 18881ndash18890 2002

[7] J Rutter M Reick L C Wu and S L McKnight ldquoRegulationof crock and NPAS2 DNA binding by the redox state of NADcofactorsrdquo Science vol 293 no 5529 pp 510ndash514 2001

[8] K ChenM JMizianty and L Kurgan ldquoPrediction and analysisof nucleotide-binding residues using sequence and sequence-derived structural descriptorsrdquo Bioinformatics vol 28 no 3 pp331ndash341 2012

[9] M Saito M Go and T Shirai ldquoAn empirical approach fordetecting nucleotide-binding sites on proteinsrdquo Protein Engi-neering Design and Selection vol 19 no 2 pp 67ndash75 2006

[10] H R Ansari and G P S Raghava ldquoIdentification of NADinteracting residues in proteinsrdquo BMC Bioinformatics vol 11article 160 2010

[11] N K Mishra and G P S Raghava ldquoPrediction of FADinteracting residues in a protein from its primary sequenceusing evolutionary informationrdquo BMC Bioinformatics vol 11article S48 no 1 2010

[12] Z-P Liu L-Y Wu Y Wang X-S Zhang and L ChenldquoPrediction of protein-RNA binding sites by a random forestmethod with combined featuresrdquo Bioinformatics vol 26 no 13pp 1616ndash1622 2010

[13] L Wang Z P Liu X S Zhang and L Chen ldquoPrediction of hotspots in protein interfaces using a random forest model withhybrid featuresrdquo Protein Engineering Design and Selection vol25 no 3 pp 119ndash126 2012

[14] J S Chauhan N KMishra andG P S Raghava ldquoPrediction ofGTP interacting residues dipeptides and tripeptides in a pro-tein from its evolutionary informationrdquo BMC Bioinformaticsvol 11 article 301 2010

[15] A Roy and Y Zhang ldquoRecognizing protein-ligand binding sitesby global structural alignment and local geometry refinementrdquoStructure vol 20 no 6 pp 987ndash997 2012

[16] L Xie and P E Bourne ldquoDetecting evolutionary relationshipsacross existing fold space using sequence order-independentprofile-profile alignmentsrdquo Proceedings of the National Academyof Sciences of the United States of America vol 105 no 14 pp5441ndash5446 2008

[17] J Yang A Roy and Y Zhang ldquoProtein-ligand binding siterecognition using complementary binding-specific substruc-ture comparison and sequence profile alignmentrdquo Bioinformat-ics vol 29 no 20 pp 2588ndash2595 2013

[18] Y T Yan and W-H Li ldquoIdentification of protein functionalsurfaces by the concept of a split pocketrdquo Proteins StructureFunction and Bioinformatics vol 76 no 4 pp 959ndash976 2009

[19] K A Dill ldquoDominant forces in protein foldingrdquo Biochemistryvol 29 no 31 pp 7133ndash7155 1990

[20] S Govindarajan and R A Goldstein ldquoEvolution of modelproteins on a foldability landscaperdquo Proteins vol 29 no 4 pp461ndash466 1997

BioMed Research International 13

[21] G Parisi and J Echave ldquoStructural constraints and emergence ofsequence patterns in protein evolutionrdquo Molecular Biology andEvolution vol 18 no 5 pp 750ndash756 2001

[22] C H Lu Y S Lin Y C Chen C S Yu S Y Chang andJ K Hwang ldquoThe fragment transformation method to detectthe protein structural motifsrdquo Proteins Structure Function andGenetics vol 63 no 3 pp 636ndash643 2006

[23] L Schrodinger ThePyMOLMolecular Graphics System Version13r1 2010

[24] U Dengler K Niefind M Kieszlig and D Schomburg ldquoCrystalstructure of a ternary complex of D-2-hydroxy-isocaproatedehydrogenase from Lactobacillus casei NAD+ and 2-oxoisocaproate at 19 A resolutionrdquo Journal of MolecularBiology vol 267 no 3 pp 640ndash660 1997

[25] E Gross C S Sevier A Vala C A Kaiser and D Fass ldquoA newFAD-binding fold and intersubunit disulfide shuttle in the thioloxidase Erv2prdquoNature Structural Biology vol 9 no 1 pp 61ndash672002

[26] S V Antonyuk R W Strange M J Ellis et al ldquoStructure ofd-lactate dehydrogenase from Aquifex aeolicus complexed withNAD+ and lactic acid (or pyruvate)rdquo Acta Crystallographica FStructural Biology and Crystallization Communications vol 65part 12 pp 1209ndash1213 2009

[27] C K Wu T A Dailey H A Dailey B C Wang and J PRose ldquoThe crystal structure of augmenter of liver regenerationa mammalian FAD-dependent sulfhydryl oxidaserdquo ProteinScience vol 12 no 5 pp 1109ndash1118 2003

[28] J R Thompson J K Bell J Bratt G A Grant and LJ Banaszak ldquoVmax regulation through domain and subunitchanges The active form of phosphoglycerate dehydrogenaserdquoBiochemistry vol 44 no 15 pp 5763ndash5773 2005

[29] M Nardini S Spano C Cericola et al ldquoCtBPBARS a dual-function protein involved in transcription co-repression andGolgi membrane fissionrdquo EMBO Journal vol 22 no 12 pp3122ndash3130 2003

[30] R Kort H Komori S I Adachi K Miki and A Eker ldquoDNAapophotolyase from Anacystis nidulans 18 A structure 8-HDF reconstitution and X-ray-induced FAD reductionrdquo ActaCrystallographica Section D Biological Crystallography vol 60no 7 pp 1205ndash1213 2004

[31] Y Huang R Baxter B S Smith C L Partch C L Colbert andJ Deisenhofer ldquoCrystal structure of cryptochrome 3 from Ara-bidopsis thaliana and its implications for photolyase activityrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 103 no 47 pp 17701ndash17706 2006

[32] J B Thoden T M Wohlers J L Fridovich-Keil and H MHolden ldquoMolecular basis for severe epimerase deficiency galac-tosemia X-ray structure of the humanV94M-substitutedUDP-galactose 4-epimeraserdquoThe Journal of Biological Chemistry vol276 no 23 pp 20617ndash20623 2001

[33] C A Brautigam B S Smith Z Ma et al ldquoStructure of thephotolyase-like domain of cryptochrome 1 from Arabidopsisthalianardquo Proceedings of the National Academy of Sciences of theUnited States of America vol 101 no 33 pp 12142ndash12147 2004

[34] H Komori R Masui S Kuramitsu et al ldquoCrystal structure ofthermostable DNA photolyase pyrimidine-dimer recognitionmechanismrdquo Proceedings of the National Academy of Sciences ofthe United States of America vol 98 no 24 pp 13560ndash135652001

[35] F Todone M A Vanoni A Mozzarelli et al ldquoActive siteplasticity in D-amino acid oxidase a crystallographic analysisrdquoBiochemistry vol 36 no 19 pp 5853ndash5860 1997

[36] I F Sevrioukova H Li and T L Poulos ldquoCrystal structure ofputidaredoxin reductase from Pseudomonas putida the finalstructural component of the cytochromeP450cammonooxyge-naserdquo Journal of Molecular Biology vol 336 no 4 pp 889ndash9022004

[37] S Y Song Y B Xu Z J Lin and C L Tsou ldquoStructure ofactive site carboxymethylated D-glyceraldehyde-3-phosphatedehydrogenase from Palinurus versicolorrdquo Journal of MolecularBiology vol 287 no 4 pp 719ndash725 1999

[38] L Pollegioni KDiederichs GMolla et al ldquoYeastD-amino acidoxidase structural basis of its catalytic propertiesrdquo Journal ofMolecular Biology vol 324 no 3 pp 535ndash546 2002

[39] E C Settembre P C Dorrestein J H Park A M Augustine TP Begley and S E Ealick ldquoStructural and mechanistic studieson thiO a glycine oxidase essential for thiamin biosynthesis inBacillus subtilisrdquo Biochemistry vol 42 no 10 pp 2971ndash29812003

[40] J Yang A Roy and Y Zhang ldquoBioLiP a semi-manually curateddatabase for biologically relevant ligand-protein interactionsrdquoNucleic Acids Research vol 41 no 1 pp D1096ndashD1103 2013

[41] S Henikoff and J G Henikoff ldquoAmino acid substitution matri-ces from protein blocksrdquo Proceedings of the National Academyof Sciences of the United States of America vol 89 no 22 pp10915ndash10919 1992

[42] W Kabsch and C Sander ldquoDictionary of protein secondarystructure pattern recognition of hydrogen-bonded and geo-metrical featuresrdquo Biopolymers vol 22 no 12 pp 2577ndash26371983

[43] J C Gower and G J S Ross ldquoMinimum spanning trees andsingle-linkage cluster analysisrdquo Journal of the Royal StatisticalSociety vol 18 no 1 article 11 1969

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 2: Research Article Predicting Flavin and Nicotinamide ...downloads.hindawi.com/journals/bmri/2015/402536.pdf · Adenosine-binding of FAD. (e) Phosphate-binding of FAD. (f) Flav in-binding

2 BioMed Research International

[12 13] and artificial neural networks [14] and structure com-parison approaches [15ndash17] These prediction methods canbe divided into two broad categories ones that use protein-sequence information for example amino acid composi-tion position-specific scoring matrix and physicochemicalproperties and ones that use protein-structure informationfor example dihedral angles secondary structure and 3D-structure comparisonThemost effective predictionmethod-ologies however tend to use a combination of sequence andstructure data

The structural genomics initiative resolves 20 new proteinstructures each week and more than 60000 structures havebeen deposited into PDBThe functional surfaces of proteinswhich interact with cofactors tend to be more structurallyconserved than internal structures [18] Residues that forma functional binding region are usually quite close to oneanother when the three-dimensional structure of a proteinis examined In addition binding regions typically constituteonly 10ndash30 of the entire protein [19ndash21] We took advantageof previously generated structural information and usedthe fragment transformation method [22] to identify newbinding sites for the NAD and FAD ligands

2 Results

21 Residues that Bind NAD or FAD To characterize thestructural environment ofNAD-FAD-binding sites we com-pared binding-site residues to whole-protein residues Thethree-dimensional structure of the NADFAD molecule wasdivided into three moieties according to functionWithin thespherical environment of NAD the adenosine-binding sitetypically contained glycine isoleucine tyrosine and asparticacid residues the phosphate-binding site contained glycineisoleucine serine threonine methionine phenylalaninetyrosine tryptophan arginine and histidine residues andthe nicotinamide-binding site contained serine threoninecysteine phenylalanine asparagine tyrosine tryptophanhistidine and asparagine residues For FAD adenosine wasboundby glycine valine cysteine and tryptophan phosphatewas bound by glycine serine and arginine and flavinwas bound by cysteine methionine phenylalanine tyrosinetryptophan and histidine The residue types whose ratio ofbinding-site residues frequency to whole-protein residuesfrequency was greater than 12 were listed above As such thebinding residues were primarily polar residues containingcharged groups amide groups and nucleophilic groups(Figure 1)

We also characterized the types of atoms that werewithin 35 A of the three moieties of each NADFAD ligand(Figure 2) Nicotinamide and flavin moieties were mostcommonly associatedwith nitrogen and oxygen atomswithinthe backbone and side-chains of the protein Phosphatemoieties were commonly bound by backbone and side-chain nitrogen or side-chain oxygen Each ligand moietypreferentially bound certain atoms within certain residues

22 Prediction Performance We chose two criteria to eval-uate the performance of our binding-site predictions per-formance at less than 5 FPR and the Matthews correlation

Table 1 The performance of binding-site predictions at a 5 FPRthreshold

Accuracy () Sensitivity () Specificity () MCCNAD 9346 6709 9508 052FAD 9359 6843 9522 054

Table 2The performance of binding-site predictions at amaximumMCC threshold

Accuracy () Sensitivity () Specificity () MCCNAD 9534 5788 9764 057FAD 9433 6413 9627 055

coefficient (MCC) We used a combination of features thatincluded the number of aligned residues RMSD BLOSUMandDSSPUsing a 5FPR thresholdNAD-binding siteswerepredicted with an accuracy of 9346 a sensitivity of 6709and an MCC of 052 Under these same conditions FAD-binding-site predictions yielded 9359 accuracy 6843sensitivity and an MCC of 054 (Table 1) When MCCswere maximized NAD-binding proteins were identified with9534 accuracy 5788 sensitivity 9764 specificity andan MCC of 057 Under these same conditions FAD-bindingresidues were identified with an accuracy of 9433 asensitivity of 6413 a specificity of 9627 and an MCC of055 (Table 2) These data indicated that our method couldpredict binding residues for these two ligands

23 Comparison with OtherMethods We next compared ourresults with other prediction methodologies For these com-parisons we chose two published methods that use similarcriteria for analyzing these kinds of ligand-protein complexes[10 11] These chosen methods assign binding or nonbindingstatus to each residue within NAD-FAD-binding proteinsBecause these published methods use an equal number ofbinding and nonbinding residues we applied our predictionmethod to a similar dataset to make the results comparableRandom-selection processes were performed five times forall nonbinding residues within ligand-protein complexes togenerate the same scale for binding and nonbinding residueswithin each protein For NAD-binding proteins our methodpredicted binding residues with a sensitivity of 8621 and anMCC of 075 compared with 8613 and 075 for the methoddeveloped by Ansari and Raghava [10] (Table 3) For FAD-binding proteins our method yielded 8568 sensitivity andan MCC of 075 These values compared with the perfor-mance of the publishedmethod (8336 and 066) developedbyMishra and Raghava [11] (Table 4) Ourmethod thereforehas similar performance in NAD-binding sites predicted butbetter in FAD-binding sites However in native proteins thenumber of binding and nonbinding residues should not beequalThe equal numbermodel needs to be further discussed

24 Template Matching Figures 3ndash6 show alignments ofpredicted NAD-FAD-binding proteins and correspondingtemplates Structures within these figures were drawn using

BioMed Research International 3

0005

01015

02025

03035

04Fr

eque

ncy

Type of residue

Adenosine-binding residues Whole-protein residues

GLY

ALA VA

LLE

UIL

ESE

RTH

RCY

SM

ETPR

OPH

ETY

RTR

PLY

SA

RG HIS

ASP

ASN

GLU

GLN

(a)

0005

01015

02025

03035

04

Freq

uenc

y

Type of residue

Phosphate-binding residues Whole-protein residues

GLY

ALA VA

LLE

UIL

ESE

RTH

RCY

SM

ETPR

OPH

ETY

RTR

PLY

SA

RG HIS

ASP

ASN

GLU

GLN

(b)

0005

01015

02025

03035

04

Freq

uenc

y

Type of residue

Nicotinamide-binding residues Whole-protein residues

GLY

ALA VA

LLE

UIL

ESE

RTH

RCY

SM

ETPR

OPH

ETY

RTR

PLY

SA

RG HIS

ASP

ASN

GLU

GLN

(c)

0005

01015

02025

03035

04

Freq

uenc

y

Type of residue

Adenosine-binding residues Whole-protein residues

GLY

ALA VA

LLE

UIL

ESE

RTH

RCY

SM

ETPR

OPH

ETY

RTR

PLY

SA

RG HIS

ASP

ASN

GLU

GLN

(d)

0005

01015

02025

03035

04

Freq

uenc

y

Type of residuePhosphate-binding residues Whole-protein residues

GLY

ALA VA

LLE

UIL

ESE

RTH

RCY

SM

ETPR

OPH

ETY

RTR

PLY

SA

RG HIS

ASP

ASN

GLU

GLN

(e)

0005

01015

02025

03035

04

Freq

uenc

y

Type of residue

Flavin-binding residues Whole-protein residues

GLY

ALA VA

LLE

UIL

ESE

RTH

RCY

SM

ETPR

OPH

ETY

RTR

PLY

SA

RG HIS

ASP

ASN

GLU

GLN

(f)

Figure 1 Amino acid frequencies within NAD-FAD-binding sites Frequencies within NAD-FAD-binding sites (black) are compared withwhole-protein frequencies (white) (a) Adenosine-binding of NAD (b) Phosphate-binding of NAD (c) Nicotinamide-binding of NAD (d)Adenosine-binding of FAD (e) Phosphate-binding of FAD (f) Flavin-binding of FAD The preferred types of amino acids surrounding thedifferent moiety of NADFAD are shown

Table 3 Comparison between the fragment transformation and SVMmethods for predicting NAD-binding-site residues

Accuracy () Sensitivity () Specificity () MCCRandom 1 8746 8645 8848 075Random 2 8723 8579 8867 074Random 3 8738 8565 8911 075Random 4 8746 8691 8801 075Random 5 8738 8625 8851 075Average 8738 8621 8856 075SVM [10] 8725 8613 8837 075

4 BioMed Research International

0

01

02

03

04

05

06

N O SType of atom

Freq

uenc

y

(a)

0

01

02

03

04

05

06

N OType of atom

S

Freq

uenc

y

(b)

0

01

02

03

04

05

06

N OType of atom

S

Freq

uenc

y

(c)

0

01

02

03

04

05

06

N OType of atom

S

Freq

uenc

y

(d)

0

01

02

03

04

05

06

N OType of atom

S

Freq

uenc

y

(e)

0

01

02

03

04

05

06

N OType of atom

S

Freq

uenc

y

(f)

Figure 2 Atom-type frequencies within NAD-FAD-binding sites Frequencies for both backbone (black) and side-chain (white) atoms areshown (a) Adenosine-binding of NAD (b) Phosphate-binding of NAD (c) Nicotinamide-binding of NAD (d) Adenosine-binding of FAD(e) Phosphate-binding of FAD (f) Flavin-binding of FAD The preferred types of atoms surrounding the different moiety of NADFAD areshown

BioMed Research International 5

(a)

(b) (c) (d)

Figure 3 Identification of NAD-binding sites (a) Chain A of D-2-hydroxyisocaproate dehydrogenase (PDB ID1DXY) was the query proteinTemplates were constructed from (b) D-Lactate dehydrogenase (chain A PDB ID3KB6) (c) phosphoglycerate dehydrogenase (chain A PDBID1YBA) and (d) C-terminal-binding proteinbrefeldin A-ADP ribosylated substrate (chain A PDB ID1HKU)

Table 4 Comparison between the fragment transformation and SVMmethods for predicting FAD-binding-site residues

Accuracy () Sensitivity () Specificity () MCCRandom 1 8738 8568 8908 075Random 2 8748 8573 8923 075Random 3 8735 8555 8915 075Random 4 8758 8573 8943 075Random 5 8744 8573 8915 075Average 8745 8568 8921 075SVM [11] 8286 8336 8236 066

PyMOL [23] and color coded light gray for the query proteinblue lines for the ligand hot pink orange and forest sticks foradenosine- phosphate- and nicotinamide-flavin-bindingresidues that are predicted correctly and dark gray sticks fornonbinding residues that are predicted to be binding residuesOur method accurately identified 21 NAD-binding residueswithin chain A of D-2-hydroxyisocaproate dehydrogenase(PDB ID1DXY) [24 25] with ten false positives (Figure 3)Nine nicotinamide-binding residues were identified based onD-Lactate dehydrogenase (chain A PDB ID3KB6) [26 27]

three phosphate-binding residues were identified based onphosphoglycerate dehydrogenase (chain A PDB ID1YBA)[28] five adenosine-binding residues were identified basedon C-terminal-binding proteinbrefeldin A-ADP ribosylatedsubstrate (chain A PDB ID1HKU) [29] and four wereidentified based on other protein templates Ourmethod alsoaccurately predicted 23 NAD-binding residues within chainC of 5-carboxymethyl-2-hydroxymuconate semialdehydedehydrogenase (PDB ID2D4E) with only eight false pos-itives (Figure 4) Nine nicotinamide-binding residues were

6 BioMed Research International

(a)

(b) (c)

Figure 4 Identification of NAD-binding sites (a) Chain C of 5-carboxymethyl-2-hydroxymuconate semialdehyde dehydrogenase (PDBID2D4E) was the query protein Templates were constructed from (b) aldehyde dehydrogenase (chain A PDB ID3B4W) and (c) 1-pyrroline-5-carboxylate dehydrogenase (chain A PDB ID2EHU)

identified based on aldehyde dehydrogenase (chain A PDBID3B4W) three phosphate-binding and eight adenosine-binding residues were identified based on 1-pyrroline-5-carboxylate dehydrogenase (chain A PDB ID2EHU) andthree were identified based on other protein templates

For the FAD-binding proteins our method accuratelypredicted chain A of deoxyribodipyrimidine photolyase(PDB ID1OWL) [30] which contains 24 residues that bindFAD (Figure 5) and only six false positives occurred Threeadenosine-binding residues were identified based on humancryptochrome DASH (chain X PDB ID2IJG) [31 32]six phosphate-binding residues were identified based onphotolyase-like domain of cryptochrome 1 (chain A PDBID1U3C) [33] eleven flavin-binding residues were identifiedbased on photolyase (chain A PDB ID1IQR) [34] and fourwere identified based on other protein templates In addition30 FAD-binding residues were accurately predicted withinchain H of D-amino acid oxidase (PDB ID1DDO) [35]with 14 false positives Five adenosine-binding residues werepredicted based on putidaredoxin reductase (chain B PDB

ID1Q1R) [36 37] three adenosine-binding and nine flavin-binding residues based on D-amino acid oxidase (chain APDB ID1C0I) [38] three phosphate-binding and five flavin-binding residues based on glycine oxidase (chain B PDBID1NG3) [39] and five based on other protein templates(Figure 6)

3 Discussion

Small molecular cofactors (ligands) are essential for cells toperform numerous biological functions NAD and FAD forexample bind to proteins that play critical roles in energytransfer energy storage and signal transduction to name justa few To understand the mechanism by which these ligandsaffect protein function it is important to identify ligand-binding residues within relevant proteins The experimentalidentification of these interacting residues is so difficulthowever that computational methods to accomplish this taskare in high demand

BioMed Research International 7

(a)

(b) (c) (d)

Figure 5 Identification of FAD-binding sites (a) Chain A of deoxyribodipyrimidine photolyase (PDB ID1OWL) was the query proteinTemplates were constructed from (b) human cryptochrome DASH (chain X PDB ID2IJG) (c) photolyase-like domain of cryptochrome 1(chain A PDB ID1U3C) and (d) photolyase (chain A PDB ID1IQR)

Here we developed a structure comparison method thatuses both sequence and structure information to predictNAD-FAD-binding residues within proteins This approachalso provides valuable information concerning the microen-vironment of the protein-ligand interactionThe compositionof NAD-FAD-binding residues that we identified here isgenerally similar to previous studies [10 11] Interestinglyglycine was the most frequent binding residue bindingto NAD through phosphate or adenosine moieties moreoften than through the nicotinamide moiety In contrastarginine preferentially interacted with phosphate moieties

and aspartic acid preferentially interacted with adenosinemoieties of NAD whereas threonine cysteine and histidinebound to nicotinamide The most common residue withinFAD-binding sites was also glycine which preferentiallybound phosphate and adenosine moieties Serine interactedwith phosphate moieties whereas cysteine tyrosine andtryptophan primarily bound to nicotinamide By takingadvantage of this kind of structural information detailsconcerning these critical binding sites may be revealed Toinvestigate the influence of amino acids on prediction per-formance the sensitivity and specificity associated with each

8 BioMed Research International

(a)

(b) (c) (d)

Figure 6 Identification of FAD-binding sites (a) Chain H of D-amino acid oxidase (PDB ID1DDO) was the query protein Templates wereconstructed from (b) putidaredoxin reductase (chain B PDB ID1Q1R) (c) D-amino acid oxidase (chain A PDB ID1C0I) and (d) glycineoxidase (chain B PDB ID1NG3)

0

02

04

06

08

1

Perfo

rman

ce

Type of residue

SensitivitySpecificity

GLY

ALA VA

LLE

UIL

ESE

RTH

RCY

SM

ETPR

OPH

ETY

RTR

PLY

SA

RG HIS

ASP

ASN

GLU

GLN

(a)

0

02

04

06

08

1

Perfo

rman

ce

Type of residue

SensitivitySpecificity

GLY

ALA VA

LLE

UIL

ESE

RTH

RCY

SM

ETPR

OPH

ETY

RTR

PLY

SA

RG HIS

ASP

ASN

GLU

GLN

(b)

Figure 7 Sensitivity and specificity associated with each amino acid in NAD-FAD-binding-site predictions (a) NAD (b) FAD

residue were calculated (Figure 7) For NAD-binding-sitepredictions specificity for each residue was excellent (0927ndash0966) but sensitivity was relatively low for phenylalaninetryptophan arginine and glutaminewhichwere less than 05For FAD-binding sites all residues achieved high specificity(0933ndash0971) and sensitivity (0532ndash0791) It should be noted

that the ratio of NAD-FAD-binding residues to nonbindingresidues is about 1 to 16 in our dataset This large differencemight cause lots of false positives when predicted That is thereason for high specificity and accuracy but low sensitivity inour prediction results Hence the positions of false positiveresidues in sequence were also investigated 20 and 25 of

BioMed Research International 9

false positive residues of NAD- and FAD-binding predictionoccurred next to the true positive residues in sequence Itwas shown that these residues are also located near theligand in the coordinate space If these residues were treatedas true positive residues our prediction results of NAD-binding yielded 7155 sensitivity and 061 MCC at a 5FPR threshold Under the same conditions FAD-binding-site predictions yielded 7334 sensitivity and an MCC of064 Compared with other prediction methods ours did notuse protein evolutionary information but only used proteinstructure and did not need to use equal number dataset fortraining but predicted whole-proteins through comparingstructures of template database Our results yielded excellentprediction performancewhen analyzingNAD-FAD-bindingresidues and thus provide important details concerning thebinding-site microenvironment This approach thereforemay be used to predict putative NAD-FAD-binding proteinsand the specific residues involved in the interaction

4 Methods

41 Overview We extracted structures of proteins boundto NAD or FAD from PDB and constructed a database ofNAD-FAD-binding residue templates Residues that weredefined as binding residues by the ligand-binding databaseBioLiP [40] were included in the template Query proteinstructures were then compared with each template in thedatabase using a ldquoleave-one-outrdquo comparison method Thefragment transformation method [22] was used to alignquery and template structures After comparing the localprotein structure each residue was assigned a score basedon both protein sequence and structure Sequence similaritywas calculated using the BLOSUM62 substitutionmatrix [41]whereas structural similarity was calculated by measuringthe root mean square deviation (RMSD) of the C120572 carbonsfrom local structure alignments and using a secondarystructure substitutionmatrix [22] according to theDictionaryof Secondary Structure of Proteinsrsquo (DSSP) definition ofsecondary structure [42] Residues with an alignment scorethat exceeded a predetermined threshold were predicted tobind NADFADThis method is illustrated in Figure 8

42 NAD-FAD-Binding Proteins and Binding Residue Tem-plates We adopted the same datasets with previous research[10 11] All protein complexes were collected from PDBand had pairwise sequence identity lt40 by using CD-HITProteins chains that are not involved in NADFAD bindingwere excluded Residues that were defined as binding ornonbinding residues by using the ligand-binding databaseBioLiP The main dataset included 184 and 165 polypeptidechains for NAD and FAD respectively Because NAD iscomposed of a nicotinamide moiety an adenosine moietyand a phosphate moiety binding residues were dividedinto three groups nicotinamide binding adenosine bindingand phosphate binding FAD-binding sitessimilarly contain

flavin-binding residues adenosine-binding residues andphosphate-binding residues Groups of residues that con-tained more than or equal to two binding residues wereconsidered a binding residue template (see Figures 9 and 10)

43The Fragment TransformationMethod Weused the frag-ment transformation method to align NAD-FAD-bindingresidues Each residue was treated as an individual unit andwas used to align the query protein 119878 with the bindingtemplate 119879 The structural unit consists of a triplet formedby the NndashC

120572ndashC atoms within a given residue 119878 denotes the

query protein of length 119898 and 119879 denotes the template of 119899residues The query protein 119878 of length119898 and the template 119879of 119899 residues can therefore be expressed in terms of tripletsas 119878 = 120590

1 120590

2 120590

119898 and 119879 = 120591

1 120591

2 120591

119899 where 120590

119894=

(119901

119873 119901

119862120572 119901

119862) 120591119895= (119902

119873 119902

119862120572 119902

119862) and 119901 and 119902 are PDB

coordinates for each atomA matrix of dimensions 119898 times 119899 was then constructed for

the residues of 119878 and 119879 as

119872 =

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

119872

11119872

12sdot sdot sdot 119872

1119899

119872

21119872

22sdot sdot sdot 119872

2119899

sdot sdot sdot sdot sdot sdot sdot sdot sdot sdot sdot sdot

119872

1198981119872

1198982sdot sdot sdot 119872

119898119899

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

(1)

where the element119872119894119895is a rigid-body transformation matrix

that transforms the triplet 120590119894to 120591119895(ie119872

119894119895120590

119894= 120591

119895)

44 Performing Triplet Clustering 119863119894119895119896119897is the Cartesian dis-

tance between the target 120591119897and the transformed triplet119872

119894119895120590

119896

providing a measure of how similarly the triplet pairs (120590119894 120591

119895)

and (120590119896 120591

119897) are oriented This allows clustering of triplet

fragments using the single-linkage algorithm [43] as followsIf for two triplet pairs (120590

119894 120591

119895) and (120590

119896 120591

119897) 119863119894119895119896119897lt 119863

0 119894 = 119896

and 119895 = 119897 then the triplets are clustered Let 1198661and 119866

2

be two clusters with the first containing (120590119894 120591

119895) and (120590

119896 120591

119897)

and the second containing (1205901198941015840 1205911198951015840) and (120590

1198961015840 1205911198971015840) If 119863119894119895

11989610158401198971015840lt

119863

0 then 119866

1and 119866

2are merged to form a new cluster 119866

3

where 1198663= 119866

1cup 119866

2 These procedures are performed

iteratively until no new clusters can be formed For each finalcluster 119866

120583 we can obtain the transformation matrix119872120583

119896119897and

aligned substructure pair 119878120583= ⋃

120590119896isin119866120583

120590

119896and 119879

120583= ⋃

120591119897isin119866120583

120591

119897

where 119866120583

has the minimum Cartesian distance whenusing119872120583

119896119897

45 Scoring Function For each residue 119894 the binding score119862119894

is defined as

119862

119894= MAX120590119894isin119866120583

(120576

120583times 119862

119877

120583times 119862

119861

120583times 119862

119863

120583) (2)

10 BioMed Research International

LYSLYS

GLY

ILE

GLU

GLU

GLY

LEU

SER

TRP

ASN

GLN

Ligand-binding residue template database

0123456

96 146

Query protein

The fragment transformation method

Scoring function

FAD

NAD

minus1

ALA

THR

SER

SER PHE

LYS

SER

SER

MET ARG

VAL

Figure 8 Schematic of the method for predicting NAD-FAD-binding sites

where 120576120583is the number of triplets of 119878

120583(ie the aligned

residues of the query structure)The alignment scores119862119877120583119862119861120583

and 119862119863120583are defined as

119862

119877

120583=

1

1 + RMSD (119878120583 119879

120583)

119862

119861

120583=

BLOSUM (119878120583 119879

120583)

BLOSUM (119879120583 119879

120583)

+ 1

119862

119863

120583=

DSSP (119878120583 119879

120583)

DSSP (119879120583 119879

120583)

+ 1

(3)

where RMSD (119878120583 119879

120583) is the RMSD of all 119862

120572atoms between

119878

120583and 119879

120583 BLOSUM (119878

120583 119879

120583) is the sequence alignment

score between 119878120583and 119879

120583calculated using the BLOSUM62

[41] substitution matrix BLOSUM (119879120583 119879

120583) is the maximum

sequence alignment score of 119879120583 DSSP (119878

120583 119879

120583) represents the

secondary structure alignment score based on a constructionsubstitution matrix [22] using the definition of DSSP [42]between 119878

120583and 119879

120583 and DSSP (119879

120583 119879

120583) is the maximum

secondary structure alignment score of 119879120583 The value of

RMSD (119878120583 119879

120583) should be lt3 A

For each residue 119894 we predict a geometric centerΘ120596119894of the

ligand by Θ120596119894= 119872

120583

119896119897

minus1

119871

120596 where 119871

120596is the geometric center

of the binding template type 120596 in template 119879 120596 representsthe three moieties of NADFAD nicotinamide adenosine

BioMed Research International 11

LYSLYS

GLY

ILEGLU

GLU GLY

LEUSER

TRP

ASN

GLN

(a)

LYS

GLYGLU

GLN

LEU

ASN

Nicotinamide-binding templates

(b)

SER

TRPPhosphate-binding templates

(c)

LYS

GLY

GLU

ILE

Adenosine-binding templates

(d)

Figure 9NAD-binding residue templates (a)The entireNAD-binding template (b)Nicotinamide-binding templates (c) Phosphate-bindingtemplates (d) Adenosine-binding templates

LYS

ARG

MET

ALA

THRSER

SER

SER

SER

PHE

ASN

VAL

(a)

Flavin-binding templates

SER

SER

SER

VAL ARG

THRASN

(b)

Phosphate-binding templates

SER

MET

LYS

(c)

Adenosine-binding templates

SER

ALAPHE

(d)

Figure 10 FAD-binding residue templates (a) The entire FAD-binding template (b) Flavin-binding templates (c) Phosphate-bindingtemplates (d) Adenosine-binding templates

12 BioMed Research International

and phosphate for NAD flavin adenosine and phosphatefor FAD The binding score 119862

119896is added to 119862

119894if the distance

betweenΘ120596119894andΘ120596

1015840

119896is between 3 and 9 A and120596 = 1205961015840 Finally

the normalized binding score 119885119862119894is calculated as

119885

119862

119894=

119862

119894minus 119862

119878119863

119862

(4)

where 119862 and 119878119863119862denote the mean and standard deviation

respectively of the binding score 119862119894

46 Performance Assessment The accuracy of predictingNAD-FAD-binding sites wasdefined as the number of truepositives and true negatives and was evaluated using a leave-one-out approach Accuracy (ACC) the true positive rate(TPR) and the false positive rate (FPR) were calculated usingtrue positive (TP) true negative (TN) false positive (FP) andfalse negative (FN) values as follows

ACC = TP + TNTP + TN + FP + FN

TPR = Sensitivity = TPTP + FN

FPR = 1 minus Specificity = FPFP + TN

(5)

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Authorsrsquo Contribution

Chih-Hao Lu and Chin-Sheng Yu developed and imple-mented the methods Chih-Hao Lu Chin-Sheng Yu Yu-Feng Lin and Jin-Yi Chen carried out the analysis Chih-Hao Lu and Yu-Feng Lin drafted the paper Chih-Hao Lusupervised the work All the authors have read and approvedthe content of the final paper Chih-Hao Lu and Chin-ShengYu contributed equally to this work

Acknowledgment

Thisworkwas supported byGrants from theNational ScienceCouncil (NSC 99-2113-M-039-002-MY2) and China MedicalUniversity (CMU99-N2-02) Taiwan to Chih-Hao Lu Thefunders had no role in study design data collection andanalysis decision to publish or preparation of the paper Theauthors are grateful to Yeong-Shin Lin (National Chiao TungUniversity Taiwan) for invaluable comments

References

[1] H M Berman J Westbrook Z Feng et al ldquoThe protein databankrdquo Nucleic Acids Research vol 28 no 1 pp 235ndash242 2000

[2] A Wilkinson J Day and R Bowater ldquoBacterial DNA ligasesrdquoMolecular Microbiology vol 40 no 6 pp 1241ndash1248 2001

[3] A Burkle ldquoPhysiology and pathophysiology of poly(ADP-ribosyl)ationrdquo BioEssays vol 23 no 9 pp 795ndash806 2001

[4] Q Zhang D W Piston and R H Goodman ldquoRegulation ofcorepressor function by nuclear NADHrdquo Science vol 295 no5561 pp 1895ndash1897 2002

[5] J S Smith and J D Boeke ldquoAn unusual form of transcriptionalsilencing in yeast ribosomalDNArdquoGenes andDevelopment vol11 no 2 pp 241ndash254 1997

[6] R M Anderson K J Bitterman J G Wood et al ldquoManipula-tion of a nuclear NAD+ salvage pathway delays aging withoutaltering steady-state NAD+ levelsrdquo The Journal of BiologicalChemistry vol 277 no 21 pp 18881ndash18890 2002

[7] J Rutter M Reick L C Wu and S L McKnight ldquoRegulationof crock and NPAS2 DNA binding by the redox state of NADcofactorsrdquo Science vol 293 no 5529 pp 510ndash514 2001

[8] K ChenM JMizianty and L Kurgan ldquoPrediction and analysisof nucleotide-binding residues using sequence and sequence-derived structural descriptorsrdquo Bioinformatics vol 28 no 3 pp331ndash341 2012

[9] M Saito M Go and T Shirai ldquoAn empirical approach fordetecting nucleotide-binding sites on proteinsrdquo Protein Engi-neering Design and Selection vol 19 no 2 pp 67ndash75 2006

[10] H R Ansari and G P S Raghava ldquoIdentification of NADinteracting residues in proteinsrdquo BMC Bioinformatics vol 11article 160 2010

[11] N K Mishra and G P S Raghava ldquoPrediction of FADinteracting residues in a protein from its primary sequenceusing evolutionary informationrdquo BMC Bioinformatics vol 11article S48 no 1 2010

[12] Z-P Liu L-Y Wu Y Wang X-S Zhang and L ChenldquoPrediction of protein-RNA binding sites by a random forestmethod with combined featuresrdquo Bioinformatics vol 26 no 13pp 1616ndash1622 2010

[13] L Wang Z P Liu X S Zhang and L Chen ldquoPrediction of hotspots in protein interfaces using a random forest model withhybrid featuresrdquo Protein Engineering Design and Selection vol25 no 3 pp 119ndash126 2012

[14] J S Chauhan N KMishra andG P S Raghava ldquoPrediction ofGTP interacting residues dipeptides and tripeptides in a pro-tein from its evolutionary informationrdquo BMC Bioinformaticsvol 11 article 301 2010

[15] A Roy and Y Zhang ldquoRecognizing protein-ligand binding sitesby global structural alignment and local geometry refinementrdquoStructure vol 20 no 6 pp 987ndash997 2012

[16] L Xie and P E Bourne ldquoDetecting evolutionary relationshipsacross existing fold space using sequence order-independentprofile-profile alignmentsrdquo Proceedings of the National Academyof Sciences of the United States of America vol 105 no 14 pp5441ndash5446 2008

[17] J Yang A Roy and Y Zhang ldquoProtein-ligand binding siterecognition using complementary binding-specific substruc-ture comparison and sequence profile alignmentrdquo Bioinformat-ics vol 29 no 20 pp 2588ndash2595 2013

[18] Y T Yan and W-H Li ldquoIdentification of protein functionalsurfaces by the concept of a split pocketrdquo Proteins StructureFunction and Bioinformatics vol 76 no 4 pp 959ndash976 2009

[19] K A Dill ldquoDominant forces in protein foldingrdquo Biochemistryvol 29 no 31 pp 7133ndash7155 1990

[20] S Govindarajan and R A Goldstein ldquoEvolution of modelproteins on a foldability landscaperdquo Proteins vol 29 no 4 pp461ndash466 1997

BioMed Research International 13

[21] G Parisi and J Echave ldquoStructural constraints and emergence ofsequence patterns in protein evolutionrdquo Molecular Biology andEvolution vol 18 no 5 pp 750ndash756 2001

[22] C H Lu Y S Lin Y C Chen C S Yu S Y Chang andJ K Hwang ldquoThe fragment transformation method to detectthe protein structural motifsrdquo Proteins Structure Function andGenetics vol 63 no 3 pp 636ndash643 2006

[23] L Schrodinger ThePyMOLMolecular Graphics System Version13r1 2010

[24] U Dengler K Niefind M Kieszlig and D Schomburg ldquoCrystalstructure of a ternary complex of D-2-hydroxy-isocaproatedehydrogenase from Lactobacillus casei NAD+ and 2-oxoisocaproate at 19 A resolutionrdquo Journal of MolecularBiology vol 267 no 3 pp 640ndash660 1997

[25] E Gross C S Sevier A Vala C A Kaiser and D Fass ldquoA newFAD-binding fold and intersubunit disulfide shuttle in the thioloxidase Erv2prdquoNature Structural Biology vol 9 no 1 pp 61ndash672002

[26] S V Antonyuk R W Strange M J Ellis et al ldquoStructure ofd-lactate dehydrogenase from Aquifex aeolicus complexed withNAD+ and lactic acid (or pyruvate)rdquo Acta Crystallographica FStructural Biology and Crystallization Communications vol 65part 12 pp 1209ndash1213 2009

[27] C K Wu T A Dailey H A Dailey B C Wang and J PRose ldquoThe crystal structure of augmenter of liver regenerationa mammalian FAD-dependent sulfhydryl oxidaserdquo ProteinScience vol 12 no 5 pp 1109ndash1118 2003

[28] J R Thompson J K Bell J Bratt G A Grant and LJ Banaszak ldquoVmax regulation through domain and subunitchanges The active form of phosphoglycerate dehydrogenaserdquoBiochemistry vol 44 no 15 pp 5763ndash5773 2005

[29] M Nardini S Spano C Cericola et al ldquoCtBPBARS a dual-function protein involved in transcription co-repression andGolgi membrane fissionrdquo EMBO Journal vol 22 no 12 pp3122ndash3130 2003

[30] R Kort H Komori S I Adachi K Miki and A Eker ldquoDNAapophotolyase from Anacystis nidulans 18 A structure 8-HDF reconstitution and X-ray-induced FAD reductionrdquo ActaCrystallographica Section D Biological Crystallography vol 60no 7 pp 1205ndash1213 2004

[31] Y Huang R Baxter B S Smith C L Partch C L Colbert andJ Deisenhofer ldquoCrystal structure of cryptochrome 3 from Ara-bidopsis thaliana and its implications for photolyase activityrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 103 no 47 pp 17701ndash17706 2006

[32] J B Thoden T M Wohlers J L Fridovich-Keil and H MHolden ldquoMolecular basis for severe epimerase deficiency galac-tosemia X-ray structure of the humanV94M-substitutedUDP-galactose 4-epimeraserdquoThe Journal of Biological Chemistry vol276 no 23 pp 20617ndash20623 2001

[33] C A Brautigam B S Smith Z Ma et al ldquoStructure of thephotolyase-like domain of cryptochrome 1 from Arabidopsisthalianardquo Proceedings of the National Academy of Sciences of theUnited States of America vol 101 no 33 pp 12142ndash12147 2004

[34] H Komori R Masui S Kuramitsu et al ldquoCrystal structure ofthermostable DNA photolyase pyrimidine-dimer recognitionmechanismrdquo Proceedings of the National Academy of Sciences ofthe United States of America vol 98 no 24 pp 13560ndash135652001

[35] F Todone M A Vanoni A Mozzarelli et al ldquoActive siteplasticity in D-amino acid oxidase a crystallographic analysisrdquoBiochemistry vol 36 no 19 pp 5853ndash5860 1997

[36] I F Sevrioukova H Li and T L Poulos ldquoCrystal structure ofputidaredoxin reductase from Pseudomonas putida the finalstructural component of the cytochromeP450cammonooxyge-naserdquo Journal of Molecular Biology vol 336 no 4 pp 889ndash9022004

[37] S Y Song Y B Xu Z J Lin and C L Tsou ldquoStructure ofactive site carboxymethylated D-glyceraldehyde-3-phosphatedehydrogenase from Palinurus versicolorrdquo Journal of MolecularBiology vol 287 no 4 pp 719ndash725 1999

[38] L Pollegioni KDiederichs GMolla et al ldquoYeastD-amino acidoxidase structural basis of its catalytic propertiesrdquo Journal ofMolecular Biology vol 324 no 3 pp 535ndash546 2002

[39] E C Settembre P C Dorrestein J H Park A M Augustine TP Begley and S E Ealick ldquoStructural and mechanistic studieson thiO a glycine oxidase essential for thiamin biosynthesis inBacillus subtilisrdquo Biochemistry vol 42 no 10 pp 2971ndash29812003

[40] J Yang A Roy and Y Zhang ldquoBioLiP a semi-manually curateddatabase for biologically relevant ligand-protein interactionsrdquoNucleic Acids Research vol 41 no 1 pp D1096ndashD1103 2013

[41] S Henikoff and J G Henikoff ldquoAmino acid substitution matri-ces from protein blocksrdquo Proceedings of the National Academyof Sciences of the United States of America vol 89 no 22 pp10915ndash10919 1992

[42] W Kabsch and C Sander ldquoDictionary of protein secondarystructure pattern recognition of hydrogen-bonded and geo-metrical featuresrdquo Biopolymers vol 22 no 12 pp 2577ndash26371983

[43] J C Gower and G J S Ross ldquoMinimum spanning trees andsingle-linkage cluster analysisrdquo Journal of the Royal StatisticalSociety vol 18 no 1 article 11 1969

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 3: Research Article Predicting Flavin and Nicotinamide ...downloads.hindawi.com/journals/bmri/2015/402536.pdf · Adenosine-binding of FAD. (e) Phosphate-binding of FAD. (f) Flav in-binding

BioMed Research International 3

0005

01015

02025

03035

04Fr

eque

ncy

Type of residue

Adenosine-binding residues Whole-protein residues

GLY

ALA VA

LLE

UIL

ESE

RTH

RCY

SM

ETPR

OPH

ETY

RTR

PLY

SA

RG HIS

ASP

ASN

GLU

GLN

(a)

0005

01015

02025

03035

04

Freq

uenc

y

Type of residue

Phosphate-binding residues Whole-protein residues

GLY

ALA VA

LLE

UIL

ESE

RTH

RCY

SM

ETPR

OPH

ETY

RTR

PLY

SA

RG HIS

ASP

ASN

GLU

GLN

(b)

0005

01015

02025

03035

04

Freq

uenc

y

Type of residue

Nicotinamide-binding residues Whole-protein residues

GLY

ALA VA

LLE

UIL

ESE

RTH

RCY

SM

ETPR

OPH

ETY

RTR

PLY

SA

RG HIS

ASP

ASN

GLU

GLN

(c)

0005

01015

02025

03035

04

Freq

uenc

y

Type of residue

Adenosine-binding residues Whole-protein residues

GLY

ALA VA

LLE

UIL

ESE

RTH

RCY

SM

ETPR

OPH

ETY

RTR

PLY

SA

RG HIS

ASP

ASN

GLU

GLN

(d)

0005

01015

02025

03035

04

Freq

uenc

y

Type of residuePhosphate-binding residues Whole-protein residues

GLY

ALA VA

LLE

UIL

ESE

RTH

RCY

SM

ETPR

OPH

ETY

RTR

PLY

SA

RG HIS

ASP

ASN

GLU

GLN

(e)

0005

01015

02025

03035

04

Freq

uenc

y

Type of residue

Flavin-binding residues Whole-protein residues

GLY

ALA VA

LLE

UIL

ESE

RTH

RCY

SM

ETPR

OPH

ETY

RTR

PLY

SA

RG HIS

ASP

ASN

GLU

GLN

(f)

Figure 1 Amino acid frequencies within NAD-FAD-binding sites Frequencies within NAD-FAD-binding sites (black) are compared withwhole-protein frequencies (white) (a) Adenosine-binding of NAD (b) Phosphate-binding of NAD (c) Nicotinamide-binding of NAD (d)Adenosine-binding of FAD (e) Phosphate-binding of FAD (f) Flavin-binding of FAD The preferred types of amino acids surrounding thedifferent moiety of NADFAD are shown

Table 3 Comparison between the fragment transformation and SVMmethods for predicting NAD-binding-site residues

Accuracy () Sensitivity () Specificity () MCCRandom 1 8746 8645 8848 075Random 2 8723 8579 8867 074Random 3 8738 8565 8911 075Random 4 8746 8691 8801 075Random 5 8738 8625 8851 075Average 8738 8621 8856 075SVM [10] 8725 8613 8837 075

4 BioMed Research International

0

01

02

03

04

05

06

N O SType of atom

Freq

uenc

y

(a)

0

01

02

03

04

05

06

N OType of atom

S

Freq

uenc

y

(b)

0

01

02

03

04

05

06

N OType of atom

S

Freq

uenc

y

(c)

0

01

02

03

04

05

06

N OType of atom

S

Freq

uenc

y

(d)

0

01

02

03

04

05

06

N OType of atom

S

Freq

uenc

y

(e)

0

01

02

03

04

05

06

N OType of atom

S

Freq

uenc

y

(f)

Figure 2 Atom-type frequencies within NAD-FAD-binding sites Frequencies for both backbone (black) and side-chain (white) atoms areshown (a) Adenosine-binding of NAD (b) Phosphate-binding of NAD (c) Nicotinamide-binding of NAD (d) Adenosine-binding of FAD(e) Phosphate-binding of FAD (f) Flavin-binding of FAD The preferred types of atoms surrounding the different moiety of NADFAD areshown

BioMed Research International 5

(a)

(b) (c) (d)

Figure 3 Identification of NAD-binding sites (a) Chain A of D-2-hydroxyisocaproate dehydrogenase (PDB ID1DXY) was the query proteinTemplates were constructed from (b) D-Lactate dehydrogenase (chain A PDB ID3KB6) (c) phosphoglycerate dehydrogenase (chain A PDBID1YBA) and (d) C-terminal-binding proteinbrefeldin A-ADP ribosylated substrate (chain A PDB ID1HKU)

Table 4 Comparison between the fragment transformation and SVMmethods for predicting FAD-binding-site residues

Accuracy () Sensitivity () Specificity () MCCRandom 1 8738 8568 8908 075Random 2 8748 8573 8923 075Random 3 8735 8555 8915 075Random 4 8758 8573 8943 075Random 5 8744 8573 8915 075Average 8745 8568 8921 075SVM [11] 8286 8336 8236 066

PyMOL [23] and color coded light gray for the query proteinblue lines for the ligand hot pink orange and forest sticks foradenosine- phosphate- and nicotinamide-flavin-bindingresidues that are predicted correctly and dark gray sticks fornonbinding residues that are predicted to be binding residuesOur method accurately identified 21 NAD-binding residueswithin chain A of D-2-hydroxyisocaproate dehydrogenase(PDB ID1DXY) [24 25] with ten false positives (Figure 3)Nine nicotinamide-binding residues were identified based onD-Lactate dehydrogenase (chain A PDB ID3KB6) [26 27]

three phosphate-binding residues were identified based onphosphoglycerate dehydrogenase (chain A PDB ID1YBA)[28] five adenosine-binding residues were identified basedon C-terminal-binding proteinbrefeldin A-ADP ribosylatedsubstrate (chain A PDB ID1HKU) [29] and four wereidentified based on other protein templates Ourmethod alsoaccurately predicted 23 NAD-binding residues within chainC of 5-carboxymethyl-2-hydroxymuconate semialdehydedehydrogenase (PDB ID2D4E) with only eight false pos-itives (Figure 4) Nine nicotinamide-binding residues were

6 BioMed Research International

(a)

(b) (c)

Figure 4 Identification of NAD-binding sites (a) Chain C of 5-carboxymethyl-2-hydroxymuconate semialdehyde dehydrogenase (PDBID2D4E) was the query protein Templates were constructed from (b) aldehyde dehydrogenase (chain A PDB ID3B4W) and (c) 1-pyrroline-5-carboxylate dehydrogenase (chain A PDB ID2EHU)

identified based on aldehyde dehydrogenase (chain A PDBID3B4W) three phosphate-binding and eight adenosine-binding residues were identified based on 1-pyrroline-5-carboxylate dehydrogenase (chain A PDB ID2EHU) andthree were identified based on other protein templates

For the FAD-binding proteins our method accuratelypredicted chain A of deoxyribodipyrimidine photolyase(PDB ID1OWL) [30] which contains 24 residues that bindFAD (Figure 5) and only six false positives occurred Threeadenosine-binding residues were identified based on humancryptochrome DASH (chain X PDB ID2IJG) [31 32]six phosphate-binding residues were identified based onphotolyase-like domain of cryptochrome 1 (chain A PDBID1U3C) [33] eleven flavin-binding residues were identifiedbased on photolyase (chain A PDB ID1IQR) [34] and fourwere identified based on other protein templates In addition30 FAD-binding residues were accurately predicted withinchain H of D-amino acid oxidase (PDB ID1DDO) [35]with 14 false positives Five adenosine-binding residues werepredicted based on putidaredoxin reductase (chain B PDB

ID1Q1R) [36 37] three adenosine-binding and nine flavin-binding residues based on D-amino acid oxidase (chain APDB ID1C0I) [38] three phosphate-binding and five flavin-binding residues based on glycine oxidase (chain B PDBID1NG3) [39] and five based on other protein templates(Figure 6)

3 Discussion

Small molecular cofactors (ligands) are essential for cells toperform numerous biological functions NAD and FAD forexample bind to proteins that play critical roles in energytransfer energy storage and signal transduction to name justa few To understand the mechanism by which these ligandsaffect protein function it is important to identify ligand-binding residues within relevant proteins The experimentalidentification of these interacting residues is so difficulthowever that computational methods to accomplish this taskare in high demand

BioMed Research International 7

(a)

(b) (c) (d)

Figure 5 Identification of FAD-binding sites (a) Chain A of deoxyribodipyrimidine photolyase (PDB ID1OWL) was the query proteinTemplates were constructed from (b) human cryptochrome DASH (chain X PDB ID2IJG) (c) photolyase-like domain of cryptochrome 1(chain A PDB ID1U3C) and (d) photolyase (chain A PDB ID1IQR)

Here we developed a structure comparison method thatuses both sequence and structure information to predictNAD-FAD-binding residues within proteins This approachalso provides valuable information concerning the microen-vironment of the protein-ligand interactionThe compositionof NAD-FAD-binding residues that we identified here isgenerally similar to previous studies [10 11] Interestinglyglycine was the most frequent binding residue bindingto NAD through phosphate or adenosine moieties moreoften than through the nicotinamide moiety In contrastarginine preferentially interacted with phosphate moieties

and aspartic acid preferentially interacted with adenosinemoieties of NAD whereas threonine cysteine and histidinebound to nicotinamide The most common residue withinFAD-binding sites was also glycine which preferentiallybound phosphate and adenosine moieties Serine interactedwith phosphate moieties whereas cysteine tyrosine andtryptophan primarily bound to nicotinamide By takingadvantage of this kind of structural information detailsconcerning these critical binding sites may be revealed Toinvestigate the influence of amino acids on prediction per-formance the sensitivity and specificity associated with each

8 BioMed Research International

(a)

(b) (c) (d)

Figure 6 Identification of FAD-binding sites (a) Chain H of D-amino acid oxidase (PDB ID1DDO) was the query protein Templates wereconstructed from (b) putidaredoxin reductase (chain B PDB ID1Q1R) (c) D-amino acid oxidase (chain A PDB ID1C0I) and (d) glycineoxidase (chain B PDB ID1NG3)

0

02

04

06

08

1

Perfo

rman

ce

Type of residue

SensitivitySpecificity

GLY

ALA VA

LLE

UIL

ESE

RTH

RCY

SM

ETPR

OPH

ETY

RTR

PLY

SA

RG HIS

ASP

ASN

GLU

GLN

(a)

0

02

04

06

08

1

Perfo

rman

ce

Type of residue

SensitivitySpecificity

GLY

ALA VA

LLE

UIL

ESE

RTH

RCY

SM

ETPR

OPH

ETY

RTR

PLY

SA

RG HIS

ASP

ASN

GLU

GLN

(b)

Figure 7 Sensitivity and specificity associated with each amino acid in NAD-FAD-binding-site predictions (a) NAD (b) FAD

residue were calculated (Figure 7) For NAD-binding-sitepredictions specificity for each residue was excellent (0927ndash0966) but sensitivity was relatively low for phenylalaninetryptophan arginine and glutaminewhichwere less than 05For FAD-binding sites all residues achieved high specificity(0933ndash0971) and sensitivity (0532ndash0791) It should be noted

that the ratio of NAD-FAD-binding residues to nonbindingresidues is about 1 to 16 in our dataset This large differencemight cause lots of false positives when predicted That is thereason for high specificity and accuracy but low sensitivity inour prediction results Hence the positions of false positiveresidues in sequence were also investigated 20 and 25 of

BioMed Research International 9

false positive residues of NAD- and FAD-binding predictionoccurred next to the true positive residues in sequence Itwas shown that these residues are also located near theligand in the coordinate space If these residues were treatedas true positive residues our prediction results of NAD-binding yielded 7155 sensitivity and 061 MCC at a 5FPR threshold Under the same conditions FAD-binding-site predictions yielded 7334 sensitivity and an MCC of064 Compared with other prediction methods ours did notuse protein evolutionary information but only used proteinstructure and did not need to use equal number dataset fortraining but predicted whole-proteins through comparingstructures of template database Our results yielded excellentprediction performancewhen analyzingNAD-FAD-bindingresidues and thus provide important details concerning thebinding-site microenvironment This approach thereforemay be used to predict putative NAD-FAD-binding proteinsand the specific residues involved in the interaction

4 Methods

41 Overview We extracted structures of proteins boundto NAD or FAD from PDB and constructed a database ofNAD-FAD-binding residue templates Residues that weredefined as binding residues by the ligand-binding databaseBioLiP [40] were included in the template Query proteinstructures were then compared with each template in thedatabase using a ldquoleave-one-outrdquo comparison method Thefragment transformation method [22] was used to alignquery and template structures After comparing the localprotein structure each residue was assigned a score basedon both protein sequence and structure Sequence similaritywas calculated using the BLOSUM62 substitutionmatrix [41]whereas structural similarity was calculated by measuringthe root mean square deviation (RMSD) of the C120572 carbonsfrom local structure alignments and using a secondarystructure substitutionmatrix [22] according to theDictionaryof Secondary Structure of Proteinsrsquo (DSSP) definition ofsecondary structure [42] Residues with an alignment scorethat exceeded a predetermined threshold were predicted tobind NADFADThis method is illustrated in Figure 8

42 NAD-FAD-Binding Proteins and Binding Residue Tem-plates We adopted the same datasets with previous research[10 11] All protein complexes were collected from PDBand had pairwise sequence identity lt40 by using CD-HITProteins chains that are not involved in NADFAD bindingwere excluded Residues that were defined as binding ornonbinding residues by using the ligand-binding databaseBioLiP The main dataset included 184 and 165 polypeptidechains for NAD and FAD respectively Because NAD iscomposed of a nicotinamide moiety an adenosine moietyand a phosphate moiety binding residues were dividedinto three groups nicotinamide binding adenosine bindingand phosphate binding FAD-binding sitessimilarly contain

flavin-binding residues adenosine-binding residues andphosphate-binding residues Groups of residues that con-tained more than or equal to two binding residues wereconsidered a binding residue template (see Figures 9 and 10)

43The Fragment TransformationMethod Weused the frag-ment transformation method to align NAD-FAD-bindingresidues Each residue was treated as an individual unit andwas used to align the query protein 119878 with the bindingtemplate 119879 The structural unit consists of a triplet formedby the NndashC

120572ndashC atoms within a given residue 119878 denotes the

query protein of length 119898 and 119879 denotes the template of 119899residues The query protein 119878 of length119898 and the template 119879of 119899 residues can therefore be expressed in terms of tripletsas 119878 = 120590

1 120590

2 120590

119898 and 119879 = 120591

1 120591

2 120591

119899 where 120590

119894=

(119901

119873 119901

119862120572 119901

119862) 120591119895= (119902

119873 119902

119862120572 119902

119862) and 119901 and 119902 are PDB

coordinates for each atomA matrix of dimensions 119898 times 119899 was then constructed for

the residues of 119878 and 119879 as

119872 =

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

119872

11119872

12sdot sdot sdot 119872

1119899

119872

21119872

22sdot sdot sdot 119872

2119899

sdot sdot sdot sdot sdot sdot sdot sdot sdot sdot sdot sdot

119872

1198981119872

1198982sdot sdot sdot 119872

119898119899

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

(1)

where the element119872119894119895is a rigid-body transformation matrix

that transforms the triplet 120590119894to 120591119895(ie119872

119894119895120590

119894= 120591

119895)

44 Performing Triplet Clustering 119863119894119895119896119897is the Cartesian dis-

tance between the target 120591119897and the transformed triplet119872

119894119895120590

119896

providing a measure of how similarly the triplet pairs (120590119894 120591

119895)

and (120590119896 120591

119897) are oriented This allows clustering of triplet

fragments using the single-linkage algorithm [43] as followsIf for two triplet pairs (120590

119894 120591

119895) and (120590

119896 120591

119897) 119863119894119895119896119897lt 119863

0 119894 = 119896

and 119895 = 119897 then the triplets are clustered Let 1198661and 119866

2

be two clusters with the first containing (120590119894 120591

119895) and (120590

119896 120591

119897)

and the second containing (1205901198941015840 1205911198951015840) and (120590

1198961015840 1205911198971015840) If 119863119894119895

11989610158401198971015840lt

119863

0 then 119866

1and 119866

2are merged to form a new cluster 119866

3

where 1198663= 119866

1cup 119866

2 These procedures are performed

iteratively until no new clusters can be formed For each finalcluster 119866

120583 we can obtain the transformation matrix119872120583

119896119897and

aligned substructure pair 119878120583= ⋃

120590119896isin119866120583

120590

119896and 119879

120583= ⋃

120591119897isin119866120583

120591

119897

where 119866120583

has the minimum Cartesian distance whenusing119872120583

119896119897

45 Scoring Function For each residue 119894 the binding score119862119894

is defined as

119862

119894= MAX120590119894isin119866120583

(120576

120583times 119862

119877

120583times 119862

119861

120583times 119862

119863

120583) (2)

10 BioMed Research International

LYSLYS

GLY

ILE

GLU

GLU

GLY

LEU

SER

TRP

ASN

GLN

Ligand-binding residue template database

0123456

96 146

Query protein

The fragment transformation method

Scoring function

FAD

NAD

minus1

ALA

THR

SER

SER PHE

LYS

SER

SER

MET ARG

VAL

Figure 8 Schematic of the method for predicting NAD-FAD-binding sites

where 120576120583is the number of triplets of 119878

120583(ie the aligned

residues of the query structure)The alignment scores119862119877120583119862119861120583

and 119862119863120583are defined as

119862

119877

120583=

1

1 + RMSD (119878120583 119879

120583)

119862

119861

120583=

BLOSUM (119878120583 119879

120583)

BLOSUM (119879120583 119879

120583)

+ 1

119862

119863

120583=

DSSP (119878120583 119879

120583)

DSSP (119879120583 119879

120583)

+ 1

(3)

where RMSD (119878120583 119879

120583) is the RMSD of all 119862

120572atoms between

119878

120583and 119879

120583 BLOSUM (119878

120583 119879

120583) is the sequence alignment

score between 119878120583and 119879

120583calculated using the BLOSUM62

[41] substitution matrix BLOSUM (119879120583 119879

120583) is the maximum

sequence alignment score of 119879120583 DSSP (119878

120583 119879

120583) represents the

secondary structure alignment score based on a constructionsubstitution matrix [22] using the definition of DSSP [42]between 119878

120583and 119879

120583 and DSSP (119879

120583 119879

120583) is the maximum

secondary structure alignment score of 119879120583 The value of

RMSD (119878120583 119879

120583) should be lt3 A

For each residue 119894 we predict a geometric centerΘ120596119894of the

ligand by Θ120596119894= 119872

120583

119896119897

minus1

119871

120596 where 119871

120596is the geometric center

of the binding template type 120596 in template 119879 120596 representsthe three moieties of NADFAD nicotinamide adenosine

BioMed Research International 11

LYSLYS

GLY

ILEGLU

GLU GLY

LEUSER

TRP

ASN

GLN

(a)

LYS

GLYGLU

GLN

LEU

ASN

Nicotinamide-binding templates

(b)

SER

TRPPhosphate-binding templates

(c)

LYS

GLY

GLU

ILE

Adenosine-binding templates

(d)

Figure 9NAD-binding residue templates (a)The entireNAD-binding template (b)Nicotinamide-binding templates (c) Phosphate-bindingtemplates (d) Adenosine-binding templates

LYS

ARG

MET

ALA

THRSER

SER

SER

SER

PHE

ASN

VAL

(a)

Flavin-binding templates

SER

SER

SER

VAL ARG

THRASN

(b)

Phosphate-binding templates

SER

MET

LYS

(c)

Adenosine-binding templates

SER

ALAPHE

(d)

Figure 10 FAD-binding residue templates (a) The entire FAD-binding template (b) Flavin-binding templates (c) Phosphate-bindingtemplates (d) Adenosine-binding templates

12 BioMed Research International

and phosphate for NAD flavin adenosine and phosphatefor FAD The binding score 119862

119896is added to 119862

119894if the distance

betweenΘ120596119894andΘ120596

1015840

119896is between 3 and 9 A and120596 = 1205961015840 Finally

the normalized binding score 119885119862119894is calculated as

119885

119862

119894=

119862

119894minus 119862

119878119863

119862

(4)

where 119862 and 119878119863119862denote the mean and standard deviation

respectively of the binding score 119862119894

46 Performance Assessment The accuracy of predictingNAD-FAD-binding sites wasdefined as the number of truepositives and true negatives and was evaluated using a leave-one-out approach Accuracy (ACC) the true positive rate(TPR) and the false positive rate (FPR) were calculated usingtrue positive (TP) true negative (TN) false positive (FP) andfalse negative (FN) values as follows

ACC = TP + TNTP + TN + FP + FN

TPR = Sensitivity = TPTP + FN

FPR = 1 minus Specificity = FPFP + TN

(5)

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Authorsrsquo Contribution

Chih-Hao Lu and Chin-Sheng Yu developed and imple-mented the methods Chih-Hao Lu Chin-Sheng Yu Yu-Feng Lin and Jin-Yi Chen carried out the analysis Chih-Hao Lu and Yu-Feng Lin drafted the paper Chih-Hao Lusupervised the work All the authors have read and approvedthe content of the final paper Chih-Hao Lu and Chin-ShengYu contributed equally to this work

Acknowledgment

Thisworkwas supported byGrants from theNational ScienceCouncil (NSC 99-2113-M-039-002-MY2) and China MedicalUniversity (CMU99-N2-02) Taiwan to Chih-Hao Lu Thefunders had no role in study design data collection andanalysis decision to publish or preparation of the paper Theauthors are grateful to Yeong-Shin Lin (National Chiao TungUniversity Taiwan) for invaluable comments

References

[1] H M Berman J Westbrook Z Feng et al ldquoThe protein databankrdquo Nucleic Acids Research vol 28 no 1 pp 235ndash242 2000

[2] A Wilkinson J Day and R Bowater ldquoBacterial DNA ligasesrdquoMolecular Microbiology vol 40 no 6 pp 1241ndash1248 2001

[3] A Burkle ldquoPhysiology and pathophysiology of poly(ADP-ribosyl)ationrdquo BioEssays vol 23 no 9 pp 795ndash806 2001

[4] Q Zhang D W Piston and R H Goodman ldquoRegulation ofcorepressor function by nuclear NADHrdquo Science vol 295 no5561 pp 1895ndash1897 2002

[5] J S Smith and J D Boeke ldquoAn unusual form of transcriptionalsilencing in yeast ribosomalDNArdquoGenes andDevelopment vol11 no 2 pp 241ndash254 1997

[6] R M Anderson K J Bitterman J G Wood et al ldquoManipula-tion of a nuclear NAD+ salvage pathway delays aging withoutaltering steady-state NAD+ levelsrdquo The Journal of BiologicalChemistry vol 277 no 21 pp 18881ndash18890 2002

[7] J Rutter M Reick L C Wu and S L McKnight ldquoRegulationof crock and NPAS2 DNA binding by the redox state of NADcofactorsrdquo Science vol 293 no 5529 pp 510ndash514 2001

[8] K ChenM JMizianty and L Kurgan ldquoPrediction and analysisof nucleotide-binding residues using sequence and sequence-derived structural descriptorsrdquo Bioinformatics vol 28 no 3 pp331ndash341 2012

[9] M Saito M Go and T Shirai ldquoAn empirical approach fordetecting nucleotide-binding sites on proteinsrdquo Protein Engi-neering Design and Selection vol 19 no 2 pp 67ndash75 2006

[10] H R Ansari and G P S Raghava ldquoIdentification of NADinteracting residues in proteinsrdquo BMC Bioinformatics vol 11article 160 2010

[11] N K Mishra and G P S Raghava ldquoPrediction of FADinteracting residues in a protein from its primary sequenceusing evolutionary informationrdquo BMC Bioinformatics vol 11article S48 no 1 2010

[12] Z-P Liu L-Y Wu Y Wang X-S Zhang and L ChenldquoPrediction of protein-RNA binding sites by a random forestmethod with combined featuresrdquo Bioinformatics vol 26 no 13pp 1616ndash1622 2010

[13] L Wang Z P Liu X S Zhang and L Chen ldquoPrediction of hotspots in protein interfaces using a random forest model withhybrid featuresrdquo Protein Engineering Design and Selection vol25 no 3 pp 119ndash126 2012

[14] J S Chauhan N KMishra andG P S Raghava ldquoPrediction ofGTP interacting residues dipeptides and tripeptides in a pro-tein from its evolutionary informationrdquo BMC Bioinformaticsvol 11 article 301 2010

[15] A Roy and Y Zhang ldquoRecognizing protein-ligand binding sitesby global structural alignment and local geometry refinementrdquoStructure vol 20 no 6 pp 987ndash997 2012

[16] L Xie and P E Bourne ldquoDetecting evolutionary relationshipsacross existing fold space using sequence order-independentprofile-profile alignmentsrdquo Proceedings of the National Academyof Sciences of the United States of America vol 105 no 14 pp5441ndash5446 2008

[17] J Yang A Roy and Y Zhang ldquoProtein-ligand binding siterecognition using complementary binding-specific substruc-ture comparison and sequence profile alignmentrdquo Bioinformat-ics vol 29 no 20 pp 2588ndash2595 2013

[18] Y T Yan and W-H Li ldquoIdentification of protein functionalsurfaces by the concept of a split pocketrdquo Proteins StructureFunction and Bioinformatics vol 76 no 4 pp 959ndash976 2009

[19] K A Dill ldquoDominant forces in protein foldingrdquo Biochemistryvol 29 no 31 pp 7133ndash7155 1990

[20] S Govindarajan and R A Goldstein ldquoEvolution of modelproteins on a foldability landscaperdquo Proteins vol 29 no 4 pp461ndash466 1997

BioMed Research International 13

[21] G Parisi and J Echave ldquoStructural constraints and emergence ofsequence patterns in protein evolutionrdquo Molecular Biology andEvolution vol 18 no 5 pp 750ndash756 2001

[22] C H Lu Y S Lin Y C Chen C S Yu S Y Chang andJ K Hwang ldquoThe fragment transformation method to detectthe protein structural motifsrdquo Proteins Structure Function andGenetics vol 63 no 3 pp 636ndash643 2006

[23] L Schrodinger ThePyMOLMolecular Graphics System Version13r1 2010

[24] U Dengler K Niefind M Kieszlig and D Schomburg ldquoCrystalstructure of a ternary complex of D-2-hydroxy-isocaproatedehydrogenase from Lactobacillus casei NAD+ and 2-oxoisocaproate at 19 A resolutionrdquo Journal of MolecularBiology vol 267 no 3 pp 640ndash660 1997

[25] E Gross C S Sevier A Vala C A Kaiser and D Fass ldquoA newFAD-binding fold and intersubunit disulfide shuttle in the thioloxidase Erv2prdquoNature Structural Biology vol 9 no 1 pp 61ndash672002

[26] S V Antonyuk R W Strange M J Ellis et al ldquoStructure ofd-lactate dehydrogenase from Aquifex aeolicus complexed withNAD+ and lactic acid (or pyruvate)rdquo Acta Crystallographica FStructural Biology and Crystallization Communications vol 65part 12 pp 1209ndash1213 2009

[27] C K Wu T A Dailey H A Dailey B C Wang and J PRose ldquoThe crystal structure of augmenter of liver regenerationa mammalian FAD-dependent sulfhydryl oxidaserdquo ProteinScience vol 12 no 5 pp 1109ndash1118 2003

[28] J R Thompson J K Bell J Bratt G A Grant and LJ Banaszak ldquoVmax regulation through domain and subunitchanges The active form of phosphoglycerate dehydrogenaserdquoBiochemistry vol 44 no 15 pp 5763ndash5773 2005

[29] M Nardini S Spano C Cericola et al ldquoCtBPBARS a dual-function protein involved in transcription co-repression andGolgi membrane fissionrdquo EMBO Journal vol 22 no 12 pp3122ndash3130 2003

[30] R Kort H Komori S I Adachi K Miki and A Eker ldquoDNAapophotolyase from Anacystis nidulans 18 A structure 8-HDF reconstitution and X-ray-induced FAD reductionrdquo ActaCrystallographica Section D Biological Crystallography vol 60no 7 pp 1205ndash1213 2004

[31] Y Huang R Baxter B S Smith C L Partch C L Colbert andJ Deisenhofer ldquoCrystal structure of cryptochrome 3 from Ara-bidopsis thaliana and its implications for photolyase activityrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 103 no 47 pp 17701ndash17706 2006

[32] J B Thoden T M Wohlers J L Fridovich-Keil and H MHolden ldquoMolecular basis for severe epimerase deficiency galac-tosemia X-ray structure of the humanV94M-substitutedUDP-galactose 4-epimeraserdquoThe Journal of Biological Chemistry vol276 no 23 pp 20617ndash20623 2001

[33] C A Brautigam B S Smith Z Ma et al ldquoStructure of thephotolyase-like domain of cryptochrome 1 from Arabidopsisthalianardquo Proceedings of the National Academy of Sciences of theUnited States of America vol 101 no 33 pp 12142ndash12147 2004

[34] H Komori R Masui S Kuramitsu et al ldquoCrystal structure ofthermostable DNA photolyase pyrimidine-dimer recognitionmechanismrdquo Proceedings of the National Academy of Sciences ofthe United States of America vol 98 no 24 pp 13560ndash135652001

[35] F Todone M A Vanoni A Mozzarelli et al ldquoActive siteplasticity in D-amino acid oxidase a crystallographic analysisrdquoBiochemistry vol 36 no 19 pp 5853ndash5860 1997

[36] I F Sevrioukova H Li and T L Poulos ldquoCrystal structure ofputidaredoxin reductase from Pseudomonas putida the finalstructural component of the cytochromeP450cammonooxyge-naserdquo Journal of Molecular Biology vol 336 no 4 pp 889ndash9022004

[37] S Y Song Y B Xu Z J Lin and C L Tsou ldquoStructure ofactive site carboxymethylated D-glyceraldehyde-3-phosphatedehydrogenase from Palinurus versicolorrdquo Journal of MolecularBiology vol 287 no 4 pp 719ndash725 1999

[38] L Pollegioni KDiederichs GMolla et al ldquoYeastD-amino acidoxidase structural basis of its catalytic propertiesrdquo Journal ofMolecular Biology vol 324 no 3 pp 535ndash546 2002

[39] E C Settembre P C Dorrestein J H Park A M Augustine TP Begley and S E Ealick ldquoStructural and mechanistic studieson thiO a glycine oxidase essential for thiamin biosynthesis inBacillus subtilisrdquo Biochemistry vol 42 no 10 pp 2971ndash29812003

[40] J Yang A Roy and Y Zhang ldquoBioLiP a semi-manually curateddatabase for biologically relevant ligand-protein interactionsrdquoNucleic Acids Research vol 41 no 1 pp D1096ndashD1103 2013

[41] S Henikoff and J G Henikoff ldquoAmino acid substitution matri-ces from protein blocksrdquo Proceedings of the National Academyof Sciences of the United States of America vol 89 no 22 pp10915ndash10919 1992

[42] W Kabsch and C Sander ldquoDictionary of protein secondarystructure pattern recognition of hydrogen-bonded and geo-metrical featuresrdquo Biopolymers vol 22 no 12 pp 2577ndash26371983

[43] J C Gower and G J S Ross ldquoMinimum spanning trees andsingle-linkage cluster analysisrdquo Journal of the Royal StatisticalSociety vol 18 no 1 article 11 1969

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 4: Research Article Predicting Flavin and Nicotinamide ...downloads.hindawi.com/journals/bmri/2015/402536.pdf · Adenosine-binding of FAD. (e) Phosphate-binding of FAD. (f) Flav in-binding

4 BioMed Research International

0

01

02

03

04

05

06

N O SType of atom

Freq

uenc

y

(a)

0

01

02

03

04

05

06

N OType of atom

S

Freq

uenc

y

(b)

0

01

02

03

04

05

06

N OType of atom

S

Freq

uenc

y

(c)

0

01

02

03

04

05

06

N OType of atom

S

Freq

uenc

y

(d)

0

01

02

03

04

05

06

N OType of atom

S

Freq

uenc

y

(e)

0

01

02

03

04

05

06

N OType of atom

S

Freq

uenc

y

(f)

Figure 2 Atom-type frequencies within NAD-FAD-binding sites Frequencies for both backbone (black) and side-chain (white) atoms areshown (a) Adenosine-binding of NAD (b) Phosphate-binding of NAD (c) Nicotinamide-binding of NAD (d) Adenosine-binding of FAD(e) Phosphate-binding of FAD (f) Flavin-binding of FAD The preferred types of atoms surrounding the different moiety of NADFAD areshown

BioMed Research International 5

(a)

(b) (c) (d)

Figure 3 Identification of NAD-binding sites (a) Chain A of D-2-hydroxyisocaproate dehydrogenase (PDB ID1DXY) was the query proteinTemplates were constructed from (b) D-Lactate dehydrogenase (chain A PDB ID3KB6) (c) phosphoglycerate dehydrogenase (chain A PDBID1YBA) and (d) C-terminal-binding proteinbrefeldin A-ADP ribosylated substrate (chain A PDB ID1HKU)

Table 4 Comparison between the fragment transformation and SVMmethods for predicting FAD-binding-site residues

Accuracy () Sensitivity () Specificity () MCCRandom 1 8738 8568 8908 075Random 2 8748 8573 8923 075Random 3 8735 8555 8915 075Random 4 8758 8573 8943 075Random 5 8744 8573 8915 075Average 8745 8568 8921 075SVM [11] 8286 8336 8236 066

PyMOL [23] and color coded light gray for the query proteinblue lines for the ligand hot pink orange and forest sticks foradenosine- phosphate- and nicotinamide-flavin-bindingresidues that are predicted correctly and dark gray sticks fornonbinding residues that are predicted to be binding residuesOur method accurately identified 21 NAD-binding residueswithin chain A of D-2-hydroxyisocaproate dehydrogenase(PDB ID1DXY) [24 25] with ten false positives (Figure 3)Nine nicotinamide-binding residues were identified based onD-Lactate dehydrogenase (chain A PDB ID3KB6) [26 27]

three phosphate-binding residues were identified based onphosphoglycerate dehydrogenase (chain A PDB ID1YBA)[28] five adenosine-binding residues were identified basedon C-terminal-binding proteinbrefeldin A-ADP ribosylatedsubstrate (chain A PDB ID1HKU) [29] and four wereidentified based on other protein templates Ourmethod alsoaccurately predicted 23 NAD-binding residues within chainC of 5-carboxymethyl-2-hydroxymuconate semialdehydedehydrogenase (PDB ID2D4E) with only eight false pos-itives (Figure 4) Nine nicotinamide-binding residues were

6 BioMed Research International

(a)

(b) (c)

Figure 4 Identification of NAD-binding sites (a) Chain C of 5-carboxymethyl-2-hydroxymuconate semialdehyde dehydrogenase (PDBID2D4E) was the query protein Templates were constructed from (b) aldehyde dehydrogenase (chain A PDB ID3B4W) and (c) 1-pyrroline-5-carboxylate dehydrogenase (chain A PDB ID2EHU)

identified based on aldehyde dehydrogenase (chain A PDBID3B4W) three phosphate-binding and eight adenosine-binding residues were identified based on 1-pyrroline-5-carboxylate dehydrogenase (chain A PDB ID2EHU) andthree were identified based on other protein templates

For the FAD-binding proteins our method accuratelypredicted chain A of deoxyribodipyrimidine photolyase(PDB ID1OWL) [30] which contains 24 residues that bindFAD (Figure 5) and only six false positives occurred Threeadenosine-binding residues were identified based on humancryptochrome DASH (chain X PDB ID2IJG) [31 32]six phosphate-binding residues were identified based onphotolyase-like domain of cryptochrome 1 (chain A PDBID1U3C) [33] eleven flavin-binding residues were identifiedbased on photolyase (chain A PDB ID1IQR) [34] and fourwere identified based on other protein templates In addition30 FAD-binding residues were accurately predicted withinchain H of D-amino acid oxidase (PDB ID1DDO) [35]with 14 false positives Five adenosine-binding residues werepredicted based on putidaredoxin reductase (chain B PDB

ID1Q1R) [36 37] three adenosine-binding and nine flavin-binding residues based on D-amino acid oxidase (chain APDB ID1C0I) [38] three phosphate-binding and five flavin-binding residues based on glycine oxidase (chain B PDBID1NG3) [39] and five based on other protein templates(Figure 6)

3 Discussion

Small molecular cofactors (ligands) are essential for cells toperform numerous biological functions NAD and FAD forexample bind to proteins that play critical roles in energytransfer energy storage and signal transduction to name justa few To understand the mechanism by which these ligandsaffect protein function it is important to identify ligand-binding residues within relevant proteins The experimentalidentification of these interacting residues is so difficulthowever that computational methods to accomplish this taskare in high demand

BioMed Research International 7

(a)

(b) (c) (d)

Figure 5 Identification of FAD-binding sites (a) Chain A of deoxyribodipyrimidine photolyase (PDB ID1OWL) was the query proteinTemplates were constructed from (b) human cryptochrome DASH (chain X PDB ID2IJG) (c) photolyase-like domain of cryptochrome 1(chain A PDB ID1U3C) and (d) photolyase (chain A PDB ID1IQR)

Here we developed a structure comparison method thatuses both sequence and structure information to predictNAD-FAD-binding residues within proteins This approachalso provides valuable information concerning the microen-vironment of the protein-ligand interactionThe compositionof NAD-FAD-binding residues that we identified here isgenerally similar to previous studies [10 11] Interestinglyglycine was the most frequent binding residue bindingto NAD through phosphate or adenosine moieties moreoften than through the nicotinamide moiety In contrastarginine preferentially interacted with phosphate moieties

and aspartic acid preferentially interacted with adenosinemoieties of NAD whereas threonine cysteine and histidinebound to nicotinamide The most common residue withinFAD-binding sites was also glycine which preferentiallybound phosphate and adenosine moieties Serine interactedwith phosphate moieties whereas cysteine tyrosine andtryptophan primarily bound to nicotinamide By takingadvantage of this kind of structural information detailsconcerning these critical binding sites may be revealed Toinvestigate the influence of amino acids on prediction per-formance the sensitivity and specificity associated with each

8 BioMed Research International

(a)

(b) (c) (d)

Figure 6 Identification of FAD-binding sites (a) Chain H of D-amino acid oxidase (PDB ID1DDO) was the query protein Templates wereconstructed from (b) putidaredoxin reductase (chain B PDB ID1Q1R) (c) D-amino acid oxidase (chain A PDB ID1C0I) and (d) glycineoxidase (chain B PDB ID1NG3)

0

02

04

06

08

1

Perfo

rman

ce

Type of residue

SensitivitySpecificity

GLY

ALA VA

LLE

UIL

ESE

RTH

RCY

SM

ETPR

OPH

ETY

RTR

PLY

SA

RG HIS

ASP

ASN

GLU

GLN

(a)

0

02

04

06

08

1

Perfo

rman

ce

Type of residue

SensitivitySpecificity

GLY

ALA VA

LLE

UIL

ESE

RTH

RCY

SM

ETPR

OPH

ETY

RTR

PLY

SA

RG HIS

ASP

ASN

GLU

GLN

(b)

Figure 7 Sensitivity and specificity associated with each amino acid in NAD-FAD-binding-site predictions (a) NAD (b) FAD

residue were calculated (Figure 7) For NAD-binding-sitepredictions specificity for each residue was excellent (0927ndash0966) but sensitivity was relatively low for phenylalaninetryptophan arginine and glutaminewhichwere less than 05For FAD-binding sites all residues achieved high specificity(0933ndash0971) and sensitivity (0532ndash0791) It should be noted

that the ratio of NAD-FAD-binding residues to nonbindingresidues is about 1 to 16 in our dataset This large differencemight cause lots of false positives when predicted That is thereason for high specificity and accuracy but low sensitivity inour prediction results Hence the positions of false positiveresidues in sequence were also investigated 20 and 25 of

BioMed Research International 9

false positive residues of NAD- and FAD-binding predictionoccurred next to the true positive residues in sequence Itwas shown that these residues are also located near theligand in the coordinate space If these residues were treatedas true positive residues our prediction results of NAD-binding yielded 7155 sensitivity and 061 MCC at a 5FPR threshold Under the same conditions FAD-binding-site predictions yielded 7334 sensitivity and an MCC of064 Compared with other prediction methods ours did notuse protein evolutionary information but only used proteinstructure and did not need to use equal number dataset fortraining but predicted whole-proteins through comparingstructures of template database Our results yielded excellentprediction performancewhen analyzingNAD-FAD-bindingresidues and thus provide important details concerning thebinding-site microenvironment This approach thereforemay be used to predict putative NAD-FAD-binding proteinsand the specific residues involved in the interaction

4 Methods

41 Overview We extracted structures of proteins boundto NAD or FAD from PDB and constructed a database ofNAD-FAD-binding residue templates Residues that weredefined as binding residues by the ligand-binding databaseBioLiP [40] were included in the template Query proteinstructures were then compared with each template in thedatabase using a ldquoleave-one-outrdquo comparison method Thefragment transformation method [22] was used to alignquery and template structures After comparing the localprotein structure each residue was assigned a score basedon both protein sequence and structure Sequence similaritywas calculated using the BLOSUM62 substitutionmatrix [41]whereas structural similarity was calculated by measuringthe root mean square deviation (RMSD) of the C120572 carbonsfrom local structure alignments and using a secondarystructure substitutionmatrix [22] according to theDictionaryof Secondary Structure of Proteinsrsquo (DSSP) definition ofsecondary structure [42] Residues with an alignment scorethat exceeded a predetermined threshold were predicted tobind NADFADThis method is illustrated in Figure 8

42 NAD-FAD-Binding Proteins and Binding Residue Tem-plates We adopted the same datasets with previous research[10 11] All protein complexes were collected from PDBand had pairwise sequence identity lt40 by using CD-HITProteins chains that are not involved in NADFAD bindingwere excluded Residues that were defined as binding ornonbinding residues by using the ligand-binding databaseBioLiP The main dataset included 184 and 165 polypeptidechains for NAD and FAD respectively Because NAD iscomposed of a nicotinamide moiety an adenosine moietyand a phosphate moiety binding residues were dividedinto three groups nicotinamide binding adenosine bindingand phosphate binding FAD-binding sitessimilarly contain

flavin-binding residues adenosine-binding residues andphosphate-binding residues Groups of residues that con-tained more than or equal to two binding residues wereconsidered a binding residue template (see Figures 9 and 10)

43The Fragment TransformationMethod Weused the frag-ment transformation method to align NAD-FAD-bindingresidues Each residue was treated as an individual unit andwas used to align the query protein 119878 with the bindingtemplate 119879 The structural unit consists of a triplet formedby the NndashC

120572ndashC atoms within a given residue 119878 denotes the

query protein of length 119898 and 119879 denotes the template of 119899residues The query protein 119878 of length119898 and the template 119879of 119899 residues can therefore be expressed in terms of tripletsas 119878 = 120590

1 120590

2 120590

119898 and 119879 = 120591

1 120591

2 120591

119899 where 120590

119894=

(119901

119873 119901

119862120572 119901

119862) 120591119895= (119902

119873 119902

119862120572 119902

119862) and 119901 and 119902 are PDB

coordinates for each atomA matrix of dimensions 119898 times 119899 was then constructed for

the residues of 119878 and 119879 as

119872 =

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

119872

11119872

12sdot sdot sdot 119872

1119899

119872

21119872

22sdot sdot sdot 119872

2119899

sdot sdot sdot sdot sdot sdot sdot sdot sdot sdot sdot sdot

119872

1198981119872

1198982sdot sdot sdot 119872

119898119899

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

(1)

where the element119872119894119895is a rigid-body transformation matrix

that transforms the triplet 120590119894to 120591119895(ie119872

119894119895120590

119894= 120591

119895)

44 Performing Triplet Clustering 119863119894119895119896119897is the Cartesian dis-

tance between the target 120591119897and the transformed triplet119872

119894119895120590

119896

providing a measure of how similarly the triplet pairs (120590119894 120591

119895)

and (120590119896 120591

119897) are oriented This allows clustering of triplet

fragments using the single-linkage algorithm [43] as followsIf for two triplet pairs (120590

119894 120591

119895) and (120590

119896 120591

119897) 119863119894119895119896119897lt 119863

0 119894 = 119896

and 119895 = 119897 then the triplets are clustered Let 1198661and 119866

2

be two clusters with the first containing (120590119894 120591

119895) and (120590

119896 120591

119897)

and the second containing (1205901198941015840 1205911198951015840) and (120590

1198961015840 1205911198971015840) If 119863119894119895

11989610158401198971015840lt

119863

0 then 119866

1and 119866

2are merged to form a new cluster 119866

3

where 1198663= 119866

1cup 119866

2 These procedures are performed

iteratively until no new clusters can be formed For each finalcluster 119866

120583 we can obtain the transformation matrix119872120583

119896119897and

aligned substructure pair 119878120583= ⋃

120590119896isin119866120583

120590

119896and 119879

120583= ⋃

120591119897isin119866120583

120591

119897

where 119866120583

has the minimum Cartesian distance whenusing119872120583

119896119897

45 Scoring Function For each residue 119894 the binding score119862119894

is defined as

119862

119894= MAX120590119894isin119866120583

(120576

120583times 119862

119877

120583times 119862

119861

120583times 119862

119863

120583) (2)

10 BioMed Research International

LYSLYS

GLY

ILE

GLU

GLU

GLY

LEU

SER

TRP

ASN

GLN

Ligand-binding residue template database

0123456

96 146

Query protein

The fragment transformation method

Scoring function

FAD

NAD

minus1

ALA

THR

SER

SER PHE

LYS

SER

SER

MET ARG

VAL

Figure 8 Schematic of the method for predicting NAD-FAD-binding sites

where 120576120583is the number of triplets of 119878

120583(ie the aligned

residues of the query structure)The alignment scores119862119877120583119862119861120583

and 119862119863120583are defined as

119862

119877

120583=

1

1 + RMSD (119878120583 119879

120583)

119862

119861

120583=

BLOSUM (119878120583 119879

120583)

BLOSUM (119879120583 119879

120583)

+ 1

119862

119863

120583=

DSSP (119878120583 119879

120583)

DSSP (119879120583 119879

120583)

+ 1

(3)

where RMSD (119878120583 119879

120583) is the RMSD of all 119862

120572atoms between

119878

120583and 119879

120583 BLOSUM (119878

120583 119879

120583) is the sequence alignment

score between 119878120583and 119879

120583calculated using the BLOSUM62

[41] substitution matrix BLOSUM (119879120583 119879

120583) is the maximum

sequence alignment score of 119879120583 DSSP (119878

120583 119879

120583) represents the

secondary structure alignment score based on a constructionsubstitution matrix [22] using the definition of DSSP [42]between 119878

120583and 119879

120583 and DSSP (119879

120583 119879

120583) is the maximum

secondary structure alignment score of 119879120583 The value of

RMSD (119878120583 119879

120583) should be lt3 A

For each residue 119894 we predict a geometric centerΘ120596119894of the

ligand by Θ120596119894= 119872

120583

119896119897

minus1

119871

120596 where 119871

120596is the geometric center

of the binding template type 120596 in template 119879 120596 representsthe three moieties of NADFAD nicotinamide adenosine

BioMed Research International 11

LYSLYS

GLY

ILEGLU

GLU GLY

LEUSER

TRP

ASN

GLN

(a)

LYS

GLYGLU

GLN

LEU

ASN

Nicotinamide-binding templates

(b)

SER

TRPPhosphate-binding templates

(c)

LYS

GLY

GLU

ILE

Adenosine-binding templates

(d)

Figure 9NAD-binding residue templates (a)The entireNAD-binding template (b)Nicotinamide-binding templates (c) Phosphate-bindingtemplates (d) Adenosine-binding templates

LYS

ARG

MET

ALA

THRSER

SER

SER

SER

PHE

ASN

VAL

(a)

Flavin-binding templates

SER

SER

SER

VAL ARG

THRASN

(b)

Phosphate-binding templates

SER

MET

LYS

(c)

Adenosine-binding templates

SER

ALAPHE

(d)

Figure 10 FAD-binding residue templates (a) The entire FAD-binding template (b) Flavin-binding templates (c) Phosphate-bindingtemplates (d) Adenosine-binding templates

12 BioMed Research International

and phosphate for NAD flavin adenosine and phosphatefor FAD The binding score 119862

119896is added to 119862

119894if the distance

betweenΘ120596119894andΘ120596

1015840

119896is between 3 and 9 A and120596 = 1205961015840 Finally

the normalized binding score 119885119862119894is calculated as

119885

119862

119894=

119862

119894minus 119862

119878119863

119862

(4)

where 119862 and 119878119863119862denote the mean and standard deviation

respectively of the binding score 119862119894

46 Performance Assessment The accuracy of predictingNAD-FAD-binding sites wasdefined as the number of truepositives and true negatives and was evaluated using a leave-one-out approach Accuracy (ACC) the true positive rate(TPR) and the false positive rate (FPR) were calculated usingtrue positive (TP) true negative (TN) false positive (FP) andfalse negative (FN) values as follows

ACC = TP + TNTP + TN + FP + FN

TPR = Sensitivity = TPTP + FN

FPR = 1 minus Specificity = FPFP + TN

(5)

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Authorsrsquo Contribution

Chih-Hao Lu and Chin-Sheng Yu developed and imple-mented the methods Chih-Hao Lu Chin-Sheng Yu Yu-Feng Lin and Jin-Yi Chen carried out the analysis Chih-Hao Lu and Yu-Feng Lin drafted the paper Chih-Hao Lusupervised the work All the authors have read and approvedthe content of the final paper Chih-Hao Lu and Chin-ShengYu contributed equally to this work

Acknowledgment

Thisworkwas supported byGrants from theNational ScienceCouncil (NSC 99-2113-M-039-002-MY2) and China MedicalUniversity (CMU99-N2-02) Taiwan to Chih-Hao Lu Thefunders had no role in study design data collection andanalysis decision to publish or preparation of the paper Theauthors are grateful to Yeong-Shin Lin (National Chiao TungUniversity Taiwan) for invaluable comments

References

[1] H M Berman J Westbrook Z Feng et al ldquoThe protein databankrdquo Nucleic Acids Research vol 28 no 1 pp 235ndash242 2000

[2] A Wilkinson J Day and R Bowater ldquoBacterial DNA ligasesrdquoMolecular Microbiology vol 40 no 6 pp 1241ndash1248 2001

[3] A Burkle ldquoPhysiology and pathophysiology of poly(ADP-ribosyl)ationrdquo BioEssays vol 23 no 9 pp 795ndash806 2001

[4] Q Zhang D W Piston and R H Goodman ldquoRegulation ofcorepressor function by nuclear NADHrdquo Science vol 295 no5561 pp 1895ndash1897 2002

[5] J S Smith and J D Boeke ldquoAn unusual form of transcriptionalsilencing in yeast ribosomalDNArdquoGenes andDevelopment vol11 no 2 pp 241ndash254 1997

[6] R M Anderson K J Bitterman J G Wood et al ldquoManipula-tion of a nuclear NAD+ salvage pathway delays aging withoutaltering steady-state NAD+ levelsrdquo The Journal of BiologicalChemistry vol 277 no 21 pp 18881ndash18890 2002

[7] J Rutter M Reick L C Wu and S L McKnight ldquoRegulationof crock and NPAS2 DNA binding by the redox state of NADcofactorsrdquo Science vol 293 no 5529 pp 510ndash514 2001

[8] K ChenM JMizianty and L Kurgan ldquoPrediction and analysisof nucleotide-binding residues using sequence and sequence-derived structural descriptorsrdquo Bioinformatics vol 28 no 3 pp331ndash341 2012

[9] M Saito M Go and T Shirai ldquoAn empirical approach fordetecting nucleotide-binding sites on proteinsrdquo Protein Engi-neering Design and Selection vol 19 no 2 pp 67ndash75 2006

[10] H R Ansari and G P S Raghava ldquoIdentification of NADinteracting residues in proteinsrdquo BMC Bioinformatics vol 11article 160 2010

[11] N K Mishra and G P S Raghava ldquoPrediction of FADinteracting residues in a protein from its primary sequenceusing evolutionary informationrdquo BMC Bioinformatics vol 11article S48 no 1 2010

[12] Z-P Liu L-Y Wu Y Wang X-S Zhang and L ChenldquoPrediction of protein-RNA binding sites by a random forestmethod with combined featuresrdquo Bioinformatics vol 26 no 13pp 1616ndash1622 2010

[13] L Wang Z P Liu X S Zhang and L Chen ldquoPrediction of hotspots in protein interfaces using a random forest model withhybrid featuresrdquo Protein Engineering Design and Selection vol25 no 3 pp 119ndash126 2012

[14] J S Chauhan N KMishra andG P S Raghava ldquoPrediction ofGTP interacting residues dipeptides and tripeptides in a pro-tein from its evolutionary informationrdquo BMC Bioinformaticsvol 11 article 301 2010

[15] A Roy and Y Zhang ldquoRecognizing protein-ligand binding sitesby global structural alignment and local geometry refinementrdquoStructure vol 20 no 6 pp 987ndash997 2012

[16] L Xie and P E Bourne ldquoDetecting evolutionary relationshipsacross existing fold space using sequence order-independentprofile-profile alignmentsrdquo Proceedings of the National Academyof Sciences of the United States of America vol 105 no 14 pp5441ndash5446 2008

[17] J Yang A Roy and Y Zhang ldquoProtein-ligand binding siterecognition using complementary binding-specific substruc-ture comparison and sequence profile alignmentrdquo Bioinformat-ics vol 29 no 20 pp 2588ndash2595 2013

[18] Y T Yan and W-H Li ldquoIdentification of protein functionalsurfaces by the concept of a split pocketrdquo Proteins StructureFunction and Bioinformatics vol 76 no 4 pp 959ndash976 2009

[19] K A Dill ldquoDominant forces in protein foldingrdquo Biochemistryvol 29 no 31 pp 7133ndash7155 1990

[20] S Govindarajan and R A Goldstein ldquoEvolution of modelproteins on a foldability landscaperdquo Proteins vol 29 no 4 pp461ndash466 1997

BioMed Research International 13

[21] G Parisi and J Echave ldquoStructural constraints and emergence ofsequence patterns in protein evolutionrdquo Molecular Biology andEvolution vol 18 no 5 pp 750ndash756 2001

[22] C H Lu Y S Lin Y C Chen C S Yu S Y Chang andJ K Hwang ldquoThe fragment transformation method to detectthe protein structural motifsrdquo Proteins Structure Function andGenetics vol 63 no 3 pp 636ndash643 2006

[23] L Schrodinger ThePyMOLMolecular Graphics System Version13r1 2010

[24] U Dengler K Niefind M Kieszlig and D Schomburg ldquoCrystalstructure of a ternary complex of D-2-hydroxy-isocaproatedehydrogenase from Lactobacillus casei NAD+ and 2-oxoisocaproate at 19 A resolutionrdquo Journal of MolecularBiology vol 267 no 3 pp 640ndash660 1997

[25] E Gross C S Sevier A Vala C A Kaiser and D Fass ldquoA newFAD-binding fold and intersubunit disulfide shuttle in the thioloxidase Erv2prdquoNature Structural Biology vol 9 no 1 pp 61ndash672002

[26] S V Antonyuk R W Strange M J Ellis et al ldquoStructure ofd-lactate dehydrogenase from Aquifex aeolicus complexed withNAD+ and lactic acid (or pyruvate)rdquo Acta Crystallographica FStructural Biology and Crystallization Communications vol 65part 12 pp 1209ndash1213 2009

[27] C K Wu T A Dailey H A Dailey B C Wang and J PRose ldquoThe crystal structure of augmenter of liver regenerationa mammalian FAD-dependent sulfhydryl oxidaserdquo ProteinScience vol 12 no 5 pp 1109ndash1118 2003

[28] J R Thompson J K Bell J Bratt G A Grant and LJ Banaszak ldquoVmax regulation through domain and subunitchanges The active form of phosphoglycerate dehydrogenaserdquoBiochemistry vol 44 no 15 pp 5763ndash5773 2005

[29] M Nardini S Spano C Cericola et al ldquoCtBPBARS a dual-function protein involved in transcription co-repression andGolgi membrane fissionrdquo EMBO Journal vol 22 no 12 pp3122ndash3130 2003

[30] R Kort H Komori S I Adachi K Miki and A Eker ldquoDNAapophotolyase from Anacystis nidulans 18 A structure 8-HDF reconstitution and X-ray-induced FAD reductionrdquo ActaCrystallographica Section D Biological Crystallography vol 60no 7 pp 1205ndash1213 2004

[31] Y Huang R Baxter B S Smith C L Partch C L Colbert andJ Deisenhofer ldquoCrystal structure of cryptochrome 3 from Ara-bidopsis thaliana and its implications for photolyase activityrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 103 no 47 pp 17701ndash17706 2006

[32] J B Thoden T M Wohlers J L Fridovich-Keil and H MHolden ldquoMolecular basis for severe epimerase deficiency galac-tosemia X-ray structure of the humanV94M-substitutedUDP-galactose 4-epimeraserdquoThe Journal of Biological Chemistry vol276 no 23 pp 20617ndash20623 2001

[33] C A Brautigam B S Smith Z Ma et al ldquoStructure of thephotolyase-like domain of cryptochrome 1 from Arabidopsisthalianardquo Proceedings of the National Academy of Sciences of theUnited States of America vol 101 no 33 pp 12142ndash12147 2004

[34] H Komori R Masui S Kuramitsu et al ldquoCrystal structure ofthermostable DNA photolyase pyrimidine-dimer recognitionmechanismrdquo Proceedings of the National Academy of Sciences ofthe United States of America vol 98 no 24 pp 13560ndash135652001

[35] F Todone M A Vanoni A Mozzarelli et al ldquoActive siteplasticity in D-amino acid oxidase a crystallographic analysisrdquoBiochemistry vol 36 no 19 pp 5853ndash5860 1997

[36] I F Sevrioukova H Li and T L Poulos ldquoCrystal structure ofputidaredoxin reductase from Pseudomonas putida the finalstructural component of the cytochromeP450cammonooxyge-naserdquo Journal of Molecular Biology vol 336 no 4 pp 889ndash9022004

[37] S Y Song Y B Xu Z J Lin and C L Tsou ldquoStructure ofactive site carboxymethylated D-glyceraldehyde-3-phosphatedehydrogenase from Palinurus versicolorrdquo Journal of MolecularBiology vol 287 no 4 pp 719ndash725 1999

[38] L Pollegioni KDiederichs GMolla et al ldquoYeastD-amino acidoxidase structural basis of its catalytic propertiesrdquo Journal ofMolecular Biology vol 324 no 3 pp 535ndash546 2002

[39] E C Settembre P C Dorrestein J H Park A M Augustine TP Begley and S E Ealick ldquoStructural and mechanistic studieson thiO a glycine oxidase essential for thiamin biosynthesis inBacillus subtilisrdquo Biochemistry vol 42 no 10 pp 2971ndash29812003

[40] J Yang A Roy and Y Zhang ldquoBioLiP a semi-manually curateddatabase for biologically relevant ligand-protein interactionsrdquoNucleic Acids Research vol 41 no 1 pp D1096ndashD1103 2013

[41] S Henikoff and J G Henikoff ldquoAmino acid substitution matri-ces from protein blocksrdquo Proceedings of the National Academyof Sciences of the United States of America vol 89 no 22 pp10915ndash10919 1992

[42] W Kabsch and C Sander ldquoDictionary of protein secondarystructure pattern recognition of hydrogen-bonded and geo-metrical featuresrdquo Biopolymers vol 22 no 12 pp 2577ndash26371983

[43] J C Gower and G J S Ross ldquoMinimum spanning trees andsingle-linkage cluster analysisrdquo Journal of the Royal StatisticalSociety vol 18 no 1 article 11 1969

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 5: Research Article Predicting Flavin and Nicotinamide ...downloads.hindawi.com/journals/bmri/2015/402536.pdf · Adenosine-binding of FAD. (e) Phosphate-binding of FAD. (f) Flav in-binding

BioMed Research International 5

(a)

(b) (c) (d)

Figure 3 Identification of NAD-binding sites (a) Chain A of D-2-hydroxyisocaproate dehydrogenase (PDB ID1DXY) was the query proteinTemplates were constructed from (b) D-Lactate dehydrogenase (chain A PDB ID3KB6) (c) phosphoglycerate dehydrogenase (chain A PDBID1YBA) and (d) C-terminal-binding proteinbrefeldin A-ADP ribosylated substrate (chain A PDB ID1HKU)

Table 4 Comparison between the fragment transformation and SVMmethods for predicting FAD-binding-site residues

Accuracy () Sensitivity () Specificity () MCCRandom 1 8738 8568 8908 075Random 2 8748 8573 8923 075Random 3 8735 8555 8915 075Random 4 8758 8573 8943 075Random 5 8744 8573 8915 075Average 8745 8568 8921 075SVM [11] 8286 8336 8236 066

PyMOL [23] and color coded light gray for the query proteinblue lines for the ligand hot pink orange and forest sticks foradenosine- phosphate- and nicotinamide-flavin-bindingresidues that are predicted correctly and dark gray sticks fornonbinding residues that are predicted to be binding residuesOur method accurately identified 21 NAD-binding residueswithin chain A of D-2-hydroxyisocaproate dehydrogenase(PDB ID1DXY) [24 25] with ten false positives (Figure 3)Nine nicotinamide-binding residues were identified based onD-Lactate dehydrogenase (chain A PDB ID3KB6) [26 27]

three phosphate-binding residues were identified based onphosphoglycerate dehydrogenase (chain A PDB ID1YBA)[28] five adenosine-binding residues were identified basedon C-terminal-binding proteinbrefeldin A-ADP ribosylatedsubstrate (chain A PDB ID1HKU) [29] and four wereidentified based on other protein templates Ourmethod alsoaccurately predicted 23 NAD-binding residues within chainC of 5-carboxymethyl-2-hydroxymuconate semialdehydedehydrogenase (PDB ID2D4E) with only eight false pos-itives (Figure 4) Nine nicotinamide-binding residues were

6 BioMed Research International

(a)

(b) (c)

Figure 4 Identification of NAD-binding sites (a) Chain C of 5-carboxymethyl-2-hydroxymuconate semialdehyde dehydrogenase (PDBID2D4E) was the query protein Templates were constructed from (b) aldehyde dehydrogenase (chain A PDB ID3B4W) and (c) 1-pyrroline-5-carboxylate dehydrogenase (chain A PDB ID2EHU)

identified based on aldehyde dehydrogenase (chain A PDBID3B4W) three phosphate-binding and eight adenosine-binding residues were identified based on 1-pyrroline-5-carboxylate dehydrogenase (chain A PDB ID2EHU) andthree were identified based on other protein templates

For the FAD-binding proteins our method accuratelypredicted chain A of deoxyribodipyrimidine photolyase(PDB ID1OWL) [30] which contains 24 residues that bindFAD (Figure 5) and only six false positives occurred Threeadenosine-binding residues were identified based on humancryptochrome DASH (chain X PDB ID2IJG) [31 32]six phosphate-binding residues were identified based onphotolyase-like domain of cryptochrome 1 (chain A PDBID1U3C) [33] eleven flavin-binding residues were identifiedbased on photolyase (chain A PDB ID1IQR) [34] and fourwere identified based on other protein templates In addition30 FAD-binding residues were accurately predicted withinchain H of D-amino acid oxidase (PDB ID1DDO) [35]with 14 false positives Five adenosine-binding residues werepredicted based on putidaredoxin reductase (chain B PDB

ID1Q1R) [36 37] three adenosine-binding and nine flavin-binding residues based on D-amino acid oxidase (chain APDB ID1C0I) [38] three phosphate-binding and five flavin-binding residues based on glycine oxidase (chain B PDBID1NG3) [39] and five based on other protein templates(Figure 6)

3 Discussion

Small molecular cofactors (ligands) are essential for cells toperform numerous biological functions NAD and FAD forexample bind to proteins that play critical roles in energytransfer energy storage and signal transduction to name justa few To understand the mechanism by which these ligandsaffect protein function it is important to identify ligand-binding residues within relevant proteins The experimentalidentification of these interacting residues is so difficulthowever that computational methods to accomplish this taskare in high demand

BioMed Research International 7

(a)

(b) (c) (d)

Figure 5 Identification of FAD-binding sites (a) Chain A of deoxyribodipyrimidine photolyase (PDB ID1OWL) was the query proteinTemplates were constructed from (b) human cryptochrome DASH (chain X PDB ID2IJG) (c) photolyase-like domain of cryptochrome 1(chain A PDB ID1U3C) and (d) photolyase (chain A PDB ID1IQR)

Here we developed a structure comparison method thatuses both sequence and structure information to predictNAD-FAD-binding residues within proteins This approachalso provides valuable information concerning the microen-vironment of the protein-ligand interactionThe compositionof NAD-FAD-binding residues that we identified here isgenerally similar to previous studies [10 11] Interestinglyglycine was the most frequent binding residue bindingto NAD through phosphate or adenosine moieties moreoften than through the nicotinamide moiety In contrastarginine preferentially interacted with phosphate moieties

and aspartic acid preferentially interacted with adenosinemoieties of NAD whereas threonine cysteine and histidinebound to nicotinamide The most common residue withinFAD-binding sites was also glycine which preferentiallybound phosphate and adenosine moieties Serine interactedwith phosphate moieties whereas cysteine tyrosine andtryptophan primarily bound to nicotinamide By takingadvantage of this kind of structural information detailsconcerning these critical binding sites may be revealed Toinvestigate the influence of amino acids on prediction per-formance the sensitivity and specificity associated with each

8 BioMed Research International

(a)

(b) (c) (d)

Figure 6 Identification of FAD-binding sites (a) Chain H of D-amino acid oxidase (PDB ID1DDO) was the query protein Templates wereconstructed from (b) putidaredoxin reductase (chain B PDB ID1Q1R) (c) D-amino acid oxidase (chain A PDB ID1C0I) and (d) glycineoxidase (chain B PDB ID1NG3)

0

02

04

06

08

1

Perfo

rman

ce

Type of residue

SensitivitySpecificity

GLY

ALA VA

LLE

UIL

ESE

RTH

RCY

SM

ETPR

OPH

ETY

RTR

PLY

SA

RG HIS

ASP

ASN

GLU

GLN

(a)

0

02

04

06

08

1

Perfo

rman

ce

Type of residue

SensitivitySpecificity

GLY

ALA VA

LLE

UIL

ESE

RTH

RCY

SM

ETPR

OPH

ETY

RTR

PLY

SA

RG HIS

ASP

ASN

GLU

GLN

(b)

Figure 7 Sensitivity and specificity associated with each amino acid in NAD-FAD-binding-site predictions (a) NAD (b) FAD

residue were calculated (Figure 7) For NAD-binding-sitepredictions specificity for each residue was excellent (0927ndash0966) but sensitivity was relatively low for phenylalaninetryptophan arginine and glutaminewhichwere less than 05For FAD-binding sites all residues achieved high specificity(0933ndash0971) and sensitivity (0532ndash0791) It should be noted

that the ratio of NAD-FAD-binding residues to nonbindingresidues is about 1 to 16 in our dataset This large differencemight cause lots of false positives when predicted That is thereason for high specificity and accuracy but low sensitivity inour prediction results Hence the positions of false positiveresidues in sequence were also investigated 20 and 25 of

BioMed Research International 9

false positive residues of NAD- and FAD-binding predictionoccurred next to the true positive residues in sequence Itwas shown that these residues are also located near theligand in the coordinate space If these residues were treatedas true positive residues our prediction results of NAD-binding yielded 7155 sensitivity and 061 MCC at a 5FPR threshold Under the same conditions FAD-binding-site predictions yielded 7334 sensitivity and an MCC of064 Compared with other prediction methods ours did notuse protein evolutionary information but only used proteinstructure and did not need to use equal number dataset fortraining but predicted whole-proteins through comparingstructures of template database Our results yielded excellentprediction performancewhen analyzingNAD-FAD-bindingresidues and thus provide important details concerning thebinding-site microenvironment This approach thereforemay be used to predict putative NAD-FAD-binding proteinsand the specific residues involved in the interaction

4 Methods

41 Overview We extracted structures of proteins boundto NAD or FAD from PDB and constructed a database ofNAD-FAD-binding residue templates Residues that weredefined as binding residues by the ligand-binding databaseBioLiP [40] were included in the template Query proteinstructures were then compared with each template in thedatabase using a ldquoleave-one-outrdquo comparison method Thefragment transformation method [22] was used to alignquery and template structures After comparing the localprotein structure each residue was assigned a score basedon both protein sequence and structure Sequence similaritywas calculated using the BLOSUM62 substitutionmatrix [41]whereas structural similarity was calculated by measuringthe root mean square deviation (RMSD) of the C120572 carbonsfrom local structure alignments and using a secondarystructure substitutionmatrix [22] according to theDictionaryof Secondary Structure of Proteinsrsquo (DSSP) definition ofsecondary structure [42] Residues with an alignment scorethat exceeded a predetermined threshold were predicted tobind NADFADThis method is illustrated in Figure 8

42 NAD-FAD-Binding Proteins and Binding Residue Tem-plates We adopted the same datasets with previous research[10 11] All protein complexes were collected from PDBand had pairwise sequence identity lt40 by using CD-HITProteins chains that are not involved in NADFAD bindingwere excluded Residues that were defined as binding ornonbinding residues by using the ligand-binding databaseBioLiP The main dataset included 184 and 165 polypeptidechains for NAD and FAD respectively Because NAD iscomposed of a nicotinamide moiety an adenosine moietyand a phosphate moiety binding residues were dividedinto three groups nicotinamide binding adenosine bindingand phosphate binding FAD-binding sitessimilarly contain

flavin-binding residues adenosine-binding residues andphosphate-binding residues Groups of residues that con-tained more than or equal to two binding residues wereconsidered a binding residue template (see Figures 9 and 10)

43The Fragment TransformationMethod Weused the frag-ment transformation method to align NAD-FAD-bindingresidues Each residue was treated as an individual unit andwas used to align the query protein 119878 with the bindingtemplate 119879 The structural unit consists of a triplet formedby the NndashC

120572ndashC atoms within a given residue 119878 denotes the

query protein of length 119898 and 119879 denotes the template of 119899residues The query protein 119878 of length119898 and the template 119879of 119899 residues can therefore be expressed in terms of tripletsas 119878 = 120590

1 120590

2 120590

119898 and 119879 = 120591

1 120591

2 120591

119899 where 120590

119894=

(119901

119873 119901

119862120572 119901

119862) 120591119895= (119902

119873 119902

119862120572 119902

119862) and 119901 and 119902 are PDB

coordinates for each atomA matrix of dimensions 119898 times 119899 was then constructed for

the residues of 119878 and 119879 as

119872 =

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

119872

11119872

12sdot sdot sdot 119872

1119899

119872

21119872

22sdot sdot sdot 119872

2119899

sdot sdot sdot sdot sdot sdot sdot sdot sdot sdot sdot sdot

119872

1198981119872

1198982sdot sdot sdot 119872

119898119899

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

(1)

where the element119872119894119895is a rigid-body transformation matrix

that transforms the triplet 120590119894to 120591119895(ie119872

119894119895120590

119894= 120591

119895)

44 Performing Triplet Clustering 119863119894119895119896119897is the Cartesian dis-

tance between the target 120591119897and the transformed triplet119872

119894119895120590

119896

providing a measure of how similarly the triplet pairs (120590119894 120591

119895)

and (120590119896 120591

119897) are oriented This allows clustering of triplet

fragments using the single-linkage algorithm [43] as followsIf for two triplet pairs (120590

119894 120591

119895) and (120590

119896 120591

119897) 119863119894119895119896119897lt 119863

0 119894 = 119896

and 119895 = 119897 then the triplets are clustered Let 1198661and 119866

2

be two clusters with the first containing (120590119894 120591

119895) and (120590

119896 120591

119897)

and the second containing (1205901198941015840 1205911198951015840) and (120590

1198961015840 1205911198971015840) If 119863119894119895

11989610158401198971015840lt

119863

0 then 119866

1and 119866

2are merged to form a new cluster 119866

3

where 1198663= 119866

1cup 119866

2 These procedures are performed

iteratively until no new clusters can be formed For each finalcluster 119866

120583 we can obtain the transformation matrix119872120583

119896119897and

aligned substructure pair 119878120583= ⋃

120590119896isin119866120583

120590

119896and 119879

120583= ⋃

120591119897isin119866120583

120591

119897

where 119866120583

has the minimum Cartesian distance whenusing119872120583

119896119897

45 Scoring Function For each residue 119894 the binding score119862119894

is defined as

119862

119894= MAX120590119894isin119866120583

(120576

120583times 119862

119877

120583times 119862

119861

120583times 119862

119863

120583) (2)

10 BioMed Research International

LYSLYS

GLY

ILE

GLU

GLU

GLY

LEU

SER

TRP

ASN

GLN

Ligand-binding residue template database

0123456

96 146

Query protein

The fragment transformation method

Scoring function

FAD

NAD

minus1

ALA

THR

SER

SER PHE

LYS

SER

SER

MET ARG

VAL

Figure 8 Schematic of the method for predicting NAD-FAD-binding sites

where 120576120583is the number of triplets of 119878

120583(ie the aligned

residues of the query structure)The alignment scores119862119877120583119862119861120583

and 119862119863120583are defined as

119862

119877

120583=

1

1 + RMSD (119878120583 119879

120583)

119862

119861

120583=

BLOSUM (119878120583 119879

120583)

BLOSUM (119879120583 119879

120583)

+ 1

119862

119863

120583=

DSSP (119878120583 119879

120583)

DSSP (119879120583 119879

120583)

+ 1

(3)

where RMSD (119878120583 119879

120583) is the RMSD of all 119862

120572atoms between

119878

120583and 119879

120583 BLOSUM (119878

120583 119879

120583) is the sequence alignment

score between 119878120583and 119879

120583calculated using the BLOSUM62

[41] substitution matrix BLOSUM (119879120583 119879

120583) is the maximum

sequence alignment score of 119879120583 DSSP (119878

120583 119879

120583) represents the

secondary structure alignment score based on a constructionsubstitution matrix [22] using the definition of DSSP [42]between 119878

120583and 119879

120583 and DSSP (119879

120583 119879

120583) is the maximum

secondary structure alignment score of 119879120583 The value of

RMSD (119878120583 119879

120583) should be lt3 A

For each residue 119894 we predict a geometric centerΘ120596119894of the

ligand by Θ120596119894= 119872

120583

119896119897

minus1

119871

120596 where 119871

120596is the geometric center

of the binding template type 120596 in template 119879 120596 representsthe three moieties of NADFAD nicotinamide adenosine

BioMed Research International 11

LYSLYS

GLY

ILEGLU

GLU GLY

LEUSER

TRP

ASN

GLN

(a)

LYS

GLYGLU

GLN

LEU

ASN

Nicotinamide-binding templates

(b)

SER

TRPPhosphate-binding templates

(c)

LYS

GLY

GLU

ILE

Adenosine-binding templates

(d)

Figure 9NAD-binding residue templates (a)The entireNAD-binding template (b)Nicotinamide-binding templates (c) Phosphate-bindingtemplates (d) Adenosine-binding templates

LYS

ARG

MET

ALA

THRSER

SER

SER

SER

PHE

ASN

VAL

(a)

Flavin-binding templates

SER

SER

SER

VAL ARG

THRASN

(b)

Phosphate-binding templates

SER

MET

LYS

(c)

Adenosine-binding templates

SER

ALAPHE

(d)

Figure 10 FAD-binding residue templates (a) The entire FAD-binding template (b) Flavin-binding templates (c) Phosphate-bindingtemplates (d) Adenosine-binding templates

12 BioMed Research International

and phosphate for NAD flavin adenosine and phosphatefor FAD The binding score 119862

119896is added to 119862

119894if the distance

betweenΘ120596119894andΘ120596

1015840

119896is between 3 and 9 A and120596 = 1205961015840 Finally

the normalized binding score 119885119862119894is calculated as

119885

119862

119894=

119862

119894minus 119862

119878119863

119862

(4)

where 119862 and 119878119863119862denote the mean and standard deviation

respectively of the binding score 119862119894

46 Performance Assessment The accuracy of predictingNAD-FAD-binding sites wasdefined as the number of truepositives and true negatives and was evaluated using a leave-one-out approach Accuracy (ACC) the true positive rate(TPR) and the false positive rate (FPR) were calculated usingtrue positive (TP) true negative (TN) false positive (FP) andfalse negative (FN) values as follows

ACC = TP + TNTP + TN + FP + FN

TPR = Sensitivity = TPTP + FN

FPR = 1 minus Specificity = FPFP + TN

(5)

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Authorsrsquo Contribution

Chih-Hao Lu and Chin-Sheng Yu developed and imple-mented the methods Chih-Hao Lu Chin-Sheng Yu Yu-Feng Lin and Jin-Yi Chen carried out the analysis Chih-Hao Lu and Yu-Feng Lin drafted the paper Chih-Hao Lusupervised the work All the authors have read and approvedthe content of the final paper Chih-Hao Lu and Chin-ShengYu contributed equally to this work

Acknowledgment

Thisworkwas supported byGrants from theNational ScienceCouncil (NSC 99-2113-M-039-002-MY2) and China MedicalUniversity (CMU99-N2-02) Taiwan to Chih-Hao Lu Thefunders had no role in study design data collection andanalysis decision to publish or preparation of the paper Theauthors are grateful to Yeong-Shin Lin (National Chiao TungUniversity Taiwan) for invaluable comments

References

[1] H M Berman J Westbrook Z Feng et al ldquoThe protein databankrdquo Nucleic Acids Research vol 28 no 1 pp 235ndash242 2000

[2] A Wilkinson J Day and R Bowater ldquoBacterial DNA ligasesrdquoMolecular Microbiology vol 40 no 6 pp 1241ndash1248 2001

[3] A Burkle ldquoPhysiology and pathophysiology of poly(ADP-ribosyl)ationrdquo BioEssays vol 23 no 9 pp 795ndash806 2001

[4] Q Zhang D W Piston and R H Goodman ldquoRegulation ofcorepressor function by nuclear NADHrdquo Science vol 295 no5561 pp 1895ndash1897 2002

[5] J S Smith and J D Boeke ldquoAn unusual form of transcriptionalsilencing in yeast ribosomalDNArdquoGenes andDevelopment vol11 no 2 pp 241ndash254 1997

[6] R M Anderson K J Bitterman J G Wood et al ldquoManipula-tion of a nuclear NAD+ salvage pathway delays aging withoutaltering steady-state NAD+ levelsrdquo The Journal of BiologicalChemistry vol 277 no 21 pp 18881ndash18890 2002

[7] J Rutter M Reick L C Wu and S L McKnight ldquoRegulationof crock and NPAS2 DNA binding by the redox state of NADcofactorsrdquo Science vol 293 no 5529 pp 510ndash514 2001

[8] K ChenM JMizianty and L Kurgan ldquoPrediction and analysisof nucleotide-binding residues using sequence and sequence-derived structural descriptorsrdquo Bioinformatics vol 28 no 3 pp331ndash341 2012

[9] M Saito M Go and T Shirai ldquoAn empirical approach fordetecting nucleotide-binding sites on proteinsrdquo Protein Engi-neering Design and Selection vol 19 no 2 pp 67ndash75 2006

[10] H R Ansari and G P S Raghava ldquoIdentification of NADinteracting residues in proteinsrdquo BMC Bioinformatics vol 11article 160 2010

[11] N K Mishra and G P S Raghava ldquoPrediction of FADinteracting residues in a protein from its primary sequenceusing evolutionary informationrdquo BMC Bioinformatics vol 11article S48 no 1 2010

[12] Z-P Liu L-Y Wu Y Wang X-S Zhang and L ChenldquoPrediction of protein-RNA binding sites by a random forestmethod with combined featuresrdquo Bioinformatics vol 26 no 13pp 1616ndash1622 2010

[13] L Wang Z P Liu X S Zhang and L Chen ldquoPrediction of hotspots in protein interfaces using a random forest model withhybrid featuresrdquo Protein Engineering Design and Selection vol25 no 3 pp 119ndash126 2012

[14] J S Chauhan N KMishra andG P S Raghava ldquoPrediction ofGTP interacting residues dipeptides and tripeptides in a pro-tein from its evolutionary informationrdquo BMC Bioinformaticsvol 11 article 301 2010

[15] A Roy and Y Zhang ldquoRecognizing protein-ligand binding sitesby global structural alignment and local geometry refinementrdquoStructure vol 20 no 6 pp 987ndash997 2012

[16] L Xie and P E Bourne ldquoDetecting evolutionary relationshipsacross existing fold space using sequence order-independentprofile-profile alignmentsrdquo Proceedings of the National Academyof Sciences of the United States of America vol 105 no 14 pp5441ndash5446 2008

[17] J Yang A Roy and Y Zhang ldquoProtein-ligand binding siterecognition using complementary binding-specific substruc-ture comparison and sequence profile alignmentrdquo Bioinformat-ics vol 29 no 20 pp 2588ndash2595 2013

[18] Y T Yan and W-H Li ldquoIdentification of protein functionalsurfaces by the concept of a split pocketrdquo Proteins StructureFunction and Bioinformatics vol 76 no 4 pp 959ndash976 2009

[19] K A Dill ldquoDominant forces in protein foldingrdquo Biochemistryvol 29 no 31 pp 7133ndash7155 1990

[20] S Govindarajan and R A Goldstein ldquoEvolution of modelproteins on a foldability landscaperdquo Proteins vol 29 no 4 pp461ndash466 1997

BioMed Research International 13

[21] G Parisi and J Echave ldquoStructural constraints and emergence ofsequence patterns in protein evolutionrdquo Molecular Biology andEvolution vol 18 no 5 pp 750ndash756 2001

[22] C H Lu Y S Lin Y C Chen C S Yu S Y Chang andJ K Hwang ldquoThe fragment transformation method to detectthe protein structural motifsrdquo Proteins Structure Function andGenetics vol 63 no 3 pp 636ndash643 2006

[23] L Schrodinger ThePyMOLMolecular Graphics System Version13r1 2010

[24] U Dengler K Niefind M Kieszlig and D Schomburg ldquoCrystalstructure of a ternary complex of D-2-hydroxy-isocaproatedehydrogenase from Lactobacillus casei NAD+ and 2-oxoisocaproate at 19 A resolutionrdquo Journal of MolecularBiology vol 267 no 3 pp 640ndash660 1997

[25] E Gross C S Sevier A Vala C A Kaiser and D Fass ldquoA newFAD-binding fold and intersubunit disulfide shuttle in the thioloxidase Erv2prdquoNature Structural Biology vol 9 no 1 pp 61ndash672002

[26] S V Antonyuk R W Strange M J Ellis et al ldquoStructure ofd-lactate dehydrogenase from Aquifex aeolicus complexed withNAD+ and lactic acid (or pyruvate)rdquo Acta Crystallographica FStructural Biology and Crystallization Communications vol 65part 12 pp 1209ndash1213 2009

[27] C K Wu T A Dailey H A Dailey B C Wang and J PRose ldquoThe crystal structure of augmenter of liver regenerationa mammalian FAD-dependent sulfhydryl oxidaserdquo ProteinScience vol 12 no 5 pp 1109ndash1118 2003

[28] J R Thompson J K Bell J Bratt G A Grant and LJ Banaszak ldquoVmax regulation through domain and subunitchanges The active form of phosphoglycerate dehydrogenaserdquoBiochemistry vol 44 no 15 pp 5763ndash5773 2005

[29] M Nardini S Spano C Cericola et al ldquoCtBPBARS a dual-function protein involved in transcription co-repression andGolgi membrane fissionrdquo EMBO Journal vol 22 no 12 pp3122ndash3130 2003

[30] R Kort H Komori S I Adachi K Miki and A Eker ldquoDNAapophotolyase from Anacystis nidulans 18 A structure 8-HDF reconstitution and X-ray-induced FAD reductionrdquo ActaCrystallographica Section D Biological Crystallography vol 60no 7 pp 1205ndash1213 2004

[31] Y Huang R Baxter B S Smith C L Partch C L Colbert andJ Deisenhofer ldquoCrystal structure of cryptochrome 3 from Ara-bidopsis thaliana and its implications for photolyase activityrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 103 no 47 pp 17701ndash17706 2006

[32] J B Thoden T M Wohlers J L Fridovich-Keil and H MHolden ldquoMolecular basis for severe epimerase deficiency galac-tosemia X-ray structure of the humanV94M-substitutedUDP-galactose 4-epimeraserdquoThe Journal of Biological Chemistry vol276 no 23 pp 20617ndash20623 2001

[33] C A Brautigam B S Smith Z Ma et al ldquoStructure of thephotolyase-like domain of cryptochrome 1 from Arabidopsisthalianardquo Proceedings of the National Academy of Sciences of theUnited States of America vol 101 no 33 pp 12142ndash12147 2004

[34] H Komori R Masui S Kuramitsu et al ldquoCrystal structure ofthermostable DNA photolyase pyrimidine-dimer recognitionmechanismrdquo Proceedings of the National Academy of Sciences ofthe United States of America vol 98 no 24 pp 13560ndash135652001

[35] F Todone M A Vanoni A Mozzarelli et al ldquoActive siteplasticity in D-amino acid oxidase a crystallographic analysisrdquoBiochemistry vol 36 no 19 pp 5853ndash5860 1997

[36] I F Sevrioukova H Li and T L Poulos ldquoCrystal structure ofputidaredoxin reductase from Pseudomonas putida the finalstructural component of the cytochromeP450cammonooxyge-naserdquo Journal of Molecular Biology vol 336 no 4 pp 889ndash9022004

[37] S Y Song Y B Xu Z J Lin and C L Tsou ldquoStructure ofactive site carboxymethylated D-glyceraldehyde-3-phosphatedehydrogenase from Palinurus versicolorrdquo Journal of MolecularBiology vol 287 no 4 pp 719ndash725 1999

[38] L Pollegioni KDiederichs GMolla et al ldquoYeastD-amino acidoxidase structural basis of its catalytic propertiesrdquo Journal ofMolecular Biology vol 324 no 3 pp 535ndash546 2002

[39] E C Settembre P C Dorrestein J H Park A M Augustine TP Begley and S E Ealick ldquoStructural and mechanistic studieson thiO a glycine oxidase essential for thiamin biosynthesis inBacillus subtilisrdquo Biochemistry vol 42 no 10 pp 2971ndash29812003

[40] J Yang A Roy and Y Zhang ldquoBioLiP a semi-manually curateddatabase for biologically relevant ligand-protein interactionsrdquoNucleic Acids Research vol 41 no 1 pp D1096ndashD1103 2013

[41] S Henikoff and J G Henikoff ldquoAmino acid substitution matri-ces from protein blocksrdquo Proceedings of the National Academyof Sciences of the United States of America vol 89 no 22 pp10915ndash10919 1992

[42] W Kabsch and C Sander ldquoDictionary of protein secondarystructure pattern recognition of hydrogen-bonded and geo-metrical featuresrdquo Biopolymers vol 22 no 12 pp 2577ndash26371983

[43] J C Gower and G J S Ross ldquoMinimum spanning trees andsingle-linkage cluster analysisrdquo Journal of the Royal StatisticalSociety vol 18 no 1 article 11 1969

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 6: Research Article Predicting Flavin and Nicotinamide ...downloads.hindawi.com/journals/bmri/2015/402536.pdf · Adenosine-binding of FAD. (e) Phosphate-binding of FAD. (f) Flav in-binding

6 BioMed Research International

(a)

(b) (c)

Figure 4 Identification of NAD-binding sites (a) Chain C of 5-carboxymethyl-2-hydroxymuconate semialdehyde dehydrogenase (PDBID2D4E) was the query protein Templates were constructed from (b) aldehyde dehydrogenase (chain A PDB ID3B4W) and (c) 1-pyrroline-5-carboxylate dehydrogenase (chain A PDB ID2EHU)

identified based on aldehyde dehydrogenase (chain A PDBID3B4W) three phosphate-binding and eight adenosine-binding residues were identified based on 1-pyrroline-5-carboxylate dehydrogenase (chain A PDB ID2EHU) andthree were identified based on other protein templates

For the FAD-binding proteins our method accuratelypredicted chain A of deoxyribodipyrimidine photolyase(PDB ID1OWL) [30] which contains 24 residues that bindFAD (Figure 5) and only six false positives occurred Threeadenosine-binding residues were identified based on humancryptochrome DASH (chain X PDB ID2IJG) [31 32]six phosphate-binding residues were identified based onphotolyase-like domain of cryptochrome 1 (chain A PDBID1U3C) [33] eleven flavin-binding residues were identifiedbased on photolyase (chain A PDB ID1IQR) [34] and fourwere identified based on other protein templates In addition30 FAD-binding residues were accurately predicted withinchain H of D-amino acid oxidase (PDB ID1DDO) [35]with 14 false positives Five adenosine-binding residues werepredicted based on putidaredoxin reductase (chain B PDB

ID1Q1R) [36 37] three adenosine-binding and nine flavin-binding residues based on D-amino acid oxidase (chain APDB ID1C0I) [38] three phosphate-binding and five flavin-binding residues based on glycine oxidase (chain B PDBID1NG3) [39] and five based on other protein templates(Figure 6)

3 Discussion

Small molecular cofactors (ligands) are essential for cells toperform numerous biological functions NAD and FAD forexample bind to proteins that play critical roles in energytransfer energy storage and signal transduction to name justa few To understand the mechanism by which these ligandsaffect protein function it is important to identify ligand-binding residues within relevant proteins The experimentalidentification of these interacting residues is so difficulthowever that computational methods to accomplish this taskare in high demand

BioMed Research International 7

(a)

(b) (c) (d)

Figure 5 Identification of FAD-binding sites (a) Chain A of deoxyribodipyrimidine photolyase (PDB ID1OWL) was the query proteinTemplates were constructed from (b) human cryptochrome DASH (chain X PDB ID2IJG) (c) photolyase-like domain of cryptochrome 1(chain A PDB ID1U3C) and (d) photolyase (chain A PDB ID1IQR)

Here we developed a structure comparison method thatuses both sequence and structure information to predictNAD-FAD-binding residues within proteins This approachalso provides valuable information concerning the microen-vironment of the protein-ligand interactionThe compositionof NAD-FAD-binding residues that we identified here isgenerally similar to previous studies [10 11] Interestinglyglycine was the most frequent binding residue bindingto NAD through phosphate or adenosine moieties moreoften than through the nicotinamide moiety In contrastarginine preferentially interacted with phosphate moieties

and aspartic acid preferentially interacted with adenosinemoieties of NAD whereas threonine cysteine and histidinebound to nicotinamide The most common residue withinFAD-binding sites was also glycine which preferentiallybound phosphate and adenosine moieties Serine interactedwith phosphate moieties whereas cysteine tyrosine andtryptophan primarily bound to nicotinamide By takingadvantage of this kind of structural information detailsconcerning these critical binding sites may be revealed Toinvestigate the influence of amino acids on prediction per-formance the sensitivity and specificity associated with each

8 BioMed Research International

(a)

(b) (c) (d)

Figure 6 Identification of FAD-binding sites (a) Chain H of D-amino acid oxidase (PDB ID1DDO) was the query protein Templates wereconstructed from (b) putidaredoxin reductase (chain B PDB ID1Q1R) (c) D-amino acid oxidase (chain A PDB ID1C0I) and (d) glycineoxidase (chain B PDB ID1NG3)

0

02

04

06

08

1

Perfo

rman

ce

Type of residue

SensitivitySpecificity

GLY

ALA VA

LLE

UIL

ESE

RTH

RCY

SM

ETPR

OPH

ETY

RTR

PLY

SA

RG HIS

ASP

ASN

GLU

GLN

(a)

0

02

04

06

08

1

Perfo

rman

ce

Type of residue

SensitivitySpecificity

GLY

ALA VA

LLE

UIL

ESE

RTH

RCY

SM

ETPR

OPH

ETY

RTR

PLY

SA

RG HIS

ASP

ASN

GLU

GLN

(b)

Figure 7 Sensitivity and specificity associated with each amino acid in NAD-FAD-binding-site predictions (a) NAD (b) FAD

residue were calculated (Figure 7) For NAD-binding-sitepredictions specificity for each residue was excellent (0927ndash0966) but sensitivity was relatively low for phenylalaninetryptophan arginine and glutaminewhichwere less than 05For FAD-binding sites all residues achieved high specificity(0933ndash0971) and sensitivity (0532ndash0791) It should be noted

that the ratio of NAD-FAD-binding residues to nonbindingresidues is about 1 to 16 in our dataset This large differencemight cause lots of false positives when predicted That is thereason for high specificity and accuracy but low sensitivity inour prediction results Hence the positions of false positiveresidues in sequence were also investigated 20 and 25 of

BioMed Research International 9

false positive residues of NAD- and FAD-binding predictionoccurred next to the true positive residues in sequence Itwas shown that these residues are also located near theligand in the coordinate space If these residues were treatedas true positive residues our prediction results of NAD-binding yielded 7155 sensitivity and 061 MCC at a 5FPR threshold Under the same conditions FAD-binding-site predictions yielded 7334 sensitivity and an MCC of064 Compared with other prediction methods ours did notuse protein evolutionary information but only used proteinstructure and did not need to use equal number dataset fortraining but predicted whole-proteins through comparingstructures of template database Our results yielded excellentprediction performancewhen analyzingNAD-FAD-bindingresidues and thus provide important details concerning thebinding-site microenvironment This approach thereforemay be used to predict putative NAD-FAD-binding proteinsand the specific residues involved in the interaction

4 Methods

41 Overview We extracted structures of proteins boundto NAD or FAD from PDB and constructed a database ofNAD-FAD-binding residue templates Residues that weredefined as binding residues by the ligand-binding databaseBioLiP [40] were included in the template Query proteinstructures were then compared with each template in thedatabase using a ldquoleave-one-outrdquo comparison method Thefragment transformation method [22] was used to alignquery and template structures After comparing the localprotein structure each residue was assigned a score basedon both protein sequence and structure Sequence similaritywas calculated using the BLOSUM62 substitutionmatrix [41]whereas structural similarity was calculated by measuringthe root mean square deviation (RMSD) of the C120572 carbonsfrom local structure alignments and using a secondarystructure substitutionmatrix [22] according to theDictionaryof Secondary Structure of Proteinsrsquo (DSSP) definition ofsecondary structure [42] Residues with an alignment scorethat exceeded a predetermined threshold were predicted tobind NADFADThis method is illustrated in Figure 8

42 NAD-FAD-Binding Proteins and Binding Residue Tem-plates We adopted the same datasets with previous research[10 11] All protein complexes were collected from PDBand had pairwise sequence identity lt40 by using CD-HITProteins chains that are not involved in NADFAD bindingwere excluded Residues that were defined as binding ornonbinding residues by using the ligand-binding databaseBioLiP The main dataset included 184 and 165 polypeptidechains for NAD and FAD respectively Because NAD iscomposed of a nicotinamide moiety an adenosine moietyand a phosphate moiety binding residues were dividedinto three groups nicotinamide binding adenosine bindingand phosphate binding FAD-binding sitessimilarly contain

flavin-binding residues adenosine-binding residues andphosphate-binding residues Groups of residues that con-tained more than or equal to two binding residues wereconsidered a binding residue template (see Figures 9 and 10)

43The Fragment TransformationMethod Weused the frag-ment transformation method to align NAD-FAD-bindingresidues Each residue was treated as an individual unit andwas used to align the query protein 119878 with the bindingtemplate 119879 The structural unit consists of a triplet formedby the NndashC

120572ndashC atoms within a given residue 119878 denotes the

query protein of length 119898 and 119879 denotes the template of 119899residues The query protein 119878 of length119898 and the template 119879of 119899 residues can therefore be expressed in terms of tripletsas 119878 = 120590

1 120590

2 120590

119898 and 119879 = 120591

1 120591

2 120591

119899 where 120590

119894=

(119901

119873 119901

119862120572 119901

119862) 120591119895= (119902

119873 119902

119862120572 119902

119862) and 119901 and 119902 are PDB

coordinates for each atomA matrix of dimensions 119898 times 119899 was then constructed for

the residues of 119878 and 119879 as

119872 =

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

119872

11119872

12sdot sdot sdot 119872

1119899

119872

21119872

22sdot sdot sdot 119872

2119899

sdot sdot sdot sdot sdot sdot sdot sdot sdot sdot sdot sdot

119872

1198981119872

1198982sdot sdot sdot 119872

119898119899

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

(1)

where the element119872119894119895is a rigid-body transformation matrix

that transforms the triplet 120590119894to 120591119895(ie119872

119894119895120590

119894= 120591

119895)

44 Performing Triplet Clustering 119863119894119895119896119897is the Cartesian dis-

tance between the target 120591119897and the transformed triplet119872

119894119895120590

119896

providing a measure of how similarly the triplet pairs (120590119894 120591

119895)

and (120590119896 120591

119897) are oriented This allows clustering of triplet

fragments using the single-linkage algorithm [43] as followsIf for two triplet pairs (120590

119894 120591

119895) and (120590

119896 120591

119897) 119863119894119895119896119897lt 119863

0 119894 = 119896

and 119895 = 119897 then the triplets are clustered Let 1198661and 119866

2

be two clusters with the first containing (120590119894 120591

119895) and (120590

119896 120591

119897)

and the second containing (1205901198941015840 1205911198951015840) and (120590

1198961015840 1205911198971015840) If 119863119894119895

11989610158401198971015840lt

119863

0 then 119866

1and 119866

2are merged to form a new cluster 119866

3

where 1198663= 119866

1cup 119866

2 These procedures are performed

iteratively until no new clusters can be formed For each finalcluster 119866

120583 we can obtain the transformation matrix119872120583

119896119897and

aligned substructure pair 119878120583= ⋃

120590119896isin119866120583

120590

119896and 119879

120583= ⋃

120591119897isin119866120583

120591

119897

where 119866120583

has the minimum Cartesian distance whenusing119872120583

119896119897

45 Scoring Function For each residue 119894 the binding score119862119894

is defined as

119862

119894= MAX120590119894isin119866120583

(120576

120583times 119862

119877

120583times 119862

119861

120583times 119862

119863

120583) (2)

10 BioMed Research International

LYSLYS

GLY

ILE

GLU

GLU

GLY

LEU

SER

TRP

ASN

GLN

Ligand-binding residue template database

0123456

96 146

Query protein

The fragment transformation method

Scoring function

FAD

NAD

minus1

ALA

THR

SER

SER PHE

LYS

SER

SER

MET ARG

VAL

Figure 8 Schematic of the method for predicting NAD-FAD-binding sites

where 120576120583is the number of triplets of 119878

120583(ie the aligned

residues of the query structure)The alignment scores119862119877120583119862119861120583

and 119862119863120583are defined as

119862

119877

120583=

1

1 + RMSD (119878120583 119879

120583)

119862

119861

120583=

BLOSUM (119878120583 119879

120583)

BLOSUM (119879120583 119879

120583)

+ 1

119862

119863

120583=

DSSP (119878120583 119879

120583)

DSSP (119879120583 119879

120583)

+ 1

(3)

where RMSD (119878120583 119879

120583) is the RMSD of all 119862

120572atoms between

119878

120583and 119879

120583 BLOSUM (119878

120583 119879

120583) is the sequence alignment

score between 119878120583and 119879

120583calculated using the BLOSUM62

[41] substitution matrix BLOSUM (119879120583 119879

120583) is the maximum

sequence alignment score of 119879120583 DSSP (119878

120583 119879

120583) represents the

secondary structure alignment score based on a constructionsubstitution matrix [22] using the definition of DSSP [42]between 119878

120583and 119879

120583 and DSSP (119879

120583 119879

120583) is the maximum

secondary structure alignment score of 119879120583 The value of

RMSD (119878120583 119879

120583) should be lt3 A

For each residue 119894 we predict a geometric centerΘ120596119894of the

ligand by Θ120596119894= 119872

120583

119896119897

minus1

119871

120596 where 119871

120596is the geometric center

of the binding template type 120596 in template 119879 120596 representsthe three moieties of NADFAD nicotinamide adenosine

BioMed Research International 11

LYSLYS

GLY

ILEGLU

GLU GLY

LEUSER

TRP

ASN

GLN

(a)

LYS

GLYGLU

GLN

LEU

ASN

Nicotinamide-binding templates

(b)

SER

TRPPhosphate-binding templates

(c)

LYS

GLY

GLU

ILE

Adenosine-binding templates

(d)

Figure 9NAD-binding residue templates (a)The entireNAD-binding template (b)Nicotinamide-binding templates (c) Phosphate-bindingtemplates (d) Adenosine-binding templates

LYS

ARG

MET

ALA

THRSER

SER

SER

SER

PHE

ASN

VAL

(a)

Flavin-binding templates

SER

SER

SER

VAL ARG

THRASN

(b)

Phosphate-binding templates

SER

MET

LYS

(c)

Adenosine-binding templates

SER

ALAPHE

(d)

Figure 10 FAD-binding residue templates (a) The entire FAD-binding template (b) Flavin-binding templates (c) Phosphate-bindingtemplates (d) Adenosine-binding templates

12 BioMed Research International

and phosphate for NAD flavin adenosine and phosphatefor FAD The binding score 119862

119896is added to 119862

119894if the distance

betweenΘ120596119894andΘ120596

1015840

119896is between 3 and 9 A and120596 = 1205961015840 Finally

the normalized binding score 119885119862119894is calculated as

119885

119862

119894=

119862

119894minus 119862

119878119863

119862

(4)

where 119862 and 119878119863119862denote the mean and standard deviation

respectively of the binding score 119862119894

46 Performance Assessment The accuracy of predictingNAD-FAD-binding sites wasdefined as the number of truepositives and true negatives and was evaluated using a leave-one-out approach Accuracy (ACC) the true positive rate(TPR) and the false positive rate (FPR) were calculated usingtrue positive (TP) true negative (TN) false positive (FP) andfalse negative (FN) values as follows

ACC = TP + TNTP + TN + FP + FN

TPR = Sensitivity = TPTP + FN

FPR = 1 minus Specificity = FPFP + TN

(5)

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Authorsrsquo Contribution

Chih-Hao Lu and Chin-Sheng Yu developed and imple-mented the methods Chih-Hao Lu Chin-Sheng Yu Yu-Feng Lin and Jin-Yi Chen carried out the analysis Chih-Hao Lu and Yu-Feng Lin drafted the paper Chih-Hao Lusupervised the work All the authors have read and approvedthe content of the final paper Chih-Hao Lu and Chin-ShengYu contributed equally to this work

Acknowledgment

Thisworkwas supported byGrants from theNational ScienceCouncil (NSC 99-2113-M-039-002-MY2) and China MedicalUniversity (CMU99-N2-02) Taiwan to Chih-Hao Lu Thefunders had no role in study design data collection andanalysis decision to publish or preparation of the paper Theauthors are grateful to Yeong-Shin Lin (National Chiao TungUniversity Taiwan) for invaluable comments

References

[1] H M Berman J Westbrook Z Feng et al ldquoThe protein databankrdquo Nucleic Acids Research vol 28 no 1 pp 235ndash242 2000

[2] A Wilkinson J Day and R Bowater ldquoBacterial DNA ligasesrdquoMolecular Microbiology vol 40 no 6 pp 1241ndash1248 2001

[3] A Burkle ldquoPhysiology and pathophysiology of poly(ADP-ribosyl)ationrdquo BioEssays vol 23 no 9 pp 795ndash806 2001

[4] Q Zhang D W Piston and R H Goodman ldquoRegulation ofcorepressor function by nuclear NADHrdquo Science vol 295 no5561 pp 1895ndash1897 2002

[5] J S Smith and J D Boeke ldquoAn unusual form of transcriptionalsilencing in yeast ribosomalDNArdquoGenes andDevelopment vol11 no 2 pp 241ndash254 1997

[6] R M Anderson K J Bitterman J G Wood et al ldquoManipula-tion of a nuclear NAD+ salvage pathway delays aging withoutaltering steady-state NAD+ levelsrdquo The Journal of BiologicalChemistry vol 277 no 21 pp 18881ndash18890 2002

[7] J Rutter M Reick L C Wu and S L McKnight ldquoRegulationof crock and NPAS2 DNA binding by the redox state of NADcofactorsrdquo Science vol 293 no 5529 pp 510ndash514 2001

[8] K ChenM JMizianty and L Kurgan ldquoPrediction and analysisof nucleotide-binding residues using sequence and sequence-derived structural descriptorsrdquo Bioinformatics vol 28 no 3 pp331ndash341 2012

[9] M Saito M Go and T Shirai ldquoAn empirical approach fordetecting nucleotide-binding sites on proteinsrdquo Protein Engi-neering Design and Selection vol 19 no 2 pp 67ndash75 2006

[10] H R Ansari and G P S Raghava ldquoIdentification of NADinteracting residues in proteinsrdquo BMC Bioinformatics vol 11article 160 2010

[11] N K Mishra and G P S Raghava ldquoPrediction of FADinteracting residues in a protein from its primary sequenceusing evolutionary informationrdquo BMC Bioinformatics vol 11article S48 no 1 2010

[12] Z-P Liu L-Y Wu Y Wang X-S Zhang and L ChenldquoPrediction of protein-RNA binding sites by a random forestmethod with combined featuresrdquo Bioinformatics vol 26 no 13pp 1616ndash1622 2010

[13] L Wang Z P Liu X S Zhang and L Chen ldquoPrediction of hotspots in protein interfaces using a random forest model withhybrid featuresrdquo Protein Engineering Design and Selection vol25 no 3 pp 119ndash126 2012

[14] J S Chauhan N KMishra andG P S Raghava ldquoPrediction ofGTP interacting residues dipeptides and tripeptides in a pro-tein from its evolutionary informationrdquo BMC Bioinformaticsvol 11 article 301 2010

[15] A Roy and Y Zhang ldquoRecognizing protein-ligand binding sitesby global structural alignment and local geometry refinementrdquoStructure vol 20 no 6 pp 987ndash997 2012

[16] L Xie and P E Bourne ldquoDetecting evolutionary relationshipsacross existing fold space using sequence order-independentprofile-profile alignmentsrdquo Proceedings of the National Academyof Sciences of the United States of America vol 105 no 14 pp5441ndash5446 2008

[17] J Yang A Roy and Y Zhang ldquoProtein-ligand binding siterecognition using complementary binding-specific substruc-ture comparison and sequence profile alignmentrdquo Bioinformat-ics vol 29 no 20 pp 2588ndash2595 2013

[18] Y T Yan and W-H Li ldquoIdentification of protein functionalsurfaces by the concept of a split pocketrdquo Proteins StructureFunction and Bioinformatics vol 76 no 4 pp 959ndash976 2009

[19] K A Dill ldquoDominant forces in protein foldingrdquo Biochemistryvol 29 no 31 pp 7133ndash7155 1990

[20] S Govindarajan and R A Goldstein ldquoEvolution of modelproteins on a foldability landscaperdquo Proteins vol 29 no 4 pp461ndash466 1997

BioMed Research International 13

[21] G Parisi and J Echave ldquoStructural constraints and emergence ofsequence patterns in protein evolutionrdquo Molecular Biology andEvolution vol 18 no 5 pp 750ndash756 2001

[22] C H Lu Y S Lin Y C Chen C S Yu S Y Chang andJ K Hwang ldquoThe fragment transformation method to detectthe protein structural motifsrdquo Proteins Structure Function andGenetics vol 63 no 3 pp 636ndash643 2006

[23] L Schrodinger ThePyMOLMolecular Graphics System Version13r1 2010

[24] U Dengler K Niefind M Kieszlig and D Schomburg ldquoCrystalstructure of a ternary complex of D-2-hydroxy-isocaproatedehydrogenase from Lactobacillus casei NAD+ and 2-oxoisocaproate at 19 A resolutionrdquo Journal of MolecularBiology vol 267 no 3 pp 640ndash660 1997

[25] E Gross C S Sevier A Vala C A Kaiser and D Fass ldquoA newFAD-binding fold and intersubunit disulfide shuttle in the thioloxidase Erv2prdquoNature Structural Biology vol 9 no 1 pp 61ndash672002

[26] S V Antonyuk R W Strange M J Ellis et al ldquoStructure ofd-lactate dehydrogenase from Aquifex aeolicus complexed withNAD+ and lactic acid (or pyruvate)rdquo Acta Crystallographica FStructural Biology and Crystallization Communications vol 65part 12 pp 1209ndash1213 2009

[27] C K Wu T A Dailey H A Dailey B C Wang and J PRose ldquoThe crystal structure of augmenter of liver regenerationa mammalian FAD-dependent sulfhydryl oxidaserdquo ProteinScience vol 12 no 5 pp 1109ndash1118 2003

[28] J R Thompson J K Bell J Bratt G A Grant and LJ Banaszak ldquoVmax regulation through domain and subunitchanges The active form of phosphoglycerate dehydrogenaserdquoBiochemistry vol 44 no 15 pp 5763ndash5773 2005

[29] M Nardini S Spano C Cericola et al ldquoCtBPBARS a dual-function protein involved in transcription co-repression andGolgi membrane fissionrdquo EMBO Journal vol 22 no 12 pp3122ndash3130 2003

[30] R Kort H Komori S I Adachi K Miki and A Eker ldquoDNAapophotolyase from Anacystis nidulans 18 A structure 8-HDF reconstitution and X-ray-induced FAD reductionrdquo ActaCrystallographica Section D Biological Crystallography vol 60no 7 pp 1205ndash1213 2004

[31] Y Huang R Baxter B S Smith C L Partch C L Colbert andJ Deisenhofer ldquoCrystal structure of cryptochrome 3 from Ara-bidopsis thaliana and its implications for photolyase activityrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 103 no 47 pp 17701ndash17706 2006

[32] J B Thoden T M Wohlers J L Fridovich-Keil and H MHolden ldquoMolecular basis for severe epimerase deficiency galac-tosemia X-ray structure of the humanV94M-substitutedUDP-galactose 4-epimeraserdquoThe Journal of Biological Chemistry vol276 no 23 pp 20617ndash20623 2001

[33] C A Brautigam B S Smith Z Ma et al ldquoStructure of thephotolyase-like domain of cryptochrome 1 from Arabidopsisthalianardquo Proceedings of the National Academy of Sciences of theUnited States of America vol 101 no 33 pp 12142ndash12147 2004

[34] H Komori R Masui S Kuramitsu et al ldquoCrystal structure ofthermostable DNA photolyase pyrimidine-dimer recognitionmechanismrdquo Proceedings of the National Academy of Sciences ofthe United States of America vol 98 no 24 pp 13560ndash135652001

[35] F Todone M A Vanoni A Mozzarelli et al ldquoActive siteplasticity in D-amino acid oxidase a crystallographic analysisrdquoBiochemistry vol 36 no 19 pp 5853ndash5860 1997

[36] I F Sevrioukova H Li and T L Poulos ldquoCrystal structure ofputidaredoxin reductase from Pseudomonas putida the finalstructural component of the cytochromeP450cammonooxyge-naserdquo Journal of Molecular Biology vol 336 no 4 pp 889ndash9022004

[37] S Y Song Y B Xu Z J Lin and C L Tsou ldquoStructure ofactive site carboxymethylated D-glyceraldehyde-3-phosphatedehydrogenase from Palinurus versicolorrdquo Journal of MolecularBiology vol 287 no 4 pp 719ndash725 1999

[38] L Pollegioni KDiederichs GMolla et al ldquoYeastD-amino acidoxidase structural basis of its catalytic propertiesrdquo Journal ofMolecular Biology vol 324 no 3 pp 535ndash546 2002

[39] E C Settembre P C Dorrestein J H Park A M Augustine TP Begley and S E Ealick ldquoStructural and mechanistic studieson thiO a glycine oxidase essential for thiamin biosynthesis inBacillus subtilisrdquo Biochemistry vol 42 no 10 pp 2971ndash29812003

[40] J Yang A Roy and Y Zhang ldquoBioLiP a semi-manually curateddatabase for biologically relevant ligand-protein interactionsrdquoNucleic Acids Research vol 41 no 1 pp D1096ndashD1103 2013

[41] S Henikoff and J G Henikoff ldquoAmino acid substitution matri-ces from protein blocksrdquo Proceedings of the National Academyof Sciences of the United States of America vol 89 no 22 pp10915ndash10919 1992

[42] W Kabsch and C Sander ldquoDictionary of protein secondarystructure pattern recognition of hydrogen-bonded and geo-metrical featuresrdquo Biopolymers vol 22 no 12 pp 2577ndash26371983

[43] J C Gower and G J S Ross ldquoMinimum spanning trees andsingle-linkage cluster analysisrdquo Journal of the Royal StatisticalSociety vol 18 no 1 article 11 1969

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 7: Research Article Predicting Flavin and Nicotinamide ...downloads.hindawi.com/journals/bmri/2015/402536.pdf · Adenosine-binding of FAD. (e) Phosphate-binding of FAD. (f) Flav in-binding

BioMed Research International 7

(a)

(b) (c) (d)

Figure 5 Identification of FAD-binding sites (a) Chain A of deoxyribodipyrimidine photolyase (PDB ID1OWL) was the query proteinTemplates were constructed from (b) human cryptochrome DASH (chain X PDB ID2IJG) (c) photolyase-like domain of cryptochrome 1(chain A PDB ID1U3C) and (d) photolyase (chain A PDB ID1IQR)

Here we developed a structure comparison method thatuses both sequence and structure information to predictNAD-FAD-binding residues within proteins This approachalso provides valuable information concerning the microen-vironment of the protein-ligand interactionThe compositionof NAD-FAD-binding residues that we identified here isgenerally similar to previous studies [10 11] Interestinglyglycine was the most frequent binding residue bindingto NAD through phosphate or adenosine moieties moreoften than through the nicotinamide moiety In contrastarginine preferentially interacted with phosphate moieties

and aspartic acid preferentially interacted with adenosinemoieties of NAD whereas threonine cysteine and histidinebound to nicotinamide The most common residue withinFAD-binding sites was also glycine which preferentiallybound phosphate and adenosine moieties Serine interactedwith phosphate moieties whereas cysteine tyrosine andtryptophan primarily bound to nicotinamide By takingadvantage of this kind of structural information detailsconcerning these critical binding sites may be revealed Toinvestigate the influence of amino acids on prediction per-formance the sensitivity and specificity associated with each

8 BioMed Research International

(a)

(b) (c) (d)

Figure 6 Identification of FAD-binding sites (a) Chain H of D-amino acid oxidase (PDB ID1DDO) was the query protein Templates wereconstructed from (b) putidaredoxin reductase (chain B PDB ID1Q1R) (c) D-amino acid oxidase (chain A PDB ID1C0I) and (d) glycineoxidase (chain B PDB ID1NG3)

0

02

04

06

08

1

Perfo

rman

ce

Type of residue

SensitivitySpecificity

GLY

ALA VA

LLE

UIL

ESE

RTH

RCY

SM

ETPR

OPH

ETY

RTR

PLY

SA

RG HIS

ASP

ASN

GLU

GLN

(a)

0

02

04

06

08

1

Perfo

rman

ce

Type of residue

SensitivitySpecificity

GLY

ALA VA

LLE

UIL

ESE

RTH

RCY

SM

ETPR

OPH

ETY

RTR

PLY

SA

RG HIS

ASP

ASN

GLU

GLN

(b)

Figure 7 Sensitivity and specificity associated with each amino acid in NAD-FAD-binding-site predictions (a) NAD (b) FAD

residue were calculated (Figure 7) For NAD-binding-sitepredictions specificity for each residue was excellent (0927ndash0966) but sensitivity was relatively low for phenylalaninetryptophan arginine and glutaminewhichwere less than 05For FAD-binding sites all residues achieved high specificity(0933ndash0971) and sensitivity (0532ndash0791) It should be noted

that the ratio of NAD-FAD-binding residues to nonbindingresidues is about 1 to 16 in our dataset This large differencemight cause lots of false positives when predicted That is thereason for high specificity and accuracy but low sensitivity inour prediction results Hence the positions of false positiveresidues in sequence were also investigated 20 and 25 of

BioMed Research International 9

false positive residues of NAD- and FAD-binding predictionoccurred next to the true positive residues in sequence Itwas shown that these residues are also located near theligand in the coordinate space If these residues were treatedas true positive residues our prediction results of NAD-binding yielded 7155 sensitivity and 061 MCC at a 5FPR threshold Under the same conditions FAD-binding-site predictions yielded 7334 sensitivity and an MCC of064 Compared with other prediction methods ours did notuse protein evolutionary information but only used proteinstructure and did not need to use equal number dataset fortraining but predicted whole-proteins through comparingstructures of template database Our results yielded excellentprediction performancewhen analyzingNAD-FAD-bindingresidues and thus provide important details concerning thebinding-site microenvironment This approach thereforemay be used to predict putative NAD-FAD-binding proteinsand the specific residues involved in the interaction

4 Methods

41 Overview We extracted structures of proteins boundto NAD or FAD from PDB and constructed a database ofNAD-FAD-binding residue templates Residues that weredefined as binding residues by the ligand-binding databaseBioLiP [40] were included in the template Query proteinstructures were then compared with each template in thedatabase using a ldquoleave-one-outrdquo comparison method Thefragment transformation method [22] was used to alignquery and template structures After comparing the localprotein structure each residue was assigned a score basedon both protein sequence and structure Sequence similaritywas calculated using the BLOSUM62 substitutionmatrix [41]whereas structural similarity was calculated by measuringthe root mean square deviation (RMSD) of the C120572 carbonsfrom local structure alignments and using a secondarystructure substitutionmatrix [22] according to theDictionaryof Secondary Structure of Proteinsrsquo (DSSP) definition ofsecondary structure [42] Residues with an alignment scorethat exceeded a predetermined threshold were predicted tobind NADFADThis method is illustrated in Figure 8

42 NAD-FAD-Binding Proteins and Binding Residue Tem-plates We adopted the same datasets with previous research[10 11] All protein complexes were collected from PDBand had pairwise sequence identity lt40 by using CD-HITProteins chains that are not involved in NADFAD bindingwere excluded Residues that were defined as binding ornonbinding residues by using the ligand-binding databaseBioLiP The main dataset included 184 and 165 polypeptidechains for NAD and FAD respectively Because NAD iscomposed of a nicotinamide moiety an adenosine moietyand a phosphate moiety binding residues were dividedinto three groups nicotinamide binding adenosine bindingand phosphate binding FAD-binding sitessimilarly contain

flavin-binding residues adenosine-binding residues andphosphate-binding residues Groups of residues that con-tained more than or equal to two binding residues wereconsidered a binding residue template (see Figures 9 and 10)

43The Fragment TransformationMethod Weused the frag-ment transformation method to align NAD-FAD-bindingresidues Each residue was treated as an individual unit andwas used to align the query protein 119878 with the bindingtemplate 119879 The structural unit consists of a triplet formedby the NndashC

120572ndashC atoms within a given residue 119878 denotes the

query protein of length 119898 and 119879 denotes the template of 119899residues The query protein 119878 of length119898 and the template 119879of 119899 residues can therefore be expressed in terms of tripletsas 119878 = 120590

1 120590

2 120590

119898 and 119879 = 120591

1 120591

2 120591

119899 where 120590

119894=

(119901

119873 119901

119862120572 119901

119862) 120591119895= (119902

119873 119902

119862120572 119902

119862) and 119901 and 119902 are PDB

coordinates for each atomA matrix of dimensions 119898 times 119899 was then constructed for

the residues of 119878 and 119879 as

119872 =

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

119872

11119872

12sdot sdot sdot 119872

1119899

119872

21119872

22sdot sdot sdot 119872

2119899

sdot sdot sdot sdot sdot sdot sdot sdot sdot sdot sdot sdot

119872

1198981119872

1198982sdot sdot sdot 119872

119898119899

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

(1)

where the element119872119894119895is a rigid-body transformation matrix

that transforms the triplet 120590119894to 120591119895(ie119872

119894119895120590

119894= 120591

119895)

44 Performing Triplet Clustering 119863119894119895119896119897is the Cartesian dis-

tance between the target 120591119897and the transformed triplet119872

119894119895120590

119896

providing a measure of how similarly the triplet pairs (120590119894 120591

119895)

and (120590119896 120591

119897) are oriented This allows clustering of triplet

fragments using the single-linkage algorithm [43] as followsIf for two triplet pairs (120590

119894 120591

119895) and (120590

119896 120591

119897) 119863119894119895119896119897lt 119863

0 119894 = 119896

and 119895 = 119897 then the triplets are clustered Let 1198661and 119866

2

be two clusters with the first containing (120590119894 120591

119895) and (120590

119896 120591

119897)

and the second containing (1205901198941015840 1205911198951015840) and (120590

1198961015840 1205911198971015840) If 119863119894119895

11989610158401198971015840lt

119863

0 then 119866

1and 119866

2are merged to form a new cluster 119866

3

where 1198663= 119866

1cup 119866

2 These procedures are performed

iteratively until no new clusters can be formed For each finalcluster 119866

120583 we can obtain the transformation matrix119872120583

119896119897and

aligned substructure pair 119878120583= ⋃

120590119896isin119866120583

120590

119896and 119879

120583= ⋃

120591119897isin119866120583

120591

119897

where 119866120583

has the minimum Cartesian distance whenusing119872120583

119896119897

45 Scoring Function For each residue 119894 the binding score119862119894

is defined as

119862

119894= MAX120590119894isin119866120583

(120576

120583times 119862

119877

120583times 119862

119861

120583times 119862

119863

120583) (2)

10 BioMed Research International

LYSLYS

GLY

ILE

GLU

GLU

GLY

LEU

SER

TRP

ASN

GLN

Ligand-binding residue template database

0123456

96 146

Query protein

The fragment transformation method

Scoring function

FAD

NAD

minus1

ALA

THR

SER

SER PHE

LYS

SER

SER

MET ARG

VAL

Figure 8 Schematic of the method for predicting NAD-FAD-binding sites

where 120576120583is the number of triplets of 119878

120583(ie the aligned

residues of the query structure)The alignment scores119862119877120583119862119861120583

and 119862119863120583are defined as

119862

119877

120583=

1

1 + RMSD (119878120583 119879

120583)

119862

119861

120583=

BLOSUM (119878120583 119879

120583)

BLOSUM (119879120583 119879

120583)

+ 1

119862

119863

120583=

DSSP (119878120583 119879

120583)

DSSP (119879120583 119879

120583)

+ 1

(3)

where RMSD (119878120583 119879

120583) is the RMSD of all 119862

120572atoms between

119878

120583and 119879

120583 BLOSUM (119878

120583 119879

120583) is the sequence alignment

score between 119878120583and 119879

120583calculated using the BLOSUM62

[41] substitution matrix BLOSUM (119879120583 119879

120583) is the maximum

sequence alignment score of 119879120583 DSSP (119878

120583 119879

120583) represents the

secondary structure alignment score based on a constructionsubstitution matrix [22] using the definition of DSSP [42]between 119878

120583and 119879

120583 and DSSP (119879

120583 119879

120583) is the maximum

secondary structure alignment score of 119879120583 The value of

RMSD (119878120583 119879

120583) should be lt3 A

For each residue 119894 we predict a geometric centerΘ120596119894of the

ligand by Θ120596119894= 119872

120583

119896119897

minus1

119871

120596 where 119871

120596is the geometric center

of the binding template type 120596 in template 119879 120596 representsthe three moieties of NADFAD nicotinamide adenosine

BioMed Research International 11

LYSLYS

GLY

ILEGLU

GLU GLY

LEUSER

TRP

ASN

GLN

(a)

LYS

GLYGLU

GLN

LEU

ASN

Nicotinamide-binding templates

(b)

SER

TRPPhosphate-binding templates

(c)

LYS

GLY

GLU

ILE

Adenosine-binding templates

(d)

Figure 9NAD-binding residue templates (a)The entireNAD-binding template (b)Nicotinamide-binding templates (c) Phosphate-bindingtemplates (d) Adenosine-binding templates

LYS

ARG

MET

ALA

THRSER

SER

SER

SER

PHE

ASN

VAL

(a)

Flavin-binding templates

SER

SER

SER

VAL ARG

THRASN

(b)

Phosphate-binding templates

SER

MET

LYS

(c)

Adenosine-binding templates

SER

ALAPHE

(d)

Figure 10 FAD-binding residue templates (a) The entire FAD-binding template (b) Flavin-binding templates (c) Phosphate-bindingtemplates (d) Adenosine-binding templates

12 BioMed Research International

and phosphate for NAD flavin adenosine and phosphatefor FAD The binding score 119862

119896is added to 119862

119894if the distance

betweenΘ120596119894andΘ120596

1015840

119896is between 3 and 9 A and120596 = 1205961015840 Finally

the normalized binding score 119885119862119894is calculated as

119885

119862

119894=

119862

119894minus 119862

119878119863

119862

(4)

where 119862 and 119878119863119862denote the mean and standard deviation

respectively of the binding score 119862119894

46 Performance Assessment The accuracy of predictingNAD-FAD-binding sites wasdefined as the number of truepositives and true negatives and was evaluated using a leave-one-out approach Accuracy (ACC) the true positive rate(TPR) and the false positive rate (FPR) were calculated usingtrue positive (TP) true negative (TN) false positive (FP) andfalse negative (FN) values as follows

ACC = TP + TNTP + TN + FP + FN

TPR = Sensitivity = TPTP + FN

FPR = 1 minus Specificity = FPFP + TN

(5)

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Authorsrsquo Contribution

Chih-Hao Lu and Chin-Sheng Yu developed and imple-mented the methods Chih-Hao Lu Chin-Sheng Yu Yu-Feng Lin and Jin-Yi Chen carried out the analysis Chih-Hao Lu and Yu-Feng Lin drafted the paper Chih-Hao Lusupervised the work All the authors have read and approvedthe content of the final paper Chih-Hao Lu and Chin-ShengYu contributed equally to this work

Acknowledgment

Thisworkwas supported byGrants from theNational ScienceCouncil (NSC 99-2113-M-039-002-MY2) and China MedicalUniversity (CMU99-N2-02) Taiwan to Chih-Hao Lu Thefunders had no role in study design data collection andanalysis decision to publish or preparation of the paper Theauthors are grateful to Yeong-Shin Lin (National Chiao TungUniversity Taiwan) for invaluable comments

References

[1] H M Berman J Westbrook Z Feng et al ldquoThe protein databankrdquo Nucleic Acids Research vol 28 no 1 pp 235ndash242 2000

[2] A Wilkinson J Day and R Bowater ldquoBacterial DNA ligasesrdquoMolecular Microbiology vol 40 no 6 pp 1241ndash1248 2001

[3] A Burkle ldquoPhysiology and pathophysiology of poly(ADP-ribosyl)ationrdquo BioEssays vol 23 no 9 pp 795ndash806 2001

[4] Q Zhang D W Piston and R H Goodman ldquoRegulation ofcorepressor function by nuclear NADHrdquo Science vol 295 no5561 pp 1895ndash1897 2002

[5] J S Smith and J D Boeke ldquoAn unusual form of transcriptionalsilencing in yeast ribosomalDNArdquoGenes andDevelopment vol11 no 2 pp 241ndash254 1997

[6] R M Anderson K J Bitterman J G Wood et al ldquoManipula-tion of a nuclear NAD+ salvage pathway delays aging withoutaltering steady-state NAD+ levelsrdquo The Journal of BiologicalChemistry vol 277 no 21 pp 18881ndash18890 2002

[7] J Rutter M Reick L C Wu and S L McKnight ldquoRegulationof crock and NPAS2 DNA binding by the redox state of NADcofactorsrdquo Science vol 293 no 5529 pp 510ndash514 2001

[8] K ChenM JMizianty and L Kurgan ldquoPrediction and analysisof nucleotide-binding residues using sequence and sequence-derived structural descriptorsrdquo Bioinformatics vol 28 no 3 pp331ndash341 2012

[9] M Saito M Go and T Shirai ldquoAn empirical approach fordetecting nucleotide-binding sites on proteinsrdquo Protein Engi-neering Design and Selection vol 19 no 2 pp 67ndash75 2006

[10] H R Ansari and G P S Raghava ldquoIdentification of NADinteracting residues in proteinsrdquo BMC Bioinformatics vol 11article 160 2010

[11] N K Mishra and G P S Raghava ldquoPrediction of FADinteracting residues in a protein from its primary sequenceusing evolutionary informationrdquo BMC Bioinformatics vol 11article S48 no 1 2010

[12] Z-P Liu L-Y Wu Y Wang X-S Zhang and L ChenldquoPrediction of protein-RNA binding sites by a random forestmethod with combined featuresrdquo Bioinformatics vol 26 no 13pp 1616ndash1622 2010

[13] L Wang Z P Liu X S Zhang and L Chen ldquoPrediction of hotspots in protein interfaces using a random forest model withhybrid featuresrdquo Protein Engineering Design and Selection vol25 no 3 pp 119ndash126 2012

[14] J S Chauhan N KMishra andG P S Raghava ldquoPrediction ofGTP interacting residues dipeptides and tripeptides in a pro-tein from its evolutionary informationrdquo BMC Bioinformaticsvol 11 article 301 2010

[15] A Roy and Y Zhang ldquoRecognizing protein-ligand binding sitesby global structural alignment and local geometry refinementrdquoStructure vol 20 no 6 pp 987ndash997 2012

[16] L Xie and P E Bourne ldquoDetecting evolutionary relationshipsacross existing fold space using sequence order-independentprofile-profile alignmentsrdquo Proceedings of the National Academyof Sciences of the United States of America vol 105 no 14 pp5441ndash5446 2008

[17] J Yang A Roy and Y Zhang ldquoProtein-ligand binding siterecognition using complementary binding-specific substruc-ture comparison and sequence profile alignmentrdquo Bioinformat-ics vol 29 no 20 pp 2588ndash2595 2013

[18] Y T Yan and W-H Li ldquoIdentification of protein functionalsurfaces by the concept of a split pocketrdquo Proteins StructureFunction and Bioinformatics vol 76 no 4 pp 959ndash976 2009

[19] K A Dill ldquoDominant forces in protein foldingrdquo Biochemistryvol 29 no 31 pp 7133ndash7155 1990

[20] S Govindarajan and R A Goldstein ldquoEvolution of modelproteins on a foldability landscaperdquo Proteins vol 29 no 4 pp461ndash466 1997

BioMed Research International 13

[21] G Parisi and J Echave ldquoStructural constraints and emergence ofsequence patterns in protein evolutionrdquo Molecular Biology andEvolution vol 18 no 5 pp 750ndash756 2001

[22] C H Lu Y S Lin Y C Chen C S Yu S Y Chang andJ K Hwang ldquoThe fragment transformation method to detectthe protein structural motifsrdquo Proteins Structure Function andGenetics vol 63 no 3 pp 636ndash643 2006

[23] L Schrodinger ThePyMOLMolecular Graphics System Version13r1 2010

[24] U Dengler K Niefind M Kieszlig and D Schomburg ldquoCrystalstructure of a ternary complex of D-2-hydroxy-isocaproatedehydrogenase from Lactobacillus casei NAD+ and 2-oxoisocaproate at 19 A resolutionrdquo Journal of MolecularBiology vol 267 no 3 pp 640ndash660 1997

[25] E Gross C S Sevier A Vala C A Kaiser and D Fass ldquoA newFAD-binding fold and intersubunit disulfide shuttle in the thioloxidase Erv2prdquoNature Structural Biology vol 9 no 1 pp 61ndash672002

[26] S V Antonyuk R W Strange M J Ellis et al ldquoStructure ofd-lactate dehydrogenase from Aquifex aeolicus complexed withNAD+ and lactic acid (or pyruvate)rdquo Acta Crystallographica FStructural Biology and Crystallization Communications vol 65part 12 pp 1209ndash1213 2009

[27] C K Wu T A Dailey H A Dailey B C Wang and J PRose ldquoThe crystal structure of augmenter of liver regenerationa mammalian FAD-dependent sulfhydryl oxidaserdquo ProteinScience vol 12 no 5 pp 1109ndash1118 2003

[28] J R Thompson J K Bell J Bratt G A Grant and LJ Banaszak ldquoVmax regulation through domain and subunitchanges The active form of phosphoglycerate dehydrogenaserdquoBiochemistry vol 44 no 15 pp 5763ndash5773 2005

[29] M Nardini S Spano C Cericola et al ldquoCtBPBARS a dual-function protein involved in transcription co-repression andGolgi membrane fissionrdquo EMBO Journal vol 22 no 12 pp3122ndash3130 2003

[30] R Kort H Komori S I Adachi K Miki and A Eker ldquoDNAapophotolyase from Anacystis nidulans 18 A structure 8-HDF reconstitution and X-ray-induced FAD reductionrdquo ActaCrystallographica Section D Biological Crystallography vol 60no 7 pp 1205ndash1213 2004

[31] Y Huang R Baxter B S Smith C L Partch C L Colbert andJ Deisenhofer ldquoCrystal structure of cryptochrome 3 from Ara-bidopsis thaliana and its implications for photolyase activityrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 103 no 47 pp 17701ndash17706 2006

[32] J B Thoden T M Wohlers J L Fridovich-Keil and H MHolden ldquoMolecular basis for severe epimerase deficiency galac-tosemia X-ray structure of the humanV94M-substitutedUDP-galactose 4-epimeraserdquoThe Journal of Biological Chemistry vol276 no 23 pp 20617ndash20623 2001

[33] C A Brautigam B S Smith Z Ma et al ldquoStructure of thephotolyase-like domain of cryptochrome 1 from Arabidopsisthalianardquo Proceedings of the National Academy of Sciences of theUnited States of America vol 101 no 33 pp 12142ndash12147 2004

[34] H Komori R Masui S Kuramitsu et al ldquoCrystal structure ofthermostable DNA photolyase pyrimidine-dimer recognitionmechanismrdquo Proceedings of the National Academy of Sciences ofthe United States of America vol 98 no 24 pp 13560ndash135652001

[35] F Todone M A Vanoni A Mozzarelli et al ldquoActive siteplasticity in D-amino acid oxidase a crystallographic analysisrdquoBiochemistry vol 36 no 19 pp 5853ndash5860 1997

[36] I F Sevrioukova H Li and T L Poulos ldquoCrystal structure ofputidaredoxin reductase from Pseudomonas putida the finalstructural component of the cytochromeP450cammonooxyge-naserdquo Journal of Molecular Biology vol 336 no 4 pp 889ndash9022004

[37] S Y Song Y B Xu Z J Lin and C L Tsou ldquoStructure ofactive site carboxymethylated D-glyceraldehyde-3-phosphatedehydrogenase from Palinurus versicolorrdquo Journal of MolecularBiology vol 287 no 4 pp 719ndash725 1999

[38] L Pollegioni KDiederichs GMolla et al ldquoYeastD-amino acidoxidase structural basis of its catalytic propertiesrdquo Journal ofMolecular Biology vol 324 no 3 pp 535ndash546 2002

[39] E C Settembre P C Dorrestein J H Park A M Augustine TP Begley and S E Ealick ldquoStructural and mechanistic studieson thiO a glycine oxidase essential for thiamin biosynthesis inBacillus subtilisrdquo Biochemistry vol 42 no 10 pp 2971ndash29812003

[40] J Yang A Roy and Y Zhang ldquoBioLiP a semi-manually curateddatabase for biologically relevant ligand-protein interactionsrdquoNucleic Acids Research vol 41 no 1 pp D1096ndashD1103 2013

[41] S Henikoff and J G Henikoff ldquoAmino acid substitution matri-ces from protein blocksrdquo Proceedings of the National Academyof Sciences of the United States of America vol 89 no 22 pp10915ndash10919 1992

[42] W Kabsch and C Sander ldquoDictionary of protein secondarystructure pattern recognition of hydrogen-bonded and geo-metrical featuresrdquo Biopolymers vol 22 no 12 pp 2577ndash26371983

[43] J C Gower and G J S Ross ldquoMinimum spanning trees andsingle-linkage cluster analysisrdquo Journal of the Royal StatisticalSociety vol 18 no 1 article 11 1969

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 8: Research Article Predicting Flavin and Nicotinamide ...downloads.hindawi.com/journals/bmri/2015/402536.pdf · Adenosine-binding of FAD. (e) Phosphate-binding of FAD. (f) Flav in-binding

8 BioMed Research International

(a)

(b) (c) (d)

Figure 6 Identification of FAD-binding sites (a) Chain H of D-amino acid oxidase (PDB ID1DDO) was the query protein Templates wereconstructed from (b) putidaredoxin reductase (chain B PDB ID1Q1R) (c) D-amino acid oxidase (chain A PDB ID1C0I) and (d) glycineoxidase (chain B PDB ID1NG3)

0

02

04

06

08

1

Perfo

rman

ce

Type of residue

SensitivitySpecificity

GLY

ALA VA

LLE

UIL

ESE

RTH

RCY

SM

ETPR

OPH

ETY

RTR

PLY

SA

RG HIS

ASP

ASN

GLU

GLN

(a)

0

02

04

06

08

1

Perfo

rman

ce

Type of residue

SensitivitySpecificity

GLY

ALA VA

LLE

UIL

ESE

RTH

RCY

SM

ETPR

OPH

ETY

RTR

PLY

SA

RG HIS

ASP

ASN

GLU

GLN

(b)

Figure 7 Sensitivity and specificity associated with each amino acid in NAD-FAD-binding-site predictions (a) NAD (b) FAD

residue were calculated (Figure 7) For NAD-binding-sitepredictions specificity for each residue was excellent (0927ndash0966) but sensitivity was relatively low for phenylalaninetryptophan arginine and glutaminewhichwere less than 05For FAD-binding sites all residues achieved high specificity(0933ndash0971) and sensitivity (0532ndash0791) It should be noted

that the ratio of NAD-FAD-binding residues to nonbindingresidues is about 1 to 16 in our dataset This large differencemight cause lots of false positives when predicted That is thereason for high specificity and accuracy but low sensitivity inour prediction results Hence the positions of false positiveresidues in sequence were also investigated 20 and 25 of

BioMed Research International 9

false positive residues of NAD- and FAD-binding predictionoccurred next to the true positive residues in sequence Itwas shown that these residues are also located near theligand in the coordinate space If these residues were treatedas true positive residues our prediction results of NAD-binding yielded 7155 sensitivity and 061 MCC at a 5FPR threshold Under the same conditions FAD-binding-site predictions yielded 7334 sensitivity and an MCC of064 Compared with other prediction methods ours did notuse protein evolutionary information but only used proteinstructure and did not need to use equal number dataset fortraining but predicted whole-proteins through comparingstructures of template database Our results yielded excellentprediction performancewhen analyzingNAD-FAD-bindingresidues and thus provide important details concerning thebinding-site microenvironment This approach thereforemay be used to predict putative NAD-FAD-binding proteinsand the specific residues involved in the interaction

4 Methods

41 Overview We extracted structures of proteins boundto NAD or FAD from PDB and constructed a database ofNAD-FAD-binding residue templates Residues that weredefined as binding residues by the ligand-binding databaseBioLiP [40] were included in the template Query proteinstructures were then compared with each template in thedatabase using a ldquoleave-one-outrdquo comparison method Thefragment transformation method [22] was used to alignquery and template structures After comparing the localprotein structure each residue was assigned a score basedon both protein sequence and structure Sequence similaritywas calculated using the BLOSUM62 substitutionmatrix [41]whereas structural similarity was calculated by measuringthe root mean square deviation (RMSD) of the C120572 carbonsfrom local structure alignments and using a secondarystructure substitutionmatrix [22] according to theDictionaryof Secondary Structure of Proteinsrsquo (DSSP) definition ofsecondary structure [42] Residues with an alignment scorethat exceeded a predetermined threshold were predicted tobind NADFADThis method is illustrated in Figure 8

42 NAD-FAD-Binding Proteins and Binding Residue Tem-plates We adopted the same datasets with previous research[10 11] All protein complexes were collected from PDBand had pairwise sequence identity lt40 by using CD-HITProteins chains that are not involved in NADFAD bindingwere excluded Residues that were defined as binding ornonbinding residues by using the ligand-binding databaseBioLiP The main dataset included 184 and 165 polypeptidechains for NAD and FAD respectively Because NAD iscomposed of a nicotinamide moiety an adenosine moietyand a phosphate moiety binding residues were dividedinto three groups nicotinamide binding adenosine bindingand phosphate binding FAD-binding sitessimilarly contain

flavin-binding residues adenosine-binding residues andphosphate-binding residues Groups of residues that con-tained more than or equal to two binding residues wereconsidered a binding residue template (see Figures 9 and 10)

43The Fragment TransformationMethod Weused the frag-ment transformation method to align NAD-FAD-bindingresidues Each residue was treated as an individual unit andwas used to align the query protein 119878 with the bindingtemplate 119879 The structural unit consists of a triplet formedby the NndashC

120572ndashC atoms within a given residue 119878 denotes the

query protein of length 119898 and 119879 denotes the template of 119899residues The query protein 119878 of length119898 and the template 119879of 119899 residues can therefore be expressed in terms of tripletsas 119878 = 120590

1 120590

2 120590

119898 and 119879 = 120591

1 120591

2 120591

119899 where 120590

119894=

(119901

119873 119901

119862120572 119901

119862) 120591119895= (119902

119873 119902

119862120572 119902

119862) and 119901 and 119902 are PDB

coordinates for each atomA matrix of dimensions 119898 times 119899 was then constructed for

the residues of 119878 and 119879 as

119872 =

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

119872

11119872

12sdot sdot sdot 119872

1119899

119872

21119872

22sdot sdot sdot 119872

2119899

sdot sdot sdot sdot sdot sdot sdot sdot sdot sdot sdot sdot

119872

1198981119872

1198982sdot sdot sdot 119872

119898119899

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

(1)

where the element119872119894119895is a rigid-body transformation matrix

that transforms the triplet 120590119894to 120591119895(ie119872

119894119895120590

119894= 120591

119895)

44 Performing Triplet Clustering 119863119894119895119896119897is the Cartesian dis-

tance between the target 120591119897and the transformed triplet119872

119894119895120590

119896

providing a measure of how similarly the triplet pairs (120590119894 120591

119895)

and (120590119896 120591

119897) are oriented This allows clustering of triplet

fragments using the single-linkage algorithm [43] as followsIf for two triplet pairs (120590

119894 120591

119895) and (120590

119896 120591

119897) 119863119894119895119896119897lt 119863

0 119894 = 119896

and 119895 = 119897 then the triplets are clustered Let 1198661and 119866

2

be two clusters with the first containing (120590119894 120591

119895) and (120590

119896 120591

119897)

and the second containing (1205901198941015840 1205911198951015840) and (120590

1198961015840 1205911198971015840) If 119863119894119895

11989610158401198971015840lt

119863

0 then 119866

1and 119866

2are merged to form a new cluster 119866

3

where 1198663= 119866

1cup 119866

2 These procedures are performed

iteratively until no new clusters can be formed For each finalcluster 119866

120583 we can obtain the transformation matrix119872120583

119896119897and

aligned substructure pair 119878120583= ⋃

120590119896isin119866120583

120590

119896and 119879

120583= ⋃

120591119897isin119866120583

120591

119897

where 119866120583

has the minimum Cartesian distance whenusing119872120583

119896119897

45 Scoring Function For each residue 119894 the binding score119862119894

is defined as

119862

119894= MAX120590119894isin119866120583

(120576

120583times 119862

119877

120583times 119862

119861

120583times 119862

119863

120583) (2)

10 BioMed Research International

LYSLYS

GLY

ILE

GLU

GLU

GLY

LEU

SER

TRP

ASN

GLN

Ligand-binding residue template database

0123456

96 146

Query protein

The fragment transformation method

Scoring function

FAD

NAD

minus1

ALA

THR

SER

SER PHE

LYS

SER

SER

MET ARG

VAL

Figure 8 Schematic of the method for predicting NAD-FAD-binding sites

where 120576120583is the number of triplets of 119878

120583(ie the aligned

residues of the query structure)The alignment scores119862119877120583119862119861120583

and 119862119863120583are defined as

119862

119877

120583=

1

1 + RMSD (119878120583 119879

120583)

119862

119861

120583=

BLOSUM (119878120583 119879

120583)

BLOSUM (119879120583 119879

120583)

+ 1

119862

119863

120583=

DSSP (119878120583 119879

120583)

DSSP (119879120583 119879

120583)

+ 1

(3)

where RMSD (119878120583 119879

120583) is the RMSD of all 119862

120572atoms between

119878

120583and 119879

120583 BLOSUM (119878

120583 119879

120583) is the sequence alignment

score between 119878120583and 119879

120583calculated using the BLOSUM62

[41] substitution matrix BLOSUM (119879120583 119879

120583) is the maximum

sequence alignment score of 119879120583 DSSP (119878

120583 119879

120583) represents the

secondary structure alignment score based on a constructionsubstitution matrix [22] using the definition of DSSP [42]between 119878

120583and 119879

120583 and DSSP (119879

120583 119879

120583) is the maximum

secondary structure alignment score of 119879120583 The value of

RMSD (119878120583 119879

120583) should be lt3 A

For each residue 119894 we predict a geometric centerΘ120596119894of the

ligand by Θ120596119894= 119872

120583

119896119897

minus1

119871

120596 where 119871

120596is the geometric center

of the binding template type 120596 in template 119879 120596 representsthe three moieties of NADFAD nicotinamide adenosine

BioMed Research International 11

LYSLYS

GLY

ILEGLU

GLU GLY

LEUSER

TRP

ASN

GLN

(a)

LYS

GLYGLU

GLN

LEU

ASN

Nicotinamide-binding templates

(b)

SER

TRPPhosphate-binding templates

(c)

LYS

GLY

GLU

ILE

Adenosine-binding templates

(d)

Figure 9NAD-binding residue templates (a)The entireNAD-binding template (b)Nicotinamide-binding templates (c) Phosphate-bindingtemplates (d) Adenosine-binding templates

LYS

ARG

MET

ALA

THRSER

SER

SER

SER

PHE

ASN

VAL

(a)

Flavin-binding templates

SER

SER

SER

VAL ARG

THRASN

(b)

Phosphate-binding templates

SER

MET

LYS

(c)

Adenosine-binding templates

SER

ALAPHE

(d)

Figure 10 FAD-binding residue templates (a) The entire FAD-binding template (b) Flavin-binding templates (c) Phosphate-bindingtemplates (d) Adenosine-binding templates

12 BioMed Research International

and phosphate for NAD flavin adenosine and phosphatefor FAD The binding score 119862

119896is added to 119862

119894if the distance

betweenΘ120596119894andΘ120596

1015840

119896is between 3 and 9 A and120596 = 1205961015840 Finally

the normalized binding score 119885119862119894is calculated as

119885

119862

119894=

119862

119894minus 119862

119878119863

119862

(4)

where 119862 and 119878119863119862denote the mean and standard deviation

respectively of the binding score 119862119894

46 Performance Assessment The accuracy of predictingNAD-FAD-binding sites wasdefined as the number of truepositives and true negatives and was evaluated using a leave-one-out approach Accuracy (ACC) the true positive rate(TPR) and the false positive rate (FPR) were calculated usingtrue positive (TP) true negative (TN) false positive (FP) andfalse negative (FN) values as follows

ACC = TP + TNTP + TN + FP + FN

TPR = Sensitivity = TPTP + FN

FPR = 1 minus Specificity = FPFP + TN

(5)

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Authorsrsquo Contribution

Chih-Hao Lu and Chin-Sheng Yu developed and imple-mented the methods Chih-Hao Lu Chin-Sheng Yu Yu-Feng Lin and Jin-Yi Chen carried out the analysis Chih-Hao Lu and Yu-Feng Lin drafted the paper Chih-Hao Lusupervised the work All the authors have read and approvedthe content of the final paper Chih-Hao Lu and Chin-ShengYu contributed equally to this work

Acknowledgment

Thisworkwas supported byGrants from theNational ScienceCouncil (NSC 99-2113-M-039-002-MY2) and China MedicalUniversity (CMU99-N2-02) Taiwan to Chih-Hao Lu Thefunders had no role in study design data collection andanalysis decision to publish or preparation of the paper Theauthors are grateful to Yeong-Shin Lin (National Chiao TungUniversity Taiwan) for invaluable comments

References

[1] H M Berman J Westbrook Z Feng et al ldquoThe protein databankrdquo Nucleic Acids Research vol 28 no 1 pp 235ndash242 2000

[2] A Wilkinson J Day and R Bowater ldquoBacterial DNA ligasesrdquoMolecular Microbiology vol 40 no 6 pp 1241ndash1248 2001

[3] A Burkle ldquoPhysiology and pathophysiology of poly(ADP-ribosyl)ationrdquo BioEssays vol 23 no 9 pp 795ndash806 2001

[4] Q Zhang D W Piston and R H Goodman ldquoRegulation ofcorepressor function by nuclear NADHrdquo Science vol 295 no5561 pp 1895ndash1897 2002

[5] J S Smith and J D Boeke ldquoAn unusual form of transcriptionalsilencing in yeast ribosomalDNArdquoGenes andDevelopment vol11 no 2 pp 241ndash254 1997

[6] R M Anderson K J Bitterman J G Wood et al ldquoManipula-tion of a nuclear NAD+ salvage pathway delays aging withoutaltering steady-state NAD+ levelsrdquo The Journal of BiologicalChemistry vol 277 no 21 pp 18881ndash18890 2002

[7] J Rutter M Reick L C Wu and S L McKnight ldquoRegulationof crock and NPAS2 DNA binding by the redox state of NADcofactorsrdquo Science vol 293 no 5529 pp 510ndash514 2001

[8] K ChenM JMizianty and L Kurgan ldquoPrediction and analysisof nucleotide-binding residues using sequence and sequence-derived structural descriptorsrdquo Bioinformatics vol 28 no 3 pp331ndash341 2012

[9] M Saito M Go and T Shirai ldquoAn empirical approach fordetecting nucleotide-binding sites on proteinsrdquo Protein Engi-neering Design and Selection vol 19 no 2 pp 67ndash75 2006

[10] H R Ansari and G P S Raghava ldquoIdentification of NADinteracting residues in proteinsrdquo BMC Bioinformatics vol 11article 160 2010

[11] N K Mishra and G P S Raghava ldquoPrediction of FADinteracting residues in a protein from its primary sequenceusing evolutionary informationrdquo BMC Bioinformatics vol 11article S48 no 1 2010

[12] Z-P Liu L-Y Wu Y Wang X-S Zhang and L ChenldquoPrediction of protein-RNA binding sites by a random forestmethod with combined featuresrdquo Bioinformatics vol 26 no 13pp 1616ndash1622 2010

[13] L Wang Z P Liu X S Zhang and L Chen ldquoPrediction of hotspots in protein interfaces using a random forest model withhybrid featuresrdquo Protein Engineering Design and Selection vol25 no 3 pp 119ndash126 2012

[14] J S Chauhan N KMishra andG P S Raghava ldquoPrediction ofGTP interacting residues dipeptides and tripeptides in a pro-tein from its evolutionary informationrdquo BMC Bioinformaticsvol 11 article 301 2010

[15] A Roy and Y Zhang ldquoRecognizing protein-ligand binding sitesby global structural alignment and local geometry refinementrdquoStructure vol 20 no 6 pp 987ndash997 2012

[16] L Xie and P E Bourne ldquoDetecting evolutionary relationshipsacross existing fold space using sequence order-independentprofile-profile alignmentsrdquo Proceedings of the National Academyof Sciences of the United States of America vol 105 no 14 pp5441ndash5446 2008

[17] J Yang A Roy and Y Zhang ldquoProtein-ligand binding siterecognition using complementary binding-specific substruc-ture comparison and sequence profile alignmentrdquo Bioinformat-ics vol 29 no 20 pp 2588ndash2595 2013

[18] Y T Yan and W-H Li ldquoIdentification of protein functionalsurfaces by the concept of a split pocketrdquo Proteins StructureFunction and Bioinformatics vol 76 no 4 pp 959ndash976 2009

[19] K A Dill ldquoDominant forces in protein foldingrdquo Biochemistryvol 29 no 31 pp 7133ndash7155 1990

[20] S Govindarajan and R A Goldstein ldquoEvolution of modelproteins on a foldability landscaperdquo Proteins vol 29 no 4 pp461ndash466 1997

BioMed Research International 13

[21] G Parisi and J Echave ldquoStructural constraints and emergence ofsequence patterns in protein evolutionrdquo Molecular Biology andEvolution vol 18 no 5 pp 750ndash756 2001

[22] C H Lu Y S Lin Y C Chen C S Yu S Y Chang andJ K Hwang ldquoThe fragment transformation method to detectthe protein structural motifsrdquo Proteins Structure Function andGenetics vol 63 no 3 pp 636ndash643 2006

[23] L Schrodinger ThePyMOLMolecular Graphics System Version13r1 2010

[24] U Dengler K Niefind M Kieszlig and D Schomburg ldquoCrystalstructure of a ternary complex of D-2-hydroxy-isocaproatedehydrogenase from Lactobacillus casei NAD+ and 2-oxoisocaproate at 19 A resolutionrdquo Journal of MolecularBiology vol 267 no 3 pp 640ndash660 1997

[25] E Gross C S Sevier A Vala C A Kaiser and D Fass ldquoA newFAD-binding fold and intersubunit disulfide shuttle in the thioloxidase Erv2prdquoNature Structural Biology vol 9 no 1 pp 61ndash672002

[26] S V Antonyuk R W Strange M J Ellis et al ldquoStructure ofd-lactate dehydrogenase from Aquifex aeolicus complexed withNAD+ and lactic acid (or pyruvate)rdquo Acta Crystallographica FStructural Biology and Crystallization Communications vol 65part 12 pp 1209ndash1213 2009

[27] C K Wu T A Dailey H A Dailey B C Wang and J PRose ldquoThe crystal structure of augmenter of liver regenerationa mammalian FAD-dependent sulfhydryl oxidaserdquo ProteinScience vol 12 no 5 pp 1109ndash1118 2003

[28] J R Thompson J K Bell J Bratt G A Grant and LJ Banaszak ldquoVmax regulation through domain and subunitchanges The active form of phosphoglycerate dehydrogenaserdquoBiochemistry vol 44 no 15 pp 5763ndash5773 2005

[29] M Nardini S Spano C Cericola et al ldquoCtBPBARS a dual-function protein involved in transcription co-repression andGolgi membrane fissionrdquo EMBO Journal vol 22 no 12 pp3122ndash3130 2003

[30] R Kort H Komori S I Adachi K Miki and A Eker ldquoDNAapophotolyase from Anacystis nidulans 18 A structure 8-HDF reconstitution and X-ray-induced FAD reductionrdquo ActaCrystallographica Section D Biological Crystallography vol 60no 7 pp 1205ndash1213 2004

[31] Y Huang R Baxter B S Smith C L Partch C L Colbert andJ Deisenhofer ldquoCrystal structure of cryptochrome 3 from Ara-bidopsis thaliana and its implications for photolyase activityrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 103 no 47 pp 17701ndash17706 2006

[32] J B Thoden T M Wohlers J L Fridovich-Keil and H MHolden ldquoMolecular basis for severe epimerase deficiency galac-tosemia X-ray structure of the humanV94M-substitutedUDP-galactose 4-epimeraserdquoThe Journal of Biological Chemistry vol276 no 23 pp 20617ndash20623 2001

[33] C A Brautigam B S Smith Z Ma et al ldquoStructure of thephotolyase-like domain of cryptochrome 1 from Arabidopsisthalianardquo Proceedings of the National Academy of Sciences of theUnited States of America vol 101 no 33 pp 12142ndash12147 2004

[34] H Komori R Masui S Kuramitsu et al ldquoCrystal structure ofthermostable DNA photolyase pyrimidine-dimer recognitionmechanismrdquo Proceedings of the National Academy of Sciences ofthe United States of America vol 98 no 24 pp 13560ndash135652001

[35] F Todone M A Vanoni A Mozzarelli et al ldquoActive siteplasticity in D-amino acid oxidase a crystallographic analysisrdquoBiochemistry vol 36 no 19 pp 5853ndash5860 1997

[36] I F Sevrioukova H Li and T L Poulos ldquoCrystal structure ofputidaredoxin reductase from Pseudomonas putida the finalstructural component of the cytochromeP450cammonooxyge-naserdquo Journal of Molecular Biology vol 336 no 4 pp 889ndash9022004

[37] S Y Song Y B Xu Z J Lin and C L Tsou ldquoStructure ofactive site carboxymethylated D-glyceraldehyde-3-phosphatedehydrogenase from Palinurus versicolorrdquo Journal of MolecularBiology vol 287 no 4 pp 719ndash725 1999

[38] L Pollegioni KDiederichs GMolla et al ldquoYeastD-amino acidoxidase structural basis of its catalytic propertiesrdquo Journal ofMolecular Biology vol 324 no 3 pp 535ndash546 2002

[39] E C Settembre P C Dorrestein J H Park A M Augustine TP Begley and S E Ealick ldquoStructural and mechanistic studieson thiO a glycine oxidase essential for thiamin biosynthesis inBacillus subtilisrdquo Biochemistry vol 42 no 10 pp 2971ndash29812003

[40] J Yang A Roy and Y Zhang ldquoBioLiP a semi-manually curateddatabase for biologically relevant ligand-protein interactionsrdquoNucleic Acids Research vol 41 no 1 pp D1096ndashD1103 2013

[41] S Henikoff and J G Henikoff ldquoAmino acid substitution matri-ces from protein blocksrdquo Proceedings of the National Academyof Sciences of the United States of America vol 89 no 22 pp10915ndash10919 1992

[42] W Kabsch and C Sander ldquoDictionary of protein secondarystructure pattern recognition of hydrogen-bonded and geo-metrical featuresrdquo Biopolymers vol 22 no 12 pp 2577ndash26371983

[43] J C Gower and G J S Ross ldquoMinimum spanning trees andsingle-linkage cluster analysisrdquo Journal of the Royal StatisticalSociety vol 18 no 1 article 11 1969

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 9: Research Article Predicting Flavin and Nicotinamide ...downloads.hindawi.com/journals/bmri/2015/402536.pdf · Adenosine-binding of FAD. (e) Phosphate-binding of FAD. (f) Flav in-binding

BioMed Research International 9

false positive residues of NAD- and FAD-binding predictionoccurred next to the true positive residues in sequence Itwas shown that these residues are also located near theligand in the coordinate space If these residues were treatedas true positive residues our prediction results of NAD-binding yielded 7155 sensitivity and 061 MCC at a 5FPR threshold Under the same conditions FAD-binding-site predictions yielded 7334 sensitivity and an MCC of064 Compared with other prediction methods ours did notuse protein evolutionary information but only used proteinstructure and did not need to use equal number dataset fortraining but predicted whole-proteins through comparingstructures of template database Our results yielded excellentprediction performancewhen analyzingNAD-FAD-bindingresidues and thus provide important details concerning thebinding-site microenvironment This approach thereforemay be used to predict putative NAD-FAD-binding proteinsand the specific residues involved in the interaction

4 Methods

41 Overview We extracted structures of proteins boundto NAD or FAD from PDB and constructed a database ofNAD-FAD-binding residue templates Residues that weredefined as binding residues by the ligand-binding databaseBioLiP [40] were included in the template Query proteinstructures were then compared with each template in thedatabase using a ldquoleave-one-outrdquo comparison method Thefragment transformation method [22] was used to alignquery and template structures After comparing the localprotein structure each residue was assigned a score basedon both protein sequence and structure Sequence similaritywas calculated using the BLOSUM62 substitutionmatrix [41]whereas structural similarity was calculated by measuringthe root mean square deviation (RMSD) of the C120572 carbonsfrom local structure alignments and using a secondarystructure substitutionmatrix [22] according to theDictionaryof Secondary Structure of Proteinsrsquo (DSSP) definition ofsecondary structure [42] Residues with an alignment scorethat exceeded a predetermined threshold were predicted tobind NADFADThis method is illustrated in Figure 8

42 NAD-FAD-Binding Proteins and Binding Residue Tem-plates We adopted the same datasets with previous research[10 11] All protein complexes were collected from PDBand had pairwise sequence identity lt40 by using CD-HITProteins chains that are not involved in NADFAD bindingwere excluded Residues that were defined as binding ornonbinding residues by using the ligand-binding databaseBioLiP The main dataset included 184 and 165 polypeptidechains for NAD and FAD respectively Because NAD iscomposed of a nicotinamide moiety an adenosine moietyand a phosphate moiety binding residues were dividedinto three groups nicotinamide binding adenosine bindingand phosphate binding FAD-binding sitessimilarly contain

flavin-binding residues adenosine-binding residues andphosphate-binding residues Groups of residues that con-tained more than or equal to two binding residues wereconsidered a binding residue template (see Figures 9 and 10)

43The Fragment TransformationMethod Weused the frag-ment transformation method to align NAD-FAD-bindingresidues Each residue was treated as an individual unit andwas used to align the query protein 119878 with the bindingtemplate 119879 The structural unit consists of a triplet formedby the NndashC

120572ndashC atoms within a given residue 119878 denotes the

query protein of length 119898 and 119879 denotes the template of 119899residues The query protein 119878 of length119898 and the template 119879of 119899 residues can therefore be expressed in terms of tripletsas 119878 = 120590

1 120590

2 120590

119898 and 119879 = 120591

1 120591

2 120591

119899 where 120590

119894=

(119901

119873 119901

119862120572 119901

119862) 120591119895= (119902

119873 119902

119862120572 119902

119862) and 119901 and 119902 are PDB

coordinates for each atomA matrix of dimensions 119898 times 119899 was then constructed for

the residues of 119878 and 119879 as

119872 =

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

119872

11119872

12sdot sdot sdot 119872

1119899

119872

21119872

22sdot sdot sdot 119872

2119899

sdot sdot sdot sdot sdot sdot sdot sdot sdot sdot sdot sdot

119872

1198981119872

1198982sdot sdot sdot 119872

119898119899

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

1003816

(1)

where the element119872119894119895is a rigid-body transformation matrix

that transforms the triplet 120590119894to 120591119895(ie119872

119894119895120590

119894= 120591

119895)

44 Performing Triplet Clustering 119863119894119895119896119897is the Cartesian dis-

tance between the target 120591119897and the transformed triplet119872

119894119895120590

119896

providing a measure of how similarly the triplet pairs (120590119894 120591

119895)

and (120590119896 120591

119897) are oriented This allows clustering of triplet

fragments using the single-linkage algorithm [43] as followsIf for two triplet pairs (120590

119894 120591

119895) and (120590

119896 120591

119897) 119863119894119895119896119897lt 119863

0 119894 = 119896

and 119895 = 119897 then the triplets are clustered Let 1198661and 119866

2

be two clusters with the first containing (120590119894 120591

119895) and (120590

119896 120591

119897)

and the second containing (1205901198941015840 1205911198951015840) and (120590

1198961015840 1205911198971015840) If 119863119894119895

11989610158401198971015840lt

119863

0 then 119866

1and 119866

2are merged to form a new cluster 119866

3

where 1198663= 119866

1cup 119866

2 These procedures are performed

iteratively until no new clusters can be formed For each finalcluster 119866

120583 we can obtain the transformation matrix119872120583

119896119897and

aligned substructure pair 119878120583= ⋃

120590119896isin119866120583

120590

119896and 119879

120583= ⋃

120591119897isin119866120583

120591

119897

where 119866120583

has the minimum Cartesian distance whenusing119872120583

119896119897

45 Scoring Function For each residue 119894 the binding score119862119894

is defined as

119862

119894= MAX120590119894isin119866120583

(120576

120583times 119862

119877

120583times 119862

119861

120583times 119862

119863

120583) (2)

10 BioMed Research International

LYSLYS

GLY

ILE

GLU

GLU

GLY

LEU

SER

TRP

ASN

GLN

Ligand-binding residue template database

0123456

96 146

Query protein

The fragment transformation method

Scoring function

FAD

NAD

minus1

ALA

THR

SER

SER PHE

LYS

SER

SER

MET ARG

VAL

Figure 8 Schematic of the method for predicting NAD-FAD-binding sites

where 120576120583is the number of triplets of 119878

120583(ie the aligned

residues of the query structure)The alignment scores119862119877120583119862119861120583

and 119862119863120583are defined as

119862

119877

120583=

1

1 + RMSD (119878120583 119879

120583)

119862

119861

120583=

BLOSUM (119878120583 119879

120583)

BLOSUM (119879120583 119879

120583)

+ 1

119862

119863

120583=

DSSP (119878120583 119879

120583)

DSSP (119879120583 119879

120583)

+ 1

(3)

where RMSD (119878120583 119879

120583) is the RMSD of all 119862

120572atoms between

119878

120583and 119879

120583 BLOSUM (119878

120583 119879

120583) is the sequence alignment

score between 119878120583and 119879

120583calculated using the BLOSUM62

[41] substitution matrix BLOSUM (119879120583 119879

120583) is the maximum

sequence alignment score of 119879120583 DSSP (119878

120583 119879

120583) represents the

secondary structure alignment score based on a constructionsubstitution matrix [22] using the definition of DSSP [42]between 119878

120583and 119879

120583 and DSSP (119879

120583 119879

120583) is the maximum

secondary structure alignment score of 119879120583 The value of

RMSD (119878120583 119879

120583) should be lt3 A

For each residue 119894 we predict a geometric centerΘ120596119894of the

ligand by Θ120596119894= 119872

120583

119896119897

minus1

119871

120596 where 119871

120596is the geometric center

of the binding template type 120596 in template 119879 120596 representsthe three moieties of NADFAD nicotinamide adenosine

BioMed Research International 11

LYSLYS

GLY

ILEGLU

GLU GLY

LEUSER

TRP

ASN

GLN

(a)

LYS

GLYGLU

GLN

LEU

ASN

Nicotinamide-binding templates

(b)

SER

TRPPhosphate-binding templates

(c)

LYS

GLY

GLU

ILE

Adenosine-binding templates

(d)

Figure 9NAD-binding residue templates (a)The entireNAD-binding template (b)Nicotinamide-binding templates (c) Phosphate-bindingtemplates (d) Adenosine-binding templates

LYS

ARG

MET

ALA

THRSER

SER

SER

SER

PHE

ASN

VAL

(a)

Flavin-binding templates

SER

SER

SER

VAL ARG

THRASN

(b)

Phosphate-binding templates

SER

MET

LYS

(c)

Adenosine-binding templates

SER

ALAPHE

(d)

Figure 10 FAD-binding residue templates (a) The entire FAD-binding template (b) Flavin-binding templates (c) Phosphate-bindingtemplates (d) Adenosine-binding templates

12 BioMed Research International

and phosphate for NAD flavin adenosine and phosphatefor FAD The binding score 119862

119896is added to 119862

119894if the distance

betweenΘ120596119894andΘ120596

1015840

119896is between 3 and 9 A and120596 = 1205961015840 Finally

the normalized binding score 119885119862119894is calculated as

119885

119862

119894=

119862

119894minus 119862

119878119863

119862

(4)

where 119862 and 119878119863119862denote the mean and standard deviation

respectively of the binding score 119862119894

46 Performance Assessment The accuracy of predictingNAD-FAD-binding sites wasdefined as the number of truepositives and true negatives and was evaluated using a leave-one-out approach Accuracy (ACC) the true positive rate(TPR) and the false positive rate (FPR) were calculated usingtrue positive (TP) true negative (TN) false positive (FP) andfalse negative (FN) values as follows

ACC = TP + TNTP + TN + FP + FN

TPR = Sensitivity = TPTP + FN

FPR = 1 minus Specificity = FPFP + TN

(5)

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Authorsrsquo Contribution

Chih-Hao Lu and Chin-Sheng Yu developed and imple-mented the methods Chih-Hao Lu Chin-Sheng Yu Yu-Feng Lin and Jin-Yi Chen carried out the analysis Chih-Hao Lu and Yu-Feng Lin drafted the paper Chih-Hao Lusupervised the work All the authors have read and approvedthe content of the final paper Chih-Hao Lu and Chin-ShengYu contributed equally to this work

Acknowledgment

Thisworkwas supported byGrants from theNational ScienceCouncil (NSC 99-2113-M-039-002-MY2) and China MedicalUniversity (CMU99-N2-02) Taiwan to Chih-Hao Lu Thefunders had no role in study design data collection andanalysis decision to publish or preparation of the paper Theauthors are grateful to Yeong-Shin Lin (National Chiao TungUniversity Taiwan) for invaluable comments

References

[1] H M Berman J Westbrook Z Feng et al ldquoThe protein databankrdquo Nucleic Acids Research vol 28 no 1 pp 235ndash242 2000

[2] A Wilkinson J Day and R Bowater ldquoBacterial DNA ligasesrdquoMolecular Microbiology vol 40 no 6 pp 1241ndash1248 2001

[3] A Burkle ldquoPhysiology and pathophysiology of poly(ADP-ribosyl)ationrdquo BioEssays vol 23 no 9 pp 795ndash806 2001

[4] Q Zhang D W Piston and R H Goodman ldquoRegulation ofcorepressor function by nuclear NADHrdquo Science vol 295 no5561 pp 1895ndash1897 2002

[5] J S Smith and J D Boeke ldquoAn unusual form of transcriptionalsilencing in yeast ribosomalDNArdquoGenes andDevelopment vol11 no 2 pp 241ndash254 1997

[6] R M Anderson K J Bitterman J G Wood et al ldquoManipula-tion of a nuclear NAD+ salvage pathway delays aging withoutaltering steady-state NAD+ levelsrdquo The Journal of BiologicalChemistry vol 277 no 21 pp 18881ndash18890 2002

[7] J Rutter M Reick L C Wu and S L McKnight ldquoRegulationof crock and NPAS2 DNA binding by the redox state of NADcofactorsrdquo Science vol 293 no 5529 pp 510ndash514 2001

[8] K ChenM JMizianty and L Kurgan ldquoPrediction and analysisof nucleotide-binding residues using sequence and sequence-derived structural descriptorsrdquo Bioinformatics vol 28 no 3 pp331ndash341 2012

[9] M Saito M Go and T Shirai ldquoAn empirical approach fordetecting nucleotide-binding sites on proteinsrdquo Protein Engi-neering Design and Selection vol 19 no 2 pp 67ndash75 2006

[10] H R Ansari and G P S Raghava ldquoIdentification of NADinteracting residues in proteinsrdquo BMC Bioinformatics vol 11article 160 2010

[11] N K Mishra and G P S Raghava ldquoPrediction of FADinteracting residues in a protein from its primary sequenceusing evolutionary informationrdquo BMC Bioinformatics vol 11article S48 no 1 2010

[12] Z-P Liu L-Y Wu Y Wang X-S Zhang and L ChenldquoPrediction of protein-RNA binding sites by a random forestmethod with combined featuresrdquo Bioinformatics vol 26 no 13pp 1616ndash1622 2010

[13] L Wang Z P Liu X S Zhang and L Chen ldquoPrediction of hotspots in protein interfaces using a random forest model withhybrid featuresrdquo Protein Engineering Design and Selection vol25 no 3 pp 119ndash126 2012

[14] J S Chauhan N KMishra andG P S Raghava ldquoPrediction ofGTP interacting residues dipeptides and tripeptides in a pro-tein from its evolutionary informationrdquo BMC Bioinformaticsvol 11 article 301 2010

[15] A Roy and Y Zhang ldquoRecognizing protein-ligand binding sitesby global structural alignment and local geometry refinementrdquoStructure vol 20 no 6 pp 987ndash997 2012

[16] L Xie and P E Bourne ldquoDetecting evolutionary relationshipsacross existing fold space using sequence order-independentprofile-profile alignmentsrdquo Proceedings of the National Academyof Sciences of the United States of America vol 105 no 14 pp5441ndash5446 2008

[17] J Yang A Roy and Y Zhang ldquoProtein-ligand binding siterecognition using complementary binding-specific substruc-ture comparison and sequence profile alignmentrdquo Bioinformat-ics vol 29 no 20 pp 2588ndash2595 2013

[18] Y T Yan and W-H Li ldquoIdentification of protein functionalsurfaces by the concept of a split pocketrdquo Proteins StructureFunction and Bioinformatics vol 76 no 4 pp 959ndash976 2009

[19] K A Dill ldquoDominant forces in protein foldingrdquo Biochemistryvol 29 no 31 pp 7133ndash7155 1990

[20] S Govindarajan and R A Goldstein ldquoEvolution of modelproteins on a foldability landscaperdquo Proteins vol 29 no 4 pp461ndash466 1997

BioMed Research International 13

[21] G Parisi and J Echave ldquoStructural constraints and emergence ofsequence patterns in protein evolutionrdquo Molecular Biology andEvolution vol 18 no 5 pp 750ndash756 2001

[22] C H Lu Y S Lin Y C Chen C S Yu S Y Chang andJ K Hwang ldquoThe fragment transformation method to detectthe protein structural motifsrdquo Proteins Structure Function andGenetics vol 63 no 3 pp 636ndash643 2006

[23] L Schrodinger ThePyMOLMolecular Graphics System Version13r1 2010

[24] U Dengler K Niefind M Kieszlig and D Schomburg ldquoCrystalstructure of a ternary complex of D-2-hydroxy-isocaproatedehydrogenase from Lactobacillus casei NAD+ and 2-oxoisocaproate at 19 A resolutionrdquo Journal of MolecularBiology vol 267 no 3 pp 640ndash660 1997

[25] E Gross C S Sevier A Vala C A Kaiser and D Fass ldquoA newFAD-binding fold and intersubunit disulfide shuttle in the thioloxidase Erv2prdquoNature Structural Biology vol 9 no 1 pp 61ndash672002

[26] S V Antonyuk R W Strange M J Ellis et al ldquoStructure ofd-lactate dehydrogenase from Aquifex aeolicus complexed withNAD+ and lactic acid (or pyruvate)rdquo Acta Crystallographica FStructural Biology and Crystallization Communications vol 65part 12 pp 1209ndash1213 2009

[27] C K Wu T A Dailey H A Dailey B C Wang and J PRose ldquoThe crystal structure of augmenter of liver regenerationa mammalian FAD-dependent sulfhydryl oxidaserdquo ProteinScience vol 12 no 5 pp 1109ndash1118 2003

[28] J R Thompson J K Bell J Bratt G A Grant and LJ Banaszak ldquoVmax regulation through domain and subunitchanges The active form of phosphoglycerate dehydrogenaserdquoBiochemistry vol 44 no 15 pp 5763ndash5773 2005

[29] M Nardini S Spano C Cericola et al ldquoCtBPBARS a dual-function protein involved in transcription co-repression andGolgi membrane fissionrdquo EMBO Journal vol 22 no 12 pp3122ndash3130 2003

[30] R Kort H Komori S I Adachi K Miki and A Eker ldquoDNAapophotolyase from Anacystis nidulans 18 A structure 8-HDF reconstitution and X-ray-induced FAD reductionrdquo ActaCrystallographica Section D Biological Crystallography vol 60no 7 pp 1205ndash1213 2004

[31] Y Huang R Baxter B S Smith C L Partch C L Colbert andJ Deisenhofer ldquoCrystal structure of cryptochrome 3 from Ara-bidopsis thaliana and its implications for photolyase activityrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 103 no 47 pp 17701ndash17706 2006

[32] J B Thoden T M Wohlers J L Fridovich-Keil and H MHolden ldquoMolecular basis for severe epimerase deficiency galac-tosemia X-ray structure of the humanV94M-substitutedUDP-galactose 4-epimeraserdquoThe Journal of Biological Chemistry vol276 no 23 pp 20617ndash20623 2001

[33] C A Brautigam B S Smith Z Ma et al ldquoStructure of thephotolyase-like domain of cryptochrome 1 from Arabidopsisthalianardquo Proceedings of the National Academy of Sciences of theUnited States of America vol 101 no 33 pp 12142ndash12147 2004

[34] H Komori R Masui S Kuramitsu et al ldquoCrystal structure ofthermostable DNA photolyase pyrimidine-dimer recognitionmechanismrdquo Proceedings of the National Academy of Sciences ofthe United States of America vol 98 no 24 pp 13560ndash135652001

[35] F Todone M A Vanoni A Mozzarelli et al ldquoActive siteplasticity in D-amino acid oxidase a crystallographic analysisrdquoBiochemistry vol 36 no 19 pp 5853ndash5860 1997

[36] I F Sevrioukova H Li and T L Poulos ldquoCrystal structure ofputidaredoxin reductase from Pseudomonas putida the finalstructural component of the cytochromeP450cammonooxyge-naserdquo Journal of Molecular Biology vol 336 no 4 pp 889ndash9022004

[37] S Y Song Y B Xu Z J Lin and C L Tsou ldquoStructure ofactive site carboxymethylated D-glyceraldehyde-3-phosphatedehydrogenase from Palinurus versicolorrdquo Journal of MolecularBiology vol 287 no 4 pp 719ndash725 1999

[38] L Pollegioni KDiederichs GMolla et al ldquoYeastD-amino acidoxidase structural basis of its catalytic propertiesrdquo Journal ofMolecular Biology vol 324 no 3 pp 535ndash546 2002

[39] E C Settembre P C Dorrestein J H Park A M Augustine TP Begley and S E Ealick ldquoStructural and mechanistic studieson thiO a glycine oxidase essential for thiamin biosynthesis inBacillus subtilisrdquo Biochemistry vol 42 no 10 pp 2971ndash29812003

[40] J Yang A Roy and Y Zhang ldquoBioLiP a semi-manually curateddatabase for biologically relevant ligand-protein interactionsrdquoNucleic Acids Research vol 41 no 1 pp D1096ndashD1103 2013

[41] S Henikoff and J G Henikoff ldquoAmino acid substitution matri-ces from protein blocksrdquo Proceedings of the National Academyof Sciences of the United States of America vol 89 no 22 pp10915ndash10919 1992

[42] W Kabsch and C Sander ldquoDictionary of protein secondarystructure pattern recognition of hydrogen-bonded and geo-metrical featuresrdquo Biopolymers vol 22 no 12 pp 2577ndash26371983

[43] J C Gower and G J S Ross ldquoMinimum spanning trees andsingle-linkage cluster analysisrdquo Journal of the Royal StatisticalSociety vol 18 no 1 article 11 1969

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 10: Research Article Predicting Flavin and Nicotinamide ...downloads.hindawi.com/journals/bmri/2015/402536.pdf · Adenosine-binding of FAD. (e) Phosphate-binding of FAD. (f) Flav in-binding

10 BioMed Research International

LYSLYS

GLY

ILE

GLU

GLU

GLY

LEU

SER

TRP

ASN

GLN

Ligand-binding residue template database

0123456

96 146

Query protein

The fragment transformation method

Scoring function

FAD

NAD

minus1

ALA

THR

SER

SER PHE

LYS

SER

SER

MET ARG

VAL

Figure 8 Schematic of the method for predicting NAD-FAD-binding sites

where 120576120583is the number of triplets of 119878

120583(ie the aligned

residues of the query structure)The alignment scores119862119877120583119862119861120583

and 119862119863120583are defined as

119862

119877

120583=

1

1 + RMSD (119878120583 119879

120583)

119862

119861

120583=

BLOSUM (119878120583 119879

120583)

BLOSUM (119879120583 119879

120583)

+ 1

119862

119863

120583=

DSSP (119878120583 119879

120583)

DSSP (119879120583 119879

120583)

+ 1

(3)

where RMSD (119878120583 119879

120583) is the RMSD of all 119862

120572atoms between

119878

120583and 119879

120583 BLOSUM (119878

120583 119879

120583) is the sequence alignment

score between 119878120583and 119879

120583calculated using the BLOSUM62

[41] substitution matrix BLOSUM (119879120583 119879

120583) is the maximum

sequence alignment score of 119879120583 DSSP (119878

120583 119879

120583) represents the

secondary structure alignment score based on a constructionsubstitution matrix [22] using the definition of DSSP [42]between 119878

120583and 119879

120583 and DSSP (119879

120583 119879

120583) is the maximum

secondary structure alignment score of 119879120583 The value of

RMSD (119878120583 119879

120583) should be lt3 A

For each residue 119894 we predict a geometric centerΘ120596119894of the

ligand by Θ120596119894= 119872

120583

119896119897

minus1

119871

120596 where 119871

120596is the geometric center

of the binding template type 120596 in template 119879 120596 representsthe three moieties of NADFAD nicotinamide adenosine

BioMed Research International 11

LYSLYS

GLY

ILEGLU

GLU GLY

LEUSER

TRP

ASN

GLN

(a)

LYS

GLYGLU

GLN

LEU

ASN

Nicotinamide-binding templates

(b)

SER

TRPPhosphate-binding templates

(c)

LYS

GLY

GLU

ILE

Adenosine-binding templates

(d)

Figure 9NAD-binding residue templates (a)The entireNAD-binding template (b)Nicotinamide-binding templates (c) Phosphate-bindingtemplates (d) Adenosine-binding templates

LYS

ARG

MET

ALA

THRSER

SER

SER

SER

PHE

ASN

VAL

(a)

Flavin-binding templates

SER

SER

SER

VAL ARG

THRASN

(b)

Phosphate-binding templates

SER

MET

LYS

(c)

Adenosine-binding templates

SER

ALAPHE

(d)

Figure 10 FAD-binding residue templates (a) The entire FAD-binding template (b) Flavin-binding templates (c) Phosphate-bindingtemplates (d) Adenosine-binding templates

12 BioMed Research International

and phosphate for NAD flavin adenosine and phosphatefor FAD The binding score 119862

119896is added to 119862

119894if the distance

betweenΘ120596119894andΘ120596

1015840

119896is between 3 and 9 A and120596 = 1205961015840 Finally

the normalized binding score 119885119862119894is calculated as

119885

119862

119894=

119862

119894minus 119862

119878119863

119862

(4)

where 119862 and 119878119863119862denote the mean and standard deviation

respectively of the binding score 119862119894

46 Performance Assessment The accuracy of predictingNAD-FAD-binding sites wasdefined as the number of truepositives and true negatives and was evaluated using a leave-one-out approach Accuracy (ACC) the true positive rate(TPR) and the false positive rate (FPR) were calculated usingtrue positive (TP) true negative (TN) false positive (FP) andfalse negative (FN) values as follows

ACC = TP + TNTP + TN + FP + FN

TPR = Sensitivity = TPTP + FN

FPR = 1 minus Specificity = FPFP + TN

(5)

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Authorsrsquo Contribution

Chih-Hao Lu and Chin-Sheng Yu developed and imple-mented the methods Chih-Hao Lu Chin-Sheng Yu Yu-Feng Lin and Jin-Yi Chen carried out the analysis Chih-Hao Lu and Yu-Feng Lin drafted the paper Chih-Hao Lusupervised the work All the authors have read and approvedthe content of the final paper Chih-Hao Lu and Chin-ShengYu contributed equally to this work

Acknowledgment

Thisworkwas supported byGrants from theNational ScienceCouncil (NSC 99-2113-M-039-002-MY2) and China MedicalUniversity (CMU99-N2-02) Taiwan to Chih-Hao Lu Thefunders had no role in study design data collection andanalysis decision to publish or preparation of the paper Theauthors are grateful to Yeong-Shin Lin (National Chiao TungUniversity Taiwan) for invaluable comments

References

[1] H M Berman J Westbrook Z Feng et al ldquoThe protein databankrdquo Nucleic Acids Research vol 28 no 1 pp 235ndash242 2000

[2] A Wilkinson J Day and R Bowater ldquoBacterial DNA ligasesrdquoMolecular Microbiology vol 40 no 6 pp 1241ndash1248 2001

[3] A Burkle ldquoPhysiology and pathophysiology of poly(ADP-ribosyl)ationrdquo BioEssays vol 23 no 9 pp 795ndash806 2001

[4] Q Zhang D W Piston and R H Goodman ldquoRegulation ofcorepressor function by nuclear NADHrdquo Science vol 295 no5561 pp 1895ndash1897 2002

[5] J S Smith and J D Boeke ldquoAn unusual form of transcriptionalsilencing in yeast ribosomalDNArdquoGenes andDevelopment vol11 no 2 pp 241ndash254 1997

[6] R M Anderson K J Bitterman J G Wood et al ldquoManipula-tion of a nuclear NAD+ salvage pathway delays aging withoutaltering steady-state NAD+ levelsrdquo The Journal of BiologicalChemistry vol 277 no 21 pp 18881ndash18890 2002

[7] J Rutter M Reick L C Wu and S L McKnight ldquoRegulationof crock and NPAS2 DNA binding by the redox state of NADcofactorsrdquo Science vol 293 no 5529 pp 510ndash514 2001

[8] K ChenM JMizianty and L Kurgan ldquoPrediction and analysisof nucleotide-binding residues using sequence and sequence-derived structural descriptorsrdquo Bioinformatics vol 28 no 3 pp331ndash341 2012

[9] M Saito M Go and T Shirai ldquoAn empirical approach fordetecting nucleotide-binding sites on proteinsrdquo Protein Engi-neering Design and Selection vol 19 no 2 pp 67ndash75 2006

[10] H R Ansari and G P S Raghava ldquoIdentification of NADinteracting residues in proteinsrdquo BMC Bioinformatics vol 11article 160 2010

[11] N K Mishra and G P S Raghava ldquoPrediction of FADinteracting residues in a protein from its primary sequenceusing evolutionary informationrdquo BMC Bioinformatics vol 11article S48 no 1 2010

[12] Z-P Liu L-Y Wu Y Wang X-S Zhang and L ChenldquoPrediction of protein-RNA binding sites by a random forestmethod with combined featuresrdquo Bioinformatics vol 26 no 13pp 1616ndash1622 2010

[13] L Wang Z P Liu X S Zhang and L Chen ldquoPrediction of hotspots in protein interfaces using a random forest model withhybrid featuresrdquo Protein Engineering Design and Selection vol25 no 3 pp 119ndash126 2012

[14] J S Chauhan N KMishra andG P S Raghava ldquoPrediction ofGTP interacting residues dipeptides and tripeptides in a pro-tein from its evolutionary informationrdquo BMC Bioinformaticsvol 11 article 301 2010

[15] A Roy and Y Zhang ldquoRecognizing protein-ligand binding sitesby global structural alignment and local geometry refinementrdquoStructure vol 20 no 6 pp 987ndash997 2012

[16] L Xie and P E Bourne ldquoDetecting evolutionary relationshipsacross existing fold space using sequence order-independentprofile-profile alignmentsrdquo Proceedings of the National Academyof Sciences of the United States of America vol 105 no 14 pp5441ndash5446 2008

[17] J Yang A Roy and Y Zhang ldquoProtein-ligand binding siterecognition using complementary binding-specific substruc-ture comparison and sequence profile alignmentrdquo Bioinformat-ics vol 29 no 20 pp 2588ndash2595 2013

[18] Y T Yan and W-H Li ldquoIdentification of protein functionalsurfaces by the concept of a split pocketrdquo Proteins StructureFunction and Bioinformatics vol 76 no 4 pp 959ndash976 2009

[19] K A Dill ldquoDominant forces in protein foldingrdquo Biochemistryvol 29 no 31 pp 7133ndash7155 1990

[20] S Govindarajan and R A Goldstein ldquoEvolution of modelproteins on a foldability landscaperdquo Proteins vol 29 no 4 pp461ndash466 1997

BioMed Research International 13

[21] G Parisi and J Echave ldquoStructural constraints and emergence ofsequence patterns in protein evolutionrdquo Molecular Biology andEvolution vol 18 no 5 pp 750ndash756 2001

[22] C H Lu Y S Lin Y C Chen C S Yu S Y Chang andJ K Hwang ldquoThe fragment transformation method to detectthe protein structural motifsrdquo Proteins Structure Function andGenetics vol 63 no 3 pp 636ndash643 2006

[23] L Schrodinger ThePyMOLMolecular Graphics System Version13r1 2010

[24] U Dengler K Niefind M Kieszlig and D Schomburg ldquoCrystalstructure of a ternary complex of D-2-hydroxy-isocaproatedehydrogenase from Lactobacillus casei NAD+ and 2-oxoisocaproate at 19 A resolutionrdquo Journal of MolecularBiology vol 267 no 3 pp 640ndash660 1997

[25] E Gross C S Sevier A Vala C A Kaiser and D Fass ldquoA newFAD-binding fold and intersubunit disulfide shuttle in the thioloxidase Erv2prdquoNature Structural Biology vol 9 no 1 pp 61ndash672002

[26] S V Antonyuk R W Strange M J Ellis et al ldquoStructure ofd-lactate dehydrogenase from Aquifex aeolicus complexed withNAD+ and lactic acid (or pyruvate)rdquo Acta Crystallographica FStructural Biology and Crystallization Communications vol 65part 12 pp 1209ndash1213 2009

[27] C K Wu T A Dailey H A Dailey B C Wang and J PRose ldquoThe crystal structure of augmenter of liver regenerationa mammalian FAD-dependent sulfhydryl oxidaserdquo ProteinScience vol 12 no 5 pp 1109ndash1118 2003

[28] J R Thompson J K Bell J Bratt G A Grant and LJ Banaszak ldquoVmax regulation through domain and subunitchanges The active form of phosphoglycerate dehydrogenaserdquoBiochemistry vol 44 no 15 pp 5763ndash5773 2005

[29] M Nardini S Spano C Cericola et al ldquoCtBPBARS a dual-function protein involved in transcription co-repression andGolgi membrane fissionrdquo EMBO Journal vol 22 no 12 pp3122ndash3130 2003

[30] R Kort H Komori S I Adachi K Miki and A Eker ldquoDNAapophotolyase from Anacystis nidulans 18 A structure 8-HDF reconstitution and X-ray-induced FAD reductionrdquo ActaCrystallographica Section D Biological Crystallography vol 60no 7 pp 1205ndash1213 2004

[31] Y Huang R Baxter B S Smith C L Partch C L Colbert andJ Deisenhofer ldquoCrystal structure of cryptochrome 3 from Ara-bidopsis thaliana and its implications for photolyase activityrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 103 no 47 pp 17701ndash17706 2006

[32] J B Thoden T M Wohlers J L Fridovich-Keil and H MHolden ldquoMolecular basis for severe epimerase deficiency galac-tosemia X-ray structure of the humanV94M-substitutedUDP-galactose 4-epimeraserdquoThe Journal of Biological Chemistry vol276 no 23 pp 20617ndash20623 2001

[33] C A Brautigam B S Smith Z Ma et al ldquoStructure of thephotolyase-like domain of cryptochrome 1 from Arabidopsisthalianardquo Proceedings of the National Academy of Sciences of theUnited States of America vol 101 no 33 pp 12142ndash12147 2004

[34] H Komori R Masui S Kuramitsu et al ldquoCrystal structure ofthermostable DNA photolyase pyrimidine-dimer recognitionmechanismrdquo Proceedings of the National Academy of Sciences ofthe United States of America vol 98 no 24 pp 13560ndash135652001

[35] F Todone M A Vanoni A Mozzarelli et al ldquoActive siteplasticity in D-amino acid oxidase a crystallographic analysisrdquoBiochemistry vol 36 no 19 pp 5853ndash5860 1997

[36] I F Sevrioukova H Li and T L Poulos ldquoCrystal structure ofputidaredoxin reductase from Pseudomonas putida the finalstructural component of the cytochromeP450cammonooxyge-naserdquo Journal of Molecular Biology vol 336 no 4 pp 889ndash9022004

[37] S Y Song Y B Xu Z J Lin and C L Tsou ldquoStructure ofactive site carboxymethylated D-glyceraldehyde-3-phosphatedehydrogenase from Palinurus versicolorrdquo Journal of MolecularBiology vol 287 no 4 pp 719ndash725 1999

[38] L Pollegioni KDiederichs GMolla et al ldquoYeastD-amino acidoxidase structural basis of its catalytic propertiesrdquo Journal ofMolecular Biology vol 324 no 3 pp 535ndash546 2002

[39] E C Settembre P C Dorrestein J H Park A M Augustine TP Begley and S E Ealick ldquoStructural and mechanistic studieson thiO a glycine oxidase essential for thiamin biosynthesis inBacillus subtilisrdquo Biochemistry vol 42 no 10 pp 2971ndash29812003

[40] J Yang A Roy and Y Zhang ldquoBioLiP a semi-manually curateddatabase for biologically relevant ligand-protein interactionsrdquoNucleic Acids Research vol 41 no 1 pp D1096ndashD1103 2013

[41] S Henikoff and J G Henikoff ldquoAmino acid substitution matri-ces from protein blocksrdquo Proceedings of the National Academyof Sciences of the United States of America vol 89 no 22 pp10915ndash10919 1992

[42] W Kabsch and C Sander ldquoDictionary of protein secondarystructure pattern recognition of hydrogen-bonded and geo-metrical featuresrdquo Biopolymers vol 22 no 12 pp 2577ndash26371983

[43] J C Gower and G J S Ross ldquoMinimum spanning trees andsingle-linkage cluster analysisrdquo Journal of the Royal StatisticalSociety vol 18 no 1 article 11 1969

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 11: Research Article Predicting Flavin and Nicotinamide ...downloads.hindawi.com/journals/bmri/2015/402536.pdf · Adenosine-binding of FAD. (e) Phosphate-binding of FAD. (f) Flav in-binding

BioMed Research International 11

LYSLYS

GLY

ILEGLU

GLU GLY

LEUSER

TRP

ASN

GLN

(a)

LYS

GLYGLU

GLN

LEU

ASN

Nicotinamide-binding templates

(b)

SER

TRPPhosphate-binding templates

(c)

LYS

GLY

GLU

ILE

Adenosine-binding templates

(d)

Figure 9NAD-binding residue templates (a)The entireNAD-binding template (b)Nicotinamide-binding templates (c) Phosphate-bindingtemplates (d) Adenosine-binding templates

LYS

ARG

MET

ALA

THRSER

SER

SER

SER

PHE

ASN

VAL

(a)

Flavin-binding templates

SER

SER

SER

VAL ARG

THRASN

(b)

Phosphate-binding templates

SER

MET

LYS

(c)

Adenosine-binding templates

SER

ALAPHE

(d)

Figure 10 FAD-binding residue templates (a) The entire FAD-binding template (b) Flavin-binding templates (c) Phosphate-bindingtemplates (d) Adenosine-binding templates

12 BioMed Research International

and phosphate for NAD flavin adenosine and phosphatefor FAD The binding score 119862

119896is added to 119862

119894if the distance

betweenΘ120596119894andΘ120596

1015840

119896is between 3 and 9 A and120596 = 1205961015840 Finally

the normalized binding score 119885119862119894is calculated as

119885

119862

119894=

119862

119894minus 119862

119878119863

119862

(4)

where 119862 and 119878119863119862denote the mean and standard deviation

respectively of the binding score 119862119894

46 Performance Assessment The accuracy of predictingNAD-FAD-binding sites wasdefined as the number of truepositives and true negatives and was evaluated using a leave-one-out approach Accuracy (ACC) the true positive rate(TPR) and the false positive rate (FPR) were calculated usingtrue positive (TP) true negative (TN) false positive (FP) andfalse negative (FN) values as follows

ACC = TP + TNTP + TN + FP + FN

TPR = Sensitivity = TPTP + FN

FPR = 1 minus Specificity = FPFP + TN

(5)

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Authorsrsquo Contribution

Chih-Hao Lu and Chin-Sheng Yu developed and imple-mented the methods Chih-Hao Lu Chin-Sheng Yu Yu-Feng Lin and Jin-Yi Chen carried out the analysis Chih-Hao Lu and Yu-Feng Lin drafted the paper Chih-Hao Lusupervised the work All the authors have read and approvedthe content of the final paper Chih-Hao Lu and Chin-ShengYu contributed equally to this work

Acknowledgment

Thisworkwas supported byGrants from theNational ScienceCouncil (NSC 99-2113-M-039-002-MY2) and China MedicalUniversity (CMU99-N2-02) Taiwan to Chih-Hao Lu Thefunders had no role in study design data collection andanalysis decision to publish or preparation of the paper Theauthors are grateful to Yeong-Shin Lin (National Chiao TungUniversity Taiwan) for invaluable comments

References

[1] H M Berman J Westbrook Z Feng et al ldquoThe protein databankrdquo Nucleic Acids Research vol 28 no 1 pp 235ndash242 2000

[2] A Wilkinson J Day and R Bowater ldquoBacterial DNA ligasesrdquoMolecular Microbiology vol 40 no 6 pp 1241ndash1248 2001

[3] A Burkle ldquoPhysiology and pathophysiology of poly(ADP-ribosyl)ationrdquo BioEssays vol 23 no 9 pp 795ndash806 2001

[4] Q Zhang D W Piston and R H Goodman ldquoRegulation ofcorepressor function by nuclear NADHrdquo Science vol 295 no5561 pp 1895ndash1897 2002

[5] J S Smith and J D Boeke ldquoAn unusual form of transcriptionalsilencing in yeast ribosomalDNArdquoGenes andDevelopment vol11 no 2 pp 241ndash254 1997

[6] R M Anderson K J Bitterman J G Wood et al ldquoManipula-tion of a nuclear NAD+ salvage pathway delays aging withoutaltering steady-state NAD+ levelsrdquo The Journal of BiologicalChemistry vol 277 no 21 pp 18881ndash18890 2002

[7] J Rutter M Reick L C Wu and S L McKnight ldquoRegulationof crock and NPAS2 DNA binding by the redox state of NADcofactorsrdquo Science vol 293 no 5529 pp 510ndash514 2001

[8] K ChenM JMizianty and L Kurgan ldquoPrediction and analysisof nucleotide-binding residues using sequence and sequence-derived structural descriptorsrdquo Bioinformatics vol 28 no 3 pp331ndash341 2012

[9] M Saito M Go and T Shirai ldquoAn empirical approach fordetecting nucleotide-binding sites on proteinsrdquo Protein Engi-neering Design and Selection vol 19 no 2 pp 67ndash75 2006

[10] H R Ansari and G P S Raghava ldquoIdentification of NADinteracting residues in proteinsrdquo BMC Bioinformatics vol 11article 160 2010

[11] N K Mishra and G P S Raghava ldquoPrediction of FADinteracting residues in a protein from its primary sequenceusing evolutionary informationrdquo BMC Bioinformatics vol 11article S48 no 1 2010

[12] Z-P Liu L-Y Wu Y Wang X-S Zhang and L ChenldquoPrediction of protein-RNA binding sites by a random forestmethod with combined featuresrdquo Bioinformatics vol 26 no 13pp 1616ndash1622 2010

[13] L Wang Z P Liu X S Zhang and L Chen ldquoPrediction of hotspots in protein interfaces using a random forest model withhybrid featuresrdquo Protein Engineering Design and Selection vol25 no 3 pp 119ndash126 2012

[14] J S Chauhan N KMishra andG P S Raghava ldquoPrediction ofGTP interacting residues dipeptides and tripeptides in a pro-tein from its evolutionary informationrdquo BMC Bioinformaticsvol 11 article 301 2010

[15] A Roy and Y Zhang ldquoRecognizing protein-ligand binding sitesby global structural alignment and local geometry refinementrdquoStructure vol 20 no 6 pp 987ndash997 2012

[16] L Xie and P E Bourne ldquoDetecting evolutionary relationshipsacross existing fold space using sequence order-independentprofile-profile alignmentsrdquo Proceedings of the National Academyof Sciences of the United States of America vol 105 no 14 pp5441ndash5446 2008

[17] J Yang A Roy and Y Zhang ldquoProtein-ligand binding siterecognition using complementary binding-specific substruc-ture comparison and sequence profile alignmentrdquo Bioinformat-ics vol 29 no 20 pp 2588ndash2595 2013

[18] Y T Yan and W-H Li ldquoIdentification of protein functionalsurfaces by the concept of a split pocketrdquo Proteins StructureFunction and Bioinformatics vol 76 no 4 pp 959ndash976 2009

[19] K A Dill ldquoDominant forces in protein foldingrdquo Biochemistryvol 29 no 31 pp 7133ndash7155 1990

[20] S Govindarajan and R A Goldstein ldquoEvolution of modelproteins on a foldability landscaperdquo Proteins vol 29 no 4 pp461ndash466 1997

BioMed Research International 13

[21] G Parisi and J Echave ldquoStructural constraints and emergence ofsequence patterns in protein evolutionrdquo Molecular Biology andEvolution vol 18 no 5 pp 750ndash756 2001

[22] C H Lu Y S Lin Y C Chen C S Yu S Y Chang andJ K Hwang ldquoThe fragment transformation method to detectthe protein structural motifsrdquo Proteins Structure Function andGenetics vol 63 no 3 pp 636ndash643 2006

[23] L Schrodinger ThePyMOLMolecular Graphics System Version13r1 2010

[24] U Dengler K Niefind M Kieszlig and D Schomburg ldquoCrystalstructure of a ternary complex of D-2-hydroxy-isocaproatedehydrogenase from Lactobacillus casei NAD+ and 2-oxoisocaproate at 19 A resolutionrdquo Journal of MolecularBiology vol 267 no 3 pp 640ndash660 1997

[25] E Gross C S Sevier A Vala C A Kaiser and D Fass ldquoA newFAD-binding fold and intersubunit disulfide shuttle in the thioloxidase Erv2prdquoNature Structural Biology vol 9 no 1 pp 61ndash672002

[26] S V Antonyuk R W Strange M J Ellis et al ldquoStructure ofd-lactate dehydrogenase from Aquifex aeolicus complexed withNAD+ and lactic acid (or pyruvate)rdquo Acta Crystallographica FStructural Biology and Crystallization Communications vol 65part 12 pp 1209ndash1213 2009

[27] C K Wu T A Dailey H A Dailey B C Wang and J PRose ldquoThe crystal structure of augmenter of liver regenerationa mammalian FAD-dependent sulfhydryl oxidaserdquo ProteinScience vol 12 no 5 pp 1109ndash1118 2003

[28] J R Thompson J K Bell J Bratt G A Grant and LJ Banaszak ldquoVmax regulation through domain and subunitchanges The active form of phosphoglycerate dehydrogenaserdquoBiochemistry vol 44 no 15 pp 5763ndash5773 2005

[29] M Nardini S Spano C Cericola et al ldquoCtBPBARS a dual-function protein involved in transcription co-repression andGolgi membrane fissionrdquo EMBO Journal vol 22 no 12 pp3122ndash3130 2003

[30] R Kort H Komori S I Adachi K Miki and A Eker ldquoDNAapophotolyase from Anacystis nidulans 18 A structure 8-HDF reconstitution and X-ray-induced FAD reductionrdquo ActaCrystallographica Section D Biological Crystallography vol 60no 7 pp 1205ndash1213 2004

[31] Y Huang R Baxter B S Smith C L Partch C L Colbert andJ Deisenhofer ldquoCrystal structure of cryptochrome 3 from Ara-bidopsis thaliana and its implications for photolyase activityrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 103 no 47 pp 17701ndash17706 2006

[32] J B Thoden T M Wohlers J L Fridovich-Keil and H MHolden ldquoMolecular basis for severe epimerase deficiency galac-tosemia X-ray structure of the humanV94M-substitutedUDP-galactose 4-epimeraserdquoThe Journal of Biological Chemistry vol276 no 23 pp 20617ndash20623 2001

[33] C A Brautigam B S Smith Z Ma et al ldquoStructure of thephotolyase-like domain of cryptochrome 1 from Arabidopsisthalianardquo Proceedings of the National Academy of Sciences of theUnited States of America vol 101 no 33 pp 12142ndash12147 2004

[34] H Komori R Masui S Kuramitsu et al ldquoCrystal structure ofthermostable DNA photolyase pyrimidine-dimer recognitionmechanismrdquo Proceedings of the National Academy of Sciences ofthe United States of America vol 98 no 24 pp 13560ndash135652001

[35] F Todone M A Vanoni A Mozzarelli et al ldquoActive siteplasticity in D-amino acid oxidase a crystallographic analysisrdquoBiochemistry vol 36 no 19 pp 5853ndash5860 1997

[36] I F Sevrioukova H Li and T L Poulos ldquoCrystal structure ofputidaredoxin reductase from Pseudomonas putida the finalstructural component of the cytochromeP450cammonooxyge-naserdquo Journal of Molecular Biology vol 336 no 4 pp 889ndash9022004

[37] S Y Song Y B Xu Z J Lin and C L Tsou ldquoStructure ofactive site carboxymethylated D-glyceraldehyde-3-phosphatedehydrogenase from Palinurus versicolorrdquo Journal of MolecularBiology vol 287 no 4 pp 719ndash725 1999

[38] L Pollegioni KDiederichs GMolla et al ldquoYeastD-amino acidoxidase structural basis of its catalytic propertiesrdquo Journal ofMolecular Biology vol 324 no 3 pp 535ndash546 2002

[39] E C Settembre P C Dorrestein J H Park A M Augustine TP Begley and S E Ealick ldquoStructural and mechanistic studieson thiO a glycine oxidase essential for thiamin biosynthesis inBacillus subtilisrdquo Biochemistry vol 42 no 10 pp 2971ndash29812003

[40] J Yang A Roy and Y Zhang ldquoBioLiP a semi-manually curateddatabase for biologically relevant ligand-protein interactionsrdquoNucleic Acids Research vol 41 no 1 pp D1096ndashD1103 2013

[41] S Henikoff and J G Henikoff ldquoAmino acid substitution matri-ces from protein blocksrdquo Proceedings of the National Academyof Sciences of the United States of America vol 89 no 22 pp10915ndash10919 1992

[42] W Kabsch and C Sander ldquoDictionary of protein secondarystructure pattern recognition of hydrogen-bonded and geo-metrical featuresrdquo Biopolymers vol 22 no 12 pp 2577ndash26371983

[43] J C Gower and G J S Ross ldquoMinimum spanning trees andsingle-linkage cluster analysisrdquo Journal of the Royal StatisticalSociety vol 18 no 1 article 11 1969

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 12: Research Article Predicting Flavin and Nicotinamide ...downloads.hindawi.com/journals/bmri/2015/402536.pdf · Adenosine-binding of FAD. (e) Phosphate-binding of FAD. (f) Flav in-binding

12 BioMed Research International

and phosphate for NAD flavin adenosine and phosphatefor FAD The binding score 119862

119896is added to 119862

119894if the distance

betweenΘ120596119894andΘ120596

1015840

119896is between 3 and 9 A and120596 = 1205961015840 Finally

the normalized binding score 119885119862119894is calculated as

119885

119862

119894=

119862

119894minus 119862

119878119863

119862

(4)

where 119862 and 119878119863119862denote the mean and standard deviation

respectively of the binding score 119862119894

46 Performance Assessment The accuracy of predictingNAD-FAD-binding sites wasdefined as the number of truepositives and true negatives and was evaluated using a leave-one-out approach Accuracy (ACC) the true positive rate(TPR) and the false positive rate (FPR) were calculated usingtrue positive (TP) true negative (TN) false positive (FP) andfalse negative (FN) values as follows

ACC = TP + TNTP + TN + FP + FN

TPR = Sensitivity = TPTP + FN

FPR = 1 minus Specificity = FPFP + TN

(5)

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Authorsrsquo Contribution

Chih-Hao Lu and Chin-Sheng Yu developed and imple-mented the methods Chih-Hao Lu Chin-Sheng Yu Yu-Feng Lin and Jin-Yi Chen carried out the analysis Chih-Hao Lu and Yu-Feng Lin drafted the paper Chih-Hao Lusupervised the work All the authors have read and approvedthe content of the final paper Chih-Hao Lu and Chin-ShengYu contributed equally to this work

Acknowledgment

Thisworkwas supported byGrants from theNational ScienceCouncil (NSC 99-2113-M-039-002-MY2) and China MedicalUniversity (CMU99-N2-02) Taiwan to Chih-Hao Lu Thefunders had no role in study design data collection andanalysis decision to publish or preparation of the paper Theauthors are grateful to Yeong-Shin Lin (National Chiao TungUniversity Taiwan) for invaluable comments

References

[1] H M Berman J Westbrook Z Feng et al ldquoThe protein databankrdquo Nucleic Acids Research vol 28 no 1 pp 235ndash242 2000

[2] A Wilkinson J Day and R Bowater ldquoBacterial DNA ligasesrdquoMolecular Microbiology vol 40 no 6 pp 1241ndash1248 2001

[3] A Burkle ldquoPhysiology and pathophysiology of poly(ADP-ribosyl)ationrdquo BioEssays vol 23 no 9 pp 795ndash806 2001

[4] Q Zhang D W Piston and R H Goodman ldquoRegulation ofcorepressor function by nuclear NADHrdquo Science vol 295 no5561 pp 1895ndash1897 2002

[5] J S Smith and J D Boeke ldquoAn unusual form of transcriptionalsilencing in yeast ribosomalDNArdquoGenes andDevelopment vol11 no 2 pp 241ndash254 1997

[6] R M Anderson K J Bitterman J G Wood et al ldquoManipula-tion of a nuclear NAD+ salvage pathway delays aging withoutaltering steady-state NAD+ levelsrdquo The Journal of BiologicalChemistry vol 277 no 21 pp 18881ndash18890 2002

[7] J Rutter M Reick L C Wu and S L McKnight ldquoRegulationof crock and NPAS2 DNA binding by the redox state of NADcofactorsrdquo Science vol 293 no 5529 pp 510ndash514 2001

[8] K ChenM JMizianty and L Kurgan ldquoPrediction and analysisof nucleotide-binding residues using sequence and sequence-derived structural descriptorsrdquo Bioinformatics vol 28 no 3 pp331ndash341 2012

[9] M Saito M Go and T Shirai ldquoAn empirical approach fordetecting nucleotide-binding sites on proteinsrdquo Protein Engi-neering Design and Selection vol 19 no 2 pp 67ndash75 2006

[10] H R Ansari and G P S Raghava ldquoIdentification of NADinteracting residues in proteinsrdquo BMC Bioinformatics vol 11article 160 2010

[11] N K Mishra and G P S Raghava ldquoPrediction of FADinteracting residues in a protein from its primary sequenceusing evolutionary informationrdquo BMC Bioinformatics vol 11article S48 no 1 2010

[12] Z-P Liu L-Y Wu Y Wang X-S Zhang and L ChenldquoPrediction of protein-RNA binding sites by a random forestmethod with combined featuresrdquo Bioinformatics vol 26 no 13pp 1616ndash1622 2010

[13] L Wang Z P Liu X S Zhang and L Chen ldquoPrediction of hotspots in protein interfaces using a random forest model withhybrid featuresrdquo Protein Engineering Design and Selection vol25 no 3 pp 119ndash126 2012

[14] J S Chauhan N KMishra andG P S Raghava ldquoPrediction ofGTP interacting residues dipeptides and tripeptides in a pro-tein from its evolutionary informationrdquo BMC Bioinformaticsvol 11 article 301 2010

[15] A Roy and Y Zhang ldquoRecognizing protein-ligand binding sitesby global structural alignment and local geometry refinementrdquoStructure vol 20 no 6 pp 987ndash997 2012

[16] L Xie and P E Bourne ldquoDetecting evolutionary relationshipsacross existing fold space using sequence order-independentprofile-profile alignmentsrdquo Proceedings of the National Academyof Sciences of the United States of America vol 105 no 14 pp5441ndash5446 2008

[17] J Yang A Roy and Y Zhang ldquoProtein-ligand binding siterecognition using complementary binding-specific substruc-ture comparison and sequence profile alignmentrdquo Bioinformat-ics vol 29 no 20 pp 2588ndash2595 2013

[18] Y T Yan and W-H Li ldquoIdentification of protein functionalsurfaces by the concept of a split pocketrdquo Proteins StructureFunction and Bioinformatics vol 76 no 4 pp 959ndash976 2009

[19] K A Dill ldquoDominant forces in protein foldingrdquo Biochemistryvol 29 no 31 pp 7133ndash7155 1990

[20] S Govindarajan and R A Goldstein ldquoEvolution of modelproteins on a foldability landscaperdquo Proteins vol 29 no 4 pp461ndash466 1997

BioMed Research International 13

[21] G Parisi and J Echave ldquoStructural constraints and emergence ofsequence patterns in protein evolutionrdquo Molecular Biology andEvolution vol 18 no 5 pp 750ndash756 2001

[22] C H Lu Y S Lin Y C Chen C S Yu S Y Chang andJ K Hwang ldquoThe fragment transformation method to detectthe protein structural motifsrdquo Proteins Structure Function andGenetics vol 63 no 3 pp 636ndash643 2006

[23] L Schrodinger ThePyMOLMolecular Graphics System Version13r1 2010

[24] U Dengler K Niefind M Kieszlig and D Schomburg ldquoCrystalstructure of a ternary complex of D-2-hydroxy-isocaproatedehydrogenase from Lactobacillus casei NAD+ and 2-oxoisocaproate at 19 A resolutionrdquo Journal of MolecularBiology vol 267 no 3 pp 640ndash660 1997

[25] E Gross C S Sevier A Vala C A Kaiser and D Fass ldquoA newFAD-binding fold and intersubunit disulfide shuttle in the thioloxidase Erv2prdquoNature Structural Biology vol 9 no 1 pp 61ndash672002

[26] S V Antonyuk R W Strange M J Ellis et al ldquoStructure ofd-lactate dehydrogenase from Aquifex aeolicus complexed withNAD+ and lactic acid (or pyruvate)rdquo Acta Crystallographica FStructural Biology and Crystallization Communications vol 65part 12 pp 1209ndash1213 2009

[27] C K Wu T A Dailey H A Dailey B C Wang and J PRose ldquoThe crystal structure of augmenter of liver regenerationa mammalian FAD-dependent sulfhydryl oxidaserdquo ProteinScience vol 12 no 5 pp 1109ndash1118 2003

[28] J R Thompson J K Bell J Bratt G A Grant and LJ Banaszak ldquoVmax regulation through domain and subunitchanges The active form of phosphoglycerate dehydrogenaserdquoBiochemistry vol 44 no 15 pp 5763ndash5773 2005

[29] M Nardini S Spano C Cericola et al ldquoCtBPBARS a dual-function protein involved in transcription co-repression andGolgi membrane fissionrdquo EMBO Journal vol 22 no 12 pp3122ndash3130 2003

[30] R Kort H Komori S I Adachi K Miki and A Eker ldquoDNAapophotolyase from Anacystis nidulans 18 A structure 8-HDF reconstitution and X-ray-induced FAD reductionrdquo ActaCrystallographica Section D Biological Crystallography vol 60no 7 pp 1205ndash1213 2004

[31] Y Huang R Baxter B S Smith C L Partch C L Colbert andJ Deisenhofer ldquoCrystal structure of cryptochrome 3 from Ara-bidopsis thaliana and its implications for photolyase activityrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 103 no 47 pp 17701ndash17706 2006

[32] J B Thoden T M Wohlers J L Fridovich-Keil and H MHolden ldquoMolecular basis for severe epimerase deficiency galac-tosemia X-ray structure of the humanV94M-substitutedUDP-galactose 4-epimeraserdquoThe Journal of Biological Chemistry vol276 no 23 pp 20617ndash20623 2001

[33] C A Brautigam B S Smith Z Ma et al ldquoStructure of thephotolyase-like domain of cryptochrome 1 from Arabidopsisthalianardquo Proceedings of the National Academy of Sciences of theUnited States of America vol 101 no 33 pp 12142ndash12147 2004

[34] H Komori R Masui S Kuramitsu et al ldquoCrystal structure ofthermostable DNA photolyase pyrimidine-dimer recognitionmechanismrdquo Proceedings of the National Academy of Sciences ofthe United States of America vol 98 no 24 pp 13560ndash135652001

[35] F Todone M A Vanoni A Mozzarelli et al ldquoActive siteplasticity in D-amino acid oxidase a crystallographic analysisrdquoBiochemistry vol 36 no 19 pp 5853ndash5860 1997

[36] I F Sevrioukova H Li and T L Poulos ldquoCrystal structure ofputidaredoxin reductase from Pseudomonas putida the finalstructural component of the cytochromeP450cammonooxyge-naserdquo Journal of Molecular Biology vol 336 no 4 pp 889ndash9022004

[37] S Y Song Y B Xu Z J Lin and C L Tsou ldquoStructure ofactive site carboxymethylated D-glyceraldehyde-3-phosphatedehydrogenase from Palinurus versicolorrdquo Journal of MolecularBiology vol 287 no 4 pp 719ndash725 1999

[38] L Pollegioni KDiederichs GMolla et al ldquoYeastD-amino acidoxidase structural basis of its catalytic propertiesrdquo Journal ofMolecular Biology vol 324 no 3 pp 535ndash546 2002

[39] E C Settembre P C Dorrestein J H Park A M Augustine TP Begley and S E Ealick ldquoStructural and mechanistic studieson thiO a glycine oxidase essential for thiamin biosynthesis inBacillus subtilisrdquo Biochemistry vol 42 no 10 pp 2971ndash29812003

[40] J Yang A Roy and Y Zhang ldquoBioLiP a semi-manually curateddatabase for biologically relevant ligand-protein interactionsrdquoNucleic Acids Research vol 41 no 1 pp D1096ndashD1103 2013

[41] S Henikoff and J G Henikoff ldquoAmino acid substitution matri-ces from protein blocksrdquo Proceedings of the National Academyof Sciences of the United States of America vol 89 no 22 pp10915ndash10919 1992

[42] W Kabsch and C Sander ldquoDictionary of protein secondarystructure pattern recognition of hydrogen-bonded and geo-metrical featuresrdquo Biopolymers vol 22 no 12 pp 2577ndash26371983

[43] J C Gower and G J S Ross ldquoMinimum spanning trees andsingle-linkage cluster analysisrdquo Journal of the Royal StatisticalSociety vol 18 no 1 article 11 1969

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 13: Research Article Predicting Flavin and Nicotinamide ...downloads.hindawi.com/journals/bmri/2015/402536.pdf · Adenosine-binding of FAD. (e) Phosphate-binding of FAD. (f) Flav in-binding

BioMed Research International 13

[21] G Parisi and J Echave ldquoStructural constraints and emergence ofsequence patterns in protein evolutionrdquo Molecular Biology andEvolution vol 18 no 5 pp 750ndash756 2001

[22] C H Lu Y S Lin Y C Chen C S Yu S Y Chang andJ K Hwang ldquoThe fragment transformation method to detectthe protein structural motifsrdquo Proteins Structure Function andGenetics vol 63 no 3 pp 636ndash643 2006

[23] L Schrodinger ThePyMOLMolecular Graphics System Version13r1 2010

[24] U Dengler K Niefind M Kieszlig and D Schomburg ldquoCrystalstructure of a ternary complex of D-2-hydroxy-isocaproatedehydrogenase from Lactobacillus casei NAD+ and 2-oxoisocaproate at 19 A resolutionrdquo Journal of MolecularBiology vol 267 no 3 pp 640ndash660 1997

[25] E Gross C S Sevier A Vala C A Kaiser and D Fass ldquoA newFAD-binding fold and intersubunit disulfide shuttle in the thioloxidase Erv2prdquoNature Structural Biology vol 9 no 1 pp 61ndash672002

[26] S V Antonyuk R W Strange M J Ellis et al ldquoStructure ofd-lactate dehydrogenase from Aquifex aeolicus complexed withNAD+ and lactic acid (or pyruvate)rdquo Acta Crystallographica FStructural Biology and Crystallization Communications vol 65part 12 pp 1209ndash1213 2009

[27] C K Wu T A Dailey H A Dailey B C Wang and J PRose ldquoThe crystal structure of augmenter of liver regenerationa mammalian FAD-dependent sulfhydryl oxidaserdquo ProteinScience vol 12 no 5 pp 1109ndash1118 2003

[28] J R Thompson J K Bell J Bratt G A Grant and LJ Banaszak ldquoVmax regulation through domain and subunitchanges The active form of phosphoglycerate dehydrogenaserdquoBiochemistry vol 44 no 15 pp 5763ndash5773 2005

[29] M Nardini S Spano C Cericola et al ldquoCtBPBARS a dual-function protein involved in transcription co-repression andGolgi membrane fissionrdquo EMBO Journal vol 22 no 12 pp3122ndash3130 2003

[30] R Kort H Komori S I Adachi K Miki and A Eker ldquoDNAapophotolyase from Anacystis nidulans 18 A structure 8-HDF reconstitution and X-ray-induced FAD reductionrdquo ActaCrystallographica Section D Biological Crystallography vol 60no 7 pp 1205ndash1213 2004

[31] Y Huang R Baxter B S Smith C L Partch C L Colbert andJ Deisenhofer ldquoCrystal structure of cryptochrome 3 from Ara-bidopsis thaliana and its implications for photolyase activityrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 103 no 47 pp 17701ndash17706 2006

[32] J B Thoden T M Wohlers J L Fridovich-Keil and H MHolden ldquoMolecular basis for severe epimerase deficiency galac-tosemia X-ray structure of the humanV94M-substitutedUDP-galactose 4-epimeraserdquoThe Journal of Biological Chemistry vol276 no 23 pp 20617ndash20623 2001

[33] C A Brautigam B S Smith Z Ma et al ldquoStructure of thephotolyase-like domain of cryptochrome 1 from Arabidopsisthalianardquo Proceedings of the National Academy of Sciences of theUnited States of America vol 101 no 33 pp 12142ndash12147 2004

[34] H Komori R Masui S Kuramitsu et al ldquoCrystal structure ofthermostable DNA photolyase pyrimidine-dimer recognitionmechanismrdquo Proceedings of the National Academy of Sciences ofthe United States of America vol 98 no 24 pp 13560ndash135652001

[35] F Todone M A Vanoni A Mozzarelli et al ldquoActive siteplasticity in D-amino acid oxidase a crystallographic analysisrdquoBiochemistry vol 36 no 19 pp 5853ndash5860 1997

[36] I F Sevrioukova H Li and T L Poulos ldquoCrystal structure ofputidaredoxin reductase from Pseudomonas putida the finalstructural component of the cytochromeP450cammonooxyge-naserdquo Journal of Molecular Biology vol 336 no 4 pp 889ndash9022004

[37] S Y Song Y B Xu Z J Lin and C L Tsou ldquoStructure ofactive site carboxymethylated D-glyceraldehyde-3-phosphatedehydrogenase from Palinurus versicolorrdquo Journal of MolecularBiology vol 287 no 4 pp 719ndash725 1999

[38] L Pollegioni KDiederichs GMolla et al ldquoYeastD-amino acidoxidase structural basis of its catalytic propertiesrdquo Journal ofMolecular Biology vol 324 no 3 pp 535ndash546 2002

[39] E C Settembre P C Dorrestein J H Park A M Augustine TP Begley and S E Ealick ldquoStructural and mechanistic studieson thiO a glycine oxidase essential for thiamin biosynthesis inBacillus subtilisrdquo Biochemistry vol 42 no 10 pp 2971ndash29812003

[40] J Yang A Roy and Y Zhang ldquoBioLiP a semi-manually curateddatabase for biologically relevant ligand-protein interactionsrdquoNucleic Acids Research vol 41 no 1 pp D1096ndashD1103 2013

[41] S Henikoff and J G Henikoff ldquoAmino acid substitution matri-ces from protein blocksrdquo Proceedings of the National Academyof Sciences of the United States of America vol 89 no 22 pp10915ndash10919 1992

[42] W Kabsch and C Sander ldquoDictionary of protein secondarystructure pattern recognition of hydrogen-bonded and geo-metrical featuresrdquo Biopolymers vol 22 no 12 pp 2577ndash26371983

[43] J C Gower and G J S Ross ldquoMinimum spanning trees andsingle-linkage cluster analysisrdquo Journal of the Royal StatisticalSociety vol 18 no 1 article 11 1969

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 14: Research Article Predicting Flavin and Nicotinamide ...downloads.hindawi.com/journals/bmri/2015/402536.pdf · Adenosine-binding of FAD. (e) Phosphate-binding of FAD. (f) Flav in-binding

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology