iezy-drug: a web server for identifying the interaction between

14
Hindawi Publishing Corporation BioMed Research International Volume 2013, Article ID 701317, 13 pages http://dx.doi.org/10.1155/2013/701317 Research Article iEzy-Drug: A Web Server for Identifying the Interaction between Enzymes and Drugs in Cellular Networking Jian-Liang Min, 1 Xuan Xiao, 1,2,3 and Kuo-Chen Chou 3,4 1 Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333046, China 2 Information School, ZheJiang Textile & Fashion College, NingBo 315211, China 3 Gordon Life Science Institute, Belmont, MA 02478, USA 4 Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, Saudi Arabia Correspondence should be addressed to Xuan Xiao; [email protected] Received 7 August 2013; Accepted 17 September 2013 Academic Editor: Tatsuya Akutsu Copyright © 2013 Jian-Liang Min et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. With the features of extremely high selectivity and efficiency in catalyzing almost all the chemical reactions in cells, enzymes play vitally important roles for the life of an organism and hence have become frequent targets for drug design. An essential step in developing drugs by targeting enzymes is to identify drug-enzyme interactions in cells. It is both time-consuming and costly to do this purely by means of experimental techniques alone. Although some computational methods were developed in this regard based on the knowledge of the three-dimensional structure of enzyme, unfortunately their usage is quite limited because three- dimensional structures of many enzymes are still unknown. Here, we reported a sequence-based predictor, called “iEzy-Drug,” in which each drug compound was formulated by a molecular fingerprint with 258 feature components, each enzyme by the Chou’s pseudo amino acid composition generated via incorporating sequential evolution information and physicochemical features derived from its sequence, and the prediction engine was operated by the fuzzy K-nearest neighbor algorithm. e overall success rate achieved by iEzy-Drug via rigorous cross-validations was about 91%. Moreover, to maximize the convenience for the majority of experimental scientists, a user-friendly web server was established, by which users can easily obtain their desired results. 1. Introduction Enzymes are biomacromolecules that catalyze almost all the chemical reactions essential for the life of a cell [1]. Most enzymes are proteins although some RNA molecules have been identified to possess the function of enzyme as well. As catalysts, enzymes possess two exceptional features: one is of high efficiency and the other of high selectivity. For instance, the second-order rate constant between some enzymes and their substrates [2] was surprisingly high [3], which could almost reach the upper limit of diffusion-controlled reaction rate according to the calculation and analysis by Chou and coworkers [46]. e high selectivity or specificity of enzymes was likened to the “lock-and-key” model, implying that an accurate fit is required between the active site of an enzyme and its substrate for the catalytic reaction to occur. Owing to the previous unique features, enzymes play a crucial role in controlling and regulating the order of chemical reactions in cells that is vitally important for their survival. It is also because of this that enzymes are excellent drug targets, and actually many drugs are enzyme inhibitors. For example, some peptide inhibitors against HIV/AIDS [710] and SARS (severe acute respiratory syn- drome) [1113] were based on the Chou’s distorted key theory [14], as illustrated in Figure 1, where (a) shows a good fit for a cleavable octapeptide with the active site of HIV-protease and (b) shows that the peptide has become an ideal inhibitor or “distorted key” aſter its scis- sile bond is modified. For a brief introduction about the Chou’s distorted key theory and its application for design- ing peptide drugs, see a Wikipedia article at http://en.wikiped ia.org/wiki/Chou’s distorted key theory for peptide drugs. To develop enzyme-targeting drugs, an essential step is to identify drug-enzyme interaction in cellular networking [15]. e completion of the human genome project and the

Upload: lamkhuong

Post on 09-Dec-2016

215 views

Category:

Documents


1 download

TRANSCRIPT

Hindawi Publishing CorporationBioMed Research InternationalVolume 2013 Article ID 701317 13 pageshttpdxdoiorg1011552013701317

Research ArticleiEzy-Drug A Web Server for Identifying the Interaction betweenEnzymes and Drugs in Cellular Networking

Jian-Liang Min1 Xuan Xiao123 and Kuo-Chen Chou34

1 Computer Department Jing-De-Zhen Ceramic Institute Jing-De-Zhen 333046 China2 Information School ZheJiang Textile amp Fashion College NingBo 315211 China3 Gordon Life Science Institute Belmont MA 02478 USA4Center of Excellence in Genomic Medicine Research (CEGMR) King Abdulaziz University Jeddah Saudi Arabia

Correspondence should be addressed to Xuan Xiao xxiaogordonlifescienceorg

Received 7 August 2013 Accepted 17 September 2013

Academic Editor Tatsuya Akutsu

Copyright copy 2013 Jian-Liang Min et alThis is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

With the features of extremely high selectivity and efficiency in catalyzing almost all the chemical reactions in cells enzymes playvitally important roles for the life of an organism and hence have become frequent targets for drug design An essential step indeveloping drugs by targeting enzymes is to identify drug-enzyme interactions in cells It is both time-consuming and costly todo this purely by means of experimental techniques alone Although some computational methods were developed in this regardbased on the knowledge of the three-dimensional structure of enzyme unfortunately their usage is quite limited because three-dimensional structures of many enzymes are still unknown Here we reported a sequence-based predictor called ldquoiEzy-Drugrdquoin which each drug compound was formulated by a molecular fingerprint with 258 feature components each enzyme by theChoursquos pseudo amino acid composition generated via incorporating sequential evolution information and physicochemical featuresderived from its sequence and the prediction engine was operated by the fuzzy K-nearest neighbor algorithm The overall successrate achieved by iEzy-Drug via rigorous cross-validations was about 91 Moreover to maximize the convenience for the majorityof experimental scientists a user-friendly web server was established by which users can easily obtain their desired results

1 Introduction

Enzymes are biomacromolecules that catalyze almost all thechemical reactions essential for the life of a cell [1] Mostenzymes are proteins although some RNA molecules havebeen identified to possess the function of enzyme as well Ascatalysts enzymes possess two exceptional features one is ofhigh efficiency and the other of high selectivity For instancethe second-order rate constant between some enzymes andtheir substrates [2] was surprisingly high [3] which couldalmost reach the upper limit of diffusion-controlled reactionrate according to the calculation and analysis by Chouand coworkers [4ndash6] The high selectivity or specificity ofenzymes was likened to the ldquolock-and-keyrdquo model implyingthat an accurate fit is required between the active site of anenzyme and its substrate for the catalytic reaction to occurOwing to the previous unique features enzymes play acrucial role in controlling and regulating the order of

chemical reactions in cells that is vitally important fortheir survival It is also because of this that enzymes areexcellent drug targets and actually many drugs are enzymeinhibitors For example some peptide inhibitors againstHIVAIDS [7ndash10] and SARS (severe acute respiratory syn-drome) [11ndash13] were based on the Choursquos distorted keytheory [14] as illustrated in Figure 1 where (a) shows agood fit for a cleavable octapeptide with the active siteof HIV-protease and (b) shows that the peptide hasbecome an ideal inhibitor or ldquodistorted keyrdquo after its scis-sile bond is modified For a brief introduction about theChoursquos distorted key theory and its application for design-ing peptide drugs see aWikipedia article at httpenwikipediaorgwikiChoursquos distorted key theory for peptide drugs

To develop enzyme-targeting drugs an essential step isto identify drug-enzyme interaction in cellular networking[15] The completion of the human genome project and the

2 BioMed Research International

(a) (b)

Figure 1 A schematic drawing to illustrate how to use Choursquos distorted key theory to develop peptide drugs against HIVAIDS (a) shows agood fitting and binding of a peptide to the active site of HIV protease right before it is cleaved by the enzyme (b) shows that the peptide hasbecome a noncleavable one after its scissile bond is modified although it can still tightly bind to the active site Such a modified peptide orldquodistorted keyrdquo will automatically become an inhibitor candidate against HIV protease

emergence of molecular medicine have provided excellentopportunity to discover unknown target enzymes for drugsMany efforts were made in this regard by computationallyanalyzing drug-enzyme interactions The most commonlyused approaches are docking simulations (see eg [16ndash19])and protein cleavage site analysis (see eg [8 12 13]) basedon Choursquos distorted key theory [14] However the latterapproach is mainly used to find peptide drugs Comparedwith the smaller organic compounds although peptide drugshave the advantage of low toxicity to human body theyhave the shortcoming of poor metabolic stability and lowbioavailability due to their inability to readily crossing thrumembrane barriers such as the intestinal and blood-brainbarriers [20] In contrast the molecular docking is indeed auseful vehicle for investigating the interaction of an enzymereceptor with its organic inhibitor and revealing their bindingmechanism as demonstrated by a series of studies [11 19ndash23] However to conduct molecular docking a necessaryprerequisite is the availability of the 3D (three dimensional)structure of the targeted enzyme Unfortunately the 3Dstructures of many enzymes are still unknown Although X-ray crystallography is a powerful tool in determining the 3Dstructures of enzymes it is time-consuming and expensiveParticularly not all enzymes can be successfully crystallizedFor example membrane enzymes are very difficult to crys-tallize and most of them will not dissolve in normal solventsTherefore so far very few membrane enzyme 3D structureshave been determined Although NMR is indeed a verypowerful tool in determining the 3D structures of membraneproteins as indicated by a series of recent publications (seeeg [24ndash30]) it is time-consuming and costly To acquire thestructural information in a timelymanner one has to resort tovarious structural bioinformatics tools (see eg [18 31 32])Unfortunately the number of templates for developing highquality 3D structures by structural bioinformatics is verylimited

Therefore it would save us a lot of time and moneyif we could identify the interactions between enzymes anddrugs before carrying out any intense study in this regardIn view of this the present study was initiated in an attemptto develop a computational method based on the sequence-derived features that can be used to predict the drug-enzymeinteractions in cellular networking

As summarized in a comprehensive review [33] anddemonstrated by a series of recent publications [34ndash37] tosuccessfully develop the desiredmethod we need to considerthe following procedures (i) construct or select a validbenchmark dataset to train and test the predictor (ii) denotethe drug-enzyme samples with an effective formulation thatcan truly reflect their intrinsic relation with the target to bepredicted (iii) introduce or develop a powerful algorithm(or engine) to operate the prediction (iv) conduct a rigorouscross-validation to objectively evaluate its anticipated accu-racy (v) establish a user-friendly web-server for the predictorthat is freely accessible to the public Next let us elaboratehow to deal with these procedures one by one

2 Materials and Methods

21 Benchmark Dataset The data used in this study werecollected from Kyoto Encyclopedia of Genes and Genomes(KEGG) [38] at httpwwwkeggjpkegg which is a data-base resource for understanding high-level functions andutilities of the biological system such as the cell theorganism and the ecosystem from molecular-level infor-mation especially large-scale molecular datasets generatedby genome sequencing and other high-throughput exper-imental technologies For the current study the benchmarkdataset S can be formulated as

S = S+

cup Sminus

(1)

where S+ is the positive subset that consists of the interactiveenzyme-drug pairs only while Sminus is the negative subsetthat contains of the noninteractive enzyme-drug pairs onlyand the symbol cup represents the union in the set theoryHere the ldquointeractiverdquo pair means the pair whose twocounterparts are interacted with each other in the drug-target networks as defined in the KEGG database [38]while the ldquononinteractiverdquo pair means that its two counterparts are not interacted with each other in the drug-targetnetworks The positive dataset S+ contains 2719 enzyme-drug pairs derived from Yamanishi et al [39] The negativedataset Sminus contains 5438 noninteractive enzyme-drug pairswhich were derived according to the following procedures(i) separating each of the pairs in S+ into single drug and

BioMed Research International 3

enzyme (ii) recoupling each of the single drugs with eachof the single enzymes into pairs in a way that none of themoccurred inS+ (iii) randomly picking the pairs thus formeduntil they reached the number two times as many as thepairs in S+ The 2719 interactive enzyme-drug pairs and5438 noninteractive enzyme-drug pairs are given in OnlineSupporting Information S1 (see Supplementary Materialavailable online at httpdxdoiorg1011552013701317) Allthe detailed information for the compounds or drugs listedthere can be found in the KEGG database via their codes

22 Sample Representation Since each of the samples in thecurrent network system contains an enzyme (protein) anda drug a combination of the following two approaches wasadopted to represent the enzyme-drug pair samples

221 Drug

(a) 2D Molecular Fingerprints Although the number ofdrugs is extremely large most of them are small organicmolecules and are composed of some fixed small structures[40] The identification of small molecules or structures canbe used to detect the drug-target interactions [41] Molec-ular fingerprints are bit-string representations of molecularstructure and properties [42] It should be pointed outthat there are many types of structural representations thathave been suggested for the description of drug moleculesincluding physicochemical properties [43] chemical graphs[44] topological indices [45] 3D pharmacophore patternsandmolecular fields In the current study let us use the simpleand generally adopted 2Dmolecular fingerprints to representdrug molecules as described below

First for each of the drugs concerned we can obtaina MOL file from the KEGG database [38] via its codethat contains the detailed information of chemical structureSecond we can convert the MOL file format into its 2Dmolecular fingerprint file format by using a chemical toolboxsoftware called OpenBabel [46] which can be downloadedfrom the website at httpopenbabelorg The current ver-sion of OpenBabel can generate four types of fingerprintsFP2 FP3 FP4 and MACCS In the current study we usedthe FP2 fingerprint format It is a path-based fingerprint thatidentifies small molecule fragments based on all linear andring substructures and maps them onto a bit-string using ahash function (somewhat similar to the daylight fingerprints[47 48]) It is a length of 256-bit hexadecimal string obtainedfrom theOpenBabel andwe can convert it to a 256-bit vectorThen a molecular fingerprint can be formulated as a 256-Dvector given by

MF = [1198601 sdot sdot sdot 119860119895 sdot sdot sdot 119860256]T (2)

where 119860119895(119895 = 1 2 256) is an integer between 0 and 15

and T is the matrix transpose operatorIn order to capture as much useful information from

a molecular fingerprint as possible we can also convertthe above 256-bit hexadecimal string into a 1024-bit binaryvector which is a digital sequence only including 0 and 1

and consider two different digital signal characteristics for thedigital sequence as follows

(b) Information Entropy Shannon proposed that any infor-mation is redundant and redundant size is related withthe occurrence probability or uncertainty of each symbolsuch as numbers letters or words among the informationThe information entropy for a system with a probabilitydistribution 119875(119909

119894) for two classes information entropy [49] is

defined as

119867119909= minussum

119909

119875 (119909119894) log2119875 (119909119894) (119894 = 0 1) (3)

where 119875(119909119894) represents the occurrence probability of num-

ber 119894 in the aforementioned 1024-bit binary vector andthe information entropy 119867

119909is a measure value of the

information amount For example for the digital sequence100100011010010 the value of the information entropy 119867

119909

thus obtained is

119875 (1199090) =

9

15

= 06

119875 (1199091) =

6

15

= 04

119867119909= minus (06 times log

206 + 04 times log

204) = 0971

(4)

(c) Complexity Factor The Lempel-Ziv (LZ) complexity [50]reflects the order that is retained in the sequence and hencewas adopted in this study For each step only two operationswere allowed in the process to get the LZ complexity eithercopying the longest section from the part of a nonemptysequence or generating an additional symbol mark thatensures the uniqueness of per component 119878(119894

119896minus1rarr 119894119896) Its

substring is expressed by

119878 (119894 997888rarr 119895) = 119898119894119898119894+1119898119894+2sdot sdot sdot 119898119895

(1 le 119894 le 119895 le 119871) (5)

where 1198981represents the 1st digital value 119898

2the 2nd value

and so forth A nonempty digital sequence is synthesizedaccording to the following formula

Syn (119878) = 119878 (1 997888rarr 1198941) ∙ 119878 (119894

1+ 1 997888rarr 119894

2)

∙ sdot sdot sdot ∙ 119878 (119894119898minus1

+ 1 997888rarr 119894119871)

(6)

Suppose that 119878 = 11989811198982119898311989841198985sdot sdot sdot 119898119871has been recon-

structed by the subsymbol 119898119903which is viewed as the newly

inserted symbol The substring up to 119898119903will be denoted by

119878(1 rarr 119903)∙ where the bold dot ∙ indicates that 119898119903is a newly

inserted symbol for checkingwhether the rest of the substring119878(119903+1 rarr 119871) can be reconstructed by a simple process At firstsuppose 119878(119902) = 119898

119903+ 1 and see whether 119878(119902) is the substring

for the subsequence 119878(1 rarr 119903) which means deleting thelast symbol from the substring 119878(1 rarr 119903)119878(119902) If the answeris ldquonordquo we insert 119878(119902) into the sequence followed by a dot ∙Thus it could not be obtained by the same operation If theanswer is ldquoyesrdquo no new symbol is needed and we can go on toproceed with 119878(119902) = 119898

119903+1119898119903+2

and repeat the same previousprocedure The LZ complexity is the number of dots (plus

4 BioMed Research International

one if the string is not terminated by a dot) For example forthe sequence 100100011010010 syn(119875) and the correspondingcomplexity factor CF are described as

Syn (119878) = 1 ∙ 0 ∙ 01 ∙ 000 ∙ 11 ∙ 01001 ∙ 0

CF = 7(7)

Thus by adding the information entropy 119867119909(4) and com-

plexity factor CF (7) into the molecular fingerprint MF (2)we obtained a total of (256 + 1 + 1) = 258 feature elements torepresent a drug compound that is it can now be formulatedas a 258-D vector given by

119863 = [11986011198602sdot sdot sdot 119860

256119867119909

CF]T (8)

where 119860119894has the same meaning as in (2) while 119867

119909and CF

are the information entropy and complexity factor respec-tively as described in the previous two sections

222 Enzyme The sequences of the enzymes involved inthis study are given in Online Supporting Information S2Now the problem is how to effectively represent these enzymesequences for the current study Generally speaking thereare two kinds of approaches to formulate enzyme sequencesthe sequential model and the nonsequential or discretemodel [51] The most typical sequential representation foran enzyme sample E with 119871 residues is its entire amino acidsequence that is

E = 1198771119877211987731198774119877511987761198777sdot sdot sdot 119877119871 (9)

where 1198771represents the 1st residue 119877

2the 2nd residue and

so forth An enzyme sample thus formulated can contain itsmost complete information This is an obvious advantage ofthe sequential representation To get the desired results thesequence-similarity-search-based tools such as BLAST [5253] are usually utilized to conduct the prediction Howeverthis kind of approach failed to work when the query enzymedid not have significant homology to enzyme of known char-acters Thus various nonsequential representation modelswere proposed The simplest nonsequential model for anenzyme was based on its amino acid composition (AAC) asdefined by

E = [11989111198912sdot sdot sdot 11989120]

T (10)

where 119891119906(119906 = 1 2 20) are the normalized occurrence

frequencies of the 20 native amino acids [54ndash56] in theenzyme E and T has the same meaning as in (2) and (8)The AAC-discrete model was widely used for identifyingvarious attributes of proteins (see eg [57ndash61]) Howeveras can be seen from (10) all the sequence order effectswere lost by using the AAC-discrete model This is its mainshortcoming To avoid completely losing the sequence-orderinformation the pseudo amino acid composition [62 63]or Choursquos PseAAC [3] was proposed to replace the simpleAAC model Since the concept of PseAAC was proposedin 2001 [62] it has penetrated into almost all the fields of

protein attribute predictions and computational proteomicssuch as predicting supersecondary structure [64] predictingmetalloproteinase family [65] predicting membrane pro-tein types [66 67] predicting protein structural class [68]discriminating outer membrane proteins [69] identifyingantibacterial peptides [70] identifying allergenic proteins[71] identifying bacterial virulent proteins [72] predictingprotein subcellular location [73 74] identifying GPCRs andtheir types [75] identifying protein quaternary structuralattributes [76] predicting protein submitochondria locations[77] identifying risk type of human papillomaviruses [78]identifying cyclin proteins [79] predicting GABA(A) recep-tor proteins [80] and predicting cysteine S-nitrosylation sitesin proteins [81] among many others (see a long list of paperscited in the References section of [33]) Recently the conceptof PseAAC was further extended to represent the featurevectors of DNA and nucleotides [36 82] as well as otherbiological samples (see eg [83 84]) Because it has beenwidely and increasingly used recently two powerful soft-wares called ldquoPseAAC-Builderrdquo [85] and ldquopropyrdquo [86] wereestablished for generating various special Choursquos pseudo-amino acid compositions in addition to the web-serverPseAAC [87] built in 2008 According to a recent review [33]the general formof Choursquos PseAAC for an enzyme sample canbe formulated by

E = [12059511205952sdot sdot sdot 120595119906sdot sdot sdot 120595Ω]

T (11)

where the subscript Ω is an integer and its value as wellas the components 120595

119906(119906 = 1 2 Ω) will depend on

how to extract the desired information from the amino acidsequence of E (cf (10)) Next let us describe how to extractuseful information from the benchmark datasetS andOnlineSupporting Information S2 to define the enzyme samplesconcerned via (11)

To incorporate as much useful information as possiblefrom an enzyme sample we are to approach this problemfrom three different angles followed by incorporating thefeature elements thus obtained into the general form ofPseAAC of (11)(a) Amino Acid Composition The components of amino acidcomposition have beenwidely used to predict various proteinattributes [57ndash61] In this study they were also included as thefirst 20 elements in the general Choursquos PseAAC (cf (11)) thatis

120595119906= 119891119906

(119906 = 1 2 20) (12)

where 119891119906has the same meaning as in (10)

(b) Dipeptide Composition Dipeptide composition has beenused to predict the protein secondary structural contents[88 89] as well as various protein attributes (see eg [90ndash93]) The number of different dipeptides is 20 times 20 = 400Suppose that the normalized occurrence frequencies of the400 dipeptides in an enzyme sample are given by

119891

(2)

119906(119906 = 1 2 400) (13)

BioMed Research International 5

Incorporating the above 400 dipeptide components into (11)we have

120595119906+20

= 119891

(2)

119906(119906 = 1 2 400) (14)

(c) Sequential Evolution Information Biology is a naturalscience with a historic dimension All biological specieshave developed starting out from a very limited number ofancestral species Their evolution involves changes of singleresidues insertions and deletions of several residues [94]gene doubling and gene fusion With these changes accu-mulated for a long period of time many similarities betweeninitial and resultant amino acid sequences are graduallyeliminated but the corresponding proteins may still sharemany common attributes [18] such as having basically thesame biological function and residing at a same subcellularlocation To extract the sequential evolution information anduse it to define the components of (11) the PSSM (PositionSpecific Scoring Matrix) was used as described next

According to Schaffer et al [95] the sequence evolutioninformation of enzyme E with 119871 amino acid residues can beexpressed by an 119871 times 20matrix as given by

P(0)PSSM =

[

[

[

[

[

[

[

119864

0

1rarr1119864

0

1rarr2sdot sdot sdot 119864

0

1rarr20

119864

0

2rarr1119864

0

2rarr2sdot sdot sdot 119864

0

2rarr20

119864

0

119871rarr1119864

0

119871rarr2sdot sdot sdot 119864

0

119871rarr20

]

]

]

]

]

]

]

(15)

where 1198640119894rarr 119895

represents the original score of the 119894th aminoacid residue (119894 = 1 2 119871) in the enzyme sequence changedto amino acid type 119895 (119895 = 1 2 20) in the process ofevolution Here the numerical codes 1 2 20 are used torepresent the 20 native amino acid types denoted by A CD E F G H I K L M N P Q R S T V W and Y The119871 times 20 scores in (15) were generated by using PSI-BLAST[96] to search the UniProtKBSwiss-Prot database (Release2013-05) through three iterations with 0001 as the 119864 valuecutoff for multiple sequence alignment against the sequenceof the enzyme E In order to make every element in (15) bescaled from their original score ranges into region of [0 1]we performed a conversion through the standard sigmoidfunction to make it become

P(1)PSSM =

[

[

[

[

[

[

[

119864

1

1rarr1119864

1

1rarr2sdot sdot sdot 119864

1

1rarr20

119864

1

2rarr1119864

1

2rarr2sdot sdot sdot 119864

1

2rarr20

119864

1

119871rarr1119864

1

119871rarr2sdot sdot sdot 119864

1

119871rarr20

]

]

]

]

]

]

]

(16)

where

119864

1

119894rarr 119895=

1

1 + 119890

minus1198640

119894rarr 119895

(1 le 119894 le 119871 1 le 119895 le 20) (17)

Nowwe extract the useful information from (16) to define thecomponents of (11) via the following approach

120595119906+420

= ℓ119906

(119906 = 1 2 20) (18)

where

ℓ119895=

1

119871

times

119871

sum

119896=1

119864

1

119896rarr119895(119895 = 1 2 20) (19)

(d) Grey System Model Approach The grey system theory[97] is quite useful in dealing with complicated systems thatlack sufficient information or need to process uncertaininformation According to the grey system theory we canextract the following information from the 119895th column of(16) that is

[

119886

119895

1

119886

119895

2

] = (BT119895B119895)

minus1

BT119895U119895

(119895 = 1 2 20) (20)

where

B119895=

[

[

[

[

[

[

[

[

[

[

minus119864

1

2rarr119895minus119864

1

1rarr119895minus 05119864

1

2rarr1198951

minus119864

1

3rarr119895minus

2

sum

119894=1

119864

1

119894rarr 119895minus 05119864

1

3rarr1198951

minus119864

1

119871rarr119895minus

119871minus1

sum

119894=1

119864

1

119894rarr 119895minus 05119864

1

119871rarr1198951

]

]

]

]

]

]

]

]

]

]

U119895=

[

[

[

[

[

[

119864

1

2rarr119895minus 119864

1

1rarr119895

119864

1

3rarr119895minus 119864

1

2rarr119895

119864

1

119871rarr119895minus 119864

1

119871minus1rarr119895

]

]

]

]

]

]

(21)

Therefore based on the grey system theory and (20) we canextract another 20 times 2 = 40 quantities from (16) to definethe components of (11) that is

120593119895=

1199081119886

119895

1when 119895 is an odd number

1199082119886

119895

2when 119895 is an even number

1 le 119895 le 20

(22)

where 1198861198951and 119886119895

2are given by (20) 119908

1and 119908

2are weight

factors which were all set to 1 in the current studySubstituting the elements in (12) (14) (18) and (22) we

finally obtain a total of Ω = 20 + 400 + 20 + 40 = 480

components for the PseAAC of (11) where

120595119906=

119891119906

when 1 le 119906 le 20119891

(2)

119906when 21 le 119906 le 420

ℓ119906

when 421 le 119906 le 440120593119906

when 440 le 119906 le 480

(23)

In other words in this study (11) or Choursquos PseAAC is a 480-D vector whose 480 components are given by (23) derivedfrom the amino acid composition dipeptide compositionsequential evolution information and grey system theory(e) Representing Enzyme-Drug Pairs Now the pair betweenan enzyme molecule E and a drug compound D can beformulated by combing (8) and (11) as given by

G = D oplus E

= [1198601sdot sdot sdot 119860

256119867119909

CF 1205951sdot sdot sdot 120595480]

T

(24)

6 BioMed Research International

where G represents the enzyme-drug pair oplus the orthogonalsum [51] and each of the (258 + 480) = 738 feature elementsis given in (8) and (23)

For the convenience of the later formulation let us use119909119894(119894 = 1 2 738) to represent the 738 components of (24)

that is

G = [11990911199092sdot sdot sdot 119909119894sdot sdot sdot 119909738]

T (25)

To optimize the prediction results different weights wereusually tested for each of the elements in (25) However sinceit would consume a lot of computational time for a total of 738weight factors here let us adopt the normalization approachto deal with this problem as done in [98 99] that is convert119909119894in (25) to 119910

119894according to the following equation

119910119894=

2tanminus1 (119909119894)

120587

(119894 = 1 2 738) (26)

where tanminus1 means arctangent By means of (26) everycomponent in (25) will be converted into the range of[minus1 1] that is we have minus1 le 119910

119894le 1 As demonstrated

in [98 99] the normalization approach via (26) was quiteeffective in enhancing the quality of prediction operated ina high dimension space Therefore in this study we wouldnot to take the procedure of optimizing the weight factorssignificantly reducing the computational times

23 Fuzzy 119870-Nearest Neighbour Algorithm The 119870-NN (119870-Nearest Neighbor) classifier is quite popular in patternrecognition community owing to its good performance andsimple-to-use feature According to the 119870-NN rule [100]named also as the ldquovoting 119870-NN rulerdquo the query sampleshould be assigned to the subset represented by a majorityof its 119870 nearest neighbors as illustrated in Figure 5 of [33]

Fuzzy 119870-NN classification method [101] is a specialvariation of the119870-NNclassification family Instead of roughlyassigning the label based on a voting from the 119870 nearestneighbors it attempts to estimate the membership valuesthat indicate how much degree the query sample belongsto the classes concerned Obviously it is impossible for anycharacteristic description to contain complete informationwhich would make the classification ambiguous In view ofthis the fuzzy principle is very reasonable and particularlyuseful in dealing with complicated biological systems such asidentifying nuclear receptor subfamilies [102] characterizingthe structure of fast-folding proteins [103] classifying Gprotein-coupled receptors [104] predicting protein quater-nary structural attributes [105] predicting protein structuralclasses [106 107] and so forth

Next let us give a brief introduction how to use thefuzzy119870-NN approach to identify the interactions between theenzymes and the drug compounds in the network concerned

Supposing that S(119873) = G1G2 G

119873 is a set of vec-

tors representing 119873 enzyme-drug pairs in a training set

classified into two classes 119862+ 119862minus where 119862+ denotes theinteractive pair class while 119862minus the noninteractive pair classSlowast(G) = Glowast

1Glowast2 Glowast

119870 sub S(119873) is the subset of the 119870

nearest neighbor pairs to the query pair G Thus the fuzzymembership value for the query pair G in the two classes ofS(119873) is given by

120583

+

(G) =sum

119870

119895=1120583

+(Glowast119895) 119889(GGlowast

119895)

minus2(120593minus1)

sum

119870

119895=1119889(GGlowast

119895)

minus2(120593minus1)

120583

minus

(G) =sum

119870

119895=1120583

minus(Glowast119895) 119889(GGlowast

119895)

minus2(120593minus1)

sum

119870

119895=1119889(GGlowast

119895)

minus2(120593minus1)

(27)

where 119870 is the number of the nearest neighbors counted forthe query pairG 120583+(Glowast

119895) and 120583minus(Glowast

119895) the fuzzy membership

values of the training sample Glowast119895to the class 119862+ and 119862minus

respectively as will be further defined next 119889(GGlowast119895) the

cosine distance between G and its 119895th nearest pair Glowast119895in

the training dataset S(119873) 120593(gt 1) the fuzzy coefficient fordetermining how heavily the distance is weighted whencalculating each nearest neighborrsquos contribution to the mem-bership value Note that the parameters 119870 and 120593 will affectthe computation result of (27) and they will be optimizedby a grid-search as will be described later Also variousother metrics can be chosen for 119889(GGlowast

119895) such as Euclidean

distance Hamming distance [108] andMahalanobis distance[55 109]

The quantitative definitions for the aforementioned120583

+(Glowast119895) and 120583minus(Glowast

119895) in (27) are given by

120583

+

(Glowast119895) =

1 if Glowast119895isin 119862

+

0 otherwise

120583

minus

(Glowast119895) =

1 if Glowast119895isin 119862

minus

0 otherwise

(28)

Substituting the results obtained by (27) into (28) it followsthat if 120583+(G) gt 120583

minus(G) then the query pair G is an

interactive couple otherwise noninteractive In other wordsthe outcome can be formulated as

G isin

119862

+ if 120583+ (G) gt 120583minus (G)

119862

minus otherwise

(29)

If there is a tie between 120583+(G) and 120583minus(G) the query pair Gwill be randomly assigned to one of the two classes Howevercase like that is quite rare and in this study never happened

The predictor thus established is called iEzy-Drugwhere ldquoirdquo means identify and ldquoEzy-Drugrdquo means the inter-action between enzyme and drug To provide an intuitiveoverall picture a flowchart is provided in Figure 2 to showthe process of how the classifier works in identifying enzyme-drug interactions

BioMed Research International 7

Input a queryprotein sequence and

a drug code

Create Chous PseAAC of a protein

Create feature vector of a drug

Feature fusion

engineTraining dataset

Yes No

Noninteractive enzyme drug pair

Interactive enzymedrug pair

AACDipeptideSeq Evol

Grey model

Mol FptInfo entropy

CF

FKNN operation

Figure 2 A flowchart to show the operation process of the iEzy-Drug predictor See the text for further explanation

24 Criteria for Performance Evaluation In the literaturethe following equation set is often used for examining theperformance quality of a predictor

Sn = TPTP + FN

Sp = TNTN + FP

Acc = TP + TNTP + TN + FP + FN

MCC = (TP times TN) minus (FP times FN)radic(TP + FP) (TP + FN) (TN + FP) (TN + FN)

(30)

where TP represents the true positive TN the true negativeFP the false positive FN the false negative Sn the sensitiv-ity Sp the specificity Acc the accuracy MCC the Mathewrsquoscorrelation coefficient

To most biologists however the four metrics as formu-lated in (30) are not quite intuitive and easier-to-understandparticularly for the Mathewrsquos correlation coefficient Herelet us adopt the Choursquos symbols to formulate the previousfour metrics By means of Choursquos symbols [111 112] the rates

of correct predictions for the interactive enzyme-drug pairsin dataset S+ and the noninteractive enzyme-drug pairs indataset Sminus are respectively defined by (cf (1))

Λ

+

=

119873

+minus 119873

+

minus

119873

+ for the interactive enzyme-drug pairs

Λ

minus

=

119873

minusminus 119873

minus

+

119873

minus for the noninteractive enzyme-drug pairs

(31)

where 119873+ is the total number of the interactive enzyme-drug pairs investigated while 119873+

minusis the number of the

interactive enzyme-drug pairs incorrectly predicted as thenoninteractive enzyme-drug pairs 119873minus is the total numberof the noninteractive enzyme-drug pairs investigated while119873

minus

+is the number of the noninteractive enzyme-drug pairs

incorrectly predicted as the interactive enzyme-drug pairsThe overall success prediction rate is given by [113] as follows

Λ =

Λ

+119873

++ Λ

minus119873

minus

119873

++ 119873

minus= 1 minus

119873

+

minus+ 119873

minus

+

119873

++ 119873

minus

(32)

It is obvious from (31)-(32) that if and only if none ofthe interactive enzyme-drug pairs and the noninteractiveenzyme-drug pairs are mispredicted that is 119873+

minus= 119873

minus

+= 0

and Λ+ = Λ

minus= 1 we have the overall success rate Λ = 1

Otherwise the overall success rate would be smaller than 1The relations between the symbols in (32) and those in

(30) are given by

TP = 119873+ minus 119873+minus

TN = 119873

minus

minus 119873

minus

+

FP = 119873minus+

FN = 119873

+

minus

(33)

Substituting (33) into (30) and also noting (31)-(32) we obtain

Sn = Λ+ = 1 minus119873

+

minus

119873

+

Sp = Λminus = 1 minus119873

minus

+

119873

minus

Acc = Λ = 1 minus119873

+

minus+ 119873

minus

+

119873

++ 119873

minus

MCC =1 minus ((119873

+

minus119873

+) + (119873

minus

+119873

minus))

radic(1 + (119873

minus

+minus 119873

+

minus) 119873

+) (1 + (119873

+

minusminus 119873

minus

+) 119873

minus)

(34)

Now it is obvious to see from (34) when 119873

+

minus= 0

meaning none of the interactive enzyme-drug pairs was

8 BioMed Research International

02

46

810

1

15

2089

0895

09

0905

091

0915

092

ACC

K120593

Figure 3 A 3D plot to show how the parameter in (27) wasoptimized for the iEzy-Drug predictor

mispredicted to be a noninteractive enzyme-drug pair wehave the sensitivity Sn = 1 while 119873+

minus= 119873

+ meaning thatall the interactive enzyme-drug pairs weremispredicted to bethe noninteractive enzyme-drug pairs we have the sensitivitySn = 0 Likewise when 119873minus

+= 0 meaning none of the

noninteractive enzyme-drug pairs wasmispredicted we havethe specificity Sp = 1 while 119873minus

+= 119873

minus meaning all thenoninteractive enzyme-drug pairs were incorrectly predictedas interactive enzyme-drug pairs we have the specificity Sp =0 When119873+

minus= 119873

minus

+= 0 meaning that none of the interactive

enzyme-drug pairs in the dataset S+ and none of the nonin-teractive enzyme-drug pairs in Sminus was incorrectly predictedwe have the overall accuracy Acc = Λ = 1 while 119873+

minus= 119873

+

and 119873minus+= 119873

minus meaning that all the interactive enzyme-drugpairs in the dataset S+ and all the noninteractive enzyme-drug pairs in Sminus were mispredicted we have the overallaccuracy Acc = Λ = 0 The MCC correlation coefficient isusually used for measuring the quality of binary (two-class)classifications When 119873+

minus= 119873

minus

+= 0 meaning that none

of the interactive enzyme-drug pairs in the dataset S+ andnone of the noninteractive enzyme-drug pairs in Sminus weremispredicted we have MCC = 1 when 119873+

minus= 119873

+2 and

119873

minus

+= 119873

minus2 we have MCC = 0 meaning no better than

random prediction when 119873+minus= 119873

+ and 119873minus+= 119873

minus we haveMCC = minus1 meaning total disagreement between predictionand observation As we can see from the previous discussionit is much more intuitive and easier-to-understand whenusing (34) to examine a predictor for its sensitivity specificityoverall accuracy and Mathewrsquos correlation coefficient It isinstructive to point out that themetrics as defined in (30) and(34) are valid for single label systems for multilabel systemsa set of more complicated metrics should be used as given in[114]

3 Results and Discussion

31 Cross-Validation How to properly examine the predic-tion quality is a key for developing a new predictor and

False positive rate

True

pos

itive

rate

1

01

02

03

04

05

06

07

08

09

0

101 02 03 04 05 06 07 08 090

BaselineiEzy-Drug AUC = 09377

Figure 4 A plot for the ROC curve to quantitatively show theperformance of the iEzy-Drug predictor

estimating its potential application value Generally speak-ing the following three cross-validation methods are oftenused to examine a predictor of its effectiveness in practicalapplication independent dataset test subsampling or 119870-fold(such as 5-fold 7-fold or 10-fold) test and jackknife test[108] However as elaborated by a penetrating analysis in[115] considerable arbitrariness exists in the independentdataset test Also as demonstrated by (27)ndash(29) in [33]the subsampling test (or 119870-fold cross-validation) cannotavoid arbitrariness either Only the jackknife test is the leastarbitrary that can always yield a unique result for a givenbenchmark dataset Therefore the jackknife test has beenwidely recognized and increasingly utilized by investigatorsto examine the quality of various predictors (see eg [66 7174 80]) Accordingly the success rate by the jackknife test wasalso used to optimize the two uncertain parameters 119870 and 120593in (27) The result thus obtained is shown in Figure 3 fromwhich we obtain when 119870 = 6 and 120593 = 15 the iEzy-Drugpredictor reaches its optimized status

The success rates thus obtained by the jackknife test inidentifying interactive Enzyme-drug pairs or noninteractiveenzyme-drug pairs on the benchmark dataset S (cf OnlineSupporting Information S1) are given in Table 1 where forfacilitating comparison the corresponding result by He et al[110] is also given As we can see from the table the overallaccuracy Acc achieved by iEzy-Drug was 9103 remarkablyhigher than 8548 the corresponding rate obtained byHe et al [110] on the same benchmark Furthermore listedin Table 1 are also the values obtained by iEzy-Drug for theother three metrics that is Sn = 9081 Sp = 9114 andMCC = 8039 indicating that the accuracy of iEzy-Drug isnot only very high but also quite stale

To provide a graphical illustration to show the per-formance of the current binary classifier iEzy-Drug asits discrimination threshold is varied a 2D plot calledReceiver Operating Characteristic (ROC) curve [116 117]was also given (Figure 4) In the ROC curve the verticalcoordinate 119884 is for the true positive rate or Sn (cf (34))while horizontal coordinate 119883 for the false positive rate

BioMed Research International 9

Read Me Data Citation

Enter both the protein sequence and the drug code (example) the number of query pairs is limited at 10 or less for each submission

Submit Clear

You can download the program and run it on your own local computer

iEzy-Drug identifying the interaction between enzymes anddrugs in cellular networking

Figure 5 A semiscreenshot to show the top page of the iEzy-Drugweb-server Its web-site address is at httpwwwjci-bioinfocniEzy-Drug

Table 1 The jackknife success rates obtained with iEzy-Drug in identifying interactive enzyme-drug pairs and noninteractive enzyme-drugpairs for the benchmark dataset S (cf Online Supporting Information S1)

Method Acc Sn Sp MCC

iEzy-Druga 74258157 = 9103 24692719 = 9081 49565438 = 9114 8039NN predictorb 8548 NA NA NAaSee (27) where the parameters119870 = 6 and 120593 = 15bSee [110]

or 1-Sp The best possible prediction method would yielda point with the coordinate (0 1) representing 100 truepositive rate (sensitivity Sn) and 0 false positive rate or100 specificity Therefore the (0 1) point is also calleda perfect classification A completely random guess wouldgive a point along a diagonal from the point (0 0) to (1 1)The area under the ROC curve also called Area Under theROC (AUROC) is often used to indicate the performancequality of a binary classifier the value 05 of AUROCis equivalent to random prediction while 1 of AUROCrepresents a perfect one As we can see from Figure 4the AUROC value obtained by iEzy-Drug is 09377

The reason why iEzy-Drug can remarkably improve theprediction quality is that it has introduced the 2D molecularfingerprints to represent drug samples see Online SupportingInformation S3 for the detailed fingerprint expressions for thedrugs listed in Online Supporting Information S1 and thatit has successfully used PseAAC to incorporate the featuresderived from the sequences of enzymes that are essentialfor identifying the interaction of enzymes with drugs in thecellular networking

To enhance the value of its practical applications theweb server for iEzy-Drug has been established that canbe freely accessible at httpwwwjci-bioinfocniEzy-DrugIt is anticipated that the web server will become a usefulhigh throughput tool for both basic research and drug

development in the relevant areas or at the very least playa complementary role to the existing method [39 110 118] forwhich so far noweb-serverwhatsoever has been provided yet

32 The Protocol or User Guide For the convenience of thevast majority of biologists and pharmaceutical scientists herelet us provide a step-by-step guide to show how the userscan easily get the desired result by means of the web serverwithout the need to follow the complicated mathematicalequations presented in this paper for the process of develop-ing the predictor and its integrityStep 1 Open the web server at the site httpwwwjci-bio-infocniEzy-Drug and you will see the top page of thepredictor on your computer screen as shown in Figure 5Click on the ReadMe button to see a brief introduction aboutiEzy-Drug predictor and the caveat when using itStep 2 Either type or copypaste the query pairs into theinput box at the center of Figure 5 Each query pair consistsof two parts one is for the protein sequence and the other forthe drug The enzyme sequence should be in FASTA formatwhile the drug in the KEGG code Examples for the querypairs input can be seen by clicking on the Example buttonright above the input boxStep 3 Click on the Submit button to see the predicted resultFor example if you use the four query pairs in the Example

10 BioMed Research International

window as the input after clicking the Submit button youwill see on your screen that the ldquohsa 10056rdquo enzyme andthe ldquoD0021rdquo drug are an interactive pair and that the ldquohsa100rdquo enzyme and the ldquoD0037rdquo drug are also an interactivepair but that the ldquohsa 3295rdquo enzyme and the ldquoD00889rdquo drugare not an interactive pair and that the ldquohsa 7366rdquo enzymeand the ldquoD03601rdquo drug are not an interactive pair eitherAll these results are fully consistent with the experimentalobservations It takes about 3 minutes before the results areshown on the screenStep 4 Click on the Citation button to find the relevant paperthat documents the detailed development and algorithm ofiEzy-DurgStep 5 Click on the Data button to download the benchmarkdataset used to train and test the iEzy-Durg predictorStep 6 The program code is also available by clicking thebutton download on the lower panel of Figure 5

Acknowledgments

The authors would like to thank the three anonymousreviewers whose constructive comments are very helpful forstrengthening the presentation of the paper This work wassupported by the Grants from the National Natural ScienceFoundation of China (no 31260273) the Jiangxi ProvincialForeign Scientific and Technological Cooperation Project(no 20120BDH80023) Natural Science Foundation of JiangxiProvince China (no 2010GZS0122 20122BAB201020) theDepartment of Education of Jiangxi Province (GJJ12490)the LuoDi plan of the Department of Education of JiangxiProvince (KJLD12083) and the Jiangxi Provincial Founda-tion for Leaders of Disciplines in Science (20113BCB22008)The funders had no role in study design data collection andanalysis decision to publish or preparation of the paper

References

[1] A Bairoch ldquoThe ENZYME database in 2000rdquo Nucleic AcidsResearch vol 28 no 1 pp 304ndash305 2000

[2] S H Koenig and R D Brown ldquoH2CO3as substrate for carbonic

anhydrase in the dehydration of H2CO3(minus)rdquo Proceedings of the

National Academy of Sciences of the United States of Americavol 69 no 9 pp 2422ndash2425 1972

[3] S X Lin and J Lapointe ldquoTheoretical and experimental biologyin onerdquo Journal of Biomedical Science and Engineering vol 6 pp435ndash442 2013

[4] K C Chou and S P Jiang ldquoStudies on the rate of diffusion-controlled reactions of enzymes Spatial factor and force fieldfactorrdquo Scientia Sinica vol 17 no 5 pp 664ndash680 1974

[5] K C Chou ldquoThe kinetics of the combination reaction betweenenzyme and substraterdquo Scientia Sinica vol 19 no 4 pp 505ndash528 1976

[6] K C Chou and G P Zhou ldquoRole of the protein outside activesite on the diffusion-controlled reaction of enzymerdquo Journal ofthe American Chemical Society vol 104 no 5 pp 1409ndash14131982

[7] R A Poorman A G Tomasselli R L Heinrikson and FJ Kezdy ldquoA cumulative specificity model for proteases from

human immunodeficiency virus types 1 and 2 inferred fromstatistical analysis of an extended substrate data baserdquo TheJournal of Biological Chemistry vol 266 no 22 pp 14554ndash145611991

[8] K-C Chou ldquoA vectorized sequence-coupling model for pre-dicting HIV protease cleavage sites in proteinsrdquo The Journal ofBiological Chemistry vol 268 no 23 pp 16938ndash16948 1993

[9] G Z Liang and S Z Li ldquoA new sequence representation asapplied in better specificity elucidation for human immunod-eficiency virus type 1 proteaserdquo Biopolymers vol 88 no 3 pp401ndash412 2007

[10] J J Chou ldquoPredicting cleavability of peptide sequences byHIVprotease via correlation-angle approachrdquo Journal of ProteinChemistry vol 12 no 3 pp 291ndash302 1993

[11] Q S Du S Wang D Q Wei S Sirois and K C ChouldquoMolecular modeling and chemical modification for findingpeptide inhibitor against severe acute respiratory syndromecoronavirus main proteinaserdquo Analytical Biochemistry vol 337no 2 pp 262ndash270 2005

[12] Y Gan H Huang Y Huang et al ldquoSynthesis and activity ofan octapeptide inhibitor designed for SARS coronavirus mainproteinaserdquo Peptides vol 27 no 4 pp 622ndash625 2006

[13] Q S Du H Sun and K C Chou ldquoInhibitor design for SARScoronavirus main protease based on lsquodistorted key theoryrsquordquoMedicinal Chemistry vol 3 no 1 pp 1ndash6 2007

[14] K C Chou ldquoPrediction of human immunodeficiency virusprotease cleavage sites in proteinsrdquoAnalytical Biochemistry vol233 no 1 pp 1ndash14 1996

[15] J Knowles and G Gromo ldquoTarget selection in drug discoveryrdquoNature Reviews Drug Discovery vol 2 no 1 pp 63ndash69 2003

[16] A C Cheng R G Coleman K T Smyth et al ldquoStructure-basedmaximal affinity model predicts small-molecule druggabilityrdquoNature Biotechnology vol 25 no 1 pp 71ndash75 2007

[17] M Rarey B Kramer T Lengauer and G Klebe ldquoA fast flexibledocking method using an incremental construction algorithmrdquoJournal of Molecular Biology vol 261 no 3 pp 470ndash489 1996

[18] K C Chou ldquoReview structural bioinformatics and its impactto biomedical sciencerdquo Current Medicinal Chemistry vol 11 no16 pp 2105ndash2134 2004

[19] K C Chou D Q Wei and W Z Zhong ldquoBinding mechanismof coronavirus main proteinase with ligands and its implicationto drug design against SARSrdquo Biochemical and BiophysicalResearch Communications vol 308 pp 148ndash151 2003 (Erratumin Biochemical and Biophysical Research Communications vol310 p 675 2003)

[20] K C Chou D Q Wei Q S Du S Sirois and W Z ZhongldquoProgress in computational approach to drug developmentagaint SARSrdquo Current Medicinal Chemistry vol 13 no 27 pp3263ndash3270 2006

[21] G P Zhou and F A Troy ldquoNMR studies on how the bindingcomplex of polyisoprenol recognition sequence peptides andpolyisoprenols can modulate membrane structurerdquo CurrentProtein and Peptide Science vol 6 no 5 pp 399ndash411 2005

[22] R B Huang Q S Du C H Wang and K C Chou ldquoAn in-depth analysis of the biological functional studies based on theNMR M2 channel structure of influenza A virusrdquo Biochemicaland Biophysical Research Communications vol 377 no 4 pp1243ndash1247 2008

[23] Q S Du R B Huang C H Wang X M Li and K C ChouldquoEnergetic analysis of the two controversial drug binding sitesof the M2 proton channel in influenza A virusrdquo Journal ofTheoretical Biology vol 259 no 1 pp 159ndash164 2009

BioMed Research International 11

[24] M J Berardi W M Shih S C Harrison and J J Chou ldquoMito-chondrial uncoupling protein 2 structure determined by NMRmolecular fragment searchingrdquo Nature vol 476 no 7358 pp109ndash114 2011

[25] J R Schnell and J J Chou ldquoStructure andmechanism of theM2proton channel of influenza A virusrdquo Nature vol 451 no 7178pp 591ndash595 2008

[26] B OuYang S Xie M J Berardi et al ldquoUnusual architecture ofthe p7 channel from hepatitis C virusrdquoNature vol 498 pp 521ndash525 2013

[27] K Oxenoid and J J Chou ldquoThe structure of phospholambanpentamer reveals a channel-like architecture in membranesrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 102 no 31 pp 10870ndash10875 2005

[28] M E Call KWWucherpfennig and J J Chou ldquoThe structuralbasis for intramembrane assembly of an activating immunore-ceptor complexrdquo Nature Immunology vol 11 no 11 pp 1023ndash1029 2010

[29] R M Pielak J R Schnell and J J Chou ldquoMechanism of druginhibition and drug resistance of influenza A M2 channelrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 106 no 18 pp 7379ndash7384 2009

[30] J Wang R M Pielak M A McClintock and J J ChouldquoSolution structure and functional analysis of the influenza Bproton channelrdquo Nature Structural and Molecular Biology vol16 no 12 pp 1267ndash1271 2009

[31] K C Chou ldquoCoupling interaction between thromboxane A2receptor and alpha-13 subunit of guanine nucleotide-bindingproteinrdquo Journal of Proteome Research vol 4 no 5 pp 1681ndash1686 2005

[32] K C Chou ldquoInsights from modeling three-dimensional struc-tures of the human potassium and sodium channelsrdquo Journal ofProteome Research vol 3 no 4 pp 856ndash861 2004

[33] K C Chou ldquoSome remarks on protein attribute prediction andpseudo amino acid compositionrdquo Journal ofTheoretical Biologyvol 273 no 1 pp 236ndash247 2011

[34] W Z Lin J A Fang X Xiao and K C Chou ldquoiLoc-animal amulti-label learning classifier for predicting subcellular local-ization of animal proteinsrdquo Molecular BioSystems vol 9 pp634ndash644 2013

[35] X Xiao PWangW Z Lin J H Jia and K C Chou ldquoiAMP-2La two-level multi-label classifier for identifying antimicrobialpeptides and their functional typesrdquo Analytical Biochemistryvol 436 pp 168ndash177 2013

[36] W Chen P M Feng H Lin and K C Chou ldquoiRSpot-PseDNCidentify recombination spots with pseudo dinucleotide compo-sitionrdquo Nucleic Acids Research vol 41 article e69 2013

[37] K C Chou Z C Wu and X Xiao ldquoILoc-Hum using theaccumulation-label scale to predict subcellular locations ofhuman proteins with both single and multiple sitesrdquoMolecularBioSystems vol 8 no 2 pp 629ndash641 2012

[38] M Kotera M Hirakawa T Tokimatsu S Goto and MKanehisa ldquoThe KEGG databases and tools facilitating omicsanalysis latest developments involving human diseases andpharmaceuticalsrdquo Methods in Molecular Biology vol 802 pp19ndash39 2012

[39] Y YamanishiM Araki A GutteridgeWHonda andM Kane-hisa ldquoPrediction of drug-target interaction networks from theintegration of chemical and genomic spacesrdquo Bioinformaticsvol 24 no 13 pp i232ndashi240 2008

[40] P Finn S Muggleton D Page and A Srinivasan ldquoPharma-cophore discovery using the Inductive Logic Programmingsystem PROGOLrdquo Machine Learning vol 30 no 2-3 pp 241ndash270 1998

[41] I Vogt D Stumpfe H E Ahmed and J Bajorath ldquoMethodsfor computer-aided chemical biology Part 2 evaluation of com-pound selectivity using 2D molecular fingerprintsrdquo ChemicalBiology and Drug Design vol 70 pp 195ndash205 2007

[42] H Eckert and J Bajorath ldquoMolecular similarity analysis in vir-tual screening foundations limitations and novel approachesrdquoDrug Discovery Today vol 12 no 5-6 pp 225ndash233 2007

[43] S Laurent L V Elst and R N Muller ldquoComparative studyof the physicochemical properties of six clinical low molecularweight gadolinium contrast agentsrdquo Contrast Media amp Molecu-lar Imaging vol 1 no 3 pp 128ndash137 2006

[44] E Gregori-Puigjane R Garriga-Sust and J Mestres ldquoIndexingmolecules with chemical graph identifiersrdquo Journal of Compu-tational Chemistry vol 32 no 12 pp 2638ndash2646 2011

[45] B Ren ldquoApplication of novel atom-type AI topological indicesto QSPR studies of alkanesrdquo Computers and Chemistry vol 26no 4 pp 357ndash369 2002

[46] N M OrsquoBoyle M Banck C A James C Morley T Vander-meersch and G R Hutchison ldquoOpen Babel an open chemicaltoolboxrdquo Journal of Cheminformatics vol 3 p 33 2011

[47] V J Gillet P Willett and J Bradshaw ldquoSimilarity searchingusing reduced graphsrdquo Journal of Chemical Information andComputer Sciences vol 43 no 2 pp 338ndash345 2003

[48] D Butina ldquoUnsupervised data base clustering based on day-lightrsquos fingerprint and Tanimoto similarity a fast and automatedway to cluster small and large data setsrdquo Journal of ChemicalInformation and Computer Sciences vol 39 no 4 pp 747ndash7501999

[49] C E Shannon ldquoA mathematical theory of communicationrdquoACM SIGMOBILE Mobile Computing and CommunicationsReview vol 5 pp 3ndash55 2001

[50] V D Gusev L A Nemytikova and N A Chuzhanova ldquoOn thecomplexity measures of genetic sequencesrdquo Bioinformatics vol15 no 12 pp 994ndash999 1999

[51] K C Chou and H B Shen ldquoReview recent progress in proteinsubcellular location predictionrdquo Analytical Biochemistry vol370 no 1 pp 1ndash16 2007

[52] S F Altschul ldquoEvaluating the statistical significance of multipledistinct local alignmentsrdquo in Theoretical and ComputationalMethods in Genome Research S Suhai Ed pp 1ndash14 PlenumNew York NY USA 1997

[53] J C Wootton and S Federhen ldquoStatistics of local complexity inamino acid sequences and sequence databasesrdquo Computers andChemistry vol 17 no 2 pp 149ndash163 1993

[54] H Nakashima K Nishikawa and T Ooi ldquoThe folding type ofa protein is relevant to the amino acid compositionrdquo Journal ofBiochemistry vol 99 no 1 pp 153ndash162 1986

[55] K C Chou and C T Zhang ldquoPredicting protein folding typesby distance functions that make allowances for amino acidinteractionsrdquo The Journal of Biological Chemistry vol 269 no35 pp 22014ndash22020 1994

[56] K-C Chou ldquoA novel approach to predicting protein structuralclasses in a (20-1)-D amino acid composition spacerdquo Proteinsvol 21 no 4 pp 319ndash344 1995

[57] H Nakashima and K Nishikawa ldquoDiscrimination of intracel-lular and extracellular proteins using amino acid compositionand residue-pair frequenciesrdquo Journal of Molecular Biology vol238 no 1 pp 54ndash61 1994

12 BioMed Research International

[58] G P Zhou ldquoAn intriguing controversy over protein structuralclass predictionrdquo Journal of Protein Chemistry vol 17 no 8 pp729ndash738 1998

[59] I Bahar A R Atilgan R L Jernigan and B Erman ldquoUnder-standing the recognition of protein structural classes by aminoacid compositionrdquo Proteins vol 29 pp 172ndash185 1997

[60] J Cedano P Aloy J A Perez-Pons and E Querol ldquoRelationbetween amino acid composition and cellular location ofproteinsrdquo Journal of Molecular Biology vol 266 no 3 pp 594ndash600 1997

[61] G P Zhou and K Doctor ldquoSubcellular location prediction ofapoptosis proteinsrdquo Proteins vol 50 no 1 pp 44ndash48 2003

[62] KChou ldquoPrediction of protein cellular attributes using pseudo-amino acid compositionrdquo Proteins vol 43 pp 246ndash255 2001(Erratum in Proteins vol 44 p 60 2001)

[63] K C Chou ldquoUsing amphiphilic pseudo amino acid composi-tion to predict enzyme subfamily classesrdquo Bioinformatics vol21 no 1 pp 10ndash19 2005

[64] D Zou Z He J He and Y Xia ldquoSupersecondary structureprediction using Choursquos pseudo amino acid compositionrdquoJournal of Computational Chemistry vol 32 no 2 pp 271ndash2782011

[65] M M Beigi M Behjati and H Mohabatkar ldquoPrediction ofmetalloproteinase family based on the concept ofChoursquos pseudoamino acid composition using a machine learning approachrdquoJournal of Structural and Functional Genomics vol 12 no 4 pp191ndash197 2011

[66] Y K Chen and K B Li ldquoPredicting membrane protein typesby incorporating protein topology domains signal peptidesand physicochemical properties into the general form of Choursquospseudo amino acid compositionrdquo Journal ofTheoretical Biologyvol 318 pp 1ndash12 2013

[67] C Huang and J Q Yuan ldquoA multilabel model based on Choursquospseudo-amino acid composition for identifying membraneproteins with both single and multiple functional typesrdquo TheJournal of Membrane Biology vol 246 pp 327ndash334 2013

[68] S S Sahu and G Panda ldquoA novel feature representationmethod based on Choursquos pseudo amino acid composition forprotein structural class predictionrdquo Computational Biology andChemistry vol 34 no 5-6 pp 320ndash327 2010

[69] M Hayat and A Khan ldquoDiscriminating outer membraneproteins with fuzzy K-nearest neighbor algorithms based on thegeneral form of Choursquos PseAACrdquo Protein and Peptide Lettersvol 19 no 4 pp 411ndash421 2012

[70] MKhosravian F K FaramarziMM BeigiM Behbahani andH Mohabatkar ldquoPredicting antibacterial peptides by the con-cept of Choursquos pseudo-amino acid composition and machinelearning methodsrdquo Protein amp Peptide Letters vol 20 pp 180ndash186 2013

[71] HMohabatkarMM Beigi K Abdolahi and SMohsenzadehldquoPrediction of allergenic proteins by means of the concept ofChoursquos pseudo amino acid composition and amachine learningapproachrdquoMedicinal Chemistry vol 9 pp 133ndash137 2013

[72] L Nanni A Lumini D Gupta and A Garg ldquoIdentifyingbacterial virulent proteins by fusing a set of classifiers basedon variants of Choursquos pseudo amino acid composition and onevolutionary informationrdquo IEEEACM Transactions on Compu-tational Biology and Bioinformatics vol 9 no 2 pp 467ndash4752012

[73] T H Chang L C Wu T Y Lee S P Chen H D Huang andJ T Horng ldquoEuLoc a web-server for accurately predict protein

subcellular localization in eukaryotes by incorporating variousfeatures of sequence segments into the general form of ChoursquosPseAACrdquo Journal of Computer-Aided Molecular Design vol 27pp 91ndash103 2013

[74] S Zhang Y Zhang H Yang C Zhao and Q Pan ldquoUsing theconcept of Choursquos pseudo amino acid composition to predictprotein subcellular localization an approach by incorporatingevolutionary information and von Neumann entropiesrdquo AminoAcids vol 34 no 4 pp 565ndash572 2008

[75] R Zia Ur and A Khan ldquoIdentifying GPCRs and their typeswith Choursquos pseudo amino acid composition an approach frommulti-scale energy representation and position specific scoringmatrixrdquo Protein amp Peptide Letters vol 19 pp 890ndash903 2012

[76] X Y Sun S P Shi J D Qiu S B Suo S Y Huang and R PLiang ldquoIdentifying protein quaternary structural attributes byincorporating physicochemical properties into the general formof Choursquos PseAAC via discrete wavelet transformrdquo MolecularBioSystems vol 8 pp 3178ndash3184 2012

[77] L Nanni and A Lumini ldquoGenetic programming for creatingChoursquos pseudo amino acid based features for submitochondrialocalizationrdquo Amino Acids vol 34 no 4 pp 653ndash660 2008

[78] M Esmaeili H Mohabatkar and S Mohsenzadeh ldquoUsing theconcept of Choursquos pseudo amino acid composition for risk typeprediction of human papillomavirusesrdquo Journal of TheoreticalBiology vol 263 no 2 pp 203ndash209 2010

[79] H Mohabatkar ldquoPrediction of cyclin proteins using Choursquospseudo amino acid compositionrdquo Protein and Peptide Lettersvol 17 no 10 pp 1207ndash1214 2010

[80] H Mohabatkar M M Beigi and A Esmaeili ldquoPredictionof GABA(A) receptor proteins using the concept of Choursquospseudo-amino acid composition and support vector machinerdquoJournal of Theoretical Biology vol 281 no 1 pp 18ndash23 2011

[81] Y Xu J Ding L Y Wu and K C Chou ldquoiSNO-PseAAC pre-dict cysteine S-nitrosylation sites in proteins by incorporatingposition specific amino acid propensity into pseudo amino acidcompositionrdquo PLoS ONE vol 8 Article ID e55844 2013

[82] W Chen H Lin PM Feng C Ding Y C Zuo andK C ChouldquoiNuc-PhysChem a sequence-based predictor for identifyingnucleosomes via physicochemical propertiesrdquo PLoS ONE vol7 Article ID e47843 2012

[83] B Li T Huang L Liu Y Cai and K C Chou ldquoIdentificationof colorectal cancer related genes with mrmr and shortest pathin protein-protein interaction networkrdquo PLoS ONE vol 7 no 4Article ID e33393 2012

[84] Y Jiang T Huang C Lei Y F Gao Y D Cai and K C ChouldquoSignal propagation in protein interaction network duringcolorectal cancer progressionrdquo BioMed Research Internationalvol 2013 Article ID 287019 9 pages 2013

[85] P Du X Wang C Xu and Y Gao ldquoPseAAC-Builder across-platform stand-alone program for generating variousspecial Choursquos pseudo-amino acid compositionsrdquo AnalyticalBiochemistry vol 425 no 2 pp 117ndash119 2012

[86] D S Cao Q S Xu and Y Z Liang ldquoPropy a tool to generatevarious modes of Choursquos PseAACrdquo Bioinformatics vol 29 pp960ndash962 2013

[87] H Shen and K C Chou ldquoPseAAC a flexible web serverfor generating various kinds of protein pseudo amino acidcompositionrdquo Analytical Biochemistry vol 373 no 2 pp 386ndash388 2008

[88] K C Chou ldquoUsing pair-coupled amino acid composition topredict protein secondary structure contentrdquo Journal of ProteinChemistry vol 18 no 4 pp 473ndash480 1999

BioMed Research International 13

[89] W Liu and K C Chou ldquoPrediction of protein secondarystructure contentrdquo Protein Engineering vol 12 no 12 pp 1041ndash1050 1999

[90] M Bhasin and G P S Raghava ldquoClassification of nuclearreceptors based on amino acid composition and dipeptidecompositionrdquo The Journal of Biological Chemistry vol 279 no22 pp 23262ndash23266 2004

[91] H Lin andH Ding ldquoPredicting ion channels and their types bythe dipeptidemode of pseudo amino acid compositionrdquo Journalof Theoretical Biology vol 269 no 1 pp 64ndash69 2011

[92] H Lin and Q Li ldquoUsing pseudo amino acid composition topredict protein structural class approached by incorporating400 dipeptide componentsrdquo Journal of Computational Chem-istry vol 28 no 9 pp 1463ndash1466 2007

[93] L Nanni and A Lumini ldquoCombing ontologies and dipeptidecomposition for predicting DNA-binding proteinsrdquo AminoAcids vol 34 no 4 pp 635ndash641 2008

[94] K-C Chou ldquoThe convergence-divergence duality in lectindomains of selectin family and its implicationsrdquo FEBS Lettersvol 363 no 1-2 pp 123ndash126 1995

[95] A A Schaffer L Aravind T L Madden et al ldquoImprovingthe accuracy of PSI-BLAST protein database searches withcomposition-based statistics and other refinementsrdquo NucleicAcids Research vol 29 no 14 pp 2994ndash3005 2001

[96] S F Altschul and E V Koonin ldquoIterated profile searches withPSI-BLAST a tool for discovery in protein databasesrdquo Trends inBiochemical Sciences vol 23 no 11 pp 444ndash447 1998

[97] J Deng ldquoGrey entropy and grey target decision makingrdquoJournal of Grey System vol 22 no 1 pp 1ndash24 2010

[98] B M Bolstad R A Irizarry M Astrand and T P Speed ldquoAcomparison of normalizationmethods for high density oligonu-cleotide array data based on variance and biasrdquo Bioinformaticsvol 19 no 2 pp 185ndash193 2003

[99] J-Y Shi S-W Zhang Q Pan Y-M Cheng and J XieldquoPrediction of protein subcellular localization by support vectormachines using multi-scale energy and pseudo amino acidcompositionrdquo Amino Acids vol 33 no 1 pp 69ndash74 2007

[100] T Denoeux ldquoA 120581-nearest neighbor classification rule based onDempster-Shafer theoryrdquo IEEE Transactions on Systems Manand Cybernetics vol 25 no 5 pp 804ndash813 1995

[101] J M Keller M R Gray and J A Givens ldquoA fuzzy k-nearestneighbours algorithmrdquo IEEE Transactions on Systems Man andCybernetics vol 15 no 4 pp 580ndash585 1985

[102] X Xiao P Wang and K Chou ldquoiNR-physchem a sequence-based predictor for identifying nuclear receptors and theirsubfamilies via physical-chemical property matrixrdquo PLoS ONEvol 7 no 2 Article ID e30869 2012

[103] I Roterman L Konieczny W Jurkowski K Prymula and MBanach ldquoTwo-intermediate model to characterize the structureof fast-folding proteinsrdquo Journal of Theoretical Biology vol 283no 1 pp 60ndash70 2011

[104] X Xiao P Wang and K C Chou ldquoGPCR-2L predicting Gprotein-coupled receptors and their types by hybridizing twodifferentmodes of pseudo amino acid compositionsrdquoMolecularBioSystems vol 7 no 3 pp 911ndash919 2011

[105] X Xiao P Wang and K C Chou ldquoQuat-2L a web-server forpredicting protein quaternary structural attributesrdquo MolecularDiversity vol 15 no 1 pp 149ndash155 2011

[106] X Zheng C Li and J Wang ldquoAn information-theoreticapproach to the prediction of protein structural classrdquo Journalof Computational Chemistry vol 31 no 6 pp 1201ndash1206 2010

[107] H Shen J Yang X Liu and K C Chou ldquoUsing supervisedfuzzy clustering to predict protein structural classesrdquo Biochem-ical and Biophysical Research Communications vol 334 no 2pp 577ndash581 2005

[108] K-C Chou and C-T Zhang ldquoReview prediction of proteinstructural classesrdquo Critical Reviews in Biochemistry and Molec-ular Biology vol 30 no 4 pp 275ndash349 1995

[109] P C Mahalanobis ldquoOn the generalized distance in statisticsrdquoProceedings of the National Institute of Sciences of India vol 2pp 49ndash55 1936

[110] R M Centor ldquoSignal detectability the use of ROC curves andtheir analysesrdquoMedical Decision Making vol 11 no 2 pp 102ndash106 1991

[111] K-C Chou ldquoUsing subsite coupling to predict signal peptidesrdquoProtein Engineering vol 14 no 2 pp 75ndash79 2001

[112] K C Chou ldquoPrediction of protein signal sequences and theircleavage sitesrdquo Proteins vol 42 pp 136ndash139 2001

[113] K C Chou ldquoPrediction of signal peptides using scaledwindowrdquoPeptides vol 22 no 12 pp 1973ndash1979 2001

[114] K C Chou ldquoSome remarks on predicting multi-label attributesinmolecular biosystemsrdquoMolecular Biosystems vol 9 pp 1092ndash1100 2013

[115] K C Chou and H B Shen ldquoCell-PLoc 2 0 an improvedpackage of web-servers for predicting subcellular localization ofproteins in various organismsrdquoNatural Science vol 2 pp 1090ndash1103 2010

[116] M Gribskov and N L Robinson ldquoUse of receiver operatingcharacteristic (ROC) analysis to evaluate sequence matchingrdquoComputers and Chemistry vol 20 no 1 pp 25ndash33 1996

[117] Y Yamanishi M Kotera M Kanehisa and S Goto ldquoDrug-target interaction prediction from chemical genomic and phar-macological data in an integrated frameworkrdquo Bioinformaticsvol 26 no 12 pp i246ndashi254 2010

[118] Z He J Zhang X Shi et al ldquoPredicting drug-target interactionnetworks based on functional groups and biological featuresrdquoPLoS ONE vol 5 no 3 Article ID e9603 2010

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

2 BioMed Research International

(a) (b)

Figure 1 A schematic drawing to illustrate how to use Choursquos distorted key theory to develop peptide drugs against HIVAIDS (a) shows agood fitting and binding of a peptide to the active site of HIV protease right before it is cleaved by the enzyme (b) shows that the peptide hasbecome a noncleavable one after its scissile bond is modified although it can still tightly bind to the active site Such a modified peptide orldquodistorted keyrdquo will automatically become an inhibitor candidate against HIV protease

emergence of molecular medicine have provided excellentopportunity to discover unknown target enzymes for drugsMany efforts were made in this regard by computationallyanalyzing drug-enzyme interactions The most commonlyused approaches are docking simulations (see eg [16ndash19])and protein cleavage site analysis (see eg [8 12 13]) basedon Choursquos distorted key theory [14] However the latterapproach is mainly used to find peptide drugs Comparedwith the smaller organic compounds although peptide drugshave the advantage of low toxicity to human body theyhave the shortcoming of poor metabolic stability and lowbioavailability due to their inability to readily crossing thrumembrane barriers such as the intestinal and blood-brainbarriers [20] In contrast the molecular docking is indeed auseful vehicle for investigating the interaction of an enzymereceptor with its organic inhibitor and revealing their bindingmechanism as demonstrated by a series of studies [11 19ndash23] However to conduct molecular docking a necessaryprerequisite is the availability of the 3D (three dimensional)structure of the targeted enzyme Unfortunately the 3Dstructures of many enzymes are still unknown Although X-ray crystallography is a powerful tool in determining the 3Dstructures of enzymes it is time-consuming and expensiveParticularly not all enzymes can be successfully crystallizedFor example membrane enzymes are very difficult to crys-tallize and most of them will not dissolve in normal solventsTherefore so far very few membrane enzyme 3D structureshave been determined Although NMR is indeed a verypowerful tool in determining the 3D structures of membraneproteins as indicated by a series of recent publications (seeeg [24ndash30]) it is time-consuming and costly To acquire thestructural information in a timelymanner one has to resort tovarious structural bioinformatics tools (see eg [18 31 32])Unfortunately the number of templates for developing highquality 3D structures by structural bioinformatics is verylimited

Therefore it would save us a lot of time and moneyif we could identify the interactions between enzymes anddrugs before carrying out any intense study in this regardIn view of this the present study was initiated in an attemptto develop a computational method based on the sequence-derived features that can be used to predict the drug-enzymeinteractions in cellular networking

As summarized in a comprehensive review [33] anddemonstrated by a series of recent publications [34ndash37] tosuccessfully develop the desiredmethod we need to considerthe following procedures (i) construct or select a validbenchmark dataset to train and test the predictor (ii) denotethe drug-enzyme samples with an effective formulation thatcan truly reflect their intrinsic relation with the target to bepredicted (iii) introduce or develop a powerful algorithm(or engine) to operate the prediction (iv) conduct a rigorouscross-validation to objectively evaluate its anticipated accu-racy (v) establish a user-friendly web-server for the predictorthat is freely accessible to the public Next let us elaboratehow to deal with these procedures one by one

2 Materials and Methods

21 Benchmark Dataset The data used in this study werecollected from Kyoto Encyclopedia of Genes and Genomes(KEGG) [38] at httpwwwkeggjpkegg which is a data-base resource for understanding high-level functions andutilities of the biological system such as the cell theorganism and the ecosystem from molecular-level infor-mation especially large-scale molecular datasets generatedby genome sequencing and other high-throughput exper-imental technologies For the current study the benchmarkdataset S can be formulated as

S = S+

cup Sminus

(1)

where S+ is the positive subset that consists of the interactiveenzyme-drug pairs only while Sminus is the negative subsetthat contains of the noninteractive enzyme-drug pairs onlyand the symbol cup represents the union in the set theoryHere the ldquointeractiverdquo pair means the pair whose twocounterparts are interacted with each other in the drug-target networks as defined in the KEGG database [38]while the ldquononinteractiverdquo pair means that its two counterparts are not interacted with each other in the drug-targetnetworks The positive dataset S+ contains 2719 enzyme-drug pairs derived from Yamanishi et al [39] The negativedataset Sminus contains 5438 noninteractive enzyme-drug pairswhich were derived according to the following procedures(i) separating each of the pairs in S+ into single drug and

BioMed Research International 3

enzyme (ii) recoupling each of the single drugs with eachof the single enzymes into pairs in a way that none of themoccurred inS+ (iii) randomly picking the pairs thus formeduntil they reached the number two times as many as thepairs in S+ The 2719 interactive enzyme-drug pairs and5438 noninteractive enzyme-drug pairs are given in OnlineSupporting Information S1 (see Supplementary Materialavailable online at httpdxdoiorg1011552013701317) Allthe detailed information for the compounds or drugs listedthere can be found in the KEGG database via their codes

22 Sample Representation Since each of the samples in thecurrent network system contains an enzyme (protein) anda drug a combination of the following two approaches wasadopted to represent the enzyme-drug pair samples

221 Drug

(a) 2D Molecular Fingerprints Although the number ofdrugs is extremely large most of them are small organicmolecules and are composed of some fixed small structures[40] The identification of small molecules or structures canbe used to detect the drug-target interactions [41] Molec-ular fingerprints are bit-string representations of molecularstructure and properties [42] It should be pointed outthat there are many types of structural representations thathave been suggested for the description of drug moleculesincluding physicochemical properties [43] chemical graphs[44] topological indices [45] 3D pharmacophore patternsandmolecular fields In the current study let us use the simpleand generally adopted 2Dmolecular fingerprints to representdrug molecules as described below

First for each of the drugs concerned we can obtaina MOL file from the KEGG database [38] via its codethat contains the detailed information of chemical structureSecond we can convert the MOL file format into its 2Dmolecular fingerprint file format by using a chemical toolboxsoftware called OpenBabel [46] which can be downloadedfrom the website at httpopenbabelorg The current ver-sion of OpenBabel can generate four types of fingerprintsFP2 FP3 FP4 and MACCS In the current study we usedthe FP2 fingerprint format It is a path-based fingerprint thatidentifies small molecule fragments based on all linear andring substructures and maps them onto a bit-string using ahash function (somewhat similar to the daylight fingerprints[47 48]) It is a length of 256-bit hexadecimal string obtainedfrom theOpenBabel andwe can convert it to a 256-bit vectorThen a molecular fingerprint can be formulated as a 256-Dvector given by

MF = [1198601 sdot sdot sdot 119860119895 sdot sdot sdot 119860256]T (2)

where 119860119895(119895 = 1 2 256) is an integer between 0 and 15

and T is the matrix transpose operatorIn order to capture as much useful information from

a molecular fingerprint as possible we can also convertthe above 256-bit hexadecimal string into a 1024-bit binaryvector which is a digital sequence only including 0 and 1

and consider two different digital signal characteristics for thedigital sequence as follows

(b) Information Entropy Shannon proposed that any infor-mation is redundant and redundant size is related withthe occurrence probability or uncertainty of each symbolsuch as numbers letters or words among the informationThe information entropy for a system with a probabilitydistribution 119875(119909

119894) for two classes information entropy [49] is

defined as

119867119909= minussum

119909

119875 (119909119894) log2119875 (119909119894) (119894 = 0 1) (3)

where 119875(119909119894) represents the occurrence probability of num-

ber 119894 in the aforementioned 1024-bit binary vector andthe information entropy 119867

119909is a measure value of the

information amount For example for the digital sequence100100011010010 the value of the information entropy 119867

119909

thus obtained is

119875 (1199090) =

9

15

= 06

119875 (1199091) =

6

15

= 04

119867119909= minus (06 times log

206 + 04 times log

204) = 0971

(4)

(c) Complexity Factor The Lempel-Ziv (LZ) complexity [50]reflects the order that is retained in the sequence and hencewas adopted in this study For each step only two operationswere allowed in the process to get the LZ complexity eithercopying the longest section from the part of a nonemptysequence or generating an additional symbol mark thatensures the uniqueness of per component 119878(119894

119896minus1rarr 119894119896) Its

substring is expressed by

119878 (119894 997888rarr 119895) = 119898119894119898119894+1119898119894+2sdot sdot sdot 119898119895

(1 le 119894 le 119895 le 119871) (5)

where 1198981represents the 1st digital value 119898

2the 2nd value

and so forth A nonempty digital sequence is synthesizedaccording to the following formula

Syn (119878) = 119878 (1 997888rarr 1198941) ∙ 119878 (119894

1+ 1 997888rarr 119894

2)

∙ sdot sdot sdot ∙ 119878 (119894119898minus1

+ 1 997888rarr 119894119871)

(6)

Suppose that 119878 = 11989811198982119898311989841198985sdot sdot sdot 119898119871has been recon-

structed by the subsymbol 119898119903which is viewed as the newly

inserted symbol The substring up to 119898119903will be denoted by

119878(1 rarr 119903)∙ where the bold dot ∙ indicates that 119898119903is a newly

inserted symbol for checkingwhether the rest of the substring119878(119903+1 rarr 119871) can be reconstructed by a simple process At firstsuppose 119878(119902) = 119898

119903+ 1 and see whether 119878(119902) is the substring

for the subsequence 119878(1 rarr 119903) which means deleting thelast symbol from the substring 119878(1 rarr 119903)119878(119902) If the answeris ldquonordquo we insert 119878(119902) into the sequence followed by a dot ∙Thus it could not be obtained by the same operation If theanswer is ldquoyesrdquo no new symbol is needed and we can go on toproceed with 119878(119902) = 119898

119903+1119898119903+2

and repeat the same previousprocedure The LZ complexity is the number of dots (plus

4 BioMed Research International

one if the string is not terminated by a dot) For example forthe sequence 100100011010010 syn(119875) and the correspondingcomplexity factor CF are described as

Syn (119878) = 1 ∙ 0 ∙ 01 ∙ 000 ∙ 11 ∙ 01001 ∙ 0

CF = 7(7)

Thus by adding the information entropy 119867119909(4) and com-

plexity factor CF (7) into the molecular fingerprint MF (2)we obtained a total of (256 + 1 + 1) = 258 feature elements torepresent a drug compound that is it can now be formulatedas a 258-D vector given by

119863 = [11986011198602sdot sdot sdot 119860

256119867119909

CF]T (8)

where 119860119894has the same meaning as in (2) while 119867

119909and CF

are the information entropy and complexity factor respec-tively as described in the previous two sections

222 Enzyme The sequences of the enzymes involved inthis study are given in Online Supporting Information S2Now the problem is how to effectively represent these enzymesequences for the current study Generally speaking thereare two kinds of approaches to formulate enzyme sequencesthe sequential model and the nonsequential or discretemodel [51] The most typical sequential representation foran enzyme sample E with 119871 residues is its entire amino acidsequence that is

E = 1198771119877211987731198774119877511987761198777sdot sdot sdot 119877119871 (9)

where 1198771represents the 1st residue 119877

2the 2nd residue and

so forth An enzyme sample thus formulated can contain itsmost complete information This is an obvious advantage ofthe sequential representation To get the desired results thesequence-similarity-search-based tools such as BLAST [5253] are usually utilized to conduct the prediction Howeverthis kind of approach failed to work when the query enzymedid not have significant homology to enzyme of known char-acters Thus various nonsequential representation modelswere proposed The simplest nonsequential model for anenzyme was based on its amino acid composition (AAC) asdefined by

E = [11989111198912sdot sdot sdot 11989120]

T (10)

where 119891119906(119906 = 1 2 20) are the normalized occurrence

frequencies of the 20 native amino acids [54ndash56] in theenzyme E and T has the same meaning as in (2) and (8)The AAC-discrete model was widely used for identifyingvarious attributes of proteins (see eg [57ndash61]) Howeveras can be seen from (10) all the sequence order effectswere lost by using the AAC-discrete model This is its mainshortcoming To avoid completely losing the sequence-orderinformation the pseudo amino acid composition [62 63]or Choursquos PseAAC [3] was proposed to replace the simpleAAC model Since the concept of PseAAC was proposedin 2001 [62] it has penetrated into almost all the fields of

protein attribute predictions and computational proteomicssuch as predicting supersecondary structure [64] predictingmetalloproteinase family [65] predicting membrane pro-tein types [66 67] predicting protein structural class [68]discriminating outer membrane proteins [69] identifyingantibacterial peptides [70] identifying allergenic proteins[71] identifying bacterial virulent proteins [72] predictingprotein subcellular location [73 74] identifying GPCRs andtheir types [75] identifying protein quaternary structuralattributes [76] predicting protein submitochondria locations[77] identifying risk type of human papillomaviruses [78]identifying cyclin proteins [79] predicting GABA(A) recep-tor proteins [80] and predicting cysteine S-nitrosylation sitesin proteins [81] among many others (see a long list of paperscited in the References section of [33]) Recently the conceptof PseAAC was further extended to represent the featurevectors of DNA and nucleotides [36 82] as well as otherbiological samples (see eg [83 84]) Because it has beenwidely and increasingly used recently two powerful soft-wares called ldquoPseAAC-Builderrdquo [85] and ldquopropyrdquo [86] wereestablished for generating various special Choursquos pseudo-amino acid compositions in addition to the web-serverPseAAC [87] built in 2008 According to a recent review [33]the general formof Choursquos PseAAC for an enzyme sample canbe formulated by

E = [12059511205952sdot sdot sdot 120595119906sdot sdot sdot 120595Ω]

T (11)

where the subscript Ω is an integer and its value as wellas the components 120595

119906(119906 = 1 2 Ω) will depend on

how to extract the desired information from the amino acidsequence of E (cf (10)) Next let us describe how to extractuseful information from the benchmark datasetS andOnlineSupporting Information S2 to define the enzyme samplesconcerned via (11)

To incorporate as much useful information as possiblefrom an enzyme sample we are to approach this problemfrom three different angles followed by incorporating thefeature elements thus obtained into the general form ofPseAAC of (11)(a) Amino Acid Composition The components of amino acidcomposition have beenwidely used to predict various proteinattributes [57ndash61] In this study they were also included as thefirst 20 elements in the general Choursquos PseAAC (cf (11)) thatis

120595119906= 119891119906

(119906 = 1 2 20) (12)

where 119891119906has the same meaning as in (10)

(b) Dipeptide Composition Dipeptide composition has beenused to predict the protein secondary structural contents[88 89] as well as various protein attributes (see eg [90ndash93]) The number of different dipeptides is 20 times 20 = 400Suppose that the normalized occurrence frequencies of the400 dipeptides in an enzyme sample are given by

119891

(2)

119906(119906 = 1 2 400) (13)

BioMed Research International 5

Incorporating the above 400 dipeptide components into (11)we have

120595119906+20

= 119891

(2)

119906(119906 = 1 2 400) (14)

(c) Sequential Evolution Information Biology is a naturalscience with a historic dimension All biological specieshave developed starting out from a very limited number ofancestral species Their evolution involves changes of singleresidues insertions and deletions of several residues [94]gene doubling and gene fusion With these changes accu-mulated for a long period of time many similarities betweeninitial and resultant amino acid sequences are graduallyeliminated but the corresponding proteins may still sharemany common attributes [18] such as having basically thesame biological function and residing at a same subcellularlocation To extract the sequential evolution information anduse it to define the components of (11) the PSSM (PositionSpecific Scoring Matrix) was used as described next

According to Schaffer et al [95] the sequence evolutioninformation of enzyme E with 119871 amino acid residues can beexpressed by an 119871 times 20matrix as given by

P(0)PSSM =

[

[

[

[

[

[

[

119864

0

1rarr1119864

0

1rarr2sdot sdot sdot 119864

0

1rarr20

119864

0

2rarr1119864

0

2rarr2sdot sdot sdot 119864

0

2rarr20

119864

0

119871rarr1119864

0

119871rarr2sdot sdot sdot 119864

0

119871rarr20

]

]

]

]

]

]

]

(15)

where 1198640119894rarr 119895

represents the original score of the 119894th aminoacid residue (119894 = 1 2 119871) in the enzyme sequence changedto amino acid type 119895 (119895 = 1 2 20) in the process ofevolution Here the numerical codes 1 2 20 are used torepresent the 20 native amino acid types denoted by A CD E F G H I K L M N P Q R S T V W and Y The119871 times 20 scores in (15) were generated by using PSI-BLAST[96] to search the UniProtKBSwiss-Prot database (Release2013-05) through three iterations with 0001 as the 119864 valuecutoff for multiple sequence alignment against the sequenceof the enzyme E In order to make every element in (15) bescaled from their original score ranges into region of [0 1]we performed a conversion through the standard sigmoidfunction to make it become

P(1)PSSM =

[

[

[

[

[

[

[

119864

1

1rarr1119864

1

1rarr2sdot sdot sdot 119864

1

1rarr20

119864

1

2rarr1119864

1

2rarr2sdot sdot sdot 119864

1

2rarr20

119864

1

119871rarr1119864

1

119871rarr2sdot sdot sdot 119864

1

119871rarr20

]

]

]

]

]

]

]

(16)

where

119864

1

119894rarr 119895=

1

1 + 119890

minus1198640

119894rarr 119895

(1 le 119894 le 119871 1 le 119895 le 20) (17)

Nowwe extract the useful information from (16) to define thecomponents of (11) via the following approach

120595119906+420

= ℓ119906

(119906 = 1 2 20) (18)

where

ℓ119895=

1

119871

times

119871

sum

119896=1

119864

1

119896rarr119895(119895 = 1 2 20) (19)

(d) Grey System Model Approach The grey system theory[97] is quite useful in dealing with complicated systems thatlack sufficient information or need to process uncertaininformation According to the grey system theory we canextract the following information from the 119895th column of(16) that is

[

119886

119895

1

119886

119895

2

] = (BT119895B119895)

minus1

BT119895U119895

(119895 = 1 2 20) (20)

where

B119895=

[

[

[

[

[

[

[

[

[

[

minus119864

1

2rarr119895minus119864

1

1rarr119895minus 05119864

1

2rarr1198951

minus119864

1

3rarr119895minus

2

sum

119894=1

119864

1

119894rarr 119895minus 05119864

1

3rarr1198951

minus119864

1

119871rarr119895minus

119871minus1

sum

119894=1

119864

1

119894rarr 119895minus 05119864

1

119871rarr1198951

]

]

]

]

]

]

]

]

]

]

U119895=

[

[

[

[

[

[

119864

1

2rarr119895minus 119864

1

1rarr119895

119864

1

3rarr119895minus 119864

1

2rarr119895

119864

1

119871rarr119895minus 119864

1

119871minus1rarr119895

]

]

]

]

]

]

(21)

Therefore based on the grey system theory and (20) we canextract another 20 times 2 = 40 quantities from (16) to definethe components of (11) that is

120593119895=

1199081119886

119895

1when 119895 is an odd number

1199082119886

119895

2when 119895 is an even number

1 le 119895 le 20

(22)

where 1198861198951and 119886119895

2are given by (20) 119908

1and 119908

2are weight

factors which were all set to 1 in the current studySubstituting the elements in (12) (14) (18) and (22) we

finally obtain a total of Ω = 20 + 400 + 20 + 40 = 480

components for the PseAAC of (11) where

120595119906=

119891119906

when 1 le 119906 le 20119891

(2)

119906when 21 le 119906 le 420

ℓ119906

when 421 le 119906 le 440120593119906

when 440 le 119906 le 480

(23)

In other words in this study (11) or Choursquos PseAAC is a 480-D vector whose 480 components are given by (23) derivedfrom the amino acid composition dipeptide compositionsequential evolution information and grey system theory(e) Representing Enzyme-Drug Pairs Now the pair betweenan enzyme molecule E and a drug compound D can beformulated by combing (8) and (11) as given by

G = D oplus E

= [1198601sdot sdot sdot 119860

256119867119909

CF 1205951sdot sdot sdot 120595480]

T

(24)

6 BioMed Research International

where G represents the enzyme-drug pair oplus the orthogonalsum [51] and each of the (258 + 480) = 738 feature elementsis given in (8) and (23)

For the convenience of the later formulation let us use119909119894(119894 = 1 2 738) to represent the 738 components of (24)

that is

G = [11990911199092sdot sdot sdot 119909119894sdot sdot sdot 119909738]

T (25)

To optimize the prediction results different weights wereusually tested for each of the elements in (25) However sinceit would consume a lot of computational time for a total of 738weight factors here let us adopt the normalization approachto deal with this problem as done in [98 99] that is convert119909119894in (25) to 119910

119894according to the following equation

119910119894=

2tanminus1 (119909119894)

120587

(119894 = 1 2 738) (26)

where tanminus1 means arctangent By means of (26) everycomponent in (25) will be converted into the range of[minus1 1] that is we have minus1 le 119910

119894le 1 As demonstrated

in [98 99] the normalization approach via (26) was quiteeffective in enhancing the quality of prediction operated ina high dimension space Therefore in this study we wouldnot to take the procedure of optimizing the weight factorssignificantly reducing the computational times

23 Fuzzy 119870-Nearest Neighbour Algorithm The 119870-NN (119870-Nearest Neighbor) classifier is quite popular in patternrecognition community owing to its good performance andsimple-to-use feature According to the 119870-NN rule [100]named also as the ldquovoting 119870-NN rulerdquo the query sampleshould be assigned to the subset represented by a majorityof its 119870 nearest neighbors as illustrated in Figure 5 of [33]

Fuzzy 119870-NN classification method [101] is a specialvariation of the119870-NNclassification family Instead of roughlyassigning the label based on a voting from the 119870 nearestneighbors it attempts to estimate the membership valuesthat indicate how much degree the query sample belongsto the classes concerned Obviously it is impossible for anycharacteristic description to contain complete informationwhich would make the classification ambiguous In view ofthis the fuzzy principle is very reasonable and particularlyuseful in dealing with complicated biological systems such asidentifying nuclear receptor subfamilies [102] characterizingthe structure of fast-folding proteins [103] classifying Gprotein-coupled receptors [104] predicting protein quater-nary structural attributes [105] predicting protein structuralclasses [106 107] and so forth

Next let us give a brief introduction how to use thefuzzy119870-NN approach to identify the interactions between theenzymes and the drug compounds in the network concerned

Supposing that S(119873) = G1G2 G

119873 is a set of vec-

tors representing 119873 enzyme-drug pairs in a training set

classified into two classes 119862+ 119862minus where 119862+ denotes theinteractive pair class while 119862minus the noninteractive pair classSlowast(G) = Glowast

1Glowast2 Glowast

119870 sub S(119873) is the subset of the 119870

nearest neighbor pairs to the query pair G Thus the fuzzymembership value for the query pair G in the two classes ofS(119873) is given by

120583

+

(G) =sum

119870

119895=1120583

+(Glowast119895) 119889(GGlowast

119895)

minus2(120593minus1)

sum

119870

119895=1119889(GGlowast

119895)

minus2(120593minus1)

120583

minus

(G) =sum

119870

119895=1120583

minus(Glowast119895) 119889(GGlowast

119895)

minus2(120593minus1)

sum

119870

119895=1119889(GGlowast

119895)

minus2(120593minus1)

(27)

where 119870 is the number of the nearest neighbors counted forthe query pairG 120583+(Glowast

119895) and 120583minus(Glowast

119895) the fuzzy membership

values of the training sample Glowast119895to the class 119862+ and 119862minus

respectively as will be further defined next 119889(GGlowast119895) the

cosine distance between G and its 119895th nearest pair Glowast119895in

the training dataset S(119873) 120593(gt 1) the fuzzy coefficient fordetermining how heavily the distance is weighted whencalculating each nearest neighborrsquos contribution to the mem-bership value Note that the parameters 119870 and 120593 will affectthe computation result of (27) and they will be optimizedby a grid-search as will be described later Also variousother metrics can be chosen for 119889(GGlowast

119895) such as Euclidean

distance Hamming distance [108] andMahalanobis distance[55 109]

The quantitative definitions for the aforementioned120583

+(Glowast119895) and 120583minus(Glowast

119895) in (27) are given by

120583

+

(Glowast119895) =

1 if Glowast119895isin 119862

+

0 otherwise

120583

minus

(Glowast119895) =

1 if Glowast119895isin 119862

minus

0 otherwise

(28)

Substituting the results obtained by (27) into (28) it followsthat if 120583+(G) gt 120583

minus(G) then the query pair G is an

interactive couple otherwise noninteractive In other wordsthe outcome can be formulated as

G isin

119862

+ if 120583+ (G) gt 120583minus (G)

119862

minus otherwise

(29)

If there is a tie between 120583+(G) and 120583minus(G) the query pair Gwill be randomly assigned to one of the two classes Howevercase like that is quite rare and in this study never happened

The predictor thus established is called iEzy-Drugwhere ldquoirdquo means identify and ldquoEzy-Drugrdquo means the inter-action between enzyme and drug To provide an intuitiveoverall picture a flowchart is provided in Figure 2 to showthe process of how the classifier works in identifying enzyme-drug interactions

BioMed Research International 7

Input a queryprotein sequence and

a drug code

Create Chous PseAAC of a protein

Create feature vector of a drug

Feature fusion

engineTraining dataset

Yes No

Noninteractive enzyme drug pair

Interactive enzymedrug pair

AACDipeptideSeq Evol

Grey model

Mol FptInfo entropy

CF

FKNN operation

Figure 2 A flowchart to show the operation process of the iEzy-Drug predictor See the text for further explanation

24 Criteria for Performance Evaluation In the literaturethe following equation set is often used for examining theperformance quality of a predictor

Sn = TPTP + FN

Sp = TNTN + FP

Acc = TP + TNTP + TN + FP + FN

MCC = (TP times TN) minus (FP times FN)radic(TP + FP) (TP + FN) (TN + FP) (TN + FN)

(30)

where TP represents the true positive TN the true negativeFP the false positive FN the false negative Sn the sensitiv-ity Sp the specificity Acc the accuracy MCC the Mathewrsquoscorrelation coefficient

To most biologists however the four metrics as formu-lated in (30) are not quite intuitive and easier-to-understandparticularly for the Mathewrsquos correlation coefficient Herelet us adopt the Choursquos symbols to formulate the previousfour metrics By means of Choursquos symbols [111 112] the rates

of correct predictions for the interactive enzyme-drug pairsin dataset S+ and the noninteractive enzyme-drug pairs indataset Sminus are respectively defined by (cf (1))

Λ

+

=

119873

+minus 119873

+

minus

119873

+ for the interactive enzyme-drug pairs

Λ

minus

=

119873

minusminus 119873

minus

+

119873

minus for the noninteractive enzyme-drug pairs

(31)

where 119873+ is the total number of the interactive enzyme-drug pairs investigated while 119873+

minusis the number of the

interactive enzyme-drug pairs incorrectly predicted as thenoninteractive enzyme-drug pairs 119873minus is the total numberof the noninteractive enzyme-drug pairs investigated while119873

minus

+is the number of the noninteractive enzyme-drug pairs

incorrectly predicted as the interactive enzyme-drug pairsThe overall success prediction rate is given by [113] as follows

Λ =

Λ

+119873

++ Λ

minus119873

minus

119873

++ 119873

minus= 1 minus

119873

+

minus+ 119873

minus

+

119873

++ 119873

minus

(32)

It is obvious from (31)-(32) that if and only if none ofthe interactive enzyme-drug pairs and the noninteractiveenzyme-drug pairs are mispredicted that is 119873+

minus= 119873

minus

+= 0

and Λ+ = Λ

minus= 1 we have the overall success rate Λ = 1

Otherwise the overall success rate would be smaller than 1The relations between the symbols in (32) and those in

(30) are given by

TP = 119873+ minus 119873+minus

TN = 119873

minus

minus 119873

minus

+

FP = 119873minus+

FN = 119873

+

minus

(33)

Substituting (33) into (30) and also noting (31)-(32) we obtain

Sn = Λ+ = 1 minus119873

+

minus

119873

+

Sp = Λminus = 1 minus119873

minus

+

119873

minus

Acc = Λ = 1 minus119873

+

minus+ 119873

minus

+

119873

++ 119873

minus

MCC =1 minus ((119873

+

minus119873

+) + (119873

minus

+119873

minus))

radic(1 + (119873

minus

+minus 119873

+

minus) 119873

+) (1 + (119873

+

minusminus 119873

minus

+) 119873

minus)

(34)

Now it is obvious to see from (34) when 119873

+

minus= 0

meaning none of the interactive enzyme-drug pairs was

8 BioMed Research International

02

46

810

1

15

2089

0895

09

0905

091

0915

092

ACC

K120593

Figure 3 A 3D plot to show how the parameter in (27) wasoptimized for the iEzy-Drug predictor

mispredicted to be a noninteractive enzyme-drug pair wehave the sensitivity Sn = 1 while 119873+

minus= 119873

+ meaning thatall the interactive enzyme-drug pairs weremispredicted to bethe noninteractive enzyme-drug pairs we have the sensitivitySn = 0 Likewise when 119873minus

+= 0 meaning none of the

noninteractive enzyme-drug pairs wasmispredicted we havethe specificity Sp = 1 while 119873minus

+= 119873

minus meaning all thenoninteractive enzyme-drug pairs were incorrectly predictedas interactive enzyme-drug pairs we have the specificity Sp =0 When119873+

minus= 119873

minus

+= 0 meaning that none of the interactive

enzyme-drug pairs in the dataset S+ and none of the nonin-teractive enzyme-drug pairs in Sminus was incorrectly predictedwe have the overall accuracy Acc = Λ = 1 while 119873+

minus= 119873

+

and 119873minus+= 119873

minus meaning that all the interactive enzyme-drugpairs in the dataset S+ and all the noninteractive enzyme-drug pairs in Sminus were mispredicted we have the overallaccuracy Acc = Λ = 0 The MCC correlation coefficient isusually used for measuring the quality of binary (two-class)classifications When 119873+

minus= 119873

minus

+= 0 meaning that none

of the interactive enzyme-drug pairs in the dataset S+ andnone of the noninteractive enzyme-drug pairs in Sminus weremispredicted we have MCC = 1 when 119873+

minus= 119873

+2 and

119873

minus

+= 119873

minus2 we have MCC = 0 meaning no better than

random prediction when 119873+minus= 119873

+ and 119873minus+= 119873

minus we haveMCC = minus1 meaning total disagreement between predictionand observation As we can see from the previous discussionit is much more intuitive and easier-to-understand whenusing (34) to examine a predictor for its sensitivity specificityoverall accuracy and Mathewrsquos correlation coefficient It isinstructive to point out that themetrics as defined in (30) and(34) are valid for single label systems for multilabel systemsa set of more complicated metrics should be used as given in[114]

3 Results and Discussion

31 Cross-Validation How to properly examine the predic-tion quality is a key for developing a new predictor and

False positive rate

True

pos

itive

rate

1

01

02

03

04

05

06

07

08

09

0

101 02 03 04 05 06 07 08 090

BaselineiEzy-Drug AUC = 09377

Figure 4 A plot for the ROC curve to quantitatively show theperformance of the iEzy-Drug predictor

estimating its potential application value Generally speak-ing the following three cross-validation methods are oftenused to examine a predictor of its effectiveness in practicalapplication independent dataset test subsampling or 119870-fold(such as 5-fold 7-fold or 10-fold) test and jackknife test[108] However as elaborated by a penetrating analysis in[115] considerable arbitrariness exists in the independentdataset test Also as demonstrated by (27)ndash(29) in [33]the subsampling test (or 119870-fold cross-validation) cannotavoid arbitrariness either Only the jackknife test is the leastarbitrary that can always yield a unique result for a givenbenchmark dataset Therefore the jackknife test has beenwidely recognized and increasingly utilized by investigatorsto examine the quality of various predictors (see eg [66 7174 80]) Accordingly the success rate by the jackknife test wasalso used to optimize the two uncertain parameters 119870 and 120593in (27) The result thus obtained is shown in Figure 3 fromwhich we obtain when 119870 = 6 and 120593 = 15 the iEzy-Drugpredictor reaches its optimized status

The success rates thus obtained by the jackknife test inidentifying interactive Enzyme-drug pairs or noninteractiveenzyme-drug pairs on the benchmark dataset S (cf OnlineSupporting Information S1) are given in Table 1 where forfacilitating comparison the corresponding result by He et al[110] is also given As we can see from the table the overallaccuracy Acc achieved by iEzy-Drug was 9103 remarkablyhigher than 8548 the corresponding rate obtained byHe et al [110] on the same benchmark Furthermore listedin Table 1 are also the values obtained by iEzy-Drug for theother three metrics that is Sn = 9081 Sp = 9114 andMCC = 8039 indicating that the accuracy of iEzy-Drug isnot only very high but also quite stale

To provide a graphical illustration to show the per-formance of the current binary classifier iEzy-Drug asits discrimination threshold is varied a 2D plot calledReceiver Operating Characteristic (ROC) curve [116 117]was also given (Figure 4) In the ROC curve the verticalcoordinate 119884 is for the true positive rate or Sn (cf (34))while horizontal coordinate 119883 for the false positive rate

BioMed Research International 9

Read Me Data Citation

Enter both the protein sequence and the drug code (example) the number of query pairs is limited at 10 or less for each submission

Submit Clear

You can download the program and run it on your own local computer

iEzy-Drug identifying the interaction between enzymes anddrugs in cellular networking

Figure 5 A semiscreenshot to show the top page of the iEzy-Drugweb-server Its web-site address is at httpwwwjci-bioinfocniEzy-Drug

Table 1 The jackknife success rates obtained with iEzy-Drug in identifying interactive enzyme-drug pairs and noninteractive enzyme-drugpairs for the benchmark dataset S (cf Online Supporting Information S1)

Method Acc Sn Sp MCC

iEzy-Druga 74258157 = 9103 24692719 = 9081 49565438 = 9114 8039NN predictorb 8548 NA NA NAaSee (27) where the parameters119870 = 6 and 120593 = 15bSee [110]

or 1-Sp The best possible prediction method would yielda point with the coordinate (0 1) representing 100 truepositive rate (sensitivity Sn) and 0 false positive rate or100 specificity Therefore the (0 1) point is also calleda perfect classification A completely random guess wouldgive a point along a diagonal from the point (0 0) to (1 1)The area under the ROC curve also called Area Under theROC (AUROC) is often used to indicate the performancequality of a binary classifier the value 05 of AUROCis equivalent to random prediction while 1 of AUROCrepresents a perfect one As we can see from Figure 4the AUROC value obtained by iEzy-Drug is 09377

The reason why iEzy-Drug can remarkably improve theprediction quality is that it has introduced the 2D molecularfingerprints to represent drug samples see Online SupportingInformation S3 for the detailed fingerprint expressions for thedrugs listed in Online Supporting Information S1 and thatit has successfully used PseAAC to incorporate the featuresderived from the sequences of enzymes that are essentialfor identifying the interaction of enzymes with drugs in thecellular networking

To enhance the value of its practical applications theweb server for iEzy-Drug has been established that canbe freely accessible at httpwwwjci-bioinfocniEzy-DrugIt is anticipated that the web server will become a usefulhigh throughput tool for both basic research and drug

development in the relevant areas or at the very least playa complementary role to the existing method [39 110 118] forwhich so far noweb-serverwhatsoever has been provided yet

32 The Protocol or User Guide For the convenience of thevast majority of biologists and pharmaceutical scientists herelet us provide a step-by-step guide to show how the userscan easily get the desired result by means of the web serverwithout the need to follow the complicated mathematicalequations presented in this paper for the process of develop-ing the predictor and its integrityStep 1 Open the web server at the site httpwwwjci-bio-infocniEzy-Drug and you will see the top page of thepredictor on your computer screen as shown in Figure 5Click on the ReadMe button to see a brief introduction aboutiEzy-Drug predictor and the caveat when using itStep 2 Either type or copypaste the query pairs into theinput box at the center of Figure 5 Each query pair consistsof two parts one is for the protein sequence and the other forthe drug The enzyme sequence should be in FASTA formatwhile the drug in the KEGG code Examples for the querypairs input can be seen by clicking on the Example buttonright above the input boxStep 3 Click on the Submit button to see the predicted resultFor example if you use the four query pairs in the Example

10 BioMed Research International

window as the input after clicking the Submit button youwill see on your screen that the ldquohsa 10056rdquo enzyme andthe ldquoD0021rdquo drug are an interactive pair and that the ldquohsa100rdquo enzyme and the ldquoD0037rdquo drug are also an interactivepair but that the ldquohsa 3295rdquo enzyme and the ldquoD00889rdquo drugare not an interactive pair and that the ldquohsa 7366rdquo enzymeand the ldquoD03601rdquo drug are not an interactive pair eitherAll these results are fully consistent with the experimentalobservations It takes about 3 minutes before the results areshown on the screenStep 4 Click on the Citation button to find the relevant paperthat documents the detailed development and algorithm ofiEzy-DurgStep 5 Click on the Data button to download the benchmarkdataset used to train and test the iEzy-Durg predictorStep 6 The program code is also available by clicking thebutton download on the lower panel of Figure 5

Acknowledgments

The authors would like to thank the three anonymousreviewers whose constructive comments are very helpful forstrengthening the presentation of the paper This work wassupported by the Grants from the National Natural ScienceFoundation of China (no 31260273) the Jiangxi ProvincialForeign Scientific and Technological Cooperation Project(no 20120BDH80023) Natural Science Foundation of JiangxiProvince China (no 2010GZS0122 20122BAB201020) theDepartment of Education of Jiangxi Province (GJJ12490)the LuoDi plan of the Department of Education of JiangxiProvince (KJLD12083) and the Jiangxi Provincial Founda-tion for Leaders of Disciplines in Science (20113BCB22008)The funders had no role in study design data collection andanalysis decision to publish or preparation of the paper

References

[1] A Bairoch ldquoThe ENZYME database in 2000rdquo Nucleic AcidsResearch vol 28 no 1 pp 304ndash305 2000

[2] S H Koenig and R D Brown ldquoH2CO3as substrate for carbonic

anhydrase in the dehydration of H2CO3(minus)rdquo Proceedings of the

National Academy of Sciences of the United States of Americavol 69 no 9 pp 2422ndash2425 1972

[3] S X Lin and J Lapointe ldquoTheoretical and experimental biologyin onerdquo Journal of Biomedical Science and Engineering vol 6 pp435ndash442 2013

[4] K C Chou and S P Jiang ldquoStudies on the rate of diffusion-controlled reactions of enzymes Spatial factor and force fieldfactorrdquo Scientia Sinica vol 17 no 5 pp 664ndash680 1974

[5] K C Chou ldquoThe kinetics of the combination reaction betweenenzyme and substraterdquo Scientia Sinica vol 19 no 4 pp 505ndash528 1976

[6] K C Chou and G P Zhou ldquoRole of the protein outside activesite on the diffusion-controlled reaction of enzymerdquo Journal ofthe American Chemical Society vol 104 no 5 pp 1409ndash14131982

[7] R A Poorman A G Tomasselli R L Heinrikson and FJ Kezdy ldquoA cumulative specificity model for proteases from

human immunodeficiency virus types 1 and 2 inferred fromstatistical analysis of an extended substrate data baserdquo TheJournal of Biological Chemistry vol 266 no 22 pp 14554ndash145611991

[8] K-C Chou ldquoA vectorized sequence-coupling model for pre-dicting HIV protease cleavage sites in proteinsrdquo The Journal ofBiological Chemistry vol 268 no 23 pp 16938ndash16948 1993

[9] G Z Liang and S Z Li ldquoA new sequence representation asapplied in better specificity elucidation for human immunod-eficiency virus type 1 proteaserdquo Biopolymers vol 88 no 3 pp401ndash412 2007

[10] J J Chou ldquoPredicting cleavability of peptide sequences byHIVprotease via correlation-angle approachrdquo Journal of ProteinChemistry vol 12 no 3 pp 291ndash302 1993

[11] Q S Du S Wang D Q Wei S Sirois and K C ChouldquoMolecular modeling and chemical modification for findingpeptide inhibitor against severe acute respiratory syndromecoronavirus main proteinaserdquo Analytical Biochemistry vol 337no 2 pp 262ndash270 2005

[12] Y Gan H Huang Y Huang et al ldquoSynthesis and activity ofan octapeptide inhibitor designed for SARS coronavirus mainproteinaserdquo Peptides vol 27 no 4 pp 622ndash625 2006

[13] Q S Du H Sun and K C Chou ldquoInhibitor design for SARScoronavirus main protease based on lsquodistorted key theoryrsquordquoMedicinal Chemistry vol 3 no 1 pp 1ndash6 2007

[14] K C Chou ldquoPrediction of human immunodeficiency virusprotease cleavage sites in proteinsrdquoAnalytical Biochemistry vol233 no 1 pp 1ndash14 1996

[15] J Knowles and G Gromo ldquoTarget selection in drug discoveryrdquoNature Reviews Drug Discovery vol 2 no 1 pp 63ndash69 2003

[16] A C Cheng R G Coleman K T Smyth et al ldquoStructure-basedmaximal affinity model predicts small-molecule druggabilityrdquoNature Biotechnology vol 25 no 1 pp 71ndash75 2007

[17] M Rarey B Kramer T Lengauer and G Klebe ldquoA fast flexibledocking method using an incremental construction algorithmrdquoJournal of Molecular Biology vol 261 no 3 pp 470ndash489 1996

[18] K C Chou ldquoReview structural bioinformatics and its impactto biomedical sciencerdquo Current Medicinal Chemistry vol 11 no16 pp 2105ndash2134 2004

[19] K C Chou D Q Wei and W Z Zhong ldquoBinding mechanismof coronavirus main proteinase with ligands and its implicationto drug design against SARSrdquo Biochemical and BiophysicalResearch Communications vol 308 pp 148ndash151 2003 (Erratumin Biochemical and Biophysical Research Communications vol310 p 675 2003)

[20] K C Chou D Q Wei Q S Du S Sirois and W Z ZhongldquoProgress in computational approach to drug developmentagaint SARSrdquo Current Medicinal Chemistry vol 13 no 27 pp3263ndash3270 2006

[21] G P Zhou and F A Troy ldquoNMR studies on how the bindingcomplex of polyisoprenol recognition sequence peptides andpolyisoprenols can modulate membrane structurerdquo CurrentProtein and Peptide Science vol 6 no 5 pp 399ndash411 2005

[22] R B Huang Q S Du C H Wang and K C Chou ldquoAn in-depth analysis of the biological functional studies based on theNMR M2 channel structure of influenza A virusrdquo Biochemicaland Biophysical Research Communications vol 377 no 4 pp1243ndash1247 2008

[23] Q S Du R B Huang C H Wang X M Li and K C ChouldquoEnergetic analysis of the two controversial drug binding sitesof the M2 proton channel in influenza A virusrdquo Journal ofTheoretical Biology vol 259 no 1 pp 159ndash164 2009

BioMed Research International 11

[24] M J Berardi W M Shih S C Harrison and J J Chou ldquoMito-chondrial uncoupling protein 2 structure determined by NMRmolecular fragment searchingrdquo Nature vol 476 no 7358 pp109ndash114 2011

[25] J R Schnell and J J Chou ldquoStructure andmechanism of theM2proton channel of influenza A virusrdquo Nature vol 451 no 7178pp 591ndash595 2008

[26] B OuYang S Xie M J Berardi et al ldquoUnusual architecture ofthe p7 channel from hepatitis C virusrdquoNature vol 498 pp 521ndash525 2013

[27] K Oxenoid and J J Chou ldquoThe structure of phospholambanpentamer reveals a channel-like architecture in membranesrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 102 no 31 pp 10870ndash10875 2005

[28] M E Call KWWucherpfennig and J J Chou ldquoThe structuralbasis for intramembrane assembly of an activating immunore-ceptor complexrdquo Nature Immunology vol 11 no 11 pp 1023ndash1029 2010

[29] R M Pielak J R Schnell and J J Chou ldquoMechanism of druginhibition and drug resistance of influenza A M2 channelrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 106 no 18 pp 7379ndash7384 2009

[30] J Wang R M Pielak M A McClintock and J J ChouldquoSolution structure and functional analysis of the influenza Bproton channelrdquo Nature Structural and Molecular Biology vol16 no 12 pp 1267ndash1271 2009

[31] K C Chou ldquoCoupling interaction between thromboxane A2receptor and alpha-13 subunit of guanine nucleotide-bindingproteinrdquo Journal of Proteome Research vol 4 no 5 pp 1681ndash1686 2005

[32] K C Chou ldquoInsights from modeling three-dimensional struc-tures of the human potassium and sodium channelsrdquo Journal ofProteome Research vol 3 no 4 pp 856ndash861 2004

[33] K C Chou ldquoSome remarks on protein attribute prediction andpseudo amino acid compositionrdquo Journal ofTheoretical Biologyvol 273 no 1 pp 236ndash247 2011

[34] W Z Lin J A Fang X Xiao and K C Chou ldquoiLoc-animal amulti-label learning classifier for predicting subcellular local-ization of animal proteinsrdquo Molecular BioSystems vol 9 pp634ndash644 2013

[35] X Xiao PWangW Z Lin J H Jia and K C Chou ldquoiAMP-2La two-level multi-label classifier for identifying antimicrobialpeptides and their functional typesrdquo Analytical Biochemistryvol 436 pp 168ndash177 2013

[36] W Chen P M Feng H Lin and K C Chou ldquoiRSpot-PseDNCidentify recombination spots with pseudo dinucleotide compo-sitionrdquo Nucleic Acids Research vol 41 article e69 2013

[37] K C Chou Z C Wu and X Xiao ldquoILoc-Hum using theaccumulation-label scale to predict subcellular locations ofhuman proteins with both single and multiple sitesrdquoMolecularBioSystems vol 8 no 2 pp 629ndash641 2012

[38] M Kotera M Hirakawa T Tokimatsu S Goto and MKanehisa ldquoThe KEGG databases and tools facilitating omicsanalysis latest developments involving human diseases andpharmaceuticalsrdquo Methods in Molecular Biology vol 802 pp19ndash39 2012

[39] Y YamanishiM Araki A GutteridgeWHonda andM Kane-hisa ldquoPrediction of drug-target interaction networks from theintegration of chemical and genomic spacesrdquo Bioinformaticsvol 24 no 13 pp i232ndashi240 2008

[40] P Finn S Muggleton D Page and A Srinivasan ldquoPharma-cophore discovery using the Inductive Logic Programmingsystem PROGOLrdquo Machine Learning vol 30 no 2-3 pp 241ndash270 1998

[41] I Vogt D Stumpfe H E Ahmed and J Bajorath ldquoMethodsfor computer-aided chemical biology Part 2 evaluation of com-pound selectivity using 2D molecular fingerprintsrdquo ChemicalBiology and Drug Design vol 70 pp 195ndash205 2007

[42] H Eckert and J Bajorath ldquoMolecular similarity analysis in vir-tual screening foundations limitations and novel approachesrdquoDrug Discovery Today vol 12 no 5-6 pp 225ndash233 2007

[43] S Laurent L V Elst and R N Muller ldquoComparative studyof the physicochemical properties of six clinical low molecularweight gadolinium contrast agentsrdquo Contrast Media amp Molecu-lar Imaging vol 1 no 3 pp 128ndash137 2006

[44] E Gregori-Puigjane R Garriga-Sust and J Mestres ldquoIndexingmolecules with chemical graph identifiersrdquo Journal of Compu-tational Chemistry vol 32 no 12 pp 2638ndash2646 2011

[45] B Ren ldquoApplication of novel atom-type AI topological indicesto QSPR studies of alkanesrdquo Computers and Chemistry vol 26no 4 pp 357ndash369 2002

[46] N M OrsquoBoyle M Banck C A James C Morley T Vander-meersch and G R Hutchison ldquoOpen Babel an open chemicaltoolboxrdquo Journal of Cheminformatics vol 3 p 33 2011

[47] V J Gillet P Willett and J Bradshaw ldquoSimilarity searchingusing reduced graphsrdquo Journal of Chemical Information andComputer Sciences vol 43 no 2 pp 338ndash345 2003

[48] D Butina ldquoUnsupervised data base clustering based on day-lightrsquos fingerprint and Tanimoto similarity a fast and automatedway to cluster small and large data setsrdquo Journal of ChemicalInformation and Computer Sciences vol 39 no 4 pp 747ndash7501999

[49] C E Shannon ldquoA mathematical theory of communicationrdquoACM SIGMOBILE Mobile Computing and CommunicationsReview vol 5 pp 3ndash55 2001

[50] V D Gusev L A Nemytikova and N A Chuzhanova ldquoOn thecomplexity measures of genetic sequencesrdquo Bioinformatics vol15 no 12 pp 994ndash999 1999

[51] K C Chou and H B Shen ldquoReview recent progress in proteinsubcellular location predictionrdquo Analytical Biochemistry vol370 no 1 pp 1ndash16 2007

[52] S F Altschul ldquoEvaluating the statistical significance of multipledistinct local alignmentsrdquo in Theoretical and ComputationalMethods in Genome Research S Suhai Ed pp 1ndash14 PlenumNew York NY USA 1997

[53] J C Wootton and S Federhen ldquoStatistics of local complexity inamino acid sequences and sequence databasesrdquo Computers andChemistry vol 17 no 2 pp 149ndash163 1993

[54] H Nakashima K Nishikawa and T Ooi ldquoThe folding type ofa protein is relevant to the amino acid compositionrdquo Journal ofBiochemistry vol 99 no 1 pp 153ndash162 1986

[55] K C Chou and C T Zhang ldquoPredicting protein folding typesby distance functions that make allowances for amino acidinteractionsrdquo The Journal of Biological Chemistry vol 269 no35 pp 22014ndash22020 1994

[56] K-C Chou ldquoA novel approach to predicting protein structuralclasses in a (20-1)-D amino acid composition spacerdquo Proteinsvol 21 no 4 pp 319ndash344 1995

[57] H Nakashima and K Nishikawa ldquoDiscrimination of intracel-lular and extracellular proteins using amino acid compositionand residue-pair frequenciesrdquo Journal of Molecular Biology vol238 no 1 pp 54ndash61 1994

12 BioMed Research International

[58] G P Zhou ldquoAn intriguing controversy over protein structuralclass predictionrdquo Journal of Protein Chemistry vol 17 no 8 pp729ndash738 1998

[59] I Bahar A R Atilgan R L Jernigan and B Erman ldquoUnder-standing the recognition of protein structural classes by aminoacid compositionrdquo Proteins vol 29 pp 172ndash185 1997

[60] J Cedano P Aloy J A Perez-Pons and E Querol ldquoRelationbetween amino acid composition and cellular location ofproteinsrdquo Journal of Molecular Biology vol 266 no 3 pp 594ndash600 1997

[61] G P Zhou and K Doctor ldquoSubcellular location prediction ofapoptosis proteinsrdquo Proteins vol 50 no 1 pp 44ndash48 2003

[62] KChou ldquoPrediction of protein cellular attributes using pseudo-amino acid compositionrdquo Proteins vol 43 pp 246ndash255 2001(Erratum in Proteins vol 44 p 60 2001)

[63] K C Chou ldquoUsing amphiphilic pseudo amino acid composi-tion to predict enzyme subfamily classesrdquo Bioinformatics vol21 no 1 pp 10ndash19 2005

[64] D Zou Z He J He and Y Xia ldquoSupersecondary structureprediction using Choursquos pseudo amino acid compositionrdquoJournal of Computational Chemistry vol 32 no 2 pp 271ndash2782011

[65] M M Beigi M Behjati and H Mohabatkar ldquoPrediction ofmetalloproteinase family based on the concept ofChoursquos pseudoamino acid composition using a machine learning approachrdquoJournal of Structural and Functional Genomics vol 12 no 4 pp191ndash197 2011

[66] Y K Chen and K B Li ldquoPredicting membrane protein typesby incorporating protein topology domains signal peptidesand physicochemical properties into the general form of Choursquospseudo amino acid compositionrdquo Journal ofTheoretical Biologyvol 318 pp 1ndash12 2013

[67] C Huang and J Q Yuan ldquoA multilabel model based on Choursquospseudo-amino acid composition for identifying membraneproteins with both single and multiple functional typesrdquo TheJournal of Membrane Biology vol 246 pp 327ndash334 2013

[68] S S Sahu and G Panda ldquoA novel feature representationmethod based on Choursquos pseudo amino acid composition forprotein structural class predictionrdquo Computational Biology andChemistry vol 34 no 5-6 pp 320ndash327 2010

[69] M Hayat and A Khan ldquoDiscriminating outer membraneproteins with fuzzy K-nearest neighbor algorithms based on thegeneral form of Choursquos PseAACrdquo Protein and Peptide Lettersvol 19 no 4 pp 411ndash421 2012

[70] MKhosravian F K FaramarziMM BeigiM Behbahani andH Mohabatkar ldquoPredicting antibacterial peptides by the con-cept of Choursquos pseudo-amino acid composition and machinelearning methodsrdquo Protein amp Peptide Letters vol 20 pp 180ndash186 2013

[71] HMohabatkarMM Beigi K Abdolahi and SMohsenzadehldquoPrediction of allergenic proteins by means of the concept ofChoursquos pseudo amino acid composition and amachine learningapproachrdquoMedicinal Chemistry vol 9 pp 133ndash137 2013

[72] L Nanni A Lumini D Gupta and A Garg ldquoIdentifyingbacterial virulent proteins by fusing a set of classifiers basedon variants of Choursquos pseudo amino acid composition and onevolutionary informationrdquo IEEEACM Transactions on Compu-tational Biology and Bioinformatics vol 9 no 2 pp 467ndash4752012

[73] T H Chang L C Wu T Y Lee S P Chen H D Huang andJ T Horng ldquoEuLoc a web-server for accurately predict protein

subcellular localization in eukaryotes by incorporating variousfeatures of sequence segments into the general form of ChoursquosPseAACrdquo Journal of Computer-Aided Molecular Design vol 27pp 91ndash103 2013

[74] S Zhang Y Zhang H Yang C Zhao and Q Pan ldquoUsing theconcept of Choursquos pseudo amino acid composition to predictprotein subcellular localization an approach by incorporatingevolutionary information and von Neumann entropiesrdquo AminoAcids vol 34 no 4 pp 565ndash572 2008

[75] R Zia Ur and A Khan ldquoIdentifying GPCRs and their typeswith Choursquos pseudo amino acid composition an approach frommulti-scale energy representation and position specific scoringmatrixrdquo Protein amp Peptide Letters vol 19 pp 890ndash903 2012

[76] X Y Sun S P Shi J D Qiu S B Suo S Y Huang and R PLiang ldquoIdentifying protein quaternary structural attributes byincorporating physicochemical properties into the general formof Choursquos PseAAC via discrete wavelet transformrdquo MolecularBioSystems vol 8 pp 3178ndash3184 2012

[77] L Nanni and A Lumini ldquoGenetic programming for creatingChoursquos pseudo amino acid based features for submitochondrialocalizationrdquo Amino Acids vol 34 no 4 pp 653ndash660 2008

[78] M Esmaeili H Mohabatkar and S Mohsenzadeh ldquoUsing theconcept of Choursquos pseudo amino acid composition for risk typeprediction of human papillomavirusesrdquo Journal of TheoreticalBiology vol 263 no 2 pp 203ndash209 2010

[79] H Mohabatkar ldquoPrediction of cyclin proteins using Choursquospseudo amino acid compositionrdquo Protein and Peptide Lettersvol 17 no 10 pp 1207ndash1214 2010

[80] H Mohabatkar M M Beigi and A Esmaeili ldquoPredictionof GABA(A) receptor proteins using the concept of Choursquospseudo-amino acid composition and support vector machinerdquoJournal of Theoretical Biology vol 281 no 1 pp 18ndash23 2011

[81] Y Xu J Ding L Y Wu and K C Chou ldquoiSNO-PseAAC pre-dict cysteine S-nitrosylation sites in proteins by incorporatingposition specific amino acid propensity into pseudo amino acidcompositionrdquo PLoS ONE vol 8 Article ID e55844 2013

[82] W Chen H Lin PM Feng C Ding Y C Zuo andK C ChouldquoiNuc-PhysChem a sequence-based predictor for identifyingnucleosomes via physicochemical propertiesrdquo PLoS ONE vol7 Article ID e47843 2012

[83] B Li T Huang L Liu Y Cai and K C Chou ldquoIdentificationof colorectal cancer related genes with mrmr and shortest pathin protein-protein interaction networkrdquo PLoS ONE vol 7 no 4Article ID e33393 2012

[84] Y Jiang T Huang C Lei Y F Gao Y D Cai and K C ChouldquoSignal propagation in protein interaction network duringcolorectal cancer progressionrdquo BioMed Research Internationalvol 2013 Article ID 287019 9 pages 2013

[85] P Du X Wang C Xu and Y Gao ldquoPseAAC-Builder across-platform stand-alone program for generating variousspecial Choursquos pseudo-amino acid compositionsrdquo AnalyticalBiochemistry vol 425 no 2 pp 117ndash119 2012

[86] D S Cao Q S Xu and Y Z Liang ldquoPropy a tool to generatevarious modes of Choursquos PseAACrdquo Bioinformatics vol 29 pp960ndash962 2013

[87] H Shen and K C Chou ldquoPseAAC a flexible web serverfor generating various kinds of protein pseudo amino acidcompositionrdquo Analytical Biochemistry vol 373 no 2 pp 386ndash388 2008

[88] K C Chou ldquoUsing pair-coupled amino acid composition topredict protein secondary structure contentrdquo Journal of ProteinChemistry vol 18 no 4 pp 473ndash480 1999

BioMed Research International 13

[89] W Liu and K C Chou ldquoPrediction of protein secondarystructure contentrdquo Protein Engineering vol 12 no 12 pp 1041ndash1050 1999

[90] M Bhasin and G P S Raghava ldquoClassification of nuclearreceptors based on amino acid composition and dipeptidecompositionrdquo The Journal of Biological Chemistry vol 279 no22 pp 23262ndash23266 2004

[91] H Lin andH Ding ldquoPredicting ion channels and their types bythe dipeptidemode of pseudo amino acid compositionrdquo Journalof Theoretical Biology vol 269 no 1 pp 64ndash69 2011

[92] H Lin and Q Li ldquoUsing pseudo amino acid composition topredict protein structural class approached by incorporating400 dipeptide componentsrdquo Journal of Computational Chem-istry vol 28 no 9 pp 1463ndash1466 2007

[93] L Nanni and A Lumini ldquoCombing ontologies and dipeptidecomposition for predicting DNA-binding proteinsrdquo AminoAcids vol 34 no 4 pp 635ndash641 2008

[94] K-C Chou ldquoThe convergence-divergence duality in lectindomains of selectin family and its implicationsrdquo FEBS Lettersvol 363 no 1-2 pp 123ndash126 1995

[95] A A Schaffer L Aravind T L Madden et al ldquoImprovingthe accuracy of PSI-BLAST protein database searches withcomposition-based statistics and other refinementsrdquo NucleicAcids Research vol 29 no 14 pp 2994ndash3005 2001

[96] S F Altschul and E V Koonin ldquoIterated profile searches withPSI-BLAST a tool for discovery in protein databasesrdquo Trends inBiochemical Sciences vol 23 no 11 pp 444ndash447 1998

[97] J Deng ldquoGrey entropy and grey target decision makingrdquoJournal of Grey System vol 22 no 1 pp 1ndash24 2010

[98] B M Bolstad R A Irizarry M Astrand and T P Speed ldquoAcomparison of normalizationmethods for high density oligonu-cleotide array data based on variance and biasrdquo Bioinformaticsvol 19 no 2 pp 185ndash193 2003

[99] J-Y Shi S-W Zhang Q Pan Y-M Cheng and J XieldquoPrediction of protein subcellular localization by support vectormachines using multi-scale energy and pseudo amino acidcompositionrdquo Amino Acids vol 33 no 1 pp 69ndash74 2007

[100] T Denoeux ldquoA 120581-nearest neighbor classification rule based onDempster-Shafer theoryrdquo IEEE Transactions on Systems Manand Cybernetics vol 25 no 5 pp 804ndash813 1995

[101] J M Keller M R Gray and J A Givens ldquoA fuzzy k-nearestneighbours algorithmrdquo IEEE Transactions on Systems Man andCybernetics vol 15 no 4 pp 580ndash585 1985

[102] X Xiao P Wang and K Chou ldquoiNR-physchem a sequence-based predictor for identifying nuclear receptors and theirsubfamilies via physical-chemical property matrixrdquo PLoS ONEvol 7 no 2 Article ID e30869 2012

[103] I Roterman L Konieczny W Jurkowski K Prymula and MBanach ldquoTwo-intermediate model to characterize the structureof fast-folding proteinsrdquo Journal of Theoretical Biology vol 283no 1 pp 60ndash70 2011

[104] X Xiao P Wang and K C Chou ldquoGPCR-2L predicting Gprotein-coupled receptors and their types by hybridizing twodifferentmodes of pseudo amino acid compositionsrdquoMolecularBioSystems vol 7 no 3 pp 911ndash919 2011

[105] X Xiao P Wang and K C Chou ldquoQuat-2L a web-server forpredicting protein quaternary structural attributesrdquo MolecularDiversity vol 15 no 1 pp 149ndash155 2011

[106] X Zheng C Li and J Wang ldquoAn information-theoreticapproach to the prediction of protein structural classrdquo Journalof Computational Chemistry vol 31 no 6 pp 1201ndash1206 2010

[107] H Shen J Yang X Liu and K C Chou ldquoUsing supervisedfuzzy clustering to predict protein structural classesrdquo Biochem-ical and Biophysical Research Communications vol 334 no 2pp 577ndash581 2005

[108] K-C Chou and C-T Zhang ldquoReview prediction of proteinstructural classesrdquo Critical Reviews in Biochemistry and Molec-ular Biology vol 30 no 4 pp 275ndash349 1995

[109] P C Mahalanobis ldquoOn the generalized distance in statisticsrdquoProceedings of the National Institute of Sciences of India vol 2pp 49ndash55 1936

[110] R M Centor ldquoSignal detectability the use of ROC curves andtheir analysesrdquoMedical Decision Making vol 11 no 2 pp 102ndash106 1991

[111] K-C Chou ldquoUsing subsite coupling to predict signal peptidesrdquoProtein Engineering vol 14 no 2 pp 75ndash79 2001

[112] K C Chou ldquoPrediction of protein signal sequences and theircleavage sitesrdquo Proteins vol 42 pp 136ndash139 2001

[113] K C Chou ldquoPrediction of signal peptides using scaledwindowrdquoPeptides vol 22 no 12 pp 1973ndash1979 2001

[114] K C Chou ldquoSome remarks on predicting multi-label attributesinmolecular biosystemsrdquoMolecular Biosystems vol 9 pp 1092ndash1100 2013

[115] K C Chou and H B Shen ldquoCell-PLoc 2 0 an improvedpackage of web-servers for predicting subcellular localization ofproteins in various organismsrdquoNatural Science vol 2 pp 1090ndash1103 2010

[116] M Gribskov and N L Robinson ldquoUse of receiver operatingcharacteristic (ROC) analysis to evaluate sequence matchingrdquoComputers and Chemistry vol 20 no 1 pp 25ndash33 1996

[117] Y Yamanishi M Kotera M Kanehisa and S Goto ldquoDrug-target interaction prediction from chemical genomic and phar-macological data in an integrated frameworkrdquo Bioinformaticsvol 26 no 12 pp i246ndashi254 2010

[118] Z He J Zhang X Shi et al ldquoPredicting drug-target interactionnetworks based on functional groups and biological featuresrdquoPLoS ONE vol 5 no 3 Article ID e9603 2010

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

BioMed Research International 3

enzyme (ii) recoupling each of the single drugs with eachof the single enzymes into pairs in a way that none of themoccurred inS+ (iii) randomly picking the pairs thus formeduntil they reached the number two times as many as thepairs in S+ The 2719 interactive enzyme-drug pairs and5438 noninteractive enzyme-drug pairs are given in OnlineSupporting Information S1 (see Supplementary Materialavailable online at httpdxdoiorg1011552013701317) Allthe detailed information for the compounds or drugs listedthere can be found in the KEGG database via their codes

22 Sample Representation Since each of the samples in thecurrent network system contains an enzyme (protein) anda drug a combination of the following two approaches wasadopted to represent the enzyme-drug pair samples

221 Drug

(a) 2D Molecular Fingerprints Although the number ofdrugs is extremely large most of them are small organicmolecules and are composed of some fixed small structures[40] The identification of small molecules or structures canbe used to detect the drug-target interactions [41] Molec-ular fingerprints are bit-string representations of molecularstructure and properties [42] It should be pointed outthat there are many types of structural representations thathave been suggested for the description of drug moleculesincluding physicochemical properties [43] chemical graphs[44] topological indices [45] 3D pharmacophore patternsandmolecular fields In the current study let us use the simpleand generally adopted 2Dmolecular fingerprints to representdrug molecules as described below

First for each of the drugs concerned we can obtaina MOL file from the KEGG database [38] via its codethat contains the detailed information of chemical structureSecond we can convert the MOL file format into its 2Dmolecular fingerprint file format by using a chemical toolboxsoftware called OpenBabel [46] which can be downloadedfrom the website at httpopenbabelorg The current ver-sion of OpenBabel can generate four types of fingerprintsFP2 FP3 FP4 and MACCS In the current study we usedthe FP2 fingerprint format It is a path-based fingerprint thatidentifies small molecule fragments based on all linear andring substructures and maps them onto a bit-string using ahash function (somewhat similar to the daylight fingerprints[47 48]) It is a length of 256-bit hexadecimal string obtainedfrom theOpenBabel andwe can convert it to a 256-bit vectorThen a molecular fingerprint can be formulated as a 256-Dvector given by

MF = [1198601 sdot sdot sdot 119860119895 sdot sdot sdot 119860256]T (2)

where 119860119895(119895 = 1 2 256) is an integer between 0 and 15

and T is the matrix transpose operatorIn order to capture as much useful information from

a molecular fingerprint as possible we can also convertthe above 256-bit hexadecimal string into a 1024-bit binaryvector which is a digital sequence only including 0 and 1

and consider two different digital signal characteristics for thedigital sequence as follows

(b) Information Entropy Shannon proposed that any infor-mation is redundant and redundant size is related withthe occurrence probability or uncertainty of each symbolsuch as numbers letters or words among the informationThe information entropy for a system with a probabilitydistribution 119875(119909

119894) for two classes information entropy [49] is

defined as

119867119909= minussum

119909

119875 (119909119894) log2119875 (119909119894) (119894 = 0 1) (3)

where 119875(119909119894) represents the occurrence probability of num-

ber 119894 in the aforementioned 1024-bit binary vector andthe information entropy 119867

119909is a measure value of the

information amount For example for the digital sequence100100011010010 the value of the information entropy 119867

119909

thus obtained is

119875 (1199090) =

9

15

= 06

119875 (1199091) =

6

15

= 04

119867119909= minus (06 times log

206 + 04 times log

204) = 0971

(4)

(c) Complexity Factor The Lempel-Ziv (LZ) complexity [50]reflects the order that is retained in the sequence and hencewas adopted in this study For each step only two operationswere allowed in the process to get the LZ complexity eithercopying the longest section from the part of a nonemptysequence or generating an additional symbol mark thatensures the uniqueness of per component 119878(119894

119896minus1rarr 119894119896) Its

substring is expressed by

119878 (119894 997888rarr 119895) = 119898119894119898119894+1119898119894+2sdot sdot sdot 119898119895

(1 le 119894 le 119895 le 119871) (5)

where 1198981represents the 1st digital value 119898

2the 2nd value

and so forth A nonempty digital sequence is synthesizedaccording to the following formula

Syn (119878) = 119878 (1 997888rarr 1198941) ∙ 119878 (119894

1+ 1 997888rarr 119894

2)

∙ sdot sdot sdot ∙ 119878 (119894119898minus1

+ 1 997888rarr 119894119871)

(6)

Suppose that 119878 = 11989811198982119898311989841198985sdot sdot sdot 119898119871has been recon-

structed by the subsymbol 119898119903which is viewed as the newly

inserted symbol The substring up to 119898119903will be denoted by

119878(1 rarr 119903)∙ where the bold dot ∙ indicates that 119898119903is a newly

inserted symbol for checkingwhether the rest of the substring119878(119903+1 rarr 119871) can be reconstructed by a simple process At firstsuppose 119878(119902) = 119898

119903+ 1 and see whether 119878(119902) is the substring

for the subsequence 119878(1 rarr 119903) which means deleting thelast symbol from the substring 119878(1 rarr 119903)119878(119902) If the answeris ldquonordquo we insert 119878(119902) into the sequence followed by a dot ∙Thus it could not be obtained by the same operation If theanswer is ldquoyesrdquo no new symbol is needed and we can go on toproceed with 119878(119902) = 119898

119903+1119898119903+2

and repeat the same previousprocedure The LZ complexity is the number of dots (plus

4 BioMed Research International

one if the string is not terminated by a dot) For example forthe sequence 100100011010010 syn(119875) and the correspondingcomplexity factor CF are described as

Syn (119878) = 1 ∙ 0 ∙ 01 ∙ 000 ∙ 11 ∙ 01001 ∙ 0

CF = 7(7)

Thus by adding the information entropy 119867119909(4) and com-

plexity factor CF (7) into the molecular fingerprint MF (2)we obtained a total of (256 + 1 + 1) = 258 feature elements torepresent a drug compound that is it can now be formulatedas a 258-D vector given by

119863 = [11986011198602sdot sdot sdot 119860

256119867119909

CF]T (8)

where 119860119894has the same meaning as in (2) while 119867

119909and CF

are the information entropy and complexity factor respec-tively as described in the previous two sections

222 Enzyme The sequences of the enzymes involved inthis study are given in Online Supporting Information S2Now the problem is how to effectively represent these enzymesequences for the current study Generally speaking thereare two kinds of approaches to formulate enzyme sequencesthe sequential model and the nonsequential or discretemodel [51] The most typical sequential representation foran enzyme sample E with 119871 residues is its entire amino acidsequence that is

E = 1198771119877211987731198774119877511987761198777sdot sdot sdot 119877119871 (9)

where 1198771represents the 1st residue 119877

2the 2nd residue and

so forth An enzyme sample thus formulated can contain itsmost complete information This is an obvious advantage ofthe sequential representation To get the desired results thesequence-similarity-search-based tools such as BLAST [5253] are usually utilized to conduct the prediction Howeverthis kind of approach failed to work when the query enzymedid not have significant homology to enzyme of known char-acters Thus various nonsequential representation modelswere proposed The simplest nonsequential model for anenzyme was based on its amino acid composition (AAC) asdefined by

E = [11989111198912sdot sdot sdot 11989120]

T (10)

where 119891119906(119906 = 1 2 20) are the normalized occurrence

frequencies of the 20 native amino acids [54ndash56] in theenzyme E and T has the same meaning as in (2) and (8)The AAC-discrete model was widely used for identifyingvarious attributes of proteins (see eg [57ndash61]) Howeveras can be seen from (10) all the sequence order effectswere lost by using the AAC-discrete model This is its mainshortcoming To avoid completely losing the sequence-orderinformation the pseudo amino acid composition [62 63]or Choursquos PseAAC [3] was proposed to replace the simpleAAC model Since the concept of PseAAC was proposedin 2001 [62] it has penetrated into almost all the fields of

protein attribute predictions and computational proteomicssuch as predicting supersecondary structure [64] predictingmetalloproteinase family [65] predicting membrane pro-tein types [66 67] predicting protein structural class [68]discriminating outer membrane proteins [69] identifyingantibacterial peptides [70] identifying allergenic proteins[71] identifying bacterial virulent proteins [72] predictingprotein subcellular location [73 74] identifying GPCRs andtheir types [75] identifying protein quaternary structuralattributes [76] predicting protein submitochondria locations[77] identifying risk type of human papillomaviruses [78]identifying cyclin proteins [79] predicting GABA(A) recep-tor proteins [80] and predicting cysteine S-nitrosylation sitesin proteins [81] among many others (see a long list of paperscited in the References section of [33]) Recently the conceptof PseAAC was further extended to represent the featurevectors of DNA and nucleotides [36 82] as well as otherbiological samples (see eg [83 84]) Because it has beenwidely and increasingly used recently two powerful soft-wares called ldquoPseAAC-Builderrdquo [85] and ldquopropyrdquo [86] wereestablished for generating various special Choursquos pseudo-amino acid compositions in addition to the web-serverPseAAC [87] built in 2008 According to a recent review [33]the general formof Choursquos PseAAC for an enzyme sample canbe formulated by

E = [12059511205952sdot sdot sdot 120595119906sdot sdot sdot 120595Ω]

T (11)

where the subscript Ω is an integer and its value as wellas the components 120595

119906(119906 = 1 2 Ω) will depend on

how to extract the desired information from the amino acidsequence of E (cf (10)) Next let us describe how to extractuseful information from the benchmark datasetS andOnlineSupporting Information S2 to define the enzyme samplesconcerned via (11)

To incorporate as much useful information as possiblefrom an enzyme sample we are to approach this problemfrom three different angles followed by incorporating thefeature elements thus obtained into the general form ofPseAAC of (11)(a) Amino Acid Composition The components of amino acidcomposition have beenwidely used to predict various proteinattributes [57ndash61] In this study they were also included as thefirst 20 elements in the general Choursquos PseAAC (cf (11)) thatis

120595119906= 119891119906

(119906 = 1 2 20) (12)

where 119891119906has the same meaning as in (10)

(b) Dipeptide Composition Dipeptide composition has beenused to predict the protein secondary structural contents[88 89] as well as various protein attributes (see eg [90ndash93]) The number of different dipeptides is 20 times 20 = 400Suppose that the normalized occurrence frequencies of the400 dipeptides in an enzyme sample are given by

119891

(2)

119906(119906 = 1 2 400) (13)

BioMed Research International 5

Incorporating the above 400 dipeptide components into (11)we have

120595119906+20

= 119891

(2)

119906(119906 = 1 2 400) (14)

(c) Sequential Evolution Information Biology is a naturalscience with a historic dimension All biological specieshave developed starting out from a very limited number ofancestral species Their evolution involves changes of singleresidues insertions and deletions of several residues [94]gene doubling and gene fusion With these changes accu-mulated for a long period of time many similarities betweeninitial and resultant amino acid sequences are graduallyeliminated but the corresponding proteins may still sharemany common attributes [18] such as having basically thesame biological function and residing at a same subcellularlocation To extract the sequential evolution information anduse it to define the components of (11) the PSSM (PositionSpecific Scoring Matrix) was used as described next

According to Schaffer et al [95] the sequence evolutioninformation of enzyme E with 119871 amino acid residues can beexpressed by an 119871 times 20matrix as given by

P(0)PSSM =

[

[

[

[

[

[

[

119864

0

1rarr1119864

0

1rarr2sdot sdot sdot 119864

0

1rarr20

119864

0

2rarr1119864

0

2rarr2sdot sdot sdot 119864

0

2rarr20

119864

0

119871rarr1119864

0

119871rarr2sdot sdot sdot 119864

0

119871rarr20

]

]

]

]

]

]

]

(15)

where 1198640119894rarr 119895

represents the original score of the 119894th aminoacid residue (119894 = 1 2 119871) in the enzyme sequence changedto amino acid type 119895 (119895 = 1 2 20) in the process ofevolution Here the numerical codes 1 2 20 are used torepresent the 20 native amino acid types denoted by A CD E F G H I K L M N P Q R S T V W and Y The119871 times 20 scores in (15) were generated by using PSI-BLAST[96] to search the UniProtKBSwiss-Prot database (Release2013-05) through three iterations with 0001 as the 119864 valuecutoff for multiple sequence alignment against the sequenceof the enzyme E In order to make every element in (15) bescaled from their original score ranges into region of [0 1]we performed a conversion through the standard sigmoidfunction to make it become

P(1)PSSM =

[

[

[

[

[

[

[

119864

1

1rarr1119864

1

1rarr2sdot sdot sdot 119864

1

1rarr20

119864

1

2rarr1119864

1

2rarr2sdot sdot sdot 119864

1

2rarr20

119864

1

119871rarr1119864

1

119871rarr2sdot sdot sdot 119864

1

119871rarr20

]

]

]

]

]

]

]

(16)

where

119864

1

119894rarr 119895=

1

1 + 119890

minus1198640

119894rarr 119895

(1 le 119894 le 119871 1 le 119895 le 20) (17)

Nowwe extract the useful information from (16) to define thecomponents of (11) via the following approach

120595119906+420

= ℓ119906

(119906 = 1 2 20) (18)

where

ℓ119895=

1

119871

times

119871

sum

119896=1

119864

1

119896rarr119895(119895 = 1 2 20) (19)

(d) Grey System Model Approach The grey system theory[97] is quite useful in dealing with complicated systems thatlack sufficient information or need to process uncertaininformation According to the grey system theory we canextract the following information from the 119895th column of(16) that is

[

119886

119895

1

119886

119895

2

] = (BT119895B119895)

minus1

BT119895U119895

(119895 = 1 2 20) (20)

where

B119895=

[

[

[

[

[

[

[

[

[

[

minus119864

1

2rarr119895minus119864

1

1rarr119895minus 05119864

1

2rarr1198951

minus119864

1

3rarr119895minus

2

sum

119894=1

119864

1

119894rarr 119895minus 05119864

1

3rarr1198951

minus119864

1

119871rarr119895minus

119871minus1

sum

119894=1

119864

1

119894rarr 119895minus 05119864

1

119871rarr1198951

]

]

]

]

]

]

]

]

]

]

U119895=

[

[

[

[

[

[

119864

1

2rarr119895minus 119864

1

1rarr119895

119864

1

3rarr119895minus 119864

1

2rarr119895

119864

1

119871rarr119895minus 119864

1

119871minus1rarr119895

]

]

]

]

]

]

(21)

Therefore based on the grey system theory and (20) we canextract another 20 times 2 = 40 quantities from (16) to definethe components of (11) that is

120593119895=

1199081119886

119895

1when 119895 is an odd number

1199082119886

119895

2when 119895 is an even number

1 le 119895 le 20

(22)

where 1198861198951and 119886119895

2are given by (20) 119908

1and 119908

2are weight

factors which were all set to 1 in the current studySubstituting the elements in (12) (14) (18) and (22) we

finally obtain a total of Ω = 20 + 400 + 20 + 40 = 480

components for the PseAAC of (11) where

120595119906=

119891119906

when 1 le 119906 le 20119891

(2)

119906when 21 le 119906 le 420

ℓ119906

when 421 le 119906 le 440120593119906

when 440 le 119906 le 480

(23)

In other words in this study (11) or Choursquos PseAAC is a 480-D vector whose 480 components are given by (23) derivedfrom the amino acid composition dipeptide compositionsequential evolution information and grey system theory(e) Representing Enzyme-Drug Pairs Now the pair betweenan enzyme molecule E and a drug compound D can beformulated by combing (8) and (11) as given by

G = D oplus E

= [1198601sdot sdot sdot 119860

256119867119909

CF 1205951sdot sdot sdot 120595480]

T

(24)

6 BioMed Research International

where G represents the enzyme-drug pair oplus the orthogonalsum [51] and each of the (258 + 480) = 738 feature elementsis given in (8) and (23)

For the convenience of the later formulation let us use119909119894(119894 = 1 2 738) to represent the 738 components of (24)

that is

G = [11990911199092sdot sdot sdot 119909119894sdot sdot sdot 119909738]

T (25)

To optimize the prediction results different weights wereusually tested for each of the elements in (25) However sinceit would consume a lot of computational time for a total of 738weight factors here let us adopt the normalization approachto deal with this problem as done in [98 99] that is convert119909119894in (25) to 119910

119894according to the following equation

119910119894=

2tanminus1 (119909119894)

120587

(119894 = 1 2 738) (26)

where tanminus1 means arctangent By means of (26) everycomponent in (25) will be converted into the range of[minus1 1] that is we have minus1 le 119910

119894le 1 As demonstrated

in [98 99] the normalization approach via (26) was quiteeffective in enhancing the quality of prediction operated ina high dimension space Therefore in this study we wouldnot to take the procedure of optimizing the weight factorssignificantly reducing the computational times

23 Fuzzy 119870-Nearest Neighbour Algorithm The 119870-NN (119870-Nearest Neighbor) classifier is quite popular in patternrecognition community owing to its good performance andsimple-to-use feature According to the 119870-NN rule [100]named also as the ldquovoting 119870-NN rulerdquo the query sampleshould be assigned to the subset represented by a majorityof its 119870 nearest neighbors as illustrated in Figure 5 of [33]

Fuzzy 119870-NN classification method [101] is a specialvariation of the119870-NNclassification family Instead of roughlyassigning the label based on a voting from the 119870 nearestneighbors it attempts to estimate the membership valuesthat indicate how much degree the query sample belongsto the classes concerned Obviously it is impossible for anycharacteristic description to contain complete informationwhich would make the classification ambiguous In view ofthis the fuzzy principle is very reasonable and particularlyuseful in dealing with complicated biological systems such asidentifying nuclear receptor subfamilies [102] characterizingthe structure of fast-folding proteins [103] classifying Gprotein-coupled receptors [104] predicting protein quater-nary structural attributes [105] predicting protein structuralclasses [106 107] and so forth

Next let us give a brief introduction how to use thefuzzy119870-NN approach to identify the interactions between theenzymes and the drug compounds in the network concerned

Supposing that S(119873) = G1G2 G

119873 is a set of vec-

tors representing 119873 enzyme-drug pairs in a training set

classified into two classes 119862+ 119862minus where 119862+ denotes theinteractive pair class while 119862minus the noninteractive pair classSlowast(G) = Glowast

1Glowast2 Glowast

119870 sub S(119873) is the subset of the 119870

nearest neighbor pairs to the query pair G Thus the fuzzymembership value for the query pair G in the two classes ofS(119873) is given by

120583

+

(G) =sum

119870

119895=1120583

+(Glowast119895) 119889(GGlowast

119895)

minus2(120593minus1)

sum

119870

119895=1119889(GGlowast

119895)

minus2(120593minus1)

120583

minus

(G) =sum

119870

119895=1120583

minus(Glowast119895) 119889(GGlowast

119895)

minus2(120593minus1)

sum

119870

119895=1119889(GGlowast

119895)

minus2(120593minus1)

(27)

where 119870 is the number of the nearest neighbors counted forthe query pairG 120583+(Glowast

119895) and 120583minus(Glowast

119895) the fuzzy membership

values of the training sample Glowast119895to the class 119862+ and 119862minus

respectively as will be further defined next 119889(GGlowast119895) the

cosine distance between G and its 119895th nearest pair Glowast119895in

the training dataset S(119873) 120593(gt 1) the fuzzy coefficient fordetermining how heavily the distance is weighted whencalculating each nearest neighborrsquos contribution to the mem-bership value Note that the parameters 119870 and 120593 will affectthe computation result of (27) and they will be optimizedby a grid-search as will be described later Also variousother metrics can be chosen for 119889(GGlowast

119895) such as Euclidean

distance Hamming distance [108] andMahalanobis distance[55 109]

The quantitative definitions for the aforementioned120583

+(Glowast119895) and 120583minus(Glowast

119895) in (27) are given by

120583

+

(Glowast119895) =

1 if Glowast119895isin 119862

+

0 otherwise

120583

minus

(Glowast119895) =

1 if Glowast119895isin 119862

minus

0 otherwise

(28)

Substituting the results obtained by (27) into (28) it followsthat if 120583+(G) gt 120583

minus(G) then the query pair G is an

interactive couple otherwise noninteractive In other wordsthe outcome can be formulated as

G isin

119862

+ if 120583+ (G) gt 120583minus (G)

119862

minus otherwise

(29)

If there is a tie between 120583+(G) and 120583minus(G) the query pair Gwill be randomly assigned to one of the two classes Howevercase like that is quite rare and in this study never happened

The predictor thus established is called iEzy-Drugwhere ldquoirdquo means identify and ldquoEzy-Drugrdquo means the inter-action between enzyme and drug To provide an intuitiveoverall picture a flowchart is provided in Figure 2 to showthe process of how the classifier works in identifying enzyme-drug interactions

BioMed Research International 7

Input a queryprotein sequence and

a drug code

Create Chous PseAAC of a protein

Create feature vector of a drug

Feature fusion

engineTraining dataset

Yes No

Noninteractive enzyme drug pair

Interactive enzymedrug pair

AACDipeptideSeq Evol

Grey model

Mol FptInfo entropy

CF

FKNN operation

Figure 2 A flowchart to show the operation process of the iEzy-Drug predictor See the text for further explanation

24 Criteria for Performance Evaluation In the literaturethe following equation set is often used for examining theperformance quality of a predictor

Sn = TPTP + FN

Sp = TNTN + FP

Acc = TP + TNTP + TN + FP + FN

MCC = (TP times TN) minus (FP times FN)radic(TP + FP) (TP + FN) (TN + FP) (TN + FN)

(30)

where TP represents the true positive TN the true negativeFP the false positive FN the false negative Sn the sensitiv-ity Sp the specificity Acc the accuracy MCC the Mathewrsquoscorrelation coefficient

To most biologists however the four metrics as formu-lated in (30) are not quite intuitive and easier-to-understandparticularly for the Mathewrsquos correlation coefficient Herelet us adopt the Choursquos symbols to formulate the previousfour metrics By means of Choursquos symbols [111 112] the rates

of correct predictions for the interactive enzyme-drug pairsin dataset S+ and the noninteractive enzyme-drug pairs indataset Sminus are respectively defined by (cf (1))

Λ

+

=

119873

+minus 119873

+

minus

119873

+ for the interactive enzyme-drug pairs

Λ

minus

=

119873

minusminus 119873

minus

+

119873

minus for the noninteractive enzyme-drug pairs

(31)

where 119873+ is the total number of the interactive enzyme-drug pairs investigated while 119873+

minusis the number of the

interactive enzyme-drug pairs incorrectly predicted as thenoninteractive enzyme-drug pairs 119873minus is the total numberof the noninteractive enzyme-drug pairs investigated while119873

minus

+is the number of the noninteractive enzyme-drug pairs

incorrectly predicted as the interactive enzyme-drug pairsThe overall success prediction rate is given by [113] as follows

Λ =

Λ

+119873

++ Λ

minus119873

minus

119873

++ 119873

minus= 1 minus

119873

+

minus+ 119873

minus

+

119873

++ 119873

minus

(32)

It is obvious from (31)-(32) that if and only if none ofthe interactive enzyme-drug pairs and the noninteractiveenzyme-drug pairs are mispredicted that is 119873+

minus= 119873

minus

+= 0

and Λ+ = Λ

minus= 1 we have the overall success rate Λ = 1

Otherwise the overall success rate would be smaller than 1The relations between the symbols in (32) and those in

(30) are given by

TP = 119873+ minus 119873+minus

TN = 119873

minus

minus 119873

minus

+

FP = 119873minus+

FN = 119873

+

minus

(33)

Substituting (33) into (30) and also noting (31)-(32) we obtain

Sn = Λ+ = 1 minus119873

+

minus

119873

+

Sp = Λminus = 1 minus119873

minus

+

119873

minus

Acc = Λ = 1 minus119873

+

minus+ 119873

minus

+

119873

++ 119873

minus

MCC =1 minus ((119873

+

minus119873

+) + (119873

minus

+119873

minus))

radic(1 + (119873

minus

+minus 119873

+

minus) 119873

+) (1 + (119873

+

minusminus 119873

minus

+) 119873

minus)

(34)

Now it is obvious to see from (34) when 119873

+

minus= 0

meaning none of the interactive enzyme-drug pairs was

8 BioMed Research International

02

46

810

1

15

2089

0895

09

0905

091

0915

092

ACC

K120593

Figure 3 A 3D plot to show how the parameter in (27) wasoptimized for the iEzy-Drug predictor

mispredicted to be a noninteractive enzyme-drug pair wehave the sensitivity Sn = 1 while 119873+

minus= 119873

+ meaning thatall the interactive enzyme-drug pairs weremispredicted to bethe noninteractive enzyme-drug pairs we have the sensitivitySn = 0 Likewise when 119873minus

+= 0 meaning none of the

noninteractive enzyme-drug pairs wasmispredicted we havethe specificity Sp = 1 while 119873minus

+= 119873

minus meaning all thenoninteractive enzyme-drug pairs were incorrectly predictedas interactive enzyme-drug pairs we have the specificity Sp =0 When119873+

minus= 119873

minus

+= 0 meaning that none of the interactive

enzyme-drug pairs in the dataset S+ and none of the nonin-teractive enzyme-drug pairs in Sminus was incorrectly predictedwe have the overall accuracy Acc = Λ = 1 while 119873+

minus= 119873

+

and 119873minus+= 119873

minus meaning that all the interactive enzyme-drugpairs in the dataset S+ and all the noninteractive enzyme-drug pairs in Sminus were mispredicted we have the overallaccuracy Acc = Λ = 0 The MCC correlation coefficient isusually used for measuring the quality of binary (two-class)classifications When 119873+

minus= 119873

minus

+= 0 meaning that none

of the interactive enzyme-drug pairs in the dataset S+ andnone of the noninteractive enzyme-drug pairs in Sminus weremispredicted we have MCC = 1 when 119873+

minus= 119873

+2 and

119873

minus

+= 119873

minus2 we have MCC = 0 meaning no better than

random prediction when 119873+minus= 119873

+ and 119873minus+= 119873

minus we haveMCC = minus1 meaning total disagreement between predictionand observation As we can see from the previous discussionit is much more intuitive and easier-to-understand whenusing (34) to examine a predictor for its sensitivity specificityoverall accuracy and Mathewrsquos correlation coefficient It isinstructive to point out that themetrics as defined in (30) and(34) are valid for single label systems for multilabel systemsa set of more complicated metrics should be used as given in[114]

3 Results and Discussion

31 Cross-Validation How to properly examine the predic-tion quality is a key for developing a new predictor and

False positive rate

True

pos

itive

rate

1

01

02

03

04

05

06

07

08

09

0

101 02 03 04 05 06 07 08 090

BaselineiEzy-Drug AUC = 09377

Figure 4 A plot for the ROC curve to quantitatively show theperformance of the iEzy-Drug predictor

estimating its potential application value Generally speak-ing the following three cross-validation methods are oftenused to examine a predictor of its effectiveness in practicalapplication independent dataset test subsampling or 119870-fold(such as 5-fold 7-fold or 10-fold) test and jackknife test[108] However as elaborated by a penetrating analysis in[115] considerable arbitrariness exists in the independentdataset test Also as demonstrated by (27)ndash(29) in [33]the subsampling test (or 119870-fold cross-validation) cannotavoid arbitrariness either Only the jackknife test is the leastarbitrary that can always yield a unique result for a givenbenchmark dataset Therefore the jackknife test has beenwidely recognized and increasingly utilized by investigatorsto examine the quality of various predictors (see eg [66 7174 80]) Accordingly the success rate by the jackknife test wasalso used to optimize the two uncertain parameters 119870 and 120593in (27) The result thus obtained is shown in Figure 3 fromwhich we obtain when 119870 = 6 and 120593 = 15 the iEzy-Drugpredictor reaches its optimized status

The success rates thus obtained by the jackknife test inidentifying interactive Enzyme-drug pairs or noninteractiveenzyme-drug pairs on the benchmark dataset S (cf OnlineSupporting Information S1) are given in Table 1 where forfacilitating comparison the corresponding result by He et al[110] is also given As we can see from the table the overallaccuracy Acc achieved by iEzy-Drug was 9103 remarkablyhigher than 8548 the corresponding rate obtained byHe et al [110] on the same benchmark Furthermore listedin Table 1 are also the values obtained by iEzy-Drug for theother three metrics that is Sn = 9081 Sp = 9114 andMCC = 8039 indicating that the accuracy of iEzy-Drug isnot only very high but also quite stale

To provide a graphical illustration to show the per-formance of the current binary classifier iEzy-Drug asits discrimination threshold is varied a 2D plot calledReceiver Operating Characteristic (ROC) curve [116 117]was also given (Figure 4) In the ROC curve the verticalcoordinate 119884 is for the true positive rate or Sn (cf (34))while horizontal coordinate 119883 for the false positive rate

BioMed Research International 9

Read Me Data Citation

Enter both the protein sequence and the drug code (example) the number of query pairs is limited at 10 or less for each submission

Submit Clear

You can download the program and run it on your own local computer

iEzy-Drug identifying the interaction between enzymes anddrugs in cellular networking

Figure 5 A semiscreenshot to show the top page of the iEzy-Drugweb-server Its web-site address is at httpwwwjci-bioinfocniEzy-Drug

Table 1 The jackknife success rates obtained with iEzy-Drug in identifying interactive enzyme-drug pairs and noninteractive enzyme-drugpairs for the benchmark dataset S (cf Online Supporting Information S1)

Method Acc Sn Sp MCC

iEzy-Druga 74258157 = 9103 24692719 = 9081 49565438 = 9114 8039NN predictorb 8548 NA NA NAaSee (27) where the parameters119870 = 6 and 120593 = 15bSee [110]

or 1-Sp The best possible prediction method would yielda point with the coordinate (0 1) representing 100 truepositive rate (sensitivity Sn) and 0 false positive rate or100 specificity Therefore the (0 1) point is also calleda perfect classification A completely random guess wouldgive a point along a diagonal from the point (0 0) to (1 1)The area under the ROC curve also called Area Under theROC (AUROC) is often used to indicate the performancequality of a binary classifier the value 05 of AUROCis equivalent to random prediction while 1 of AUROCrepresents a perfect one As we can see from Figure 4the AUROC value obtained by iEzy-Drug is 09377

The reason why iEzy-Drug can remarkably improve theprediction quality is that it has introduced the 2D molecularfingerprints to represent drug samples see Online SupportingInformation S3 for the detailed fingerprint expressions for thedrugs listed in Online Supporting Information S1 and thatit has successfully used PseAAC to incorporate the featuresderived from the sequences of enzymes that are essentialfor identifying the interaction of enzymes with drugs in thecellular networking

To enhance the value of its practical applications theweb server for iEzy-Drug has been established that canbe freely accessible at httpwwwjci-bioinfocniEzy-DrugIt is anticipated that the web server will become a usefulhigh throughput tool for both basic research and drug

development in the relevant areas or at the very least playa complementary role to the existing method [39 110 118] forwhich so far noweb-serverwhatsoever has been provided yet

32 The Protocol or User Guide For the convenience of thevast majority of biologists and pharmaceutical scientists herelet us provide a step-by-step guide to show how the userscan easily get the desired result by means of the web serverwithout the need to follow the complicated mathematicalequations presented in this paper for the process of develop-ing the predictor and its integrityStep 1 Open the web server at the site httpwwwjci-bio-infocniEzy-Drug and you will see the top page of thepredictor on your computer screen as shown in Figure 5Click on the ReadMe button to see a brief introduction aboutiEzy-Drug predictor and the caveat when using itStep 2 Either type or copypaste the query pairs into theinput box at the center of Figure 5 Each query pair consistsof two parts one is for the protein sequence and the other forthe drug The enzyme sequence should be in FASTA formatwhile the drug in the KEGG code Examples for the querypairs input can be seen by clicking on the Example buttonright above the input boxStep 3 Click on the Submit button to see the predicted resultFor example if you use the four query pairs in the Example

10 BioMed Research International

window as the input after clicking the Submit button youwill see on your screen that the ldquohsa 10056rdquo enzyme andthe ldquoD0021rdquo drug are an interactive pair and that the ldquohsa100rdquo enzyme and the ldquoD0037rdquo drug are also an interactivepair but that the ldquohsa 3295rdquo enzyme and the ldquoD00889rdquo drugare not an interactive pair and that the ldquohsa 7366rdquo enzymeand the ldquoD03601rdquo drug are not an interactive pair eitherAll these results are fully consistent with the experimentalobservations It takes about 3 minutes before the results areshown on the screenStep 4 Click on the Citation button to find the relevant paperthat documents the detailed development and algorithm ofiEzy-DurgStep 5 Click on the Data button to download the benchmarkdataset used to train and test the iEzy-Durg predictorStep 6 The program code is also available by clicking thebutton download on the lower panel of Figure 5

Acknowledgments

The authors would like to thank the three anonymousreviewers whose constructive comments are very helpful forstrengthening the presentation of the paper This work wassupported by the Grants from the National Natural ScienceFoundation of China (no 31260273) the Jiangxi ProvincialForeign Scientific and Technological Cooperation Project(no 20120BDH80023) Natural Science Foundation of JiangxiProvince China (no 2010GZS0122 20122BAB201020) theDepartment of Education of Jiangxi Province (GJJ12490)the LuoDi plan of the Department of Education of JiangxiProvince (KJLD12083) and the Jiangxi Provincial Founda-tion for Leaders of Disciplines in Science (20113BCB22008)The funders had no role in study design data collection andanalysis decision to publish or preparation of the paper

References

[1] A Bairoch ldquoThe ENZYME database in 2000rdquo Nucleic AcidsResearch vol 28 no 1 pp 304ndash305 2000

[2] S H Koenig and R D Brown ldquoH2CO3as substrate for carbonic

anhydrase in the dehydration of H2CO3(minus)rdquo Proceedings of the

National Academy of Sciences of the United States of Americavol 69 no 9 pp 2422ndash2425 1972

[3] S X Lin and J Lapointe ldquoTheoretical and experimental biologyin onerdquo Journal of Biomedical Science and Engineering vol 6 pp435ndash442 2013

[4] K C Chou and S P Jiang ldquoStudies on the rate of diffusion-controlled reactions of enzymes Spatial factor and force fieldfactorrdquo Scientia Sinica vol 17 no 5 pp 664ndash680 1974

[5] K C Chou ldquoThe kinetics of the combination reaction betweenenzyme and substraterdquo Scientia Sinica vol 19 no 4 pp 505ndash528 1976

[6] K C Chou and G P Zhou ldquoRole of the protein outside activesite on the diffusion-controlled reaction of enzymerdquo Journal ofthe American Chemical Society vol 104 no 5 pp 1409ndash14131982

[7] R A Poorman A G Tomasselli R L Heinrikson and FJ Kezdy ldquoA cumulative specificity model for proteases from

human immunodeficiency virus types 1 and 2 inferred fromstatistical analysis of an extended substrate data baserdquo TheJournal of Biological Chemistry vol 266 no 22 pp 14554ndash145611991

[8] K-C Chou ldquoA vectorized sequence-coupling model for pre-dicting HIV protease cleavage sites in proteinsrdquo The Journal ofBiological Chemistry vol 268 no 23 pp 16938ndash16948 1993

[9] G Z Liang and S Z Li ldquoA new sequence representation asapplied in better specificity elucidation for human immunod-eficiency virus type 1 proteaserdquo Biopolymers vol 88 no 3 pp401ndash412 2007

[10] J J Chou ldquoPredicting cleavability of peptide sequences byHIVprotease via correlation-angle approachrdquo Journal of ProteinChemistry vol 12 no 3 pp 291ndash302 1993

[11] Q S Du S Wang D Q Wei S Sirois and K C ChouldquoMolecular modeling and chemical modification for findingpeptide inhibitor against severe acute respiratory syndromecoronavirus main proteinaserdquo Analytical Biochemistry vol 337no 2 pp 262ndash270 2005

[12] Y Gan H Huang Y Huang et al ldquoSynthesis and activity ofan octapeptide inhibitor designed for SARS coronavirus mainproteinaserdquo Peptides vol 27 no 4 pp 622ndash625 2006

[13] Q S Du H Sun and K C Chou ldquoInhibitor design for SARScoronavirus main protease based on lsquodistorted key theoryrsquordquoMedicinal Chemistry vol 3 no 1 pp 1ndash6 2007

[14] K C Chou ldquoPrediction of human immunodeficiency virusprotease cleavage sites in proteinsrdquoAnalytical Biochemistry vol233 no 1 pp 1ndash14 1996

[15] J Knowles and G Gromo ldquoTarget selection in drug discoveryrdquoNature Reviews Drug Discovery vol 2 no 1 pp 63ndash69 2003

[16] A C Cheng R G Coleman K T Smyth et al ldquoStructure-basedmaximal affinity model predicts small-molecule druggabilityrdquoNature Biotechnology vol 25 no 1 pp 71ndash75 2007

[17] M Rarey B Kramer T Lengauer and G Klebe ldquoA fast flexibledocking method using an incremental construction algorithmrdquoJournal of Molecular Biology vol 261 no 3 pp 470ndash489 1996

[18] K C Chou ldquoReview structural bioinformatics and its impactto biomedical sciencerdquo Current Medicinal Chemistry vol 11 no16 pp 2105ndash2134 2004

[19] K C Chou D Q Wei and W Z Zhong ldquoBinding mechanismof coronavirus main proteinase with ligands and its implicationto drug design against SARSrdquo Biochemical and BiophysicalResearch Communications vol 308 pp 148ndash151 2003 (Erratumin Biochemical and Biophysical Research Communications vol310 p 675 2003)

[20] K C Chou D Q Wei Q S Du S Sirois and W Z ZhongldquoProgress in computational approach to drug developmentagaint SARSrdquo Current Medicinal Chemistry vol 13 no 27 pp3263ndash3270 2006

[21] G P Zhou and F A Troy ldquoNMR studies on how the bindingcomplex of polyisoprenol recognition sequence peptides andpolyisoprenols can modulate membrane structurerdquo CurrentProtein and Peptide Science vol 6 no 5 pp 399ndash411 2005

[22] R B Huang Q S Du C H Wang and K C Chou ldquoAn in-depth analysis of the biological functional studies based on theNMR M2 channel structure of influenza A virusrdquo Biochemicaland Biophysical Research Communications vol 377 no 4 pp1243ndash1247 2008

[23] Q S Du R B Huang C H Wang X M Li and K C ChouldquoEnergetic analysis of the two controversial drug binding sitesof the M2 proton channel in influenza A virusrdquo Journal ofTheoretical Biology vol 259 no 1 pp 159ndash164 2009

BioMed Research International 11

[24] M J Berardi W M Shih S C Harrison and J J Chou ldquoMito-chondrial uncoupling protein 2 structure determined by NMRmolecular fragment searchingrdquo Nature vol 476 no 7358 pp109ndash114 2011

[25] J R Schnell and J J Chou ldquoStructure andmechanism of theM2proton channel of influenza A virusrdquo Nature vol 451 no 7178pp 591ndash595 2008

[26] B OuYang S Xie M J Berardi et al ldquoUnusual architecture ofthe p7 channel from hepatitis C virusrdquoNature vol 498 pp 521ndash525 2013

[27] K Oxenoid and J J Chou ldquoThe structure of phospholambanpentamer reveals a channel-like architecture in membranesrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 102 no 31 pp 10870ndash10875 2005

[28] M E Call KWWucherpfennig and J J Chou ldquoThe structuralbasis for intramembrane assembly of an activating immunore-ceptor complexrdquo Nature Immunology vol 11 no 11 pp 1023ndash1029 2010

[29] R M Pielak J R Schnell and J J Chou ldquoMechanism of druginhibition and drug resistance of influenza A M2 channelrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 106 no 18 pp 7379ndash7384 2009

[30] J Wang R M Pielak M A McClintock and J J ChouldquoSolution structure and functional analysis of the influenza Bproton channelrdquo Nature Structural and Molecular Biology vol16 no 12 pp 1267ndash1271 2009

[31] K C Chou ldquoCoupling interaction between thromboxane A2receptor and alpha-13 subunit of guanine nucleotide-bindingproteinrdquo Journal of Proteome Research vol 4 no 5 pp 1681ndash1686 2005

[32] K C Chou ldquoInsights from modeling three-dimensional struc-tures of the human potassium and sodium channelsrdquo Journal ofProteome Research vol 3 no 4 pp 856ndash861 2004

[33] K C Chou ldquoSome remarks on protein attribute prediction andpseudo amino acid compositionrdquo Journal ofTheoretical Biologyvol 273 no 1 pp 236ndash247 2011

[34] W Z Lin J A Fang X Xiao and K C Chou ldquoiLoc-animal amulti-label learning classifier for predicting subcellular local-ization of animal proteinsrdquo Molecular BioSystems vol 9 pp634ndash644 2013

[35] X Xiao PWangW Z Lin J H Jia and K C Chou ldquoiAMP-2La two-level multi-label classifier for identifying antimicrobialpeptides and their functional typesrdquo Analytical Biochemistryvol 436 pp 168ndash177 2013

[36] W Chen P M Feng H Lin and K C Chou ldquoiRSpot-PseDNCidentify recombination spots with pseudo dinucleotide compo-sitionrdquo Nucleic Acids Research vol 41 article e69 2013

[37] K C Chou Z C Wu and X Xiao ldquoILoc-Hum using theaccumulation-label scale to predict subcellular locations ofhuman proteins with both single and multiple sitesrdquoMolecularBioSystems vol 8 no 2 pp 629ndash641 2012

[38] M Kotera M Hirakawa T Tokimatsu S Goto and MKanehisa ldquoThe KEGG databases and tools facilitating omicsanalysis latest developments involving human diseases andpharmaceuticalsrdquo Methods in Molecular Biology vol 802 pp19ndash39 2012

[39] Y YamanishiM Araki A GutteridgeWHonda andM Kane-hisa ldquoPrediction of drug-target interaction networks from theintegration of chemical and genomic spacesrdquo Bioinformaticsvol 24 no 13 pp i232ndashi240 2008

[40] P Finn S Muggleton D Page and A Srinivasan ldquoPharma-cophore discovery using the Inductive Logic Programmingsystem PROGOLrdquo Machine Learning vol 30 no 2-3 pp 241ndash270 1998

[41] I Vogt D Stumpfe H E Ahmed and J Bajorath ldquoMethodsfor computer-aided chemical biology Part 2 evaluation of com-pound selectivity using 2D molecular fingerprintsrdquo ChemicalBiology and Drug Design vol 70 pp 195ndash205 2007

[42] H Eckert and J Bajorath ldquoMolecular similarity analysis in vir-tual screening foundations limitations and novel approachesrdquoDrug Discovery Today vol 12 no 5-6 pp 225ndash233 2007

[43] S Laurent L V Elst and R N Muller ldquoComparative studyof the physicochemical properties of six clinical low molecularweight gadolinium contrast agentsrdquo Contrast Media amp Molecu-lar Imaging vol 1 no 3 pp 128ndash137 2006

[44] E Gregori-Puigjane R Garriga-Sust and J Mestres ldquoIndexingmolecules with chemical graph identifiersrdquo Journal of Compu-tational Chemistry vol 32 no 12 pp 2638ndash2646 2011

[45] B Ren ldquoApplication of novel atom-type AI topological indicesto QSPR studies of alkanesrdquo Computers and Chemistry vol 26no 4 pp 357ndash369 2002

[46] N M OrsquoBoyle M Banck C A James C Morley T Vander-meersch and G R Hutchison ldquoOpen Babel an open chemicaltoolboxrdquo Journal of Cheminformatics vol 3 p 33 2011

[47] V J Gillet P Willett and J Bradshaw ldquoSimilarity searchingusing reduced graphsrdquo Journal of Chemical Information andComputer Sciences vol 43 no 2 pp 338ndash345 2003

[48] D Butina ldquoUnsupervised data base clustering based on day-lightrsquos fingerprint and Tanimoto similarity a fast and automatedway to cluster small and large data setsrdquo Journal of ChemicalInformation and Computer Sciences vol 39 no 4 pp 747ndash7501999

[49] C E Shannon ldquoA mathematical theory of communicationrdquoACM SIGMOBILE Mobile Computing and CommunicationsReview vol 5 pp 3ndash55 2001

[50] V D Gusev L A Nemytikova and N A Chuzhanova ldquoOn thecomplexity measures of genetic sequencesrdquo Bioinformatics vol15 no 12 pp 994ndash999 1999

[51] K C Chou and H B Shen ldquoReview recent progress in proteinsubcellular location predictionrdquo Analytical Biochemistry vol370 no 1 pp 1ndash16 2007

[52] S F Altschul ldquoEvaluating the statistical significance of multipledistinct local alignmentsrdquo in Theoretical and ComputationalMethods in Genome Research S Suhai Ed pp 1ndash14 PlenumNew York NY USA 1997

[53] J C Wootton and S Federhen ldquoStatistics of local complexity inamino acid sequences and sequence databasesrdquo Computers andChemistry vol 17 no 2 pp 149ndash163 1993

[54] H Nakashima K Nishikawa and T Ooi ldquoThe folding type ofa protein is relevant to the amino acid compositionrdquo Journal ofBiochemistry vol 99 no 1 pp 153ndash162 1986

[55] K C Chou and C T Zhang ldquoPredicting protein folding typesby distance functions that make allowances for amino acidinteractionsrdquo The Journal of Biological Chemistry vol 269 no35 pp 22014ndash22020 1994

[56] K-C Chou ldquoA novel approach to predicting protein structuralclasses in a (20-1)-D amino acid composition spacerdquo Proteinsvol 21 no 4 pp 319ndash344 1995

[57] H Nakashima and K Nishikawa ldquoDiscrimination of intracel-lular and extracellular proteins using amino acid compositionand residue-pair frequenciesrdquo Journal of Molecular Biology vol238 no 1 pp 54ndash61 1994

12 BioMed Research International

[58] G P Zhou ldquoAn intriguing controversy over protein structuralclass predictionrdquo Journal of Protein Chemistry vol 17 no 8 pp729ndash738 1998

[59] I Bahar A R Atilgan R L Jernigan and B Erman ldquoUnder-standing the recognition of protein structural classes by aminoacid compositionrdquo Proteins vol 29 pp 172ndash185 1997

[60] J Cedano P Aloy J A Perez-Pons and E Querol ldquoRelationbetween amino acid composition and cellular location ofproteinsrdquo Journal of Molecular Biology vol 266 no 3 pp 594ndash600 1997

[61] G P Zhou and K Doctor ldquoSubcellular location prediction ofapoptosis proteinsrdquo Proteins vol 50 no 1 pp 44ndash48 2003

[62] KChou ldquoPrediction of protein cellular attributes using pseudo-amino acid compositionrdquo Proteins vol 43 pp 246ndash255 2001(Erratum in Proteins vol 44 p 60 2001)

[63] K C Chou ldquoUsing amphiphilic pseudo amino acid composi-tion to predict enzyme subfamily classesrdquo Bioinformatics vol21 no 1 pp 10ndash19 2005

[64] D Zou Z He J He and Y Xia ldquoSupersecondary structureprediction using Choursquos pseudo amino acid compositionrdquoJournal of Computational Chemistry vol 32 no 2 pp 271ndash2782011

[65] M M Beigi M Behjati and H Mohabatkar ldquoPrediction ofmetalloproteinase family based on the concept ofChoursquos pseudoamino acid composition using a machine learning approachrdquoJournal of Structural and Functional Genomics vol 12 no 4 pp191ndash197 2011

[66] Y K Chen and K B Li ldquoPredicting membrane protein typesby incorporating protein topology domains signal peptidesand physicochemical properties into the general form of Choursquospseudo amino acid compositionrdquo Journal ofTheoretical Biologyvol 318 pp 1ndash12 2013

[67] C Huang and J Q Yuan ldquoA multilabel model based on Choursquospseudo-amino acid composition for identifying membraneproteins with both single and multiple functional typesrdquo TheJournal of Membrane Biology vol 246 pp 327ndash334 2013

[68] S S Sahu and G Panda ldquoA novel feature representationmethod based on Choursquos pseudo amino acid composition forprotein structural class predictionrdquo Computational Biology andChemistry vol 34 no 5-6 pp 320ndash327 2010

[69] M Hayat and A Khan ldquoDiscriminating outer membraneproteins with fuzzy K-nearest neighbor algorithms based on thegeneral form of Choursquos PseAACrdquo Protein and Peptide Lettersvol 19 no 4 pp 411ndash421 2012

[70] MKhosravian F K FaramarziMM BeigiM Behbahani andH Mohabatkar ldquoPredicting antibacterial peptides by the con-cept of Choursquos pseudo-amino acid composition and machinelearning methodsrdquo Protein amp Peptide Letters vol 20 pp 180ndash186 2013

[71] HMohabatkarMM Beigi K Abdolahi and SMohsenzadehldquoPrediction of allergenic proteins by means of the concept ofChoursquos pseudo amino acid composition and amachine learningapproachrdquoMedicinal Chemistry vol 9 pp 133ndash137 2013

[72] L Nanni A Lumini D Gupta and A Garg ldquoIdentifyingbacterial virulent proteins by fusing a set of classifiers basedon variants of Choursquos pseudo amino acid composition and onevolutionary informationrdquo IEEEACM Transactions on Compu-tational Biology and Bioinformatics vol 9 no 2 pp 467ndash4752012

[73] T H Chang L C Wu T Y Lee S P Chen H D Huang andJ T Horng ldquoEuLoc a web-server for accurately predict protein

subcellular localization in eukaryotes by incorporating variousfeatures of sequence segments into the general form of ChoursquosPseAACrdquo Journal of Computer-Aided Molecular Design vol 27pp 91ndash103 2013

[74] S Zhang Y Zhang H Yang C Zhao and Q Pan ldquoUsing theconcept of Choursquos pseudo amino acid composition to predictprotein subcellular localization an approach by incorporatingevolutionary information and von Neumann entropiesrdquo AminoAcids vol 34 no 4 pp 565ndash572 2008

[75] R Zia Ur and A Khan ldquoIdentifying GPCRs and their typeswith Choursquos pseudo amino acid composition an approach frommulti-scale energy representation and position specific scoringmatrixrdquo Protein amp Peptide Letters vol 19 pp 890ndash903 2012

[76] X Y Sun S P Shi J D Qiu S B Suo S Y Huang and R PLiang ldquoIdentifying protein quaternary structural attributes byincorporating physicochemical properties into the general formof Choursquos PseAAC via discrete wavelet transformrdquo MolecularBioSystems vol 8 pp 3178ndash3184 2012

[77] L Nanni and A Lumini ldquoGenetic programming for creatingChoursquos pseudo amino acid based features for submitochondrialocalizationrdquo Amino Acids vol 34 no 4 pp 653ndash660 2008

[78] M Esmaeili H Mohabatkar and S Mohsenzadeh ldquoUsing theconcept of Choursquos pseudo amino acid composition for risk typeprediction of human papillomavirusesrdquo Journal of TheoreticalBiology vol 263 no 2 pp 203ndash209 2010

[79] H Mohabatkar ldquoPrediction of cyclin proteins using Choursquospseudo amino acid compositionrdquo Protein and Peptide Lettersvol 17 no 10 pp 1207ndash1214 2010

[80] H Mohabatkar M M Beigi and A Esmaeili ldquoPredictionof GABA(A) receptor proteins using the concept of Choursquospseudo-amino acid composition and support vector machinerdquoJournal of Theoretical Biology vol 281 no 1 pp 18ndash23 2011

[81] Y Xu J Ding L Y Wu and K C Chou ldquoiSNO-PseAAC pre-dict cysteine S-nitrosylation sites in proteins by incorporatingposition specific amino acid propensity into pseudo amino acidcompositionrdquo PLoS ONE vol 8 Article ID e55844 2013

[82] W Chen H Lin PM Feng C Ding Y C Zuo andK C ChouldquoiNuc-PhysChem a sequence-based predictor for identifyingnucleosomes via physicochemical propertiesrdquo PLoS ONE vol7 Article ID e47843 2012

[83] B Li T Huang L Liu Y Cai and K C Chou ldquoIdentificationof colorectal cancer related genes with mrmr and shortest pathin protein-protein interaction networkrdquo PLoS ONE vol 7 no 4Article ID e33393 2012

[84] Y Jiang T Huang C Lei Y F Gao Y D Cai and K C ChouldquoSignal propagation in protein interaction network duringcolorectal cancer progressionrdquo BioMed Research Internationalvol 2013 Article ID 287019 9 pages 2013

[85] P Du X Wang C Xu and Y Gao ldquoPseAAC-Builder across-platform stand-alone program for generating variousspecial Choursquos pseudo-amino acid compositionsrdquo AnalyticalBiochemistry vol 425 no 2 pp 117ndash119 2012

[86] D S Cao Q S Xu and Y Z Liang ldquoPropy a tool to generatevarious modes of Choursquos PseAACrdquo Bioinformatics vol 29 pp960ndash962 2013

[87] H Shen and K C Chou ldquoPseAAC a flexible web serverfor generating various kinds of protein pseudo amino acidcompositionrdquo Analytical Biochemistry vol 373 no 2 pp 386ndash388 2008

[88] K C Chou ldquoUsing pair-coupled amino acid composition topredict protein secondary structure contentrdquo Journal of ProteinChemistry vol 18 no 4 pp 473ndash480 1999

BioMed Research International 13

[89] W Liu and K C Chou ldquoPrediction of protein secondarystructure contentrdquo Protein Engineering vol 12 no 12 pp 1041ndash1050 1999

[90] M Bhasin and G P S Raghava ldquoClassification of nuclearreceptors based on amino acid composition and dipeptidecompositionrdquo The Journal of Biological Chemistry vol 279 no22 pp 23262ndash23266 2004

[91] H Lin andH Ding ldquoPredicting ion channels and their types bythe dipeptidemode of pseudo amino acid compositionrdquo Journalof Theoretical Biology vol 269 no 1 pp 64ndash69 2011

[92] H Lin and Q Li ldquoUsing pseudo amino acid composition topredict protein structural class approached by incorporating400 dipeptide componentsrdquo Journal of Computational Chem-istry vol 28 no 9 pp 1463ndash1466 2007

[93] L Nanni and A Lumini ldquoCombing ontologies and dipeptidecomposition for predicting DNA-binding proteinsrdquo AminoAcids vol 34 no 4 pp 635ndash641 2008

[94] K-C Chou ldquoThe convergence-divergence duality in lectindomains of selectin family and its implicationsrdquo FEBS Lettersvol 363 no 1-2 pp 123ndash126 1995

[95] A A Schaffer L Aravind T L Madden et al ldquoImprovingthe accuracy of PSI-BLAST protein database searches withcomposition-based statistics and other refinementsrdquo NucleicAcids Research vol 29 no 14 pp 2994ndash3005 2001

[96] S F Altschul and E V Koonin ldquoIterated profile searches withPSI-BLAST a tool for discovery in protein databasesrdquo Trends inBiochemical Sciences vol 23 no 11 pp 444ndash447 1998

[97] J Deng ldquoGrey entropy and grey target decision makingrdquoJournal of Grey System vol 22 no 1 pp 1ndash24 2010

[98] B M Bolstad R A Irizarry M Astrand and T P Speed ldquoAcomparison of normalizationmethods for high density oligonu-cleotide array data based on variance and biasrdquo Bioinformaticsvol 19 no 2 pp 185ndash193 2003

[99] J-Y Shi S-W Zhang Q Pan Y-M Cheng and J XieldquoPrediction of protein subcellular localization by support vectormachines using multi-scale energy and pseudo amino acidcompositionrdquo Amino Acids vol 33 no 1 pp 69ndash74 2007

[100] T Denoeux ldquoA 120581-nearest neighbor classification rule based onDempster-Shafer theoryrdquo IEEE Transactions on Systems Manand Cybernetics vol 25 no 5 pp 804ndash813 1995

[101] J M Keller M R Gray and J A Givens ldquoA fuzzy k-nearestneighbours algorithmrdquo IEEE Transactions on Systems Man andCybernetics vol 15 no 4 pp 580ndash585 1985

[102] X Xiao P Wang and K Chou ldquoiNR-physchem a sequence-based predictor for identifying nuclear receptors and theirsubfamilies via physical-chemical property matrixrdquo PLoS ONEvol 7 no 2 Article ID e30869 2012

[103] I Roterman L Konieczny W Jurkowski K Prymula and MBanach ldquoTwo-intermediate model to characterize the structureof fast-folding proteinsrdquo Journal of Theoretical Biology vol 283no 1 pp 60ndash70 2011

[104] X Xiao P Wang and K C Chou ldquoGPCR-2L predicting Gprotein-coupled receptors and their types by hybridizing twodifferentmodes of pseudo amino acid compositionsrdquoMolecularBioSystems vol 7 no 3 pp 911ndash919 2011

[105] X Xiao P Wang and K C Chou ldquoQuat-2L a web-server forpredicting protein quaternary structural attributesrdquo MolecularDiversity vol 15 no 1 pp 149ndash155 2011

[106] X Zheng C Li and J Wang ldquoAn information-theoreticapproach to the prediction of protein structural classrdquo Journalof Computational Chemistry vol 31 no 6 pp 1201ndash1206 2010

[107] H Shen J Yang X Liu and K C Chou ldquoUsing supervisedfuzzy clustering to predict protein structural classesrdquo Biochem-ical and Biophysical Research Communications vol 334 no 2pp 577ndash581 2005

[108] K-C Chou and C-T Zhang ldquoReview prediction of proteinstructural classesrdquo Critical Reviews in Biochemistry and Molec-ular Biology vol 30 no 4 pp 275ndash349 1995

[109] P C Mahalanobis ldquoOn the generalized distance in statisticsrdquoProceedings of the National Institute of Sciences of India vol 2pp 49ndash55 1936

[110] R M Centor ldquoSignal detectability the use of ROC curves andtheir analysesrdquoMedical Decision Making vol 11 no 2 pp 102ndash106 1991

[111] K-C Chou ldquoUsing subsite coupling to predict signal peptidesrdquoProtein Engineering vol 14 no 2 pp 75ndash79 2001

[112] K C Chou ldquoPrediction of protein signal sequences and theircleavage sitesrdquo Proteins vol 42 pp 136ndash139 2001

[113] K C Chou ldquoPrediction of signal peptides using scaledwindowrdquoPeptides vol 22 no 12 pp 1973ndash1979 2001

[114] K C Chou ldquoSome remarks on predicting multi-label attributesinmolecular biosystemsrdquoMolecular Biosystems vol 9 pp 1092ndash1100 2013

[115] K C Chou and H B Shen ldquoCell-PLoc 2 0 an improvedpackage of web-servers for predicting subcellular localization ofproteins in various organismsrdquoNatural Science vol 2 pp 1090ndash1103 2010

[116] M Gribskov and N L Robinson ldquoUse of receiver operatingcharacteristic (ROC) analysis to evaluate sequence matchingrdquoComputers and Chemistry vol 20 no 1 pp 25ndash33 1996

[117] Y Yamanishi M Kotera M Kanehisa and S Goto ldquoDrug-target interaction prediction from chemical genomic and phar-macological data in an integrated frameworkrdquo Bioinformaticsvol 26 no 12 pp i246ndashi254 2010

[118] Z He J Zhang X Shi et al ldquoPredicting drug-target interactionnetworks based on functional groups and biological featuresrdquoPLoS ONE vol 5 no 3 Article ID e9603 2010

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

4 BioMed Research International

one if the string is not terminated by a dot) For example forthe sequence 100100011010010 syn(119875) and the correspondingcomplexity factor CF are described as

Syn (119878) = 1 ∙ 0 ∙ 01 ∙ 000 ∙ 11 ∙ 01001 ∙ 0

CF = 7(7)

Thus by adding the information entropy 119867119909(4) and com-

plexity factor CF (7) into the molecular fingerprint MF (2)we obtained a total of (256 + 1 + 1) = 258 feature elements torepresent a drug compound that is it can now be formulatedas a 258-D vector given by

119863 = [11986011198602sdot sdot sdot 119860

256119867119909

CF]T (8)

where 119860119894has the same meaning as in (2) while 119867

119909and CF

are the information entropy and complexity factor respec-tively as described in the previous two sections

222 Enzyme The sequences of the enzymes involved inthis study are given in Online Supporting Information S2Now the problem is how to effectively represent these enzymesequences for the current study Generally speaking thereare two kinds of approaches to formulate enzyme sequencesthe sequential model and the nonsequential or discretemodel [51] The most typical sequential representation foran enzyme sample E with 119871 residues is its entire amino acidsequence that is

E = 1198771119877211987731198774119877511987761198777sdot sdot sdot 119877119871 (9)

where 1198771represents the 1st residue 119877

2the 2nd residue and

so forth An enzyme sample thus formulated can contain itsmost complete information This is an obvious advantage ofthe sequential representation To get the desired results thesequence-similarity-search-based tools such as BLAST [5253] are usually utilized to conduct the prediction Howeverthis kind of approach failed to work when the query enzymedid not have significant homology to enzyme of known char-acters Thus various nonsequential representation modelswere proposed The simplest nonsequential model for anenzyme was based on its amino acid composition (AAC) asdefined by

E = [11989111198912sdot sdot sdot 11989120]

T (10)

where 119891119906(119906 = 1 2 20) are the normalized occurrence

frequencies of the 20 native amino acids [54ndash56] in theenzyme E and T has the same meaning as in (2) and (8)The AAC-discrete model was widely used for identifyingvarious attributes of proteins (see eg [57ndash61]) Howeveras can be seen from (10) all the sequence order effectswere lost by using the AAC-discrete model This is its mainshortcoming To avoid completely losing the sequence-orderinformation the pseudo amino acid composition [62 63]or Choursquos PseAAC [3] was proposed to replace the simpleAAC model Since the concept of PseAAC was proposedin 2001 [62] it has penetrated into almost all the fields of

protein attribute predictions and computational proteomicssuch as predicting supersecondary structure [64] predictingmetalloproteinase family [65] predicting membrane pro-tein types [66 67] predicting protein structural class [68]discriminating outer membrane proteins [69] identifyingantibacterial peptides [70] identifying allergenic proteins[71] identifying bacterial virulent proteins [72] predictingprotein subcellular location [73 74] identifying GPCRs andtheir types [75] identifying protein quaternary structuralattributes [76] predicting protein submitochondria locations[77] identifying risk type of human papillomaviruses [78]identifying cyclin proteins [79] predicting GABA(A) recep-tor proteins [80] and predicting cysteine S-nitrosylation sitesin proteins [81] among many others (see a long list of paperscited in the References section of [33]) Recently the conceptof PseAAC was further extended to represent the featurevectors of DNA and nucleotides [36 82] as well as otherbiological samples (see eg [83 84]) Because it has beenwidely and increasingly used recently two powerful soft-wares called ldquoPseAAC-Builderrdquo [85] and ldquopropyrdquo [86] wereestablished for generating various special Choursquos pseudo-amino acid compositions in addition to the web-serverPseAAC [87] built in 2008 According to a recent review [33]the general formof Choursquos PseAAC for an enzyme sample canbe formulated by

E = [12059511205952sdot sdot sdot 120595119906sdot sdot sdot 120595Ω]

T (11)

where the subscript Ω is an integer and its value as wellas the components 120595

119906(119906 = 1 2 Ω) will depend on

how to extract the desired information from the amino acidsequence of E (cf (10)) Next let us describe how to extractuseful information from the benchmark datasetS andOnlineSupporting Information S2 to define the enzyme samplesconcerned via (11)

To incorporate as much useful information as possiblefrom an enzyme sample we are to approach this problemfrom three different angles followed by incorporating thefeature elements thus obtained into the general form ofPseAAC of (11)(a) Amino Acid Composition The components of amino acidcomposition have beenwidely used to predict various proteinattributes [57ndash61] In this study they were also included as thefirst 20 elements in the general Choursquos PseAAC (cf (11)) thatis

120595119906= 119891119906

(119906 = 1 2 20) (12)

where 119891119906has the same meaning as in (10)

(b) Dipeptide Composition Dipeptide composition has beenused to predict the protein secondary structural contents[88 89] as well as various protein attributes (see eg [90ndash93]) The number of different dipeptides is 20 times 20 = 400Suppose that the normalized occurrence frequencies of the400 dipeptides in an enzyme sample are given by

119891

(2)

119906(119906 = 1 2 400) (13)

BioMed Research International 5

Incorporating the above 400 dipeptide components into (11)we have

120595119906+20

= 119891

(2)

119906(119906 = 1 2 400) (14)

(c) Sequential Evolution Information Biology is a naturalscience with a historic dimension All biological specieshave developed starting out from a very limited number ofancestral species Their evolution involves changes of singleresidues insertions and deletions of several residues [94]gene doubling and gene fusion With these changes accu-mulated for a long period of time many similarities betweeninitial and resultant amino acid sequences are graduallyeliminated but the corresponding proteins may still sharemany common attributes [18] such as having basically thesame biological function and residing at a same subcellularlocation To extract the sequential evolution information anduse it to define the components of (11) the PSSM (PositionSpecific Scoring Matrix) was used as described next

According to Schaffer et al [95] the sequence evolutioninformation of enzyme E with 119871 amino acid residues can beexpressed by an 119871 times 20matrix as given by

P(0)PSSM =

[

[

[

[

[

[

[

119864

0

1rarr1119864

0

1rarr2sdot sdot sdot 119864

0

1rarr20

119864

0

2rarr1119864

0

2rarr2sdot sdot sdot 119864

0

2rarr20

119864

0

119871rarr1119864

0

119871rarr2sdot sdot sdot 119864

0

119871rarr20

]

]

]

]

]

]

]

(15)

where 1198640119894rarr 119895

represents the original score of the 119894th aminoacid residue (119894 = 1 2 119871) in the enzyme sequence changedto amino acid type 119895 (119895 = 1 2 20) in the process ofevolution Here the numerical codes 1 2 20 are used torepresent the 20 native amino acid types denoted by A CD E F G H I K L M N P Q R S T V W and Y The119871 times 20 scores in (15) were generated by using PSI-BLAST[96] to search the UniProtKBSwiss-Prot database (Release2013-05) through three iterations with 0001 as the 119864 valuecutoff for multiple sequence alignment against the sequenceof the enzyme E In order to make every element in (15) bescaled from their original score ranges into region of [0 1]we performed a conversion through the standard sigmoidfunction to make it become

P(1)PSSM =

[

[

[

[

[

[

[

119864

1

1rarr1119864

1

1rarr2sdot sdot sdot 119864

1

1rarr20

119864

1

2rarr1119864

1

2rarr2sdot sdot sdot 119864

1

2rarr20

119864

1

119871rarr1119864

1

119871rarr2sdot sdot sdot 119864

1

119871rarr20

]

]

]

]

]

]

]

(16)

where

119864

1

119894rarr 119895=

1

1 + 119890

minus1198640

119894rarr 119895

(1 le 119894 le 119871 1 le 119895 le 20) (17)

Nowwe extract the useful information from (16) to define thecomponents of (11) via the following approach

120595119906+420

= ℓ119906

(119906 = 1 2 20) (18)

where

ℓ119895=

1

119871

times

119871

sum

119896=1

119864

1

119896rarr119895(119895 = 1 2 20) (19)

(d) Grey System Model Approach The grey system theory[97] is quite useful in dealing with complicated systems thatlack sufficient information or need to process uncertaininformation According to the grey system theory we canextract the following information from the 119895th column of(16) that is

[

119886

119895

1

119886

119895

2

] = (BT119895B119895)

minus1

BT119895U119895

(119895 = 1 2 20) (20)

where

B119895=

[

[

[

[

[

[

[

[

[

[

minus119864

1

2rarr119895minus119864

1

1rarr119895minus 05119864

1

2rarr1198951

minus119864

1

3rarr119895minus

2

sum

119894=1

119864

1

119894rarr 119895minus 05119864

1

3rarr1198951

minus119864

1

119871rarr119895minus

119871minus1

sum

119894=1

119864

1

119894rarr 119895minus 05119864

1

119871rarr1198951

]

]

]

]

]

]

]

]

]

]

U119895=

[

[

[

[

[

[

119864

1

2rarr119895minus 119864

1

1rarr119895

119864

1

3rarr119895minus 119864

1

2rarr119895

119864

1

119871rarr119895minus 119864

1

119871minus1rarr119895

]

]

]

]

]

]

(21)

Therefore based on the grey system theory and (20) we canextract another 20 times 2 = 40 quantities from (16) to definethe components of (11) that is

120593119895=

1199081119886

119895

1when 119895 is an odd number

1199082119886

119895

2when 119895 is an even number

1 le 119895 le 20

(22)

where 1198861198951and 119886119895

2are given by (20) 119908

1and 119908

2are weight

factors which were all set to 1 in the current studySubstituting the elements in (12) (14) (18) and (22) we

finally obtain a total of Ω = 20 + 400 + 20 + 40 = 480

components for the PseAAC of (11) where

120595119906=

119891119906

when 1 le 119906 le 20119891

(2)

119906when 21 le 119906 le 420

ℓ119906

when 421 le 119906 le 440120593119906

when 440 le 119906 le 480

(23)

In other words in this study (11) or Choursquos PseAAC is a 480-D vector whose 480 components are given by (23) derivedfrom the amino acid composition dipeptide compositionsequential evolution information and grey system theory(e) Representing Enzyme-Drug Pairs Now the pair betweenan enzyme molecule E and a drug compound D can beformulated by combing (8) and (11) as given by

G = D oplus E

= [1198601sdot sdot sdot 119860

256119867119909

CF 1205951sdot sdot sdot 120595480]

T

(24)

6 BioMed Research International

where G represents the enzyme-drug pair oplus the orthogonalsum [51] and each of the (258 + 480) = 738 feature elementsis given in (8) and (23)

For the convenience of the later formulation let us use119909119894(119894 = 1 2 738) to represent the 738 components of (24)

that is

G = [11990911199092sdot sdot sdot 119909119894sdot sdot sdot 119909738]

T (25)

To optimize the prediction results different weights wereusually tested for each of the elements in (25) However sinceit would consume a lot of computational time for a total of 738weight factors here let us adopt the normalization approachto deal with this problem as done in [98 99] that is convert119909119894in (25) to 119910

119894according to the following equation

119910119894=

2tanminus1 (119909119894)

120587

(119894 = 1 2 738) (26)

where tanminus1 means arctangent By means of (26) everycomponent in (25) will be converted into the range of[minus1 1] that is we have minus1 le 119910

119894le 1 As demonstrated

in [98 99] the normalization approach via (26) was quiteeffective in enhancing the quality of prediction operated ina high dimension space Therefore in this study we wouldnot to take the procedure of optimizing the weight factorssignificantly reducing the computational times

23 Fuzzy 119870-Nearest Neighbour Algorithm The 119870-NN (119870-Nearest Neighbor) classifier is quite popular in patternrecognition community owing to its good performance andsimple-to-use feature According to the 119870-NN rule [100]named also as the ldquovoting 119870-NN rulerdquo the query sampleshould be assigned to the subset represented by a majorityof its 119870 nearest neighbors as illustrated in Figure 5 of [33]

Fuzzy 119870-NN classification method [101] is a specialvariation of the119870-NNclassification family Instead of roughlyassigning the label based on a voting from the 119870 nearestneighbors it attempts to estimate the membership valuesthat indicate how much degree the query sample belongsto the classes concerned Obviously it is impossible for anycharacteristic description to contain complete informationwhich would make the classification ambiguous In view ofthis the fuzzy principle is very reasonable and particularlyuseful in dealing with complicated biological systems such asidentifying nuclear receptor subfamilies [102] characterizingthe structure of fast-folding proteins [103] classifying Gprotein-coupled receptors [104] predicting protein quater-nary structural attributes [105] predicting protein structuralclasses [106 107] and so forth

Next let us give a brief introduction how to use thefuzzy119870-NN approach to identify the interactions between theenzymes and the drug compounds in the network concerned

Supposing that S(119873) = G1G2 G

119873 is a set of vec-

tors representing 119873 enzyme-drug pairs in a training set

classified into two classes 119862+ 119862minus where 119862+ denotes theinteractive pair class while 119862minus the noninteractive pair classSlowast(G) = Glowast

1Glowast2 Glowast

119870 sub S(119873) is the subset of the 119870

nearest neighbor pairs to the query pair G Thus the fuzzymembership value for the query pair G in the two classes ofS(119873) is given by

120583

+

(G) =sum

119870

119895=1120583

+(Glowast119895) 119889(GGlowast

119895)

minus2(120593minus1)

sum

119870

119895=1119889(GGlowast

119895)

minus2(120593minus1)

120583

minus

(G) =sum

119870

119895=1120583

minus(Glowast119895) 119889(GGlowast

119895)

minus2(120593minus1)

sum

119870

119895=1119889(GGlowast

119895)

minus2(120593minus1)

(27)

where 119870 is the number of the nearest neighbors counted forthe query pairG 120583+(Glowast

119895) and 120583minus(Glowast

119895) the fuzzy membership

values of the training sample Glowast119895to the class 119862+ and 119862minus

respectively as will be further defined next 119889(GGlowast119895) the

cosine distance between G and its 119895th nearest pair Glowast119895in

the training dataset S(119873) 120593(gt 1) the fuzzy coefficient fordetermining how heavily the distance is weighted whencalculating each nearest neighborrsquos contribution to the mem-bership value Note that the parameters 119870 and 120593 will affectthe computation result of (27) and they will be optimizedby a grid-search as will be described later Also variousother metrics can be chosen for 119889(GGlowast

119895) such as Euclidean

distance Hamming distance [108] andMahalanobis distance[55 109]

The quantitative definitions for the aforementioned120583

+(Glowast119895) and 120583minus(Glowast

119895) in (27) are given by

120583

+

(Glowast119895) =

1 if Glowast119895isin 119862

+

0 otherwise

120583

minus

(Glowast119895) =

1 if Glowast119895isin 119862

minus

0 otherwise

(28)

Substituting the results obtained by (27) into (28) it followsthat if 120583+(G) gt 120583

minus(G) then the query pair G is an

interactive couple otherwise noninteractive In other wordsthe outcome can be formulated as

G isin

119862

+ if 120583+ (G) gt 120583minus (G)

119862

minus otherwise

(29)

If there is a tie between 120583+(G) and 120583minus(G) the query pair Gwill be randomly assigned to one of the two classes Howevercase like that is quite rare and in this study never happened

The predictor thus established is called iEzy-Drugwhere ldquoirdquo means identify and ldquoEzy-Drugrdquo means the inter-action between enzyme and drug To provide an intuitiveoverall picture a flowchart is provided in Figure 2 to showthe process of how the classifier works in identifying enzyme-drug interactions

BioMed Research International 7

Input a queryprotein sequence and

a drug code

Create Chous PseAAC of a protein

Create feature vector of a drug

Feature fusion

engineTraining dataset

Yes No

Noninteractive enzyme drug pair

Interactive enzymedrug pair

AACDipeptideSeq Evol

Grey model

Mol FptInfo entropy

CF

FKNN operation

Figure 2 A flowchart to show the operation process of the iEzy-Drug predictor See the text for further explanation

24 Criteria for Performance Evaluation In the literaturethe following equation set is often used for examining theperformance quality of a predictor

Sn = TPTP + FN

Sp = TNTN + FP

Acc = TP + TNTP + TN + FP + FN

MCC = (TP times TN) minus (FP times FN)radic(TP + FP) (TP + FN) (TN + FP) (TN + FN)

(30)

where TP represents the true positive TN the true negativeFP the false positive FN the false negative Sn the sensitiv-ity Sp the specificity Acc the accuracy MCC the Mathewrsquoscorrelation coefficient

To most biologists however the four metrics as formu-lated in (30) are not quite intuitive and easier-to-understandparticularly for the Mathewrsquos correlation coefficient Herelet us adopt the Choursquos symbols to formulate the previousfour metrics By means of Choursquos symbols [111 112] the rates

of correct predictions for the interactive enzyme-drug pairsin dataset S+ and the noninteractive enzyme-drug pairs indataset Sminus are respectively defined by (cf (1))

Λ

+

=

119873

+minus 119873

+

minus

119873

+ for the interactive enzyme-drug pairs

Λ

minus

=

119873

minusminus 119873

minus

+

119873

minus for the noninteractive enzyme-drug pairs

(31)

where 119873+ is the total number of the interactive enzyme-drug pairs investigated while 119873+

minusis the number of the

interactive enzyme-drug pairs incorrectly predicted as thenoninteractive enzyme-drug pairs 119873minus is the total numberof the noninteractive enzyme-drug pairs investigated while119873

minus

+is the number of the noninteractive enzyme-drug pairs

incorrectly predicted as the interactive enzyme-drug pairsThe overall success prediction rate is given by [113] as follows

Λ =

Λ

+119873

++ Λ

minus119873

minus

119873

++ 119873

minus= 1 minus

119873

+

minus+ 119873

minus

+

119873

++ 119873

minus

(32)

It is obvious from (31)-(32) that if and only if none ofthe interactive enzyme-drug pairs and the noninteractiveenzyme-drug pairs are mispredicted that is 119873+

minus= 119873

minus

+= 0

and Λ+ = Λ

minus= 1 we have the overall success rate Λ = 1

Otherwise the overall success rate would be smaller than 1The relations between the symbols in (32) and those in

(30) are given by

TP = 119873+ minus 119873+minus

TN = 119873

minus

minus 119873

minus

+

FP = 119873minus+

FN = 119873

+

minus

(33)

Substituting (33) into (30) and also noting (31)-(32) we obtain

Sn = Λ+ = 1 minus119873

+

minus

119873

+

Sp = Λminus = 1 minus119873

minus

+

119873

minus

Acc = Λ = 1 minus119873

+

minus+ 119873

minus

+

119873

++ 119873

minus

MCC =1 minus ((119873

+

minus119873

+) + (119873

minus

+119873

minus))

radic(1 + (119873

minus

+minus 119873

+

minus) 119873

+) (1 + (119873

+

minusminus 119873

minus

+) 119873

minus)

(34)

Now it is obvious to see from (34) when 119873

+

minus= 0

meaning none of the interactive enzyme-drug pairs was

8 BioMed Research International

02

46

810

1

15

2089

0895

09

0905

091

0915

092

ACC

K120593

Figure 3 A 3D plot to show how the parameter in (27) wasoptimized for the iEzy-Drug predictor

mispredicted to be a noninteractive enzyme-drug pair wehave the sensitivity Sn = 1 while 119873+

minus= 119873

+ meaning thatall the interactive enzyme-drug pairs weremispredicted to bethe noninteractive enzyme-drug pairs we have the sensitivitySn = 0 Likewise when 119873minus

+= 0 meaning none of the

noninteractive enzyme-drug pairs wasmispredicted we havethe specificity Sp = 1 while 119873minus

+= 119873

minus meaning all thenoninteractive enzyme-drug pairs were incorrectly predictedas interactive enzyme-drug pairs we have the specificity Sp =0 When119873+

minus= 119873

minus

+= 0 meaning that none of the interactive

enzyme-drug pairs in the dataset S+ and none of the nonin-teractive enzyme-drug pairs in Sminus was incorrectly predictedwe have the overall accuracy Acc = Λ = 1 while 119873+

minus= 119873

+

and 119873minus+= 119873

minus meaning that all the interactive enzyme-drugpairs in the dataset S+ and all the noninteractive enzyme-drug pairs in Sminus were mispredicted we have the overallaccuracy Acc = Λ = 0 The MCC correlation coefficient isusually used for measuring the quality of binary (two-class)classifications When 119873+

minus= 119873

minus

+= 0 meaning that none

of the interactive enzyme-drug pairs in the dataset S+ andnone of the noninteractive enzyme-drug pairs in Sminus weremispredicted we have MCC = 1 when 119873+

minus= 119873

+2 and

119873

minus

+= 119873

minus2 we have MCC = 0 meaning no better than

random prediction when 119873+minus= 119873

+ and 119873minus+= 119873

minus we haveMCC = minus1 meaning total disagreement between predictionand observation As we can see from the previous discussionit is much more intuitive and easier-to-understand whenusing (34) to examine a predictor for its sensitivity specificityoverall accuracy and Mathewrsquos correlation coefficient It isinstructive to point out that themetrics as defined in (30) and(34) are valid for single label systems for multilabel systemsa set of more complicated metrics should be used as given in[114]

3 Results and Discussion

31 Cross-Validation How to properly examine the predic-tion quality is a key for developing a new predictor and

False positive rate

True

pos

itive

rate

1

01

02

03

04

05

06

07

08

09

0

101 02 03 04 05 06 07 08 090

BaselineiEzy-Drug AUC = 09377

Figure 4 A plot for the ROC curve to quantitatively show theperformance of the iEzy-Drug predictor

estimating its potential application value Generally speak-ing the following three cross-validation methods are oftenused to examine a predictor of its effectiveness in practicalapplication independent dataset test subsampling or 119870-fold(such as 5-fold 7-fold or 10-fold) test and jackknife test[108] However as elaborated by a penetrating analysis in[115] considerable arbitrariness exists in the independentdataset test Also as demonstrated by (27)ndash(29) in [33]the subsampling test (or 119870-fold cross-validation) cannotavoid arbitrariness either Only the jackknife test is the leastarbitrary that can always yield a unique result for a givenbenchmark dataset Therefore the jackknife test has beenwidely recognized and increasingly utilized by investigatorsto examine the quality of various predictors (see eg [66 7174 80]) Accordingly the success rate by the jackknife test wasalso used to optimize the two uncertain parameters 119870 and 120593in (27) The result thus obtained is shown in Figure 3 fromwhich we obtain when 119870 = 6 and 120593 = 15 the iEzy-Drugpredictor reaches its optimized status

The success rates thus obtained by the jackknife test inidentifying interactive Enzyme-drug pairs or noninteractiveenzyme-drug pairs on the benchmark dataset S (cf OnlineSupporting Information S1) are given in Table 1 where forfacilitating comparison the corresponding result by He et al[110] is also given As we can see from the table the overallaccuracy Acc achieved by iEzy-Drug was 9103 remarkablyhigher than 8548 the corresponding rate obtained byHe et al [110] on the same benchmark Furthermore listedin Table 1 are also the values obtained by iEzy-Drug for theother three metrics that is Sn = 9081 Sp = 9114 andMCC = 8039 indicating that the accuracy of iEzy-Drug isnot only very high but also quite stale

To provide a graphical illustration to show the per-formance of the current binary classifier iEzy-Drug asits discrimination threshold is varied a 2D plot calledReceiver Operating Characteristic (ROC) curve [116 117]was also given (Figure 4) In the ROC curve the verticalcoordinate 119884 is for the true positive rate or Sn (cf (34))while horizontal coordinate 119883 for the false positive rate

BioMed Research International 9

Read Me Data Citation

Enter both the protein sequence and the drug code (example) the number of query pairs is limited at 10 or less for each submission

Submit Clear

You can download the program and run it on your own local computer

iEzy-Drug identifying the interaction between enzymes anddrugs in cellular networking

Figure 5 A semiscreenshot to show the top page of the iEzy-Drugweb-server Its web-site address is at httpwwwjci-bioinfocniEzy-Drug

Table 1 The jackknife success rates obtained with iEzy-Drug in identifying interactive enzyme-drug pairs and noninteractive enzyme-drugpairs for the benchmark dataset S (cf Online Supporting Information S1)

Method Acc Sn Sp MCC

iEzy-Druga 74258157 = 9103 24692719 = 9081 49565438 = 9114 8039NN predictorb 8548 NA NA NAaSee (27) where the parameters119870 = 6 and 120593 = 15bSee [110]

or 1-Sp The best possible prediction method would yielda point with the coordinate (0 1) representing 100 truepositive rate (sensitivity Sn) and 0 false positive rate or100 specificity Therefore the (0 1) point is also calleda perfect classification A completely random guess wouldgive a point along a diagonal from the point (0 0) to (1 1)The area under the ROC curve also called Area Under theROC (AUROC) is often used to indicate the performancequality of a binary classifier the value 05 of AUROCis equivalent to random prediction while 1 of AUROCrepresents a perfect one As we can see from Figure 4the AUROC value obtained by iEzy-Drug is 09377

The reason why iEzy-Drug can remarkably improve theprediction quality is that it has introduced the 2D molecularfingerprints to represent drug samples see Online SupportingInformation S3 for the detailed fingerprint expressions for thedrugs listed in Online Supporting Information S1 and thatit has successfully used PseAAC to incorporate the featuresderived from the sequences of enzymes that are essentialfor identifying the interaction of enzymes with drugs in thecellular networking

To enhance the value of its practical applications theweb server for iEzy-Drug has been established that canbe freely accessible at httpwwwjci-bioinfocniEzy-DrugIt is anticipated that the web server will become a usefulhigh throughput tool for both basic research and drug

development in the relevant areas or at the very least playa complementary role to the existing method [39 110 118] forwhich so far noweb-serverwhatsoever has been provided yet

32 The Protocol or User Guide For the convenience of thevast majority of biologists and pharmaceutical scientists herelet us provide a step-by-step guide to show how the userscan easily get the desired result by means of the web serverwithout the need to follow the complicated mathematicalequations presented in this paper for the process of develop-ing the predictor and its integrityStep 1 Open the web server at the site httpwwwjci-bio-infocniEzy-Drug and you will see the top page of thepredictor on your computer screen as shown in Figure 5Click on the ReadMe button to see a brief introduction aboutiEzy-Drug predictor and the caveat when using itStep 2 Either type or copypaste the query pairs into theinput box at the center of Figure 5 Each query pair consistsof two parts one is for the protein sequence and the other forthe drug The enzyme sequence should be in FASTA formatwhile the drug in the KEGG code Examples for the querypairs input can be seen by clicking on the Example buttonright above the input boxStep 3 Click on the Submit button to see the predicted resultFor example if you use the four query pairs in the Example

10 BioMed Research International

window as the input after clicking the Submit button youwill see on your screen that the ldquohsa 10056rdquo enzyme andthe ldquoD0021rdquo drug are an interactive pair and that the ldquohsa100rdquo enzyme and the ldquoD0037rdquo drug are also an interactivepair but that the ldquohsa 3295rdquo enzyme and the ldquoD00889rdquo drugare not an interactive pair and that the ldquohsa 7366rdquo enzymeand the ldquoD03601rdquo drug are not an interactive pair eitherAll these results are fully consistent with the experimentalobservations It takes about 3 minutes before the results areshown on the screenStep 4 Click on the Citation button to find the relevant paperthat documents the detailed development and algorithm ofiEzy-DurgStep 5 Click on the Data button to download the benchmarkdataset used to train and test the iEzy-Durg predictorStep 6 The program code is also available by clicking thebutton download on the lower panel of Figure 5

Acknowledgments

The authors would like to thank the three anonymousreviewers whose constructive comments are very helpful forstrengthening the presentation of the paper This work wassupported by the Grants from the National Natural ScienceFoundation of China (no 31260273) the Jiangxi ProvincialForeign Scientific and Technological Cooperation Project(no 20120BDH80023) Natural Science Foundation of JiangxiProvince China (no 2010GZS0122 20122BAB201020) theDepartment of Education of Jiangxi Province (GJJ12490)the LuoDi plan of the Department of Education of JiangxiProvince (KJLD12083) and the Jiangxi Provincial Founda-tion for Leaders of Disciplines in Science (20113BCB22008)The funders had no role in study design data collection andanalysis decision to publish or preparation of the paper

References

[1] A Bairoch ldquoThe ENZYME database in 2000rdquo Nucleic AcidsResearch vol 28 no 1 pp 304ndash305 2000

[2] S H Koenig and R D Brown ldquoH2CO3as substrate for carbonic

anhydrase in the dehydration of H2CO3(minus)rdquo Proceedings of the

National Academy of Sciences of the United States of Americavol 69 no 9 pp 2422ndash2425 1972

[3] S X Lin and J Lapointe ldquoTheoretical and experimental biologyin onerdquo Journal of Biomedical Science and Engineering vol 6 pp435ndash442 2013

[4] K C Chou and S P Jiang ldquoStudies on the rate of diffusion-controlled reactions of enzymes Spatial factor and force fieldfactorrdquo Scientia Sinica vol 17 no 5 pp 664ndash680 1974

[5] K C Chou ldquoThe kinetics of the combination reaction betweenenzyme and substraterdquo Scientia Sinica vol 19 no 4 pp 505ndash528 1976

[6] K C Chou and G P Zhou ldquoRole of the protein outside activesite on the diffusion-controlled reaction of enzymerdquo Journal ofthe American Chemical Society vol 104 no 5 pp 1409ndash14131982

[7] R A Poorman A G Tomasselli R L Heinrikson and FJ Kezdy ldquoA cumulative specificity model for proteases from

human immunodeficiency virus types 1 and 2 inferred fromstatistical analysis of an extended substrate data baserdquo TheJournal of Biological Chemistry vol 266 no 22 pp 14554ndash145611991

[8] K-C Chou ldquoA vectorized sequence-coupling model for pre-dicting HIV protease cleavage sites in proteinsrdquo The Journal ofBiological Chemistry vol 268 no 23 pp 16938ndash16948 1993

[9] G Z Liang and S Z Li ldquoA new sequence representation asapplied in better specificity elucidation for human immunod-eficiency virus type 1 proteaserdquo Biopolymers vol 88 no 3 pp401ndash412 2007

[10] J J Chou ldquoPredicting cleavability of peptide sequences byHIVprotease via correlation-angle approachrdquo Journal of ProteinChemistry vol 12 no 3 pp 291ndash302 1993

[11] Q S Du S Wang D Q Wei S Sirois and K C ChouldquoMolecular modeling and chemical modification for findingpeptide inhibitor against severe acute respiratory syndromecoronavirus main proteinaserdquo Analytical Biochemistry vol 337no 2 pp 262ndash270 2005

[12] Y Gan H Huang Y Huang et al ldquoSynthesis and activity ofan octapeptide inhibitor designed for SARS coronavirus mainproteinaserdquo Peptides vol 27 no 4 pp 622ndash625 2006

[13] Q S Du H Sun and K C Chou ldquoInhibitor design for SARScoronavirus main protease based on lsquodistorted key theoryrsquordquoMedicinal Chemistry vol 3 no 1 pp 1ndash6 2007

[14] K C Chou ldquoPrediction of human immunodeficiency virusprotease cleavage sites in proteinsrdquoAnalytical Biochemistry vol233 no 1 pp 1ndash14 1996

[15] J Knowles and G Gromo ldquoTarget selection in drug discoveryrdquoNature Reviews Drug Discovery vol 2 no 1 pp 63ndash69 2003

[16] A C Cheng R G Coleman K T Smyth et al ldquoStructure-basedmaximal affinity model predicts small-molecule druggabilityrdquoNature Biotechnology vol 25 no 1 pp 71ndash75 2007

[17] M Rarey B Kramer T Lengauer and G Klebe ldquoA fast flexibledocking method using an incremental construction algorithmrdquoJournal of Molecular Biology vol 261 no 3 pp 470ndash489 1996

[18] K C Chou ldquoReview structural bioinformatics and its impactto biomedical sciencerdquo Current Medicinal Chemistry vol 11 no16 pp 2105ndash2134 2004

[19] K C Chou D Q Wei and W Z Zhong ldquoBinding mechanismof coronavirus main proteinase with ligands and its implicationto drug design against SARSrdquo Biochemical and BiophysicalResearch Communications vol 308 pp 148ndash151 2003 (Erratumin Biochemical and Biophysical Research Communications vol310 p 675 2003)

[20] K C Chou D Q Wei Q S Du S Sirois and W Z ZhongldquoProgress in computational approach to drug developmentagaint SARSrdquo Current Medicinal Chemistry vol 13 no 27 pp3263ndash3270 2006

[21] G P Zhou and F A Troy ldquoNMR studies on how the bindingcomplex of polyisoprenol recognition sequence peptides andpolyisoprenols can modulate membrane structurerdquo CurrentProtein and Peptide Science vol 6 no 5 pp 399ndash411 2005

[22] R B Huang Q S Du C H Wang and K C Chou ldquoAn in-depth analysis of the biological functional studies based on theNMR M2 channel structure of influenza A virusrdquo Biochemicaland Biophysical Research Communications vol 377 no 4 pp1243ndash1247 2008

[23] Q S Du R B Huang C H Wang X M Li and K C ChouldquoEnergetic analysis of the two controversial drug binding sitesof the M2 proton channel in influenza A virusrdquo Journal ofTheoretical Biology vol 259 no 1 pp 159ndash164 2009

BioMed Research International 11

[24] M J Berardi W M Shih S C Harrison and J J Chou ldquoMito-chondrial uncoupling protein 2 structure determined by NMRmolecular fragment searchingrdquo Nature vol 476 no 7358 pp109ndash114 2011

[25] J R Schnell and J J Chou ldquoStructure andmechanism of theM2proton channel of influenza A virusrdquo Nature vol 451 no 7178pp 591ndash595 2008

[26] B OuYang S Xie M J Berardi et al ldquoUnusual architecture ofthe p7 channel from hepatitis C virusrdquoNature vol 498 pp 521ndash525 2013

[27] K Oxenoid and J J Chou ldquoThe structure of phospholambanpentamer reveals a channel-like architecture in membranesrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 102 no 31 pp 10870ndash10875 2005

[28] M E Call KWWucherpfennig and J J Chou ldquoThe structuralbasis for intramembrane assembly of an activating immunore-ceptor complexrdquo Nature Immunology vol 11 no 11 pp 1023ndash1029 2010

[29] R M Pielak J R Schnell and J J Chou ldquoMechanism of druginhibition and drug resistance of influenza A M2 channelrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 106 no 18 pp 7379ndash7384 2009

[30] J Wang R M Pielak M A McClintock and J J ChouldquoSolution structure and functional analysis of the influenza Bproton channelrdquo Nature Structural and Molecular Biology vol16 no 12 pp 1267ndash1271 2009

[31] K C Chou ldquoCoupling interaction between thromboxane A2receptor and alpha-13 subunit of guanine nucleotide-bindingproteinrdquo Journal of Proteome Research vol 4 no 5 pp 1681ndash1686 2005

[32] K C Chou ldquoInsights from modeling three-dimensional struc-tures of the human potassium and sodium channelsrdquo Journal ofProteome Research vol 3 no 4 pp 856ndash861 2004

[33] K C Chou ldquoSome remarks on protein attribute prediction andpseudo amino acid compositionrdquo Journal ofTheoretical Biologyvol 273 no 1 pp 236ndash247 2011

[34] W Z Lin J A Fang X Xiao and K C Chou ldquoiLoc-animal amulti-label learning classifier for predicting subcellular local-ization of animal proteinsrdquo Molecular BioSystems vol 9 pp634ndash644 2013

[35] X Xiao PWangW Z Lin J H Jia and K C Chou ldquoiAMP-2La two-level multi-label classifier for identifying antimicrobialpeptides and their functional typesrdquo Analytical Biochemistryvol 436 pp 168ndash177 2013

[36] W Chen P M Feng H Lin and K C Chou ldquoiRSpot-PseDNCidentify recombination spots with pseudo dinucleotide compo-sitionrdquo Nucleic Acids Research vol 41 article e69 2013

[37] K C Chou Z C Wu and X Xiao ldquoILoc-Hum using theaccumulation-label scale to predict subcellular locations ofhuman proteins with both single and multiple sitesrdquoMolecularBioSystems vol 8 no 2 pp 629ndash641 2012

[38] M Kotera M Hirakawa T Tokimatsu S Goto and MKanehisa ldquoThe KEGG databases and tools facilitating omicsanalysis latest developments involving human diseases andpharmaceuticalsrdquo Methods in Molecular Biology vol 802 pp19ndash39 2012

[39] Y YamanishiM Araki A GutteridgeWHonda andM Kane-hisa ldquoPrediction of drug-target interaction networks from theintegration of chemical and genomic spacesrdquo Bioinformaticsvol 24 no 13 pp i232ndashi240 2008

[40] P Finn S Muggleton D Page and A Srinivasan ldquoPharma-cophore discovery using the Inductive Logic Programmingsystem PROGOLrdquo Machine Learning vol 30 no 2-3 pp 241ndash270 1998

[41] I Vogt D Stumpfe H E Ahmed and J Bajorath ldquoMethodsfor computer-aided chemical biology Part 2 evaluation of com-pound selectivity using 2D molecular fingerprintsrdquo ChemicalBiology and Drug Design vol 70 pp 195ndash205 2007

[42] H Eckert and J Bajorath ldquoMolecular similarity analysis in vir-tual screening foundations limitations and novel approachesrdquoDrug Discovery Today vol 12 no 5-6 pp 225ndash233 2007

[43] S Laurent L V Elst and R N Muller ldquoComparative studyof the physicochemical properties of six clinical low molecularweight gadolinium contrast agentsrdquo Contrast Media amp Molecu-lar Imaging vol 1 no 3 pp 128ndash137 2006

[44] E Gregori-Puigjane R Garriga-Sust and J Mestres ldquoIndexingmolecules with chemical graph identifiersrdquo Journal of Compu-tational Chemistry vol 32 no 12 pp 2638ndash2646 2011

[45] B Ren ldquoApplication of novel atom-type AI topological indicesto QSPR studies of alkanesrdquo Computers and Chemistry vol 26no 4 pp 357ndash369 2002

[46] N M OrsquoBoyle M Banck C A James C Morley T Vander-meersch and G R Hutchison ldquoOpen Babel an open chemicaltoolboxrdquo Journal of Cheminformatics vol 3 p 33 2011

[47] V J Gillet P Willett and J Bradshaw ldquoSimilarity searchingusing reduced graphsrdquo Journal of Chemical Information andComputer Sciences vol 43 no 2 pp 338ndash345 2003

[48] D Butina ldquoUnsupervised data base clustering based on day-lightrsquos fingerprint and Tanimoto similarity a fast and automatedway to cluster small and large data setsrdquo Journal of ChemicalInformation and Computer Sciences vol 39 no 4 pp 747ndash7501999

[49] C E Shannon ldquoA mathematical theory of communicationrdquoACM SIGMOBILE Mobile Computing and CommunicationsReview vol 5 pp 3ndash55 2001

[50] V D Gusev L A Nemytikova and N A Chuzhanova ldquoOn thecomplexity measures of genetic sequencesrdquo Bioinformatics vol15 no 12 pp 994ndash999 1999

[51] K C Chou and H B Shen ldquoReview recent progress in proteinsubcellular location predictionrdquo Analytical Biochemistry vol370 no 1 pp 1ndash16 2007

[52] S F Altschul ldquoEvaluating the statistical significance of multipledistinct local alignmentsrdquo in Theoretical and ComputationalMethods in Genome Research S Suhai Ed pp 1ndash14 PlenumNew York NY USA 1997

[53] J C Wootton and S Federhen ldquoStatistics of local complexity inamino acid sequences and sequence databasesrdquo Computers andChemistry vol 17 no 2 pp 149ndash163 1993

[54] H Nakashima K Nishikawa and T Ooi ldquoThe folding type ofa protein is relevant to the amino acid compositionrdquo Journal ofBiochemistry vol 99 no 1 pp 153ndash162 1986

[55] K C Chou and C T Zhang ldquoPredicting protein folding typesby distance functions that make allowances for amino acidinteractionsrdquo The Journal of Biological Chemistry vol 269 no35 pp 22014ndash22020 1994

[56] K-C Chou ldquoA novel approach to predicting protein structuralclasses in a (20-1)-D amino acid composition spacerdquo Proteinsvol 21 no 4 pp 319ndash344 1995

[57] H Nakashima and K Nishikawa ldquoDiscrimination of intracel-lular and extracellular proteins using amino acid compositionand residue-pair frequenciesrdquo Journal of Molecular Biology vol238 no 1 pp 54ndash61 1994

12 BioMed Research International

[58] G P Zhou ldquoAn intriguing controversy over protein structuralclass predictionrdquo Journal of Protein Chemistry vol 17 no 8 pp729ndash738 1998

[59] I Bahar A R Atilgan R L Jernigan and B Erman ldquoUnder-standing the recognition of protein structural classes by aminoacid compositionrdquo Proteins vol 29 pp 172ndash185 1997

[60] J Cedano P Aloy J A Perez-Pons and E Querol ldquoRelationbetween amino acid composition and cellular location ofproteinsrdquo Journal of Molecular Biology vol 266 no 3 pp 594ndash600 1997

[61] G P Zhou and K Doctor ldquoSubcellular location prediction ofapoptosis proteinsrdquo Proteins vol 50 no 1 pp 44ndash48 2003

[62] KChou ldquoPrediction of protein cellular attributes using pseudo-amino acid compositionrdquo Proteins vol 43 pp 246ndash255 2001(Erratum in Proteins vol 44 p 60 2001)

[63] K C Chou ldquoUsing amphiphilic pseudo amino acid composi-tion to predict enzyme subfamily classesrdquo Bioinformatics vol21 no 1 pp 10ndash19 2005

[64] D Zou Z He J He and Y Xia ldquoSupersecondary structureprediction using Choursquos pseudo amino acid compositionrdquoJournal of Computational Chemistry vol 32 no 2 pp 271ndash2782011

[65] M M Beigi M Behjati and H Mohabatkar ldquoPrediction ofmetalloproteinase family based on the concept ofChoursquos pseudoamino acid composition using a machine learning approachrdquoJournal of Structural and Functional Genomics vol 12 no 4 pp191ndash197 2011

[66] Y K Chen and K B Li ldquoPredicting membrane protein typesby incorporating protein topology domains signal peptidesand physicochemical properties into the general form of Choursquospseudo amino acid compositionrdquo Journal ofTheoretical Biologyvol 318 pp 1ndash12 2013

[67] C Huang and J Q Yuan ldquoA multilabel model based on Choursquospseudo-amino acid composition for identifying membraneproteins with both single and multiple functional typesrdquo TheJournal of Membrane Biology vol 246 pp 327ndash334 2013

[68] S S Sahu and G Panda ldquoA novel feature representationmethod based on Choursquos pseudo amino acid composition forprotein structural class predictionrdquo Computational Biology andChemistry vol 34 no 5-6 pp 320ndash327 2010

[69] M Hayat and A Khan ldquoDiscriminating outer membraneproteins with fuzzy K-nearest neighbor algorithms based on thegeneral form of Choursquos PseAACrdquo Protein and Peptide Lettersvol 19 no 4 pp 411ndash421 2012

[70] MKhosravian F K FaramarziMM BeigiM Behbahani andH Mohabatkar ldquoPredicting antibacterial peptides by the con-cept of Choursquos pseudo-amino acid composition and machinelearning methodsrdquo Protein amp Peptide Letters vol 20 pp 180ndash186 2013

[71] HMohabatkarMM Beigi K Abdolahi and SMohsenzadehldquoPrediction of allergenic proteins by means of the concept ofChoursquos pseudo amino acid composition and amachine learningapproachrdquoMedicinal Chemistry vol 9 pp 133ndash137 2013

[72] L Nanni A Lumini D Gupta and A Garg ldquoIdentifyingbacterial virulent proteins by fusing a set of classifiers basedon variants of Choursquos pseudo amino acid composition and onevolutionary informationrdquo IEEEACM Transactions on Compu-tational Biology and Bioinformatics vol 9 no 2 pp 467ndash4752012

[73] T H Chang L C Wu T Y Lee S P Chen H D Huang andJ T Horng ldquoEuLoc a web-server for accurately predict protein

subcellular localization in eukaryotes by incorporating variousfeatures of sequence segments into the general form of ChoursquosPseAACrdquo Journal of Computer-Aided Molecular Design vol 27pp 91ndash103 2013

[74] S Zhang Y Zhang H Yang C Zhao and Q Pan ldquoUsing theconcept of Choursquos pseudo amino acid composition to predictprotein subcellular localization an approach by incorporatingevolutionary information and von Neumann entropiesrdquo AminoAcids vol 34 no 4 pp 565ndash572 2008

[75] R Zia Ur and A Khan ldquoIdentifying GPCRs and their typeswith Choursquos pseudo amino acid composition an approach frommulti-scale energy representation and position specific scoringmatrixrdquo Protein amp Peptide Letters vol 19 pp 890ndash903 2012

[76] X Y Sun S P Shi J D Qiu S B Suo S Y Huang and R PLiang ldquoIdentifying protein quaternary structural attributes byincorporating physicochemical properties into the general formof Choursquos PseAAC via discrete wavelet transformrdquo MolecularBioSystems vol 8 pp 3178ndash3184 2012

[77] L Nanni and A Lumini ldquoGenetic programming for creatingChoursquos pseudo amino acid based features for submitochondrialocalizationrdquo Amino Acids vol 34 no 4 pp 653ndash660 2008

[78] M Esmaeili H Mohabatkar and S Mohsenzadeh ldquoUsing theconcept of Choursquos pseudo amino acid composition for risk typeprediction of human papillomavirusesrdquo Journal of TheoreticalBiology vol 263 no 2 pp 203ndash209 2010

[79] H Mohabatkar ldquoPrediction of cyclin proteins using Choursquospseudo amino acid compositionrdquo Protein and Peptide Lettersvol 17 no 10 pp 1207ndash1214 2010

[80] H Mohabatkar M M Beigi and A Esmaeili ldquoPredictionof GABA(A) receptor proteins using the concept of Choursquospseudo-amino acid composition and support vector machinerdquoJournal of Theoretical Biology vol 281 no 1 pp 18ndash23 2011

[81] Y Xu J Ding L Y Wu and K C Chou ldquoiSNO-PseAAC pre-dict cysteine S-nitrosylation sites in proteins by incorporatingposition specific amino acid propensity into pseudo amino acidcompositionrdquo PLoS ONE vol 8 Article ID e55844 2013

[82] W Chen H Lin PM Feng C Ding Y C Zuo andK C ChouldquoiNuc-PhysChem a sequence-based predictor for identifyingnucleosomes via physicochemical propertiesrdquo PLoS ONE vol7 Article ID e47843 2012

[83] B Li T Huang L Liu Y Cai and K C Chou ldquoIdentificationof colorectal cancer related genes with mrmr and shortest pathin protein-protein interaction networkrdquo PLoS ONE vol 7 no 4Article ID e33393 2012

[84] Y Jiang T Huang C Lei Y F Gao Y D Cai and K C ChouldquoSignal propagation in protein interaction network duringcolorectal cancer progressionrdquo BioMed Research Internationalvol 2013 Article ID 287019 9 pages 2013

[85] P Du X Wang C Xu and Y Gao ldquoPseAAC-Builder across-platform stand-alone program for generating variousspecial Choursquos pseudo-amino acid compositionsrdquo AnalyticalBiochemistry vol 425 no 2 pp 117ndash119 2012

[86] D S Cao Q S Xu and Y Z Liang ldquoPropy a tool to generatevarious modes of Choursquos PseAACrdquo Bioinformatics vol 29 pp960ndash962 2013

[87] H Shen and K C Chou ldquoPseAAC a flexible web serverfor generating various kinds of protein pseudo amino acidcompositionrdquo Analytical Biochemistry vol 373 no 2 pp 386ndash388 2008

[88] K C Chou ldquoUsing pair-coupled amino acid composition topredict protein secondary structure contentrdquo Journal of ProteinChemistry vol 18 no 4 pp 473ndash480 1999

BioMed Research International 13

[89] W Liu and K C Chou ldquoPrediction of protein secondarystructure contentrdquo Protein Engineering vol 12 no 12 pp 1041ndash1050 1999

[90] M Bhasin and G P S Raghava ldquoClassification of nuclearreceptors based on amino acid composition and dipeptidecompositionrdquo The Journal of Biological Chemistry vol 279 no22 pp 23262ndash23266 2004

[91] H Lin andH Ding ldquoPredicting ion channels and their types bythe dipeptidemode of pseudo amino acid compositionrdquo Journalof Theoretical Biology vol 269 no 1 pp 64ndash69 2011

[92] H Lin and Q Li ldquoUsing pseudo amino acid composition topredict protein structural class approached by incorporating400 dipeptide componentsrdquo Journal of Computational Chem-istry vol 28 no 9 pp 1463ndash1466 2007

[93] L Nanni and A Lumini ldquoCombing ontologies and dipeptidecomposition for predicting DNA-binding proteinsrdquo AminoAcids vol 34 no 4 pp 635ndash641 2008

[94] K-C Chou ldquoThe convergence-divergence duality in lectindomains of selectin family and its implicationsrdquo FEBS Lettersvol 363 no 1-2 pp 123ndash126 1995

[95] A A Schaffer L Aravind T L Madden et al ldquoImprovingthe accuracy of PSI-BLAST protein database searches withcomposition-based statistics and other refinementsrdquo NucleicAcids Research vol 29 no 14 pp 2994ndash3005 2001

[96] S F Altschul and E V Koonin ldquoIterated profile searches withPSI-BLAST a tool for discovery in protein databasesrdquo Trends inBiochemical Sciences vol 23 no 11 pp 444ndash447 1998

[97] J Deng ldquoGrey entropy and grey target decision makingrdquoJournal of Grey System vol 22 no 1 pp 1ndash24 2010

[98] B M Bolstad R A Irizarry M Astrand and T P Speed ldquoAcomparison of normalizationmethods for high density oligonu-cleotide array data based on variance and biasrdquo Bioinformaticsvol 19 no 2 pp 185ndash193 2003

[99] J-Y Shi S-W Zhang Q Pan Y-M Cheng and J XieldquoPrediction of protein subcellular localization by support vectormachines using multi-scale energy and pseudo amino acidcompositionrdquo Amino Acids vol 33 no 1 pp 69ndash74 2007

[100] T Denoeux ldquoA 120581-nearest neighbor classification rule based onDempster-Shafer theoryrdquo IEEE Transactions on Systems Manand Cybernetics vol 25 no 5 pp 804ndash813 1995

[101] J M Keller M R Gray and J A Givens ldquoA fuzzy k-nearestneighbours algorithmrdquo IEEE Transactions on Systems Man andCybernetics vol 15 no 4 pp 580ndash585 1985

[102] X Xiao P Wang and K Chou ldquoiNR-physchem a sequence-based predictor for identifying nuclear receptors and theirsubfamilies via physical-chemical property matrixrdquo PLoS ONEvol 7 no 2 Article ID e30869 2012

[103] I Roterman L Konieczny W Jurkowski K Prymula and MBanach ldquoTwo-intermediate model to characterize the structureof fast-folding proteinsrdquo Journal of Theoretical Biology vol 283no 1 pp 60ndash70 2011

[104] X Xiao P Wang and K C Chou ldquoGPCR-2L predicting Gprotein-coupled receptors and their types by hybridizing twodifferentmodes of pseudo amino acid compositionsrdquoMolecularBioSystems vol 7 no 3 pp 911ndash919 2011

[105] X Xiao P Wang and K C Chou ldquoQuat-2L a web-server forpredicting protein quaternary structural attributesrdquo MolecularDiversity vol 15 no 1 pp 149ndash155 2011

[106] X Zheng C Li and J Wang ldquoAn information-theoreticapproach to the prediction of protein structural classrdquo Journalof Computational Chemistry vol 31 no 6 pp 1201ndash1206 2010

[107] H Shen J Yang X Liu and K C Chou ldquoUsing supervisedfuzzy clustering to predict protein structural classesrdquo Biochem-ical and Biophysical Research Communications vol 334 no 2pp 577ndash581 2005

[108] K-C Chou and C-T Zhang ldquoReview prediction of proteinstructural classesrdquo Critical Reviews in Biochemistry and Molec-ular Biology vol 30 no 4 pp 275ndash349 1995

[109] P C Mahalanobis ldquoOn the generalized distance in statisticsrdquoProceedings of the National Institute of Sciences of India vol 2pp 49ndash55 1936

[110] R M Centor ldquoSignal detectability the use of ROC curves andtheir analysesrdquoMedical Decision Making vol 11 no 2 pp 102ndash106 1991

[111] K-C Chou ldquoUsing subsite coupling to predict signal peptidesrdquoProtein Engineering vol 14 no 2 pp 75ndash79 2001

[112] K C Chou ldquoPrediction of protein signal sequences and theircleavage sitesrdquo Proteins vol 42 pp 136ndash139 2001

[113] K C Chou ldquoPrediction of signal peptides using scaledwindowrdquoPeptides vol 22 no 12 pp 1973ndash1979 2001

[114] K C Chou ldquoSome remarks on predicting multi-label attributesinmolecular biosystemsrdquoMolecular Biosystems vol 9 pp 1092ndash1100 2013

[115] K C Chou and H B Shen ldquoCell-PLoc 2 0 an improvedpackage of web-servers for predicting subcellular localization ofproteins in various organismsrdquoNatural Science vol 2 pp 1090ndash1103 2010

[116] M Gribskov and N L Robinson ldquoUse of receiver operatingcharacteristic (ROC) analysis to evaluate sequence matchingrdquoComputers and Chemistry vol 20 no 1 pp 25ndash33 1996

[117] Y Yamanishi M Kotera M Kanehisa and S Goto ldquoDrug-target interaction prediction from chemical genomic and phar-macological data in an integrated frameworkrdquo Bioinformaticsvol 26 no 12 pp i246ndashi254 2010

[118] Z He J Zhang X Shi et al ldquoPredicting drug-target interactionnetworks based on functional groups and biological featuresrdquoPLoS ONE vol 5 no 3 Article ID e9603 2010

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

BioMed Research International 5

Incorporating the above 400 dipeptide components into (11)we have

120595119906+20

= 119891

(2)

119906(119906 = 1 2 400) (14)

(c) Sequential Evolution Information Biology is a naturalscience with a historic dimension All biological specieshave developed starting out from a very limited number ofancestral species Their evolution involves changes of singleresidues insertions and deletions of several residues [94]gene doubling and gene fusion With these changes accu-mulated for a long period of time many similarities betweeninitial and resultant amino acid sequences are graduallyeliminated but the corresponding proteins may still sharemany common attributes [18] such as having basically thesame biological function and residing at a same subcellularlocation To extract the sequential evolution information anduse it to define the components of (11) the PSSM (PositionSpecific Scoring Matrix) was used as described next

According to Schaffer et al [95] the sequence evolutioninformation of enzyme E with 119871 amino acid residues can beexpressed by an 119871 times 20matrix as given by

P(0)PSSM =

[

[

[

[

[

[

[

119864

0

1rarr1119864

0

1rarr2sdot sdot sdot 119864

0

1rarr20

119864

0

2rarr1119864

0

2rarr2sdot sdot sdot 119864

0

2rarr20

119864

0

119871rarr1119864

0

119871rarr2sdot sdot sdot 119864

0

119871rarr20

]

]

]

]

]

]

]

(15)

where 1198640119894rarr 119895

represents the original score of the 119894th aminoacid residue (119894 = 1 2 119871) in the enzyme sequence changedto amino acid type 119895 (119895 = 1 2 20) in the process ofevolution Here the numerical codes 1 2 20 are used torepresent the 20 native amino acid types denoted by A CD E F G H I K L M N P Q R S T V W and Y The119871 times 20 scores in (15) were generated by using PSI-BLAST[96] to search the UniProtKBSwiss-Prot database (Release2013-05) through three iterations with 0001 as the 119864 valuecutoff for multiple sequence alignment against the sequenceof the enzyme E In order to make every element in (15) bescaled from their original score ranges into region of [0 1]we performed a conversion through the standard sigmoidfunction to make it become

P(1)PSSM =

[

[

[

[

[

[

[

119864

1

1rarr1119864

1

1rarr2sdot sdot sdot 119864

1

1rarr20

119864

1

2rarr1119864

1

2rarr2sdot sdot sdot 119864

1

2rarr20

119864

1

119871rarr1119864

1

119871rarr2sdot sdot sdot 119864

1

119871rarr20

]

]

]

]

]

]

]

(16)

where

119864

1

119894rarr 119895=

1

1 + 119890

minus1198640

119894rarr 119895

(1 le 119894 le 119871 1 le 119895 le 20) (17)

Nowwe extract the useful information from (16) to define thecomponents of (11) via the following approach

120595119906+420

= ℓ119906

(119906 = 1 2 20) (18)

where

ℓ119895=

1

119871

times

119871

sum

119896=1

119864

1

119896rarr119895(119895 = 1 2 20) (19)

(d) Grey System Model Approach The grey system theory[97] is quite useful in dealing with complicated systems thatlack sufficient information or need to process uncertaininformation According to the grey system theory we canextract the following information from the 119895th column of(16) that is

[

119886

119895

1

119886

119895

2

] = (BT119895B119895)

minus1

BT119895U119895

(119895 = 1 2 20) (20)

where

B119895=

[

[

[

[

[

[

[

[

[

[

minus119864

1

2rarr119895minus119864

1

1rarr119895minus 05119864

1

2rarr1198951

minus119864

1

3rarr119895minus

2

sum

119894=1

119864

1

119894rarr 119895minus 05119864

1

3rarr1198951

minus119864

1

119871rarr119895minus

119871minus1

sum

119894=1

119864

1

119894rarr 119895minus 05119864

1

119871rarr1198951

]

]

]

]

]

]

]

]

]

]

U119895=

[

[

[

[

[

[

119864

1

2rarr119895minus 119864

1

1rarr119895

119864

1

3rarr119895minus 119864

1

2rarr119895

119864

1

119871rarr119895minus 119864

1

119871minus1rarr119895

]

]

]

]

]

]

(21)

Therefore based on the grey system theory and (20) we canextract another 20 times 2 = 40 quantities from (16) to definethe components of (11) that is

120593119895=

1199081119886

119895

1when 119895 is an odd number

1199082119886

119895

2when 119895 is an even number

1 le 119895 le 20

(22)

where 1198861198951and 119886119895

2are given by (20) 119908

1and 119908

2are weight

factors which were all set to 1 in the current studySubstituting the elements in (12) (14) (18) and (22) we

finally obtain a total of Ω = 20 + 400 + 20 + 40 = 480

components for the PseAAC of (11) where

120595119906=

119891119906

when 1 le 119906 le 20119891

(2)

119906when 21 le 119906 le 420

ℓ119906

when 421 le 119906 le 440120593119906

when 440 le 119906 le 480

(23)

In other words in this study (11) or Choursquos PseAAC is a 480-D vector whose 480 components are given by (23) derivedfrom the amino acid composition dipeptide compositionsequential evolution information and grey system theory(e) Representing Enzyme-Drug Pairs Now the pair betweenan enzyme molecule E and a drug compound D can beformulated by combing (8) and (11) as given by

G = D oplus E

= [1198601sdot sdot sdot 119860

256119867119909

CF 1205951sdot sdot sdot 120595480]

T

(24)

6 BioMed Research International

where G represents the enzyme-drug pair oplus the orthogonalsum [51] and each of the (258 + 480) = 738 feature elementsis given in (8) and (23)

For the convenience of the later formulation let us use119909119894(119894 = 1 2 738) to represent the 738 components of (24)

that is

G = [11990911199092sdot sdot sdot 119909119894sdot sdot sdot 119909738]

T (25)

To optimize the prediction results different weights wereusually tested for each of the elements in (25) However sinceit would consume a lot of computational time for a total of 738weight factors here let us adopt the normalization approachto deal with this problem as done in [98 99] that is convert119909119894in (25) to 119910

119894according to the following equation

119910119894=

2tanminus1 (119909119894)

120587

(119894 = 1 2 738) (26)

where tanminus1 means arctangent By means of (26) everycomponent in (25) will be converted into the range of[minus1 1] that is we have minus1 le 119910

119894le 1 As demonstrated

in [98 99] the normalization approach via (26) was quiteeffective in enhancing the quality of prediction operated ina high dimension space Therefore in this study we wouldnot to take the procedure of optimizing the weight factorssignificantly reducing the computational times

23 Fuzzy 119870-Nearest Neighbour Algorithm The 119870-NN (119870-Nearest Neighbor) classifier is quite popular in patternrecognition community owing to its good performance andsimple-to-use feature According to the 119870-NN rule [100]named also as the ldquovoting 119870-NN rulerdquo the query sampleshould be assigned to the subset represented by a majorityof its 119870 nearest neighbors as illustrated in Figure 5 of [33]

Fuzzy 119870-NN classification method [101] is a specialvariation of the119870-NNclassification family Instead of roughlyassigning the label based on a voting from the 119870 nearestneighbors it attempts to estimate the membership valuesthat indicate how much degree the query sample belongsto the classes concerned Obviously it is impossible for anycharacteristic description to contain complete informationwhich would make the classification ambiguous In view ofthis the fuzzy principle is very reasonable and particularlyuseful in dealing with complicated biological systems such asidentifying nuclear receptor subfamilies [102] characterizingthe structure of fast-folding proteins [103] classifying Gprotein-coupled receptors [104] predicting protein quater-nary structural attributes [105] predicting protein structuralclasses [106 107] and so forth

Next let us give a brief introduction how to use thefuzzy119870-NN approach to identify the interactions between theenzymes and the drug compounds in the network concerned

Supposing that S(119873) = G1G2 G

119873 is a set of vec-

tors representing 119873 enzyme-drug pairs in a training set

classified into two classes 119862+ 119862minus where 119862+ denotes theinteractive pair class while 119862minus the noninteractive pair classSlowast(G) = Glowast

1Glowast2 Glowast

119870 sub S(119873) is the subset of the 119870

nearest neighbor pairs to the query pair G Thus the fuzzymembership value for the query pair G in the two classes ofS(119873) is given by

120583

+

(G) =sum

119870

119895=1120583

+(Glowast119895) 119889(GGlowast

119895)

minus2(120593minus1)

sum

119870

119895=1119889(GGlowast

119895)

minus2(120593minus1)

120583

minus

(G) =sum

119870

119895=1120583

minus(Glowast119895) 119889(GGlowast

119895)

minus2(120593minus1)

sum

119870

119895=1119889(GGlowast

119895)

minus2(120593minus1)

(27)

where 119870 is the number of the nearest neighbors counted forthe query pairG 120583+(Glowast

119895) and 120583minus(Glowast

119895) the fuzzy membership

values of the training sample Glowast119895to the class 119862+ and 119862minus

respectively as will be further defined next 119889(GGlowast119895) the

cosine distance between G and its 119895th nearest pair Glowast119895in

the training dataset S(119873) 120593(gt 1) the fuzzy coefficient fordetermining how heavily the distance is weighted whencalculating each nearest neighborrsquos contribution to the mem-bership value Note that the parameters 119870 and 120593 will affectthe computation result of (27) and they will be optimizedby a grid-search as will be described later Also variousother metrics can be chosen for 119889(GGlowast

119895) such as Euclidean

distance Hamming distance [108] andMahalanobis distance[55 109]

The quantitative definitions for the aforementioned120583

+(Glowast119895) and 120583minus(Glowast

119895) in (27) are given by

120583

+

(Glowast119895) =

1 if Glowast119895isin 119862

+

0 otherwise

120583

minus

(Glowast119895) =

1 if Glowast119895isin 119862

minus

0 otherwise

(28)

Substituting the results obtained by (27) into (28) it followsthat if 120583+(G) gt 120583

minus(G) then the query pair G is an

interactive couple otherwise noninteractive In other wordsthe outcome can be formulated as

G isin

119862

+ if 120583+ (G) gt 120583minus (G)

119862

minus otherwise

(29)

If there is a tie between 120583+(G) and 120583minus(G) the query pair Gwill be randomly assigned to one of the two classes Howevercase like that is quite rare and in this study never happened

The predictor thus established is called iEzy-Drugwhere ldquoirdquo means identify and ldquoEzy-Drugrdquo means the inter-action between enzyme and drug To provide an intuitiveoverall picture a flowchart is provided in Figure 2 to showthe process of how the classifier works in identifying enzyme-drug interactions

BioMed Research International 7

Input a queryprotein sequence and

a drug code

Create Chous PseAAC of a protein

Create feature vector of a drug

Feature fusion

engineTraining dataset

Yes No

Noninteractive enzyme drug pair

Interactive enzymedrug pair

AACDipeptideSeq Evol

Grey model

Mol FptInfo entropy

CF

FKNN operation

Figure 2 A flowchart to show the operation process of the iEzy-Drug predictor See the text for further explanation

24 Criteria for Performance Evaluation In the literaturethe following equation set is often used for examining theperformance quality of a predictor

Sn = TPTP + FN

Sp = TNTN + FP

Acc = TP + TNTP + TN + FP + FN

MCC = (TP times TN) minus (FP times FN)radic(TP + FP) (TP + FN) (TN + FP) (TN + FN)

(30)

where TP represents the true positive TN the true negativeFP the false positive FN the false negative Sn the sensitiv-ity Sp the specificity Acc the accuracy MCC the Mathewrsquoscorrelation coefficient

To most biologists however the four metrics as formu-lated in (30) are not quite intuitive and easier-to-understandparticularly for the Mathewrsquos correlation coefficient Herelet us adopt the Choursquos symbols to formulate the previousfour metrics By means of Choursquos symbols [111 112] the rates

of correct predictions for the interactive enzyme-drug pairsin dataset S+ and the noninteractive enzyme-drug pairs indataset Sminus are respectively defined by (cf (1))

Λ

+

=

119873

+minus 119873

+

minus

119873

+ for the interactive enzyme-drug pairs

Λ

minus

=

119873

minusminus 119873

minus

+

119873

minus for the noninteractive enzyme-drug pairs

(31)

where 119873+ is the total number of the interactive enzyme-drug pairs investigated while 119873+

minusis the number of the

interactive enzyme-drug pairs incorrectly predicted as thenoninteractive enzyme-drug pairs 119873minus is the total numberof the noninteractive enzyme-drug pairs investigated while119873

minus

+is the number of the noninteractive enzyme-drug pairs

incorrectly predicted as the interactive enzyme-drug pairsThe overall success prediction rate is given by [113] as follows

Λ =

Λ

+119873

++ Λ

minus119873

minus

119873

++ 119873

minus= 1 minus

119873

+

minus+ 119873

minus

+

119873

++ 119873

minus

(32)

It is obvious from (31)-(32) that if and only if none ofthe interactive enzyme-drug pairs and the noninteractiveenzyme-drug pairs are mispredicted that is 119873+

minus= 119873

minus

+= 0

and Λ+ = Λ

minus= 1 we have the overall success rate Λ = 1

Otherwise the overall success rate would be smaller than 1The relations between the symbols in (32) and those in

(30) are given by

TP = 119873+ minus 119873+minus

TN = 119873

minus

minus 119873

minus

+

FP = 119873minus+

FN = 119873

+

minus

(33)

Substituting (33) into (30) and also noting (31)-(32) we obtain

Sn = Λ+ = 1 minus119873

+

minus

119873

+

Sp = Λminus = 1 minus119873

minus

+

119873

minus

Acc = Λ = 1 minus119873

+

minus+ 119873

minus

+

119873

++ 119873

minus

MCC =1 minus ((119873

+

minus119873

+) + (119873

minus

+119873

minus))

radic(1 + (119873

minus

+minus 119873

+

minus) 119873

+) (1 + (119873

+

minusminus 119873

minus

+) 119873

minus)

(34)

Now it is obvious to see from (34) when 119873

+

minus= 0

meaning none of the interactive enzyme-drug pairs was

8 BioMed Research International

02

46

810

1

15

2089

0895

09

0905

091

0915

092

ACC

K120593

Figure 3 A 3D plot to show how the parameter in (27) wasoptimized for the iEzy-Drug predictor

mispredicted to be a noninteractive enzyme-drug pair wehave the sensitivity Sn = 1 while 119873+

minus= 119873

+ meaning thatall the interactive enzyme-drug pairs weremispredicted to bethe noninteractive enzyme-drug pairs we have the sensitivitySn = 0 Likewise when 119873minus

+= 0 meaning none of the

noninteractive enzyme-drug pairs wasmispredicted we havethe specificity Sp = 1 while 119873minus

+= 119873

minus meaning all thenoninteractive enzyme-drug pairs were incorrectly predictedas interactive enzyme-drug pairs we have the specificity Sp =0 When119873+

minus= 119873

minus

+= 0 meaning that none of the interactive

enzyme-drug pairs in the dataset S+ and none of the nonin-teractive enzyme-drug pairs in Sminus was incorrectly predictedwe have the overall accuracy Acc = Λ = 1 while 119873+

minus= 119873

+

and 119873minus+= 119873

minus meaning that all the interactive enzyme-drugpairs in the dataset S+ and all the noninteractive enzyme-drug pairs in Sminus were mispredicted we have the overallaccuracy Acc = Λ = 0 The MCC correlation coefficient isusually used for measuring the quality of binary (two-class)classifications When 119873+

minus= 119873

minus

+= 0 meaning that none

of the interactive enzyme-drug pairs in the dataset S+ andnone of the noninteractive enzyme-drug pairs in Sminus weremispredicted we have MCC = 1 when 119873+

minus= 119873

+2 and

119873

minus

+= 119873

minus2 we have MCC = 0 meaning no better than

random prediction when 119873+minus= 119873

+ and 119873minus+= 119873

minus we haveMCC = minus1 meaning total disagreement between predictionand observation As we can see from the previous discussionit is much more intuitive and easier-to-understand whenusing (34) to examine a predictor for its sensitivity specificityoverall accuracy and Mathewrsquos correlation coefficient It isinstructive to point out that themetrics as defined in (30) and(34) are valid for single label systems for multilabel systemsa set of more complicated metrics should be used as given in[114]

3 Results and Discussion

31 Cross-Validation How to properly examine the predic-tion quality is a key for developing a new predictor and

False positive rate

True

pos

itive

rate

1

01

02

03

04

05

06

07

08

09

0

101 02 03 04 05 06 07 08 090

BaselineiEzy-Drug AUC = 09377

Figure 4 A plot for the ROC curve to quantitatively show theperformance of the iEzy-Drug predictor

estimating its potential application value Generally speak-ing the following three cross-validation methods are oftenused to examine a predictor of its effectiveness in practicalapplication independent dataset test subsampling or 119870-fold(such as 5-fold 7-fold or 10-fold) test and jackknife test[108] However as elaborated by a penetrating analysis in[115] considerable arbitrariness exists in the independentdataset test Also as demonstrated by (27)ndash(29) in [33]the subsampling test (or 119870-fold cross-validation) cannotavoid arbitrariness either Only the jackknife test is the leastarbitrary that can always yield a unique result for a givenbenchmark dataset Therefore the jackknife test has beenwidely recognized and increasingly utilized by investigatorsto examine the quality of various predictors (see eg [66 7174 80]) Accordingly the success rate by the jackknife test wasalso used to optimize the two uncertain parameters 119870 and 120593in (27) The result thus obtained is shown in Figure 3 fromwhich we obtain when 119870 = 6 and 120593 = 15 the iEzy-Drugpredictor reaches its optimized status

The success rates thus obtained by the jackknife test inidentifying interactive Enzyme-drug pairs or noninteractiveenzyme-drug pairs on the benchmark dataset S (cf OnlineSupporting Information S1) are given in Table 1 where forfacilitating comparison the corresponding result by He et al[110] is also given As we can see from the table the overallaccuracy Acc achieved by iEzy-Drug was 9103 remarkablyhigher than 8548 the corresponding rate obtained byHe et al [110] on the same benchmark Furthermore listedin Table 1 are also the values obtained by iEzy-Drug for theother three metrics that is Sn = 9081 Sp = 9114 andMCC = 8039 indicating that the accuracy of iEzy-Drug isnot only very high but also quite stale

To provide a graphical illustration to show the per-formance of the current binary classifier iEzy-Drug asits discrimination threshold is varied a 2D plot calledReceiver Operating Characteristic (ROC) curve [116 117]was also given (Figure 4) In the ROC curve the verticalcoordinate 119884 is for the true positive rate or Sn (cf (34))while horizontal coordinate 119883 for the false positive rate

BioMed Research International 9

Read Me Data Citation

Enter both the protein sequence and the drug code (example) the number of query pairs is limited at 10 or less for each submission

Submit Clear

You can download the program and run it on your own local computer

iEzy-Drug identifying the interaction between enzymes anddrugs in cellular networking

Figure 5 A semiscreenshot to show the top page of the iEzy-Drugweb-server Its web-site address is at httpwwwjci-bioinfocniEzy-Drug

Table 1 The jackknife success rates obtained with iEzy-Drug in identifying interactive enzyme-drug pairs and noninteractive enzyme-drugpairs for the benchmark dataset S (cf Online Supporting Information S1)

Method Acc Sn Sp MCC

iEzy-Druga 74258157 = 9103 24692719 = 9081 49565438 = 9114 8039NN predictorb 8548 NA NA NAaSee (27) where the parameters119870 = 6 and 120593 = 15bSee [110]

or 1-Sp The best possible prediction method would yielda point with the coordinate (0 1) representing 100 truepositive rate (sensitivity Sn) and 0 false positive rate or100 specificity Therefore the (0 1) point is also calleda perfect classification A completely random guess wouldgive a point along a diagonal from the point (0 0) to (1 1)The area under the ROC curve also called Area Under theROC (AUROC) is often used to indicate the performancequality of a binary classifier the value 05 of AUROCis equivalent to random prediction while 1 of AUROCrepresents a perfect one As we can see from Figure 4the AUROC value obtained by iEzy-Drug is 09377

The reason why iEzy-Drug can remarkably improve theprediction quality is that it has introduced the 2D molecularfingerprints to represent drug samples see Online SupportingInformation S3 for the detailed fingerprint expressions for thedrugs listed in Online Supporting Information S1 and thatit has successfully used PseAAC to incorporate the featuresderived from the sequences of enzymes that are essentialfor identifying the interaction of enzymes with drugs in thecellular networking

To enhance the value of its practical applications theweb server for iEzy-Drug has been established that canbe freely accessible at httpwwwjci-bioinfocniEzy-DrugIt is anticipated that the web server will become a usefulhigh throughput tool for both basic research and drug

development in the relevant areas or at the very least playa complementary role to the existing method [39 110 118] forwhich so far noweb-serverwhatsoever has been provided yet

32 The Protocol or User Guide For the convenience of thevast majority of biologists and pharmaceutical scientists herelet us provide a step-by-step guide to show how the userscan easily get the desired result by means of the web serverwithout the need to follow the complicated mathematicalequations presented in this paper for the process of develop-ing the predictor and its integrityStep 1 Open the web server at the site httpwwwjci-bio-infocniEzy-Drug and you will see the top page of thepredictor on your computer screen as shown in Figure 5Click on the ReadMe button to see a brief introduction aboutiEzy-Drug predictor and the caveat when using itStep 2 Either type or copypaste the query pairs into theinput box at the center of Figure 5 Each query pair consistsof two parts one is for the protein sequence and the other forthe drug The enzyme sequence should be in FASTA formatwhile the drug in the KEGG code Examples for the querypairs input can be seen by clicking on the Example buttonright above the input boxStep 3 Click on the Submit button to see the predicted resultFor example if you use the four query pairs in the Example

10 BioMed Research International

window as the input after clicking the Submit button youwill see on your screen that the ldquohsa 10056rdquo enzyme andthe ldquoD0021rdquo drug are an interactive pair and that the ldquohsa100rdquo enzyme and the ldquoD0037rdquo drug are also an interactivepair but that the ldquohsa 3295rdquo enzyme and the ldquoD00889rdquo drugare not an interactive pair and that the ldquohsa 7366rdquo enzymeand the ldquoD03601rdquo drug are not an interactive pair eitherAll these results are fully consistent with the experimentalobservations It takes about 3 minutes before the results areshown on the screenStep 4 Click on the Citation button to find the relevant paperthat documents the detailed development and algorithm ofiEzy-DurgStep 5 Click on the Data button to download the benchmarkdataset used to train and test the iEzy-Durg predictorStep 6 The program code is also available by clicking thebutton download on the lower panel of Figure 5

Acknowledgments

The authors would like to thank the three anonymousreviewers whose constructive comments are very helpful forstrengthening the presentation of the paper This work wassupported by the Grants from the National Natural ScienceFoundation of China (no 31260273) the Jiangxi ProvincialForeign Scientific and Technological Cooperation Project(no 20120BDH80023) Natural Science Foundation of JiangxiProvince China (no 2010GZS0122 20122BAB201020) theDepartment of Education of Jiangxi Province (GJJ12490)the LuoDi plan of the Department of Education of JiangxiProvince (KJLD12083) and the Jiangxi Provincial Founda-tion for Leaders of Disciplines in Science (20113BCB22008)The funders had no role in study design data collection andanalysis decision to publish or preparation of the paper

References

[1] A Bairoch ldquoThe ENZYME database in 2000rdquo Nucleic AcidsResearch vol 28 no 1 pp 304ndash305 2000

[2] S H Koenig and R D Brown ldquoH2CO3as substrate for carbonic

anhydrase in the dehydration of H2CO3(minus)rdquo Proceedings of the

National Academy of Sciences of the United States of Americavol 69 no 9 pp 2422ndash2425 1972

[3] S X Lin and J Lapointe ldquoTheoretical and experimental biologyin onerdquo Journal of Biomedical Science and Engineering vol 6 pp435ndash442 2013

[4] K C Chou and S P Jiang ldquoStudies on the rate of diffusion-controlled reactions of enzymes Spatial factor and force fieldfactorrdquo Scientia Sinica vol 17 no 5 pp 664ndash680 1974

[5] K C Chou ldquoThe kinetics of the combination reaction betweenenzyme and substraterdquo Scientia Sinica vol 19 no 4 pp 505ndash528 1976

[6] K C Chou and G P Zhou ldquoRole of the protein outside activesite on the diffusion-controlled reaction of enzymerdquo Journal ofthe American Chemical Society vol 104 no 5 pp 1409ndash14131982

[7] R A Poorman A G Tomasselli R L Heinrikson and FJ Kezdy ldquoA cumulative specificity model for proteases from

human immunodeficiency virus types 1 and 2 inferred fromstatistical analysis of an extended substrate data baserdquo TheJournal of Biological Chemistry vol 266 no 22 pp 14554ndash145611991

[8] K-C Chou ldquoA vectorized sequence-coupling model for pre-dicting HIV protease cleavage sites in proteinsrdquo The Journal ofBiological Chemistry vol 268 no 23 pp 16938ndash16948 1993

[9] G Z Liang and S Z Li ldquoA new sequence representation asapplied in better specificity elucidation for human immunod-eficiency virus type 1 proteaserdquo Biopolymers vol 88 no 3 pp401ndash412 2007

[10] J J Chou ldquoPredicting cleavability of peptide sequences byHIVprotease via correlation-angle approachrdquo Journal of ProteinChemistry vol 12 no 3 pp 291ndash302 1993

[11] Q S Du S Wang D Q Wei S Sirois and K C ChouldquoMolecular modeling and chemical modification for findingpeptide inhibitor against severe acute respiratory syndromecoronavirus main proteinaserdquo Analytical Biochemistry vol 337no 2 pp 262ndash270 2005

[12] Y Gan H Huang Y Huang et al ldquoSynthesis and activity ofan octapeptide inhibitor designed for SARS coronavirus mainproteinaserdquo Peptides vol 27 no 4 pp 622ndash625 2006

[13] Q S Du H Sun and K C Chou ldquoInhibitor design for SARScoronavirus main protease based on lsquodistorted key theoryrsquordquoMedicinal Chemistry vol 3 no 1 pp 1ndash6 2007

[14] K C Chou ldquoPrediction of human immunodeficiency virusprotease cleavage sites in proteinsrdquoAnalytical Biochemistry vol233 no 1 pp 1ndash14 1996

[15] J Knowles and G Gromo ldquoTarget selection in drug discoveryrdquoNature Reviews Drug Discovery vol 2 no 1 pp 63ndash69 2003

[16] A C Cheng R G Coleman K T Smyth et al ldquoStructure-basedmaximal affinity model predicts small-molecule druggabilityrdquoNature Biotechnology vol 25 no 1 pp 71ndash75 2007

[17] M Rarey B Kramer T Lengauer and G Klebe ldquoA fast flexibledocking method using an incremental construction algorithmrdquoJournal of Molecular Biology vol 261 no 3 pp 470ndash489 1996

[18] K C Chou ldquoReview structural bioinformatics and its impactto biomedical sciencerdquo Current Medicinal Chemistry vol 11 no16 pp 2105ndash2134 2004

[19] K C Chou D Q Wei and W Z Zhong ldquoBinding mechanismof coronavirus main proteinase with ligands and its implicationto drug design against SARSrdquo Biochemical and BiophysicalResearch Communications vol 308 pp 148ndash151 2003 (Erratumin Biochemical and Biophysical Research Communications vol310 p 675 2003)

[20] K C Chou D Q Wei Q S Du S Sirois and W Z ZhongldquoProgress in computational approach to drug developmentagaint SARSrdquo Current Medicinal Chemistry vol 13 no 27 pp3263ndash3270 2006

[21] G P Zhou and F A Troy ldquoNMR studies on how the bindingcomplex of polyisoprenol recognition sequence peptides andpolyisoprenols can modulate membrane structurerdquo CurrentProtein and Peptide Science vol 6 no 5 pp 399ndash411 2005

[22] R B Huang Q S Du C H Wang and K C Chou ldquoAn in-depth analysis of the biological functional studies based on theNMR M2 channel structure of influenza A virusrdquo Biochemicaland Biophysical Research Communications vol 377 no 4 pp1243ndash1247 2008

[23] Q S Du R B Huang C H Wang X M Li and K C ChouldquoEnergetic analysis of the two controversial drug binding sitesof the M2 proton channel in influenza A virusrdquo Journal ofTheoretical Biology vol 259 no 1 pp 159ndash164 2009

BioMed Research International 11

[24] M J Berardi W M Shih S C Harrison and J J Chou ldquoMito-chondrial uncoupling protein 2 structure determined by NMRmolecular fragment searchingrdquo Nature vol 476 no 7358 pp109ndash114 2011

[25] J R Schnell and J J Chou ldquoStructure andmechanism of theM2proton channel of influenza A virusrdquo Nature vol 451 no 7178pp 591ndash595 2008

[26] B OuYang S Xie M J Berardi et al ldquoUnusual architecture ofthe p7 channel from hepatitis C virusrdquoNature vol 498 pp 521ndash525 2013

[27] K Oxenoid and J J Chou ldquoThe structure of phospholambanpentamer reveals a channel-like architecture in membranesrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 102 no 31 pp 10870ndash10875 2005

[28] M E Call KWWucherpfennig and J J Chou ldquoThe structuralbasis for intramembrane assembly of an activating immunore-ceptor complexrdquo Nature Immunology vol 11 no 11 pp 1023ndash1029 2010

[29] R M Pielak J R Schnell and J J Chou ldquoMechanism of druginhibition and drug resistance of influenza A M2 channelrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 106 no 18 pp 7379ndash7384 2009

[30] J Wang R M Pielak M A McClintock and J J ChouldquoSolution structure and functional analysis of the influenza Bproton channelrdquo Nature Structural and Molecular Biology vol16 no 12 pp 1267ndash1271 2009

[31] K C Chou ldquoCoupling interaction between thromboxane A2receptor and alpha-13 subunit of guanine nucleotide-bindingproteinrdquo Journal of Proteome Research vol 4 no 5 pp 1681ndash1686 2005

[32] K C Chou ldquoInsights from modeling three-dimensional struc-tures of the human potassium and sodium channelsrdquo Journal ofProteome Research vol 3 no 4 pp 856ndash861 2004

[33] K C Chou ldquoSome remarks on protein attribute prediction andpseudo amino acid compositionrdquo Journal ofTheoretical Biologyvol 273 no 1 pp 236ndash247 2011

[34] W Z Lin J A Fang X Xiao and K C Chou ldquoiLoc-animal amulti-label learning classifier for predicting subcellular local-ization of animal proteinsrdquo Molecular BioSystems vol 9 pp634ndash644 2013

[35] X Xiao PWangW Z Lin J H Jia and K C Chou ldquoiAMP-2La two-level multi-label classifier for identifying antimicrobialpeptides and their functional typesrdquo Analytical Biochemistryvol 436 pp 168ndash177 2013

[36] W Chen P M Feng H Lin and K C Chou ldquoiRSpot-PseDNCidentify recombination spots with pseudo dinucleotide compo-sitionrdquo Nucleic Acids Research vol 41 article e69 2013

[37] K C Chou Z C Wu and X Xiao ldquoILoc-Hum using theaccumulation-label scale to predict subcellular locations ofhuman proteins with both single and multiple sitesrdquoMolecularBioSystems vol 8 no 2 pp 629ndash641 2012

[38] M Kotera M Hirakawa T Tokimatsu S Goto and MKanehisa ldquoThe KEGG databases and tools facilitating omicsanalysis latest developments involving human diseases andpharmaceuticalsrdquo Methods in Molecular Biology vol 802 pp19ndash39 2012

[39] Y YamanishiM Araki A GutteridgeWHonda andM Kane-hisa ldquoPrediction of drug-target interaction networks from theintegration of chemical and genomic spacesrdquo Bioinformaticsvol 24 no 13 pp i232ndashi240 2008

[40] P Finn S Muggleton D Page and A Srinivasan ldquoPharma-cophore discovery using the Inductive Logic Programmingsystem PROGOLrdquo Machine Learning vol 30 no 2-3 pp 241ndash270 1998

[41] I Vogt D Stumpfe H E Ahmed and J Bajorath ldquoMethodsfor computer-aided chemical biology Part 2 evaluation of com-pound selectivity using 2D molecular fingerprintsrdquo ChemicalBiology and Drug Design vol 70 pp 195ndash205 2007

[42] H Eckert and J Bajorath ldquoMolecular similarity analysis in vir-tual screening foundations limitations and novel approachesrdquoDrug Discovery Today vol 12 no 5-6 pp 225ndash233 2007

[43] S Laurent L V Elst and R N Muller ldquoComparative studyof the physicochemical properties of six clinical low molecularweight gadolinium contrast agentsrdquo Contrast Media amp Molecu-lar Imaging vol 1 no 3 pp 128ndash137 2006

[44] E Gregori-Puigjane R Garriga-Sust and J Mestres ldquoIndexingmolecules with chemical graph identifiersrdquo Journal of Compu-tational Chemistry vol 32 no 12 pp 2638ndash2646 2011

[45] B Ren ldquoApplication of novel atom-type AI topological indicesto QSPR studies of alkanesrdquo Computers and Chemistry vol 26no 4 pp 357ndash369 2002

[46] N M OrsquoBoyle M Banck C A James C Morley T Vander-meersch and G R Hutchison ldquoOpen Babel an open chemicaltoolboxrdquo Journal of Cheminformatics vol 3 p 33 2011

[47] V J Gillet P Willett and J Bradshaw ldquoSimilarity searchingusing reduced graphsrdquo Journal of Chemical Information andComputer Sciences vol 43 no 2 pp 338ndash345 2003

[48] D Butina ldquoUnsupervised data base clustering based on day-lightrsquos fingerprint and Tanimoto similarity a fast and automatedway to cluster small and large data setsrdquo Journal of ChemicalInformation and Computer Sciences vol 39 no 4 pp 747ndash7501999

[49] C E Shannon ldquoA mathematical theory of communicationrdquoACM SIGMOBILE Mobile Computing and CommunicationsReview vol 5 pp 3ndash55 2001

[50] V D Gusev L A Nemytikova and N A Chuzhanova ldquoOn thecomplexity measures of genetic sequencesrdquo Bioinformatics vol15 no 12 pp 994ndash999 1999

[51] K C Chou and H B Shen ldquoReview recent progress in proteinsubcellular location predictionrdquo Analytical Biochemistry vol370 no 1 pp 1ndash16 2007

[52] S F Altschul ldquoEvaluating the statistical significance of multipledistinct local alignmentsrdquo in Theoretical and ComputationalMethods in Genome Research S Suhai Ed pp 1ndash14 PlenumNew York NY USA 1997

[53] J C Wootton and S Federhen ldquoStatistics of local complexity inamino acid sequences and sequence databasesrdquo Computers andChemistry vol 17 no 2 pp 149ndash163 1993

[54] H Nakashima K Nishikawa and T Ooi ldquoThe folding type ofa protein is relevant to the amino acid compositionrdquo Journal ofBiochemistry vol 99 no 1 pp 153ndash162 1986

[55] K C Chou and C T Zhang ldquoPredicting protein folding typesby distance functions that make allowances for amino acidinteractionsrdquo The Journal of Biological Chemistry vol 269 no35 pp 22014ndash22020 1994

[56] K-C Chou ldquoA novel approach to predicting protein structuralclasses in a (20-1)-D amino acid composition spacerdquo Proteinsvol 21 no 4 pp 319ndash344 1995

[57] H Nakashima and K Nishikawa ldquoDiscrimination of intracel-lular and extracellular proteins using amino acid compositionand residue-pair frequenciesrdquo Journal of Molecular Biology vol238 no 1 pp 54ndash61 1994

12 BioMed Research International

[58] G P Zhou ldquoAn intriguing controversy over protein structuralclass predictionrdquo Journal of Protein Chemistry vol 17 no 8 pp729ndash738 1998

[59] I Bahar A R Atilgan R L Jernigan and B Erman ldquoUnder-standing the recognition of protein structural classes by aminoacid compositionrdquo Proteins vol 29 pp 172ndash185 1997

[60] J Cedano P Aloy J A Perez-Pons and E Querol ldquoRelationbetween amino acid composition and cellular location ofproteinsrdquo Journal of Molecular Biology vol 266 no 3 pp 594ndash600 1997

[61] G P Zhou and K Doctor ldquoSubcellular location prediction ofapoptosis proteinsrdquo Proteins vol 50 no 1 pp 44ndash48 2003

[62] KChou ldquoPrediction of protein cellular attributes using pseudo-amino acid compositionrdquo Proteins vol 43 pp 246ndash255 2001(Erratum in Proteins vol 44 p 60 2001)

[63] K C Chou ldquoUsing amphiphilic pseudo amino acid composi-tion to predict enzyme subfamily classesrdquo Bioinformatics vol21 no 1 pp 10ndash19 2005

[64] D Zou Z He J He and Y Xia ldquoSupersecondary structureprediction using Choursquos pseudo amino acid compositionrdquoJournal of Computational Chemistry vol 32 no 2 pp 271ndash2782011

[65] M M Beigi M Behjati and H Mohabatkar ldquoPrediction ofmetalloproteinase family based on the concept ofChoursquos pseudoamino acid composition using a machine learning approachrdquoJournal of Structural and Functional Genomics vol 12 no 4 pp191ndash197 2011

[66] Y K Chen and K B Li ldquoPredicting membrane protein typesby incorporating protein topology domains signal peptidesand physicochemical properties into the general form of Choursquospseudo amino acid compositionrdquo Journal ofTheoretical Biologyvol 318 pp 1ndash12 2013

[67] C Huang and J Q Yuan ldquoA multilabel model based on Choursquospseudo-amino acid composition for identifying membraneproteins with both single and multiple functional typesrdquo TheJournal of Membrane Biology vol 246 pp 327ndash334 2013

[68] S S Sahu and G Panda ldquoA novel feature representationmethod based on Choursquos pseudo amino acid composition forprotein structural class predictionrdquo Computational Biology andChemistry vol 34 no 5-6 pp 320ndash327 2010

[69] M Hayat and A Khan ldquoDiscriminating outer membraneproteins with fuzzy K-nearest neighbor algorithms based on thegeneral form of Choursquos PseAACrdquo Protein and Peptide Lettersvol 19 no 4 pp 411ndash421 2012

[70] MKhosravian F K FaramarziMM BeigiM Behbahani andH Mohabatkar ldquoPredicting antibacterial peptides by the con-cept of Choursquos pseudo-amino acid composition and machinelearning methodsrdquo Protein amp Peptide Letters vol 20 pp 180ndash186 2013

[71] HMohabatkarMM Beigi K Abdolahi and SMohsenzadehldquoPrediction of allergenic proteins by means of the concept ofChoursquos pseudo amino acid composition and amachine learningapproachrdquoMedicinal Chemistry vol 9 pp 133ndash137 2013

[72] L Nanni A Lumini D Gupta and A Garg ldquoIdentifyingbacterial virulent proteins by fusing a set of classifiers basedon variants of Choursquos pseudo amino acid composition and onevolutionary informationrdquo IEEEACM Transactions on Compu-tational Biology and Bioinformatics vol 9 no 2 pp 467ndash4752012

[73] T H Chang L C Wu T Y Lee S P Chen H D Huang andJ T Horng ldquoEuLoc a web-server for accurately predict protein

subcellular localization in eukaryotes by incorporating variousfeatures of sequence segments into the general form of ChoursquosPseAACrdquo Journal of Computer-Aided Molecular Design vol 27pp 91ndash103 2013

[74] S Zhang Y Zhang H Yang C Zhao and Q Pan ldquoUsing theconcept of Choursquos pseudo amino acid composition to predictprotein subcellular localization an approach by incorporatingevolutionary information and von Neumann entropiesrdquo AminoAcids vol 34 no 4 pp 565ndash572 2008

[75] R Zia Ur and A Khan ldquoIdentifying GPCRs and their typeswith Choursquos pseudo amino acid composition an approach frommulti-scale energy representation and position specific scoringmatrixrdquo Protein amp Peptide Letters vol 19 pp 890ndash903 2012

[76] X Y Sun S P Shi J D Qiu S B Suo S Y Huang and R PLiang ldquoIdentifying protein quaternary structural attributes byincorporating physicochemical properties into the general formof Choursquos PseAAC via discrete wavelet transformrdquo MolecularBioSystems vol 8 pp 3178ndash3184 2012

[77] L Nanni and A Lumini ldquoGenetic programming for creatingChoursquos pseudo amino acid based features for submitochondrialocalizationrdquo Amino Acids vol 34 no 4 pp 653ndash660 2008

[78] M Esmaeili H Mohabatkar and S Mohsenzadeh ldquoUsing theconcept of Choursquos pseudo amino acid composition for risk typeprediction of human papillomavirusesrdquo Journal of TheoreticalBiology vol 263 no 2 pp 203ndash209 2010

[79] H Mohabatkar ldquoPrediction of cyclin proteins using Choursquospseudo amino acid compositionrdquo Protein and Peptide Lettersvol 17 no 10 pp 1207ndash1214 2010

[80] H Mohabatkar M M Beigi and A Esmaeili ldquoPredictionof GABA(A) receptor proteins using the concept of Choursquospseudo-amino acid composition and support vector machinerdquoJournal of Theoretical Biology vol 281 no 1 pp 18ndash23 2011

[81] Y Xu J Ding L Y Wu and K C Chou ldquoiSNO-PseAAC pre-dict cysteine S-nitrosylation sites in proteins by incorporatingposition specific amino acid propensity into pseudo amino acidcompositionrdquo PLoS ONE vol 8 Article ID e55844 2013

[82] W Chen H Lin PM Feng C Ding Y C Zuo andK C ChouldquoiNuc-PhysChem a sequence-based predictor for identifyingnucleosomes via physicochemical propertiesrdquo PLoS ONE vol7 Article ID e47843 2012

[83] B Li T Huang L Liu Y Cai and K C Chou ldquoIdentificationof colorectal cancer related genes with mrmr and shortest pathin protein-protein interaction networkrdquo PLoS ONE vol 7 no 4Article ID e33393 2012

[84] Y Jiang T Huang C Lei Y F Gao Y D Cai and K C ChouldquoSignal propagation in protein interaction network duringcolorectal cancer progressionrdquo BioMed Research Internationalvol 2013 Article ID 287019 9 pages 2013

[85] P Du X Wang C Xu and Y Gao ldquoPseAAC-Builder across-platform stand-alone program for generating variousspecial Choursquos pseudo-amino acid compositionsrdquo AnalyticalBiochemistry vol 425 no 2 pp 117ndash119 2012

[86] D S Cao Q S Xu and Y Z Liang ldquoPropy a tool to generatevarious modes of Choursquos PseAACrdquo Bioinformatics vol 29 pp960ndash962 2013

[87] H Shen and K C Chou ldquoPseAAC a flexible web serverfor generating various kinds of protein pseudo amino acidcompositionrdquo Analytical Biochemistry vol 373 no 2 pp 386ndash388 2008

[88] K C Chou ldquoUsing pair-coupled amino acid composition topredict protein secondary structure contentrdquo Journal of ProteinChemistry vol 18 no 4 pp 473ndash480 1999

BioMed Research International 13

[89] W Liu and K C Chou ldquoPrediction of protein secondarystructure contentrdquo Protein Engineering vol 12 no 12 pp 1041ndash1050 1999

[90] M Bhasin and G P S Raghava ldquoClassification of nuclearreceptors based on amino acid composition and dipeptidecompositionrdquo The Journal of Biological Chemistry vol 279 no22 pp 23262ndash23266 2004

[91] H Lin andH Ding ldquoPredicting ion channels and their types bythe dipeptidemode of pseudo amino acid compositionrdquo Journalof Theoretical Biology vol 269 no 1 pp 64ndash69 2011

[92] H Lin and Q Li ldquoUsing pseudo amino acid composition topredict protein structural class approached by incorporating400 dipeptide componentsrdquo Journal of Computational Chem-istry vol 28 no 9 pp 1463ndash1466 2007

[93] L Nanni and A Lumini ldquoCombing ontologies and dipeptidecomposition for predicting DNA-binding proteinsrdquo AminoAcids vol 34 no 4 pp 635ndash641 2008

[94] K-C Chou ldquoThe convergence-divergence duality in lectindomains of selectin family and its implicationsrdquo FEBS Lettersvol 363 no 1-2 pp 123ndash126 1995

[95] A A Schaffer L Aravind T L Madden et al ldquoImprovingthe accuracy of PSI-BLAST protein database searches withcomposition-based statistics and other refinementsrdquo NucleicAcids Research vol 29 no 14 pp 2994ndash3005 2001

[96] S F Altschul and E V Koonin ldquoIterated profile searches withPSI-BLAST a tool for discovery in protein databasesrdquo Trends inBiochemical Sciences vol 23 no 11 pp 444ndash447 1998

[97] J Deng ldquoGrey entropy and grey target decision makingrdquoJournal of Grey System vol 22 no 1 pp 1ndash24 2010

[98] B M Bolstad R A Irizarry M Astrand and T P Speed ldquoAcomparison of normalizationmethods for high density oligonu-cleotide array data based on variance and biasrdquo Bioinformaticsvol 19 no 2 pp 185ndash193 2003

[99] J-Y Shi S-W Zhang Q Pan Y-M Cheng and J XieldquoPrediction of protein subcellular localization by support vectormachines using multi-scale energy and pseudo amino acidcompositionrdquo Amino Acids vol 33 no 1 pp 69ndash74 2007

[100] T Denoeux ldquoA 120581-nearest neighbor classification rule based onDempster-Shafer theoryrdquo IEEE Transactions on Systems Manand Cybernetics vol 25 no 5 pp 804ndash813 1995

[101] J M Keller M R Gray and J A Givens ldquoA fuzzy k-nearestneighbours algorithmrdquo IEEE Transactions on Systems Man andCybernetics vol 15 no 4 pp 580ndash585 1985

[102] X Xiao P Wang and K Chou ldquoiNR-physchem a sequence-based predictor for identifying nuclear receptors and theirsubfamilies via physical-chemical property matrixrdquo PLoS ONEvol 7 no 2 Article ID e30869 2012

[103] I Roterman L Konieczny W Jurkowski K Prymula and MBanach ldquoTwo-intermediate model to characterize the structureof fast-folding proteinsrdquo Journal of Theoretical Biology vol 283no 1 pp 60ndash70 2011

[104] X Xiao P Wang and K C Chou ldquoGPCR-2L predicting Gprotein-coupled receptors and their types by hybridizing twodifferentmodes of pseudo amino acid compositionsrdquoMolecularBioSystems vol 7 no 3 pp 911ndash919 2011

[105] X Xiao P Wang and K C Chou ldquoQuat-2L a web-server forpredicting protein quaternary structural attributesrdquo MolecularDiversity vol 15 no 1 pp 149ndash155 2011

[106] X Zheng C Li and J Wang ldquoAn information-theoreticapproach to the prediction of protein structural classrdquo Journalof Computational Chemistry vol 31 no 6 pp 1201ndash1206 2010

[107] H Shen J Yang X Liu and K C Chou ldquoUsing supervisedfuzzy clustering to predict protein structural classesrdquo Biochem-ical and Biophysical Research Communications vol 334 no 2pp 577ndash581 2005

[108] K-C Chou and C-T Zhang ldquoReview prediction of proteinstructural classesrdquo Critical Reviews in Biochemistry and Molec-ular Biology vol 30 no 4 pp 275ndash349 1995

[109] P C Mahalanobis ldquoOn the generalized distance in statisticsrdquoProceedings of the National Institute of Sciences of India vol 2pp 49ndash55 1936

[110] R M Centor ldquoSignal detectability the use of ROC curves andtheir analysesrdquoMedical Decision Making vol 11 no 2 pp 102ndash106 1991

[111] K-C Chou ldquoUsing subsite coupling to predict signal peptidesrdquoProtein Engineering vol 14 no 2 pp 75ndash79 2001

[112] K C Chou ldquoPrediction of protein signal sequences and theircleavage sitesrdquo Proteins vol 42 pp 136ndash139 2001

[113] K C Chou ldquoPrediction of signal peptides using scaledwindowrdquoPeptides vol 22 no 12 pp 1973ndash1979 2001

[114] K C Chou ldquoSome remarks on predicting multi-label attributesinmolecular biosystemsrdquoMolecular Biosystems vol 9 pp 1092ndash1100 2013

[115] K C Chou and H B Shen ldquoCell-PLoc 2 0 an improvedpackage of web-servers for predicting subcellular localization ofproteins in various organismsrdquoNatural Science vol 2 pp 1090ndash1103 2010

[116] M Gribskov and N L Robinson ldquoUse of receiver operatingcharacteristic (ROC) analysis to evaluate sequence matchingrdquoComputers and Chemistry vol 20 no 1 pp 25ndash33 1996

[117] Y Yamanishi M Kotera M Kanehisa and S Goto ldquoDrug-target interaction prediction from chemical genomic and phar-macological data in an integrated frameworkrdquo Bioinformaticsvol 26 no 12 pp i246ndashi254 2010

[118] Z He J Zhang X Shi et al ldquoPredicting drug-target interactionnetworks based on functional groups and biological featuresrdquoPLoS ONE vol 5 no 3 Article ID e9603 2010

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

6 BioMed Research International

where G represents the enzyme-drug pair oplus the orthogonalsum [51] and each of the (258 + 480) = 738 feature elementsis given in (8) and (23)

For the convenience of the later formulation let us use119909119894(119894 = 1 2 738) to represent the 738 components of (24)

that is

G = [11990911199092sdot sdot sdot 119909119894sdot sdot sdot 119909738]

T (25)

To optimize the prediction results different weights wereusually tested for each of the elements in (25) However sinceit would consume a lot of computational time for a total of 738weight factors here let us adopt the normalization approachto deal with this problem as done in [98 99] that is convert119909119894in (25) to 119910

119894according to the following equation

119910119894=

2tanminus1 (119909119894)

120587

(119894 = 1 2 738) (26)

where tanminus1 means arctangent By means of (26) everycomponent in (25) will be converted into the range of[minus1 1] that is we have minus1 le 119910

119894le 1 As demonstrated

in [98 99] the normalization approach via (26) was quiteeffective in enhancing the quality of prediction operated ina high dimension space Therefore in this study we wouldnot to take the procedure of optimizing the weight factorssignificantly reducing the computational times

23 Fuzzy 119870-Nearest Neighbour Algorithm The 119870-NN (119870-Nearest Neighbor) classifier is quite popular in patternrecognition community owing to its good performance andsimple-to-use feature According to the 119870-NN rule [100]named also as the ldquovoting 119870-NN rulerdquo the query sampleshould be assigned to the subset represented by a majorityof its 119870 nearest neighbors as illustrated in Figure 5 of [33]

Fuzzy 119870-NN classification method [101] is a specialvariation of the119870-NNclassification family Instead of roughlyassigning the label based on a voting from the 119870 nearestneighbors it attempts to estimate the membership valuesthat indicate how much degree the query sample belongsto the classes concerned Obviously it is impossible for anycharacteristic description to contain complete informationwhich would make the classification ambiguous In view ofthis the fuzzy principle is very reasonable and particularlyuseful in dealing with complicated biological systems such asidentifying nuclear receptor subfamilies [102] characterizingthe structure of fast-folding proteins [103] classifying Gprotein-coupled receptors [104] predicting protein quater-nary structural attributes [105] predicting protein structuralclasses [106 107] and so forth

Next let us give a brief introduction how to use thefuzzy119870-NN approach to identify the interactions between theenzymes and the drug compounds in the network concerned

Supposing that S(119873) = G1G2 G

119873 is a set of vec-

tors representing 119873 enzyme-drug pairs in a training set

classified into two classes 119862+ 119862minus where 119862+ denotes theinteractive pair class while 119862minus the noninteractive pair classSlowast(G) = Glowast

1Glowast2 Glowast

119870 sub S(119873) is the subset of the 119870

nearest neighbor pairs to the query pair G Thus the fuzzymembership value for the query pair G in the two classes ofS(119873) is given by

120583

+

(G) =sum

119870

119895=1120583

+(Glowast119895) 119889(GGlowast

119895)

minus2(120593minus1)

sum

119870

119895=1119889(GGlowast

119895)

minus2(120593minus1)

120583

minus

(G) =sum

119870

119895=1120583

minus(Glowast119895) 119889(GGlowast

119895)

minus2(120593minus1)

sum

119870

119895=1119889(GGlowast

119895)

minus2(120593minus1)

(27)

where 119870 is the number of the nearest neighbors counted forthe query pairG 120583+(Glowast

119895) and 120583minus(Glowast

119895) the fuzzy membership

values of the training sample Glowast119895to the class 119862+ and 119862minus

respectively as will be further defined next 119889(GGlowast119895) the

cosine distance between G and its 119895th nearest pair Glowast119895in

the training dataset S(119873) 120593(gt 1) the fuzzy coefficient fordetermining how heavily the distance is weighted whencalculating each nearest neighborrsquos contribution to the mem-bership value Note that the parameters 119870 and 120593 will affectthe computation result of (27) and they will be optimizedby a grid-search as will be described later Also variousother metrics can be chosen for 119889(GGlowast

119895) such as Euclidean

distance Hamming distance [108] andMahalanobis distance[55 109]

The quantitative definitions for the aforementioned120583

+(Glowast119895) and 120583minus(Glowast

119895) in (27) are given by

120583

+

(Glowast119895) =

1 if Glowast119895isin 119862

+

0 otherwise

120583

minus

(Glowast119895) =

1 if Glowast119895isin 119862

minus

0 otherwise

(28)

Substituting the results obtained by (27) into (28) it followsthat if 120583+(G) gt 120583

minus(G) then the query pair G is an

interactive couple otherwise noninteractive In other wordsthe outcome can be formulated as

G isin

119862

+ if 120583+ (G) gt 120583minus (G)

119862

minus otherwise

(29)

If there is a tie between 120583+(G) and 120583minus(G) the query pair Gwill be randomly assigned to one of the two classes Howevercase like that is quite rare and in this study never happened

The predictor thus established is called iEzy-Drugwhere ldquoirdquo means identify and ldquoEzy-Drugrdquo means the inter-action between enzyme and drug To provide an intuitiveoverall picture a flowchart is provided in Figure 2 to showthe process of how the classifier works in identifying enzyme-drug interactions

BioMed Research International 7

Input a queryprotein sequence and

a drug code

Create Chous PseAAC of a protein

Create feature vector of a drug

Feature fusion

engineTraining dataset

Yes No

Noninteractive enzyme drug pair

Interactive enzymedrug pair

AACDipeptideSeq Evol

Grey model

Mol FptInfo entropy

CF

FKNN operation

Figure 2 A flowchart to show the operation process of the iEzy-Drug predictor See the text for further explanation

24 Criteria for Performance Evaluation In the literaturethe following equation set is often used for examining theperformance quality of a predictor

Sn = TPTP + FN

Sp = TNTN + FP

Acc = TP + TNTP + TN + FP + FN

MCC = (TP times TN) minus (FP times FN)radic(TP + FP) (TP + FN) (TN + FP) (TN + FN)

(30)

where TP represents the true positive TN the true negativeFP the false positive FN the false negative Sn the sensitiv-ity Sp the specificity Acc the accuracy MCC the Mathewrsquoscorrelation coefficient

To most biologists however the four metrics as formu-lated in (30) are not quite intuitive and easier-to-understandparticularly for the Mathewrsquos correlation coefficient Herelet us adopt the Choursquos symbols to formulate the previousfour metrics By means of Choursquos symbols [111 112] the rates

of correct predictions for the interactive enzyme-drug pairsin dataset S+ and the noninteractive enzyme-drug pairs indataset Sminus are respectively defined by (cf (1))

Λ

+

=

119873

+minus 119873

+

minus

119873

+ for the interactive enzyme-drug pairs

Λ

minus

=

119873

minusminus 119873

minus

+

119873

minus for the noninteractive enzyme-drug pairs

(31)

where 119873+ is the total number of the interactive enzyme-drug pairs investigated while 119873+

minusis the number of the

interactive enzyme-drug pairs incorrectly predicted as thenoninteractive enzyme-drug pairs 119873minus is the total numberof the noninteractive enzyme-drug pairs investigated while119873

minus

+is the number of the noninteractive enzyme-drug pairs

incorrectly predicted as the interactive enzyme-drug pairsThe overall success prediction rate is given by [113] as follows

Λ =

Λ

+119873

++ Λ

minus119873

minus

119873

++ 119873

minus= 1 minus

119873

+

minus+ 119873

minus

+

119873

++ 119873

minus

(32)

It is obvious from (31)-(32) that if and only if none ofthe interactive enzyme-drug pairs and the noninteractiveenzyme-drug pairs are mispredicted that is 119873+

minus= 119873

minus

+= 0

and Λ+ = Λ

minus= 1 we have the overall success rate Λ = 1

Otherwise the overall success rate would be smaller than 1The relations between the symbols in (32) and those in

(30) are given by

TP = 119873+ minus 119873+minus

TN = 119873

minus

minus 119873

minus

+

FP = 119873minus+

FN = 119873

+

minus

(33)

Substituting (33) into (30) and also noting (31)-(32) we obtain

Sn = Λ+ = 1 minus119873

+

minus

119873

+

Sp = Λminus = 1 minus119873

minus

+

119873

minus

Acc = Λ = 1 minus119873

+

minus+ 119873

minus

+

119873

++ 119873

minus

MCC =1 minus ((119873

+

minus119873

+) + (119873

minus

+119873

minus))

radic(1 + (119873

minus

+minus 119873

+

minus) 119873

+) (1 + (119873

+

minusminus 119873

minus

+) 119873

minus)

(34)

Now it is obvious to see from (34) when 119873

+

minus= 0

meaning none of the interactive enzyme-drug pairs was

8 BioMed Research International

02

46

810

1

15

2089

0895

09

0905

091

0915

092

ACC

K120593

Figure 3 A 3D plot to show how the parameter in (27) wasoptimized for the iEzy-Drug predictor

mispredicted to be a noninteractive enzyme-drug pair wehave the sensitivity Sn = 1 while 119873+

minus= 119873

+ meaning thatall the interactive enzyme-drug pairs weremispredicted to bethe noninteractive enzyme-drug pairs we have the sensitivitySn = 0 Likewise when 119873minus

+= 0 meaning none of the

noninteractive enzyme-drug pairs wasmispredicted we havethe specificity Sp = 1 while 119873minus

+= 119873

minus meaning all thenoninteractive enzyme-drug pairs were incorrectly predictedas interactive enzyme-drug pairs we have the specificity Sp =0 When119873+

minus= 119873

minus

+= 0 meaning that none of the interactive

enzyme-drug pairs in the dataset S+ and none of the nonin-teractive enzyme-drug pairs in Sminus was incorrectly predictedwe have the overall accuracy Acc = Λ = 1 while 119873+

minus= 119873

+

and 119873minus+= 119873

minus meaning that all the interactive enzyme-drugpairs in the dataset S+ and all the noninteractive enzyme-drug pairs in Sminus were mispredicted we have the overallaccuracy Acc = Λ = 0 The MCC correlation coefficient isusually used for measuring the quality of binary (two-class)classifications When 119873+

minus= 119873

minus

+= 0 meaning that none

of the interactive enzyme-drug pairs in the dataset S+ andnone of the noninteractive enzyme-drug pairs in Sminus weremispredicted we have MCC = 1 when 119873+

minus= 119873

+2 and

119873

minus

+= 119873

minus2 we have MCC = 0 meaning no better than

random prediction when 119873+minus= 119873

+ and 119873minus+= 119873

minus we haveMCC = minus1 meaning total disagreement between predictionand observation As we can see from the previous discussionit is much more intuitive and easier-to-understand whenusing (34) to examine a predictor for its sensitivity specificityoverall accuracy and Mathewrsquos correlation coefficient It isinstructive to point out that themetrics as defined in (30) and(34) are valid for single label systems for multilabel systemsa set of more complicated metrics should be used as given in[114]

3 Results and Discussion

31 Cross-Validation How to properly examine the predic-tion quality is a key for developing a new predictor and

False positive rate

True

pos

itive

rate

1

01

02

03

04

05

06

07

08

09

0

101 02 03 04 05 06 07 08 090

BaselineiEzy-Drug AUC = 09377

Figure 4 A plot for the ROC curve to quantitatively show theperformance of the iEzy-Drug predictor

estimating its potential application value Generally speak-ing the following three cross-validation methods are oftenused to examine a predictor of its effectiveness in practicalapplication independent dataset test subsampling or 119870-fold(such as 5-fold 7-fold or 10-fold) test and jackknife test[108] However as elaborated by a penetrating analysis in[115] considerable arbitrariness exists in the independentdataset test Also as demonstrated by (27)ndash(29) in [33]the subsampling test (or 119870-fold cross-validation) cannotavoid arbitrariness either Only the jackknife test is the leastarbitrary that can always yield a unique result for a givenbenchmark dataset Therefore the jackknife test has beenwidely recognized and increasingly utilized by investigatorsto examine the quality of various predictors (see eg [66 7174 80]) Accordingly the success rate by the jackknife test wasalso used to optimize the two uncertain parameters 119870 and 120593in (27) The result thus obtained is shown in Figure 3 fromwhich we obtain when 119870 = 6 and 120593 = 15 the iEzy-Drugpredictor reaches its optimized status

The success rates thus obtained by the jackknife test inidentifying interactive Enzyme-drug pairs or noninteractiveenzyme-drug pairs on the benchmark dataset S (cf OnlineSupporting Information S1) are given in Table 1 where forfacilitating comparison the corresponding result by He et al[110] is also given As we can see from the table the overallaccuracy Acc achieved by iEzy-Drug was 9103 remarkablyhigher than 8548 the corresponding rate obtained byHe et al [110] on the same benchmark Furthermore listedin Table 1 are also the values obtained by iEzy-Drug for theother three metrics that is Sn = 9081 Sp = 9114 andMCC = 8039 indicating that the accuracy of iEzy-Drug isnot only very high but also quite stale

To provide a graphical illustration to show the per-formance of the current binary classifier iEzy-Drug asits discrimination threshold is varied a 2D plot calledReceiver Operating Characteristic (ROC) curve [116 117]was also given (Figure 4) In the ROC curve the verticalcoordinate 119884 is for the true positive rate or Sn (cf (34))while horizontal coordinate 119883 for the false positive rate

BioMed Research International 9

Read Me Data Citation

Enter both the protein sequence and the drug code (example) the number of query pairs is limited at 10 or less for each submission

Submit Clear

You can download the program and run it on your own local computer

iEzy-Drug identifying the interaction between enzymes anddrugs in cellular networking

Figure 5 A semiscreenshot to show the top page of the iEzy-Drugweb-server Its web-site address is at httpwwwjci-bioinfocniEzy-Drug

Table 1 The jackknife success rates obtained with iEzy-Drug in identifying interactive enzyme-drug pairs and noninteractive enzyme-drugpairs for the benchmark dataset S (cf Online Supporting Information S1)

Method Acc Sn Sp MCC

iEzy-Druga 74258157 = 9103 24692719 = 9081 49565438 = 9114 8039NN predictorb 8548 NA NA NAaSee (27) where the parameters119870 = 6 and 120593 = 15bSee [110]

or 1-Sp The best possible prediction method would yielda point with the coordinate (0 1) representing 100 truepositive rate (sensitivity Sn) and 0 false positive rate or100 specificity Therefore the (0 1) point is also calleda perfect classification A completely random guess wouldgive a point along a diagonal from the point (0 0) to (1 1)The area under the ROC curve also called Area Under theROC (AUROC) is often used to indicate the performancequality of a binary classifier the value 05 of AUROCis equivalent to random prediction while 1 of AUROCrepresents a perfect one As we can see from Figure 4the AUROC value obtained by iEzy-Drug is 09377

The reason why iEzy-Drug can remarkably improve theprediction quality is that it has introduced the 2D molecularfingerprints to represent drug samples see Online SupportingInformation S3 for the detailed fingerprint expressions for thedrugs listed in Online Supporting Information S1 and thatit has successfully used PseAAC to incorporate the featuresderived from the sequences of enzymes that are essentialfor identifying the interaction of enzymes with drugs in thecellular networking

To enhance the value of its practical applications theweb server for iEzy-Drug has been established that canbe freely accessible at httpwwwjci-bioinfocniEzy-DrugIt is anticipated that the web server will become a usefulhigh throughput tool for both basic research and drug

development in the relevant areas or at the very least playa complementary role to the existing method [39 110 118] forwhich so far noweb-serverwhatsoever has been provided yet

32 The Protocol or User Guide For the convenience of thevast majority of biologists and pharmaceutical scientists herelet us provide a step-by-step guide to show how the userscan easily get the desired result by means of the web serverwithout the need to follow the complicated mathematicalequations presented in this paper for the process of develop-ing the predictor and its integrityStep 1 Open the web server at the site httpwwwjci-bio-infocniEzy-Drug and you will see the top page of thepredictor on your computer screen as shown in Figure 5Click on the ReadMe button to see a brief introduction aboutiEzy-Drug predictor and the caveat when using itStep 2 Either type or copypaste the query pairs into theinput box at the center of Figure 5 Each query pair consistsof two parts one is for the protein sequence and the other forthe drug The enzyme sequence should be in FASTA formatwhile the drug in the KEGG code Examples for the querypairs input can be seen by clicking on the Example buttonright above the input boxStep 3 Click on the Submit button to see the predicted resultFor example if you use the four query pairs in the Example

10 BioMed Research International

window as the input after clicking the Submit button youwill see on your screen that the ldquohsa 10056rdquo enzyme andthe ldquoD0021rdquo drug are an interactive pair and that the ldquohsa100rdquo enzyme and the ldquoD0037rdquo drug are also an interactivepair but that the ldquohsa 3295rdquo enzyme and the ldquoD00889rdquo drugare not an interactive pair and that the ldquohsa 7366rdquo enzymeand the ldquoD03601rdquo drug are not an interactive pair eitherAll these results are fully consistent with the experimentalobservations It takes about 3 minutes before the results areshown on the screenStep 4 Click on the Citation button to find the relevant paperthat documents the detailed development and algorithm ofiEzy-DurgStep 5 Click on the Data button to download the benchmarkdataset used to train and test the iEzy-Durg predictorStep 6 The program code is also available by clicking thebutton download on the lower panel of Figure 5

Acknowledgments

The authors would like to thank the three anonymousreviewers whose constructive comments are very helpful forstrengthening the presentation of the paper This work wassupported by the Grants from the National Natural ScienceFoundation of China (no 31260273) the Jiangxi ProvincialForeign Scientific and Technological Cooperation Project(no 20120BDH80023) Natural Science Foundation of JiangxiProvince China (no 2010GZS0122 20122BAB201020) theDepartment of Education of Jiangxi Province (GJJ12490)the LuoDi plan of the Department of Education of JiangxiProvince (KJLD12083) and the Jiangxi Provincial Founda-tion for Leaders of Disciplines in Science (20113BCB22008)The funders had no role in study design data collection andanalysis decision to publish or preparation of the paper

References

[1] A Bairoch ldquoThe ENZYME database in 2000rdquo Nucleic AcidsResearch vol 28 no 1 pp 304ndash305 2000

[2] S H Koenig and R D Brown ldquoH2CO3as substrate for carbonic

anhydrase in the dehydration of H2CO3(minus)rdquo Proceedings of the

National Academy of Sciences of the United States of Americavol 69 no 9 pp 2422ndash2425 1972

[3] S X Lin and J Lapointe ldquoTheoretical and experimental biologyin onerdquo Journal of Biomedical Science and Engineering vol 6 pp435ndash442 2013

[4] K C Chou and S P Jiang ldquoStudies on the rate of diffusion-controlled reactions of enzymes Spatial factor and force fieldfactorrdquo Scientia Sinica vol 17 no 5 pp 664ndash680 1974

[5] K C Chou ldquoThe kinetics of the combination reaction betweenenzyme and substraterdquo Scientia Sinica vol 19 no 4 pp 505ndash528 1976

[6] K C Chou and G P Zhou ldquoRole of the protein outside activesite on the diffusion-controlled reaction of enzymerdquo Journal ofthe American Chemical Society vol 104 no 5 pp 1409ndash14131982

[7] R A Poorman A G Tomasselli R L Heinrikson and FJ Kezdy ldquoA cumulative specificity model for proteases from

human immunodeficiency virus types 1 and 2 inferred fromstatistical analysis of an extended substrate data baserdquo TheJournal of Biological Chemistry vol 266 no 22 pp 14554ndash145611991

[8] K-C Chou ldquoA vectorized sequence-coupling model for pre-dicting HIV protease cleavage sites in proteinsrdquo The Journal ofBiological Chemistry vol 268 no 23 pp 16938ndash16948 1993

[9] G Z Liang and S Z Li ldquoA new sequence representation asapplied in better specificity elucidation for human immunod-eficiency virus type 1 proteaserdquo Biopolymers vol 88 no 3 pp401ndash412 2007

[10] J J Chou ldquoPredicting cleavability of peptide sequences byHIVprotease via correlation-angle approachrdquo Journal of ProteinChemistry vol 12 no 3 pp 291ndash302 1993

[11] Q S Du S Wang D Q Wei S Sirois and K C ChouldquoMolecular modeling and chemical modification for findingpeptide inhibitor against severe acute respiratory syndromecoronavirus main proteinaserdquo Analytical Biochemistry vol 337no 2 pp 262ndash270 2005

[12] Y Gan H Huang Y Huang et al ldquoSynthesis and activity ofan octapeptide inhibitor designed for SARS coronavirus mainproteinaserdquo Peptides vol 27 no 4 pp 622ndash625 2006

[13] Q S Du H Sun and K C Chou ldquoInhibitor design for SARScoronavirus main protease based on lsquodistorted key theoryrsquordquoMedicinal Chemistry vol 3 no 1 pp 1ndash6 2007

[14] K C Chou ldquoPrediction of human immunodeficiency virusprotease cleavage sites in proteinsrdquoAnalytical Biochemistry vol233 no 1 pp 1ndash14 1996

[15] J Knowles and G Gromo ldquoTarget selection in drug discoveryrdquoNature Reviews Drug Discovery vol 2 no 1 pp 63ndash69 2003

[16] A C Cheng R G Coleman K T Smyth et al ldquoStructure-basedmaximal affinity model predicts small-molecule druggabilityrdquoNature Biotechnology vol 25 no 1 pp 71ndash75 2007

[17] M Rarey B Kramer T Lengauer and G Klebe ldquoA fast flexibledocking method using an incremental construction algorithmrdquoJournal of Molecular Biology vol 261 no 3 pp 470ndash489 1996

[18] K C Chou ldquoReview structural bioinformatics and its impactto biomedical sciencerdquo Current Medicinal Chemistry vol 11 no16 pp 2105ndash2134 2004

[19] K C Chou D Q Wei and W Z Zhong ldquoBinding mechanismof coronavirus main proteinase with ligands and its implicationto drug design against SARSrdquo Biochemical and BiophysicalResearch Communications vol 308 pp 148ndash151 2003 (Erratumin Biochemical and Biophysical Research Communications vol310 p 675 2003)

[20] K C Chou D Q Wei Q S Du S Sirois and W Z ZhongldquoProgress in computational approach to drug developmentagaint SARSrdquo Current Medicinal Chemistry vol 13 no 27 pp3263ndash3270 2006

[21] G P Zhou and F A Troy ldquoNMR studies on how the bindingcomplex of polyisoprenol recognition sequence peptides andpolyisoprenols can modulate membrane structurerdquo CurrentProtein and Peptide Science vol 6 no 5 pp 399ndash411 2005

[22] R B Huang Q S Du C H Wang and K C Chou ldquoAn in-depth analysis of the biological functional studies based on theNMR M2 channel structure of influenza A virusrdquo Biochemicaland Biophysical Research Communications vol 377 no 4 pp1243ndash1247 2008

[23] Q S Du R B Huang C H Wang X M Li and K C ChouldquoEnergetic analysis of the two controversial drug binding sitesof the M2 proton channel in influenza A virusrdquo Journal ofTheoretical Biology vol 259 no 1 pp 159ndash164 2009

BioMed Research International 11

[24] M J Berardi W M Shih S C Harrison and J J Chou ldquoMito-chondrial uncoupling protein 2 structure determined by NMRmolecular fragment searchingrdquo Nature vol 476 no 7358 pp109ndash114 2011

[25] J R Schnell and J J Chou ldquoStructure andmechanism of theM2proton channel of influenza A virusrdquo Nature vol 451 no 7178pp 591ndash595 2008

[26] B OuYang S Xie M J Berardi et al ldquoUnusual architecture ofthe p7 channel from hepatitis C virusrdquoNature vol 498 pp 521ndash525 2013

[27] K Oxenoid and J J Chou ldquoThe structure of phospholambanpentamer reveals a channel-like architecture in membranesrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 102 no 31 pp 10870ndash10875 2005

[28] M E Call KWWucherpfennig and J J Chou ldquoThe structuralbasis for intramembrane assembly of an activating immunore-ceptor complexrdquo Nature Immunology vol 11 no 11 pp 1023ndash1029 2010

[29] R M Pielak J R Schnell and J J Chou ldquoMechanism of druginhibition and drug resistance of influenza A M2 channelrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 106 no 18 pp 7379ndash7384 2009

[30] J Wang R M Pielak M A McClintock and J J ChouldquoSolution structure and functional analysis of the influenza Bproton channelrdquo Nature Structural and Molecular Biology vol16 no 12 pp 1267ndash1271 2009

[31] K C Chou ldquoCoupling interaction between thromboxane A2receptor and alpha-13 subunit of guanine nucleotide-bindingproteinrdquo Journal of Proteome Research vol 4 no 5 pp 1681ndash1686 2005

[32] K C Chou ldquoInsights from modeling three-dimensional struc-tures of the human potassium and sodium channelsrdquo Journal ofProteome Research vol 3 no 4 pp 856ndash861 2004

[33] K C Chou ldquoSome remarks on protein attribute prediction andpseudo amino acid compositionrdquo Journal ofTheoretical Biologyvol 273 no 1 pp 236ndash247 2011

[34] W Z Lin J A Fang X Xiao and K C Chou ldquoiLoc-animal amulti-label learning classifier for predicting subcellular local-ization of animal proteinsrdquo Molecular BioSystems vol 9 pp634ndash644 2013

[35] X Xiao PWangW Z Lin J H Jia and K C Chou ldquoiAMP-2La two-level multi-label classifier for identifying antimicrobialpeptides and their functional typesrdquo Analytical Biochemistryvol 436 pp 168ndash177 2013

[36] W Chen P M Feng H Lin and K C Chou ldquoiRSpot-PseDNCidentify recombination spots with pseudo dinucleotide compo-sitionrdquo Nucleic Acids Research vol 41 article e69 2013

[37] K C Chou Z C Wu and X Xiao ldquoILoc-Hum using theaccumulation-label scale to predict subcellular locations ofhuman proteins with both single and multiple sitesrdquoMolecularBioSystems vol 8 no 2 pp 629ndash641 2012

[38] M Kotera M Hirakawa T Tokimatsu S Goto and MKanehisa ldquoThe KEGG databases and tools facilitating omicsanalysis latest developments involving human diseases andpharmaceuticalsrdquo Methods in Molecular Biology vol 802 pp19ndash39 2012

[39] Y YamanishiM Araki A GutteridgeWHonda andM Kane-hisa ldquoPrediction of drug-target interaction networks from theintegration of chemical and genomic spacesrdquo Bioinformaticsvol 24 no 13 pp i232ndashi240 2008

[40] P Finn S Muggleton D Page and A Srinivasan ldquoPharma-cophore discovery using the Inductive Logic Programmingsystem PROGOLrdquo Machine Learning vol 30 no 2-3 pp 241ndash270 1998

[41] I Vogt D Stumpfe H E Ahmed and J Bajorath ldquoMethodsfor computer-aided chemical biology Part 2 evaluation of com-pound selectivity using 2D molecular fingerprintsrdquo ChemicalBiology and Drug Design vol 70 pp 195ndash205 2007

[42] H Eckert and J Bajorath ldquoMolecular similarity analysis in vir-tual screening foundations limitations and novel approachesrdquoDrug Discovery Today vol 12 no 5-6 pp 225ndash233 2007

[43] S Laurent L V Elst and R N Muller ldquoComparative studyof the physicochemical properties of six clinical low molecularweight gadolinium contrast agentsrdquo Contrast Media amp Molecu-lar Imaging vol 1 no 3 pp 128ndash137 2006

[44] E Gregori-Puigjane R Garriga-Sust and J Mestres ldquoIndexingmolecules with chemical graph identifiersrdquo Journal of Compu-tational Chemistry vol 32 no 12 pp 2638ndash2646 2011

[45] B Ren ldquoApplication of novel atom-type AI topological indicesto QSPR studies of alkanesrdquo Computers and Chemistry vol 26no 4 pp 357ndash369 2002

[46] N M OrsquoBoyle M Banck C A James C Morley T Vander-meersch and G R Hutchison ldquoOpen Babel an open chemicaltoolboxrdquo Journal of Cheminformatics vol 3 p 33 2011

[47] V J Gillet P Willett and J Bradshaw ldquoSimilarity searchingusing reduced graphsrdquo Journal of Chemical Information andComputer Sciences vol 43 no 2 pp 338ndash345 2003

[48] D Butina ldquoUnsupervised data base clustering based on day-lightrsquos fingerprint and Tanimoto similarity a fast and automatedway to cluster small and large data setsrdquo Journal of ChemicalInformation and Computer Sciences vol 39 no 4 pp 747ndash7501999

[49] C E Shannon ldquoA mathematical theory of communicationrdquoACM SIGMOBILE Mobile Computing and CommunicationsReview vol 5 pp 3ndash55 2001

[50] V D Gusev L A Nemytikova and N A Chuzhanova ldquoOn thecomplexity measures of genetic sequencesrdquo Bioinformatics vol15 no 12 pp 994ndash999 1999

[51] K C Chou and H B Shen ldquoReview recent progress in proteinsubcellular location predictionrdquo Analytical Biochemistry vol370 no 1 pp 1ndash16 2007

[52] S F Altschul ldquoEvaluating the statistical significance of multipledistinct local alignmentsrdquo in Theoretical and ComputationalMethods in Genome Research S Suhai Ed pp 1ndash14 PlenumNew York NY USA 1997

[53] J C Wootton and S Federhen ldquoStatistics of local complexity inamino acid sequences and sequence databasesrdquo Computers andChemistry vol 17 no 2 pp 149ndash163 1993

[54] H Nakashima K Nishikawa and T Ooi ldquoThe folding type ofa protein is relevant to the amino acid compositionrdquo Journal ofBiochemistry vol 99 no 1 pp 153ndash162 1986

[55] K C Chou and C T Zhang ldquoPredicting protein folding typesby distance functions that make allowances for amino acidinteractionsrdquo The Journal of Biological Chemistry vol 269 no35 pp 22014ndash22020 1994

[56] K-C Chou ldquoA novel approach to predicting protein structuralclasses in a (20-1)-D amino acid composition spacerdquo Proteinsvol 21 no 4 pp 319ndash344 1995

[57] H Nakashima and K Nishikawa ldquoDiscrimination of intracel-lular and extracellular proteins using amino acid compositionand residue-pair frequenciesrdquo Journal of Molecular Biology vol238 no 1 pp 54ndash61 1994

12 BioMed Research International

[58] G P Zhou ldquoAn intriguing controversy over protein structuralclass predictionrdquo Journal of Protein Chemistry vol 17 no 8 pp729ndash738 1998

[59] I Bahar A R Atilgan R L Jernigan and B Erman ldquoUnder-standing the recognition of protein structural classes by aminoacid compositionrdquo Proteins vol 29 pp 172ndash185 1997

[60] J Cedano P Aloy J A Perez-Pons and E Querol ldquoRelationbetween amino acid composition and cellular location ofproteinsrdquo Journal of Molecular Biology vol 266 no 3 pp 594ndash600 1997

[61] G P Zhou and K Doctor ldquoSubcellular location prediction ofapoptosis proteinsrdquo Proteins vol 50 no 1 pp 44ndash48 2003

[62] KChou ldquoPrediction of protein cellular attributes using pseudo-amino acid compositionrdquo Proteins vol 43 pp 246ndash255 2001(Erratum in Proteins vol 44 p 60 2001)

[63] K C Chou ldquoUsing amphiphilic pseudo amino acid composi-tion to predict enzyme subfamily classesrdquo Bioinformatics vol21 no 1 pp 10ndash19 2005

[64] D Zou Z He J He and Y Xia ldquoSupersecondary structureprediction using Choursquos pseudo amino acid compositionrdquoJournal of Computational Chemistry vol 32 no 2 pp 271ndash2782011

[65] M M Beigi M Behjati and H Mohabatkar ldquoPrediction ofmetalloproteinase family based on the concept ofChoursquos pseudoamino acid composition using a machine learning approachrdquoJournal of Structural and Functional Genomics vol 12 no 4 pp191ndash197 2011

[66] Y K Chen and K B Li ldquoPredicting membrane protein typesby incorporating protein topology domains signal peptidesand physicochemical properties into the general form of Choursquospseudo amino acid compositionrdquo Journal ofTheoretical Biologyvol 318 pp 1ndash12 2013

[67] C Huang and J Q Yuan ldquoA multilabel model based on Choursquospseudo-amino acid composition for identifying membraneproteins with both single and multiple functional typesrdquo TheJournal of Membrane Biology vol 246 pp 327ndash334 2013

[68] S S Sahu and G Panda ldquoA novel feature representationmethod based on Choursquos pseudo amino acid composition forprotein structural class predictionrdquo Computational Biology andChemistry vol 34 no 5-6 pp 320ndash327 2010

[69] M Hayat and A Khan ldquoDiscriminating outer membraneproteins with fuzzy K-nearest neighbor algorithms based on thegeneral form of Choursquos PseAACrdquo Protein and Peptide Lettersvol 19 no 4 pp 411ndash421 2012

[70] MKhosravian F K FaramarziMM BeigiM Behbahani andH Mohabatkar ldquoPredicting antibacterial peptides by the con-cept of Choursquos pseudo-amino acid composition and machinelearning methodsrdquo Protein amp Peptide Letters vol 20 pp 180ndash186 2013

[71] HMohabatkarMM Beigi K Abdolahi and SMohsenzadehldquoPrediction of allergenic proteins by means of the concept ofChoursquos pseudo amino acid composition and amachine learningapproachrdquoMedicinal Chemistry vol 9 pp 133ndash137 2013

[72] L Nanni A Lumini D Gupta and A Garg ldquoIdentifyingbacterial virulent proteins by fusing a set of classifiers basedon variants of Choursquos pseudo amino acid composition and onevolutionary informationrdquo IEEEACM Transactions on Compu-tational Biology and Bioinformatics vol 9 no 2 pp 467ndash4752012

[73] T H Chang L C Wu T Y Lee S P Chen H D Huang andJ T Horng ldquoEuLoc a web-server for accurately predict protein

subcellular localization in eukaryotes by incorporating variousfeatures of sequence segments into the general form of ChoursquosPseAACrdquo Journal of Computer-Aided Molecular Design vol 27pp 91ndash103 2013

[74] S Zhang Y Zhang H Yang C Zhao and Q Pan ldquoUsing theconcept of Choursquos pseudo amino acid composition to predictprotein subcellular localization an approach by incorporatingevolutionary information and von Neumann entropiesrdquo AminoAcids vol 34 no 4 pp 565ndash572 2008

[75] R Zia Ur and A Khan ldquoIdentifying GPCRs and their typeswith Choursquos pseudo amino acid composition an approach frommulti-scale energy representation and position specific scoringmatrixrdquo Protein amp Peptide Letters vol 19 pp 890ndash903 2012

[76] X Y Sun S P Shi J D Qiu S B Suo S Y Huang and R PLiang ldquoIdentifying protein quaternary structural attributes byincorporating physicochemical properties into the general formof Choursquos PseAAC via discrete wavelet transformrdquo MolecularBioSystems vol 8 pp 3178ndash3184 2012

[77] L Nanni and A Lumini ldquoGenetic programming for creatingChoursquos pseudo amino acid based features for submitochondrialocalizationrdquo Amino Acids vol 34 no 4 pp 653ndash660 2008

[78] M Esmaeili H Mohabatkar and S Mohsenzadeh ldquoUsing theconcept of Choursquos pseudo amino acid composition for risk typeprediction of human papillomavirusesrdquo Journal of TheoreticalBiology vol 263 no 2 pp 203ndash209 2010

[79] H Mohabatkar ldquoPrediction of cyclin proteins using Choursquospseudo amino acid compositionrdquo Protein and Peptide Lettersvol 17 no 10 pp 1207ndash1214 2010

[80] H Mohabatkar M M Beigi and A Esmaeili ldquoPredictionof GABA(A) receptor proteins using the concept of Choursquospseudo-amino acid composition and support vector machinerdquoJournal of Theoretical Biology vol 281 no 1 pp 18ndash23 2011

[81] Y Xu J Ding L Y Wu and K C Chou ldquoiSNO-PseAAC pre-dict cysteine S-nitrosylation sites in proteins by incorporatingposition specific amino acid propensity into pseudo amino acidcompositionrdquo PLoS ONE vol 8 Article ID e55844 2013

[82] W Chen H Lin PM Feng C Ding Y C Zuo andK C ChouldquoiNuc-PhysChem a sequence-based predictor for identifyingnucleosomes via physicochemical propertiesrdquo PLoS ONE vol7 Article ID e47843 2012

[83] B Li T Huang L Liu Y Cai and K C Chou ldquoIdentificationof colorectal cancer related genes with mrmr and shortest pathin protein-protein interaction networkrdquo PLoS ONE vol 7 no 4Article ID e33393 2012

[84] Y Jiang T Huang C Lei Y F Gao Y D Cai and K C ChouldquoSignal propagation in protein interaction network duringcolorectal cancer progressionrdquo BioMed Research Internationalvol 2013 Article ID 287019 9 pages 2013

[85] P Du X Wang C Xu and Y Gao ldquoPseAAC-Builder across-platform stand-alone program for generating variousspecial Choursquos pseudo-amino acid compositionsrdquo AnalyticalBiochemistry vol 425 no 2 pp 117ndash119 2012

[86] D S Cao Q S Xu and Y Z Liang ldquoPropy a tool to generatevarious modes of Choursquos PseAACrdquo Bioinformatics vol 29 pp960ndash962 2013

[87] H Shen and K C Chou ldquoPseAAC a flexible web serverfor generating various kinds of protein pseudo amino acidcompositionrdquo Analytical Biochemistry vol 373 no 2 pp 386ndash388 2008

[88] K C Chou ldquoUsing pair-coupled amino acid composition topredict protein secondary structure contentrdquo Journal of ProteinChemistry vol 18 no 4 pp 473ndash480 1999

BioMed Research International 13

[89] W Liu and K C Chou ldquoPrediction of protein secondarystructure contentrdquo Protein Engineering vol 12 no 12 pp 1041ndash1050 1999

[90] M Bhasin and G P S Raghava ldquoClassification of nuclearreceptors based on amino acid composition and dipeptidecompositionrdquo The Journal of Biological Chemistry vol 279 no22 pp 23262ndash23266 2004

[91] H Lin andH Ding ldquoPredicting ion channels and their types bythe dipeptidemode of pseudo amino acid compositionrdquo Journalof Theoretical Biology vol 269 no 1 pp 64ndash69 2011

[92] H Lin and Q Li ldquoUsing pseudo amino acid composition topredict protein structural class approached by incorporating400 dipeptide componentsrdquo Journal of Computational Chem-istry vol 28 no 9 pp 1463ndash1466 2007

[93] L Nanni and A Lumini ldquoCombing ontologies and dipeptidecomposition for predicting DNA-binding proteinsrdquo AminoAcids vol 34 no 4 pp 635ndash641 2008

[94] K-C Chou ldquoThe convergence-divergence duality in lectindomains of selectin family and its implicationsrdquo FEBS Lettersvol 363 no 1-2 pp 123ndash126 1995

[95] A A Schaffer L Aravind T L Madden et al ldquoImprovingthe accuracy of PSI-BLAST protein database searches withcomposition-based statistics and other refinementsrdquo NucleicAcids Research vol 29 no 14 pp 2994ndash3005 2001

[96] S F Altschul and E V Koonin ldquoIterated profile searches withPSI-BLAST a tool for discovery in protein databasesrdquo Trends inBiochemical Sciences vol 23 no 11 pp 444ndash447 1998

[97] J Deng ldquoGrey entropy and grey target decision makingrdquoJournal of Grey System vol 22 no 1 pp 1ndash24 2010

[98] B M Bolstad R A Irizarry M Astrand and T P Speed ldquoAcomparison of normalizationmethods for high density oligonu-cleotide array data based on variance and biasrdquo Bioinformaticsvol 19 no 2 pp 185ndash193 2003

[99] J-Y Shi S-W Zhang Q Pan Y-M Cheng and J XieldquoPrediction of protein subcellular localization by support vectormachines using multi-scale energy and pseudo amino acidcompositionrdquo Amino Acids vol 33 no 1 pp 69ndash74 2007

[100] T Denoeux ldquoA 120581-nearest neighbor classification rule based onDempster-Shafer theoryrdquo IEEE Transactions on Systems Manand Cybernetics vol 25 no 5 pp 804ndash813 1995

[101] J M Keller M R Gray and J A Givens ldquoA fuzzy k-nearestneighbours algorithmrdquo IEEE Transactions on Systems Man andCybernetics vol 15 no 4 pp 580ndash585 1985

[102] X Xiao P Wang and K Chou ldquoiNR-physchem a sequence-based predictor for identifying nuclear receptors and theirsubfamilies via physical-chemical property matrixrdquo PLoS ONEvol 7 no 2 Article ID e30869 2012

[103] I Roterman L Konieczny W Jurkowski K Prymula and MBanach ldquoTwo-intermediate model to characterize the structureof fast-folding proteinsrdquo Journal of Theoretical Biology vol 283no 1 pp 60ndash70 2011

[104] X Xiao P Wang and K C Chou ldquoGPCR-2L predicting Gprotein-coupled receptors and their types by hybridizing twodifferentmodes of pseudo amino acid compositionsrdquoMolecularBioSystems vol 7 no 3 pp 911ndash919 2011

[105] X Xiao P Wang and K C Chou ldquoQuat-2L a web-server forpredicting protein quaternary structural attributesrdquo MolecularDiversity vol 15 no 1 pp 149ndash155 2011

[106] X Zheng C Li and J Wang ldquoAn information-theoreticapproach to the prediction of protein structural classrdquo Journalof Computational Chemistry vol 31 no 6 pp 1201ndash1206 2010

[107] H Shen J Yang X Liu and K C Chou ldquoUsing supervisedfuzzy clustering to predict protein structural classesrdquo Biochem-ical and Biophysical Research Communications vol 334 no 2pp 577ndash581 2005

[108] K-C Chou and C-T Zhang ldquoReview prediction of proteinstructural classesrdquo Critical Reviews in Biochemistry and Molec-ular Biology vol 30 no 4 pp 275ndash349 1995

[109] P C Mahalanobis ldquoOn the generalized distance in statisticsrdquoProceedings of the National Institute of Sciences of India vol 2pp 49ndash55 1936

[110] R M Centor ldquoSignal detectability the use of ROC curves andtheir analysesrdquoMedical Decision Making vol 11 no 2 pp 102ndash106 1991

[111] K-C Chou ldquoUsing subsite coupling to predict signal peptidesrdquoProtein Engineering vol 14 no 2 pp 75ndash79 2001

[112] K C Chou ldquoPrediction of protein signal sequences and theircleavage sitesrdquo Proteins vol 42 pp 136ndash139 2001

[113] K C Chou ldquoPrediction of signal peptides using scaledwindowrdquoPeptides vol 22 no 12 pp 1973ndash1979 2001

[114] K C Chou ldquoSome remarks on predicting multi-label attributesinmolecular biosystemsrdquoMolecular Biosystems vol 9 pp 1092ndash1100 2013

[115] K C Chou and H B Shen ldquoCell-PLoc 2 0 an improvedpackage of web-servers for predicting subcellular localization ofproteins in various organismsrdquoNatural Science vol 2 pp 1090ndash1103 2010

[116] M Gribskov and N L Robinson ldquoUse of receiver operatingcharacteristic (ROC) analysis to evaluate sequence matchingrdquoComputers and Chemistry vol 20 no 1 pp 25ndash33 1996

[117] Y Yamanishi M Kotera M Kanehisa and S Goto ldquoDrug-target interaction prediction from chemical genomic and phar-macological data in an integrated frameworkrdquo Bioinformaticsvol 26 no 12 pp i246ndashi254 2010

[118] Z He J Zhang X Shi et al ldquoPredicting drug-target interactionnetworks based on functional groups and biological featuresrdquoPLoS ONE vol 5 no 3 Article ID e9603 2010

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

BioMed Research International 7

Input a queryprotein sequence and

a drug code

Create Chous PseAAC of a protein

Create feature vector of a drug

Feature fusion

engineTraining dataset

Yes No

Noninteractive enzyme drug pair

Interactive enzymedrug pair

AACDipeptideSeq Evol

Grey model

Mol FptInfo entropy

CF

FKNN operation

Figure 2 A flowchart to show the operation process of the iEzy-Drug predictor See the text for further explanation

24 Criteria for Performance Evaluation In the literaturethe following equation set is often used for examining theperformance quality of a predictor

Sn = TPTP + FN

Sp = TNTN + FP

Acc = TP + TNTP + TN + FP + FN

MCC = (TP times TN) minus (FP times FN)radic(TP + FP) (TP + FN) (TN + FP) (TN + FN)

(30)

where TP represents the true positive TN the true negativeFP the false positive FN the false negative Sn the sensitiv-ity Sp the specificity Acc the accuracy MCC the Mathewrsquoscorrelation coefficient

To most biologists however the four metrics as formu-lated in (30) are not quite intuitive and easier-to-understandparticularly for the Mathewrsquos correlation coefficient Herelet us adopt the Choursquos symbols to formulate the previousfour metrics By means of Choursquos symbols [111 112] the rates

of correct predictions for the interactive enzyme-drug pairsin dataset S+ and the noninteractive enzyme-drug pairs indataset Sminus are respectively defined by (cf (1))

Λ

+

=

119873

+minus 119873

+

minus

119873

+ for the interactive enzyme-drug pairs

Λ

minus

=

119873

minusminus 119873

minus

+

119873

minus for the noninteractive enzyme-drug pairs

(31)

where 119873+ is the total number of the interactive enzyme-drug pairs investigated while 119873+

minusis the number of the

interactive enzyme-drug pairs incorrectly predicted as thenoninteractive enzyme-drug pairs 119873minus is the total numberof the noninteractive enzyme-drug pairs investigated while119873

minus

+is the number of the noninteractive enzyme-drug pairs

incorrectly predicted as the interactive enzyme-drug pairsThe overall success prediction rate is given by [113] as follows

Λ =

Λ

+119873

++ Λ

minus119873

minus

119873

++ 119873

minus= 1 minus

119873

+

minus+ 119873

minus

+

119873

++ 119873

minus

(32)

It is obvious from (31)-(32) that if and only if none ofthe interactive enzyme-drug pairs and the noninteractiveenzyme-drug pairs are mispredicted that is 119873+

minus= 119873

minus

+= 0

and Λ+ = Λ

minus= 1 we have the overall success rate Λ = 1

Otherwise the overall success rate would be smaller than 1The relations between the symbols in (32) and those in

(30) are given by

TP = 119873+ minus 119873+minus

TN = 119873

minus

minus 119873

minus

+

FP = 119873minus+

FN = 119873

+

minus

(33)

Substituting (33) into (30) and also noting (31)-(32) we obtain

Sn = Λ+ = 1 minus119873

+

minus

119873

+

Sp = Λminus = 1 minus119873

minus

+

119873

minus

Acc = Λ = 1 minus119873

+

minus+ 119873

minus

+

119873

++ 119873

minus

MCC =1 minus ((119873

+

minus119873

+) + (119873

minus

+119873

minus))

radic(1 + (119873

minus

+minus 119873

+

minus) 119873

+) (1 + (119873

+

minusminus 119873

minus

+) 119873

minus)

(34)

Now it is obvious to see from (34) when 119873

+

minus= 0

meaning none of the interactive enzyme-drug pairs was

8 BioMed Research International

02

46

810

1

15

2089

0895

09

0905

091

0915

092

ACC

K120593

Figure 3 A 3D plot to show how the parameter in (27) wasoptimized for the iEzy-Drug predictor

mispredicted to be a noninteractive enzyme-drug pair wehave the sensitivity Sn = 1 while 119873+

minus= 119873

+ meaning thatall the interactive enzyme-drug pairs weremispredicted to bethe noninteractive enzyme-drug pairs we have the sensitivitySn = 0 Likewise when 119873minus

+= 0 meaning none of the

noninteractive enzyme-drug pairs wasmispredicted we havethe specificity Sp = 1 while 119873minus

+= 119873

minus meaning all thenoninteractive enzyme-drug pairs were incorrectly predictedas interactive enzyme-drug pairs we have the specificity Sp =0 When119873+

minus= 119873

minus

+= 0 meaning that none of the interactive

enzyme-drug pairs in the dataset S+ and none of the nonin-teractive enzyme-drug pairs in Sminus was incorrectly predictedwe have the overall accuracy Acc = Λ = 1 while 119873+

minus= 119873

+

and 119873minus+= 119873

minus meaning that all the interactive enzyme-drugpairs in the dataset S+ and all the noninteractive enzyme-drug pairs in Sminus were mispredicted we have the overallaccuracy Acc = Λ = 0 The MCC correlation coefficient isusually used for measuring the quality of binary (two-class)classifications When 119873+

minus= 119873

minus

+= 0 meaning that none

of the interactive enzyme-drug pairs in the dataset S+ andnone of the noninteractive enzyme-drug pairs in Sminus weremispredicted we have MCC = 1 when 119873+

minus= 119873

+2 and

119873

minus

+= 119873

minus2 we have MCC = 0 meaning no better than

random prediction when 119873+minus= 119873

+ and 119873minus+= 119873

minus we haveMCC = minus1 meaning total disagreement between predictionand observation As we can see from the previous discussionit is much more intuitive and easier-to-understand whenusing (34) to examine a predictor for its sensitivity specificityoverall accuracy and Mathewrsquos correlation coefficient It isinstructive to point out that themetrics as defined in (30) and(34) are valid for single label systems for multilabel systemsa set of more complicated metrics should be used as given in[114]

3 Results and Discussion

31 Cross-Validation How to properly examine the predic-tion quality is a key for developing a new predictor and

False positive rate

True

pos

itive

rate

1

01

02

03

04

05

06

07

08

09

0

101 02 03 04 05 06 07 08 090

BaselineiEzy-Drug AUC = 09377

Figure 4 A plot for the ROC curve to quantitatively show theperformance of the iEzy-Drug predictor

estimating its potential application value Generally speak-ing the following three cross-validation methods are oftenused to examine a predictor of its effectiveness in practicalapplication independent dataset test subsampling or 119870-fold(such as 5-fold 7-fold or 10-fold) test and jackknife test[108] However as elaborated by a penetrating analysis in[115] considerable arbitrariness exists in the independentdataset test Also as demonstrated by (27)ndash(29) in [33]the subsampling test (or 119870-fold cross-validation) cannotavoid arbitrariness either Only the jackknife test is the leastarbitrary that can always yield a unique result for a givenbenchmark dataset Therefore the jackknife test has beenwidely recognized and increasingly utilized by investigatorsto examine the quality of various predictors (see eg [66 7174 80]) Accordingly the success rate by the jackknife test wasalso used to optimize the two uncertain parameters 119870 and 120593in (27) The result thus obtained is shown in Figure 3 fromwhich we obtain when 119870 = 6 and 120593 = 15 the iEzy-Drugpredictor reaches its optimized status

The success rates thus obtained by the jackknife test inidentifying interactive Enzyme-drug pairs or noninteractiveenzyme-drug pairs on the benchmark dataset S (cf OnlineSupporting Information S1) are given in Table 1 where forfacilitating comparison the corresponding result by He et al[110] is also given As we can see from the table the overallaccuracy Acc achieved by iEzy-Drug was 9103 remarkablyhigher than 8548 the corresponding rate obtained byHe et al [110] on the same benchmark Furthermore listedin Table 1 are also the values obtained by iEzy-Drug for theother three metrics that is Sn = 9081 Sp = 9114 andMCC = 8039 indicating that the accuracy of iEzy-Drug isnot only very high but also quite stale

To provide a graphical illustration to show the per-formance of the current binary classifier iEzy-Drug asits discrimination threshold is varied a 2D plot calledReceiver Operating Characteristic (ROC) curve [116 117]was also given (Figure 4) In the ROC curve the verticalcoordinate 119884 is for the true positive rate or Sn (cf (34))while horizontal coordinate 119883 for the false positive rate

BioMed Research International 9

Read Me Data Citation

Enter both the protein sequence and the drug code (example) the number of query pairs is limited at 10 or less for each submission

Submit Clear

You can download the program and run it on your own local computer

iEzy-Drug identifying the interaction between enzymes anddrugs in cellular networking

Figure 5 A semiscreenshot to show the top page of the iEzy-Drugweb-server Its web-site address is at httpwwwjci-bioinfocniEzy-Drug

Table 1 The jackknife success rates obtained with iEzy-Drug in identifying interactive enzyme-drug pairs and noninteractive enzyme-drugpairs for the benchmark dataset S (cf Online Supporting Information S1)

Method Acc Sn Sp MCC

iEzy-Druga 74258157 = 9103 24692719 = 9081 49565438 = 9114 8039NN predictorb 8548 NA NA NAaSee (27) where the parameters119870 = 6 and 120593 = 15bSee [110]

or 1-Sp The best possible prediction method would yielda point with the coordinate (0 1) representing 100 truepositive rate (sensitivity Sn) and 0 false positive rate or100 specificity Therefore the (0 1) point is also calleda perfect classification A completely random guess wouldgive a point along a diagonal from the point (0 0) to (1 1)The area under the ROC curve also called Area Under theROC (AUROC) is often used to indicate the performancequality of a binary classifier the value 05 of AUROCis equivalent to random prediction while 1 of AUROCrepresents a perfect one As we can see from Figure 4the AUROC value obtained by iEzy-Drug is 09377

The reason why iEzy-Drug can remarkably improve theprediction quality is that it has introduced the 2D molecularfingerprints to represent drug samples see Online SupportingInformation S3 for the detailed fingerprint expressions for thedrugs listed in Online Supporting Information S1 and thatit has successfully used PseAAC to incorporate the featuresderived from the sequences of enzymes that are essentialfor identifying the interaction of enzymes with drugs in thecellular networking

To enhance the value of its practical applications theweb server for iEzy-Drug has been established that canbe freely accessible at httpwwwjci-bioinfocniEzy-DrugIt is anticipated that the web server will become a usefulhigh throughput tool for both basic research and drug

development in the relevant areas or at the very least playa complementary role to the existing method [39 110 118] forwhich so far noweb-serverwhatsoever has been provided yet

32 The Protocol or User Guide For the convenience of thevast majority of biologists and pharmaceutical scientists herelet us provide a step-by-step guide to show how the userscan easily get the desired result by means of the web serverwithout the need to follow the complicated mathematicalequations presented in this paper for the process of develop-ing the predictor and its integrityStep 1 Open the web server at the site httpwwwjci-bio-infocniEzy-Drug and you will see the top page of thepredictor on your computer screen as shown in Figure 5Click on the ReadMe button to see a brief introduction aboutiEzy-Drug predictor and the caveat when using itStep 2 Either type or copypaste the query pairs into theinput box at the center of Figure 5 Each query pair consistsof two parts one is for the protein sequence and the other forthe drug The enzyme sequence should be in FASTA formatwhile the drug in the KEGG code Examples for the querypairs input can be seen by clicking on the Example buttonright above the input boxStep 3 Click on the Submit button to see the predicted resultFor example if you use the four query pairs in the Example

10 BioMed Research International

window as the input after clicking the Submit button youwill see on your screen that the ldquohsa 10056rdquo enzyme andthe ldquoD0021rdquo drug are an interactive pair and that the ldquohsa100rdquo enzyme and the ldquoD0037rdquo drug are also an interactivepair but that the ldquohsa 3295rdquo enzyme and the ldquoD00889rdquo drugare not an interactive pair and that the ldquohsa 7366rdquo enzymeand the ldquoD03601rdquo drug are not an interactive pair eitherAll these results are fully consistent with the experimentalobservations It takes about 3 minutes before the results areshown on the screenStep 4 Click on the Citation button to find the relevant paperthat documents the detailed development and algorithm ofiEzy-DurgStep 5 Click on the Data button to download the benchmarkdataset used to train and test the iEzy-Durg predictorStep 6 The program code is also available by clicking thebutton download on the lower panel of Figure 5

Acknowledgments

The authors would like to thank the three anonymousreviewers whose constructive comments are very helpful forstrengthening the presentation of the paper This work wassupported by the Grants from the National Natural ScienceFoundation of China (no 31260273) the Jiangxi ProvincialForeign Scientific and Technological Cooperation Project(no 20120BDH80023) Natural Science Foundation of JiangxiProvince China (no 2010GZS0122 20122BAB201020) theDepartment of Education of Jiangxi Province (GJJ12490)the LuoDi plan of the Department of Education of JiangxiProvince (KJLD12083) and the Jiangxi Provincial Founda-tion for Leaders of Disciplines in Science (20113BCB22008)The funders had no role in study design data collection andanalysis decision to publish or preparation of the paper

References

[1] A Bairoch ldquoThe ENZYME database in 2000rdquo Nucleic AcidsResearch vol 28 no 1 pp 304ndash305 2000

[2] S H Koenig and R D Brown ldquoH2CO3as substrate for carbonic

anhydrase in the dehydration of H2CO3(minus)rdquo Proceedings of the

National Academy of Sciences of the United States of Americavol 69 no 9 pp 2422ndash2425 1972

[3] S X Lin and J Lapointe ldquoTheoretical and experimental biologyin onerdquo Journal of Biomedical Science and Engineering vol 6 pp435ndash442 2013

[4] K C Chou and S P Jiang ldquoStudies on the rate of diffusion-controlled reactions of enzymes Spatial factor and force fieldfactorrdquo Scientia Sinica vol 17 no 5 pp 664ndash680 1974

[5] K C Chou ldquoThe kinetics of the combination reaction betweenenzyme and substraterdquo Scientia Sinica vol 19 no 4 pp 505ndash528 1976

[6] K C Chou and G P Zhou ldquoRole of the protein outside activesite on the diffusion-controlled reaction of enzymerdquo Journal ofthe American Chemical Society vol 104 no 5 pp 1409ndash14131982

[7] R A Poorman A G Tomasselli R L Heinrikson and FJ Kezdy ldquoA cumulative specificity model for proteases from

human immunodeficiency virus types 1 and 2 inferred fromstatistical analysis of an extended substrate data baserdquo TheJournal of Biological Chemistry vol 266 no 22 pp 14554ndash145611991

[8] K-C Chou ldquoA vectorized sequence-coupling model for pre-dicting HIV protease cleavage sites in proteinsrdquo The Journal ofBiological Chemistry vol 268 no 23 pp 16938ndash16948 1993

[9] G Z Liang and S Z Li ldquoA new sequence representation asapplied in better specificity elucidation for human immunod-eficiency virus type 1 proteaserdquo Biopolymers vol 88 no 3 pp401ndash412 2007

[10] J J Chou ldquoPredicting cleavability of peptide sequences byHIVprotease via correlation-angle approachrdquo Journal of ProteinChemistry vol 12 no 3 pp 291ndash302 1993

[11] Q S Du S Wang D Q Wei S Sirois and K C ChouldquoMolecular modeling and chemical modification for findingpeptide inhibitor against severe acute respiratory syndromecoronavirus main proteinaserdquo Analytical Biochemistry vol 337no 2 pp 262ndash270 2005

[12] Y Gan H Huang Y Huang et al ldquoSynthesis and activity ofan octapeptide inhibitor designed for SARS coronavirus mainproteinaserdquo Peptides vol 27 no 4 pp 622ndash625 2006

[13] Q S Du H Sun and K C Chou ldquoInhibitor design for SARScoronavirus main protease based on lsquodistorted key theoryrsquordquoMedicinal Chemistry vol 3 no 1 pp 1ndash6 2007

[14] K C Chou ldquoPrediction of human immunodeficiency virusprotease cleavage sites in proteinsrdquoAnalytical Biochemistry vol233 no 1 pp 1ndash14 1996

[15] J Knowles and G Gromo ldquoTarget selection in drug discoveryrdquoNature Reviews Drug Discovery vol 2 no 1 pp 63ndash69 2003

[16] A C Cheng R G Coleman K T Smyth et al ldquoStructure-basedmaximal affinity model predicts small-molecule druggabilityrdquoNature Biotechnology vol 25 no 1 pp 71ndash75 2007

[17] M Rarey B Kramer T Lengauer and G Klebe ldquoA fast flexibledocking method using an incremental construction algorithmrdquoJournal of Molecular Biology vol 261 no 3 pp 470ndash489 1996

[18] K C Chou ldquoReview structural bioinformatics and its impactto biomedical sciencerdquo Current Medicinal Chemistry vol 11 no16 pp 2105ndash2134 2004

[19] K C Chou D Q Wei and W Z Zhong ldquoBinding mechanismof coronavirus main proteinase with ligands and its implicationto drug design against SARSrdquo Biochemical and BiophysicalResearch Communications vol 308 pp 148ndash151 2003 (Erratumin Biochemical and Biophysical Research Communications vol310 p 675 2003)

[20] K C Chou D Q Wei Q S Du S Sirois and W Z ZhongldquoProgress in computational approach to drug developmentagaint SARSrdquo Current Medicinal Chemistry vol 13 no 27 pp3263ndash3270 2006

[21] G P Zhou and F A Troy ldquoNMR studies on how the bindingcomplex of polyisoprenol recognition sequence peptides andpolyisoprenols can modulate membrane structurerdquo CurrentProtein and Peptide Science vol 6 no 5 pp 399ndash411 2005

[22] R B Huang Q S Du C H Wang and K C Chou ldquoAn in-depth analysis of the biological functional studies based on theNMR M2 channel structure of influenza A virusrdquo Biochemicaland Biophysical Research Communications vol 377 no 4 pp1243ndash1247 2008

[23] Q S Du R B Huang C H Wang X M Li and K C ChouldquoEnergetic analysis of the two controversial drug binding sitesof the M2 proton channel in influenza A virusrdquo Journal ofTheoretical Biology vol 259 no 1 pp 159ndash164 2009

BioMed Research International 11

[24] M J Berardi W M Shih S C Harrison and J J Chou ldquoMito-chondrial uncoupling protein 2 structure determined by NMRmolecular fragment searchingrdquo Nature vol 476 no 7358 pp109ndash114 2011

[25] J R Schnell and J J Chou ldquoStructure andmechanism of theM2proton channel of influenza A virusrdquo Nature vol 451 no 7178pp 591ndash595 2008

[26] B OuYang S Xie M J Berardi et al ldquoUnusual architecture ofthe p7 channel from hepatitis C virusrdquoNature vol 498 pp 521ndash525 2013

[27] K Oxenoid and J J Chou ldquoThe structure of phospholambanpentamer reveals a channel-like architecture in membranesrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 102 no 31 pp 10870ndash10875 2005

[28] M E Call KWWucherpfennig and J J Chou ldquoThe structuralbasis for intramembrane assembly of an activating immunore-ceptor complexrdquo Nature Immunology vol 11 no 11 pp 1023ndash1029 2010

[29] R M Pielak J R Schnell and J J Chou ldquoMechanism of druginhibition and drug resistance of influenza A M2 channelrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 106 no 18 pp 7379ndash7384 2009

[30] J Wang R M Pielak M A McClintock and J J ChouldquoSolution structure and functional analysis of the influenza Bproton channelrdquo Nature Structural and Molecular Biology vol16 no 12 pp 1267ndash1271 2009

[31] K C Chou ldquoCoupling interaction between thromboxane A2receptor and alpha-13 subunit of guanine nucleotide-bindingproteinrdquo Journal of Proteome Research vol 4 no 5 pp 1681ndash1686 2005

[32] K C Chou ldquoInsights from modeling three-dimensional struc-tures of the human potassium and sodium channelsrdquo Journal ofProteome Research vol 3 no 4 pp 856ndash861 2004

[33] K C Chou ldquoSome remarks on protein attribute prediction andpseudo amino acid compositionrdquo Journal ofTheoretical Biologyvol 273 no 1 pp 236ndash247 2011

[34] W Z Lin J A Fang X Xiao and K C Chou ldquoiLoc-animal amulti-label learning classifier for predicting subcellular local-ization of animal proteinsrdquo Molecular BioSystems vol 9 pp634ndash644 2013

[35] X Xiao PWangW Z Lin J H Jia and K C Chou ldquoiAMP-2La two-level multi-label classifier for identifying antimicrobialpeptides and their functional typesrdquo Analytical Biochemistryvol 436 pp 168ndash177 2013

[36] W Chen P M Feng H Lin and K C Chou ldquoiRSpot-PseDNCidentify recombination spots with pseudo dinucleotide compo-sitionrdquo Nucleic Acids Research vol 41 article e69 2013

[37] K C Chou Z C Wu and X Xiao ldquoILoc-Hum using theaccumulation-label scale to predict subcellular locations ofhuman proteins with both single and multiple sitesrdquoMolecularBioSystems vol 8 no 2 pp 629ndash641 2012

[38] M Kotera M Hirakawa T Tokimatsu S Goto and MKanehisa ldquoThe KEGG databases and tools facilitating omicsanalysis latest developments involving human diseases andpharmaceuticalsrdquo Methods in Molecular Biology vol 802 pp19ndash39 2012

[39] Y YamanishiM Araki A GutteridgeWHonda andM Kane-hisa ldquoPrediction of drug-target interaction networks from theintegration of chemical and genomic spacesrdquo Bioinformaticsvol 24 no 13 pp i232ndashi240 2008

[40] P Finn S Muggleton D Page and A Srinivasan ldquoPharma-cophore discovery using the Inductive Logic Programmingsystem PROGOLrdquo Machine Learning vol 30 no 2-3 pp 241ndash270 1998

[41] I Vogt D Stumpfe H E Ahmed and J Bajorath ldquoMethodsfor computer-aided chemical biology Part 2 evaluation of com-pound selectivity using 2D molecular fingerprintsrdquo ChemicalBiology and Drug Design vol 70 pp 195ndash205 2007

[42] H Eckert and J Bajorath ldquoMolecular similarity analysis in vir-tual screening foundations limitations and novel approachesrdquoDrug Discovery Today vol 12 no 5-6 pp 225ndash233 2007

[43] S Laurent L V Elst and R N Muller ldquoComparative studyof the physicochemical properties of six clinical low molecularweight gadolinium contrast agentsrdquo Contrast Media amp Molecu-lar Imaging vol 1 no 3 pp 128ndash137 2006

[44] E Gregori-Puigjane R Garriga-Sust and J Mestres ldquoIndexingmolecules with chemical graph identifiersrdquo Journal of Compu-tational Chemistry vol 32 no 12 pp 2638ndash2646 2011

[45] B Ren ldquoApplication of novel atom-type AI topological indicesto QSPR studies of alkanesrdquo Computers and Chemistry vol 26no 4 pp 357ndash369 2002

[46] N M OrsquoBoyle M Banck C A James C Morley T Vander-meersch and G R Hutchison ldquoOpen Babel an open chemicaltoolboxrdquo Journal of Cheminformatics vol 3 p 33 2011

[47] V J Gillet P Willett and J Bradshaw ldquoSimilarity searchingusing reduced graphsrdquo Journal of Chemical Information andComputer Sciences vol 43 no 2 pp 338ndash345 2003

[48] D Butina ldquoUnsupervised data base clustering based on day-lightrsquos fingerprint and Tanimoto similarity a fast and automatedway to cluster small and large data setsrdquo Journal of ChemicalInformation and Computer Sciences vol 39 no 4 pp 747ndash7501999

[49] C E Shannon ldquoA mathematical theory of communicationrdquoACM SIGMOBILE Mobile Computing and CommunicationsReview vol 5 pp 3ndash55 2001

[50] V D Gusev L A Nemytikova and N A Chuzhanova ldquoOn thecomplexity measures of genetic sequencesrdquo Bioinformatics vol15 no 12 pp 994ndash999 1999

[51] K C Chou and H B Shen ldquoReview recent progress in proteinsubcellular location predictionrdquo Analytical Biochemistry vol370 no 1 pp 1ndash16 2007

[52] S F Altschul ldquoEvaluating the statistical significance of multipledistinct local alignmentsrdquo in Theoretical and ComputationalMethods in Genome Research S Suhai Ed pp 1ndash14 PlenumNew York NY USA 1997

[53] J C Wootton and S Federhen ldquoStatistics of local complexity inamino acid sequences and sequence databasesrdquo Computers andChemistry vol 17 no 2 pp 149ndash163 1993

[54] H Nakashima K Nishikawa and T Ooi ldquoThe folding type ofa protein is relevant to the amino acid compositionrdquo Journal ofBiochemistry vol 99 no 1 pp 153ndash162 1986

[55] K C Chou and C T Zhang ldquoPredicting protein folding typesby distance functions that make allowances for amino acidinteractionsrdquo The Journal of Biological Chemistry vol 269 no35 pp 22014ndash22020 1994

[56] K-C Chou ldquoA novel approach to predicting protein structuralclasses in a (20-1)-D amino acid composition spacerdquo Proteinsvol 21 no 4 pp 319ndash344 1995

[57] H Nakashima and K Nishikawa ldquoDiscrimination of intracel-lular and extracellular proteins using amino acid compositionand residue-pair frequenciesrdquo Journal of Molecular Biology vol238 no 1 pp 54ndash61 1994

12 BioMed Research International

[58] G P Zhou ldquoAn intriguing controversy over protein structuralclass predictionrdquo Journal of Protein Chemistry vol 17 no 8 pp729ndash738 1998

[59] I Bahar A R Atilgan R L Jernigan and B Erman ldquoUnder-standing the recognition of protein structural classes by aminoacid compositionrdquo Proteins vol 29 pp 172ndash185 1997

[60] J Cedano P Aloy J A Perez-Pons and E Querol ldquoRelationbetween amino acid composition and cellular location ofproteinsrdquo Journal of Molecular Biology vol 266 no 3 pp 594ndash600 1997

[61] G P Zhou and K Doctor ldquoSubcellular location prediction ofapoptosis proteinsrdquo Proteins vol 50 no 1 pp 44ndash48 2003

[62] KChou ldquoPrediction of protein cellular attributes using pseudo-amino acid compositionrdquo Proteins vol 43 pp 246ndash255 2001(Erratum in Proteins vol 44 p 60 2001)

[63] K C Chou ldquoUsing amphiphilic pseudo amino acid composi-tion to predict enzyme subfamily classesrdquo Bioinformatics vol21 no 1 pp 10ndash19 2005

[64] D Zou Z He J He and Y Xia ldquoSupersecondary structureprediction using Choursquos pseudo amino acid compositionrdquoJournal of Computational Chemistry vol 32 no 2 pp 271ndash2782011

[65] M M Beigi M Behjati and H Mohabatkar ldquoPrediction ofmetalloproteinase family based on the concept ofChoursquos pseudoamino acid composition using a machine learning approachrdquoJournal of Structural and Functional Genomics vol 12 no 4 pp191ndash197 2011

[66] Y K Chen and K B Li ldquoPredicting membrane protein typesby incorporating protein topology domains signal peptidesand physicochemical properties into the general form of Choursquospseudo amino acid compositionrdquo Journal ofTheoretical Biologyvol 318 pp 1ndash12 2013

[67] C Huang and J Q Yuan ldquoA multilabel model based on Choursquospseudo-amino acid composition for identifying membraneproteins with both single and multiple functional typesrdquo TheJournal of Membrane Biology vol 246 pp 327ndash334 2013

[68] S S Sahu and G Panda ldquoA novel feature representationmethod based on Choursquos pseudo amino acid composition forprotein structural class predictionrdquo Computational Biology andChemistry vol 34 no 5-6 pp 320ndash327 2010

[69] M Hayat and A Khan ldquoDiscriminating outer membraneproteins with fuzzy K-nearest neighbor algorithms based on thegeneral form of Choursquos PseAACrdquo Protein and Peptide Lettersvol 19 no 4 pp 411ndash421 2012

[70] MKhosravian F K FaramarziMM BeigiM Behbahani andH Mohabatkar ldquoPredicting antibacterial peptides by the con-cept of Choursquos pseudo-amino acid composition and machinelearning methodsrdquo Protein amp Peptide Letters vol 20 pp 180ndash186 2013

[71] HMohabatkarMM Beigi K Abdolahi and SMohsenzadehldquoPrediction of allergenic proteins by means of the concept ofChoursquos pseudo amino acid composition and amachine learningapproachrdquoMedicinal Chemistry vol 9 pp 133ndash137 2013

[72] L Nanni A Lumini D Gupta and A Garg ldquoIdentifyingbacterial virulent proteins by fusing a set of classifiers basedon variants of Choursquos pseudo amino acid composition and onevolutionary informationrdquo IEEEACM Transactions on Compu-tational Biology and Bioinformatics vol 9 no 2 pp 467ndash4752012

[73] T H Chang L C Wu T Y Lee S P Chen H D Huang andJ T Horng ldquoEuLoc a web-server for accurately predict protein

subcellular localization in eukaryotes by incorporating variousfeatures of sequence segments into the general form of ChoursquosPseAACrdquo Journal of Computer-Aided Molecular Design vol 27pp 91ndash103 2013

[74] S Zhang Y Zhang H Yang C Zhao and Q Pan ldquoUsing theconcept of Choursquos pseudo amino acid composition to predictprotein subcellular localization an approach by incorporatingevolutionary information and von Neumann entropiesrdquo AminoAcids vol 34 no 4 pp 565ndash572 2008

[75] R Zia Ur and A Khan ldquoIdentifying GPCRs and their typeswith Choursquos pseudo amino acid composition an approach frommulti-scale energy representation and position specific scoringmatrixrdquo Protein amp Peptide Letters vol 19 pp 890ndash903 2012

[76] X Y Sun S P Shi J D Qiu S B Suo S Y Huang and R PLiang ldquoIdentifying protein quaternary structural attributes byincorporating physicochemical properties into the general formof Choursquos PseAAC via discrete wavelet transformrdquo MolecularBioSystems vol 8 pp 3178ndash3184 2012

[77] L Nanni and A Lumini ldquoGenetic programming for creatingChoursquos pseudo amino acid based features for submitochondrialocalizationrdquo Amino Acids vol 34 no 4 pp 653ndash660 2008

[78] M Esmaeili H Mohabatkar and S Mohsenzadeh ldquoUsing theconcept of Choursquos pseudo amino acid composition for risk typeprediction of human papillomavirusesrdquo Journal of TheoreticalBiology vol 263 no 2 pp 203ndash209 2010

[79] H Mohabatkar ldquoPrediction of cyclin proteins using Choursquospseudo amino acid compositionrdquo Protein and Peptide Lettersvol 17 no 10 pp 1207ndash1214 2010

[80] H Mohabatkar M M Beigi and A Esmaeili ldquoPredictionof GABA(A) receptor proteins using the concept of Choursquospseudo-amino acid composition and support vector machinerdquoJournal of Theoretical Biology vol 281 no 1 pp 18ndash23 2011

[81] Y Xu J Ding L Y Wu and K C Chou ldquoiSNO-PseAAC pre-dict cysteine S-nitrosylation sites in proteins by incorporatingposition specific amino acid propensity into pseudo amino acidcompositionrdquo PLoS ONE vol 8 Article ID e55844 2013

[82] W Chen H Lin PM Feng C Ding Y C Zuo andK C ChouldquoiNuc-PhysChem a sequence-based predictor for identifyingnucleosomes via physicochemical propertiesrdquo PLoS ONE vol7 Article ID e47843 2012

[83] B Li T Huang L Liu Y Cai and K C Chou ldquoIdentificationof colorectal cancer related genes with mrmr and shortest pathin protein-protein interaction networkrdquo PLoS ONE vol 7 no 4Article ID e33393 2012

[84] Y Jiang T Huang C Lei Y F Gao Y D Cai and K C ChouldquoSignal propagation in protein interaction network duringcolorectal cancer progressionrdquo BioMed Research Internationalvol 2013 Article ID 287019 9 pages 2013

[85] P Du X Wang C Xu and Y Gao ldquoPseAAC-Builder across-platform stand-alone program for generating variousspecial Choursquos pseudo-amino acid compositionsrdquo AnalyticalBiochemistry vol 425 no 2 pp 117ndash119 2012

[86] D S Cao Q S Xu and Y Z Liang ldquoPropy a tool to generatevarious modes of Choursquos PseAACrdquo Bioinformatics vol 29 pp960ndash962 2013

[87] H Shen and K C Chou ldquoPseAAC a flexible web serverfor generating various kinds of protein pseudo amino acidcompositionrdquo Analytical Biochemistry vol 373 no 2 pp 386ndash388 2008

[88] K C Chou ldquoUsing pair-coupled amino acid composition topredict protein secondary structure contentrdquo Journal of ProteinChemistry vol 18 no 4 pp 473ndash480 1999

BioMed Research International 13

[89] W Liu and K C Chou ldquoPrediction of protein secondarystructure contentrdquo Protein Engineering vol 12 no 12 pp 1041ndash1050 1999

[90] M Bhasin and G P S Raghava ldquoClassification of nuclearreceptors based on amino acid composition and dipeptidecompositionrdquo The Journal of Biological Chemistry vol 279 no22 pp 23262ndash23266 2004

[91] H Lin andH Ding ldquoPredicting ion channels and their types bythe dipeptidemode of pseudo amino acid compositionrdquo Journalof Theoretical Biology vol 269 no 1 pp 64ndash69 2011

[92] H Lin and Q Li ldquoUsing pseudo amino acid composition topredict protein structural class approached by incorporating400 dipeptide componentsrdquo Journal of Computational Chem-istry vol 28 no 9 pp 1463ndash1466 2007

[93] L Nanni and A Lumini ldquoCombing ontologies and dipeptidecomposition for predicting DNA-binding proteinsrdquo AminoAcids vol 34 no 4 pp 635ndash641 2008

[94] K-C Chou ldquoThe convergence-divergence duality in lectindomains of selectin family and its implicationsrdquo FEBS Lettersvol 363 no 1-2 pp 123ndash126 1995

[95] A A Schaffer L Aravind T L Madden et al ldquoImprovingthe accuracy of PSI-BLAST protein database searches withcomposition-based statistics and other refinementsrdquo NucleicAcids Research vol 29 no 14 pp 2994ndash3005 2001

[96] S F Altschul and E V Koonin ldquoIterated profile searches withPSI-BLAST a tool for discovery in protein databasesrdquo Trends inBiochemical Sciences vol 23 no 11 pp 444ndash447 1998

[97] J Deng ldquoGrey entropy and grey target decision makingrdquoJournal of Grey System vol 22 no 1 pp 1ndash24 2010

[98] B M Bolstad R A Irizarry M Astrand and T P Speed ldquoAcomparison of normalizationmethods for high density oligonu-cleotide array data based on variance and biasrdquo Bioinformaticsvol 19 no 2 pp 185ndash193 2003

[99] J-Y Shi S-W Zhang Q Pan Y-M Cheng and J XieldquoPrediction of protein subcellular localization by support vectormachines using multi-scale energy and pseudo amino acidcompositionrdquo Amino Acids vol 33 no 1 pp 69ndash74 2007

[100] T Denoeux ldquoA 120581-nearest neighbor classification rule based onDempster-Shafer theoryrdquo IEEE Transactions on Systems Manand Cybernetics vol 25 no 5 pp 804ndash813 1995

[101] J M Keller M R Gray and J A Givens ldquoA fuzzy k-nearestneighbours algorithmrdquo IEEE Transactions on Systems Man andCybernetics vol 15 no 4 pp 580ndash585 1985

[102] X Xiao P Wang and K Chou ldquoiNR-physchem a sequence-based predictor for identifying nuclear receptors and theirsubfamilies via physical-chemical property matrixrdquo PLoS ONEvol 7 no 2 Article ID e30869 2012

[103] I Roterman L Konieczny W Jurkowski K Prymula and MBanach ldquoTwo-intermediate model to characterize the structureof fast-folding proteinsrdquo Journal of Theoretical Biology vol 283no 1 pp 60ndash70 2011

[104] X Xiao P Wang and K C Chou ldquoGPCR-2L predicting Gprotein-coupled receptors and their types by hybridizing twodifferentmodes of pseudo amino acid compositionsrdquoMolecularBioSystems vol 7 no 3 pp 911ndash919 2011

[105] X Xiao P Wang and K C Chou ldquoQuat-2L a web-server forpredicting protein quaternary structural attributesrdquo MolecularDiversity vol 15 no 1 pp 149ndash155 2011

[106] X Zheng C Li and J Wang ldquoAn information-theoreticapproach to the prediction of protein structural classrdquo Journalof Computational Chemistry vol 31 no 6 pp 1201ndash1206 2010

[107] H Shen J Yang X Liu and K C Chou ldquoUsing supervisedfuzzy clustering to predict protein structural classesrdquo Biochem-ical and Biophysical Research Communications vol 334 no 2pp 577ndash581 2005

[108] K-C Chou and C-T Zhang ldquoReview prediction of proteinstructural classesrdquo Critical Reviews in Biochemistry and Molec-ular Biology vol 30 no 4 pp 275ndash349 1995

[109] P C Mahalanobis ldquoOn the generalized distance in statisticsrdquoProceedings of the National Institute of Sciences of India vol 2pp 49ndash55 1936

[110] R M Centor ldquoSignal detectability the use of ROC curves andtheir analysesrdquoMedical Decision Making vol 11 no 2 pp 102ndash106 1991

[111] K-C Chou ldquoUsing subsite coupling to predict signal peptidesrdquoProtein Engineering vol 14 no 2 pp 75ndash79 2001

[112] K C Chou ldquoPrediction of protein signal sequences and theircleavage sitesrdquo Proteins vol 42 pp 136ndash139 2001

[113] K C Chou ldquoPrediction of signal peptides using scaledwindowrdquoPeptides vol 22 no 12 pp 1973ndash1979 2001

[114] K C Chou ldquoSome remarks on predicting multi-label attributesinmolecular biosystemsrdquoMolecular Biosystems vol 9 pp 1092ndash1100 2013

[115] K C Chou and H B Shen ldquoCell-PLoc 2 0 an improvedpackage of web-servers for predicting subcellular localization ofproteins in various organismsrdquoNatural Science vol 2 pp 1090ndash1103 2010

[116] M Gribskov and N L Robinson ldquoUse of receiver operatingcharacteristic (ROC) analysis to evaluate sequence matchingrdquoComputers and Chemistry vol 20 no 1 pp 25ndash33 1996

[117] Y Yamanishi M Kotera M Kanehisa and S Goto ldquoDrug-target interaction prediction from chemical genomic and phar-macological data in an integrated frameworkrdquo Bioinformaticsvol 26 no 12 pp i246ndashi254 2010

[118] Z He J Zhang X Shi et al ldquoPredicting drug-target interactionnetworks based on functional groups and biological featuresrdquoPLoS ONE vol 5 no 3 Article ID e9603 2010

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

8 BioMed Research International

02

46

810

1

15

2089

0895

09

0905

091

0915

092

ACC

K120593

Figure 3 A 3D plot to show how the parameter in (27) wasoptimized for the iEzy-Drug predictor

mispredicted to be a noninteractive enzyme-drug pair wehave the sensitivity Sn = 1 while 119873+

minus= 119873

+ meaning thatall the interactive enzyme-drug pairs weremispredicted to bethe noninteractive enzyme-drug pairs we have the sensitivitySn = 0 Likewise when 119873minus

+= 0 meaning none of the

noninteractive enzyme-drug pairs wasmispredicted we havethe specificity Sp = 1 while 119873minus

+= 119873

minus meaning all thenoninteractive enzyme-drug pairs were incorrectly predictedas interactive enzyme-drug pairs we have the specificity Sp =0 When119873+

minus= 119873

minus

+= 0 meaning that none of the interactive

enzyme-drug pairs in the dataset S+ and none of the nonin-teractive enzyme-drug pairs in Sminus was incorrectly predictedwe have the overall accuracy Acc = Λ = 1 while 119873+

minus= 119873

+

and 119873minus+= 119873

minus meaning that all the interactive enzyme-drugpairs in the dataset S+ and all the noninteractive enzyme-drug pairs in Sminus were mispredicted we have the overallaccuracy Acc = Λ = 0 The MCC correlation coefficient isusually used for measuring the quality of binary (two-class)classifications When 119873+

minus= 119873

minus

+= 0 meaning that none

of the interactive enzyme-drug pairs in the dataset S+ andnone of the noninteractive enzyme-drug pairs in Sminus weremispredicted we have MCC = 1 when 119873+

minus= 119873

+2 and

119873

minus

+= 119873

minus2 we have MCC = 0 meaning no better than

random prediction when 119873+minus= 119873

+ and 119873minus+= 119873

minus we haveMCC = minus1 meaning total disagreement between predictionand observation As we can see from the previous discussionit is much more intuitive and easier-to-understand whenusing (34) to examine a predictor for its sensitivity specificityoverall accuracy and Mathewrsquos correlation coefficient It isinstructive to point out that themetrics as defined in (30) and(34) are valid for single label systems for multilabel systemsa set of more complicated metrics should be used as given in[114]

3 Results and Discussion

31 Cross-Validation How to properly examine the predic-tion quality is a key for developing a new predictor and

False positive rate

True

pos

itive

rate

1

01

02

03

04

05

06

07

08

09

0

101 02 03 04 05 06 07 08 090

BaselineiEzy-Drug AUC = 09377

Figure 4 A plot for the ROC curve to quantitatively show theperformance of the iEzy-Drug predictor

estimating its potential application value Generally speak-ing the following three cross-validation methods are oftenused to examine a predictor of its effectiveness in practicalapplication independent dataset test subsampling or 119870-fold(such as 5-fold 7-fold or 10-fold) test and jackknife test[108] However as elaborated by a penetrating analysis in[115] considerable arbitrariness exists in the independentdataset test Also as demonstrated by (27)ndash(29) in [33]the subsampling test (or 119870-fold cross-validation) cannotavoid arbitrariness either Only the jackknife test is the leastarbitrary that can always yield a unique result for a givenbenchmark dataset Therefore the jackknife test has beenwidely recognized and increasingly utilized by investigatorsto examine the quality of various predictors (see eg [66 7174 80]) Accordingly the success rate by the jackknife test wasalso used to optimize the two uncertain parameters 119870 and 120593in (27) The result thus obtained is shown in Figure 3 fromwhich we obtain when 119870 = 6 and 120593 = 15 the iEzy-Drugpredictor reaches its optimized status

The success rates thus obtained by the jackknife test inidentifying interactive Enzyme-drug pairs or noninteractiveenzyme-drug pairs on the benchmark dataset S (cf OnlineSupporting Information S1) are given in Table 1 where forfacilitating comparison the corresponding result by He et al[110] is also given As we can see from the table the overallaccuracy Acc achieved by iEzy-Drug was 9103 remarkablyhigher than 8548 the corresponding rate obtained byHe et al [110] on the same benchmark Furthermore listedin Table 1 are also the values obtained by iEzy-Drug for theother three metrics that is Sn = 9081 Sp = 9114 andMCC = 8039 indicating that the accuracy of iEzy-Drug isnot only very high but also quite stale

To provide a graphical illustration to show the per-formance of the current binary classifier iEzy-Drug asits discrimination threshold is varied a 2D plot calledReceiver Operating Characteristic (ROC) curve [116 117]was also given (Figure 4) In the ROC curve the verticalcoordinate 119884 is for the true positive rate or Sn (cf (34))while horizontal coordinate 119883 for the false positive rate

BioMed Research International 9

Read Me Data Citation

Enter both the protein sequence and the drug code (example) the number of query pairs is limited at 10 or less for each submission

Submit Clear

You can download the program and run it on your own local computer

iEzy-Drug identifying the interaction between enzymes anddrugs in cellular networking

Figure 5 A semiscreenshot to show the top page of the iEzy-Drugweb-server Its web-site address is at httpwwwjci-bioinfocniEzy-Drug

Table 1 The jackknife success rates obtained with iEzy-Drug in identifying interactive enzyme-drug pairs and noninteractive enzyme-drugpairs for the benchmark dataset S (cf Online Supporting Information S1)

Method Acc Sn Sp MCC

iEzy-Druga 74258157 = 9103 24692719 = 9081 49565438 = 9114 8039NN predictorb 8548 NA NA NAaSee (27) where the parameters119870 = 6 and 120593 = 15bSee [110]

or 1-Sp The best possible prediction method would yielda point with the coordinate (0 1) representing 100 truepositive rate (sensitivity Sn) and 0 false positive rate or100 specificity Therefore the (0 1) point is also calleda perfect classification A completely random guess wouldgive a point along a diagonal from the point (0 0) to (1 1)The area under the ROC curve also called Area Under theROC (AUROC) is often used to indicate the performancequality of a binary classifier the value 05 of AUROCis equivalent to random prediction while 1 of AUROCrepresents a perfect one As we can see from Figure 4the AUROC value obtained by iEzy-Drug is 09377

The reason why iEzy-Drug can remarkably improve theprediction quality is that it has introduced the 2D molecularfingerprints to represent drug samples see Online SupportingInformation S3 for the detailed fingerprint expressions for thedrugs listed in Online Supporting Information S1 and thatit has successfully used PseAAC to incorporate the featuresderived from the sequences of enzymes that are essentialfor identifying the interaction of enzymes with drugs in thecellular networking

To enhance the value of its practical applications theweb server for iEzy-Drug has been established that canbe freely accessible at httpwwwjci-bioinfocniEzy-DrugIt is anticipated that the web server will become a usefulhigh throughput tool for both basic research and drug

development in the relevant areas or at the very least playa complementary role to the existing method [39 110 118] forwhich so far noweb-serverwhatsoever has been provided yet

32 The Protocol or User Guide For the convenience of thevast majority of biologists and pharmaceutical scientists herelet us provide a step-by-step guide to show how the userscan easily get the desired result by means of the web serverwithout the need to follow the complicated mathematicalequations presented in this paper for the process of develop-ing the predictor and its integrityStep 1 Open the web server at the site httpwwwjci-bio-infocniEzy-Drug and you will see the top page of thepredictor on your computer screen as shown in Figure 5Click on the ReadMe button to see a brief introduction aboutiEzy-Drug predictor and the caveat when using itStep 2 Either type or copypaste the query pairs into theinput box at the center of Figure 5 Each query pair consistsof two parts one is for the protein sequence and the other forthe drug The enzyme sequence should be in FASTA formatwhile the drug in the KEGG code Examples for the querypairs input can be seen by clicking on the Example buttonright above the input boxStep 3 Click on the Submit button to see the predicted resultFor example if you use the four query pairs in the Example

10 BioMed Research International

window as the input after clicking the Submit button youwill see on your screen that the ldquohsa 10056rdquo enzyme andthe ldquoD0021rdquo drug are an interactive pair and that the ldquohsa100rdquo enzyme and the ldquoD0037rdquo drug are also an interactivepair but that the ldquohsa 3295rdquo enzyme and the ldquoD00889rdquo drugare not an interactive pair and that the ldquohsa 7366rdquo enzymeand the ldquoD03601rdquo drug are not an interactive pair eitherAll these results are fully consistent with the experimentalobservations It takes about 3 minutes before the results areshown on the screenStep 4 Click on the Citation button to find the relevant paperthat documents the detailed development and algorithm ofiEzy-DurgStep 5 Click on the Data button to download the benchmarkdataset used to train and test the iEzy-Durg predictorStep 6 The program code is also available by clicking thebutton download on the lower panel of Figure 5

Acknowledgments

The authors would like to thank the three anonymousreviewers whose constructive comments are very helpful forstrengthening the presentation of the paper This work wassupported by the Grants from the National Natural ScienceFoundation of China (no 31260273) the Jiangxi ProvincialForeign Scientific and Technological Cooperation Project(no 20120BDH80023) Natural Science Foundation of JiangxiProvince China (no 2010GZS0122 20122BAB201020) theDepartment of Education of Jiangxi Province (GJJ12490)the LuoDi plan of the Department of Education of JiangxiProvince (KJLD12083) and the Jiangxi Provincial Founda-tion for Leaders of Disciplines in Science (20113BCB22008)The funders had no role in study design data collection andanalysis decision to publish or preparation of the paper

References

[1] A Bairoch ldquoThe ENZYME database in 2000rdquo Nucleic AcidsResearch vol 28 no 1 pp 304ndash305 2000

[2] S H Koenig and R D Brown ldquoH2CO3as substrate for carbonic

anhydrase in the dehydration of H2CO3(minus)rdquo Proceedings of the

National Academy of Sciences of the United States of Americavol 69 no 9 pp 2422ndash2425 1972

[3] S X Lin and J Lapointe ldquoTheoretical and experimental biologyin onerdquo Journal of Biomedical Science and Engineering vol 6 pp435ndash442 2013

[4] K C Chou and S P Jiang ldquoStudies on the rate of diffusion-controlled reactions of enzymes Spatial factor and force fieldfactorrdquo Scientia Sinica vol 17 no 5 pp 664ndash680 1974

[5] K C Chou ldquoThe kinetics of the combination reaction betweenenzyme and substraterdquo Scientia Sinica vol 19 no 4 pp 505ndash528 1976

[6] K C Chou and G P Zhou ldquoRole of the protein outside activesite on the diffusion-controlled reaction of enzymerdquo Journal ofthe American Chemical Society vol 104 no 5 pp 1409ndash14131982

[7] R A Poorman A G Tomasselli R L Heinrikson and FJ Kezdy ldquoA cumulative specificity model for proteases from

human immunodeficiency virus types 1 and 2 inferred fromstatistical analysis of an extended substrate data baserdquo TheJournal of Biological Chemistry vol 266 no 22 pp 14554ndash145611991

[8] K-C Chou ldquoA vectorized sequence-coupling model for pre-dicting HIV protease cleavage sites in proteinsrdquo The Journal ofBiological Chemistry vol 268 no 23 pp 16938ndash16948 1993

[9] G Z Liang and S Z Li ldquoA new sequence representation asapplied in better specificity elucidation for human immunod-eficiency virus type 1 proteaserdquo Biopolymers vol 88 no 3 pp401ndash412 2007

[10] J J Chou ldquoPredicting cleavability of peptide sequences byHIVprotease via correlation-angle approachrdquo Journal of ProteinChemistry vol 12 no 3 pp 291ndash302 1993

[11] Q S Du S Wang D Q Wei S Sirois and K C ChouldquoMolecular modeling and chemical modification for findingpeptide inhibitor against severe acute respiratory syndromecoronavirus main proteinaserdquo Analytical Biochemistry vol 337no 2 pp 262ndash270 2005

[12] Y Gan H Huang Y Huang et al ldquoSynthesis and activity ofan octapeptide inhibitor designed for SARS coronavirus mainproteinaserdquo Peptides vol 27 no 4 pp 622ndash625 2006

[13] Q S Du H Sun and K C Chou ldquoInhibitor design for SARScoronavirus main protease based on lsquodistorted key theoryrsquordquoMedicinal Chemistry vol 3 no 1 pp 1ndash6 2007

[14] K C Chou ldquoPrediction of human immunodeficiency virusprotease cleavage sites in proteinsrdquoAnalytical Biochemistry vol233 no 1 pp 1ndash14 1996

[15] J Knowles and G Gromo ldquoTarget selection in drug discoveryrdquoNature Reviews Drug Discovery vol 2 no 1 pp 63ndash69 2003

[16] A C Cheng R G Coleman K T Smyth et al ldquoStructure-basedmaximal affinity model predicts small-molecule druggabilityrdquoNature Biotechnology vol 25 no 1 pp 71ndash75 2007

[17] M Rarey B Kramer T Lengauer and G Klebe ldquoA fast flexibledocking method using an incremental construction algorithmrdquoJournal of Molecular Biology vol 261 no 3 pp 470ndash489 1996

[18] K C Chou ldquoReview structural bioinformatics and its impactto biomedical sciencerdquo Current Medicinal Chemistry vol 11 no16 pp 2105ndash2134 2004

[19] K C Chou D Q Wei and W Z Zhong ldquoBinding mechanismof coronavirus main proteinase with ligands and its implicationto drug design against SARSrdquo Biochemical and BiophysicalResearch Communications vol 308 pp 148ndash151 2003 (Erratumin Biochemical and Biophysical Research Communications vol310 p 675 2003)

[20] K C Chou D Q Wei Q S Du S Sirois and W Z ZhongldquoProgress in computational approach to drug developmentagaint SARSrdquo Current Medicinal Chemistry vol 13 no 27 pp3263ndash3270 2006

[21] G P Zhou and F A Troy ldquoNMR studies on how the bindingcomplex of polyisoprenol recognition sequence peptides andpolyisoprenols can modulate membrane structurerdquo CurrentProtein and Peptide Science vol 6 no 5 pp 399ndash411 2005

[22] R B Huang Q S Du C H Wang and K C Chou ldquoAn in-depth analysis of the biological functional studies based on theNMR M2 channel structure of influenza A virusrdquo Biochemicaland Biophysical Research Communications vol 377 no 4 pp1243ndash1247 2008

[23] Q S Du R B Huang C H Wang X M Li and K C ChouldquoEnergetic analysis of the two controversial drug binding sitesof the M2 proton channel in influenza A virusrdquo Journal ofTheoretical Biology vol 259 no 1 pp 159ndash164 2009

BioMed Research International 11

[24] M J Berardi W M Shih S C Harrison and J J Chou ldquoMito-chondrial uncoupling protein 2 structure determined by NMRmolecular fragment searchingrdquo Nature vol 476 no 7358 pp109ndash114 2011

[25] J R Schnell and J J Chou ldquoStructure andmechanism of theM2proton channel of influenza A virusrdquo Nature vol 451 no 7178pp 591ndash595 2008

[26] B OuYang S Xie M J Berardi et al ldquoUnusual architecture ofthe p7 channel from hepatitis C virusrdquoNature vol 498 pp 521ndash525 2013

[27] K Oxenoid and J J Chou ldquoThe structure of phospholambanpentamer reveals a channel-like architecture in membranesrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 102 no 31 pp 10870ndash10875 2005

[28] M E Call KWWucherpfennig and J J Chou ldquoThe structuralbasis for intramembrane assembly of an activating immunore-ceptor complexrdquo Nature Immunology vol 11 no 11 pp 1023ndash1029 2010

[29] R M Pielak J R Schnell and J J Chou ldquoMechanism of druginhibition and drug resistance of influenza A M2 channelrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 106 no 18 pp 7379ndash7384 2009

[30] J Wang R M Pielak M A McClintock and J J ChouldquoSolution structure and functional analysis of the influenza Bproton channelrdquo Nature Structural and Molecular Biology vol16 no 12 pp 1267ndash1271 2009

[31] K C Chou ldquoCoupling interaction between thromboxane A2receptor and alpha-13 subunit of guanine nucleotide-bindingproteinrdquo Journal of Proteome Research vol 4 no 5 pp 1681ndash1686 2005

[32] K C Chou ldquoInsights from modeling three-dimensional struc-tures of the human potassium and sodium channelsrdquo Journal ofProteome Research vol 3 no 4 pp 856ndash861 2004

[33] K C Chou ldquoSome remarks on protein attribute prediction andpseudo amino acid compositionrdquo Journal ofTheoretical Biologyvol 273 no 1 pp 236ndash247 2011

[34] W Z Lin J A Fang X Xiao and K C Chou ldquoiLoc-animal amulti-label learning classifier for predicting subcellular local-ization of animal proteinsrdquo Molecular BioSystems vol 9 pp634ndash644 2013

[35] X Xiao PWangW Z Lin J H Jia and K C Chou ldquoiAMP-2La two-level multi-label classifier for identifying antimicrobialpeptides and their functional typesrdquo Analytical Biochemistryvol 436 pp 168ndash177 2013

[36] W Chen P M Feng H Lin and K C Chou ldquoiRSpot-PseDNCidentify recombination spots with pseudo dinucleotide compo-sitionrdquo Nucleic Acids Research vol 41 article e69 2013

[37] K C Chou Z C Wu and X Xiao ldquoILoc-Hum using theaccumulation-label scale to predict subcellular locations ofhuman proteins with both single and multiple sitesrdquoMolecularBioSystems vol 8 no 2 pp 629ndash641 2012

[38] M Kotera M Hirakawa T Tokimatsu S Goto and MKanehisa ldquoThe KEGG databases and tools facilitating omicsanalysis latest developments involving human diseases andpharmaceuticalsrdquo Methods in Molecular Biology vol 802 pp19ndash39 2012

[39] Y YamanishiM Araki A GutteridgeWHonda andM Kane-hisa ldquoPrediction of drug-target interaction networks from theintegration of chemical and genomic spacesrdquo Bioinformaticsvol 24 no 13 pp i232ndashi240 2008

[40] P Finn S Muggleton D Page and A Srinivasan ldquoPharma-cophore discovery using the Inductive Logic Programmingsystem PROGOLrdquo Machine Learning vol 30 no 2-3 pp 241ndash270 1998

[41] I Vogt D Stumpfe H E Ahmed and J Bajorath ldquoMethodsfor computer-aided chemical biology Part 2 evaluation of com-pound selectivity using 2D molecular fingerprintsrdquo ChemicalBiology and Drug Design vol 70 pp 195ndash205 2007

[42] H Eckert and J Bajorath ldquoMolecular similarity analysis in vir-tual screening foundations limitations and novel approachesrdquoDrug Discovery Today vol 12 no 5-6 pp 225ndash233 2007

[43] S Laurent L V Elst and R N Muller ldquoComparative studyof the physicochemical properties of six clinical low molecularweight gadolinium contrast agentsrdquo Contrast Media amp Molecu-lar Imaging vol 1 no 3 pp 128ndash137 2006

[44] E Gregori-Puigjane R Garriga-Sust and J Mestres ldquoIndexingmolecules with chemical graph identifiersrdquo Journal of Compu-tational Chemistry vol 32 no 12 pp 2638ndash2646 2011

[45] B Ren ldquoApplication of novel atom-type AI topological indicesto QSPR studies of alkanesrdquo Computers and Chemistry vol 26no 4 pp 357ndash369 2002

[46] N M OrsquoBoyle M Banck C A James C Morley T Vander-meersch and G R Hutchison ldquoOpen Babel an open chemicaltoolboxrdquo Journal of Cheminformatics vol 3 p 33 2011

[47] V J Gillet P Willett and J Bradshaw ldquoSimilarity searchingusing reduced graphsrdquo Journal of Chemical Information andComputer Sciences vol 43 no 2 pp 338ndash345 2003

[48] D Butina ldquoUnsupervised data base clustering based on day-lightrsquos fingerprint and Tanimoto similarity a fast and automatedway to cluster small and large data setsrdquo Journal of ChemicalInformation and Computer Sciences vol 39 no 4 pp 747ndash7501999

[49] C E Shannon ldquoA mathematical theory of communicationrdquoACM SIGMOBILE Mobile Computing and CommunicationsReview vol 5 pp 3ndash55 2001

[50] V D Gusev L A Nemytikova and N A Chuzhanova ldquoOn thecomplexity measures of genetic sequencesrdquo Bioinformatics vol15 no 12 pp 994ndash999 1999

[51] K C Chou and H B Shen ldquoReview recent progress in proteinsubcellular location predictionrdquo Analytical Biochemistry vol370 no 1 pp 1ndash16 2007

[52] S F Altschul ldquoEvaluating the statistical significance of multipledistinct local alignmentsrdquo in Theoretical and ComputationalMethods in Genome Research S Suhai Ed pp 1ndash14 PlenumNew York NY USA 1997

[53] J C Wootton and S Federhen ldquoStatistics of local complexity inamino acid sequences and sequence databasesrdquo Computers andChemistry vol 17 no 2 pp 149ndash163 1993

[54] H Nakashima K Nishikawa and T Ooi ldquoThe folding type ofa protein is relevant to the amino acid compositionrdquo Journal ofBiochemistry vol 99 no 1 pp 153ndash162 1986

[55] K C Chou and C T Zhang ldquoPredicting protein folding typesby distance functions that make allowances for amino acidinteractionsrdquo The Journal of Biological Chemistry vol 269 no35 pp 22014ndash22020 1994

[56] K-C Chou ldquoA novel approach to predicting protein structuralclasses in a (20-1)-D amino acid composition spacerdquo Proteinsvol 21 no 4 pp 319ndash344 1995

[57] H Nakashima and K Nishikawa ldquoDiscrimination of intracel-lular and extracellular proteins using amino acid compositionand residue-pair frequenciesrdquo Journal of Molecular Biology vol238 no 1 pp 54ndash61 1994

12 BioMed Research International

[58] G P Zhou ldquoAn intriguing controversy over protein structuralclass predictionrdquo Journal of Protein Chemistry vol 17 no 8 pp729ndash738 1998

[59] I Bahar A R Atilgan R L Jernigan and B Erman ldquoUnder-standing the recognition of protein structural classes by aminoacid compositionrdquo Proteins vol 29 pp 172ndash185 1997

[60] J Cedano P Aloy J A Perez-Pons and E Querol ldquoRelationbetween amino acid composition and cellular location ofproteinsrdquo Journal of Molecular Biology vol 266 no 3 pp 594ndash600 1997

[61] G P Zhou and K Doctor ldquoSubcellular location prediction ofapoptosis proteinsrdquo Proteins vol 50 no 1 pp 44ndash48 2003

[62] KChou ldquoPrediction of protein cellular attributes using pseudo-amino acid compositionrdquo Proteins vol 43 pp 246ndash255 2001(Erratum in Proteins vol 44 p 60 2001)

[63] K C Chou ldquoUsing amphiphilic pseudo amino acid composi-tion to predict enzyme subfamily classesrdquo Bioinformatics vol21 no 1 pp 10ndash19 2005

[64] D Zou Z He J He and Y Xia ldquoSupersecondary structureprediction using Choursquos pseudo amino acid compositionrdquoJournal of Computational Chemistry vol 32 no 2 pp 271ndash2782011

[65] M M Beigi M Behjati and H Mohabatkar ldquoPrediction ofmetalloproteinase family based on the concept ofChoursquos pseudoamino acid composition using a machine learning approachrdquoJournal of Structural and Functional Genomics vol 12 no 4 pp191ndash197 2011

[66] Y K Chen and K B Li ldquoPredicting membrane protein typesby incorporating protein topology domains signal peptidesand physicochemical properties into the general form of Choursquospseudo amino acid compositionrdquo Journal ofTheoretical Biologyvol 318 pp 1ndash12 2013

[67] C Huang and J Q Yuan ldquoA multilabel model based on Choursquospseudo-amino acid composition for identifying membraneproteins with both single and multiple functional typesrdquo TheJournal of Membrane Biology vol 246 pp 327ndash334 2013

[68] S S Sahu and G Panda ldquoA novel feature representationmethod based on Choursquos pseudo amino acid composition forprotein structural class predictionrdquo Computational Biology andChemistry vol 34 no 5-6 pp 320ndash327 2010

[69] M Hayat and A Khan ldquoDiscriminating outer membraneproteins with fuzzy K-nearest neighbor algorithms based on thegeneral form of Choursquos PseAACrdquo Protein and Peptide Lettersvol 19 no 4 pp 411ndash421 2012

[70] MKhosravian F K FaramarziMM BeigiM Behbahani andH Mohabatkar ldquoPredicting antibacterial peptides by the con-cept of Choursquos pseudo-amino acid composition and machinelearning methodsrdquo Protein amp Peptide Letters vol 20 pp 180ndash186 2013

[71] HMohabatkarMM Beigi K Abdolahi and SMohsenzadehldquoPrediction of allergenic proteins by means of the concept ofChoursquos pseudo amino acid composition and amachine learningapproachrdquoMedicinal Chemistry vol 9 pp 133ndash137 2013

[72] L Nanni A Lumini D Gupta and A Garg ldquoIdentifyingbacterial virulent proteins by fusing a set of classifiers basedon variants of Choursquos pseudo amino acid composition and onevolutionary informationrdquo IEEEACM Transactions on Compu-tational Biology and Bioinformatics vol 9 no 2 pp 467ndash4752012

[73] T H Chang L C Wu T Y Lee S P Chen H D Huang andJ T Horng ldquoEuLoc a web-server for accurately predict protein

subcellular localization in eukaryotes by incorporating variousfeatures of sequence segments into the general form of ChoursquosPseAACrdquo Journal of Computer-Aided Molecular Design vol 27pp 91ndash103 2013

[74] S Zhang Y Zhang H Yang C Zhao and Q Pan ldquoUsing theconcept of Choursquos pseudo amino acid composition to predictprotein subcellular localization an approach by incorporatingevolutionary information and von Neumann entropiesrdquo AminoAcids vol 34 no 4 pp 565ndash572 2008

[75] R Zia Ur and A Khan ldquoIdentifying GPCRs and their typeswith Choursquos pseudo amino acid composition an approach frommulti-scale energy representation and position specific scoringmatrixrdquo Protein amp Peptide Letters vol 19 pp 890ndash903 2012

[76] X Y Sun S P Shi J D Qiu S B Suo S Y Huang and R PLiang ldquoIdentifying protein quaternary structural attributes byincorporating physicochemical properties into the general formof Choursquos PseAAC via discrete wavelet transformrdquo MolecularBioSystems vol 8 pp 3178ndash3184 2012

[77] L Nanni and A Lumini ldquoGenetic programming for creatingChoursquos pseudo amino acid based features for submitochondrialocalizationrdquo Amino Acids vol 34 no 4 pp 653ndash660 2008

[78] M Esmaeili H Mohabatkar and S Mohsenzadeh ldquoUsing theconcept of Choursquos pseudo amino acid composition for risk typeprediction of human papillomavirusesrdquo Journal of TheoreticalBiology vol 263 no 2 pp 203ndash209 2010

[79] H Mohabatkar ldquoPrediction of cyclin proteins using Choursquospseudo amino acid compositionrdquo Protein and Peptide Lettersvol 17 no 10 pp 1207ndash1214 2010

[80] H Mohabatkar M M Beigi and A Esmaeili ldquoPredictionof GABA(A) receptor proteins using the concept of Choursquospseudo-amino acid composition and support vector machinerdquoJournal of Theoretical Biology vol 281 no 1 pp 18ndash23 2011

[81] Y Xu J Ding L Y Wu and K C Chou ldquoiSNO-PseAAC pre-dict cysteine S-nitrosylation sites in proteins by incorporatingposition specific amino acid propensity into pseudo amino acidcompositionrdquo PLoS ONE vol 8 Article ID e55844 2013

[82] W Chen H Lin PM Feng C Ding Y C Zuo andK C ChouldquoiNuc-PhysChem a sequence-based predictor for identifyingnucleosomes via physicochemical propertiesrdquo PLoS ONE vol7 Article ID e47843 2012

[83] B Li T Huang L Liu Y Cai and K C Chou ldquoIdentificationof colorectal cancer related genes with mrmr and shortest pathin protein-protein interaction networkrdquo PLoS ONE vol 7 no 4Article ID e33393 2012

[84] Y Jiang T Huang C Lei Y F Gao Y D Cai and K C ChouldquoSignal propagation in protein interaction network duringcolorectal cancer progressionrdquo BioMed Research Internationalvol 2013 Article ID 287019 9 pages 2013

[85] P Du X Wang C Xu and Y Gao ldquoPseAAC-Builder across-platform stand-alone program for generating variousspecial Choursquos pseudo-amino acid compositionsrdquo AnalyticalBiochemistry vol 425 no 2 pp 117ndash119 2012

[86] D S Cao Q S Xu and Y Z Liang ldquoPropy a tool to generatevarious modes of Choursquos PseAACrdquo Bioinformatics vol 29 pp960ndash962 2013

[87] H Shen and K C Chou ldquoPseAAC a flexible web serverfor generating various kinds of protein pseudo amino acidcompositionrdquo Analytical Biochemistry vol 373 no 2 pp 386ndash388 2008

[88] K C Chou ldquoUsing pair-coupled amino acid composition topredict protein secondary structure contentrdquo Journal of ProteinChemistry vol 18 no 4 pp 473ndash480 1999

BioMed Research International 13

[89] W Liu and K C Chou ldquoPrediction of protein secondarystructure contentrdquo Protein Engineering vol 12 no 12 pp 1041ndash1050 1999

[90] M Bhasin and G P S Raghava ldquoClassification of nuclearreceptors based on amino acid composition and dipeptidecompositionrdquo The Journal of Biological Chemistry vol 279 no22 pp 23262ndash23266 2004

[91] H Lin andH Ding ldquoPredicting ion channels and their types bythe dipeptidemode of pseudo amino acid compositionrdquo Journalof Theoretical Biology vol 269 no 1 pp 64ndash69 2011

[92] H Lin and Q Li ldquoUsing pseudo amino acid composition topredict protein structural class approached by incorporating400 dipeptide componentsrdquo Journal of Computational Chem-istry vol 28 no 9 pp 1463ndash1466 2007

[93] L Nanni and A Lumini ldquoCombing ontologies and dipeptidecomposition for predicting DNA-binding proteinsrdquo AminoAcids vol 34 no 4 pp 635ndash641 2008

[94] K-C Chou ldquoThe convergence-divergence duality in lectindomains of selectin family and its implicationsrdquo FEBS Lettersvol 363 no 1-2 pp 123ndash126 1995

[95] A A Schaffer L Aravind T L Madden et al ldquoImprovingthe accuracy of PSI-BLAST protein database searches withcomposition-based statistics and other refinementsrdquo NucleicAcids Research vol 29 no 14 pp 2994ndash3005 2001

[96] S F Altschul and E V Koonin ldquoIterated profile searches withPSI-BLAST a tool for discovery in protein databasesrdquo Trends inBiochemical Sciences vol 23 no 11 pp 444ndash447 1998

[97] J Deng ldquoGrey entropy and grey target decision makingrdquoJournal of Grey System vol 22 no 1 pp 1ndash24 2010

[98] B M Bolstad R A Irizarry M Astrand and T P Speed ldquoAcomparison of normalizationmethods for high density oligonu-cleotide array data based on variance and biasrdquo Bioinformaticsvol 19 no 2 pp 185ndash193 2003

[99] J-Y Shi S-W Zhang Q Pan Y-M Cheng and J XieldquoPrediction of protein subcellular localization by support vectormachines using multi-scale energy and pseudo amino acidcompositionrdquo Amino Acids vol 33 no 1 pp 69ndash74 2007

[100] T Denoeux ldquoA 120581-nearest neighbor classification rule based onDempster-Shafer theoryrdquo IEEE Transactions on Systems Manand Cybernetics vol 25 no 5 pp 804ndash813 1995

[101] J M Keller M R Gray and J A Givens ldquoA fuzzy k-nearestneighbours algorithmrdquo IEEE Transactions on Systems Man andCybernetics vol 15 no 4 pp 580ndash585 1985

[102] X Xiao P Wang and K Chou ldquoiNR-physchem a sequence-based predictor for identifying nuclear receptors and theirsubfamilies via physical-chemical property matrixrdquo PLoS ONEvol 7 no 2 Article ID e30869 2012

[103] I Roterman L Konieczny W Jurkowski K Prymula and MBanach ldquoTwo-intermediate model to characterize the structureof fast-folding proteinsrdquo Journal of Theoretical Biology vol 283no 1 pp 60ndash70 2011

[104] X Xiao P Wang and K C Chou ldquoGPCR-2L predicting Gprotein-coupled receptors and their types by hybridizing twodifferentmodes of pseudo amino acid compositionsrdquoMolecularBioSystems vol 7 no 3 pp 911ndash919 2011

[105] X Xiao P Wang and K C Chou ldquoQuat-2L a web-server forpredicting protein quaternary structural attributesrdquo MolecularDiversity vol 15 no 1 pp 149ndash155 2011

[106] X Zheng C Li and J Wang ldquoAn information-theoreticapproach to the prediction of protein structural classrdquo Journalof Computational Chemistry vol 31 no 6 pp 1201ndash1206 2010

[107] H Shen J Yang X Liu and K C Chou ldquoUsing supervisedfuzzy clustering to predict protein structural classesrdquo Biochem-ical and Biophysical Research Communications vol 334 no 2pp 577ndash581 2005

[108] K-C Chou and C-T Zhang ldquoReview prediction of proteinstructural classesrdquo Critical Reviews in Biochemistry and Molec-ular Biology vol 30 no 4 pp 275ndash349 1995

[109] P C Mahalanobis ldquoOn the generalized distance in statisticsrdquoProceedings of the National Institute of Sciences of India vol 2pp 49ndash55 1936

[110] R M Centor ldquoSignal detectability the use of ROC curves andtheir analysesrdquoMedical Decision Making vol 11 no 2 pp 102ndash106 1991

[111] K-C Chou ldquoUsing subsite coupling to predict signal peptidesrdquoProtein Engineering vol 14 no 2 pp 75ndash79 2001

[112] K C Chou ldquoPrediction of protein signal sequences and theircleavage sitesrdquo Proteins vol 42 pp 136ndash139 2001

[113] K C Chou ldquoPrediction of signal peptides using scaledwindowrdquoPeptides vol 22 no 12 pp 1973ndash1979 2001

[114] K C Chou ldquoSome remarks on predicting multi-label attributesinmolecular biosystemsrdquoMolecular Biosystems vol 9 pp 1092ndash1100 2013

[115] K C Chou and H B Shen ldquoCell-PLoc 2 0 an improvedpackage of web-servers for predicting subcellular localization ofproteins in various organismsrdquoNatural Science vol 2 pp 1090ndash1103 2010

[116] M Gribskov and N L Robinson ldquoUse of receiver operatingcharacteristic (ROC) analysis to evaluate sequence matchingrdquoComputers and Chemistry vol 20 no 1 pp 25ndash33 1996

[117] Y Yamanishi M Kotera M Kanehisa and S Goto ldquoDrug-target interaction prediction from chemical genomic and phar-macological data in an integrated frameworkrdquo Bioinformaticsvol 26 no 12 pp i246ndashi254 2010

[118] Z He J Zhang X Shi et al ldquoPredicting drug-target interactionnetworks based on functional groups and biological featuresrdquoPLoS ONE vol 5 no 3 Article ID e9603 2010

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

BioMed Research International 9

Read Me Data Citation

Enter both the protein sequence and the drug code (example) the number of query pairs is limited at 10 or less for each submission

Submit Clear

You can download the program and run it on your own local computer

iEzy-Drug identifying the interaction between enzymes anddrugs in cellular networking

Figure 5 A semiscreenshot to show the top page of the iEzy-Drugweb-server Its web-site address is at httpwwwjci-bioinfocniEzy-Drug

Table 1 The jackknife success rates obtained with iEzy-Drug in identifying interactive enzyme-drug pairs and noninteractive enzyme-drugpairs for the benchmark dataset S (cf Online Supporting Information S1)

Method Acc Sn Sp MCC

iEzy-Druga 74258157 = 9103 24692719 = 9081 49565438 = 9114 8039NN predictorb 8548 NA NA NAaSee (27) where the parameters119870 = 6 and 120593 = 15bSee [110]

or 1-Sp The best possible prediction method would yielda point with the coordinate (0 1) representing 100 truepositive rate (sensitivity Sn) and 0 false positive rate or100 specificity Therefore the (0 1) point is also calleda perfect classification A completely random guess wouldgive a point along a diagonal from the point (0 0) to (1 1)The area under the ROC curve also called Area Under theROC (AUROC) is often used to indicate the performancequality of a binary classifier the value 05 of AUROCis equivalent to random prediction while 1 of AUROCrepresents a perfect one As we can see from Figure 4the AUROC value obtained by iEzy-Drug is 09377

The reason why iEzy-Drug can remarkably improve theprediction quality is that it has introduced the 2D molecularfingerprints to represent drug samples see Online SupportingInformation S3 for the detailed fingerprint expressions for thedrugs listed in Online Supporting Information S1 and thatit has successfully used PseAAC to incorporate the featuresderived from the sequences of enzymes that are essentialfor identifying the interaction of enzymes with drugs in thecellular networking

To enhance the value of its practical applications theweb server for iEzy-Drug has been established that canbe freely accessible at httpwwwjci-bioinfocniEzy-DrugIt is anticipated that the web server will become a usefulhigh throughput tool for both basic research and drug

development in the relevant areas or at the very least playa complementary role to the existing method [39 110 118] forwhich so far noweb-serverwhatsoever has been provided yet

32 The Protocol or User Guide For the convenience of thevast majority of biologists and pharmaceutical scientists herelet us provide a step-by-step guide to show how the userscan easily get the desired result by means of the web serverwithout the need to follow the complicated mathematicalequations presented in this paper for the process of develop-ing the predictor and its integrityStep 1 Open the web server at the site httpwwwjci-bio-infocniEzy-Drug and you will see the top page of thepredictor on your computer screen as shown in Figure 5Click on the ReadMe button to see a brief introduction aboutiEzy-Drug predictor and the caveat when using itStep 2 Either type or copypaste the query pairs into theinput box at the center of Figure 5 Each query pair consistsof two parts one is for the protein sequence and the other forthe drug The enzyme sequence should be in FASTA formatwhile the drug in the KEGG code Examples for the querypairs input can be seen by clicking on the Example buttonright above the input boxStep 3 Click on the Submit button to see the predicted resultFor example if you use the four query pairs in the Example

10 BioMed Research International

window as the input after clicking the Submit button youwill see on your screen that the ldquohsa 10056rdquo enzyme andthe ldquoD0021rdquo drug are an interactive pair and that the ldquohsa100rdquo enzyme and the ldquoD0037rdquo drug are also an interactivepair but that the ldquohsa 3295rdquo enzyme and the ldquoD00889rdquo drugare not an interactive pair and that the ldquohsa 7366rdquo enzymeand the ldquoD03601rdquo drug are not an interactive pair eitherAll these results are fully consistent with the experimentalobservations It takes about 3 minutes before the results areshown on the screenStep 4 Click on the Citation button to find the relevant paperthat documents the detailed development and algorithm ofiEzy-DurgStep 5 Click on the Data button to download the benchmarkdataset used to train and test the iEzy-Durg predictorStep 6 The program code is also available by clicking thebutton download on the lower panel of Figure 5

Acknowledgments

The authors would like to thank the three anonymousreviewers whose constructive comments are very helpful forstrengthening the presentation of the paper This work wassupported by the Grants from the National Natural ScienceFoundation of China (no 31260273) the Jiangxi ProvincialForeign Scientific and Technological Cooperation Project(no 20120BDH80023) Natural Science Foundation of JiangxiProvince China (no 2010GZS0122 20122BAB201020) theDepartment of Education of Jiangxi Province (GJJ12490)the LuoDi plan of the Department of Education of JiangxiProvince (KJLD12083) and the Jiangxi Provincial Founda-tion for Leaders of Disciplines in Science (20113BCB22008)The funders had no role in study design data collection andanalysis decision to publish or preparation of the paper

References

[1] A Bairoch ldquoThe ENZYME database in 2000rdquo Nucleic AcidsResearch vol 28 no 1 pp 304ndash305 2000

[2] S H Koenig and R D Brown ldquoH2CO3as substrate for carbonic

anhydrase in the dehydration of H2CO3(minus)rdquo Proceedings of the

National Academy of Sciences of the United States of Americavol 69 no 9 pp 2422ndash2425 1972

[3] S X Lin and J Lapointe ldquoTheoretical and experimental biologyin onerdquo Journal of Biomedical Science and Engineering vol 6 pp435ndash442 2013

[4] K C Chou and S P Jiang ldquoStudies on the rate of diffusion-controlled reactions of enzymes Spatial factor and force fieldfactorrdquo Scientia Sinica vol 17 no 5 pp 664ndash680 1974

[5] K C Chou ldquoThe kinetics of the combination reaction betweenenzyme and substraterdquo Scientia Sinica vol 19 no 4 pp 505ndash528 1976

[6] K C Chou and G P Zhou ldquoRole of the protein outside activesite on the diffusion-controlled reaction of enzymerdquo Journal ofthe American Chemical Society vol 104 no 5 pp 1409ndash14131982

[7] R A Poorman A G Tomasselli R L Heinrikson and FJ Kezdy ldquoA cumulative specificity model for proteases from

human immunodeficiency virus types 1 and 2 inferred fromstatistical analysis of an extended substrate data baserdquo TheJournal of Biological Chemistry vol 266 no 22 pp 14554ndash145611991

[8] K-C Chou ldquoA vectorized sequence-coupling model for pre-dicting HIV protease cleavage sites in proteinsrdquo The Journal ofBiological Chemistry vol 268 no 23 pp 16938ndash16948 1993

[9] G Z Liang and S Z Li ldquoA new sequence representation asapplied in better specificity elucidation for human immunod-eficiency virus type 1 proteaserdquo Biopolymers vol 88 no 3 pp401ndash412 2007

[10] J J Chou ldquoPredicting cleavability of peptide sequences byHIVprotease via correlation-angle approachrdquo Journal of ProteinChemistry vol 12 no 3 pp 291ndash302 1993

[11] Q S Du S Wang D Q Wei S Sirois and K C ChouldquoMolecular modeling and chemical modification for findingpeptide inhibitor against severe acute respiratory syndromecoronavirus main proteinaserdquo Analytical Biochemistry vol 337no 2 pp 262ndash270 2005

[12] Y Gan H Huang Y Huang et al ldquoSynthesis and activity ofan octapeptide inhibitor designed for SARS coronavirus mainproteinaserdquo Peptides vol 27 no 4 pp 622ndash625 2006

[13] Q S Du H Sun and K C Chou ldquoInhibitor design for SARScoronavirus main protease based on lsquodistorted key theoryrsquordquoMedicinal Chemistry vol 3 no 1 pp 1ndash6 2007

[14] K C Chou ldquoPrediction of human immunodeficiency virusprotease cleavage sites in proteinsrdquoAnalytical Biochemistry vol233 no 1 pp 1ndash14 1996

[15] J Knowles and G Gromo ldquoTarget selection in drug discoveryrdquoNature Reviews Drug Discovery vol 2 no 1 pp 63ndash69 2003

[16] A C Cheng R G Coleman K T Smyth et al ldquoStructure-basedmaximal affinity model predicts small-molecule druggabilityrdquoNature Biotechnology vol 25 no 1 pp 71ndash75 2007

[17] M Rarey B Kramer T Lengauer and G Klebe ldquoA fast flexibledocking method using an incremental construction algorithmrdquoJournal of Molecular Biology vol 261 no 3 pp 470ndash489 1996

[18] K C Chou ldquoReview structural bioinformatics and its impactto biomedical sciencerdquo Current Medicinal Chemistry vol 11 no16 pp 2105ndash2134 2004

[19] K C Chou D Q Wei and W Z Zhong ldquoBinding mechanismof coronavirus main proteinase with ligands and its implicationto drug design against SARSrdquo Biochemical and BiophysicalResearch Communications vol 308 pp 148ndash151 2003 (Erratumin Biochemical and Biophysical Research Communications vol310 p 675 2003)

[20] K C Chou D Q Wei Q S Du S Sirois and W Z ZhongldquoProgress in computational approach to drug developmentagaint SARSrdquo Current Medicinal Chemistry vol 13 no 27 pp3263ndash3270 2006

[21] G P Zhou and F A Troy ldquoNMR studies on how the bindingcomplex of polyisoprenol recognition sequence peptides andpolyisoprenols can modulate membrane structurerdquo CurrentProtein and Peptide Science vol 6 no 5 pp 399ndash411 2005

[22] R B Huang Q S Du C H Wang and K C Chou ldquoAn in-depth analysis of the biological functional studies based on theNMR M2 channel structure of influenza A virusrdquo Biochemicaland Biophysical Research Communications vol 377 no 4 pp1243ndash1247 2008

[23] Q S Du R B Huang C H Wang X M Li and K C ChouldquoEnergetic analysis of the two controversial drug binding sitesof the M2 proton channel in influenza A virusrdquo Journal ofTheoretical Biology vol 259 no 1 pp 159ndash164 2009

BioMed Research International 11

[24] M J Berardi W M Shih S C Harrison and J J Chou ldquoMito-chondrial uncoupling protein 2 structure determined by NMRmolecular fragment searchingrdquo Nature vol 476 no 7358 pp109ndash114 2011

[25] J R Schnell and J J Chou ldquoStructure andmechanism of theM2proton channel of influenza A virusrdquo Nature vol 451 no 7178pp 591ndash595 2008

[26] B OuYang S Xie M J Berardi et al ldquoUnusual architecture ofthe p7 channel from hepatitis C virusrdquoNature vol 498 pp 521ndash525 2013

[27] K Oxenoid and J J Chou ldquoThe structure of phospholambanpentamer reveals a channel-like architecture in membranesrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 102 no 31 pp 10870ndash10875 2005

[28] M E Call KWWucherpfennig and J J Chou ldquoThe structuralbasis for intramembrane assembly of an activating immunore-ceptor complexrdquo Nature Immunology vol 11 no 11 pp 1023ndash1029 2010

[29] R M Pielak J R Schnell and J J Chou ldquoMechanism of druginhibition and drug resistance of influenza A M2 channelrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 106 no 18 pp 7379ndash7384 2009

[30] J Wang R M Pielak M A McClintock and J J ChouldquoSolution structure and functional analysis of the influenza Bproton channelrdquo Nature Structural and Molecular Biology vol16 no 12 pp 1267ndash1271 2009

[31] K C Chou ldquoCoupling interaction between thromboxane A2receptor and alpha-13 subunit of guanine nucleotide-bindingproteinrdquo Journal of Proteome Research vol 4 no 5 pp 1681ndash1686 2005

[32] K C Chou ldquoInsights from modeling three-dimensional struc-tures of the human potassium and sodium channelsrdquo Journal ofProteome Research vol 3 no 4 pp 856ndash861 2004

[33] K C Chou ldquoSome remarks on protein attribute prediction andpseudo amino acid compositionrdquo Journal ofTheoretical Biologyvol 273 no 1 pp 236ndash247 2011

[34] W Z Lin J A Fang X Xiao and K C Chou ldquoiLoc-animal amulti-label learning classifier for predicting subcellular local-ization of animal proteinsrdquo Molecular BioSystems vol 9 pp634ndash644 2013

[35] X Xiao PWangW Z Lin J H Jia and K C Chou ldquoiAMP-2La two-level multi-label classifier for identifying antimicrobialpeptides and their functional typesrdquo Analytical Biochemistryvol 436 pp 168ndash177 2013

[36] W Chen P M Feng H Lin and K C Chou ldquoiRSpot-PseDNCidentify recombination spots with pseudo dinucleotide compo-sitionrdquo Nucleic Acids Research vol 41 article e69 2013

[37] K C Chou Z C Wu and X Xiao ldquoILoc-Hum using theaccumulation-label scale to predict subcellular locations ofhuman proteins with both single and multiple sitesrdquoMolecularBioSystems vol 8 no 2 pp 629ndash641 2012

[38] M Kotera M Hirakawa T Tokimatsu S Goto and MKanehisa ldquoThe KEGG databases and tools facilitating omicsanalysis latest developments involving human diseases andpharmaceuticalsrdquo Methods in Molecular Biology vol 802 pp19ndash39 2012

[39] Y YamanishiM Araki A GutteridgeWHonda andM Kane-hisa ldquoPrediction of drug-target interaction networks from theintegration of chemical and genomic spacesrdquo Bioinformaticsvol 24 no 13 pp i232ndashi240 2008

[40] P Finn S Muggleton D Page and A Srinivasan ldquoPharma-cophore discovery using the Inductive Logic Programmingsystem PROGOLrdquo Machine Learning vol 30 no 2-3 pp 241ndash270 1998

[41] I Vogt D Stumpfe H E Ahmed and J Bajorath ldquoMethodsfor computer-aided chemical biology Part 2 evaluation of com-pound selectivity using 2D molecular fingerprintsrdquo ChemicalBiology and Drug Design vol 70 pp 195ndash205 2007

[42] H Eckert and J Bajorath ldquoMolecular similarity analysis in vir-tual screening foundations limitations and novel approachesrdquoDrug Discovery Today vol 12 no 5-6 pp 225ndash233 2007

[43] S Laurent L V Elst and R N Muller ldquoComparative studyof the physicochemical properties of six clinical low molecularweight gadolinium contrast agentsrdquo Contrast Media amp Molecu-lar Imaging vol 1 no 3 pp 128ndash137 2006

[44] E Gregori-Puigjane R Garriga-Sust and J Mestres ldquoIndexingmolecules with chemical graph identifiersrdquo Journal of Compu-tational Chemistry vol 32 no 12 pp 2638ndash2646 2011

[45] B Ren ldquoApplication of novel atom-type AI topological indicesto QSPR studies of alkanesrdquo Computers and Chemistry vol 26no 4 pp 357ndash369 2002

[46] N M OrsquoBoyle M Banck C A James C Morley T Vander-meersch and G R Hutchison ldquoOpen Babel an open chemicaltoolboxrdquo Journal of Cheminformatics vol 3 p 33 2011

[47] V J Gillet P Willett and J Bradshaw ldquoSimilarity searchingusing reduced graphsrdquo Journal of Chemical Information andComputer Sciences vol 43 no 2 pp 338ndash345 2003

[48] D Butina ldquoUnsupervised data base clustering based on day-lightrsquos fingerprint and Tanimoto similarity a fast and automatedway to cluster small and large data setsrdquo Journal of ChemicalInformation and Computer Sciences vol 39 no 4 pp 747ndash7501999

[49] C E Shannon ldquoA mathematical theory of communicationrdquoACM SIGMOBILE Mobile Computing and CommunicationsReview vol 5 pp 3ndash55 2001

[50] V D Gusev L A Nemytikova and N A Chuzhanova ldquoOn thecomplexity measures of genetic sequencesrdquo Bioinformatics vol15 no 12 pp 994ndash999 1999

[51] K C Chou and H B Shen ldquoReview recent progress in proteinsubcellular location predictionrdquo Analytical Biochemistry vol370 no 1 pp 1ndash16 2007

[52] S F Altschul ldquoEvaluating the statistical significance of multipledistinct local alignmentsrdquo in Theoretical and ComputationalMethods in Genome Research S Suhai Ed pp 1ndash14 PlenumNew York NY USA 1997

[53] J C Wootton and S Federhen ldquoStatistics of local complexity inamino acid sequences and sequence databasesrdquo Computers andChemistry vol 17 no 2 pp 149ndash163 1993

[54] H Nakashima K Nishikawa and T Ooi ldquoThe folding type ofa protein is relevant to the amino acid compositionrdquo Journal ofBiochemistry vol 99 no 1 pp 153ndash162 1986

[55] K C Chou and C T Zhang ldquoPredicting protein folding typesby distance functions that make allowances for amino acidinteractionsrdquo The Journal of Biological Chemistry vol 269 no35 pp 22014ndash22020 1994

[56] K-C Chou ldquoA novel approach to predicting protein structuralclasses in a (20-1)-D amino acid composition spacerdquo Proteinsvol 21 no 4 pp 319ndash344 1995

[57] H Nakashima and K Nishikawa ldquoDiscrimination of intracel-lular and extracellular proteins using amino acid compositionand residue-pair frequenciesrdquo Journal of Molecular Biology vol238 no 1 pp 54ndash61 1994

12 BioMed Research International

[58] G P Zhou ldquoAn intriguing controversy over protein structuralclass predictionrdquo Journal of Protein Chemistry vol 17 no 8 pp729ndash738 1998

[59] I Bahar A R Atilgan R L Jernigan and B Erman ldquoUnder-standing the recognition of protein structural classes by aminoacid compositionrdquo Proteins vol 29 pp 172ndash185 1997

[60] J Cedano P Aloy J A Perez-Pons and E Querol ldquoRelationbetween amino acid composition and cellular location ofproteinsrdquo Journal of Molecular Biology vol 266 no 3 pp 594ndash600 1997

[61] G P Zhou and K Doctor ldquoSubcellular location prediction ofapoptosis proteinsrdquo Proteins vol 50 no 1 pp 44ndash48 2003

[62] KChou ldquoPrediction of protein cellular attributes using pseudo-amino acid compositionrdquo Proteins vol 43 pp 246ndash255 2001(Erratum in Proteins vol 44 p 60 2001)

[63] K C Chou ldquoUsing amphiphilic pseudo amino acid composi-tion to predict enzyme subfamily classesrdquo Bioinformatics vol21 no 1 pp 10ndash19 2005

[64] D Zou Z He J He and Y Xia ldquoSupersecondary structureprediction using Choursquos pseudo amino acid compositionrdquoJournal of Computational Chemistry vol 32 no 2 pp 271ndash2782011

[65] M M Beigi M Behjati and H Mohabatkar ldquoPrediction ofmetalloproteinase family based on the concept ofChoursquos pseudoamino acid composition using a machine learning approachrdquoJournal of Structural and Functional Genomics vol 12 no 4 pp191ndash197 2011

[66] Y K Chen and K B Li ldquoPredicting membrane protein typesby incorporating protein topology domains signal peptidesand physicochemical properties into the general form of Choursquospseudo amino acid compositionrdquo Journal ofTheoretical Biologyvol 318 pp 1ndash12 2013

[67] C Huang and J Q Yuan ldquoA multilabel model based on Choursquospseudo-amino acid composition for identifying membraneproteins with both single and multiple functional typesrdquo TheJournal of Membrane Biology vol 246 pp 327ndash334 2013

[68] S S Sahu and G Panda ldquoA novel feature representationmethod based on Choursquos pseudo amino acid composition forprotein structural class predictionrdquo Computational Biology andChemistry vol 34 no 5-6 pp 320ndash327 2010

[69] M Hayat and A Khan ldquoDiscriminating outer membraneproteins with fuzzy K-nearest neighbor algorithms based on thegeneral form of Choursquos PseAACrdquo Protein and Peptide Lettersvol 19 no 4 pp 411ndash421 2012

[70] MKhosravian F K FaramarziMM BeigiM Behbahani andH Mohabatkar ldquoPredicting antibacterial peptides by the con-cept of Choursquos pseudo-amino acid composition and machinelearning methodsrdquo Protein amp Peptide Letters vol 20 pp 180ndash186 2013

[71] HMohabatkarMM Beigi K Abdolahi and SMohsenzadehldquoPrediction of allergenic proteins by means of the concept ofChoursquos pseudo amino acid composition and amachine learningapproachrdquoMedicinal Chemistry vol 9 pp 133ndash137 2013

[72] L Nanni A Lumini D Gupta and A Garg ldquoIdentifyingbacterial virulent proteins by fusing a set of classifiers basedon variants of Choursquos pseudo amino acid composition and onevolutionary informationrdquo IEEEACM Transactions on Compu-tational Biology and Bioinformatics vol 9 no 2 pp 467ndash4752012

[73] T H Chang L C Wu T Y Lee S P Chen H D Huang andJ T Horng ldquoEuLoc a web-server for accurately predict protein

subcellular localization in eukaryotes by incorporating variousfeatures of sequence segments into the general form of ChoursquosPseAACrdquo Journal of Computer-Aided Molecular Design vol 27pp 91ndash103 2013

[74] S Zhang Y Zhang H Yang C Zhao and Q Pan ldquoUsing theconcept of Choursquos pseudo amino acid composition to predictprotein subcellular localization an approach by incorporatingevolutionary information and von Neumann entropiesrdquo AminoAcids vol 34 no 4 pp 565ndash572 2008

[75] R Zia Ur and A Khan ldquoIdentifying GPCRs and their typeswith Choursquos pseudo amino acid composition an approach frommulti-scale energy representation and position specific scoringmatrixrdquo Protein amp Peptide Letters vol 19 pp 890ndash903 2012

[76] X Y Sun S P Shi J D Qiu S B Suo S Y Huang and R PLiang ldquoIdentifying protein quaternary structural attributes byincorporating physicochemical properties into the general formof Choursquos PseAAC via discrete wavelet transformrdquo MolecularBioSystems vol 8 pp 3178ndash3184 2012

[77] L Nanni and A Lumini ldquoGenetic programming for creatingChoursquos pseudo amino acid based features for submitochondrialocalizationrdquo Amino Acids vol 34 no 4 pp 653ndash660 2008

[78] M Esmaeili H Mohabatkar and S Mohsenzadeh ldquoUsing theconcept of Choursquos pseudo amino acid composition for risk typeprediction of human papillomavirusesrdquo Journal of TheoreticalBiology vol 263 no 2 pp 203ndash209 2010

[79] H Mohabatkar ldquoPrediction of cyclin proteins using Choursquospseudo amino acid compositionrdquo Protein and Peptide Lettersvol 17 no 10 pp 1207ndash1214 2010

[80] H Mohabatkar M M Beigi and A Esmaeili ldquoPredictionof GABA(A) receptor proteins using the concept of Choursquospseudo-amino acid composition and support vector machinerdquoJournal of Theoretical Biology vol 281 no 1 pp 18ndash23 2011

[81] Y Xu J Ding L Y Wu and K C Chou ldquoiSNO-PseAAC pre-dict cysteine S-nitrosylation sites in proteins by incorporatingposition specific amino acid propensity into pseudo amino acidcompositionrdquo PLoS ONE vol 8 Article ID e55844 2013

[82] W Chen H Lin PM Feng C Ding Y C Zuo andK C ChouldquoiNuc-PhysChem a sequence-based predictor for identifyingnucleosomes via physicochemical propertiesrdquo PLoS ONE vol7 Article ID e47843 2012

[83] B Li T Huang L Liu Y Cai and K C Chou ldquoIdentificationof colorectal cancer related genes with mrmr and shortest pathin protein-protein interaction networkrdquo PLoS ONE vol 7 no 4Article ID e33393 2012

[84] Y Jiang T Huang C Lei Y F Gao Y D Cai and K C ChouldquoSignal propagation in protein interaction network duringcolorectal cancer progressionrdquo BioMed Research Internationalvol 2013 Article ID 287019 9 pages 2013

[85] P Du X Wang C Xu and Y Gao ldquoPseAAC-Builder across-platform stand-alone program for generating variousspecial Choursquos pseudo-amino acid compositionsrdquo AnalyticalBiochemistry vol 425 no 2 pp 117ndash119 2012

[86] D S Cao Q S Xu and Y Z Liang ldquoPropy a tool to generatevarious modes of Choursquos PseAACrdquo Bioinformatics vol 29 pp960ndash962 2013

[87] H Shen and K C Chou ldquoPseAAC a flexible web serverfor generating various kinds of protein pseudo amino acidcompositionrdquo Analytical Biochemistry vol 373 no 2 pp 386ndash388 2008

[88] K C Chou ldquoUsing pair-coupled amino acid composition topredict protein secondary structure contentrdquo Journal of ProteinChemistry vol 18 no 4 pp 473ndash480 1999

BioMed Research International 13

[89] W Liu and K C Chou ldquoPrediction of protein secondarystructure contentrdquo Protein Engineering vol 12 no 12 pp 1041ndash1050 1999

[90] M Bhasin and G P S Raghava ldquoClassification of nuclearreceptors based on amino acid composition and dipeptidecompositionrdquo The Journal of Biological Chemistry vol 279 no22 pp 23262ndash23266 2004

[91] H Lin andH Ding ldquoPredicting ion channels and their types bythe dipeptidemode of pseudo amino acid compositionrdquo Journalof Theoretical Biology vol 269 no 1 pp 64ndash69 2011

[92] H Lin and Q Li ldquoUsing pseudo amino acid composition topredict protein structural class approached by incorporating400 dipeptide componentsrdquo Journal of Computational Chem-istry vol 28 no 9 pp 1463ndash1466 2007

[93] L Nanni and A Lumini ldquoCombing ontologies and dipeptidecomposition for predicting DNA-binding proteinsrdquo AminoAcids vol 34 no 4 pp 635ndash641 2008

[94] K-C Chou ldquoThe convergence-divergence duality in lectindomains of selectin family and its implicationsrdquo FEBS Lettersvol 363 no 1-2 pp 123ndash126 1995

[95] A A Schaffer L Aravind T L Madden et al ldquoImprovingthe accuracy of PSI-BLAST protein database searches withcomposition-based statistics and other refinementsrdquo NucleicAcids Research vol 29 no 14 pp 2994ndash3005 2001

[96] S F Altschul and E V Koonin ldquoIterated profile searches withPSI-BLAST a tool for discovery in protein databasesrdquo Trends inBiochemical Sciences vol 23 no 11 pp 444ndash447 1998

[97] J Deng ldquoGrey entropy and grey target decision makingrdquoJournal of Grey System vol 22 no 1 pp 1ndash24 2010

[98] B M Bolstad R A Irizarry M Astrand and T P Speed ldquoAcomparison of normalizationmethods for high density oligonu-cleotide array data based on variance and biasrdquo Bioinformaticsvol 19 no 2 pp 185ndash193 2003

[99] J-Y Shi S-W Zhang Q Pan Y-M Cheng and J XieldquoPrediction of protein subcellular localization by support vectormachines using multi-scale energy and pseudo amino acidcompositionrdquo Amino Acids vol 33 no 1 pp 69ndash74 2007

[100] T Denoeux ldquoA 120581-nearest neighbor classification rule based onDempster-Shafer theoryrdquo IEEE Transactions on Systems Manand Cybernetics vol 25 no 5 pp 804ndash813 1995

[101] J M Keller M R Gray and J A Givens ldquoA fuzzy k-nearestneighbours algorithmrdquo IEEE Transactions on Systems Man andCybernetics vol 15 no 4 pp 580ndash585 1985

[102] X Xiao P Wang and K Chou ldquoiNR-physchem a sequence-based predictor for identifying nuclear receptors and theirsubfamilies via physical-chemical property matrixrdquo PLoS ONEvol 7 no 2 Article ID e30869 2012

[103] I Roterman L Konieczny W Jurkowski K Prymula and MBanach ldquoTwo-intermediate model to characterize the structureof fast-folding proteinsrdquo Journal of Theoretical Biology vol 283no 1 pp 60ndash70 2011

[104] X Xiao P Wang and K C Chou ldquoGPCR-2L predicting Gprotein-coupled receptors and their types by hybridizing twodifferentmodes of pseudo amino acid compositionsrdquoMolecularBioSystems vol 7 no 3 pp 911ndash919 2011

[105] X Xiao P Wang and K C Chou ldquoQuat-2L a web-server forpredicting protein quaternary structural attributesrdquo MolecularDiversity vol 15 no 1 pp 149ndash155 2011

[106] X Zheng C Li and J Wang ldquoAn information-theoreticapproach to the prediction of protein structural classrdquo Journalof Computational Chemistry vol 31 no 6 pp 1201ndash1206 2010

[107] H Shen J Yang X Liu and K C Chou ldquoUsing supervisedfuzzy clustering to predict protein structural classesrdquo Biochem-ical and Biophysical Research Communications vol 334 no 2pp 577ndash581 2005

[108] K-C Chou and C-T Zhang ldquoReview prediction of proteinstructural classesrdquo Critical Reviews in Biochemistry and Molec-ular Biology vol 30 no 4 pp 275ndash349 1995

[109] P C Mahalanobis ldquoOn the generalized distance in statisticsrdquoProceedings of the National Institute of Sciences of India vol 2pp 49ndash55 1936

[110] R M Centor ldquoSignal detectability the use of ROC curves andtheir analysesrdquoMedical Decision Making vol 11 no 2 pp 102ndash106 1991

[111] K-C Chou ldquoUsing subsite coupling to predict signal peptidesrdquoProtein Engineering vol 14 no 2 pp 75ndash79 2001

[112] K C Chou ldquoPrediction of protein signal sequences and theircleavage sitesrdquo Proteins vol 42 pp 136ndash139 2001

[113] K C Chou ldquoPrediction of signal peptides using scaledwindowrdquoPeptides vol 22 no 12 pp 1973ndash1979 2001

[114] K C Chou ldquoSome remarks on predicting multi-label attributesinmolecular biosystemsrdquoMolecular Biosystems vol 9 pp 1092ndash1100 2013

[115] K C Chou and H B Shen ldquoCell-PLoc 2 0 an improvedpackage of web-servers for predicting subcellular localization ofproteins in various organismsrdquoNatural Science vol 2 pp 1090ndash1103 2010

[116] M Gribskov and N L Robinson ldquoUse of receiver operatingcharacteristic (ROC) analysis to evaluate sequence matchingrdquoComputers and Chemistry vol 20 no 1 pp 25ndash33 1996

[117] Y Yamanishi M Kotera M Kanehisa and S Goto ldquoDrug-target interaction prediction from chemical genomic and phar-macological data in an integrated frameworkrdquo Bioinformaticsvol 26 no 12 pp i246ndashi254 2010

[118] Z He J Zhang X Shi et al ldquoPredicting drug-target interactionnetworks based on functional groups and biological featuresrdquoPLoS ONE vol 5 no 3 Article ID e9603 2010

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

10 BioMed Research International

window as the input after clicking the Submit button youwill see on your screen that the ldquohsa 10056rdquo enzyme andthe ldquoD0021rdquo drug are an interactive pair and that the ldquohsa100rdquo enzyme and the ldquoD0037rdquo drug are also an interactivepair but that the ldquohsa 3295rdquo enzyme and the ldquoD00889rdquo drugare not an interactive pair and that the ldquohsa 7366rdquo enzymeand the ldquoD03601rdquo drug are not an interactive pair eitherAll these results are fully consistent with the experimentalobservations It takes about 3 minutes before the results areshown on the screenStep 4 Click on the Citation button to find the relevant paperthat documents the detailed development and algorithm ofiEzy-DurgStep 5 Click on the Data button to download the benchmarkdataset used to train and test the iEzy-Durg predictorStep 6 The program code is also available by clicking thebutton download on the lower panel of Figure 5

Acknowledgments

The authors would like to thank the three anonymousreviewers whose constructive comments are very helpful forstrengthening the presentation of the paper This work wassupported by the Grants from the National Natural ScienceFoundation of China (no 31260273) the Jiangxi ProvincialForeign Scientific and Technological Cooperation Project(no 20120BDH80023) Natural Science Foundation of JiangxiProvince China (no 2010GZS0122 20122BAB201020) theDepartment of Education of Jiangxi Province (GJJ12490)the LuoDi plan of the Department of Education of JiangxiProvince (KJLD12083) and the Jiangxi Provincial Founda-tion for Leaders of Disciplines in Science (20113BCB22008)The funders had no role in study design data collection andanalysis decision to publish or preparation of the paper

References

[1] A Bairoch ldquoThe ENZYME database in 2000rdquo Nucleic AcidsResearch vol 28 no 1 pp 304ndash305 2000

[2] S H Koenig and R D Brown ldquoH2CO3as substrate for carbonic

anhydrase in the dehydration of H2CO3(minus)rdquo Proceedings of the

National Academy of Sciences of the United States of Americavol 69 no 9 pp 2422ndash2425 1972

[3] S X Lin and J Lapointe ldquoTheoretical and experimental biologyin onerdquo Journal of Biomedical Science and Engineering vol 6 pp435ndash442 2013

[4] K C Chou and S P Jiang ldquoStudies on the rate of diffusion-controlled reactions of enzymes Spatial factor and force fieldfactorrdquo Scientia Sinica vol 17 no 5 pp 664ndash680 1974

[5] K C Chou ldquoThe kinetics of the combination reaction betweenenzyme and substraterdquo Scientia Sinica vol 19 no 4 pp 505ndash528 1976

[6] K C Chou and G P Zhou ldquoRole of the protein outside activesite on the diffusion-controlled reaction of enzymerdquo Journal ofthe American Chemical Society vol 104 no 5 pp 1409ndash14131982

[7] R A Poorman A G Tomasselli R L Heinrikson and FJ Kezdy ldquoA cumulative specificity model for proteases from

human immunodeficiency virus types 1 and 2 inferred fromstatistical analysis of an extended substrate data baserdquo TheJournal of Biological Chemistry vol 266 no 22 pp 14554ndash145611991

[8] K-C Chou ldquoA vectorized sequence-coupling model for pre-dicting HIV protease cleavage sites in proteinsrdquo The Journal ofBiological Chemistry vol 268 no 23 pp 16938ndash16948 1993

[9] G Z Liang and S Z Li ldquoA new sequence representation asapplied in better specificity elucidation for human immunod-eficiency virus type 1 proteaserdquo Biopolymers vol 88 no 3 pp401ndash412 2007

[10] J J Chou ldquoPredicting cleavability of peptide sequences byHIVprotease via correlation-angle approachrdquo Journal of ProteinChemistry vol 12 no 3 pp 291ndash302 1993

[11] Q S Du S Wang D Q Wei S Sirois and K C ChouldquoMolecular modeling and chemical modification for findingpeptide inhibitor against severe acute respiratory syndromecoronavirus main proteinaserdquo Analytical Biochemistry vol 337no 2 pp 262ndash270 2005

[12] Y Gan H Huang Y Huang et al ldquoSynthesis and activity ofan octapeptide inhibitor designed for SARS coronavirus mainproteinaserdquo Peptides vol 27 no 4 pp 622ndash625 2006

[13] Q S Du H Sun and K C Chou ldquoInhibitor design for SARScoronavirus main protease based on lsquodistorted key theoryrsquordquoMedicinal Chemistry vol 3 no 1 pp 1ndash6 2007

[14] K C Chou ldquoPrediction of human immunodeficiency virusprotease cleavage sites in proteinsrdquoAnalytical Biochemistry vol233 no 1 pp 1ndash14 1996

[15] J Knowles and G Gromo ldquoTarget selection in drug discoveryrdquoNature Reviews Drug Discovery vol 2 no 1 pp 63ndash69 2003

[16] A C Cheng R G Coleman K T Smyth et al ldquoStructure-basedmaximal affinity model predicts small-molecule druggabilityrdquoNature Biotechnology vol 25 no 1 pp 71ndash75 2007

[17] M Rarey B Kramer T Lengauer and G Klebe ldquoA fast flexibledocking method using an incremental construction algorithmrdquoJournal of Molecular Biology vol 261 no 3 pp 470ndash489 1996

[18] K C Chou ldquoReview structural bioinformatics and its impactto biomedical sciencerdquo Current Medicinal Chemistry vol 11 no16 pp 2105ndash2134 2004

[19] K C Chou D Q Wei and W Z Zhong ldquoBinding mechanismof coronavirus main proteinase with ligands and its implicationto drug design against SARSrdquo Biochemical and BiophysicalResearch Communications vol 308 pp 148ndash151 2003 (Erratumin Biochemical and Biophysical Research Communications vol310 p 675 2003)

[20] K C Chou D Q Wei Q S Du S Sirois and W Z ZhongldquoProgress in computational approach to drug developmentagaint SARSrdquo Current Medicinal Chemistry vol 13 no 27 pp3263ndash3270 2006

[21] G P Zhou and F A Troy ldquoNMR studies on how the bindingcomplex of polyisoprenol recognition sequence peptides andpolyisoprenols can modulate membrane structurerdquo CurrentProtein and Peptide Science vol 6 no 5 pp 399ndash411 2005

[22] R B Huang Q S Du C H Wang and K C Chou ldquoAn in-depth analysis of the biological functional studies based on theNMR M2 channel structure of influenza A virusrdquo Biochemicaland Biophysical Research Communications vol 377 no 4 pp1243ndash1247 2008

[23] Q S Du R B Huang C H Wang X M Li and K C ChouldquoEnergetic analysis of the two controversial drug binding sitesof the M2 proton channel in influenza A virusrdquo Journal ofTheoretical Biology vol 259 no 1 pp 159ndash164 2009

BioMed Research International 11

[24] M J Berardi W M Shih S C Harrison and J J Chou ldquoMito-chondrial uncoupling protein 2 structure determined by NMRmolecular fragment searchingrdquo Nature vol 476 no 7358 pp109ndash114 2011

[25] J R Schnell and J J Chou ldquoStructure andmechanism of theM2proton channel of influenza A virusrdquo Nature vol 451 no 7178pp 591ndash595 2008

[26] B OuYang S Xie M J Berardi et al ldquoUnusual architecture ofthe p7 channel from hepatitis C virusrdquoNature vol 498 pp 521ndash525 2013

[27] K Oxenoid and J J Chou ldquoThe structure of phospholambanpentamer reveals a channel-like architecture in membranesrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 102 no 31 pp 10870ndash10875 2005

[28] M E Call KWWucherpfennig and J J Chou ldquoThe structuralbasis for intramembrane assembly of an activating immunore-ceptor complexrdquo Nature Immunology vol 11 no 11 pp 1023ndash1029 2010

[29] R M Pielak J R Schnell and J J Chou ldquoMechanism of druginhibition and drug resistance of influenza A M2 channelrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 106 no 18 pp 7379ndash7384 2009

[30] J Wang R M Pielak M A McClintock and J J ChouldquoSolution structure and functional analysis of the influenza Bproton channelrdquo Nature Structural and Molecular Biology vol16 no 12 pp 1267ndash1271 2009

[31] K C Chou ldquoCoupling interaction between thromboxane A2receptor and alpha-13 subunit of guanine nucleotide-bindingproteinrdquo Journal of Proteome Research vol 4 no 5 pp 1681ndash1686 2005

[32] K C Chou ldquoInsights from modeling three-dimensional struc-tures of the human potassium and sodium channelsrdquo Journal ofProteome Research vol 3 no 4 pp 856ndash861 2004

[33] K C Chou ldquoSome remarks on protein attribute prediction andpseudo amino acid compositionrdquo Journal ofTheoretical Biologyvol 273 no 1 pp 236ndash247 2011

[34] W Z Lin J A Fang X Xiao and K C Chou ldquoiLoc-animal amulti-label learning classifier for predicting subcellular local-ization of animal proteinsrdquo Molecular BioSystems vol 9 pp634ndash644 2013

[35] X Xiao PWangW Z Lin J H Jia and K C Chou ldquoiAMP-2La two-level multi-label classifier for identifying antimicrobialpeptides and their functional typesrdquo Analytical Biochemistryvol 436 pp 168ndash177 2013

[36] W Chen P M Feng H Lin and K C Chou ldquoiRSpot-PseDNCidentify recombination spots with pseudo dinucleotide compo-sitionrdquo Nucleic Acids Research vol 41 article e69 2013

[37] K C Chou Z C Wu and X Xiao ldquoILoc-Hum using theaccumulation-label scale to predict subcellular locations ofhuman proteins with both single and multiple sitesrdquoMolecularBioSystems vol 8 no 2 pp 629ndash641 2012

[38] M Kotera M Hirakawa T Tokimatsu S Goto and MKanehisa ldquoThe KEGG databases and tools facilitating omicsanalysis latest developments involving human diseases andpharmaceuticalsrdquo Methods in Molecular Biology vol 802 pp19ndash39 2012

[39] Y YamanishiM Araki A GutteridgeWHonda andM Kane-hisa ldquoPrediction of drug-target interaction networks from theintegration of chemical and genomic spacesrdquo Bioinformaticsvol 24 no 13 pp i232ndashi240 2008

[40] P Finn S Muggleton D Page and A Srinivasan ldquoPharma-cophore discovery using the Inductive Logic Programmingsystem PROGOLrdquo Machine Learning vol 30 no 2-3 pp 241ndash270 1998

[41] I Vogt D Stumpfe H E Ahmed and J Bajorath ldquoMethodsfor computer-aided chemical biology Part 2 evaluation of com-pound selectivity using 2D molecular fingerprintsrdquo ChemicalBiology and Drug Design vol 70 pp 195ndash205 2007

[42] H Eckert and J Bajorath ldquoMolecular similarity analysis in vir-tual screening foundations limitations and novel approachesrdquoDrug Discovery Today vol 12 no 5-6 pp 225ndash233 2007

[43] S Laurent L V Elst and R N Muller ldquoComparative studyof the physicochemical properties of six clinical low molecularweight gadolinium contrast agentsrdquo Contrast Media amp Molecu-lar Imaging vol 1 no 3 pp 128ndash137 2006

[44] E Gregori-Puigjane R Garriga-Sust and J Mestres ldquoIndexingmolecules with chemical graph identifiersrdquo Journal of Compu-tational Chemistry vol 32 no 12 pp 2638ndash2646 2011

[45] B Ren ldquoApplication of novel atom-type AI topological indicesto QSPR studies of alkanesrdquo Computers and Chemistry vol 26no 4 pp 357ndash369 2002

[46] N M OrsquoBoyle M Banck C A James C Morley T Vander-meersch and G R Hutchison ldquoOpen Babel an open chemicaltoolboxrdquo Journal of Cheminformatics vol 3 p 33 2011

[47] V J Gillet P Willett and J Bradshaw ldquoSimilarity searchingusing reduced graphsrdquo Journal of Chemical Information andComputer Sciences vol 43 no 2 pp 338ndash345 2003

[48] D Butina ldquoUnsupervised data base clustering based on day-lightrsquos fingerprint and Tanimoto similarity a fast and automatedway to cluster small and large data setsrdquo Journal of ChemicalInformation and Computer Sciences vol 39 no 4 pp 747ndash7501999

[49] C E Shannon ldquoA mathematical theory of communicationrdquoACM SIGMOBILE Mobile Computing and CommunicationsReview vol 5 pp 3ndash55 2001

[50] V D Gusev L A Nemytikova and N A Chuzhanova ldquoOn thecomplexity measures of genetic sequencesrdquo Bioinformatics vol15 no 12 pp 994ndash999 1999

[51] K C Chou and H B Shen ldquoReview recent progress in proteinsubcellular location predictionrdquo Analytical Biochemistry vol370 no 1 pp 1ndash16 2007

[52] S F Altschul ldquoEvaluating the statistical significance of multipledistinct local alignmentsrdquo in Theoretical and ComputationalMethods in Genome Research S Suhai Ed pp 1ndash14 PlenumNew York NY USA 1997

[53] J C Wootton and S Federhen ldquoStatistics of local complexity inamino acid sequences and sequence databasesrdquo Computers andChemistry vol 17 no 2 pp 149ndash163 1993

[54] H Nakashima K Nishikawa and T Ooi ldquoThe folding type ofa protein is relevant to the amino acid compositionrdquo Journal ofBiochemistry vol 99 no 1 pp 153ndash162 1986

[55] K C Chou and C T Zhang ldquoPredicting protein folding typesby distance functions that make allowances for amino acidinteractionsrdquo The Journal of Biological Chemistry vol 269 no35 pp 22014ndash22020 1994

[56] K-C Chou ldquoA novel approach to predicting protein structuralclasses in a (20-1)-D amino acid composition spacerdquo Proteinsvol 21 no 4 pp 319ndash344 1995

[57] H Nakashima and K Nishikawa ldquoDiscrimination of intracel-lular and extracellular proteins using amino acid compositionand residue-pair frequenciesrdquo Journal of Molecular Biology vol238 no 1 pp 54ndash61 1994

12 BioMed Research International

[58] G P Zhou ldquoAn intriguing controversy over protein structuralclass predictionrdquo Journal of Protein Chemistry vol 17 no 8 pp729ndash738 1998

[59] I Bahar A R Atilgan R L Jernigan and B Erman ldquoUnder-standing the recognition of protein structural classes by aminoacid compositionrdquo Proteins vol 29 pp 172ndash185 1997

[60] J Cedano P Aloy J A Perez-Pons and E Querol ldquoRelationbetween amino acid composition and cellular location ofproteinsrdquo Journal of Molecular Biology vol 266 no 3 pp 594ndash600 1997

[61] G P Zhou and K Doctor ldquoSubcellular location prediction ofapoptosis proteinsrdquo Proteins vol 50 no 1 pp 44ndash48 2003

[62] KChou ldquoPrediction of protein cellular attributes using pseudo-amino acid compositionrdquo Proteins vol 43 pp 246ndash255 2001(Erratum in Proteins vol 44 p 60 2001)

[63] K C Chou ldquoUsing amphiphilic pseudo amino acid composi-tion to predict enzyme subfamily classesrdquo Bioinformatics vol21 no 1 pp 10ndash19 2005

[64] D Zou Z He J He and Y Xia ldquoSupersecondary structureprediction using Choursquos pseudo amino acid compositionrdquoJournal of Computational Chemistry vol 32 no 2 pp 271ndash2782011

[65] M M Beigi M Behjati and H Mohabatkar ldquoPrediction ofmetalloproteinase family based on the concept ofChoursquos pseudoamino acid composition using a machine learning approachrdquoJournal of Structural and Functional Genomics vol 12 no 4 pp191ndash197 2011

[66] Y K Chen and K B Li ldquoPredicting membrane protein typesby incorporating protein topology domains signal peptidesand physicochemical properties into the general form of Choursquospseudo amino acid compositionrdquo Journal ofTheoretical Biologyvol 318 pp 1ndash12 2013

[67] C Huang and J Q Yuan ldquoA multilabel model based on Choursquospseudo-amino acid composition for identifying membraneproteins with both single and multiple functional typesrdquo TheJournal of Membrane Biology vol 246 pp 327ndash334 2013

[68] S S Sahu and G Panda ldquoA novel feature representationmethod based on Choursquos pseudo amino acid composition forprotein structural class predictionrdquo Computational Biology andChemistry vol 34 no 5-6 pp 320ndash327 2010

[69] M Hayat and A Khan ldquoDiscriminating outer membraneproteins with fuzzy K-nearest neighbor algorithms based on thegeneral form of Choursquos PseAACrdquo Protein and Peptide Lettersvol 19 no 4 pp 411ndash421 2012

[70] MKhosravian F K FaramarziMM BeigiM Behbahani andH Mohabatkar ldquoPredicting antibacterial peptides by the con-cept of Choursquos pseudo-amino acid composition and machinelearning methodsrdquo Protein amp Peptide Letters vol 20 pp 180ndash186 2013

[71] HMohabatkarMM Beigi K Abdolahi and SMohsenzadehldquoPrediction of allergenic proteins by means of the concept ofChoursquos pseudo amino acid composition and amachine learningapproachrdquoMedicinal Chemistry vol 9 pp 133ndash137 2013

[72] L Nanni A Lumini D Gupta and A Garg ldquoIdentifyingbacterial virulent proteins by fusing a set of classifiers basedon variants of Choursquos pseudo amino acid composition and onevolutionary informationrdquo IEEEACM Transactions on Compu-tational Biology and Bioinformatics vol 9 no 2 pp 467ndash4752012

[73] T H Chang L C Wu T Y Lee S P Chen H D Huang andJ T Horng ldquoEuLoc a web-server for accurately predict protein

subcellular localization in eukaryotes by incorporating variousfeatures of sequence segments into the general form of ChoursquosPseAACrdquo Journal of Computer-Aided Molecular Design vol 27pp 91ndash103 2013

[74] S Zhang Y Zhang H Yang C Zhao and Q Pan ldquoUsing theconcept of Choursquos pseudo amino acid composition to predictprotein subcellular localization an approach by incorporatingevolutionary information and von Neumann entropiesrdquo AminoAcids vol 34 no 4 pp 565ndash572 2008

[75] R Zia Ur and A Khan ldquoIdentifying GPCRs and their typeswith Choursquos pseudo amino acid composition an approach frommulti-scale energy representation and position specific scoringmatrixrdquo Protein amp Peptide Letters vol 19 pp 890ndash903 2012

[76] X Y Sun S P Shi J D Qiu S B Suo S Y Huang and R PLiang ldquoIdentifying protein quaternary structural attributes byincorporating physicochemical properties into the general formof Choursquos PseAAC via discrete wavelet transformrdquo MolecularBioSystems vol 8 pp 3178ndash3184 2012

[77] L Nanni and A Lumini ldquoGenetic programming for creatingChoursquos pseudo amino acid based features for submitochondrialocalizationrdquo Amino Acids vol 34 no 4 pp 653ndash660 2008

[78] M Esmaeili H Mohabatkar and S Mohsenzadeh ldquoUsing theconcept of Choursquos pseudo amino acid composition for risk typeprediction of human papillomavirusesrdquo Journal of TheoreticalBiology vol 263 no 2 pp 203ndash209 2010

[79] H Mohabatkar ldquoPrediction of cyclin proteins using Choursquospseudo amino acid compositionrdquo Protein and Peptide Lettersvol 17 no 10 pp 1207ndash1214 2010

[80] H Mohabatkar M M Beigi and A Esmaeili ldquoPredictionof GABA(A) receptor proteins using the concept of Choursquospseudo-amino acid composition and support vector machinerdquoJournal of Theoretical Biology vol 281 no 1 pp 18ndash23 2011

[81] Y Xu J Ding L Y Wu and K C Chou ldquoiSNO-PseAAC pre-dict cysteine S-nitrosylation sites in proteins by incorporatingposition specific amino acid propensity into pseudo amino acidcompositionrdquo PLoS ONE vol 8 Article ID e55844 2013

[82] W Chen H Lin PM Feng C Ding Y C Zuo andK C ChouldquoiNuc-PhysChem a sequence-based predictor for identifyingnucleosomes via physicochemical propertiesrdquo PLoS ONE vol7 Article ID e47843 2012

[83] B Li T Huang L Liu Y Cai and K C Chou ldquoIdentificationof colorectal cancer related genes with mrmr and shortest pathin protein-protein interaction networkrdquo PLoS ONE vol 7 no 4Article ID e33393 2012

[84] Y Jiang T Huang C Lei Y F Gao Y D Cai and K C ChouldquoSignal propagation in protein interaction network duringcolorectal cancer progressionrdquo BioMed Research Internationalvol 2013 Article ID 287019 9 pages 2013

[85] P Du X Wang C Xu and Y Gao ldquoPseAAC-Builder across-platform stand-alone program for generating variousspecial Choursquos pseudo-amino acid compositionsrdquo AnalyticalBiochemistry vol 425 no 2 pp 117ndash119 2012

[86] D S Cao Q S Xu and Y Z Liang ldquoPropy a tool to generatevarious modes of Choursquos PseAACrdquo Bioinformatics vol 29 pp960ndash962 2013

[87] H Shen and K C Chou ldquoPseAAC a flexible web serverfor generating various kinds of protein pseudo amino acidcompositionrdquo Analytical Biochemistry vol 373 no 2 pp 386ndash388 2008

[88] K C Chou ldquoUsing pair-coupled amino acid composition topredict protein secondary structure contentrdquo Journal of ProteinChemistry vol 18 no 4 pp 473ndash480 1999

BioMed Research International 13

[89] W Liu and K C Chou ldquoPrediction of protein secondarystructure contentrdquo Protein Engineering vol 12 no 12 pp 1041ndash1050 1999

[90] M Bhasin and G P S Raghava ldquoClassification of nuclearreceptors based on amino acid composition and dipeptidecompositionrdquo The Journal of Biological Chemistry vol 279 no22 pp 23262ndash23266 2004

[91] H Lin andH Ding ldquoPredicting ion channels and their types bythe dipeptidemode of pseudo amino acid compositionrdquo Journalof Theoretical Biology vol 269 no 1 pp 64ndash69 2011

[92] H Lin and Q Li ldquoUsing pseudo amino acid composition topredict protein structural class approached by incorporating400 dipeptide componentsrdquo Journal of Computational Chem-istry vol 28 no 9 pp 1463ndash1466 2007

[93] L Nanni and A Lumini ldquoCombing ontologies and dipeptidecomposition for predicting DNA-binding proteinsrdquo AminoAcids vol 34 no 4 pp 635ndash641 2008

[94] K-C Chou ldquoThe convergence-divergence duality in lectindomains of selectin family and its implicationsrdquo FEBS Lettersvol 363 no 1-2 pp 123ndash126 1995

[95] A A Schaffer L Aravind T L Madden et al ldquoImprovingthe accuracy of PSI-BLAST protein database searches withcomposition-based statistics and other refinementsrdquo NucleicAcids Research vol 29 no 14 pp 2994ndash3005 2001

[96] S F Altschul and E V Koonin ldquoIterated profile searches withPSI-BLAST a tool for discovery in protein databasesrdquo Trends inBiochemical Sciences vol 23 no 11 pp 444ndash447 1998

[97] J Deng ldquoGrey entropy and grey target decision makingrdquoJournal of Grey System vol 22 no 1 pp 1ndash24 2010

[98] B M Bolstad R A Irizarry M Astrand and T P Speed ldquoAcomparison of normalizationmethods for high density oligonu-cleotide array data based on variance and biasrdquo Bioinformaticsvol 19 no 2 pp 185ndash193 2003

[99] J-Y Shi S-W Zhang Q Pan Y-M Cheng and J XieldquoPrediction of protein subcellular localization by support vectormachines using multi-scale energy and pseudo amino acidcompositionrdquo Amino Acids vol 33 no 1 pp 69ndash74 2007

[100] T Denoeux ldquoA 120581-nearest neighbor classification rule based onDempster-Shafer theoryrdquo IEEE Transactions on Systems Manand Cybernetics vol 25 no 5 pp 804ndash813 1995

[101] J M Keller M R Gray and J A Givens ldquoA fuzzy k-nearestneighbours algorithmrdquo IEEE Transactions on Systems Man andCybernetics vol 15 no 4 pp 580ndash585 1985

[102] X Xiao P Wang and K Chou ldquoiNR-physchem a sequence-based predictor for identifying nuclear receptors and theirsubfamilies via physical-chemical property matrixrdquo PLoS ONEvol 7 no 2 Article ID e30869 2012

[103] I Roterman L Konieczny W Jurkowski K Prymula and MBanach ldquoTwo-intermediate model to characterize the structureof fast-folding proteinsrdquo Journal of Theoretical Biology vol 283no 1 pp 60ndash70 2011

[104] X Xiao P Wang and K C Chou ldquoGPCR-2L predicting Gprotein-coupled receptors and their types by hybridizing twodifferentmodes of pseudo amino acid compositionsrdquoMolecularBioSystems vol 7 no 3 pp 911ndash919 2011

[105] X Xiao P Wang and K C Chou ldquoQuat-2L a web-server forpredicting protein quaternary structural attributesrdquo MolecularDiversity vol 15 no 1 pp 149ndash155 2011

[106] X Zheng C Li and J Wang ldquoAn information-theoreticapproach to the prediction of protein structural classrdquo Journalof Computational Chemistry vol 31 no 6 pp 1201ndash1206 2010

[107] H Shen J Yang X Liu and K C Chou ldquoUsing supervisedfuzzy clustering to predict protein structural classesrdquo Biochem-ical and Biophysical Research Communications vol 334 no 2pp 577ndash581 2005

[108] K-C Chou and C-T Zhang ldquoReview prediction of proteinstructural classesrdquo Critical Reviews in Biochemistry and Molec-ular Biology vol 30 no 4 pp 275ndash349 1995

[109] P C Mahalanobis ldquoOn the generalized distance in statisticsrdquoProceedings of the National Institute of Sciences of India vol 2pp 49ndash55 1936

[110] R M Centor ldquoSignal detectability the use of ROC curves andtheir analysesrdquoMedical Decision Making vol 11 no 2 pp 102ndash106 1991

[111] K-C Chou ldquoUsing subsite coupling to predict signal peptidesrdquoProtein Engineering vol 14 no 2 pp 75ndash79 2001

[112] K C Chou ldquoPrediction of protein signal sequences and theircleavage sitesrdquo Proteins vol 42 pp 136ndash139 2001

[113] K C Chou ldquoPrediction of signal peptides using scaledwindowrdquoPeptides vol 22 no 12 pp 1973ndash1979 2001

[114] K C Chou ldquoSome remarks on predicting multi-label attributesinmolecular biosystemsrdquoMolecular Biosystems vol 9 pp 1092ndash1100 2013

[115] K C Chou and H B Shen ldquoCell-PLoc 2 0 an improvedpackage of web-servers for predicting subcellular localization ofproteins in various organismsrdquoNatural Science vol 2 pp 1090ndash1103 2010

[116] M Gribskov and N L Robinson ldquoUse of receiver operatingcharacteristic (ROC) analysis to evaluate sequence matchingrdquoComputers and Chemistry vol 20 no 1 pp 25ndash33 1996

[117] Y Yamanishi M Kotera M Kanehisa and S Goto ldquoDrug-target interaction prediction from chemical genomic and phar-macological data in an integrated frameworkrdquo Bioinformaticsvol 26 no 12 pp i246ndashi254 2010

[118] Z He J Zhang X Shi et al ldquoPredicting drug-target interactionnetworks based on functional groups and biological featuresrdquoPLoS ONE vol 5 no 3 Article ID e9603 2010

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

BioMed Research International 11

[24] M J Berardi W M Shih S C Harrison and J J Chou ldquoMito-chondrial uncoupling protein 2 structure determined by NMRmolecular fragment searchingrdquo Nature vol 476 no 7358 pp109ndash114 2011

[25] J R Schnell and J J Chou ldquoStructure andmechanism of theM2proton channel of influenza A virusrdquo Nature vol 451 no 7178pp 591ndash595 2008

[26] B OuYang S Xie M J Berardi et al ldquoUnusual architecture ofthe p7 channel from hepatitis C virusrdquoNature vol 498 pp 521ndash525 2013

[27] K Oxenoid and J J Chou ldquoThe structure of phospholambanpentamer reveals a channel-like architecture in membranesrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 102 no 31 pp 10870ndash10875 2005

[28] M E Call KWWucherpfennig and J J Chou ldquoThe structuralbasis for intramembrane assembly of an activating immunore-ceptor complexrdquo Nature Immunology vol 11 no 11 pp 1023ndash1029 2010

[29] R M Pielak J R Schnell and J J Chou ldquoMechanism of druginhibition and drug resistance of influenza A M2 channelrdquoProceedings of the National Academy of Sciences of the UnitedStates of America vol 106 no 18 pp 7379ndash7384 2009

[30] J Wang R M Pielak M A McClintock and J J ChouldquoSolution structure and functional analysis of the influenza Bproton channelrdquo Nature Structural and Molecular Biology vol16 no 12 pp 1267ndash1271 2009

[31] K C Chou ldquoCoupling interaction between thromboxane A2receptor and alpha-13 subunit of guanine nucleotide-bindingproteinrdquo Journal of Proteome Research vol 4 no 5 pp 1681ndash1686 2005

[32] K C Chou ldquoInsights from modeling three-dimensional struc-tures of the human potassium and sodium channelsrdquo Journal ofProteome Research vol 3 no 4 pp 856ndash861 2004

[33] K C Chou ldquoSome remarks on protein attribute prediction andpseudo amino acid compositionrdquo Journal ofTheoretical Biologyvol 273 no 1 pp 236ndash247 2011

[34] W Z Lin J A Fang X Xiao and K C Chou ldquoiLoc-animal amulti-label learning classifier for predicting subcellular local-ization of animal proteinsrdquo Molecular BioSystems vol 9 pp634ndash644 2013

[35] X Xiao PWangW Z Lin J H Jia and K C Chou ldquoiAMP-2La two-level multi-label classifier for identifying antimicrobialpeptides and their functional typesrdquo Analytical Biochemistryvol 436 pp 168ndash177 2013

[36] W Chen P M Feng H Lin and K C Chou ldquoiRSpot-PseDNCidentify recombination spots with pseudo dinucleotide compo-sitionrdquo Nucleic Acids Research vol 41 article e69 2013

[37] K C Chou Z C Wu and X Xiao ldquoILoc-Hum using theaccumulation-label scale to predict subcellular locations ofhuman proteins with both single and multiple sitesrdquoMolecularBioSystems vol 8 no 2 pp 629ndash641 2012

[38] M Kotera M Hirakawa T Tokimatsu S Goto and MKanehisa ldquoThe KEGG databases and tools facilitating omicsanalysis latest developments involving human diseases andpharmaceuticalsrdquo Methods in Molecular Biology vol 802 pp19ndash39 2012

[39] Y YamanishiM Araki A GutteridgeWHonda andM Kane-hisa ldquoPrediction of drug-target interaction networks from theintegration of chemical and genomic spacesrdquo Bioinformaticsvol 24 no 13 pp i232ndashi240 2008

[40] P Finn S Muggleton D Page and A Srinivasan ldquoPharma-cophore discovery using the Inductive Logic Programmingsystem PROGOLrdquo Machine Learning vol 30 no 2-3 pp 241ndash270 1998

[41] I Vogt D Stumpfe H E Ahmed and J Bajorath ldquoMethodsfor computer-aided chemical biology Part 2 evaluation of com-pound selectivity using 2D molecular fingerprintsrdquo ChemicalBiology and Drug Design vol 70 pp 195ndash205 2007

[42] H Eckert and J Bajorath ldquoMolecular similarity analysis in vir-tual screening foundations limitations and novel approachesrdquoDrug Discovery Today vol 12 no 5-6 pp 225ndash233 2007

[43] S Laurent L V Elst and R N Muller ldquoComparative studyof the physicochemical properties of six clinical low molecularweight gadolinium contrast agentsrdquo Contrast Media amp Molecu-lar Imaging vol 1 no 3 pp 128ndash137 2006

[44] E Gregori-Puigjane R Garriga-Sust and J Mestres ldquoIndexingmolecules with chemical graph identifiersrdquo Journal of Compu-tational Chemistry vol 32 no 12 pp 2638ndash2646 2011

[45] B Ren ldquoApplication of novel atom-type AI topological indicesto QSPR studies of alkanesrdquo Computers and Chemistry vol 26no 4 pp 357ndash369 2002

[46] N M OrsquoBoyle M Banck C A James C Morley T Vander-meersch and G R Hutchison ldquoOpen Babel an open chemicaltoolboxrdquo Journal of Cheminformatics vol 3 p 33 2011

[47] V J Gillet P Willett and J Bradshaw ldquoSimilarity searchingusing reduced graphsrdquo Journal of Chemical Information andComputer Sciences vol 43 no 2 pp 338ndash345 2003

[48] D Butina ldquoUnsupervised data base clustering based on day-lightrsquos fingerprint and Tanimoto similarity a fast and automatedway to cluster small and large data setsrdquo Journal of ChemicalInformation and Computer Sciences vol 39 no 4 pp 747ndash7501999

[49] C E Shannon ldquoA mathematical theory of communicationrdquoACM SIGMOBILE Mobile Computing and CommunicationsReview vol 5 pp 3ndash55 2001

[50] V D Gusev L A Nemytikova and N A Chuzhanova ldquoOn thecomplexity measures of genetic sequencesrdquo Bioinformatics vol15 no 12 pp 994ndash999 1999

[51] K C Chou and H B Shen ldquoReview recent progress in proteinsubcellular location predictionrdquo Analytical Biochemistry vol370 no 1 pp 1ndash16 2007

[52] S F Altschul ldquoEvaluating the statistical significance of multipledistinct local alignmentsrdquo in Theoretical and ComputationalMethods in Genome Research S Suhai Ed pp 1ndash14 PlenumNew York NY USA 1997

[53] J C Wootton and S Federhen ldquoStatistics of local complexity inamino acid sequences and sequence databasesrdquo Computers andChemistry vol 17 no 2 pp 149ndash163 1993

[54] H Nakashima K Nishikawa and T Ooi ldquoThe folding type ofa protein is relevant to the amino acid compositionrdquo Journal ofBiochemistry vol 99 no 1 pp 153ndash162 1986

[55] K C Chou and C T Zhang ldquoPredicting protein folding typesby distance functions that make allowances for amino acidinteractionsrdquo The Journal of Biological Chemistry vol 269 no35 pp 22014ndash22020 1994

[56] K-C Chou ldquoA novel approach to predicting protein structuralclasses in a (20-1)-D amino acid composition spacerdquo Proteinsvol 21 no 4 pp 319ndash344 1995

[57] H Nakashima and K Nishikawa ldquoDiscrimination of intracel-lular and extracellular proteins using amino acid compositionand residue-pair frequenciesrdquo Journal of Molecular Biology vol238 no 1 pp 54ndash61 1994

12 BioMed Research International

[58] G P Zhou ldquoAn intriguing controversy over protein structuralclass predictionrdquo Journal of Protein Chemistry vol 17 no 8 pp729ndash738 1998

[59] I Bahar A R Atilgan R L Jernigan and B Erman ldquoUnder-standing the recognition of protein structural classes by aminoacid compositionrdquo Proteins vol 29 pp 172ndash185 1997

[60] J Cedano P Aloy J A Perez-Pons and E Querol ldquoRelationbetween amino acid composition and cellular location ofproteinsrdquo Journal of Molecular Biology vol 266 no 3 pp 594ndash600 1997

[61] G P Zhou and K Doctor ldquoSubcellular location prediction ofapoptosis proteinsrdquo Proteins vol 50 no 1 pp 44ndash48 2003

[62] KChou ldquoPrediction of protein cellular attributes using pseudo-amino acid compositionrdquo Proteins vol 43 pp 246ndash255 2001(Erratum in Proteins vol 44 p 60 2001)

[63] K C Chou ldquoUsing amphiphilic pseudo amino acid composi-tion to predict enzyme subfamily classesrdquo Bioinformatics vol21 no 1 pp 10ndash19 2005

[64] D Zou Z He J He and Y Xia ldquoSupersecondary structureprediction using Choursquos pseudo amino acid compositionrdquoJournal of Computational Chemistry vol 32 no 2 pp 271ndash2782011

[65] M M Beigi M Behjati and H Mohabatkar ldquoPrediction ofmetalloproteinase family based on the concept ofChoursquos pseudoamino acid composition using a machine learning approachrdquoJournal of Structural and Functional Genomics vol 12 no 4 pp191ndash197 2011

[66] Y K Chen and K B Li ldquoPredicting membrane protein typesby incorporating protein topology domains signal peptidesand physicochemical properties into the general form of Choursquospseudo amino acid compositionrdquo Journal ofTheoretical Biologyvol 318 pp 1ndash12 2013

[67] C Huang and J Q Yuan ldquoA multilabel model based on Choursquospseudo-amino acid composition for identifying membraneproteins with both single and multiple functional typesrdquo TheJournal of Membrane Biology vol 246 pp 327ndash334 2013

[68] S S Sahu and G Panda ldquoA novel feature representationmethod based on Choursquos pseudo amino acid composition forprotein structural class predictionrdquo Computational Biology andChemistry vol 34 no 5-6 pp 320ndash327 2010

[69] M Hayat and A Khan ldquoDiscriminating outer membraneproteins with fuzzy K-nearest neighbor algorithms based on thegeneral form of Choursquos PseAACrdquo Protein and Peptide Lettersvol 19 no 4 pp 411ndash421 2012

[70] MKhosravian F K FaramarziMM BeigiM Behbahani andH Mohabatkar ldquoPredicting antibacterial peptides by the con-cept of Choursquos pseudo-amino acid composition and machinelearning methodsrdquo Protein amp Peptide Letters vol 20 pp 180ndash186 2013

[71] HMohabatkarMM Beigi K Abdolahi and SMohsenzadehldquoPrediction of allergenic proteins by means of the concept ofChoursquos pseudo amino acid composition and amachine learningapproachrdquoMedicinal Chemistry vol 9 pp 133ndash137 2013

[72] L Nanni A Lumini D Gupta and A Garg ldquoIdentifyingbacterial virulent proteins by fusing a set of classifiers basedon variants of Choursquos pseudo amino acid composition and onevolutionary informationrdquo IEEEACM Transactions on Compu-tational Biology and Bioinformatics vol 9 no 2 pp 467ndash4752012

[73] T H Chang L C Wu T Y Lee S P Chen H D Huang andJ T Horng ldquoEuLoc a web-server for accurately predict protein

subcellular localization in eukaryotes by incorporating variousfeatures of sequence segments into the general form of ChoursquosPseAACrdquo Journal of Computer-Aided Molecular Design vol 27pp 91ndash103 2013

[74] S Zhang Y Zhang H Yang C Zhao and Q Pan ldquoUsing theconcept of Choursquos pseudo amino acid composition to predictprotein subcellular localization an approach by incorporatingevolutionary information and von Neumann entropiesrdquo AminoAcids vol 34 no 4 pp 565ndash572 2008

[75] R Zia Ur and A Khan ldquoIdentifying GPCRs and their typeswith Choursquos pseudo amino acid composition an approach frommulti-scale energy representation and position specific scoringmatrixrdquo Protein amp Peptide Letters vol 19 pp 890ndash903 2012

[76] X Y Sun S P Shi J D Qiu S B Suo S Y Huang and R PLiang ldquoIdentifying protein quaternary structural attributes byincorporating physicochemical properties into the general formof Choursquos PseAAC via discrete wavelet transformrdquo MolecularBioSystems vol 8 pp 3178ndash3184 2012

[77] L Nanni and A Lumini ldquoGenetic programming for creatingChoursquos pseudo amino acid based features for submitochondrialocalizationrdquo Amino Acids vol 34 no 4 pp 653ndash660 2008

[78] M Esmaeili H Mohabatkar and S Mohsenzadeh ldquoUsing theconcept of Choursquos pseudo amino acid composition for risk typeprediction of human papillomavirusesrdquo Journal of TheoreticalBiology vol 263 no 2 pp 203ndash209 2010

[79] H Mohabatkar ldquoPrediction of cyclin proteins using Choursquospseudo amino acid compositionrdquo Protein and Peptide Lettersvol 17 no 10 pp 1207ndash1214 2010

[80] H Mohabatkar M M Beigi and A Esmaeili ldquoPredictionof GABA(A) receptor proteins using the concept of Choursquospseudo-amino acid composition and support vector machinerdquoJournal of Theoretical Biology vol 281 no 1 pp 18ndash23 2011

[81] Y Xu J Ding L Y Wu and K C Chou ldquoiSNO-PseAAC pre-dict cysteine S-nitrosylation sites in proteins by incorporatingposition specific amino acid propensity into pseudo amino acidcompositionrdquo PLoS ONE vol 8 Article ID e55844 2013

[82] W Chen H Lin PM Feng C Ding Y C Zuo andK C ChouldquoiNuc-PhysChem a sequence-based predictor for identifyingnucleosomes via physicochemical propertiesrdquo PLoS ONE vol7 Article ID e47843 2012

[83] B Li T Huang L Liu Y Cai and K C Chou ldquoIdentificationof colorectal cancer related genes with mrmr and shortest pathin protein-protein interaction networkrdquo PLoS ONE vol 7 no 4Article ID e33393 2012

[84] Y Jiang T Huang C Lei Y F Gao Y D Cai and K C ChouldquoSignal propagation in protein interaction network duringcolorectal cancer progressionrdquo BioMed Research Internationalvol 2013 Article ID 287019 9 pages 2013

[85] P Du X Wang C Xu and Y Gao ldquoPseAAC-Builder across-platform stand-alone program for generating variousspecial Choursquos pseudo-amino acid compositionsrdquo AnalyticalBiochemistry vol 425 no 2 pp 117ndash119 2012

[86] D S Cao Q S Xu and Y Z Liang ldquoPropy a tool to generatevarious modes of Choursquos PseAACrdquo Bioinformatics vol 29 pp960ndash962 2013

[87] H Shen and K C Chou ldquoPseAAC a flexible web serverfor generating various kinds of protein pseudo amino acidcompositionrdquo Analytical Biochemistry vol 373 no 2 pp 386ndash388 2008

[88] K C Chou ldquoUsing pair-coupled amino acid composition topredict protein secondary structure contentrdquo Journal of ProteinChemistry vol 18 no 4 pp 473ndash480 1999

BioMed Research International 13

[89] W Liu and K C Chou ldquoPrediction of protein secondarystructure contentrdquo Protein Engineering vol 12 no 12 pp 1041ndash1050 1999

[90] M Bhasin and G P S Raghava ldquoClassification of nuclearreceptors based on amino acid composition and dipeptidecompositionrdquo The Journal of Biological Chemistry vol 279 no22 pp 23262ndash23266 2004

[91] H Lin andH Ding ldquoPredicting ion channels and their types bythe dipeptidemode of pseudo amino acid compositionrdquo Journalof Theoretical Biology vol 269 no 1 pp 64ndash69 2011

[92] H Lin and Q Li ldquoUsing pseudo amino acid composition topredict protein structural class approached by incorporating400 dipeptide componentsrdquo Journal of Computational Chem-istry vol 28 no 9 pp 1463ndash1466 2007

[93] L Nanni and A Lumini ldquoCombing ontologies and dipeptidecomposition for predicting DNA-binding proteinsrdquo AminoAcids vol 34 no 4 pp 635ndash641 2008

[94] K-C Chou ldquoThe convergence-divergence duality in lectindomains of selectin family and its implicationsrdquo FEBS Lettersvol 363 no 1-2 pp 123ndash126 1995

[95] A A Schaffer L Aravind T L Madden et al ldquoImprovingthe accuracy of PSI-BLAST protein database searches withcomposition-based statistics and other refinementsrdquo NucleicAcids Research vol 29 no 14 pp 2994ndash3005 2001

[96] S F Altschul and E V Koonin ldquoIterated profile searches withPSI-BLAST a tool for discovery in protein databasesrdquo Trends inBiochemical Sciences vol 23 no 11 pp 444ndash447 1998

[97] J Deng ldquoGrey entropy and grey target decision makingrdquoJournal of Grey System vol 22 no 1 pp 1ndash24 2010

[98] B M Bolstad R A Irizarry M Astrand and T P Speed ldquoAcomparison of normalizationmethods for high density oligonu-cleotide array data based on variance and biasrdquo Bioinformaticsvol 19 no 2 pp 185ndash193 2003

[99] J-Y Shi S-W Zhang Q Pan Y-M Cheng and J XieldquoPrediction of protein subcellular localization by support vectormachines using multi-scale energy and pseudo amino acidcompositionrdquo Amino Acids vol 33 no 1 pp 69ndash74 2007

[100] T Denoeux ldquoA 120581-nearest neighbor classification rule based onDempster-Shafer theoryrdquo IEEE Transactions on Systems Manand Cybernetics vol 25 no 5 pp 804ndash813 1995

[101] J M Keller M R Gray and J A Givens ldquoA fuzzy k-nearestneighbours algorithmrdquo IEEE Transactions on Systems Man andCybernetics vol 15 no 4 pp 580ndash585 1985

[102] X Xiao P Wang and K Chou ldquoiNR-physchem a sequence-based predictor for identifying nuclear receptors and theirsubfamilies via physical-chemical property matrixrdquo PLoS ONEvol 7 no 2 Article ID e30869 2012

[103] I Roterman L Konieczny W Jurkowski K Prymula and MBanach ldquoTwo-intermediate model to characterize the structureof fast-folding proteinsrdquo Journal of Theoretical Biology vol 283no 1 pp 60ndash70 2011

[104] X Xiao P Wang and K C Chou ldquoGPCR-2L predicting Gprotein-coupled receptors and their types by hybridizing twodifferentmodes of pseudo amino acid compositionsrdquoMolecularBioSystems vol 7 no 3 pp 911ndash919 2011

[105] X Xiao P Wang and K C Chou ldquoQuat-2L a web-server forpredicting protein quaternary structural attributesrdquo MolecularDiversity vol 15 no 1 pp 149ndash155 2011

[106] X Zheng C Li and J Wang ldquoAn information-theoreticapproach to the prediction of protein structural classrdquo Journalof Computational Chemistry vol 31 no 6 pp 1201ndash1206 2010

[107] H Shen J Yang X Liu and K C Chou ldquoUsing supervisedfuzzy clustering to predict protein structural classesrdquo Biochem-ical and Biophysical Research Communications vol 334 no 2pp 577ndash581 2005

[108] K-C Chou and C-T Zhang ldquoReview prediction of proteinstructural classesrdquo Critical Reviews in Biochemistry and Molec-ular Biology vol 30 no 4 pp 275ndash349 1995

[109] P C Mahalanobis ldquoOn the generalized distance in statisticsrdquoProceedings of the National Institute of Sciences of India vol 2pp 49ndash55 1936

[110] R M Centor ldquoSignal detectability the use of ROC curves andtheir analysesrdquoMedical Decision Making vol 11 no 2 pp 102ndash106 1991

[111] K-C Chou ldquoUsing subsite coupling to predict signal peptidesrdquoProtein Engineering vol 14 no 2 pp 75ndash79 2001

[112] K C Chou ldquoPrediction of protein signal sequences and theircleavage sitesrdquo Proteins vol 42 pp 136ndash139 2001

[113] K C Chou ldquoPrediction of signal peptides using scaledwindowrdquoPeptides vol 22 no 12 pp 1973ndash1979 2001

[114] K C Chou ldquoSome remarks on predicting multi-label attributesinmolecular biosystemsrdquoMolecular Biosystems vol 9 pp 1092ndash1100 2013

[115] K C Chou and H B Shen ldquoCell-PLoc 2 0 an improvedpackage of web-servers for predicting subcellular localization ofproteins in various organismsrdquoNatural Science vol 2 pp 1090ndash1103 2010

[116] M Gribskov and N L Robinson ldquoUse of receiver operatingcharacteristic (ROC) analysis to evaluate sequence matchingrdquoComputers and Chemistry vol 20 no 1 pp 25ndash33 1996

[117] Y Yamanishi M Kotera M Kanehisa and S Goto ldquoDrug-target interaction prediction from chemical genomic and phar-macological data in an integrated frameworkrdquo Bioinformaticsvol 26 no 12 pp i246ndashi254 2010

[118] Z He J Zhang X Shi et al ldquoPredicting drug-target interactionnetworks based on functional groups and biological featuresrdquoPLoS ONE vol 5 no 3 Article ID e9603 2010

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

12 BioMed Research International

[58] G P Zhou ldquoAn intriguing controversy over protein structuralclass predictionrdquo Journal of Protein Chemistry vol 17 no 8 pp729ndash738 1998

[59] I Bahar A R Atilgan R L Jernigan and B Erman ldquoUnder-standing the recognition of protein structural classes by aminoacid compositionrdquo Proteins vol 29 pp 172ndash185 1997

[60] J Cedano P Aloy J A Perez-Pons and E Querol ldquoRelationbetween amino acid composition and cellular location ofproteinsrdquo Journal of Molecular Biology vol 266 no 3 pp 594ndash600 1997

[61] G P Zhou and K Doctor ldquoSubcellular location prediction ofapoptosis proteinsrdquo Proteins vol 50 no 1 pp 44ndash48 2003

[62] KChou ldquoPrediction of protein cellular attributes using pseudo-amino acid compositionrdquo Proteins vol 43 pp 246ndash255 2001(Erratum in Proteins vol 44 p 60 2001)

[63] K C Chou ldquoUsing amphiphilic pseudo amino acid composi-tion to predict enzyme subfamily classesrdquo Bioinformatics vol21 no 1 pp 10ndash19 2005

[64] D Zou Z He J He and Y Xia ldquoSupersecondary structureprediction using Choursquos pseudo amino acid compositionrdquoJournal of Computational Chemistry vol 32 no 2 pp 271ndash2782011

[65] M M Beigi M Behjati and H Mohabatkar ldquoPrediction ofmetalloproteinase family based on the concept ofChoursquos pseudoamino acid composition using a machine learning approachrdquoJournal of Structural and Functional Genomics vol 12 no 4 pp191ndash197 2011

[66] Y K Chen and K B Li ldquoPredicting membrane protein typesby incorporating protein topology domains signal peptidesand physicochemical properties into the general form of Choursquospseudo amino acid compositionrdquo Journal ofTheoretical Biologyvol 318 pp 1ndash12 2013

[67] C Huang and J Q Yuan ldquoA multilabel model based on Choursquospseudo-amino acid composition for identifying membraneproteins with both single and multiple functional typesrdquo TheJournal of Membrane Biology vol 246 pp 327ndash334 2013

[68] S S Sahu and G Panda ldquoA novel feature representationmethod based on Choursquos pseudo amino acid composition forprotein structural class predictionrdquo Computational Biology andChemistry vol 34 no 5-6 pp 320ndash327 2010

[69] M Hayat and A Khan ldquoDiscriminating outer membraneproteins with fuzzy K-nearest neighbor algorithms based on thegeneral form of Choursquos PseAACrdquo Protein and Peptide Lettersvol 19 no 4 pp 411ndash421 2012

[70] MKhosravian F K FaramarziMM BeigiM Behbahani andH Mohabatkar ldquoPredicting antibacterial peptides by the con-cept of Choursquos pseudo-amino acid composition and machinelearning methodsrdquo Protein amp Peptide Letters vol 20 pp 180ndash186 2013

[71] HMohabatkarMM Beigi K Abdolahi and SMohsenzadehldquoPrediction of allergenic proteins by means of the concept ofChoursquos pseudo amino acid composition and amachine learningapproachrdquoMedicinal Chemistry vol 9 pp 133ndash137 2013

[72] L Nanni A Lumini D Gupta and A Garg ldquoIdentifyingbacterial virulent proteins by fusing a set of classifiers basedon variants of Choursquos pseudo amino acid composition and onevolutionary informationrdquo IEEEACM Transactions on Compu-tational Biology and Bioinformatics vol 9 no 2 pp 467ndash4752012

[73] T H Chang L C Wu T Y Lee S P Chen H D Huang andJ T Horng ldquoEuLoc a web-server for accurately predict protein

subcellular localization in eukaryotes by incorporating variousfeatures of sequence segments into the general form of ChoursquosPseAACrdquo Journal of Computer-Aided Molecular Design vol 27pp 91ndash103 2013

[74] S Zhang Y Zhang H Yang C Zhao and Q Pan ldquoUsing theconcept of Choursquos pseudo amino acid composition to predictprotein subcellular localization an approach by incorporatingevolutionary information and von Neumann entropiesrdquo AminoAcids vol 34 no 4 pp 565ndash572 2008

[75] R Zia Ur and A Khan ldquoIdentifying GPCRs and their typeswith Choursquos pseudo amino acid composition an approach frommulti-scale energy representation and position specific scoringmatrixrdquo Protein amp Peptide Letters vol 19 pp 890ndash903 2012

[76] X Y Sun S P Shi J D Qiu S B Suo S Y Huang and R PLiang ldquoIdentifying protein quaternary structural attributes byincorporating physicochemical properties into the general formof Choursquos PseAAC via discrete wavelet transformrdquo MolecularBioSystems vol 8 pp 3178ndash3184 2012

[77] L Nanni and A Lumini ldquoGenetic programming for creatingChoursquos pseudo amino acid based features for submitochondrialocalizationrdquo Amino Acids vol 34 no 4 pp 653ndash660 2008

[78] M Esmaeili H Mohabatkar and S Mohsenzadeh ldquoUsing theconcept of Choursquos pseudo amino acid composition for risk typeprediction of human papillomavirusesrdquo Journal of TheoreticalBiology vol 263 no 2 pp 203ndash209 2010

[79] H Mohabatkar ldquoPrediction of cyclin proteins using Choursquospseudo amino acid compositionrdquo Protein and Peptide Lettersvol 17 no 10 pp 1207ndash1214 2010

[80] H Mohabatkar M M Beigi and A Esmaeili ldquoPredictionof GABA(A) receptor proteins using the concept of Choursquospseudo-amino acid composition and support vector machinerdquoJournal of Theoretical Biology vol 281 no 1 pp 18ndash23 2011

[81] Y Xu J Ding L Y Wu and K C Chou ldquoiSNO-PseAAC pre-dict cysteine S-nitrosylation sites in proteins by incorporatingposition specific amino acid propensity into pseudo amino acidcompositionrdquo PLoS ONE vol 8 Article ID e55844 2013

[82] W Chen H Lin PM Feng C Ding Y C Zuo andK C ChouldquoiNuc-PhysChem a sequence-based predictor for identifyingnucleosomes via physicochemical propertiesrdquo PLoS ONE vol7 Article ID e47843 2012

[83] B Li T Huang L Liu Y Cai and K C Chou ldquoIdentificationof colorectal cancer related genes with mrmr and shortest pathin protein-protein interaction networkrdquo PLoS ONE vol 7 no 4Article ID e33393 2012

[84] Y Jiang T Huang C Lei Y F Gao Y D Cai and K C ChouldquoSignal propagation in protein interaction network duringcolorectal cancer progressionrdquo BioMed Research Internationalvol 2013 Article ID 287019 9 pages 2013

[85] P Du X Wang C Xu and Y Gao ldquoPseAAC-Builder across-platform stand-alone program for generating variousspecial Choursquos pseudo-amino acid compositionsrdquo AnalyticalBiochemistry vol 425 no 2 pp 117ndash119 2012

[86] D S Cao Q S Xu and Y Z Liang ldquoPropy a tool to generatevarious modes of Choursquos PseAACrdquo Bioinformatics vol 29 pp960ndash962 2013

[87] H Shen and K C Chou ldquoPseAAC a flexible web serverfor generating various kinds of protein pseudo amino acidcompositionrdquo Analytical Biochemistry vol 373 no 2 pp 386ndash388 2008

[88] K C Chou ldquoUsing pair-coupled amino acid composition topredict protein secondary structure contentrdquo Journal of ProteinChemistry vol 18 no 4 pp 473ndash480 1999

BioMed Research International 13

[89] W Liu and K C Chou ldquoPrediction of protein secondarystructure contentrdquo Protein Engineering vol 12 no 12 pp 1041ndash1050 1999

[90] M Bhasin and G P S Raghava ldquoClassification of nuclearreceptors based on amino acid composition and dipeptidecompositionrdquo The Journal of Biological Chemistry vol 279 no22 pp 23262ndash23266 2004

[91] H Lin andH Ding ldquoPredicting ion channels and their types bythe dipeptidemode of pseudo amino acid compositionrdquo Journalof Theoretical Biology vol 269 no 1 pp 64ndash69 2011

[92] H Lin and Q Li ldquoUsing pseudo amino acid composition topredict protein structural class approached by incorporating400 dipeptide componentsrdquo Journal of Computational Chem-istry vol 28 no 9 pp 1463ndash1466 2007

[93] L Nanni and A Lumini ldquoCombing ontologies and dipeptidecomposition for predicting DNA-binding proteinsrdquo AminoAcids vol 34 no 4 pp 635ndash641 2008

[94] K-C Chou ldquoThe convergence-divergence duality in lectindomains of selectin family and its implicationsrdquo FEBS Lettersvol 363 no 1-2 pp 123ndash126 1995

[95] A A Schaffer L Aravind T L Madden et al ldquoImprovingthe accuracy of PSI-BLAST protein database searches withcomposition-based statistics and other refinementsrdquo NucleicAcids Research vol 29 no 14 pp 2994ndash3005 2001

[96] S F Altschul and E V Koonin ldquoIterated profile searches withPSI-BLAST a tool for discovery in protein databasesrdquo Trends inBiochemical Sciences vol 23 no 11 pp 444ndash447 1998

[97] J Deng ldquoGrey entropy and grey target decision makingrdquoJournal of Grey System vol 22 no 1 pp 1ndash24 2010

[98] B M Bolstad R A Irizarry M Astrand and T P Speed ldquoAcomparison of normalizationmethods for high density oligonu-cleotide array data based on variance and biasrdquo Bioinformaticsvol 19 no 2 pp 185ndash193 2003

[99] J-Y Shi S-W Zhang Q Pan Y-M Cheng and J XieldquoPrediction of protein subcellular localization by support vectormachines using multi-scale energy and pseudo amino acidcompositionrdquo Amino Acids vol 33 no 1 pp 69ndash74 2007

[100] T Denoeux ldquoA 120581-nearest neighbor classification rule based onDempster-Shafer theoryrdquo IEEE Transactions on Systems Manand Cybernetics vol 25 no 5 pp 804ndash813 1995

[101] J M Keller M R Gray and J A Givens ldquoA fuzzy k-nearestneighbours algorithmrdquo IEEE Transactions on Systems Man andCybernetics vol 15 no 4 pp 580ndash585 1985

[102] X Xiao P Wang and K Chou ldquoiNR-physchem a sequence-based predictor for identifying nuclear receptors and theirsubfamilies via physical-chemical property matrixrdquo PLoS ONEvol 7 no 2 Article ID e30869 2012

[103] I Roterman L Konieczny W Jurkowski K Prymula and MBanach ldquoTwo-intermediate model to characterize the structureof fast-folding proteinsrdquo Journal of Theoretical Biology vol 283no 1 pp 60ndash70 2011

[104] X Xiao P Wang and K C Chou ldquoGPCR-2L predicting Gprotein-coupled receptors and their types by hybridizing twodifferentmodes of pseudo amino acid compositionsrdquoMolecularBioSystems vol 7 no 3 pp 911ndash919 2011

[105] X Xiao P Wang and K C Chou ldquoQuat-2L a web-server forpredicting protein quaternary structural attributesrdquo MolecularDiversity vol 15 no 1 pp 149ndash155 2011

[106] X Zheng C Li and J Wang ldquoAn information-theoreticapproach to the prediction of protein structural classrdquo Journalof Computational Chemistry vol 31 no 6 pp 1201ndash1206 2010

[107] H Shen J Yang X Liu and K C Chou ldquoUsing supervisedfuzzy clustering to predict protein structural classesrdquo Biochem-ical and Biophysical Research Communications vol 334 no 2pp 577ndash581 2005

[108] K-C Chou and C-T Zhang ldquoReview prediction of proteinstructural classesrdquo Critical Reviews in Biochemistry and Molec-ular Biology vol 30 no 4 pp 275ndash349 1995

[109] P C Mahalanobis ldquoOn the generalized distance in statisticsrdquoProceedings of the National Institute of Sciences of India vol 2pp 49ndash55 1936

[110] R M Centor ldquoSignal detectability the use of ROC curves andtheir analysesrdquoMedical Decision Making vol 11 no 2 pp 102ndash106 1991

[111] K-C Chou ldquoUsing subsite coupling to predict signal peptidesrdquoProtein Engineering vol 14 no 2 pp 75ndash79 2001

[112] K C Chou ldquoPrediction of protein signal sequences and theircleavage sitesrdquo Proteins vol 42 pp 136ndash139 2001

[113] K C Chou ldquoPrediction of signal peptides using scaledwindowrdquoPeptides vol 22 no 12 pp 1973ndash1979 2001

[114] K C Chou ldquoSome remarks on predicting multi-label attributesinmolecular biosystemsrdquoMolecular Biosystems vol 9 pp 1092ndash1100 2013

[115] K C Chou and H B Shen ldquoCell-PLoc 2 0 an improvedpackage of web-servers for predicting subcellular localization ofproteins in various organismsrdquoNatural Science vol 2 pp 1090ndash1103 2010

[116] M Gribskov and N L Robinson ldquoUse of receiver operatingcharacteristic (ROC) analysis to evaluate sequence matchingrdquoComputers and Chemistry vol 20 no 1 pp 25ndash33 1996

[117] Y Yamanishi M Kotera M Kanehisa and S Goto ldquoDrug-target interaction prediction from chemical genomic and phar-macological data in an integrated frameworkrdquo Bioinformaticsvol 26 no 12 pp i246ndashi254 2010

[118] Z He J Zhang X Shi et al ldquoPredicting drug-target interactionnetworks based on functional groups and biological featuresrdquoPLoS ONE vol 5 no 3 Article ID e9603 2010

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

BioMed Research International 13

[89] W Liu and K C Chou ldquoPrediction of protein secondarystructure contentrdquo Protein Engineering vol 12 no 12 pp 1041ndash1050 1999

[90] M Bhasin and G P S Raghava ldquoClassification of nuclearreceptors based on amino acid composition and dipeptidecompositionrdquo The Journal of Biological Chemistry vol 279 no22 pp 23262ndash23266 2004

[91] H Lin andH Ding ldquoPredicting ion channels and their types bythe dipeptidemode of pseudo amino acid compositionrdquo Journalof Theoretical Biology vol 269 no 1 pp 64ndash69 2011

[92] H Lin and Q Li ldquoUsing pseudo amino acid composition topredict protein structural class approached by incorporating400 dipeptide componentsrdquo Journal of Computational Chem-istry vol 28 no 9 pp 1463ndash1466 2007

[93] L Nanni and A Lumini ldquoCombing ontologies and dipeptidecomposition for predicting DNA-binding proteinsrdquo AminoAcids vol 34 no 4 pp 635ndash641 2008

[94] K-C Chou ldquoThe convergence-divergence duality in lectindomains of selectin family and its implicationsrdquo FEBS Lettersvol 363 no 1-2 pp 123ndash126 1995

[95] A A Schaffer L Aravind T L Madden et al ldquoImprovingthe accuracy of PSI-BLAST protein database searches withcomposition-based statistics and other refinementsrdquo NucleicAcids Research vol 29 no 14 pp 2994ndash3005 2001

[96] S F Altschul and E V Koonin ldquoIterated profile searches withPSI-BLAST a tool for discovery in protein databasesrdquo Trends inBiochemical Sciences vol 23 no 11 pp 444ndash447 1998

[97] J Deng ldquoGrey entropy and grey target decision makingrdquoJournal of Grey System vol 22 no 1 pp 1ndash24 2010

[98] B M Bolstad R A Irizarry M Astrand and T P Speed ldquoAcomparison of normalizationmethods for high density oligonu-cleotide array data based on variance and biasrdquo Bioinformaticsvol 19 no 2 pp 185ndash193 2003

[99] J-Y Shi S-W Zhang Q Pan Y-M Cheng and J XieldquoPrediction of protein subcellular localization by support vectormachines using multi-scale energy and pseudo amino acidcompositionrdquo Amino Acids vol 33 no 1 pp 69ndash74 2007

[100] T Denoeux ldquoA 120581-nearest neighbor classification rule based onDempster-Shafer theoryrdquo IEEE Transactions on Systems Manand Cybernetics vol 25 no 5 pp 804ndash813 1995

[101] J M Keller M R Gray and J A Givens ldquoA fuzzy k-nearestneighbours algorithmrdquo IEEE Transactions on Systems Man andCybernetics vol 15 no 4 pp 580ndash585 1985

[102] X Xiao P Wang and K Chou ldquoiNR-physchem a sequence-based predictor for identifying nuclear receptors and theirsubfamilies via physical-chemical property matrixrdquo PLoS ONEvol 7 no 2 Article ID e30869 2012

[103] I Roterman L Konieczny W Jurkowski K Prymula and MBanach ldquoTwo-intermediate model to characterize the structureof fast-folding proteinsrdquo Journal of Theoretical Biology vol 283no 1 pp 60ndash70 2011

[104] X Xiao P Wang and K C Chou ldquoGPCR-2L predicting Gprotein-coupled receptors and their types by hybridizing twodifferentmodes of pseudo amino acid compositionsrdquoMolecularBioSystems vol 7 no 3 pp 911ndash919 2011

[105] X Xiao P Wang and K C Chou ldquoQuat-2L a web-server forpredicting protein quaternary structural attributesrdquo MolecularDiversity vol 15 no 1 pp 149ndash155 2011

[106] X Zheng C Li and J Wang ldquoAn information-theoreticapproach to the prediction of protein structural classrdquo Journalof Computational Chemistry vol 31 no 6 pp 1201ndash1206 2010

[107] H Shen J Yang X Liu and K C Chou ldquoUsing supervisedfuzzy clustering to predict protein structural classesrdquo Biochem-ical and Biophysical Research Communications vol 334 no 2pp 577ndash581 2005

[108] K-C Chou and C-T Zhang ldquoReview prediction of proteinstructural classesrdquo Critical Reviews in Biochemistry and Molec-ular Biology vol 30 no 4 pp 275ndash349 1995

[109] P C Mahalanobis ldquoOn the generalized distance in statisticsrdquoProceedings of the National Institute of Sciences of India vol 2pp 49ndash55 1936

[110] R M Centor ldquoSignal detectability the use of ROC curves andtheir analysesrdquoMedical Decision Making vol 11 no 2 pp 102ndash106 1991

[111] K-C Chou ldquoUsing subsite coupling to predict signal peptidesrdquoProtein Engineering vol 14 no 2 pp 75ndash79 2001

[112] K C Chou ldquoPrediction of protein signal sequences and theircleavage sitesrdquo Proteins vol 42 pp 136ndash139 2001

[113] K C Chou ldquoPrediction of signal peptides using scaledwindowrdquoPeptides vol 22 no 12 pp 1973ndash1979 2001

[114] K C Chou ldquoSome remarks on predicting multi-label attributesinmolecular biosystemsrdquoMolecular Biosystems vol 9 pp 1092ndash1100 2013

[115] K C Chou and H B Shen ldquoCell-PLoc 2 0 an improvedpackage of web-servers for predicting subcellular localization ofproteins in various organismsrdquoNatural Science vol 2 pp 1090ndash1103 2010

[116] M Gribskov and N L Robinson ldquoUse of receiver operatingcharacteristic (ROC) analysis to evaluate sequence matchingrdquoComputers and Chemistry vol 20 no 1 pp 25ndash33 1996

[117] Y Yamanishi M Kotera M Kanehisa and S Goto ldquoDrug-target interaction prediction from chemical genomic and phar-macological data in an integrated frameworkrdquo Bioinformaticsvol 26 no 12 pp i246ndashi254 2010

[118] Z He J Zhang X Shi et al ldquoPredicting drug-target interactionnetworks based on functional groups and biological featuresrdquoPLoS ONE vol 5 no 3 Article ID e9603 2010

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology