This article was downloaded by: [University of Chicago Library] On: 10 November 2014, At: 23:06 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK Computer Methods in Biomechanics and Biomedical Engineering Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/gcmb20 PPISearchEngine: gene ontology-based search for protein–protein interactions Byungkyu Park a , Guangyu Cui b , Hyunjin Lee b , De-Shuang Huang c & Kyungsook Han b a Institute for Information and Electronics Research, Inha University, Incheon, 402-751, South Korea b Department of Computer Science and Information Engineering, Inha University, Incheon, 402-751, South Korea c Department of Computer Science and Technology, Tongji University, Shanghai, 201804, China Published online: 09 Feb 2012. To cite this article: Byungkyu Park, Guangyu Cui, Hyunjin Lee, De-Shuang Huang & Kyungsook Han (2013) PPISearchEngine: gene ontology-based search for protein–protein interactions, Computer Methods in Biomechanics and Biomedical Engineering, 16:7, 691-698, DOI: 10.1080/10255842.2011.631528 To link to this article: http://dx.doi.org/10.1080/10255842.2011.631528 PLEASE SCROLL DOWN FOR ARTICLE Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. 
Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content. This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http://www.tandfonline.com/page/terms-and-conditions


PPISearchEngine: gene ontology-based search for protein–protein interactions

Byungkyu Park a, Guangyu Cui b, Hyunjin Lee b, De-Shuang Huang c and Kyungsook Han b,*

a Institute for Information and Electronics Research, Inha University, Incheon 402-751, South Korea; b Department of Computer Science and Information Engineering, Inha University, Incheon 402-751, South Korea; c Department of Computer Science and Technology, Tongji University, Shanghai 201804, China

(Received 30 April 2011; final version received 10 October 2011)

This paper presents a new search engine called PPISearchEngine, which finds protein–protein interactions (PPIs) using the gene ontology (GO) and the biological relations of proteins. For efficient retrieval of PPIs, each GO term is assigned a prime number and the relation between terms is represented by the product of prime numbers. This representation is hidden from users but facilitates the search for the interactions of a query protein by unique prime factorisation of the number that represents the query protein. For a query protein, PPISearchEngine considers not only the GO term associated with the query protein but also the GO terms below that term in the GO hierarchy, and finds all the interactions of the query protein that satisfy the search condition. In contrast, the standard keyword-matching or ID-matching search method cannot find the interactions of a protein unless the interactions involve a protein with explicit annotations. To the best of our knowledge, this search engine is the first method that can process queries like ‘for protein p with GO g1, find p’s interaction partners with GO g2’. PPISearchEngine is freely available to academics at http://search.hpid.org/.

Keywords: protein–protein interaction; search engine; gene ontology

1. Introduction

The explosion of protein–protein interaction (PPI) data (Schueler and Bornberg-Bauer 2010) from proteomics studies has resulted in the development of a large number of databases for efficient storage and retrieval of the data, as well as computational methods for the prediction of PPIs and, consequently, of complex networks of PPIs (Ahmed et al. 2011; Gomez et al. 2011; Gonzalez-Diaz et al. 2008; Guharoy et al. 2011; Hu et al. 2011; Krishnadev and Srinivasan 2011; Park et al. 2009; Procaccini et al. 2011; Rodriguez-Soca et al. 2010). Many databases allow the user to retrieve PPIs using a syntactic search method such as keyword matching or ID matching. However, such syntactic search methods do not consider the biological relation between keywords, and often miss interactions that involve a protein with no explicit annotations. As a result, they retrieve too few or no search results even though many potential matches are present in the database.

As a de facto standard for describing gene products and their characteristics, the gene ontology (GO) provides the largest reliable vocabulary (Barrell et al. 2009). GO is growing fast and had more than 34,000 terms as of 26 April 2011. In an effort to provide a semantic search method based on GO, we recently developed a representation method that facilitates the search for PPIs (Park and Han 2010). It assigns each GO term a prime number and represents the relation among the GO terms by the product of the prime numbers. This representation is completely hidden from users but enables a search engine to find all the relevant interactions of a query protein by unique prime factorisation of the numbers. When searching for PPIs with a GO term, all the GO terms below that term in the GO hierarchy are automatically considered as well. In our previous work (Park and Han 2010), a prototype search system was implemented for human proteins to demonstrate the feasibility of the representation scheme.

There are a few databases that allow the user to retrieve PPIs using GO terms. In BIND (Bader et al. 2003), for example, searching for PPIs with GO terms is possible, but its search is syntactic since it does not consider the biological relation between the GO terms. For example, BIND returns 5100 PPIs for a query of ‘ATP binding’, whereas it returns only 96 interactions for a query of ‘nucleotide binding’. The term ‘nucleotide binding’ is at a higher level than ‘ATP binding’ in the GO hierarchy, yet it returns far fewer search results than ‘ATP binding’. A more detailed comparison of our search method with the ID-matching search method on GO terms is given later in this paper. IntAct (Aranda et al. 2010) can also search PPIs using GO. It supports a query type like ‘for every protein p with GO g, find the interaction partners of p’ but cannot deal with a more complex query such as ‘for protein p with GO g1, find p’s interaction partners with GO g2’.

There are search methods that allow searching with GO terms but are limited to a specific organism. As the Munich Information Center for Protein Sequences (MIPS) protein interaction resource on yeast, MPact (Guldener et al. 2006) allows the user to find the interactions between yeast proteins. The search method of MPact differs from ours in several respects: (1) while our search method is based on GO, MPact is based on FunCat, which is much smaller than GO. The total number of FunCat categories (Ruepp et al. 2004) is only 1362, which is about 4% of the number of GO terms. As of 19 November 2010, GO has 33,028 GO terms (19,897 terms in biological process, 2773 terms in cellular component and 8898 terms in molecular function); (2) MPact is limited to Saccharomyces cerevisiae (S. cerevisiae) proteins only, while our search engine can search PPIs in several organisms; (3) MPact cannot retrieve more than 1500 PPIs, whereas our search engine has no limit on the number of search results. PepCyber (Gong et al. 2008) provides the interactions of human proteins that are mediated by phosphoprotein-binding domains. PepCyber allows the user to restrict the search results to those involved in a specific pathway, which is annotated by GO terms. However, fewer than 20 GO terms are used to classify the pathways, so its search method is not fully GO based. DroID (Murali et al. 2011) supports a GO-based search for PPIs in Drosophila melanogaster (D. melanogaster). In a database called PRIME (http://prime.ontology.ims.u-tokyo.ac.jp/), the user can find proteins with a GO term, but cannot find PPIs with a GO term.

*Corresponding author. Email: [email protected]

© 2013 Taylor & Francis

As an extension of our previous work, we developed a full-scale search engine called PPISearchEngine (http://search.hpid.org/). This search engine improves on the previous prototype system in several ways: (1) it can search PPIs not only in Homo sapiens (H. sapiens) but also in other species such as S. cerevisiae, D. melanogaster, Caenorhabditis elegans (C. elegans) and a few viruses; (2) it supports more diverse query types specified by GO terms; (3) it can handle obsolete or alternative GO terms; (4) for a protein with no known GO annotations, it can still perform a GO-based search for interactions using the sequence data of the protein. The rest of this paper presents PPISearchEngine and a comparative analysis of the search engine with the ID-matching method.

2. Representation method

To represent the relation of GO terms, we modified the original Gödel numbering (Nagel and Newman 2001) as follows. Let $T = \{t_1, t_2, \ldots, t_n\}$ be a set of GO terms. We first assign a prime number $p_i$ to each term $t_i$ in $T$, where $p_i$ is the $i$th prime number. The relation of a term $t_i$ with other terms is represented by the product of the prime numbers corresponding to $t_i$ and its ancestors in the GO hierarchy:

$$G_i = \prod_{k=i}^{\mathrm{ancestor}(i)} p_k. \tag{1}$$
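Equation (1) can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the parent map, the helper names and the chain-shaped toy hierarchy are assumptions (the text only states which terms are ancestors of $t_9$), and primes are generated by simple trial division.

```python
from math import prod


def first_primes(n):
    """Return the first n primes by trial division (adequate for a toy example)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def ancestors(term, parents):
    """Collect every ancestor of `term` by following parent links upwards."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def godel_number(term, parents, prime_of):
    """Equation (1): product of the primes of `term` and of all its ancestors."""
    return prod(prime_of[t] for t in {term} | ancestors(term, parents))


# Hypothetical hierarchy: a chain t1 <- t2 <- t3 <- t5 <- t7 <- t9, plus t4 under t2.
PARENTS = {1: [], 2: [1], 3: [2], 4: [2], 5: [3], 7: [5], 9: [7]}
# p_i is the i-th prime, as in the text.
PRIME_OF = dict(zip(range(1, 10), first_primes(9)))

G9 = godel_number(9, PARENTS, PRIME_OF)
G4 = godel_number(4, PARENTS, PRIME_OF)
```

Because each ancestor is collected once in a set, a term reachable through several paths in the ontology still contributes its prime only once to the product.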

Consider the GO hierarchy in Figure 1. The term $t_9$, which represents transcription factor activity, has five ancestors ($t_7$, $t_5$, $t_3$, $t_2$, $t_1$) in the GO hierarchy. The relation of $t_9$ with other terms is encoded by multiplying the prime number for $t_9$ by the prime numbers for its ancestors in the hierarchy:

$$G_9 = p_9 p_7 p_5 p_3 p_2 p_1. \tag{2}$$

This representation enables us to unambiguously infer the hierarchical relation of any term $t_i$ with its ancestors by the factorisation of $G_i$ into prime numbers. A hypothesis such as ‘$t_9$ is a specialised term of $t_7$’ can be tested by a modulo operation, since the hypothesis is true when $G_9$, the encoding of $t_9$, has $p_7$ as a prime factor:

$$G_9 \equiv 0 \pmod{p_7}. \tag{3}$$
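As a worked instance of Equations (2) and (3), take $p_i$ to be the $i$th prime as defined above, so that $p_1 = 2$, $p_2 = 3$, $p_3 = 5$, $p_4 = 7$, $p_5 = 11$, $p_7 = 17$ and $p_9 = 23$. The term $t_4$ is used here only as an example of a term that is not among the listed ancestors of $t_9$.

```latex
\begin{aligned}
G_9 &= p_9 p_7 p_5 p_3 p_2 p_1 = 23 \cdot 17 \cdot 11 \cdot 5 \cdot 3 \cdot 2 = 129030,\\
G_9 \bmod p_7 &= 129030 \bmod 17 = 0 \qquad (129030 = 17 \cdot 7590),\\
G_9 \bmod p_4 &= 129030 \bmod 7 = 6 \neq 0 \qquad (129030 = 7 \cdot 18432 + 6).
\end{aligned}
```

The zero remainder confirms that $t_7$ is an ancestor of $t_9$, whereas the non-zero remainder shows that $t_4$ is not.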

This representation is hidden from users but enables a search engine to efficiently find PPIs in a biologically meaningful way. The search engine can find all interactions involving the query protein in almost real time, since the interaction partners of the query protein can be found unambiguously by the prime factorisation of the modified Gödel numbers representing the query protein and the search conditions (Park and Han 2010). This makes our method different from standard search methods such as keyword matching or ID matching. Keyword-matching or ID-matching search methods often miss interactions involving a protein that has no explicit annotations matching the search condition, but our method also retrieves such interactions if they satisfy the search condition with a more specific term in the ontology.
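The lookup of more specific terms can be sketched as a linear scan with one modulo operation per stored number. The four-term hierarchy and the function names below are hypothetical, chosen only to illustrate the test of Equation (3); how the search engine actually stores and scans its numbers is not described at this level of detail.

```python
def is_specialisation_of(g_child, p_ancestor):
    """Equation (3): the ancestor's prime divides the descendant's encoding."""
    return g_child % p_ancestor == 0


def specialised_terms(term, prime_of, godel):
    """Scan all encodings and keep the terms that lie below `term`."""
    p = prime_of[term]
    return sorted(t for t, g in godel.items() if t != term and g % p == 0)


# Hypothetical hierarchy: chain t1 <- t2 <- t3, with a sibling t4 under t2.
# p_i is the i-th prime; each encoding is the product over the term and its ancestors.
PRIME_OF = {1: 2, 2: 3, 3: 5, 4: 7}
GODEL = {1: 2, 2: 2 * 3, 3: 2 * 3 * 5, 4: 2 * 3 * 7}

below_t2 = specialised_terms(2, PRIME_OF, GODEL)
```

A query on `t2` therefore picks up `t3` and `t4` without either term being named in the query, which is the behaviour that distinguishes this search from plain ID matching.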

3. Implementation

We developed a GO-based search engine called PPISearchEngine, which uses the representation to handle several types of queries. PPISearchEngine was implemented in the C# programming language, and the Java BigInteger class was used to store the prime numbers and to perform multiplication and modulo operations on them.
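A rough size estimate shows why an arbitrary-precision integer type is needed. The sketch below uses Python, whose built-in `int` is arbitrary precision, rather than the language of the implementation, and it multiplies the smallest primes, which is a best-case assumption: with more than 34,000 GO terms, most terms are assigned far larger primes, so real products grow faster.

```python
from math import prod


def first_primes(n):
    """Return the first n primes by trial division (adequate for a toy example)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def bits_needed(primes):
    """Number of bits required to store the product of the given primes."""
    return prod(primes).bit_length()


# Compare the product of the 15 and of the 16 smallest primes with a 64-bit word.
fits_in_64_bits = bits_needed(first_primes(15)) <= 64
overflows_64_bits = bits_needed(first_primes(16)) > 64
```

Even in this best case, a product of 16 primes no longer fits in a 64-bit machine word, so fixed-width integers cannot represent the encodings in general.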

Currently, PPISearchEngine can find protein interactions in several organisms, which include H. sapiens, S. cerevisiae, D. melanogaster and C. elegans (Chen et al. 2006), or interactions of H. sapiens proteins with virus proteins. Among various virus species, the current version of the GO-based search engine can handle the human immunodeficiency virus 1 (HIV-1) (Fu et al. 2009) and hepatitis C virus (HCV) (de Chassey et al. 2008). Table 1 shows the total number of PPIs that can be retrieved by the current version of PPISearchEngine.

When the user does not provide a specific GO term in the query, the GO-based search is still possible for human proteins. For a query sequence, we first run BLAST (Altschul et al. 1997) against the proteins with GO annotation to find those that have the same sequence as the query protein. After finding all the possible GO terms associated with the proteins that match the query protein, we let the user select one of the GO terms. Finding a GO term from a sequence is quite useful when the user does not know a GO term in advance or wants to try searching with different GO terms. This functionality is currently provided for human proteins only, but will be extended to other organisms in the near future.
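The step from sequence hits to candidate GO terms might look like the following sketch. It does not run BLAST; it assumes the hits have already been parsed into records, and it interprets ‘the same sequence’ as 100% identity over the full query length. The `Hit` record, the function name and the protein identifiers are all hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One parsed sequence-search hit (hypothetical, simplified record)."""
    subject_id: str
    percent_identity: float
    alignment_length: int


def candidate_go_terms(query_length, hits, go_annotations):
    """Collect GO terms of subjects that match the whole query at 100% identity."""
    terms = set()
    for hit in hits:
        identical = (
            hit.percent_identity == 100.0
            and hit.alignment_length == query_length
        )
        if identical:
            terms.update(go_annotations.get(hit.subject_id, ()))
    # Sorted so that the list offered to the user has a stable order.
    return sorted(terms)


# Hypothetical hits and annotations for a query of length 120.
HITS = [Hit("Q1", 100.0, 120), Hit("Q2", 98.5, 120), Hit("Q3", 100.0, 80)]
ANNOTATIONS = {
    "Q1": ["GO:0005515", "GO:0003677"],
    "Q2": ["GO:0003700"],
    "Q3": ["GO:0005488"],
}

choices = candidate_go_terms(120, HITS, ANNOTATIONS)
```

Only `Q1` passes both conditions here, so the user would be offered its two GO terms to choose from.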

As GO is updated on a daily basis, PPISearchEngine also updates its representation of GO terms. After downloading the ontology file in the OBO format, it extracts key attributes of GO terms, such as the ‘alt_id’ and ‘is_obsolete’ attributes, and updates the ‘alt_id’ and ‘is_obsolete’ tables for the relevant GO terms and their relations. When the user enters an alternative term in the query, it searches PPIs with the representative term instead of the alternative term. For example, when the user enters GO:0019952 or GO:0050876 in the query, a search is performed using their representative term ‘reproduction’ (GO:0000003). This is similar to how alternative GO terms are handled by AmiGO, the official tool of the GO database for browsing and searching the database; when an alternative GO term is entered, AmiGO uses the representative term of the alternative term. The current version of PPISearchEngine also allows the user to search using an obsolete GO term, but displays the search result with a warning message.
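The handling of alternative and obsolete terms can be illustrated with a small OBO reader. This is a simplified sketch that reads only the `id`, `name`, `alt_id` and `is_obsolete` tags of `[Term]` stanzas; the function names are assumptions, and the obsolete entry `GO:9999999` is a made-up identifier used purely for illustration.

```python
SAMPLE_OBO = """\
[Term]
id: GO:0000003
name: reproduction
alt_id: GO:0019952
alt_id: GO:0050876

[Term]
id: GO:9999999
name: hypothetical obsolete term
is_obsolete: true
"""


def parse_obo_terms(text):
    """Parse [Term] stanzas into a dict: id -> name, alt_ids and obsolete flag."""
    terms = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are kept.
            current = None
            if line == "[Term]":
                current = {"name": None, "alt_ids": [], "obsolete": False}
            continue
        if current is None or ": " not in line:
            continue
        tag, value = line.split(": ", 1)
        if tag == "id":
            terms[value] = current
        elif tag == "name":
            current["name"] = value
        elif tag == "alt_id":
            current["alt_ids"].append(value)
        elif tag == "is_obsolete":
            current["obsolete"] = value == "true"
    return terms


def build_alt_table(terms):
    """Map every alternative ID to its representative (primary) ID."""
    return {alt: tid for tid, t in terms.items() for alt in t["alt_ids"]}


def resolve(go_id, terms, alt_table):
    """Return the ID to search with, plus a warning if that term is obsolete."""
    primary = alt_table.get(go_id, go_id)
    term = terms.get(primary)
    warning = "obsolete term" if term and term["obsolete"] else None
    return primary, warning


TERMS = parse_obo_terms(SAMPLE_OBO)
ALT_TABLE = build_alt_table(TERMS)
```

With these tables, `resolve("GO:0019952", TERMS, ALT_TABLE)` yields the representative term GO:0000003 with no warning, while the obsolete entry is returned unchanged together with a warning.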

4. User interface

PPISearchEngine supports both ‘Simple search by GO term name’ and ‘Advanced search by GO term ID’ (e.g. see Figure 2). There are three types of queries in the ‘Advanced search by GO term ID’.

Figure 1. Example of our representation of GO terms. Each GO term $t_i$ is assigned a unique prime number $p_i$. The relation of $t_i$ with other GO terms is encoded by a modified Gödel number $G_i = \prod_{k=i}^{\mathrm{ancestor}(i)} p_k$, which is the product of the prime numbers corresponding to $t_i$ and its ancestors in the GO hierarchy.

Table 1. The number of PPIs that can be retrieved by the GO-based search engine.

| | H. sapiens | S. cerevisiae | D. melanogaster | C. elegans |
|---|---|---|---|---|
| H. sapiens | 38,756 | | | |
| HIV-1 | 2058 | | | |
| HCV | 377 | | | |
| S. cerevisiae | | 10,881 | | |
| D. melanogaster | | | 28,408 | |
| C. elegans | | | | 4699 |


(1) For every protein annotated with a GO term t, find its interaction partners.

(2) For every protein annotated with a GO term t1, find its interaction partners with a GO term t2.

(3) For every protein annotated with two GO terms t1 and t2 that are connected by Boolean operators AND, OR and NOT, find its interaction partners.

Figure 2. The user interface for the retrieval of PPIs by either ‘Simple search by GO term name’ or ‘Advanced search by GO term ID’. Due to the autocomplete functionality, a partial GO term entered in the ‘Simple search by GO term name’ is expanded into one or more complete GO terms. When the user does not know a GO term but has only the sequence data, the search engine finds all possible GO terms associated with the protein.

For the first query type, the search engine performs a modulo operation on every $G_i$ by the prime number p corresponding to the term t, and extracts all specialised terms of t in the GO hierarchy (i.e. terms at a lower level than t). Once it has extracted all relevant GO terms, it retrieves all interactions involving a protein annotated with at least one of the GO terms. For the second query type, the search engine performs a modulo operation on every $G_i$ by the prime numbers p1 and p2 to extract the specialised terms of t1 and t2. It then retrieves all interactions between proteins annotated with t1 or one of t1’s specialised terms and proteins annotated with t2 or one of t2’s specialised terms. For the third query type, the search engine extracts the specialised terms of t1 and t2 in the same way as in the second query type. It then finds all interactions involving a protein annotated with the specialised terms of t1 and t2 that are connected by a Boolean operator such as AND, OR or NOT.
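The three query types described above can be sketched on toy data as follows. The hierarchy, the annotations and the function names are hypothetical, and the reading of ‘t1 NOT t2’ as ‘annotated with t1 but not with t2’ is an assumption about the operator's semantics.

```python
# Hypothetical toy hierarchy: 'protein binding' and 'DNA binding' lie below 'binding'.
PRIME_OF = {"binding": 2, "protein binding": 3, "DNA binding": 5, "kinase activity": 7}
GODEL = {"binding": 2, "protein binding": 2 * 3, "DNA binding": 2 * 5, "kinase activity": 7}
ANNOTATIONS = {
    "A": {"protein binding"},
    "B": {"DNA binding"},
    "C": {"kinase activity", "protein binding"},
    "D": {"kinase activity"},
}
INTERACTIONS = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "D")]


def proteins_for(term):
    """Proteins annotated with `term` or with any of its specialised terms."""
    terms = {t for t, g in GODEL.items() if g % PRIME_OF[term] == 0}
    return {p for p, annotated in ANNOTATIONS.items() if annotated & terms}


def involving(proteins):
    """Interactions in which at least one partner is in `proteins`."""
    return [pair for pair in INTERACTIONS if proteins & set(pair)]


def query1(t):
    """Type 1: interaction partners of every protein annotated with t."""
    return involving(proteins_for(t))


def query2(t1, t2):
    """Type 2: interactions between a t1 protein and a t2 protein."""
    first, second = proteins_for(t1), proteins_for(t2)
    return [
        (x, y) for x, y in INTERACTIONS
        if (x in first and y in second) or (y in first and x in second)
    ]


def query3(t1, operator, t2):
    """Type 3: combine the two protein sets with AND, OR or NOT, then search."""
    first, second = proteins_for(t1), proteins_for(t2)
    combined = {
        "AND": first & second,
        "OR": first | second,
        "NOT": first - second,  # assumed meaning: t1 but not t2
    }[operator]
    return involving(combined)
```

For example, `query2("protein binding", "DNA binding")` keeps only the pairs in which one partner falls under the first term and the other under the second, regardless of the order in which the pair is stored.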

Figure 3 shows an example of the search results for interactions that involve a protein associated with ‘carbohydrate binding’ (GO:0030246). For the GO term GO:0030246, the search engine displays a GO tree, which shows the relevant GO terms and the number of proteins associated with the terms. The PPIs found by the search engine are visualised as a network when the user clicks the ‘Interaction Network’ button, and can be saved either in the PSI-MI format (Hermjakob et al. 2004) or in the PSI-MI format with XML style sheets.

5. Comparison of two search methods

For comparative purposes, we tested both PPISearchEngine and the ID-matching search method on the interaction data of H. sapiens proteins of the Human Protein Reference Database (HPRD) (Prasad et al. 2009). Table 2 shows the number of PPIs found by the two methods. In Figure 1, the GO term ‘molecular_function’ (GO:0003674) is the root node of the GO hierarchy for molecular function, and all other terms are specialised terms of ‘molecular_function’. A total of 472 GO terms are used for annotating H. sapiens proteins in HPRD release 8, but there is no H. sapiens protein with the explicit annotation ‘molecular function’ or ‘protein complex scaffold’. PPISearchEngine first inferred 753 GO terms (see Supplementary Table 1) from the 472 GO terms and found 37,028 interactions for a query of ‘molecular function’ and 4556 interactions for a query of ‘protein complex scaffold’. In contrast, the ID-matching method found no protein interactions for the GO terms ‘molecular function’ and ‘protein complex scaffold’ (Table 2).

Figure 3. Example of simple search by GO term name. Step 1: select an organism. Step 2: enter a GO term name or ID. Step 3: choose a GO term to use for the search when multiple GO terms match the partial GO term entered by the user. Step 4: search results are displayed in a GO tree, which shows the GO terms of the proteins found by the search engine and, in brackets next to each GO term, the number of proteins associated with it. When the user clicks a GO term in the tree, the proteins with the GO term annotation are listed with their interaction information. Clicking the ‘Interaction Network’ button runs WebInterViewer to visualise a PPI network of the proteins.

With a query of ‘binding’ (GO:0005488), PPISearchEngine found 18,432 interactions, but the ID-matching search method retrieved only 203 interactions. With a query of ‘protein binding’ (GO:0005515), which is a specialised term of ‘binding’ (GO:0005488) (see Figure 1), PPISearchEngine found 9259 interactions, but the ID-matching search method found only 1588 interactions. The term ‘binding’ is at a higher level than ‘protein binding’ in the GO hierarchy, yet the ID-matching search method returns far fewer search results for ‘binding’ than for ‘protein binding’. These search anomalies occur because the ID-matching search method performs a purely syntactic search and does not consider the relation of GO terms at all. In contrast, PPISearchEngine finds interactions not only by the GO term specified in the query but also by all specialised terms of that term.
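The anomaly can be reproduced on a toy data set. In the sketch below, the three-term hierarchy, the protein annotations and the interactions are all hypothetical; only the contrast between exact ID matching and the modulo-based expansion follows the description above.

```python
# Hypothetical chain: molecular_function <- binding <- protein binding.
PRIME_OF = {"molecular_function": 2, "binding": 3, "protein binding": 5}
GODEL = {"molecular_function": 2, "binding": 2 * 3, "protein binding": 2 * 3 * 5}

# No protein is annotated with the root term, and only one with 'binding' itself.
ANNOTATION = {"P1": "protein binding", "P2": "protein binding", "P3": "binding"}
INTERACTIONS = [("P1", "P2"), ("P1", "P3"), ("P2", "P2")]


def id_matching(term):
    """Syntactic search: only proteins annotated with exactly `term` count."""
    hits = {p for p, t in ANNOTATION.items() if t == term}
    return [pair for pair in INTERACTIONS if hits & set(pair)]


def go_based(term):
    """Hierarchy-aware search: `term` is expanded by the modulo test first."""
    terms = {t for t, g in GODEL.items() if g % PRIME_OF[term] == 0}
    hits = {p for p, t in ANNOTATION.items() if t in terms}
    return [pair for pair in INTERACTIONS if hits & set(pair)]


# Number of interactions found per query term by (ID matching, GO-based search).
counts = {t: (len(id_matching(t)), len(go_based(t))) for t in PRIME_OF}
```

ID matching finds nothing for the root term and fewer interactions for ‘binding’ than for the more specific ‘protein binding’, whereas the GO-based counts never decrease when moving to a more general term.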

Table 2. The number of PPIs found in H. sapiens by two methods.

| GO | ID-matching search | GO-based search |
|---|---|---|
| Molecular function (GO:0003674) | 0 | 37,028 |
| Binding (GO:0005488) | 203 | 18,432 |
| Transcription regulator activity (GO:0030528) | 5621 | 9581 |
| Protein binding (GO:0005515) | 1588 | 9259 |
| Nucleic acid binding (GO:0003676) | 10 | 8891 |
| Protein complex scaffold (GO:0032947) | 0 | 4556 |
| DNA binding (GO:0003677) | 1955 | 7077 |
| Receptor signalling complex scaffold activity (GO:0030159) | 4556 | 4556 |
| Transcription factor activity (GO:0003700) | 5307 | 5308 |

Table 3. The number of PPIs found in S. cerevisiae by two methods.

| GO | ID-matching search | GO-based search |
|---|---|---|
| Biological process | | |
| Biological process (GO:0008150) | 0 | 9889 |
| Metabolic process (GO:0008152) | 10 | 6783 |
| Primary metabolic process (GO:0044238) | 0 | 6092 |
| Macromolecule metabolic process (GO:0043170) | 0 | 5483 |
| Protein metabolic process (GO:0019538) | 4 | 2068 |
| Molecular function | | |
| Nucleic acid binding (GO:0003676) | 11 | 1660 |
| DNA binding (GO:0003677) | 223 | 565 |
| Transcription factor activity (GO:0003700) | 98 | 98 |
| Cellular component | | |
| Intracellular (GO:0005622) | 0 | 9407 |
| Cytoplasm (GO:0005737) | 2747 | 5923 |
| Cytosol (GO:0005829) | 221 | 254 |

Table 4. Response time for finding PPIs by the GO-based search engine using the queries of Tables 2 and 3. The GO-based search engine runs on an Intel Core 2 Duo E6400 processor with 4 GB RAM.

| GO term | #PPIs in H. sapiens | Time (s) |
|---|---|---|
| Molecular function (GO:0003674) | 37,028 | 30.06 |
| Binding (GO:0005488) | 18,432 | 41.25 |
| Transcription regulator activity (GO:0030528) | 9581 | 19.53 |
| Protein binding (GO:0005515) | 9259 | 21.08 |
| Nucleic acid binding (GO:0003676) | 8891 | 20.21 |
| Protein complex scaffold (GO:0032947) | 4556 | 11.06 |
| DNA binding (GO:0003677) | 7077 | 15.58 |
| Receptor signalling complex scaffold activity (GO:0030159) | 4556 | 11.02 |
| Transcription factor activity (GO:0003700) | 5308 | 12.61 |

| GO term | #PPIs in S. cerevisiae | Time (s) |
|---|---|---|
| Biological process (GO:0008150) | 9889 | 5.14 |
| Metabolic process (GO:0008152) | 6783 | 4.10 |
| Primary metabolic process (GO:0044238) | 6092 | 3.92 |
| Macromolecule metabolic process (GO:0043170) | 5483 | 3.73 |
| Protein metabolic process (GO:0019538) | 2068 | 1.58 |
| Nucleic acid binding (GO:0003676) | 1660 | 1.24 |
| DNA binding (GO:0003677) | 565 | 0.76 |
| Transcription factor activity (GO:0003700) | 98 | 0.49 |
| Intracellular (GO:0005622) | 9407 | 5.02 |
| Cytoplasm (GO:0005737) | 5923 | 3.66 |
| Cytosol (GO:0005829) | 254 | 0.61 |


Table 3 shows another comparison of the two methods on S. cerevisiae proteins. The GO term ‘biological process’ (GO:0008150) is the root node of the GO hierarchy for the biological process of S. cerevisiae proteins. With a query of GO:0008150, PPISearchEngine found 9889 interactions between S. cerevisiae proteins, but the ID-matching search retrieved no interactions. With a query of GO:0008152 for metabolic process, which is a descendant node of GO:0008150 in the GO hierarchy, PPISearchEngine found 6783 interactions, but the ID-matching search found only 10 interactions. The ID-matching search returned more search results with a more specific term than with a less specific term. It found no interactions with a query of ‘primary metabolic process’ (GO:0044238) or ‘macromolecule metabolic process’ (GO:0043170), but found four interactions with a query of ‘protein metabolic process’ (GO:0019538), which is at a lower level than GO:0044238 or GO:0043170.

Table 4 shows the response time of PPISearchEngine for the queries of Tables 2 and 3. Although a large number of PPIs were found for each query, the longest response time was 41.25 s.

6. Conclusions

This paper presented a GO-based search engine for PPIs using a new representation. Given a query protein with optional search conditions expressed in one or more GO terms, the search engine finds all H. sapiens, S. cerevisiae, D. melanogaster, C. elegans, HIV-1 and HCV proteins associated with the GO terms and the interactions of the proteins. The search engine provides autocomplete functionality for GO terms, so a partial term entered by the user is expanded into one or more complete GO terms that are consistent with the partial term. The search engine can handle obsolete or alternative GO terms, and it can search PPIs using a protein sequence.

So far there have been no databases of PPIs that can process queries like ‘For protein p with GO g1, find p’s interaction partners with GO g2’. To the best of our knowledge, this is the first search engine that can deal with such queries. The current version of the search engine covers the proteins of a few species, but will be extended to more species in the near future. Detailed instructions for using the search engine are available at http://search.hpid.org/.

Acknowledgements

This work was supported by the Basic Science Research Program (2011-0003766) and, in part, by the Key Research Institute Program (2011-0018394) through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (MEST).

References

Ahmed KS, Saloma NH, Kadah YM. 2011. Improving the prediction of yeast protein function using weighted protein-protein interactions. Theor Biol Med Model. 8:11–17.

Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389–3402.

Aranda B, Achuthan P, Alam-Faruque Y, Armean I, Bridge A, Derow C, Feuermann M, Ghanbarian AT, Kerrien S, Khadake J, et al. 2010. The IntAct molecular interaction database in 2010. Nucleic Acids Res. 38:D525–D531.

Bader GD, Betel D, Hogue CWV. 2003. BIND: the Biomolecular interaction network database. Nucleic Acids Res. 31:248–250.

Barrell D, Dimmer E, Huntley RP, Binns D, O’Donovan C, Apweiler R. 2009. The GOA database in 2009 – an integrated gene ontology annotation resource. Nucleic Acids Res. 37:D396–D403.

Chen J, Hsu W, Lee ML, Ng SK. 2006. Increasing confidence of protein interactomes using network topological metrics. Bioinformatics. 22:1998–2004.

de Chassey B, Navratil V, Tafforeau L, Hiet MS, Aublin-Gex A, Agaugue S, Meiffren G, Pradezynski F, Faria BF, Chantier T, et al. 2008. Hepatitis C virus infection protein network. Mol Syst Biol. 4:230, doi: 10.1038/msb.2008.66.

Fu W, Sanders-Beer BE, Katz KS, Maglott DR, Pruitt KD, Ptak RG. 2009. Human immunodeficiency virus type 1, human protein interaction database at NCBI. Nucleic Acids Res. 37:D417–D422.

Gomez A, Cedano J, Amela I, Planas A, Pinol J, Querol E. 2011. Gene ontology function prediction in mollicutes using protein-protein association networks. BMC Sys Biol. 5:49, doi: 10.1186/1752-0509-5-49.

Gong WM, Zhou DH, Ren YL, Wang YJ, Zuo ZX, Shen YP, Xiao FF, Zhu Q, Hong AL, Zhou X, et al. 2008. PepCyber:P~PEP: a database of human protein-protein interactions mediated by phosphoprotein-binding domains. Nucleic Acids Res. 36:D679–D683.

Gonzalez-Diaz H, Gonzalez-Diaz Y, Santana L, Ubeira FM, Uriarte E. 2008. Proteomics, networks and connectivity indices. Proteomics. 8(4):750–778.

Guharoy M, Pal A, Dasgupta M, Chakrabarti P. 2011. PRICE (PRotein Interface Conservation and Energetics): a server for the analysis of protein-protein interfaces. J Struct Funct Genomics. 12(1):33–41.

Guldener U, Munsterkotter M, Oesterheld M, Pagel P, Ruepp A, Mewes HW, Stumpflen V. 2006. MPact: the MIPS protein interaction resource on yeast. Nucleic Acids Res. 34:D436–D441.

Hermjakob H, Montecchi-Palazzi L, Bader G, Wojcik R, Salwinski L, Ceol A, Moore S, Orchard S, Sarkans U, von Mering C, et al. 2004. The HUPO PSI’s molecular interaction format – a community standard for the representation of protein interaction data. Nat Biotechnol. 22:177–183.

Hu L, Huang T, Liu XJ, Cai YD. 2011. Predicting protein phenotypes based on protein-protein interaction network. PLoS One. 6(3):e17668.

Krishnadev O, Srinivasan N. 2011. Prediction of protein-protein interactions between human host and a pathogen and its application to three pathogenic bacteria. Int J Biol Macromol. 48(4):613–619.

Murali T, Pacifico S, Yu JK, Guest S, Roberts GG, Finley RL. 2011. DroID 2011: a comprehensive, integrated resource for protein, transcription factor, RNA and gene interactions for Drosophila. Nucleic Acids Res. 39:D736–D743.

Nagel E, Newman JR. 2001. In: Hofstadter DR, editor. Gödel’s Proof. New York: New York University Press.

Park B, Han K. 2010. An ontology-based search engine for protein-protein interactions. BMC Bioinform. 11(Suppl. 1):S23, doi: 10.1186/1471-2105-11-S1-S23.

Park D, Singh R, Baym M, Liao CS, Berger B. 2011. IsoBase: a database of functionally related proteins across PPI networks. Nucleic Acids Res. 39:D295–D300.

Park SJ, Choi JS, Kim BC, Jho SW, Ryu JW, Park D, Lee KA, Bhak J, Il Kim S. 2009. PutidaNET: interactome database service and network analysis of Pseudomonas putida KT2440. BMC Genomics. 10(Suppl. 3):S18, doi: 10.1186/1471-2164-10-S3-S18.

Prasad TSK, Goel R, Kandasamy K, Keerthikumar S, Kumar S, Mathivanan S, Telikicherla D, Raju R, Shafreen B, Venugopal A, et al. 2009. Human protein reference database-2009 update. Nucleic Acids Res. 37:D767–D772.

Procaccini A, Lunt B, Szurmant H, Hwa T, Weigt M. 2011. Dissecting the specificity of protein-protein interaction in bacterial two-component signaling: orphans and crosstalks. PLoS One. 6(5):e19729.

Rodriguez-Soca Y, Munteanu CR, Dorado J, Pazos A, Prado-Prado FJ, Gonzalez-Diaz H. 2010. Trypano-PPI: a web server for prediction of unique targets in trypanosome proteome by using electrostatic parameters of protein-protein interactions. J Proteome Res. 5:1182–1190.

Ruepp A, Zollner A, Maier D, Albermann K, Hani J, Mokrejs M, Tetko I, Guldener U, Mannhaupt G, Munsterkotter M, et al. 2004. The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes. Nucleic Acids Res. 32:5539–5545.

Schueler A, Bornberg-Bauer E. 2010. The evolution of protein interaction networks. Method Mol Biol. 696:273–289.
