[contributions to nephrology] proteomics in nephrology volume 141 || practical bioinformatics for...

Downloaded by: Univ. of Michigan, Taubman Med.Lib. 141.213.236.110 - 9/17/2013 8:10:13 PM


Page 1: [Contributions to Nephrology] Proteomics in Nephrology Volume 141 || Practical Bioinformatics for Proteomics

Thongboonkerd V, Klein JB (eds): Proteomics in Nephrology.

Contrib Nephrol. Basel, Karger, 2004, vol 141, pp 79–92

Practical Bioinformatics for Proteomics

Visith Thongboonkerd, Jon B. Klein

Core Proteomics Laboratory, Kidney Disease Program,

Department of Medicine, University of Louisville,

Louisville, Ky., USA

In current proteomic analysis, matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS) is commonly used for protein identification. MS-based protein identification relies on matching the observed molecular masses of peptides against a theoretical set of masses generated from a database. On many occasions the identified proteins are designated as ‘unknown protein’, ‘unnamed protein’, ‘putative protein’, ‘hypothetical protein’ or ‘unnamed gene product’. All of these terms refer to proteins predicted from DNA sequences, or to proteins of unknown function that have been submitted to the database but for which other information is limited or unavailable. When this occurs, investigators frequently ignore the proteins, and the data are of limited usefulness.

Bioinformatic analyses are of substantial assistance in characterizing hypothetical proteins identified by peptide mass fingerprinting using MALDI-MS data. The data obtained from bioinformatic analyses may make a further study more focused. Additionally, bioinformatic techniques ‘unmask’ those unknown proteins, which may turn out to be common or well-known proteins.

In this chapter, we demonstrate a practical bioinformatic approach that we have derived to perform ‘data mining’ of unknown or hypothetical proteins. We present this approach not from the viewpoint of experts in bioinformatics (which we are assuredly not), but from our extensive experience as users of bioinformatics to analyze data obtained from MALDI-MS. As an example, we use an unnamed protein that was identified in the mouse kidney proteome and was up-regulated in diabetic kidneys.


Protein Identification

A common approach in proteomic analysis is to separate proteins by two-dimensional gel electrophoresis (2-DE) and to identify them by MALDI-MS. The data obtained by MS analysis are reported as mass spectra in mass-to-charge (m/z) units. Peptide mass fingerprinting is then used to match sample masses to theoretical masses in the protein databases. In any analysis of MALDI-produced mass spectra, the first and most important issue is to determine the validity of any match to the queried database. A number of analytical tools for peptide mass fingerprinting have been produced and are available either through the Internet or as proprietary software. This chapter focuses on publicly accessible peptide mass fingerprint tools. Each search tool uses a different algorithm and has its own criteria for determining significant matches between the experimental and database peptide masses, but some search engines do not directly state the criteria used to determine significance. Most search tools allow molecular size (Mr) and isoelectric point (pI) data from 2-DE to be added to the search parameters to guide the search [1]. However, a narrow-range or restricted search may not allow the identification of protein fragments and protein multimers. Additionally, post-translationally modified proteins, especially those in body fluids, which commonly appear on 2-D gels as horizontal series of spots of the same protein with shifted pI and Mr [2], may not be apparent when the search is constrained. A post-translational modification (PTM), for example multiple phosphorylations, can shift a phosphoprotein spot from its initial (non-phosphorylated) position by up to 2 pI units [unpubl. data], making the predicted pI range incorrect. Using the pI range of the phosphorylated position may mislead the fingerprint search, which may fail to match this protein if phosphorylation is not included in the search parameters. Therefore, restricted searches may miss some proteins with PTMs.

We have generated an algorithm to guide peptide mass fingerprinting that is based on our work identifying approximately 2,000 proteins from 4,000 excised protein spots during the past few years. Our criteria (fig. 1) rely on an integrative approach that combines non-restricted search (i.e. using the Mascot[i] search engine) and restricted search (i.e. using the ProFound[ii] search engine). Based on our mass spectrometers’ capabilities and our sample processing, the search parameter assumptions we employ are that peptides are monoisotopic, oxidized at methionine residues, and carbamidomethylated at cysteine residues (as cysteine residues are reduced and alkylated in our preparations). A maximum of 1 missed cleavage and a mass tolerance of 150 ppm are allowed. Using these search parameters and criteria, sensitivity and specificity are adequate, and matching results are reproducible and consistent with other confirmatory techniques such as immunoblot analyses [3–5].
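The search parameters above can be illustrated with a minimal Python sketch of the mass-matching step that underlies peptide mass fingerprinting. It is not the Mascot or ProFound algorithm and does no scoring. It assumes a tryptic digest (cleavage after K or R, but not before P), which the chapter does not state explicitly, computes monoisotopic [M+H]+ masses with carbamidomethylated cysteine and optional methionine oxidation, and keeps observed masses that fall within 150 ppm. All function names and the toy sequence are hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728
CARBAMIDOMETHYL = 57.02146  # fixed modification on every cysteine
OXIDATION = 15.99491        # optional modification on methionine


def tryptic_peptides(seq, max_missed=1):
    """Cut after K/R unless followed by P (assumed rule); allow missed cleavages."""
    cuts = [i + 1 for i in range(len(seq) - 1)
            if seq[i] in "KR" and seq[i + 1] != "P"]
    bounds = [0] + cuts + [len(seq)]
    peptides = []
    for i in range(len(bounds) - 1):
        last = min(i + 1 + max_missed, len(bounds) - 1)
        for j in range(i + 1, last + 1):
            peptides.append(seq[bounds[i]:bounds[j]])
    return peptides


def mh_mass(peptide, n_oxidized_met=0):
    """Monoisotopic [M+H]+ mass with carbamidomethylated cysteines."""
    mass = sum(MONO[aa] for aa in peptide) + WATER + PROTON
    mass += peptide.count("C") * CARBAMIDOMETHYL
    return mass + n_oxidized_met * OXIDATION


def match_masses(observed, seq, tol_ppm=150.0, max_missed=1):
    """Return (observed mass, peptide, oxidized Met count) for masses within tolerance."""
    hits = []
    for pep in tryptic_peptides(seq, max_missed):
        for n_ox in range(pep.count("M") + 1):
            theo = mh_mass(pep, n_ox)
            for obs in observed:
                if abs(obs - theo) / theo * 1e6 <= tol_ppm:
                    hits.append((obs, pep, n_ox))
    return hits


if __name__ == "__main__":
    toy = "AAKCCRPMK"  # toy sequence, not from the chapter
    print(tryptic_peptides(toy))
    print(match_masses([289.187], toy))
```

A real search engine additionally scores how unlikely the set of matches is by chance, which is what the significance criteria discussed in this section address.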


In our approach, peptide masses are first subjected to non-restricted search using the Mascot[i] tool, in which probability-based MOWSE (MOlecular Weight SEarch) scores are reported as −10*log10(P), where P is the absolute probability, and scores above 74 are considered significant [6]. After completion of the Mascot[i] search, the Mr and pI are used to classify all the initial significant hits into definite hits, negative hits (false positives), multimers, fragments, or proteins that are likely to be modified; for the last group, the FindMod[iii] tool can be used to predict potential PTMs by mass shift [7]. Samples with non-significant scores are subjected to restricted search using predicted Mr and pI ranges based on the spot positions on the gels. Several search tools are available for restricted search, for example the ProFound[ii] search engine. The ProFound search algorithm uses Bayesian theory to rank the protein sequences in the database by their probability of occurrence, and Z scores are estimated as the distance to the population mean in units of standard deviation [8]. Z scores above 1.65 (confidence interval above 95%) are considered significant. Samples that have not been definitely identified by non-restricted search can frequently be matched by a restricted search.

Fig. 1. Schematic approach and optimized criteria for peptide mass fingerprinting. The flowchart runs from MALDI-TOF MS through a check on the number of masses (cutoff 4), non-restricted search, and restricted search (pI and Mr ranges) to the outcomes Definite, Probable, Negative, PTMs search, and Multimers/fragments, using score significance, sequence coverage (cutoff 10%), agreement of Mr and pI with the spot position, Delta pI (cutoff 2), and identity with adjacent spots (case of PTMs). Abbreviations: Y, the Mr or pI of the sample protein spot is in the expected position on the gel; N, the Mr or pI of the sample protein spot is not in the expected position on the gel.

Because of recent advances in high-resolution 2-DE, up to 10,000 protein spots can be separated in a 2-D gel. As a result, a number of low-abundance proteins can be visualized, but cannot be identified with certainty by MALDI-MS. Therefore, in our analysis we initially classify a protein as a ‘possible’ identification when the Z score or probability-based MOWSE score is close to, but does not reach, the significant level (Z scores between 1.00 and 1.65, or probability-based MOWSE scores between 50 and 74) and sequence coverage is more than 10%. In these instances, the protein identification is repeated and the possible hit may be definitively identified. We find that this approach, while labor-intensive, improves the sensitivity and specificity of peptide mass fingerprinting.
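The score thresholds and the Mr/pI triage described in this section can be written down as a small Python sketch. It is one reading of the criteria and of fig. 1, not the authors' software. Whether a value lying exactly on a cutoff counts as passing is an assumption, as is the handling of the case in which neither Mr nor pI fits. The workflow is also simplified: both scores are passed to one function, whereas in the text ProFound is only run when the Mascot score is not significant. All function names are hypothetical.

```python
import math

MOWSE_SIGNIFICANT = 74.0
MOWSE_NEAR = 50.0
Z_SIGNIFICANT = 1.65
Z_NEAR = 1.00
MIN_COVERAGE_PCT = 10.0
MAX_PTM_PI_SHIFT = 2.0


def mowse_score(p):
    """Probability-based MOWSE score, -10*log10(P)."""
    return -10.0 * math.log10(p)


def score_tier(mowse, z, coverage_pct):
    """Return 'significant', 'possible' or 'not identified'.

    Boundary handling (strict vs. inclusive comparisons) is an assumption.
    """
    if mowse > MOWSE_SIGNIFICANT or z > Z_SIGNIFICANT:
        return "significant"
    near = (MOWSE_NEAR < mowse <= MOWSE_SIGNIFICANT
            or Z_NEAR < z <= Z_SIGNIFICANT)
    if near and coverage_pct > MIN_COVERAGE_PCT:
        return "possible"
    return "not identified"


def triage_significant_hit(mr_as_expected, pi_as_expected, delta_pi):
    """Classify a significant hit by how well Mr and pI fit the spot position."""
    if mr_as_expected and pi_as_expected:
        return "definite"
    if pi_as_expected:
        return "multimer or fragment"
    if mr_as_expected and abs(delta_pi) <= MAX_PTM_PI_SHIFT:
        return "search for PTMs"
    # Assumption: anything else is treated as a negative (false-positive) hit.
    return "negative"


if __name__ == "__main__":
    # Scores reported in the chapter for gi|12841975.
    print(score_tier(mowse=104, z=2.43, coverage_pct=50))
```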

In our diabetic study, we identified a protein spot at approximately 20 kDa with a pI of 5.0–5.5 as an unnamed protein product (gi|12841975; accession and locus BAB25424; species: Mus musculus; probability-based MOWSE score: 104; ProFound Z score: 2.43; sequence coverage: 50%) in the mouse kidney proteome. The expression level of this unknown protein was increased in diabetic OVE26 mouse kidneys compared with normal kidneys. The OVE26 mouse is a transgenic model of human type 1 diabetes. We then performed additional bioinformatic analyses to generate new data that might shed light on a possible role for the unknown protein in diabetic nephropathy.

Protein Characterization

The amino acid sequence of gi|12841975 contains a total of 187 residues

and is shown below:

1 MAADISQWAG PFCLQEVDEP PQHALRVDYA GVTVDELGKV LTPTQVMNRP

51 SSISWDGLDP GKLYTLVLTD PDAPSRKDPK FREWHHFLVV NMKGNDISSG

101 TVLSDYVGSG PPSGTGLHRY VWLVYEQEQP LSCDEPILSN KSGDNRGKFK

151 VETFRKKYNL GAPVAGTCYQ AEWDDYVPKL YEQLSGK

Using the Compute pI/Mw[iv] tool, the calculated pI and molecular weight (Mw) of this unknown protein were 5.19 and 20.86 kDa, respectively, which corresponded to the spot position in the 2-D gel of OVE26 kidney tissue. This tool computes the theoretical pI and Mw for a list of SWISS-PROT and/or TrEMBL entries or for a user-entered sequence [9, 10]. Other protein characteristics were obtained using the ProtParam[v] tool:

• Total negatively charged residues: 25
• Total positively charged residues: 19
• Formula: C935H1427N247O284S6
• Estimated half-life: 30 h (in reticulocytes); >20 h (in yeast), and >10 h (in E. coli)
• The instability index: 29.97 (stable protein)
• Aliphatic index: 73.42
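Several of the values above follow directly from the amino acid composition, which the following Python sketch illustrates. It is not the ProtParam implementation. It assumes standard average residue masses, counts Asp and Glu as negatively charged and Arg and Lys as positively charged, and uses the usual aliphatic index formula with coefficients 2.9 (Val) and 3.9 (Ile and Leu). The function names are hypothetical; the sequence is the one printed above.

```python
from collections import Counter

# Sequence of gi|12841975 as given in the text (187 residues).
SEQ = (
    "MAADISQWAGPFCLQEVDEPPQHALRVDYAGVTVDELGKVLTPTQVMNRP"
    "SSISWDGLDPGKLYTLVLTDPDAPSRKDPKFREWHHFLVVNMKGNDISSG"
    "TVLSDYVGSGPPSGTGLHRYVWLVYEQEQPLSCDEPILSNKSGDNRGKFK"
    "VETFRKKYNLGAPVAGTCYQAEWDDYVPKLYEQLSGK"
)

# Average (not monoisotopic) residue masses in Da.
AVG = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
AVG_WATER = 18.01528


def molecular_weight(seq):
    """Average molecular weight in Da: residue masses plus one water."""
    return sum(AVG[aa] for aa in seq) + AVG_WATER


def charged_residue_counts(seq):
    """Return (negatively charged D+E, positively charged R+K)."""
    counts = Counter(seq)
    return counts["D"] + counts["E"], counts["R"] + counts["K"]


def aliphatic_index(seq):
    """Aliphatic index from mole percent of Ala, Val, Ile and Leu."""
    counts = Counter(seq)
    weighted = counts["A"] + 2.9 * counts["V"] + 3.9 * (counts["I"] + counts["L"])
    return 100.0 * weighted / len(seq)


if __name__ == "__main__":
    print(len(SEQ), charged_residue_counts(SEQ))
    print(round(molecular_weight(SEQ) / 1000, 2), "kDa")
    print(round(aliphatic_index(SEQ), 2))
```

The theoretical pI, the instability index and the half-life estimate need additional tables (pK values, dipeptide weights, N-end rule) and are left out of this sketch.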


Sequence Similarity and Homology

For unknown proteins, we routinely perform a similarity search, generally using the Basic Local Alignment Search Tool (BLAST) link on the NCBI protein entry. In the case of gi|12841975, we used the NCBI BLink[vi], an automatic link to the protein BLAST search, to determine similarities and homology to the unknown protein. The search returned 127 BLAST hits from 35 unique species with sequences similar to the unknown protein (fig. 2). The top hit was a hippocampal cholinergic neurostimulating peptide precursor protein (HCNPpp; species: M. musculus) with a score of 1,013 and 99% identity over 187 residues, without any gaps.

Fig. 2. Homology or similarity search using the NCBI BLink[vi]. A total of 127

sequences from 35 unique species shared some identities with the unknown protein

gi|12841975. The top 27 hits are shown in this figure with various scores. The greater the

score, the better the match.
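The identity percentage reported for a BLAST hit is the share of aligned positions that carry the same residue. A minimal Python sketch of that calculation follows. It assumes two sequences that are already aligned, of equal length and without gaps (as reported for the top hit here). It does not perform the alignment or the scoring that BLAST does, and the function name and example strings are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of positions with identical residues in a gap-free alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    # Toy example: one mismatch in ten aligned positions.
    print(percent_identity("MAADISQWAG", "MAADLSQWAG"))
```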


However, we identified the protein in the mouse kidney, not in the brain. We therefore examined whether HCNPpp has other synonyms using the SWISS-PROT and TrEMBL database[vii] search [11]. Surprisingly, this protein has several synonyms, including phosphatidylethanolamine-binding protein (PEBP), prostatic-binding protein, neuropolypeptide h3, Raf kinase inhibitor protein (RKIP), and basic cytosolic 21-kDa protein, all of which were among the top 20 BLAST hits shown in figure 2. This underscores an important practical point about peptide mass fingerprint analysis: the helter-skelter and highly redundant nomenclature of proteins is frequently misleading.

Domain Scan, Protein Family and Cellular Function

To analyze the putative cellular function of the unknown protein gi|12841975 (later identified as PEBP), we identified its domain and family. The domain scan was performed using the NCBI Conserved Domain Summary[viii]. As shown in figure 3a, a PEBP domain (PBP domain) was found in the sequence, from residue 21 through residue 172. The results were consistent with the ProDom[ix] output, the ProSite[x] profiles, and the results from Pfam[xi] and the Pfam collection of hidden Markov models (SIB ProfileScan Server[xii]), shown in figure 3c [12, 13]. The family of this protein was obtained using the InterPro[xiii] and Pfam[xi] search tools [14, 15]. It was clear that this protein belongs to the PBP family (fig. 3b). Therefore, this protein should function as a binding protein for phosphatidylethanolamine, phospholipid, ATP, and opioids, and also, with lower affinity, for phosphatidylinositol and phosphatidylcholine. Additionally, it belongs to the serine protease inhibitor (SERPIN) superfamily and inhibits thrombin, neuropsin and chymotrypsin, but does not inhibit trypsin, tissue-type plasminogen activator or elastase. Finally, it should interact with Raf-1 kinase and inhibit its activity.

Subcellular Location

Apart from the domain scan and family search, predicting the subcellular location of an unknown protein may provide additional information about it. We used the SubLoc[xiv] tool [16] to predict the subcellular location of the unknown protein (later identified as PEBP) in our study. The SubLoc results suggested that this protein is likely to be present in the nucleus of kidney cells, with a reliability index of 1 and an expected accuracy of 56%. We then used another search tool, TargetP[xv] v1.0 [17], to confirm these results. Unfortunately, TargetP placed the protein in a location other than the mitochondrion or the secretory pathway, and so could not confirm the former analysis.

Fig. 3. Domain scan and protein family. A phosphatidylethanolamine-binding protein (PBP) domain was found in the unknown protein using the NCBI Conserved Domain Summary[viii] (a) and the SIB ProfileScan Server[xii] (c). Additionally, the InterPro[xiii] scan confirmed that the unknown protein belongs to the PBP family (b).


Transmembrane Prediction and Hydropathicity

To examine whether the unknown protein gi|12841975 (later identified as PEBP) is a transmembrane protein, we used the TMpred[xvi] tool to predict membrane-spanning regions and their orientation. The TMpred algorithm is based on the statistical analysis of TMbase, a database of naturally occurring transmembrane proteins. The prediction was made using a combination of several weight matrices for scoring and the assumption that transmembrane helices contain at least 17 contiguous hydrophobic residues. Only 1 potential transmembrane region (TMR), residues 97–118 (a total of 22 hydrophobic residues), was predicted when the calculation was made for inside-to-outside helices (fig. 4). However, no TMR was predicted when the calculation was made in the opposite direction, for outside-to-inside helices. Another search tool, TMHMM v. 2.0[xvii], was used to confirm the former analysis. No TMR was found in the sequence of this protein using TMHMM v. 2.0 [18, 19], in either orientation. Therefore, it is unlikely that this protein is a transmembrane protein.

Fig. 4. Transmembrane prediction. The TMpred[xvi] tool was used to examine whether the unknown protein gi|12841975 contains any transmembrane region (TMR). The calculation was performed using the assumption that the transmembrane helices contain at least 17 contiguous hydrophobic residues. There was only 1 potential TMR, residues 97–118, predicted from the inside-to-outside calculation. No TMR was predicted from the outside-to-inside calculation. Therefore, it is unlikely that this protein is a transmembrane protein.

The Kyte-Doolittle Hydropathy Plot[xviii] was then used to determine whether this unknown protein is hydrophilic or hydrophobic [20], so as to indirectly confirm the two former analyses. Figure 5 shows the hydropathy plot created from the hydropathic indices of individual residues. A negative index denotes a hydrophilic residue, whereas a positive index denotes a hydrophobic one. The grand average of hydropathicity (GRAVY) is the mean of the indices of all residues. The GRAVY of the unknown protein was −0.527, indicating that this protein is hydrophilic.
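The GRAVY value and the windowed hydropathy profile described above can be reproduced with a short Python sketch. It assumes the standard Kyte-Doolittle hydropathy indices [20] and a window of 9 residues, the window size shown in fig. 5. The function names are hypothetical, and the sketch is not the code behind the web tool.

```python
# Kyte-Doolittle hydropathy indices.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Sequence of gi|12841975 as given in the text (187 residues).
SEQ = (
    "MAADISQWAGPFCLQEVDEPPQHALRVDYAGVTVDELGKVLTPTQVMNRP"
    "SSISWDGLDPGKLYTLVLTDPDAPSRKDPKFREWHHFLVVNMKGNDISSG"
    "TVLSDYVGSGPPSGTGLHRYVWLVYEQEQPLSCDEPILSNKSGDNRGKFK"
    "VETFRKKYNLGAPVAGTCYQAEWDDYVPKLYEQLSGK"
)


def gravy(seq):
    """Grand average of hydropathicity: mean index over all residues."""
    return sum(KD[aa] for aa in seq) / len(seq)


def hydropathy_profile(seq, window=9):
    """Mean hydropathy index of each window of consecutive residues."""
    scores = [KD[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


if __name__ == "__main__":
    print(round(gravy(SEQ), 3))
    profile = hydropathy_profile(SEQ)
    print(len(profile), round(max(profile), 2))
```

Plotting the profile against the window position gives a curve of the kind shown in fig. 5; long stretches well above zero would hint at membrane-spanning segments.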

Post-Translational Modifications (PTMs) Prediction

Because the presence of a PTM causes a peptide mass shift [7], potential PTMs can be predicted, but not conclusively demonstrated, by matching the mass difference (mass difference = theoretical mass − observed mass) to the masses of known PTMs. To date, at least 30 known PTMs are provided in the FindMod database. We used the FindMod[iii] tool [7] to predict potential PTMs in gi|12841975 (later identified as PEBP) using the unmatched masses with a maximum mass tolerance of 150 ppm. Only glycosylation was observed in the unmatched masses of the unknown protein. The GlycoMod[xix] tool [21] was then used to further analyze the initial results, and multiple O-linked glycosylation sites were found in this protein. Potential phosphorylation sites in a sequence can be predicted using the NetPhos 2.0 server[xx] [22]. Several potential phosphorylation sites at serine, threonine and tyrosine residues were found in the gi|12841975 sequence (fig. 6).

Fig. 5. Kyte-Doolittle Hydropathy Plot[xviii] (window: 9). The plot was created from the hydropathic indices of individual amino acid residues. Most of the residues had negative indices, which indicates the hydrophilicity of the protein. The grand average of hydropathicity (GRAVY) was −0.527, indicating that the unknown protein gi|12841975 is a hydrophilic protein.
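The mass-shift matching used for PTM prediction in this section can be sketched in a few lines of Python. This is not the FindMod algorithm or its modification table. The dictionary below is a small illustrative subset of well-known monoisotopic mass shifts, and the sketch takes the shift as observed mass minus unmodified theoretical mass, so that an added group gives a positive value; that sign convention and all names are assumptions of the sketch.

```python
# Illustrative subset of monoisotopic mass shifts (Da); not FindMod's full list.
PTM_DELTAS = {
    "phosphorylation": 79.96633,
    "acetylation": 42.01057,
    "methylation": 14.01565,
    "oxidation": 15.99491,
    "HexNAc (glycosylation)": 203.07937,
    "Hex (glycosylation)": 162.05282,
}


def candidate_ptms(observed_mass, unmodified_mass, tol_ppm=150.0):
    """Names of PTMs whose mass shift explains the difference within tolerance."""
    shift = observed_mass - unmodified_mass
    tol_da = observed_mass * tol_ppm / 1e6  # ppm tolerance expressed in Da
    return [name for name, delta in PTM_DELTAS.items()
            if abs(shift - delta) <= tol_da]


if __name__ == "__main__":
    # Toy masses: an unmatched peak 203.10 Da above an unmodified peptide.
    print(candidate_ptms(1203.10, 1000.0))
```

As the text stresses, a matching mass shift only suggests a modification; it does not demonstrate it.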

Motif Scan and Protein-Protein Interactions

We used the Scansite[xxi] tool to search for motifs within the protein that are likely to be phosphorylated by specific protein kinases or to bind to domains such as SH2, 14-3-3, or PDZ domains [23]. Motif scanning by this tool uses an entropy approach that assesses the probability of a site matching the motif using the selectivity values and sums the logs of the probability values for each amino acid in the candidate sequence. The program then reports the percentile ranking of the candidate motif with respect to all potential motifs in the protein of interest [23]. Figure 7 shows all motifs found in the unknown protein gi|12841975 using the Scansite[xxi] tool at medium stringency. The motifs found were p85 SH3, Itk SH3, calmodulin-dependent kinase 2, Akt kinase, PKC (four isoforms), ATM kinase, casein kinase 1 and PDZ class 1. However, only two motifs, PKC and PDZ class 1, were found when we used the high-stringency setting. Similar results were obtained using the SMART[xxii] and ScanProsite[xxiii] tools.

Fig. 6. Potential phosphorylation sites. The NetPhos 2.0 server[xx] was used to predict potential phosphorylation sites in the sequence of the unknown protein gi|12841975, and several potential phosphorylation sites were found at serine, threonine and tyrosine residues. Scores greater than the threshold (0.500) indicate a greater likelihood of phosphorylation at those sites.

Fig. 7. Motif scan. We used the Scansite[xxi] tool to explore motifs in the unknown protein gi|12841975. A total of 9 motifs were observed in this unknown protein using medium stringency (a). Details of sites, scores, confidence interval and sequences are shown in (b). Labels legible in panel a: Baso_ST_kin S54 and T153; SH3 P71 and P74; PDZ S185; Acid_ST_kin T101; DNA_dam_kin S123; domains PBP (19–173) and Ribosomal_L31 (152–162).
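A much simpler form of motif scanning than the Scansite scoring described above is a pattern search of the kind ScanProsite performs. The Python sketch below looks for the PROSITE protein kinase C phosphorylation site pattern [ST]-x-[RK], translated by hand into a regular expression. It does not compute Scansite scores or percentile rankings, and the function name and toy sequence are hypothetical.

```python
import re

# PROSITE-style pattern [ST]-x-[RK] as a regular expression. The lookahead
# lets overlapping sites be reported as separate hits.
PKC_SITE = re.compile(r"(?=([ST].[RK]))")


def find_pkc_sites(seq):
    """Return (1-based position of the S/T residue, matched tripeptide) pairs."""
    return [(m.start() + 1, m.group(1)) for m in PKC_SITE.finditer(seq)]


if __name__ == "__main__":
    toy = "AASAKAATTRK"  # toy sequence, not from the chapter
    for position, site in find_pkc_sites(toy):
        print(position, site)
```

Short patterns like this one match by chance very often, which is one reason tools such as Scansite rank candidate sites against all potential sites rather than reporting every match.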

Summary

We used various bioinformatic tools to examine the unknown protein gi|12841975 that was up-regulated in mouse diabetic kidneys. The data indicate that this unknown protein is, indeed, PEBP. Motif scanning showed that this protein contains several kinase motifs, notably for PKC, which plays an important role in the pathogenesis of diabetic nephropathy [24, 25]. We therefore hypothesize that this protein (PEBP) has a potential functional role in PKC-dependent pathogenic pathways of diabetic nephropathy. Further study will focus on the phosphorylation pathways of PEBP and its substrates. In summary, we have presented a case study that outlines our approach to further characterizing unknown proteins identified by peptide mass fingerprinting. Publicly accessible bioinformatic tools can provide a wealth of information to guide subsequent approaches that use traditional molecular biology tools.

Indices for Bioinformatic Tools

[i] Mascot – http://www.matrixscience.com
[ii] ProFound – http://129.85.19.192/profound_bin/WebProFound.exe
[iii] FindMod – http://us.expasy.org/tools/findmod/
[iv] Compute pI/Mw – http://us.expasy.org/tools/pi_tool.html
[v] ProtParam – http://us.expasy.org/tools/protparam.html
[vi] BLink – http://www.ncbi.nlm.nih.gov
[vii] SWISS-PROT and TrEMBL database – http://ca.expasy.org/sprot/
[viii] NCBI Conserved Domain Summary – http://www.ncbi.nlm.nih.gov
[ix] ProDom – http://prodes.toulouse.inra.fr/prodom/2002.1/html/home.php
[x] ProSite – http://us.expasy.org/prosite/
[xi] Pfam – http://www.sanger.ac.uk/Software/Pfam/
[xii] SIB ProfileScan Server – http://hits.isb-sib.ch/cgi-bin/PFSCAN?
[xiii] InterPro – http://www.ebi.ac.uk/interpro/scan.html
[xiv] SubLoc – http://www.bioinfo.tsinghua.edu.cn/SubLoc/
[xv] TargetP v1.0 – http://www.cbs.dtu.dk/services/TargetP/
[xvi] TMpred – http://www.ch.embnet.org/software/TMPRED_form.html
[xvii] TMHMM v. 2.0 – http://www.cbs.dtu.dk/services/TMHMM-2.0/
[xviii] Kyte-Doolittle Hydropathy Plot – http://fasta.bioch.virginia.edu/o_fasta/grease.htm
[xix] GlycoMod – http://us.expasy.org/tools/glycomod/
[xx] NetPhos 2.0 server – http://www.cbs.dtu.dk/services/NetPhos-2.0/
[xxi] Scansite – http://scansite.mit.edu
[xxii] SMART – http://smart.embl-heidelberg.de
[xxiii] ScanProsite – http://us.expasy.org/tools/scanprosite/

References

1 Fenyo D: Identifying the proteome: Software tools. Curr Opin Biotechnol 2000;11:391–395.
2 Thongboonkerd V, McLeish KR, Arthur JM, Klein JB: Proteomic analysis of normal human urinary proteins isolated by acetone precipitation or ultracentrifugation. Kidney Int 2002;62:1461–1469.
3 Thongboonkerd V, Klein JB, Pierce WM, Jevans AW, Arthur JM: Sodium loading changes urinary excretion: A proteomic analysis. Am J Physiol Renal Physiol 2003;284:F1155–F1163.
4 Arthur JM, Thongboonkerd V, Scherzer JA, Cai J, Pierce WM, Klein JB: Differential expression of proteins in renal cortex and medulla: A proteomic approach. Kidney Int 2002;62:1314–1321.
5 Gozal E, Gozal D, Pierce WM, Thongboonkerd V, Scherzer JA, Sachleben LR, Zhang ZG, Cai J, Klein JB: Proteomic analysis of CA1 and CA3 regions of rat hippocampus and differential susceptibility to intermittent hypoxia. J Neurochem 2002;83:331–345.
6 Perkins DN, Pappin DJ, Creasy DM, Cottrell JS: Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 1999;20:3551–3567.
7 Wilkins MR, Gasteiger E, Gooley AA, Herbert BR, Molloy MP, Binz PA, Ou K, Sanchez JC, Bairoch A, Williams KL, Hochstrasser DF: High-throughput mass spectrometric discovery of protein post-translational modifications. J Mol Biol 1999;289:645–657.
8 Zhang W, Chait BT: ProFound: An expert system for protein identification using mass spectrometric peptide mapping information. Anal Chem 2000;72:2482–2489.
9 Bjellqvist B, Hughes GJ, Pasquali C, Paquet N, Ravier F, Sanchez JC, Frutiger S, Hochstrasser D: The focusing positions of polypeptides in immobilized pH gradients can be predicted from their amino acid sequences. Electrophoresis 1993;14:1023–1031.
10 Wilkins MR, Gasteiger E, Bairoch A, Sanchez JC, Williams KL, Appel RD, Hochstrasser DF: Protein identification and analysis tools in the ExPASy server. Methods Mol Biol 1999;112:531–552.
11 Boeckmann B, Bairoch A, Apweiler R, Blatter MC, Estreicher A, Gasteiger E, Martin MJ, Michoud K, O’Donovan C, Phan I, Pilbout S, Schneider M: The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res 2003;31:365–370.
12 Falquet L, Pagni M, Bucher P, Hulo N, Sigrist CJ, Hofmann K, Bairoch A: The PROSITE database, its status in 2002. Nucleic Acids Res 2002;30:235–238.
13 Sigrist CJ, Cerutti L, Hulo N, Gattiker A, Falquet L, Pagni M, Bairoch A, Bucher P: PROSITE: A documented database using patterns and profiles as motif descriptors. Brief Bioinform 2002;3:265–274.
14 Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Barrell D, Bateman A, Binns D, Biswas M, Bradley P, Bork P, Bucher P, Copley RR, Courcelle E, Das U, Durbin R, Falquet L, Fleischmann W, Griffiths-Jones S, Haft D, Harte N, Hulo N, Kahn D, Kanapin A, Krestyaninova M, Lopez R, Letunic I, Lonsdale D, Silventoinen V, Orchard SE, Pagni M, Peyruc D, Ponting CP, Selengut JD, Servant F, Sigrist CJ, Vaughan R, Zdobnov EM: The InterPro Database, 2003 brings increased coverage and new features. Nucleic Acids Res 2003;31:315–318.
15 Bateman A, Birney E, Cerruti L, Durbin R, Etwiller L, Eddy SR, Griffiths-Jones S, Howe KL, Marshall M, Sonnhammer EL: The Pfam protein families database. Nucleic Acids Res 2002;30:276–280.
16 Hua S, Sun Z: Support vector machine approach for protein subcellular localization prediction. Bioinformatics 2001;17:721–728.
17 Emanuelsson O, Nielsen H, Brunak S, von Heijne G: Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. J Mol Biol 2000;300:1005–1016.
18 Moller S, Croning MD, Apweiler R: Evaluation of methods for the prediction of membrane-spanning regions. Bioinformatics 2001;17:646–653.
19 Krogh A, Larsson B, von Heijne G, Sonnhammer EL: Predicting transmembrane protein topology with a hidden Markov model: Application to complete genomes. J Mol Biol 2001;305:567–580.
20 Kyte J, Doolittle RF: A simple method for displaying the hydropathic character of a protein. J Mol Biol 1982;157:105–132.
21 Cooper CA, Gasteiger E, Packer NH: GlycoMod – A software tool for determining glycosylation compositions from mass spectrometric data. Proteomics 2001;1:340–349.
22 Blom N, Gammeltoft S, Brunak S: Sequence and structure-based prediction of eukaryotic protein phosphorylation sites. J Mol Biol 1999;294:1351–1362.
23 Yaffe MB, Leparc GG, Lai J, Obata T, Volinia S, Cantley LC: A motif-based profile scanning approach for genome-wide prediction of signaling pathways. Nat Biotechnol 2001;19:348–353.
24 Raptis AE, Viberti G: Pathogenesis of diabetic nephropathy. Exp Clin Endocrinol Diabetes 2001;109(suppl 2):424–437.
25 Lehmann R, Schleicher ED: Molecular mechanism of diabetic nephropathy. Clin Chim Acta 2000;297:135–144.

Visith Thongboonkerd, MD

Core Proteomics Laboratory, Kidney Disease Program

Department of Medicine, University of Louisville

570 S. Preston Street, Suite 102, Louisville, KY 40202 (USA)

Tel. +1 502 8522366, Fax +1 502 8524384, E-Mail [email protected]
