Thongboonkerd V, Klein JB (eds): Proteomics in Nephrology.
Contrib Nephrol. Basel, Karger, 2004, vol 141, pp 79–92
Practical Bioinformatics for Proteomics
Visith Thongboonkerd, Jon B. Klein
Core Proteomics Laboratory, Kidney Disease Program,
Department of Medicine, University of Louisville,
Louisville, Ky., USA
In current proteomic analysis, matrix-assisted laser desorption/ionization
mass spectrometry (MALDI-MS) is commonly used for protein identification.
The MS-based protein identification relies on pattern matching of observed
molecular masses of peptides against a theoretical set of masses generated
from a database. On many occasions the identified proteins are designated as
‘unknown protein’, ‘unnamed protein’, ‘putative protein’, ‘hypothetical
protein’ or an ‘unnamed gene product’, etc. All of these terms refer to proteins
predicted from DNA sequences or proteins of unknown function that have been
submitted to the database, but for which other information is limited or
unknown. When this occurs, investigators frequently ignore the proteins,
and the data may be of limited usefulness.
Bioinformatic analyses are of substantial assistance in characterizing
hypothetical proteins identified by peptide mass fingerprinting using MALDI-
MS data. The data obtained from bioinformatic analyses may make a further
study more focused. Additionally, bioinformatic techniques ‘unmask’ those
unknown proteins, which may turn out to be common or well-known proteins.
In this chapter, we demonstrate a practical bioinformatic approach we have
derived to perform ‘data mining’ of unknown or hypothetical proteins. We
present this approach not from the viewpoint of experts in bioinformatics
(which we are assuredly not), but from our extensive experience as users
of bioinformatics to analyze the data obtained from MALDI-MS. We provide
an example in this chapter of an unnamed protein that was identified in the
mouse kidney proteome and was up-regulated in diabetic kidneys.
Protein Identification
A common approach in proteomic analysis is using two-dimensional gel
electrophoresis (2-DE) to separate proteins and using MALDI-MS to identify
proteins. The data obtained by MS analysis are reported as mass spectra in mass
per charge (m/z) units. Peptide mass fingerprinting is then used to match
sample masses to theoretical masses in the protein databases. Initially, in any
analysis of MALDI-produced mass spectra, the most important issue is to
determine the validity of any match to the queried database. A number of
analytical tools for peptide mass fingerprinting have been produced and are
made available either through the Internet or as proprietary software. This
chapter focuses on publicly accessible peptide mass fingerprint tools. Each
search tool uses a different algorithm and has its own criteria to determine
significant matches between the experimental and database peptide masses,
but some search engines do not directly provide the criteria used to determine
the significance. Most search tools allow molecular size (Mr) and isoelectric
point (pI) data from 2-DE to be added into search parameters to guide the
search [1]. However, using a narrow-range or restricted search may not allow the
identification of protein fragments and protein multimers. Additionally,
post-translationally modified proteins, especially those in body fluids, which
commonly appear on 2-D gels as a horizontal series of spots of the same protein
with changes in pI and Mr [2], may not be apparent when the search is constrained.
A post-translational modification (PTM), for example multiple phosphorylations,
can shift a phosphoprotein spot from its initial resting (non-phosphorylated)
position by up to 2 pI units [unpubl. data], making prediction of the pI range
incorrect. Using the pI range of the phosphorylated position may mislead the
fingerprint results, which may not match this protein if phosphorylation is
not included in the search parameters. Therefore, restricted searches may
miss some proteins with PTMs.
We have generated an algorithm to guide peptide mass fingerprinting that is
based on our work identifying approximately 2,000 proteins from 4,000 excised
protein spots during the past few years. Our criteria (fig. 1) rely on an integrative
approach between non-restricted search (e.g. using the Mascot[i] search engine) and
restricted search (e.g. using the ProFound[ii] search engine). Based on our mass
spectrometers’ capabilities and our sample processing, the search parameter
assumptions we employ are that peptides are monoisotopic, oxidized at methionine
residues, and carbamidomethylated at cysteine residues (as residues are reduced
and alkylated in our preps). A maximum of 1 missed cleavage and a 150-ppm mass
tolerance are employed. Using these search parameters and criteria, sensitivity and
specificity are adequate and matching results are reproducible and consistent with
other confirmatory techniques such as immunoblot analyses [3–5].
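The matching step described above can be illustrated with a minimal sketch. It is not the algorithm used by Mascot or ProFound; it only shows an in-silico tryptic digest with up to 1 missed cleavage, carbamidomethylated cysteines, monoisotopic [M+H]+ masses and a 150-ppm tolerance. The function names and the short test sequence are hypothetical, and the cleavage rule (after K or R, not before P) is the usual simplification for trypsin.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056            # added once per peptide
PROTON = 1.00728            # singly charged ion, [M+H]+
CARBAMIDOMETHYL = 57.02146  # fixed modification on every Cys
OXIDATION = 15.99491        # optional modification per oxidized Met


def tryptic_peptides(seq, missed_cleavages=1):
    """Cleave after K or R (but not before P), allowing missed cleavages."""
    pieces, start = [], 0
    for i, aa in enumerate(seq):
        nxt = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and nxt != "P":
            pieces.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        pieces.append(seq[start:])
    peptides = []
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


def peptide_mz(peptide, oxidized_met=0):
    """Theoretical monoisotopic [M+H]+ of a peptide with alkylated Cys."""
    mass = sum(MONO[aa] for aa in peptide) + WATER + PROTON
    mass += CARBAMIDOMETHYL * peptide.count("C")
    return mass + OXIDATION * oxidized_met


def match_masses(observed, seq, tol_ppm=150.0):
    """Return (peptide, theoretical m/z, observed m/z) within tolerance."""
    hits = []
    for pep in tryptic_peptides(seq):
        theo = peptide_mz(pep)
        for obs in observed:
            if abs(obs - theo) / theo * 1e6 <= tol_ppm:
                hits.append((pep, theo, obs))
    return hits


if __name__ == "__main__":
    # Hypothetical sequence and peak list, for illustration only.
    for hit in match_masses([467.20, 249.20], "AAKPGRCCKDD"):
        print(hit)
```

A real search engine additionally scores how likely the set of matches is to arise by chance; the sketch stops at collecting the matches.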
In our approach, peptide masses are first subjected to non-restricted search
using the Mascot[i] tool, in which probability-based MOWSE (MOlecular
Weight SEarch) scores are reported as −10*log10(P), where P is the absolute
probability, and scores >74 are considered significant [6]. After completion of
the Mascot[i] search, the Mr and pI are used to classify all the initial significant
hits into definite hits, negative hits (false positives), multimers, fragments, or
proteins that are likely to be modified, for which the FindMod[iii] tool can be used
to predict potential PTMs by mass shift [7]. Samples with non-significant
scores are subjected to restricted search using predicted Mr and pI ranges based
on the spot positions on the gels. Several search tools are available for restricted
search, for example the ProFound[ii] search engine. The ProFound search
algorithm uses Bayesian theory to rank the protein sequences in the database by
their probability of occurrence, and Z scores are estimated as the distance
to the population mean in units of standard deviation [8]. Z scores >1.65
(confidence interval >95%) are considered significant. Samples that have not been
definitely identified by non-restricted search can frequently be matched by a
restricted search. Because of recent advances in high-resolution 2-DE, up to
10,000 protein spots can be separated in a 2-D gel. As a result, a number of
low-abundance proteins can be visualized, but cannot be identified with certainty by
MALDI-MS. Therefore, in our analysis we initially classify a protein as a
‘possible’ identification when Z scores or probability-based MOWSE scores are
close to, but do not reach, significant levels (Z scores >1.00 but <1.65, or
probability-based MOWSE scores >50 but <74) and sequence coverage is more
than 10%. In these instances, the protein identification is repeated and the
possible hit may be definitively identified. We find that this approach, while
labor-intensive, improves the sensitivity and specificity of peptide mass
fingerprinting.
MALDI-TOF MS
  Fewer than 4 masses: Negative
  At least 4 masses: Non-restricted search
    Significant scores (classified by spot position):
      Mr: Y, pI: Y: Definite
      Mr: Y, pI: N (Delta pI within 2): PTMs search
      Mr: Y or N, pI: N (Delta pI beyond 2): Negative
      Mr: N, pI: Y: Multimers/fragments
    Non-significant scores: Restricted search (pI and Mr ranges)
      Significant scores: Definite
      Non-significant scores:
        Same protein as the adjacent spots (case of PTMs): Definite
        More than 10% coverage and high scores: Probable
        Less than 10% coverage and low scores: Negative
Fig. 1. Schematic approach and optimized criteria for peptide mass fingerprinting.
Abbreviations: Y, the Mr or pI of the sample protein spot is in the expected position on the
gel; N, the Mr or pI of the sample protein spot is not in the expected position on the gel.
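The score thresholds quoted in this section can be written down as a small sketch. The cut-offs (MOWSE 74 and 50, Z score 1.65 and 1.00, coverage 10%) are the ones stated in the text; the function names and the three labels are hypothetical, and treating a score exactly at a cut-off as not significant is an assumption of the sketch. The last function only checks that a Z score of 1.65 lies near the 95th percentile of a normal distribution.

```python
import math


def mowse_score(p):
    """Probability-based MOWSE score for an absolute probability P."""
    return -10.0 * math.log10(p)


def normal_percentile(z):
    """Fraction of a standard normal population lying below Z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def classify(mowse=None, z=None, coverage_pct=0.0):
    """Label a hit from either score; pass None for a score not available."""
    if (mowse is not None and mowse > 74) or (z is not None and z > 1.65):
        return "significant"
    near_mowse = mowse is not None and 50 < mowse <= 74
    near_z = z is not None and 1.00 < z <= 1.65
    if (near_mowse or near_z) and coverage_pct > 10:
        return "possible"  # identification is repeated before a final call
    return "not significant"


if __name__ == "__main__":
    print(mowse_score(1e-5))
    print(normal_percentile(1.65))
    print(classify(mowse=104, z=2.43, coverage_pct=50))
```

The last call uses the scores reported below for gi|12841975 (MOWSE 104, Z score 2.43, 50% coverage).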
In our diabetic study, we identified a protein spot, at approximately 20 kDa
with the pI of 5.0–5.5, as an unnamed protein product (gi|12841975; accession
and locus BAB25424; species: Mus musculus; probability-based MOWSE
score: 104; ProFound Z score: 2.43, and sequence coverage: 50%) in the mouse
kidney proteome. The expression level of this unknown protein was increased
in diabetic OVE26 mouse kidneys compared to normal kidneys. The OVE26 mouse
is a transgenic model mimicking human type 1 diabetes. We then performed
additional bioinformatic analyses to generate new data that might shed light on
a possible role for the unknown protein in diabetic nephropathy.
Protein Characterization
The amino acid sequence of gi|12841975 contains a total of 187 residues
and is shown below:
1 MAADISQWAG PFCLQEVDEP PQHALRVDYA GVTVDELGKV LTPTQVMNRP
51 SSISWDGLDP GKLYTLVLTD PDAPSRKDPK FREWHHFLVV NMKGNDISSG
101 TVLSDYVGSG PPSGTGLHRY VWLVYEQEQP LSCDEPILSN KSGDNRGKFK
151 VETFRKKYNL GAPVAGTCYQ AEWDDYVPKL YEQLSGK
Using the Compute pI/Mw[iv] tool, the calculated pI and molecular weight
(Mw) of this unknown protein were 5.19 and 20.86 kDa, respectively, which
corresponded to the spot position in the 2-D gel that we had performed of OVE26
kidney tissue. This tool allows the computation of the theoretical pI and Mw for
a list of SWISS-PROT and/or TrEMBL entries or for a user-entered sequence [9,
10]. Other protein characteristics were obtained by using the ProtParam[v] tool:
• Total negatively charged residues (Asp + Glu): 25
• Total positively charged residues (Arg + Lys): 19
• Formula: C935H1427N247O284S6
• Estimated half-life: 30 h (in reticulocytes), >20 h (in yeast), and >10 h (in E. coli)
• Instability index: 29.97 (stable protein)
• Aliphatic index: 73.42
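Several of the values listed above follow directly from the amino acid composition, which the following minimal sketch recomputes from the 187-residue sequence. It is not the ProtParam implementation: the average residue masses are standard textbook values, the aliphatic index uses the published coefficients (2.9 for Val, 3.9 for Ile + Leu), and the function names are hypothetical.

```python
# Sequence of gi|12841975 as printed above (187 residues).
SEQ = (
    "MAADISQWAGPFCLQEVDEPPQHALRVDYAGVTVDELGKVLTPTQVMNRP"
    "SSISWDGLDPGKLYTLVLTDPDAPSRKDPKFREWHHFLVVNMKGNDISSG"
    "TVLSDYVGSGPPSGTGLHRYVWLVYEQEQPLSCDEPILSNKSGDNRGKFK"
    "VETFRKKYNLGAPVAGTCYQAEWDDYVPKLYEQLSGK"
)

# Average (not monoisotopic) residue masses in Da.
AVG_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_AVG = 18.01524  # one water per intact chain


def molecular_weight(seq):
    """Average molecular weight of an unmodified chain, in Da."""
    return sum(AVG_MASS[aa] for aa in seq) + WATER_AVG


def charged_residues(seq):
    """Return (Asp + Glu, Arg + Lys) counts."""
    negative = seq.count("D") + seq.count("E")
    positive = seq.count("R") + seq.count("K")
    return negative, positive


def aliphatic_index(seq):
    """Aliphatic index from mole percentages of Ala, Val, Ile and Leu."""
    def pct(aa):
        return 100.0 * seq.count(aa) / len(seq)
    return pct("A") + 2.9 * pct("V") + 3.9 * (pct("I") + pct("L"))


if __name__ == "__main__":
    print(len(SEQ), "residues")
    print(round(molecular_weight(SEQ) / 1000.0, 2), "kDa")
    print(charged_residues(SEQ))
    print(round(aliphatic_index(SEQ), 2))
```

The pI, half-life and instability index need pK tables or lookup tables that are not reproduced here.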
Sequence Similarity and Homology
For unknown proteins, we routinely perform a similarity search, generally
using the Basic Local Alignment Search Tool (BLAST) link present on the
NCBI protein entry. In the case of gi|12841975, we used the NCBI BLink[vi],
an automatic link to the protein BLAST search, to determine similarities
and homology to the unknown protein. The search results showed
127 BLAST hits from 35 unique species that had sequences similar to
the unknown protein (fig. 2). The top hit was a hippocampal cholinergic
neurostimulating peptide precursor protein (HCNPpp, species: M. musculus)
with a score of 1,013 and 99% identity over 187 residues, without any gap.
Fig. 2. Homology or similarity search using the NCBI BLink[vi]. A total of 127
sequences from 35 unique species shared some identities with the unknown protein
gi|12841975. The top 27 hits are shown in this figure with various scores. The greater the
score, the better the match.
However, we identified the protein in the mouse kidney, not in the brain. We
therefore examined whether HCNPpp has other synonyms using the SWISS-PROT
and TrEMBL database[vii] search [11]. Surprisingly, this protein has several
synonyms, including phosphatidylethanolamine-binding protein (PEBP),
prostatic-binding protein, neuropolypeptide h3, Raf kinase inhibitor protein
(RKIP), and basic cytosolic 21-kDa protein, all of which were among the top 20
BLAST hits shown in figure 2. This underscores an important practical
point about peptide mass fingerprint analysis: the helter-skelter and highly
redundant nomenclature of proteins is frequently misleading.
Domain Scan, Protein Family and Cellular Function
To analyze the putative cellular function of the unknown protein
gi|12841975 (later identified as PEBP), its domain and family were identified.
The domain scan was performed using the NCBI Conserved Domain
Summary[viii]. As shown in figure 3a, a PEBP domain (PBP domain) was found
in the sequence, from residue 21 through 172. The results were consistent
with the ProDom[ix] output, the ProSite[x] profiles, and the results from Pfam[xi]
and the Pfam collection of hidden Markov models (SIB ProfileScan Server[xii]),
shown in figure 3c [12, 13]. The family of this protein was obtained by using
the InterPro[xiii] and Pfam[xi] search tools [14, 15]. It was clear that this protein
was in the PBP family (fig. 3b). Therefore, this protein should function as a
binding protein to phosphatidylethanolamine, phospholipid, ATP, and opioids,
and also to phosphatidylinositol and phosphatidylcholine with lower affinity.
Additionally, it belongs to the serine protease inhibitor (SERPIN) superfamily
and inhibits thrombin, neuropsin and chymotrypsin, but does not inhibit trypsin,
tissue-type plasminogen activator and elastase. Finally, it should interact with
and inhibit Raf-1 kinase activity.
Subcellular Location
Apart from the domain scan and family search to understand the protein
family and function, predicting the subcellular location of the protein may
provide some additional information about an unknown protein. We used
the SubLoc[xiv] tool [16] to predict the subcellular location of the unknown
protein (later identified as PEBP) in our study. The SubLoc results showed
that this protein was likely to be present in the nucleus of the kidney cells, with
a reliability index of 1 and expected accuracy of 56%. We then used another
search tool, TargetP[xv] v1.0 [17], to confirm these results. Unfortunately,
TargetP placed this protein in a location other than the mitochondrial and
secretory pathways, and was therefore not able to confirm the former analysis.
Fig. 3. Domain scan and protein family. Phosphatidylethanolamine-binding protein
(PBP) domain was found in the unknown protein using the NCBI Conserved Domain
Summary[viii] (a) and SIB ProfileScan Server[xii] (c). Additionally, the InterPro[xiii] scan
confirmed that the unknown protein was in the PBP family (b).
Transmembrane Prediction and Hydropathicity
To examine whether the unknown protein gi|12841975 (later identified
as PEBP) is a transmembrane protein, we used the TMpred[xvi] tool to predict
membrane-spanning regions and their orientation. The TMpred algorithm is
based on the statistical analysis of TMbase, a database of naturally occurring
transmembrane proteins. The prediction was made by using a combination of
several weight matrices for scoring and an assumption that the transmembrane
helices contain at least 17 continuous hydrophobic residues. Only 1 potential
transmembrane region (TMR), residues 97–118 (a total of 22 residues), was
predicted, and only when the calculation was made for inside-to-outside
helices (fig. 4). No TMR was observed when the calculation was made in the
opposite direction, for outside-to-inside helices. Another search tool,
TMHMM v. 2.0[xvii], was used to confirm the former analysis. No TMR was
observed in the sequence of this protein using the TMHMM v. 2.0 tool [18, 19],
either from inside to outside or vice versa. Therefore, it is unlikely that this
protein is a transmembrane protein. The Kyte-Doolittle Hydropathy
Plot[xviii] was then used to further analyze whether this unknown protein is
hydrophilic or hydrophobic [20], so as to indirectly confirm the two former
analyses. Figure 5 shows the hydropathy plot created from the hydropathic
indices of individual residues. A negative value denotes a hydrophilic residue,
whereas a positive index represents hydrophobicity. The average index, or
grand average of hydropathicity (GRAVY), was calculated using the indices of
all residues. The GRAVY of the unknown protein was −0.527, indicating that
this protein is hydrophilic.
Fig. 4. Transmembrane prediction. The TMpred[xvi] tool was used to examine whether
the unknown protein gi|12841975 contains any transmembrane region (TMR). The
calculation was performed using the assumption that the transmembrane helices contain at
least 17 continuous hydrophobic residues. There was only 1 potential TMR, residues 97–118,
predicted from a calculation from inside to outside helices. There was no TMR predicted from
outside to inside calculation. Therefore, it is unlikely that this protein is a transmembrane
protein.
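The GRAVY value and the hydropathy profile described in this section can be reproduced with a minimal sketch. It uses the published Kyte-Doolittle hydropathy indices [20] and a sliding window of 9 residues; the function names are hypothetical, and the sketch computes the numbers only and does not draw the plot.

```python
# Sequence of gi|12841975 (187 residues).
SEQ = (
    "MAADISQWAGPFCLQEVDEPPQHALRVDYAGVTVDELGKVLTPTQVMNRP"
    "SSISWDGLDPGKLYTLVLTDPDAPSRKDPKFREWHHFLVVNMKGNDISSG"
    "TVLSDYVGSGPPSGTGLHRYVWLVYEQEQPLSCDEPILSNKSGDNRGKFK"
    "VETFRKKYNLGAPVAGTCYQAEWDDYVPKLYEQLSGK"
)

# Kyte-Doolittle hydropathy indices: positive is hydrophobic.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathicity: mean index over all residues."""
    return sum(KD[aa] for aa in seq) / len(seq)


def hydropathy_profile(seq, window=9):
    """Mean hydropathy of each full window, one value per window start."""
    values = [KD[aa] for aa in seq]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    print(round(gravy(SEQ), 3))
    profile = hydropathy_profile(SEQ)
    print(len(profile), "windows; most hydrophobic window mean:",
          round(max(profile), 2))
```

A negative GRAVY by itself does not exclude a single hydrophobic stretch, which is why the windowed profile is inspected alongside the average.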
Post-Translational Modifications (PTMs) Prediction
Because the presence of a PTM causes a peptide mass shift [7], potential
PTMs can be predicted, but not conclusively demonstrated, by matching the
mass difference (mass difference = theoretical mass – observed mass) to the
masses of known PTMs. To date, there are at least 30 known PTMs provided
in the FindMod database. We used the FindMod[iii] tool [7] to predict
potential PTMs in gi|12841975 (later identified as PEBP) using the unmatched
masses with a maximum mass tolerance of 150 ppm. Only glycosylation was
observed in the unmatched masses of the unknown protein. The
GlycoMod[xix] tool [21] was then used to further analyze the initial results,
and multiple O-linked glycosylation sites were observed in this protein.
Potential phosphorylation sites of an observed sequence can be predicted by
using the NetPhos 2.0 server[xx] [22]. Several potential phosphorylation sites at
serine, threonine and tyrosine residues were found in the gi|12841975 sequence
(fig. 6).
Fig. 5. Kyte-Doolittle Hydropathy Plot[xviii] (window size: 9). The plot was created by
various hydropathic indices of individual amino acid residues. Most of the residues had
negative indices, which indicate the hydrophilicity of the protein. The grand average of
hydropathicity (GRAVY) was −0.527, indicating that the unknown protein gi|12841975 is a
hydrophilic protein.
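The idea of predicting a PTM from a mass shift can be sketched as a lookup of the shift against a table of known modification masses. This is not the FindMod implementation: the table below holds only a few common monoisotopic mass increments rather than the full FindMod list, the names are hypothetical, and the shift is taken here as observed minus theoretical so that an added group gives a positive value (a sign convention chosen for this sketch).

```python
# A few monoisotopic mass increments (Da) of common modifications.
PTM_DELTAS = {
    "methylation": 14.01565,
    "oxidation": 15.99491,
    "acetylation": 42.01057,
    "phosphorylation": 79.96633,
    "hexose": 162.05282,
    "N-acetylhexosamine": 203.07937,
}


def candidate_ptms(observed, theoretical, tol_ppm=150.0):
    """Names of modifications whose mass explains the observed shift."""
    shift = observed - theoretical
    tolerance = observed * tol_ppm / 1e6  # tolerance in Da at this mass
    return [
        name for name, delta in PTM_DELTAS.items()
        if abs(shift - delta) <= tolerance
    ]


if __name__ == "__main__":
    # Hypothetical unmatched peak and unmodified peptide mass.
    print(candidate_ptms(1080.46633, 1000.5))
```

A match of this kind only nominates a modification; as noted above, it does not demonstrate it.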
Motif Scan and Protein-Protein Interactions
We used the Scansite[xxi] tool to search for motifs within the protein that are
likely to be phosphorylated by specific protein kinases or to bind to domains such
as SH2, 14-3-3, or PDZ domains [23]. Motif scanning by this tool utilizes
an entropy approach that assesses the probability of a site matching the motif
using the selectivity values and sums the logs of the probability values for each
amino acid in the candidate sequence. The program then indicates the percentile
ranking of the candidate motif with respect to all potential motifs in a protein of
interest [23]. Figure 7 shows all motifs observed in the unknown protein
gi|12841975 using the Scansite[xxi] tool with medium stringency. The motifs
found were p85 SH3, Itk SH3, calmodulin-dependent kinase 2, Akt kinase,
PKC (four isoforms), ATM kinase, casein kinase 1 and PDZ class 1. However,
only two motifs, PKC and PDZ class 1, were observed when we used a
high-stringency parameter. Similar results were obtained using the SMART[xxii]
and ScanProsite[xxiii] tools.
Fig. 6. Potential phosphorylation sites. The NetPhos 2.0 server[xx] was used to predict
potential phosphorylation sites in the sequence of the unknown protein gi|12841975, and
several potential phosphorylation sites were found at serine, threonine and tyrosine residues.
Scores greater than the threshold (0.500) indicate a greater likelihood of phosphorylation at
those sites. The plot shows phosphorylation potential against sequence position.
Fig. 7. Motif scan. We used the Scansite[xxi] tool to explore motifs in the unknown
protein gi|12841975. A total of 9 motifs were observed in this unknown protein using medium
stringency (a). Details of sites, scores, confidence interval and sequences are shown in (b).
Labels in (a): Baso_ST_kin S54 and T153; SH3 P71 and P74; Acid_ST_kin T101;
DNA_dam_kin S123; PDZ S185; domains PBP (19–173) and Ribosomal_L31 (152–162).
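Kinase-motif and phosphorylation predictors evaluate the sequence context around serine, threonine and tyrosine residues. The minimal sketch below only enumerates those candidate residues with their flanking sequence; it does not score them, and it is not how Scansite or NetPhos work internally. The function name and the flank length of 4 residues are assumptions made for illustration.

```python
# Sequence of gi|12841975 (187 residues).
SEQ = (
    "MAADISQWAGPFCLQEVDEPPQHALRVDYAGVTVDELGKVLTPTQVMNRP"
    "SSISWDGLDPGKLYTLVLTDPDAPSRKDPKFREWHHFLVVNMKGNDISSG"
    "TVLSDYVGSGPPSGTGLHRYVWLVYEQEQPLSCDEPILSNKSGDNRGKFK"
    "VETFRKKYNLGAPVAGTCYQAEWDDYVPKLYEQLSGK"
)


def acceptor_windows(seq, flank=4):
    """List (label, window) for every Ser, Thr or Tyr, 1-based positions."""
    sites = []
    for i, aa in enumerate(seq):
        if aa in "STY":
            lo = max(0, i - flank)
            hi = min(len(seq), i + flank + 1)  # window is clipped at the ends
            sites.append((f"{aa}{i + 1}", seq[lo:hi]))
    return sites


if __name__ == "__main__":
    for label, window in acceptor_windows(SEQ):
        print(label, window)
```

A predictor would then assign each window a score and report only those above its threshold or stringency setting.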
Summary
We used various bioinformatic tools to examine the unknown protein
gi|12841975 that was up-regulated in mouse diabetic kidneys. The data indicate
that this unknown protein is, indeed, PEBP. Motif scanning showed that this
protein contains several kinase motifs, notably for PKC, which plays an important
role in the pathogenesis of diabetic nephropathy [24, 25]. We therefore
hypothesize that this protein (PEBP) has a potential functional role in PKC-dependent
pathogenic pathways of diabetic nephropathy. Further study will be focused on
phosphorylation pathways of PEBP and its substrates. In summary, we have
presented a case study that outlines our approach to further characterize
unknown proteins identified by peptide mass fingerprinting. Publicly accessible
bioinformatic tools can provide a wealth of information to guide subsequent
approaches that use traditional molecular biology tools.
Indices for Bioinformatic Tools
[i] Mascot – http://www.matrixscience.com
[ii] ProFound – http://129.85.19.192/profound_bin/WebProFound.exe
[iii] FindMod – http://us.expasy.org/tools/findmod/
[iv] Compute pI/Mw – http://us.expasy.org/tools/pi_tool.html
[v] ProtParam – http://us.expasy.org/tools/protparam.html
[vi] BLink – http://www.ncbi.nlm.nih.gov
[vii] SWISS-PROT and TrEMBL database – http://ca.expasy.org/sprot/
[viii] NCBI Conserved Domain Summary – http://www.ncbi.nlm.nih.gov
[ix] ProDom – http://prodes.toulouse.inra.fr/prodom/2002.1/html/home.php
[x] ProSite – http://us.expasy.org/prosite/
[xi] Pfam – http://www.sanger.ac.uk/Software/Pfam/
[xii] SIB ProfileScan Server – http://hits.isb-sib.ch/cgi-bin/PFSCAN?
[xiii] InterPro – http://www.ebi.ac.uk/interpro/scan.html
[xiv] SubLoc – http://www.bioinfo.tsinghua.edu.cn/SubLoc/
[xv] TargetP v1.0 – http://www.cbs.dtu.dk/services/TargetP/
[xvi] TMpred – http://www.ch.embnet.org/software/TMPRED_form.html
[xvii] TMHMM v. 2.0 – http://www.cbs.dtu.dk/services/TMHMM-2.0/
[xviii] Kyte-Doolittle Hydropathy Plot – http://fasta.bioch.virginia.edu/o_fasta/grease.htm
[xix] GlycoMod – http://us.expasy.org/tools/glycomod/
[xx] NetPhos 2.0 server – http://www.cbs.dtu.dk/services/NetPhos-2.0/
[xxi] Scansite – http://scansite.mit.edu
[xxii] SMART – http://smart.embl-heidelberg.de
[xxiii] ScanProsite – http://us.expasy.org/tools/scanprosite/
References
1 Fenyo D: Identifying the proteome: Software tools. Curr Opin Biotechnol 2000;11:391–395.
2 Thongboonkerd V, McLeish KR, Arthur JM, Klein JB: Proteomic analysis of normal human
urinary proteins isolated by acetone precipitation or ultracentrifugation. Kidney Int 2002;62:
1461–1469.
3 Thongboonkerd V, Klein JB, Pierce WM, Jevans AW, Arthur JM: Sodium loading changes urinary
excretion: A proteomic analysis. Am J Physiol Renal Physiol 2003;284:F1155–F1163.
4 Arthur JM, Thongboonkerd V, Scherzer JA, Cai J, Pierce WM, Klein JB: Differential expression
of proteins in renal cortex and medulla: A proteomic approach. Kidney Int 2002;62:1314–1321.
5 Gozal E, Gozal D, Pierce WM, Thongboonkerd V, Scherzer JA, Sachleben LR, Zhang ZG, Cai J,
Klein JB: Proteomic analysis of CA1 and CA3 regions of rat hippocampus and differential
susceptibility to intermittent hypoxia. J Neurochem 2002;83:331–345.
6 Perkins DN, Pappin DJ, Creasy DM, Cottrell JS: Probability-based protein identification by
searching sequence databases using mass spectrometry data. Electrophoresis 1999;20:3551–3567.
7 Wilkins MR, Gasteiger E, Gooley AA, Herbert BR, Molloy MP, Binz PA, Ou K, Sanchez JC,
Bairoch A, Williams KL, Hochstrasser DF: High-throughput mass spectrometric discovery of
protein post-translational modifications. J Mol Biol 1999;289:645–657.
8 Zhang W, Chait BT: ProFound: An expert system for protein identification using mass
spectrometric peptide mapping information. Anal Chem 2000;72:2482–2489.
9 Bjellqvist B, Hughes GJ, Pasquali C, Paquet N, Ravier F, Sanchez JC, Frutiger S, Hochstrasser D:
The focusing positions of polypeptides in immobilized pH gradients can be predicted from their
amino acid sequences. Electrophoresis 1993;14:1023–1031.
10 Wilkins MR, Gasteiger E, Bairoch A, Sanchez JC, Williams KL, Appel RD, Hochstrasser DF: Protein
identification and analysis tools in the ExPASy server. Methods Mol Biol 1999;112:531–552.
11 Boeckmann B, Bairoch A, Apweiler R, Blatter MC, Estreicher A, Gasteiger E, Martin MJ,
Michoud K, O’Donovan C, Phan I, Pilbout S, Schneider M: The SWISS-PROT protein
knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res 2003;31:365–370.
12 Falquet L, Pagni M, Bucher P, Hulo N, Sigrist CJ, Hofmann K, Bairoch A: The PROSITE
database, its status in 2002. Nucleic Acids Res 2002;30:235–238.
13 Sigrist CJ, Cerutti L, Hulo N, Gattiker A, Falquet L, Pagni M, Bairoch A, Bucher P: PROSITE:
A documented database using patterns and profiles as motif descriptors. Brief Bioinform 2002;3:
265–274.
14 Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Barrell D, Bateman A, Binns D, Biswas M,
Bradley P, Bork P, Bucher P, Copley RR, Courcelle E, Das U, Durbin R, Falquet L, Fleischmann W,
Griffiths-Jones S, Haft D, Harte N, Hulo N, Kahn D, Kanapin A, Krestyaninova M, Lopez R,
Letunic I, Lonsdale D, Silventoinen V, Orchard SE, Pagni M, Peyruc D, Ponting CP, Selengut JD,
Servant F, Sigrist CJ, Vaughan R, Zdobnov EM: The InterPro Database, 2003 brings increased
coverage and new features. Nucleic Acids Res 2003;31:315–318.
15 Bateman A, Birney E, Cerruti L, Durbin R, Etwiller L, Eddy SR, Griffiths-Jones S, Howe KL,
Marshall M, Sonnhammer EL: The Pfam protein families database. Nucleic Acids Res 2002;30:
276–280.
16 Hua S, Sun Z: Support vector machine approach for protein subcellular localization prediction.
Bioinformatics 2001;17:721–728.
17 Emanuelsson O, Nielsen H, Brunak S, von Heijne G: Predicting subcellular localization of
proteins based on their N-terminal amino acid sequence. J Mol Biol 2000;300:1005–1016.
18 Moller S, Croning MD, Apweiler R: Evaluation of methods for the prediction of membrane-
spanning regions. Bioinformatics 2001;17:646–653.
19 Krogh A, Larsson B, von Heijne G, Sonnhammer EL: Predicting transmembrane protein topology
with a hidden Markov model: Application to complete genomes. J Mol Biol 2001;305:567–580.
20 Kyte J, Doolittle RF: A simple method for displaying the hydropathic character of a protein. J Mol
Biol 1982;157:105–132.
21 Cooper CA, Gasteiger E, Packer NH: GlycoMod – A software tool for determining glycosylation
compositions from mass spectrometric data. Proteomics 2001;1:340–349.
22 Blom N, Gammeltoft S, Brunak S: Sequence and structure-based prediction of eukaryotic protein
phosphorylation sites. J Mol Biol 1999;294:1351–1362.
23 Yaffe MB, Leparc GG, Lai J, Obata T, Volinia S, Cantley LC: A motif-based profile scanning
approach for genome-wide prediction of signaling pathways. Nat Biotechnol 2001;19:348–353.
24 Raptis AE, Viberti G: Pathogenesis of diabetic nephropathy. Exp Clin Endocrinol Diabetes 2001;
109(suppl 2):424–437.
25 Lehmann R, Schleicher ED: Molecular mechanism of diabetic nephropathy. Clin Chim Acta 2000;
297:135–144.
Visith Thongboonkerd, MD
Core Proteomics Laboratory, Kidney Disease Program
Department of Medicine, University of Louisville
570 S. Preston Street, Suite 102, Louisville, KY 40202 (USA)
Tel. +1 502 8522366, Fax +1 502 8524384, E-Mail [email protected]