Identification and Sequence Analysis of Novel
Proteins in the Zebra Mussel Adhesive Apparatus
by
Arpita Gantayet
A thesis submitted in conformity with the requirements
for the degree of Masters of Applied Science
Institute of Biomaterials and Biomedical Engineering
University of Toronto
© Copyright by Arpita Gantayet 2012
Abstract
The freshwater zebra mussel Dreissena polymorpha is a biofouling species that adheres to varied
substrates underwater using a proteinaceous byssus that consists of a bundle of threads tipped
with adhesive plaques. This underwater adhesion is an inspiration for the development of
medical and dental bioadhesives; however, the byssus is highly resistant to biochemical
characterization owing to extensive cross-linking, and therefore limited information is available
on the mechanisms of adhesion and cohesion of byssal proteins. We report here on the
identification and sequence analysis of eight novel byssal proteins identified in the soluble
extract and insoluble matrix from induced, freshly secreted byssal threads with minimal cross-
linking, using gel electrophoresis and LC-MS/MS sequencing techniques. Identified byssal
proteins have theoretical molecular weights ranging from 4.1 kDa to 20.1 kDa and isoelectric
points ranging from 4.2 to 9.6 and have several common characteristics including consensus
repeat patterns, block structures and defined sequence motifs.
Acknowledgements
There are several individuals whom I would like to thank for their direction, support and
encouragement during this project. Most importantly, I would like to thank my supervisor, Dr.
Eli Sone for his guidance and mentorship and for creating such an incredible learning experience
over the course of this project. His devotion to his students, his attention to detail, his
unwavering enthusiasm and the flexibility of his supervision are truly admirable.
My sincerest gratitude goes as well to Dr. Lily Ohana for her invaluable insights and guidance
during the project and for all her help with protocol development and manuscript editing. I am
also very grateful to my committee members, Dr. Christopher Yip and Dr. Jonathan Rocheleau
for their valuable feedback and thorough insight on my research.
Additionally, I would like to thank all my colleagues in the Sone Lab: Bryan Quan, Alex Lausch,
Kyle Serkies, Jason Miklas, Mikhael Burke, Erin McNeill, Callie Bazak, Zachariah Grodzinski
and Catherine Tran for their in-depth discussions and for maintaining a fun and motivating
atmosphere in the lab. Special thanks go to Bryan and Alex for their constant willingness to help
and answer questions. Thank you as well to Kyle and Trevor Gilbert for collecting the mussels.
I would also like to acknowledge Zahra Mirzaei for her advice and assistance concerning
numerous protocols, Douglas Baumann for his help with DLS and James Holcroft for his help
with electrophoretic analysis. A big thank you as well to all the labs that allowed access to their
equipment: Dr. Craig Simmons, Dr. Ben Ganss, Dr. Molly Shoichet and Dr. Walid Houry. I
would also like to express thanks to Paul Taylor, Li Zhang and Reynaldo Interior for their advice
on proteomic analysis.
Last but not least, I would like to thank my family, my parents and my little sister, for always
believing in me, every step of the way. With their tremendous love, patience, support and
encouragement, everything becomes possible.
Table of Contents
Acknowledgements
List of Tables
List of Figures
List of Abbreviations
List of Appendices
Chapter 1: Introduction
1.1 Background
1.1.1 Introduction to zebra mussels
1.1.2 The zebra mussel byssus
1.1.3 Byssal composition in marine mussels
1.1.4 Byssal composition in zebra mussels
1.1.5 Protein identification in the zebra mussel byssus
1.1.5 Comparison of zebra mussel adhesion with adhesion in other species
1.1.6 Comparison of zebra mussel and quagga mussel byssal proteins
1.2 Motivation
1.2.1 Mussel inspired bioadhesives
1.2.2 Targeted anti-fouling strategies
1.3 Objectives
1.4 Overview
Chapter 2: Adhesive Mechanisms in Freshwater Zebra Mussels: Identification and Sequence Analysis of
Novel Proteins
2.1 Abstract
2.2 Introduction
2.3 Methods
2.3.1 Protein extraction from induced byssal threads
2.3.2 Dialysis, lyophilization and quantification of protein samples
2.3.3 Amino acid analysis
2.3.4 Tricine polyacrylamide gel electrophoresis (Tricine-PAGE) and gel silver-staining
2.3.5 Digestion of protein gel bands
2.3.6 Liquid chromatography – tandem mass spectrometry (LC-MS/MS)
2.3.7 Sequence data analysis
2.4 Results
2.4.1 Optimal conditions for zebra mussel protein extraction and analysis
2.4.2 Identification of novel foot proteins in the zebra mussel byssus
2.4.3 Comparisons of LC-MS/MS derived sequences of Dpfp1, Dpfp2 and Dpfp5
2.4.4 Sequence properties of the EST-derived sequence of the novel Dpfp5 protein
2.4.5 Sequence properties of the EST-derived sequence of Dpfp2
2.5 Discussion
2.6 Acknowledgments
Chapter 3: Novel Proteins Identified in the Insoluble Byssal Matrix of the Freshwater Zebra Mussel
Dreissena polymorpha
3.1 Abstract
3.2 Introduction
3.3 Methods
3.3.1 Protein extraction from induced byssal threads/plaques
3.3.2 Protein digestion
3.3.3 Liquid chromatography – tandem mass spectrometry (LC-MS/MS)
3.3.4 Database matching and protein identification
3.3.5 Sequence data analysis
3.4 Results and Discussion
3.4.1 Identification of novel and known proteins in base-insoluble thread and plaque matrices
3.4.2 Sequence properties of novel byssal proteins identified in the insoluble extracts
3.4.3 Proline and tyrosine (P, Y) rich proteins
3.4.4 Glycine rich proteins
3.4.5 Proline and Cysteine (P, C) rich proteins
3.4.6 Analysis of the set of zebra mussel byssal proteins identified in the insoluble matrix
3.5 Conclusion
3.6 Acknowledgements
Chapter 4: Conclusions, Preliminary work and Future Directions
4.1 Summary and Conclusions
4.2 Preliminary Additional Studies
4.2.1 Comparing zebra and quagga mussel byssal proteins
4.2.2 Peptide mimics: an insight into byssal protein interactions
4.3 Future work
4.3.1 Identification of other novel zebra mussel byssal proteins
4.3.2 Determining protein distribution within the byssal plaque
4.3.3 Characterizing structure and chemical reactivity of byssal proteins
4.4 Significance and Conclusions
Appendix A: Quagga Mussel Adhesion: Novel Proteins and their Byssal Distribution
Appendix B: Peptide Mimics of the Zebra Mussel Byssal Protein Dpfp1
References
List of Tables
Table 1-1. Summary of the location, function, prominent amino acid content and sequence properties of
known marine mussel byssal proteins. Each protein is summarized from one of three marine mussel
species: Mytilus edulis (Me), Mytilus californianus (Mc) and Mytilus galloprovincialis (Mg).
Table 1-2. Zebra mussel foot proteins Dpfp1 - 3.
Table 1-3. Tandem repeat sequences from marine mussel, freshwater mussel and other aquatic species.
Table 2-1. Summary of molecular weight, DOPA content and sequence information of the three known D.
polymorpha foot proteins (Dpfp).
Table 2-2. Comparisons of the amino acid compositions in mole % (number of residues per 100 residues)
in zebra mussel mature and induced thread/plaques and in soluble thread and plaque extracts.
Table 2-3. Comparisons of three zebra mussel byssal proteins sequenced by LC-MS/MS.
Table 3-1. Summary of the molecular weight and sequence information available for the six identified
zebra mussel byssal proteins (Dpfp), in decreasing order of their molecular weights as determined by gel
electrophoresis.
Table 3-2. Sequences of novel byssal proteins identified in insoluble plaque and thread extracts by
LC-MS/MS analysis and database matching against a zebra mussel foot protein cDNA library. The proteins
have been named Dpfp6 – Dpfp12 in decreasing order of their molecular weights (MW).
Table 3-3. Theoretical mole % compositions of prominent amino acids found in the sequences of
previously known zebra mussel byssal proteins and novel byssal proteins identified in the insoluble
byssal extracts.
List of Figures
Figure 1-1. Underwater attachment of a zebra mussel to an aquarium wall by means of its proteinaceous
byssus, secreted by the ‘foot’.
Figure 1-2. Illustration of the structure of the zebra mussel byssus and its adhesive plaque-substrate
interface, adapted with permission [6]. (A) Schematic of the macrostructure of the byssus. (B)
Transmission Electron Microscopy image of the plaque-substrate interface depicting a 10-20 nm
interfacial adhesive layer.
Figure 1-3. Illustration of possible DOPA-mediated interactions in the mussel byssus, adapted from [17].
Figure 1-4. Spatial distribution of byssal proteins in the marine mussel Mytilus edulis; reproduced with
permission [19].
Figure 1-5. Pattern of tandem consensus repeats in the primary sequence of foot-derived Dpfp-1
(AF265353.1). The 22 N-terminal repeats of P(V/E)YP(T/S)(K/Q)X are italicized and the 15 highly
conserved C-terminal repeats of KPGPYDYDGPYDK are bolded.
Figure 2-1. Electrophoretic identification of zebra mussel byssal proteins. (A) Byssal proteins identified
in an extract from 65 complete byssal threads (Byssal T/P). (B) Byssal proteins identified in the extracts
from 100 separated threads and 100 separated plaques.
Figure 2-2. Alignment of the multiple EST sequence matches derived for the Dpfp5 gel band.
Figure 2-3. Illustration of the consensus sequence repeats identified in the EST derived sequence of
Dpfp5 (AM230139).
Figure 2-4. Alignment of the multiple EST sequence matches derived for the Dpfp2 (22 kDa) gel band.
Figure 2-5. Illustration of the tandem repeat pattern identified in the EST derived sequence of Dpfp2
(AM229730). (A) Sequence depicting five full repeats of a 22 residue consensus sequence
KTY(P/E)AYPTK(Q/D)SYPVYPEKKYTE where non-italicized residues represent highly conserved
residues. (B) Kyte-Doolittle hydropathy plot of the sequence.
Figure 3-1. Sequence analysis of the EST derived sequence of Dpfp6 (AM229723). (A) Repeat pattern of
the consensus sequence KPGPYDYDGPYDK. (B) Sequence alignment of the Dpfp6 sequence with the
C-terminus (residues 230 – 430) of previously described byssal protein Dpfp1 (AF265353).
Figure 3-2. Illustration of the pattern of sequence repeats in the clustered EST derived sequence of
Dpfp12.
Figure 3-3. Sequence alignment of the EST derived sequences of Dpfp7. (A) Alignment of the three
variants of Dpfp7 (Dpfp7α, β and γ) amongst each other. (B) Alignment of the Dpfp7α sequence with the
EST derived sequence of Dpfp5 (AM230139) described previously.
Figure 3-4. Illustration of repeat patterns in the EST derived sequence of Dpfp9 obtained by clustering the
Dpfp9α and Dpfp9β sequences.
Figure 3-5. Sequence alignment of cysteine containing byssal proteins: Dpfp10, Dpfp11α and previously
described Dpfp5 (AM230139) [73].
Figure A-1. Electrophoretic identification of quagga mussel byssal proteins.
Figure A-2. Electrophoretic determination of the distribution of quagga mussel byssal proteins between
thread and plaque.
Figure A-3. Matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF
MS) analysis of the quagga mussel thread and plaque from an induced, freshly secreted byssal thread.
Figure B-1. Circular Dichroism spectrum of a 2:1 Fe3+:fusion peptide solution in BisTris buffer.
Figure B-2. Dynamic Light Scattering measurements of the effect of iron (III) to fusion peptide ratio on
size of aggregates formed.
Figure B-3. Transmission Electron Microscopy (TEM) images depicting the effect of two Fe3+:fusion
peptide (FP) ratios (2:1 and 1:2) on the size of aggregates formed.
List of Abbreviations
DOPA: 3,4-Dihydroxyphenylalanine
preCOLS: Prepepsinized collagens
Tmp: Thread matrix protein
Mfp: Mussel foot protein (marine)
Dpfp: Dreissena polymorpha foot protein
Dbfp: Dreissena bugensis foot protein
SDS-PAGE: Sodium dodecyl sulfate polyacrylamide gel electrophoresis
AU-PAGE: Acetic acid urea polyacrylamide gel electrophoresis
Tricine PAGE: Tricine polyacrylamide gel electrophoresis
MALDI-TOF MS: Matrix-assisted laser desorption ionization time-of-flight mass spectrometry
LC-MS/MS: Liquid chromatography - tandem mass spectrometry
PFF: Peptide fragment fingerprinting
EB: Extraction buffer
MW: Molecular weight
pI: Isoelectric point
List of Appendices
Appendix A: Quagga Mussel Adhesion: Novel Proteins and their Byssal Distribution
Appendix B: Peptide Mimics of the Zebra Mussel Byssal Protein Dpfp1
Chapter 1:
Introduction
1.1 Background
1.1.1 Introduction to zebra mussels
Zebra mussels (Dreissena polymorpha) are small freshwater bivalves that are able to adhere
strongly to a variety of substrates in their underwater habitats. The species is native to the Black,
Caspian and Azov Seas but had spread through several parts of Europe by the early
1800s by way of man-made canals and attachment to the hulls of shipping vessels [1]. This
invasive species was accidentally introduced into the Great Lakes in the late 1980s, likely by a
transoceanic shipping vessel, and has rapidly spread through North American water bodies ever
since [2]. Over the years, the zebra mussels have had major negative impacts on the economy
and ecology in the Great Lakes and waterways [2]. Owing to their ability to stick to the shells of
other mussels and form layered clumps, the mussels are able to clog water intake and distribution
pipes in industrial and domestic facilities and are able to completely encrust boat hulls [1]. They
also interfere with boating and navigation, foul beaches, increase corrosion and contaminate
potable water [1]. Their impact on the ecosystem includes the displacement of native bivalve
species and destruction of fish habitats and populations by disrupting the food web [1]. The
invasive zebra mussels are therefore a major source of biofouling in North American water
bodies.
Mature zebra mussels are sessile organisms that attach to surfaces by means of a proteinaceous
structure called the byssus that consists of a number of threads with adhesive pads called plaques
at the tips. Shells of mature mussels are typically around 2.5 cm in length, though larger shells
can be up to 4 cm long [1]. Striped markings on the shell give these mussels their name. These
markings can vary, however, and hence the species is called polymorpha, which means ‘many
forms’ [1]. Zebra mussels are a dioecious species, having two sexes, which release gametes into
the water column for fertilization to occur. The fertilized eggs form larvae called veligers within
3 – 5 days [3]. Veligers have a structure called the velum that enables rapid mobility and
dispersion to new areas, and they reach adulthood and sexual maturity within a year [3]. Zebra
mussels are filter feeders that primarily feed by filtering algae, zooplankton and other organic
matter through their inhalant and exhalant siphons [1] (Figure 1-1).
On the ventral side of the shell, zebra mussels have an opening called the pedal gape that allows
for extension of the holdfast called the byssus and a muscular organ called the ‘foot’ (Figure 1-1).
The ‘foot’ aids in mussel locomotion but, more importantly, houses the gland that
produces, stores and secretes the byssal precursor proteins [1]. During byssogenesis, the byssal
precursors are secreted into a cleft at the base of the foot called the ventral groove. The byssal
threads are then formed from the precursors by contractions of the foot and are deposited onto
the substrate for attachment, through the pedal gape [4]. Figure 1-1 shows the ventral side of a
zebra mussel sticking to the wall of an aquarium. The image displays the ‘foot’ extending from
the pedal gape and even a mature byssus mediating underwater attachment. When the mussel
chooses to move or get rid of a damaged byssus, it secretes enzymes that break down byssal
material at the proximal end near the foot and then discards the byssus [1].
The closely related freshwater mussel Dreissena bugensis, or quagga mussel, is also a source of
biofouling in the North American Great Lakes and waterways and also uses a byssus to
mediate adhesion [5]. It can be distinguished from the zebra mussel by several physical
characteristics including size, shell shape, ventral angle and position of the pedal gape [1]. Most
prominently, the bottom surface of the quagga mussel shell is convex whereas in zebra mussels,
it is concave or flat. Therefore, when placed on a flat surface, the zebra mussel stays upright but
the quagga mussel topples or tilts to a side [1].
Figure 1-1. Underwater attachment of a zebra mussel to an aquarium wall by means of its
proteinaceous byssus, secreted by the ‘foot’.
1.1.2 The zebra mussel byssus
The zebra mussel byssus is the proteinaceous apparatus employed by the mussel to anchor to a
variety of substrates underwater [1]. A schematic of the byssus is illustrated in Figure 1-2A,
depicting a number of threads with adhesive plaques at the tips. The byssus (~ 10 mm long from
stem to plaque) [6] consists of up to 600 threads that bundle together at the proximal end into a
stem that arises from an attachment point in the foot, called the root [1]. Zebra mussel threads
attach to the stem as a bundle, unlike the threads of marine mussels, which
attach along the sides of the stem [6], [7]. Each zebra mussel byssal thread (~ 20 – 50
µm wide) has an interior composed of longitudinal fibers, an exterior cuticle and an adhesive
plaque (~ 1 mm wide) at the distal tip that has a fibrous morphology and can be either dense or
porous [6]. The ultrastructure of the plaque-substrate interface reveals an approximately 10-20
nm adhesive layer that makes direct and continuous contact with the surface and is left behind on
the substrate when the bulk plaque matrix is pulled off [6]. This layer also shows greater electron
density during transmission electron microscopy as compared to the rest of the plaque [6].
Figure 1-2. Illustration of the structure of the zebra mussel byssus and its adhesive plaque-
substrate interface, adapted with permission [6]. (A) Schematic of the macrostructure of the
byssus. (B) Transmission Electron Microscopy image of the plaque-substrate interface depicting
a 10-20 nm interfacial adhesive layer.
While the zebra mussel and closely related quagga mussel are two of the few freshwater mussel
species known to possess a byssus, marine mussels, such as Mytilus edulis, M. californianus, M.
galloprovincialis, Perna canaliculus and Brachidontes exustus, are more commonly known to
possess byssi. Interestingly, the freshwater and marine mussels have evolved independently of
each other, as different subclasses [3], [8], to develop a byssus that is superficially very similar;
however, the overall composition and distribution of amino acids within the byssus vary
between the species [9], [10]. One of the most notable similarities between the dreissenid and
mytilid species is that the byssi of both are composed of proteins containing the rare amino acid
3,4-dihydroxyphenylalanine (DOPA). The fact that the species have evolved independently to
incorporate this same residue indicates that DOPA must play an important role in mussel
adhesion [10]. DOPA is a post-translational modification of tyrosine, formed by catechol
oxidase-mediated hydroxylation [9]. In the marine mussel byssus, DOPA is known to undergo
a number of different kinds of interactions and is responsible for both adhesive and cohesive
functions in the byssus (Figure 1-3). In its native form DOPA can bind to metal oxide surfaces
and thereby mediate mussel adhesion to surfaces [11]. Additionally, as demonstrated in marine
mussel proteins [12], DOPA can form bis- or tris-complexes with iron (III) ions and thus
form metal-mediated cohesive cross-links among DOPA-containing proteins (Figure 1-3).
Such interactions lead to the hardness and cohesive strength of the cuticle [13] and can also
explain a mode of adhesion to iron-containing surfaces [14]. Thirdly, DOPA is frequently
oxidized to DOPA quinone by the enzyme catechol oxidase and under basic pH conditions
(Figure 1-3). DOPA quinone can then undergo cross-linking with other DOPA and lysine
residues [15] and can form cysteinyldopa adducts with cysteine residues [16], thereby leading to
extensive covalent cross-linking and cohesion among byssal proteins [9].
Figure 1-3. Illustration of possible DOPA-mediated interactions in the mussel byssus, adapted
from [17].
1.1.3 Byssal composition in marine mussels
The marine mussels have been studied much more extensively than zebra mussels and hence, a
majority of the information available on byssal protein compositions has been derived from these
saltwater species, especially from the genus Mytilus [18]. In the marine mussel byssus, the
protein composition of the thread is very distinct from that of the plaque. While the majority of the
DOPA-containing proteins are present in the plaque, the thread is composed mainly of collagen-like
load-bearing proteins called preCOLS (prepepsinized collagens). The DOPA-containing proteins
in the byssus are often heavily cross-linked and render the mature structure quite resistant to
extraction. Hence, these are frequently first identified as precursor proteins in the secretory
organ, the foot, and are subsequently studied in the byssus. Since the majority of the proteins
are isolated from the foot, they are named foot proteins (‘fp’), preceded by the first letters of the
genus and species from which they were isolated and followed by a numerical identifier. For
example, Mefp-1 and Mcfp-1 are DOPA-containing proteins in Mytilus edulis and Mytilus
californianus, respectively. While the same numerical identifier is generally indicative of similar
protein sequences, byssal functions and byssal distributions within the mytilid species, the
identifiers do not correlate directly to freshwater mussel proteins.
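The naming convention described above can be illustrated with a minimal sketch. The function name, the class and the regular expression below are hypothetical and introduced only for illustration; they assume species-specific names of the form genus initial, species initial, ‘fp’, an optional hyphen and a number, as in Mefp-1 or Dpfp1.

```python
import re
from typing import NamedTuple, Optional


class FootProteinName(NamedTuple):
    genus_initial: str    # e.g. "M" for Mytilus
    species_initial: str  # e.g. "e" for edulis
    number: int           # numerical identifier, e.g. 1


# Assumed pattern (not taken from the thesis): genus initial, species
# initial, "fp", optional hyphen, numerical identifier.
_NAME_PATTERN = re.compile(r"^([A-Z])([a-z])fp-?(\d+)$")


def parse_foot_protein_name(name: str) -> Optional[FootProteinName]:
    """Split a species-specific foot protein name into its parts.

    Returns None for names that do not follow the assumed pattern,
    such as the generic marine label "mfp-3".
    """
    match = _NAME_PATTERN.match(name)
    if match is None:
        return None
    genus, species, number = match.groups()
    return FootProteinName(genus, species, int(number))


print(parse_foot_protein_name("Mefp-1"))
print(parse_foot_protein_name("Dpfp1"))
```

Under this sketch, Mefp-1 and Mcfp-1 share a numerical identifier but differ in the species initial, which mirrors how homologs are labelled across the mytilid species.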
The distribution of proteins within the mussel byssus is often very closely related to their byssal
roles. Figure 1-4 illustrates the distribution of DOPA-containing and collagenous proteins within
the byssus of M. edulis [19]. As mentioned earlier, most of the DOPA-containing proteins are
present in the plaque, where they can play different roles: as adhesives between plaque and
substrate, as structural matrix cohesion proteins, as linker proteins between thread and plaque,
and as protective proteins in the cuticle [18]. Thus, within the plaque itself, the foot proteins can
be distributed in the footprint (mfp-3, mfp-5, mfp-6), within the plaque foam (mfp-2, mfp-4) or in the plaque
cuticle (mfp-1) [18] (Figure 1-4). Table 1-1 summarizes the byssal distribution and functions of
the six marine mussel foot proteins (mfps) identified thus far. While homologs of each protein
have been identified in different mussel species, Table 1-1 summarizes just one homolog
belonging to the M. edulis (Me), M. californianus (Mc) or M. galloprovincialis (Mg) species. In
order to better correlate the byssal protein distributions and functions with the sequence
properties of the protein, Table 1-1 also describes the molecular weight (MW), isoelectric point
(pI), DOPA content and prominent amino acid compositions within each of the foot proteins.
Thus, the adhesive proteins, mfp-3 and mfp-5 have the highest DOPA content (> 20%), followed
by the cuticle protein mfp-1 which has approximately 13% DOPA and then followed by the
structural bulk matrix proteins which have < 5% DOPA [20]. Additionally, the plaque footprint
proteins are generally smaller than the rest of the byssal proteins (Table 1-1) [20].
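The DOPA ranking described in this paragraph can be reproduced directly from the approximate values listed in Table 1-1. The sketch below is illustrative only; the tuple layout and the name `rank_by_dopa` are assumptions, and the approximate (~) values from the table are entered as plain numbers.

```python
# (protein, approximate DOPA content in mol%, byssal distribution), from Table 1-1
MFP_DOPA = [
    ("Mfp-1", 13, "cuticle"),
    ("Mfp-2", 3, "plaque foam"),
    ("Mfp-3", 20, "plaque footprint"),
    ("Mfp-4", 5, "plaque foam"),
    ("Mfp-5", 30, "plaque footprint"),
    ("Mfp-6", 4, "plaque footprint"),
]


def rank_by_dopa(rows):
    """Return the rows sorted from highest to lowest DOPA content."""
    return sorted(rows, key=lambda row: row[1], reverse=True)


ranked = rank_by_dopa(MFP_DOPA)

if __name__ == "__main__":
    for name, dopa, location in ranked:
        print(f"{name}: ~{dopa} mol% DOPA ({location})")
```

Sorting in this way places the two adhesive footprint proteins first and the cuticle protein next, followed by the proteins with about 5 mol% DOPA or less.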
The proteins of the thread, the preCOLS and the thread matrix proteins (TMPs), are also
described in Table 1-1. PreCOLS are composed of a bent collagen core with variable flanking
sequences and histidine rich sequences at the termini [21]. At the distal end near the plaque,
preCOL-D with silk fibroin-like flanking sequences provides the mechanical stiffness needed at
the substrate [22]. At the proximal end near the mussel, preCOL-P with elastin-like flanking
sequences provides the mechanical flexibility needed near the foot tissue [23]. This gradient
distribution of preCOLS thus allows optimal substrate adhesion without damaging the foot tissue
[9]. A non-gradient preCOL called preCOL-NG, believed to mediate the preCOL-P/D fibers, has
glycine rich flanking sequences and is distributed uniformly through the thread [24].
Additionally, thread matrix proteins (TMPs) have been identified in the thread that provide a
viscoelastic matrix to separate and possibly lubricate collagenous microfibrils during tension-
induced deformation [25].
Figure 1-4. Spatial distribution of byssal proteins in the marine mussel Mytilus edulis;
reproduced with permission [19].
Table 1-1. Summary of the location, function, prominent amino acid content and sequence properties
of known marine mussel byssal proteins. Each protein is summarized from one of three marine
mussel species: Mytilus edulis (Me), Mytilus californianus (Mc) and Mytilus galloprovincialis (Mg).
(Phospho) and (hydroxy) indicate phosphoserine and hydroxyarginine modifications, respectively.
Marine Mussel Protein | Byssal Distribution | Byssal Function | Prominent Amino Acids (mol%): 1st | 2nd | 3rd | DOPA [26] | MW (kDa), pI | Species [Ref.]
Mfp-1 | Cuticle on thread and plaque | Protects byssal core | P (24) | K (20) | Y (19) | ~13 | ~110 kDa, pI >10 | Me [27]
Mfp-2 | Plaque foam | Cohesion, load spreading | C (15) | G (14) | K (12) | ~3 | ~40 kDa, pI ~9 | Me [28]
Mfp-3 | Plaque footprint | Adhesion | G (20-25) | Y (20-23) | R (16-21) (hydroxy) | ~20 | ~6 kDa, pI >11 | Me [29]
Mfp-4 | Plaque foam at thread anchor zone | Links plaque proteins to thread collagens | H (24) | V (13) | N/D (11) | ~5 | ~80 kDa, pI 10.2 | Mc [30]
Mfp-5 | Plaque footprint | Adhesion | G (21) | K (20) | S (11) (phospho) | ~30 | ~9.5 kDa, pI ~9 | Me [31]
Mfp-6 | Plaque footprint | Restores DOPA adhesion, mediates crosslinks [16] | Y (20) | G (14), N/D (14) | C (11) | ~4 | ~11 kDa, pI 9.5 | Mc [32]
Tmp-1 | Thread matrix | Separates collagen fibrils | G (33) | Y (18) | N/D (17) | 3 | 57 kDa, pI 9.5 | Mg [25]
PreCOL-P | Proximal thread | Load-bearing, elastic | G (39) | P (14) | A (9) | < 1 | 76 kDa, pI 11.6 | Me [23]
PreCOL-D | Distal thread | Load-bearing, stiff | G (36) | A (18) | P (13) | < 1 | 78 kDa, pI 10.1 | Me [22]
PreCOL-NG | Throughout thread | Load bearing | G (39) | A (15) | P (11) | < 1 | 76 kDa, pI 8.0 | Me [24]
1.1.4 Byssal composition in zebra mussels
There are several major differences in the byssal composition of the marine mussel and zebra
mussel. A major distinction is that the marine mussel has distinct amino acid compositions
between the thread and plaque whereas the zebra mussel byssus contains similar amino acid
compositions between the byssal regions (maximum difference is 2.4 mol%) [10]. The zebra
mussel thread and plaque also have similar DOPA compositions (~0.6 mol%) and both lack
hydroxyproline which is a significant characteristic of collagenous proteins [10]. Thus, while the
marine mussel plaque and thread proteins can be classified as being DOPA-containing and
collagen-like respectively, the zebra mussel byssus contains DOPA-containing proteins all the
way through, though this total DOPA content is lower than in marine mussels [10]. Another
distinction from the marine mussel byssus is the glycosylation of serine and, predominantly,
threonine residues, specifically by O-linked galactosamines [33]. Such glycosylation is not seen in the
marine mussels [20].
While zebra mussels do not display the spatial differences in overall byssal protein compositions
seen in marine mussels, they do maintain spatial control over DOPA oxidation and individual
protein distribution. Upon chemical maturation (within the first 24 hours of thread deposition),
loss of DOPA staining is clearly observed within the thread and plaque bulk matrix but not at the
plaque-substrate interface [10]. Thus, even as DOPA residues in the bulk matrix are oxidized and/or
undergo other non-covalent interactions over time, unoxidized DOPA residues are maintained at
the interface layer responsible for adhesion [10], [17]. One way that the mussel maintains this
spatial control is by controlling the levels of catechol oxidase between different regions of the
byssus [17]. The presence of higher levels of catechol oxidase in the thread and plaque interior
versus at the thread-plaque interface allows for greater cohesive cross-linking in the bulk matrix
versus greater adhesion by uncrosslinked DOPA at the interface [17] (Figure 1-3). Further,
MALDI-TOF mass spectrometry of the thread, plaque and plaque adhesive interface has
revealed spatial differences in the distribution of individual protein peaks between the byssal
regions [34]. A higher electron density of the 10 – 20 nm adhesive layer in comparison with the
bulk plaque matrix additionally suggests a distinct composition of this layer [6].
1.1.5 Protein identification in the zebra mussel byssus
While much information is available on the protein composition of the marine mussel byssus,
such information is limited in the zebra mussel. Extensive DOPA cross-linking renders the
mature zebra mussel byssus highly resistant to biochemical characterization by techniques such
as solubilisation and immunohistochemical labelling [10], [33], [35]. While in marine mussels,
extractions from the foot and the byssus can be done with acidic buffers that are preferred for
easily oxidized DOPA proteins, the zebra mussel DOPA-containing proteins are only extractable
in alkaline borate/urea buffers [33]. Using this basic extraction of proteins from the mussel foot,
Rzepecki and Waite (1993) identified three protein bands that stained for DOPA on sodium
dodecyl sulfate and acetic acid-urea polyacrylamide gels (SDS-PAGE and AU-PAGE) [33].
These proteins were assumed to be byssal proteins based on their ability to stain for DOPA and
were called Dreissena polymorpha foot proteins; Dpfp1, Dpfp2 and Dpfp3 [33]. While the
extraction of proteins from the mussel foot allowed the useful identification of precursor byssal
proteins, it does not necessarily confirm that the identified proteins are present in the byssus and
does not provide any information on the distribution of these proteins between different regions
of the byssus [33]. In the same study, the presence of Dpfp1 and Dpfp2 in the byssus was thus
confirmed by acidic extraction from mature threads and electrophoresis on an AU-PAGE gel
[33]. Additionally, the analysis of separated thread and plaque extracts on this gel revealed
potential Dpfp1 bands in both the thread and plaque and a potential Dpfp2 band uniquely in the
thread; however, broad smearing of the proteins did not allow any firm conclusions on their
localizations [33].
Of the three zebra mussel byssal proteins identified thus far, Dpfp1 has been best characterized.
Table 1-2 summarizes the molecular weight, pI, DOPA content and sequence information
available for Dpfp1, 2 and 3. Dpfp1, with an unusual acidic pI that makes it distinct from marine
mussel byssal proteins (Table 1-1), appears to have two protein forms which run at 76 and 65
kDa on a PAGE gel, possibly representing sequence variants or variants with differing
hydroxylation and glycosylation modifications [33]. The presence of two variants has also been
confirmed by MALDI-TOF mass spectrometry which provided more accurate MWs at 54.5 and
48.6 kDa respectively [35]. Dpfp1 and Dpfp2 (26 kDa, basic pI) have somewhat similar
maximum DOPA contents of 6.6 and 7 mol% respectively and are more easily resolved and
isolated than Dpfp3, which runs as a cluster of small proteins in the range of 12 – 13 kDa [33].
Dpfp1 is the only zebra mussel byssal protein for which the full primary sequence is known [35].
Due to the protease resistance of the protein, the primary structure of Dpfp1 was determined by
reverse transcription of mRNA in the mussel foot into cDNA followed by deduction of the
primary sequence from overlapping cDNA sequences [35]. With Dpfp2 on the other hand, only
fragments of the protein sequence have been determined using Edman degradation analysis [33].
This analysis classified the Dpfp2 peptide fragments somewhat arbitrarily into repeats of two
motifs; K(K/T)Y(X/P)E and *Y(P/X)*(Y/K)*D where * represents a variable amino acid, Y
represents DOPA and X represents a glycosylation of serine or threonine [33].
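The two Dpfp2 motifs can be written as regular expressions over a one-letter alphabet. The sketch below assumes, following the convention stated above, that Y encodes DOPA, X encodes a glycosylated serine or threonine and the variable position (*) matches any residue. The function name `find_motifs` and the test peptides are hypothetical and are not Dpfp2 sequence data.

```python
import re

# K(K/T)Y(X/P)E: Lys, then Lys or Thr, DOPA, glycosylated Ser/Thr or Pro, Glu.
MOTIF_1 = re.compile(r"K[KT]Y[XP]E")

# *Y(P/X)*(Y/K)*D: "*" (any residue) is written as "." in the regular expression.
MOTIF_2 = re.compile(r".Y[PX].[YK].D")


def find_motifs(peptide: str) -> dict:
    """Return the start positions of non-overlapping matches of each motif."""
    return {
        "motif_1": [m.start() for m in MOTIF_1.finditer(peptide)],
        "motif_2": [m.start() for m in MOTIF_2.finditer(peptide)],
    }


if __name__ == "__main__":
    # Made-up peptides used only to exercise the patterns.
    for peptide in ("KKYPEKTYXE", "AYPGKSD"):
        print(peptide, find_motifs(peptide))
```

Because `finditer` reports non-overlapping matches, overlapping occurrences of a motif would need a lookahead-based pattern instead.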
Table 1-2. Zebra mussel foot proteins Dpfp1 - 3
Dreissena polymorpha foot protein | MW by Gel Electrophoresis (kDa) [33] | MW by MALDI-TOF (kDa) [35] | Maximum DOPA Content [33] | pI [33] | Sequence Information Available
Dpfp-1 | 76 and 65 | 54.5 and 48.6 | 6.6% | 5.3 - 6.5 | Primary sequence [35]
Dpfp-2 | 26 | Unknown | 7% | > 9 | Peptide fragment sequences [33]
Dpfp-3 | 12-13 | Unknown | Unknown | Unknown | None
As depicted in Figure 1-5, the deduced sequence of Dpfp1 is dominated by tandem repeats of
two consensus sequences; 22 repeats of a somewhat variable heptapeptide
P(V/E)YP(T/S)(K/Q)X at the N-terminus and 15 highly conserved repeats of a tridecapeptide
KPGPYDYDGPYDK at the C-terminus [35]. While galactosamine modifications to serine and
threonine residues are more extensive in the N-terminus, tyrosine hydroxylations to DOPA are
more frequently observed in the C-terminus [35]. Within the C-terminal repeat, the DOPA (Y)
modification is observed specifically on the first tyrosine in the KPGPYDYDGPYDK repeat and
makes up 40% of the protein’s tyrosine content [33], [35]. Additionally, the N-terminus of Dpfp1
is moderately basic (pI 8.7) and its C-terminus is quite acidic (pI 4.7). Thus, Dpfp1 has a block
polymer like structure consisting of distinct post-translational modifications, repeat patterns and
isoelectric points between the N and C-terminus, suggesting that the segregated motifs might
play a specific role in the architecture and mechanism of assembly of the protein [35]. The
deduced Dpfp1 sequence has additionally been used to develop a recombinant version of the
protein to produce a Dpfp1 antibody in rabbits, for use in immunolocalization analysis [4].
Though Dpfp1 could not be localized within the mature byssal thread (likely due to masking of
epitopes), it was identified in acid-extracted and homogenized byssal threads and in the foot
tissue specifically (not in other control tissues), thus indicating that it is a precursor protein
produced, stored and secreted in the mussel foot [4]. Additionally, in the foot, Dpfp1 was found
uniformly localized in the byssal canal in secretory granules surrounding the ventral groove and
therefore might be evenly distributed throughout the byssus, thus suggesting that it might possess a
load-bearing structural role within the byssus [4].
MFSVVSFCLLAAGFGSSLGGSSDWTEKTSQSTIPTISGWSFFTTKSPLNPTLFTTKR
PEYVTLS PVYPTKI PNYTTKP PVYPTKV PEYPTKD PTYPTFKT PEYPTKV PEYPTKV
PTYPTFQT PEYPTPTKY PVYPSQS PAYPTQY PEYPSQY PVYPDQY PVYPNQY PVKQDHD
PVYPPRS PLYGWRR PVYPKKT PVYPYL PLYPGYQ PEYHRRP PVYP PVYPY DPVEDK
KPGPYDYDGPYDK NPGPYDYDGPYNK KPNPYGTDWQYDK KTGPYVPIKPDDK
KPNPYGTDWQYDK KTGPYVPDKSEDK KPGPYDYDGPYDK NPGPYDSDGPYNK
KPGPYDYDGPYDK NPGPYDYNGPYDK KPGPYDYDGPYDI KPGPYDYDVPYDK
KPDPYDTDGPYDK KTGPYVPDKPDDK KTDPYVPDVPLEP PGPLGK
Figure 1-5. Pattern of tandem consensus repeats in the primary sequence of foot-derived Dpfp-1
(AF265353.1). Consensus repeat sequences are underlined. The 22 N-terminal repeats of
P(V/E)YP(T/S)(K/Q)X are italicized and the 15 highly conserved C-terminal repeats of
KPGPYDYDGPYDK are bolded [35]. The N-terminal signal peptide is bolded and italicized.
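The degree of conservation of the C-terminal repeats can be examined by comparing each 13-residue block with the consensus. The following is a minimal sketch rather than the analysis used in [35]: the blocks are transcribed from Figure 1-5 as printed, the trailing PGPLGK segment is left out, and the simple per-position mismatch count (the name `mismatches` is an assumption) ignores post-translational modifications.

```python
from collections import Counter

CONSENSUS = "KPGPYDYDGPYDK"

# The fifteen C-terminal tridecapeptide blocks, in the order shown in Figure 1-5.
C_TERMINAL_BLOCKS = """
KPGPYDYDGPYDK NPGPYDYDGPYNK KPNPYGTDWQYDK KTGPYVPIKPDDK
KPNPYGTDWQYDK KTGPYVPDKSEDK KPGPYDYDGPYDK NPGPYDSDGPYNK
KPGPYDYDGPYDK NPGPYDYNGPYDK KPGPYDYDGPYDI KPGPYDYDVPYDK
KPDPYDTDGPYDK KTGPYVPDKPDDK KTDPYVPDVPLEP
""".split()


def mismatches(block: str, consensus: str = CONSENSUS) -> int:
    """Count the positions at which a block differs from the consensus."""
    if len(block) != len(consensus):
        raise ValueError("block and consensus must have the same length")
    return sum(a != b for a, b in zip(block, consensus))


# Number of blocks observed at each mismatch count.
distribution = Counter(mismatches(block) for block in C_TERMINAL_BLOCKS)

if __name__ == "__main__":
    for block in C_TERMINAL_BLOCKS:
        print(block, mismatches(block))
    for count in sorted(distribution):
        print(f"{count} mismatch(es): {distribution[count]} block(s)")
```

The same function could be applied to the N-terminal heptapeptides once a single reference sequence is chosen for the variable positions of P(V/E)YP(T/S)(K/Q)X.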
Complementing the identification of comparatively high molecular weight byssal proteins by gel
electrophoresis, MALDI-TOF MS analysis of the zebra mussel byssus has revealed the presence
of several low molecular weight byssal proteins ranging from 3.7 to 7 kDa [34]. While this
analysis does not characterize individual proteins, it identifies distinctive differences in the
distribution of byssal proteins between thread, plaque and plaque footprint [34]. Interestingly, in
spite of similar amino acid compositions between the plaque and thread, the thread and plaque
bulk have mass spectra that are almost completely non-overlapping except for peaks at 4.5 and
4.6 kDa [34]. Also, as supported by differences in electron density between the plaque bulk and
plaque-substrate interface [6], the plaque footprint has proteins in the 5.8 to 7 kDa range that are
absent in the plaque bulk. Interestingly however, some of these interface protein spectra also
overlap with the thread bulk spectra [34]. The presence of these proteins in the plaque footprint
and thread bulk but not in the plaque bulk thus indicates a high level of spatial control in the
byssus [34]. Significantly as well, there are a number of protein spectra that are identified
uniquely in the plaque footprint including a peak at 5892 Da and another at 6399 Da that has a
hydroxylated counterpart which likely contains a DOPA modification [34]. These could very
well represent byssal proteins with adhesive functions as witnessed with the low molecular
weight proteins in marine mussels (Table 1-1) [20], [34]. A comparison of overall protein
secondary structure between thread, plaque and plaque interface additionally revealed the
predominance of β-sheets between all three regions [14].
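A hydroxylated counterpart of a peak is expected roughly 16 Da higher, since hydroxylation of tyrosine to DOPA adds one oxygen atom. The sketch below shows how a peak list could be scanned for such pairs. The function name, the 1 Da tolerance and the example peak list are assumptions for illustration: only the 5892 and 6399 Da values come from the text, and the third entry is the calculated position of a hydroxylated counterpart, not a reported measurement.

```python
OXYGEN_MASS_DA = 15.999  # average atomic mass of oxygen


def hydroxylation_pairs(peaks_da, tolerance_da=1.0):
    """Return (lower, higher) peak pairs separated by about one oxygen mass."""
    peaks = sorted(peaks_da)
    pairs = []
    for i, low in enumerate(peaks):
        for high in peaks[i + 1:]:
            if abs((high - low) - OXYGEN_MASS_DA) <= tolerance_da:
                pairs.append((low, high))
    return pairs


if __name__ == "__main__":
    # 5892 and 6399 Da are quoted in the text; the last value is calculated.
    example_peaks = [5892.0, 6399.0, 6399.0 + OXYGEN_MASS_DA]
    print(hydroxylation_pairs(example_peaks))
```

A mass difference of this size is consistent with a single hydroxylation but does not by itself identify the modified residue.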
1.1.5 Comparison of zebra mussel adhesion with adhesion in other species
In addition to the marine and freshwater mussels, underwater adhesion is a characteristic of
several aquatic species including sandcastle worms [36], barnacles [37], starfish [38], sea
cucumbers [39] and caddisfly larvae [40]. Comparisons of protein compositions between species
can thus be useful in determining common prerequisites of water-resistant adhesion. The
sandcastle worm, Phragmatopoma californica, which builds its habitat by sticking sand grains
together underwater, has also evolved independently of the mussels (as a different phylum) to
incorporate DOPA residues in its cement proteins [41]. In the sandcastle worms however,
adhesion/cohesion from DOPA-containing proteins is complemented by adhesion from proteins
significantly rich in phosphorylated serine [41]. Interestingly as well, unlike marine mussels
which generally possess basic adhesive/cohesive proteins, the sandcastle worm has a mix of
acidic and basic cement proteins [42] which makes it comparable to the zebra mussel byssal
mixture (albeit currently incomplete) of acidic and basic proteins.
One of the most significant similarities between the zebra mussel byssal protein Dpfp1 and
adhesive proteins from other species is the presence of prominent tandem repeats in the protein
primary sequences. Table 1-3 displays a list of consensus sequences that are characteristic of
different byssal proteins from different marine mussel species [20] and from some less-studied
freshwater mussel species including Limnoperna fortunei [43] and the quagga mussel Dreissena
bugensis [5]. Numbers in brackets beside the sequence indicate the number of consensus repeats
in the protein. Table 1-3 also describes consensus sequences belonging to adhesives from the
sandcastle worm and the liver fluke Fasciola hepatica [44]. Adhesive proteins from the vitellaria
of the liver fluke are responsible for egg shell hardening and also contain significant amounts of
DOPA in their sequences [44]. Tandem repeats are also seen in spider silk proteins, such as those
based on fibroin sequences, to impart mechanical strength to the spider web, thus further
underscoring the importance of such repeats in load-bearing functions [45].
Table 1-3. Tandem repeat sequences from marine mussel, freshwater mussel and other aquatic
species. X represents variable residues, Y represents DOPA and P represents (di)hydroxyproline
modifications.
Species | Protein | Consensus sequence (# of repeats) | Reference
Marine mussels
Mytilus edulis | Mefp-1 | AKPSYPPTYK (80) | [27], [20]
Mytilus edulis | Mefp-2 | Epidermal Growth Factor (EGF) motifs (11) | [28], [20]
Mytilus edulis | Mefp-3 | (R/N)RY (4) | [29], [20]
Mytilus edulis | Mefp-5 | YK (8) | [31], [20]
Mytilus edulis | PreCOL-P | Collagen flanked with elastic domains | [21]
Mytilus edulis | PreCOL-D | Collagen flanked with silk fibroin like domains | [21]
Mytilus edulis | PreCOL-NG | Collagen flanked with glycine rich cell wall protein like domains | [21]
Mytilus californianus | Mcfp-1 | PKISYPPTYK | [46]
Mytilus californianus | Mcfp-4 | HVHTHRVLHK (36); DDHVNDIAQTA (16) | [30]
Mytilus californianus | Mcfp-6 | No major repeats | [32]
Mytilus galloprovincialis | Tmp-1 | GYG | [25]
Perna canaliculus | Pcfp-1 | PYVK | [47]
Aulocomya ater | | AGYGGXK | [48]
Brachiodontes exustus | | GKPSPYDPGYK | [49]
Freshwater mussels
Dreissena bugensis | Dbfp1 | DKYPGGGN | [5]
Dreissena polymorpha | Dpfp1 | PVYPTKX (22), KPGPYDYDGPYDK (15) | [35]
Dreissena polymorpha | Dpfp2 | K(K/T)Y(X/P)E, XY(P/X)X(Y/K)XD | [33]
Limnoperna fortunei | | KPTQYSDEYK | [43]
Other aquatic species
Phragmatopoma californica (Sandcastle worm) | Pc-1 | VGGYGYGKK | [42]
Phragmatopoma californica (Sandcastle worm) | Pc-2 | HPAVXHKALGGYG | [42]
Phragmatopoma californica (Sandcastle worm) | Pc-3 | [S]nY where S is often phosphorylated | [42]
Fasciola hepatica (Liver fluke) | | GGGYGGYGK | [44]
1.1.6 Comparison of zebra mussel and quagga mussel byssal proteins
The freshwater quagga mussel byssus has also previously been characterized for protein
composition. Similar to zebra mussels, extraction of precursor proteins from the mussel foot and
staining them for DOPA led to the identification of four quagga mussel byssal proteins
(Dreissena bugensis foot proteins or Dbfps) called Dbfp0, 1, 2 and 3 [33]. Three of these, Dbfp1
(80 and 69 kDa), Dbfp2 (22 kDa) and Dbfp3 (12-13 kDa) have molecular weights similar to the
three identified Dpfp proteins (Table 1-2) and could therefore represent homologs of the zebra
mussel byssal proteins [33]. Mass spectrometry analysis reveals a single Dbfp1 peak at 68 kDa
as compared to Dpfp1 peaks at 48.6 and 54.5 kDa. The Dbfp0 protein with a molecular weight
greater than 200 kDa was however not identified as a homolog in the zebra mussel extract.
Additionally, biochemical characterization of Dbfp1 revealed that, like Dpfp1, it is also a
tandemly repetitive, acidic DOPA-containing protein containing N-acetylgalactosamine
glycosylations O-linked to threonine residues, but it has a much lower DOPA content of 0.55 ±
0.35 mol% compared to the 6.6% DOPA seen in Dpfp1 [5]. N-terminal sequencing of pepsin
degraded peptide digests of Dbfp1 revealed a primary sequence partly composed of unique
octapeptide consensus repeats of DKTPGGGN [5] that differ from the repeats seen in Dpfp1
[33], [35] (Table 1-3). This is different than in marine mussels which generally have much
sequence homology between byssal proteins from congeneric species [5]. Additionally, while
both Dpfp1 and Dbfp1 have high contents of Asx, Pro, Gly, Tyr and Lys, Dbfp1 has
approximately 1.8 times less proline, about 3.5 times more glycine and about 1.7 times more
tyrosine than Dpfp1 thus indicating prominent differences between the proteins [5], [33]. Similar
to zebra mussels, overall amino acid analysis of quagga mussel thread and plaques reveals
similar DOPA contents of 0.10 and 0.12 mol% respectively [5]. These values however are less
than the DOPA content of approximately 0.6 mol% seen in zebra mussel thread and plaques,
thus indicating potentially comparable differences in the DOPA dependence of their proteins [5].
Overall, comparisons of zebra mussel adhesion with adhesion in other freshwater mussels can
give additional useful insights into sequence features that are important to its adhesion/cohesion.
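The size of the DOPA differences quoted in this section can be made explicit with a simple ratio. This is a worked illustration using only the values given above; the function name `fold_difference` and the constant names are assumptions, and the Dbfp1 value uses the reported mean without its ± 0.35 mol% spread.

```python
def fold_difference(higher: float, lower: float) -> float:
    """Return how many times larger `higher` is than `lower`."""
    if lower <= 0:
        raise ValueError("lower must be positive")
    return higher / lower


# Values in mol%, taken from the text of this section.
ZEBRA_BYSSUS_DOPA = 0.6    # zebra mussel thread and plaque (approximate)
QUAGGA_THREAD_DOPA = 0.10  # quagga mussel thread
QUAGGA_PLAQUE_DOPA = 0.12  # quagga mussel plaque
DPFP1_DOPA = 6.6           # maximum DOPA content of Dpfp1
DBFP1_DOPA = 0.55          # mean DOPA content of Dbfp1

if __name__ == "__main__":
    print("thread:", round(fold_difference(ZEBRA_BYSSUS_DOPA, QUAGGA_THREAD_DOPA), 1))
    print("plaque:", round(fold_difference(ZEBRA_BYSSUS_DOPA, QUAGGA_PLAQUE_DOPA), 1))
    print("Dpfp1 vs Dbfp1:", round(fold_difference(DPFP1_DOPA, DBFP1_DOPA), 1))
```

On these figures the whole-byssus difference between the two species is roughly five- to six-fold, while the difference between the two foot proteins is about twelve-fold at the mean Dbfp1 value.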
1.2 Motivation
There are two long-term motivations for studying zebra mussel byssal adhesion. Firstly, since the
mussels are able to adhere strongly to a variety of substrates underwater, they are an inspiration
for the development of biological adhesives that are water-resistant, biocompatible and have high
adhesive strength. A second motivation that is more specific to zebra mussels is the development
of anti-fouling strategies targeted specifically at this biofouling species.
1.2.1 Mussel inspired bioadhesives
Firstly, mussel adhesive proteins are an inspiration for the development of biological adhesives
for medical and dental applications such as cell and tissue adhesives and as surgical glues for
sealing soft and hard tissues [26]. Mussel adhesive proteins can adhere to a variety of organic
and inorganic substrates underwater including glass, plastic, metal, teflon [26] and even to living
materials such as mammalian cells and porcine skin [50]. In its native oxidation state, a single
DOPA molecule can mediate very strong, reversible and non-covalent bonding to titanium oxide
surfaces at a bond strength of several hundred piconewtons [11]. Additionally, in the mussel
byssus, mussel adhesive proteins are able to maintain adhesion/cohesion forces across surfaces
with different elastic moduli such as between the soft plaque and hard substrate [20]; a property
that would be very useful for sticking together hard and soft tissues such as gum and tooth [26].
Importantly, the mussel adhesive proteins are unlikely to invoke an immune response [50].
Taking all these factors into account, mussel inspired bioadhesives are promising as strong,
water-resistant and biocompatible adhesives with flexible/elastic adhesion [26]. Other adhesives
currently used in medicine include fibrins that are biocompatible, show rapid curing and have
high adhesion strength to tissues but have low cohesion strength, thereby limiting their
applications [51]. Cyanoacrylates provide strong adhesion and fast set times but they also have
high stiffness and release toxic byproducts including cyanoacetate and formaldehyde that can
cause acute inflammation in the tissue [52]. Other synthetic adhesives are limited in that they are
generally poor at water displacement [26].
In the past, most of the research on mussel adhesive proteins has focused on marine mussels
[26]. In fact, commercial applications of marine mussel glues are already in use. Kollodis Inc.
sells recombinant mussel foot proteins whereas Cell-Tak prepares acetic acid suspensions of
mfp-1, 2 and 3 for promotion of in vitro cell attachment [53]. Since the freshwater zebra mussels
have evolved independently of the marine mussels, occupy different habitats, have different
byssal compositions and are able to maintain adhesion/cohesion with a much lower DOPA
content than marine mussels, they can give further insights into conserved protein properties
required for adhesion and can provide an alternative adhesive mechanism that can be mimicked
in the development of bioadhesives.
1.2.2 Targeted anti-fouling strategies
As discussed previously, zebra mussels are a major invasive, biofouling species in North
American water bodies that have had major economic impacts on water-based industries and that
have also negatively impacted local ecosystems in the lakes and waterways [1]. Their large
populations, the absence of predators, lack of competition, the high mobility of their larvae and
their ability to adhere strongly to a variety of substrates make the problem even more acute,
causing economic impacts that have been far in excess of $100 million [2]. Therefore, a second
long-term rationale for studying zebra mussel adhesion is to develop anti-fouling strategies that
are affordable, non-toxic to the ecosystem and are targeted specifically at the zebra mussels.
Strategies used may include proactive strategies targeted at the veliger stage to prevent
settlement and reactive strategies that would target attached mussels [54]. Zebra mussels have
high tolerance to current chemical control strategies which include oxidizing chemicals such as
chlorine, chlorine dioxide, ozone, bromine and potassium permanganate and non-oxidizing
chemicals such as potassium chloride, most of which are often toxic to other species in the local
ecosystem as well [54]. Thus, it is important to learn more about the molecular basis of zebra
mussel adhesion such that anti-fouling agents can be developed that specifically target this
adhesive mechanism. The insights from such studies will help develop strategies that include
designing surface coatings to minimize byssal attachment, preventing byssal secretion from the
mussel foot itself or perhaps disrupting cohesive interactions within the byssus [54].
1.3 Objectives
The overall goal of this research is to gain a better understanding of the molecular basis of
adhesion in zebra mussels so that in the future, this knowledge can be implemented in the design
of alternate mussel inspired bioadhesives and targeted anti-fouling strategies. In order to
understand mussel adhesion however, there is a need to first characterize the proteins that
constitute the byssus and identify the proteins that are responsible for its cohesive strength within
the thread and plaque and adhesive strength at the thread-plaque interface. However, owing to
extensive DOPA cross-linking within its mature structure, the zebra mussel byssus has
stubbornly evaded characterization thus far. There are thus major gaps in our understanding of
the protein composition of the zebra mussel byssus that must first be addressed before we can
proceed towards realizing our long-term aims.
The only zebra mussel byssal proteins known thus far have been identified as DOPA-staining
precursors in the mussel foot; however, this reveals no information on byssal distribution and
completely overlooks any DOPA-poor or DOPA-lacking proteins in the byssus. Additionally, the
lower DOPA content of the zebra mussel byssus as compared to the marine mussel byssus
indicates that in addition to DOPA-based interactions, other DOPA independent protein
interactions must also play an important role in zebra mussel byssal adhesion/cohesion and may
contribute to the mussel's ability to adhere to varied substrates, both hydrophobic and
hydrophilic.
Hence, the primary objective of this work is to identify and sequence novel proteins in the zebra
mussel byssus (both DOPA-rich and DOPA-deficient) and to determine their distribution
between different regions of the byssus. This information will allow us to identify protein
features and sequence motifs that are characteristic of zebra mussel byssal proteins and will set
the stage for characterization of the adhesive/cohesive mechanisms of byssal proteins in the
future.
In order to achieve this objective, we perform our analysis on induced, freshly secreted byssal
threads that have minimal cross-linking and are more amenable to characterization. We can then
address our primary objective as three sub-objectives:
1. To identify, sequence and determine the spatial distribution of novel proteins in the soluble
byssal extract by performing gel electrophoresis of the thread and plaque extract followed by
tandem mass spectrometry sequencing analysis of digested protein gel bands.
2. To sequence, identify and determine the spatial distribution of novel proteins in the insoluble
byssal matrix by performing tandem mass spectrometry sequencing analysis on the digested
thread and plaque matrix.
3. To compare the primary sequence structures of zebra mussel byssal proteins in order to
identify protein characteristics and sequence motifs that are characteristic of
adhesive/cohesive proteins in the species.
1.4 Overview
This thesis consists of four chapters. Chapter 1 introduces the overall background, motivation
and objectives of the research. Chapter 2 and Chapter 3 represent manuscripts of scientific
papers that are not yet submitted. Objective 1, as described in section 1.3, is addressed in
Chapter 2: ‘Adhesive Mechanisms in Freshwater Zebra Mussels: Identification and Sequence
Analysis of Novel Proteins’. Objective 2 is addressed in Chapter 3: ‘Novel Proteins identified in
the Insoluble Byssal Matrix of the Freshwater Zebra Mussel Dreissena polymorpha’. Objective 3
is addressed through both Chapters 2 and 3. Chapter 4 summarizes the results and discussion
from the previous chapters and also relates these to preliminary studies described in Appendix A
and Appendix B. Chapter 4 additionally addresses future work that must be undertaken to
further extend our understanding of zebra mussel adhesion.
Chapter 2:
Adhesive Mechanisms in Freshwater Zebra
Mussels: Identification and Sequence Analysis of
Novel Proteins
Arpita Gantayet a, Lily Ohana a and Eli D. Sone a,b,c *
a Institute of Biomaterials & Biomedical Engineering; University of Toronto, Toronto, ON,
Canada
b Department of Materials Science & Engineering, University of Toronto, Toronto, ON, Canada
c Faculty of Dentistry; University of Toronto, Toronto, ON, Canada
*Corresponding author. Email: [email protected]
This chapter is in preparation as a manuscript to be submitted to the journal ‘Biofouling’
I, Arpita Gantayet, performed the experiments and analysis and wrote the paper. Lily Ohana
assisted with protocol development for protein extraction, quantification and gel electrophoresis
and edited the paper.
2.1 Abstract
The biofouling freshwater zebra mussels (Dreissena polymorpha) adhere to a variety of
substrates underwater by means of a proteinaceous structure called the byssus, which consists of
a number of threads with adhesive plaques at the tips, and are therefore an inspiration for
developing medical bioadhesives. The byssal proteins however remain largely uncharacterized
due to extensive 3,4-dihydroxyphenylalanine (DOPA) cross-linking which renders the mature
structure largely resistant to extraction and immunolocalization. The functions of these proteins
thus remain a mystery. We report here on the byssal distribution and sequence properties of
novel and previously known byssal proteins. We identified three novel zebra mussel byssal
proteins by performing gel electrophoresis of proteins extracted from freshly secreted threads
and plaques, in which cross-linking is minimized. LC-MS/MS sequencing analysis and cDNA
database matching revealed that the novel Dpfp5 protein is an acidic protein with a block
structure and varied repeat patterns.
Keywords: bioadhesion; plaque; threads; mussel adhesive proteins; DOPA; LC-MS/MS
2.2 Introduction
Zebra mussels (Dreissena polymorpha) are an invasive species that were accidentally introduced
into the North American Great Lakes in the late 1980s [2]. These freshwater bivalves are able to
adhere to a variety of surfaces underwater and are therefore a major source of biofouling. They
are able to spread rapidly by anchoring to boat hulls and have had a huge economic impact on
water dependent industries and a major ecological impact on native ecosystems in the Great
Lakes [2], [1]. The mussels adhere to substrates by secreting a proteinaceous structure called the
byssus that consists of a number of threads with adhesive plaques at the tips and that is
surrounded by an exterior cuticle layer that serves as a protective lacquer [33]. Zebra mussels are
one of a few freshwater mussels known to produce a byssus and have evolved independently of
the much-studied marine mussels, as a different subclass [3], [8]. The zebra mussel and marine
mussel byssi are superficially similar [55] and are both composed of proteins containing the rare
amino acid 3,4-dihydroxyphenylalanine (DOPA), which is produced by the enzymatic
hydroxylation of tyrosine and is responsible for varied adhesive and cohesive interactions in the
byssus [9]. DOPA can form multiple metal mediated ligations to give cohesive strength to the
cuticle [46], it can undergo covalent cross-linking with DOPA and other residues to provide
cohesive strength to the thread and plaque [15] and in its native form, DOPA can bind to metal
oxide surfaces and mediate surface adhesion [11]. The two mussel species however differ in their
overall protein compositions and amino acid distributions within the byssus and zebra mussels
even have a lower DOPA content than the marine mussels, thus indicating important roles for
other DOPA-independent interactions [9], [10]. Understanding the molecular mechanisms of
adhesion in the zebra mussel byssus may thus ultimately be useful in the design of alternate
water-resistant biological adhesives for medical and dental applications. This knowledge will
also be valuable in the development of non-toxic, targeted anti-fouling strategies against the
biofouling species.
The zebra mussel adhesive layer is characterized by a 10 – 20 nm thick layer at the plaque-
substrate interface that stains differently than the bulk plaque matrix and remains attached to the
substrate even when the plaque is removed [6]. MALDI mass spectrometry analysis revealed a
range of 5.8 – 7 kDa proteins in this layer, however, larger proteins were not detected, likely due
to heavy DOPA cross-linking [34]. MS also revealed that in spite of similar amino acid
compositions between thread and plaque [10], there are differences in protein composition
between the thread and plaque bulk and between the plaque and the plaque-substrate interface
[34]. However, none of these proteins have yet been identified. In marine mussels, the byssal
thread consists of a mixture of three collagenous proteins, and the plaque and cuticle contain six
different DOPA-containing adhesive, linker and lacquer proteins
[18]. In zebra mussels, on the other hand, amino acid analysis has revealed that both the thread
and the plaque comprise DOPA containing proteins [10]. However, the composition and
distribution of these proteins remains largely uncharacterized due to extensive DOPA cross-
linking which renders the mature structure largely resistant to extraction and
immunolocalization. The only three precursor proteins identified thus far (the D. polymorpha
foot proteins Dpfp-1, Dpfp-2 and Dpfp-3) were designated byssal proteins based on their ability
to stain for DOPA in an extract from the mussel's 'foot', the organ that secretes the precursor
proteins that form the mature byssus [33]. Extraction of Dpfp1 and Dpfp2 from mature byssal threads [33]
and the immunolocalization of Dpfp-1 in byssal thread extracts [4] confirmed the presence of
these proteins in the byssus. While the approximate molecular weights of Dpfp1, Dpfp2 and
Dpfp3 have been determined by gel electrophoresis [33], accurate MS mass measurements and
full sequence information have been determined only for Dpfp1 [35] (Table 2-1). Recently, a
cDNA library was created by Xu and Faisal, 2008, representing genes expressed uniquely in the
zebra mussel foot [56]. This library was also used to isolate expressed sequence tags (ESTs) that
are up-regulated or down-regulated during byssogenesis [57].
Table 2-1. Summary of molecular weight, DOPA content and sequence information of the three
known D. polymorpha foot proteins (Dpfp)
Foot protein   MW by gel electrophoresis   MW by MALDI-TOF   Maximum DOPA   Sequence information known
               (kDa) [33]                  (kDa) [35]        content [33]
Dpfp-1         76 and 65                   54.5 and 48.6     6.6%           Primary sequence [35]
Dpfp-2         26                          Unknown           7%             Peptide fragment sequences [33]
Dpfp-3         12-13                       Unknown           Unknown        None
Thus far, limited information is available on the composition and distribution of zebra mussel
byssal proteins. For one, the mature byssus is greatly resistant to analysis due to extensive DOPA
cross-linking. At the same time, extraction of precursor proteins from the foot does not reveal
any information on the distribution of proteins between different regions of the byssus. Also,
staining specifically for DOPA limits the identification of any non-DOPA containing proteins
that might also be present in the byssus. To overcome these challenges, we induce the secretion
of fresh threads such that these have minimal DOPA cross-linking and are thus less resistant to
extraction. Fresh byssal threads are induced by injecting the mussel's foot with potassium
chloride, a method that has previously been used only in marine mussels [58], [30], [16] and is now
applied for the first time to freshwater mussels. Using this method we are able to study protein
composition of the byssus after secretion from the foot but before extensive cross-linking. In
marine mussels, the induced byssal threads have been shown to be indistinguishable from the
natural threads [16], [25] thus allowing their study as a model system. Here, we report on our
identification of novel byssal proteins and characterization of their byssal distribution and
sequence properties. The identified byssal proteins, whether DOPA-containing or not, could
serve different purposes within the byssus, potentially as an adhesive between
plaque and substrate, as a cohesive medium for structural stability in the byssus and as a
varnish for protection of the byssus from degradation [9].
2.3 Methods
2.3.1 Protein extraction from induced byssal threads
Zebra mussels were collected from Round Lake, Ontario, Canada and kept for up to 60 days in an
aquarium at room temperature in artificial freshwater prepared using a recipe by M. Sprung,
1987 [59]. The mussels were dissected and the foot was injected with ~0.03 mL of 0.56 M
potassium chloride (KCl) using an 18G syringe, as described by Tamarin et al., 1976, to
induce secretion of a byssal thread [58]. After 3-5 minutes, the induced thread/plaque
was located in the ventral groove of the foot, pulled out with tweezers, washed in a drop of
deionized water and extracted. The extraction method was adapted from Zhao and Waite, 2006
[30] with several changes: per extraction, 6 - 14 byssal threads were extracted in 250 µL of basic
extraction buffer (EB) (0.2M sodium borate, 4M urea, 1mM KCN, 1mM EDTA, and 10 mM
ascorbic acid), prepared using a recipe adapted from Rzepecki and Waite, 1993 [33]. Samples
were homogenized on ice in a 1mL Ground Glass Hand-Held Tissue Grinder, sonicated with a
probe sonicator (15 times, 2sec. each) and centrifuged (17000 g, 8 min., 4°C) [30]. The
supernatant (soluble extract) and the pellet (insoluble matrix) were stored separately at -20°C.
Where relevant, the byssal thread was separated into thread and plaque prior to extraction.
2.3.2 Dialysis, lyophilization and quantification of protein samples
Soluble extracts from the required number of extractions were pooled and dialyzed
against 0.15 M sodium borate (pH 8.1 – 8.4) to remove urea and the other basic EB components, and
then against nitrogen-bubbled 1% acetic acid to eliminate sodium borate and acidify the sample
before lyophilization [60]. Dialysis steps were done using a 2 kDa molecular weight cutoff
(Thermo Scientific Slide-A-Lyzer Dialysis Cassette G2 (#87718)), with stirring for 2 hours, 3
hours and overnight at 4°C in 300 times the sample volume of dialysis buffer. Dialyzed samples
were aliquoted, lyophilized (Gibson-Air ModulyoD Lyophilizer) and stored at -20°C. Protein
quantities were determined according to absorbance measurements at 280 nm (Nanodrop ND-
1000 Spectrophotometer, Thermo Scientific) of samples resuspended in deionized water.
Resuspended samples were stored in liquid nitrogen prior to use.
2.3.3 Amino acid analysis
Amino acid analysis of mature and induced byssal threads and of soluble protein extracts was
performed using a Waters Acquity UPLC Gradient and Detector and the Waters Empower 2
Chromatography Software by the Advanced Protein Technology Centre at Sick Kids Hospital,
Toronto. Samples were dried in pyrolyzed borosilicate tubes in a vacuum centrifugal
concentrator and subjected to vapour phase hydrolysis by 6N HCl with 1% phenol at 110°C for
48 hours under a pre-purified nitrogen atmosphere. After hydrolysis, excess HCl was removed
by vacuum and hydrolyzates were washed with redrying solution and derivatized with
phenylisothiocyanate (PITC), followed by reversed-phase HPLC.
2.3.4 Tricine polyacrylamide gel electrophoresis (Tricine-PAGE) and gel
silver-staining
Proteins from lyophilized samples were separated by Tricine-PAGE electrophoresis using
premade Novex 16%, 1mm, Tricine gels (Invitrogen, EC6695BOX), Novex Tricine SDS 2X
Sample Buffer (Invitrogen, LC1676) and Novex Tricine SDS 10X Running Buffer (Invitrogen,
LC1675). The gels were run at 125 V for 1.5 hours using an XCell SureLock Mini-Cell
Electrophoresis System (Invitrogen, EI0001). Silver-staining of proteins in the gel was adapted
from Mortz et al., 2001 [61]. Briefly, gels were fixed (40% ethanol, 10% acetic acid, 50% water,
1 hour), washed in deionized water (30 min.), sensitized in 0.02% sodium thiosulfate (1 min.)
and washed 3 times in water (20 sec. each). Gels were then incubated in 0.1% cold silver nitrate
solution containing 0.02 % formaldehyde (20 min, 4°C), washed 3 times in water (20 sec each),
transferred to a new tray and then washed again in water (1 min). Gels were developed in 3%
sodium carbonate containing 0.05% formaldehyde until staining was sufficient. Staining was
terminated with 5% acetic acid and gels were stored at 4°C in 1% acetic acid.
2.3.5 Digestion of protein gel bands
Silver stained protein gel bands were cleaved with trypsin at the Advanced Protein Technology
Centre, Sick Kids Hospital, Toronto. Briefly, the excised gel bands were destained by incubating
in a 1:1 mixture of 30 mM potassium ferricyanide and 100mM sodium thiosulfate (15 min.),
washing with deionized water, washing with 50 mM ammonium bicarbonate and then shrinking
with 50% acetonitrile/25 mM ammonium bicarbonate. Samples were reduced with 10 mM DTT
(30 min, 56°C) and alkylated with 100 mM iodoacetamide (15 min., dark, room temperature)
followed by shrinking with 50% acetonitrile/25 mM ammonium bicarbonate (15 min.). Samples
were then digested with 13 ng/µL trypsin (Porcine, Sequencing Grade, Promega) overnight at
37°C and the liquid was collected. Peptides were extracted by vortexing the sample in turn with
25 mM ammonium bicarbonate, 5% formic acid, 100% acetonitrile, 5% formic acid and 100%
acetonitrile, and all supernatants were pooled. Extracted peptides were lyophilized by
SpeedVac centrifugation and resuspended in 20 µL of 0.1% formic acid in water for LC-MS/MS
analysis.
2.3.6 Liquid chromatography – tandem mass spectrometry (LC-MS/MS)
LC-MS/MS analysis was performed by the Advanced Protein Technology Centre at Sick Kids
Hospital, Toronto. The digested peptides were loaded onto a 150 μm ID pre-column (Magic C18,
Michrom Biosciences) at 4 μl/min. and separated over a 75 μm ID analytical column packed into
an emitter tip containing the same packing material. The peptides were eluted over 60 min at
300 nl/min using a 0 to 40% acetonitrile gradient in 0.1% formic acid using an EASY n-LC
nano-chromatography pump (Proxeon Biosystems, Odense, Denmark). The peptides were then
eluted into an LTQ-Orbitrap XL hybrid mass spectrometer (Thermo-Fisher, Bremen, Germany)
operated in data-dependent mode. MS was acquired at 60,000 FWHM resolution in the Fourier
Transform Mass Spectrometer (FTMS) and MS/MS was carried out in the linear ion trap. Six
MS/MS scans were obtained per MS cycle.
2.3.7 Sequencing data analysis
MS/MS data were searched using Mascot (Matrix Science, London, UK) by matching against
zebra mussel and metazoa protein databases and against a zebra mussel Expressed Sequence Tag
(EST) library virtually translated in six different reading frames using Virtual Ribosome 1.1
(http://www.cbs.dtu.dk/services/VirtualRibosome/). This cDNA library, representing genes
expressed uniquely in the mussel foot, was prepared by Xu and Faisal, 2008 using a BD
Clontech PCR-Select cDNA Subtraction Kit and comprises 750 genes with Accession numbers
AM229723 to AM230448 (downloaded November 2011 from the Genbank Server) [56]. During
creation of the library, base pairs were removed from the 5’ end of the cDNA to create blunt
ends for ligation of adaptor sequences for cDNA amplification purposes [56]. Therefore, in
several sequences where base pairs were removed from the 5’ translated region, the virtual
protein sequences are incomplete at the N-terminus [62].
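The six-frame virtual translation described above can be illustrated with a minimal sketch. This is not the Virtual Ribosome implementation; it assumes the standard genetic code, and the function names and the short example sequence are hypothetical.

```python
# Minimal six-frame translation sketch (standard genetic code assumed).
# Not the Virtual Ribosome tool; names and example DNA are illustrative only.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Translate a cDNA in frames +1, +2, +3 and -1, -2, -3."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frames("ATGGCCAAATAA").items():
        print(frame, protein)
```

Searching spectra against all six translations is what allows a peptide to be matched even when the true reading frame of a 5'-truncated EST is unknown.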
MS data was visualized and validated using Scaffold 3.3.1 (Proteome Software Inc., Portland,
OR). Peptide identifications were accepted at greater than 80.0% probability and protein
identifications were accepted at greater than 95.0% probability with at least 1 identified peptide.
Parent ion and fragment ion mass tolerances were set to 20 PPM and 0.40 Da respectively and
hits were confirmed manually by inspecting the spectra. The data was searched using
carbamidomethylation as fixed modification and deamidation of asparagine and glutamine,
hydroxylation of lysine and tyrosine, oxidation of methionine, acetylation of the N-terminus and
phosphorylation of serine and threonine as variable modifications.
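As a rough illustration of what the mass tolerances above mean, the following sketch checks whether an observed mass falls within a parts-per-million (parent ion) or absolute Da (fragment ion) window. It is not how Mascot or Scaffold apply these settings internally; the function names and example values are hypothetical.

```python
# Illustrative tolerance checks; function names and inputs are assumptions,
# not part of the Mascot or Scaffold software.

def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def parent_within_tolerance(observed, theoretical, tol_ppm=20.0):
    """True if the parent ion mass error is within +/- tol_ppm."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


def fragment_within_tolerance(observed, theoretical, tol_da=0.40):
    """True if the fragment ion mass error is within +/- tol_da daltons."""
    return abs(observed - theoretical) <= tol_da


if __name__ == "__main__":
    # A 0.01 Da error on a 1000 Da parent ion is about 10 ppm.
    print(parent_within_tolerance(1000.01, 1000.0))
    print(fragment_within_tolerance(500.3, 500.0))
```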
The theoretical mass, pI and amino acid composition of virtual EST protein matches were
determined using EMBOSS Pepstats, Kyte-Doolittle Hydropathy Plots were obtained using
EMBOSS Pepwindow and amino acid distribution metrics were determined using the EMBOSS
pepinfo tool all on the European Bioinformatics Institute website
(http://www.ebi.ac.uk/Tools/emboss/pepinfo/). Signal peptides were searched using the SignalP
4.0 (http://www.cbs.dtu.dk/services/SignalP/) and PrediSi (http://www.predisi.de/) online tools
and multiple sequence alignments were performed with the Clustal-W2 online tool
(http://www.ebi.ac.uk/Tools/msa/clustalw2/). Protein homology searches were done using NCBI
Protein BLAST (Basic Local Alignment Search Tool)
(http://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Proteins).
2.4 Results
2.4.1 Optimal conditions for zebra mussel protein extraction and analysis
To enhance protein extraction directly from the zebra mussel byssal thread, we induced the
secretion of fresh byssal threads having minimal DOPA cross-linking. Maximal extraction of
soluble proteins from the induced byssal threads is then important to suitably analyze the
proteins. We found that the basic extractions described in the methods section were the most
effective in extracting proteins as compared to acidic extractions using acetic acid (5% or 8 %)
and 8M urea (results not shown here). The basic extractions give smaller pellets of non-soluble
extract than acidic extractions. Also, when loaded on 15% Acetic Acid Urea (AU) PAGE gels
and silver stained, basic extracts (acidified for gel compatibility) display more protein gel bands
as compared to acidic extracts which do not reveal any visible protein bands (results not shown
here). A280 absorbance readings revealed that approximately 3.6 µg of protein is extracted per
byssal thread/plaque when these are extracted in an acidic extraction buffer (5% acetic acid and
8M urea). This measurement could not be performed on basic extracts because the basic buffer
itself absorbs at 280 nm.
After extraction and homogenization, the byssal extracts are centrifuged to separate out the
soluble and insoluble proteins. The mass of protein present in soluble byssal extracts of separated
threads and plaques, that were pooled, dialyzed, lyophilized and resuspended in water, is
determined by measuring absorbance readings at 280 nm. In the thread, approximately 4.0 µg of
protein was extracted per mussel and in the plaque, ~6.7 µg was extracted per mussel. Thus
proteins are more easily extracted from the zebra mussel plaque as compared to its thread.
Amino acid analysis of the soluble thread and plaque extracts reveals that they have very similar
amino acid compositions (Table 2-2). This is consistent with the observation that the mature
thread and mature plaque have the same amino acid content [10]. Comparisons of the mature and
induced threads/plaques showed similar amino acid contents except for aspartic acid/asparagine
which has a much higher mol% in the mature (19.5%) versus the induced (6.0%) byssal threads
(Table 2-2). Since the induced byssal threads are artificially secreted by forced secretion with
KCl, this could lead to different protein compositions of induced versus mature byssal threads.
Thus, it is important to consider potential differences when using fresh induced threads as the
model system. Interestingly, amino acid analysis did not detect DOPA in any of the byssal
samples (Table 2-2), likely because oxidation of DOPA to DOPA quinone followed by DOPA
quinone covalent cross-linking makes these residues resistant to amino acid analysis [63].
Table 2-2. Comparisons of the amino acid compositions in mole % (number of residues per 100
residues) in zebra mussel mature and induced thread/plaques and in soluble thread and plaque
extracts.
Amino acid   Mature           Induced          Induced soluble   Induced soluble
             thread/plaque*   thread/plaque*   thread**          plaque**
Asx (D/N)    19.5             6.0              6.1               5.7
Glu (E/Q)    6.9              6.2              5.6               6.7
Ser (S)      3.8              6.2              10.3              11.4
Gly (G)      21.7             20.5             27.1              24.7
His (H)      0.6              1.4              3.6               4.1
Arg (R)      2.6              4.2              2.2               3.5
Thr (T)      4.8              6.3              1.9               2.7
Ala (A)      2.3              5.0              7.5               6.3
Pro (P)      6.1              6.4              8.7               8.7
Tyr (Y)      9.0              8.4              2.1               2.3
Val (V)      6.8              7.0              2.3               2.6
Met (M)      0.9              3.8              9.6               9.2
Cys (C)      0.3              1.4              1.5               -
Ile (I)      5.1              5.1              4.0               3.6
Leu (L)      4.7              5.9              1.9               3.2
Phe (F)      1.8              3.0              3.7               3.1
Lys (K)      3.0              3.2              2.0               2.2
DOPA         -                -                -                 -
* Amino acid analysis was performed on intact byssal threads
**Amino acid analysis was performed on protein extracts
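The comparison of mature and induced thread/plaque compositions can be reproduced with a short sketch that ranks residues by the difference in mol% between the first two columns of Table 2-2. The values are copied from the table; the variable and function names are illustrative.

```python
# Mol% values from Table 2-2: (mature thread/plaque, induced thread/plaque).
# Names are hypothetical; only the numbers come from the table.
MOL_PERCENT = {
    "Asx": (19.5, 6.0), "Glu": (6.9, 6.2), "Ser": (3.8, 6.2),
    "Gly": (21.7, 20.5), "His": (0.6, 1.4), "Arg": (2.6, 4.2),
    "Thr": (4.8, 6.3), "Ala": (2.3, 5.0), "Pro": (6.1, 6.4),
    "Tyr": (9.0, 8.4), "Val": (6.8, 7.0), "Met": (0.9, 3.8),
    "Cys": (0.3, 1.4), "Ile": (5.1, 5.1), "Leu": (4.7, 5.9),
    "Phe": (1.8, 3.0), "Lys": (3.0, 3.2),
}


def ranked_differences(table):
    """Sort residues by |mature - induced| mol%, largest first."""
    diffs = {
        residue: round(mature - induced, 1)
        for residue, (mature, induced) in table.items()
    }
    return sorted(diffs.items(), key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    for residue, diff in ranked_differences(MOL_PERCENT)[:3]:
        print(f"{residue}: {diff:+.1f} mol%")
```

Asx stands out by a wide margin (13.5 mol%), while every other residue differs by less than 3 mol%.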
2.4.2 Identification of novel foot proteins in the zebra mussel byssus
Proteins extracted from induced byssal threads were dialyzed and lyophilized as described in the
methods and then electrophoresed and visualized on a silver-stained tricine-PAGE gel. Figure 2-
1 displays the protein bands identified in the intact byssal thread/plaque (T/P) and in separated
threads and plaques. Six protein bands were visible in the byssal T/P extract (Figure 2-1A). The
two bands between 90 and 65 kDa correspond to the molecular weights of the two forms of the
previously identified Dpfp1 protein (76 and 65 kDa). The thick band between 30 and 20 kDa
corresponds to the previously known protein Dpfp2 with a molecular weight of 26 kDa. Both
Dpfp1 and Dpfp2 previously stained for DOPA in foot extracts and were identified in extracts
from mature byssal threads as well [33]. A third DOPA containing protein called Dpfp3, also
previously seen in the foot extracts, is however not observed on the gel in Figure 2-1. The three
gel bands labeled with underlines represent novel byssal proteins that were not previously known
to be present in the zebra mussel byssus. We call these proteins Dpfp0 (>210 kDa), Dpfp4 (>90
kDa) and Dpfp5 (~30 kDa).
The soluble thread and plaque extracts have similar amino acid contents (Table 2-2), indicating
that their soluble proteins have similar amino acid compositions. When proteins were extracted
from separated threads and plaques and loaded on the same gel, a number of faint protein bands
and one major protein band (~30 kDa) were observed in each lane (Figure 2-1B). While the band
corresponding to Dpfp0 appears to be unique to the plaque, the very faint band corresponding to
Dpfp4 appears to be absent from the plaque. However, since unequal masses of protein (222 µg of thread
extract and 400 µg of plaque extract) were loaded on the gel, conclusive direct comparisons of
band densities cannot be made. The bands corresponding to Dpfp1, Dpfp2, Dpfp5 and a ~50 kDa
protein are present both in the thread and in the plaque with Dpfp5 having the most prominent
bands. Although almost twice as much plaque extract was loaded on the gel, the density of the
Dpfp5 band in the plaque lane is the same as or lower than that in the thread lane. The
thread extract therefore appears to contain a higher proportion of Dpfp5.
The type of gel used for protein identification has an impact on the visualization of protein
bands. Several of the lower molecular weight proteins observed in the silver-stained Tricine
PAGE gel in Figure 2-1 were not observed on a Sodium Dodecyl Sulfate Polyacrylamide (SDS-
PAGE) gel (results not shown here). Additionally, gel bands were only seen when several protein
extracts were pooled together and lyophilized before electrophoresis. Even up to 15 byssal
threads in a single extraction were not sufficient to view bands (result not shown here) as
compared to the proteins seen when 65 byssal threads from 13 extractions of 5 byssal threads
were pooled together (Byssal T/P lane in Figure 2-1A).
Figure 2-1. Electrophoretic identification of zebra mussel byssal proteins. Proteins were
extracted from induced, freshly secreted byssal threads, dialysed and lyophilized as described in
the methods. Lyophilized proteins were resuspended in double distilled water and loaded on 16%
Tricine PAGE gels that were then silver-stained for protein visualization. The masses in brackets
on the gel represent the mass of lyophilized protein loaded for each sample. (A) Byssal proteins
identified in an extract from 65 complete byssal threads (Byssal T/P). The leftmost lane contains
a Colorburst molecular weight ladder. Underlined proteins represent novel byssal foot proteins
that we have chosen to call Dpfp0 (>210 kDa), Dpfp4 (>90 kDa) and Dpfp5 (~30 kDa). The
other protein bands correspond to the molecular weights of previously known DOPA containing
foot proteins, Dpfp1 (76, 65 kDa) and Dpfp2 (26 kDa). (B) Byssal proteins identified in the
extracts from 100 separated threads and 100 separated plaques. Arrows indicate bands observed
on the gel, most of them corresponding to the bands seen in the byssal T/P in Figure 2-1A.
2.4.3 Comparisons of LC-MS/MS derived sequences of Dpfp1, Dpfp2 and
Dpfp5
The protein gel bands of presumed Dpfp1 (76 kDa), Dpfp2 (26 kDa) and Dpfp5 (~30 kDa) in the
byssal T/P lane in Figure 2-1A were subjected to in-gel trypsin digestion and LC-MS/MS mass
spectrometry analysis. The protein mass spectra were then matched against known zebra mussel
proteins and against virtually translated EST sequences from a cDNA library of genes unique to
the zebra mussel foot. Dpfp1 did not match any of the EST sequences in the cDNA library but
did match the known sequence of Dpfp1 (AF265353), thus confirming its identity [35]. The mass
spectra of protein bands Dpfp2 and Dpfp5 matched to a number of EST sequences virtually
translated in any of six reading frames (±1, ±2, ±3). Table 2-3 shows a sequence match
comparison of the three sequenced proteins, Dpfp1, Dpfp5 and Dpfp2. The accession numbers
described for Dpfp2 and Dpfp5 represent the sequence match that has the highest molecular
weight corresponding to the protein’s molecular weight as identified by gel electrophoresis. The
theoretical protein masses are derived from the matched sequence and are in each case smaller
than the molecular weight seen by gel electrophoresis (Table 2-3). This discrepancy could be
because the EST sequences do not account for post-translational modifications such as tyrosine
hydroxylation to DOPA and protein glycosylations such as those seen by Rzepecki and Waite,
1993 in Dpfp1 and Dpfp2 [33]. Incomplete N-termini of the sequences, owing to limitations in
the creation of the cDNA library (see Methods), may also contribute to the inconsistency.
Significantly, while Dpfp1 is known to run at 76 and 65 kDa on the gel [33], matrix-assisted
laser desorption/ionization (MALDI) mass spectrometry indicates that the protein forms actually have
molecular weights of 54.5 and 48.6 kDa, respectively [35]. Thus the discrepancy can largely be
attributed to an overestimation of molecular weights when the byssal proteins run on the gel.
Based on the theoretical pI, Dpfp1 and Dpfp5 are acidic proteins unlike Dpfp2 which is basic.
Also, Dpfp2 peptide matches reveal post-translational modifications in the form of glutamine
deamidation (Q) and tyrosine hydroxylation to DOPA (Y). No such modifications are detected in
the spectrum matches obtained for Dpfp1 and Dpfp5 (Table 2-3).
The Scaffold program describes a ‘protein identification probability’ value that indicates the
probability that a deduced sequence matches the protein. Mascot peptide scores indicate the
certainty that the mass spectrum matches the respective peptide sequence. These are described in
Table 2-3. Dpfp1 and Dpfp2 have a protein identification probability of 100%. The derived
sequence of Dpfp2 (AM229739) has a high probability match and three mass spectrum matching
peptides (Table 2-3) and this sequence compares very well to the sequence fragments obtained
by automated Edman degradation of Dpfp2 by Rzepecki and Waite, 1993 [33]. Hence, we can be
very confident about the Dpfp2 – EST match. The Dpfp5 derived sequence (AM230139) has a
protein identification probability of 95% and the single spectrum match has a Mascot score of
46, thus indicating a high certainty of the sequence to spectrum match. While there is only a
single spectrum match shown here, an additional mass spectrum match
(NDVDGNENIVGGQSNAVGGK) was also observed when the EST match was identified in the
insoluble byssal matrix that is found in the pellet after centrifugation, as described elsewhere
[64], thus further supporting the protein identification.
Table 2-3. Comparisons of three zebra mussel byssal proteins sequenced by LC-MS/MS.
Foot      MW by             EST accession number       Mass spectra matching   Mascot    Theoretical   Theoretical
protein   electrophoresis   (protein identification    peptide sequences       peptide   mass (kDa)    pI
          (kDa)             probability)                                       score
Dpfp1     76/65             AF265353 (100%)            SPLYGWR                 19        49.3          5.2
                                                       TGPYVPIKPDDK            29
                                                       TRVYPYLPLYPGYQPEYHR     22
Dpfp5     ~30               AM230139 a (95%)           YVGEGNNVGEQR            46        20.1 c        6.4 c
Dpfp2     26                AM229730 a (100%)          QAYPVYPEK b             25        15.3 c        9.0 c
                                                       QSYPVYPEK b             38
                                                       YPEKPYPGYQDYWGK         26
a The representative EST sequence is the one that has the highest MW corresponding to that seen by electrophoresis.
b Q and Y signify glutamine deamidation and tyrosine hydroxylation to DOPA, respectively
c The mass and pI shown represent the protein sequence excluding the N-terminal adaptor sequence
2.4.4 Sequence properties of the EST-derived sequence of the novel Dpfp5
protein
The sequence of the novel Dpfp5 protein has been determined for the very first time through
EST matching of peptide mass spectra. Figure 2-2 depicts the multiple sequence alignment of
four matches. The aligned sequences are quite similar with some regions that are present in all
matches (red sequence at C-terminus) and other regions that are missing or different in some
matches (blue or purple sequences at N-terminus). In all the matches, the orange sequence
represents an adaptor sequence that was inserted during the creation of the cDNA library [56]. In
all of the EST matches, no N-terminal signal peptide or methionine residue (representing start
codon) was observed, likely because the N-terminus of the sequences is incomplete due to
removal of base pairs from the 5’ end of the cDNA for adaptor ligation for cDNA cloning
purposes [56]. Premature termination of reverse transcription, due to strong mRNA secondary
structure, could also result in an incomplete 5’ cDNA end and therefore in an incomplete N-
terminal sequence [62]. An incomplete N-terminus also means that we do not know the correct
reading frame of the cDNA sequence and hence we have to theoretically translate the cDNA in
all three positive reading frames. cDNA second strand synthesis during amplification means that
we must theoretically translate from the C-terminus in the three negative reading frames as well.
The sequence matches obtained are independent of the reading frame of the virtually translated
sequence (shown in brackets beside the EST accession number). On checking for post
translational modifications (PTM), no tyrosine hydroxylation (DOPA) or other PTM was
detected.
AM230139(+1) GRGNSISSGRPGRYNSWPPKPNQPQQPQQPQQPPQPPRYPQP-SYPAYPP 49
AM230094(+1) GRGNSISSGRPGRYNSWPPKPNQPQQPQQPQQPPQPPRYPQP-SYPAYPP 49
AM230093(-2) ------LAWSRPR-------------------------YPQQQSYPAYPP 19
AM230120(+1) GRGNSISSGRPGR------------------------------------- 13
AM230139 QQSYPAYPPKQSYPTYPPKQSYPAYPPKQSYPTNPPYNPCDAVYCRPIYC 99
AM230094 QQSYPAYPPKQ---------SYPAYPPKQSYPTNPRYNPCAAVYCHPIYC 90
AM230093 KQSYPTYPPKQ---------SYPAYPPKQSYPTNPPYNPCDAVYCRPIYC 60
AM230120 --------------------------------------------------
AM230139 NYGQYTPQGECCPQCNPGTYLPEKWSWKGNNVVGDQEKYVGEGNNVGEQR 149
AM230094 NYGQYTPQGECCPQCNPGTYLPEKWSWQGNNVVGDQEKYVGEGNNVGEQR 140
AM230093 NYGQYTPQGECCPQCNPGTYLPEKWSWQGNNVVGDQEKYVGEGNNVGEQR 110
AM230120 ----------------SGTYLPEKWSWQGNNVVGDQEKYVGEGNNVGEQR 47
********** **********************
AM230139 NDVDGNENIVGGQSNAVGGKGNDVGEQKNAVGGSGNTVGWQGNNVGG 196
AM230094 NDVGGNANIVGGQSNAVGGKGNDVGEQKNAVGGSGNTVGWQGNNVGG 187
AM230093 NDVSGNSNIVGGQSNAVGGKGNDVGEQKNAVGGSGNTVGWQGNNVGG 157
AM230120 NDVSGNSNIVGGQSNAVGGKGNDVGEQKNAVGGSGNTVGWQGNNVGG 94
*** ** ****************************************
Figure 2-2. Alignment of the multiple EST sequence matches derived for the Dpfp5 gel band in Figure 2-
1A. The gel band was digested by in-gel tryptic digestion and the fragmented peptides were subject to
LC-MS/MS mass spectrometry analysis. The mass spectra obtained were then matched against the
virtually translated EST cDNA library of zebra mussel foot proteins. The bracketed numbers besides the
accession numbers represent the reading frame of the virtually translated EST sequence. The numbers at
the end of the sequence rows represent the position of the last amino acid in the row. The peptide matches
are aligned and color coded to show regions of sequence similarity between matches. The orange
sequence at the N-terminus represents adaptor sequences added during cDNA amplification. The colors
red (100%), blue (75%) and purple (50%) represent the percent sequence similarity between different
EST matches. The yellow highlight represents peptide sequences that matched directly from the mass
spectra. Q and Y signify glutamine deamidation and tyrosine hydroxylation to DOPA, respectively. *
indicates residues that are conserved between all EST matches. The first accession number represents the
sequence that is further analyzed through the paper.
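As a consistency check on the peptide matches, a minimal in-silico tryptic digest can confirm that the matched peptides are fragments trypsin would be expected to produce from the AM230139 translation. The sketch below assumes the common simplified rule (cleave after K or R unless the next residue is P) and no missed cleavages; it is not the digestion model used by Mascot. Only the C-terminal portion of the translated sequence is used, and the names are illustrative.

```python
# Simplified in-silico trypsin digest: cleave C-terminal to K or R,
# except when the following residue is P. No missed cleavages modelled.

def tryptic_digest(sequence):
    """Return the list of fully cleaved tryptic peptides, in order."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal remainder
    return peptides


# C-terminal portion of the AM230139 translation, from "SWKGNN..." onward.
DPFP5_C_TERMINAL = (
    "SWKGNNVVGDQEKYVGEGNNVGEQRNDVDGNENIVGGQSNA"
    "VGGKGNDVGEQKNAVGGSGNTVGWQGNNVGG"
)

if __name__ == "__main__":
    for peptide in tryptic_digest(DPFP5_C_TERMINAL):
        print(peptide)
```

Under these assumptions, both YVGEGNNVGEQR and NDVDGNENIVGGQSNAVGGK appear as complete tryptic peptides of this region.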
For further analysis of the Dpfp5 sequence properties, we chose to exclude the N-terminal
adaptor sequence (orange) and analyze the longest and thus the most complete sequence
(AM230139) as described in Table 2-3. The AM230094 sequence (Figure 2-2) is mostly similar
to AM230139 except that it is missing a single SYPTYP repeat in the blue region. The Dpfp5
sequence (AM230139) is 183 residues long and has a theoretical mass of 20.1 kDa (Figure 2-3).
The sequence is rich in proline (P, 18%), glycine (G, 12%), glutamine (Q, 12%), asparagine (N,
10%) and tyrosine (Y, 10%) which may be hydroxylated to DOPA. Additionally, different
regions of the Dpfp5 protein display noticeable distinctions in amino acid properties and repeat
patterns. The N-terminus (residues 1 – 68) has a theoretical pI of 9.56 whereas the rest of the
protein has a pI of 4.45. Additionally, the N-terminus is quite hydrophilic in contrast to the rest
of the protein, which is mostly hydrophobic. The Dpfp5 sequence contains equal mol% of
aliphatic (12%) and aromatic (12%) residues and equal mol% of acidic (7%) and basic (7%)
residues. While positive residues (K+R) are uniformly distributed through the sequence, negative
residues (D+E) are absent at the N-terminus. The N-terminus is rich in triads of proline and
glutamine (generally PQQ and PKQ) alternately underlined and highlighted in green (Figure 2-
3). The latter of these repeats are interspersed with consensus repeats of SYP(A/T)YP
highlighted in blue. The middle region of the Dpfp5 sequence (residues 69 – 114) has a pI of
5.87 and has no discernible repeats. It does, however, contain six cysteine residues that are
absent from the rest of the protein. The C-terminus of the sequence (residues 115 –
183) has a theoretical pI of 4.21 and consists of ten VGG repeats (highlighted in yellow) where
the third glycine is occasionally substituted with aspartic acid (D), glutamic acid (E) or
tryptophan (W). Alternate VGG repeats are followed by a glutamine (Q) residue and all except
one are preceded by an asparagine (N) residue at the second last position. Five GN(N/D/T)
repeats are also observed at the C-terminus, preceding the VGG repeats.
YNSWPPKPNQPQQPQQPQQPPQPPRYPQP (29)
SYPAYPPQQSYPAYPPKQSYPTYPPKQSYPAYPPKQSYP (68)
TNPPYNPCDAVYCRPIYCNYGQYTPQGECCPQCNPGTYLPEKW (111)
SWKGNNVVGDQEKYVGEGNNVGEQRNDVDGNENIVGGQSNA (152)
VGGKGNDVGEQKNAVGGSGNTVGWQGNNVGG (183)
Figure 2-3. Illustration of the pattern of repeats identified in the EST derived sequence of Dpfp5
(AM230139). The adaptor sequence inserted during cDNA cloning has been excluded and the N-
terminus of the sequence is incomplete. The bracketed numbers represent the sequence position
of the last amino acid in the row. Alternating underlined and non-underlined green highlighted
sequences represent proline and glutamine rich triads. Blue, grey and yellow highlights represent
other repeat sequences. Cysteine residues are indicated in red.
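The composition figures quoted for Dpfp5 can be checked directly against the rows printed in Figure 2-3. The following is a minimal sketch in Python, not the tool used in the original analysis; the function names are hypothetical and the sequence is copied from the figure with the adaptor excluded.

```python
from collections import Counter

# Rows of the Dpfp5 sequence as printed in Figure 2-3 (AM230139, adaptor excluded).
DPFP5_ROWS = [
    "YNSWPPKPNQPQQPQQPQQPPQPPRYPQP",
    "SYPAYPPQQSYPAYPPKQSYPTYPPKQSYPAYPPKQSYP",
    "TNPPYNPCDAVYCRPIYCNYGQYTPQGECCPQCNPGTYLPEKW",
    "SWKGNNVVGDQEKYVGEGNNVGEQRNDVDGNENIVGGQSNA",
    "VGGKGNDVGEQKNAVGGSGNTVGWQGNNVGG",
]
DPFP5 = "".join(DPFP5_ROWS)


def row_end_positions(rows):
    """Return the 1-based position of the last residue in each row."""
    ends, total = [], 0
    for row in rows:
        total += len(row)
        ends.append(total)
    return ends


def mol_percent(seq):
    """Return the mol% of each residue type present in seq."""
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


def positions(seq, residue):
    """Return the 1-based positions of a given residue in seq."""
    return [i + 1 for i, aa in enumerate(seq) if aa == residue]


if __name__ == "__main__":
    print("row ends:", row_end_positions(DPFP5_ROWS))
    comp = mol_percent(DPFP5)
    for aa in "PGQNY":
        print(aa, f"{comp[aa]:.1f} mol%")
    print("cysteine positions:", positions(DPFP5, "C"))
```

Run on the figure rows, the row-end positions reproduce the bracketed numbers in Figure 2-3, and all cysteine positions fall within the middle region (residues 69 – 114).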
2.4.5 Sequence properties of the EST-derived sequence of Dpfp2
While incomplete and unordered fragments of the protein sequence of Dpfp2 had previously
been determined by automated Edman degradation [33], we have determined here for the first
time a more complete sequence of Dpfp2 through EST matching of its peptide mass spectra.
Figure 2-4 depicts the multiple sequence alignment of five of these matches. The five aligned
sequences are quite similar with some regions that are conserved in all matches (red sequence at
C-terminus) and other regions that are missing or different in some matches (blue, purple or
green sequences at N-terminus). All the sequences except the sequence in black in AM229733
correspond quite well to the Dpfp2 sequence fragments observed by Rzepecki and Waite, 1993
[33]. The AM229733 sequence is therefore not studied in further analysis of the protein. In all
the matches, the orange sequence represents an adaptor that was inserted during the creation of
the cDNA library [56]. As with Dpfp5, this replaced several N-terminal residues and thus, the
N-terminus of the Dpfp2 sequence (likely including the signal peptide) is incomplete. The
sequence matches obtained are independent of the reading frame of the virtually translated
sequence. On checking for post translational modifications, one DOPA (Y) residue was located
in the spectrum match in the 40% conserved region near the N-terminus. For further analysis of
the Dpfp2 sequence properties, we chose to exclude the N-terminal adaptor sequence (orange)
and analyze the sequence with accession number AM229730 as described in Table 2-3. This
sequence is 125 residues long and has a theoretical mass of 15.3 kDa. It is rich in charged
residues including 10% glutamic acid (E) and 19% lysine (K). It is also rich in proline (P) (16%)
and threonine (T) (12%) and is richest in tyrosine (Y) (23%).
AM229730(+2) APGRHGGRGNSISSGRPGR-YQEKTYPGYPPKQAYPVYPEKTYPEKTYPAY 50
AM229733(+1) --------GRGNLAWSRPRCTRRKLIRVILQDKHILYIVRNRYPEKTYAAY 43
AM229731(-2) -------NSLVISSGRPGR-YQEKTYPGYPPKQAYPVYPEKTYPEKTYPAY 43
AM229735(+1) -------GRGNSISVVAAE-VK----------------------------- 14
AM230118(+1) -------GRGNSISVVAAE-V------------------------------ 13

AM229730     PTKKSYPEYPEKTYTKKTYEAYPTKDSYTVYPDKKYTEKTYEAYPTKDSY 100
AM229733     PTKKSYPEYPEKTYTKKTYEAYPTKDSYTVYPDKKYTEKTYEAYPTKDSY 93
AM229731     PTKKSYPEYPEKTYTKKTYEAYPTKDS---YPDKKYTEKTYEAYPTKDSY 90
AM229735     ----------------KTYEAYPTKDSYTVYPDKKYTEKTYEAYPTKDSY 48
AM230118     -----------------------------VYPDKKYTEKTYEAYPTKDSY 34
                                           ********************

AM229730     TVYPDKKYTEKKYEAYPTKQSYPVYPEKKYPEKPYPGYQDYWGK 144
AM229733     TVYPDKKYTEKKYEAYPTKQSYPVYPEKKYPEKPYPGYQDYWGK 137
AM229731     TVYPDKKYTEKKYEAYPTKQSYPVYPEKKYPEKPYPGYQDYWG- 133
AM229735     TVYPDKKYTEKKYEAYPTKQSYPVYPEKKYPEKPYPGYQDYWGK 92
AM230118     TVYPDKKYTEKKYEAYPTKQSYPVYPEKKYPEKPYPGYQDYWGK 78
             *******************************************
Figure 2-4. Alignment of the multiple EST sequence matches derived for the Dpfp2 (26 kDa)
gel band in Figure 2-1A. The gel band was digested by in-gel tryptic digestion and the
fragmented peptides were subjected to LC-MS/MS analysis. The mass spectra
obtained for each band were then matched against the virtually translated EST cDNA library of
zebra mussel foot proteins. The bracketed numbers beside the accession numbers represent the
reading frame in which the EST sequence was virtually translated. The numbers at the end of the
sequence rows represent the position of the last amino acid in the row. The peptide matches are
aligned and color coded to show regions of similarity between matches. The orange sequence at
the N-terminus represents adaptor sequences added during cDNA amplification. The colors red
(100%), blue (80%), purple (60%) and green (40%) represent the percent sequence similarity
among the EST matches. The yellow highlight represents peptide sequences that matched
directly from the mass spectra. Q and Y signify glutamine deamidation and tyrosine
hydroxylation to DOPA, respectively. * indicates residues that are conserved between all EST
matches. The first accession number represents the sequence that is further analyzed through the
paper.
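The asterisk row in the alignment marks columns in which every EST match carries the same residue. A minimal sketch of that rule is given below, assuming that a gap in any row disqualifies a column; the function name is hypothetical, and the rows are the second block of Figure 2-4 with gaps written as repeated dashes to keep the counts explicit.

```python
def conservation_line(rows, gap="-"):
    """Return '*' under columns where all rows share one residue, else ' '."""
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("aligned rows must all have the same length")
    marks = []
    for column in zip(*rows):
        conserved = gap not in column and len(set(column)) == 1
        marks.append("*" if conserved else " ")
    return "".join(marks)


# Second block of the Figure 2-4 alignment (50 columns per row).
BLOCK_2 = [
    "PTKKSYPEYPEKTYTKKTYEAYPTKDSYTVYPDKKYTEKTYEAYPTKDSY",          # AM229730
    "PTKKSYPEYPEKTYTKKTYEAYPTKDSYTVYPDKKYTEKTYEAYPTKDSY",          # AM229733
    "PTKKSYPEYPEKTYTKKTYEAYPTKDS" + "-" * 3 + "YPDKKYTEKTYEAYPTKDSY",  # AM229731
    "-" * 16 + "KTYEAYPTKDSYTVYPDKKYTEKTYEAYPTKDSY",               # AM229735
    "-" * 29 + "VYPDKKYTEKTYEAYPTKDSY",                            # AM230118
]

if __name__ == "__main__":
    for row in BLOCK_2:
        print(row)
    print(conservation_line(BLOCK_2))
```

For this block the rule marks only the last 20 columns, which matches the 20 asterisks printed beneath it in the figure.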
Analysis of the EST derived sequence of Dpfp2 reveals five full tandem repeats of a 22 residue
consensus sequence that make up the bulk central region of the protein. Figure 2-5A shows this
pattern of repeats with each repeat on a different line. The consensus sequence can be
represented by KTY(P/E)AYPTK(Q/D)SYPVYPEKKYTE where non-italicized residues
represent highly conserved residues. There are five tyrosines (Y) in every consensus and the
position of the tyrosine residue is always conserved (indicated in bold in Figure 2-5A). Rzepecki
and Waite, 1993 had identified two fragments of this Dpfp2 consensus sequence; K-(K/T)-Y-
(X/P)-E and *-Y-(P/X)-*-(Y/K)-*-D, where * is any residue, Y is DOPA and X was speculated
to be glycosylated threonine [33]. Within the EST match, the one DOPA (Y) residue was
identified in the first full repeat though there are others (as detected by Rzepecki and Waite,
1993) that were not detected by the LC-MS/MS machine. The deduced Dpfp2 sequence contains
only 7 mol% aliphatic residues, 24% aromatic residues and 32% charged residues. There is an
equal distribution of non-polar and polar residues. Comparing hydrophobicity, the Kyte-Doolittle
Hydropathy Plot in Figure 2-5B illustrates a repeating pattern of rising and falling
hydrophobicity. Higher hydropathy scores indicate higher hydrophobicity. The KT/KK residues
at the 18th position in the consensus represent the hydrophilic start and end of each of the central
four hydrophobic peaks.
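A Kyte-Doolittle profile of this kind is a sliding-window average of per-residue hydropathy values. The sketch below uses the published Kyte-Doolittle scale; the window length of 9 residues is an assumption, since the window used for Figure 2-5B is not stated here, and the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy scale (higher value = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Dpfp2 sequence (AM229730, adaptor excluded), 125 residues.
DPFP2 = (
    "YQE"
    "KTYPGYPPKQAYPVYPEKTYPE"
    "KTYPAYPTKKSYPEYPEKTYTK"
    "KTYEAYPTKDSYTVYPDKKYTE"
    "KTYEAYPTKDSYTVYPDKKYTE"
    "KKYEAYPTKQSYPVYPEKKYPE"
    "KPYPGYQDYWGK"
)


def hydropathy_profile(seq, window=9):
    """Return the mean Kyte-Doolittle score of each full window along seq."""
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


if __name__ == "__main__":
    profile = hydropathy_profile(DPFP2)  # window=9 is an assumed default
    for start, score in enumerate(profile, start=1):
        print(start, f"{score:.2f}")
```

Each printed value corresponds to the window starting at the given residue; plotting the values against window position gives a hydropathy profile, and a different window length would smooth the peaks more or less strongly.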
A.
YQE (3)
KTYPGYPPKQAYPVYPEKTYPE (25)
KTYPAYPTKKSYPEYPEKTYTK (47)
KTYEAYPTKDSYTVYPDKKYTE (69)
KTYEAYPTKDSYTVYPDKKYTE (91)
KKYEAYPTKQSYPVYPEKKYPE (113)
KPYPGYQDYWGK (125)
B.
Figure 2-5. Illustration of the tandem repeat pattern identified in the EST derived sequence of
Dpfp2 (AM229730). (A) Sequence depicting five full repeats of a 22 residue consensus sequence
KTY(P/E)AYPTK(Q/D)SYPVYPEKKYTE where non-italicized residues represent highly
conserved residues. Each full repeat is on a new line and tyrosine residues with conserved
positions within the consensus are indicated in bold. The bracketed numbers represent the
sequence position of the last amino acid in the row. The underlined residues indicate post-
translational modifications; Q and Y signify glutamine deamidation and tyrosine hydroxylation
to DOPA, respectively. (B) Kyte-Doolittle hydropathy plot of the sequence. Higher hydropathy
scores indicate higher hydrophobicity.
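The repeat layout in panel A can be reproduced by cutting the sequence into consecutive 22-residue units starting at residue 4 and listing where tyrosine falls in each unit. This is an illustrative sketch only; the helper names are hypothetical and the sequence is taken from Figure 2-5A.

```python
# Dpfp2 sequence (AM229730, adaptor excluded) as laid out in Figure 2-5A.
DPFP2 = (
    "YQE"
    "KTYPGYPPKQAYPVYPEKTYPE"
    "KTYPAYPTKKSYPEYPEKTYTK"
    "KTYEAYPTKDSYTVYPDKKYTE"
    "KTYEAYPTKDSYTVYPDKKYTE"
    "KKYEAYPTKQSYPVYPEKKYPE"
    "KPYPGYQDYWGK"
)

REPEAT_LENGTH = 22   # length of the consensus unit
FIRST_RESIDUE = 4    # 1-based position where the first full repeat starts
REPEAT_COUNT = 5     # number of full repeats


def split_repeats(seq, first_residue, unit, count):
    """Cut `count` consecutive units of length `unit` out of seq."""
    start = first_residue - 1  # convert to 0-based index
    return [seq[start + i * unit: start + (i + 1) * unit] for i in range(count)]


def tyrosine_offsets(repeat):
    """Return the 1-based positions of tyrosine within one repeat unit."""
    return [i + 1 for i, aa in enumerate(repeat) if aa == "Y"]


if __name__ == "__main__":
    for unit in split_repeats(DPFP2, FIRST_RESIDUE, REPEAT_LENGTH, REPEAT_COUNT):
        print(unit, tyrosine_offsets(unit))
```

Every unit reports tyrosine at the same five offsets (3, 6, 12, 15 and 20), which is the positional conservation indicated in bold in panel A.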
2.5 Discussion
We report here on the byssal distribution and sequence properties of novel and previously known
byssal proteins. We identified three novel zebra mussel byssal proteins by performing gel
electrophoresis of proteins extracted from freshly secreted, minimally cross-linked byssal
threads. These novel proteins, Dpfp0 (>210 kDa), Dpfp4 (>90 kDa) and Dpfp5 (~30 kDa) did
not previously stain for DOPA in the foot extract [33] and were thus never before known to be
present in the byssus. Further, we used peptide fragment fingerprinting (LC-MS/MS analysis and
cDNA database matching) to determine a likely protein sequence for the novel Dpfp5 protein.
We also identified two previously known DOPA proteins, Dpfp1 (76 and 65 kDa) and Dpfp2 (26
kDa) on the gel and determined a more complete sequence of Dpfp2 to complement the
previously known fragments of the sequence [33].
The EST derived sequence of Dpfp5 displays interesting repeat patterns and sequence properties
within the protein (Figure 2-3). Sequence repeats are a characteristic of several adhesive proteins
including those in the much studied marine mussels and sandcastle worm [65], [42] and are
therefore important to study. The N-terminus of Dpfp5 (residues 1 – 68) is basic with a pI of
9.56, is quite hydrophilic as compared to the rest of the protein and lacks negative residues
identified in the rest of the protein. This N-terminus is rich in repeats of glutamine (Q, 21%) and
proline (P, 37%) residues that each make up only 7% of the rest of the protein. Extended
polyglutamine (polyQ) sequences are a characteristic of neurodegenerative diseases where
expansion of polyQ stretches is believed to cause aggregation of the protein. Popiel et al., 2004
found that inserting proline into the expanded polyQ stretch suppressed protein aggregation and
cytotoxicity [66]. As such, in Dpfp5, the glutamine chain possibly contributes to the
structure/function of the protein and the interspersed prolines may be present to ensure that the
glutamine residues do not aggregate. The N-terminus also has a number of SYPAYP repeats that
are interspersed between an additional set of P(K/Q)Q repeats. However, the number of these
repeats slightly varies between EST matches. While most of the EST matches have only three
full repeats of SYPAYPP(K/Q)Q, the match described in Figure 2-3 (AM230139) has an
additional fourth SYPTYPPKQ repeat (Figure 2-2). These slightly different sequences might
represent different Dpfp5 variants. These variants could arise as multiple copies of the same gene
in the form of different mature RNAs. The mRNA variants can be created by RNA editing or
alternative splicing of one primary RNA transcript [67]. Additionally, since the protein sample is
prepared by collecting byssal threads from several different mussels, allelic variation between
mussels could contribute to the different protein forms we see [29].
A BLAST search of the proline and glutamine chain in the first 29 Dpfp5 residues revealed PQQ
sequence homologies with a number of extracellular structural proteins including glycoproteins
from the zona pellucida (oocyte egg coat) of the Winter Flounder Flatfish [Pseudopleuronectes
americanus] (score 56.2) and with the Choriogenin H minor glycoprotein in the Zona Radiata of
Fundulus heteroclitus (score 53.7). These glycoproteins are believed to be involved in hardening
reactions involving alterations to the structure of the protein [68], [69]. This might thus be
comparable to the maturation process of the zebra mussel byssal thread. Lyons et al., 1993 found
PQQ repeats in the Winter Flounder glycoprotein gene to be part of a longer (PQQ)4PKY repeat
and suggested that the lysine, tyrosine and glutamine residues might be involved in cross linking
owing to their positioning [69]. As such, the conserved positioning of these residues in the Dpfp5
N-terminus could also indicate similar interactions. Importantly, such repeats, containing a proline
at every third residue position interspersed with hydrophilic residues, are a common feature of
many extracellular structural proteins [69] thus indicating a structural role for the N-terminus in
Dpfp5. PQQ homologies were also seen with a structural integral membrane protein [Streptomyces
roseosporus] (score 55.8), a CCAAT-binding transcription factor subunit HAPB
[Arthroderma benhamiae] (score 55.8) and the MEF2D transcriptional activator protein (score 53.7),
amongst other DNA-binding proteins.
The 46 central residues in the Dpfp5 sequence (69 – 114) have a pI of 5.84 and are most
abundant in proline (17%), tyrosine (13%) and cysteine (13%). This region displays no
discernible repeat pattern but is interesting because it contains cysteine residues that are
otherwise absent from the rest of the Dpfp5 sequence. Such specific cysteine localization could
possibly indicate a role for disulfide bridge interactions by the middle region of the protein.
Cysteine residues are also known for their potential roles as antioxidants owing to the hydrogen
atom in the thiol group available for donation [70]. Yu et al., 2011 demonstrated in the marine
mussel Mytilus californianus that cysteine containing byssal proteins can provide an acidic,
reducing environment that can reduce dopaquinone back to DOPA and thus restore DOPA
adhesion [16]. They also found that this thiol rich protein can later transform into a cross-linker
with the DOPA containing protein by forming S-cysteinyldopa adducts [16]. Thus, Yu et al.,
2011 identified this cysteine rich protein as a plaque antioxidant as well as a cross-linker to
improve plaque cohesion [16]. The distribution of cysteine residues uniquely within the middle
region of Dpfp5 thus indicates that this region might play a similar role in maintaining DOPA
adhesion and in mediating cohesion within the byssus.
The Dpfp5 C-terminus (residues 115 – 183) with a pI of 4.21 also has its distinct repeats and
amino acid compositions. In contrast to 29% proline present in the rest of the sequence, the C-
terminus has absolutely no proline. It also has only 1 tyrosine compared to 17 in the rest of the
sequence. The near absence of tyrosine from the C-terminus indicates that if any DOPA
(hydroxylated tyrosine) is present in the protein it is likely not in the C-terminus. Thus, this
section of the protein is most likely not involved in any DOPA dependent adhesion/cohesion.
The C-terminus is also composed of a very high percent of glycine (28%) and valine (16%)
compared to only a few of each in the middle region and absolutely none in the N-terminus.
These residues are represented in a series of VGG repeats richly constituting the C-terminus. The
VGG repeats are also a characteristic of the sandcastle worm (Phragmatopoma californica)
adhesive protein pc-1 [36]. The sandcastle worm uses its adhesive proteins to stick sand grains
together underwater. It has evolved independently of the zebra mussel (belonging to different
phyla) to incorporate a similar repeat sequence in its protein. This indicates that the VGG repeat
might be an important contributor to adhesion/cohesion and might be a key repeat for
bioadhesive glues to possess. In Dpfp5, the third glycine in VGG is occasionally substituted with
the negatively charged residues aspartic acid (D) and glutamic acid (E). These additional charges might
contribute to stronger charge – charge interactions of Dpfp5 within the byssus. GNN repeats
through the C-terminus and conserved positioning of asparagine (N) preceding the VGG repeat
may also have specific functions in the protein.
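The C-terminal figures above can be reproduced with simple counting and pattern matching on residues 115 – 183 of the Figure 2-3 sequence. The sketch below is illustrative only; the regular expressions are assumptions about how the repeats were scored. In particular, the VGG pattern also admits VDG, because the count of ten is only reached if that variant is included.

```python
import re

# Dpfp5 sequence (AM230139, adaptor excluded) from Figure 2-3.
DPFP5 = (
    "YNSWPPKPNQPQQPQQPQQPPQPPRYPQP"
    "SYPAYPPQQSYPAYPPKQSYPTYPPKQSYPAYPPKQSYP"
    "TNPPYNPCDAVYCRPIYCNYGQYTPQGECCPQCNPGTYLPEKW"
    "SWKGNNVVGDQEKYVGEGNNVGEQRNDVDGNENIVGGQSNA"
    "VGGKGNDVGEQKNAVGGSGNTVGWQGNNVGG"
)

# Residues 115-183 (1-based), i.e. everything from 0-based index 114 onward.
C_TERMINUS = DPFP5[114:]

# Assumed patterns: VGG with the variants seen in the sequence, and GN(N/D/T).
VGG_LIKE = re.compile(r"V[GD][GDEW]")
GN_TRIAD = re.compile(r"GN[NDT]")


def percent(seq, residue):
    """Return the mol% of one residue type in seq."""
    return 100.0 * seq.count(residue) / len(seq)


if __name__ == "__main__":
    print("length:", len(C_TERMINUS))
    print("proline:", C_TERMINUS.count("P"), "tyrosine:", C_TERMINUS.count("Y"))
    print("glycine %:", round(percent(C_TERMINUS, "G")))
    print("valine %:", round(percent(C_TERMINUS, "V")))
    print("VGG-like:", VGG_LIKE.findall(C_TERMINUS))
    print("GN(N/D/T):", GN_TRIAD.findall(C_TERMINUS))
```

With these patterns the region contains ten VGG-like triads and five GN(N/D/T) triads, no proline and a single tyrosine.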
The EST derived sequence of Dpfp5 (Figure 2-3) reveals striking similarities as well as some
unique properties as compared to the byssal protein Dpfp1. Dpfp1 is an acidic protein with a
diblock polymer structure consisting of an N-terminus with 22 repeats of a heptapeptide
consensus motif and a C-terminus with 16 repeats of a tridecapeptide [35]. In Dpfp5, the N-
terminus has a highly basic pI of 9.56 whereas the rest of the protein has an acidic pI of 4.45.
This is similar to Dpfp1 which has an N-terminus with pI 9.02 and C-terminus with pI 4.62.
Such striking charge differences between the protein ends must have an implication in the
mechanism of assembly and/or modes of interactions of these proteins. In both Dpfp1 and
Dpfp5, the two oppositely charged termini could possibly interact with each other in a certain
way allowing the proteins to assemble into a specific structure. Or perhaps the two ends interact
with distinct surfaces or different byssal materials thus functioning as connectors or increasing
the range of interactions the protein can undergo. These speculations and sequence interactions
must be further investigated by studying peptide mimics of these proteins. Interestingly as well,
Dpfp5 (theoretical pI 6.41) is only the second identified acidic byssal protein after Dpfp1 (pI
5.24) since marine mussel byssal precursor proteins are generally basic [9]. Thus, Dpfp5 is
similar to Dpfp1 in that it is also acidic and its termini are distinct with different kinds of repeats
and charges; however Dpfp5 is different in that it also has a middle sequence with no discernible
pattern that connects the two blocks. Further, the Dpfp5 N-terminus is quite hydrophilic in
comparison with the rest of the protein. This is unlike Dpfp1 which shows no such distinction.
The protein termini in Dpfp5 may thus have different solubility properties and different
affinities. For example, the hydrophilic N-terminal end might have a greater affinity for
hydrophilic surfaces or other hydrophilic proteins within the byssus whereas the hydrophobic C-
terminus would prefer to stay buried among hydrophobic proteins or maintain interactions with
hydrophobic surfaces at the plaque-substrate interface.
Knowledge about the distribution and sequence properties of byssal proteins can also give us
useful insights into their function and mode of adhesion/cohesion within the byssus. The newly
identified Dpfp5 byssal protein was identified by gel electrophoresis in both the thread and the
plaque extracts (Figure 2-1B). It must therefore have a role relevant to both. Since a plaque-
substrate adhesive role would not be required in the thread, we suggest that this protein might
have a cohesive or an alternate adhesive role in the byssus. This role could possibly include
DOPA dependent interactions such as cohesive DOPA quinone cross-linking or metal mediated
interactions in the cuticle [9]. It could possibly also include DOPA-independent chemical
interactions involving other protein residues such as covalent bonding, ionic binding, hydrogen
bonding, dipole interactions and/or van der Waals interactions. Since the sequencing method did
not reveal information on the presence or absence of DOPA in Dpfp5, we can only speculate on
the protein’s dependence on DOPA. Dpfp5 has only 10% tyrosine in its sequence compared to
23% in the derived Dpfp2 sequence and 15% in Dpfp1. Thus Dpfp5 has less tyrosine available
for hydroxylation to DOPA and hence possibly has a less DOPA dependent role than the other
proteins. That Dpfp5 might contain little or no DOPA is consistent with the finding that Dpfp5
did not stain for DOPA when extracted from the mussel’s foot by Rzepecki and Waite, 1993b
[33]. Lower DOPA compositions could also help explain better extraction and/or electrophoresis
of Dpfp5 compared to the other byssal proteins, as is witnessed with more prominent bands on
the gel (Figure 2-1B).
In addition to Dpfp5, we also investigated the sequence properties of the previously identified
DOPA containing protein, Dpfp2. Dpfp2 runs at 26 kDa on the gel and, like Dpfp5, is present in
both the thread and the plaque extracts (Figure 2-1). Thus, it must have a DOPA dependent role
that is relevant to both the thread and plaque. However, the Dpfp2 gel bands are not as prominent
as the Dpfp5 bands. This may possibly be because the presence of DOPA makes the protein
more resistant to extraction and electrophoresis. Owing to its highly basic pI of 9.32, it is also
possible that the protein is not sufficiently extracted in basic extraction buffer. With Dpfp2 as
well, the N-terminus of the EST derived sequence is incomplete, but here the most striking
properties seem to be at the sequence core. The bulk of the protein, from residues 4 to 113,
consists of five full repeats of a 22 residue consensus sequence
KTY(P/E)AYPTK(Q/D)SYPVYPEKKYTE where non-italicized residues represent highly
conserved residues (Figure 2-5A). Interestingly, the position of the tyrosine residues is
conserved in all of the repeats. Since tyrosine is hydroxylated to DOPA, this indicates that the
positioning of the DOPA residue plays an important role in the structure/function of the protein.
The deamidation of glutamine (Q) near the N-terminus may also hold some significance in the
function of the protein. Sagert and Waite, 2009 suggest that deamidation in the byssal proteins
might occur to provide charge heterogeneity or primary structure heterogeneity to the protein
[25]. It is however also possible that the deamidation seen here occurs due to experimental
conditions such as temperature, buffer and the basic pH [71]. It is also interesting that the
AYPTK(D/Q) and the SYPVYPE regions within the consensus are somewhat similar to the
SYPAYP repeat in the N-terminus of the Dpfp5 protein. Dpfp2 is very different from Dpfp1 and
Dpfp5 in that it does not have a copolymer block structure and is highly basic. It has no
distinctions between different regions of the protein. It has the same consensus sequence
throughout and alternating hydrophobic and hydrophilic regions as seen in Figure 2-5B. Thus,
its assembly and modes of interactions are potentially different from those of these other byssal
proteins.
Comparisons of zebra mussel adhesive proteins with adhesive proteins from other species can be
useful in determining common properties of adhesive mixtures that have evolved independently.
The zebra mussel byssus contains a mixture of basic (Dpfp2) and acidic (Dpfp1 and Dpfp5)
proteins. Like the zebra mussel, the sandcastle worm also has adhesive proteins with distinct
charges. One of these is strongly acidic with a pI of 2.5 and two are basic with a pI greater than 9
[36]. These protein pI similarities between the species indicate a possible requirement for
distinctly charged proteins in an adhesive mixture. In marine mussels however, the byssal
precursor proteins are generally basic [9]. Like Dpfp1 and Dpfp5, one of the adhesive proteins in
the sandcastle worm, pc-3A, is also a highly acidic diblock protein with an acidic N-terminus
and basic C-terminus [41]. Thus, its sequence distribution is similar to that of Dpfp1 and Dpfp5,
once again revealing similar adhesive mechanisms between species.
It is interesting that while several EST sequence matches were obtained for Dpfp5 and Dpfp2
bands, no EST sequences matched to Dpfp1. It is unlikely that this is due to limitations in the
tryptic digestion and LC-MS/MS of the Dpfp1 gel band because three sufficient peptide matches
were obtained against the full sequence of Dpfp1 (Table 2-3). It is thus possible that the EST
library lacks the cDNA for this protein possibly because its mRNA was not present in the
mussel’s foot at the time that the library was created. In addition to Dpfp5, two other novel
byssal proteins, Dpfp0 and Dpfp4 were also identified by gel electrophoresis (Figure 2-1). Like
Dpfp5, these proteins did not previously stain for DOPA in the foot extract [33]. Dpfp0, with a
molecular weight greater than 210 kDa, runs at a similar molecular weight to the Dbfp0 protein
in the closely related freshwater mussel, the quagga mussel (Dreissena bugensis) [33]. The zebra
and quagga mussels possess a number of potentially homologous protein pairs including Dpfp1
and Dbfp1, Dpfp2 and Dbfp2 and Dpfp3 and Dbfp3. Thus, it is possible that Dpfp0 and Dbfp0
are also homologues of each other. However, Dbfp0 is known to be a DOPA containing protein,
indicating that if these are in fact homologous, then Dpfp0 might be a DOPA containing protein
as well. It would not be right to disregard Dpfp0 as a DOPA protein just because it did not stain
for DOPA in the foot extract in the experiments done in 1993 [33]. Its low band density as
compared to the other gel bands in Figure 2-1 could indicate that it is not easily extracted or not
present in sufficient quantities in the byssus, which could explain why it did not
previously stain for DOPA [33]. With regard to localization, Dpfp0 was seen in the plaque
extract but not in the thread extract. However, almost twice as much plaque extract was loaded
and hence we cannot conclude that this protein is unique to the plaque. The novel protein Dpfp4,
on the other hand, appears to be present uniquely in the thread but the bands are too faint to make
any conclusions on the distribution of the protein.
Our investigation has thus provided some useful insights into proteins constituting the zebra
mussel byssus. Analysis of induced, freshly secreted byssal threads allowed us to extract proteins
at a stage of minimal DOPA cross-linking. Electrophoresis of these extracts allowed us to
identify novel byssal proteins and compare their distribution between the thread and plaque. The
novel protein Dpfp5 was found localized in both the thread and plaque and its putative sequence
revealed distinct N and C termini with distinct repeats and amino acid properties. The incomplete
N-terminus, owing to the incomplete 5’ end of the cDNA sequence, however limits our complete
understanding of the Dpfp5 sequence. The similarities of Dpfp5 to the diblock copolymer
structure of Dpfp1 and its distinction from the uniblock, basic properties of the Dpfp2 protein
provide an interesting insight into byssal protein properties. Further, comparisons of these zebra
mussel proteins with adhesive proteins in the much studied marine mussels and sandcastle worm
reveal common adhesive mechanisms that have evolved independently and that must therefore
be important in adhesion. Future work must look into zebra mussel protein distribution at the
plaque substrate interface. Adhesive/cohesive interactions of the byssal proteins must also be
investigated by studying peptide mimics of the proteins. An understanding of zebra mussel
adhesion will ultimately be useful in the development of biocompatible and water resistant
adhesives for medical and dental applications. Additionally, this knowledge will reveal byssal
properties that must be targeted in the development of antifouling agents against this biofouling
species.
2.6 Acknowledgments
The authors gratefully acknowledge Dr. Craig Simmons and Dr. Ben Ganss for access to
electrophoretic equipment and Zahra Shahrokh for advice on protocols. We would like to thank
Trevor Gilbert and Kyle Serkies for collecting the mussels. We also thank Li Zhang and
Reynaldo Interior of the Advanced Protein Technology Centre, Sick Kids, Toronto for
LC-MS/MS and amino acid analysis, respectively. This work was supported by the Natural
Sciences and Engineering Research Council (NSERC) of Canada, the Canada Foundation for
Innovation (CFI), and an Ontario Graduate Scholarship (OGS).
Chapter 3:
Novel Proteins Identified in the Insoluble Byssal Matrix
of the Freshwater Zebra Mussel Dreissena polymorpha
Arpita Gantayet a and Eli D. Sone a,b,c,*
a Institute of Biomaterials & Biomedical Engineering; University of Toronto, Toronto, ON,
Canada
b Department of Materials Science & Engineering, University of Toronto, Toronto, ON, Canada
c Faculty of Dentistry; University of Toronto, Toronto, ON, Canada
*Corresponding author: Email: [email protected]
This chapter is in preparation as a manuscript to be submitted to the journal ‘Marine
Biotechnology’.
3.1 Abstract
The freshwater zebra mussel Dreissena polymorpha is an invasive, biofouling species that
adheres to a variety of substrates underwater using a proteinaceous holdfast called the byssus and
is therefore an inspiration for the development of water-resistant bioadhesives for medical and
dental applications. The byssus, consisting of a number of threads with adhesive plaques at the
tips, utilizes a rare amino acid called 3,4-dihydroxyphenylalanine (DOPA) to mediate adhesion
and cohesion within the byssus. This is similar to the much-studied marine mussel byssus but the
DOPA compositions are lower in zebra mussels, thus indicating the importance of other non-
DOPA interactions as well. Extensive DOPA cross-linking however renders the zebra mussel
byssus highly resistant to analysis and therefore limits byssal protein identification. We report
here on the sequencing and identification of seven novel byssal proteins in the insoluble byssal
matrix following protein extraction from induced, freshly secreted byssal threads with minimal
cross-linking. Comparisons of the protein sequences, as determined by LC-MS/MS analysis and
spectrum matching against a zebra mussel cDNA library, identified repeat patterns and block
structures as common features of zebra mussel byssal proteins and identified varying theoretical
molecular weights (4.1 to 14.6 kDa) and isoelectric points (4.2 – 9.6) of byssal proteins. All
proteins contain one or more defined sequence motifs including glycine rich, proline and tyrosine
rich and proline and cysteine rich motifs and all of the proteins were identified in both the thread
and plaque matrices.
Keywords: bioadhesion, DOPA, LC-MS/MS, mussel adhesion proteins, plaque, threads
3.2 Introduction
Zebra mussels (Dreissena polymorpha) are an invasive freshwater mussel species that are native
to the Black, Caspian and Azov Seas and were accidentally introduced into North American
water bodies such as the Great Lakes in the late 1980s [1]. These bivalves are able to attach
strongly to a variety of hard substrates underwater using a proteinaceous structure called the
byssus that consists of a number of threads with adhesive plaques at the tips and that is
surrounded by a protective layer called the cuticle [33]. The mussels’ ability to attach to boat
hulls has allowed them to spread rapidly over the years and in addition to causing major
ecological repercussions, this biofouling species is also able to clog water intake pipes and affect
water based industries such that its economic impact has been far in excess of $100 million [2].
The zebra mussel byssus is superficially similar to the byssi of the marine mussels which have
evolved independently as a different subclass [8], [3] and have been studied much more
extensively [18], however the zebra and marine mussel byssi differ in their overall amino acid
contents and hence, their overall protein compositions [33]. The byssi of both species contain
proteins containing the rare amino acid 3,4-dihydroxyphenylalanine (DOPA), however the
marine mussels have much higher compositions of this residue which is a post translational
hydroxylation of tyrosine [10, 18]. Additionally, while the zebra mussel byssus contains similar
compositions of DOPA and other amino acids within the threads and adhesive plaques, marine
mussels have threads composed of collagenous proteins versus DOPA containing proteins
localized in the plaques [9]. Underwater adhesive proteins are a characteristic of several aquatic
species including mussels [9], sandcastle worms [36], barnacles [37], starfish [38], sea
cucumbers [39] and caddisfly larvae [40]. The sandcastle worm, Phragmatopoma californica has
also evolved independently of the mussels to incorporate DOPA residues in some of its cement
proteins for the purpose of sticking sand grains together underwater [41]. Thus, DOPA is an
important component of adhesive proteins in other species as well. In the mussel byssus, DOPA
can undergo a variety of adhesive and cohesive interactions. It can form multiple metal mediated
ligations to give cohesive strength to the cuticle [46], it can undergo catechol oxidase mediated
oxidation to DOPA quinone followed by covalent cross-linking with DOPA, lysine and cysteine
in order to provide cohesive strength to the thread and plaque [15] and in its native form, DOPA
can bind to metal oxide surfaces and mediate surface adhesion [11]. However, the low DOPA
content in zebra mussel byssal proteins [10] and the mussels’ ability to adhere to a variety of
substrates (both hydrophobic and hydrophilic) [72] indicate that amino acid interactions
other than those of DOPA must also contribute to the adhesion and cohesion functions of byssal proteins.
Information on the protein compositions of the zebra mussel byssus and an insight into their
sequence properties will thus be useful in understanding the molecular mechanism of zebra
mussel adhesion. This knowledge will ultimately be useful in the design of biocompatible and
water-resistant adhesives for dental and medical applications and will contribute to the design of
targeted anti-fouling agents against this biofouling species.
Mature zebra mussel byssal threads are highly resistant to extraction and immunolocalization due
to extensive cross-linking of DOPA residues [33] thus making protein composition analysis quite
difficult. So far, six Dreissena polymorpha foot proteins (Dpfp0 – 5) have been identified by gel
electrophoresis and the primary sequences of three of these are known. Dpfp1, Dpfp2 and Dpfp3
were first identified as byssal proteins based on their ability to stain for DOPA in an
electrophoresed extract from the ‘foot’, the organ that secretes precursor byssal proteins to form
byssal threads [33]. Dpfp0, Dpfp4 and Dpfp5 were identified as silver-stained gel bands in a
soluble extract from induced, freshly secreted byssal threads with minimal cross-linking [73].
However, no information is available on the DOPA content of these proteins. Table 3-1
describes the molecular weights, DOPA contents and primary sequence information available for
each of the six identified byssal proteins. While Dpfp1 is the only protein for which the full
primary sequence is known [35], the primary sequences of Dpfp2 [33], [73] and Dpfp5 [73] have
also been determined by tandem mass spectrometry analysis and database matching against a
zebra mussel foot cDNA library created by Xu and Faisal, 2008 [56]; however, the cDNA
sequences are potentially incomplete at the N-terminus. Dpfp1 has a block-copolymer structure
consisting of 22 repeats of a heptapeptide P(V/E)YP(T/S)(K/Q)X at the N-terminus and 16
repeats of a tridecapeptide KPGPY*DYDGPYDK at the C-terminus, where Y* represents
DOPA [35]. Dpfp2 contains five full repeats of the near consensus sequence
KTY(P/E)AYPTK(Q/D)SYPVYPEKKYTE where the position of the tyrosine residue is always
conserved [73]. Dpfp5 also has a block structure where the N-terminus is rich in proline (P) and
glutamine (Q) triads, the C-terminus is rich in valine (V), glycine (G) and asparagine (N) based triads
and the middle region contains cysteine that is absent in the other sequences [73]. Interestingly,
all three byssal proteins have been identified in both the thread and the plaque [33], [73].
Matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS) of
mature byssal threads has additionally revealed the presence of several small molecular weight
byssal proteins in the range of 3.7 – 7 kDa that have not been identified yet [34]. These proteins
have distinct distributions between different regions of the byssus including the thread, plaque
and 10-20 nm [6] adhesive interface [34]. Identification of these proteins and information on
their sequences can therefore provide useful insights into the adhesive/cohesive roles of byssal
proteins.
Since mature byssal threads are highly resistant to extraction, and protein identification in the
mussel’s foot reveals no information on byssal distribution, we induce the secretion of fresh
threads such that byssal protein composition can be studied after secretion from the foot but
before extensive cross-linking. This method has been used extensively in marine mussels [58],
[25] and once before in the freshwater zebra mussels [73]. In marine mussels, it has been shown
that the induced byssal thread compositions are almost the same as in naturally secreted threads
[16], [25]. Assuming a similar scenario in zebra mussels, we use fresh, induced threads as a
model system to better understand mechanisms in natural threads. Following protein extraction
from induced byssal threads, the byssal sample can be centrifuged to separate out soluble extract
and insoluble matrix proteins. However, while the soluble proteins can be separated and
analyzed by electrophoresis [73], such analysis is not possible for the insoluble byssal proteins.
Therefore, we characterize the protein composition of the insoluble matrix by directly
performing tandem mass spectrometry analysis following trypsin digestion of the pellet proteins.
Here, we report on the identification of novel as well as previously known byssal proteins in
base-insoluble thread and plaque matrices and analyze the sequences of the seven novel byssal
proteins identified by zebra mussel cDNA database matching.
Table 3-1. Summary of the molecular weight and sequence information available for the six
identified zebra mussel byssal proteins (Dpfp), in decreasing order of their molecular weights as
determined by gel electrophoresis.
Dreissena polymorpha Foot Protein (Dpfp) | MW by Electrophoresis (kDa) | MW by MALDI-TOF (kDa) [35] | Max DOPA Content [33] | Primary sequence information known | Theoretical MW and pI based on primary sequence
Dpfp0 [73] | > 210 | - | - | - | -
Dpfp4 [73] | > 90 | - | - | - | -
Dpfp1 [33], [73] | 76 & 65 | 54.5 & 48.6 | 6.6% | Full sequence [35] | 49 kDa, pI 5.3-6.5
Dpfp5 [73] | ~ 30 | - | - | Incomplete N-terminus [73] | 20.1 kDa a, pI 6.4 a
Dpfp2 [33], [73] | 26 | - | 7.0% | Incomplete N-terminus [73] | 15.3 kDa a, pI 9.4 a
Dpfp3 [33] | 12-13 | - | Unknown | - | -
a MW and pI are theoretically calculated based on primary sequence lacking signal peptide
3.3 Methods
3.3.1 Protein extraction from induced byssal threads/plaques
Zebra mussels collected from Round Lake, Ontario, Canada were kept for up to 60 days at room
temperature in an aquarium in artificial freshwater prepared with a recipe by M. Sprung, 1987
[59]. Protein extractions were performed as described previously [73]. First, as described by
Tamarin et al., 1976, the mussels were dissected and the foot was injected with approximately 30
µL of 0.56 M potassium chloride (KCl) using an 18G syringe, thus leading to the secretion of a
fresh byssal thread [58]. After 3-5 minutes, the induced thread/plaque was located in the foot’s
ventral groove, was pulled out with tweezers, washed in a drop of deionized water and extracted
in extraction buffer. The extraction method was adapted from Zhao and Waite, 2006 [30]; byssal
threads (around 6 to 14 per extraction) were extracted in 250 µL of basic extraction buffer (EB)
(0.2 M sodium borate, 4 M urea, 1 mM KCN, 1 mM EDTA, and 10 mM ascorbic acid) that was
prepared using a recipe adapted from Rzepecki and Waite, 1993 [33]. Samples were
homogenized on ice in a 1 mL ground-glass hand-held tissue grinder, sonicated with a probe
sonicator (15 times, 2 s each) and centrifuged (17,000 g, 8 min, 4°C) [30]. The supernatant
(soluble extract) and the pellet (insoluble matrix) were stored separately at -20°C. Where
relevant, the byssal thread was separated into thread and plaque prior to extraction.
3.3.2 Protein digestion
The base-insoluble matrix proteins were trypsin digested at the Advanced Protein Technology
Centre, Sick Kids Hospital, Toronto by suspending in 50 mM ammonium bicarbonate, reducing
with 10 mM DTT (30 min, 56°C), alkylating with 55 mM iodoacetamide (15 min, dark, room
temperature) and then digesting with 1 µg trypsin (Porcine, Sequencing Grade, Promega)
overnight at 37°C. Extracted peptides were lyophilized by SpeedVac centrifugation and
reconstituted in 20 µL 0.1% formic acid in water for LC-MS/MS analysis.
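The peptides produced by this digestion step can be anticipated with a simple in-silico digest. The following is a minimal sketch, not the procedure used by the facility: it assumes the commonly used simplified trypsin rule (cleavage C-terminal to K or R, except when the next residue is P), ignores missed cleavages, and uses the Dpfp12 sequence from Table 3-2 with its predicted signal peptide removed. The function name `trypsin_digest` is hypothetical.

```python
def trypsin_digest(seq):
    """Split a protein sequence after K or R, unless the next residue is P.

    Simplified rule; missed cleavages and modifications are not modelled.
    """
    peptides = []
    start = 0
    for i, aa in enumerate(seq):
        nxt = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        # Keep the C-terminal remainder that does not end in K or R.
        peptides.append(seq[start:])
    return peptides


# Dpfp12 from Table 3-2, signal peptide removed (33 residues).
DPFP12 = "QYWNSYRPYPVYPPKQTYPSYPDKKYPSYPEKT"

if __name__ == "__main__":
    for peptide in trypsin_digest(DPFP12):
        print(peptide)
```

Under these assumptions the digest contains QTYPSYPDK and YPSYPEK, the two Dpfp12 peptides reported in Table 3-2; the R-P bond near the N-terminus is left uncut by the proline rule.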
3.3.3 Liquid chromatography – tandem mass spectrometry (LC-MS/MS)
LC-MS/MS analysis of extracted peptides was performed by the Advanced Protein Technology
Centre at MaRS Discovery District, Toronto. The digested peptides were loaded onto a 100 μm
ID pre-column (Dionex) at 4 μl/min and separated over a 50 μm ID analytical column (C18, 2 μm,
Dionex). The peptides were eluted over 60 min at 250 nl/min using a 0 to 35% acetonitrile
gradient in 0.1% formic acid delivered by an EASY n-LC 1000 nano-chromatography pump (Thermo
Fisher, Odense, Denmark). The peptides were eluted into a Q-Exactive hybrid
quadrupole/orbitrap mass spectrometer (Thermo-Fisher, Bremen, Germany) operated in
data-dependent mode. Data were acquired at 70,000 FWHM resolution in the MS mode and 17,500
FWHM in the MS/MS mode. Ten MS/MS scans were obtained per MS cycle.
3.3.4 Database matching and protein identification
MS data were analyzed using two mass spectrometry software packages, SEQUEST (Thermo Fisher
Scientific, San Jose, CA, USA; version 1.3.0.339) and PEAKS (Bioinformatics Solutions Inc.,
Waterloo, Ontario, Canada). While SEQUEST directly matches the mass spectrum data from the
protein extract against theoretical spectra from a library [74], the PEAKS program first
determines de novo peptide sequences based on the mass spectrum data and then matches these
sequences against a library [75]. In both programs, MS/MS data was searched against zebra
mussel and metazoa protein databases and against a zebra mussel Expressed Sequence Tag
(EST) library virtually translated in six different reading frames using Virtual Ribosome 1.1
(http://www.cbs.dtu.dk/services/VirtualRibosome/). The cDNA library comprising 750 genes
with Accession numbers AM229723 to AM230448 (downloaded November 2011 from the
Genbank Server) was prepared by Xu and Faisal, 2008 using a BD Clontech PCR-Select cDNA
Subtraction Kit [56] and represents genes expressed uniquely in the mussel foot. During creation
of this library, base pairs were removed from the 5’ end of the DNA to add adaptor sequences for
cDNA amplification. If the base pairs are removed from the 5’ translated region, the virtual
protein sequences are incomplete at the N-terminus [62].
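The six-frame virtual translation described above can be illustrated as follows. This is a minimal sketch rather than the Virtual Ribosome tool itself: it assumes the standard genetic code, marks stop codons with `*` and unrecognized codons with `X`, and the names `translate` and `six_frame` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate one reading frame; '*' is a stop, 'X' an unknown codon."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame(dna):
    """Return the three forward and three reverse-complement translations."""
    dna = dna.upper()
    rev = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames


if __name__ == "__main__":
    # Toy input, not a sequence from the EST library.
    for name, protein in six_frame("ATGAAATAG").items():
        print(name, protein)
```

Because an EST trimmed within the coding region lacks its start codon, a search over all six frames avoids having to assume where translation begins.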
When using SEQUEST, tandem mass spectra were first extracted, charge state deconvoluted and
deisotoped by BioWorks version 3.3. SEQUEST searches were run with a fragment ion mass
tolerance of 20 ppm and a parent ion tolerance of 15 ppm, and hits were manually confirmed by
inspecting the spectra. The iodoacetamide derivative of cysteine was specified as a fixed
modification, and deamidation of asparagine and glutamine, hydroxylation of tyrosine (DOPA)
and oxidation of methionine were specified as variable modifications. MS/MS based peptide and
protein identifications were visualized and validated using a program called Scaffold 3.4.9
(Proteome Software Inc., Portland, OR). SEQUEST peptide identifications required
deltaCn scores greater than 0.10 and XCorr scores greater than 1.2, 1.5, 2.0 and 2.2 for
singly, doubly, triply and quadruply charged peptides, respectively. The XCorr score for peptide
matches measures how closely the actual spectra fit the theoretical spectra [74]. Peptide
identifications were accepted at greater than 95.0% probability and protein identifications were
accepted at greater than 95.0% probability with at least 2 identified peptides.
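The charge-dependent score cut-offs above can be written as a small filter. This is an illustrative sketch only: it encodes the deltaCn and XCorr thresholds stated in the text, does not reproduce the Scaffold probability model, and rejects charge states above 4 because no threshold is given for them (an assumption made here). The function name `passes_sequest_filter` is hypothetical.

```python
# Minimum XCorr by peptide charge state, as stated in the text.
XCORR_MIN = {1: 1.2, 2: 1.5, 3: 2.0, 4: 2.2}
DELTA_CN_MIN = 0.10


def passes_sequest_filter(charge, xcorr, delta_cn):
    """Return True if a peptide-spectrum match clears both score cut-offs."""
    threshold = XCORR_MIN.get(charge)
    if threshold is None:
        # Assumption: charge states without a stated threshold are rejected.
        return False
    return xcorr > threshold and delta_cn > DELTA_CN_MIN


if __name__ == "__main__":
    # Hypothetical example values, not measurements from this study.
    print(passes_sequest_filter(charge=2, xcorr=2.89, delta_cn=0.25))
```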
Using PEAKS, de novo sequences derived from MS/MS data were matched against the databases
with tyrosine hydroxylation to DOPA set as a variable modification. Parent ion and fragment ion
mass tolerances were set to 5 ppm and 0.01 Da, respectively, and hits were manually confirmed
by inspecting the spectra. In PEAKS, the identification probabilities of the protein and peptide
matches are expressed as a -10LogP score. Protein identifications were accepted at a
-10LogP score greater than 50 with at least two identified peptides. In some relevant EST
matches, in both SEQUEST and PEAKS, where the protein identification criteria were not met
and where fewer than two peptides were identified, the protein identification was justified based on
the presence of repeat patterns and/or similarities to other known byssal proteins. The exceptions
are described in the results section.
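To make the PEAKS acceptance threshold and the mass tolerances more concrete, the short sketch below converts between a -10LogP score and the underlying P value, and between a ppm tolerance and an absolute mass window. It assumes the score is read literally as -10 times the base-10 logarithm of P; the helper names are hypothetical and the m/z value in the example is arbitrary.

```python
import math


def score_to_p(score):
    """Convert a -10LogP score to the corresponding P value."""
    return 10 ** (-score / 10)


def p_to_score(p):
    """Convert a P value to a -10LogP score."""
    return -10 * math.log10(p)


def ppm_to_da(mz, ppm):
    """Absolute mass window (Da) for a relative tolerance in ppm at a given m/z."""
    return mz * ppm / 1e6


if __name__ == "__main__":
    print(score_to_p(50))        # P value at the protein acceptance threshold
    print(ppm_to_da(1000.0, 5))  # 5 ppm window at an arbitrary m/z of 1000
```

Read this way, the protein threshold of 50 corresponds to P = 10^-5, and the cyclophilin A score of 49 reported in the results falls just below it.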
3.3.5 Sequencing data analysis
The theoretical mass, pI and amino acid composition of virtual EST protein matches were
determined using EMBOSS Pepstats and amino acid distribution metrics were determined using
the EMBOSS Pepinfo tools on the European Bioinformatics Institute website
(http://www.ebi.ac.uk/Tools/emboss/pepinfo/). Signal peptides were predicted using the SignalP
4.0 (http://www.cbs.dtu.dk/services/SignalP/) and PrediSi (http://www.predisi.de/) online tools,
and multiple sequence alignments were performed with the Clustal-W2 online tool
(http://www.ebi.ac.uk/Tools/msa/clustalw2/). Protein homology searches were done using NCBI
Protein BLAST (Basic Local Alignment Search Tool)
(http://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Proteins). Conserved domains were predicted
using SMART (Simple Modular Architecture Research Tool) (http://smart.embl-heidelberg.de/).
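The theoretical mass and amino acid composition reported for each sequence can be approximated with a few lines of code. This is a minimal sketch, not EMBOSS Pepstats: it assumes standard average residue masses and an unmodified peptide, and it omits the pI calculation because that depends on the chosen set of pK values. The function names are hypothetical; the example sequence is Dpfp12 from Table 3-2 without its signal peptide.

```python
from collections import Counter

# Average residue masses in Da (amino acid minus one water molecule).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528


def molecular_weight(seq):
    """Average molecular weight (Da) of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def mol_percent(seq):
    """Mole percent of each residue, most abundant first."""
    counts = Counter(seq)
    return {aa: 100 * n / len(seq) for aa, n in counts.most_common()}


DPFP12 = "QYWNSYRPYPVYPPKQTYPSYPDKKYPSYPEKT"

if __name__ == "__main__":
    print(f"{molecular_weight(DPFP12) / 1000:.1f} kDa")
    for aa, pct in mol_percent(DPFP12).items():
        print(aa, f"{pct:.0f}%")
```

For Dpfp12 this gives about 4.1 kDa with P and Y at 24% each and K at 12%, in line with the values in Tables 3-2 and 3-3.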
3.4 Results and Discussion
3.4.1 Identification of novel and known proteins in base-insoluble thread and
plaque matrices
Owing to the cross-linking mediated resistance to characterization of mature zebra mussel byssal
threads, we performed our analysis on fresh threads that have undergone minimal cross-linking.
Induced byssal threads were separated into thread and plaque and separate peptide fragment
fingerprinting analysis was performed on their insoluble matrices. This involved LC-MS/MS
analysis of the digested matrices followed by spectrum matching against a cDNA library of zebra
mussel foot proteins. Analysis of the MS/MS data with SEQUEST and PEAKS led to the
identification of a number of EST matches that sometimes overlapped and that were sometimes
unique to either program. EST matches were therefore primarily determined using SEQUEST
and PEAKS was then used to supplement the identifications. The identified EST matches were
found to represent both previously known zebra mussel byssal proteins as well as novel protein
sequences. Table 3-2 lists the accession numbers of the novel EST matches, the program
that identified them and the probabilities of the protein and peptide-spectrum matches. In
addition to byssal proteins, contaminating cellular proteins from the foot tissue,
including cytoplasmic actin, translation elongation factor 1α, alpha tubulin, histone 3 and
cyclophilin A, were also identified with PEAKS -10LogP scores of 193, 120, 77, 70 and
49, respectively.
All of the EST matches (for both known and novel byssal proteins) obtained from the insoluble
matrices were identified in both the thread and the plaque extracts, though sometimes with
differing probabilities. The non-specific distribution of Dpfp1 [33], [73], Dpfp2 [73] and Dpfp5
[73] had also previously been determined by electrophoresis of soluble thread and plaque
extracts, although distribution analysis of Dpfp2 in mature threads and plaques by Rzepecki and
Waite, 1993 had, with some uncertainty, revealed unique localization of Dpfp2 in the thread
[33]. In the current distribution analysis of separated thread and plaque matrices, it appears that
all of the sequenced zebra mussel byssal proteins identified thus far have a non-specific
distribution between thread and plaque; however, there is an additional possibility of some thread
contamination in plaque samples since separation of thread from plaque is not always exact.
Additionally, any thread or plaque based linker proteins present near the thread-plaque anchor
zone might appear to be present in both the thread and plaque during analysis, thereby making it
difficult to determine their distributions.
Among the novel EST matches, similar EST sequences with the same spectrum match were
clustered together and described as the single sequence of a putative novel protein. A total of
seven novel proteins were thus identified in the insoluble extracts. These were named Dpfp6 –
Dpfp12 in decreasing order of their molecular weights. Three of these, Dpfp7, Dpfp9 and
Dpfp11 are represented by two or more similar sequences with unique peptide matches. These
protein sequences are described as variants α, β and γ of the novel protein. Table 3-2 describes
the deduced sequences of the seven novel byssal proteins Dpfp6 – Dpfp12 and their variants.
Post-translational modifications (PTMs) observed include asparagine deamidation (N) and
cysteine carbamidomethylation (C). While deamidation can be an artifact of basic pH conditions
and protein aging [76], [77], cysteine carbamidomethylation is a fixed modification that results
from alkylation of cysteine during preparation for trypsin digestion [76]. No tyrosine
hydroxylations to DOPA were observed in any of the peptide matches. Signal peptides
(underlined sequences in Table 3-2) were observed for all proteins except Dpfp6 and Dpfp8.
The missing or incomplete signal
sequences could be due to removal of base pairs from the 5’ end in order to add adaptor
sequences for cDNA amplification during creation of the cDNA library [56]. For our purposes,
the adaptor sequences were removed from the N-terminus of all EST matches prior to
sequencing analysis. In sequences with intact signal peptides, the base pairs may have been
removed from the 5’ untranslated region. Additionally, while stop codons were observed at the
C-terminus of almost all EST matches, stop codons were surprisingly absent in the Dpfp12
matches. The YLGRDHANRIPAA sequence at the Dpfp12 C-terminal end is however observed
throughout the cDNA library as the end sequence in several C-terminal non-coding sequences.
The EST sequence AM230111 is an example where the sequence is preceded by a stop codon.
Therefore, the YLGRDHANRIPAA sequence was assumed to follow a missing stop codon and
was removed from the end during sequence analysis.
The only three zebra mussel byssal proteins with known sequences, Dpfp1 [35], Dpfp2 [33], [73]
and Dpfp5 [73], were also identified in the insoluble byssal extracts. In SEQUEST, Dpfp1
(AF265353), Dpfp2 (AM229739) and Dpfp5 (AM230139) were identified with 100%
identification probability with the identification of 8, 5 and 2 unique peptides, respectively.
Previously, the gel band derived EST match for Dpfp5 had revealed only one peptide-spectrum
match (YVGEGNNVGEQR) [73]. Here an additional match (NDVDGNENIVGGQSNAVGGK)
with an XCorr score of 2.89 was found. This confirms the previous identification of Dpfp5. In
1993, Rzepecki and Waite had additionally identified a precursor DOPA containing protein
called Dpfp3 by electrophoresis of the zebra mussel foot extract [33]. This protein ran as a 12-13
kDa mixture on the gel but its sequence has not yet been determined [33]. It is therefore possible
that some of the novel proteins identified in this analysis might actually correspond to the known
protein Dpfp3. However, electrophoresis often overestimates the molecular weight of the DOPA
containing proteins and we are unaware of all PTMs on the novel sequences. Hence, it is not
possible to determine which of these novel proteins, if any, might correspond to Dpfp3.
Table 3-2. Sequences of novel byssal proteins identified in insoluble plaque and thread extracts by LC-MS/MS analysis and database
matching against a zebra mussel foot protein cDNA library. The proteins have been named Dpfp6 – Dpfp12 in decreasing order of
their molecular weights (MW). Sequences are deduced from clustered sequencing where more than 1 EST match was found. Signal
peptides are underlined and peptides that match with mass spectra are indicated in bold. The probability of their matches is also
indicated. Post translational modifications observed include asparagine deamidation (N) and cysteine carbamidomethylation (C).
Protein Name (GenBank Accession Number) | Protein Sequence (# of amino acids) a | Theoretical MW a, pI a | Protein Identification Probability (PEAKS b, SEQUEST c) | Matching Peptide Sequences | Peptide Scores, PEAKS (-10logP) | Peptide Scores, SEQUEST (X-Corr)
Dpfp6 b (AM229723, 229736, 229737) | YDPVEDKKPGPYDYDGPYDKNPGPYDYDGPYDKKPDPYGTDWQYDKKTGPYVPDKSEDKKPGPYDYDGPYDKNPGPYDYNGPYDKKPGPYDYDGPYDKKPGPYDYDGPYDIKPGPYDYDVPRPRPR (126) | 14.6 kDa, pI 4.2 | 109 b | YDYDGPYDK; NPGPYDYDGPYDK; KPDPYGTDWQYDKK; KPGPYDYDGPYDK | 19; 50; 61; 53 | -
Dpfp7α bc (AM230153) | MFSTVTLVLLVSCCGVALSSWIPYGKSYLPQQPAGKGGYWNSYLPQYENYGPQQYQGSYWPGPWGGWRGNNVGSQGNSVSGYGNAVGSQGNNVDGYGNDVGWQWNSVDGKGNYVGSQWNSVN (103) | 11.2 kDa, pI 6.5 | 80 b, 89 % c | SYLPQQPAGK | 69 | 2.2
Dpfp7β c (AM230070) | MFSTVTLVLLVSCCGAAFSSWSPYWNSYLPGQGSGKGGYWNSNVPKYGSYWPQQYPSYSGSYWPGWGNNVGSQGNSVRGYGNAVGSQGNDVSGYGNDVGWQWNSVDGKGNYVGSQWNSVN (101) | 11.0 kDa, pI 8.7 | 100% c | GGYWNSNVPK | - | 2.2
Dpfp7γ c (AM230146, 230189, 230411) | MFSTVTIVLLVSCCGAALSSWIPYGNSYSPEQGKGGYWNSYLPKYESYRPQQYPSYPGSYWPGPWGGWQGDNVGSQKNSVDGTGNYVGWQKNYVN (76) | 8.7 kDa, pI 8.7 | 100% c | GGYWNSYLPK; NSVDGTGNYVGWQK | - | 2.5; 4.5
Dpfp8 c (AM230242, 230362) | VQDHMSVRLDNVLKVLGGVATGNKYSSDEIATLVGSTGGGSVNTGGYSKGTYPVPYGTGGVSGYKSGGR (69) | 6.9 kDa, pI 9.5 | 100% c | LDNVLKVLGGVATGNK; YSSDEIATLVGSTGGGSVNTGGYSK; GTYPVPYGTGGVSGYK | - | 2.1; 6.3; 3.5
Dpfp9α c (AM229975) | MNIKQLMCLLVAAVALLAIAPVANAQYYDYGYGGNNYGYPGNYGYGGNYGGYPGKYGDYDNYGGGWLYKILGGGGKGKGKWGGYGGYGK (64) | 6.8 kDa, pI 9.3 | 89% c | YGDYDNYGGGWLYK | - | 4.3
Dpfp9β c (AM229830) | MNTKQLMCVLYAAVVLLAVANAQYCDYGYGGNNYGYPGNYGYGGNYGGYPRNYGDYDNYGGGWLYKILGGGGIGKGKWGGYGGYGK (64) | 6.8 kDa, pI 8.8 | 82% c | NYGDYDNYGGGWLYK | - | 4.2
Dpfp10 c (AM230045, 230046, 230047, 230048, 230050, 230051, 230052, 230053, 230054, 230055, 230056) | MLSAVSFLLLVTLYVTVSSQTYKGYPPPKPYPKDPCYKVYCPPIYCPKGQYTPPGECCPRCKKGYGYQDPDPYFPGGK (59) | 6.7 kDa, pI 8.8 | 100% c | VYCPPIYCPK; GQYTPPGECCPR | - | 2.7; 3.0
Dpfp11α c (AM230400) | MLSAVTLLLLVSCCGMALSQWGGDSCRPIYPPLDCRLVFCQPAINCRYGNYTPKGHCCSVCIEDCWGWPWPWGK (55) | 6.4 kDa, pI 7.6 | 89% c | LVFCQPAINCR | - | 3.2
Dpfp11β b (AM230182) | MLSAVTLLLLVSCCGMALGQWGGDRCSPRYPPLDCTVVLCAFPINCRYGSFTPKGRCCPVCIEDCWGWPWPGK (54) | 6.1 kDa, pI 8.0 | 58 b | YGSFTPK | 49 | -
Dpfp12 c (AM230355, 230369, 230302) | MFSAATLLLLVSFYGTASGQYWNSYRPYPVYPPKQTYPSYPDKKYPSYPEKT (33) | 4.1 kDa, pI 9.6 | 100% c | QTYPSYPDK; YPSYPEK | - | 2.3; 2.3
a Sequence properties were calculated after removing the predicted signal peptide sequence
b EST sequence identified with PEAKS program
c EST sequence identified with Scaffold program using the SEQUEST search engine
3.4.2 Sequence properties of novel byssal proteins identified in the insoluble
extracts
The seven novel byssal proteins identified in the insoluble extracts display varied sequence
characteristics. After removing signal peptides, the sequence lengths range from 33 to 126
residues, molecular weights range from 4 kDa to 15 kDa and the isoelectric points range from
acidic (pI 4.2) to basic (pI 9.6) (Table 3-2). Table 3-3 shows the theoretical compositions (in
mol %) of the most prominent amino acids present in the sequence of the novel proteins as well
as previously known zebra mussel byssal proteins. These byssal proteins are collectively rich in
Pro (P), Gly (G), Tyr (Y) and Asx (D/N) and also have significant compositions of Ser (S), Lys
(K) and Val (V). While most protein sequences contain no cysteine, Dpfp9β, Dpfp10,
Dpfp11 (α and β) and Dpfp5 contain 1, 6, 8 and 6 cysteine residues, respectively. Based on these
theoretical amino acid compositions of the protein sequences, the byssal proteins can be
categorized as ‘Glycine rich’, ‘Proline and Tyrosine rich’ (P,Y rich) and ‘Proline and Cysteine
rich’ (P, C rich) as shown in Table 3-3. The P, C rich category includes not only proteins with
high P and C content but also proteins in which cysteine is not among the most prominent
amino acids but is still present in significant proportions relative to other byssal proteins.
Table 3-3. Theoretical mole % compositions of prominent amino acids found in the sequences of
previously known zebra mussel byssal proteins and novel byssal proteins identified in the
insoluble byssal extracts.
Proteins | Variants | Prominent amino acids: 1st | 2nd | 3rd | Notable | Protein Category
NOVEL
Dpfp6 | - | P (22%) | D (21%) | Y (20%) | | P, Y rich
Dpfp7 | Dpfp7α | G (21%) | N (13%) | Y, S (11%) | | Glycine rich
Dpfp7 | Dpfp7β | G (21%) | S (16%) | N (13%) | | Glycine rich
Dpfp7 | Dpfp7γ | G (17%) | Y (15%) | S (12%) | | Glycine rich
Dpfp8 | - | G (17%) | V (13%) | S (12%) | | Glycine rich
Dpfp9 | Dpfp9α | G (41%) | Y (25%) | K (9%) | | Glycine rich
Dpfp9 | Dpfp9β | G (39%) | Y (23%) | N (9%) | | Glycine rich
Dpfp10 | - | P (25%) | Y (17%) | K (14%) | C (10%, 5th) | P, C rich
Dpfp11 | Dpfp11α | C (15%) | P (13%) | G (11%) | | P, C rich
Dpfp11 | Dpfp11β | C (15%) | P (15%) | G (11%) | | P, C rich
Dpfp12 | - | P (24%) | Y (24%) | K (12%) | | P, Y rich
KNOWN
Dpfp1 [35] | - | P (24%) | Y (24%) | D, K (11%) | | P, Y rich
Dpfp2 [33], [73] | - | Y (23%) | K (18%) | P (16%) | | P, Y rich
Dpfp5 [73] | - | P (18%) | G (12%) | Q (12%) | Y (10%, 4th); C (3%, 7th) | P, Y & P, C rich
3.4.3 Proline and tyrosine (P, Y) rich proteins
Dpfp6, a protein resembling the C-terminus of Dpfp1
The EST matches obtained for Dpfp6 give high probability protein matches in PEAKS and
contain four peptide spectrum matches thus indicating a high probability identification of the
protein (Table 3-2). The sequence however displays no signal peptide and might therefore be
incomplete at the N-terminus owing to limitations in the creation of the cDNA library [62] (see
Methods). The clustered sequence of Dpfp6 is 126 residues long and has a theoretical MW of
14.6 kDa and an acidic theoretical pI of 4.2 (Table 3-2). The sequence is richest in proline,
aspartic acid and tyrosine (Table 3-3) and also contains considerable lysine (13%) and glycine
(12%). Interestingly, the sequence contains several repeats of a consensus sequence
KPGPYDYDGPYDK as shown in Figure 3-1A. The consensus sequence consists of two PY
diads (indicated in bold in Figure 3-1A) and the position of the first diad is conserved through
all the consensus repeats.
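The repeat structure described above can be checked by cutting the Dpfp6 sequence into consecutive 13-residue windows after the 7-residue N-terminal fragment, following the layout of Figure 3-1A. This is an illustrative sketch under those assumptions; the helper names are hypothetical, and the fixed window splits the final 15-residue row of the figure into a 13-residue block plus a 2-residue tail.

```python
CONSENSUS = "KPGPYDYDGPYDK"

# Dpfp6 clustered EST sequence from Table 3-2 (126 residues).
DPFP6 = (
    "YDPVEDKKPGPYDYDGPYDKNPGPYDYDGPYDKKP"
    "DPYGTDWQYDKKTGPYVPDKSEDKKPGPYDYDGP"
    "YDKNPGPYDYNGPYDKKPGPYDYDGPYDKKPGPYD"
    "YDGPYDIKPGPYDYDVPRPRPR"
)


def split_repeats(seq, offset=7, size=13):
    """Return the N-terminal head and consecutive fixed-size blocks."""
    head = seq[:offset]
    blocks = [seq[i:i + size] for i in range(offset, len(seq), size)]
    return head, blocks


def identity(block, consensus=CONSENSUS):
    """Number of positions at which a block matches the consensus."""
    return sum(a == b for a, b in zip(block, consensus))


if __name__ == "__main__":
    head, blocks = split_repeats(DPFP6)
    print(head)
    for block in blocks:
        print(block, identity(block), "PY at 4-5" if block[3:5] == "PY" else "")
```

With this windowing, every full-length block carries the first PY diad at positions 4 and 5, consistent with the conserved position noted in the text.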
A BLAST search of the Dpfp6 sequence did not reveal any significant protein matches.
Interestingly though, when aligned against the sequence of Dpfp1, Dpfp6 shows high levels of
similarity with the C-terminus of the Dpfp1 protein (residues 230 – 430) (Figure 3-1B). Both
proteins contain repeats of the KPGPYDYDGPYDK consensus sequence shown as underlined
sequences in Figure 3-1B. Dpfp6 (pI 4.2) also resembles the C-terminus of Dpfp1 (pI 4.3) in
terms of its highly similar acidic isoelectric point. These similarities in pI and repeat patterns
suggest that Dpfp6 has a function very similar to that of the Dpfp1 C-terminus. In Dpfp1,
the C-terminus is specifically the DOPA-containing region of the protein, where the first Y in
KPGPYDYDGPYDK is modified to DOPA [35]. Thus, if the similarities between
Dpfp1 and Dpfp6 are extended to DOPA compositions as well, this could indicate a specifically
DOPA-dependent role for Dpfp6. The two sequences however differ in the number of consensus
sequence repeats (Figure 3-1B). Dpfp1 contains 15 full and 1 incomplete repeat of the consensus
sequence as described by Anderson and Waite, 2000 [35]. Dpfp6 contains only 8 complete
repeats and two incomplete repeats of the consensus sequence. There is a possibility that Dpfp6
is just a truncated version of a Dpfp1 variant whose mRNA was not completely reverse
transcribed during creation of the cDNA library; however, this is unlikely since only one of the
four peptide-spectrum matches in Dpfp6 (KPGPYDYDGPYDK) overlaps with the sequence of
Dpfp1 (Table 3-2). It is interesting that Dpfp6 mimics only the acidic part of the Dpfp1 protein,
possibly suggesting a specialized role for the protein’s acidity within the byssus.
A.
YDPVEDK
KPGPYDYDGPYDK
NPGPYDYDGPYDK
KPDPYGTDWQYDK
KTGPYVPDKSEDK
KPGPYDYDGPYDK
NPGPYDYNGPYDK
KPGPYDYDGPYDK
KPGPYDYDGPYDI
KPGPYDYDVPRPRPR
B.
Dpfp6 --------------------YDPVEDK KPGPYDYDGPYDK NPGPYDYDGPYDK 22
Dpfp1(C-term) YPGYQPEYHRRPPVYPPVYPYDPVEDK KPGPYDYDGPYDK NPGPYDYDGPYNK 255
******* ************* *********** *
Dpfp6 ----------------------------KPDPYGTDWQYDK KTGPYVPDKSEDK 48
Dpfp1(C-term) KPNPYGTDWQYDK KTGPYVPIKPDDK KPNPYGTDWQYDK KTGPYVPDKSEDK 307
** ********** *************
Dpfp6 KPGPYDYDGPYDK---------------NPGPYDYNGPYDK KPGPYDYDGPYDK 87
Dpfp1(C-term) KPGPYDYDGPYDK NPGPYDSDGPYNK KPGPYDYDGPYDK NPGPYDYNGPYDK 359
************* ****** ***** ****** *****
Dpfp6 KPGPYDYDGPYDI KPGPYDYDVP----RPRPR---------------------- 115
Dpfp1(C-term) KPGPYDYDGPYDI KPGPYDYDVPYDK KPDPYDTDGPYDK KTGPYVPDKPDDK 411
************* **********
Dpfp6 --------------------
Dpfp1(C-term) KTDPYVPDVPLEP PGPLGK 430
Figure 3-1. Sequence analysis of the EST derived sequence of Dpfp6 (AM229723) (A) Repeat
pattern of the consensus sequence KPGPYDYDGPYDK. PY diads are indicated in bold letters.
(B) Sequence alignment of the Dpfp6 sequence with the C-terminus (residues 230 – 430) of
previously described byssal protein Dpfp1 (AF265353). Underlined sequences represent
alternate repeats of the KPGPYDYDGPYDK consensus sequence. Italicized sequences indicate
incomplete repeats. * indicates residues that are conserved between the sequences.
Dpfp12, a protein resembling fragments of Dpfp1, Dpfp2 and Dpfp5
The EST matches obtained for Dpfp12 revealed a 100% protein identification probability in
SEQUEST and displayed two unique peptide-spectrum matches (Table 3-2) but while the
Dpfp12 sequences display N-terminal signal peptides, no stop codon is observed at the C-
terminus (Figure 3-2). As described earlier when introducing the EST match results, the
YLGRDHANRIPAA sequence at the C-terminal end is preceded by a stop codon in several
sequences throughout the cDNA library and was therefore assumed to be after a missing stop
codon and was removed during sequence analysis. The clustered EST derived sequence of
Dpfp12 is then 33 amino acids long and has a theoretical MW of 4.1 kDa and a very basic
theoretical pI of 9.6 (Table 3-2). Unlike all of the other identified zebra mussel byssal proteins,
the Dpfp12 sequence has no glycine. This sequence is richest in tyrosine, proline and lysine
(Table 3-3) and also contains considerable serine (9%) and threonine (6%). It contains three repeats of
the consensus YPSYPXK where non-italicized residues are highly conserved and X = P, D, E
(Figure 3-2). The consensus thus contains two diads of YP that are highly conserved between
the repeats and that are indicated in bold in Figure 3-2. Dpfp12 also contains a fourth repeat
(residues 1 – 7) that partially matches the consensus sequence and consists of two tyrosine
residues separated by three residues (Figure 3-2). When aligned against known zebra mussel
byssal proteins, Dpfp1, Dpfp2 and Dpfp5, the Dpfp12 sequence shows some similarity with all
of the proteins. In comparison with the N-terminus of Dpfp1, the PYPVYP sequence in Dpfp12 is
similar to KYPVYP, TYPSYPD is similar to QYPEYPS and KYPSYP is similar to QYPVYP in
Dpfp1 [35]. Similarities with Dpfp2 include YPDKK and YPEKTY, which resemble YPDKKTY
in Dpfp2 [73], [33]. Dpfp2 is also poor in glycine (only 2%) and is YP rich.
In comparison with Dpfp5, the PYPVYPPKQTYPSYPDK sequence in Dpfp12 corresponds well to
the SYPTYPPKQSYPAYPPK sequence in the N-terminus of Dpfp5 [73]. The N-terminus of
Dpfp5 is also similar in that it contains no glycine and is YP rich.
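The YPSYPXK repeats in Dpfp12 can be located with a regular expression. This is a minimal sketch: the pattern fixes the two YP diads and the terminal K, restricts X to P, D or E as stated in the text, and allows any residue at the third position, which is an assumption made here to accommodate the V/S variation seen in Figure 3-2. The function name `find_repeats` is hypothetical.

```python
import re

# Dpfp12 from Table 3-2, signal peptide removed (33 residues).
DPFP12 = "QYWNSYRPYPVYPPKQTYPSYPDKKYPSYPEKT"

# Two conserved YP diads, X restricted to P/D/E, terminal K.
REPEAT = re.compile(r"YP[A-Z]YP[PDE]K")


def find_repeats(seq):
    """Return (1-based start position, matched repeat) for each hit."""
    return [(m.start() + 1, m.group()) for m in REPEAT.finditer(seq)]


if __name__ == "__main__":
    for start, repeat in find_repeats(DPFP12):
        print(start, repeat)
```

The three hits correspond to the three consensus repeats; the partial N-terminal repeat (residues 1 to 7) is not matched because it lacks the YP diads.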
A BLAST search of the complete Dpfp12 sequence revealed no significant matches. However, a
BLAST search only of the last 26 residues, which comprises the three consensus repeats
(PYPVYPPKQTYPSYPDKKYPSYPEKT), reveals matches with Dpfp1 (score 36.3 bits), two
variants of the Mytilus californianus foot protein mfp-1 (max score 35.4) and with Mytilus edulis
mfp-1 (score 33.3). The mfp-1 protein in M. edulis (108 kDa) and M. californianus (90 kDa)
[18] is a key byssal protein present strictly in the cuticle that is rich in proline (25%), lysine
(21%), tyrosine (19%), threonine (13%) and serine (9%) [78]. It is highly basic with a pI of 10.0,
contains 13% DOPA and unlike the other marine mussel proteins, is composed of only 0.4%
glycine [78]. Thus, while Dpfp12 (4.1 kDa) differs from mfp-1 in terms of its much lower
molecular weight, it greatly resembles mfp-1 in terms of homology matches, overall prominent
amino acid content, low glycine content and basicity. The similarities could thus possibly
suggest an mfp-1 like cuticle based role for Dpfp12. Using the partial BLAST search, matches
are also seen with other protective structural proteins including a putative cuticle protein in the
Brine shrimp Artemia franciscana (score 38.8 bits) and choriogenin H minor protein in the
acellular vitelline envelope surrounding the oocyte in the Mangrove Killifish (Kryptolebias
marmoratus) (score 34.6). These matches again indicate a structural, protective function for
Dpfp12 in an aquatic environment such as in the cuticle of the zebra mussel byssus.
MFSAATLLLLVSFYGTASG
QYWNSYR (7)
PYPVYPPKQ (16)
TYPSYPDK (24)
KYPSYPEKT (33)
YLGRDHANRIPAA
Figure 3-2. Illustration of the pattern of sequence repeats in the clustered EST derived sequence
of Dpfp12. YP diads are indicated in bold. Numbers in brackets represent the position of the last
amino acid in the row. The italicized sequence represents the sequence believed to be preceded
by a stop codon and thus ignored during sequence analysis.
3.4.4 Glycine rich proteins
Dpfp7, a Dpfp-5 like protein
Three variants of the Dpfp7 sequence were identified by EST matching. These variants, named
Dpfp7α, β and γ are similar in sequence layout and repeat patterns but display unique peptide –
spectrum matches which indicates that the different variants must exist at the protein level
(Table 3-2). Dpfp7γ has a 100% identification probability and two peptide spectrum matches
with high SEQUEST XCorr scores (Table 3-2). Dpfp7α and Dpfp7β each contain only a single
good spectrum match and therefore do not meet the identification criteria. However, owing to the
high degree of similarity between the three variants (Figure 3-3A), we consider Dpfp7α and
Dpfp7β to be positive byssal protein matches as well. The Dpfp7α and Dpfp7β sequences
represent a single EST match whereas the Dpfp7γ sequence represents a clustered sequence
(Table 3-2). After removal of the N-terminal signal peptides, Dpfp7α and Dpfp7β have similar
theoretical molecular weights of 11.2 and 11.0 kDa, respectively. However, while Dpfp7α (pI
6.5) is acidic, Dpfp7β (pI 8.7) is basic. Dpfp7γ is smaller with a MW of 8.7 kDa but is also basic
(pI 8.7) like Dpfp7β (Table 3-2). The three variants are rich in glycine (G), asparagine (N),
serine (S), tyrosine (Y) and also contain considerable glutamine (Q) and tryptophan (W) (Table
3-3). Dpfp7α and β have a greater composition of glycine, asparagine and valine than Dpfp7γ and
Dpfp7β has more serine than the other two. All three variants have 11 tyrosine residues in the
sequence and the position of Y is generally conserved (Figure 3-3A).
The Dpfp7 variants display specific repeat patterns that are distinct between the N-terminus and
C-terminus. Figure 3-3A illustrates this pattern of repeats. The N-terminus in all variants
contains four repeats of the near consensus sequence SY(L/W)PQQ shown in yellow highlights
where non-italicized residues are highly conserved. The C-terminus is rich in repeats of
GNNVG(G/S) shown in blue highlights, where non-italicized residues are highly conserved.
While Dpfp7α and β have six full and one incomplete C-terminal repeat, Dpfp7γ has only two
full and one such incomplete C-terminal repeat (Figure 3-3A). In addition to differing in the
kinds of consensus repeats, the N- and C-termini of Dpfp7 also vary in their isoelectric points.
The N-terminus of Dpfp7 is basic (Dpfp7α, β and γ have N-terminal pI 9.3, 9.3 and 8.8
respectively) and C-terminus is acidic (Dpfp7α, β and γ have C-terminal pI 3.8, 4.3 and 6.4
respectively). They are thus comparable to previously known byssal proteins Dpfp1 [35] and
Dpfp5 [73] which are also known to possess termini with distinct repeats and pI’s.
A BLAST search of the Dpfp7 variants did not reveal any significant protein matches. However,
when aligned against the EST derived sequence of Dpfp5 described previously [73], Dpfp7
shows several regions of similarity as shown in grey highlights in Figure 3-3B for Dpfp7α.
Major similarities include a couple of aligned, near consensus (S/A)Y(L/P)PQQ repeats at the N-
terminus and several aligned, near consensus GNNVGG repeats at the C-terminus of both
proteins. The Dpfp7 N-terminus does not have the continuous chains of PQQ like repeats seen in
Dpfp5 and instead has SY(L/W)PQQ and other intermediate sequences. Regions of Dpfp7α
similarity with the middle region of Dpfp5 are limited (Figure 3-3B) and unlike the middle
region of Dpfp5, Dpfp7 has no cysteine residues. In both proteins however a series of tryptophan
(W) residues contribute to the transition to the C-terminus. Dpfp7 is therefore like a compact
version of Dpfp5, containing a bit of its N-terminus, much of its C-terminus and missing the
middle cysteine containing region. Dpfp7 must thus have byssal roles similar to those of Dpfp5 but
lack the cysteine related functions.
The presence of protein variants is commonly seen in marine mussel byssal proteins. Some
marine mussel proteins such as the plaque protein mfp-3 have even up to 35 different sequence
variants [18]. The sequence variants, including those of Dpfp7, could also represent different
mature messenger RNA (mRNA) variants created by RNA editing or alternative splicing of one primary
RNA transcript [67]. Additionally, since multiple mussels are used for analysis, there could be
multiple alleles of the same gene in the cDNA library and the Dpfp7 sequence variants could
arise from these allelic variations between mussels [29]. Zhao et al., 2006 identified several
variants of the M. californianus interfacial protein mcfp-3 having varied isoelectric points and Y
and R modifications [79]. It was speculated that these variants increase the variety of interactions
the protein can undergo and provide flexibility to match with varied surface features [78].
Similarly, the differences between the Dpfp7 variants, such as varied repeat patterns and
differing size and isoelectric points, could also have evolved to promote such variety and
flexibility in zebra mussel adhesion.
A.
Dpfp7α SWIPYGKSYLPQQPAGKGGYWNSYLPQYENYGPQQ---YQGSYWPGPWGGWR 49
Dpfp7β SWSPYWNSYLPGQGSGKGGYWNSNVPKYGSYWPQQYPSYSGSYWPGW----- 47
Dpfp7γ SWIPYGNSYSPEQ--GKGGYWNSYLPKYESYRPQQYPSYPGSYWPGPWGGWQ 50
** ** ** * * ******** * * * *** * ******
Dpfp7α GNNVGSQGNSVSGYGNAVGSQGNNVDGYGNDVGWQWNSVDGKGNYVGSQWNSVN 103
Dpfp7β GNNVGSQGNSVRGYGNAVGSQGNDVSGYGNDVGSQWNSVDGKGNYVGSQWNSVN 101
Dpfp7γ ----------------------------GDNVGSQKNSVDGTGNYVGWQKNYVN 76
* ** * ***** ***** * * **
B.
Dpfp7α --SWIP-------------------YGK----SYLPQQ--PAG--KGGY- 20
Dpfp5 YNSWPPKPNQPQQPQQPQQPPQPPRYPQPSYPAYPPQQSYPAYPPKQSYP 50
** * * * *** ** * *
Dpfp7α -------WNSYLPQY--------------------ENYG---------PQ 34
Dpfp5 TYPPKQSYPAYPPKQSYPTNPPYNPCDAVYCRPIYCNYGQYTPQGECCPQ 100
* * *** **
Dpfp7α QYQGSYWPGPWGGWR--------------GNNVGSQGNSVSGYGNAVGSQ 70
Dpfp5 CNPGTYLPEKWS-WKGNNVVGDQEKYVGEGNNVGEQRNDVDGNENIVGGQ 149
* * * * * ***** * * * * * ** *
Dpfp7α GNNVDGYGNDVGWQWNSVDGKGNYVGSQWNSVN- 103
Dpfp5 SNAVGGKGNDVGEQKNAVGGSGNTVGWQGNNVGG 183
* * * ***** * * * * ** ** * * *
Figure 3-3. Sequence alignment of the EST derived sequences of Dpfp7 (A) Alignment of the
three variants of Dpfp7 (Dpfp7α, β and γ) amongst each other. The yellow and blue highlights
represent repeats of two different consensus sequences. * represents residues conserved in all
three variants. (B) Alignment of the Dpfp7α sequence with the EST derived sequence of Dpfp5
(AM230139) described previously [73]. Grey highlights represent regions of similarity between
the proteins. In all Dpfp7 sequences, signal peptides have been removed. Numbers at the end of
the sequence rows indicate the position of the last amino acid in the row. * represents residues
conserved between Dpfp5 and Dpfp7α.
Dpfp8
The EST matches for Dpfp8 have a 100% identification probability with three high scoring
peptide-spectrum matches (Table 3-2). No distinguishable signal peptide was identified in spite
of the presence of a methionine residue at the N-terminus. This methionine might therefore not
represent a start codon and the absence of a signal peptide could be due to an incomplete N-
terminus owing to limitations in the creation of the cDNA library [62] (see Methods). The
clustered sequence of Dpfp8 is 69 amino acids long with a theoretical MW of 6.9 kDa and pI of
9.5. The sequence is richest in glycine and is also rich in serine and valine (Table 3-3). Unlike
the other byssal proteins which are generally rich in proline, this putative sequence contains only
two proline residues. The Dpfp8 sequence in Table 3-2 indicates that its four negative residues
are restricted within the first 30 residues at the N-terminus and the seven positive residues are
spread throughout. No discernible repeat pattern is seen within the sequence though a number of
GGX repeats are seen where X = V, G, Y, S, R (Table 3-2). A BLAST search of the sequence
did not reveal any significant protein matches.
Dpfp9, a YGY/YGGY rich byssal protein
Two variants of the Dpfp9 sequence were identified by EST matching. Dpfp9α (AM229975) and
Dpfp9β (AM229830) have similar sequences that differ by four residues and have unique
spectrum matches (Table 3-2). The two sequences however have less than 95% identification
probabilities and each display only a single spectrum match, albeit with high SEQUEST XCorr
scores greater than 4.00 (Table 3-2). Even though the sequences do not meet ideal identification
criteria, they are still justified as positive matches because they possess a specific repeat pattern
of consensus sequences and show amino acid contents that are characteristic of byssal proteins.
After removal of the N-terminal signal peptide, the Dpfp9α and Dpfp9β variants are 64 amino
acids long each with a theoretical MW of 6.8 kDa and with basic pI’s of 9.3 and 8.8,
respectively. Dpfp9 overwhelmingly has the highest mol% (around 40%) of glycine among all
zebra mussel byssal proteins. The sequence is also very rich in tyrosine and contains asparagine
(N) and lysine (K) as the next most prominent residues (Table 3-3). Like Dpfp8 and unlike other
identified byssal proteins, Dpfp9 also has only 3% proline in its sequence.
Dpfp9 contains several repeats of the near consensus sequence NYG(G/-)Y(G/P)G where non-
italicized amino acids indicate highly conserved residues. The repeats are shown in green
highlights in Figure 3-4. The N- and C-termini of Dpfp9 are, however, distinct in their pattern of
repeats. The N-terminus consists of five of these YGY/YGGY containing consensus repeats
making up the first 35 residues of the sequence. The C-terminus (residues 36 – 64) contains only
one of these consensus repeats preceded by long chains of glycine often interspersed with lysine
residues (Figure 3-4). Both termini have 13 glycines each. The C-terminal region with long
glycine chains (residues 36 – 56) represents a sequence with high hydrophobicity compared to
the rest of the protein. Of the eight charged residues in the sequence, all three negative residues
(D) are at the N-terminus and all five positive residues (K) are at the C-terminus (Figure 3-4).
The two termini also have very distinct pI’s. In Dpfp9α, the N-terminus is acidic with pI 3.8 and
the C-terminus is highly basic with pI 10.4. Thus, Dpfp9 resembles Dpfp1, Dpfp5 and Dpfp7 in
terms of having a block structure but differs from these proteins in that they have the reverse
structure, basic N-termini and acidic C-termini [73].
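The charge segregation between the two blocks of Dpfp9α can be tallied as in the sketch below. It assumes the sequence is the rows of Figure 3-4 joined in order, taking the first residue at each (X/Y) position, and that the blocks are residues 1 – 35 and 36 – 64 as stated above. Histidine is not counted as a charged residue here, which is a simplifying assumption, and the helper name is illustrative.

```python
# Mature Dpfp9α sequence transcribed row by row from Figure 3-4
# (assumption: the first residue of each (X/Y) pair belongs to Dpfp9α).
DPFP9A = (
    "QYY" "DYGYGGN" "NYGYPG" "NYGYGG" "NYGGYPG"
    "NYGDYD" "NYGGGWLYKIL" "GGGGKGKGKWG" "GYGGYGK"
)

N_BLOCK = DPFP9A[:35]   # residues 1-35
C_BLOCK = DPFP9A[35:]   # residues 36-64


def charge_census(block):
    """Count acidic (D, E) and basic (K, R) residues in a sequence block."""
    negative = sum(block.count(aa) for aa in "DE")
    positive = sum(block.count(aa) for aa in "KR")
    return {"negative": negative, "positive": positive}


print("N-terminal block:", charge_census(N_BLOCK))
print("C-terminal block:", charge_census(C_BLOCK))
```

A census of this kind only reports where the charged side chains sit; the block pI values quoted above additionally depend on the terminal groups and on the pKa set used by the pI calculator.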
A BLAST search of the full Dpfp9 sequence does not reveal any significant matches; however,
the individual consensus sequence and partial N- and C-terminal sequences do show sequence
homologies in BLAST. The NYGYPG consensus itself matches to a putative cuticle protein
from the Tobacco Hornworm, Manduca sexta (score 23.5 bits). A search of the partial N-
terminal sequence of Dpfp9α containing residues 11 to 38 (Figure 3-4) also reveals structural
matches such as to a secretory eggshell protein precursor from the liver fluke Clonorchis sinensis
(score 46.0 bits) and to other putative outer membrane proteins. Interestingly, the partial N-
terminal sequence revealed strong matches to Shematrin proteins (max score 48.1) in the mantle
shell of the pearl oyster, Pinctada fucata, a marine bivalve mollusc [80]. Shematrins are a family
of glycine-rich structural proteins that comprise repeats with two or more glycines followed by a
hydrophobic amino acid [80]. Such repeats including the GY and GGY repeats seen in Dpfp9 are
also found in spider silk and flagelliform silk proteins that impart physical strength to spider
webs [45] and are also a characteristic of structural cell wall glycine-rich proteins (GRP) in some
plants species [81]. The BLAST search of the C-terminal glycine chain of Dpfp9α
(GGGGKGKGKWGGYGGYGK) also revealed similar homologies with a 29.9 bits match with
the flagelliform silk protein from the tick, Hyalomma marginatum rufipes and a 28.6 bits match
with a secretory eggshell protein precursor from Clonorchis sinensis. Thus, it appears that both
the N- and C-terminal sequences of Dpfp9 bear much similarity with structural proteins, thereby
indicating a structural role for Dpfp9 within the byssus, both in the thread and plaque.
The glycine and tyrosine repeats of Dpfp9 are also a characteristic of some adhesive/cohesive
proteins secreted by the sandcastle worm Phragmatopoma californica to stick sand grains
together underwater [42]. Pc-1 (pI 9.7) and pc-2 (pI 9.9) are basic, DOPA containing precursor
cement proteins containing glycine rich consensus repeats of VGGYGYGGKK and
HPAVHKALGGYG respectively. Interestingly, very similar to the amino acid compositions
seen in Dpfp9 (Table 3-3), Pc-1 contains 45% glycine, 19% tyrosine and 14% lysine [42]. GYG
and YGY triads are also a prominent feature of the 57 kDa thread matrix protein tmp-1 (pI 9.5)
identified in the marine mussel M. galloprovincialis [25]. The TMP’s are also a glycine, tyrosine
and asparagine rich protein family that separate the collagenous microfibrils within the threads of
marine mussels and thus also play a structural role within the byssus [25]. The sequence
similarities with the marine mussel TMPs thus further assert that Dpfp9 must play a structural
role in the byssus.
MNTKQLMCLLVAAVVLLAIAPVANA
QY(Y/C) (3)
DYGYGGN (10)
NYGYPG (16)
NYGYGG (22)
NYGGYP(G/R) (29)
(N/K)YGDYD (35)
NYGGGWLYKIL (46)
GGGG(K/I)GKGKWG (57)
GYGGYGK (64)
Figure 3-4. Illustration of repeat patterns in the EST derived sequence of Dpfp9 obtained by
clustering the Dpfp9α and Dpfp9β sequences. Residues of the form (X/Y) represent the four
positions that differ between Dpfp9α and Dpfp9β, respectively. Near consensus sequences of
YG(G/-)Y(G/P)G are highlighted in green. Numbers represent the position of the last amino acid
in the row. The signal peptide is underlined.
N – terminal region
C – terminal region
3.4.5 Proline and Cysteine (P, C) rich proteins
Dpfp10 and Dpfp11, proteins resembling the cysteine containing region of Dpfp5
The clustered sequence of Dpfp10 is derived from several EST matches, all of which have 100%
identification probability and two peptide-spectrum matches with high SEQUEST XCorr scores
(Table 3-2). Following the removal of the signal peptide, the clustered Dpfp10 sequence is 59
amino acids long with a theoretical MW of 6.7 kDa and basic pI of 8.8. The sequence is richest
in proline (25%) and tyrosine (17%) and also contains much lysine (14%), glycine (12%) and
cysteine (10%). No significant repeat pattern is discernible in the sequence but the six cysteines
are generally associated with proline, lysine and tyrosine residues as seen with the CYK, CPP,
CPK, CCP and CKK triads (Table 3-2). A BLAST search of the protein shows several matches
to extracellular matrix proteins (max score 34.7 bits), kielin/chordin-like proteins (max score
34.2) and other Bone Morphogenetic Protein (BMP) binding proteins (max score 33.5) where the
major region of similarity is in the CPP and YTPPGECCPRC regions of the Dpfp10 sequence.
The significance of these matches is not readily apparent, however.
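The local context of each cysteine in Dpfp10 can be listed with a short script. This is a sketch only: it assumes the mature Dpfp10 sequence is the Dpfp10 rows of the Figure 3-5 alignment with gap characters removed, and it defines a triad as a cysteine plus the two residues that follow it, which is one possible reading of the triads named above.

```python
# Mature Dpfp10 sequence transcribed from the Dpfp10 rows of Figure 3-5
# with alignment gaps removed (assumption: the rows are contiguous).
DPFP10 = (
    "QTYKGYPPPKPYP"
    "KDPCYKVYCPPIYCPKGQYTPPGECCPRCKKGYGYQDPDPYFP"
    "GGK"
)

# Collect every three-residue window that starts at a cysteine.
triads = [
    DPFP10[i:i + 3]
    for i, aa in enumerate(DPFP10)
    if aa == "C" and i + 3 <= len(DPFP10)
]

# Tally which residue types directly follow a cysteine.
followers = {}
for triad in triads:
    followers[triad[1]] = followers.get(triad[1], 0) + 1

print("length:", len(DPFP10))
print("cysteine triads:", triads)
print("residue immediately after each cysteine:", followers)
```

Under this definition the script reports one window per cysteine, so adjacent cysteines yield overlapping triads.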
The Dpfp11 EST matches were identified as two variants with similar sequences that differ in 13
residues and have unique spectrum matches (Table 3-2). Dpfp11α (AM230400) and Dpfp11β
(AM230182) were identified with different programs, SEQUEST and PEAKS, respectively.
Both sequences have a low protein identification probability and each display only a single
spectrum match, albeit with high peptide probability scores (Table 3-2). However, even though
the sequences do not meet ideal identification criteria, they are still considered positive matches
here because they show a high degree of similarity to other byssal adhesive proteins, Dpfp5 and
Dpfp10 (Figure 3-5). After removal of the N-terminal signal peptide, the Dpfp11α and Dpfp11β
variants are 55 and 54 amino acids long, respectively. They have theoretical MW of 6.4 kDa and
6.1 kDa and theoretical pI of 7.6 and 8.0 respectively. The Dpfp11 sequences are richest in
cysteine (15%), proline (13 – 15%) and glycine (11%) (Table 3-3) and also contain considerable
tryptophan (7 – 9%) specifically at the C-terminal end. No significant repeat pattern is
discernible in the sequence. The eight cysteines are sometimes associated with arginine (R)
(CRP, CRL and CRY triads) but mostly with variable residues (Table 3-2). A BLAST search of
Dpfp11α reveals no significant protein matches; however, a search of Dpfp11β shows matches to
extracellular matrix proteins from several species (max score 34.7 bits) especially in the
TPKGRCCPVC region of the sequence. Again, these matches do not give any clear indication of
the structural/functional properties of the Dpfp11 variants.
Dpfp10 and Dpfp11 are the only two novel byssal proteins identified in the insoluble extract that
contain a significant number of cysteine residues in their EST derived sequences. Dpfp9β contains
only 1 cysteine in its sequence. The only other zebra mussel byssal protein known to contain
cysteine is Dpfp5 [73]. When the sequences of Dpfp10 and Dpfp11α are aligned against that of
Dpfp5, similarities are seen with sequences in and around the cysteine containing region of
Dpfp5. The similar sequences are shown in blue highlights in Figure 3-5. Dpfp10 shows two
major sequence matches to Dpfp5, one in its N-terminal YP rich region (residues 1 – 68) and the
other in its middle cysteine containing region (residues 69 – 114) [73]. Dpfp11 displays just one
of these matches, in the cysteine containing region (Figure 3-5). Thus, while Dpfp7 mimics the
two termini of Dpfp5, Dpfp10 and Dpfp11 mimic the Dpfp5 middle region.
While Dpfp10 has 10 tyrosines and 15 prolines in its sequence, Dpfp11α/β has only 2/3 tyrosines
and 7/8 prolines. In Dpfp11, the N and C termini have similar hydrophobicities but in Dpfp10,
the N-terminus is more hydrophobic than the C-terminus. Additionally, Dpfp10 has more
similarities to Dpfp5 than does Dpfp11. The sequence differences between Dpfp10 and Dpfp11
could thus be reflective of different cysteine related roles. In marine mussels, two foot proteins
mfp-2 (15 mole% Cys) and mfp-6 (11% Cys) have the most prominent cysteine compositions
with cysteine being almost absent in the other byssal proteins [78]. Mfp-2 is an abundant plaque
matrix protein that consists of epidermal growth factor (EGF) domains stabilized by disulfide
bonds, a common characteristic of ECM proteins [18]. Mfp-6 is present in the plaque footprint
and plays a role both as a plaque antioxidant that restores DOPA adhesion by reducing
dopaquinone (oxidized DOPA) and as a cross-linker that can improve plaque cohesion by
forming S-cysteinyldopa adducts [16]. Thus, Dpfp10 and the two Dpfp11 variants could
similarly play roles as cohesive and/or adhesion promoting proteins in the byssus. Further
information on the oxidation states and cross-linking properties of their cysteine residues will
however be required to better understand their specific functions [16].
Dpfp5 YNSWPPKPNQPQQPQQPQQPPQPPRYPQPSYPAYPPQQSYPAYPPKQSYPTYPPKQSYPAYPPKQSYP 68
Dpfp10 -------------------------------------------------------QTYKGYPPPKPYP 13
Dpfp11α --------------------------------------------------------QWGGDSCRPIYP 12
**
Dpfp5 TNPPYNPCDAVYCRP-IYCNYGQYTPQGECCPQCNPGTYLPEKWSWKGNNVVGDQEKYVGEGNNVGE 134
Dpfp10 K----DPCYKVYCPP-IYCPKGQYTPPGECCPRCKKG------YGYQ------DPDPYFP------- 56
Dpfp11α P----LDCRLVFCQPAINCRYGNYTPKGHCCSVCIED-----CWGWP------WP------------ 52
* * * * * * * *** * ** *
Dpfp5 QRNDVDGNENIVGGQSNAVGGKGNDVGEQKNAVGGSGNTVGWQGNNVGG 183
Dpfp10 -------------------GGK--------------------------- 59
Dpfp11α -------------------WGK--------------------------- 55
**
Figure 3-5. Sequence alignment of cysteine containing byssal proteins; Dpfp10, Dpfp11α and
previously described Dpfp5 (AM230139) [73]. * indicates residues that are conserved between
the sequences. The highlights depict regions of similarity between the proteins. Numbers at the
end of the sequence rows indicate the position of the last amino acid in the row.
3.4.6 Analysis of the set of zebra mussel byssal proteins identified in the
insoluble matrix
The sequence analysis of the insoluble byssal matrix revealed seven novel and three previously
known byssal proteins that were each identified in both the thread and plaque matrices. However,
this does not mean that we have identified all of the proteins present in the zebra mussel
byssus. There may well have been other relevant EST matches that did not meet
identification criteria and were therefore not identified as byssal proteins. Additionally,
limitations in the cDNA library or in byssal protein extraction may also restrict our identification
of novel proteins. It is interesting that in addition to being identified in the insoluble matrices, the
known proteins, Dpfp1, Dpfp2 and Dpfp5 were previously also identified in soluble extracts by
gel electrophoresis [73]. The presence of these proteins in the insoluble extract could thus
possibly be due to partial cross-linking that prevents all of the protein from being solubilized.
Sequence comparisons of the ten zebra mussel byssal proteins sequenced thus far reveal a
number of protein characteristics that are common features of the byssal proteins. Firstly, as seen
in adhesive proteins from many species, low sequence complexity due to the presence of
consensus sequence repeats is a common characteristic of most of the identified byssal proteins.
In some proteins such as Dpfp1, Dpfp2 and Dpfp6, the repeat pattern is in the form of tandem
repeats and in others such as Dpfp7, Dpfp9 and Dpfp12, the repeat pattern is in the form of
shorter repeats irregularly interspersed with other residues. Others such as Dpfp8, Dpfp10 and
Dpfp11, however, display no notable repeats. As well, the repeat sequences themselves display
varying degrees of consensus. Within the sequence of Dpfp1 itself, the N-terminal repeats have
poor consensus in contrast to the C-terminal repeats which are strongly conserved in their
sequences [35]. Another prominent characteristic of the zebra mussel byssal proteins is the
appearance of a block structure within the protein sequence. The blocks within a sequence may
vary in their sequence repeats, their isoelectric points and even in their post-translational
modifications (e.g. Dpfp1 [35]). In terms of pI, Dpfp1, Dpfp5 and Dpfp7 have a di-block
structure with a basic N-terminus and an acidic C-terminus and Dpfp9 is a di-block with acidic
N-terminus and basic C-terminus. In terms of repeat patterns, Dpfp1, Dpfp7 and Dpfp9 have di-
block structures but Dpfp5 has three blocks containing different kinds of repeats. The cement
protein pc-3A in the sandcastle worm also has a pI based block structure similar to Dpfp1, Dpfp5
and Dpfp7 [41] thus illustrating that block structures must play an important role in
the adhesion/cohesion mechanisms of adhesive proteins.
The zebra mussel byssus also displays some interesting characteristics with respect to its protein
mixture. The mussel's protein collection encompasses a wide range of isoelectric points ranging
from acidic to basic. Most of the proteins are basic, however there are some such as Dpfp1
(theoretical pI 5.1), Dpfp5 (theoretical pI 6.4) [73], Dpfp6 (theoretical pI 4.2) and Dpfp7α
(theoretical pI 6.5) that are acidic. This is in contrast to marine mussel proteins that are generally
basic (pI >9) [18] but is similar to sandcastle worms which have a heterogeneously charged
adhesive mixture containing one strongly acidic (pI 2.5) and two strongly basic (pI > 9) proteins
among others [36]. With respect to molecular weights as well, there is a range in protein sizes
ranging from 4.1 kDa (theoretical MW of Dpfp12) (Table 3-2) to 49 kDa (theoretical MW of
Dpfp1) [35] to greater than 210 kDa (electrophoretic MW of Dpfp0) [73]. Such molecular
weight ranges are also seen in marine mussels; however, their highest molecular weight proteins
are represented by collagenous proteins where more than one subunit is bundled [18].
Analysis of the ten zebra mussel byssal protein sequences defines a number of distinct sequence
motifs that are common to two or more byssal proteins. These motifs are based not only on
amino acid compositions (as described in Table 3-3) but also on the pattern of repeats of the
amino acids. One prominent set of sequence motifs are those that are rich in YP diads. These
include Dpfp2, Dpfp12 and the N-termini of Dpfp1, Dpfp5 and Dpfp7. In fact, Dpfp12 resembles
YP containing fragments of Dpfp1, Dpfp2 and Dpfp5 (Figure 3-2) and YP based similarities are
also observed between the N-termini of Dpfp5 and 7. Another motif, though not as common, is
the motif rich in PY diads that can be clearly distinguished from the YP rich motifs. These are
observed in Dpfp6 and the C-terminus of Dpfp1. Two additional sequence motifs are based on
glycine rich motifs and include those that contain long glycine runs of the form GGX (Dpfp8 and
the C-terminus of Dpfp9), where X is a variable residue, and others that possess glycine rich
repeats (C-termini of Dpfp5 and Dpfp7 and the N-terminus of Dpfp9). A fifth motif is a cysteine
rich motif seen only in Dpfp10, Dpfp11 and the middle region of Dpfp5. In Dpfp5 and Dpfp10,
cysteine is often closely associated with proline whereas in Dpfp11, the residues are quite
independently distributed. The presence of these five common sequence motifs between ten
sequences could potentially indicate that only a few specific motifs are actually required for
byssal protein functions but the different motif combinations in different proteins then provide
the variety and flexibility needed by the mussel to adapt to different conditions and surface
features. While we were able to identify a number of sequence motifs by drawing comparisons
amongst the byssal proteins, domain searches of the byssal proteins using the Simple Modular
Architecture Research Tool (SMART) did not reveal any significant domain identifications
within the sequences, thus emphasizing the low sequence complexity of these proteins.
In spite of significant compositional differences, direct comparisons of zebra mussel byssal
proteins with proteins identified in the much studied marine mussel byssi can be a useful
indication of their roles within the byssus. As discussed previously, the sequence similarities and
homology matches between Dpfp12 and the marine mussel cuticle protein mfp-1 indicate that
Dpfp12 might also play a role as a protective structural protein in the cuticle of the zebra mussel
byssus [46]. In marine mussels, mfp-6 is a cysteine containing footprint protein that potentially
acts to restore DOPA adhesion as well as promotes cohesion in the plaque matrix [16]. Mfp-2 is
also a cysteine-containing plaque matrix protein where the cysteine residues form disulfide
bonds and impart a structural role to the protein [82]. The almost uniquely cysteine containing
zebra mussel byssal proteins Dpfp5, Dpfp10 and Dpfp11 could mimic the functions of mfp-2
and/or mfp-6 and play either an adhesion promoting and/or cohesive role in the byssus. In marine
mussels, the interfacial adhesive proteins, mfp-3 (25% Gly) and mfp-5 (20% Gly), have more
glycine than plaque matrix cohesive proteins mfp-6 (15%), mfp-2 (14%) and mfp-4 (5%) and
much more glycine than the cuticle protein mfp-1 (0.4%) [78]. High glycine content could
therefore possibly be indicative of more interfacial adhesive roles in the zebra mussel proteins as
well. While Dpfp7 (17 – 21%), Dpfp8 (17%) and Dpfp9 (39 – 40%) all have comparatively high
levels of glycine, its distribution varies between them all and could dictate whether the protein
role is adhesive or cohesive. Other structural proteins in the marine mussel byssus include
histidine rich mfp-4, a plaque matrix protein that acts as a linker between thread and plaque [30]
and glycine, tyrosine and asparagine rich thread matrix proteins (TMPs) that separate and
perhaps lubricate collagenous fibrils under tension in the thread [25]. Dpfp9, which is also rich in
G, Y and N and likely plays a structural role in the byssus as described previously, may thus
somewhat mimic the structural functions of the TMPs in the zebra mussel thread and plaque.
In the marine mussel byssus, the cohesive proteins in the thread, plaque and cuticle consistently
display repetitive patterns of consensus sequences. However, the proteins near the adhesive
interface (Mfp-3, 5 and 6) consist only of a single non-repeating sequence [18]. If this is taken as
a rule, Dpfp8 would likely be an adhesive interface protein and Dpfp7 and Dpfp9 would be
cohesive matrix proteins. Additionally, the marine mussel adhesive interface proteins, Mfp-3 (6
kDa) and Mfp-5 (9 kDa) [18] and the interfacial antioxidant and cross-linker protein Mfp-6 (11
kDa) [16] have low molecular weights in comparison with the rest of the proteins. Thus it
appears that in marine mussels, the footprint proteins have evolved to be smaller than the other
byssal proteins [18]. If that is true in the zebra mussel byssus, then some of these low molecular
weight, glycine rich proteins such as Dpfp8 (6.9 kDa) could also be expected to play an interfacial
adhesive role. However, since all of the zebra mussel byssal proteins identified thus far have
been found in both the thread and plaque, it is not possible to directly determine which, if any, of
the identified proteins might play an adhesive role at the plaque-substrate interface.
Previously, MALDI-TOF mass spectrometry analysis of mature byssal threads by Gilbert and
Sone, 2010 revealed the presence of several low molecular weight proteins (3.7 to 7 kDa) that
had different distributions between the thread, plaque and plaque footprint [34]. A number of the
zebra mussel byssal proteins (Dpfp8, 9, 10, 11 and 12) identified in this analysis fall within this
range (Table 3-2). Dpfp10 (6727 Da) has significant correspondence to major MALDI peaks of
6737 Da in the thread and 6742 Da in the plaque footprint. These peaks even display an adjacent
protein peak with hydroxylation (16 Da difference) indicating a DOPA modification in the
protein. Dpfp11α (6364 Da) as well shows correspondence to a peak with a DOPA modification,
at 6399 Da, appearing uniquely in the footprint. This makes it similar to the cysteine and DOPA
containing marine mussel protein mfp-6 that is also in the footprint where it plays a cross-linking
or anti-oxidant role [83]. Additionally, Dpfp12 (4132 Da), which we have speculated is a cuticle
protein, corresponds to a peak at 4159 Da detected only in the thread and clearly not detected
elsewhere [34]. However, since the cuticle makes up a much greater percent of the thread cross-
section as compared to the plaque cross-section, there might just not have been enough cuticle
protein to be detected in the plaque.
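The mass comparisons in this paragraph come down to simple arithmetic, which the following Python sketch reproduces. It is illustrative only: the masses are those quoted above, while the function names and the use of a nominal +16 Da per hydroxylation (one added oxygen) are assumptions made for this example and are not part of the original analysis.

```python
# Nominal mass added by one hydroxylation (Tyr -> DOPA), in Da (assumed nominal value).
HYDROXYL_DA = 16

# (theoretical mass in Da, MALDI peak masses in Da) as quoted in the text.
PAIRS = {
    "Dpfp10": (6727, [6737, 6742]),
    "Dpfp11alpha": (6364, [6399]),
    "Dpfp12": (4132, [4159]),
}

def peak_offsets(theoretical, peaks):
    """Return (peak, offset in Da, offset as % of theoretical mass) for each peak."""
    return [(p, p - theoretical, 100.0 * (p - theoretical) / theoretical)
            for p in peaks]

def hydroxylated_masses(mass, max_sites=2):
    """Expected masses for 0..max_sites hydroxylations of a species at `mass`."""
    return [mass + n * HYDROXYL_DA for n in range(max_sites + 1)]

for name, (theoretical, peaks) in PAIRS.items():
    for peak, offset, pct in peak_offsets(theoretical, peaks):
        print(f"{name}: peak {peak} Da, offset {offset:+d} Da ({pct:+.2f}%)")

# Where a hydroxylated neighbour of the 6737 Da thread peak would be expected.
print(hydroxylated_masses(6737))
```

All offsets are well under 1% of the theoretical mass, which is the sense in which the peaks "correspond" to the sequenced proteins; whether a given offset reflects a modification or measurement error cannot be decided from the arithmetic alone.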
Marine mussels have distinct protein compositions between thread and plaque with collagenous
proteins in the thread and DOPA containing proteins in the plaque [9]. In contrast, the zebra
mussel byssus contains similar amino acid compositions between thread and plaque and
therefore likely has similar protein compositions between the two. Thus, while there might be
some specialized adhesive proteins at the plaque-substrate interface, the distributions of cohesive
matrix proteins may be similar throughout the byssus, possibly explaining why most proteins
identified are present in both thread and plaque. Through MALDI mass spectrometry analysis
of different byssal regions, Gilbert and Sone, 2010 identified the presence of similar protein
peaks between the thread and footprint, thus indicating that the same proteins might be present in
the thread and at the adhesive interface and that even if the adhesive proteins were identified in
our analysis, they cannot be pinpointed as adhesive because they are also identified in the thread.
In scenarios where the same protein is present in both the adhesive footprint and the thread, the
enzyme catechol oxidase that oxidizes DOPA to DOPA quinone could possibly contribute to
functional differences [17]. Catechol oxidase is present in greater amounts in the thread and
plaque bulk than at the plaque-substrate interface and is therefore responsible for greater
cohesive cross-linking in the thread versus the plaque footprint. It may therefore be able to
impart different functions to the same protein depending on its localization within the byssus
[17]. Additionally, the MALDI analysis revealed a number of small proteins uniquely distributed
in the plaque adhesive layer and other peptides between 3.7 and 4.2 kDa that are uniquely
distributed in the thread matrix [34]; however, all the byssal protein sequences identified thus far
have been found in both the thread and plaque. Thus, these low MW byssal proteins may not
have been identified yet and might be present in the soluble extract, requiring further
characterization.
3.5 Conclusion
LC-MS/MS analysis of base insoluble matrix proteins obtained from induced, freshly secreted
byssal threads allowed analysis at a stage of minimal DOPA cross-linking and zebra mussel
cDNA database matching of the byssal protein mass spectra then led to the identification of
previously known as well as seven novel proteins (Dpfp6 – Dpfp12) in both the thread and
plaque matrix. Sequence analysis of the zebra mussel byssal proteins reveals protein
characteristics and sequence motifs that are common features of the proteins and also reveals
several prominent sequence homologies within the byssal proteins. The current analysis has thus
greatly added to our knowledge of the protein composition of the zebra mussel byssus. Future
work must look into protein distribution specifically at the plaque substrate interface to identify
byssal proteins with specialized adhesive functions. An understanding of the molecular basis of
adhesion in zebra mussels will ultimately contribute to the development of water resistant
adhesives for medical and dental applications and will also allow the development of targeted
anti-fouling strategies against this rapidly spreading species.
3.6 Acknowledgements
The authors gratefully acknowledge Trevor Gilbert and Kyle Serkies for collecting the mussels.
We also thank Li Zhang and Paul Taylor of the Advanced Protein Technology Centre, Sick Kids,
Toronto for their technical advice on LC-MS/MS analysis. This work was supported by the
Natural Sciences and Engineering Research Council (NSERC) of Canada, the Canada
Foundation for Innovation (CFI), and an Ontario Graduate Scholarship (OGS).
Chapter 4:
Conclusions, Preliminary work and Future Directions
4.1 Summary and Conclusions
The primary objective of this thesis was to improve our knowledge of the molecular basis of
adhesion in zebra mussels such that this knowledge can be implemented in the design of
biological adhesives for medical and dental applications and in the development of anti-fouling
strategies against this invasive, biofouling species. In the past, the zebra mussel byssus has
stubbornly evaded biochemical characterization due to extensive cross-linking of its mature
structure and has thus left major gaps in our understanding of its protein composition and
distribution. In this work, we have strongly built on the knowledge of zebra mussel byssal
proteins by performing our analysis on induced, freshly secreted byssal threads that are
minimally cross-linked and more amenable to characterization.
Over the course of this thesis, we have identified the presence of ten novel proteins (Dpfp0 and
Dpfp4 – Dpfp12) in the zebra mussel byssus by investigating the composition of both the soluble
byssal extract and insoluble byssal matrix. Previously, byssal proteins were identified only as
DOPA-staining precursors (Dpfp1 – 3) in zebra mussel foot extracts, thereby overlooking
DOPA-poor or DOPA-lacking byssal proteins. The current identifications on the other hand,
represent both DOPA-rich and DOPA-deficient byssal proteins even though these cannot
currently be distinguished as one or the other. Further, we have determined the primary sequence
structure of eight of these novel proteins (Dpfp5 – Dpfp12) and identified a more complete
sequence for Dpfp2, for which only fragments of the sequence were previously known.
Additionally, by performing our analysis on separated threads and plaques, we have determined
the byssal distribution of known (Dpfp1 and 2) and novel (Dpfp5 – Dpfp12) sequenced proteins
between the two regions of the byssus and have in fact found that all ten
proteins are present in both the thread and plaque.
Two very important biochemical and proteomic techniques have contributed to our findings on
the zebra mussel byssal composition: gel electrophoresis and peptide fragment fingerprinting
(PFF), which includes LC-MS/MS analysis and database sequence matching. While gel
electrophoresis was useful in the identification of proteins in the soluble byssal extract, PFF was
useful in two contexts, sequencing gel band proteins from the soluble extract, and sequencing
and thereby identifying proteins in the insoluble byssal matrix. The overlap seen in the
identification of soluble extract proteins in the insoluble byssal matrix, however, indicates that
soluble proteins might be retained in the insoluble matrix owing to partial cross-linking.
Additionally, in spite of the great information provided by these techniques, there are still certain
limitations in our identification and sequencing of byssal proteins. Since the protein gel bands
contain too little material, proteins cannot be isolated in adequate amounts for functional characterization
of their post-translational modifications, secondary structure, biophysical properties and precise
byssal distribution. As well, stringent protein and peptide identification criteria, inaccuracies in
the cDNA library and irregular trypsin digestion could have limited identification of additional
proteins in the insoluble matrix. Since we work with induced threads in our analysis, an
additional limitation may be that these do not exactly represent the protein compositions and
distributions of the mature byssus. In marine mussels however, the induced byssal threads have
been shown to be indistinguishable from the natural threads [16], [25] and hence we work on the
assumption that they are indistinguishable in zebra mussels as well.
Putting together all the byssal protein information available from the current and previous work,
a total of 13 zebra mussel byssal proteins have been identified (Dpfp0 – 12) (ten from this work)
and the protein sequence and distribution between thread and plaque is known for ten of these
(Dpfp1, 2 and 5 – 12) (eight from this work). Comparing all of these protein sequences amongst
each other and comparing to adhesive proteins from other species reveals features that are
common characteristics of byssal proteins. Through this thesis, especially in Chapter 2 and
Chapter 3, several such comparisons have been drawn and have led to the identification of
protein characteristics and sequence motifs that are distinctive of two or more zebra mussel
byssal proteins. Similar to adhesive proteins in other species, repeat patterns of consensus
sequences are a significant characteristic of most zebra mussel byssal proteins. Additionally,
several of these byssal proteins have co-block structures where the block sequences can differ in
post-translational modifications (known only for Dpfp1), isoelectric point and/or repeat patterns.
Interestingly, unlike marine mussel byssal proteins that are generally basic, zebra mussels have a
mix of acidic and basic proteins, also a characteristic of the sandcastle worm cement proteins.
Significantly as well, a number of distinct sequence motifs have been identified in the zebra
mussel byssal proteins. These include motifs that are rich in YP diads or PY diads, others that are
comparatively rich in cysteine residues and even other glycine rich motifs where G is present as
long glycine runs of the form GGX (where X is variable) or as glycine rich repeats. Some
proteins even have multiple sequence motifs within their protein sequence. Another very
interesting observation in the zebra mussel byssal proteins is that several proteins have very
prominent sequence homologies to each other, often encompassing a whole block of the proteins.
For example, Dpfp6 shows strong homology to the C-terminal block of Dpfp1 (Figure 3-1B)
and the C-terminal block of the di-block Dpfp7 shows strong similarity to the C-terminal block
of the tri-block Dpfp5 (Figure 3-3B).
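The motif classes listed above (YP/PY diads, cysteine-rich stretches, GGX glycine runs) can be tallied with a short script. The Python sketch below is a minimal illustration under stated assumptions: the example sequence is a made-up placeholder and not a real Dpfp sequence, and the regular expression for "GGX runs" (two or more consecutive GGX units) is only one possible reading of the motif description.

```python
import re

def motif_summary(seq):
    """Count simple motif features in a one-letter amino acid sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return {
        "YP": seq.count("YP"),          # non-overlapping YP diads
        "PY": seq.count("PY"),          # non-overlapping PY diads
        "Cys_fraction": seq.count("C") / len(seq),
        "Gly_fraction": seq.count("G") / len(seq),
        # Tandem GGX repeats: two or more consecutive 'GG' + any residue units.
        "GGX_runs": [m.group(0) for m in re.finditer(r"(?:GG[A-Z]){2,}", seq)],
    }

# Hypothetical placeholder sequence, NOT a real Dpfp sequence.
example = "YPYPKCGGAGGSGGYCPY"
for key, value in motif_summary(example).items():
    print(key, value)
```

Applied to the actual EST-derived sequences, a tally of this kind would only flag candidate motifs; the block boundaries and homologies discussed here still require alignment and manual inspection.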
4.2 Preliminary Additional Studies
4.2.1 Comparing zebra and quagga mussel byssal proteins
In addition to comparing with the much-studied marine mussels, it is also useful to compare
zebra mussel adhesion to adhesion in other freshwater byssate mussel species such as the closely
related quagga mussel (Dreissena bugensis). As discussed previously in section 1.1.6, the two
freshwater mussel species are similar in having DOPA-containing proteins in both the thread and
plaque (unlike marine mussels) and the two species have some potentially homologous proteins
that run at similar molecular weights on a gel [33]. Therefore, we additionally investigated the
protein composition and distribution of the quagga mussel byssus to potentially identify other
homologous proteins and to determine other byssal characteristics that are common or unique to
either freshwater species. The results from this preliminary study are presented in Appendix A
and summarized below.
As with the zebra mussels, we once again performed extractions on induced, freshly secreted
byssal threads from quagga mussels and were able to identify a series of novel quagga mussel
byssal proteins in addition to some previously known ones (Figure A-1 and A-2). The quagga
mussel byssus was much more easily extractible than the zebra mussel byssus (possibly due to
lower DOPA content as discussed in section 1.1.6) and therefore a much larger number of novel
proteins were identified. These include a protein at ~50 kDa for which no homologue was seen
in the zebra mussel, one at ~35 kDa that could be homologous to Dpfp5 (~30 kDa by
electrophoresis) and one at ~ 7 kDa that could be homologous to any of the lower molecular
weight Dpfp proteins identified in the insoluble matrix (Figure A-2). As with the zebra
mussel byssal proteins, all of these were identified in both the thread and the plaque.
In the quagga mussel, unlike the zebra mussel, a number of significant bands were also identified
that were present almost uniquely in the plaque and could therefore have specialized adhesive
functions (Figure A-2). These include a band at > 210 kDa (Dbfp0), a band at ~22 kDa
corresponding to Dbfp2 and other bands at ~16 kDa and at 12 – 13 kDa corresponding to Dbfp3
(Figure A-2). These known DOPA-containing proteins Dbfp0, Dbfp2 and Dbfp3 are believed to
be homologous to zebra mussel DOPA proteins Dpfp0 (>210 kDa), Dpfp2 (26 kDa) and Dpfp3
(12 – 13 kDa) respectively [33], however the predominant localization of Dbfp2 and Dbfp3 in
the plaque, in contrast to Dpfp2 and possibly even Dpfp3 which we identified in both thread and
plaque, indicates that all of these presumed homologues might actually have different roles in
their respective byssi. This is consistent with the finding that in spite of other similarities, Dbfp1
(80 and 69 kDa) and Dpfp1 (76 and 65 kDa) are actually quite different in terms of their repeats,
DOPA content and other amino acid content [5], as discussed in section 1.1.6. Thus, in general,
it appears that in spite of several superficial similarities between the byssal compositions of the
zebra and quagga mussels, their proteins might actually vary in their roles and sequence
properties. Sequencing of quagga mussel protein bands will be useful in either asserting or
refuting our assumptions; however, the lack of a quagga mussel cDNA library makes such
analysis difficult. De novo sequencing of the peptide fragments is therefore currently underway
(Appendix A).
4.2.2 Peptide mimics: an insight into byssal protein interactions
Throughout this thesis, we have attempted to interpret the adhesive/cohesive roles of zebra
mussel byssal proteins by correlating their primary structures with their byssal distribution and
comparing to other byssal proteins with known functions. An additional way to study the
mechanism of adhesion/cohesion of the byssal proteins is to study peptide mimics of the proteins
since peptide mimics provide an easier system to work with, can be prepared in much greater
amounts, are more amenable to solution techniques and can be modified as needed. In marine
mussel byssal proteins, peptide mimics of protein sequences with tandem repeats of consensus
sequences have been studied in order to investigate the structure and interactions of these repeats
and to design mimics for biological adhesive applications [26]. One such mimic created for
commercial applications is a fusion peptide called fp-151 that consists of an fp-5 adhesive
protein sequence from Mytilus galloprovincialis flanked with copies of the mgfp-1 cuticle
protein’s decapeptide repeat on either side [84]. This mimic displays good macro-scale adhesion
and biocompatibility for various cell types [84]. At the current stage, the study of peptide mimics
of zebra mussel byssal proteins can provide useful insights into the structure and chemical
reactivity of the proteins. Therefore, in this work, we investigated peptide mimics of the only
fully sequenced zebra mussel protein at the time, Dpfp1, to learn more about its mode of
interactions [14].
Gilbert, 2010 had found that a Dpfp1 inspired fusion peptide containing one N-terminal and one
C-terminal repeat of Dpfp1 self-assembles into spherical aggregates (~500 nm diameter) upon
interaction with iron (III) [14]. It was hypothesized that in addition to complexation of iron by
the single DOPA residue in the fusion peptide, other peptide-peptide interactions must also be
responsible for aggregate formation [14]. Therefore, in our analysis, we study the mode of self-
assembly and iron complexation by the Dpfp1 inspired mimetic peptide in order to elucidate its
mechanism of aggregate formation. The results from this preliminary study are described in
Appendix B. Using Circular Dichroism (CD) Spectroscopy we determined that upon interaction
with iron (III) the fusion peptide does not adopt a specific secondary structure and instead
maintains a random coil conformation (Figure B-1), thus possibly indicating that Dpfp1 might
not adopt a specific structure if it interacts with iron (III) in the byssus. Additionally, Dynamic
Light Scattering (DLS) (Figure B-2) and Transmission Electron Microscopy (TEM) (Figure B-3)
experiments revealed that the ratio of iron (III) to fusion peptide in the mixture affects the size
of the aggregates and the rate of increase in the size of the aggregates. The ability of the Dpfp1
fusion peptide to interact with iron (III) in such a specific way could indicate that Dpfp1 has an
iron (III) dependent role in the byssus such as in the cuticle. Additionally, as discussed in
Appendix B, interpretation of DLS results indicates that in addition to DOPA-iron complexation,
other peptide-peptide interactions amongst the Dpfp1 N- and C-terminal repeats must also be
directing aggregate formation. Further experiments on solution conditions and substitutions of
charged residues in the Dpfp1 fusion peptide will give additional insights into interactions within
the Dpfp1 protein.
4.3 Future work
The current analysis has greatly enhanced our knowledge of the protein composition of the zebra
mussel byssus, however, there are still several large gaps in our understanding and we still have a
long way to go in characterizing the mechanism of zebra mussel adhesion. For one, while several
zebra mussel byssal proteins have now been identified, there are still many more that are
unknown, as evidenced by MALDI-TOF analysis of the byssus (section 1.1.5) [34] and our
inability to identify any proteins that are localized specifically in the plaque (as expected of
adhesive footprint proteins). Secondly, even though our current analysis characterizes byssal
distribution between the thread and plaque, it does not reveal any information on the distribution
of proteins within the plaque itself such as at the thread-plaque anchor zone, in the bulk plaque
matrix or at the plaque-substrate interface. As witnessed in marine mussels, the localization of
proteins within the plaque is correlated with their byssal functions (section 1.1.3) and hence,
studying zebra mussel protein distribution within the plaque is critical to our understanding of
their byssal roles. Thirdly, the structural information available on the zebra mussel byssal
proteins is still quite incomplete. The EST (Expressed Sequence Tag) derived primary sequences
in our analysis do not reveal any significant information on post-translational modifications such
as glycosylations and tyrosine hydroxylations to DOPA, thereby limiting our understanding of
their functions and interactions. Additionally, some of the EST-derived primary sequences are
incomplete at the N-terminus. Further investigations of zebra mussel byssal composition are
therefore required to identify other novel byssal proteins, determine protein distributions within
the plaque and better characterize protein structure and chemical reactivity. These are described
below.
4.3.1 Identification of other novel zebra mussel byssal proteins
Through the current work, we have identified a series of novel proteins by electrophoresis of the
soluble byssal extract and by peptide fragment fingerprinting (PFF) of the insoluble byssal
matrix. While these proteins have theoretical molecular weights around 4.1 kDa and above
(Table 3-2), MALDI-TOF analysis specifically of the mature thread revealed the presence of
several proteins around 3.7 – 4.2 kDa [34] that have not been identified yet. Additionally, there
are proteins in the range of 5.8 to 7 kDa that are unique to the plaque interface and have not been
identified yet (section 1.1.5) [34]. All of these might represent small proteins present in the
soluble extract that got washed off the gel during staining. Therefore, in order to identify these
proteins, PFF analysis can be performed directly on an undigested soluble extract filtered with
~10 kDa cutoff. Since these proteins are very small, they do not need to be digested and the LC-
MS/MS analysis should be able to isolate intact peptides instead of peptide fragments. Further, to
minimize pre-analysis processing from homogenizing and separating byssal threads into
soluble extract and insoluble matrix, it will be worth attempting PFF analysis on intact, freshly
secreted byssal threads subjected to trypsin digestion. This analysis may also be useful to
validate the proteins and distributions identified thus far.
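To illustrate why trypsin digestion matters for what the LC-MS/MS step sees, the following Python sketch performs an in silico digest using the commonly cited trypsin rule (cleavage C-terminal to K or R, except when the next residue is P). It is a simplified sketch, not the software used in this work: the sequence is a hypothetical placeholder, and real digestion can deviate from this rule (the "irregular trypsin digestion" noted among the limitations in section 4.1).

```python
import re

def tryptic_digest(seq, missed_cleavages=0):
    """In silico trypsin digest: cleave after K or R unless followed by P.

    Returns all peptides containing up to `missed_cleavages` uncleaved sites.
    Relies on zero-width re.split, which requires Python 3.7 or later.
    """
    seq = seq.upper()
    # Zero-width split points after K/R that are not followed by P.
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", seq) if p]
    peptides = []
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides

# Hypothetical placeholder sequence, NOT a real Dpfp sequence.
example = "GGYKPGGSRCGGKYPGG"
print(tryptic_digest(example))                      # fully cleaved peptides
print(tryptic_digest(example, missed_cleavages=1))  # allowing one missed site
```

A protein with few K or R residues yields few, long peptides, which is one reason analysing small proteins intact, as proposed above, may recover species that a digest-based search misses.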
4.3.2 Determining protein distribution within the byssal plaque
While we have determined generalized distributions of several zebra mussel byssal proteins
between thread and plaque, we do not have any information on their distribution within the
plaque and hence, on their role in adhesion/cohesion. Even within the byssal plaque, proteins can
have varied localizations such as at the thread-plaque anchor zone, bulk plaque matrix and
plaque-substrate interface. Previous MALDI-TOF analysis of the different byssal regions was
done on mature byssal threads where cross-linking could have hindered the identification of
other significant protein peaks [34]. Thus, it will be useful to repeat this analysis with induced
byssal threads that have minimal cross-linking such that even some higher molecular weight
proteins not detected before may now be identifiable. Though this analysis does not reveal
protein sequences, it usefully identifies novel proteins and their hydroxylated (DOPA-modified)
variants, as well as their distributions between thread, plaque and plaque footprint [34]. The
small size and delicateness of fresh byssal threads may however make this analysis difficult but
the adhesive proteins at the interface layer can especially be studied by performing the mass
spectrometry analysis on an upturned plaque embedded in gelatin [34].
Immunohistochemistry is a useful technique used to localize an antigen such as a protein in a
tissue by developing a high affinity and specific antibody against it [20]. This method has been
used in marine mussels, however the analysis is difficult due to the poor antigen quality of some
proteins and/or significant shielding of epitopes in the mature byssus [20]. In zebra mussels,
immunolocalization of Dpfp1 did not detect the protein within the mature byssal threads (due to
epitope masking) but detected it in the foot tissue and homogenized, acid-extracted threads
(section 1.1.5) [4]. A next step is therefore to perform this analysis in plaques from freshly
secreted byssal threads where antigen epitopes will not be as masked. Additionally, now that the
sequences of several byssal proteins have been identified, antibodies can be developed against
these to locate them within the different regions of the byssus and in secretory granules
surrounding the ventral groove in the mussel’s foot.
4.3.3 Characterizing structure and chemical reactivity of byssal proteins
The EST derived primary sequences in our analysis do not reveal any information on post-
translational modifications within the protein sequence such as the presence of glycosylations
and DOPA modifications. Such information is available only for the sequences of Dpfp1 and
Dpfp2 as determined previously by staining gels for DOPA (Arnow or NBT stain) and
glycoproteins (PAS stain) and by quantifying DOPA and carbohydrate content following
purification of the proteins [33]. Therefore, for proteins that can be identified on a gel, an
important next step is to stain for DOPA and glycoproteins to determine the nature of these
proteins. In order to be able to quantify these protein characteristics and obtain N-terminal
sequences (where incomplete), the proteins will however have to first be purified in sufficient
amounts from the gel thus requiring extractions on an even larger scale, which is not necessarily
feasible. Instead, the identified target proteins can now be extracted in greater amounts from the
mussel’s foot and then be separated by 2D gel electrophoresis. Purified proteins can then be
functionally characterized for their post-translational modifications, secondary structure,
biophysical properties and precise byssal distributions.
Peptide mimics of the sequenced byssal proteins will additionally provide useful insights into
their chemical reactivities and modes of interactions. For proteins with tandem repeats (such as
Dpfp2) the chemical reactivity of individual repeats can be studied, for proteins with co-block
structures (such as Dpfp1 (section 4.3), Dpfp5 and Dpfp9), sample repeats from each block can
be fused together and for proteins that are really small (such as Dpfp12), the whole sequence
could possibly be synthesized for characterization. Unfortunately, the position of DOPA
modifications is unknown in most of these sequences and therefore assumptions on DOPA
positions would be required when synthesizing the mimics. As with experiments involving the
fusion peptide mimic of Dpfp1 described in Appendix B, the mimetic peptides can also be
interacted with metals such as iron (III) to observe their complexation abilities.
4.4 Significance and Conclusions
An understanding of zebra mussel adhesion is critical to the development of specific anti-fouling
strategies against the species and will contribute to the design of biological adhesives as an
alternative to those currently based on marine mussel adhesion. However, limited information on
the zebra mussel byssal composition had thus far held back interpretation of sequence properties
and had prevented studies on their adhesive/cohesive capacities. In the current work, the
identification of several novel byssal proteins along with information on their primary structures
and byssal distributions has greatly added to our knowledge of the protein composition of the
zebra mussel byssus. This work has allowed us to identify protein characteristics and sequence
motifs that are common to zebra mussel byssal proteins and in the future, will allow further
characterization of byssal protein adhesion/cohesion by means of mimetic peptide studies and
biochemical characterization techniques such as immunolocalization. The novel proteins
identified here do not however represent the complete list of zebra mussel byssal proteins and
therefore, future efforts must be directed at updating this list, in addition to functionally
characterizing the proteins.
Appendix A:
Quagga Mussel Adhesion: Novel Proteins and their
Byssal Distribution
In addition to our analysis on zebra mussels, we also studied the byssal protein composition of
the closely related freshwater mussel, the quagga mussel (Dreissena bugensis) which is also a
biofouling species. As described in the Introduction in section 1.1.6, thus far only four quagga
mussel byssal proteins (Dbfp0, 1, 2 and 3) have been identified as DOPA-staining precursors
upon extraction from the mussel’s foot [33] and the partial sequence of only one of these (Dbfp1)
is known [5]. Thus, as with zebra mussels, limited information is available on the composition of
the quagga mussel byssus. Hence, with an objective to identify novel byssal proteins and
determine their distributions within the byssus, we extract quagga mussel proteins from induced,
freshly secreted byssal threads, using the protocol described for zebra mussels in sections 2.3.1
and 3.3.1 and then analyze the extracts by Tricine-PAGE gel electrophoresis as described in
section 2.3.4.
We found that the quagga mussel byssal proteins are much more easily extractible than the zebra
mussel byssal proteins. While for zebra mussels, we extract 4 µg protein/mussel thread and
7 µg/plaque, in quagga mussels we are able to extract 12 µg protein/mussel thread and 10
µg/plaque (as determined by A280 measurements of extracts). Additionally, several zebra mussel
extracts (~65 full byssal threads) have to be pooled together for protein bands to be visualized on
the gel (Figure 2-1) but quagga mussel bands are clearly visualized even when proteins from a
single extraction of 15 threads/plaques are loaded (Figure A-1). The reason for the easier
extraction of the quagga mussel byssus could be the presence of less DOPA in quagga mussel
byssal proteins as shown by previous comparisons of Dpfp1 (6.6 mol% DOPA) and Dbfp1 (0.55
mol%) and of the overall amino acid contents of their byssi (0.6 mol% DOPA in zebra versus
0.1 mol% in quagga) (described in section 1.1.6) [33], [5].
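The size of the difference in extractability can be made explicit from the figures quoted in this paragraph. The Python sketch below is a minimal worked calculation; the dictionary layout and function name are assumptions chosen for illustration, while the numbers are those given above.

```python
# Extracted protein per thread or plaque (µg), from A280 measurements quoted in the text.
YIELD_UG = {
    "zebra": {"thread": 4.0, "plaque": 7.0},
    "quagga": {"thread": 12.0, "plaque": 10.0},
}

# DOPA content (mol%) quoted in the text: Dpfp1 vs Dbfp1, and whole byssus.
DOPA_MOL_PCT = {
    "fp1": {"zebra": 6.6, "quagga": 0.55},
    "byssus": {"zebra": 0.6, "quagga": 0.1},
}

def fold_change(numerator, denominator):
    """Simple ratio of two positive quantities."""
    return numerator / denominator

for region in ("thread", "plaque"):
    ratio = fold_change(YIELD_UG["quagga"][region], YIELD_UG["zebra"][region])
    print(f"{region}: quagga yield is {ratio:.2f}x the zebra yield")

for label, values in DOPA_MOL_PCT.items():
    ratio = fold_change(values["zebra"], values["quagga"])
    print(f"{label}: zebra DOPA content is {ratio:.1f}x the quagga content")
```

The yield advantage is larger for threads than for plaques, while the DOPA difference runs in the opposite direction in both comparisons, which is consistent with (though it does not prove) the suggestion that lower DOPA content makes the quagga byssus easier to extract.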
Figure A-1. Electrophoretic identification of quagga mussel byssal proteins. Proteins were
extracted in basic extraction buffer (as described in the Methods sections in Chapter 2 and 3)
from 15 induced, freshly secreted byssal threads and the soluble extract was loaded on a 16%
Tricine PAGE gel that was silver-stained for protein visualization. The leftmost lane
contains a Colorburst molecular weight ladder. Dbfp# labeled proteins represent previously
identified byssal proteins and underlined proteins represent novel byssal foot proteins identified
in the extract. Numbers in brackets represent the approximate molecular weights (in kDa) of the
visible proteins. The gel has been modified to show the most relevant lanes.
As with zebra mussels, electrophoretic analysis of extracts from the intact quagga mussel byssal
thread/plaque led to the identification of previously known as well as novel byssal proteins. As
shown in Figure A-1, protein bands were identified corresponding to the molecular weight of
previously known Dbfp0 (>210 kDa), the two forms of Dbfp1 (80 and 69 kDa) and Dbfp2 (22
kDa) and novel bands include those seen at approximately 30, 20 and 16 kDa (Figure A-1).
When pooled extracts of separated threads (~45) and plaques (~45) were analyzed by
electrophoresis, an even larger number of novel proteins were identified in both the thread and
plaque. These are indicated with arrows in Figure A-2. The % values beside the bands indicate
the % density of the bands relative to the plaque 7 kDa band taken as 100% density (densities
were calculated in total pixels using the Gel Analysis Software ‘UN-SCAN-IT’, Silk Scientific
Inc., Utah, USA). Most importantly, a number of significant protein bands were identified that
are present uniquely in the plaque (indicated with red asterisks in Figure A-2). Since a greater
mass of thread protein (241 µg) versus plaque protein (187 µg) was loaded on the gel, it is
unlikely that the unique plaque bands are an artifact of protein extraction and loading. Thus,
these unique plaque proteins could represent proteins with specialized adhesive functions in the
plaque. The ~16 kDa band is the most prominent protein unique to the plaque, followed by
proteins at approximately 22 kDa (possibly Dbfp2), 20 kDa, 13 kDa (possibly Dbfp3) and 12
kDa (possibly Dbfp3) some of which might be present as light bands in the thread. The Dbfp0
protein with a molecular weight greater than 210 kDa is seen in both the thread and plaque,
however the density of the plaque band is about ten times that of the thread band.
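The relative band densities and the loading argument above follow from two small calculations, sketched here in Python. The pixel totals in the example are invented placeholders (the raw densitometry values are not reported in the text); only the loaded protein masses (241 µg thread, 187 µg plaque) come from the source, and the function names are assumptions for illustration.

```python
def relative_density(band_pixels, reference_pixels):
    """Band density as a percentage of the reference band (reference = 100%)."""
    return 100.0 * band_pixels / reference_pixels

def density_per_microgram(band_pixels, loaded_ug):
    """Band density normalized to the mass of protein loaded in that lane."""
    return band_pixels / loaded_ug

# Loaded protein masses from the text (µg).
THREAD_LOAD_UG = 241.0
PLAQUE_LOAD_UG = 187.0

# Hypothetical pixel totals, for illustration only.
reference_7kda_plaque = 50_000   # plaque 7 kDa band, defined as 100%
dbfp0_thread = 4_000
dbfp0_plaque = 40_000            # ten times the thread band, as described

print(relative_density(dbfp0_thread, reference_7kda_plaque))
print(relative_density(dbfp0_plaque, reference_7kda_plaque))

# Correcting for loading makes the plaque enrichment larger, not smaller,
# because more thread protein than plaque protein was loaded.
enrichment = (density_per_microgram(dbfp0_plaque, PLAQUE_LOAD_UG)
              / density_per_microgram(dbfp0_thread, THREAD_LOAD_UG))
print(enrichment)
```

This is the quantitative form of the loading argument: a band that is stronger in the plaque lane despite the smaller plaque load cannot be explained by unequal loading.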
In addition to the unique plaque proteins, a number of prominent bands were identified on the gel
in Figure A-2 that are common to both thread and plaque and could possess similar roles as the
uniformly distributed zebra mussel proteins identified in our analysis. While most of these have
comparatively high molecular weights such as the bands at approximately 80 kDa (possibly
Dbfp1), 50 kDa and 35 kDa, there is a single common band at ~ 7 kDa that is relatively smaller
(Figure A-2). Additional faint but distinct protein bands are also seen throughout the thread and
plaque lanes. These could represent new proteins not described above or maybe even variants of
the other proteins (Figure A-2). MALDI analysis of the thread and plaque from a freshly
secreted byssal thread also reveals major peaks around 7451 and ~7506 Da (Figure A-3) that
correspond well to the ~ 7 kDa band seen by electrophoresis. In the plaque, additional low
intensity peaks are also seen around the major peaks. As suggested with gel bands, these peaks
could represent other new proteins or potential variants of the 7 kDa protein.
Figure A-2. Electrophoretic determination of the distribution of quagga mussel byssal proteins
between thread and plaque. Proteins from separated threads and plaques were extracted in basic
extraction buffer, pooled together, dialyzed and lyophilized as described in the Methods in
Chapter 2. Lyophilized proteins were resuspended in water and loaded on a 16% Tricine PAGE
gel that was then silver-stained for protein visualization. Masses in brackets represent the mass
of lyophilized protein loaded for each sample. Arrows indicate visible bands in the thread and
plaque and MWs (in kDa) indicate molecular weights of some significant bands. Red asterisks
indicate proteins that are almost uniquely present in the plaque extract. Per cent values beside
bands indicate the % density of the bands relative to the plaque’s 7 kDa band taken as 100%
density. The gel has been modified to show the most relevant lanes.
Figure A-3. Matrix-assisted laser desorption ionization time-of-flight mass spectrometry
(MALDI-TOF MS) analysis of the quagga mussel thread and plaque from an induced, freshly
secreted byssal thread. The byssal thread was coated with sinapinic acid (3,5-dimethoxy-4-
hydroxycinnamic acid) matrix (10 mg/mL sinapinic acid in 50:50:0.1 water:acetonitrile:
trifluoroacetic acid) and analyzed using an Applied Biosystems MALDI-TOF analyser at the
Department of Forestry, University of Toronto, Toronto.
In zebra mussels, following identification of novel proteins, the protein primary sequences were
determined by tandem mass spectrometry of gel bands and database matching of mass spectra
against a cDNA library representing genes expressed in the mussel's foot. Unfortunately, in
quagga mussels, such peptide fragment fingerprinting (PFF) analysis cannot be done because no
such cDNA library has been created for the species. Thus, instead, we attempted to perform de
novo sequencing analysis on protein gel bands in order to infer sequence information directly
from the experimental MS/MS spectrum [85]. To improve the quality of the MS/MS spectrum,
more concentrated protein samples were analyzed by pooling together multiple gel bands of the
same protein. Analysis was thus performed on five bands of Dbfp0 (two from the gel in Figure
A-2 and three from a gel not shown here), the thread and plaque bands of the 35 kDa protein, the
one plaque band of the 16 kDa protein and the thread and plaque bands of the 7 kDa protein, all
taken from the gel in Figure A-2. The pooled gel bands were digested with trypsin and LC-
MS/MS spectra were then obtained as described in the Methods in section 2.3.6. The de novo
sequencing software, PEAKS (Bioinformatics Solutions Inc., Waterloo, Ontario, Canada), was
then used to derive sequences of the trypsin digested protein fragments. Analysis of these de
novo sequences is still underway.
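As a rough illustration of the principle behind de novo sequencing, the sketch below reads a peptide sequence from the mass differences between consecutive singly charged b-ions. It is a toy example under stated assumptions and not the algorithm used by PEAKS: the function names are hypothetical, only a subset of residues is included, the spectrum is an ideal, complete b-ion ladder generated in silico, and the peptide PVYPTK is chosen purely for illustration.

```python
# Monoisotopic residue masses (Da) for a small, illustrative subset of amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "D": 115.02694, "K": 128.09496,
    "E": 129.04259, "Y": 163.06333,
}
PROTON = 1.00728  # mass of the charge-carrying proton (Da)


def b_ion_ladder(peptide):
    """Return idealized singly charged b-ion m/z values b1..bn (hypothetical helper)."""
    ladder, total = [], PROTON
    for residue in peptide:
        total += RESIDUE_MASS[residue]
        ladder.append(round(total, 5))
    return ladder


def read_ladder(mz_values, tolerance=0.01):
    """Infer a sequence from the mass differences between consecutive b-ions."""
    sequence = []
    previous = PROTON
    for mz in sorted(mz_values):
        delta = mz - previous
        matches = [aa for aa, mass in RESIDUE_MASS.items()
                   if abs(mass - delta) <= tolerance]
        # An unmatched or ambiguous gap is reported as "?" rather than guessed.
        sequence.append(matches[0] if len(matches) == 1 else "?")
        previous = mz
    return "".join(sequence)


if __name__ == "__main__":
    spectrum = b_ion_ladder("PVYPTK")
    print(read_ladder(spectrum))
```

Real spectra contain missing ions, noise, mixed ion series and isobaric residues, which is why pooling bands to obtain higher-quality MS/MS spectra matters for this approach.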
Appendix B:
Peptide Mimics of the Zebra Mussel Byssal Protein
Dpfp1
Peptide mimics of zebra mussel byssal proteins can provide useful insights into the mechanism
of adhesion/cohesion of the proteins. Therefore, Gilbert, 2010 synthesized peptide mimics (by
Fmoc solid-phase synthesis) of the only fully sequenced zebra mussel byssal protein, Dpfp1, to
learn more about its mode of interaction [14]. One such mimic is a fusion peptide made by fusing
one N-terminal [P(V/E)YP(T/S)(K/Q)X] and one C-terminal consensus repeat
[KPGPY*DYDGPYDK] of Dpfp1 (Figure 1-5). The resulting peptide has the sequence
PVYPTKYKPGPY*DYDGPYDK, where Y* stands for DOPA [14]. Gilbert, 2010 found that
upon complexation with iron (III), this fusion peptide self-assembles into a film over several
days but no film is formed in the absence of DOPA or iron. When the film was characterized by
Scanning Electron Microscopy (SEM), a layer of spherical aggregates about 500 nm in diameter
was seen [14]. Such aggregate formation was not seen with just the DOPA containing C-terminal
repeat or even a double C-terminal repeat (containing 2 DOPAs) even in the presence of iron.
Gilbert, 2010 also found that the repeats must be part of the same peptide and not just mixed
together as a co-solution for this aggregate formation to occur [14]. These observations thus
indicated that the two Dpfp1 repeat sequences and Fe3+ must interact in a very specific way to
induce self-assembly into aggregates [14].
Characterization of the specific interactions between the two Dpfp1 repeat sequences and Fe3+
that lead to aggregate formation will provide useful information on the function and mechanism
of adhesion/cohesion of the Dpfp1 protein [14]. Therefore, here, we investigate the mode of iron
complexation and self-assembly of the Dpfp1-inspired mimetic peptide in order to elucidate its
mechanism of aggregate formation. To this end, we study the peptide's secondary structure
upon self-assembly in the presence of iron (III) using Circular Dichroism (CD) Spectroscopy and
investigate the size of aggregates formed by iron complexation under varied iron (III) to fusion
peptide ratios (Fe3+:FP) using Dynamic Light Scattering (DLS) and Transmission Electron
Microscopy (TEM).
Dynamic Light Scattering (DLS) measurements of aggregate size of solutions containing iron
(III) and the DOPA-containing fusion peptide revealed the formation of aggregates of varying
sizes depending on solution conditions such as the Fe3+:FP ratio (as will be discussed). In
the absence of iron, or upon replacement of peptidyl DOPA with tyrosine, no aggregate formation was
detected. In contrast, a double C-terminal repeat (26 residues) containing two DOPA residues showed even
bigger aggregate formation (results not shown here). These observations reinforced the
importance of both iron and DOPA in peptide aggregate formation. All iron-peptide mixtures
were prepared by mixing 2 mg/mL filtered peptide solutions in water (pH 6.5) with an equal
volume of the desired concentration of filtered FeCl3·6H2O in 20 mM BisTris buffer (2,2-
Bis(hydroxymethyl)-2,2',2''-nitrilotriethanol). BisTris buffer complexes the iron and ensures that
it does not precipitate out as iron hydroxide. This method was adapted from Gilbert, 2010 [14].
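The mixing arithmetic implied by this protocol can be sketched as follows. The peptide molecular weight and the iron stock concentrations are not stated here, so the values below are assumptions: the mass is estimated from the sequence PVYPTKYKPGPY*DYDGPYDK using average residue masses, with DOPA taken as tyrosine plus one oxygen, free termini, and no counter-ions or bound water in the weighed peptide. The function names are hypothetical.

```python
# Average residue masses (Da); "Y*" (DOPA) is assumed to be tyrosine plus one oxygen.
RESIDUE_MASS = {
    "P": 97.1167, "V": 99.1326, "Y": 163.1760, "T": 101.1051,
    "K": 128.1741, "G": 57.0519, "D": 115.0886,
    "Y*": 163.1760 + 15.9994,
}
WATER = 18.0153       # added once for the free N- and C-termini
FECL3_6H2O = 270.30   # g/mol, iron(III) chloride hexahydrate

# Fusion peptide PVYPTKYKPGPY*DYDGPYDK, one list entry per residue.
FUSION_PEPTIDE = ["P", "V", "Y", "P", "T", "K", "Y", "K", "P", "G",
                  "P", "Y*", "D", "Y", "D", "G", "P", "Y", "D", "K"]


def peptide_mass(residues):
    """Estimated average molecular weight (g/mol) of a linear peptide."""
    return sum(RESIDUE_MASS[r] for r in residues) + WATER


def iron_stock_mg_per_ml(fe, fp, peptide_mg_per_ml=2.0):
    """FeCl3·6H2O stock (mg/mL) that gives an fe:fp molar ratio.

    Because equal volumes are mixed, the molar ratio of the two stock
    solutions equals the molar ratio in the final mixture.
    """
    peptide_molar = peptide_mg_per_ml / peptide_mass(FUSION_PEPTIDE)  # mol/L
    return peptide_molar * fe / fp * FECL3_6H2O  # g/L, numerically mg/mL


if __name__ == "__main__":
    print(f"Estimated peptide mass: {peptide_mass(FUSION_PEPTIDE):.1f} g/mol")
    for fe, fp in [(2, 1), (1, 2), (1, 3), (1, 5)]:
        print(f"Fe3+:FP {fe}:{fp} -> {iron_stock_mg_per_ml(fe, fp):.3f} mg/mL")
```

Since each fusion peptide carries a single DOPA, the Fe3+:FP ratio computed this way is also the Fe3+:DOPA ratio.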
In order to characterize any change in the secondary structure of the peptide upon interaction
with iron (III), we performed CD spectroscopy of the fusion peptide solution both in the presence
and absence of iron (III). We found that in both conditions the peptide maintains a random coil
conformation thus indicating that the Dpfp1-inspired fusion peptide does not form any specific
secondary structure upon complexation with iron. Figure B-1 illustrates the CD spectrum of a
2:1 Fe(III):fusion peptide solution in BisTris, where the negative trough between 190 and 225
nm indicates the random coil and the 230 nm peak indicates tyrosine.
Figure B-1. Circular Dichroism spectrum of a 2:1 Fe3+:fusion peptide solution in BisTris
buffer.
Next, in order to characterize the effect of the iron to peptide ratio on aggregate formation, we
used DLS analysis to determine the size of aggregates formed with different ratios. Figure B-2
shows the rate of increase in aggregate size for four Fe3+:FP ratios over 10 minutes, five
minutes after mixing. It was seen that smaller Fe3+:FP ratios (1:5, 1:3, 1:2) form larger
aggregates than larger Fe3+:FP ratios (2:1) and also show a greater rate of increase in aggregate
size. The same ratios at pH 7 instead of pH 6 (shown in Figure B-2) give the same pattern of rate
of increase in aggregate size but also show much bigger aggregates being formed (results not
shown here). The pH effect could however be due to changes in interactions of iron with the
BisTris buffer, which has a pKa of 6.5 at 25°C. In addition to DLS, TEM images of the 1:2 and
2:1 Fe3+:FP solutions (five minutes after mixing) were also taken in order to visualize the shape
of the aggregates and verify the DLS findings. The TEM images, shown in Figure B-3, also
confirmed that smaller iron to peptide ratios form larger aggregates. In fact, the 1:2 ratio
forms aggregates twice as large (~30 nm radius) as those of the 2:1 ratio (~15 nm radius) (Figure B-3),
which is similar to the pattern of aggregate sizes obtained by DLS for the 1:2 and 2:1 ratios, 48
nm and 19 nm, respectively. Interestingly, the TEM images also indicate that the 1:2 Fe3+:FP
solution has aggregates that tend to form clusters, whereas the aggregates in the 2:1 Fe3+:FP
solution are more dispersed (Figure B-3).
[Figure B-1 plot. Axes: Absorbance (-7 to 2) versus Wavelength (nm) (190 to 250).]
Figure B-2. Dynamic Light Scattering measurements of the effect of iron (III) to fusion peptide
ratio on size of aggregates formed. Aggregate sizes of mixtures of four Fe3+:FP ratios (2:1, 1:2,
1:3 and 1:5) at pH 6 were measured over ten minutes, starting from 5 minutes after mixing. DLS
measurements were taken at 20°C with 6 line measurements of each sample taken at 2 minute
intervals, using a Malvern Zetasizer NanoZS instrument. Aggregate sizes measured were not
always consistent between experiments but the general pattern of results was the same.
[Figure B-2 plot. Axes: Aggregate Radius (nm) (0 to 600) versus Time (min) (5 to 15). Legend, Fe3+:FP ratio: 1:5 pH 6, 1:3 pH 6, 1:2 pH 6, 2:1 pH 6.]
Figure B-3. Transmission Electron Microscopy (TEM) images depicting the effect of two Fe3+:fusion
peptide (FP) ratios (2:1 and 1:2) on the size of aggregates formed. TEM was done by
negative staining with Phosphotungstic acid (PTA) on a carbon coated Nickel grid, 5 minutes
after mixing. Images were taken with a Tecnai 20 Microscope in Mt. Sinai Hospital, Toronto.
In trying to understand why lower iron to peptide ratios lead to bigger peptide aggregates, we
consulted a hypothesis by Zeng et al., 2010, in which they analyze the effects of iron
concentration on the modes of iron-DOPA complexation within a DOPA-containing marine
mussel protein [86]. In accordance with their hypothesis, at lower iron to DOPA ratios of 1:2 and
1:3 (Figure B-2), bis- and tris-complexation takes place respectively, causing peptides to come
together and form aggregates. At higher iron to DOPA ratios (greater than 1:1), mono-
complexation takes place and the peptides are dispersed [86]. While this hypothesis explains the
results seen for most of the ratios in Figure B-2, it does not explain why the 1:5 Fe3+:FP ratio
produces even bigger aggregates (since one Fe3+ cannot be complexed by more than three DOPA
residues). Therefore, in addition to iron-DOPA complexation, other interactions like peptide-peptide
interactions must also occur within the aggregate that contribute to the results we see.
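The stoichiometric reasoning above can be written out as a short sketch. It is a simplified reading of the hypothesis and not a speciation model: the function name is hypothetical, the thresholds for intermediate ratios (fewer than two DOPA per iron counted as mono, fewer than three as bis) are an assumption, and pH and buffer effects are ignored.

```python
def complexation_mode(fe, dopa):
    """Expected iron-DOPA complex for an Fe3+:DOPA molar ratio of fe:dopa.

    Returns the coordination mode and the number of DOPA residues per iron
    that are left over once that mode is satisfied.
    """
    dopa_per_iron = dopa / fe
    if dopa_per_iron < 2:
        mode, bound = "mono", 1
    elif dopa_per_iron < 3:
        mode, bound = "bis", 2
    else:
        # One Fe3+ cannot be complexed by more than three DOPA residues.
        mode, bound = "tris", 3
    excess = max(0.0, dopa_per_iron - bound)
    return mode, excess


if __name__ == "__main__":
    # The fusion peptide carries one DOPA, so Fe3+:FP equals Fe3+:DOPA.
    for fe, fp in [(2, 1), (1, 2), (1, 3), (1, 5)]:
        mode, excess = complexation_mode(fe, fp)
        print(f"Fe3+:FP {fe}:{fp} -> {mode}-complex, excess DOPA per iron: {excess}")
```

At 1:5 the sketch reports two DOPA residues per iron left uncomplexed, which restates why iron-DOPA complexation alone does not account for the larger aggregates seen at that ratio.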
Gilbert, 2010 had hypothesized that in addition to the DOPA-iron interactions within the fusion
peptide (PVYPTKYKPGPY*DYDGPYDK) where Y* is DOPA, other interactions between the
positively charged lysine (K) and negatively charged aspartic acid (D) residues could also be
responsible for aggregate formation [14]. We tested this hypothesis by introducing a charge
shielding agent, NaCl, in the solution to shield any K and D interactions, but instead of
interfering with aggregate formation, higher concentrations of NaCl led to the formation of
bigger aggregates with a greater rate of increase in aggregate size (result not shown here). It
could be that NaCl helps shield some repulsive forces between peptides and hence promotes
bigger aggregates. In the future, it will be useful to directly introduce modifications to the fusion
peptide such as replacing charged residues with uncharged glycine to better study their role in
aggregate formation. Additionally, UV-Vis spectroscopy of the varied iron to peptide mixtures
will provide useful information on the mode of iron complexation (mono-, bis-, or tris-) by the
peptidyl DOPA. Overall, our experiments have revealed some interesting observations on the
nature of the aggregates formed by the interaction of the Dpfp1-inspired fusion peptide with iron
(III); however, further experiments will be needed to better understand the mechanism of self-assembly
and iron complexation involved in aggregate formation.
References
1. Claudi, R. and G.L. Mackie, Practical manual for zebra mussel monitoring and control. 1994:
CRC.
2. Strayer, D.L., Twenty years of zebra mussels: lessons from the mollusk that made headlines.
Front Ecol Environ, 2008. 7(3): p. 135-141.
3. Morton, B., The anatomy of Dreissena polymorpha and the evolution and success of the
heteromyarian form in the Dreissenoidea. Zebra mussels: biology, impacts and control, 1993.
185: p. 216.
4. Anderson, K.E. and J.H. Waite, Immunolocalization of Dpfp1, a byssal protein of the zebra
mussel Dreissena polymorpha. J Exp Biol, 2000. 203(Pt 20): p. 3065-3076.
5. Anderson, K.E. and J.H. Waite, Biochemical characterization of a byssal protein from Dreissena
bugensis (Andrusov). Biofouling, 2002. 18(1): p. 37-45.
6. Farsad, N. and E.D. Sone, Zebra mussel adhesion: Structure of the byssal adhesive apparatus in
the freshwater mussel, Dreissena polymorpha. J Struct Biol, 2012. 177(3): p. 613-620.
7. Eckroat, L.R. and L.M. Steele, Comparative morphology of the byssi of Dreissena polymorpha
and Mytilus edulis. Am Malacol Bull, 1993. 10: p. 103-108.
8. Allen, J.A., The recent Bivalvia: Their form and evolution. 1985.
9. Waite, J.H., et al., Mussel adhesion: finding the tricks worth mimicking. The journal of adhesion,
2005. 81(3-4): p. 297-317.
10. Rzepecki, L.M. and J.H. Waite, The byssus of the zebra mussel, Dreissena polymorpha. I:
Morphology and in situ protein processing during maturation. Mol Mar Biol Biotechnol, 1993.
2(5): p. 255-66.
11. Lee, H., N.F. Scherer, and P.B. Messersmith, Single-molecule mechanics of mussel adhesion.
Proc Natl Acad Sci U S A, 2006. 103(35): p. 12999-3003.
12. Taylor, S.W., et al., Ferric ion complexes of a DOPA-containing adhesive protein from Mytilus
edulis. Inorganic Chemistry, 1996. 35(26): p. 7572-7577.
13. Holten-Andersen, N., et al., Metals and the integrity of a biological coating: the cuticle of mussel
byssus. Langmuir, 2009. 25(6): p. 3323-3326.
14. Gilbert, T.W., Investigation of the protein components of the zebra mussel (Dreissena
polymorpha) byssal adhesion apparatus, in Institute of Biomaterials and Biomedical
Engineering. 2010, University of Toronto, MASc Thesis.
15. Burzio, L.A. and J.H. Waite, Cross-linking in adhesive quinoproteins: studies with model
decapeptides. Biochemistry, 2000. 39(36): p. 11147-11153.
16. Yu, J., et al., Mussel protein adhesion depends on interprotein thiol-mediated redox modulation.
Nat Chem Biol, 2011. 7(9): p. 588-90.
17. Farsad, N., T.W. Gilbert, and E.D. Sone, Adhesive structure of the freshwater zebra mussel,
Dreissena polymorpha, in Materials Research Society. 2009, Materials Research Society
Symposium Proceedings.
18. Lee, B.P., et al., Mussel-Inspired adhesives and coatings. Annu Rev Mater Res, 2011. 41: p. 99-
132.
19. Silverman, H.G. and F.F. Roberto, Understanding marine mussel adhesion. Mar Biotechnol
(NY), 2007. 9(6): p. 661-681.
20. Waite, J.H., Adhesion a la moule. Integrative and comparative biology, 2002. 42(6): p. 1172-
1180.
21. Waite, J.H., X.X. Qin, and K.J. Coyne, The peculiar collagens of mussel byssus. Matrix Biol,
1998. 17(2): p. 93-106.
22. Qin, X.X., K.J. Coyne, and J.H. Waite, Tough tendons. Mussel byssus has collagen with silk-like
domains. J Biol Chem, 1997. 272(51): p. 32623-32627.
23. Coyne, K.J., X.X. Qin, and J.H. Waite, Extensible collagen in mussel byssus: A natural block
copolymer. Science, 1997. 277(5333): p. 1830-1832.
24. Qin, X.X. and J.H. Waite, A potential mediator of collagenous block copolymer gradients in
mussel byssal threads. Proc Natl Acad Sci U S A, 1998. 95(18): p. 10517-10522.
25. Sagert, J. and J.H. Waite, Hyperunstable matrix proteins in the byssus of Mytilus
galloprovincialis. J Exp Biol, 2009. 212(Pt 14): p. 2224-2236.
26. Cha, H.J., D.S. Hwang, and S. Lim, Development of bioadhesives from marine mussels.
Biotechnol J, 2008. 3(5): p. 631-638.
27. Taylor, S.W., et al., trans-2, 3-cis-3, 4-Dihydroxyproline in the tandemly repeated consensus
decapeptides of an adhesive protein from Mytilus edulis. J Am Chem Soc, 1994. 116(10): p. 803-
810.
28. Rzepecki, L.M., K.M. Hansen, and J.H. Waite, Characterization of a cystine-rich polyphenolic
protein family from the blue mussel Mytilus edulis L. The Biological Bulletin, 1992. 183(1): p.
123-137.
29. Warner, S.C. and J.H. Waite, Expression of multiple forms of an adhesive plaque protein in an
individual mussel, Mytilus edulis. Marine Biology, 1999. 134(4): p. 729-734.
30. Zhao, H. and J.H. Waite, Proteins in load-bearing junctions: the histidine-rich metal-binding
protein of mussel byssus. Biochemistry, 2006. 45(47): p. 14223-14231.
31. Waite, J.H. and X. Qin, Polyphosphoprotein from the adhesive pads of Mytilus edulis.
Biochemistry, 2001. 40(9): p. 2887-2893.
32. Zhao, H. and J.H. Waite, Linking adhesive and structural proteins in the attachment plaque of
Mytilus californianus. J Biol Chem, 2006. 281(36): p. 26150-26158.
33. Rzepecki, L.M. and J.H. Waite, The byssus of the zebra mussel, Dreissena polymorpha. II:
Structure and polymorphism of byssal polyphenolic protein families. Mol Mar Biol Biotechnol,
1993. 2(5): p. 267-279.
34. Gilbert, T.W. and E.D. Sone, The byssus of the zebra mussel (Dreissena polymorpha): spatial
variations in protein composition. Biofouling, 2010. 26(7): p. 829-836.
35. Anderson, K.E. and J.H. Waite, A major protein precursor of zebra mussel (Dreissena
polymorpha) byssus: deduced sequence and significance. Biol Bull, 1998. 194(2): p. 150-160.
36. Endrizzi, B.J. and R.J. Stewart, Glueomics: an expression survey of the adhesive gland of the
sandcastle worm. J Adhes, 2009. 85(8): p. 546-559.
37. Wiegemann, M. and B. Watermann, The impact of desiccation on the adhesion of barnacles
attached to non-stick coatings. Biofouling, 2004. 20(3): p. 147-153.
38. Flammang, P., et al., A study of the temporary adhesion of the podia in the sea star Asterias
rubens (Echinodermata, Asteroidea) through their footprints. J Exp Biol, 1998. 201(Pt 16): p.
2383-2395.
39. Flammang, P., J. Ribesse, and M. Jangoux, Biomechanics of adhesion in sea cucumber Cuvierian
tubules (Echinodermata, Holothuroidea). ICB, 2002. 42(6): p. 1107-1115.
40. Stewart, R.J. and C.S. Wang, Adaptation of caddisfly larval silks to aquatic habitats by
phosphorylation of H-fibroin serines. Biomacromolecules, 2010. 11(4): p. 969-974.
41. Wang, C.S. and R.J. Stewart, Localization of the bioadhesive precursors of the sandcastle worm,
Phragmatopoma californica (Fewkes). J Exp Biol, 2012. 215(2): p. 351-361.
42. Zhao, H., et al., Cement proteins of the tube-building polychaete Phragmatopoma californica. J
Biol Chem, 2005. 280(52): p. 42938-42944.
43. Ohkawa, K., et al., Purification and characterization of a dopa-containing protein from the foot
of the Asian freshwater mussel, Limnoperna fortunei. Biofouling, 1999. 14(3): p. 181-188.
44. Yamamoto, H. and K. Ohkawa, Synthesis of adhesive protein from the vitellaria of the liver
fluke Fasciola hepatica. Amino Acids, 1993. 5(1): p. 71-75.
45. Gatesy, J., et al., Extreme diversity, conservation, and convergence of spider silk fibroin
sequences. Science, 2001. 291(5513): p. 2603-2605.
46. Holten-Andersen, N., H. Zhao, and J.H. Waite, Stiff coatings on compliant biofibers: the cuticle
of Mytilus californianus byssal threads. Biochemistry, 2009. 48(12): p. 2752-2759.
47. Zhao, H. and J.H. Waite, Coating proteins: structure and cross-linking in fp-1 from the green
shell mussel Perna canaliculus. Biochemistry, 2005. 44(48): p. 15915-15923.
48. Burzio, L.A., et al., The adhesive protein of Choromytilus chorus (Molina, 1782) and Aulacomya
ater (Molina, 1782): a proline-rich and a glycine-rich polyphenolic protein. Biochim Biophys
Acta, 2000. 1479(1-2): p. 315-320.
49. Rzepecki, L.M., et al., Molecular diversity of marine glues: polyphenolic proteins from five
mussel species. Mol Mar Biol Biotechnol, 1991. 1(1): p. 78-88.
50. Ninan, L., et al., Adhesive strength of marine mussel extracts on porcine skin. Biomaterials, 2003.
24(22): p. 4091-4099.
51. Lee, B., J. Dalsin, and P. Messersmith, Biomimetic adhesive polymers based on mussel adhesive
proteins. Biological Adhesives, 2006: p. 257-278.
52. Penoff, J., Skin closures using cyanoacrylate tissue adhesives. Plastic and reconstructive surgery,
1999. 103(2): p. 730.
53. Brubaker, C.E. and P.B. Messersmith, The present and future of biologically inspired adhesive
interfaces and materials. Langmuir, 2012. 28(4): p. 2200-2205.
54. Farsad, N., Ultrastructural and Histochemical Characterization of the Zebra Mussel Adhesive
Apparatus, in Institute for Biomaterials and Biomedical Engineering. 2010, University of
Toronto, MASc Thesis.
55. Eckroat, L.R., et al., The byssus of the zebra mussel (Dreissena polymorpha): Morphology, byssal
thread formation, and detachment. Mol Mar Biol Biotechnol, 1993: p. 239-263.
56. Xu, W. and M. Faisal, Putative identification of expressed genes associated with attachment of
the zebra mussel (Dreissena polymorpha). Biofouling, 2008. 24(3): p. 157-161.
57. Xu, W. and M. Faisal, Development of a cDNA microarray of zebra mussel (Dreissena
polymorpha) foot and its use in understanding the early stage of underwater adhesion. Gene,
2009. 436(1-2): p. 71-80.
58. Tamarin, A., P. Lewis, and J. Askey, The structure and formation of the byssus attachment
plaque in Mytilus. J Morphol, 1976. 149(2): p. 199-221.
59. Sprung, M., Field and laboratory observations of Dreissena polymorpha larvae: abundance,
growth, mortality and food demands. Archives in Hydrobiology, 1989. 115(4): p. 537-561.
60. Waite, J.H., Process for purifying and stabilizing catechol-containing proteins and materials
obtained thereby. 1984, University of Connecticut, Farmington, Conn.: United States. p. 5.
61. Mortz, E., et al., Improved silver staining protocols for high sensitivity protein identification
using matrix-assisted laser desorption/ionization-time of flight analysis. Proteomics, 2001. 1(11):
p. 1359-1363.
62. Zhu, Y.Y., et al., Reverse transcriptase template switching: a SMART approach for full-length
cDNA library construction. Biotechniques, 2001. 30(4): p. 892-897.
63. Burzio, L.A., et al., In vitro polymerization of mussel polyphenolic proteins catalyzed by
mushroom tyrosinase. Comp Biochem Physiol B Biochem Mol Biol, 2000. 126(3): p. 383-389.
64. Gantayet, A. and E.D. Sone, Novel adhesive proteins identified in the insoluble byssal matrix of
the freshwater zebra mussel Dreissena polymorpha. 2012, University of Toronto, Unpublished
Manuscript: Toronto.
65. Waite, J.H., T.J. Housley, and M.L. Tanzer, Peptide repeats in a mussel glue protein: theme and
variations. Biochemistry, 1985. 24(19): p. 5010-5014.
66. Popiel, H.A., et al., Disruption of the toxic conformation of the expanded polyglutamine stretch
leads to suppression of aggregate formation and cytotoxicity. Biochem Biophys Res Commun,
2004. 317(4): p. 1200-1206.
67. Weaver, R.F., Molecular Biology. Third ed. 2005, New York: McGraw-Hill.
68. Lee, C., et al., Sequence analysis of choriogenin H gene of medaka (Oryzias latipes) and mRNA
expression. Environ Toxicol Chem, 2002. 21(8): p. 1709-1714.
69. Lyons, C.E., et al., Expression and structural analysis of a teleost homolog of a mammalian zona
pellucida gene. J Biol Chem, 1993. 268(28): p. 21351-21358.
70. Elias, R.J., D.J. McClements, and E.A. Decker, Antioxidant activity of cysteine, tryptophan, and
methionine residues in continuous phase beta-lactoglobulin in oil-in-water emulsions. J Agric
Food Chem, 2005. 53(26): p. 10248-10253.
71. Brennan, T.V. and S. Clarke, Deamidation and isoaspartate formation in model synthetic
peptides: The effects of sequence and solution environment. ChemInform, 1995. 26(32).
72. Marsden, J.E. and D.M. Lansky, Substrate selection by settling zebra mussels, Dreissena
polymorpha, relative to material, texture, orientation, and sunlight. Can J Zool, 2000. 78(5): p.
787-793.
73. Gantayet, A., L. Ohana, and E.D. Sone, Identification and sequence analysis of novel proteins in
the zebra mussel adhesive apparatus. 2012, University of Toronto, Unpublished Manuscript:
Toronto.
74. Gentzel, M., et al., Preprocessing of tandem mass spectrometric data to support automatic
protein identification. Proteomics, 2003. 3(8): p. 1597-610.
75. Ma, B., et al., PEAKS: powerful software for peptide de novo sequencing by tandem mass
spectrometry. Rapid communications in mass spectrometry, 2003. 17(20): p. 2337-2342.
76. Karty, J.A., et al., Artifacts and unassigned masses encountered in peptide mass mapping. J
Chromatogr B Analyt Technol Biomed Life Sci, 2002. 782(1-2): p. 363-383.
77. Wright, H.T., Nonenzymatic deamidation of asparaginyl and glutaminyl residues in proteins. Crit
Rev Biochem Mol Biol, 1991. 26(1): p. 1-52.
78. Stewart, R.J., T.C. Ransom, and V. Hlady, Natural Underwater Adhesives. J Polym Sci B Polym
Phys, 2011. 49(11): p. 757-771.
79. Zhao, H., et al., Probing the adhesive footprints of Mytilus californianus byssus. J Biol Chem,
2006. 281(16): p. 11090-11096.
80. Yano, M., et al., Shematrin: a family of glycine-rich structural proteins in the shell of the pearl
oyster Pinctada fucata. Comp Biochem Physiol B Biochem Mol Biol, 2006. 144(2): p. 254-262.
81. Lei, M. and R. Wu, A novel glycine-rich cell wall protein gene in rice. Plant Mol Biol, 1991.
16(2): p. 187-198.
82. Inoue, K., et al., Mussel adhesive plaque protein gene is a novel member of epidermal growth
factor-like gene family. J Biol Chem, 1995. 270(12): p. 6698-6701.
83. Yu, J., et al., Mussel protein adhesion depends on interprotein thiol-mediated redox modulation.
Nat Chem Biol, 2011. 7(9): p. 588-590.
84. Hwang, D.S., et al., Practical recombinant hybrid mussel bioadhesive fp-151. Biomaterials, 2007.
28(24): p. 3560-3568.
85. Hernandez, P., M. Muller, and R.D. Appel, Automated protein identification by tandem mass
spectrometry: issues and strategies. Mass Spectrom Rev, 2006. 25(2): p. 235-254.
86. Zeng, H., et al., Strong reversible Fe3+-mediated bridging between dopa-containing protein
films in water. Proc Natl Acad Sci U S A, 2010. 107(29): p. 12850-12853.