Identification and Sequence Analysis of Novel Proteins in the Zebra Mussel Adhesive Apparatus by Arpita Gantayet A thesis submitted in conformity with the requirements for the degree of Masters of Applied Science Institute of Biomaterials and Biomedical Engineering University of Toronto © Copyright by Arpita Gantayet 2012


Page 1: Identification and Sequence Analysis of Novel Proteins in ... · Arpita Gantayet Masters of Applied Science Institute of Biomaterials and Biomedical Engineering University of Toronto


Identification and Sequence Analysis of Novel

Proteins in the Zebra Mussel Adhesive Apparatus

by

Arpita Gantayet

A thesis submitted in conformity with the requirements

for the degree of Masters of Applied Science

Institute of Biomaterials and Biomedical Engineering

University of Toronto

© Copyright by Arpita Gantayet 2012


Identification and Sequence Analysis of Novel Proteins in the

Zebra Mussel Adhesive Apparatus

Arpita Gantayet

Masters of Applied Science

Institute of Biomaterials and Biomedical Engineering

University of Toronto

2012

Abstract

The freshwater zebra mussel Dreissena polymorpha is a biofouling species that adheres to varied

substrates underwater using a proteinaceous byssus that consists of a bundle of threads tipped

with adhesive plaques. This underwater adhesion is an inspiration for the development of

medical and dental bioadhesives; however, the byssus is highly resistant to biochemical

characterization owing to extensive cross-linking, and therefore limited information is available

on the mechanisms of adhesion and cohesion of byssal proteins. We report here on the

identification and sequence analysis of eight novel byssal proteins identified in the soluble

extract and insoluble matrix from induced, freshly secreted byssal threads with minimal cross-

linking, using gel electrophoresis and LC-MS/MS sequencing techniques. The identified byssal

proteins have theoretical molecular weights ranging from 4.1 kDa to 20.1 kDa and isoelectric

points ranging from 4.2 to 9.6, and share several common characteristics including consensus

repeat patterns, block structures and defined sequence motifs.
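As a minimal illustration of what a "theoretical" molecular weight and a mole % composition mean for a protein sequence, the following Python sketch sums average residue masses and counts residues per 100 residues. It is not the analysis pipeline used in this work; the function names `theoretical_mw` and `mole_percent` are hypothetical, the residue masses are standard average values, and the example input is the 13-residue Dpfp1 C-terminal consensus repeat KPGPYDYDGPYDK quoted in the List of Figures. Post-translational modifications such as DOPA are deliberately ignored.

```python
from collections import Counter

# Average masses (Da) of amino acid residues, i.e. the free amino acid
# minus the water lost when a peptide bond forms. Standard values.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water is added back for the free N- and C-termini


def theoretical_mw(seq: str) -> float:
    """Average molecular weight (Da) of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


def mole_percent(seq: str) -> dict:
    """Composition as number of residues per 100 residues."""
    seq = seq.upper()
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.most_common()}


if __name__ == "__main__":
    # Illustrative input only: the Dpfp1 C-terminal consensus repeat.
    repeat = "KPGPYDYDGPYDK"
    print(f"MW: {theoretical_mw(repeat):.2f} Da")
    for aa, pct in mole_percent(repeat).items():
        print(f"{aa}: {pct:.1f} mol %")
```

A theoretical isoelectric point additionally requires a set of side-chain pKa values, which differ between published scales, so it is left out of this sketch.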


Acknowledgements

There are several individuals whom I would like to thank for their direction, support and

encouragement during this project. Most importantly, I would like to thank my supervisor, Dr.

Eli Sone, for his guidance and mentorship and for creating such an incredible learning experience

over the course of this project. His devotion to his students, his attention to detail, his

unwavering enthusiasm and the flexibility of his supervision are truly admirable.

My sincerest gratitude goes as well to Dr. Lily Ohana for her invaluable insights and guidance

during the project and for all her help with protocol development and manuscript editing. I am

also very grateful to my committee members, Dr. Christopher Yip and Dr. Jonathan Rocheleau

for their valuable feedback and thorough insight on my research.

Additionally, I would like to thank all my colleagues in the Sone Lab: Bryan Quan, Alex Lausch,

Kyle Serkies, Jason Miklas, Mikhael Burke, Erin McNeill, Callie Bazak, Zachariah Grodzinski

and Catherine Tran for their in-depth discussions and for maintaining a fun and motivating

atmosphere in the lab. Special thanks go to Bryan and Alex for their constant willingness to help

and answer questions. Thank you as well to Kyle and Trevor Gilbert for collecting the mussels.

I would also like to acknowledge Zahra Mirzaei for her advice and assistance concerning

numerous protocols, Douglas Baumann for his help with DLS and James Holcroft for his help

with electrophoretic analysis. A big thank you as well to all the labs that allowed access to their

equipment: Dr. Craig Simmons, Dr. Ben Ganss, Dr. Molly Shoichet and Dr. Walid Houry. I

would also like to express thanks to Paul Taylor, Li Zhang and Reynaldo Interior for their advice

on proteomic analysis.

Last but not least, I would like to thank my family, my parents and my little sister, for always

believing in me, every step of the way. With their tremendous love, patience, support and

encouragement, everything becomes possible.


Table of Contents

Acknowledgements

List of Tables

List of Figures

List of Abbreviations

List of Appendices

Chapter 1: Introduction

1.1 Background

1.1.1 Introduction to zebra mussels

1.1.2 The zebra mussel byssus

1.1.3 Byssal composition in marine mussels

1.1.4 Byssal composition in zebra mussels

1.1.5 Protein identification in the zebra mussel byssus

1.1.5 Comparison of zebra mussel adhesion with adhesion in other species

1.1.6 Comparison of zebra mussel and quagga mussel byssal proteins

1.2 Motivation

1.2.1 Mussel inspired bioadhesives

1.2.2 Targeted anti-fouling strategies

1.3 Objectives

1.4 Overview

Chapter 2: Adhesive Mechanisms in Freshwater Zebra Mussels: Identification and Sequence Analysis of Novel Proteins

2.1 Abstract

2.2 Introduction

2.3 Methods

2.3.1 Protein extraction from induced byssal threads

2.3.2 Dialysis, lyophilization and quantification of protein samples

2.3.3 Amino acid analysis

2.3.4 Tricine polyacrylamide gel electrophoresis (Tricine-PAGE) and gel silver-staining

2.3.5 Digestion of protein gel bands

2.3.6 Liquid chromatography – tandem mass spectrometry (LC-MS/MS)

2.3.7 Sequence data analysis


2.4 Results

2.4.1 Optimal conditions for zebra mussel protein extraction and analysis

2.4.2 Identification of novel foot proteins in the zebra mussel byssus

2.4.3 Comparisons of LC-MS/MS derived sequences of Dpfp1, Dpfp2 and Dpfp5

2.4.4 Sequence properties of the EST-derived sequence of the novel Dpfp5 protein

2.4.5 Sequence properties of the EST-derived sequence of Dpfp2

2.5 Discussion

2.6 Acknowledgments

Chapter 3: Novel Proteins Identified in the Insoluble Byssal Matrix of the Freshwater Zebra Mussel Dreissena polymorpha

3.1 Abstract

3.2 Introduction

3.3 Methods

3.3.1 Protein extraction from induced byssal threads/plaques

3.3.2 Protein digestion

3.3.3 Liquid chromatography – tandem mass spectrometry (LC-MS/MS)

3.3.4 Database matching and protein identification

3.3.5 Sequence data analysis

3.4 Results and Discussion

3.4.1 Identification of novel and known proteins in base-insoluble thread and plaque matrices

3.4.2 Sequence properties of novel byssal proteins identified in the insoluble extracts

3.4.3 Proline and tyrosine (P, Y) rich proteins

3.4.4 Glycine rich proteins

3.4.5 Proline and Cysteine (P, C) rich proteins

3.4.6 Analysis of the set of zebra mussel byssal proteins identified in the insoluble matrix

3.5 Conclusion

3.6 Acknowledgements

Chapter 4: Conclusions, Preliminary work and Future Directions

4.1 Summary and Conclusions

4.2 Preliminary Additional Studies

4.2.1 Comparing zebra and quagga mussel byssal proteins

4.2.2 Peptide mimics: an insight into byssal protein interactions

4.3 Future work


4.3.1 Identification of other novel zebra mussel byssal proteins

4.3.2 Determining protein distribution within the byssal plaque

4.3.3 Characterizing structure and chemical reactivity of byssal proteins

4.4 Significance and Conclusions

Appendix A: Quagga Mussel Adhesion: Novel Proteins and their Byssal Distribution

Appendix B: Peptide Mimics of the Zebra Mussel Byssal Protein Dpfp1

References


List of Tables

Table 1-1. Summary of the location, function, prominent amino acid content and sequence properties of known marine mussel byssal proteins. Each protein is summarized from one of three marine mussel species: Mytilus edulis (Me), Mytilus californianus (Mc) and Mytilus galloprovincialis (Mg).

Table 1-2. Zebra mussel foot proteins Dpfp1 - 3

Table 1-3. Tandem repeat sequences from marine mussel, freshwater mussel and other aquatic species.

Table 2-1. Summary of molecular weight, DOPA content and sequence information of the three known D. polymorpha foot proteins (Dpfp)

Table 2-2. Comparisons of the amino acid compositions in mole % (number of residues per 100 residues) in zebra mussel mature and induced thread/plaques and in soluble thread and plaque extracts.

Table 2-3. Comparisons of three zebra mussel byssal proteins sequenced by LC-MS/MS.

Table 3-1. Summary of the molecular weight and sequence information available for the six identified zebra mussel byssal proteins (Dpfp), in decreasing order of their molecular weights as determined by gel electrophoresis.

Table 3-2. Sequences of novel byssal proteins identified in insoluble plaque and thread extracts by LC-MS/MS analysis and database matching against a zebra mussel foot protein cDNA library. The proteins have been named Dpfp6 – Dpfp12 in decreasing order of their molecular weights (MW).

Table 3-3. Theoretical mole % compositions of prominent amino acids found in the sequences of previously known zebra mussel byssal proteins and novel byssal proteins identified in the insoluble byssal extracts.


List of Figures

Figure 1-1. Underwater attachment of a zebra mussel to an aquarium wall by means of its proteinaceous byssus, secreted by the ‘Foot’.

Figure 1-2. Illustration of the structure of the zebra mussel byssus and its adhesive plaque-substrate interface, adapted with permission [6]. (A) Schematic of the macrostructure of the byssus. (B) Transmission Electron Microscopy image of the plaque-substrate interface depicting a 10-20 nm interfacial adhesive layer.

Figure 1-3. Illustration of possible DOPA-mediated interactions in the mussel byssus, adapted from [17].

Figure 1-4. Spatial distribution of byssal proteins in the marine mussel Mytilus edulis; reproduced with permission [19].

Figure 1-5. Pattern of tandem consensus repeats in the primary sequence of foot-derived Dpfp-1 (AF265353.1). The 22 N-terminal repeats of P(V/E)YP(T/S)(K/Q)X are italicized and the 15 highly conserved C-terminal repeats of KPGPYDYDGPYDK are bolded.

Figure 2-1. Electrophoretic identification of zebra mussel byssal proteins. (A) Byssal proteins identified in an extract from 65 complete byssal threads (Byssal T/P). (B) Byssal proteins identified in the extracts from 100 separated threads and 100 separated plaques.

Figure 2-2. Alignment of the multiple EST sequence matches derived for the Dpfp5 gel band


Figure 2-3. Illustration of the consensus sequence repeats identified in the EST derived sequence of Dpfp5 (AM230139).

Figure 2-4. Alignment of the multiple EST sequence matches derived for the Dpfp2 (22 kDa) gel band.

Figure 2-5. Illustration of the tandem repeat pattern identified in the EST derived sequence of Dpfp2 (AM229730). (A) Sequence depicting five full repeats of a 22 residue consensus sequence KTY(P/E)AYPTK(Q/D)SYPVYPEKKYTE where non-italicized residues represent highly conserved residues. (B) Kyte-Doolittle hydropathy plot of the sequence.

Figure 3-1. Sequence analysis of the EST derived sequence of Dpfp6 (AM229723). (A) Repeat pattern of the consensus sequence KPGPYDYDGPYDK. (B) Sequence alignment of the Dpfp6 sequence with the C-terminus (residues 230 – 430) of previously described byssal protein Dpfp1 (AF265353).

Figure 3-2. Illustration of the pattern of sequence repeats in the clustered EST derived sequence of Dpfp12.

Figure 3-3. Sequence alignment of the EST derived sequences of Dpfp7. (A) Alignment of the three variants of Dpfp7 (Dpfp7α, β and γ) amongst each other. (B) Alignment of the Dpfp7α sequence with the EST derived sequence of Dpfp5 (AM230139) described previously

Figure 3-4. Illustration of repeat patterns in the EST derived sequence of Dpfp9 obtained by clustering the Dpfp9α and Dpfp9β sequences.

Figure 3-5. Sequence alignment of cysteine containing byssal proteins; Dpfp10, Dpfp11α and previously described Dpfp5 (AM230139) [73].


Figure A-1. Electrophoretic identification of quagga mussel byssal proteins.

Figure A-2. Electrophoretic determination of the distribution of quagga mussel byssal proteins between thread and plaque.

Figure A-3. Matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS) analysis of the quagga mussel thread and plaque from an induced, freshly secreted byssal thread.

Figure B-1. Circular Dichroism spectrum of a 2:1 Fe³⁺:fusion peptide solution in BisTris buffer.

Figure B-2. Dynamic Light Scattering measurements of the effect of iron (III) to fusion peptide ratio on size of aggregates formed.

Figure B-3. Transmission Electron Microscopy (TEM) images depicting the effect of two Fe³⁺:fusion peptide (FP) ratios (2:1 and 1:2) on the size of aggregates formed.


List of Abbreviations

DOPA: 3,4-Dihydroxyphenylalanine

preCOLS: Prepepsinized collagens

Tmp: Thread matrix protein

Mfp: Mussel foot protein (marine)

Dpfp: Dreissena polymorpha foot protein

Dbfp: Dreissena bugensis foot protein

SDS-PAGE: Sodium dodecylsulfate polyacrylamide gel electrophoresis

AU-PAGE: Acetic acid urea polyacrylamide gel electrophoresis

Tricine PAGE: Tricine polyacrylamide gel electrophoresis

MALDI-TOF MS: Matrix-assisted laser desorption ionization time-of-flight mass spectrometry

LC-MS/MS: Liquid chromatography - tandem mass spectrometry

PFF: Peptide fragment fingerprinting

EB: Extraction buffer

MW: Molecular weight

pI: Isoelectric point


List of Appendices

Appendix A: Quagga Mussel Adhesion: Novel Proteins and their Byssal Distribution

Appendix B: Peptide Mimics of the Zebra Mussel Byssal Protein Dpfp1


Chapter 1:

Introduction

1.1 Background

1.1.1 Introduction to zebra mussels

Zebra mussels (Dreissena polymorpha) are small freshwater bivalves that are able to adhere

strongly to a variety of substrates in their underwater habitats. The species is native to the Black,

Caspian and Azov Seas but was able to spread through several parts of Europe by the early

1800s by means of man-made canals and attachment to the hulls of shipping vessels [1]. This

invasive species was accidentally introduced into the Great Lakes in the late 1980s, likely by a

transoceanic shipping vessel, and has rapidly spread through North American water bodies ever

since [2]. Over the years, the zebra mussels have had major negative impacts on the economy

and ecology in the Great Lakes and waterways [2]. Owing to their ability to stick to the shells of

other mussels and form layered clumps, the mussels are able to clog water intake and distribution

pipes in industrial and domestic facilities and are able to completely encrust boat hulls [1]. They

also interfere with boating and navigation, foul beaches, increase corrosion and contaminate

potable water [1]. Their impact on the ecosystem includes the displacement of native bivalve

species and destruction of fish habitats and populations by disrupting the food web [1]. The

invasive zebra mussels are therefore a major source of biofouling in North American water

bodies.

Mature zebra mussels are sessile organisms that attach to surfaces by means of a proteinaceous

structure called the byssus that consists of a number of threads with adhesive pads called plaques

at the tips. Shells of mature mussels are typically around 2.5 cm in length, though larger shells

can be up to 4 cm long [1]. Striped markings on the shell give these mussels their name. These

markings can vary, however, and hence the species is called polymorpha, which means ‘many

forms’ [1]. Zebra mussels are a dioecious species, having two separate sexes, and release gametes into

the water column for fertilization to occur. The fertilized eggs form larvae called veligers within


3 – 5 days [3]. Veligers have a structure called the velum that enables rapid mobility and

dispersion to new areas, and they reach adulthood and sexual maturity within a year [3]. Zebra

mussels are filter feeders that primarily feed by filtering algae, zooplankton and other organic

matter through their inhalant and exhalant siphons [1] (Figure 1-1).

On the ventral side of the shell, zebra mussels have an opening called the pedal gape that allows

for extension of the holdfast, called the byssus, and a muscular organ called the ‘foot’ (Figure 1-

1). The ‘foot’ aids in mussel locomotion but, more importantly, houses the gland that

produces, stores and secretes the byssal precursor proteins [1]. During byssogenesis, the byssal

precursors are secreted into a cleft at the base of the foot called the ventral groove. The byssal

threads are then formed from the precursors by contractions of the foot and are deposited onto

the substrate for attachment, through the pedal gape [4]. Figure 1-1 shows the ventral side of a

zebra mussel sticking to the wall of an aquarium. The image displays the ‘foot’ extending from

the pedal gape as well as a mature byssus mediating underwater attachment. When the mussel

chooses to move or get rid of a damaged byssus, it secretes enzymes that break down byssal

material at the proximal end near the foot and then discards the byssus [1].

The closely related freshwater quagga mussel, Dreissena bugensis, is also a source of

biofouling in the North American Great Lakes and waterways and also utilizes a byssus to

mediate adhesion [5]. It can be distinguished from the zebra mussel based on several physical

characteristics including size, shell shape, ventral angle and position of the pedal gape [1]. Most

prominently, the bottom surface of the quagga mussel shell is convex whereas in zebra mussels,

it is concave or flat. Therefore, when placed on a flat surface, the zebra mussel stays upright but

the quagga mussel topples or tilts to a side [1].


Figure 1-1. Underwater attachment of a zebra mussel to an aquarium wall by means of its

proteinaceous byssus, secreted by the ‘foot’.

1.1.2 The zebra mussel byssus

The zebra mussel byssus is the proteinaceous apparatus employed by the mussel to anchor to a

variety of substrates underwater [1]. A schematic of the byssus is illustrated in Figure 1-2A,

depicting a number of threads with adhesive plaques at the tips. The byssus (~ 10 mm long from

stem to plaque) [6] consists of up to 600 threads that bundle together at the proximal end into a

stem that arises from an attachment point in the foot, called the root [1]. The attachment of zebra

mussel threads to the stem as a bundle differs from the attachment in marine mussels, where

the threads attach along the sides of the stem [6], [7]. Each zebra mussel byssal thread (~ 20 – 50

µm wide) has an interior composed of longitudinal fibers, an exterior cuticle and an adhesive


plaque (~ 1 mm wide) at the distal tip that has a fibrous morphology and can be either dense or

porous [6]. The ultrastructure of the plaque-substrate interface reveals an approximately 10-20

nm adhesive layer that makes direct and continuous contact with the surface and is left behind on

the substrate when the bulk plaque matrix is pulled off [6]. This layer also shows greater electron

density during transmission electron microscopy as compared to the rest of the plaque [6].


Figure 1-2. Illustration of the structure of the zebra mussel byssus and its adhesive plaque-

substrate interface, adapted with permission [6]. (A) Schematic of the macrostructure of the

byssus. (B) Transmission Electron Microscopy image of the plaque-substrate interface depicting

a 10-20 nm interfacial adhesive layer.

While the zebra mussel and closely related quagga mussel are two of the few freshwater mussel

species known to possess a byssus, marine mussels such as Mytilus edulis, M. californianus, M.

galloprovincialis, Perna canaliculus and Brachidontes exustus are more commonly known to

possess byssi. Interestingly, the freshwater and marine mussels have evolved independently of


each other, as different subclasses [3], [8], to develop a byssus that is superficially very similar;

however, the overall composition and distribution of amino acids within the byssus varies

between the species [9], [10]. One of the most notable similarities between the dreissenid and

mytilid species is that both their byssi are composed of proteins containing the rare amino acid

3,4-dihydroxyphenylalanine (DOPA). The fact that the species have evolved independently to

incorporate this same residue indicates that DOPA must play an important role in mussel

adhesion [10]. DOPA is a post-translational modification of tyrosine formed by catechol oxidase

mediated hydroxylation of tyrosine [9]. In the marine mussel byssus, DOPA is known to undergo

a number of different kinds of interactions and is responsible for both adhesive and cohesive

functions in the byssus (Figure 1-3). In its native form DOPA can bind to metal oxide surfaces

and thereby mediate mussel adhesion to surfaces [11]. Additionally, as demonstrated in marine

mussel proteins [12], DOPA can form bis- or tris-complexes with iron(III) ions and thus

form metal-mediated cohesive cross-links among DOPA-containing proteins (Figure 1-3).

Such interactions lead to the hardness and cohesive strength of the cuticle [13] and can also

explain a method of adhesion to iron-containing surfaces [14]. Thirdly, DOPA is frequently

oxidized to DOPA quinone by the enzyme catechol oxidase and under basic pH conditions

(Figure 1-3). DOPA quinone can then undergo cross-linking with other DOPA and lysine

residues [15] and can form cysteinyldopa adducts with cysteine residues [16] thereby leading to

extensive covalent cross-linking and cohesion among byssal proteins [9].

Figure 1-3. Illustration of possible DOPA-mediated interactions in the mussel byssus, adapted

from [17].

1.1.3 Byssal composition in marine mussels

The marine mussels have been studied much more extensively than zebra mussels and hence, a

majority of the information available on byssal protein compositions has been derived from these

saltwater species, especially from the genus Mytilus [18]. In the marine mussel byssus, the

thread has a protein composition very distinct from that of the plaque. While the majority of the DOPA-

containing proteins are present in the plaque, the thread is composed mainly of collagen-like

load-bearing proteins called preCOLS (prepepsinized collagens). The DOPA-containing proteins

in the byssus are often heavily cross-linked and render the mature structure quite resistant to

extraction. Hence, these are frequently first identified as precursor proteins in the secretory

organ, the foot, and are then subsequently studied in the byssus. Since a majority of the proteins

are isolated in the foot, these are named foot proteins (‘fp’), preceded by the first letters of the

genus and species from which they were isolated and succeeded by a numerical identifier. For

example, Mefp-1 and Mcfp-1 are DOPA-containing proteins in Mytilus edulis and Mytilus

californianus, respectively. While the same numerical identifier is generally indicative of similar

protein sequences, byssal functions and byssal distributions within the mytilid species, the

identifiers do not correlate directly to freshwater mussel proteins.
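
The naming convention described above can be sketched in a few lines of Python. This is only an illustration: the function name `foot_protein_name` and its parameters are assumptions introduced here, not part of any published tool. The sketch joins the first letters of the genus and species with 'fp' and the numerical identifier.

```python
def foot_protein_name(genus: str, species: str, identifier: int) -> str:
    """Build a foot protein name such as 'Mefp-1' (hypothetical helper)."""
    # First letter of the genus (upper case) plus first letter of the species (lower case)
    prefix = genus[0].upper() + species[0].lower()
    # 'fp' stands for foot protein; the number follows after a hyphen
    return f"{prefix}fp-{identifier}"


print(foot_protein_name("Mytilus", "edulis", 1))
print(foot_protein_name("Mytilus", "californianus", 1))
```

As the text notes, a shared numerical identifier only suggests similarity among the mytilid species; the same helper applied to a freshwater species would produce a name whose number carries no such implication.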

The distribution of proteins within the mussel byssus is often very closely related to their byssal

roles. Figure 1-4 illustrates the distribution of DOPA-containing and collagenous proteins within

the byssus of M. edulis [19]. As mentioned earlier, most of the DOPA-containing proteins are

present in the plaque where they can play different roles: as adhesive between plaque and

substrate, as structural matrix cohesion proteins, as linker proteins between thread and plaque

and as protective proteins in the cuticle [18]. Thus, within the plaque itself, the foot proteins can

be distributed in the footprint (mfp-3, 5, 6), within the plaque foam (mfp-2, 4) or in the plaque

cuticle (mfp-1) [18] (Figure 1-4). Table 1-1 summarizes the byssal distribution and functions of

the six marine mussel foot proteins (mfp’s) identified thus far. While homologs of each protein

have been identified in different mussel species, Table 1-1 summarizes just one homolog

belonging to the M. edulis (Me), M. californianus (Mc) or M. galloprovincialis (Mg) species. In

order to better correlate the byssal protein distributions and functions with the sequence

properties of the protein, Table 1-1 also describes the molecular weight (MW), isoelectric point

(pI), DOPA content and prominent amino acid compositions within each of the foot proteins.

Thus, the adhesive proteins, mfp-3 and mfp-5 have the highest DOPA content (> 20%), followed

by the cuticle protein mfp-1 which has approximately 13% DOPA and then followed by the

structural bulk matrix proteins which have < 5% DOPA [20]. Additionally, the plaque footprint

proteins are generally smaller than the rest of the byssal proteins (Table 1-1) [20].

The proteins of the thread, the preCOLS and the thread matrix proteins (TMPs), are also

described in Table 1-1. PreCOLS are composed of a bent collagen core with variable flanking

sequences and histidine rich sequences at the termini [21]. At the distal end near the plaque,

preCOL-D with silk fibroin-like flanking sequences provides the mechanical stiffness needed at

the substrate [22]. At the proximal end near the mussel, preCOL-P with elastin-like flanking

sequences provides the mechanical flexibility needed near the foot tissue [23]. This gradient

distribution of preCOLS thus allows optimal substrate adhesion without damaging the foot tissue

[9]. A non-gradient preCOL called preCOL-NG, believed to mediate the preCOL-P/D fibers, has

glycine rich flanking sequences and is distributed uniformly through the thread [24].

Additionally, thread matrix proteins (TMPs) have been identified in the thread that provide a

viscoelastic matrix to separate and possibly lubricate collagenous microfibrils during tension-

induced deformation [25].

Figure 1-4. Spatial distribution of byssal proteins in the marine mussel Mytilus edulis;

reproduced with permission [19].

Table 1-1. Summary of the location, function, prominent amino acid content and sequence properties

of known marine mussel byssal proteins. Each protein is summarized from one of three marine

mussel species: Mytilus edulis (Me), Mytilus californianus (Mc) and Mytilus galloprovincialis (Mg).

(Phospho) and (hydroxy) indicate phosphoserine and hydroxyarginine modifications, respectively.

| Marine mussel protein | Byssal distribution | Byssal function | Prominent amino acids (mol%): 1st | 2nd | 3rd | DOPA [26] | MW (kDa), pI | Species [Ref.] |
|---|---|---|---|---|---|---|---|---|
| Mfp-1 | Cuticle on thread and plaque | Protects byssal core | P (24) | K (20) | Y (19) | ~13 | ~110 kDa, pI >10 | Me [27] |
| Mfp-2 | Plaque foam | Cohesion, load spreading | C (15) | G (14) | K (12) | ~3 | ~40 kDa, pI ~9 | Me [28] |
| Mfp-3 | Plaque footprint | Adhesion | G (20-25) | Y (20-23) | R (16-21) (hydroxy) | ~20 | ~6 kDa, pI >11 | Me [29] |
| Mfp-4 | Plaque foam at thread anchor zone | Links plaque proteins to thread collagens | H (24) | V (13) | N/D (11) | ~5 | ~80 kDa, pI 10.2 | Mc [30] |
| Mfp-5 | Plaque footprint | Adhesion | G (21) | K (20) | S (11) (phospho) | ~30 | ~9.5 kDa, pI ~9 | Me [31] |
| Mfp-6 | Plaque footprint | Restores DOPA adhesion, mediates crosslinks [16] | Y (20) | G (14), N/D (14) | C (11) | ~4 | ~11 kDa, pI 9.5 | Mc [32] |
| Tmp-1 | Thread matrix | Separates collagen fibrils | G (33) | Y (18) | N/D (17) | 3 | 57 kDa, pI 9.5 | Mg [25] |
| PreCOL-P | Proximal thread | Load-bearing, elastic | G (39) | P (14) | A (9) | < 1 | 76 kDa, pI 11.6 | Me [23] |
| PreCOL-D | Distal thread | Load-bearing, stiff | G (36) | A (18) | P (13) | < 1 | 78 kDa, pI 10.1 | Me [22] |
| PreCOL-NG | Throughout thread | Load bearing | G (39) | A (15) | P (11) | < 1 | 76 kDa, pI 8.0 | Me [24] |

1.1.4 Byssal composition in zebra mussels

There are several major differences in the byssal composition of the marine mussel and zebra

mussel. A major distinction is that the marine mussel has distinct amino acid compositions

between the thread and plaque whereas the zebra mussel byssus contains similar amino acid

compositions between the byssal regions (maximum difference is 2.4 mol%) [10]. The zebra

mussel thread and plaque also have similar DOPA compositions (~0.6 mol%) and both lack

hydroxyproline which is a significant characteristic of collagenous proteins [10]. Thus, while the

marine mussel plaque and thread proteins can be classified as being DOPA-containing and

collagen-like respectively, the zebra mussel byssus contains DOPA-containing proteins all the

way through, though this total DOPA content is lower than in marine mussels [10]. Another

distinction from the marine mussel byssus is the glycosylation of serine and predominantly

threonine, specifically by O-linked galactosamines [33]. Such glycosylation is not seen in the

marine mussels [20].

While zebra mussels do not display the spatial differences in overall byssal protein compositions

seen in marine mussels, they do maintain spatial control over DOPA oxidation and individual

protein distribution. Upon chemical maturation (within the first 24 hours of thread deposition),

loss of DOPA staining is clearly observed within the thread and plaque bulk matrix but not at the

plaque-substrate interface [10]. Thus, even as DOPA residues in the bulk matrix are oxidized and/or

undergo other non-covalent interactions over time, unoxidized DOPA residues are maintained at

the interface layer responsible for adhesion [10], [17]. One way that the mussel maintains this

spatial control is by controlling the levels of catechol oxidase between different regions of the

byssus [17]. The presence of higher levels of catechol oxidase in the thread and plaque interior

versus at the plaque-substrate interface allows for greater cohesive cross-linking in the bulk matrix

versus greater adhesion by uncrosslinked DOPA at the interface [17] (Figure 1-3). Further,

MALDI-TOF mass spectrometry of the thread, plaque and plaque adhesive interface has

revealed spatial differences in the distribution of individual protein peaks between the byssal

regions [34]. A higher electron density of the 10 – 20 nm adhesive layer in comparison with the

bulk plaque matrix additionally suggests a distinct composition of this layer [6].

1.1.5 Protein identification in the zebra mussel byssus

While much information is available on the protein composition of the marine mussel byssus,

such information is limited in the zebra mussel. Extensive DOPA cross-linking renders the

mature zebra mussel byssus highly resistant to biochemical characterization by techniques such

as solubilisation and immunohistochemical labelling [10], [33], [35]. While in marine mussels,

extractions from the foot and the byssus can be done with acidic buffers that are preferred for

easily oxidized DOPA proteins, the zebra mussel DOPA containing proteins are only extractible

in alkaline borate/urea buffers [33]. Using this basic extraction of proteins from the mussel foot,

Rzepecki and Waite (1993) identified three protein bands that stained for DOPA on sodium

dodecyl sulfate and acetic acid-urea polyacrylamide gels (SDS-PAGE and AU-PAGE) [33].

These proteins were assumed to be byssal proteins based on their ability to stain for DOPA and

were called Dreissena polymorpha foot proteins: Dpfp1, Dpfp2 and Dpfp3 [33]. While the

extraction of proteins from the mussel foot allowed the useful identification of precursor byssal

proteins, it does not necessarily confirm that the identified proteins are present in the byssus and

does not provide any information on the distribution of these proteins between different regions

of the byssus [33]. In the same study, the presence of Dpfp1 and Dpfp2 in the byssus was thus

confirmed by acidic extraction from mature threads and electrophoresis on an AU-PAGE gel

[33]. Additionally, the analysis of separated thread and plaque extracts on this gel revealed

potential Dpfp1 bands in both the thread and plaque and a potential Dpfp2 band uniquely in the

thread; however, broad smearing of the proteins did not allow any firm conclusions on their

localizations [33].

Of the three zebra mussel byssal proteins identified thus far, Dpfp1 has been best characterized.

Table 1-2 summarizes the molecular weight, pI, DOPA content and sequence information

available for Dpfp1, 2 and 3. Dpfp1, with an unusual acidic pI that makes it distinct from marine

mussel byssal proteins (Table 1-1), appears to have two protein forms which run at 76 and 65

kDa on a PAGE gel, possibly representing sequence variants or variants with differing

hydroxylation and glycosylation modifications [33]. The presence of two variants has also been

confirmed by MALDI-TOF mass spectrometry which provided more accurate MWs at 54.5 and

48.6 kDa respectively [35]. Dpfp1 and Dpfp2 (26 kDa, basic pI) have somewhat similar

maximum DOPA contents of 6.6 and 7 mol% respectively and are more easily resolved and

isolated than Dpfp3, which runs as a cluster of small proteins in the range of 12 – 13 kDa [33].

Dpfp1 is the only zebra mussel byssal protein for which the full primary sequence is known [35].

Due to the protease resistance of the protein, the primary structure of Dpfp1 was determined by

reverse transcription of mRNA from the mussel foot into cDNA, followed by deduction of the

primary sequence from overlapping cDNA sequences [35]. With Dpfp2 on the other hand, only

fragments of the protein sequence have been determined using Edman degradation analysis [33].

This analysis classified the Dpfp2 peptide fragments somewhat arbitrarily into repeats of two

motifs; K(K/T)Y(X/P)E and *Y(P/X)*(Y/K)*D where * represents a variable amino acid, Y

represents DOPA and X represents a glycosylation of serine or threonine [33].

Table 1-2. Zebra mussel foot proteins Dpfp1 - 3

| Dreissena polymorpha foot protein | MW by Gel Electrophoresis (kDa) [33] | MW by MALDI-TOF (kDa) [35] | Maximum DOPA Content [33] | pI [33] | Sequence Information Available |
|---|---|---|---|---|---|
| Dpfp-1 | 76 and 65 | 54.5 and 48.6 | 6.6% | 5.3 - 6.5 | Primary sequence [35] |
| Dpfp-2 | 26 | Unknown | 7% | > 9 | Peptide fragment sequences [33] |
| Dpfp-3 | 12-13 | Unknown | Unknown | Unknown | None |

As depicted in Figure 1-5, the deduced sequence of Dpfp1 is dominated by tandem repeats of

two consensus sequences: 22 repeats of a somewhat variable heptapeptide

P(V/E)YP(T/S)(K/Q)X at the N-terminus and 15 highly conserved repeats of a tridecapeptide

KPGPYDYDGPYDK at the C-terminus [35]. While galactosamine modifications to serine and

threonine residues are more extensive in the N-terminus, tyrosine hydroxylations to DOPA are

more frequently observed in the C-terminus [35]. Within the C-terminal repeat, the DOPA (Y)

modification is observed specifically on the first tyrosine in the KPGPYDYDGPYDK repeat and

makes up 40% of the protein’s tyrosine content [33], [35]. Additionally, the N-terminus of Dpfp1

is moderately basic (pI 8.7) and its C-terminus is quite acidic (pI 4.7). Thus, Dpfp1 has a block

copolymer-like structure consisting of distinct post-translational modifications, repeat patterns and

isoelectric points between the N- and C-termini, suggesting that the segregated motifs might

play a specific role in the architecture and mechanism of assembly of the protein [35]. The

deduced Dpfp1 sequence has additionally been used to develop a recombinant version of the

protein to produce a Dpfp1 antibody in rabbits, for use in immunolocalization analysis [4].

Though Dpfp1 could not be localized within the mature byssal thread (likely due to masking of

epitopes), it was identified in acid-extracted and homogenized byssal threads and in the foot

tissue specifically (not in other control tissues) thus indicating that it is a precursor protein

produced, stored and secreted in the mussel foot [4]. Additionally, in the foot, Dpfp1 was found

uniformly localized in the byssal canal in secretory granules surrounding the ventral groove and

therefore might be evenly distributed throughout the byssus, thus suggesting that it might possess a

load-bearing structural role within the byssus [4].

MFSVVSFCLLAAGFGSSLGGSSDWTEKTSQSTIPTISGWSFFTTKSPLNPTLFTTKR

PEYVTLS PVYPTKI PNYTTKP PVYPTKV PEYPTKD PTYPTFKT PEYPTKV PEYPTKV

PTYPTFQT PEYPTPTKY PVYPSQS PAYPTQY PEYPSQY PVYPDQY PVYPNQY PVKQDHD

PVYPPRS PLYGWRR PVYPKKT PVYPYL PLYPGYQ PEYHRRP PVYP PVYPY DPVEDK

KPGPYDYDGPYDK NPGPYDYDGPYNK KPNPYGTDWQYDK KTGPYVPIKPDDK

KPNPYGTDWQYDK KTGPYVPDKSEDK KPGPYDYDGPYDK NPGPYDSDGPYNK

KPGPYDYDGPYDK NPGPYDYNGPYDK KPGPYDYDGPYDI KPGPYDYDVPYDK

KPDPYDTDGPYDK KTGPYVPDKPDDK KTDPYVPDVPLEP PGPLGK

Figure 1-5. Pattern of tandem consensus repeats in the primary sequence of foot-derived Dpfp-1

(AF265353.1). Consensus repeat sequences are underlined. The 22 N-terminal repeats of

P(V/E)YP(T/S)(K/Q)X are italicized and the 15 highly conserved C-terminal repeats of

KPGPYDYDGPYDK are bolded [35]. The N-terminal signal peptide is bolded and italicized.
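
How closely each C-terminal block follows the tridecapeptide consensus can be checked with a short Python sketch. It assumes the 15 whitespace-delimited 13-residue blocks exactly as they are printed in Figure 1-5 above (the trailing PGPLGK fragment is left out), and the helper name `mismatches` is introduced here for illustration only. It counts, for each block, the positions that differ from KPGPYDYDGPYDK.

```python
CONSENSUS = "KPGPYDYDGPYDK"

# The 15 C-terminal 13-residue blocks, transcribed from Figure 1-5 as printed above.
C_TERMINAL_BLOCKS = """
KPGPYDYDGPYDK NPGPYDYDGPYNK KPNPYGTDWQYDK KTGPYVPIKPDDK
KPNPYGTDWQYDK KTGPYVPDKSEDK KPGPYDYDGPYDK NPGPYDSDGPYNK
KPGPYDYDGPYDK NPGPYDYNGPYDK KPGPYDYDGPYDI KPGPYDYDVPYDK
KPDPYDTDGPYDK KTGPYVPDKPDDK KTDPYVPDVPLEP
""".split()


def mismatches(block: str, consensus: str = CONSENSUS) -> int:
    """Count positions where an equal-length block differs from the consensus."""
    if len(block) != len(consensus):
        raise ValueError("block and consensus must have the same length")
    return sum(a != b for a, b in zip(block, consensus))


# One mismatch count per block, in the order the blocks appear in the figure
mismatch_counts = [mismatches(block) for block in C_TERMINAL_BLOCKS]
exact_matches = mismatch_counts.count(0)

for block, count in zip(C_TERMINAL_BLOCKS, mismatch_counts):
    print(block, count)
print("blocks identical to the consensus:", exact_matches)
```

A position-by-position count is used rather than a sequence alignment because the blocks in the figure are already of equal length; a real analysis of the GenBank record would likely need alignment to handle insertions and deletions.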

Complementing the identification of comparatively high molecular weight byssal proteins by gel

electrophoresis, MALDI-TOF MS analysis of the zebra mussel byssus has revealed the presence

of several low molecular weight byssal proteins ranging from 3.7 to 7 kDa [34]. While this

analysis does not characterize individual proteins, it identifies distinctive differences in the

distribution of byssal proteins between thread, plaque and plaque footprint [34]. Interestingly, in

spite of similar amino acid compositions between the plaque and thread, the thread and plaque

bulk have mass spectra that are almost completely non-overlapping except for peaks at 4.5 and

4.6 kDa [34]. Also, as supported by differences in electron density between the plaque bulk and

plaque-substrate interface [6], the plaque footprint has proteins in the 5.8 to 7 kDa range that are

absent in the plaque bulk. Interestingly however, some of these interface protein spectra also

overlap with the thread bulk spectra [34]. The presence of these proteins in the plaque footprint

and thread bulk but not in the plaque bulk thus indicates a high level of spatial control in the

byssus [34]. Significantly as well, there are a number of protein spectra that are identified

uniquely in the plaque footprint including a peak at 5892 Da and another at 6399 Da that has a

hydroxylated counterpart which likely contains a DOPA modification [34]. These could very

well represent byssal proteins with adhesive functions as witnessed with the low molecular

weight proteins in marine mussels (Table 1-1) [20], [34]. A comparison of overall protein

secondary structure between thread, plaque and plaque interface additionally revealed the

predominance of β-sheets in all three regions [14].
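
The idea of a peak with a hydroxylated counterpart can be illustrated with a small Python sketch: hydroxylating tyrosine to DOPA adds one oxygen atom (about 16 Da), so candidate pairs are peaks separated by roughly that mass. The function name, the 1 Da tolerance and the 6415 Da value are assumptions made for this example; the source gives only the 5892 and 6399 Da peaks and does not state the mass of the counterpart.

```python
OXYGEN_MASS_DA = 15.995  # approximate mass added by one hydroxylation (one oxygen atom)


def hydroxylation_pairs(peaks, tolerance_da=1.0):
    """Return (lighter, heavier) peak pairs whose spacing matches one added oxygen."""
    ordered = sorted(peaks)
    pairs = []
    for i, low in enumerate(ordered):
        for high in ordered[i + 1:]:
            # Keep the pair if the mass difference is within tolerance of one oxygen
            if abs((high - low) - OXYGEN_MASS_DA) <= tolerance_da:
                pairs.append((low, high))
    return pairs


# 5892 and 6399 Da are the footprint peaks quoted above; 6415 Da is a
# hypothetical value standing in for the hydroxylated counterpart.
example_peaks = [5892.0, 6399.0, 6415.0]
print(hydroxylation_pairs(example_peaks))
```

A mass shift of this size is consistent with hydroxylation but does not prove it, which matches the cautious wording ("likely contains a DOPA modification") used in the text.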

1.1.5 Comparison of zebra mussel adhesion with adhesion in other species

In addition to the marine and freshwater mussels, underwater adhesion is a characteristic of

several aquatic species including sandcastle worms [36], barnacles [37], starfish [38], sea

cucumbers [39] and caddisfly larvae [40]. Comparisons of protein compositions between species

can thus be useful in determining common prerequisites of water-resistant adhesion. The

sandcastle worm, Phragmatopoma californica, which builds its habitat by sticking sand grains

together underwater, has also evolved independently of the mussels (as a different phylum) to

incorporate DOPA residues in its cement proteins [41]. In the sandcastle worms however,

adhesion/cohesion from DOPA-containing proteins is complemented by adhesion from proteins

significantly rich in phosphorylated serine [41]. Interestingly as well, unlike marine mussels

which generally possess basic adhesive/cohesive proteins, the sandcastle worm has a mix of

acidic and basic cement proteins [42] which makes it comparable to the zebra mussel byssal

mixture (albeit currently incomplete) of acidic and basic proteins.

One of the most significant similarities between the zebra mussel byssal protein Dpfp1 and

adhesive proteins from other species is the presence of prominent tandem repeats in the protein

primary sequences. Table 1-3 displays a list of consensus sequences that are characteristic of

different byssal proteins from different marine mussel species [20] and from some less-studied

freshwater mussel species including Limnoperna fortunei [43] and the quagga mussel Dreissena

bugensis [5]. Numbers in brackets beside the sequence indicate the number of consensus repeats

in the protein. Table 1-3 also describes consensus sequences belonging to adhesives from the

sandcastle worm and the liver fluke Fasciola hepatica [44]. Adhesive proteins from the vitellaria

of the liver fluke are responsible for egg shell hardening and also contain significant amounts of

DOPA in their sequences [44]. Tandem repeats are also seen in spider silk proteins, such as those

based on fibroin sequences, to impart mechanical strength to the spider web, thus further

underscoring the importance of such repeats in load-bearing functions [45].

Table 1-3. Tandem repeat sequences from marine mussel, freshwater mussel and other aquatic

species. X represents variable residues, Y represents DOPA and P represents (di)hydroxyproline

modifications.

| Species | Protein | Consensus sequence (# of repeats) | Reference |
|---|---|---|---|
| Marine mussels | | | |
| Mytilus edulis | Mefp-1 | AKPSYPPTYK (80) | [27], [20] |
| | Mefp-2 | Epidermal Growth Factor (EGF) motifs (11) | [28], [20] |
| | Mefp-3 | (R/N)RY (4) | [29], [20] |
| | Mefp-5 | YK (8) | [31], [20] |
| | PreCOL-P | Collagen flanked with elastic domains | [21] |
| | PreCOL-D | Collagen flanked with silk fibroin like domains | [21] |
| | PreCOL-NG | Collagen flanked with glycine rich cell wall protein like domains | [21] |
| Mytilus californianus | Mcfp-1 | PKISYPPTYK | [46] |
| | Mcfp-4 | HVHTHRVLHK (36); DDHVNDIAQTA (16) | [30] |
| | Mcfp-6 | No major repeats | [32] |
| Mytilus galloprovincialis | Tmp-1 | GYG | [25] |
| Perna canaliculus | Pcfp-1 | PYVK | [47] |
| Aulacomya ater | | AGYGGXK | [48] |
| Brachidontes exustus | | GKPSPYDPGYK | [49] |
| Freshwater mussels | | | |
| Dreissena bugensis | Dbfp1 | DKYPGGGN | [5] |
| Dreissena polymorpha | Dpfp1 | PVYPTKX (22), KPGPYDYDGPYDK (15) | [35] |
| | Dpfp2 | K(K/T)Y(X/P)E, XY(P/X)X(Y/K)XD | [33] |
| Limnoperna fortunei | | KPTQYSDEYK | [43] |
| Other aquatic species | | | |
| Phragmatopoma californica (Sandcastle worm) | Pc-1 | VGGYGYGKK | [42] |
| | Pc-2 | HPAVXHKALGGYG | [42] |
| | Pc-3 | [S]nY where S is often phosphorylated | [42] |
| Fasciola hepatica (Liver fluke) | | GGGYGGYGK | [44] |

1.1.6 Comparison of zebra mussel and quagga mussel byssal proteins

The freshwater quagga mussel byssus has also previously been characterized for protein

composition. Similar to zebra mussels, extraction of precursor proteins from the mussel foot and

staining them for DOPA led to the identification of four quagga mussel byssal proteins

(Dreissena bugensis foot proteins or Dbfps) called Dbfp0, 1, 2 and 3 [33]. Three of these, Dbfp1

(80 and 69 kDa), Dbfp2 (22 kDa) and Dbfp3 (12-13 kDa) have molecular weights similar to the

three identified Dpfp proteins (Table 1-2) and could therefore represent homologs of the zebra

mussel byssal proteins [33]. Mass spectrometry analysis reveals a single Dbfp1 peak at 68 kDa

as compared to Dpfp1 peaks at 48.6 and 54.5 kDa. The Dbfp0 protein with a molecular weight

greater than 200 kDa was however not identified as a homolog in the zebra mussel extract.

Additionally, biochemical characterization of Dbfp1 revealed that, like Dpfp1, it is also a

tandemly repetitive, acidic DOPA-containing protein containing N-acetylgalactosamine

glycosylations O-linked to threonine residues, but it has a much lower DOPA content of 0.55 ±

0.35 mol% compared to the 6.6% DOPA seen in Dpfp1 [5]. N-terminal sequencing of pepsin

degraded peptide digests of Dbfp1 revealed a primary sequence partly composed of unique

octapeptide consensus repeats of DKTPGGGN [5] that differ from the repeats seen in Dpfp1

[33], [35] (Table 1-3). This differs from marine mussels, which generally have much

sequence homology between byssal proteins from congeneric species [5]. Additionally, while

both Dpfp1 and Dbfp1 have high contents of Asx, Pro, Gly, Tyr and Lys, Dbfp1 has

approximately 1.8 times less proline, about 3.5 times more glycine and about 1.7 times more

tyrosine than Dpfp1 thus indicating prominent differences between the proteins [5], [33]. Similar

to zebra mussels, overall amino acid analysis of quagga mussel thread and plaques reveals

similar DOPA contents of 0.10 and 0.12 mol% respectively [5]. These values however are less

than the DOPA content of approximately 0.6 mol% seen in zebra mussel thread and plaques,

thus indicating potentially comparable differences in the DOPA dependence of their proteins [5].

Overall, comparisons of zebra mussel adhesion with adhesion in other freshwater mussels can

give additional useful insights into sequence features that are important to its adhesion/cohesion.
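
The size of the DOPA differences quoted in this section can be made explicit with a short worked calculation in Python. The helper name `fold_difference` is introduced here for illustration, and the inputs are the values given in the text (the Dbfp1 figure is the reported mean of 0.55 ± 0.35 mol%, so the last ratio carries considerable uncertainty). On these numbers the zebra mussel byssus contains roughly five to six times more DOPA than the quagga mussel byssus, and Dpfp1 about twelve times more than Dbfp1.

```python
def fold_difference(reference: float, other: float) -> float:
    """Return how many times larger `reference` is than `other`."""
    return reference / other


# DOPA contents (mol%) as quoted in the text above
zebra_byssus_dopa = 0.6     # zebra mussel thread and plaque (approximate)
quagga_thread_dopa = 0.10   # quagga mussel thread
quagga_plaque_dopa = 0.12   # quagga mussel plaque
dpfp1_dopa = 6.6            # zebra mussel Dpfp1 (maximum)
dbfp1_dopa = 0.55           # quagga mussel Dbfp1 (mean; reported as 0.55 ± 0.35)

print(round(fold_difference(zebra_byssus_dopa, quagga_thread_dopa), 1))
print(round(fold_difference(zebra_byssus_dopa, quagga_plaque_dopa), 1))
print(round(fold_difference(dpfp1_dopa, dbfp1_dopa), 1))
```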

1.2 Motivation

There are two long-term motivations for studying zebra mussel byssal adhesion. Firstly, since the

mussels are able to adhere strongly to a variety of substrates underwater, they are an inspiration

for the development of biological adhesives that are water-resistant, biocompatible and have high

adhesive strength. A second motivation that is more specific to zebra mussels is the development

of anti-fouling strategies targeted specifically at this biofouling species.

1.2.1 Mussel inspired bioadhesives

Firstly, mussel adhesive proteins are an inspiration for the development of biological adhesives

for medical and dental applications such as cell and tissue adhesives and as surgical glues for

sealing soft and hard tissues [26]. Mussel adhesive proteins can adhere to a variety of organic

and inorganic substrates underwater including glass, plastic, metal and Teflon [26], and even to living

materials such as mammalian cells and porcine skin [50]. In its native oxidation state, a single

DOPA molecule can mediate very strong, reversible and non-covalent bonding to titanium oxide

surfaces at a bond strength of several hundred piconewtons [11]. Additionally, in the mussel

byssus, mussel adhesive proteins are able to maintain adhesion/cohesion forces across surfaces

with different elastic moduli such as between the soft plaque and hard substrate [20]; a property

that would be very useful for sticking together hard and soft tissues such as gum and tooth [26].

Importantly, the mussel adhesive proteins are unlikely to provoke an immune response [50].

Taking all these factors into account, mussel inspired bioadhesives are promising as strong,

water-resistant and biocompatible adhesives with flexible/elastic adhesion [26]. Other adhesives

currently used in medicine include fibrins that are biocompatible, show rapid curing and have

high adhesion strength to tissues but have low cohesion strength, thereby limiting their

applications [51]. Cyanoacrylates provide strong adhesion and fast set times but they also have

high stiffness and release toxic byproducts including cyanoacetate and formaldehyde that can

cause acute inflammation in the tissue [52]. Other synthetic adhesives are limited in that they are

generally poor at water displacement [26].

In the past, most of the research on mussel adhesive proteins has focused on marine mussels

[26]. In fact, commercial applications of marine mussel glues are already in use. Kollodis Inc.

sells recombinant mussel foot proteins whereas Cell-Tak prepares acetic acid suspensions of

mfp-1, 2 and 3 for promotion of in vitro cell attachment [53]. Since the freshwater zebra mussels

have evolved independently of the marine mussels, occupy different habitats, have different

byssal compositions and are able to maintain adhesion/cohesion with a much lower DOPA

content than marine mussels, they can give further insights into conserved protein properties

required for adhesion and can provide an alternative adhesive mechanism that can be mimicked

in the development of bioadhesives.

1.2.2 Targeted anti-fouling strategies

As discussed previously, zebra mussels are a major invasive, biofouling species in North

American water bodies that have had major economic impacts on water-based industries and that

have also negatively impacted local ecosystems in the lakes and waterways [1]. Their large

populations, the absence of predators, lack of competition, the high mobility of their larvae and

their ability to adhere strongly to a variety of substrates make the problem even more acute,

causing economic impacts that have been far in excess of $100 million [2]. Therefore, a second

long-term rationale for studying zebra mussel adhesion is to develop anti-fouling strategies that

are affordable, non-toxic to the ecosystem and are targeted specifically at the zebra mussels.

Strategies used may include proactive strategies targeted at the veliger stage to prevent

settlement and reactive strategies that would target attached mussels [54]. Zebra mussels have

high tolerance to current chemical control strategies which include oxidizing chemicals such as

chlorine, chlorine dioxide, ozone, bromine and potassium permanganate and non-oxidizing

chemicals such as potassium chloride, most of which are often toxic to other species in the local

ecosystem as well [54]. Thus, it is important to learn more about the molecular basis of zebra

mussel adhesion such that anti-fouling agents can be developed that specifically target this

adhesive mechanism. The insights from such studies will help develop strategies that include

designing surface coatings to minimize byssal attachment, preventing byssal secretion from the

mussel's foot itself or perhaps disrupting cohesive interactions within the byssus [54].

1.3 Objectives

The overall goal of this research is to gain a better understanding of the molecular basis of

adhesion in zebra mussels so that in the future, this knowledge can be implemented in the design

of alternative mussel-inspired bioadhesives and targeted anti-fouling strategies. In order to

understand mussel adhesion, there is a need to first characterize the proteins that

constitute the byssus and identify the proteins that are responsible for its cohesive strength within

the thread and plaque and adhesive strength at the thread-plaque interface. However, owing to

extensive DOPA cross-linking within its mature structure, the zebra mussel byssus has

stubbornly evaded characterization thus far. There are thus major gaps in our understanding of

the protein composition of the zebra mussel byssus that must first be addressed before we can

proceed towards realizing our long-term aims.

The only zebra mussel byssal proteins known thus far have been identified as DOPA-staining

precursors in the mussel's foot; however, this reveals no information on byssal distribution and

completely overlooks any DOPA-poor or DOPA-lacking proteins in the byssus. Additionally, the

lower DOPA content of the zebra mussel byssus as compared to the marine mussel byssus

indicates that, in addition to DOPA-based interactions, other DOPA-independent protein

interactions must also play an important role in zebra mussel byssal adhesion/cohesion and may

contribute to the mussel's ability to adhere to varied substrates, both hydrophobic and

hydrophilic.

Hence, the primary objective of this work is to identify and sequence novel proteins in the zebra

mussel byssus (both DOPA-rich and DOPA-deficient) and to determine their distribution

between different regions of the byssus. This information will allow us to identify protein

features and sequence motifs that are characteristic of zebra mussel byssal proteins and will set

the stage for characterization of the adhesive/cohesive mechanisms of byssal proteins in the

future.

In order to achieve this objective, we perform our analysis on induced, freshly secreted byssal

threads that have minimal cross-linking and are more amenable to characterization. We can then

address our primary objective as three sub-objectives:

1. To identify, sequence and determine the spatial distribution of novel proteins in the soluble

byssal extract by performing gel electrophoresis of the thread and plaque extract followed by

tandem mass spectrometry sequencing analysis of digested protein gel bands.

2. To sequence, identify and determine the spatial distribution of novel proteins in the insoluble

byssal matrix by performing tandem mass spectrometry sequencing analysis on the digested

thread and plaque matrix.

3. To compare the primary sequence structures of zebra mussel byssal proteins in order to

identify protein characteristics and sequence motifs that are characteristic of

adhesive/cohesive proteins in the species.

1.4 Overview

This thesis consists of four chapters. Chapter 1 introduces the overall background, motivation

and objectives of the research. Chapter 2 and Chapter 3 represent manuscripts of scientific

papers that are not yet submitted. Objective 1, as described in section 1.3, is addressed in

Chapter 2: ‘Adhesive Mechanisms in Freshwater Zebra Mussels: Identification and Sequence

Analysis of Novel Proteins’. Objective 2 is addressed in Chapter 3: ‘Novel Proteins identified in

the Insoluble Byssal Matrix of the Freshwater Zebra Mussel Dreissena polymorpha’. Objective 3

is addressed through both Chapters 2 and 3. Chapter 4 summarizes the results and discussion

from the previous chapters and also relates these to preliminary studies described in Appendix A

and Appendix B. Chapter 4 additionally addresses future work that must be undertaken to

further extend our understanding of zebra mussel adhesion.

Chapter 2:

Adhesive Mechanisms in Freshwater Zebra

Mussels: Identification and Sequence Analysis of

Novel Proteins

Arpita Gantayet (a), Lily Ohana (a) and Eli D. Sone (a,b,c,*)

a Institute of Biomaterials & Biomedical Engineering, University of Toronto, Toronto, ON,

Canada

b Department of Materials Science & Engineering, University of Toronto, Toronto, ON, Canada

c Faculty of Dentistry, University of Toronto, Toronto, ON, Canada

*Corresponding author. Email: [email protected]

This chapter is in preparation as a manuscript to be submitted to the journal ‘Biofouling’

I, Arpita Gantayet, performed the experiments and analysis and wrote the paper. Lily Ohana

assisted with protocol development for protein extraction, quantification and gel electrophoresis

and edited the paper.

2.1 Abstract

The biofouling freshwater zebra mussels (Dreissena polymorpha) adhere to a variety of

substrates underwater by means of a proteinaceous structure called the byssus, which consists of

a number of threads with adhesive plaques at the tips, and are therefore an inspiration for

developing medical bioadhesives. The byssal proteins however remain largely uncharacterized

due to extensive 3,4-dihydroxyphenylalanine (DOPA) cross-linking which renders the mature

structure largely resistant to extraction and immunolocalization. The functions of these proteins

thus remain a mystery. We report here on the byssal distribution and sequence properties of

novel and previously known byssal proteins. We identified three novel zebra mussel byssal

proteins by performing gel electrophoresis of proteins extracted from freshly secreted threads

and plaques, in which cross-linking is minimized. LC-MS/MS sequencing analysis and cDNA

database matching revealed that the novel Dpfp5 protein is an acidic protein with a block

structure and varied repeat patterns.

Keywords: bioadhesion; plaque; threads; mussel adhesive proteins; DOPA; LC-MS/MS

2.2 Introduction

Zebra mussels (Dreissena polymorpha) are an invasive species that were accidentally introduced

into the North American Great Lakes in the late 1980s [2]. These freshwater bivalves are able to

adhere to a variety of surfaces underwater and are therefore a major source of biofouling. They

are able to spread rapidly by anchoring to boat hulls and have had a huge economic impact on

water dependent industries and a major ecological impact on native ecosystems in the Great

Lakes [2], [1]. The mussels adhere to substrates by secreting a proteinaceous structure called the

byssus that consists of a number of threads with adhesive plaques at the tips and that is

surrounded by an exterior cuticle layer that serves as a protective lacquer [33]. Zebra mussels are

one of a few freshwater mussels known to produce a byssus and have evolved independently of

the much-studied marine mussels, within a different subclass [3], [8]. The zebra mussel and marine

mussel byssi are superficially similar [55] and are both composed of proteins containing the rare

amino acid 3,4-dihydroxyphenylalanine (DOPA), which is produced by the enzymatic

hydroxylation of tyrosine and is responsible for varied adhesive and cohesive interactions in the

byssus [9]. DOPA can form multiple metal-mediated ligations to give cohesive strength to the

cuticle [46]; it can undergo covalent cross-linking with DOPA and other residues to provide

cohesive strength to the thread and plaque [15]; and, in its native form, it can bind to metal

oxide surfaces and mediate surface adhesion [11]. The two mussel species however differ in their

overall protein compositions and amino acid distributions within the byssus and zebra mussels

even have a lower DOPA content than the marine mussels, thus indicating important roles for

other DOPA-independent interactions [9], [10]. Understanding the molecular mechanisms of

adhesion in the zebra mussel byssus may thus ultimately be useful in the design of alternate

water-resistant biological adhesives for medical and dental applications. This knowledge will

also be valuable in the development of non-toxic, targeted anti-fouling strategies against the

biofouling species.

The zebra mussel adhesive layer is characterized by a 10 – 20 nm thick layer at the plaque-

substrate interface that stains differently than the bulk plaque matrix and remains attached to the

substrate even when the plaque is removed [6]. MALDI mass spectrometry analysis revealed a

range of 5.8 – 7 kDa proteins in this layer, however, larger proteins were not detected, likely due

to heavy DOPA cross-linking [34]. MS also revealed that in spite of similar amino acid

compositions between thread and plaque [10], there are differences in protein composition

between the thread and plaque bulk and between the plaque and the plaque-substrate interface

[34]. However, none of these proteins have yet been identified. In marine mussels, the byssal

thread consists of a mixture of three collagenous proteins and the plaque and cuticle contain six

different DOPA-containing adhesive, linker and lacquer proteins

[18]. In zebra mussels, on the other hand, amino acid analysis has revealed that both the thread

and the plaque comprise DOPA containing proteins [10]. However, the composition and

distribution of these proteins remains largely uncharacterized due to extensive DOPA cross-

linking which renders the mature structure largely resistant to extraction and

immunolocalization. The only three precursor proteins (D. polymorpha foot proteins) identified

thus far, Dpfp-1, 2 and 3, were identified as byssal proteins based on their ability to stain for

DOPA in an extract from the mussel's ‘foot’, the organ that secretes the precursor proteins that

form the mature byssus [33]. Extraction of Dpfp1 and Dpfp2 from mature byssal threads [33]

and the immunolocalization of Dpfp-1 in byssal thread extracts [4] confirmed the presence of

these proteins in the byssus. While the approximate molecular weights of Dpfp1, Dpfp2 and

Dpfp3 have been determined by gel electrophoresis [33], accurate MS mass measurements and

full sequence information have been determined only for Dpfp1 [35] (Table 2-1). Recently, a

cDNA library was created by Xu and Faisal, 2008, representing genes expressed uniquely in the

zebra mussel foot [56]. This library was also used to isolate expressed sequence tags (ESTs) that

are up-regulated or down-regulated during byssogenesis [57].

Table 2-1. Summary of molecular weight, DOPA content and sequence information of the three

known D. polymorpha foot proteins (Dpfp)

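Foot Protein | MW by Gel Electrophoresis (kDa) [33] | MW by MALDI-TOF (kDa) [35] | Maximum DOPA Content [33] | Sequence information known
-------------|--------------------------------------|----------------------------|---------------------------|--------------------------------
Dpfp-1       | 76 and 65                            | 54.5 and 48.6              | 6.6%                      | Primary sequence [35]
Dpfp-2       | 26                                   | Unknown                    | 7%                        | Peptide fragment sequences [33]
Dpfp-3       | 12-13                                | Unknown                    | Unknown                   | None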

Thus far, limited information is available on the composition and distribution of zebra mussel

byssal proteins. For one, the mature byssus is greatly resistant to analysis due to extensive DOPA

cross-linking. At the same time, extraction of precursor proteins from the foot does not reveal

any information on the distribution of proteins between different regions of the byssus. Also,

staining specifically for DOPA limits the identification of any non-DOPA containing proteins

that might also be present in the byssus. To overcome these challenges, we induce the secretion

of fresh threads such that these have minimal DOPA cross-linking and are thus less resistant to

extraction. Fresh byssal threads are induced by injecting the mussel's foot with potassium

chloride, a method that has been used previously only in marine mussels [58], [30], [16] and now

for the first time in freshwater mussels. Using this method we are able to study protein

composition of the byssus after secretion from the foot but before extensive cross-linking. In

marine mussels, the induced byssal threads have been shown to be indistinguishable from the

natural threads [16], [25] thus allowing their study as a model system. Here, we report on our

identification of novel byssal proteins and characterization of their byssal distribution and

sequence properties. The identified byssal proteins, whether or not they contain DOPA,

could serve different purposes within the byssus, potentially as an adhesive between

plaque and substrate, as a cohesive medium for structural stability in the byssus and as a

varnish for protection of the byssus from degradation [9].

2.3 Methods

2.3.1 Protein extraction from induced byssal threads

Zebra mussels were collected from Round Lake, Ontario, Canada and kept for up to 60 days in an

aquarium at room temperature in artificial freshwater prepared using a recipe by M. Sprung,

1987 [59]. The mussels were dissected and the foot was injected with ~0.03 mL of 0.56 M

potassium chloride (KCl) using an 18G syringe, as described by Tamarin et al., 1976, thus

leading to the induction of the byssal thread [58]. After 3-5 minutes, the induced thread/plaque

was located in the ventral groove of the foot, pulled out with tweezers, washed in a drop of

deionized water and extracted. The extraction method was adapted from Zhao and Waite, 2006

[30] with several changes: per extraction, 6 - 14 byssal threads were extracted in 250 µL of basic

extraction buffer (EB) (0.2 M sodium borate, 4 M urea, 1 mM KCN, 1 mM EDTA, and 10 mM

ascorbic acid), prepared using a recipe adapted from Rzepecki and Waite, 1993 [33]. Samples

were homogenized on ice in a 1 mL ground-glass hand-held tissue grinder, sonicated with a

probe sonicator (15 times, 2 s each) and centrifuged (17,000 g, 8 min, 4°C) [30]. The

supernatant (soluble extract) and the pellet (insoluble matrix) were stored separately at -20°C.

Where relevant, the byssal thread was separated into thread and plaque prior to extraction.

2.3.2 Dialysis, lyophilization and quantification of protein samples

Soluble extracts from the required number of extractions were pooled together and dialyzed

against 0.15 M sodium borate (pH 8.1 – 8.4) to remove urea and other basic EB components and

then against nitrogen-bubbled 1% acetic acid to eliminate sodium borate and acidify the sample

before lyophilization [60]. Dialysis steps were done using a 2 kDa molecular weight cutoff

(Thermo Scientific Slide-A-Lyzer Dialysis Cassette G2 (#87718)), with stirring for 2 hours, 3

hours and overnight at 4°C in 300 times the sample volume of dialysis buffer. Dialyzed samples

were aliquoted, lyophilized (Gibson-Air ModulyoD Lyophilizer) and stored at -20°C. Protein

quantities were determined according to absorbance measurements at 280 nm (Nanodrop ND-

1000 Spectrophotometer, Thermo Scientific) of samples resuspended in deionized water.

Resuspended samples were stored in liquid nitrogen prior to use.
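
As a rough check on the dialysis scheme above, the residual concentration of a small solute can be estimated from the buffer-to-sample volume ratio and the number of buffer changes. The sketch below is an idealized calculation, not part of the original protocol. It assumes that the three stated durations (2 hours, 3 hours, overnight) correspond to three successive buffer changes, that each change reaches full equilibrium, and that the solute crosses the 2 kDa membrane freely. The function name is hypothetical.

```python
def residual_concentration(initial_molar, buffer_to_sample_ratio, exchanges):
    """Idealized residual solute concentration after repeated dialysis.

    Assumption: at equilibrium the solute is spread evenly over the sample
    plus the buffer, so each exchange dilutes it by (1 + ratio).
    """
    dilution_per_exchange = 1 + buffer_to_sample_ratio
    return initial_molar / dilution_per_exchange ** exchanges


# 4 M urea (from the extraction buffer), 300x buffer volume,
# three buffer changes (assumed from the three stated durations).
urea_after = residual_concentration(4.0, 300, 3)
print(f"Estimated residual urea: {urea_after:.2e} M")
```

Real dialysis rarely reaches full equilibrium in the shorter steps, so the true residual concentration would be expected to be higher than this lower bound.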

2.3.3 Amino acid analysis

Amino acid analysis of mature and induced byssal threads and of soluble protein extracts was

performed using a Waters Acquity UPLC Gradient and Detector and the Waters Empower 2

Chromatography Software by the Advanced Protein Technology Centre at Sick Kids Hospital,

Toronto. Samples were dried in pyrolyzed borosilicate tubes in a vacuum centrifugal

concentrator and subjected to vapour phase hydrolysis by 6N HCl with 1% phenol at 110°C for

48 hours under a pre-purified nitrogen atmosphere. After hydrolysis, excess HCl was removed

by vacuum and hydrolyzates were washed with redrying solution and derivatized with

phenylisothiocyanate (PITC), followed by reverse phase HPLC.

2.3.4 Tricine polyacrylamide gel electrophoresis (Tricine-PAGE) and gel

silver-staining

Proteins from lyophilized samples were separated by Tricine-PAGE using

premade Novex 16%, 1mm, Tricine gels (Invitrogen, EC6695BOX), Novex Tricine SDS 2X

Sample Buffer (Invitrogen, LC1676) and Novex Tricine SDS 10X Running Buffer (Invitrogen,

LC1675). The gels were run at 125 V for 1.5 hours using an XCell SureLock Mini-Cell

Electrophoresis System (Invitrogen, EI0001). Silver-staining of proteins in the gel was adapted

from Mortz et al., 2001 [61]. Briefly, gels were fixed (40% ethanol, 10% acetic acid, 50% water,

1 hour), washed in deionized water (30 min.), sensitized in 0.02% sodium thiosulfate (1 min.)

and washed 3 times in water (20 sec. each). Gels were then incubated in 0.1% cold silver nitrate

solution containing 0.02 % formaldehyde (20 min, 4°C), washed 3 times in water (20 sec each),

transferred to a new tray and then washed again in water (1 min). Gels were developed in 3%

sodium carbonate containing 0.05% formaldehyde until staining was sufficient. Staining was

terminated with 5% acetic acid and gels were stored at 4°C in 1% acetic acid.

2.3.5 Digestion of protein gel bands

Silver stained protein gel bands were cleaved with trypsin at the Advanced Protein Technology

Centre, Sick Kids Hospital, Toronto. Briefly, the excised gel bands were destained by incubating

in a 1:1 mixture of 30 mM potassium ferricyanide and 100mM sodium thiosulfate (15 min.),

washing with deionized water, washing with 50 mM ammonium bicarbonate and then shrinking

with 50% acetonitrile/25 mM ammonium bicarbonate. Samples were reduced with 10 mM DTT

(30 min, 56°C) and alkylated with 100 mM iodoacetamide (15 min., dark, room temperature)

followed by shrinking with 50% acetonitrile/25 mM ammonium bicarbonate (15 min.). Samples

were then digested with 13 ng/µL trypsin (Porcine, Sequencing Grade, Promega) overnight at

37°C and the liquid was collected. Peptides were extracted by vortexing the sample in turn with

25 mM ammonium bicarbonate, 5% formic acid, 100% acetonitrile, 5% formic acid and 100%

acetonitrile, and all supernatants were pooled together. Extracted peptides were dried by

SpeedVac centrifugation and resuspended in 20 µL of 0.1% formic acid in water for LC-MS/MS

analysis.
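
To illustrate what the trypsin digestion step produces, the following sketch performs an in silico tryptic digest of a protein sequence. It is a simplified illustration rather than the procedure used by the facility. It assumes the common convention that trypsin cleaves on the C-terminal side of lysine (K) or arginine (R) except when the next residue is proline (P). The function name and the example sequence are hypothetical.

```python
def tryptic_peptides(protein, missed_cleavages=0):
    """Return theoretical tryptic peptides of a protein sequence.

    Assumed rule: cleave after K or R unless the following residue is P.
    Peptides with up to `missed_cleavages` skipped sites are also returned.
    """
    protein = protein.upper()
    pieces, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            pieces.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        pieces.append(protein[start:])  # C-terminal peptide without K/R

    peptides = []
    for n in range(missed_cleavages + 1):
        for j in range(len(pieces) - n):
            peptides.append("".join(pieces[j:j + n + 1]))
    return peptides


# Hypothetical sequence, used only to show the cleavage rule.
print(tryptic_peptides("GAKRPGKYDR"))
print(tryptic_peptides("GAKRPGKYDR", missed_cleavages=1))
```

Peptide lists of this kind are what a database search engine compares against the observed MS/MS spectra.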

2.3.6 Liquid chromatography – tandem mass spectrometry (LC-MS/MS)

LC-MS/MS analysis was performed by the Advanced Protein Technology Centre at Sick Kids

Hospital, Toronto. The digested peptides were loaded onto a 150 μm ID pre-column (Magic C18,

Michrom Biosciences) at 4 μl/min. and separated over a 75 μm ID analytical column packed into

an emitter tip containing the same packing material. The peptides were eluted over 60 min at

300 nl/min using a 0 to 40% acetonitrile gradient in 0.1% formic acid using an EASY n-LC

nano-chromatography pump (Proxeon Biosystems, Odense, Denmark). The peptides were then

eluted into a LTQ-Orbitrap XL hybrid mass spectrometer (Thermo-Fisher, Bremen, Germany)

operated in a data-dependent mode. MS was acquired at 60,000 FWHM resolution in the Fourier

Transform Mass Spectrometer (FTMS) and MS/MS was carried out in the linear ion trap. Six

MS/MS scans were obtained per MS cycle.

2.3.7 Sequencing data analysis

MS/MS data was searched using Mascot (Matrix Science, London, UK) by matching against

zebra mussel and metazoa protein databases and against a zebra mussel Expressed Sequence Tag

(EST) library virtually translated in six different reading frames using Virtual Ribosome 1.1

(http://www.cbs.dtu.dk/services/VirtualRibosome/). This cDNA library, representing genes

expressed uniquely in the mussel foot, was prepared by Xu and Faisal, 2008 using a BD

Clontech PCR-Select cDNA Subtraction Kit and comprises 750 genes with Accession numbers

AM229723 to AM230448 (downloaded November 2011 from the Genbank Server) [56]. During

creation of the library, base pairs were removed from the 5’ end of the cDNA to create blunt

ends for ligation of adaptor sequences for cDNA amplification purposes [56]. Therefore, in

several sequences where base pairs were removed from the 5’ translated region, the virtual

protein sequences are incomplete at the N-terminus [62].
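
The six-frame virtual translation described above can be sketched in a few lines. This is a minimal illustration, not the Virtual Ribosome implementation. It assumes the standard genetic code, translates through stop codons (shown as `*`), and marks codons containing ambiguous bases as `X`. All function names and the example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; unknown codons become 'X'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(dna):
    """Return a dict mapping frame (+1..+3, -1..-3) to its translation."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(dna[offset:])
        frames[-(offset + 1)] = translate(rc[offset:])
    return frames


# Hypothetical 9-base sequence, used only to show the six frames.
for frame, protein in sorted(six_frame_translation("ATGGCCTAA").items()):
    print(frame, protein)
```

Because the ESTs may lack bases at the 5' end, the correct frame cannot be assumed in advance, which is why all six translations are searched.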

MS data was visualized and validated using Scaffold 3.3.1 (Proteome Software Inc., Portland,

OR). Peptide identifications were accepted at greater than 80.0% probability and protein

identifications were accepted at greater than 95.0% probability with at least 1 identified peptide.

Parent ion and fragment ion mass tolerances were set to 20 PPM and 0.40 Da respectively and

hits were confirmed manually by inspecting the spectra. The data was searched using

carbamidomethylation as a fixed modification and deamidation of asparagine and glutamine,

hydroxylation of lysine and tyrosine, oxidation of methionine, acetylation of the N-terminus and

phosphorylation of serine and threonine as variable modifications.
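
The two mass tolerances quoted above are expressed in different units: parts per million (relative) for the parent ion and daltons (absolute) for fragment ions. The sketch below shows how such tolerance checks are typically computed. It is an illustration only and does not reproduce the internal scoring of Mascot or Scaffold. The function names and the example masses are hypothetical.

```python
def ppm_error(observed, theoretical):
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def parent_within_tolerance(observed, theoretical, tol_ppm=20.0):
    """True if the parent ion mass error is within the ppm tolerance."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


def fragment_within_tolerance(observed, theoretical, tol_da=0.40):
    """True if the fragment ion mass error is within the Da tolerance."""
    return abs(observed - theoretical) <= tol_da


# Hypothetical masses chosen only to show both sides of each threshold.
print(round(ppm_error(800.4120, 800.4000), 1))
print(parent_within_tolerance(800.4120, 800.4000))
print(parent_within_tolerance(800.4200, 800.4000))
print(fragment_within_tolerance(500.3, 500.0))
```

A relative tolerance suits the high-resolution FTMS parent scan, while an absolute tolerance is the usual choice for the lower-resolution ion trap fragment scans.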

The theoretical mass, pI and amino acid composition of virtual EST protein matches were

determined using EMBOSS Pepstats; Kyte-Doolittle hydropathy plots were obtained using

EMBOSS Pepwindow; and amino acid distribution metrics were determined using the EMBOSS

Pepinfo tool, all on the European Bioinformatics Institute website

(http://www.ebi.ac.uk/Tools/emboss/pepinfo/). Signal peptides were predicted using SignalP

4.0 (http://www.cbs.dtu.dk/services/SignalP/) and PrediSi (http://www.predisi.de/) online tools

and multiple sequence alignments were performed with the Clustal-W2 online tool

(http://www.ebi.ac.uk/Tools/msa/clustalw2/). Protein homology searches were done using NCBI

Protein BLAST (Basic Local Alignment Search Tool)

(http://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Proteins).
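
For readers unfamiliar with hydropathy plots, the sketch below computes a sliding-window Kyte-Doolittle profile of the kind that such tools display. It is a minimal illustration and not the EMBOSS Pepwindow code. The residue values are the published Kyte-Doolittle scale; the window size of 9 and the example sequence are assumptions chosen for demonstration, and the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy scale (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Mean Kyte-Doolittle score for each full window along the sequence.

    Assumption: the sequence contains only the 20 standard residues;
    any other character raises KeyError.
    """
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Hypothetical sequence: a hydrophobic stretch followed by an acidic one.
profile = hydropathy_profile("IIIDDD", window=3)
print([round(value, 2) for value in profile])
```

Positive windows indicate locally hydrophobic stretches and negative windows indicate hydrophilic ones, which is how block structure in a sequence shows up on such a plot.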

2.4 Results

2.4.1 Optimal conditions for zebra mussel protein extraction and analysis

To enhance protein extraction directly from the zebra mussel byssal thread, we induced the

secretion of fresh byssal threads having minimal DOPA cross-linking. Maximal extraction of

soluble proteins from the induced byssal threads is then important for adequate analysis of the

proteins. We found that the basic extractions described in the methods section were the most

effective in extracting proteins as compared to acidic extractions using acetic acid (5% or 8 %)

and 8M urea (results not shown here). The basic extractions give smaller pellets of non-soluble

extract than acidic extractions. Also, when loaded on 15% Acetic Acid Urea (AU) PAGE gels

and silver stained, basic extracts (acidified for gel compatibility) display more protein gel bands

as compared to acidic extracts which do not reveal any visible protein bands (results not shown

here). A280 absorbance readings revealed that approximately 3.6 µg of protein is extracted per

byssal thread/plaque when these are extracted in an acidic extraction buffer (5% acetic acid and

8M urea). This measurement could not be performed on basic extracts because the basic buffer

itself absorbs at 280 nm.

After extraction and homogenization, the byssal extracts were centrifuged to separate the

soluble and insoluble proteins. The mass of protein present in soluble byssal extracts of separated

threads and plaques, which were pooled, dialyzed, lyophilized and resuspended in water, was

determined by measuring absorbance at 280 nm. In the thread, approximately 4.0 µg of

protein was extracted per mussel and, in the plaque, ~6.7 µg was extracted per mussel. Thus,

proteins appear to be more easily extracted from the zebra mussel plaque than from its thread.

Amino acid analysis of the soluble thread and plaque extracts reveals that they have very similar

amino acid compositions (Table 2-2). This is consistent with the observation that the mature

thread and mature plaque have the same amino acid content [10]. Comparisons of the mature and

induced threads/plaques showed similar amino acid contents except for aspartic acid/asparagine

which has a much higher mol% in the mature (19.5%) versus the induced (6.0%) byssal threads

(Table 2-2). Since the induced byssal threads are artificially secreted by forced secretion with

KCl, this could lead to different protein compositions of induced versus mature byssal threads.

Thus, it is important to consider potential differences when using fresh induced threads as the

model system. Interestingly, amino acid analysis did not detect DOPA in any of the byssal

samples (Table 2-2), likely because oxidation of DOPA to DOPA quinone followed by DOPA

quinone covalent cross-linking makes these residues resistant to amino acid analysis [63].

Table 2-2. Comparisons of the amino acid compositions in mole % (number of residues per 100

residues) in zebra mussel mature and induced thread/plaques and in soluble thread and plaque

extracts.

Amino Acid | Mature Thread/Plaque* | Induced Thread/Plaque* | Induced Soluble Thread** | Induced Soluble Plaque**
-----------|-----------------------|------------------------|--------------------------|-------------------------
Asx (D/N)  | 19.5                  | 6.0                    | 6.1                      | 5.7
Glu (E/Q)  | 6.9                   | 6.2                    | 5.6                      | 6.7
Ser (S)    | 3.8                   | 6.2                    | 10.3                     | 11.4
Gly (G)    | 21.7                  | 20.5                   | 27.1                     | 24.7
His (H)    | 0.6                   | 1.4                    | 3.6                      | 4.1
Arg (R)    | 2.6                   | 4.2                    | 2.2                      | 3.5
Thr (T)    | 4.8                   | 6.3                    | 1.9                      | 2.7
Ala (A)    | 2.3                   | 5.0                    | 7.5                      | 6.3
Pro (P)    | 6.1                   | 6.4                    | 8.7                      | 8.7
Tyr (Y)    | 9.0                   | 8.4                    | 2.1                      | 2.3
Val (V)    | 6.8                   | 7.0                    | 2.3                      | 2.6
Met (M)    | 0.9                   | 3.8                    | 9.6                      | 9.2
Cys (C)    | 0.3                   | 1.4                    | 1.5                      | -
Ile (I)    | 5.1                   | 5.1                    | 4.0                      | 3.6
Leu (L)    | 4.7                   | 5.9                    | 1.9                      | 3.2
Phe (F)    | 1.8                   | 3.0                    | 3.7                      | 3.1
Lys (K)    | 3.0                   | 3.2                    | 2.0                      | 2.2
DOPA       | -                     | -                      | -                        | -

* Amino acid analysis was performed on intact byssal threads

**Amino acid analysis was performed on protein extracts
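
The mature-versus-induced comparison drawn from Table 2-2 can be reproduced with simple arithmetic. The sketch below is illustrative only: the mole % values are copied from the first two data columns of Table 2-2, while the function and variable names are hypothetical and were not part of the original analysis.

```python
# Mole % values from Table 2-2 (intact byssal threads/plaques).
MATURE = {
    "Asx": 19.5, "Glu": 6.9, "Ser": 3.8, "Gly": 21.7, "His": 0.6,
    "Arg": 2.6, "Thr": 4.8, "Ala": 2.3, "Pro": 6.1, "Tyr": 9.0,
    "Val": 6.8, "Met": 0.9, "Cys": 0.3, "Ile": 5.1, "Leu": 4.7,
    "Phe": 1.8, "Lys": 3.0,
}
INDUCED = {
    "Asx": 6.0, "Glu": 6.2, "Ser": 6.2, "Gly": 20.5, "His": 1.4,
    "Arg": 4.2, "Thr": 6.3, "Ala": 5.0, "Pro": 6.4, "Tyr": 8.4,
    "Val": 7.0, "Met": 3.8, "Cys": 1.4, "Ile": 5.1, "Leu": 5.9,
    "Phe": 3.0, "Lys": 3.2,
}


def composition_differences(first, second):
    """Per-residue difference in mole % (first minus second)."""
    return {aa: round(first[aa] - second[aa], 1) for aa in first}


diffs = composition_differences(MATURE, INDUCED)
largest = max(diffs, key=lambda aa: abs(diffs[aa]))
print(f"Largest difference: {largest} ({diffs[largest]:+.1f} mol%)")
```

Ranking the differences this way makes it easy to see that Asx stands out from the other residues, which differ by only a few mole %.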

2.4.2 Identification of novel foot proteins in the zebra mussel byssus

Proteins extracted from induced byssal threads were dialyzed and lyophilized as described in the

methods and then electrophoresed and visualized on a silver-stained Tricine-PAGE gel.

Figure 2-1 displays the protein bands identified in the intact byssal thread/plaque (T/P) and in separated

threads and plaques. Six protein bands were visible in the byssal T/P extract (Figure 2-1A). The

two bands between 90 and 65 kDa correspond to the molecular weights of the two forms of the

previously identified Dpfp1 protein (76 and 65 kDa). The thick band between 30 and 20 kDa

corresponds to the previously known protein Dpfp2 with a molecular weight of 26 kDa. Both

Dpfp1 and Dpfp2 previously stained for DOPA in foot extracts and were identified in extracts

from mature byssal threads as well [33]. A third DOPA containing protein called Dpfp3, also

previously seen in the foot extracts, is however not observed on the gel in Figure 2-1. The three

gel bands labeled with underlines represent novel byssal proteins that were not previously known

to be present in the zebra mussel byssus. We call these proteins Dpfp0 (>210 kDa), Dpfp4 (>90

kDa) and Dpfp5 (~30 kDa).

In the amino acid comparison between zebra mussel thread and plaque extracts, the thread and

plaque have similar amino acid contents (Table 2-2) thus indicating that they have soluble

proteins with similar amino acid compositions. When proteins were extracted from separated

threads and plaques and loaded on the same gel, a number of faint protein bands and one major

protein band (~30 kDa) were observed in each lane (Figure 2-1B). While the band corresponding

to Dpfp0 appears to be unique to the plaque, the very faint band corresponding to Dpfp4

appears to be absent in the plaque. However, since unequal masses of protein (222 µg of thread

extract and 400 µg of plaque extract) were loaded on the gel, conclusive direct comparisons of

band densities cannot be made. The bands corresponding to Dpfp1, Dpfp2, Dpfp5 and a ~50 kDa

protein are present both in the thread and in the plaque with Dpfp5 having the most prominent

bands. In spite of loading almost twice as much plaque extract on the gel, the density of the

Dpfp5 band in the plaque is the same as or lower than that in the thread. Thus, in comparison, the

thread extract has a higher proportion of Dpfp5.
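
The reasoning about unequal loading can be made explicit by normalizing band density to the mass of extract loaded. The sketch below is a worked example only. The loaded masses (222 µg and 400 µg) come from the text, but the band densities are hypothetical placeholders set equal to each other, since no densitometry values are reported and silver staining is not strictly quantitative.

```python
def density_per_microgram(band_density, loaded_ug):
    """Band density normalized to the mass of extract loaded."""
    return band_density / loaded_ug


# Hypothetical: equal Dpfp5 band densities (arbitrary units) in both lanes.
thread = density_per_microgram(1.0, 222.0)
plaque = density_per_microgram(1.0, 400.0)

# With equal densities, the ratio reduces to 400 / 222.
ratio = thread / plaque
print(f"Thread-to-plaque Dpfp5 ratio per µg loaded: {ratio:.2f}")
```

Under this assumption the thread extract would carry about 1.8 times as much Dpfp5 per microgram as the plaque extract, and more if the plaque band is in fact fainter.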

The type of gel used for protein identification has an impact on the visualization of protein

bands. Several of the lower molecular weight proteins observed in the silver-stained Tricine

PAGE gel in Figure 2-1 were not observed on a Sodium Dodecyl Sulfate Polyacrylamide (SDS-

PAGE) gel (results not shown here). Additionally, gel bands were only seen when several protein

extracts were pooled together and lyophilized before electrophoresis. Even up to 15 byssal

threads in a single extraction were not sufficient to view bands (result not shown here) as

compared to the proteins seen when 65 byssal threads from 13 extractions of 5 byssal threads

were pooled together (Byssal T/P lane in Figure 2-1A).

Figure 2-1. Electrophoretic identification of zebra mussel byssal proteins. Proteins were

extracted from induced, freshly secreted byssal threads, dialysed and lyophilized as described in

the methods. Lyophilized proteins were resuspended in double distilled water and loaded on 16%

Tricine PAGE gels that were then silver-stained for protein visualization. The masses in brackets

on the gel represent the mass of lyophilized protein loaded for each sample. (A) Byssal proteins

identified in an extract from 65 complete byssal threads (Byssal T/P). The leftmost lane contains

a Colorburst molecular weight ladder. Underlined proteins represent novel byssal foot proteins

that we have chosen to call Dpfp0 (>210 kDa), Dpfp4 (>90 kDa) and Dpfp5 (~30 kDa). The

other protein bands correspond to the molecular weights of previously known DOPA containing

foot proteins, Dpfp1 (76, 65 kDa) and Dpfp2 (26 kDa). (B) Byssal proteins identified in the

extracts from 100 separated threads and 100 separated plaques. Arrows indicate bands observed

on the gel, most of them corresponding to the bands seen in the byssal T/P in Figure 2-1A.

2.4.3 Comparisons of LC-MS/MS derived sequences of Dpfp1, Dpfp2 and Dpfp5

The protein gel bands of presumed Dpfp1 (76 kDa), Dpfp2 (26 kDa) and Dpfp5 (~30 kDa) in the

byssal T/P lane in Figure 2-1A were subjected to in-gel trypsin digestion and LC-MS/MS mass

spectrometry analysis. The protein mass spectra were then matched against known zebra mussel

proteins and against virtually translated EST sequences from a cDNA library of genes unique to

the zebra mussel foot. Dpfp1 did not match any of the EST sequences in the cDNA library but

did match the known sequence of Dpfp1 (AF265353), thus confirming its identity [35]. The mass

spectra of protein bands Dpfp2 and Dpfp5 matched to a number of EST sequences virtually

translated in any of six reading frames (±1, ±2, ±3). Table 2-3 shows a sequence match

comparison of the three sequenced proteins, Dpfp1, Dpfp5 and Dpfp2. The accession numbers

described for Dpfp2 and Dpfp5 represent the sequence match that has the highest molecular

weight corresponding to the protein’s molecular weight as identified by gel electrophoresis. The

theoretical protein masses are derived from the matched sequence and are in each case smaller

than the molecular weight seen by gel electrophoresis (Table 2-3). This discrepancy could be

because the EST sequences do not account for post-translational modifications such as tyrosine

hydroxylation to DOPA and protein glycosylations such as those seen by Rzepecki and Waite,

1993 in Dpfp1 and Dpfp2 [33]. Incomplete N-termini of the sequences, owing to limitations in

the creation of the cDNA library (see Methods), may also contribute to the inconsistency.

Significantly, while Dpfp1 is known to run at 76 and 65 kDa on the gel [33], Matrix-Assisted

Laser Desorption/Ionization (MALDI) mass spectrometry indicates that the protein forms actually have

molecular weights of 54.5 and 48.6 kDa, respectively [35]. Thus, the discrepancy can be largely

attributed to an overestimation of molecular weights when the byssal proteins are run on the gel.

Based on the theoretical pI, Dpfp1 and Dpfp5 are acidic proteins unlike Dpfp2 which is basic.

Also, Dpfp2 peptide matches reveal post-translational modifications in the form of glutamine

deamidation (Q) and tyrosine hydroxylation to DOPA (Y). No such modifications are detected in

the spectrum matches obtained for Dpfp1 and Dpfp5 (Table 2-3).
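The size of the gel overestimation can be put in numbers. The short sketch below uses only masses quoted in the text and in Table 2-3; treating the ~30 kDa Dpfp5 band as exactly 30 kDa is an assumption, and the function name is hypothetical.

```python
# (apparent gel mass, reference mass) in kDa. For Dpfp1 the reference is the
# MALDI mass; for Dpfp5 and Dpfp2 it is the theoretical mass of the
# N-terminally incomplete EST-derived sequence (Table 2-3).
MASSES_KDA = {
    "Dpfp1 (76 kDa form)": (76.0, 54.5),
    "Dpfp1 (65 kDa form)": (65.0, 48.6),
    "Dpfp5": (30.0, 20.1),  # gel mass is approximate (~30 kDa)
    "Dpfp2": (26.0, 15.3),
}


def gel_to_reference_ratio(gel_kda: float, reference_kda: float) -> float:
    """Ratio above 1 means the gel mass exceeds the reference mass."""
    return gel_kda / reference_kda


ratios = {name: gel_to_reference_ratio(*pair) for name, pair in MASSES_KDA.items()}
for name, ratio in ratios.items():
    print(f"{name}: gel/reference = {ratio:.2f}")
```

For Dpfp1, where the reference is a measured mass, the gel runs roughly 1.3 to 1.4 times high. The larger ratios for Dpfp5 and Dpfp2 are consistent with additional contributions from post-translational modifications and incomplete N-termini.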

The Scaffold program describes a ‘protein identification probability’ value that indicates the

probability that a deduced sequence matches the protein. Mascot peptide scores indicate the

certainty that the mass spectrum matches the respective peptide sequence. These are described in

Table 2-3. Dpfp1 and Dpfp2 have a protein identification probability of 100%. The derived

sequence of Dpfp2 (AM229730) has a high probability match and three mass spectrum matching

peptides (Table 2-3) and this sequence compares very well to the sequence fragments obtained

by automated Edman degradation of Dpfp2 by Rzepecki and Waite, 1993 [33]. Hence, we can be

very confident about the Dpfp2 – EST match. The Dpfp5 derived sequence (AM230139) has a

protein identification probability of 95% and the single spectrum match has a Mascot score of

46, thus indicating a high certainty of the sequence to spectrum match. While there is only a

single spectrum match shown here, an additional mass spectrum match

(NDVDGNENIVGGQSNAVGGK) was also observed when the EST match was identified in the

insoluble byssal matrix that is found in the pellet after centrifugation, as described elsewhere

[64], thus further supporting the protein identification.
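A simple consistency check on the two Dpfp5 peptides is to confirm that both are tryptic peptides of the EST-derived sequence. The sketch below is a minimal in-silico digest, assuming the usual trypsin rule (cleavage after K or R, but not before P) and no missed cleavages. The sequence is the adaptor-free AM230139 translation from Figure 2-3; the function name is hypothetical.

```python
import re

# EST-derived Dpfp5 sequence (AM230139) without the adaptor, as in Figure 2-3.
DPFP5 = (
    "YNSWPPKPNQPQQPQQPQQPPQPPRYPQP"
    "SYPAYPPQQSYPAYPPKQSYPTYPPKQSYPAYPPKQSYP"
    "TNPPYNPCDAVYCRPIYCNYGQYTPQGECCPQCNPGTYLPEKW"
    "SWKGNNVVGDQEKYVGEGNNVGEQRNDVDGNENIVGGQSNA"
    "VGGKGNDVGEQKNAVGGSGNTVGWQGNNVGG"
)


def tryptic_peptides(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    # Zero-width split points: preceded by K/R and not followed by P.
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


peptides = tryptic_peptides(DPFP5)
for matched in ("YVGEGNNVGEQR", "NDVDGNENIVGGQSNAVGGK"):
    print(matched, "is a tryptic peptide:", matched in peptides)
```

Under this rule the two observed peptides are adjacent fragments of the C-terminal region, which fits their assignment to the same EST.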

Table 2-3. Comparisons of three zebra mussel byssal proteins sequenced by LC-MS/MS.

| Foot protein | MW by electrophoresis (kDa) | EST accession number (protein identification probability) | Mass spectra matching peptide sequences | Mascot peptide score | Theoretical mass (kDa) | Theoretical pI |
|---|---|---|---|---|---|---|
| Dpfp1 | 76/65 | AF265353 (100%) | SPLYGWR | 19 | 49.3 | 5.2 |
| | | | TGPYVPIKPDDK | 29 | | |
| | | | TRVYPYLPLYPGYQPEYHR | 22 | | |
| Dpfp5 | ~30 | AM230139 (a) (95%) | YVGEGNNVGEQR | 46 | 20.1 (c) | 6.4 (c) |
| Dpfp2 | 26 | AM229730 (a) (100%) | QAYPVYPEK (b) | 25 | 15.3 (c) | 9.0 (c) |
| | | | QSYPVYPEK (b) | 38 | | |
| | | | YPEKPYPGYQDYWGK | 26 | | |

(a) The representative EST sequence is the one that has the highest MW corresponding to that seen by electrophoresis.

(b) Q and Y signify glutamine deamidation and tyrosine hydroxylation to DOPA, respectively.

(c) The mass and pI shown represent the protein sequence excluding the N-terminal adaptor sequence.

2.4.4 Sequence properties of the EST-derived sequence of the novel Dpfp5 protein

The sequence of the novel Dpfp5 protein has been determined for the very first time through

EST matching of peptide mass spectra. Figure 2-2 depicts the multiple sequence alignment of

four matches. The aligned sequences are quite similar with some regions that are present in all

matches (red sequence at C-terminus) and other regions that are missing or different in some

matches (blue or purple sequences at N-terminus). In all the matches, the orange sequence

represents an adaptor sequence that was inserted during the creation of the cDNA library [56]. In

all of the EST matches, no N-terminal signal peptide or methionine residue (representing start

codon) was observed, likely because the N-terminus of the sequences is incomplete due to

removal of base pairs from the 5’ end of the cDNA for adaptor ligation for cDNA cloning

purposes [56]. Premature termination of reverse transcription, due to strong mRNA secondary

structure, could also result in an incomplete 5’ cDNA end and therefore in an incomplete N-

terminal sequence [62]. An incomplete N-terminus also means that we do not know the correct

reading frame of the cDNA sequence and hence we have to theoretically translate the cDNA in

all three positive reading frames. cDNA second strand synthesis during amplification means that

we must theoretically translate from the C-terminus in the three negative reading frames as well.

The sequence matches obtained are independent of the reading frame of the virtually translated

sequence (shown in brackets beside the EST accession number). On checking for post

translational modifications (PTM), no tyrosine hydroxylation (DOPA) or other PTM was

detected.
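The six-frame virtual translation described above can be sketched in a few lines. This is a minimal illustration using the standard genetic code, not the software used in this work; the toy DNA string is hypothetical and is not an EST from the library.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from the first base; '*' is a stop, 'X' an unknown codon."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(dna):
    """Return translations for frames +1, +2, +3 and -1, -2, -3."""
    dna = dna.upper()
    reverse = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


TOY_DNA = "ATGGCCTAA"  # hypothetical example, not a library sequence
frames = six_frame_translation(TOY_DNA)
for name in ("+1", "+2", "+3", "-1", "-2", "-3"):
    print(name, frames[name])
```

Each peptide mass spectrum would then be searched against all six translations of every EST, which is why the frame is reported beside each accession number.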

AM230139(+1)  GRGNSISSGRPGRYNSWPPKPNQPQQPQQPQQPPQPPRYPQP-SYPAYPP  49
AM230094(+1)  GRGNSISSGRPGRYNSWPPKPNQPQQPQQPQQPPQPPRYPQP-SYPAYPP  49
AM230093(-2)  ------LAWSRPR-------------------------YPQQQSYPAYPP  19
AM230120(+1)  GRGNSISSGRPGR-------------------------------------  13

AM230139      QQSYPAYPPKQSYPTYPPKQSYPAYPPKQSYPTNPPYNPCDAVYCRPIYC  99
AM230094      QQSYPAYPPKQ---------SYPAYPPKQSYPTNPRYNPCAAVYCHPIYC  90
AM230093      KQSYPTYPPKQ---------SYPAYPPKQSYPTNPPYNPCDAVYCRPIYC  60
AM230120      --------------------------------------------------

AM230139      NYGQYTPQGECCPQCNPGTYLPEKWSWKGNNVVGDQEKYVGEGNNVGEQR  149
AM230094      NYGQYTPQGECCPQCNPGTYLPEKWSWQGNNVVGDQEKYVGEGNNVGEQR  140
AM230093      NYGQYTPQGECCPQCNPGTYLPEKWSWQGNNVVGDQEKYVGEGNNVGEQR  110
AM230120      ----------------SGTYLPEKWSWQGNNVVGDQEKYVGEGNNVGEQR  47
                               ********** **********************

AM230139      NDVDGNENIVGGQSNAVGGKGNDVGEQKNAVGGSGNTVGWQGNNVGG  196
AM230094      NDVGGNANIVGGQSNAVGGKGNDVGEQKNAVGGSGNTVGWQGNNVGG  187
AM230093      NDVSGNSNIVGGQSNAVGGKGNDVGEQKNAVGGSGNTVGWQGNNVGG  157
AM230120      NDVSGNSNIVGGQSNAVGGKGNDVGEQKNAVGGSGNTVGWQGNNVGG  94
              *** ** ****************************************

Figure 2-2. Alignment of the multiple EST sequence matches derived for the Dpfp5 gel band in Figure 2-

1A. The gel band was digested by in-gel tryptic digestion and the fragmented peptides were subject to

LC-MS/MS mass spectrometry analysis. The mass spectra obtained were then matched against the

virtually translated EST cDNA library of zebra mussel foot proteins. The bracketed numbers beside the

accession numbers represent the reading frame of the virtually translated EST sequence. The numbers at

the end of the sequence rows represent the position of the last amino acid in the row. The peptide matches

are aligned and color coded to show regions of sequence similarity between matches. The orange

sequence at the N-terminus represents adaptor sequences added during cDNA amplification. The colors

red (100%), blue (75%) and purple (50%) represent the percent sequence similarity between different

EST matches. The yellow highlight represents peptide sequences that matched directly from the mass

spectra. Q and Y signify glutamine deamidation and tyrosine hydroxylation to DOPA, respectively. *

indicates residues that are conserved between all EST matches. The first accession number represents the

sequence that is further analyzed through the paper.
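The conservation row marked with asterisks in the alignment can be recomputed directly from the aligned rows. The sketch below assumes the usual convention that a column is starred only when every row carries the same residue and no row has a gap; the function name is hypothetical, and the rows are copied from the third block of Figure 2-2.

```python
def conservation_row(aligned_rows):
    """Return '*' for columns identical in every row, ' ' otherwise (gaps never count)."""
    width = len(aligned_rows[0])
    if any(len(row) != width for row in aligned_rows):
        raise ValueError("all rows of an alignment block must have the same length")
    marks = []
    for column in zip(*aligned_rows):
        residues = set(column)
        marks.append("*" if len(residues) == 1 and "-" not in residues else " ")
    return "".join(marks)


# Third block of the Dpfp5 alignment (Figure 2-2).
BLOCK = [
    "NYGQYTPQGECCPQCNPGTYLPEKWSWKGNNVVGDQEKYVGEGNNVGEQR",  # AM230139
    "NYGQYTPQGECCPQCNPGTYLPEKWSWQGNNVVGDQEKYVGEGNNVGEQR",  # AM230094
    "NYGQYTPQGECCPQCNPGTYLPEKWSWQGNNVVGDQEKYVGEGNNVGEQR",  # AM230093
    "----------------SGTYLPEKWSWQGNNVVGDQEKYVGEGNNVGEQR",  # AM230120
]
row = conservation_row(BLOCK)
print(BLOCK[0])
print(row)
```

For this block the result is a run of 10 conserved columns, one variable column (K or Q) and a further run of 22 conserved columns, matching the asterisks in the figure.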

For further analysis of the Dpfp5 sequence properties, we chose to exclude the N-terminal

adaptor sequence (orange) and analyze the longest and thus the most complete sequence

(AM230139) as described in Table 2-3. The AM230094 sequence (Figure 2-2) is mostly similar

to AM230139 except that it is missing a single SYPTYP repeat in the blue region. The Dpfp5

sequence (AM230139) is 183 residues long and has a theoretical mass of 20.1 kDa (Figure 2-3).

The sequence is rich in proline (P, 18%), glycine (G, 12%), glutamine (Q, 12%), asparagine (N,

10%) and tyrosine (Y, 10%) which may be hydroxylated to DOPA. Additionally, different

regions of the Dpfp5 protein display noticeable distinctions in amino acid properties and repeat

patterns. The N-terminus (residues 1 – 68) has a theoretical pI of 9.56 whereas the rest of the

protein has a pI of 4.45. Additionally, the N-terminus is quite hydrophilic in contrast to the rest

of the protein which is mostly hydrophobic. The Dpfp5 sequence consists of similar mol% of

aliphatic (12%) and aromatic (12%) residues and same mol% of acidic (7%) and basic (7%)

residues. While positive residues (K+R) are uniformly distributed through the sequence, negative

residues (D+E) are absent at the N-terminus. The N-terminus is rich in triads of proline and

glutamine (generally PQQ and PKQ) alternately underlined and highlighted in green (Figure 2-

3). The latter of these repeats are interspersed with consensus repeats of SYP(A/T)YP

highlighted in blue. The middle region of the Dpfp5 sequence (residues 69 – 114) has a pI of

5.87 and has no discernible repeats. It does however contain six cysteine residues that are

otherwise absent from the rest of the protein. The C-terminal of the sequence (residues 115 –

183) has a theoretical pI of 4.21 and consists of ten VGG repeats (highlighted in yellow) where

the third glycine is occasionally substituted with aspartic acid (D), glutamic acid (E) or

tryptophan (W). Alternate VGG repeats are followed by a glutamine (Q) residue and all except

one are preceded by an asparagine (N) residue at the second last position. Five GN(N/D/T)

repeats are also observed at the C-terminus, preceding the VGG repeats.

YNSWPPKPNQPQQPQQPQQPPQPPRYPQP (29)

SYPAYPPQQSYPAYPPKQSYPTYPPKQSYPAYPPKQSYP (68)

TNPPYNPCDAVYCRPIYCNYGQYTPQGECCPQCNPGTYLPEKW (111)

SWKGNNVVGDQEKYVGEGNNVGEQRNDVDGNENIVGGQSNA (152)

VGGKGNDVGEQKNAVGGSGNTVGWQGNNVGG (183)

Figure 2-3. Illustration of the pattern of repeats identified in the EST derived sequence of Dpfp5

(AM230139). The adaptor sequence inserted during cDNA cloning has been excluded and the N-

terminus of the sequence is incomplete. The bracketed numbers represent the sequence position

of the last amino acid in the row. Alternating underlined and non-underlined green highlighted

sequences represent proline and glutamine rich triads. Blue, grey and yellow highlights represent

other repeat sequences. Cysteine residues are indicated in red.
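The composition figures quoted for Dpfp5 can be reproduced from the sequence in Figure 2-3. The sketch below is a minimal illustration: the region boundaries are those given in the text (residues 1 to 68, 69 to 114 and 115 to 183), and the helper names are hypothetical.

```python
from collections import Counter

# EST-derived Dpfp5 sequence (AM230139) without the adaptor, as in Figure 2-3.
DPFP5 = (
    "YNSWPPKPNQPQQPQQPQQPPQPPRYPQP"
    "SYPAYPPQQSYPAYPPKQSYPTYPPKQSYPAYPPKQSYP"
    "TNPPYNPCDAVYCRPIYCNYGQYTPQGECCPQCNPGTYLPEKW"
    "SWKGNNVVGDQEKYVGEGNNVGEQRNDVDGNENIVGGQSNA"
    "VGGKGNDVGEQKNAVGGSGNTVGWQGNNVGG"
)

# Region boundaries from the text (1-based, inclusive).
REGIONS = {"N-terminus": (1, 68), "middle": (69, 114), "C-terminus": (115, 183)}


def percent_composition(sequence):
    """Mol% of each residue type present in the sequence."""
    counts = Counter(sequence)
    return {aa: 100.0 * n / len(sequence) for aa, n in counts.items()}


def region(sequence, start, end):
    """Slice a 1-based inclusive residue range."""
    return sequence[start - 1:end]


composition = percent_composition(DPFP5)
for aa in "PGQNY":
    print(f"{aa}: {composition[aa]:.1f}%")

cysteines = {name: region(DPFP5, *bounds).count("C") for name, bounds in REGIONS.items()}
print("cysteines per region:", cysteines)
```

The same two helpers can be applied to any region, for example to compare the proline content of the N-terminus with that of the rest of the protein.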

2.4.5 Sequence properties of the EST-derived sequence of Dpfp2

While incomplete and unordered fragments of the protein sequence of Dpfp2 had previously

been determined by automated Edman degradation [33], we have determined here for the first

time a more complete sequence of Dpfp2 through EST matching of its peptide mass spectra.

Figure 2-4 depicts the multiple sequence alignment of five of these matches. The five aligned

sequences are quite similar with some regions that are conserved in all matches (red sequence at

C-terminus) and other regions that are missing or different in some matches (blue, purple or

green sequences at N-terminus). All the sequences except the sequence in black in AM229733

correspond quite well to the Dpfp2 sequence fragments observed by Rzepecki and Waite, 1993

[33]. The AM229733 sequence is therefore not studied in further analysis of the protein. In all

the matches, the orange sequence represents an adaptor that was inserted during the creation of

the cDNA library [56]. As with Dpfp5, this replaced several N-terminal residues and thus, the

N-terminus of the Dpfp2 sequence (likely including the signal peptide) is incomplete. The

sequence matches obtained are independent of the reading frame of the virtually translated

sequence. On checking for post translational modifications, one DOPA (Y) residue was located

in the spectrum match in the 40% conserved region near the N-terminus. For further analysis of

the Dpfp2 sequence properties, we chose to exclude the N-terminal adaptor sequence (orange)

and analyze the sequence with accession number AM229730 as described in Table 2-3. This

sequence is 125 residues long and has a theoretical mass of 15.3 kDa. It is rich in charged

residues including 10% glutamic acid (E) and 19% lysine (K). It is also rich in proline (P) (16%)

and threonine (T) (12%) and is richest in tyrosine (Y) (23%).

AM229730(+2)  APGRHGGRGNSISSGRPGR-YQEKTYPGYPPKQAYPVYPEKTYPEKTYPAY  50
AM229733(+1)  --------GRGNLAWSRPRCTRRKLIRVILQDKHILYIVRNRYPEKTYAAY  43
AM229731(-2)  -------NSLVISSGRPGR-YQEKTYPGYPPKQAYPVYPEKTYPEKTYPAY  43
AM229735(+1)  -------GRGNSISVVAAE-VK-----------------------------  14
AM230118(+1)  -------GRGNSISVVAAE-V------------------------------  13

AM229730      PTKKSYPEYPEKTYTKKTYEAYPTKDSYTVYPDKKYTEKTYEAYPTKDSY  100
AM229733      PTKKSYPEYPEKTYTKKTYEAYPTKDSYTVYPDKKYTEKTYEAYPTKDSY  93
AM229731      PTKKSYPEYPEKTYTKKTYEAYPTKDS---YPDKKYTEKTYEAYPTKDSY  90
AM229735      ----------------KTYEAYPTKDSYTVYPDKKYTEKTYEAYPTKDSY  48
AM230118      -----------------------------VYPDKKYTEKTYEAYPTKDSY  34
                                            ********************

AM229730      TVYPDKKYTEKKYEAYPTKQSYPVYPEKKYPEKPYPGYQDYWGK  144
AM229733      TVYPDKKYTEKKYEAYPTKQSYPVYPEKKYPEKPYPGYQDYWGK  137
AM229731      TVYPDKKYTEKKYEAYPTKQSYPVYPEKKYPEKPYPGYQDYWG-  133
AM229735      TVYPDKKYTEKKYEAYPTKQSYPVYPEKKYPEKPYPGYQDYWGK  92
AM230118      TVYPDKKYTEKKYEAYPTKQSYPVYPEKKYPEKPYPGYQDYWGK  78
              *******************************************

Figure 2-4. Alignment of the multiple EST sequence matches derived for the Dpfp2 (26 kDa)

gel band in Figure 2-1A. The gel band was digested by in-gel tryptic digestion and the

fragmented peptides were subject to LC-MS/MS mass spectrometry analysis. The mass spectra

obtained for each band were then matched against the virtually translated EST cDNA library of

zebra mussel foot proteins. The bracketed numbers beside the accession numbers represent the

reading frame in which the EST sequence was virtually translated. The numbers at the end of the

sequence rows represent the position of the last amino acid in the row. The peptide matches are

aligned and color coded to show regions of similarity between matches. The orange sequence at

the N-terminus represents adaptor sequences added during cDNA amplification. The colors red

(100%), blue (80%), purple (60%) and green (40%) represent the percent sequence similarity

among the EST matches. The yellow highlight represents peptide sequences that matched

directly from the mass spectra. Q and Y signify glutamine deamidation and tyrosine

hydroxylation to DOPA, respectively. * indicates residues that are conserved between all EST

matches. The first accession number represents the sequence that is further analyzed through the

paper.

Analysis of the EST derived sequence of Dpfp2 reveals five full tandem repeats of a 22 residue

consensus sequence that make up the bulk central region of the protein. Figure 2-5A shows this

pattern of repeats with each repeat on a different line. The consensus sequence can be

represented by KTY(P/E)AYPTK(Q/D)SYPVYPEKKYTE where non-italicized residues

represent highly conserved residues. There are five tyrosines (Y) in every consensus and the

position of the tyrosine residue is always conserved (indicated in bold in Figure 2-5A). Rzepecki

and Waite, 1993 had identified two fragments of this Dpfp2 consensus sequence; K-(K/T)-Y-

(X/P)-E and *-Y-(P/X)-*-(Y/K)-*-D, where * is any residue, Y is DOPA and X was speculated

to be glycosylated threonine [33]. Within the EST match, the one DOPA (Y) residue was

identified in the first full repeat though there are others (as detected by Rzepecki and Waite,

1993) that were not detected by the LC-MS/MS machine. The deduced Dpfp2 sequence contains

only 7 mol% aliphatic residues, 24% aromatic residues and 32% charged residues. There is an

equal distribution of non-polar and polar residues. Comparing hydrophobicity, the Kyte-Doolittle

Hydropathy Plot in Figure 2-5B illustrates a repeating pattern of rising and falling

hydrophobicity. Higher hydropathy scores indicate higher hydrophobicity. The KT/KK residues

at the 18th position in the consensus represent the hydrophilic start and end of each of the central

four hydrophobic peaks.
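The tandem repeat structure can be checked by cutting the EST-derived sequence into 22-residue units and profiling each repeat position. The sketch below is illustrative; the offset of the first full repeat (after the leading YQE) and the repeat count follow the layout of Figure 2-5A, and the helper names are hypothetical.

```python
from collections import Counter

# EST-derived Dpfp2 sequence (AM229730) without the adaptor, as in Figure 2-5A.
DPFP2 = (
    "YQE"
    "KTYPGYPPKQAYPVYPEKTYPE"
    "KTYPAYPTKKSYPEYPEKTYTK"
    "KTYEAYPTKDSYTVYPDKKYTE"
    "KTYEAYPTKDSYTVYPDKKYTE"
    "KKYEAYPTKQSYPVYPEKKYPE"
    "KPYPGYQDYWGK"
)
REPEAT_START = 3    # 0-based index of the first full repeat
REPEAT_LENGTH = 22
REPEAT_COUNT = 5

repeats = [
    DPFP2[REPEAT_START + i * REPEAT_LENGTH:REPEAT_START + (i + 1) * REPEAT_LENGTH]
    for i in range(REPEAT_COUNT)
]


def column_profile(rows):
    """For each repeat position, residues observed, most frequent first."""
    return ["".join(aa for aa, _ in Counter(col).most_common()) for col in zip(*rows)]


profile = column_profile(repeats)
for position, residues in enumerate(profile, start=1):
    print(position, residues)

# Positions (1-based) where every repeat carries tyrosine.
tyrosine_positions = [i + 1 for i, residues in enumerate(profile) if residues == "Y"]
print("fully conserved tyrosine positions:", tyrosine_positions)
```

Positions that list a single residue are fully conserved across the five repeats. Positions listing two or more residues correspond to the variable sites of the consensus.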

A.

YQE (3)

KTYPGYPPKQAYPVYPEKTYPE (25)

KTYPAYPTKKSYPEYPEKTYTK (47)

KTYEAYPTKDSYTVYPDKKYTE (69)

KTYEAYPTKDSYTVYPDKKYTE (91)

KKYEAYPTKQSYPVYPEKKYPE (113)

KPYPGYQDYWGK (125)

B.

Figure 2-5. Illustration of the tandem repeat pattern identified in the EST derived sequence of

Dpfp2 (AM229730). (A) Sequence depicting five full repeats of a 22 residue consensus sequence

KTY(P/E)AYPTK(Q/D)SYPVYPEKKYTE where non-italicized residues represent highly

conserved residues. Each full repeat is on a new line and tyrosine residues with conserved

positions within the consensus are indicated in bold. The bracketed numbers represent the

sequence position of the last amino acid in the row. The underlined residues indicate post-

translational modifications; Q and Y signify glutamine deamidation and tyrosine hydroxylation

to DOPA, respectively. (B) Kyte-Doolittle hydropathy plot of the sequence. Higher hydropathy

scores indicate higher hydrophobicity.
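A hydropathy profile of this kind can be computed with a sliding-window average of Kyte-Doolittle scores. The sketch below is a minimal version; the window length of 9 is an assumption (the window used for Figure 2-5B is not stated here), so the curve may differ in detail from the plotted one.

```python
# Kyte-Doolittle hydropathy scale (higher means more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# EST-derived Dpfp2 sequence (AM229730) without the adaptor, as in Figure 2-5A.
DPFP2 = (
    "YQE"
    "KTYPGYPPKQAYPVYPEKTYPE"
    "KTYPAYPTKKSYPEYPEKTYTK"
    "KTYEAYPTKDSYTVYPDKKYTE"
    "KTYEAYPTKDSYTVYPDKKYTE"
    "KKYEAYPTKQSYPVYPEKKYPE"
    "KPYPGYQDYWGK"
)


def hydropathy_profile(sequence, window=9):
    """Sliding-window mean hydropathy; entry i covers residues i to i+window-1 (0-based)."""
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence]
    return [sum(scores[i:i + window]) / window for i in range(len(scores) - window + 1)]


profile = hydropathy_profile(DPFP2)
for start, value in enumerate(profile, start=1):
    print(f"{start:3d} {value:6.2f}")
```

Because the third and fourth repeats are identical, windows lying entirely within one of them give the same score 22 residues later, which is the source of the periodic rise and fall in the plot.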

2.5 Discussion

We report here on the byssal distribution and sequence properties of novel and previously known

byssal proteins. We identified three novel zebra mussel byssal proteins by performing gel

electrophoresis of proteins extracted from freshly secreted, minimally cross-linked byssal

threads. These novel proteins, Dpfp0 (>210 kDa), Dpfp4 (>90 kDa) and Dpfp5 (~30 kDa) did

not previously stain for DOPA in the foot extract [33] and were thus never before known to be

present in the byssus. Further, we used peptide fragment fingerprinting (LC-MS/MS analysis and

cDNA database matching) to determine a likely protein sequence for the novel Dpfp5 protein.

We also identified two previously known DOPA proteins, Dpfp1 (76 and 65 kDa) and Dpfp2 (26

kDa) on the gel and determined a more complete sequence of Dpfp2 to complement the

previously known fragments of the sequence [33].

The EST derived sequence of Dpfp5 displays interesting repeat patterns and sequence properties

within the protein (Figure 2-3). Sequence repeats are a characteristic of several adhesive proteins

including those in the much studied marine mussels and sandcastle worm [65], [42] and are

therefore important to study. The N-terminus of Dpfp5 (residues 1 – 68) is basic with a pI of

9.56, is quite hydrophilic as compared to the rest of the protein and lacks negative residues

identified in the rest of the protein. This N-terminus is rich in repeats of glutamine (Q, 21%) and

proline (P, 37%) residues that each make up only 7% of the rest of the protein. Extended

polyglutamine (polyQ) sequences are a characteristic of neurodegenerative diseases where

expansion of polyQ stretches is believed to cause aggregation of the protein. Popiel et al., 2004

found that inserting proline into the expanded polyQ stretch suppressed protein aggregation and

cytotoxicity [66]. As such, in Dpfp5, the glutamine chain possibly contributes to the

structure/function of the protein and the interspersed prolines may be present to ensure that the

glutamine residues do not aggregate. The N-terminus also has a number of SYPAYP repeats that

are interspersed between an additional set of P(K/Q)Q repeats. However, the number of these

repeats slightly varies between EST matches. While most of the EST matches have only three

full repeats of SYPAYPP(K/Q)Q, the match described in Figure 2-3 (AM230139) has an

additional fourth SYPTYPPKQ repeat (Figure 2-2). These slightly different sequences might

represent different Dpfp5 variants. These variants could arise as multiple copies of the same gene

in the form of different mature RNAs. The mRNA variants can be created by RNA editing or

alternative splicing of one primary RNA transcript [67]. Additionally, since the protein sample is

prepared by collecting byssal threads from several different mussels, allelic variation between

mussels could contribute to the different protein forms we see [29].

A BLAST search of the proline and glutamine chain in the first 29 Dpfp5 residues revealed PQQ

sequence homologies with a number of extracellular structural proteins including glycoproteins

from the zona pellucida (oocyte egg coat) of the Winter Flounder Flatfish [Pseudopleuronectes

americanus] (score 56.2) and with the Choriogenin H minor glycoprotein in the Zona Radiata of

Fundulus heteroclitus (score 53.7). These glycoproteins are believed to be involved in hardening

reactions involving alterations to the structure of the protein [68], [69]. This might thus be

comparable to the maturation process of the zebra mussel byssal thread. Lyons et al., 1993 found

PQQ repeats in the Winter Flounder glycoprotein gene to be part of a longer (PQQ)4PKY repeat

and suggested that the lysine, tyrosine and glutamine residues might be involved in cross linking

owing to their positioning [69]. As such, the conserved positioning of these residues in the Dpfp5

terminus could also indicate similar interactions. Importantly, such repeats, containing a proline

at every third residue position interspersed with hydrophilic residues, are a common feature of

many extracellular structural proteins [69] thus indicating a structural role for the N-terminus in

Dpfp5. PQQ homologies were also seen with a structural integral membrane protein [Streptomyces

roseosporus] (score 55.8), a CCAAT-binding transcription factor subunit HAPB

[Arthroderma benhamiae] (score 55.8) and the MEF2D transcriptional activator protein (score 53.7),

amongst other DNA binding proteins.

The 46 central residues in the Dpfp5 sequence (69 – 114) have a pI of 5.84 and are most

abundant in proline (17%), tyrosine (13%) and cysteine (13%). This region displays no

discernible repeat pattern but is interesting because it contains cysteine residues that are

otherwise absent from the rest of the Dpfp5 sequence. Such specific cysteine localization could

possibly indicate a role for disulfide bridge interactions by the middle region of the protein.

Cysteine residues are also known for their potential roles as antioxidants owing to the hydrogen

atom in the thiol group available for donation [70]. Yu et al., 2011 demonstrated in the marine

mussel Mytilus californianus that cysteine containing byssal proteins can provide an acidic,

reducing environment that can reduce dopaquinone back to DOPA and thus restore DOPA

adhesion [16]. They also found that this thiol rich protein can later transform into a cross-linker

with the DOPA containing protein by forming S-cysteinyldopa adducts [16]. Thus, Yu et al.,

2011 identified this cysteine rich protein as a plaque antioxidant as well as a cross-linker to

improve plaque cohesion [16]. The distribution of cysteine residues uniquely within the middle

region of Dpfp5 thus indicates that this region might play a similar role in maintaining DOPA

adhesion and in mediating cohesion within the byssus.

The Dpfp5 C-terminus (residues 115 – 183) with a pI of 4.21 also has its distinct repeats and

amino acid compositions. In contrast to 29% proline present in the rest of the sequence, the C-

terminus has absolutely no proline. It also has only 1 tyrosine compared to 17 in the rest of the

sequence. The near absence of tyrosine from the C-terminus indicates that if any DOPA

(hydroxylated tyrosine) is present in the protein it is likely not in the C-terminus. Thus, this

section of the protein is most likely not involved in any DOPA dependent adhesion/cohesion.

The C-terminus is also composed of a very high percent of glycine (28%) and valine (16%)

compared to only a few of each in the middle region and absolutely none in the N-terminus.

These residues are represented in a series of VGG repeats richly constituting the C-terminus. The

VGG repeats are also a characteristic of the sandcastle worm (Phragmatopoma californica)

adhesive protein pc-1 [36]. The sandcastle worm uses its adhesive proteins to stick sand grains

together underwater. It has evolved independently of the zebra mussel (the two belong to different

phyla) to incorporate a similar repeat sequence in its protein. This indicates that the VGG repeat

might be an important contributor to adhesion/cohesion and might be a key repeat for

bioadhesive glues to possess. In Dpfp5, the third glycine in VGG is occasionally substituted with

the negatively charged residues aspartic acid (D) and glutamic acid (E). These additional charges might

contribute to stronger charge-charge interactions of Dpfp5 within the byssus. GNN repeats

through the C-terminus and conserved positioning of asparagine (N) preceding the VGG repeat

may also have specific functions in the protein.

The EST derived sequence of Dpfp5 (Figure 2-3) reveals striking similarities as well as some

unique properties as compared to the byssal protein Dpfp1. Dpfp1 is an acidic protein with a

diblock polymer structure consisting of an N-terminus with 22 repeats of a heptapeptide

consensus motif and a C-terminus with 16 repeats of a tridecapeptide [35]. In Dpfp5, the N-

terminus has a highly basic pI of 9.56 whereas the rest of the protein has an acidic pI of 4.45.

This is similar to Dpfp1 which has an N-terminus with pI 9.02 and C-terminus with pI 4.62.

Such striking charge differences between the protein ends must have an implication in the

mechanism of assembly and/or modes of interactions of these proteins. In both Dpfp1 and

Dpfp5, the two oppositely charged termini could possibly interact with each other in a certain

way allowing the proteins to assemble into a specific structure. Or perhaps the two ends interact

with distinct surfaces or different byssal materials thus functioning as connectors or increasing

the range of interactions the protein can undergo. These speculations and sequence interactions

must be further investigated by studying peptide mimics of these proteins. Interestingly as well,

Dpfp5 (theoretical pI 6.41) is only the second identified acidic byssal protein after Dpfp1 (pI

5.24) since marine mussel byssal precursor proteins are generally basic [9]. Thus, Dpfp5 is

similar to Dpfp1 in that it is also acidic and its termini are distinct with different kinds of repeats

and charges; however Dpfp5 is different in that it also has a middle sequence with no discernible

pattern that connects the two blocks. Further, the Dpfp5 N-terminus is quite hydrophilic in

comparison with the rest of the protein. This is unlike Dpfp1 which shows no such distinction.

The protein termini in Dpfp5 may thus have different solubility properties and different

affinities. For example, the hydrophilic N-terminal end might have a greater affinity for

hydrophilic surfaces or other hydrophilic proteins within the byssus whereas the hydrophobic C-

terminus would prefer to stay buried among hydrophobic proteins or maintain interactions with

hydrophobic surfaces at the plaque-substrate interface.
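The contrast between the Dpfp5 termini can be illustrated with two crude per-region descriptors: the balance of basic (K+R) and acidic (D+E) residues, and the mean Kyte-Doolittle hydropathy (GRAVY). The sketch below is only a proxy: residue counts do not replace a pI calculation, the region boundaries are those used in the text, and the helper name is hypothetical.

```python
# Kyte-Doolittle hydropathy scale (higher means more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# EST-derived Dpfp5 sequence (AM230139) without the adaptor, as in Figure 2-3.
DPFP5 = (
    "YNSWPPKPNQPQQPQQPQQPPQPPRYPQP"
    "SYPAYPPQQSYPAYPPKQSYPTYPPKQSYPAYPPKQSYP"
    "TNPPYNPCDAVYCRPIYCNYGQYTPQGECCPQCNPGTYLPEKW"
    "SWKGNNVVGDQEKYVGEGNNVGEQRNDVDGNENIVGGQSNA"
    "VGGKGNDVGEQKNAVGGSGNTVGWQGNNVGG"
)

# Region boundaries from the text (1-based, inclusive).
REGIONS = {"N-terminus": (1, 68), "middle": (69, 114), "C-terminus": (115, 183)}


def summarize(segment):
    """Count basic and acidic residues and compute mean hydropathy for a segment."""
    basic = sum(segment.count(aa) for aa in "KR")
    acidic = sum(segment.count(aa) for aa in "DE")
    gravy = sum(KYTE_DOOLITTLE[aa] for aa in segment) / len(segment)
    return {"K+R": basic, "D+E": acidic, "GRAVY": gravy}


summary = {name: summarize(DPFP5[start - 1:end]) for name, (start, end) in REGIONS.items()}
for name, values in summary.items():
    print(f"{name}: K+R={values['K+R']} D+E={values['D+E']} GRAVY={values['GRAVY']:.2f}")
```

The N-terminus carries basic residues but no acidic ones, whereas acidic residues outnumber basic ones in the C-terminus, in line with the pI values quoted above. The GRAVY values should be read in relative terms only, as a comparison between regions.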

Knowledge about the distribution and sequence properties of byssal proteins can also give us

useful insights into their function and mode of adhesion/cohesion within the byssus. The newly

identified Dpfp5 byssal protein was identified by gel electrophoresis in both the thread and the

plaque extracts (Figure 2-1B). It must therefore have a role relevant to both. Since a plaque-

substrate adhesive role would not be required in the thread, we suggest that this protein might

have a cohesive or an alternate adhesive role in the byssus. This role could possibly include

DOPA dependent interactions such as cohesive DOPA quinone cross-linking or metal mediated

interactions in the cuticle [9]. It could possibly also include DOPA-independent chemical

interactions involving other protein residues such as covalent bonding, ionic binding, hydrogen

bonding, dipole interactions and/or van der Waals interactions. Since the sequencing method did

not reveal information on the presence or absence of DOPA in Dpfp5, we can only speculate on

the protein's dependence on DOPA. Dpfp5 has only 10% tyrosine in its sequence compared to

23% in the derived Dpfp2 sequence and 15% in Dpfp1. Thus Dpfp5 has less tyrosine available

for hydroxylation to DOPA and hence possibly has a less DOPA dependent role than the other

proteins. That Dpfp5 might contain little or no DOPA is consistent with the finding that Dpfp5

did not stain for DOPA when extracted from the mussel’s foot by Rzepecki and Waite, 1993b

[33]. Lower DOPA compositions could also help explain better extraction and/or electrophoresis

of Dpfp5 compared to the other byssal proteins, as evidenced by the more prominent bands on

the gel (Figure 2-1B).
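
The tyrosine percentages compared above are residue compositions of the derived sequences. A minimal sketch of such a calculation is given below; the input is one variant of the 22-residue Dpfp2 consensus repeat reported in this chapter (the first alternative is chosen at each variable position), used only as an example and not as the full protein sequence.

```python
from collections import Counter


def percent_composition(sequence):
    """Return {residue: percentage of all residues} for a one-letter protein sequence."""
    counts = Counter(sequence)
    total = len(sequence)
    return {residue: 100.0 * n / total for residue, n in counts.items()}


# One variant of the Dpfp2 consensus repeat (illustrative input only).
repeat = "KTYPAYPTKQSYPVYPEKKYTE"
tyr_percent = percent_composition(repeat)["Y"]
print(f"Tyrosine: {tyr_percent:.1f}% of residues")
```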

In addition to Dpfp5, we also investigated the sequence properties of the previously identified

DOPA containing protein, Dpfp2. Dpfp2 runs at 26 kDa on the gel and, like Dpfp5, is present in

both the thread and the plaque extracts (Figure 2-1). Thus, it must have a DOPA dependent role

that is relevant to both the thread and plaque. However, the Dpfp2 gel bands are not as prominent

as the Dpfp5 bands. This may possibly be because the presence of DOPA makes the protein

more resistant to extraction and electrophoresis. Owing to its highly basic pI of 9.32, it is also

possible that the protein is not sufficiently extracted in basic extraction buffer. With Dpfp2 as

well, the N-terminus of the EST derived sequence is incomplete, but here the most striking

properties seem to be at the sequence core. The bulk of the protein, from residues 4 to 117,

consists of five full repeats of a 22-residue consensus sequence

KTY(P/E)AYPTK(Q/D)SYPVYPEKKYTE where non-italicized residues represent highly

conserved residues (Figure 2-5A). Interestingly, the position of the tyrosine residues is

conserved in all of the repeats. Since tyrosine is hydroxylated to DOPA, this indicates that the

positioning of the DOPA residue plays an important role in the structure/function of the protein.

The deamidation of glutamine (Q) near the N-terminus may also hold some significance in the

function of the protein. Sagert and Waite, 2009 suggest that deamidation in the byssal proteins

might occur to provide charge heterogeneity or primary structure heterogeneity to the protein

[25]. It is however also possible that the deamidation seen here occurs due to experimental

conditions such as temperature, buffer and the basic pH [71]. It is also interesting that the

AYPTK(D/Q) and the SYPVYPE regions within the consensus are somewhat similar to the

SYPAYP repeat in the N-terminus of the Dpfp5 protein. Dpfp2 is very different from Dpfp1 and

Dpfp5 in that it does not have a copolymer block structure and is highly basic. It has no

distinctions between different regions of the protein. It has the same consensus sequence

throughout and alternating hydrophobic and hydrophilic regions as seen in Figure 2-5B. Thus,

its assembly and modes of interaction are potentially different from those of the other byssal

proteins.
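
The Dpfp2 consensus sequence can be written as a regular expression in which each (X/Z) alternative becomes a character class. The sketch below is illustrative only: the toy sequence is hypothetical and is not the Dpfp2 sequence, and an exact-match pattern of this kind would miss the near-consensus repeats that deviate at other positions.

```python
import re

# KTY(P/E)AYPTK(Q/D)SYPVYPEKKYTE with each alternative as a character class.
CONSENSUS = re.compile(r"KTY[PE]AYPTK[QD]SYPVYPEKKYTE")


def find_repeats(sequence):
    """Return (start, matched_text) for each non-overlapping exact consensus match."""
    return [(m.start(), m.group()) for m in CONSENSUS.finditer(sequence)]


def tyrosine_offsets(repeat):
    """0-based positions of tyrosine (Y) within one repeat."""
    return [i for i, residue in enumerate(repeat) if residue == "Y"]


# Hypothetical sequence: three tandem consensus variants after a short leader.
toy = (
    "GSA"
    + "KTYPAYPTKQSYPVYPEKKYTE"
    + "KTYEAYPTKDSYPVYPEKKYTE"
    + "KTYPAYPTKDSYPVYPEKKYTE"
)
repeats = find_repeats(toy)
offsets = {tuple(tyrosine_offsets(text)) for _, text in repeats}
print(len(repeats), "repeats; tyrosine offsets within each repeat:", offsets)
```

A single set of offsets shared by every repeat corresponds to the conserved tyrosine positions noted above.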

Comparisons of zebra mussel adhesive proteins with adhesive proteins from other species can be

useful in determining common properties of adhesive mixtures that have evolved independently.

The zebra mussel byssus contains a mixture of basic (Dpfp2) and acidic (Dpfp1 and Dpfp5)

proteins. Like the zebra mussel, the sandcastle worm also has adhesive proteins with distinct

charges. One of these is strongly acidic with a pI of 2.5 and two are basic with a pI greater than 9

[36]. These protein pI similarities between the species indicate a possible requirement for

distinctly charged proteins in an adhesive mixture. In marine mussels however, the byssal

precursor proteins are generally basic [9]. Like Dpfp1 and Dpfp5, one of the adhesive proteins in

the sandcastle worm, pc-3A, is also a highly acidic diblock protein with an acidic N-terminus

and basic C-terminus [41]. Thus, its sequence distribution is similar to that of Dpfp1 and Dpfp5,

once again suggesting similar adhesive mechanisms between species.

It is interesting that while several EST sequence matches were obtained for Dpfp5 and Dpfp2

bands, no EST sequences matched to Dpfp1. It is unlikely that this is due to limitations in the

tryptic digestion and LC-MS/MS of the Dpfp1 gel band because three sufficient peptide matches

were obtained against the full sequence of Dpfp1 (Table 2-3). It is thus possible that the EST

library lacks the cDNA for this protein, perhaps because its mRNA was not present in the

mussel’s foot at the time that the library was created. In addition to Dpfp5, two other novel

byssal proteins, Dpfp0 and Dpfp4 were also identified by gel electrophoresis (Figure 2-1). Like

Dpfp5, these proteins did not previously stain for DOPA in the foot extract [33]. Dpfp0, with a

molecular weight greater than 210 kDa, runs at a similar molecular weight to the Dbfp0 protein

in the closely related freshwater mussel, the quagga mussel (Dreissena bugensis) [33]. The zebra

and quagga mussels possess a number of potentially homologous protein pairs including Dpfp1

and Dbfp1, Dpfp2 and Dbfp2, and Dpfp3 and Dbfp3. Thus, it is possible that Dpfp0 and Dbfp0

are also homologues of each other. However, Dbfp0 is known to be a DOPA containing protein,

indicating that if these are in fact homologous, then Dpfp0 might be a DOPA containing protein

as well. It would be premature to rule out Dpfp0 as a DOPA protein simply because it did not stain

for DOPA in the foot extract in the experiments done in 1993 [33]. Its low band density

compared to the other gel bands in Figure 2-1 could indicate that it is not easily extracted or not

present in sufficient quantities in the byssus, which could explain why it did not

previously stain for DOPA [33]. With regard to localization, Dpfp0 was seen in the plaque

extract but not in the thread extract. However, almost twice as much plaque extract was loaded

and hence we cannot conclude that this protein is unique to the plaque. The novel protein Dpfp4,

on the other hand, appears to be present uniquely in the thread, but the bands are too faint to draw

any conclusions on the distribution of the protein.

Our investigation has thus provided some useful insights into proteins constituting the zebra

mussel byssus. Analysis of induced, freshly secreted byssal threads allowed us to extract proteins

at a stage of minimal DOPA cross-linking. Electrophoresis of these extracts allowed us to

identify novel byssal proteins and compare their distribution between the thread and plaque. The

novel protein Dpfp5 was found localized in both the thread and plaque and its putative sequence

revealed distinct N and C termini with distinct repeats and amino acid properties. The incomplete

N-terminus, owing to the incomplete 5’ end of the cDNA sequence, however limits our complete

understanding of the Dpfp5 sequence. The similarities of Dpfp5 to the diblock copolymer

structure of Dpfp1 and its distinction from the uniblock, basic properties of the Dpfp2 protein

provide an interesting insight into byssal protein properties. Further, comparisons of these zebra

mussel proteins with adhesive proteins in the much studied marine mussels and sandcastle worm

reveal common adhesive mechanisms that have evolved independently and that must therefore

be important in adhesion. Future work must look into zebra mussel protein distribution at the

plaque substrate interface. Adhesive/cohesive interactions of the byssal proteins must also be

investigated by studying peptide mimics of the proteins. An understanding of zebra mussel

adhesion will ultimately be useful in the development of biocompatible and water resistant

adhesives for medical and dental applications. Additionally, this knowledge will reveal byssal

properties that must be targeted in the development of antifouling agents against this biofouling

species.

2.6 Acknowledgments

The authors gratefully acknowledge Dr. Craig Simmons and Dr. Ben Ganss for access to

electrophoretic equipment and Zahra Shahrokh for advice on protocols. We would like to thank

Trevor Gilbert and Kyle Serkies for collecting the mussels. We also thank Li Zhang and

Reynaldo Interior of the Advanced Protein Technology Centre, Sick Kids, Toronto for LC-

MS/MS and amino acid analysis, respectively. This work was supported by the Natural

Sciences and Engineering Research Council (NSERC) of Canada, the Canada Foundation for

Innovation (CFI), and an Ontario Graduate Scholarship (OGS).

Chapter 3:

Novel Proteins Identified in the Insoluble Byssal Matrix

of the Freshwater Zebra Mussel Dreissena polymorpha

Arpita Gantayet a and Eli D. Sone a,b,c,*

a Institute of Biomaterials & Biomedical Engineering; University of Toronto, Toronto, ON,

Canada

b Department of Materials Science & Engineering, University of Toronto, Toronto, ON, Canada

c Faculty of Dentistry; University of Toronto, Toronto, ON, Canada

*Corresponding author: Email: [email protected]

This chapter is in preparation as a manuscript to be submitted to the journal ‘Marine

Biotechnology’.

3.1 Abstract

The freshwater zebra mussel Dreissena polymorpha is an invasive, biofouling species that

adheres to a variety of substrates underwater using a proteinaceous holdfast called the byssus and

is therefore an inspiration for the development of water-resistant bioadhesives for medical and

dental applications. The byssus, consisting of a number of threads with adhesive plaques at the

tips, utilizes a rare amino acid called 3,4-dihydroxyphenylalanine (DOPA) to mediate adhesion

and cohesion within the byssus. This is similar to the much-studied marine mussel byssus but the

DOPA compositions are lower in zebra mussels, thus indicating the importance of other non-

DOPA interactions as well. Extensive DOPA cross-linking however renders the zebra mussel

byssus highly resistant to analysis and therefore limits byssal protein identification. We report

here on the sequencing and identification of seven novel byssal proteins in the insoluble byssal

matrix following protein extraction from induced, freshly secreted byssal threads with minimal

cross-linking. Comparisons of the protein sequences, as determined by LC-MS/MS analysis and

spectrum matching against a zebra mussel cDNA library, identified repeat patterns and block

structures as common features of zebra mussel byssal proteins and identified varying theoretical

molecular weights (4.1 to 14.6 kDa) and isoelectric points (4.2 to 9.6) of byssal proteins. All

proteins contain one or more defined sequence motifs including glycine rich, proline and tyrosine

rich and proline and cysteine rich motifs and all of the proteins were identified in both the thread

and plaque matrices.

Keywords: bioadhesion, DOPA, LC-MS/MS, mussel adhesion proteins, plaque, threads

3.2 Introduction

Zebra mussels (Dreissena polymorpha) are an invasive freshwater mussel species that are native

to the Black, Caspian and Azov Seas and were accidentally introduced into North American

water bodies such as the Great Lakes in the late 1980s [1]. These bivalves are able to attach

strongly to a variety of hard substrates underwater using a proteinaceous structure called the

byssus that consists of a number of threads with adhesive plaques at the tips and that is

surrounded by a protective layer called the cuticle [33]. The mussels’ ability to attach to boat

hulls has allowed them to spread rapidly over the years and in addition to causing major

ecological repercussions, this biofouling species is also able to clog water intake pipes and affect

water based industries such that its economic impact has been far in excess of $100 million [2].

The zebra mussel byssus is superficially similar to the byssi of the marine mussels which have

evolved independently as a different subclass [8], [3] and have been studied much more

extensively [18]; however, the zebra and marine mussel byssi differ in their overall amino acid

contents and hence, their overall protein compositions [33]. The byssi of both species contain

proteins containing the rare amino acid 3,4-dihydroxyphenylalanine (DOPA); however, the

marine mussels have a much higher content of this residue, which is produced by post-translational

hydroxylation of tyrosine [10, 18]. Additionally, while the zebra mussel byssus contains similar

compositions of DOPA and other amino acids within the threads and adhesive plaques, marine

mussels have threads composed of collagenous proteins versus DOPA containing proteins

localized in the plaques [9]. Underwater adhesive proteins are a characteristic of several aquatic

species including mussels [9], sandcastle worms [36], barnacles [37], starfish [38], sea

cucumbers [39] and caddisfly larvae [40]. The sandcastle worm, Phragmatopoma californica has

also evolved independently of the mussels to incorporate DOPA residues in some of its cement

proteins for the purpose of sticking sand grains together underwater [41]. Thus, DOPA is an

important component of adhesive proteins in other species as well. In the mussel byssus, DOPA

can undergo a variety of adhesive and cohesive interactions. It can form multiple metal mediated

ligations to give cohesive strength to the cuticle [46], it can undergo catechol oxidase mediated

oxidation to DOPA quinone followed by covalent cross-linking with DOPA, lysine and cysteine

in order to provide cohesive strength to the thread and plaque [15] and in its native form, DOPA

can bind to metal oxide surfaces and mediate surface adhesion [11]. However, the low DOPA

content in zebra mussel byssal proteins [10] and the mussels' ability to adhere to a variety of

substrates (both hydrophobic and hydrophilic) [72] indicate that amino acid interactions

not involving DOPA must also contribute to the adhesion and cohesion functions of byssal proteins.

Information on the protein compositions of the zebra mussel byssus and an insight into their

sequence properties will thus be useful in understanding the molecular mechanism of zebra

mussel adhesion. This knowledge will ultimately be useful in the design of biocompatible and

water-resistant adhesives for dental and medical applications and will contribute to the design of

targeted anti-fouling agents against this biofouling species.

Mature zebra mussel byssal threads are highly resistant to extraction and immunolocalization due

to extensive cross-linking of DOPA residues [33] thus making protein composition analysis quite

difficult. So far, six Dreissena polymorpha foot proteins (Dpfp0 – 5) have been identified by gel

electrophoresis and the primary sequences of three of these are known. Dpfp1, Dpfp2 and Dpfp3

were first identified as byssal proteins based on their ability to stain for DOPA in an

electrophoresed extract from the ‘foot’, the organ that secretes precursor byssal proteins to form

byssal threads [33]. Dpfp0, Dpfp4 and Dpfp5 were identified as silver-stained gel bands in a

soluble extract from induced, freshly secreted byssal threads with minimal cross-linking [73].

However, no information is available on the DOPA content of these proteins. Table 3-1

describes the molecular weights, DOPA contents and primary sequence information available for

each of the six identified byssal proteins. While Dpfp1 is the only protein for which the full

primary sequence is known [35], the primary sequences of Dpfp2 [33], [73] and Dpfp5 [73] have

also been determined by tandem mass spectrometry analysis and database matching against a

zebra mussel foot cDNA library created by Xu and Faisal, 2008 [56]; however, the cDNA

sequences are potentially incomplete at the N-terminus. Dpfp1 has a block-copolymer structure

consisting of 22 repeats of a heptapeptide P(V/E)YP(T/S)(K/Q)X at the N-terminus and 16

repeats of a tridecapeptide KPGPY*DYDGPYDK at the C-terminus, where Y* represents

DOPA [35]. Dpfp2 contains five full repeats of the near consensus sequence

KTY(P/E)AYPTK(Q/D)SYPVYPEKKYTE where the position of the tyrosine residue is always

conserved [73]. Dpfp5 also has a block structure where the N-terminus is rich in proline (P) and

glutamine (Q) triads, the C-terminus is rich in valine (V), glycine (G) and asparagine (N) based triads

and the middle region contains cysteine that is absent in the other sequences [73]. Interestingly,

all three byssal proteins have been identified in both the thread and the plaque [33], [73]. Matrix-

assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS) of

mature byssal threads has additionally revealed the presence of several small molecular weight

byssal proteins in the range of 3.7 – 7 kDa that have not been identified yet [34]. These proteins

have distinct distributions between different regions of the byssus including the thread, plaque

and 10-20 nm [6] adhesive interface [34]. Identification of these proteins and information on

their sequences can therefore provide useful insights into the adhesive/cohesive roles of byssal

proteins.

Since mature byssal threads are highly resistant to extraction, and protein identification in the

mussel's foot reveals no information on byssal distribution, we induce the secretion of fresh

threads such that byssal protein composition can be studied after secretion from the foot but

before extensive cross-linking. This method has been used extensively in marine mussels [58],

[25] and once before in the freshwater zebra mussels [73]. In marine mussels, it has been shown

that the induced byssal thread compositions are almost the same as in naturally secreted threads

[16], [25]. Assuming a similar scenario in zebra mussels, we use fresh, induced threads as a

model system to better understand mechanisms in natural threads. Following protein extraction

from induced byssal threads, the byssal sample can be centrifuged to separate out soluble extract

and insoluble matrix proteins. However, while the soluble proteins can be separated and

analyzed by electrophoresis [73], such analysis is not possible for the insoluble byssal proteins.

Therefore, we characterize the protein composition of the insoluble matrix by directly

performing tandem mass spectrometry analysis following trypsin digestion of the pellet proteins.

Here, we report on the identification of novel as well as previously known byssal proteins in

base-insoluble thread and plaque matrices and analyze the sequences of the seven novel byssal

proteins identified by zebra mussel cDNA database matching.

Table 3-1. Summary of the molecular weight and sequence information available for the six

identified zebra mussel byssal proteins (Dpfp), in decreasing order of their molecular weights as

determined by gel electrophoresis.

| Dreissena polymorpha foot protein (Dpfp) | MW by electrophoresis (kDa) | MW by MALDI-TOF (kDa) [35] | Max DOPA content [33] | Primary sequence information known | Theoretical MW and pI based on primary sequence |
|---|---|---|---|---|---|
| Dpfp0 [73] | > 210 | - | - | - | - |
| Dpfp4 [73] | > 90 | - | - | - | - |
| Dpfp1 [33], [73] | 76 & 65 | 54.5 & 48.6 | 6.6% | Full sequence [35] | 49 kDa, pI 5.3-6.5 |
| Dpfp5 [73] | ~ 30 | - | - | Incomplete N-terminus [73] | 20.1 kDa a, pI 6.4 a |
| Dpfp2 [33], [73] | 26 | - | 7.0% | Incomplete N-terminus [73] | 15.3 kDa a, pI 9.4 a |
| Dpfp3 [33] | 12-13 | - | Unknown | - | - |

a MW and pI are theoretically calculated based on primary sequence lacking signal peptide

3.3 Methods

3.3.1 Protein extraction from induced byssal threads/plaques

Zebra mussels collected from Round Lake, Ontario, Canada were kept for up to 60 days at room

temperature in an aquarium in artificial freshwater prepared with a recipe by M. Sprung, 1987

[59]. Protein extractions were performed as described previously [73]. Firstly, as described by

Tamarin et al., 1976, the mussels were dissected and the foot was injected with approximately 30

µL of 0.56 M potassium chloride (KCl) using an 18G syringe, thus leading to the secretion of a

fresh byssal thread [58]. After 3-5 minutes, the induced thread/plaque was located in the foot’s

ventral groove, was pulled out with tweezers, washed in a drop of deionized water and extracted

in extraction buffer. The extraction method was adapted from Zhao and Waite, 2006 [30]; byssal

threads (around 6 to 14 per extraction) were extracted in 250 µL of basic extraction buffer (EB)

(0.2 M sodium borate, 4 M urea, 1 mM KCN, 1 mM EDTA, and 10 mM ascorbic acid) that was

prepared using a recipe adapted from Rzepecki and Waite, 1993 [33]. Samples were

homogenized on ice in a 1 mL ground-glass hand-held tissue grinder, sonicated with a probe

sonicator (15 times, 2 s each) and centrifuged (17,000 g, 8 min, 4°C) [30]. The supernatant

(soluble extract) and the pellet (insoluble matrix) were stored separately at -20°C. Where

relevant, the byssal thread was separated into thread and plaque prior to extraction.
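
As a worked example of the quantities involved, the amount of KCl delivered by a single injection follows directly from the volume and concentration given above. The sketch assumes the nominal 30 µL volume; the molar mass of KCl is a standard value that is not stated in the text.

```python
# Amount of KCl delivered by one injection (nominal values from the protocol above).
volume_l = 30e-6          # 30 µL expressed in litres
concentration_m = 0.56    # mol/L
KCL_MOLAR_MASS = 74.55    # g/mol, standard value for KCl (assumption, not from the text)

moles = volume_l * concentration_m
mass_mg = moles * KCL_MOLAR_MASS * 1e3
print(f"{moles * 1e6:.1f} µmol KCl, about {mass_mg:.2f} mg")
```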

3.3.2 Protein digestion

The base-insoluble matrix proteins were trypsin digested at the Advanced Protein Technology

Centre, Sick Kids Hospital, Toronto by suspending in 50 mM ammonium bicarbonate, reducing

with 10 mM DTT (30 min, 56°C), alkylating with 55 mM iodoacetamide (15 min, dark, room

temperature) and then digesting with 1 µg trypsin (Porcine, Sequencing Grade, Promega)

overnight at 37°C. Extracted peptides were lyophilized by SpeedVac centrifugation and

reconstituted in 20 µL 0.1% formic acid in water for LC-MS/MS analysis.
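
The peptides expected from such a digest can be predicted in silico. The sketch below assumes the commonly used simplified trypsin rule (cleavage C-terminal to lysine or arginine, except when the next residue is proline) and is not the procedure used by the search software. The example input is one variant of the Dpfp2 consensus repeat.

```python
def tryptic_digest(sequence, missed_cleavages=0):
    """Cleave after K or R unless followed by P (simplified trypsin rule)."""
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])  # C-terminal remainder

    # Join neighbouring fragments to allow up to `missed_cleavages` missed sites.
    peptides = []
    for i in range(len(fragments)):
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


digest = tryptic_digest("KTYPAYPTKQSYPVYPEKKYTE")
print(digest)
```

The lysine-rich repeat yields several very short fragments, which illustrates why allowing missed cleavages can matter when matching spectra to such sequences.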

3.3.3 Liquid chromatography – tandem mass spectrometry (LC-MS/MS)

LC-MS/MS analysis of extracted peptides was performed by the Advanced Protein Technology

Centre at MaRS Discovery District, Toronto. The digested peptides were loaded onto a 100 μm

ID pre-column (Dionex) at 4 μl/min and separated over a 50 μm ID analytical column (C18, 2 μm,

Dionex). The peptides were eluted over 60 min at 250 nl/min using a 0 to 35% acetonitrile

gradient in 0.1% formic acid using an EASY n-LC 1000 nano-chromatography pump (Thermo

Fisher, Odense Denmark). The peptides were eluted into a Q-Exactive hybrid

quadrupole/orbitrap mass spectrometer (Thermo-Fisher, Bremen, Germany) operated in

data-dependent mode. Data were acquired at 70,000 FWHM resolution in the MS mode and 17,500

FWHM in the MS/MS mode. Ten MS/MS scans were obtained per MS cycle.

3.3.4 Database matching and protein identification

MS data were analyzed using two mass spectrometry software packages, SEQUEST (Thermo Fisher

Scientific, San Jose, CA, USA; version 1.3.0.339) and PEAKS (Bioinformatics Solutions Inc.,

Waterloo, Ontario, Canada). While SEQUEST directly matches the mass spectrum data from the

protein extract against theoretical spectra from a library [74], the PEAKS program first

determines de novo peptide sequences based on the mass spectrum data and then matches these

sequences against a library [75]. In both programs, MS/MS data was searched against zebra

mussel and metazoa protein databases and against a zebra mussel Expressed Sequence Tag

(EST) library virtually translated in six different reading frames using Virtual Ribosome 1.1

(http://www.cbs.dtu.dk/services/VirtualRibosome/). The cDNA library comprising 750 genes

with Accession numbers AM229723 to AM230448 (downloaded November 2011 from the

Genbank Server) was prepared by Xu and Faisal, 2008 using a BD Clontech PCR-Select cDNA

Subtraction Kit [56] and represents genes expressed uniquely in the mussel foot. During creation

of this library, base pairs were removed from the 5’ end of the DNA to add adaptor sequences for

cDNA amplification. If the base pairs are removed from the 5’ translated region, the virtual

protein sequences are incomplete at the N-terminus [62].
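
The six-frame virtual translation described above can be sketched as follows. This is an illustration only and not the Virtual Ribosome implementation: it assumes the standard genetic code, writes stop codons as "*", and uses a short hypothetical DNA fragment.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (last base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. containing N) become X."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return translations for frames +1, +2, +3, -1, -2 and -3."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


# Hypothetical cDNA fragment, for illustration only.
frames = six_frame_translation("ATGAAAACCTATCCG")
for name, protein in frames.items():
    print(name, protein)
```

Searching all six frames is what allows a peptide to be matched even when the orientation and reading frame of an EST are unknown.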

When using SEQUEST, tandem mass spectra were first extracted, charge state deconvoluted and

deisotoped by BioWorks version 3.3. SEQUEST was searched with a fragment ion mass

tolerance of 20 PPM and a parent ion tolerance of 15 PPM and hits were manually confirmed by

inspecting the spectra. The iodoacetamide derivative of cysteine was specified as a fixed

modification and deamidation of asparagine and glutamine, hydroxylation of tyrosine (DOPA)

and oxidation of methionine were specified as variable modifications. MS/MS based peptide and

protein identifications were visualized and validated using a program called Scaffold 3.4.9

(Proteome Software Inc., Portland, OR). SEQUEST peptide identifications required

deltaCn scores greater than 0.10 and XCorr scores greater than 1.2, 1.5, 2.0 and 2.2 for

singly, doubly, triply and quadruply charged peptides respectively. The XCorr score for peptide

matches measures how closely the actual spectra fit the theoretical spectra [74]. Peptide

identifications were accepted at greater than 95.0% probability and protein identifications were

accepted at greater than 95.0% probability with at least 2 identified peptides.
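
The SEQUEST acceptance criteria above amount to a charge-dependent score filter combined with a mass tolerance expressed in parts per million. A minimal sketch of both checks follows; the function names and the example values are hypothetical and do not reproduce the Scaffold or SEQUEST implementations.

```python
# Minimum XCorr by precursor charge state and minimum deltaCn, as stated in the text.
MIN_XCORR = {1: 1.2, 2: 1.5, 3: 2.0, 4: 2.2}
MIN_DELTA_CN = 0.10


def passes_sequest_filter(charge, xcorr, delta_cn):
    """True if a peptide-spectrum match exceeds both score thresholds."""
    threshold = MIN_XCORR.get(charge)
    if threshold is None:  # charge states not covered by the stated thresholds
        return False
    return xcorr > threshold and delta_cn > MIN_DELTA_CN


def within_ppm(observed_mz, theoretical_mz, tolerance_ppm):
    """True if the relative mass error is within the tolerance in parts per million."""
    error_ppm = abs(observed_mz - theoretical_mz) / theoretical_mz * 1e6
    return error_ppm <= tolerance_ppm


# Hypothetical matches: (charge, XCorr, deltaCn).
examples = [(2, 1.6, 0.20), (2, 1.4, 0.20), (3, 2.5, 0.05)]
results = [passes_sequest_filter(*psm) for psm in examples]
print(results)
```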

Using PEAKS, de novo sequences derived from MS/MS data were matched against the databases

with tyrosine hydroxylation to DOPA set as a variable modification. Parent ion and fragment ion

mass tolerances were set to 5 PPM and 0.01 Da respectively and hits were manually confirmed

by inspecting the spectra. In PEAKS, the identification confidence of the protein and peptide

matches is reported as a -10LogP score. Protein identifications were accepted at a

-10LogP score greater than 50 with at least two identified peptides. In some relevant EST

matches, in both SEQUEST and PEAKS, where the protein identification criteria were not met

and where fewer than two peptides were identified, the protein identification was justified based on

the presence of repeat patterns and/or similarities to other known byssal proteins. The exceptions

are described in the results section.
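
To make the -10LogP threshold concrete, the score can be converted back to the underlying value P. The sketch assumes that Log denotes the base-10 logarithm and that P is a probability-like value for the match being a chance hit; the text does not define P explicitly.

```python
import math


def score_to_p(score):
    """Convert a -10*log10(P) score back to P."""
    return 10 ** (-score / 10)


def p_to_score(p):
    """Convert P to a -10*log10(P) score."""
    return -10 * math.log10(p)


threshold_p = score_to_p(50)
print(f"A -10LogP score of 50 corresponds to P = {threshold_p:.0e}")
```

Under these assumptions, each increase of 10 in the score corresponds to a tenfold decrease in P.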

3.3.5 Sequencing data analysis

The theoretical mass, pI and amino acid composition of virtual EST protein matches were

determined using EMBOSS Pepstats and amino acid distribution metrics were determined using

the EMBOSS Pepinfo tools on the European Bioinformatics Institute website

(http://www.ebi.ac.uk/Tools/emboss/pepinfo/). Signal peptides were predicted using the SignalP

4.0 (http://www.cbs.dtu.dk/services/SignalP/) and PrediSi (http://www.predisi.de/) online tools

and multiple sequence alignments were performed with the Clustal-W2 online tool

(http://www.ebi.ac.uk/Tools/msa/clustalw2/). Protein homology searches were done using NCBI

Protein BLAST (Basic Local Alignment Search Tool)

(http://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Proteins). Conserved domains were predicted

using SMART (Simple Modular Architecture Research Tool) (http://smart.embl-heidelberg.de/).
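
A theoretical mass of the kind reported by such tools can be approximated by summing average residue masses and adding one water molecule for the free termini. The sketch below is not the EMBOSS Pepstats implementation; it ignores post-translational modifications, and the residue masses are standard average values that are not taken from this chapter.

```python
# Average residue masses in Da (amino acid minus one water); standard values.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # Da, added once for the N-terminal H and C-terminal OH


def molecular_weight(sequence):
    """Average molecular weight (Da) of an unmodified linear peptide."""
    return sum(AVERAGE_RESIDUE_MASS[residue] for residue in sequence) + WATER


# One variant of the 22-residue Dpfp2 consensus repeat, as an example input.
repeat = "KTYPAYPTKQSYPVYPEKKYTE"
print(f"{molecular_weight(repeat) / 1000:.2f} kDa for one repeat")
```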

3.4 Results and Discussion

3.4.1 Identification of novel and known proteins in base-insoluble thread and

plaque matrices

Because cross-linking makes mature zebra mussel byssal threads resistant to characterization,

we performed our analysis on fresh threads that have undergone minimal cross-linking.

Induced byssal threads were separated into thread and plaque and separate peptide fragment

fingerprinting analysis was performed on their insoluble matrices. This involved LC-MS/MS

analysis of the digested matrices followed by spectrum matching against a cDNA library of zebra

mussel foot proteins. Analysis of the MS/MS data with SEQUEST and PEAKS led to the

identification of a number of EST matches that sometimes overlapped and that were sometimes

unique to either program. EST matches were therefore primarily determined using SEQUEST

and PEAKS was then used to supplement the identifications. The identified EST matches were

found to represent both previously known zebra mussel byssal proteins as well as novel protein

sequences. Table 3-2 describes the accession numbers of the novel EST matches, the program

they were identified by as well as the probability of the protein and peptide-spectrum matches. In

addition to byssal proteins, contaminating cellular proteins from the foot tissue

including cytoplasmic actin, translation elongation factor 1α, alpha tubulin, histone 3 and

cyclophilin A were also identified with PEAKS -10LogP probabilities of 193, 120, 77, 70 and

49, respectively.

All of the EST matches (for both known and novel byssal proteins) obtained from the insoluble

matrices were identified in both the thread and the plaque extracts, though sometimes with

differing probabilities. The non-specific distribution of Dpfp1 [33], [73], Dpfp2 [73] and Dpfp5

[73] had also previously been determined by electrophoresis of soluble thread and plaque

extracts, although distribution analysis of Dpfp2 in mature threads and plaques by Rzepecki and

Waite, 1993 had, with some uncertainty, revealed unique localization of Dpfp2 in the thread

[33]. In the current distribution analysis of separated thread and plaque matrices, it appears that

all of the sequenced zebra mussel byssal proteins identified thus far have a non-specific

distribution between thread and plaque; however, there is an additional possibility of some thread

contamination in plaque samples since separation of thread from plaque is not always exact.

Additionally, any thread or plaque based linker proteins present near the thread-plaque anchor

zone might appear to be present in both the thread and plaque during analysis, thereby making it

difficult to determine their distributions.

Among the novel EST matches, similar EST sequences with the same spectrum match were

clustered together and described as the single sequence of a putative novel protein. A total of

seven novel proteins were thus identified in the insoluble extracts. These were named Dpfp6 –

Dpfp12 in decreasing order of their molecular weights. Three of these, Dpfp7, Dpfp9 and

Dpfp11 are represented by two or more similar sequences with unique peptide matches. These

protein sequences are described as variants α, β and γ of the novel protein. Table 3-2 describes

the deduced sequences of the seven novel byssal proteins Dpfp6 – Dpfp12 and their variants.

Post translational modifications (PTM) observed include asparagine deamidation (N) and

cysteine carbamidomethylation (C). While deamidation can be an artifact of basic pH conditions

and protein aging [76], [77], cysteine carbamidomethylation is a fixed modification that occurs

due to alkylation of cysteine during trypsin digestion [76]. No tyrosine hydroxylations to DOPA

were observed in any of the peptide matches. Signal peptides (underlined sequences in Table 3-

2) were observed for all proteins except Dpfp6 and Dpfp8. The missing or incomplete signal

sequences could be due to removal of base pairs from the 5’ end in order to add adaptor

sequences for cDNA amplification during creation of the cDNA library [56]. For our purposes,

the adaptor sequences were removed from the N-terminus of all EST matches prior to

sequence analysis. In sequences with intact signal peptides, the base pairs may have been

removed from the 5’ untranslated region. Additionally, while stop codons were observed at the

C-terminus of almost all EST matches, stop codons were surprisingly absent in the Dpfp12

matches. The YLGRDHANRIPAA sequence at the Dpfp12 C-terminal end is however observed

throughout the cDNA library as the end sequence in several C-terminal non-coding sequences.

The EST sequence AM230111 is an example where the sequence is preceded by a stop codon.

Therefore, the YLGRDHANRIPAA sequence was assumed to be after a missing stop codon and

was removed from the end during sequence analysis.
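
The link between trypsin digestion and the matched peptides in Table 3-2 can be illustrated with a small in-silico digest. This is a minimal sketch that assumes the textbook trypsin rule (cleave after K or R unless the next residue is P) and no missed cleavages; it does not reproduce the search settings used in SEQUEST or PEAKS. The Dpfp12 sequence is taken from Table 3-2 with the signal peptide drawn in Figure 3-2 removed.

```python
import re

def tryptic_digest(seq):
    """Split after K or R unless the next residue is P (simplified rule, no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", seq) if p]

# Mature Dpfp12 (33 residues).
DPFP12 = "QYWNSYRPYPVYPPKQTYPSYPDKKYPSYPEKT"

peptides = tryptic_digest(DPFP12)
print(peptides)

# The two peptide-spectrum matches listed for Dpfp12 in Table 3-2.
matched = {"QTYPSYPDK", "YPSYPEK"}
print(matched <= set(peptides))
```

Under these assumptions both matched Dpfp12 peptides appear as fully tryptic fragments of the deduced sequence.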

The only three zebra mussel byssal proteins with known sequences, Dpfp1 [35], Dpfp2 [33], [73]

and Dpfp5 [73], were also identified in the insoluble byssal extracts. In SEQUEST, Dpfp1

(AF265353), Dpfp2 (AM229739) and Dpfp5 (AM230139) were identified with 100%

identification probability with the identification of 8, 5 and 2 unique peptides, respectively.

Previously, the gel band derived EST match for Dpfp5 had revealed only one peptide-spectrum

match (YVGEGNNVGEQR) [73]. Here an additional match (NDVDGNENIVGGQSNAVGGK)

with an XCorr score of 2.89 was found. This confirms the previous identification of Dpfp5. In

1993, Rzepecki and Waite had additionally identified a precursor DOPA containing protein

called Dpfp3 by electrophoresis of the zebra mussel foot extract [33]. This protein ran as a 12-13

kDa mixture on the gel but its sequence has not yet been determined [33]. It is therefore possible

that some of the novel proteins identified in this analysis might actually correspond to the known

protein Dpfp3. However, electrophoresis often overestimates the molecular weight of the DOPA

containing proteins and we are unaware of all PTMs on the novel sequences. Hence, it is not

possible to determine which of these novel proteins, if any, might correspond to Dpfp3.

Table 3-2. Sequences of novel byssal proteins identified in insoluble plaque and thread extracts by LC-MS/MS analysis and database

matching against a zebra mussel foot protein cDNA library. The proteins have been named Dpfp6 – Dpfp12 in decreasing order of

their molecular weights (MW). Sequences are deduced from clustered sequencing where more than 1 EST match was found. Signal

peptides are underlined and peptides that match with mass spectra are indicated in bold. The probability of their matches is also

indicated. Post translational modifications observed include asparagine deamidation (N) and cysteine carbamidomethylation (C).

Protein name (GenBank accession number) | Protein sequence (# of amino acids) a | Theoretical MW a, pI a | Protein identification probability (PEAKS b, SEQUEST c) | Matching peptide sequences | Peptide scores, PEAKS (-10logP) | Peptide scores, SEQUEST (X-Corr)

Dpfp6 b (AM229723, 229736, 229737) | YDPVEDKKPGPYDYDGPYDKNPGPYDYDGPYDKKPDPYGTDWQYDKKTGPYVPDKSEDKKPGPYDYDGPYDKNPGPYDYNGPYDKKPGPYDYDGPYDKKPGPYDYDGPYDIKPGPYDYDVPRPRPR (126) | 14.6 kDa, pI 4.2 | 109 b | YDYDGPYDK; NPGPYDYDGPYDK; KPDPYGTDWQYDKK; KPGPYDYDGPYDK | 19; 50; 61; 53 | -

Dpfp7α bc (AM230153) | MFSTVTLVLLVSCCGVALSSWIPYGKSYLPQQPAGKGGYWNSYLPQYENYGPQQYQGSYWPGPWGGWRGNNVGSQGNSVSGYGNAVGSQGNNVDGYGNDVGWQWNSVDGKGNYVGSQWNSVN (103) | 11.2 kDa, pI 6.5 | 80 b; 89 % c | SYLPQQPAGK | 69 | 2.2

Dpfp7β c (AM230070) | MFSTVTLVLLVSCCGAAFSSWSPYWNSYLPGQGSGKGGYWNSNVPKYGSYWPQQYPSYSGSYWPGWGNNVGSQGNSVRGYGNAVGSQGNDVSGYGNDVGWQWNSVDGKGNYVGSQWNSVN (101) | 11.0 kDa, pI 8.7 | 100% c | GGYWNSNVPK | - | 2.2

Dpfp7γ c (AM230146, 230189, 230411) | MFSTVTIVLLVSCCGAALSSWIPYGNSYSPEQGKGGYWNSYLPKYESYRPQQYPSYPGSYWPGPWGGWQGDNVGSQKNSVDGTGNYVGWQKNYVN (76) | 8.7 kDa, pI 8.7 | 100% c | GGYWNSYLPK; NSVDGTGNYVGWQK | - | 2.5; 4.5

Dpfp8 c (AM230242, 230362) | VQDHMSVRLDNVLKVLGGVATGNKYSSDEIATLVGSTGGGSVNTGGYSKGTYPVPYGTGGVSGYKSGGR (69) | 6.9 kDa, pI 9.5 | 100% c | LDNVLKVLGGVATGNK; YSSDEIATLVGSTGGGSVNTGGYSK; GTYPVPYGTGGVSGYK | - | 2.1; 6.3; 3.5

Dpfp9α c (AM229975) | MNIKQLMCLLVAAVALLAIAPVANAQYYDYGYGGNNYGYPGNYGYGGNYGGYPGKYGDYDNYGGGWLYKILGGGGKGKGKWGGYGGYGK (64) | 6.8 kDa, pI 9.3 | 89% c | YGDYDNYGGGWLYK | - | 4.3

Dpfp9β c (AM229830) | MNTKQLMCVLYAAVVLLAVANAQYCDYGYGGNNYGYPGNYGYGGNYGGYPRNYGDYDNYGGGWLYKILGGGGIGKGKWGGYGGYGK (64) | 6.8 kDa, pI 8.8 | 82% c | NYGDYDNYGGGWLYK | - | 4.2

Dpfp10 c (AM230045, 230046, 230047, 230048, 230050, 230051, 230052, 230053, 230054, 230055, 230056) | MLSAVSFLLLVTLYVTVSSQTYKGYPPPKPYPKDPCYKVYCPPIYCPKGQYTPPGECCPRCKKGYGYQDPDPYFPGGK (59) | 6.7 kDa, pI 8.8 | 100% c | VYCPPIYCPK; GQYTPPGECCPR | - | 2.7; 3.0

Dpfp11α c (AM230400) | MLSAVTLLLLVSCCGMALSQWGGDSCRPIYPPLDCRLVFCQPAINCRYGNYTPKGHCCSVCIEDCWGWPWPWGK (55) | 6.4 kDa, pI 7.6 | 89% c | LVFCQPAINCR | - | 3.2

Dpfp11β b (AM230182) | MLSAVTLLLLVSCCGMALGQWGGDRCSPRYPPLDCTVVLCAFPINCRYGSFTPKGRCCPVCIEDCWGWPWPGK (54) | 6.1 kDa, pI 8.0 | 58 b | YGSFTPK | 49 | -

Dpfp12 c (AM230355, 230369, 230302) | MFSAATLLLLVSFYGTASGQYWNSYRPYPVYPPKQTYPSYPDKKYPSYPEKT (33) | 4.1 kDa, pI 9.6 | 100% c | QTYPSYPDK; YPSYPEK | - | 2.3; 2.3

a Sequence properties were calculated after removing the predicted signal peptide sequence

b EST sequence identified with PEAKS program

c EST sequence identified with Scaffold program using the SEQUEST search engine
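
Footnote a can be illustrated with a short calculation. The sketch below removes a signal peptide and sums average residue masses plus one water to obtain a theoretical molecular weight. The mass table holds standard average residue masses; the thesis does not state which tool or mass table was used, so this is an assumption, as is the 19-residue signal peptide length read from the first row of Figure 3-2.

```python
# Average residue masses in Da (standard values; an assumption, not taken from the thesis).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153  # one water per chain for the free N- and C-termini

def mature(seq, signal_len):
    """Drop the first signal_len residues (the predicted signal peptide)."""
    return seq[signal_len:]

def average_mw(seq):
    """Theoretical average molecular weight in Da, ignoring any modifications."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER

DPFP12_FULL = "MFSAATLLLLVSFYGTASGQYWNSYRPYPVYPPKQTYPSYPDKKYPSYPEKT"
SIGNAL_LEN = 19  # MFSAATLLLLVSFYGTASG, as drawn in Figure 3-2

dpfp12 = mature(DPFP12_FULL, SIGNAL_LEN)
print(len(dpfp12), round(average_mw(dpfp12) / 1000, 1))
```

For Dpfp12 this gives 33 residues and a weight that rounds to the 4.1 kDa listed in the table.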

3.4.2 Sequence properties of novel byssal proteins identified in the insoluble

extracts

The seven novel byssal proteins identified in the insoluble extracts display varied sequence

characteristics. After removing signal peptides, the sequence lengths range from 33 to 126

residues, molecular weights range from 4 kDa to 15 kDa and the isoelectric points range from

acidic (pI 4.2) to basic (pI 9.6) (Table 3-2). Table 3-3 shows the theoretical compositions (in

mol %) of the most prominent amino acids present in the sequence of the novel proteins as well

as previously known zebra mussel byssal proteins. These byssal proteins are collectively rich in

Pro (P), Gly (G), Tyr (Y) and Asx (D/N) and also have significant compositions of Ser (S), Lys

(K) and Val (V). While most protein sequences contain absolutely no cysteine, Dpfp9β, Dpfp10,

Dpfp11 (α and β) and Dpfp5 contain 1, 6, 8 and 6 cysteine residues, respectively. Based on these

theoretical amino acid compositions of the protein sequences, the byssal proteins can be

categorized as ‘Glycine rich’, ‘Proline and Tyrosine rich’ (P,Y rich) and ‘Proline and Cysteine

rich’ (P, C rich) as shown in Table 3-3. The P, C rich category not only includes proteins with

high P and C content but also includes proteins where cysteine is not one of the most prominent

amino acids but is still present in significant compositions relative to other byssal proteins.
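
The theoretical compositions discussed above are simple residue frequencies. As a minimal sketch, the following computes mol % values for the mature Dpfp12 sequence from Table 3-2; the function name is illustrative.

```python
from collections import Counter

def mol_percent(seq):
    """Return {residue: mol %} for a protein sequence."""
    counts = Counter(seq)
    return {aa: 100 * n / len(seq) for aa, n in counts.items()}

# Mature Dpfp12 (signal peptide removed), from Table 3-2.
DPFP12 = "QYWNSYRPYPVYPPKQTYPSYPDKKYPSYPEKT"

comp = mol_percent(DPFP12)
for aa in "PYKST":
    print(aa, round(comp[aa]))
print("glycine present:", "G" in comp, "| cysteines:", DPFP12.count("C"))
```

The rounded values for P, Y and K agree with the Dpfp12 row of Table 3-3, and the sequence contains neither glycine nor cysteine.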

Table 3-3. Theoretical mole % compositions of prominent amino acids found in the sequences of

previously known zebra mussel byssal proteins and novel byssal proteins identified in the

insoluble byssal extracts.

Proteins | Variants | 1st | 2nd | 3rd | Notable | Protein category

NOVEL

Dpfp6 | - | P (22%) | D (21%) | Y (20%) | | P, Y rich

Dpfp7 | Dpfp7α | G (21%) | N (13%) | Y, S (11%) | | Glycine rich

Dpfp7 | Dpfp7β | G (21%) | S (16%) | N (13%) | | Glycine rich

Dpfp7 | Dpfp7γ | G (17%) | Y (15%) | S (12%) | | Glycine rich

Dpfp8 | - | G (17%) | V (13%) | S (12%) | | Glycine rich

Dpfp9 | Dpfp9α | G (41%) | Y (25%) | K (9%) | | Glycine rich

Dpfp9 | Dpfp9β | G (39%) | Y (23%) | N (9%) | | Glycine rich

Dpfp10 | - | P (25%) | Y (17%) | K (14%) | C (10%, 5th) | P, C rich

Dpfp11 | Dpfp11α | C (15%) | P (13%) | G (11%) | | P, C rich

Dpfp11 | Dpfp11β | C (15%) | P (15%) | G (11%) | | P, C rich

Dpfp12 | - | P (24%) | Y (24%) | K (12%) | | P, Y rich

KNOWN

Dpfp1 [35] | - | P (24%) | Y (24%) | D, K (11%) | | P, Y rich

Dpfp2 [33], [73] | - | Y (23%) | K (18%) | P (16%) | | P, Y rich

Dpfp5 [73] | - | P (18%) | G (12%) | Q (12%) | Y (10%, 4th); C (3%, 7th) | P, Y & P, C rich

3.4.3 Proline and tyrosine (P, Y) rich proteins

Dpfp6, a protein resembling the C-terminus of Dpfp1

The EST matches obtained for Dpfp6 give high probability protein matches in PEAKS and

contain four peptide spectrum matches thus indicating a high probability identification of the

protein (Table 3-2). The sequence however displays no signal peptide and might therefore be

incomplete at the N-terminus owing to limitations in the creation of the cDNA library [62] (see

Methods). The clustered sequence of Dpfp6 is 126 residues long and has a theoretical MW of

14.6 kDa and an acidic theoretical pI of 4.2 (Table 3-2). The sequence is richest in proline,

aspartic acid and tyrosine (Table 3-3) and also contains considerable lysine (13%) and glycine

(12%). Interestingly, the sequence contains several repeats of a consensus sequence

KPGPYDYDGPYDK as shown in Figure 3-1A. The consensus sequence consists of two PY

diads (indicated in bold in Figure 3-1A) and the position of the first diad is conserved through

all the consensus repeats.
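
The repeat structure described above can be checked programmatically. The sketch below rebuilds the Dpfp6 sequence from the rows of Figure 3-1A, scores each 13-residue block against the consensus and tests whether the first PY diad sits at the same position in every block. The block boundaries follow the figure; the scoring function is illustrative.

```python
CONSENSUS = "KPGPYDYDGPYDK"

# Rows of Figure 3-1A: a 7-residue leader, eight 13-residue blocks and a 15-residue tail.
LEADER = "YDPVEDK"
BLOCKS = [
    "KPGPYDYDGPYDK", "NPGPYDYDGPYDK", "KPDPYGTDWQYDK", "KTGPYVPDKSEDK",
    "KPGPYDYDGPYDK", "NPGPYDYNGPYDK", "KPGPYDYDGPYDK", "KPGPYDYDGPYDI",
]
TAIL = "KPGPYDYDVPRPRPR"
DPFP6 = LEADER + "".join(BLOCKS) + TAIL

def identity(block, consensus=CONSENSUS):
    """Number of positions at which a block matches the consensus."""
    return sum(a == b for a, b in zip(block, consensus))

print("length:", len(DPFP6))
for block in BLOCKS:
    # block[3:5] is the position of the first PY diad in the consensus.
    print(block, identity(block), block[3:5] == "PY")
```

Every block carries PY at positions 4 and 5, which is the conserved first diad noted in the text, while the number of consensus-identical positions varies from block to block.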

A BLAST search of the Dpfp6 sequence did not reveal any significant protein matches.

Interestingly though, when aligned against the sequence of Dpfp1, Dpfp6 shows high levels of

similarity with the C-terminus of the Dpfp1 protein (residues 230 – 430) (Figure 3-1B). Both

proteins contain repeats of the KPGPYDYDGPYDK consensus sequence shown as underlined

sequences in Figure 3-1B. Dpfp6 (pI 4.2) also resembles the C-terminus of Dpfp1 (pI 4.3) in

terms of its highly similar acidic isoelectric point. These similarities in pI and repeat patterns

indicate that Dpfp6 must have a function very similar to that of the Dpfp1 C-terminus. In Dpfp1,

the C-terminus is specifically the DOPA containing region of the protein where the Y in

KPGPYDYDGPYDK indicates a DOPA modification [35]. Thus, if the similarities between

Dpfp1 and Dpfp6 are extended to DOPA compositions as well, this could indicate a specifically

DOPA-dependent role for Dpfp6. The two sequences however differ in the number of consensus

sequence repeats (Figure 3-1B). Dpfp1 contains 15 full and 1 incomplete repeat of the consensus

sequence as described by Anderson and Waite, 2000 [35]. Dpfp6 contains only 8 complete

repeats and two incomplete repeats of the consensus sequence. There is a possibility that Dpfp6

is just a truncated version of a Dpfp1 variant whose mRNA was not completely reverse

transcribed during creation of the cDNA library; however, this is unlikely since only one of the

four peptide-spectrum matches in Dpfp6 (KPGPYDYDGPYDK) overlaps with the sequence of

Dpfp1 (Table 3-2). It is interesting that Dpfp6 mimics only the acidic part of the Dpfp1 protein,

possibly suggesting a specialized role for the protein’s acidity within the byssus.

A.

YDPVEDK

KPGPYDYDGPYDK

NPGPYDYDGPYDK

KPDPYGTDWQYDK

KTGPYVPDKSEDK

KPGPYDYDGPYDK

NPGPYDYNGPYDK

KPGPYDYDGPYDK

KPGPYDYDGPYDI

KPGPYDYDVPRPRPR

B.

Dpfp6          --------------------YDPVEDK KPGPYDYDGPYDK NPGPYDYDGPYDK 22
Dpfp1(C-term)  YPGYQPEYHRRPPVYPPVYPYDPVEDK KPGPYDYDGPYDK NPGPYDYDGPYNK 255
                                   ******* ************* *********** *

Dpfp6          ----------------------------KPDPYGTDWQYDK KTGPYVPDKSEDK 48
Dpfp1(C-term)  KPNPYGTDWQYDK KTGPYVPIKPDDK KPNPYGTDWQYDK KTGPYVPDKSEDK 307
                                           ** ********** *************

Dpfp6          KPGPYDYDGPYDK---------------NPGPYDYNGPYDK KPGPYDYDGPYDK 87
Dpfp1(C-term)  KPGPYDYDGPYDK NPGPYDSDGPYNK KPGPYDYDGPYDK NPGPYDYNGPYDK 359
               *************                ****** *****  ****** *****

Dpfp6          KPGPYDYDGPYDI KPGPYDYDVP----RPRPR---------------------- 115
Dpfp1(C-term)  KPGPYDYDGPYDI KPGPYDYDVPYDK KPDPYDTDGPYDK KTGPYVPDKPDDK 411
               ************* **********

Dpfp6          --------------------
Dpfp1(C-term)  KTDPYVPDVPLEP PGPLGK 430

Figure 3-1. Sequence analysis of the EST derived sequence of Dpfp6 (AM229723) (A) Repeat

pattern of the consensus sequence KPGPYDYDGPYDK. PY diads are indicated in bold letters.

(B) Sequence alignment of the Dpfp6 sequence with the C-terminus (residues 230 – 430) of

previously described byssal protein Dpfp1 (AF265353). Underlined sequences represent

alternate repeats of the KPGPYDYDGPYDK consensus sequence. Italicized sequences indicate

incomplete repeats. * indicates residues that are conserved between the sequences.

Dpfp12, a protein resembling fragments of Dpfp1, Dpfp2 and Dpfp5

The EST matches obtained for Dpfp12 revealed a 100% protein identification probability in

SEQUEST and displayed two unique peptide-spectrum matches (Table 3-2) but while the

Dpfp12 sequences display N-terminal signal peptides, no stop codon is observed at the C-

terminus (Figure 3-2). As described earlier when introducing the EST match results, the

YLGRDHANRIPAA sequence at the C-terminal end is preceded by a stop codon in several

sequences throughout the cDNA library and was therefore assumed to be after a missing stop

codon and was removed during sequence analysis. The clustered EST derived sequence of

Dpfp12 is then 33 amino acids long and has a theoretical MW of 4.1 kDa and a very basic

theoretical pI of 9.6 (Table 3-2). Unlike all of the other identified zebra mussel byssal proteins,

the Dpfp12 sequence has no glycine. This sequence is richest in tyrosine, proline and lysine

(Table 3-3) and also contains much serine (9%) and threonine (6%). It contains three repeats of

the consensus YPSYPXK where non-italicized residues are highly conserved and X = P, D, E

(Figure 3-2). The consensus thus contains two diads of YP that are highly conserved between

the repeats and that are indicated in bold in Figure 3-2. Dpfp12 also contains a fourth repeat

(residues 1 – 7) that partially matches the consensus sequence and consists of two tyrosine

residues separated by three residues (Figure 3-2). When aligned against known zebra mussel

byssal proteins, Dpfp1, Dpfp2 and Dpfp5, the Dpfp12 sequence shows some similarity with all

of the proteins. Comparing to the N-terminus of Dpfp1, the PYPVYP sequence in Dpfp12 is

similar to KYPVYP, TYPSYPD is similar to QYPEYPS and KYPSYP is similar to QYPVYP in

Dpfp1 [35]. Similarities with Dpfp2 include YPDKK and YPEKTY which resemble YPDKKTY

in Dpfp2 [73], [33]. Dpfp2 is likewise poor in glycine (only 2%) and is YP rich.

Comparing to Dpfp5, the PYPVYPPKQTYPSYPDK sequence in Dpfp12 corresponds well to

the SYPTYPPKQSYPAYPPK sequence in the N-terminus of Dpfp5 [73]. The N-terminus of

Dpfp5 is also similar in that it contains no glycine and is YP rich.
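
The YPSYPXK repeats can be located with a regular expression. This sketch assumes that the third position of the consensus is variable (it is V in the first repeat and S in the other two) and that X is restricted to P, D or E as stated above; the pattern itself is illustrative.

```python
import re

# Mature Dpfp12 (signal peptide removed), from Table 3-2.
DPFP12 = "QYWNSYRPYPVYPPKQTYPSYPDKKYPSYPEKT"

# YP, any residue, YP, then X in {P, D, E}, then K.
REPEAT = re.compile(r"YP.YP[PDE]K")

for m in REPEAT.finditer(DPFP12):
    # Report 1-based start and end positions of each repeat.
    print(m.start() + 1, m.end(), m.group())
```

The three hits fall in the second, third and fourth rows of Figure 3-2, and the first seven residues (QYWNSYR) are not matched, consistent with their description as a partial repeat.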

A BLAST search of the complete Dpfp12 sequence revealed no significant matches. However, a

BLAST search of only the last 26 residues, which comprise the three consensus repeats

(PYPVYPPKQTYPSYPDKKYPSYPEKT), reveals matches with Dpfp1 (score 36.3 bits), two

variants of the Mytilus californianus foot protein mfp-1 (max score 35.4) and with Mytilus edulis

mfp-1 (score 33.3). The mfp-1 protein in M. edulis (108 kDa) and M. californianus (90 kDa)

[18] is a key byssal protein present strictly in the cuticle that is rich in proline (25%), lysine

(21%), tyrosine (19%), threonine (13%) and serine (9%) [78]. It is highly basic with a pI of 10.0,

contains 13% DOPA and unlike the other marine mussel proteins, is composed of only 0.4%

glycine [78]. Thus, while Dpfp12 (4.1 kDa) differs from mfp-1 in terms of its much lower

molecular weight, it greatly resembles mfp-1 in terms of homology matches, overall prominent

amino acid content, low glycine content and basicity. The similarities could thus possibly

suggest an mfp-1 like cuticle based role for Dpfp12. Using the partial BLAST search, matches

are also seen with other protective structural proteins including a putative cuticle protein in the

Brine shrimp Artemia franciscana (score 38.8 bits) and choriogenin H minor protein in the

acellular vitelline envelope surrounding the oocyte in the Mangrove Killifish (Kryptolebias

marmoratus) (score 34.6). These matches again indicate a structural, protective function for

Dpfp12 in an aquatic environment such as in the cuticle of the zebra mussel byssus.

MFSAATLLLLVSFYGTASG

QYWNSYR (7)

PYPVYPPKQ (16)

TYPSYPDK (24)

KYPSYPEKT (33)

YLGRDHANRIPAA

Figure 3-2. Illustration of the pattern of sequence repeats in the clustered EST derived sequence

of Dpfp12. YP diads are indicated in bold. Numbers in brackets represent the position of the last

amino acid in the row. The italicized sequence represents the sequence believed to be preceded

by a stop codon and thus ignored during sequence analysis.

3.4.4 Glycine rich proteins

Dpfp7, a Dpfp5-like protein

Three variants of the Dpfp7 sequence were identified by EST matching. These variants, named

Dpfp7α, β and γ are similar in sequence layout and repeat patterns but display unique peptide –

spectrum matches which indicates that the different variants must exist at the protein level

(Table 3-2). Dpfp7γ has a 100% identification probability and two peptide spectrum matches

with high SEQUEST XCorr scores (Table 3-2). Dpfp7α and Dpfp7β each contain only a single

good spectrum match and therefore do not meet the identification criteria. However, owing to the

high degree of similarity between the three variants (Figure 3-3A), we consider Dpfp7α and

Dpfp7β to be positive byssal protein matches as well. The Dpfp7α and Dpfp7β sequences

represent a single EST match whereas the Dpfp7γ sequence represents a clustered sequence

(Table 3-2). After removal of the N-terminal signal peptides, Dpfp7α and Dpfp7β have similar

theoretical molecular weights of 11.2 and 11.0 kDa, respectively. However, while Dpfp7α (pI

6.5) is acidic, Dpfp7β (pI 8.7) is basic. Dpfp7γ is smaller with a MW of 8.7 kDa but is also basic

(pI 8.7) like Dpfp7β (Table 3-2). The three variants are rich in glycine (G), asparagine (N),

serine (S), tyrosine (Y) and also contain considerable glutamine (Q) and tryptophan (W) (Table

3-3). Dpfp7α and β have a greater composition of glycine, asparagine and valine than Dpfp7γ, and

Dpfp7β has more serine than the other two. All three variants have 11 tyrosine residues in the

sequence and the position of Y is generally conserved (Figure 3-3A).

The Dpfp7 variants display specific repeat patterns that are distinct between the N-terminus and

C-terminus. Figure 3-3A illustrates this pattern of repeats. The N-terminus in all variants

contains four repeats of the near consensus sequence SY(L/W)PQQ shown in yellow highlights

where non-italicized residues are highly conserved. The C-terminus is rich in repeats of

GNNVG(G/S) shown in blue highlights, where non-italicized residues are highly conserved.

While Dpfp7α and β have six full and one incomplete C-terminal repeat, Dpfp7γ has only two

full and one such incomplete C-terminal repeat (Figure 3-3A). In addition to differing in the

kinds of consensus repeats, the N- and C-termini of Dpfp7 also vary in their isoelectric points.

The N-terminus of Dpfp7 is basic (Dpfp7α, β and γ have N-terminal pI 9.3, 9.3 and 8.8

respectively) and the C-terminus is acidic (Dpfp7α, β and γ have C-terminal pI 3.8, 4.3 and 6.4

respectively). They are thus comparable to previously known byssal proteins Dpfp1 [35] and

Dpfp5 [73] which are also known to possess termini with distinct repeats and pI’s.

A BLAST search of the Dpfp7 variants did not reveal any significant protein matches. However,

when aligned against the EST derived sequence of Dpfp5 described previously [73], Dpfp7

shows several regions of similarity as shown in grey highlights in Figure 3-3B for Dpfp7α.

Major similarities include a couple of aligned, near consensus (S/A)Y(L/P)PQQ repeats at the N-

terminus and several aligned, near consensus GNNVGG repeats at the C-terminus of both

proteins. The Dpfp7 N-terminus does not have the continuous chains of PQQ like repeats seen in

Dpfp5 and instead has SY(L/W)PQQ and other intermediate sequences. Regions of Dpfp7α

similarity with the middle region of Dpfp5 are limited (Figure 3-3B) and unlike the middle

region of Dpfp5, Dpfp7 has no cysteine residues. In both proteins however a series of tryptophan

(W) residues contribute to the transition to the C-terminus. Dpfp7 is therefore like a compact

version of Dpfp5, containing a bit of its N-terminus, much of its C-terminus and missing the

middle cysteine containing region. Dpfp7 must thus have similar byssal roles as Dpfp5 but

lacking the cysteine related functions.

The presence of protein variants is commonly seen in marine mussel byssal proteins. Some

marine mussel proteins such as the plaque protein mfp-3 have even up to 35 different sequence

variants [18]. The sequence variants, including those of Dpfp7, could also represent different

mature messenger RNA (mRNA) variants created by RNA editing or alternative splicing of one primary

RNA transcript [67]. Additionally, since multiple mussels are used for analysis, there could be

multiple alleles of the same gene in the cDNA library and the Dpfp7 sequence variants could

arise from these allelic variations between mussels [29]. Zhao et al., 2006 identified several

variants of the M. californianus interfacial protein mcfp-3 having varied isoelectric points and Y

and R modifications [79]. It was speculated that these variants increase the variety of interactions

the protein can undergo and provide flexibility to match with varied surface features [78].

Similarly, the differences between the Dpfp7 variants, such as varied repeat patterns and

differing size and isoelectric points, could also have evolved to promote such variety and

flexibility in zebra mussel adhesion.

A.

Dpfp7α  SWIPYGKSYLPQQPAGKGGYWNSYLPQYENYGPQQ---YQGSYWPGPWGGWR 49
Dpfp7β  SWSPYWNSYLPGQGSGKGGYWNSNVPKYGSYWPQQYPSYSGSYWPGW----- 47
Dpfp7γ  SWIPYGNSYSPEQ--GKGGYWNSYLPKYESYRPQQYPSYPGSYWPGPWGGWQ 50
        ** **  ** * *  ********  * *  * ***   * ******

Dpfp7α  GNNVGSQGNSVSGYGNAVGSQGNNVDGYGNDVGWQWNSVDGKGNYVGSQWNSVN 103
Dpfp7β  GNNVGSQGNSVRGYGNAVGSQGNDVSGYGNDVGSQWNSVDGKGNYVGSQWNSVN 101
Dpfp7γ  ----------------------------GDNVGSQKNSVDGTGNYVGWQKNYVN 76
                                    *  ** * ***** ***** * * **

B.

Dpfp7α  --SWIP-------------------YGK----SYLPQQ--PAG--KGGY- 20
Dpfp5   YNSWPPKPNQPQQPQQPQQPPQPPRYPQPSYPAYPPQQSYPAYPPKQSYP 50
          ** *                   *       * ***  **   *  *

Dpfp7α  -------WNSYLPQY--------------------ENYG---------PQ 34
Dpfp5   TYPPKQSYPAYPPKQSYPTNPPYNPCDAVYCRPIYCNYGQYTPQGECCPQ 100
                  * *                       ***         **

Dpfp7α  QYQGSYWPGPWGGWR--------------GNNVGSQGNSVSGYGNAVGSQ 70
Dpfp5   CNPGTYLPEKWS-WKGNNVVGDQEKYVGEGNNVGEQRNDVDGNENIVGGQ 149
           * * *  *  *               ***** * * * *  * ** *

Dpfp7α  GNNVDGYGNDVGWQWNSVDGKGNYVGSQWNSVN- 103
Dpfp5   SNAVGGKGNDVGEQKNAVGGSGNTVGWQGNNVGG 183
         * * * ***** * * * * ** ** * * *

Figure 3-3. Sequence alignment of the EST derived sequences of Dpfp7 (A) Alignment of the

three variants of Dpfp7 (Dpfp7α, β and γ) amongst each other. The yellow and blue highlights

represent repeats of two different consensus sequences. * represents residues conserved in all

three variants. (B) Alignment of the Dpfp7α sequence with the EST derived sequence of Dpfp5

(AM230139) described previously [73]. Grey highlights represent regions of similarity between

the proteins. In all Dpfp7 sequences, signal peptides have been removed. Numbers at the end of

the sequence rows indicate the position of the last amino acid in the row. * represents residues

conserved between Dpfp5 and Dpfp7α.
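
The conservation marks in Figure 3-3 follow a simple rule that can be sketched in a few lines: a column is marked when every aligned row carries the same residue and none of them is a gap. The example uses the first alignment block of panel A; the function name is illustrative and the alignment itself is taken as given rather than recomputed.

```python
def conservation_line(rows):
    """Return '*' where every aligned row has the same residue (gaps never count), else a space."""
    return "".join(
        "*" if len(set(col)) == 1 and col[0] != "-" else " "
        for col in zip(*rows)
    )

# First alignment block of Figure 3-3A (Dpfp7 alpha, beta, gamma).
rows = [
    "SWIPYGKSYLPQQPAGKGGYWNSYLPQYENYGPQQ---YQGSYWPGPWGGWR",
    "SWSPYWNSYLPGQGSGKGGYWNSNVPKYGSYWPQQYPSYSGSYWPGW-----",
    "SWIPYGNSYSPEQ--GKGGYWNSYLPKYESYRPQQYPSYPGSYWPGPWGGWQ",
]

line = conservation_line(rows)
for row in rows:
    print(row)
print(line)
```

The same function works for the pairwise alignment in panel B by passing two rows instead of three.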

Dpfp8

The EST matches for Dpfp8 have a 100% identification probability with three high scoring

peptide-spectrum matches (Table 3-2). No distinguishable signal peptide was identified in spite

of the presence of a methionine residue at the N-terminus. This methionine might therefore not

represent a start codon and the absence of a signal peptide could be due to an incomplete N-

terminus owing to limitations in the creation of the cDNA library [62] (see Methods). The

clustered sequence of Dpfp8 is 69 amino acids long with a theoretical MW of 6.9 kDa and pI of

9.5. The sequence is richest in glycine and is also rich in serine and valine (Table 3-3). Unlike

the other byssal proteins which are generally rich in proline, this putative sequence contains only

two proline residues. The Dpfp8 sequence in Table 3-2 indicates that its four negative residues

are restricted within the first 30 residues at the N-terminus and the seven positive residues are

spread throughout. No discernible repeat pattern is seen within the sequence though a number of

GGX repeats are seen where X = V, G, Y, S, R (Table 3-2). A BLAST search of the sequence

did not reveal any significant protein matches.
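
The charge distribution and GGX motifs described for Dpfp8 can be read directly from the sequence. In this sketch histidine is counted as a positive residue, which is an assumption made to reproduce the seven positive residues mentioned above; overlapping GGX motifs are found with a lookahead.

```python
import re

# Dpfp8 sequence as listed in Table 3-2 (no signal peptide identified).
DPFP8 = (
    "VQDHMSVRLDNVLKVLGGVATGNKYSSDEIATLV"
    "GSTGGGSVNTGGYSKGTYPVPYGTGGVSGYKSGG"
    "R"
)

def positions(seq, residues):
    """1-based positions of any residue contained in `residues`."""
    return [i for i, aa in enumerate(seq, start=1) if aa in residues]

negative = positions(DPFP8, "DE")
positive = positions(DPFP8, "KRH")  # counting His as positive is an assumption
ggx = sorted(set(re.findall(r"(?=GG(.))", DPFP8)))  # lookahead allows overlapping GGX

print("negative:", negative)
print("positive:", positive)
print("X in GGX:", ggx)
print("prolines:", DPFP8.count("P"))
```

All four negative positions fall within the first 30 residues, while the positive positions run from residue 4 to residue 69.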

Dpfp9, a YGY/YGGY rich byssal protein

Two variants of the Dpfp9 sequence were identified by EST matching. Dpfp9α (AM229975) and

Dpfp9β (AM229830) have similar sequences that differ by four residues and have unique

spectrum matches (Table 3-2). The two sequences however have less than 95% identification

probabilities and each display only a single spectrum match, albeit with high SEQUEST XCorr

scores greater than 4.00 (Table 3-2). Even though the sequences do not meet ideal identification

criteria, they are still justified as positive matches because they possess a specific repeat pattern

of consensus sequences and show amino acid contents that are characteristic of byssal proteins.

After removal of the N-terminal signal peptide, the Dpfp9α and Dpfp9β variants are 64 amino

acids long each with a theoretical MW of 6.8 kDa and with basic pI’s of 9.3 and 8.8,

respectively. Dpfp9 overwhelmingly has the highest mol% (around 40%) of glycine among all

zebra mussel byssal proteins. The sequence is also very rich in tyrosine and contains asparagine

(N) and lysine (K) as the next most prominent residues (Table 3-3). Like Dpfp8 and unlike other

identified byssal proteins, Dpfp9 also has only 3% proline in its sequence.
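
The differences between the two Dpfp9 variants can be listed by comparing the mature sequences position by position. The sketch below takes the last 64 residues of each Table 3-2 sequence as the mature protein, based on the length given in the table, and assumes the variants align without insertions or deletions.

```python
# Full sequences from Table 3-2, written as the three lines of each table cell.
DPFP9A_FULL = (
    "MNIKQLMCLLVAAVALLAIAPVANAQYYDYGYGGN"
    "NYGYPGNYGYGGNYGGYPGKYGDYDNYGGGWLY"
    "KILGGGGKGKGKWGGYGGYGK"
)
DPFP9B_FULL = (
    "MNTKQLMCVLYAAVVLLAVANAQYCDYGYGGNNY"
    "GYPGNYGYGGNYGGYPRNYGDYDNYGGGWLYKIL"
    "GGGGIGKGKWGGYGGYGK"
)
MATURE_LEN = 64  # length after signal peptide removal, as listed in Table 3-2

a = DPFP9A_FULL[-MATURE_LEN:]
b = DPFP9B_FULL[-MATURE_LEN:]

# 1-based position and the residue in each variant wherever they differ.
diffs = [(i, x, y) for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]
print(diffs)

gly_pct = round(100 * a.count("G") / len(a))
pro_pct = round(100 * a.count("P") / len(a))
print("Dpfp9 alpha: G", gly_pct, "% P", pro_pct, "%")
```

The comparison returns four differing positions, and the glycine and proline percentages for Dpfp9α round to the values quoted in the text and in Table 3-3.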

Dpfp9 contains several repeats of the near consensus sequence NYG(G/-)Y(G/P)G where non-

italicized amino acids indicate highly conserved residues. The repeats are shown in green

highlights in Figure 3-4. The N- and C-termini of Dpfp9 are, however, distinct in their pattern of

repeats. The N-terminus consists of five of these YGY/YGGY containing consensus repeats

making up the first 35 residues of the sequence. The C-terminus (residues 36 – 64) contains only

one of these consensus repeats preceded by long chains of glycine often interspersed with lysine

residues (Figure 3-4). Both termini have 13 glycines each. The C-terminal region with long

glycine chains (residues 36 – 56) represents a sequence with high hydrophobicity compared to

the rest of the protein. Of the eight charged residues in the sequence, all three negative residues

(D) are at the N-terminus and all five positive residues (K) are at the C-terminus (Figure 3-4).

The two termini also have very distinct pI’s. In Dpfp9α, the N-terminus is acidic with pI 3.8 and

the C-terminus is highly basic with pI 10.4. Thus, Dpfp9 resembles Dpfp1, Dpfp5 and Dpfp7 in

terms of having a block structure but differs from these proteins in that they have the reverse

structure, basic N-termini and acidic C-termini [73].

A BLAST search of the full Dpfp9 sequence does not reveal any significant matches; however,

the individual consensus sequence and partial N- and C-terminal sequences do show sequence

homologies in BLAST. The NYGYPG consensus itself matches to a putative cuticle protein

from the Tobacco Hornworm, Manduca sexta (score 23.5 bits). A search of the partial N-

terminal sequence of Dpfp9α containing residues 11 to 38 (Figure 3-4) also reveals structural

matches such as to a secretory eggshell protein precursor from the liver fluke Clonorchis sinensis

(score 46.0 bits) and to other putative outer membrane proteins. Interestingly, the partial N-

terminal sequence revealed strong matches to Shematrin proteins (max score 48.1) in the mantle

shell of the pearl oyster, Pinctada fucata, a marine bivalve mollusc [80]. Shematrins are a family

of glycine-rich structural proteins that comprise repeats with two or more glycines followed by a

hydrophobic amino acid [80]. Such repeats including the GY and GGY repeats seen in Dpfp9 are

also found in spider silk and flagelliform silk proteins that impart physical strength to spider

webs [45] and are also a characteristic of structural cell wall glycine-rich proteins (GRP) in some

plants species [81]. The BLAST search of the C-terminal glycine chain of Dpfp9α

(GGGGKGKGKWGGYGGYGK) also revealed similar homologies with a 29.9 bits match with

the flagelliform silk protein from the tick, Hyalomma marginatum rufipes and a 28.6 bits match

with a secretory eggshell protein precursor from Clonorchis sinensis. Thus, it appears that both

the N- and C-terminal sequences of Dpfp9 bear much similarity with structural proteins, thereby

indicating a structural role for Dpfp9 within the byssus, both in the thread and plaque.

The glycine and tyrosine repeats of Dpfp9 are also a characteristic of some adhesive/cohesive

proteins secreted by the sandcastle worm Phragmatopoma californica to stick sand grains

together underwater [42]. Pc-1 (pI 9.7) and pc-2 (pI 9.9) are basic, DOPA containing precursor

cement proteins containing glycine rich consensus repeats of VGGYGYGGKK and

HPAVHKALGGYG respectively. Interestingly, very similar to the amino acid compositions

seen in Dpfp9 (Table 3-3), Pc-1 contains 45% glycine, 19% tyrosine and 14% lysine [42]. GYG

and YGY triads are also a prominent feature of the 57 kDa thread matrix protein tmp-1 (pI 9.5)

identified in the marine mussel M. galloprovincialis [25]. The TMPs are also a glycine, tyrosine

and asparagine rich protein family that separate the collagenous microfibrils within the threads of

marine mussels and thus also play a structural role within the byssus [25]. The sequence

similarities with the marine mussel TMPs thus further suggest that Dpfp9 plays a structural

role in the byssus.

MNTKQLMCLLVAAVVLLAIAPVANA

QY(Y/C) (3)

DYGYGGN (10)

NYGYPG (16)

NYGYGG (22)

NYGGYP(G/R) (29)

(N/K)YGDYD (35)

NYGGGWLYKIL (46)

GGGG(K/I)GKGKWG (57)

GYGGYGK (64)

Figure 3-4. Illustration of repeat patterns in the EST derived sequence of Dpfp9 obtained by

clustering the Dpfp9α and Dpfp9β sequences. Residues of the form (X/Y) represent the four

positions that differ between Dpfp9α and Dpfp9β, respectively. Near consensus sequences of

YG(G/-)Y(G/P)G are highlighted in green. Numbers represent the position of the last amino acid

in the row. The signal peptide is underlined.

Region labels in Figure 3-4: N-terminal region; C-terminal region.
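The repeat pattern in the caption can be located programmatically. The sketch below, in Python, writes the caption pattern YG(G/-)Y(G/P)G as a regular expression and reports where it occurs in the mature Dpfp9α sequence, which is assembled here from the rows of Figure 3-4 under the assumption that the first option at each variable position is the α variant. A strict pattern match finds only exact occurrences; near-consensus repeats that carry a substitution are not picked up, so the number of hits can be lower than the number of repeats counted in the text.

```python
import re

# Mature Dpfp9α assembled from Figure 3-4 (assumed α option at variable sites).
DPFP9A = (
    "QYY" "DYGYGGN" "NYGYPG" "NYGYGG" "NYGGYPG"
    "NYGDYD" "NYGGGWLYKIL" "GGGGKGKGKWG" "GYGGYGK"
)

# YG(G/-)Y(G/P)G: optional second glycine, then Y, then G or P, then G.
REPEAT = re.compile(r"YGG?Y[GP]G")

# 1-based start and end positions, matching the numbering used in the figure.
hits = [(m.start() + 1, m.end(), m.group()) for m in REPEAT.finditer(DPFP9A)]

for start, end, motif in hits:
    region = "N-terminal" if end <= 35 else "C-terminal"
    print(f"{start:>2}-{end:<2} {motif:<6} {region}")
```

Loosening the expression (for example, allowing any residue at one position) would be one way to capture the near-consensus repeats, at the cost of more spurious matches in a glycine- and tyrosine-rich sequence.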

3.4.5 Proline and Cysteine (P, C) rich proteins

Dpfp10 and Dpfp11, proteins resembling the cysteine containing region of Dpfp5

The clustered sequence of Dpfp10 is derived from several EST matches, all of which have 100%

identification probability and two peptide-spectrum matches with high SEQUEST XCorr scores

(Table 3-2). Following the removal of the signal peptide, the clustered Dpfp10 sequence is 59

amino acids long with a theoretical MW of 6.7 kDa and basic pI of 8.8. The sequence is richest

in proline (25%) and tyrosine (17%) and also contains much lysine (14%), glycine (12%) and

cysteine (10%). No significant repeat pattern is discernible in the sequence but the six cysteines

are generally associated with proline, lysine and tyrosine residues as seen with the CYK, CPP,

CPK, CCP and CKK triads (Table 3-2). A BLAST search of the protein shows several matches

to extracellular matrix proteins (max score 34.7 bits), kielin/chordin-like proteins (max score

34.2) and other Bone Morphogenetic Protein (BMP) binding proteins (max score 33.5) where the

major region of similarity is in the CPP and YTPPGECCPRC regions of the Dpfp10 sequence.

The significance of these matches is not readily apparent, however.
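The composition figures quoted for Dpfp10 follow from simple residue counting. As a minimal sketch in Python, the mature Dpfp10 sequence is taken here from the Dpfp10 rows of the alignment in Figure 3-5 with the gap characters removed (an assumption about how the rows concatenate); the helper names are illustrative only. The second helper lists every three-residue window that begins at a cysteine, which is how the triads mentioned above can be enumerated.

```python
from collections import Counter

# Mature Dpfp10: the three Dpfp10 rows of Figure 3-5 with gaps removed.
DPFP10 = (
    "QTYKGYPPPKPYP"
    "KDPCYKVYCPPIYCPKGQYTPPGECCPRCKKGYGYQDPDPYFP"
    "GGK"
)


def composition(seq):
    """Return {residue: (count, percent)} ordered from most to least common."""
    counts = Counter(seq)
    return {aa: (n, 100 * n / len(seq)) for aa, n in counts.most_common()}


def cysteine_triads(seq):
    """Every three-residue window that starts at a cysteine."""
    return [seq[i:i + 3] for i, aa in enumerate(seq)
            if aa == "C" and i + 3 <= len(seq)]


print("length:", len(DPFP10))
for aa, (count, percent) in composition(DPFP10).items():
    print(f"{aa}: {count:>2} ({percent:.0f}%)")
print("triads starting at Cys:", cysteine_triads(DPFP10))
```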

The Dpfp11 EST matches were identified as two variants with similar sequences that differ in 13

residues and have unique spectrum matches (Table 3-2). Dpfp11α (AM230400) and Dpfp11β

(AM230182) were identified with different programs, SEQUEST and PEAKS, respectively.

Both sequences have a low protein identification probability and each display only a single

spectrum match, albeit with high peptide probability scores (Table 3-2). However, even though

the sequences do not meet ideal identification criteria, they are still considered positive matches

here because they show a high degree of similarity to other byssal adhesive proteins, Dpfp5 and

Dpfp10 (Figure 3-5). After removal of the N-terminal signal peptide, the Dpfp11α and Dpfp11β

variants are 55 and 54 amino acids long, respectively. They have theoretical MW of 6.4 kDa and

6.1 kDa and theoretical pI of 7.6 and 8.0 respectively. The Dpfp11 sequences are richest in

cysteine (15%), proline (13 – 15%) and glycine (11%) (Table 3-3) and also contain considerable

tryptophan (7 – 9%) specifically at the C-terminal end. No significant repeat pattern is

discernible in the sequence. The eight cysteines are sometimes associated with arginine (R)

(CRP, CRL and CRY triads) but mostly with variable residues (Table 3-2). A BLAST search of

Dpfp11α reveals no significant protein matches; however, a search of Dpfp11β shows matches to

extracellular matrix proteins from several species (max score 34.7 bits) especially in the

TPKGRCCPVC region of the sequence. Again, these matches do not give any clear indication of

the structural/functional properties of the Dpfp11 variants.

Dpfp10 and Dpfp11 are the only two novel byssal proteins identified in the insoluble extract that

contain a significant number of cysteine residues in their EST derived sequences. Dpfp9β contains

only 1 cysteine in its sequence. The only other zebra mussel byssal protein known to contain

cysteine is Dpfp5 [73]. When the sequences of Dpfp10 and Dpfp11α are aligned against that of

Dpfp5, similarities are seen with sequences in and around the cysteine containing region of

Dpfp5. The similar sequences are shown in blue highlights in Figure 3-5. Dpfp10 shows two

major sequence matches to Dpfp5, one in its N-terminal YP rich region (residues 1 – 68) and the

other in its middle cysteine containing region (residues 69 – 114) [73]. Dpfp11 displays just one

of these matches, in the cysteine containing region (Figure 3-5). Thus, while Dpfp7 mimics the

two termini of Dpfp5, Dpfp10 and Dpfp11 mimic the Dpfp5 middle region.

While Dpfp10 has 10 tyrosines and 15 prolines in its sequence, Dpfp11α/β has only 2/3 tyrosines

and 7/8 prolines. In Dpfp11, the N and C termini have similar hydrophobicities but in Dpfp10,

the N-terminus is more hydrophobic than the C-terminus. Additionally, Dpfp10 has more

similarities to Dpfp5 than does Dpfp11. The sequence differences between Dpfp10 and Dpfp11

could thus be reflective of different cysteine related roles. In marine mussels, two foot proteins

mfp-2 (15 mole% Cys) and mfp-6 (11% Cys) have the most prominent cysteine compositions

with cysteine being almost absent in the other byssal proteins [78]. Mfp-2 is an abundant plaque

matrix protein that consists of epidermal growth factor (EGF) domains stabilized by disulfide

bonds, a common characteristic of ECM proteins [18]. Mfp-6 is present in the plaque footprint

and plays a role both as a plaque antioxidant that restores DOPA adhesion by reducing

dopaquinone (oxidized DOPA) and as a cross-linker that can improve plaque cohesion by

forming S-cysteinyldopa adducts [16]. Thus, Dpfp10 and the two Dpfp11 variants could

similarly play roles as cohesive and/or adhesion promoting proteins in the byssus. Further

information on the oxidation states and cross-linking properties of their cysteine residues will

however be required to better understand their specific functions [16].

Dpfp5    YNSWPPKPNQPQQPQQPQQPPQPPRYPQPSYPAYPPQQSYPAYPPKQSYPTYPPKQSYPAYPPKQSYP 68
Dpfp10   -------------------------------------------------------QTYKGYPPPKPYP 13
Dpfp11α  --------------------------------------------------------QWGGDSCRPIYP 12
                                                                           **

Dpfp5    TNPPYNPCDAVYCRP-IYCNYGQYTPQGECCPQCNPGTYLPEKWSWKGNNVVGDQEKYVGEGNNVGE 134
Dpfp10   K----DPCYKVYCPP-IYCPKGQYTPPGECCPRCKKG------YGYQ------DPDPYFP------- 56
Dpfp11α  P----LDCRLVFCQPAINCRYGNYTPKGHCCSVCIED-----CWGWP------WP------------ 52
                *  * * * * *  * *** * **  *

Dpfp5    QRNDVDGNENIVGGQSNAVGGKGNDVGEQKNAVGGSGNTVGWQGNNVGG 183
Dpfp10   -------------------GGK--------------------------- 59
Dpfp11α  -------------------WGK--------------------------- 55
                             **

Figure 3-5. Sequence alignment of cysteine containing byssal proteins: Dpfp10, Dpfp11α and

previously described Dpfp5 (AM230139) [73]. * indicates residues that are conserved between

the sequences. The highlights depict regions of similarity between the proteins. Numbers at the

end of the sequence rows indicate the position of the last amino acid in the row.
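The conservation marks in an alignment of this kind can be derived mechanically from the aligned rows. The sketch below, in Python, uses the middle block of the Figure 3-5 alignment as input and marks a column with an asterisk when all three sequences carry the same residue and none has a gap. The function name and the rule for gaps are assumptions made for illustration and are not necessarily those of the alignment program used for the figure.

```python
# Middle block of the Figure 3-5 alignment (gaps shown as '-').
ROWS = {
    "Dpfp5":   "TNPPYNPCDAVYCRP-IYCNYGQYTPQGECCPQCNPGTYLPEKWSWKGNNVVGDQEKYVGEGNNVGE",
    "Dpfp10":  "K----DPCYKVYCPP-IYCPKGQYTPPGECCPRCKKG------YGYQ------DPDPYFP-------",
    "Dpfp11α": "P----LDCRLVFCQPAINCRYGNYTPKGHCCSVCIED-----CWGWP------WP------------",
}


def conservation_line(rows):
    """Return '*' for columns identical in every row (gaps excluded), else ' '."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("aligned rows must all have the same length")
    marks = []
    for column in zip(*rows):
        conserved = column[0] != "-" and all(c == column[0] for c in column)
        marks.append("*" if conserved else " ")
    return "".join(marks)


line = conservation_line(list(ROWS.values()))
for name, row in ROWS.items():
    print(f"{name:<9}{row}")
print(" " * 9 + line)
print("conserved columns:", line.count("*"))
```

Most of the fully conserved columns in this block fall on cysteine positions or on the YTP and CC residues around them, which is the region of similarity discussed in the text.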

3.4.6 Analysis of the set of zebra mussel byssal proteins identified in the

insoluble matrix

The sequence analysis of the insoluble byssal matrix revealed seven novel and three previously

known byssal proteins that were each identified in both the thread and plaque matrices. However,

this does not mean that we have identified all of the proteins present in the zebra mussel

byssus. There may well have been other relevant EST matches that did not meet

identification criteria and were therefore not identified as byssal proteins. Additionally,

limitations in the cDNA library or in byssal protein extraction may also restrict our identification

of novel proteins. It is interesting that in addition to being identified in the insoluble matrices, the

known proteins, Dpfp1, Dpfp2 and Dpfp5 were previously also identified in soluble extracts by

gel electrophoresis [73]. The presence of these proteins in the insoluble extract could thus

possibly be due to partial cross-linking that prevents all of the protein from being solubilized.

Sequence comparisons of the ten zebra mussel byssal proteins sequenced thus far reveal a

number of protein characteristics that are common features of the byssal proteins. Firstly, as seen

in adhesive proteins from many species, low sequence complexity due to the presence of

consensus sequence repeats is a common characteristic of most of the identified byssal proteins.

In some proteins such as Dpfp1, Dpfp2 and Dpfp6, the repeat pattern is in the form of tandem

repeats and in others such as Dpfp7, Dpfp9 and Dpfp12, the repeat pattern is in the form of

shorter repeats irregularly interspersed with other residues. Others such as Dpfp8, Dpfp10 and

Dpfp11, however, display no notable repeats. As well, the repeat sequences themselves display

varying degrees of consensus. Within the sequence of Dpfp1 itself, the N-terminal repeats have

poor consensus in contrast to the C-terminal repeats which are strongly conserved in their

sequences [35]. Another prominent characteristic of the zebra mussel byssal proteins is the

appearance of a block structure within the protein sequence. The blocks within a sequence may

vary in their sequence repeats, their isoelectric points and even in their post-translational

modifications (e.g. Dpfp1 [35]). In terms of pI, Dpfp1, Dpfp5 and Dpfp7 have a di-block

structure with a basic N-terminus and an acidic C-terminus and Dpfp9 is a di-block with acidic

N-terminus and basic C-terminus. In terms of repeat patterns, Dpfp1, Dpfp7 and Dpfp9 have di-

block structures but Dpfp5 has three blocks containing different kinds of repeats. The cement

protein pc-3A in the sandcastle worm also has a pI based block structure similar to Dpfp1, Dpfp5

and Dpfp7 [41], suggesting that block structures play an important role in the

adhesion/cohesion mechanisms of adhesive proteins.

The zebra mussel byssus also displays some interesting characteristics with respect to its protein

mixture. The mussel's protein collection encompasses a wide range of isoelectric points ranging

from acidic to basic. Most of the proteins are basic; however, there are some such as Dpfp1

(theoretical pI 5.1), Dpfp5 (theoretical pI 6.4) [73], Dpfp6 (theoretical pI 4.2) and Dpfp7α

(theoretical pI 6.5) that are acidic. This is in contrast to marine mussel proteins that are generally

basic (pI >9) [18] but is similar to sandcastle worms which have a heterogeneously charged

adhesive mixture containing one strongly acidic (pI 2.5) and two strongly basic (pI > 9) proteins

among others [36]. With respect to molecular weights as well, there is a range in protein sizes

ranging from 4.1 kDa (theoretical MW of Dpfp12) (Table 3-2) to 49 kDa (theoretical MW of

Dpfp1) [35] to greater than 210 kDa (electrophoretic MW of Dpfp0) [73]. Such molecular

weight ranges are also seen in marine mussels; however, their highest molecular weight proteins

are represented by collagenous proteins where more than one subunit is bundled [18].

Analysis of the ten zebra mussel byssal protein sequences defines a number of distinct sequence

motifs that are common to two or more byssal proteins. These motifs are based not only on

amino acid compositions (as described in Table 3-3) but also on the pattern of repeats of the

amino acids. One prominent set of sequence motifs are those that are rich in YP diads. These

include Dpfp2, Dpfp12 and the N-termini of Dpfp1, Dpfp5 and Dpfp7. In fact, Dpfp12 resembles

YP containing fragments of Dpfp1, Dpfp2 and Dpfp5 (Figure 3-2) and YP based similarities are

also observed between the N-termini of Dpfp5 and 7. Another motif, though not as common, is

the motif rich in PY diads that can be clearly distinguished from the YP rich motifs. These are

observed in Dpfp6 and the C-terminus of Dpfp1. Two additional sequence motifs are based on

glycine rich motifs and include those that contain long glycine runs of the form GGX (Dpfp8 and

the C-terminus of Dpfp9), where X is a variable residue, and others that possess glycine rich

repeats (C-termini of Dpfp5 and Dpfp7 and the N-terminus of Dpfp9). A fifth motif is a cysteine

rich motif seen only in Dpfp10, Dpfp11 and the middle region of Dpfp5. In Dpfp5 and Dpfp10,

cysteine is often closely associated with proline whereas in Dpfp11, the residues are quite

independently distributed. The presence of these five common sequence motifs between ten

sequences could potentially indicate that only a few specific motifs are actually required for

byssal protein functions but the different motif combinations in different proteins then provide

the variety and flexibility needed by the mussel to adapt to different conditions and surface

features. While we were able to identify a number of sequence motifs by drawing comparisons

amongst the byssal proteins, domain searches of the byssal proteins using the Simple Modular

Architecture Research Tool (SMART) did not reveal any significant domain identifications

within the sequences, thus emphasizing the low sequence complexity of these proteins.
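The distinction between YP-rich and PY-rich motifs can be made quantitative by counting dipeptides. The following Python sketch counts YP and PY diads in the N-terminal region of Dpfp5 (residues 1 – 68, taken here from the first Dpfp5 row of Figure 3-5); the helper name is illustrative. Sequences for the PY-rich proteins are not reproduced in this section, so only the YP-rich case is shown.

```python
# Dpfp5 residues 1-68, copied from the first Dpfp5 row of Figure 3-5.
DPFP5_N_TERMINAL = (
    "YNSWPPKPNQPQQPQQPQQPPQPPRYPQPSYPAYPPQQSYPAYPPKQSYPTYPPKQSYPAYPPKQSYP"
)


def count_diad(seq, diad):
    """Count occurrences of a two-residue motif at every position."""
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == diad)


yp = count_diad(DPFP5_N_TERMINAL, "YP")
py = count_diad(DPFP5_N_TERMINAL, "PY")
print(f"length {len(DPFP5_N_TERMINAL)}: YP diads = {yp}, PY diads = {py}")
```

Applying the same count to each block of each protein would give a simple table from which the YP-rich and PY-rich motif classes could be assigned by a fixed threshold rather than by inspection.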

In spite of significant compositional differences, direct comparisons of zebra mussel byssal

proteins with proteins identified in the much studied marine mussel byssi can be a useful

indication of their roles within the byssus. As discussed previously, the sequence similarities and

homology matches between Dpfp12 and the marine mussel cuticle protein mfp-1 indicate that

Dpfp12 might also play a role as a protective structural protein in the cuticle of the zebra mussel

byssus [46]. In marine mussels, mfp-6 is a cysteine containing footprint protein that potentially

acts to restore DOPA adhesion as well as promotes cohesion in the plaque matrix [16]. Mfp-2 is

also a cysteine-containing plaque matrix protein where the cysteine residues form disulfide

bonds and impart a structural role to the protein [82]. The almost uniquely cysteine containing

zebra mussel byssal proteins Dpfp5, Dpfp10 and Dpfp11 could mimic the functions of mfp-2

and/or mfp-6 and play either an adhesion promoting and/or cohesive role in the byssus. In marine

mussels, the interfacial adhesive proteins, mfp-3 (25% Gly) and mfp-5 (20% Gly), have more

glycine than plaque matrix cohesive proteins mfp-6 (15%), mfp-2 (14%) and mfp-4 (5%) and

much more glycine than the cuticle protein mfp-1 (0.4%) [78]. High glycine content could

therefore possibly be indicative of more interfacial adhesive roles in the zebra mussel proteins as

well. While Dpfp7 (17 – 21%), Dpfp8 (17%) and Dpfp9 (39 – 40%) all have comparatively high

levels of glycine, its distribution varies between them all and could dictate whether the protein

role is adhesive or cohesive. Other structural proteins in the marine mussel byssus include

histidine rich mfp-4, a plaque matrix protein that acts as a linker between thread and plaque [30]

and glycine, tyrosine and asparagine rich thread matrix proteins (TMPs) that separate and

perhaps lubricate collagenous fibrils under tension in the thread [25]. Dpfp9, which is also rich in

G, Y and N and likely plays a structural role in the byssus as described previously, may thus

somewhat mimic the structural functions of the TMPs in the zebra mussel thread and plaque.

In the marine mussel byssus, the cohesive proteins in the thread, plaque and cuticle consistently

display repetitive patterns of consensus sequences. However, the proteins near the adhesive

interface (Mfp-3, 5 and 6) consist only of a single non-repeating sequence [18]. If this is taken as

a rule, Dpfp8 would likely be an adhesive interface protein and Dpfp7 and Dpfp9 would be

cohesive matrix proteins. Additionally, the marine mussel adhesive interface proteins, Mfp-3 (6

kDa) and Mfp-5 (9 kDa) [18] and the interfacial antioxidant and cross-linker protein Mfp-6 (11

kDa) [16] have low molecular weights in comparison with the rest of the proteins. Thus it

appears that in marine mussels, the footprint proteins have evolved to be smaller than the other

byssal proteins [18]. If that is true in the zebra mussel byssus, then some of these low molecular weight

glycine rich proteins such as Dpfp8 (6.9 kDa) could also be expected to play an interfacial

adhesive role. However, since all of the zebra mussel byssal proteins identified thus far have

been found in both the thread and plaque, it is not possible to directly determine which, if any, of

the identified proteins might play an adhesive role at the plaque-substrate interface.

Previously, MALDI-TOF mass spectrometry analysis of mature byssal threads by Gilbert and

Sone, 2010 revealed the presence of several low molecular weight proteins (3.7 to 7 kDa) that

had different distributions between the thread, plaque and plaque footprint [34]. A number of the

zebra mussel byssal proteins (Dpfp8, 9, 10, 11 and 12) identified in this analysis fall within this

range (Table 3-2). Dpfp10 (6727 Da) has significant correspondence to major MALDI peaks of

6737 Da in the thread and 6742 Da in the plaque footprint. These peaks even display an adjacent

protein peak with hydroxylation (16 Da difference) indicating a DOPA modification in the

protein. Dpfp11α (6364 Da) as well shows correspondence to a peak with a DOPA modification,

at 6399 Da, appearing uniquely in the footprint. This makes it similar to the cysteine and DOPA

containing marine mussel protein mfp-6 that is also in the footprint where it plays a cross-linking

or anti-oxidant role [83]. Additionally, Dpfp12 (4132 Da), which we have speculated is a cuticle

protein, corresponds to a peak at 4159 Da detected only in the thread and clearly not detected

elsewhere [34]. However, since the cuticle makes up a much greater percent of the thread cross-

section as compared to the plaque cross-section, there might just not have been enough cuticle

protein to be detected in the plaque.
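The correspondence between theoretical masses and MALDI peaks can be reproduced with a short calculation. The sketch below, in Python, computes average masses for Dpfp10 and Dpfp11α from standard average residue masses and reports the offset of each MALDI peak quoted above. The sequences are taken from the rows of Figure 3-5 with gaps removed; unmodified residues with free thiols are assumed, and instrument calibration error is not modelled, so the offsets are given without interpretation.

```python
# Standard average residue masses (Da) and the mass of one water molecule.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528
HYDROXYLATION = 15.9994  # one added oxygen, e.g. tyrosine to DOPA

# Mature sequences from the Figure 3-5 rows with gaps removed (assumption).
PROTEINS = {
    "Dpfp10": "QTYKGYPPPKPYPKDPCYKVYCPPIYCPKGQYTPPGECCPRCKKGYGYQDPDPYFPGGK",
    "Dpfp11α": "QWGGDSCRPIYPPLDCRLVFCQPAINCRYGNYTPKGHCCSVCIEDCWGWPWPWGK",
}

# MALDI peaks (Da) quoted in the text for each protein.
MALDI_PEAKS = {"Dpfp10": [6737, 6742], "Dpfp11α": [6399]}


def average_mass(seq):
    """Average mass of an unmodified linear peptide."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER


for name, seq in PROTEINS.items():
    mass = average_mass(seq)
    print(f"{name}: theoretical average mass {mass:.0f} Da")
    for peak in MALDI_PEAKS[name]:
        offset = peak - mass
        print(f"  peak {peak} Da: offset {offset:+.0f} Da "
              f"({offset / HYDROXYLATION:.1f} x 16 Da)")
```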

Marine mussels have distinct protein compositions between thread and plaque with collagenous

proteins in the thread and DOPA containing proteins in the plaque [9]. In contrast, the zebra

mussel byssus contains similar amino acid compositions between thread and plaque and

therefore likely has similar protein compositions between the two. Thus, while there might be

some specialized adhesive proteins at the plaque-substrate interface, the distributions of cohesive

matrix proteins may be similar throughout the byssus, possibly explaining why most proteins

identified are present in both thread and plaque. Through MALDI mass spectrometry analysis

of different byssal regions, Gilbert and Sone, 2010 identified the presence of similar protein

peaks between the thread and footprint thus indicating that the same proteins might be present in

the thread and at the adhesive interface and that even if the adhesive proteins were identified in

our analysis, they cannot be pinpointed as adhesive because they are also identified in the thread.

In scenarios where the same protein is present in both the adhesive footprint and the thread, the

enzyme catechol oxidase that oxidizes DOPA to dopaquinone could possibly contribute to

functional differences [17]. Catechol oxidase is present in greater amounts in the thread and

plaque bulk than at the plaque-substrate interface and is therefore responsible for greater

cohesive cross-linking in the thread versus the plaque footprint. It may therefore be able to

impart different functions to the same protein depending on their localization within the byssus

[17]. Additionally, the MALDI analysis revealed a number of small proteins uniquely distributed

in the plaque adhesive layer and other peptides between 3.7 kDa and 4.2 kDa that are uniquely

distributed in the thread matrix [34]; however, all the byssal protein sequences identified thus far

have been found both in the thread and plaque. Thus, these low MW byssal proteins may not

have been identified yet and might possibly be present in the soluble extract, requiring further

characterization.

3.5 Conclusion

LC-MS/MS analysis of base insoluble matrix proteins obtained from induced, freshly secreted

byssal threads allowed analysis at a stage of minimal DOPA cross-linking and zebra mussel

cDNA database matching of the byssal protein mass spectra then led to the identification of

previously known as well as seven novel proteins (Dpfp6 – Dpfp12) in both the thread and

plaque matrix. Sequence analysis of the zebra mussel byssal proteins reveals protein

characteristics and sequence motifs that are common features of the proteins and also reveals

several prominent sequence homologies within the byssal proteins. The current analysis has thus

greatly added to our knowledge of the protein composition of the zebra mussel byssus. Future

work must look into protein distribution specifically at the plaque substrate interface to identify

byssal proteins with specialized adhesive functions. An understanding of the molecular basis of

adhesion in zebra mussels will ultimately contribute to the development of water resistant

adhesives for medical and dental applications and will also allow the development of targeted

anti-fouling strategies against this rapidly spreading species.

3.6 Acknowledgements

The authors gratefully acknowledge Trevor Gilbert and Kyle Serkies for collecting the mussels.

We also thank Li Zhang and Paul Taylor of the Advanced Protein Technology Centre, Sick Kids,

Toronto for their technical advice on LC-MS/MS analysis. This work was supported by the

Natural Sciences and Engineering Research Council (NSERC) of Canada, the Canada

Foundation for Innovation (CFI), and an Ontario Graduate Scholarship (OGS).

Chapter 4:

Conclusions, Preliminary work and Future Directions

4.1 Summary and Conclusions

The primary objective of this thesis was to improve our knowledge of the molecular basis of

adhesion in zebra mussels such that this knowledge can be implemented in the design of

biological adhesives for medical and dental applications and in the development of anti-fouling

strategies against this invasive, biofouling species. In the past, the zebra mussel byssus has

stubbornly evaded biochemical characterization due to extensive cross-linking of its mature

structure and has thus left major gaps in our understanding of its protein composition and

distribution. In this work, we have strongly built on the knowledge of zebra mussel byssal

proteins by performing our analysis on induced, freshly secreted byssal threads that are

minimally cross-linked and more amenable to characterization.

Over the course of this thesis, we have identified the presence of ten novel proteins (Dpfp0 and

Dpfp4 – Dpfp12) in the zebra mussel byssus by investigating the composition of both the soluble

byssal extract and insoluble byssal matrix. Previously, byssal proteins were identified only as

DOPA-staining precursors (Dpfp1 – 3) in zebra mussel foot extracts, thereby overlooking

DOPA-poor or DOPA-lacking byssal proteins. The current identifications on the other hand,

represent both DOPA-rich and DOPA-deficient byssal proteins even though these cannot

currently be distinguished as one or the other. Further, we have determined the primary sequence

structure of eight of these novel proteins (Dpfp5 – Dpfp12) and identified a more complete

sequence for Dpfp2, for which only fragments of the sequence were previously known.

Additionally, by performing our analysis on separated threads and plaques, we have determined

the byssal distribution of known (Dpfp1 and 2) and novel (Dpfp5 – Dpfp12) sequenced proteins

between the two regions of the byssus and have in fact found that all ten

proteins are present in both the thread and plaque.

Two very important biochemical and proteomic techniques have contributed to our findings on

the zebra mussel byssal composition: gel electrophoresis and peptide fragment fingerprinting

(PFF), which includes LC-MS/MS analysis and database sequence matching. While gel

electrophoresis was useful in the identification of proteins in the soluble byssal extract, PFF was

useful in two contexts, sequencing gel band proteins from the soluble extract, and sequencing

and thereby identifying proteins in the insoluble byssal matrix. The overlap seen in the

identification of soluble extract proteins in the insoluble byssal matrix, however, indicates that

soluble proteins might be retained in the insoluble matrix owing to partial cross-linking.

Additionally, in spite of the valuable information provided by these techniques, there are still certain

limitations in our identification and sequencing of byssal proteins. Since the protein gel bands

are not sufficient, proteins cannot be isolated in adequate amounts for functional characterization

of their post-translational modifications, secondary structure, biophysical properties and precise

byssal distribution. As well, stringent protein and peptide identification criteria, inaccuracies in

the cDNA library and irregular trypsin digestion could have limited identification of additional

proteins in the insoluble matrix. Since we work with induced threads in our analysis, an

additional limitation may be that these do not exactly represent the protein compositions and

distributions of the mature byssus. In marine mussels however, the induced byssal threads have

been shown to be indistinguishable from the natural threads [16], [25] and hence we work on the

assumption that they are indistinguishable in zebra mussels as well.

Putting together all the byssal protein information available from the current and previous work,

a total of 13 zebra mussel byssal proteins have been identified (Dpfp0 – 12) (ten from this work)

and the protein sequence and distribution between thread and plaque is known for ten of these

(Dpfp1, 2 and 5 – 12) (eight from this work). Comparing all of these protein sequences amongst

each other and comparing to adhesive proteins from other species reveals features that are

common characteristics of byssal proteins. Through this thesis, especially in Chapter 2 and

Chapter 3, several such comparisons have been drawn and have led to the identification of

protein characteristics and sequence motifs that are distinctive of two or more zebra mussel

byssal proteins. Similar to adhesive proteins in other species, repeat patterns of consensus

sequences are a significant characteristic of most zebra mussel byssal proteins. Additionally,

several of these byssal proteins have co-block structures where the block sequences can differ in

post-translational modifications (known only for Dpfp1), isoelectric point and/or repeat patterns.

Interestingly, unlike marine mussel byssal proteins that are generally basic, zebra mussels have a

mix of acidic and basic proteins, also a characteristic of the sandcastle worm cement proteins.

Significantly as well, a number of distinct sequence motifs have been identified in the zebra

mussel byssal proteins. These include motifs that are rich in YP diads or PY diads, others that are

comparatively rich in cysteine residues and even other glycine rich motifs where G is present as

long glycine runs of the form GGX (where X is variable) or as glycine rich repeats. Some

proteins even have multiple sequence motifs within their protein sequence. Another very

interesting observation in the zebra mussel byssal proteins is that several proteins have very

prominent sequence homologies to each other, often encompassing a whole block of the proteins.

For example, Dpfp6 shows strong homology to the C-terminal block of Dpfp1 (Figure 3-1B)

and the C-terminal block of the di-block Dpfp7 shows strong similarity to the C-terminal block

of the tri-block Dpfp5 (Figure 3-3B).

4.2 Preliminary Additional Studies

4.2.1 Comparing zebra and quagga mussel byssal proteins

In addition to comparing with the much-studied marine mussels, it is also useful to compare

zebra mussel adhesion to adhesion in other freshwater byssate mussel species such as the closely

related quagga mussel (Dreissena bugensis). As discussed previously in section 1.1.6, the two

freshwater mussel species are similar in having DOPA-containing proteins in both the thread and

plaque (unlike marine mussels) and the two species have some potentially homologous proteins

that run at similar molecular weights on a gel [33]. Therefore, we additionally investigated the

protein composition and distribution of the quagga mussel byssus to potentially identify other

homologous proteins and to determine other byssal characteristics that are common or unique to

either freshwater species. The results from this preliminary study are presented in Appendix A

and summarized below.

As with the zebra mussels, we once again performed extractions on induced, freshly secreted

byssal threads from quagga mussels and were able to identify a series of novel quagga mussel

byssal proteins in addition to some previously known ones (Figure A-1 and A-2). The quagga

mussel byssus was much more easily extractible than the zebra mussel byssus (possibly due to

lower DOPA content as discussed in section 1.1.6) and therefore a much larger number of novel

proteins were identified. These include a protein at ~50 kDa for which no homologue was seen

in the zebra mussel, one at ~35 kDa that could be homologous to Dpfp5 (~30 kDa by

electrophoresis) and one at ~ 7 kDa that could be homologous to any of the lower molecular

weight Dpfp proteins identified in the insoluble matrix (Figure A-2). As with the zebra

mussel byssal proteins, all of these were identified in both the thread and the plaque.

In the quagga mussel, unlike the zebra mussel, a number of significant bands were also identified

that were present almost uniquely in the plaque and could therefore have specialized adhesive

functions (Figure A-2). These include a band at > 210 kDa (Dbfp0), a band at ~22 kDa

corresponding to Dbfp2 and other bands at ~16 kDa and at 12 – 13 kDa corresponding to Dbfp3

(Figure A-2). These known DOPA-containing proteins Dbfp0, Dbfp2 and Dbfp3 are believed to

be homologous to zebra mussel DOPA proteins Dpfp0 (>210 kDa), Dpfp2 (26 kDa) and Dpfp3

(12 – 13 kDa) respectively [33], however the predominant localization of Dbfp2 and Dbfp3 in

the plaque, in contrast to Dpfp2 and possibly even Dpfp3 which we identified in both thread and

plaque, indicates that all of these presumed homologues might actually have different roles in

their respective byssi. This is consistent with the finding that in spite of other similarities, Dbfp1

(80 and 69 kDa) and Dpfp1 (76 and 65 kDa) are actually quite different in terms of their repeats,

DOPA content and other amino acid content [5], as discussed in section 1.1.6. Thus, in general,

it appears that in spite of several superficial similarities between the byssal compositions of the

zebra and quagga mussels, their proteins might actually vary in their roles and sequence

properties. Sequencing of quagga mussel protein bands will be useful in either asserting or

refuting our assumptions; however, the lack of a quagga mussel cDNA library makes such

analysis difficult. De novo sequencing of the peptide fragments is therefore currently underway

(Appendix A).

4.2.2 Peptide mimics: an insight into byssal protein interactions

Throughout this thesis, we have attempted to interpret the adhesive/cohesive roles of zebra

mussel byssal proteins by correlating their primary structures with their byssal distribution and

comparing to other byssal proteins with known functions. An additional way to study the

mechanism of adhesion/cohesion of the byssal proteins is to study peptide mimics of the proteins

since peptide mimics provide an easier system to work with, can be prepared in much greater

amounts, are more amenable to solution techniques and can be modified as needed. In marine

mussel byssal proteins, peptide mimics of protein sequences with tandem repeats of consensus

sequences have been studied in order to investigate the structure and interactions of these repeats

and to design mimics for biological adhesive applications [26]. One such mimic created for

commercial applications is a fusion peptide called fp-151 that consists of an fp-5 adhesive

protein sequence from Mytilus galloprovincialis flanked with copies of the mgfp-1 cuticle

protein’s decapeptide repeat on either side [84]. This mimic displays good macro-scale adhesion

and biocompatibility for various cell types [84]. At the current stage, the study of peptide mimics

of zebra mussel byssal proteins can provide useful insights into the structure and chemical

reactivity of the proteins. Therefore, in this work, we investigated peptide mimics of the only

fully sequenced zebra mussel protein at the time, Dpfp1, to learn more about its mode of

interactions [14].

Gilbert, 2010 had found that a Dpfp1 inspired fusion peptide containing one N-terminal and one

C-terminal repeat of Dpfp1 self-assembles into spherical aggregates (~500 nm diameter) upon

interaction with iron (III) [14]. It was hypothesized that in addition to complexation of iron by

the single DOPA residue in the fusion peptide, other peptide-peptide interactions must also be

responsible for aggregate formation [14]. Therefore, in our analysis, we study the mode of self-

assembly and iron complexation by the Dpfp1 inspired mimetic peptide in order to elucidate its

mechanism of aggregate formation. The results from this preliminary study are described in

Appendix B. Using Circular Dichroism (CD) Spectroscopy we determined that upon interaction

with iron (III) the fusion peptide does not adopt a specific secondary structure and instead

maintains a random coil conformation (Figure B-1), thus possibly indicating that Dpfp1 might

not adopt a specific structure if it interacts with iron (III) in the byssus. Additionally, Dynamic

Light Scattering (DLS) (Figure B-2) and Transmission Electron Microscopy (TEM) (Figure B-3)

experiments revealed that the ratio of iron (III) to fusion peptide in the mixture affects the size

of the aggregates and the rate of increase in the size of the aggregates. The ability of the Dpfp1

fusion peptide to interact with iron (III) in such a specific way could indicate that Dpfp1 has an

iron (III) dependent role in the byssus such as in the cuticle. Additionally, as discussed in

Appendix B, interpretation of DLS results indicates that in addition to DOPA-iron complexation,

other peptide-peptide interactions amongst the Dpfp1 N- and C-terminal repeats must also be

directing aggregate formation. Further experiments on solution conditions and substitutions of

charged residues in the Dpfp1 fusion peptide will give additional insights into interactions within

the Dpfp1 protein.

4.3 Future work

The current analysis has greatly enhanced our knowledge of the protein composition of the zebra

mussel byssus; however, there are still several large gaps in our understanding and we still have a

long way to go in characterizing the mechanism of zebra mussel adhesion. For one, while several

zebra mussel byssal proteins have now been identified, there are still many more that are

unknown, as evidenced by MALDI-TOF analysis of the byssus (section 1.1.5) [34] and our

inability to identify any proteins that are localized specifically in the plaque (as expected of

adhesive footprint proteins). Secondly, even though our current analysis characterizes byssal

distribution between the thread and plaque, it does not reveal any information on the distribution

of proteins within the plaque itself such as at the thread-plaque anchor zone, in the bulk plaque

matrix or at the plaque-substrate interface. As witnessed in marine mussels, the localization of

proteins within the plaque is correlated with their byssal functions (section 1.1.3) and hence,

studying zebra mussel protein distribution within the plaque is critical to our understanding of

their byssal roles. Thirdly, the structural information available on the zebra mussel byssal

proteins is still quite incomplete. The EST (Expressed Sequence Tag) derived primary sequences

in our analysis do not reveal any significant information on post-translational modifications such

as glycosylations and tyrosine hydroxylations to DOPA, thereby limiting our understanding of

their functions and interactions. Additionally, some of the EST-derived primary sequences are

incomplete at the N-terminus. Further investigations of zebra mussel byssal composition are

therefore required to identify other novel byssal proteins, determine protein distributions within

the plaque and better characterize protein structure and chemical reactivity. These are described

below.

4.3.1 Identification of other novel zebra mussel byssal proteins

Through the current work, we have identified a series of novel proteins by electrophoresis of the

soluble byssal extract and by peptide fragment fingerprinting (PFF) of the insoluble byssal

matrix. While these proteins have theoretical molecular weights around 4.1 kDa and above

(Table 3-2), MALDI-TOF analysis specifically of the mature thread revealed the presence of

several proteins around 3.7 – 4.2 kDa [34] that have not been identified yet. Additionally, there

are proteins in the range of 5.8 to 7 kDa that are unique to the plaque interface and have not been

identified yet (section 1.1.5) [34]. All of these might represent small proteins present in the

soluble extract that got washed off the gel during staining. Therefore, in order to identify these

proteins, PFF analysis can be performed directly on an undigested soluble extract filtered with

~10 kDa cutoff. Since these proteins are very small, they do not need to be digested and the LC-

MS/MS analysis should be able to isolate intact peptides instead of peptide fragments. Further, to

minimize the pre-analysis processing involved in homogenizing and separating byssal threads into

soluble extract and insoluble matrix, it will be worth attempting PFF analysis on intact, freshly

secreted byssal threads subjected to trypsin digestion. This analysis may also be useful to

validate the proteins and distributions identified thus far.

4.3.2 Determining protein distribution within the byssal plaque

While we have determined generalized distributions of several zebra mussel byssal proteins

between thread and plaque, we do not have any information on their distribution within the

plaque and hence, on their role in adhesion/cohesion. Even within the byssal plaque, proteins can

have varied localizations such as at the thread-plaque anchor zone, bulk plaque matrix and

plaque-substrate interface. Previous MALDI-TOF analysis of the different byssal regions was

done on mature byssal threads where cross-linking could have hindered the identification of

other significant protein peaks [34]. Thus, it will be useful to repeat this analysis with induced

byssal threads that have minimal cross-linking such that even some higher molecular weight

proteins not detected before may now be identifiable. Though this analysis does not reveal

protein sequences, it usefully identifies novel proteins and their hydroxylated (DOPA-modified)

variants, as well as their distributions between thread, plaque and plaque footprint [34]. The

small size and delicateness of fresh byssal threads may however make this analysis difficult but

the adhesive proteins at the interface layer can especially be studied by performing the mass

spectrometry analysis on an upturned plaque embedded in gelatin [34].

Immunohistochemistry is a useful technique used to localize an antigen such as a protein in a

tissue by developing a high affinity and specific antibody against it [20]. This method has been

used in marine mussels, however the analysis is difficult due to the poor antigen quality of some

proteins and/or significant shielding of epitopes in the mature byssus [20]. In zebra mussels,

immunolocalization of Dpfp1 did not detect the protein within the mature byssal threads (due to

epitope masking) but detected it in the foot tissue and homogenized, acid-extracted threads

(section 1.1.5) [4]. A next step is therefore to perform this analysis in plaques from freshly

secreted byssal threads where antigen epitopes will not be as masked. Additionally, now that the

sequences of several byssal proteins have been identified, antibodies can be developed against

these to locate them within the different regions of the byssus and in secretory granules

surrounding the ventral groove in the mussel’s foot.

4.3.3 Characterizing structure and chemical reactivity of byssal proteins

The EST derived primary sequences in our analysis do not reveal any information on post-

translational modifications within the protein sequence such as the presence of glycosylations

and DOPA modifications. Such information is available only for the sequences of Dpfp1 and

Dpfp2 as determined previously by staining gels for DOPA (Arnow or NBT stain) and

glycoproteins (PAS stain) and by quantifying DOPA and carbohydrate content following

purification of the proteins [33]. Therefore, for proteins that can be identified on a gel, an

important next step is to stain for DOPA and glycoproteins to determine the nature of these

proteins. In order to be able to quantify these protein characteristics and obtain N-terminal

sequences (where incomplete), the proteins will however have to first be purified in sufficient

amounts from the gel thus requiring extractions on an even larger scale, which is not necessarily

feasible. Instead, the identified target proteins can now be extracted in greater amounts from the

mussel’s foot and then be separated by 2D gel electrophoresis. Purified proteins can then be

functionally characterized for their post-translational modifications, secondary structure,

biophysical properties and precise byssal distributions.

Peptide mimics of the sequenced byssal proteins will additionally provide useful insights into

their chemical reactivities and modes of interactions. For proteins with tandem repeats (such as

Dpfp2) the chemical reactivity of individual repeats can be studied, for proteins with co-block

structures (such as Dpfp1 (section 4.3), Dpfp5 and Dpfp9), sample repeats from each block can

be fused together and for proteins that are really small (such as Dpfp12), the whole sequence

could possibly be synthesized for characterization. Unfortunately, the position of DOPA

modifications is unknown in most of these sequences and therefore assumptions on DOPA

positions would be required when synthesizing the mimics. As with experiments involving the

fusion peptide mimic of Dpfp1 described in Appendix B, the mimetic peptides can also be

interacted with metals such as iron (III) to observe their complexation abilities.

4.4 Significance and Conclusions

An understanding of zebra mussel adhesion is critical to the development of specific anti-fouling

strategies against the species and will contribute to the design of biological adhesives as an

alternative to those currently based on marine mussel adhesion. However, limited information on

the zebra mussel byssal composition had thus far held back interpretation of sequence properties

and had prevented studies on their adhesive/cohesive capacities. In the current work, the

identification of several novel byssal proteins along with information on their primary structures

and byssal distributions has greatly added to our knowledge of the protein composition of the

zebra mussel byssus. This work has allowed us to identify protein characteristics and sequence

motifs that are common to zebra mussel byssal proteins and in the future, will allow further

characterization of byssal protein adhesion/cohesion by means of mimetic peptide studies and

biochemical characterization techniques such as immunolocalization. The novel proteins

identified here do not however represent the complete list of zebra mussel byssal proteins and

therefore, future efforts must be directed at updating this list, in addition to functionally

characterizing the proteins.

Appendix A:

Quagga Mussel Adhesion: Novel Proteins and their

Byssal Distribution

In addition to our analysis on zebra mussels, we also studied the byssal protein composition of

the closely related freshwater mussel, the quagga mussel (Dreissena bugensis) which is also a

biofouling species. As described in the Introduction in section 1.1.6, thus far only four quagga

mussel byssal proteins (Dbfp0, 1, 2 and 3) have been identified as DOPA-staining precursors

upon extraction from the mussel’s foot [33] and the partial sequence of only one of these (Dbfp1)

is known [5]. Thus, as with zebra mussels, limited information is available on the composition of

the quagga mussel byssus. Hence, with an objective to identify novel byssal proteins and

determine their distributions within the byssus, we extract quagga mussel proteins from induced,

freshly secreted byssal threads, using the protocol described for zebra mussels in sections 2.3.1

and 3.3.1 and then analyze the extracts by Tricine-PAGE gel electrophoresis as described in

section 2.3.4.

We found that the quagga mussel byssal proteins are much more easily extractible than the zebra

mussel byssal proteins. While for zebra mussels, we extract 4 µg protein/mussel thread and

7 µg/plaque, in quagga mussels we are able to extract 12 µg protein/mussel thread and 10

µg/plaque (as determined by A280 measurements of extracts). Additionally, several zebra mussel

extracts (~65 full byssal threads) have to be pooled together for protein bands to be visualized on

the gel (Figure 2-1) but quagga mussel bands are clearly visualized even when proteins from a

single extraction of 15 threads/plaques are loaded (Figure A-1). The reason for the easier

extraction of the quagga mussel byssus could be the presence of less DOPA in quagga mussel

byssal proteins as shown by previous comparisons of Dpfp1 (6.6 mol% DOPA) and Dbfp1 (0.55

mol%) and of the overall amino acid contents of their byssi (0.6 mol% DOPA in zebra versus

0.1 mol% in quagga) (described in section 1.1.6) [33], [5].
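
The size of these differences can be checked with simple ratios. The short Python sketch below divides the quagga and zebra values quoted in the paragraph above; the function name `fold_change` is an illustrative choice and is not part of the original analysis.

```python
def fold_change(value, reference):
    """Return value / reference, i.e. how many times larger value is."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    return value / reference

# Extraction yields quoted above (µg protein, from A280 measurements).
zebra_yield = {"thread": 4.0, "plaque": 7.0}
quagga_yield = {"thread": 12.0, "plaque": 10.0}

yield_ratio = {
    region: fold_change(quagga_yield[region], zebra_yield[region])
    for region in zebra_yield
}

# DOPA content quoted above (mol%), zebra relative to quagga.
dopa_ratio_fp1 = fold_change(6.6, 0.55)    # Dpfp1 vs Dbfp1
dopa_ratio_byssus = fold_change(0.6, 0.1)  # whole byssus

for region, ratio in yield_ratio.items():
    print(f"quagga/zebra extraction yield ({region}): {ratio:.1f}x")
print(f"zebra/quagga DOPA content (fp1): {dopa_ratio_fp1:.0f}x")
print(f"zebra/quagga DOPA content (byssus): {dopa_ratio_byssus:.0f}x")
```

On these figures the thread yield is about three times higher in the quagga mussel, while the DOPA content of Dpfp1 is roughly twelve times that of Dbfp1.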

Figure A-1. Electrophoretic identification of quagga mussel byssal proteins. Proteins were

extracted in basic extraction buffer (as described in the Methods sections in Chapter 2 and 3)

from 15 induced, freshly secreted byssal threads and the soluble extract was loaded on a 16%

Tricine PAGE gel that was silver-stained for protein visualization. The leftmost lane

contains a Colorburst molecular weight ladder. Dbfp# labeled proteins represent previously

identified byssal proteins and underlined proteins represent novel byssal foot proteins identified

in the extract. Numbers in brackets represent the approximate molecular weights (in kDa) of the

visible proteins. The gel has been modified to show the most relevant lanes.

As with zebra mussels, electrophoretic analysis of extracts from the intact quagga mussel byssal

thread/plaque led to the identification of previously known as well as novel byssal proteins. As

shown in Figure A-1, protein bands were identified corresponding to the molecular weight of

previously known Dbfp0 (>210 kDa), the two forms of Dbfp1 (80 and 69 kDa) and Dbfp2 (22

kDa) and novel bands include those seen at approximately 30, 20 and 16 kDa (Figure A-1).

When pooled extracts of separated threads (~45) and plaques (~45) were analyzed by

electrophoresis, an even larger number of novel proteins were identified in both the thread and

plaque. These are indicated with arrows in Figure A-2. The % values beside the bands indicate

the % density of the bands relative to the plaque 7 kDa band taken as 100% density (densities

were calculated in total pixels using the Gel Analysis Software ‘UN-SCAN-IT’, Silk Scientific

Inc., Utah, USA). Most importantly, a number of significant protein bands were identified that

are present uniquely in the plaque (indicated with red asterisks in Figure A-2). Since a greater

mass of thread protein (241 µg) versus plaque protein (187 µg) was loaded on the gel, it is

unlikely that the unique plaque bands are an artifact of protein extraction and loading. Thus,

these unique plaque proteins could represent proteins with specialized adhesive functions in the

plaque. The ~16 kDa band is the most prominent protein unique to the plaque, followed by

proteins at approximately 22 kDa (possibly Dbfp2), 20 kDa, 13 kDa (possibly Dbfp3) and 12

kDa (possibly Dbfp3) some of which might be present as light bands in the thread. The Dbfp0

protein with a molecular weight greater than 210 kDa is seen in both the thread and plaque,

however the density of the plaque band is about ten times that of the thread band.
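
The relative densities quoted for Figure A-2 follow from a simple normalization against the plaque 7 kDa band. A minimal sketch of that calculation is given below; the pixel totals are hypothetical placeholders (the actual UN-SCAN-IT output is not reproduced here) and the function and band names are illustrative only.

```python
def relative_density(band_pixels, reference_pixels):
    """Express a band's density as a percentage of the reference band."""
    if reference_pixels <= 0:
        raise ValueError("reference band density must be positive")
    return 100.0 * band_pixels / reference_pixels

# Hypothetical pixel totals, for illustration only.
reference = 20000  # plaque 7 kDa band, defined as 100 %
bands = {"plaque band X": 15000, "thread band X": 1500}

for name, pixels in bands.items():
    print(f"{name}: {relative_density(pixels, reference):.1f} %")
```

With this normalization, a tenfold difference between a plaque band and its thread counterpart appears directly as a tenfold difference in the reported percentages.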

In addition to the unique plaque proteins, a number of prominent bands were identified on the gel

in Figure A-2 that are common to both thread and plaque and could possess similar roles as the

uniformly distributed zebra mussel proteins identified in our analysis. While most of these have

comparatively high molecular weights such as the bands at approximately 80 kDa (possibly

Dbfp1), 50 kDa and 35 kDa, there is a single common band at ~ 7 kDa that is relatively smaller

(Figure A-2). Additional faint but distinct protein bands are also seen throughout the thread and

plaque lanes. These could represent new proteins not described above or maybe even variants of

the other proteins (Figure A-2). MALDI analysis of the thread and plaque from a freshly

secreted byssal thread also reveals major peaks around 7451 and ~7506 Da (Figure A-3) that

correspond well to the ~ 7 kDa band seen by electrophoresis. In the plaque, additional low

intensity peaks are also seen around the major peaks. As suggested with gel bands, these peaks

could represent other new proteins or potential variants of the 7 kDa protein.

Figure A-2. Electrophoretic determination of the distribution of quagga mussel byssal proteins

between thread and plaque. Proteins from separated threads and plaques were extracted in basic

extraction buffer, pooled together, dialyzed and lyophilized as described in the Methods in

Chapter 2. Lyophilized proteins were resuspended in water and loaded on a 16% Tricine PAGE

gel that was then silver-stained for protein visualization. Masses in brackets represent the mass

of lyophilized protein loaded for each sample. Arrows indicate visible bands in the thread and

plaque and MWs (in kDa) indicate molecular weights of some significant bands. Red asterisks

indicate proteins that are almost uniquely present in the plaque extract. Per cent values beside

bands indicate the % density of the bands relative to the plaque’s 7 kDa band taken as 100%

density. The gel has been modified to show the most relevant lanes.

Figure A-3. Matrix-assisted laser desorption ionization time-of-flight mass spectrometry

(MALDI-TOF MS) analysis of the quagga mussel thread and plaque from an induced, freshly

secreted byssal thread. The byssal thread was coated with sinapinic acid (3,5-dimethoxy-4-

hydroxy cinnamic acid) matrix (10 mg/mL Sinapinic acid in 50:50:0.1 water: Acetonitrile:

Trifluoroacetic acid) and analyzed using an ‘Applied Biosystems’ MALDI-TOF analyser at the

Department of Forestry, University of Toronto, Toronto.

In zebra mussels, following identification of novel proteins, the protein primary sequences were

determined by tandem mass spectrometry of gel bands and database matching of mass spectra

against a cDNA library representing genes expressed in the mussel’s foot. Unfortunately, in

quagga mussels, such peptide fragment fingerprinting (PFF) analysis cannot be done because no

such cDNA library has been created for the species. Thus, instead, we attempted to perform de

novo sequencing analysis on protein gel bands in order to infer sequence information directly

from the experimental MS/MS spectrum [85]. To improve the quality of the MS/MS spectrum,

more concentrated protein samples were analyzed by pooling together multiple gel bands of the

same protein. Analysis was thus performed on five bands of Dbfp0 (two from the gel in Figure

A-2 and three from a gel not shown here), the thread and plaque bands of the 35 kDa protein, the

one plaque band of the 16 kDa protein and the thread and plaque bands of the 7 kDa protein, all

taken from the gel in Figure A-2. The pooled gel bands were digested with trypsin and LC-

MS/MS spectra were then obtained as described in the Methods in section 2.3.6. The de novo

sequencing software, PEAKS (Bioinformatics Solutions Inc., Waterloo, Ontario, Canada), was

then used to derive sequences of the trypsin digested protein fragments. Analysis of these de

novo sequences is still underway.
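
For readers unfamiliar with the digestion step, the sketch below shows an in silico trypsin digest using the commonly cited rule that trypsin cleaves C-terminal to lysine (K) or arginine (R) unless the next residue is proline (P). It is a simplified illustration and not the PEAKS workflow; the input sequence is an arbitrary toy example rather than a quagga mussel sequence, and the function name is hypothetical.

```python
def tryptic_digest(sequence):
    """Split a one-letter protein sequence at trypsin cleavage sites."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        # Cleave after K or R, but not when the next residue is P.
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal remainder
    return peptides

toy_sequence = "AKPGYRDSKGG"  # hypothetical sequence, for illustration
print(tryptic_digest(toy_sequence))
```

The K-P bond in the toy sequence is left intact, so the first fragment retains an internal lysine.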

Appendix B:

Peptide Mimics of the Zebra Mussel Byssal Protein

Dpfp1

Peptide mimics of zebra mussel byssal proteins can provide useful insights into the mechanism

of adhesion/cohesion of the proteins. Therefore, Gilbert, 2010 synthesized peptide mimics (by

Fmoc solid-phase synthesis) of the only fully sequenced zebra mussel byssal protein, Dpfp1, to

learn more about its mode of interaction [14]. One such mimic is a fusion peptide made by fusing

one N-terminal [P(V/E)YP(T/S)(K/Q)X] and one C-terminal consensus repeat

[KPGPY*DYDGPYDK] of Dpfp-1 (Figure 1-5). The resulting peptide has a sequence

PVYPTKYKPGPY*DYDGPYDK where Y* stands for DOPA [14]. Gilbert, 2010 found that

upon complexation with iron (III), this fusion peptide self-assembles into a film over several

days but no film is formed in the absence of DOPA or iron. When the film was characterized by

Scanning Electron Microscopy (SEM), a layer of spherical aggregates about 500 nm in diameter

was seen [14]. Such aggregate formation was not seen with just the DOPA containing C-terminal

repeat or even a double C-terminal repeat (containing 2 DOPAs) even in the presence of iron.

Gilbert, 2010 also found that the repeats must be part of the same peptide and not just mixed

together as a co-solution for this aggregate formation to occur [14]. These observations thus

indicated that the two Dpfp1 repeat sequences and Fe3+ must interact in a very specific way to

induce self-assembly into aggregates [14].
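
As a quick consistency check on the sequences quoted above, the following Python sketch confirms that the fusion peptide is one instance of the N-terminal consensus followed by the C-terminal repeat, and locates the DOPA residue. The regular-expression encoding of the consensus and the helper name `strip_dopa` are illustrative assumptions; only the sequences themselves come from the text.

```python
import re

# N-terminal consensus P(V/E)YP(T/S)(K/Q)X, written as a regular expression.
N_TERM_CONSENSUS = re.compile(r"P[VE]YP[TS][KQ][A-Z]")
C_TERM_REPEAT = "KPGPY*DYDGPYDK"         # Y* denotes DOPA
FUSION_PEPTIDE = "PVYPTKYKPGPY*DYDGPYDK"

def strip_dopa(seq):
    """Return (plain sequence, 1-based positions of DOPA residues)."""
    plain, positions = [], []
    for ch in seq:
        if ch == "*":
            positions.append(len(plain))  # marks the Y just appended
        else:
            plain.append(ch)
    return "".join(plain), positions

fusion_plain, dopa_positions = strip_dopa(FUSION_PEPTIDE)
c_plain, _ = strip_dopa(C_TERM_REPEAT)
n_part = fusion_plain[: len(fusion_plain) - len(c_plain)]

print("N-terminal part:", n_part)
print("matches consensus:", bool(N_TERM_CONSENSUS.fullmatch(n_part)))
print("ends with C-terminal repeat:", fusion_plain.endswith(c_plain))
print("length:", len(fusion_plain), "DOPA at position(s):", dopa_positions)
```

The same bookkeeping gives 26 residues for a double C-terminal repeat (2 x 13).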

Characterization of the specific interactions between the two Dpfp1 repeat sequences and Fe3+

that lead to aggregate formation will provide useful information on the function and mechanism

of adhesion/cohesion of the Dpfp1 protein [14]. Therefore, here, we investigate the mode of iron

complexation and self-assembly of the Dpfp1-inspired mimetic peptide in order to elucidate its

mechanism of aggregate formation. In this direction, we study the peptide’s secondary structure

upon self-assembly in the presence of iron (III) using Circular Dichroism (CD) Spectroscopy and

investigate the size of aggregates formed by iron complexation under varied iron (III) to fusion

peptide ratios (Fe3+ : FP) using Dynamic Light Scattering (DLS) and Transmission Electron

Microscopy (TEM).

Dynamic Light Scattering (DLS) measurements of aggregate size of solutions containing iron

(III) and the DOPA-containing fusion peptide revealed the formation of aggregates of varying

sizes depending on the solution conditions such as the Fe3+ : FP ratios (as will be discussed). In

the absence of iron or replacement of peptidyl DOPA with tyrosine, no aggregate formation was

detected. However, a double C-terminal repeat (26 residues) containing 2 DOPAs showed even

bigger aggregate formation (results not shown here). These observations reinforced the

importance of both iron and DOPA in peptide aggregate formation. All iron – peptide mixtures

were prepared by mixing 2 mg/mL filtered peptide solutions in water (pH 6.5) with an equal

volume of the desired concentration of filtered FeCl3.6H2O in 20 mM BisTris buffer (2,2-

Bis(hydroxymethyl)-2,2’,2”-nitrilotriethanol). BisTris buffer complexes the iron and ensures that

it does not precipitate out as iron hydroxide. This method was adapted from Gilbert, 2010 [14].
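
To illustrate how a target Fe3+ : FP ratio translates into stock concentrations under this equal-volume mixing scheme, the sketch below estimates the fusion peptide's molecular weight and the matching FeCl3 stock concentration. It rests on several assumptions that are not stated in the text: the ratios are taken to be molar, the peptide is assumed to have free termini and no counter-ions, average residue masses are used, and DOPA is modelled as tyrosine plus one oxygen. The function names are hypothetical.

```python
# Average residue masses in Da (standard values for the 20 amino acids).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594,
    "N": 114.1038, "D": 115.0886, "Q": 128.1307, "K": 128.1741,
    "E": 129.1155, "M": 131.1926, "H": 137.1411, "F": 147.1766,
    "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153         # added once for the free N- and C-termini
OXYGEN = 15.9994        # assumed mass gain per Tyr -> DOPA hydroxylation
FECL3_6H2O_MW = 270.30  # g/mol

def peptide_mass(sequence, n_dopa=0):
    """Approximate average molecular weight (Da) of a linear peptide."""
    return sum(RESIDUE_MASS[r] for r in sequence) + WATER + n_dopa * OXYGEN

def iron_stock_mM(fe_parts, fp_parts, peptide_mg_per_ml, peptide_mw):
    """FeCl3 stock (mM) giving a fe:fp molar ratio when mixed 1:1 by volume."""
    peptide_mM = peptide_mg_per_ml / peptide_mw * 1000.0
    return peptide_mM * fe_parts / fp_parts

mw = peptide_mass("PVYPTKYKPGPYDYDGPYDK", n_dopa=1)
print(f"estimated peptide MW: {mw:.1f} Da")
for fe, fp in [(1, 5), (1, 3), (1, 2), (2, 1)]:
    conc = iron_stock_mM(fe, fp, 2.0, mw)
    mass = conc * FECL3_6H2O_MW / 1000.0  # mg/mL of FeCl3.6H2O
    print(f"Fe3+ : FP = {fe}:{fp} -> {conc:.3f} mM ({mass:.3f} mg/mL)")
```

Because equal volumes are mixed, the molar ratio in the final mixture equals the ratio of the two stock molarities, which is why no dilution factor appears in the calculation.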

In order to characterize any change in the secondary structure of the peptide upon interaction

with iron (III), we performed CD spectroscopy of the fusion peptide solution both in the presence

and absence of iron (III). We found that in both conditions the peptide maintains a random coil

conformation thus indicating that the Dpfp1-inspired fusion peptide does not form any specific

secondary structure upon complexation with iron. Figure B-1 illustrates the CD spectrum of a

2:1 Fe(III):fusion peptide solution in BisTris, where the negative trough between 190 and 225

nm indicates random coil and the 230 nm peak indicates tyrosine.

Figure B-1. Circular Dichroism spectrum of a 2:1 Fe3+:fusion peptide solution in BisTris buffer.

Next, in order to characterize the effect of the iron to peptide ratio on aggregate formation, we

used DLS analysis to determine the size of aggregates formed with different ratios. Figure B-2

shows the increase in aggregate size for four Fe3+:FP ratios over 10 minutes, starting five

minutes after mixing. Smaller Fe3+:FP ratios (1:5, 1:3, 1:2) form larger

aggregates than the larger Fe3+:FP ratio (2:1) and also show a greater rate of increase in aggregate

size. The same ratios at pH 7 instead of pH 6 (shown in Figure B-2) give the same pattern of rate

of increase in aggregate size but also show much bigger aggregates being formed (results not

shown here). The pH effect could however be due to changes in interactions of iron with the

BisTris buffer which has a pKa of 6.5 at 25°C. In addition to DLS, TEM images of the 1:2 and

2:1 Fe3+:FP solutions (five minutes after mixing) were also taken in order to visualize the shape

of the aggregates and verify the DLS findings. The TEM images, shown in Figure B-3,

also confirmed that smaller iron to peptide ratios form larger aggregates. In fact, the 1:2 ratio

forms aggregates twice as large (~30 nm radius) as those of the 2:1 ratio (~15 nm radius) (Figure B-3),

which is similar to the pattern of aggregate sizes obtained by DLS for the 1:2 and 2:1 ratios, 48

nm and 19 nm, respectively. The TEM images also indicate that the 1:2 Fe3+:FP

solution has aggregates that tend to form clusters, whereas the aggregates in the 2:1 Fe3+:FP

solution are more dispersed (Figure B-3).
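The "rate of increase in aggregate size" compared above can be quantified, for example, as the slope of a least-squares line through the radius-versus-time readings. The sketch below shows one way to do this and is not necessarily the analysis used here. The radius values are made-up placeholders, not measured data; only the time points (six readings at 2-minute intervals, starting 5 minutes after mixing) follow the protocol described for Figure B-2.

```python
def growth_rate(times_min, radii_nm):
    """Least-squares slope (nm/min) and intercept (nm) of radius versus time."""
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_r = sum(radii_nm) / n
    # Sums of squares and cross-products about the means.
    s_tt = sum((t - mean_t) ** 2 for t in times_min)
    s_tr = sum((t - mean_t) * (r - mean_r) for t, r in zip(times_min, radii_nm))
    slope = s_tr / s_tt
    intercept = mean_r - slope * mean_t
    return slope, intercept


# Six readings at 2-minute intervals, starting 5 minutes after mixing.
TIMES_MIN = [5, 7, 9, 11, 13, 15]

# Hypothetical placeholder radii (nm) for one sample; NOT data from this work.
PLACEHOLDER_RADII_NM = [20, 26, 32, 38, 44, 50]

if __name__ == "__main__":
    slope, intercept = growth_rate(TIMES_MIN, PLACEHOLDER_RADII_NM)
    print(f"growth rate: {slope:.2f} nm/min (intercept {intercept:.2f} nm)")
```

Comparing slopes rather than single readings makes the comparison between ratios less sensitive to the absolute sizes, which were not always consistent between experiments.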

[Figure B-1 plot. Axes: Absorbance (-7 to 2) versus Wavelength (nm), 190 to 250.]

Figure B-2. Dynamic Light Scattering measurements of the effect of iron (III) to fusion peptide

ratio on size of aggregates formed. Aggregate sizes of mixtures of four Fe3+:FP ratios (2:1, 1:2,

1:3 and 1:5) at pH 6 were measured over ten minutes, starting from 5 minutes after mixing. DLS

measurements were taken at 20°C with 6 line measurements of each sample taken at 2-minute

intervals, using a Malvern Zetasizer NanoZS instrument. Aggregate sizes measured were not

always consistent between experiments but the general pattern of results was the same.

[Figure B-2 plot. Axes: Aggregate Radius (nm), 0 to 600, versus Time (min), 5 to 15. Series by Fe3+:FP ratio: 1:5, 1:3, 1:2 and 2:1, all at pH 6.]

Figure B-3. Transmission Electron Microscopy (TEM) images depicting the effect of two Fe3+:fusion

peptide (FP) ratios (2:1 and 1:2) on the size of aggregates formed. TEM was done by

negative staining with phosphotungstic acid (PTA) on a carbon-coated nickel grid, 5 minutes

after mixing. Images were taken with a Tecnai 20 microscope at Mt. Sinai Hospital, Toronto.

In trying to understand why lower iron to peptide ratios lead to bigger peptide aggregates, we

consulted a hypothesis by Zeng et al., 2010, in which they analyze the effects of iron

concentration on the modes of iron-DOPA complexation within a DOPA-containing marine

mussel protein [86]. In accordance with their hypothesis, at lower iron to DOPA ratios of 1:2 and

1:3 (Figure B-2), bis- and tris-complexation takes place respectively, causing peptides to come

together and form aggregates. At higher iron to DOPA ratios (greater than 1:1),

mono-complexation takes place and the peptides are dispersed [86]. While this hypothesis explains the

results seen for most of the ratios in Figure B-2, it does not explain why the 1:5 Fe3+:FP ratio

produces even bigger aggregates (since one Fe3+ cannot be complexed by more than three DOPA

residues). Therefore, in addition to iron-DOPA complexation, other interactions such as

peptide-peptide interactions must also occur within the aggregate and contribute to the results we see.

Gilbert (2010) hypothesized that, in addition to the DOPA-iron interactions within the fusion

peptide (PVYPTKYKPGPY*DYDGPYDK) where Y* is DOPA, other interactions between the

positively charged lysine (K) and negatively charged aspartic acid (D) residues could also be

responsible for aggregate formation [14]. We tested this hypothesis by introducing a charge

shielding agent, NaCl, in the solution to shield any K and D interactions, but instead of

interfering with aggregate formation, higher concentrations of NaCl led to the formation of

bigger aggregates with a greater rate of increase in aggregate size (result not shown here). It

could be that NaCl helps shield some repulsive forces between peptides and hence promotes

bigger aggregates. In the future, it will be useful to directly introduce modifications to the fusion

peptide such as replacing charged residues with uncharged glycine to better study their role in

aggregate formation. Additionally, UV-Vis spectroscopy of the varied iron to peptide mixtures

will provide useful information on the mode of iron complexation (mono-, bis-, or tris-) by the

peptidyl DOPA. Overall, our experiments have revealed some interesting observations on the

nature of the aggregates formed by the interaction of the Dpfp1-inspired fusion peptide with iron

(III); however, further experiments will be needed to better understand the mechanism of

self-assembly and iron complexation involved in aggregate formation.
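The stoichiometric reasoning behind the complexation hypothesis discussed above can be summarized in a short sketch. It encodes only what is stated there: mono-complexation at iron to DOPA ratios greater than 1:1, bis- at 1:2, tris- at 1:3, and at most three DOPA residues per Fe3+. The function name and the treatment of intermediate ratios (for example, anything between 1:1 and 1:2 is labelled "mono") are simplifying assumptions for illustration, not part of the hypothesis in [86].

```python
from fractions import Fraction

# One Fe3+ cannot be complexed by more than three DOPA residues.
MAX_DOPA_PER_FE = 3


def expected_complex(fe, dopa):
    """Return (mode, unbound DOPA per Fe3+) for an fe:dopa molar ratio.

    The cut-offs between the ratios named in the text are assumptions.
    """
    dopa_per_fe = Fraction(dopa, fe)
    if dopa_per_fe >= 3:
        mode = "tris"
    elif dopa_per_fe >= 2:
        mode = "bis"
    else:
        mode = "mono"
    # DOPA beyond three per iron has no iron left to coordinate.
    unbound = max(dopa_per_fe - MAX_DOPA_PER_FE, Fraction(0))
    return mode, unbound


if __name__ == "__main__":
    for fe, dopa in [(2, 1), (1, 2), (1, 3), (1, 5)]:
        mode, unbound = expected_complex(fe, dopa)
        print(f"{fe}:{dopa} Fe3+:DOPA -> {mode}, unbound DOPA per Fe3+: {unbound}")
```

Under these assumptions the 1:5 ratio is still classified as tris-complexation, with two DOPA residues per iron left uncoordinated, which is why iron-DOPA stoichiometry alone does not account for the even larger aggregates seen at that ratio.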

References

1. Claudi, R. and G.L. Mackie, Practical manual for zebra mussel monitoring and control. 1994:

CRC.

2. Strayer, D.L., Twenty years of zebra mussels: lessons from the mollusk that made headlines.

Front Ecol Environ, 2008. 7(3): p. 135-141.

3. Morton, B., The anatomy of Dreissena polymorpha and the evolution and success of the

heteromyarian form in the Dreissenoidea. Zebra mussels: biology, impacts and control, 1993.

185: p. 216.

4. Anderson, K.E. and J.H. Waite, Immunolocalization of Dpfp1, a byssal protein of the zebra

mussel Dreissena polymorpha. J Exp Biol, 2000. 203(Pt 20): p. 3065-3076.

5. Anderson, K.E. and J.H. Waite, Biochemical characterization of a byssal protein from Dreissena

bugensis (Andrusov). Biofouling, 2002. 18(1): p. 37-45.

6. Farsad, N. and E.D. Sone, Zebra mussel adhesion: Structure of the byssal adhesive apparatus in

the freshwater mussel, Dreissena polymorpha. J Struct Biol, 2012. 177(3): p. 613-620.

7. Eckroat, L.R. and L.M. Steele, Comparative morphology of the byssi of Dreissena polymorpha

and Mytilus edulis. Am Malacol Bull, 1993. 10: p. 103-108.

8. Allen, J.A., The recent Bivalvia: Their form and evolution. 1985.

9. Waite, J.H., et al., Mussel adhesion: finding the tricks worth mimicking. The journal of adhesion,

2005. 81(3-4): p. 297-317.

10. Rzepecki, L.M. and J.H. Waite, The byssus of the zebra mussel, Dreissena polymorpha. I:

Morphology and in situ protein processing during maturation. Mol Mar Biol Biotechnol, 1993.

2(5): p. 255-266.

11. Lee, H., N.F. Scherer, and P.B. Messersmith, Single-molecule mechanics of mussel adhesion.

Proc Natl Acad Sci U S A, 2006. 103(35): p. 12999-13003.

12. Taylor, S.W., et al., Ferric ion complexes of a DOPA-containing adhesive protein from Mytilus

edulis. Inorganic Chemistry, 1996. 35(26): p. 7572-7577.

13. Holten-Andersen, N., et al., Metals and the integrity of a biological coating: the cuticle of mussel

byssus. Langmuir, 2009. 25(6): p. 3323-3326.

14. Gilbert, T.W., Investigation of the protein components of the zebra mussel (Dreissena

polymorpha) byssal adhesion apparatus, in Institute of Biomaterials and Biomedical

Engineering. 2010, University of Toronto, MASc Thesis.

15. Burzio, L.A. and J.H. Waite, Cross-linking in adhesive quinoproteins: studies with model

decapeptides. Biochemistry, 2000. 39(36): p. 11147-11153.

16. Yu, J., et al., Mussel protein adhesion depends on interprotein thiol-mediated redox modulation.

Nat Chem Biol, 2011. 7(9): p. 588-590.

17. Farsad, N., T.W. Gilbert, and E.D. Sone, Adhesive structure of the freshwater zebra mussel,

Dreissena polymorpha, in Materials Research Society. 2009, Materials Research Society

Symposium Proceedings.

18. Lee, B.P., et al., Mussel-Inspired adhesives and coatings. Annu Rev Mater Res, 2011. 41: p. 99-

132.

19. Silverman, H.G. and F.F. Roberto, Understanding marine mussel adhesion. Mar Biotechnol

(NY), 2007. 9(6): p. 661-681.

20. Waite, J.H., Adhesion a la moule. Integrative and comparative biology, 2002. 42(6): p. 1172-

1180.

21. Waite, J.H., X.X. Qin, and K.J. Coyne, The peculiar collagens of mussel byssus. Matrix Biol,

1998. 17(2): p. 93-106.

22. Qin, X.X., K.J. Coyne, and J.H. Waite, Tough tendons. Mussel byssus has collagen with silk-like

domains. J Biol Chem, 1997. 272(51): p. 32623-32627.

23. Coyne, K.J., X.X. Qin, and J.H. Waite, Extensible collagen in mussel byssus: A natural block

copolymer. Science, 1997. 277(5333): p. 1830-1832.

24. Qin, X.X. and J.H. Waite, A potential mediator of collagenous block copolymer gradients in

mussel byssal threads. Proc Natl Acad Sci U S A, 1998. 95(18): p. 10517-10522.

25. Sagert, J. and J.H. Waite, Hyperunstable matrix proteins in the byssus of Mytilus

galloprovincialis. J Exp Biol, 2009. 212(Pt 14): p. 2224-2236.

26. Cha, H.J., D.S. Hwang, and S. Lim, Development of bioadhesives from marine mussels.

Biotechnol J, 2008. 3(5): p. 631-638.

27. Taylor, S.W., et al., trans-2, 3-cis-3, 4-Dihydroxyproline in the tandemly repeated consensus

decapeptides of an adhesive protein from Mytilus edulis. J Am Chem Soc, 1994. 116(10): p. 803-

810.

28. Rzepecki, L.M., K.M. Hansen, and J.H. Waite, Characterization of a cystine-rich polyphenolic

protein family from the blue mussel Mytilus edulis L. The Biological Bulletin, 1992. 183(1): p.

123-137.

29. Warner, S.C. and J.H. Waite, Expression of multiple forms of an adhesive plaque protein in an

individual mussel, Mytilus edulis. Marine Biology, 1999. 134(4): p. 729-734.

30. Zhao, H. and J.H. Waite, Proteins in load-bearing junctions: the histidine-rich metal-binding

protein of mussel byssus. Biochemistry, 2006. 45(47): p. 14223-14231.

31. Waite, J.H. and X. Qin, Polyphosphoprotein from the adhesive pads of Mytilus edulis.

Biochemistry, 2001. 40(9): p. 2887-2893.

32. Zhao, H. and J.H. Waite, Linking adhesive and structural proteins in the attachment plaque of

Mytilus californianus. J Biol Chem, 2006. 281(36): p. 26150-26158.

33. Rzepecki, L.M. and J.H. Waite, The byssus of the zebra mussel, Dreissena polymorpha. II:

Structure and polymorphism of byssal polyphenolic protein families. Mol Mar Biol Biotechnol,

1993. 2(5): p. 267-279.

34. Gilbert, T.W. and E.D. Sone, The byssus of the zebra mussel (Dreissena polymorpha): spatial

variations in protein composition. Biofouling, 2010. 26(7): p. 829-836.

35. Anderson, K.E. and J.H. Waite, A major protein precursor of zebra mussel (Dreissena

polymorpha) byssus: deduced sequence and significance. Biol Bull, 1998. 194(2): p. 150-160.

36. Endrizzi, B.J. and R.J. Stewart, Glueomics: an expression survey of the adhesive gland of the

sandcastle worm. J Adhes, 2009. 85(8): p. 546-559.

37. Wiegemann, M. and B. Watermann, The impact of desiccation on the adhesion of barnacles

attached to non-stick coatings. Biofouling, 2004. 20(3): p. 147-153.

38. Flammang, P., et al., A study of the temporary adhesion of the podia in the sea star Asterias

rubens (Echinodermata, Asteroidea) through their footprints. J Exp Biol, 1998. 201(Pt 16): p.

2383-2395.

39. Flammang, P., J. Ribesse, and M. Jangoux, Biomechanics of adhesion in sea cucumber Cuvierian

tubules (Echinodermata, Holothuroidea). Integrative and comparative biology, 2002. 42(6): p. 1107-1115.

40. Stewart, R.J. and C.S. Wang, Adaptation of caddisfly larval silks to aquatic habitats by

phosphorylation of H-fibroin serines. Biomacromolecules, 2010. 11(4): p. 969-974.

41. Wang, C.S. and R.J. Stewart, Localization of the bioadhesive precursors of the sandcastle worm,

Phragmatopoma californica (Fewkes). J Exp Biol, 2012. 215(2): p. 351-361.

42. Zhao, H., et al., Cement proteins of the tube-building polychaete Phragmatopoma californica. J

Biol Chem, 2005. 280(52): p. 42938-42944.

43. Ohkawa, K., et al., Purification and characterization of a dopa-containing protein from the foot

of the Asian freshwater mussel, Limnoperna fortunei. Biofouling, 1999. 14(3): p. 181-188.

44. Yamamoto, H. and K. Ohkawa, Synthesis of adhesive protein from the vitellaria of the liver

fluke Fasciola hepatica. Amino Acids, 1993. 5(1): p. 71-75.

45. Gatesy, J., et al., Extreme diversity, conservation, and convergence of spider silk fibroin

sequences. Science, 2001. 291(5513): p. 2603-2605.

46. Holten-Andersen, N., H. Zhao, and J.H. Waite, Stiff coatings on compliant biofibers: the cuticle

of Mytilus californianus byssal threads. Biochemistry, 2009. 48(12): p. 2752-2759.

47. Zhao, H. and J.H. Waite, Coating proteins: structure and cross-linking in fp-1 from the green

shell mussel Perna canaliculus. Biochemistry, 2005. 44(48): p. 15915-15923.

48. Burzio, L.A., et al., The adhesive protein of Choromytilus chorus (Molina, 1782) and Aulacomya

ater (Molina, 1782): a proline-rich and a glycine-rich polyphenolic protein. Biochim Biophys

Acta, 2000. 1479(1-2): p. 315-320.

49. Rzepecki, L.M., et al., Molecular diversity of marine glues: polyphenolic proteins from five

mussel species. Mol Mar Biol Biotechnol, 1991. 1(1): p. 78-88.

50. Ninan, L., et al., Adhesive strength of marine mussel extracts on porcine skin. Biomaterials, 2003.

24(22): p. 4091-4099.

51. Lee, B., J. Dalsin, and P. Messersmith, Biomimetic adhesive polymers based on mussel adhesive

proteins. Biological Adhesives, 2006: p. 257-278.

52. Penoff, J., Skin closures using cyanoacrylate tissue adhesives. Plastic and reconstructive surgery,

1999. 103(2): p. 730.

53. Brubaker, C.E. and P.B. Messersmith, The present and future of biologically inspired adhesive

interfaces and materials. Langmuir, 2012. 28(4): p. 2200-2205.

54. Farsad, N., Ultrastructural and Histochemical Characterization of the Zebra Mussel Adhesive

Apparatus, in Institute for Biomaterials and Biomedical Engineering. 2010, University of

Toronto, MASc Thesis.

55. Eckroat, L.R., et al., The byssus of the zebra mussel (Dreissena polymorpha): Morphology, byssal

thread formation, and detachment. Mol Mar Biol Biotechnol, 1993: p. 239-263.

56. Xu, W. and M. Faisal, Putative identification of expressed genes associated with attachment of

the zebra mussel (Dreissena polymorpha). Biofouling, 2008. 24(3): p. 157-161.

57. Xu, W. and M. Faisal, Development of a cDNA microarray of zebra mussel (Dreissena

polymorpha) foot and its use in understanding the early stage of underwater adhesion. Gene,

2009. 436(1-2): p. 71-80.

58. Tamarin, A., P. Lewis, and J. Askey, The structure and formation of the byssus attachment

plaque in Mytilus. J Morphol, 1976. 149(2): p. 199-221.

59. Sprung, M., Field and laboratory observations of Dreissena polymorpha larvae: abundance,

growth, mortality and food demands. Archives in Hydrobiology, 1989. 115(4): p. 537-561.

60. Waite, J.H., Process for purifying and stabilizing catechol-containing proteins and materials

obtained thereby. 1984, University of Connecticut, Farmington, Conn.: United States. p. 5.

61. Mortz, E., et al., Improved silver staining protocols for high sensitivity protein identification

using matrix-assisted laser desorption/ionization-time of flight analysis. Proteomics, 2001. 1(11):

p. 1359-1363.

62. Zhu, Y.Y., et al., Reverse transcriptase template switching: a SMART approach for full-length

cDNA library construction. Biotechniques, 2001. 30(4): p. 892-897.

63. Burzio, L.A., et al., In vitro polymerization of mussel polyphenolic proteins catalyzed by

mushroom tyrosinase. Comp Biochem Physiol B Biochem Mol Biol, 2000. 126(3): p. 383-389.

64. Gantayet, A. and E.D. Sone, Novel adhesive proteins identified in the insoluble byssal matrix of

the freshwater zebra mussel Dreissena polymorpha. 2012, University of Toronto, Unpublished

Manuscript: Toronto.

65. Waite, J.H., T.J. Housley, and M.L. Tanzer, Peptide repeats in a mussel glue protein: theme and

variations. Biochemistry, 1985. 24(19): p. 5010-5014.

66. Popiel, H.A., et al., Disruption of the toxic conformation of the expanded polyglutamine stretch

leads to suppression of aggregate formation and cytotoxicity. Biochem Biophys Res Commun,

2004. 317(4): p. 1200-1206.

67. Weaver, R.F., Molecular Biology. Third ed. 2005, New York: McGraw-Hill.

68. Lee, C., et al., Sequence analysis of choriogenin H gene of medaka (Oryzias latipes) and mRNA

expression. Environ Toxicol Chem, 2002. 21(8): p. 1709-1714.

69. Lyons, C.E., et al., Expression and structural analysis of a teleost homolog of a mammalian zona

pellucida gene. J Biol Chem, 1993. 268(28): p. 21351-21358.

70. Elias, R.J., D.J. McClements, and E.A. Decker, Antioxidant activity of cysteine, tryptophan, and

methionine residues in continuous phase beta-lactoglobulin in oil-in-water emulsions. J Agric

Food Chem, 2005. 53(26): p. 10248-10253.

71. Brennan, T.V. and S. Clarke, Deamidation and isoaspartate formation in model synthetic

peptides: The effects of sequence and solution environment. ChemInform, 1995. 26(32).

72. Marsden, J.E. and D.M. Lansky, Substrate selection by settling zebra mussels, Dreissena

polymorpha, relative to material, texture, orientation, and sunlight. Can J Zool, 2000. 78(5): p.

787-793.

73. Gantayet, A., L. Ohana, and E.D. Sone, Identification and sequence analysis of novel proteins in

the zebra mussel adhesive apparatus. 2012, University of Toronto, Unpublished Manuscript:

Toronto.

74. Gentzel, M., et al., Preprocessing of tandem mass spectrometric data to support automatic

protein identification. Proteomics, 2003. 3(8): p. 1597-1610.

75. Ma, B., et al., PEAKS: powerful software for peptide de novo sequencing by tandem mass

spectrometry. Rapid communications in mass spectrometry, 2003. 17(20): p. 2337-2342.

76. Karty, J.A., et al., Artifacts and unassigned masses encountered in peptide mass mapping. J

Chromatogr B Analyt Technol Biomed Life Sci, 2002. 782(1-2): p. 363-383.

77. Wright, H.T., Nonenzymatic deamidation of asparaginyl and glutaminyl residues in proteins. Crit

Rev Biochem Mol Biol, 1991. 26(1): p. 1-52.

78. Stewart, R.J., T.C. Ransom, and V. Hlady, Natural Underwater Adhesives. J Polym Sci B Polym

Phys, 2011. 49(11): p. 757-771.

79. Zhao, H., et al., Probing the adhesive footprints of Mytilus californianus byssus. J Biol Chem,

2006. 281(16): p. 11090-11096.

80. Yano, M., et al., Shematrin: a family of glycine-rich structural proteins in the shell of the pearl

oyster Pinctada fucata. Comp Biochem Physiol B Biochem Mol Biol, 2006. 144(2): p. 254-262.

81. Lei, M. and R. Wu, A novel glycine-rich cell wall protein gene in rice. Plant Mol Biol, 1991.

16(2): p. 187-198.

82. Inoue, K., et al., Mussel adhesive plaque protein gene is a novel member of epidermal growth

factor-like gene family. J Biol Chem, 1995. 270(12): p. 6698-6701.

83. Yu, J., et al., Mussel protein adhesion depends on interprotein thiol-mediated redox modulation.

Nat Chem Biol, 2011. 7(9): p. 588-590.

84. Hwang, D.S., et al., Practical recombinant hybrid mussel bioadhesive fp-151. Biomaterials, 2007.

28(24): p. 3560-3568.

85. Hernandez, P., M. Muller, and R.D. Appel, Automated protein identification by tandem mass

spectrometry: issues and strategies. Mass Spectrom Rev, 2006. 25(2): p. 235-254.

86. Zeng, H., et al., Strong reversible Fe3+-mediated bridging between dopa-containing protein

films in water. Proc Natl Acad Sci U S A, 2010. 107(29): p. 12850-12853.