

  • 1

    Development of Sensitive High Performance Analytical Methods for the

    Comprehensive Characterization of Proteins and Glycoproteins from Samples of

    Clinical and Biopharmaceutical Importance

    A dissertation presented

    by

    Dipak A. Thakur

    to

    The Department of Chemistry and Chemical Biology

    In partial fulfillment of the requirements

    for the degree of

    Doctor of Philosophy

    in the field of

    Chemistry

    Northeastern University

    Boston, Massachusetts

    June 2011

  • 2

    Development of Sensitive High Performance Analytical Methods for the

    Comprehensive Characterization of Proteins and Glycoproteins from Samples of

    Clinical and Biopharmaceutical Importance

    by

    Dipak A. Thakur

    ABSTRACT OF DISSERTATION

    Submitted in partial fulfillment of the requirements for the degree

    of Doctor of Philosophy in Chemistry in the Graduate School of

    Arts and Sciences of Northeastern University, June 2011

  • 3

    ABSTRACT

    This thesis focuses on the development of ultrasensitive, high-resolution analytical
    methods for the characterization of proteins and glycoproteins from samples of clinical
    and biopharmaceutical origin. The first part describes the combination of laser capture
    microdissection (LCM), for the selective enrichment of homogeneous but low-number cell
    populations, with downstream porous layer open tubular (PLOT) column liquid
    chromatography-mass spectrometry (LC-MS) using both one- and two-dimensional
    separations. The second part describes the ultra-high-performance analysis of intact
    recombinant α-human chorionic gonadotrophin glycoforms using capillary electrophoresis
    coupled to accurate-mass, high-resolution Fourier transform ion cyclotron resonance
    mass spectrometry (CE-FTMS).

    In Chapter 1, current analytical methods and technologies applied in the field of
    proteomics are reviewed. These technologies are also critiqued, laying the foundation
    for the developments and improvements to the current state of the art presented in the
    subsequent chapters.

    In Chapter 2, the development of a micro-proteomic workflow for the comprehensive
    analysis of just 10,000 cells, collected by LCM from invasive and metastatic epithelial
    cell types from a breast cancer patient, is described. To minimize sample loss, an
    efficient sample handling approach was necessary. To achieve this, protein-level
    separation and subsequent enzymatic digestion of the cell lysate were performed using
    short-distance SDS-PAGE separation on tricine-PAGE gels. By combining this sample
    clean-up and fractionation approach with ultrasensitive 1D PLOT LC-MS, in excess of
    1,000 proteins were identified following injection of just

  • 4

    1/10th of the digested lysate, or approximately 1,000 cells. The micro-proteomic
    workflow is highly suited to the comparative analysis of such small but highly
    informative LCM-collected cell populations: more than 100 proteins were found to be
    differentially expressed, facilitating a deeper understanding of the biological changes
    associated with the invasive-to-metastatic transition.

    In Chapter 3, the application of an online 2D-RP/SCX/SPE/PLOT LC-FT-MS micro-proteomics
    platform is presented for the comparative proteomic analysis of LCM-collected normal
    and triple negative breast cancer cell populations. Using the effective sample handling
    approach described in Chapter 2, followed by fractionation and ultrasensitive analysis
    of the tryptic digest corresponding to 4,000 cells on the 2D-RP/SCX/SPE/PLOT LC-FT-MS
    platform, in excess of 15,000 unique peptides corresponding to 4,259 proteins were
    identified. This deep proteome coverage further emphasizes the utility of the developed
    micro-proteomic platform for the analysis of trace quantities of proteins generated
    from small but biologically important LCM-enriched cell populations.

    In Chapter 4, the development and application of a high-resolution CE-FTMS method for
    intact glycoform profiling of recombinant α-human chorionic gonadotrophin (r-αhCG) is
    described. The CE separation parameters used allowed for the rapid analysis of 60
    different glycoforms bearing up to nine sialic acids, in addition to other glycoforms
    differing in the number and extent of uncharged monosaccharides. A low-volume
    pressurized liquid junction, which preserves the high resolution of the CE separation,
    was used to interface the CE system with high-resolution FTMS, thereby allowing
    accurate determination of the charge state and accurate mass of each intact

  • 5

    glycoform following deconvolution. In addition to intact glycoform profiling, analysis
    of glycopeptides and glycans was also performed to determine and assign the population
    of oligosaccharides present at each individual glycosite, thereby facilitating complete
    and comprehensive characterization of r-αhCG. The methodology developed in Chapter 4
    was further applied to the analysis of r-αhCG from two different expression systems,
    CHO and murine cell-based. The CE-FTMS method is readily applicable to the
    characterization of drug substance and drug product, as well as to in-process
    monitoring of these complex glycoforms.

  • 6

    ACKNOWLEDGEMENT

    I want to express my sincere and heartfelt gratitude to many people, teachers, colleagues

    and friends, who have helped me in reaching this milestone.

    First, I would like to acknowledge my thesis advisor, Professor Barry L. Karger, for

    accepting me as his student and giving me an opportunity to work in his research group.

    His guidance was constructive and aimed at bringing out the best in me as a scientist and a

    person. Importantly, I was inspired and motivated by his wisdom, enthusiasm and

    commitment to the highest standards.

    I would like to thank Dr. Tomas Rejtar for devoting his time and energy while guiding

    me on various projects. I would also like to thank Dr. Marina Hincapie, Dr. Andras

    Guttman, Dr. Billy Wu, Dr. Shujia Dai, Dr. Sanwon Cha and Dr. Jonathan Bones for

    sharing their knowledge and expertise.

    I would like to thank my dissertation committee members, Prof. Paul Vouros, Prof.

    Graham Jones and Prof. Roger Giese for their time, suggestions and guidance.

    Many thanks to Dr. Buffie Clodfelder-Miller (Cellular and Molecular Neuropathology

    Core, University of Alabama), Elizabeth Richardson, Shemeica Binns, Sonika Dahiya

    and Dennis Sgroi (Massachusetts General Hospital) for providing precious LCM

    samples. I would like to thank our collaborators N. Washburn, C.J. Bosques, N.S. Gunay,

    Z. Shriver, and G. Venkataraman (Momenta Pharmaceuticals) for supporting the glycoform

    profiling project and for their full contribution towards the glycan analysis.

    I would like to acknowledge the support and friendship of current and former researchers

    of the Barnett Institute, Dr. E. Moskovets, Dr. Vickor Andreev, Dr. Quanzhou Luo, Dr.

  • 7

    Guihua Yue, Mr. Laxmi Manohar Akella, Dr. Claudia Donnet, Dr. Enrique Avarelo, Dr.

    Zoltan Sabo, Dr. Jim Glick, Somak Ray; previous and current graduate students Lingyun

    Li, Ye Gu, Dongdong Wang, Majlinda Kulloli, Agnes Rafalko, Jonna Linholm-Ventola,

    Jack Liu, Chen Li, Peter Li, Chris Morgan, Vaneet Sharma, Rose Gathungu, Joshua

    Klaene and Fateme Tousi.

    I would like to express my gratitude to Jeffrey Kesilman, Felicia Hopkins, Richard

    Pumphrey, Andrew Bean, Jana Volf and Bill O'Neil for their support.

    I would like to acknowledge my wife, Vaishali, daughter Radhika, and son Hrishikesh for

    their love, support, sacrifice and compromise during 5 long years. Many many thanks to

    my parents, Sudha and Arjun Thakur, for their support, encouragement and care. I would

    like to thank my brother, Ganesh and his family, for supporting, guiding and encouraging

    me during my graduate studies. I would like to express my gratitude to my sister Jyoti

    and her family for their support and encouragement.

  • 8

    TABLE OF CONTENTS

    ABSTRACT………………………………………………………………………. 3

    ACKNOWLEDGEMENT………………………………………………………… 6

    TABLE OF CONTENTS………………………………………………………….. 8

    LIST OF FIGURES.…………………………………………………………….…..14

    LIST OF TABLES……………………………………………………………..……16

    LIST OF ABBREVIATIONS AND CONVENTIONS…….……………………….16

    Chapter 1: Overview of Technologies and Methodologies for Proteomics

    Analysis…………………………………………………………………………..…19

    1.1 Introduction………………………………………….…….…………………….20

    1.1.1 Proteomics: An Overview………………………………………………….….20

    1.2 Shotgun Proteomics Methodologies…………………………………………..…23

    1.2.1 Samples………………………………………………………………………...25

    1.2.1.1 In Vitro Sample Source: Cell lines…………………………………….….....25

    1.2.1.2 In Vivo Sample Sources…………………………………….……………..…26

    1.2.2 Tissue Microdissection………………………………….……………………...28

    1.2.2.1 Laser Capture Microdissection………………………….……………….…...30

    1.2.2.2 Laser Microbeam Microdissection (LMM) ….......................................... 32

    1.2.2.3 Comparison of LCM and LMM……………………………..……………….33

    1.2.3 Sample Preparation……………………………………………..………..….…34

    1.2.3.1 SDS-Polyacrylamide Gel Electrophoresis (SDS-PAGE) ……….……...…..36

    1.2.4 Separation Techniques…………………………………………………….…...38

    1.2.4.1 High Pressure Liquid Chromatography…………………………………..... 38

    1.2.5 Mass Spectrometry…………………………………………………..………..40

    1.2.5.1 Ionization Methods……………………………………………………….. 40

    1.2.5.2 Mass Analyzers………………………………………………………….. 42

    1.2.5.3 Database Searching Tools for Proteomics……………………………….. 47

    1.3 Microproteomics………………………………………………………….. 54

    1.3.1 Alternative strategies for protein digestion………………………………….. 56

    1.3.1.1 Solvents based approach………………………………………………….. 56

    1.3.1.2 Cleavable surfactant……………………………………………………….. 57

  • 9

    1.3.1.3 Filter-Aided Sample Preparation (FASP) ……………………………….. 59

    1.3.2 High Performance Liquid Chromatography for Microproteomics………….. 61

    1.3.2.1 Peak Capacity………………………………………………………….. 61

    1.3.2.2 Narrow-bore column and ESI-MS……………………………………….. 64

    1.3.2.3 Porous Layer Open Tubular (PLOT) Columns…………………………….. 66

    1.4 Protein Glycosylation Analysis……………………………………………….. 71

    1.4.1 Intact Glycoprotein Analysis……………………………………………….. 73

    1.4.1.2 Capillary Electrophoresis………………………………………………….. 73

    1.4.1.3 Capillary Electrophoresis Coupled to Mass Spectrometry……………….. 77

    1.4.1.4 Application of CE-MS for Analysis of Intact Glycoforms……………….. 80

    1.4.2 Glycan analysis………………………………………………………….. 81

    1.4.2.1 Glycan release methods………………………………………………….. 82

    1.4.2.2 Enzymatic Sequencing of Oligosaccharides…………………………….. 82

    1.4.2.3 HPLC analysis of glycans……………………………………………….. 85

    1.5 References……………………………………………………………….. 89

    Chapter 2: Proteomic Analysis of 10,000 Laser Captured Microdissected Breast

    Tumor Cells Using Short Migration on SDS-PAGE and Porous Layer Open

    Tubular (PLOT) LC-MS…........………………………………………….. 101

    ABSTRACT……………..………………………………………………….. 102

    2.1 Introduction……….…………………………………………………….. 104

    2.2 Experimental Section………………………………………………………….. 106

    2.2.1 Chemicals………………….………………………………………….. 106

    2.2.2 Clinical Specimens………………………………………………………….. 106

    2.2.3 Laser Capture Microdissection…………………………………………….. 107

    2.2.4 Cell Lysis, SDS-PAGE and In-Gel Digestion……………………………….. 107

    2.2.5 Nano LC-ESI-MS with 10 µm i.d. PLOT Column………………………….. 108

    2.2.6 Protein Identification……………………………………………………….. 109

    2.2.7 Identification of Differentially Abundant Proteins by Spectral Counts...…….110

    2.2.8 Reproducibility of Replicate Analyses of Metastatic and Invasive Breast

    Cancer Samples. ………………………………………………………………….. 111

    2.2.9 Gene Ontology Annotation with DAVID (Database for Annotation,

    Visualization and Integrated Discovery)………………………………………….. 111

    2.3 Results and discussion……………………………………………………….. 112

  • 10

    2.3.1 Overview of Proteomic Workflow………………………………………….. 112

    2.3.2 Cell Lysis and Protein Extraction from the LCM Cap…………………….. 113

    2.3.3 Short SDS-PAGE Run for In-Gel Digestion……………………………….. 114

    2.3.4 Online PLOT/LC-ESI-MS……………………………………………….. 114

    2.3.5 Proteomic Analysis of Three Replicates of 10,000 Breast Cancer Cells…….. 118

    2.3.6 Identification of Differentially Expressed Proteins………………………….. 119

    2.3.7 Gene Ontology Analysis………………………………………………….. 121

    2.4 Conclusions………………..…………………………………………….. 125

    Addendum to Chapter 2………………………………………………………….. 127

    Evaluation of Short SDS-PAGE Separation Distance for Sample

    Preparation of Small Protein Amounts Prior to LC/MS Proteomic Analysis…….. 127

    2.1A Methods and Materials……………………………………………………….. 127

    2.1.1 Chemicals…………….……………………………………………….. 127

    2.1.2 SDS-PAGE Separation and In-Gel Digestion……………………………….. 127

    2.1.3 LC-MS/MS Analysis……………………………………………………….. 130

    2.1.4 Protein Identification……………………………………………………….. 130

    2.2A Results…………………………..…………………………………….. 131

    2.3 Reference……………………………………………………………….. 132

    Chapter 3: Comparative Proteomic Analysis of 10,000 Triple Negative

    Breast Cancer and Normal Mammary Epithelial Laser Microdissected

    Cells Using On-line 2D RP-SCX/Porous Layer Open Tubular Column

    (PLOT) LC-MS…………………………………………………………….. 134

    Abstract………………………….………………………………………….. 135

    Introduction…………….…………………………………………………….. 136

    2. Materials and Methods………………………………………………………….. 140

    2.1. Chemicals and Materials……………………..……………………………….. 140

    2.2. Laser Capture Microdissection……………….……………………………….. 140

    2.3. Protein Extraction and Digestion…………………………………………….. 141

    2.4. Column Preparation and Two-Dimensional Separation………………………. 142

    2.5. MS Analysis and Data Analysis…………………………………………….. 145

    2.6. Spectral Index (SpI) for Identification of Differentially Abundant Proteins….. 146

  • 11

    2.7. Gene Ontology by DAVID (Database for Annotation, Visualization

    and Integrated Discovery) a Functional Annotation Clustering Tool……….. 147

    2.8 Gene Set Enrichment Analyses (GSEA) for Functional Significance

    of Differentially Abundant Proteins………………………………………….. 147

    3. Results and Discussion………………………………………………………….. 148

    3.1 Experimental and Bioinformatics Workflow for Proteomic Analysis of

    10,000 LCM Collected Normal and Cancer Breast Epithelial Cells. ……….. 148

    3.2. Peptide and Proteins Identification…………………………………………... 150

    3.3. Spectral Index Analysis for Determination of Differentially Abundant

    Proteins. ……………………………………..…………………………………….. 152

    3.4 DAVID Functional Annotation Analysis of Differentially Abundant Proteins…154

    3.5 Gene Set Enrichment Analyses (GSEA) for Canonical Pathway Analysis….. 156

    Conclusions………….……………………………………………………….. 160

    References…………….…………………………………………………….. 162

    Chapter 4: Characterization of the Intact α-Subunit of Recombinant Human

    Chorionic Gonadotropin Glycoforms by High Resolution CE-FT-MS*…….. 165

    Abstract………………….………………………………………………….. 166

    4.1 Introduction……………………….…………………………………….. 167

    4.2 Experimental…………………………….……………………………….. 171

    4.2.1 Recombinant r-αhCG ……………………………………………………….. 171

    4.2.2 Chemicals………………………………….………………………….. 171

    4.2.3 CE-MS System………………………………………………………….. 172

    4.2.4 Deglycosylation and Analysis of Released Glycans……………………….. 176

    4.2.5 Trypsin Digestion of r-αhCG Expressed in a Murine Cell Line…………….. 177

    4.2.6 LC-MS Analysis of r-αhCG Tryptic Digest……………………………….. 177

    4.2.7 Data Analysis………………………………………………………….. 178

    4.3 Results and Discussion……………………………………………………….. 180

    4.3.1 Intact Protein Analysis……………………………………………………….. 180

    4.3.2 Repeatability of the Intact Protein Separation……………………………….. 185

    4.3.3 Analysis of the Released Glycans………………………………………….. 188

    4.3.4 Glycopeptide Analysis……………………………………………………….. 199

    4.3.5 Analysis of Combined Data………………………………………………….. 202

    4.3.6 Analysis of r-αhCG Expressed in CHO Cell Culture…………………….. 214

  • 12

    4.4 Conclusions…….……………………………………………………….. 217

    4.5 References ………………………………………………………………….…...219

    Chapter 5: Summary and Future Directions…………………………………. 221

  • 13

    LIST OF FIGURES

    Chapter 1

    Figure 1.1 Conceptual organization of proteomic experiments………………... 22

    Figure 1.2 Human islet protein reference map……………………………………... 23

    Figure 1.3 The principles of laser capture microdissection (LCM) …………….... 31

    Figure 1.4 Common matrices used in MALDI mass spectrometry…………….... 41

    Figure 1.5 Operational principle of the FTICR…………………………………... 45

    Figure 1.6 Cutaway view of the Orbitrap mass analyzer……………………………47

    Figure 1.7 Low energy collision induced dissociation of peptide………………... 48

    Figure 1.8 Mobile Proton Theory………………………………………………... 49

    Figure 1.9. Illustration of effect of concentration of analytes and flow

    rate on ESI processes………..................................................................…... 63

    Figure 1.10 Comparison of normal flow rate electrospray vs. a lower

    flow rate electrospray. ……………………………………………………………... 65

    Figure 1.11 Schematic diagram of the low dead volume connections

    used to design 1D and 2D SPE-PLOT system……………………………………... 67

    Figure 1.12 Diagram of the advanced on-line 2-D SCX/PLOT/MS system using

    a 3.2 m × 10 µm i.d. PLOT column and an online triphasic trapping column…….. 68

    Figure 1.13 Chemical diversity of glycans………………………………………... 72

    Figure 1.14 Electric double layer at the capillary wall and creation of EOF.......... 75

    Figure 1.15 Different types of CE/MS interfaces…………………………………. 78

    Figure 1.16 CZE-ESI-MS analysis of a recombinant human EPO. …..………….. 81

    Figure 1.17 Exoglycosidases commonly used to determine the structure

    of the N-glycans……………………………………………………………………. 84

  • 14

    Chapter 2

    Figure 1. Shotgun proteomic workflow for the analysis of 10,000 LCM collected

    breast cancer cells collected from breast tumor and lymph node tumor…………...113

    Figure 2. Optimization of LC-MS parameters……………………………………….115

    Figure 3. Assessment of the variability in proteomic profiles associated

    with three replicate runs each of invasive and metastatic breast cancer

    samples (three samples of 10,000 cells each)…………………..……………….. 120

    Figure S1. Selection of gel type and SDS-PAGE separation distance for

    proteomic analysis of small sample amounts……………………………………. 129

    Chapter 3

    Figure 1. Shotgun proteomics workflow to analyze breast epithelial

    cells collected from normal and triple negative breast tumor epithelium……….... 148

    Figure 2. Peptide and protein identifications from 6 salt steps……………... 150

    Figure 3. Peptide and protein identifications in the six samples. …………………..151

    Figure 4. Participants of cell cycle (G1-S Phases) were significantly

    enriched in triple negative breast cancer (TNBC) cells……………………... 157

    Figure 5. Structural molecular organization was significantly deficient

    in triple negative breast cancer (TNBE)…………………………………….. 159

    Chapter 4

    Figure 1A Diagram of CE-MS system for analysis of intact glycoproteins………. 172

    Figure 1B. Photograph of CE system coupled to LTQ-FTMS for

    analysis of intact glycoproteins…….………………………………………. 175

    Figure 2 Illustration of the separation resolution of CE-MS analysis

    of intact α-hCG derived from a murine cell line………..……………………. 181

    Figure 3A. CE-MS separation of r-αhCG produced in a murine cell line……….. 182

    Figure 3B CE-MS separation of r-αhCG produced in a murine cell line……….. 183

    Figure 4: Chromatograms and fragmentation spectra of glycan analysis……….. 189

    Figure 5: LC/MS/MS analysis of sulfated and α-galactose containing N-glycans.. 190

    Figure 6: Exoglycosidase characterization of

    galactose-α-galactose-containing species……………………………………….... 191

    Figure 7. CE-MS separation of r-αhCG produced in a CHO cell line….…... 214

  • 15

    LIST OF TABLES

    Chapter 2

    Table 1. Number of proteins identified per gel section per sample from

    three technical replicates of 10,000 mouse liver cells……………………… 117

    Table 2. Number of proteins identified per gel section per sample

    from three replicates of 10,000 invasive breast cancer cells……...…………..……119

    Table 3. Enriched Gene Ontology (GO) terms; those with FDR less than

    5% and P value less than 0.05 are shown in bold………………………….. 123

    Table S1. Peptides and proteins identified using three SDS-PAGE

    separation conditions……………………………………………………….. 131

    Chapter 3

    Table 1. Details about normal breast specimens and triple negative breast

    cancer specimens……………………………………………………………………141

    Table 2. List of differentially abundant proteins between TNBE and NBE……….. 153

    Table 3. Representative enriched, functional clusters with corresponding

    GO terms for differentially expressed proteins identified by DAVID……………. 155

    Table 4. List of the canonical pathways found to be overrepresented in

    TNBE samples. ..…………………………………………………………………….156

    Table 5. List of the canonical pathways found to be overrepresented

    in NBE samples………………………………………………………………………158

    Chapter 4

    Table 1. Repeatability of peak area measurements for 20 glycoforms on r-αhCG….186

    Table 2. Summary table N-linked glycans in r-αhCG……………………………….194

    Table 3. Abundance of individual glycopeptides……………………………………200

    Table 4. List of theoretical and observed glycoforms ………………………………204

    Table 5. Abundance of r-αhCG glycoforms produced in CHO cells ………………216

  • 16

    LIST OF ABBREVIATIONS AND CONVENTIONS

    2D GE Two-dimensional gel electrophoresis

    2-AB 2-amino benzamide

    CE Capillary electrophoresis

    CID Collision Induced Dissociation

    CPAS Computational proteomics analysis system

    CTC Circulating tumor cells

    CZE Capillary zone electrophoresis

    DAVID Database for annotation, visualization and integrated discovery

    DTA Sequest data files

    DTT dithiothreitol

    EIE Extracted ion electropherograms

    EOF Electroosmotic flow

    ESI Electrospray ionization

    FASP Filter-aided sample preparation

    FDR False discovery rate

    FFPE Formalin-fixed paraffin-embedded

    FTICR Fourier Transform Ion Cyclotron Resonance

    GO Gene ontology

    GSEA Gene set enrichment analyses

    HILIC Hydrophilic interaction liquid chromatography

  • 17

    IAA Iodoacetamide

    ICAT Isotope-Coded Affinity Tag

    INV Invasive

    IPG Immobilized pH gradient

    IPI International Protein Index

    IR Infrared

    IT Ion Trap

    iTRAQ Isobaric tags for relative and absolute quantitation

    LCM Laser capture microdissection

    LMM Laser Microbeam microdissection

    LTQ Linear Ion Trap

    MALDI Matrix-assisted laser desorption/ionization

    MBE Invasive malignant breast epithelial

    MCM Minichromosome maintenance

    MET Metastatic

    MS Mass spectrometry

    NBE Normal breast epithelial

    NBE Non-cancerous breast epithelial

    NCBI National Center for Biotechnology Information

    PALM Pressure assisted Laser microdissection

    PGC Porous graphitic carbon

    PLOT Porous-layer open-tubular

  • 18

    ppb Parts per billion

    ppm Parts per million

    RPLC Reversed-phase liquid chromatography

    PS-DVB Poly Styrene- Divinyl benzene

    r-αhCG Recombinant α-subunit of human chorionic gonadotrophin

    SCX Strong Cation Exchange

    SDS-PAGE Sodium dodecyl sulfate polyacrylamide gel electrophoresis

    SILAC Stable isotope labelling by amino acids in cell culture

    SPE Solid phase extraction

    SpI Spectral index

    TNBC Triple negative breast cancer

    TNBE Triple negative malignant breast epithelial

    TOF Time-of-flight

    UV Ultraviolet

    Xcorr Cross-correlation score

  • 19

    Chapter 1: Overview of Technologies and Methodologies for Proteomics Analysis

  • 20

    1.1 Introduction

    1.1.1 Proteomics: An Overview

    Proteomics [1] offers a complementary approach to genomic technologies by
    investigating biological phenomena at the global protein level. The emergence of
    mass spectrometry-based proteomic technologies has advanced our understanding of
    the complexity and dynamic nature of proteomes, while at the same time revealing that
    no "one-size-fits-all" proteomic strategy can solve all biological problems. Two
    technologies have been responsible for the recent, rapid advance of proteomics: first,
    the development of new strategies for peptide sequencing using mass spectrometry,
    including soft ionization techniques such as electrospray ionization (ESI) and
    matrix-assisted laser desorption/ionization (MALDI); and second, the miniaturization
    and automation of liquid chromatography. However, the high expectations for
    proteomics have been tempered by the discovery of the huge molecular complexity and
    dynamic nature of the proteome, which introduce difficulties greater than those
    encountered in either genome or transcriptome studies. In particular, complexities
    related to splice variants, post-translational modifications (PTMs), dynamic ranges of
    protein abundance covering ten orders of magnitude or more in plasma, protein
    stability, and dependence on cell type or physiological state have challenged our
    ability to characterize proteomes comprehensively in a reasonable time [2,3,4].

    Despite the above challenges, proteomic technologies have already

    significantly contributed to the life sciences and are today an integral part of biological

  • 21

    research efforts. Currently, the field of proteomics covers diverse research topics such
    as protein expression profiling, analysis of signaling pathways, and protein biomarker
    discovery, among others [4]. It is important to be aware that each area calls for its
    own proteomic approaches, which differ widely in the skills they require, their
    difficulty and their expense. Based on their objectives, proteomic experiments are
    categorized as either discovery or assay experiments. Proteomic assay experiments
    investigate a quantitative change in a small, predefined set of proteins or peptides,
    whereas discovery experiments focus on the analysis of large, unbiased sets of
    proteins. The measurement of cardiac troponins in human plasma samples is one example
    of an assay experiment [3,4]. An example of a discovery proteomic experiment is the
    Human Proteome Organization Plasma Project, which aims to catalog all proteins and
    peptides in human plasma.

    Discovery proteomics experiments are divided into comprehensive, broad-scale and
    focused approaches, because these distinctions determine how a biological question is
    approached technically. Comprehensive approaches aim at enumerating as many
    components of a biological system as possible [5]. Broad-scale experiments target a
    selected fraction of the expressed proteome, for example the phosphoproteome or the
    glycoproteome. Comprehensive and broad-scale experiments are used to profile the
    qualitative and quantitative changes that take place as a result of a perturbation to
    a biological system or of differences in genetic background [6,7]. Focused approaches,
    such as the identification of the components of a protein complex, involve the
    co-purification and analysis of relatively few interacting proteins. Here, the aim is
    to identify the components of multiprotein complexes

  • 22

    and their interaction mechanisms in order to understand physiological and pathogenic

    processes. Once components of multiprotein complexes are determined, they are

    further monitored using the assay methods to develop therapies [8].

    Characterization of a single protein that is isolated from natural or recombinant

    sources involves determination of its mass, identity, post-translational modifications

    and purity. The comprehensive characterization task draws on decades of experience

    in protein chemistry [4]. Figure 1.1 presents a diagram of the various components of

    proteomics discovery and assay.

    Figure 1.1 Conceptual organization of proteomic experiments. Reprinted from

    reference [4].

  • 23

    1.2 Shotgun Proteomics Methodologies

    Figure 1.2 Human islet protein reference map. The proteins were loaded onto an IPG

    strip (pH 3-10) and subsequently separated by mass on a gradient (8-12%) SDS-PAGE

    gel. Reprinted from reference [9].

    The combination of two dimensional gel electrophoresis and mass

    spectrometry (2DE-MS) has traditionally been used to determine changes in protein

    identity and protein abundance in a complex protein mixture [10]. Using this

    combination, a protein mixture is first separated based on isoelectric point and then by

    molecular weight into almost single-protein spots; this strategy is therefore sometimes

    called the “single protein” method [11]. To identify individual proteins separated by

    2DE, the excised gel pieces are subjected to in-gel digestion and subsequent analysis

    using tandem mass spectrometry. As this method provides very high resolution, the

    visible image of a stained 2D gel is used to observe changes in protein abundance,

  • 24

    protein isoform and protein modification [9]. Figure 1.2 shows an example of a

    complex 2D gel pattern from a proteome. While powerful, the method is difficult to

    automate, is slow to operate and does not work well with highly hydrophobic proteins

    [12].

    In the past few years, shotgun proteomics, introduced by Yates et al. (10), has
    replaced conventional 2DE-MS (two-dimensional gel electrophoresis-mass spectrometry)
    owing to its inherently high throughput and its ability to detect and quantitate more
    proteins than 2D gel electrophoresis. Shotgun proteomics is a method in which the
    total proteome is digested to peptides, and the resulting highly complex peptide
    mixture is separated by one- or two-dimensional liquid chromatography coupled to mass
    spectrometry (MS). The method consists of four

    steps: sample preparation, liquid chromatography, MS and data processing. The results

    are interpreted using bioinformatics tools that are rapidly developing [13]. The sample

    preparation for proteomic analysis involves multiple steps such as protein extraction,

    enrichment, digestion and peptide clean-up. The sample preparation step extracts the

    proteins from the biological specimen such as blood, cell lines or tissues. The

    extracted protein mixture may be further fractionated to reduce the protein complexity

    using chromatographic, electrophoretic or affinity purification procedures. To facilitate

    their identification, the proteins are digested with highly specific proteolytic enzymes,

    such as trypsin, to generate fragments of suitable mass for MS detection. The digested

    peptides are subsequently separated using high performance liquid chromatography

    coupled to ESI or MALDI mass spectrometry. Both precursor mass and MS/MS

    fragmentation spectra can be used to determine and quantitate the peptides. Generally,

  • 25

    the tandem mass spectra, which provide peptide sequence data based on MS/MS

    fragmentation patterns, are searched against a specific protein database (e.g. NCBI and

    Swiss-Prot[14]) using various algorithms (e.g., Mascot[15] or SEQUEST [16]) to

    determine protein identity. The advantage of shotgun proteomics over the 2DE

    approach is that the former can analyze hydrophobic membrane proteins as well as

    proteins with a broad range of pI or size. In addition, the protein dynamic range

    covered by the shotgun method can be wider than that covered by the 2DE method [17].
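
    As an illustration of the digestion step described above, the following minimal Python

    sketch performs an in silico tryptic digest and computes monoisotopic peptide masses. It

    assumes the commonly used simplified cleavage rule (cleavage C-terminal to lysine or

    arginine, except before proline), no missed cleavages and no modifications. The function

    names and the example sequence are hypothetical and are not taken from this work.

```python
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of H2O added for the peptide termini


def tryptic_digest(sequence):
    """Split after K or R unless the next residue is P (requires Python 3.7+)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def monoisotopic_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


# Hypothetical toy sequence, used only to show the cleavage rule.
for peptide in tryptic_digest("AKRPGKSTR"):
    print(peptide, round(monoisotopic_mass(peptide), 4))
```

    Search engines apply the same idea to every protein in a sequence database, usually

    allowing for missed cleavages and modifications, which this sketch omits.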

    1.2.1 Samples

    Cancer is one of the leading causes of death worldwide. In order to develop

    treatments for cancer, protein biomarkers, which can indicate (1) the presence of

    disease, (2) disease reduction or progression, and (3) response to treatment, are

    highly desired. During biomarker discovery proteomic experiments, a variety of

    sample sources can be used, such as cell lines, tissues and body fluids.

    1.2.1.1 In Vitro Sample Source: Cell lines

    Cell lines are routinely used in proteomic studies as they may be easily

    manipulated with different chemical additives or physical conditions. Because the

    population of cells can be large (as many as 100,000,000 cells), there are no

    limitations with respect to the amount of sample available. Cancer cell lines are

    extensively studied using quantitative proteomics for:

    1) identification of differentially abundant proteins between diseased and normal cells of

    the same type,

  • 26

    2) identification of pathways associated with a specific phenotype, e.g., cancer progression,

    3) drug resistance studies, and

    4) identification of proteins secreted by cancer cell lines for potential

    biomarker discovery [18].

    One must, however, always keep in mind that a cell line is a model system that may

    or may not represent the in vivo condition [19].

    1.2.1.2 In Vivo Sample Sources

    Biofluids

    In contrast to cell lines, body fluids such as serum[20], plasma[21], saliva[22],

    urine, nipple aspirate, cervical-vaginal fluid[23] and exhaled breath condensate[24]

    closely represent in vivo biological events. Compared to biopsied samples,

    biofluids are easy to collect at low cost using less invasive methods [25,26]. Among

    the body fluids, blood, the most common human sample used in diagnosis, is often the

    focus for the discovery of protein biomarkers for disease [26,27]. However, the

    challenges in analyzing serum or plasma are the high complexity of the proteome, its

    wide dynamic range (at least 10 orders of magnitude[28]) and the anticipated low relative

    abundance of many disease-specific biomarkers.

    Compared to blood, proximal fluids, i.e., body fluids that are close to or in direct

    contact with the site of disease, can be an attractive alternative sample type for

    biomarker discovery. The proteins or peptides secreted, shed or leaked from diseased

    tissue are likely to be enriched in proximal fluids with respect to both blood and

    disease-free control fluid of the same type[29]. Examples of proximal fluids are

    urine for bladder and kidney disease, nipple aspirate or ductal lavage for breast cancer,

  • 27

    and cerebrospinal fluid for intracranial processes[30]. Evidence of marker enrichment

    in proximal fluids was demonstrated with a study of ovarian cancer, where both

    ovarian cyst fluid and ascites fluid constituted proximal fluid[31].

    Tissue Samples

    Compared to blood and proximal fluids, analysis of tissue offers several

    important advantages. 1) During the biomarker discovery on tissue samples, the

    proteins are studied in their surroundings. 2) The possibility of identifying potential

    biomarkers is highest in damaged/diseased tissues as they are likely to be concentrated

    in those tissues. Therefore, it makes sense to look for markers in tissue samples due to

    their higher concentration and relatively narrower dynamic range of proteins. To

    perform the discovery studies, tissue samples can be used either from animal models

    or from human biopsied samples. Mouse [32-34] and rat [35,36] are two of the most

    widely used animal models for proteomic research, though human biopsies are the

    most appropriate samples to study human diseases. However, human biopsied samples

    are not as easily available as tissue samples from animal models, and controlled

    experiments are clearly much easier to perform on animal models. The biopsied

    samples require extra care during their processing and storage. That is, the tissue

    specimens are frozen immediately after their excision and stored at -80ºC.

    Conventionally, in order to preserve all the biopsied samples and to maintain their

    morphology, the samples are fixed in formalin and embedded in paraffin[37]. The

    formalin fixation causes cross-linking of the proteins, and the paraffin limits water

    contact.

  • 28

    Huge collections of formalin fixed and paraffin embedded (FFPE) biopsied samples are

    preserved and last many years [38]. Such samples have a well documented clinical

    history of individual patients and are available for retrospective analysis. To perform

    proteomic analysis of FFPE samples, decross-linking and efficient extraction of

    proteins are necessary. To remove paraffin from FFPE tissue blocks, the blocks are

    treated with xylene. Further, formalin fixed tissue blocks are boiled in a solution

    containing metal ions. This procedure is termed heat induced antigen retrieval [39,40].

    High temperature, above 90ºC, is found to be essential to decross-link methylene

    bridges between the proteins. The studies performed on FFPE samples, in order to

    obtain the comprehensive proteome, use two different approaches. The first is

    extraction of intact proteins with SDS and high temperature [41-43]. The

    commercialized product, called Qproteome FFPE tissue kit (QIAGEN, Germantown,

    MD), uses proprietary chemistry to extract full length proteins for subsequent analysis.

    The second approach, a novel approach, is to perform in-solution enzymatic digestion

    on FFPE samples, after heat induced decross-linking, to directly obtain peptides for

    shotgun proteomic analysis[25]. The commercialized product, based on the latter

    extraction principle, is called Liquid Tissue-MS protein prep kit (Expression

    Pathology, Inc. Rockville, MD).

    1.2.2 Tissue Microdissection

    The microenvironment of a tumor tissue sample is highly heterogeneous[44].

    The pathologist identifies malignant cells based on their differential staining and

    morphology. The malignant cells are surrounded by normal-related and other types of

  • 29

    cells in the tissue matrix. In order to perform a detailed study of biopsied samples to

    gain information on proteomic changes between malignant and normal cells, the

    malignant cells need to be separated into a homogeneous population. Tissue

    microdissection is an indispensable tool to enrich distinct cell types from

    heterogeneous tissue matrix in an efficient and accurate manner. Before the advent of

    microdissection, fluorescence-activated cell sorting (flow cytometry)[45] and

    magnetic-bead based cell sorting[46] were the methods of choice for cell separation.

    However, these methods employ enzymes to break down the tissue structure, which

    may alter or modify the cellular constituents in a number of ways. Microdissection

    techniques have the advantage over enzymatic dissociation that they allow selection of individual

    cells under microscopic inspection of the intact tissue.

    Microdissection techniques can be classified into two major classes, manual

    microdissection and laser assisted microdissection. Early efforts to dissect specific

    cell types from tissue sections used sharp tools such as scalpel blades and needles [47].

    The other manual dissection technique, called “negative ablation”, as the name

    suggests, destroys the unwanted cells surrounding the cells of interest and collects the

    non-ablated cells using a needle [48].

    Though manual microdissection techniques were useful in obtaining a

    homogeneous cell population, these methods were slow, tedious and required

    considerable expertise to perform. In addition, the manual microdissection techniques

    suffered from issues such as sample handling and contamination. To address these

    issues and to perform fast, clean and accurate microdissection, laser-based

  • 30

    microdissection technology, which includes laser capture microdissection and laser

    microbeam microdissection, was developed [49]. Over the years, this technique has

    proved to be effective, as more than one thousand research articles have been

    published on samples procured using this technique [50].

    1.2.2.1 Laser Capture Microdissection

    Laser Capture Microdissection (LCM) is a laser-based cell procurement

    method that was developed in the mid-1990s by Emmert-Buck, Liotta and colleagues at

    the National Institutes of Health (NIH) and designed to perform fast and accurate

    microdissection of tissue samples [49,51]. The earliest design of the LCM system was

    commercialized by Arcturus Biosciences Inc. (now part of Applied Biosystems), and

    later Leica and PALM introduced a non-contact LCM system based on a

    technique called laser microbeam microdissection.

  • 31

    Figure 1.3. The principles of laser capture microdissection (LCM). (a) The scheme of

    LCM. (b) Comparison of properly melted polymer spots and poor spots. Only cells

    lying within the dark ring of melted polymer will be targeted for LCM. (c) Physical

    forces involved in LCM. (d) A single cell bound to the thermolabile polymer.

    Reprinted from reference [52].

    The principles of contact based LCM technology are shown in Figure 1.3. In

    brief, the tissue specimens are first processed by sectioning and staining, and then

    examined to identify cells of interest based on their staining and morphology. To

    selectively capture cells of interest from the tissue sections, an LCM cap with a

    thermolabile polymer membrane is placed on the tissue section. An infrared (IR) laser

    is focused through the transparent cap material, heating and melting the membrane,

    and thus causing the targeted cells to adhere to the membrane. The cells of interest are

    then dissected by lifting the LCM cap away from tissue section. The thickness of the

  • 32

    tissue sections (5-15 µm) used for microdissection is critical from an operational point

    of view. The tissue thickness

  • 33

    Pathology introduced “DIRECTOR” Microdissection slides, which are based on Laser

    Induced Forward Transfer (LIFT) Technology utilizing a thin layer energy transfer

    coating. Laser energy is transferred to the coating and thus results in evaporation of

    the coating. The evaporation of the transfer coating causes the selected feature of the

    tissue section to fall into collection tube.

    1.2.2.3 Comparison of LCM and LMM

    Using the older version of Arcturus LCM instrument, any material adhering to

    the LCM cap was collected. This type of nonspecific collection of loose material from

    tissue specimen is a potential source of contamination. To overcome this issue,

    Arcturus introduced a newer design of LCM caps in which the cap remains slightly

    away from the tissue specimen, allowing collection of only cells which are in contact

    with the melted thermolabile membrane. In addition, to avoid contamination, such as

    keratin and loose tissue material, sticky “prep strips” can be used [12]. In contrast to

    LCM, the primary source of contamination in LMM is fine tissue material resulting

    from laser ablation of the edges of targeted cells/tissue areas [58].

    In the case of LCM, the tissue preparation procedure, which includes slide

    selection, tissue staining and dehydration, and microscopic evaluation of the tissue

    specimen, has to be strictly followed in order to obtain effective microdissection.

    In the case of LMM, by contrast, tissue preparation is less complicated than for LCM, and

    parameters such as tissue thickness are more flexible.

    LCM, a contact-based microdissection technique, is advantageous compared

    to LMM, since the cells collected on the thermolabile membrane can be easily viewed

  • 34

    under the microscope for their homogeneity. In LMM, a thin polyethylene naphthalate

    (PEN) membrane is required between the glass slide and the tissue section; otherwise,

    the catapulted cells might pulverize to debris in the collection tube. Thus the collected

    cells remain relatively intact and can be visualized.

    One example of an application of LCM is to investigate the molecular basis of breast

    tumor formation. This disease is not clearly understood due to difficulties encountered

    while studying the early stages of disease progression. Breast cancer progression is

    a multistep process, involving the premalignant stage of atypical ductal hyperplasia

    (ADH), the preinvasive stage of ductal carcinoma in situ (DCIS), and the potentially

    lethal stage of invasive ductal carcinoma (IDC)[59]. The obstacles in studying breast

    cancer disease lesions are complexity and heterogeneity of tissue and microscopic size

    (

  • 35

    a single protein, this approach is suitable as it requires minimal sample preparation.

    Chapter 4 of this thesis describes glycoform profiling of recombinant α-human

    chorionic gonadotropin (α-hCG) using high resolution capillary electrophoresis

    coupled with high mass resolution FT-MS [61].

    In the bottom-up approach, there are two ways to convert proteins extracted

    from biological specimens to peptides which are suitable for mass-spectrometry (MS)

    based proteome analysis. The first solubilizes the proteins with detergents and

    separates the proteins by sodium dodecyl sulfate (SDS) polyacrylamide gel

    electrophoresis. The proteins trapped by the gel are subjected to enzymatic digestion,

    i.e., “in-gel” digestion. The second sample preparation method is detergent-free, as it

    uses strong chaotropic reagents such as urea and thiourea for cell lysis, protein extraction

    and solubilization. The enzymatic digestion of the proteins in the presence of

    denaturing reagents is termed “in-solution” digestion.

    The in-gel digestion method is advantageous over in-solution digestion due to

    the absence of most impurities which could interfere with digestion; however, the gel

    may limit peptide recovery. On the other hand, in-solution digestion can be more

    readily automated and can minimize losses associated with sample handling.

    However, the use of chaotropes may result in incomplete solubilization of the

    proteome, and digestion may be impeded by interfering substances.

  • 36

    1.2.3.1 SDS-Polyacrylamide Gel Electrophoresis (SDS-PAGE)

    The mobility of proteins during gel-electrophoresis depends upon the following

    factors:

    1) electric field strength,

    2) total charge on the molecule,

    3) size and shape of the molecule and

    4) ionic strength of the buffer and properties of the gel matrix through which the

    molecules are migrating.

    The polyacrylamide gel matrix is in extensive use for protein prefractionation

    [62]. Gel matrices act like a molecular sieve, and their sieving function depends on the

    mesh size of the gel. The polyacrylamide gels are synthesized by the polymerization of

    acrylamide monomers into long chains and the reaction of these chains with

    bifunctional compounds such as N,N'-methylenebisacrylamide (bis) to form a

    sieve-like structure. The mesh size of the gel is determined by the concentrations of

    acrylamide and bisacrylamide (%T and %C).

    %T=concentration of total monomer

    %C=concentration of cross linker (as a percentage of the total monomer)

    The higher the concentration of monomer (%T), the smaller the mesh size of the gel

    [63].
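
    The two definitions above are conventionally written as follows. This is a sketch using

    assumed symbols that do not appear in the original text: a and b are the masses (in g) of

    acrylamide and bisacrylamide, respectively, and V is the total solution volume (in mL).

```latex
\%T = \frac{a + b}{V} \times 100
\qquad
\%C = \frac{b}{a + b} \times 100
```

    As an illustrative example with made-up quantities, 9.74 g of acrylamide and 0.26 g of

    bisacrylamide dissolved to 100 mL would give %T = 10 and %C = 2.6.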

    Gel electrophoresis is performed under either continuous or discontinuous buffer

    conditions. The running buffer and gel buffer are the same in the continuous buffer

    system, whereas the discontinuous buffer system has different gel and running buffers.

    The discontinuous gel system contains two gel layers, the stacking layer and the separating layer.

  • 37

    Electrophoresis with a discontinuous buffer system provides sample concentration and

    higher resolution. SDS-PAGE is performed under denaturing conditions, where the

    detergent denatures and unfolds the protein by wrapping around its peptide backbone.

    SDS binds to proteins at a ratio of approximately 1.4 g of SDS per gram of protein. The highly

    negatively charged SDS-protein complexes are separated on the gel based on their molecular

    weight rather than their intrinsic charge, as each protein acquires a net negative charge that is

    proportional to its length. The electrophoretic mobility of the proteins

    through the gel decreases approximately linearly with the logarithm of the protein molecular

    weight[64].
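
    The relationship between mobility and the logarithm of molecular weight described above is

    what allows molecular weights to be estimated from a gel. The following Python sketch fits a

    straight line of log10(MW) against relative migration (Rf) for a set of marker proteins and

    interpolates an unknown band. The marker Rf values and the function names are hypothetical

    and serve only as an illustration; real gels deviate from linearity at the extremes of the

    separation range.

```python
import math


def fit_log_mw_vs_rf(standards):
    """Least-squares fit of log10(MW) = a + b * Rf from (Rf, MW) marker pairs."""
    xs = [rf for rf, _ in standards]
    ys = [math.log10(mw) for _, mw in standards]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def estimate_mw(rf, a, b):
    """Interpolate the molecular weight (Da) of a band at relative migration rf."""
    return 10 ** (a + b * rf)


# Hypothetical marker ladder: (Rf, molecular weight in Da).
markers = [(0.10, 200000), (0.25, 116000), (0.40, 66000),
           (0.60, 45000), (0.75, 31000), (0.90, 21500)]
a, b = fit_log_mw_vs_rf(markers)
print(f"slope = {b:.3f}, estimated MW at Rf 0.50 = {estimate_mw(0.50, a, b):.0f} Da")
```

    Glycoproteins, as noted later in this section, can migrate anomalously, so an estimate

    obtained this way should be treated as apparent rather than true molecular weight.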

    Prefractionation of samples is required in proteomics, and gel electrophoresis is

    a versatile and reliable method to achieve such prefractionation. The discontinuous

    buffer system is frequently used as it provides higher protein resolution compared to

    the continuous buffer system. The discontinuous buffer system offers the ability to

    manipulate buffer systems to achieve “steady-state stacking” or “isotachophoresis”,

    which is responsible for focusing the proteins before their separation by PAGE.

    Though the separation of proteins in SDS-PAGE is primarily based on molecular

    weight, the molecular weight range that can be preferentially resolved depends upon

    the gel composition, buffer system used and the pH of the buffer system. The presence

    of post-translational modifications, such as glycosylation, results in

    anomalous migration of the glycoprotein on SDS-PAGE. This anomalous

    electrophoretic migration of glycoproteins, resulting in inaccurate molecular weight

    determination, is due to little or no binding of SDS to the sugar moieties.

  • 38

    1.2.4 Separation Techniques

    Peptide mass spectrometry (shotgun proteomics) identifies proteins by

    measuring mass-to-charge ratios of peptides and their fragments in the MS spectra. In

    order to perform unambiguous identification of proteins and to achieve deep proteome

    coverage, mass spectrometry is highly dependent on separation to reduce the complexity of

    the samples prior to their analysis. Separation facilitates the identification of

    low-abundance species that would otherwise be overshadowed by high-abundance species,

    i.e., it increases the effective dynamic range.

    1.2.4.1 High Pressure Liquid Chromatography

    High-pressure liquid chromatography (HPLC) is often directly coupled to mass

    spectrometric instruments with electrospray ionization (ESI) source. The continuous

    separation of analytes using HPLC is physically compatible with an electrospray

    ionization source. Due to efficient coupling of HPLC and ESI source, the combination

    has become a standard sample introduction setup for peptide analysis. The most

    commonly used chromatographic materials for separation of analytes are: ion

    exchange (IEX), reverse phase, hydrophilic interaction chromatography (HILIC),

    affinity, and hybrid materials.

    Reverse phase liquid chromatography (RPLC or RP) separates analytes based

    on their hydrophobicity, and a significant advantage of RPLC, when coupled with

    mass spectrometer, is that the buffers used are generally compatible with ESI. The use

    of acidic pH and organic solvents (acetonitrile and methanol) is conducive to

    analysis of peptides by ESI-MS. Due to its high resolution, efficiency, reproducibility,

  • 39

    and mobile phase compatibility with ESI-MS, RPLC has emerged as a preferred

    separation mode for the analysis of proteins and peptides. Over the years, significant

    efforts have been made to increase the peak capacity, sensitivity, reproducibility, and

    analysis speed of reverse phase chromatography. It has been observed that packing

    long, narrow capillary RP columns results in significant improvement in the loading

    capacity, sensitivity, and dynamic range of RPLC. Shen et al. have reported the use of

    50 µm i.d., 40-200 cm long, small-particle-size (1.4 μm) RPLC columns with high

    peak capacity (1000-1500, compared with an average of 400) operated in an ultrahigh

    pressure regime (20 kpsi) for proteomic and metabolomic analysis[65]. The use of a

    small diameter particle stationary phase (1.7 μm diameter) contributes significantly

    towards the efficiency of the separation. The efficiency is inversely proportional to

    the size of the particles used for packing the column. However, because columns packed with

    small diameter particles exhibit high back pressure, high pressure pumps (up to 15,000

    psi) are required for their operation [66].
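
    The trade-off between efficiency and back pressure described above can be summarized with

    two textbook approximations. The symbols are assumptions introduced here for illustration

    and are not defined in the original text: N is the plate number, L the column length, H the

    plate height, d_p the particle diameter, η the mobile phase viscosity, u the linear velocity

    and φ a dimensionless flow resistance factor. The relation H_min ≈ 2 d_p is a rule of thumb

    for well packed columns, not an exact result.

```latex
N = \frac{L}{H}, \qquad H_{\min} \approx 2\, d_p
\qquad\Longrightarrow\qquad
N_{\max} \approx \frac{L}{2\, d_p}

\Delta P = \frac{\phi\, \eta\, L\, u}{d_p^{2}}
```

    Under these assumptions, halving the particle diameter at fixed column length and linear

    velocity roughly doubles the attainable plate number but quadruples the pressure drop, which

    is why small particles call for ultrahigh pressure pumps.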

    Multidimensional separation is a common way to increase the peak capacity of

    chromatographic analysis. This approach combines several separation techniques, such

    as ion exchange, high pH reverse phase separation, low pH reverse phase separation

    and so forth, to improve the resolving power. For effective performance of

    multidimensional separation, the individual separation methods should be as

    orthogonal as possible, i.e., each dimension should utilize different

    molecular properties as the basis of separation. One of the first and most widely practiced

    two-dimensional setups is the combination of strong cation exchange (SCX) chromatography

    with reverse phase chromatography, known as multidimensional protein identification

  • 40

    technology (MudPIT [67]). In this multidimensional separation, a highly complex

    peptide mixture is loaded onto an SCX column and eluted in a series of steps with

    increasing salt concentration. Each fraction is transferred onto an RP column either

    off-line or directly, and peptides are further separated and eluted into the MS.

    1.2.5 Mass Spectrometry

    A mass spectrometer usually comprises three parts: an ion source and optics, a mass analyzer

    and data processing software.

    1.2.5.1 Ionization Methods

    A rapid growth in mass spectrometry based proteomic analysis can be

    attributed to major contributions of experimental methods, instrumentation and data

    analysis. Among the most important developments in mass spectrometry related

    instrumentation is the invention of soft ionization methods, i.e., matrix assisted laser

    desorption ionization (MALDI) and electrospray ionization (ESI), allowing peptides

    and proteins to be directly analyzed by MS.

    MALDI

    MALDI functions just as its name suggests: the matrix assists in desorption

    and ionization of the analyte. In this type of ionization technique, the incident laser energy is

    absorbed by the matrix and transferred to the acidified analyte. The rapid laser heating

    results in desorption of matrix and positively charged analyte into the gas phase.

    Singly charged ions are predominantly generated by MALDI, which makes it

    applicable for top-down analysis of high-molecular weight proteins [68]. However,

  • 41

    low shot-to-shot reproducibility and strong dependence on sample preparation are the

    drawbacks of this technique. MALDI-TOFMS is suitable for high throughput analysis.

    However, the high ionization energy can be detrimental in the analysis of

    compounds with labile modifications [69].

    Figure 1.4 Common matrices used in MALDI mass spectrometry. Reprinted from

    reference [69].

    ESI

    ESI, unlike MALDI, generates ions from solution. The electrospray is

    created by applying a high voltage between the emitter end of the separation

    column and the inlet of the mass spectrometer [68]. Physicochemical processes of ESI

    involve formation of a Taylor cone, from which an electrically charged spray of liquid eluting

    from the separation column is emitted, followed by generation and desolvation of eluent droplets.

    The unique feature of ESI compared to other ionization methods is its ability to

    produce multiply charged ions from high molecular weight biological molecules like

  • 42

    proteins, which enables the analysis of these molecules with instruments having a limited

    mass-to-charge range (400-2000 m/z). One of the most important developments in the ESI

    technique, which led to sensitive proteomic analysis, is known as nano-ESI. In

    Chapters 2 and 3 of this dissertation, nano-ESI, operated at 20 nL/min, is a primary

    technique used for the analysis of 10,000 laser captured microdissected breast cancer

    cells. The diagram of the ESI process is discussed in the PLOT related section.
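
    To illustrate why multiple charging brings large molecules into a limited m/z range, the

    following Python sketch computes the m/z of protonated ions, (M + z * 1.007276) / z, and lists

    the charge states that fall within the 400-2000 m/z window mentioned above. The 16,951 Da

    neutral mass and the function names are hypothetical choices made for illustration only.

```python
PROTON_MASS = 1.007276  # mass of a proton in Da


def mz(neutral_mass, z):
    """m/z of an ion carrying z protons, [M + zH]^z+."""
    return (neutral_mass + z * PROTON_MASS) / z


def charge_states_in_window(neutral_mass, mz_min=400.0, mz_max=2000.0, z_max=100):
    """Charge states whose m/z falls inside the analyzer's m/z window."""
    return [z for z in range(1, z_max + 1)
            if mz_min <= mz(neutral_mass, z) <= mz_max]


# Hypothetical protein of 16,951 Da neutral mass.
states = charge_states_in_window(16951.0)
print(f"observable charge states: {states[0]}+ to {states[-1]}+")
print(f"m/z of the lowest observable charge state: {mz(16951.0, states[0]):.2f}")
```

    The singly charged ion of such a protein would lie far above the window, whereas the

    multiply charged ions fall within it.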

    1.2.5.2 Mass Analyzers

    Ion Trap

    As the name suggests, an ion-trap mass spectrometer works by trapping the ions

    in a vacuum. The ion trap functions by repeating the steps of ion collection, ion

    storage and ejection of ions from the ion trap as flow from the LC column occurs. The

    unique feature of the ion trap lies in its ability to isolate and fragment peptide ions from

    complex mixtures; this operation is called tandem MS. Due to their fast scan rates,

    MSn scans, high sensitivity, high duty cycle, high ion storage capacity (compared to

    3D traps), reasonable resolution and mass accuracy, linear ion traps (e.g. LTQ,

    Thermo Fisher) are considered as the high-throughput workhorses in proteomic

    research. Therefore, for our initial development work, as mentioned in Chapters 2 and

    3, we employed LTQ-MS for bottom-up 10 µm i.d. Porous Layer Open Tubular

    (PLOT) LC-MS analysis of 10,000 LCM cells. Furthermore, the LTQ is coupled with

    Orbitrap and FTICR as the front end of hybrid MS instruments to perform ion

    trapping, ion selection and high resolution ion analysis.

    Mass spectrometry has been extensively used for determination of molecular masses

    of the intact proteins. Among the mass spectrometric techniques, the ESI- high mass

  • 43

    accuracy MS is preferred, as ESI-generated multiply charged ions fall in the m/z range

    of most mass spectrometers. A variety of mass spectrometers can be used for this

    purpose, including ion trap (IT), orthogonal time-of-flight, time-of-flight, Fourier

    transform ion cyclotron resonance (FTICR) and Orbitrap instruments. However, mass

    spectrometers such as ion traps are not well suited for this purpose due to their low

    resolving power at full scan speed. In contrast, mass spectrometers such as TOF,

    FTICR and Orbitrap, due to their high mass resolution and high mass accuracy, have

    become the preferred instruments for accurate mass determination of intact proteins.

    Quadrupole-Time of Flight Mass Spectrometer

    Time-of-flight mass spectrometry (TOFMS) determines the mass-to-charge

    ratio of the ions using a time measurement. Ions are accelerated in the flight tube by an

    electric field. This acceleration provides the same kinetic energy to all the ions bearing

    the same charge. The velocity gained by the ion due to acceleration depends on the

    mass-to-charge ratio. Then, the time that an ion takes to travel to the detector is

    measured. The heavier ions will take longer time to reach the detector compared to

    lighter ones. Based on the flight time of the ion and the known experimental

    parameters, the mass-to-charge ratio of the ion can be determined.
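
    The time measurement described above can be sketched numerically. An ion of mass m and

    charge ze accelerated through a potential V gains kinetic energy zeV = (1/2) m v^2, so its

    flight time over a field-free tube of length L is t = L * sqrt(m / (2 z e V)). The Python

    sketch below assumes an idealized linear instrument (no reflectron, no initial energy spread,

    acceleration region ignored); the 20 kV potential and 1 m tube length are hypothetical values.

```python
import math

DALTON = 1.66053906660e-27              # kg per Da
ELEMENTARY_CHARGE = 1.602176634e-19     # C


def flight_time(mass_da, z, accel_voltage, tube_length):
    """Flight time (s) through a field-free tube for an ion of given mass and charge."""
    mass_kg = mass_da * DALTON
    velocity = math.sqrt(2 * z * ELEMENTARY_CHARGE * accel_voltage / mass_kg)
    return tube_length / velocity


def mz_from_flight_time(t, accel_voltage, tube_length):
    """Invert the relation: m/z (Da per charge) from a measured flight time."""
    m_over_ze = 2 * accel_voltage * (t / tube_length) ** 2  # kg per coulomb
    return m_over_ze * ELEMENTARY_CHARGE / DALTON


# Hypothetical settings: 20 kV acceleration, 1 m flight tube.
for mass in (1000.0, 2000.0, 4000.0):
    t = flight_time(mass, 1, 20000.0, 1.0)
    print(f"{mass:7.1f} Da, 1+: {t * 1e6:.2f} microseconds")
```

    Because the flight time scales with the square root of m/z, doubling the mass of a singly

    charged ion lengthens its flight time by a factor of about 1.41, not 2.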

    Fourier Transform Ion Cyclotron Resonance (FTICR)

    The FTICR mass analyzer determines the mass-to-charge ratio of the ions based on

    their cyclotron frequency under the influence of a constant magnetic field. In the ICR

    mass analyzer, the ions are stored in a Penning trap under the influence of constant

    magnetic and electric fields. The ions are excited to a larger cyclotron radius by an

  • 44

    oscillating electric field perpendicular to the magnetic field. The energy applied to the

    ions in the ICR cell can be tuned to excite, dissociate and eject ions. The detector plates on

    opposite sides of the trap measure the cyclotron frequencies of all the ions

    simultaneously, and a Fourier transform converts these frequencies into

    m/z values (Figure 1.5). FTICR is a very high mass resolution technique contributing

    to accurate mass measurement[70]. The high mass resolution and high mass accuracy

    of FTICR are due to the following reasons. 1) The mass of the ion is calculated from the

    measurement of cyclotron frequency, a parameter that is more precisely measurable

    than any other parameter. 2) The ion cyclotron frequency is defined by the magnetic

    field. The better time stability of the magnetic field (1 ppb/hour) compared to the time

    stability of rf voltage (100 ppb/hour) results in superior mass precision. 3) In the

    spatially uniform magnetic field, the cyclotron frequency of an ion is independent of

    the ion speed. 4) In order to attain high mass precision, ICR, unlike ion-beam-based

    mass measurement, does not require the use of narrow slits [71].
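
    Points 1) to 3) above follow from the textbook expression for the unperturbed cyclotron

    frequency. The symbols are assumptions introduced here for illustration: z is the number of

    charges, e the elementary charge, B the magnetic field strength, m the ion mass and f_c the

    cyclotron frequency. The small frequency shift caused by the trapping electric field is

    neglected in this sketch.

```latex
f_c = \frac{\omega_c}{2\pi} = \frac{z\, e\, B}{2\pi\, m}
\qquad\Longleftrightarrow\qquad
\frac{m}{z} = \frac{e\, B}{2\pi\, f_c}
```

    The ion velocity does not appear in the expression, so ions of the same m/z share one

    frequency regardless of their kinetic energy, and the accuracy of m/z depends only on how

    well B and f_c are known.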

  • 45

    Figure 1.5 Operational principle of the FTICR. Reprinted from reference [69].

    Among the many applications of FTICR, its high resolving power

    is useful for the study of large macromolecules such as proteins that carry

    multiple charges generated by electrospray ionization. The FTICR instrument provides

    mass resolution in the range of 50,000-750,000 and mass accuracy of less than 2 ppm.

    However, FTICR suffers from relatively slow acquisition speed and low sensitivity

    of analysis. In order to obtain high sensitivity and improved acquisition time, we

    acquired MS scans over a limited mass window, corresponding to m/z values of the 9+

    charge state of intact alpha-human chorionic gonadotropin (Chapter 4).

  • 46

    Orbitrap

    In 1999, Makarov invented a new type of mass analyzer called the Orbitrap [72]

    which was applied to proteomic research in 2005[73]. Among the high mass

    resolution FTMS instruments, the Orbitrap superseded the FT-ICR due to its low cost of

    operation, while providing equivalent high mass accuracy. The Orbitrap consists of two

    electrodes, an outer barrel-like electrode and a coaxial inner spindle-like electrode with

    an electrostatic field formed between them (Fig.1.6). The ions are tangentially injected

    in the gap between the two electrodes and made to rotate around the inner electrode

    due to the electrostatic attraction by the inner electrode and the balancing centrifugal

    forces. While cycling around the central axis, the ions move back and forth along the

    central axis. The frequency of these harmonic oscillations is Fourier transformed to

    determine the mass-to-charge ratio of the ions. The Orbitrap offers a high resolving

    power of roughly 50,000 and a mass accuracy of less than 2 ppm, with proper

    standards. With an average acquisition speed of at least 6 MS/MS spectra per second

    in parallel with a single high-resolution spectrum (60,000 resolution), significantly

    improved protein coverage can be achieved.
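
    The link between the axial oscillation frequency and the mass-to-charge ratio described above can be written compactly. This is the commonly quoted relation for the Orbitrap, given here as a sketch; the symbol k (a constant set by the electrode geometry and applied voltage) and the symbols q and m are notation assumed for this illustration.

```latex
\omega_z = \sqrt{k \, \frac{q}{m}} \quad \Longrightarrow \quad \frac{m}{q} = \frac{k}{\omega_z^{2}}
```

    The axial frequency depends only on m/q and the field constant, not on the ion's injection energy, which is why it is the axial motion, rather than the rotation around the inner electrode, that is used for mass measurement.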

  • 47

    Figure 1.6 Cutaway view of the Orbitrap mass analyzer. Ions are injected into the

    Orbitrap at a point (arrow) offset from its equator and perpendicular to the z-axis,

    where they begin coherent axial oscillations without the need for any further

    excitation. Reprinted from reference [69].

    1.2.5.3 Database Searching Tools for Proteomics

    Database searching plays an important role in large-scale proteomics. Database

    searching tools enable the use of mass spectrometric data of peptides to identify

    proteins in sequence databases. Two mass spectrometry-based database search

    principles are mainly used for identification of proteins. The first method uses the

    molecular weight fingerprint of the protein digest (peptides) obtained by a site-specific

    protease [74,75], and the second method uses the tandem mass spectra obtained on the

    individual peptides of a digested protein[16,76]. Since each tandem mass spectrum

    stands as a unique and verifiable piece of data, the second method has the ability to

    identify a wide range of proteins and thus provide a comprehensive approach to

    handle complex protein mixtures[77].
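
    The first search principle, the mass fingerprint of a proteolytic digest, can be illustrated with a small in-silico trypsin digestion. This is a simplified sketch: the cleavage rule (after K or R, not before P), the absence of missed cleavages and modifications, the 20 ppm tolerance and the toy sequence are all assumptions made for illustration and do not describe any particular search engine.

```python
import re

# Monoisotopic residue masses in Da (standard values)
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # Da


def tryptic_peptides(sequence):
    """Cleave C-terminal to K or R, except when the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def peptide_mass(peptide):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(MONO[aa] for aa in peptide) + WATER


def fingerprint_hits(observed, sequence, tol_ppm=20.0):
    """Pair each observed mass with theoretical peptides within tol_ppm."""
    hits = []
    for pep in tryptic_peptides(sequence):
        theo = peptide_mass(pep)
        for obs in observed:
            if abs(obs - theo) / theo * 1e6 <= tol_ppm:
                hits.append((pep, obs))
    return hits


toy = "AGKPLRMKTE"  # hypothetical sequence, not a real protein
peptides = tryptic_peptides(toy)
```

    A protein would be ranked by how many observed masses it explains. That ranking depends on the whole digest matching, which is why the tandem MS approach copes better with complex mixtures.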

    Tandem Mass-Spectrometry and Data Processing

  • 48

    Figure 1.7 Low energy collision induced dissociation of peptide. Reprinted from

    reference [78]

    In tandem mass-spectrometry (MS/MS), the gas phase peptide ions undergo

    fragmentation through processes such as collision-induced dissociation (CID). The gas

    phase CID is the most widely used technique in tandem mass-spectrometry. The

    dissociation pathways are exclusively dependent on the collision energy. The low

    energy collisions (

  • 49

    Figure 1.8 Mobile Proton Theory. Reprinted from reference [84].

    To explain the intensity patterns observed in the tandem mass spectra, a mobile proton

    model has been proposed[83]. The mobile proton model states that to initiate backbone

    cleavages for production of b and y ions, the protons are transferred intramolecularly

    from basic side-chains to the heteroatoms along the backbone. Figure 1.8A shows that

    the proton exists in equilibrium between all possible basic sites. The energy required

    to mobilize the proton from a basic side-chain or from the amino terminus to the

    peptide backbone depends on the amino acid composition of the peptide. Therefore,

    the dissociation or the fragmentation energy for the peptides containing amino acids

    having greater gas-phase basicity is higher compared to the peptides with amino acids

  • 50

    having lower gas-phase basicity. An example of a lysine-terminated peptide is shown in

    Figure 1.8B.
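
    The b and y ions produced by the backbone cleavages discussed above have masses that follow directly from the peptide sequence. The sketch below computes singly charged b and y ion m/z values for an unmodified peptide; the function name and the example peptide are hypothetical, and the residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses in Da (standard values)
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # Da
PROTON = 1.007276   # Da


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b_ions[i-1] holds b_i (first i residues, N-terminal fragment);
    y_ions[i-1] holds y_i (last i residues, C-terminal fragment).
    """
    masses = [MONO[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_ions.append(sum(masses[:i]) + PROTON)
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)
    return b_ions, y_ions


# Hypothetical lysine-terminated tripeptide
b, y = fragment_ions("GAK")
```

    A useful consistency check is that each complementary pair (b_i and the y ion covering the remaining residues) sums to the neutral peptide mass plus two protons.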

    SEQUEST- Database Search Algorithm

    Given the mass of the precursor ion (m/z of the peptide ion) and its fragment

    ions, the goal of the database search algorithm is to determine peptide sequence and

    protein identity. SEQUEST [16] is a database search program which uses a descriptive

    model for peptide fragmentation and correlative matching to a tandem mass

    spectrum[16]. To assess the quality of the match between the experimental spectrum

    and amino acid sequence from the database, SEQUEST applies a two-tiered scoring

    scheme. It first calculates the empirically derived preliminary score (Sp) that restricts

    the number of sequences to be analyzed in the correlation analysis. Sp is calculated by

    summing the peak intensities of fragment ions as well as accounting for continuity of

    the fragment ion series and the length of the amino acid sequence. The second and

    decisive score is a cross-correlation score, referred to as XCorr, which correlates the

    experimental and theoretical spectra. The theoretical spectrum is generated from the

    predicted fragmentations, i.e. b- and y-ions for each of the sequences in the database.

    The similarity between the theoretical and experimental spectra is evaluated based on

    the cross-correlation of the two spectra. Apart from preliminary and cross-correlation

    scores, SEQUEST calculates another important difference, ΔCn, the normalized

    difference of XCorr values between the best matched sequence and each of the other

    sequences. ΔCn is a useful indicator of the uniqueness of the match. If the value of

    ΔCn is greater than 0.1, then the match is considered as reasonably unique to a

    sequence. XCorr, which is not dependent on the database size, suggests the quality of

  • 51

    the match between the spectrum and sequence, whereas ΔCn, which is dependent on

    the size of the database, indicates the quality of the match relative to near misses.
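
    The two scores described above can be sketched on binned spectra. This is a simplified illustration and not the SEQUEST implementation: it assumes unit-intensity peaks in integer bins, takes the cross-correlation score as the zero-offset correlation minus the mean correlation over offsets from -75 to +75 bins, and omits all spectrum preprocessing. All function names are hypothetical.

```python
def correlation(experimental, theoretical, shift):
    """Dot product of the two binned spectra with theoretical offset by shift."""
    n = len(experimental)
    total = 0.0
    for i in range(n):
        j = i + shift
        if 0 <= j < n:
            total += experimental[i] * theoretical[j]
    return total


def xcorr(experimental, theoretical, max_shift=75):
    """Zero-offset correlation corrected by the mean over nearby offsets."""
    zero = correlation(experimental, theoretical, 0)
    offsets = range(-max_shift, max_shift + 1)
    background = sum(correlation(experimental, theoretical, s) for s in offsets)
    return zero - background / len(offsets)


def delta_cn(xcorr_values):
    """Normalized XCorr difference between the best hit and each other hit."""
    ranked = sorted(xcorr_values, reverse=True)
    best = ranked[0]
    return [(best - x) / best for x in ranked[1:]]


def spectrum(peak_bins, size=200):
    """Build a toy binned spectrum with unit intensity at the given bins."""
    bins = [0.0] * size
    for b in peak_bins:
        bins[b] = 1.0
    return bins


observed = spectrum([50, 100, 150])
candidate_match = spectrum([50, 100, 150])  # hypothetical correct sequence
candidate_miss = spectrum([20, 70, 120])    # hypothetical wrong sequence

scores = [xcorr(observed, candidate_match), xcorr(observed, candidate_miss)]
deltas = delta_cn(scores)
```

    In this toy case the first candidate scores far higher than the second, so the normalized difference exceeds the 0.1 threshold mentioned in the text.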

    Label Free Quantitative Microproteomics

    Currently, a number of stable isotope labeling approaches are in use for

    "shotgun" quantitative proteomic analysis. The stable isotope labeling approaches

    include Isotope-Coded Affinity Tag (ICAT), Stable Isotope Labeling by Amino Acids

    in cell culture (SILAC), ¹⁵N/¹⁴N metabolic labeling, ¹⁸O/¹⁶O enzymatic labeling,

    Isotope Coded Protein Labeling (ICPL), Tandem Mass Tags (TMT), Isobaric Tags for

    Relative and Absolute Quantification (iTRAQ) and other chemical labeling[85,86].

    These stable isotope labeling methods have offered valuable flexibility while using

    quantitative proteomic methods to study protein abundance changes in complex

    samples. However, most labeling based quantification methods are limited in their

    application due to increased time and complexity of sample preparation, the

    requirement of higher sample concentration, high cost of the reagents and incomplete

    labeling. Therefore, for relative quantitation of small sample amounts, there is

    increased interest in label-free approaches in order to achieve more sensitive and

    simpler quantification results.

    Label-free protein quantitation is generally based on two approaches. The first

    involves the measurement of ion intensity changes such as peptide peak areas or peak

    heights in chromatography (i.e. total or single ion analysis). The second approach is

    based on spectral counting of the identified peptides after MS/MS analysis. Peptide

    peak intensity and spectral counting are measured for individual LC-MS/MS runs, and

  • 52

    changes in protein abundance are determined by direct comparison between different

    analyses.

    Relative Quantitation by Peak Intensity

    In this approach, relative quantitation of the peptides is achieved by direct

    comparison of the peak area of each peptide ion in multiple LC-MS datasets. However,

    application of this method for determination of protein abundance changes in complex

    biological samples has some practical limitations. The differences in the sample

    preparation and sample injection, in addition to experimental changes in retention time

    and m/z value, significantly influence the direct and accurate comparison of multiple

    LC-MS datasets. Therefore, highly reproducible LC-MS performance and careful

    chromatographic peak alignment are critical for the quantitation approach[87].
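
    The peak-area comparison described above can be sketched as a trapezoidal integration of an extracted ion chromatogram for the same peptide ion in two runs. The retention times and intensities below are invented values for illustration, and the sketch assumes the peak has already been matched between runs, which is the alignment step the text identifies as critical.

```python
def peak_area(times, intensities):
    """Integrate a chromatographic peak with the trapezoidal rule."""
    if len(times) != len(intensities):
        raise ValueError("times and intensities must have the same length")
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        area += 0.5 * (intensities[i] + intensities[i - 1]) * width
    return area


# Hypothetical extracted ion chromatograms (minutes, arbitrary intensity units)
run_a = ([10.0, 10.1, 10.2, 10.3], [0.0, 4.0e5, 3.0e5, 0.0])
run_b = ([10.1, 10.2, 10.3, 10.4], [0.0, 2.0e5, 1.5e5, 0.0])

# Relative abundance of the peptide in run A versus run B
ratio = peak_area(*run_a) / peak_area(*run_b)
```

    Note that the peak in run B elutes 0.1 minutes later in this toy data; the ratio is only meaningful because the two peaks are assumed to be the same peptide.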

    Relative Quantitation by Spectral Count

    In the spectral counting approach, the number of identified

    MS/MS spectra from the same protein (spectral count) is compared between

    multiple LC-MS/MS datasets. The increase in protein sequence coverage, the number

    of identified unique peptides and the number of identified total MS/MS spectra

    (spectral count) correspond with the increase in protein abundance. However, of

    these three factors of identification, only spectral count showed strong linear

    correlation with relative protein abundance with a dynamic range over 2 orders of

    magnitude. Therefore spectral counting is considered as a simple and reliable index

    for relative protein quantification[88]. In comparison to peak intensity, which uses

    computer algorithms for automatic LC-MS peak selection, alignment and comparison,

    the spectral counting approach is much easier to implement.

  • 53

    However, for accurate and reliable detection of protein changes in complex

    mixtures, normalization and statistical analysis of spectral counting datasets is

    necessary. One of the simple normalization methods, which accounts for the run to

    run variability, uses total spectral counts[89]. Another approach to normalization

    involving calculation of a normalized spectral abundance factor (NSAF) was

    suggested to account for the effect of protein length on spectral count[90].
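
    Both normalization schemes mentioned above reduce to a few lines of arithmetic. The sketch below assumes the usual definition of the normalized spectral abundance factor, in which each protein's spectral count is divided by its length and the result is divided by the sum of these ratios over all proteins in the run. Protein names, counts and lengths are hypothetical.

```python
def normalize_to_total(spectral_counts):
    """Divide each spectral count by the total count of the run."""
    total = sum(spectral_counts.values())
    return {protein: count / total for protein, count in spectral_counts.items()}


def nsaf(spectral_counts, lengths):
    """Length-corrected spectral counts, normalized to sum to 1 within a run."""
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    return {p: value / total for p, value in saf.items()}


# Hypothetical run: two proteins with equal counts but different lengths
counts = {"protein_A": 10, "protein_B": 10}
lengths = {"protein_A": 100, "protein_B": 400}  # residues

by_total = normalize_to_total(counts)
by_nsaf = nsaf(counts, lengths)
```

    With equal raw counts, total-count normalization treats the two proteins as equally abundant, whereas the length correction assigns the shorter protein the larger share.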

    Zhang et al. compared five different statistical tests on spectral count data

    collected by analysis of yeast digests to evaluate the significance of comparative

    quantification by spectral counts[91]. These statistical tests were 1) Fisher's exact test,

    2) goodness-of-fit test (G-test), 3) AC test, 4) Student's t-test and 5) Local-Pooled-Error

    (LPE) test. For datasets with three or more replicates, the Student's t-test was found to

    be the best, whereas, in case of datasets with one or two replicates, the Fisher's exact

    test, G-test and AC test can be used.
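
    For the single-replicate case, Fisher's exact test can be applied to a 2x2 table built from one protein's spectral count and the remaining spectral counts in each of two runs. The sketch below is a plain two-sided implementation using only the standard library; the table layout and the example counts are assumptions for illustration and are not taken from reference [91].

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    a: spectral count of the protein in run 1
    b: all other spectral counts in run 1
    c: spectral count of the protein in run 2
    d: all other spectral counts in run 2
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x protein spectra in run 1
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    low = max(0, col1 - row2)
    high = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    return sum(prob(x) for x in range(low, high + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


# Hypothetical protein: 30 of 10,000 spectra in run 1, 10 of 10,000 in run 2
p_value = fisher_exact_two_sided(30, 10000 - 30, 10, 10000 - 10)
```

    In practice the p-values for all proteins would also need correction for multiple testing, which this sketch leaves out.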

    Relative quantitation by spectral count has been successfully applied for

    different clinical applications[92], including analysis of normal and acute

    inflammation, biomarker discovery in human saliva proteome in type-2 diabetes[93],

    comparison of protein expression in mammalian and yeast cells under different culture

    conditions, distinguishing normal and diseased lung cancer samples[94,95], discovery

    of phosphotyrosine-binding proteins in mammalian cells and identification of

    differential plasma membrane proteins in terminally differentiated mouse cell

    lines[95].

    Another label-free method, the spectral index, is used to analyze relative protein

    abundances in large-scale data sets obtained from biological samples by shotgun

  • 54

    proteomics. The spectral index method is made up of two

    biochemically plausible features, i.e. 1) spectral counts (indicative of relative protein

    abundance) and 2) the number of samples within a group with detectable peptides [96].

    We used this method to assess differentially abundant proteins between 9 non-

    cancerous, normal breast epithelial (NBE) samples and 9 estrogen receptor (ER)-

    positive (luminal subtype), invasive malignant breast epithelial (MBE) samples [97].

    However, for a low number of replicates of breast cancer samples (n=3), we used

    spectral counting (PatternLab software [98]) for determination of differentially

    abundant proteins between invasive breast cancer cells and metastatic breast cancer

    cells (Chapter 2).
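
    One way to combine the two features of the spectral index into a single number is sketched below. The weighting used here (each group's share of the mean spectral count multiplied by the fraction of samples in that group with detectable peptides, then differenced between groups) is an assumed form for illustration; the exact definition and its permutation-based significance threshold should be taken from reference [96]. The spectral counts are invented.

```python
def spectral_index(counts_group_1, counts_group_2):
    """Assumed spectral index form; ranges from -1 to 1.

    Each argument is a list of spectral counts for one protein,
    one entry per sample in the group.
    """
    mean_1 = sum(counts_group_1) / len(counts_group_1)
    mean_2 = sum(counts_group_2) / len(counts_group_2)
    if mean_1 + mean_2 == 0:
        return 0.0  # protein not detected in either group

    # Fraction of samples in each group with at least one spectrum
    detected_1 = sum(1 for c in counts_group_1 if c > 0) / len(counts_group_1)
    detected_2 = sum(1 for c in counts_group_2 if c > 0) / len(counts_group_2)

    share_1 = mean_1 / (mean_1 + mean_2)
    share_2 = mean_2 / (mean_1 + mean_2)
    return share_1 * detected_1 - share_2 * detected_2


# Hypothetical protein seen only in group 1, and one seen equally in both
only_group_1 = spectral_index([5, 4, 6], [0, 0, 0])
balanced = spectral_index([5, 4, 6], [5, 4, 6])
```

    Values near 1 or -1 indicate a protein that is both more abundant and more consistently detected in one group; values near 0 indicate no difference.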

    1.3 Microproteomics

    Mass spectrometry-based proteomic methods are extensively used to study

    global changes in protein expression caused by pathological stimuli in an

    organism. Current methods use sample total protein amounts in the range of

    micrograms or milligrams [99] and extensive protein/peptide level separations in order

    to achieve comprehensive proteomic analysis. However, in many cases, obtaining

    these sample amounts can be practically impossible or challenging. There are several

    reasons for low availability of sample amounts e.g. rarity of the sample itself,

    collection of many thousand cells takes several hours or days using a technique such

    as laser capture microdissection, multiple experiments on a homogeneous sample and

    so forth.

  • 55

    One example of such a rare or limited sample type is brain tissue specimens

    related to neurodegenerative diseases such as Parkinson's, Alzheimer's, and Huntington's

    disease. These neurodegenerative diseases are characterized by selective degeneration

    of particular types of neurons, while the tissue of the rest of the brain remains in a normal

    state[100]. Researchers, trying to understand the causes behind these

    diseases, are using laser capture microdissection (LCM) to selectively collect

    degenerative neurons. However, obtaining even 10,000 to 50,000 neurons is

    impractical because the degenerative neurons are limited in numbers[101]. Another

    similar example of limited sample amount is malignant cells collected from a solid

    tumor. Solid tumors are heterogeneous in composition, i.e. they are made up of a

    subpopulation of cancer cells, along with stromal elements that collectively form a

    microenvironment[41]. The subtypes of malignant cells differ among themselves in

    many properties, such as production and expression of cell surface markers, sensitivity

    to therapeutics, growth rate, etc. The studies aimed at determining the proteomic

    changes in these individual cell types are limited due to the time and cost required to

    collect large cell numbers using the LCM procedure. The proteomic analysis of

    circulating tumor cells (CTCs), which can be an indicator of potential metastasis, is

    thought to provide a noninvasive way of determining tumor metastasis or the impact of

    treatment on the number of CTCs[102]. As the number of CTCs circulating in the

    blood is very low, advances in proteomics are required to analyze them.

    To accomplish microproteomics of clinically relevant and limited amounts of

    sample, one must use a minimum number of steps in the proteomic platform, and each

    of these steps must limit sample losses[99]. Considerable sample losses during sample

  • 56

    preparation and limited dynamic range of LC-MS/MS system are two main obstacles

    in analyzing small protein amounts. In order to improve sample preparation, low

    protein binding tubes and the use of MS–friendly acid labile detergents are

    suggested[99]. The use of MS-friendly detergents results in shorter extraction and

    digestion procedures.

    One of the recent examples of low sample proteomics was the analysis of 500-

    5,000 CTCs, generating proteomic profiles of ~150-650 proteins[103]. The cells were

    lysed using NP-40 detergent, and the detergent was separated by precipitating the

    proteins from cell lysate. The in-solution digests of these samples were subjected to

    nanoflow LC/Q-TOF analysis. In another approach to a small sample amount,

    quantitative comparison of a proteome of LCM collected single pancreatic islets,

    containing 2,000-4,000 cells, treated with high and low levels of glucose, was carried

    out. The cells were lysed with acid labile detergent followed by in-solution digestion.

    Sensitive LC-MS/MS analysis was performed using a low column flow rate and long

    chromatographic separation time. In Chapters 2 and 3, we present a short-run

    SDS-PAGE-based sample handling step, followed by sensitive LC-MS analysis using a

    PLOT column.

    1.3.1 Alternative strategies for protein digestion

    1.3.1.1 Solvent-based approach

    In 2007, Veenstra et al. introduced a membrane protein digestion method with

    60% methanol in place of chaotropes, as a membrane protein solubilizing solvent

    during trypsin digestion[44]. In this approach, the plasma membrane protein

  • 57

    population was isolated from the human epidermis and dispersed in 50 mM

    ammonium bicarbonate, pH 7.9. The proteins were reduced and alkylated using TCEP

    and iodoacetamide (IAA), respectively. The membrane proteins were separated using

    ultracentrifugation at 100,000 × g. The protein pellet was further solubilized in 60% v/v

    methanol in 50 mM ammonium bicarbonate. The proteins were digested by trypsin

    (trypsin/protein ratio: 1/20) at 37˚C for 5 hours in the same solubilizing buffer. The

    acidified digest was analyzed using two dimensional (SCX/RP) LC-MS.

    This strategy was found to have advantages compared to detergent- and

    chaotrope-based solubilization as 1) the same methanol based buffer conditions were

    used for solubilization, denaturation and proteolysis, 2) sample dilution and dialysis

    steps were completely eliminated, and these steps typically decrease solubilizing

    capacity and subsequent proteolytic efficiency, 3) methanol and ammonium

    bicarbonate, volatile water-soluble compounds, are removable by lyophilization after

    digestion, making the methanol-based buffer approach MS-friendly. Other solvents

    such as acetonitrile and trifluoroethanol are also used for solubilization and digestion of

    membrane proteins.

    1.3.1.2 Cleavable surfactant

    Surfactants are stable and strong solubilizing agents. Environmental

    concerns, such as the low biodegradability of surfactants, have become one of the

    main driving forces for the development of cleavable surfactants. Although cleavable

    surfactants were first synthesized many years ago, Norris et al. applied nonacid

    cleavable detergents for MALDI mass spectrometry profiling of whole cells [104].

  • 58

    They showed that cleavable surfactants result in an increase in the number of proteins

    analyzed by increasing protein solubility. Cl