Introduction to the Proteomics Bioinformatics Course 2016

Proteomics: History and introduction to the course Dr. Juan Antonio Vizcaíno Proteomics Team Leader EMBL-EBI Hinxton, Cambridge, UK

Upload: juan-antonio-vizcaino

Post on 16-Apr-2017


EMBL-EBI Now and in the Future

Proteomics: History and introduction to the course

Dr. Juan Antonio Vizcaíno

Proteomics Team Leader
EMBL-EBI
Hinxton, Cambridge, UK

Juan A. [email protected]
Proteomics Bioinformatics Course 2016
Hinxton, 4 December 2016

Data resources at EMBL-EBI

Genes, genomes & variation: Ensembl, Ensembl Genomes, GWAS Catalog, Metagenomics portal, European Nucleotide Archive, European Variation Archive, European Genome-phenome Archive
Gene & protein expression: ArrayExpress, Expression Atlas, PRIDE
Protein sequences, families & motifs: InterPro, Pfam, UniProt
Molecular structures: Protein Data Bank in Europe, Electron Microscopy Data Bank
Chemical biology: ChEMBL, ChEBI
Reactions, interactions & pathways: IntAct, Reactome, MetaboLights
Systems: BioModels, Enzyme Portal, BioSamples
Literature & ontologies: Europe PubMed Central, Gene Ontology, Experimental Factor Ontology

The slide shows the core data resources at EMBL-EBI, illustrating the range of data that can be accessed through the institute.

Useful definitions and concepts to start

A little bit of history and curiosities

Importance of bioinformatics

Overview

Proteomics is the large-scale study of proteins, particularly their structures and functions.

The proteome is the entire complement of proteins, including the modifications made to a particular set of proteins, produced by an organism or system. It varies with time and with the distinct requirements, or stresses, that a cell or organism undergoes.

proteome = protein + genome (M. Wilkins, 1994)

Definitions

Genomics -> Transcriptomics -> Proteomics

From the genome to the proteome

Genome vs. proteome

Genome

Essentially static over time
Not location specific
Human genome mapped (initially in 2000), ~20,000 genes
PCR is available to amplify DNA

Proteome

Dynamic over time
Location specific
Human proteome not yet mapped: how many proteins?
No equivalent of PCR for proteins

Large increase in protein diversity due to:

Alternative splicing of pre-mRNA (introns and exons)
Post-translational modifications of proteins
Cell age and health/disease state

Genome -> Proteome

20 naturally occurring amino acids

Chirality: L-amino acids

Amino acids

This is important for the 3D structure of proteins.

From: Molecular Biology of the Cell (4th Ed) http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=mboc4&part=A388&rendertype=figure&id=A391

Individual amino acids -> polypeptide

Peptide bond

Protein backbone


Useful definitions and concepts to start

A little bit of history and curiosities

Importance of bioinformatics

Overview


Sanger's principal conclusion was that the two polypeptide chains of the protein insulin had precise amino acid sequences and, by extension, that every protein had a unique sequence.

Nobel Prize in Chemistry in 1958

F. Sanger

Protein sequencing: the pioneers

Sanger's first triumph was to determine the complete amino acid sequence of the two polypeptide chains of bovine insulin in the early 1950s. Prior to this it was widely assumed that proteins were somewhat amorphous. In determining these sequences, Sanger proved that proteins have a defined chemical composition.

For this purpose he used the "Sanger reagent", fluorodinitrobenzene (FDNB), to react with the exposed amino groups in the protein, and in particular with the N-terminal amino group at one end of the polypeptide chain. He then partially hydrolysed the insulin into short peptides (either with hydrochloric acid or using an enzyme such as trypsin). The mixture of peptides was fractionated in two dimensions on a sheet of filter paper: first by electrophoresis in one dimension and then, perpendicular to that, by chromatography in the other. The different peptide fragments of insulin, detected with ninhydrin, moved to different positions on the paper, creating a distinct pattern which Sanger called "fingerprints".

The peptide from the N-terminus could be recognised by the yellow colour imparted by the FDNB label, and the identity of the labelled amino acid at the end of the peptide was determined by complete acid hydrolysis, followed by identifying which dinitrophenyl-amino acid was present. By repeating this type of procedure Sanger was able to determine the sequences of the many peptides generated using different methods for the initial partial hydrolysis. These could then be assembled into longer sequences to deduce the complete structure of insulin. Sanger's principal conclusion was that the two polypeptide chains of the protein insulin had precise amino acid sequences and, by extension, that every protein had a unique sequence.

In 1958 he was awarded the Nobel Prize in Chemistry "for his work on the structure of proteins, especially that of insulin".
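The assembly step in Sanger's approach, in which short overlapping peptide sequences are merged into a longer one, can be illustrated with a small sketch. This is a simplified greedy merge written for illustration only, not a reconstruction of Sanger's actual procedure; the function names and the three toy fragments are assumptions made for this example and are not data from the slides.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def assemble(fragments: list[str]) -> list[str]:
    """Greedily merge the pair of fragments with the largest overlap until none overlap."""
    # Remove exact duplicates (keeping order) and fragments contained in another one.
    frags = list(dict.fromkeys(fragments))
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]

    while len(frags) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: leave the fragments separate
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Toy fragments in one-letter amino acid code (hypothetical example data).
toy_fragments = ["GIVEQ", "EQCCA", "CCASV"]
assembled = assemble(toy_fragments)
print(assembled)
```

A greedy merge like this can go wrong when a sequence contains repeats, which is one reason why fragments from several different partial hydrolysis methods are useful: they provide independent overlaps.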

F. Sanger

In 1977, he introduced the dideoxy method for sequencing DNA molecules, also known as the Sanger method. He sequenced the first complete DNA genome: bacteriophage ΦX174.

Nobel Prize in Chemistry in 1980

Not only protein sequencing

In 1980, Walter Gilbert and Sanger shared half of the chemistry prize "for their contributions concerning the determination of base sequences in nucleic acids".

Multiple Nobel awardees: as of 2016, four people have received two Nobel Prizes. Maria Skłodowska-Curie received the Physics Prize in 1903 for the discovery of radioactivity and the Chemistry Prize in 1911 for the isolation of pure radium. Linus Pauling won the 1954 Chemistry Prize for his research into the chemical bond and its application to the structure of complex substances. Pauling also won the Peace Prize in 1962 for his anti-nuclear activism, making him the only winner of two unshared prizes. John Bardeen received the Physics Prize twice: in 1956 for the invention of the transistor and in 1972 for the theory of superconductivity. Frederick Sanger received the Chemistry Prize twice: in 1958 for determining the structure of the insulin molecule and in 1980 for inventing a method of determining base sequences in DNA.

MS is an analytical technique that measures the mass-to-charge (m/z) ratio of charged particles. It is used for determining masses of particles, for the determination of the elemental composition of a sample or molecule, and for elucidating the chemical structures of molecules, such as peptides and other chemical compounds.

Many applications: one of them is proteomics

Mass spectrometry (MS)
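To make the mass-to-charge ratio concrete, the sketch below computes the m/z value expected for a molecule of known neutral mass carrying z protons. It assumes positive-mode ionization by protonation, [M + zH]^z+, which is common for peptides but is not stated on the slide; the function name and the example mass of 1000.0 Da are illustrative assumptions.

```python
PROTON_MASS = 1.007276  # mass of a proton in daltons (rounded)


def mz(neutral_mass: float, charge: int) -> float:
    """m/z of a molecule with the given neutral mass carrying `charge` protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


# Hypothetical peptide with a neutral monoisotopic mass of 1000.0 Da.
for z in (1, 2, 3):
    print(f"z={z}: m/z = {mz(1000.0, z):.4f}")
```

The same molecule therefore appears at different m/z values depending on its charge state, which is why the instrument reports m/z rather than mass directly.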


P. V. Edman

By 1950, he had developed the Edman degradation method.

A major drawback of this technique is that the peptides being sequenced cannot be longer than around 30 residues.

Protein sequencing: the pioneers

Phenylisothiocyanate is reacted with an uncharged terminal amino group, under mildly alkaline conditions, to form a cyclical phenylthiocarbamoyl derivative. Then, under acidic conditions, this derivative of the terminal amino acid is cleaved as a thiazolinone derivative. The thiazolinone amino acid is then selectively extracted into an organic solvent and treated with acid to form the more stable phenylthiohydantoin (PTH)-amino acid derivative, which can be identified by chromatography or electrophoresis. This procedure can then be repeated to identify the next amino acid.

A major drawback of this technique is that the peptides being sequenced in this manner cannot have more than 50 to 60 residues (and in practice, under 30). The peptide length is limited because the cyclical derivatization does not always go to completion. The derivatization problem can be resolved by cleaving large peptides into smaller peptides before proceeding with the reaction. Modern machines, capable of over 99% efficiency per amino acid, can accurately sequence up to 30 amino acids.

An advantage of the Edman degradation is that it only uses 10 - 100 picomoles of peptide for the sequencing process. The Edman degradation reaction has been automated to speed up the process.
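The length limit described above follows from simple arithmetic: if each cycle completes with efficiency e, then after n cycles only about e^n of the peptide molecules are still in step with one another. The sketch below is a simplified model that assumes a constant per-cycle efficiency (real instruments behave less ideally); the function name is an illustrative assumption.

```python
def in_phase_fraction(efficiency: float, cycles: int) -> float:
    """Fraction of peptide molecules still in step after `cycles` Edman cycles."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return efficiency ** cycles


# 99% per-cycle efficiency, evaluated at the read lengths mentioned in the notes.
for n in (10, 30, 60):
    print(f"after {n} cycles: {in_phase_fraction(0.99, n):.1%} still in phase")
```

Under this model roughly a quarter of the molecules are already out of step after 30 cycles at 99% efficiency, and the loss keeps compounding, which is consistent with the practical limit of about 30 residues.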

Wolfgang Paul / Hans G. Dehmelt developed the ion trap technique (1950s and 1960s).

Nobel Prize in Physics (1989)

A commercial quadrupole ion trap (Finnigan MAT) was introduced in 1983. The ion trap quickly became the primary instrument for conducting proteomics because of its ability to conduct tandem MS (MS/MS) analysis of complex mixtures of peptides, generated by enzymatic digestion of proteome samples such as cell lysates.

History of mass spectrometry

Ion traps are almost ubiquitous in analytical laboratories worldwide and serve as downstream MS detectors for both GC and LC.

John B. Fenn (Yale University) and co-workers used electrospray ionization (ESI) to ionize biomolecules (high-molecular-weight proteins).

Koichi Tanaka (Shimadzu Corp) used the ultra fine metal plus liquid matrix method to ionize intact proteins (Soft Laser Desorption): with the proper combination of laser wavelength and matrix, a protein can be ionized.

Fenn and Tanaka: Nobel Prize in Chemistry (2002)

Earlier ionization methods were too energetic to be used with biological molecules

F. Hillenkamp & M. Karas developed the MALDI technique: use of organic matrices to obtain MS of large proteins

Mass spectrometry: Soft ionization methods

Fenn had a long dispute with Yale University because he did not want to retire. He joined the Yale University faculty in 1962. In 1987, he reached the mandatory retirement age. Fighting age discrimination and a University-mandated move to smaller laboratory space, Fenn remained at Yale and was 70 years old before he began work on what would in time become his Nobel Prize-winning discovery.

K. Tanaka is so far the only person without a postgraduate degree to win a Nobel Prize in a scientific discipline. However, his award drew some criticism: the contribution of two German scientists, Franz Hillenkamp and Michael Karas, was considered too significant to be dismissed, and some argued that they should also have been included as prize winners.

Patrick H. O'Farrell
J. Klose

1D SDS gel (axis: MW)

2D SDS gel (axes: MW, pI)

2D gel image from: http://www.fixingproteomics.org/

Gel electrophoresis


The rapid development of genomics allowed the development of proteomics

Shotgun proteomics: a method for identifying proteins in complex mixtures

HPLC, MS

There are only 20 amino acids. The physico-chemical properties of peptides are more homogeneous and manageable than those of proteins.

From protein centric to peptide centric
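The move from proteins to peptides starts with an enzymatic digestion, which can be simulated in silico. The sketch below assumes trypsin and the commonly used simplified rule that it cleaves after lysine (K) or arginine (R) unless the next residue is proline (P); that rule, the function name and the toy sequence are assumptions for illustration and do not come from the slides.

```python
def tryptic_digest(sequence: str) -> list[str]:
    """Split a protein sequence into peptides using a simplified trypsin rule.

    Cleaves after K or R, except when the following residue is P.
    Missed cleavages are not modelled.
    """
    if not sequence:
        return []
    peptides = []
    start = 0
    # The last residue never creates a cleavage site, so stop one before the end.
    for i in range(len(sequence) - 1):
        if sequence[i] in "KR" and sequence[i + 1] != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    peptides.append(sequence[start:])
    return peptides


# Hypothetical toy sequence in one-letter amino acid code.
toy_protein = "AAKCCRPDDKEE"
print(tryptic_digest(toy_protein))
```

In this toy case the arginine is followed by proline, so no cut is made there and the middle peptide spans both residues.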

Mass spectrometry (MS)-based proteomics

Many different workflows.

Discovery mode:
  Bottom-up proteomics
    Data dependent acquisition (DDA)
    Data independent acquisition (DIA)
  Top-down proteomics (intact proteins)

Targeted mode:
  SRM/MRM (Selected Reaction Monitoring/Multiple Reaction Monitoring)


Not only identify, but also quantify the amount of each protein in the sample

The current methods rely mainly on MS:

Vaudel et al., Proteomics 2010 Feb;10(4):650-670

Proteomics becomes quantitative

The yeast two-hybrid method was developed by S. Fields in 1989.

Many more methods have been developed since then:

Affinity electrophoresis
Co-immunoprecipitation
Tandem affinity purification (TAP)

Protein-protein interactions: yeast two-hybrid

The premise behind the test is the activation of downstream reporter gene(s) by the binding of a transcription factor onto an upstream activating sequence (UAS). For two-hybrid screening, the transcription factor is split into two separate fragments, called the binding domain (BD) and activating domain (AD). The BD is the domain responsible for binding to the UAS and the AD is the domain responsible for the activation of transcription.

Overview of the two-hybrid assay, checking for interactions between two proteins, called here Bait and Prey.
A. The Gal4 transcription factor gene produces a two-domain protein (BD and AD), which is essential for transcription of the reporter gene (LacZ).
B, C. Two fusion proteins are prepared: Gal4BD+Bait and Gal4AD+Prey. Neither of them is usually sufficient on its own to initiate transcription of the reporter gene.
D. When both fusion proteins are produced and the Bait part of the first interacts with the Prey part of the second, transcription of the reporter gene occurs.

Proteomics in a clinical environment

Biomarker discovery is a very active field of research.

MS technology is slowly being incorporated into clinical practice.

It is used to identify microorganisms by MALDI MS profiling.

Approved in Europe. In August 2013 it became the first MS diagnostic tool approved in the US.

J Rohn (2013) Nat Biotechnol, 31, 862

http://thehpp.org/

The Human Proteome Project (HPP)

Draft human proteome papers published in 2014

Wilhelm et al., Nature, 2014
Kim et al., Nature, 2014

Two independent groups claimed to have produced the first complete draft of the human proteome by MS.

Some of their findings are controversial and need further validation, but they generated a lot of discussion and put proteomics in the spotlight.

They used many different tissues.

Nature cover, 29 May 2014

Proteomics for structural biology

Increased focus in recent years (a lot more to come).

MS/MS cross-linking approaches

HD-exchange mass spectrometry

Lössl et al., EMBO J, 2016


Useful definitions and concepts to start

A little bit of history and curiosities

Importance of bioinformatics

Overview

Atlas: what happens where

Need for bioinformatics

Biology is changing:
  High-throughput
  More data produced
  New types of data
  Emphasis on systems biology

Bioinformatics enables new applications:
  molecular medicine
  agriculture
  food
  environmental sciences

On 21 July 1986, SWISS-PROT was created by A. Bairoch (it contained around 3,900 protein sequences).

In 1979, the first software was developed for 2DE image analysis (ELSIE).

Bioinformatics is very much needed in proteomics


Mallick & Kuster, Nat. Biotechnol. 2010 Jul;28(7):695-709

Proteomics is a complex discipline

MS-based proteomics

Hein et al., Handbook of Systems Biology, 2012

Genomics, Transcriptomics, Proteomics, Metabolomics

More multi-omics studies

Questions?
