international collaboration in proteomics and informatics
DESCRIPTION
INTERNATIONAL COLLABORATION IN PROTEOMICS AND INFORMATICS. Bibliotheca Alexandrina, 9 October, 2007 Gilbert S. Omenn, M.D., Ph.D. Center for Computational Medicine & Biology Chair, HUPO Plasma Proteome Project University of Michigan, Ann Arbor, MI, USA. - PowerPoint PPT PresentationTRANSCRIPT
1
INTERNATIONAL COLLABORATION IN PROTEOMICS AND INFORMATICS
Bibliotheca Alexandrina, 9 October, 2007
Gilbert S. Omenn, M.D., Ph.D.
Center for Computational Medicine & Biology
Chair, HUPO Plasma Proteome Project
University of Michigan, Ann Arbor, MI, USA
2
It Is Such A Great Pleasure to Visit The Bibliotheca Alexandrina
One of the Wonders of the Modern World!
“The First Digital Library, from its Birth”
Facilitating International Collaboration in
Science and Technology
3
Nearly-Complete Human Genome Sequence, 15-16 Feb 2001
4
We Live in a New World of Life Sciences
New Biology---New Technology: a “parts list”
Genome Expression Microarrays
Comparative Genomics + CNV + miRNA
Proteomics and Metabolomics
Bioinformatics & Computational Biology• Mechanism- & Evidence-Based Medicine: “What were you doing up to now?!”• Predictive, personalized, preventive, participatory healthcare and community health services
5
Key Components of the Vision of Biology As An Information Science
• An avalanche of genomic information: validated SNPs, haplotype blocks, candidate genes/alleles, proteins, & metabolites--associated with disease risk• Powerful computational methods• Effective linkages with better environmental and behavioral datasets for eco-genetic analyses• Credible privacy and confidentiality protections• Breakthrough tests, vaccines, drugs, behaviors, and regulatory actions to reduce health risks and cost-effectively treat patients globally.
6
A Golden Age for the Public Health Sciences
Sequencing and analyzing the human genome is generating genetic information that must be linked with information about:• Nutrition and metabolism• Lifestyle behaviors• Diseases and medications • Microbial, chemical, physical exposuresEvery discipline of public health sciences
needed.
7
Definitions
Genetics is the scientific study of genes and their roles in health and disease, physiology, and evolution.
Genomics is a modern subset of the broader field of genetics, made feasible by remarkable advances in molecular biology, biotechnology, and computational sciences, to examine the entire complement of genes and their actions.
Global analyses permit us and require us to go beyond the known “lamp-posts” of individual gene associations and effects.
8
Proteins are the action molecules of the cell and the leading candidates for biomarkers—in tissues and in the blood. Proteins are coded for by genes. Understanding one protein can be a lifetime’s work!
Proteomics is the global analysis of proteins in cells or body fluids. Techniques for global analysis of proteins are advancing rapidly, especially for discovery of biomarkers for diagnosis, treatment, and prevention.
Metabolomics is the global analysis of metabolites.
Proteomics + metabolomics + epigenomics = “functional genomics”
9Protein DNA
10
Rationale for Proteomics
Proteins are much closer to the pathophysiologic changes and molecular targets for drugs than are mRNAs.
Changes in mRNAs are clues, but changes in corresponding proteins often are not highly correlated.
Advances in fractionation of complex tissue and plasma protein mixtures, in mass spectrometry, and in curated databases of proteins help address complexity, dynamic range, and uncertainty of protein identifications.
11
A Vision For Proteomics
Multiple protein biomarkers discovered
Biomarkers combined on diagnostic chips
Detect organ location of cancers, for surgery or radiation
Detect mechanism of disease for chemotherapy, even if location unknown
Mechanistic, rather than “geographic” classification
Better efficacy/less toxicity for all types of patients
12
Status of Proteomics Assays
• Many technology platforms of increasing sensitivity and resolution
• Patterns or specific proteins still just biomarker candidates —most lack independent confirmation and coefficient of variation, let alone “validation” with standard clinical chemistry parameters of sensitivity, specificity, and especially positive predictive value
• Approaches of clinical chemistry needed to guide further development of the field
13
Barriers for Proteomic Cancer Biomarker Discovery in Plasma
Human cancers are very heterogeneousTumor proteins are in low abundance for
early detection of cancersTumor proteins are greatly diluted upon
release to ECF and bloodPlasma is an extraordinarily complex
specimen dominated by high abundance proteins (50% by weight is albumin)
Knowledge of the plasma proteome is still limited
14
Outline of Lecture
1. Review of the vision, strategy, and output of the HUPO Human Plasma Proteome Project Pilot Phase
2. Objectives for the New Phase of the Plasma Proteome Project
3. Example of the power of computational tools and collaborations (if time)
15
HUPO
The international Human Proteome Organization (HUPO) was founded in 2001. Its aims are:
1. To advance the science of proteomics
2. To enhance training in proteomics
3. To build international initiatives by organ (liver, brain, kidney), biofluid (plasma, urine, CSF, saliva), and disease (cardiovascular, cancers), plus antibodies and data standards.
16
Proteomics Interaction MapRuth McNally, sociologist
17
Samir Hanash, founding President of HUPO Gil Omenn, leader of HUPO PPP
18
THE PLASMA PROTEOME
Advantages: The most available human specimen; the most comprehensive sample of tissue-derived proteins; the basis for a Disease Biomarkers Initiative tied to organ proteomes.
Specific Disadvantages: Extreme complexity/enormous dynamic range High risk of ex vivo modifications Lack of highly standardized protocols General Challenges: Inadequate appreciation of
incomplete sampling by MS/MS; evolving annotations and unstable databases
19
Long-Term Scientific Goals of the HUPO Human Plasma Proteome Project
1. Comprehensive analysis of plasma and serum protein constituents in people2. Identification of biological sources of variation within individuals over time, with validation of biomarkers Physiological: age, sex/menstrual cycle, exercise Pathological: selected diseases/special cohorts Pharmacological: common medications3. Determination of the extent of variation across populations and within populations
20
HUPO HUMAN PLASMA PROTEOME
PROJECT (PPP)HUPO PPP Participating Labs
Technology Vendors
Development & Validation of Biomarkers
Liver and Brain Proteome, Antibody, Protein Stds Projects
Reference Specimens
Technology Platforms--Separation and Identification
Serum vs Plasma
Omenn GS. The Human Proteome Organization Plasma Proteome Project Pilot Phase: Reference Specimens, Technology Platform Comparisons, and Standardized Data Submissions and Analyses. Proteomics 2004;4:1235-1240.
Scheme Showing Aims and Linkages of the HUPO Plasma Proteome Project, Pilot Phase
21
OUTPUT FROM PPP Pilot PhaseSpecial Issue Aug 2005, Proteomics, “Exploring
the Human Plasma Proteome”: 28 papers—collaborative analyses and annotations, plus lab-specific analyses, and Wiley book (2006)
Publicly-accessible datasets: www.ebi.ac.uk/pride [EBI]
www.peptideatlas.org/repository [ISB] www.bioinformatics.med.umich.edu/hupo/ppp Additional papers are encouraged: Nature Biotechnology 2006; 24:333-338 (States et al) Genome Biology 2006;7:R35 (Fermin et al) Proteomics 2006; 6: 5662-5673 (Omenn)Numerous citations/comparisons of datasets
22
23
1. BD: specially prepared male/female pooled samples, divided into EDTA-, Heparin-, and Citrate-anti-coagulated Plasma and Serum (250 ul x4 of each).
BD clot activator. No protease inhibitors. Three separate ethnic pools prepared. Shipped frozen.
2. Chinese Academy of Medical Sciences: Sets of three plasmas + serum, similar to BD protocol.3. National Institute for Biological Standards & Control, UK: citrate-anti-coagulated, freeze-dried plasma, from 25 donors, prepared for Intl Soc Thrombosis & Hemostasis, 1 ml aliquots/ampoules.
SERUM AND PLASMA REFERENCE SPECIMENS
24
Specifications for Data SubmissionEach of 55 labs agreed (July, 2003 Workshop) to
provide, and 31 labs did provide: a) a detailed experimental protocol, to “push the
limits” to detect low-abundance proteins b) peptide sequences, rated as “high” or “lower”
confidence, based on MS/MS criteria c) protein IDs from IPI 2.21 (July 2003) and
search engine parameters used to align peptide sequences with proteins in human database
Later, we obtained m/z peak lists and raw spectra (by DVD) for independent analyses.
25
PeptidesSample Proteins
digestion
200200 400400 600600 8008001000100012001200m/zm/z
From Peptides to Genome Annotation
Spectrum Peptide Probability Spectrum 1 LGEYGH 1.0 … … …Spectrum N EIQKKF 0.3
BLASTprotein
database
statisticalfiltering
LC-MS/MSdatabase
searchextraction
Mass Spectrum
Peptides
visualization
PeptideAtlas DatabaseGenome Browser
Map togenome
Peptide … Chrom Start_Coord End_Coord … PAp00007336 … X 132217318 132217368 … … … … … … …
SBEAMS
26
Numbers of Proteins Identified (LC-MS/MS or FTICR-MS, 18 labs)
From 15,519 reported distinct protein IDs in IPI 2.21, we chose one representative/cluster:
(a) 9504 = 1 or more peptide matches
(b) 3020 = 2+ peptide matches (Core Dataset)
(c) 1274 = 3 or more peptide matches
(d) 889 = follow-up high-stringency analysis with adjustments for protein length and multiple (43,000) comparisons in IPI v2.21
(Nature Biotech 2006; 24:333-338)
27
GREATEST RESOLUTION AND SENSITIVITY
The most extensive high-confidence yield was from combined methods of immunoaffinity (“top-6”) depletion, 2 or 3-D high-resolution fractionation, and then ESI-MS/MS with ion-trap LTQ instrument.
LTQ gave several fold more IDs (1168) than did LCQ (271) in same hands (B1-serum vs B1-heparin) and obtained multiple peptides for many proteins which had just one hit with LCQ.
28
SPECIFIC OBSERVATIONS: DEPLETION
• Many investigators depleted albumin and/or immunoglobulins
• Several were provided Agilent immunoaffinity column to remove “top-6” proteins
• Much higher numbers of identifications after depletion if sufficient fractionation
• Inadvertent removal of other proteins; “sponge” effect of albumin
• Assay both flow-through & bound fractions
29
SPECIMEN VARIABLES
What evidence have we developed for choice of specimens for analysis?
Plasma preferred over serum—more consistent, less degradation
EDTA-plasma preferred over heparin interferences and citrate dilution
Clot activator? necessary only for serum
Minimize freeze/thaw cycles (archives)
Minimal evidence of platelet activation [4C]
Protease inhibitors desirable, but alter proteins
30
INFLUENCE OF ABUNDANCE
Using quantitative immunoassays and microarrays (generally unknown epitopes), we have found very high rates of detection of the more abundant proteins, less in the mid-range, and occasional detection of very low abundance proteins, as expected.
High correlation (r=0.9) between # peptides and measured concentrations
31
Least Abundant Proteins Identified with two distinct peptides
(pg/ml: range 200 pg/ml to 20 ng/ml)
Alpha fetoprotein 2.9E+-02 TNF-R-8 3.3E+02 TNF-ligand-6 1.5E+03 PDGF-R alpha 4.6E+03 Leukemia inhibitory factor receptor 5.0E+03 MMP-2/gelatinase 8.8E+03 EGFR 1.1E+04 TIMP-1 1.4E+04 IGFBP-2 1.5E+04 Activated leukocyte adhesion mol 1.6E+04 Selectin L [five labs;10 peptides] 1.7E+04
32
BIOLOGICAL INSIGHTS
The proteins identified can be annotated by many methods. We have searched multiple databases, including Gene Ontology, Novartis Atlas, Online Mendelian Inheritance in Man (OMIM), incomplete or unidentified sequences in the human genome, microbial genomes, InterPro protein domains, transmembrane domains, secretion signals.
See Proteomics 2005; 5:3226-3519; Wiley, 2006
33
GENE ONTOLOGY SPECIFIC TERMS
Over-represented in PPP 3020 (vs whole genome): “extracellular”, “immune response”, “blood coagulation”, “lipid transport”, “complement activation”, “regulation of blood pressure”, as expected; also: cytoskeletal proteins, receptors and transporters.
Proteins from most cellular locations and molecular processes are recognized.
Under-represented: “perception of smell” (1 vs 25 exp); cation transporters, ribosomal proteins, G-protein coupled receptors, and nucleic acid binding proteins.
34
InterPro Protein Domain Analysis
Compared with the whole human genome, the 3020 PPP proteins are:
Over-represented for EGF, intermediate filament protein, sushi, thrombospondin, complement C1q, and cysteine protease inhibitor.
Under-represented: Zinc finger (C2H2, B-box, RING), tyrosine protein phosphatase, tyrosine and serine/threonine protein kinases, helix-turn-helix motif, and IQ calmodulin binding region domains.
35
TRANSMEMBRANE AND SECRETED PROTEIN FEATURES
1297 of 3020: SwissProt Annotated ProFun Both
Transmembrane 230 151 104
Secretion signal 373 420 358
1723 of 3020: ProFun Predicted TM domain(s) 137
Secretion signal 255
36
Cardiovascular-Related Proteins Biomarker Candidates in the PPP Database
Proteins characterized in eight groups:
Inflammation
Vascular
Signaling
Growth and differentiation
Cytoskeletal
Transcription factors
Channels
Receptors
37
Comparison of Five Search Algorithms
Using PPP data, Kapp et al (Proteomics 2005) found Sequest and Spectrum Mill more sensitive and MASCOT, Sonar, and X!Tandem more specific for peptide identifications at specified false-positive rates.
Some investigators have reported using combinations of two or more search engines. Decision rules are necessary.
38
Can We Overcome the Idiosyncrasies of Individual Instruments and Laboratories?
Several informatics investigators approached the human PPP with an offer to re-analyze the complete MS/MS datasets using their own software and criteria from the raw spectra (or peaklists).
These analyses eliminated the heterogeneity of search algorithms, search parameters, and idiosyncrasies of individual labs.
The results are hard to compare, given different extent of analysis. However, each can be compared with the Core Dataset.
39
Independent Analyses from Raw Spectra (#IDs with 2+ peptides)
Core Dataset (18 datasets, 3020)
• PepMiner (Beer, 8 large datasets, 2895) [1051 in 3020 dataset, + 700 in the 9504]
• X!Tandem (Beavis/States, 18 datasets, 2678) [577 in the 3020; 218 in the 889]
• PeptideProphet/ProteinProphet (Deutsch, 7+ datasets, 960)[479 in 3020]
• Mascot/Digger (Kapp, Australia, 14 datasets, 513 with 1.4% error rate; ongoing analysis
40
What is Required and Feasible to Enhance the Statistical Robustness of Findings?
Many complex proteomics analyses are done once, without replicates required to estimate coefficient of variation or other standard parameters for clinical chemistry use.
“Five to ten independent repetitions of the experiments are a must” [Hamacher et al, Proteomics in Drug Discovery, 2006].
How should we determine how similar or different are samples A and B, or the results of methods X and Y? What decision rules apply?
We have a long way to go from discovery research to clinical applications.
41
Comparison of 5 Published Reports on Plasma Proteins with HUPO PPP Datasets
Report #IDs #IPI in 3020 in 9504
Anderson 1175 990 316 471
Shen [1682] 1842 213 526
Chan 1444 1019 257 402
Zhou 210 148 51 88
Rose 405 287 142 159
42
Comparison of New Biofluid Proteome Findings with HUPO PPP-3020 Proteins
Proteome # Proteins IPI 2.21 PPP-3020 Urine 1543 910 293 tears 491 313 117 semen 923 560 180
Refs from Matthias Mann Lab, Genome Biology, 2007, different IPI versions.
Comparison, Omenn, Proteomics-Clinical Applications (2007).
43
NEXT PHASE OF PPP (PPP-2)
1. Standard operating procedures (SOPs), including EDTA-plasma as standard specimen; replication and confirmation of results
2. Quantitation and subproteomes, using new methods and advanced instruments
3. Databases and robust bioinformatics4. Clinical chem/disease-related studies
44
PPP-2 Research & Technology Thrusts
Learned a lot from Pilot Phase—plasma is a very complex specimen; no single platform sufficient; analyses currently far from comprehensive, let alone reproducible; now have improved data quality and informatics resources.
PPP-2: use multiple methods; focus on biomarker discovery; build upon already-funded laboratories and repositories.
45
Specific Technology Recommendations
N-Glycosite (proteotypic) peptide resource is a special subproteome likely to have high biomarker relevance.
Capture glycoproteins, digest with trypsin and PNGase F to yield N-linked glycopeptides. Choose one unique to each protein; a finite number; not all proteins. Use complementary lectin approach to characterize glycans.
Prepare isotope-labeled N-glyco-peptides for multiple uses as standards and to spike specimens.
46
N-Glycosites
Glycoproteins are enriched on cell surface, in secreted proteome and in plasma
Glycoproteins tend to be stable
Only few glycosites per protein: reduction in sample complexity (excludes albumin)
Inherent validation of N-glycosite by fragment ion spectrum
N-glycosite subproteome is probably the one easiest to completely map
47
Glycopeptide Isolation
PNGaseF Digestion
N-linkedglycopeptides
Capture
Non-glycoproteinsTrypsin digestion
Non-glyco-peptidesWash
Wash
Asn Asp
Zhang H., Li X.-J., Martin D.B. & Aebersold R. (2003) Nat Biotech 21: 660-666
48
Flow chart of process
Capture / Digestion
Normal & DiseaseTissue Samples
'Glycopeptide' Fract.
Data Analysis
Targeted LC/MS/MS
LC-MS
LC/MS Maps
Plasma Samples
'Glycopeptide' Fract.
Data Analysis
MRM LC/MS/MS
Target peptides
49
Reducing Complexity: Glycoprotein-Enriched Subproteomes
Methods Lab 2 Lab 11
Enrichment hydrazide chem lectin chrom’y
Peptide Fxn SCX + RP RP
Mass Spec qtof deca-xp
Search engine Seq/ProteinProphet Sequest
Protein IDs 222 83
in B1-serum [51 in common]
Of total 254, 164 found among data from 11 other labs without glycoprotein enrichment.
50
Technology Recommendations (cont’d)
Orbitrap and other advanced instruments with high mass accuracy and increased throughput
Multiple Reaction Monitoring (Q-Trap, triple quad---LOD <50 amol, 5 logs range, probably ng/ml range for GP.
Extensive fractionation and newer labeling methods.
Recruit several major labs; be open to volunteers.
Determine interest in reference specimen.
Make peptide standards available through PPP-2: post lists and make labeled compounds.
51
Multiple Reaction Monitoring (MRM)
High selectivity ~ two levels of mass selection (increased S/N)
High sensitivity because of high duty cycle (Q1 and Q3 are static)
Only known peptides (candidates) are detected
time Fixed Fixed
MS-2MS-1 CIDSource
Set precursor m/z Set fragment m/z
Peptide (M) Fragment (m)
52
Technology Recommendations (cont’d)
Compare pooled samples from disease and control; high throughput not essential for discovery phase
Continue to build the catalogDo longitudinal repeat measures on individuals to
establish CoV—must reliably tell whether two samples are the same or different, including PTMs
Pay attention to precursor ions Known interested labs: Aebersold, Paik, Smith,
Speicher, Hancock, Mann; probably Chinese, Michigan, FHCRC, Japanese/glycomics.
53
Issues for PPP Bioinformatics What are imperatives for project design?
How can many more spectra be interpreted?
How can more confident protein IDs be generated?
How do we add value and benefit from EBI/PRIDE and ISB/PeptideAtlas repositories?
What is required to make the datasets more useful for other investigators?
Can quantitation, including of PTMs, be achieved with statistical robustness?
54
A Robust Bioinformatics Architecture
Individuallabs
Level I repository
PRIDE
PeptideAtlas
Dissemination
Genomeannotation
55
Repositories and Resources for Proteomics Informatics
PRIDE at EBI, repository for protein identifications (Martens)
PeptideAtlas, repository for raw data processed through TransProteomics Pipeline at ISB (Deutsch), plus SpectraST barcodes from NIST
Tranche Distributed File System/DFS (Andrews, UM) at ProteomeCommons.org, National Resource for Proteomics and Pathways
CPAS, developed as part of Mouse Models of Human Cancers Consortium, at Fred Hutchinson (McIntosh)
GPMdb, developed by Beavis (Canada)
56
Tranche Distributed (P2P) File System
Open, simple, cross-platform protocols– e-Commerce-grade encryption makes it appropriate
for scientific research (peer-review and traceability)
– Can easily grow to accommodate very large amounts of data and users
• Commodity hardware @ $0.37 per GB storage
~16 TB over 12 servers (30 additional TB ordered) and funding for additional 20TB
Documentation, tools, code, credits: http://www.proteomecommons.org/dev/dfs
Data sets: GPM, PNNL, Aurum, QqTOF vs QSTAR, sPRG ABRF 2006, HUPO PPP– Links with PeptideAtlas, OPD, HPRD, TheGPM
57
Can We Identify More High Confidence Peptides from the MS/MS spectra?
The spectra, not protein lists, are the raw data. <20% of spectra are confidently assigned to peptide sequences; the rest are typically discarded.
More high quality spectra can be mined (Nesvizhskii et al, MCP 2006).
Higher mass accuracy greatly enhances results (with some complications---Eric Deutsch).
Error estimates and thresholds should be routine for peptide IDs and protein matches. TransProteomicPipeline (TPP) from ISB has been designed for this purpose.
58
Mining Un-assigned High Quality Spectra
(Nesvizhskii)
Typical search: SEQUEST, IPI databasesemi-constrained (tryptic on one
end)Met + 16+/- 3 Da, average mass
Average numbers (LCQ/LTQ data): 10-15% of all
spectra assigned peptide with high
confidence 20-25 % of all high quality spectra are not
assigned
59
Why Are Spectra Not Assigned?
Possible causes of failure to assign peptide:
• Imperfect scoring scheme
• Constrained search (PTM, not tryptic etc.)
• Incorrect mass/ charge state
• Low spectrum quality / contaminant ion
• Correct sequence may not be in the database searched (e.g., SNP)
• Novel sequence (splice variants, fusion peptides?)
Use MS/MS data for genome annotation
60
Finding and Mining High Quality Unassigned Spectra (Nesvizhskii)
61
Further Analyses at the Peptide Level
The PPP, GPM, and PeptideAtlas databases are rich with peptide-level findings, which can be analyzed for many questions---e.g., which peptides are most likely to be detected from among the predicted tryptic peptides of various proteins, and why? Can peptides be used directly to identify sequences of splice isoforms and SNPs? Can PTMs be identified more readily? Answers: Yes to all three questions.
Proteotypic peptides will be a major feature of Next Phase PPP.
62
What Kinds of Biological Insights Emerge from Annotation?
The aim of proteomics analyses is not just to create lists of peptides and proteins, but to advance our understanding of complex biological processes in health and disease. Going forward, quantitation of proteins and their PTMs will be increasingly important---and feasible.
63
High Throughput Proteomics and Systems Biology
condition 1
condition 2
condition 3
Integration of genomic, transcriptomic, proteomic, metabolomic data
Understanding andmodeling cell
behavior
Systems Biology
64
SUMMARY
Enthusiasm for continuing and expanding Plasma Proteome Project, confirmed at Seoul, Korea, World Congress of Proteomics Oct 2007
Commitment to combine PPP with concept of Disease Biomarker Initiative
Interest in linking with and absorbing datasets from other Biofluid Proteomes (saliva, urine, CSF, organ-related proximal fluids)
65
Biology as an Information Science: NIH Roadmap National Centers for Biomedical Computing
Informatics for IntegratingBiology and the Bedside (i2b2)Isaac Kohane, PI
Center for Computational Biology(CCB)Arthur Toga, PI
Multiscale Analysis of Genomicand Cellular Networks (MAGNet)Andrea Califano, PI
National Alliance for MedicalImaging Computing (NA-MIC)Ron Kikinis, PI
The National Center ForBiomedical Ontology (NCBO)Mark Musen, PI
Physics-Based Simulation ofBiological Structures (SIMBIOS)Russ Altman, PI
National Center for Integrative Biomedical Informatics (NCIBI) Brian Athey, PI
66
A Bioinformatics Approach to Discover Candidate Oncogenes
Few causal cancer genes have been discovered using gene expression microarrays
Oncogenic events are often heterogeneous– ERBB2/HER2 amplification in 20% of breast CA– Activating Ras mutations in 25% of melanomas– E2A-PBX1 translocation in 5-10% of leukemias
Chromosomal aberrations that result in marked over-expression of an oncogene should be detectable in transcriptome data
Protein products then may be identified in tumor, biofluids, and plasma
67
68
COPA of microarray data revealed ETV1 and ERG as outlier genes across multiple prostate cancer gene expression data sets [Tomlins et al., Science 2005, 310: 644 -648]
69
COPA Unveils Androgen-Responsive TF Fusion Genes
70
The Molecular Concept Map Project [Chinnaiyan, Rhodes]
71
72
Our Genetic Future
“Mapping the human genetic terrain may rank with the great expeditions of Lewis and Clark, Sir Edmund Hillary, and the Apollo Program.” --Francis Collins, Director
National Human Genome Research Institute, 1999
Next: Understand gene and protein expression Elucidate genetic, environmental, and
behavioral interactions in health and disease Engage scientists globally
73
Acknowledgements
HUPO PPP: Ruedi Aebersold and Young-Ki Paik, co-chairs; Eric Deutsch, Lennart Martens, Alexey Nesvizhskii, David States, bioinformatics; lab leaders and sponsors (see Proteomics 2005)
UM Proteomics Alliance for Cancer Research: Phil Andrews, David States, Alexey Nesvizhskii, George Michailidis, Mike Pisano, Arul Chinnaiyan, Dan Rhodes, Scott Tomlins, Arun Sreekumar, Adai Vellaichamy, Brian Haab
UM National Center for Integrative Biomedical Informatics: Brian Athey, David States, HV Jagadish, Jignesh Patel, Peter Woolf, Biaoyang Lin