IPAM #3: Childhood Sarcoma
Classification by Gene Expression Profiles
Timothy J. Triche, CHLA/USC
Clinical Classification of Childhood Cancer
• Historical: Morphologic diagnosis + clinical data => risk group, protocol eligibility, treatment (e.g., group-based treatment)
• Current: Combined (morphology, immunophenotype, genomic defect) => patient-specific group-based treatment
• Future: Patient-specific therapy, based on multi-genic phenotype?
Osteosarcoma
• Five histologic types, no prognostic value
• Weak prognostic features: site, size, age
• No specific, predictive genetic abnormality (RB, p53)
• Clinical stage only significant prognostic indicator at presentation
Osteosarcoma Prognosis
• Pre-resection chemotherapy => major increase in survival
• Improved survival limited to patients with ≥95% tumor kill
• Patients w/ metastases can be salvaged
But many exceptions occur:
– Responders who metastasize & die
– Non-responders who survive
– Metastatic patients who survive after resection of mets
Thus, predicting outcome & tailoring therapy remains a major problem
Osteosarcoma: Response to Chemo
[Figure panels: Before, After]
Osteosarcoma Survival
• Surgery only: <10%
• Metastases, no surgery: 0%
• Metastases, surgery: ~20%
• Single-agent chemotherapy: <20%
• Conventional chemotherapy: ~44%
• Up-front chemotherapy: ~65%
• Responders: ~80%
• Non-responders: <40%
Multi-gene Analysis by Microarrays
• Single gene abnormalities, even when present, are inadequate alone to:
– Establish a diagnosis
– Identify an individual patient's risk profile
– Predict clinical course
– Predict response to therapy
– Predict outcome
• Increasing evidence suggests gene expression profiles may favorably address these issues
Gene Expression Analyses
• Scatter Analyses
– 1 X 1
– Groups
• Outlier Gene Analyses
– Up & down regulated from mean
– Identity
• Cluster Analyses
– All genes
– Various methods
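The outlier gene analysis above (genes up or down regulated from the mean) could be sketched roughly as follows. This is only an illustration, not the method used in the talk: the function name `outlier_genes`, the z-score criterion, and the cutoff `z_cut` are all assumptions, and the data are toy values.

```python
import numpy as np

def outlier_genes(group, sample, z_cut=2.0):
    """Flag genes in `sample` that lie far from the per-gene group mean.

    group  : (n_samples, n_genes) reference expression matrix
    sample : (n_genes,) expression vector of the case of interest
    z_cut  : hypothetical cutoff in standard deviations (an assumption)
    """
    mean = group.mean(axis=0)
    std = group.std(axis=0, ddof=1)
    # Genes with zero spread cannot be scored; treat them as never outlying.
    std = np.where(std == 0, np.inf, std)
    z = (sample - mean) / std
    up = np.flatnonzero(z > z_cut)     # up regulated from the mean
    down = np.flatnonzero(z < -z_cut)  # down regulated from the mean
    return up, down

# Toy data: 10 reference samples, 6 genes.
rng = np.random.default_rng(0)
group = rng.normal(100.0, 5.0, size=(10, 6))
sample = group.mean(axis=0).copy()
sample[1] += 50.0   # artificially raise gene 1
sample[4] -= 50.0   # artificially lower gene 4
up, down = outlier_genes(group, sample)
```

The returned index arrays give the "identity" of the outlying genes, which can then be looked up against the array annotation.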
Specimen Handling
A) Cut pilot section of OCT-embedded frozen tissue
[Diagram labels: tumor; non-tumor; dissection of tumor tissue when possible; pure tumor]
B) Cut ~12 frozen sections
C) Extract RNA (<5 µg total RNA)
D) Synthesis of double-stranded cDNA
E) In-vitro transcription w/ biotinylated nucleotides
F) Size confirmation of cRNA transcripts
G) Fragmentation of cRNA
500 bp
Osteosarcoma: Gross Appearance
Histopathology of Osteosarcoma
Gene Expression: Osteosarcoma
6 = primary tumor, 1993
11 = first metastasis, 1996
9 = second metastasis, 1998
(died 1999)
12
Met 1 vs. met 2: little similarity
Pilot data
Primary vs. 1st Metastasis
[Scatter plot axes: Primary; 1st Pulmonary Met]
Differential Gene Expression: Primary vs. Metastatic Osteosarcoma
[Bar chart: expression level, axis from -50000 to 250000, for the genes listed below]
Ribosomal Protein L30
Osteonectin
Ribosomal Protein L37A
TF SL1
Thymosin
IMP E16
Pinch Protein
CPT1
Cyclin A
Tat-SF1
CAMP PK RII subunit
NGF beta
Tyrosine Phosphatase
PIGA, A
Uncoupling Protein 3
PSA
Ribosomal Protein L32
PSG11
[Legend: Primary, Metastasis]
Osteonectin lost in metastasis
Primary vs. “Metastasis”
[Scatter plot axes: Primary, 1993, pre-Rx; Tibia lesion, 1998, pre-Rx]
Gene Expression Data Clustering
Multiple methods work
[Diagram: Pattern Recognition as an Iterative Process; recoverable labels follow]
• Pattern Tested
• New Postulated Patterns
• Optimized Set of Patterns
• Millions of possible patterns
• Generate possible patterns: scenario analysis, non-numeric simulations, computational linguistics, neural networks, linear/non-linear optimization methodology
• Neural net uses data to optimize pattern
• Discovery of patterns buried in massive dataset
• No process knowledge
• Pattern recognition
• New rules developed
• Limited set of probable patterns
• Postulated Patterns
• Data
Agglomerative vs. Optimizing Hierarchical Clustering
• Both build a tree of clusters, with data points as leaves, & “nearby” data points as siblings.
• Agglomerative method repeatedly finds closest pair and irreversibly groups them. Bottom-up. Binary tree.
• Optimization methods reconsider assignments based on other assignments and their effects on cluster means & variances.
• Minimize sum of squared distances.
– Distance measure matters.
– Relate to statistical noise models, co-regulation models & likelihood of fit.
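As a rough illustration of the agglomerative method described above (repeatedly find the closest pair, group them irreversibly, and build a binary tree bottom-up), a minimal sketch might look like this. The choice of squared Euclidean distance between cluster means, the function name `agglomerate`, and the toy points are assumptions made for the example, not details from the talk.

```python
import numpy as np

def agglomerate(points):
    """Bottom-up clustering that yields a binary merge tree.

    Leaves are numbered 0..n-1; each merge creates a new node id.
    Distance between clusters is the squared Euclidean distance
    between their means (an assumption; other measures are possible).
    Returns a list of merges as (id_a, id_b, new_id).
    """
    points = np.asarray(points, dtype=float)
    clusters = {i: [i] for i in range(len(points))}
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        ids = list(clusters)
        best = None
        # Exhaustive search for the closest pair of current clusters.
        for a in range(len(ids)):
            for b in range(a + 1, len(ids)):
                mean_a = points[clusters[ids[a]]].mean(axis=0)
                mean_b = points[clusters[ids[b]]].mean(axis=0)
                d = float(np.sum((mean_a - mean_b) ** 2))
                if best is None or d < best[0]:
                    best = (d, ids[a], ids[b])
        _, ia, ib = best
        # The merge is irreversible: the two children are never revisited.
        clusters[next_id] = clusters.pop(ia) + clusters.pop(ib)
        merges.append((ia, ib, next_id))
        next_id += 1
    return merges

toy = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.2]]
merges = agglomerate(toy)
```

On the toy points, the two tight pairs are merged first and the resulting two clusters are joined last, so "nearby" points end up as siblings in the tree. An optimizing method would instead be free to revisit these assignments.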
Agglomerative vs. Optimizing Hierarchical Clustering, cont.
• Optimize means, variances, and cluster memberships.
• Currently we optimize top-down, by levels
• Expectation Maximization: soft memberships. K-means: hard.
• Optimize tree topology (fanout) by cross-validation (CV)
• SOM also optimizes at one level, and requires low-dimensional grid embedding of cluster means.
• Alternative to data-cluster distances: cliques of low data-data distances. Also has EM-like stat mech algorithms.
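The distinction between hard (K-means) and soft (Expectation Maximization) memberships can be sketched as below. This is a simplified illustration under stated assumptions: the clusters are treated as spherical Gaussians with equal weights and one fixed, shared variance `var`, whereas the slides describe optimizing variances as well. All function names and data are hypothetical.

```python
import numpy as np

def squared_distances(x, means):
    """(n_points, n_clusters) matrix of squared Euclidean distances."""
    return ((x[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)

def hard_memberships(x, means):
    """K-means style: each point belongs to exactly one cluster."""
    return squared_distances(x, means).argmin(axis=1)

def soft_memberships(x, means, var=1.0):
    """EM style E-step: each point gets a probability for every cluster.

    Assumes equal-weight spherical Gaussians with shared variance `var`.
    """
    logp = -0.5 * squared_distances(x, means) / var
    logp -= logp.max(axis=1, keepdims=True)  # for numerical stability
    p = np.exp(logp)
    return p / p.sum(axis=1, keepdims=True)

def update_means_hard(x, labels, k):
    """K-means M-step: plain average of the points assigned to each cluster."""
    return np.stack([x[labels == c].mean(axis=0) for c in range(k)])

def update_means_soft(x, resp):
    """EM M-step for the means: membership-weighted average of all points."""
    return (resp.T @ x) / resp.sum(axis=0)[:, None]

x = np.array([[0.0], [0.2], [1.0], [4.0], [4.2]])
means = np.array([[0.0], [4.0]])
labels = hard_memberships(x, means)
resp = soft_memberships(x, means)
new_hard = update_means_hard(x, labels, 2)
new_soft = update_means_soft(x, resp)
```

In the toy data, the point at 1.0 is assigned wholly to the first cluster under hard memberships, while under soft memberships it keeps a small probability of belonging to the second, so it pulls both means slightly.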
Mimir User Interface
Courtesy of Eric Mjolsness, JPL
Data Flow for Sarcoma Analysis
[Diagram nodes: data, labels, scoring, classifiers, sample clustering, gene clustering]
Pilot Study of Sarcomas
17 cases of osteosarcoma and rhabdomyosarcoma
6800 GeneChip analysis
6800 genes yield 14 gene clusters
Reduced mean space yields 4 sample clusters
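One plausible reading of the "reduced mean space" is that each sample is re-described by its mean expression within each gene cluster, so that roughly 6800 gene values collapse to 14 values per sample before the samples themselves are clustered. The sketch below shows that reduction on toy data. The interpretation, the function name `reduced_mean_space`, and the numbers are assumptions, not taken from the study.

```python
import numpy as np

def reduced_mean_space(expr, gene_labels, n_clusters):
    """Collapse genes to gene-cluster means.

    expr        : (n_samples, n_genes) expression matrix
    gene_labels : (n_genes,) cluster index of each gene, 0..n_clusters-1
    Returns an (n_samples, n_clusters) matrix. Assumes every cluster
    contains at least one gene.
    """
    expr = np.asarray(expr, dtype=float)
    gene_labels = np.asarray(gene_labels)
    reduced = np.zeros((expr.shape[0], n_clusters))
    for c in range(n_clusters):
        reduced[:, c] = expr[:, gene_labels == c].mean(axis=1)
    return reduced

# Toy example: 3 samples, 4 genes, 2 gene clusters.
expr = [[1.0, 3.0, 10.0, 20.0],
        [2.0, 4.0, 30.0, 50.0],
        [0.0, 0.0, 5.0, 5.0]]
gene_labels = [0, 0, 1, 1]
reduced = reduced_mean_space(expr, gene_labels, 2)
```

The rows of `reduced` could then be passed to any sample-clustering method in place of the full expression vectors.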
[Figure: sample clusters; recoverable labels: OS; OS; OS, OS; ERMS; OS, ARMS 1°; ERMS x 4; OS x 3; ARMS met x 3]
Expandable Tree of Variables Characterizing a Tissue Sample
Clinical response
All variables
Subject Conditions Genes
Outcomes Clinical Demographics
Metastasis Survival Pathology Treatment Age, Sex, etc.…
EM (Expectation Maximization) Gene Clustering
A B C F G D J K
[Legend: POOR, INTERMEDIATE, FAVORABLE]
Sarcoma Dataset: 45 cases of RMS (Alv + Emb) & Osteosarcoma (R + NR)
Working hypothesis:
Gene expression profiling can detect prognostic distinctions among sarcomas independently of conventional clinical or diagnostic criteria
Future Directions
• Analyze larger data set (institutional, COG) to test hypothesis
• Expand to all sarcomas (RMS, non-RMS, OS, ESFTs)
• Identify biologically important genes
• Creation of custom “sarcoma” arrays using oligomers representing these genes
• Long term studies of COG sarcoma patients using arrays in context with current clinical & biology studies
All osteosarcomas
Osteosarcoma vs RMS Genes
Log(Osteo)  Mean(Osteo)  Log(Rhabdo)  Mean(Rhabdo)  Mean log Rhabdo  GENE DESCRIPTION
5.68        194.82       7.96         6357.93       2.83             TNNT1 Troponin T1, skeletal, slow
6.50        1952.82      8.87         16832.71      2.83             MEST Mesoderm specific transcript (mouse) homolog
7.31        2240.65      9.77         31139.89      2.83             IGF2 Insulin-like growth factor 2 (somatomedin A)
6.01        162.35       8.28         8099.00       2.79             FGFR4 Fibroblast growth factor receptor 4
6.66        2058.88      8.93         13677.36      2.71             IGF2 Insulin-like growth factor 2 (somatomedin A)
5.90        -629.94      7.93         9770.93       2.56             Adrenal-Specific Protein Pg2
5.75        1225.65      7.71         3923.54       2.49             Steroid receptor coactivator (SRC-1) mRNA
6.43        1166.59      8.40         9229.32       2.44             GB DEF = DNA for cellular retinol binding protein (CRBP) exons 3 and 4
5.64        -890.82      7.48         3556.00       2.39             RBP1 Cellular retinol-binding protein
8.63        12771.76     10.75        77271.57      2.36             Insulin-Like Growth Factor 2
8.25        12913.76     5.63         2011.11       2.33             PTN Pleiotrophin (heparin binding growth factor 8, neurite growth-promoting factor 1)
6.18        -1124.24     8.01         7048.57       2.33             Muscle acetylcholine receptor alpha-subunit
8.77        19340.53     6.29         1756.00       2.25             MMP2 Matrix metalloproteinase 2 (gelatinase A; collagenase type IV)
7.09        1659.82      8.58         10916.54      1.90             CCND2 Cyclin D2
6.36        -907.00      7.72         4836.68       1.84             TNNI1 Troponin I, skeletal, slow
6.62        732.18       8.00         6595.57       1.83             MYL1 Myosin light chain (alkali)
Proposed COG Study of All Sarcomas
Acknowledgements
• CHLA:
– Deb Schofield
– Jingsong Zhang
• USC:
– Jonathan Buckley
– Kim Siegmund
• NCCF:
– Mark Krailo
• Caltech:
– Barbara Wold
– Chris Hart
• JPL:
– Eric Mjolsness
– Tobias Mann
– Joe Roden
– Ben Bornstein
• UBC:
– Poul Sorensen