Head and Neck Cancer: microRNA analysis
Amy Li
Monti Lab Rotation
Boston University
11/25/13
Dataset
Head and Neck Cancer dataset from The Cancer Genome Atlas (TCGA)
o Contains large, well-documented data for many cancer subtypes
o Data types: DNA methylation, SNP array, RNA-seq, miRNA-seq, low-pass DNA-seq, Reverse Phase Protein Array
Normalized miRNA-seq data
o 463 samples: 39 patients with tumor tissue and adjacent normal tissue; 385 patients with tumor tissue only
o 1046 miRNAs
Clinical information
o 360 samples, 71 clinical attributes (e.g., anatomic subdivision, gender, grade, race, stage)
Goals
Identify miRNA markers of:
o Cancer status: normal vs. tumor
o Cancer progression: differentially expressed in each stage or grade
Integrate miRNA expression with gene expression data
o Gene set enrichment in miRNA targets
o mRNA data (Vinay)
Tumor Progression Classification
Tumor grade: assigned based on how abnormal the tumor cells look under a microscope
o Ranges from G1 (well differentiated) to G4 (undifferentiated)
o Well-differentiated (lower-grade) tumor cells resemble normal cells, tend to spread slowly, and generally indicate a better prognosis
Tumor stage: based on the size or extent (reach) of the primary tumor
http://www.cancer.gov/cancertopics/factsheet/detection/tumor-grade
Analysis Overview
Exploratory data analysis
o Mean vs. standard deviation
o Boxplots of miRNA expression
o Data filtering
o Clinical demographics
Unsupervised clustering
o Heatmaps
o Fisher test for association between cluster assignment and sample attributes (tumor status, grade, stage, etc.)
Tests for confounders
o Association between grade and other attributes, e.g., ethnicity, gender, smoking history, age, alcohol consumption
Differential analysis
o Look for differentially expressed genes with respect to grade or stage
miRNA targets and Gene Set Enrichment Analysis
o Identify sets of miRNA targets and test whether these gene sets are enriched with respect to disease phenotype
Exploratory Data Analysis: Mean vs. Standard Deviation
Exploratory Data Analysis: Boxplot of Expression
Exploratory Data Analysis: Data Filtering
Sample filtering: removed samples without clinical labels
Gene filtering:
o Lowly expressed genes (filtered on row maximum)
o Genes with constant expression (filtered on standard deviation)
• Full matrix: 1046 miRNAs × 463 samples
• Filtered matrix: 692 miRNAs × 393 samples
Exploratory Data Analysis: Clinical Demographics
Unsupervised Clustering: Paired Samples, Tumor vs. Adjacent Normal
Fisher test: test for association between cluster assignment and actual class label
P-value ≈ 0

           Cluster 1   Cluster 2
Normal         38           1
Tumor           2          37
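The near-zero p-value can be checked with a two-sided Fisher exact test on the 2 × 2 table. A minimal pure-Python sketch follows; `fisher_exact_2x2` is a hypothetical helper name, and in practice R's `fisher.test` or SciPy's `scipy.stats.fisher_exact` would be the usual choice. It sums the hypergeometric probabilities of every table with the same margins that is no more likely than the observed one.

```python
from math import comb

def fisher_exact_2x2(table):
    """Two-sided Fisher exact test p-value for [[a, b], [c, d]].

    Rows = true class (Normal, Tumor), columns = cluster assignment.
    """
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def weight(x):
        # Numerator of the hypergeometric probability when the top-left cell is x
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Exact integer comparison: keep tables at least as extreme as the observed one
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return extreme / comb(n, col1)

p = fisher_exact_2x2([[38, 1], [2, 37]])  # on the order of 1e-18
```

Working with exact integer weights avoids the floating-point tolerance that library implementations need when deciding which tables count as "at least as extreme".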
Unsupervised Clustering: Grades G1, G2, G3, G4
Fisher test: tested for association between grade and cluster assignment, with the total number of clusters ranging from 2 to 5
P-values were not significant in any case
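For more than two grades or clusters, the Fisher test runs on a grade × cluster contingency table, one per choice of k from 2 to 5. A minimal sketch of building that table from per-sample labels; the helper name and the toy data are hypothetical, and the resulting r × c table would then be passed to an r × c Fisher test such as R's `fisher.test`.

```python
from collections import Counter

def contingency_table(labels, clusters):
    """Cross-tabulate class labels (rows) against cluster assignments (columns)."""
    counts = Counter(zip(labels, clusters))
    rows = sorted(set(labels))
    cols = sorted(set(clusters))
    # Counter returns 0 for (label, cluster) pairs that never occur
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table

# Hypothetical toy data: grade and k=2 cluster assignment per sample
grades = ["G1", "G1", "G2", "G2", "G3", "G4"]
clusters = [1, 2, 1, 1, 2, 2]
rows, cols, table = contingency_table(grades, clusters)
# rows = ['G1', 'G2', 'G3', 'G4'], cols = [1, 2]
# table = [[1, 1], [2, 0], [0, 1], [0, 1]]
```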
Tests for confounders
Tested for association between grade and each putative confounding variable using the Fisher test (discrete variables) or ANOVA (continuous variables)
Ethnicity (p=0.57), race (p=0.84), gender (p=0.09), age (p=0.55), alcohol consumption (p=0.63)
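For a continuous variable such as age, the ANOVA compares variation between grade groups with variation within them. A minimal sketch of the one-way ANOVA F statistic; the ages below are hypothetical, and the p-value would come from the upper tail of an F(k - 1, n - k) distribution (e.g., SciPy's `scipy.stats.f.sf`), which this stdlib-only sketch does not compute.

```python
import statistics

def one_way_anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of numeric values."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.fmean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of values around their own group mean
    ss_within = sum((v - statistics.fmean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical ages for three grade groups
f_stat = one_way_anova_f([[50, 52, 54], [56, 58, 60], [62, 64, 66]])  # 27.0
```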
Corrected expression for gender using a linear regression model before performing differential analysis
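A minimal sketch of regressing gender out of one miRNA's expression, assuming gender is coded 0/1 and a simple least-squares fit per miRNA. Whether the original correction added the overall mean back to the residuals is not stated; doing so here is an assumption that keeps corrected values on the original scale. `regress_out` and the toy values are hypothetical.

```python
import statistics

def regress_out(values, covariate):
    """Remove the linear effect of one covariate from one miRNA's expression.

    Fits values ~ intercept + slope * covariate by least squares and returns
    the residuals shifted by the overall mean (assumption: keeps original scale).
    """
    mx = statistics.fmean(covariate)
    my = statistics.fmean(values)
    sxx = sum((x - mx) ** 2 for x in covariate)
    sxy = sum((x - mx) * (y - my) for x, y in zip(covariate, values))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [y - (intercept + slope * x) + my for x, y in zip(covariate, values)]

# Hypothetical toy data: expression shifted upward in the gender coded 1
expr = [10.0, 12.0, 20.0, 22.0]
gender = [0, 0, 1, 1]
corrected = regress_out(expr, gender)  # [15.0, 17.0, 15.0, 17.0]
```

With a single binary covariate this is equivalent to centering each gender group on its own mean, so after correction both groups share the same mean expression.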
Data Processing for Differential Analysis
Sample filtering
• Removed samples without clinical labels
• Removed samples sequenced on IlluminaGA (kept IlluminaHiSeq samples)
• Removed samples with minority races (kept “white”)
Gene filtering
• Removed miRNAs with low expression (90% quantile < 100)
• Removed miRNAs with constant expression (sd < 0.1)
Attribute filtering
• Grade: removed GX and “Not Available”; kept G1, G2, G3, G4
• Stage: removed “Not Available”; kept S1, S2, S3, S4A, S4B
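The gene and grade filters can be sketched with the thresholds stated on this slide (90% quantile < 100, sd < 0.1). This assumes a 90% quantile computed by linear interpolation (R's default, type 7) and the sample standard deviation; the helper names and toy inputs are hypothetical.

```python
import math
import statistics

KEEP_GRADES = {"G1", "G2", "G3", "G4"}  # drops "GX" and "Not Available"

def quantile7(values, q):
    """Quantile by linear interpolation between order statistics (R's type 7)."""
    s = sorted(values)
    h = (len(s) - 1) * q
    lo = math.floor(h)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (h - lo) * (s[hi] - s[lo])

def filter_mirnas(expr, min_q90=100.0, min_sd=0.1):
    """Keep miRNAs with 90% quantile >= min_q90 and sample SD >= min_sd.

    expr: dict mapping miRNA name -> list of expression values (one per sample).
    """
    kept = {}
    for name, values in expr.items():
        if quantile7(values, 0.9) < min_q90:
            continue  # lowly expressed
        if statistics.stdev(values) < min_sd:
            continue  # (near-)constant expression
        kept[name] = values
    return kept

def filter_grades(sample_grades):
    """Keep samples whose grade label is one of G1-G4."""
    return {s: g for s, g in sample_grades.items() if g in KEEP_GRADES}

# Hypothetical toy inputs
expr = {
    "mir-A": [150, 200, 180, 220, 190],  # expressed and variable: kept
    "mir-B": [5, 10, 8, 12, 9],          # low expression: removed
    "mir-C": [500, 500, 500, 500, 500],  # constant: removed
}
kept = filter_mirnas(expr)
grades = filter_grades({"s1": "G1", "s2": "GX", "s3": "Not Available", "s4": "G3"})
```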
Differential Analysis: Grade
diffAnal.R
o Performs permutation tests to identify genes significantly differentially regulated between two classes
o Input: normalized expression matrix corrected for gender
o Class label: grade attribute binarized into “low” vs. “high”
o Ran diffAnal for each low vs. high cutoff:
  • G0 (adjacent normal) vs. G1-G4 (tumor)
  • G1 vs. G2-G4
  • G1-G2 vs. G3-G4
  • G1-G3 vs. G4
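The internals of diffAnal.R are not shown here, so the following is only a sketch under stated assumptions: grades are binarized at a cutoff, and one miRNA is tested with a two-sided permutation test on the difference in class means. `binarize` and `permutation_test` are hypothetical names, the mean-difference statistic is an assumption, and dropping G0 (adjacent normal) samples from the tumor-grade cutoffs is an assumed reading of the run list.

```python
import random
import statistics

GRADE_ORDER = ["G0", "G1", "G2", "G3", "G4"]  # G0 = adjacent normal tissue

def binarize(grades, cutoff):
    """Label grade <= cutoff as 'low' and grade > cutoff as 'high'.

    For tumor-grade cutoffs (G1-G3), G0 samples get None so they are dropped
    (assumption: those runs used tumor samples only).
    """
    rank = {g: i for i, g in enumerate(GRADE_ORDER)}
    labels = []
    for g in grades:
        if g == "G0" and cutoff != "G0":
            labels.append(None)
        else:
            labels.append("low" if rank[g] <= rank[cutoff] else "high")
    return labels

def permutation_test(values, labels, n_perm=10_000, seed=0):
    """Two-sided permutation test on the difference in class means (high - low)."""
    pairs = [(v, l) for v, l in zip(values, labels) if l is not None]
    vals = [v for v, _ in pairs]
    labs = [l for _, l in pairs]

    def mean_diff(lbls):
        high = [v for v, l in zip(vals, lbls) if l == "high"]
        low = [v for v, l in zip(vals, lbls) if l == "low"]
        return statistics.fmean(high) - statistics.fmean(low)

    observed = mean_diff(labs)
    rng = random.Random(seed)
    shuffled = labs[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between labels and expression
        if abs(mean_diff(shuffled)) >= abs(observed):
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical toy data for one miRNA, cutoff G1-G2 vs. G3-G4
expr = [1.0, 2.0, 3.0, 4.0, 10.0, 11.0, 12.0, 13.0]
grades = ["G1", "G1", "G2", "G2", "G3", "G3", "G4", "G4"]
labels = binarize(grades, "G2")
diff, p = permutation_test(expr, labels)  # diff = 9.0, p close to 2/70
```

Running this once per cutoff ("G0", "G1", "G2", "G3") mirrors the four comparisons listed above.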
Differential Analysis: Grade, G0 vs. G1-G4
Differential Analysis: Grade, G1 vs. G2-G4
Differential Analysis: Grade, G1-G2 vs. G3-G4
Differential Analysis: Grade, G1-G3 vs. G4
Differential Analysis: Trends
Found more significant markers distinguishing tumor from normal than distinguishing low from high grades
Performed the same analysis for stage; significant markers for stage are weaker than those for grade
For both grade and stage, most significant markers found by diffAnal are upregulated in the later disease state
Differential Analysis: Tumor Classification Marker
148 genes in total (90% quantile > 100) used for diffAnal
o 65 significant genes upregulated in tumors
o 37 significant genes downregulated in tumors
o Cutoff: FDR < 0.01
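How diffAnal.R derives its FDR is not shown (it may be permutation based), so as an illustration only, here is the Benjamini-Hochberg adjustment, a common way to turn per-miRNA p-values into FDR values that can then be compared against the 0.01 cutoff. `bh_fdr` is a hypothetical helper and the p-values are toy values.

```python
def bh_fdr(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q

q = bh_fdr([0.01, 0.04, 0.03, 0.005])  # approximately [0.02, 0.04, 0.04, 0.02]
significant = [i for i, v in enumerate(q) if v < 0.01]  # none at this cutoff
```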
Differential Analysis: Cancer Progression Marker for Grades
FDR by grade cutoff:

miRNA            G1 vs. G2-G4   G1-G2 vs. G3-G4   G1-G3 vs. G4
hsa-mir-106b     0.03           0.01              0.04
hsa-mir-15b      0.03           0.05              0.81
hsa-mir-582      0.03           0.01              0.04
hsa-mir-151      0.03           0.61              -0.25
hsa-mir-196b     0.03           0.01              0.18
hsa-mir-10a      0.03           0.32              -0.96
hsa-mir-374a     0.03           0.05              -0.6
hsa-mir-128-2    0.04           0.61              0.26
hsa-mir-25       0.03           0.01              0.02
hsa-mir-128-1    0.03           0.47              0.17
hsa-mir-28       0.04           0.01              0.44
A cancer progression marker must satisfy ALL of:
1. Is a tumor classification marker
2. Has a significant FDR in 2 of the 3 diffAnal runs
3. Shows a monotonic increase or decrease across grades
4 miRNA markers identified (all upregulated with increasing grade)
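The three criteria can be written as a single predicate per miRNA. This is a sketch under assumptions: the FDR significance threshold (0.05 here) is not stated on the slide, "2/3 runs" is read as at least two of the three runs, monotonicity is checked on per-grade mean expression ordered G1 to G4, and all input values below are hypothetical.

```python
def is_progression_marker(is_tumor_marker, fdrs, grade_means, alpha=0.05):
    """Check the three progression-marker criteria for one miRNA.

    fdrs: FDRs from the three binarized grade runs.
    grade_means: mean expression per grade, ordered G1..G4.
    alpha: FDR significance threshold (assumption; not given on the slide).
    """
    # Criterion 2: significant in at least 2 of the 3 runs
    significant_runs = sum(1 for f in fdrs if f < alpha)
    steps = list(zip(grade_means, grade_means[1:]))
    # Criterion 3: strictly monotonic across grades, in either direction
    monotonic = all(a < b for a, b in steps) or all(a > b for a, b in steps)
    return is_tumor_marker and significant_runs >= 2 and monotonic

# Hypothetical inputs for one miRNA
ok = is_progression_marker(True, [0.03, 0.01, 0.04], [5.0, 6.0, 7.5, 8.0])  # True
```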
Differential Analysis: Cancer Progression Marker for Grades
Finding miRNA Targets
miRWalk, TargetScan, miRBase
miRWalk “Validated targets” module
o Targets for differentially expressed miRNAs: 162 targets
o Intersected targets found by miRWalk with AhR (aryl hydrocarbon receptor) targets
o 6 matches: NQO1, NFE2, IL1B, TNF, TGFB1, MYC
[Figure: Venn diagram of miRNA markers (4), their miRNA targets (162), and AhR targets (54); overlap of 6 genes]
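The target overlap amounts to taking the union of validated targets across the selected miRNA markers and intersecting it with the AhR target list. A minimal sketch with clearly hypothetical placeholder inputs; these are not the actual miRWalk results or the 54-gene AhR list, and `union_targets` is a hypothetical helper.

```python
def union_targets(target_map, markers):
    """Union of the target genes of the selected miRNA markers."""
    return set().union(*(target_map[m] for m in markers))

# Hypothetical placeholders, NOT the actual miRWalk output or AhR list
target_map = {
    "mir-X": {"MYC", "TNF", "GENE_A"},
    "mir-Y": {"NQO1", "GENE_B"},
}
ahr_targets = {"MYC", "NQO1", "GENE_C"}

mirna_targets = union_targets(target_map, ["mir-X", "mir-Y"])
overlap = sorted(mirna_targets & ahr_targets)  # ['MYC', 'NQO1']
```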
Work in Progress
Gene set enrichment analysis: consider the targets of strong miRNA markers as a gene set
o Is this gene set enriched in certain disease phenotypes, e.g., high grade?
Pathway analysis: which pathways are these miRNA markers involved in?
Modeling tumor progression: explore other definitions of tumor progression markers
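For the planned enrichment question, the core of GSEA is a running-sum enrichment score over genes ranked by association with the phenotype (e.g., high vs. low grade). The sketch below uses the unweighted, Kolmogorov-Smirnov-style form; the published GSEA method weights each hit by its correlation with the phenotype, which this sketch omits, and significance is usually assessed by recomputing the score under permuted phenotype labels. Gene names and ranking are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score (maximum deviation from zero).

    ranked_genes: all genes ordered by association with the phenotype,
    most strongly associated first.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must be a non-empty, proper subset of the ranking")
    running, best = 0.0, 0.0
    for is_hit in hits:
        # Step up for each gene in the set, down for each gene outside it
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

# Hypothetical ranking: set members at the top give a score of +1.0
es = enrichment_score(["A", "B", "C", "D"], {"A", "B"})
```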