what’s new in gdac firehose? raw mafs for many cancer types, mutation samples continued to be...

1
What’s new in GDAC Firehose? Raw MAFs For many cancer types, mutation samples continued to be sequenced after paper publication. Previously, we only packaged and ran analyses on the mutations submitted with the paper in such cases. We are excited to announce that our latest analyses run also include post-publication samples, so far adding over 1500 mutation samples to our data stream. New Analyses 7 new analyses in our latest run, over 300 new reports: Correlate_Clinical_vs_Mutation_APOBEC_Categorical Correlate_Clinical_vs_Mutation_APOBEC_Continuous Correlate_mRNAseq_vs_Mutation_APOBEC miRseq_FindDirectTargets Mutation_CoOccurrence Pathway_GSEA_mRNAseq Pathway_Overlaps_MSigDB_MutSig2CV With iCoMut researchers can now play with coMut plots interactively, sorting and reordering the samples and results as they see fit; this provides a powerful synoptic tool for interpretation and exploration. The Broad Institute GDAC Firehose Born of the desire to systematize analyses from The Cancer Genome Atlas pilot and scale their execution to the dozens of remaining diseases to be studied, Firehose now sits atop ~54 terabytes of TCGA data and reliably executes thousands of pipelines per month. Version-stamped, standardized datasets Version-stamped packages of standard scientific analysis results Version-stamped custom runs for TCGA analysis working groups Making the results of such available through biologist- friendly and literature-citable online reports, and en masse through firehose_get Michael S. Noble 1 , Katherine Huang 1 , Kane Hadley 1 , Noam Shoresh 1 , Eila Arich-Landkof 1,2 , Benjamin Alexander 1 , David I. Heiman 1 , and Gad Getz 1,2 1 Broad Institute of MIT and Harvard, Cambridge, MA ◼ 2 Massachusetts General Hospital, Boston, MA New Features in FireBrowse Continuing to grow out the API and interactive documentation Quartiles for mRNASeq expression Patients for participant barcodes, optionally filtered by cohort Open source client-side wrappers for the RESTful API: fbget - https://confluence.broadinstitute.org/display/GDAC/ fbget High- and low-level Python bindings The fbget CLI, for easy access from UNIX command line Automatically generated to keep synchronized with the RESTful API FireBrowseR - https://github.com/mariodeng/ FirebrowseR R bindings developed by a Ph.D candidate in Germany Visualization tools: iCoMut - interactive visualization of mutation co- occurrence viewGene - a mRNASeq gene expression viewer RESTful API Powering the FireBrowse GUI is a RESTful application programming interface (API) with 27 functions and growing, fully open to the public. Gives bulk or fine-grained access to: Sample-level data Firehose analyses Standard data archives Metadata Simplified Portal Access In the summer of 2014 we introduced firebrowse.org, which makes it easy to find any of the thousands of TCGA datasets or Firehose analysis result reports in just 2 clicks. viewGene Utilizing the RESTful API, the viewGene tool generates a boxplot of mRNASeq expression levels for a selected gene across all cohorts. Spring 2015 Analysis Workflow : Mining the Firehose of TCGA 0 5 10 15 Mutation Rate non synonym ous synonym ous MutationRate age ClinicalAge vitalstatus C linicalVitalStatus gender ClinicalGender histology ClinicalHistology race ClinicalEthnicity 0 100 200 # Mutations NoM utation S yn In-fram e INDEL O therN on Syn M issense Splice Site Fram eshift N onsense NA ShowGrids Hide Grids IDH 1 IDH 2 TP53 ATRX C IC NOTCH1 FUBP1 P IK3C A NF1 E G FR SMARCA4 P IK3R 1 PTEN BCOR A RID 1A ZBTB20 TCF12 PTPN11 PDGFRA* ZCCHC12 -log10(q) GeneMutation 0 10 20 30 #SCNAs N o C hange D eletion Loss Gain A m plification NA ShowGrids Hide Grids Am p_10p15.3 Am p_7p11.2 A m p_13q34 Am p_12q14.1 A m p_4q12 Am p_7q32.3 Am p_8q24.13 Am p_12p13.32 Am p_7q31.2 Am p_Xp11.22 Am p_1q32.1 Am p_19p13.3 Am p_11q24.1 Am p_3q26.33 Am p_17q25.1 -log10(q) FocalLevelC N G ain 0 50 100 #SCNAs N o C hange D eletion Loss Gain A m plification NA ShowGrids Hide Grids D el_19q13.42 D el_19q13.41 D el_1p36.32 D el_1p32.3 D el_9p21.3 D el_9p23 D el_10q26.3 D el_14q24.3 D el_11p15.5 D el_13q14.2 D el_6q24.3 D el_4q34.3 D el_6q22.31 D el_11p15.1 D el_22q13.31 D el_18q23 D el_2q37.3 D el_6p25.3 D el_12q12 D el_5p15.33 D el_3p21.31 D el_3q29 D el_5q34 D el_X q21.1 -log10(q) FocalLevelC N Loss mRNA cNM F C LU S_m R N A_cN M F mRNAcHierarchical C LU S_m R N A_cH ierarchical CN cNMF CN cNMF Methlyation cNM F M ethylationcNM F RPPA cNMF R PPA cNM F Clusters RP PA cH ierarchical R PPA cHierarchical m RNAseq cNM F m RNAseq cNM F m RN Aseq cH ierarchical m RNAseq cHierarchical m iRseq cNM F m iRseq cNM F m iRseq cH ierarchical m iRseqcHierarchical miRseq Mature cNM F m iRseqM ature cNMF miRseq Mature cHierarchical m iRseqM ature cHierarchical Astrocytoma Oligoastrocyt oma Oligodendroglioma 0 5 10 15 Mutation Rate non synonym ous synonym ous MutationRate age ClinicalAge vitalstatus C linicalVitalStatus gender ClinicalGender histology ClinicalHistology race ClinicalEthnicity 0 100 200 # Mutations NoM utation S yn In-fram e INDEL O therN on Syn M issense Splice Site Fram eshift N onsense NA ShowGrids Hide Grids IDH 1 IDH 2 TP53 ATRX C IC NOTCH1 FUBP1 P IK3C A NF1 E G FR SMARCA4 P IK3R 1 PTEN BCOR A RID 1A ZBTB20 TCF12 PTPN11 PDGFRA* ZCCHC12 -log10(q) GeneMutation 0 10 20 30 #SCNAs N o C hange D eletion Loss Gain A m plification NA ShowGrids Hide Grids Am p_10p15.3 Am p_7p11.2 A m p_13q34 Am p_12q14.1 A m p_4q12 Am p_7q32.3 Am p_8q24.13 Am p_12p13.32 Am p_7q31.2 Am p_Xp11.22 Am p_1q32.1 Am p_19p13.3 Am p_11q24.1 Am p_3q26.33 Am p_17q25.1 -log10(q) FocalLevelC N G ain 0 50 100 #SCNAs N o C hange D eletion Loss Gain A m plification NA ShowGrids Hide Grids D el_19q13.42 D el_19q13.41 D el_1p36.32 D el_1p32.3 D el_9p21.3 D el_9p23 D el_10q26.3 D el_14q24.3 D el_11p15.5 D el_13q14.2 D el_6q24.3 D el_4q34.3 D el_6q22.31 D el_11p15.1 D el_22q13.31 D el_18q23 D el_2q37.3 D el_6p25.3 D el_12q12 D el_5p15.33 D el_3p21.31 D el_3q29 D el_5q34 D el_X q21.1 -log10(q) FocalLevelC N Loss mRNA cNM F C LU S_m R N A_cN M F mRNAcHierarchical C LU S_m R N A_cH ierarchical CN cNMF CN cNMF M ethylationcNM F R PPA cNM F Clusters R PPA cHierarchical m RNAseq cNM F m RNAseq cNM F m RN Aseq cH ierarchical m RNAseq cHierarchical m iRseq cNM F m iRseq cNM F m iRseq cH ierarchical m iRseqcHierarchical miRseq Mature cNM F m iRseqM ature cNMF miRseq Mature cHierarchical m iRseqM ature cHierarchical In the right panel we re-sort by copy- number clustering, making it further apparent that not only is the copy-number landscape different with the lack of aforementioned mutations, there also seems to be involvement with EGFR and PTEN. The left panel is very close to what iCoMut displays out-of-the-box: sorted first by the histology clinical parameter, then by gene (initially ordered by descending mutation count). It is quickly apparent that copy-number changes differ when IDH1/2, TP53, and ATRX mutations drop off. Brain Lower Grade Glioma (TCGA Network 2015, in press) iCoMut CoMut plots are a staple of many TCGA papers. They provide a comprehensive analysis profile in a single graphic, enabling the reader to infer relationships between co- occurring results.

Upload: oswin-west

Post on 19-Dec-2015

219 views

Category:

Documents


4 download

TRANSCRIPT

Page 1: What’s new in GDAC Firehose? Raw MAFs For many cancer types, mutation samples continued to be sequenced after paper publication. Previously, we only packaged

What’s new in GDAC Firehose?

Raw MAFsFor many cancer types, mutation samples continued to be sequenced after paper publication. Previously, we only packaged and ran analyses on the mutations submitted with the paper in such cases. We are excited to announce that our latest analyses run also include post-publication samples, so far adding over 1500 mutation samples to our data stream.

New Analyses7 new analyses in our latest run, over 300 new reports:• Correlate_Clinical_vs_Mutation_APOBEC_Categorical• Correlate_Clinical_vs_Mutation_APOBEC_Continuous• Correlate_mRNAseq_vs_Mutation_APOBEC• miRseq_FindDirectTargets• Mutation_CoOccurrence• Pathway_GSEA_mRNAseq• Pathway_Overlaps_MSigDB_MutSig2CV

With iCoMut researchers can now play with coMut plots interactively, sorting and reordering the samples and results as they see fit; this provides a powerful synoptic tool for interpretation and exploration.

The Broad Institute GDAC FirehoseBorn of the desire to systematize analyses from The Cancer Genome Atlas pilot and scale their execution to the dozens of remaining diseases to be studied, Firehose now sits atop ~54 terabytes of TCGA data and reliably executes thousands of pipelines per month.

• Version-stamped, standardized datasets• Version-stamped packages of standard scientific analysis results• Version-stamped custom runs for TCGA analysis working groups

Making the results of such available through biologist-friendly and literature-citable online reports, and en masse through firehose_get

Michael S. Noble1, Katherine Huang1, Kane Hadley1, Noam Shoresh1, Eila Arich-Landkof1,2, Benjamin Alexander1, David I. Heiman1, and Gad Getz1,2

1Broad Institute of MIT and Harvard, Cambridge, MA ◼︎ 2Massachusetts General Hospital, Boston, MA

New Features in FireBrowse• Continuing to grow out the API and interactive documentation

Quartiles for mRNASeq expressionPatients for participant barcodes, optionally filtered by cohort

• Open source client-side wrappers for the RESTful API:fbget - https://confluence.broadinstitute.org/display/GDAC/fbget

• High- and low-level Python bindings• The fbget CLI, for easy access from UNIX command line• Automatically generated to keep synchronized with the

RESTful APIFireBrowseR - https://github.com/mariodeng/FirebrowseR

• R bindings developed by a Ph.D candidate in Germany• Visualization tools:

iCoMut - interactive visualization of mutation co-occurrenceviewGene - a mRNASeq gene expression viewer

RESTful API

Powering the FireBrowse GUI is a RESTful application programming interface (API) with 27 functions and growing, fully open to the public.

Gives bulk or fine-grained access to:• Sample-level data• Firehose analyses• Standard data archives• Metadata

Simplified Portal AccessIn the summer of 2014 we introduced firebrowse.org, which makes it easy to find any of the thousands of TCGA datasets or Firehose analysis result reports in just 2 clicks.

viewGene

Utilizing the RESTful API, the viewGene toolgenerates a boxplot of mRNASeq expression levels for a selected

gene across all cohorts.

Spring 2015Analysis

Workflow

: Mining the Firehose of TCGA

iCoMutBeta for FireBrowse

0

5

10

15Mutation Rate

non synonymoussynonymous

Mutation Rate

ageClinical Agevital statusClinical Vital Status

genderClinical GenderhistologyClinical Histology

raceClinical Ethnicity

0100200

#Mutations

No MutationSynIn-frame INDELOther Non SynMissenseSplice SiteFrameshiftNonsenseNA

Show Grids Hide GridsIDH1

IDH2

TP53

ATRX

CIC

NOTCH1

FUBP1

PIK3CA

NF1

EGFR

SMARCA4

PIK3R1

PTEN

BCOR

ARID1A

ZBTB20

TCF12

PTPN11

PDGFRA*

ZCCHC12

-log10(q)

Gene Mutation

0102030

#SCNAs

No ChangeDeletionLossGainAmplificationNA

Show Grids Hide GridsAmp_10p15.3Amp_7p11.2Amp_13q34

Amp_12q14.1Amp_4q12

Amp_7q32.3Amp_8q24.13

Amp_12p13.32Amp_7q31.2

Amp_Xp11.22Amp_1q32.1

Amp_19p13.3Amp_11q24.1Amp_3q26.33Amp_17q25.1

-log10(q)

Focal Level CN Gain

050100

#SCNAs

No ChangeDeletionLossGainAmplificationNA

Show Grids Hide GridsDel_19q13.42Del_19q13.41Del_1p36.32Del_1p32.3Del_9p21.3

Del_9p23Del_10q26.3Del_14q24.3Del_11p15.5Del_13q14.2Del_6q24.3Del_4q34.3

Del_6q22.31Del_11p15.1

Del_22q13.31Del_18q23

Del_2q37.3Del_6p25.3Del_12q12

Del_5p15.33Del_3p21.31

Del_3q29Del_5q34

Del_Xq21.1

-log10(q)

Focal Level CN Loss

mRNA cNMFCLUS_mRNA_cNMFmRNA cHierarchicalCLUS_mRNA_cHierarchical

CN cNMFCN cNMFMethlyation cNMFMethylation cNMF

RPPA cNMFRPPA cNMF ClustersRPPA cHierarchicalRPPA cHierarchical

mRNAseq cNMFmRNAseq cNMFmRNAseq cHierarchicalmRNAseq cHierarchical

miRseq cNMFmiRseq cNMFmiRseq cHierarchicalmiRseq cHierarchicalmiRseq Mature cNMFmiRseq Mature cNMF

miRseq Mature cHierarchicalmiRseq Mature cHierarchical

Astrocytoma Oligoastrocytoma Oligodendroglioma

iCoMutBeta for FireBrowse

0

5

10

15Mutation Rate

non synonymoussynonymous

Mutation Rate

ageClinical Agevital statusClinical Vital Status

genderClinical GenderhistologyClinical Histology

raceClinical Ethnicity

0100200

#Mutations

No MutationSynIn-frame INDELOther Non SynMissenseSplice SiteFrameshiftNonsenseNA

Show Grids Hide GridsIDH1

IDH2TP53

ATRXCIC

NOTCH1

FUBP1PIK3CA

NF1

EGFRSMARCA4

PIK3R1PTEN

BCOR

ARID1AZBTB20

TCF12

PTPN11

PDGFRA*ZCCHC12

-log10(q)

Gene Mutation

0102030

#SCNAs

No ChangeDeletionLossGainAmplificationNA

Show Grids Hide GridsAmp_10p15.3Amp_7p11.2Amp_13q34

Amp_12q14.1Amp_4q12

Amp_7q32.3Amp_8q24.13

Amp_12p13.32Amp_7q31.2

Amp_Xp11.22Amp_1q32.1

Amp_19p13.3Amp_11q24.1Amp_3q26.33Amp_17q25.1

-log10(q)

Focal Level CN Gain

050100

#SCNAs

No ChangeDeletionLossGainAmplificationNA

Show Grids Hide GridsDel_19q13.42Del_19q13.41Del_1p36.32Del_1p32.3Del_9p21.3

Del_9p23Del_10q26.3Del_14q24.3Del_11p15.5Del_13q14.2Del_6q24.3Del_4q34.3

Del_6q22.31Del_11p15.1

Del_22q13.31Del_18q23

Del_2q37.3Del_6p25.3Del_12q12

Del_5p15.33Del_3p21.31

Del_3q29Del_5q34

Del_Xq21.1

-log10(q)

Focal Level CN Loss

mRNA cNMFCLUS_mRNA_cNMFmRNA cHierarchicalCLUS_mRNA_cHierarchical

CN cNMFCN cNMF

Methylation cNMF

RPPA cNMF Clusters

RPPA cHierarchical

mRNAseq cNMFmRNAseq cNMFmRNAseq cHierarchicalmRNAseq cHierarchical

miRseq cNMFmiRseq cNMFmiRseq cHierarchicalmiRseq cHierarchicalmiRseq Mature cNMFmiRseq Mature cNMF

miRseq Mature cHierarchicalmiRseq Mature cHierarchical

In the right panel we re-sort by copy-number clustering, making it further apparent that not

only is the copy-number landscape different with the lack of aforementioned mutations, there also seems to be involvement with EGFR and PTEN.

The left panel is very close to what iCoMut displays out-of-the-box: sorted first by the histology clinical parameter, then by gene (initially ordered by descending mutation

count). It is quickly apparent that copy-number changes differ when IDH1/2, TP53, and ATRX

mutations drop off.

Brain Lower Grade Glioma (TCGA Network 2015, in press)

iCoMut

CoMut plots are a staple of many TCGA papers. They provide a comprehensive analysis profile in a single

graphic, enabling the reader to infer relationships between

co-occurring results.