Whole Transcriptome Profiling of Cancer Tumors in Mouse PDX Models

Upload: tom-koch

Post on 09-Jan-2017

Category: Science



TRANSCRIPT

Page 1: Whole Transcriptome Profiling of Cancer Tumors in Mouse PDX Models

Whole Transcriptome Profiling of Cancer Tumors in Mouse PDX Models

http://www.impactjournals.com/oncotarget/index.php?journal=oncotarget&page=article&op=view&path%5B%5D=8014

Based on breast cancer samples taken from the publication “Whole transcriptome profiling of patient-derived xenograft models as a tool to identify both tumor and stromal specific biomarkers” (James R. Bradford et al.; DOI: 10.18632/oncotarget.8014)

Page 2: Whole Transcriptome Profiling of Cancer Tumors in Mouse PDX Models

Introduction

From a complex dataset that included a number of cancer types in several different mouse strains (Whole Transcriptome Profiling of PDX Models), a focused dataset can be extracted to look at transcriptional differences between cancer subtypes and expression-based interaction between tumor and stroma. Specifically, this dataset contains 21 samples from 3 subtypes of breast cancer in 4 different mouse models.

Human tumor cells from patients with cancers of varying severity were implanted in 4 different mouse models. At a later point, RNA from the human tumor cells and the mouse stromal cells was extracted and analyzed using unsupervised and supervised analysis methods on the T-BioInfo platform. The goal was to identify differences in expression and to select representative genes that could be considered biomarker candidates. Of special interest was the transcriptional response of the stroma to tumor type, because stromal cells play a major role in determining tumor malignancy.

Page 3: Whole Transcriptome Profiling of Cancer Tumors in Mouse PDX Models

PDX Mouse Strains

XID: This mouse strain is characterized by the absence of the thymus, mutant B lymphocytes and no T cell function.

NOD SCID: Combined immunodeficiency, with no mature T cells or B cells.

Athymic Nude: This mouse strain lacks the thymus and is unable to produce T cells.

CB17 SCID: A severe combined immunodeficiency affecting both B and T lymphocytes. These mice have normal NK cells, macrophages, and granulocytes.

Breast TN: Triple Negative Breast Cancer. This cancer is negative (based on gene expression) for the common biomarker genes ER, PR, and HER2 (genes that encode hormone and growth factor receptors) and does not respond to typical hormonal therapy. Survival rates are lower for this cancer than for ER+ cancer types.

Breast ER+: Estrogen Receptor Positive. This is the most commonly diagnosed breast cancer. Treatment often includes hormone therapy, and the short-term outlook is more positive.

Breast HER2+: Human Epidermal growth factor Receptor 2 Positive. This tends to be a more aggressive cancer type than ER+.

After samples were extracted, RNA libraries were prepared with the Illumina TruSeq RNA Sample Preparation kit (un-stranded) according to the manufacturer’s protocol. These libraries were then submitted for 100 bp paired-end sequencing on the Illumina HiSeq 2000 platform using one lane per three to six PDX models.

Sample Summary

Page 4: Whole Transcriptome Profiling of Cancer Tumors in Mouse PDX Models

The RNA-seq pipeline estimates expression levels for all annotated and non-annotated genomic elements:

1. Mapping with TopHat
2. Finding isoforms using Cufflinks
3. GTF file of isoforms using Cuffmerge
4. Mapping with Bowtie-2t on the new transcriptome

RSEM output tables of genes, isoforms and exons are then prepared for machine learning analysis:

• Removing genomic elements that did not have any expression (all zeros) in the RSEM table
• Quantile normalization
• Principal Component Analysis
• Factor Regression Analysis

Analysis Pipeline Overview
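The two table-preparation steps named on this slide (dropping all-zero rows and quantile normalization) can be sketched as follows. This is a minimal illustration in plain Python, not the T-BioInfo implementation: the function names, the toy gene names and the toy values are all assumptions made for the example, and ties are broken by row order rather than averaged.

```python
def drop_all_zero_rows(table):
    """Keep only genomic elements expressed in at least one sample."""
    return {gene: vals for gene, vals in table.items()
            if any(v != 0 for v in vals)}


def quantile_normalize(table):
    """Give every sample (column) the same distribution of values.

    Each value is replaced by the mean, across samples, of the values
    that share its rank. Ties are broken by row order (a simplification).
    """
    genes = list(table)
    n_samples = len(next(iter(table.values())))
    columns = [[table[g][j] for g in genes] for j in range(n_samples)]
    sorted_cols = [sorted(col) for col in columns]
    # Mean of the k-th smallest value across all samples.
    rank_means = [sum(col[k] for col in sorted_cols) / n_samples
                  for k in range(len(genes))]
    normalized = {g: [0.0] * n_samples for g in genes}
    for j, col in enumerate(columns):
        order = sorted(range(len(genes)), key=lambda i: col[i])
        for rank, i in enumerate(order):
            normalized[genes[i]][j] = rank_means[rank]
    return normalized


# Toy table (hypothetical values): rows are genes, columns are samples.
toy = {
    "geneA": [5.0, 4.0, 3.0],
    "geneB": [2.0, 1.0, 4.0],
    "geneC": [0.0, 0.0, 0.0],  # never expressed, so it is removed
    "geneD": [3.0, 4.0, 6.0],
    "geneE": [4.0, 2.0, 8.0],
}
filtered = drop_all_zero_rows(toy)
normalized = quantile_normalize(filtered)
```

After normalization every sample column contains exactly the same set of values, only assigned to different genes, which is what makes samples comparable before PCA.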

Page 5: Whole Transcriptome Profiling of Cancer Tumors in Mouse PDX Models

Principal Component Analysis (PCA)

Principal Component Analysis is a data reduction technique that represents the structure of the dataset on a 2-dimensional plane defined by principal component coordinates. The components that explain the largest percentage of variability are chosen as principal. After genomic elements such as genes and isoforms were mapped and exported as a table, they were prepared for machine learning analysis. Elements with zero expression across all samples were removed. All values were normalized using quantile normalization. PCA was performed using log values.
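To make the PCA description concrete, here is a small sketch that projects samples onto their first two principal components and reports the fraction of variance each one explains (the kind of "PC1: …%, PC2: …%" figure quoted on the later slides). It is an illustration only: it uses power iteration on the sample Gram matrix so that it runs without external libraries, the function name `pca_top2` and the toy log-expression values are assumptions, and a production analysis would normally rely on a numerical library instead.

```python
import math


def pca_top2(samples):
    """Return (scores, explained) for the first two principal components.

    samples: one list of log-expression values per sample.
    """
    n, m = len(samples), len(samples[0])
    means = [sum(s[k] for s in samples) / n for k in range(m)]
    centered = [[s[k] - means[k] for k in range(m)] for s in samples]
    # Gram matrix of the centered samples; its eigenvalues are proportional
    # to the variance carried by each principal component.
    gram = [[sum(x * y for x, y in zip(ci, cj)) for cj in centered]
            for ci in centered]
    total = sum(gram[i][i] for i in range(n))
    scores, explained = [], []
    for _ in range(2):
        vec = [1.0 + 0.1 * i for i in range(n)]  # arbitrary start vector
        for _ in range(500):  # power iteration
            nxt = [sum(gram[i][j] * vec[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(v * v for v in nxt))
            vec = [v / norm for v in nxt]
        eig = sum(vec[i] * sum(gram[i][j] * vec[j] for j in range(n))
                  for i in range(n))
        scores.append([v * math.sqrt(eig) for v in vec])
        explained.append(eig / total)
        # Deflate so the next pass finds the following component.
        gram = [[gram[i][j] - eig * vec[i] * vec[j] for j in range(n)]
                for i in range(n)]
    return scores, explained


# Toy log-expression values (hypothetical): two samples per group, three genes.
log_samples = [
    [1.0, 2.0, 0.5],
    [1.2, 2.1, 0.4],
    [5.0, 1.0, 0.6],
    [5.1, 0.9, 0.7],
]
scores, explained = pca_top2(log_samples)
pc1 = scores[0]
```

In this toy case the two groups fall on opposite sides of PC1, which is the pattern one looks for when checking whether subtypes separate.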

[Figure: two PCA plots of Mouse (Stroma) data. Left panel, "Before Correction": PCA of Human and Mouse Genes (RSEM-FDR 0.05). Right panel, "After Correction": PCA Mouse and Human RSEM FDR: 0.05 After Batch Correction.]

Batch Effect Correction

Batch effect: unwanted technical variation that arises when data come from complex experiments involving, for instance, cell sorting, low-input RNA or different batches (e.g., multiple sequencing centers or different read lengths). Such typically unknown nuisance technical effects are referred to as unwanted variation. Removing batch effects is a crucial normalization step in the analysis of RNA-seq data because it removes confounding variability.

http://www.nature.com/nbt/journal/v32/n9/fig_tab/nbt.2931_F2.html

FILE: RSEM0.05_beforeBATCH.xls; 0.05_batch_sorted_PCA_dots.xlsx
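The slides do not say which batch correction method was applied, so the following is only a sketch of the simplest possible approach: shifting each batch so that its mean matches the overall mean, one gene at a time. The function name, batch labels and values are hypothetical.

```python
def center_batches(values, batches):
    """Remove per-batch mean shifts from one gene's log-expression values.

    Each batch is shifted so that its mean equals the overall mean.
    """
    overall = sum(values) / len(values)
    sums, counts = {}, {}
    for v, b in zip(values, batches):
        sums[b] = sums.get(b, 0.0) + v
        counts[b] = counts.get(b, 0) + 1
    batch_mean = {b: sums[b] / counts[b] for b in sums}
    return [v - batch_mean[b] + overall for v, b in zip(values, batches)]


# One gene measured in six samples from two hypothetical sequencing runs.
log_expr = [5.0, 6.0, 7.0, 9.0, 10.0, 11.0]
batches = ["run1", "run1", "run1", "run2", "run2", "run2"]
corrected = center_batches(log_expr, batches)
```

Mean-centering of this kind is only safe when batches are not confounded with the biology of interest. If one batch contained only triple negative samples and another only ER+ samples, it would also remove the subtype signal.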

Data Pre-processing

Page 6: Whole Transcriptome Profiling of Cancer Tumors in Mouse PDX Models

GENES FILE: expression_genes_breast_conct_normalized_PCA.xlsx

ISOFORMS FILE: expression_isoform_nozero_breast_normalized_PCA_.xlsx

GENES: PC1: 17.86%, PC2: 17.65%

ISOFORMS: PC1: 19.65%, PC2: 9.77%

Initial PCA of gene and isoform expression profiles using the concatenated mouse and human genome did not produce any meaningful results.

[Figure: plot with axes labeled SAMPLES and COMPONENTS]

Further investigation identified a batch effect across samples.

[Figure: two PCA scatter plots in which each sample is labeled "Estrogen Receptor" or "Triple Negative"]

Page 7: Whole Transcriptome Profiling of Cancer Tumors in Mouse PDX Models

PCA of Human Genes and Isoforms (Tumor)

A total of 40,266 human genes (ENSG identifiers) were identified from the concatenated RSEM results after removing all-zero rows. This table was used to create the following PCA, which shows only the human gene expression from the RSEM table. There is good separation between the triple negative and ER+ subtypes, with sample ERR1084766 as an outlier in both genes and isoforms.

[Figure: PCA panels "Genes" and "Isoforms"; legend: Triple_NEG, ER+, HER2+; labeled samples: ERR1084802, ERR1084810, ERR1084766; cluster labels: ER+, Triple_NEG]

PC1: 22.16%, PC2: 9.22%

PC1: 12.38%, PC2: 10.09%

[Figure: PCA scatter plot; legend: AthymicNude__Triple_NEG, NOD_SCID_Triple_NEG, Athymicnude__ER+, CB17_SCID_Triple_NEG, NOD_SCID__ER+; labeled samples: ERR1084763, ERR1084766, ERR1084799; cluster labels: Triple NEG, ER+]

A total of 22,656 mouse genes (ENSMUSG identifiers) were identified from the concatenated RSEM results after removing all-zero rows. This table was used to create the following PCA, which represents stromal expression in the mouse in the PDX model and is labeled to show the different mouse strains used in the study. Athymic nude mice lack the thymus and are unable to produce T cells. Both SCID strains have a combined immunodeficiency. The XID strain is characterized by the absence of the thymus, mutant B lymphocytes and no T cell function.

PCA of Mouse Genes (Stroma)
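Because reads were quantified against a concatenated human/mouse reference, tumor and stroma rows can be separated by their Ensembl ID prefix (ENSG for human, ENSMUSG for mouse). A minimal sketch follows; the function name is an assumption and the expression values are made up, while the gene IDs are ones that appear elsewhere in this deck.

```python
def split_by_species(table):
    """Split a concatenated expression table into human and mouse rows."""
    human, mouse = {}, {}
    for gene_id, values in table.items():
        if gene_id.startswith("ENSMUSG"):
            mouse[gene_id] = values   # stroma (mouse) genes
        elif gene_id.startswith("ENSG"):
            human[gene_id] = values   # tumor (human) genes
    return human, mouse


# Hypothetical expression values for four IDs mentioned in the slides.
concatenated = {
    "ENSG00000104332": [8.1, 7.9, 2.0],
    "ENSG00000143556": [0.5, 0.7, 3.2],
    "ENSMUSG00000022157": [1.1, 4.0, 4.2],
    "ENSMUSG00000060183": [2.3, 2.2, 6.5],
}
human, mouse = split_by_species(concatenated)
```

Each resulting table can then be filtered, normalized and passed to PCA on its own, as was done for the human (tumor) and mouse (stroma) plots.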

Page 8: Whole Transcriptome Profiling of Cancer Tumors in Mouse PDX Models

Factor Regression Analysis

In order to select genes and isoforms that are affected by tumor type (Factor A) and/or mouse types (Factor B), we can apply Factor Regression Analysis to gene and isoform expression tables. One can select mouse genes under the influence of tumor type and human genes/isoforms that are under the influence of mouse type. Thus, we can select a number of genomic elements that are involved in the interplay of tumor and stroma.

Factor Table (2 factors, 2 levels each)

Factor A: Triple Negative vs. ER+

Factor B: Athymic Mouse vs. SCID Mouse

A0B0: Triple Neg / Athymic Nude

A0B1: Triple Neg / SCID

A1B0: ER+ / Athymic Nude

A1B1: ER+ / SCID

Factor Regression Analysis output table columns: Gene ID, Expression Levels, Factor Influence, FTEST

Factor Analysis Output: out_expression_genes_breast_cont_afterbatch_prefactor_changed.xlsx
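The platform's Factor Regression Analysis is not described in detail here, so the sketch below substitutes a standard technique: an ordinary two-factor linear model with an F-test for each factor, computed by comparing the full model y ~ 1 + A + B against the model with that factor dropped. Factor levels are coded 0/1 as in the factor table above (A0/A1, B0/B1). The function names and the expression values are hypothetical.

```python
def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    p = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


def factor_f_stat(y, a, b, test="A"):
    """F statistic (1 numerator df) for dropping one factor from y ~ 1 + A + B."""
    full = [[1.0, ai, bi] for ai, bi in zip(a, b)]
    if test == "A":
        reduced = [[1.0, bi] for bi in b]
    else:
        reduced = [[1.0, ai] for ai in a]
    rss_full, rss_reduced = rss(full, y), rss(reduced, y)
    df_resid = len(y) - 3  # intercept + two factor coefficients
    return (rss_reduced - rss_full) / (rss_full / df_resid)


# Hypothetical log expression of one gene in eight samples.
factor_a = [0, 0, 0, 0, 1, 1, 1, 1]   # 0 = Triple Neg, 1 = ER+
factor_b = [0, 0, 1, 1, 0, 0, 1, 1]   # 0 = Athymic Nude, 1 = SCID
expr = [2.0, 2.2, 2.1, 1.9, 8.0, 8.3, 7.9, 8.1]
f_a = factor_f_stat(expr, factor_a, factor_b, test="A")
f_b = factor_f_stat(expr, factor_a, factor_b, test="B")
```

A gene like this toy one, with a large F statistic for Factor A and a small one for Factor B, would be reported as influenced by tumor type but not by mouse strain. Applying the same test to mouse genes against Factor A is one way to pick out stromal genes that respond to tumor type.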

Page 9: Whole Transcriptome Profiling of Cancer Tumors in Mouse PDX Models

Factor A (Triple Negative vs. ER+)

[Figure: bar chart of expression in ER+ vs. Triple-Negative samples]

SFRP1 has been associated with the TNBC subtype, in which it is highly upregulated compared with ER+ (hormone receptor positive) samples. In our results, SFRP1 is upregulated in the triple negative samples compared with the ER+ samples. (Bernemann C et al., "Influence of secreted frizzled receptor protein 1 (SFRP1) on neoadjuvant chemotherapy in triple negative breast cancer does not rely on WNT signaling.")

Factor A: SFRP1_ENSG00000104332

[Figure: bar chart of expression for ENSG00000160182]

TFF1 and TFF3 play a role in tumors under the effect of estrogen. The ER+ samples therefore show increased expression compared with the TNBC subtype, which is supposed to be negative for the estrogen receptor.

http://www.neoplasia.com/article/S1476-5586(10)80022-3/pdf

http://www.ncbi.nlm.nih.gov/pubmed/11919164

Factor A: TFF1 and TFF3: Estrogen-Regulated Proteins

Factor B: ER- Athymic Nude vs. NOD SCID Mice

[Figure: bar chart of expression for the genes listed below]

ENSG00000143556: S100 calcium binding protein A7

ENSG00000120075: homeobox B5

ENSG00000205076: galectin 7

ENSG00000170608: forkhead box A3

ENSG00000259610

ENSG00000251533: long intergenic non-protein coding RNA 605

ENSMUSG00000022157: mast cell protease 8

ENSMUSG00000060183: chemokine (C-X-C motif) ligand 11

Chemokine (C-X-C motif) ligand 11: encodes a secreted protein that is a member of the CXC subfamily of chemokines. Chemokines recruit and activate leukocytes and are classified by function (inflammatory or homeostatic) or by structure.

Page 10: Whole Transcriptome Profiling of Cancer Tumors in Mouse PDX Models

Cancer-Type Specific Examples:

ENST00000434301 is a lncRNA located near the protein-coding gene AFF3. The protein produced by AFF3 has been associated with the ENT/WNT pathway, which is vital for cell migration and invasion.

LncRNA: ENST00000434301

Relationship: Natural antisense

Coding Gene Transcript: ENST00000317233

Coding Gene Symbol: AFF3

[Figure: two bar charts of per-sample expression, Triple-Negative vs. ER+: Lnc-RNA ENST00000434301 and AFF3 (ENSG00000144218)]

[Figure: bar chart of CXorf61 (ENST00000371894) expression per sample, Triple-Negative vs. ER+]

Triple_NEG samples: ERR1084802, ERR1084801, ERR1084800, ERR1084799, ERR1084798, ERR1084810, ERR1084809, ERR1084808, ERR1084804, ERR1084807, ERR1084768, ERR1084766

ER+ samples: ERR1084775, ERR1084765, ERR1084811, ERR1084764, ERR1084763, ERR1084806, ERR1084805

CXorf61 is specific to Triple Negative Breast Cancer

When overlapping the isoforms expressed by mouse and human, CXorf61 was identified as an outlier. CXorf61 fulfils the requirements of an ideal target for cancer immunotherapy, as it is cancer-cell selective and expressed at a high frequency in TNBC tumors.

Page 11: Whole Transcriptome Profiling of Cancer Tumors in Mouse PDX Models

[Figure: bar chart of ENSG00000091831 (Human Estrogen Receptor) expression per sample; samples are labeled Triple_NEG, ER+ or HER2 (ERR1084767)]

ENSG00000140009 (Human Estrogen Receptor 2): demonstrates that TNBC does not have ERα protein expression, but does have ERβ protein expression.

Second Look at Gene Expression

Whether a breast sample is TNBC is often determined from protein expression values or histology. It is not uncommon for TNBC samples to show some gene expression for the human estrogen receptor. Even so, when compared with the expression in the ER+ samples, a few could be considered ambiguous. These samples are labeled in red.

Page 12: Whole Transcriptome Profiling of Cancer Tumors in Mouse PDX Models

Summary:

• Mapping with a concatenated (human/mouse) genome: approximately 60% of the expression results mapped to the human genome and approximately 40% to the mouse genome.

• Human (tumor) RSEM breast expression does separate by hormone subtype: ER+ and triple negative samples show clear separation. This trend is in agreement with the publication.

• Mouse (stroma) RSEM breast expression does not show clear separation by either tumor type or stroma type, but this needs to be investigated further.

• Factor Regression Analysis is useful for identifying relationships between stroma and tumor gene and isoform expression.

• When analyzing hormone receptor expression, a few of the triple negative samples had higher than expected ESR1 expression. Overall, the trend still showed significantly higher hormone receptor expression in ER+ samples than in triple negative samples.

• This presentation gives a brief overview of the trends found in this dataset and shows a few examples of how the T-BioInfo platform can be used with complex datasets to find meaningful results.

Author Results:

• Focused on comparing their PDX models with other clinical datasets to validate the PDX models.

• Identified expression differences between the breast subtypes (triple negative and ER+).

• Comparison of their samples with other clinical samples demonstrated that key markers of breast cancer can be observed with these PDX models and highlighted the difference in expression when stromal recruitment is accounted for.

• The authors struggled to identify any significant association between mouse gender and tumor stage, and found only a small association with mouse strain, with clusters specific to athymic nude mice.

Files: https://www.dropbox.com/sh/mtat4tjdj3f2rcy/AACGks1itR_spMydQ5Dpq5k2a?dl=0