Identification of fusion transcripts with retroviral elements and its application as a cancer biomarker

Yun-Ji Kim 1, Jae-Won Huh 2, Dae-Soo Kim 3, Hong-Seok Ha 1, Kung Ahn 1, Ja-Rang Lee 1, Yi-Deun Jung 1, and Heui-Soo Kim 1

1 Division of Biological Sciences, College of Natural Sciences, Pusan National University, Busan 609-735, Republic of Korea
2 National Primate Research Center (NPRC), KRIBB, Ochang, Chungbuk 363-883, Republic of Korea
3 Korea Bioinformation Center, KRIBB, Daejeon 305-806, Korea

http://www.primate.or.kr



Abstract

Transposable elements (TEs) are estimated to make up about 45% of the human genome. They have been reported to affect adjacent genes by altering transcriptional regulation. Most TEs are transcriptionally silent in normal tissues; however, some TEs have been found to be expressed specifically in cancer cell lines. Here we investigated cancer-specific fusion transcripts containing TEs using bioinformatic and experimental approaches. To identify candidate cancer markers, we adopted an analysis pipeline that screens expressed human sequences for cancer-specific expression, and developed a database. In total, 999 genes fused with transposable elements were found to be cancer-specific in our analysis of the EST database. To confirm the candidate marker transcripts, experimental validation was conducted by RT-PCR in tumor/adjacent normal tissues and corresponding cancer cell lines. Our results could contribute to understanding human cancers in relation to transposable elements.

References

1. Kim TH, Jeon YJ, Kim WY, Kim HS: HESAS: HERVs expression and structure analysis system. Bioinformatics 2005, 15:1699-1970.

2. Kim DS, Kim TH, Huh JW, Kim IC, Kim SW, Park HS, Kim HS: LINE FUSION GENES: a database of LINE expression in human genes. BMC Genomics 2006, 7:139.

Figure. Hypothetical model for retroelements in the human genome. Panels: transcription change (a retroelement in the promoter region supplying the promoter or enhancer); exonization in UTR and CDS regions; alternative promoter; alternative polyadenylation (last exon).

Figure. Classification of retroelements (RNA intermediate). Groups: retroposon, retrotransposon, retrovirus, SINE, LINE. Branch labels: with (+) or without (-) RT, LTR element, and env. Example structures: yeast Ty1/copia/truncated HERVs (LTR, ORF1, ORF2, LTR); human THE1 (LTR, LTR); human Alu (P, poly(A)); L1 (P, ORF1, ORF2, poly(A)); full-length HERVs/exogenous retrovirus (LTR, gag, pol, env, LTR).

Figure. Location of transposable element fusion ESTs (y-axis: percent of exons). 5′UTR: 11%; CDS: 82%; 3′UTR: 6%.

Figure. Transposable element fusion EST counts per gene (x-axis: number of fusion ESTs; y-axis: percent of genes). 1: 79.8%; 2: 13.6%; 3: 3.8%; 4: 1.6%; 5: 0.7%; 6: 0.2%; 11: 0.1%; 17: 0.1%.

Aims

Most TEs are transcriptionally silent in normal human tissues; however, some TEs have been found to be expressed in placental tissues and cancer cell lines. L1 antisense promoter-driven transcription has been detected in both human tumor cells and normal cells, while HERV LTR elements have shown bidirectional promoter activity (Medstrand et al., 2001; Nigumann et al., 2002; Dunn et al., 2003; Sin et al., 2006). These elements could contribute to organismal complexity by increasing transcriptional diversity (Landry et al., 2003). Here, we developed a database for understanding the mechanism of cancer development in relation to TEs in human EST sequences, and conducted experimental validation using RT-PCR in tumor/adjacent normal tissues and corresponding cancer cell lines to confirm the candidate marker transcripts.
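The screening idea described here can be illustrated with a small sketch. It is not the pipeline used for the database: the record layout (`FusionEST`), the rule "every TE-fusion EST of a gene comes from a cancer library", and the toy accessions and gene names are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FusionEST:
    """Hypothetical record for one EST whose exon overlaps a TE."""
    accession: str     # EST accession (toy values below)
    gene: str          # gene the EST maps to (toy values below)
    te_family: str     # e.g. "SINE", "LINE", "LTR", "DNA"
    from_cancer: bool  # True if the source library is annotated as cancer


def cancer_specific_genes(ests):
    """Return genes whose TE-fusion ESTs all come from cancer libraries.

    Assumed rule: a gene qualifies only if no TE-fusion EST from a
    normal-tissue library maps to it.
    """
    sources = defaultdict(set)
    for est in ests:
        sources[est.gene].add(est.from_cancer)
    return sorted(gene for gene, seen in sources.items() if seen == {True})


# Toy input, invented for illustration only.
toy = [
    FusionEST("EST0001", "GENE_A", "LTR", True),
    FusionEST("EST0002", "GENE_A", "LTR", True),
    FusionEST("EST0003", "GENE_B", "SINE", True),
    FusionEST("EST0004", "GENE_B", "SINE", False),
    FusionEST("EST0005", "GENE_C", "LINE", False),
]

print(cancer_specific_genes(toy))
```

Only `GENE_A` passes in the toy data, because `GENE_B` also has a fusion EST from a normal library. A real screen would additionally need library annotation and EST-to-gene mapping, which are outside this sketch.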

RT-PCR & Real-time PCR

Bioinformatics: NCBI, BLAST, MEGA3

Table. Distribution of transposable element families by region of transposable element exonization

Fusion region within genes    SINE family    LINE family    LTR family    DNA family    Others
CDS                           619            280            85            76            1
5′UTR                         76             30             33            5             0
3′UTR                         44             20             14            5             0
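As a quick consistency check, the regional shares reported in the location chart (about 11% 5′UTR, 82% CDS, 6% 3′UTR) can be recomputed from the family counts in this table. The sketch below only does that arithmetic; the variable and function names are illustrative.

```python
# Counts copied from the distribution table (rows: region, columns: TE family).
counts = {
    "CDS":   {"SINE": 619, "LINE": 280, "LTR": 85, "DNA": 76, "Others": 1},
    "5'UTR": {"SINE": 76,  "LINE": 30,  "LTR": 33, "DNA": 5,  "Others": 0},
    "3'UTR": {"SINE": 44,  "LINE": 20,  "LTR": 14, "DNA": 5,  "Others": 0},
}


def region_shares(table):
    """Return {region: (row_total, percent_of_grand_total)}."""
    totals = {region: sum(fam.values()) for region, fam in table.items()}
    grand_total = sum(totals.values())
    return {r: (n, 100.0 * n / grand_total) for r, n in totals.items()}


shares = region_shares(counts)
for region, (n, pct) in shares.items():
    print(f"{region}: {n} fusion exons, {pct:.1f}%")
```

The row totals are 1061 (CDS), 144 (5′UTR), and 83 (3′UTR), which give 82.4%, 11.2%, and 6.4% of the 1288 fusion exons.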

Figure. AKR1C2 (aldo-keto reductase family 1, member C2), Chr. 10p15.1. Sequences shown: NM_2058453.1, NM_001354.4, and CB106780; exon numbering labels 1 to 10 and 1 to 11. Annotated elements: LTR/MaLR MLT1L, LINE/L1, LTR/MaLR MSTA. RT-PCR at 30, 32, and 34 cycles in paired liver tissues labeled N and C; product 300 bp; GAPDH 120 bp.

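Figure. TJP2 (tight junction protein 2, zona occludens 2), Chr. 9q21.11. Sequences shown: NM_004817.2, NM_201629.1, and AW604158; exon numbering labels 1 to 23 and 1 to 21. Annotated element: AluJo/FRAM. Legend: coding region, untranslated region. RT-PCR in paired colon tissues labeled N and C; product 168 bp; GAPDH 120 bp.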


Table. Potential splice sites utilized by transposable element fusion exons

Type of potential splicing site    SINE family    LINE family    LTR family    DNA family
Acceptor and donor                 83             68             50            12
Acceptor site                      271            110            33            28
Donor site                         216            80             43            18

Table. Occurrences of transposable element subfamilies

Family    Subfamily    Occurrences    Percent (%)
SINE      Alu          20             1.44
SINE      AluJ         171            12.35
SINE      AluS         244            17.62
SINE      MIR          250            18.05
SINE      FAM          2              0.14
SINE      FRAM         18             1.30
SINE      FLAM         37             2.67
LINE      HAL          13             0.94
LINE      L1HS         1              0.07
LINE      L1P          18             1.30
LINE      L1M          153            11.05
LINE      L2           151            10.90
LINE      L3           25             1.81
LTR       MaLR         67             4.84
LTR       ERV1         40             2.89
LTR       ERVL         27             1.95
LTR       ERVK         6              0.43
DNA       Charlie      9              0.65
DNA       HSMAR2       2              0.14
DNA       Kanga1       1              0.07
DNA       MARNA        3              0.22
DNA       MER          61             4.40
DNA       Tigger       14             1.01
DNA       Zaphod2      1              0.07
Others    Charlie      1              0.07

Table. Transposable element fusions by gene region

Family    Subfamily    5′UTR    CDS    3′UTR
SINE      Alu          0        20     0
SINE      AluJ         20       131    12
SINE      AluS         13       190    15
SINE      AluY         3        37     5
SINE      MIR          33       198    7
SINE      FAM          0        2      0
SINE      FRAM         0        16     2
SINE      FLAM         7        25     3
LINE      HAL          0        11     0
LINE      L1HS         0        1      0
LINE      L1P          1        12     5
LINE      L1M          6        125    6
LINE      L2           22       111    7
LINE      L3           1        20     2
LTR       MaLR         16       40     6
LTR       ERV1         13       23     3
LTR       ERVL         4        16     5
LTR       ERVK         0        6      0
DNA       Charlie      0        9      0
DNA       HSMAR2       0        2      0
DNA       Kanga1       0        0      1
DNA       MARNA        0        3      0
DNA       MER          5        50     3
DNA       Tigger       0        11     1
DNA       Zaphod2      0        1      0
Others    Charlie      0        1      0
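The per-subfamily counts by gene region can be cross-checked against the family-level distribution table by summing within each family. The sketch below does this; the assignment of each subfamily to a family is an assumption about how the rows are grouped, and the names are illustrative.

```python
# (family, subfamily, 5'UTR, CDS, 3'UTR) rows copied from the subfamily data.
# The family label on each row is an assumed grouping.
rows = [
    ("SINE", "Alu", 0, 20, 0), ("SINE", "AluJ", 20, 131, 12),
    ("SINE", "AluS", 13, 190, 15), ("SINE", "AluY", 3, 37, 5),
    ("SINE", "MIR", 33, 198, 7), ("SINE", "FAM", 0, 2, 0),
    ("SINE", "FRAM", 0, 16, 2), ("SINE", "FLAM", 7, 25, 3),
    ("LINE", "HAL", 0, 11, 0), ("LINE", "L1HS", 0, 1, 0),
    ("LINE", "L1P", 1, 12, 5), ("LINE", "L1M", 6, 125, 6),
    ("LINE", "L2", 22, 111, 7), ("LINE", "L3", 1, 20, 2),
    ("LTR", "MaLR", 16, 40, 6), ("LTR", "ERV1", 13, 23, 3),
    ("LTR", "ERVL", 4, 16, 5), ("LTR", "ERVK", 0, 6, 0),
    ("DNA", "Charlie", 0, 9, 0), ("DNA", "HSMAR2", 0, 2, 0),
    ("DNA", "Kanga1", 0, 0, 1), ("DNA", "MARNA", 0, 3, 0),
    ("DNA", "MER", 5, 50, 3), ("DNA", "Tigger", 0, 11, 1),
    ("DNA", "Zaphod2", 0, 1, 0),
    ("Others", "Charlie", 0, 1, 0),
]


def family_totals(records):
    """Sum (5'UTR, CDS, 3'UTR) counts over the subfamilies of each family."""
    totals = {}
    for family, _subfamily, utr5, cds, utr3 in records:
        acc = totals.setdefault(family, [0, 0, 0])
        acc[0] += utr5
        acc[1] += cds
        acc[2] += utr3
    return {family: tuple(acc) for family, acc in totals.items()}


totals = family_totals(rows)
for family, (utr5, cds, utr3) in totals.items():
    print(f"{family}: 5'UTR={utr5}, CDS={cds}, 3'UTR={utr3}")
```

Under this grouping the sums match the family-level table: SINE 76/619/44, LINE 30/280/20, LTR 33/85/14, DNA 5/76/5, and Others 0/1/0 for 5′UTR/CDS/3′UTR.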

Figure. Database overview. Labels: experimental data (tumor/adjacent normal tissues); computational data; DATABASE.