Post on 01-Jan-2016

Page 1: Alternative splicing:  A playground of evolution

Alternative splicing: A playground of evolution

Mikhail Gelfand

Research and Training Center for Bioinformatics

Institute for Information Transmission Problems

Page 2: Alternative splicing:  A playground of evolution

Alternative splicing of human (and mouse) genes

5% Sharp, 1994 (Nobel lecture)

35% Mironov-Fickett-Gelfand, 1999

38% Brett-…-Bork, 2000 (ESTs/mRNA)

22% Croft et al., 2000 (ISIS database)

55% Kan et al., 2001 (11% AS patterns conserved in mouse ESTs)

42% Modrek et al., 2001 (HASDB)

~33% CELERA, 2001

59% Human Genome Consortium, 2001

28% Clark and Thanaraj, 2002

all? Kan et al., 2002 (17-28% with total minor isoform frequency > 5%)

41% (mouse) FANTOM & RIKEN, 2002

60% (mouse) Zavolan et al., 2003

Page 3: Alternative splicing:  A playground of evolution

• Evolution of alternative exon-intron structure
– human-mouse
– Drosophila and Anopheles
• Evolution of alternative splicing sites: MAGE-A family of CT antigens
• Evolutionary rate in constitutive and alternative regions
– human-mouse
– human SNPs
• Alternative splicing and protein structure

Page 4: Alternative splicing:  A playground of evolution

Data and Methods (routine)

• known alternative splicing
– HASDB (human, ESTs+mRNAs)
– ASMamDB (mouse, mRNAs+genes)
• additional variants
– UniGene (human and mouse EST clusters)
• complete genes and genomic DNA
– GenBank (full-length mouse genes)
– human genome
• TBLASTN (initial identification of orthologs: mRNAs against genomic DNA)
• BLASTN (human mRNAs against genome)
• Pro-EST (spliced alignment, ESTs and mRNA against genomic DNA)

Page 5: Alternative splicing:  A playground of evolution

• Pro-Frame (spliced alignment of proteins against genomic DNA)
– confirmation of orthology:
  • same exon-intron structure for at least one isoform
  • >70% identity over the entire protein length
– analysis of conservation of human alternative splicing in the mouse genome: align human protein to mouse genomic DNA; the isoform is conserved if
  • all exons or parts of exons are conserved
  • all sites are conserved
– same procedure for mouse proteins and human DNA

We do not require that the isoform is actually observed as mRNA or ESTs
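The two decision rules above (orthology confirmation and isoform conservation) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `AlignedExon` record, its field names and both function names are hypothetical and do not come from Pro-Frame, and the spliced alignment itself is assumed to have been computed elsewhere.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AlignedExon:
    """Hypothetical summary of one query exon aligned to target genomic DNA."""
    aligned: bool      # the exon (or the relevant part of it) is found in the target
    donor_ok: bool     # donor site conserved (set True for a terminal exon without one)
    acceptor_ok: bool  # acceptor site conserved (set True for an initial exon without one)


def is_orthologous(isoforms_with_same_structure: int, identity: float,
                   min_identity: float = 0.70) -> bool:
    """Orthology check from the slide: at least one isoform with the same
    exon-intron structure and >70% identity over the entire protein length."""
    return isoforms_with_same_structure >= 1 and identity > min_identity


def isoform_conserved(exons: List[AlignedExon]) -> bool:
    """An isoform counts as conserved if all exons (or exon parts) and all
    splicing sites are conserved; no mRNA/EST evidence is required."""
    return bool(exons) and all(
        e.aligned and e.donor_ok and e.acceptor_ok for e in exons
    )


example = [AlignedExon(True, True, True), AlignedExon(True, False, True)]
print(is_orthologous(1, 0.82), isoform_conserved(example))
```

In this sketch a single lost site is enough to mark the whole isoform as non-conserved, which mirrors the "all sites are conserved" requirement.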

Page 6: Alternative splicing:  A playground of evolution

166 gene pairs with known alternative splicing:

• human: 126 genes
• mouse: 124 genes
• Venn diagram: 42 (human only), 84 (both), 40 (mouse only)

Page 7: Alternative splicing:  A playground of evolution

Elementary alternatives

Cassette exon

Alternative donor site

Alternative acceptor site

Retained intron

Page 8: Alternative splicing:  A playground of evolution

Human genes

                    mRNA                 EST
                    cons.   non-cons.    cons.   non-cons.
Cassette exons      56      25           74      26
Alt. donors         18      7            16      10
Alt. acceptors      13      5            19      15
Retained introns    4       3            5       0
Total               96      30           114     51
Total genes         45      28           41      44

Conserved elementary alternatives: 69% (EST) - 76% (mRNA)

Genes with all isoforms conserved: 57 (45%)

Page 9: Alternative splicing:  A playground of evolution

Mouse genes

                    mRNA                 EST
                    cons.   non-cons.    cons.   non-cons.
Cassette exons      70      5            39      9
Alt. donors         24      6            17      6
Alt. acceptors      15      6            16      9
Retained introns    8       7            10      4
Total               117     24           82      28
Total genes         68      22           30      26

Conserved elementary alternatives: 75% (EST) - 83% (mRNA)

Genes with all isoforms conserved: 79 (64%)
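The conservation percentages quoted on the human and mouse slides follow directly from the "Total" rows of the two tables. A minimal sketch of that arithmetic (the dictionary layout and function name are illustrative choices, not part of the original analysis):

```python
def conserved_fraction(conserved: int, non_conserved: int) -> float:
    """Share of elementary alternatives that are conserved."""
    return conserved / (conserved + non_conserved)


# (conserved, non-conserved) totals copied from the "Total" rows of the slides
totals = {
    ("human", "mRNA"): (96, 30),
    ("human", "EST"): (114, 51),
    ("mouse", "mRNA"): (117, 24),
    ("mouse", "EST"): (82, 28),
}

percent = {key: round(100 * conserved_fraction(*counts))
           for key, counts in totals.items()}

for (species, source), value in percent.items():
    print(f"{species:5s} {source:4s} {value}% conserved")
```

The same function applied to genes gives 57/126 and 79/124 for the "all isoforms conserved" lines.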

Page 10: Alternative splicing:  A playground of evolution

Real or aberrant non-conserved AS?

• 24-31% human vs. 17-25% mouse elementary alternatives are not conserved

• 55% human vs 36% mouse genes have at least one non-conserved variant

• denser coverage of human genes by ESTs:
– pick up rare (tissue- and stage-specific) => younger variants
– pick up aberrant (non-functional) variants

• 17-24% mRNA-derived elementary alternatives are non-conserved (compared to 25-32% EST-derived ones)

Page 11: Alternative splicing:  A playground of evolution

Comparison to other studies. Modrek and Lee, 2003: skipped exons

• inclusion level is a good predictor of conservation
– 98% constitutive exons are conserved
– 98% major form exons are conserved
– 28% minor form exons are conserved

• inclusion level of conserved exons in human and mouse is highly correlated

• Minor non-conserved form exons are errors? No:
– minor form exons are supported by multiple ESTs
– 28% of minor form exons are upregulated in one specific tissue
– 70% of tissue-specific exons are not conserved
– splicing signals of conserved and non-conserved exons are similar

Page 12: Alternative splicing:  A playground of evolution

• Evolution of alternative exon-intron structure
– human-mouse
– Drosophila and Anopheles
• Evolution of alternative splicing sites: MAGE-A family of CT antigens
• Evolutionary rate in constitutive and alternative regions
– human-mouse
– human SNPs
• Alternative splicing and protein structure

Page 13: Alternative splicing:  A playground of evolution

Fruit fly and mosquito

• Technically more difficult than human-mouse:
– incomplete genomes
– difficulties in alignment, especially at gene termini
– changes in exon-intron structure irrespective of alternative splicing (~4.7 introns per gene in Drosophila vs. ~3.5 introns per gene in Anopheles)

Page 14: Alternative splicing:  A playground of evolution

Methods

• Pro-Frame: align D. melanogaster (Dme) protein isoforms to D. pseudoobscura (Dps) and Anopheles gambiae (Aga) genes

• coding segments: regions in Dme genes between Dme intron shadows

• We follow the fate of Dme exons and coding segments in Dps and Aga genomes

• slices: regions between all exon-exon junctions (intron shadows) from all three genomes (Dme, Dps, Aga) mapped to Dme isoforms

• slice is conserved if it aligns with 35% identity
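The slice construction described above can be sketched as follows. This is a minimal illustration, assuming that all intron shadows have already been mapped to amino-acid coordinates of one Dme isoform and that a pairwise alignment of each slice is available; the function names, the half-open interval convention, the gap handling and the reading of "35% identity" as a lower bound are assumptions, not details from the slides.

```python
from typing import Iterable, List, Tuple


def make_slices(length: int, junctions: Iterable[int]) -> List[Tuple[int, int]]:
    """Cut an isoform of the given length at every intron shadow.

    `junctions` pools exon-exon junction positions from all genomes
    (Dme, Dps, Aga); duplicates are merged. Slices are half-open [start, end).
    """
    cuts = sorted({j for j in junctions if 0 < j < length})
    bounds = [0] + cuts + [length]
    return list(zip(bounds[:-1], bounds[1:]))


def identity(a: str, b: str) -> float:
    """Fraction of identical columns in two aligned strings ('-' never matches)."""
    if len(a) != len(b) or not a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    return sum(x == y and x != "-" for x, y in zip(a, b)) / len(a)


def slice_conserved(a: str, b: str, threshold: float = 0.35) -> bool:
    """Treat a slice as conserved if it aligns with at least 35% identity."""
    return identity(a, b) >= threshold


# Junctions from three genomes mapped to one 300-residue Dme isoform (made-up numbers)
print(make_slices(300, [100, 200] + [100, 200] + [150]))
```

Pooling junctions from all three genomes means that an exon joined or divided in one species still yields slices that can be compared one-to-one.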

Page 15: Alternative splicing:  A playground of evolution

Conservation of coding segments

                                        constitutive segments   alternative segments
D. melanogaster – D. pseudoobscura      97%                     75-80%
D. melanogaster – Anopheles gambiae     77%                     ~45%

Page 16: Alternative splicing:  A playground of evolution

Conservation of D. melanogaster elementary alternatives in D. pseudoobscura genes

[Stacked bar chart, 0% to 100%, one bar each for: constant exon, donor site, acceptor site, retained intron, cassette exon, exclusive exon]

Legend: blue – exact; green – divided exons; yellow – joined exon; orange – mixed; red – non-conserved

• retained introns are the least conserved
• mutually exclusive exons are as conserved as constitutive exons

Page 17: Alternative splicing:  A playground of evolution

Conservation of D. melanogaster elementary alternatives in Anopheles gambiae genes

[Stacked bar chart, 0% to 100%, one bar each for: constant exon, donor site, acceptor site, retained intron, cassette exon, exclusive exon]

Legend: blue – exact; green – divided exons; yellow – joined exons; orange – mixed; red – non-conserved

• ~30% joined, ~10% divided exons (fewer introns in Aga)
• mutually exclusive exons are conserved exactly
• cassette exons are the least conserved

Page 18: Alternative splicing:  A playground of evolution

CG1517: cassette exon in Drosophila, alternative acceptor site in Anopheles

[Figure: exon-intron structures in Dme, Dps and in Aga]

Page 19: Alternative splicing:  A playground of evolution

CG31536: cassette exon in Drosophila, shorter cassette exon and alternative donor site in Anopheles

[Figure: exon-intron structures in Dme, Dps and in Aga]

Page 20: Alternative splicing:  A playground of evolution

CG1587: alternative acceptor site in Drosophila, candidate retained intron in intronless gene of Anopheles

[Figure: exon-intron structures in Dme, Aga and Dps]

Page 21: Alternative splicing:  A playground of evolution

• Evolution of alternative exon-intron structure
– human-mouse
– Drosophila and Anopheles
• Evolution of alternative splicing sites: MAGE-A family of CT antigens
• Evolutionary rate in constitutive and alternative regions
– human-mouse
– human SNPs
• Alternative splicing and protein structure

Page 22: Alternative splicing:  A playground of evolution

Alternative splicing in a multigene family: the MAGE-A family of cancer/testis specific antigens

• A locus at the X chromosome containing eleven recently duplicated genes: two subfamilies of four genes each and three single genes

• Retrogene: one protein-coding exon, multiple different 5’-UTR exons

• Mutations create new splicing sites or disrupt existing sites

Page 23: Alternative splicing:  A playground of evolution

Birth of donor sites (new GT in alternative initial exon 5)

Ancestral gene: GCCAGGCACGCGGATCCTGACGTTCACATCTAGGGCT
MAGEA3          GCCAGGCACGTGAGTCCTGAGGTTCACATCTACGGCT
MAGEA6          GCCAGGCACGTGAGTCCTGAGGTTCACATCTACGGCT
MAGEA2          GCCAAGCACGCGGATCCTGACGTTCACATGTACGGCT
MAGEA12         GCCAAGCACGCGGATCCTGACGTTCACATCTGTGGCT
MAGEA1          GCCAGGCACTCGGATCTTGACGTCCCCATCCAGGGCT
MAGEA4          --CAGGCACTCGGATCTTGACATCCACATCGAGGGCT
MAGEA5          GACAGGCACACCCATTCTGACGTCCACATCCAGGGCT
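One way to spot candidate new donor dinucleotides in an alignment like this is to list GT positions present in a family member but absent at the same column of the reconstructed ancestral sequence. The sketch below does only that; it is not the authors' procedure, the function names are hypothetical, and a GT dinucleotide alone does not make a functional donor site.

```python
from typing import Dict, List, Set

ANCESTRAL = "GCCAGGCACGCGGATCCTGACGTTCACATCTAGGGCT"

# Aligned sequences copied from the slide (a subset of the family)
GENES = {
    "MAGEA3":  "GCCAGGCACGTGAGTCCTGAGGTTCACATCTACGGCT",
    "MAGEA6":  "GCCAGGCACGTGAGTCCTGAGGTTCACATCTACGGCT",
    "MAGEA2":  "GCCAAGCACGCGGATCCTGACGTTCACATGTACGGCT",
    "MAGEA12": "GCCAAGCACGCGGATCCTGACGTTCACATCTGTGGCT",
    "MAGEA1":  "GCCAGGCACTCGGATCTTGACGTCCCCATCCAGGGCT",
}


def gt_positions(seq: str) -> Set[int]:
    """0-based alignment columns where a GT dinucleotide starts."""
    return {i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"}


def new_gt_sites(ancestral: str, genes: Dict[str, str]) -> Dict[str, List[int]]:
    """GT dinucleotides found in a gene but not at that column in the ancestor."""
    old = gt_positions(ancestral)
    return {name: sorted(gt_positions(seq) - old) for name, seq in genes.items()}


result = new_gt_sites(ANCESTRAL, GENES)
for name, sites in result.items():
    print(name, sites)
```

Positions are alignment columns, so gapped sequences can be compared without re-indexing.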

Page 24: Alternative splicing:  A playground of evolution

Birth of an acceptor site (new AG and polyY tract in MAGEA8-specific cassette exon 3)

MAGEA3          TTGAGGGTACC-----------CCTGGGA---CAGAATGCGGA
MAGEA6          TTGAGGGTACC-----------CCTGGGA---CAGAATGCGGA
MAGEA2          TTGAGGGTACT-----------CCTGGGC---CAGAATGCAGA
MAGEA12         TTGAGGGTACC-----------CCTGGGC---CAGAACGCTGA
MAGEA1          CTGAGGGTACC-----------CCAGGAC---CAGAACACTGA
MAGEA4          TTGAGGGTACC-----------ACAGGGC---CAGAACGCAGA
MAGEA5          TTGAGGGCACC-----------CTTGGGC---CAGAACACAGA
MAGEA8          TTGAGGGTACCCTCGATGGTTCTCCTAGCAGGCAAAAAACAGA
MAGEA9          TCGAGGGTACC-----------TCCAGGC---CAGAGAAACTC
MAGEA10         CTGAGGGTACC-----------CCCAGCC---CATAACACAGA
MAGEA11         TTGAGGGTTCC-----------TCCTGGC---CAGAACACAGA

Page 25: Alternative splicing:  A playground of evolution

Birth of an alternative donor site (enhanced match to the consensus (AG) in cassette exon 2)

Ancestral gene: GAGCTCCAGGAACmAGGCAGTGAGGCCTTGGTCTG
MAGEA3          GAGCTCCAGGAACAAGGCAGTGAGGACTTGGTCTG
MAGEA6          GAGCTCCAGGAACAAGGCAGTGAGGACTTGGTCTG
MAGEA2          GAGCTCCAGGAACCAGGCAGTGAGGCCTTGGTCTG
MAGEA12         GAGTTCCAAGAACAAGGCAGTGAGGCCTTGGTCTG
MAGEA1          GAGCTCCAGGAACCAGGCAGTGAGGCCTTGGTCTG
MAGEA4          GAGCTCCAGGAACAAGGCAGTGAGGCCTTGGTCTG
MAGEA5          GAGCTCCAGGAAACAGACACTGAGGCCTTGGTCTG
MAGEA8          GAGCTCCAGGAACCAGGCTGTGAGGTCTTGGTCTG
MAGEA9          GAGCTCCAGGAA----GCAGGCAGGCCTTGGTCTG
MAGEA10         GAGCTCCAGGGACTGTGAGGTGAGGCCTTGGTCTA
MAGEA11         AAGCTCCAAAAACTGAGCAGTGAGGCCTTGGTCTC

Page 26: Alternative splicing:  A playground of evolution

Birth of an alternative acceptor site (enhanced polyY tract in cassette exon 4)

Ancestral gene: AGGGGCCCCCATGTGGTCGACAGACACAGTGG
MAGEA3          AGGGGCCCCTATGTGGTGGACAGATGCAGTGG
MAGEA6          AGGGGCCCCTATGTGGTGGACAGATGCAGTGG
MAGEA2          AGGGGCCCCCATCTGGTCGACAGATGCAGTGG
MAGEA12         AGGGGCCCCCATGTAGTCGACAGACACAGTGG
MAGEA1          AGGGACCCCCATCTGGTCTAAAGACAGAGCGG
MAGEA4          AGGGACCCCCATCTGGTCTACAGACACAGTGG
MAGEA5          AGGGGCCCCCATCTGGTGGATAGACAGAGTGG
MAGEA8          AGGGACCCCCATGTGGGCAACAGACTCAGTGG
MAGEA9          AGGGAGGCCC-TGTGTTCGACAGACACAGTGG
MAGEA10         AGGGAACCCC-TCTTTTCTACAGACACAGTGG
MAGEA11         AAAGAGCCCCATATGGTCCACAACTACAGTGG

Page 27: Alternative splicing:  A playground of evolution

• Evolution of alternative exon-intron structure
– human-mouse
– Drosophila and Anopheles
• Evolution of alternative splicing sites: MAGE-A family of CT antigens
• Evolutionary rate in constitutive and alternative regions
– human-mouse
– human SNPs
• Alternative splicing and protein structure

Page 28: Alternative splicing:  A playground of evolution

Concatenates of constitutive and alternative regions in all genes: different evolutionary rates

                         dN/dS    Amino-acid identity
Constitutive             0.176    0.886
N-end alternative        0.199    0.874
Internal alternative     0.187    0.878
C-end alternative        0.301    0.807

• Relatively more non-synonymous substitutions in alternative regions (higher dN/dS ratio)
• Less amino acid identity in alternative regions

Page 29: Alternative splicing:  A playground of evolution

Individual genes: the ratio of non-synonymous to synonymous substitution rates (dN/dS) tends to be larger in alternative regions (vertical axis) than in constitutive regions (horizontal axis)

[Scatter plot: dN/dS in constitutive regions (C, horizontal axis) vs. alternative regions (A, vertical axis); both axes logarithmic, from 0.001 to 10]

A

Page 30: Alternative splicing:  A playground of evolution

[Figure: dN/dS (con) – dN/dS (alt); panels: N-terminal regions, complete genes, internal regions, C-terminal regions]

Page 31: Alternative splicing:  A playground of evolution

• Evolution of alternative exon-intron structure
– human-mouse
– Drosophila and Anopheles
• Evolution of alternative splicing sites: MAGE-A family of CT antigens
• Evolutionary rate in constitutive and alternative regions
– human-mouse
– human SNPs
• Alternative splicing and protein structure

Page 32: Alternative splicing:  A playground of evolution

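Na/Ns (alternative) > Na/Ns (constitutive) for all evidence levels

[Bar chart: Na/Ns (vertical axis, 0.7 to 1.4) for constitutive (const) and alternative (alt) regions at evidence levels EST-1, EST-2, EST-3, EST-4, EST-5, mRNA, protein; reference line: average (Zhao et al.)]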
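The slide does not define Na and Ns. Assuming they are the numbers of amino-acid-changing and synonymous coding SNPs, the counts for a set of SNPs could be obtained as in the sketch below. The `(codon, position, alternative base)` input format and both function names are assumptions made for illustration; the codon table is the standard genetic code.

```python
from typing import Iterable, Tuple

# Standard genetic code, built from the conventional TCAG ordering
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def is_synonymous(codon: str, pos: int, alt: str) -> bool:
    """True if replacing codon[pos] with `alt` leaves the amino acid unchanged."""
    mutated = codon[:pos] + alt + codon[pos + 1:]
    return CODON_TABLE[codon] == CODON_TABLE[mutated]


def na_ns(snps: Iterable[Tuple[str, int, str]]) -> Tuple[int, int]:
    """Count amino-acid-changing (Na) and synonymous (Ns) SNPs."""
    na = ns = 0
    for codon, pos, alt in snps:
        if is_synonymous(codon, pos, alt):
            ns += 1
        else:
            na += 1
    return na, ns


# Three made-up SNPs: CTT>CTC (Leu>Leu), GAT>GAA (Asp>Glu), TTA>CTA (Leu>Leu)
print(na_ns([("CTT", 2, "C"), ("GAT", 2, "A"), ("TTA", 0, "C")]))
```

Running the same count separately on constitutive and alternative regions would give the two Na/Ns ratios compared on the slide.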

Page 33: Alternative splicing:  A playground of evolution

• Evolution of alternative exon-intron structure
– human-mouse
– Drosophila and Anopheles
• Evolution of alternative splicing sites: MAGE-A family of CT antigens
• Evolutionary rate in constitutive and alternative regions
– human-mouse
– human SNPs
• Alternative splicing and protein structure

Page 34: Alternative splicing:  A playground of evolution

[Figure a): stacked bar chart, Expected vs. Observed (0% to 100%). Categories: domains completely; non-domain functional units completely; no annotated unit affected; domains partially; non-domain functional units partially. Segment labels as extracted: 6%, 10%, 15%, 37%, 40%, 34%, 21%, 19%, 6%, 13%]

Alternative splicing avoids disrupting domains (and non-domain units)

Control: fix the domain structure; randomly place alternative regions
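The control described above can be sketched as a simple randomization: keep the domain coordinates of a protein fixed, drop alternative regions of the observed lengths at uniformly random positions, and tally how each placement relates to the domains. Everything below is an assumption-laden illustration rather than the authors' procedure: the three category names, the rule that a partially hit domain takes precedence, the uniform placement and the function names are all hypothetical.

```python
import random
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in residue coordinates


def classify(region: Interval, domains: List[Interval]) -> str:
    """Relate one alternative region to the fixed domain annotation."""
    rs, re = region
    complete = partial = False
    for ds, de in domains:
        if rs <= ds and de <= re:
            complete = True   # the whole domain lies inside the region
        elif rs < de and ds < re:
            partial = True    # the region cuts into the domain
    if partial:
        return "partially"    # assumption: any disrupted domain dominates
    return "completely" if complete else "none"


def expected_distribution(protein_len: int, domains: List[Interval],
                          region_lengths: List[int], trials: int = 1000,
                          seed: int = 0) -> Dict[str, float]:
    """Expected category frequencies under random placement of regions."""
    rng = random.Random(seed)
    counts = {"completely": 0, "partially": 0, "none": 0}
    for _ in range(trials):
        for length in region_lengths:
            start = rng.randint(0, protein_len - length)
            counts[classify((start, start + length), domains)] += 1
    total = trials * len(region_lengths)
    return {category: n / total for category, n in counts.items()}


print(expected_distribution(300, [(50, 120), (180, 260)], [30, 90]))
```

Comparing these expected frequencies with the observed placement of alternative regions is what the Expected vs. Observed bars summarize.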

Page 35: Alternative splicing:  A playground of evolution

… and this is not simply a consequence of the (disputed) exon-domain correlation

[Bar chart "AS & exon boundaries and SMART domains": ratio (observed/expected), vertical axis 0 to 1, for the groups nonAS_Exons, AS_Exons and AS, each in mouse and human; bars distinguish boundaries inside domains and outside domains]

Page 36: Alternative splicing:  A playground of evolution

Positive selection towards domain shuffling (not simply avoidance of disrupting domains)

[Figure a): the same Expected vs. Observed stacked bar chart as on Page 34]

[Figure b): Expected vs. Observed for three categories: domains completely; non-domain units completely; no annotated units affected]

Page 37: Alternative splicing:  A playground of evolution

Short (<50 aa) alternative splicing events within domains target protein functional sites

[Figure a): the same Expected vs. Observed stacked bar chart as on Page 34]

[Figure c): Expected vs. Observed for four categories: Prosite patterns unaffected; Prosite patterns affected; FT positions unaffected; FT positions affected]

Page 38: Alternative splicing:  A playground of evolution

An attempt at integration

• AS is often young (as opposed to degenerating)
• young AS isoforms are often minor and tissue-specific
• … but still functional
– although unique isoforms may be the result of aberrant splicing
• AS often arises from duplication of exons
• … or point mutations creating splicing sites
• … or intron insertions
• AS regions show evidence for positive selection
– excess non-synonymous and damaging SNPs
– excess non-synonymous codon substitutions
• AS tends to shuffle exons and target functional sites in proteins
• Thus AS may serve as a testing ground for new functions without sacrificing old ones

Page 39: Alternative splicing:  A playground of evolution

Acknowledgements

• Discussions
– Vsevolod Makeev (GosNIIGenetika)
– Eugene Koonin (NCBI)
– Igor Rogozin (NCBI)
– Dmitry Petrov (Stanford)
– Dmitry Frishman (GSF, TUM)
• Data
– King Jordan (NCBI)
• Support
– Ludwig Institute of Cancer Research
– Howard Hughes Medical Institute
– Russian Academy of Sciences (program “Molecular and Cellular Biology”)
– Russian Fund of Basic Research

Page 40: Alternative splicing:  A playground of evolution

Authors

• Andrei Mironov (Moscow State University) – spliced alignment
• Ramil Nurtdinov (Moscow State University) – human/mouse, data
• Irena Artamonova (GSF/MIPS) – human/mouse, MAGE-A
• Dmitry Malko (GosNIIGenetika, Moscow) – mosquito/drosophila
• Ekaterina Ermakova (Moscow State University) – evolution of alternative/constitutive regions
• Vasily Ramensky (Institute of Molecular Biology, Moscow) – SNPs
• Shamil Sunyaev (EMBL, now Harvard University Medical School) – protein structure
• Eugenia Kriventseva (EBI, now EMBL) – protein structure