Alternative splicing: A playground of evolution
Mikhail Gelfand
Research and Training Center for Bioinformatics
Institute for Information Transmission Problems
Alternative splicing of human (and mouse) genes
5% Sharp, 1994 (Nobel lecture)
35% Mironov-Fickett-Gelfand, 1999
38% Brett-…-Bork, 2000 (ESTs/mRNA)
22% Croft et al., 2000 (ISIS database)
55% Kan et al., 2001 (11% AS patterns conserved in mouse ESTs)
42% Modrek et al., 2001 (HASDB)
~33% CELERA, 2001
59% Human Genome Consortium, 2001
28% Clark and Thanaraj, 2002
all? Kan et al., 2002 (17-28% with total minor isoform frequency > 5%)
41% (mouse) FANTOM & RIKEN, 2002
60% (mouse) Zavolan et al., 2003
• Evolution of alternative exon-intron structure
  – human-mouse
  – Drosophila and Anopheles
• Evolution of alternative splicing sites: MAGE-A family of CT antigens
• Evolutionary rate in constitutive and alternative regions
  – human-mouse
  – human SNPs
• Alternative splicing and protein structure
Data and Methods (routine)
• known alternative splicing
  – HASDB (human, ESTs+mRNAs)
  – ASMamDB (mouse, mRNAs+genes)
• additional variants
  – UniGene (human and mouse EST clusters)
• complete genes and genomic DNA
  – GenBank (full-length mouse genes)
  – human genome
• TBLASTN (initial identification of orthologs: mRNAs against genomic DNA)
• BLASTN (human mRNAs against genome)
• Pro-EST (spliced alignment, ESTs and mRNA against genomic DNA)
• Pro-Frame (spliced alignment of proteins against genomic DNA)
  – confirmation of orthology:
    • same exon-intron structure for at least one isoform
    • >70% identity over the entire protein length
  – analysis of conservation of human alternative splicing in the mouse genome: align human protein to mouse genomic DNA; the isoform is conserved if
    • all exons or parts of exons are conserved
    • all sites are conserved
  – same procedure for mouse proteins and human DNA
We do not require that the isoform is actually observed as mRNA or ESTs
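The two decision rules above (confirmation of orthology, then isoform conservation) can be sketched as follows. This is a minimal illustration, not the Pro-Frame implementation: the `AlignedExon` record and its field names are hypothetical stand-ins for whatever a spliced-alignment tool reports.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class AlignedExon:
    """Hypothetical summary of one exon of a query isoform aligned to the other genome."""
    aligned: bool      # the exon (or the relevant part of it) is found in the target DNA
    acceptor_ok: bool  # acceptor site preserved (set True for the first exon)
    donor_ok: bool     # donor site preserved (set True for the last exon)


def is_ortholog(identity: float, same_structure_in_some_isoform: bool) -> bool:
    # Orthology is confirmed when at least one isoform has the same exon-intron
    # structure and identity over the entire protein length exceeds 70%.
    return same_structure_in_some_isoform and identity > 0.70


def isoform_conserved(exons: Iterable[AlignedExon]) -> bool:
    # An isoform counts as conserved when every exon (or exon part) aligns and
    # every splice site is preserved; no mRNA or EST support is required.
    return all(e.aligned and e.acceptor_ok and e.donor_ok for e in exons)


if __name__ == "__main__":
    isoform = [AlignedExon(True, True, True), AlignedExon(True, True, False)]
    print(is_ortholog(0.82, True), isoform_conserved(isoform))
```

The second exon in the usage example has a lost donor site, so the isoform is reported as non-conserved even though all exons align.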
166 gene pairs
Known alternative splicing: 126 human genes, 124 mouse genes (42 in human only, 84 in both, 40 in mouse only)
Elementary alternatives
Cassette exon
Alternative donor site
Alternative acceptor site
Retained intron
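The four elementary alternatives listed above can be told apart by comparing exon coordinates between two isoforms of one gene. The sketch below is only an illustration under stated assumptions: plus-strand genes, inclusive `(start, end)` coordinates, internal exons, and isoforms that differ by simple events. The function names are hypothetical and do not come from the talk.

```python
def overlaps(a, b):
    """True if two inclusive (start, end) intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def classify_exon(exon, own_exons, other_exons):
    """Classify one exon of an isoform relative to another isoform of the same gene."""
    if exon in other_exons:
        return "shared exon"
    hits = [o for o in other_exons if overlaps(exon, o)]
    if not hits:
        # Assumption: the exon is internal, so absence means it is skipped.
        return "cassette exon"
    if len(hits) > 1:
        # This exon spans an intron of the other isoform.
        return "retained intron"
    other = hits[0]
    if sum(overlaps(other, e) for e in own_exons) > 1:
        # The other isoform's exon spans an intron of this isoform.
        return "retained intron"
    if exon[0] == other[0]:
        # Same acceptor (exon start), different donor (exon end) on the plus strand.
        return "alternative donor site"
    if exon[1] == other[1]:
        return "alternative acceptor site"
    return "complex"


if __name__ == "__main__":
    reference = [(1, 100), (201, 300), (401, 500)]
    skipping = [(1, 100), (401, 500)]
    print(classify_exon((201, 300), reference, skipping))
```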
Human genes

                        mRNA                 EST
                    cons.  non-cons.    cons.  non-cons.
Cassette exons       56       25         74       26
Alt. donors          18        7         16       10
Alt. acceptors       13        5         19       15
Retained introns      4        3          5        0
Total                96       30        114       51
Total genes          45       28         41       44

Conserved elementary alternatives: 69% (EST) - 76% (mRNA)
Genes with all isoforms conserved: 57 (45%)
Mouse genes

                        mRNA                 EST
                    cons.  non-cons.    cons.  non-cons.
Cassette exons       70        5         39        9
Alt. donors          24        6         17        6
Alt. acceptors       15        6         16        9
Retained introns      8        7         10        4
Total               117       24         82       28
Total genes          68       22         30       26

Conserved elementary alternatives: 75% (EST) - 83% (mRNA)
Genes with all isoforms conserved: 79 (64%)
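The quoted percentages follow directly from the "Total" rows of the two tables: conserved / (conserved + non-conserved). A short check, using only the totals given on the slides (the function name is illustrative):

```python
def percent_conserved(conserved: int, non_conserved: int) -> int:
    """Share of conserved elementary alternatives, rounded to a whole percent."""
    return round(100 * conserved / (conserved + non_conserved))


# (conserved, non-conserved) totals of elementary alternatives from the slides
totals = {
    ("human", "mRNA"): (96, 30),
    ("human", "EST"): (114, 51),
    ("mouse", "mRNA"): (117, 24),
    ("mouse", "EST"): (82, 28),
}

if __name__ == "__main__":
    for (species, source), (cons, non_cons) in totals.items():
        print(species, source, f"{percent_conserved(cons, non_cons)}%")
```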
Real or aberrant non-conserved AS?
• 24-31% human vs. 17-25% mouse elementary alternatives are not conserved
• 55% human vs 36% mouse genes have at least one non-conserved variant
• denser coverage of human genes by ESTs:
  – pick up rare (tissue- and stage-specific) => younger variants
  – pick up aberrant (non-functional) variants
• 17-24% mRNA-derived elementary alternatives are non-conserved (compared to 25-31% EST-derived ones)
Comparison to other studies. Modrek and Lee, 2003: skipped exons
• inclusion level is a good predictor of conservation
  – 98% constitutive exons are conserved
  – 98% major form exons are conserved
  – 28% minor form exons are conserved
• inclusion level of conserved exons in human and mouse is highly correlated
• Minor non-conserved form exons are errors? No:
  – minor form exons are supported by multiple ESTs
  – 28% of minor form exons are upregulated in one specific tissue
  – 70% of tissue-specific exons are not conserved
  – splicing signals of conserved and non-conserved exons are similar
Fruit fly and mosquito
• Technically more difficult than human-mouse:
  – incomplete genomes
  – difficulties in alignment, especially at gene termini
  – changes in exon-intron structure irrespective of alternative splicing (~4.7 introns per gene in Drosophila vs. ~3.5 introns per gene in Anopheles)
Methods
• Pro-Frame: align Dme (D. melanogaster) protein isoforms to Dps (D. pseudoobscura) and Aga (Anopheles gambiae) genes
• coding segments: regions in Dme genes between Dme intron shadows
• We follow the fate of Dme exons and coding segments in Dps and Aga genomes
• slices: regions between all exon-exon junctions (intron shadows) from all three genomes (Dme, Dps, Aga) mapped to Dme isoforms
• slice is conserved if it aligns with 35% identity
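The slice construction described above can be sketched as cutting an isoform at every mapped junction and then testing each slice against the identity cut-off. This is a minimal sketch under assumptions not stated on the slide: junctions are given as residue positions on the Dme isoform, identity is counted over non-gap alignment columns, and "35% identity" is read as "at least 35%". All function names are hypothetical.

```python
def slices(length, junctions):
    """Cut an isoform of `length` residues at the given junction positions.

    Returns half-open (start, end) intervals; duplicate and out-of-range
    junctions are ignored.
    """
    cuts = sorted({p for p in junctions if 0 < p < length})
    bounds = [0] + cuts + [length]
    return list(zip(bounds[:-1], bounds[1:]))


def identity(row_a, row_b):
    """Fraction of identical residues over columns where neither row has a gap."""
    pairs = [(x, y) for x, y in zip(row_a, row_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def slice_conserved(row_a, row_b, threshold=0.35):
    # Assumption: the slide's cut-off is interpreted as identity >= 35%.
    return identity(row_a, row_b) >= threshold


if __name__ == "__main__":
    # Junctions pooled from three genomes; 30 occurs twice and is used once.
    print(slices(100, [30, 60, 30]))
    print(slice_conserved("MKTAY", "MKTGY"))
```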
Conservation of coding segments

                                        constitutive   alternative
                                        segments       segments
D. melanogaster – D. pseudoobscura      97%            75-80%
D. melanogaster – Anopheles gambiae     77%            ~45%
Conservation of D. melanogaster elementary alternatives in D. pseudoobscura genes
[Figure: stacked bars, 0-100%, for constant exon, donor site, acceptor site, retained intron, cassette exon, exclusive exon. Colors: blue – exact, green – divided exons, yellow – joined exon, orange – mixed, red – non-conserved]
• retained introns are the least conserved
• mutually exclusive exons are as conserved as constitutive exons
Conservation of D. melanogaster elementary alternatives in Anopheles gambiae genes
[Figure: stacked bars, 0-100%, for constant exon, donor site, acceptor site, retained intron, cassette exon, exclusive exon. Colors: blue – exact, green – divided exons, yellow – joined exons, orange – mixed, red – non-conserved]
• ~30% joined, ~10% divided exons (fewer introns in Aga)
• mutually exclusive exons are conserved exactly
• cassette exons are the least conserved
CG1517: cassette exon in Drosophila, alternative acceptor site in Anopheles
[Figure: exon-intron structures in Dme, Dps and in Aga]
CG31536: cassette exon in Drosophila, shorter cassette exon and alternative donor site in Anopheles
[Figure: exon-intron structures in Dme, Dps and in Aga]
CG1587: alternative acceptor site in Drosophila, candidate retained intron in intronless gene of Anopheles
[Figure: exon-intron structures in Dme, Aga and Dps]
Alternative splicing in a multigene family: the MAGEA family of cancer/testis specific antigens
• A locus on the X chromosome containing eleven recently duplicated genes: two subfamilies of four genes each and three single genes
• Retrogene: one protein-coding exon, multiple different 5’-UTR exons
• Mutations create new splicing sites or disrupt existing sites
Birth of donor sites (new GT in alternative initial exon 5)
Ancestral gene: GCCAGGCACGCGGATCCTGACGTTCACATCTAGGGCT
MAGEA3          GCCAGGCACGTGAGTCCTGAGGTTCACATCTACGGCT
MAGEA6          GCCAGGCACGTGAGTCCTGAGGTTCACATCTACGGCT
MAGEA2          GCCAAGCACGCGGATCCTGACGTTCACATGTACGGCT
MAGEA12         GCCAAGCACGCGGATCCTGACGTTCACATCTGTGGCT
MAGEA1          GCCAGGCACTCGGATCTTGACGTCCCCATCCAGGGCT
MAGEA4          --CAGGCACTCGGATCTTGACATCCACATCGAGGGCT
MAGEA5          GACAGGCACACCCATTCTGACGTCCACATCCAGGGCT
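A new GT dinucleotide in this alignment can be located by comparing each paralog with the reconstructed ancestral sequence, column by column. The sketch below uses sequences from the slide; the function names are illustrative, and a new GT is only a candidate donor site, since a functional donor also needs a reasonable match to the wider consensus.

```python
ANCESTRAL = "GCCAGGCACGCGGATCCTGACGTTCACATCTAGGGCT"
PARALOGS = {
    "MAGEA3": "GCCAGGCACGTGAGTCCTGAGGTTCACATCTACGGCT",
    "MAGEA6": "GCCAGGCACGTGAGTCCTGAGGTTCACATCTACGGCT",
    "MAGEA2": "GCCAAGCACGCGGATCCTGACGTTCACATGTACGGCT",
    "MAGEA12": "GCCAAGCACGCGGATCCTGACGTTCACATCTGTGGCT",
    "MAGEA1": "GCCAGGCACTCGGATCTTGACGTCCCCATCCAGGGCT",
}


def gt_positions(seq):
    """0-based alignment columns at which a GT dinucleotide starts."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "GT"]


def new_gt(ancestral, paralog):
    """GT start columns present in the paralog but absent from the ancestor.

    Both rows must come from the same alignment so that columns correspond.
    """
    if len(ancestral) != len(paralog):
        raise ValueError("aligned rows must have equal length")
    old = set(gt_positions(ancestral))
    return [i for i in gt_positions(paralog) if i not in old]


if __name__ == "__main__":
    for name, seq in PARALOGS.items():
        print(name, new_gt(ANCESTRAL, seq))
```

For MAGEA3 and MAGEA6 the first reported column starts the hexamer GTGAGT, which is absent from the ancestral row.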
Birth of an acceptor site (new AG and polyY tract in MAGEA8-specific cassette exon 3)
MAGEA3   TTGAGGGTACC-----------CCTGGGA---CAGAATGCGGA
MAGEA6   TTGAGGGTACC-----------CCTGGGA---CAGAATGCGGA
MAGEA2   TTGAGGGTACT-----------CCTGGGC---CAGAATGCAGA
MAGEA12  TTGAGGGTACC-----------CCTGGGC---CAGAACGCTGA
MAGEA1   CTGAGGGTACC-----------CCAGGAC---CAGAACACTGA
MAGEA4   TTGAGGGTACC-----------ACAGGGC---CAGAACGCAGA
MAGEA5   TTGAGGGCACC-----------CTTGGGC---CAGAACACAGA
MAGEA8   TTGAGGGTACCCTCGATGGTTCTCCTAGCAGGCAAAAAACAGA
MAGEA9   TCGAGGGTACC-----------TCCAGGC---CAGAGAAACTC
MAGEA10  CTGAGGGTACC-----------CCCAGCC---CATAACACAGA
MAGEA11  TTGAGGGTTCC-----------TCCTGGC---CAGAACACAGA
Birth of an alternative donor site (enhanced match to the consensus (AG) in cassette exon 2)
Ancestral gene: GAGCTCCAGGAACmAGGCAGTGAGGCCTTGGTCTG
MAGEA3          GAGCTCCAGGAACAAGGCAGTGAGGACTTGGTCTG
MAGEA6          GAGCTCCAGGAACAAGGCAGTGAGGACTTGGTCTG
MAGEA2          GAGCTCCAGGAACCAGGCAGTGAGGCCTTGGTCTG
MAGEA12         GAGTTCCAAGAACAAGGCAGTGAGGCCTTGGTCTG
MAGEA1          GAGCTCCAGGAACCAGGCAGTGAGGCCTTGGTCTG
MAGEA4          GAGCTCCAGGAACAAGGCAGTGAGGCCTTGGTCTG
MAGEA5          GAGCTCCAGGAAACAGACACTGAGGCCTTGGTCTG
MAGEA8          GAGCTCCAGGAACCAGGCTGTGAGGTCTTGGTCTG
MAGEA9          GAGCTCCAGGAA----GCAGGCAGGCCTTGGTCTG
MAGEA10         GAGCTCCAGGGACTGTGAGGTGAGGCCTTGGTCTA
MAGEA11         AAGCTCCAAAAACTGAGCAGTGAGGCCTTGGTCTC
Birth of an alternative acceptor site (enhanced polyY tract in cassette exon 4)
Ancestral gene: AGGGGCCCCCATGTGGTCGACAGACACAGTGG
MAGEA3          AGGGGCCCCTATGTGGTGGACAGATGCAGTGG
MAGEA6          AGGGGCCCCTATGTGGTGGACAGATGCAGTGG
MAGEA2          AGGGGCCCCCATCTGGTCGACAGATGCAGTGG
MAGEA12         AGGGGCCCCCATGTAGTCGACAGACACAGTGG
MAGEA1          AGGGACCCCCATCTGGTCTAAAGACAGAGCGG
MAGEA4          AGGGACCCCCATCTGGTCTACAGACACAGTGG
MAGEA5          AGGGGCCCCCATCTGGTGGATAGACAGAGTGG
MAGEA8          AGGGACCCCCATGTGGGCAACAGACTCAGTGG
MAGEA9          AGGGAGGCCC-TGTGTTCGACAGACACAGTGG
MAGEA10         AGGGAACCCC-TCTTTTCTACAGACACAGTGG
MAGEA11         AAAGAGCCCCATATGGTCCACAACTACAGTGG
Concatenates of constitutive and alternative regions in all genes: different evolutionary rates

                        dN/dS    Amino-acid identity
Constitutive            0.176    0.886
N-end alternative       0.199    0.874
Internal alternative    0.187    0.878
C-end alternative       0.301    0.807

• Relatively more non-synonymous substitutions in alternative regions (higher dN/dS ratio)
• Less amino acid identity in alternative regions
Individual genes: the ratio of non-synonymous to synonymous substitution rates (dn/ds) tends to be larger in alternative regions (vertical axis) than in constitutive regions (horizontal axis)
[Figure: log-log scatter plots of dn/ds in alternative regions (A) against dn/ds in constitutive regions (C), both axes from 0.001 to 10; panels: complete genes, N-terminal regions, internal regions, C-terminal regions; a further axis is labelled dn/ds (con) – dn/ds (alt)]
Na/Ns (alternative) > Na/Ns (constitutive) for all evidence levels
[Figure: Na/Ns (vertical axis, 0.7 to 1.4) for constitutive (const) and alternative (alt) regions at evidence levels EST-1, EST-2, EST-3, EST-4, EST-5, mRNA, protein; reference line: average (Zhao et al.)]
Alternative splicing avoids disrupting domains (and non-domain units)
[Figure, panel a: stacked bars (0-100%), Expected vs. Observed, with categories: non-domain functional units partially; domains partially; no annotated unit affected; non-domain functional units completely; domains completely. Segment labels as extracted: 6%, 10%, 15%, 37%, 40%, 34%, 21%, 19%, 6%, 13%]
Control:
fix the domain structure; randomly place alternative regions
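The control can be illustrated with a small randomization: keep the domain coordinates of a protein fixed, drop an alternative region of a given length at random positions, and tabulate how often it removes a domain completely, cuts one partially, or misses all domains. This is a hedged sketch, not the procedure used in the study: uniform placement, half-open coordinates, and the rule that any partial hit outranks a complete one are all assumptions made here, and the names are hypothetical.

```python
import random


def classify(start, end, domains):
    """Classify a half-open alternative region [start, end) against domain intervals."""
    complete = False
    for d_start, d_end in domains:
        if start <= d_start and d_end <= end:
            complete = True          # the whole domain lies inside the region
        elif start < d_end and d_start < end:
            return "partially"       # the region cuts through this domain
    return "completely" if complete else "none"


def expected_fractions(protein_len, domains, region_len, trials=10000, seed=0):
    """Expected category frequencies under random placement of the region."""
    rng = random.Random(seed)
    counts = {"completely": 0, "partially": 0, "none": 0}
    for _ in range(trials):
        start = rng.randint(0, protein_len - region_len)  # inclusive bounds
        counts[classify(start, start + region_len, domains)] += 1
    return {k: v / trials for k, v in counts.items()}


if __name__ == "__main__":
    # Toy protein of 300 residues with two domains and a 40-residue region.
    print(expected_fractions(300, [(50, 120), (200, 230)], 40))
```

Observed frequencies for real alternative regions would then be compared with these expected ones, as in the Expected vs. Observed bars.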
… and this is not simply a consequence of the (disputed) exon-domain correlation
[Figure: AS and exon boundaries and SMART domains. Vertical axis: ratio (observed/expected) of boundaries inside domains and outside domains; groups: nonAS_Exons, AS_Exons, AS, each for mouse and human]
Positive selection towards domain shuffling (not simply avoidance of disrupting domains)
[Figure, panel a: repeats the Expected vs. Observed stacked-bar chart of affected domains and non-domain functional units. Panel b: Expected vs. Observed for domains completely, non-domain units completely, no annotated units affected]
Short (<50 aa) alternative splicing events within domains target protein functional sites
[Figure, panel a: repeats the Expected vs. Observed stacked-bar chart of affected domains and non-domain functional units. Panel c: Expected vs. Observed for Prosite patterns unaffected, Prosite patterns affected, FT positions unaffected, FT positions affected]
An attempt at integration
• AS is often young (as opposed to degenerating)
• young AS isoforms are often minor and tissue-specific
• … but still functional
  – although unique isoforms may be the result of aberrant splicing
• AS often arises from duplication of exons
• … or point mutations creating splicing sites
• … or intron insertions
• AS regions show evidence for positive selection
  – excess non-synonymous and damaging SNPs
  – excess non-synonymous codon substitutions
• AS tends to shuffle exons and target functional sites in proteins
• Thus AS may serve as a testing ground for new functions without sacrificing old ones
Acknowledgements
• Discussions
  – Vsevolod Makeev (GosNIIGenetika)
  – Eugene Koonin (NCBI)
  – Igor Rogozin (NCBI)
  – Dmitry Petrov (Stanford)
  – Dmitry Frishman (GSF, TUM)
• Data
  – King Jordan (NCBI)
• Support
  – Ludwig Institute of Cancer Research
  – Howard Hughes Medical Institute
  – Russian Academy of Sciences (program “Molecular and Cellular Biology”)
  – Russian Fund of Basic Research
Authors
• Andrei Mironov (Moscow State University) – spliced alignment
• Ramil Nurtdinov (Moscow State University) – human/mouse, data
• Irena Artamonova (GSF/MIPS) – human/mouse, MAGE-A
• Dmitry Malko (GosNIIGenetika, Moscow) – mosquito/drosophila
• Ekaterina Ermakova (Moscow State University) – evolution of alternative/constitutive regions
• Vasily Ramensky (Institute of Molecular Biology, Moscow) – SNPs
• Shamil Sunyaev (EMBL, now Harvard University Medical School) – protein structure
• Eugenia Kriventseva (EBI, now EMBL) – protein structure