Alternative Splicing from ESTs Eduardo Eyras Bioinformatics UPF – February 2004


Page 1: Alternative Splicing from ESTs

Alternative Splicing from ESTs

Eduardo Eyras

Bioinformatics UPF – February 2004

Page 2: Alternative Splicing from ESTs

Intro

ESTs

Prediction of Alternative Splicing from ESTs

Page 3: Alternative Splicing from ESTs

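[Figure: transcription of the gene (5’ to 3’) produces the pre-mRNA, which contains exons and introns; splicing removes the introns to give the mature mRNA (5’ CAP, poly-A tail AAAAAAA); translation produces the peptide.]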

Page 4: Alternative Splicing from ESTs

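[Figure: the same gene (5’ to 3’) is transcribed into the same pre-mRNA, with exons and introns; a different splicing gives a different mature mRNA (5’ CAP, poly-A tail AAAAAAA); translation produces a different peptide.]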

Page 5: Alternative Splicing from ESTs

Alternative splicing as a mechanism of gene regulation

Functional domains can be added or removed, increasing protein diversity

Can introduce early stop codons, resulting in truncated proteins or unstable mRNAs

It can modify the activity of transcription factors, affecting the expression of genes

It is observed in nearly all metazoans

Estimated to occur in 30%-60% of human genes

Page 6: Alternative Splicing from ESTs

Forms of alternative splicing

Exon skipping / inclusion

Alternative 3’ splice site

Alternative 5’ splice site

Mutually exclusive exons

Intron retention

[Figure legend: constitutive exons; alternatively spliced exons]

Page 7: Alternative Splicing from ESTs

How to study alternative splicing?

Page 8: Alternative Splicing from ESTs

ESTs (Expressed Sequence Tags)

Single-pass sequencing of a small (end) piece of cDNA

Typically 200-500 nucleotides long

It may contain coding and/or non-coding regions

Page 9: Alternative Splicing from ESTs

ESTs

Cells from a specific organ, tissue or developmental stage

mRNA extraction

Add oligo-dT primer

Reverse transcriptase

Ribonuclease H, DNA polymerase

Double stranded cDNA

[Figure: the poly-A tail (AAAAAA) at the 3’ end of each mRNA pairs with the oligo-dT primer (TTTTTT); reverse transcriptase synthesizes the first DNA strand from the RNA, and Ribonuclease H and DNA polymerase replace the RNA with the second DNA strand.]

Page 10: Alternative Splicing from ESTs

ESTs

Clone cDNA into a vector

Multiple cDNA clones

Single-pass sequence reads: 5’ EST and 3’ EST

Page 11: Alternative Splicing from ESTs

Sampling the Transcriptome with ESTs

[Figure: genomic sequence; primary transcript; splicing gives the splice variants; oligo-dT primer and reverse transcriptase give the cDNA clones (double stranded); EST sequences (single-pass sequence reads) are taken from the 5’ and 3’ ends of the clones.]

Page 12: Alternative Splicing from ESTs

Large scale EST-sequencing coupled to Genome sequencing

Page 13: Alternative Splicing from ESTs

EST sequencing

Is fast and cheap

Gives direct information about the gene sequence

Partial information

Resulting ESTs (DB searches):

Known gene

Similar to known gene

Contaminant

Novel gene

Page 14: Alternative Splicing from ESTs

Number of public entries: 20,039,613

Summary by organism

Homo sapiens (human)                    5,472,005
Mus musculus + domesticus (mouse)       4,056,481
Rattus sp. (rat)                          583,841
Triticum aestivum (wheat)                 549,926
Ciona intestinalis                        492,511
Gallus gallus (chicken)                   460,385
Danio rerio (zebrafish)                   450,652
Zea mays (maize)                          391,417
Xenopus laevis (African clawed frog)      359,901
…

dbEST release 20 February 2004

Page 15: Alternative Splicing from ESTs

EST lengths

Human EST length distribution (dbEST Sep. 2003 )

~ 450 bp

Page 16: Alternative Splicing from ESTs

ESTs provide expression data

eVOC Ontologies http://www.sanbi.ac.za/evoc/

Anatomical System

The tissue, organ or anatomical system from which the sample was prepared. Examples are digestive, lung and retina.

Cell Type

The precise cell type from which a sample was prepared. Examples are: B-lymphocyte, fibroblast and oocyte.

Pathology

The pathological state of the sample from which the sample was prepared. Examples are: normal, lymphoma, and congenital.

Developmental Stage

The stage during the organism's development at which the sample was prepared. Examples are: embryo, fetus, and adult.

Pooling

Indicates whether the tissue used to prepare the library was derived from single or multiple samples. Examples are pooled, pooled donor and pooled tissue.

J Kelso et al. Genome Research 2002

Page 17: Alternative Splicing from ESTs

ESTs provide expression data

eVOC Ontologies http://www.sanbi.ac.za/evoc/

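Anatomical System, Cell Type, Pathology, Developmental Stage, Pooling

[Figure: each ontology is a hierarchy of terms, e.g. Anatomical System: … nervous, brain, cerebellum …; terms are linked to EST libraries (Library 1, Library 2, …) and through them to their ESTs.]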

Page 18: Alternative Splicing from ESTs

Linking the expression vocabulary to gene annotations

ESTs

Genes

V Curwen et al. Genome Research (2004)

Page 19: Alternative Splicing from ESTs

Gene expression vocabulary

Page 20: Alternative Splicing from ESTs

Normalized vs. non-normalized libraries

Page 21: Alternative Splicing from ESTs

The down side of the ESTs

Cannot detect genes expressed at low levels or rarely, nor non-expressed (e.g. regulatory) sequences

Random sampling: the more ESTs we sequence, the fewer new useful sequences we get
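The diminishing return of random sampling can be illustrated with a small calculation. This is only a sketch under a simplifying assumption that is not in the slides: each EST is drawn independently, with probability proportional to the abundance of its transcript. The function name `expected_distinct_genes` and the abundance values are hypothetical.

```python
def expected_distinct_genes(abundances, n_ests):
    """Expected number of distinct genes seen after sampling n_ests ESTs.

    Assumption (not from the slides): every EST is an independent draw,
    and a gene is drawn with probability abundance / total abundance.
    """
    total = float(sum(abundances))
    # A gene is missed by all n draws with probability (1 - p) ** n.
    return sum(1.0 - (1.0 - a / total) ** n_ests for a in abundances)


# Hypothetical library: 10 highly expressed genes and 90 rare ones.
abundances = [100] * 10 + [1] * 90

for n in (0, 500, 1000, 1500, 2000):
    print(n, round(expected_distinct_genes(abundances, n), 1))
```

Each additional batch of 500 ESTs adds fewer new genes than the previous one, because most new reads come from the abundant transcripts that have already been seen.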

Page 22: Alternative Splicing from ESTs

Using ESTs to study Alternative Splicing

Page 23: Alternative Splicing from ESTs

ESTs aligned to the genome

[Figure: an EST and its possible matches in the genome: the true match (best in genome, with GT...AG splice sites), a paralog, and a processed pseudogene (PolyA, *Stop).]

It defines the location of exons and introns

We can verify the splice sites of introns and check the correct strand of spliced ESTs

It helps prevent chimeras

It can avoid putting together ESTs from paralogous genes

We can prevent including pseudogenes in our analysis

Poly A tails must be clipped before aligning
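Clipping the poly-A tail can be sketched as follows. The slides do not say how the clipping was done; the function name `clip_poly_tails` and the minimum run length of 10 are assumptions made for illustration. The sketch also removes a leading poly-T run, which is what a poly-A tail looks like when the read comes from the opposite strand.

```python
import re


def clip_poly_tails(seq, min_run=10):
    """Remove a trailing poly-A run and a leading poly-T run.

    min_run is a hypothetical threshold: shorter runs are kept because
    they may be genuine transcript sequence.
    """
    seq = seq.upper()
    seq = re.sub(r"A{%d,}$" % min_run, "", seq)  # poly-A at the 3' end
    seq = re.sub(r"^T{%d,}" % min_run, "", seq)  # poly-T at the 5' end
    return seq


print(clip_poly_tails("ACGTCCGT" + "A" * 15))
print(clip_poly_tails("T" * 12 + "GGCCTTAG"))
```

A fixed run length is the simplest possible rule; real tails often contain sequencing errors, so a production tool would usually tolerate a few non-A bases inside the run.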

Page 24: Alternative Splicing from ESTs

Alternative Exons/ 3´ PolyA sites from ESTs

ESTs can also provide information about potential alternative splicing when aligned to the genome (and when aligned to mRNA data)

Page 25: Alternative Splicing from ESTs

Aligning ESTs to the Genome

Many ESTs: fast programs, fast computers

Nearly exact matches: coverage >= 97%, percent_id >= 97%

Splice sites: GT-AG, AT-AC, GC-AG
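The two filters above can be sketched as simple checks. The definitions of coverage (aligned EST bases over EST length) and percent identity (matching bases over aligned bases) are assumptions, as are the function names and the 0-based, end-exclusive intron coordinates on the forward strand.

```python
# Donor/acceptor dinucleotide pairs listed on the slide.
KNOWN_SPLICE_SITES = {("GT", "AG"), ("AT", "AC"), ("GC", "AG")}


def passes_thresholds(aligned_bases, est_length, matches,
                      min_coverage=97.0, min_percent_id=97.0):
    """Keep only nearly exact EST-to-genome alignments."""
    coverage = 100.0 * aligned_bases / est_length
    percent_id = 100.0 * matches / aligned_bases
    return coverage >= min_coverage and percent_id >= min_percent_id


def has_known_splice_sites(genome, intron_start, intron_end):
    """Check the first and last two bases of an intron.

    genome is the forward-strand sequence; the intron occupies
    genome[intron_start:intron_end].
    """
    donor = genome[intron_start:intron_start + 2].upper()
    acceptor = genome[intron_end - 2:intron_end].upper()
    return (donor, acceptor) in KNOWN_SPLICE_SITES


genome = "AAA" + "GTCCCCAG" + "TTT"
print(passes_thresholds(aligned_bases=485, est_length=500, matches=480))
print(has_known_splice_sites(genome, 3, 11))
```

For an EST aligned to the reverse strand, the intron sequence would have to be reverse-complemented before the check; that step is omitted here.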

Page 26: Alternative Splicing from ESTs

Genomics as a Technology

Development of special software: fast versus accurate alignment

Development of special technology: efficient use of computer farms (~2000 CPUs)

Page 27: Alternative Splicing from ESTs

Recovering full transcripts from ESTs

Page 28: Alternative Splicing from ESTs

Recover the mRNA from the ESTs

Page 29: Alternative Splicing from ESTs

The Problem

What are the transcripts represented in this set of mapped ESTs?

ESTs

Genome

Page 30: Alternative Splicing from ESTs

Transcript predictions

ESTs

Predict Transcripts from ESTs

Merge ESTs according to splicing structure compatibility

Page 31: Alternative Splicing from ESTs

Redundant ESTs

Consider 2 ESTs, x and z, in a genomic cluster with more ESTs

z gives redundant splicing information, so we could keep only x: x + z gives x

However, the relation with other ESTs in the cluster is important: a third EST, w, is compatible with z but not with x --> keep all relations: x + z and z + w

Page 32: Alternative Splicing from ESTs

Extension of the exon structure

Consider 2 ESTs, x and y, in a genomic cluster with more ESTs

y extends x, so we can assume that they are from the same mRNA: x + y

[Figure: ESTs x, z and w.]

Our success will depend on the coverage of the exons. However, ESTs are 3’ and 5’ biased (ESTs like z are not so frequent), hence we will have fragmentation.

Page 33: Alternative Splicing from ESTs

Representation

For every 2 ESTs in a genomic cluster, we decide if they represent equivalent splicing structures

The compatibility relation is a graph:

Extension: x and y (y extends x)

Inclusion: x and z (z is included in x)

E Eyras et al. Genome Research (2004)
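The pairwise comparison can be sketched as follows. This is a simplified illustration, not the ClusterMerge implementation: it requires splice sites to match exactly, whereas the merging criteria in the talk allow some mismatches. ESTs are represented as sorted lists of exon `(start, end)` genomic coordinates; the function names and the example coordinates are hypothetical.

```python
def introns(exons):
    """Introns implied by consecutive exons, as (donor, acceptor) pairs."""
    return [(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)]


def compatible(a, b):
    """True if a and b overlap and agree on every intron in the overlap."""
    lo = max(a[0][0], b[0][0])
    hi = min(a[-1][1], b[-1][1])
    if lo > hi:
        return False  # the two ESTs do not overlap on the genome

    def introns_in_overlap(est):
        return {i for i in introns(est) if i[1] > lo and i[0] < hi}

    return introns_in_overlap(a) == introns_in_overlap(b)


def relation(a, b):
    """Return 'inclusion', 'extension', or None if incompatible."""
    if not compatible(a, b):
        return None
    a_in_b = b[0][0] <= a[0][0] and a[-1][1] <= b[-1][1]
    b_in_a = a[0][0] <= b[0][0] and b[-1][1] <= a[-1][1]
    return "inclusion" if (a_in_b or b_in_a) else "extension"


x = [(100, 200), (300, 400), (500, 600)]
z = [(150, 200), (300, 350)]   # redundant with x
y = [(550, 600), (700, 800)]   # extends x
w = [(320, 400), (450, 600)]   # agrees with z, conflicts with x

for name, est in (("z", z), ("y", y), ("w", w)):
    print("x vs", name, relation(x, est))
print("z vs w", relation(z, w))
```

The example reproduces the situation described in the talk: z is included in x, y extends x, and w is compatible with z but not with x, which is why all relations are kept rather than discarding z.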

Page 34: Alternative Splicing from ESTs

Criteria of “merging”

Allow internal mismatches

Allow intron mismatches

Allow edge-exon mismatches

mismatches

Is this intron real?

Page 35: Alternative Splicing from ESTs

Transitivity

[Figure: extension and inclusion relations among ESTs x, y, z and w.]

This reduces the number of comparisons needed

Page 36: Alternative Splicing from ESTs

ClusterMerge graph

Each node defines an inclusion sub-tree

Extensions form acyclic graphs

[Figure: ESTs x, y, z and w and the graph built from their extension and inclusion relations.]

E Eyras et al. Genome Research (2004)

Page 37: Alternative Splicing from ESTs

Mergeable sets

Example

[Figure: ESTs numbered 1 to 7.]

Page 38: Alternative Splicing from ESTs

Mergeable sets

Example

[Figure: ESTs numbered 1 to 7 and the graph built from them, with one node per EST.]

Page 39: Alternative Splicing from ESTs

Mergeable sets

Example

[Figure: ESTs numbered 1 to 7 and the graph built from them, with the root and the leaves marked.]

Page 40: Alternative Splicing from ESTs

Mergeable sets

Example

[Figure: ESTs numbered 1 to 7 and the graph built from them, with the root and the leaves marked.]

Lists produced: (1,2,3,5,6,7) (1,2,3,4,5,7)

Page 41: Alternative Splicing from ESTs

Deriving the transcripts from the lists

Internal Splice Sites: external coordinates of the 5’ and 3’ exons are not allowed to contribute

Page 42: Alternative Splicing from ESTs

Deriving the transcripts from the lists

Splice Sites: are set to the most common coordinate

5’ and 3’ coordinates: are set to the exon coordinate that extends the potential UTR the most
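The three rules for turning a list of ESTs into a transcript can be sketched as follows. This is an illustration under assumptions, not the published code: ESTs are sorted lists of exon `(start, end)` coordinates on the forward strand, and the function names are hypothetical.

```python
from collections import Counter


def internal_sites(exons):
    """Splice-site coordinates of one EST.

    The start of the first exon and the end of the last exon are EST
    ends, not splice sites, so they do not contribute.
    """
    donors = [end for _, end in exons[:-1]]
    acceptors = [start for start, _ in exons[1:]]
    return donors, acceptors


def consensus_site(candidates):
    """Most common coordinate among the ESTs supporting one splice site."""
    return Counter(candidates).most_common(1)[0][0]


def transcript_ends(ests):
    """Outermost coordinates, extending the potential UTRs the most."""
    return min(e[0][0] for e in ests), max(e[-1][1] for e in ests)


ests = [
    [(100, 200), (300, 400)],
    [(150, 200), (300, 450)],
    [(120, 201), (300, 380)],
]
donors = [internal_sites(e)[0][0] for e in ests]
print(consensus_site(donors))
print(transcript_ends(ests))
```

In the example, two of the three ESTs place the donor site at 200, so that coordinate wins, and the transcript ends are taken from the ESTs that reach furthest in each direction.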

Page 43: Alternative Splicing from ESTs

Single exon transcripts

Reject resulting single exon transcripts when using ESTs

Page 44: Alternative Splicing from ESTs

Alternative splicing and comparative genomics

Page 45: Alternative Splicing from ESTs

Conservation of Alternative Splicing

Degree of conservation: 30-60%

Methods:

1. Comparison of single events

2. Cross-alignment of full transcripts

Page 46: Alternative Splicing from ESTs

Exon Skipping Events

Introns flanking alternatively spliced (skipped) exons have high sequence conservation, higher on average than introns flanking constitutive exons.

R Sorek & G Ast. Genome Research 13:1631-1637, 2003

Page 47: Alternative Splicing from ESTs

Sequences regulating the (Alternative) splicing

Overrepresented sequences in conserved introns (between human and mouse) may be involved in the regulation of alternative splicing.

Overrepresented: found in these introns more often than expected at random AND not found in intronic sequences flanking constitutive exons (and upstream of skipped ones)

[Figure: conserved alternative exon and its flanking introns, with an overrepresented hexamer in the downstream intron.]

R Sorek & G Ast. Genome Research (2003) 13:1631-1637
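A search for overrepresented hexamers can be sketched by counting every 6-mer in two sets of intronic sequences and comparing their relative frequencies. This is a minimal illustration and not the statistical test used in the cited study; the function names, the pseudocount and the toy sequences are assumptions.

```python
from collections import Counter


def hexamer_counts(seqs, k=6):
    """Count all overlapping k-mers in a list of sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def enrichment(foreground, background, pseudocount=1.0):
    """Ratio of relative hexamer frequencies, foreground over background.

    The pseudocount (an arbitrary choice) avoids division by zero for
    hexamers that never occur in the background.
    """
    fg, bg = hexamer_counts(foreground), hexamer_counts(background)
    fg_n, bg_n = sum(fg.values()), sum(bg.values())
    return {
        h: ((fg[h] + pseudocount) / (fg_n + pseudocount))
        / ((bg[h] + pseudocount) / (bg_n + pseudocount))
        for h in fg
    }


# Toy data: introns flanking alternative exons vs. constitutive exons.
alt_introns = ["ATGCATGA", "CTGCATGC"]
const_introns = ["AAAAAAAAAA"]
scores = enrichment(alt_introns, const_introns)
print(max(scores, key=scores.get))
```

With real data the foreground would be the conserved introns flanking skipped exons and the background the introns flanking constitutive exons, and a significance test would replace the raw ratio.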

Page 48: Alternative Splicing from ESTs

Sequences regulating the (Alternative) splicing

Not all types of events are equally conserved. Introns flanking alternative 5’ and 3’ exons, and retained introns, have higher sequence conservation.

[Figure: conserved alternative exon and its flanking introns, with an overrepresented hexamer.]

Sugnet CW, Kent WJ, Ares M Jr, Haussler D. Pac Symp Biocomput. 2004:66-77

Page 49: Alternative Splicing from ESTs

Frame preservation

Frame preserving    Constitutive exons              Alternative exons
All exons           39.7% (Human), 39.5% (Mouse)    41.6% (Human), 44.7% (Mouse)
Conserved exons     40.9% (Human), 38% (Mouse)      51.8% (Human), 51.9% (Mouse)

A Resch et al. Nucleic Acids Research 2004, 32 (4) 1261-1269

Page 50: Alternative Splicing from ESTs

Predicting alternative exons

Page 51: Alternative Splicing from ESTs

Features Differentiating Between Alternatively Spliced and Constitutively Spliced Exons

Feature                                                Alternative exons    Constitutive exons
Average size                                           87                   128
Length = multiple of 3                                 73%                  37%
Average human-mouse exon conservation                  94%                  89%
(A) Exons with upstream intron conserved in mouse      92%                  45%
(B) Exons with downstream intron conserved in mouse    82%                  35%
(A) + (B)                                              77%                  17%

R Sorek et al. Genome Research (2004) 14:1617-1623

(A), (B): an intron is considered conserved if there are at least 12 consecutive matches over 100bp of the intron
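The conservation criterion for (A) and (B) can be sketched as a scan for the longest run of matching columns in a pairwise alignment. The input format is an assumption: two equal-length aligned strings for the human and mouse intron, with `-` for gaps, whose first 100 columns are the flanking region of interest. The function names are hypothetical.

```python
def longest_match_run(seq_a, seq_b):
    """Length of the longest run of identical aligned bases."""
    best = run = 0
    for p, q in zip(seq_a.upper(), seq_b.upper()):
        if p == q and p in "ACGT":
            run += 1
            best = max(best, run)
        else:
            run = 0  # a mismatch, gap or ambiguous base breaks the run
    return best


def intron_conserved(human_aln, mouse_aln, window=100, min_run=12):
    """At least min_run consecutive matches within the first window columns."""
    return longest_match_run(human_aln[:window], mouse_aln[:window]) >= min_run


human = "ACGTACGTACGTAA" + "C" * 86
mouse = "ACGTACGTACGTTT" + "G" * 86
print(longest_match_run(human, mouse), intron_conserved(human, mouse))
```

The example has exactly 12 matching bases at the start of the window, so it just meets the threshold; a single mismatch inside that run would make the intron count as not conserved.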

Page 52: Alternative Splicing from ESTs

Build a classifier to make predictions

• Rule: set of conditions over the parameters,

e.g. “at least 99% conservation with mouse AND divisible by 3, etc…”

• Try all the possible combinations of parameters

• Select the rule that correctly identifies a maximum number of true alternative exons while minimizing the number of false positives

This rule achieved 31% sensitivity and no false positives in a set of known exons:

At least 95% identity with mouse orthologous exon

Exon size is a multiple of 3

An upstream intronic alignment of at least 15bp with at least 85% identity

A downstream intronic exact alignment of at least 12bp

R Sorek et al. Genome Research (2004) 14:1617-1623
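The selected rule (at least 95% exon identity with mouse, size a multiple of 3, an upstream intronic alignment of at least 15bp at 85% identity or more, and a downstream exact intronic alignment of at least 12bp) can be written as one boolean test, and its sensitivity measured on labelled exons. The `Exon` record, its field names and the toy data below are hypothetical; only the four thresholds come from the slide.

```python
from dataclasses import dataclass


@dataclass
class Exon:
    length: int                    # exon size in bp
    mouse_identity: float          # % identity with the mouse orthologous exon
    upstream_aln_len: int          # length of the upstream intronic alignment
    upstream_aln_identity: float   # % identity of that alignment
    downstream_exact_len: int      # length of the downstream exact alignment


def predicted_alternative(e):
    """Apply the four conditions of the rule; all must hold."""
    return (
        e.mouse_identity >= 95.0
        and e.length % 3 == 0
        and e.upstream_aln_len >= 15
        and e.upstream_aln_identity >= 85.0
        and e.downstream_exact_len >= 12
    )


def sensitivity_and_false_positives(exons, is_alternative):
    """Fraction of true alternative exons recovered, and number of false calls."""
    calls = [predicted_alternative(e) for e in exons]
    tp = sum(1 for c, t in zip(calls, is_alternative) if c and t)
    fp = sum(1 for c, t in zip(calls, is_alternative) if c and not t)
    return tp / sum(is_alternative), fp


# Toy data, invented for illustration only.
exons = [
    Exon(87, 97.0, 20, 90.0, 14),   # alternative, passes the rule
    Exon(88, 97.0, 20, 90.0, 14),   # alternative, length not a multiple of 3
    Exon(129, 89.0, 0, 0.0, 0),     # constitutive
]
labels = [True, True, False]
print(sensitivity_and_false_positives(exons, labels))
```

Because every condition must hold, the rule is strict: it trades sensitivity for the absence of false positives, which matches the reported result.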

Page 53: Alternative Splicing from ESTs

Summary

Alternative splicing is a mechanism to generate functional diversity

We can study alternative splicing using ESTs (Expressed Sequence Tags)

EST data are fragmented and noisy: they need to be processed

Some alternative splicing is conserved across species (Human-Mouse)

Prediction of alternative (conserved) exons is possible (with a classifier) but not ab initio

Evolution of alternative splicing?

Page 54: Alternative Splicing from ESTs

THE END