The evolution of expression patterns in the Arabidopsis genome

Todd VisionDepartment of Biology

University of North Carolina at Chapel Hill

Driving forces in genome evolution

• Proximate vs. ultimate explanations
• Deleterious mutations are frequent, and selection cannot effectively act on all of them:
  – Substitutions
  – Insertions and deletions
  – Duplications
  – Transpositions

• Cellular processes will be affected by this rain of mutations

• At the molecular level, we must entertain ultimate explanations that do not invoke adaptation

An example: Codon bias

• Genes differ in the frequency with which they use the preferred codon for a given amino acid, thereby affecting:
  – Translational efficiency
  – Translational accuracy

• The strongest codon bias is typically seen in short, highly expressed genes under strong purifying selection

• Realized codon bias is a balance between selection for preferred codons and a continual rain of mutations toward unpreferred codons
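The slide does not name a codon-bias statistic. One common summary is the frequency of optimal codons (Fop): among codons for amino acids that have a designated preferred codon, the fraction that use it. The sketch below assumes a small, hypothetical table of preferred codons (`SYNONYMS` and `PREFERRED` are illustrative only, not measured Arabidopsis values).

```python
from collections import Counter

# HYPOTHETICAL illustration: a few amino acids, their synonymous codons, and an
# assumed "preferred" codon for each. Not a measured Arabidopsis table.
SYNONYMS = {
    "Phe": {"TTT", "TTC"},
    "Lys": {"AAA", "AAG"},
    "Glu": {"GAA", "GAG"},
}
PREFERRED = {"Phe": "TTC", "Lys": "AAG", "Glu": "GAG"}


def fop(cds: str) -> float:
    """Frequency of optimal codons in an in-frame coding sequence.

    Only codons for amino acids listed in SYNONYMS are counted; returns NaN
    when none are present.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    total = optimal = 0
    for aa, codons in SYNONYMS.items():
        total += sum(counts[c] for c in codons)
        optimal += counts[PREFERRED[aa]]
    return optimal / total if total else float("nan")
```

For example, `fop("TTCAAAGAG")` counts two preferred codons (TTC, GAG) out of three eligible ones, giving 2/3. Under mutational rain toward unpreferred codons, a gene's Fop drifts down unless selection pushes it back up.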

What are the consequences of mutational rain on the regulatory networks that modulate gene expression?

Outline

• Arabidopsis gene expression (MPSS)

• Two issues in the evolution of expression profiles:
  – Physical clustering of co-expressed genes
  – Divergence of duplicated genes

Digital expression profiling

• “Bar-code” counting raises fewer concerns than hybridization arrays about cross-hybridization, probe selection, background hybridization, etc.

• Serial Analysis of Gene Expression (SAGE)
  – Counts occurrences of 10-12 bp mRNA signatures
  – Long SAGE: 21-22 bp signatures
  – Uses conventional sequencing technology

• Massively Parallel Signature Sequencing (MPSS)
  – Counts occurrences of 17-20 bp mRNA signatures
  – Cloning and sequencing are done on microbeads
  – Commercialized by Lynx Therapeutics
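Both SAGE and MPSS reduce expression measurement to counting: each sequenced signature is a bar-code, and its abundance is how often it appears. A minimal sketch, assuming the sequenced signatures are already available as strings, counts them and scales to transcripts per million (TPM), a normalization commonly used for MPSS abundances; `signature_tpm` is a hypothetical helper name.

```python
from collections import Counter
from typing import Dict, Iterable


def signature_tpm(signatures: Iterable[str]) -> Dict[str, float]:
    """Count each distinct signature and scale counts to transcripts per million."""
    counts = Counter(signatures)
    total = sum(counts.values())
    # Dividing by the library total makes libraries of different depth comparable.
    return {sig: n * 1_000_000 / total for sig, n in counts.items()}
```

With three copies of one signature and one copy of another, the first scores 750,000 TPM and the second 250,000 TPM. Normalizing per million matters here because the libraries below differ in sequencing depth by roughly twofold.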

MPSS library construction

Figure (Brenner et al., PNAS 97:1665-70):
1. Extract mRNA from tissue.
2. Convert to cDNA.
3. Cut with Sau3A (recognition site GATC).
4. Add linker.
5. Add a standard primer at the 5’ end (added by cloning) and a unique 32 bp tag plus standard primer at the 3’ end; amplify by PCR.
6. Remove the 3’ primer and expose the single-stranded unique tag (digestion with a 3’→5’ exonuclease).
7. Anneal to beads coated with unique anti-tags (32 bp, complementary to the tag on the mRNA).

MPSS library construction

The result of the library construction is a set of microbeads. Each bead contains many DNA molecules, all derived from the 3’ end of a single transcript.

Beads are loaded in a monolayer on a microscope slide for sequencing of 17-20 bp from the 5’ end.

Figure: beads are sorted by FACS to remove ‘empty’ beads (Brenner et al., PNAS 97:1665-70).

MPSS Sequencing

Figure (Brenner et al., Nat. Biotech. 18:630-4): one sequencing cycle
1. Add encoded adaptors to the exposed 4-base overhang.
2. Sequence by hybridization (16 decoder cycles for 4 bp).
3. Digest with a Type IIS enzyme to uncover the next 4 bases.
4. Repeat the cycle: the signature is read in steps of four bases, with the overhang shifted by four bases in each round.
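The cycle above can be illustrated with a toy model, not a description of the real chemistry: each 4-base overhang is identified by 16 queries, one per (position, base) pair, standing in for the 16 decoder hybridizations, after which the window advances four bases as if a Type IIS cut had exposed the next overhang. The function names and the `read_len=16` default are hypothetical choices for illustration.

```python
from typing import Tuple

BASES = "ACGT"


def read_overhang(overhang: str) -> Tuple[str, int]:
    """Toy decode of one 4-base overhang: one query per (position, base) pair."""
    decoded = []
    queries = 0
    for pos in range(4):
        for base in BASES:
            queries += 1                      # stands in for one decoder hybridization
            if overhang[pos] == base:
                decoded.append(base)
    return "".join(decoded), queries


def mpss_read(template: str, read_len: int = 16) -> Tuple[str, int]:
    """Read `template` four bases per cycle, shifting the window by four."""
    if read_len % 4 or len(template) < read_len:
        raise ValueError("read_len must be a multiple of 4 and fit the template")
    parts, total = [], 0
    for start in range(0, read_len, 4):       # each step ~ one Type IIS cut
        bases, q = read_overhang(template[start:start + 4])
        parts.append(bases)
        total += q
    return "".join(parts), total
```

Reading 16 bases this way takes four cycles and 64 queries, which is why the method scales by parallelizing across a million beads rather than by speeding up any one read.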

MPSS Sequencing

Each bead provides a signature of 17-20 bp.

Tag #    Signature sequence    # of beads (frequency)
1        GATCAATCGGACTTGTC
2        GATCGTGCATCAGCAGT
3        GATCCGATACAGCTTTG
4        GATCTATGGGTATAGTC
5        GATCCATCGTTTGGTGC
6        GATCCCAGCAAGATAAC
7        GATCCTCCGTCTTCACA
8        GATCACTTCTCTCATTA
9        GATCTACCAGAACTCGG
..       ..                    ..
30,285   GATCGGACCGATCGACT     2,935

Two sets of signatures are generated from each sample, in different reading frames staggered by two bases.

Total # of tags: >1,000,000

A catalog of signatures in the Arabidopsis genome

All potential signatures (GATC + 13 bp) are identified on both strands of the genomic sequence.

There is approximately one potential signature every 293 bp on each strand of the genome.

A signature is classified according to its position relative to the 29,084 genes and pseudogenes in the TIGR annotation.

Signatures may not be unique; the number of ‘hits’ in the genome is recorded.

Hits     Arabidopsis genome   % of total   Random
1        748,204              87.407%      845,057
2        88,392               10.326%      6,134
3        11,019               1.287%       21
4        3,512                0.410%       0
5        1,452                0.170%       0
6        874                  0.102%       0
7        470                  0.055%       0
8        326                  0.038%       0
9        237                  0.028%       0
10       192                  0.022%       0
11       158                  0.018%       0
12-20    707                  0.083%       0
21-30    247                  0.029%       0
31-50    124                  0.014%       0
>50      86                   0.010%       0
Total    851,212                           851,212
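The catalog step above can be sketched as follows, assuming the genome is held in memory as an A/C/G/T string; the helper names are hypothetical. GATC is its own reverse complement, so every site is shared by both strands, but the 13 bp that follow it differ, so each site can yield one candidate signature per strand. (For comparison, a random sequence with equal base frequencies would contain GATC about once every 4^4 = 256 bp per strand.)

```python
from collections import Counter
from typing import List

SITE = "GATC"   # Sau3A recognition site anchoring every signature
SIG_LEN = 17    # GATC + 13 bp, as on the slide

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def strand_signatures(seq: str) -> List[str]:
    """All GATC + 13 bp windows reading 5'->3' along one strand."""
    sigs = []
    start = seq.find(SITE)
    while start != -1:
        if start + SIG_LEN <= len(seq):      # skip sites too close to the end
            sigs.append(seq[start:start + SIG_LEN])
        start = seq.find(SITE, start + 1)
    return sigs


def potential_signatures(genome: str) -> Counter:
    """Map each potential signature to its number of genomic hits (both strands)."""
    genome = genome.upper()
    return Counter(strand_signatures(genome) + strand_signatures(revcomp(genome)))
```

Signatures with exactly one hit (`n == 1` in the returned counter) are the unique ones that can be assigned unambiguously to a locus; the rest correspond to the lower rows of the table.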

Classifying signatures

Figure: typical signature positions relative to a gene model, annotated with interpretations: potential alternative splicing or nested gene; potential alternative termination; potential un-annotated ORF; potential anti-sense transcript; anti-sense transcript or nested gene?; duplicated (expression may be from another site in the genome).

Triangles refer to colors used on our web page:
Class 1 - in an exon, same strand as ORF.
Class 2 - within 500 bp after stop codon, same strand as ORF.
Class 3 - anti-sense of ORF (like Class 1, but on opposite strand).
Class 4 - in genome but NOT class 1, 2, 3, 5 or 6.
Class 5 - entirely within intron, same strand.
Class 6 - entirely within intron, anti-sense.
Class 0 - signatures found in the expression libraries but not the genome.
Grey = potential signature NOT expressed.
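A minimal sketch of the class rules above for one signature against one gene model. Assumptions, all hypothetical simplifications: coordinates are 1-based and inclusive; `exons` are coding exons, so the outermost coding base on the 3' side ends the stop codon; a signature must lie entirely within one feature, so signatures spanning an exon/intron boundary fall to Class 4 here. The real pipeline compares each signature against all genes and records secondary classes where genes overlap.

```python
from dataclasses import dataclass
from typing import List, Tuple

UTR_WINDOW = 500  # Class 2 window after the stop codon (from the slide)


@dataclass
class GeneModel:
    strand: str                   # "+" or "-"
    exons: List[Tuple[int, int]]  # coding exons, 1-based inclusive, any order


def _inside(a: int, b: int, lo: int, hi: int) -> bool:
    """True if interval [a, b] lies entirely within [lo, hi]."""
    return lo <= a and b <= hi


def classify(sig_start: int, sig_end: int, sig_strand: str, gene: GeneModel) -> int:
    """Assign Class 1-6 to a genomic signature relative to one gene model."""
    exons = sorted(gene.exons)
    same = sig_strand == gene.strand
    if any(_inside(sig_start, sig_end, s, e) for s, e in exons):
        return 1 if same else 3                       # sense / anti-sense exonic
    introns = [(e1 + 1, s2 - 1) for (_, e1), (s2, _) in zip(exons, exons[1:])]
    if any(_inside(sig_start, sig_end, s, e) for s, e in introns):
        return 5 if same else 6                       # sense / anti-sense intronic
    lo, hi = exons[0][0], exons[-1][1]
    if same and gene.strand == "+" and _inside(sig_start, sig_end, hi + 1, hi + UTR_WINDOW):
        return 2                                      # 3' of stop, plus strand
    if same and gene.strand == "-" and _inside(sig_start, sig_end, lo - UTR_WINDOW, lo - 1):
        return 2                                      # 3' of stop, minus strand
    return 4
```

Class 0 cannot arise here because it describes library signatures with no genomic match at all, which would be detected before any gene model is consulted.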

Arabidopsis signatures

Class                      # in genome   % of total
1  sense exonic            203,174       24.0
2  3’UTR, <500 bp          44,202        5.2
3  anti-sense exonic       197,065       23.3
4  intergenic              288,109       34.0
5  intronic                60,817        7.2
6  anti-sense intronic     57,845        6.8
TOTAL                      851,212       100.5

Based on TIGR annotation (release 3.0, July 2002)

355 genes lack potential Class 1 or 2 signatures and are therefore undetectable

On average, there are 8.5 class 1 & 2 signatures per gene

8422 genomic signatures have secondary classes due to overlap or near overlap of two genes in the TIGR annotation.

Core Arabidopsis MPSS libraries, sequenced by Lynx for Blake Meyers, U. of Delaware

Library    Signatures sequenced    Distinct signatures
Root       3,645,414               48,102
Shoot      2,885,229               53,396
Flower     1,791,460               37,754
Callus     1,963,474               40,903
Silique    2,018,785               38,503
TOTAL      12,304,362              133,377

Genome-wide expression profiling in Arabidopsis

Of the 29,084 gene models, 14,674 match unique, expressed signatures

Figure panels: Chr. I, Chr. II, Chr. III, Chr. IV, Chr. V.

http://www.dbi.udel.edu/mpss

Query by:
• Sequence
• Arabidopsis gene identifier
• Chromosomal position
• BAC clone ID
• MPSS signature
• Library comparison

Site includes:
• Library and tissue information
• FAQs and help pages

Physical clustering of co-expression

• Caenorhabditis elegans
  – Roy et al. (2002) Nature 418, 975
  – Lercher et al. (2003) Genome Research 13, 238
• Drosophila melanogaster
  – Boutanaev et al. (2002) Nature 420, 666
  – Spellman and Rubin (2002) J Biology 1, 5
• Homo sapiens
  – Caron et al. (2001) Science 291, 1289
  – Lercher et al. (2002) Nature Genetics 31, 180
• Saccharomyces cerevisiae
  – Cohen et al. (2000) Nature Genetics 26, 183
  – Hurst et al. (2002) Trends in Genetics 18, 604
  – Mannila et al. (2002) Bioinformatics 18, 482

• What are the proximate explanations?
  – Shared cis-regulatory elements
  – Chromatin packaging, etc.

• What are the ultimate explanations?
  – Adaptive: greater transcriptional efficiency/accuracy?
  – Maladaptive: mutational rain chipping away at insulators and other mechanisms that override regional controllers of gene expression?

Measuring expression distance

Figure: expression profiles compared across library 1, library 2 and library 3.
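The slides do not define the expression distance. One plausible reading is to treat each gene's abundances across the libraries as a vector and measure how far apart two genes' vectors are. The sketch below assumes Euclidean distance on log2(abundance + 1) values; both the log transform and the metric are hypothetical choices, not the author's stated method.

```python
import math
from typing import Sequence


def expression_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two genes' log2(abundance + 1) profiles.

    a[i] and b[i] are the two genes' abundances (e.g. TPM) in library i.
    The log2(x + 1) transform is an illustrative assumption.
    """
    if len(a) != len(b):
        raise ValueError("profiles must cover the same libraries")
    return math.sqrt(sum((math.log2(x + 1) - math.log2(y + 1)) ** 2
                         for x, y in zip(a, b)))
```

The log transform keeps one very highly expressed library from dominating the distance. If profile shape matters more than absolute level, 1 minus the Pearson correlation is a common alternative.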

Clustering of tissue-specific expression

Figure: tissue-specific expression along chromosome 1. Colors: flower (red), silique (violet), leaf (green), root (blue), callus (white).

Statistical tests of coexpression clustering

• Measured median pairwise expression distance (MPED) in non-overlapping windows of 20 genes
  – Summed unique class 1 and 2 signatures for each gene
  – Only one gene within each tandemly arrayed family was counted

• Out of 100 shuffles of gene order:
  – Zero shuffles had as many windows with small MPED (less than 1.5) as the unshuffled data
  – Zero shuffles had as large a variance in MPED among windows as the unshuffled data
  – Hence the empirical P is below 0.01 for both statistics
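The test above can be sketched as follows. Assumptions: `genes` is a list of per-gene abundance profiles already in chromosomal order, with class 1 and 2 signatures summed and tandem families reduced to one gene; the distance metric is the same hypothetical log2 Euclidean distance as before, not necessarily the one used. An empirical P can be taken as (k + 1)/(n + 1), where k of n shuffles match or beat the observed statistic.

```python
import itertools
import math
import random
import statistics
from typing import List, Sequence, Tuple

Profile = Sequence[float]  # one gene's abundances across libraries


def distance(a: Profile, b: Profile) -> float:
    # HYPOTHETICAL metric: Euclidean distance between log2(x + 1) profiles.
    return math.sqrt(sum((math.log2(x + 1) - math.log2(y + 1)) ** 2
                         for x, y in zip(a, b)))


def window_meds(genes: Sequence[Profile], window: int = 20) -> List[float]:
    """MPED for each non-overlapping window of `window` consecutive genes."""
    meds = []
    for start in range(0, len(genes) - window + 1, window):
        block = genes[start:start + window]
        meds.append(statistics.median(
            distance(a, b) for a, b in itertools.combinations(block, 2)))
    return meds


def shuffle_test(genes: Sequence[Profile], window: int = 20,
                 threshold: float = 1.5, n_shuffles: int = 100,
                 seed: int = 0) -> Tuple[int, int]:
    """Count shuffles of gene order that match or beat the observed data.

    Returns (shuffles with at least as many windows with MPED < threshold,
             shuffles with at least as large a variance in MPED).
    Needs at least two windows so that a variance can be computed.
    """
    rng = random.Random(seed)
    observed = window_meds(genes, window)
    obs_low = sum(m < threshold for m in observed)
    obs_var = statistics.variance(observed)
    as_low = as_var = 0
    order = list(genes)
    for _ in range(n_shuffles):
        rng.shuffle(order)                      # random chromosomal order
        meds = window_meds(order, window)
        as_low += sum(m < threshold for m in meds) >= obs_low
        as_var += statistics.variance(meds) >= obs_var
    return as_low, as_var
```

A result of `(0, 0)` from 100 shuffles corresponds to the slide's outcome. Each window of 20 genes has 190 pairs, so a genome-wide run in pure Python is slow but feasible; vectorizing the distance matrix would be the obvious optimization.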

Coexpression in Arabidopsis

Selection and recombination

• In regions of low recombination:
  – deleterious mutations can hitchhike to high frequency along with favorable ones
  – favorable mutations are kept at low frequency by linkage to deleterious ones

• Therefore, the effectiveness of natural selection depends on the local recombination rate

• Are clusters more concentrated in regions of:
  – high recombination (i.e., are they adaptive?), or
  – low recombination (i.e., are they maladaptive?)

Measuring recombination rate

Figure (chromosome 1): genetic distance (cM) and recombination rate (cM/Mb) plotted against physical distance (Mb).
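The local recombination rate is the slope of genetic position (cM) against physical position (Mb). The slides do not say how that curve was smoothed, so the sketch below assumes a least-squares slope over all markers within ±2 Mb of each marker; both the window width and the linear fit are illustrative choices.

```python
from typing import List, Tuple

Marker = Tuple[float, float]  # (physical position in Mb, genetic position in cM)


def slope(points: List[Marker]) -> float:
    """Least-squares slope of cM on Mb, i.e. a recombination rate in cM/Mb."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    return sxy / sxx


def recombination_rates(markers: List[Marker],
                        half_width: float = 2.0) -> List[Tuple[float, float]]:
    """Estimate cM/Mb at each marker from markers within +/- half_width Mb."""
    markers = sorted(markers)
    rates = []
    for pos, _ in markers:
        nearby = [m for m in markers if abs(m[0] - pos) <= half_width]
        if len({x for x, _ in nearby}) >= 2:   # need two distinct positions
            rates.append((pos, slope(nearby)))
    return rates
```

For markers lying on a straight line of 4 cM per Mb, every estimate is 4 cM/Mb. Real genetic maps are noisier and can be locally non-monotone, which is why some smoothing window is needed at all.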

Co-expression is greater in low recombination regions

Figure: expression distance versus recombination rate (cM/Mb), binned from 2 to >10 cM/Mb.

Co-expression clusters

• MPSS data provide evidence for clusters of co-expression among unrelated genes in Arabidopsis

• Co-expression is greater in regions of low recombination

• Thus, co-expression clusters may be maladaptive, at least on average

Divergence of duplicated genes

Figure: expression distance versus age of duplication.

Duplicated genes in Arabidopsis

Modes of gene duplication

• Tandem (unequal crossing-over)

• Dispersed (transposition)

• Segmental (polyploidy)

Divergence of duplicated genes

• All gene families of size 2 in Arabidopsis were classified as ‘dispersed’, ‘segmental’ or ‘tandem’

• Expression distance was calculated for each pair
• The number of silent (i.e., synonymous) substitutions per site was calculated for each pair (as a proxy for age since duplication)
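The slide does not say how silent substitutions per site were estimated. A common approach, assumed here rather than taken from the source, counts synonymous differences $S_d$ and synonymous sites $S$ between the two duplicates (as in the Nei-Gojobori method) and then applies the Jukes-Cantor correction for multiple hits:

```latex
p_S = \frac{S_d}{S}, \qquad
K_S = -\frac{3}{4}\,\ln\!\left(1 - \frac{4}{3}\,p_S\right)
```

With illustrative numbers $S_d = 30$ and $S = 200$, $p_S = 0.15$ and $K_S = -0.75 \ln(0.8) \approx 0.167$. The correction is undefined once $p_S \ge 3/4$, which is one reason $K_S$ saturates as a clock for very old duplicates.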

Divergence and mode of duplication

Figure: expression distance versus silent substitutions per site (×10), for dispersed, segmental and tandem duplicates.

Divergence of duplicated genes

• Almost all expression divergence occurs during (or immediately following) duplication

• Initial expression divergence is more extreme for tandem than dispersed duplicates

• Tandem and dispersed duplicates with the most divergent expression profiles are quickly lost

• Segmental duplicates plateau at a lower level of expression divergence than dispersed duplicates

• The average divergence in relative expression level in each tissue is about 8-fold.
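The slides do not say how the 8-fold figure was computed. One hypothetical way to summarize per-tissue divergence between two duplicates is to convert each gene's counts to relative expression (its share across tissues, with a pseudocount of 1), take the absolute log2 ratio in each tissue, average, and convert back to a fold change. A mean |log2| of 3 corresponds to 2^3 = 8-fold. The function names and the pseudocount are assumptions.

```python
import math
from typing import List, Sequence


def relative(profile: Sequence[float], pseudocount: float = 1.0) -> List[float]:
    """Share of a gene's (pseudocounted) expression falling in each tissue."""
    shifted = [x + pseudocount for x in profile]
    total = sum(shifted)
    return [x / total for x in shifted]


def mean_fold_divergence(a: Sequence[float], b: Sequence[float]) -> float:
    """Geometric-mean fold difference in relative expression across tissues."""
    ra, rb = relative(a), relative(b)
    mean_log2 = sum(abs(math.log2(x / y)) for x, y in zip(ra, rb)) / len(ra)
    return 2 ** mean_log2
```

Two duplicates with mirror-image profiles such as `[7, 0]` and `[0, 7]` come out at about 8-fold, while identical profiles give exactly 1. Working on a log scale means a 4-fold increase in one tissue and a 4-fold decrease in another contribute equally.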

Lessons learned

• Clusters of co-expression in Arabidopsis may be largely the result of a rain of weakly deleterious mutations that homogenize the expression profiles of neighboring genes

• Divergence in expression profile between duplicated genes is dependent on the nature of the mutation that gave rise to the duplication

Thanks!

• UNC Chapel Hill
  – Jianhua Hu

• University of Delaware
  – Blake Meyers

• NSF Plant Genome Research Program
  – DBI-01103267 (TJV)
  – DBI-0110528 (BCM)
