Patrocles: a database of polymorphic miR-mediated gene regulation in vertebrates
Denis Baurain, Samuel Hiard, Wouter Coppieters, Carole Charlier, Michel Georges


Polymorphic miR-mediated gene regulation

[Figure: miR biogenesis and silencing pathway. Labels: host gene (?), pri-miR, Drosha complex, pre-miR (nucleus); Exportin5; Dicer, miR/miR*, helicase, mature miR, miRNP (cytoplasm); miRNP on the 3’-UTR of a polyadenylated mRNA. Compartments: targets (1), miRs (100s), silencing machinery (overall effect).]

DNA Sequence Polymorphisms: DSPs

Considerable sequence space is devoted to miR-mediated gene regulation (targets, miRs, silencing machinery)

DSPs in silencing components are likely to contribute to (complex) phenotypic variation including disease

Proof in animals: Texel sheep, Clop et al. (2006) Nat. Genet. 38:813-818
Suggestions in humans: Sethupathy & Collins (2008) TIG 24:489-497

http://www.patrocles.org/

Mining public databases for SNPs and other DSPs in the 3 sequence compartments

Patrocles - Overview

[Figure: data sources feeding Patrocles. miRBase: miRs, 8nt motifs. UCSC: alignments. Ensembl: 3’-UTRs, SNPs. SymAtlas, GEO: gene expr. DGV: CNVs. HapMap, 1000 genomes: genotypes, allele freqs. Literature: 8nt motifs, miR expr., CNVs, eQTL, machinery. Caption: Updates and Synchronization...]

Currently, 7 species: human, chimp, mouse, rat, dog, cow, chicken

                                  human                 mouse

3’-UTRs
  genes                           24,319                21,911
  sequence space                  26,261,732            21,634,548

SNPs in 3’-UTRs
  total                           136,159               126,589
  known ancestral allele          114,305 (83.9%)       111,178 (87.8%)

target site motifs
  X-octamers                      540                   540
  miR                             676                   484
  miR*                            170                   117
  L-octamers                      683                   466
  X OR L-octamers                 1,164                 948
  X AND L-octamers                59                    58

target sites in 3’-UTRs
  X-targets                       323,833               267,644
  L-targets                       375,054               219,392
  X OR L-targets                  661,187               455,620
  X AND L-targets                 37,700                31,416
  conserved X AND L-targets       10,425 (27.7%)        9,436
  conserved X NOT L-targets       64,010 (22.4%)        57,154
  conserved L NOT X-targets       30,290 (9.0%)         19,595
  sequence space                  4,072,176 (15.5%)     2,674,395 (12.4%)

Friedman et al. (2009) Genome Res. 19:92-105

Targets – Methods

2 collections of 8nt motifs

X-targets: 540 8nt motifs (mammals), conserved in 3’-UTRs, putative miR target sites
  Xie et al. (2005) Nature 434:338-345

L-targets: 683 8nt motifs (human), rc(2-8nt)+A from mature miRs in miRBase
  Lewis et al. (2005) Cell 120:15-20

2 collections of 7nt motifs (from L-targets): 7mer-A1, 7mer-m8
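
The L-target rule, rc(2-8nt)+A, can be sketched in a few lines of Python. This is only an illustration, not the Patrocles code: the function names are hypothetical, and the two 7nt variants are assumed to follow the usual convention (7mer-m8 = reverse complement of miR nt 2-8; 7mer-A1 = reverse complement of nt 2-7 followed by A). The test sequence is hsa-miR-29b-2*, which this talk writes 3'->5' in the SNP methods example.

```python
# Hypothetical helper names; only the rc(2-8nt)+A rule comes from the slide.
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_as_dna(rna: str) -> str:
    """Reverse-complement an RNA string and return it in the DNA alphabet."""
    return rna.translate(_COMPLEMENT)[::-1].replace("U", "T")


def l_octamer(mature_mir: str) -> str:
    """8nt motif: reverse complement of miR nt 2-8, followed by an A."""
    return reverse_complement_as_dna(mature_mir[1:8]) + "A"


def seven_mers(mature_mir: str) -> dict:
    """Assumed 7nt variants, both taken as substrings of the octamer."""
    octamer = l_octamer(mature_mir)
    return {"7mer-m8": octamer[:7], "7mer-A1": octamer[1:]}


# hsa-miR-29b-2* as written 3'->5' in the talk, flipped here to 5'->3'.
MIR_29B_2_STAR = "GAUUCGGUGGUACACUUUGGUC"[::-1]

print(l_octamer(MIR_29B_2_STAR))   # GAAACCAA
print(seven_mers(MIR_29B_2_STAR))
```

The octamer obtained this way, GAAACCAA, is the one the talk assigns to hsa-miR-29b-2*.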

Targets - Concordance between X and L target site motifs

X: 540 8mers, 577 7mers, 554 6mers

L: 683 8mers, 1265 7mers, 1448 6mers

91%

40%

Targets - Conserved vs. total numbers of target sites

1. human: A   ...TTTGGTGAAACCAAC...  => ancestral allele
   human: G   ...TTTGGTGGAACCAAC...  => derived allele
   chimp      ...TTTGGTGAAACCAAC...  => sibling species

2. rat        ...TTTGGTGAAACAAAC...
   mouse      ...CTTGGTGAAACAAAC...

3. dog        ...TTTGGTGAAACTAAC...
   cow        ...TTTGGTGAAACTAAC...

(3/3) TTTGGTGA
(3/3) TTGGTGAA
(3/3) TGGTGAAA
(3/3) GGTGAAAC
(2/3) not in dog/cow  gtgaaacc
(2/3) not in dog/cow  tgaaacca
(2/3) not in dog/cow  gaaaccaa => hsa-miR-29b-2*
(2/3) not in dog/cow  aaaccaac

Targets - Patrocles SNPs - Methods

1. human: A   ...TTTGGTGAAACCAAC...  => ancestral allele
                     |||||||||
      3'-GAUUCGGUGGUACACUUUGGUC-5'  => hsa-miR-29b-2*
                     |||.|||||
   human: G   ...TTTGGTGGAACCAAC...  => derived allele
   chimp      ...TTTGGTGAAACCAAC...  => sibling species

2. rat        ...TTTGGTGAAACAAAC...
   mouse      ...CTTGGTGAAACAAAC...

3. dog        ...TTTGGTGAAACTAAC...
   cow        ...TTTGGTGAAACTAAC...

(3/3) TTTGGTGA
      ........
      ........
(2/3) not in dog/cow  gaaaccaa => hsa-miR-29b-2*
(2/3) not in dog/cow  aaaccaac
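
The comparison of the two alleles can be sketched as follows: list every octamer that covers the SNP for each allele, then intersect with a set of target site motifs. This is a minimal sketch, not the Patrocles pipeline; the function names are hypothetical, and the motif set holds only the single octamer that the slide assigns to hsa-miR-29b-2*.

```python
def octamers_over_snp(flank5, allele, flank3, k=8):
    """All k-mers of flank5 + allele + flank3 that cover the SNP position."""
    seq = flank5 + allele + flank3
    snp = len(flank5)
    first = max(0, snp - k + 1)
    last = min(snp, len(seq) - k)
    return {seq[i:i + k] for i in range(first, last + 1)}


def snp_effect(flank5, anc, der, flank3, motifs):
    """Motifs present with only one of the two alleles."""
    anc_sites = octamers_over_snp(flank5, anc, flank3) & motifs
    der_sites = octamers_over_snp(flank5, der, flank3) & motifs
    return {"destroyed": anc_sites - der_sites,
            "created": der_sites - anc_sites}


# Human example from the slide: ...TTTGGTG[A/G]AACCAAC...
MOTIFS = {"GAAACCAA"}  # octamer assigned to hsa-miR-29b-2* on the slide
effect = snp_effect("TTTGGTG", "A", "G", "AACCAAC", MOTIFS)
print(effect)  # {'destroyed': {'GAAACCAA'}, 'created': set()}
```

With the 7nt flanks shown on the slide, the ancestral allele yields the same eight octamers that the slide lists, and the derived G allele loses GAAACCAA.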

Targets - Patrocles SNPs - Methods

site \ allele    anc    der     ?
conserved        DC     (CC)
not cons.        DNC    CNC     P

+S   +W7C / S7C
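
One possible reading of this classification is sketched below. It rests on assumptions that the slide does not spell out: a site present only with the ancestral allele counts as destroyed (DC if conserved, DNC otherwise), a site present only with the derived allele counts as created (CC or CNC), and P is used when the ancestral allele is unknown. The S, W7C and S7C classes are not covered, and the function name is hypothetical.

```python
def classify_psnp(site_in_anc, site_in_der, conserved, ancestral_known=True):
    """Return an assumed pSNP class label, or None if the site is unaffected.

    When the ancestral allele is unknown, site_in_anc and site_in_der
    simply refer to the two alleles in arbitrary order.
    """
    if site_in_anc == site_in_der:
        return None            # the SNP does not change this site
    if not ancestral_known:
        return "P"             # assumed meaning: polarity unknown
    if site_in_anc:
        return "DC" if conserved else "DNC"
    return "CC" if conserved else "CNC"


print(classify_psnp(True, False, conserved=True))    # DC
print(classify_psnp(False, True, conserved=False))   # CNC
print(classify_psnp(True, False, conserved=False, ancestral_known=False))  # P
```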

Targets - Patrocles SNPs - Results

                   human                  mouse
pSNP class     Xie        Lewis       Xie        Lewis
total          20,679     26,719      19,657     17,505
DC+CC          1,546+50   959+58      951+102    496+65
DNC            7,392      10,328      7,732      7,250
CNC            9,006      11,244      8,545      7,573
P              1,944      3,295       2,290      2,065
S              741        837         37         56

# destructions = # creations

Targets - Patrocles SNPs: Evidence for purifying selection

SNP shuffling in 3’-UTR sequence space with preservation of trinucleotide context

[Figure: human - DC]

possible elimination of SNPs affecting conserved targets: 22 to 35% in human, 53 to 67% in mouse

Chen & Rajewsky (2006) Nat. Genet. 38:1452-1456: depletion of SNPs in conserved miR target sites when compared to other conserved 3’-UTR sequences
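
The shuffling idea can be illustrated with a small permutation test: each SNP is moved to a random position that has the same trinucleotide context (the SNP base plus one neighbour on each side), and the number of shuffled SNPs falling in target sites is compared with the observed number. This is a sketch under stated assumptions, not the authors' procedure: function names, the toy sequence, SNP positions and site coordinates are all made up, and a single concatenated sequence stands in for the 3’-UTR sequence space.

```python
import random
from collections import defaultdict


def index_contexts(utr):
    """Map each trinucleotide to the centre positions where it occurs."""
    index = defaultdict(list)
    for centre in range(1, len(utr) - 1):
        index[utr[centre - 1:centre + 2]].append(centre)
    return index


def shuffle_snps(utr, snp_positions, rng):
    """Move each SNP to a random position with the same trinucleotide context."""
    index = index_contexts(utr)
    return [rng.choice(index[utr[p - 1:p + 2]]) for p in snp_positions]


def count_in_sites(positions, sites):
    """Number of positions inside any half-open (start, end) site."""
    return sum(any(s <= p < e for s, e in sites) for p in positions)


def depletion(utr, snp_positions, sites, n_shuffles=1000, seed=0):
    """Observed hits, mean shuffled hits, fraction of shuffles <= observed."""
    rng = random.Random(seed)
    observed = count_in_sites(snp_positions, sites)
    simulated = [count_in_sites(shuffle_snps(utr, snp_positions, rng), sites)
                 for _ in range(n_shuffles)]
    expected = sum(simulated) / n_shuffles
    p_low = sum(s <= observed for s in simulated) / n_shuffles
    return observed, expected, p_low


# Toy input (hypothetical): SNP positions must not be the first or last base.
utr = "TTTGGTGAAACCAAC" * 4
snps = [3, 20]
sites = [(6, 14)]
print(depletion(utr, snps, sites))
```

A mean shuffled count well above the observed count would point to a depletion of SNPs in target sites.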

Targets - Prioritization for lab validation

most interesting pSNPs are
  pSNPs destroying conserved target sites
  pSNPs creating target sites in anti-targets

to yield a phenotype, target and miR have to be expressed in the same tissue (at the same time)

co-expression plots for human and mouse
  target genes: SymAtlas
  miRs: Landgraf et al. (2007) Cell 129:1401-1414

two different kinds of plots
  comparing miR and target
  comparing miR host gene (if any) and target

[Figure: miR vs. target, tissue mapping?]
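
The co-expression requirement can be made concrete with a small filter that keeps only tissues measured in both data sets and expressed above a threshold in each. This is a sketch only: the function name, tissue names, expression values and thresholds are hypothetical, and matching tissues by name is an assumption (the slide itself flags tissue mapping as an open question).

```python
def coexpressed_tissues(mir_expr, target_expr, mir_min, target_min):
    """Tissues present in both data sets where miR and target pass thresholds."""
    shared = mir_expr.keys() & target_expr.keys()
    return sorted(t for t in shared
                  if mir_expr[t] >= mir_min and target_expr[t] >= target_min)


# Made-up expression values, for illustration only.
mir_expr = {"brain": 120.0, "liver": 2.0, "heart": 15.0}
target_expr = {"brain": 800.0, "liver": 950.0, "kidney": 400.0}

print(coexpressed_tissues(mir_expr, target_expr, mir_min=10.0, target_min=500.0))
# ['brain']
```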

pSNPs - Co-expression plots

rs34542287 A/G [0.985/0.015] Destroyed Conserved target site
  miR-9 vs. actin-binding LIM protein 1, ACCA[A]AGA

rs28399411 G/A [0.994/0.006] Destroyed Conserved target site
  miR-32 vs. Axonal membrane protein GAP-43, TGTGC[A]AT

mature miR counts: [+] direct evidence, [–] gross tissue mapping

host gene expression: [+] perfect matching of tissues, [–] indirect evidence

Polymorphic miRs - Methods

pri-miRs (stem-loops) from miRBase

pDSPs altering miR sequence
  SNPs (de-)stabilizing interaction (seed, mature non-seed)

pDSPs altering miR concentration
  SNPs altering processing efficiency (anywhere in stem-loop)
  CNVs encompassing miR genes
    human: http://projects.tcag.ca/variation/
    mouse: She et al. (2008) Nat. Genet. 40:909-914
    rat: Guryev et al. (2008) Nat. Genet. 40:538-545
  eQTL (or allelic imbalance) corresponding to host genes (only human)
    Morley et al. (2004) Nature 430:743-747
    Cheung et al. (2005) Nature 437:1365-1369
    Ge et al. (2005) Genome Res. 15:1584-1591
    Stranger et al. (2005) PLoS Genet. 1:e78
    Pant et al. (2006) Genome Res. 16:331-339
    Dixon et al. (2007) Nat. Genet. 39:1202-1207
    Goring et al. (2007) Nat. Genet. 39:1208-1216
    Spielman et al. (2007) Nat. Genet. 39:226-231
    Stranger et al. (2007) Nat. Genet. 39:1217-1224

[Figure: miR seed (nt 2 to 9, 5’ end marked) facing the target site (nt 1 to 8) on the targeted mRNA; pri-miR and host gene (?) diagrams]

Polymorphic miRs – Results

                               human    mouse
pre-miRs                       676      466
SNPs in pre-miRs
  affected miRs                136      71
  total                        184      89
  seed                         12       4
  mature non-seed              26       6
  other                        146      79
miRs in CNVs
  CNVs                         158      0
  affected miRs                256      0
miRs hosted in eQTL genes
  eQTL                         78       n.d.
  affected miRs                85       n.d.

Polymorphic miRs – Results

e.g., hsa-miR-125a (alleles T, G)

SNP in seed (+8) blocks processing of pri-miR to pre-miR

Duan et al. (2007) Hum. Mol. Genet. 16:1124-1131

Silencing machinery – Methods

manually curated list of 52 gene products involved in RNA-mediated gene silencing

3 broad compartments
  1. miR biogenesis: 4 (+4)
  2. RISC/mRNP: 12 (+2)
  3. P-bodies: 27 (+3)

pDSPs altering machinery gene sequence
  SNPs (non-synonymous, stop/frameshift, splicing site)

pDSPs altering machinery gene product concentration
  CNVs encompassing machinery genes (human, mouse, rat)
  eQTL corresponding to machinery genes (human)

Silencing machinery – Results

                                         human    mouse
genes                                    52       51
SNPs in machinery genes
  affected genes                         49       35
  total                                  237      127
  non-synonymous                         151      73
  stops / frameshifts                    45       2
  splicing sites                         42       52
machinery genes in CNVs
  CNVs                                   17       0
  affected genes                         17       0
machinery genes identified as eQTL       21       n.d.

Conclusions

other features of Patrocles
  allelic imbalance plots (HapMap)
  reported associations between pSNPs (or SNPs in miR genes) and phenotypes
  Patrocles Finder for custom sequences

most pSNPs are likely false positives due to poor specificity of target site predictions

Patrocles still contains some interesting biology (e.g., purifying selection on non-conserved sites)

systematic validation of pSNPs becomes possible (e.g., AgoIP + estimation of allelic imbalance in RISC-bound mRNAs; HITS-CLIP)

http://www.patrocles.org/