encode variation analysis. analysis goals quantify genetic variation in encode regions detect...

Encode variation analysis

Upload: dominic-hudson

Post on 05-Jan-2016


TRANSCRIPT

Page 1: Encode variation analysis. Analysis goals Quantify genetic variation in ENCODE regions Detect selective constraint in ENCODE features Develop rules for

Encode variation analysis

Page 2

Analysis goals

• Quantify genetic variation in ENCODE regions

• Detect selective constraint in ENCODE features

• Develop rules for interpretation of functional variation

• Motivate experiments to test functional variation

Page 3

Data

• ENCODE SNPs (HapMap resequencing)

• 5 kb HapMap SNPs

• DIPs (deletion/insertion polymorphisms)

• Gene expression variation

Page 4

Metrics of variation

• Derived allele frequency spectrum (Manolis)

• Diversity/Het (Ewan)

• SNP density (Ewan, others)

• DIP density (Jim, Taane)

• LD/Recombination (Daryl/Oxford)

• Regions of contiguous DNA without variation (Manolis)

• Accelerated (positively selected?) regions (Manolis)

• Standard tests of neutrality: McDonald-Kreitman, Tajima's D, etc. (Mike, others)

• Other non-parametric tests of selection (Andy)

• Tagging (Paul)
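As an illustration of one of the standard neutrality tests listed above, here is a minimal sketch of Tajima's D using the textbook formula. The function names and the input layout (one derived allele count per segregating site) are assumptions made for this example; the slides do not say which implementation the group used.

```python
import math


def pairwise_diversity(derived_counts, n):
    """Mean number of pairwise differences (pi).

    derived_counts: derived allele count at each segregating site
    n: number of sampled chromosomes (hypothetical input layout)
    """
    pairs = n * (n - 1) / 2.0
    return sum(k * (n - k) for k in derived_counts) / pairs


def tajimas_d(n, seg_sites, pi):
    """Tajima's D from sample size n, number of segregating sites S, and pi."""
    if n < 2 or seg_sites < 1:
        raise ValueError("need at least 2 chromosomes and 1 segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    # Difference between the two estimators of theta, scaled by its std. dev.
    return (pi - seg_sites / a1) / math.sqrt(variance)


# Example: 10 chromosomes, 16 segregating sites, pi = 3.888889
d = tajimas_d(10, 16, 3.888889)  # about -1.45
```

An excess of rare derived alleles pulls pi below S/a1 and gives a negative D, which is the pattern expected under selective constraint (and also under population growth, so the value needs a matched control).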

Page 5

Analysis plans

Analysis with respect to genomic features

• Calculate variability in a large number of genomic features with all metrics

• Correlate variability metrics with "intensity" of feature (e.g. levels of conservation with levels of variability)

• Variation, alternative splicing and expression

• Distance effects from genomic features

• Association of gene expression with SNPs (some is in UCSC and some will be provided by Manolis at the workshop)

Analysis independent of genomic features (in principle)

• Tag SNPs and comparison of resequencing data to the 5 kb map. Here it will be a good idea to see how the 5 kb map captures variation within genomic elements. If we really aim to capture variation mainly in functional genomic elements (e.g. known regulatory regions, or nonsynonymous SNPs), how can we modify the tag algorithms?

• General description of levels of variation with respect to the functional content of the 44 ENCODE regions

Page 6

Diversity in features

Ewan Birney

Feature          av2pq/SNP   av2pq/pos   #snps
Promoters        0.15        0.00045       856
Region Rnd2      0.16        0.00041       737
Completely Rnd   0.16        0.00045      1584
Exons            0.14        0.00039       635
RRnd Exons       0.15        0.00040       636
Overall          0.16        0.00042     16609
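The column names in the diversity table above suggest two summaries of expected heterozygosity (2pq): an average per SNP and an average per base position of the feature. The sketch below shows one way such numbers could be computed. That reading of the column names, the function names, and the input format are all assumptions; the slides do not define them.

```python
def heterozygosity(p):
    """Expected heterozygosity 2pq for a biallelic site with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def diversity_summary(allele_freqs, n_positions):
    """Summarise 2pq for one feature class.

    allele_freqs: allele frequency of each SNP falling in the feature
    n_positions:  total number of bases covered by the feature
    (both inputs are hypothetical; the slides do not describe the data layout)
    """
    if n_positions <= 0:
        raise ValueError("n_positions must be positive")
    het = [heterozygosity(p) for p in allele_freqs]
    total = sum(het)
    return {
        "av2pq_per_snp": total / len(het) if het else 0.0,
        "av2pq_per_pos": total / n_positions,
        "n_snps": len(het),
    }


# Example with made-up frequencies: two SNPs in a 1000-base feature
summary = diversity_summary([0.5, 0.1], 1000)
```

Under this reading, the per-SNP value reflects the frequency spectrum of the SNPs present, while the per-position value also depends on SNP density, so the two columns can move independently.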

Page 7

Derived allele frequency spectrum

[Figure: Histogram of Derived_Allele_Frequency_CEU for cns_inter01 (CNS intersection). x-axis: derived allele frequency in CEU, 0.00 to 0.98; y-axis: percent. P = 0.003]

Page 8

Derived allele frequency spectrum

[Figure: Histogram of Derived_Allele_Frequency_CEU for transfrags01 (Transfrags union). x-axis: derived allele frequency in CEU, 0.00 to 0.98; y-axis: percent. P = 0.204]
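The derived allele frequency comparisons above can be sketched as two steps: bin the DAF values of SNPs in a feature into a percent histogram, then test whether the feature differs from a control set. The slides do not say which test produced the reported P values, so the permutation test on mean DAF below is a stand-in chosen for illustration. All function names and parameters are assumptions.

```python
import random


def daf_histogram(dafs, n_bins=10):
    """Percent of SNPs falling in each equal-width DAF bin on [0, 1]."""
    counts = [0] * n_bins
    for f in dafs:
        idx = min(int(f * n_bins), n_bins - 1)  # DAF of 1.0 goes in the last bin
        counts[idx] += 1
    total = len(dafs)
    return [100.0 * c / total for c in counts]


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(feature, control, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference in mean DAF.

    Stand-in for whatever test the slides used (not stated there).
    """
    rng = random.Random(seed)
    observed = abs(mean(feature) - mean(control))
    pooled = list(feature) + list(control)
    k = len(feature)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(mean(pooled[:k]) - mean(pooled[k:])) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p of 0
    return (hits + 1) / (n_perm + 1)


# Example with made-up DAF values
hist = daf_histogram([0.05, 0.15, 0.95, 1.0])
p = permutation_p_value([0.01] * 20, [0.9] * 20)
```

A shift of the feature histogram toward low DAF relative to the control is the signature of purifying selection that this kind of comparison looks for.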

Page 9

Heterozygosity

Taane Clark

Page 10

Indels

Page 11

Identification of accelerated CNGs

Regions accelerated in humans

[Figure: two Human / Chimp / Macaque tree diagrams]

[Figure: DAF, control CNGs (orange) vs. accelerated CNGs (green). x-axis: frequency class, 0.00 to 0.90; y-axis: frequency. P = 0.0003]

Page 12

Selective constraints differ for genes expressed in different tissues

Nuria Lopez

Page 13

Genes expressed in more tissues are under stronger selective constraint (lower dN)

Page 14

Tagging

• ENCODE is a near-complete inventory of common (MAF≥5%) sites

• How well do tag SNPs picked from thinned versions of ENCODE (to mimic ascertainment of Phase I and II) capture:

  – all common variants

  – functional sites

Paul de Bakker

Page 15

Coverage of common variants by tags picked from simulated Phase I and II HapMap

Population sample        % r2>0.2   % r2>0.5   % r2>0.8   Mean r2

5kb HapMap (Phase I)
  CEU                    97.2%      86.4%      71.6%      0.83
  JPT/CHB                96.3%      84.9%      70.2%      0.82
  YRI                    90.3%      64.4%      41.9%      0.64

1kb HapMap (Phase II)
  CEU                    99.6%      97.7%      93.9%      0.96
  JPT/CHB                99.5%      97.7%      93.9%      0.96
  YRI                    99.1%      92.5%      80.9%      0.90
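The coverage figures above can be read as follows: for every common SNP, take the highest pairwise r2 with any chosen tag, then report the share of SNPs above each threshold and the mean of those maxima. The sketch below implements that reading for phased 0/1 haplotypes. The data layout, the function names, and the choice to count tags themselves (r2 = 1) among the evaluated SNPs are assumptions, not details taken from the slides.

```python
def r_squared(a, b):
    """Pairwise r2 between two biallelic SNPs given phased 0/1 alleles."""
    n = len(a)
    pa = sum(a) / n
    pb = sum(b) / n
    pab = sum(1 for x, y in zip(a, b) if x == 1 and y == 1) / n
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        return 0.0  # a monomorphic site carries no LD information
    d = pab - pa * pb
    return d * d / denom


def tag_coverage(haplotypes, tag_ids, thresholds=(0.2, 0.5, 0.8)):
    """Coverage of all SNPs by a tag set.

    haplotypes: dict mapping SNP id -> list of 0/1 alleles, one per chromosome
    tag_ids:    ids of the SNPs chosen as tags (hypothetical input)
    """
    best = [
        max(r_squared(alleles, haplotypes[t]) for t in tag_ids)
        for alleles in haplotypes.values()
    ]
    n = len(best)
    return {
        "pct_above": {t: 100.0 * sum(1 for r in best if r > t) / n for t in thresholds},
        "mean_r2": sum(best) / n,
    }


# Example with made-up haplotypes: B is in perfect LD with tag A, C is independent
haps = {"A": [0, 0, 1, 1], "B": [0, 0, 1, 1], "C": [0, 1, 0, 1]}
result = tag_coverage(haps, ["A"])
```

To mimic the thinned Phase I and Phase II maps, tags would be picked from a subsample of the SNPs while coverage is still evaluated against the full ENCODE set.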