biostatistics-lecture 19 linkage disequilibrium and snp detection

Post on 02-Jan-2016

Biostatistics-Lecture 19: Linkage Disequilibrium and SNP Detection

Ruibin Xi

Peking University, School of Mathematical Sciences

Haplotype Frequencies

Linkage Equilibrium

Linkage Disequilibrium

Disequilibrium Coefficient D_AB

D_AB is hard to interpret

• Sign is arbitrary …
  – A common convention is to set A, B to be the common allele and a, b to be the rare allele
• Range depends on allele frequencies
  – Hard to compare between markers
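The formula for the coefficient is not preserved in this transcript. A minimal sketch, assuming the standard definition D_AB = p_AB − p_A·p_B (haplotype frequency minus the product of the allele frequencies), illustrates why the range depends on the allele frequencies. The function names are illustrative only.

```python
def disequilibrium_d(p_ab_hap, p_a, p_b):
    """D_AB = p_AB - p_A * p_B (standard definition, assumed here)."""
    return p_ab_hap - p_a * p_b


def d_bounds(p_a, p_b):
    """Range of D_AB allowed by the allele frequencies p_A and p_B.

    Each of the four haplotype frequencies must stay non-negative:
      p_AB = p_A * p_B + D,        p_ab = (1 - p_A) * (1 - p_B) + D,
      p_Ab = p_A * (1 - p_B) - D,  p_aB = (1 - p_A) * p_B - D.
    """
    lower = max(-p_a * p_b, -(1 - p_a) * (1 - p_b))
    upper = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    return lower, upper
```

With p_A = p_B = 0.5 the bounds are −0.25 and 0.25, whereas with p_A = p_B = 0.9 they are roughly −0.01 and 0.09, so the same value of D can mean very different things at different marker pairs.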

r² (also called Δ²)

• Ranges between 0 and 1
  – 1 when the two markers provide identical information
  – 0 when they are in perfect equilibrium
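As a rough illustration of the two endpoints, the sketch below computes r² from the four haplotype frequencies, assuming the usual definition r² = D² / (p_A(1 − p_A) p_B(1 − p_B)). The function name and argument order are illustrative.

```python
def r_squared(p_AB, p_Ab, p_aB, p_ab):
    """r^2 between two biallelic markers from the four haplotype frequencies."""
    total = p_AB + p_Ab + p_aB + p_ab
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = p_AB + p_Ab          # allele frequency of A at the first marker
    p_B = p_AB + p_aB          # allele frequency of B at the second marker
    d = p_AB - p_A * p_B       # disequilibrium coefficient D_AB
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("r^2 is undefined when a marker is monomorphic")
    return d * d / denom


# Only haplotypes AB and ab occur: each marker fully predicts the other.
identical = r_squared(0.5, 0.0, 0.0, 0.5)
# All four haplotypes at the product of allele frequencies: equilibrium.
independent = r_squared(0.25, 0.25, 0.25, 0.25)
```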

Raw r2 data from chr22

Comparing Populations

CEPH: Utah residents with ancestry from northern and western Europe (CEU)

Use LD for SNP imputation and detection

fastPhase

Model for haplotypes

• Observed n haplotypes
  – Each with M markers
  – b_ij = 0, 1
• Assume each haplotype originates from one of K clusters
  – z_i: unknown cluster of origin of b_i
  – Since clusters of origin are unknown
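The slide's formula is missing from the transcript. One plausible reading, sketched below, is a finite mixture in which the unknown cluster is summed out. The parameter names are assumptions, not taken from the slides: `alpha[k]` is the weight of cluster k and `theta[k, m]` is the frequency of allele 1 at marker m within cluster k.

```python
import numpy as np


def haplotype_mixture_likelihood(b, alpha, theta):
    """P(b) = sum_k alpha[k] * prod_m P(b[m] | cluster k).

    b     : length-M sequence of 0/1 alleles
    alpha : length-K cluster weights summing to 1 (assumed parameter)
    theta : K x M allele-1 frequencies per cluster (assumed parameter)
    """
    b = np.asarray(b)
    alpha = np.asarray(alpha, dtype=float)
    theta = np.asarray(theta, dtype=float)
    # Probability of the observed allele at each marker, for each cluster.
    per_marker = np.where(b == 1, theta, 1.0 - theta)   # shape (K, M)
    per_cluster = per_marker.prod(axis=1)               # shape (K,)
    return float(alpha @ per_cluster)
```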

Local clustering of haplotype

• Assume z_i = (z_i1, …, z_iM) forms a Markov chain on {1, …, K}
  – z_im denotes the cluster of origin for b_im
  – Initial probabilities
  – Transition probabilities
  – Conditional on the cluster of origin
  – Marginal
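The formulas on this slide did not survive extraction. As a hedged sketch of how the marginal can be obtained under such a hidden Markov model, the standard forward recursion sums over all cluster paths. The names `init`, `trans`, and `theta` are assumptions: `init[k]` is the initial probability of cluster k, `trans[m - 1][j, k]` is the probability of moving from cluster j at marker m − 1 to cluster k at marker m, and `theta[k, m]` is the allele-1 frequency of cluster k at marker m.

```python
import numpy as np


def haplotype_hmm_likelihood(b, init, trans, theta):
    """Marginal P(b), summing over cluster paths with the forward recursion."""
    b = np.asarray(b)
    init = np.asarray(init, dtype=float)
    trans = np.asarray(trans, dtype=float)    # shape (M - 1, K, K)
    theta = np.asarray(theta, dtype=float)    # shape (K, M)
    emit = np.where(b == 1, theta, 1.0 - theta)
    # forward[k] = P(b[0..m], z_m = k)
    forward = init * emit[:, 0]
    for m in range(1, len(b)):
        forward = (forward @ trans[m - 1]) * emit[:, m]
    return float(forward.sum())
```

When every transition matrix is the identity, the cluster never changes along the haplotype and the result reduces to a plain mixture over clusters.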

Local clustering of genotype data

• We have genotype data
• g_im: genotype at marker m of individual i
  – Takes values 0, 1, 2
• Initial probabilities (unordered clusters of origin)
• Transition probabilities
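The initial-probability formula is not in the transcript. A minimal sketch, assuming the two haplotypes of an individual draw their clusters independently with weights `alpha` (an assumed parameter name), gives the probability of each unordered pair of clusters: the square of the weight when both clusters match, and twice the product otherwise.

```python
from itertools import combinations_with_replacement


def unordered_pair_initial_probs(alpha):
    """Initial probability of each unordered cluster pair {k1, k2}.

    Assumes the two haplotypes pick clusters independently with weights alpha.
    """
    probs = {}
    for k1, k2 in combinations_with_replacement(range(len(alpha)), 2):
        if k1 == k2:
            probs[(k1, k2)] = alpha[k1] ** 2
        else:
            # {k1, k2} arises from either ordering of the two haplotypes.
            probs[(k1, k2)] = 2 * alpha[k1] * alpha[k2]
    return probs
```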

Local clustering of genotype data

• Genotype probabilities conditional on clusters of origin

• Joint likelihood
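The conditional genotype probabilities are not shown in the transcript. A sketch under the assumption that, given their clusters, the two alleles at a marker are independent Bernoulli draws with cluster-specific allele-1 frequencies `theta1` and `theta2` (hypothetical names):

```python
def genotype_probs(theta1, theta2):
    """Return (P(g = 0), P(g = 1), P(g = 2)) given the two cluster frequencies.

    g counts the copies of allele 1 carried by the individual at this marker.
    """
    p0 = (1 - theta1) * (1 - theta2)                    # neither haplotype carries allele 1
    p1 = theta1 * (1 - theta2) + (1 - theta1) * theta2  # exactly one does
    p2 = theta1 * theta2                                # both do
    return p0, p1, p2
```

The heterozygous term sums both assignments of the allele to the two haplotypes, which is why only the unordered pair of clusters matters.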

Algorithms for genotype imputation

• fastPhase

• BEAGLE

• IMPUTE

• PLINK

• MaCH

Picture taken from IMPUTE v2

SNP detection with LD information

• MaCH: (G: genotype, S: cluster)

SNP detection with LD information

• For sequencing data G is not observed
• Coverage of bases A and B is observed, so we have the HMM
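The slide's HMM is not reproduced in the transcript. As one possible emission term for such a model, the sketch below scores the observed read counts under each hidden genotype with a simple binomial model and a fixed per-read error rate. This model and the parameter `error_rate` are assumptions for illustration, not the slide's formula.

```python
from math import comb


def read_count_likelihoods(n_a, n_b, error_rate=0.01):
    """Return P(n_a reads of A, n_b reads of B | g) for g = 0, 1, 2 copies of B.

    Assumes reads are independent and that a read shows B with probability
    error_rate (g = 0), 0.5 (g = 1), or 1 - error_rate (g = 2).
    """
    n = n_a + n_b
    p_b_given_g = (error_rate, 0.5, 1.0 - error_rate)
    return tuple(
        comb(n, n_b) * p ** n_b * (1.0 - p) ** n_a
        for p in p_b_given_g
    )
```

In an HMM over clusters, these values would take the place of the indicator on an observed genotype, so that low-coverage sites borrow information from neighbouring markers through LD.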

SNP detection with LD information

Nielsen et al. 2011, Nature Reviews Genetics
