wi’08structure population sub-structure. wi’08structure projects harish/nitin gaurav (tuesday)...

52
Wi’08 Structure Population sub-structure

Upload: michael-quinn

Post on 17-Jan-2018

226 views

Category:

Documents


0 download

DESCRIPTION

Wi’08Structure Population sub-structure confounds association Consider an association test in a recently admixed population Suppose location A increases susceptibility to a disease common in Africans. Let locus B associate with skin-color. As expected, locus A is associated with the disease Unexpectedly, locus B also shows association The problem arises because the assumption of a random mating population is no longer true – The population has structure! D N D N D N D D N A B African European

TRANSCRIPT

Page 1: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

Population sub-structure

Page 2: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

Projects• Harish/Nitin• Gaurav (Tuesday)• Stefano/Hossein (Tuesday)• Nisha/Yu• David• Jian/Josue (Tuesday)

Page 3: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

Population sub-structure confounds association

• Consider an association test in a recently admixed population

• Suppose location A increases susceptibility to a disease common in Africans.

• Let locus B associate with skin-color.• As expected, locus A is associated

with the disease• Unexpectedly, locus B also shows

association• The problem arises because the

assumption of a random mating population is no longer true– The population has structure!

0 .. 1 D0 .. 1 N0 .. 0 D1 .. 1 N0 .. 1 D0 .. 1 D0 .. 1 D0 .. 1 D0 .. 1 D

1 .. 0 N1 .. 0 N0 .. 0 D1 .. 1 D1 .. 0 N1 .. 0 N1 .. 0 N1 .. 0 N1 .. 0 N

A B

Afr

ican

Euro

pean

Page 4: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

Population sub-structure can increase LD

• Consider two populations that were isolated and evolving independently.

• They might have different allele frequencies in some regions.

• Pick two regions that are far apart (LD is very low, close to 0)

0 .. 10 .. 10 .. 01 .. 10 .. 10 .. 10 .. 10 .. 10 .. 1

1 .. 01 .. 00 .. 01 .. 11 .. 01 .. 01 .. 01 .. 01 .. 0

Pop. A

Pop. B

p1=0.1q1=0.9P11=0.1D=0.01

p1=0.9q1=0.1P11=0.1D=0.01

Page 5: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

Recent ad-mixing of population

• If the populations came together recently (Ex: African and European population), artificial LD might be created.

• D = 0.15 (instead of 0.01), increases 10-fold

• This spurious LD might lead to false associations

• Other genetic events can cause LD to increase, and one needs to be careful

0 .. 10 .. 10 .. 01 .. 10 .. 10 .. 10 .. 10 .. 10 .. 1

1 .. 01 .. 00 .. 01 .. 11 .. 01 .. 01 .. 01 .. 01 .. 0

Pop. A+B

p1=0.5q1=0.5P11=0.1D=0.1-0.25=0.15

Page 6: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

Determining population sub-structure

• Given a mix of people, can you sub-divide them into ethnic populations.

• Turn the ‘problem’ of spurious LD into a clue. – Find markers that are too far apart to show LD– If they do show LD (correlation), that shows the

existence of multiple populations.– Sub-divide them into populations so that LD

disappears.

Page 7: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

Determining Population sub-structure• Same example as before:• The two markers are too

similar to show any LD, yet they do show LD.

• However, if you split them so that all 0..1 are in one population and all 1..0 are in another, LD disappears

0 .. 10 .. 10 .. 01 .. 10 .. 10 .. 10 .. 10 .. 10 .. 1

1 .. 01 .. 00 .. 01 .. 11 .. 01 .. 01 .. 01 .. 01 .. 0

Page 8: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

Iterative algorithm for population sub-structure

• Define• N = number of individuals (each has a single

chromosome)• k = number of sub-populations. • Z {1..k}N is a vector giving the sub-

population. – Zi=k’ => individual i is assigned to population k’

• Xi,j = allelic value for individual i in position j• Pk,j,l = frequency of allele l at position j in

population k

Page 9: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

Example• Ex: consider the

following assignment• P1,1,0 = 0.9• P2,1,0 = 0.1

0 .. 10 .. 10 .. 01 .. 10 .. 10 .. 10 .. 10 .. 10 .. 10 .. 1

1 .. 01 .. 00 .. 01 .. 11 .. 01 .. 01 .. 01 .. 01 .. 01 .. 0

1111111111

2222222222

P1,1,0

Page 10: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

Goal• X is known.• P, Z are unknown. • The goal is to estimate Pr(P,Z|X)• Various learning techniques can be employed.

– maxP,Z Pr(X|P,Z) (Max likelihood estimate)– maxP,Z Pr(X|P,Z) Pr(P,Z) (MAP)– Sample P,Z from Pr(P,Z|X)

• Here a Bayesian (MCMC) scheme is employed to sample from Pr(P,Z|X). We will only consider a simplified version

Page 11: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

Algorithm:Structure

• Iteratively estimate – (Z(0),P(0)), (Z(1),P(1)),.., (Z(m),P(m))

• After ‘convergence’, Z(m) is the answer.

• Iteration– Guess Z(0)

– For m = 1,2,..• Sample P(m) from Pr(P | X, Z(m-1))• Sample Z(m) from Pr(Z | X, P(m))

• How is this sampling done?

(Z1,P1)

(Z3,P3)

(Z4,P4)

(Z2,P2)

Page 12: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

Example

• Choose Z at random, so each individual is assigned to be in one of 2 populations. See example.

• Now, we need to sample P(1) from Pr(P | X, Z(0))

• Simply count• Nk,j,l = number of people in

population k which have allele l in position j

• pk,j,l = Nk,j,l / N

0 .. 10 .. 10 .. 01 .. 10 .. 10 .. 10 .. 10 .. 10 .. 10 .. 1

1 .. 01 .. 00 .. 01 .. 11 .. 01 .. 01 .. 01 .. 01 .. 01 .. 0

1221121212

1221121221

Page 13: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

Example• Nk,j,l = number of people in

population k which have allele l in position j

• pk,j,l = Nk,j,l / Nk,j,*• N1,1,0 = 4• N1,1,1 = 6• p1,1,0 = 4/10• p1,2,0 = 4/10 • Thus, we can sample P(m)

0 .. 10 .. 10 .. 01 .. 10 .. 10 .. 10 .. 10 .. 10 .. 10 .. 1

1 .. 01 .. 00 .. 01 .. 11 .. 01 .. 01 .. 01 .. 01 .. 01 .. 0

1221121212

1221121221

Page 14: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

Sampling Z• Pr[Z1 = 1] = Pr[”01” belongs to population 1]?• We know that each position should be in

linkage equilibrium and independent.• Pr[”01” |Population 1] = p1,1,0 * p1,2,1

=(4/10)*(6/10)=(0.24) • Pr[”01” |Population 2] = p2,1,0 * p2,2,1 =

(6/10)*(4/10)=0.24• Pr [Z1 = 1] = 0.24/(0.24+0.24) = 0.5

Assuming, HWE, and LE

Page 15: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

Sampling

• Suppose, during the iteration, there is a bias.

• Then, in the next step of sampling Z, we will do the right thing

• Pr[“01”| pop. 1] = p1,1,0 * p1,2,1 = 0.7*0.7 = 0.49

• Pr[“01”| pop. 2] = p2,1,0 * p2,2,1 =0.3*0.3 = 0.09

• Pr[Z1 = 1] = 0.49/(0.49+0.09) = 0.85• Pr[Z6 = 1] = 0.49/(0.49+0.09) = 0.85• Eventually all “01” will become 1

population, and all “10” will become a second population

0 .. 10 .. 10 .. 01 .. 10 .. 10 .. 10 .. 10 .. 10 .. 10 .. 1

1 .. 01 .. 00 .. 01 .. 11 .. 01 .. 01 .. 01 .. 01 .. 01 .. 0

1112121211

2221221221

Page 16: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

Allowing for admixture• Define qi,k as the fraction of individual i

that originated from population k.• Iteration

– Guess Z(0)

– For m = 1,2,..• Sample P(m),Q(m) from Pr(P,Q | X, Z(m-1))• Sample Z(m) from Pr(Z | X, P(m),Q(m))

Page 17: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

Estimating Z (admixture case)• Instead of estimating Pr(Z(i)=k|X,P,Q), (origin

of individual i is k), we estimate Pr(Z(i,j,l)=k|X,P,Q)

i,1i,2

j

Pr(Z i, j,l = k | X,P,Q) =qi,k Pr(X i, j,l | Z i, j,l = k,P)

qi,k' Pr(X i, j ,l | Z i, j ,l = k',P)k'

Page 18: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

Results: Thrush data• For each

individual, q(i) is plotted as the distance to the opposite side of the triangle.

• The assignment is reliable, and there is evidence of admixture.

Page 19: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

NJ versus Structure:thrush data

• Objective function is different in standard clustering algorithms!

Page 20: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

Population Structure in isolated human populations

• 377 locations (loci) were sampled in 1000 people from 52 populations.

• 6 genetic clusters were obtained, which corresponded to 5 geographic regions (Rosenberg et al. Science 2003)

AfricaEurasia East Asia

America

Ocea

nia

Page 21: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

Population sub-structure:research problem• Systematically explore the effect of

admixture. Can admixture be predicted for a locus, or for an individual

• The sampling approach may or may not be appropriate. Formulate as an optimization/learning problem:– (w/out admixture). Assign individuals to sub-

populations so as to maximize linkage equilibrium, and hardy weinberg equilibrium in each of the sub-populations

– (w/ admixture) Assign (individuals, loci) to sub-populations

Page 22: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

Population Substructure Conclusions• Populations substructure (violating the

assumption of random mating) can create artificial linkage disequlibrium, and false associations

• One way out of it is to identify and sub-divide the populations into sub-populations.

• Another approach is to ‘correct’ for the effects of structure.

Page 23: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

Detecting structural variations

Page 24: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

Genomic variation• Inherited phenotypes arise from genetic variation

– Point mutations, indels, genetic recombination, but also….– Structural Variation

• Insertions, deletions, translocations, inversions, fissions, fusions….

deletions

inversions

translocations

Page 25: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

Fluoroscent in situ hybridization

• (Cancer genomes show extensive structural variation)

• Historically, larger structural variations (easily observed under a microscope were commonly studied, mostly in the context of diseases…

Page 26: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

HapMap and other projects

• The development of molecular techniques (sequencing, genotyping) shifted interest onto smaller scale mutations. They were assumed to be the dominant source of genetic variation

• It turns out that structural variation in normal populations is more common than assumed.

Page 27: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

1. Inversions and HapMap

A

AA

A

A

ACC

GG

G

G

T

TT

G

G

GAG

A

AA

G

A

A GGA

CTCA

• SNP data is seemingly oblivious to Inversions (& other structural polymorphisms)

• Can we detect a signal for inversion polymorphisms in SNPs?

A

A

A

AA

G

G

G

G

GG

G

GG

reference genome sequence

population

Page 28: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

Array-CGH

Garnis et al. Molecular Cancer 2004

Page 29: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

CNVs dominate, possibly because of technology

Page 30: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

Structural variations

• Structural variants are common, but non CNVs are hard to detect.

• Some of the translocations are important in disease

Page 31: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

2. Variations in tumor genomes

• Fusion observed in leukemia, lymphoma, and sarcomas

• “Philadelphia Translocation” Drugs target this fusion protein

• Can we detect fused genes in tumors?

Page 32: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

3. Assaying for specific variations

• Most tumors start off with a single cell, which then proliferate.

• Drugs like Gleevec are used well after cancer has taken hold.

• Can we detect the cancer early by detecting the genomic abnormality?

– If a very few cells in the person are cancerous, can we still detect it?

• Can we track a patient through his treatment?

Page 33: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

Today

• Detection of inversion polymorphisms (& other copy neutral variation) using genotype data.

• Detection of gene fusion events using sequencing

• PCR based detection of variation in the presence of heterosomy: only a fraction of cells carry the variant genome

Page 34: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

Chr. 17 Inversion

Page 35: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

Inversion polymorphisms

Page 36: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

Common inverted haplotype

Page 37: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

LLR statistic computation

• Multi-allelic markers are chosen in blocks around the putative breakpoints

• LD is measured using multi-allelic D-statistic

M1 M2 M3 M4

LD24 LD34LD13LD12

Bansal et al.Genome Research, 2007

Page 38: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

LLRl

M1 M2 M3 M4

LD13LD13

LLRl = log Pr(LD12,LD13 | M1 → M3 → M2 → M4 )Pr(LD12,LD13 | M1 → M2 → M3 → M4 )

⎛ ⎝ ⎜

⎞ ⎠ ⎟

d12d13

Page 39: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

LLRR

M1 M2 M3 M4

LD24 LD34

LLRR = log Pr(LD24 ,LD34 | M1 → M3 → M2 → M4 )Pr(LD24 ,LD34 | M1 → M2 → M3 → M4 )

⎛ ⎝ ⎜

⎞ ⎠ ⎟

d12d24

Page 40: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

Inversion polymorphisms in human

• LLRl and LLRr define our inversion statistic • Large positive values for both likelihood ratios

indicate inversion • A permutation test to estimate the significance

of the statistic• Overlapping breakpoints are merged

Page 41: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

Power

• Difficult to simulate inversions using the coalescent• We simulated inversions on HapMap data.• (a) varying frequency, 500kb inversion• (b) varying length

Page 42: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

Inversion polymorphisms

• We used the Phase I phased haplotype data from the International HapMap project – 3 samples: CEU, YRI, Asian: CHB+JPT about 900K

SNPs • The location between two adjacent SNPs was

considered a potential inversion breakpoint • The statistic was computed for every pair of

inversion breakpoints within a certain distance (200kb - 6Mb)

• 176 inversion polymorphisms were detected

Page 43: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

Inversions with secondary evidence

• 18 regions had highly homologous repeat sequence, 11 with inverted repeats (p-value 0.006 against random distribution of inversion breakpoints)

Page 44: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

Inversions with secondary evidence

Page 45: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

1.4Mb inversion at 16p12(CEU+YRI)

• Supported by fosmid pair evidence (Tuzun et al., 2005)• 80kb long inverted repeat• Identical breakpoints in CEU and YRI data• Known chimp repeat

Page 46: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

1.2Mb Chr 10 (CHB+JPT)

• Inversion supported by fosmid evidence (Tuzun et al. 2005)

Page 47: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

Chr 6 Inversion(YRI) & TCBA1

• 66 inversion breakpoints covered by genes• Less than expected by chance (p-value 0.02)• Many overlapping genes are known to be disrupted in disease• T-cell lymphoma breakpoint associated target (TCBA1)

– Structurally disrupted in T-cell lymphoma cell-lines

Page 48: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

ICAp69

• Islet cell antigen (ICAp69) gene

Page 49: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

Inversion Polymorphism summary

• Many inversions (and copy neutral) polymorphisms remain to be detected.

• The genotype LD signal is useful but has low power.– We are investigating other signals to

improve detection• Some of these variations lead to genes

fusing.

Page 50: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

Conclusions for the class

• Genetic variations are important to our understanding of evolution and genotype-phenotype relationship

Page 51: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

Topics covered1. We started (and finished) by considering sources of

variation2. Models of evolution of population under natural

assumptions1. HW/Linkage equlibrium2. Efficient simulation of populations via coalescent theory

3. Evolution under recombinations/recombination hot-spot detection via counting of recombination events

4. Detection of regions under selection5. Association testing6. Population sub-structure7. Detecting structural variation

Page 52: Wi’08Structure Population sub-structure. Wi’08Structure Projects Harish/Nitin Gaurav (Tuesday) Stefano/Hossein (Tuesday) Nisha/Yu David Jian/Josue (Tuesday)

Wi’08 Structure

Algorithmic diversions1. Phylogeny reconstruction (perfect phylogeny,

distance-based methods)2. Optimization using (Integer-) Linear programming,

simulated annealing and other paradigms3. Greedy algorithms, dynamic programming4. Stochastic sampling methods (MCMC, Gibbs

sampling)5. Statistical tests for significance6. Efficient simulation techniques

1. Coalescent for populations2. Simulating genotype/phenotype associations