Fast Tag SNP Selection Wang Yue Joint work with Postdoc Guimei Liu and Prof Limsoon Wong

Upload: victor-ross

Post on 04-Jan-2016

Fast Tag SNP Selection

Wang Yue

Joint work with Postdoc Guimei Liu and Prof Limsoon Wong

NUS Presentation Title 2006

Outline

• Some preliminary definitions

• Previous Work

• Our work: FastTagger

• Experiment Results

• Tag SNP Application

• Future Work

SNP (Single-Nucleotide Polymorphism)

Why study SNPs?

• Genetic variation among humans can affect how they develop certain diseases and respond to pathogens, chemicals, drugs, vaccines, and other agents

  e.g. researchers found that persons with specific alterations (SNPs) have a 50% higher relative risk of developing glioblastoma, a type of brain cancer.

• A promising route to realizing "personalized medicine"

• Important in crop and livestock breeding programs

Tag SNP

• A tag SNP is a representative SNP in a region of the genome with high linkage disequilibrium

• Tag SNPs make it possible to identify genetic variation without genotyping every SNP in a chromosomal region

• Tag SNPs are useful in whole-genome SNP association studies, in which hundreds of thousands of SNPs across the entire genome are genotyped

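Tag SNP: linkage disequilibrium

• In population genetics, linkage disequilibrium is the non-random association of alleles at two or more loci, not necessarily on the same chromosome.

• It is usually measured with r2:

  r2 = (P(XY)·P(xy) − P(Xy)·P(xY))^2 / (P(X)·P(x)·P(Y)·P(y))

• where P(XY), P(Xy), P(xY), P(xy) are the frequencies of the four possible allele combinations (haplotypes); P(X) = P(XY) + P(Xy), P(x) = P(xY) + P(xy), and likewise P(Y) = P(XY) + P(xY), P(y) = P(Xy) + P(xy)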
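
As a minimal sketch, the standard pairwise r2 can be computed from the four haplotype frequencies as follows. The function name `r_squared` and its argument names are illustrative and not taken from the slides; the frequencies are assumed to sum to 1.

```python
def r_squared(p_XY, p_Xy, p_xY, p_xy):
    """Pairwise r^2 between two biallelic SNPs.

    Arguments are the frequencies of the four haplotypes XY, Xy, xY, xy
    (assumed to sum to 1).
    """
    # Marginal allele frequencies at each locus.
    p_X = p_XY + p_Xy
    p_x = p_xY + p_xy
    p_Y = p_XY + p_xY
    p_y = p_Xy + p_xy

    denom = p_X * p_x * p_Y * p_y
    if denom == 0:
        # One of the SNPs has only a single allele in the sample.
        raise ValueError("r^2 is undefined when either SNP is monomorphic")

    # Disequilibrium coefficient D = P(XY)P(xy) - P(Xy)P(xY).
    d = p_XY * p_xy - p_Xy * p_xY
    return d * d / denom
```

Two SNPs whose alleles always occur together give r2 = 1, and two independent SNPs give r2 = 0.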

Tag SNP selection

• Given a dataset, we can find a huge number of tagging relations among SNPs, as long as we can enumerate the r2 values between SNPs

• In practice, we want to select the smallest set of high-quality SNPs that can tag the remaining SNPs. In other words, if we know this smallest set of SNPs, we can infer the rest based on the r2 values.

Tag SNP Selection: a more formal description

• Given a set S of SNPs, find the smallest set of tag SNPs S_tag such that for every SNP_j ∈ S − S_tag, there is at least one SNP set S_j ⊆ S_tag such that

  – r2(S_j, SNP_j) ≥ min_r2
  – |S_j| ≤ max_size
  – the distance between every pair of SNPs in S_j ∪ {SNP_j} is no larger than max_dist
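
The three constraints can be read as a brute-force validity check for a candidate tag set. The sketch below is only meant to make the definition concrete, not to be efficient. The names `is_valid_tag_set`, `positions`, and the callable `r2(S_j, SNP_j)` are assumptions for illustration; in particular, how the multi-marker r2 is computed is left to the caller.

```python
from itertools import combinations


def _span_ok(snp_ids, positions, max_dist):
    """True if every pair of the given SNPs lies within max_dist of each other."""
    pts = [positions[s] for s in snp_ids]
    return max(pts) - min(pts) <= max_dist


def is_valid_tag_set(snps, tags, positions, r2, min_r2, max_size, max_dist):
    """Check the formal definition by enumerating candidate subsets S_j.

    r2 is a caller-supplied function r2(frozenset_of_tags, snp) -> float.
    """
    tags = set(tags)
    for snp in set(snps) - tags:
        # Only tags within max_dist of snp can appear in any S_j.
        nearby = sorted(
            t for t in tags if abs(positions[t] - positions[snp]) <= max_dist
        )
        tagged = any(
            _span_ok(subset, positions, max_dist)
            and r2(frozenset(subset), snp) >= min_r2
            for size in range(1, max_size + 1)
            for subset in combinations(nearby, size)
        )
        if not tagged:
            return False
    return True
```

Finding the smallest valid set is the hard part; this function only verifies a given candidate.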

Previous Work

• Step 1: Correlations between SNPs within a certain distance are calculated

• Step 2: Find the smallest set of tag SNPs using the correlations calculated in Step 1

• Most algorithms use a greedy approach to find a near-optimal set of tag SNPs in Step 2

• Earlier tag SNP selection methods rely on pairwise correlations

• MultiTag & MMTagger find multi-marker rules, e.g. {SNP1, SNP2, SNP3} -> SNPx

  – Cannot handle > 100k SNPs
  – MultiTag takes hundreds of hours for 30k SNPs
  – MMTagger takes hours & 1 GB of memory for 30k SNPs

FastTagger

• Follows the same two major steps

• First step: borrow typical data mining techniques to mine tagging rules based on r2 values

• Second step: use a greedy algorithm to select a small set of tag SNPs from the tagging rules generated in the first step
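
The second step can be illustrated with a plain greedy cover over tagging rules. This is a minimal sketch under stated assumptions, not FastTagger's actual selection procedure: each rule is assumed to be a pair `(lhs, rhs)` where `lhs` is a frozenset of SNP ids that together tag the SNP `rhs`, and the names `covered_by` and `greedy_tag_selection` are made up for the example.

```python
def covered_by(tags, rules):
    """SNPs covered by a tag set: the tags themselves, plus the RHS of
    every rule whose LHS is fully contained in the tag set."""
    covered = set(tags)
    for lhs, rhs in rules:
        if lhs <= tags:
            covered.add(rhs)
    return covered


def greedy_tag_selection(snps, rules):
    """Repeatedly add the SNP that covers the most SNPs until all are covered."""
    all_snps = set(snps)
    tags = set()
    covered = set()
    while covered != all_snps:
        best, best_cov = None, None
        # Sorted iteration makes tie-breaking deterministic.
        for cand in sorted(all_snps - tags):
            cov = covered_by(tags | {cand}, rules)
            if best is None or len(cov) > len(best_cov):
                best, best_cov = cand, cov
        tags.add(best)
        covered = best_cov
    return tags


if __name__ == "__main__":
    demo_rules = [
        (frozenset({"A"}), "B"),
        (frozenset({"A"}), "C"),
        (frozenset({"A", "D"}), "E"),
    ]
    print(sorted(greedy_tag_selection(["A", "B", "C", "D", "E"], demo_rules)))
```

Each pass rescans all candidates and all rules, which is fine for a toy example but far too slow for whole chromosomes; it is meant only to show the shape of the greedy step.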

Why does FastTagger beat previous work?

• Previous work such as MMTagger generates a lot of redundant tagging relations

• FastTagger avoids this by:

  1. Merging nearby equivalent SNPs
  2. Pruning redundant correlation rules
  3. Skipping rules whose RHS has already been covered many times
  4. If the total size of the rules exceeds memory, dividing the chromosome into blocks and then finding tag SNPs within each block
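
To illustrate the first technique, here is a hedged sketch of merging nearby equivalent SNPs. The slides do not define "equivalent"; this example assumes it means pairwise r2 = 1, which for biallelic SNPs coded 0/1 over phased haplotypes means the two columns are identical or exact complements. The function name, the 0/1 encoding, and the input layout are all assumptions, and monomorphic SNPs (for which r2 is undefined) are assumed to have been filtered out beforehand.

```python
def merge_equivalent_snps(haplotypes, positions, max_dist):
    """Group SNPs that are in perfect LD and lie within max_dist of each other.

    haplotypes: dict mapping SNP id -> tuple of 0/1 alleles, one per haplotype.
    positions:  dict mapping SNP id -> chromosomal position.
    Returns a list of groups (lists of SNP ids) in positional order.
    """
    groups = []
    latest_group = {}  # canonical allele pattern -> most recent group
    for snp in sorted(haplotypes, key=positions.get):
        col = haplotypes[snp]
        # Canonical form: flip alleles so the first haplotype carries 0,
        # making a column and its complement compare equal.
        canon = col if col[0] == 0 else tuple(1 - a for a in col)
        group = latest_group.get(canon)
        # SNPs arrive in positional order, so checking against the first
        # member bounds the distance between every pair in the group.
        if group is not None and positions[snp] - positions[group[0]] <= max_dist:
            group.append(snp)
        else:
            group = [snp]
            groups.append(group)
            latest_group[canon] = group
    return groups
```

One representative per group can then stand in for the whole group during rule mining, which is one plausible way the number of rules could shrink.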

Experiment Setting

• Japanese and Han Chinese samples in HapMap release 21

  – 45 unrelated individuals
  – 6 chromosomes

Experiment Result: running time and # tag SNPs

Comparison with state-of-the-art work: MMTagger

Experiment Result: memory consumption

• MMTagger consumes much more memory

  – Failed on large chromosomes when max_size = 3

• Step 2 of FastTagger consumes much more memory than Step 1, because this step needs to keep the generated rules in memory

Effectiveness of Merging Nearby Equivalent SNPs

# of rules, # of tag SNPs, and runtime are significantly reduced

Effectiveness of Skipping Rules

Memory usage and runtime are significantly reduced, while # of tag SNPs is marginally increased

Effectiveness of Pruning Redundant Rules

Memory usage and # of rules are significantly reduced

Conclusions

• Compared to existing genome-wide tag SNP selection algorithms using multi-marker correlations, FastTagger

  – is many times faster
  – consumes much less memory
  – can work on chromosomes with > 100k SNPs

• Merging equivalent SNPs together is the most effective technique for reducing running time and memory consumption

Tag SNP Application

• Use the tagging rules generated by our data mining technique to infer extra SNPs from an existing SNP list

• We obtained two SNP lists from two major SNP chip companies:

  – Illumina: 1145784 SNPs
  – Affymetrix: 927654 SNPs

• How many extra SNPs can we infer?
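
One way to read this question is as a single pass over the tagging rules: a SNP that is not on the chip counts as inferable if some rule tags it using only SNPs that are on the chip. The sketch below assumes that reading. The function name, the `(lhs, rhs)` rule layout, and the rs identifiers in the usage example are placeholders, not data from the experiment.

```python
def infer_extra_snps(chip_snps, rules):
    """Return SNPs absent from the chip that some rule can infer from it.

    rules: iterable of (lhs, rhs) pairs, where lhs is a collection of SNP ids
    that together tag the SNP rhs.
    """
    chip = set(chip_snps)
    return {
        rhs
        for lhs, rhs in rules
        if rhs not in chip and set(lhs) <= chip
    }


if __name__ == "__main__":
    chip = {"rs1", "rs2"}                # placeholder ids
    demo_rules = [
        (("rs1",), "rs3"),               # usable: rs1 is on the chip
        (("rs1", "rs2"), "rs4"),         # usable: both LHS SNPs are on the chip
        (("rs1", "rs5"), "rs6"),         # not usable: rs5 is not on the chip
        (("rs2",), "rs1"),               # rs1 is already on the chip
    ]
    print(sorted(infer_extra_snps(chip, demo_rules)))
```

Restricting `rules` to those with one SNP on the left-hand side, or allowing two, corresponds to the rule sizes 1 and 2.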

Experiment Setting

• Our rules are generated from the Japanese and Han Chinese data set in HapMap release 21; unlike the previous experiment, we use 22 chromosomes

• In this experiment, two factors determine how many extra SNPs we can infer:

  1. r2 threshold: the conventional setting is 0.8; we use 0.80, 0.85, 0.90, and 0.95

  2. Rule size: we use 1 and 2

Extra SNPs inferred, by r2 threshold and rule length

r2     Chip         length 2    length 1
0.80   Affymetrix   1006866     382962
0.80   Illumina      993321     417107
0.85   Affymetrix    927026     310671
0.85   Illumina      923971     340070
0.90   Affymetrix    821042     226306
0.90   Illumina      827858     248512
0.95   Affymetrix    118994     112896
0.95   Illumina      131263     125200

Future Work

• Test the accuracy of our selected SNPs against state-of-the-art work

• Support adaptive user requirements when selecting SNPs, for example "I have only 1 million; just give me the 1000 most informative SNPs"

• Study how the division of chromosomes into blocks influences the # of tag SNPs

• More to explore

Many thanks to

• My supervisor: Prof Limsoon Wong

• My senior: Guimei Liu

• Some slides are adapted from Prof Wong's notes and Wikipedia

• Thank you for listening

• Q&A