Single Nucleotide Polymorphism, Copy Number Variations and SNP Array
Xiaole Shirley Liu and Jun Liu
Outline
• Definition and motivation
• SNP distribution and characteristics
  – Allele frequency, LD, population stratification
• SNP discovery (unknown) and genotyping (known)
  – CNV detection
Polymorphism
• Polymorphism: sites/genes with “common” variation, where the less common allele has frequency ≥ 1%; otherwise the variant is called rare and the site is not considered polymorphic
• First discovered (early 1980s): restriction fragment length polymorphism (RFLP)
• Some definitions:
  – Locus: position on a chromosome where a sequence or gene is located
  – Allele: alternative form of DNA at a locus
Polymorphism
• Single Nucleotide Polymorphism (SNP)
  – Occasionally short (1-3 bp) indels are considered SNPs too
  – Arise from a DNA-replication mistake in an individual germ line cell, which is then transmitted
  – ~90% of human genetic variation
• Copy number variations (CNV)
  – May or may not be genetic
Why Should We Care
• Disease gene discovery
  – Association studies: certain SNPs are associated with susceptibility to diabetes
  – Chromosome aberrations: duplication / deletion might cause cancer
• Personalized medicine
  – A drug may be effective only if you carry one particular allele
SNP Distribution
• Most common type of variation, about 1 SNP per 100-300 bp
  – Balance between the rate at which mutations are introduced and the rate at which polymorphisms are lost
  – Most mutations are lost within a few generations
• 2/3 are C↔T differences
• In non-coding regions, often fewer SNPs in more conserved regions
• In coding regions, often more synonymous than non-synonymous SNPs
SNP Characteristics: Allele Frequency Distribution
• Most alleles are rare (minor allele frequency < 10%)
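As a small illustration of the allele-frequency definitions used in these slides, the following Python sketch counts alleles in a list of diploid genotypes and applies the ≥ 1% rule for calling a site polymorphic. The function names and the sample data are hypothetical and chosen only for illustration.

```python
from collections import Counter

def minor_allele_frequency(genotypes):
    """Return (minor_allele, frequency) for a biallelic SNP.

    genotypes: list of two-character strings such as "AA", "AG", "GG",
    one per diploid individual.
    """
    counts = Counter(allele for g in genotypes for allele in g)
    if len(counts) > 2:
        raise ValueError("expected a biallelic site")
    if len(counts) == 1:
        # Monomorphic in this sample: there is no minor allele.
        return None, 0.0
    total = sum(counts.values())
    minor, n_minor = min(counts.items(), key=lambda kv: kv[1])
    return minor, n_minor / total

def is_polymorphic(genotypes, threshold=0.01):
    """Apply the rule from the slides: minor allele frequency >= 1%."""
    _, maf = minor_allele_frequency(genotypes)
    return maf >= threshold

# Made-up sample: 100 individuals, 200 chromosomes, 11 copies of G.
sample = ["AA"] * 90 + ["AG"] * 9 + ["GG"] * 1
minor, maf = minor_allele_frequency(sample)
```

With this sample the minor allele is G at frequency 11/200 = 0.055, so the site counts as polymorphic but the allele is still "rare" in the < 10% sense of the slide.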
Mode of inheritance
SNP Characteristics: Hardy-Weinberg equilibrium (HWE)
– In a population with genotypes BB, bb, and Bb, if p = freq(B) and q = freq(b), the frequencies of BB, bb, and Bb will be p², q², and 2pq respectively at equilibrium, and will not change.
– Assumptions for HWE: no mutation, no immigration or emigration, infinite population size, no selective pressure, random mating. A population can deviate from HWE if any of these is violated.
– It provides a baseline against which to measure change, e.g., inbreeding index:
– More than 2 alleles:
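The formulas for the last two items did not survive in this transcript. The sketch below therefore uses the standard textbook forms, which are an assumption and not recovered from the slide: the inbreeding coefficient F = 1 − H_obs / H_exp with H_exp = 2pq, and for k alleles the expansion of (p_1 + ... + p_k)², giving p_i² for homozygotes and 2·p_i·p_j for heterozygotes. Function names are hypothetical.

```python
def hwe_expected(p):
    """Expected frequencies of (BB, Bb, bb) when freq(B) = p."""
    q = 1.0 - p
    return p * p, 2.0 * p * q, q * q

def inbreeding_coefficient(n_BB, n_Bb, n_bb):
    """F = 1 - H_obs / H_exp, with H_exp = 2pq estimated from the sample.

    Assumes both alleles are present in the sample (otherwise H_exp = 0).
    """
    n = n_BB + n_Bb + n_bb
    p = (2 * n_BB + n_Bb) / (2 * n)   # allele frequency of B
    h_obs = n_Bb / n                  # observed heterozygosity
    h_exp = 2.0 * p * (1.0 - p)       # expected heterozygosity under HWE
    return 1.0 - h_obs / h_exp

def hwe_expected_multi(freqs):
    """Genotype frequencies for k alleles: p_i^2 and 2 * p_i * p_j (i < j)."""
    k = len(freqs)
    return {(i, j): (freqs[i] ** 2 if i == j else 2 * freqs[i] * freqs[j])
            for i in range(k) for j in range(i, k)}

# Made-up counts: 30 BB, 40 Bb, 30 bb gives p = 0.5 and H_exp = 0.5.
F = inbreeding_coefficient(30, 40, 30)
three_alleles = hwe_expected_multi([0.5, 0.3, 0.2])
```

For the made-up counts, observed heterozygosity is 0.4 against an expected 0.5, so F = 0.2, a deficit of heterozygotes relative to the HWE baseline.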
SNP Characteristics: Linkage Disequilibrium
• Equilibrium vs. disequilibrium
• LD: if alleles at two loci occur together more often than can be accounted for by chance, this indicates that the two alleles are physically close on the DNA
  – In mammals, LD is often lost at ~100 kb
  – In fly, LD often decays within a few hundred bases
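SNP Characteristics: Linkage Disequilibrium
• Statistical significance of LD
  – Chi-square test with 1 df
  – $e_{ij} = n_{i.}\, n_{.j} / n_T$

$$\chi^2 = \sum_{i,j} \frac{(n_{ij} - e_{ij})^2}{e_{ij}}$$

|       | B1   | B2   | Total |
|-------|------|------|-------|
| A1    | n11  | n12  | n1.   |
| A2    | n21  | n22  | n2.   |
| Total | n.1  | n.2  | nT    |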
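The test on this slide can be sketched in a few lines of Python. The function below takes the four haplotype counts from the 2×2 table, computes the expected counts e_ij from the margins, and sums the chi-square terms. The function name and the example counts are hypothetical; the p-value uses the identity P(χ²₁ > x) = erfc(√(x/2)) so that no external library is needed.

```python
import math

def ld_chi_square(n11, n12, n21, n22):
    """Chi-square test (1 df) on a 2x2 table of haplotype counts.

    Rows are alleles A1/A2, columns are alleles B1/B2.
    Returns (chi2, p_value).
    """
    observed = ((n11, n12), (n21, n22))
    rows = (n11 + n12, n21 + n22)     # n_1. and n_2.
    cols = (n11 + n21, n12 + n22)     # n_.1 and n_.2
    n_total = rows[0] + rows[1]       # n_T
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            e_ij = rows[i] * cols[j] / n_total
            chi2 += (observed[i][j] - e_ij) ** 2 / e_ij
    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Made-up counts: A1 travels mostly with B1, A2 mostly with B2.
chi2, p_value = ld_chi_square(40, 10, 10, 40)
```

With these counts every expected cell is 25, each cell contributes 15²/25 = 9, and the statistic is 36, which is far beyond what chance association would produce.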
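SNP Characteristics: Linkage Disequilibrium
• Three ways to calculate LD

$$D = p_{11} - p_1 q_1$$

$$D' = \frac{D}{D_{\max}}, \quad \text{where } D_{\max} = \begin{cases} \min(p_1 q_2,\; p_2 q_1) & \text{if } D > 0 \\ \max(-p_1 q_1,\; -p_2 q_2) & \text{if } D < 0 \end{cases}$$

$$r^2 = \frac{D^2}{p_1 p_2 q_1 q_2}$$

In the first formula, $p_{11}$ is the observed frequency of the A1B1 haplotype and $p_1 q_1$ is its expected frequency under equilibrium, with $p_i$ and $q_j$ the allele frequencies at the two loci.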
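A minimal Python sketch of the three LD measures follows, assuming p_1 and q_1 are the frequencies of alleles A1 and B1 and p_11 is the frequency of the A1B1 haplotype. The function name and the example numbers are hypothetical, and the bounds used for D_max are the standard ones (the largest |D| compatible with the allele frequencies).

```python
def ld_measures(p11, p1, q1):
    """Return (D, D', r^2) for two biallelic loci.

    p11: observed frequency of haplotype A1B1
    p1:  frequency of allele A1 (p2 = 1 - p1)
    q1:  frequency of allele B1 (q2 = 1 - q1)
    Both loci must be polymorphic (0 < p1 < 1 and 0 < q1 < 1).
    """
    p2, q2 = 1.0 - p1, 1.0 - q1
    d = p11 - p1 * q1                        # observed minus expected
    if d > 0:
        d_max = min(p1 * q2, p2 * q1)        # largest possible positive D
        d_prime = d / d_max
    elif d < 0:
        d_max = max(-p1 * q1, -p2 * q2)      # most negative possible D
        d_prime = d / d_max
    else:
        d_prime = 0.0
    r2 = d * d / (p1 * p2 * q1 * q2)
    return d, d_prime, r2

# Made-up frequencies: both alleles at 50%, A1B1 haplotype at 40%.
d, d_prime, r2 = ld_measures(p11=0.4, p1=0.5, q1=0.5)
```

Here D = 0.15, D' = 0.6 and r² = 0.36. For a 2×2 haplotype table built from n chromosomes, the chi-square statistic of the significance test equals n·r², which links r² directly to the test on the previous slide.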
SNP Characteristics: Linkage Disequilibrium
• Haplotype block: a cluster of linked SNPs
• Haplotype boundary: blocks of sequence show strong LD within blocks and no LD between blocks; the boundaries reflect recombination hotspots
• Haplotype size distribution
SNP Characteristics: Linkage Disequilibrium
• Can see haplotype block: a cluster of linked SNPs
SNP Characteristics: Linkage Disequilibrium
• [C/T] [A/G] T X C [A/C] [T/A]
  – Possible haplotypes: 2⁴ = 16
  – In reality, a few common haplotypes explain 90% of the variation
• Tagging SNPs:
  – SNPs that capture most of the variation in haplotypes
  – Removes redundancy
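One simple way to pick tagging SNPs is a greedy search over pairwise r²: repeatedly choose the SNP that is in strong LD with the most SNPs not yet covered. The slides do not say which selection method is intended, so the sketch below is only an assumed illustration; the r² threshold of 0.8, the function names, and the toy haplotypes are all hypothetical.

```python
def r_squared(x, y):
    """r^2 between two biallelic SNPs coded 0/1 over the same haplotypes."""
    n = len(x)
    p1 = sum(x) / n
    q1 = sum(y) / n
    p11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1) / n
    denom = p1 * (1 - p1) * q1 * (1 - q1)
    if denom == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    d = p11 - p1 * q1
    return d * d / denom

def greedy_tag_snps(haplotypes, threshold=0.8):
    """Greedily choose SNP indices so every SNP has r^2 >= threshold with a tag."""
    n_snps = len(haplotypes[0])
    columns = [[h[j] for h in haplotypes] for j in range(n_snps)]
    # covers[j]: SNPs that SNP j can stand in for (always including itself).
    covers = [{k for k in range(n_snps)
               if k == j or r_squared(columns[j], columns[k]) >= threshold}
              for j in range(n_snps)]
    untagged = set(range(n_snps))
    tags = []
    while untagged:
        # Pick the SNP covering the most untagged SNPs; ties go to the lowest index.
        best = max(range(n_snps), key=lambda j: (len(covers[j] & untagged), -j))
        tags.append(best)
        untagged -= covers[best]
    return tags

# Toy data: SNP 1 duplicates SNP 0, and SNP 3 is the mirror image of SNP 2.
haps = [
    [0, 0, 0, 1], [0, 0, 0, 1], [0, 0, 1, 0], [0, 0, 1, 0],
    [1, 1, 0, 1], [1, 1, 0, 1], [1, 1, 1, 0], [1, 1, 1, 0],
]
tags = greedy_tag_snps(haps)
```

In the toy data two SNPs are enough to capture all four, because the other two are fully redundant with them. A greedy search is not guaranteed to find the smallest possible tag set in general.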
SNP Characteristics: Population Stratification
• Population stratification: individuals are selected from two genetically different populations; the stratification may be environmental, cultural, or genetic
• Could give spurious results in case-control association studies, as in the example of “chopstick genes”
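The spurious-association effect can be shown with a small worked example in Python. The numbers are invented for illustration: two populations of 1000 people each, in which an allele and a trait (say, chopstick use) are completely independent within each population but both are common in one population and rare in the other. The function names are hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    obs = ((a, b), (c, d))
    total = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            total += (obs[i][j] - expected) ** 2 / expected
    return total

def independent_table(n, allele_freq, trait_rate):
    """Counts when allele carriage and trait are independent in one population.

    Order: (allele+trait, allele only, trait only, neither).
    """
    return (n * allele_freq * trait_rate,
            n * allele_freq * (1 - trait_rate),
            n * (1 - allele_freq) * trait_rate,
            n * (1 - allele_freq) * (1 - trait_rate))

# Assumed numbers: allele and trait both common in population 1, rare in 2.
pop1 = independent_table(1000, allele_freq=0.8, trait_rate=0.9)
pop2 = independent_table(1000, allele_freq=0.2, trait_rate=0.1)
pooled = tuple(x + y for x, y in zip(pop1, pop2))

chi2_pop1 = chi_square_2x2(*pop1)
chi2_pop2 = chi_square_2x2(*pop2)
chi2_pooled = chi_square_2x2(*pooled)
```

Within each population the statistic is essentially zero, yet the pooled table (740, 260, 260, 740) gives a chi-square of about 460.8. The allele looks strongly associated with the trait only because both track population membership.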
Using genetic variation to study populations
SNP Discovery Methods
• Sequencing individuals to find differences: too costly
• First check whether big regions have SNPs
  – Basic idea: denature and re-anneal two samples, then detect heteroduplexes
  – Can pool samples (e.g., 10 Africans with 10 Caucasians) to speed up screening
• Resequence to verify
• dbSNP: 12M RefSNPs, 6M validated
SNP Genotyping
• For a known locus TT C/A AG, does this individual have CC, AA or AC? Many methods
• Hybridization-based methods
  – Dynamic allele-specific hybridization
  – Molecular beacons
  – SNP-array chip (simultaneously genotype thousands of SNPs)
• Enzyme-based methods
  – RFLP
  – PCR-based methods
  – Flap endonuclease
  – Primer extension
  – Oligonucleotide ligase assay
• Other methods (based on physical properties of DNA)
SNP Array
• One SNP at a time or genome-wide (SNP array)
40 Probes Used Per SNP
• Allele call
  – AA, BB, AB
• Signal
  – Theoretically 1A+1B, 2A, 2B
  – But could have 1A+3B (amplified!)
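The distinction between the allele call and the signal can be made concrete with a short Python sketch. It assumes that allele-specific copy numbers have already been estimated from the probe signals, which is the hard part and is not shown; the function name and the status labels are hypothetical.

```python
def describe_snp(copies_a, copies_b):
    """Summarize one SNP from estimated allele-specific copy numbers.

    Returns (allele_call, total_copies, status). The allele call only
    records which alleles are present, so 1A+1B and 1A+3B are both "AB".
    """
    total = copies_a + copies_b
    if copies_a > 0 and copies_b > 0:
        call = "AB"
    elif copies_a > 0:
        call = "AA"   # only A is seen, whether 1, 2, or more copies
    elif copies_b > 0:
        call = "BB"
    else:
        call = "NoCall"
    if total > 2:
        status = "amplified"
    elif total < 2:
        status = "deleted"
    else:
        status = "normal"
    return call, total, status

normal_het = describe_snp(1, 1)
amplified_het = describe_snp(1, 3)
```

Both examples receive the call AB, but only the signal reveals that the second carries four copies in total. This is why signal intensity, and not just the genotype call, is needed for copy number analysis.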
SNP Chip for LOH
• Loss of Heterozygosity (LOH): tumor suppressor gene inactivation by allelic loss in cancers
[Figure: tumor suppressor gene T shown in three stages (Normal, First genetic hit, Cancer), with SNP alleles A/B and the LOH outcome marked]
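Given genotype calls for matched normal and tumor samples, LOH candidates can be listed with a simple comparison, sketched below in Python. The function name and the example calls are hypothetical; a real analysis would also account for genotyping error and would look for runs of adjacent LOH SNPs.

```python
def find_loh(normal_calls, tumor_calls):
    """Return indices of SNPs that are AB in normal but AA or BB in tumor.

    SNPs that are already homozygous in the normal sample are
    uninformative for LOH and are skipped.
    """
    if len(normal_calls) != len(tumor_calls):
        raise ValueError("normal and tumor must cover the same SNPs")
    return [i for i, (n, t) in enumerate(zip(normal_calls, tumor_calls))
            if n == "AB" and t in ("AA", "BB")]

# Made-up calls for five SNPs in a matched normal/tumor pair.
normal = ["AB", "AA", "AB", "BB", "AB"]
tumor = ["AA", "AA", "AB", "BB", "BB"]
loh_sites = find_loh(normal, tumor)
```

In this example SNPs 0 and 4 show LOH, SNP 2 retains heterozygosity, and SNPs 1 and 3 cannot be assessed because the normal sample is homozygous there.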
SNP Array for CNV
• Collect normal / diseased samples on SNP arrays
• Probe normalization, background subtraction
• Use HMM to infer CNV
Integrate CNV with Expression to Identify Oncogene MITF in Melanoma
Summary
• SNP and CNV
• SNP distribution and characteristics
  – Allele frequency (minor allele > 1%)
  – LD: linkage ~ physical proximity
  – Population stratification
• SNP discovery: heteroduplex
• SNP genotyping
  – SNP array
  – CNV detection: HMM
Acknowledgement
• Stefano Monti
• Tim Niu
• Kenneth Kidd, Judith Kidd and Glenys Thomson
• Joel Hirschhorn
• Greg Gibson & Spencer Muse