single nucleotide polymorphisms


Post on 25-Feb-2016









Single Nucleotide Polymorphisms
Arthur M. Lesk
Bologna Winter School 2011

What are SNPs and why are they important?
SNP = single nucleotide polymorphism, an isolated change in a single nucleotide
SNPs are one type of mutation
Some have obvious functional consequences
  Sickle-cell haemoglobin: gag → gtg (β6 Glu → Val)
  First "molecular disease": sickle-cell anaemia
Some are silent
Some are in non-coding regions
  affect splice sites?
  affect regulatory sites?
  some have no known phenotypic effect
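The sickle-cell example can be illustrated with a minimal Python sketch. It assumes the standard genetic code for the two codons involved; the table is deliberately cut down to those two entries, and the helper name `point_mutate` is hypothetical, introduced only for this illustration.

```python
# Minimal codon table: only the two codons needed for this example
# (standard genetic code: GAG = glutamate, GTG = valine).
CODON_TO_AA = {"GAG": "Glu", "GTG": "Val"}


def point_mutate(codon: str, position: int, new_base: str) -> str:
    """Return a copy of `codon` with the base at `position` replaced."""
    return codon[:position] + new_base + codon[position + 1:]


normal = "GAG"
sickle = point_mutate(normal, 1, "T")  # single-nucleotide change A -> T

print(f"{normal} ({CODON_TO_AA[normal]}) -> {sickle} ({CODON_TO_AA[sickle]})")
```

A single base change in the middle of the codon is enough to swap a charged residue for a hydrophobic one, which is why this SNP has an obvious functional consequence.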

What is a SNP?
The genomes of individuals in a population contain a particular base at some position most of the time.
That is, there is a "normal" sequence.
A SNP is a deviation from the normal sequence.
Many people require that a variation occur in at least 1% of the population to be considered a SNP.
But: what population? What if two distinct populations have a consistent polymorphism?
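The "what population?" question can be made concrete with a small Python sketch. The allele data are invented for illustration, and `minor_allele_frequency` is a hypothetical helper, not part of any particular library. It shows a site that is fixed within each of two populations (so it fails a 1% threshold in either) yet is highly polymorphic when the populations are pooled.

```python
from collections import Counter


def minor_allele_frequency(alleles: str) -> float:
    """Fraction of observed alleles that differ from the most common one."""
    if not alleles:
        raise ValueError("need at least one observed allele")
    counts = Counter(alleles)
    return 1.0 - max(counts.values()) / len(alleles)


# Invented data: each character is the base seen on one chromosome.
population_a = "A" * 100   # everyone carries A
population_b = "G" * 100   # everyone carries G
pooled = population_a + population_b

for name, sample in [("A", population_a), ("B", population_b), ("pooled", pooled)]:
    maf = minor_allele_frequency(sample)
    print(f"population {name}: minor allele frequency = {maf:.2f}, "
          f"passes 1% threshold: {maf >= 0.01}")
```

Whether the site counts as a SNP under the 1% convention therefore depends entirely on how the population is delimited.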

SNPs in human genomes
SNPs are about 90% of all inter-human variation
Occur on average once in every 300 bases
2/3 of SNPs are C → T changes (perhaps because C can easily deaminate)
[Figure: cytosine → uracil]

SNP density varies across the human genome
  Some high-density patches
  Some "deserts"
SNPs in coding regions: ~1/3 as many as in non-coding regions
SNP density correlated with recombination rate (which causes which??)
AT microsatellites: long (AT)n repeat tracts tend to appear in regions of low SNP density

Figure 14. SNP density in each 100-kbp interval as determined with Celera-PFP SNPs. The color codes are as follows: black, Celera-PFP SNP density; blue, coalescent model; and red, Poisson distribution. The figure shows that the distribution of SNPs along the genome is nonrandom and is not entirely accounted for by a coalescent model of regional history.
J C Venter et al. Science 2001;291:1304-1351. Published by AAAS.

What is normal?
Obviously we all differ genomically
Swedes and Chinese have obviously different phenotypes
Most Swedes and Chinese are healthy individuals
Therefore genetic differences do not necessarily cause disease
Pointless to check for differences from a single reference sequence
Of course, there are many genetic differences, not just SNPs

Variation in human and other species
Any two humans ~99.5% identical in sequence
Chimpanzees, gorillas: twice as variable, despite much smaller population size
Implies prehistoric bottleneck in human population, recent common origin
Most common SNPs (frequency > 5%) are shared among human populations from around the world
Most populations (e.g. British) contain 85-90% of all known variation
Some variation is population-specific
In some cases, there is local selective pressure
  For example, adult lactose tolerance, malaria resistance
African populations have greatest genetic diversity
  Supports "Out of Africa" theory of human origin and migration

Identification of geographical origin, phenotype
A criminal leaves a blood sample at a crime scene
How much can we tell about him or her?
Not perfectly, but:
  Ethnic group
  Eye and hair colour (hair colour easier to change)
  Family name?

Types of SNPs
Transitions:
  purine ↔ purine
  pyrimidine ↔ pyrimidine (cytosine → uracil)
Transversions:
  purine ↔ pyrimidine
Transitions are more common than transversions
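The transition/transversion distinction above reduces to checking whether both bases belong to the same chemical class. The following is a minimal Python sketch; the function name `classify_substitution` is hypothetical, and uracil is included among the pyrimidines only because the slide's example mentions it.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T", "U"}


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError(f"unknown base in {ref!r} -> {alt!r}")
    if ref == alt:
        raise ValueError("reference and alternate base are identical")
    # Same chemical class on both sides means a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


for ref, alt in [("C", "T"), ("A", "G"), ("A", "C"), ("G", "T")]:
    print(f"{ref} -> {alt}: {classify_substitution(ref, alt)}")
```

Of the twelve possible substitutions among A, C, G and T, only four are transitions, so their higher observed frequency is not a counting artefact.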

Prevalence of SNPs in human genomes
approximately 1 in 300 bp (about 0.3%)
compare difference between human / chimpanzee genomes: 4% different (not all SNPs!)

Life cycle of a SNP
Generation of a mutation
Initial survival, against sampling loss
Increase in frequency: survival until it becomes homozygous in some individuals; chance of loss reduced (helped by bottlenecks, founder effects; population-size dependent)
Fixation

Initial survival of a SNP
Suppose a person is heterozygous for a novel, selectively neutral mutation.
Suppose the person has 2 children that survive to reproductive age. The probability of loss of the mutation is 25%.
If each descendant has 2 children that survive to reproductive age, probability of loss in 200 years = 94%

Where do SNPs occur in the human genome?
Distributed throughout the genome
50% in non-coding regions
  NOT the same as non-functional!!!
25% missense mutations (amino acid substitution)
25% silent (amino acid unchanged)
  silent = no change in encoded amino-acid sequence
  NOT the same as no phenotypic effect!!!
  would be better to call them synonymous SNPs rather than silent SNPs
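The loss probabilities quoted under "Initial survival of a SNP" can be explored with a short Python sketch. The slide does not state its model or generation time, so both functions below rest on assumptions: `loss_independent` simply compounds a 25% loss chance per generation along a single line of descent, while `loss_branching` treats every carrier as having exactly 2 surviving children who each inherit the mutation with probability 1/2. Compounding over ten generations gives about 94%, which would match the quoted figure if a generation is taken to be 20 years; that reading is a guess, not something the source confirms.

```python
def loss_independent(generations: int, loss_per_generation: float = 0.25) -> float:
    """Loss probability when each generation independently loses the
    mutation with the same probability (single line of descent)."""
    return 1.0 - (1.0 - loss_per_generation) ** generations


def loss_branching(generations: int) -> float:
    """Loss probability in a simple branching model: every carrier has
    exactly 2 children, each inheriting the mutation with probability 1/2.

    If q is the chance that one carrier's line has died out after n
    generations, then after n + 1 generations it is ((1 + q) / 2) ** 2,
    because each child either does not inherit (1/2) or inherits and its
    own line dies out (q / 2).
    """
    q = 0.0
    for _ in range(generations):
        q = ((1.0 + q) / 2.0) ** 2
    return q


for n in (1, 5, 10):
    print(f"{n:2d} generations: independent = {loss_independent(n):.3f}, "
          f"branching = {loss_branching(n):.3f}")
```

Both models agree on 25% after one generation. They diverge later because the branching model keeps track of the possibility that several descendants carry the mutation at once, which makes complete loss less likely.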

SNPs in non-human genomes
Of course other species have SNPs
Here we will focus on human SNPs because of relevance to human disease
However, SNPs in pathogens are sometimes associated with antibiotic resistance, and therefore related to human disease
SNPs in some plants give clues to domestication

Organised efforts to collect SNPs
The HapMap is a catalogue of common human genetic variants
HapMap Project = international collaboration among Japan, the United Kingdom, Canada, China, Nigeria, and the United States
  NOT Europe
Carry out measurements, provide database
Other projects collect SNPs in other species

HapMap project
International consortium: International HapMap Project of human genetic variants:
  What sites?
  How distributed: frequency in different populations
Raw material for linking genomics with disease

Origin of samples
Total of 270 people:
  The Yoruba people of Ibadan, Nigeria
  Japan (Tokyo)
  China (Beijing)
  U.S. residents with Northern and Western European ancestry

What is a haplotype?
Often, a set of SNPs appear nearby on the same chromosome
In absence of recombination, they will be inherited in blocks
Pattern of SNPs in a block is called a haplotype
A block may contain many SNPs, but only a few are needed to identify a haplotype
These signature SNPs within a haplotype block are called "tag SNPs"
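The idea that a few tag SNPs suffice to identify a haplotype can be sketched in Python as a brute-force search for the smallest set of positions that tells all observed haplotypes apart. This is only an illustration under simplifying assumptions: the haplotypes are invented, `find_tag_snps` is a hypothetical helper, and real tag-SNP selection typically works from linkage-disequilibrium statistics rather than exhaustive search.

```python
from itertools import combinations


def find_tag_snps(haplotypes: list[str]) -> tuple[int, ...]:
    """Return the smallest set of site indices whose alleles distinguish
    every distinct haplotype (ties broken by lowest indices first)."""
    distinct = set(haplotypes)
    n_sites = len(haplotypes[0])
    for size in range(n_sites + 1):
        for positions in combinations(range(n_sites), size):
            # The alleles at the chosen positions form each haplotype's signature.
            signatures = {tuple(h[p] for p in positions) for h in distinct}
            if len(signatures) == len(distinct):
                return positions
    return tuple(range(n_sites))


# Invented block of five sites with four haplotypes.
block = ["ACGTA", "ACGCA", "TCGTG", "TCGCG"]
tags = find_tag_snps(block)
print("tag SNP positions:", tags)
for h in block:
    print(h, "->", "".join(h[p] for p in tags))
```

In this toy block, sites 1 and 2 never vary and site 4 always changes together with site 0, so typing two sites is enough to recover the whole haplotype.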

SNP databases
SNPlinks:
  dbSNP
  SNP Consortium
  Biosystems Assays-on-Demand /Form/assay_search_basic.jsp
  Ensembl

dbSNP: database at NCBI
non-redundant dataset
nomenclature: rs number
rs = reference SNP

General human mutations
Human Gene Mutation Database
  100000 mutations, in 3700 genes
  6.2% of total ~23000 genes
  about 10000 new mutations found per year
OMIM (Online Mendelian Inheritance in Man)
  database of mutations associated with human disease
OMIA (Online Mendelian Inheritance in Animals)

Databases with important related information
Online Mendelian Inheritance in Man (OMIM) [NCBI]
  Comprehensive compendium of human genes and associated phenotypes
  Not limited to SNPs
SNPs3D: assigns molecular functional effects to non-synonymous SNPs based on structure and sequence analysis
SNPper: SNPs by position or gene association

Quality of sequence information is important
SNPs appear in human genome at approximately 1 in 300 bases
Obviously error rate in resequencing must be substantially lower than this if SNP data are to be meaningful
Measure of DNA sequencing quality: PHRED

PHRED measure of sequence quality
Phred scores accepted to characterize the quality of DNA sequences
Originally Phred was a program that determined accurate quality scores indicating error probabilities
Accepted as general standard
Phred quality score Q: let P = probability of base error; then Q = -10 log₁₀ P

Phred quality score Q   Probability of incorrect base call   Base call accuracy
10                      1 in 10                              90%
20                      1 in 100                             99%
30                      1 in 1000                            99.9%
40                      1 in 10000                           99.99%
50                      1 in 100000                          99.999%

A method that gave an averaged Phred score Q = 30 would give approximately as many errors as there are SNPs!

What can SNPs tell us?
Causes of disease -- dysfunctional protein
Correlation with disease prognosis, success of particular treatment
Useful genetic markers, to locate some gene of phenotypic interest; for instance, a gene correlated with a disease
Characterise individuals
Characterise populations (SNP distribution)
Applications in anthropology -- tracing of migrations, human evolution
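The Phred relation between quality score and error probability can be checked with a few lines of Python. This is a sketch of the formula only, not of the Phred program itself; the function names are hypothetical.

```python
import math


def phred_from_error_prob(p: float) -> float:
    """Phred quality score Q for a base-call error probability P."""
    if not 0.0 < p <= 1.0:
        raise ValueError("error probability must be in (0, 1]")
    return -10.0 * math.log10(p)


def error_prob_from_phred(q: float) -> float:
    """Base-call error probability P for a Phred quality score Q."""
    return 10.0 ** (-q / 10.0)


for q in (10, 20, 30, 40, 50):
    p = error_prob_from_phred(q)
    print(f"Q = {q}: 1 error in {round(1 / p)} bases, accuracy {100 * (1 - p):.3f}%")

# Expected sequencing errors in a 300-base stretch (one SNP on average) at Q = 30.
print(f"errors per 300 bases at Q = 30: {300 * error_prob_from_phred(30):.2f}")
```

At Q = 30 the expected number of miscalled bases per 300-base stretch is of the same order as the single SNP expected there, which is why substantially higher quality is needed for reliable SNP calls.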

Use of SNPs as genetic markers
Before 1980, genetic maps were constructed by measuring recombination frequencies between genes giving measurable phenotypic traits
This goes back at least to Sturtevant and Morgan, if not to Mendel
At that time, phenotypes were the only visible aspect of the genome
In 1980, Botstein, Davis, Skolnick & White proposed using polymorphic DNA markers for genetic mapping, even if they had no known phenotypic effect
Example (then): restriction sites → SNPs → restriction fragment length polymorphisms (RFLPs)
Did linkage mapping with restriction sites
Now we can use SNPs

Traits depending on multiple loci
Use of SNPs to identify traits, including but not limited to diseases, that depend on multiple loci
Single genes fo

