Analyzing human population genetic history through the study of genetic variation
Mark Mata
Mentor: Eleazar Eskin
UCLA ZarLab
SoCalBSI 2009
Background
To study human population genetic history is to study part of human evolution
Human evolution is one of the fundamental questions in science. We ask ourselves many questions, such as:
Where do we come from? Why are we all different? How are we all different?
Background
The ZarLab studies the most recent events in human evolution: now that we have modern humans, what variations have occurred in our genes since our ancient African ancestors?
To answer this question, our group is looking at human variation to produce a genetic history of these changes
Why do we care?
Many diseases are caused by variations that have occurred in our genetic history
Better understanding of our genetic history and human variation may eventually lead to better treatment plans
Personalized medicine: “The right drug, in the right dose, to the right person,
at the right time.”
PerkinElmer website: http://las.perkinelmer.com/content/snps/genotyping.asp#snps
Human Variation
Modern humans share 99.9% of our DNA
The remaining 0.1% accounts for the variation between humans
Of this, 80% of the variation is the result of SNPs
SNP (single-nucleotide polymorphism): a position in the genome where two different bases are present in the population. The base at a SNP on a chromosome is referred to as the “allele”
A haplotype is the sequence of alleles on a genome
The other 20% comes from deletions or insertions in the genome
PerkinElmer website: http://las.perkinelmer.com/content/snps/genotyping.asp#snps
International HapMap Project
Study done by the International HapMap Consortium
“…create a public, genome-wide database of common human sequence variation…”
Identified SNPs and compiled the SNP alleles into a database of haplotypes for four different populations (Phase 1)
The population used was a group of 60 Mormons in Utah
  Have been widely studied in the past
  Western and Northern European descent
  Have very detailed records
  Used their chromosome 19
“A haplotype map of the human genome” by: The International HapMap Consortium. Nature. Published 27 October 2005
My Project
Goals
Reconstruct human genetic history
  This is a very difficult problem
Sub-problem: Identify recent genetic events
  Make the assumption that these new genetic events are rare or very few in number
  Easier to classify and identify relationships when compared to older, more common haplotypes
  These new events are important because they identify shared recent ancestry
  Disease-causing variations could be from recent events
Identifying Recent Genetic Events
1. Select a region in a haplotype and find the frequency of variation
2. Group variations into common and rare
3. Find recent point mutations
4. Find recent recombinations
Workflow
[Figure: workflow diagram in three columns: Individual’s Haplotypes, Frequency of Variation, Identify Events]
Common: AAAAAAAAAA (49%), TTTTTTTTTT (48%)
Rare: AAAAAAAAAT (1%), AATTTTTTTT (1%), TTTTTTATTT (1%)
Identified events: AAAAAAAAAT*, AA|TTTTTTTT, TTTTTTA*TTT
Frequency of Variation
[Figure: each individual’s haplotype is reduced to the selected region, and identical regions are counted]
Region       How Many
AAAAAAAAAA   59
TTTTTTTTTT   58
AAAAAAAAAT   1
AATTTTTTTT   1
TTTTTTATTT   1
Frequency of Variation
Region       How Many   Frequency of Variation
AAAAAAAAAA   59/120     ~49%
TTTTTTTTTT   58/120     ~48%
AAAAAAAAAT   1/120      ~1%
AATTTTTTTT   1/120      ~1%
TTTTTTATTT   1/120      ~1%
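A minimal Python sketch of this counting step, using toy data built to match the counts on the slide. The function name `window_frequencies` and the list-of-strings input are assumptions for illustration, not the project's actual code.

```python
from collections import Counter

def window_frequencies(haplotypes, start, width):
    """Count each distinct subsequence found in haplotype[start:start+width].

    Returns {subsequence: (count, fraction_of_all_haplotypes)}.
    """
    counts = Counter(h[start:start + width] for h in haplotypes)
    total = len(haplotypes)
    return {seq: (n, n / total) for seq, n in counts.items()}

# Toy data mirroring the slide: 120 haplotypes, 15 SNPs each.
haplotypes = (
    ["A" * 15] * 59
    + ["T" * 15] * 58
    + ["AAAAAAAAATTTTTT", "AATTTTTTTTTTTTT", "TTTTTTATTTTTTTT"]
)

freqs = window_frequencies(haplotypes, start=0, width=10)
for seq, (n, f) in sorted(freqs.items(), key=lambda kv: -kv[1][0]):
    print(f"{seq}  {n}/{len(haplotypes)}  ~{f:.0%}")
```

The region is the first 10 SNPs of each 15-SNP haplotype, as drawn on the slide.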
Grouping Variations
Regions are classified as either common or rare haplotypes
Make the assumption that new genetic events are rare or very few in number
A cutoff of 5% frequency was used: subsequences at 5% or higher are common, and subsequences below 5% are rare
The 5% cutoff comes from the International HapMap Consortium study
“A haplotype map of the human genome” by: The International HapMap Consortium. Nature. Published 27 October 2005
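The grouping step can be sketched in a few lines of Python. The function name `group_by_frequency` and the dictionary input are assumptions for illustration; the 5% cutoff and the example frequencies are the ones on the slides.

```python
def group_by_frequency(freqs, cutoff=0.05):
    """Split subsequences into (common, rare) lists.

    A subsequence is common when its frequency is at or above the cutoff.
    """
    common = sorted(seq for seq, f in freqs.items() if f >= cutoff)
    rare = sorted(seq for seq, f in freqs.items() if f < cutoff)
    return common, rare

# Frequencies from the slide example (counts out of 120 haplotypes).
freqs = {
    "AAAAAAAAAA": 59 / 120,
    "TTTTTTTTTT": 58 / 120,
    "AAAAAAAAAT": 1 / 120,
    "AATTTTTTTT": 1 / 120,
    "TTTTTTATTT": 1 / 120,
}

common, rare = group_by_frequency(freqs)
print("Common:", common)
print("Rare:  ", rare)
```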
Grouping Variations
Region       Frequency of Variation   Group
AAAAAAAAAA   49%                      Common
TTTTTTTTTT   48%                      Common
AAAAAAAAAT   1%                       Rare
AATTTTTTTT   1%                       Rare
TTTTTTATTT   1%                       Rare
Recent Events
Make comparisons to identify two forms of variation:
  Point mutations
  Recombination events
Common: AAAAAAAAAA, TTTTTTTTTT
Rare: AAAAAAAAAT, AATTTTTTTT, TTTTTTATTT
Point Mutations
[Figure: workflow diagram repeated for the point mutation step; events marked with “*”: AAAAAAAAAT*, TTTTTTA*TTT]
Recent Events
Point mutations
  Are found by comparing a common haplotype with a rare haplotype
  A difference of one base shows that a rare haplotype is a point mutation of a common haplotype
  Marked by a “*” next to the point mutation
Common: TTTTTTTTTT
Rare: TTTTTTATTT
Marked: TTTTTTA*TTT
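A small Python sketch of the comparison described here: a rare haplotype that differs from some common haplotype at exactly one position is marked with “*” after the differing base. The function name `mark_point_mutation` is an assumption for illustration, not the project's actual code.

```python
def mark_point_mutation(rare, common_haplotypes):
    """Return `rare` with '*' after its single differing base, or None.

    A rare haplotype counts as a point mutation of a common haplotype
    when the two have equal length and differ at exactly one position.
    """
    for common in common_haplotypes:
        if len(rare) != len(common):
            continue
        diffs = [i for i, (r, c) in enumerate(zip(rare, common)) if r != c]
        if len(diffs) == 1:
            i = diffs[0]
            return rare[:i + 1] + "*" + rare[i + 1:]
    return None

common = ["AAAAAAAAAA", "TTTTTTTTTT"]
for rare in ["AAAAAAAAAT", "AATTTTTTTT", "TTTTTTATTT"]:
    print(rare, "->", mark_point_mutation(rare, common))
```

In this example AATTTTTTTT differs from both common haplotypes at more than one position, so it is not reported as a point mutation.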
Recombination
[Figure: workflow diagram repeated for the recombination step; event marked with “|”: AA|TTTTTTTT]
Recent Events
Recombination
  Combine portions of two common haplotypes and see if they form a rare haplotype
Common: AAAAAAAAAA, TTTTTTTTTT
Possible Recombinations:
  AA|TTTTTTTT
  AAA|TTTTTTT
  AAAA|TTTTTT
  AAAAA|TTTTT
  AAAAAA|TTTT
  AAAAAAA|TTT
  AAAAAAAA|TT
Rare Mutations
Marked by a “|” at the border between one haplotype and another haplotype
Possible Recombinations: AA|TTTTTTTT, AAA|TTTTTTT, AAAA|TTTTTT, AAAAA|TTTTT, AAAAAA|TTTT, AAAAAAA|TTT, AAAAAAAA|TT
Actual Recombinations: AA|TTTTTTTT
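A Python sketch of the recombination search: join a prefix of one common haplotype to the suffix of another and check whether the result equals a rare haplotype. The function name `mark_recombination` is an assumption for illustration. The slide lists breakpoints 2 through 8; this sketch tries every interior breakpoint, which is also an assumption.

```python
def mark_recombination(rare, common_haplotypes):
    """Return `rare` with '|' at the breakpoint, or None if no match.

    Tries left[:k] + right[k:] for every ordered pair of distinct common
    haplotypes and every interior breakpoint k.
    """
    n = len(rare)
    for left in common_haplotypes:
        for right in common_haplotypes:
            if left == right or len(left) != n or len(right) != n:
                continue
            for k in range(1, n):
                if left[:k] + right[k:] == rare:
                    return rare[:k] + "|" + rare[k:]
    return None

common = ["AAAAAAAAAA", "TTTTTTTTTT"]
print(mark_recombination("AATTTTTTTT", common))  # AA|TTTTTTTT
print(mark_recombination("TTTTTTATTT", common))  # None
```

Because a breakpoint next to the end of the window is indistinguishable from a point mutation, this sketch is meant to run on rare haplotypes that step 3 did not already explain.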
Sample input and output
chr-haplotypes.txt:        new_chr-haplotypes.txt:
Indv1 TTTTTTTTTTTTTTT      Indv1 T T T T T T T T T T
Indv1 AAAAAAAAATTTTTT      Indv1 A A A A A A A A A T*
Indv2 AATTTTTTTTTTTTT      Indv2 A A|T T T T T T T T
Indv2 TTTTTTATTTTTTTT      Indv2 T T T T T T A*T T T
Visualization Tool
Expanding to the Whole Chromosome
Now that we have a way to look for variations in regions of a chromosome, we can expand the technique to look for variations in a whole chromosome
We used a technique of overlapping windows
 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
|AAAAAAAAAA|
     |AAAAAAAAAA|
          |AAAAAAAAAA|
               |AAAAAAAAAA|
                    |AAAAAAAAAA|
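A short Python sketch of how the overlapping windows could be generated. The window width of 10 and step of 5 follow the slides; the function name `overlapping_windows` and the right-aligned final window for lengths that do not divide evenly are assumptions.

```python
def overlapping_windows(length, width=10, step=5):
    """Return (start, end) index pairs covering a haplotype of `length` SNPs.

    Consecutive windows overlap by width - step positions. If the last
    regular window stops short of the end, one extra right-aligned
    window is added (an assumption, not taken from the slides).
    """
    starts = list(range(0, max(length - width, 0) + 1, step))
    if starts[-1] + width < length:
        starts.append(length - width)
    return [(s, s + width) for s in starts]

haplotype = "A" * 30
for start, end in overlapping_windows(len(haplotype)):
    print(" " * start + "|" + haplotype[start:end] + "|")
```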
Overlapping Windows
[Figure: workflow diagram repeated for the overlapping windows step]
Overlapping
Recombination events that looked like point mutations
Common: AAAAAAAAAAAAAAA, TTTTTTTTTTTTTTT
Rare: AAAAAAAAATTTTTT
First 10:
  Common: AAAAAAAAAA
  Rare: AAAAAAAAAT*
Slide over 5 and next 10:
  Common: AAAAAAAAAA, TTTTTTTTTT
  Rare: AAAA|TTTTTT
Combined: AAAAAAAAA|T*TTTTT becomes AAAAAAAAA|TTTTTT
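One way to express this clean-up in Python, as a sketch only. Events are stored as positions in whole-haplotype coordinates; the rule used here (drop a point mutation that sits directly beside a recombination breakpoint) is an assumption inferred from the example, and the names `annotate` and `reconcile` are hypothetical.

```python
def annotate(haplotype, point_mutations, breakpoints):
    """Render a haplotype with '|' before each breakpoint index
    and '*' after each mutated base."""
    out = []
    for i, base in enumerate(haplotype):
        if i in breakpoints:
            out.append("|")
        out.append(base)
        if i in point_mutations:
            out.append("*")
    return "".join(out)

def reconcile(point_mutations, breakpoints):
    """Drop point mutations adjacent to a breakpoint (assumed rule):
    they are treated as the edge of a recombination seen in a window
    that was too short to show both sides."""
    return {p for p in point_mutations
            if p not in breakpoints and p + 1 not in breakpoints}

rare = "AAAAAAAAATTTTTT"
point_mutations = {9}   # from the first window (positions 0-9)
breakpoints = {5 + 4}   # from the second window (starts at 5, break after 4 bases)

print(annotate(rare, point_mutations, breakpoints))
print(annotate(rare, reconcile(point_mutations, breakpoints), breakpoints))
```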
Applying to a Population’s Chromosome
Now that we have a technique to look for new variations in a whole chromosome, we can apply it to a population and identify regions where recent genetic events took place
Identified Recent Genetic Events
In chromosome 19:
Unique point mutations = 13723
Unique recombination events = 4065
Total unique events = 15697
Total point mutations = 46072
Total recombination events = 11381
Total number of events = 57453
Average point mutations per individual = 383
Average recombination events per individual = 94
Average events per individual = 478
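A quick arithmetic check in Python of how the averages relate to the totals. The divisor of 120 is an assumption: it is the number of haplotypes used in the frequency counts earlier (for example 59/120), and dividing each total by 120 and truncating reproduces the averages listed.

```python
totals = {
    "point mutations": 46072,
    "recombination events": 11381,
    "all events": 57453,
}
n_haplotypes = 120  # assumption: 60 individuals x 2 haplotypes each

# The two event totals add up to the overall total.
assert totals["point mutations"] + totals["recombination events"] == totals["all events"]

averages = {name: total // n_haplotypes for name, total in totals.items()}
for name, avg in averages.items():
    print(f"Average {name}: {avg}")
```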
[Figure: Point Mutations; axis label: SNP Position in the Haplotype]
[Figure: Recombination Events; axis labels: Haplotype, SNP Position in the Haplotype]
[Figure: Point Mutations and Recombination Events; axis labels: Haplotype, SNP Position in the Haplotype]
Conclusion
We have developed an algorithm for identifying recent genetic events in an individual
More point mutations were identified than recombination events
Certain regions in the genome have many recent genetic events, while other regions have few
Future Work
Run the algorithm over the whole genome
Extend the algorithm to multiple populations
Identify recent events that are unique to a population vs. ones that are shared
Identify genetic relations between common haplotypes
Create a chronological order of recent events in an individual
Adapt the algorithm for high-throughput sequencing data
UCLA ZarLab
  Dr. Eleazar Eskin
  All the lab people
SoCalBSI
  Dr. Jamil Momand
  Dr. Sandra Sharp
  Dr. Nancy Warter-Perez
  Dr. Wendie Johnston
  Dr. Beverly Krilowicz
  Dr. Silvia Heubach
  Dr. Jennifer Faust
  Ronnie Cheng
Funded By:
SoCalBSI 2009 Interns
Determining ancestors
The other ancestors are determined through SNP differences of 2 or more
My Project
Red line: Point mutation
Blue line: Ancestor to common relationship
Black dashed line: Haplotype resulted from crossover mutation
Graph
Graph is generated by Graphviz, a graph visualization program
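A minimal Python sketch that writes Graphviz DOT text using the line styles from the legend (red for point mutations, blue for ancestor relationships, black dashed for recombination). The function name `to_dot`, the edge labels, and the example edges are assumptions for illustration; the project's actual graph-building code is not shown in the slides.

```python
def to_dot(edges):
    """Build a DOT digraph from (source, target, kind) triples."""
    styles = {
        "point_mutation": 'color="red"',
        "ancestor": 'color="blue"',
        "recombination": 'color="black", style="dashed"',
    }
    lines = ["digraph haplotypes {"]
    for src, dst, kind in edges:
        lines.append(f'  "{src}" -> "{dst}" [{styles[kind]}];')
    lines.append("}")
    return "\n".join(lines)

# Hypothetical edges based on the slide example.
edges = [
    ("TTTTTTTTTT", "TTTTTTATTT", "point_mutation"),
    ("AAAAAAAAAA", "AATTTTTTTT", "recombination"),
    ("TTTTTTTTTT", "AATTTTTTTT", "recombination"),
]
dot_text = to_dot(edges)
print(dot_text)
```

Saving the output to a file such as `graph.dot` and running `dot -Tpng graph.dot -o graph.png` renders it as an image.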
Graph