identification and genotyping of single feature polymorphisms in complex genomes justin borevitz...
Post on 15-Jan-2016
Identification and Genotyping of Single Feature Polymorphisms in Complex Genomes
Justin Borevitz
University of Chicago
naturalvariation.org
Talk Outline
• Intro/QTL mapping
• Single Feature Polymorphisms (SFPs)
– Potential deletions
• Bulk Segregant Mapping
– Extreme Array Mapping
• Transcriptional profiling
– for QTL candidate genes
Quantitative Trait Loci
[Figure: marker map of the QTL region. Labels: EPI1, EPI2. Markers: SNP377, SM184, SM50, SM35, SM106, G2395, SNP65, SM40, SEQ8298, TH1, MSAT7964, MAT7787, CER45. Positions (Mb): 5.50, 5.87, 6.34, 7.01, 7.30, 7.44, 7.60, 7.79, 7.96, 8.13, 8.29, 8.65, 9.32.]
Near-Isogenic Lines for LIGHT1 Ler / Cvi #3
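[Figure: near-isogenic line genotypes and phenotypes. Line labels as extracted: 81N-J, 17A-A/J, 114, 124, 189, Ler, 194. Plants per line: 6, 2, 4, 3, 3, 3, 3. Phenotype (mm): 5.0, 5.8, 5.8, 5.1, 5.9, 5.7, 5.8. Gene labels: RVE7, GI.]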
What is Array Genotyping?
• Affymetrix expression GeneChips contain 202,806 unique 25 bp oligonucleotides.
• 11 features per probe set for 21,546 genes
• New arrays have even more
• Genomic DNA is randomly labeled with biotin, product ~50 bp.
• 3 independent biological replicates compared to the reference strain Col
GeneChip
Potential Deletions
Spatial Correction
Spatial Artifacts
Improved reproducibility
Next: Quantile Normalization
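Quantile normalization, named above as the next step, can be sketched as follows. This is a minimal illustration in plain Python, not the code used for the talk; the function name `quantile_normalize` and the tie handling (ties keep their original order) are assumptions made for the example.

```python
def quantile_normalize(arrays):
    """Give every array the same intensity distribution.

    arrays: list of equal-length lists, one list of feature intensities
    per hybridization. Returns new lists; the inputs are not modified.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of features")

    # Sort each array, then average across arrays at each rank.
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [sum(s[i] for s in sorted_arrays) / len(arrays) for i in range(n)]

    normalized = []
    for a in arrays:
        # Feature indices ordered from lowest to highest intensity.
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            # The feature at this rank receives the cross-array mean for that rank.
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


chips = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
print(quantile_normalize(chips))
```

After normalization each array contains the same set of values, so the remaining differences between arrays reflect feature rank rather than overall labeling or scanning intensity.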
False Discovery and Sensitivity
PM only

SAM threshold: 5% FDR
                   Total   SFPs    nonSFPs
GeneChip                   3806    89118     Cereon marker accuracy: 100%
Sequence           817     121     696
Polymorphic        340     117     223       Sensitivity: 34%
Non-polymorphic    477     4       473
False Discovery rate: 3%
Test for independence of all factors: Chisq = 177.34, df = 1, p-value = 1.845e-40

SAM threshold: 18% FDR
                   Total   SFPs    nonSFPs
GeneChip                   10627   82297     Cereon marker accuracy: 100%
Sequence           817     223     594
Polymorphic        340     195     145       Sensitivity: 57%
Non-polymorphic    477     28      449
False Discovery rate: 13%
Test for independence of all factors: Chisq = 265.13, df = 1, p-value = 1.309e-59
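The sensitivity, false discovery rate, and chi-square figures above follow directly from the four counts for the sequenced features. A small sketch that reproduces them is given here; the function name `sfp_call_stats` is hypothetical, and the chi-square is the plain Pearson statistic without continuity correction, which is the variant that matches the reported values.

```python
import math


def sfp_call_stats(tp, fn, fp, tn):
    """Summary statistics for SFP calls checked against sequence.

    tp, fn: polymorphic features called SFP / nonSFP
    fp, tn: non-polymorphic features called SFP / nonSFP
    """
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)      # fraction of true polymorphisms detected
    fdr = fp / (tp + fp)              # fraction of SFP calls that are false
    # Pearson chi-square for a 2x2 table, no continuity correction.
    chi2 = n * (tp * tn - fn * fp) ** 2 / (
        (tp + fn) * (fp + tn) * (tp + fp) * (fn + tn)
    )
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return sensitivity, fdr, chi2, p_value


for label, counts in [("5% FDR", (117, 223, 4, 473)), ("18% FDR", (195, 145, 28, 449))]:
    sens, fdr, chi2, p = sfp_call_stats(*counts)
    print(f"{label}: sensitivity={sens:.2f} FDR={fdr:.2f} chisq={chi2:.2f} p={p:.3e}")
```

Loosening the threshold raises sensitivity from 34% to 57% while the observed false discovery rate among sequenced features rises from 3% to 13%.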
3/4 Cvi markers were also confirmed in PHYB
90% 80% 70%
41% 53% 85%
90% 80% 70%
67% 85% 100%
Cereon may be a sequencing error
TIGR match is a match
Effect of SNP position
340 Candidate Polymorphisms
False negative
True Positive
Chip genotyping of a Recombinant Inbred Line
29kb interval
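One way to locate a recombination breakpoint from per-SFP genotype calls along a chromosome is sketched below. It assumes each SFP has already been called as parent 0 or parent 1 and smooths isolated miscalls with a sliding majority vote; the function name `find_breakpoints`, the window size, and the toy coordinates are illustrative assumptions, not the method used for the slide.

```python
def find_breakpoints(positions, calls, window=5):
    """Return intervals (left_bp, right_bp) in which the parental genotype switches.

    positions: SFP coordinates in bp, sorted ascending
    calls: genotype call per SFP, 0 or 1 (one parent each)
    window: number of neighboring SFPs used for the majority vote
    """
    half = window // 2
    smoothed = []
    for i in range(len(calls)):
        lo, hi = max(0, i - half), min(len(calls), i + half + 1)
        neighborhood = calls[lo:hi]
        # Majority vote; an exact tie is resolved toward genotype 0.
        smoothed.append(1 if sum(neighborhood) * 2 > len(neighborhood) else 0)

    intervals = []
    for i in range(1, len(smoothed)):
        if smoothed[i] != smoothed[i - 1]:
            # The breakpoint lies between the two flanking SFPs.
            intervals.append((positions[i - 1], positions[i]))
    return intervals


positions = [1000 * i for i in range(12)]
calls = [0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1]   # one isolated miscall at index 2
print(find_breakpoints(positions, calls))
```

The width of the reported interval is set by the spacing of the two SFPs that flank the switch, so denser SFP coverage gives tighter breakpoints.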
Discovery: 6 replicates × $500 / 12,000 SFPs = $0.25 per SFP
Typing: 1 replicate × $500 / 12,000 SFPs = $0.041 per SFP
LIGHT1 NIL
Potential Deletions
>500 potential deletions
45 confirmed by Ler sequence
23 (of 114) transposons
Disease Resistance (R) gene clusters
Single R gene deletions
Genes involved in Secondary metabolism
Unknown genes
Potential Deletions Suggest Candidate Genes
FLOWERING1 QTL
Chr1 (bp)
Flowering Time QTL caused by a natural deletion in MAF1
MAF1
MAF1 natural deletion
Fast Neutron deletions
FKF1 80kb deletion CHR1
cry2 10kb deletion CHR1
Het
Map bibb
100 bibb mutant plants
100 wt plants
bibb mapping
ChipMap
AS1
Bulk segregant Mapping using Chip hybridization
bibb maps to Chromosome 2 near ASYMMETRIC LEAVES1
BIBB = ASYMMETRIC LEAVES1
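The idea behind bulk segregant mapping on arrays can be sketched as a scan for the region where allele frequency differs most between the mutant and wild-type pools. The code below is an illustration only: it assumes pool allele frequencies are estimated by linear interpolation of the pooled signal between the two parental signals, and the names `allele_frequency`, `scan_chromosome`, and `peak`, the window width, and the toy data are all hypothetical.

```python
def allele_frequency(pool_signal, ref_signal, alt_signal):
    """Estimate the pool's allele frequency at one SFP by linear interpolation
    between the signals of the two parents (an assumed, simplified model)."""
    return (pool_signal - ref_signal) / (alt_signal - ref_signal)


def scan_chromosome(sfps, window_cm=20.0):
    """sfps: list of (cM, mutant_pool_freq, wt_pool_freq), sorted by cM.

    Returns a list of (cM, mean frequency difference), averaging over all
    SFPs that lie within window_cm / 2 of each SFP.
    """
    scan = []
    for center, _, _ in sfps:
        diffs = [m - w for pos, m, w in sfps if abs(pos - center) <= window_cm / 2]
        scan.append((center, sum(diffs) / len(diffs)))
    return scan


def peak(scan):
    """Position with the largest absolute frequency difference."""
    return max(scan, key=lambda item: abs(item[1]))


# Toy chromosome: the pools diverge most near 30 cM.
sfps = [(0, 0.50, 0.50), (10, 0.40, 0.50), (20, 0.20, 0.60),
        (30, 0.00, 0.67), (40, 0.20, 0.60), (50, 0.50, 0.50)]
print(peak(scan_chromosome(sfps)))
```

Averaging over a window trades resolution for noise reduction, which matters because a single feature's pooled signal is a noisy estimate of allele frequency.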
Sequenced AS1 coding region from bib-1 … found g -> a change that would introduce a stop codon in the MYB domain
bibb as1-101
MYB
bib-1 W49*
as-101 Q107*
as1 bibb
AS1 (ASYMMETRIC LEAVES1) = MYB closely related to PHANTASTICA, located at 64 cM
[Figure: arr11 mutant pool, allele frequency (-0.2 to 0.2) vs. position (cM), one panel per chromosome, Chromosomes 1 to 5]
arrhythmic11: mapping confirmed (Sam Hazen)
[Figure: arr90 mutant pool, allele frequency (-0.2 to 0.2) vs. position (cM), one panel per chromosome, Chromosomes 1 to 5]
arrhythmic90: gene cloned (Sam Hazen)
[Figure: arr21 mutant pool, allele frequency (-0.4 to 0.4) vs. position (cM), one panel per chromosome, Chromosomes 1 to 5]
arrhythmic21: allelic to arr90 (Sam Hazen)
stamenstay (Ler), Sarah Liljegren: mapping confirmed
[Figure: stamenstay mutant pool, allele frequency (-0.5 to 0.5) vs. position (cM), one panel per chromosome, Chromosomes 1 to 5]
[Figure: ein6 F2 mutant pool, allele frequency (-0.6 to 0.6) vs. position (cM), one panel per chromosome, Chromosomes 1 to 5]
ein6 een double mutant (Ramlah Nehring): mapping confirmed
eXtreme Array Mapping
[Figure: histogram of hypocotyl length (mm) for Kas/Col RILs in red light; x-axis 6 to 14 mm, y-axis counts 0 to 12]
15 tallest RILs pooled vs. 15 shortest RILs pooled
eXtreme Array Mapping
Red light QTL RED2 from 100 Kas/Col RILs
Allele frequencies determined by SFP genotyping. Thresholds set by simulations
15 tallest RILs pooled vs. 15 shortest RILs pooled
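A simulation of the kind that could set such a threshold is sketched here. It is not the simulation used for the talk: it assumes that under the null hypothesis pool membership is unrelated to genotype, that each RIL carries either parental allele with probability 0.5, and that markers are independent (linkage is ignored). The function name `null_threshold` and all default parameters other than the 100 RILs and pools of 15 are assumptions.

```python
import random


def null_threshold(n_rils=100, pool_size=15, n_markers=20,
                   n_sims=500, alpha=0.05, seed=0):
    """Genome-wide threshold for |allele frequency difference| between two pools.

    Returns the (1 - alpha) quantile of the largest difference seen across
    n_markers in each of n_sims simulated null experiments.
    """
    rng = random.Random(seed)
    maxima = []
    for _ in range(n_sims):
        # Null hypothesis: the two extreme pools are a random draw of RILs.
        chosen = rng.sample(range(n_rils), 2 * pool_size)
        tall, short = chosen[:pool_size], chosen[pool_size:]
        largest = 0.0
        for _ in range(n_markers):
            # Homozygous RILs: allele 0 or 1, equally likely (assumed).
            genotypes = [rng.randint(0, 1) for _ in range(n_rils)]
            freq_tall = sum(genotypes[i] for i in tall) / pool_size
            freq_short = sum(genotypes[i] for i in short) / pool_size
            largest = max(largest, abs(freq_tall - freq_short))
        maxima.append(largest)
    maxima.sort()
    index = min(n_sims - 1, int((1 - alpha) * n_sims))
    return maxima[index]


print(null_threshold())
```

Taking the maximum over markers before computing the quantile controls the error rate across the whole scan rather than at a single marker.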
[Figure: Composite Interval Mapping, LOD (0 to 16) vs. position (0 to 100 cM) on Chromosome 2, showing the RED2 QTL]
RED2 QTL 12 cM
Fine Mapping with Arrays
[Figure: genotype (-1.0 to 1.0) vs. position (0 to 600, axis labeled kb), five panels titled Chromosome 1 to Chromosome 5 (cM)]
Single Additive Gene
1000 F2s
Select recombinants by PCR
1 Mb region
SFPs for reverse genetics
http://naturalvariation.org/sfp
14 Accessions 30,950 SFPs
Barley SFPs gDNA
• 9 arrays, random labeled genomic DNA
• 3 wild type, 3 parent 1, 3 parent 2
• Hope to verify some RNA SFPs
• Pairs plots, correlation matrix
• SFP table
Just better than permutations
delta   ori.data   perm.data   difference   FDR
0.10    2866       2114.2      751.8        0.74
0.15    1870       578.4       1291.6       0.31
0.20    1274       269.3       1004.7       0.21
0.25    991        174.7       816.3        0.18
0.30    816        126.8       689.2        0.16
0.35    660        95.8        564.2        0.15
0.40    554        75.8        478.2        0.14
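In the table above, the difference column equals ori.data minus perm.data and the FDR column matches perm.data divided by ori.data to the printed precision. The sketch below recomputes both and picks the smallest delta that meets a target FDR; the function names `sam_fdr_row` and `pick_delta` are hypothetical helpers, not part of any SAM package.

```python
def sam_fdr_row(called, expected_false):
    """called: features called in the real data (ori.data);
    expected_false: mean number called in permuted data (perm.data)."""
    difference = called - expected_false
    fdr = expected_false / called
    return round(difference, 2), round(fdr, 2)


def pick_delta(table, max_fdr):
    """table: list of (delta, ori, perm), sorted by increasing delta.
    Returns the first row whose FDR is at or below max_fdr, else None."""
    for delta, ori, perm in table:
        if perm / ori <= max_fdr:
            return delta, ori
    return None


barley_gdna = [(0.10, 2866, 2114.2), (0.15, 1870, 578.4), (0.20, 1274, 269.3),
               (0.25, 991, 174.7), (0.30, 816, 126.8), (0.35, 660, 95.8),
               (0.40, 554, 75.8)]
print(sam_fdr_row(2866, 2114.2))
print(pick_delta(barley_gdna, 0.20))
```

For these data a 20% FDR target is first met at delta 0.25, which leaves 991 called features.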
Increase specific activity with other labeling methods
Perform more replicates
• Single Feature Polymorphisms
– Improve with replicates (easy)
– Improved statistical models
• Genotyping
– Precisely define recombination breakpoints
– Fine mapping
• Potential Deletions
– Candidate genes/induced mutations
• Bulk segregant Mapping
– eXtreme Array Mapping, F2s etc
• Look for gene expression differences between genotypes
• Identify candidate genes that map to mutation
• Downstream targets that map elsewhere
Transcription based cloning
differences may be due to expression or hybridization
PAG1 down regulated in Cvi
PALE GREEN1 knockout has long hypocotyl in red light
SFPs from RNA
• Barley Affy array 22,801 probe sets
– Most probe sets have 11 probes
– Background correction “rma2”
– Quantile normalization
• 36 arrays total
– 3 replicates
– 6 tissues: leaf, crown, root, radicle, gem, col?
– 2 genotypes: Golden Promise (7,459 ESTs) and Morex (52,695 ESTs)
Look at some plots raw data
Remove probe effect
Remove tissue effect
Remove Genotype effect
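The three removal steps above can be illustrated for a single probe set as successive centering: subtract each probe's mean, then each tissue's mean, then each genotype's mean. What remains is the probe-by-genotype residual, where an RNA-based SFP would show up. This is a sketch under a strong assumption (a balanced design with log-scale intensities), and the names `center_by` and `probe_by_genotype_residuals` are hypothetical; it is not presented as the model fitted for the talk.

```python
from statistics import mean


def center_by(values, labels):
    """Subtract from each value the mean of all values sharing its label."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    means = {label: mean(vals) for label, vals in groups.items()}
    return [value - means[label] for value, label in zip(values, labels)]


def probe_by_genotype_residuals(obs):
    """obs: list of (probe, tissue, genotype, log_intensity) for ONE probe set.

    Returns {(probe, genotype): mean residual} after removing probe, tissue,
    and genotype effects in that order.
    """
    values = [o[3] for o in obs]
    values = center_by(values, [o[0] for o in obs])   # remove probe effect
    values = center_by(values, [o[1] for o in obs])   # remove tissue effect
    values = center_by(values, [o[2] for o in obs])   # remove genotype effect
    cells = {}
    for o, residual in zip(obs, values):
        cells.setdefault((o[0], o[2]), []).append(residual)
    return {key: mean(vals) for key, vals in cells.items()}


# Toy probe set: probe p2 hybridizes poorly in Morex only.
probe_eff = {"p1": 8.0, "p2": 9.0, "p3": 10.0}
tissue_eff = {"leaf": 1.0, "root": 0.0}
geno_eff = {"GP": 0.0, "Morex": 0.5}
obs = [(p, t, g, probe_eff[p] + tissue_eff[t] + geno_eff[g]
        - (2.0 if (p, g) == ("p2", "Morex") else 0.0))
       for p in probe_eff for t in tissue_eff for g in geno_eff]
residuals = probe_by_genotype_residuals(obs)
print(min(residuals, key=residuals.get))
```

Because there is no reference comparison here, a candidate SFP appears as a residual of either sign, and a real analysis would also need replicate-level noise to judge significance.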
SAM False Discovery Rate
delta   ori.data   perm.data   difference   FDR
0.1     13210      1210.34     11999.66     0.091623013
0.2     7903       183.95      7719.05      0.023275971
0.3     5462       49.18       5412.82      0.009004028
0.4     4036       18.31       4017.69      0.004536670
0.5     3024       8.49        3015.51      0.002807540
0.6     2285       3.85        2281.15      0.001684902
Both + and – SFPs since no reference comparison
Need to compare with ESTs
Review
• Single Feature Polymorphisms (SFPs) can be used to identify recombination breakpoints and potential deletions, and for eXtreme Array Mapping and haplotyping
• Expression analysis to identify QTL candidate genes and downstream responses that consider polymorphisms
RNA DNA
Universal Whole Genome Array
Transcriptome Atlas: expression levels, tissue specificity
Gene Discovery: gene model correction, non-coding/micro-RNA, antisense transcription
Alternative Splicing
Comparative Genome Hybridization (CGH): insertions/deletions
Methylation
Chromatin Immunoprecipitation: ChIP chip
Polymorphism SFPs: discovery/genotyping
~35 bp tile, non-repetitive regions, “good” binding oligos, evenly spaced
NaturalVariation.org
Syngenta
Hur-Song Chang, Tong Zhu
University of Guelph, Canada
Dave Wolyn
Salk
Jon Werner, Todd Mockler, Sarah Liljegren, Ramlah Nehring, Joanne Chory, Detlef Weigel, Joseph Ecker
UC Davis
Julin Maloof
UC San Diego
Charles Berry
Scripps
Sam Hazen, Elizabeth Winzeler