![Page 1: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/1.jpg)
GBS Usage
Cases:
Non-model
OrganismsKatie E. Hyma, PhD
Bioinformatics Core
Institute for Genomic Diversity
Cornell University
![Page 2: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/2.jpg)
Q: How many SNPs will I get?
A: 42.
What question do you really want to ask?
![Page 3: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/3.jpg)
Q: How many SNPs do I need?
A: 42.
What question are you trying to answer?
![Page 4: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/4.jpg)
The Best Advice I can Give You:
![Page 5: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/5.jpg)
The Best Advice I can Give You:
Does my Experimental Design make Sense?
Do I have a Plan for Analysis?
Do my Results make Biological Sense??
![Page 6: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/6.jpg)
Why can GBS be
complicated?
Sequencing Error
SNP calling
Missing data
Large data sets
Population type
Biological Reality
![Page 7: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/7.jpg)
Is GBS more complicated for
non-model organisms?
I don’t have a reference genome
We don’t know anything about
population structure or diversity
My reference genome is in a pre-
assembly state, or the physical ordering is
inaccurate
My species is outcrossing
![Page 8: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/8.jpg)
Roadmap
No reference genome (UNEAK pipeline)
SNP CALLING – reference pipeline
Examples and usage cases
![Page 9: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/9.jpg)
The non-model problem:No Reference Genome
![Page 10: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/10.jpg)
Universal Network Enabled Analysis Kit (UNEAK)
A reference free SNP calling pipeline
Fei Lu, Cornell University
![Page 11: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/11.jpg)
Overview of UNEAK
Genome is digested, sequenced using GBS
Reads are trimmed to 64 bp
Identical reads = tag
A
B
slide courtesy Fei Lu, Cornell University
![Page 12: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/12.jpg)
Overview of UNEAK – Network filter
Pairwise alignment to findtag pairs with 1 bp mismatch
Topology of tag networks
Keep common reciprocal tags
C
E
D
errorreal tags
F
Build tag networks
count
slide courtesy Fei Lu, Cornell University
![Page 13: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/13.jpg)
Topology of tag networks
Tag
Error
Plastid &Highly repetitivetags
Moderatelyrepetitive tags,Paralogs &SNPs
Networks of 2496 tags
slide courtesy Fei Lu, Cornell University
![Page 14: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/14.jpg)
Details about network filter
SNP
Error tolerance
slide courtesy Fei Lu, Cornell University
![Page 15: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/15.jpg)
Pipeline validated with maize inbred linkage population –network filter
0
0.01
0.02
0.03
0.04
0.05
0.06
0.07
0
0.0
8
0.1
6
0.2
4
0.3
2
0.4
0.4
8
0.5
6
0.6
4
0.7
2
0.8
0.8
8
0.9
6Pro
po
rtio
n o
f S
NP
s
Allele frequency
Single-site rate(Blast to maize)
Allele frequencydistribution
0
0.005
0.01
0.015
0.02
0.025
0.03
0.035
0.04
0.045
0.05
0
0.0
7
0.1
4
0.2
1
0.2
8
0.3
5
0.4
2
0.4
9
0.5
6
0.6
3
0.7
0.7
7
0.8
4
0.9
1
0.9
8
Pro
po
rtio
n o
f S
NP
s
Allele frequency
23.3% 85.0%
Step 1 Pairwise alignment of tags
Step 2Network filter
Evaluationcriteria
slide courtesy Fei Lu, Cornell University
![Page 16: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/16.jpg)
Pipeline validated with maize inbred linkage population –SNP validation
LD distribution (SNPs against 1106 markers)
Alignment (B97 tags against B97 shotgun genome from Maize HapMap2 data)
93.2% SNPs are polymorphic
0
0.01
0.02
0.03
0.04
0.05
0.06
0.07
0.08
0
0.0
4
0.0
8
0.1
2
0.1
6
0.2
0.2
4
0.2
8
0.3
2
0.3
6
0.4
0.4
4
0.4
8
0.5
2
0.5
6
0.6
0.6
4
0.6
8
0.7
2
0.7
6
0.8
0.8
4
0.8
8
0.9
2
0.9
6
Pro
po
rtio
n o
f S
NP
s
LD (r2)
0.2
92.2%
slide courtesy Fei Lu, Cornell University
![Page 17: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/17.jpg)
![Page 18: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/18.jpg)
Discovery
Tag Counts
SNP Caller
Genotypes
Tags by Taxa
Sequence
Network Filter
GBS UNEAK Pipeline Fork
1) UTagCountToTagPairPlugin
2) UExportTagPairPlugin
TASSEL 3.0
![Page 19: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/19.jpg)
No Reference Genome SNPs can be called with the UNEAK pipeline if
there is no reference genome.
Be careful with your sampling strategy
Optimized for single species
Allele frequency can matter!
Decide whether your downstream analysis will be qualitatively influenced by biases in SNP calling
Are you doing association mapping? Estimating population genetic parameters?
![Page 20: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/20.jpg)
No Reference Genome
Tag pairs are maximum 1 bp different / 64
bp sequenced
Examples when sampling strategy matters:
The same locus may be identified as two separate loci when there is
population structure and overall divergence is > 1.5%
With highly divergent sub-populations some loci that are monomorphic within
a subpopulation but diverged > 1bp/tag between species will not be
detected
![Page 21: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/21.jpg)
No Reference Genome
Allele frequency can matter for SNP calling
in the UNEAK pipeline – and is influenced by
your population/ sampling strategy
Undersampled diversity (and true rare
alleles) can look like error, and are even
harder to detect with the non-reference
pipeline
![Page 22: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/22.jpg)
Sequence based SNP calling:
What’s in a SNP?
Total Reads# minor allele reads to call
heterozygote (1% error rate)
2 1
3 1
4 1
5 1
6 1
7 2
8 2
9 2
10 2
11 2
12 2
13 2
14 3
Lookup table for GBS pipeline is calculated based on binomial distribution
![Page 23: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/23.jpg)
Sequence based SNP calling:
What’s in a SNP?
Total Reads# minor allele reads to call
heterozygote (1% error rate)# minor allele reads to call
heterozygote (0.33% error rate)
2 1 1
3 1 1
4 1 1
5 1 1
6 1 1
7 2 1
8 2 1
9 2 2
10 2 2
11 2 2
12 2 2
13 2 2
14 3 2
Lookup table for GBS pipeline is calculated based on binomial distribution
![Page 24: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/24.jpg)
Sequence based SNP calling:
What’s in a SNP?
Total read depth: 7
Major allele (“G”) count: 4
Minor allele (“C”) count: 3
SNP call: “S” (heterozygous G/C)
![Page 25: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/25.jpg)
Sequence based SNP calling:
What’s in a SNP?
Total read
depth: 7
Major allele
(“G”) count: 6
Minor allele
(“A”) count: 1
SNP call: “G”
(invariant)
![Page 26: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/26.jpg)
Sequence based SNP calling:
What’s in a SNP?
Total read
depth: 3
Major allele
(“G”) count: 3
Minor allele
(“C”) count: 0
SNP call: “G”
(homozygote)
![Page 27: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/27.jpg)
Sequence based SNP calling:
What’s in a SNP?
Total read depth: 3
Major allele (“G”) count: 2
Minor allele (“A”) count: 1
SNP call: “R” (A/G heterozygote)
![Page 28: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/28.jpg)
Sequence based SNP calling:
What’s in a SNP?
Depth Depth
![Page 29: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/29.jpg)
Heterozygote undercalling in
Vitis spp.
Without filters Aa > AA error rate
(heterozygote undercalling)
in Vitis is
30-50%
50,951 SNPs
Mean depth 5.7x
![Page 30: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/30.jpg)
Heterozygote Undercalling
Solutions
1. Focus on the high coverage genotypes (through VCF plugin)
Ex: Vitisgen project
2. Use genetics (ex. Maize project)
Each project is different, and different solutions are appropriate for each situation
![Page 31: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/31.jpg)
SNP calling in Vitis
(heterozygous)
Read Depth of Each allele is captured for SNP calling in Vitis using the Variant Call format (VCF)
0/0:3,0,0:3:88:0,0,108
Plugins available in TASSEL 3 to output VCF format tbt2vcfPlugin MergeDuplicateSNP_vcf_Plugin Does NOT work with HDF5 TBT/TOPM
VCF integration in TASSEL 4 in now implemented
Qi Sun Jon Zhang
Jeff Glaubitz
Terry Casstevens
![Page 32: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/32.jpg)
Sequenced based SNP calling: What’s in a SNP?
VCF pipeline SNP calling algorithm
Hohenlohe PA, Cresko WA et al. PLOS Genetics
2010 Feb 26;6(2):e1000862.
Likelihood score calculation. No allele
frequency based prior probability was
used due to mixed population
structures.
![Page 33: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/33.jpg)
Heterozygote undercalling in
Vitis spp.
0%
20%
40%
60%
80%
100%
120%
no filter >3 >5 GQ>98
Filtering criteria
Call rate
% Errors
8%
15%
30%
3%
100%
82%
56%
67%
depth depth- - -
![Page 34: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/34.jpg)
Working With VCF Files VCFtools
http://vcftools.sourceforge.net/
VCFtools output formats: plink *** can be used as TASSEL input
012 IMPUTE ldhat BEAGLE-GL
VCFtools is installed on CBSU BioHPC Computing Lab http://cbsu.tc.cornell.edu/lab/lab.aspx
![Page 35: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/35.jpg)
How to Get VCF Output Tassel 3.0
1. QseqToTagCountPlugin/FastqToTagCountPlugin2. MergeMultipleTagCountPlugin3. TagCountToFastqPlugin4. Align5. SAMConverterPlugin6. QseqToTBTPlugin/FastqToTBTPlugin (use –y to output in TBTByte format)7. MergeTagsByTaxaFilesPlugin8. tbt2vcfPlugin9. MergeDuplicateSNP_vcf_Plugin
running each plugin without any options will output a list of available options on the console there is no plugin to merge taxa for VCF files. Merge samples by name at the TBT stage if desired using
the flag –x with the MergeTagsByTaxaFilesPlugin.
TASSEL 4.0 From TagsToSNPByAlignmentPlugin
![Page 36: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/36.jpg)
The non-model problem:Less-than-perfect reference genome
![Page 37: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/37.jpg)
How less than perfect is it? Even relatively short contigs are suitable for anchoring 64bp
GBS tags!
Assembly errors can affect alignment and SNP calling.
Common with short read assemblies
Reference allele does not match GBS alleles
Reference genome assembly
Paralogous regions
Biological Reality
![Page 38: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/38.jpg)
Less than Perfect Reference
SNPs can be called with the reference
pipeline even on genomes that are not
fully assembled
Hint: concatenate contigs into pseudo-
chromosomes, but pad with “Ns” in
between contigs to avoid aligning to
junctions – for 64 bp tags pad with > 64 “Ns”
![Page 39: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/39.jpg)
A non-model case study:Vitis – a highly heterozygous outcrossing crop
species with a genus-level breeding program
![Page 40: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/40.jpg)
“Mapping the way to a better grape”
– 37 Mapping populations
– Genotyping-by-Sequencing (GBS) and SSR based approach to identify QTL and candidate genes
– biotic stress resistance, abiotic stress resistance, and fruit quality.
– Inclusion of 32 different wild and cultivated species to help in map building and genome assembly.
![Page 41: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/41.jpg)
Outcrosses
• Use pseudo-testcross markers (Aa x aa)
• Leverage deep sequencing of parentalgenotyping to phase markers
• Filter by Mendelian errors and/or Linkage Disequilibrium
• Methods development in Vitis for genetic mapping with GBS data
![Page 42: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/42.jpg)
• Dense Mapping of 4-way Cross GBS Data
• Three pipelines:
– Synteny
– Synteny with genetic ordering
– De-novo linkage group formation and genetic ordering
Dense maps from GBS data – 4 way cross
![Page 43: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/43.jpg)
Dense maps from GBS data – 4 way cross
Progeny Number (index)
• ~26,000 markers for a single bi-parental family with ~ 200 progeny
• Separate maps for male and female parents
• No parental genotypes
• Up to 50% missing data per SNP locus
Chromosome 1 markers from male parent in a VitisGen family (phased) and ordered by physical position
Ph
ysic
al p
osi
tio
n
![Page 44: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/44.jpg)
The non-model problem:Take home messages
![Page 45: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/45.jpg)
Designing your GBS study Know what you need for your analyses before
planning your project
GBS is a powerful reduced representation approach, with two distinct components molecular methods
bioinformatics pipelines
The GBS bioinformatics pipelines were developed and optimized for specific applications in mind They can work for you, but be mindful of your
needs
![Page 46: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/46.jpg)
Keep Calm and… Filter – expect to filter up to 90% of SNPs when imputation is not
practical
Pedigree independent filtering: Read depth/genotype quality Percentage of absent calls (missingness) Minor allele frequency For inbred species, inbreeding coefficient filters out most paralogous
sites
Likely to require customized bioinformatics – especially without a reference genome
Be ready for post-processing of data into other formats! Have a plan!
![Page 47: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/47.jpg)
The Best Advice I can Give You:
![Page 48: GBS Usage Cases: Non-model Organisms...VCF pipeline SNP calling algorithm Hohenlohe PA, Cresko WA et al. PLOS Genetics 2010 Feb 26;6(2):e1000862. Likelihood score calculation. No allele](https://reader030.vdocuments.site/reader030/viewer/2022041012/5ec07851c920a177b7172540/html5/thumbnails/48.jpg)
The Best Advice I can Give You:
Does my Experimental Design make Sense?
Do I have a Plan for Analysis?
Do my Results make Biological Sense??