Helmholtz Zentrum München, Institute of Human Genetics, 10.07.2012
Institute of Human Genetics
Variant Calling
Methods
Samtools multisample variant calling
Input BAM files per call set:
• HG00117 Lab1 to Lab6 BAM + HG00117 Lab7 BAM
• HG00117 Lab1 to Lab6 BAM + HG00355 Lab7 BAM
• HG00117 Lab1 to Lab6 BAM + NA06986 Lab7 BAM
• HG00117 Lab1 to Lab6 BAM + NA19095 Lab7 BAM
• HG00117 Lab1 to Lab6 BAM + NA20527 Lab7 BAM
Comparison with 1000G data
• Filtering of 1000G variants:
    • Located in Gencode v12 transcripts
    • Covered by at least one read in the corresponding BAM files
• Comparison of VCF files with a custom Perl script
• Output:
    • matching.vcf
    • not_matching.vcf
    • not_found.vcf
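The comparison step listed above could look roughly like the following Python sketch. The custom Perl script itself is not shown in the slides, so the rules used here are assumptions for illustration only: both inputs are single-sample VCF files, phasing is ignored, sites with missing genotypes are skipped, and the function names `read_genotypes` and `compare` are hypothetical.

```python
def read_genotypes(handle):
    """Map (chrom, pos) -> (sorted allele tuple, raw line) for a single-sample VCF."""
    calls = {}
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        alleles = [ref] + alt.split(",")
        fmt = fields[8].split(":")
        sample = fields[9].split(":")
        gt = sample[fmt.index("GT")].replace("|", "/")  # ignore phasing
        if "." in gt:
            continue  # missing genotype
        # Compare bases rather than allele indices, so differing ALT lists still match.
        bases = tuple(sorted(alleles[int(i)] for i in gt.split("/")))
        calls[(chrom, pos)] = (bases, line)
    return calls


def compare(truth_handle, called_handle):
    """Sort every truth-set variant into matching / not_matching / not_found."""
    truth = read_genotypes(truth_handle)
    called = read_genotypes(called_handle)
    out = {"matching": [], "not_matching": [], "not_found": []}
    for key, (genotype, line) in truth.items():
        if key not in called:
            out["not_found"].append(line)
        elif called[key][0] == genotype:
            out["matching"].append(line)
        else:
            out["not_matching"].append(line)
    return out
```

Each list in the returned dictionary corresponds to one of the three output files; writing them out with the original VCF header would reproduce the matching.vcf, not_matching.vcf, and not_found.vcf layout.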
Comparison with 1000G data
Sample    CorrectGTs  WrongGTs  NotFoundGTs
HG00117   146911      18932     2692240
HG00355   146192      20208     2691683
NA06986   141685      20138     2696260
NA19095   134587      25872     2697624
NA20527   147888      18632     2691563
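To put the counts in the table into proportion, the short Python sketch below derives two ratios per sample. The ratio definitions are an assumption made here and do not come from the slides: genotype concordance is taken as CorrectGTs / (CorrectGTs + WrongGTs), and the found fraction as the share of all compared 1000G variants that were called at all.

```python
# Counts copied from the table: (CorrectGTs, WrongGTs, NotFoundGTs)
counts = {
    "HG00117": (146911, 18932, 2692240),
    "HG00355": (146192, 20208, 2691683),
    "NA06986": (141685, 20138, 2696260),
    "NA19095": (134587, 25872, 2697624),
    "NA20527": (147888, 18632, 2691563),
}


def concordance(correct, wrong):
    """Share of called sites whose genotype agrees with 1000G."""
    return correct / (correct + wrong)


def found_fraction(correct, wrong, not_found):
    """Share of compared 1000G variants that were called at all."""
    return (correct + wrong) / (correct + wrong + not_found)


for sample, (correct, wrong, not_found) in counts.items():
    print(f"{sample}\t{concordance(correct, wrong):.3f}\t"
          f"{found_fraction(correct, wrong, not_found):.3f}")
```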
Comparison with 1000G data
[Figure: two panels, CorrectGTs and NotFoundGTs; axis: number of reads]
Methods
…
Samtools multisample variant calling, one call set per lab:
• Lab1 BAM: HG00117, HG00355, NA06986, NA19095, NA20527
• Lab2 BAM: HG00117, HG00355, NA06986, NA19095, NA20527
• Lab3 BAM: HG00117, HG00355, NA06986, NA19095, NA20527
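A per-lab multisample call of the kind listed above could be assembled as in the following Python sketch. The slides do not state which options were used, so everything specific here is an assumption: the file naming scheme `<sample>_<lab>.bam`, the reference file name, and the option set, which mirrors the `samtools mpileup ... | bcftools view -bvcg` example from the samtools 0.1.x documentation of that period (newer releases use `bcftools mpileup` and `bcftools call` instead). The sketch only builds the command string and does not run it.

```python
import shlex

SAMPLES = ["HG00117", "HG00355", "NA06986", "NA19095", "NA20527"]


def bam_paths(lab, samples=SAMPLES):
    """Hypothetical naming scheme: <sample>_<lab>.bam"""
    return [f"{sample}_{lab}.bam" for sample in samples]


def calling_command(lab, reference="reference.fa"):
    """Build a samtools 0.1.x style multisample calling pipeline for one lab."""
    bams = " ".join(shlex.quote(path) for path in bam_paths(lab))
    ref = shlex.quote(reference)
    out = shlex.quote(f"{lab}.raw.bcf")
    # All BAM files of one lab go into a single mpileup so they are called jointly.
    return f"samtools mpileup -uf {ref} {bams} | bcftools view -bvcg - > {out}"


for lab in ["Lab1", "Lab2", "Lab3"]:
    print(calling_command(lab))
```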
Known RNA editing sites
chrom  position   gene    % reads  overlap with 1000G  coverage
chr4   158257875  GRIA2   0        no                  0
chr4   158281294  GRIA2   0        no                  0
chr4   57976234   IGFBP7  0        no                  0
chr4   57976286   IGFBP7  0        no                  1
chr5   156736808  CYFIP2  0        no                  6573
chr11  105804694  GRIA4   0        no                  1
chr12  5021742    KCNA1   0        no                  31
chr20  36147533   BLCAP   0        no                  2365
chr20  36147563   BLCAP   0        no                  1930
chr20  36147572   BLCAP   16%      no                  1829
chr21  30953750   GRIK1   0        no                  24
chrX   122598962  GRIA3   67%      ?                   3
chrX   151358319  GABRA3  0        ?                   19939
chrX   153579950  FLNA    0        ?                   0
Li et al., Science, 2009
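The "% reads" column is not defined in the slides. Assuming it denotes the share of reads at a site that carry the edited base, it could be computed as in this small Python sketch; the function name and the read counts in the usage note are hypothetical.

```python
def percent_edited(edited_reads, total_reads):
    """Percentage of reads showing the edited base, or None without coverage."""
    if total_reads == 0:
        return None  # no reads, so no percentage can be given
    return round(100 * edited_reads / total_reads)
```

Under this assumption, a site covered by only 3 reads of which 2 show the edited base gives `percent_edited(2, 3)`, which rounds to 67. Percentages at such low coverage rest on very few reads and should be read with caution.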
Future work
• Multisample calling on all GEM BAM files
• Systematic analysis of not_found/not_matching calls
• Comparison with imputation data (and future phase 2 data)
• Systematic calling of RNA editing sites (e.g. Ramaswami et al., Nature Methods, 2012)