Creating a SNP calling pipeline

Potato SNPs, Dan Bolser and David Martin, Next Gen Bug, Dundee, 01/18/2010

Uploaded by dan-bolser on 14-Jun-2015


DESCRIPTION

Creating a SNP calling pipeline in the context of the Potato Genome Sequencing Consortium project.

TRANSCRIPT

Page 1: Creating a SNP calling pipeline


Potato SNPs

Dan Bolser and David Martin

Next Gen Bug, Dundee, 01/18/2010


Aims of the work

1) Learn about handling RNASeq: create a SNP calling pipeline

2) Select SNPs for genetic mapping using Illumina's GoldenGate SNP chip (OPA)


Creating a SNP calling pipeline


Align (using BWA)

1) Index the potato genome assembly

bwa index [-a bwtsw|div|is] [-c] <in.fasta>

2) Perform the alignment (bwa aln writes the .sai file to standard output)

bwa aln [options] <in.fasta> <in.fq> > <in.sai>

3) Output results in SAM format (single end)

bwa samse <in.fasta> <in.sai> <in.fq> > <out.sam>
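BWA (the Burrows-Wheeler Aligner) and Bowtie both index the reference with a Burrows-Wheeler transform: the index step builds it and the alignment step searches it. The sketch below is a toy, exact-match-only illustration of FM-index backward search; it is not BWA's or Bowtie's implementation, which compress the index and (in bwa aln) allow mismatches and gaps. The function names are hypothetical.

```python
def build_fm_index(text: str):
    """Build a naive suffix array, C table and occurrence table for text."""
    text += "$"  # sentinel that sorts before A, C, G and T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT character for each sorted suffix is the character preceding it
    # (text[-1] is the sentinel when the suffix starts at position 0).
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # C[c] = number of characters in text that sort strictly before c
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i] = occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return sa, C, occ


def backward_search(pattern: str, sa, C, occ):
    """Return sorted 0-based start positions of exact matches of pattern."""
    lo, hi = 0, len(sa)  # current suffix-array interval [lo, hi)
    for c in reversed(pattern):  # extend the match one character to the left
        if c not in C:
            return []
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])
```

For example, indexing ACGTACGTTACG and searching for ACG returns [0, 4, 9]. The search touches each pattern character once regardless of genome size, which is why these aligners index the assembly up front.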


Align (using Bowtie)

1) Index the potato genome assembly

bowtie-build [options] <in.fasta> <ebwt>

2) Perform the alignment and output results in SAM format (-S)

bowtie -S [options] <ebwt> <in.fq> > <out.sam>


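Convert (using SAMtools)

1) Convert SAM to BAM for sorting

samtools view -S -b <in.sam> > <out.bam>

2) Sort BAM for SNP calling

samtools sort <in.bam> <out.bam.s>

In the SAMtools releases of the time, the second argument of sort is an output prefix, so this writes <out.bam.s>.bam; current releases use samtools sort -o <out.bam> <in.bam> instead.

Alignments are both compressed for long-term storage and sorted for variant discovery.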
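Coordinate sorting orders alignments by reference sequence and then by leftmost position, so that a SNP caller can sweep along the genome one column at a time. A minimal Python sketch over SAM text lines, assuming references are ordered as they appear in the @SQ header lines and that unmapped reads (RNAME *) go last; the function name is hypothetical and BAM compression is ignored entirely.

```python
def sort_sam_by_coordinate(sam_lines):
    """Return header lines followed by alignments sorted by (reference, position)."""
    header = [l for l in sam_lines if l.startswith("@")]
    records = [l for l in sam_lines if l and not l.startswith("@")]
    # Reference order follows the @SQ lines, e.g. "@SQ\tSN:P1\tLN:1000".
    ref_order = {}
    for line in header:
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    ref_order[field[3:]] = len(ref_order)

    def key(record):
        fields = record.split("\t")
        rname, pos = fields[2], int(fields[3])  # SAM columns 3 (RNAME) and 4 (POS)
        if rname == "*":
            return (len(ref_order), 0)  # unmapped reads sort after every reference
        return (ref_order[rname], pos)

    # sorted() is stable, so ties keep their input order
    return header + sorted(records, key=key)
```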


Coverage profiles / depth vectors


SAMtools...

Dump a coverage profile

samtools mpileup -f <in.fasta> <my.bam.s>

P1 244526 A 10 ...,.,,,.. BBQa`aaaa[
P1 244527 A 10 ...,.,,,.. BBZ_`^a_a[
P1 244528 C 10 .$.$.,.,,,.. >>RaZ`aaaa
P1 244529 C 8 .,.,,,.. NaXaaaa`
P1 244530 T 8 .,.,,,.. Xa\_aaa`
P1 244531 C 8 .,.,,,.. Rb\abbaa
P1 244532 T 9 .,.,,,..^~. EE^^^^^^A
P1 244533 T 9 .,.,,,... BB\\\\\\B
P1 244534 T 9 .$,$.,,,... @@^^^^^^E
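Each pileup line gives the reference name, position, reference base, read depth, the read bases and their base qualities. In the read-base column, . and , are matches to the reference on the forward and reverse strands, ^ marks a read start (followed by one mapping-quality character) and $ marks a read end. A minimal Python sketch that tallies one line; the function names are hypothetical, it assumes the standard pileup encoding of indels (+N or -N followed by N bases), deletions (*) and reference skips (> or <), and it does not interpret the quality column.

```python
import re


def count_read_bases(read_bases: str) -> dict:
    """Tally reference matches, mismatching bases, deletions and skips."""
    counts = {"ref": 0, "A": 0, "C": 0, "G": 0, "T": 0, "N": 0, "del": 0, "skip": 0}
    i = 0
    while i < len(read_bases):
        c = read_bases[i]
        if c == "^":
            i += 2  # skip '^' and the mapping-quality character after it
            continue
        if c == "$":
            i += 1  # end-of-read marker, not a base
            continue
        if c in "+-":
            # Indel: '+' or '-', a length, then that many inserted/deleted bases
            length = re.match(r"\d+", read_bases[i + 1:]).group()
            i += 1 + len(length) + int(length)
            continue
        if c in ".,":
            counts["ref"] += 1
        elif c == "*":
            counts["del"] += 1
        elif c in "<>":
            counts["skip"] += 1
        else:
            counts[c.upper()] += 1  # mismatch; lower case means reverse strand
        i += 1
    return counts


def parse_pileup_line(line: str):
    """Split a pileup line (tab- or space-separated) and tally its read bases."""
    chrom, pos, ref, depth, read_bases, _quals = line.split()[:6]
    return chrom, int(pos), ref, int(depth), count_read_bases(read_bases)
```

Applied to the line at position 244532, this reports nine reference matches, agreeing with the depth column.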


SAMtools via Bio::DB::Sam (BioPerl)

Dump a coverage profile 2


P41630

Matches: 9

0 2 3 3 3 3 3 3 3 3 3 3 3 3 4 5 5 5 5 5 5 5 5 5 5 6 6 6 7 7 8 8 8 8 8 8 8 8 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 7 6 6 6 6 6 6 6 6 6 6 6 6 5 4 4 4 4 4 4 4 4 4 4 3 3 3 2 2 1 1 1 1 1 1 1 1 0 0 0
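A depth vector like this one is the number of reads covering each base of a region. A minimal Python sketch that builds one from SAM-style (POS, CIGAR) pairs using a difference array; the function names and the toy reads in the tests are hypothetical, and it assumes only M, = and X operations count as coverage (D and N advance along the reference without covering it), which is one common convention and not necessarily what Bio::DB::Sam reports.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_blocks(pos: int, cigar: str):
    """Yield half-open [start, end) reference intervals covered by aligned bases.

    pos is the 1-based leftmost position from the SAM POS column.
    """
    ref = pos
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":
            yield ref, ref + n
            ref += n
        elif op in "DN":
            ref += n  # deletion or intron: reference advances, no coverage counted
        # I, S, H and P consume no reference bases


def depth_vector(reads, start: int, end: int):
    """Per-base depth for 1-based positions start..end (inclusive)."""
    diff = [0] * (end - start + 2)
    for pos, cigar in reads:
        for block_start, block_end in aligned_blocks(pos, cigar):
            lo, hi = max(block_start, start), min(block_end, end + 1)
            if lo < hi:
                diff[lo - start] += 1  # coverage starts here
                diff[hi - start] -= 1  # and stops just after the block
    depths, running = [], 0
    for delta in diff[:-1]:
        running += delta
        depths.append(running)
    return depths
```

The difference array keeps the cost proportional to the number of aligned blocks plus the window length, rather than to reads times read length.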


mpileup

samtools mpileup collects summary information in the input BAMs, computes the likelihood of data given each possible genotype and stores the likelihoods in the BCF format.

bcftools view applies the prior and does the actual calling.

Finally, we filter.
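The split between the two tools can be illustrated with a deliberately simplified diploid model. Each genotype is scored by the probability of the observed bases given that genotype (the mpileup half), and that score is then combined with a prior to pick the most probable genotype (the bcftools half). This Python sketch assumes independent errors with probability 10^(-Q/10), spread evenly over the three other bases, and uses hypothetical prior values; SAMtools' actual error model and priors are more elaborate.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"


def genotype_log_likelihoods(observations):
    """log10 P(data | genotype) for all 10 diploid genotypes.

    observations: list of (base, phred_quality) pairs from one pileup column.
    """
    lls = {}
    for a1, a2 in combinations_with_replacement(BASES, 2):
        ll = 0.0
        for base, qual in observations:
            err = 10 ** (-qual / 10)
            p1 = 1 - err if base == a1 else err / 3
            p2 = 1 - err if base == a2 else err / 3
            ll += math.log10(0.5 * p1 + 0.5 * p2)  # each allele sampled with probability 1/2
        lls[a1 + a2] = ll
    return lls


def call_genotype(observations, ref_base, het_prior=1e-3, hom_alt_prior=5e-4):
    """Pick the genotype with the highest (unnormalised) posterior.

    The prior values are hypothetical, for illustration only.
    """
    best_gt, best_score = None, -math.inf
    for gt, ll in genotype_log_likelihoods(observations).items():
        if gt == ref_base * 2:
            prior = 1 - het_prior - hom_alt_prior
        elif ref_base in gt:
            prior = het_prior
        elif gt[0] == gt[1]:
            prior = hom_alt_prior
        else:
            prior = het_prior * hom_alt_prior  # two non-reference alleles: rare
        score = ll + math.log10(prior)
        if score > best_score:
            best_gt, best_score = gt, score
    return best_gt
```

With ten Q30 reads split evenly between A and G at a reference A, the heterozygous AG genotype wins; with ten G reads it is GG.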


SNP call

1) Index the potato genome assembly (again!)

samtools faidx in.fasta

2) Run 'mpileup' to compute genotype likelihoods in BCF format

samtools mpileup -ug -f in.fasta my1.bam.s my2.bam.s > my.raw.bcf

In effect, this step is largely a format conversion (BAM to BCF, the binary form of VCF), with genotype likelihoods computed along the way; no variants are called yet.


VCF format


A standard format for sequence variation: SNPs, indels and structural variants.

Can be compressed (bgzip) and indexed (tabix).

Developed for the 1000 Genomes Project.

VCFtools is to VCF what SAMtools is to SAM.

Specification and tools available from http://vcftools.sourceforge.net
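Each VCF data line has eight fixed tab-separated columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), optionally followed by a FORMAT column and one column per sample. A minimal Python sketch that reads one record into a dictionary; the function name is hypothetical, and a real pipeline would be better served by an established VCF library.

```python
FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_vcf_record(line: str, sample_names=()):
    """Parse one VCF data line; INFO and per-sample columns become dicts."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(FIXED_COLUMNS, fields[:8]))
    record["POS"] = int(record["POS"])
    record["ALT"] = record["ALT"].split(",")  # several ALT alleles are comma-separated
    info = {}
    for item in record["INFO"].split(";"):
        key, sep, value = item.partition("=")
        info[key] = value if sep else True  # flags such as INDEL carry no value
    record["INFO"] = info
    if len(fields) > 8:
        keys = fields[8].split(":")  # FORMAT column, e.g. GT:PL
        record["SAMPLES"] = {
            name: dict(zip(keys, column.split(":")))
            for name, column in zip(sample_names, fields[9:])
        }
    return record
```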


SNP call and filter

1) Call SNPs

bcftools view -bvcg my.raw.bcf > my.var.bcf

2) Filter SNPs

bcftools view my.var.bcf | vcfutils.pl varFilter > my.var.flt.vcf
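varFilter applies several heuristics, depth limits among them. The Python sketch below only illustrates the general idea of a hard filter over VCF text: it keeps header lines and records whose QUAL and INFO DP fall within thresholds. The thresholds and function names are hypothetical and do not reproduce varFilter's defaults or rules.

```python
def passes_filter(line: str, min_qual=20.0, min_dp=3, max_dp=100) -> bool:
    """Hypothetical hard filter on QUAL and INFO DP for one VCF data line."""
    fields = line.rstrip("\n").split("\t")
    qual = fields[5]  # "." means missing
    if qual == "." or float(qual) < min_qual:
        return False
    info = {}
    for item in fields[7].split(";"):
        key, _, value = item.partition("=")
        info[key] = value
    if "DP" not in info:
        return False  # no depth reported: drop conservatively
    # Very high depth can indicate repeats or collapsed gene copies.
    return min_dp <= int(info["DP"]) <= max_dp


def filter_vcf(lines, **thresholds):
    """Yield header lines unchanged and only the records that pass."""
    for line in lines:
        if line.startswith("#") or passes_filter(line, **thresholds):
            yield line
```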


Aims of the work

1) Learn about handling RNASeq: create a SNP calling pipeline

2) Select SNPs for genetic mapping using Illumina's GoldenGate SNP chip (OPA)


Select SNPs for genetic mapping using Illumina's GoldenGate SNP chip (OPA)


SNP chip (OPA) construction

A set of DM SNP positions was provided by the SolCAP project (RNASeq derived).

A subset was selected for developing OPAs (Illumina’s SNP chip technology).

OPAs were run, and results have now been compared to RNASeq.
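A comparison like this comes down to matching genotype calls at the same SNP positions and counting agreements. A minimal Python sketch, assuming both call sets have already been reduced to dictionaries mapping a hypothetical SNP identifier to a genotype string, with None for a failed call. It reports the chip call rate and the concordance over SNPs called by both methods; the data layout is illustrative, not the project's actual format.

```python
def call_rate(calls: dict) -> float:
    """Fraction of SNPs with a non-missing genotype call."""
    if not calls:
        return 0.0
    return sum(gt is not None for gt in calls.values()) / len(calls)


def concordance(chip_calls: dict, rnaseq_calls: dict):
    """Return (agreeing, compared, discordant SNP ids) for SNPs called by both."""
    shared = [
        snp for snp in chip_calls
        if chip_calls[snp] is not None and rnaseq_calls.get(snp) is not None
    ]
    # Treat genotypes as unordered allele pairs so "AG" matches "GA".
    discordant = [
        snp for snp in shared
        if sorted(chip_calls[snp]) != sorted(rnaseq_calls[snp])
    ]
    return len(shared) - len(discordant), len(shared), discordant
```

The discordant identifiers are the interesting output: they are the SNPs worth inspecting in the RNASeq alignments.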


Comparison (using an early SAMtools)




Comparison (using new SAMtools)



Looking into the RNASeq data…


[Figure: the potato genome assembly and two RNASeq read libraries]


A lot more questions to answer…

Track down more ‘strange’ SNPs based on the expected allele frequency spectrum (AFS) of the two samples.

Go beyond biallelic SNPs.

Check the OPA base: was the right base probed by the chip?
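One simple per-site check for ‘strange’ SNPs is whether the observed allele fraction in a sample is plausible under the fraction expected for it (for example 0.5 at a heterozygous site). A minimal Python sketch using an exact two-sided binomial test on reference and alternative read counts; the function names, the 0.5 default and the significance level are hypothetical choices. RNASeq read counts can also be skewed by allele-specific expression, so a low p-value is a prompt to look closer rather than proof of an error.

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def binom_two_sided_p(k: int, n: int, p: float) -> float:
    """Sum the probabilities of all outcomes no more likely than the observed one."""
    observed = binom_pmf(k, n, p)
    total = sum(
        binom_pmf(i, n, p) for i in range(n + 1)
        if binom_pmf(i, n, p) <= observed * (1 + 1e-7)  # tolerance for float ties
    )
    return min(1.0, total)


def flag_strange_snps(counts: dict, expected_alt_fraction=0.5, alpha=0.01):
    """counts maps a SNP id to (ref_reads, alt_reads); return ids that look odd."""
    flagged = []
    for snp, (ref_reads, alt_reads) in counts.items():
        n = ref_reads + alt_reads
        if n and binom_two_sided_p(alt_reads, n, expected_alt_fraction) < alpha:
            flagged.append(snp)
    return flagged
```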


Thank you for your patience!


OPAs in 5 steps...

The DNA sample is activated for binding to paramagnetic particles.


Three oligos are designed for each SNP locus: two allele-specific oligos (ASOs), one for each allele of the SNP, and one locus-specific oligo (LSO).


Several wash steps remove excess and mis-hybridized oligos.

Extension of the appropriate ASO and ligation to the LSO joins information about the genotype to the address sequence on the LSO.


The single-stranded, dye-labeled DNAs are hybridized to their complement bead type through their unique address sequences.


Key to the assay:

Scalable, multiplexed sample preparation (one-tube reaction).

Highly parallel array-based read-out.

High-quality data: average call rates above 99%.