How Sequencing Experiments Fail - Babraham Institute


How Sequencing Experiments Fail

v1.0

Simon Andrews

[email protected]


Classes of Failure

• Technical: Something went wrong with a machine

• Tracking: Samples aren’t what they’re supposed to be

• Library: Problems during sequencing library preparation

• Contamination: Unexpected material in your libraries

• Biological: Samples didn’t behave the way you expected

• Interpretation: Drawing the wrong conclusion from the data


Technical


Technical Failures

Figure: signal level in each of the four channels (G, A, T, C) for three example base calls, labelled "Call = T, Confidence = High", "Call = T, Confidence = Low" and "Call = T, Confidence = Low".
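The figure's point is that a call is made by comparing the four channel signals, and that confidence depends on how clearly one channel stands out. As an illustration only, the sketch below assumes a hypothetical rule: call the strongest channel, and report "High" confidence if it beats the runner-up by at least half of its own signal. Real base callers use calibrated models rather than this margin; `call_base` and `min_margin` are made-up names.

```python
def call_base(signals: dict[str, float], min_margin: float = 0.5) -> tuple[str, str]:
    """Toy base caller (hypothetical rule, not any instrument's real algorithm).

    Calls the channel with the strongest signal. Confidence is 'High' when the
    winner beats the runner-up by at least min_margin (as a fraction of the
    winning signal), otherwise 'Low'.
    """
    ranked = sorted(signals.items(), key=lambda item: item[1], reverse=True)
    (best_base, best), (_, runner_up) = ranked[0], ranked[1]
    margin = (best - runner_up) / best if best > 0 else 0.0
    return best_base, ("High" if margin >= min_margin else "Low")


# Clean signal: T clearly dominates the other channels.
print(call_base({"G": 0.05, "A": 0.10, "T": 0.90, "C": 0.05}))  # ('T', 'High')
# Noisy signal: T is still strongest, but A is close behind.
print(call_base({"G": 0.10, "A": 0.60, "T": 0.70, "C": 0.20}))  # ('T', 'Low')
```

Both calls are T; only the separation from the other channels changes the confidence.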


Phred Scores

$\text{Phred} = -10 \log_{10} p$, where $p$ is the probability that the call is incorrect

10% error = Phred 10

1% error = Phred 20

0.1% error = Phred 30
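The relationship can be checked numerically. A minimal Python sketch; the function names are illustrative, not taken from any particular library:

```python
import math


def phred_from_error(p: float) -> float:
    """Phred = -10 * log10(p), where p is the probability the call is wrong."""
    return -10 * math.log10(p)


def error_from_phred(q: float) -> float:
    """Inverse transform: p = 10 ** (-Phred / 10)."""
    return 10 ** (-q / 10)


for p in (0.1, 0.01, 0.001):
    print(f"{p:.1%} error -> Phred {phred_from_error(p):.0f}")
```

Each tenfold drop in error probability adds 10 to the Phred score.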


Incorrect Encoding

Figure: ASCII value of the quality character (33 to 126) on the x axis against a log-scale y axis (1 to 1,000,000), comparing Phred33 (Sanger) and Phred64 (Illumina) encodings.

Quality characters in ASCII order: !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefgh

Phred33: a score of 0 is written as ! (ASCII 33)

Phred64: a score of 0 is written as @ (ASCII 64)
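Reading a Phred33 file as Phred64 (or the reverse) shifts every score by 31. The sketch below decodes a quality string with a given offset and makes a rough guess at the encoding; the helper names and the guessing rule are assumptions for illustration, not a description of how any specific QC tool decides.

```python
def decode_quality(qual: str, offset: int) -> list[int]:
    """Convert a FASTQ quality string into Phred scores using an ASCII offset."""
    return [ord(ch) - offset for ch in qual]


def guess_offset(qualities: list[str]) -> int:
    """Guess the encoding offset (a heuristic, not a definitive test).

    Characters below '@' (ASCII 64) only occur in Phred33 data. If none is
    seen we assume Phred64, which is wrong for Phred33 data in which every
    base happens to be Q31 or better.
    """
    lowest = min(ord(ch) for qual in qualities for ch in qual)
    return 33 if lowest < 64 else 64


quals = ["IIIIHHGF5#", "HHHGGF@@:!"]
offset = guess_offset(quals)
print(offset, decode_quality(quals[0], offset))  # 33 [40, 40, 40, 40, 39, 39, 38, 37, 20, 2]
# Decoding the same Phred33 characters with the wrong offset gives impossible negative scores.
print(decode_quality("5#", 64))  # [-11, -29]
```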


Phred Scores


Positional Phred Scores



Biased Phred Scores

GGATCCGTATGCGATGCTAGCGT
GGATCATATATATGCTAGCGTAT
GGATCTATATTGCGCGATACTGG
GGATCCCGTAGCTGCGATGCTGA
GGATCAAGGATAGCGCGTCTAGA
GGATCTATATAGTTGCCGTATCG
GGATCGGAGCGGGATCGGATGCG

HindIII sites
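All of these reads start with the same few bases, so the first cycles have almost no base diversity, which is the situation this slide links to biased Phred scores. A minimal sketch for spotting such positions from per-position base composition; the 90% cut-off and the function name are arbitrary assumptions:

```python
from collections import Counter

# The seven reads shown on the slide.
reads = [
    "GGATCCGTATGCGATGCTAGCGT",
    "GGATCATATATATGCTAGCGTAT",
    "GGATCTATATTGCGCGATACTGG",
    "GGATCCCGTAGCTGCGATGCTGA",
    "GGATCAAGGATAGCGCGTCTAGA",
    "GGATCTATATAGTTGCCGTATCG",
    "GGATCGGAGCGGGATCGGATGCG",
]


def low_diversity_positions(seqs: list[str], max_fraction: float = 0.9) -> list[int]:
    """Return 0-based positions where one base makes up >= max_fraction of reads."""
    length = min(len(s) for s in seqs)
    flagged = []
    for pos in range(length):
        counts = Counter(s[pos] for s in seqs)
        top_count = counts.most_common(1)[0][1]
        if top_count / len(seqs) >= max_fraction:
            flagged.append(pos)
    return flagged


print(low_diversity_positions(reads))  # [0, 1, 2, 3, 4]
```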


Tracking


Tracking - Barcodes



Tracking Exercise

You have some barcode statistics from a set of runs from the same group.

Red = expected barcode; grey = unexpected barcode.

Can you spot a problem within this data set, and say how many of the lanes it might affect?
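One systematic way to approach the exercise is to compute, for each lane, the fraction of reads carrying barcodes that were not expected there. A sketch under stated assumptions: the input layout (`lanes`, `expected`), the barcode sequences and the counts are hypothetical, and the 5% cut-off is an arbitrary choice.

```python
def flag_lanes(
    lanes: dict[str, dict[str, int]],
    expected: dict[str, set[str]],
    max_unexpected: float = 0.05,
) -> dict[str, float]:
    """Return lanes whose fraction of reads with unexpected barcodes exceeds max_unexpected.

    lanes maps lane name -> {barcode: read count}; expected maps lane name ->
    the barcodes that should be present in that lane.
    """
    flagged = {}
    for lane, counts in lanes.items():
        total = sum(counts.values())
        unexpected = sum(n for barcode, n in counts.items() if barcode not in expected[lane])
        fraction = unexpected / total if total else 0.0
        if fraction > max_unexpected:
            flagged[lane] = round(fraction, 3)
    return flagged


# Hypothetical barcode counts for two lanes.
lanes = {
    "lane1": {"ACAGTG": 480_000, "GCCAAT": 510_000, "CTTGTA": 10_000},
    "lane2": {"ACAGTG": 300_000, "GCCAAT": 320_000, "CTTGTA": 380_000},
}
expected = {"lane1": {"ACAGTG", "GCCAAT"}, "lane2": {"ACAGTG", "GCCAAT"}}
print(flag_lanes(lanes, expected))  # {'lane2': 0.38}
```

A low background of unexpected barcodes (lane1) is tolerated; a large block of reads from a barcode nobody submitted (lane2) is flagged for a closer look.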


Tracking – Swapped Samples

• Swapped between users

– Different sample type

– Different species

– See the later Contamination and Biological sections

• Swapped within experiment

– Look for consistent biological signal

– Use other knowledge of the samples to validate



Tracking – Sample groups



Tracking – Sample swaps

KO1 KO2 KO3 KO4

WT1 WT2 WT3 WT4

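The advice to look for a consistent biological signal can be turned into a simple check: each sample should correlate best with the other replicates of its own group. A minimal sketch, assuming a small hypothetical table of log-expression values (three replicates per group rather than four, to keep it short) and mean Pearson correlation as the similarity measure; `WT3` is deliberately made to look like a knockout.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def suspect_samples(samples: dict[str, list[float]], groups: dict[str, str]) -> list[str]:
    """Flag samples that correlate better, on average, with other groups than with their own."""
    flagged = []
    for name, values in samples.items():
        own, other = [], []
        for other_name, other_values in samples.items():
            if other_name == name:
                continue
            r = pearson(values, other_values)
            (own if groups[other_name] == groups[name] else other).append(r)
        if sum(own) / len(own) < sum(other) / len(other):
            flagged.append(name)
    return flagged


# Hypothetical log-expression of four genes per sample.
samples = {
    "KO1": [1.0, 5.1, 2.0, 8.0],
    "KO2": [1.2, 4.9, 2.1, 7.8],
    "KO3": [0.9, 5.0, 2.2, 8.1],
    "WT1": [8.1, 2.0, 5.0, 1.1],
    "WT2": [7.9, 2.1, 4.8, 0.9],
    "WT3": [1.1, 5.0, 1.9, 8.1],  # behaves like a knockout
}
groups = {name: name[:2] for name in samples}
print(suspect_samples(samples, groups))  # ['WT3']
```

A flagged sample is only a candidate swap; other knowledge of the samples is still needed to confirm it.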


Library


Library Problems

• Material lost

– Overamplification

• Duplication

• Biases in selection

– Priming bias

– GC bias

– Methylation bias

– Size selection bias

• Technical contamination

– Read-through adapter

– Adapter dimers



Duplication

• Over-sequencing of library complexity

• Too little material or too much PCR

• Can be difficult to assess

• Why does duplication matter?

– Potentially biased

– Over-estimates measurement accuracy

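Duplication can be quantified by counting how often each read sequence occurs. A minimal sketch, assuming a duplicate means an exact whole-read match; real QC tools differ in detail, and the reads here are made up:

```python
from collections import Counter


def duplication_summary(reads: list[str]) -> tuple[float, dict[int, int]]:
    """Return (fraction of reads that are duplicates, copy-number histogram).

    A read counts as a duplicate if an identical sequence was already seen.
    The histogram maps copy number -> number of distinct sequences with that count.
    """
    counts = Counter(reads)
    duplicates = len(reads) - len(counts)
    histogram = Counter(counts.values())
    return duplicates / len(reads), dict(sorted(histogram.items()))


reads = ["ACGT", "ACGT", "ACGT", "TTGA", "GGCA", "GGCA", "CATG", "AAAA"]
fraction, hist = duplication_summary(reads)
print(f"{fraction:.1%} duplicates; copy-number histogram: {hist}")
# 37.5% duplicates; copy-number histogram: {1: 3, 2: 1, 3: 1}
```

Eight reads here carry only five independent observations, which is why heavy duplication over-estimates measurement accuracy.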



Duplication - Repeats

Figure: two repeat copies (R1, R2) and a non-repeat region (NR), shown as the reads really are (Real), after mapping (Mapped) and after deduplication (Deduplicated).

Peak callers (MACS, for example) deduplicate internally, so you don’t have to do this explicitly.

The repeat problem can only be avoided by using uniquely mapped reads.
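The repeat problem can be illustrated with position-based deduplication. In this hypothetical example, reads r1 and r2 came from two different copies of a repeat but were both placed on one copy with low mapping quality, while r3 and r4 share a uniquely mapped position and are treated as genuine duplicates. The `Alignment` type, the MAPQ values and the `min_mapq` cut-off are assumptions; what counts as "uniquely mapped" depends on the aligner.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    read_id: str
    chrom: str
    start: int
    strand: str
    mapq: int


def deduplicate(alignments: list[Alignment], min_mapq: int = 0) -> list[Alignment]:
    """Keep the first alignment at each (chrom, start, strand) position.

    Alignments with mapq below min_mapq are dropped first; a high min_mapq
    approximates keeping only uniquely mapped reads.
    """
    seen = set()
    kept = []
    for aln in alignments:
        if aln.mapq < min_mapq:
            continue
        key = (aln.chrom, aln.start, aln.strand)
        if key not in seen:
            seen.add(key)
            kept.append(aln)
    return kept


alignments = [
    Alignment("r1", "chr1", 1000, "+", 1),   # from repeat copy 1
    Alignment("r2", "chr1", 1000, "+", 1),   # really from repeat copy 2
    Alignment("r3", "chr2", 5000, "-", 42),
    Alignment("r4", "chr2", 5000, "-", 42),
]
print([a.read_id for a in deduplicate(alignments)])               # ['r1', 'r3']
print([a.read_id for a in deduplicate(alignments, min_mapq=20)])  # ['r3']
```

Without a MAPQ filter, r2 is wrongly discarded as a duplicate of r1; with the filter, the ambiguous repeat reads are excluded altogether rather than silently collapsed.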


Deduplication


Priming Bias


Contamination


Contamination

• Different kinds

– Technical contamination, e.g. adapter dimers

– Contamination with a species you might expect, e.g. E. coli in a mouse sample

– Contamination with something unexpected

– Contamination with the wrong material, e.g. DNA in an RNA prep

– Mixed samples



Mapping Efficiency

• Know what to expect

– Data type (genomic / transcriptomic)

– How good / complete is the genome

• Distinguish unique / multi-mapped reads

– Understand the mapping process

Reads:
Input: 5725730
Mapped: 4703342 (82.1% of input)
of these: 471516 (10.0%) have multiple alignments (471516 have >1)
82.1% overall read alignment rate.
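The percentages in a summary like this can be recomputed, and uniquely mapped reads separated from multi-mapped ones. A sketch assuming exactly the layout of the summary on this slide; the regular expressions would need adapting for other aligners' output, and `mapping_rates` is a made-up name.

```python
import re

# The alignment summary from this slide.
SUMMARY = """Reads:
Input: 5725730
Mapped: 4703342 (82.1% of input)
of these: 471516 (10.0%) have multiple alignments (471516 have >1)
82.1% overall read alignment rate."""


def mapping_rates(summary: str) -> dict[str, float]:
    """Recompute mapping percentages from a summary in the layout above."""
    total = int(re.search(r"Input:\s*(\d+)", summary).group(1))
    mapped = int(re.search(r"Mapped:\s*(\d+)", summary).group(1))
    multi = int(re.search(r"of these:\s*(\d+)", summary).group(1))
    return {
        "mapped_pct": 100 * mapped / total,
        "multi_pct_of_mapped": 100 * multi / mapped,
        "unique_pct_of_input": 100 * (mapped - multi) / total,
    }


for name, value in mapping_rates(SUMMARY).items():
    print(f"{name}: {value:.1f}%")
```

For these numbers, 82.1% of reads map but only about 73.9% of the input maps uniquely, which is the figure that matters when multi-mapped reads are discarded.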


Species Screen



TAGC Plots

Assemble

Filter contigs

Plot %GC vs Coverage

Sample and BLAST
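The middle steps of this workflow can be sketched as follows, assuming the assembly already exists and per-contig coverage is known. The length cut-off and the rule for choosing contigs to BLAST (%GC more than 10 points from the median) are illustrative assumptions; the workflow itself plots %GC against coverage so that separate groups of contigs can be inspected. The contigs are hypothetical.

```python
import statistics


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a sequence."""
    seq = seq.upper()
    return 100 * (seq.count("G") + seq.count("C")) / len(seq)


def gc_coverage_points(
    contigs: dict[str, tuple[str, float]], min_length: int = 500
) -> list[tuple[str, float, float]]:
    """Filter out short contigs, then return (name, %GC, coverage) ready for plotting."""
    return [
        (name, gc_percent(seq), coverage)
        for name, (seq, coverage) in contigs.items()
        if len(seq) >= min_length
    ]


def blast_candidates(points: list[tuple[str, float, float]], max_gc_diff: float = 10.0) -> list[str]:
    """Pick contigs whose %GC is far from the median as candidates to BLAST."""
    median_gc = statistics.median(gc for _, gc, _ in points)
    return [name for name, gc, _ in points if abs(gc - median_gc) > max_gc_diff]


# Hypothetical contigs: name -> (sequence, mean coverage).
contigs = {
    "c1": ("ATGCAT" * 100, 35.0),
    "c2": ("TTGCAA" * 100, 40.0),
    "c3": ("GGCCAT" * 100, 400.0),  # different %GC and much higher coverage
    "c4": ("ATGC" * 50, 30.0),      # 200 bp, removed by the length filter
}
points = gc_coverage_points(contigs)
for name, gc, coverage in points:
    print(f"{name}: {gc:.1f}% GC, coverage {coverage:.0f}")
print("BLAST candidates:", blast_candidates(points))
```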


Contamination


Internal Contamination


Biological


Samples Don’t Behave

• All samples come with a set of expectations

– Biological effect

– Sample source

– Rough biological behaviour

• If these aren’t met

– Samples may not be what you expect

– Statistical analyses may be invalid

– Larger biological picture may be missed



Exercise

You have been given a set of QC and visualisation results for a single-gene knockout in male black6 mice (the same genotype as the reference genome).

Have a look through the plots and see if there is anything which would cause you concern regarding the behaviour of the samples.


Expected effects missing


Confounded effects


ChIP doesn’t behave



Methylation doesn’t behave


RNA-Seq doesn’t behave


Multiple subgroups


Interpretation