Transcript
Page 1: Avoiding Nonsense Results in your NGS Variant Studies

Avoiding Nonsense Results in your NGS Variant Studies

James Lyons-Weiler, PhD
Scientific Director / Senior Research Scientist
Bioinformatics Analysis Core
Genomics & Proteomics Core Laboratories
University of Pittsburgh
Pittsburgh, PA
May 1, 2014

Page 2: Avoiding Nonsense Results in your NGS Variant Studies

Two Parts

• Identifying sites with low genotypic signal increases concordance among variant callers

• Hazards in finding differentially expressed genes in RNASeq – how to do it more robustly.

Page 3: Avoiding Nonsense Results in your NGS Variant Studies

23andMe: High risk of RA and psoriasis
GTL: Low risk of RA and psoriasis

Page 4: Avoiding Nonsense Results in your NGS Variant Studies

NYTimes Article, etc.

Page 5: Avoiding Nonsense Results in your NGS Variant Studies

Data were from an Illumina HiSeq 2000

Page 6: Avoiding Nonsense Results in your NGS Variant Studies

Among-method average concordance: 57.5% overall; 32.7% at high coverage

O’Rawe et al.

Page 7: Avoiding Nonsense Results in your NGS Variant Studies

TRUTH (BIOLOGICAL MOLECULAR SEQUENCE)

SEQUENCER

MAPPER

VARIANT CALLERS

LOW CONCORDANCE (O’Rawe et al., 2013)

Consensus Analysis (e.g., 2/3, 3/4, set analysis)

Information Theory (-> modeling)

Improve Callers (fix errors, modeling)

Bake-Offs

Simulations

Spiked Ins
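One of the strategies listed above, consensus analysis (e.g., 2 of 3 callers agreeing), can be sketched as a simple vote over per-site genotype calls. This is only an illustrative sketch: the function name `consensus_calls`, the caller names, the site keys, and the toy genotypes are all hypothetical and are not taken from the talk or from O’Rawe et al.

```python
from collections import Counter

def consensus_calls(calls_by_caller, min_votes):
    """Keep sites where at least `min_votes` callers report the same genotype."""
    votes = {}
    for calls in calls_by_caller.values():
        for site, genotype in calls.items():
            votes.setdefault(site, Counter())[genotype] += 1

    consensus = {}
    for site, counter in votes.items():
        genotype, count = counter.most_common(1)[0]
        if count >= min_votes:
            consensus[site] = genotype
    return consensus

# Hypothetical toy calls from three callers (not real data).
calls = {
    "caller_1": {"chr1:100": "AG", "chr1:200": "TT", "chr1:300": "CG"},
    "caller_2": {"chr1:100": "AG", "chr1:200": "TT", "chr1:300": "CC"},
    "caller_3": {"chr1:100": "AG", "chr1:200": "CT"},
}

two_of_three = consensus_calls(calls, min_votes=2)
three_of_three = consensus_calls(calls, min_votes=3)
print(two_of_three)
print(three_of_three)
```

Raising `min_votes` trades sensitivity for agreement: the stricter the vote, the fewer sites survive.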

Page 8: Avoiding Nonsense Results in your NGS Variant Studies

Entropy of Base Distributions

[Figure: three example base-count distributions over A, T, C, G, labeled in order:]

• Low entropy, high enthalpy
• Low entropy, high enthalpy
• High entropy, low enthalpy

Page 9: Avoiding Nonsense Results in your NGS Variant Studies

Boltzmann Entropy

• s = k ln w (Planck)

• w = antiln(s/k) = exp(s/k)

http://schneider.ncifcrf.gov/images/boltzmann/boltzmann-tomb-4.html

Page 10: Avoiding Nonsense Results in your NGS Variant Studies

Rank-Sorted Distribution of w (O’Rawe et al. data)

Homozygotes w = 1

Heterozygotes w = 2
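A minimal sketch of how w might be computed from the base counts at a site. It assumes k = 1 and takes s to be the Shannon entropy (in nats) of the observed base frequencies, so that w = exp(s). That choice is an assumption, not something the slides state. It does reproduce w = 1 for a pure homozygote and w = 2 for a balanced heterozygote, but the talk's exact definition may differ for intermediate cases. The function name `effective_states` is hypothetical.

```python
import math

def effective_states(counts):
    """Return w = exp(H), where H is the Shannon entropy (nats) of base frequencies.

    ASSUMPTION: k = 1 and s = Shannon entropy; the talk's exact formula may differ.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("no reads at this site")
    entropy = 0.0
    for c in counts:
        if c > 0:  # 0 * ln(0) is treated as 0
            p = c / total
            entropy -= p * math.log(p)
    return math.exp(entropy)

# Base counts in A, T, C, G order.
w_hom = effective_states([200, 0, 0, 0])      # all reads support one base
w_het = effective_states([100, 100, 0, 0])    # reads split evenly over two bases
w_noise = effective_states([50, 50, 50, 50])  # reads split evenly over four bases
print(w_hom, w_het, w_noise)
```

Under this reading, w behaves like an "effective number of bases" observed at the site, ranging from 1 to 4.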

Page 11: Avoiding Nonsense Results in your NGS Variant Studies

Example w Density Distribution

Page 12: Avoiding Nonsense Results in your NGS Variant Studies

w and FBVC

A    T    C    G    w         pw      Zygosity       Genotype
200  0    0    0    1         0       Homozygote     AA
16   158  13   13   2.102558  0       Homozygote     TT
100  100  0    0    2         0       Heterozygote   AT
58   30   1    111  2.768507  0       Heterozygote   AG
28   80   14   78   3.303636  0       Heterozygote   TG
76   38   29   57   3.758733  0       Heterozygote   AG
33   49   60   58   3.895496  0.0126  Heterozygote?  CG?
50   50   50   50   4         1       noise          unknown

Page 13: Avoiding Nonsense Results in your NGS Variant Studies

Operational* Equiprobable Null Distribution

{f(A) = f(T) = f(G) = f(C)}
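A hedged sketch of how a significance value like pw could be estimated against this equiprobable null by Monte Carlo simulation. Everything beyond the null itself is an assumption: w is taken as exp(Shannon entropy) of the base frequencies, and pw is taken as the fraction of null draws (same read depth, four equally likely bases) whose w is no larger than the observed w. This reading is consistent with pw = 1 for the 50/50/50/50 row on page 12, but it is not confirmed by the slides. The names `effective_states` and `pw_monte_carlo`, the number of draws, and the seed are hypothetical.

```python
import math
import random

def effective_states(counts):
    """w = exp(Shannon entropy in nats) of the base frequencies (ASSUMED definition)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no reads at this site")
    entropy = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            entropy -= p * math.log(p)
    return math.exp(entropy)

def pw_monte_carlo(counts, n_draws=1000, seed=0):
    """Estimate P(w_null <= w_observed) under f(A) = f(T) = f(G) = f(C).

    ASSUMPTION: this tail definition of pw is a guess at the talk's method.
    """
    rng = random.Random(seed)
    depth = sum(counts)
    w_obs = effective_states(counts)
    hits = 0
    for _ in range(n_draws):
        null_counts = [0, 0, 0, 0]
        for _ in range(depth):
            null_counts[rng.randrange(4)] += 1  # each base equally likely
        # Small tolerance guards against floating-point ties.
        if effective_states(null_counts) <= w_obs + 1e-12:
            hits += 1
    return hits / n_draws

pw_noise = pw_monte_carlo([50, 50, 50, 50])   # indistinguishable from the null
pw_hom = pw_monte_carlo([200, 0, 0, 0])       # strong genotypic signal
pw_border = pw_monte_carlo([33, 49, 60, 58])  # weak signal
print(pw_noise, pw_hom, pw_border)
```

The estimate stabilizes as `n_draws` grows, so in practice the number of draws would be increased until pw stops changing at the precision needed.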

Page 14: Avoiding Nonsense Results in your NGS Variant Studies

Convergence of significance (pw)

Page 15: Avoiding Nonsense Results in your NGS Variant Studies

What We Expect

TRUTH (BIOLOGICAL MOLECULAR SEQUENCE)

SEQUENCER

MAPPER

VARIANT/BASE CALLERS

Genotypic Signal Filtering

INCREASED CONCORDANCE
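The expectation above, that filtering on genotypic signal raises concordance, can be checked by computing concordance between two call sets overall and separately for sites with pw at or below a threshold and above it. The sketch below is illustrative only: the function name `stratified_concordance`, the site names, the genotypes, and the pw values are hypothetical toy data, and the 0.05 cutoff is simply a conventional threshold.

```python
def stratified_concordance(calls_a, calls_b, pw, alpha=0.05):
    """Fraction of shared sites with identical genotypes, overall and split by pw."""
    bins = {"ALL": [0, 0], "pw<=alpha": [0, 0], "pw>alpha": [0, 0]}
    for site in calls_a.keys() & calls_b.keys():
        agree = calls_a[site] == calls_b[site]
        stratum = "pw<=alpha" if pw[site] <= alpha else "pw>alpha"
        for name in ("ALL", stratum):
            bins[name][0] += int(agree)  # concordant sites
            bins[name][1] += 1           # sites compared
    return {name: (hits / n if n else None) for name, (hits, n) in bins.items()}

# Hypothetical toy data (not from O'Rawe et al.).
calls_a = {"s1": "AA", "s2": "AG", "s3": "TT", "s4": "CG"}
calls_b = {"s1": "AA", "s2": "AG", "s3": "CT", "s4": "GG"}
pw = {"s1": 0.0, "s2": 0.0, "s3": 0.4, "s4": 0.8}

result = stratified_concordance(calls_a, calls_b, pw)
print(result)
```

In this toy case the two callers agree on every high-signal site and disagree on every low-signal site, which is the pattern the filtering step is meant to expose.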

Page 16: Avoiding Nonsense Results in your NGS Variant Studies
Page 17: Avoiding Nonsense Results in your NGS Variant Studies

pHom Function

Page 18: Avoiding Nonsense Results in your NGS Variant Studies

Caller    Sites     Concordance w/ FBVC  Hom    Het
gatk      ALL       0.5762               11868  17670
gatk      pw<=0.05  0.9976               11282  5676
gatk      pw>0.05   0.0074               586    11994
samtools  ALL       0.5649               11541  18799
samtools  pw<=0.05  0.9917               11489  5761
samtools  pw>0.05   0.0002               52     13038
snver     ALL       0.6006               11904  16729
snver     pw<=0.05  0.9934               11812  5470
snver     pw>0.05   0.0007               92     11259

From the O’Rawe et al. generated results
FBVC = frequency-based variant caller (Lyons-Weiler et al.)

Page 19: Avoiding Nonsense Results in your NGS Variant Studies

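Comparison    Tx                                          Signal    %Concordance
FBVC_vs_FBVC  Marked                                      ALL       85.64
FBVC_vs_FBVC  Marked                                      pw<=0.05  91.08
FBVC_vs_FBVC  Marked                                      pw>0.05   35.66
FBVC_vs_FBVC  Realigned                                   ALL       83.82
FBVC_vs_FBVC  Realigned                                   pw<=0.05  91.69
FBVC_vs_FBVC  Realigned                                   pw>0.05   28.21
FBVC_vs_FBVC  Recalibrated                                ALL       93.14
FBVC_vs_FBVC  Recalibrated                                pw<=0.05  ***99.39
FBVC_vs_FBVC  Recalibrated                                pw>0.05   48.53
FBVC_vs_FBVC  Reduced                                     ALL       21.54
FBVC_vs_FBVC  Reduced                                     pw<=0.05  24.57
FBVC_vs_FBVC  Reduced                                     pw>0.05   4.25
FBVC_vs_FBVC  Marked-Realigned                            ALL       76.91
FBVC_vs_FBVC  Marked-Realigned                            pw<=0.05  86.11
FBVC_vs_FBVC  Marked-Realigned                            pw>0.05   15.44
FBVC_vs_FBVC  Marked-Realigned-Recalibrated               ALL       76.73
FBVC_vs_FBVC  Marked-Realigned-Recalibrated               pw<=0.05  85.99
FBVC_vs_FBVC  Marked-Realigned-Recalibrated               pw>0.05   15.34
FBVC_vs_FBVC  Marked-Realigned-Recalibrated-Reduced       ALL       19.98
FBVC_vs_FBVC  Marked-Realigned-Recalibrated-Reduced       pw<=0.05  22.9
FBVC_vs_FBVC  Marked-Realigned-Recalibrated-Reduced       pw>0.05   2.66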

Page 20: Avoiding Nonsense Results in your NGS Variant Studies
Page 21: Avoiding Nonsense Results in your NGS Variant Studies

TRUTH (BIOLOGICAL MOLECULAR SEQUENCE)

SEQUENCER

MAPPER

VARIANT CALLERS

LOW CONCORDANCE (O’Rawe et al., 2013)

Consensus Analysis (e.g., 2/3, 3/4, set analysis)

Information Theory (-> modeling)

Improve Callers (fix errors, modeling)

Bake-Offs

Simulations

Spiked Ins

Page 22: Avoiding Nonsense Results in your NGS Variant Studies

Lifescope reads (red)

Shrimp2 reads (blue)

Mappers must be systematically evaluated

Page 23: Avoiding Nonsense Results in your NGS Variant Studies

Part 2: Good and Bad News for RNASeq (and everything else):

The Bad News:

Fold Change is Biased.

The Good News:

We have identified a much less biased method.

Page 24: Avoiding Nonsense Results in your NGS Variant Studies

T-test is not appropriate for small N, large P data

(such as RNASeq)

Page 25: Avoiding Nonsense Results in your NGS Variant Studies

Fold Change > 2.0

Delta > 25

Page 26: Avoiding Nonsense Results in your NGS Variant Studies

FC(A/B) is Blind to Large Portions of Your Data

FC(A/B)

Delta (and J5: Patel & Lyons-Weiler, 2004)

Page 27: Avoiding Nonsense Results in your NGS Variant Studies

Ratios are Hard to Interpret as Biological Differences

Gene A B delta (A-B) FC(A/B)

gene1 5 3 2 1.667

gene2 50 30 20 1.667

gene3 500 300 200 1.667

gene4 5000 3000 2000 1.667

gene5 50000 30000 20000 1.667
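The table can be reproduced in a few lines, which makes the point concrete: FC(A/B) is identical for all five genes while delta spans four orders of magnitude. The sketch below also applies the two cutoffs shown on page 25 (Fold Change > 2.0, Delta > 25) to these rows. Applying those cutoffs to this particular table is an illustration added here, not something the slides do, and the variable names are hypothetical.

```python
# Expression values (A, B) copied from the table above.
genes = {
    "gene1": (5, 3),
    "gene2": (50, 30),
    "gene3": (500, 300),
    "gene4": (5000, 3000),
    "gene5": (50000, 30000),
}

FC_THRESHOLD = 2.0     # "Fold Change > 2.0" from page 25
DELTA_THRESHOLD = 25   # "Delta > 25" from page 25

rows = []
for name, (a, b) in genes.items():
    delta = a - b   # difference
    fc = a / b      # quotient
    rows.append((name, delta, round(fc, 3), fc > FC_THRESHOLD, delta > DELTA_THRESHOLD))

fc_hits = [name for name, _, _, fc_call, _ in rows if fc_call]
delta_hits = [name for name, _, _, _, delta_call in rows if delta_call]

for row in rows:
    print(row)
print("FC calls:", fc_hits)
print("Delta calls:", delta_hits)
```

With these cutoffs, the ratio flags none of the genes, while the difference flags the three most highly expressed ones.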

Page 28: Avoiding Nonsense Results in your NGS Variant Studies

A-B is a difference.
A/B is a quotient.

Page 29: Avoiding Nonsense Results in your NGS Variant Studies

Log2 Transformation Does Not Help

Reveals Minor Delta (& J5) Bias

Pink = FC(A/B)
Black = Delta

Page 30: Avoiding Nonsense Results in your NGS Variant Studies

G-Thresholding J5

Page 31: Avoiding Nonsense Results in your NGS Variant Studies

FC Bias in Amyotrophic Lateral Sclerosis

[Scatter plot. Axis labels: Control, ALS; axis ticks run 0 to 200000 and 0 to 350000; legend entries: DEGy, FCDEGy]

Black circles = FC(A/B). Pink = Gthr-J5 genes

Page 32: Avoiding Nonsense Results in your NGS Variant Studies
Page 33: Avoiding Nonsense Results in your NGS Variant Studies
Page 34: Avoiding Nonsense Results in your NGS Variant Studies

FC(A/B) Bias in Alcohol-Induced Hepatitis

Black circles = FC(A/B). Pink = Gthr-J5 genes

Page 35: Avoiding Nonsense Results in your NGS Variant Studies

Conclusions

• Not all NGS/HTS sites have sufficient genotypic signal to warrant a base call. High coverage alone does not provide a solution.

• By measuring genotypic signal, we can determine which sites we can call with confidence.

• Fold change (FC(A/B)) is blind to highly expressed genes and should be abandoned as a measure of differential expression altogether, even for single-gene or single-protein studies!

• Published microarray data sets analyzed to date using only FC(A/B) are a gold mine for re-analysis using less biased methods.

Page 36: Avoiding Nonsense Results in your NGS Variant Studies

Credits and Contact

• pw, pHom, etc.: James Lyons-Weiler, Alan Twaddle, Rahil Sethi
– (MS in preparation)
– Our software is called Gconf (not yet available)

• Fold-Change Bias: James Lyons-Weiler, Tamanna Sultana, Rick Jordan, Rahil Sethi
– (Paper in review)
– For now, read:

• Mariani TJ, Budhraja V, Mecham BH, Gu CC, Watson MA, Sadovsky Y. 2003. A variable fold change threshold determines significance for expression microarrays. FASEB J. 17:321-3. doi: 10.1096/fj.02-0351fje

• Pearson K. 1897. On a form of spurious correlation that may arise when indices are used for the measurement of organs. Proc Roy Soc Lond 60:489-498. doi: 10.1098/rspl.1896.0076

