Avoiding Nonsense Results in your NGS Variant Studies

James Lyons-Weiler, PhD, Scientific Director/Senior Research Scientist, Bioinformatics Analysis Core, Genomics & Proteomics Core Laboratories, University of Pittsburgh, Pittsburgh, PA. May 1, 2014

Upload: jackknight

Post on 02-Dec-2014

DESCRIPTION

Presented at the 2014 Bio-IT World Expo in Boston, this slideshow provides info on the use of Lyons-Weiler's entropy-based measures of genotypic signal to improve concordance among alternative variant calling algorithms and to evaluate various steps in the GATK best practices pipeline. The second part of the talk presented data showing a demarcation bias in the widely used measure of fold change in selected differentially expressed genes, transcripts or proteins from microarray and RNASeq data. http://www.bio-itworldexpo.com/Next-Gen-Sequencing-Informatics/

TRANSCRIPT

Page 1: Avoiding Nonsense Results in your NGS Variant Studies

Avoiding Nonsense Results in your NGS Variant Studies

James Lyons-Weiler, PhD
Scientific Director/Senior Research Scientist
Bioinformatics Analysis Core
Genomics & Proteomics Core Laboratories
University of Pittsburgh
Pittsburgh, PA
May 1, 2014

Page 2: Avoiding Nonsense Results in your NGS Variant Studies

Two Parts

• Identifying sites with low genotypic signal increases concordance among variant callers

• Hazards in finding differentially expressed genes in RNASeq – how to do it more robustly.

Page 3: Avoiding Nonsense Results in your NGS Variant Studies

23andMe: High risk of RA and psoriasis

GTL: Low risk of RA and psoriasis

Page 4: Avoiding Nonsense Results in your NGS Variant Studies

NYTimes Article, etc.

Page 5: Avoiding Nonsense Results in your NGS Variant Studies

Data were from an Illumina HiSeq 2000

Page 6: Avoiding Nonsense Results in your NGS Variant Studies

Among-method average concordance: 57.5% overall; 32.7% at high coverage

O’Rawe et al.

Page 7: Avoiding Nonsense Results in your NGS Variant Studies

TRUTH (BIOLOGICAL MOLECULAR SEQUENCE)

SEQUENCER

MAPPER

VARIANT CALLERS

LOW CONCORDANCE (O’Rawe et al., 2013)

Consensus Analysis (e.g., 2/3, 3/4, set analysis)

Information Theory (-> modeling)

Improve Callers (fix errors, modeling)

Bake Offs

Simulations

Spiked Ins

Page 8: Avoiding Nonsense Results in your NGS Variant Studies

Entropy of Base Distributions

[Figure: three example base distributions over A, T, C, G. The first two are labeled "Low entropy, High enthalpy"; the third is labeled "High entropy, Low enthalpy".]

Page 9: Avoiding Nonsense Results in your NGS Variant Studies

Boltzmann Entropy

• s = k ln w (Planck)

• w = antiln(s/k)

http://schneider.ncifcrf.gov/images/boltzmann/boltzmann-tomb-4.html
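One way to read w on this slide is as the effective number of base states at a site. The sketch below is an illustration under stated assumptions, not the author's implementation: it assumes s is the Shannon entropy (natural log) of the observed base frequencies and sets k = 1, so that w = antiln(s) = exp(s). The function name `w_from_counts` is hypothetical.

```python
import math

def w_from_counts(counts):
    """Effective number of base states, w = exp(s), with k = 1.

    Assumption: s is the Shannon entropy (natural log) of the observed
    base frequencies at one site; the slides do not give the exact formula.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("site has no reads")
    # Bases with zero reads contribute nothing to the entropy.
    s = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return math.exp(s)

# Read counts for A, T, C, G at three idealized sites.
print(round(w_from_counts([200, 0, 0, 0]), 6))    # clean homozygote
print(round(w_from_counts([100, 100, 0, 0]), 6))  # clean heterozygote
print(round(w_from_counts([50, 50, 50, 50]), 6))  # pure noise
```

Under this assumption the three idealized sites give w of 1, 2, and 4, which agrees with the homozygote and heterozygote values quoted on the slides. For less clean count vectors this formula gives values close to, but not always identical to, those tabulated in the talk, so the author's exact definition may differ.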

Page 10: Avoiding Nonsense Results in your NGS Variant Studies

Rank-Sorted Distribution of w (O’Rawe et al. data)

Homozygotes w = 1

Heterozygotes w = 2

Page 11: Avoiding Nonsense Results in your NGS Variant Studies

Example w Density Distribution

Page 12: Avoiding Nonsense Results in your NGS Variant Studies

w and FBVC

A    T    C    G    w         pw      Zygosity       Genotype
200  0    0    0    1         0       Homozygote     AA
16   158  13   13   2.102558  0       Homozygote     TT
100  100  0    0    2         0       Heterozygote   AT
58   30   1    111  2.768507  0       Heterozygote   AG
28   80   14   78   3.303636  0       Heterozygote   TG
76   38   29   57   3.758733  0       Heterozygote   AG
33   49   60   58   3.895496  0.0126  Heterozygote?  CG?
50   50   50   50   4         1       noise          unknown

Page 13: Avoiding Nonsense Results in your NGS Variant Studies

Operational* Equiprobable Null Distribution

{f(A) = f(T) = f(G) = f(C)}
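A significance value such as pw can be estimated by simulating sites under this equiprobable null. The sketch below is a minimal Monte Carlo illustration, not the author's method: it assumes w is exp of the Shannon entropy of the base counts, and that pw is the lower-tail probability of seeing a w at least as small as the observed one when every read is A, T, C, or G with probability 1/4. The names `w_from_counts` and `pw_monte_carlo`, and the parameters `n_sim` and `seed`, are hypothetical.

```python
import math
import random

def w_from_counts(counts):
    """w = exp(Shannon entropy) of the base counts (assumed definition)."""
    total = sum(counts)
    s = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return math.exp(s)

def pw_monte_carlo(counts, n_sim=1000, seed=0):
    """Estimate pw as the fraction of null sites with w <= observed w.

    Null model: each read is A, T, C, or G with probability 1/4.
    Assumption: pw is a lower-tail probability, so noisy sites get pw near 1.
    """
    rng = random.Random(seed)
    depth = sum(counts)
    w_obs = w_from_counts(counts)
    hits = 0
    for _sim in range(n_sim):
        sim = [0, 0, 0, 0]
        for _read in range(depth):
            sim[rng.randrange(4)] += 1
        # Small tolerance guards against floating-point ties.
        if w_from_counts(sim) <= w_obs + 1e-12:
            hits += 1
    return hits / n_sim

print(pw_monte_carlo([200, 0, 0, 0]))    # strong signal: pw near 0
print(pw_monte_carlo([50, 50, 50, 50]))  # no signal: pw of 1
```

Increasing `n_sim` tightens the estimate at the cost of run time; an exact multinomial calculation would avoid simulation noise altogether.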

Page 14: Avoiding Nonsense Results in your NGS Variant Studies

Convergence of significance (pw)

Page 15: Avoiding Nonsense Results in your NGS Variant Studies

What We Expect

TRUTH (BIOLOGICAL MOLECULAR SEQUENCE)

SEQUENCER

MAPPER

VARIANT/BASE CALLERS

Genotypic Signal Filtering

INCREASED CONCORDANCE

Page 16: Avoiding Nonsense Results in your NGS Variant Studies
Page 17: Avoiding Nonsense Results in your NGS Variant Studies

pHom Function

Page 18: Avoiding Nonsense Results in your NGS Variant Studies

Caller    Subset    Concordance w/ FBVC  Hom    Het
gatk      ALL       0.5762               11868  17670
gatk      pw<=0.05  0.9976               11282  5676
gatk      pw>0.05   0.0074               586    11994
samtools  ALL       0.5649               11541  18799
samtools  pw<=0.05  0.9917               11489  5761
samtools  pw>0.05   0.0002               52     13038
snver     ALL       0.6006               11904  16729
snver     pw<=0.05  0.9934               11812  5470
snver     pw>0.05   0.0007               92     11259

From the O’Rawe et al. generated results

FBVC = frequency-based variant caller (Lyons-Weiler et al.)
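The stratified concordance figures above can be computed with a few lines of code once each site has two genotype calls and a pw value. The sketch below is illustrative only: the function name `concordance`, the tuple layout, and the toy sites are assumptions, not data or code from the talk.

```python
def concordance(calls, threshold=0.05):
    """Fraction of sites where two callers agree, overall and split by pw.

    `calls` is a list of (genotype_a, genotype_b, pw) tuples, one per site.
    Returns a dict with keys 'ALL', 'pw<=thr', and 'pw>thr'; a stratum with
    no sites maps to None.
    """
    def frac(rows):
        if not rows:
            return None
        agree = sum(1 for a, b, _ in rows if a == b)
        return agree / len(rows)

    low = [r for r in calls if r[2] <= threshold]
    high = [r for r in calls if r[2] > threshold]
    return {"ALL": frac(calls), "pw<=thr": frac(low), "pw>thr": frac(high)}

# Hypothetical toy sites, not data from the talk.
sites = [
    ("AA", "AA", 0.0),
    ("AT", "AT", 0.0),
    ("TT", "TT", 0.01),
    ("AG", "AA", 0.40),
    ("CG", "CC", 0.90),
    ("TG", "TG", 0.30),
]
print(concordance(sites))
```

In this toy example the callers agree at every site with pw <= 0.05 and at only one of three sites with pw > 0.05, the same qualitative pattern the table reports.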

Page 19: Avoiding Nonsense Results in your NGS Variant Studies

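Comparison    Tx                                     Signal    %Concordance
FBVC_vs_FBVC  Marked                                 ALL       85.64
FBVC_vs_FBVC  Marked                                 pw<=0.05  91.08
FBVC_vs_FBVC  Marked                                 pw>0.05   35.66
FBVC_vs_FBVC  Realigned                              ALL       83.82
FBVC_vs_FBVC  Realigned                              pw<=0.05  91.69
FBVC_vs_FBVC  Realigned                              pw>0.05   28.21
FBVC_vs_FBVC  Recalibrated                           ALL       93.14
FBVC_vs_FBVC  Recalibrated                           pw<=0.05  ***99.39
FBVC_vs_FBVC  Recalibrated                           pw>0.05   48.53
FBVC_vs_FBVC  Reduced                                ALL       21.54
FBVC_vs_FBVC  Reduced                                pw<=0.05  24.57
FBVC_vs_FBVC  Reduced                                pw>0.05   4.25
FBVC_vs_FBVC  Marked-Realigned                       ALL       76.91
FBVC_vs_FBVC  Marked-Realigned                       pw<=0.05  86.11
FBVC_vs_FBVC  Marked-Realigned                       pw>0.05   15.44
FBVC_vs_FBVC  Marked-Realigned-Recalibrated          ALL       76.73
FBVC_vs_FBVC  Marked-Realigned-Recalibrated          pw<=0.05  85.99
FBVC_vs_FBVC  Marked-Realigned-Recalibrated          pw>0.05   15.34
FBVC_vs_FBVC  Marked-Realigned-Recalibrated-Reduced  ALL       19.98
FBVC_vs_FBVC  Marked-Realigned-Recalibrated-Reduced  pw<=0.05  22.9
FBVC_vs_FBVC  Marked-Realigned-Recalibrated-Reduced  pw>0.05   2.66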

Page 20: Avoiding Nonsense Results in your NGS Variant Studies
Page 21: Avoiding Nonsense Results in your NGS Variant Studies

TRUTH (BIOLOGICAL MOLECULAR SEQUENCE)

SEQUENCER

MAPPER

VARIANT CALLERS

LOW CONCORDANCE (O’Rawe et al., 2013)

Consensus Analysis (e.g., 2/3, 3/4, set analysis)

Information Theory (-> modeling)

Improve Callers (fix errors, modeling)

Bake Offs

Simulations

Spiked Ins

Page 22: Avoiding Nonsense Results in your NGS Variant Studies

LifeScope reads (red)

SHRiMP2 reads (blue)

Mappers must be systematically evaluated

Page 23: Avoiding Nonsense Results in your NGS Variant Studies

Part 2: Good and Bad News for RNASeq (and everything else):

The Bad News:

Fold Change is Biased.

The Good News:

We have identified a much less biased method.

Page 24: Avoiding Nonsense Results in your NGS Variant Studies

T-test is not appropriate for small N, large P data

(such as RNASeq)

Page 25: Avoiding Nonsense Results in your NGS Variant Studies

Fold Change > 2.0

Delta > 25

Page 26: Avoiding Nonsense Results in your NGS Variant Studies

FC(A/B) is Blind to Large Portions of Your Data

FC(A/B)

Delta (and J5: Patel & Lyons-Weiler, 2004)
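To see why a ratio cutoff and a difference cutoff pick different genes, it can help to apply both to a handful of values. The sketch below uses the cutoffs shown on the previous slide (fold change > 2.0, delta > 25); everything else is an assumption made for illustration: the function name `select_genes`, the symmetric treatment of up- and down-regulation, and the toy expression values.

```python
def select_genes(expr, fc_cut=2.0, delta_cut=25.0):
    """Return (genes passing the FC cut, genes passing the delta cut).

    `expr` maps gene name -> (mean in group A, mean in group B).
    Assumption: both measures are applied symmetrically, so FC is the larger
    mean over the smaller and delta is the absolute difference.
    """
    by_fc, by_delta = set(), set()
    for gene, (a, b) in expr.items():
        fc = max(a, b) / min(a, b)  # requires strictly positive means
        delta = abs(a - b)
        if fc > fc_cut:
            by_fc.add(gene)
        if delta > delta_cut:
            by_delta.add(gene)
    return by_fc, by_delta

# Hypothetical expression values chosen only to illustrate the contrast.
expr = {
    "low_expr": (4.0, 1.0),           # FC 4.0, delta 3
    "mid_expr": (100.0, 40.0),        # FC 2.5, delta 60
    "high_expr": (50000.0, 30000.0),  # FC 1.667, delta 20000
    "flat": (10.0, 9.0),              # FC 1.11, delta 1
}
by_fc, by_delta = select_genes(expr)
print(sorted(by_fc))
print(sorted(by_delta))
```

Here the fold-change cutoff keeps the weakly expressed gene and misses the highly expressed one, while the delta cutoff does the opposite; only the mid-range gene passes both.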

Page 27: Avoiding Nonsense Results in your NGS Variant Studies

Ratios are Hard to Interpret as Biological Differences

Gene A B delta (A-B) FC(A/B)

gene1 5 3 2 1.667

gene2 50 30 20 1.667

gene3 500 300 200 1.667

gene4 5000 3000 2000 1.667

gene5 50000 30000 20000 1.667
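The table can be reproduced directly, which makes the point concrete: the difference grows by four orders of magnitude while the ratio never moves. The snippet below is a small sketch using the A and B values from the table; the variable names are arbitrary, and the log2 column is an added illustration showing that taking the logarithm of the ratio leaves it constant as well.

```python
import math

# (A, B) values copied from the table above.
genes = {
    "gene1": (5, 3),
    "gene2": (50, 30),
    "gene3": (500, 300),
    "gene4": (5000, 3000),
    "gene5": (50000, 30000),
}

rows = []
for name, (a, b) in genes.items():
    delta = a - b  # difference, in expression units
    fc = a / b     # quotient, unitless
    rows.append((name, delta, fc, math.log2(fc)))

for name, delta, fc, log_fc in rows:
    print(f"{name}: delta = {delta}, FC = {fc:.3f}, log2 FC = {log_fc:.3f}")
```

Every row prints the same FC (1.667) and the same log2 FC, so neither carries any information about the absolute size of the change.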

Page 28: Avoiding Nonsense Results in your NGS Variant Studies

A-B is a difference.

A/B is a quotient.

Page 29: Avoiding Nonsense Results in your NGS Variant Studies

Log2 Transformation Does Not Help

Reveals Minor Delta (& J5) Bias

Pink = FC(A/B); Black = Delta

Page 30: Avoiding Nonsense Results in your NGS Variant Studies

G-Thresholding J5

Page 31: Avoiding Nonsense Results in your NGS Variant Studies

FC Bias in Amyotrophic Lateral Sclerosis

[Figure: scatter plot of ALS (y-axis, 0 to 350000) against Control (x-axis, 0 to 200000); plotted series are DEG and FC DEG.]

Black circles = FC(A/B). Pink = Gthr-J5 genes

Page 32: Avoiding Nonsense Results in your NGS Variant Studies
Page 33: Avoiding Nonsense Results in your NGS Variant Studies
Page 34: Avoiding Nonsense Results in your NGS Variant Studies

Black circles = FC(A/B). Pink = Gthr-J5 genes

FC(A/B) Bias in Alcohol-Induced Hepatitis

Page 35: Avoiding Nonsense Results in your NGS Variant Studies

Conclusions

• Not all NGS/HTS sites have sufficient genotypic signal to warrant a base call. High coverage alone does not provide a solution.

• By measuring genotypic signal, we can determine which sites we can call with confidence.

• Fold change (FC(A/B)) is blind to highly expressed genes and should be abandoned as a measure of differential expression altogether, even for single-gene or single-protein studies!

• Published microarray data sets analyzed to date using only FC(A/B) are a gold mine for re-analysis using less biased methods.

Page 36: Avoiding Nonsense Results in your NGS Variant Studies

Credits and Contact

• pw, pHom, etc.: James Lyons-Weiler, Alan Twaddle, Rahil Sethi.

– (MS in preparation)

– Our software is called Gconf (not yet available)

• Fold-Change Bias: James Lyons-Weiler, Tamanna Sultana, Rick Jordan, Rahil Sethi

– (Paper in review)

– For now, read

• Mariani TJ, Budhraja V, Mecham BH, Gu CC, Watson MA, Sadovsky Y. 2003. A variable fold change threshold determines significance for expression microarrays. FASEB J. 17:321-3. doi: 10.1096/fj.02-0351fje

• Pearson K. 1897. On a form of spurious correlation that may arise when indices are used for the measurement of organs. Proc Roy Soc Lond. 60:489-498. doi: 10.1098/rspl.1896.0076