fixing the leaks in the pipeline from public genomics data to the clinic

fixing the leaks in the genomics

Uploaded by: jtleek

Posted on 29-Jul-2015


TRANSCRIPT

Page 1: Fixing the leaks in the pipeline from public genomics data to the clinic


Page 2

http://jhudatascience.org/

Page 3

https://www.coursera.org/specialization/genomics/41

Page 4

@simplystats http://simplystatistics.org

Page 5

@jtleek http://www.jtleek.com

Page 6

https://www.counsyl.com/

Page 7

Their basic pitch was “Genomics is a fraud”

http://www.technologyreview.com/news/535771/a-contrarian-in-biotech/

Page 8
Page 9

“The explosive growth of next-generation sequencing data submitted into the SRA exceeds the growth rate of storage capacity”

http://www.ncbi.nlm.nih.gov/pubmed/22009675

Page 10
Page 11
Page 12
Page 13

3: cost, analyst variation, motivation

Page 14

1 cost

Page 15

costs

money, interpretability

Page 16

http://arxiv.org/pdf/math/0606441.pdf

Page 17

http://www.ncbi.nlm.nih.gov/pubmed/19276151

Page 18
Page 19

@leekgroup

Page 20
Page 21

http://www.ncbi.nlm.nih.gov/pubmed/25788628

Page 22

http://www.ncbi.nlm.nih.gov/pubmed/25788628

Page 23

[Figure: prediction accuracy (y-axis, 0% to 100%) of PAM (scaled), PAM (unscaled), and TSP classifiers, with bars for Agilent/Grade 1, Agilent/Grade 3, Illumina/Grade 1, and Illumina/Grade 3]

http://www.ncbi.nlm.nih.gov/pubmed/25788628

Page 24

algorithm
1. select useful pairs
2. screen pairs for association
3. build a simple CART predictor
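The three steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the talk's implementation: "useful" is taken to mean a pair whose indicator 1(x_i < x_j) is not constant across samples, screening ranks pairs by the absolute difference in that indicator's frequency between groups, and the CART step is a tiny Gini-impurity tree with a fixed maximum depth. All function names and the toy data are hypothetical.

```python
from itertools import combinations

def pair_indicator(sample, i, j):
    """1 if feature i is below feature j in this sample, else 0."""
    return 1 if sample[i] < sample[j] else 0

def select_useful_pairs(X):
    """Step 1 (assumed rule): keep pairs whose indicator is not constant across samples."""
    useful = []
    for i, j in combinations(range(len(X[0])), 2):
        if len({pair_indicator(s, i, j) for s in X}) == 2:
            useful.append((i, j))
    return useful

def screen_pairs(X, y, pairs, k):
    """Step 2: rank pairs by |P(z=1 | y=1) - P(z=1 | y=0)| and keep the top k."""
    def score(pair):
        z1 = [pair_indicator(s, *pair) for s, lab in zip(X, y) if lab == 1]
        z0 = [pair_indicator(s, *pair) for s, lab in zip(X, y) if lab == 0]
        return abs(sum(z1) / len(z1) - sum(z0) / len(z0))
    return sorted(pairs, key=score, reverse=True)[:k]

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def build_tree(X, y, pairs, depth):
    """Step 3: tiny CART-style tree; nodes are (pair, subtree_if_0, subtree_if_1)."""
    majority = 1 if 2 * sum(y) >= len(y) else 0
    if depth == 0 or gini(y) == 0.0 or not pairs:
        return majority
    best, best_cost = None, gini(y)
    for pair in pairs:
        left = [lab for s, lab in zip(X, y) if pair_indicator(s, *pair) == 0]
        right = [lab for s, lab in zip(X, y) if pair_indicator(s, *pair) == 1]
        cost = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if left and right and cost < best_cost:
            best, best_cost = pair, cost
    if best is None:  # no split reduces impurity
        return majority
    rest = [p for p in pairs if p != best]
    branches = []
    for value in (0, 1):
        idx = [n for n, s in enumerate(X) if pair_indicator(s, *best) == value]
        branches.append(build_tree([X[n] for n in idx], [y[n] for n in idx], rest, depth - 1))
    return (best, branches[0], branches[1])

def predict(tree, sample):
    """Walk the tree until a 0/1 leaf is reached."""
    while isinstance(tree, tuple):
        pair, if_0, if_1 = tree
        tree = if_1 if pair_indicator(sample, *pair) else if_0
    return tree

# Toy data (hypothetical): 6 samples, 3 features, binary outcome.
X = [[1.0, 2.0, 5.0], [1.5, 3.0, 4.0], [0.5, 2.5, 6.0],
     [3.0, 1.0, 5.0], [4.0, 2.0, 1.0], [3.5, 0.5, 2.0]]
y = [0, 0, 0, 1, 1, 1]
pairs = screen_pairs(X, y, select_useful_pairs(X), k=2)
tree = build_tree(X, y, pairs, depth=2)
print(pairs, tree)  # → [(0, 1), (0, 2)] ((0, 1), 1, 0)
```

Because every split is a within-sample comparison of two features, the fitted rule does not depend on how each sample was normalized.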

Page 25

http://www.ncbi.nlm.nih.gov/pubmed/19276151

Page 26

Patil et al. (in prep)

Page 27

Patil et al. (in prep)

Page 28

Patil et al. (in prep)

Page 29

Data:

$x_{ik}$: value for feature $i$, sample $k$

$y_k$: group indicator for sample $k$

TSP is the $(i,j)$ pair that maximizes:

$$\left|\widehat{\Pr}(x_{ik} < x_{jk} \mid y_k = 1) - \widehat{\Pr}(x_{ik} < x_{jk} \mid y_k = 0)\right|$$

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1989150/
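A direct translation of the TSP score into Python, estimating each probability by a sample proportion and searching all pairs exhaustively. This is a sketch with hypothetical names and toy data; ties are simply broken by search order here, which may differ from the tie-breaking used in published TSP implementations.

```python
from itertools import combinations

def tsp_score(X, y, i, j):
    """|P^(x_i < x_j | y = 1) - P^(x_i < x_j | y = 0)|, using sample proportions."""
    counts = {0: [0, 0], 1: [0, 0]}  # label -> [times x_i < x_j, group size]
    for sample, label in zip(X, y):
        counts[label][0] += sample[i] < sample[j]
        counts[label][1] += 1
    p1 = counts[1][0] / counts[1][1]
    p0 = counts[0][0] / counts[0][1]
    return abs(p1 - p0)

def top_scoring_pair(X, y):
    """Exhaustive search over all pairs i < j; ties go to the first pair found."""
    n_features = len(X[0])
    return max(combinations(range(n_features), 2), key=lambda ij: tsp_score(X, y, *ij))

X = [[2.0, 5.0, 1.0], [1.0, 4.0, 3.0], [6.0, 2.0, 2.5], [7.0, 3.0, 0.5]]
y = [1, 1, 0, 0]
best = top_scoring_pair(X, y)
print(best, tsp_score(X, y, *best))  # → (0, 1) 1.0
```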

Page 30

$z_{ijk} = \mathbf{1}(x_{ik} < x_{jk})$

$E[z_{ijk} \mid y_k] = a_{0ij} + a_{1ij} y_k$

$\rightarrow \max |a_{1ij}| = \text{TSP}$

Patil et al. (in prep)
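Why the regression form recovers the TSP: when $y_k$ is binary, the least-squares slope from regressing $z_{ijk}$ on $y_k$ equals the difference in group means of $z_{ijk}$, which is exactly the difference in estimated probabilities inside the TSP score. A quick exact check with toy data (helper names are hypothetical; `Fraction` keeps the arithmetic exact):

```python
from fractions import Fraction

def ols_slope(x, z):
    """Least-squares slope from regressing z on x (with an intercept)."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxz / sxx

def proportion_difference(y, z):
    """P^(z = 1 | y = 1) - P^(z = 1 | y = 0)."""
    z1 = [b for a, b in zip(y, z) if a == 1]
    z0 = [b for a, b in zip(y, z) if a == 0]
    return sum(z1) / len(z1) - sum(z0) / len(z0)

y = [Fraction(v) for v in (1, 1, 1, 0, 0, 0, 0)]
z = [Fraction(v) for v in (1, 1, 0, 0, 1, 0, 0)]
print(ols_slope(y, z), proportion_difference(y, z))  # → 5/12 5/12
```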

Page 31

• Not the same as TSP
• But |â/s.e.(â)| = |û/s.e.(û)|, algebraically
• “Variance regularized” TSP
• $z_{ijk}$ invariant to monotone transformations
• Fix parameters → find features

$E[y_k \mid z_{ijk}] = u_{0ij} + u_{1ij} z_{ijk}$

Patil et al. (in prep)
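Two of the bullets can be checked numerically. Assuming ordinary least squares with the usual homoskedastic standard errors, the slope t-statistic equals $r\sqrt{n-2}/\sqrt{1-r^2}$, which is symmetric in the two variables, so regressing $z$ on $y$ or $y$ on $z$ gives the same $|t|$. And because $z$ only records an ordering, a strictly increasing transform such as `log` leaves it unchanged. The expression values below are hypothetical.

```python
import math

def slope_t(x, v):
    """t-statistic of the OLS slope from regressing v on x, with the usual (homoskedastic) s.e."""
    n = len(x)
    mx, mv = sum(x) / n, sum(v) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - mv) for a, b in zip(x, v)) / sxx
    intercept = mv - slope * mx
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, v))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope / se

# Hypothetical expression values for genes i and j, and a binary group label.
x_i = [2.0, 0.5, 3.0, 1.2, 4.0, 0.9, 2.2]
x_j = [3.0, 0.7, 2.5, 1.0, 4.5, 1.1, 2.0]
y = [1, 1, 1, 0, 0, 0, 0]

z = [1 if a < b else 0 for a, b in zip(x_i, x_j)]
z_log = [1 if math.log(a) < math.log(b) else 0 for a, b in zip(x_i, x_j)]

print(z == z_log)                               # → True
print(abs(slope_t(y, z)), abs(slope_t(z, y)))  # the two |t| values agree
```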

Page 32

1. Calculate the t-statistic for all pairs
2. Choose the top pair (or covariate)
3. Continue for a fixed number of pairs

$E[y_k \mid z_{ijk}] = u_{0ij} + u_{1ij} z_{ijk}$

Patil et al. (in prep)
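A minimal sketch of the three-step selection loop. The slide does not say how "continue" conditions on earlier picks; this version assumes each later step scores candidates against the residuals of $y$ after regressing out the previously chosen feature (a forward-stagewise reading). Feature names and data are hypothetical.

```python
import math

def abs_t(x, v):
    """|t| of the OLS slope of v on x, via t = r * sqrt(n - 2) / sqrt(1 - r^2)."""
    n = len(x)
    mx, mv = sum(x) / n, sum(v) / n
    sxx = sum((a - mx) ** 2 for a in x)
    svv = sum((b - mv) ** 2 for b in v)
    if sxx == 0 or svv == 0:
        return 0.0
    r = sum((a - mx) * (b - mv) for a, b in zip(x, v)) / math.sqrt(sxx * svv)
    if abs(r) >= 1.0:
        return math.inf
    return abs(r) * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def residualize(v, x):
    """Residuals of v after a simple linear regression on x."""
    n = len(x)
    mx, mv = sum(x) / n, sum(v) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - mv) for a, b in zip(x, v)) / sxx
    return [b - mv - slope * (a - mx) for a, b in zip(x, v)]

def greedy_select(features, y, n_pairs):
    """features: name -> list of 0/1 pair indicators (or a covariate)."""
    chosen, target = [], [float(v) for v in y]
    for _ in range(n_pairs):
        scores = {name: abs_t(z, target) for name, z in features.items()
                  if name not in chosen and len(set(z)) > 1}
        if not scores:
            break
        best = max(scores, key=scores.get)
        chosen.append(best)
        target = residualize(target, features[best])  # assumption: continue on residuals
    return chosen

features = {
    "A<B": [1, 1, 1, 0, 0, 0, 0, 0],
    "C<D": [1, 1, 0, 1, 0, 0, 1, 0],
    "E<F": [0, 1, 0, 1, 0, 1, 0, 1],
}
y = [1, 1, 1, 1, 0, 0, 0, 0]
print(greedy_select(features, y, 2))  # → ['A<B', 'C<D']
```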

Page 33

http://astor.som.jhmi.edu/~marchion//breastTSP.html

Page 34

[Figure: decision tree built from three gene pairs, USP7 < RP11-423C15.3, NM_018610 < MTCH1, and RND1 < LGALS14, each split with No/Yes branches; leaves are No Recur (three leaves) and Recur (one leaf)]

Page 35

Page 36

Mammaprint

Patil et al. (in prep)

Page 37

2 analyst variation

Page 38
Page 39
Page 40
Page 41

what went wrong?

2 things

Page 42

what went wrong? transparency

The data/code weren’t reproducible

Page 43

what went wrong? transparency

There was a lack of cooperation

Page 44

what went wrong? expertise

They used silly prediction rules

(Pr(FEC) = 5/8[Pr(F) + Pr(E) + Pr(C)] – ¼)
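One way to see a problem with this rule: it is linear and increasing in each of the three input probabilities, so its range over inputs in [0, 1] is attained at the corners, and that range runs from below 0 to above 1, values no probability can take. The function name is hypothetical; the formula is the one on the slide.

```python
from fractions import Fraction
from itertools import product

def fec_rule(pf, pe, pc):
    """Pr(FEC) = 5/8 [Pr(F) + Pr(E) + Pr(C)] - 1/4, as written on the slide."""
    return Fraction(5, 8) * (pf + pe + pc) - Fraction(1, 4)

# Evaluate at every corner of the unit cube of inputs.
corners = [fec_rule(*p) for p in product([Fraction(0), Fraction(1)], repeat=3)]
print(min(corners), max(corners))  # → -1/4 13/8
```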

Page 45

what went wrong? expertise

They had study design problems

(Batch effects)

Page 46

what went wrong? expertise

Their predictions weren’t locked down

Today: Pr(FEC) = 0.8
Tomorrow: Pr(FEC) = 0.1
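A lightweight illustration of what "locked down" can mean in code, assuming a simple linear rule: the parameters live in a frozen object that cannot be edited after creation, and a published SHA-256 fingerprint of the parameters lets anyone confirm that later predictions come from the same rule. The class and field names are hypothetical; this is a sketch of the idea, not a clinical validation process.

```python
import hashlib
import json
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class LockedRule:
    """A linear prediction rule whose parameters cannot be changed after creation."""
    coefficients: tuple
    intercept: float

    def fingerprint(self) -> str:
        """SHA-256 of the parameters; publish it alongside the rule."""
        payload = json.dumps({"coef": list(self.coefficients), "intercept": self.intercept})
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def predict(self, x) -> float:
        return self.intercept + sum(c * v for c, v in zip(self.coefficients, x))

rule = LockedRule(coefficients=(0.5, -0.25), intercept=0.1)
published = rule.fingerprint()

try:
    rule.intercept = 0.3  # attempting to "tweak" the rule after the fact
except FrozenInstanceError:
    print("rule is locked")

# Later: anyone can check that predictions come from the published rule.
assert LockedRule((0.5, -0.25), 0.1).fingerprint() == published
```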

Page 47

At the end of the day, the Potti analysis was fully reproducible

The problem is that the analysis was wrong

Page 48

http://bit.ly/10vS1yt

Page 49

http://bit.ly/OgW3xv

Page 50

Drinkel et al. Organometallics 2013

Page 51

Page 52

Page 53

Page 54

Page 55
Page 56

http://simplystatistics.tumblr.com/post/19646774024/laws-of-nature-and-the-law-of-patents-supreme-court

Page 57

3 motivation

Page 58
Page 59

$ (from reducing sample size)

Page 60

basic idea
• randomization isn’t perfect
• “rebalance” with baseline covariates
• improve estimator precision
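A small simulation of the basic idea, under assumptions not taken from the talk: a continuous outcome that depends linearly on one prognostic baseline covariate, coin-flip randomization, and ordinary ANCOVA adjustment (regress the outcome on arm and covariate) as the adjusted estimator. Coin-flip assignment leaves chance imbalance in the covariate between arms; adjusting removes the outcome variation the covariate explains, so the adjusted estimates vary less across repeated trials. All effect sizes are hypothetical.

```python
import random
import statistics

def mean(xs):
    return sum(xs) / len(xs)

def one_trial(n, rng):
    """One simulated two-arm trial with a prognostic baseline covariate w."""
    w = [rng.gauss(0, 1) for _ in range(n)]
    a = [rng.randint(0, 1) for _ in range(n)]  # coin-flip randomization: arms can end up imbalanced on w
    y = [1.0 * ai + 2.0 * wi + rng.gauss(0, 1) for ai, wi in zip(a, w)]
    return w, a, y

def unadjusted(w, a, y):
    """Difference in mean outcome between arms."""
    return mean([yi for yi, ai in zip(y, a) if ai == 1]) - mean([yi for yi, ai in zip(y, a) if ai == 0])

def adjusted(w, a, y):
    """Coefficient on a in the OLS fit y ~ a + w (ANCOVA)."""
    arms = {g: [(wi, yi) for wi, yi, ai in zip(w, y, a) if ai == g] for g in (0, 1)}
    mw = {g: mean([p[0] for p in pts]) for g, pts in arms.items()}
    my = {g: mean([p[1] for p in pts]) for g, pts in arms.items()}
    # Pooled within-arm slope of y on w
    sxy = sum((wi - mw[g]) * (yi - my[g]) for g, pts in arms.items() for wi, yi in pts)
    sxx = sum((wi - mw[g]) ** 2 for g, pts in arms.items() for wi, _ in pts)
    b = sxy / sxx
    # Mean difference, corrected for the chance imbalance in w between arms
    return (my[1] - my[0]) - b * (mw[1] - mw[0])

rng = random.Random(2015)
unadj, adj = [], []
for _ in range(500):
    w, a, y = one_trial(100, rng)
    unadj.append(unadjusted(w, a, y))
    adj.append(adjusted(w, a, y))

print(statistics.variance(unadj), statistics.variance(adj))
```

Both estimators center on the same treatment effect; the gain is in precision, which is what translates into a smaller required sample size.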

Page 61
Page 62
Page 63

Ack Math!!!!

Page 64

1. Estimate the probability of being in each arm given baseline covariates

Page 65

2. For each person, calculate an initial estimate from each arm's model, fit using propensity-score-weighted logistic regression

Page 66

3. Define a covariate as the residual from fitting the arm-level models minus the arm-level means, and fit new propensity models

Page 67

4. Use these propensities to re-fit the weighted logistic regressions (WLR) from step 2, then average the predictions to get the covariate-adjusted treatment effect
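A simplified sketch of the first two steps plus the final averaging, for a binary outcome and a single baseline covariate. It deliberately omits the residual-covariate update and re-fit (steps 3 and 4), so it illustrates inverse-propensity-weighted logistic regression followed by averaging predictions, not the full procedure described above. The Newton-Raphson fitter, function names, and simulated data are all hypothetical.

```python
import math
import random

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def fit_logistic(x, y, wts, iters=25):
    """Weighted logistic regression logit P(y = 1) = b0 + b1 * x, via Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, wi in zip(x, y, wts):
            p = expit(b0 + b1 * xi)
            g0 += wi * (yi - p)  # weighted score
            g1 += wi * (yi - p) * xi
            v = wi * p * (1.0 - p)  # weighted information
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        det = h00 * h11 - h01 * h01
        b0, b1 = (b0 + (h11 * g0 - h01 * g1) / det,
                  b1 + (h00 * g1 - h01 * g0) / det)
    return b0, b1

def adjusted_risk_difference(w, a, y):
    n = len(w)
    # Step 1: probability of being in arm 1 given the baseline covariate
    c0, c1 = fit_logistic(w, a, [1.0] * n)
    ps = [expit(c0 + c1 * wi) for wi in w]
    risk = {}
    for arm in (0, 1):
        idx = [k for k in range(n) if a[k] == arm]
        # Step 2: arm-specific outcome model, weighted by 1 / P(observed arm | w)
        wts = [1.0 / (ps[k] if arm == 1 else 1.0 - ps[k]) for k in idx]
        b0, b1 = fit_logistic([w[k] for k in idx], [y[k] for k in idx], wts)
        # Predict every participant's risk under this arm, then average
        risk[arm] = sum(expit(b0 + b1 * wi) for wi in w) / n
    return risk[1] - risk[0]

# Simulated trial (all numbers hypothetical)
rng = random.Random(7)
w = [rng.gauss(0, 1) for _ in range(2000)]
a = [rng.randint(0, 1) for _ in range(2000)]
y = [1 if rng.random() < expit(-0.5 + 1.0 * ai + 1.5 * wi) else 0 for ai, wi in zip(a, w)]
print(adjusted_risk_difference(w, a, y))
```

Averaging each arm's predictions over everyone in the trial (rather than only that arm's members) is what turns the two conditional models into a single marginal, covariate-adjusted effect.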

Page 68

http://astor.som.jhmi.edu/~marchion//breastTSP.html

Page 69

Age, Tumor Size, Grade: 5.1%
Age, Tumor Size, Grade, ER Status: 4.9%
Mammaprint Risk Category (MRC): 5.4%
Age, Tumor Size, Grade, ER Status, MRC: 7.8%

Page 70

Age, Tumor Size, Grade: 5.1%
Age, Tumor Size, Grade, ER Status: 4.9%
Mammaprint Risk Category (MRC): 5.4%
Age, Tumor Size, Grade, ER Status, MRC: 7.8%
Age, Tumor Size, Grade, ER Status, TSP: 6.2%

Page 71

3: cost, analyst variation, motivation

Page 72

acknowledgements

Leek group: Prasad Patil, Leo Collado Torres, Abhi Nellore, Claire Ruberman, Jack Fu, Kai Kammers

Collaborators: Michael Rosenblum, Benjamin Haibe-Kains, P.O. Bachant-Winner, Roger Peng

Page 73

Prasad Patil http://www.biostat.jhsph.edu/~prpatil/

Page 74

Links

https://github.com/leekgroup/sig2trial

http://jtleek.com/talks/