
Page 1: Nearest Neighbor and Kernel Survival Analysis

Nearest Neighbor and Kernel Survival Analysis
Nonasymptotic Error Bounds and Strong Consistency Rates

George H. Chen, Assistant Professor of Information Systems
Carnegie Mellon University

June 11, 2019

Page 2: Nearest Neighbor and Kernel Survival Analysis

Glutenallergy

Immuno-suppressant

Low resting heart rate High BMIIrregular

heart beatTime of death

Day 2

Day 10

Day ≥ 6

Survival Analysis

Page 3: Nearest Neighbor and Kernel Survival Analysis

Glutenallergy

Immuno-suppressant

Low resting heart rate High BMIIrregular

heart beatTime of death

Day 2

Day 10

Day ≥ 6

Feature vector XX Observed time YY

Survival Analysis

Page 4: Nearest Neighbor and Kernel Survival Analysis

Glutenallergy

Immuno-suppressant

Low resting heart rate High BMIIrregular

heart beatTime of death

Day 2

Day 10

Day ≥ 6

Feature vector XX Observed time YY

Survival Analysis

When we stop collecting training data, not everyone has died!

Page 5: Nearest Neighbor and Kernel Survival Analysis

Glutenallergy

Immuno-suppressant

Low resting heart rate High BMIIrregular

heart beatTime of death

Day 2

Day 10

Day ≥ 6

Goal: Estimate S(t|x) = P(survive beyond time t | feature vector x)

Feature vector XX Observed time YY

Survival Analysis

When we stop collecting training data, not everyone has died!

Pages 6–13: Nearest Neighbor and Kernel Survival Analysis

Problem Setup

Model: generate each data point (X, Y, δ) as follows:

1. Sample feature vector X ∼ P_X
2. Sample time of death T ∼ P_{T|X}
3. Sample time of censoring C ∼ P_{C|X}
4. If death happens before censoring (T ≤ C): set Y = T, δ = 1.
   Otherwise: set Y = C, δ = 0.
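
To make the model concrete, here is a minimal simulation sketch in Python. The particular distributions (standard normal features, exponential death and censoring times) are illustrative choices of mine, not from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 2

# 1. Sample feature vectors X ~ P_X (standard normal, for illustration).
X = rng.standard_normal((n, d))

# 2. Sample times of death T ~ P_{T|X}: exponential with a feature-
#    dependent scale (an arbitrary illustrative choice).
T = rng.exponential(scale=np.exp(X[:, 0]))

# 3. Sample censoring times C ~ P_{C|X} (here independent of X).
C = rng.exponential(scale=2.0, size=n)

# 4. Observe Y = min(T, C) and censoring indicator delta = 1{T <= C}.
Y = np.minimum(T, C)
delta = (T <= C).astype(int)
```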

Pages 14–23: Nearest Neighbor and Kernel Survival Analysis

Problem Setup (continued)

Estimator (Beran 1981): find the k training points closest to x, then apply the Kaplan-Meier estimator to those k points to obtain Ŝ(t | x). The kernel variant is similar.

Error: sup_{t ∈ [0,τ]} |Ŝ(t | x) − S(t | x)| for a time horizon τ

Assumptions:
• Enough of the n training data have Y values > τ
• The feature space is a separable metric space (intrinsic dimension d), and P_X is a Borel probability measure
• T and C are continuous random variables in time and smooth with respect to the feature space (Hölder index α)
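
Below is a minimal sketch of the k-NN Kaplan-Meier estimator described above, assuming Euclidean features for concreteness (the theory only requires a separable metric space). Function and variable names are my own:

```python
import numpy as np

def knn_kaplan_meier(X_train, Y_train, delta_train, x, k, t_grid):
    # Step 1: find the k training points closest to x.
    dists = np.linalg.norm(X_train - x, axis=1)
    nn = np.argsort(dists)[:k]

    # Step 2: Kaplan-Meier product-limit estimator on those k points.
    # Sort their observed times; with continuous times (as assumed in
    # the talk), ties have probability zero.
    order = np.argsort(Y_train[nn])
    Y = Y_train[nn][order]
    delta = delta_train[nn][order]
    at_risk = k - np.arange(k)  # number still at risk just before each Y

    # S_hat(t | x) = product over deaths with Y_i <= t of (1 - 1/at_risk_i)
    return np.array([
        np.prod(1.0 - ((delta == 1) & (Y <= t)) / at_risk)
        for t in t_grid
    ])
```

The kernel variant replaces the hard k-nearest-neighbor selection with kernel weights on all training points, reweighting the at-risk and death counts accordingly.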

Pages 24–33: Nearest Neighbor and Kernel Survival Analysis

Theory (Informal)

The k-NN estimator with k = Θ̃(n^{2α/(2α+d)}) has strong consistency rate:

sup_{t ∈ [0,τ]} |Ŝ(t | x) − S(t | x)| ≤ Õ(n^{−α/(2α+d)})

If there is no censoring, the problem reduces to conditional CDF estimation
→ the error upper bound matches, up to a log factor, the conditional CDF estimation lower bound of Chagny & Roche 2014

The proof ideas also give finite-sample rates for:
• kernel Kaplan-Meier estimators
• k-NN & kernel Nelson-Aalen cumulative hazard estimators (−log S(t | x))
• a generalization bound for automatically choosing k using validation data

This is the most general finite-sample theory for k-NN and kernel survival estimators; existing kernel results hold only for Euclidean feature spaces (Dabrowska 1989, Van Keilegom & Veraverbeke 1996, Van Keilegom 1998).
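
As a concrete instantiation of the rate (my numbers, illustrative only): for Lipschitz smoothness (α = 1) and intrinsic dimension d = 2, the prescribed neighbor count and error rate work out to:

```latex
k = \tilde{\Theta}\!\big(n^{2\alpha/(2\alpha+d)}\big)
  = \tilde{\Theta}\!\big(n^{2/4}\big)
  = \tilde{\Theta}(\sqrt{n}),
\qquad
\sup_{t \in [0,\tau]} \big|\hat{S}(t \mid x) - S(t \mid x)\big|
  \le \tilde{O}\!\big(n^{-1/4}\big).
```

So roughly √n neighbors are used, and raising the intrinsic dimension or lowering the smoothness slows the rate, as is typical for nonparametric estimation.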

Pages 34–36: Nearest Neighbor and Kernel Survival Analysis

Experiments

[Figure: bar chart of concordance indices (c-index, roughly 0.60–0.70) on dataset "gbsg2" for: adaptive kernel, random survival forest, kernel (triangle) ℓ1, kernel (triangle) ℓ2, kernel (box) ℓ1, kernel (box) ℓ2, k-NN (triangle) ℓ1, k-NN (triangle) ℓ2, k-NN ℓ1, k-NN ℓ2, Cox]

Distance/kernel choice matters a lot in practice.

Learning the kernel typically gives the best performance (but no theory yet!)
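
For reference, the metric in the plot can be computed as follows. This is a minimal sketch of Harrell's c-index for right-censored data (ignoring tied event times), written by me rather than taken from the talk:

```python
import numpy as np

def concordance_index(Y, delta, risk):
    """Fraction of comparable pairs that the risk scores order correctly.
    A pair (i, j) is comparable when i is observed to die before j's
    observed time; higher risk should mean earlier death."""
    num = den = 0.0
    n = len(Y)
    for i in range(n):
        for j in range(n):
            if delta[i] == 1 and Y[i] < Y[j]:
                den += 1
                if risk[i] > risk[j]:
                    num += 1.0
                elif risk[i] == risk[j]:
                    num += 0.5  # tied scores count half
    return num / den
```

A c-index of 0.5 corresponds to random ordering and 1.0 to perfect ranking, so the 0.60–0.70 range in the plot reflects moderate predictive signal.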