TRANSCRIPT
Nearest Neighbor and Kernel Survival Analysis
George H. Chen, Assistant Professor of Information Systems
Carnegie Mellon University
June 11, 2019
Nonasymptotic Error Bounds and Strong Consistency Rates
Survival Analysis
[Diagram: three patients with feature vectors (gluten allergy; immuno-suppressant; low resting heart rate, high BMI, irregular heart beat) and times of death: Day 2, Day 10, Day ≥ 6]
Feature vector X, observed time Y
When we stop collecting training data, not everyone has died!
Goal: Estimate S(t|x) = P(survive beyond time t | feature vector x)
Problem Setup
Model: Generate a data point $(X, Y, \delta)$ as follows:
1. Sample feature vector $X \sim P_X$
2. Sample time of death $T \sim P_{T \mid X}$
3. Sample time of censoring $C \sim P_{C \mid X}$
4. If death happens before censoring ($T \le C$): set $Y = T$, $\delta = 1$. Otherwise: set $Y = C$, $\delta = 0$.
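As a concrete illustration of this model, here is a minimal simulation sketch in Python; the specific distributions chosen for $P_X$, $P_{T \mid X}$, and $P_{C \mid X}$ (uniform features, exponential times) are assumptions for the example only, not from the talk.

import numpy as np

def generate_data(n, d=2, seed=0):
    """Simulate n right-censored data points (X, Y, delta)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(size=(n, d))                    # 1. X ~ P_X (illustrative: uniform on [0,1]^d)
    T = rng.exponential(scale=1.0 + X.sum(axis=1))  # 2. T ~ P_{T|X} (illustrative: exponential)
    C = rng.exponential(scale=2.0, size=n)          # 3. C ~ P_{C|X} (illustrative: exponential)
    Y = np.minimum(T, C)                            # 4. observe Y = min(T, C)
    delta = (T <= C).astype(int)                    #    delta = 1 iff death observed before censoring
    return X, Y, delta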
Estimator (Beran 1981): find the $k$ training points closest to $x$, then apply the Kaplan-Meier estimator to those $k$ data points to obtain $\hat{S}(t \mid x)$. The kernel variant is similar.
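A sketch of this k-NN estimator under the setup above; this is an illustration of the Beran-style construction, not the talk's code. The kernel variant would replace the uniform weights over the $k$ nearest neighbors with kernel weights $K(\rho(x, X_i)/h)$ for a bandwidth $h$.

import numpy as np

def knn_kaplan_meier(x, X, Y, delta, k, t_grid):
    """Estimate S(t | x): Kaplan-Meier fit to the k nearest neighbors of x."""
    dist = np.linalg.norm(X - x, axis=1)        # Euclidean distance here; any metric could be used
    nn = np.argsort(dist)[:k]                   # indices of the k nearest training points
    Yk, dk = Y[nn], delta[nn]

    # Product-limit formula over the neighbors' distinct observed death times:
    #   S(t) = prod_{death times s <= t} (1 - deaths(s) / at_risk(s))
    S = np.ones_like(t_grid, dtype=float)
    for s in np.unique(Yk[dk == 1]):
        d_s = np.sum((Yk == s) & (dk == 1))     # deaths at time s
        n_s = np.sum(Yk >= s)                   # number at risk just before s
        S[t_grid >= s] *= 1.0 - d_s / n_s
    return S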
Error: $\sup_{t \in [0,\tau]} |\hat{S}(t \mid x) - S(t \mid x)|$ for time horizon $\tau$
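In simulation, where the true $S(t \mid x)$ is known, this error can be approximated by a maximum over a fine grid on $[0, \tau]$. A usage sketch building on the code above; the query point x, the value of tau, the grid resolution, and the true curve S_true are all assumptions of the example.

tau = 3.0
t_grid = np.linspace(0.0, tau, 200)
S_hat = knn_kaplan_meier(x, X, Y, delta, k=50, t_grid=t_grid)
sup_err = np.max(np.abs(S_hat - S_true))   # S_true: true S(t|x) on t_grid (known only in simulation)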
Assumptions:
• Enough of the $n$ training data have $Y$ values $> \tau$
• The feature space is a separable metric space (intrinsic dimension $d$)
• $P_X$ is a Borel probability measure
• The survival time is a continuous random variable in time and smooth with respect to the feature space (Hölder index $\alpha$)
Theory (Informal)
The k-NN estimator with $k = \tilde{\Theta}(n^{2\alpha/(2\alpha+d)})$ has strong consistency rate
$\sup_{t \in [0,\tau]} |\hat{S}(t \mid x) - S(t \mid x)| \le \tilde{O}(n^{-\alpha/(2\alpha+d)})$.
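For intuition, the theory's choice of $k$, ignoring the log factors hidden inside $\tilde{\Theta}$, is easy to compute; $\alpha$ and $d$ are the (usually unknown) smoothness and intrinsic dimension from the assumptions above.

def theory_k(n, alpha, d):
    """Rate-optimal k = n^(2*alpha / (2*alpha + d)), up to log factors."""
    return max(1, round(n ** (2 * alpha / (2 * alpha + d))))

# Example: alpha = 1, d = 2, n = 10000 gives k = n^(1/2) = 100.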
If there is no censoring, the problem reduces to conditional CDF estimation → the error upper bound matches, up to a log factor, the conditional CDF estimation lower bound of Chagny & Roche 2014.
Proof ideas also give finite sample rates for:
• Kernel Kaplan-Meier estimators
• k-NN & kernel Nelson-Aalen cumulative hazard estimators ($-\log S(t \mid x)$); see the sketch after this list
• A generalization bound for automatically choosing $k$ using validation data
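A sketch of the k-NN Nelson-Aalen estimator mentioned in the list above; like the earlier Kaplan-Meier sketch, this is an illustration under the same setup rather than the talk's implementation.

import numpy as np

def knn_nelson_aalen(x, X, Y, delta, k, t_grid):
    """Estimate the cumulative hazard H(t | x) = -log S(t | x) on the k nearest neighbors."""
    dist = np.linalg.norm(X - x, axis=1)
    nn = np.argsort(dist)[:k]
    Yk, dk = Y[nn], delta[nn]

    # Nelson-Aalen: H(t) = sum_{death times s <= t} deaths(s) / at_risk(s)
    H = np.zeros_like(t_grid, dtype=float)
    for s in np.unique(Yk[dk == 1]):
        d_s = np.sum((Yk == s) & (dk == 1))   # deaths at time s
        n_s = np.sum(Yk >= s)                 # number at risk just before s
        H[t_grid >= s] += d_s / n_s
    return H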
Most general finite sample theory for k-NN and kernel survival estimators.
Existing kernel results hold only for Euclidean feature spaces (Dabrowska 1989; Van Keilegom & Veraverbeke 1996; Van Keilegom 1998).
Experiments
[Plot: concordance indices (c-index, roughly 0.60–0.70) on dataset "gbsg2" for: adaptive kernel, random survival forest, kernel (triangle) L1/L2, kernel (box) L1/L2, k-NN (triangle) L1/L2, k-NN L1/L2, and Cox]
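The plot ranks methods by concordance index (c-index): the fraction of comparable patient pairs that a model orders correctly. A minimal, ties-ignored version of Harrell's c-index for illustration; real evaluations typically use a library implementation.

import numpy as np

def c_index(risk, Y, delta):
    """Harrell-style concordance: pair (i, j) is comparable when i is an
    observed death with Y[i] < Y[j]; concordant when risk[i] > risk[j]."""
    concordant, comparable = 0, 0
    for i in range(len(Y)):
        if delta[i] == 0:
            continue                           # i must be an observed death
        for j in range(len(Y)):
            if Y[i] < Y[j]:                    # i died before j's observed time
                comparable += 1
                concordant += int(risk[i] > risk[j])
    return concordant / comparable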
Distance/kernel choice matters a lot in practice.
Learning the kernel typically has the best performance (but no theory yet!)