Coresets for Bayesian Logistic Regression
Tamara Broderick, ITT Career Development Assistant Professor, MIT
With: Jonathan H. Huggins, Trevor Campbell


Page 1: Coresets for Bayesian Logistic Regression · •Our proposal: use data summarization for fast, streaming, distributed algs. with theoretical guarantees • Complex, modular; coherent

Coresets for Bayesian Logistic Regression

ITT Career Development Assistant Professor,

MIT

Tamara Broderick

With: Jonathan H. Huggins, Trevor Campbell

Pages 2–18: Bayesian inference

• Complex, modular; coherent uncertainties; prior info

p(θ | y) ∝ p(y | θ) p(θ)

• MCMC: Accurate but can be slow [Bardenet, Doucet, Holmes 2015]

• (Mean-field) variational Bayes: (MF)VB

• Fast, streaming, distributed (3.6M) (350K) [Broderick, Boyd, Wibisono, Wilson, Jordan 2013]

• Misestimation & lack of quality guarantees [MacKay 2003; Bishop 2006; Wang, Titterington 2004; Turner, Sahani 2011; Fosdick 2013; Dunson 2014; Bardenet, Doucet, Holmes 2015; Opper, Winther 2003; Giordano, Broderick, Jordan 2015]

• Our proposal: use data summarization for fast, streaming, distributed algorithms with theoretical guarantees
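The proportionality p(θ | y) ∝ p(y | θ) p(θ) on these slides can be made concrete with a short sketch for Bayesian logistic regression. The synthetic data and the standard normal prior are assumptions for illustration, not choices made on the slides:

```python
import numpy as np

def log_prior(theta):
    # Assumed N(0, I) prior on the coefficients (hypothetical choice).
    return -0.5 * np.dot(theta, theta)

def log_likelihood(theta, X, y):
    # Logistic regression with labels y_n in {-1, +1}:
    # sum_n -log(1 + exp(-y_n x_n . theta)), computed stably.
    margins = y * (X @ theta)
    return -np.sum(np.logaddexp(0.0, -margins))

def log_posterior_unnorm(theta, X, y):
    # log p(theta | y) = log p(y | theta) + log p(theta) + const.
    return log_likelihood(theta, X, y) + log_prior(theta)

# Tiny synthetic example.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
theta_true = np.array([1.0, -2.0, 0.5])
y = np.where(rng.random(100) < 1 / (1 + np.exp(-X @ theta_true)), 1.0, -1.0)
print(log_posterior_unnorm(theta_true, X, y))
```

MCMC and variational Bayes both operate on exactly this unnormalized log-posterior; the scalability question is the cost of the sum over all N data points at every evaluation.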

Pages 19–27: Data summarization

• Exponential family likelihood

p(y_{1:N} | x_{1:N}, θ) = ∏_{n=1}^{N} exp[ T(y_n, x_n) · η(θ) ]
                        = exp[ { ∑_{n=1}^{N} T(y_n, x_n) } · η(θ) ]

Sufficient statistics: ∑_{n=1}^{N} T(y_n, x_n)

• Scalable, single-pass, streaming, distributed, complementary to MCMC

• But: Often no simple sufficient statistics
• E.g. Bayesian logistic regression; GLMs; “deeper” models

• Likelihood for logistic regression:

p(y_{1:N} | x_{1:N}, θ) = ∏_{n=1}^{N} 1 / (1 + exp(−y_n x_n · θ))

• Our proposal: approximate sufficient statistics
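When the likelihood is an exponential family, the sum ∑_n T(y_n, x_n) really is all that must be kept: one streaming pass yields the full-data log-likelihood at any θ. A minimal sketch with a Gaussian-mean model (my choice of example, not from the slides):

```python
import numpy as np

# Gaussian likelihood with known unit variance is an exponential family with
# T(y) = (y, -1/2) and eta(theta) = (theta, theta^2), up to a y-only term.
rng = np.random.default_rng(1)
y = rng.normal(loc=2.0, scale=1.0, size=10_000)

# One streaming pass: accumulate the sufficient statistics.
T_sum = np.array([y.sum(), -0.5 * len(y)])

def log_lik_from_stats(theta, T_sum):
    # sum_n log N(y_n | theta, 1), up to a theta-independent constant.
    eta = np.array([theta, theta**2])
    return T_sum @ eta

def log_lik_full(theta, y):
    # Same quantity computed from the raw data, for comparison.
    return np.sum(theta * y - 0.5 * theta**2)

theta = 1.7
assert np.isclose(log_lik_from_stats(theta, T_sum), log_lik_full(theta, y))
```

For logistic regression no such finite-dimensional T(y, x) exists, which is exactly what motivates approximate sufficient statistics.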

Pages 28–44: Coresets

[Figure: toy 2D classification data, point clouds labeled “Baseball” and “Curling”]

• Pre-process data to get a smaller, weighted data set
• Theoretical guarantees on quality
• Fast algorithms; error bounds for streaming, distributed
• Cf. data squashing, big data GP ideas, subsampling
• We develop: coresets for Bayes
• Focus on: Logistic regression

[Agarwal et al 2005; Feldman, Langberg 2011; DuMouchel et al 1999; Madigan et al 1999; Huggins, Campbell, Broderick 2016]

Page 45: Coresets for Bayesian Logistic Regression · •Our proposal: use data summarization for fast, streaming, distributed algs. with theoretical guarantees • Complex, modular; coherent

[Huggins, Campbell, Broderick 2016]

Pages 46–47: Step 1: calculate sensitivities of each datapoint

[Huggins, Campbell, Broderick 2016]

Pages 48–49: Step 2: sample points proportionally to sensitivity

[Huggins, Campbell, Broderick 2016]

Pages 50–51: Step 3: weight points by inverse of their sensitivity

[Huggins, Campbell, Broderick 2016]

Pages 52–57: Step 4: input weighted points to existing approximate posterior algorithm

webspam: 350K points, 127 features
• Full MCMC: slow, > 2 days
• Coreset MCMC: fast, < 2 hours

[Huggins, Campbell, Broderick 2016]
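Steps 1–3 above can be sketched generically. The sensitivity formula below is a crude hypothetical surrogate (norm-based), not the bound derived in the paper; it only illustrates the sample-and-reweight pattern that makes the weighted log-likelihood an unbiased estimate of the full-data one:

```python
import numpy as np

def build_coreset(Z, M, rng):
    """Importance-sample a weighted coreset of size M from data Z,
    where each row of Z is y_n * x_n for logistic regression."""
    N = Z.shape[0]
    # Step 1: per-point "sensitivities". Hypothetical surrogate: points far
    # from the origin can contribute large log-likelihood terms.
    sens = 1.0 + np.linalg.norm(Z, axis=1)
    probs = sens / sens.sum()
    # Step 2: sample M indices proportionally to sensitivity.
    idx = rng.choice(N, size=M, p=probs)
    # Step 3: weight each sampled point by 1 / (M * prob) so the weighted
    # sum is unbiased for the full-data sum.
    weights = 1.0 / (M * probs[idx])
    return Z[idx], weights

def weighted_log_lik(theta, Z, weights):
    return -np.sum(weights * np.logaddexp(0.0, -(Z @ theta)))

rng = np.random.default_rng(2)
X = rng.normal(size=(5000, 2))
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=5000))
Z = y[:, None] * X
pts, w = build_coreset(Z, 500, rng)
theta = np.array([1.0, 0.0])
full = -np.sum(np.logaddexp(0.0, -(Z @ theta)))
approx = weighted_log_lik(theta, pts, w)
print(full, approx)
```

Step 4 then simply hands `(pts, w)` to any posterior-approximation algorithm that accepts weighted data, e.g. MCMC on the weighted log-likelihood.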

Pages 58–61: Theory

Thm sketch (HCB). Choose ε > 0, δ ∈ (0,1). Our algorithm runs in O(N) time. W.p. 1 − δ, it constructs a coreset with

|ln E − ln Ê| ≤ ε |ln E|,   coreset size ∼ const · (ε⁻² + log(1/δ)),

where E is the full-data evidence and Ê the coreset evidence.

• Finite-data theoretical guarantee
• On the log evidence (vs. posterior mean, uncertainty, etc.)
• Can quantify the propagation of error in streaming and parallel settings:
  1. If Dᵢ′ is an ε-coreset for Dᵢ, then D₁′ ⋃ D₂′ is an ε-coreset for D₁ ⋃ D₂.
  2. If D′ is an ε-coreset for D and D′′ is an ε′-coreset for D′, then D′′ is an ε′′-coreset for D, where ε′′ = (1 + ε)(1 + ε′) − 1.

[Huggins, Campbell, Broderick 2016]
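Property 2 is just multiplicative composition of relative errors: two stages of summarization with factors (1 + ε) and (1 + ε′) compose to a single factor (1 + ε)(1 + ε′). A quick numeric check with hypothetical ε values:

```python
# Composing an eps1-coreset of an eps2-coreset: errors multiply, not add.
eps1, eps2 = 0.1, 0.05
eps_combined = (1 + eps1) * (1 + eps2) - 1
# Slightly worse than eps1 + eps2 = 0.15, as the theorem's formula predicts.
assert abs(eps_combined - 0.155) < 1e-12
print(eps_combined)
```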

Pages 62–68: Polynomial approximate sufficient statistics

• Subset yields 6M data points, 1K features
• Streaming, distributed; minimal communication
• 24 cores, < 20 sec
• Bounds on Wasserstein distance

[Huggins, Adams, Broderick, submitted]
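The idea can be sketched with a second-order polynomial expansion of the logistic log-likelihood at 0 (a minimal illustration of the approach, not the paper's construction, which uses higher-order polynomials with explicit error bounds): since log σ(s) = −log(1 + e^{−s}) ≈ −log 2 + s/2 − s²/8, the full-data log-likelihood depends on the data only through N, ∑ z_n, and ∑ z_n z_nᵀ, which stream and merge by addition.

```python
import numpy as np

def accumulate_stats(Z):
    # One pass over z_n = y_n * x_n: the approximate sufficient statistics
    # for the 2nd-order expansion of log sigma(s) at s = 0.
    # These sums are all that must be stored or communicated per shard.
    return len(Z), Z.sum(axis=0), Z.T @ Z

def approx_log_lik(theta, stats):
    n, s1, s2 = stats
    return -n * np.log(2) + 0.5 * (s1 @ theta) - 0.125 * (theta @ s2 @ theta)

def exact_log_lik(theta, Z):
    return -np.sum(np.logaddexp(0.0, -(Z @ theta)))

rng = np.random.default_rng(3)
Z = rng.normal(scale=0.3, size=(2000, 4))  # small margins: expansion is accurate
# Streaming/distributed: statistics from two shards just add.
sA = accumulate_stats(Z[:1000])
sB = accumulate_stats(Z[1000:])
stats = (sA[0] + sB[0], sA[1] + sB[1], sA[2] + sB[2])
theta = np.array([0.2, -0.1, 0.05, 0.0])
print(exact_log_lik(theta, Z), approx_log_lik(theta, stats))
```

The merge step is why communication is minimal: each of the 24 cores ships only its (N, d, d×d) triple, never its raw data.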

Page 69: Coresets for Bayesian Logistic Regression · •Our proposal: use data summarization for fast, streaming, distributed algs. with theoretical guarantees • Complex, modular; coherent

Conclusions

• Reliable Bayesian inference at scale via data summarization
• Coresets, polynomial approximate sufficient statistics
• Streaming, distributed

• Challenges and opportunities:
• Beyond logistic regression
• Generalized linear models; deep models; high-dimensional models

[Huggins, Campbell, Broderick 2016; Huggins, Adams, Broderick, submitted; Bardenet, Maillard 2015; Geppert, Ickstadt, Munteanu, Quedenfeld, Sohler 2017; Ahfock, Astle, Richardson 2017]

Page 70: Coresets for Bayesian Logistic Regression · •Our proposal: use data summarization for fast, streaming, distributed algs. with theoretical guarantees • Complex, modular; coherent

References

• T Broderick, N Boyd, A Wibisono, AC Wilson, and MI Jordan. Streaming variational Bayes. NIPS 2013.
• T Campbell*, JH Huggins*, J How, and T Broderick. Truncated random measures. Submitted. ArXiv:1603.00861.
• R Giordano, T Broderick, and MI Jordan. Linear response methods for accurate covariance estimates from mean field variational Bayes. NIPS 2015.
• R Giordano, T Broderick, R Meager, JH Huggins, and MI Jordan. Fast robustness quantification with variational Bayes. ICML Workshop on #Data4Good: Machine Learning in Social Good Applications, 2016. ArXiv:1606.07153.
• JH Huggins, T Campbell, and T Broderick. Coresets for scalable Bayesian logistic regression. NIPS 2016.

Page 71: Coresets for Bayesian Logistic Regression · •Our proposal: use data summarization for fast, streaming, distributed algs. with theoretical guarantees • Complex, modular; coherent

References (continued)

• PK Agarwal, S Har-Peled, and KR Varadarajan.

Geometric approximation via coresets. Combinatorial and computational geometry, 2005.

• D Ahfock, WJ Astle, S Richardson. Statistical properties of sketching algorithms. arXiv 1706.03665.

• R Bardenet, A Doucet, and C Holmes. On Markov chain Monte Carlo methods for tall data. arXiv, 2015.

• R Bardenet and O-A Maillard. A note on replacing uniform subsampling by random projections in MCMC for linear regression of tall datasets. Preprint.

• CM Bishop. Pattern Recognition and Machine Learning, 2006.

• W DuMouchel, C Volinsky, T Johnson, C Cortes, D Pregibon. Squashing flat files flatter. SIGKDD 1999.

• D Dunson. Robust and scalable approach to Bayesian inference. Talk at ISBA 2014.

• D Feldman and M Langberg. A unified framework for approximating and clustering data. Symposium on Theory of Computing, 2011.

• B Fosdick. Modeling Heterogeneity within and between Matrices and Arrays, Chapter 4.7. PhD Thesis, University of Washington, 2013.

• LN Geppert, K Ickstadt, A Munteanu, J Quedenfeld, C Sohler. Random projections for Bayesian regression. Statistics and Computing, 2017.

• DJC MacKay. Information Theory, Inference, and Learning Algorithms. Cambridge University Press, 2003.

• D Madigan, N Raghavan, W Dumouchel, M Nason, C Posse, G Ridgeway. Likelihood-based data squashing: A modeling approach to instance construction. Data Mining and Knowledge Discovery, 2002.

• M Opper and O Winther. Variational linear response. NIPS 2003.

• RE Turner and M Sahani. Two problems with variational expectation maximisation for time-series models. In D Barber, AT Cemgil, and S Chiappa, editors, Bayesian Time Series Models, 2011.

• B Wang and M Titterington. Inadequacy of interval estimates corresponding to variational Bayesian approximations. In AISTATS, 2004.