
Page 1:

The Adaptive Lasso and its Oracle Properties

Hui Zou, 2006

Beyond Lasso Reading Group, UofA, Edmonton

Barnabás Póczos

April 2, 2009

Page 2:

Brief Summary

Necessary conditions for Lasso variable selection to be consistent.

Scenarios where Lasso variable selection is inconsistent.

Lasso cannot be an oracle procedure. (An oracle procedure performs consistent variable selection and estimates as well as if the true model were given.)

⇒ a new version of the Lasso: the Adaptive Lasso.

Adaptive Lasso enjoys the oracle properties.

Page 3:

Brief Summary

Adaptive Lasso can be solved by the same algorithms as Lasso.

Adaptive Lasso can be extended to Generalized Linear Models, and the oracle properties will still hold.

Nonnegative Garotte is shown to be consistent for variable selection.

Page 4:

Notations
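The slide's notation is not reproduced in the transcript; the standard setup it refers to (reconstructed from Zou, 2006, Sec. 2, not verbatim from the slide) is roughly:

```latex
% Linear model, true active set, and limiting design matrix.
y_i = x_i^{\top}\beta^{*} + \varepsilon_i, \qquad
\varepsilon_i \ \text{i.i.d.},\ \mathbb{E}\,\varepsilon_i = 0,\ \operatorname{Var}(\varepsilon_i) = \sigma^{2}

\mathcal{A} = \{\, j : \beta^{*}_j \neq 0 \,\}, \qquad |\mathcal{A}| = p_0 < p

\tfrac{1}{n}\, X^{\top} X \to C \ \text{(positive definite)}, \qquad
C = \begin{pmatrix} C_{11} & C_{12}\\ C_{21} & C_{22} \end{pmatrix},
\quad C_{11}\in\mathbb{R}^{p_0\times p_0} \ \text{corresponding to } \mathcal{A}
```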

Page 5:

Definitions

Convergence in Probability

Convergence in Distribution

Weak convergence of cdf-s

Remark: convergence in distribution is the same as weak convergence of the cdf-s.
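The formal definitions were equations on the slide; the standard statements are:

```latex
% Convergence in probability and in distribution.
X_n \xrightarrow{\;p\;} X \iff \forall \epsilon > 0:\ \Pr\big(|X_n - X| > \epsilon\big) \to 0

X_n \xrightarrow{\;d\;} X \iff F_{X_n}(x) \to F_X(x)
\ \text{at every continuity point } x \text{ of } F_X
```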

Page 6:

Oracle Property

Definition: method δ is an oracle procedure if it

identifies the right subset model, and

achieves the optimal root-n estimation rate.

(For an optimal procedure we need the continuous shrinkage property, too.)
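In formulas (a reconstruction of the slide's equations, following Zou's definition; Σ* is the covariance of the estimator that knows the true submodel in advance), the estimate β̂(δ) must satisfy:

```latex
\Pr\big(\{\, j : \hat\beta(\delta)_j \neq 0 \,\} = \mathcal{A}\big) \longrightarrow 1

\sqrt{n}\,\big(\hat\beta(\delta)_{\mathcal{A}} - \beta^{*}_{\mathcal{A}}\big)
\xrightarrow{\;d\;} N\big(0,\ \Sigma^{*}\big)
```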

Page 7:

Lasso

Regularization technique for simultaneous estimation and variable selection (L1 penalization is also called basis pursuit)

Theoretical work:

Pro: the L1 approach is able to discover the right sparse representations (Donoho et al. 2002, 2004).

Contra: variable selection with Lasso can be, and can fail to be, consistent (Meinshausen and Bühlmann, 2004).

Page 8:

Asymptotic Properties of Lasso

Setup from Knight and Fu (2000)
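The setup formulas are not in the transcript; following Knight and Fu (2000) and Zou (2006), they are essentially the Lasso estimate with regularization parameter λn and its selected set:

```latex
\hat\beta^{(n)} = \arg\min_{\beta}\
\Big\| y - \sum_{j=1}^{p} x_j \beta_j \Big\|^{2} + \lambda_n \sum_{j=1}^{p} |\beta_j|,
\qquad
\mathcal{A}_n = \{\, j : \hat\beta^{(n)}_j \neq 0 \,\}
```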

Page 9:

Asymptotic Properties of Lasso (Knight & Fu, 2000)

Definition: Lasso variable selection is consistent
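In symbols, the definition displayed here is:

```latex
\lim_{n\to\infty} \Pr\big( \mathcal{A}_n = \mathcal{A} \big) = 1
```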

Page 10:

Asymptotic Properties of Lasso (Knight & Fu, 2000)

Lemma 1 (about estimation consistency)
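The statement (reconstructed from Zou, 2006, Lemma 1; the slide's formula is not in the transcript): if λn/n → λ0 ≥ 0, then

```latex
\hat\beta^{(n)} \xrightarrow{\;p\;} \arg\min_{u} V_1(u),
\qquad
V_1(u) = (u-\beta^{*})^{\top} C\,(u-\beta^{*}) + \lambda_0 \sum_{j=1}^{p} |u_j|
```

In particular, λn = o(n) gives λ0 = 0 and hence β̂(n) →p β*, i.e. estimation consistency.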

Page 11:

Asymptotic Properties of Lasso (Knight & Fu, 2000)

Lemma 2 (about root-n estimation consistency)
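The statement (reconstructed from Zou, 2006, Lemma 2, which follows Knight & Fu): if λn/√n → λ0 ≥ 0, then

```latex
\sqrt{n}\,\big(\hat\beta^{(n)} - \beta^{*}\big) \xrightarrow{\;d\;} \arg\min_{u} V_2(u),

V_2(u) = -2\,u^{\top} W + u^{\top} C\, u
  + \lambda_0 \sum_{j=1}^{p}\Big[ u_j\,\operatorname{sgn}(\beta^{*}_j)\,\mathbf{1}\{\beta^{*}_j \neq 0\}
  + |u_j|\,\mathbf{1}\{\beta^{*}_j = 0\} \Big],
\qquad W \sim N(0, \sigma^{2} C)
```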

Page 12:

Lasso Variable Selection can be Inconsistent

Proposition 1 (root-n estimation consistent Lasso cannot be selection consistent…grrr..)
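The statement (reconstructed from Zou, 2006, Proposition 1): if λn/√n → λ0 ≥ 0, then

```latex
\limsup_{n\to\infty} \Pr\big(\mathcal{A}_n = \mathcal{A}\big) \le c < 1
```

where c is a constant depending on the true model, so a root-n-consistent choice of λn cannot give consistent selection.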

Page 13:

Asymptotic Properties of Lasso

Lemma 3 (about not too slow, not too fast λn)
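The statement (reconstructed from Zou, 2006, Lemma 3): if λn/n → 0 and λn/√n → ∞, then

```latex
\frac{n}{\lambda_n}\big(\hat\beta^{(n)} - \beta^{*}\big) \xrightarrow{\;p\;} \arg\min_{u} V_3(u),

V_3(u) = u^{\top} C\, u
  + \sum_{j=1}^{p}\Big[ u_j\,\operatorname{sgn}(\beta^{*}_j)\,\mathbf{1}\{\beta^{*}_j \neq 0\}
  + |u_j|\,\mathbf{1}\{\beta^{*}_j = 0\} \Big]
```

That is, once λn grows faster than √n, the estimation rate slows to λn/n.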

Page 14:

Summary of Lemmas 1–3
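The summary table is not in the transcript; in short (based on Lemmas 1–3 and Proposition 1 above):

```latex
\lambda_n = o(n) \ \Rightarrow\ \hat\beta^{(n)} \xrightarrow{p} \beta^{*} \quad \text{(estimation consistency)}

\lambda_n = O(\sqrt{n}) \ \Rightarrow\ \text{root-}n \text{ rate, but selection cannot be consistent}

\sqrt{n} \ll \lambda_n \ll n \ \Rightarrow\ \text{rate slows to } \lambda_n/n
```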

Page 15:

Necessary Condition for Selection Consistency

The previous lemmas investigated the estimation consistency. We need conditions for selection consistency, too.

Theorem 1 (Necessary condition for selection consistency)
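The condition itself (reconstructed from Zou, 2006, Theorem 1): if limn P(An = A) = 1, then there exists a sign vector s ∈ {−1, +1}^p0 such that

```latex
\big|\, C_{21}\, C_{11}^{-1}\, s \,\big| \le \mathbf{1} \qquad \text{(element-wise)}
```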

When this condition fails ⇒ Lasso selection is not consistent.

Further results on model selection consistency: Meinshausen & Bühlmann (2004)

Zhao & Yu (2006) + Zhao’s JMLR article published today

Page 16:

Example when Lasso Selection is Not Consistent

Corollary 1 (An example when Lasso selection is not consistent)

Page 17:

Special Cases

Orthogonal design guarantees the consistency of Lasso selection.

Two features (p = 2): the necessary condition is satisfied; ∃ a closed-form Lasso solution; if λn is appropriately chosen ⇒ selection is consistent.

Page 18:

Consistent Selection vs Optimal Prediction

Meinshausen & Bühlmann: in the Lasso there is a conflict between

optimal prediction and consistent variable selection.

The λ that is optimal for prediction gives inconsistent variable selection results: many noise features will be included in the predictive model.

Page 19:

Weighted Lasso

We have shown that Lasso cannot be an oracle procedure. Let us try to fix this.

Definition: Weighted Lasso
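The definition that was displayed (a reconstruction consistent with Zou, 2006):

```latex
\hat\beta = \arg\min_{\beta}\
\Big\| y - \sum_{j=1}^{p} x_j \beta_j \Big\|^{2} + \lambda \sum_{j=1}^{p} w_j\,|\beta_j|,
\qquad w_j \ge 0 \ \text{(known or data-dependent weights)}
```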

Proposition: if the weights w are data-dependent and cleverly chosen ⇒ the weighted Lasso can have the oracle properties.

Page 20:

Adaptive Lasso (AdaLasso)
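The definition itself (Zou, 2006): pick γ > 0 and a root-n-consistent pilot estimate β̂ (e.g. OLS), set the weights ŵj = 1/|β̂j|^γ, and solve

```latex
\hat\beta^{*(n)} = \arg\min_{\beta}\
\Big\| y - \sum_{j=1}^{p} x_j \beta_j \Big\|^{2}
  + \lambda_n \sum_{j=1}^{p} \hat w_j\,|\beta_j|,
\qquad \hat w_j = \frac{1}{|\hat\beta_j|^{\gamma}}
```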

Remark: Adaptive Lasso is still a convex optimization problem, LARS can be used for solving it.

Page 21:

The Oracle Properties of Adaptive Lasso

Theorem 2: With the proper choice of λn, the Adaptive Lasso enjoys the oracle properties.
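Concretely (Zou, 2006, Theorem 2, with A*n = {j : β̂*j(n) ≠ 0}): if λn/√n → 0 and λn·n^((γ−1)/2) → ∞, then

```latex
\Pr\big(\mathcal{A}^{*}_n = \mathcal{A}\big) \to 1,
\qquad
\sqrt{n}\,\big(\hat\beta^{*(n)}_{\mathcal{A}} - \beta^{*}_{\mathcal{A}}\big)
\xrightarrow{\;d\;} N\big(0,\ \sigma^{2}\, C_{11}^{-1}\big)
```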

Page 22:

The Oracle Properties of Adaptive Lasso

Remarks

Page 23:

Thresholding Functions

[Figure: thresholding functions of Hard thresholding, Bridge (L0.5), Lasso, SCAD, Adaptive Lasso (γ = 0.5), and Adaptive Lasso (γ = 2).]

Page 24:

γ=1 and the Nonnegative Garotte Problem

The Nonnegative Garotte Problem (NGP).
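The problem (Breiman's nonnegative garotte, as recalled in Zou, 2006): shrink the OLS estimate componentwise with nonnegative factors cj,

```latex
\hat c = \arg\min_{c}\
\Big\| y - \sum_{j=1}^{p} c_j\,\hat\beta_j(\mathrm{OLS})\,x_j \Big\|^{2}
  + \lambda_n \sum_{j=1}^{p} c_j
\quad \text{s.t. } c_j \ge 0,
\qquad
\hat\beta_j(\text{garotte}) = \hat c_j\,\hat\beta_j(\mathrm{OLS})
```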

NGP can be used for sparse modeling.

Corollary 2: The NGP is consistent for variable selection with proper λn.

Page 25:

The Nonnegative Garotte Problem, Remarks

Yuan & Lin also proved the variable selection consistency of NGP

NGP can be considered as an Adaptive Lasso with γ=1, and additional sign constraints.

Proof (sketch): substituting βj = cj β̂j(OLS) turns the garotte criterion into a Lasso with penalty λn Σj |βj| / |β̂j(OLS)|, i.e. adaptive-lasso weights with γ = 1, plus the sign constraints sgn(βj) = sgn(β̂j(OLS)) implied by cj ≥ 0.

Page 26:

Computations

The Adaptive Lasso problem is a convex problem and can be solved e.g. by LARS.

Algorithm:
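The steps listed on the slide are not in the transcript; the reweighting recipe from the paper (rescale the columns, run an ordinary Lasso/LARS, rescale back) looks roughly like the sketch below. It assumes numpy and scikit-learn; the function name adaptive_lasso, the default parameters, and the choice of LassoLars are illustrative, not from the slide.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LassoLars

def adaptive_lasso(X, y, gamma=1.0, lam=0.1):
    """Sketch of the Adaptive Lasso reweighting trick (cf. Zou, 2006, Sec. 3.5)."""
    # Step 0: pilot estimate (OLS here; ridge is an alternative under collinearity).
    pilot = LinearRegression().fit(X, y).coef_
    w = 1.0 / (np.abs(pilot) ** gamma + 1e-12)   # adaptive weights w_j = 1/|beta_j|^gamma
    # Step 1: rescale every column, x_j** = x_j / w_j.
    X_star = X / w
    # Step 2: solve an ordinary Lasso on the rescaled design (LARS-based solver).
    #         (scikit-learn's `alpha` is lambda up to a 1/(2n) scaling.)
    fit = LassoLars(alpha=lam).fit(X_star, y)
    # Step 3: transform the coefficients back to the original scale.
    return fit.coef_ / w
```

Cross-validating lam (and gamma), as described on the next slide, would replace the fixed values used here.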

Page 27:

Adaptive Lasso, Tuning

Tuning: we want to find the optimal (γ, λn) for a given n ⇒ two-dimensional cross-validation. (For a given γ, LARS can compute the entire λn path.)

In principle we can replace β(OLS) with any other consistent estimator (e.g. β(Ridge)) ⇒ three-dimensional cross-validation over (β, γ, λn)…

In case of collinearity β(Ridge) can be more stable than β(OLS).

Page 28:

Other Oracle methods

Goal of an oracle procedure: simultaneously achieving

consistent variable selection and optimal estimation/prediction.

Other oracle methods: Bach's Bolasso; Fan & Li's SCAD; Bridge regression (q < 1).

The oracle properties usually do not hold for the Lasso.

Page 29:

Numerical Experiments

We compare Adaptive Lasso, Lasso, SCAD, NGP

Adaptive Lasso: OLS for the adaptive weights + LARS.
Lasso: LARS.
SCAD: local quadratic approximation (LQA) algorithm of Fan & Li (2001).
NGP: quadratic programming.

Page 30:

Model 0, Inconsistent Lasso

As n increases and σ decreases, the variable selection problem is expected to become easier.

Measured: the empirical probability that the Lasso solution path contains the true model (over 100 repetitions).
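A rough Monte-Carlo check of this kind can be sketched as below. The design (x3 correlated with x1 and x2 so that the necessary condition of Theorem 1 fails), the coefficients, and the (n, σ) values are illustrative, not necessarily Zou's exact Model 0; the sketch assumes numpy and scikit-learn.

```python
import numpy as np
from sklearn.linear_model import lasso_path

rng = np.random.default_rng(0)

def path_contains_true_model(n, sigma, reps=100):
    """Fraction of repetitions in which some point of the Lasso path selects
    exactly the true support {x1, x2}."""
    true_support = np.array([True, True, False])
    hits = 0
    for _ in range(reps):
        x1, x2 = rng.normal(size=(2, n))
        # Correlated noise feature: C21 * C11^{-1} * s = 4/3 > 1, so Theorem 1's condition fails.
        x3 = (2 / 3) * x1 + (2 / 3) * x2 + (1 / 3) * rng.normal(size=n)
        X = np.column_stack([x1, x2, x3])
        y = 2 * x1 + 3 * x2 + sigma * rng.normal(size=n)
        _, coefs, _ = lasso_path(X, y)            # coefs has shape (3, n_alphas)
        hits += any(np.array_equal(c != 0, true_support) for c in coefs.T)
    return hits / reps

for n, sigma in [(60, 9), (120, 5), (300, 3)]:
    print(n, sigma, path_contains_true_model(n, sigma))
```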

Page 31:

Numerical Experiments, Relative Prediction Error

Model 1

Model 2

Page 32:

Relative Prediction Error, Observations

Lasso performs best when SNR is low (σ is high).

Oracle methods are more accurate when SNR is high (σ is small).

Adaptive Lasso seems to be able to combine the strength of Lasso and SCAD:

Medium SNR ⇒ Adaptive Lasso outperforms SCAD and Garotte.

High SNR ⇒ Adaptive Lasso outperforms Lasso.

Adaptive Lasso is more stable than SCAD.

None of the four methods can universally dominate the three others.

Page 33:

Performance in Variable Selection

All of them can identify the true significant variables. Lasso tends to select two noise variables into the final model.

Page 34:

Extension to Exponential Family and GLM

We can generalize AdaLasso from linear regression with Gaussian noise to Generalized Linear Models (GLM).

GLM is related to the Exponential Family:

Instead of the generative model with additive noise we had before…
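…the GLM data are assumed to come from a canonical exponential-family density with natural parameter θ = xᵀβ* (reconstruction of the slide's formula, following Zou, 2006, Sec. 4):

```latex
f(y \mid x, \theta) = h(y)\,\exp\big( y\,\theta - \phi(\theta) \big),
\qquad \theta = x^{\top}\beta^{*}
```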

Page 35:

Extension to GLM
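The slide's formulas are missing from the transcript; the adaptive-lasso criterion for a GLM is the penalized negative log-likelihood with MLE-based weights, roughly:

```latex
\hat\beta^{*(n)} = \arg\min_{\beta}\
\sum_{i=1}^{n}\Big( -y_i\,x_i^{\top}\beta + \phi\big(x_i^{\top}\beta\big) \Big)
  + \lambda_n \sum_{j=1}^{p} \hat w_j\,|\beta_j|,
\qquad \hat w_j = \frac{1}{|\hat\beta_j(\mathrm{MLE})|^{\gamma}}
```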

Page 36:

The Fisher Information matrix
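Presumably the slide showed the Fisher information of the GLM at β* and its partition according to the true active set; in the paper's notation this is roughly:

```latex
I(\beta^{*}) = \mathbb{E}\big[\, \phi''\!\big(x^{\top}\beta^{*}\big)\, x\, x^{\top} \,\big]
             = \begin{pmatrix} I_{11} & I_{12}\\ I_{21} & I_{22} \end{pmatrix},
\qquad I_{11} \ \text{corresponding to } \mathcal{A}
```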

Page 37:

The Oracle Properties of AdaLasso on GLM

Theorem 4: Under some mild regularity conditions, the AdaLasso estimates enjoy the oracle properties if λn is chosen appropriately.
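Spelled out (a reconstruction; the conditions mirror Theorem 2): if λn/√n → 0 and λn·n^((γ−1)/2) → ∞, then

```latex
\Pr\big(\mathcal{A}^{*}_n = \mathcal{A}\big) \to 1,
\qquad
\sqrt{n}\,\big(\hat\beta^{*(n)}_{\mathcal{A}} - \beta^{*}_{\mathcal{A}}\big)
\xrightarrow{\;d\;} N\big(0,\ I_{11}^{-1}\big)
```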

Page 38:

GLM results, Logistic regression

[Table: logistic regression results for AdaLasso, SCAD, and Lasso.]

Page 39:

Not Discussed Parts from the Article

Standard error formula: definitions in Tibshirani (1996).

Near-minimax optimality: definitions in Donoho and Johnstone (1994).

Proofs: a bit sketchy; they heavily use results that can be found in other articles.

Page 40:

Conclusions

Proposed the Adaptive Lasso for simultaneous estimation and variable selection.

Lasso variable selection can be inconsistent; AdaLasso enjoys the oracle properties using a weighted L1 penalty.

AdaLasso is still a convex problem; LARS can be used to solve it.

The oracle properties do NOT automatically result in optimal prediction: Lasso can be advantageous in difficult, noisy problems.