Lasso Regression: Some Recent Developments
David Madigan, Suhrid Balakrishnan
Rutgers University
stat.rutgers.edu/~madigan
Slides: stat.rutgers.edu/iob/bioconf07/slides/Madigan.pdf
Logistic Regression
• Linear model for log odds of category membership:
$$\log \frac{p(y=1 \mid x_i)}{p(y=-1 \mid x_i)} = \sum_j \beta_j x_{ij} = \beta x_i$$
Maximum Likelihood Training
• Choose parameters (βj's) that maximize probability (likelihood) of class labels (yi's) given documents (xi's)
• Tends to overfit
• Not defined if d > n
• Feature selection
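As a rough illustration of the training criterion, the sketch below evaluates the negative log-likelihood for labels in {-1, +1}, using p(y | x) = 1 / (1 + exp(-y βx)), which follows from the log-odds model. The function name and the list-of-lists data layout are assumptions made for this example, not part of the slides.

```python
import math

def neg_log_likelihood(beta, X, y):
    """Negative log-likelihood of labels y_i in {-1, +1} under the logistic model.

    beta: list of coefficients; X: list of feature rows; y: list of +/-1 labels.
    (Names and data layout are illustrative assumptions.)
    """
    total = 0.0
    for x_i, y_i in zip(X, y):
        margin = y_i * sum(b * x for b, x in zip(beta, x_i))
        # log(1 + exp(-margin)), written to avoid overflow for large |margin|
        if margin > 0:
            total += math.log1p(math.exp(-margin))
        else:
            total += -margin + math.log1p(math.exp(margin))
    return total
```

On perfectly separable data the value keeps shrinking as the coefficients are scaled up, so no finite maximizer exists; this is one concrete way the unpenalized fit can overfit.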
Shrinkage Methods
• Shrinkage methods allow a variable to be partly included in the model. That is, the variable is included but with a shrunken coefficient
• Avoids combinatorial challenge of feature selection
• L1 shrinkage/regularization + feature selection
• Expanding theoretical understanding
• Empirical performance
Ridge Logistic Regression
Maximum likelihood plus a constraint:
$$\sum_{j=1}^{p} \beta_j^2 \le s$$
Lasso Logistic Regression
Maximum likelihood plus a constraint:
$$\sum_{j=1}^{p} |\beta_j| \le s$$
[Figure: plot indexed by s]
[Figure: plot indexed by 1/s]
Bayesian Perspective
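The figure for this slide is not reproduced. One standard way to state the Bayesian reading, given here as an assumption rather than as the slide's content, is that the penalized estimate is a posterior mode (MAP) under independent priors on the coefficients:

$$\hat{\beta}_{\mathrm{MAP}} = \arg\max_{\beta} \Big[ \sum_i \log p(y_i \mid x_i, \beta) + \sum_{j=1}^{p} \log p(\beta_j) \Big]$$

With a Laplace prior $p(\beta_j) = \tfrac{\lambda}{2} e^{-\lambda |\beta_j|}$ the second term becomes $-\lambda \sum_j |\beta_j|$ plus a constant, which is the lasso penalty. With a Gaussian prior $p(\beta_j) \propto e^{-\beta_j^2 / (2\tau^2)}$ it becomes $-\tfrac{1}{2\tau^2} \sum_j \beta_j^2$, which is the ridge penalty. Here $\lambda$ and $\tau$ are prior hyperparameters introduced for this note.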
Implementation
• Open source C++ implementation. Compiled versions for Linux, Windows, and Mac (soon)
• Binary and multiclass, hierarchical, informative priors
• Gauss-Seidel coordinate descent algorithm
• Fast? (parallel?)
• http://stat.rutgers.edu/~madigan/BBR
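To make the coordinate descent idea concrete, here is a minimal pure-Python sketch for L1-penalized logistic regression. It is not the BBR code: it swaps in a simple majorization step, bounding each coordinate's curvature by sum_i x_ij^2 / 4 and then applying soft-thresholding. The function names, the penalty weight `lam`, and the fixed number of sweeps are all assumptions made for illustration. "Gauss-Seidel" here means each coordinate update immediately uses the latest values of the other coordinates.

```python
import math

def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def inv_one_plus_exp(m):
    """Compute 1 / (1 + exp(m)) without overflow."""
    if m >= 0:
        e = math.exp(-m)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(m))

def lasso_logistic_cd(X, y, lam, sweeps=200):
    """Cyclic coordinate descent on sum_i log(1+exp(-y_i b.x_i)) + lam*sum_j |b_j|."""
    n, d = len(X), len(X[0])
    beta = [0.0] * d
    margins = [0.0] * n  # margins[i] = y_i * (beta . x_i), kept up to date
    # Upper bound on the second derivative of the loss along each coordinate.
    curv = [sum(X[i][j] ** 2 for i in range(n)) / 4.0 for j in range(d)]
    for _ in range(sweeps):
        for j in range(d):
            if curv[j] == 0.0:
                continue
            # Partial derivative of the logistic loss with respect to beta_j.
            g = sum(-y[i] * X[i][j] * inv_one_plus_exp(margins[i]) for i in range(n))
            new = soft_threshold(beta[j] - g / curv[j], lam / curv[j])
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    margins[i] += y[i] * X[i][j] * delta
                beta[j] = new
    return beta
```

Because the curvature bound never underestimates the true curvature, each coordinate step cannot increase the penalized objective. A production solver would add a convergence check and sparse data structures.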
Aleks Jakulin's results
1-of-K Sample Results: brittany-l

| Feature Set | % errors | Number of Features |
|---|---|---|
| “Argamon” function words, raw tf | 74.8 | 380 |
| POS | 75.1 | 44 |
| 1suff | 64.2 | 121 |
| 1suff*POS | 50.9 | 554 |
| 2suff | 40.6 | 1849 |
| 2suff*POS | 34.9 | 3655 |
| 3suff | 28.7 | 8676 |
| 3suff*POS | 27.9 | 12976 |
| 3suff+POS+3suff*POS+Argamon | 27.6 | 22057 |
| All words | 23.9 | 52492 |

89 authors with at least 50 postings. 10,076 training documents, 3,322 test documents.
BMR-Laplace classification, default hyperparameter
4.6 million parameters
Madigan et al. (2005)
Risk Severity Score for Trauma
• Standard “ICISS” score poorly calibrated
• Lasso logistic regression with 2.5M predictors:
Burd and Madigan (2006)
Monitoring Spontaneous Drug Safety Reports
• Focus on 2X2 contingency table projections
– 15,000 drugs * 16,000 AEs = 240 million tables
– Shrinkage methods better than e.g. chi square tests
– “Innocent bystander”
– Regression makes more sense
– Regress each AE on all drugs
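A small sketch of what a single 2X2 projection looks like may help here. It assumes each report is a pair (set of drugs, set of adverse events); that representation and the function name are hypothetical, chosen only for this example.

```python
def two_by_two(reports, drug, event):
    """Count the 2x2 table for one (drug, adverse event) pair.

    reports: iterable of (set_of_drugs, set_of_events) pairs (assumed layout).
    Returns (a, b, c, d):
      a = drug and event, b = drug without event,
      c = event without drug, d = neither.
    """
    a = b = c = d = 0
    for drugs, events in reports:
        has_drug = drug in drugs
        has_event = event in events
        if has_drug and has_event:
            a += 1
        elif has_drug:
            b += 1
        elif has_event:
            c += 1
        else:
            d += 1
    return a, b, c, d
```

Each table ignores every other drug on the report, so a drug that is merely co-prescribed with the true cause can still show a large `a` count. Regressing each AE on all drugs at once is one way to adjust for that.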
“Consistency”
• Lasso not always consistent for variable selection
• SCAD (Fan and Li, 2001, JASA) consistent but non-convex
• Relaxed lasso (Meinshausen and Bühlmann), adaptive lasso (Wang et al.) have certain consistency results
• Zhao and Yu (2006) “irrepresentable condition”
Fused Lasso
• If there are many correlated features, lasso gives non-zero weight to only one of them
• Maybe correlated features (e.g. time-ordered) should have similar coefficients?
Tibshirani et al. (2005)
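For reference, the fused lasso of Tibshirani et al. (2005) is usually written with two constraints, one on the coefficients and one on successive differences of the ordered coefficients. The bounds $s_1$ and $s_2$ are notation introduced for this note:

$$\sum_{j=1}^{p} |\beta_j| \le s_1, \qquad \sum_{j=2}^{p} |\beta_j - \beta_{j-1}| \le s_2$$

The second constraint encourages neighbouring coefficients to be equal, which yields piecewise-constant coefficient profiles for time-ordered features.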
Group Lasso
• Suppose you represent a categorical predictor with indicator variables
• Might want the set of indicators to be in or out
Yuan and Lin (2006)
regular lasso:
group lasso:
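The slide's formulas did not survive extraction. A standard statement of the two constraints, written here under the assumption that the slide follows Yuan and Lin (2006), with groups indexed by $g = 1, \dots, G$ and $\beta_g$ denoting the coefficient sub-vector of group $g$, is:

$$\text{regular lasso:} \quad \sum_{j=1}^{p} |\beta_j| \le s$$

$$\text{group lasso:} \quad \sum_{g=1}^{G} \|\beta_g\|_{K_g} \le s, \qquad \|\beta_g\|_{K_g} = \left( \beta_g^{\top} K_g \beta_g \right)^{1/2}$$

Yuan and Lin take $K_g = p_g I$, where $p_g$ is the size of group $g$. Because the norm is not squared, a whole group is either set to zero or kept together.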
Anthrax Vaccine Study in Macaques
• Vaccinate macaques with varying doses; subsequently “challenge” with anthrax spores
• Are measurable aspects of the state of the immune system predictive of survival?
• Problem: hundreds of different assay timepoints but fewer than one hundred macaques
[Figure: Immunoglobulin G (antibody)]
[Figure: ED50 (toxin-neutralizing antibody)]
[Figure: IFNeli (interferon - proteins produced by the immune system)]
L1 Logistic Regression
- imputation
- common weeks only (0, 4, 8, 26, 30, 38, 42, 46, 50)
- no interactions
bbrtrain -p 1 -s --autosearch --accurate commonBBR.txt commonBBR.mod
IGG_38 -0.16 (0.17)
ED50_30 -0.11 (0.14)
SI_8 -0.09 (0.30)
IFNeli_8 -0.07 (0.24)
ED50_38 -0.03 (0.35)
ED50_42 -0.03 (0.36)
IFNeli_26 -0.02 (0.26)
IL4/IFNeli_0 +0.04 (0.36)
Functional Decision Trees
Balakrishnan and Madigan (2006)
Group Lasso, Non-Identity
• Multivariate power exponential prior
• KKT conditions lead to an efficient and straightforward block coordinate descent algorithm, similar to Tseng and Yun (2006).
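The non-identity algorithm itself is not given in the slides. As a point of comparison, the sketch below shows the block update in the simpler identity-K case with an orthonormal within-group design, where the KKT conditions give a closed form: the minimizer of 0.5 * ||z - b||^2 + lam * ||b||_2 is a scaled copy of z, or zero. The function name and arguments are illustrative assumptions.

```python
import math

def group_soft_threshold(z, lam):
    """Minimize 0.5 * ||z - b||^2 + lam * ||b||_2 over b (identity-K special case).

    z: list of floats, the unpenalized block solution; lam: penalty weight.
    """
    norm = math.sqrt(sum(v * v for v in z))
    if norm <= lam:
        # The whole group is removed from the model.
        return [0.0] * len(z)
    scale = 1.0 - lam / norm
    return [scale * v for v in z]
```

For example, `group_soft_threshold([3.0, 4.0], 2.5)` returns `[1.5, 2.0]`, while a penalty of 5.0 or more zeroes the whole block. A block coordinate descent scheme cycles this kind of update over the groups.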
[Figure: “soft fusion”]
LAPS: Lasso with Attribute Partition Search
• Group lasso
• Non-diagonal K to incorporate, e.g., serial dependence
• For macaque example, within group have:
(block diagonal K)
• Search for partitions that maximize a model score/average over partitions
[Figure labels: β1 β2 βd]
LAPS: Lasso with Attribute Partition Search
• Currently use a BIC-like score and/or test accuracy
• Hill-climbing vs. MCMC/BMA
• Uniform prior on partition space
• Consonni & Veronese (1995)
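The hill-climbing option can be sketched generically. The code below is not the LAPS implementation: it assumes a user-supplied `score` function on partitions (standing in for the BIC-like score) and a "move one attribute to another block or to a new block" neighbourhood, both of which are choices made for this illustration.

```python
def canonical(blocks):
    """Order-independent representation of a partition; drops empty blocks."""
    return tuple(sorted(tuple(sorted(b)) for b in blocks if b))

def neighbors(partition):
    """Yield partitions reachable by moving one attribute to another or a new block."""
    blocks = [list(b) for b in partition]
    for bi, block in enumerate(blocks):
        for item in block:
            # Partition with `item` removed from its current block.
            rest = [[x for x in b if x != item] if k == bi else list(b)
                    for k, b in enumerate(blocks)]
            for k in range(len(rest)):
                if k != bi:
                    cand = [list(b) for b in rest]
                    cand[k].append(item)
                    yield canonical(cand)
            if len(block) > 1:
                # Split `item` off into a block of its own.
                yield canonical(rest + [[item]])

def hill_climb(start, score):
    """First-improvement hill climbing; stops at a local maximum of `score`."""
    current = canonical(start)
    best = score(current)
    improved = True
    while improved:
        improved = False
        for cand in neighbors(current):
            s = score(cand)
            if s > best:
                current, best, improved = cand, s, True
                break
    return current, best
```

The search always terminates because the score strictly increases and the partition space is finite, but it can stop at a local maximum. That limitation is one reason to compare it with MCMC or model averaging over partitions.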
Future Work
• Rigorous derivation of BIC and df
• Prior on partitions
• Better search strategies for partition space
• Out of sample predictive accuracy
• LAPS C++ implementation
Final Comments
• Predictive modeling with 10^5 to 10^7 predictor variables is feasible and sometimes useful
• Google builds ad placement models with 10^8 predictor variables
• Parallel computation
Backup Slides
Group Lasso with Soft Fusion
[Figure panel labels: IgG ED50 SI IL6mIL4IFNm IL4eli]
LAPS: Bell-Cylinder example
LAPS Simulation Study
X ~ N(0,1)^15 (iid, uncorrelated attributes)
Beta = one of three conditions (corresponding to Sim1, Sim2 and Sim3)
Small (or SM) => small sample = 50 observations
Large (or LG) => large sample = 500 observations
True betas (used to simulate data), adjusted so that Bayes error (on a large dataset) ~= 0.20

| SIM1 (favors BBR) | SIM2 (fv GR. Lasso, kij=0) | SIM3 (fv Fused Gr Lasso, kij->1) |
|---|---|---|
| 1.1500 | 0 | 0 |
| 0 | -1.1609 | -0.9540 |
| 0.5750 | 0.5804 | -0.9540 |
| -0.2875 | -0.8706 | -0.9540 |
| 0 | 0.5804 | -0.9540 |
| 0 | 0 | 0 |
| -0.2875 | 0 | 0 |
| 0.5750 | 0 | 0 |
| 0 | -0.5804 | -0.4770 |
| 1.1500 | 0.2902 | -0.4770 |
| 0 | -1.1609 | -0.4770 |
| -1.1500 | 0 | 0 |
| 0 | 0 | 0 |
| 0 | 0.8706 | 0.7155 |
| -0.8625 | -0.2902 | 0.7155 |
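A minimal sketch of how data for one condition might be generated is shown below. The slides do not state the response model, so the code assumes a logistic model with no intercept and labels in {-1, +1}, matching the logistic regression setup used earlier. The function name, the seed handling, and the constant `SIM3_BETA` (copied from the SIM3 column) are illustrative.

```python
import math
import random

# SIM3 coefficients as listed in the simulation table (15 attributes).
SIM3_BETA = [0, -0.9540, -0.9540, -0.9540, -0.9540, 0, 0, 0,
             -0.4770, -0.4770, -0.4770, 0, 0, 0.7155, 0.7155]

def simulate(beta, n, seed=0):
    """Draw n rows of iid N(0,1) attributes and +/-1 labels from a logistic model.

    Assumption: P(y = 1 | x) = 1 / (1 + exp(-beta . x)), no intercept.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in beta]
        eta = sum(b * v for b, v in zip(beta, x))
        p = 1.0 / (1.0 + math.exp(-eta))
        X.append(x)
        y.append(1 if rng.random() < p else -1)
    return X, y
```

Calling `simulate(SIM3_BETA, 50)` would correspond to the small-sample condition and `simulate(SIM3_BETA, 500)` to the large-sample one.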
Priors (per D.M. Titterington)
Genkin et al. (2004)
ModApte: Bayesian Perspective Can Help (training: 100 random samples)

| Prior | Macro F1 | ROC |
|---|---|---|
| Laplace | 37.2 | 76.2 |
| Laplace & DK-based variance | 65.3 | 87.1 |
| Laplace & DK-based mode | 72.0 | 93.5 |

Dayanik et al. (2006)