lightning: large-scale machine learning in Python
LIGHTNING, A LIBRARY FOR LARGE-SCALE MACHINE LEARNING IN PYTHON
Fabian Pedregosa (1), Mathieu Blondel (2)
(1) Chaire Havas-Dauphine / INRIA, Paris France
(2) NTT Communication Science Laboratories, Kyoto Japan
SCIKIT-LEARN: WITH GREAT CODE COMES GREAT RESPONSIBILITY
[Figure: number of lines of code in scikit-learn]
As a result, scikit-learn is very selective about adding new algorithms and models.
LIGHTNING
Incorporates recent progress in large-scale optimization.
- scikit-learn compatible.
- Scalable to large datasets.
- Supports dense and sparse input.
- Emphasis on structured sparsity penalties.
- Dependencies: Python + Cython + scikit-learn.
SCIKIT-LEARN COMPATIBLE
lightning estimators can be mixed with scikit-learn tools such as Pipeline, GridSearchCV, etc.
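Meta-estimators like Pipeline and GridSearchCV talk to an estimator only through a small, fixed interface: constructor keyword arguments exposed via `get_params`/`set_params`, plus `fit` and `predict`/`score`. The sketch below illustrates that contract in plain NumPy. `RidgeSketch` and `grid_search` are hypothetical names for illustration, not part of lightning or scikit-learn; a real estimator would normally subclass `sklearn.base.BaseEstimator`, which provides `get_params`/`set_params` automatically.

```python
import numpy as np


class RidgeSketch:
    """Hypothetical estimator following scikit-learn's conventions."""

    def __init__(self, alpha=1.0):
        # The constructor only stores hyperparameters; no work happens here.
        self.alpha = alpha

    def get_params(self, deep=True):
        return {"alpha": self.alpha}

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self, X, y):
        # Closed-form ridge solution (X^T X + alpha * I)^{-1} X^T y.
        p = X.shape[1]
        self.coef_ = np.linalg.solve(X.T @ X + self.alpha * np.eye(p), X.T @ y)
        return self  # returning self allows est.fit(X, y).predict(X)

    def predict(self, X):
        return X @ self.coef_

    def score(self, X, y):
        # Negative mean squared error, so that higher is better.
        return -np.mean((self.predict(X) - y) ** 2)


def grid_search(estimator, param_grid, X_train, y_train, X_val, y_val):
    """Toy stand-in for GridSearchCV that relies only on the estimator API."""
    best_score, best_params = -np.inf, None
    for params in param_grid:
        # Fresh unfitted copy built from get_params(), as sklearn.base.clone does.
        candidate = type(estimator)(**estimator.get_params()).set_params(**params)
        score = candidate.fit(X_train, y_train).score(X_val, y_val)
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score
```

Because `grid_search` never looks inside the estimator, any class honoring the same methods could be dropped in; that is the sense in which scikit-learn-compatible estimators compose with existing tools.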
FROM LARGE DATA TO LARGE OPTIMIZATION
Big data comes in different flavors.
Data matrix $X \in \mathbb{R}^{n \times p}$: $n$ samples (rows), $p$ features (columns).
- Large sample ($n$ large): computer vision, advertising, etc.
- Large dimension ($p$ large): biology, neuroscience, etc.
LEARNING FROM LARGE SAMPLES
Usual methods (gradient descent, BFGS, etc.):
- Pass through the entire dataset at each iteration.
- Prohibitive for large datasets.
Back to simple methods:
Stochastic gradient descent (Robbins and Monro, 1951).
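To make the contrast concrete: one full-gradient step on a least-squares objective reads all $n$ rows, $O(np)$, whereas one SGD update reads a single row, $O(p)$. Below is a minimal NumPy sketch of plain SGD for least squares, assuming the step size $\eta_t = \eta_0 / (1 + \eta_0 t / n)$, which decays like $1/t$ and therefore satisfies the Robbins-Monro conditions $\sum_t \eta_t = \infty$, $\sum_t \eta_t^2 < \infty$. The function name and defaults are hypothetical; this is not lightning's implementation.

```python
import numpy as np


def sgd_least_squares(X, y, n_epochs=20, eta0=0.1, seed=0):
    """Plain SGD for min_w (1/n) * sum_i 0.5 * (x_i^T w - y_i)^2."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    w = np.zeros(p)
    t = 0
    for _ in range(n_epochs):
        for i in rng.permutation(n):
            # The gradient of the i-th term only touches row i: O(p) per update,
            # versus O(n * p) for one full-gradient step.
            grad_i = (X[i] @ w - y[i]) * X[i]
            # Decreasing step size in the spirit of Robbins and Monro.
            eta = eta0 / (1.0 + eta0 * t / n)
            w -= eta * grad_i
            t += 1
    return w
```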
LEARNING FROM LARGE SAMPLES
[Figure: lightning example, n = 100,000]
In the last 5 years, a flurry of new stochastic methods:
- Stochastic Variance-Reduced Gradient (SVRG)
- Stochastic Dual Coordinate Ascent (SDCA)
- Stochastic Average Gradient (SAG/SAGA)
They are all in lightning!
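As an illustration of what these methods add over SGD, here is a minimal NumPy sketch of the SAGA update for L1-regularized least squares. It keeps a table of the last gradient seen for each sample and uses it to build a gradient estimate that stays unbiased while its variance shrinks near the optimum, which is what allows a constant step size. The function name, the conservative step size $1/(3L)$ and the defaults are assumptions for this sketch; it is not lightning's Cython implementation.

```python
import numpy as np


def saga_l1_least_squares(X, y, alpha=0.0, n_epochs=100, seed=0):
    """SAGA for min_w (1/n) * sum_i 0.5 * (x_i^T w - y_i)^2 + alpha * ||w||_1."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    w = np.zeros(p)
    # For a linear model, grad f_i(w) = (x_i^T w - y_i) * x_i, so the gradient
    # table only needs one scalar per sample instead of a p-dimensional vector.
    memory = np.zeros(n)
    grad_avg = np.zeros(p)  # running mean of the stored gradients
    L = np.max(np.sum(X ** 2, axis=1))  # smoothness constant of the worst f_i
    step = 1.0 / (3.0 * L)  # conservative constant step size
    for _ in range(n_epochs * n):
        j = rng.integers(n)
        r_new = X[j] @ w - y[j]
        diff = (r_new - memory[j]) * X[j]
        # Variance-reduced gradient estimate (uses the table before updating it).
        g = diff + grad_avg
        # Refresh the table entry and keep its mean consistent.
        grad_avg += diff / n
        memory[j] = r_new
        # Proximal step: soft-thresholding handles the non-smooth L1 term.
        z = w - step * g
        w = np.sign(z) * np.maximum(np.abs(z) - step * alpha, 0.0)
    return w
```

Storing one residual per sample rather than a full gradient vector keeps the memory cost at $O(n)$ for linear models.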
LEARNING FROM MANY FEATURES
- Iterate through the columns (features).
- Coordinate descent-like algorithms.
- Very efficient for sparse models.
[Figure: multiclass classification with a group-lasso penalty (Blondel et al. 2013)]
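A minimal NumPy sketch of column-wise coordinate descent, assuming the Lasso objective $\frac{1}{2n}\lVert y - Xw\rVert^2 + \alpha\lVert w\rVert_1$ (the scaling scikit-learn's `Lasso` uses). Each coordinate update has a closed form (soft-thresholding), and keeping the residual up to date means one update reads a single column, $O(n)$. The function names are hypothetical; this is not lightning's `CDClassifier`, which also supports other losses and the group-lasso penalty shown above.

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal operator of t * |.|, applied elementwise."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def cd_lasso(X, y, alpha, n_iter=200):
    """Cyclic coordinate descent for (1/(2n)) ||y - Xw||^2 + alpha ||w||_1."""
    n, p = X.shape
    w = np.zeros(p)
    residual = np.array(y, dtype=float)  # r = y - X w, kept up to date
    col_sq = np.sum(X ** 2, axis=0) / n  # (1/n) ||X[:, j]||^2 for every column
    for _ in range(n_iter):
        for j in range(p):  # one pass = one sweep over the columns
            if col_sq[j] == 0.0:
                continue  # an all-zero column never enters the model
            # Correlation of column j with the residual that excludes w_j.
            rho = X[:, j] @ residual / n + col_sq[j] * w[j]
            w_j = soft_threshold(rho, alpha) / col_sq[j]
            # Update the residual in O(n) instead of recomputing X @ w.
            residual -= X[:, j] * (w_j - w[j])
            w[j] = w_j
    return w
```

A coordinate that stays at zero costs one dot product per pass and leaves the residual untouched, which is one reason this approach suits sparse models.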
STRUCTURED SPARSITY
There's so much more than the Lasso:
- Group-sparsity penalty.
- Total variation.
- Trace norm (low rank).
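Proximal methods such as FISTA and SAGA handle these penalties through their proximal operators. Two of them have short closed forms: block soft-thresholding for the group-lasso penalty $t\sum_g \lVert w_g\rVert_2$, and soft-thresholding of the singular values for the trace norm $t\lVert W\rVert_*$. Total variation requires a dedicated algorithm, so it is left out of this sketch. The function names are hypothetical, and the groups are assumed to be non-overlapping.

```python
import numpy as np


def prox_group_lasso(w, groups, t):
    """Prox of t * sum_g ||w_g||_2 for non-overlapping index groups."""
    out = np.array(w, dtype=float)  # indices outside every group pass through
    for g in groups:
        norm = np.linalg.norm(w[g])
        # Shrink the whole block toward zero; drop it entirely if its norm <= t.
        scale = max(0.0, 1.0 - t / norm) if norm > 0 else 0.0
        out[g] = scale * w[g]
    return out


def prox_trace_norm(W, t):
    """Prox of t * ||W||_*: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s_shrunk = np.maximum(s - t, 0.0)  # small singular values become exactly 0
    return (U * s_shrunk) @ Vt  # scales column k of U by s_shrunk[k]
```

Just as soft-thresholding zeroes individual coefficients, these operators zero whole groups or whole singular directions, which is where the structured sparsity (or low rank) comes from.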
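API
Similarities and differences with scikit-learn:
scikit-learn: LogisticRegression(penalty='l1', solver='liblinear')
- LogisticRegression sets the loss function (the model).
- solver='liblinear' sets the algorithm.
lightning: CDClassifier(penalty='l1', loss='log')
- CDClassifier sets the algorithm (coordinate descent).
- loss='log' sets the loss function.
The lightning API is organized around algorithms, not models.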
EXTENSIBILITY
- Typical losses and penalties are available.
- Custom loss or penalty functions can also be passed in:
clf = FistaClassifier(loss=my_loss, penalty=my_penalty)
(available for the Fista* and SAGA* estimators)
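The slide does not show what `my_loss` and `my_penalty` look like, so the following is only a sketch of the general idea: FISTA (Beck and Teboulle, 2009) needs just the gradient of the loss and the proximal operator of the penalty, so both can be supplied as plain callables. `fista`, `least_squares_loss` and `l1_penalty` are hypothetical names, and this is not lightning's interface for custom losses and penalties.

```python
import numpy as np


def fista(grad_f, prox_g, w0, step, n_iter=500):
    """FISTA for min_w f(w) + g(w), with f smooth and g prox-friendly.

    grad_f(w) returns the gradient of the (custom) loss; prox_g(v, step)
    returns argmin_w g(w) + ||w - v||^2 / (2 * step) for the (custom) penalty.
    """
    w_prev = w0.copy()
    z = w0.copy()
    t = 1.0
    for _ in range(n_iter):
        # Proximal gradient step taken at the extrapolated point z.
        w = prox_g(z - step * grad_f(z), step)
        # Nesterov-style momentum update.
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = w + ((t - 1.0) / t_next) * (w - w_prev)
        w_prev, t = w, t_next
    return w_prev


def least_squares_loss(X, y):
    """Hypothetical custom loss f(w) = ||Xw - y||^2 / (2n); returns its gradient."""
    n = X.shape[0]
    return lambda w: X.T @ (X @ w - y) / n


def l1_penalty(alpha):
    """Hypothetical custom penalty g(w) = alpha * ||w||_1; returns its prox."""
    return lambda v, step: np.sign(v) * np.maximum(np.abs(v) - step * alpha, 0.0)
```

Swapping in another loss or penalty only means writing a different pair of callables; the solver itself is unchanged. A safe step size for the least-squares loss is $1/L$ with $L = \lVert X\rVert_2^2 / n$.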
FUTURE CHALLENGES
- Parallel stochastic methods (Leblond, Pedregosa, Lacoste-Julien 2016).
- Out of core (scaling beyond computer memory).
SCIKIT-LEARN-CONTRIB
lightning is just the beginning.
scikit-learn-contrib welcomes projects that are:
- scikit-learn compatible.
- Documented.
- Covered by tests (coverage > 80%).
THANKS FOR YOUR ATTENTION
http://contrib.scikit-learn.org/lightning/
(We're hiring!)