Esteban García-Cuesta – Computer Science Department
Scalable Machine Learning Algorithms and Applications
PhD. Esteban García-Cuesta
Associate Professor & Head of Data Science Laboratory
Universidad Europea de Madrid
Professor and Researcher at Universidad Europea de Madrid
Head of Data Science Lab Research Group
• Machine Learning and data mining
• Affective computing
• Dimensionality reduction and latent spaces
• Social mining
Contact information
Email: [email protected]: egarciacuesta
Tel: +34 912115163
PhD in Computer Science (Artificial Intelligence), Universidad Carlos III de Madrid
• CANOPY
• ALS
• SLMVP
• END
CANOPY Clustering
High-Dimensional Data
• Given a cloud of data points, we want to understand its structure
Clustering Images
• Image segmentation
• Goal: break up the images into meaningful or perceptually similar regions
Luis Pedro Coelho, Aabid Shariff, and Robert F. Murphy. "Nuclear segmentation in microscope cell images: A hand-segmented dataset and comparison of algorithms." DOI: 10.1109/ISBI.2009.5193098
Clustering Problem: Galaxies (SkyCat)
• A catalog of 2 billion "sky objects" represents objects by their radiation in 7 dimensions (frequency bands)
• Problem: Cluster into similar objects, e.g., galaxies, nearby stars, quasars, etc.
• Sloan Digital Sky Survey
Clustering is a hard problem!
Why is it hard?
• Clustering in two dimensions looks easy
• Clustering small amounts of data looks easy
• And in most cases, looks are not deceiving
• Many applications involve not 2, but 10 or 10,000 dimensions
• High-dimensional spaces look different: almost all pairs of points are at about the same distance
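The last point can be checked numerically. The sketch below uses only the Python standard library: it draws random points from the unit hypercube and reports how much the largest pairwise distance exceeds the smallest, relative to the smallest. The function name `distance_contrast` and the uniform-data setup are illustrative assumptions, not part of the slides.

```python
import math
import random


def distance_contrast(n_points: int, dim: int, seed: int = 0) -> float:
    """(max - min) / min over all pairwise Euclidean distances of n_points
    drawn uniformly at random from the unit hypercube [0, 1]^dim."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    dists = [
        math.dist(points[i], points[j])
        for i in range(n_points)
        for j in range(i + 1, n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)


# The contrast tends to shrink as the number of dimensions grows.
for dim in (2, 10, 100, 1000):
    print(dim, round(distance_contrast(50, dim), 3))
```

When the contrast is small, "nearest" and "farthest" neighbours are barely distinguishable, which is one reason clustering gets harder in high dimensions.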
Canopy clustering
• A preliminary step that reduces the number of operations k-means has to perform
• Suitable for large data sets (large number of samples)
• Results are similar to those provided by k-means itself
[Figure sequence: canopy clustering, steps 1-15]
Assigning points to canopies
[Figure sequence: assigning points to canopies, steps 16-20]
Canopy as initial step for k-means
[Figure sequence: canopy as initial step for k-means, steps 21-27]
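One way to use canopies as the initial step for k-means, in the spirit of McCallum et al., is to seed k-means with the canopy centers (so k equals the number of canopies) and to let each point compare itself only with the centers of the canopies it belongs to. The sketch below assumes the canopy centers and the loose threshold T1 are already available, approximates canopy membership as "canopy center within T1 of the point", and uses a fixed iteration count; `canopy_kmeans` and `mean` are hypothetical helper names.

```python
import math

Point = tuple[float, ...]


def mean(points: list[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def canopy_kmeans(
    points: list[Point],
    canopy_centers: list[Point],
    t1: float,
    iterations: int = 10,
) -> tuple[list[Point], list[int]]:
    """k-means seeded with canopy centers (k = number of canopies)."""
    # Cheap one-off pass: the canopies (by index) each point belongs to.
    memberships = [
        [j for j, c in enumerate(canopy_centers) if math.dist(p, c) < t1]
        for p in points
    ]
    centers = list(canopy_centers)  # initial k-means centers
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: only centers sharing a canopy with the point are
        # compared; fall back to all centers if the point is in no canopy.
        for i, p in enumerate(points):
            candidates = memberships[i] or range(len(centers))
            labels[i] = min(candidates, key=lambda j: math.dist(p, centers[j]))
        # Update step: move each center to the mean of its assigned points.
        for j in range(len(centers)):
            assigned = [p for p, label in zip(points, labels) if label == j]
            if assigned:  # keep the old center if it lost all its points
                centers[j] = mean(assigned)
    return centers, labels


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (11.0, 10.0), (10.0, 11.0)]
centers, labels = canopy_kmeans(points, [(0.0, 0.0), (10.0, 10.0)], t1=5.0)
print(centers, labels)
```

The saving comes from the assignment step: with well-separated canopies, each point is compared with one or two centers instead of all k.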
Summary (Canopy Algorithm)
• Start with a list of data points and two distance thresholds T1 > T2
1. Select any point (at random) from the list to form a canopy center
2. Calculate its distance to all the other points in the list
3. Put all the points that fall within the distance threshold T1 into a canopy
4. Remove from the main list all the points that fall within the threshold T2. These points are excluded from becoming the center of a new canopy.
5. Repeat steps 1 to 4 until the original list is empty
Andrew McCallum, Kamal Nigam, and Lyle H. Ungar. 2000. Efficient clustering of high-dimensional data sets with application to reference matching. In Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining (KDD '00). ACM, New York, NY, USA, 169-178. DOI: 10.1145/347090.347123
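The summarized steps translate almost line by line into code. The following is a minimal Python sketch, assuming Euclidean distance (the slides do not fix a metric, and McCallum et al. advocate a cheap, approximate distance for this stage) and points given as tuples of floats; `canopy` and its parameter names are illustrative.

```python
import math
import random

Point = tuple[float, ...]


def canopy(
    points: list[Point], t1: float, t2: float, seed: int = 0
) -> list[tuple[Point, list[Point]]]:
    """Canopy clustering with a loose threshold t1 and a tight threshold t2.

    Returns (center, members) pairs. Canopies may overlap: a point lying
    between t2 and t1 from a center stays in the list and can join later
    canopies too."""
    if not t1 > t2 > 0:
        raise ValueError("canopy clustering requires T1 > T2 > 0")
    rng = random.Random(seed)
    remaining = list(points)
    canopies = []
    while remaining:  # step 5: repeat until the list is empty
        # Step 1: a random point from the list becomes the canopy center.
        center = remaining.pop(rng.randrange(len(remaining)))
        # Step 2: distance from the center to every point still in the list.
        dists = [math.dist(center, p) for p in remaining]
        # Step 3: every point within T1 joins this canopy.
        members = [center] + [p for p, d in zip(remaining, dists) if d < t1]
        canopies.append((center, members))
        # Step 4: points within T2 leave the list and can no longer be centers.
        remaining = [p for p, d in zip(remaining, dists) if d >= t2]
    return canopies


data = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (10.5, 10.0)]
for center, members in canopy(data, t1=3.0, t2=1.0):
    print(center, members)
```

Each pass costs one distance computation per remaining point, and the T2 removal shrinks the list quickly, which is what makes canopies cheap compared with repeated full k-means passes.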
Canopy clustering (Parallelization Summary)
The processing is done in 3 MapReduce (M/R) jobs; steps 2 and 3 are the map and reduce phases of the same job:
1. The data is massaged into a suitable input format
2. Each mapper performs canopy clustering on the points in its input set and outputs its canopies' centers
3. The reducer clusters the canopy centers to produce the final canopy centers
4. The points are then clustered into these final canopies
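The data flow can be imitated on a single machine to see what each phase contributes. In this simulation (not Hadoop or Mahout code), input splits are plain Python slices, the "mapper" and "reducer" are ordinary functions, Euclidean distance is assumed, and each canopy is represented by the point that started it; other implementations may report the centroid of each canopy's members instead. All function names are hypothetical.

```python
import math

Point = tuple[float, ...]


def canopy_centers(points: list[Point], t2: float) -> list[Point]:
    """Sequential canopy pass that returns only the centers.

    Takes the first remaining point as the next center (instead of a random
    one) for reproducibility; only T2 matters for choosing centers."""
    remaining = list(points)
    centers = []
    while remaining:
        center = remaining.pop(0)
        centers.append(center)
        # Points within T2 can no longer start a canopy of their own.
        remaining = [p for p in remaining if math.dist(center, p) >= t2]
    return centers


def mapper(split: list[Point], t2: float) -> list[Point]:
    """Step 2: canopy clustering on one input split; emit its centers."""
    return canopy_centers(split, t2)


def reducer(local_centers: list[Point], t2: float) -> list[Point]:
    """Step 3: cluster all mappers' centers into the final canopy centers."""
    return canopy_centers(local_centers, t2)


def assign(points: list[Point], final: list[Point], t1: float) -> list[list[int]]:
    """Step 4: put each point into every final canopy within T1."""
    return [[i for i, c in enumerate(final) if math.dist(p, c) < t1] for p in points]


def parallel_canopy(points: list[Point], t1: float, t2: float, n_splits: int = 3):
    # Step 1 stand-in: the "massaged" input is a list of tuples cut into splits.
    splits = [points[i::n_splits] for i in range(n_splits)]
    local_centers = [c for split in splits for c in mapper(split, t2)]
    final_centers = reducer(local_centers, t2)
    return final_centers, assign(points, final_centers, t1)
```

Because the reducer re-clusters centers, a point can end up almost 2·T2 away from its final center; choosing T1 ≥ 2·T2 guarantees (by the triangle inequality) that step 4 places every point in at least one canopy. Real implementations may use separate thresholds for the reduce phase.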
Canopy (How to Choose K?)
[Figure: cost function vs. number of clusters, marking the rule of thumb k = (n/2)^0.5 and a better approximation]
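The rule of thumb is simple arithmetic: for n = 200 data points it suggests k = (200/2)^0.5 = 10 clusters, and for n = 5000 it suggests 50. A small Python helper follows; its name is hypothetical, and rounding to the nearest integer (at least 1) is an assumption, since the slide only gives the formula.

```python
import math


def rule_of_thumb_k(n: int) -> int:
    """Rule-of-thumb number of clusters for n data points: k = (n/2)^0.5.

    Rounded to the nearest integer and never less than 1 (assumption)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return max(1, round(math.sqrt(n / 2)))


for n in (200, 5000, 1_000_000):
    print(n, rule_of_thumb_k(n))
```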
[Figure: cost function vs. number of clusters, with the optimal number of clusters marked alongside the rule of thumb k = (n/2)^0.5 and a better approximation]
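The slides do not spell out the better approximation. One common heuristic for reading such a cost curve is the "elbow": pick the k where the curve bends most sharply, i.e. where the drop in cost just before k is much larger than the drop just after it. A minimal sketch, assuming the costs (for example, the k-means within-cluster sum of squared distances) have already been computed for consecutive values of k; the function name and the example numbers are made up for illustration.

```python
def elbow_k(costs: dict[int, float]) -> int:
    """Return the k with the largest second difference of the cost curve.

    `costs` maps at least three consecutive k values to the clustering cost."""
    ks = sorted(costs)
    if len(ks) < 3 or ks != list(range(ks[0], ks[0] + len(ks))):
        raise ValueError("need costs for at least three consecutive k values")
    best_k, best_bend = ks[1], float("-inf")
    for prev, k, nxt in zip(ks, ks[1:], ks[2:]):
        # How much more the cost fell just before k than just after it.
        bend = (costs[prev] - costs[k]) - (costs[k] - costs[nxt])
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k


example_costs = {1: 1000.0, 2: 600.0, 3: 250.0, 4: 200.0, 5: 180.0, 6: 170.0}
print(elbow_k(example_costs))  # → 3
```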
Canopy (How to Choose K?)
• Check how good the clusters are for the application at hand, e.g., Portugal market segmentation
[Figure: two scatter plots of Weight vs. Height]
Copyright
Note to the users of provided slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message.