welcome to the course! · unsupervised learning in r k-means in r > # k-means algorithm with 5...
TRANSCRIPT
![Page 1: Welcome to the course! · Unsupervised Learning in R k-means in R > # k-means algorithm with 5 centers, run 20 times > kmeans(x, centers = 5, nstart = 20) One observation per row,](https://reader035.vdocuments.site/reader035/viewer/2022063006/5fb583fe5be60b07d359fdaa/html5/thumbnails/1.jpg)
UNSUPERVISED LEARNING IN R
Welcome to the course!
![Page 2: Welcome to the course! · Unsupervised Learning in R k-means in R > # k-means algorithm with 5 centers, run 20 times > kmeans(x, centers = 5, nstart = 20) One observation per row,](https://reader035.vdocuments.site/reader035/viewer/2022063006/5fb583fe5be60b07d359fdaa/html5/thumbnails/2.jpg)
Unsupervised Learning in R
Chapter 1 overview● Unsupervised learning
● Three major types of machine learning
● Execute one type of unsupervised learning using R
![Page 3: Welcome to the course! · Unsupervised Learning in R k-means in R > # k-means algorithm with 5 centers, run 20 times > kmeans(x, centers = 5, nstart = 20) One observation per row,](https://reader035.vdocuments.site/reader035/viewer/2022063006/5fb583fe5be60b07d359fdaa/html5/thumbnails/3.jpg)
Unsupervised Learning in R
Types of machine learning● Unsupervised learning
● Finding structure in unlabeled data
● Supervised learning
● Making predictions based on labeled data
● Predictions like regression or classification
● Reinforcement learning
![Page 4: Welcome to the course! · Unsupervised Learning in R k-means in R > # k-means algorithm with 5 centers, run 20 times > kmeans(x, centers = 5, nstart = 20) One observation per row,](https://reader035.vdocuments.site/reader035/viewer/2022063006/5fb583fe5be60b07d359fdaa/html5/thumbnails/4.jpg)
Unsupervised Learning in R
Labeled vs. unlabeled data
Color Shape Size (cm)
Blue Square 10
Red Ellipse 2.4
Red Ellipse 20.7
Group
1
1
2
Features
Obs
erva
tion
s
Sample from Murphy, Machine Learning: A Probabilistic Perspective
Label
Unlabeled dataLabeled data
![Page 5: Welcome to the course! · Unsupervised Learning in R k-means in R > # k-means algorithm with 5 centers, run 20 times > kmeans(x, centers = 5, nstart = 20) One observation per row,](https://reader035.vdocuments.site/reader035/viewer/2022063006/5fb583fe5be60b07d359fdaa/html5/thumbnails/5.jpg)
Unsupervised Learning in R
Unsupervised learning● Finding homogenous subgroups within larger group
People have features such as income, education a!ainment, and gender
![Page 6: Welcome to the course! · Unsupervised Learning in R k-means in R > # k-means algorithm with 5 centers, run 20 times > kmeans(x, centers = 5, nstart = 20) One observation per row,](https://reader035.vdocuments.site/reader035/viewer/2022063006/5fb583fe5be60b07d359fdaa/html5/thumbnails/6.jpg)
Unsupervised Learning in R
Unsupervised learning● Finding homogenous subgroups within larger group
![Page 7: Welcome to the course! · Unsupervised Learning in R k-means in R > # k-means algorithm with 5 centers, run 20 times > kmeans(x, centers = 5, nstart = 20) One observation per row,](https://reader035.vdocuments.site/reader035/viewer/2022063006/5fb583fe5be60b07d359fdaa/html5/thumbnails/7.jpg)
Unsupervised Learning in R
Unsupervised learning● Finding homogenous subgroups within larger group
Subgroup A Subgroup B
![Page 8: Welcome to the course! · Unsupervised Learning in R k-means in R > # k-means algorithm with 5 centers, run 20 times > kmeans(x, centers = 5, nstart = 20) One observation per row,](https://reader035.vdocuments.site/reader035/viewer/2022063006/5fb583fe5be60b07d359fdaa/html5/thumbnails/8.jpg)
Unsupervised Learning in R
Unsupervised learning● Finding homogenous subgroups within larger group
● Clustering
Subgroup A Subgroup B
![Page 9: Welcome to the course! · Unsupervised Learning in R k-means in R > # k-means algorithm with 5 centers, run 20 times > kmeans(x, centers = 5, nstart = 20) One observation per row,](https://reader035.vdocuments.site/reader035/viewer/2022063006/5fb583fe5be60b07d359fdaa/html5/thumbnails/9.jpg)
Unsupervised Learning in R
Clustering examples
![Page 10: Welcome to the course! · Unsupervised Learning in R k-means in R > # k-means algorithm with 5 centers, run 20 times > kmeans(x, centers = 5, nstart = 20) One observation per row,](https://reader035.vdocuments.site/reader035/viewer/2022063006/5fb583fe5be60b07d359fdaa/html5/thumbnails/10.jpg)
Unsupervised Learning in R
Unsupervised learning● Finding homogenous subgroups within larger group
● Clustering
● Finding pa!erns in the features of the data
● Dimensionality reduction
![Page 11: Welcome to the course! · Unsupervised Learning in R k-means in R > # k-means algorithm with 5 centers, run 20 times > kmeans(x, centers = 5, nstart = 20) One observation per row,](https://reader035.vdocuments.site/reader035/viewer/2022063006/5fb583fe5be60b07d359fdaa/html5/thumbnails/11.jpg)
Unsupervised Learning in R
Dimensionality reduction● Find pa!erns in the features of the data
● Visualization of high dimensional data
● Pre-processing before supervised learning
![Page 12: Welcome to the course! · Unsupervised Learning in R k-means in R > # k-means algorithm with 5 centers, run 20 times > kmeans(x, centers = 5, nstart = 20) One observation per row,](https://reader035.vdocuments.site/reader035/viewer/2022063006/5fb583fe5be60b07d359fdaa/html5/thumbnails/12.jpg)
Unsupervised Learning in R
Challenges and benefits● No single goal of analysis
● Requires more creativity
● Much more unlabeled data available than cleanly labeled data
![Page 13: Welcome to the course! · Unsupervised Learning in R k-means in R > # k-means algorithm with 5 centers, run 20 times > kmeans(x, centers = 5, nstart = 20) One observation per row,](https://reader035.vdocuments.site/reader035/viewer/2022063006/5fb583fe5be60b07d359fdaa/html5/thumbnails/13.jpg)
UNSUPERVISED LEARNING IN R
Let’s practice!
![Page 14: Welcome to the course! · Unsupervised Learning in R k-means in R > # k-means algorithm with 5 centers, run 20 times > kmeans(x, centers = 5, nstart = 20) One observation per row,](https://reader035.vdocuments.site/reader035/viewer/2022063006/5fb583fe5be60b07d359fdaa/html5/thumbnails/14.jpg)
UNSUPERVISED LEARNING IN R
Introduction to k-means clustering
![Page 15: Welcome to the course! · Unsupervised Learning in R k-means in R > # k-means algorithm with 5 centers, run 20 times > kmeans(x, centers = 5, nstart = 20) One observation per row,](https://reader035.vdocuments.site/reader035/viewer/2022063006/5fb583fe5be60b07d359fdaa/html5/thumbnails/15.jpg)
Unsupervised Learning in R
● First of two clustering algorithms covered in this course
● Breaks observations into pre-defined number of clusters
k-means clustering algorithm
Subgroup 1
Subgroup 2
![Page 16: Welcome to the course! · Unsupervised Learning in R k-means in R > # k-means algorithm with 5 centers, run 20 times > kmeans(x, centers = 5, nstart = 20) One observation per row,](https://reader035.vdocuments.site/reader035/viewer/2022063006/5fb583fe5be60b07d359fdaa/html5/thumbnails/16.jpg)
Unsupervised Learning in R
k-means in R> # k-means algorithm with 5 centers, run 20 times > kmeans(x, centers = 5, nstart = 20)
● One observation per row, one feature per column
● k-means has a random component
● Run algorithm multiple times to improve odds of the best model
![Page 17: Welcome to the course! · Unsupervised Learning in R k-means in R > # k-means algorithm with 5 centers, run 20 times > kmeans(x, centers = 5, nstart = 20) One observation per row,](https://reader035.vdocuments.site/reader035/viewer/2022063006/5fb583fe5be60b07d359fdaa/html5/thumbnails/17.jpg)
Unsupervised Learning in R
First exercises● First exercise uses synthetic data
● Synthetic data generated from 3 subgroups
● Selecting the best number of subgroups for k-means
● Example with more fun data later in the chapter
![Page 18: Welcome to the course! · Unsupervised Learning in R k-means in R > # k-means algorithm with 5 centers, run 20 times > kmeans(x, centers = 5, nstart = 20) One observation per row,](https://reader035.vdocuments.site/reader035/viewer/2022063006/5fb583fe5be60b07d359fdaa/html5/thumbnails/18.jpg)
UNSUPERVISED LEARNING IN R
Let’s practice!
![Page 19: Welcome to the course! · Unsupervised Learning in R k-means in R > # k-means algorithm with 5 centers, run 20 times > kmeans(x, centers = 5, nstart = 20) One observation per row,](https://reader035.vdocuments.site/reader035/viewer/2022063006/5fb583fe5be60b07d359fdaa/html5/thumbnails/19.jpg)
UNSUPERVISED LEARNING IN R
How kmeans() works and practical ma!ers
![Page 20: Welcome to the course! · Unsupervised Learning in R k-means in R > # k-means algorithm with 5 centers, run 20 times > kmeans(x, centers = 5, nstart = 20) One observation per row,](https://reader035.vdocuments.site/reader035/viewer/2022063006/5fb583fe5be60b07d359fdaa/html5/thumbnails/20.jpg)
Unsupervised Learning in R
Objectives● Explain how k-means algorithm is implemented visually
● Model selection: determining number of clusters
![Page 21: Welcome to the course! · Unsupervised Learning in R k-means in R > # k-means algorithm with 5 centers, run 20 times > kmeans(x, centers = 5, nstart = 20) One observation per row,](https://reader035.vdocuments.site/reader035/viewer/2022063006/5fb583fe5be60b07d359fdaa/html5/thumbnails/21.jpg)
Unsupervised Learning in R
![Page 22: Welcome to the course! · Unsupervised Learning in R k-means in R > # k-means algorithm with 5 centers, run 20 times > kmeans(x, centers = 5, nstart = 20) One observation per row,](https://reader035.vdocuments.site/reader035/viewer/2022063006/5fb583fe5be60b07d359fdaa/html5/thumbnails/22.jpg)
Unsupervised Learning in R
![Page 23: Welcome to the course! · Unsupervised Learning in R k-means in R > # k-means algorithm with 5 centers, run 20 times > kmeans(x, centers = 5, nstart = 20) One observation per row,](https://reader035.vdocuments.site/reader035/viewer/2022063006/5fb583fe5be60b07d359fdaa/html5/thumbnails/23.jpg)
Unsupervised Learning in R
![Page 24: Welcome to the course! · Unsupervised Learning in R k-means in R > # k-means algorithm with 5 centers, run 20 times > kmeans(x, centers = 5, nstart = 20) One observation per row,](https://reader035.vdocuments.site/reader035/viewer/2022063006/5fb583fe5be60b07d359fdaa/html5/thumbnails/24.jpg)
Unsupervised Learning in R
![Page 25: Welcome to the course! · Unsupervised Learning in R k-means in R > # k-means algorithm with 5 centers, run 20 times > kmeans(x, centers = 5, nstart = 20) One observation per row,](https://reader035.vdocuments.site/reader035/viewer/2022063006/5fb583fe5be60b07d359fdaa/html5/thumbnails/25.jpg)
Unsupervised Learning in R
![Page 26: Welcome to the course! · Unsupervised Learning in R k-means in R > # k-means algorithm with 5 centers, run 20 times > kmeans(x, centers = 5, nstart = 20) One observation per row,](https://reader035.vdocuments.site/reader035/viewer/2022063006/5fb583fe5be60b07d359fdaa/html5/thumbnails/26.jpg)
Unsupervised Learning in R
![Page 27: Welcome to the course! · Unsupervised Learning in R k-means in R > # k-means algorithm with 5 centers, run 20 times > kmeans(x, centers = 5, nstart = 20) One observation per row,](https://reader035.vdocuments.site/reader035/viewer/2022063006/5fb583fe5be60b07d359fdaa/html5/thumbnails/27.jpg)
Unsupervised Learning in R
![Page 28: Welcome to the course! · Unsupervised Learning in R k-means in R > # k-means algorithm with 5 centers, run 20 times > kmeans(x, centers = 5, nstart = 20) One observation per row,](https://reader035.vdocuments.site/reader035/viewer/2022063006/5fb583fe5be60b07d359fdaa/html5/thumbnails/28.jpg)
Unsupervised Learning in R
![Page 29: Welcome to the course! · Unsupervised Learning in R k-means in R > # k-means algorithm with 5 centers, run 20 times > kmeans(x, centers = 5, nstart = 20) One observation per row,](https://reader035.vdocuments.site/reader035/viewer/2022063006/5fb583fe5be60b07d359fdaa/html5/thumbnails/29.jpg)
Unsupervised Learning in R
Model selection● Recall k-means has a random component
● Best outcome is based on total within cluster sum of squares:
● For each cluster
● For each observation in the cluster
● Determine squared distance from observation to cluster center
● Sum all of them together
![Page 30: Welcome to the course! · Unsupervised Learning in R k-means in R > # k-means algorithm with 5 centers, run 20 times > kmeans(x, centers = 5, nstart = 20) One observation per row,](https://reader035.vdocuments.site/reader035/viewer/2022063006/5fb583fe5be60b07d359fdaa/html5/thumbnails/30.jpg)
Unsupervised Learning in R
Model selection
● Running algorithm multiple times helps find the global minimum total within cluster sum of squares
● You’ll see an example in the exercises
> # k-means algorithm with 5 centers, run 20 times > kmeans(x, centers = 5, nstart = 20)
![Page 31: Welcome to the course! · Unsupervised Learning in R k-means in R > # k-means algorithm with 5 centers, run 20 times > kmeans(x, centers = 5, nstart = 20) One observation per row,](https://reader035.vdocuments.site/reader035/viewer/2022063006/5fb583fe5be60b07d359fdaa/html5/thumbnails/31.jpg)
Unsupervised Learning in R
Identical groupings and Total Within SS, but different cluster labels
![Page 32: Welcome to the course! · Unsupervised Learning in R k-means in R > # k-means algorithm with 5 centers, run 20 times > kmeans(x, centers = 5, nstart = 20) One observation per row,](https://reader035.vdocuments.site/reader035/viewer/2022063006/5fb583fe5be60b07d359fdaa/html5/thumbnails/32.jpg)
Unsupervised Learning in R
Determining number of clusters● Trial and error is not the best approach
Elbow at 2
Number of clusters
Tota
l Wit
hin
SSScree plot
![Page 33: Welcome to the course! · Unsupervised Learning in R k-means in R > # k-means algorithm with 5 centers, run 20 times > kmeans(x, centers = 5, nstart = 20) One observation per row,](https://reader035.vdocuments.site/reader035/viewer/2022063006/5fb583fe5be60b07d359fdaa/html5/thumbnails/33.jpg)
UNSUPERVISED LEARNING IN R
Let’s practice!
![Page 34: Welcome to the course! · Unsupervised Learning in R k-means in R > # k-means algorithm with 5 centers, run 20 times > kmeans(x, centers = 5, nstart = 20) One observation per row,](https://reader035.vdocuments.site/reader035/viewer/2022063006/5fb583fe5be60b07d359fdaa/html5/thumbnails/34.jpg)
UNSUPERVISED LEARNING IN R
Introduction to Pokemon data
![Page 35: Welcome to the course! · Unsupervised Learning in R k-means in R > # k-means algorithm with 5 centers, run 20 times > kmeans(x, centers = 5, nstart = 20) One observation per row,](https://reader035.vdocuments.site/reader035/viewer/2022063006/5fb583fe5be60b07d359fdaa/html5/thumbnails/35.jpg)
Unsupervised Learning in R
"Real" data exercise
![Page 36: Welcome to the course! · Unsupervised Learning in R k-means in R > # k-means algorithm with 5 centers, run 20 times > kmeans(x, centers = 5, nstart = 20) One observation per row,](https://reader035.vdocuments.site/reader035/viewer/2022063006/5fb583fe5be60b07d359fdaa/html5/thumbnails/36.jpg)
Unsupervised Learning in R
The Pokemon dataset> head(pokemon) HitPoints Attack Defense SpecialAttack SpecialDefense Speed [1,] 45 49 49 65 65 45 [2,] 60 62 63 80 80 60 [3,] 80 82 83 100 100 80 [4,] 80 100 123 122 120 80 [5,] 39 52 43 60 50 65 [6,] 58 64 58 80 65 80
● Hosted at h!ps://www.kaggle.com/abcsds/pokemon
● More information on Pokemon and these features can be found at h!p://pokemondb.net/pokedex
![Page 37: Welcome to the course! · Unsupervised Learning in R k-means in R > # k-means algorithm with 5 centers, run 20 times > kmeans(x, centers = 5, nstart = 20) One observation per row,](https://reader035.vdocuments.site/reader035/viewer/2022063006/5fb583fe5be60b07d359fdaa/html5/thumbnails/37.jpg)
Unsupervised Learning in R
Data challenges● Selecting the variables to cluster upon
● Scaling the data (will handle in last chapter)
● Determining the number of clusters
● O"en no clean "elbow" in scree plot
● This will be a core part of the exercises
● Visualize the results for interpretation
![Page 38: Welcome to the course! · Unsupervised Learning in R k-means in R > # k-means algorithm with 5 centers, run 20 times > kmeans(x, centers = 5, nstart = 20) One observation per row,](https://reader035.vdocuments.site/reader035/viewer/2022063006/5fb583fe5be60b07d359fdaa/html5/thumbnails/38.jpg)
UNSUPERVISED LEARNING IN R
Let’s practice!
![Page 39: Welcome to the course! · Unsupervised Learning in R k-means in R > # k-means algorithm with 5 centers, run 20 times > kmeans(x, centers = 5, nstart = 20) One observation per row,](https://reader035.vdocuments.site/reader035/viewer/2022063006/5fb583fe5be60b07d359fdaa/html5/thumbnails/39.jpg)
UNSUPERVISED LEARNING IN R
Review of k-means clustering
![Page 40: Welcome to the course! · Unsupervised Learning in R k-means in R > # k-means algorithm with 5 centers, run 20 times > kmeans(x, centers = 5, nstart = 20) One observation per row,](https://reader035.vdocuments.site/reader035/viewer/2022063006/5fb583fe5be60b07d359fdaa/html5/thumbnails/40.jpg)
Unsupervised Learning in R
Chapter review● Unsupervised vs. supervised learning
● How to create k-means cluster model in R
● How k-means algorithm works
● Model selection
● Application to "real" (and hopefully fun) dataset
![Page 41: Welcome to the course! · Unsupervised Learning in R k-means in R > # k-means algorithm with 5 centers, run 20 times > kmeans(x, centers = 5, nstart = 20) One observation per row,](https://reader035.vdocuments.site/reader035/viewer/2022063006/5fb583fe5be60b07d359fdaa/html5/thumbnails/41.jpg)
Unsupervised Learning in R
Coming up: chapter 2
![Page 42: Welcome to the course! · Unsupervised Learning in R k-means in R > # k-means algorithm with 5 centers, run 20 times > kmeans(x, centers = 5, nstart = 20) One observation per row,](https://reader035.vdocuments.site/reader035/viewer/2022063006/5fb583fe5be60b07d359fdaa/html5/thumbnails/42.jpg)
Unsupervised Learning in R
Coming up: chapter 3
![Page 43: Welcome to the course! · Unsupervised Learning in R k-means in R > # k-means algorithm with 5 centers, run 20 times > kmeans(x, centers = 5, nstart = 20) One observation per row,](https://reader035.vdocuments.site/reader035/viewer/2022063006/5fb583fe5be60b07d359fdaa/html5/thumbnails/43.jpg)
Unsupervised Learning in R
Coming up: chapter 4> # Repeat for components 1 and 3 > plot(wisc.pr$x[, c(1, 3)], col = (diagnosis + 1), xlab = "PC1", ylab = "PC3")
![Page 44: Welcome to the course! · Unsupervised Learning in R k-means in R > # k-means algorithm with 5 centers, run 20 times > kmeans(x, centers = 5, nstart = 20) One observation per row,](https://reader035.vdocuments.site/reader035/viewer/2022063006/5fb583fe5be60b07d359fdaa/html5/thumbnails/44.jpg)
UNSUPERVISED LEARNING IN R
Let’s practice!