machine learning for data science (cs4786) lecture 1 · 2017-08-22 · machine learning for data...
Machine Learning for Data Science (CS4786)
Lecture 1
Tu-Th 11:40AM to 12:55 PM
Klarman Hall KG70
Instructor : Karthik Sridharan
THE AWESOME TA’S
1 Caroline Chang
2 Jason Chen
3 Cheng Preng Phoo
4 Pulkit Kashyap
5 Ayush Sekhari
1 Amol Tandel
2 Dilip Thiagarajan
3 Lequn Wang
4 Qiantong Xu
5 Zhao Zhan
COURSE INFORMATION
Course webpage is the official source of information: http://www.cs.cornell.edu/Courses/cs4786/2017fa
Join Piazza: https://piazza.com/class/j6i8gdcku9x7i8
TA office hours will start from next week
While the course is not coding intensive, you will need to do some light coding.
SYLLABUS
1 Dimensionality Reduction:
  1 Principal Component Analysis (PCA)
  2 Canonical Correlation Analysis (CCA)
  3 Random Projections
  4 Kernel Methods/Kernel PCA
2 Clustering and More:
  1 Single Link Clustering
  2 K-means Algorithm
  3 Gaussian Mixture Models and Other Mixture Models
  4 Spectral Clustering
3 Probabilistic Modeling and Graphical Models:
  1 MLE vs. MAP vs. Bayesian Methods
  2 EM Algorithm
  3 Graphical Models
    1 Hidden Markov Models
    2 Latent Dirichlet Allocation
    3 Exact Inference: Variable Elimination, Belief Propagation
    4 Learning in Graphical Models
    5 Approximate Inference
ASSIGNMENTS
Diagnostic assignment 0 is out: for our calibration.
Students who want to take the course for credit need to submit this; only then will you be added to CMS.
Submit the assignment online via the Google form at the link given, by August 29th.
Has to be done individually
Write your full name and NetID on the first page of the hand-in. You will be added to CMS based on this.
ASSIGNMENTS
Besides the diagnostic assignment, there are 6 other assignments, to be done individually
The 6 assignments are worth 60% of your grade.
Rough timeline:
1 Assignment 1: Out: August 31st Due: September 7th
2 Assignment 2: Out: September 12th Due: September 19th
3 Assignment 3: Out: September 21st Due: September 28th
4 Assignment 4: Out: September 28th Due: October 5th
5 Assignment 5: Out: October 17th Due: October 24th
6 Assignment 6: Out: November 2nd Due: November 14th
SURVEYS
Two surveys + final course evaluation
The surveys provide very valuable information about how the course is progressing and help me improve
All surveys are anonymous and will be on Piazza
If overall class participation is above 90% on all three, I will drop the worst assignment for everyone (evaluation on best 5 assignments)
COMPETITIONS
2 competitions/challenges, worth 40% of total course grade
Competition I: Clustering challenge (due mid-October)
Competition II: Graphical model centric challenge (due end of November)
Will be hosted on “In class Kaggle”!
40% of the competition grades for kaggle score/performance
60% of the competition grades for report.
Mid-competition, a one-page preliminary report (to be submitted individually) explaining the work done so far by each individual in the group. Worth 10% of the competition grade.
Groups of size at most 4.
Let's get started . . .
DATA DELUGE
Each time you use your credit card: who purchased what, where and when
Netflix, Hulu, smart TV: what do different groups of people like to watch
Social networks like Facebook, Twitter, . . . : who is friends with whom, what do these people post or tweet about
Millions of photos and videos, many tagged
Wikipedia, all the news websites: pretty much most of human knowledge
Guess?
Social Network of Marvel Comic Characters!
by Cesc Rosselló, Ricardo Alberich, and Joe Miro from the University of the Balearic Islands
What can we learn from all this data?
WHAT IS MACHINE LEARNING?
Use data to automatically learn to perform tasks better.
Close in spirit to T. Mitchell’s description
WHERE IS IT USED ?
Movie Rating Prediction
WHERE IS IT USED ?
Pedestrian Detection
WHERE IS IT USED ?
Market Predictions
WHERE IS IT USED ?
Spam Classification
MORE APPLICATIONS
Each time you use your search engine
Autocomplete: Blame machine learning for bad spellings
Biometrics: reason you shouldn’t smile
Recommendation systems: what you may like to buy based on what your friends and their friends buy
Computer vision: self-driving cars, automatically tagging photos
Topic modeling: automatically categorizing documents/emails by topics or music by genre
. . .
TOPICS WE WILL COVER
1 Dimensionality Reduction:
  Principal Component Analysis (PCA), Canonical Correlation Analysis (CCA), random projections, Compressed Sensing (CS), . . .
2 Clustering and Mixture Models:
  k-means clustering, Gaussian mixture models, single-link clustering, spectral clustering, . . .
3 Probabilistic Modeling & Graphical Models:
  Probabilistic modeling, MLE vs. MAP vs. Bayesian approaches, inference and learning in graphical models, Latent Dirichlet Allocation (LDA), Hidden Markov Models (HMM), . . .
All of the above: unsupervised learning
UNSUPERVISED LEARNING
Given (unlabeled) data, find useful information, pattern or structure
Dimensionality reduction/compression: compress the data set by removing redundancy and retaining only useful information
Clustering: find meaningful groupings in data
Topic modeling: discover topics/groups with which we can tag data points
DIMENSIONALITY REDUCTION
You are provided with n data points, each in R^d
Goal: compress the data into n points in R^K, where K << d
Retain as much information about the original data set as possible
Retain desired properties of the original data set
E.g., PCA, compressed sensing, . . .
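As a rough illustration of compressing n points in R^d into n points in R^K, here is a minimal sketch using a Gaussian random projection (random projections are listed on the syllabus). The sizes n = 50, d = 1000, K = 200, the function name `random_projection`, and the synthetic data are all assumptions made for this example, not something taken from the slides.

```python
import numpy as np

def random_projection(X, K, seed=0):
    """Map the rows of X (n points in R^d) to n points in R^K."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Gaussian matrix scaled so squared lengths are preserved in expectation.
    W = rng.normal(size=(d, K)) / np.sqrt(K)
    return X @ W

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 1000))       # n = 50 points, d = 1000 (made up)
Y = random_projection(X, K=200)       # same 50 points, now in R^200

# Compare one pairwise distance before and after compression.
before = np.linalg.norm(X[0] - X[1])
after = np.linalg.norm(Y[0] - Y[1])
ratio = after / before
print(Y.shape, ratio)
```

The ratio should land close to 1, which is the sense in which a "desired property" (here, pairwise distance) is retained after compression.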
PRINCIPAL COMPONENT ANALYSIS (PCA)
Eigen Face:
Write down each data point as a linear combination of a small number of basis vectors
Data-specific compression scheme
One of the early successes was in face recognition: classification based on nearest neighbor in the reduced-dimension space
Turk & Pentland’91
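The idea of writing each data point as a linear combination of a few basis vectors can be sketched with NumPy's SVD. This is a minimal sketch on synthetic data, not the eigenface pipeline itself; the function name `pca`, the data sizes, and the choice K = 3 are assumptions for illustration.

```python
import numpy as np

def pca(X, K):
    """Return the mean, the top-K basis vectors (as rows), and K coefficients per point."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are orthonormal directions sorted by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    basis = Vt[:K]
    coeffs = Xc @ basis.T
    return mean, basis, coeffs

rng = np.random.default_rng(0)
# Synthetic data: 200 points in R^50 lying near a 3-dimensional subspace.
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 50))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 50))

mean, basis, coeffs = pca(X, K=3)
X_hat = mean + coeffs @ basis          # each row: mean + combination of 3 basis vectors
rel_error = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print(coeffs.shape, rel_error)
```

Each point is stored as 3 numbers instead of 50; a nearest-neighbor classifier, as in the face recognition example, would then compare rows of `coeffs` rather than the original points.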
CANONICAL CORRELATION ANALYSIS (CCA)
Extract common information between multiple sources (views)
Noise specific to only one view or a subset of views is automatically filtered out
Success story: speaker/speech recognition using both audio and video data
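To make "common information between views" concrete, here is a minimal sketch of two-view CCA computed by whitening each view and taking an SVD of the cross-covariance. The two synthetic views stand in for, say, audio and video features; the function names `inv_sqrt` and `cca`, the regularizer `reg`, and the data are assumptions for illustration only.

```python
import numpy as np

def inv_sqrt(C):
    """Inverse square root of a symmetric positive definite matrix."""
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def cca(X, Y, reg=1e-6):
    """Return canonical correlations and projection directions for two views."""
    n = X.shape[0]
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n
    Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the correlations.
    U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy)
    return s, Wx @ U, Wy @ Vt.T

rng = np.random.default_rng(0)
n = 2000
shared = rng.normal(size=n)            # signal present in both views
# Each view also has columns of pure view-specific noise.
X = np.column_stack([shared + 0.1 * rng.normal(size=n), rng.normal(size=n)])
Y = np.column_stack([rng.normal(size=n),
                     shared + 0.1 * rng.normal(size=n),
                     rng.normal(size=n)])

corrs, A, B = cca(X, Y)
top_corr = np.corrcoef(X @ A[:, 0], Y @ B[:, 0])[0, 1]
print(corrs, top_corr)
```

The first canonical pair picks out the shared signal, while the noise columns, which appear in only one view, contribute almost nothing to the remaining correlation.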
DATA VISUALIZATION
2D projection
Help visualize data (in relation to each other)
Preserve relative distances among data points (at least nearby ones)
CLUSTERING
K-means clustering
Given just the data points, group them into natural clusters
Roughly speaking:
  Points within a cluster must be close to each other
  Points in different clusters must be separated
Helps bin data points, but generally hard to do
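A minimal sketch of k-means using the standard alternating assign/update iteration (Lloyd's algorithm) is given below. The two synthetic blobs, the function name `kmeans`, and the random initialization from data points are assumptions for illustration; the iteration only finds a local optimum in general, which is one reason clustering is hard to do well.

```python
import numpy as np

def kmeans(X, K, n_iters=50, seed=0):
    """Cluster the rows of X into K groups; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    # Initialize centers at K distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the mean of its points.
        new_centers = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(K)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

rng = np.random.default_rng(1)
# Two well-separated synthetic blobs of 100 points each in R^2.
X = np.vstack([rng.normal(loc=0.0, scale=0.5, size=(100, 2)),
               rng.normal(loc=5.0, scale=0.5, size=(100, 2))])
labels, centers = kmeans(X, K=2)
print(centers)
```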
TELL ME WHO YOUR FRIENDS ARE . . .
Cluster nodes in a graph.
Analysis of social network data.
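One way to cluster the nodes of a graph is a spectral split based on the graph Laplacian (spectral clustering is on the syllabus). The slides do not say which method produced their pictures, so the sketch below is an assumption: a toy "friendship" graph made of two triangles joined by a single edge, split by the sign of the Fiedler vector. The function name `spectral_bipartition` is hypothetical.

```python
import numpy as np

def spectral_bipartition(adjacency):
    """Split graph nodes into two groups using the sign of the Fiedler vector."""
    degrees = adjacency.sum(axis=1)
    laplacian = np.diag(degrees) - adjacency
    # eigh returns eigenvalues in ascending order, so column 1 belongs to
    # the second-smallest eigenvalue (the Fiedler vector).
    _, vecs = np.linalg.eigh(laplacian)
    fiedler = vecs[:, 1]
    return (fiedler > 0).astype(int)

# Toy graph: nodes 0-2 form one triangle, nodes 3-5 another, linked by edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adjacency = np.zeros((6, 6))
for i, j in edges:
    adjacency[i, j] = adjacency[j, i] = 1.0

groups = spectral_bipartition(adjacency)
print(groups)
```

Which group is labeled 0 and which is labeled 1 is arbitrary, since an eigenvector is only defined up to sign.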
TOPIC MODELLING
Probabilistic generative model for documents
Each document has a fixed distribution over topics; each topic has a fixed distribution over words belonging to it
Unlike clustering, groups are non-exclusive
Blei, Ng & Jordan’06
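The generative story can be sketched by sampling a single document: draw the document's distribution over topics, then for each word pick a topic and draw a word from that topic. The tiny vocabulary, the two hypothetical topics, the Dirichlet parameter `alpha`, and the function name `generate_document` are all assumptions for illustration.

```python
import numpy as np

def generate_document(topic_word, alpha, length, rng):
    """Sample one document from an LDA-style generative process."""
    n_topics, n_words = topic_word.shape
    # Document-specific distribution over topics.
    theta = rng.dirichlet(alpha * np.ones(n_topics))
    topics, words = [], []
    for _ in range(length):
        z = int(rng.choice(n_topics, p=theta))          # pick a topic
        w = int(rng.choice(n_words, p=topic_word[z]))   # pick a word from that topic
        topics.append(z)
        words.append(w)
    return theta, topics, words

vocab = ["gene", "cell", "dna", "ball", "team", "score"]
topic_word = np.array([
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],   # hypothetical "biology" topic
    [0.0, 0.0, 0.0, 0.3, 0.4, 0.3],   # hypothetical "sports" topic
])
rng = np.random.default_rng(0)
theta, topics, words = generate_document(topic_word, alpha=0.5, length=20, rng=rng)
print(theta, [vocab[w] for w in words])
```

Because `theta` can put weight on both topics, a single document may mix biology and sports words, which is the sense in which groups are non-exclusive.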
HIDDEN MARKOV MODEL
Speech data is a stream of data flowing in
It only makes sense to consider the entire stream, not each bit alone
Hidden Markov models capture our belief that we produce sound based on the phoneme we think of
Phonemes in the right sequence model what we want to say
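The idea that each sound depends on a hidden phoneme, and that phonemes follow one another in sequence, can be sketched as a small sampler. This is only an illustration of the model structure: the two phoneme states, the two sound labels and all probabilities below are made up, and real speech models use far richer states and observations.

```python
import random

# Hypothetical model: hidden states are phonemes, observations are sounds.
START = {"k": 0.6, "ae": 0.4}                       # first phoneme
TRANSITION = {                                      # next phoneme given current
    "k": {"k": 0.2, "ae": 0.8},
    "ae": {"k": 0.5, "ae": 0.5},
}
EMISSION = {                                        # sound given phoneme
    "k": {"click": 0.9, "hum": 0.1},
    "ae": {"click": 0.2, "hum": 0.8},
}


def draw(dist, rng):
    """Draw one outcome from a dict that maps outcomes to probabilities."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def sample_hmm(length, rng):
    """Sample a hidden phoneme sequence and the sounds it emits."""
    phonemes, sounds = [], []
    state = draw(START, rng)
    for _ in range(length):
        phonemes.append(state)
        sounds.append(draw(EMISSION[state], rng))   # sound depends only on the phoneme
        state = draw(TRANSITION[state], rng)        # next phoneme depends only on this one
    return phonemes, sounds


phonemes, sounds = sample_hmm(8, random.Random(0))
print(phonemes)
print(sounds)
```

In practice only the sounds are observed; the phoneme sequence is hidden and has to be inferred from the whole stream.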
![Page 58: Machine Learning for Data Science (CS4786) Lecture 1 · 2017-08-22 · Machine Learning for Data Science (CS4786) Lecture 1 Tu-Th 11:40AM to 12:55 PM ... Join Piazza: https: ... While](https://reader031.vdocuments.site/reader031/viewer/2022022015/5b5abb607f8b9ab8578c83f1/html5/thumbnails/58.jpg)
WHAT WE WON’T COVER
Feature extraction is a problem/domain-specific art; we won’t cover this in class
We won’t cover optimization methods for machine learning
Implementation tricks and details won’t be covered
There are literally thousands of methods, we will only cover a few!
![Page 62: Machine Learning for Data Science (CS4786) Lecture 1 · 2017-08-22 · Machine Learning for Data Science (CS4786) Lecture 1 Tu-Th 11:40AM to 12:55 PM ... Join Piazza: https: ... While](https://reader031.vdocuments.site/reader031/viewer/2022022015/5b5abb607f8b9ab8578c83f1/html5/thumbnails/62.jpg)
WHAT YOU CAN TAKE HOME
How to think about a learning problem and formulate it
Well-known methods, and how and why they work
Hopefully we can give you an intuition for the choice of methods/approaches to try out on a given problem
![Page 66: Machine Learning for Data Science (CS4786) Lecture 1 · 2017-08-22 · Machine Learning for Data Science (CS4786) Lecture 1 Tu-Th 11:40AM to 12:55 PM ... Join Piazza: https: ... While](https://reader031.vdocuments.site/reader031/viewer/2022022015/5b5abb607f8b9ab8578c83f1/html5/thumbnails/66.jpg)
Clustering
![Page 67: Machine Learning for Data Science (CS4786) Lecture 1 · 2017-08-22 · Machine Learning for Data Science (CS4786) Lecture 1 Tu-Th 11:40AM to 12:55 PM ... Join Piazza: https: ... While](https://reader031.vdocuments.site/reader031/viewer/2022022015/5b5abb607f8b9ab8578c83f1/html5/thumbnails/67.jpg)
CLUSTERING
Grouping sets of data points s.t.
points in same group are similar
points in different groups are dissimilar
A form of unsupervised classification where there are no predefined labels
![Page 68: Machine Learning for Data Science (CS4786) Lecture 1 · 2017-08-22 · Machine Learning for Data Science (CS4786) Lecture 1 Tu-Th 11:40AM to 12:55 PM ... Join Piazza: https: ... While](https://reader031.vdocuments.site/reader031/viewer/2022022015/5b5abb607f8b9ab8578c83f1/html5/thumbnails/68.jpg)
CLUSTERING
Partition data into K disjoint groups
Compression or Quantization: compress n points into K representatives/groups
Visualization or Understanding:
Taxonomy: animals vs. plants vs. microbes, science vs. math vs. social sciences
Segmentation: different types of customers, students, etc. Find natural groupings in data
What this does not include: items belonging to more than one type
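The compression view can be illustrated with a short sketch: given K representatives, every point is replaced by the index of one representative, which also partitions the data into K disjoint groups. Assigning each point to its nearest representative under Euclidean distance is an assumption made here for illustration, and the points and representatives are made-up values.

```python
import math


def quantize(points, representatives):
    """Return, for each point, the index of its nearest representative."""
    return [
        min(range(len(representatives)),
            key=lambda k: math.dist(p, representatives[k]))
        for p in points
    ]


# Hypothetical data: n = 5 points compressed into K = 3 representatives.
points = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (4.9, 5.0), (9.0, 0.0)]
representatives = [(0.0, 0.0), (5.0, 5.0), (9.0, 0.0)]

labels = quantize(points, representatives)
# Each point receives exactly one label, so the groups are disjoint.
groups = {k: [p for p, lab in zip(points, labels) if lab == k]
          for k in range(len(representatives))}
print(labels)  # [0, 0, 1, 1, 2]
```

Because each point gets exactly one label, an item cannot belong to more than one group.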
![Page 72: Machine Learning for Data Science (CS4786) Lecture 1 · 2017-08-22 · Machine Learning for Data Science (CS4786) Lecture 1 Tu-Th 11:40AM to 12:55 PM ... Join Piazza: https: ... While](https://reader031.vdocuments.site/reader031/viewer/2022022015/5b5abb607f8b9ab8578c83f1/html5/thumbnails/72.jpg)
EXAMPLES
What are the clusters?
![Page 73: Machine Learning for Data Science (CS4786) Lecture 1 · 2017-08-22 · Machine Learning for Data Science (CS4786) Lecture 1 Tu-Th 11:40AM to 12:55 PM ... Join Piazza: https: ... While](https://reader031.vdocuments.site/reader031/viewer/2022022015/5b5abb607f8b9ab8578c83f1/html5/thumbnails/73.jpg)
![Page 74: Machine Learning for Data Science (CS4786) Lecture 1 · 2017-08-22 · Machine Learning for Data Science (CS4786) Lecture 1 Tu-Th 11:40AM to 12:55 PM ... Join Piazza: https: ... While](https://reader031.vdocuments.site/reader031/viewer/2022022015/5b5abb607f8b9ab8578c83f1/html5/thumbnails/74.jpg)
![Page 75: Machine Learning for Data Science (CS4786) Lecture 1 · 2017-08-22 · Machine Learning for Data Science (CS4786) Lecture 1 Tu-Th 11:40AM to 12:55 PM ... Join Piazza: https: ... While](https://reader031.vdocuments.site/reader031/viewer/2022022015/5b5abb607f8b9ab8578c83f1/html5/thumbnails/75.jpg)
![Page 76: Machine Learning for Data Science (CS4786) Lecture 1 · 2017-08-22 · Machine Learning for Data Science (CS4786) Lecture 1 Tu-Th 11:40AM to 12:55 PM ... Join Piazza: https: ... While](https://reader031.vdocuments.site/reader031/viewer/2022022015/5b5abb607f8b9ab8578c83f1/html5/thumbnails/76.jpg)
![Page 77: Machine Learning for Data Science (CS4786) Lecture 1 · 2017-08-22 · Machine Learning for Data Science (CS4786) Lecture 1 Tu-Th 11:40AM to 12:55 PM ... Join Piazza: https: ... While](https://reader031.vdocuments.site/reader031/viewer/2022022015/5b5abb607f8b9ab8578c83f1/html5/thumbnails/77.jpg)
![Page 78: Machine Learning for Data Science (CS4786) Lecture 1 · 2017-08-22 · Machine Learning for Data Science (CS4786) Lecture 1 Tu-Th 11:40AM to 12:55 PM ... Join Piazza: https: ... While](https://reader031.vdocuments.site/reader031/viewer/2022022015/5b5abb607f8b9ab8578c83f1/html5/thumbnails/78.jpg)
CLUSTERING
![Page 79: Machine Learning for Data Science (CS4786) Lecture 1 · 2017-08-22 · Machine Learning for Data Science (CS4786) Lecture 1 Tu-Th 11:40AM to 12:55 PM ... Join Piazza: https: ... While](https://reader031.vdocuments.site/reader031/viewer/2022022015/5b5abb607f8b9ab8578c83f1/html5/thumbnails/79.jpg)
CLUSTERING
[Figure: data matrix X, with its two dimensions labeled n and d]
![Page 81: Machine Learning for Data Science (CS4786) Lecture 1 · 2017-08-22 · Machine Learning for Data Science (CS4786) Lecture 1 Tu-Th 11:40AM to 12:55 PM ... Join Piazza: https: ... While](https://reader031.vdocuments.site/reader031/viewer/2022022015/5b5abb607f8b9ab8578c83f1/html5/thumbnails/81.jpg)
Can we formalize criteria/objectives for clustering?
Assume points are represented as vectors
Use Euclidean distances for now
Similar points in same cluster
Points across clusters are dissimilar
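One possible way to turn these requirements into a number is sketched below: the sum of squared Euclidean distances from each point to the mean of its cluster. This particular objective is an assumption chosen for illustration, not something this slide states, and the four points are made up. A lower value means points in the same cluster are more similar.

```python
import math


def within_cluster_cost(points, labels):
    """Sum of squared Euclidean distances from each point to its cluster mean."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)

    cost = 0.0
    for members in clusters.values():
        # Coordinate-wise mean of the cluster's points.
        mean = tuple(sum(coord) / len(members) for coord in zip(*members))
        cost += sum(math.dist(p, mean) ** 2 for p in members)
    return cost


# Hypothetical points: two tight pairs that are far apart from each other.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

good = within_cluster_cost(points, [0, 0, 1, 1])  # nearby points grouped together
bad = within_cluster_cost(points, [0, 1, 0, 1])   # distant points grouped together
print(good, bad)  # 4.0 100.0
```

The grouping that keeps nearby points together scores far lower, which matches the intuition that similar points should share a cluster.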