Machine Learning for Data Science (CS4786), Lecture 1
Tu-Th 8:40 AM to 9:55 AM, Klarman Hall KG70
Instructor: Karthik Sridharan
Welcome to the first lecture!
I know it's really early …
And cold out there
But let's have fun!
THE AMAZING TAs
1 Ian Delbridge
2 William Gao
3 Maithra Raghu
4 Joseph Kim
5 Jakob Kaminsky
6 Kabir Walia
7 Derek Lim
8 Aarushi Parashar
COURSE INFORMATION
Course webpage is the official source of information: http://www.cs.cornell.edu/Courses/cs4786/2020sp
Join Piazza: https://piazza.com/class/k5igx6vp7de1if
TA office hours will start from week 2. Times and locations will be posted on the info tab of the course webpage.
Basic knowledge of Python is required.
PLACEMENT EXAM!
Passing the placement exam is required to enroll
Exam can be found at: http://www.cs.cornell.edu/courses/cs4786/2020sp/hw0.html
Upload your solutions in PDF format via the Google Form indicated on the exam page.
Your score on the placement exam is only for feedback; it does not count towards your grade for the course.
. . . But you need to do adequately well to stay in the course.
COURSE GRADES
Assignments: 28%
Prelim: 25%
Final: 25%
Competition: 20%
Surveys: 2%
ASSIGNMENTS
Total of 4 assignments.
Each worth 7% of the grade.
Will be on Vocareum (using Jupyter notebooks/Python).
Must be done individually.
EXAMS
1 Prelim
On March 26th in URHG01
Worth 25% of the grade.
2 Final
Not yet scheduled, but will be soon . . .
Worth 25% of the grade.
COMPETITIONS
One in-class Kaggle competition worth 20% of the grade
You are allowed to work in groups of at most 4.
Kaggle scores factor into only part of the grade.
Grades for the project focus more on your thought process (demonstrated through your reports).
SURVEYS
2 surveys worth 1% each, just for participation
Surveys will be anonymous (I will only have a list of students who participated)
An important form of feedback I can use to steer the class
A free forum for you to tell us what you want.
ACADEMIC INTEGRITY
1 Zero-tolerance policy: no exceptions
We have checks in place to look for violations on Vocareum
2 If you use any source (internet, book, paper, or personal communication), cite it.
3 When in doubt, cite.
SOME INFO ABOUT CLASS . . .
I would really love this class to be interactive.
Feel free to ask me anything.
We will have informal quizzes and handouts in most classes.
Let's get started . . .
DATA DELUGE
Each time you use your credit card: who purchased what, where, and when
Netflix, Hulu, smart TVs: what different groups of people like to watch
Social networks like Facebook, Twitter, . . . : who is friends with whom, what these people post or tweet about
Millions of photos and videos, many tagged
Wikipedia, all the news websites: pretty much most of human knowledge
Guess?
Social Network of Marvel Comic Characters!
by Cesc Rosselló, Ricardo Alberich, and Joe Miro from the University of the Balearic Islands
What can we learn from all this data?
WHAT IS MACHINE LEARNING?
Use data to automatically learn to perform tasks better.
Close in spirit to T. Mitchell’s description
WHERE IS IT USED?
Movie Rating Prediction
Pedestrian Detection
Market Predictions
Spam Classification
MORE APPLICATIONS
Each time you use your search engine
Autocomplete: Blame machine learning for bad spellings
Biometrics: reason you shouldn’t smile
Recommendation systems: what you may like to buy based onwhat your friends and their friends buy
Computer vision: self driving cars, automatically tagging photos
Topic modeling: Automatically categorizing documents/emailsby topics or music by genre
. . .
COURSE SYNOPSIS
Primary focus: Unsupervised learning
Roughly speaking, 4 parts:
1 Dimensionality reduction: Principal Component Analysis, Random Projections, Canonical Correlation Analysis, Kernel PCA, t-SNE, Spectral Embedding
2 Clustering: single-linkage, hierarchical clustering, k-means, Gaussian Mixture Models
3 Probabilistic models and graphical models: mixture models, the EM algorithm, Hidden Markov Models, inference and learning in graphical models, approximate inference
4 Socially responsible ML: privacy in ML, Differential Privacy, fairness, robustness against polarization
UNSUPERVISED LEARNING
Given (unlabeled) data, find useful information, patterns, or structure
Dimensionality reduction/compression: compress the data set by removing redundancy and retaining only useful information
Clustering: find meaningful groupings in the data
Topic modeling: discover topics/groups with which we can tag data points
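As a sketch of the clustering idea (assuming NumPy; k-means itself is covered later in the course), Lloyd's k-means algorithm alternates between assigning each point to its nearest center and moving each center to the mean of its assigned points:

```python
import numpy as np

def kmeans(X, k, iters=50, init=None, seed=0):
    """Lloyd's algorithm: alternately assign points to their nearest
    center, then move each center to the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    if init is None:
        centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    else:
        centers = np.asarray(init, dtype=float).copy()
    for _ in range(iters):
        # (n, k) matrix of point-to-center distances
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):          # avoid emptying a cluster
                centers[j] = X[labels == j].mean(axis=0)
    # final assignment against the updated centers
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1), centers

# two well-separated blobs; one initial center placed in each blob
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(5.0, 0.5, size=(50, 2))])
labels, centers = kmeans(X, k=2, init=X[[0, 50]])
```

On data like this, the algorithm recovers the two blobs as the two clusters; the choice of k and of the initial centers is up to the user.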
DIMENSIONALITY REDUCTION
Given n data points in a high-dimensional space, compress them into corresponding n points in a lower-dimensional space.
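One simple way to realize such a map is a random linear projection, one of the methods listed in the synopsis. A minimal sketch, assuming NumPy: multiply the data matrix by a random Gaussian matrix, scaled so that norms and distances are preserved in expectation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 500, 1000, 20            # n points, original dimension d, target dimension k
X = rng.normal(size=(n, d))        # data matrix, one point per row

# Gaussian random projection, scaled by 1/sqrt(k) so that squared norms
# (and squared distances) are preserved in expectation
R = rng.normal(size=(d, k)) / np.sqrt(k)
Y = X @ R                          # the same n points, now in k dimensions

print(X.shape, "->", Y.shape)      # (500, 1000) -> (500, 20)
```

The projection matrix never looks at the data, which is what makes this method so cheap; the data-dependent methods covered later (PCA and friends) choose the directions more carefully.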
WHY DIMENSIONALITY REDUCTION?
Data compression
Data visualization
For computational ease
As input to supervised learning algorithm
Before clustering to remove redundant information and noise
Noise reduction
Can we compress the following images using JPEG?
What if our dataset looked like this?
Visualizing Data
DIMENSIONALITY REDUCTION
Desired properties:
1 Original data can be (approximately) reconstructed
2 Distances between data points are preserved
3 “Relevant” information is preserved
4 Redundant information is removed
5 Models our prior knowledge about the real world
Based on the choice of desired property and formalism, we get different methods.
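The reconstruction property can be made concrete with principal component analysis (a sketch, assuming NumPy; PCA is covered in detail later): keep only the top principal directions and measure how well the original data is rebuilt from the compressed representation.

```python
import numpy as np

rng = np.random.default_rng(0)
# data that is essentially 2-dimensional, embedded in 10 dimensions with a little noise
Z = rng.normal(size=(200, 2))
A = rng.normal(size=(2, 10))
X = Z @ A + 0.01 * rng.normal(size=(200, 10))

mu = X.mean(axis=0)
U, S, Vt = np.linalg.svd(X - mu, full_matrices=False)
W = Vt[:2].T                       # top-2 principal directions, shape (10, 2)

Y = (X - mu) @ W                   # compressed representation: 200 points in 2-d
X_hat = Y @ W.T + mu               # approximate reconstruction back in 10-d

rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.4f}")   # small: only the noise is lost
```

Because the data truly lies near a 2-dimensional subspace, compressing 10 coordinates down to 2 loses almost nothing; on data without such structure, the reconstruction error tells you how much information the compression discards.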
SNEAK PEEK
Linear projections
Principal Component Analysis