Introduction of Data Science
![Page 2: Introduction of Data Science](https://reader035.vdocuments.site/reader035/viewer/2022062900/58d19c761a28ab6f6b8b4fcd/html5/thumbnails/2.jpg)
Agenda
What is big data
What is data science
Data science applications
System infrastructure
Case study – recommendation system
![Page 3: Introduction of Data Science](https://reader035.vdocuments.site/reader035/viewer/2022062900/58d19c761a28ab6f6b8b4fcd/html5/thumbnails/3.jpg)
![Page 4: Introduction of Data Science](https://reader035.vdocuments.site/reader035/viewer/2022062900/58d19c761a28ab6f6b8b4fcd/html5/thumbnails/4.jpg)
Data Scientist
Analytics
Artificial Intelligence
Statistics
Natural Language Processing
Feature Engineering
Scientific Method
Simulation
Data & Text Mining
Machine Learning
Predictive Modeling
Graph Analytics
Data Management
Data Warehousing
Mashups
Databases
Business Intelligence
Big Data
Information Retrieval
Art & Design
Business Mindset
Computer Science
Visualization
Communication
Data Product Design
Domain Knowledge
Ethics
Privacy & Security
Programming
Cloud Computing
Distributed Systems
Technology & Infrastructure
Growth Hacking
Social Networks
Public Relations
Online Tools
Resources
![Page 5: Introduction of Data Science](https://reader035.vdocuments.site/reader035/viewer/2022062900/58d19c761a28ab6f6b8b4fcd/html5/thumbnails/5.jpg)
Data Science Applications
Recommendation System
Self-driving
Text Recognition
Spam Filtering
![Page 6: Introduction of Data Science](https://reader035.vdocuments.site/reader035/viewer/2022062900/58d19c761a28ab6f6b8b4fcd/html5/thumbnails/6.jpg)
Source: https://en.wikipedia.org/wiki/Data_science#/media/File:Data_visualization_process_v1.png
![Page 7: Introduction of Data Science](https://reader035.vdocuments.site/reader035/viewer/2022062900/58d19c761a28ab6f6b8b4fcd/html5/thumbnails/7.jpg)
Machine Learning Algorithms
Supervised learning
Regression
Classification
Neural networks, deep learning
Unsupervised learning
Clustering
![Page 8: Introduction of Data Science](https://reader035.vdocuments.site/reader035/viewer/2022062900/58d19c761a28ab6f6b8b4fcd/html5/thumbnails/8.jpg)
Recommendation System
Recommendation systems are a subclass of information filtering systems that seek to predict the “rating” or “preference” that a user would give to an item. (Source: Wikipedia)
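In this framing, a recommender's job is to fill in the missing cells of a user-item rating matrix and surface the highest-scoring ones. The Python sketch below assumes a tiny hypothetical ratings dictionary and uses each item's mean rating as a stand-in predictor (a simple baseline, not one of the algorithms covered later); `RATINGS`, `predict`, and `recommend` are illustrative names, not part of any library.

```python
from statistics import mean
from typing import Optional

# Hypothetical user-item ratings on a 1-5 scale; a missing key means "not rated yet".
RATINGS: dict[str, dict[str, int]] = {
    "alice": {"item1": 5, "item2": 3, "item4": 1},
    "bob": {"item1": 4, "item3": 4, "item4": 1},
    "carol": {"item2": 2, "item3": 5, "item4": 4},
}


def predict(item: str) -> Optional[float]:
    """Placeholder predictor (assumption): the item's mean rating among users who rated it."""
    scores = [row[item] for row in RATINGS.values() if item in row]
    return mean(scores) if scores else None


def recommend(user: str, n: int = 2) -> list[tuple[str, float]]:
    """Predict a score for each item the user has not rated and return the top n."""
    all_items = {item for row in RATINGS.values() for item in row}
    unrated = all_items - set(RATINGS[user])
    scored = [(item, predict(item)) for item in sorted(unrated)]
    scored = [(item, score) for item, score in scored if score is not None]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]


print(recommend("alice"))  # [('item3', 4.5)]
```

Swapping `predict` for a collaborative-filtering predictor leaves `recommend` unchanged, which is why the two are kept separate here.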
![Page 9: Introduction of Data Science](https://reader035.vdocuments.site/reader035/viewer/2022062900/58d19c761a28ab6f6b8b4fcd/html5/thumbnails/9.jpg)
Case Study
![Page 10: Introduction of Data Science](https://reader035.vdocuments.site/reader035/viewer/2022062900/58d19c761a28ab6f6b8b4fcd/html5/thumbnails/10.jpg)
Algorithms
Collaborative filtering
Content-based recommendation
Learning to rank
Context-aware recommendation
Social network recommendation
![Page 11: Introduction of Data Science](https://reader035.vdocuments.site/reader035/viewer/2022062900/58d19c761a28ab6f6b8b4fcd/html5/thumbnails/11.jpg)
Collaborative Filtering
Basic Assumptions
• Users with similar interests have common preferences
• A sufficiently large number of user preferences is available
Main Approaches
• User-based
• Item-based
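The first assumption is usually made measurable with a similarity score between users. As a minimal sketch, assuming ratings are stored as plain Python dicts, the function below computes the Pearson correlation over the items two users have both rated. The 0.0 fallback for fewer than two co-rated items is an assumption of this sketch, and the users and item names are hypothetical.

```python
from math import sqrt


def pearson(a: dict[str, float], b: dict[str, float]) -> float:
    """Pearson correlation of two users' ratings, computed over the items both have rated."""
    common = a.keys() & b.keys()
    if len(common) < 2:
        return 0.0  # assumption: too little overlap to measure, treat as uncorrelated
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = sqrt(sum((a[i] - mean_a) ** 2 for i in common)) * sqrt(
        sum((b[i] - mean_b) ** 2 for i in common)
    )
    return num / den if den else 0.0  # a user with constant ratings has no variance


alice = {"m1": 5, "m2": 3, "m3": 1}
bob = {"m1": 4, "m2": 3, "m3": 2, "m4": 5}  # m4 is ignored: alice has not rated it
carol = {"m1": 1, "m2": 3, "m3": 5}
print(round(pearson(alice, bob), 3), round(pearson(alice, carol), 3))  # 1.0 -1.0
```

Alice and Bob rank the shared items the same way, so they correlate perfectly even though their absolute ratings differ; Carol's preferences are reversed.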
![Page 12: Introduction of Data Science](https://reader035.vdocuments.site/reader035/viewer/2022062900/58d19c761a28ab6f6b8b4fcd/html5/thumbnails/12.jpg)
User-based Filtering
Use the user-item rating matrix
Make user-to-user correlations
Find highly correlated users
Recommend items to the target user
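The user-based steps can be sketched in Python as follows. This is an illustration under stated assumptions: the rating matrix is hypothetical, similarity is Pearson correlation over co-rated items, "highly correlated" is taken to mean the top `k` users with a positive correlation, and unrated items are scored by a similarity-weighted average of those neighbors' ratings. `recommend_user_based` is a hypothetical helper name.

```python
from math import sqrt

# Hypothetical user-item rating matrix; a missing key means "not rated".
RATINGS = {
    "u1": {"i1": 5, "i2": 4, "i3": 1},
    "u2": {"i1": 4, "i2": 5, "i3": 2, "i4": 5},
    "u3": {"i1": 1, "i2": 2, "i3": 5, "i4": 1},
}


def pearson(a: dict, b: dict) -> float:
    """Pearson correlation over co-rated items (0.0 when it cannot be computed)."""
    common = a.keys() & b.keys()
    if len(common) < 2:
        return 0.0
    ma = sum(a[i] for i in common) / len(common)
    mb = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - ma) * (b[i] - mb) for i in common)
    den = sqrt(sum((a[i] - ma) ** 2 for i in common) * sum((b[i] - mb) ** 2 for i in common))
    return num / den if den else 0.0


def recommend_user_based(user: str, k: int = 2, n: int = 3) -> list[tuple[str, float]]:
    # Make user-to-user correlations between the target user and everyone else.
    sims = {v: pearson(RATINGS[user], row) for v, row in RATINGS.items() if v != user}
    # Find highly correlated users: the top k with a positive correlation.
    neighbors = sorted((v for v in sims if sims[v] > 0), key=sims.get, reverse=True)[:k]
    # Recommend items the neighbors rated but the target user has not,
    # scored by a similarity-weighted average of the neighbors' ratings.
    candidates = {i for v in neighbors for i in RATINGS[v]} - set(RATINGS[user])
    scores = {}
    for item in candidates:
        raters = [v for v in neighbors if item in RATINGS[v]]
        weight = sum(sims[v] for v in raters)
        scores[item] = sum(sims[v] * RATINGS[v][item] for v in raters) / weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]


print(recommend_user_based("u1"))  # u2 is the only positive neighbor, so i4 scores about 5.0
```

Here u3's tastes run opposite to u1's (a negative correlation), so u3's low rating of i4 is ignored rather than used as evidence.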
![Page 13: Introduction of Data Science](https://reader035.vdocuments.site/reader035/viewer/2022062900/58d19c761a28ab6f6b8b4fcd/html5/thumbnails/13.jpg)
Item-based Filtering
Use the user-item ratings matrix
Make item-to-item correlations
Find items that are highly correlated
Recommend items with highest correlation
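A minimal sketch of the item-to-item correlation step, assuming the same dict-of-dicts storage: the ratings are transposed so each item becomes a column of user ratings, and Pearson correlation is computed over the users who rated both items. The matrix and the helpers `item_columns` and `most_similar` are hypothetical.

```python
from math import sqrt

# Hypothetical user-item ratings; a missing key means "not rated".
RATINGS = {
    "u1": {"i1": 5, "i2": 4, "i3": 1, "i4": 2},
    "u2": {"i1": 4, "i2": 5, "i3": 2},
    "u3": {"i1": 1, "i2": 2, "i3": 5, "i4": 4},
    "u4": {"i1": 2, "i3": 4, "i4": 5},
}


def item_columns(ratings: dict) -> dict:
    """Transpose user -> {item: rating} into item -> {user: rating}."""
    cols: dict = {}
    for user, row in ratings.items():
        for item, r in row.items():
            cols.setdefault(item, {})[user] = r
    return cols


def pearson(a: dict, b: dict) -> float:
    """Pearson correlation over the users who rated both items (0.0 if undefined)."""
    common = a.keys() & b.keys()
    if len(common) < 2:
        return 0.0
    ma = sum(a[u] for u in common) / len(common)
    mb = sum(b[u] for u in common) / len(common)
    num = sum((a[u] - ma) * (b[u] - mb) for u in common)
    den = sqrt(sum((a[u] - ma) ** 2 for u in common) * sum((b[u] - mb) ** 2 for u in common))
    return num / den if den else 0.0


def most_similar(item: str, n: int = 3) -> list[tuple[str, float]]:
    """Item-to-item correlations for one item, highest first."""
    cols = item_columns(RATINGS)
    sims = [(other, pearson(cols[item], col)) for other, col in cols.items() if other != item]
    return sorted(sims, key=lambda pair: pair[1], reverse=True)[:n]


print([name for name, _ in most_similar("i1")])  # ['i2', 'i4', 'i3']
```

Item-to-item similarities are typically precomputed offline, since they depend on the whole matrix rather than on a single user's request.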
![Page 14: Introduction of Data Science](https://reader035.vdocuments.site/reader035/viewer/2022062900/58d19c761a28ab6f6b8b4fcd/html5/thumbnails/14.jpg)
Steps in item-based CF
Predicted rating for item 2 for user 1
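The matrix on the prediction slide is not recoverable from this transcript, so the sketch below uses a hypothetical four-user matrix in which user1 has not rated item2. It assumes the weighted-sum form commonly used in item-based CF: the predicted rating is the sum, over items j that user1 rated, of sim(item2, j) × rating(user1, j), divided by the sum of those similarities. Keeping only positively correlated items is an additional assumption of this sketch; `item_sim` and `predict` are hypothetical names.

```python
from math import sqrt
from typing import Optional

# Hypothetical ratings: user1 has not rated item2, which is the cell to predict.
RATINGS = {
    "user1": {"item1": 4, "item3": 5, "item4": 1},
    "user2": {"item1": 5, "item2": 4, "item3": 4, "item4": 2},
    "user3": {"item1": 2, "item2": 1, "item3": 2, "item4": 5},
    "user4": {"item1": 4, "item2": 5, "item3": 5, "item4": 1},
}


def item_sim(i: str, j: str) -> float:
    """Pearson correlation between items i and j over users who rated both."""
    users = [u for u, row in RATINGS.items() if i in row and j in row]
    if len(users) < 2:
        return 0.0
    xi = [RATINGS[u][i] for u in users]
    xj = [RATINGS[u][j] for u in users]
    mi, mj = sum(xi) / len(xi), sum(xj) / len(xj)
    num = sum((a - mi) * (b - mj) for a, b in zip(xi, xj))
    den = sqrt(sum((a - mi) ** 2 for a in xi) * sum((b - mj) ** 2 for b in xj))
    return num / den if den else 0.0


def predict(user: str, item: str) -> Optional[float]:
    """Weighted sum of the user's own ratings on items similar to `item`."""
    # Step 1: similarity of the target item to every item the user has rated.
    sims = {j: item_sim(item, j) for j in RATINGS[user] if j != item}
    # Step 2: keep positively correlated items only (an assumption of this sketch).
    neighbors = {j: s for j, s in sims.items() if s > 0}
    if not neighbors:
        return None
    # Step 3: weight each of the user's ratings by its item's similarity.
    num = sum(s * RATINGS[user][j] for j, s in neighbors.items())
    return num / sum(neighbors.values())


print(round(predict("user1", "item2"), 2))  # 4.54
```

For this toy matrix both positive similarities share the denominator √364 (16/√364 for item1 and 19/√364 for item3, while item4 correlates at -1 and is dropped), so the prediction reduces to (16·4 + 19·5) / (16 + 19) = 159/35 ≈ 4.54.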
![Page 15: Introduction of Data Science](https://reader035.vdocuments.site/reader035/viewer/2022062900/58d19c761a28ab6f6b8b4fcd/html5/thumbnails/15.jpg)
Problems with Collaborative Filtering
New user cold start problem: a user with no ratings yet cannot be matched to similar users
New item cold start problem: an item with no ratings yet cannot be recommended
Popularity bias: CF tends to recommend only popular items
Sparsity problem: if there are many items to be recommended, the user-item rating matrix is sparse, and it is hard to find users who have rated the same items
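To make the sparsity problem concrete, the sketch below measures how full a hypothetical rating matrix is and how many item pairs share enough raters to be correlated. The data and the threshold of at least two co-raters (the minimum for a Pearson correlation) are assumptions for illustration.

```python
from itertools import combinations

# Hypothetical sparse ratings: 4 users, 6 items, only 7 of 24 cells filled.
RATINGS = {
    "u1": {"i1": 5, "i2": 3},
    "u2": {"i3": 4},
    "u3": {"i2": 2, "i4": 5},
    "u4": {"i5": 1, "i6": 4},
}


def density(ratings: dict) -> float:
    """Fraction of the user x item matrix that actually holds a rating."""
    items = {i for row in ratings.values() for i in row}
    filled = sum(len(row) for row in ratings.values())
    return filled / (len(ratings) * len(items))


def co_raters(ratings: dict, a: str, b: str) -> int:
    """Number of users who rated both item a and item b."""
    return sum(1 for row in ratings.values() if a in row and b in row)


items = sorted({i for row in RATINGS.values() for i in row})
pairs = list(combinations(items, 2))
no_overlap = sum(1 for a, b in pairs if co_raters(RATINGS, a, b) == 0)
usable = sum(1 for a, b in pairs if co_raters(RATINGS, a, b) >= 2)
print(f"density={density(RATINGS):.2f}, pairs={len(pairs)}, no overlap={no_overlap}, usable={usable}")
# density=0.29, pairs=15, no overlap=12, usable=0
```

Even with seven ratings, none of the 15 item pairs has two users in common, so a correlation-based CF would find no usable item-to-item similarities in this matrix.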
![Page 16: Introduction of Data Science](https://reader035.vdocuments.site/reader035/viewer/2022062900/58d19c761a28ab6f6b8b4fcd/html5/thumbnails/16.jpg)