stanford university :// · serving data, exploratory data analytics, advanced topics on big data...
![Page 1: Stanford University :// · Serving data, Exploratory Data Analytics, Advanced Topics on Big Data Mining. Work Weight Comment Weakly Readings 10% 1% each 3 Assignments 30% 10% each](https://reader036.vdocuments.site/reader036/viewer/2022062602/5ed72ffec30795314c175c3f/html5/thumbnails/1.jpg)
Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullman (Stanford University)
http://www.mmds.org
What is the purpose of big data systems?
To support analysis and knowledge discovery from very large amounts of data
Data contains value and knowledge
But to extract the knowledge, data needs to be:
Stored (emphasis in this class)
Managed (emphasis in this class)
Analyzed (emphasis in this class)
Visualized
Data Analytics ≈ Data Mining ≈ Big Data ≈ Predictive Analytics ≈ Data Science
Given lots of data, discover patterns and models that are:
Valid: hold on new data with some certainty
Useful: should be possible to act on the item
Unexpected: non-obvious to the system
Understandable: humans should be able to interpret the pattern
Scalability
Streaming
Context
Quality
Usage
This class places more emphasis on:
Storage Systems
Distributed Computing Platforms
Algorithms, Scalability
Automation for handling large data
Machine Learning
Visualization
Database Systems
Data Mining
We will learn to process different types of data:
Data is high dimensional
Data is a graph
Data is infinite/never-ending
Data is labeled
We will learn to use different models of computation:
Distributed (MapReduce)
Streams and online algorithms
Single machine in-memory
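As a rough illustration of the distributed (MapReduce) model named above, here is a minimal single-machine sketch of a word count in Python. It only imitates the map, shuffle, and reduce phases in memory; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample documents are assumptions made for this example and are not taken from the course material or from the Hadoop API.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence on an in-memory list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Hypothetical input: two tiny "documents".
counts = word_count(["big data systems", "big data mining"])
print(counts)
```

In a real deployment the map and reduce calls would run on different machines and the shuffle would move data over the network; the sketch keeps everything in one process to show only the data flow.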
Hands-on experience working with systems and tools for storing and processing big data:
MapReduce/Hadoop
Hive/BigQuery
Apache Spark
OpenRefine
…
How do you want that data?
Website http://www.eecs.yorku.ca/~papaggel/courses/eecs4415/
Piazza Q&A website: Available from the website
http://piazza.com/yorku.ca/fall2018/eecs4415
You need to register with your yorku.ca email
Please participate and help each other!
e-mail for personal issues: [email protected]
Course prerequisites:
EECS-3421: Introduction to Database Systems
EECS-3101: Design and Analysis of Algorithms
General prerequisites:
No single topic in the course is too hard by itself
But we will cover and touch upon many topics, and this is what makes the course hard
Good background in:
Database Systems
Algorithms
Programming: you should be able to write non-trivial programs (in Python)
Component I: Data-driven Organizations, Data Ingestion, Data Quality, Data Lakes, Data Cleaning
Component II: Computing Platforms, Storage Systems, Distributed Processing Systems (for general-purpose batch data, structured data, graph data, streaming data), Data Processing Methods (aggregation, grouping, filtering)
Component III: Serving Data, Exploratory Data Analytics, Advanced Topics on Big Data Mining
| Work | Weight | Comment |
|---|---|---|
| Weekly Readings | 10% | 1% each |
| 3 Assignments | 30% | 10% each |
| Team Project (team project + source + report) | 30% | Proposal: 15%; Project milestone: 25%; Class presentation: 20%; Final report: 40% |
| Final Exam | 30% | Final exam grade must be > 40% |
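The weights above can be combined into a single course score. The following Python sketch shows the arithmetic under some assumptions: every component is scored on a 0-100 scale, the project percentages are read as fractions of the project grade, and the exam requirement is only reported as a flag because the slides do not say what happens when it is not met. All names (`COURSE_WEIGHTS`, `project_score`, `course_score`) and the sample scores are hypothetical.

```python
# Weights as listed in the grading scheme (fractions of the course grade).
COURSE_WEIGHTS = {
    "readings": 0.10,
    "assignments": 0.30,
    "project": 0.30,
    "final_exam": 0.30,
}

# Assumed to be fractions of the project grade, not of the course grade.
PROJECT_WEIGHTS = {
    "proposal": 0.15,
    "milestone": 0.25,
    "presentation": 0.20,
    "final_report": 0.40,
}

EXAM_THRESHOLD = 40.0  # the final exam grade must be strictly above this


def project_score(parts):
    """Weighted project grade from per-deliverable scores on a 0-100 scale."""
    return sum(PROJECT_WEIGHTS[name] * parts[name] for name in PROJECT_WEIGHTS)


def course_score(readings, assignments, project_parts, final_exam):
    """Return (weighted total, whether the exam requirement is met)."""
    scores = {
        "readings": readings,
        "assignments": assignments,
        "project": project_score(project_parts),
        "final_exam": final_exam,
    }
    total = sum(COURSE_WEIGHTS[name] * scores[name] for name in COURSE_WEIGHTS)
    return total, final_exam > EXAM_THRESHOLD


# Hypothetical student scores, for illustration only.
total, exam_ok = course_score(
    readings=100,
    assignments=80,
    project_parts={"proposal": 90, "milestone": 80, "presentation": 70, "final_report": 85},
    final_exam=60,
)
print(round(total, 2), exam_ok)
```

With these sample scores the project grade works out to 81.5, and the weighted total to 10 + 24 + 24.45 + 18 = 76.45.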
You need to:
identify a problem
find data
design a big data architecture
prepare data for analysis
process data
uncover insights
communicate critical findings
create a data-driven solution
+ team-work
Need for data collection
Need for data storage
Need for data analysis
Need for data visualization (optional)
Collection → Storage → Analysis → Visualization
…but more of an iterative process than a sequence
www.kaggle.com
Text Data
Multivariate Data
Network Data
Big Data Systems
Visualization Tools
Distributed Systems
Data Mining & ML
Exploratory Data Analysis
Databases
Current interest in Data Science: You are interested in the general area of data science
Interest in Big Data Technologies: You are interested in big data systems and engineering
Interest in Big Data Analytics: You are interested in finding interesting patterns and insights in large amounts of data
Data Analytics
+ tools for data analytics
| Item | Comment |
|---|---|
| Classes | Tue @ 19:00-22:00 |
| Classroom | SC 216 (Stong College) |
| Credits | 3 |
| Website | http://www.eecs.yorku.ca/~papaggel/courses/eecs4415/ |
| Office hour | Drop by my office (LAS3050) anytime, or by appointment |