-
Data Management & Data Visualization
-
Outline
• Teaching team
• Course goal and organization
• Final exam
• Experience from the past
-
Teaching team
• Data management
– Prof. Andrea Maurino (lead professor)
– Dr. Anisa Rula (assistant professor)
– Dr. Vincenzo Cutrona (laboratory)
• Data Visualization
– Dr. Federico Cabitza (associate professor)
– Mister X
-
Course organization
• Main topic: DATA LIFE CYCLE
– Data science is not only BIG DATA
[Figure: Data management and Data visualization (storytelling) alongside Machine learning and decision models and Statistical modelling]
-
Data management
• Capture: download, SQL queries, API, (web) scraper
• Store: Hadoop, database management systems (relational or NoSQL)
• Analyze: query languages (Spark, SQL, NoSQL); Python, R, …
• Use: Tableau
• Process: batch, stream
• Enrich: quality, integration
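A minimal Python sketch of several life-cycle stages on toy data, assuming everything fits in memory and is processed as a single batch: a (web) scraper captures an HTML table, a quality step drops rows with a missing count, an integration step joins a second source in a different format (CSV), and the result is stored in an in-memory SQLite database and queried with SQL. The page, the CSV and all helper names (`TableScraper`, `clean`, `integrate`, `store_and_query`) are hypothetical; a real project would download the page or call an API instead of using inline strings.

```python
import csv
import io
import sqlite3
from html.parser import HTMLParser

# Hypothetical page standing in for a downloaded web page (Capture).
PAGE = """
<table>
  <tr><th>city</th><th>tweets</th></tr>
  <tr><td>Milano</td><td>120</td></tr>
  <tr><td>Roma</td><td>95</td></tr>
  <tr><td>Napoli</td><td></td></tr>
</table>
"""

# Hypothetical second source in a different format, used for Integration.
REGIONS_CSV = """city,region
Milano,Lombardia
Roma,Lazio
Napoli,Campania
"""


class TableScraper(HTMLParser):
    """Capture: collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._row.append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr":
            if self._row:  # header rows use <th>, so their cell list stays empty
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


def clean(rows):
    """Enrich (quality): keep only rows whose count is a non-negative integer."""
    return [(city, int(count)) for city, count in rows if count.isdigit()]


def integrate(counts, regions_csv):
    """Enrich (integration): attach each city's region from the second source."""
    reader = csv.DictReader(io.StringIO(regions_csv))
    region_of = {row["city"]: row["region"] for row in reader}
    return [(city, region_of.get(city), n) for city, n in counts]


def store_and_query(records):
    """Store the records in SQLite, then run a descriptive SQL query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tweets (city TEXT, region TEXT, n INTEGER)")
    conn.executemany("INSERT INTO tweets VALUES (?, ?, ?)", records)
    # Analyze: total tweets per region, largest first.
    result = conn.execute(
        "SELECT region, SUM(n) AS total FROM tweets "
        "GROUP BY region ORDER BY total DESC"
    ).fetchall()
    conn.close()
    return result


scraper = TableScraper()
scraper.feed(PAGE)
records = integrate(clean(scraper.rows), REGIONS_CSV)
print(store_and_query(records))  # [('Lombardia', 120), ('Lazio', 95)]
```

Napoli is dropped by the quality step because its count is missing, so it never reaches the database.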
-
Data Visualization
-
Textbooks
• Davy Cielen, Arno D. B. Meysman and Mohamed Ali, Introducing Data Science, Manning, 2016.
• Harrison, Next Generation Databases, Apress, 2015.
• Rezzani, Big Data Analytics, Apogeo, 2017.
• There is no need to buy these textbooks, but we will draw inspiration and material from them.
-
Exam
• A written exam (40% of the final grade)
• One joint project (60% of the final grade)
– Data management: find an issue → discover datasets, select them, acquire, clean, (integrate), store and query the data (descriptive, not predictive)
– Data visualization: preliminary exploration and storytelling (on a subset of the data)
• Both the written score and the project score remain valid for one academic year
– The two parts of the exam can be taken separately
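As a worked example of the weighting, assuming both parts are graded on the Italian 30-point scale and simply combined as a weighted average (the slide does not say how the result is rounded), a written score of 25 and a project score of 29 give 0.4 × 25 + 0.6 × 29 = 10 + 17.4 = 27.4. The same arithmetic as a small Python sketch, with hypothetical scores:

```python
WRITTEN_WEIGHT = 0.4  # share of the written exam, from the slide
PROJECT_WEIGHT = 0.6  # share of the project, from the slide


def weighted_grade(written, project):
    """Weighted average of two scores on the 30-point scale (no rounding applied)."""
    return WRITTEN_WEIGHT * written + PROJECT_WEIGHT * project


# Hypothetical scores: 25 on the written exam, 29 on the project.
print(round(weighted_grade(25, 29), 2))  # 27.4
```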
-
Minimum requirements for the project
• The project must be pre-approved by the teacher
• Groups of 1 to 3 students
• At least 2 of the 3 Vs
• The final report, code and data must be shared with the teacher via Google Drive by the day of the written exam
PICK TWO!
Volume (at least 2 GB of data)
Velocity (real-time collection and analysis)
Variety (at least 2 different data sources with different formats)
Social listening
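One way to read the "pick two" rule is as a simple check over a project proposal. The sketch below is hypothetical: the `ProjectPlan` fields and the reduction of velocity to a yes/no flag are illustrative assumptions, while the 2 GB threshold and the two-sources-with-different-formats condition come from the slide.

```python
from dataclasses import dataclass


@dataclass
class ProjectPlan:
    """Hypothetical summary of a project proposal; field names are illustrative."""
    data_size_gb: float        # total volume of collected data
    real_time: bool            # are data collected and analysed in real time?
    source_formats: list[str]  # one entry per data source, e.g. "json", "csv"


def satisfied_vs(plan):
    """Return the Vs that the plan meets, using the thresholds from the slide."""
    vs = []
    if plan.data_size_gb >= 2:
        vs.append("Volume")
    if plan.real_time:
        vs.append("Velocity")
    # Variety: at least two sources whose formats differ.
    if len(set(plan.source_formats)) >= 2:
        vs.append("Variety")
    return vs


def meets_minimum(plan):
    """The slide asks for at least 2 of the 3 Vs."""
    return len(satisfied_vs(plan)) >= 2


plan = ProjectPlan(data_size_gb=3.5, real_time=False, source_formats=["json", "csv"])
print(satisfied_vs(plan), meets_minimum(plan))  # ['Volume', 'Variety'] True
```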
-
Spotywhy
• http://www.infodata.ilsole24ore.com/2018/06/28/si-costruisce-lartista-musicale-successo-chiedilo-spotify/
-
Social Listening
• https://www.infodata.ilsole24ore.com/2018/07/08/spagna-portogallo-finita-3-3-sui-social-cosa-successo-2/
-
August
• What, when and where Italians tweet in August
-
Other examples
• Trains, Auditel, Emmy Awards, ATP Cincinnati, e-sports, Trivago & Booking, cryptocurrencies, stocks, Amazon, medial data…
• Where can I find datasets to get some ideas?
– Open data portals
– Kaggle
– Ask the teacher!
-
Virtual lab
• Microsoft Azure
– One virtual PC with 8 GB RAM, Intel i7, 1 TB HDD
– Some low-cost PCs for collecting data
-
Last year (86 students)
                DM written score   DM project score   DM score   Data viz score   Final grade
% of students   87.21%             81.40%             80.23%     80.23%           79.07%
Average grade   25.14              28.33              27.32      27.32            27.84
Known issues
- exam procedure (project)
- virtual lab
- teaching (both DM and DV)
- sharing lectures
Course (code and name): [F9101Q003] DATA MANAGEMENT AND VISUALIZATION
Attending students (Freq)
- Prior knowledge: 1.24
- Teaching material: 1.2
- Clarity of exam procedures: 1.19
- Adherence to the timetable: 2.22
- Stimulating students' interest: 2.09
- Presentation: 1.81
- Usefulness of supplementary teaching: 1.35
- Consistency with the declared course offering: 1.94
- Availability of the teacher: 2.26
- Interest in the subject: 2.57
- Overall satisfaction: 1.48
- Teaching effectiveness: 2.02
- Organizational aspects: 1.64
Non-attending students (Non_Freq): 1.48, 1.29, 1.24, 2.13, 2.24, 1.38, 2.13, 1.26