CS 1944: Sophomore Seminar
Big Data and Machine Learning
B. Aditya Prakash, Assistant Professor
Nov 3, 2015
About me
Assistant Professor, CS
– Member, Discovery Analytics Center
Previously
– Ph.D. in Computer Science, Carnegie Mellon University
– B.Tech in Computer Science and Engg, Indian Institute of Technology (IIT) – Bombay
– Internships at Sprint, Yahoo, Microsoft Research
Prakash 2015
Data contains value and knowledge
Data and Business
Source: A. Machhanavajjhala
Data and Science
Data and Government
Source: A. Machhanavajjhala
Data and Culture
Source: A. Machhanavajjhala
Good news: Demand for Data Mining
How to extract value from data?
Manipulate data
– CS, domain expertise
Analyze data
– Math, CS, Stat…
Communicate your results
– CS, domain expertise
Communication is important!
What is Data Mining?
Given lots of data, discover patterns and models that are:
– Valid: hold on new data with some certainty
– Useful: should be possible to act on the item
– Unexpected: non-obvious to the system
– Understandable: humans should be able to interpret the pattern
Data Mining Tasks
Descriptive methods
– Find human-interpretable patterns that describe the data
• Example: Clustering
Predictive methods
– Use some variables to predict unknown or future values of other variables
• Example: Recommender systems
[Figure labels: ML & Stats., Comp. Systems, Theory & Algo., Biology, Econ., Social Science, Physics]
Big data
Data at CS, VT
Knowledge, Information and Data
http://www.cs.vt.edu/undergraduate/tracks/kid
People: Fox, Harrison, Huang, Lu (in NVA), Ramakrishnan (in NVA), Rozovskaya, Prakash
Courses
Background in some areas:
– CS3414 (Numerical Methods); also prob/stat
4000 level
– 4244 Internet Software Development
– 4604 Database Management Systems
– 4624 Capstone (Multimedia, Information Access)
– 4634 Design of Information (Capstone)
– 4804 AI
– 4984 Computational Linguistics (Capstone)
Discovery Analytics Center
MY RESEARCH
Networks are everywhere!
Human Disease Network [Barabasi 2007]
Gene Regulatory Network [Decourty 2008]
Facebook Network [2010]
The Internet [2005]
What else do they have in common?
High School Dating Network
Bearman et al., Am. Jnl. of Sociology, 2004. Image: Mark Newman
Blue: Male; Pink: Female
Interesting observations?
The Internet
Skewed Degrees
Robustness
Karate Club Network
Dynamical Processes over networks are also everywhere!
Why do we care?
– Social collaboration
– Information Diffusion
– Viral Marketing
– Epidemiology and Public Health
– Cyber Security
– Human mobility
– Games and Virtual Worlds
– Ecology
– ...
Why do we care? (1: Epidemiology)
Dynamical Processes over networks [AJPH 2007]
CDC data: Visualization of the first 35 tuberculosis (TB) patients and their 1039 contacts
Diseases over contact networks
Why do we care? (1: Epidemiology)
Dynamical Processes over networks
• Each circle is a hospital
• ~3000 hospitals
• More than 30,000 patients transferred
[US-MEDICARE NETWORK 2005]
Problem: Given k units of disinfectant, whom to immunize?
Why do we care? (1: Epidemiology)
CURRENT PRACTICE OUR METHOD
~6x fewer!
[US-MEDICARE NETWORK 2005]
Hospital-acquired infections took 99K+ lives, cost $5B+ (all per year)
Why do we care? (2: Online Diffusion)
> 800m users, ~$1B revenue [WSJ 2010]
~100m active users
> 50m users
Why do we care? (2: Online Diffusion)
Dynamical Processes over networks
Celebrity
Buy Versace™!
Followers
Social Media Marketing
Social Biological Contagion
Automatically learn models
Why do we care? (3: To change the world?)
Dynamical Processes over networks
Social networks and Collaborative Action
High Impact – Multiple Settings
Q. How to squash rumors faster?
Q. How do opinions spread?
Q. How to market better?
epidemic out-breaks
products/viruses
transmit s/w patches
Dynamical Processes = (a lot of) Networks + (some) Time-Series
Research Theme
DATA: Large real-world networks & processes
ANALYSIS: Understanding
POLICY/ACTION: Managing
Research Theme – Public Health
DATA: Modeling # patient transfers
ANALYSIS: Will an epidemic happen?
POLICY/ACTION: How to control out-breaks?
Research Theme – Social Media
DATA: Modeling Tweets spreading
ANALYSIS: # cascades in future?
POLICY/ACTION: How to market better?
A Question
How many of you think your friends have more friends than you?
A recent Facebook study
– Examined all of FB’s users: 721 million people with 69 billion friendships.
• about 10 percent of the world’s population!
– Found that a user’s friend count was less than the average friend count of his or her friends 93 percent of the time.
– Users had an average of 190 friends, while their friends averaged 635 friends of their own.
Possible Reasons?
You are a loner?
Your friends are extroverts?
There are more extroverts than introverts in the world?
Example
Source: S. Strogatz, NYT 2012
Average number of friends?
Example
Source: S. Strogatz, NYT 2012
Average number of friends = (1 + 3 + 2 + 2) / 4 = 2
Average number of friends of friends = (3 + 1 + 2 + 2 + 3 + 2 + 3 + 2) / 8 = ((1x1) + (3x3) + (2x2) + (2x2)) / 8 = 2.25!
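The two averages above can be checked with a few lines of Python. This is a minimal sketch: the node names a to d are placeholders rather than names from the slide, and the edge list is simply the one graph that fits the friend counts 1, 3, 2, 2.

```python
# Friendship graph consistent with the friend counts (1, 3, 2, 2) in the example.
# Node names are placeholders chosen for this sketch.
friends = {
    "a": ["b"],
    "b": ["a", "c", "d"],
    "c": ["b", "d"],
    "d": ["b", "c"],
}

def mean_friends(graph):
    """Average number of friends per person."""
    return sum(len(nbrs) for nbrs in graph.values()) / len(graph)

def mean_friends_of_friends(graph):
    """Average friend count taken over every (person, friend) pair."""
    counts = [len(graph[f]) for nbrs in graph.values() for f in nbrs]
    return sum(counts) / len(counts)

avg = mean_friends(friends)
avg_fof = mean_friends_of_friends(friends)
print(avg, avg_fof)  # 2.0 2.25
```

A person with d friends shows up d times in the second list, which is why the popular node "b" pulls that average up.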
Actually it is (almost) always true!
Proof?
Essentially, it is true if there is any spread in # of friends (non-zero variance)!
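A short derivation sketch of that statement follows. The symbols are introduced here and do not come from the slides: n people, d_i the number of friends of person i, μ the mean and σ² the (population) variance of the d_i. Person i appears in d_i friend lists and contributes d_i each time, so the friends-of-friends average is a ratio of sums:

```latex
\[
\frac{\sum_{i=1}^{n} d_i^{2}}{\sum_{i=1}^{n} d_i}
  = \frac{n\,(\mu^{2} + \sigma^{2})}{n\,\mu}
  = \mu + \frac{\sigma^{2}}{\mu}
  \;\ge\; \mu ,
\qquad
\mu = \frac{1}{n}\sum_{i=1}^{n} d_i ,\quad
\sigma^{2} = \frac{1}{n}\sum_{i=1}^{n} (d_i - \mu)^{2} .
\]
```

The inequality is strict whenever σ² > 0. For the four-person example, μ = 2 and σ² = 0.5, giving 2 + 0.5/2 = 2.25.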
Implications
Immunization
– Acquaintance immunization
• Immunize friend-of-friend
Early warning of outbreaks
– Again, monitor friends of friends
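One way to read the immunization idea, as a hedged sketch rather than the method used in the research described earlier: pick a random person, then immunize one of that person's friends chosen at random. By the argument above, such a friend tends to have more friends than a random person. The function name, the toy graph, and the seed below are all assumptions made for illustration.

```python
import random

def acquaintance_targets(graph, k, rng):
    """Return k distinct people, each found as a random friend of a random person."""
    people = [p for p in graph if graph[p]]            # people with at least one friend
    reachable = {f for p in people for f in graph[p]}  # everyone who is somebody's friend
    if k > len(reachable):
        raise ValueError("k exceeds the number of people who are anyone's friend")
    chosen = set()
    while len(chosen) < k:
        person = rng.choice(people)                    # uniformly random person
        chosen.add(rng.choice(graph[person]))          # one of their friends, at random
    return chosen

# Hypothetical toy graph with friend counts 1, 3, 2, 2.
friends = {
    "a": ["b"],
    "b": ["a", "c", "d"],
    "c": ["b", "d"],
    "d": ["b", "c"],
}
rng = random.Random(0)  # fixed seed, chosen arbitrarily, so runs are repeatable
targets = acquaintance_targets(friends, 2, rng)
print(targets)
```

The same sampling step could pick people to monitor for early warning instead of people to immunize; no knowledge of the full network is needed, only the ability to ask someone to name a friend.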
Thanks---Questions?
B. Aditya Prakash
3160 F Torgersen
[email protected]
See my homepage for more details and papers: http://www.cs.vt.edu/~badityap