social multimedia as sensors
TRANSCRIPT
Big Data
Machine Learning
Computer Vision
Data Mining
Surveillance video analytics
Surgery video analysis
3D scene model
Augmented photography
Image geolocation
Visual recognition using weakly labeled big image data (people, object, action, event, activity)
Social media summarization
Media-driven recommendation
Cultural influence on social media
Crowd-sourced learning
Data analytics for healthcare
Nowcasting and forecasting
Multimodal sentiment/affect analysis
Deep user profiling & demographics
Individual or group behaviors
Wisdom of social multimedia
Non-contact sensing
Make Computers See
Let Data Speak
Medical image analysis
Changing Times
It’s not just young people
• A sensor network consists of spatially distributed autonomous sensors to monitor physical or environmental conditions, such as temperature, sound, light, pressure, etc. and to cooperatively pass their data through the network to a main location.
Sensor Networks: Then and Now
autonomous. intelligent. active. mobile. no maintenance. multimodal. social. sentiment.
Why Images? Why Multimedia?
• Photography is the only “language” understood in all parts of the world
• “A picture is worth 1000 words”
More and More Multimedia in Social Media
Time | Social network & apps | Support of multimedia
2011, June | Twitter | Released an integrated photo-sharing service
2012, Jan | Pinterest | Fastest site ever to break through the 10 million unique visitor mark
2012, April | Instagram | Online photo-sharing service (cost Facebook $1 billion)
2012, Nov | Snapchat | One billion photos shared on Snapchat
2013, Jan | Twitter (Vine) | Twitter released an app for 6-second looping videos
2013, May | Tumblr | Supports text, images, videos, quotes, or links (Yahoo! spent $1.1 billion on the acquisition)
2013, June | Facebook | Introduced its own short-video service (in Instagram)
More and more multimedia content will come in social media …
A brief summary of recent influential multimedia platforms
What Happens Online Every 60 Seconds
Highlight of multimedia content
216,000 images shared on Instagram
104,000 images on Snapchat
20 million photos viewed on Flickr
20,000 photos uploaded to Tumblr
72 hours of video uploaded to YouTube
SRC: http://www.wamda.com/2013/07/what-happens-online-every-60-seconds-infographic
What is Social Multimedia?
Andreas M. Kaplan, Michael Haenlein (2010). "Users of the world, unite! The challenges and opportunities of Social Media". Business Horizons 53 (1): 59–68.
A group of Internet-based applications that build on the ideological and technological foundations of Web 2.0 and that allow the creation and exchange of user-generated content
Social Multimedia = multimedia generated for social interactions
A Heterogeneous Network Model
LikeMiner: The Power of Social Interaction
Jin, Wang, Luo, Han, KDD 2011
londoneye, trafalgarsquare, britishmuseum, bigben, towerbridge, piccaillycircus, buckinghampalace, tatemodern, …
Diversified Tourist Trajectory Patterns
(1) Cluster locations with the mean-shift algorithm (27,974 photos in London)
(2) Form sequences
ID | User | Date | Sequence
1 | Alice | 04/26/11 | londoneye -> bigben -> downingstreet -> trafalgarsquare
2 | Alice | 04/27/11 | londoneye -> tatemodern -> towerbridge
3 | Bob | 04/26/11 | londoneye -> bigben -> tatemodern
… | … | … | …
Yin, Cao, Han, Luo, Huang, SDM 2011
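The sequence-forming step (2) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each photo has already been assigned a location cluster name by step (1), and the function name `form_sequences` and the sample photos are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime


def form_sequences(photos):
    """Group photos by (user, day), order them by time, and collapse
    consecutive photos taken at the same clustered location."""
    by_user_day = defaultdict(list)
    for user, timestamp, place in photos:
        by_user_day[(user, timestamp.date())].append((timestamp, place))

    sequences = {}
    for key, visits in by_user_day.items():
        visits.sort()  # chronological order within the day
        path = []
        for _, place in visits:
            if not path or path[-1] != place:
                path.append(place)
        sequences[key] = path
    return sequences


# Hypothetical photos: (user, time taken, location cluster from step 1).
photos = [
    ("Alice", datetime(2011, 4, 26, 9, 0), "londoneye"),
    ("Alice", datetime(2011, 4, 26, 9, 5), "londoneye"),
    ("Alice", datetime(2011, 4, 26, 11, 0), "bigben"),
    ("Bob", datetime(2011, 4, 26, 10, 0), "londoneye"),
    ("Bob", datetime(2011, 4, 26, 13, 0), "tatemodern"),
    ("Bob", datetime(2011, 4, 26, 12, 0), "bigben"),
]
sequences = form_sequences(photos)
for (user, day), path in sorted(sequences.items()):
    print(user, day, " -> ".join(path))
```

Collapsing consecutive repeats is a design choice made here so that several photos taken at one landmark count as a single visit.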
Taking it Further
• To predict other human geography metrics, such as GDP, wealth by region, unemployment rate, stock market sentiment
• To monitor health/disease, environment, ecology, social unrest, …
Twitter Health
• You can explore health patterns with the web application at GermTracker
Adam Sadilek, Henry Kautz, and Vincent Silenzio, Predicting Disease Transmission from Geo-Tagged Micro-Blog Data, AAAI 2012
Two Most Important Social Signals (IMO)
• User
• Sentiment
Events
Fine-Grained User Profiling from Multiple Social Multimedia Platforms
Main Contributors: Quanzeng You, Sumit Bhatia*, Tong Sun* and Jiebo Luo
User Expression & Behavior
(demographics & interests)
User Demographics & Interests
Pinterest Board Recommendation for Twitter Users
User-Curated Image Collections: Modeling and Recommendation
Sentiment Analysis in Social Media
• Sentiment is arguably the most important signal from social media
– User connections
– User preference
• Most existing methods are based on textual information only
– Comments, reviews, textual tweets, and status updates
• Twitter
– Easy? Only a limited number of words per tweet
– Difficult? Lots of noise and little information
• Questions
– Do users express themselves only using text?
– Can the emerging multimedia content provide additional useful signals?
Textual Sentiment Analysis
• Dictionary-based approach
– Lexicon contains a large number of words with sentiment labels
– Emoticons, widely used in online social networks
Simple and effective in most cases, but fails to capture the rich contextual information
• Semantic analysis
– Using NLP-related techniques to build more robust features
Difficult to develop a method that works for all languages
We use sentiment140 for textual sentiment analysis
• Using emoticons as auxiliary information
• Open API available
Source: http://www.sentiment140.com
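A minimal sketch of the dictionary-based approach with emoticons as auxiliary signals. It does not reproduce sentiment140 or its API: the tiny `LEXICON` and `EMOTICONS` tables and the function names are hypothetical, chosen only to show the mechanism and its blind spot for context.

```python
# Hypothetical toy lexicon and emoticon table (not from sentiment140).
LEXICON = {"love": 1, "great": 1, "happy": 1, "hate": -1, "awful": -1, "sad": -1}
EMOTICONS = {":)": 1, ":-)": 1, ":D": 1, ":(": -1, ":-(": -1}


def score_tweet(text):
    """Sum the polarity of every known word and emoticon in the tweet."""
    score = 0
    for token in text.split():
        if token in EMOTICONS:
            score += EMOTICONS[token]
        else:
            score += LEXICON.get(token.lower().strip(".,!?"), 0)
    return score


def label(text):
    """Map the summed score to a coarse sentiment label."""
    score = score_tweet(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(label("I love this :)"))          # words and emoticon agree
print(label("Awful traffic today :("))  # words and emoticon agree
print(label("This is not great"))       # negation is ignored by the lexicon
```

The last call is scored as positive because the lexicon sees only "great", which illustrates why a pure dictionary lookup misses contextual information such as negation.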
Image Tweets
• Image tweets: tweets that contain images
• Different users may prefer different types of tweets
Observation: Users who prefer image tweets tend to have more positive tweets
Results: Advantage over Low-level Methods
S. Siersdorfer, J. Hare, E. Minack, and F. Deng. Analyzing and predicting sentiment of images on the social web. In ACM Multimedia 2010, pages 715–718. ACM, October 2010.
Representative attributes:
- Aged/worn
- Constructing
- Smoke
- Stressful
Representative attributes:
- Glossy
- Playing
- Socializing
- Positive Facial Emotion
• Low-level visual feature based method [S. Siersdorfer et al.]
– SIFT, Global Color Histogram, Local Color Histogram as features
– Linear SVM as Classifier
• Mid-level result easy to interpret, and amenable to modular learning
– Positive Negative
Low-Level and Mid-Level Features
• Low-level visual features
– HOG: object and human recognition
– GIST: scene recognition
– SSIM: invariant scene layout
– GEO-COLOR-HIST: robust histogram features
• Mid-level visual features (selected attributes)
Mid-level Attribute Classification
• Dataset: SUN dataset (MIT)
– 102 mid-level attributes [Patterson & Hays]
– Each is labeled by 3 individuals, giving a vote count from 0 to 3
– Images with more than 1 vote are treated as positive samples under Soft Decision (SD), and images with more than 2 votes as positive samples under Hard Decision (HD)
Genevieve Patterson, James Hays. SUN Attribute Database: Discovering, Annotating, and Recognizing Scene Attributes. CVPR 2012.
Linking Attributes to Sentiments
• Mutual Information (MI) analysis
• Attributes with the 10 highest MI values for SD and HD
Top 10 | Soft Decision | Hard Decision
1 | congregating | railing
2 | flowers | hiking
3 | aged/worn | gaming
4 | vinyl/linoleum | competing
5 | still water | trees
6 | natural light | metal
7 | glossy | tiles
8 | open area | direct sun/sunny
9 | glass | aged/worn
10 | ice | constructing
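A minimal sketch of the mutual information (MI) analysis, assuming each image carries an attribute vote count (0 to 3) and a binary sentiment label. The vote thresholds follow the Soft Decision and Hard Decision definitions on the slides; the data values and function names are hypothetical.

```python
from collections import Counter
from math import log2


def binarize(votes, threshold):
    """Attribute is present when the vote count exceeds the threshold."""
    return [1 if v > threshold else 0 for v in votes]


def mutual_information(xs, ys):
    """MI in bits between two discrete sequences of equal length."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


# Hypothetical data: votes for one attribute and sentiment (1 = positive).
votes = [0, 1, 2, 3, 3, 0, 2, 1]
sentiment = [0, 0, 1, 1, 1, 0, 1, 0]

mi_sd = mutual_information(binarize(votes, 1), sentiment)  # Soft Decision
mi_hd = mutual_information(binarize(votes, 2), sentiment)  # Hard Decision
print(f"SD: {mi_sd:.3f} bits, HD: {mi_hd:.3f} bits")
```

Ranking all attributes by this value, once per decision rule, would give lists of the kind shown in the table above.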
Results: Sentiment Classification
• Comparison between the low-level visual feature based algorithm and mid-level attribute based algorithm
• Comparison between the mid-level visual content based algorithm and textual content based algorithm [Wilson et al.]
T. Wilson, J. Wiebe, and P. Hoffmann. Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages 347-354. Association for Computational Linguistics, 2005.
Deep Learning for Image Sentiment Analysis
Convolutional Neural Network for Image Sentiment Analysis
• Domain-transfer learning
• Boosted learning using noisy labels
Users who post many image tweets are more likely to express positive sentiment.
Main Contributors: Quanzeng You, Hailin Jin*, Jianchao Yang*, Jianbo Yuan and Jiebo Luo, AAAI‐2015
Weakly labelled Images
CNN for Visual Sentiment Analysis
Implemented CNN architecture (using Caffe)
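[Figure: CNN architecture. A 256×256 input image is cropped to 227×227×3; an 11×11 convolution yields 96 feature maps of 55×55; a 5×5 convolution yields 256 feature maps of 27×27; these are followed by layers of size 512, 512, 24, and 2.]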
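The spatial sizes in the architecture figure (227, 55, 27) are consistent with the standard output-size formula for convolution and pooling layers. The sketch below checks that arithmetic; the stride of 4, the 3×3 pooling with stride 2, and the padding of 2 are assumptions typical of Caffe reference networks and are not stated on the slide.

```python
def output_size(input_size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (input_size + 2 * padding - kernel) // stride + 1


# Assumed hyperparameters (not given on the slide).
conv1 = output_size(227, kernel=11, stride=4)            # 11x11 convolution
pool1 = output_size(conv1, kernel=3, stride=2)           # 3x3 max pooling
conv2 = output_size(pool1, kernel=5, stride=1, padding=2)  # 5x5 convolution

print(conv1, pool1, conv2)
```

Under these assumptions the first convolution maps 227 to 55, pooling maps 55 to 27, and the padded 5×5 convolution keeps 27.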
Exploiting Noisy Training Data
• Expensive to manually label a large amount of training data
• Possible to gather weakly labeled images
• Progressively selecting the training set
• After each training generation, select i with probability
s_i is the predicted sentiment score of training instance i
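The selection formula itself did not survive in this transcript, so the sketch below assumes one rule with the intended behavior: an instance whose two class scores are far apart is kept, and an instance the network is unsure about is likely dropped. The keep probability min(1, exp(|s_pos − s_neg|) − 1) and all names here are assumptions for illustration, not the slide's formula.

```python
import math
import random


def keep_probability(score_pos, score_neg):
    """Assumed rule: confident predictions (large score gap) are kept."""
    gap = abs(score_pos - score_neg)
    return min(1.0, math.exp(gap) - 1.0)


def select_training_set(instances, rng):
    """Sample the next generation's training set from scored instances."""
    kept = []
    for item, score_pos, score_neg in instances:
        if rng.random() < keep_probability(score_pos, score_neg):
            kept.append(item)
    return kept


# Hypothetical instances: (image id, positive score, negative score).
instances = [
    ("ambiguous.jpg", 0.50, 0.50),
    ("confident.jpg", 0.95, 0.05),
]
selected = select_training_set(instances, random.Random(0))
print(selected)
```

Repeating this after each training generation and fine-tuning on the selected subset gives the progressive training loop described on the following slide.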
Progressively Trained CNN
Progressively fine‐tune the neural network
What PCNN Learned
Experiments
• Half a million weakly labeled Flickr images from Visual Sentiment Ontology
• Two GTX Titan GPUs and 32 GB RAM
• Statistics of the data set
• Performance of CNN and PCNN on the testing images
Twitter Testing Data Set
• Employ Amazon Mechanical Turk to manually label selected Twitter images
• Assign 5 workers for each image
• Statistics of the labeling results from AMT workers on 1269 images
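With five workers per image, labels can be aggregated by majority vote and filtered by how many workers agree. The sketch below is an assumption about how such statistics could be computed; the agreement thresholds, image ids, and function names are hypothetical.

```python
from collections import Counter


def majority_label(worker_labels):
    """Return the most common label and the number of workers who chose it."""
    label, votes = Counter(worker_labels).most_common(1)[0]
    return label, votes


def filter_by_agreement(annotations, min_votes):
    """Keep images whose majority label has at least min_votes workers."""
    kept = {}
    for image_id, worker_labels in annotations.items():
        label, votes = majority_label(worker_labels)
        if votes >= min_votes:
            kept[image_id] = label
    return kept


# Hypothetical annotations: five binary labels per image.
annotations = {
    "img1": ["pos", "pos", "pos", "pos", "pos"],
    "img2": ["pos", "pos", "pos", "neg", "neg"],
    "img3": ["neg", "neg", "neg", "neg", "pos"],
}
for min_votes in (5, 4, 3):
    print(min_votes, filter_by_agreement(annotations, min_votes))
```

With an odd number of workers and two labels, a strict majority always exists, so no tie-breaking rule is needed.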
Performance on the Twitter Testing Data Set
• Trained models of CNN and PCNN on the Flickr images
• With transfer of knowledge (using 5‐fold cross validation)
Examples of Top Ranked Images
• Positive examples and negative examples
– left to right: PCNN, CNN, Sentribute, Sentibank, GCH, LCH, GCH+BoW, LCH+BoW
Joint Visual-Textual Sentiment Analysis
WSDM 2016
• Cross‐modality Consistent Regression
Building a Large Scale Dataset for Image Emotion Recognition
• We started from 3+ million weakly labeled images of different emotions and ended up with an AMT manually labeled data set that is 30 times as large as the current largest publicly available visual emotion data set.
• We also performed extensive benchmarking analyses on this large data set using the state of the art methods including CNNs, and established a nontrivial baseline for further research by the community
Main Contributors: Quanzeng You, Hailin Jin*, Jianchao Yang*, Jianbo Yuan and Jiebo Luo, AAAI 2016
What the Language You Tweet Says About Your Occupation
Tianran Hu, Haoyuan Xiao, Jiebo Luo, and Thuy‐vy Thi Nguyen
ICWSM 2016
Visualization method described in (Schwartz et al. 2013)
Skill list of one’s LinkedIn profile
How to find one’s occupation
User | DM | ML | C++ | C | SQL | … | SEO | MKTG | …
U1 | 32 | 89 | 4 | 3 | 12 | … | 0 | 0 | …
U2 | 0 | 0 | 32 | 42 | 12 | … | 0 | 0 | …
U3 | 2 | 0 | 0 | 0 | 10 | … | 17 | 23 |
User‐skill matrix
Apply LDA on user‐skill matrix
A word = a skill
Word count = endorsements of skills
A document = a user’s skill list
A topic = linear combination of skills
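A minimal sketch of building the user-skill count matrix in the document-term layout that LDA expects: one row per user (document), one column per skill (word), and endorsement counts as word counts. The values are the ones shown in the example matrix, with the elided columns left out; the function name is hypothetical, and the resulting matrix could be passed to any LDA implementation.

```python
def build_user_skill_matrix(endorsements):
    """Return (users, skills, matrix) with matrix[i][j] = endorsements of
    skill j on user i's profile, and 0 where the skill is absent."""
    users = sorted(endorsements)
    skills = sorted({skill for counts in endorsements.values() for skill in counts})
    matrix = [[endorsements[user].get(skill, 0) for skill in skills] for user in users]
    return users, skills, matrix


# Endorsement counts taken from the example matrix (elided columns omitted).
endorsements = {
    "U1": {"DM": 32, "ML": 89, "C++": 4, "C": 3, "SQL": 12},
    "U2": {"C++": 32, "C": 42, "SQL": 12},
    "U3": {"DM": 2, "SQL": 10, "SEO": 17, "MKTG": 23},
}
users, skills, matrix = build_user_skill_matrix(endorsements)
print(skills)
for user, row in zip(users, matrix):
    print(user, row)
```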
about.me is a platform that connects the same user’s multiple social media accounts.
Language patterns are extracted from tweets
Occupation is extracted from the LinkedIn profile
Open Vocabulary Approach
1) Collect the tweets of people
2) Count the number of words, terms, and topics of each person
3) Compute the correlation between a person’s count of each word/term/topic and weight on each …
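Step 3 can be sketched with a plain Pearson correlation across people. The slide does not name the coefficient, so Pearson is an assumption here, and the word frequencies and topic weights below are hypothetical values for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 when either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


# Hypothetical data for four people: relative frequency of the word "code"
# in their tweets, and their weight on one occupation-like topic.
word_frequency = [0.04, 0.03, 0.00, 0.01]
topic_weight = [0.9, 0.7, 0.1, 0.2]

r = pearson(word_frequency, topic_weight)
print(f"correlation = {r:.3f}")
```

Words with a strongly positive coefficient would appear under "Like" for that group, and words with a strongly negative one under "Dislike".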
Open Vocabulary Approach: Interesting Findings
Administrators
Like: “Make a difference”, “courage”, “honor”, “we need to”, “we must”, “we are”
Dislike: “blessing”, “pray”, “god’s”, and “can’t”
Start-up
Like: “founders”, “investors”, “growth”, “valuation”, “companies”, and “silicon”
Most dislike: “I can’t” followed by “I don’t know”
Software Engineer
Like: “web”, “UI”, “code”, and “plugin”
Dislike: “love this!”, “so excited”, “Sunday”, “girl”, “her”, and “relationship”
Office Clerk
Like: “my life”, and other phrases related to daily life such as “woke up”, “fall asleep”
Dislike: “interesting”, “creating”, “great”
(Schwartz et al. 2013)
Personality features of different occupations
O: Openness N: Neuroticism A: Agreeableness E: Extraversion C: Conscientiousness
IBM Watson Personality Insights service API is applied to compute the Big Five
Occupation Prediction
1) Using the words a person tweets, we can predict the occupation with reasonable accuracy (data is assumed to be balanced)
2) Software Engineer, Designer, and Editor & Writer are relatively easy to predict because they usually use specific words (especially engineers)
Sensing from a Distance
[John is holding a gun to his head]
Terminator: You cannot self-terminate.
John Connor: No, you can't. I can do anything I want. I'm a human being, not some god-damn robot.
Terminator: [correcting him] Cybernetic organism.
John Connor: Whatever! Either we go, and save her Dad, or so much for the Great John Connor. Because your future, my destiny, I want no part in it, I never did.
Terminator: Based on your pupil dilation, skin temperature, and motor functions, I calculate an 83% probability that you will not pull the trigger.
Tackling Mental Health
• Motivation
– Mental health is a significant problem on the rise, with reports of anxiety, stress, depression, suicide, and violence
– Mental illness has been and remains a major cause of disability, dysfunction, and even violence and crime
• Challenges
– Traditional methods of monitoring mental health are expensive, intrusive, and often geared toward serious mental disorders
– These methods do not scale to a large population of varying demographics, and are not particularly designed for those in the early stages of developing mental health problems
• Opportunities
– Advances in computer vision and machine learning, coupled with the widespread use of the Internet and adoption of social media, are opening doors for a new approach to tackling mental health using physically noninvasive, low-cost multimodal sensors already in people’s daily lives
Tackling Mental Health Via Multimodal Sensing
Dawei Zhou, Jiebo Luo, Vincent Silenzio*, Yun Zhou, Jile Hu, Glenn Currier*, Henry Kautz, AAAI-2015
Innovation
• Extracting fine-grained psycho-behavioral signals that reflect the mental state of the subject from imagery unobtrusively captured by the webcams built in most mobile devices (laptops, tablets, and smartphones). We develop robust computer vision algorithms to monitor real-time psycho-behavioral signals including the heart rate, eye blink rate, pupil variations, head movements, and facial expressions of the users.
• Analyzing effects from personal social media stream data, which may reveal the mood and sentiment of its users. We measure the mood and emotion of the subject from the social media posted by the subject as a prelude to assessing the effects of social contacts and context within such media.
• Establishing the connection between mental health and multimodal signals extracted unobtrusively from social media and webcams using machine learning methods.
Multimodal (Weak) Signals
[Figure: three panels comparing positive, neutral, and negative moods: head movement rate vs. time (min); pupil diameter vs. time (min); and mouse wheel, mouse moving distance, mouse click, and key stroke counts.]
Pattern Classification and Mining
Experiments
• Experiment I
– 27 participants (16 females and 11 males), including undergraduate students, PhD students, and faculty members, with different backgrounds in terms of education, income, and disciplines. The age of the participants ranges from 19 to 33, consistent with the age of the primary users of social media.
• Experiment II
– 5 depression patients (2 severe/suicidal and 3 moderate) and a control group of five normal users. FaceTime with doctors.
• Approaching Terminator T-600 (83%)
Table 3. Leave-One-Subject-Out Test for Experiment 1.
Class | TP | FP | Prec. | Rec. | F-1 | AUC
Negative | 0.89 | 0.08 | 0.82 | 0.89 | 0.84 | 0.95
Neutral | 0.56 | 0.13 | 0.67 | 0.56 | 0.59 | 0.79
Positive | 0.78 | 0.17 | 0.76 | 0.78 | 0.75 | 0.91
Table 4. Leave-One-Subject-Out Test for Experiment 2.
Metric | Patients vs. control in positive mood | Patients vs. control in negative mood | Patients vs. control in neutral mood
precision | 0.814 | 0.817 | 0.813
recall | 0.674 | 0.738 | 0.717
Deployment
• Physically non-invasive
– Detecting emotional information using both online social media and passive sensors
– No specialized wired or wireless invasive sensors
– Can potentially enhance the effectiveness and quality of new services delivered online or via mobile devices in current depression patient care
– Can incorporate other sensors
• Mobile app
– Self-awareness
– Self-management
– Informed intervention
Understanding the Pulse of Our Society
• Social interactions and social activities
• Public health surveillance
• Web sentiment analysis and trend prediction
• Cyber terrorism, extremism, and activism
• Fads and infectious ideas
• Marketing intelligence analytics
• Traffic and human mobility patterns
• Human and environment
• Social unrest, protest and riot
Social Multimedia‐based Prediction of Elections
Prediction for the swing states in 2012 US Presidential Election
Social images can act like a prism to reveal split public opinions
Main Contributors: Quanzeng You, Liangliang Cao*, Junhuan Zhu, John R. Smith* and Jiebo Luo
Competitive Vector Auto Regression
Textual and Visual Sentiment
Negative Campaign
Fine‐Grained Analysis of the 2016 Election
America Tweets China: Analysis of State and Individual Characteristics Regarding Attitudes towards China
Main Contributors: Yu Wang and Jiebo Luo, IEEE Big Data Conference, 2015
Using Social Multimedia to Solve Social Problems
Main Contributors: Ran Pang, Jiebo Luo, and Henry Kautz
Drinking Levels among Youth
The CDC 2011 Youth Risk Behavior Survey found that among high school students, during the past 30 days:
• 39% drank some amount of alcohol.
• 22% binge drank.
• 8% drove after drinking alcohol.
• 24% rode with a driver who had been drinking alcohol.
Consequences of Underage Drinking
• School problems, such as higher absence and poor or failing grades.
• Social problems, such as fighting, physical and sexual assault.
• Legal problems, such as arrest for driving or physically hurting someone while drunk.
• Physical problems, such as hangovers or illnesses.
• Unwanted, unplanned, and unprotected sexual activity.
• Higher risk for suicide and homicide.
• Alcohol-related car crashes and other unintentional injuries.
• Abuse of other drugs.
Human Behavior
• Distributed Sensing
• Integrated Mining
Time Pattern of Underage Alcohol Use
Main Contributors: Ran Pang, Jiebo Luo, and Henry Kautz
NYC
ALL
Brand Influence in Underage Alcohol Use
Main Contributors: Ran Pang, Jiebo Luo, and Henry Kautz
Group | Vodka 1 | Vodka 2 | Champagne | Beer 1 | Beer 2
Young Male | 6.43% | 6.79% | 6.10% | 13.21% | 10.95%
Adult Male | 29.69% | 42.16% | 24.27% | 52.41% | 51.91%
Young Female | 19.76% | 15.12% | 19.49% | 11.58% | 12.17%
Adult Female | 44.12% | 35.93% | 50.14% | 22.79% | 24.97%
EXPERIMENTS (3): Youth Exposure to Alcohol Media
• Mining deeper-level patterns in terms of factors such as family income, rural vs. urban, coastal vs. heartland regions, as well as social influence by peers in the social networks
• Combining the proposed approach with surveys, which can be used to verify the findings from social media data mining.
• Applying this methodology to other social problems that involve youth, such as tobacco, drugs, teen pregnancy, unsafe sex, unsafe driving, obesity, stress, and depression.
ONGOING DIRECTIONS
Drug Image Classification
• Fine-tuned CNN
• Starting with the pre-trained VGG Net
• Fine-tuned CNN features + SVM
• Using noisy data downloaded from Google
• Fine-tuned data statistics
• Instagram photos
Label | Pills | Bottle | Weed | Total | Non-drug
# | 2421 | 1233 | 675 | 4329 | 12253
Main Contributors: Xitong Yang, Meredith McCarron, Lacey Kelly, Jiebo Luo
Drug Use Patterns from Instagram
Main Contributors: Yiheng Zhou, Numair Sani, Jiebo Luo
Big cities vs. Small cities
1. Different mobility patterns?
2. Exciting vs. Routine?
3. Stressful vs. Relaxed?
4. Fast vs. Slow?
Geo-tagged social media makes it possible to understand various life styles in different cities at scale
Human Mobility and Human-Environment Interaction
Main Contributors: Yuncheng Li, Jifei Huang, and Jiebo Luo (ICIMCS 2015)
Geotagged tweets
Morning and evening rush hours Haze Dehazed
Experiments: Metrics
• Spearman correlation coefficients
– rank correlation
• Haze level:
– ordinal data
– sign is irrelevant
• The metric:
– absolute Spearman coefficients
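A minimal sketch of the metric: Spearman's coefficient is the Pearson correlation of the ranks, with tied values sharing their average rank, and the absolute value discards the sign. The haze levels and feature values below are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def abs_spearman(xs, ys):
    """Absolute Spearman rank correlation between two sequences."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0
    return abs(cov / sqrt(var_x * var_y))


# Hypothetical data: ordinal haze levels and one image feature per photo.
haze_level = [0, 1, 2, 3, 3]
feature = [0.9, 0.7, 0.4, 0.2, 0.1]
score = abs_spearman(haze_level, feature)
print(f"|rho| = {score:.3f}")
```

A feature that falls steadily as haze rises scores as highly as one that rises with it, which matches the slide's note that the sign is irrelevant.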
When Do Luxury Cars Hit the Road?
From Catwalk to Main Street
Main Contributors: Kezhen Chen, Kuan-Ting Chen*, Peizhong Cong, Winston Hsu, Jiebo Luo (MM '15 Grand Challenge)
Motivations
• A growing number of people pay attention to fashion, and the public tends to emulate what large-city residents and celebrities wear
• Investigating fashion trends is of great interest to industry and academia because of the potential for boosting many emerging applications, such as clothing recommendation and advertising by clothing brand association
Approach
1. Constructing a large dataset from the New York Fashion Shows and New York street chic in order to understand the likely clothing fashion trends in New York
2. Utilizing a learning-based approach to discover fashion attributes as the representative characteristics of fashion trends
3. Comparing the analysis results from the New York Fashion Shows and street-chic images to verify whether the fashion shows have actual influence on the public in New York City
Where to go for dinner tonight?
1. Different occasions
2. Different ambience
3. Formal, casual, trendy, fun?
4. Friends, couple, family?
RECAP: Organic Sensor Networks
• 52% of adults use online social networks (teenagers?)
– Smartphone access (>1 billion)
• Real time (“In the moment”)
• Location aware
• Detailed measurements at a population scale
– No active user participation
• Fine granularity
• Timely
• Multimodal sensing from multimedia data
– Advantages
• autonomous. intelligent. active. mobile. no maintenance. multimodal. social. sentiment.
– Inference & Prediction
• Heterogeneous multimedia
• Heterogeneous information networks
• Intelligence from visual data is increasingly crucial
• From information fusion to information integration
• Separating wheat from chaff
• Geospatial analysis
• Mobile platform/context
• (Data) Rich gets richer!
• Tighter collaboration between industry and academia
• Continuing search for killer apps
Trends and Open Issues