event analytics on social media: challenges and solutions

Upload: thetis

Post on 25-Feb-2016

DESCRIPTION

Event Analytics on Social Media: Challenges and Solutions. Yuheng Hu. Committee Members: Dr. Subbarao Kambhampati (Chair), Dr. Eric Horvitz, Dr. John Krumm, Dr. Huan Liu, Dr. Hari Sundaram. Since the dawn of civilization, people congregated in town squares to discuss events. - PowerPoint PPT Presentation

TRANSCRIPT

Listening to the Crowd: Automated Analysis of Events via Aggregated Twitter Sentiment

Event Analytics on Social Media: Challenges and Solutions

Yuheng Hu

Committee Members
Dr. Subbarao Kambhampati, Chair
Dr. Eric Horvitz
Dr. John Krumm
Dr. Huan Liu
Dr. Hari Sundaram

Since the dawn of civilization, people have congregated in town squares to discuss events. The emergence of social media has now created a sprawling virtual town square, whose scope is vast and whose chatter can be captured, opening exciting possibilities for analyzing what people are actually saying. The overall goal of my thesis work is to support automated analysis

[Images: debate, I-5 bridge collapse, Super Bowl, Obama's selfie]

Which part of the event did a tweet refer to?

What's the relation between an event and tweets?

What were the topics of the event and tweets?
What sentiments about the event were elicited in tweets?

How to characterize the crowd's tweeting behavior?

How to detect an event from social media responses?

How to distill insights about an event based on social media responses
How to predict the future development of an event
How to predict crowds' engagement in future events
How to find social media responses about the events
How to model relations between an event and its responses
How to link social media responses to events
How to infer topics and sentiments of social media responses
How to characterize the crowd's behavior in response to events

How to address these challenges?

Potential applications
Computational Journalism
Political Campaign

Fox News Unveils New State-Of-The-Art Newsroom (The Verge, Oct 17, 2013)
The event master

The fact: vast amounts of social media responses
Tweet volume on Egypt & Morsi: ~12k per hour
We need automated solutions!

Event Analytics on Social Media
Most existing event analytics solutions are primitive. Simply combining other solutions ignores the connections between events and responses.
Given the vast amounts of social media responses and the complex nature of events, we need automated tools to conduct in-depth analysis.
In this proposal, we present Eventics.

Task 1: Event sensemaking
Event topics, segments, event-tweet alignment, event sentiments
Task 2: Event recognition
Trending events with associated Twitter responses
Task 3: Event engagement prediction
Predict users' engagement in future events

Four tools for 3 tasks

ET-LDA [AAAI12, ICWSM12, MMW12]

[Figure: tweets labeled Specific or General; panels: Frequency of specific tweets, Event-tweets alignment, Evolution of specific tweets]

SocSent [IJCAI13]

DeMA [CHI13]

Example tweet: Fire happened at 5 St and Pike, heard sirens, lots smoke

Alice [under review]

Hey Mike: we found this event may be of interest to you based on our prediction of your potential engagement! Our predictions were made based on your Twitter engagement history.

Regards,
Alice

Summary of Contributions

ET-LDA & SocSent for event sensemaking
DeMA for event recognition
Alice for event engagement prediction
Eventics: an automated toolbox to conduct in-depth analysis of 3 core tasks in event analytics

Our toolbox enables a richer perspective about:
How people respond to events on Twitter
What factors affect crowds' engagement in events

Event Characterization

Event in Twitter: a real-world occurrence e with 1) an associated time period Te, and 2) a stream of corresponding Twitter messages Me about e, published during time Te
Planned event: e with 1) pre-known event context, e.g., topic or hashtags, and 2) a time at which e is planned to occur
Trending event: e with one or more features (e.g., terms) of Me exhibiting bursty patterns during Te

Event examples

Planned events and trending events are not mutually exclusive; event types only represent an event along different dimensions.

Types                        Events
Planned Events               2012 U.S. Presidential debate
                             2013 Super Bowl
                             Occupy Wall Street in New York City
                             Homecoming party at ASU
Unplanned Trending Events    2013 Boston Bombings
                             Shooting in downtown Seattle on May 23, 2013
                             2012 Earthquake in Japan

Event Sensemaking

Republican Primary Debate, 09/07/2011

Tweets tagged with #ReaganDebate

Motivation
What's the relation between an event and tweets?
Which part of the event did a tweet refer to?
What were the topics of the event and tweets?
How to characterize the crowd's tweeting behavior?

Event Sensemaking: the Problem
Given an event's transcript S and its associated tweets T, characterize the event in terms of its topics and segments, and its influences (w.r.t. their nature and magnitude) on the crowd's Twitter responding behavior.
Requirements:
Extract topics in the event and tweets
Segment the event into topically coherent chunks
Establish the alignment between the event and tweets
Measure the influence of the event on its associated tweets

Event Sensemaking: the Challenges
Both topics and segments are latent
Tweets are topically influenced by the content of the event. A tweet's topics can be general (high-level and constant across the entire event), or specific (concrete and related to specific segments of the event)
An event is formed by discrete, sequentially-ordered segments, each of which discusses a particular set of topics

Event Sensemaking: Possible Approaches
Applying existing event segmentation tools, e.g., time-windows
For each pair, measuring similarities, e.g., TF-IDF
Counting related tweets for each segment
Unfortunately, these approaches are not able to discover latent topics/segments; moreover, they model the event and its Twitter responses independently.

Our Contribution: ET-LDA

ET-LDA (joint Event and Tweets LDA) is a hierarchical, fully Bayesian model that jointly models an event and its Twitter responses via their inter-dependency, i.e., topical influences.

Yuheng Hu, Ajita John, Fei Wang, Subbarao Kambhampati. ET-LDA: Joint Topic Modeling for Aligning Events and their Twitter Feedback. In AAAI Conference on Artificial Intelligence (AAAI) 2012.
Yuheng Hu, Ajita John, Doree Duncan Seligmann, Fei Wang. What were the Tweets about? Topical Associations between Public Events and Twitter Feeds. ICWSM'12.
Yuheng Hu, Ajita John, Doree Duncan Seligmann. Event Analytics via Social Media. In Proc. ACM Multimedia 2011 Workshop on Social and Behavioral Networked Media Access (SBNMA), 2011.

ET-LDA: Generative Process
For each paragraph s in S:
    draw a segment choice indicator Cs
    if Cs = 1, draw a new topic mixture for s
    else, the topic mixture of s is the same as that of the previous paragraph s-1
For each tweet t in T:
    draw a topic changing indicator Ct
    if Ct = 1, draw a new topic for t
    else, draw a paragraph s and assign the topic mixture of s to t

ET-LDA: Graphical Model

Event
Determine the word's topic in the event: Zs ~ Multinomial()

Tweets
Determine tweet type: C(t) ~ Bernoulli()
Determine which segment a tweet (word) refers to: S(t) ~ Categorical()
Tweet word's topic: Zt ~ Multinomial(\theta(s)) or Zt ~ Multinomial(\psi(t))
General topics: \psi(t) ~ Dirichlet()

In the event part, we assume that an event is formed by discrete, sequentially-ordered segments, each of which discusses a particular set of topics.

To do segmentation and model the topic evolution in the event, we apply the Markov assumption on \theta(s): with some probability, it is the same as the distribution of topics of the previous paragraph s-1; otherwise, a new distribution of topics \theta(s) is sampled from a Dirichlet. This pattern of dependency is produced by associating a binary variable c(s) with each paragraph, indicating whether its topic is the same as that of the previous paragraph or different. If the topic remains the same, these paragraphs are merged to form one segment.

We assume that a tweet consists of words which can belong to two distinct types of topics: general topics, which are high-level and constant across the entire event, and specific topics, which are detailed and relate to the segments of the event. As a result, the distribution of general topics is fixed for a tweet. However, the distribution of specific topics keeps varying with respect to the development of the event.

Each word in a tweet is associated with a distribution of topics. It can be sampled either from a mixture of specific topics \theta(s), or from a mixture of general topics \psi(t) over K topics, depending on a binary variable c(t) sampled from a binomial distribution. In the first case, \theta(s) is from a referring segment s of the event, where s is chosen according to a categorical distribution s(t). An important property of the categorical distribution s(t) is that it allows choosing any segment in the event. This reflects the fact that a person may compose a tweet on topics discussed in a segment that (1) was in the past, (2) is currently occurring, or (3) will occur after the tweet is posted (usually when she expects certain topics to be discussed in the event).
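The generative story in the notes above can be made concrete with a small forward sampler. The following is a minimal sketch, not the authors' implementation: the function name, the default sizes, the Dirichlet concentration values, and the choice to share one set of topic-word distributions between the event and the tweets are all assumptions made for illustration. It only samples data from the model; it does not perform inference.

```python
import numpy as np


def generate_event_and_tweets(n_paragraphs=6, n_tweets=4, words_per_paragraph=8,
                              words_per_tweet=5, n_topics=3, vocab_size=20,
                              p_new_segment=0.3, p_specific=0.5, seed=0):
    """Sample a toy event transcript and tweets (all defaults are illustrative)."""
    rng = np.random.default_rng(seed)
    # Topic-word distributions, shared by event and tweets (assumption of this sketch).
    phi = rng.dirichlet(np.full(vocab_size, 0.1), size=n_topics)

    # Event side: Markov chain over paragraph topic mixtures theta(s).
    thetas, segment_ids, paragraphs = [], [], []
    for s in range(n_paragraphs):
        # Binary indicator c(s): does paragraph s start a new segment?
        starts_new_segment = (s == 0) or (rng.random() < p_new_segment)
        if starts_new_segment:
            theta = rng.dirichlet(np.full(n_topics, 0.5))
            segment_id = segment_ids[-1] + 1 if segment_ids else 0
        else:
            # Same topic mixture as paragraph s-1, so both fall in one segment.
            theta, segment_id = thetas[-1], segment_ids[-1]
        thetas.append(theta)
        segment_ids.append(segment_id)
        topics = rng.choice(n_topics, size=words_per_paragraph, p=theta)
        paragraphs.append([int(rng.choice(vocab_size, p=phi[k])) for k in topics])

    # Tweet side: each word is either specific (tied to a segment) or general.
    tweets = []
    for _ in range(n_tweets):
        psi = rng.dirichlet(np.full(n_topics, 0.5))        # general topics, fixed per tweet
        segment_pref = rng.dirichlet(np.ones(n_paragraphs))  # categorical s(t)
        words = []
        for _word in range(words_per_tweet):
            is_specific = bool(rng.random() < p_specific)   # binary indicator c(t)
            if is_specific:
                # Any paragraph may be referred to: past, current, or future.
                s = int(rng.choice(n_paragraphs, p=segment_pref))
                mixture = thetas[s]
            else:
                s, mixture = None, psi
            k = rng.choice(n_topics, p=mixture)
            words.append((int(rng.choice(vocab_size, p=phi[k])), is_specific, s))
        tweets.append(words)
    return segment_ids, paragraphs, tweets


segment_ids, paragraphs, tweets = generate_event_and_tweets(seed=1)
print("segment of each paragraph:", segment_ids)
print("first tweet (word id, is_specific, referred paragraph):", tweets[0])
```

Each tweet word carries its own specific/general flag and, when specific, the index of the paragraph it refers to, which mirrors the per-word description in the notes.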

Inference in ET-LDA is HARD

Unfortunately, exact model inference is intractable due to the coupling of hyperparameters. We need approximate inference algorithms. Here we use collapsed Gibbs sampling.

Gibbs sampling approximates the posterior distribution by iteratively updating each latent variable given the remaining variables.
We need to infer P(Zs, Zt, Cs, Ct, St | Ws, Wt)
What the joint distribution looks like:
(Note: show graphical model here; learning parameters by Gibbs sampling.)

Inference in ET-LDA: example

Evaluation of ET-LDA

Experimental Setup
Tweets for President Obama's speech on the Middle East on May 19, 2011 (#MESpeech) and the Republican Primary debate in the US on Sept 7, 2011 (#ReaganDebate)
Event transcripts from the New York Times
Model settings: Gibbs sampling; the number of topics is picked by maximizing log-likelihood

Baselines
LDA: Latent Dirichlet Allocation
LCSeg: HMM-based event segmentation tool

Tasks
Event segmentation
Topic extraction
Alignment

Results: Event Segmentation

        MESpeech            ReaganDebate
        ET-LDA    LCSeg     ET-LDA    LCSeg
Pk      0.295     0.361     0.31      0.397

Pk = probability that a randomly chosen pair of words from the event will be incorrectly separated by a hypothesized segment boundary

Results: Topic Extraction

ReaganDebate    S1      S2      S3
ET-LDA          0.51    0.61    0.69
LDA             0.48    0.51    0.52
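For readers who want to reproduce the Pk segmentation score defined above, the following is a minimal sketch of the commonly used windowed formulation: slide a window of width k over the text and count how often the reference and the hypothesis disagree on whether the two window ends lie in the same segment. The slides do not say which variant or window size was used, so the default of half the mean reference segment length is an assumption, as are the function and argument names.

```python
def pk(reference, hypothesis, k=None):
    """Windowed Pk error between two segmentations.

    Both arguments are lists giving a segment id for every word (or paragraph),
    e.g. [0, 0, 0, 1, 1] means a boundary after the third unit.
    Lower is better; 0.0 means the segmentations agree on every probed pair.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("segmentations must cover the same units")
    n = len(reference)
    if k is None:
        # Assumed default: half of the mean reference segment length.
        n_segments = len(set(reference))
        k = max(1, round(n / n_segments / 2))
    if k >= n:
        raise ValueError("window is wider than the text")
    errors = 0
    for i in range(n - k):
        same_in_reference = reference[i] == reference[i + k]
        same_in_hypothesis = hypothesis[i] == hypothesis[i + k]
        # An error is any disagreement about whether the pair is separated.
        errors += same_in_reference != same_in_hypothesis
    return errors / (n - k)


reference = [0, 0, 0, 0, 1, 1, 1, 1]
missed_boundary = [0] * 8          # hypothesis that finds no boundary at all
print(pk(reference, reference))    # identical segmentations
print(pk(reference, missed_boundary))
```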

(Topic extraction performance is based on a Likert scale.)

Results: Alignment
Goal: determine whether the specific tweets (i.e., tweets that are strongly influenced by the event) are correctly identified for each segment.
Procedure:
ET-LDA: sampled tweets when P(C(t)) > .5
LDA: run LDA on the tweets corpus and the event transcripts; calculate the distance between topic mixtures through JS-divergence

MESpeech        S1      S2      S3      S4      S5
ET-LDA          0.49    0.51    0.56    0.58    0.63
LDA             0.48    0.49    0.54    0.51    0.57

ReaganDebate    S1      S2      S3      S4      S5
ET-LDA          0.51    0.52    0.57    0.62    0.61
LDA             0.48    0.49    0.51    0.51    0.58

Performance based on Likert scale

Evolution of Specific Tweets

Most responses were either tangential or about the high-level themes
Rapid increase from 33% to 54%
When a controversial topic was mentioned, the responses were pronounced
Observation 1: the crowd's responses tended to be general and steady before and after the event, while during the event they were more specific and episodic.

Distribution of Segments Referred to by Specific Tweets

People can also talk about things which are expected to be discussed later
People can talk about things that have been discussed before or are being discussed currently
Observation 2: the topical context of the tweets did not always correlate with the timeline of the event; an event segment can be referred to by specific tweets at any time, irrespective of whether it has already occurred, is occurring currently, or will occur later on.

ET-LDA alignment: Examples of Specific/General tweets

Specific
Yes, we need to talk about jobs and teachers needing jobs! #Reagandebate
Something the #GOP candidates won't mention about Reagan - Reagan grew the size of the federal government tremendously. #reagandebate

General
Boring #GOPDebate #tcot #ReaganDebate
Ron Paul. Gogogog :) . #reagandebate

Here are some examples of steady and episodic tweets for these two events. As we can see, the flavor of these two kinds of tweets is quite different: the episodic tweets respond to the content of the event very closely, whereas the steady tweets are very general and high-level.

Summary of ET-LDA
Motivated joint event-tweet modeling for event sensemaking
ET-LDA can concurrently segment an event and classify two types of tweets: general and specific
Demonstrated that ET-LDA significantly outperformed the traditional models
ET-LDA enables many insights which were never studied before

ET-LDA is powerful, but there remain open questions

Robustness of ET-LDA
How does data incompleteness in the event's transcript affect the performance of ET-LDA in classifying the types of tweets?
How does the volume of tweets affect the performance of ET-LDA in segmenting the event?

Predictive power of ET-LDA
How well does ET-LDA predict future tweeting behavior given the topics covered in the event?
How does ET-LDA predict the future development of the event given the tweets seen so far?

Proposed Work: Extension to ET-LDA
Possible solutions:
Investigate the performance of different inference algorithms (e.g., the EM algorithm) in estimating ET-LDA's parameters when data is incomplete
Investigate a training-testing scheme for the ET-LDA model
Outcome:
Analyze what is currently happening, rather than performing after-the-fact analysis
Users can interact with the system and evaluate its effectiveness in predicting the future development of the event as well as the tweeting behavior

Proposed Work

What other tasks can we do based on this alignment?

[Figure: tweets labeled Specific or General]

What were the sentiments elicited by the segments and topics of the event on Twitter?
Applications: event analysis, stock market, advertisement

Event Sensemaking via Aggregated Twitter Sentiment: the Problem
Given an event's transcript S and its associated tweets T, find the aggregated sentiments (positive or negative) about the segments (s ∈ S) and topics (k ∈ K) of the event, as elicited on Twitter.

Event Sensemaking via Aggregated Twitter Sentiment: Possible Solution
Main steps:
Manually label tweets with their sentiment orientation as training data
Apply off-the-shelf sentiment classifiers, e.g., MinCut [Pang et al. 2002]
Relate aggregated Twitter sentiment to segments and topics of the event that occur within fixed time-windows around the tweets' timestamps

Is this sufficient? Unfortunately, no.

We propose a flexible framework, named SocSent, for event analytics via Twitter sentiment that leverages previous partial solutions.

Event Sensemaking via Aggregated Twitter Sentiment: Challenges
C1. Difficult to relate Twitter sentiment to segments and topics of the event
    The fixed time-window approach is often not valid, as shown with ET-LDA
C2. Manually annotating the sentiments of a vast amount of tweets is error-prone
    Presents a bottleneck in learning high-quality models
C3. Twitter sentiment is conveyed with highly domain-specific contextual cues
    Can cause models to lose performance and become stale

How to overcome these challenges?

There are several challenges. First, manually annotating the sentiment of a vast amount of tweets is time-consuming and error-prone, presenting a bottleneck in learning high-quality models.

Besides, sentiment is always conveyed with highly domain-specific contextual cues, and the idiosyncratic expressions in tweets may rapidly evolve over time, especially when tweets are posted live in response to the event. This can cause models to lose performance and become stale.

Last and most importantly, this approach is unable to relate aggregated Twitter sentiment to segments and topics of the event. One may consider enforcing a tweet's correlation with the segments and topics of the event that occur within fixed time-windows around the tweet's timestamp, and classifying the sentiment based on that. However, as pointed out by our recent work, this assumption is often not valid: a segment of the event can actually be referred to by tweets at any time, irrespective of whether the segment has already occurred, is occurring currently, or will occur later on.

Our Contribution: SocSent
Leverage prior knowledge to overcome the challenges:
ET-LDA to align tweets to the event (C1)
Sentiment lexicon (C3)
Labels for small sets of tweets (C2)
SocSent incorporates prior knowledge into a matrix factorization framework that learns factors in latent dimensions (segments, topics, and positive or negative sentiments of the event), as elicited on Twitter.

Yuheng Hu, Fei Wang, Subbarao Kambhampati. Listen to the Crowd: Automated Analysis of Events via Aggregated Twitter Sentiment. In International Joint Conference on Artificial Intelligence (IJCAI) 2013.

SocSent: Framework
[Figure: factorization over terms, tweets, segments, topics, and sentiments, with regularization from three priors: tweet-event alignment from ET-LDA, sentiment lexicon, and labels for a small set of tweets]
We require that the factors respect the prior knowledge to the extent possible.

SocSent: Formal Formulation

[Formulation over X, G, T, S, F with priors G0, F0, R0]
R0 regulates G, T and S together
T × S represents the segment-sentiment matrix
G × T × S represents the tweets-sentiment matrix

Prior Knowledge in SocSent
Obtain F0 from the sentiment lexicon of the MPQA corpus: F0(i, 1) = 1 if word i is positive, and F0(i, 2) = 1 if it is negative.
Obtain R0 by asking people to label the sentiment of a few tweets (e.g., fewer than 1000), for the purpose of capturing some domain-specific connotations.
Obtain G0 from ET-LDA inference. Its rows represent the nt tweets and its columns the ns segments of the event; each entry is the posterior probability of a tweet referring to a segment.
[Figure: example matrices F0 (term × sentiment), R0 (tweets × sentiment), and G0 (tweet × segment)]
Apply a lexicon normalization technique (Han et al., EMNLP 2011) to overcome irregular English usage and out-of-vocabulary words in Twitter.
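The indicator-style priors described above are straightforward to build. The sketch below shows one way to construct F0 from a word list and R0 from a handful of labeled tweets; the function names, the toy vocabulary, and the use of the strings "positive" and "negative" as labels are assumptions for illustration (the actual lexicon comes from the MPQA corpus, which is not bundled here). Column 0 holds the positive indicator and column 1 the negative one, matching F0(i, 1) and F0(i, 2) in the 1-based notation of the slide.

```python
import numpy as np


def build_lexicon_prior(vocabulary, positive_words, negative_words):
    """F0: one row per term, columns = (positive, negative) indicators."""
    f0 = np.zeros((len(vocabulary), 2))
    for i, word in enumerate(vocabulary):
        if word in positive_words:
            f0[i, 0] = 1.0
        elif word in negative_words:
            f0[i, 1] = 1.0
        # Words absent from the lexicon keep an all-zero row (no prior).
    return f0


def build_label_prior(n_tweets, labels):
    """R0: one row per tweet; only manually labeled tweets get a nonzero row.

    `labels` maps a tweet index to "positive" or "negative".
    """
    r0 = np.zeros((n_tweets, 2))
    for tweet_index, label in labels.items():
        r0[tweet_index, 0 if label == "positive" else 1] = 1.0
    return r0


vocabulary = ["good", "bad", "the"]                 # toy vocabulary
F0 = build_lexicon_prior(vocabulary, {"good"}, {"bad"})
R0 = build_label_prior(3, {0: "positive", 2: "negative"})
print(F0)
print(R0)
```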

We have 5,267 positive and 8,701 negative unique terms in the lexicon.

SocSent: Model Inference

The coupling between G, T, S, F makes it difficult to find optimal solutions for all factors simultaneously.

We adopt an alternating optimization scheme [Ding et al., 2006]

Multiplicative update rules
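To illustrate how alternating multiplicative updates work with a prior acting as a regularizer, here is a deliberately simplified sketch. It is not the SocSent objective, whose exact form is not recoverable from this transcript: it factorizes a term-by-tweet matrix X into only two non-negative factors, F (term × sentiment) and S (tweet × sentiment), and penalizes the distance between F and a lexicon prior F0. The objective ||X - F Sᵀ||² + α||F - F0||², the weight α, the function names, and the toy data are all assumptions.

```python
import numpy as np


def objective(X, F, S, F0, alpha):
    """Squared reconstruction error plus the lexicon-prior penalty."""
    return np.linalg.norm(X - F @ S.T) ** 2 + alpha * np.linalg.norm(F - F0) ** 2


def factorize(X, F0, alpha=1.0, n_iter=200, seed=0, eps=1e-9):
    """Alternate multiplicative updates for F and S, keeping both non-negative."""
    rng = np.random.default_rng(seed)
    n_terms, n_tweets = X.shape
    k = F0.shape[1]
    F = rng.random((n_terms, k)) + eps
    S = rng.random((n_tweets, k)) + eps
    history = [objective(X, F, S, F0, alpha)]
    for _ in range(n_iter):
        # Update F with S fixed: ratio of the negative to the positive
        # parts of the gradient; the prior F0 enters the numerator.
        F *= (X @ S + alpha * F0) / (F @ (S.T @ S) + alpha * F + eps)
        # Update S with F fixed (plain non-negative factorization step).
        S *= (X.T @ F) / (S @ (F.T @ F) + eps)
        history.append(objective(X, F, S, F0, alpha))
    return F, S, history


# Toy data: 4 terms (good, great, bad, awful) counted in 3 tweets.
X = np.array([[2.0, 0.0, 1.0],
              [1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 1.0, 1.0]])
F0 = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
F, S, history = factorize(X, F0)
print("objective: %.3f -> %.3f" % (history[0], history[-1]))
print("tweet sentiment (argmax over columns):", S.argmax(axis=1))
```

Because every update multiplies the current factor by a non-negative ratio, the factors stay non-negative without an explicit projection step, which is the practical appeal of this family of update rules.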

Inference in SocSent

(The Lagrangian multipliers enforce non-negativity constraints on F; C represents terms irrelevant to F.)

Evaluation of SocSent
Classification performance of sentiment of event segments
Classification performance of sentiment of event topics
Effectiveness of prior knowledge

Experimental Setup
Tweets for President Obama's speech on the Middle East (#MESpeech) and the 2012 Presidential Debate in the US (#DenverDebate), collected via the Twitter streaming API; the hashtags were known beforehand from the White House and NBC News
Event transcripts from the New York Times
Ground truth:
Graduate students manually labeled the sentiment of tweets. ET-LDA was later applied to establish the alignment between the labeled tweets and the event segments
The sentiment of each segment is labeled according to the majority aggregated Twitter sentiment that correlated to it

Sentiment Classification of Event Segment
SocSent utilizes the partially available knowledge on tweet-event alignment from ET-LDA to improve the quality of sentiment classification in both events.

Baselines:
LexRatio: counts the ratio of sentiment words from a subjectivity lexicon in a tweet to determine its sentiment orientation [Wilson et al., 2009]
MinCuts: utilizes contextual information via the minimum-cut framework to improve polarity-classification accuracy [Pang and Lee, 2004]
MFLK: supervised matrix factorization method [Li et al. 2009]
SocSent improves on the other approaches by 7.3% to 18.8%.

Sentiment Classification of the Event Topics

SocSent improves on the three baselines by 6.5% to 17.3% for both datasets.

Effectiveness of Prior Knowledge
A single type of prior knowledge is less effective than combining them (combining all three leads to the most significant improvement).
Domain-specific knowledge (tweet labels) is more effective than domain-independent knowledge (lexicon).
Domain-specific knowledge is particularly helpful in its corresponding task (F0 + G0 or R0 + G0 perform better than F0 + R0 on the sentiment of segments), as G0 conveys prior knowledge on tweet-segment alignment.
F0 stands for the sentiment lexicon, R0 for tweet labels, and G0 for prior tweet/event alignment knowledge from ET-LDA.

Summary of SocSent
Motivated a low-rank representation of event-tweet sentiment analysis, with prior knowledge acting as a regularizer
Developed the SocSent framework
Provided evaluations on two tweet datasets
Demonstrated that SocSent significantly outperformed the traditional models

Are ET-LDA and SocSent Enough?
ET-LDA and SocSent work on events with textual projections (e.g., transcripts). What if event transcripts are not available?

Challenges
How to establish the alignment between the event and its associated tweets when the event transcript is absent.

Proposed Work: Event Segmentation via Tweets
Possible solutions:
Automatically split the entire volume of tweets posted around a given event into sequentially-ordered segments, where each segment of tweets represents a specific stage of the event (in which a set of topics can be discussed by the event attendees)
Build a prior event ontology to characterize the event development
Build statistical approaches to model both the topical transitions and the event stage transitions in the Twitter messages, with the help of the background knowledge from the event ontology
Outcomes:
What an event and its associated tweets are about
Which part of the event is referred to by which tweets (this can serve as prior knowledge for SocSent)
The corresponding tweeting behavior, based only on the Twitter messages themselves

How to detect events from social media responses

Example tweet: Fire happened at 5 St and Pike, heard sirens, lots smoke

Event Recognition: the Problem
Given a set of tweets, find an event, where an event consists of a set of topically-related trending features extracted from tweets at a given time, and trending refers to a time interval over which the rate of change of momentum is positive.

Challenges
Be versatile
Locate the time periods when bursts happen
Differentiate whether the detected new event is trivial or not
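The definition of trending above (an interval over which the rate of change of momentum is positive) can be illustrated on a single feature's count series. The slides do not define momentum, so this sketch assumes a common choice: the gap between a short and a long exponential moving average of the feature's counts per time bin. The function names, the spans, and the toy counts are illustrative assumptions, not DeMA's actual procedure.

```python
def ema(values, span):
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    smoothed = [float(values[0])]
    for value in values[1:]:
        previous = smoothed[-1]
        smoothed.append(previous + alpha * (value - previous))
    return smoothed


def trending_intervals(counts, short_span=3, long_span=8):
    """Return (start, end) index pairs where momentum is increasing.

    Momentum is ASSUMED to be short EMA minus long EMA of the counts.
    """
    momentum = [s - l for s, l in zip(ema(counts, short_span), ema(counts, long_span))]
    # Rate of change of momentum = first difference between consecutive bins.
    rising = [False] + [later > earlier for earlier, later in zip(momentum, momentum[1:])]
    intervals, start = [], None
    for i, is_rising in enumerate(rising):
        if is_rising and start is None:
            start = i
        elif not is_rising and start is not None:
            intervals.append((start, i - 1))
            start = None
    if start is not None:
        intervals.append((start, len(rising) - 1))
    return intervals


# Toy counts of one term (e.g. "fire") per time bin, with a burst in the middle.
counts = [1, 1, 1, 1, 10, 20, 30, 5, 1, 1, 1, 1]
print(trending_intervals(counts))
```

A detector along these lines would still need the ranking and grouping steps to separate meaningful bursts from trivial ones; in this toy form, small rebounds of momentum after a burst are also reported as intervals.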

Our Contribution: DeMA

DeMA is an unsupervised, feature-pivot, online event detector, which recognizes trending events and their associated Twitter responses from a stream of noisy Twitter messages, in 3 steps:
Trending feature identification
Trending feature ranking
Trending feature grouping

Yuheng Hu, Shelly Farnham, Andrés Monroy-Hernández. Whoo.ly: Facilitating Information Seeking For Hyperlocal Communities Using Social Media. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI) 2013.

Detected Events

Description of event              Terms                                               Time of trending
Westminster Dog Show              Westminster, dog, show, club                        Oct 2nd 2012 10:10am
Gas leaked                        Gas, Leak, Pike, 10th, St, Pine, Blocked, Siren     May 23rd 2012 9:55am
Opening of gluten-free kitchen    Gluten, Free, Dedicated, Kitchen, Bar, Cap, Hill    June 2nd 2012 8:08am
Sea Toy Fair                      #Toyfairsea, Starwars, Hasbro, Lego                 Aug 18th 2012 11:23am

Hey Mike: we found this event may be of interest to you based on our prediction on your potential engagement! Our predictions were made based on your Twitter engagement history.

Regards,
Alice

How to predict crowds' engagement in future events

Our Contribution: Alice
How to find factors that affect this behavior?
Inspired by work in marketing
Five categories of the engagement model: Involvement, Intimacy, Interaction, Interest, Influence
Why use the engagement model?
Understand which factors affect people's engagement
Based on the insights, build a predictive model on those important factors

Alice is a statistical framework which can be used to understand people's engagement with events that are trending in a local community, and to predict the engagement on unseen events.

Yuheng Hu, Shelly Farnham. Understanding and Predicting People's Community Engagement in Social Media. Under submission.

Predicting users' engagement
Binary classification: predicts whether or not a user will post tweets about a future event
Multi-class classification: predicts the volume of tweets a user will generate in the future event
SVM classification
Data: 1148 participants, over the course of 140 events

Results

Accuracy of results on binary classification

Methods    Precision    Recall    F-1
SVM        0.88         0.75      0.81

Accuracy of results on multiclass classification

Class    #Users    Accuracy
0        1611      80.41%
1        2013      84.45%
2        1255      72.11%
3        43        74.01%

Event Recognition and Prediction for Geo-specific Events

Motivation
Events happen locally
Mobile devices are ubiquitous
Foster civic engagement

Challenges
How to detect events at a specific geolocation
How to predict the growth and interestingness of the detected events
How to recommend events to potential users

Proposed Work
Proposed solutions:
Extend DeMA by incorporating additional geo-specific features of density
    Able to infer that, at time T, some important things E are going on at location X
Develop tools that leverage content features, social network features, as well as volume features, to predict the growth and popularity of these detected Twitter events
    Able to infer, at time point T, whether the detected event E will be trending and popular
Extend Alice by incorporating additional geo-specific features
Outcomes:
Build a global event explorer on Twitter to track happening events in different regions around the world
Rank detected events according to their predicted growth and popularity score
Alert potential users intelligently about the detected events

Summary of Contributions

Event analytics on social media is an important problem
Technical contribution: proposed Eventics, a powerful toolbox

Tasks / Tools in Eventics / Outcome
Event Sensemaking
  ET-LDA (AAAI12, ICWSM12, MMW12): alignment of events & Twitter responses; event segmentation; topics of events & Twitter responses; characterization of the crowd's tweeting behavior
  SocSent (IJCAI13): aggregated Twitter sentiment for topics and segments of the events
Event Recognition
  DeMA (CHI13): trending events and associated Twitter responses from a noisy Twitter stream
Event Prediction
  Alice (in submission): understanding of users' engagement on social media; predict users' engagement in forthcoming unseen events

Publications
Yuheng Hu, Shelly Farnham. Understanding and Predicting People's Community Engagement in Social Media. Under submission.
Yuheng Hu, Fei Wang, Subbarao Kambhampati. Listen to the Crowd: Automated Analysis of Events via Aggregated Twitter Sentiment. In International Joint Conference on Artificial Intelligence (IJCAI) 2013.
Yuheng Hu, Kartik Talamadupula, Subbarao Kambhampati. Dude, srsly?: The Surprisingly Formal Nature of Twitter's Language. In Proc. of the 7th International AAAI Conference on Weblogs and Social Media (ICWSM'13).
Yuheng Hu, Shelly Farnham, Andrés Monroy-Hernández. Whoo.ly: Facilitating Information Seeking For Hyperlocal Communities Using Social Media. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI) 2013 (Best Paper Honorable Mention).
Yuheng Hu, Ajita John, Fei Wang, Subbarao Kambhampati. ET-LDA: Joint Topic Modeling for Aligning Events and their Twitter Feedback. In AAAI Conference on Artificial Intelligence (AAAI) 2012.
Yuheng Hu, Ajita John, Doree Duncan Seligmann, Fei Wang. What were the Tweets about? Topical Associations between Public Events and Twitter Feeds. In Proc. of the 6th International AAAI Conference on Weblogs and Social Media (ICWSM'12).
Yuheng Hu, Ajita John, Doree Duncan Seligmann. Event Analytics via Social Media. In Proc. ACM Multimedia 2011 Workshop on Social and Behavioral Networked Media Access (SBNMA), 2011.

Timeline
December 2013: Thesis proposal.
January - February 2014: Extend ET-LDA by investigating different properties of ET-LDA, including how it handles missing data, etc. Write up results for TKDD journal.
March - May 2014: Examine solutions for segmenting the event using Twitter messages.
May - June 2014: Examine solutions for classifying event types, and models to leverage the results of the classification.
July - August 2014: Write up results for WWW 2015 conference.
August - October 2014: Thesis writing.
November 2014: Thesis defense.

Event Recognition: the Challenges
State-of-the-art
First story detection [Petrovic et al.]
Clustering-based event detector [Becker et al.]
Wavelet-based detector [Weng et al.]
Disaster event detection [Sakaki et al.]
C1. Highly domain-specific
C2. Unable to locate the time periods when bursts happen
C3. Unable to differentiate whether the detected new event is trivial or not

How to overcome these challenges?

Event Prediction: the Challenges
State-of-the-art
Event prediction in news corpus [Radinsky et al.]
Twitter engagement in Occupy Wall Street [Chen et al.]
Tie formation [Gilbert et al.]
C1. Highly domain-specific
C2. Unable to model the engagement in local community
C3. Lack of validation on large-scale data

How to overcome these challenges?

Understanding Users' Engagement
Whether or not a user gets engaged in an event depends on:
Event is local
News friends already posted in
Event with more replies
More personal (1st- and 2nd-person pronouns)
Smaller overlapping networks with other participants in the events

Understanding Users' Engagement (cont'd)
The level of participation of a user depends on:
Friends during event
How many news friends posted
How many friends posted already
Hub friends posted
Topics are similar
More personal

Identify Trending Feature
Trending event identification
Find an event that consists of a set of topically related trending features at a given time, where features are the terms in each tweet
Definition of Trending
Trending: a time interval over which the rate of change of momentum is positive
Momentum = mass * velocity
Mass = current importance of a feature
Velocity = feature's average frequency in tweets during a time period
Inspired by work in finance
EMA (Exponential Moving Average)
MACD (Moving Average Convergence Divergence)
MACD histogram

Unfortunately, measuring mass, velocity, and momentum is HARD

Challenges for Event Recognition
Goals: detecting unplanned trending events and their associated Twitter responses from a stream of noisy Twitter messages.
Connection to the Event Sensemaking task: a pre-step
State-of-the-art
First story detection [Petrovic et al.]
Clustering-based event detector [Becker et al.]
Wavelet-based detector [Weng et al.]
Disaster event detection [Sakaki et al.]
C1. Highly domain-specific
C2. Unable to locate the time periods when bursts happen
C3. Unable to differentiate whether the detected new event is trivial or not

Our Contribution: DeMA
Document-pivot vs. feature-pivot methods
Document-pivot: detect events by clustering documents based on the distance between documents
Feature-pivot: detect events by learning and combining features of words
Pros and Cons

DeMA is an unsupervised, feature-pivot, online event detector that recognizes trending events and their associated Twitter responses from a stream of noisy Twitter messages, in 3 steps:
Trending feature identification
Trending feature ranking
Trending feature grouping
Yuheng Hu, Shelly Farnham, Andrés Monroy-Hernández. Whoo.ly: Facilitating Information Seeking For Hyperlocal Communities Using Social Media. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI) 2013.

Identify Trending Feature: An Example
Given a feature F and its time series S = S(F) = {f1, f2, ..., fm}
[Figure: volume of #debate over time, from 11/08 8am to 11/13 12PM, with its n1-hour and n2-hour EMA curves]

Identify Trending Feature (cont'd)
MACD, signal line, MACD histogram
MACD: difference between the n1-hour and n2-hour EMA for S(F)
MACD histogram: difference between F's MACD and its signal line
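
The EMA, MACD, and MACD-histogram quantities on these slides can be sketched in a few lines of Python. This is a minimal illustration, not DeMA's actual implementation. Several things are assumptions borrowed from the usual finance convention rather than taken from the slides: the smoothing factor 2 / (span + 1), the signal line as an EMA of the MACD series, the default window sizes n1, n2, and n_signal, and the reading of a positive histogram value as an up-trend.

```python
def ema(series, span):
    """Exponential moving average of a list of numbers.

    Assumption: smoothing factor alpha = 2 / (span + 1), the usual
    finance convention; the slides do not state which one DeMA uses.
    """
    alpha = 2.0 / (span + 1)
    out = []
    for x in series:
        # The first value seeds the average; later values are blended in.
        out.append(x if not out else alpha * x + (1 - alpha) * out[-1])
    return out


def macd_histogram(series, n1=4, n2=8, n_signal=3):
    """Return (macd, signal, histogram) for a feature's time series S(F).

    n1, n2, n_signal are hypothetical window sizes (in hours), with n1 < n2.
    Assumption: the signal line is an EMA of the MACD series.
    """
    fast = ema(series, n1)
    slow = ema(series, n2)
    macd = [f - s for f, s in zip(fast, slow)]
    signal = ema(macd, n_signal)
    histogram = [m - s for m, s in zip(macd, signal)]
    return macd, signal, histogram


def trending_hours(histogram):
    """Indices where the histogram is positive (assumed to mark an up-trend)."""
    return [i for i, h in enumerate(histogram) if h > 0]
```

Each function makes a single pass over the series, so the whole computation is linear in the length of S(F).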

[Figure: MACD histogram for #debate, marking where up-trends and down-trends start]

Rank Trending Feature
EMA, MACD, and MACD histogram can be computed in linear time
But features may be trending up repeatedly
E.g., "morning" can be trending from 8am to 11am every day
Rank trending features by their novelty
R(h, d, w, F) = MACD histogram result during hour h, day d, and week w
Mean(h, d, F) = average trend score
SD(h, d, F) = standard deviation
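
The novelty formula itself did not survive in the slide text, so the sketch below assumes one plausible reading: a z-score that compares the current result R(h, d, w, F) with the mean and standard deviation of the same hour-and-day slot in earlier weeks. The function names, the eps guard, and the neutral score for features without history are all hypothetical choices made for illustration.

```python
from statistics import mean, pstdev


def novelty_score(current, history, eps=1e-9):
    """Assumed novelty: (R - Mean) / SD for one feature in one (hour, day) slot.

    current: this week's MACD histogram result R(h, d, w, F).
    history: results for the same hour and day in earlier weeks.
    eps avoids division by zero when the history is constant.
    """
    if not history:
        # No baseline to compare against; treated as neutral (assumption).
        return 0.0
    return (current - mean(history)) / (pstdev(history) + eps)


def rank_by_novelty(current_scores, past_scores):
    """Order features from most to least novel.

    current_scores: {feature: current histogram result}
    past_scores:    {feature: [results from earlier weeks]}
    """
    scored = {
        feature: novelty_score(value, past_scores.get(feature, []))
        for feature, value in current_scores.items()
    }
    return sorted(scored, key=scored.get, reverse=True)
```

Under this reading, a term such as "morning" that trends at the same hour every day scores close to zero, while a term with no comparable history of trending in that slot ranks higher.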

Group Trending Feature
Multiple events can be trending within the same time period
We need topically related features to form an event
Need a way to separate all trending features and group them into event clusters
Shared k-nearest-neighbor clustering, with distance defined by JS-divergence with LDA
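
The grouping step can be illustrated with a simplified, Jarvis-Patrick-style shared-nearest-neighbor rule. This is a sketch under stated assumptions, not the clustering DeMA actually uses: each trending feature is assumed to already have an LDA topic distribution, two features are linked when each is in the other's k-nearest-neighbor list and they share at least min_shared neighbors, and the parameters k and min_shared are hypothetical.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the KL sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def snn_groups(topic_dists, k=2, min_shared=1):
    """Group features whose topic distributions share nearest neighbors.

    topic_dists: one topic distribution per trending feature (assumed input).
    Returns a sorted list of groups, each a list of feature indices.
    """
    n = len(topic_dists)
    dist = [[js_divergence(topic_dists[i], topic_dists[j]) for j in range(n)]
            for i in range(n)]
    # k nearest neighbors of each feature, excluding the feature itself.
    neigh = [set(sorted((j for j in range(n) if j != i),
                        key=lambda j: dist[i][j])[:k])
             for i in range(n)]

    parent = list(range(n))  # union-find forest over features

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            mutual = j in neigh[i] and i in neigh[j]
            if mutual and len(neigh[i] & neigh[j]) >= min_shared:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Each returned group would correspond to one candidate event cluster of topically related trending features.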

Evaluation of DeMA
Setup
Randomly sampled 2600 tweets posted during August 2012
Tweets from users who identified their location as Seattle
Tasks
Accuracy in identifying events
Accuracy in identifying importance of events
Baselines
Fastest
Random
Ground truth

Accuracy in Event Recognition
Tools / Accuracy / Events identified / False positives
DeMA: 0.7868417%
Fastest: 0.5245635%
Random: 0.2421185%

Detected events
Description of Events / Terms
Westminster Dog Show: westminster, dog, show, club
Gas leaked: Gas, Leak, Pike, 10th, St, Pine, Blocked, Siren
Opening of gluten-free kitchen: Glueten, Free, Dedicated, Kitchen, Bar, Cap, Hill
Sea Toy Fair: toyfairsea, starwars, hasbro, lego

Accuracy in Identification of Importance of Events
How accurate is DeMA in identifying the importance of the events?
Importance = novelty score

Event / score: 1: 5, 2: 5, 3: 4, 4: 2, 5: 1
Event / score: 1: 5, 2: 5, 3: 3, 4: 2, 5: 1
Statistical analysis shows that the results of DeMA are significantly correlated with the participants' ratings of the importance of an event (r=0.31, p