Dynamic Topic Modeling for Monitoring Market Competition from Online Text and Image Data
Hao Zhang1, Gunhee Kim2, Eric P. Xing1
1: Carnegie Mellon University, 2: Seoul National University
Competitive Dynamic Multi-view STC (cdSTC)
Evaluation: Topic Quality
Evaluation: Prediction
Collecting Data
Crawling raw tweets and associated images using the REST API
• 6.6M tweets and 7.5M images from tweets and external links
• Time range: 10/20/2014 to 02/01/2015
Brand Competition Monitoring
• 2 groups of brands: Luxury (13 brands) and Beer (12 brands)
[Figure: pie charts of tweets with vs. without external links, and of tweets directly with vs. without images. Chart labels: 72%, 70%, 30%, 28%.]
Come down to SVT and enjoy our Super Bowl Sunday special. Heineken 5/2000, Corona 5/3000, Guiness and Mackeson http://fb.me/2WdRGBkem
Tweet (Beer)
Processing
Associations = (Heineken Corona Guiness)
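The processing step above maps a tweet to the set of brands it mentions. The poster does not say how associations are extracted, so the following is only a minimal sketch that assumes case-insensitive substring matching against a hand-made brand list. The `BRANDS` list and the function name `extract_associations` are hypothetical.

```python
# Hypothetical brand vocabulary; spelling follows the poster's example tweet.
BRANDS = ["Heineken", "Corona", "Guiness", "Budlight"]

def extract_associations(tweet, brands=BRANDS):
    """Return the brands mentioned in the tweet, ordered by first occurrence."""
    text = tweet.lower()
    hits = []
    for brand in brands:
        pos = text.find(brand.lower())  # -1 when the brand is not mentioned
        if pos >= 0:
            hits.append((pos, brand))
    return [brand for _, brand in sorted(hits)]

tweet = ("Come down to SVT and enjoy our Super Bowl Sunday special. "
         "Heineken 5/2000, Corona 5/3000, Guiness and Mackeson "
         "http://fb.me/2WdRGBkem")
associations = extract_associations(tweet)
```

Substring matching is the simplest choice; a real pipeline would likely also need to handle hashtags, @-mentions, and spelling variants.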
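[Figure: graphical model of cdSTC. Inside the document plate ($d = 1{:}D$): document code $\theta_d$; word codes $z_{dn}$, $y_{dm}$ and brand code $r_{db}$; observations $u_{dn}$, $v_{dm}$, $g_{db}$ (collectively $u_d$, $v_d$, $g_d$). Inside the topic plate ($k = 1{:}K$): $\phi_k^t \to \phi_k^{t+1}$, $\beta_k^t \to \beta_k^{t+1}$, $\gamma_k^t \to \gamma_k^{t+1}$, over time slices $t = 1{:}T$.]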
1. Multi-view
2. Competition
3. Dynamic
The model aims to address 3 major challenges:
• Modeling of multi-view representations of text and images
• Modeling of latent topics that are competitively shared by multiple brands
• Tracking temporal evolution of the topics and competitions
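$\phi^t$                 Brand-topic occupation matrix at time $t$ ($\in \mathbb{R}^{K \times L}$)
$\beta^t$ / $\gamma^t$   Topic distributions over text / visual words at time $t$ ($\in \mathbb{R}^{K \times G}$ / $\mathbb{R}^{K \times H}$)
$\theta_d$               Document code of document $d$ ($\in \mathbb{R}^{K}$)
$z_{dn}$ / $y_{dm}$      Word code of text / visual word $n$ / $m$ ($\in \mathbb{R}^{K}$)
$u_{dn}$ / $v_{dm}$      Occurrences of text / visual word $n$ / $m$ in document $d$
$r_{db}$                 Brand code of brand $b$ in document $d$ ($\in \mathbb{R}^{K}$)
$g_{db}$                 Indicator for each brand label $b$ for document $d$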
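• Joint probability
\[
p(\theta, z, u, y, v, r, g \mid \beta, \gamma, \phi)
= p(\theta)
\prod_{n \in N} p(z_n \mid \theta)\, p(u_n \mid z_n, \beta)
\prod_{m \in M} p(y_m \mid \theta)\, p(v_m \mid y_m, \gamma)
\prod_{b \in B} p(r_b \mid \theta)\, p(g_b \mid r_b, \phi)
\]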
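⇒ • MAP
\[
\begin{aligned}
\min_{\{\Theta^t, \beta^t, \gamma^t, \phi^t\}_{t=1}^{T}} \quad
& \sum_{t=1}^{T} \sum_{d=1}^{D_t} \lambda \|\theta_d^t\|_1
&& \text{(sparse term for document code)} \\
+\; & \sum_{t=1}^{T} \left( \pi_1 \|\beta^t - \beta^{t-1}\|_2^2 + \pi_2 \|\gamma^t - \gamma^{t-1}\|_2^2 + \pi_3 \|\phi^t - \phi^{t-1}\|_2^2 \right)
&& \text{(evolving chain)} \\
+\; & \sum_{t=1}^{T} \sum_{d=1}^{D_t} \sum_{n \in N_d^t} \left( \nu_1 \|z_{dn}^t - \theta_d^t\|_2^2 + \rho_1 \|z_{dn}^t\|_1 + L(z_{dn}^t, \beta^t) \right)
&& \text{(text)} \\
+\; & \sum_{t=1}^{T} \sum_{d=1}^{D_t} \sum_{m \in M_d^t} \left( \nu_2 \|y_{dm}^t - \theta_d^t\|_2^2 + \rho_2 \|y_{dm}^t\|_1 + L(y_{dm}^t, \gamma^t) \right)
&& \text{(image)} \\
+\; & \sum_{t=1}^{T} \sum_{d=1}^{D_t} \sum_{b \in B_d^t} \left( \nu_3 \|r_{db}^t - \theta_d^t\|_2^2 + \rho_3 \|r_{db}^t\|_1 + L(r_{db}^t, \phi^t) \right)
&& \text{(brand)} \\
\text{s.t.} \quad
& \theta_d^t > 0,\ \forall d, t; \quad z_{dn}^t,\, y_{dm}^t,\, r_{db}^t > 0,\ \forall d, n, m, b, t \\
& \beta_k^t \in P_U,\ \gamma_k^t \in P_V,\ \phi_k^t \in P_B,\ \forall k, t
&& \text{(simplex)}
\end{aligned}
\]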
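To make two of the regularizers in the MAP objective concrete, here is a small sketch of the sparse term for the document codes and the evolving-chain term. It is an illustration under stated assumptions, not the authors' implementation: matrices are plain nested lists, the squared norm on a matrix is taken entrywise (Frobenius), and the chain starts at the second slice because no initial dictionary is given. All function names are hypothetical.

```python
def squared_distance(a, b):
    """Sum of squared entrywise differences between two equally shaped matrices."""
    return sum((x - y) ** 2
               for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))

def sparsity_term(theta, lam):
    """lam * sum_d ||theta_d||_1 for one time slice; theta is a list of D codes."""
    return lam * sum(abs(x) for code in theta for x in code)

def evolving_chain_term(beta, gamma, phi, pi1, pi2, pi3):
    """Penalise drift of each dictionary between consecutive time slices.

    beta, gamma and phi are lists with one matrix per time slice.
    """
    total = 0.0
    for t in range(1, len(beta)):
        total += pi1 * squared_distance(beta[t], beta[t - 1])
        total += pi2 * squared_distance(gamma[t], gamma[t - 1])
        total += pi3 * squared_distance(phi[t], phi[t - 1])
    return total

# Toy example: one topic, two words / visual words / brands, two time slices.
beta = [[[0.5, 0.5]], [[0.6, 0.4]]]
gamma = [[[0.5, 0.5]], [[0.6, 0.4]]]
phi = [[[0.7, 0.3]], [[0.7, 0.3]]]   # unchanged between slices, so no penalty
chain = evolving_chain_term(beta, gamma, phi, pi1=1.0, pi2=1.0, pi3=1.0)
sparse = sparsity_term([[1.0, 0.0], [0.5, 0.5]], lam=0.1)
```

Larger values of the weights on the chain term keep topics and brand occupations closer to their values in the previous slice.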
Argument 1: Lower perplexity ≠ higher quality [J. Chang 2009]
Argument 2: Perplexity is not a fair metric for models with different distributions
–Define the Coherence Measure (CM) and the Validity Measure (VM):
\[
CM = \frac{\#\ \text{of relevant words}}{\#\ \text{of words in valid topics}}, \qquad
VM = \frac{\#\ \text{of valid topics}}{\#\ \text{of topics}}
\]
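The two measures can be computed directly from per-topic judgments. The sketch below assumes that each topic has already been labeled as valid or not, and that the relevant words are counted among the top words of valid topics only. The annotation format, the toy numbers, and the function name are hypothetical.

```python
def validity_and_coherence(annotations):
    """Compute (VM, CM) from per-topic judgments.

    annotations: one tuple per topic, (is_valid, n_relevant_words, n_words).
    """
    n_topics = len(annotations)
    valid = [a for a in annotations if a[0]]
    vm = len(valid) / n_topics
    words_in_valid = sum(n_words for _, _, n_words in valid)
    relevant = sum(n_rel for _, n_rel, _ in valid)
    cm = relevant / words_in_valid if words_in_valid else 0.0
    return vm, cm

# Hypothetical judgments for four topics with 10 top words each.
annotations = [(True, 7, 10), (True, 5, 10), (False, 0, 10), (True, 6, 10)]
vm, cm = validity_and_coherence(annotations)
```

In this toy case three of four topics are valid (VM = 0.75) and 18 of the 30 words in valid topics are relevant (CM = 0.6).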
• Average VM/CM on text topics
                 VM (Beer / Luxury)    CM (Beer / Luxury)
dLDA             0.53 / 0.68           0.55 / 0.52
STC + dyn        0.44 / 0.66           0.57 / 0.57
cdSTC + multi    0.51 / 0.70           0.63 / 0.59
cdSTC + text     0.605 / 0.71          0.61 / 0.59
• Average VM/CM on visual topics
                 VM (Beer / Luxury)    CM (Beer / Luxury)
Kmeans           0.39 / 0.56           0.59 / 0.64
LDA + multi      0.57 / 0.63           0.51 / 0.69
cdSTC + multi    0.57 / 0.65           0.66 / 0.71
• Task I: Given a novel tweet, can we predict its most associated brand?
[Figure: a novel tweet ("What is the most beautifully-designed perfume bottle? Tell us on the blog here: http://smarturl.it/ie2fka and win Gucci") is passed to the model, which infers the brand Gucci.]
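⇔
\[
\begin{aligned}
\max_{\{\Theta^t, \mathcal{M}^t, \eta^t\}_{t=1}^{T}} \quad
& \sum_{t=1}^{T} \Big[ f(\Theta^t, \mathcal{M}^t, D^t) + C\, R(\Theta^t, \eta^t) + \tfrac{1}{2} \|\eta^t\|_2^2 \Big] \\
\text{s.t.} \quad
& \theta_d^t > 0,\ \forall d, t; \quad z_{dn}^t,\, y_{dm}^t > 0,\ \forall d, n, m, t \\
& \beta_k^t \in P_U,\ \gamma_k^t \in P_V,\ \forall k, t
\end{aligned}
\]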
Task I-I: Randomly split the data in every time slice into 90% for training and 10% for testing
[Figure: results for (a) Beer and (b) Luxury]
Task I-II: Use the data in [1, 𝑡 − 1] for training and [𝑡 − 1, 𝑡] for testing
[Figure: results for (a) Beer and (b) Luxury]
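The two evaluation protocols for Task I differ only in how the data is split. The sketch below shows one way to build both splits. It assumes that the corpus is stored as a list of time slices, each a list of documents, and it reads Task I-II as "train on all slices before a given slice, test on that slice". The function names and the 0-based slice index are hypothetical.

```python
import random

def random_split(slices, test_frac=0.1, seed=0):
    """Task I-I: split every time slice into train/test at random."""
    rng = random.Random(seed)
    train, test = [], []
    for docs in slices:
        docs = list(docs)          # copy so the caller's data is not shuffled
        rng.shuffle(docs)
        n_test = max(1, round(test_frac * len(docs)))
        test.append(docs[:n_test])
        train.append(docs[n_test:])
    return train, test

def temporal_split(slices, t):
    """Task I-II: train on all slices before index t, test on slice t."""
    train = [doc for docs in slices[:t] for doc in docs]
    test = list(slices[t])
    return train, test

# Toy corpus: three time slices with ten document ids each.
slices = [list(range(0, 10)), list(range(10, 20)), list(range(20, 30))]
train_i, test_i = random_split(slices)
train_ii, test_ii = temporal_split(slices, t=2)
```

The random split measures how well the model fits each slice, while the temporal split tests whether a model trained on the past generalizes to the next slice.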
• Task II: Given an unseen past document, can we predict its timestamp?
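[Figure: a past tweet ("What is the most beautifully-designed perfume bottle? Tell us on the blog here: http://smarturl.it/ie2fka and win Gucci") is located on the timeline at the time point 𝑡 at which it was sent.]
\[
\max_{t}\ p(d \mid \mathcal{M}^t), \quad \text{where} \quad
p(d \mid \mathcal{M}^t) = \prod_{n \in N_d} p(u_n \mid \beta^t) \prod_{m \in M_d} p(v_m \mid \gamma^t) \prod_{b \in B_d} p(g_b \mid \phi^t)
\]
[Figure: results for (a) Beer and (b) Luxury]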
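The timestamp rule for Task II picks the time slice whose model gives the document the highest likelihood. The sketch below illustrates that rule for the text view only, under a strong simplification: each slice is represented by a single word distribution standing in for p(u_n | β^t). Visual words and brand labels would add analogous log terms. The toy distributions, the smoothing constant `eps`, and the function names are hypothetical.

```python
import math

def log_likelihood(doc_words, word_probs, eps=1e-12):
    """Sum of log p(word | slice model) over the words of a document."""
    return sum(math.log(word_probs.get(w, eps)) for w in doc_words)

def predict_timestamp(doc_words, models):
    """Return the index t whose model gives the document the highest likelihood."""
    scores = [log_likelihood(doc_words, m) for m in models]
    return max(range(len(models)), key=scores.__getitem__)

# Hypothetical per-slice word distributions (toy numbers).
models = [
    {"superbowl": 0.05, "beer": 0.60, "xmas": 0.35},   # slice 0
    {"superbowl": 0.50, "beer": 0.45, "xmas": 0.05},   # slice 1
]
t_hat = predict_timestamp(["superbowl", "beer", "beer"], models)
```

Working in log space avoids numerical underflow when a document has many words.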
• Task III: Can we predict future competition trends using past data?
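[Figure: the model learns the competition matrix 𝜙𝑡 from the data in [1, t-1] and evolves it along the timeline to 𝜙𝑡+1 (Prediction); the "groundtruth" data ("gt") is constructed by counting (Groundtruth). Panel labels: "Evolve the competition matrix", "Construct the groundtruth data". Values shown for Bags / Perfume / Watch: 0.4019 / 0.2615 / 0.0739.]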
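The poster only says that the "groundtruth" is constructed by counting. One plausible reading, sketched below, counts how often each brand appears with each topic in a time slice and normalizes every topic row so that the brand shares sum to one, matching the simplex constraint on the rows of 𝜙. The document format, the comparison by mean absolute error, and all names are assumptions made for illustration.

```python
from collections import Counter

def occupation_by_counting(docs, topics, brands):
    """Rows are topics, columns are brands; each non-empty row sums to 1."""
    counts = Counter((d["topic"], d["brand"]) for d in docs)
    matrix = []
    for k in topics:
        row = [counts[(k, b)] for b in brands]
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

def mean_abs_error(pred, truth):
    """Average entrywise absolute difference between two matrices."""
    diffs = [abs(p - g)
             for pred_row, truth_row in zip(pred, truth)
             for p, g in zip(pred_row, truth_row)]
    return sum(diffs) / len(diffs)

# Hypothetical labeled documents from one time slice.
docs = [
    {"topic": "bags", "brand": "Gucci"},
    {"topic": "bags", "brand": "Prada"},
    {"topic": "bags", "brand": "Gucci"},
    {"topic": "watch", "brand": "Gucci"},
]
gt = occupation_by_counting(docs, ["bags", "watch"], ["Gucci", "Prada"])
error = mean_abs_error([[0.5, 0.5], [1.0, 0.0]], gt)
```

Any distance between the predicted and counted matrices could be used in place of the mean absolute error.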
Objectives (from easy to difficult):
• How do brands occupy the market in every time slice?
• How does each textual/visual topic evolve over time?
• How does each brand’s occupation change over time?
• What do the competition trends among multiple brands look like over time?
#Style #Prada Black Leather & Nylon Tessuto
Saffiano Shoulder #Bag
http://dlvr.it/8WZKM2 #Forsale #Auction
Coat from @ASOS , top from @FreePeople,
jeans from Rag & Bone, boots from
#ChristianLouboutin & bag from @Prada .
What is the most beautifully-designed
perfume bottle? Tell us on the blog here:
http://smarturl.it/ie2fka and win Gucci
The latest crop of #Chanel Pre-Spring bags
have arrived! See the full collection now:
http://bit.ly/1z3PnKG
Pretty In Pink: From @Chanel to @nailsinc, the
best petal-hued make-up launches this spring
http://vogue.uk/8p6UOi
Designer Kate Spade, Invicta, Gucci & More
Watches from $22 & Extra 20% Off
http://www.dealsplus.com/t/1zr85Y
Topics at time t:
• watch+diamond: rolex, watch, gold, dial, mens, datejust, ladies, steel, diamond, oyster, stainless, 18k
• glasses: chanel, giorgio, sunglasses, classic, glasses, reading, women's, #burberrygifts
• bags: bag, leather, gucci, handbag, tote, clothing, shoulder, canvas, reading, women's
Topics at time t + 1:
• watch+diamond: watch, gold, white, date, ladies, dial, gift, rolex, #deals_us, blue, vintage, bracelet, omega
• glasses: chanel, sunglasses, listen, green, funny, dark, xmas, womens, Armani, excellent, Havana, lacoste
• bags: authentic, leather, bag, shoes, gucci, handbag, prada, tote, deals, brown, wallet
[Figure: (a) Input: Tweets and associated images of competing brands (Chanel, Gucci, Prada). (b) Output: Temporal evolution of topics (text / visual words) and brands’ proportions over the topics, along the timeline from t to t + 1.]
The increasing pervasiveness of the Internet has led to a wealth of consumer-created data over a multitude of online platforms
What can we learn?
Problem Statement
The general public’s opinions of different companies’ products and services
Performance evaluations under different market conditions (time, location, etc.)
What do marketers want to see?
• Detection: Listen in on consumers’ opinions about their products and those of their competitors
• Summarization: Summarize/visualize how a shared market is occupied by different brands
• Dynamics: Monitor the changes in market competition over time
[Figure: Brand Competitions. Brands compete over shared topics: corona, budlight, and guiness compete over "SuperBowl + beer"; rolex, omega, and burberry compete over "Watch + luxury".]
Our Approach: Joint Analysis of Text and Images
Take advantage of the pervasiveness of images on social media
• A large portion of tweets simply show images and links without any meaningful text. Images play an important role in representing topics in this type of document.
• Many users prefer to use images to convey their ideas more clearly and broadly.
• The joint use of images with text also helps marketers interpret the discovered topics.
• Images may be essential for users to converse about customers’ descriptions, experiences, and opinions of the brands.