

Page 1: A Decision Support System for Inbound Marketers: An Empirical Use of Latent Dirichlet Allocation Topic Model to Guide Infographic Designers

A Decision Support System For Inbound Marketers: An Empirical Use of Latent Dirichlet Allocation Topic Model to Guide Info-Graphic Designers

Meisam Hejazi Nia

University of Texas at Dallas (UTD)

July 9, 2015

Meisam Hejazi Nia (UTD) A Decision Support System For Inbound Marketers July 9, 2015 1 / 22


Inbound Marketing

Inbound Marketing (HubSpot):

Promote the company through blogs, podcasts, video, eBooks, e-newsletters, white papers, SEO, social media marketing, and viral content marketing, all of which serve to attract customers (bring visitors in by making the company easy to find, i.e., Pull)

“Earn their way in” (by publishing helpful information), like journalists

Especially effective for small businesses that deal with high dollar values, long research cycles, and knowledge-based products

Outbound Marketing:

Buying attention: cold-calling, direct paper mail, radio, TV advertisements, sales flyers, spam, telemarketing, and traditional advertising

Go out to get prospects’ attention (Push)

“Buy, beg, or bug their way in” (via paid advertisements, issuing press releases, or paying commissioned salespeople, respectively), like liaisons

Infographics are an important tool that inbound marketers use (pictorial, clear)


Info-Graphics

The brain processes visual information 60,000 times faster than text (3M Corporation, 2001; visualteachingalliance.com)

A graphic representation of information, data, or knowledge intended to present information quickly and clearly

Improves cognition by using graphics to enhance the human visual system’s ability to see patterns and trends

Businesses that publish infographics grow their traffic an average of 12% more than those that don’t (Hubspot.com)

Visual information design suggestions based on visual psychological perception theory (piktochart.com):

Layout and design: relevant text, meaningful headline, expectation
Colors: contrast, reduce colors, harmony, smart disharmony
Typography: easy-to-read font; spacing
Visuals: clutter; icons; highlights; sequence; standard colors

You can take something that’s already gone viral and piggyback on its success by creating your own awesome spin on it (Hubspot.com)


Research Questions

Can the low-level features of an infographic guide an infographic designer to design a viral infographic?

Can I design a decision support system that allows an infographic designer to measure the effect of her design decisions on the probability of the infographic becoming viral?

What are the current viral topics for which practitioners create infographics?


Overview of this study

Data

355 infographics from Pinterest, HubSpot, and InformationIsBeautiful.com

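209 from Pinterest (pins and likes), 64 from InformationIsBeautiful (Facebook, tweets, and Google+), and 82 from HubSpot (pins, LinkedIn, Facebook, tweets, Google+); the counts in parentheses are the social media hits recorded for each source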

Methodology: Unsupervised Machine Learning

Verbal information is extracted by optical character recognition (OCR) with a dictionary filter, WordNet, and Google’s word2vec (bag of verbal words)

k-means clustering extracts a histogram over five clusters of the images’ RGB and HSV pixel values (bag of visual words)

Latent Dirichlet Allocation (topic model), a generative soft-clustering method, estimated by Gibbs sampling
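The two feature-extraction steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: a plain set of accepted words stands in for the dictionary/WordNet/word2vec filtering, and the five colour clusters are fit with plain Lloyd's k-means on 6-dimensional RGB+HSV pixel vectors. All function names are hypothetical, and the slides do not say whether pixels are clustered per image or across the whole corpus; here one set of centroids is fit and any image is then histogrammed against it.

```python
import colorsys
import random
from collections import Counter

def bag_of_verbal_words(ocr_text, dictionary):
    """Count OCR tokens that pass a dictionary filter (a stand-in for the
    dictionary/WordNet/word2vec step, which is not reproduced here)."""
    tokens = (t.strip(".,:;!?()\"'").lower() for t in ocr_text.split())
    return Counter(t for t in tokens if t in dictionary)

def pixel_features(rgb):
    """Map an (r, g, b) pixel with 0..255 channels to RGB+HSV values in [0, 1]."""
    r, g, b = (c / 255.0 for c in rgb)
    return (r, g, b, *colorsys.rgb_to_hsv(r, g, b))

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: sq_dist(point, centroids[c]))

def kmeans(points, k=5, iters=20, seed=0):
    """Lloyd's k-means; needs at least k distinct points. Returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(sorted(set(points)), k)  # distinct initial centroids
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Recompute each centroid as its group mean; keep it if the group is empty.
        centroids = [tuple(sum(dim) / len(g) for dim in zip(*g)) if g else centroids[c]
                     for c, g in enumerate(groups)]
    return centroids

def bag_of_visual_words(pixels, centroids):
    """Histogram of an image's pixels over the colour clusters ("visual words")."""
    return Counter(nearest(pixel_features(p), centroids) for p in pixels)

# Toy 105-pixel "image" with five distinct colours.
pixels = ([(255, 0, 0)] * 40 + [(0, 0, 255)] * 30 + [(255, 255, 255)] * 20
          + [(0, 128, 0)] * 10 + [(20, 20, 20)] * 5)
centroids = kmeans([pixel_features(p) for p in pixels], k=5)
visual = bag_of_visual_words(pixels, centroids)
verbal = bag_of_verbal_words("Social media, social reach!", {"social", "media", "reach"})
print(visual, verbal)
```

How the verbal and visual counts are combined before topic modelling is not stated on the slides; merging them into one document of "words" is only one possible assumption.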


Overview of Results

Results

Infographics about the world’s top issues and the world’s demographics have significantly more social media hits than social media and mobile infographics

A method that allows an infographic designer to benchmark her design against previous viral infographics, to measure whether a given design decision helps or hurts the probability of the design becoming viral

Visual information is more relevant than the verbal information of infographics

Identified twelve clusters of infographics, each named from the word cloud of its infographics’ titles

A machine learning pipeline that summarizes big data (i.e., images of millions of pixels) into the predicted probability of an infographic becoming viral

The first quantitative study to inform the design choices of infographic designers


Data: Pinterest, Information is Beautiful, and HubSpot


Position of This Research in the Literature


10 Steps To Design An Infographic


Samples of Infographics - 1


Samples of Infographics - 2


Samples of Infographics - 3


Machine Learning Pipeline


Probabilistic Graphical Model of Latent Dirichlet Allocation


Social Media Hit Statistics of the Latent Infographic Topics

Cluster index | Cluster name | Size | Average social media hit | Variance of social media hit
1 | Cool info graphics about world’s demographic info-graphics | 28 | 2,303 | 8,744,003
2 | Mobile and Buzz Design Info-graphics | 30 | 924 | 1,904,941
3 | Marketing design and Dashboard Info-graphics | 53 | 1,255 | 3,987,451
4 | Face and Media Info-graphics | 9 | 447 | 350,812
5 | Traditional Marketing Info-graphics | 31 | 2,693 | 10,011,501
6 | Social Media and Decision Making Info-graphics | 26 | 960 | 869,842
7 | General life Info-graphics | 39 | 1,775 | 5,735,747
8 | Online professional design Info-graphics | 33 | 1,414 | 5,010,189
9 | Responsive logos and brands Info-graphics | 15 | 1,195 | 3,101,275
10 | International and online design Info-graphics | 35 | 1,354 | 6,700,740
11 | Interactive Marketing Info-graphics | 28 | 1,031 | 5,468,611
12 | Traditional vs. Online Media Info-graphics | 28 | 1,718 | 7,377,299

An infographic’s social media hit is the sum of its hits across all the social media platforms recorded for it

Cluster names were chosen from the word cloud (word-frequency visualization) of the infographic titles in each cluster

Likelihood-based model selection suggests twelve distinct clusters (LDA was also compared with a correlated topic model, CTM)
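The size, average, and variance columns can be computed from per-platform counts along these lines. The records below are made-up placeholders, not the study's data, and the sketch assumes the reported variance is the sample variance (`statistics.variance`); the slides do not say which estimator was used.

```python
from statistics import mean, variance

# Hypothetical per-infographic records: (cluster index, {platform: hit count}).
# These numbers are placeholders, not the study's data.
records = [
    (1, {"pins": 1200, "likes": 300}),
    (1, {"facebook": 900, "tweets": 150, "google_plus": 50}),
    (2, {"pins": 400, "linkedin": 20, "facebook": 80}),
    (2, {"pins": 100}),
]

def social_media_hit(platform_hits):
    """An infographic's social media hit: the sum of its hits on every platform."""
    return sum(platform_hits.values())

def cluster_stats(records):
    """Return {cluster: (size, average hit, sample variance of hit)}."""
    hits_by_cluster = {}
    for cluster, platform_hits in records:
        hits_by_cluster.setdefault(cluster, []).append(social_media_hit(platform_hits))
    return {
        c: (len(h), mean(h), variance(h) if len(h) > 1 else 0.0)
        for c, h in sorted(hits_by_cluster.items())
    }

for cluster, (size, avg, var) in cluster_stats(records).items():
    print(cluster, size, avg, var)
```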


Social Media Hit Statistics of the Latent Infographic Topics

(t-stat, critical value) | cluster 2 | cluster 3 | cluster 4 | cluster 5 | cluster 6 | cluster 7 | cluster 8 | cluster 9 | cluster 10 | cluster 11 | cluster 12
cluster 1 | (2.3, 2)* | (1.89, 1.99)* | (1.85, 2.03)* | (-0.49, 2) | (2.21, 2)* | (0.81, 2) | (1.33, 2) | (1.33, 2.02) | (1.36, 2) | (1.79, 2) | (0.77, 2)
cluster 2 | | (-0.8, 1.99) | (1, 2.02) | (-2.81, 2)* | (-0.11, 2) | (-1.74, 1.99) | (-1.04, 2) | (-0.57, 2.01) | (-0.82, 2) | (-0.21, 2) | (-1.42, 2)
cluster 3 | | | (1.2, 2) | (-2.56, 1.99)* | (0.71, 1.99) | (-1.13, 1.99) | (-0.34, 1.99) | (0.11, 2) | (-0.2, 1.99) | (0.45, 1.99) | (-0.87, 1.99)
cluster 4 | | | | (-2.1, 2.02)* | (-1.54, 2.03) | (-1.64, 2.01) | (-1.27, 2.02) | (-1.22, 2.06) | (-1.04, 2.02) | (-0.73, 2.03) | (-1.38, 2.03)
cluster 5 | | | | | (2.69, 2)* | (1.38, 1.99) | (1.88, 2) | (1.7, 2.01) | (1.89, 2) | (2.27, 2) | (1.26, 2)
cluster 6 | | | | | | (-1.65, 2) | (-0.97, 2) | (-0.56, 2.02) | (-0.74, 2) | (-0.14, 2) | (-1.35, 2)
cluster 7 | | | | | | | (0.66, 1.99) | (0.85, 2) | (0.73, 1.99) | (1.27, 2) | (0.09, 2)
cluster 8 | | | | | | | | (0.34, 2.01) | (0.1, 2) | (0.65, 2) | (-0.48, 2)
cluster 9 | | | | | | | | | (-0.22, 2.01) | (0.24, 2.02) | (-0.67, 2.02)
cluster 10 | | | | | | | | | | (0.51, 2) | (-0.54, 2)
cluster 11 | | | | | | | | | | | (-1.01, 2)

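In each pair, the first element is the t-statistic and the second is the critical value; an asterisk (*) marks pairs reported as significant, and a positive t-statistic means the row cluster has the higher average social media hit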

Cool infographics about the world’s demographics are significantly more viral than mobile and marketing infographics

Traditional marketing infographics are significantly more viral than the mobile and modern marketing infographics
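The reported t-statistics appear consistent with a pooled-variance (Student's) two-sample t-test computed from the size, average, and variance columns of the cluster statistics table. The sketch below reproduces two pairs under that assumption; the test variant is inferred from the numbers, not stated on the slides, and `pooled_t` is a hypothetical helper.

```python
import math

def pooled_t(n1, mean1, var1, n2, mean2, var2):
    """Student's two-sample t-statistic with pooled variance, and its degrees of freedom."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df

# (size, average social media hit, variance) from the cluster statistics table.
cluster = {
    1: (28, 2303, 8744003),
    2: (30, 924, 1904941),
    5: (31, 2693, 10011501),
}

t12, df12 = pooled_t(*cluster[1], *cluster[2])
t25, df25 = pooled_t(*cluster[2], *cluster[5])
print(f"cluster 1 vs 2: t = {t12:.2f}, df = {df12}")  # cluster 1 vs 2: t = 2.30, df = 56
print(f"cluster 2 vs 5: t = {t25:.2f}, df = {df25}")  # cluster 2 vs 5: t = -2.81, df = 59
```

The critical value for a two-sided 5% test comes from the t distribution with `df` degrees of freedom (for example `scipy.stats.t.ppf(0.975, df)`, about 2.00 for df = 56), which the Python standard library does not provide.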


Findings’ Summary and Conclusion

Infographics about the world’s top issues and the world’s demographics have significantly more social media hits than social media and mobile infographics

A method that allows an infographic designer to benchmark her design against previous viral infographics, to measure whether a given design decision helps or hurts the probability of the design becoming viral

Visual information is more relevant than the verbal information of infographics

Identified twelve clusters of infographics, each named from the word cloud of its infographics’ titles

A machine learning pipeline that summarizes big data (i.e., images of millions of pixels) into the predicted probability of an infographic becoming viral

The first quantitative study to inform the design choices of infographic designers


Managerial Take Away and Future Research

An infographic designer can create a design and predict its potential virality using the proposed approach

The visual elements of an infographic are more relevant than its verbal information, so special care should be given to the attractiveness of the design elements

The proposed approach to summarizing the visual information in an image could also be used to summarize the visual information of viral videos

Future studies might investigate other data summarization approaches, such as the Fourier transform and the Scale-Invariant Feature Transform (SIFT)

Out-of-sample prediction of virality and the dynamics of design pattern adoption are possible directions for future studies


Thank You


LDA Generative Model

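Choose $N \sim \mathrm{Poisson}(\xi)$, where $N$ is the number of features
Choose $\theta \sim \mathrm{Dirichlet}(\alpha)$, where $\theta$ is the $k$-dimensional random probability vector that a given document has each primitive topic (a point on the $(k-1)$-simplex)
For each of the $N$ features $w_n$:
(1) Choose a topic $z_n \sim \mathrm{Multinomial}(\theta)$
(2) Choose a feature $w_n$ from $p(w_n \mid z_n, \beta)$, a multinomial probability conditioned on the topic $z_n$

where $\theta_i \ge 0$ and $\sum_{i=1}^{k} \theta_i = 1$, and the Dirichlet density is

$$P(\theta \mid \alpha) = \frac{\Gamma\left(\sum_{i=1}^{k} \alpha_i\right)}{\prod_{i=1}^{k} \Gamma(\alpha_i)} \, \theta_1^{\alpha_1 - 1} \cdots \theta_k^{\alpha_k - 1}$$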
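The generative process above can be simulated directly. This is an illustrative sketch, not the study's code: the Poisson draw uses Knuth's multiplication method, the Dirichlet draw normalises independent Gamma variates, and `beta` is assumed to be given as one multinomial over the V features per topic. The function names and toy parameters are hypothetical.

```python
import math
import random

def sample_poisson(rng, lam):
    """Poisson draw by Knuth's multiplication method (adequate for small lam)."""
    limit, n, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return n
        n += 1

def sample_dirichlet(rng, alpha):
    """Dirichlet draw: normalise independent Gamma(alpha_i, 1) variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [x / total for x in draws]

def generate_document(rng, xi, alpha, beta):
    """Sample one document (a list of feature ids) from the LDA generative process.

    alpha: Dirichlet parameters, one per topic (k values).
    beta:  k rows, each a multinomial distribution over the V features.
    """
    n_features = sample_poisson(rng, xi)        # N ~ Poisson(xi)
    theta = sample_dirichlet(rng, alpha)        # theta ~ Dirichlet(alpha)
    topics, features = range(len(alpha)), range(len(beta[0]))
    doc = []
    for _ in range(n_features):
        z = rng.choices(topics, weights=theta)[0]               # topic ~ Multinomial(theta)
        doc.append(rng.choices(features, weights=beta[z])[0])   # feature ~ p(. | topic, beta)
    return doc

rng = random.Random(42)
beta = [[0.9, 0.1, 0.0], [0.0, 0.1, 0.9]]  # toy: k = 2 topics over V = 3 features
docs = [generate_document(rng, 20, [0.5, 0.5], beta) for _ in range(50)]
print(docs[0])
```

Small `alpha` values push each document towards a single topic, which is what makes LDA a soft rather than hard clustering when the process is inverted.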

Likelihood:

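$$p(D \mid \alpha, \beta) = \prod_{d=1}^{M} \int p(\theta_d \mid \alpha) \left( \prod_{n=1}^{N_d} \sum_{z_{dn}} p(z_{dn} \mid \theta_d) \, p(w_{dn} \mid z_{dn}, \beta) \right) d\theta_d$$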


Model selection based on log likelihood

Number of clusters (topics) | LDA-Gibbs (image) | LDA (full) | CTM (full)
k = 3 | −198757 | −1002505 | −1002538
k = 4 | −176261 | −1002539 | −1002589
k = 5 | −161241 | −1002561 | −1002642
k = 6 | −145156 | −1002586 | −1002631
k = 7 | −133009 | −1002610 | −1002672
k = 8 | −115634 | −1002638 | −1002712
k = 9 | −99629 | −1002660 | −1002754
k = 10 | −93164 | −1002505 | −1002538
k = 11 | −88779 | −1002539 | −1002589
k = 12 | −95304 | −1002561 | −1002642
k = 13 | −97033 | −1002586 | −1002631
k = 14 | −198757 | −1002610 | −1002672


Estimating LDA model by Gibbs Sampling

Estimation of the LDA model by Gibbs sampling follows Hornik and Grün (2011):

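$$p(z_i = K \mid w, z_{-i}) \propto \frac{n^{(j)}_{-i,K} + \delta}{n^{(\cdot)}_{-i,K} + V\delta} \cdot \frac{n^{(d_i)}_{-i,K} + \alpha}{n^{(d_i)}_{-i,\cdot} + k\alpha}$$

$$\hat{\beta}^{(j)}_K = \frac{n^{(j)}_K + \delta}{n^{(\cdot)}_K + V\delta}, \qquad \hat{\theta}^{(d)}_K = \frac{n^{(d)}_K + \alpha}{n^{(d)}_{\cdot} + k\alpha}$$

Here $j = w_i$ is the feature at position $i$ and $d_i$ is its document; $n^{(j)}_{-i,K}$ counts how often feature $j$ is assigned to topic $K$ excluding position $i$, a dot replaces an index that is summed over, $V$ is the number of distinct features, $k$ is the number of topics, and $\delta$ and $\alpha$ are the Dirichlet hyperparameters.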
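A minimal collapsed Gibbs sampler that applies the conditional and the two estimates above might look like this. It is a plain-Python sketch with hypothetical names and fixed symmetric hyperparameters, not the topicmodels implementation that Hornik and Grün describe. It drops the document-length denominator of the conditional because that term does not depend on the topic K and cancels when the weights are normalised.

```python
import random

def gibbs_lda(docs, k, V, alpha, delta, iters=200, seed=0):
    """Collapsed Gibbs sampler for LDA with symmetric priors alpha and delta.

    docs: list of documents, each a list of feature ids in range(V).
    Returns (beta_hat, theta_hat): k x V topic-feature and
    len(docs) x k document-topic estimates from the final sample.
    """
    rng = random.Random(seed)
    n_jk = [[0] * k for _ in range(V)]  # n^{(j)}_K: feature j assigned to topic K
    n_k = [0] * k                       # n^{(.)}_K: all features assigned to topic K
    n_dk = [[0] * k for _ in docs]      # n^{(d)}_K: features of document d in topic K
    z = []

    def update(d, j, K, step):
        n_jk[j][K] += step
        n_k[K] += step
        n_dk[d][K] += step

    for d, doc in enumerate(docs):      # random initial topic assignments
        z.append([rng.randrange(k) for _ in doc])
        for j, K in zip(doc, z[d]):
            update(d, j, K, +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, j in enumerate(doc):
                update(d, j, z[d][i], -1)  # remove token i: these are the "-i" counts
                # Unnormalised p(z_i = K | w, z_-i); the denominator
                # n^{(d_i)}_{-i,.} + k*alpha is the same for every K, so it is omitted.
                weights = [
                    (n_jk[j][K] + delta) / (n_k[K] + V * delta) * (n_dk[d][K] + alpha)
                    for K in range(k)
                ]
                z[d][i] = rng.choices(range(k), weights=weights)[0]
                update(d, j, z[d][i], +1)

    beta_hat = [[(n_jk[j][K] + delta) / (n_k[K] + V * delta) for j in range(V)]
                for K in range(k)]
    theta_hat = [[(n_dk[d][K] + alpha) / (len(doc) + k * alpha) for K in range(k)]
                 for d, doc in enumerate(docs)]
    return beta_hat, theta_hat

docs = [[0, 1, 0, 1]] * 5 + [[2, 3, 2, 3]] * 5  # toy corpus with V = 4 features
beta_hat, theta_hat = gibbs_lda(docs, k=2, V=4, alpha=0.1, delta=0.01, iters=100)
print([round(p, 2) for p in theta_hat[0]], [round(p, 2) for p in theta_hat[-1]])
```

Each row of `beta_hat` and `theta_hat` sums to one by construction, since the counts in the numerators add up to the counts in the denominators.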

Log Likelihood for Gibbs sampling:

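$$\log p(w \mid z) = k \log\left(\frac{\Gamma(V\delta)}{\Gamma(\delta)^{V}}\right) + \sum_{K=1}^{k} \left\{ \left[ \sum_{j=1}^{V} \log \Gamma\left(n^{(j)}_K + \delta\right) \right] - \log \Gamma\left(n^{(\cdot)}_K + V\delta\right) \right\}$$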
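The Gibbs log likelihood translates directly into code, with `math.lgamma` standing in for log Γ and the outer sum running over the k topics. This sketch assumes the topic-feature counts come from a single Gibbs sample and are stored as a V × k list of lists; the function name is hypothetical. Evaluating it for several values of k is one way to fill in a log-likelihood model-selection table like the one above.

```python
import math

def lda_log_likelihood(n_jk, delta):
    """log p(w | z) for LDA, given counts n_jk[j][K] of feature j in topic K (V x k)."""
    V, k = len(n_jk), len(n_jk[0])
    ll = k * (math.lgamma(V * delta) - V * math.lgamma(delta))
    for K in range(k):
        n_K = sum(n_jk[j][K] for j in range(V))  # n^{(.)}_K
        ll += sum(math.lgamma(n_jk[j][K] + delta) for j in range(V))
        ll -= math.lgamma(n_K + V * delta)
    return ll

counts = [[3, 0], [1, 0], [0, 4]]  # hypothetical: V = 3 features, k = 2 topics
print(lda_log_likelihood(counts, delta=0.1))
```

As a sanity check, one topic over two features with δ = 1 and a single observed token gives log(1/2), the probability of that token under a uniform Dirichlet prior.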
