A Tale of Two Tools: Reliability and Feasibility of Examining Twitter Mentions
Presentation at Society for Behavioral Medicine 2016
Amelia Burke-Garcia, MA
Cassandra Stanton, PhD
Nicole Soufi
Westat Center for Digital Strategy & Research
April 1, 2016
“47.6% of current cigarette smokers & 55.4% of recent former cigarette smokers have tried an e-cigarette.”
~CDC, 2015
“Various terms are used to refer to e-cigarettes, e.g. ‘e-hookahs’ and ‘vaporizers’.”
~New York Times, 2014
“The failure to equate vaping products generally
with e-cigarettes underscores how successful
the tobacco industry has been in reinventing a
popular “smoking” trend.” ~Gostin & Glasner, 2014
“Understanding…human interaction on the web is a valuable source of sensing health trends.”― Achrekar, Gandhe, Lazarus, Yu and Liu, 2011
“89% of all Americans are online.”― International Telecommunication Union (ITU), United Nations Population Division, Internet & Mobile Association of India (IAMAI), World Bank, 2016
“With hundreds of millions of people spending countless hours
on social media to share, communicate, connect, interact,
and create user-generated data at an unprecedented rate, social media has become one unique
source of big data.” ~Zafarani, Abbasi, & Liu, 2014
Options for Mining Social Data
“Social media data is noisy, free-format, of
varying length, and multimedia.”
~Zafarani, Abbasi & Liu, 2014
More Issues
• There is a lack of documentation about how the data are identified and sampled (Morstatter et al., 2013; Valkanas et al., 2014).
• Twitter’s free sample provides less representative data (Morstatter et al., 2013; Valkanas et al., 2014).
– This may hold true for samples drawn from other data mining tools.
• Accessing, storing & analyzing the data all come with costs (Morstatter et al., 2013; Valkanas et al., 2014).
Research Question
How does Twitter coverage of e-cigarette-related conversations differ
by data source (e.g. Radian6 vs. GNIP)?
Methods
• Compared tweets from two tools:
– Twitter’s GNIP “Firehose” service
– Salesforce’s Radian6 tool
• Key words included:
– “e-cigarettes OR vaping” OR “e-cigarettes health” OR “vaping health”
• A total of 1000 mentions were collected
– 500 mentions were collected from each tool over a 30-second period (12:57pm EST on August 7, 2015)
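To illustrate the key-word criteria above, here is a minimal sketch of a filter that keeps a tweet when it mentions either term. It assumes the terms are treated as key words rather than exact phrases, in which case the “health” variants are already covered by the bare terms. The function name, term list, and sample tweets are hypothetical, and neither tool's actual query syntax is reproduced.

```python
import re

# Hypothetical term list based on the key words listed on the slide.
TERMS = ("e-cigarettes", "vaping")

# Match a term only when it is not embedded in a longer word.
_PATTERN = re.compile(
    r"(?<!\w)(?:" + "|".join(re.escape(t) for t in TERMS) + r")(?!\w)",
    re.IGNORECASE,
)


def matches_query(text: str) -> bool:
    """Return True if the text mentions any of the key words."""
    return _PATTERN.search(text) is not None


# Made-up sample tweets, for illustration only.
sample = [
    "Vaping is everywhere these days",
    "New study on e-cigarettes health effects",
    "I love coffee",
]
kept = [tweet for tweet in sample if matches_query(tweet)]
print(len(kept))
```

A real pull would apply the equivalent rule inside each tool's own query interface; this sketch only shows the matching logic.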
Methods
• Six measures were proposed to be used in this analysis:
– Tools
• Cost, Feasibility & Ease of Use
– Themes
• Poster (individual/organization)
• Context (12 themes, combined to 9)
• Valence (positive/negative)
– Interrater reliability was 94%
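The slide reports interrater reliability of 94% but does not say which statistic was used. As a minimal sketch, assuming simple percent agreement between two coders, the calculation could look like the following. The function name and the example codes are made up for illustration and are not the study's data.

```python
def percent_agreement(coder_a, coder_b):
    """Share of items (in percent) that two coders labeled identically."""
    if len(coder_a) != len(coder_b):
        raise ValueError("Both coders must rate the same number of items")
    if not coder_a:
        raise ValueError("At least one rated item is required")
    agreements = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100.0 * agreements / len(coder_a)


# Made-up valence codes for five tweets, for illustration only.
coder_a = ["neutral", "neutral", "positive", "negative", "neutral"]
coder_b = ["neutral", "neutral", "positive", "neutral", "neutral"]
print(percent_agreement(coder_a, coder_b))  # 80.0
```

Percent agreement does not correct for chance; a chance-corrected statistic such as Cohen's kappa would typically give a lower value for the same codes.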
FINDINGS
Tool Comparison
• Cost
– Radian6: Tiered pricing; cost based on number of mentions
– GNIP: Tiered pricing based on sources and amount of content
• Ease of Use
– Radian6: Offers a visual dashboard; easy to pull content and analyze it
– GNIP: Requires storage capacity to store data; requires programming knowledge to access the data; requires computing power to analyze the data
Feasibility ?? ???? ??
Poster Type
Radian6 GNIP
Individual 55% 50%
Marketing/Promotion 44% 50%
Non-profit/Gov’t 1% 0%
Themes
Radian6 GNIP
Health/Consequence 6% 3%
Cessation 1% 0%
Prod Characteristics 12% 21%
Marketing/Sales 18% 23%
Consumer Purchases 1% 1%
Utilization Patterns 12% 4%
Policy 4% 4%
Endorsement 29% 24%
Other 17% 16%
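One way to see where the two tools diverge is to rank the themes by the gap between their shares. The sketch below copies the percentages from the Themes table and computes GNIP minus Radian6 in percentage points. The dictionary and function names are hypothetical, and this is a descriptive comparison only, not a significance test.

```python
# Theme shares (percent of mentions) copied from the Themes table,
# stored as (Radian6, GNIP).
THEMES = {
    "Health/Consequence": (6, 3),
    "Cessation": (1, 0),
    "Prod Characteristics": (12, 21),
    "Marketing/Sales": (18, 23),
    "Consumer Purchases": (1, 1),
    "Utilization Patterns": (12, 4),
    "Policy": (4, 4),
    "Endorsement": (29, 24),
    "Other": (17, 16),
}


def point_differences(table):
    """Return (theme, GNIP minus Radian6) pairs, largest absolute gap first."""
    diffs = [(theme, gnip - radian6) for theme, (radian6, gnip) in table.items()]
    return sorted(diffs, key=lambda item: abs(item[1]), reverse=True)


for theme, diff in point_differences(THEMES):
    print(f"{theme}: {diff:+d} points")
```

On these figures the largest gaps are in Prod Characteristics (higher in GNIP) and Utilization Patterns (higher in Radian6).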
Valence
Radian6 GNIP
Positive 6% 6%
Negative 4% 5%
Neutral 90% 88%
Word Clouds
Radian6 GNIP
Feasibility
• Across most measures, these tools delivered similar results.
– Specifically, both demonstrated the overwhelming presence of marketing content and individual conversations about e-cigarettes.
• A key difference was that GNIP pulled a higher share of sales and marketing content (23% vs. 18% of themes; 50% vs. 44% of posters).
• Based on this analysis, either tool may be a viable option for researchers seeking to analyze Twitter data.
– Radian6 may be a better option from a cost and ease-of-use standpoint.
Conclusions
• Researchers seeking to understand social media conversations have a number of options for data mining.
• Given the similarity in content collected across both tools, cost and ease of use should be primary considerations when selecting a data mining tool.
– GNIP offers quality data (and is well-referenced in the literature) but requires resources to work with its data.
– Radian6 provides an alternative when resources and computing power are limited.
Conclusions
• In terms of content, results demonstrated a gap in conversations around the health consequences of vaping.
• Moreover, this study revealed that industry and marketing are using this medium far more than the public health community.
~500 e-cigarette marketing tweets in 30 seconds.
Future Directions
• Analyze these data in greater detail, e.g. which flavors and which brands.
• Compare data collected using other tools.
• Examine other forms of tobacco use (e.g., hookah, cigars, snus).
• Further examine characteristics of the posters.