Upload: shalin-hai-jew

Post on 12-Apr-2017

Page 1: Eavesdropping on the Twitter Microblogging Site

EAVESDROPPING ON THE TWITTER MICROBLOGGING SITE

Summer Institute on Distance Learning and Instructional Technology (SIDLIT 2016)

August 4-5, 2016

Page 2: Eavesdropping on the Twitter Microblogging Site

OVERVIEW

Research analysts go to Twitter to capture the general trends of public conversations, identify and profile influential accounts, and extract subgroups within larger collectives and larger discourses; they also go to eavesdrop on individual self-talk and individual-to-individual conversations. So what is technically in your tweets? Dave Rosenberg famously asked that question in a CNET article (2010). The answer: a whole lot more than 140 characters. How are the most influential social media accounts identified through #hashtag graphs? How are themes extracted? How are sentiments understood? How can users be profiled through their Tweetstreams? How can locations be mapped in terms of the Twitter conversations occurring in particular physical areas? How can live and trending issues be identified and categorized in terms of sentiment (positive, negative, and neutral)? This presentation will summarize some of the free and open-source tools as well as commercial and proprietary ones that enable increased knowability.


Page 3: Eavesdropping on the Twitter Microblogging Site

ABOUT TWITTER


Page 4: Eavesdropping on the Twitter Microblogging Site

TWITTER DEMOGRAPHICS

320 million monthly active users

A billion unique visits monthly to sites with embedded Tweets

80% of active users on mobile

Support for 35 languages (“About Twitter,” 2015)


Page 6: Eavesdropping on the Twitter Microblogging Site

BUSINESS MODEL

Runs on advertising based on delivery of human attention and encouragement of certain types of consumption

Intersperses funded commercial messaging into regular messaging

Enables fine-tuned targeting of desirable audiences along with various metrics (“Advertising on Twitter”)

Page 7: Eavesdropping on the Twitter Microblogging Site

140 CHARACTERS

Geographical Coverage

Foremost microblogging site in major parts of the world and a main part of the social ecosystem

Blocked in some countries: Turkey, Iran, China, and North Korea (Liebelson, Mar. 28, 2014), with some temporary phases of inaccessibility in a number of other countries (“Censorship of Twitter,” Apr. 20, 2016)
  May be based in part on market protectionism for native microblogging services
  May be political
  May be due to a mix of factors

English is the predominant language used

Page 8: Eavesdropping on the Twitter Microblogging Site

140 CHARACTERS (CONT.)

User Accounts

Verified and unverified user accounts (with regular efforts to clean off spam accounts)

Human, robot, sensor, and cyborg accounts
  User-created profile data
  Image data, video data
  Start-date of account
  Tweets, following, followers, likes, and lists (account status)

Notifications for raising sociality: Who is following you, who retweeted you, who replied, and others

Page 9: Eavesdropping on the Twitter Microblogging Site

140 CHARACTERS (CONT.)

Data Types

Content data: Text messages, URL links (including shortened form links), images, Vine video snippets

Trace data: Online social network relationships based on replies, liking, mentions, following / unfollowing, addressing @ accounts, #hashtagging around a shared topic of interest, and other types of interactions

Metadata: Geolocational information, exchangeable image format (EXIF) data from imagery, and others

Long memory of contents (constant recording), even of deleted messages; recoverability of data

May have private networks (with publicly-inaccessible, unscrape-able, and otherwise hidden data)


Page 10: Eavesdropping on the Twitter Microblogging Site

140 CHARACTERS (CONT.)

Publicly Accessible Data from the Twitter API and Limits

Twitter API (application programming interface) enables very partial capture of available data on a topic
  Is a rate-limited feature
  Enables access to a few percent of the available messaging
  Requires authenticated sign-in for “whitelisting”

There are challenges with assertability because of the non-random capture of data and inherent limits

For full data, researchers need to go with Gnip as a provider of big and social data from a variety of social media platforms
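As a rough illustration of what rate- and size-limited capture means in practice, the sketch below pages backward through results and pauses whenever a request budget is used up. Everything named here is an assumption made for illustration: `fetch_page` is a hypothetical stand-in for an API call, and the budget, window length, and message cap are invented numbers, not the Twitter API's actual limits.

```python
import time
from typing import Callable, List, Optional, Tuple

# Hypothetical shape of one API response: (messages, cursor for older page).
Page = Tuple[List[str], Optional[int]]

def collect(fetch_page: Callable[[Optional[int]], Page],
            max_requests_per_window: int = 3,  # assumed budget, not a real limit
            window_seconds: float = 0.0,       # assumed window length
            max_messages: int = 10) -> List[str]:
    """Page through results, sleeping whenever the request budget is spent."""
    collected: List[str] = []
    cursor: Optional[int] = None
    used = 0
    while len(collected) < max_messages:
        if used == max_requests_per_window:
            time.sleep(window_seconds)  # wait for the next rate-limit window
            used = 0
        messages, cursor = fetch_page(cursor)
        used += 1
        collected.extend(messages)
        if cursor is None:              # no older pages are left
            break
    return collected[:max_messages]

def fake_fetch(cursor: Optional[int]) -> Page:
    """Toy data source: 4 pages of 2 messages each, newest first."""
    start = 8 if cursor is None else cursor
    messages = [f"tweet {i}" for i in range(start, start - 2, -1)]
    next_cursor = start - 2 if start - 2 > 0 else None
    return messages, next_cursor

if __name__ == "__main__":
    print(collect(fake_fetch))
```

Because the loop starts with the newest messages and stops at a cap, whatever is captured is a recent slice rather than a random sample, which is the assertability concern raised above.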

Page 11: Eavesdropping on the Twitter Microblogging Site

WAYS THE MICROBLOGGING PLATFORM IS USED BY THE CROWD

#hashtag campaigns and social movements

Streaming live events via Web and mobile

Hosting public conversations with others

Expressing social and political power (and solidarity); commenting on social issues and sparking people to action

Ego expression and social performance

Engaging real-world role-playing games

Calling out other individuals and groups for certain behaviors through mockery and challenges

Engaging in online socio-cultural traditions like sharing food images, #TBTs, selfie-sharing, and others


Page 12: Eavesdropping on the Twitter Microblogging Site

WAYS THE MICROBLOGGING PLATFORM IS USED BY THE CROWD (CONT.)

Connecting social media endeavors across various platforms (image-sharing, video-sharing, and others)

Driving traffic

Deploying automated agents / robots (‘bots) to elicit information, communicate information, create a sense of artificial virality

Advertising products and services

Exploring particular social and other phenomena through data elicitation and research

Disseminating information about threats to citizens
  Weather, crime, wildfire, and other data

Sharing weather sensor information; sharing air quality information

Page 13: Eavesdropping on the Twitter Microblogging Site

WAYS THE MICROBLOGGING PLATFORM IS USED BY THE CROWD (CONT.)

Enhancing e-governance (the work of democratic governments through electronic means)
  Eliciting citizen feedback for various proposed laws and endeavors

Making social and professional relationships

Acquiring digital coupons and resources

Understanding certain locations and the interests of certain locations (such as around events)

and others…


Page 14: Eavesdropping on the Twitter Microblogging Site

WHY A GOOD SOURCE FOR RESEARCH?

Cyber interlinked with the physical world (cyber-physical confluence)
  May target a particular area to capture microblogging messages being shared in near-real time

Social media platform where people congregate and interact (particularly through mobile devices) and a culture of hyper-sharing
  Is regularly integrated with mainstream media
  Includes highly dynamic data

Ability to share in any language expressible in the Unicode character set (UTF-8), with multimedia and with links

Page 15: Eavesdropping on the Twitter Microblogging Site

WHY A GOOD SOURCE FOR RESEARCH? (CONT.)

Data leakage (unintended sharing of information)
  Human impulsivity, with personal guard up; near-constant “status updates” to an imagined audience
  Lack of full control in terms of strategic information sharing
  Inadvertent digital recording of imagery / sound / audio
  Metadata capture (such as EXIF data)
  Narrow-casting to an intended audience but broadcasting to all
  Assumption of ephemeral interactions and erase-ability of messages
  Accidental “send”
  Self-talking / talking to self in online public spaces
  Accidental change to privacy settings
  Letting an untrustworthy member into a private network

Page 16: Eavesdropping on the Twitter Microblogging Site

WHY A GOOD SOURCE FOR RESEARCH? (CONT.)

Human analytical capabilities applied to the data
  May engage in close readings for understandings
  May engage the imagery
  May engage the language
  May engage the URLs
  May engage the public reputations and interrelationships

Ability to collate data across regions, user accounts, topics, events, and other elements using various applications (that tap into the Twitter API)

Ability to access full sets of “N” through commercial means for research purposes

Page 17: Eavesdropping on the Twitter Microblogging Site

WHY A GOOD SOURCE FOR RESEARCH? (CONT.)

Ability to apply state-of-the-art computational analytics
  Social network analysis
  Text analyses
  Linguistic analysis
  Word clusters
  Word co-occurrence / matrix analyses, and others
  Sentiment analysis
  Emotion analysis
  Theme and sub-theme extraction / topic modeling
  Geographical mapping

Page 18: Eavesdropping on the Twitter Microblogging Site

SOME COMMON METHODS FOR EAVESDROPPING


Page 19: Eavesdropping on the Twitter Microblogging Site

SOME ASPECTS OF DATA QUALITY IN TWITTER

Raw or processed data (and summary data)

In context of interactivity or not in social context

Verified or not

Dynamic real-time or time-delayed data or historical data

Complete set (N=all) or partial set (albeit not a random selection)

Customized data or not

Filtered data or not

Data analytics enhanced or non-data-analytics enhanced

Limited access or accessible-to-all

Private or public

Page 20: Eavesdropping on the Twitter Microblogging Site

VARYING QUALITY STANDARDS OF TWITTER INFORMATION (BASED ON ACCESS LEVELS AND ANALYTICAL CAPABILITIES)

• N = all (gold standard)

• Access through commercial means

• Sophisticated research and analytics techniques

• Limited access from Twitter API

• Partial data extraction through scraping

• Partial data extraction through third-party data exporters

• Individual usage (based on EULA and affordances)

• Engaged and interactive

• With community

• With multimedia

• Broad common usage (for informational purposes)


Page 21: Eavesdropping on the Twitter Microblogging Site

SOME COMMON METHODS USED FOR EAVESDROPPING ON TWITTER

Following and interacting with people and groups on their Tweetstreams
  Only requires connectivity and an account on Twitter

Mapping social network graphs
  Requires access to data and software to map network graphs

Drawing content networks (such as word relationships from a Tweet set)
  Requires access to data and software to analyze the text

Mapping eventgraphs through “human sensor networks”
  Requires access to data over time, over space, over topic, and over social media user account
  Requires ability to translate data from other languages back to English (or the base language)
  Requires ability to computationally draw data as data visualizations
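To make "mapping social network graphs" a little more concrete, here is a minimal sketch that turns a handful of Tweets into a directed mention network and ranks accounts by how many distinct accounts mention them. The sample Tweets and account names are invented, and in-degree is only one simple stand-in for "influence"; tools such as NodeXL offer many other network measures.

```python
import re
from collections import Counter
from typing import List, Tuple

MENTION = re.compile(r"@(\w+)")

def mention_edges(tweets: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Turn (author, text) pairs into directed author -> mentioned-account edges."""
    edges = []
    for author, text in tweets:
        for target in MENTION.findall(text):
            if target.lower() != author.lower():  # skip self-mentions
                edges.append((author.lower(), target.lower()))
    return edges

def in_degree(edges: List[Tuple[str, str]]) -> Counter:
    """Count how many distinct accounts mention each account."""
    return Counter(target for _, target in set(edges))

# Invented sample data, for illustration only.
SAMPLE_TWEETS = [
    ("ana", "Thanks @ben and @cara!"),
    ("ben", "RT @cara: new post"),
    ("dev", "@cara @ana agreed"),
    ("cara", "@cara note to self"),
]

if __name__ == "__main__":
    ranking = in_degree(mention_edges(SAMPLE_TWEETS))
    print(ranking.most_common(1))
```

The edge list produced by `mention_edges` is the same kind of relational (trace) data that a graphing tool would lay out visually.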

Page 22: Eavesdropping on the Twitter Microblogging Site

SOME COMMON METHODS USED FOR EAVESDROPPING ON TWITTER (CONT.)

Capturing #hashtag conversations
  Requires access to the data (through software tools, through high-level computer language for data scraping)
  Requires ability to map hashtag conversations based on users, communications, and other dimensions
  Requires ability to interact with the extracted textual and multimedia data

Capturing keyword searches
  Requires access to the data and ability to map networks and interact with the textual and multimedia data (see above)
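One way to picture the "map hashtag conversations" step is to pull the #hashtags out of each message and count which ones appear together. The sketch below does that with the standard library only; the sample messages are invented, and the simple `#\w+` pattern is an assumption that will not match every hashtag Twitter itself recognizes.

```python
import re
from collections import Counter
from itertools import combinations
from typing import List

HASHTAG = re.compile(r"#(\w+)")  # simplified pattern, assumed for illustration

def hashtags(text: str) -> List[str]:
    """Return the distinct hashtags in a message, lowercased and sorted."""
    return sorted({tag.lower() for tag in HASHTAG.findall(text)})

def cooccurrence(tweets: List[str]) -> Counter:
    """Count how often each pair of hashtags appears in the same message."""
    pairs = Counter()
    for text in tweets:
        for a, b in combinations(hashtags(text), 2):
            pairs[(a, b)] += 1
    return pairs

# Invented sample data, for illustration only.
SAMPLE = [
    "Ready for #SIDLIT day one #edtech",
    "#edtech tools for #highered",
    "Slides from #sidlit #EdTech",
]

if __name__ == "__main__":
    for pair, count in cooccurrence(SAMPLE).most_common():
        print(pair, count)
```

Pairs with high counts can then be drawn as edges in a hashtag network.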

Page 23: Eavesdropping on the Twitter Microblogging Site

SOME COMMON METHODS USED FOR EAVESDROPPING ON TWITTER (CONT.)

Drawing geographical maps to spatially map social networks
  Requires access to the data
  Requires ability to map geolocational coordinates to spatial locations on a map (including dense clusters)

Profiling individuals and groups remotely (zero-interaction profiling)
  Requires access to the data
  Requires access to profile information
  Requires access to the expressions of the site holder (text, imagery, audio, and video)
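Finding "dense clusters" of geolocated messages can be approximated, very roughly, by dropping each coordinate pair into a grid cell and counting the messages per cell. The sketch below assumes coordinates arrive as (latitude, longitude) pairs; the half-degree cell size, the threshold, and the sample points are all invented for illustration, and real mapping tools use more careful spatial methods.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Cell = Tuple[int, int]

def grid_cell(lat: float, lon: float, cell_degrees: float = 0.5) -> Cell:
    """Map a coordinate pair to the index of the grid cell that contains it."""
    return (math.floor(lat / cell_degrees), math.floor(lon / cell_degrees))

def dense_cells(points: List[Tuple[float, float]],
                cell_degrees: float = 0.5,
                min_count: int = 2) -> Dict[Cell, int]:
    """Return the cells holding at least min_count points, with their counts."""
    counts = Counter(grid_cell(lat, lon, cell_degrees) for lat, lon in points)
    return {cell: n for cell, n in counts.items() if n >= min_count}

# Invented sample coordinates, for illustration only.
SAMPLE_POINTS = [
    (39.19, -96.58),
    (39.18, -96.57),
    (39.05, -95.68),
    (38.97, -95.24),
]

if __name__ == "__main__":
    print(dense_cells(SAMPLE_POINTS))
```

Cells that pass the threshold are candidates for closer reading of the messages posted from that area.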

Page 24: Eavesdropping on the Twitter Microblogging Site

SOME COMMON TWITTER DATA CAPTURE TOOLS

NodeXL (Network Overview, Discovery and Exploration for Excel, free add-on to Excel in the Basic version)

NCapture web browser add-on linked to NVivo 11 Plus (proprietary software tool)

R (free, high-level programming language)

Python (free, high-level programming language)

Also online cloud-based data download tools:
  Twitter Advanced Search
  Digital Methods Initiative (DMI) Tools
  Netlytic

Page 25: Eavesdropping on the Twitter Microblogging Site

TYPES OF DATA FROM TWITTER


Page 26: Eavesdropping on the Twitter Microblogging Site

DATA SETS OF TWITTER DATA

Columns: Row ID | Tweet ID | Username | Tweet | Time | Tweet Type | Retweeted By | Number of Retweets | Hashtags | Mentions | Name | Location | Web | Bio | Number of Tweets | Number of Followers | Number Following | Location Coordinates
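A data set with columns like those listed above can be explored with a few lines of code once it is exported. The sketch below assumes a comma-separated export and uses only a subset of the column names; the file layout and the sample rows are invented for illustration and may not match what any particular capture tool produces.

```python
import csv
import io
from collections import Counter
from typing import Dict, List, Tuple

# Invented sample export, for illustration only (assumed CSV layout).
SAMPLE_CSV = """Tweet ID,Username,Tweet,Tweet Type,Number of Retweets,Hashtags
1,ana,Ready for #SIDLIT,Tweet,2,SIDLIT
2,ben,RT @ana: Ready for #SIDLIT,Retweet,0,SIDLIT
3,cara,Coffee first,Tweet,0,
"""

def load_rows(text: str) -> List[Dict[str, str]]:
    """Parse CSV text into one dictionary per row, keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

def summarize(rows: List[Dict[str, str]]) -> Tuple[Counter, int]:
    """Count rows per Tweet Type and total the retweet counts."""
    by_type = Counter(row["Tweet Type"] for row in rows)
    total_retweets = sum(int(row["Number of Retweets"]) for row in rows)
    return by_type, total_retweets

if __name__ == "__main__":
    by_type, total_retweets = summarize(load_rows(SAMPLE_CSV))
    print(by_type, total_retweets)
```

The structured columns (counts, types, times) lend themselves to this kind of quick tally, while the Tweet text itself usually needs text analysis or close reading.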

Page 27: Eavesdropping on the Twitter Microblogging Site

TYPES OF DATA FROM TWITTER

Machine coding and analysis

Text
  Symbolic processing required

Metadata
  Relational (trace) data
  Profile information
  Time zone information
  Locational information
  Locational coordinate data, and others

Manual coding and analysis

Imagery
  High dimensionality data

Video
  High dimensionality data
  Multi-sensory data

Links
  High dimensionality data
  Multi-sensory data

Page 28: Eavesdropping on the Twitter Microblogging Site

FEATURES OF THE DATA SET

Structured data

Quantitative

Unstructured / semi-structured data

Qualitative
  Textual data
  Multimedia
    Digital imagery (including automated GIFs)
    Audio
    Video
    Live video streams
    Interactive contents

Page 37: Eavesdropping on the Twitter Microblogging Site

EXPLORING MICROBLOGGING DATA


Page 38: Eavesdropping on the Twitter Microblogging Site

UNDERSTANDING DATA LIMITATIONS

Dynamic Tweetsets are rate- and size-limited on the Twitter API
  Captured sets start from the most recent messages and work backwards
  Twitter sets tend to be highly volatile, with quite a few changes over time as compared to other data from social media like related tags networks on Flickr or Wikipedia article networks
  Extracted data are accurate for a certain short amount of time and then must be updated for accuracy

Tweetstream extractions from a target account may range from 1% to 100% of the available Tweets, depending on the account activity and length of existence (and whether retweets / RTs are included)

It is possible to sample various datasets from Twitter over time for time-based insights
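The 1% to 100% range above follows from simple arithmetic if one assumes a fixed ceiling on how many Tweets can be retrieved per account. The sketch below works through that arithmetic; the cap of 3,000 is a made-up illustrative number, not a documented limit, so check the current Twitter API documentation for real values.

```python
def coverage(total_tweets: int, cap: int) -> float:
    """Fraction of an account's Tweets retrievable under a fixed per-account cap."""
    if total_tweets <= 0:
        return 1.0  # nothing to miss on an empty account
    return min(1.0, cap / total_tweets)

ASSUMED_CAP = 3000  # hypothetical ceiling, for illustration only

if __name__ == "__main__":
    for total in (500, 3000, 30000, 300000):
        print(f"{total:>7} Tweets posted -> {coverage(total, ASSUMED_CAP):.0%} captured")
```

A young or quiet account falls entirely under the ceiling, while a long-lived, very active account yields only its most recent sliver.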

Page 39: Eavesdropping on the Twitter Microblogging Site

UNDERSTANDING DATA LIMITATIONS (CONT.)

When reporting research, it’s important to explain data limitations, and use appropriate qualifiers.

Data visualizations are always summary data and so will need textual augmentation / explanation for sufficient clarity. It may be helpful to share examples of the underlying data as well.


Page 40: Eavesdropping on the Twitter Microblogging Site

RICH DATA HANDLING

Twitter data contains messaging, trace, and metadata
  Messaging consists of text, imagery, audio, video, and other rich data types
  Rich data types require plenty of manual analysis for deeper insights
  Text data may be analyzed using a combination of human “close reading” and manual coding and “distant reading” and machine coding

Twitter trace (relational) data needs to be mapped to networks for analytics

Twitter metadata needs to be analyzed using both human analysis and machine analysis
  Some locational / location coordinates metadata may be mapped to a geographical map
  Time data may be mapped to line charts and bar charts
  Profile metadata may be analyzed using text analytics tools and close human reading

Page 41: Eavesdropping on the Twitter Microblogging Site

EXPLORING MICROBLOGGING DATA

Manual coding

Theory-informed coding for dominant themes

Emergent coding (from the data set), with qualitative and quantitative insights
  Identification of data types
  Identification of points of interest
  Identification of personalities

Auto coding

Theme and subtheme extraction

Sentiment mining and analysis

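For a feel of what automated sentiment coding does, the sketch below labels a message positive, negative, or neutral by counting words from two tiny word lists. This is a deliberately simple lexicon approach swapped in for illustration; the word lists are invented, and it is not the method used by NVivo or any other tool named in this presentation.

```python
import re

# Tiny invented word lists, for illustration only.
POSITIVE = {"great", "love", "helpful", "good"}
NEGATIVE = {"bad", "broken", "hate", "slow"}

def sentiment(text: str) -> str:
    """Label text by comparing counts of positive and negative lexicon words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

if __name__ == "__main__":
    for message in ("Love the new tools, great session",
                    "The wifi is slow and the projector is broken",
                    "Session starts at noon"):
        print(sentiment(message), "|", message)
```

Word counting of this kind misses sarcasm, negation, and context, which is one reason machine coding is usually paired with human close reading.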

Page 42: Eavesdropping on the Twitter Microblogging Site

EXPLORING MICROBLOGGING DATA (CONT.)

Data queries

Text frequency count

Text search

Matrix query

Cluster analysis

Data visualizations

Word clouds

Word trees

Dendrograms (vertical and horizontal)

Cluster diagrams (2D and 3D)

Hierarchy charts: treemap diagrams, sunburst graphs

Ring lattice graphs / circle graphs

Intensity matrices

Bar charts

Geographical maps, and others

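A text frequency count, the first query listed above, can be sketched in a few lines. The example below strips links, skips a short list of common words, and counts what remains; the stopword list and sample messages are invented for illustration, and the resulting counts are the kind of summary a word cloud would display.

```python
import re
from collections import Counter
from typing import List, Tuple

# Short invented stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "on", "is", "rt"}

def word_frequencies(tweets: List[str], top_n: int = 3) -> List[Tuple[str, int]]:
    """Count words, #hashtags, and @mentions across messages, ignoring links."""
    counts = Counter()
    for text in tweets:
        text = re.sub(r"https?://\S+", " ", text.lower())  # drop URLs
        for word in re.findall(r"[#@]?\w+", text):
            if word not in STOPWORDS:
                counts[word] += 1
    return counts.most_common(top_n)

# Invented sample data, for illustration only.
SAMPLE = [
    "RT the #edtech demo",
    "Demo of the #edtech tools http://t.co/x",
    "Tools for the demo",
]

if __name__ == "__main__":
    for word, count in word_frequencies(SAMPLE):
        print(word, count)
```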

Page 43: Eavesdropping on the Twitter Microblogging Site

SOME WIDELY AVAILABLE SOFTWARE TOOLS FOR DATA EXTRACTION

Using the right tool for the right questions
  NodeXL
  NCapture of NVivo
  R
  Python

Researchers must read the evolving Twitter API page to understand the limits of the data extraction

Page 44: Eavesdropping on the Twitter Microblogging Site

SOME PRACTICAL RESEARCH APPLICATIONS


Page 45: Eavesdropping on the Twitter Microblogging Site

SOME PRACTICAL RESEARCH APPLICATIONS

Capturing a range of data to try to understand an issue

Using social media data for decision-making

Exploring relationships on social media sites to understand leaders and their messaging, to understand followers, and the state of the social network

Using social media messaging to “remote profile” others (even with zero interactions)

Extracting major themes from online groups / #hashtag networks / keyword communities, and profile-specific Tweetstreams

Analyzing sentiment from online communities / networks and profile-specific Tweetstreams


Page 46: Eavesdropping on the Twitter Microblogging Site

SOME PRACTICAL AWARENESS AND DECISION-MAKING APPLICATIONS

Page 47: Eavesdropping on the Twitter Microblogging Site

SOME PRACTICAL AWARENESS AND DECISION-MAKING APPLICATIONS

Designing and deploying messaging campaigns

Identifying influential individuals in order to pass along a message (whether microcasting or broadcasting or mixed casting)

Learning about online communities

Understanding other online personalities (egos, entities)


Page 48: Eavesdropping on the Twitter Microblogging Site

RECENT PUBLISHED RUMORS OF POSSIBLE CHANGES TO THE SERVICE

Possible raising of the 140-character limit?

Pressures to make the platform more engaging and relevant?

Clearer labeling of commercial accounts?

Improved usage of the platform for learning?


Page 49: Eavesdropping on the Twitter Microblogging Site

DEMOS


Page 51: Eavesdropping on the Twitter Microblogging Site

QUESTIONS? COMMENTS?

Ideas for other possible uses?


Page 52: Eavesdropping on the Twitter Microblogging Site

CONCLUSION AND CONTACT

Dr. Shalin Hai-Jew
iTAC, Kansas State University
212 Hale / Farrell Library
[email protected]
785-532-5262

The presenter has no formal tie to Twitter, Inc.

The Twitter logo on the cover is from the company and aligns with the company’s terms of usage to represent Twitter. The map on Slide 5 is by FOBOS92 and was released in 2012 for usage through the Wikimedia Commons and the CC Attribution-Share Alike 3.0 license. The other images were created by the author using a range of software tools, including NodeXL, NCapture, NVivo 11 Plus, and Microsoft Visio.
