eavesdropping on the twitter microblogging site

Post on 12-Apr-2017

Category:

Social Media

  • EAVESDROPPING ON THE TWITTER MICROBLOGGING SITE

    Summer Institute on Distance Learning and Instructional Technology (SIDLIT 2016)

    August 4-5, 2016

  • OVERVIEW

    Research analysts go to Twitter to capture the general trends of public conversations, identify and profile influential accounts, and extract subgroups within larger collectives and larger discourses; they also go to eavesdrop on individual self-talk and individual-to-individual conversations. So what is technically in your tweets? Dave Rosenberg famously asked this in a CNET article (2010). The answer: a whole lot more than 140 characters. How are the most influential social media accounts identified through #hashtag graphs? How are themes extracted? How are sentiments understood? How can users be profiled through their Tweetstreams? How can locations be mapped in terms of the Twitter conversations occurring in particular physical areas? How can live and trending issues be identified and categorized in terms of sentiment (positive, negative, and neutral)? This presentation summarizes some of the free and open-source tools, as well as commercial and proprietary ones, that make this information more knowable.

    http://www.cnet.com/news/whats-technically-in-your-tweets/

    http://i.i.cbsi.com/cnwk.1d/i/tim/2010/04/20/30146338-map-of-a-tweet.png

  • ABOUT TWITTER

  • TWITTER DEMOGRAPHICS

    320 million monthly active users

    A billion unique visits monthly to sites with embedded Tweets

    80% active users on mobile

    Support for 35 languages (About Twitter, 2015)

    https://about.twitter.com/company

  • COUNTRIES AND CITIES WITH LOCAL TRENDING TOPICS IN TWITTER (BY FOBOS92)

    https://upload.wikimedia.org/wikipedia/commons/e/e4/Twitter_TT.PNG

    https://en.wikipedia.org/wiki/Twitter#/media/File:Twitter_TT.PNG

  • BUSINESS MODEL

    Runs on advertising based on delivery of human attention and encouragement of certain types of consumption

    Intersperses funded commercial messaging into regular messaging

    Enables fine-tuned targeting of desirable audiences along with various metrics (Advertising on Twitter)

    https://business.twitter.com/en/advertising.html

  • 140 CHARACTERS

    Geographical Coverage

    Foremost microblogging site in major parts of the world and a main part of the social ecosystem

    Blocked in some countries: Turkey, Iran, China, and North Korea (Liebelson, Mar. 28, 2014) and with some temporary phases of inaccessibility in a number of other countries (Censorship of Twitter, Apr. 20, 2016)
        May be based in part on market protectionism for native microblogging services
        May be political
        May be due to a mix of factors

    English is the predominant language used

    http://www.motherjones.com/politics/2014/03/turkey-facebook-youtube-twitter-blocked

    https://en.wikipedia.org/wiki/Censorship_of_Twitter

  • 140 CHARACTERS (CONT.)

    User Accounts

    Verified and unverified user accounts (with regular efforts to clean off spam accounts)

    Human, robot, sensor, and cyborg accounts
        User-created profile data
        Image data, video data
        Start-date of account
        Tweets, following, followers, likes, and lists (account status)

    Notifications for raising sociality: Who is following you, who retweeted you, who replied, and others

  • 140 CHARACTERS (CONT.)

    Data Types

    Content data: Text messages, URL links (including shortened form links), images, Vine video snippets

    Trace data: Online social network relationships based on replies, liking, mentions, following / unfollowing, addressing @ accounts, #hashtagging around a shared topic of interest, and other types of interactions

    Metadata: Geolocational information, exchangeable image file format (EXIF) data from imagery, and others

    Long memory of contents (constant recording), even of deleted messages; recoverability of data

    May have private networks (with publicly-inaccessible, unscrape-able, and otherwise hidden data)
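
    The content and trace data types above can be illustrated with a minimal sketch. It assumes tweets are already available as plain (author, text) pairs; the regular expressions are simplified stand-ins rather than Twitter's own entity-parsing rules, and the sample accounts and messages are hypothetical.

```python
import re
from collections import Counter

# Simplified patterns (assumptions for this sketch, not Twitter's parser).
MENTION_RE = re.compile(r"(?<!\w)@(\w{1,15})")
HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")
URL_RE = re.compile(r"https?://\S+")


def extract_entities(text):
    """Return the mentions, hashtags, and URLs found in one message."""
    urls = URL_RE.findall(text)
    # Remove URLs first so a fragment like "#top" inside a link
    # is not miscounted as a hashtag.
    stripped = URL_RE.sub(" ", text)
    return {
        "mentions": [m.lower() for m in MENTION_RE.findall(stripped)],
        "hashtags": [h.lower() for h in HASHTAG_RE.findall(stripped)],
        "urls": urls,
    }


def mention_edges(tweets):
    """Count directed (author -> mentioned account) edges as trace data."""
    edges = Counter()
    for author, text in tweets:
        for target in extract_entities(text)["mentions"]:
            edges[(author.lower(), target)] += 1
    return edges


# Hypothetical sample messages as (author, text) pairs.
sample = [
    ("alice", "Heading to #SIDLIT with @bob http://example.com/a#top"),
    ("bob", "@alice see you there! #sidlit"),
    ("carol", "Great session, @bob. #edtech"),
]

edges = mention_edges(sample)

# In-degree (how often an account is mentioned) is one rough signal of attention.
in_degree = Counter()
for (_, target), count in edges.items():
    in_degree[target] += count
```

    Here `in_degree.most_common(1)` would surface the most-mentioned account in the sample. Real analyses typically combine several relationship types (replies, retweets, follows) rather than mentions alone.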

  • 140 CHARACTERS (CONT.)

    Publicly Accessible Data from the Twitter API and Limits

    Twitter API (application programming interface) enables very partial capture of available data on a topic
        Is a rate-limited feature
        Enables access to a few percent of the available messaging
        Requires authenticated sign-in for whitelisting

    There are challenges with assertability because of the non-random capture of data and the API's inherent limits

    For full data, researchers need to go through Gnip, a provider of big and social data from a variety of social media platforms
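
    Because the API is rate-limited, collection scripts usually pace their own requests. The following is a minimal client-side sketch, not part of any Twitter library: the class name `WindowThrottle`, its parameters, and the limits used in the demonstration are all assumptions chosen for illustration. A simulated clock is used so the example runs instantly.

```python
import time
from collections import deque


class WindowThrottle:
    """Allow at most max_calls calls in any rolling window of window_seconds."""

    def __init__(self, max_calls, window_seconds,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._clock = clock
        self._sleep = sleep
        self._stamps = deque()  # times of the calls still inside the window

    def wait(self):
        """Block until one more call is allowed, then record it."""
        while True:
            now = self._clock()
            # Forget calls that have aged out of the window.
            while self._stamps and now - self._stamps[0] >= self.window_seconds:
                self._stamps.popleft()
            if len(self._stamps) < self.max_calls:
                self._stamps.append(now)
                return
            # Sleep until the oldest recorded call leaves the window.
            self._sleep(self.window_seconds - (now - self._stamps[0]))


class SimulatedClock:
    """Stand-in clock whose sleep() just advances the simulated time."""

    def __init__(self):
        self.now = 0
        self.slept = []

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.slept.append(seconds)
        self.now += seconds


# Illustrative limits only: 2 calls per 10-second window.
sim = SimulatedClock()
throttle = WindowThrottle(max_calls=2, window_seconds=10,
                          clock=sim.time, sleep=sim.sleep)
for _ in range(3):
    throttle.wait()  # a real script would issue one API request here
```

    In the demonstration, the first two calls pass immediately and the third waits for the window to roll over. With the default arguments the class uses the real clock, so `throttle.wait()` can be placed before each request in a collection loop.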

  • WAYS THE MICROBLOGGING PLATFORM IS USED BY THE CROWD

    #hashtag campaigns and social movements

    Streaming live events via Web and mobile

    Hosting public conversations with others

    Expressing social and political power (and solidarity); commenting on social issues and sparking people to action

    Ego expression and social performance

    Engaging real-world role-playing games

    Calling out other individuals and groups for certain behaviors through mockery and challenges

    Engaging in online socio-cultural traditions like sharing food images, #TBTs, selfie-sharing, and others

  • WAYS THE MICROBLOGGING PLATFORM IS USED BY THE CROWD (CONT.)

    Connecting social media endeavors across various platforms (image-sharing, video-sharing, and others)

    Driving traffic

    Deploying automated agents / robots (bots) to elicit information, communicate information, create a sense of artificial virality

    Advertising products and services

    Exploring particular social and other phenomena through data elicitation and research

    Disseminating information about threats to citizens
        Weather, crime, wildfire, and other data

    Sharing weather sensor information; sharing air quality information

  • WAYS THE MICROBLOGGING PLATFORM IS USED BY THE CROWD (CONT.)

    Enhancing e-governance (the work of democratic governments through electronic means)
        Eliciting citizen feedback for various proposed laws and endeavors

    Making social and professional relationships

    Acquiring digital coupons and resources

    Understanding certain locations and the interests of certain locations (such as around events)

    and others

  • WHY A GOOD SOURCE FOR RESEARCH?

    Cyber interlinked with the physical world (cyber-physical confluence)
        May target a particular area to capture microblogging messages being shared in near-real time

    Social media platform where people congregate and interact (particularly through mobile devices) and a culture of hyper-sharing
        Is regularly integrated with mainstream media
        Includes highly dynamic data

    Ability to share via any language expressible via UTF-8 Unicode character set and with multimedia and with links
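
    Since messages can be written in any script that Unicode covers, the "length" of a message depends on how it is counted. The sketch below compares code points with UTF-8 bytes; normalizing to NFC before counting is an assumption made for this illustration, not a statement of Twitter's exact counting rule, and the function name `length_report` is hypothetical.

```python
import unicodedata


def length_report(text):
    """Compare a message's length in code points and in UTF-8 bytes."""
    # NFC composes sequences such as "e" + combining accent into one code point.
    normalized = unicodedata.normalize("NFC", text)
    return {
        "code_points": len(normalized),
        "utf8_bytes": len(normalized.encode("utf-8")),
    }


# "cafe" + combining acute accent, a Japanese greeting, and plain ASCII.
samples = ["cafe\u0301", "こんにちは", "hello"]
reports = {s: length_report(s) for s in samples}
```

    The Japanese sample has the same number of code points as "hello" but three times as many bytes, which matters when estimating storage or transfer size for multilingual captures.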

  • WHY A GOOD SOURCE FOR RESEARCH? (CONT.)

    Data leakage (unintended sharing of information)
        Human impulsivity, with personal guard up; near-constant status updates to an imagined audience
        Lack of full control in terms of strategic information sharing
        Inadvertent digital recording of imagery / sound / audio
        Metadata capture (such as EXIF data)
        Narrow-casting to an intended audience but broadcasting to all
        Assumption of ephemeral interactions and erase-ability of messages
        Accidental send
        Self-talking / talking to self in online public spaces
        Accidental change to privacy settings
        Letting an untrustworthy member into a private network

  • WHY A GOOD SOURCE FOR RESEARCH? (CONT.)

    Human analytical capabilities applied to the data
        May engage in close readings for understandings
        May engage the imagery
        May engage the language
        May engage the URLs
        May engage the public reputations and interrelationships

    Ability to collate data across regions, user accounts, topics, events, and other elements using various applications (that tap into the Twitter API)

    Ability to access full sets of N through commercial means for research purposes

  • WHY A GOOD SOURCE FOR RESEARCH? (CONT.)

    Ability to apply state-of-the-art computational analytics
        Social network analysis
        Text analyses
            Linguistic analysis
            Word clusters
            Word co-occurrence / matrix analyses, and others
            Sentiment analysis
            Emotion analysis
            Theme and sub-theme extraction / topic modeling

    Geographical mapping
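
    Two of the text analyses listed above, word co-occurrence and sentiment analysis, can be sketched in a few lines. This is a deliberately naive illustration: the lexicons, the stopword list, and the sample messages are hypothetical, and production tools rely on far larger lexicons or trained models.

```python
import re
from collections import Counter
from itertools import combinations

# Tiny hypothetical word lists, for illustration only.
POSITIVE = {"great", "love", "helpful"}
NEGATIVE = {"slow", "broken", "hate"}
STOPWORDS = {"the", "a", "is", "and", "but", "this", "so"}


def tokenize(text):
    """Lowercase word tokens, keeping # and @ prefixes, minus stopwords."""
    tokens = re.findall(r"[#@]?\w+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def cooccurrence(messages):
    """Count how often each unordered pair of distinct tokens shares a message."""
    pairs = Counter()
    for text in messages:
        for pair in combinations(sorted(set(tokenize(text))), 2):
            pairs[pair] += 1
    return pairs


def sentiment(text):
    """Label a message positive, negative, or neutral by counting lexicon hits."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


messages = [
    "Love the #sidlit keynote, so helpful",
    "The wifi is slow and the projector is broken",
    "#sidlit keynote starts at nine",
]

pairs = cooccurrence(messages)
labels = [sentiment(m) for m in messages]
```

    The pair counts form a sparse co-occurrence matrix, and `pairs.most_common()` lists the token pairs that most often share a message. The lexicon approach ignores negation and sarcasm, which is one reason sentiment results from short messages need careful reading.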

  • SOME COMMON METHODS FOR EAVESDROPPING

  • SOME ASPECTS OF DATA QUALITY IN TWITTER

    Raw or processed data (and summary data)

    In context of interactivity or not in social context

    Verified or not

    Dynamic real-time or time-delayed data or historical data

    Complete set (N=all) or partial set (albeit not a random selection)

    Customized data or not

    Filtered data or not

    Data analytics enhanced or non-data-analytics enhanced

    Limited access or accessible-to-all
        Private or public

  • VARYING QUALITY STANDARDS OF TWITTER INFORMATION (BASED ON ACCESS LEVELS AND ANALYTICAL CAPABILITIES)

    N = all (gold standard)
        Access through commercial means
        Sophisticated research and analytics techniques

    Limited access from Twitter API

    Partial data extraction through scraping

    Partial data extraction through third-party data exporters

    Individual usage (based on EULA and affordances)

    Engaged and interactive
        With community
        With multimedia

    Broad common usage (for informational purposes)

  • SOME COMMON METHODS USED FOR EAVESDROPPING ON TWIT