Grounded theory meets Big Data: One way to marry ethnography and digital methods

May 2016

Dhiraj Murthy | @dhirajmurthy | [email protected]

CAST: Social Media Research Cluster


@dhirajmurthy 1



Objectives

• There are unique challenges associated with data collection and analysis on social media platforms

• How do we integrate and weigh Big Data questions and more in-depth contextualized analysis of social media content?

• How do we categorize textual and visual content, addressing issues of ontology?

• How can grounded theory be applied to coding schemes?


Starting points

•  Big data methods have been successfully applied to Twitter data (indeed, 16% of research on Twitter employed sentiment analysis; Zimmer and Proferes, 2014)

•  We may think that anything about human behavior can be deciphered from Twitter data, but that simply is not true.

•  There are also challenges associated with data collection and analysis on Twitter (boyd & Crawford, 2012).

•  Closed coding systems are thought to be the best for studying Twitter data

•  However, social media data involves very ‘messy’ elements and mixed approaches can have high utility


New ontologies

So perhaps we need to … challenge traditional ontological assumptions!

Hardt and Negri (2005, p. 312) argue that this type of a critical ‘new ontology’ is part of their desire not to engage in “repeating old rituals”, but, rather, “launching a new investigation in order to formulate a new science of society and politics [… that] is not about piling up statistics or mere sociological facts [… but] immersing ourselves in the movements of history and the anthropological transformations of subjectivity.”


First: So what does Twitter API data ‘look like’?

    "user": {
      "name": "dhirajmurthy",
      "friendsCount": 771,
      "followersCount": 1534,
      "listedCount": 100,
      "statusesCount": 2609,
    }

This is an excerpt of API-delivered JavaScript Object Notation (JSON) data for my Twitter ID
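
As a minimal sketch of reading such an excerpt in Python, the snippet below wraps the "user" object in braces so that it parses as a standalone JSON document. The excerpt is a fragment, and its trailing comma is dropped here because strict JSON does not allow one. The helper name `summarize_user` and the derived followers-per-friend value are illustrative assumptions, not part of the Twitter API.

```python
import json

# The slide's excerpt, wrapped in braces to form a complete JSON document.
RAW = """
{
  "user": {
    "name": "dhirajmurthy",
    "friendsCount": 771,
    "followersCount": 1534,
    "listedCount": 100,
    "statusesCount": 2609
  }
}
"""


def summarize_user(raw: str) -> dict:
    """Parse the JSON text and return the profile fields plus one derived value."""
    user = json.loads(raw)["user"]
    summary = dict(user)
    # Hypothetical derived field: followers per account followed.
    summary["followersPerFriend"] = round(
        user["followersCount"] / user["friendsCount"], 2
    )
    return summary


if __name__ == "__main__":
    for key, value in summarize_user(RAW).items():
        print(f"{key}: {value}")
```

Reading the profile counts alongside the tweet body is one small way of treating the JSON object, rather than the text alone, as the unit of analysis.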


What is often missing in Twitter-based research

•  Be open in the inquiry, allowing coding to be emergent.

•  Ask what is happening in the tweet (not just body text). Think about JSON data holistically.

•  What are these tweet data helping us study, speaking broadly?

•  Are we being reflexive on the point of view/standpoint we are interpreting?

•  Are we being flexible or following prescribed rules?


Beyond induction and deduction…

•  ‘Big data is […] most effective when researchers take account of the complex methodological processes that underlie the analysis of that data’ (boyd & Crawford, 2012, p. 668).

•  And inductive and deductive methods have their own limitations


Beyond induction and deduction…

•  Abductive methods, a form of reasoning ‘for finding the best explanations among a set of possible ones’ (Paul, 1993), are an alternative approach

•  Retroduction: a type of abductive method that emphasizes “asking why” (Olsen, 2012: 215). With retroduction, researchers are able to probe the data regularly and to “avoid overgeneralisation but searching for reasons and causes” (p. 216) instead.

Or put another way, “the retroductive researcher, unlike the inductive researcher, has something to look for” (Blaikie, 2004).


Methods

Emergent coding methods can be implemented operationally in a systematic fashion to build critical, reflective, conceptual knowledge of Twitter-derived data.

Theory building, Adapted from Goulding, C. (2002), Grounded Theory: Sage, p. 115


In Practice

•  Be open in the inquiry, allowing coding to be emergent.

•  Tweets are not merely bits of text. Ask what is happening in the tweet (not just body text). Think about JSON object data holistically (cf. Manovich’s (2001) ‘digital objects’).

•  What are these tweet data helping us study, speaking broadly?

•  Are we being reflexive on the point of view / standpoint we are interpreting?

•  Are we being flexible or following prescribed rules?


Case study: Accidental Racist


Data collection and relationship model; figure adapted from Corbin, J. and Strauss, A. (2015), Basics of qualitative research: techniques and procedures for developing grounded theory, Thousand Oaks: Sage, p. 8

Continuous open coding Twitter data model applied to #accidentalracist, a hashtag associated with a 2013 duet by Brad Paisley and LL Cool J


•  Operationalizing this ontology requires several stages of coding

•  Memo making during collection and analysis is integral to both coding development and theory building

•  Comparisons across diverse data at each stage provide reflexivity and triangulation
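
One way to picture these stages is as a small data structure. The sketch below is an illustration under stated assumptions, not the procedure used in the case study: the class `Codebook`, its methods, and the example codes and memos are all hypothetical. It records open codes as they emerge, attaches a memo to each coding decision, and supports a simple comparison of code frequencies across data sources.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class Codebook:
    """Hypothetical container for emergent codes, memos, and coded items."""

    codes: dict = field(default_factory=dict)  # code -> working definition
    memos: list = field(default_factory=list)  # (stage, note) pairs
    applied: dict = field(default_factory=lambda: defaultdict(list))  # source -> codes

    def add_code(self, code: str, definition: str, stage: str, memo: str) -> None:
        """Open coding: a new code may be introduced at any stage, with a memo."""
        self.codes.setdefault(code, definition)
        self.memos.append((stage, f"{code}: {memo}"))

    def apply(self, source: str, code: str) -> None:
        """Attach an existing code to an item from a given data source."""
        if code not in self.codes:
            raise KeyError(f"unknown code: {code}")
        self.applied[source].append(code)

    def compare(self) -> dict:
        """Comparison across sources: code frequencies per source."""
        return {source: Counter(codes) for source, codes in self.applied.items()}


if __name__ == "__main__":
    # Made-up codes and memos, for illustration only.
    book = Codebook()
    book.add_code("humor", "joking or ironic framing", "open", "recurs in replies")
    book.add_code("critique", "explicit criticism", "open", "often paired with links")
    book.apply("tweet_text", "humor")
    book.apply("tweet_text", "critique")
    book.apply("profile", "humor")
    print(book.compare())
    print(book.memos)
```

Keeping memos in the same structure as the codes makes it easier to trace why a category was introduced when comparing across tweet text, profiles, and other metadata.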


Computational method first

•  One can effectively use machine learning approaches such as latent Dirichlet allocation (LDA) to derive topic clusters around a Twitter corpus

•  This can be used to inform which coding categories are deployed, not only for tweet content but also for profiles and other metadata

•  Example: Topic clusters derived from 90,986 cancer-related tweets (with keywords: cancer, mammogram, lymphoma, melanoma, and cancer survivor)
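
As a rough sketch of what deriving topic clusters with LDA involves, the block below implements a small collapsed Gibbs sampler in plain Python. It is an illustration under stated assumptions, not the pipeline behind the cancer-tweet example: the toy documents, the function names `lda_gibbs` and `top_words`, and the hyperparameter values (`alpha`, `beta`, `n_iter`) are all made up here. A real study would more likely use an established library and proper tokenization.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized docs (lists of words).

    Returns (topic_word, doc_topic, vocab): count tables from the final
    sample plus the sorted vocabulary used to index topic_word columns.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    topic_word = [[0] * n_words for _ in range(n_topics)]  # topic x word counts
    doc_topic = [[0] * n_topics for _ in docs]             # doc x topic counts
    topic_total = [0] * n_topics                           # tokens per topic
    assign = []                                            # topic of each token

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            topic_word[k][word_id[w]] += 1
            doc_topic[d][k] += 1
            topic_total[k] += 1
        assign.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assign[d][i]
                # Remove the token's current assignment from the counts.
                topic_word[k][v] -= 1
                doc_topic[d][k] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                topic_word[k][v] += 1
                doc_topic[d][k] += 1
                topic_total[k] += 1

    return topic_word, doc_topic, vocab


def top_words(topic_word, vocab, n=3):
    """Most frequent words per topic, as candidate inputs for coding categories."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in topic_word
    ]


if __name__ == "__main__":
    # Toy, made-up token lists standing in for preprocessed tweets.
    docs = [
        "mammogram screening appointment today".split(),
        "screening mammogram reminder".split(),
        "melanoma sunscreen skin check".split(),
        "skin melanoma awareness sunscreen".split(),
    ]
    tw, dt, vocab = lda_gibbs(docs, n_topics=2)
    for k, words in enumerate(top_words(tw, vocab)):
        print(f"topic {k}: {words}")
```

The top words per topic are only a starting point: in the approach described here they would be read by the researcher and used to suggest, not dictate, coding categories.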


Conclusions

•  Social media are complex sociotechnical spaces

•  Presentation of the self is often highly nuanced, and is particularly complicated by uses of humor, a frequent theme on Twitter

•  Coded content can present different perspectives on social interactions and these data are complementary to computational methods

•  Combining emergent grounded theory with machine learning or vice versa can advance both qualitative and computational methods


Dhiraj Murthy

Reader of Sociology at Goldsmiths, University of London

@dhirajmurthy

[email protected]


References

Blaikie, N. (2004). Retroduction. In M. S. Lewis-Beck, A. Bryman & T. F. Liao (Eds.), The SAGE Encyclopedia of Social Science Research Methods (p. 973). Thousand Oaks: Sage.

boyd, d., & Crawford, K. (2012). Critical questions for Big Data: Provocations for a cultural, technological, and scholarly phenomenon. Information, Communication & Society, 15(5), 662-679.

Corbin, J., & Strauss, A. (2015). Basics of qualitative research: techniques and procedures for developing grounded theory. Los Angeles: Sage.

Hardt, M., & Negri, A. (2005). Multitude: war and democracy in the age of Empire. New York: Penguin.

Murthy, D. (2011). Emergent digital ethnographic methods for social research. Handbook of Emergent Technologies in Social Research, Oxford University Press, Oxford, 158-179.

Olsen, W. K. (2012). Data collection: key debates and methods in social research. London; Thousand Oaks, Calif.: SAGE.

Paul, G. (1993). Approaches to abductive reasoning: an overview. Artificial Intelligence Review, 7(2), 109-152.

Zimmer, M., & Proferes, N. J. (2014). A topology of Twitter research: disciplines, methods, and ethics. Aslib Journal of Information Management, 66(3), 250-261. doi:10.1108/AJIM-09-2013-0083.


Selected Work

Most can be downloaded from http://www.dhirajmurthy.com/about/

Twitter: Social Communication in the Twitter Age. 2013, with Polity Press

‘Big Data Solutions On a Small Scale: Evaluating Accessible High Performance Computing for Social Research’, Big Data and Society (with Bowman, S.), 2014

‘Modeling virtual organizations with Latent Dirichlet Allocation: A case for natural language processing’, Neural Networks (with Gross, A.), Volume 58, pp. 38-49, 2014.

‘Social Media, Collaboration, and Scientific Organizations’, American Behavioral Scientist (with Lewis, J.P.), 2014.

‘Comparing Print Coverage and Tweets in Elections: a Case Study of the 2011-2012 US Republican Primaries’, Social Science Computer Review (with Petto, L.), 2014

‘Twitter and Disasters: the uses of Twitter during the 2010 Pakistan floods’, Information Communication and Society, Volume 16, Issue 6, 2013, pp. 837-855.

‘Emergent Data Mining Tools for Social Network Analysis’ in Data Mining in Dynamic Social Networks and Fuzzy Systems (Bhatnagar, V. ed.), pp. 40-57 (with Gross, A. and Takata, A.), 2013.

‘Evaluation and Development of Data Mining Tools for Online Social Networks’ in Mining Social Networks and Security Informatics (Özyer, T. et al. eds.), pp. 183-202 (with Gross, A., Takata, A., Bond, S.), 2013.

Murthy, D., Gross, A., Oliveira, D. ‘Understanding Cancer-based Networks in Twitter using Social Network Analysis’ in IEEE International Conference on Semantic Computing Proceedings. Palo Alto, California, 2011