Internet Archives and Social Science Research - Yeungnam University

BIG DATA AND SOCIAL SCIENCE THEORY: Leveraging Large-Scale Data to Discover New Patterns in Society. Monday, April 7, 2014, Cybermotions @ Korea, Yeungnam University. Matthew Weber, Rutgers University School of Communication & Information

Upload: mwe400

Post on 22-Jan-2015

Category: Data & Analytics

DESCRIPTION

Talk given at Yeungnam University on April 6, 2014

TRANSCRIPT

Page 1

BIG DATA AND SOCIAL SCIENCE THEORY: Leveraging Large-Scale Data to Discover New Patterns in Society

Monday, April 7, 2014, Cybermotions @ Korea, Yeungnam University

Matthew Weber, Rutgers University, School of Communication & Information

Page 2

Opportunity: The Internet Archive contains the largest single record of the history of the World Wide Web, from 1995 to the present: a wealth of untapped research data.

Challenge: There is a significant lack of research-ready databases and tools available to the scholarly community.

Page 3

© Internet Archive 2013

Page 4

© Internet Archive 2013

Page 10

Opportunity: The ArchiveHub project aims to support the creation and dissemination of general guidelines & tools for conducting theoretically and methodologically rigorous longitudinal research using archival Web data.

Page 14

Dataset | Research Potential | Dates | Captures | Unique URLs
--- | --- | --- | --- | ---
Hurricane Katrina | Online networks and organizational resilience (Chewning, Lai and Doerfel, 2012; Perry, Taylor and Doerfel, 2003) in the wake of disasters; information dissemination | 2003 – 2012 | 1,694,236 | 663,740
Superstorm Sandy | (as above) | 2003 – 2012 | 41,703,112 | 20,013,455
US Senate | Study the growth of political activity in online environments (Adamic & Glance, 2005; Bruns, 2007; Chang & Park, 2012); polarization & media discourse | 109th – 112th Congresses | 26,965,770 | 8,674,397
US House | (as above) | (as above) | 51,840,777 | 12,410,014
Occupy Wall Street | Previous research on NGOs in the online environment (Bach & Stark, 2004; Shumate, 2003, 2012; Shumate, Fulk, & Monge, 2005); use of hyperlink data to study the formation and role of alliances between SMOs | 2010 – 2012 | 247,928,272 | 11,3259,655
US Media | Previous studies of news media organizations (Greer & Mensing, 2006; Weber, 2012; Weber & Monge, In Press); focus on evolutionary patterns | 2008 – 2012 | 1,315,132,555 | 539,184,823
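As a rough way to compare the collections, the table's figures can be turned into an average number of captures per unique URL, i.e. roughly how many archived snapshots exist per distinct page. This is an illustrative sketch only: the helper name is hypothetical, and the Occupy Wall Street row is left out because its unique-URL figure (11,3259,655) appears garbled in the source.

```python
# Illustrative only: captures and unique URLs copied from the dataset table.
# Occupy Wall Street is omitted because its unique-URL count is garbled.
DATASETS = {
    "Hurricane Katrina": (1_694_236, 663_740),
    "Superstorm Sandy": (41_703_112, 20_013_455),
    "US Senate": (26_965_770, 8_674_397),
    "US House": (51_840_777, 12_410_014),
    "US Media": (1_315_132_555, 539_184_823),
}


def captures_per_url(captures: int, unique_urls: int) -> float:
    """Average number of archived captures per distinct URL (hypothetical helper)."""
    if unique_urls <= 0:
        raise ValueError("unique_urls must be positive")
    return captures / unique_urls


if __name__ == "__main__":
    for name, (captures, urls) in DATASETS.items():
        print(f"{name}: {captures_per_url(captures, urls):.2f} captures per URL")
```

A ratio well above 1 indicates that pages in a collection were captured repeatedly over time, which is what makes longitudinal comparisons possible.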

Page 15

http://archivehub.rutgers.edu

Page 17

Tracing the Emergence of Organizational Forms

Environment: Organizations compete for scarce resources; during periods of rapid disruption, new entrants seek “protected” niches (Weber & Monge, 2014).

Population: In digital spaces, online connections provide communicative representations of information flows (Weber & Monge, 2012). The formation of ties (e.g., hyperlinks) can positively affect an organization's long-term likelihood of survival (Weber, 2012).

Organization: Organizations adapt internally, reconfiguring team structures and developing new routines for knowledge sharing (Ellison, Gibbs & Weber, In Press; Weber & Kim, Under Review).

Page 19

Big Data… Big Theory?

• Networks are central to social movements in that links between nodes can be influential in collective action
• Examples of nodes include participants, organizations, media, and communications technologies
  – Social networks and social movements (Diani, 2003)
• Interactions between actors, and between actors and hashtags, collectively represent a networked form of organization
  – Network form of organization (Powell, 1990)
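One way to read the last bullet concretely is as an edge list in which users point both to the accounts they mention and to the hashtags they use. The sketch below assumes a hypothetical, simplified tweet record (`user`, `mentions`, `hashtags`); real Gnip PowerTrack payloads are structured differently and would have to be mapped onto it first.

```python
from collections import Counter
from typing import Iterable, TypedDict


class Tweet(TypedDict):
    # Hypothetical, simplified tweet record (not the Gnip payload format).
    user: str
    mentions: list[str]
    hashtags: list[str]


def build_edges(tweets: Iterable[Tweet]) -> Counter:
    """Weighted directed edges: user -> mentioned user, and user -> #hashtag."""
    edges: Counter = Counter()
    for tweet in tweets:
        source = tweet["user"].lower()
        for mentioned in tweet["mentions"]:
            target = mentioned.lower()
            if target != source:  # ignore self-mentions
                edges[(source, target)] += 1
        for tag in tweet["hashtags"]:
            # The '#' prefix keeps hashtag nodes distinct from user nodes.
            edges[(source, "#" + tag.lower())] += 1
    return edges


if __name__ == "__main__":
    sample: list[Tweet] = [
        {"user": "alice", "mentions": ["bob"], "hashtags": ["OWS"]},
        {"user": "bob", "mentions": ["alice"], "hashtags": ["ows", "OccupyOakland"]},
    ]
    for (source, target), weight in sorted(build_edges(sample).items()):
        print(source, "->", target, weight)
```

Lower-casing merges variants such as #OWS and #ows into one node; whether that is appropriate depends on the research question.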

Page 20

H1: Over time, dyadic communication will become prevalent in an emerging networked organization.

H2: As a social movement develops as an emerging network form of organization, the organizational structure will become increasingly clustered.
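H1 and H2 describe the whole network, so one hedged way to track them is with two standard statistics computed on successive snapshots: reciprocity (the share of directed ties that are mutual, used here as a proxy for dyadic exchange under H1) and global transitivity (the share of connected triplets that close into triangles, used here as a proxy for clustering under H2). The talk does not state which measures were actually used; this sketch only illustrates these stand-ins on plain Python sets.

```python
from itertools import combinations


def reciprocity(edges: set[tuple[str, str]]) -> float:
    """Share of directed ties (u, v) whose reverse tie (v, u) also exists."""
    ties = {(u, v) for u, v in edges if u != v}
    if not ties:
        return 0.0
    mutual = sum(1 for u, v in ties if (v, u) in ties)
    return mutual / len(ties)


def transitivity(edges: set[tuple[str, str]]) -> float:
    """Global clustering coefficient of the undirected version of the network:
    closed connected triplets divided by all connected triplets."""
    neighbors: dict[str, set[str]] = {}
    for u, v in edges:
        if u != v:
            neighbors.setdefault(u, set()).add(v)
            neighbors.setdefault(v, set()).add(u)
    closed = triplets = 0
    for nbrs in neighbors.values():
        # Each pair of neighbours forms a triplet centred on this node;
        # it is closed when the two neighbours are themselves linked.
        for a, b in combinations(sorted(nbrs), 2):
            triplets += 1
            if b in neighbors[a]:
                closed += 1
    return closed / triplets if triplets else 0.0


if __name__ == "__main__":
    month_1 = {("a", "b"), ("b", "c")}
    month_2 = {("a", "b"), ("b", "a"), ("b", "c"), ("c", "a")}
    for label, snapshot in [("month 1", month_1), ("month 2", month_2)]:
        print(label, reciprocity(snapshot), transitivity(snapshot))
```

Under these proxies, support for both hypotheses would show up as both values rising across monthly snapshots.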

Page 21

Data

• Triangulation of data insulates against false readings from large-scale data (see Lazer, Kennedy, King and Vespignani, 2014)
• Internet Archive:
  – 14 websites; 4,504 hyperlink dyads over a 2-month period
• LexisNexis:
  – Search conducted to assess U.S. newspaper coverage of OWS (Occupy Wall Street) from the early stages of the movement in September 2011 through September 2012
  – Searched OWS keywords, e.g., “Occupy Wall Street,” “Occupy Oakland”
• Twitter: Gnip PowerTrack
  – Searched by keywords; captures a larger volume of Twitter data than other options
  – Sample covers October 17, 2011, through January 5, 2012; the initial study focused on the critical two-month period from November 1 through December 31, 2011
  – 750,816 tweets across the two-month period
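The Internet Archive portion of the data is summarized as 14 websites and 4,504 hyperlink dyads. The following is a minimal sketch of how directed site-to-site dyads might be pulled from archived HTML using only Python's standard library. The function names, the host-level aggregation, and the stripping of a leading "www." are assumptions for illustration, not the project's documented pipeline.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against the page URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))


def site_of(url: str) -> str:
    """Reduce a URL to its lower-case host name, dropping a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def hyperlink_dyads(page_url: str, html: str, sites: set[str]) -> set[tuple[str, str]]:
    """Directed site-to-site dyads found on one page, limited to the study sites."""
    source = site_of(page_url)
    if source not in sites:
        return set()
    parser = LinkCollector(page_url)
    parser.feed(html)
    parser.close()
    dyads = set()
    for link in parser.links:
        target = site_of(link)
        # Keep only links to other sites in the study population.
        if target in sites and target != source:
            dyads.add((source, target))
    return dyads


if __name__ == "__main__":
    page = '<a href="http://www.example.org/a">x</a><a href="/local">y</a>'
    print(hyperlink_dyads("http://occupy.example.com/", page,
                          {"occupy.example.com", "example.org"}))
```

Internal links (same host) and links to sites outside the study population are dropped, so each page contributes only ties among the organizations being compared.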

Page 23

OWS News Coverage  

Page 24

OWS on the Web

• 335 seed organizations based on records from #OccupyResearch
• Data extracted for 2011 & 2012, based on “both matching”

[Figure: Captures per Month (millions)]
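The chart on this slide counts captures per month. Wayback Machine capture timestamps are 14-digit strings of the form YYYYMMDDhhmmss, so monthly totals can be tallied as in the sketch below. How the timestamps are obtained (for example, from CDX index records) is left out, and the helper name is hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable


def captures_per_month(timestamps: Iterable[str]) -> Counter:
    """Count captures per calendar month from 14-digit Wayback timestamps."""
    counts: Counter = Counter()
    for ts in timestamps:
        # Capture timestamps use the fixed-width form YYYYMMDDhhmmss.
        moment = datetime.strptime(ts, "%Y%m%d%H%M%S")
        counts[moment.strftime("%Y-%m")] += 1
    return counts


if __name__ == "__main__":
    sample = ["20111017120000", "20111105083000", "20111130235959"]
    for month, n in sorted(captures_per_month(sample).items()):
        print(month, n)
```

Dividing each monthly count by 1,000,000 puts it on the same scale as the chart's "Millions" axis.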

Page 25

Maximal Cores (k-Coreness)

[Figure: maximal k-cores, Aug. 2011 vs. Jan. 2012]
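A k-core is the largest subgraph in which every node has at least k neighbours inside that subgraph, and a node's coreness is the largest k for which it belongs to a k-core; the maximal core is the k-core with the highest k. The sketch below computes coreness by repeatedly removing a minimum-degree node, treating hyperlinks as undirected. That treatment, and the function names, are assumptions; the slide does not say how direction was handled.

```python
def core_numbers(edges: list[tuple[str, str]]) -> dict[str, int]:
    """Core number of each node, found by repeatedly removing a minimum-degree node."""
    adj: dict[str, set[str]] = {}
    for u, v in edges:
        if u != v:  # self-loops are ignored
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    remaining = set(adj)
    core: dict[str, int] = {}
    k = 0
    while remaining:
        node = min(remaining, key=lambda n: degree[n])
        # The core level never decreases as the peeling proceeds.
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for nbr in adj[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


def maximal_core(edges: list[tuple[str, str]]) -> set[str]:
    """Nodes in the k-core with the largest k (the maximal core)."""
    core = core_numbers(edges)
    if not core:
        return set()
    k_max = max(core.values())
    return {node for node, k in core.items() if k == k_max}


if __name__ == "__main__":
    links = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "d")]
    print(core_numbers(links), maximal_core(links))
```

Comparing `maximal_core` on the August 2011 and January 2012 snapshots would show whether the densest group of sites grew, shrank, or changed membership. This simple version rescans for the minimum degree at each step; a bucket-based version scales better to large networks.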

Page 26

[Figure: Edges (axis 0 to 80,000) and Vertices (axis 60 to 180)]
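This slide tracks network size through edge and vertex counts. The slide does not say whether repeated links between the same two sites count as separate edges, so the sketch below reports both link records and distinct directed pairs for each month. The input format, (month, source_site, target_site), is a hypothetical choice.

```python
from collections import Counter, defaultdict
from typing import Iterable


def snapshot_sizes(
    dated_links: Iterable[tuple[str, str, str]],
) -> dict[str, tuple[int, int, int]]:
    """Per month: (vertices, link records, distinct directed pairs).

    Each input record is (month, source_site, target_site),
    e.g. ("2011-11", "a.org", "b.org").
    """
    records: Counter = Counter()
    pairs: defaultdict = defaultdict(set)
    for month, source, target in dated_links:
        records[month] += 1
        pairs[month].add((source, target))
    sizes = {}
    for month in sorted(pairs):
        vertices = {node for pair in pairs[month] for node in pair}
        sizes[month] = (len(vertices), records[month], len(pairs[month]))
    return sizes


if __name__ == "__main__":
    links = [("2011-11", "a", "b"), ("2011-11", "a", "b"),
             ("2011-11", "b", "c"), ("2011-12", "a", "c")]
    for month, (v, e_records, e_pairs) in snapshot_sizes(links).items():
        print(month, v, e_records, e_pairs)
```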

Page 27

[Figure: Density (axis 0 to 0.08)]
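For a directed network with V vertices and E distinct ties (no self-loops), density is D = E / (V(V - 1)). For an undirected network, the denominator is V(V - 1)/2. A minimal sketch follows. The function name and the optional `nodes` argument, which lets sites with no links in a month still count as vertices, are illustrative choices rather than anything stated in the talk.

```python
from typing import Iterable, Optional


def density(
    edges: Iterable[tuple[str, str]],
    nodes: Optional[set[str]] = None,
    directed: bool = True,
) -> float:
    """Observed distinct ties divided by the number of possible ties.

    Self-loops and repeated ties are ignored. Pass `nodes` to count isolated
    vertices that do not appear in any edge.
    """
    ties = {(u, v) for u, v in edges if u != v}
    if not directed:
        # Collapse (u, v) and (v, u) into one undirected tie.
        ties = {tuple(sorted(tie)) for tie in ties}
    vertices = {node for tie in ties for node in tie} | (nodes or set())
    n = len(vertices)
    if n < 2:
        return 0.0
    possible = n * (n - 1) if directed else n * (n - 1) // 2
    return len(ties) / possible


if __name__ == "__main__":
    sample = [("a", "b"), ("b", "a"), ("b", "c")]
    print(density(sample), density(sample, directed=False))
```

Because the denominator grows roughly with the square of the number of vertices, density can fall even while the edge count rises, so it is worth reading alongside the edge and vertex counts.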

Page 28

[Figure: Clusters (axis 0 to 100)]
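The slide does not say how clusters were identified. As the simplest possible stand-in, the sketch below counts connected components with a union-find structure, treating ties as undirected. A study of clustering would more likely use a community-detection method, which this sketch does not implement. The function name is hypothetical.

```python
from typing import Iterable


def count_components(edges: Iterable[tuple[str, str]], nodes: Iterable[str] = ()) -> int:
    """Number of connected components, treating ties as undirected (union-find)."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    for node in nodes:
        find(node)  # registers isolated nodes
    for u, v in edges:
        union(u, v)
    return len({find(x) for x in list(parent)})


if __name__ == "__main__":
    print(count_components([("a", "b"), ("b", "c"), ("d", "e")], nodes=["f"]))
```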

Page 30

Implications

• Big Data:
  – Guiding data collection with theoretically grounded questions avoids the “needle-in-the-haystack” problem
  – Leveraging advances in computing alongside existing theories supports robust studies of social science phenomena
• Big Theory:
  – Expanding prior theories on networked organizational forms and form emergence (evolutionary)
  – Building toward a macro theory of organizational form emergence based on resource availability and networks

Page 31

• Want data?
  – Email me! [email protected]
  – ArchiveHub: http://archivehub.rutgers.edu
• Collaborators
  – Kris Carpenter & Vinay Goel, Internet Archive
  – David Lazer, Northeastern University

Research supported by NSF Award #1244727 and the NetSCI Lab @ Rutgers