Internet Archives and Social Science Research - Yeungnam University

Posted on 22-Jan-2015


DESCRIPTION

Talk given at Yeungnam University on April 6, 2014

TRANSCRIPT

BIG DATA AND SOCIAL SCIENCE THEORY
Leveraging Large-Scale Data to Discover New Patterns in Society

Monday, April 7, 2014, Cybermotions @ Korea, Yeungnam University

Matthew Weber, Rutgers University, School of Communication & Information


Opportunity: The Internet Archive holds the largest single record of the history of the World Wide Web, from 1995 to the present, and this record is a wealth of untapped research data.

Challenge: There is a significant lack of research-ready databases and tools available to the scholarly community.

© Internet Archive 2013


Opportunity: The ArchiveHub project aims to support the creation and dissemination of general guidelines and tools for conducting theoretically and methodologically rigorous longitudinal research using archival Web data.


| Dataset | Research Potential | Dates | Captures | Unique URLs |
| --- | --- | --- | --- | --- |
| Hurricane Katrina | Online networks and organizational resilience (Chewning, Lai and Doerfel, 2012; Perry, Taylor and Doerfel, 2003) in the wake of disasters; information dissemination | 2003–2012 | 1,694,236 | 663,740 |
| Superstorm Sandy | As for Hurricane Katrina | 2003–2012 | 41,703,112 | 20,013,455 |
| US Senate | Study the growth of political activity in online environments (Adamic & Glance, 2005; Bruns, 2007; Chang & Park, 2012); polarization & media discourse | 109th–112th Congresses | 26,965,770 | 8,674,397 |
| US House | As for US Senate | 109th–112th Congresses | 51,840,777 | 12,410,014 |
| Occupy Wall Street | Previous research on NGOs in the online environment (Bach & Stark, 2004; Shumate, 2003, 2012; Shumate, Fulk, & Monge, 2005); use of hyperlink data to study the formation and role of alliances between SMOs | 2010–2012 | 247,928,272 | 113,259,655 |
| US Media | Previous studies of news media organizations (Greer & Mensing, 2006; Weber, 2012; Weber & Monge, In Press); focus on evolutionary patterns | 2008–2012 | 1,315,132,555 | 539,184,823 |


http://archivehub.rutgers.edu


Tracing the Emergence of Organizational Forms


Environment: Organizations compete for scarce resources; during periods of rapid disruption, new entrants seek "protected" niches (Weber & Monge, 2014).

Population: In digital spaces, online connections provide communicative representations of information flows (Weber & Monge, 2012).

Forming ties (e.g., hyperlinks) can increase an organization's long-term likelihood of survival (Weber, 2012).

Organization: Organizations adapt internally, reconfiguring team structures and developing new routines for knowledge sharing (Ellison, Gibbs & Weber, In Press; Weber & Kim, Under Review).


Big Data… Big Theory?  

•  Networks are central to social movements because links between nodes can influence collective action

•  Examples of nodes include participants, organizations, media, and communication technologies
•  Social networks and social movements (Diani, 2003)

 

•  Interactions between actors, and between actors and hashtags, collectively represent a networked form of organization
•  Network form of organization (Powell, 1990)

H1: Over time, dyadic communication will become prevalent in an emerging networked organization.

H2: As a social movement develops as an emerging network form of organization, its organizational structure will become increasingly clustered.
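One way to make H1 and H2 measurable is sketched below in Python. As an illustrative assumption only, H1 is tracked as reciprocity (the share of connected pairs whose tie runs in both directions) and H2 as the average local clustering coefficient of the undirected network. The slides do not say which measures were used. The function names and the toy snapshots are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical operationalization: the slides do not specify the H1/H2 measures.


def reciprocity(edges):
    """Share of connected pairs whose tie runs in both directions (H1 proxy)."""
    directed = {(u, v) for u, v in edges if u != v}  # drop self-loops and duplicates
    pairs = {frozenset(e) for e in directed}
    if not pairs:
        return 0.0
    mutual = 0
    for pair in pairs:
        a, b = tuple(pair)
        if (a, b) in directed and (b, a) in directed:
            mutual += 1
    return mutual / len(pairs)


def average_clustering(edges):
    """Mean local clustering coefficient of the undirected graph (H2 proxy)."""
    nbrs = defaultdict(set)
    for u, v in edges:
        if u != v:
            nbrs[u].add(v)
            nbrs[v].add(u)
    if not nbrs:
        return 0.0
    total = 0.0
    for ns in nbrs.values():
        k = len(ns)
        if k < 2:
            continue  # vertices with fewer than two neighbours contribute 0
        closed = sum(1 for a, b in combinations(ns, 2) if b in nbrs[a])
        total += closed / (k * (k - 1) / 2)
    return total / len(nbrs)


# Toy snapshots (invented for illustration, not OWS data).
snapshots = {
    "2011-11": [("a", "b"), ("b", "c")],
    "2011-12": [("a", "b"), ("b", "a"), ("b", "c"), ("c", "a")],
}
for period, edges in snapshots.items():
    print(period, round(reciprocity(edges), 3), round(average_clustering(edges), 3))
```

Under this reading, H1 predicts reciprocity rising across successive periods, and H2 predicts a rising clustering coefficient.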

Data  

•  Triangulation of data insulates against false readings from large-scale data (see Lazer, Kennedy, King and Vespignani, 2014)

•  Internet Archive:
  –  14 websites; 4,504 hyperlink dyads over a 2-month period

•  LexisNexis:
  –  Search conducted to assess U.S. newspaper coverage of OWS from the early stages of the movement in September 2011 through September 2012
  –  Search on OWS keywords, e.g., "Occupy Wall Street," "Occupy Oakland"

•  Twitter: Gnip PowerTrack
  –  Search by keywords; captures a larger volume of Twitter data than other options
  –  Sample covers October 17, 2011, through January 5, 2012. The initial study focused on the critical two-month period from November 1 through December 31, 2011
  –  750,816 tweets across the two-month period
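The filtering steps listed above (a keyword search, then a window from November 1 through December 31, 2011) can be sketched together with the actor and hashtag ties described earlier. This is a minimal illustration under several assumptions: the `Tweet` record layout, the keyword list, and treating @-mentions as actor-to-actor ties are all choices made here. Nothing in it reflects the actual Gnip PowerTrack schema or API.

```python
import re
from dataclasses import dataclass
from datetime import datetime

MENTION = re.compile(r"@(\w+)")
HASHTAG = re.compile(r"#(\w+)")


@dataclass
class Tweet:
    """Hypothetical minimal record; not the Gnip PowerTrack payload format."""
    user: str
    created_at: datetime
    text: str


def select_tweets(tweets, start, end, keywords):
    """Keep tweets in [start, end) whose text contains any keyword (case-insensitive)."""
    kws = [k.lower() for k in keywords]
    return [
        t for t in tweets
        if start <= t.created_at < end and any(k in t.text.lower() for k in kws)
    ]


def build_edges(tweets):
    """Actor->actor ties from @-mentions and actor->hashtag ties from #tags."""
    actor_edges, hashtag_edges = [], []
    for t in tweets:
        user = t.user.lower()
        actor_edges += [(user, m.lower()) for m in MENTION.findall(t.text)]
        hashtag_edges += [(user, "#" + h.lower()) for h in HASHTAG.findall(t.text)]
    return actor_edges, hashtag_edges


# Invented example tweets. The window end is exclusive, so it covers Nov 1 to Dec 31.
tweets = [
    Tweet("alice", datetime(2011, 11, 15, 12, 0), "Marching with @bob #OWS #OccupyOakland"),
    Tweet("bob", datetime(2011, 10, 20, 9, 0), "Occupy Wall Street, day 34"),
    Tweet("carol", datetime(2011, 12, 31, 23, 59), "Occupy Wall Street coverage via @alice"),
    Tweet("dave", datetime(2011, 12, 5, 8, 0), "lunch time"),
]
window = select_tweets(tweets, datetime(2011, 11, 1), datetime(2012, 1, 1), ["occupy", "#ows"])
actor_edges, hashtag_edges = build_edges(window)
print(len(window), actor_edges, hashtag_edges)
```

The two edge lists correspond to the two kinds of interaction named earlier: between actors, and between actors and hashtags. They can be analyzed separately or merged into one two-mode network.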


OWS News Coverage  

OWS on the Web

•  335 seed organizations based on records from #OccupyResearch
•  Data extracted for 2011 & 2012, based on "both matching"

[Figure: Captures per month (millions) and maximal cores (k-coreness)]
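The maximal core in this chart comes from k-core decomposition. Below is a minimal Python sketch of the standard peeling procedure, which repeatedly removes a vertex of minimum remaining degree. It treats the hyperlink network as undirected and ignores duplicate links. Both are assumptions, because the slides do not say how coreness was computed. The function names are hypothetical.

```python
from collections import defaultdict


def core_numbers(edges):
    """Coreness of every vertex, treating links as undirected and ignoring duplicates.

    Peeling: repeatedly remove a vertex of minimum remaining degree; a vertex's
    core number is the largest minimum degree seen up to its removal.
    """
    nbrs = defaultdict(set)
    for u, v in edges:
        if u != v:  # self-links do not count toward degree
            nbrs[u].add(v)
            nbrs[v].add(u)
    degree = {n: len(ns) for n, ns in nbrs.items()}
    remaining = set(nbrs)
    core, k = {}, 0
    while remaining:
        n = min(remaining, key=degree.__getitem__)
        k = max(k, degree[n])
        core[n] = k
        remaining.remove(n)
        for m in nbrs[n]:
            if m in remaining:
                degree[m] -= 1
    return core


def max_core(edges):
    """Largest k such that a non-empty k-core exists (0 for an empty graph)."""
    return max(core_numbers(edges).values(), default=0)


# Toy example (invented): a triangle a-b-c with a pendant site d.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "d")]
print(core_numbers(edges), max_core(edges))
```

Selecting the minimum with `min` makes this quadratic in the number of sites. That is adequate for small snapshots; a bucket-based version (Batagelj and Zaversnik) runs in linear time. Applying `max_core` to each monthly edge list gives one value per month.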

[Figure panels: Aug. 2011; Jan. 2012]

[Figure: Edges and vertices]

[Figure: Density]
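Density is the share of possible ties that are actually present. For n sites and m distinct links, it is m / (n(n - 1)) in a directed network and 2m / (n(n - 1)) in an undirected one. The slides do not say which variant was plotted, so the sketch below supports both. Taking the vertex set from the edge list, so that isolated sites are not counted, is also an assumption. The function name is hypothetical.

```python
def density(edges, directed=True):
    """Share of possible ties present; self-links and duplicate links are ignored.

    The vertex set is taken from the edge list, so isolated sites are not counted.
    """
    if directed:
        ties = {(u, v) for u, v in edges if u != v}
    else:
        ties = {frozenset((u, v)) for u, v in edges if u != v}
    n = len({x for e in edges for x in e})
    if n < 2:
        return 0.0
    possible = n * (n - 1) if directed else n * (n - 1) / 2
    return len(ties) / possible


# Toy example (invented): three sites, a reciprocated link a<->b and a link b->c.
links = [("a", "b"), ("b", "a"), ("b", "c")]
print(density(links), density(links, directed=False))
```

The edge and vertex counts in the previous chart are the m and n of this formula. Density usually falls as a network grows unless links are added faster than n(n - 1).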

[Figure: Clusters]
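The slide gives only the axis label. If "clusters" here means connected components (an assumption; it could also mean detected communities), the count per snapshot can be obtained with a union-find pass over the links, treating hyperlinks as undirected. Passing the full list of tracked sites as `nodes` lets each isolated site count as its own cluster. The function name is hypothetical.

```python
def count_clusters(edges, nodes=()):
    """Number of connected components (links treated as undirected)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for n in nodes:
        find(n)  # register isolated vertices
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv  # merge the two components
    return len({find(x) for x in list(parent)})


# Toy example (invented): a-b-c, d-e, and an unlinked site f.
print(count_clusters([("a", "b"), ("b", "c"), ("d", "e")], nodes=["f"]))
```

Under this reading, a falling cluster count over time would mean that previously separate groups of sites are becoming linked.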


Implications

•  Big Data:
  –  Guiding data collection with theoretically grounded questions avoids the "needle-in-the-haystack" problem
  –  Combine advances in computing with existing theories to develop robust studies of social science phenomena
•  Big Theory:
  –  Expanding prior theories on networked organizational forms and form emergence (evolutionary)
  –  Building toward a macro theory of organizational form emergence based on resource availability and networks


•  Want data?
  –  Email me! matthew.weber@rutgers.edu
  –  ArchiveHub: http://archivehub.rutgers.edu

 

•  Collaborators
  –  Kris Carpenter & Vinay Goel, Internet Archive
  –  David Lazer, Northeastern University

Research supported by NSF Award #1244727 and the NetSCI Lab @ Rutgers
