Historical Research Breakout Session Notes, WIRE 2014

Upload: ian-milligan

Post on 02-Jun-2015

DESCRIPTION

Rough notes presented at WIRE 2014 on the historical research breakout session.

TRANSCRIPT

Page 1: Historical Research Breakout Session Notes, WIRE 2014

Breakout #3: Historical Research

Ian Milligan (in the hot seat today), Stephen Robertson, Thomas Risse, Kris Carpenter Negulescu, Pamela Graham, Niels Brügger, Ivana Marenzi, Katie Kang (thanks for the notes!), Ed Fox, Meghan Dougherty, Matt Connolley

What did we do? (I think)

• Tackled some big problems/questions

• Explored what historians and others might want to do with this material

• Thought about community building

• And came up with a tangible path forward!

First big problem we discussed: historians...

Building Interest?

• Eric Meyer: rooms were “thin on the ground”

• Internet Archive needs to be viewed from many different perspectives

• CS does things its way, historians do theirs, Internet researchers do theirs, etc.

• Historians are interested, but get FRUSTRATED quickly when they realize how hard their tasks are

Building Interest?

• Problem is that historians are trained in a particular way, but this offers a new way (the shift from scarcity to abundance)

• And really important to emphasize the QUESTIONS that we have. Can’t just study the web, have to have reasons to do so.

First big question

• Should we create new research corpuses, or use existing ones?

• Should we make research corpuses ourselves?

• What moment are we at now?

Second problem

• Technical ones

• Historians are interested, but it is too hard to answer questions because we don’t have enough technical ability, resources, etc., to do it this way.

• How to effect collaboration? (apparently being in a room together helps)

• How can we contextualize documents? How do we avoid viewing them as decontextualized?

Distant/Close

• Ideal is a way to move between distant and close, but not always there (though some awesome examples today!)

• Discussion about whether historians will even care about domain-level studies

• So much emphasis on network studies, etc., which may be of limited utility to humanists like historians (or at the very least, need to be complemented by more qualitative forms of research)

Too much information, so what can we do?

Lots of other problems

• How do we deal with the big bang of electronic records (starting in the 1970s)?

• How much data is there? We could compare the Google index, AltaVista, etc., to get at this.

• What about emulation when it comes to digital preservation - using old browsers, etc.? Should we use virtual machines?

Community Building

• IIPC has no overlap with AoIR

• Live web scholars don’t deal with archived web scholars, no community!

• How could we build a better community?

• Are people seeing themselves reflected in these communities?

How quickly can this happen?

• Will it take generations?

• Or will it come quicker?

Final Outcome

• Could we have a history of the 1990s without using web archives?

• How could we harness web archives to tell the story of 1995 through 2005? First decade of material?

• Or just 1995-2000 due to manageability of this dataset.

• Sort of a sweet spot - manageable data - before the rise of closed off social media, etc. A good way to sell the utility of these archives!

Idea

• Reach out to other datasets and archives

• Pick data from 1995-2000 from Internet Archive (~20TB) and build tools, ask questions, etc.

• Look at different strands (.gov data), GeoCities, news data, e-mails, digital records, etc.

• Cool factoid: Herman Miller invented the idea of the ‘cube farm’; if you do a term search you can see it rise with the early web.
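
The term search mentioned in the last bullet could be sketched roughly like this. It is only an illustration in Python, not a tool from the session: it assumes captures are available as (timestamp, text) pairs with 14-digit timestamps in the YYYYMMDDhhmmss style used by the Wayback Machine, and the function name `term_counts_by_year` and the sample records are made up.

```python
from collections import Counter

def term_counts_by_year(records, term, years=range(1995, 2001)):
    """Count captures per year whose text mentions `term` (case-insensitive).

    `records` is an iterable of (timestamp, text) pairs; the first four
    characters of the timestamp are assumed to be the year.
    """
    needle = term.lower()
    counts = Counter()
    for timestamp, text in records:
        year = int(timestamp[:4])
        if year in years and needle in text.lower():
            counts[year] += 1
    # Report every year in the window, including years with no matches.
    return {year: counts[year] for year in years}

# Made-up sample records, purely illustrative; not real archive data.
sample_records = [
    ("19960412000000", "Our new office layout."),
    ("19970301000000", "Life in the cube farm is dull."),
    ("19990520000000", "Another cube farm joke."),
    ("19991102000000", "Cube Farm survival guide."),
    ("20030101000000", "Cube farm nostalgia."),  # outside the 1995-2000 window
]

yearly = term_counts_by_year(sample_records, "cube farm")
for year, count in yearly.items():
    print(year, count)
```

Counting captures rather than raw occurrences is one simple choice; normalizing by the total number of captures per year would probably be needed before reading anything into a rise.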

Idea

• Some name ideas:

• History 2.0

• History in the Digital Age

• Polarization

• End of Millennium (1995-2000) Historical Studies

• End of Millennium (1995-2000) Studies

NIST (National Institute of Standards and Technology) TREC (Text Retrieval Conference) Track Idea

• Sources for the following:

• IA collections

• BYU, Corpus of Contemporary American English, wide range of digitized resources

• Related collections from UK, Germany

NIST TREC Track for End of Millennium (1995-2000) Studies

• To be proposed and led by Jimmy Lin

• IA and others provide <30TB of content

• Historians provide an initial set of questions and tasks (e.g., spread of polarization) for which they have ground truth from other sources

• Researchers build systems that can handle those questions and tasks

• Final competition: historians pose new questions and tasks and research groups submit results for analysis
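
To make the "ground truth" step concrete, here is a rough sketch of how a submitted result set might be scored against the historians' answers. It is an assumption for illustration only: the notes do not say which measures the track would use, and plain precision and recall are swapped in here as the simplest option. The function name and document IDs are hypothetical.

```python
def precision_recall(submitted, ground_truth):
    """Score a submitted set of document IDs against a ground-truth set.

    Precision: share of submitted documents that are in the ground truth.
    Recall: share of ground-truth documents that were submitted.
    """
    submitted_set = set(submitted)
    truth_set = set(ground_truth)
    hits = len(submitted_set & truth_set)
    precision = hits / len(submitted_set) if submitted_set else 0.0
    recall = hits / len(truth_set) if truth_set else 0.0
    return precision, recall

# Hypothetical IDs: documents historians judged relevant to one question,
# and the documents one research group's system returned for it.
ground_truth = {"doc-01", "doc-04", "doc-07", "doc-09"}
run_a = ["doc-01", "doc-02", "doc-04", "doc-09"]

p, r = precision_recall(run_a, ground_truth)
print(f"precision={p:.2f} recall={r:.2f}")
```

In the final competition described above, the same scoring could be applied to the new questions once the historians' judgments for them are available.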