Historical Research Breakout Session Notes, WIRE 2014

Post on 02-Jun-2015

DESCRIPTION

Rough notes presented at WIRE 2014 on the historical research breakout session.

TRANSCRIPT

Breakout #3: Historical Research

Ian Milligan (in the hot seat today), Stephen Robertson, Thomas Risse, Kris Carpenter Negulescu, Pamela Graham, Niels Brügger, Ivana Marenzi, Katie Kang (thanks for the notes!), Ed Fox, Meghan Dougherty, Matt Connolley

What did we do? (I think)

• Tackled some big problems/questions

• Explored what historians and others might want to do with this material

• Thought about community building

• And came up with a tangible path forward!

First big problem we discussed: historians

Building Interest?

• Eric Meyer: rooms were “thin on the ground”

• Internet Archive needs to be viewed from many different perspectives

• CS doing it their way, historians their own, Internet researchers their own, etc.

• Historians are interested, but get FRUSTRATED quickly when they realize how hard their tasks are

Building Interest?

• Problem is that historians are trained in a particular way, but this material offers a new way (the shift from scarcity to abundance)

• And really important to emphasize the QUESTIONS that we have. Can’t just study the web, have to have reasons to do so.

First big question

• Should we create new research corpuses, or use existing ones?

• Should we make research corpuses ourselves?

• What moment are we at now?

Second problem

• Technical ones

• Historians are interested, but it is too hard to answer questions because we don’t have enough technical ability, resources, etc., to do it this way.

• How to effect collaboration? (apparently being in a room together helps)

• How can we contextualize documents? How do we avoid viewing them as decontextualized?

Distant/Close

• Ideal is a way to move between distant and close, but not always there (though some awesome examples today!)

• Discussion about whether historians will even care about domain-level studies

• So much emphasis on network studies, etc., which may be of limited utility to humanists like historians (or at the very least, need to be complemented by more qualitative forms of research)

Too much information, so what can we do?

Lots of other problems

• How do we deal with the big bang of electronic records (starting in the 1970s)?

• How much data is there (we could compare the Google index, AltaVista, etc.) to come here?

• What about emulation when it comes to digital preservation - using old browsers, etc.? Should we use virtual machines?

Community Building

• IIPC has no overlap with AoIR

• Live web scholars don’t deal with archived web scholars, no community!

• How could we build a better community?

• Are people seeing themselves reflected in these communities?

How quickly can this happen?

• Will it take generations?

• Or will it come quicker?

Final Outcome

• Could we have a history of the 1990s without using web archives?

• How could we harness web archives to tell the story of 1995 through 2005? First decade of material?

• Or just 1995-2000, given the manageability of that dataset.

• Sort of a sweet spot - manageable data - before the rise of closed off social media, etc. A good way to sell the utility of these archives!

Idea

• Reach out to other datasets and archives

• Pick data from 1995-2000 from Internet Archive (~20TB) and build tools, ask questions, etc.

• Look at different strands (.gov data), GeoCities, news data, e-mails, digital records, etc.

• Cool factoid: Herman Miller invented the idea of ‘cube farm,’ if you do term search you can see it rise with the early web.
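The term search mentioned in the last bullet could look roughly like the sketch below. It is only an illustration: `term_rate_by_year` and the toy `sample_docs` are made up for this example and do not come from the Internet Archive or any real tool. It assumes each document already carries a capture year and plain text.

```python
from collections import Counter


def term_rate_by_year(docs, term):
    """Return {year: share of documents from that year containing term}.

    docs is an iterable of (year, text) pairs; matching is a simple
    case-insensitive substring test, which is an assumption of this sketch.
    """
    needle = term.lower()
    totals = Counter()  # documents seen per year
    hits = Counter()    # documents per year that contain the term
    for year, text in docs:
        totals[year] += 1
        if needle in text.lower():
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}


# Made-up toy documents, only to show the shape of the input.
sample_docs = [
    (1995, "Welcome to my home page."),
    (1996, "Our new office is a cube farm."),
    (1996, "Links to my favourite bands."),
    (1997, "Life in the cube farm, day 200."),
    (1997, "Cube farm humour and cartoons."),
]

rates = term_rate_by_year(sample_docs, "cube farm")
for year, rate in rates.items():
    print(year, rate)
```

Reporting a share per year rather than a raw count matters here, because the number of archived pages grows quickly over 1995-2000 and raw counts would rise for almost any term.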

Idea

• Some name ideas:

• History 2.0

• History in the Digital Age

• Polarization

• End of Millennium (1995-2000) Historical Studies

• End of Millennium (1995-2000) Studies

NIST (National Institute of Standards and Technology) TREC (Text Retrieval Conference) Track Idea

• Sources for the following:

• IA collections

• BYU, Corpus of Contemporary American English, wide range of digitized resources

• Related collections from UK, Germany

NIST TREC Track for End of Millennium (1995-2000) Studies

• To be proposed and led by Jimmy Lin

• IA and others provide <30TB of content

• Historians provide an initial set of questions and tasks (e.g., spread of polarization), for which they have ground truth from other sources

• Researchers build systems that can handle those questions and tasks

• Final competition: historians pose new questions and tasks and research groups submit results for analysis
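One way submitted results might be compared against the historians' ground truth is with precision and recall, as in the sketch below. This is an assumption for illustration, not the scoring the proposed track would necessarily use; the function name, group names, and document IDs are all hypothetical.

```python
def precision_recall(submitted, relevant):
    """Score one submission against a ground-truth set of document IDs.

    precision = share of submitted documents that are relevant
    recall    = share of relevant documents that were submitted
    """
    submitted = set(submitted)
    relevant = set(relevant)
    true_positives = len(submitted & relevant)
    precision = true_positives / len(submitted) if submitted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ground truth for one historian-posed question.
ground_truth = {"doc-a", "doc-b", "doc-c", "doc-d"}

# Hypothetical submissions from two research groups.
submissions = {
    "group-1": ["doc-a", "doc-b", "doc-x"],
    "group-2": ["doc-a", "doc-b", "doc-c", "doc-d",
                "doc-y", "doc-z", "doc-w", "doc-v"],
}

scores = {
    group: precision_recall(doc_ids, ground_truth)
    for group, doc_ids in submissions.items()
}
for group, (precision, recall) in scores.items():
    print(group, round(precision, 2), round(recall, 2))
```

In this toy case group-1 returns fewer documents with a higher share of relevant ones, while group-2 finds everything but with more noise, which is the kind of trade-off the final analysis would need to weigh.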
