Using Wayback Machine for Research - Library of Congress Blogs

Nicholas Taylor Repository Development Group Using Wayback Machine for Research

Page 1: Using Wayback Machine for Research - Library of Congress Blogs

Nicholas Taylor, Repository Development Group

Using Wayback Machine for Research

What Is the WAYBACK MACHINE?

WABAC Machine?

Internet Archive’s Wayback Machine

not one, but many Wayback Machines

open source software to “replay” web archives
  rewrites links to point to archived resources
  allows for temporal navigation within archive

used by many web archiving institutions
  33 out of 62 initiatives listed on Wikipedia

Government of Canada Web Archive

Portuguese Web Archive

Web Archive Singapore

Catalonian Web Archive

California Digital Library Web Archiving Service

Harvard University Web Archive Collection Service

Common LIMITATIONS AND WORKAROUNDS

limitation: banner displaces page elements

workaround: hide the banner

limitation: AJAX-enabled sites

workaround: disable JavaScript

limitation: nav menu link errors

workaround: insert live site URL in archive
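
A minimal sketch of this workaround, assuming the archive uses the URL layout shown elsewhere in this deck (a prefix, a 14-digit timestamp, then the archived resource). The function name `swap_archived_resource` and the target URL `http://www.loc.gov/about/` are illustrative assumptions, not part of any Wayback Machine tooling.

```python
import re

# Everything up to and including the 14-digit timestamp segment.
# Assumption: the archive follows the layout of the deck's example URL.
_PREFIX = re.compile(r"^(.+?/\d{14}/)")


def swap_archived_resource(archive_url, live_url):
    """Keep the archive prefix and timestamp, but point at another live URL."""
    match = _PREFIX.match(archive_url)
    if match is None:
        raise ValueError("no 14-digit timestamp found in archive URL")
    return match.group(1) + live_url


if __name__ == "__main__":
    current = ("http://webarchiveqr.loc.gov/loc_sites/20120131201510/"
               "http://www.loc.gov/index.html")
    # Hypothetical page whose nav menu link fails inside the archive.
    print(swap_archived_resource(current, "http://www.loc.gov/about/"))
```

Reusing the timestamp of the page being viewed keeps the request close to the same point in time; the archive may still serve a capture from a different moment.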

limitation: no full-text search

workaround: none yet, but R&D ongoing

Basic MECHANICS

structure of a Wayback Machine URL

http://webarchiveqr.loc.gov/loc_sites/20120131201510/http://www.loc.gov/index.html

Wayback Machine URL: http://webarchiveqr.loc.gov/
collection: loc_sites
date/timestamp (YYYYMMDDHHMMSS): 20120131201510
URL of archived resource: http://www.loc.gov/index.html
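
The four parts of the example URL can be pulled apart programmatically. This is a sketch that assumes every URL follows exactly the layout shown above (host, one collection segment, a 14-digit timestamp, then the archived resource); other Wayback Machine deployments may lay out their paths differently. The names `WaybackURL` and `parse_wayback_url` are made up for this illustration.

```python
import re
from datetime import datetime
from typing import NamedTuple


class WaybackURL(NamedTuple):
    base: str            # Wayback Machine URL (scheme and host)
    collection: str      # collection name
    timestamp: datetime  # capture date and time
    resource: str        # URL of the archived resource


# Assumed layout: <base>/<collection>/<YYYYMMDDHHMMSS>/<resource>
_PATTERN = re.compile(
    r"^(?P<base>https?://[^/]+)/(?P<collection>[^/]+)/"
    r"(?P<ts>\d{14})/(?P<resource>.+)$"
)


def parse_wayback_url(url):
    """Split a Wayback-style URL into its four components."""
    match = _PATTERN.match(url)
    if match is None:
        raise ValueError("URL does not follow the assumed layout")
    return WaybackURL(
        base=match["base"],
        collection=match["collection"],
        timestamp=datetime.strptime(match["ts"], "%Y%m%d%H%M%S"),
        resource=match["resource"],
    )


if __name__ == "__main__":
    parts = parse_wayback_url(
        "http://webarchiveqr.loc.gov/loc_sites/20120131201510/"
        "http://www.loc.gov/index.html"
    )
    print(parts)
```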

URL-based access

date wildcarding
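
The slides for this step are screenshots, so the exact syntax is not in the transcript. The sketch below assumes the common Wayback Machine convention in which the timestamp segment is replaced by a partial date followed by `*` (or by `*` alone) to list every capture in that range. The helper name `date_wildcard_url` is hypothetical.

```python
def date_wildcard_url(prefix, resource, year=None, month=None):
    """Build a URL whose timestamp segment is a partial date plus '*'.

    Assumption: the archive treats '2012*' as "any capture in 2012"
    and a bare '*' as "any capture at all".
    """
    if month is not None and year is None:
        raise ValueError("a month needs a year")
    partial = ""
    if year is not None:
        partial = f"{year:04d}"
        if month is not None:
            partial += f"{month:02d}"
    return f"{prefix.rstrip('/')}/{partial}*/{resource}"


if __name__ == "__main__":
    archive = "http://webarchiveqr.loc.gov/loc_sites"
    page = "http://www.loc.gov/index.html"
    print(date_wildcard_url(archive, page))               # all captures
    print(date_wildcard_url(archive, page, 2012))         # captures in 2012
    print(date_wildcard_url(archive, page, 2012, 1))      # captures in Jan 2012
```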

document wildcarding
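
As with date wildcarding, the transcript does not preserve the syntax shown on these slides. A sketch under the assumption that the archive follows the usual Wayback Machine convention: a trailing `*` on the archived-resource URL lists every captured document whose URL starts with that prefix. The helper name `document_wildcard_url` is hypothetical.

```python
def document_wildcard_url(prefix, resource_prefix):
    """Build a URL that asks for all documents under a URL prefix.

    Assumption: the archive accepts '*' for the timestamp and a
    trailing '*' on the resource URL as a prefix match.
    """
    # Avoid doubling the wildcard if the caller already added one.
    resource_prefix = resource_prefix.rstrip("*")
    return f"{prefix.rstrip('/')}/*/{resource_prefix}*"


if __name__ == "__main__":
    print(document_wildcard_url(
        "http://webarchiveqr.loc.gov/loc_sites",
        "http://www.loc.gov/",
    ))
```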

Strategies for FINDING MISSING RESOURCES

removed or moved?

don’t start with the archive
  missing resources have often just moved (Klein & Nelson, 2010)

Synchronicity for Firefox helps find new location
  scrapes archived version for “fingerprint” keywords; uses them to query search engines
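
The "fingerprint" idea can be illustrated with a toy example. The deck does not say how Synchronicity picks its keywords, so the sketch below swaps in plain term frequency with a small stopword list; it is a stand-in for illustration, not Synchronicity's algorithm. All names here (`STOPWORDS`, `fingerprint_keywords`, `search_query`) are hypothetical.

```python
import re
from collections import Counter
from urllib.parse import quote_plus

# A tiny stopword list for illustration only; real tools use larger ones.
STOPWORDS = frozenset({
    "the", "and", "for", "that", "with", "this", "from", "are", "was",
    "have", "has", "not", "but", "you", "our", "its", "will", "can",
})


def fingerprint_keywords(text, k=5):
    """Return the k most frequent non-stopword terms in the archived text."""
    words = re.findall(r"[a-z]{3,}", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(k)]


def search_query(keywords):
    """URL-encode the keywords so they can be appended to a search URL."""
    return quote_plus(" ".join(keywords))


if __name__ == "__main__":
    archived_text = (
        "Committee hearing on the budget. The committee hearing transcript "
        "and the hearing witness list are available."
    )
    keywords = fingerprint_keywords(archived_text, k=3)
    print(keywords, search_query(keywords))
```

The resulting query string would then be handed to a search engine in the hope that the moved page still contains the same distinctive terms.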

MementoFox

find archived content now at a new URL

congressional committee hearings archive
  live site URL doesn’t work in archive
  find a site in the archive that would link to the desired site, then navigate to contemporaneous snapshot

hearings archive only spans 2001-2006

hearings archive URL changed in 2011

truncate archival access URL
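
Truncation can be sketched as cutting the archived-resource part of the URL back to the site root, so that browsing can restart from the archived home page. This assumes the prefix/timestamp/resource layout from the deck's example URL; the function name `truncate_to_site_root` is made up for this illustration.

```python
import re
from urllib.parse import urlsplit

# Split into (prefix through 14-digit timestamp, archived resource URL).
# Assumption: the archive follows the layout of the deck's example URL.
_SPLIT = re.compile(r"^(.+?/\d{14}/)(.+)$")


def truncate_to_site_root(archive_url):
    """Drop the path of the archived resource, keeping only its host."""
    match = _SPLIT.match(archive_url)
    if match is None:
        raise ValueError("no 14-digit timestamp found in archive URL")
    prefix, resource = match.groups()
    parts = urlsplit(resource)
    return f"{prefix}{parts.scheme}://{parts.netloc}/"


if __name__ == "__main__":
    print(truncate_to_site_root(
        "http://webarchiveqr.loc.gov/loc_sites/20120131201510/"
        "http://www.loc.gov/index.html"
    ))
```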

snapshot from prior to site change

navigate to appropriate section

find archived content now at a new URL

records currently stored in a password-protected part of the site may previously have been publicly accessible

conceptual site organization lasts longer than exact link construction

figure out where desired resource would be on the live site, then navigate to analogous section on archived site

location of resources on live site

authentication required

check the site in the archive

navigate to an individual capture

navigate to appropriate section

How You Can GET INVOLVED

what websites from today would you want to be able to consult in five, ten, twenty years’ time?

have you told us what is important to capture?

help us to help you

for more information

Library of Congress Web Archiving Program: http://www.loc.gov/webarchiving/

Library of Congress Web Archives: http://loc.gov/lcwa/

International Internet Preservation Consortium: http://netpreserve.org/

National Digital Information Infrastructure and Preservation Program: http://www.digitalpreservation.gov/

questions?

[email protected]