One Site Among Many: Stanford and Collaborative Technical Development for Web Archiving
Nicholas Taylor
Web Archiving Service Manager
Stanford University Libraries
PASIG 2016
March 11, 2016
overview
• web archiving opportunity gaps
• situation of SUL web archiving
• APIs + community (technical) development
“LAX on take off” by Doug under CC BY-NC-ND 2.0
OPPORTUNITY GAPS
“Mind The Gap” by R~P~M under CC BY-NC-ND 2.0
web content >
“The Seeker” by C MB 166 under CC BY-ND 2.0
preserved web content
link rot + content drift
Andrew Jackson: “Ten years of the UK Web Archive”
a centralized enterprise
[Chart: External / Local / Both, 2011 vs. 2013]
2011: External 60%, Local 25%, Both 14%
2013: External 63%, Local 20%, Both 16%
NDSA: “Web Archiving in the U.S.: A 2013 Survey”
a centralized enterprise
[Chart by year, 1996 to 2013. Labels: "Number of organizations"; "Archive-It Partner as of 2013"]
NDSA: “Web Archiving in the U.S.: A 2013 Survey”
minimal local preservation
[Chart: Transferred / Haven't transferred, 2011 vs. 2013]
2011: Transferred 19%, Haven't transferred 81%
2013: Transferred 20%, Haven't transferred 80%
NDSA: “Web Archiving in the U.S.: A 2013 Survey”
evolving web
“Light Writing - Spider Web” by oz dean under CC BY-ND 2.0
opportunities for research
“Exploring the Canadian Political Interest Group and Political Parties Web Sphere” by Ian Milligan under Standard YouTube License
WHAT ARE WE DOING?
“stanford13” by Paradoxotaur under CC BY-SA 2.0
Stanford Web Archive Portal
Stanford University Libraries: “Stanford Web Archive Portal”
SearchWorks (online catalog)
Stanford University Libraries: “SearchWorks”
web archaeology (SLAC)
oldweb.today: “WorldWideWeb SLAC Home Page”
building + integrating infrastructure
discovery
preservation
access
capture
SDR
APIS + COMMUNITY DEVELOPMENT
“P1050827” by Rebecca Siegel under CC BY 2.0
web archiving lifecycle
Internet Archive: “The Web Archiving Life Cycle Model”
functional overlap
• Appraisal and Selection
• Scoping
• Data Capture
• Storage and Organization
• QA and Analysis
• Metadata / Description
• Access / Use / Reuse
• Preservation
• Risk Management
• ACT
• Archive-It
• AtN
• BCWeb
• CDL WAS
• DigiBoard
• Islandora WARC Solution Pack
• Netarchive Suite
• PageFreezer
• UNT Nomination Tool
• WCT
smaller, modular components
“Giant Rubik's Cube” by Francois Lamotte under CC BY 2.0
community seed
API candidates
• capture tool/proxy interconnect
• capture tool management
• data import/export
• query + extraction
• integrity audit + repair
• descriptive metadata
• logs + analytics
• renderings/derivative formats
• federated data delivery
• federated replay
• federated full-text search
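
To make one of the candidates above concrete, here is a minimal sketch of the lookup a federated replay API might perform: given capture lists from several archives, pick the capture closest in time to a requested date. This is an illustration only, not something shown in the talk. The archive names, the `CAPTURES` data, the URLs, and the `closest_capture` helper are all hypothetical.

```python
from datetime import datetime

# Hypothetical capture indexes: archive name -> list of (capture time, replay URL).
# All names, timestamps, and URLs here are made up for illustration.
CAPTURES = {
    "archive-a": [
        (datetime(2014, 5, 1), "https://archive-a.example/20140501/http://example.org/"),
        (datetime(2016, 1, 15), "https://archive-a.example/20160115/http://example.org/"),
    ],
    "archive-b": [
        (datetime(2015, 3, 10), "https://archive-b.example/20150310/http://example.org/"),
    ],
}


def closest_capture(indexes, requested):
    """Return (archive, capture_time, url) nearest in time to `requested`, or None."""
    # Flatten every archive's capture list into one pool of candidates.
    candidates = [
        (archive, captured, url)
        for archive, captures in indexes.items()
        for captured, url in captures
    ]
    if not candidates:
        return None
    # Choose the capture with the smallest absolute time difference.
    return min(candidates, key=lambda c: abs(c[1] - requested))


best = closest_capture(CAPTURES, datetime(2015, 6, 1))
print(best[0], best[1].date())  # archive-b 2015-03-10
```

The point of the sketch is that the caller does not need to know which archive holds the best capture; a shared API would let each site contribute its index while replay stays a single request.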
let’s combine forces
“Stages of flow” by Peter Thoeny under CC BY-NC-SA 2.0