agenda, day 2 · stanford university libraries may 2011. agenda • stanford university ... •some...
![Page 1: Agenda, Day 2 · Stanford University Libraries May 2011. Agenda • Stanford University ... •Some 27,000 “at risk” geospatial objects •TIFFs, GeoTIFFs, Shapefiles, Digital](https://reader034.vdocuments.site/reader034/viewer/2022043022/5f3e3d19ea982637c17069da/html5/thumbnails/1.jpg)
Agenda, Day 2
08:30 – 08:35 Review of objectives and agenda
08:35 – 09:30 Infrastructure and tools
09:30 – 10:30 Case study: preservation activities at CDL
10:30 – 11:00 Morning break
11:00 – 12:00 Case study: preservation activities at Portico
12:00 – 12:30 Preservation initiatives and organizations: DataNet, DCC, DPC, IIPC, NDSA, OPF
12:30 – 14:00 Lunch
14:00 – Case study: preservation activities at Stanford
– 15:00 Other preservation resources
15:00 – 15:30 Afternoon break
15:30 – 16:00 Format characterization
16:00 – 16:30 Characterization in preservation workflows
16:30 – 17:00 Questions and discussion
Digital Preservation at Stanford University
Tom Cramer
Chief Technology Strategist
Stanford University Libraries
May 2011
Agenda
• Stanford University
• First & Second Generation Digital Library
• Digitization Efforts
• The Stanford Digital Repository
– Preservation Core
– Management
– Access
Stanford University
“The University of Stanford”?
Leland Stanford Junior Universityx
Stanford University
• 15,000 students
– 8,000 graduate
– 7,000 undergraduate
• 2,000 faculty
• 35,000 total university community
• $3.4 billion annual operating budget
• $17.2 billion endowment
• Roots of Silicon Valley
• One of the world’s leading research universities
Stanford’s Digital Library c. 2007
Typical of all first generation digital libraries?
1st Generation Digital Libraries
• Small scale digitization, largely focused on text & images
• Purpose built systems for specific content types – application focus
• Highly theoretical approach to digital preservation
• Anemic UI’s
2nd Generation Digital Libraries
• Large scale digitization
• With more content types
• Multi-pathway workflows
• Content use & reuse in an integrated environment
• Pragmatic approach to digital preservation & full lifecycle of objects
• Infrastructure & service focus
Digitization Trends -- Drivers
• Boutique → Large scale
• Text & image → text, image, audio, video, software and more
• Refresh of 1st generation delivery systems with contemporary UI’s
Digitization Trends -- Responses
Replacing individual, handwrought schemes with workflow-based systems, largely automated, with QA, exception handling and reporting that work for multiple content streams.
Management of full lifecycle of object, from physical object management through capture, preservation & access
Digitization at SULAIR
1. Robotic Book Scanning Lab
2. Rare Book Scanning Lab
3. Map Scanning Lab
4. High End Imaging Lab
5. Multipurpose (Sheet Feed, et al) Lab
6. Media Preservation Lab
7. Digital Forensics Lab
Stanford’s Legacy Media Counts
More than 20,000 handheld media objects in Special Collections alone
Legacy Media & Digital Forensics
• Files, operating systems & software
• mss, correspondence, images, records, data, etc.
• Steps:
– Extraction
– Forensic analysis
– Archival processing & description
– Access & emulation
• Paradigm shift for archivists, donors
Lifecycle Management = Integration
Lifecycle Management = Integration
Digitization & file processing are the easiest parts of any digitization initiative. Description, file management, collection management, access, and a holistic workflow uniting all pieces are the real challenge.
Preservation at Stanford
• SDR has been in production since Dec 2006
• Now a second generation preservation system
• One component in a larger ecosystem of digital library infrastructure
Timeline (1997, ’02–’10): need identified; “Dark Cave” concept; NDIIPP prototype; redesign; 1.0 in prod; 2.0 conceived; 2.0 in prod
Three Major Areas of Preservation Needs
• Digital Library
– Legacy collections
– Digitized collections
– Licensed, locally loaded content
– Born digital collections
• Institutional Repository
– Research data
– Publications, dissertations
– Learning objects, university assets
• External Depositors
– Publishers
– Discipline-specific repositories
– Reciprocal deposits with peer institutions
Google Books (100s of TB)
Manuscripts (75 TB)
Media (50 TB)
Geospatial Data (10 TB)
~30 other digi projects (15 TB)
Purchased collections (25 TB)
E.g., Google-Scanned Books
Download, process and preserve 8 million volumes in SDR for...
• local indexing,
• text mining,
• selective delivery, and
• long-term access.
E.g., Monterey Jazz Festival
•Festival founded in 1958: longest running jazz festival in the world.
•Rich collection of recordings from inception, spanning over 50 years, in varying states of condition & decay.
•Archives held at Stanford’s Archive of Recorded Sound
•~800 audio recordings, 1.6 TB audio files in SDR
•~250 video recordings, 22 TB video files in SDR
Access:
- Complete database of digital recordings online at collections.stanford.edu/mjf
- Access via on-site visit to ARS
- New commercial releases on MJF Records
E.g., National Geospatial Digital Archive
• Some 27,000 “at risk” geospatial objects
• TIFFs, GeoTIFFs, Shapefiles, Digital Elevation Models, Digital Orthophoto Quadrangle files
E.g., Preserving Virtual Worlds
Stanford University Libraries Second Life Open House, 31 July 2009
E.g., Forensically Extracted Born Digital Files
•Digital Forensics lab extracting original computer files from legacy media
•Actively building pipeline from extraction to preservation store
•Support for both immediate and deferred archival processing & description
E.g., Electronic Theses and Dissertations
NSF Policy Position on Data Archiving 1
“NSF's policy position on data is straightforward:
”
1 National Science Foundation, Cyberinfrastructure Council. Cyberinfrastructure Vision for 21st Century Discovery. March 2007.
NSF and NIH Grants to Stanford
SDR 2.0 today
• 100+ TB of unique content
• 300+ TB of managed data
• 200,000+ objects
• 62,000,000 files
• 7 content types: books, images, audio, video, manuscripts, GIS data, software
• Integrated component of larger environment
2008: SDR 1.0 In Production & Working, BUT…
• Custom code, maintained by evolving & smaller team
– No reuse of code within Stanford, or larger community
• Bottlenecks
– Needed to be quicker to add new content types
– Needed to be quicker to add new collections
– Needed to decompose code into more granular components
• Largely a stand-alone system
– Lacked flexible Management services for streamlined, continuous content deposit workflows
– “Dark Archive”: no access services for rich, self-service patron access
SDR 1.0 Architecture: Strongly Rooted in OAIS
SDR 2.0: New Technical Architecture
• Adopt Fedora as a metadata management system
– Clean mapping of new data model to Fedora content models
– Reuse same design pattern, core technology as in DOR
• Support for parallelized & asynchronous operations
– Multiple ingest streams to increase throughput
– Decompose one process (e.g., “ingest”) into discrete, loosely coupled operations (“checksum”, “package”, “transfer”)
• Adopt a RESTful architecture & common workflow service
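The decomposition of “ingest” into “checksum”, “package” and “transfer”, run over several ingest streams, might be sketched as follows. This is only an illustration under stated assumptions: the function names, the staging and storage folders, and the use of a thread pool are hypothetical and are not taken from SDR's actual code.

```python
import hashlib
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def checksum(path: Path) -> str:
    """Operation 1: compute a SHA-256 digest of one file."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def package(path: Path, digest: str, staging: Path) -> Path:
    """Operation 2: copy the file into a staging folder next to its digest."""
    package_dir = staging / path.name
    package_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(path, package_dir / path.name)
    (package_dir / "checksum.sha256").write_text(f"{digest}  {path.name}\n")
    return package_dir


def transfer(package_dir: Path, storage: Path) -> Path:
    """Operation 3: move the finished package into the storage area."""
    storage.mkdir(parents=True, exist_ok=True)
    destination = storage / package_dir.name
    shutil.move(str(package_dir), str(destination))
    return destination


def ingest(path: Path, staging: Path, storage: Path) -> Path:
    """One ingest = three loosely coupled operations chained together."""
    return transfer(package(path, checksum(path), staging), storage)


def ingest_all(paths, staging: Path, storage: Path, streams: int = 4):
    """Run several ingest streams at once (hypothetical default of 4)."""
    with ThreadPoolExecutor(max_workers=streams) as pool:
        return list(pool.map(lambda p: ingest(p, staging, storage), paths))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        source = root / "source"
        source.mkdir()
        for index in range(3):
            (source / f"file{index}.bin").write_bytes(bytes([index]) * 1024)
        stored = ingest_all(sorted(source.iterdir()), root / "staging", root / "storage")
        print([p.name for p in stored])
```

Because each operation takes plain inputs and returns a plain result, any one of them could be replaced or rerun on its own, which is the point of the loose coupling described on the slide.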
SDR 2.0: New Technical Architecture
SDR 2.0: Robots & “WorkDo” Service
Complex Systems from Atomic Pieces
SDR 2.0: Revised Data Model
SDR 1.x’s METS-based SIP, AIP and DIP had many issues:
– Each Transfer Manifest was content & collection specific: doesn’t scale
– Transfer manifests require too much interpretation and analysis to change, augment
– Too complex: Stanford METS structure breaks apart related data across the object
– Wraps (somewhat dynamic) metadata with (mostly static) data files in same envelope
– Recursive nature of transfer manifest makes versioning self-referential, complex
– No one speaks METS natively: depositors, SDR & clients all forced to perform translation at handshakes
Content Structures and Flavors of Metadata
• Flexible data model can take any type of data, packaged in “bags”
– A “bag” is a directory with standardized top-level structure and syntax
• Minimizes analysis & processing required on ingest
• Preserves options for future processing & transformations based on future needs
Each object has seven discrete metadata files:
– Identity metadata
– Descriptive metadata
– Content metadata (aka structural metadata)
– Technical metadata
– Rights metadata
– Source metadata
– Provenance metadata
SDR Deposits: Content Transfer via Bagit
druid/
  bagit-info.txt
    :
    Stanford-Content-Metadata: data/metadata/contentMetadata
    Stanford-Identity-Metadata: data/metadata/identityMetadata
    Stanford-Provenance-Metadata: data/metadata/provenanceMetadata
  /data
    /metadata
      /contentMetadata
      /descMetadata
      /identityMetadata
      /provenanceMetadata
      /rightsMetadata
      /sourceMetadata
      /technicalMetadata
    /content
      /file1
      /file2
      :
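As a rough illustration of the layout above, the sketch below builds such a directory and checks that all seven metadata files are present. It follows the file name shown on the slide (`bagit-info.txt`); the BagIt specification itself names its tag files `bagit.txt` and `bag-info.txt`. Only the three `Stanford-…` tags visible on the slide are written, since the rest are elided there. The function names and the sample identifier are hypothetical.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# The seven metadata files listed in the slide's directory layout.
METADATA_FILES = [
    "contentMetadata",
    "descMetadata",
    "identityMetadata",
    "provenanceMetadata",
    "rightsMetadata",
    "sourceMetadata",
    "technicalMetadata",
]

# Only these three tags are visible on the slide; any others are elided there.
INFO_TAGS = {
    "Stanford-Content-Metadata": "data/metadata/contentMetadata",
    "Stanford-Identity-Metadata": "data/metadata/identityMetadata",
    "Stanford-Provenance-Metadata": "data/metadata/provenanceMetadata",
}


def build_bag(root: Path, druid: str, content: dict[str, bytes]) -> Path:
    """Create <root>/<druid>/ with the data/metadata and data/content folders."""
    bag = root / druid
    metadata_dir = bag / "data" / "metadata"
    content_dir = bag / "data" / "content"
    metadata_dir.mkdir(parents=True)
    content_dir.mkdir(parents=True)
    for name in METADATA_FILES:
        (metadata_dir / name).write_text("")  # empty placeholder metadata
    for name, payload in content.items():
        (content_dir / name).write_bytes(payload)
    lines = [f"{tag}: {path}" for tag, path in INFO_TAGS.items()]
    (bag / "bagit-info.txt").write_text("\n".join(lines) + "\n")
    return bag


def missing_metadata(bag: Path) -> list[str]:
    """Return the names of any of the seven metadata files that are absent."""
    metadata_dir = bag / "data" / "metadata"
    return [name for name in METADATA_FILES if not (metadata_dir / name).is_file()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # "example-druid" is a made-up identifier for illustration only.
        bag = build_bag(Path(tmp), "example-druid", {"file1": b"a", "file2": b"b"})
        print(missing_metadata(bag))
```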
Lessons Learned Over 5 Years
• Custom code, maintained by evolving & smaller team, was inefficient & unsustainable
– Adopted Fedora for metadata management, Hydra for application framework
– Shared technology & design patterns with rest of digital library ecosystem
– API’s for management, ingest, retrieval, reporting
• Bottlenecks
– Need to be quicker to add new content types & collections: simplify the data model, support “Zip & SIP”
– Need to increase the throughput to the storage layer led to parallelization of processes
• Need to refine & hone the SDR service model
– Complement Preservation with robust Management & Access services
Preservation Is One Leg of a Stool
• Preservation without Access is pointless
– Further, all signs indicate that it is not economically viable
• Access without Preservation is myopic
• Robust Management services are prerequisite for accessioning, archiving and providing access to content
– The “pre-ingest” phenomenon
Can one system handle it all? or
Stanford’s Digital Library Ecosystem
Three Spheres: Management, Preservation and Access
Digitization, Deposit & Management
Preservation
Discovery & Delivery
Stanford Digital Repository (SDR): content agnostic, preservation repository
Specialty applications provide context-specific, user-facing deposit, and access services tailored to content types and disciplines
SDR in Stanford’s DL Ecosystem
Library Management Applications
EEMS (acquiring born digital content), digitization workflow, etc.
Institutional Repository
ETDs, open access articles, faculty “papers”, research data, web sites, etc.
SULAIR Digital Stacks
Delivery for text, images, mss, media, data, & curated collections
National Geospatial Digital Archive (NGDA)
Geospatial data
and SDR provides “back-office” preservation services: replication, auditing, migration, and retrieval in a secure, sustainable, scalable stewardship environment
E.g., Parker Manuscripts
•559 Anglo-Saxon manuscripts, 200,000 pages
•For each page:
22 MB JPEG2000 delivery surrogate
22 MB JPEG2000 delivery surrogate
110 MB submaster TIFF
220 MB master TIFF
SDR – Preservation Core
Parker.stanford.edu: Rich web application, tailored for general public, medievalists
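The per-page figures above imply a total storage footprint that can be worked out directly. The sketch below assumes decimal units (1 TB = 1,000,000 MB) and computes the total twice, because the 22 MB surrogate line appears twice in the transcript and it is unclear whether that reflects two files or a duplicated line.

```python
PAGES = 200_000

# Per-page file sizes in MB, exactly as listed (surrogate line appears twice).
FILES_MB_AS_LISTED = [22, 22, 110, 220]
# Same list under the assumption that the repeated surrogate line is a duplicate.
FILES_MB_SINGLE_SURROGATE = [22, 110, 220]


def total_tb(per_page_mb, pages=PAGES):
    """Total size in TB, assuming 1 TB = 1,000,000 MB (decimal units)."""
    return sum(per_page_mb) * pages / 1_000_000


if __name__ == "__main__":
    print(total_tb(FILES_MB_AS_LISTED))         # 374 MB per page
    print(total_tb(FILES_MB_SINGLE_SURROGATE))  # 352 MB per page
```

Under these assumptions the collection comes to roughly 70 to 75 TB. For comparison, the deck elsewhere lists 75 TB for manuscripts, though it does not say whether that figure covers exactly these files.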
Separation of Concerns
• Scoped repository: differentiation between preservation (provided by SDR) and
… content management (provided by DOR)
… access (provided by the Digital Stacks apps)
• Implications:
– Reduces pressure on SDR to be all things to all depositors, for all content
– Reinforces need to provide managed & secure storage at scale
– Reinforces requirement to focus on fixity and integrity services
– Emphasizes need to integrate SDR to management & access services through stable API’s
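A fixity service of the kind mentioned above typically records a checksum for every stored file and later re-checks the files against that record. The following is a minimal sketch of that general technique, not SDR's implementation; the function names and the choice of SHA-256 are assumptions made for illustration.

```python
from __future__ import annotations

import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its current digest."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files under root with a previously recorded manifest."""
    report: dict[str, list[str]] = {"ok": [], "changed": [], "missing": []}
    for relative, expected in manifest.items():
        target = root / relative
        if not target.is_file():
            report["missing"].append(relative)
        elif sha256_of(target) != expected:
            report["changed"].append(relative)
        else:
            report["ok"].append(relative)
    return report


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp)
        (store / "a.tif").write_bytes(b"master")
        (store / "b.jp2").write_bytes(b"surrogate")
        manifest = record_manifest(store)
        (store / "b.jp2").write_bytes(b"bit rot")  # simulate silent corruption
        print(audit(store, manifest))
```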
Management: Hydra-based Applications
Under Development…
• SDR’s Front End – Institutional Repository for Stanford
• Hypatia – Archival Arrangement, Description & Access
• SDR Preservation Core Administrative Application
ETD’s – Electronic Theses & Dissertations
SALT – Self-Archiving Legacy Toolkit
EEMs – Everyday Electronic Materials
Hydra
• Joint development project among Stanford, University of Virginia, University of Hull and Fedora Commons
• Based on Fedora, ActiveFedora and Ruby on Rails
• Reuse Blacklight & Solr for search & browse within a Hydra application
Fundamental Assumption #1
No single system can provide the full range of repository-based solutions for a given institution’s needs,
…yet sustainable solutions require a common repository infrastructure.
For instance…
An ETD solution…
- Single PDF
- With auxiliary data files
- Simple, prescribed workflow
- Integrated with student administration system
- Streamlined UI for depositors, reviewers & readers
A digitization workflow system…
- Potentially hundreds of file types per object
- Complex, branching workflow
- Sophisticated operator (back office) interfaces
A general purpose institutional repository
- Heterogeneous file types
- Simple to complex objects
- General purpose user interfaces
Distinct Application Needs
More than one dozen distinct repository application needs across three institutions.
• Electronic theses & dissertations
• Open access articles
• Data curation application(s)
• General purpose institutional repository
• Manuscript & archival collection delivery
• Library materials accessioning tools
• Digitization workflow system
• And more...
Shared, Primitive Functions
• Deposit – uploading simple or multipart objects, singly or in bulk
• Manage – editing an object’s content, metadata and permissions
• Search – full text and fielded search supporting both user discovery and administration
• Browse – sequential viewing of objects by collection, attribute or ad hoc filtering
• Deliver – viewing, downloading & disseminating objects through user and machine interfaces
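To make the idea of five shared primitives concrete, the sketch below shows them as methods on a toy in-memory repository. Every class, method and field name here is hypothetical; this is not Hydra's or Fedora's API, and the search is a simple substring match over metadata rather than a real full-text index.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RepoObject:
    object_id: str
    collection: str
    content: bytes
    metadata: dict = field(default_factory=dict)


class InMemoryRepository:
    """Toy repository exposing the five primitives as plain methods."""

    def __init__(self) -> None:
        self._objects: dict[str, RepoObject] = {}

    def deposit(self, obj: RepoObject) -> None:
        """Deposit: add one object (call repeatedly for bulk loads)."""
        self._objects[obj.object_id] = obj

    def manage(self, object_id: str, **metadata_updates) -> None:
        """Manage: edit an existing object's metadata."""
        self._objects[object_id].metadata.update(metadata_updates)

    def search(self, term: str, field_name: str | None = None) -> list[str]:
        """Search: match a term in one metadata field, or in any field."""
        needle = term.lower()
        hits = []
        for object_id, obj in self._objects.items():
            if field_name is None:
                values = list(obj.metadata.values())
            else:
                values = [obj.metadata.get(field_name, "")]
            if any(needle in str(value).lower() for value in values):
                hits.append(object_id)
        return sorted(hits)

    def browse(self, collection: str) -> list[str]:
        """Browse: list object ids in one collection, in a stable order."""
        return sorted(
            object_id
            for object_id, obj in self._objects.items()
            if obj.collection == collection
        )

    def deliver(self, object_id: str) -> bytes:
        """Deliver: hand back the object's content for viewing or download."""
        return self._objects[object_id].content


if __name__ == "__main__":
    repo = InMemoryRepository()
    repo.deposit(RepoObject("etd-1", "theses", b"%PDF", {"title": "Jazz archives"}))
    repo.manage("etd-1", author="A. Student")
    print(repo.search("jazz"), repo.browse("theses"), repo.deliver("etd-1"))
```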
Hydra Philosophy -- Technical
• Tailored applications and workflows for different content types, contexts and user interactions
• A common repository infrastructure
• Flexible, atomistic data models
• Modular, “Lego brick” services
• Library of user interaction widgets
• Easily skinned UI
One body, many heads
Fundamental Assumption #2
No single institution can resource the development of a full range of solutions on its own,
…yet each needs the flexibility to tailor solutions to local demands and workflows.
Hydra Philosophy -- Community
• An open architecture, with many contributors to a common core
• Collaboratively built “solution bundles” that can be adapted and modified to suit local needs
• A community of developers and adopters extending and enhancing the core
• “If you want to go fast, go alone. If you want to go far, go together.”
One body, many heads
Electronic Theses and Dissertations
Electronic Theses & Dissertations (ETD)
• Automatic deposit to library as part of degree conferral
• Built in digital collection building
• Better access for patrons
• Reduced expenses for students, University, library processing
• Increased visibility of and access to Stanford research via catalog & Google
• Built in preservation through Stanford Digital Repository
EEMs: Accessioning Born Digital Materials
Browser widget enables selector to capture the PDF, plus URL, title, author, copyright status, payment information, and comments, and route to Acquisitions.
EEMs: Accessioning Born Digital Materials
Dashboard enables item processing, ultimately leading to preservation in SDR and access via the catalog.
SALT: Digital Archives
SALT: Digital Archives
• Archiving unstructured and semi-structured data
• Allow access to semi-processed information,
- with strong access & visibility controls
- leveraging full text & entity extraction
• Ongoing enrichment of the archive
- through self-annotation by the donor
- through crowd-sourcing description and organization
![Page 59: Agenda, Day 2 · Stanford University Libraries May 2011. Agenda • Stanford University ... •Some 27,000 “at risk” geospatial objects •TIFFs, GeoTIFFs, Shapefiles, Digital](https://reader034.vdocuments.site/reader034/viewer/2022043022/5f3e3d19ea982637c17069da/html5/thumbnails/59.jpg)
Component Based Architecture
• Fedora as a metadata store
• Well structured file system as data store
• Solr index for rapid data access
• Blacklight & Hydra: app logic & presentation
• Atomic Services
– “Robots”: simple, autonomous scripts, providing small units of work in reusable packages
– “Services” provide common operations that support workflows across the environment
• “WorkDo”: lightweight workflow to orchestrate cascade of services
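A minimal sketch of the robots-plus-workflow idea, assuming each robot can be modeled as a small function and the workflow as an ordered list of named steps. Every name below (`DigitalObject`, the three robots, `run_workflow`, `STEPS`) is hypothetical and chosen for illustration; none of it is taken from Stanford's code.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class DigitalObject:
    """Hypothetical stand-in for an object moving through the repository."""
    object_id: str
    content: bytes
    metadata: Dict[str, str] = field(default_factory=dict)
    status: Dict[str, str] = field(default_factory=dict)  # step name -> state


# A "robot" is modeled here as a plain function doing one small unit of work.
Robot = Callable[[DigitalObject], None]


def reject_empty_robot(obj: DigitalObject) -> None:
    """Fail the step when there is nothing to preserve."""
    if not obj.content:
        raise ValueError("empty content")


def checksum_robot(obj: DigitalObject) -> None:
    """Record a SHA-256 digest of the content."""
    obj.metadata["sha256"] = hashlib.sha256(obj.content).hexdigest()


def size_robot(obj: DigitalObject) -> None:
    """Record the content length in bytes."""
    obj.metadata["size"] = str(len(obj.content))


def run_workflow(obj: DigitalObject, steps: List[Tuple[str, Robot]]) -> DigitalObject:
    """Run each step in order, skipping completed ones and halting on error."""
    for name, robot in steps:
        if obj.status.get(name) == "completed":
            continue  # lets a workflow be re-run after a failure is fixed
        try:
            robot(obj)
        except Exception as exc:
            obj.status[name] = f"error: {exc}"
            break
        obj.status[name] = "completed"
    return obj


# Hypothetical accessioning workflow: an ordered cascade of robots.
STEPS: List[Tuple[str, Robot]] = [
    ("reject-empty", reject_empty_robot),
    ("checksum", checksum_robot),
    ("size", size_robot),
]

if __name__ == "__main__":
    result = run_workflow(DigitalObject("obj-1", b"hello"), STEPS)
    print(result.status)
    print(result.metadata)
```

Keeping per-step status on the object means a failed step can be corrected and the same workflow re-run without repeating completed work, which is one plausible reason to prefer many small robots over one large script.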
![Page 60: Agenda, Day 2 · Stanford University Libraries May 2011. Agenda • Stanford University ... •Some 27,000 “at risk” geospatial objects •TIFFs, GeoTIFFs, Shapefiles, Digital](https://reader034.vdocuments.site/reader034/viewer/2022043022/5f3e3d19ea982637c17069da/html5/thumbnails/60.jpg)
DOR & Digital Stacks Architecture
![Page 61: Agenda, Day 2 · Stanford University Libraries May 2011. Agenda • Stanford University ... •Some 27,000 “at risk” geospatial objects •TIFFs, GeoTIFFs, Shapefiles, Digital](https://reader034.vdocuments.site/reader034/viewer/2022043022/5f3e3d19ea982637c17069da/html5/thumbnails/61.jpg)
Digital Library Ecosystem
![Page 62: Agenda, Day 2 · Stanford University Libraries May 2011. Agenda • Stanford University ... •Some 27,000 “at risk” geospatial objects •TIFFs, GeoTIFFs, Shapefiles, Digital](https://reader034.vdocuments.site/reader034/viewer/2022043022/5f3e3d19ea982637c17069da/html5/thumbnails/62.jpg)
Growth in Disk and Computing at SULAIR
![Page 63: Agenda, Day 2 · Stanford University Libraries May 2011. Agenda • Stanford University ... •Some 27,000 “at risk” geospatial objects •TIFFs, GeoTIFFs, Shapefiles, Digital](https://reader034.vdocuments.site/reader034/viewer/2022043022/5f3e3d19ea982637c17069da/html5/thumbnails/63.jpg)
Stanford’s Digital Library, 2011
The next generation of digital libraries will be complex ecosystems made up of simple components.
Separate systems for digitization, management, preservation and access will enable pieces to be mixed and matched, supporting content streams from a variety of sources, and access by a variety of communities, services and tools.
Photo by Alun Salt. Used under CC Attribution-ShareAlike 2.0 Generic license.
![Page 64: Agenda, Day 2 · Stanford University Libraries May 2011. Agenda • Stanford University ... •Some 27,000 “at risk” geospatial objects •TIFFs, GeoTIFFs, Shapefiles, Digital](https://reader034.vdocuments.site/reader034/viewer/2022043022/5f3e3d19ea982637c17069da/html5/thumbnails/64.jpg)
LOCKSS
• Lots of Copies Keep Stuff Safe
• Originated at Stanford University
• Peer-to-peer, decentralized digital preservation system
• Focus is on scholarly articles
– 7100 e-journal titles, 470 publishers
– Collects web-based content
– Preserves it locally
– Provides 100% post-cancellation access
– Done with publisher permission
![Page 65: Agenda, Day 2 · Stanford University Libraries May 2011. Agenda • Stanford University ... •Some 27,000 “at risk” geospatial objects •TIFFs, GeoTIFFs, Shapefiles, Digital](https://reader034.vdocuments.site/reader034/viewer/2022043022/5f3e3d19ea982637c17069da/html5/thumbnails/65.jpg)
LOCKSS
Capture & Replication
![Page 66: Agenda, Day 2 · Stanford University Libraries May 2011. Agenda • Stanford University ... •Some 27,000 “at risk” geospatial objects •TIFFs, GeoTIFFs, Shapefiles, Digital](https://reader034.vdocuments.site/reader034/viewer/2022043022/5f3e3d19ea982637c17069da/html5/thumbnails/66.jpg)
LOCKSS
Audit & Healing
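As a rough illustration of audit and healing across peers, the sketch below compares digests of each peer's copy, treats the strict-majority digest as correct, and overwrites disagreeing copies. This is a deliberate simplification under stated assumptions: the actual LOCKSS polling protocol is more elaborate, and the function and peer names here are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Dict, List


def digest(content: bytes) -> str:
    """SHA-256 hex digest used to compare copies without exchanging them."""
    return hashlib.sha256(content).hexdigest()


def audit_and_heal(copies: Dict[str, bytes]) -> List[str]:
    """Audit peers' copies of one item and repair those that lose the vote.

    `copies` maps a hypothetical peer name to that peer's copy.
    Returns the names of repaired peers, in iteration order.
    Raises RuntimeError when no strict majority of peers agrees.
    """
    votes = Counter(digest(c) for c in copies.values())
    winner, count = votes.most_common(1)[0]
    if count * 2 <= len(copies):
        raise RuntimeError("no majority; cannot decide which copy is good")

    # Take any copy that matches the winning digest as the repair source.
    good = next(c for c in copies.values() if digest(c) == winner)

    repaired = []
    for peer, content in list(copies.items()):
        if digest(content) != winner:
            copies[peer] = good  # "heal" the damaged copy from a good peer
            repaired.append(peer)
    return repaired


if __name__ == "__main__":
    peers = {"library-a": b"article", "library-b": b"article", "library-c": b"artic1e"}
    print(audit_and_heal(peers))
```

The design point the sketch tries to capture is that no single peer is authoritative: agreement among many independently managed copies is what identifies damage.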
![Page 67: Agenda, Day 2 · Stanford University Libraries May 2011. Agenda • Stanford University ... •Some 27,000 “at risk” geospatial objects •TIFFs, GeoTIFFs, Shapefiles, Digital](https://reader034.vdocuments.site/reader034/viewer/2022043022/5f3e3d19ea982637c17069da/html5/thumbnails/67.jpg)
LOCKSS
• Commodity Hardware & Open source software & Appliance = very low cost
• Follows traditional model of library-based distribution and preservation
– Lots of Copies
– Locally Managed Copies
• Publisher permissions ensure legal coverage
• Extensible to other collections
![Page 68: Agenda, Day 2 · Stanford University Libraries May 2011. Agenda • Stanford University ... •Some 27,000 “at risk” geospatial objects •TIFFs, GeoTIFFs, Shapefiles, Digital](https://reader034.vdocuments.site/reader034/viewer/2022043022/5f3e3d19ea982637c17069da/html5/thumbnails/68.jpg)
LOCKSS
• CLOCKSS: Controlled LOCKSS
– Not-for-profit archive for ensuring access to orphaned scholarly content
– One dozen major publishers + libraries
• Private LOCKSS Networks
– Alabama Digital Preservation Network
– Arizona State Library, Archive & Public Records
– Council of Prairie & Pacific University Libraries Consortium
– Data Preservation Alliance for the Social Sciences
– Digital Commons – Berkeley Electronic Press
– MetaArchive Cooperative Project
– Digital Federal Depository Library Program