personal digital archiving 2015 - nyu - workshop
TRANSCRIPT
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.
End-to-end digital preservation for diverse collections
Personal Digital Archiving – 04-26-2015
Courtney C. Mumma, MAS/MLIS, US and International Community Development
lead developers of Archivematica, Access to Memory (AtoM) and Binder
archivists, librarians, technologists
core values
innovation and smart automation
leverage existing technology
transparency
interoperability and collaboration
grounded in archival practice
open source, including other projects
handshakes / integration
bounty model
hybrid public access and content management
manage accessions, taxonomies, multiple repositories, restrictions and rights, authority records (ISAAR)
access derivatives including streaming video
multi-lingual description & ISAD(G), RAD, DACS, EAD export, MODS
link to preserved archival packages, sync metadata and PREMIS rights
FOSS digital preservation (AGPLv3)
good practices and standards
no barrier to user groups, community or documentation
consistent, system independent Archival Information Packages (AIPs)
BagIt, Dublin Core, METS, PREMIS
system synthesis
active integrations:
– DSpace
– CONTENTdm
– Islandora/Fedora
– Archivists' Toolkit
– LOCKSS
– DuraCloud
– OpenStack
– TRIM
on-going integrations:
– ArchivesSpace
  ● Bentley
  ● RAC
– DSpace
– Hydra
– Arkivum
– BitCurator
– Dataverse
A flexible open-source application
for standards-based description and access
Access to Memory
What is AtoM?
AtoM stands for Access to Memory.
It is a web-based, open source application for standards-based archival description and access in a multilingual, multi-repository environment.
Web-based
Open source
Standards-based
Multilingual
Multi-repository
Web-based: platform independent
Browser-based user interface.
• Anyone with access to a browser (e.g., Chrome, Internet Explorer, Firefox, Safari etc.) has access to all the features and functionality of the AtoM application.
Platform independent application.
• The application runs on a web server that can be installed and run on many platforms.
Standards-based description: User-friendly content standard edit templates
Templates: ISAD(G), DACS, RAD, DC, MODS, ISAAR(CPF), ISDIAH, ISDF
Multi-lingual interfaces
Multi-repository support: per-institution theming
Archivematica integration
Overall Workflow
describe and manage all hybrid content in AtoM
preserve digital content using Archivematica & hand off access copies and metadata to AtoM
provide access (digital copies and descriptions) and links to preserved content in AtoM
A flexible open-source application
for standards-based digital preservation
Archivematica makes OAIS (ISO 14721)
Archival Information Packages (AIPs)
– integrity & virus checks, format identification, characterization & metadata extraction, forensic activities, validation, arrangement, transcription, etc.
– normalization to sustainable formats on ingest + preservation of the original file
– include or add metadata, including PREMIS rights and restrictions
– storage agnostic
– bagged AIP with logs and metadata (METS.xml)
the AIP: so much bigger on the inside
value add to storage: metadata, logs, formats and structure to protect against software obsolescence
the METS.xml file
<dmdSec> (descriptive metadata)
    Dublin Core XML
<amdSec> (administrative metadata)
    <techMD> PREMIS: object
    <digiprovMD> PREMIS: events, PREMIS: agents
    <rightsMD> PREMIS: rights
<fileSec> (a list of the files and their roles and relationships)
<structMap> (a representation of the physical structure of the AIP)
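To make the section layout above concrete, here is a minimal sketch that counts the METS elements in a file using only Python's standard library. The sample document is made up for illustration and heavily trimmed (it is not schema-valid, and a real Archivematica METS.xml is far larger); the function name `summarize_mets` is hypothetical. Note that the METS schema spells the provenance element `digiprovMD`.

```python
import xml.etree.ElementTree as ET
from collections import Counter

METS_NS = "http://www.loc.gov/METS/"

# Hypothetical, heavily trimmed METS document (illustration only,
# not a real Archivematica output and not schema-valid).
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:dmdSec ID="dmdSec_1"/>
  <mets:amdSec ID="amdSec_1">
    <mets:techMD ID="techMD_1"/>
    <mets:digiprovMD ID="digiprovMD_1"/>
    <mets:digiprovMD ID="digiprovMD_2"/>
    <mets:rightsMD ID="rightsMD_1"/>
  </mets:amdSec>
  <mets:fileSec/>
  <mets:structMap TYPE="physical"/>
</mets:mets>"""


def summarize_mets(xml_text):
    """Return a dict mapping each METS element name to how often it occurs."""
    root = ET.fromstring(xml_text)
    prefix = "{" + METS_NS + "}"
    counts = Counter()
    for elem in root.iter():
        # ElementTree stores namespaced tags as "{namespace}localname".
        if elem.tag.startswith(prefix):
            counts[elem.tag[len(prefix):]] += 1
    return dict(counts)


if __name__ == "__main__":
    for name, count in sorted(summarize_mets(SAMPLE_METS).items()):
        print(f"{name}: {count}")
```

Pointing the same function at the text of a real METS.xml gives a quick sense of how many events, objects and rights statements an AIP carries.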
question break
then we get knee deep in computers
identify your test content
✔ What
✔ Where
✔ How much
what types of digital content?
• born-digital
― government and university records, student artwork, e-theses and dissertations
― diverse formats: audiovisual, textual, geospatial, websites, presentations, images, databases
• digitized
― books, newspapers, images, video from vendors
― pre-made access and preservation copies
• submission documentation & metadata
― permission forms, accession records, pictures of digital media, etc.
― descriptive MD from other systems
where is your digital content?
• stored locally
• in other systems
― e.g., CONTENTdm, DSpace, DuraCloud, Islandora
• on detached media
― floppies, hard drives, CDs, DVDs, USB sticks, etc.
• packaged
― Bagged using Library of Congress BagIt specification
― Forensic images
― Zipped or tarballed
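For content that arrives bagged, it can help to see what a BagIt payload manifest actually records. The following is a minimal sketch, assuming a bag whose payload sits under a `data/` directory as the BagIt specification describes; it produces the `checksum  path` lines that a `manifest-sha256.txt` file would contain. The function names are hypothetical, and this is not a substitute for the Library of Congress tooling, which also writes and validates the bag's tag files.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large objects do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest_lines(bag_dir):
    """Return one 'checksum  relative/path' line per payload file under data/."""
    bag_dir = Path(bag_dir)
    lines = []
    for path in sorted((bag_dir / "data").rglob("*")):
        if path.is_file():
            # Paths in a manifest are relative to the bag root, with forward slashes.
            rel = path.relative_to(bag_dir).as_posix()
            lines.append(f"{sha256_of(path)}  {rel}")
    return lines
```

Comparing these lines against the manifest shipped inside a received bag is one simple way to confirm that nothing changed in transit.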
how much is there?
• Size: gigabytes, terabytes, petabytes
― Sum total of all material
― Size of distinct content sets
― Biggest single digital objects
• Quantity
― Sum total of all files
― Number of files in distinct content sets
• Resource capacity
― Space allocated to processing and storage locations
― Consider ideal transfer, SIP and AIP sizes
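The size and quantity questions above can be answered with a short script before any pilot transfer. This is a minimal sketch using only the standard library; the function name `inventory` and the shape of the returned dictionary are assumptions made for illustration.

```python
import os


def inventory(root):
    """Walk a directory tree and report total size, file count and the largest file."""
    total_bytes = 0
    file_count = 0
    largest_bytes = 0
    largest_file = None
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # Skip unreadable entries such as broken symlinks.
                continue
            total_bytes += size
            file_count += 1
            if largest_file is None or size > largest_bytes:
                largest_bytes = size
                largest_file = path
    return {
        "total_bytes": total_bytes,
        "file_count": file_count,
        "largest_bytes": largest_bytes,
        "largest_file": largest_file,
    }
```

Running it once per distinct content set gives the per-set numbers, and the largest single object is a useful input when deciding how big a transfer the processing location needs to hold.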
asking questions of your content
• descriptive metadata?
― does it need to be preserved? does it already exist, or does it need to be added? complex or simple objects?
• submission documentation?
― donor agreements, pictures of physical media, licenses, etc.
• access copies?
― already have them? which system should they be sent to or stored in?
• generate preservation copies?
― already have them?
• service masters?
asking questions of your content (continued)
• directory structure important (Original Order)?
• keep the package AND the content, or just one?
• rights information?
• is content Bagged? in DSpace? a forensic image? (Transfer type)
• how large should my archival packages be?
• will my archival packages have a 1:1 relationship with my transferred digital content? will my content be arranged into multiple packages or combined into one? (Arrangement workflow)
processing in Archivematica
• determine readiness by pilot testing content streams using the methods just described
• prepare content for transfer:
– put it in a folder in a transfer source directory
– prepare a metadata CSV for simple or complex objects
– prepare submission documentation
– identify pre-made access, preservation and/or service copies
– select the right workflow (standard, DSpace or forensic image) and pre-configured settings (more on this soon)
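As an illustration of the metadata CSV step, here is a minimal sketch that builds a simple-object metadata file with Python's `csv` module. It assumes the commonly documented Archivematica convention of a `filename` column holding a path under `objects/` followed by Dublin Core columns such as `dc.title`; the exact columns and file location should be checked against the documentation for the version in use. The file names and values below are invented.

```python
import csv
import io

# Assumed column layout: one path column plus a few Dublin Core fields.
FIELDS = ["filename", "dc.title", "dc.creator", "dc.date"]


def build_metadata_csv(rows):
    """Return CSV text with a header row and one line per described object."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Fields missing from a row are written as empty cells.
        writer.writerow(row)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical objects, for illustration only.
    print(build_metadata_csv([
        {"filename": "objects/photo.jpg", "dc.title": "Campus, 1970"},
        {"filename": "objects/report.pdf", "dc.title": "Annual report",
         "dc.creator": "Registrar", "dc.date": "1985"},
    ]))
```

Generating the file from an export of an existing system, rather than typing it by hand, keeps the paths in the CSV consistent with the files placed in the transfer folder.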
now let's see it!
archivematica.org & accesstomemory.org
Questions?
Thank you!