fraimow curatecamp 2015
TRANSCRIPT
Mass Migration: Building a Bulk Hard Drive-to-LTO Workflow From Scratch
Rebecca Fraimow, National Digital Stewardship Resident at WGBH
@rhfraim
80 hard drives
11,561 audiovisual files
300 TB of data
1 dedicated LTO workstation
1 dedicated archivist
. . .
Required Scripts & Documents (Initial)
AA_PBCorescript.sh: generates checksums and metadata for each file on drive
AA_LTO_checksum.sh: generates checksums for each file on LTO
WGBH_Batch1_LimitedCSV_final.csv, WGBH-Batch2-140211.csv, WGBH_batch3.csv, WGBH-Batch4-LimitedCSV.csv: GUID mapping documents
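As a rough illustration of the checksum step these scripts perform, here is a minimal Python sketch. It is not the WGBH shell script: the function names `md5_of_file` and `build_manifest` are hypothetical, and the choice to record paths relative to the drive root is an assumption made for the example.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 of a file, reading in chunks so that
    large audiovisual files are never loaded into memory whole."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Walk a drive (or tape mount point) and return a sorted list
    of (relative_path, md5) pairs, one per file found.
    Hypothetical helper: the output layout is an assumption."""
    manifest = []
    for folder, _subfolders, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(folder, name)
            relative = os.path.relpath(full_path, root)
            manifest.append((relative, md5_of_file(full_path)))
    return sorted(manifest)
```

Running the same function once against the hard drive and once against the mounted LTO tape would give two lists that can be compared line by line.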
Some drives didn’t perform correctly when removed from their cases
Some drives had too much content to fit on one LTO tape
Some drives had known failed files on them that were not separated out or identified
Some of the content turned out to be derivative material
Some of the content had been pulled twice
Some drives turned out to have failed files that could only be detected by manual QC
Required Scripts & Documents (Revised)
AA_PBCorescript_with_checks.sh: restructures drive, checks for bad files and derivatives, generates checksums and metadata for each file on drive
AA_LTO_checksum.sh: generates checksums for each file on LTO
WGBH_Batch1_LimitedCSV_final.csv, WGBH-Batch2-140211.csv, WGBH_batch3.csv, WGBH-Batch4-LimitedCSV.csv: GUID mapping documents
AA_LTO_checksum_second_tape.sh: creates a second checksum list for overflow files
batch_qt_proofsheet.sh: creates QT_proofs for each file
proof_check.sh: QC to identify files incompatible with QT_proofs
aapb_MD5_total.csv: list of all files transferred, with checksums
corrupted_files.csv: list of files that did not pass MD5 checksum validation
derivatives.csv: list of derivative files to be removed from inclusion in the repository
md5_original_values.csv: list of all documented MD5s from before files went into Artesia DAM
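The MD5 validation implied by these documents can be sketched as a comparison between the checksums of the transferred files and the previously documented values. This is a hedged Python sketch, not the actual WGBH script: the function name `find_corrupted` is hypothetical, and representing each list as a filename-to-MD5 dictionary is an assumption, since the real column layout of the CSVs is not shown here.

```python
def find_corrupted(transferred, original):
    """Compare two {filename: md5} mappings and return a list of
    (filename, expected_md5, actual_md5) for every mismatch.

    Assumption: files with no documented original MD5 cannot be
    validated this way, so they are skipped rather than flagged.
    """
    corrupted = []
    for name, actual in sorted(transferred.items()):
        expected = original.get(name)
        if expected is None:
            continue  # no documented value to validate against
        if expected.lower() != actual.lower():
            corrupted.append((name, expected, actual))
    return corrupted
```

In this sketch the returned rows play the role of the corrupted-files list, while files missing from the original list would still need another form of QC, such as the proof sheets.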
QT_Proofsheets
Probably OK!
NOT OK
SHARE DRIVE
HARD DRIVE
LTO