Challenges, Workflows, and Insights in the Collaboration to Preserve America's Public Media
DESCRIPTION
WGBH Media Library and Archives Director Karen Cariani and American Archive of Public Broadcasting Project Manager Casey Davis gave this presentation at the New England Archivists 2014 Fall Symposium. Karen and Casey discussed managing and preserving digital video; Project Hydra; metadata for audiovisual materials; and collaboration with other institutions through the lens of WGBH Media Library and Archives projects including the American Archive of Public Broadcasting and the NEH-funded HydraDAM project.
TRANSCRIPT
Karen Cariani, Director, WGBH Media Library and Archives
Casey E. Davis, AAPB Project Manager
CHALLENGES, WORKFLOWS AND INSIGHTS
IN THE COLLABORATION TO PRESERVE AMERICA'S PUBLIC MEDIA
WHO WE ARE: WGBH MLA
WHO WE ARE: AAPB
...and more than 120 public radio and television stations and archives nationwide
Social media allows anyone to become a video publisher and broadcaster
100 hours of video uploaded to YouTube every minute
60:1 – 80:1 shooting ratio on documentary films
How often do you create videos?
“We’re all digital archivists now.” -Sibyl Schaefer
I would add to that, more specifically... In a few years, we will also all be audiovisual archivists.
WHY ARE WE HERE TODAY?
• Manage and preserve born-digital AV materials
• Explore digital media repository solutions
• Generate metadata for digital AV materials
• Evaluate multi-institutional collaborations
GOALS AND OBJECTIVES
How many of you have A/V materials in your collection?
How many of you are collecting born digital media?
How are you storing the files?
Can you easily access them?
What are your biggest concerns?
Who is collaborating with other institutions?
A FEW QUESTIONS
MANAGING DIGITAL AV MATERIALS
• Fragility, vulnerability of digital media
• No universally accepted standards or proof of concept
• Digital obsolescence
• Complexity of digital video and audio
• Complex intellectual property issues
• Huge file sizes make storage more expensive
• Storage limitations lead to decisions to compress
• Lack of training among archivists
CHALLENGES OF MANAGING DIGITAL VIDEO
wrapper
Synchronization information
subtitles
Chapter information
Multiple video streams
Multiple audio streams
One or more codecs
AAPB DIGITIZATION OF 40K HOURS
WGBH’s 7,010 tapes that were sent to Crawford Media Services
RETURNED ON 17 LTO-6 TAPES
Addition of 5,000 hours of digitized and born digital media
Up to 59,000 files
Not to exceed 5.24 terabytes after transcoding occurred
THE AAPB BORN DIGITAL DELIVERABLE
Lack of staff resources at stations
Often no metadata for digital files
File names not consistent w/ metadata
System limitations
Bicycling hard drives
Access quality vs preservation quality
5.24 terabytes became 250+ terabytes
WE HAD SOME CHALLENGES
Create procedures for donors to submit their digital files
Provide donors with resources to inventory their collection
Get as much metadata as you can from the donor
Provide donors with instructions on file naming, drive naming, and organization
ACQUIRING DIGITAL MATERIALS
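One way to give donors an inventory resource is a small script they can run against a drive before shipping it. The sketch below is illustrative only and is not the AAPB's actual tooling; the function name `inventory_drive` and the CSV column names are assumptions made for this example.

```python
import csv
import os
from pathlib import Path


def inventory_drive(root, csv_path):
    """Walk `root` and write one CSV row per file.

    Hypothetical helper: the column names are illustrative, not an AAPB form.
    Returns the rows that were written, sorted by relative path.
    """
    root = Path(root)
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = Path(dirpath) / name
            rows.append({
                "relative_path": full.relative_to(root).as_posix(),
                "filename": name,
                "size_bytes": full.stat().st_size,
            })
    rows.sort(key=lambda row: row["relative_path"])

    fields = ["relative_path", "filename", "size_bytes"]
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

The resulting spreadsheet gives the receiving archive a file list to compare against whatever metadata the donor supplies, which is where inconsistent file names tend to surface.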
Media currently stored on LTO-4 in an HSM system
The goal: send all video files to AAPB
10,648 files X approx. 100+ GB each = 201.6 TB
Copied files over network onto 70 3TB hard drives
Success!
WGBH CONTRIBUTED FILES, TOO
...we initially had a 57% failure rate.
We learned the hard way that everyday IT operations are not good enough.
In the end:
26.4% failure rate
THINK AGAIN
Consider the NDSA levels of preservation
1. Protect your data
2. Know your data
3. Monitor your data
4. Repair your data
Consider your resources
Do what you can
NO, WE DON’T HAVE 7 COPIES
1. Protect your data
2. Know your data
3. Monitor your data
4. Repair your data
Storage & geographic location
File fixity & integrity
Information security
Metadata
File formats
Library of Congress. NDSA Levels of Preservation. http://www.digitalpreservation.gov/ndsa/activities/levels.html.
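The "File fixity & integrity" row is the one most directly tied to the copy failures described earlier. A minimal sketch of a fixity check, assuming SHA-256 checksums recorded in a simple path-to-hash manifest (the function names and the manifest shape are assumptions for illustration, not a description of WGBH's process):

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large video files never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root, manifest):
    """Return the relative paths that are missing or whose checksum changed."""
    root = Path(root)
    failures = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(rel)
    return failures
```

Building the manifest before a copy and verifying it on the destination is what lets a failed transfer show up as a number rather than go unnoticed.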
UK Data Service. Prepare and Manage Data. http://ukdataservice.ac.uk/manage-data/
Digital Curation Centre. Checklist for a Data Management Plan. http://www.dcc.ac.uk/resources/data-management-plans/checklist
Library of Congress. DPOE Training Modules. http://www.digitalpreservation.gov/education/
WITNESS. Activists Guide to Archiving Video. http://archiveguide.witness.org/
AMIA Education Committee Blog & forthcoming webinar series. https://amiaeducomm.wordpress.com/
RESOURCES
EXPLORE DIGITAL MEDIA REPOSITORY SOLUTIONS
Preservation files are large
  Uncompressed
  Slow to move around
Complicated formats
  Not just one file type
  Codecs, wrappers, frame speed, etc.
Need proxy files for viewing
  Smaller size for quick transport over network
  Need transcoding
WHAT MAKES VIDEO DIFFERENT?
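To make the "need transcoding" point concrete, here is a hedged sketch of producing a small viewing proxy with the ffmpeg command-line tool. The codec choices, the 360-pixel height, and the quality setting are illustrative assumptions, not the settings WGBH or the AAPB used, and `proxy_command` and `make_proxy` are hypothetical names.

```python
import subprocess


def proxy_command(source, proxy, height=360, crf=28):
    """Build an ffmpeg argument list for a small H.264/AAC viewing proxy.

    All settings are illustrative defaults for this sketch.
    """
    return [
        "ffmpeg",
        "-n",                          # never overwrite an existing output file
        "-i", source,                  # preservation master stays untouched
        "-vf", f"scale=-2:{height}",   # scale down, keep aspect ratio, even width
        "-c:v", "libx264", "-crf", str(crf), "-preset", "medium",
        "-c:a", "aac", "-b:a", "128k",
        "-movflags", "+faststart",     # lets playback begin before full download
        proxy,
    ]


def make_proxy(source, proxy):
    """Run the transcode; requires ffmpeg on the PATH."""
    subprocess.run(proxy_command(source, proxy), check=True)
```

The proxy is a derivative for access only; the uncompressed preservation file remains the copy of record.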
Vendor options
  License fees - expensive
  Migration to new versions on their timetable
  Professional services to access proprietary code
  Still need tech support
Open Source
  Need developers / tech support
  We all need the same basic functions
  Can add features and functionality
© 2010 WGBH 22
DAM SOFTWARE SOLUTIONS
To build a system using open source solutions and components (Hydra tech stack) for digital media preservation
How hard is it to do?
Is it implementable elsewhere?
Is it feasible for broad use?
NEH PROJECT GOALS
A system to help us manage digital files, all formats
  Born digital
  Many, many file formats and sizes
  Analog to digital files
A system potentially for preservation and access
  Internal and external access
A system that could evolve with our needs as tech changes
  Tech changes every 3-5 years
  Adapt to changing workflows
Affordable
WHAT DID WE NEED?
Open source
  We direct how it evolves
  We make sure it serves our needs
Perhaps cheaper in the long run
  Not free as in free puppy (or kitten) that needs lots of support
But part of an enduring, sustainable community
WHY DID WGBH CHOOSE HYDRA?
• A robust repository fronted by feature-rich, tailored applications and workflows (“heads”)
• One body, many heads
• Collaboratively built gems and “solution bundles” that can be leveraged or adapted and modified to suit local needs.
• A community of developers and adopters extending and enhancing the core framework
• Technical Training & Support
• Open source software
WHAT IS HYDRA?
Aim to work towards a sustainable, open source reusable framework for multipurpose, multifunction, multi-institutional repository-enabled solutions
Challenges
  Do more with less
  Do it fast enough
  Do it well
  Get back on your feet quick
The Hydra Way - Working in Community
  Shared Purpose
  Continual Engagement & Assessment
  Tangible Results
WHY HYDRA?
“If you want to go fast, go alone, if you want to go far, go together” --African Proverb
Hydra Partners and Known Users
OR = Open Repositories Conference
Repository-Powered Approach
ETDs (Theses)
Books, Articles
Images
Audio-Visual
Research Data
Maps & GIS
Documents
Digital Repository: Scalable, Robust, Shared Management and Preservation Services
Interface can be what you want it to be, simple
Manage digital objects – core functionality
  Search
  Retrieve
  Describe
  Connect
  Store
  Preserve
Build functions and features on top of basic functionality
Started with Sufia from Penn State
FUNCTIONALITY
Moving away from complicated systems
Turning to what we do best
Acknowledging that we can’t and shouldn’t try to be the end-all system
Focus on preservation efforts and hook into the workflow systems
Dealing with LOTS of files, big files, many formats, lots of stuff
Focus on how do we best handle this given our resources?
WHAT WE’RE DOING
NEW WORKFLOW
Not easy or cheap
Definitely a free puppy
  Not house broken
  Needs care and attention
But great ‘walking’ community
  Offer advice, share solutions
  Identify commonalities and work together
OPEN SOURCE TEST CASE
One Body, Many Heads…
ETDs (Theses)
Books, Articles
Images
Audio-Visual
Research Data
Maps & GIS
Documents
hydra: Scalable, Robust, Shared Management and Preservation Services
• Time consuming to give same level of detail that happens with other types of content
• Need rational balance
METADATA FOR AV MATERIALS
How many of you have an inventory of your AV assets?
For analog and digital?
Do you have full catalog records?
What metadata schema are you using to capture
  Descriptive
  Intellectual property
  Technical & Preservation metadata?
QUESTIONS
A standard way for anyone managing video or audio to speak the same language
Best practices for capturing critical descriptive, intellectual property, and technical metadata about video and audio
Under further development by the AAPB and PBCore Advisory Group
PBCORE | PBCORE.ORG
Northeast Historic Film
Pop Up Archive
University of Illinois Center for Innovation in Teaching and Learning
Smithsonian Channel
International Criminal Tribunals, The Hague
Alliance for Community Media
University of South Carolina, Moving Image Research Collections
Bay Area Video Coalition
Columbia University Libraries
California Audiovisual Preservation Project
Rock and Roll Hall of Fame
Community Media Distribution Network
MyMassTV Network
Documentary Educational Resources
Washington University Film and Television Archive
American Archive of Public Broadcasting
Dance Heritage Coalition
University of Notre Dame
Greene County Public Library
WITNESS
Glenstone Art Museum
WHO USES IT?
WGBH
Illinois Public Media
Wisconsin Public Television
Wisconsin Public Radio
WYSO
WNYC-FM
WNET
Louisiana Public Broadcasting
Pacifica Radio Archives
KQED
SCETV
CUNY-TV
KUHF
Howard University Television
Database companies/orgs have PBCore profiles including:
  Drupal
  CollectiveAccess
  Omeka
  Islandora
And many video and audio digitization vendors...
WHO USES IT?
Local databases (Filemaker, Access, etc.)
DAM systems
Ready-made solutions:
  Drupal
  CONTENTdm
  CollectiveAccess
  Omeka
Spreadsheets
FIRST THINGS FIRST: HOW TO STORE DATA
BEFORE WE GO ANY FURTHER
Asset / Intellectual Work
Instantiations / Instances
4 content classes
  Intellectual Content
  Intellectual Property
  Technical
  Extensions
82 total elements
30 attributes
Suggested controlled vocabularies
STRUCTURE OF PBCORE
Minimal fields you need to capture
Identifier
  asset level & instantiation level
Source of the identifier
Title
  Formal or devised
Type of title
Description
Location
  Room, shelf, box, file path, hard drive ID, etc.
FINDING & CREATING THE METADATA
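The minimal fields listed above map onto a handful of PBCore elements. The sketch below builds such a record with Python's standard library; the element and attribute names follow the PBCore 2.0 schema as I understand it, the function name `minimal_pbcore` and all sample values are hypothetical, and the output has not been run through the PBCore validator.

```python
import xml.etree.ElementTree as ET

# Namespace URI used by PBCore 2.0 documents.
PBCORE_NS = "http://www.pbcore.org/PBCore/PBCoreNamespace.html"


def minimal_pbcore(asset_id, id_source, title, title_type,
                   description, instantiation_id, location):
    """Return a minimal PBCore description document as an XML string."""
    ET.register_namespace("pbcore", PBCORE_NS)

    def add(parent, tag, text=None, **attrs):
        child = ET.SubElement(parent, f"{{{PBCORE_NS}}}{tag}", attrs)
        child.text = text
        return child

    doc = ET.Element(f"{{{PBCORE_NS}}}pbcoreDescriptionDocument")
    # Asset level: identifier + source, title + type, description.
    add(doc, "pbcoreIdentifier", asset_id, source=id_source)
    add(doc, "pbcoreTitle", title, titleType=title_type)
    add(doc, "pbcoreDescription", description)
    # Instantiation level: identifier + source, location.
    inst = add(doc, "pbcoreInstantiation")
    add(inst, "instantiationIdentifier", instantiation_id, source=id_source)
    add(inst, "instantiationLocation", location)
    return ET.tostring(doc, encoding="unicode")


# Hypothetical sample values, for illustration only.
print(minimal_pbcore("asset-0001", "Example Station", "Evening News",
                     "Program", "Local news broadcast.",
                     "asset-0001.mov", "Drive 12, /news/1998/"))
```

Keeping the asset-level identifier separate from the instantiation-level identifier is what allows one intellectual work to have a tape, a preservation file, and a proxy described under the same record.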
SO YOU’VE GOT THIS TAPE. NOW WHAT?
WHAT ABOUT THIS FILE?
DIGITAL MEDIA IDENTIFIER & FORMAT
Filename = Instantiation ID
From extension you can get Digital Format
http://en.wikipedia.org/wiki/Internet_media_type
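As a rough sketch of that rule, the filename can serve as the instantiation identifier and the extension can be looked up as an Internet media type. The dictionary keys below borrow PBCore element names for readability; the function name `identify` and the sample path are assumptions for this example.

```python
import mimetypes
from pathlib import Path


def identify(file_path):
    """Derive an identifier and a digital format from a file path alone."""
    name = Path(file_path).name
    # guess_type looks only at the extension; it never opens the file.
    media_type, _encoding = mimetypes.guess_type(name)
    return {
        "instantiationIdentifier": name,
        "instantiationDigital": media_type,  # None if the extension is unknown
    }


print(identify("/drive01/news/sample_0001.mp4"))
```

An extension only states what a file claims to be, so this is a starting point; the tools named under "Automate the process" read the file itself.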
DIGITAL MEDIA: ADDITIONAL METADATA
and more technical metadata...
AUTOMATE THE PROCESS
Automation
● removes human error
● less staff time
● consistency
Tools:
ffprobe
mediainfo
ExifTool
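A minimal sketch of automating technical metadata capture with ffprobe, assuming it is installed and on the PATH. The helper names `run_ffprobe` and `summarize` and the summary field names are hypothetical; the ffprobe options request JSON output describing the container and its streams.

```python
import json
import subprocess


def run_ffprobe(path):
    """Ask ffprobe for container and stream details as parsed JSON."""
    result = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_format", "-show_streams", path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def summarize(probe):
    """Reduce ffprobe's JSON to a few fields worth recording per file."""
    fmt = probe.get("format", {})
    streams = probe.get("streams", [])
    return {
        "container": fmt.get("format_name"),
        # ffprobe reports duration and size as strings, so convert them.
        "duration_seconds": float(fmt["duration"]) if "duration" in fmt else None,
        "size_bytes": int(fmt["size"]) if "size" in fmt else None,
        "video_codecs": [s.get("codec_name") for s in streams
                         if s.get("codec_type") == "video"],
        "audio_codecs": [s.get("codec_name") for s in streams
                         if s.get("codec_type") == "audio"],
    }
```

Calling `summarize(run_ffprobe(path))` for every file on a drive produces the same fields the same way each time, which is the consistency argument made above.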
BUT I USE OTHER STANDARDS...
PBCORE IS FLEXIBLE & EXTENSIBLE
As an XML schema, PBCore can be implemented along with other standards
  Within a METS wrapper
  With PREMIS as a sidecar file or as a <pbcoreExtension>
  To provide more granular item-level description along with collection-level description in EAD
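As one illustration of the METS option, a PBCore document can be embedded in a METS descriptive metadata section. This is a sketch under stated assumptions: it declares the metadata type as `OTHER` with `OTHERMDTYPE="PBCORE"`, uses a hypothetical function name and ID value, and produces only the `dmdSec` portion (a complete METS file also needs a structural map).

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"


def wrap_in_mets(pbcore_xml, dmd_id="dmd-pbcore-1"):
    """Embed a PBCore XML string inside mets:dmdSec/mdWrap/xmlData."""
    ET.register_namespace("mets", METS_NS)
    mets = ET.Element(f"{{{METS_NS}}}mets")
    dmd = ET.SubElement(mets, f"{{{METS_NS}}}dmdSec", {"ID": dmd_id})
    wrap = ET.SubElement(dmd, f"{{{METS_NS}}}mdWrap",
                         {"MDTYPE": "OTHER", "OTHERMDTYPE": "PBCORE"})
    xml_data = ET.SubElement(wrap, f"{{{METS_NS}}}xmlData")
    # The PBCore record is carried unchanged as a child of xmlData.
    xml_data.append(ET.fromstring(pbcore_xml))
    return ET.tostring(mets, encoding="unicode")
```

The PBCore record itself is untouched by the wrapper, so the same description can travel on its own or inside a preservation package.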
<PBCORETITLE>EXAMPLES</PBCORETITLE>
http://www.pbcore.org/documentation/
• Simple instantiation record
• Simple description document
• PBCore Collection
• PBCore in a METS record
• PBCore in a digital preservation setting
• Using PBCore for asset management
• Using PBCore for archival description
http://www.pbcore.org
PBCore Webinar recording: http://www.vimeo.com/aapb/pbcore
PBCore Validator: http://infinite-spire-2035.herokuapp.com/
Forthcoming PBForm & updated Filemaker template
Twitter: @therealpbcore
RESOURCES
MULTI-INSTITUTIONAL COLLABORATIONS
Content projects: Vietnam, Boston Local News, China?
Content inventory project
Hydra community – open source project
AAPB – participating organizations
Digital Commonwealth
COLLABORATIONS
Planning time
Creating policy – but be flexible
Deliverables for collaboration
Org chart for decision making – who has the final word
  Who is deeply involved, who is peripheral
Example: Inventory project:
  Data gathering
  Tools – PBCore validator
  Forms – minimum fields
  Hand holding – call us
HOW TO GET WHAT YOU NEED
Important to build rapport with partners
Relati onships: I love you, I need you, but I want you to change
Learn about hierarchy at partner institution so you can understand challenges and potential obstacles.
Manage expectations
FACE TO FACE
In workflow
In budgeting
In needs
In timeframes
ACCEPT DIFFERENCES
Don’t be afraid to make sure your needs or your institution’s needs are being met
In a large collaboration, most likely you are not the only one to have those thoughts
SPEAK UP
Karen Cariani
karen_cariani [at] wgbh [dot] org
@kcariani
Casey E. Davis
casey_davis [at] wgbh [dot] org
@caseyedavis1
www.americanarchive.org
www.pbcore.org
Openvault.wgbh.org
THANK YOU!