Challenges, Workflows, and Insights in the Collaboration to Preserve America's Public Media

Karen Cariani, Director, WGBH Media Library and Archives; Casey E. Davis, AAPB Project Manager

Upload: wgbh-media-library-and-archives, posted 14-Jun-2015


DESCRIPTION

WGBH Media Library and Archives Director Karen Cariani and American Archive of Public Broadcasting Project Manager Casey Davis gave this presentation at the New England Archivists 2014 Fall Symposium. Karen and Casey discussed managing and preserving digital video; Project Hydra; metadata for audiovisual materials; and collaboration with other institutions through the lens of WGBH Media Library and Archives projects, including the American Archive of Public Broadcasting and the NEH-funded HydraDAM project.

TRANSCRIPT

Page 1: Challenges, Workflows, and Insights in the Collaboration to Preserve America's Public Media

Karen Cariani, Director, WGBH Media Library and Archives

Casey E. Davis, AAPB Project Manager

CHALLENGES, WORKFLOWS AND INSIGHTS

IN THE COLLABORATION TO PRESERVE AMERICA'S PUBLIC MEDIA

Page 2

WHO WE ARE: WGBH MLA

Page 3

WHO WE ARE: AAPB

...and more than 120 public radio and television stations and archives nationwide

Page 4

Social media allows anyone to become a video publisher and broadcaster

100 hours of video uploaded to YouTube every minute
60:1 – 80:1 shooting ratio on documentary films
How often do you create videos?

“We’re all digital archivists now.” - Sibyl Schaefer

I would add to that, more specifically: in a few years, we will all be audiovisual archivists, too.

WHY ARE WE HERE TODAY?

Page 5

• Manage and preserve born-digital AV materials
• Explore digital media repository solutions
• Generate metadata for digital AV materials
• Evaluate multi-institutional collaborations

GOALS AND OBJECTIVES

Page 6

How many of you have A/V materials in your collection?
How many of you are collecting born-digital media?
How are you storing the files?
Can you easily access them?
What are your biggest concerns?
Who is collaborating with other institutions?

A FEW QUESTIONS

Page 7

MANAGING DIGITAL AV MATERIALS

Page 8

• Fragility and vulnerability of digital media
• No universally accepted standards or proof of concept
• Digital obsolescence
• Complexity of digital video and audio
• Complex intellectual property issues
• Huge file sizes make storage more expensive
• Storage limitations lead to decisions to compress
• Lack of training among archivists

CHALLENGES OF MANAGING DIGITAL VIDEO

[Diagram: anatomy of a digital video file. A wrapper contains synchronization information, subtitles, chapter information, multiple video streams, and multiple audio streams, encoded with one or more codecs.]
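To see these layers in a real file, FFmpeg's ffprobe can report every stream inside a wrapper as JSON, for example with `ffprobe -v quiet -print_format json -show_streams input.mov`. The sketch below assumes that JSON has already been captured as a string. It groups the streams by type and lists the codec each one uses. The function name `summarize_streams` and the sample data are hypothetical.

```python
import json
from collections import defaultdict


def summarize_streams(ffprobe_json: str) -> dict:
    """Group the streams reported by `ffprobe -show_streams` by codec_type.

    Returns e.g. {"video": ["prores"], "audio": ["pcm_s24le", "pcm_s24le"]}.
    """
    report = json.loads(ffprobe_json)
    by_type = defaultdict(list)
    for stream in report.get("streams", []):
        # codec_type is typically "video", "audio", "subtitle", or "data"
        kind = stream.get("codec_type", "unknown")
        by_type[kind].append(stream.get("codec_name", "unknown"))
    return dict(by_type)


# Hypothetical ffprobe output for one wrapper holding one video and two audio streams
sample = json.dumps({
    "streams": [
        {"index": 0, "codec_type": "video", "codec_name": "prores"},
        {"index": 1, "codec_type": "audio", "codec_name": "pcm_s24le"},
        {"index": 2, "codec_type": "audio", "codec_name": "pcm_s24le"},
    ]
})
print(summarize_streams(sample))
```

A single .mov or .mxf file can therefore carry several independently encoded streams. This is one reason why "what format is this file?" rarely has a one-word answer.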

Page 9

AAPB DIGITIZATION OF 40K HOURS

WGBH’s 7,010 tapes that were sent to Crawford Media Services

Page 10

RETURNED ON 17 LTO-6 TAPES

Page 11

Addition of 5,000 hours of digitized and born-digital media
Up to 59,000 files
Not to exceed 5.24 terabytes after transcoding

THE AAPB BORN DIGITAL DELIVERABLE

Page 12

Lack of staff resources at stations
Often no metadata for digital files
File names not consistent with metadata
System limitations
Bicycling (physically shipping) hard drives
Access quality vs. preservation quality
5.24 terabytes became 250+ terabytes

WE HAD SOME CHALLENGES

Page 13

Create procedures for donors to submit their digital files
Provide donors with resources to inventory their collection

Get as much metadata as you can from the donor

Provide donors with instructions on file naming, drive naming, and organization

ACQUIRING DIGITAL MATERIALS
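File-naming instructions are easier to enforce if each delivery can be checked automatically when the drives arrive. The Python sketch below assumes a hypothetical convention, `<station code>_<six-digit item number>.<extension>`. The pattern, the allowed extensions, and the function name `check_delivery` are illustrative, not an AAPB requirement. The check flags names that break the convention and filenames that appear more than once across drives, because a duplicated filename would collide if it is used as an identifier.

```python
import re
from pathlib import PurePath

# Hypothetical convention: <station code>_<item number>.<extension>, e.g. "wxyz_000123.mov"
NAME_PATTERN = re.compile(r"[a-z]{3,5}_\d{6}\.(mov|mp4|mxf|wav)")


def check_delivery(paths):
    """Return (bad_names, duplicate_names) for a list of delivered file paths."""
    bad_names = []
    duplicate_names = []
    seen = set()
    for p in paths:
        name = PurePath(p).name  # ignore the drive/folder part of the path
        if not NAME_PATTERN.fullmatch(name):
            bad_names.append(name)
            continue
        if name in seen:
            duplicate_names.append(name)
        seen.add(name)
    return bad_names, duplicate_names


bad, dups = check_delivery([
    "drive01/wxyz_000123.mov",
    "drive02/wxyz_000123.mov",   # same filename on a second drive
    "drive01/Final Cut v2.mov",  # does not follow the convention
])
print(bad, dups)
```

Running such a check before ingest turns the donor instructions into something verifiable rather than a request.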

Page 14

Media currently stored on LTO-4 in an HSM (hierarchical storage management) system
The goal: send all video files to the AAPB
10,648 files × approx. 100+ GB each = 201.6 TB
Copied files over the network onto seventy 3 TB hard drives

Success!

WGBH CONTRIBUTED FILES, TOO
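A quick capacity check on the slide's figures, taking the 201.6 TB total and the seventy 3 TB drives as given (decimal units, 1 TB = 1,000 GB):

```latex
\frac{201.6\ \text{TB}}{10{,}648\ \text{files}} \approx 0.0189\ \text{TB} \approx 18.9\ \text{GB per file (average)}
\qquad
70 \times 3\ \text{TB} = 210\ \text{TB} \;\ge\; 201.6\ \text{TB}
```

The drives leave about 8.4 TB of headroom before filesystem overhead. The stated total implies an average well below the 100+ GB per-file figure quoted on the slide.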

Page 15

...we initially had a 57% failure rate.

We learned the hard way that everyday IT operations are not good enough.

In the end:

26.4% failure rate

THINK AGAIN
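One widely used safeguard that ordinary file copying skips is to checksum each file before and after the copy and to refuse any copy whose checksums differ. The sketch below is a minimal, hypothetical version of that idea. MD5 is used because it is common in AV archives, but `hashlib.sha256` would work the same way. The function names are illustrative. Reading the destination back immediately may be served from the operating system's cache, so a later re-check from the drive itself is stronger evidence.

```python
import hashlib
import shutil


def md5_of(path, chunk_size=8 * 1024 * 1024):
    """Compute an MD5 checksum by streaming the file in chunks (video files are too big to read at once)."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src, dst):
    """Copy src to dst, then compare checksums of both.

    Raises IOError on a mismatch; returns the checksum on success so it can
    be stored as fixity metadata alongside the file.
    """
    before = md5_of(src)
    shutil.copy2(src, dst)  # copies data plus timestamps/permissions
    after = md5_of(dst)
    if before != after:
        raise IOError(f"Fixity mismatch copying {src} -> {dst}: {before} != {after}")
    return before
```

Recording the returned checksum at copy time also gives later audits something to compare against.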

Page 16

Consider the NDSA Levels of Preservation:
1. Protect your data
2. Know your data
3. Monitor your data
4. Repair your data

Consider your resources
Do what you can

NO, WE DON’T HAVE 7 COPIES

Page 17

Levels: 1. Protect your data; 2. Know your data; 3. Monitor your data; 4. Repair your data

Functional areas: storage & geographic location; file fixity & integrity; information security; metadata; file formats

Source: Library of Congress. NDSA Levels of Preservation. http://www.digitalpreservation.gov/ndsa/activities/levels.html
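The "file fixity & integrity" area depends on having stored checksums to check against. One common, tool-agnostic approach is to write a checksum manifest when files are ingested and re-verify it on a schedule. The sketch below writes `sha256sum`-style lines (`<hex digest>  <relative path>`) and later reports files that changed or disappeared. The function names are hypothetical. The manifest should be stored outside the audited directory so that it does not hash itself.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=8 * 1024 * 1024):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(root, manifest):
    """Record '<checksum>  <relative path>' for every file under root."""
    root = Path(root)
    with open(manifest, "w", encoding="utf-8") as out:
        for path in sorted(p for p in root.rglob("*") if p.is_file()):
            out.write(f"{sha256_of(path)}  {path.relative_to(root).as_posix()}\n")


def audit(root, manifest):
    """Re-hash every file listed in the manifest; report changed and missing files."""
    root = Path(root)
    changed, missing = [], []
    with open(manifest, encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue  # tolerate blank lines
            expected, rel = line.rstrip("\n").split("  ", 1)
            path = root / rel
            if not path.is_file():
                missing.append(rel)
            elif sha256_of(path) != expected:
                changed.append(rel)
    return {"changed": changed, "missing": missing}
```

A "changed" or "missing" result is the trigger for the "repair your data" step, such as restoring from another copy.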

Page 18

UK Data Service. Prepare and Manage Data. http://ukdataservice.ac.uk/manage-data/

Digital Curation Centre. Checklist for a Data Management Plan. http://www.dcc.ac.uk/resources/data-management-plans/checklist

Library of Congress. DPOE Training Modules. http://www.digitalpreservation.gov/education/

WITNESS. Activists' Guide to Archiving Video. http://archiveguide.witness.org/

AMIA Education Committee Blog & forthcoming webinar series. https://amiaeducomm.wordpress.com/

RESOURCES

Page 19

EXPLORE DIGITAL MEDIA REPOSITORY SOLUTIONS

Page 20

Preservation files are large: uncompressed, slow to move around

Complicated formats: not just one file type; codecs, wrappers, frame rate, etc.

Need proxy files for viewing: smaller size for quick transport over the network; requires transcoding


WHAT MAKES VIDEO DIFFERENT?
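Proxy files are usually made by transcoding the preservation master with a tool such as FFmpeg. The sketch below builds one plausible ffmpeg command for a small H.264/AAC MP4 viewing copy. The height, quality (CRF), and bitrate values are assumptions to tune locally, and `proxy_command` and `make_proxy` are hypothetical names. The ffmpeg build must include libx264.

```python
import subprocess


def proxy_command(src, dst, height=480, crf=23):
    """Build an ffmpeg command for a small H.264/AAC MP4 viewing copy.

    scale=-2:<height> keeps the aspect ratio and forces an even width,
    which H.264 in yuv420p requires.
    """
    return [
        "ffmpeg", "-n",                     # -n: never overwrite an existing output file
        "-i", str(src),
        "-c:v", "libx264", "-preset", "medium", "-crf", str(crf),
        "-vf", f"scale=-2:{height}",
        "-pix_fmt", "yuv420p",              # widely playable pixel format
        "-c:a", "aac", "-b:a", "128k",
        "-movflags", "+faststart",          # put the index first for quicker web playback
        str(dst),
    ]


def make_proxy(src, dst, **kwargs):
    """Run ffmpeg (it only reads src); raises CalledProcessError if transcoding fails."""
    subprocess.run(proxy_command(src, dst, **kwargs), check=True)
```

Keeping command construction separate from execution makes the settings easy to review and record alongside the proxy.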

Page 21
Page 22

Vendor options: license fees (expensive); migration to new versions on the vendor's timetable; professional services needed to access proprietary code; still need tech support

Open source: need developers/tech support; we all need the same basic functions; can add features and functionality

© 2010 WGBH 22

DAM SOFTWARE SOLUTIONS

Page 23

To build a system for digital media preservation using open source solutions and components (the Hydra tech stack)

How hard is it to do?

Is it implementable elsewhere?

Is it feasible for broad use?


NEH PROJECT GOALS

Page 24

• A system to help us manage digital files of all formats: born digital (many, many file formats and sizes) and analog-to-digital files
• A system potentially for preservation and access, both internal and external
• A system that could evolve with our needs as technology changes: tech changes every 3-5 years; adapt to changing workflows
• Affordable


WHAT DID WE NEED?

Page 25

Open source: we direct how it evolves; we make sure it serves our needs

Perhaps cheaper in the long run, but not free: it is free only in the sense of a free puppy (or kitten) that needs lots of support

But part of an enduring, sustainable community


WHY DID WGBH CHOOSE HYDRA?

Page 26

• A robust repository fronted by feature-rich, tailored applications and workflows (“heads”)

• One body, many heads

• Collaboratively built gems and “solution bundles” that can be leveraged or adapted and modified to suit local needs

• A community of developers and adopters extending and enhancing the core framework

• Technical training & support

• Open source software


WHAT IS HYDRA?

Page 27

Aim to work toward a sustainable, open source, reusable framework for multipurpose, multifunction, multi-institutional repository-enabled solutions

Challenges: do more with less; do it fast enough; do it well; get back on your feet quickly

The Hydra Way, working in community: shared purpose; continual engagement & assessment; tangible results


WHY HYDRA?

“If you want to go fast, go alone; if you want to go far, go together.” - African proverb

Page 28

Hydra Partners and Known Users

OR = Open Repositories Conference

Page 29

[Diagram: Repository-Powered Approach. Applications for ETDs (theses), books and articles, images, audio-visual materials, research data, maps & GIS, and documents sit on a single digital repository that provides scalable, robust, shared management and preservation services.]

Page 30

Interface can be whatever you want it to be, even simple
Manage digital objects (core functionality): search, retrieve, describe, connect, store, preserve

Build functions and features on top of basic functionality
Started with Sufia from Penn State


FUNCTIONALITY

Page 31

Page 32

Page 33

Moving away from complicated systems

Turning to what we do best

Acknowledging that we can’t and shouldn’t try to be the be-all and end-all system

Focus on preservation efforts and hook into the workflow systems

Dealing with LOTS of files, big files, many formats, lots of stuff

Focus: how do we best handle this, given our resources?

WHAT WE’RE DOING

Page 34

NEW WORKFLOW

Page 35

Not easy or cheap

Definitely a free puppy: not housebroken, needs care and attention

But a great ‘walking’ community: offers advice, shares solutions, identifies commonalities, and works together

OPEN SOURCE TEST CASE

Page 36

[Diagram: One Body, Many Heads… Application heads for ETDs (theses), books and articles, images, audio-visual materials, research data, maps & GIS, and documents sit on a single Hydra body providing scalable, robust, shared management and preservation services.]

Page 37

• Time-consuming to give AV materials the same level of detail as other types of content

• Need a rational balance

METADATA FOR AV MATERIALS

Page 38

How many of you have an inventory of your AV assets?
For analog and digital?
Do you have full catalog records?
What metadata schema are you using to capture descriptive, intellectual property, and technical & preservation metadata?

QUESTIONS

Page 39

A standard way for anyone managing video or audio to speak the same language

Best practices for capturing critical descriptive, intellectual property, and technical metadata about video and audio

Under further development by the AAPB and PBCore Advisory Group

PBCORE | PBCORE.ORG

Page 40

Northeast Historic Film
Pop Up Archive
University of Illinois Center for Innovation in Teaching and Learning
Smithsonian Channel
International Criminal Tribunals, The Hague
Alliance for Community Media
University of South Carolina, Moving Image Research Collections
Bay Area Video Coalition
Columbia University Libraries
California Audiovisual Preservation Project
Rock and Roll Hall of Fame
Community Media Distribution Network
MyMassTV Network
Documentary Educational Resources
Washington University Film and Television Archive
American Archive of Public Broadcasting
Dance Heritage Coalition
University of Notre Dame
Greene County Public Library
WITNESS
Glenstone Art Museum

WHO USES IT?

Page 41

WGBH, Illinois Public Media, Wisconsin Public Television, Wisconsin Public Radio, WYSO, WNYC-FM, WNET, Louisiana Public Broadcasting, Pacifica Radio Archives, KQED, SCETV, CUNY-TV, KUHF, Howard University Television

Database companies/orgs that have PBCore profiles include: Drupal, CollectiveAccess, Omeka, Islandora

And many video and audio digitization vendors...

WHO USES IT?

Page 42

Local databases (FileMaker, Access, etc.)
DAM systems
Ready-made solutions: Drupal, CONTENTdm, CollectiveAccess, Omeka
Spreadsheets

FIRST THINGS FIRST: HOW TO STORE DATA

Page 43

BEFORE WE GO ANY FURTHER

Asset / Intellectual Work

Instantiations / Instances
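The asset/instantiation split is a one-to-many relationship. One intellectual work, such as a program, can exist as several carriers or files. A minimal way to model it, assuming hypothetical class and field names rather than PBCore's own element names, is:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Instantiation:
    """One physical or digital manifestation of an asset (a tape, a file, ...)."""
    identifier: str
    format: str    # physical format (e.g. a tape format) or a MIME type for files
    location: str  # room, shelf, box, file path, hard drive ID, ...


@dataclass
class Asset:
    """The intellectual work itself, independent of any carrier."""
    identifier: str
    title: str
    instantiations: List[Instantiation] = field(default_factory=list)


# Hypothetical example: one program that exists as an original tape and a digitized file
program = Asset("asset-0001", "Example Program")
program.instantiations.append(Instantiation("tape-0001", "Betacam SP", "Shelf 12, Box 4"))
program.instantiations.append(
    Instantiation("asset-0001.mov", "video/quicktime", "drive07/asset-0001.mov")
)
```

Descriptive metadata is recorded once on the asset. Technical details and locations belong to each instantiation.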

Page 44

4 content classes: Intellectual Content, Intellectual Property, Technical, Extensions

82 total elements
30 attributes
Suggested controlled vocabularies

STRUCTURE OF PBCORE

Page 45

Minimal fields you need to capture:
• Identifier (asset level & instantiation level)
• Source of the identifier
• Title (formal or devised)
• Type of title
• Description
• Location (room, shelf, box, file path, hard drive ID, etc.)

FINDING & CREATING THE METADATA
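These minimal fields can be written out as a small PBCore XML record. The sketch below uses the PBCore 2.0 namespace and element names as we understand them (`pbcoreIdentifier`, `pbcoreTitle`, `pbcoreDescription`, `pbcoreInstantiation`, `instantiationIdentifier`, `instantiationDigital`, `instantiationLocation`). The function name and sample values are hypothetical. Depending on the schema version, a record may need further elements to validate, so check it with a PBCore validator before relying on it.

```python
import xml.etree.ElementTree as ET

PBCORE_NS = "http://www.pbcore.org/PBCore/PBCoreNamespace.html"
ET.register_namespace("", PBCORE_NS)  # serialize PBCore elements without a prefix


def pb(tag):
    """Qualify a tag name with the PBCore namespace."""
    return f"{{{PBCORE_NS}}}{tag}"


def minimal_record(asset_id, id_source, title, title_type, description,
                   inst_id, inst_id_source, location, digital_format=None):
    """Build a minimal PBCore description document with one instantiation."""
    doc = ET.Element(pb("pbcoreDescriptionDocument"))
    ET.SubElement(doc, pb("pbcoreIdentifier"), source=id_source).text = asset_id
    ET.SubElement(doc, pb("pbcoreTitle"), titleType=title_type).text = title
    ET.SubElement(doc, pb("pbcoreDescription")).text = description
    inst = ET.SubElement(doc, pb("pbcoreInstantiation"))
    ET.SubElement(inst, pb("instantiationIdentifier"), source=inst_id_source).text = inst_id
    if digital_format:  # e.g. a MIME type such as "video/quicktime"
        ET.SubElement(inst, pb("instantiationDigital")).text = digital_format
    ET.SubElement(inst, pb("instantiationLocation")).text = location
    return ET.tostring(doc, encoding="unicode")


print(minimal_record("asset-0001", "Local ID", "Example Program", "Program",
                     "A devised description.", "asset-0001.mov", "Filename",
                     "drive07/asset-0001.mov", "video/quicktime"))
```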

Page 46
Page 47

SO YOU’VE GOT THIS TAPE. NOW WHAT?

Page 48

WHAT ABOUT THIS FILE?

Page 49

DIGITAL MEDIA IDENTIFIER & FORMAT

Filename = Instantiation ID

From the extension you can get the digital format (its Internet media type)

http://en.wikipedia.org/wiki/Internet_media_type
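Python's standard `mimetypes` module can turn a file extension into an Internet media type. The sketch below follows the slide's convention that the full filename is the instantiation ID. It uses a `MimeTypes()` instance so that the results come from Python's built-in extension table rather than the host's system files. The function name `identify` is hypothetical. The guess is based only on the extension, and it returns `None` for extensions the table does not know, which can include some broadcast wrappers.

```python
import mimetypes
from pathlib import PurePath

# Built-in extension table only, so results don't vary with the host's /etc/mime.types
_MIME = mimetypes.MimeTypes()


def identify(filename):
    """Return (instantiation ID, guessed MIME type) for a file path.

    The instantiation ID is the full filename; the MIME type is only a guess
    from the extension and may be None.
    """
    name = PurePath(filename).name
    mime, _encoding = _MIME.guess_type(name)
    return name, mime


print(identify("drive07/asset-0001.mov"))
```

Because the guess trusts the extension, a tool that reads the file itself (see the automation slide) should confirm it.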

Page 50

DIGITAL MEDIA: ADDITIONAL METADATA

Page 51

and more technical metadata...

Page 52

AUTOMATE THE PROCESS

Automation:
● removes human error
● less staff time
● consistency

Tools: ffprobe, MediaInfo, ExifTool
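As an example of this kind of automation, ffprobe's JSON report (`-show_format -show_streams`) can be mapped onto PBCore-style instantiation fields. In the sketch below, the field mapping and function names are assumptions for illustration, not an official crosswalk. The `probe` function requires ffprobe on the PATH. The mapping function works on any already-parsed report.

```python
import json
import subprocess


def probe(path):
    """Run ffprobe and return its parsed JSON report (requires ffprobe on PATH)."""
    out = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_format", "-show_streams", str(path)],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def to_hms(seconds):
    """Format a duration in seconds as HH:MM:SS (fractional seconds dropped)."""
    total = int(float(seconds))
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def instantiation_fields(report):
    """Map part of an ffprobe report onto PBCore-style instantiation fields."""
    fmt = report.get("format", {})
    fields = {
        "instantiationFileSize": fmt.get("size"),      # bytes, as reported by ffprobe
        "instantiationDuration": to_hms(fmt["duration"]) if "duration" in fmt else None,
        "instantiationDataRate": fmt.get("bit_rate"),  # bits per second
    }
    # Take frame dimensions from the first video stream, if there is one
    video = next((s for s in report.get("streams", []) if s.get("codec_type") == "video"), None)
    if video:
        fields["instantiationDimensions"] = f"{video.get('width')}x{video.get('height')}"
    return fields
```

Usage: `instantiation_fields(probe("asset-0001.mov"))`. Running the same code over every file in a delivery gives consistent technical metadata without retyping it by hand.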

Page 53

BUT I USE OTHER STANDARDS...

Page 54

PBCORE IS FLEXIBLE & EXTENSIBLE

As an XML schema, PBCore can be implemented along with other standards:
• within a METS wrapper
• with PREMIS, as a sidecar file or as a <pbcoreExtension>
• to provide more granular item-level description alongside collection-level description in EAD
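As an illustration of the METS option, a PBCore record can be embedded in a METS descriptive metadata section with `mdWrap`. The sketch below assumes `MDTYPE="OTHER"` with `OTHERMDTYPE="PBCore"`, since to our knowledge METS has no dedicated PBCore type. It adds the minimal `structMap` that METS requires. The IDs and function names are hypothetical. The PBCore documentation's own "PBCore in a METS record" example is the better reference for production use.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
PBCORE_NS = "http://www.pbcore.org/PBCore/PBCoreNamespace.html"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("pbcore", PBCORE_NS)


def m(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def wrap_in_mets(pbcore_record):
    """Embed a PBCore element as descriptive metadata in a minimal METS document."""
    mets = ET.Element(m("mets"))
    dmd = ET.SubElement(mets, m("dmdSec"), ID="dmd1")
    # Assumed: PBCore is declared via MDTYPE="OTHER" plus OTHERMDTYPE
    wrap = ET.SubElement(dmd, m("mdWrap"), MDTYPE="OTHER", OTHERMDTYPE="PBCore")
    ET.SubElement(wrap, m("xmlData")).append(pbcore_record)
    # METS requires a structMap; this one points a single div at the description
    struct = ET.SubElement(mets, m("structMap"))
    ET.SubElement(struct, m("div"), DMDID="dmd1")
    return mets


record = ET.Element(f"{{{PBCORE_NS}}}pbcoreDescriptionDocument")
ET.SubElement(record, f"{{{PBCORE_NS}}}pbcoreTitle").text = "Example Program"
print(ET.tostring(wrap_in_mets(record), encoding="unicode"))
```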

Page 55

<PBCORETITLE>EXAMPLES</PBCORETITLE>

http://www.pbcore.org/documentation/

• Simple instantiation record
• Simple description document
• PBCore Collection
• PBCore in a METS record
• PBCore in a digital preservation setting
• Using PBCore for asset management
• Using PBCore for archival description

Page 56

http://www.pbcore.org

PBCore webinar recording: http://www.vimeo.com/aapb/pbcore

PBCore Validator: http://infinite-spire-2035.herokuapp.com/

Forthcoming: PBForm & updated FileMaker template

Twitter: @therealpbcore

RESOURCES

Page 57

MULTI-INSTITUTIONAL COLLABORATIONS

Page 58

Content projects: Vietnam, Boston Local News, China?

Content inventory project
Hydra community – open source project
AAPB – participating organizations
Digital Commonwealth

COLLABORATIONS

Page 59
Page 60
Page 61

Planning time
Creating policy, but be flexible
Deliverables for collaboration
Org chart for decision making: who has the final word

Who is deeply involved, who is peripheral
Example: inventory project

Data gathering

Tools: PBCore validator
Forms: minimum fields
Hand-holding: call us

HOW TO GET WHAT YOU NEED
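For a spreadsheet-based inventory, a "minimum fields" form can be backed by a simple completeness check that runs before records are sent on to a real validator such as the PBCore validator. The sketch below assumes the inventory is exported as CSV with hypothetical column names, one per minimal field listed earlier in the deck (identifier, identifier source, title, title type, description, location). `missing_fields` is an illustrative name.

```python
import csv
import io

# Hypothetical column names, one per minimal field
REQUIRED = ["identifier", "identifier_source", "title", "title_type", "description", "location"]


def missing_fields(csv_text):
    """Return {row_number: [blank required columns]} for incomplete rows.

    Row numbers count the header as row 1, matching what a spreadsheet shows.
    Raises ValueError if a required column is absent from the header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    absent = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if absent:
        raise ValueError(f"inventory is missing columns: {absent}")
    problems = {}
    for row_number, row in enumerate(reader, start=2):
        blanks = [c for c in REQUIRED if not (row.get(c) or "").strip()]
        if blanks:
            problems[row_number] = blanks
    return problems


sample = (
    "identifier,identifier_source,title,title_type,description,location\n"
    "a1,Local,Prog A,Program,Desc,Shelf 1\n"
    "a2,Local,,Program,Desc,\n"
)
print(missing_fields(sample))
```

Sending partners a short list of row numbers and blank fields is a low-effort form of the hand-holding the slide recommends.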

Page 62
Page 63

Important to build rapport with partners

Relationships: I love you, I need you, but I want you to change

Learn about the hierarchy at partner institutions so you can understand challenges and potential obstacles.

Manage expectations

FACE TO FACE

Page 64

In workflow
In budgeting
In needs
In timeframes

ACCEPT DIFFERENCES

Page 65

Don’t be afraid to make sure your needs or your institution’s needs are being met

In a large collaboration, you are most likely not the only one with those thoughts

SPEAK UP

Page 66

Karen Cariani: karen_cariani [at] wgbh [dot] org, @kcariani

Casey E. Davis: casey_davis [at] wgbh [dot] org, @caseyedavis1

www.americanarchive.org
www.pbcore.org
openvault.wgbh.org

THANK YOU!