Putting it all together for Digital Assets: Jon Morley, Beck Locey

Upload: isabella-gaines

Post on 24-Dec-2015

TRANSCRIPT

  • Slide 1
  • Putting it all together for Digital Assets. Jon Morley, Beck Locey
  • Slide 2
  • LDS Church History Library. Our collections consist of manuscripts, books, Church records, photographs, oral histories, architectural drawings, pamphlets, newspapers, periodicals, maps, microforms, and audiovisual materials. The collection continues to grow annually and is a prime resource for the study of Church history. Our collections contain approximately: 270,000 books, pamphlets, magazines, and newspapers; 240,000 collections of original, unpublished records (journals, diaries, correspondence, minutes, etc.); 3.5 million blessings for Church members; 13,000 photograph collections; 23,000 audiovisual items.
  • Slide 3
  • Digitization Objective. Patrons (globalization): to provide access to library content to 14 million Church members and the public worldwide. Remote sites for new content. Internal operations: to create an automated digital pipeline from digitizing content to patron consumption. Extending to crowdsourcing in the future.
  • Slide 4
  • Agenda. The first half of this presentation will focus on what we want to accomplish with an automated digital pipeline. The second half will focus on how we built the digital pipeline.
  • Slide 5
  • DIGITAL PIPELINE What we want to accomplish
  • Slide 6
  • Digital Content. Diagram labels: Aleph (Master Record); Primo (Discovery); Rosetta; EAD Tool (Encoded Archival Description); Church History Library; Physical Assets; 2010; 2011.
  • Slide 7
  • Rosetta Dual Role. Two instances of Rosetta: 1. DRPS: Digital Records Preservation System (dark archive). 2. DCMS: Digital Content Management System (public display).
  • Slide 8
  • Digital Pipeline. Diagram labels: Rosetta; EAD Tool (Encoded Archival Description); Aleph (Master Record); Primo; Ingest; Collection; PID; 555 tag; Harvest & Index; 856 tag.
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • EAD Tool Interface
  • Slide 13
  • Slide 14
  • Slide 15
  • Click on Thumbnail
  • Slide 16
  • Slide 17
  • Slide 18
  • A patron can request an item to be digitized.
  • Slide 19
  • Organizing and displaying Collections
  • Slide 20
  • EAD Tool (Encoded Archival Description). A finding aid that adds a viewable structure to a collection. Many collections are large and complex. Aleph stores only the collection-level descriptive metadata. The tool can add, edit, and delete component-level metadata. Rosetta 3.0 doesn't have enough functionality for our collections.
  • Slide 21
  • EAD Tool - Staff View
  • Slide 22
  • Slide 23
  • Edit: drag and drop XML or CSV files.
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Slide 30
  • Challenges. Configuration is fairly complex within and between systems. Large batch ingest has been difficult. Re-ingesting content into Rosetta has been problematic (preservation system). Collections management: built the EAD Tool to manage collections; serials? PDF content: progressive PDF download in Adobe Reader X is not fully supported until Rosetta v3; 2 searches are needed to see text within a PDF.
  • Slide 31
  • Successes. Ingested over 57,000 family history books, ranging from 1 MB to 800 MB, for the Family History department (books.familysearch.org). Ingested 700,000 files; most are linked to collections for the Church History Library. Consistent user experience with large files. Good response times. 75,000 views per month. Once the pipelines are established, they work well. Able to create customizable solutions for multiple institutions.
  • Slide 32
  • Under Development. Restricted content: based on IP address authentication. Multiple viewing experiences (responsive design). Reporting capabilities: monitor usage.
  • Slide 33
  • DIGITAL PIPELINE How we glued it all together
  • Slide 34
  • Digital Pipeline. Diagram labels: Aleph (Master Record); Primo; Rosetta; EAD Tool (Encoded Archival Description); Church History Library.
  • Slide 35
  • Catalog in Aleph. Physical assets are cataloged in Aleph (collection level). Staff assigns a call number and Aleph assigns a BIB number. While browsing, a patron requests that the asset be digitized. Example in Aleph: Call #: MS 2877 4; BIB #: 000114027.
  • Slide 36
  • Digitized into Rosetta. The asset(s) from the Aleph collection are digitized. A custom tool (SIP tool) queries the Aleph SRU server using the BIB # to get collection-level metadata. Digitized assets, BIB # (CMS ID), and metadata are ingested into Rosetta. PIDs get assigned. Diagram labels: Rosetta; Aleph SRU; Query; Response; BIB #: 000114027; Call #: MS 2877 4; Title: Parley P. Pratt.
  • Slide 37
  • Rosetta to Aleph / EAD Tool. Every night a custom script (cron job) queries Rosetta using the OAI harvester. The PIDs and item titles from Rosetta are inserted into the EAD Tool or prepared for Aleph (856 tags). Every night a custom script (job_list) ingests the metadata into Aleph. Diagram labels: Rosetta OAI; Aleph; EAD Tool; Query; Response; Dublin Core.
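The nightly step of pulling PIDs and item titles out of the harvested Dublin Core can be sketched in Python. This is a minimal sketch, not the presenters' script: `extract_pids_and_titles` and the sample record are hypothetical, and it assumes the PID can be read from the OAI-PMH record header `identifier`, which the slides do not confirm. Only the standard OAI-PMH and Dublin Core namespaces are taken as given.

```python
import xml.etree.ElementTree as ET

# Standard OAI-PMH 2.0 and Dublin Core namespaces.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Made-up sample response; a real Rosetta response will carry more fields.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>IE0000001</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample item title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def extract_pids_and_titles(oai_xml):
    """Return (identifier, title) pairs from an OAI-PMH ListRecords response."""
    root = ET.fromstring(oai_xml)
    pairs = []
    for record in root.iterfind(".//oai:record", NS):
        # Assumption: the header identifier carries the Rosetta PID.
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        pairs.append((identifier.strip(), title.strip()))
    return pairs


if __name__ == "__main__":
    for pid, title in extract_pids_and_titles(SAMPLE):
        print(pid, title)
```

The resulting pairs are what a later step would insert into the EAD Tool or format as 856 fields for Aleph.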
  • Slide 38
  • From Aleph to Primo. Every night Aleph publishes all data sets to Primo (publish-06 in job_list). Aleph data sets include collection-level metadata plus 856 tags (Rosetta PID) or 555 tags (EAD link). During harvest, Primo creates links from 555 or 856 tags, which point to the EAD Tool and Rosetta respectively. Diagram labels: Aleph; Primo; Publish; MARC XML.
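To illustrate how links could be derived from 555 and 856 tags in published MARC XML, here is a minimal Python sketch. It is not Primo's own logic: `extract_links`, the sample record, and the `example.org` URLs are hypothetical, and it assumes the link sits in subfield `u` of each field.

```python
import xml.etree.ElementTree as ET

# Standard MARCXML namespace.
MARC = {"marc": "http://www.loc.gov/MARC21/slim"}

# Made-up record: the URLs are placeholders, not real library links.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">Parley P. Pratt</subfield>
  </datafield>
  <datafield tag="555" ind1=" " ind2=" ">
    <subfield code="u">http://example.org/ead/000114027</subfield>
  </datafield>
  <datafield tag="856" ind1=" " ind2=" ">
    <subfield code="u">http://example.org/delivery/IE0000001</subfield>
  </datafield>
</record>"""


def extract_links(marcxml):
    """Return (tag, target system, URL) for each 555 or 856 link in a record."""
    root = ET.fromstring(marcxml)
    links = []
    for field in root.iterfind(".//marc:datafield", MARC):
        tag = field.get("tag")
        if tag not in ("555", "856"):
            continue
        # Per the slide: 555 points to the EAD Tool, 856 to Rosetta.
        target = "EAD Tool" if tag == "555" else "Rosetta"
        for sub in field.iterfind("marc:subfield[@code='u']", MARC):
            if sub.text:
                links.append((tag, target, sub.text.strip()))
    return links


if __name__ == "__main__":
    for link in extract_links(SAMPLE):
        print(link)
```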
  • Slide 39
  • Digital Pipeline Notes. The Aleph BIB # provides the linkage between the various systems; the Call # provides the reference. Linux scripting glues it all together: cron jobs, job_list, wget, and custom logs work together to get data, re-format data, move data, and start new jobs. NFS shares allow us to move data around easily. Rights are a bit of a hassle. Timing matters: pull from Rosetta, update EAD Tool, send to Aleph, publish to Primo, and run Primo pipes.
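The point that timing matters can be illustrated with a small Python sketch that runs the steps strictly in the order listed and stops at the first failure, so a later stage never works from stale data. This is an illustration only: the presenters used cron jobs and job_list, and `run_pipeline` plus the placeholder steps are hypothetical.

```python
# Order taken from the slide; each step here is only a placeholder.
PIPELINE_ORDER = [
    "pull from Rosetta",
    "update EAD Tool",
    "send to Aleph",
    "publish to Primo",
    "run Primo pipes",
]


def run_pipeline(steps):
    """Run (name, callable) steps in order; stop at the first failure.

    Returns the names of the steps that completed.
    """
    completed = []
    for name, step in steps:
        try:
            step()
        except Exception as exc:
            print(f"{name} failed: {exc}; skipping later steps")
            break
        completed.append(name)
    return completed


if __name__ == "__main__":
    # Placeholder steps that only print their name.
    steps = [
        (name, lambda name=name: print(f"running: {name}"))
        for name in PIPELINE_ORDER
    ]
    run_pipeline(steps)
```

Stopping on the first error is one possible design choice; a separate nightly job per system, as described in the slides, gets the same ordering through scheduling instead.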
  • Slide 40
  • Backup Slides
  • Slide 41
  • Rosetta OAI Harvester. A Linux script using wget requests Rosetta data from the last 24 hours. Query: by publication set with from / until. Response: Dublin Core. Command: wget http://..org/oaiprovider/request?verb=ListRecords&metadataPrefix=oai_dc&set=&from=2012-08-20T20:00:00Z&until=2012-08-21T19:59:59Z. Diagram labels: Rosetta OAI; Linux script; Query; Response.
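The 24-hour harvest window in the request above can be computed rather than typed by hand. The following Python sketch builds the same kind of URL; it is not the presenters' wget script. The base URL `http://example.org/oaiprovider/request` and the set name `my_set` are placeholders for values elided on the slide, and `build_oai_url` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def build_oai_url(base_url, set_spec, until):
    """Build an OAI-PMH ListRecords URL covering the 24 hours ending at `until`."""
    # The window is inclusive, so it starts 24 hours minus one second earlier.
    start = until - timedelta(hours=24) + timedelta(seconds=1)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    params = {
        "verb": "ListRecords",
        "metadataPrefix": "oai_dc",
        "set": set_spec,
        "from": start.strftime(fmt),
        "until": until.strftime(fmt),
    }
    # Keep ":" unescaped so the timestamps stay readable in logs.
    return base_url + "?" + urlencode(params, safe=":")


if __name__ == "__main__":
    until = datetime(2012, 8, 21, 19, 59, 59, tzinfo=timezone.utc)
    # Placeholder host and set name; the slide elides the real ones.
    print(build_oai_url("http://example.org/oaiprovider/request", "my_set", until))
```

With the `until` value from the slide, the computed `from` value matches the slide's 2012-08-20T20:00:00Z.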
  • Slide 42
  • Aleph SRU Server. Aleph SRU server response to Rosetta. Query: by BIB # (CMS ID). Response: Dublin Core (or MARC XML). Diagram labels: Rosetta; Aleph SRU; Query; Response.
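Reading the Dublin Core fields out of an SRU response might look like the Python sketch below. It is a sketch under stated assumptions, not the SIP tool itself: `first_record_fields` is hypothetical, the sample response is made up around the title and BIB # shown earlier in the deck, and placing the BIB # in `dc:identifier` is an assumption. The SRU 1.1 and Dublin Core namespaces are the standard ones.

```python
import xml.etree.ElementTree as ET

SRW_NS = "http://www.loc.gov/zing/srw/"      # SRU 1.1 response namespace
DC_NS = "http://purl.org/dc/elements/1.1/"   # Dublin Core elements

# Made-up response; a real Aleph response may use different DC elements.
SAMPLE = """<searchRetrieveResponse xmlns="http://www.loc.gov/zing/srw/">
  <version>1.1</version>
  <numberOfRecords>1</numberOfRecords>
  <records>
    <record>
      <recordSchema>info:srw/schema/1/dc-v1.1</recordSchema>
      <recordPacking>xml</recordPacking>
      <recordData>
        <srw_dc:dc xmlns:srw_dc="info:srw/schema/1/dc-schema"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Parley P. Pratt</dc:title>
          <dc:identifier>000114027</dc:identifier>
        </srw_dc:dc>
      </recordData>
    </record>
  </records>
</searchRetrieveResponse>"""


def first_record_fields(sru_xml):
    """Return {dc element name: [values]} for the first record, or {} if none."""
    root = ET.fromstring(sru_xml)
    data = root.find(f".//{{{SRW_NS}}}recordData")
    if data is None:
        return {}
    fields = {}
    prefix = "{" + DC_NS + "}"
    for elem in data.iter():
        # Keep only Dublin Core elements that have non-empty text.
        if elem.tag.startswith(prefix) and elem.text and elem.text.strip():
            name = elem.tag[len(prefix):]
            fields.setdefault(name, []).append(elem.text.strip())
    return fields


if __name__ == "__main__":
    print(first_record_fields(SAMPLE))
```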
  • Slide 43
  • Aleph SRU Server URL. Base: http://..org:5661/ ?version=1.1&operation=searchRetrieve... Example query parameters: &query=dc.callno=MS 318&maximumRecords=1; &query=rec.id=000082419&maximumRecords=1; &query=dc.title=Book of Mormon&maximumRecords=5; &query=dc.subject=Apostles&maximumRecords=10; &query=dc.creator=John Taylor&maximumRecords=10.
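The example queries above contain spaces and `=` signs, which need URL encoding before they are sent. A minimal Python sketch of building such a request follows. The host `example.org` is a placeholder for the elided one, `build_sru_url` is a hypothetical helper, and any parameters elided on the slide (the `...` after `searchRetrieve`) are not reproduced here.

```python
from urllib.parse import urlencode


def build_sru_url(base_url, index, term, maximum_records=1):
    """Build an SRU 1.1 searchRetrieve URL for a single index=term query."""
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        # The query is passed as shown on the slide: index=term.
        "query": f"{index}={term}",
        "maximumRecords": maximum_records,
    }
    # urlencode escapes the "=" inside the query and turns spaces into "+".
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    base = "http://example.org:5661/"  # placeholder host; port from the slide
    print(build_sru_url(base, "rec.id", "000082419"))
    print(build_sru_url(base, "dc.callno", "MS 318"))
    print(build_sru_url(base, "dc.creator", "John Taylor", maximum_records=10))
```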