digitizing nevada newspapers: workflow

Nevada Digital Newspaper Project Dana Bullinger (Project Coordinator) and Melissa Stoner (Project Technician)


Post on 22-Jan-2018



Nevada Digital Newspaper Project

Dana Bullinger (Project Coordinator) and Melissa Stoner (Project Technician)

PHASE ONE

Title Selection

● Advisory Board selects qualified titles

○ Research Value

○ Geographic Representation

○ Temporal Coverage

○ Diversity

NDNP Title Guidelines

● Complete (or majority of) title run should be available

on microfilm without restrictions

● Technical factors to consider:

○ Quality of original text and microfilm capture

○ Reduction ratio (the lower the better; below 20x)

○ Microfilm duplicated from the camera master negative should have resolution test patterns readable at 5.0 or higher

○ Density variations of no more than 0.2 within images and between exposures

○ Confidence level through OCR testing of sample page images
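The numeric thresholds in the technical factors above can be turned into a quick screening check. The following is a minimal sketch, not an NDNP tool: the `FilmReading` class and its field names are hypothetical, and it assumes the 0.2 figure refers to density variation.

```python
from dataclasses import dataclass


@dataclass
class FilmReading:
    """Hypothetical container for one reel's technical readings."""
    reduction_ratio: float    # 18.0 means 18x
    resolution: float         # resolution test pattern reading
    density_variation: float  # assumed: largest variation within/between exposures


def film_problems(reading: FilmReading) -> list[str]:
    """Return the guideline thresholds that a reel fails to meet."""
    problems = []
    if reading.reduction_ratio >= 20:
        problems.append("reduction ratio is 20x or higher")
    if reading.resolution < 5.0:
        problems.append("resolution test pattern below 5.0")
    if reading.density_variation > 0.2:
        problems.append("density variation exceeds 0.2")
    return problems


# Illustrative readings only, not measurements from the project.
print(film_problems(FilmReading(18.0, 5.6, 0.15)))
print(film_problems(FilmReading(22.0, 4.5, 0.30)))
```

An empty list means the reel passes these three checks; the OCR confidence test still has to be run separately on sample page images.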

Deliverables

For Each Title

•Up-to-date MARC record from the

CONSER OCLC database

•Additional title-level metadata (Reel-Level

Metadata spreadsheet example)

•Newspaper History Essay - 500 words per

title

For each issue

•Structural metadata for issues digitized and

organized by date (Page-Level Metadata

spreadsheet example)

Deliverables

For each newspaper page

- Page image in two formats

- Grayscale, scanned between 300-400 dpi, uncompressed TIFF 6.0 image file

- Same image, compressed as

JPEG2000 (.JP2)

- OCR text using the ALTO schema

(1 file per page)

- PDF image with Hidden Text
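Since every page needs the same four files, a completeness check is easy to sketch. The example below assumes that the files for one page share a base name and use the extensions `.tif`, `.jp2`, `.xml` (ALTO OCR) and `.pdf`; that naming scheme and the function name are assumptions for illustration.

```python
from pathlib import PurePosixPath

# Assumed extensions for the four per-page deliverables.
REQUIRED_EXTENSIONS = {".tif", ".jp2", ".xml", ".pdf"}


def missing_formats(filenames):
    """Map each page base name to the required extensions it lacks."""
    found = {}
    for name in filenames:
        path = PurePosixPath(name)
        found.setdefault(path.stem, set()).add(path.suffix.lower())
    return {
        stem: sorted(REQUIRED_EXTENSIONS - extensions)
        for stem, extensions in found.items()
        if REQUIRED_EXTENSIONS - extensions
    }


# Hypothetical file listing for two pages; page 0002 is incomplete.
listing = ["0001.tif", "0001.jp2", "0001.xml", "0001.pdf", "0002.tif", "0002.jp2"]
print(missing_formats(listing))
```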

PHASE TWO

Selected Titles

● Research Library of Congress Control Numbers (LCCNs) and OCLC numbers for all titles

● Accurate LCCNs critical for

data management

● Fill in spreadsheet

● Send to LC for approval

Before Duplication Begins...

● Set up purchase order with selected

digitization vendor (iArchives)

● Research and order microfilm reader

● Send work plan to NEH

● Order 10 1-TB Hard Drives for our

deliverables

Microfilm Reader and Software

•14MP Image Sensor

•Light Source

•File Output

•Lens with 7x to 105x

magnification

Sample Batch

● Sample batch allows Library of Congress to

identify any potential problems and ensures

technical specifications are being implemented

● Tonopah Daily Bonanza (1901-1903)

● Negative and Positive Reels duplicated by

NSLA and sent to UNLV

● Apply LC-provided barcodes on Negative Reel

boxes

○ Barcode connects digital content to physical

reel deposited at LC

MasterFile

● Document everything in the MasterFile and Reel-Level Spreadsheet

○ Title, Year, LCCN, Barcode/Reel Number, Unique name for iArchives,

metadata received from NSLA

Collation: Reel-Level

● UNLV

○ Unique Name

○ LCCN

○ Reel-Number

○ Location of Publication

○ Start/End date

○ Digital Responsible Institution

● NSLA

○ Title

○ Source Repository

○ Density Readings

○ Reduction Ratio

○ Average Density

Collation: Page-Level

● Use template

● One page-level spreadsheet = one reel

● Page count

● Anomalies

- Missing issues or pages

- Duplicate issues or pages

- Mutilated pages

- Other abnormalities (e.g. pages out of order, incorrect dates)
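Some of the anomalies listed above can be flagged automatically once the issue dates for a reel have been keyed in. The sketch below is one possible approach, assuming the dates are recorded in the order the issues appear on the reel; the function name and sample dates are hypothetical. It does not detect missing issues, because that requires knowing the title's publication schedule.

```python
from collections import Counter
from datetime import date


def date_anomalies(issue_dates):
    """Flag duplicate issue dates and dates that break reel order."""
    counts = Counter(issue_dates)
    duplicates = sorted(d for d, n in counts.items() if n > 1)
    # A date earlier than the one filmed just before it is out of order.
    out_of_order = [
        later
        for earlier, later in zip(issue_dates, issue_dates[1:])
        if later < earlier
    ]
    return {"duplicates": duplicates, "out_of_order": out_of_order}


# Illustrative dates only, not taken from an actual reel.
reel = [
    date(1901, 11, 1),
    date(1901, 11, 2),
    date(1901, 11, 2),
    date(1901, 11, 5),
    date(1901, 11, 4),
]
print(date_anomalies(reel))
```

Flagged dates still need a visual check on the microfilm reader, since a repeated date can be a genuine duplicate or a printer's error on the masthead.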

Quality Review: Before Delivery to Vendor

● Revisit collation sheet and reel metadata line by line

● Confirm for accuracy

● Check delivered page count against

● Check all notation for standardization

and clarity

● Metadata properly formatted

iArchives

● iArchives Portal

○ Upload Reel and Page-level in a

.CSV file

● Ship Negative reels and blank hard

drive to be digitized
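Exporting the reel-level spreadsheet to CSV can be scripted. The sketch below uses Python's standard `csv` module; the column names are modeled on the reel-level fields listed earlier, but the exact layout the vendor portal expects is not given here, so the field list and the sample row (including the placeholder LCCN) are assumptions.

```python
import csv
import io

# Hypothetical column layout based on the reel-level fields.
FIELDS = ["unique_name", "lccn", "reel_number", "location", "start_date", "end_date"]


def reels_to_csv(rows):
    """Serialize reel-level metadata rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder values for illustration; not the project's real identifiers.
sample = [{
    "unique_name": "tonopah_daily_bonanza_reel01",
    "lccn": "sn00000000",
    "reel_number": "00000000001",
    "location": "Tonopah, Nev.",
    "start_date": "1901-01-01",
    "end_date": "1901-12-31",
}]
print(reels_to_csv(sample))
```

The `csv` module quotes fields that contain commas, such as the place of publication, so the column count stays consistent on every row.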

Scanning Specifications

● Scan from clean second-generation duplicate silver negative microfilm (to be deposited at the Library of Congress at the end of the award period)

● Capture specifications are 8-bit

grayscale, between 300 and 400

dpi

● Target film strip should be

scanned at the start of each

session

● Provide the master page images,

delivered to LC, as uncompressed

images in TIFF 6.0 format
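The capture specifications above reduce to three checks on each master image. The following is a minimal sketch that works on already-extracted image properties rather than reading TIFF files itself; the function and parameter names are hypothetical.

```python
def scan_spec_problems(bits_per_sample, samples_per_pixel, dpi, compressed):
    """Return the capture specifications that a master image fails."""
    problems = []
    # 8-bit grayscale means one channel with 8 bits per sample.
    if bits_per_sample != 8 or samples_per_pixel != 1:
        problems.append("not 8-bit grayscale")
    if not 300 <= dpi <= 400:
        problems.append("resolution outside 300-400 dpi")
    if compressed:
        problems.append("master TIFF is compressed")
    return problems


# Illustrative values only.
print(scan_spec_problems(8, 1, 400, False))
print(scan_spec_problems(8, 3, 250, True))
```

In practice the DVV performs the authoritative validation; a check like this is only useful as an early warning before a batch is assembled.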

PHASE THREE

Back to UNLV

● Receive hard

drive

● Batch Structure

Quality Review

- Quality Review process ensures that NDNP Specifications are met

by checking for image quality, irregularities, and correct

bibliographic software

- Digital Viewer and Validator

(DVV)

- Allows awardees and

vendors to view data and

validate technical aspects of

files

- Verification checks digital

signatures of all files in a batch

Quality Review

● Verify Batch

● Double check dates using Calendar View

in DVV, cross reference with Reel-Level

and Page-Level data

● View thumbnails

● Check OCR (10% of pages)

● Verify Batch with DVV for a second time

● Email Tonijala Penn (LC Liaison) and Deb

Thomas (Project Coordinator for NDNP)
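For the 10% OCR check in the list above, the pages to review can be drawn at random so that the sample is not biased toward the start of a reel. This is a sketch under assumptions: the slides do not say how the 10% is chosen, and the function name, page identifiers and seed are hypothetical.

```python
import random


def ocr_sample(page_ids, percent=10, seed=None):
    """Pick a random subset of pages (rounded up) for manual OCR review."""
    if not page_ids:
        return []
    # Integer arithmetic rounds up without floating-point surprises.
    size = max(1, (len(page_ids) * percent + 99) // 100)
    rng = random.Random(seed)
    return sorted(rng.sample(list(page_ids), size))


# Hypothetical batch of 50 pages numbered 0001 to 0050.
pages = [f"{n:04d}" for n in range(1, 51)]
print(ocr_sample(pages, seed=2018))
```

Passing a fixed `seed` makes the selection repeatable, which helps if a second reviewer needs to look at the same pages.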

Library of Congress

● Ship to LC

○ Hard Drive

○ Shipping Manifest

○ Use fluorescent stickers!

● LC receives and processes batch

● 6-8 weeks turnaround time

● If accepted, batch is ingested

into Chronicling America

Totals to date