methods for knowledge management & digital preservation

Post on 25-Feb-2016



“Think. Learn. Succeed.” Ver 1.2

Methods for Knowledge Management & Digital Preservation

The Theory and Practice of Digital History

Carl A. Young, M.A. in waiting

1 December 2009

Project Overview

Challenge

Resource- and skill-constrained historians and archivists require efficient methods for capturing, analyzing, and sharing original artifacts.

Methodology

• Multi-phase project
• Develop a low-cost process for digitally archiving documents
• Store them in a standards-based data storage platform
• Set the conditions to scale with future phases
• Create a collaborative, accessible, online digital repository that fully leverages the optionality of the digital domain

Major Phases

• Phase I – Prototyping
• Phase II – Capture
• Phase III – Web Access
• Phase IV – Initial Expansion
• Phase V – Infinite Expansion

Phase I: Prototype

Completed in November 2009, this phase established a usable, affordable methodology for project development by prototyping the capture and conversion of an original artifact for testing and exploration purposes.

Phase I: Prototype (cont.)

Demonstration

• Original: digital camera .JPG file format, 2 MB
• Treatment w/ Photoshop: .TIFF, 29 MB
• Adobe conversion: .pdf, 278 KB

Time elapsed:
• Photo: <1 min
• Treatment: ~3 min
• Conversion: <1 min

Phase I: Prototype (cont.)

[Figure: Process Flowchart, with legend]

Phase II: Capture

Completed in November 2009, this phase performed and documented a low-budget workflow for document capture, artifact preservation, and conversion to a distributable format. A historic text is extracted from the original document, archived, and presented to the user in both the original capture format (.jpg or .tiff) and the distributable formats (.pdf and .xml), with an evaluation of optical character recognition (OCR) and transcription requirements.

Phase II: Capture (cont.)

Image Treatment

Select Area
• Image
  – Adjustments
  – Curves “Digitization”
    • Channel: RGB
    • Output: 203
    • Input: 160
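The Curves setting maps an input level of 160 to an output level of 203 on the combined RGB channel. As a rough illustration only, the sketch below builds a 256-entry lookup table through the points (0, 0), (160, 203), and (255, 255) using straight-line interpolation. This is an assumption-laden approximation: Photoshop fits a smooth curve through its control points, so its results will differ slightly, and the function names and the 8-bit level range here are hypothetical choices made for the example.

```python
def build_curve_lut(control_points):
    """Return a 256-entry lookup table, interpolating linearly between points."""
    points = sorted(control_points)
    lut = []
    for level in range(256):
        # Find the segment that contains this input level.
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if x0 <= level <= x1:
                value = y0 + (y1 - y0) * (level - x0) / (x1 - x0)
                lut.append(round(value))
                break
    return lut


# Control points: the fixed endpoints plus the slide's Input 160 -> Output 203.
CURVE_POINTS = [(0, 0), (160, 203), (255, 255)]
LUT = build_curve_lut(CURVE_POINTS)


def apply_curve(pixel, lut=LUT):
    """Apply the same curve to each channel of an (R, G, B) pixel."""
    return tuple(lut[channel] for channel in pixel)
```

Because the control point sits above the diagonal, every midtone is brightened, which pushes aged paper toward white while leaving pure black and pure white unchanged.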

Filter > Blur
• Smart Blur: Radius 100, Threshold 100, Quality High, Mode Normal
• Surface Blur: Radius 100, Threshold 25
• Surface Blur (if needed): Radius 100, Threshold 25
• Lens Blur: Shape Octagon, Radius 5, Blade Curve 50, Rotation 300, Brightness 10, Threshold 75, Noise 3, Distribution Uniform

Select > Color Range
• Select: Shadows
• No Invert

Modify > Expand: 2

Cut

File > New*
• Width: 1600
• Height: 2500
• Resolution: 300
• Color Mode: RGB, 16 bit
* Recommend saving as a preset.

Paste

Flatten

Clean up as needed

Save As .TIFF
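The new-document settings go a long way toward explaining the size of the saved .TIFF. The sketch below works through the arithmetic under three assumptions that the slide does not state: that 1600 and 2500 are pixel dimensions, that the resolution of 300 is in pixels per inch, and that the file is stored uncompressed with three 16-bit channels and no extra layers.

```python
WIDTH_PX = 1600
HEIGHT_PX = 2500
CHANNELS = 3            # R, G, B
BITS_PER_CHANNEL = 16
RESOLUTION_PPI = 300    # assumed unit: pixels per inch


def raw_image_bytes(width, height, channels, bits_per_channel):
    """Uncompressed pixel data size in bytes (no headers, no layers)."""
    return width * height * channels * bits_per_channel // 8


size_bytes = raw_image_bytes(WIDTH_PX, HEIGHT_PX, CHANNELS, BITS_PER_CHANNEL)
size_mib = size_bytes / 2**20

# Physical size of the page image at the stated resolution.
print_width_in = WIDTH_PX / RESOLUTION_PPI
print_height_in = HEIGHT_PX / RESOLUTION_PPI

print(f"{size_bytes:,} bytes (~{size_mib:.1f} MiB)")
print(f"{print_width_in:.2f} in x {print_height_in:.2f} in at {RESOLUTION_PPI} ppi")
```

The raw pixel data comes to 24,000,000 bytes, or about 22.9 MiB, which is in line with the roughly 23 MB per-page .tiff figure quoted in the labor estimates.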

Phase II: Capture (cont.)

OCR and Transcription Demo

[Figure panels: OCR; Transcription]

Time elapsed:
• OCR: <1 min
• Transcription: ~5 min

Phase II: Capture (cont.)

TEI Demo

Time elapsed:
• Preliminary data: ~45 min
• Page: ~5 min

Look at UVA’s TEI How To.
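One way to cut the per-page TEI time is to generate the document skeleton programmatically and fill in only the transcribed text. The sketch below is a minimal illustration using Python's standard `xml.etree.ElementTree`; it is not the author's process. It assumes the one-time "preliminary data" corresponds to the `teiHeader` and the per-page work to the body. The function names, the placeholder page text, and the wording of the publication and source notes are hypothetical, and the output has not been validated against a TEI schema.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace


def tei(tag):
    """Qualify a tag name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def build_tei_document(title, source_note, pages):
    """Build a minimal TEI tree: one header, then a page break and paragraph per page."""
    root = ET.Element(tei("TEI"))

    # Header: written once per document.
    header = ET.SubElement(root, tei("teiHeader"))
    file_desc = ET.SubElement(header, tei("fileDesc"))
    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = title
    publication = ET.SubElement(file_desc, tei("publicationStmt"))
    ET.SubElement(publication, tei("p")).text = "Unpublished working transcription."
    source_desc = ET.SubElement(file_desc, tei("sourceDesc"))
    ET.SubElement(source_desc, tei("p")).text = source_note

    # Body: repeated for every captured page.
    text = ET.SubElement(root, tei("text"))
    body = ET.SubElement(text, tei("body"))
    for number, transcription in enumerate(pages, start=1):
        ET.SubElement(body, tei("pb"), {"n": str(number)})
        ET.SubElement(body, tei("p")).text = transcription
    return root


document = build_tei_document(
    title="Militiaman's Guide",
    source_note="Transcribed from digital photographs of the original.",
    pages=["[page 1 transcription]", "[page 2 transcription]"],
)
xml_text = ET.tostring(document, encoding="unicode")
```

Writing `xml_text` to a file gives a starting point that a transcriber can then enrich by hand with whatever markup the project's TEI guidelines call for.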

Phase II: Capture (cont.)

[Figure: Methodology Flow Chart, with legend]

Phase II: Capture (cont.)

Labor Estimates

Militiaman’s Guide: 155 pages total, type text, fair condition

40 hours (optimal) / 5 GB

Per Page Estimates
• Photography: ~30 sec, 2.5 MB @ 5 Mpxl
• .tiff Conversion: ~3 min, 23 MB
• .pdf Conversion: ~1 min, 300 KB
• OCR: ~45 sec
• Error Correction/Transcription: ~5 min
• TEI: ~5 min (~45 min overhead)

Case Estimates
• Photography: ~1:15, ~400 MB
• .tiff Conversion: ~7:45, 3.5 GB
• .pdf Conversion: ~2:30, 50 MB
• OCR: ~2 hours
• Error Correction/Transcription: ~13 hrs
• TEI: ~14 hrs
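The case estimates follow from multiplying each per-page figure by the 155 pages of the source document. The sketch below reproduces that arithmetic so it can be rerun for a document of a different length. It assumes decimal units (1 MB = 1,000 KB) and that the 45-minute TEI overhead is a one-time cost per document; the function and variable names are hypothetical.

```python
PAGES = 155

# Per-page estimates from the slide: (seconds, kilobytes).
PER_PAGE = {
    "Photography": (30, 2500),
    ".tiff Conversion": (180, 23000),
    ".pdf Conversion": (60, 300),
    "OCR": (45, 0),
    "Error Correction/Transcription": (300, 0),
    "TEI": (300, 0),
}
TEI_OVERHEAD_SECONDS = 45 * 60  # assumed to be incurred once per document


def case_estimates(pages=PAGES):
    """Return {step: (total seconds, total kilobytes)} for a document."""
    totals = {}
    for step, (seconds, kilobytes) in PER_PAGE.items():
        total_seconds = seconds * pages
        if step == "TEI":
            total_seconds += TEI_OVERHEAD_SECONDS
        totals[step] = (total_seconds, kilobytes * pages)
    return totals


def format_duration(seconds):
    """Format seconds as h:mm, rounded to the nearest minute."""
    hours, minutes = divmod((seconds + 30) // 60, 60)
    return f"{hours}:{minutes:02d}"


totals = case_estimates()
for step, (seconds, kilobytes) in totals.items():
    print(f"{step}: {format_duration(seconds)}, {kilobytes / 1000:.1f} MB")

total_seconds = sum(seconds for seconds, _ in totals.values())
total_kilobytes = sum(kilobytes for _, kilobytes in totals.values())
print(f"Total: {total_seconds / 3600:.1f} hours, {total_kilobytes / 1e6:.1f} GB")
```

Under these assumptions the steps sum to about 40.1 hours, matching the 40-hour figure, and to about 4.0 GB of storage, which sits below the 5 GB quoted on the slide.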

Equipment Baseline

• Consumer-grade HP 5 Mpxl digital camera ($125)
• Slightly above consumer-grade PC ($1100)
  – 4 GB RAM
  – 1 GB VRAM
  – 500 GB SATA HD
  – Dual screens
• Consumer software ($600)
  – Adobe Creative Suite 3

Lessons Learned

• Use a tripod/mount
• Use consistent lighting
• Safely flatten pages as much as possible
• Use a mounting frame
• Use the highest resolution available
• OCR is NOT reliable
• Need an efficient method for TEI

Phase III: Web-Access

This phase is the subject of this grant funding request. A team of professional developers will construct a suitable multimedia database for storing and accessing original artifact captures, distributable .pdf versions, and XML-based data and metadata derived from the original. The team will also develop a working prototype web site to access the data. Data archiving and disaster recovery are fundamental to this phase.

Successful conclusion of this phase will yield a working version 1.0 available for release and continued development.
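To make the storage requirement concrete, the sketch below shows one possible shape for such a repository: one record per artifact and one per page, with each page pointing to its original capture, its distributable .pdf, and its XML. This is purely illustrative. The slides leave the platform choice to the development team, so SQLite is a stand-in here, and every table, column, function, and file path name is hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE artifact (
    id         INTEGER PRIMARY KEY,
    title      TEXT NOT NULL,
    page_count INTEGER NOT NULL
);
CREATE TABLE page (
    id           INTEGER PRIMARY KEY,
    artifact_id  INTEGER NOT NULL REFERENCES artifact(id),
    page_number  INTEGER NOT NULL,
    capture_path TEXT NOT NULL,  -- original capture (.jpg or .tiff)
    pdf_path     TEXT,           -- distributable version
    tei_xml      TEXT,           -- XML-based data and metadata
    UNIQUE (artifact_id, page_number)
);
"""


def open_repository(path=":memory:"):
    """Create the schema in a new SQLite database and return the connection."""
    connection = sqlite3.connect(path)
    connection.execute("PRAGMA foreign_keys = ON")
    connection.executescript(SCHEMA)
    return connection


def add_artifact(connection, title, page_count):
    """Insert an artifact record and return its id."""
    cursor = connection.execute(
        "INSERT INTO artifact (title, page_count) VALUES (?, ?)",
        (title, page_count),
    )
    return cursor.lastrowid


def add_page(connection, artifact_id, page_number, capture_path,
             pdf_path=None, tei_xml=None):
    """Insert a page record; the .pdf and XML can be attached later."""
    cursor = connection.execute(
        "INSERT INTO page (artifact_id, page_number, capture_path, pdf_path, tei_xml) "
        "VALUES (?, ?, ?, ?, ?)",
        (artifact_id, page_number, capture_path, pdf_path, tei_xml),
    )
    return cursor.lastrowid
```

Keeping the large image files on disk and storing only their paths keeps the database small, which can simplify backup and disaster recovery, though a production design might weigh that choice differently.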

Phase III: Web-Access (cont.)

[Figure: Flow Chart]

Phase III: Web-Access (cont.)

Work Breakdown Structure

• Database Development
• Prototype Evaluation
• Prototype Web Development
• Alpha: Test & Mod
• Beta: Test & Mod
• RC1: Test & Mod
• v1.0
• Documentation
• Disaster Recovery Testing

Estimated Cost: $52,000

Phase III: Web-Access (cont.)

[Figure: Project Gantt Chart]

Phase IV: Initial Expansion

Beyond the scope of this grant request, this phase seeks to develop partnerships and data shares across multiple institutions with similar projects in development or production. The level of participation directly influences the scale of this phase. It is anticipated that the minimal costs will be shared across participating institutions.

Phase IV: Initial Expansion (cont.)

Work Breakdown Structure

• Conduct Lifecycle Management Review
• Documentation
• Disaster Recovery Testing
• Publish Methodology
• Find Partners
• Large Scale Capture
• Leverage v1.0
• Update Code and Processes

Estimated Cost: $8,000

Phase V: Infinite Expansion

Optionally, and depending on the success of the earlier phases, this phase will greatly expand collaborative efforts by potentially making this capability available to amateur and resource-constrained archivists and historians. It will provide a standards-based methodology and data capture technique, along with a collaborative platform to share the data once stored. This aspect of the final phase will be limited only by technology maintenance and scalability costs.

Phase V: Infinite Expansion (cont.)

Work Breakdown Structure

• Publish Updated Methodology
• Publish Membership Schema
• Open Data Models
• Leverage Current Version
• Conduct Lifecycle Management Review
• Documentation
• Disaster Recovery Testing
• Release New Version(s)

Estimated Cost: $82,000

Summary

Project Summary
• 5-Phase Approach
• “How-To”
  – Digitization
  – TEI
  – Manage the project
• Sets the stage
  – Broad/ambitious goals and plan
  – Manageable pieces
  – Flexible optionality

Grant Request / Funding Summary
• Phase III support:
  – $51,733.33
  – Prototype Validation
  – Database Development
  – Web Development
  – Hosting
  – Disaster Recovery
• Phase IV and V templates
  – Future expansion as desired
  – Flexible Planning

QUESTIONS


CONCLUSION


Dead Guy Quote

Man had always assumed that he was more intelligent than dolphins because he had achieved so much... the wheel, New York, wars, and so on, whilst all the dolphins had ever done was muck about in the water having a good time. But conversely the dolphins believed themselves to be more intelligent than man for precisely the same reasons.

- Douglas Adams

BACKUP


Phase I: Prototype (cont.)

Work Breakdown Structure

• Image Capture
• Image Preservation
• Image Manipulation
• Database Development
• TEI Process Development
• Data Development
• Static Web-Page
• Prototyping
• Documentation
• Disaster Recovery Testing

Estimated Cost: $5,000

Phase I: Prototype (cont.)

[Figure: Gantt Chart]

Phase II: Capture (cont.)

Work Breakdown Structure

• Image Capture
• TEI
• Prototype Database Input
• Documentation
• Disaster Recovery Testing

Estimated Cost: $2,000

Phase II: Capture (cont.)

[Figure: Gantt Chart]