Page 1: Methods for Knowledge Management & Digital Preservation

“Think. Learn. Succeed.” Ver 1.2

Methods for Knowledge Management & Digital Preservation

The Theory and Practice of Digital History

Carl A. Young, M.A. in waiting
1 December 2009

Project Overview

Challenge

Resource- and skill-constrained historians and archivists require efficient methods for capturing, analyzing, and sharing original artifacts.

Methodology

• Multi-phase project
• Develop a low-cost process for digitally archiving documents
• Store them in a standards-based data storage platform
• Set the conditions to scale with future phases
• Create a collaborative, accessible online digital repository that fully leverages the optionality of the digital domain

Major Phases

• Phase I: Prototyping
• Phase II: Capture
• Phase III: Web Access
• Phase IV: Initial Expansion
• Phase V: Infinite Expansion

Phase I: Prototype

Completed in November 2009, this phase established a usable, affordable methodology for project development by prototyping the capture and conversion of an original artifact for testing and exploration purposes.

Phase I: Prototype (cont.)

Demonstration

• Original: digital camera, .jpg file format, 2 MB
• Treatment with Photoshop: .tiff, 29 MB
• Adobe conversion: .pdf, 278 KB

Time elapsed: photo <1 min; treatment ~3 min; conversion <1 min

Phase I: Prototype (cont.)

[Figure: Process Flowchart, with legend]

Phase II: Capture

Completed in November 2009, this phase performed and documented a low-budget document capture, artifact preservation, and conversion to a distributable format, in which historic text is extracted from the original document, archived, and presented to the user in both the original capture format (.jpg or .tiff) and distributable formats (.pdf and .xml), together with an evaluation of optical character recognition (OCR) and transcription requirements.

Phase II: Capture (cont.)

Image Treatment

• Select area
• Image > Adjustments > Curves ("Digitization")
  – Channel: RGB
  – Output: 203
  – Input: 160
• Filter > Blur
  – Smart Blur: Radius 100, Threshold 100, Quality High, Mode Normal
  – Surface Blur: Radius 100, Threshold 25
  – Surface Blur (if needed): Radius 100, Threshold 25
  – Lens Blur: Shape Octagon, Radius 5, Blade Curve 50, Rotation 300, Brightness 10, Threshold 75, Noise 3, Distribution Uniform
• Select > Color Range: Shadows, no Invert
• Select > Modify > Expand: 2
• Cut
• File > New*: Width 1600, Height 2500, Resolution 300, Color Mode RGB 16 bit
• Paste
• Flatten
• Clean up as needed
• Save As .TIFF

* Recommend saving as a preset.
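The Curves step maps input level 160 to output level 203 on the RGB channel, which brightens the midtones (presumably to lighten the paper relative to the type). A minimal sketch of that mapping as a 256-entry lookup table for 8-bit values follows. Photoshop actually fits a smooth curve through its control points, so the straight-line interpolation here and the helper names `curves_lut` and `apply_curve` are simplifying, hypothetical choices, not Photoshop's algorithm:

```python
def curves_lut(points):
    """Build a 256-entry lookup table from (input, output) control points.

    Points must be sorted by input and include the endpoints 0 and 255.
    Values between points are linearly interpolated (an approximation:
    Photoshop draws a smooth curve through the same points).
    """
    lut = []
    for x in range(256):
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if x0 <= x <= x1:
                t = (x - x0) / (x1 - x0)
                lut.append(round(y0 + t * (y1 - y0)))
                break
    return lut


def apply_curve(pixels, lut):
    """Apply the same table to every channel of 8-bit (r, g, b) pixels."""
    return [tuple(lut[c] for c in px) for px in pixels]


# Control points from the slide: Channel RGB, Input 160 -> Output 203.
DIGITIZATION = [(0, 0), (160, 203), (255, 255)]
lut = curves_lut(DIGITIZATION)

print(lut[80], lut[160], lut[200])
print(apply_curve([(160, 160, 160), (40, 200, 255)], lut))
```

Because the table is computed once and then reused for every pixel, the same approach scales to full page captures without recomputing the curve.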

Phase II: Capture (cont.)

OCR and Transcription Demo

[Figure: OCR output and transcription]

Time elapsed: OCR <1 min; transcription ~5 min
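These timings show that OCR is fast but its output still needs about 5 minutes of correction per page. One way to put a number on OCR reliability, sketched here as an assumption rather than part of the original process, is the character error rate (CER): the Levenshtein edit distance between the raw OCR text and the corrected transcription, divided by the transcription's length. The sample strings are hypothetical:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def character_error_rate(ocr_text: str, transcription: str) -> float:
    """Edit distance divided by the length of the hand-corrected transcription."""
    if not transcription:
        raise ValueError("transcription must not be empty")
    return levenshtein(ocr_text, transcription) / len(transcription)


# Hypothetical sample line: raw OCR output vs. corrected transcription.
ocr = "Tbe militia-man mu5t keep his arms clean"
ref = "The militia-man must keep his arms clean"
print(f"CER = {character_error_rate(ocr, ref):.1%}")
```

Tracking CER per page would show whether the ~5 minutes of correction time is driven by a few badly captured pages or is spread evenly.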

Phase II: Capture (cont.)

TEI Demo

Time elapsed: preliminary data ~45 min; each page ~5 min

Look at UVA’s TEI How To.
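The ~45 minutes of preliminary data are presumably one-time work such as the TEI header, while the ~5 minutes per page cover encoding each page's text. Below is a minimal sketch, using Python's standard-library ElementTree, of a TEI P5 document with the header elements TEI requires (`fileDesc` containing `titleStmt`, `publicationStmt`, and `sourceDesc`) and one `<pb/>` page-break milestone per page. The function names, the publication and source statements, and the sample paragraphs are hypothetical placeholders:

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace


def q(tag):
    """Qualify a tag name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def tei_header(title, source_note):
    """One-time 'preliminary data': the minimal TEI P5 header."""
    header = ET.Element(q("teiHeader"))
    file_desc = ET.SubElement(header, q("fileDesc"))
    title_stmt = ET.SubElement(file_desc, q("titleStmt"))
    ET.SubElement(title_stmt, q("title")).text = title
    pub_stmt = ET.SubElement(file_desc, q("publicationStmt"))
    ET.SubElement(pub_stmt, q("p")).text = "Unpublished digital edition."
    source_desc = ET.SubElement(file_desc, q("sourceDesc"))
    ET.SubElement(source_desc, q("p")).text = source_note
    return header


def build_tei(title, source_note, pages):
    """pages: list of (page_number, [paragraph strings]) from the corrected transcription."""
    tei = ET.Element(q("TEI"))
    tei.append(tei_header(title, source_note))
    body = ET.SubElement(ET.SubElement(tei, q("text")), q("body"))
    for number, paragraphs in pages:
        ET.SubElement(body, q("pb"), n=str(number))  # marks where the page begins
        for para in paragraphs:
            ET.SubElement(body, q("p")).text = para
    return tei


doc = build_tei(
    "Militiaman's Guide",
    "Transcribed from a digital photograph of the printed original.",
    [(1, ["Placeholder text of page 1."]), (2, ["Placeholder text of page 2."])],
)
print(ET.tostring(doc, encoding="unicode"))
```

Generating the header once and appending pages in a loop mirrors the split between the one-time overhead and the per-page cost.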

Phase II: Capture (cont.)

[Figure: Methodology Flow Chart, with legend]

Phase II: Capture (cont.)

Labor Estimates

Militiaman’s Guide: 155 pages total, typeset text, fair condition
Total: ~40 hours (optimal) / ~5 GB

Per-Page Estimates
• Photography: ~30 sec, 2.5 MB @ 5 Mpxl
• .tiff conversion: ~3 min, 23 MB
• .pdf conversion: ~1 min, 300 KB
• OCR: ~45 sec
• Error correction/transcription: ~5 min
• TEI: ~5 min (~45 min overhead)

Case Estimates
• Photography: ~1 hr 15 min, ~400 MB
• .tiff conversion: ~7 hrs 45 min, 3.5 GB
• .pdf conversion: ~2 hrs 30 min, 50 MB
• OCR: ~2 hrs
• Error correction/transcription: ~13 hrs
• TEI: ~14 hrs
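A sketch that scales the per-page estimates to the 155-page case follows. The rates come from the slide; treating MB and GB as decimal units is an assumption. It reproduces the case estimates to within the slide's rounding (7.75 h for .tiff conversion, about 40 h in total) and gives about 4 GB of files, so the 5 GB figure presumably includes some headroom. The last lines check that the ~23 MB per-page .tiff is consistent with the uncompressed 1600 × 2500 px, RGB, 16-bit canvas set in the image-treatment steps: 24,000,000 bytes, or about 22.9 MiB.

```python
# Per-page estimates from the slide: (task, seconds per page, bytes per page).
# Decimal units (1 MB = 10**6 bytes) are an assumption; the slides do not say.
PER_PAGE = [
    ("Photography", 30, 2.5e6),
    (".tiff conversion", 3 * 60, 23e6),
    (".pdf conversion", 60, 300e3),
    ("OCR", 45, 0),
    ("Error correction/transcription", 5 * 60, 0),
    ("TEI", 5 * 60, 0),
]
TEI_OVERHEAD_S = 45 * 60  # one-time TEI "preliminary data"
PAGES = 155               # Militiaman's Guide


def case_estimate(pages):
    """Scale per-page rates to a whole case; returns (task, hours, gigabytes) rows."""
    rows = []
    for task, secs, size in PER_PAGE:
        total_s = secs * pages + (TEI_OVERHEAD_S if task == "TEI" else 0)
        rows.append((task, total_s / 3600, size * pages / 1e9))
    return rows


rows = case_estimate(PAGES)
for task, hours, gb in rows:
    print(f"{task:32s} {hours:6.2f} h {gb:6.2f} GB")
total_h = sum(h for _, h, _ in rows)
total_gb = sum(g for _, _, g in rows)
print(f"{'Total':32s} {total_h:6.2f} h {total_gb:6.2f} GB")

# Uncompressed size of the File > New canvas: 1600 x 2500 px, 3 channels, 2 bytes each.
canvas_bytes = 1600 * 2500 * 3 * 2
print(f"Canvas: {canvas_bytes:,} bytes = {canvas_bytes / 2**20:.1f} MiB")
```

Changing `PAGES` gives a first-order estimate for other documents of similar condition.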

Equipment Baseline

• Consumer-grade HP 5 Mpxl digital camera ($125)
• Slightly above consumer-grade PC ($1,100)
  – 4 GB RAM
  – 1 GB VRAM
  – 500 GB SATA HD
  – Dual screens
• Consumer software ($600)
  – Adobe Creative Suite 3

Lessons Learned

• Use a tripod/mount
• Use consistent lighting
• Safely flatten pages as much as possible
• Use a mounting frame
• Use the highest resolution available
• OCR is NOT reliable
• Need an efficient method for TEI

Phase III: Web-Access

This phase is the subject of this grant funding request. A team of professional developers will construct a suitable multimedia database for storing and accessing original artifact captures, distributable .pdf versions, and XML-based data and metadata derived from the original. The team will also develop a working prototype website to access the data. Data archiving and disaster recovery will be fundamental to this phase. Successful conclusion of this phase will yield a working version 1.0 available for release and continued development.
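The grant request does not specify a database design. As a minimal sketch, assuming SQLite and hypothetical table and column names (`artifact`, `page_file`, `kind`, `sha256`), the three kinds of material listed above (original captures, distributable .pdf versions, and XML data) can share one table. A SHA-256 checksum recorded for each file lets archived copies be checked for silent corruption (a fixity check), and SQLite's online backup API stands in for the disaster-recovery copy. Storing files as BLOBs keeps the sketch self-contained; a production system might instead store file paths.

```python
import hashlib
import sqlite3

# Hypothetical schema: one row per artifact, one row per stored file.
SCHEMA = """
CREATE TABLE artifact (
    id                 INTEGER PRIMARY KEY,
    title              TEXT NOT NULL,
    page_count         INTEGER,
    physical_condition TEXT
);
CREATE TABLE page_file (
    id          INTEGER PRIMARY KEY,
    artifact_id INTEGER NOT NULL REFERENCES artifact(id),
    page        INTEGER NOT NULL,
    kind        TEXT NOT NULL CHECK (kind IN ('capture', 'distributable', 'tei')),
    media_type  TEXT NOT NULL,
    sha256      TEXT NOT NULL,  -- checksum recorded at ingest, used for fixity checks
    content     BLOB NOT NULL
);
"""


def add_file(db, artifact_id, page, kind, media_type, content):
    """Store one file together with its SHA-256 checksum."""
    digest = hashlib.sha256(content).hexdigest()
    db.execute(
        "INSERT INTO page_file (artifact_id, page, kind, media_type, sha256, content)"
        " VALUES (?, ?, ?, ?, ?, ?)",
        (artifact_id, page, kind, media_type, digest, content),
    )


def fixity_failures(db):
    """Return ids of files whose bytes no longer match the checksum recorded at ingest."""
    rows = db.execute("SELECT id, sha256, content FROM page_file ORDER BY id")
    return [row_id for row_id, digest, content in rows
            if hashlib.sha256(content).hexdigest() != digest]


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.execute(
    "INSERT INTO artifact (id, title, page_count, physical_condition) VALUES (1, ?, 155, 'fair')",
    ("Militiaman's Guide",),
)
# Placeholder bytes stand in for a real .tiff capture, .pdf, and TEI file.
add_file(db, 1, 1, "capture", "image/tiff", b"placeholder tiff bytes")
add_file(db, 1, 1, "distributable", "application/pdf", b"placeholder pdf bytes")
add_file(db, 1, 1, "tei", "application/tei+xml", b"<TEI/>")
db.commit()
print("fixity failures:", fixity_failures(db))

# Disaster-recovery copy via sqlite3's online backup API (a file on other media in practice).
backup = sqlite3.connect(":memory:")
db.backup(backup)
print("files in backup:", backup.execute("SELECT COUNT(*) FROM page_file").fetchone()[0])
```

Running `fixity_failures` against both the primary database and its backup on a schedule is one way to detect which copy to restore from.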

Phase III: Web-Access (cont.)

[Figure: Flow Chart]

Phase III: Web-Access (cont.)

Work Breakdown Structure

• Database Development
• Prototype Evaluation
• Prototype Web Development
• Alpha: Test & Mod
• Beta: Test & Mod
• RC1: Test & Mod
• v1.0
• Documentation
• Disaster Recovery Testing

Estimated Cost: $52,000

Phase III: Web-Access (cont.)

[Figure: Project Gantt Chart]

Phase IV: Initial Expansion

Beyond the scope of this grant request, this phase seeks to develop partnerships and data shares with multiple institutions that have similar projects in development or production. The level of participation will directly influence the scale of this phase. It is anticipated that the minimal costs will be shared across participating institutions.

Phase IV: Initial Expansion (cont.)

Work Breakdown Structure

• Conduct Lifecycle Management Review
• Documentation
• Disaster Recovery Testing
• Publish Methodology
• Find Partners
• Large Scale Capture
• Leverage v1.0
• Update Code and Processes

Estimated Cost: $8,000

Phase V: Infinite Expansion

Optionally, and depending on the success of the earlier phases, this phase will greatly expand collaborative efforts by potentially making this capability available to amateur and resource-constrained archivists and historians. It will provide a standards-based methodology and data capture technique, along with a collaborative platform for sharing the data once stored. This aspect of the final phase will be limited only by technology maintenance and scalability costs.

Phase V: Infinite Expansion (cont.)

Work Breakdown Structure

• Publish Updated Methodology
• Publish Membership Schema
• Open Data Models
• Leverage Current Version
• Conduct Lifecycle Management Review
• Documentation
• Disaster Recovery Testing
• Release New Version(s)

Estimated Cost: $82,000

Summary

Project Summary

• 5-phase approach
• “How-To”
  – Digitization
  – TEI
  – Manage the project
• Sets the stage
  – Broad/ambitious goals and plan
  – Manageable pieces
  – Flexible optionality

Grant Request / Funding Summary

• Phase III support: $51,733.33
  – Prototype validation
  – Database development
  – Web development
  – Hosting
  – Disaster recovery
• Phase IV and V templates
  – Future expansion as desired
  – Flexible planning
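The funding summary gives the Phase III request, while each phase's estimated cost appears on its work breakdown structure slide. A small roll-up of those figures follows (a sketch; the slides themselves do not state a combined total). The five estimates sum to $149,000, of which $7,000 is for the two phases already completed, and the $51,733.33 request corresponds to the rounded $52,000 Phase III estimate.

```python
# Estimated cost per phase, from the work breakdown structure slides.
PHASE_COSTS = {
    "Phase I: Prototype": 5_000,
    "Phase II: Capture": 2_000,
    "Phase III: Web-Access": 52_000,
    "Phase IV: Initial Expansion": 8_000,
    "Phase V: Infinite Expansion": 82_000,
}
GRANT_REQUEST = 51_733.33  # Phase III support, from the funding summary

total = sum(PHASE_COSTS.values())
completed = PHASE_COSTS["Phase I: Prototype"] + PHASE_COSTS["Phase II: Capture"]

for phase, cost in PHASE_COSTS.items():
    print(f"{phase:28s} ${cost:>7,}  {cost / total:6.1%}")
print(f"{'All five phases':28s} ${total:>7,}")
print(f"Completed (Phases I and II): ${completed:,}")
print(f"Phase III request ${GRANT_REQUEST:,.2f} rounds to ${round(GRANT_REQUEST, -3):,.0f}")
```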

QUESTIONS

CONCLUSION

Dead Guy Quote

“Man had always assumed that he was more intelligent than dolphins because he had achieved so much... the wheel, New York, wars, and so on, whilst all the dolphins had ever done was muck about in the water having a good time. But conversely the dolphins believed themselves to be more intelligent than man for precisely the same reasons.”

- Douglas Adams

BACKUP

Phase I: Prototype (cont.)

Work Breakdown Structure

• Image Capture
• Image Preservation
• Image Manipulation
• Database Development
• TEI Process Development
• Data Development
• Static Web-Page Prototyping
• Documentation
• Disaster Recovery Testing

Estimated Cost: $5,000

Phase I: Prototype (cont.)

[Figure: Gantt Chart]

Phase II: Capture (cont.)

Work Breakdown Structure

• Image Capture
• TEI
• Prototype Database Input
• Documentation
• Disaster Recovery Testing

Estimated Cost: $2,000

Phase II: Capture (cont.)

[Figure: Gantt Chart]