Digital CVA: Status Report

Post on 28-Jul-2020


Page 1: Digital CVA: Status Report · Digital CVA: Getting Up To Date • If electronic files were available from the publisher. There would only be the saving of 2 hours scanning per CVA

Digital CVA: Status Report

• Overview of www.cvaonline.org

• Processing CVAs

• Current status

• Getting up to date

Page 2

Digital CVA: Status Report

• Overview of www.cvaonline.org

• Processing CVAs

• Current status

• Getting up to date

Page 3

CVA Processing ‐ Images

Flow diagram (box labels): CVA Textual Pages and Plates; Scan; Crop & Single Image; Storage; Backup; HFS; Convert to JTIP; Convert to SPIFF; Add Watermark

Page 4

CVA Processing – Textual Data

Flow diagram (box labels): CVA Textual Pages; Entering of data; Linking with images; Checking and Correcting; Release to public site

Page 5

Digital CVA: Current Status

• Number of fascicules processed and available on the website: 322

• Number of fascicules processed and held back for release: 14

• Number of CVA records in database: 44,200.

• Number of images: 130,000 (4,000 GB).

Page 6

Digital CVA: Getting Up To Date

• Project ended 2004.

• Number of fascicules published since the end of the project: 36

• Number of fascicules in press: 2

Page 7

Digital CVA: Getting Up To Date

• Estimated processing time per CVA:

– Scanning: 2 hours

– Image cropping & single imaging: 6 hours

– Data entry (including searching for existing records): 3 days

– Processing images and linking: 1 day (1 hour interactive)

– Checking & correcting data: 1 day

• Total: 1 week per CVA

• Estimated 1 year for 40 CVAs, including bringing the website up to date.
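The estimate above can be checked with a little arithmetic. The sketch below assumes an 8-hour working day (not stated in the slides) and treats "1 day (1 hour interactive)" as one elapsed day with one hour of staff time. The stage figures come from the list above.

```python
HOURS_PER_DAY = 8  # assumption: length of a working day, not stated in the slides

# (elapsed hours, hands-on staff hours) per stage, taken from the list above
STAGES = {
    "scanning": (2, 2),
    "image cropping & single imaging": (6, 6),
    "data entry": (3 * HOURS_PER_DAY, 3 * HOURS_PER_DAY),
    # 1 day elapsed, of which only 1 hour is interactive
    "processing images and linking": (1 * HOURS_PER_DAY, 1),
    "checking & correcting": (1 * HOURS_PER_DAY, 1 * HOURS_PER_DAY),
}

elapsed_hours = sum(elapsed for elapsed, _ in STAGES.values())
staff_hours = sum(staff for _, staff in STAGES.values())
elapsed_days = elapsed_hours / HOURS_PER_DAY
staff_days = staff_hours / HOURS_PER_DAY

# Backlog from the previous slide: published since 2004 plus in press
backlog = 36 + 2

print(f"elapsed: {elapsed_hours} h ({elapsed_days} days)")
print(f"hands-on: {staff_hours} h ({staff_days} days)")
print(f"backlog: {backlog} fascicules")
```

Under these assumptions the hands-on effort is 41 hours, just over five working days, which is consistent with the quoted 1 week per CVA. The backlog of 38 fascicules is close to the 40 CVAs used in the one-year estimate.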

Page 8

Digital CVA: Getting Up To Date

• If electronic files were available from the publisher, the only saving would be the 2 hours of scanning per CVA. The other stages would still be required: the language and naming conventions differ from one CVA to the next, so manual data entry would still be needed.

• However, this would be preferred, as the quality of the images should improve and it would become possible to search all of the textual content (not just the keywords).

Page 10

Digital CVA: Updating

• Participants can be given accounts to update records relating to their own collections.

• The core data is part of the main Beazley Archive pottery database, so only certain fields will be editable by remote users.

• Participants can also add any number of records.

• Participants can add images to existing or new records.

• Records will then be available to the public for viewing.
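The restriction to "certain fields" and to a participant's own collection could be enforced along the following lines. This is only a sketch: the slides do not name the editable fields or describe the account model, so every field name and the `apply_remote_update` helper are hypothetical.

```python
# Hypothetical field names; the slides do not list which fields are editable.
REMOTE_EDITABLE_FIELDS = {"inventory_number", "bibliography", "image_refs"}


def apply_remote_update(record, changes, user_collection):
    """Return a copy of `record` with `changes` applied, or raise PermissionError.

    A remote user may only edit records from their own collection, and only
    the fields listed in REMOTE_EDITABLE_FIELDS. Core fields stay untouched.
    """
    if record["collection"] != user_collection:
        raise PermissionError("record belongs to another collection")
    rejected = sorted(set(changes) - REMOTE_EDITABLE_FIELDS)
    if rejected:
        raise PermissionError(f"fields not editable remotely: {rejected}")
    updated = dict(record)  # leave the original record unchanged
    updated.update(changes)
    return updated


# Illustrative record, not real archive data
record = {
    "collection": "Museum A",
    "shape": "neck-amphora",  # core field in this sketch
    "inventory_number": "A 1",
}
updated = apply_remote_update(record, {"inventory_number": "A 1b"}, "Museum A")
```

An attempt to change `shape`, or to edit a record from a different collection, raises `PermissionError` in this sketch.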

Page 20

CLAROS

• First show how we linked to Henry Immerwahr’s Corpus of Attic Vase Inscriptions.

• Immerwahr documented around 8,500 Attic vases with inscriptions.

Page 25

Data

• University of Oxford – Beazley Archive

– Electronic documentation started 1979.

– 150,000 pottery records and 130,000 images

– 50,000 engraved gem and cameo records and 30,000 images

– 900 plaster cast records (classical sculpture) and 1,000 images

– 900 antiquarian photographs

• University of Oxford – Lexicon of Greek Personal Names

– Electronic documentation started 1975.

– 400,000 recorded individuals. Over 35,000 unique personal names.

• University of Cologne – Research Sculpture Archive

– Electronic documentation started 1972.

– 250,000 sculpture records, 490,000 images.

Page 26

Data

• German Archaeological Institute

– 1,500,000 photographs

• University of Paris X ‐ Lexicon Iconographicum Mythologiae Classicae

– Created 1972.

– 100,000 records, 180,000 images of mythological and religious iconography from 2,000 museums and collections.

• Total

– 2 million records and images

Page 27

Databases

• Beazley Archive: ‘XDB’ – XML data, SQL Server database, ASP front end.

• Cologne Research Archive and German Archaeological Institute: ‘Arachne’ – MySQL database, PHP front end.

• LIMC: MySQL database, PHP front end.

• LGPN: Ingres relational database, also available as an eXist XML database serving TEI‐XML data. XQuery front end.

Page 28

Databases

• The variety of databases demonstrates how other partners can join with relatively little effort.

• No changes required to existing databases or programs.

• Interchange of data is achieved by export of underlying data to CIDOC‐CRM.
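To illustrate what an export to CIDOC‐CRM might look like, the sketch below turns one flat record into subject/predicate/object triples. The record fields, the `http://example.org/` base URI and the `record_to_triples` helper are assumptions, not the partners' actual export. The class and property names (E22, E42, E55, P1, P2) follow an older CRM release and should be checked against the version in use.

```python
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def record_to_triples(record, base_uri):
    """Map a flat record (hypothetical fields 'id' and 'shape') to CRM-style triples."""
    obj = f"{base_uri}object/{record['id']}"
    ident = f"{obj}/identifier"
    shape = f"{base_uri}type/{record['shape'].lower().replace(' ', '-')}"
    return [
        # The vase itself is a physical man-made object
        (obj, RDF_TYPE, CRM + "E22_Man-Made_Object"),
        # Its database number is modelled as an identifier
        (obj, CRM + "P1_is_identified_by", ident),
        (ident, RDF_TYPE, CRM + "E42_Identifier"),
        (ident, RDFS_LABEL, record["id"]),
        # Its shape is modelled as a type
        (obj, CRM + "P2_has_type", shape),
        (shape, RDF_TYPE, CRM + "E55_Type"),
        (shape, RDFS_LABEL, record["shape"]),
    ]


# Illustrative record, not real archive data
triples = record_to_triples({"id": "12345", "shape": "Neck Amphora"}, "http://example.org/")
for s, p, o in triples:
    print(s, p, o)
```

Because each partner only has to produce triples like these from its own database, the source systems themselves can stay unchanged.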

Page 29

CLAROS Explorer

• Each partner can integrate CLAROS data from the other partners using its own programming platform.

• As an example, the Beazley Archive set up the CLAROS Explorer, which you have just seen demonstrated, to show what is possible.

Diagram (labels): CLAROS; Search Request; ASP; Returned Data; Beazley Server; CLAROS Server; Format; User’s Web Browser; JavaScript; Data; User Interaction

Page 30

CLAROS System Components

Page 31

Combining Disparate Data

Page 32

Technicalities

Page 33

Image Recognition

Diagram (labels): visually defined query “What is this?”; Data; answer “It is an amphora… and here are similar objects in the archive”

Page 34

Shape Representation

Diagram (stages): original image; foreground separation; silhouette representation; vector x = (x1, x2, …, xn)

• No representation of patterns or surface markings

• 100-dimensional “vase shape space”
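The slides do not say how the silhouette is turned into a vector. One simple possibility, shown here purely as an assumption, is to sample the normalised silhouette width at a fixed number of heights, which gives a fixed-length vector that ignores patterns and surface markings. The `silhouette_to_vector` helper and the toy mask are illustrative only.

```python
def silhouette_to_vector(mask, n=100):
    """Sample the normalised silhouette width at n evenly spaced heights.

    `mask` is a list of rows of 0/1 values (1 = foreground) after
    foreground separation. This is an illustrative encoding, not the
    method used in the project.
    """
    rows = [row for row in mask if any(row)]  # keep rows that contain foreground
    if not rows:
        raise ValueError("mask contains no foreground pixels")
    max_width = max(sum(row) for row in rows)
    vector = []
    for i in range(n):
        # Pick the source row that corresponds to the i-th sample height
        src = rows[min(len(rows) - 1, i * len(rows) // n)]
        vector.append(sum(src) / max_width)
    return vector


# Toy silhouette: narrow neck, wide body, narrow foot
toy_mask = [
    [0, 0, 0],
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
]
shape_vector = silhouette_to_vector(toy_mask, n=100)
```

With `n=100` every silhouette, whatever its pixel size, becomes a point in a 100-dimensional space, so shapes can be compared by distance.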

Page 35

Shape Representation ‐ details

silhouette (both sides)

handles

Page 36

Vase shape space

Page 37

Compute three nearest neighbours for each vase

Diagram (labels): query; classify shape; all three are neck-amphorae; “vase shape space”

“judge me by the company I keep”
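The nearest-neighbour step can be sketched as a plain 3-nearest-neighbour vote. The vectors below are made-up 2-dimensional stand-ins for the 100-dimensional shape vectors, and Euclidean distance is an assumption: the slides do not state which distance measure is used.

```python
import math
from collections import Counter


def nearest_neighbours(query, archive, k=3):
    """Return the k archive entries (label, vector) closest to `query`."""
    ranked = sorted(archive, key=lambda item: math.dist(query, item[1]))
    return ranked[:k]


def classify_shape(query, archive, k=3):
    """Label the query with the most common shape among its k nearest neighbours."""
    neighbours = nearest_neighbours(query, archive, k)
    votes = Counter(label for label, _ in neighbours)
    label, _count = votes.most_common(1)[0]
    return label, neighbours


# Illustrative data only: (shape label, toy shape vector)
ARCHIVE = [
    ("neck-amphora", (0.30, 0.90)),
    ("neck-amphora", (0.32, 0.88)),
    ("neck-amphora", (0.28, 0.91)),
    ("kylix", (0.95, 0.20)),
    ("kylix", (0.90, 0.25)),
    ("lekythos", (0.15, 0.60)),
]

label, neighbours = classify_shape((0.31, 0.89), ARCHIVE)
print(label, [name for name, _ in neighbours])
```

Here all three neighbours of the query are neck-amphorae, so the query is classified as a neck-amphora. The same neighbours double as the "similar objects in the archive".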

Page 38

Step by step demonstration: Step 2: classify shape

Page 39

Step by step demonstration: Step 3: matches in CLAROS

Page 41

CLAROS Explorer – Further Development

• Linked Multi‐lingual Thesauri

– HERAKLES ‐> HERCULES, HÉRACLÈS, ERACLE, HERACLES

• Hierarchy of place names / geo‐coordinates

– GREECE ‐> ATTICA ‐> ATHENS ‐> ACROPOLIS
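Both ideas reduce to simple lookups, sketched below using only the terms from the slide. The data structures and the `canonical_term` and `broader_places` helpers are assumptions about how such thesauri might be held, not a description of the planned CLAROS implementation.

```python
# Preferred term -> variants in other languages (terms taken from the slide)
THESAURUS = {
    "HERAKLES": ["HERCULES", "HÉRACLÈS", "ERACLE", "HERACLES"],
}

# Place -> broader place (hierarchy taken from the slide)
BROADER = {
    "ACROPOLIS": "ATHENS",
    "ATHENS": "ATTICA",
    "ATTICA": "GREECE",
}

# Reverse index so that any variant, in any letter case, finds the preferred term
_INDEX = {}
for preferred, variants in THESAURUS.items():
    for term in [preferred, *variants]:
        _INDEX[term.casefold()] = preferred


def canonical_term(term):
    """Return the preferred thesaurus term for `term`, or None if unknown."""
    return _INDEX.get(term.casefold())


def broader_places(place):
    """Return all broader places for `place`, nearest first."""
    chain = []
    current = place.upper()
    while current in BROADER:
        current = BROADER[current]
        chain.append(current)
    return chain


print(canonical_term("Hercules"))
print(broader_places("Acropolis"))
```

A search for any listed variant can then be run against the preferred term, and a search for GREECE can be widened to records whose place has GREECE among its broader places.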
