
Upload: impact-centre-of-competence

Post on 05-Jul-2015

DESCRIPTION

Succeed WP3 Validation and Take-up of Tools at the "Succeed in Digitisation. Spreading Excellence" Conference.

TRANSCRIPT

Page 1: Succeed Validation and Take up of Tools - Katrien Depuydt

Succeed WP3 – Validation and take-up of tools

Katrien Depuydt (INL), Stefan Eickeler and Sebastian Kirch (IAIS)

Succeed is supported by the European Union under FP7-ICT and coordinated by Universidad de Alicante.

Page 2: Succeed Validation and Take up of Tools - Katrien Depuydt

Objectives

Many tools and linguistic resources have been developed in research and development programmes supporting the digitisation of cultural heritage.

Still, too few of them are used in productive environments.

Succeed’s approach to support the take-up of these tools:

1. Identify existing tools and resources

2. Identify libraries willing to use and evaluate tools

3. Define criteria to validate and evaluate tools

4. Provide training material for tools

5. Provide support to libraries using and evaluating tools

6. Blueprint for validation and take-up of tools

Page 3: Succeed Validation and Take up of Tools - Katrien Depuydt

Survey of tools

Training material

Evaluation

Page 4: Succeed Validation and Take up of Tools - Katrien Depuydt

1. SURVEY AND SELECTION OF TOOLS

Page 5: Succeed Validation and Take up of Tools - Katrien Depuydt

Survey of tools

Brief description and goals

Produce a survey of existing tools, ground truth data and lexicon data for digitisation

Select candidate tools for implementation at cultural heritage institutions

Page 6: Succeed Validation and Take up of Tools - Katrien Depuydt

Survey of tools

Methodology used to achieve the objectives

1. Taxonomy for categorisation based on a simplified digitisation workflow

2. Definition of attributes, e.g. how a tool can be used in the digitisation process

3. Online spreadsheet to collect and organise tools

4. Assessment and further selection into a shortlist of tools
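
The four steps above can be pictured as a small catalogue structure. The following is only a minimal sketch under stated assumptions: the stage names, the `ToolRecord` fields, the 0-5 score scale and the threshold are all hypothetical, since the slides name neither the taxonomy categories nor the assessment scale, and the tools are placeholders rather than entries from the actual survey.

```python
from dataclasses import dataclass, field

# Hypothetical stages: the slides only say the taxonomy follows
# "a simplified digitisation workflow" without listing its steps.
WORKFLOW_STAGES = ("image enhancement", "OCR", "post-correction", "enrichment")


@dataclass
class ToolRecord:
    name: str
    stage: str   # taxonomy category (step 1)
    score: int   # assessment result on an assumed 0-5 scale (step 4)
    attributes: dict = field(default_factory=dict)  # free-form attributes (step 2)


def shortlist(tools, min_score=3):
    """Return {stage: [tool names]} for tools scoring at least min_score."""
    selected = {}
    for tool in tools:
        if tool.stage not in WORKFLOW_STAGES:
            raise ValueError(f"unknown workflow stage: {tool.stage!r}")
        if tool.score >= min_score:
            selected.setdefault(tool.stage, []).append(tool.name)
    return selected


# Placeholder entries standing in for rows of the online spreadsheet (step 3).
catalogue = [
    ToolRecord("Tool A", "OCR", 4, {"licence": "open source"}),
    ToolRecord("Tool B", "OCR", 2),
    ToolRecord("Tool C", "post-correction", 5, {"licence": "commercial"}),
]

print(shortlist(catalogue))
```

Grouping the shortlist by workflow stage mirrors the idea of a categorised list; a real assessment would presumably weigh several criteria rather than a single score.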

Page 7: Succeed Validation and Take up of Tools - Katrien Depuydt

Selection of tools

First selection: knock-out criteria (three steps)

Further selection: (expertise partners)

Page 8: Succeed Validation and Take up of Tools - Katrien Depuydt

Task 1 Survey of tools

Summary of outcomes

Categorised list of 213 research and commercial tools

Available in an online database and frequently updated

Shortlist with the most relevant tools based on a quality assessment

An overview of existing ground truth material and lexicon data has been produced.

http://impact.dlsi.ua.es/digitisation/tools-resources/tools-for-text-digitisation/

Page 9: Succeed Validation and Take up of Tools - Katrien Depuydt

2. VALIDATION PARAMETERS

Page 10: Succeed Validation and Take up of Tools - Katrien Depuydt

1st Project Review – WP3

Validation parameters

Brief description and goals

Define validation parameters and procedures for the implementation of tools in productive environments (per task carried out by using a tool)

Validate each tool (or group of tools) based on these criteria

Work out evaluation work plans and test scenarios in cooperation with libraries based on their requirements

Page 11: Succeed Validation and Take up of Tools - Katrien Depuydt

Validation parameters

Methodology used to achieve the objectives

1. Definition of evaluation template structure

2. Tool selection by libraries

3. Creation and compilation of evaluation material: separate evaluation forms per task/tool type and a common usability evaluation form

4. Distribution of evaluation material to participating libraries

Page 12: Succeed Validation and Take up of Tools - Katrien Depuydt

1st Project Review – WP3

Validation parameters

Summary of outcomes

Described evaluation procedures and produced 9 evaluation forms, one per task

Worked out evaluation and test scenarios as a “work plan” together with the participating libraries

Blueprint for take-up and validation

Page 13: Succeed Validation and Take up of Tools - Katrien Depuydt

3. TAKE-UP SUPPORT

Page 14: Succeed Validation and Take up of Tools - Katrien Depuydt

Take-up support

Brief description and goals

Support the integration, take-up and validation of digitisation tools and resources

Tool implementation at four participant libraries and nine external libraries (of 16 potential external libraries at the start of the project, 9 were retained)

Assistance for the adaptation/application of the tools to specific domains and/or languages

Page 15: Succeed Validation and Take up of Tools - Katrien Depuydt

Take-up support

Methodology used to achieve the objectives

1. Each library installs, on average, two tools and tests their performance and usability in a productive environment according to the predefined validation criteria

2. Some consortium libraries will test existing linguistic resources for enhancement of textual information retrieval

3. The technical partners (IAIS, INL, PSNC, UA) will provide online assistance for the adaptation of the tools to specific domains and languages

4. The technical partners will report on the results based on the information provided by the libraries

Page 16: Succeed Validation and Take up of Tools - Katrien Depuydt

External Libraries

Library | Country | Selected Tools
Wielkopolska Biblioteka Cyfrowa | Poland | Scan Tailor, JHOVE2, ImageMagick
General Historical Library of Salamanca | Spain | GIMP, OmniPage
Wroclaw University Library | Poland | Scan Tailor, Tesseract OCR
University Library of Bratislava | Slovak Republic | Scan Tailor, ImageMagick
National Library of Finland | Finland | Newspaper segmentation, Korrektor, Document Deskewer
Library of the University of Granada | Spain | Scan Tailor, Alchemy API
University Library of Leuven | Belgium | Abbyy FRE, NERT
University Library of Antwerp | Belgium | NE Attestation tool, NLTK (NE), Stanford (NE)
University Library of Darmstadt | Germany | Newspaper segmentation, Korrektor, Document Deskewer

Internal Libraries

Library | Country | Selected Tools
Biblioteca Virtual Miguel de Cervantes | Spain | Abbyy FRE, Geometric correction: Page Curl, COBaLT, Lexicon as Webservice
Bibliothèque nationale de France | France | DBPedia Spotlight, Evaluation Tool for OCR, Lexicon as Webservice
Koninklijke Bibliotheek | Netherlands | Lexicon as Webservice, NLTK, NERT
The British Library | United Kingdom | Evaluation Tool for OCR, Stanford (NE), Lexicon as Webservice

Take-up support

Summary of outcomes

Involved 9 external libraries in the project to perform tool evaluation, each of them committed to evaluating at least 2 tools

Collected libraries’ digitisation requirements

Consulted libraries in defining interesting use cases for evaluation

Provided remote assistance for the take-up of tools selected by the libraries

Page 17: Succeed Validation and Take up of Tools - Katrien Depuydt

Take-up support

Remote assistance for technical support: assistance for the integration and adaptation of the tools to specific domains, languages and use cases

Implementation studies (final report): elaboration of a blueprint on the validation and take-up process for tools and resources

Case studies produced from the implementation experiences

Page 18: Succeed Validation and Take up of Tools - Katrien Depuydt

4: TRAINING

Page 19: Succeed Validation and Take up of Tools - Katrien Depuydt

Training

Brief description and goals

Produce documentation and training material for the tools to be validated. This material must help the participating libraries take up the tools in their productive environments.

Provide training on specific tools to external stakeholders.

Organise on-site training workshops depending on libraries’ requirements

Page 20: Succeed Validation and Take up of Tools - Katrien Depuydt

Training

Methodology used to achieve the objectives

1. Document structure of training material

2. Tool selection by libraries

3. Distribution of work among WP3 partners according to their expertise and knowledge of the selected tools

4. Creation and compilation of training material

5. Distribution of training material to participating libraries

Page 21: Succeed Validation and Take up of Tools - Katrien Depuydt

Training

Summary of outcomes

Prepared training materials for 19 tools (separate document, online SCORM + DigitWiki)

Organised a TPDL tutorial that attracted digital library experts from around the world

Participated in hackathons

Page 22: Succeed Validation and Take up of Tools - Katrien Depuydt

5. CONCLUSIONS

Page 23: Succeed Validation and Take up of Tools - Katrien Depuydt

Conclusions

Evaluation work of each participating library

> Presentations!

Page 24: Succeed Validation and Take up of Tools - Katrien Depuydt

Conclusions

Blueprint for evaluation

General recommendations for evaluation by libraries:

a. Translate requirements into a detailed use case (including a detailed description of the data and data format)

b. Acquire or produce test data

c. Determine tools

d. Produce work plan

e. Verify use case with internal and external experts (tool providers, CoC)

   - If no test data can be produced, adapt the use case

   - If the plan breaks down into too many steps, adapt the use case

   - If necessary, change the tool selection

f. Document the evaluation (evaluation forms)

g. Use experienced technical staff

Page 25: Succeed Validation and Take up of Tools - Katrien Depuydt

Conclusions

Blueprint for evaluation

General recommendations for tool providers:

a. Provide a clear description of the purpose of the tool

b. Provide a clear description of the formats the tool can handle

c. Provide a clear description of the type of material the tool can handle with reasonable results; provide information on performance where possible

d. Provide a clear step-by-step description of the complete procedure that should be followed to get the best possible result, including training and tuning of parameters

e. Provide compact documentation if possible

f. Minimize interdependency of parts of documentation

Page 26: Succeed Validation and Take up of Tools - Katrien Depuydt

Thank you!
