Sharing repositories for editorial scholarship of digital texts.

The Pinakes 3.0 Open Source Project

http://pinakes.imss.fi.it

Andrea Bozzi - CNR-ILC, Pisa, Italy & Andrea Scotti - IMSS/FRD, Florence, Italy

The Marriage of Mercury and Philology: Problems and Outcomes in Digital Philology

Edinburgh 25-27 2008

Summary

1. Actors (a), objectives (b), state of the art (c) and current user group (d).
2. PKA: Powered Knowledge Architecture: overall view and general function of all modules (a-b).
3. PK Main: Dynamic modeling of schema/s and data (directly on the web).
4. PK Text: Methods and functionalities and subset modules (a-g).
5. Towards a possible co-ordination act for a European Consortium in Digital Humanistic & Linguistic Scholarship.

1. Actors, objectives, state of the art and future development (a).

Individuals:
• Author and project leader: Andrea Scotti (Fondazione Rinascimento Digitale, Florence; Istituto e Museo di Storia della Scienza, Florence)
• Co-authors and developers: Fabrizio Butini and Corrado Veser (Istituto e Museo di Storia della Scienza, Florence)
• Pinakes Text - Author: Andrea Bozzi (ILC - Consiglio Nazionale delle Ricerche, Pisa)
• Search engine, Pinakes Text, and PK Advanced Edition - Development: Paolo Ruffolo (ILC - Consiglio Nazionale delle Ricerche, Pisa) & Engineering Faculty for IT, Valeriano Sandrucci, Luca Romano, Dep. for Software Architecture and Validation, University of Florence
• Project duration: since 2005

Institutions:
1. The project is financially supported by the Fondazione Rinascimento Digitale, which was created as a non-profit institution within the framework of the activities of the Ente Cassa di Risparmio di Firenze.
2. The promoter is the Institute and Museum for the History of Science in Florence, in coordination with the Libraries Directorate of the Italian Ministry for Cultural Heritage.

1. Actors, objectives, state of the art and current user group (b).

1. The overall goal is both to facilitate an understanding of the epistemological relevance of computational methodology in humanities research and to offer, within that framework, a feasible way to deploy it across different disciplines.

2. This implies making available a public and customizable set of tools to describe, manage, structure and publish all kinds of information and research results concerning cultural heritage in general and humanities studies in particular.

3. To coordinate existing repositories and to develop a set of services that make research results accessible and portable, within a new perspective on authorship and intellectual property, using a generalized standard model of metadata description.

1. Actors, objectives, state of the art and current user group (c).

1. Pinakes 3.0 Base Edition, which includes:

a) PK Schema and Project administration: Alpha version published in 2007 and available on the home page. The Beta version will be published by April 2008.

b) PK dynamic Input interface: Alpha version published in 2007 and available on the home page. The Beta version will be published by April 2008.

c) PK Text: experimental version. The Alpha version will be available by April 2008.

2. Pinakes 3.0 Base Edition documentation and code have been published since 2006, both on the home page http://pinakes.imss.fi.it and on SourceForge.net. All code is currently also visible and accessible through the Italian National Observatory for Open Source at www.cnipa.it (the main Public Administration home page).

1. Actors, objectives, state of the art and current user group (d).

1. All Pinakes 2.0 projects published on the web since 1996 and visible at www.pinakes.org will be brought into the current version. The following have already been transferred and are undergoing significant testing:
a) Panopticon Lavoisier (works and life of Lavoisier)
b) Parnassus Scientiarum (The Waller Collection)
c) Theatre of Nature: works and life of Ulisse Aldrovandi

2. Candidates that have submitted a cooperation act are:
a) National Edition of G. Galilei, including iconography and scientific instruments - IMSS/MIBAC
b) Works of Dante Alighieri - SDI
c) LIZ - Letteratura Italiana Zanichelli, from 1350 to 1920 - University of Rome
d) Uffizi Library - Florence
e) Gabinetto Vieusseux - Palazzo Strozzi, Florence
f) University of Siena - all archaeological excavation and research data sets concerning Tuscany
g) The Medieval Philosophical Texts and manuscript catalogue - SISAL, Fondazione Franceschini, Florence

1. Actors, objectives, state of the art and current user group (d, continued).

From the end of 2007 and through 2008, a new set of tools produced in cooperation with other research bodies will be included in the Pinakes 3.0 OSI package. Among them:

• The morpho-syntactic analyzer engine produced within the cooperation activities of the "Padre Busa Center", Università Cattolica del Sacro Cuore, Milan, by Marco Passarotti (a TreeBank application), and the computational linguistics activities of Charles University in Prague (Dep. of IT & Linguistics). This model will be combined with that of TigerSearch by the University of Stuttgart, which is currently used by the Greek Language Analysis unit at Università Ca' Foscari, Venice, by Citti's group.

• The applications resulting from an EU project called "Beyond Text", which combine morpho-syntactic analysis with pattern recognition methods in order to make non-natural languages searchable as well (see later in this presentation).

2. PKA: Powered Knowledge Architecture: overall view and general function of all modules (a).

[Diagram: PKA - Architecture overview]

PKA node: a single node of Pinakes web services. It manages (n) projects and can be seen/referred to by other nodes.

PKMain
• Schema Admin: defines the KW domain; defines object relationships
• Dynamic Input Application: defines the data; sets relations between objects
• General Browser: publishes data from PKMain and from PK Text & sub-modules

Pinakes Text and sub-modules
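The slide names the PKMain modules but does not show how they interact. Here is a hypothetical illustration. It is not Pinakes code, and every class, field and URL in it is an assumption. A Schema Admin step is modelled as declaring object types, their fields, and the relationships allowed between them. The Dynamic Input Application accepts only data that conforms to that declaration, and one node manages several projects:

```python
from dataclasses import dataclass, field


@dataclass
class Schema:
    """Hypothetical project schema, as a Schema Admin step might define it."""
    object_types: dict[str, set[str]]     # type name -> allowed field names
    relations: set[tuple[str, str, str]]  # (source type, relation, target type)


@dataclass
class Project:
    name: str
    schema: Schema
    objects: dict[str, tuple[str, dict]] = field(default_factory=dict)  # id -> (type, data)
    links: list[tuple[str, str, str]] = field(default_factory=list)     # (source id, relation, target id)

    def add_object(self, obj_id: str, obj_type: str, data: dict) -> None:
        # Dynamic input: accept only types and fields declared in the schema.
        allowed = self.schema.object_types.get(obj_type)
        if allowed is None:
            raise ValueError(f"unknown object type: {obj_type}")
        extra = set(data) - allowed
        if extra:
            raise ValueError(f"fields not in schema: {sorted(extra)}")
        self.objects[obj_id] = (obj_type, data)

    def relate(self, src: str, relation: str, dst: str) -> None:
        # Accept a link only if the schema declares it for these two object types.
        key = (self.objects[src][0], relation, self.objects[dst][0])
        if key not in self.schema.relations:
            raise ValueError(f"relation not allowed: {key}")
        self.links.append((src, relation, dst))


@dataclass
class PinakesNode:
    """Hypothetical single node: manages n projects, referable by other nodes via its URL."""
    base_url: str
    projects: dict[str, Project] = field(default_factory=dict)

    def add_project(self, project: Project) -> None:
        self.projects[project.name] = project
```

The schema is kept as data rather than as fixed classes. That would let a project redefine its model on the web without code changes, which is one plausible reading of "dynamic modeling of schema/s and data".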

2. PKA: Powered Knowledge Architecture: overall view and general function of all modules (b).

[Diagram: Pinakes Text]

• TreeBank & BeyondText Engine: morpho-syntactic analyzer; non-natural language analyzer
• TEI Schema Admin: defines the data granularity; defines text relationships; defines text indices
• Dynamic Input Application: uses the defined tag(s); sets relations on text units; sets text -> image relations
• Query interface: publishes data from PKMain and from PK Text & sub-modules
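Below is a minimal sketch of how the TEI Schema Admin and the Dynamic Input Application might divide the work. It assumes that the schema fixes the set of usable tags and that text units are addressed by identifiers. `TextModel`, `ImageRegion` and the sample identifiers are hypothetical names, not part of Pinakes Text:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ImageRegion:
    image: str                      # hypothetical image identifier or file name
    box: tuple[int, int, int, int]  # x, y, width, height in pixels


@dataclass
class TextModel:
    """Hypothetical text project: the schema decides which tags may be used."""
    allowed_tags: set[str]
    units: dict[str, str] = field(default_factory=dict)                   # unit id -> text
    tags: list[tuple[str, str]] = field(default_factory=list)             # (unit id, tag)
    text_links: list[tuple[str, str, str]] = field(default_factory=list)  # (unit, relation, unit)
    image_links: list[tuple[str, ImageRegion]] = field(default_factory=list)

    def add_unit(self, unit_id: str, text: str) -> None:
        self.units[unit_id] = text

    def _check(self, unit_id: str) -> None:
        if unit_id not in self.units:
            raise KeyError(f"unknown text unit: {unit_id}")

    def tag(self, unit_id: str, tag: str) -> None:
        # The input application may only use tags defined by the schema admin.
        self._check(unit_id)
        if tag not in self.allowed_tags:
            raise ValueError(f"tag not defined in schema: {tag}")
        self.tags.append((unit_id, tag))

    def relate(self, src: str, relation: str, dst: str) -> None:
        # Text -> text relation, e.g. a variant reading pointing to its base line.
        self._check(src)
        self._check(dst)
        self.text_links.append((src, relation, dst))

    def link_image(self, unit_id: str, region: ImageRegion) -> None:
        # Text -> image relation, e.g. a transcribed line and its area on a page scan.
        self._check(unit_id)
        self.image_links.append((unit_id, region))
```

The size of a text unit (verse, line, word) is where "data granularity" would be decided. This sketch leaves it to the caller.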

4. PK Text: Methods and functionalities and subset modules (a).

[Diagram: Pinakes3 / PinakesText components]

• Text Browser and Search Engine
• DB: tag + position storage
• Input application: XML/TEI Loader
• CVS repository of digital objects
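The "tag + position storage" box suggests a stand-off approach. In that approach the loader separates the plain text from its markup and stores each tag with its character offsets. The sketch below shows the idea with Python's standard `xml.etree.ElementTree`. It is an assumption about how such a loader could work, not the Pinakes loader itself, and `load_standoff` and the sample fragment are hypothetical:

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    # "{http://www.tei-c.org/ns/1.0}persName" -> "persName"
    return tag.rsplit("}", 1)[-1]


def load_standoff(xml_text: str) -> tuple[str, list[tuple[str, int, int]]]:
    """Flatten an XML/TEI fragment into plain text plus (tag, start, end) character offsets."""
    root = ET.fromstring(xml_text)
    chunks: list[str] = []
    spans: list[tuple[str, int, int]] = []
    pos = 0

    def walk(elem: ET.Element) -> None:
        nonlocal pos
        start = pos
        if elem.text:
            chunks.append(elem.text)
            pos += len(elem.text)
        for child in elem:
            walk(child)
            if child.tail:  # text after the child's closing tag belongs to the parent
                chunks.append(child.tail)
                pos += len(child.tail)
        spans.append((local_name(elem.tag), start, pos))

    walk(root)
    # Order spans by start offset; at equal starts, outer (longer) spans come first.
    return "".join(chunks), sorted(spans, key=lambda s: (s[1], -s[2]))
```

For the hypothetical fragment `<p xmlns="http://www.tei-c.org/ns/1.0">Letter of <persName>Galileo</persName> to <persName>Kepler</persName>.</p>`, the loader returns the text `"Letter of Galileo to Kepler."` and the spans `[("p", 0, 28), ("persName", 10, 17), ("persName", 21, 27)]`. Each span becomes one row in the tag + position table. Because spans are stored separately from the text, overlapping annotations can coexist. The cost is that nested XML must be rebuilt when exporting TEI.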

4. PK Text: Methods and functionalities and subset modules (b). Main working interface

4. PK Text: Methods and functionalities and subset modules (c). Text variants finder
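The slide shows only the interface of the variants finder. One simple way to obtain variant readings is to align two witnesses word by word and report every stretch that does not match. It is sketched here with Python's `difflib`, a swapped-in technique that is not necessarily the one Pinakes uses. `find_variants` and the sample witness are hypothetical:

```python
import difflib


def find_variants(base: str, witness: str) -> list[tuple[int, str, str]]:
    """Return (word index in base, base reading, witness reading) for each difference."""
    a, b = base.split(), witness.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    variants = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":  # replace, delete or insert
            variants.append((i1, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return variants
```

Take the opening line of the Inferno and an invented witness spelling "camin". `find_variants("nel mezzo del cammin di nostra vita", "nel mezzo del camin di nostra vita")` returns `[(3, 'cammin', 'camin')]`. An omission shows up with an empty witness reading.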

4. PK Text: Methods and functionalities and subset modules (d). Word and position finder
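A word and position finder can be backed by an inverted index that maps each normalized word to every place it occurs. The following sketch assumes whole-word, case-insensitive matching and records both the word position and the character offset. `build_index`, `find` and the line identifiers are hypothetical:

```python
import re
from collections import defaultdict

WORD = re.compile(r"\w+")  # Unicode-aware: also matches accented letters


def build_index(texts: dict[str, str]) -> dict[str, list[tuple[str, int, int]]]:
    """Map each lower-cased word to (text id, word position, character offset)."""
    index: dict[str, list[tuple[str, int, int]]] = defaultdict(list)
    for text_id, text in texts.items():
        for n, match in enumerate(WORD.finditer(text)):
            index[match.group().lower()].append((text_id, n, match.start()))
    return dict(index)


def find(index: dict[str, list[tuple[str, int, int]]], word: str) -> list[tuple[str, int, int]]:
    """All occurrences of a word, or an empty list."""
    return index.get(word.lower(), [])
```

With the first two lines of the Inferno indexed as `{"Inf. I 1": "Nel mezzo del cammin di nostra vita", "Inf. I 2": "mi ritrovai per una selva oscura"}`, `find(index, "Selva")` returns `[("Inf. I 2", 4, 20)]`. The offset is what lets an interface highlight the hit in the displayed text.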

4. PK Text: Methods and functionalities and subset modules (e). Image selection, text selection, tagging and tag qualification

4. PK Text: Methods and functionalities and subset modules (f). Tag menu of sources and word values

4. PK Text: Methods and functionalities and subset modules (g). View of all transcriptions and variants, with tags included

4. PK Text: Methods and functionalities and subset modules (h). Sample of a TreeBank search/analytic result

• Syntactic dependency tree of a sentence of the Index Thomisticus
• Annotation tool: TrEd (UFAL, Prague)
• On top: the sentence
• Each word of the sentence corresponds to a node in the tree, and each word is associated with a lemma (e.g. producit/produco) and a number of tags concerning attributes such as tense, gender, number, etc.
• Each word is associated with an "afun" (analytic function): Obj, Sb, etc.
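The structure described above can be represented as follows: one node per word, each with a lemma, morphological tags, an afun and a governing node. This is a simplified, hypothetical model in the spirit of the analytical layer edited with TrEd, not TrEd's own file format. The three-word Latin sentence is invented for illustration and is not taken from the Index Thomisticus:

```python
from dataclasses import dataclass


@dataclass
class Word:
    index: int            # 1-based position in the sentence
    form: str             # word as written, e.g. "producit"
    lemma: str            # dictionary form, e.g. "produco"
    tags: dict[str, str]  # morphological attributes (tense, gender, number, ...)
    afun: str             # analytic function: Pred, Sb, Obj, ...
    head: int             # index of the governing word; 0 is the technical root


def children(sentence: list[Word], head: int) -> list[Word]:
    """Dependents of the given node, in sentence order."""
    return [w for w in sentence if w.head == head]


def render(sentence: list[Word], head: int = 0, depth: int = 0) -> list[str]:
    """Indented text view of the dependency tree, one line per word."""
    lines = []
    for w in children(sentence, head):
        lines.append("  " * depth + f"{w.form} ({w.lemma}, {w.afun})")
        lines.extend(render(sentence, w.index, depth + 1))
    return lines


# Invented example sentence, not from the Index Thomisticus.
sentence = [
    Word(1, "Deus", "deus", {"case": "nom", "number": "sg"}, "Sb", 2),
    Word(2, "producit", "produco", {"tense": "pres", "number": "sg"}, "Pred", 0),
    Word(3, "mundum", "mundus", {"case": "acc", "number": "sg", "gender": "masc"}, "Obj", 2),
]
```

`print("\n".join(render(sentence)))` shows the predicate at the top with its subject and object indented beneath it: `producit (produco, Pred)`, then `Deus (deus, Sb)` and `mundum (mundus, Obj)`.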

4. PK Text: Methods and functionalities and subset modules (i). Sample of a TreeBank Netgraph
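Treebank query tools such as Netgraph search for tree fragments whose nodes satisfy attribute constraints. The sketch below covers only the simplest case, a parent-child pair, and does not follow Netgraph's actual query syntax. `query`, the dictionary keys and the sample corpus are assumptions:

```python
def matches(node: dict, constraints: dict) -> bool:
    """True if every constrained attribute has the required value."""
    return all(node.get(key) == value for key, value in constraints.items())


def query(corpus: list[list[dict]], parent_c: dict, child_c: dict) -> list[tuple[int, str, str]]:
    """(sentence number, parent form, child form) for every edge matching both constraint sets."""
    hits = []
    for s_no, sentence in enumerate(corpus):
        by_id = {node["id"]: node for node in sentence}
        for child in sentence:
            parent = by_id.get(child["head"])  # None for the root's dependents (head 0)
            if parent is not None and matches(parent, parent_c) and matches(child, child_c):
                hits.append((s_no, parent["form"], child["form"]))
    return hits


# Invented two-sentence corpus; "head" 0 marks the root.
corpus = [
    [
        {"id": 1, "form": "Deus", "lemma": "deus", "afun": "Sb", "head": 2},
        {"id": 2, "form": "producit", "lemma": "produco", "afun": "Pred", "head": 0},
        {"id": 3, "form": "mundum", "lemma": "mundus", "afun": "Obj", "head": 2},
    ],
    [
        {"id": 1, "form": "homo", "lemma": "homo", "afun": "Sb", "head": 2},
        {"id": 2, "form": "facit", "lemma": "facio", "afun": "Pred", "head": 0},
        {"id": 3, "form": "domum", "lemma": "domus", "afun": "Obj", "head": 2},
    ],
]
```

`query(corpus, {"lemma": "produco"}, {"afun": "Obj"})` returns `[(0, "producit", "mundum")]`, that is, the objects of the verb *produco*. `query(corpus, {"afun": "Pred"}, {"afun": "Sb"})` lists the subject of every predicate in the corpus.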

Towards a possible co-ordination act for a European Consortium in Digital Humanistic & Linguistic Scholarship.

A list of key points to be discussed (a)

1. Why a consortium? Current repositories & projects:
a. The EU attempts and their misleading effect;
b. For whom and in which way;
c. The modeling methods and data re-factoring;
d. The languages and linguistic issues;
e. Statistical data: use, distribution, stand-alone repositories.

2. How to share and why share. Target and needs of research activity and production:
a. Interoperability and intellectual property standards;
b. Re-use and new data;
c. Short-term and long-term targets and sustainability;
d. Co-operative work and the snowball effect of intersecting knowledge.

Technological knowledge and practical issues.

Towards a possible co-ordination act for a European Consortium in Digital Humanistic & Linguistic Scholarship.

A list of key points to be discussed (b)

1. Foundational members and institutional references:
a. Institutions;
b. Research groups;
c. Projects;
d. Individuals.

2. Standards to be reached, their specialization and functional requirements:
a. DCI, TEI, OWL, RDF and raw data in UTF-8;
b. Relation between MARC21, MARCXML, DCI/TEI and non-conforming data;
c. Standard EU agencies (forthcoming);
d. Security, durability and data restoration.

3. Possible infrastructure: logical and physical structure/s.
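Assuming that "DCI" refers to Dublin Core, the relation between MARC21 and DC can be made concrete with a crosswalk. The sketch maps a handful of fields following the Library of Congress MARC-to-Dublin-Core crosswalk: 245 to title, 100 to creator, 260 to publisher and date, 650 to subject. The simplified record layout (tag mapped to a list of subfield dictionaries) and `marc_to_dc` are hypothetical. A real conversion would read MARCXML and handle many more fields:

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialize as dc:title, dc:creator, ...

# Small subset of the MARC-to-Dublin-Core crosswalk: (MARC tag, subfield) -> DC element.
CROSSWALK = {
    ("245", "a"): "title",
    ("100", "a"): "creator",
    ("260", "b"): "publisher",
    ("260", "c"): "date",
    ("650", "a"): "subject",
}


def marc_to_dc(record: dict[str, list[dict[str, str]]]) -> ET.Element:
    """Hypothetical record layout: MARC tag -> list of fields, each {subfield code: value}."""
    root = ET.Element("metadata")
    for (tag, code), element in CROSSWALK.items():
        for fld in record.get(tag, []):
            if code in fld:
                child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
                # Drop trailing ISBD punctuation such as " /", ",", "." left in MARC data.
                child.text = fld[code].strip(" /:,.")
    return root
```

Running `marc_to_dc` on an illustrative record for *Sidereus nuncius* (245 $a, 100 $a, 260 $b/$c, 650 $a) yields `dc:title`, `dc:creator`, `dc:publisher`, `dc:date` and `dc:subject` elements. The table-driven design keeps the crosswalk editable without touching the conversion loop.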

Towards a possible co-ordination act for a European Consortium in Digital Humanistic & Linguistic Scholarship.

A list of key points to be discussed (c)

1. Possible services and actors:
a. Actors directory and OAI indexes;
b. Services and tools directory;
c. Activity reports and new perspectives.
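OAI indexes are typically built by harvesting repositories over the OAI-PMH protocol. Below is a minimal sketch of the two client-side steps. The first builds a `ListRecords` request; the second reads record identifiers, Dublin Core titles and the resumption token from the response. The base URL and helper names are hypothetical, and no network call is made:

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     resumption_token: Optional[str] = None) -> str:
    # In OAI-PMH a resumptionToken is an exclusive argument: it replaces metadataPrefix.
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text: str) -> tuple[list[tuple[str, list[str]]], Optional[str]]:
    """Return (identifier, Dublin Core titles) per record, plus the resumption token if any."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        titles = [t.text or "" for t in record.iterfind(".//dc:title", NS)]
        records.append((identifier, titles))
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, (token or None)  # an empty token means the list is complete
```

A harvester would loop: fetch `list_records_url(base)`, parse it, and keep requesting with the returned token until it is `None`. The collected identifiers and titles then feed the index.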