Special applications for Digital Libraries: computer-aided philological and linguistic analysis of digital documents
Istituto di Linguistica Computazionale – Pisa
Andrea Bozzi
NEH/CNR Meeting, Washington DC, October 5, 2007
Presentation contents
1. An EU-supported system for Greek papyrology;
2. A special application for browsing and searching demotic documents on ostraka;
3. A philological workstation for digital medieval manuscripts;
4. CHLT-LEMLAT (EC-NSF project) to perform lemmatization of Latin texts;
5. How to integrate all these modules in a web-based open source application.
1. An EU-supported system for Greek papyrology
The philological workstation: image and text transcription
Image segmentation and semi-automatic word linking
Annotations and critical apparatus
Wordforms list and specific indexes
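A word-form list with an index of occurrences can be sketched as follows. This is only a minimal illustration, not the workstation's actual implementation: the function names `build_wordform_index` and `wordform_list` are hypothetical, tokenization is a plain regular expression, and the sample text is the opening of Sallust's De coniuratione Catilinae.

```python
import re
from collections import defaultdict


def build_wordform_index(lines):
    """Map each lowercased word form to its (line, position) occurrences.

    Hypothetical helper: line and position numbers are 1-based.
    """
    index = defaultdict(list)
    for line_no, line in enumerate(lines, start=1):
        tokens = re.findall(r"\w+", line)
        for position, token in enumerate(tokens, start=1):
            index[token.lower()].append((line_no, position))
    return dict(index)


def wordform_list(index):
    """Return (word form, frequency) pairs in alphabetical order."""
    return sorted((form, len(places)) for form, places in index.items())


sample = [
    "Omnis homines, qui sese student praestare ceteris animalibus,",
    "summa ope niti decet",
]

index = build_wordform_index(sample)
for form, frequency in wordform_list(index):
    print(form, frequency, index[form])
```

Keeping the occurrence positions alongside each form is what would let an index entry point back to a place in the transcription; a real system would also need to handle abbreviations, editorial marks and non-Latin scripts.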
The web philological workstation to manage documents of the Istituto Papirologico Vitelli in Florence (restricted use)
2. A special application for browsing and searching demotic documents on ostraka
OMM 1381: E. Bresciani, S. Pernigotti, M.C. Betrò, Ostraka demotici da Narmuti, Pisa, 1983, pp. 16-18;
OMM 300: P. Gallo, Ostraca demotici e ieratici dall’archivio bilingue di Narmouthis, Pisa, 1997, pp. 113-114;
OMM 393: R. Pintaudi, P.J. Sijpesteijn, Ostraka greci da Narmuthis, Pisa, 1993, p. 40.
Special system for teaching and retrieving linguistic information from demotic texts on ostraka
The digital image archive and the table of demotic signs
Search results: the blue areas (see arrow) show where the selected symbol has been found
3. A philological workstation for digital medieval manuscripts
Textual criticism for medieval manuscripts
Link to the list of collated sources
Selection of the variant eixens
Evaluation of the variant reading in the collated source
Recording of the variant Eixens in the critical apparatus
Variant search across different early printed editions of the same work
Link to the list of collated books
Image of the corresponding page
4. CHLT-LEMLAT (EC-NSF project) to perform lemmatization of Latin texts
Lemmatization results (C. Sallustius Crispus, De coniuratione Catilinae, 1-2)
Lemmatization results of selected wordforms
5. How to integrate all these modules in a web-based open source application
Pinakes 3.0
http://pinakes.imss.fi.it
• Aim: a web-based open source application to manage historical cultural heritage data in digital format.
• Partners:
– Fondazione Rinascimento Digitale, Florence;
– Istituto e Museo della Storia della Scienza, Florence;
– Ministero per i Beni Culturali, Rome;
– CNR, Istituto di Linguistica Computazionale, Pisa.
Technology
– Programming language: Java (JDK 1.5)
– Servlet engine: Tomcat 5.5.x + Apache HTTP connectors
– Web server: Apache httpd server 2.2.x
– Web application framework: Jakarta Struts
– Web service framework: Apache Axis 1.4
– Database engine: PostgreSQL 8.1
– Programming environment: NetBeans 5.5.1
– Final development: Hibernate 3.2.5
Standards
• DCMI (Dublin Core Metadata Initiative)
• TEI (Text Encoding Initiative)
• OWL (Web Ontology Language)
• RDF/XML (Resource Description Framework)
• SPARQL (query language for RDF)
• UTF-8 (Unicode Transformation Format)