Knowledge Engineering Models and Tools for the Digital Scholarly Publishing of Manuscripts

Semantic Web for the Digital Humanities

Sahar Aljalbout, Giuseppe Cosenza, Luka Nerima, Gilles Falquet

1


Post on 16-Feb-2019



Cultural Heritage Data

Unexploited rich digital corpora

A model for knowledge-intensive tasks in the field of scientific manuscripts

2

Massive Corpora

Ferdinand de Saussure (1857-1913)

• Linguist: General Linguistics, Comparative Grammar and Social Sciences

• Few publications: never published in General Linguistics

• Course in General Linguistics (CLG): published in 1916, based on the notes of his students

• Philological studies on the CLG

3

50 000 handwritten pages: libraries of Geneva, Paris and Harvard.

4

Notes de linguistique générale Bibliothèque de Genève Archive de Saussure 372/5 f 171

Notes de linguistique générale Bibliothèque de Genève Ms Fr 3951/10 ff 31v-32

Handwritten by Saussure’s student

Handwritten by Saussure

Organization based on the archivist's criteria: general theme and arrival order

Thematic categorization required: scientific thematization

Theme: Langage

Theme: Histoire de la linguistique (history of linguistics)

Chronological order and text order problems

5

1. Archivist classification: “notes pour un article sur Whitney” (notes for an article on Whitney)

2. Thematic classification

3. Date and chronological problem

4. Text order problem

Saussurians’ Needs

• Retrieving and accessing: MSS visualization, thematic classification plan

• Understanding: unsettled/evolutionary terminology

• Dating: writing date, chronological order of the MSS

• Publishing: disclose the author’s work

6

How can the humanities researchers’ needs be transformed into a digital model?

7

How should the digital model be structured?

Requirements

• An interconnected model of manuscripts, transcriptions, and domain knowledge related to the manuscripts’ content

• A visualization system for this interconnected model

Semantic Infrastructure for Scientific Manuscripts

1. Text Documents

2. Knowledge Resources

3. Linking Structure

8

Text Documents

1. Manuscripts: one or more handwritten pages…

2. Transcriptions: copies of the MSS that are as exact as possible

3. Scientific documents: books, articles…

9

Knowledge Resources: Ontologies

Formal specification of a shared conceptualization

• Ontology of Events

• Ontology of Persons

• Ontology of Locations

• Alignment between the ontologies

10

Knowledge Resources: Terminologies and Taxonomies

1. Terminologies: different representations of the technical terms used by Saussure

• Classical terminology: Ablaut … Langage, Langue, Parole … Zéro

• Evolutionary terminology (evolution of Saussure’s terms along a time line), in successive stages:

Langage, Langue …

Langage, Langue, Parole …

Langage, La Langue, Les langues, Parole

2. Taxonomies: thematic categorization of the MSS

• Researches: Comparative linguistics …, General Linguistics …

11

Linking Structure: Document Links

1. Internal Document Links (thematic similarity, contradictions…)

1.1. Manuscript-Manuscript

1.2. Transcription-Transcription

1.3. ScientificDocument-ScientificDocument

2. External Document Links

2.1. Manuscript-Transcription (has transcription)

2.2. Transcription-ScientificDocument (quotes)

12
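
The link typology above can be sketched as a small data model. This is only an illustration under stated assumptions, not the authors' implementation: the class names, the identifier scheme, and the link names `hasTranscription`, `quotes`, `thematicSimilarity` and `contradiction` are hypothetical, and the slide does not say in which direction a "quotes" link points.

```python
from dataclasses import dataclass
from enum import Enum


class DocType(Enum):
    MANUSCRIPT = "Manuscript"
    TRANSCRIPTION = "Transcription"
    SCIENTIFIC_DOCUMENT = "ScientificDocument"


@dataclass(frozen=True)
class Document:
    doc_id: str        # hypothetical identifier scheme
    doc_type: DocType


# External links connect documents of different types. The pair order follows
# the order used on the slide; the direction of "quotes" is an assumption.
EXTERNAL_LINKS = {
    "hasTranscription": (DocType.MANUSCRIPT, DocType.TRANSCRIPTION),
    "quotes": (DocType.TRANSCRIPTION, DocType.SCIENTIFIC_DOCUMENT),
}

# Internal links connect two distinct documents of the same type.
INTERNAL_LINKS = {"thematicSimilarity", "contradiction"}


def is_valid_link(kind: str, source: Document, target: Document) -> bool:
    """Check that a link kind is allowed between the two document types."""
    if kind in INTERNAL_LINKS:
        return source.doc_type == target.doc_type and source != target
    if kind in EXTERNAL_LINKS:
        return (source.doc_type, target.doc_type) == EXTERNAL_LINKS[kind]
    return False


ms = Document("ms-1", DocType.MANUSCRIPT)
tr = Document("tr-1", DocType.TRANSCRIPTION)
print(is_valid_link("hasTranscription", ms, tr))    # True
print(is_valid_link("thematicSimilarity", ms, tr))  # False: types differ
```

Validating the endpoint types when a link is created keeps the internal/external distinction explicit instead of leaving it to convention.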

Linking Structure: Knowledge Resource Links

13

Onto_Persons

Onto_Locations

Onto_Events

Linking Structure: Documents-to-KnowledgeResources

[Figure: documents linked to the knowledge resources, namely the ontologies of events, persons and places, the evolutionary terminology (Evol. Term.), the classical terminology (Clas. Term.), and the taxonomy]

Provided Services

1. Adding manuscripts and transcriptions

2. Importing knowledge resources: handles different formal languages (OWL, SKOS, RDFS)

3. Semantic indexing of texts with multiple ontologies

4. Creation and computation of semantic links: thematic similarity

5. Spatial and temporal reasoning: tag each MSS with the inferred time and space

6. Generation of derived documents: selecting elements of the knowledge base to produce a new document

7. Knowledge base navigation and retrieval: interactive hypertext links, concept-based navigation

15
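
Services 4 and 5 might be approximated as follows. The slides do not say how thematic similarity or temporal inference is computed, so both functions are assumptions: similarity is taken here as the Jaccard index over the concepts that index two documents, and the writing date is bounded below by the latest dated event a manuscript mentions and above by Saussure's death in 1913. The concept names and the year 1894 in the usage lines are placeholders, and spatial inference is not covered.

```python
from typing import Iterable, Optional, Set, Tuple


def thematic_similarity(concepts_a: Set[str], concepts_b: Set[str]) -> float:
    """Jaccard index of two sets of concept identifiers (assumed measure)."""
    union = concepts_a | concepts_b
    if not union:
        return 0.0
    return len(concepts_a & concepts_b) / len(union)


def infer_writing_interval(
    mentioned_event_years: Iterable[int],
    latest_year: int = 1913,  # year of Saussure's death, from the slides
) -> Optional[Tuple[int, int]]:
    """Bound the writing date of a manuscript from the events it mentions.

    A text cannot predate the most recent event it refers to, so that event
    gives the lower bound. Returns None when nothing can be inferred.
    """
    years = list(mentioned_event_years)
    if not years:
        return None
    earliest = max(years)
    if earliest > latest_year:
        return None  # inconsistent input, e.g. a manuscript by a later hand
    return (earliest, latest_year)


# Placeholder data, not taken from the corpus.
print(thematic_similarity({"langue", "parole", "langage"},
                          {"langue", "langage", "ablaut"}))  # 0.5
print(infer_writing_interval([1874, 1894]))                   # (1894, 1913)
```

For manuscripts written by students, `latest_year` would have to be set per author rather than left at its default.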

Examples of possible use

1. Semantic Indexing of a Transcription

2. Terminology Loading and Navigation in the Knowledge Base

16

Semantic Indexing of a Transcription

17

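Transcriptions

Section 1: Jusqu’ici l’homme n’a encore fait usage ni de sa langue, ni de son palais ni de ses dents, et c’est à l’aide de ces instruments qu’il arrivera en dernier lieu au son dental, le plus compliqué des trois. (Essai, 1874)

[English: Up to this point man has not yet made use of his tongue, his palate or his teeth, and it is with the help of these instruments that he will finally arrive at the dental sound, the most complicated of the three.]

Section 2: Donc la langue est: un ensemble de conventions nécessaires adoptées par le corps social pour permettre l’usage de la faculté du langage chez les individus. La faculté du langage est un fait distinct de la langue mais qui ne peut s’exercer sans elle. (II cours de linguistique générale, 1908-1909, Riedlinger notes)

[English: So langue is: a set of necessary conventions adopted by the social body to allow individuals to use the language faculty. The language faculty is a fact distinct from langue, but it cannot be exercised without it.]

Terminology

Langue (organe) [the tongue, as an organ]

C1: […] quel que soit d’ailleurs le point (alvéolaire ou palatal) vers lequel la langue est dirigée […]

[English: whatever the point (alveolar or palatal) toward which the tongue is directed]

[…]

Langue (système de signes) [language as a system of signs]

C2: […] Nous pouvons dire que le langage se manifeste toujours au moyen d’une langue; il est inexistant sans cela. […]

[English: We can say that langage always manifests itself by means of a langue; without it, it does not exist.]

[…] La langue pour nous ce sera le produit social dont l’existence permet à l’individu l’exercice de la faculté du langage […]

[English: For us, langue will be the social product whose existence allows the individual to exercise the language faculty.]

Concept identifiers: <fds:3.1-langue>, <fds:3.2-langue>

Ref: each occurrence of “langue” in a transcription section refers to one terminology entry (Section 1 to the organ sense, Section 2 to the system-of-signs sense).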
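
A minimal sketch of how an occurrence of “langue” could be attached to one of the two terminology entries. The slides do not describe the indexing algorithm, so the cue-word overlap used here is an assumption, as are the cue words themselves and the pairing of `fds:3.1-langue` with the organ sense and `fds:3.2-langue` with the system-of-signs sense.

```python
import re
from typing import Optional

# Identifiers come from the slide. Which identifier denotes which sense, and
# every cue word below, are assumptions made for this sketch.
SENSE_CUES = {
    "fds:3.1-langue": {"palais", "dents", "dental", "alvéolaire", "palatal"},
    "fds:3.2-langue": {"conventions", "social", "langage", "faculté", "individu"},
}


def index_occurrence(section_text: str) -> Optional[str]:
    """Return the concept whose cue words overlap most with the section."""
    words = set(re.findall(r"\w+", section_text.lower()))
    scores = {cid: len(cues & words) for cid, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)  # ties go to the first entry
    return best if scores[best] > 0 else None


section1 = "ni de sa langue, ni de son palais ni de ses dents"
section2 = ("la langue est: un ensemble de conventions nécessaires adoptées "
            "par le corps social")
print(index_occurrence(section1))  # fds:3.1-langue
print(index_occurrence(section2))  # fds:3.2-langue
```

A production indexer would more plausibly compare each section with the context passages (C1, C2) stored in the terminology; the hand-written cue words stand in for that step.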

Terminology Loading and Navigation in the Knowledge Base

18

Implementation with Semantic Web Techniques

Adopted technologies

• 3 databases: an image database, a triple store for manuscripts and transcriptions, and a derived-documents database

• Data stored as RDF triples in an OpenRDF Sesame triple store

Corpus acquisition

• 50 000 handwritten pages / 5000 transcriptions

• MSS images from Geneva’s library

• Contextual knowledge extracted in collaboration with Saussurian experts (people, events and places ontologies)

19
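
To make the triple representation concrete, the sketch below stores a manuscript, its transcription and a concept reference as subject-predicate-object triples. It is a plain-Python stand-in for a triple store, not the Sesame API: the class `TinyTripleStore`, the namespace `http://example.org/fds/` and the property names are hypothetical, and only IRI-valued objects are handled (no literals).

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

EX = "http://example.org/fds/"  # hypothetical namespace, not from the source


class TinyTripleStore:
    """In-memory set of (subject, predicate, object) IRIs."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self._triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> List[Triple]:
        """Return triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


def to_ntriples(triples: List[Triple]) -> str:
    """Serialize IRI-only triples in N-Triples syntax."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


store = TinyTripleStore()
store.add(EX + "ms/1", EX + "hasTranscription", EX + "tr/1")
store.add(EX + "tr/1", EX + "indexedBy", EX + "concept/3.2-langue")

# Which transcriptions does manuscript ms/1 have?
print(to_ntriples(store.match(s=EX + "ms/1", p=EX + "hasTranscription")))
```

Because documents, links and knowledge resources all reduce to triples, the same pattern-matching query covers document-to-document and document-to-concept navigation.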

Conclusion

• Infrastructure for the storage, semantic enrichment, visualization, and publication of a corpus of scientific manuscripts

• Implementation with Semantic Web techniques

• Infrastructure applied to Ferdinand de Saussure’s work, but usable on any corpus of manuscripts

20