Knowledge Engineering Models and Tools for the Digital Scholarly Publishing of Manuscripts
Semantic Web for the Digital Humanities
Sahar Aljalbout, Giuseppe Cosenza, Luka Nerima, Gilles Falquet
Cultural Heritage Data
Unexploited rich digital corpora
A model to deal with knowledge-intensive tasks in the field of Scientific Manuscripts
Massive Corpora
Ferdinand de Saussure (1857-1913)
• Linguist: General Linguistics, Comparative Grammar and Social Sciences
• Few publications: never published in General Linguistics
• Course in General Linguistics: 1916, based on the notes of his students
• Philological Studies on CLG
50 000 handwritten pages: libraries of Geneva, Paris and Harvard.
Notes de linguistique générale Bibliothèque de Genève Archive de Saussure 372/5 f 171
Notes de linguistique générale Bibliothèque de Genève Ms Fr 3951/10 ff 31v-32
Handwritten by Saussure’s student
Handwritten by Saussure
Organization based on the archivist’s criteria: general theme and arrival order
Thematic categorization required: scientific thematization
Thematic: Langage
Thematic: Histoire de la linguistique
Chronological order and text order problems
1. Archivist classification: “notes pour un article sur Whitney” (notes for an article on Whitney)
2. Thematic classification
3. Date and chronological problem
4. Text order problem
Saussurians’ Needs
• Retrieving and accessing: MSS visualization, thematic classification plan
• Understanding: unsettled/evolutionary terminology
• Dating: writing date, chronological order of MSS
• Publishing: disclose the author’s work
How to transform the humanities researchers’ needs into a digital model?
How should the digital model be structured?
Requirements
• Interconnected model of manuscripts, transcriptions, and domain knowledge related to the manuscripts’ content
• Visualization system for this interconnected model
Semantic Infrastructure for Scientific Manuscripts
1. Text Documents
2. Knowledge Resources
3. Linking Structure
Text Documents
1. Manuscripts: one or more handwritten pages…
2. Transcriptions: copies of the MSS that are as exact as possible
3. Scientific documents: books, articles…
Knowledge Resources: Ontologies
Formal specification of a shared conceptualization
[Figure: Ontology of Events, Ontology of Persons and Ontology of Locations, plus a further ontology marked “???”, related by alignment]
Knowledge Resources: Terminologies and Taxonomies
1. Terminologies: different representations of the technical terms used by Saussure
Evolution of Saussure’s terms
[Figure: Classical Terminology. Researches: Comparative linguistics, General Linguistics, …; terms: Ablaut, …, Langage, Langue, Parole, …, Zéro]
2. Taxonomies: thematic categorization of the MSS
[Figure: Evolutionary Terminology along a time line. Stages shown: Langage, Langue; Langage, Langue, Parole; Langage, La Langue, Les langues, Parole]
Linking Structure: Document Links
1. Internal Document Links (thematic similarity, contradictions…)
1.1. Manuscript-Manuscript
1.2. Transcription-Transcription
1.3. ScientificDocument-ScientificDocument
2. External Document Links
2.1. Manuscript-Transcriptions (has transcription)
2.2. Transcriptions-ScientificDocument (Quotes)
[Figure: Manuscripts, Transcriptions and Scientific Documents connected by these links]
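The two link families above can be sketched as a small typed model. This is only an illustration under stated assumptions: the class names, the identifiers (`ms-1`, `tr-1`, `doc-1`) and the relation strings are hypothetical, and the slides do not say in which direction each link is stored, so the pair of document kinds is treated as unordered.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    MANUSCRIPT = "manuscript"
    TRANSCRIPTION = "transcription"
    SCIENTIFIC_DOCUMENT = "scientific_document"


@dataclass(frozen=True)
class Document:
    doc_id: str  # hypothetical identifier scheme
    kind: Kind


@dataclass(frozen=True)
class Link:
    source: Document
    target: Document
    relation: str  # hypothetical labels, e.g. "thematic_similarity"


# External relations listed on the slide, keyed by the unordered pair of
# kinds they connect (link direction is an assumption left open here).
EXTERNAL_RELATIONS = {
    frozenset({Kind.MANUSCRIPT, Kind.TRANSCRIPTION}): "has_transcription",
    frozenset({Kind.TRANSCRIPTION, Kind.SCIENTIFIC_DOCUMENT}): "quotes",
}


def is_internal(link: Link) -> bool:
    """Internal links connect two documents of the same kind."""
    return link.source.kind == link.target.kind


def is_valid(link: Link) -> bool:
    """Accept any internal link; accept an external link only when its
    relation is the one listed for that pair of document kinds."""
    if is_internal(link):
        return True
    pair = frozenset({link.source.kind, link.target.kind})
    return EXTERNAL_RELATIONS.get(pair) == link.relation


ms = Document("ms-1", Kind.MANUSCRIPT)
tr = Document("tr-1", Kind.TRANSCRIPTION)
book = Document("doc-1", Kind.SCIENTIFIC_DOCUMENT)
```

For instance, `is_valid(Link(ms, tr, "has_transcription"))` holds, while a direct manuscript-to-scientific-document link is rejected because the slide lists no such external relation.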
Linking Structure: Documents-to-KnowledgeResources
[Figure: Documents linked to Knowledge Resources: Ont_events, Ont_Persons, Ont_Places, Evol. Term. (evolutionary terminology), Clas. Term. (classical terminology), Taxonomy]
Provided Services
1. Adding Manuscripts and Transcriptions
2. Importing Knowledge Resources: handle different formal languages (OWL, SKOS, RDFS)
3. Semantic Indexing of texts with Multiple Ontologies
4. Creation and computation of semantic links: thematic similarity
5. Spatial and Temporal reasoning: tag each MSS with the inferred time and space
6. Generation of derived documents: selecting elements of the knowledge base to produce a new document
7. Knowledge Base Navigation and Retrieval: interactive hypertext links, concept-based navigation
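The slides do not say how thematic similarity (service 4) is computed. One simple possibility, shown here purely as an assumption, is the Jaccard overlap between the sets of concepts that index two documents. The document identifiers, the concept labels assigned to them and the 0.5 threshold are all hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of concepts common to both sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_links(
    annotations: dict[str, set[str]], threshold: float = 0.5
) -> list[tuple[str, str, float]]:
    """Return (doc_a, doc_b, score) for every pair scoring at least threshold."""
    links = []
    for doc_a, doc_b in combinations(sorted(annotations), 2):
        score = jaccard(annotations[doc_a], annotations[doc_b])
        if score >= threshold:
            links.append((doc_a, doc_b, score))
    return links


# Hypothetical concept annotations produced by semantic indexing.
annotations = {
    "tr-1": {"langue", "langage", "parole"},
    "tr-2": {"langue", "langage"},
    "tr-3": {"ablaut"},
}
links = similarity_links(annotations)
```

With these sample annotations only the pair `tr-1`/`tr-2` shares enough concepts (two out of three) to be linked.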
Examples of possible use
1. Semantic Indexing of a Transcription
2. Terminology Loading and Navigation in the Knowledge Base
Semantic Indexing of a Transcription
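Section 1 (Essai, 1874): Jusqu’ici l’homme n’a encore fait usage ni de sa langue, ni de son palais ni de ses dents, et c’est à l’aide de ces instruments qu’il arrivera en dernier lieu au son dental, le plus compliqué des trois.
[English: So far man has made use of neither his tongue (langue), nor his palate, nor his teeth, and it is with the help of these instruments that he will finally arrive at the dental sound, the most complicated of the three.]
Section 2 (II cours de linguistique générale, 1908-1909, Riedlinger notes): Donc la langue est: un ensemble de conventions nécessaires adoptées par le corps social pour permettre l’usage de la faculté du langage chez les individus. La faculté du langage est un fait distinct de la langue mais qui ne peut s’exercer sans elle.
[English: So the language (langue) is: a set of necessary conventions adopted by the social body to allow individuals to use the faculty of language (langage). The faculty of language is a fact distinct from the langue, but one that cannot be exercised without it.]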
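Terminology
C1: Langue (organe) [the tongue, as an organ]
[…] quel que soit d’ailleurs le point (alvéolaire ou palatal) vers lequel la langue est dirigée […]
[English: whatever the point (alveolar or palatal) towards which the tongue is directed]
C2: Langue (système de signes) [language, as a system of signs]
[…] Nous pouvons dire que le langage se manifeste toujours au moyen d’une langue; il est inexistant sans cela. […]
[English: We can say that langage always manifests itself by means of a langue; without it, it does not exist.]
[…] La langue pour nous ce sera le produit social dont l’existence permet à l’individu l’exercice de la faculté du langage […]
[English: For us the langue will be the social product whose existence allows the individual to exercise the faculty of language.]
References (Ref) from the transcriptions to the terminology: <fds:3.1-langue>, <fds:3.2-langue>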
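The indexing example above hinges on deciding which sense of “langue” a section uses. A minimal sketch of that step follows, assuming a naive cue-word overlap; the cue words are chosen here for illustration only, and the actual indexing method is not described in the slides. The concept labels C1 and C2 come from the slide, and the two sample texts are shortened from Sections 1 and 2.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    concept_id: str
    label: str
    cues: frozenset  # context words; an assumption made for this sketch


TERMINOLOGY = [
    Concept("C1", "Langue (organe)",
            frozenset({"palais", "dents", "alvéolaire", "palatal"})),
    Concept("C2", "Langue (système de signes)",
            frozenset({"langage", "conventions", "social", "faculté"})),
]


def tokenize(text):
    """Lower-case the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def index_section(text, term="langue"):
    """Return the id of the concept whose cue words best match the
    section, or None if the term is absent or no cue word occurs."""
    tokens = tokenize(text)
    if term not in tokens:
        return None
    best = max(TERMINOLOGY, key=lambda c: len(c.cues & tokens))
    return best.concept_id if best.cues & tokens else None


SECTION_1 = ("Jusqu’ici l’homme n’a encore fait usage ni de sa langue, "
             "ni de son palais ni de ses dents")
SECTION_2 = ("Donc la langue est: un ensemble de conventions nécessaires "
             "adoptées par le corps social pour permettre l’usage de la "
             "faculté du langage chez les individus.")
```

Under these assumptions Section 1 is indexed with C1 (it mentions “palais” and “dents”) and Section 2 with C2. A real system would need something more robust than word overlap, for example expert validation of each annotation.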
Implementation with Semantic Web Techniques
Adopted technologies
• 3 databases: image db, triple store for manuscripts and transcriptions, and derived documents db
• Data stored as RDF triples in an OpenRDF Sesame triple store
Corpus Acquisition
• 50 000 handwritten pages / 5000 transcriptions
• MSS images from Geneva’s library
• Contextual knowledge extracted in collaboration with Saussurian experts (people, events and places ontologies)
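To make the idea of storing the data as RDF triples concrete, here is a toy in-memory stand-in written in plain Python. It is not the OpenRDF Sesame API. The `fds:` prefix is taken from the references shown in the indexing example, while the property names (`fds:hasTranscription`, `fds:refersTo`) and the resource identifiers are hypothetical.

```python
from __future__ import annotations


class TripleStore:
    """Toy subject-predicate-object store with wildcard pattern matching."""

    def __init__(self) -> None:
        self._triples: set[tuple[str, str, str]] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self._triples.add((s, p, o))

    def match(self, s: str | None = None, p: str | None = None,
              o: str | None = None) -> list[tuple[str, str, str]]:
        """Return the sorted triples matching the pattern; None is a wildcard."""
        return sorted(
            t for t in self._triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


store = TripleStore()
# Hypothetical resources: one manuscript, two transcriptions.
store.add("fds:ms-1", "fds:hasTranscription", "fds:tr-1")
store.add("fds:tr-1", "fds:refersTo", "fds:3.2-langue")
store.add("fds:tr-2", "fds:refersTo", "fds:3.2-langue")

# Which resources refer to the term fds:3.2-langue?
referring = [s for s, _, _ in store.match(p="fds:refersTo", o="fds:3.2-langue")]
```

A pattern with wildcards is the basic building block behind concept-based navigation: from a term, find every transcription that refers to it, and from each transcription, find its manuscript.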