Jan Christoph Meister, University of Hamburg
www.catma.de
CATMA - an integrated textual markup and analysis tool
29.10.2012 CLARIN's Turn Towards The Literary Text
Text vs. sentence, or: What’s so different about processing texts?
• structural complexity: min TEXT > 2 (SENTENCE)
• structural activity: TEXT processing actualizes paradigmatic cross-reference across sentences
• structural dynamic: TEXT processing represents & simulates cognitive and empirical processes
TEXT yields more INTERPRETATIONS than SENTENCE
+CONTINGENCY: The more complex & dynamic structure, when activated during processing, results in a higher degree of contingency in functional „outcome“
The what and why of MarkUp: procedural, descriptive & discursive function
• discursive markup: enables human readers to interpret a text and to explore its hermeneutic potential in collaboration „What might this text mean to us?“
• declarative markup: informs a human reader how to process a text as a communicative device „How is this text put together and how does it function in its communicative universe?“
• procedural markup: instructs a (natural or artificial) text processor how to handle a text as a structured character string „What is the correct operation to perform on this input?“
performative function
discursive function
Hermeneutic „must haves“ of discursive markup
facilitate collaboration & non-deterministic annotation
• allow for multiple markup
• allow for overlap
• allow for concurrent tagging
conceptualize markup as dynamic & recursive
• allow for extensibility
• allow for multiple (and even contradictory) markup
• seamlessly integrate markup and analysis & support the hermeneutic loop
MarkUp types & data models
• opaque (implicit):
There is no such thing as “no-mark up”. (Coombs, Renear, DeRose 1987)
• linear (inline, deterministic):
<SentenceStart>There</SentenceStart> is no such thing as “no-mark up”.
• nested (inline, deterministic, sequential):
<SentenceStart><Adverb>There</Adverb></SentenceStart> is no such thing as “no-mark up”.
• relational (stand off, descriptive):
There is no such thing as “no-mark up”.
<1,5, word class = “Adverb”><1,5, segment = “SentenceStart”><1,5, POS = “verb phrase element”>
• network (stand off, discursive):
<1,5, word class = “Adverb”><1,38, speech act = “declaration”><1,11, POS = “verb phrase”>
There is no such thing as “no-mark up”.
<1,5, word class = “Preposition”><1,5, segment = “SentenceStart”><1,8, POS = “noun phrase”>
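The difference between the deterministic and the discursive stand-off models can be made concrete with a small sketch. It is not CATMA code: the `Annotation` class and both helper functions are hypothetical, and it assumes that the offset pairs on the slide are 1-based and inclusive, which fits `<1,5>` covering "There". Straight quotes replace the typographic ones in the sample sentence.

```python
from dataclasses import dataclass
from itertools import combinations

TEXT = 'There is no such thing as "no-mark up".'


@dataclass(frozen=True)
class Annotation:
    """One stand-off annotation (hypothetical model, not CATMA's API)."""
    start: int    # assumed 1-based offset of the first character
    end: int      # assumed 1-based offset of the last character, inclusive
    feature: str
    value: str

    def covered_text(self, text):
        # Convert the 1-based inclusive range into a Python slice.
        return text[self.start - 1:self.end]


# A selection of the annotations from the "network" example.
ANNOTATIONS = [
    Annotation(1, 5, "word class", "Adverb"),
    Annotation(1, 5, "word class", "Preposition"),
    Annotation(1, 5, "segment", "SentenceStart"),
    Annotation(1, 8, "POS", "noun phrase"),
    Annotation(1, 11, "POS", "verb phrase"),
]


def annotations_at(position, annotations):
    """Return every annotation whose range covers the 1-based position."""
    return [a for a in annotations if a.start <= position <= a.end]


def contradictions(annotations):
    """Pairs that give the same range and feature different values."""
    return [
        (a, b)
        for a, b in combinations(annotations, 2)
        if (a.start, a.end, a.feature) == (b.start, b.end, b.feature)
        and a.value != b.value
    ]
```

Because the annotations live outside the text, overlapping ranges and the contradictory "Adverb" / "Preposition" readings can coexist; `contradictions(ANNOTATIONS)` merely reports them instead of rejecting one.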
Implementation in CATMA
www.catma.de
The CATMA/CLÉA approach to markup
• text range based model: a tag references a text range with a start and an end offset
• external standoff markup:
  • markup is stored in external files or data bases to facilitate tagging and exchange of markup by multiple users
  • markup is stored in a standoff manner to allow overlapping
  • markup tolerates non-deterministic tagging & supports analytical operations that exploit semantic ambiguity
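A minimal sketch of this approach, under stated assumptions: the `TagReference` class, the JSON storage format and the user names are invented for illustration and do not reflect CATMA's actual storage layer. Offsets are assumed to be 0-based with an exclusive end. Each user's markup collection is serialized separately from the source text, and collections from several users are merged without touching the text.

```python
import json
from dataclasses import asdict, dataclass
from itertools import combinations


@dataclass(frozen=True)
class TagReference:
    """A tag pointing at a text range (hypothetical model)."""
    tag: str
    start: int   # assumed 0-based start offset
    end: int     # assumed exclusive end offset
    author: str


def save_collection(references):
    """Serialize one user's markup collection; the source text is not stored."""
    return json.dumps([asdict(ref) for ref in references])


def load_collection(payload):
    """Restore a markup collection from its external representation."""
    return [TagReference(**item) for item in json.loads(payload)]


def overlaps(a, b):
    """True if two ranges share at least one character."""
    return a.start < b.end and b.start < a.end


SOURCE = "Jan Christoph Meister, University of Hamburg"

# Two users annotate the same, unmodified source text independently.
alice = [TagReference("person", 0, 21, "alice")]
bob = [
    TagReference("speaker_and_affiliation", 0, 44, "bob"),
    TagReference("place", 37, 44, "bob"),
]

merged = load_collection(save_collection(alice)) + load_collection(save_collection(bob))
overlapping_pairs = [(a, b) for a, b in combinations(merged, 2) if overlaps(a, b)]
```

Since no tag is written into `SOURCE`, overlapping ranges need no special handling: `overlapping_pairs` simply lists them.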
Example for overlapping markup in CATMA
(NB: In CATMA tag sets can be imported/exported; tags can be created / manipulated ad hoc during mark up)
TEI feature structure tag declaration & overlapping markup
<fs xml:id="CATMA_d7251f99-14e9-4c36-8ff7-24058ae81ce5" n="1_7985fdf0-77a5-4060-9a3d-2d977e0ab954" type="catma_tag">
  <f xml:id="CATMA_aa9b3727-187e-4fb8-9990-e7880912a409" name="catma_tagname">
    <string>Keynote_speaker&amp;affiliation</string>
  </f>
  <f xml:id="CATMA_564825ba-28b2-4dab-b136-b87c8a3d9e28" name="catma_displaycolor">
    <numeric value="-13421569"/>
  </f>
</fs>
<seg ana="#CATMA_0a252cc2-96d2-4ed4-8fb8-52380550ec0b #CATMA_d7251f99-14e9-4c36-8ff7-24058ae81ce5 #CATMA_8513fe2d-2e35-4d0a-a3a2-07528bcfa012">
  <ptr target="Abstracts.doc#range( /.21736, /.21888)" type="inclusion"/>
</seg>
Question 1: How can we model a collaborative mark up practice?
Answer 1: CATMA’s “n meta-data sets to 1 object-data instance” model
meta-data
• procedural
• declarative
• hermeneutic
object-data
Question 2: But how, on top of that, can we also model the recursive routines that characterize the humanistic workflow?
Example for recursion: a simple query across the object data/meta data divide
Step 1: object data query
Step 2: refinement by adding an additional meta-data constraint
... which is why (reg="\b\S*\Qez\E(?=\W)") where (tag="Keynote_speaker&affiliation") generates this:
Answer 2: CATMA’s dynamic data model, e.g. (n meta-data set to 1 object instance)>n+1
meta-data
• procedural
• declarative
• hermeneutic
object-data
object-data
Question 3: How can we implement this practice in a system?
Answer 3: Call the big sister – CLÉA!
CLÉA Data Base Model
CATMA/CLÉA: User and resource administration
Manage corpora & source documents, markup collections and tag libraries
Annotate texts or corpora using pre-defined or ready-made tags
Build and execute queries on source text & tags, or any combination thereof
Visualize results
What’s in it for CLARIN?
• Import any text or corpus into CATMA/CLÉA
• Run standard analytical procedures automatically or interactively on upload (indexing, POS tagging etc.)
• Annotate and analyse texts or corpora collaboratively
• Share and export markup from the CATMA/CLÉA data base in multiple formats
CLÉA = Collaborative Literature Éxploration and Annotation
Mille grazie to my CATMA/CLÉA development team
• Evelyn Gius
• Malte Meister
• Marco Petris
• Lena Schüch
and to our funders
• University of Hamburg (2009)
• Google DH Awards (2010-2013)
• BMBF (2013-2016)
Tag definition
<fsDecl xml:id="CATMA_TAG_ID_1"
        type="test"
        baseTypes="catma_tag">
  <fsDescr>test - Test Tag</fsDescr>
  <fDecl xml:id="CATMA_TAG_DEF_1_PROP_1"
         name="catma_displaycolor"
         optional="false">
    <vRange><numeric value="-13408513"/></vRange>
  </fDecl>
  <fDecl xml:id="CATMA_TAG_DEF_1_PROP_2"
         name="user_defined_test_property"
         optional="false">
    <vRange><string/></vRange>
  </fDecl>
</fsDecl>
each Tag can have additional user defined properties
each Tag has a type
each Tag has a color
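To illustrate how such a declaration could be read programmatically, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The function names are hypothetical and not part of CATMA. The fragment is parsed without a TEI namespace, exactly as it appears on the slide. The color conversion rests on an assumption: that the negative number is a signed 32-bit ARGB value, as produced for example by Java's `Color.getRGB()`.

```python
import xml.etree.ElementTree as ET

# The "xml" prefix is predefined in XML; ElementTree exposes xml:id under this key.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

TAG_DECLARATION = """
<fsDecl xml:id="CATMA_TAG_ID_1" type="test" baseTypes="catma_tag">
  <fsDescr>test - Test Tag</fsDescr>
  <fDecl xml:id="CATMA_TAG_DEF_1_PROP_1" name="catma_displaycolor" optional="false">
    <vRange><numeric value="-13408513"/></vRange>
  </fDecl>
  <fDecl xml:id="CATMA_TAG_DEF_1_PROP_2" name="user_defined_test_property" optional="false">
    <vRange><string/></vRange>
  </fDecl>
</fsDecl>
"""


def read_tag_declaration(xml_text):
    """Collect id, type, description and declared properties of one tag."""
    root = ET.fromstring(xml_text)
    properties = {}
    for decl in root.findall("fDecl"):
        numeric = decl.find("vRange/numeric")
        # Numeric ranges carry a default value; string ranges are left open.
        properties[decl.get("name")] = (
            int(numeric.get("value")) if numeric is not None else None
        )
    return {
        "id": root.get(XML_ID),
        "type": root.get("type"),
        "description": root.findtext("fsDescr"),
        "properties": properties,
    }


def signed_argb_to_hex(value):
    """Assumption: value is a signed 32-bit ARGB integer; the alpha byte is dropped."""
    return "#{:06X}".format(value & 0xFFFFFF)


declaration = read_tag_declaration(TAG_DECLARATION)
color = signed_argb_to_hex(declaration["properties"]["catma_displaycolor"])
```

Under the ARGB assumption, the display color `-13408513` corresponds to `#3366FF`.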
Tag instance
<fs xml:id="CATMA_TAG_INSTANCE_1" type="test">
  <f xml:id="CATMA_PROPERTY_1_1" name="catma_displaycolor">
    <numeric value="-13408513"/>
  </f>
  <f xml:id="CATMA_PROPERTY_1_2" name="user_defined_test_property">
    <string>instance specific test value</string>
  </f>
</fs>
a Tag instance can have individual values for the user defined properties
each Tag instance is of a type
Tag referencing
<seg ana="#CATMA_TAG_INSTANCE_1">
<ptr target="mytext_utf8.txt#char=36168,36185" type="inclusion"/>
</seg>
The content of a range is referenced by a pointer to an external entity.
The URI follows RFC 5147, which defines fragment identifiers for pointing into plain text.
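Resolving such a pointer can be sketched in a few lines. The function `resolve_char_range` and the `load_text` callback are hypothetical. Only the `char=start,end` form from the example is handled; RFC 5147 also defines line-based fragments and optional integrity checks, which this sketch ignores. RFC 5147 character positions count the gaps between characters, starting at 0, so a range maps directly onto a Python slice.

```python
import re
from urllib.parse import urldefrag

# Only the "char=<start>,<end>" form of RFC 5147 is supported in this sketch.
_CHAR_RANGE = re.compile(r"^char=(\d+),(\d+)$")


def resolve_char_range(target, load_text):
    """Return the text a pointer target such as 'doc.txt#char=0,5' refers to.

    load_text is a caller-supplied function mapping the document URI
    (without fragment) to its full text content.
    """
    document_uri, fragment = urldefrag(target)
    match = _CHAR_RANGE.match(fragment)
    if match is None:
        raise ValueError("unsupported fragment: %r" % fragment)
    start, end = int(match.group(1)), int(match.group(2))
    text = load_text(document_uri)
    if not 0 <= start <= end <= len(text):
        raise ValueError("range %d,%d is outside the document" % (start, end))
    return text[start:end]


# Usage with an in-memory stand-in for the external document.
DOCUMENTS = {"mytext_utf8.txt": "CATMA stores markup outside the text."}
first_word = resolve_char_range("mytext_utf8.txt#char=0,5", DOCUMENTS.__getitem__)
```

Rejecting out-of-range offsets explicitly matters here, because a silent empty slice would hide exactly the kind of drift that offset-based references are prone to.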
Potential problems and possible solutions
• References to ranges based on character offsets are vulnerable to modifications of the content.
  • possible solution: automated adjustments with checksums and context information, and
  • tracking versioning and revision history in the source document header
• The encoding of the tags is machine readable but not interoperable out of the box.
  • possible solution: defining the feature structure encoding of tags in terms of the open annotation framework
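The first proposed solution, automated adjustment with checksums and context information, could look roughly like this. It is a sketch under assumptions, not CATMA's implementation: the `Anchor` class, the choice of SHA-256 and the ten-character prefix as context are all illustrative.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Anchor:
    """Offset reference plus the data needed to repair it (hypothetical)."""
    start: int      # 0-based start offset
    end: int        # exclusive end offset
    checksum: str   # SHA-256 of the referenced text
    prefix: str     # context: the characters right before the range


def _digest(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def make_anchor(text, start, end, context=10):
    """Record offsets together with a checksum and preceding context."""
    return Anchor(start, end, _digest(text[start:end]), text[max(0, start - context):start])


def reanchor(text, anchor):
    """Return a valid anchor for a possibly modified text, or None."""
    length = anchor.end - anchor.start
    # Fast path: the offsets still point at the same content.
    if _digest(text[anchor.start:anchor.end]) == anchor.checksum:
        return anchor
    # Otherwise look for the context and verify the content after it.
    candidates = []
    position = text.find(anchor.prefix)
    while position != -1:
        start = position + len(anchor.prefix)
        if _digest(text[start:start + length]) == anchor.checksum:
            candidates.append(start)
        position = text.find(anchor.prefix, position + 1)
    if len(candidates) != 1:
        return None  # content gone or ambiguous: needs manual review
    start = candidates[0]
    return Anchor(start, start + length, anchor.checksum, anchor.prefix)


original = "Keynote by Jan Christoph Meister."
anchor = make_anchor(original, original.index("Meister"), original.index("Meister") + 7)
edited = "Opening " + original
repaired = reanchor(edited, anchor)
```

Returning `None` for missing or ambiguous matches is a deliberate choice: an annotation that silently moves to the wrong passage would be worse than one flagged for review.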