![Page 1: CODA CATCHPlus Open Document Annotation · 2012-08-31 · CODA – CATCHPlus Open Document Annotation Hennie Brugman OAC II Project Review meeting Chicago – July 26-27, 2012 . Annotation](https://reader033.vdocuments.site/reader033/viewer/2022042306/5ed2102d9eb0885e0304a35a/html5/thumbnails/1.jpg)
CODA – CATCHPlus Open Document Annotation
Hennie Brugman
OAC II Project Review meeting
Chicago – July 26-27, 2012
Annotation context
• Audiovisual
– ASR, language, gesture, oral history
• Text – Semantic annotation
• Music – lyrics, music notation
• Linguistic Annotation – named entities
• Image annotation
• Programs: CATCH, CATCHPlus, CLARIN
CODA main use cases
• Queen’s Cabinet (Kabinet der Koningin, KdK; Henny van Schie/National Archive, Lambert Schomaker/Univ Groningen)
– Line strip and word zone annotations
– ML: search in manuscript images
– Add Named Entity annotations
• Sailing Letters (Nicoline van der Sijs/Meertens + consortium, Lambert Schomaker)
– Support manual annotation
– Line strip detection service
Line annotation tools (catchplus)
<txt>godefroit</txt>
<id>navis-SAL7316_0195-line-026-y1=2094-y2=2317-zone-HUMAN-x=1145-y=105-w=315-h=116-unshear=0.0-version=ortho</id>
<user>mceunen</user>
<time>Wed Jan 26 16:37:01 2011</time>
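The `<id>` value in this record packs several fields into one hyphen-separated string. The following is a minimal sketch of how such an identifier might be unpacked, assuming that parts containing `=` are key/value fields and the remaining parts are positional. The function name `parse_line_id` is hypothetical, and the meaning of individual keys (for example whether `y1`/`y2` bound the line strip and `x`/`y`/`w`/`h` the word zone) is a guess, not something the slide states.

```python
def parse_line_id(identifier: str) -> dict:
    """Split a line-strip identifier into positional parts and key=value fields.

    Assumption: values never contain '-' (a negative number would break this).
    """
    positional = []
    fields = {}
    for part in identifier.strip().split("-"):
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key] = value  # kept as strings; callers convert as needed
        else:
            positional.append(part)
    return {"positional": positional, "fields": fields}


example_id = (
    "navis-SAL7316_0195-line-026-y1=2094-y2=2317-zone-HUMAN"
    "-x=1145-y=105-w=315-h=116-unshear=0.0-version=ortho"
)
parsed = parse_line_id(example_id)
print(parsed["positional"])
print(parsed["fields"])
```

Values are left as strings so that nothing is lost if a field turns out not to be numeric.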
OAC representation
[Diagram: an ImageAnnotation and TextAnnotations connected by hasBody, hasTarget and constrains arcs. Recoverable node labels: imageScan.jpg, linestrip.jpg, Canvas1, ia:1, ia:2, ib:0, ct:1, ct:2, cb:1, cb:2, page:0, zone:2, line:1, Named Entity. Text body (cnt:chars): “Dit is een beschrijving van Den Haag. En dit is een tweede zin.” (“This is a description of The Hague. And this is a second sentence.”)]
OAC representation – Named Entities
[Diagram: ImageAnnotation, TextAnnotations and EntityAnnotation connected by hasBody, hasTarget and constrains arcs. Recoverable node labels: imageScan.jpg, Canvas1, ia:1, ta:0, ta:1, ta:2, ea:1, ib:0, ib:1, ct:1, ct:2, ct:3, ct:4, cb:1, cb:2. Text bodies (cnt:chars): “Dit is een beschrijving van Den Haag. En dit is een tweede zin.” and “location”.]
! Annotation of annotations?
! Annotation of segments of inline text?
InlineTextConstraint:
<rdf:Description rdf:about="urn:uuid:533624bb-d565-40ba-a14a-2e95c19c20df">
  <rdf:type rdf:resource="http://www.openannotation.org/ns/ConstrainedTarget"/>
  <constrains xmlns="http://www.openannotation.org/ns/"
      rdf:resource="http://oas.dev.seecr.nl:8000/resolve/urn%3Auuid%3Ad8741024-18bf-40a8-a648-2cd5ebb9acfd"/>
  <constrainedBy xmlns="http://www.openannotation.org/ns/"
      rdf:resource="urn:uuid:4f6b7d34-2329-4ab6-be89-a0feec9e7208"/>
</rdf:Description>
<rdf:Description rdf:about="urn:uuid:4f6b7d34-2329-4ab6-be89-a0feec9e7208">
  <rdf:type rdf:resource="http://www.openannotation.org/ns/Constraint"/>
  <rdf:type rdf:resource="http://www.catchplus.nl/annotation/InlineTextConstraint"/>
  <rdf:type rdf:resource="http://www.w3.org/2008/content#ContentAsText"/>
  <chars xmlns="http://www.w3.org/2008/content#">"<textsegment offset="279" range="2"/>"</chars>
  <characterEncoding xmlns="http://www.w3.org/2008/content#">UTF-8</characterEncoding>
</rdf:Description>
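A `textsegment` constraint like the one above can be read as a pointer into the inline text of the constrained resource. The sketch below shows one way to resolve it, assuming `offset` is a zero-based character index and `range` is a length in characters (the slides do not spell out either convention). `resolve_text_segment` is a hypothetical helper, and the sample sentence is the one used in the diagrams rather than the text behind offset 279.

```python
import xml.etree.ElementTree as ET


def resolve_text_segment(constraint_xml: str, inline_text: str) -> str:
    """Return the substring that a <textsegment offset=".." range=".."/> selects.

    Assumptions: offset is zero-based, range is a character count.
    """
    segment = ET.fromstring(constraint_xml)
    offset = int(segment.attrib["offset"])
    length = int(segment.attrib["range"])
    if offset < 0 or length < 0 or offset + length > len(inline_text):
        raise ValueError("segment falls outside the inline text")
    return inline_text[offset:offset + length]


text = "Dit is een beschrijving van Den Haag. En dit is een tweede zin."
start = text.index("Den Haag")
length = len("Den Haag")
constraint = '<textsegment offset="%d" range="%d"/>' % (start, length)
selected = resolve_text_segment(constraint, text)
print(constraint, "->", selected)
```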
KdK-2-OAC conversion
• Implicit line and page text
• Word and line order
• Text offsets and ranges
• Spatial information
• Identifiers and ‘annotatability’
• Redundant text for searchability
! Need for explicit representation of Sequence?
! Search on text of ConstrainedTarget/Body?
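Several of the points above (implicit line text, word order, offsets and ranges) come down to deriving one explicit line string from an ordered list of word labels. The following is a minimal sketch of that step, assuming words are joined with a single space. It is an illustration, not the converter's actual logic, and `words_to_line` is a hypothetical name.

```python
def words_to_line(words):
    """Join ordered word labels into a line and record each word's offset and range."""
    segments = []
    cursor = 0
    for word in words:
        segments.append({"text": word, "offset": cursor, "range": len(word)})
        cursor += len(word) + 1  # +1 for the single separating space (assumption)
    return " ".join(words), segments


line_text, segments = words_to_line(["van", "Den", "Haag"])
for seg in segments:
    # Each segment can be recovered from the line text by offset and range.
    recovered = line_text[seg["offset"]:seg["offset"] + seg["range"]]
    print(seg, recovered)
```

Because the order of the input list determines the offsets, word order has to be known before the line text can be built, which is one reason an explicit representation of sequence matters.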
KdK-2-OAC conclusions
• Bidirectional mapping is possible
• Compatible with SharedCanvas model
• OAC + Canvas links everything together
• Implicit information made explicit
• Supports alternative text segmentations
• OAC representation is extremely verbose
! For many annotation tasks OA may be overkill
Open Annotation Service (OAS)
• Upload annotation RDF using SRU/Update
• Inlines external text and XML Bodies and authors
• Indexes OA and DC properties
• Assigns resolvable http URIs and resolves those
• Implementation: RDF store in combination with Solr, production-quality software components (Meresco)
• Built-in OAI-PMH data provider and harvester for
‘annotation sets’
• Query: SRU/CQL, SPARQL, OAI-PMH
• Simple management dashboard (authentication and
authorization, collection management, harvesting)
• Easy installation and Open Source
! Model does not support Annotation “sets”
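As an illustration of the resolvable URIs and the SRU/CQL query interface listed above, here is a small sketch that only builds URLs and makes no network calls. The host and the `/resolve/` pattern are taken from the RDF example in this deck. The `/sru` path and the `dc.creator` index name are assumptions, since the slides name neither, and the parameters are the standard SRU `searchRetrieve` ones.

```python
from urllib.parse import quote, urlencode

# Host as it appears in the deck's RDF example; it may no longer be live.
BASE = "http://oas.dev.seecr.nl:8000"


def resolve_url(urn: str) -> str:
    """Build a resolvable http URI by percent-encoding the URN after /resolve/."""
    return f"{BASE}/resolve/{quote(urn, safe='')}"


def sru_search_url(cql: str, maximum_records: int = 10) -> str:
    """Build an SRU searchRetrieve URL. The '/sru' path is hypothetical."""
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": cql,
        "maximumRecords": maximum_records,
    }
    return f"{BASE}/sru?{urlencode(params)}"


print(resolve_url("urn:uuid:d8741024-18bf-40a8-a648-2cd5ebb9acfd"))
# 'dc.creator' is an assumed index name, not confirmed by the slides.
print(sru_search_url('dc.creator = "mceunen"'))
```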
OAS: issues
• Annotation publication
• Searchability: ‘harvest and index’
• Text search on external bodies
• Annotation boundaries
• ‘Bypassing’ oac:constrains
! In RDF, what are the boundaries of an annotation?
Entity Recognition service
[Diagram; recoverable labels: service, “URL or text”, OAS resolve, source_text, frog, FoLiA_document, converter, entity annotations, “URL or ID”.]
‘frog’ and FoLiA
• ‘Frog’ tool generates FoLiA XML document with
– Segmentation of text into paragraphs, sentences and words (tokens) – XML hierarchy
– Part of speech, lemma, morphology, chunking, dependency
structure and named entities
• Mix of inline and standoff annotation
– ‘Frog’ does not keep track of character offsets
– Explicit ordering: numbering system in ids
• Trained for Dutch
• Widely used for Dutch corpora
• Made available by: ILK @ Tilburg University
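When ordering is carried only by a numbering system in the ids, a consumer has to sort on the numeric components rather than on the raw strings. The sketch below assumes ids of the hypothetical form `doc.p.1.s.1.w.10` (paragraph, sentence, word). The exact id scheme is not shown on the slides, so treat the pattern as an illustration only.

```python
import re


def id_sort_key(element_id: str):
    """Extract the dot-prefixed numeric components of an id as a list of ints.

    Assumption: every number that follows a '.' is part of the ordering.
    """
    return [int(number) for number in re.findall(r"\.(\d+)", element_id)]


ids = ["doc.p.1.s.1.w.10", "doc.p.1.s.1.w.2", "doc.p.1.s.2.w.1", "doc.p.1.s.1.w.1"]
ordered = sorted(ids, key=id_sort_key)
print(ordered)
```

A plain string sort would place `w.10` before `w.2`, which is why the numeric key is needed.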
FoLiA-2-OAC conversion
• Reconstruct character offsets after tokenization
• Operates on inline text as published by OAS
• Construct and add entity text from tokens +
sequence (the+hague != hague+the)
• Two approaches
1. Minimal: extract entity annotations and tokens, and
convert to OAC
2. Maximal: full conversion to OAC
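Reconstructing character offsets after tokenization can be sketched as a left-to-right search for each token in the source text. This is a minimal illustration and not the CODA converter itself: it assumes the tokenizer leaves characters unchanged (no normalization) and that tokens arrive in document order. `align_tokens` and `entity_span` are hypothetical names.

```python
def align_tokens(text, tokens):
    """Return (token, offset, length) triples by scanning the text left to right."""
    spans = []
    cursor = 0
    for token in tokens:
        start = text.find(token, cursor)
        if start == -1:
            raise ValueError(f"token {token!r} not found after position {cursor}")
        spans.append((token, start, len(token)))
        cursor = start + len(token)
    return spans


def entity_span(spans, first, last):
    """Offset and range covering tokens first..last (inclusive, in sequence order)."""
    start = spans[first][1]
    end = spans[last][1] + spans[last][2]
    return start, end - start


text = "Dit is een beschrijving van Den Haag. En dit is een tweede zin."
tokens = ["Dit", "is", "een", "beschrijving", "van", "Den", "Haag", ".",
          "En", "dit", "is", "een", "tweede", "zin", "."]
spans = align_tokens(text, tokens)
offset, length = entity_span(spans, 5, 6)  # the two tokens "Den" and "Haag"
print(offset, length, text[offset:offset + length])
```

Taking the entity text from the source span, rather than concatenating tokens, keeps the original order and spacing, which is the point behind "the+hague != hague+the".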
Linguistic Annotation
! Mix-in domain semantics as subtypes/subproperties?
! Maximal OA mapping or embed linguistic standards?
! Layers, hierarchies (syntax) and Documents
! Sequence (e.g. entities, morpheme breakup)
Synchronized viewing client demo
• Demo/screenshot
Summary of OA issues
! Annotation of annotations?
! Annotation of segments of inline text?
! Need for explicit representation of Sequence?
! Search on ConstrainedTarget/Body?
! For many annotation tasks OA may be overkill
! Model does not support Annotation sets
! In RDF, what are the boundaries of an annotation?
Future work
• Finalize and integrate software (with web
services)
• Upgrade to new OA spec (incl OAS)
• Line strip detection web service
• Possible applications
– AV annotation in CATCHPlus
– Nederlab
Questions?