EConnect WP1 & semantic issues
• VU members: Guus Schreiber, Antoine Isaac, Jacco van Ossenbruggen, Jan Wielemaker
EConnect WP1
• Creation of semantic layer
• Alignment of vocabularies in that layer
• Technical specifications of semantics-based operations
• Integration with operational services
EDL D2.5 Requirements
• Contextualization of works using knowledge organization systems
• Reminder: SKOS can represent cross-vocabulary links
Important enriching challenges
Fine vision, but this requires:
• Identifying simple values with more complex objects, from outside the original context
• Mapping knowledge organization systems together
• Recognizing relations between surrogates
– Including when they represent the same work
Important enriching challenges
• Finding good vocabularies for enriching and aligning
• Subject vocabularies or authority files richly structured, with appropriate coverage of a domain
• These are likely to be specific to a domain
– At the level of aggregators?
The EDM, and then?
• That model alone is not enough
• It fits some precise data modeling and access needs
• But it does not commit much to specific domain or application requirements
– Remember: it's a feature!
Two steps for flexible and useful knowledge representation
• Fitting domain via specialization
– Cf. Martin: cross-model integration via property specialization
– ens:isAbout > dc:subject > rma:depicts
– skos:broader > ex:broaderPartitive
• Someone has to take care of this:
– Europeana? Content providers? Aggregators?
– Cf. the process devised for ESE
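The specialization chains above can be sketched in a few lines of Python. This is only an illustration, assuming that ">" reads as "is more generic than"; the triples, the `ex:` identifiers and the helper names are hypothetical and not part of the EDM. A query on a generic property also returns statements made with its specializations.

```python
# Hypothetical sub-property hierarchy, read from the slide as
# "ens:isAbout > dc:subject > rma:depicts" (more generic on the left).
SUBPROPERTY_OF = {
    "rma:depicts": "dc:subject",
    "dc:subject": "ens:isAbout",
    "ex:broaderPartitive": "skos:broader",
}

# Illustrative data only: one object described with two different properties.
TRIPLES = [
    ("ex:painting1", "rma:depicts", "ex:Windmill"),
    ("ex:painting1", "dc:subject", "ex:Landscape"),
]


def superproperties(prop):
    """Return prop followed by all of its ancestors, most specific first."""
    chain = [prop]
    while chain[-1] in SUBPROPERTY_OF:
        chain.append(SUBPROPERTY_OF[chain[-1]])
    return chain


def query(triples, subject, prop):
    """Values of prop for subject, including values stated with a sub-property."""
    return sorted(
        o for s, p, o in triples
        if s == subject and prop in superproperties(p)
    )


# The generic property sees both statements; the specific one sees only its own.
print(query(TRIPLES, "ex:painting1", "ens:isAbout"))
print(query(TRIPLES, "ex:painting1", "rma:depicts"))
```

Providers keep their domain-specific properties, while a generic service only needs to know the top of each chain.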
Two steps for flexible and useful knowledge representation
• Fitting application requirements
• Art of creating shortcuts in the representations
• New application-specific properties as views over complex paths
– surrogateMatch
– integrating all views
At the end of parallel processing
[Figure labels: Disambiguation, Relations, Vocabularies]
This is important, and related to how Europeana will exploit the EDM
• What Jan is presenting in the room above
• Several options for considering semantic services for Europeana
• Pre-processing of query
– E.g., autocompletion using semantic networks
• Parallel processing of query
• Post-processing of query
– E.g., clustering
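As a rough illustration of the pre-processing option, the sketch below autocompletes a typed prefix against the labels of a vocabulary and returns concepts rather than strings. The mini-vocabulary, its identifiers and the function name are assumptions made for this example; a real service would draw on the aligned vocabularies of the semantic layer.

```python
# Hypothetical mini-vocabulary: concept id -> preferred and alternative labels.
VOCAB = {
    "ex:matisse": {"pref": "Henri Matisse", "alt": ["Matisse, Henri"]},
    "ex:musee_matisse": {"pref": "Musee Matisse", "alt": ["Matisse Museum"]},
    "ex:monet": {"pref": "Claude Monet", "alt": ["Monet, Claude"]},
}


def autocomplete(prefix, vocab=VOCAB):
    """Suggest (concept id, preferred label) pairs for a typed prefix.

    A concept matches when any word of any of its labels starts with the
    prefix, compared case-insensitively.
    """
    needle = prefix.casefold()
    hits = []
    for concept, labels in vocab.items():
        all_labels = [labels["pref"], *labels["alt"]]
        if any(
            word.casefold().startswith(needle)
            for label in all_labels
            for word in label.split()
        ):
            hits.append((concept, labels["pref"]))
    return sorted(hits)


# Typing "mat" suggests both the painter and the museum as distinct concepts.
print(autocomplete("mat"))
```

Because suggestions are concepts, the user disambiguates before the query reaches the operational search engine.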
ClioPatria: “Matisse”
• “Matisse” in the title
• Located in “Musee Matisse”
• Created by “Matisse”
• Paintings in the same style as used by “Matisse”
What Jan is presenting (cont'd)
• Semantic search is oriented towards serendipity
• Great, but there are scalability problems
• Standing in the path of the operational system?
– Not really recommended…
• Still allows for parallel and maybe post-processing
– For scenarios where the user can cope with rich information
What Jan is presenting (cont'd)
• Other solution?
– Like, more basic stuff!
• Well, we have a schema that presents quite detailed distinctions, let's make it work…
Derived properties as a way to "index" derived relations
• Complex paths are expensive to query
• Shortcuts are useful
• Example: searching for "Everything inspired by Leonardo's work"
In the original EDM-compliant graph
Derived properties as a way to "index" derived relations
• Having the value "Leonardo" somehow directly attached to the surrogate of MonaLisa2000 would be handy
– As well as the names of Da Vinci in other languages
• In fact this can be used for enriching the (XML) records before they get indexed in the Europeana operational service
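A minimal sketch of such a derived property follows, assuming a toy graph. The property names (`ex:describes`, `ex:inspiredBy`, `ex:creator`, `ex:label`, `ex:inspiredByCreatorLabel`) and the path itself are hypothetical and do not reproduce the actual EDM graph from the slides; the point is only that a multi-step path is walked once and stored as a direct statement on the surrogate.

```python
# Toy graph with hypothetical property names (not actual EDM properties).
GRAPH = [
    ("ex:surrogate/MonaLisa2000", "ex:describes", "ex:work/MonaLisa2000"),
    ("ex:work/MonaLisa2000", "ex:inspiredBy", "ex:work/MonaLisa"),
    ("ex:work/MonaLisa", "ex:creator", "ex:agent/Leonardo"),
    ("ex:agent/Leonardo", "ex:label", "Leonardo da Vinci"),
    ("ex:agent/Leonardo", "ex:label", "Léonard de Vinci"),
]

# The complex path that the shortcut property stands for.
PATH = ["ex:describes", "ex:inspiredBy", "ex:creator", "ex:label"]


def follow(graph, start, path):
    """Return all nodes reached from start by following the path properties in order."""
    frontier = {start}
    for prop in path:
        frontier = {o for s, p, o in graph if s in frontier and p == prop}
    return frontier


def materialize(graph, shortcut, path):
    """Return graph plus one (subject, shortcut, value) triple per path match."""
    subjects = {s for s, p, _ in graph if p == path[0]}
    derived = [(s, shortcut, v) for s in subjects for v in follow(graph, s, path)]
    return graph + sorted(derived)


enriched = materialize(GRAPH, "ex:inspiredByCreatorLabel", PATH)
for triple in enriched[len(GRAPH):]:
    print(triple)
```

The path is evaluated once at enrichment time, so a later search only has to match the direct value on the surrogate.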
Compiling for traditional text-search
[Diagram boxes: EDM, Semantic Engine, XML Dump, Lucene]
Determining pre-compilation strategies
• What should a pre-compiled, enriched record contain?
– Labels? Closely related concepts? Labels from other languages?
– Which shortcuts are relevant? Which are the most useful?
• Coming up with appropriate ways to make the schema work
• Maybe several profiles can be used
– Cf. the way the different elements of the ESE are used for different Europeana features (timeline, advanced search, basic search)
• This is also semantics!
• But highly dependent on applications
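One way to picture such profiles is the sketch below, where a profile decides which labels are copied into a record before it is handed to the text index. Everything here is an assumption for illustration: the concept data, the two profile definitions and the `enriched_text` field are hypothetical, and only the feature names "basic search" and "advanced search" come from the slide.

```python
# Hypothetical concept data: labels per language and closely related concepts.
CONCEPTS = {
    "ex:agent/Leonardo": {
        "labels": {"en": "Leonardo da Vinci", "fr": "Léonard de Vinci"},
        "related": ["ex:concept/Renaissance"],
    },
    "ex:concept/Renaissance": {
        "labels": {"en": "Renaissance"},
        "related": [],
    },
}

# Hypothetical profiles: what each feature wants to find in an enriched record.
PROFILES = {
    "basic_search": {"languages": ["en"], "include_related": False},
    "advanced_search": {"languages": ["en", "fr"], "include_related": True},
}

RECORD = {
    "id": "ex:surrogate/MonaLisa2000",
    "title": "Mona Lisa 2000",
    "concepts": ["ex:agent/Leonardo"],
}


def precompile(record, profile_name):
    """Return a copy of record with an extra 'enriched_text' list for indexing."""
    profile = PROFILES[profile_name]
    concepts = list(record["concepts"])
    if profile["include_related"]:
        for concept in record["concepts"]:
            concepts.extend(CONCEPTS[concept]["related"])
    terms = []
    for concept in concepts:
        for lang in profile["languages"]:
            label = CONCEPTS[concept]["labels"].get(lang)
            if label and label not in terms:
                terms.append(label)
    return {**record, "enriched_text": terms}


print(precompile(RECORD, "basic_search")["enriched_text"])
print(precompile(RECORD, "advanced_search")["enriched_text"])
```

Keeping the choice in a profile keeps the application-specific decisions out of the data model itself.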
Guidelines and best practices will be handy
• Connecting specialized data models to more generic ones
• Enrichment
• Connection of objects (identity conditions)
• "Practical" application-specific semantics