EConnect WP1 & semantic issues

• VU members

– Guus Schreiber, Antoine Isaac, Jacco van Ossenbruggen, Jan Wielemaker

EConnect WP1

• Creation of semantic layer

• Alignment of vocabularies in that layer

• Technical specifications of semantics-based operations

• Integration with operational services

EDL D2.5 Requirements

• Contextualization of works using knowledge organization systems

• Reminder: SKOS can represent cross-vocabulary links
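
As an illustration of cross-vocabulary links, here is a minimal sketch that stores SKOS statements as plain triples and follows skos:exactMatch to gather labels from two vocabularies. The skos: property names are real SKOS terms; the ex1: and ex2: concepts, their labels, and the helper functions are hypothetical and only serve the example.

```python
# Sketch only: SKOS statements as (subject, predicate, object) tuples.
# The ex1:/ex2: concepts and labels below are invented for illustration.
TRIPLES = [
    ("ex1:painting", "skos:prefLabel", "painting"),
    ("ex2:schilderij", "skos:prefLabel", "schilderij"),
    ("ex2:beeldende_kunst", "skos:prefLabel", "beeldende kunst"),
    ("ex1:painting", "skos:exactMatch", "ex2:schilderij"),
    ("ex1:painting", "skos:broadMatch", "ex2:beeldende_kunst"),
]


def equivalent_concepts(concept, triples):
    """Collect concepts linked to `concept` by skos:exactMatch.

    skos:exactMatch is treated as symmetric and transitive, so links are
    followed in both directions until nothing new is found.
    """
    seen = {concept}
    todo = [concept]
    while todo:
        current = todo.pop()
        for s, p, o in triples:
            if p != "skos:exactMatch":
                continue
            if s == current:
                other = o
            elif o == current:
                other = s
            else:
                continue
            if other not in seen:
                seen.add(other)
                todo.append(other)
    return seen


def labels_across_vocabularies(concept, triples):
    """Return the preferred labels of a concept and its exact matches, sorted."""
    concepts = equivalent_concepts(concept, triples)
    return sorted(o for s, p, o in triples
                  if p == "skos:prefLabel" and s in concepts)


labels = labels_across_vocabularies("ex2:schilderij", TRIPLES)
```

Only skos:exactMatch is followed here; the skos:broadMatch link is kept in the data but deliberately ignored, since a broader concept is not an equivalent one.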

Important enriching challenges

Fine vision, but this requires:

• Identifying simple values with more complex objects, from outside the original context

• Mapping knowledge organization systems together

• Recognizing relations between surrogates

– Including when they represent the same work

Important enriching challenges

• Finding good vocabularies for enriching and aligning

• Subject vocabularies or authority files richly structured, with appropriate coverage of a domain

• These are likely to be specific to a domain

– At the level of aggregators?

The EDM, and then?

• That model alone is not enough

• It fits some precise data modeling and access needs

• But it does not commit much to specific domain or application requirements

– Remember: it's a feature!

Two steps for flexible and useful knowledge representation

• Fitting domain via specialization

– Cf. Martin: cross-model integration via property specialization

– ens:isAbout > dc:subject > rma:depicts

– skos:broader > ex:broaderPartitive

• Someone has to take care of this:

– Europeana? Content providers? Aggregators?

– Cf. the process devised for ESE
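
Property specialization can be sketched as follows, assuming the ">" in the chains above reads as "is a superproperty of" (in the sense of rdfs:subPropertyOf). The property names come from the slide; the example statement, the single-parent table, and the function names are hypothetical simplifications.

```python
# Sketch only: each property maps to its direct superproperty.
# Assumption: "a > b" on the slide means b is a subproperty of a.
SUBPROPERTY_OF = {
    "rma:depicts": "dc:subject",
    "dc:subject": "ens:isAbout",
    "ex:broaderPartitive": "skos:broader",
}


def superproperties(prop):
    """Return all superproperties of `prop`, nearest first."""
    chain = []
    seen = {prop}
    while prop in SUBPROPERTY_OF:
        prop = SUBPROPERTY_OF[prop]
        if prop in seen:  # guard against accidental cycles in the table
            break
        seen.add(prop)
        chain.append(prop)
    return chain


def infer(triples):
    """Add, for every statement, the same statement with each superproperty."""
    inferred = set(triples)
    for s, p, o in triples:
        for sup in superproperties(p):
            inferred.add((s, sup, o))
    return inferred


# Hypothetical provider statement using the most specific property.
provider_data = [("ex:nightwatch", "rma:depicts", "ex:militia")]
generic_view = infer(provider_data)
```

With this kind of inference a provider can keep its specific property, while a generic service queries only the most general one (here ens:isAbout) and still finds the statement.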


• Fitting application requirements

• Art of creating shortcuts in the representations

• New application-specific properties as views over complex paths

– surrogateMatch

– integrating all views

This is important, and related to how Europeana will exploit the EDM

• What Jan is telling in the room above

• Several options for considering semantic services for Europeana

• Pre-processing of query

– E.g., autocompletion using semantic networks

• Parallel processing of query

At the end of parallel processing

[Figure labels: Disambiguation, Relations, Vocabularies]

• Parallel processing of query

• Post-processing of query

– E.g., clustering

ClioPatria: “Matisse”

• “Matisse” in the title

• Located in “Musee Matisse”

• Created by “Matisse”

• Paintings in the same style as used by “Matisse”
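
The clustering idea behind such a result display can be sketched as grouping hits by the relation that connects them to the query term. This is not ClioPatria's actual implementation; the object identifiers, relation names, and function below are hypothetical.

```python
from collections import defaultdict

# Sketch only: each hit is (object id, relation linking it to the query term).
# Identifiers and relation names are invented for illustration.
RESULTS = [
    ("obj1", "title contains"),
    ("obj2", "created by"),
    ("obj3", "located in"),
    ("obj4", "created by"),
    ("obj5", "same style as works by"),
]


def cluster_by_relation(results):
    """Group result objects by the relation that made them match."""
    clusters = defaultdict(list)
    for obj, relation in results:
        clusters[relation].append(obj)
    return dict(clusters)


clusters = cluster_by_relation(RESULTS)
```

Each key of the resulting dictionary corresponds to one cluster heading, so a user can see why an object was returned, not only that it was.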

What Jan is telling (cont'd)

• Semantic search is oriented towards serendipity

• Great, but there are scalability problems

• Standing in the path of the operational system?

– Not really recommended…

• Still allows for parallel and maybe post-processing

– For scenarios where the user can cope with rich information

What Jan is telling (cont'd)

• Other solution?

– Like, more basic stuff!

• Well, we have a schema that presents quite detailed distinctions, let's make it work…

Derived properties as a way to "index" derived relations

• Complex paths are expensive to query

• Shortcuts are useful

• Example: searching for "Everything inspired by Leonardo's work"

In the original EDM-compliant graph

Derived properties as a way to "index" derived relations

• Having the value "Leonardo" somehow directly attached to the surrogate of MonaLisa2000 would be handy

– As well as labels in other languages for Da Vinci

• In fact this can be used for enriching the (XML) records before they get indexed in the Europeana operational service
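
A derived property can be sketched as follows: walk a multi-step path once, then store the end values directly on the starting node. The graph below is a guess at the shape of the Leonardo example (the original figure is not available), and all ex: property names, including the shortcut ex:inspiredByCreatorLabel, are hypothetical rather than EDM terms.

```python
# Sketch only: a tiny graph as a set of (subject, predicate, object) tuples.
# All ex: names are invented; the real EDM graph will look different.
GRAPH = {
    ("ex:MonaLisa2000", "ex:inspiredBy", "ex:MonaLisa"),
    ("ex:MonaLisa", "ex:creator", "ex:Leonardo"),
    ("ex:Leonardo", "ex:label", "Leonardo da Vinci"),
    ("ex:Leonardo", "ex:label", "Léonard de Vinci"),
}


def objects(graph, subject, predicate):
    """Return all objects of statements with the given subject and predicate."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def follow_path(graph, start, path):
    """Follow a sequence of predicates from `start`; return the end nodes."""
    frontier = {start}
    for predicate in path:
        frontier = {o for node in frontier
                    for o in objects(graph, node, predicate)}
    return frontier


def materialise_shortcut(graph, path, shortcut):
    """Create direct `shortcut` statements for every complete path instance."""
    starts = {s for s, p, o in graph if p == path[0]}
    return {(s, shortcut, target)
            for s in starts
            for target in follow_path(graph, s, path)}


derived = materialise_shortcut(
    GRAPH,
    ["ex:inspiredBy", "ex:creator", "ex:label"],
    "ex:inspiredByCreatorLabel",
)
```

The path is traversed once at compile time; afterwards a plain lookup on the shortcut property answers the query, and the derived values can be copied into the record that gets indexed.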

Compiling for traditional text-search

[Figure labels: EDM, Semantic Engine, XML Dump, Lucene]
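
The compilation step can be sketched as adding values computed in the semantic layer to an XML record before a text indexer sees it. The record layout, the element name "enrichment", and the function are hypothetical; they do not reflect the actual ESE or Europeana record format.

```python
import xml.etree.ElementTree as ET


def enrich_record(record_xml, extra_values, field="enrichment"):
    """Append one <field> element per extra value to the record's root.

    `field` is a made-up element name; a real pipeline would use whatever
    element the indexing schema defines for enriched text.
    """
    record = ET.fromstring(record_xml)
    for value in extra_values:
        ET.SubElement(record, field).text = value
    return ET.tostring(record, encoding="unicode")


# Hypothetical minimal record and hypothetical values from the semantic layer.
RECORD = "<record><title>MonaLisa2000</title></record>"
enriched = enrich_record(RECORD, ["Leonardo da Vinci", "Léonard de Vinci"])
```

After this step the indexer needs no knowledge of the graph: a plain text search for "Leonardo" matches the enriched record.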

Determining pre-compilation strategies

• What should a pre-compiled, enriched record contain?

– Labels? Closely related concepts? Labels from other languages?

– Which shortcuts are relevant? Which are the most useful?

• Coming up with appropriate ways to make the schema work

• Maybe several profiles can be used

– Cf. the way the different elements of the ESE are used for different Europeana features (timeline, advanced search, basic search)

• This is also semantics!

• But highly dependent on applications
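
One way to picture application-dependent profiles is a table that states which kinds of enrichment each feature receives. The feature names follow the Europeana features mentioned above; the enrichment categories, the choice of categories per profile, and the sample values are hypothetical.

```python
# Sketch only: which enrichment categories each application profile keeps.
# The assignments below are invented, not an actual Europeana configuration.
PROFILES = {
    "basic_search": {"labels", "other_language_labels"},
    "advanced_search": {"labels", "other_language_labels", "related_concepts"},
    "timeline": {"dates"},
}


def compile_for_profile(enrichments, profile):
    """Keep only the enrichment categories that the given profile asks for."""
    wanted = PROFILES[profile]
    return {kind: values for kind, values in enrichments.items()
            if kind in wanted}


# Hypothetical enrichments computed for one record.
ENRICHMENTS = {
    "labels": ["Leonardo da Vinci"],
    "other_language_labels": ["Léonard de Vinci"],
    "related_concepts": ["Renaissance"],
    "dates": ["1503"],
}

timeline_view = compile_for_profile(ENRICHMENTS, "timeline")
basic_view = compile_for_profile(ENRICHMENTS, "basic_search")
```

Keeping the profile table separate from the enrichment code means the open questions on this slide become configuration choices that can be revised per application.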

Guidelines and best practices will be handy

• Connecting specialized data models to more generic ones

• Enrichment

• Connection of objects (identity conditions)

• "Practical" application-specific semantics
