vocabulary management and skos - taxonic · vocabulary management and skos putting business in the...

Vocabulary management and SKOS Putting Business in the Lead Jan Voskuil (Taxonic) September 5th, 2014, Leipzig SEMANTiCS 2014


Page 1: Vocabulary management and SKOS - Taxonic · Vocabulary management and SKOS Putting Business in the Lead Jan Voskuil (Taxonic) September 5th, 2014, Leipzig SEMANTiCS 2014. Introduction

Vocabulary management and SKOS

Putting Business in the Lead

Jan Voskuil (Taxonic)

September 5th, 2014, Leipzig

SEMANTiCS 2014


Introduction

Jan Voskuil Taxonic (co-founder)

Consultancy in Semantic Technology

“SKOS is used for findability, but it should also be used for vocabulary management in organizations. Business owns the dictionary, not IT.”

• What are dictionaries and what for?

• SKOS: Tooling and benefits

• Practicalities


Dienst Justitiële Inrichtingen (DJI)

Custodial Institutions Agency

Approx. 10,000 employees

Approx. 70,000 inmates per year

Approx. 50 facilities

Four groups of detainees

Adult detainees

Juvenile offenders

Patients in forensic care

Foreign nationals


Dictionaries: Benefits

• Knowledge management

• Quality of information

• Manageability

– If your systems contain 100K+ of attribute names, then they contain unstructured information (Dave McComb)

• Findability

– Document (DMS)

– Data (DBMS)

• Exchangeability


How many key words are enough?

• Zipf’s Law

• 5000 words are enough to understand 95% of any corpus. For the other 5% you need to know the other 200,000 words

Source: Tiberius and Schoonheim, A Frequency Dictionary of Dutch, 2014

[Figure: Zipf’s Law, with labels “Frequency of the most frequent word” and “Frequency of the second most frequent word”]

Pocket dictionary: 5K

General dictionary: 100K

Lexicographic dictionary: 1M+
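
The coverage argument can be explored with a small sketch. It assumes an idealized Zipf distribution in which the frequency of the word at rank r is proportional to 1/r^s; the function names, the exponent s = 1 and the vocabulary size of 205,000 (5000 plus the "other 200,000" words) are illustrative assumptions, not figures from the cited frequency dictionary. An idealized curve is only a rough model, so its output need not match the 95% quoted above.

```python
def zipf_weights(vocabulary_size: int, exponent: float = 1.0) -> list[float]:
    """Relative frequency of each rank under an idealized Zipf distribution."""
    return [1.0 / rank ** exponent for rank in range(1, vocabulary_size + 1)]


def zipf_coverage(vocabulary_size: int, top_k: int, exponent: float = 1.0) -> float:
    """Share of all word occurrences covered by the top_k most frequent words."""
    if not 0 <= top_k <= vocabulary_size:
        raise ValueError("top_k must lie between 0 and vocabulary_size")
    weights = zipf_weights(vocabulary_size, exponent)
    return sum(weights[:top_k]) / sum(weights)


def words_needed(vocabulary_size: int, target: float, exponent: float = 1.0) -> int:
    """Smallest number of top-ranked words whose coverage reaches the target share."""
    weights = zipf_weights(vocabulary_size, exponent)
    total = sum(weights)
    running = 0.0
    for rank, weight in enumerate(weights, start=1):
        running += weight
        if running / total >= target:
            return rank
    return vocabulary_size


if __name__ == "__main__":
    VOCABULARY_SIZE = 205_000  # assumption: 5000 key words + 200,000 others
    share = zipf_coverage(VOCABULARY_SIZE, 5000)
    print(f"Top 5000 words cover {share:.1%} under the idealized model")
    print("Words needed for 95%:", words_needed(VOCABULARY_SIZE, 0.95))
```

Changing `exponent` shows how sensitive the size of a "pocket dictionary" is to the shape of the frequency curve.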


The Real World

Dictionary | Owner
Begrippenwoordenboek DJI | Dept X
Begrippenlijst Project Y | Project Y
Mega Glossary | ICT-Dept
Information chain dictionaries
Ketenwoordenboek Strafrecht | JustID
Ketenwoordenboek Vreemdelingen | JustID
Justitiethesaurus | WODC
Data Dictionaries
Gegevenswoordenboek MITS | ICT-Dept
Datadictionary Tulp MIR | ICT-Dept
…

It just does not work!

What is the correct definition of x? Who decides this?

My project introduces new terms, how can I get these accepted?


OLD SITUATION | NEW SITUATION
Various lists | Single source of truth
Various versions | Single source of truth
Word-documents | Intranet (Internet)
Distribution per mail | Intranet (Internet)
Endless discussions | Clear-cut governance
Responsibility of IT dept or project | Ownership by the business


Some How To’s

• Keep the dictionary lean and mean

– Create a “pocket dictionary”

– Example: 1200 key words

• Governance: be pragmatic

• Ownership within the business!

• Use clear, explanatory descriptions

– Language of the work force

– Avoid legal speak!

• Dictionary maintenance is a continuous process!

– Release cycle

– One major, four minor releases per year

– Major release is approved by senior executives


Why SKOS is so great: just enough semantics

• Semantic relations

– Compare one-dimensional lists

• A LIMITED number of STANDARDIZED semantic relations

– Broader, Narrower, Related Term

– Semantics is sufficiently vague

• Intuitive, easy to understand

– Ideal for “pidginization”

– Use is far broader than Class Diagrams, ERDs and ontologies

• Only most relevant info

• “GENERALIZED CLASSIFICATION”

[Diagram: example concept hierarchies]

Justitiabele (“Detainee”), narrower: Adult detainee, Juvenile offender, Foreign national, Patient in forensic care

Criminal Law, narrower: Penal Institution

Sex, narrower: Male, Female, Unknown, Undisclosed
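
The "just enough semantics" idea can be sketched with nothing more than a mapping from each concept to its narrower concepts. The labels come from the example hierarchies on this slide; the plain-dictionary representation and the helper names are illustrative assumptions, not the way any SKOS tool stores its data.

```python
from collections import defaultdict

# Narrower relations, as read from the example hierarchies on the slide.
NARROWER: dict[str, list[str]] = {
    "Justitiabele (Detainee)": [
        "Adult detainee",
        "Juvenile offender",
        "Foreign national",
        "Patient in forensic care",
    ],
    "Criminal Law": ["Penal Institution"],
    "Sex": ["Male", "Female", "Unknown", "Undisclosed"],
}


def invert(narrower: dict[str, list[str]]) -> dict[str, list[str]]:
    """Derive the broader relation, which is the inverse of narrower."""
    broader: dict[str, list[str]] = defaultdict(list)
    for parent, children in narrower.items():
        for child in children:
            broader[child].append(parent)
    return dict(broader)


def top_concepts(narrower: dict[str, list[str]]) -> list[str]:
    """Concepts that have narrower concepts but no broader concept."""
    broader = invert(narrower)
    return [concept for concept in narrower if concept not in broader]


BROADER = invert(NARROWER)

if __name__ == "__main__":
    print(BROADER["Juvenile offender"])
    print(top_concepts(NARROWER))
```

Note that "Criminal Law" to "Penal Institution" is not a subclass relation; the single, deliberately vague narrower link covers it anyway, which is the point of "generalized classification".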


Why SKOS is so great: tooling


Tooling: PoolParty Thesaurus Manager


End User View


SKOS is an Open Standard: Project Linking


http://vocabulary.wolterskluwer.de


prefLabel: Unfallverhütung

Alternative labels

Broaders

Narrowers

Related terms

From DBPedia

From lod.gesis.org

From eurovoc.org

From Wolters Kluwer

Other thesauri on the web

DJI and the POLICE have very different meanings for the word ARRESTANT

DO:

> RESPECT DIFFERENCES BETWEEN ORGANIZATIONS

> MAKE LEXICOGRAPHIC DIFFERENCES EXPLICIT USING LINKED THESAURI

DON’T:

> TRY MAKING ALL ORGANIZATIONS USE EXACTLY THE SAME LANGUAGE
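
As a rough sketch of what "making differences explicit" could look like, each organization keeps its own concept for the same word and an explicit mapping link records how the two relate. Everything here is illustrative: the `Concept` class and helper names are hypothetical, the definitions are placeholders because the slide does not give them, and the choice of `skos:relatedMatch` (one of the standard SKOS mapping properties) is an assumption, not a recommendation from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    scheme: str        # the organization's thesaurus that owns the concept
    pref_label: str
    definition: str


# Placeholder definitions: the actual wording is not given on the slide.
DJI_ARRESTANT = Concept("DJI", "arrestant", "<DJI definition>")
POLICE_ARRESTANT = Concept("POLICE", "arrestant", "<police definition>")
CONCEPTS = [DJI_ARRESTANT, POLICE_ARRESTANT]

# Mapping links between thesauri: (source, mapping property, target).
# The property used here is an assumption chosen for illustration.
MAPPINGS = [(DJI_ARRESTANT, "skos:relatedMatch", POLICE_ARRESTANT)]


def senses(label: str, concepts: list[Concept]) -> list[Concept]:
    """All concepts, across organizations, that share the given label."""
    return [c for c in concepts if c.pref_label.lower() == label.lower()]


def mapped_to(concept: Concept, mappings) -> list[tuple[str, Concept]]:
    """Outgoing mapping links of a concept as (property, target) pairs."""
    return [(prop, target) for source, prop, target in mappings if source == concept]


if __name__ == "__main__":
    for sense in senses("ARRESTANT", CONCEPTS):
        print(sense.scheme, "->", sense.definition)
    print(mapped_to(DJI_ARRESTANT, MAPPINGS))
```

Neither organization has to give up its own definition; the link only states that the two concepts are related, and how.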


Conclusion and next step:

Linking Thesauri to Datamodels

• Datamodels: not owned by business

– too detailed

– too complex

– NO ownership at the strategic level

• Thesauri

– Relatively abstract

– Relatively simple

– Ownership by the business

• SKOS bridges the gap

– With datamodels in RDF, the gap can be bridged!


THESAURUS AND DOMAINMODELS: SCENARIO 1

[Diagram: RDF graph in two parts, reconstructed here as statements]

DOMAIN MODEL | Data dictionary
:inmate#9818763 :cell “B.23.a”
:inmate#9818763 :isRegisteredAt :pi_Dordrecht
:pi_Dordrecht rdf:type :penitentiaryInstitution

THESAURUS
voc:4862 rdf:type skos:Concept
voc:4862 skos:prefLabel “Penitentiary Institution”
voc:4862 skos:broader (concept labelled “Detention Facility”)
voc:4862 skos:exactMatch eurovoc:C877
eurovoc:C877 rdf:type skos:Concept
eurovoc:C877 skos:prefLabel “място за лишаване от свобода”@bg
eurovoc:C877 skos:prefLabel “Penal Institution”@en
skos:definition “A prison,[3] gaol or jail[4] is a facility in which inmates are forcibly confined and denied a variety of freedoms under the authority of…

Link between domain model and thesaurus: owl:sameAs? skos:exactMatch?
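
The Scenario 1 graph can be read as a handful of triples, which makes the role of the link between the two halves concrete. The sketch below is one reading of the flattened diagram, not a definitive reconstruction; the helper names are hypothetical, and because the slide leaves the linking property as an open question (owl:sameAs or skos:exactMatch), it is passed in as a parameter rather than fixed.

```python
RDF_TYPE = "rdf:type"
PREF_LABEL = "skos:prefLabel"


def build_graph(link_property: str) -> set[tuple[str, str, str]]:
    """Triples read from the Scenario 1 diagram plus the assumed link."""
    return {
        # Domain model | data dictionary
        (":inmate#9818763", ":isRegisteredAt", ":pi_Dordrecht"),
        (":pi_Dordrecht", RDF_TYPE, ":penitentiaryInstitution"),
        # Thesaurus
        ("voc:4862", RDF_TYPE, "skos:Concept"),
        ("voc:4862", PREF_LABEL, "Penitentiary Institution"),
        ("voc:4862", "skos:exactMatch", "eurovoc:C877"),
        ("eurovoc:C877", PREF_LABEL, "Penal Institution"),
        # Assumed link from the data-model class to the business concept
        (":penitentiaryInstitution", link_property, "voc:4862"),
    }


def objects(graph, subject: str, predicate: str) -> list[str]:
    """All objects of triples with the given subject and predicate, sorted."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def business_labels(graph, instance: str, link_property: str) -> list[str]:
    """Follow instance -> class -> thesaurus concept -> preferred label."""
    labels = []
    for cls in objects(graph, instance, RDF_TYPE):
        for concept in objects(graph, cls, link_property):
            labels.extend(objects(graph, concept, PREF_LABEL))
    return labels


if __name__ == "__main__":
    link = "skos:exactMatch"  # assumption; the slide also considers owl:sameAs
    graph = build_graph(link)
    print(business_labels(graph, ":pi_Dordrecht", link))
```

With the link in place, a record in the data model can be explained in the business's own vocabulary, which is the bridge the conclusion refers to.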


THESAURUS AND DOMAINMODELS: SCENARIO 2

[Diagram: RDF graph with the area labels “DOMAIN MODEL | Data dictionary” and “THESAURUS”, reconstructed here as statements]

:inmate#9818763 :cell “B.23.a”
:inmate#9818763 :isRegisteredAt :pi_Dordrecht
:pi_Dordrecht rdf:type :penitentiaryInstitution
“Penitentiary Institution” rdf:type skos:Concept
“Detention Facility”
skos:exactMatch eurovoc:C877
eurovoc:C877 rdf:type skos:Concept
eurovoc:C877 skos:prefLabel “място за лишаване от свобода”@bg
eurovoc:C877 skos:prefLabel “Penal Institution”@en
“A prison,[3] gaol or jail[4] is a facility in which inmates are forcibly confined and denied a variety of freedoms under the authority of…