October 19, 2005 1 The Semantic Web: What is it and why should you care? Semantic Arts, Inc. Dave McComb for Toronto IRMAC/DAMA Oct 19, 2005


October 19, 2005 1

The Semantic Web: What is it and why should you care?

Semantic Arts, Inc.

Dave McComb

for Toronto IRMAC/DAMA Oct 19, 2005

October 19, 2005 2

Objectives

Semantics > Good Definitions

Exotic Terminology

Pursue this further

October 19, 2005 3

Discipline

Standards

Tools

Content

Infrastructure

Semantic Web

Semantic Technology

Semantic Methodology,

Design & Approach

October 19, 2005 4

Discipline

Standards

Tools

Content

Infrastructure

Part 1: Intro, Concepts and

Methods

Part 2: Semantic Metadata and

Annotated Data

Part 3: Semantic Web

Part 4: Demos

October 19, 2005 5

Semantic Concepts, Discipline and Methods

Discipline

Standards

Tools

Content

Infrastructure

Part 1: Intro, Concepts and

Methods

October 19, 2005 6

Semantics

The study of meaning (sometimes the study of the meaning of words)

October 19, 2005 7

October 19, 2005 8

October 19, 2005 9

Structure and Metadata

You can now deal with thousands, even millions of transactions, by knowing only a small amount of metadata

October 19, 2005 10

Drowning in Metadata

Thousands -> millions of bits of metadata

Meta metadata? XMI/MOF/CWM

Millions -> billions of instances in hundreds of databases

Commit to shared ontologies to get back to thousands / tens of thousands of concepts

October 19, 2005 11

Operative Semantics

Some of these fields are “known” to the system and cause overt changes in behavior

October 19, 2005 12

Others are more subtle

This one shows up on the detailed P&L reports

This one shows up in the AP list of bills to pay

This one shows up on the check

October 19, 2005 13

None of this is mentioned in the user manual or in the online help text

October 19, 2005 14

Scale issues

October 19, 2005 15

Carver Mead

October 19, 2005 16

Flat Earth Schema

We need to get up out of the weeds

Higher level, business concepts

October 19, 2005 17

Semantic Framework

Prime

Category

Context

October 19, 2005 18

Anna Wierzbicka

Semantics: Primes and Universals

Anna Wierzbicka

October 19, 2005 19

Semantic Primes

Prime

Category

Context

Anna Wierzbicka

October 19, 2005 20

First Prime

Discrete Physical Object
– Something to which you could (potentially) attach a unique bar code

October 19, 2005 21

Physical Items

October 19, 2005 22

Semantic Primes for Business

– People
– Animals
– Physical Made Items
– Buildings
– Landmarks
– Physical Container
– Homogeneous Material
– Legal Entities
– Historical Events
– Conversion
– Scheduled Events
– Defined Events
– Measurement
– Estimate
– Monetary Amount
– Reference Value
– Decision
– Request
– Rights
– Permission
– Offer
– Order (Directive)
– Contract/Order
– Messages
– Documents
– Inventions
– Programs

October 19, 2005 23

“G’arn?”

“Narn”

Role of context

October 19, 2005 24

Context

How many addresses do you have in your database?

One of our clients has 116.

How many types of addresses are there?

Prime

Category

Context

October 19, 2005 25

Context

Prime

Category

Context

What differentiates the 116?

Context, such as:
– Where
– When
– Relationships
– Purpose
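The point about the 116 addresses can be sketched in code. This is a minimal illustration, not the client's model: the class names `Address` and `AddressUse`, the field names, and all sample values are assumptions made up for the example. One prime (the address) is reused, and only the context varies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    """The single prime: one place, stored once."""
    text: str


@dataclass(frozen=True)
class AddressUse:
    """One use of an address, distinguished only by its context."""
    address: Address
    where: str
    when: str
    relationship: str
    purpose: str


# Illustrative values only; none of these come from the talk.
office = Address("123 Example St")
uses = [
    AddressUse(office, "head office", "2005", "customer", "billing"),
    AddressUse(office, "head office", "2005", "customer", "shipping"),
    AddressUse(office, "head office", "2005", "supplier", "remittance"),
]

# Three "address types" in a conventional schema collapse to one prime
# plus three contexts.
distinct_addresses = {u.address for u in uses}
distinct_contexts = {(u.relationship, u.purpose) for u in uses}
print(len(distinct_addresses), len(distinct_contexts))
```

Under this reading, a new "type" of address is a new combination of context values, not a new table or column.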

October 19, 2005 26

Categories

Prime

Category

Context

How Categories Inform Us

October 19, 2005 27

Example Categories

Inventory system (categories disguised as attributes):

Attractive

Insurance spare

Fast/Slow Moving

A/B/C

High/Low Value

Degradable

October 19, 2005 28

Example Categories

Inventory system (categories disguised as entities):

Equipment

Kits

Parts

Tools

Serialized Parts

Raw Material

Assemblies

Phantoms

October 19, 2005 29

Example Categories

Inventory system (categories disguised as states):

Obsolete

Reserved

Out of Stock

In Inspection

Discontinued

On Order

October 19, 2005 30

Example Categories

Inventory system (categories disguised as relations):

On consignment

In Use

Stock for this warehouse

Preferred Supplier

Issued to

October 19, 2005 31

What are we doing???

We categorize things all the time.

As data modelers we set up other people’s categories for them.

We decide whether their categories will be expressed as:
– Entities
– Attributes (codes, enums, flags and labels)
– States
– Relations
– Classes
– Types
– etc.

October 19, 2005 32

Category Definition

Encarta: “a group or set of things, people, or actions that are classified together because of common characteristics”

Cambridge (English): “a type, or a group of things having some features that are the same”

Cambridge (American): “a grouping of people or things by type in any systematic arrangement. (The light trucks weigh less than 5,000 pounds and are in a category that includes minivans, pickups, and sport utility vehicles)”

Infoplease: “any general or comprehensive division; a class”

Encyclopedia.com: “philosophical term that literally means predication or assertion”

October 19, 2005 33

Operative Definition of Categories

Semantic Arts: “A description of a set of things that contains:

– A set of testable membership criteria that can either improve or reduce our confidence in the membership

– A set of additional information that can be inferred from the membership

– A set of behaviors that can be applied to members of the category

– A set of questions that can be applied to the instance to gather property or relationship values”
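The four parts of this operative definition map naturally onto a small data structure. The sketch below is only one possible reading: the `Category` class, the weighting scheme for confidence, and the "Degradable" sample data are all assumptions introduced for illustration, not something shown in the talk.

```python
from dataclasses import dataclass, field
from typing import Callable

Item = dict  # in this sketch an instance is just a bag of property values


@dataclass
class Category:
    name: str
    # Each criterion is (test, weight): passing raises confidence by the
    # weight, failing lowers it by the same amount (an assumed scheme).
    criteria: list[tuple[Callable[[Item], bool], float]]
    inferences: dict[str, object] = field(default_factory=dict)
    behaviors: list[str] = field(default_factory=list)
    questions: list[str] = field(default_factory=list)

    def confidence(self, item: Item) -> float:
        """Start neutral, then let each testable criterion move the score."""
        score = 0.5
        for test, weight in self.criteria:
            score += weight if test(item) else -weight
        return max(0.0, min(1.0, score))


# Hypothetical category, loosely based on the "Degradable" inventory flag.
degradable = Category(
    name="Degradable",
    criteria=[
        (lambda i: i.get("shelf_life_days") is not None, 0.3),
        (lambda i: i.get("material") == "rubber", 0.2),
    ],
    inferences={"needs_expiry_tracking": True},
    behaviors=["issue oldest stock first"],
    questions=["What is the shelf life in days?"],
)

o_ring = {"material": "rubber", "shelf_life_days": 365}
bolt = {"material": "steel"}
print(degradable.confidence(o_ring), degradable.confidence(bolt))
```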

October 19, 2005 34

Hidden Categories

Almost every “IF…THEN…” or “CASE…” statement contains a category

So does the procedures manual

You are aware of some of them
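A small sketch of what a hidden category looks like in code. The order fields, the 10,000 threshold, and the function names are hypothetical; the point is only that the condition inside the `if` is a category that nobody has named.

```python
def approval_needed_hidden(order: dict) -> bool:
    # The category "large rush order" exists only inside this condition.
    if order["amount"] > 10_000 and order["rush"]:
        return True
    return False


def is_large_rush_order(order: dict) -> bool:
    """The same category, surfaced: named, documented, testable on its own."""
    return order["amount"] > 10_000 and order["rush"]


def approval_needed(order: dict) -> bool:
    # Behavior is now attached to a named category instead of a bare test.
    return is_large_rush_order(order)


sample = {"amount": 25_000, "rush": True}
print(approval_needed_hidden(sample), approval_needed(sample))
```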

October 19, 2005 35

Categories and Behavior

The reason to create a new category is if the distinction (the new category) will be treated differently, behaviorally
– By a program, or
– By a human

October 19, 2005 36

Categories and Behavior

The reason to subsume categories (through a taxonomy or just collapse them) is if they can be treated the same, behaviorally

October 19, 2005 37

Wrap on Discipline

October 19, 2005 38

Discipline

Standards

Tools

Content

Infrastructure

Part 2: Semantic Metadata and

Annotated Data

October 19, 2005 39

Metadata and Annotated Data

October 19, 2005 40

Content: FOAF

Friend Of A Friend Ontology for contacts

October 19, 2005 41

Content: Dublin Core

October 19, 2005 42

So, how do we do this?

Business Vocabulary

Taxonomy

Ontology

Description Logic

October 19, 2005 43

Business Vocabulary

Not whether, but – when:

  • as you come across the terms, or up front?
– what source:
  • source documents, interviews or existing systems?
– how:
  • defining terms or concepts?

October 19, 2005 44

Business Vocabulary

Schema Jargon

October 19, 2005 45

Injured workers -- representatives

Information contained in the claim files and records of injured workers, under the provisions of this title, shall be deemed confidential and shall not be open to public inspection (other than to public employees in the performance of their official duties), but representatives of a claimant, be it an individual or an organization, may review a claim file or receive specific information therefrom upon the presentation of the signed authorization of the claimant.

October 19, 2005 46

Employers -- Representatives

Employers or their duly authorized representatives may review any files of their own injured workers in connection with any pending claims.

October 19, 2005 47

Claimant

A claimant may review his or her claim file if the director determines, pursuant to criteria adopted by rule, that the review is in the claimant's interest.

October 19, 2005 48

Patient

Except as otherwise provided by law, all treatment records shall remain confidential. Treatment records may be released only to the persons designated in this section, or to other persons designated in an informed written consent of the patient….[much more]

October 19, 2005 49

Child Victims

Information revealing the identity of child victims of sexual assault who are under age eighteen is confidential and not subject to public disclosure. Identifying information means the child victim's name, address, location, photograph, and in cases in which the child victim is a relative or stepchild of the alleged perpetrator, identification of the relationship between the child and the alleged perpetrator.

October 19, 2005 50

Dilbert’s Boss Understands This

October 19, 2005 51

“How to”

Sources
– Documents
– Existing systems
– Controlled Vocabularies
– Interviews

Techniques
– Distinctionary
– Concept -> Term

October 19, 2005 52

Documents

Information contained in the claim files and records of injured workers, under the provisions of this title, shall be deemed confidential and shall not be open to public inspection (other than to public employees in the performance of their official duties), but representatives of a claimant, be it an individual or an organization, may review a claim file or receive specific information therefrom upon the presentation of the signed authorization of the claimant.

October 19, 2005 53

Existing systems

October 19, 2005 54

Vocabulary Item:

“A variety of language unique to an individual”

Idiolect

October 19, 2005 55

Every System We Design or Buy…

… is another idiolect

October 19, 2005 56

Interviews

• Enumerate types
• Look for counter examples
• Look for similarities
• Synonyms

October 19, 2005 57

Warning:

Definitions are hard to get consensus on

And often not worth it

October 19, 2005 58

Example good Definition

Customer: Groups or individuals who have a business relationship with the organization--those who receive and use or are directly affected by the products and services of the organization. Customers include direct recipients of products and services, internal customers who produce services and products for final recipients, and other organizations and entities that interact with an organization to produce products and services.

October 19, 2005 59

Another Problem with Definitions

Homonym problem
– Same lexical word means different things

October 19, 2005 60

SUMO and WordNet

October 19, 2005 61

Concept

Avoids the generalized definition trap

Drastically speeds up discovery (have you ever tried to get a group of experts to agree on the meaning of a set of terms?)

Finesses the homonymy problem

Term or Terms

October 19, 2005 62

Process

Tease apart the facets of a given definition.

People will generally agree with the facets.

They won’t necessarily agree on the same combination of facets mapping to the base word you started with.

Ask: what could we call each bundle of facets that they care about?

e.g., mother

October 19, 2005 63

Key Concept: The Distinctionary

Is: a glossary

Is distinct from other glossaries: structurally, each definition first specifies the more general type of thing the word is, and then provides a way to distinguish this thing from others that are similar.

October 19, 2005 64

Example

Patient:

A patient is a role between a human being and a healthcare delivery institution.

It is different from other roles between a human and a healthcare delivery institution in that the human has been the recipient of the delivery of diagnostic or corrective health care services.
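The two-part structure of a distinctionary entry (general type first, then the distinguishing feature) can be captured in a small record. This is a sketch only; the class name `DistinctionaryEntry` and its field names are assumptions, and the sample text paraphrases the Patient example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DistinctionaryEntry:
    term: str
    is_a: str             # the more general type of thing
    differs_in_that: str  # what sets it apart from similar things

    def render(self) -> str:
        return (
            f"{self.term}: a {self.is_a}. It differs from others of that "
            f"type in that {self.differs_in_that}."
        )


patient = DistinctionaryEntry(
    term="Patient",
    is_a="role between a human being and a healthcare delivery institution",
    differs_in_that=(
        "the human has been the recipient of diagnostic or corrective "
        "health care services"
    ),
)
print(patient.render())
```

Forcing both fields to be filled in is what separates this from an ordinary glossary: an entry without a general type or without a distinction is visibly incomplete.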

October 19, 2005 65

Taxonomies

Business Vocabulary

Taxonomy

Ontology

Description Logic

October 19, 2005 66

Taxonomy

“A taxonomy is a system for classifying and organizing large amounts of information”

Seth Earley www.earley.com

October 19, 2005 67

DMOZ

Home
– Gardening
– Personal Finance
– Cooking
  • Baking
  • Casseroles
  • Camping
    – Dutch Oven

October 19, 2005 68

Formal Taxonomy

Kingdom: Animalia

Phylum: Arthropoda, Chordata

Class: Mammalia

Order: Carnivora

Family: Felidae

Genus: Panthera, Ursus (bears)

Species: leo (lion), tigris (tiger)

isa? isa?
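A formal taxonomy like this one can be held as a child-to-parent map, with "isa" answered by walking up the chain. The sketch below uses only the taxa named on the slide; the placement of Ursus directly under Carnivora is an assumption (its family is not shown), and the names `PARENT`, `ancestors` and `is_a` are made up for the example.

```python
# Child -> parent links for the taxa named on the slide.
PARENT = {
    "leo": "Panthera",
    "tigris": "Panthera",
    "Panthera": "Felidae",
    "Felidae": "Carnivora",
    "Ursus": "Carnivora",  # assumption: the bear family is not on the slide
    "Carnivora": "Mammalia",
    "Mammalia": "Chordata",
    "Chordata": "Animalia",
    "Arthropoda": "Animalia",
}


def ancestors(taxon: str) -> list[str]:
    """Return every taxon above the given one, nearest first."""
    chain = []
    while taxon in PARENT:
        taxon = PARENT[taxon]
        chain.append(taxon)
    return chain


def is_a(taxon: str, candidate: str) -> bool:
    """Transitive 'isa': true if candidate appears anywhere above taxon."""
    return candidate in ancestors(taxon)


print(ancestors("leo"))
print(is_a("tigris", "Mammalia"), is_a("leo", "Arthropoda"))
```

Because every node has exactly one parent, this is a tree; the later slides contrast that with an ontology, where the structure is a network.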

October 19, 2005 69

Subsumption v. Inheritance

Dynamic v. Static

Claim
  -TimeLoss : bool
  -ReturnToWork : Date
  +ClaimMgr() : object
  +DaysLost() : int

Pension
  -pensionAmt : int
  +PaidToDate() : int
  +Reserve() : int
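One way to read "Dynamic v. Static": with inheritance, an object's class is fixed when it is created; with subsumption, membership is recomputed from the data each time it is asked for. The sketch below assumes Pension is a subclass of Claim in the diagram, and the Python names (`PensionClaim`, `subsumed_by_pension`, the snake_case fields) are adaptations made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Claim:
    time_loss: bool
    pension_amt: Optional[int] = None


class PensionClaim(Claim):
    """Static inheritance: an object is or is not a PensionClaim at creation."""


def subsumed_by_pension(claim: Claim) -> bool:
    """Dynamic subsumption: membership follows from the current data."""
    return claim.pension_amt is not None


claim = Claim(time_loss=True)
before = subsumed_by_pension(claim)

claim.pension_amt = 1200  # illustrative amount; a pension is later awarded
after = subsumed_by_pension(claim)

# The object's declared class never changed, but its classification did.
print(before, after, isinstance(claim, PensionClaim))
```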

October 19, 2005 70

Ontology -- Frame Based

Business Vocabulary

Taxonomy

Ontology

Description Logic

October 19, 2005 71

Ontology Definition

“A specification of a conceptualization”

Tom Gruber

Taxonomy : Ontology :: Tree : Network

October 19, 2005 72

October 19, 2005 73

Limits of Taxonomy

Disjointedness

October 19, 2005 74

Concept: A Small Ontology

GP (Genealogy Primitives)

Person

M/F

Spouse

Parent

October 19, 2005 75

Consider my family Database

Name   MName   FName    Sex  DoB       EyeColor
Dave   Naomi   John     M    11/18/52  Grey
Naomi  Betty   William  F    12/20/15  Hazel
John   Walter  Crete    M    11/15/17  Blue
Addie  Heidi   Dave     F    12/1/88   Blue
Tommy  Naomi   John     M    4/3/54    Blue
...    ...     ...      ...  ...       ...

October 19, 2005 76

What kinds of queries could I do?

Any view qualified by the attributes
– (show everyone born before 1/1/1990)

Some join based queries
– (show all of Dave’s children)

But it gets much more complex after that
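The two easy query types can be sketched against the family table in plain Python. This is an illustration under stated assumptions: the two-digit years are read as 19xx, the rows are copied as they appear on the slide, and the variable names are made up for the example.

```python
from datetime import date

# Rows from the family table; two-digit years are assumed to mean 19xx.
PEOPLE = [
    {"name": "Dave", "mname": "Naomi", "fname": "John", "sex": "M",
     "dob": date(1952, 11, 18), "eye": "Grey"},
    {"name": "Naomi", "mname": "Betty", "fname": "William", "sex": "F",
     "dob": date(1915, 12, 20), "eye": "Hazel"},
    {"name": "John", "mname": "Walter", "fname": "Crete", "sex": "M",
     "dob": date(1917, 11, 15), "eye": "Blue"},
    {"name": "Addie", "mname": "Heidi", "fname": "Dave", "sex": "F",
     "dob": date(1988, 12, 1), "eye": "Blue"},
    {"name": "Tommy", "mname": "Naomi", "fname": "John", "sex": "M",
     "dob": date(1954, 4, 3), "eye": "Blue"},
]

# A view qualified by an attribute: everyone born before 1/1/1990.
born_before_1990 = [p["name"] for p in PEOPLE if p["dob"] < date(1990, 1, 1)]

# A join-based query: the table joined to itself on the father's name.
daves_children = [
    child["name"]
    for parent in PEOPLE
    for child in PEOPLE
    if parent["name"] == "Dave" and child["fname"] == parent["name"]
]

print(born_before_1990)
print(daves_children)
```

A question such as "who are Addie's cousins?" would need several more self-joins over both parent columns, which is where the complexity mentioned above starts.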

October 19, 2005 77

Committing to an Ontology

Name   MName   FName    Sex  DoB       EyeColor
Dave   Naomi   John     M    11/18/52  Grey
Naomi  Betty   William  F    12/20/15  Hazel
John   Walter  Crete    M    11/15/17  Blue
Addie  Heidi   Dave     F    12/1/88   Blue
Tommy  Naomi   John     M    4/3/54    Blue
...    ...     ...      ...  ...       ...

[Diagram labels linking the columns to ontology concepts: Person, Person, Gender, Person, Spouse]

October 19, 2005 78

Concept: Committing & Sharing

GP (Genealogy Primitives): Person, M/F, Spouse, Parent

GC (Genealogy Concepts): Father…, Uncle…, Cousin…, Second Cousin, etc. …

My Family:
– Dave is male
– Dave is Addie’s parent
– Addie is female
– Naomi is Dave’s parent
– Naomi is Tom’s parent

[Two arrows in the diagram are labeled “Commits to”]

Key concept: queries/inference can be executed using ontological definitions I’m not even aware of
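A minimal sketch of the key concept, assuming the GC definitions are written purely in terms of the GP primitives. The facts are the five "My Family" assertions; the triple encoding and the function names are illustrative, and "grandparent" is used here as one of the concepts covered by "etc.".

```python
# "My Family" facts, stated only in GP primitives.
FACTS = {
    ("Dave", "is", "male"),
    ("Addie", "is", "female"),
    ("Dave", "parentOf", "Addie"),
    ("Naomi", "parentOf", "Dave"),
    ("Naomi", "parentOf", "Tom"),
}


def parents_of(child: str) -> set[str]:
    return {s for (s, p, o) in FACTS if p == "parentOf" and o == child}


def is_male(person: str) -> bool:
    return (person, "is", "male") in FACTS


# GC-style definitions: the author of the facts never has to see these.
def fathers_of(child: str) -> set[str]:
    return {p for p in parents_of(child) if is_male(p)}


def grandparents_of(child: str) -> set[str]:
    return {g for p in parents_of(child) for g in parents_of(p)}


def siblings_of(person: str) -> set[str]:
    # Sharing at least one asserted parent counts as a sibling in this sketch.
    return {
        o
        for parent in parents_of(person)
        for (s, p, o) in FACTS
        if p == "parentOf" and s == parent and o != person
    }


def uncles_of(child: str) -> set[str]:
    return {u for p in parents_of(child) for u in siblings_of(p) if is_male(u)}


print(fathers_of("Addie"), grandparents_of("Addie"), uncles_of("Addie"))
```

Note that `uncles_of("Addie")` comes back empty: Tom is Dave's sibling, but nothing asserts that Tom is male, so the definition cannot fire until that fact is added.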

October 19, 2005 79

Good Resource

Ontology Development 101: A Guide to creating your first ontology

Natalya Noy and Deborah McGuinness

http://www.ksl.stanford.edu/people/dlm/papers/ontology-tutorial-noy-mcguinness.pdf

October 19, 2005 80

Description Logics

Business Vocabulary

Taxonomy

Ontology

Description Logic

October 19, 2005 81

Description Logics

This is where the rigor comes in.

Three things that take some getting used to:
– Classes and Instances interchangeable
– Allowing the system to do some of the design work for you
– Open world logic

Plus some very strange terminology and symbology

October 19, 2005 82

Description Logics (DL)

Points of Departure

As much as possible, minimize the number of concepts that have to be accepted axiomatically.

Emphasize formal definitions for all the rest.

October 19, 2005 83

DL Definitions

October 19, 2005 84

Classes and Instances

Database designers make an early design decision as to what is going to be metadata (classes, columns, etc.) and what is going to be instance data.

For ontologists, this is a continually moving target.

Additionally, properties (which could be equivalent to attributes or relationships) are “free floating” and can be attached to classes, but don’t “belong” to them in the same way as with database models.

October 19, 2005 85

Allowing the System to Do some Design

Declared

Inferred

October 19, 2005 86

Open World

In closed world (e.g., SQL), absence of information is assumed to be negation. If the query doesn’t find it, it doesn’t exist.

In open world (DL), things are assumed to be possible until proven otherwise.

In DL, classes are assumed to overlap unless they are explicitly declared to be disjoint.

Domain and range are used for reasoning, not constraining.
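The difference between the two assumptions, and the use of domain for reasoning, can be sketched as follows. This is a toy model, not a DL reasoner: the `Answer` enum, the triple sets, and the `DOMAIN` table are assumptions made for the example.

```python
from enum import Enum


class Answer(Enum):
    YES = "yes"
    NO = "no"
    UNKNOWN = "unknown"


ASSERTED = {("Dave", "parentOf", "Addie")}
DENIED = set()  # statements explicitly asserted to be false


def closed_world(triple) -> Answer:
    """SQL-style: anything not found is treated as false."""
    return Answer.YES if triple in ASSERTED else Answer.NO


def open_world(triple) -> Answer:
    """DL-style: only an explicit denial yields NO."""
    if triple in ASSERTED:
        return Answer.YES
    if triple in DENIED:
        return Answer.NO
    return Answer.UNKNOWN


# Domain used for reasoning: using parentOf tells us the subject is a Person.
# A constraint would instead reject the row if Dave were not already a Person.
DOMAIN = {"parentOf": "Person"}


def inferred_types(subject: str) -> set[str]:
    return {DOMAIN[p] for (s, p, o) in ASSERTED if s == subject and p in DOMAIN}


question = ("Dave", "parentOf", "Tommy")
print(closed_world(question), open_world(question), inferred_types("Dave"))
```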

October 19, 2005 87

Motherhood

Sue is John’s biological mother

Sarah is John’s biological mother

Therefore?

George Washington’s mother
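One possible answer to "Therefore?", under two assumptions that the slide does not spell out: that "biological mother" is declared to allow at most one value per person, and that two different names are not assumed to denote two different individuals. Under those assumptions an open-world reasoner does not report an error; it concludes that Sue and Sarah are the same individual. The function below is a toy illustration of that inference, not a reasoner.

```python
def same_individuals(assertions):
    """Given (person, biological_mother) pairs for an at-most-one-value
    property, return the pairs of names that must denote one individual."""
    first_seen = {}
    same = set()
    for person, mother in assertions:
        known = first_seen.setdefault(person, mother)
        if known != mother:
            same.add(frozenset((known, mother)))
    return same


assertions = [("John", "Sue"), ("John", "Sarah")]
conclusions = same_individuals(assertions)
print(conclusions)
```

A closed-world system with a uniqueness constraint would reject the second row instead.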

October 19, 2005 88

October 19, 2005 89

Other strange vocabulary

DL Term   English                 Description                    Meaning
Partial   Necessary               Primitive, or defined classes  If something is a member of this class then it is necessary to fulfill these conditions
Complete  Necessary & Sufficient  Derived or defined classes     If something fulfills these conditions, then it is a member of this class
TBox      Terms                   Metadata                       Reasoning in the ontology
ABox      Assertions              Instances                      Reasoning over the data
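The Partial and Complete rows can be read as two directions of inference. In the sketch below, a partial definition lets us conclude conditions from membership, and a complete definition lets us conclude membership from conditions. The class names, the "has_birth_date" condition, and the `reason` function are assumptions made for illustration.

```python
# Partial (necessary): membership in the class implies these conditions.
PARTIAL = {"Person": {"has_birth_date": True}}

# Complete (necessary & sufficient): meeting the condition implies membership.
COMPLETE = {
    "Parent": lambda ind: "Person" in ind["declared"] and bool(ind["children"]),
}


def reason(individual: dict):
    """Return (classes, implied): declared plus inferred classes, and the
    conditions that follow from belonging to them."""
    classes = set(individual["declared"])
    for name, condition in COMPLETE.items():
        if condition(individual):
            classes.add(name)  # inferred, never declared by the modeler
    implied = {}
    for name in classes:
        implied.update(PARTIAL.get(name, {}))
    return classes, implied


dave = {"declared": ["Person"], "children": ["Addie"]}
addie = {"declared": ["Person"], "children": []}

print(reason(dave))
print(reason(addie))
```

Dave is only declared to be a Person; the Parent classification is inferred, which is the sense in which the system does some of the design work.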

October 19, 2005 90

Summary

Business Vocabulary

Taxonomy

Ontology

Description Logic

October 19, 2005 91

www.semanticarts.com

Semantic Arts, Inc.