October 19, 2005 1
The Semantic Web: What is it and why should you care?
Semantic Arts, Inc.
Dave McComb
for Toronto IRMAC/DAMA Oct 19, 2005
[Figure: framework diagram with the labels Discipline, Standards, Tools, Content and Infrastructure, and the labels Semantic Web, Semantic Technology, and Semantic Methodology, Design & Approach]
Part 1: Intro, Concepts and Methods
Part 2: Semantic Metadata and Annotated Data
Part 3: Semantic Web
Part 4: Demos
Semantic Concepts, Discipline and Methods
Part 1: Intro, Concepts and Methods
Structure and Metadata
You can now deal with thousands, even millions, of transactions by knowing only a small amount of metadata.
Drowning in Metadata
Thousands -> millions of bits of metadata
Meta metadata? XMI/MOF/CWM
Millions -> billions of instances in hundreds of databases
Commit to shared ontologies to get back to thousands or tens of thousands of concepts
Operative Semantics
Some of these fields are “known” to the system and cause overt changes in behavior
Others are more subtle
This one shows up on the detailed P&L reports
This one shows up in the AP list of bills to pay
This one shows up on the check
Flat Earth Schema
We need to get up out of the weeds
Higher-level business concepts
First Prime
Discrete Physical Object
– Something to which you could (potentially) attach a unique bar code
Semantic Primes for Business
– People
– Animals
– Physical Made Items
– Buildings
– Landmarks
– Physical Container
– Homogeneous Material
– Legal Entities
– Historical Events
– Conversion
– Scheduled Events
– Defined Events
– Measurement
– Estimate
– Monetary Amount
– Reference Value
– Decision
– Request
– Rights
– Permission
– Offer
– Order (Directive)
– Contract/Order
– Messages
– Documents
– Inventions
– Programs
Context
How many addresses do you have in your database? One of our clients has 116.
How many types of addresses are there?
[Figure: Prime, Category and Context]
Context
What differentiates the 116?
Context, such as:
– Where
– When
– Relationships
– Purpose
[Figure: Prime, Category and Context]
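The context slides suggest that the 116 “addresses” are really one prime (a location) seen through different contexts. A minimal Python sketch of that idea, not the author’s design: the `Address` and `AddressUse` classes, their fields and the `addresses_for` helper are hypothetical names chosen for illustration, with where, when, relationship and purpose stored as data rather than as 116 separate columns or types.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

@dataclass(frozen=True)
class Address:
    """The prime: a location, with no notion of why anyone holds it."""
    street: str
    city: str

@dataclass(frozen=True)
class AddressUse:
    """Context that distinguishes one use of an address from another."""
    address: Address
    party: str                       # relationship: whose address it is
    purpose: str                     # purpose, e.g. "billing" or "shipping"
    valid_from: date                 # when the use begins
    valid_to: Optional[date] = None  # when it ends (None = still current)

def addresses_for(uses: List[AddressUse], party: str, purpose: str, on: date) -> List[Address]:
    """Addresses a party uses for a given purpose on a given date."""
    return [u.address for u in uses
            if u.party == party and u.purpose == purpose
            and u.valid_from <= on
            and (u.valid_to is None or on <= u.valid_to)]

office = Address("1 Main St", "Springfield")
uses = [
    AddressUse(office, "Acme", "billing", date(2004, 1, 1)),
    AddressUse(office, "Acme", "shipping", date(2004, 1, 1), date(2004, 12, 31)),
]
print(addresses_for(uses, "Acme", "billing", date(2005, 10, 19)))   # [Address(street='1 Main St', city='Springfield')]
print(addresses_for(uses, "Acme", "shipping", date(2005, 10, 19)))  # []
```

Under this assumption, a new kind of address becomes a new purpose value rather than a schema change.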
Example Categories
Inventory system (categories disguised as attributes):
Attractive
Insurance spare
Fast/Slow Moving
A/B/C
High/Low Value
Degradable
Example Categories
Inventory system (categories disguised as entities):
Equipment
Kits
Parts
Tools
Serialized Parts
Raw Material
Assemblies
Phantoms
Example Categories
Inventory system (categories disguised as states):
Obsolete
Reserved
Out of Stock
In Inspection
Discontinued
On Order
Example Categories
Inventory system (categories disguised as relations):
On consignment
In Use
Stock for this warehouse
Preferred Supplier
Issued to
What are we doing???
We categorize things all the time.
As data modelers we set up other people’s categories for them.
We decide whether their categories will be expressed as:
– Entities
– Attributes (codes, enums, flags and labels)
– States
– Relations
– Classes
– Types
– etc.
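The four inventory slides describe the same kind of thing in four disguises. One way to make that visible is to express every category uniformly as set membership, independent of how a schema would have stored it. A minimal sketch; the item ids and helper names are hypothetical, while the category names come from the slides.

```python
# Every category is a named set of item ids, whether the original schema
# stored it as an attribute, an entity, a state or a relation.
categories = {
    "Fast Moving":      {"P-100", "P-200"},  # was an attribute (flag)
    "Serialized Parts": {"P-100"},           # was an entity (its own table)
    "On Order":         {"P-300"},           # was a state (status code)
    "On consignment":   {"P-200"},           # was a relation (to a supplier)
}

def in_category(item: str, category: str) -> bool:
    """True if the item is a member of the named category."""
    return item in categories.get(category, set())

def categories_of(item: str) -> list:
    """All categories an item belongs to, in alphabetical order."""
    return sorted(name for name, members in categories.items() if item in members)

print(categories_of("P-100"))  # ['Fast Moving', 'Serialized Parts']
```

A uniform representation like this makes categories cheap to add and to query; what it gives up is the structure (columns, foreign keys, state transitions) that the original modeling choices would have enforced.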
Category Definition
Encarta: “a group or set of things, people, or actions that are classified together because of common characteristics”
Cambridge (English): “a type, or a group of things having some features that are the same”
Cambridge (American): “a grouping of people or things by type in any systematic arrangement. (The light trucks weigh less than 5,000 pounds and are in a category that includes minivans, pickups, and sport utility vehicles)”
Infoplease: “any general or comprehensive division; a class”
Encyclopedia.com: “philosophical term that literally means predication or assertion”
Operative Definition of Categories
Semantic Arts: “A description of a set of things that contains:
– A set of testable membership criteria that can either improve or reduce our confidence in the membership
– A set of additional information that can be inferred from the membership
– A set of behaviors that can be applied to members of the category
– A set of questions that can be applied to the instance to gather property or relationship values”
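The four-part definition maps naturally onto a small data structure. A minimal Python sketch, assuming confidence is modeled as a number in [0, 1] that each criterion nudges up or down; the `Category` class, the 0.4/-0.3 weights and the “Degradable” example (borrowed from the inventory slides) are illustrative assumptions, not part of the original definition.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Category:
    name: str
    # Testable membership criteria; each returns a confidence adjustment
    # (positive improves, negative reduces confidence in membership).
    criteria: List[Callable[[dict], float]]
    # Additional information inferred from membership.
    inferred: Dict[str, object] = field(default_factory=dict)
    # Behaviors that can be applied to members.
    behaviors: Dict[str, Callable[[dict], object]] = field(default_factory=dict)
    # Questions to ask an instance to gather property or relationship values.
    questions: List[str] = field(default_factory=list)

    def confidence(self, thing: dict, prior: float = 0.5) -> float:
        score = prior + sum(test(thing) for test in self.criteria)
        return max(0.0, min(1.0, score))  # clamp to [0, 1]

degradable = Category(
    name="Degradable",
    criteria=[
        lambda t: 0.4 if "shelf_life_days" in t else 0.0,
        lambda t: -0.3 if t.get("material") == "steel" else 0.0,
    ],
    inferred={"stock_rotation": "FIFO"},
    behaviors={"expires_soon": lambda t: t["shelf_life_days"] <= 30},
    questions=["What is the expiry date?"],
)

glue = {"sku": "GLUE-1", "shelf_life_days": 20}
print(round(degradable.confidence(glue), 2))       # 0.9
print(degradable.behaviors["expires_soon"](glue))  # True
```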
Hidden Categories
Almost every “IF…THEN…” or “CASE…” statement contains a category
So does the procedures manual
You are aware of some of them
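A hidden category is easy to spot once you look for it in a condition. A minimal before/after sketch; the “Preferred Customer” category, its thresholds and the function names are hypothetical, invented only to show the same distinction buried in an `if` and then surfaced as a named category test.

```python
# Before: the category "preferred customer" is hidden inside the condition.
def discount_before(order: dict) -> float:
    if order["customer_years"] >= 5 and order["annual_spend"] > 10_000:
        return 0.10
    return 0.0

# After: the same distinction surfaced as a named, testable category.
def is_preferred_customer(order: dict) -> bool:
    """Membership criteria for a hypothetical Preferred Customer category."""
    return order["customer_years"] >= 5 and order["annual_spend"] > 10_000

def discount_after(order: dict) -> float:
    return 0.10 if is_preferred_customer(order) else 0.0
```

Behavior is unchanged; once named, the category can be reused by other rules and reviewed by the business instead of living only inside one condition.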
Categories and Behavior
The reason to create a new category is that the distinction (the new category) will be treated differently, behaviorally:
– By a program, or
– By a human
Categories and Behavior
The reason to subsume categories (through a taxonomy, or by simply collapsing them) is that they can be treated the same, behaviorally
Part 2: Semantic Metadata and Annotated Data
Business Vocabulary
Not whether, but:
– when:
• as you come across the terms, or up front?
– what source:
• source documents, interviews or existing systems?
– how:
• defining terms or concepts?
Injured workers -- representatives
Information contained in the claim files and records of injured workers, under the provisions of this title, shall be deemed confidential and shall not be open to public inspection (other than to public employees in the performance of their official duties), but representatives of a claimant, be it an individual or an organization, may review a claim file or receive specific information therefrom upon the presentation of the signed authorization of the claimant.
Employers -- Representatives
Employers or their duly authorized representatives may review any files of their own injured workers in connection with any pending claims.
Claimant
A claimant may review his or her claim file if the director determines, pursuant to criteria adopted by rule, that the review is in the claimant's interest.
Patient
Except as otherwise provided by law, all treatment records shall remain confidential. Treatment records may be released only to the persons designated in this section, or to other persons designated in an informed written consent of the patient….[much more]
Child Victims
Information revealing the identity of child victims of sexual assault who are under age eighteen is confidential and not subject to public disclosure. Identifying information means the child victim's name, address, location, photograph, and in cases in which the child victim is a relative or stepchild of the alleged perpetrator, identification of the relationship between the child and the alleged perpetrator.
“How to”
Sources
– Documents
– Existing systems
– Controlled Vocabularies
– Interviews
Techniques
– Distinctionary
– Concept -> Term
Documents
Information contained in the claim files and records of injured workers, under the provisions of this title, shall be deemed confidential and shall not be open to public inspection (other than to public employees in the performance of their official duties), but representatives of a claimant, be it an individual or an organization, may review a claim file or receive specific information therefrom upon the presentation of the signed authorization of the claimant.
Interviews
• Enumerate types
• Look for counterexamples
• Look for similarities
• Synonyms
Example good Definition
Customer: Groups or individuals who have a business relationship with the organization--those who receive and use or are directly affected by the products and services of the organization. Customers include direct recipients of products and services, internal customers who produce services and products for final recipients, and other organizations and entities that interact with an organization to produce products and services.
Another Problem with Definitions
Homonym problem
– The same lexical word means different things
Concept
Avoids the generalized definition trap
Drastically speeds up discovery (have you ever tried to get a group of experts to agree on the meaning of a set of terms?)
Finesses the homonymy problem
Term or Terms
Process
Tease apart the facets of a given definition.
People will generally agree with the facets.
They won’t necessarily agree on the same combination of facets mapping to the base word you started with.
Ask: what could we call each bundle of facets that they care about?
e.g., mother
Key Concept: The Distinctionary
Is: a glossary
Is distinct from other glossaries: structurally, each definition first specifies the more general type of thing the word is, and then provides a way to distinguish this thing from others that are similar.
Example
Patient:
A patient is a role between a human being and a healthcare delivery institution.
It is different from other roles between a human and a healthcare delivery institution in that the human has been the recipient of the delivery of diagnostic or corrective health care services.
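Because every distinctionary entry has the same two parts, the structure can be checked mechanically. A minimal sketch assuming entries are stored as (term, genus, differentia) records; the `Entry` class, the `problems` checker and the “Role”/“Relationship” entry are hypothetical additions, while the Patient wording follows the slide.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set

@dataclass(frozen=True)
class Entry:
    term: str
    genus: str        # the more general type of thing the term is
    differentia: str  # what distinguishes it from other things of that type

def problems(entries: Iterable[Entry], primitives: Set[str]) -> List[str]:
    """List structural problems: an undefined genus or a missing differentia."""
    entries = list(entries)
    defined = {e.term for e in entries} | primitives
    found = []
    for e in entries:
        if e.genus not in defined:
            found.append(f"{e.term}: genus '{e.genus}' is not defined")
        if not e.differentia.strip():
            found.append(f"{e.term}: no differentia")
    return found

entries = [
    Entry("Role", "Relationship", "one party acts in some capacity toward another"),
    Entry("Patient", "Role",
          "between a human being and a healthcare delivery institution, where the "
          "human has received diagnostic or corrective health care services"),
]
print(problems(entries, primitives={"Relationship"}))  # []
print(problems(entries, primitives=set()))
# ["Role: genus 'Relationship' is not defined"]
```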
Taxonomy
“A taxonomy is a system for classifying and organizing large amounts of information”
Seth Earley www.earley.com
DMOZ
Home
– Gardening
– Personal Finance
– Cooking
  • Baking
  • Casseroles
  • Camping
    – Dutch Oven
Formal Taxonomy
Kingdom: Animalia
Phylum: Arthropoda, Chordata
Class: Mammalia
Order: Carnivora
Family: Felidae
Genus: Panthera; Ursus (bears) is also shown
Species: leo (lion), tigris (tiger)
[Figure: the links between ranks are labeled “isa?”]
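In a formal taxonomy each link is meant to read as “is a,” so subsumption can be computed by walking up the tree. A minimal sketch over the ranks on the slide (Ursus is omitted because its family is not shown); the `parent` map and the `isa` and `ancestors` helpers are illustrative names. Applied to a browsing hierarchy such as DMOZ, the same walk would conclude that a Dutch Oven is a kind of Camping, which is one reason a browsing tree is not a formal taxonomy.

```python
# Child -> parent links taken from the formal taxonomy slide.
parent = {
    "Arthropoda": "Animalia",
    "Chordata": "Animalia",
    "Mammalia": "Chordata",
    "Carnivora": "Mammalia",
    "Felidae": "Carnivora",
    "Panthera": "Felidae",
    "Panthera leo": "Panthera",     # lion
    "Panthera tigris": "Panthera",  # tiger
}

def ancestors(taxon: str):
    """Yield every more general taxon, nearest first."""
    while taxon in parent:
        taxon = parent[taxon]
        yield taxon

def isa(taxon: str, other: str) -> bool:
    """Subsumption: True if `other` is `taxon` itself or one of its ancestors."""
    return taxon == other or other in ancestors(taxon)

print(isa("Panthera leo", "Mammalia"))    # True
print(isa("Panthera leo", "Arthropoda"))  # False
```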
Subsumption v. Inheritance
Dynamic v. Static
[UML figure: two classes]
Pension: -pensionAmt : int; +PaidToDate() : int; +Reserve() : int
Claim: -TimeLoss : bool; -ReturnToWork : Date; +ClaimMgr() : object; +DaysLost() : int
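One reading of the Claim and Pension classes: with inheritance, membership in a subclass is decided statically, when the object is created; with subsumption, it is decided dynamically from the object’s current properties. A minimal Python sketch that assumes Pension is a kind of Claim and invents the rule “a claim is a pension once it has a pension amount”; neither the subclass link nor the rule is stated on the slide, and the field names are simplified from TimeLoss and pensionAmt.

```python
from dataclasses import dataclass
from typing import Optional

# Static inheritance: an object's class is fixed when it is constructed.
@dataclass
class Claim:
    time_loss: bool
    pension_amt: Optional[int] = None

@dataclass
class Pension(Claim):
    pass

# Dynamic subsumption: membership is computed from the current data,
# so the same claim can enter (or leave) the category as facts change.
def is_pension(claim: Claim) -> bool:
    # Hypothetical membership rule, for illustration only.
    return claim.pension_amt is not None

c = Claim(time_loss=True)
print(isinstance(c, Pension), is_pension(c))  # False False
c.pension_amt = 1200
print(isinstance(c, Pension), is_pension(c))  # False True
```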
Ontology Definition
“A specification of a conceptualization”
Tom Gruber
Taxonomy : Ontology :: Tree : Network
Consider my family Database
Name | MName | FName | Sex | DoB | EyeColor
Dave | Naomi | John | M | 11/18/52 | Grey
Naomi | Betty | William | F | 12/20/15 | Hazel
John | Walter | Crete | M | 11/15/17 | Blue
Addie | Heidi | Dave | F | 12/1/88 | Blue
Tommy | Naomi | John | M | 4/3/54 | Blue
... | ... | ... | ... | ... | ...
What kinds of queries could I do?
Any view qualified by the attributes– (show everyone born before 1/1/1990)
Some join-based queries– (show all of Dave’s children)
But it gets much more complex after that
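The two kinds of query on the slide can be sketched over the family table held as a list of tuples. A minimal Python sketch, assuming the two-digit years fall in the 1900s and that MName and FName hold the mother’s and father’s names; the `siblings` helper is an illustration of how quickly the joins pile up.

```python
from datetime import date

# Rows from the family table: (Name, MName, FName, Sex, DoB, EyeColor).
# Assumption: two-digit years are 19xx.
people = [
    ("Dave",  "Naomi",  "John",    "M", date(1952, 11, 18), "Grey"),
    ("Naomi", "Betty",  "William", "F", date(1915, 12, 20), "Hazel"),
    ("John",  "Walter", "Crete",   "M", date(1917, 11, 15), "Blue"),
    ("Addie", "Heidi",  "Dave",    "F", date(1988, 12, 1),  "Blue"),
    ("Tommy", "Naomi",  "John",    "M", date(1954, 4, 3),   "Blue"),
]

# A view qualified by attributes: everyone born before 1/1/1990.
born_before_1990 = [p[0] for p in people if p[4] < date(1990, 1, 1)]

# A join-style query: Dave's children (Dave appears as mother or father).
daves_children = [p[0] for p in people if "Dave" in (p[1], p[2])]

# Harder: siblings need a self-join on both parent columns.
def siblings(name: str) -> list:
    me = next(p for p in people if p[0] == name)
    return [p[0] for p in people if p[0] != name and p[1:3] == me[1:3]]

print(daves_children)    # ['Addie']
print(siblings("Dave"))  # ['Tommy']
```

An uncle or cousin query would need yet another self-join per generation, which is where it “gets much more complex.”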
Committing to an Ontology
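Name | MName | FName | Sex | DoB | EyeColor
Dave | Naomi | John | M | 11/18/52 | Grey
Naomi | Betty | William | F | 12/20/15 | Hazel
John | Walter | Crete | M | 11/15/17 | Blue
Addie | Heidi | Dave | F | 12/1/88 | Blue
Tommy | Naomi | John | M | 4/3/54 | Blue
... | ... | ... | ... | ... | ...
[Figure annotations mapping columns to ontology concepts: Person, Person, Gender, Person, Spouse]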
Concept: Committing & Sharing
GP (Genealogy Primitives): Person, M/F, Spouse, Parent
GC (Genealogy Concepts): Father… Uncle… Cousin… Second Cousin, etc. …
My Family:
– Dave is male
– Dave is Addie’s parent
– Addie is female
– Naomi is Dave’s parent
– Naomi is Tom’s parent
[Figure: “Commits to” links connect My Family, GC and GP]
Key concept: queries/inference can be executed using ontological definitions I’m not even aware of
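The key concept can be sketched by keeping the family data limited to the primitives and defining the concepts as functions over them. A minimal Python sketch under closed-world simplifications; the function names are hypothetical, and “Tom is male” is borrowed from the Tommy row of the family table because the slide’s own facts do not state it.

```python
# Genealogy primitives (GP): facts from the "My Family" box on the slide.
male = {"Dave", "Tom"}   # "Tom is male" is an assumption (Tommy is M in the table)
female = {"Addie"}
parent_of = {("Dave", "Addie"), ("Naomi", "Dave"), ("Naomi", "Tom")}  # (parent, child)

# Genealogy concepts (GC): defined only in terms of the primitives.
def father(f: str, child: str) -> bool:
    return f in male and (f, child) in parent_of

def siblings(a: str, b: str) -> bool:
    return a != b and any((p, a) in parent_of and (p, b) in parent_of
                          for p, _ in parent_of)

def uncle(u: str, child: str) -> bool:
    return u in male and any(siblings(u, p) and (p, child) in parent_of
                             for p, _ in parent_of)

print(father("Dave", "Addie"))  # True
print(uncle("Tom", "Addie"))    # True: never asserted, only inferred
```

Nothing in the family data mentions uncles; the answer comes entirely from definitions the data’s author never had to write.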
Good Resource
Ontology Development 101: A Guide to Creating Your First Ontology
Natalya Noy and Deborah McGuinness
http://www.ksl.stanford.edu/people/dlm/papers/ontology-tutorial-noy-mcguinness.pdf
Description Logics
This is where the rigor comes in.
Three things that take some getting used to:
– Classes and Instances interchangeable
– Allowing the system to do some of the design work for you
– Open world logic
Plus some very strange terminology and symbology
Description Logics (DL): Points of Departure
As much as possible, minimize the number of concepts that have to be accepted axiomatically.
Emphasize formal definitions for all the rest.
Classes and Instances
Database designers make an early design decision as to what is going to be metadata (classes, columns, etc.) and what is going to be instance data.
For ontologists, this is a continually moving target.
Additionally, properties (which could be equivalent to attributes or relationships) are “free floating” and can be attached to classes, but don’t “belong” to them in the same way as with database models.
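A triple representation shows both points on the slide: the same identifier can be a class in one statement and an instance in another, and properties are attached to things rather than owned by classes. A minimal sketch with hypothetical names (`WidgetModelX`, `serial-001`, `listPrice`); RDF and OWL use IRIs rather than plain strings, which this sketch ignores.

```python
# Facts as (subject, property, object) triples. No property "belongs" to a
# class: any triple may use any property.
triples = {
    ("WidgetModelX", "type", "ProductModel"),  # WidgetModelX as an instance...
    ("serial-001", "type", "WidgetModelX"),    # ...and as a class
    ("WidgetModelX", "listPrice", 25),
    ("serial-001", "condition", "refurbished"),
}

def instances_of(cls: str) -> set:
    """Everything asserted to have `cls` as its type."""
    return {s for s, p, o in triples if p == "type" and o == cls}

def properties_of(subject: str) -> dict:
    """Non-type properties attached to a subject."""
    return {p: o for s, p, o in triples if s == subject and p != "type"}

print(instances_of("ProductModel"))  # {'WidgetModelX'}
print(instances_of("WidgetModelX"))  # {'serial-001'}
```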
Open World
In a closed world (e.g., SQL), absence of information is assumed to be negation. If the query doesn’t find it, it doesn’t exist.
In an open world (DL), things are assumed to be possible until proven otherwise.
In DL, classes may overlap unless they are explicitly declared to be disjoint.
Domain and range are used for reasoning (inferring types), not for constraining (rejecting data).
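The difference shows up as soon as a query asks about something that is not recorded, and again when a domain is declared. A minimal sketch, assuming a three-valued answer for the open-world case; `Answer`, `refuted`, the `hasChild` domain and the individuals other than Dave and Addie are hypothetical.

```python
from enum import Enum

class Answer(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"

facts = {("Dave", "hasChild", "Addie")}
refuted = set()                  # facts explicitly asserted NOT to hold
domain = {"hasChild": "Person"}  # hypothetical property domain

def closed_world(fact) -> Answer:
    """SQL-style: if the query doesn't find it, it doesn't exist."""
    return Answer.TRUE if fact in facts else Answer.FALSE

def open_world(fact) -> Answer:
    """DL-style: not found and not refuted means unknown."""
    if fact in facts:
        return Answer.TRUE
    if fact in refuted:
        return Answer.FALSE
    return Answer.UNKNOWN

def infer_types(facts) -> set:
    """Domain is used to infer a type, not to reject the statement."""
    return {(s, "type", domain[p]) for s, p, o in facts if p in domain}

q = ("Dave", "hasChild", "Sam")
print(closed_world(q), open_world(q))              # Answer.FALSE Answer.UNKNOWN
print(infer_types({("Rex", "hasChild", "Fido")}))  # {('Rex', 'type', 'Person')}
```

The last line mirrors the slide’s point about domains: rather than rejecting the statement about Rex, the domain declaration makes Rex a Person.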
Motherhood
Sue is John’s biological mother
Sarah is John’s biological mother
Therefore?
George Washington’s mother
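Under OWL semantics there is no unique name assumption, so two names may denote one individual. If biological motherhood is modeled as a functional property (at most one value per person), a reasoner concludes that Sue and Sarah are the same person; it reports an inconsistency only if they have also been declared different individuals. A minimal hand-rolled sketch of that one rule, not a real reasoner; the `biologicalMother` property and the `reason` helper are hypothetical.

```python
from itertools import combinations

functional = {"biologicalMother"}  # at most one value per subject
facts = {("John", "biologicalMother", "Sue"),
         ("John", "biologicalMother", "Sarah")}

def reason(facts, different):
    """Return (same_as, clashes) implied by the functional properties.

    `different` holds pairs explicitly declared to be distinct individuals.
    """
    same_as, clashes = set(), set()
    for (s1, p1, o1), (s2, p2, o2) in combinations(sorted(facts), 2):
        if p1 == p2 and p1 in functional and s1 == s2 and o1 != o2:
            pair = frozenset((o1, o2))
            (clashes if pair in different else same_as).add(pair)
    return same_as, clashes

# With no "different" declaration, Sue and Sarah are inferred to be the same.
print(reason(facts, different=set()))
```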
Other strange vocabulary
DL Term | English | Description | Meaning
Partial | Necessary | Primitive, or defined classes | If something is a member of this class, then it is necessary to fulfill these conditions
Complete | Necessary & Sufficient | Derived or defined classes | If something fulfills these conditions, then it is a member of this class
TBox | Terms | Metadata | Reasoning in the ontology
ABox | Assertions | Instances | Reasoning over the data
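The partial and complete rows differ in the direction of inference. A minimal Python sketch under closed-world simplifications: a complete class (Mother) lets the reasoner infer membership from the conditions, while a partial class (Patient) only lets it infer the conditions from asserted membership. The class definitions, the “every Patient is Human” condition and the facts are hypothetical, loosely based on the family example.

```python
# TBox (terms): class definitions.
def meets_mother(ind: dict) -> bool:
    # Complete (necessary & sufficient): female with at least one child.
    return ind["sex"] == "F" and bool(ind["children"])

PATIENT_REQUIRES = "Human"  # Partial (necessary only): every Patient is Human.

# ABox (assertions): facts about individuals.
abox = {
    "Naomi": {"types": {"Human"}, "sex": "F", "children": {"Dave", "Tommy"}},
    "Dave":  {"types": {"Patient"}, "sex": "M", "children": {"Addie"}},
    "Addie": {"types": {"Human"}, "sex": "F", "children": set()},
}

def classify(abox: dict) -> dict:
    """Return each individual's asserted plus inferred types."""
    inferred = {}
    for name, ind in abox.items():
        types = set(ind["types"])
        if meets_mother(ind):
            types.add("Mother")          # conditions met -> membership inferred
        if "Patient" in types:
            types.add(PATIENT_REQUIRES)  # membership -> conditions inferred
        inferred[name] = types
    return inferred

result = classify(abox)
print(sorted(result["Naomi"]))  # ['Human', 'Mother']
print(sorted(result["Dave"]))   # ['Human', 'Patient']
```

Note that being Human never makes Addie a Patient: a partial definition only works in one direction.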