Dean Allemang: Semantic Web Basics
Monday Morning Tutorial
TopQuadrant in collaboration with Jim Hendler presents: “Getting Ready for the Semantic Web with
TopBraid Suite”
TopQuadrant Semantic Web Technology Training Series
Module I-2: Overview of Semantic Technologies and the Semantic Web
What is Semantic Technology? What is it good for?
© Copyright 2007-2009 TopQuadrant Inc. Slide 2
The Semantic Wave is NOT one thing … there are differing major streams within it
The Semantic Web
• W3C Standards for sharing information on a world-wide scale
• Intranets vs. Internet
Semantic Technology
• Enhanced knowledge access and search
• Semantic Interoperability
• Information syndication
• … and so forth
© Copyright 2007-2009 TopQuadrant Inc. Slide 3
Philosophy of Semantics
What is Semantics = “meaning of meaning”
Logical Positivism (with a vengeance!) = “Everything is described by values of properties”
Shirky: “Semantic Web is about tautologies. (…and tautologies aren’t interesting)”
Intelligent Agents = AI on the web (it’s gotta be smart!)
© Copyright 2007-2009 TopQuadrant Inc. Slide 4
Semantic Web: Make web content machine-readable!
“The Semantic Web is a vision: the idea of having data on the Web defined and linked in a way that it can be used by machines not just for display purposes, but for automation, integration and reuse of data across various applications.” [W3C 2001]
“The Semantic Web is an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.” [Tim Berners-Lee et al 2001]
© Copyright 2007-2009 TopQuadrant Inc. Slide 5
What could the Web do?
Web page interaction – uses people as its medium!
© Copyright 2007-2009 TopQuadrant Inc. Slide 6
What could the Web do? (cont.)
Can this sort of interaction become part of the Web itself?
© Copyright 2007-2009 TopQuadrant Inc. Slide 7
How could the Web do it?
Built-in by the Webmaster
Agree upon an “interlingua”
© Copyright 2007-2009 TopQuadrant Inc. Slide 8
What about XML? Doesn’t it support semantics?
HTML gave us formatting tags
XML gave us custom tags
• You get to pick your tags/attributes
• Tags can have “meaning” specific to your application
Many dialects have blossomed
XML and XML Schema became W3C standards
Standard dialects are being developed by many industry groups – XBRL.org, FpML.org, TaxML.org, …
Every large organization has their own XML Schemas
© Copyright 2007-2009 TopQuadrant Inc. Slide 9
Gartner: All Tied Up with XML: 1999
Unprecedented growth of standard development
© Copyright 2007-2009 TopQuadrant Inc. Slide 10
Gartner: All Tied Up with XML: 2001
From 2001 through 2004 enterprises spent $3 billion on modeling activities, with no return on investment from $2 billion of it.
© Copyright 2007-2009 TopQuadrant Inc. Slide 11
A new Web of terminology
What’s the Interlingua for the Interlingua?
Use the same technology that maps web pages to terminology to map terminologies to one another
© Copyright 2007-2009 TopQuadrant Inc. Slide 12
What about people? Don’t they make the world go round?
computers and people … better cooperation
[Figure: a Human and an Agent work over the Internet with Annotated Web Pages that are linked to Ontologies]
Source: Phil Windridge, WSWS (2004)
© Copyright 2007-2009 TopQuadrant Inc. Slide 13
The Web: The World’s Largest Information System!
How did it get so big? What is special about The Web?
© Copyright 2007-2009 TopQuadrant Inc. Slide 14
How the Web was Won – The Network Effect
The “Network Effect” – The more participants, the more value in joining
At first, there were few web pages. Little value in joining in.
But then there were more – more value in joining in
A crowd is better yet! Lots of value in joining in
© Copyright 2007-2009 TopQuadrant Inc. Slide 15
Features of The Web
Anyone can say Anything about Any topic (AAA)
All names are global (so that anyone can refer to them!)
Two people might have different names for the same thing . . . (non-unique naming)
. . . Or the same name for different things!
You never know everything on the web (“Open World”)
This isn’t what we want the web to be, it is how the web is (and how it supports the network effect!)
© Copyright 2007-2009 TopQuadrant Inc. Slide 16
AAA Slogan -
Anyone
can say Anything
about Any topic
© Copyright 2007-2009 TopQuadrant Inc. Slide 17
Non-unique naming
“Java”
public String getContextPath() {
    try {
        Method getContextPathMethod = delegate.getClass().getMethod("getContextPath", null); //$NON-NLS-1$
        return (String) getContextPathMethod.invoke(delegate, null);
    } catch (Exception e) {
        // ignore
    }
    return null;
}
“Java”?
Programming language?
Hot Beverage?
“Coffee”?
© Copyright 2007-2009 TopQuadrant Inc. Slide 18
Open World Assumption
There’s always something else over the horizon . . .
© Copyright 2007-2009 TopQuadrant Inc. Slide 19
Web of Ontologies
The more there are, the easier and more valuable it is to create a new one
Network Effect for Ontologies:
From Tim Berners-Lee, ISWC 2003
© Copyright 2007-2009 TopQuadrant Inc. Slide 20
Semantic Solutions - Expectations we have encountered
A solution is "Semantic" if it:
X understands the meaning of natural language
X interprets perceptual input and forms usable representations
X computes new statistical analyses of data
X recognizes complex patterns in huge amounts of data
X produces high-quality graphics of complex systems
X replaces an intelligent human in a high-impact task
X has a model of uncertainty in data
X can come to correct conclusions despite faulty input.
Companies provide (some of) these, but
• typically as “add-ons” or 3rd party solutions
• Not as part of core Semantic offerings
© Copyright 2007-2009 TopQuadrant Inc. Slide 21
W3C standards for semantic models
W3C Semantic stack is built on XML
XML-based Ontology languages are being developed to support semantic interoperability.
“Semantic Web is stimulating a whole new class of applications at individual, enterprise and web scales”– Eric Miller, W3C, Semantic Technologies for eGOV’2003
www.w3.org/
www.w3.org/2001/sw
© Copyright 2007-2009 TopQuadrant Inc. Slide 22
W3C standards evolution
“Layer cake” has been modified a few times…
Significant changes:
• Addition of a query language (SPARQL)
• RDF not dependent on XML
• OWL just one of many possible logic formalisms
Not shown: emergence of n3 serialization (built for triples, unlike XML)
TopQuadrant in collaboration with Jim Hendler presents: “Getting Ready for the Semantic Web with
TopBraid Suite”
TopQuadrant Semantic Web Technology Training Series
Module I-3: Overview of Semantic Technologies and the Semantic Web
Mapping the Semantic Terrain: Standards and Languages from the W3C
© Copyright 2007-2008 TopQuadrant Inc. Slide 2
The Tree of Knowledge Technologies
AI Knowledge Representation
Semantic Technology Languages
Content Management Languages
Process Knowledge Languages
Software Modeling Languages
© Copyright 2007-2008 TopQuadrant Inc. Slide 3
Semantic Web Standards Stack
© Copyright 2007-2009 TopQuadrant Inc. Slide 4
How Semantic Languages Work
Bring information together
Draw inferences
[Figure labels: RDF, RDFS, OWL]
© Copyright 2007-2009 TopQuadrant Inc. Slide 5
What is RDF? Distribution of data
ID  Model No.  Division               Product Line   Manufacture location  SKU     In Stock
1   ZX-3       Manufacturing support  Paper machine  Sacramento            FB3524  23
2   ZX-3P      Manufacturing support  Paper machine  Sacramento            KD5243  4
3   ZX-3S      Manufacturing support  Paper machine  Sacramento            IL4028  34
4   B-1430     Control Engineering    Feedback Line  Elizabeth             KS4520  23
5   B-1430X    Control Engineering    Feedback Line  Elizabeth             CL5934  14
6   B-1431     Control Engineering    Active Sensor  Seoul                 KK3945  0
7   DBB-12     Accessories            Monitor        Hong Kong             ND5520  100
8   SP-1234    Safety                 Safety Valve   Cleveland             HI4554  4
9   SPX-1234   Safety                 Safety Valve   Cleveland             OP5333  14
© Copyright 2007-2009 TopQuadrant Inc. Slide 6
Distribute by rows?
1  ZX-3    Manufacturing support  Paper machine  Sacramento  FB3524  23
4  B-1430  Control Engineering    Feedback Line  Elizabeth   KS4520  23
7  DBB-12  Accessories            Monitor        Hong Kong   ND5520  100
Needs common schema - which column is which?
© Copyright 2007-2009 TopQuadrant Inc. Slide 7
Distribute by columns?
Division: Manufacturing support, Manufacturing support, Manufacturing support, Control Engineering, Control Engineering, Control Engineering, Accessories, Safety, Safety
In Stock: 23, 4, 34, 23, 14, 0, 100, 4, 14
Model No.: ZX-3, ZX-3P, ZX-3S, B-1430, B-1430X, B-1431, DBB-12, SP-1234, SPX-1234
Needs to reference entities – which thing are we talking about?
© Copyright 2007-2009 TopQuadrant Inc. Slide 8
Distribute by cells!?
Row 7, column Division: Accessories
Row 4, column Product Line: Feedback Line
Row 1, column Model: ZX-3
Needs to reference both schema and entities
Most flexible – can distribute data in any way at all!
© Copyright 2007-2009 TopQuadrant Inc. Slide 9
Distribute by cells!?
URIs identify the parts of each cell:
Subject: 7
Predicate: Division
Object: Accessories
Store, index, and federate these triples in a Triple Store
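The "distribute by cells" idea above can be sketched in a few lines of Python. This is only an illustration: the `BASE` namespace and the way column names are turned into predicate URIs are assumptions made for the example and are not taken from the slides.

```python
# Hypothetical namespace; the slides do not name one.
BASE = "http://example.org/product/"

# Three rows of the product table from the earlier slide, keyed by ID.
columns = ["Model No.", "Division", "Product Line",
           "Manufacture location", "SKU", "In Stock"]
rows = {
    1: ["ZX-3", "Manufacturing support", "Paper machine", "Sacramento", "FB3524", 23],
    4: ["B-1430", "Control Engineering", "Feedback Line", "Elizabeth", "KS4520", 23],
    7: ["DBB-12", "Accessories", "Monitor", "Hong Kong", "ND5520", 100],
}

def to_triples(rows, columns):
    """Turn every table cell into one (subject, predicate, object) triple."""
    triples = []
    for row_id, cells in rows.items():
        subject = f"{BASE}{row_id}"  # the row identifies the entity
        for column, value in zip(columns, cells):
            # the column identifies the schema element (assumed naming rule)
            predicate = BASE + column.replace(" ", "_").replace(".", "")
            triples.append((subject, predicate, value))
    return triples

triples = to_triples(rows, columns)
for triple in triples:
    print(triple)
```

Because each triple carries both its entity and its schema reference, any subset of the list can be stored or shipped separately and merged again later.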
© Copyright 2007-2009 TopQuadrant Inc. Slide 10
Representing Data in Graphs
Graph = nodes linked by labeled edges
© Copyright 2007-2009 TopQuadrant Inc. Slide 11
What is RDF?
RDF (Resource Description Framework) is an infrastructure for:
Encoding, exchanging, and distributing metadata
A “Resource“ is anything that you want to describe
RDF Triple:
Subject
Predicate
Object
© Copyright 2007-2009 TopQuadrant Inc. Slide 12
Why is RDF useful?
After all, can’t we express semantics in XML?
XML allows application-specific tags
• Semantic exchange can happen, as long as two parties agree on the use of these tags
XML represents a tree structure – no indication about what is a description, what denotes an object, what values relate to what objects, etc.
RDF provides a way to integrate structured information from multiple sources
© Copyright 2007-2009 TopQuadrant Inc. Slide 13
What is RDFS?
RDFS is the schema language for RDF
Type inferences can be made, based on schema
© Copyright 2007-2009 TopQuadrant Inc. Slide 14
Why is RDFS useful?
RDFS allows us to talk about classes of instances
It provides inferences, e.g.,
Best Western is a Hotel
(and hence, anything we know about Hotels applies to Best Western)
RDFS is in RDF (it’s its own schema language!)
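A minimal sketch of the kind of type inference RDFS provides, in plain Python rather than a real RDF toolkit. The only fact taken from the slide is that Best Western is a Hotel; the superclasses `Lodging` and `Business` are hypothetical, added so the inference has something to do.

```python
# Class hierarchy (rdfs:subClassOf). "Lodging" and "Business" are assumptions.
SUBCLASS_OF = {"Hotel": "Lodging", "Lodging": "Business"}

# Asserted types (rdf:type). This one comes from the slide.
TYPES = {"BestWestern": {"Hotel"}}

def inferred_types(individual):
    """Return the asserted types plus every superclass reachable from them."""
    result = set(TYPES.get(individual, ()))
    frontier = list(result)
    while frontier:
        cls = frontier.pop()
        parent = SUBCLASS_OF.get(cls)
        if parent is not None and parent not in result:
            result.add(parent)
            frontier.append(parent)
    return result

print(sorted(inferred_types("BestWestern")))
```

Anything stated about the class Hotel, or about its assumed superclasses, can then be applied to Best Western without anyone having asserted it directly.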
© Copyright 2007-2009 TopQuadrant Inc. Slide 15
How does it work?
RDF – the Ultimate Mash-up Language!!
© Copyright 2007-2008 TopQuadrant Inc. Slide 16
A little RDF(S) goes a long way
A model of government agencies and departments. Such models are called Ontologies.
[Figure: graph linking terms from several vocabularies]
gov: EPA, department, agency, body, FERC, DoE
brm: Business Area, Line Of Business, subfunction, Resource Mgmt, s2citizens, Energy
eGOV: capability, Standard, Service Spec, Remote Reporting, web service
eGovOS: project
© Copyright 2007-2008 TopQuadrant Inc. Slide 17
Ontologies are the means to separate “what is common” from “what is different”
Semantic map: Connecting silo domains
From Tim Berners-Lee, ISWC 2003
© Copyright 2007-2008 TopQuadrant Inc. Slide 18
OWL (formerly DAML+OIL)
What is OWL?
The “Web Ontology Language”
W3C Standard
Became a Recommendation in February 2004
[Figure: lineage of OWL. DAML (DARPA) and OIL (EU, various) were combined into DAML+OIL, which, together with RDF, led to OWL at the W3C]
© Copyright 2007-2008 TopQuadrant Inc. Slide 19
OWL can specify rich relationships: equivalence, inverse, unique, …
© Copyright 2007-2009 TopQuadrant Inc. Slide 20
What is OWL good for?
OWL provides a flexible way to talk about sets of resources,
e.g., “All planets around the sun”, “The wives of Henry VIII”, “People who have visited Japan”, “Incidents of SIDS in which the mother is related to someone with epilepsy”
A powerful way to relate information “people who have flown to Japan have been to Japan”, “a person with epilepsy has a neurological disorder”, “planets are astronomical bodies”
Provides a robust way to re-use information
Even if I didn’t expect someone to be interested in the planets around the sun when I built my database, you can define this set and draw conclusions about it (e.g., “there are 9 of them”)
© Copyright 2007-2008 TopQuadrant Inc. Slide 21
SPARQL
SPARQL Protocol and RDF Query Language
Query Language (like SQL is for databases, XQuery is for XML, etc.)
Extracts information from a graph using pattern matching
“Four-star lodging in New York”
© Copyright 2007-2008 TopQuadrant Inc. Slide 22
Find data in a graph
TopQuadrant in collaboration with Jim Hendler presents: “Getting Ready for the Semantic Web with
TopBraid Suite”
TopQuadrant Semantic Web Technology Training Series
Module I-5: Overview of Semantic Technologies and the Semantic Web
Comparing Semantic with Conventional Technologies
© Copyright 2007-2009 TopQuadrant Inc. Slide 2
Semantic Technology, RDBMs and Object Models
In this session we will compare Semantic Technology with
Relational Database and Object Oriented Technology
We will use as an example a simple application for managing and tracking medical equipment
The application will be asked to find an address, and we will compare the semantic and relational database approaches
We will then identify key differences and similarities between semantic and object models
© Copyright 2007-2009 TopQuadrant Inc. Slide 3
Semantic and OO Technologies
Comparing Semantic Technology (ST)
and
Object Oriented (OO) Technology
© Copyright 2007-2009 TopQuadrant Inc. Slide 4
Differing Intents of Semantic and Object Models
Object Model
• A specification of how a set of entities can encapsulate data and invoke behaviors on one another
• The intent of the model is to provide realizable software where object behaviors become fragments of code
Semantic Model (Ontology)
• A specification of what is known in a region of interest
• Intent is to maintain consistency between what is known about a domain in general (expertise) and what is observed about a situation (data)
© Copyright 2007-2009 TopQuadrant Inc. Slide 5
Specification (OO) vs. Knowledge Discovery (ST)
OO: In an Object Model class membership and hierarchy are specified
Classes are defined and express ‘constraints’ on the instances (individuals) that belong to classes
ST: An Ontology model can serve either as a specification of class membership or as a means of knowledge discovery
Sets of properties associated with individuals (instances) may allow them to be viewed as members of several classes at the same time
Class membership is dynamic and depends on the value of the properties
In OWL, classes are computed based on assertions about the individuals that might belong to one or more classes
© Copyright 2007-2009 TopQuadrant Inc. Slide 6
(ST) In OWL, Classes are inferred or computed
OWL classes are interpreted as sets that contain individuals
A class is not a kind of template as in OO technology
In OWL, classes are built up of descriptions that specify the conditions that must be satisfied by an individual to be a member of the class
Subclasses are subsets of their parent classes.
Superclass-subclass relationships can be computed automatically by a reasoner
© Copyright 2007-2009 TopQuadrant Inc. Slide 7
OO Systems Define Acceptable Expectations for Instances of Classes and Subclasses
Information Given
Medical Equipment Class: method list_tests
X-Ray Machine Class: extends Medical Equipment
Instances:
Instance  Class
PW1729    X-Ray Machine
What can you do?
PW1729.list_tests(..)
This call must be valid (and is done like it is for all Medical Equipment, unless stated otherwise).
Relationships in the model determine what can be done, and what it means to do it.
© Copyright 2007-2009 TopQuadrant Inc. Slide 8
Properties (ST) vs. Attributes and Relations (OO)
OWL Properties represent relations between two individuals (Not classes)
OWL Property types:
Object Properties link an individual to an individual
Datatype Properties link an individual to simple values
• integers, floats, strings, booleans, and so forth
• an XML Schema Datatype primitive value or an RDF literal
Those with OO experience/expertise must overcome the typical pre-conception that properties belong to the class!
© Copyright 2007-2009 TopQuadrant Inc. Slide 9
Properties are first-class constructs
In contrast to most OO paradigms, where properties are “owned” or “contained in” Classes
This allows relationships between Properties
[Figure: hasParent, with hasMother and hasFather beneath it]
This is not a class diagram!
… and for other modelers to reuse properties
[Figure: my:author linked to dc:creator]
“wherever I use the property ‘author’, tell the world that they can read ‘dc:creator’”
© Copyright 2007-2009 TopQuadrant Inc. Slide 10
In OWL, Properties may have Sub Properties
It is possible to form hierarchies of properties (these are not Class hierarchies)
The rdfs:subPropertyOf construct allows relationships to be abstracted up the sub-property tree.
© Copyright 2007-2009 TopQuadrant Inc. Slide 11
Properties may have Additional Characteristics
OWL allows the meaning of properties to be enriched through the use of property descriptions:
Sub-property: Relation between properties such that the pairs related by one property are included in the other.
Domain and Range: Descriptions of a property that determine class membership of individuals related by that property.
Inverse: Description of a property that exchanges the Subject and Object of another property, of which it is the inverse.
Transitive: Chains of relationships collapse into a single relationship.
Symmetric: Description of a property that makes it a self-inverse property
Functional and Inverse Functional
…
© Copyright 2007-2009 TopQuadrant Inc. Slide 12
What does it mean to be a subclass in OWL?
In OWL, subclass means necessary implication
If A is a subclass of B, then ALL instances of A are instances of B, without exception.
If something is an A, then this implies that it is also a B
OWL allows the use of class expressions in place of named classes
Restrictions describe an unnamed set that could contain some individuals
These are anonymous, computed classes
Properties are used to define and describe Classes
Properties are used to create restrictions. Restrictions are used to restrict the individuals that belong to a class. When you define a restriction, it is the property that gives rise to the class
© Copyright 2007-2009 TopQuadrant Inc. Slide 13
Class Expressions and (Multiple) Inheritance
Semantic Web Classes are (logically) sets, so you can do set logic on them:
[Figure: classes Person, Parent, Female, Male, and Mother]
Mother = Female ∩ Parent
“A ‘Mother’ is a ‘Female’ who is also a ‘Parent’”
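Since classes are treated as sets, the Mother example can be mimicked with ordinary Python sets. This is only an analogy for the set logic, not an OWL reasoner, and the individuals named here are hypothetical.

```python
# Hypothetical individuals; only the class names come from the slide.
female = {"Elizabeth", "Anne"}
parent = {"Elizabeth", "Philip"}

# "A 'Mother' is a 'Female' who is also a 'Parent'": a set intersection.
mother = female & parent

# Subclass means subset: every Mother is a Female and every Mother is a Parent.
print(mother, mother <= female, mother <= parent)
```

Nobody asserted that Elizabeth is a Mother; her membership follows from the two classes she already belongs to, which is the sense in which OWL classes are computed.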
© Copyright 2007-2009 TopQuadrant Inc. Slide 14
(ST) Run-time vs. (OO) Design time semantics
In semantic systems, ontologies are run-time models
There is no separation between the model and its technical implementation
The model (ontology) is the OWL code
Semantic web technology: Model-driven applications
vs.
OO technology: Model-developed applications
© Copyright 2007-2009 TopQuadrant Inc. Slide 15
OO Development – Progressive Design and Transformation of Classes into Code
Requirements
Concepts
VOPOS or CRC
CRC
(ST) No behavior is described anywhere – only inferencing
Design ObjectsAnalysis Objects
Class CodeUse Cases
Method CodeScenarios
ExpectedBehavior
Object Responsibilities(public interface)
Object Services(signatures)
Method Code
Natural LanguageContract ValidationObject DiscoveryHuman Effort
RefactoringAdd SignaturesAdapt to ConstraintsPick Mechanisms
Write code that fulfillsthe defined signatures, Adding private/helper Methods as needed
© Copyright 2007-2009 TopQuadrant Inc. Slide 16
Summary Table: Understanding the “Lingo”
OO Term: Class
  Object Technology: A “Factory for instances”. Definition of behavior (methods, values). Defines all the attributes and associations.
  Semantic Technology: Definition of a set of individuals. Uses open world approach – defines some, but not necessarily all possible properties of its members.
  ST Term: Class
OO Term: Instance
  Object Technology: Member of one direct class, responds to all methods defined in the class
  Semantic Technology: Member of one or more classes, as determined by the instance property values and class definition
  ST Term: Individual or Instance
OO Term: Attribute
  Object Technology: Data encapsulated with an instance. Defined only for the members of the class
  Semantic Technology: Data describing an individual (or instance). Defined globally.
  ST Term: DatatypeProperty
OO Term: Association
  Object Technology: Reference pointer from one object instance to another. Defined only for the members of the class
  Semantic Technology: Reference from one RDF resource to another. Defined globally.
  ST Term: ObjectProperty
OO Term: Inheritance
  Object Technology: Methods and properties inherit down
  Semantic Technology: Domains and ranges inherit up. OWL constraints inherit down.
  ST Term: Inference
© Copyright 2007-2009 TopQuadrant Inc. Slide 17
Semantic Web – OO Gotchas!
In the Semantic Web, you infer the class of an object.
The class of an object can change:
• over time
• with what you know/believe
• with whom you trust
Properties are first-class objects (independent of classes!)
Properties form hierarchies as well as classes
No behavior is described anywhere – only inferencing
Multiple set membership is commonplace
No OO inheritance
© Copyright 2007-2009 TopQuadrant Inc. Slide 18
OWL / RDFS Preview (for Comparison)
owl:inverseOf: ‘child inverseOf parent’
Elizabeth has child Charles, therefore Charles has parent Elizabeth
owl:TransitiveProperty
Elizabeth has descendant Charles, Charles has descendant Andrew, therefore Elizabeth has descendant Andrew
rdfs:subPropertyOf: ‘child subPropertyOf descendant’
Elizabeth has child Charles, therefore Elizabeth has descendant Charles
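The three constructs previewed above can be imitated with a small forward-chaining loop over triples. This is a sketch in plain Python, not how any particular OWL reasoner works; the function name `infer` and its parameters are made up for the illustration.

```python
def infer(triples, inverse_of=(), transitive=(), sub_property_of=()):
    """Apply inverse, sub-property and transitivity rules until nothing new appears."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            # owl:inverseOf: swap subject and object under the paired property
            for a, b in inverse_of:
                if p == a:
                    new.add((o, b, s))
                if p == b:
                    new.add((o, a, s))
            # rdfs:subPropertyOf: the fact also holds for the super-property
            for sub, sup in sub_property_of:
                if p == sub:
                    new.add((s, sup, o))
            # owl:TransitiveProperty: chain s -p-> o -p-> o2 into s -p-> o2
            if p in transitive:
                for s2, p2, o2 in facts:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if new <= facts:
            return facts
        facts |= new

facts = infer(
    [("Elizabeth", "child", "Charles"), ("Charles", "descendant", "Andrew")],
    inverse_of=[("child", "parent")],
    transitive={"descendant"},
    sub_property_of=[("child", "descendant")],
)
for fact in sorted(facts):
    print(fact)
```

Starting from two asserted facts, the loop derives that Charles has parent Elizabeth, and that Elizabeth has descendants Charles and Andrew, which are the conclusions listed on the slide.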
© Copyright 2007-2008 TopQuadrant Inc. Slide 19
Semantic vs Relational
Comparing Semantic Technology
and
Relational Database Technology
© Copyright 2007-2009 TopQuadrant Inc. Slide 20
Data in a spreadsheet
Person Company
Melli Annamalai Oracle
Xavier Lopez Oracle
Dean Allemang TopQuadrant
Mike Uschold Boeing
Ora Lassila Nokia
© Copyright 2007-2009 TopQuadrant Inc. Slide 21
Same data in a relational database
Person Table
ID  Name
1   Melli Annamalai
2   Xavier Lopez
3   Dean Allemang
4   Mike Uschold
5   Ora Lassila
Company Table
ID  Name
1   Oracle
2   TopQuadrant
3   Boeing
4   Nokia
Works For
Person  Company
1       1
2       1
3       2
4       3
5       4
Give each entity a number (a ‘key’) so you have an unambiguous way to talk about it.
Express relationships by associating these numbers together in another table.
© Copyright 2007-2009 TopQuadrant Inc. Slide 22
Same data in a relational database
Person Table
ID  Name
1   Melli Annamalai
2   Xavier Lopez
3   Dean Allemang
4   Mike Uschold
5   Ora Lassila
Company Table
ID  Name
1   Oracle
2   TopQuadrant
3   Boeing
4   Nokia
Works For
Person  Company
1       1
2       1
3       2
4       3
5       4
Country
ID  Name
1   USA
2   Finland
Based in
Company  Country
1        1
2        1
3        1
4        2
© Copyright 2007-2009 TopQuadrant Inc. Slide 23
Suppose (as a simplification) that we know that people live in the country where their employer is based.
Query: List all the people and the countries where they live
SELECT p.name, s.name
FROM employment.person p,
employment.company c,
employment.country s,
employment.basedIn bi,
employment.worksFor wf
WHERE
wf.person = p.id AND
wf.company = c.id AND
bi.company = c.id AND
bi.country = s.id ;
Person p works for company c:
It is the ‘person’ field of the worksFor table that corresponds to the person
It is the ‘company’ field of the worksFor table that corresponds to the company
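To try the query, the tables from the previous slide can be loaded into an in-memory SQLite database using Python's standard `sqlite3` module. This is a sketch under two assumptions: the `employment.` schema prefix is dropped (SQLite would need an attached database for it), and the Works For and Based in rows follow the key pairs as read from the flattened slide tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person   (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE company  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE country  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE worksFor (person INTEGER, company INTEGER);
CREATE TABLE basedIn  (company INTEGER, country INTEGER);

INSERT INTO person VALUES (1, 'Melli Annamalai'), (2, 'Xavier Lopez'),
    (3, 'Dean Allemang'), (4, 'Mike Uschold'), (5, 'Ora Lassila');
INSERT INTO company VALUES (1, 'Oracle'), (2, 'TopQuadrant'), (3, 'Boeing'), (4, 'Nokia');
INSERT INTO country VALUES (1, 'USA'), (2, 'Finland');
INSERT INTO worksFor VALUES (1, 1), (2, 1), (3, 2), (4, 3), (5, 4);
INSERT INTO basedIn  VALUES (1, 1), (2, 1), (3, 1), (4, 2);
""")

# Same joins as the slide's query; ORDER BY is added only for stable output.
rows = conn.execute("""
SELECT p.name, s.name
FROM person p, company c, country s, basedIn bi, worksFor wf
WHERE wf.person = p.id AND wf.company = c.id
  AND bi.company = c.id AND bi.country = s.id
ORDER BY p.id
""").fetchall()

for person_name, country_name in rows:
    print(person_name, "lives in", country_name)
```

The query has to name every join table and every key column, because those are the only places where the relationships are recorded.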
© Copyright 2007-2009 TopQuadrant Inc. Slide 24
Same data in a graph
[Figure: graph whose nodes carry the labels Melli Annamalai, Xavier Lopez, Dean Allemang, Mike Uschold, Ora Lassila, Oracle, TopQuadrant, Boeing, Nokia, USA, and Finland]
Edges: worksFor (person, company), basedIn (company, country), label
© Copyright 2007-2009 TopQuadrant Inc. Slide 25
Suppose (as a simplification) that we know that people live in the country where their employer is based.
Query: List all the people and the countries where they live
SELECT ?pname ?sname
WHERE
{ ?p :worksFor ?c .
  ?c :basedIn ?s .
  ?p rdfs:label ?pname .
  ?s rdfs:label ?sname . }
Person p works for company c
It is the ‘person’ field of the worksFor table that
corresponds to the person
The ?p that :worksFor something is a Person
(in the schema)
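The pattern matching behind this query can be imitated with a toy matcher in plain Python. It is not a SPARQL engine, just a sketch of how shared variables tie triple patterns together; the resource names such as `:melli` are hypothetical, and only two of the five people are included to keep it short.

```python
def match(patterns, triples, binding=None):
    """Yield one variable binding for each way all patterns match the triples."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # a variable must keep the same value wherever it is reused
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from match(rest, triples, candidate)

triples = [
    (":melli", ":worksFor", ":oracle"), (":ora", ":worksFor", ":nokia"),
    (":oracle", ":basedIn", ":usa"), (":nokia", ":basedIn", ":finland"),
    (":melli", "rdfs:label", "Melli Annamalai"), (":ora", "rdfs:label", "Ora Lassila"),
    (":usa", "rdfs:label", "USA"), (":finland", "rdfs:label", "Finland"),
]

query = [
    ("?p", ":worksFor", "?c"),
    ("?c", ":basedIn", "?s"),
    ("?p", "rdfs:label", "?pname"),
    ("?s", "rdfs:label", "?sname"),
]

results = sorted((b["?pname"], b["?sname"]) for b in match(query, triples))
print(results)
```

Reusing `?c` in the first two patterns plays the role that the explicit join conditions play in the SQL version.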
© Copyright 2007-2008 TopQuadrant Inc. Slide 26
RDFS and OWL are about Rich Relationships in Data
Information Given
X-Ray Machine located in Room A2003
Room A2003 located in ATA Building
Relationship Model
“located in” is a TransitiveProperty
Information Inferred
X-Ray Machine located in ATA Building
Question: Where is the X-Ray Machine located?
Answer: In ATA Building
Where are the relationships?
Relationships are explicit in the model and directly available to applications!
© Copyright 2007-2009 TopQuadrant Inc. Slide 27
OWL has many Built-in Functions for modeling Relationships
Information Given
X-Ray Machine stored in Room A2003
Room A2003 located in ATA Building
Relationship Model
“located in” is a TransitiveProperty
“stored in” is a subPropertyOf “located in”
Information Inferred
X-Ray Machine located in Room A2003
X-Ray Machine located in ATA Building
Question: Where is the X-Ray Machine located?
Answer: In ATA Building
Still works!
Also inverse, functional, local contextual constraints, …
© Copyright 2007-2009 TopQuadrant Inc. Slide 28
Getting the same result with a relational database
Equipment Table
ID   Equipment Name
IDQ  X-Ray Machine
Building Table
Room   Building
A2003  ATA Building
Room_Equipment Table
Room_ID  EQ_ID
A2003    IDQ
Question: Where is the X-Ray Machine located?
Answer: In ATA Building
Develop a Query
SELECT Building
FROM "Equipment Table", "Building Table", "Room_Equipment Table"
WHERE "Equipment Name" = 'X-Ray Machine' AND ID = EQ_ID AND Room = Room_ID
Where are the relationships?
Relationships are in documents and in collective memories - not available to applications!
Data Definition Statements? Applications do not use them, they are not descriptive and their scope is a single database
Data Dictionary? Data Registry? They are for human, not computer use
© Copyright 2007-2009 TopQuadrant Inc. Slide 29
Change happens…
Information Given (new)
X-Ray Machine located in Room A2003
Room A2003 part of Dr. Smith’s Office
Dr. Smith’s Office located in ATA Building
Relationship Model
“located in” is a TransitiveProperty
“part of” is a subPropertyOf “located in”
Information Inferred
Room A2003 located in Dr. Smith’s Office
Room A2003 located in ATA Building
X-Ray Machine located in ATA Building
Question: Where is the X-Ray Machine located?
Answer: In ATA Building
© Copyright 2007-2009 TopQuadrant Inc. Slide 30
Accommodating change with RDB requires database changes AND new queries
Before the change:
Equipment Table
ID   Equipment Name
IDQ  X-Ray Machine
Building Table
Room   Building
A2003  ATA Building
Room_Equipment Table
Room_ID  EQ_ID
A2003    IDQ
After the change:
Equipment Table
ID   Equipment Name
IDQ  X-Ray Machine
Building Table
Business Name  ID   Building
Dr Smith       MDS  ATA Building
Room_Equipment Table
Room ID  Equipment ID
A2003    IDQ
Room_Business Table
Room_ID  Bus_ID
A2003    MDS
© Copyright 2007-2009 TopQuadrant Inc. Slide 31
Accommodating change with RDB requires database changes AND new queries
Using the same query:
Question: Where is the X-Ray Machine located?
SELECT Building
FROM "Equipment Table", "Building Table", "Room_Equipment Table"
WHERE "Equipment Name" = 'X-Ray Machine' AND ID = EQ_ID AND Room = Room_ID
Answer: ?
Equipment Table
ID   Equipment Name
IDQ  X-Ray Machine
Building Table
Business Name  ID   Building
Dr Smith       MDS  ATA Building
Room_Equipment Table
Room ID  Equipment ID
A2003    IDQ
Room_Business Table
Room_ID  Bus_ID
A2003    MDS
© Copyright 2007-2009 TopQuadrant Inc. Slide 32
Comparing two approaches
Ability to Answer Questions
  RDB: Database must be designed to answer the questions. Specific, typically complex, queries must be developed.
  Semantic Model (Ontology): Ontology must be designed to answer the questions. Queries can be generic and very simple.
Ability to Accommodate Change
  RDB: Inflexible. Database structure must be modified so it can continue to answer the questions. Queries must be re-written. Data must be ported.
  Semantic Model (Ontology): Flexible. Ontology can be easily extended so it can continue to answer new questions. No data porting required.
Processing Speed
  RDB: Can be very fast with proper tuning – mature technology. Known optimization approaches. Certain queries, such as multi table joins and self joins, are known to cause problems.
  Semantic Model (Ontology): Not as fast, but improving; tuning does not affect flexibility. Adding more processing power and distributed computation helps. Performs better than RDBMS for certain query types.
© Copyright 2007-2009 TopQuadrant Inc. Slide 33
Key differences in the representation of relationships
Cardinality of relationships
  RDB: Relationships are either 1:1, many:1 or many:many. Many:many relationships must be broken into many:1 relationships by creating join tables.
  Semantic Model (Ontology): By default all relationships are many:many. Functional properties and cardinality restrictions are used to specify 1:1, many:1 as well as other cardinalities. It is possible to specify, for example, 1:4 or min 2, etc.
Information bearing relationships
  RDB: Additional information about the relationship is represented by the extra columns in the join table.
  Semantic Model (Ontology): Relationship is reified (made into a class). Additional information is represented as properties of the class.
The nature of relationships
  RDB: Implicit. Embedded in the name of the join table or in the name of the column. Typically these names are not designed for ease of understanding of the nature of the relationship.
  Semantic Model (Ontology): Explicit. Care is taken to name a relationship in a way that its nature and intentions are well understood.
© Copyright 2007-2009 TopQuadrant Inc. Slide 34
Another Key Capability of Semantic Web Technology is Managing Distributed Data
Information does not have to be in a single data source
Information about equipment can be in one place
Information about buildings, addresses and doctor offices in a different place
RDF provides an infrastructure for merging and unifying data in a consistent way
© Copyright 2007-2009 TopQuadrant Inc. Slide 35
Relational and Semantic Technology can work well together: it is not necessarily one or the other!
[Figure: architecture diagram. A Semantic Application (Interaction Logic, Application Logic) sits on a Semantic Interface (SI). A Semantic Hub holds Query Ontology Models, Enterprise Ontology Models, and Schema Translation Models. Mappings and web services (WS) connect the hub to ERP, PLM, CRM, a Data Warehouse, and Legacy Systems.]
TopQuadrant in collaboration with Jim Hendler presents: “Getting Ready for the Semantic Web with TopBraid Suite”
TopQuadrant Semantic Web Technology Training Series
Module IIa-4: Using Semantic Standards, Languages and TopBraid Tools for Modeling and Querying
Querying RDF with SPARQL
© Copyright 2007-2009 TopQuadrant Inc. Slide 2
Query Languages
Familiar query languages:
• SQL
• XQuery
SPARQL is similar to these in that it allows one to specify patterns of data
SPARQL differs in syntax, e.g., no notion of “JOIN”
© Copyright 2007-2009 TopQuadrant Inc. Slide 3
Extracting information from RDF
RDF is a graph structure
How do I get information from it?
• Where is Stratford?
• Who married the person who wrote King Lear?
• Did the person who wrote Hamlet live in a Hamlet?
© Copyright 2007-2009 TopQuadrant Inc. Slide 4
Querying some sample data
Table: Play
ID  Title                                   Written By   Year
1   The Tempest                             Shakespeare  1611
2   Romeo and Juliet                        Shakespeare  1595
3   As You Like It                          Shakespeare  1599
4   Edward II                               Marlowe      1592
5   Dido, Queen of Carthage                 Marlowe      1586
6   Eastward Ho                             Johnson      1605
7   A Game at Chess                         Middleton    1624
8   Sir Thomas More                         Munday       1592
9   The Tragical History of Doctor Faustus  Marlowe      1604
© Copyright 2007-2009 TopQuadrant Inc. Slide 5
Same data viewed as a graph
[Figure: each table row (Row 1 to Row 9) is a node of rdf:type Play, with title, writtenBy, and year edges pointing to its title, playwright, and year]
© Copyright 2007-2009 TopQuadrant Inc. Slide 6
Titles of plays written by Shakespeare
[Figure: the same graph of Row 1 to Row 9 with their title, writtenBy, and year edges; the rows of interest are those whose writtenBy edge points to Shakespeare]
© Copyright 2007-2009 TopQuadrant Inc. Slide 7
Matching data with graph patterns
Graph pattern: ?x <-title- ?y -writtenBy-> Shakespeare

Matches found in the data:
?x                  ?y
The Tempest         Row 1
Romeo and Juliet    Row 2
As You Like It      Row 3

How do we know that these things are plays?
What is a better heading for column “?x”?
© Copyright 2007-2009 TopQuadrant Inc. Slide 8
A more complex graph pattern
Graph pattern: ?y has title ?x, writtenBy ?z and year ?a

Matches found in the data:
?x                 ?z            ?a
The Tempest        Shakespeare   1611
As You Like It     Shakespeare   1599
Dido               Marlowe       1586
Sir Thomas More    Munday        1592
etc.
© Copyright 2007-2009 TopQuadrant Inc. Slide 9
Representing graph patterns
Graph patterns are represented just like graphs – in triples!
Variables allowed as well as resources:

Pattern: ?x <-title- ?y -writtenBy-> :Shakespeare
    ?y :title ?x .
    ?y :writtenBy :Shakespeare .

The same pattern with descriptive variable names: ?title <-title- ?play -writtenBy-> :Shakespeare
    ?play :title ?title .
    ?play :writtenBy :Shakespeare .
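How a query engine matches triple patterns against a graph can be sketched in a few lines of plain Python. This is only an illustration, not how any particular SPARQL engine is implemented: the row identifiers, the plain-string terms and the helper names `is_var` and `match` are assumptions made for the example.

```python
# Sample triples from the Play table as (subject, predicate, object).
# Row identifiers and plain-string terms are illustrative assumptions.
TRIPLES = [
    ("Row1", "title", "The Tempest"),
    ("Row1", "writtenBy", "Shakespeare"),
    ("Row2", "title", "Romeo and Juliet"),
    ("Row2", "writtenBy", "Shakespeare"),
    ("Row4", "title", "Edward II"),
    ("Row4", "writtenBy", "Marlowe"),
]


def is_var(term):
    """A term is a variable if it starts with '?', as in SPARQL."""
    return term.startswith("?")


def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for p_term, t_term in zip(first, triple):
            if is_var(p_term):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(p_term, t_term) != t_term:
                    break
            elif p_term != t_term:
                break
        else:
            # Every position matched: continue with the remaining patterns.
            yield from match(rest, triples, candidate)


pattern = [
    ("?play", "title", "?title"),
    ("?play", "writtenBy", "Shakespeare"),
]
titles = [b["?title"] for b in match(pattern, TRIPLES)]
print(titles)
```

Because ?play appears in both patterns, only rows that satisfy both contribute a title; that shared variable is what plays the role of a join.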
© Copyright 2007-2009 TopQuadrant Inc. Slide 10
More graph patterns
Pattern: ?y has title ?x, writtenBy ?z and year ?a
    ?y :title ?x .
    ?y :writtenBy ?z .
    ?y :year ?a .

The same pattern with descriptive variable names: ?y has title ?title, writtenBy ?playwright and year ?year
    ?y :title ?title .
    ?y :writtenBy ?playwright .
    ?y :year ?year .
© Copyright 2007-2009 TopQuadrant Inc. Slide 11
More graph patterns
“The names of two playwrights who wrote plays in the same year”
Pattern: ?play1 and ?play2 both have rdf:type Play and share the same ?year; ?play1 is writtenBy ?playwright1 and ?play2 is writtenBy ?playwright2
    ?play1 rdf:type :Play .
    ?play2 rdf:type :Play .
    ?play1 :writtenBy ?playwright1 .
    ?play2 :writtenBy ?playwright2 .
    ?play1 :year ?year .
    ?play2 :year ?year .

As written, this matches every play against itself, so we need some way to state that ?playwright1 is different from ?playwright2 (FILTER)
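Why the extra condition is needed can be seen in a small plain-Python sketch of the same join. The rows come from the Play table; representing them as tuples and joining with `itertools.product` is an assumption made for illustration, not how a SPARQL engine evaluates the pattern.

```python
from itertools import product

# (title, playwright, year) rows taken from the Play table.
PLAYS = [
    ("Romeo and Juliet", "Shakespeare", 1595),
    ("Edward II", "Marlowe", 1592),
    ("Dido, Queen of Carthage", "Marlowe", 1586),
    ("Sir Thomas More", "Munday", 1592),
]

# Unfiltered join on ?year: every play also pairs with itself.
unfiltered = [
    (p1[1], p2[1])
    for p1, p2 in product(PLAYS, repeat=2)
    if p1[2] == p2[2]
]

# FILTER (?playwright1 != ?playwright2) keeps only distinct playwrights.
filtered = [pair for pair in unfiltered if pair[0] != pair[1]]

print(len(unfiltered))
print(filtered)
```

The filtered result still contains the pair in both orders (Marlowe/Munday and Munday/Marlowe), since nothing in the pattern distinguishes ?play1 from ?play2.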
© Copyright 2007-2009 TopQuadrant Inc. Slide 12
Building Queries from Data
“The names of two playwrights who wrote plays in the same year”
Example data: Edward II (writtenBy Marlowe, year 1592) and Sir Thomas More (writtenBy Munday, year 1592), both with rdf:type Play

Replace the data values with variables to get the pattern:
    ?play1 rdf:type :Play .
    ?play2 rdf:type :Play .
    ?play1 :writtenBy ?playwright1 .
    ?play2 :writtenBy ?playwright2 .
    ?play1 :year ?year .
    ?play2 :year ?year .

This is the basis of the automatic query builder in TopBraid Composer and Ensemble
© Copyright 2007-2009 TopQuadrant Inc. Slide 13
A more complex graph pattern
[Figure: the full graph of Rows 1 to 9 (rdf:type Play, with writtenBy, year and title edges), overlaid with the pattern ?play1 / ?playwright1 / ?year / ?play2 / ?playwright2.]

Match found in the data:
?playwright1    ?playwright2
Marlowe         Munday
© Copyright 2007-2009 TopQuadrant Inc. Slide 14
Components of a Query

Selection mode and variables
    SELECT ?place
    SELECT ?playwright1 ?playwright2
    CONSTRUCT {?playwright rdf:type :Person}

Triple patterns: like a triple, but with named variables instead of some parts
    ?play :title ?title .
    ?play :playwright :Shakespeare .
    ?play :year ?year .
  The SPARQL standard writes triple patterns in a syntax similar to N3

Filters: numerical comparisons, calculations
    ?year > 1600

    ?play1 rdf:type :Play .
    ?play2 rdf:type :Play .
    ?play1 :writtenBy ?playwright1 .
    ?play2 :writtenBy ?playwright2 .
    ?play1 :year ?year .
    ?play2 :year ?year .
    FILTER (?playwright1 != ?playwright2)
© Copyright 2007-2009 TopQuadrant Inc. Slide 15
Query Syntax
Bring it all together to form a real query:

Where is Stratford?
    SELECT ?place
    WHERE { :Stratford :isIn ?place . }

Who married the person who wrote King Lear?
    SELECT ?spouse
    WHERE { ?spouse :married ?author .
            ?author :wrote :KingLear . }

Did the person who wrote Hamlet live in a Hamlet?
    SELECT ?place
    WHERE { ?author :wrote :Hamlet .
            ?author :livedIn ?place .
            ?place rdf:type geo:Hamlet . }

The '.' separates graph triples
Best practice: end every triple with a '.'
© Copyright 2007-2009 TopQuadrant Inc. Slide 16
Filters
“Shakespearean plays written after 1600”
You can only filter on values you have matched in the WHERE clause!

First, find Shakespearean plays and the years they were written:
Pattern: ?play has title ?title, writtenBy Shakespeare and year ?year
    SELECT ?title ?year
    WHERE { ?play :title ?title .
            ?play :writtenBy :Shakespeare .
            ?play :year ?year .
    }

?title              ?year
The Tempest         1611
Romeo and Juliet    1595
As You Like It      1599

Then add the filter inside the WHERE clause, before the closing brace:
            FILTER (?year > 1600)

?title              ?year
The Tempest         1611
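The two steps (match first, then filter the matched values) can be sketched in plain Python. The tuple layout and variable names are assumptions for illustration; the rows come from the Play table.

```python
# (title, playwright, year) rows from the Play table.
PLAYS = [
    ("The Tempest", "Shakespeare", 1611),
    ("Romeo and Juliet", "Shakespeare", 1595),
    ("As You Like It", "Shakespeare", 1599),
    ("Eastward Ho", "Johnson", 1605),
]

# Step 1: match the WHERE patterns, binding ?title and ?year.
matched = [(title, year) for title, author, year in PLAYS
           if author == "Shakespeare"]

# Step 2: apply FILTER (?year > 1600) to the bindings found in step 1.
after_1600 = [(title, year) for title, year in matched if year > 1600]

print(after_1600)
```

Eastward Ho is later than 1600 but never reaches the filter, because it fails the writtenBy pattern in step 1.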
© Copyright 2007-2009 TopQuadrant Inc. Slide 17
Optional Patterns
“Titles and authors of all plays”
Suppose we have some plays whose author is unknown:

ID    title                   writtenBy   year
101   Maid’s Metamorphosis                1600
102   Revenger’s Tragedy                  1607
4     Edward II               Marlowe     1592

Pattern: ?play has title ?title (must match) and writtenBy ?author (optional match)
    SELECT ?title ?author
    WHERE { ?play :title ?title .
            OPTIONAL { ?play :writtenBy ?author . }
    }

?title                  ?author
Maid’s Metamorphosis
Revenger’s Tragedy
Edward II               Marlowe
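The effect of OPTIONAL resembles a left outer join, which can be sketched in plain Python. The play identifiers and the use of `None` for an unbound variable are assumptions for illustration; the dictionary lookup also assumes at most one author per play, which a real graph does not guarantee.

```python
# Triples for three plays; two of them have no writtenBy triple.
TRIPLES = [
    ("play101", "title", "Maid's Metamorphosis"),
    ("play102", "title", "Revenger's Tragedy"),
    ("play4", "title", "Edward II"),
    ("play4", "writtenBy", "Marlowe"),
]

# Index the optional part: play -> author (simplification: one author each).
authors = {s: o for s, p, o in TRIPLES if p == "writtenBy"}

# Required pattern: every play with a title produces a row.
# Optional pattern: ?author is filled in when present, else left as None.
rows = [(o, authors.get(s)) for s, p, o in TRIPLES if p == "title"]

for title, author in rows:
    print(title, author)
```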
© Copyright 2007-2009 TopQuadrant Inc. Slide 18
Negation
“Titles of anonymous plays”

ID    title                   writtenBy   year
101   Maid’s Metamorphosis                1600
102   Revenger’s Tragedy                  1607

Pattern: ?play has title ?title (must match) and writtenBy ?author (optional match)

Filter out those for which the author was found using the filter function “bound”:
    SELECT ?title
    WHERE { ?play :title ?title .
            OPTIONAL { ?play :writtenBy ?author . }
            FILTER (!bound(?author))
    }

?title
Maid’s Metamorphosis
Revenger’s Tragedy

In these rows the variable ?author was not bound to a value
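Seen from the data side, keeping only rows where ?author is unbound amounts to a set difference: plays with a title minus plays with an author. A minimal plain-Python sketch, with illustrative play identifiers that are not in the original data:

```python
# Triples for three plays; only play4 has a writtenBy triple.
TRIPLES = [
    ("play101", "title", "Maid's Metamorphosis"),
    ("play102", "title", "Revenger's Tragedy"),
    ("play4", "title", "Edward II"),
    ("play4", "writtenBy", "Marlowe"),
]

# Plays for which the OPTIONAL pattern would bind ?author.
with_author = {s for s, p, o in TRIPLES if p == "writtenBy"}

# FILTER (!bound(?author)): keep titles of plays outside that set.
anonymous = [o for s, p, o in TRIPLES
             if p == "title" and s not in with_author]

print(anonymous)
```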
© Copyright 2007-2009 TopQuadrant Inc. Slide 19
Searching for Literals
Previous queries matched a variable (e.g. ?title)
Now find something with a specific literal (xsd) value

Who wrote “A Game at Chess”?
    SELECT ?author
    WHERE { ?play :title "A Game at Chess" .
            ?play :writtenBy ?author .
    }

Order matters: each graph pattern further restricts the overall match

Specify the xsd type:
    SELECT ?author
    WHERE { ?play :title "A Game at Chess"^^xsd:string .
            ?play :writtenBy ?author .
    }
Match with a regular expression instead of an exact literal:
    SELECT ?author
    WHERE { ?play :title ?title .
            ?play :writtenBy ?author .
            FILTER (regex(?title, "a game", "i")) .
    }
Regular expression syntax is defined by XQuery 1.0
    regex(?title, "[a-zA-Z]{2}[aeiou]s{2}$")
matches when the end of the string is “Chess” (as well as other strings that end in two letters, a vowel and “ss”)
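The two regular expressions can be tried out in Python. This is only a stand-in: Python's `re` module is not the XQuery regex dialect that SPARQL uses, although these two simple patterns are assumed to behave the same in both. The title "The Princess" is a hypothetical addition, included to show a non-“Chess” match.

```python
import re

TITLES = ["A Game at Chess", "Eastward Ho", "Edward II"]

# regex(?title, "a game", "i"): case-insensitive substring match.
case_insensitive = [t for t in TITLES
                    if re.search("a game", t, re.IGNORECASE)]

# regex(?title, "[a-zA-Z]{2}[aeiou]s{2}$"): two letters, a vowel, "ss" at the end.
# "The Princess" is a made-up title used only for illustration.
ends_like_chess = [t for t in TITLES + ["The Princess"]
                   if re.search(r"[a-zA-Z]{2}[aeiou]s{2}$", t)]

print(case_insensitive)
print(ends_like_chess)
```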
© Copyright 2007-2009 TopQuadrant Inc. Slide 20
Union
Find all plays written by Shakespeare either in 1611 or 1595
    SELECT ?play ?title ?year
    WHERE {
      { ?play :writtenBy :Shakespeare .
        ?play :title ?title .
        ?play :year "1611"^^xsd:string .
      }
      UNION
      { ?play :writtenBy :Shakespeare ;
              :year ?year .
        FILTER(regex(?year, "1595")) .
      }
    }

?play     ?title         ?year
:Play1    The Tempest
:Play2                   1595

The query matches two different graph patterns:
  1. Matches ?play and ?title with specified :year
  2. Matches ?play and ?year filtered by ?year
?play has a different binding in each of the graph patterns
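Why each result row has an empty column can be sketched in plain Python: each branch of the UNION binds only its own variables, and the rows from both branches are concatenated. The tuple layout, the use of `None` for an unbound variable and the assumption that all three plays are by Shakespeare are illustrative choices; years are kept as strings to mirror the query.

```python
import re

# (play, title, year) for three Shakespeare plays; years kept as strings.
PLAYS = [
    ("Play1", "The Tempest", "1611"),
    ("Play2", "Romeo and Juliet", "1595"),
    ("Play3", "As You Like It", "1599"),
]

# Branch 1: year fixed to "1611"; binds ?play and ?title, ?year stays unbound.
branch1 = [(play, title, None)
           for play, title, year in PLAYS if year == "1611"]

# Branch 2: binds ?play and ?year, filtered by regex; ?title stays unbound.
branch2 = [(play, None, year)
           for play, title, year in PLAYS if re.search("1595", year)]

# UNION: the result is the rows of both branches together.
rows = branch1 + branch2
print(rows)
```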
© Copyright 2007-2009 TopQuadrant Inc. Slide 21
Query Forms
SELECT: returns matches for specified variables
CONSTRUCT: returns a graph that includes triples constructed from specified variables
ASK: true if a match to the graph pattern is found
DESCRIBE: returns a graph (instead of selecting “columns”); not supported by TBC
© Copyright 2007-2009 TopQuadrant Inc. Slide 22
ASK
ASK returns a Boolean value based on whether the WHERE clause finds data
Most useful in programs: SPIN, SPARQLMotion, JSP, etc.
Result displayed in status bar
Example: Were any plays written after 1600?
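In plain Python terms, ASK behaves like `any()`: it reports whether at least one match exists and returns no bindings. A minimal sketch over a few rows of the Play table (the dictionary layout is an assumption for illustration):

```python
# Title -> year for a few rows of the Play table.
YEARS = {
    "The Tempest": 1611,
    "Romeo and Juliet": 1595,
    "Edward II": 1592,
}

# ASK WHERE { ?play :year ?year . FILTER (?year > 1600) }
any_after_1600 = any(year > 1600 for year in YEARS.values())

# A question with no match yields False rather than an empty table.
any_after_1700 = any(year > 1700 for year in YEARS.values())

print(any_after_1600, any_after_1700)
```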
© Copyright 2007-2009 TopQuadrant Inc. Slide 23
CONSTRUCT
SPARQL queries can build new sets of triples based on patterns
  WHERE clause exactly as before
  CONSTRUCT returns the triples built from a triple pattern
Subtly powerful:
  Under these conditions, add these triples
  Basically what reasoners do (datalog)

Example: plays are written by playwrights
    CONSTRUCT { ?person rdf:type :playwright }
    WHERE { ?play :writtenBy ?person .
            ?play rdf:type :Play .
    }
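The "under these conditions, add these triples" idea can be sketched in plain Python: find the bindings for ?person in the body, then emit one new triple per binding. The tuple representation is an assumption for illustration, and "Letter1" is a hypothetical non-play added only to show that the rdf:type condition matters.

```python
TRIPLES = [
    ("Play1", "rdf:type", "Play"),
    ("Play1", "writtenBy", "Shakespeare"),
    ("Play4", "rdf:type", "Play"),
    ("Play4", "writtenBy", "Marlowe"),
    ("Letter1", "writtenBy", "Munday"),  # hypothetical non-play, for contrast
]

# Subjects that satisfy: ?play rdf:type :Play
plays = {s for s, p, o in TRIPLES if p == "rdf:type" and o == "Play"}

# WHERE: ?play :writtenBy ?person . ?play rdf:type :Play .
# CONSTRUCT: emit one new triple for each ?person binding.
constructed = sorted({
    (o, "rdf:type", "playwright")
    for s, p, o in TRIPLES
    if p == "writtenBy" and s in plays
})

print(constructed)
```

Munday gets no new triple here because, in this made-up data, the only thing Munday wrote is not typed as a Play.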
© Copyright 2007-2009 TopQuadrant Inc. Slide 24
Constructing new triples with SPARQL
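[Figure: Play 1 to Play 9, each with rdf:type Play and a writtenBy edge to its author (Shakespeare, Johnson, Marlowe, Munday, Middleton); the CONSTRUCT adds an rdf:type Playwright edge to each author.]

Body (WHERE): ?play rdf:type Play, ?play writtenBy ?person
Head (CONSTRUCT): ?person rdf:type Playwright
Same binding for ?person re-used from body to head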
Module IIa-3: Using Semantic Standards, Languages and TopBraid Tools for Modeling and Querying
SKOS – Simple Knowledge Organization System
© Copyright 2007-2009 TopQuadrant Inc. Slide 2
What is a Knowledge Organization System?
A means of organizing and communicating terminology about a domain
Controlled Vocabulary, Taxonomy, Thesaurus
Thesaurus standards
  NISO Z39.19, ISO 2788-1986(E), ISO 5964-1985(E)
  Provide for relationships between terms
  • BT, NT, RT (broader, narrower and related term)
  • Preferred terms, alternate terms
Monolingual, multilingual, hierarchical
© Copyright 2007-2009 TopQuadrant Inc. Slide 3
Vocabulary Uses
Tagging, e.g.:
  del.icio.us, Flickr
  West Key Number System
  Library of Congress etc.
Entity Extraction, e.g.:
  Calais
Both allow one set of expert users (‘catalogers’) to inform another set (information customers) about the content of information items (documents, photos, web pages, etc.)
Data Modeling
  Data dictionaries, schemas
  Spreadsheet columns
Supports consistency in data modeling, promoting interoperability.
© Copyright 2007-2009 TopQuadrant Inc. Slide 4
Agreeing on terms
Why can’t everyone just agree to use terms the same way?
… Everyone has their own business processes and stakes.
Disagreement is legitimate!
© Copyright 2007-2009 TopQuadrant Inc. Slide 5
Simple Knowledge Organization System
SKOS is the RDF standard for Knowledge Organization Systems
  Allows different groups to organize terminology…
  … while allowing them to link to one another
“The St. Louis notion of ‘Customer’ is broader than the New York notion of ‘Customer’”
You can’t even have this discussion if you don’t recognize that there are actually two contexts for the word “Customer”!
© Copyright 2007-2009 TopQuadrant Inc. Slide 6
Mapping terms
[Figure: two “Customer” terms, one from each vocabulary, linked by a “broader match” relation.]
© Copyright 2007-2009 TopQuadrant Inc. Slide 7
SKOS Resources
QName               Label                   Abbreviation (where applicable)
skos:broader        “has broader term”      BT
skos:narrower       “has narrower term”     NT
skos:related        “has related term”      RT
skos:broadMatch     “has broader match”
skos:narrowMatch    “has narrower match”
skos:prefLabel      “preferred label”
skos:altLabel       “alternative label”