semantic web: introduction & overview
DESCRIPTION
A lecture/conversation focusing on the first 12 years of the Semantic Web, delivered on February 21, 2012. See http://j.mp/SWIntro for more details. More detailed course material is at http://knoesis.org/courses/web3/
TRANSCRIPT
Semantic Web: intro & overview
A conversation with students – Feb 21, 2012
Amit Sheth http://knoesis.org/amit
Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
Wright State University, Dayton, OH, USA
What are two of the most important software success stories of 2012?
Apple’s Siri
IBM’s Watson
What are common technologies?
Just stepping back a bit
Semantic technologies in the mainstream
• Microsoft purchased Powerset in 2008
• Apple purchased Siri [Apr 2010]
– “Once Again The Back Story Is About Semantic Web”
• Google buys Metaweb [July 2010]: “Google Snaps Up Metaweb in Semantic Web Play”
– Now see: “Google Knowledge Graph Could Change Search Forever”
• Facebook OpenGraph, Twitter annotation: “another example of semantic web going mainstream”, “Google, Twitter and Facebook build the semantic web”
• RDFa adoption
• Search engines (esp. Bing) are about to introduce domain models, and all use background knowledge/structured databases with large entity bases
• Bing, Yahoo! and Google announced schema.org
A bit of history
• Semantics with metadata and ontologies for heterogeneous documents and multiple repositories of data, including the Web, was discussed in the 1990s (semantic information brokering, faceted search, InfoHarness, SIMS, Ariadne, OBSERVER, SHOE, MREF, InfoQuilt, …). Also DAML and OIL.
• Tim Berners-Lee used “Semantic Web” in his 1999 book
• I founded a company, Taalee, in 1999, gave a keynote on Semantic Web & commercialization in 2000, and filed for a patent in 2000 (awarded 2001).
• The well-known TBL, Hendler, Lassila paper in Scientific American took an AI-ish approach (agents, …) to the Semantic Web
• The first 5 years saw too much of AI/DL, but more practical/applied work has dominated recently
Different foci
• TBL – focus on data: Data Web (“In a way, the Semantic Web is a bit like having all the databases out there as one big database.”)
• Others focus on reasoning and intelligent processing
1, 2, 3 of Semantic Web
1
• Ontology: Agreement with a common vocabulary/nomenclature, conceptual models and domain Knowledge
• Schema + Knowledge base
• Agreement is what enables interoperability
• Formal description - Machine processability is what leads to automation
2
• Semantic Annotation (Metadata Extraction): Associating meaning with data, or labeling data so it is more meaningful to the system and people.
• Can be manual, semi-automatic (automatic with human verification), automatic.
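The automatic case can be sketched as a dictionary lookup. This is only a minimal illustration, assuming a tiny hand-made lexicon: the `LEXICON` entries, the `ex:` class names and the `annotate` function are hypothetical and not taken from the lecture.

```python
import re

# Hypothetical lexicon mapping surface forms to ontology classes (illustrative only).
LEXICON = {
    "IBM": "ex:Company",
    "Oracle": "ex:Company",
    "Armonk": "ex:GeographicalLocation",
}

def annotate(text):
    """Return (surface form, start offset, ontology class) for each lexicon hit."""
    annotations = []
    for term, cls in LEXICON.items():
        # Word boundaries keep "IBM" from matching inside a longer token.
        for match in re.finditer(r"\b" + re.escape(term) + r"\b", text):
            annotations.append((term, match.start(), cls))
    return sorted(annotations, key=lambda a: a[1])

print(annotate("IBM has its headquarters in Armonk."))
```

A semi-automatic workflow would show these candidate annotations to a person for verification before storing them.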
From Syntax to Semantics
Shallow semantics
Deep semantics
Expressiveness, Reasoning
Changing Focus on Interoperability in Information Systems: From System, Syntax, Structure to Semantics
3
• Reasoning/Computation: semantics enabled search, integration, answering complex queries, connections and analyses (paths, sub graphs), pattern finding, mining, hypothesis validation, discovery, visualization
Semantic Web Stack
• Web of Linked Data
• Introduced by Berners-Lee et al. as next step for Web of Documents
• Allow “machine understanding” of data
• Create “common” models of domains using formal language - ontologies
Layer cake image source: http://www.w3.org; see W3C SW publications
Semantic Web Layer Cake
Characteristics of Semantic Web
Self-Describing
Machine & Human Readable
Issued by a Trusted Authority
Easy to Understand
Convertible
Can be Secured
The Semantic Web: XML, RDF & Ontology
Adapted from William Ruh (CISCO)
• Resource Description Framework – Recommended by W3C for metadata modeling [RDF]
• A standard common modeling framework – usable by humans and machine understandable
Resource Description Framework
[Figure: RDF graph. IBM (Company) --Headquarters located in--> Armonk, New York, United States (Location); IBM --Research lab located in--> Zurich, Switzerland (Location)]
RDF/OWL slides From: Semantic Web in Health Informatics (thanks: Satya)
• RDF Triple
o Subject: The resource that the triple is about
o Predicate: The property of the subject that is described by the triple
o Object: The value of the property
• Web Addressable Resource: Uniform Resource Locator (URL), Uniform Resource Identifier (URI), Internationalized Resource Identifier (IRI)
• Qualified Namespace: http://www.w3.org/2001/XMLSchema# as xsd:
o xsd:string instead of http://www.w3.org/2001/XMLSchema#string
RDF: Triple Structure, IRI, Namespace
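A minimal sketch of the triple structure and of qualified-name expansion, in plain Python. The `xsd` namespace IRI is the one on the slide; the `ex` namespace, the `Triple` type and the `expand` helper are assumptions made for illustration.

```python
from collections import namedtuple

# A triple is just an ordered (subject, predicate, object) record.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# Prefix table: the xsd IRI is from the slide, the ex IRI is a made-up example.
PREFIXES = {
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "ex": "http://example.org/company#",
}

def expand(qname):
    """Expand a qualified name such as 'xsd:string' into a full IRI."""
    prefix, local = qname.split(":", 1)
    return PREFIXES[prefix] + local

t = Triple(
    expand("ex:IBM"),
    expand("ex:headquartersLocatedIn"),
    expand("ex:Armonk"),
)
print(expand("xsd:string"))
print(t)
```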
[Figure: IBM --Headquarters located in--> Armonk, New York, United States]
• Two types of property values in a triple
o Web resource
o Typed literal
RDF Representation
[Figure: IBM --Headquarters located in--> Armonk, New York, United States; IBM --Has total employees--> "430000"^^xsd:integer]
• The graph model of RDF: node-arc-node is the primary representation model
• Secondary notations: Triple notation
o companyExample:IBM companyExample:has-Total-Employee "430000"^^xsd:integer .
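The two kinds of property value, and the line-based triple notation, can be sketched as follows. This is an illustration only: the `EX` namespace, the property names and the `Literal`, `term` and `to_ntriples` helpers are hypothetical. The employee count is written without a thousands separator because the lexical form of `xsd:integer` does not allow one.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "http://example.org/company#"  # hypothetical namespace

class Literal:
    """A typed literal: a lexical form plus a datatype IRI."""
    def __init__(self, lexical, datatype):
        self.lexical = lexical
        self.datatype = datatype

def term(node):
    """Serialize one node: typed literals get ^^<datatype>, resources get <IRI>."""
    if isinstance(node, Literal):
        return '"%s"^^<%s>' % (node.lexical, node.datatype)
    return "<%s>" % node

def to_ntriples(triples):
    """Write each (subject, predicate, object) as one 's p o .' line."""
    return "\n".join("%s %s %s ." % tuple(term(n) for n in t) for t in triples)

graph = [
    # Object is a Web resource.
    (EX + "IBM", EX + "headquartersLocatedIn", EX + "Armonk"),
    # Object is a typed literal.
    (EX + "IBM", EX + "hasTotalEmployees", Literal("430000", XSD + "integer")),
]
print(to_ntriples(graph))
```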
• RDF Schema: Vocabulary for describing groups of resources [RDFS]
RDF Schema
[Figure: instance level: IBM --Headquarters located in--> Armonk, New York, United States; Oracle --Headquarters located in--> Redwood Shores, California, United States. Schema level: Company --Headquarters located in--> Geographical Location]
• Property domain (rdfs:domain) and range (rdfs:range)
RDF Schema
[Figure: Company (Domain) --Headquarters located in--> Geographical Location (Range)]
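What a domain and range declaration lets a reasoner conclude can be sketched in a few lines. This is a simplified, single-pass illustration of the RDFS domain and range rules, not a full entailment engine; the `ex:` names and the `infer_types` function are assumptions.

```python
RDF_TYPE, DOMAIN, RANGE = "rdf:type", "rdfs:domain", "rdfs:range"

triples = {
    ("ex:headquartersLocatedIn", DOMAIN, "ex:Company"),
    ("ex:headquartersLocatedIn", RANGE, "ex:GeographicalLocation"),
    ("ex:IBM", "ex:headquartersLocatedIn", "ex:Armonk"),
}

def infer_types(triples):
    """Apply the domain and range rules once; return the new rdf:type triples."""
    domains = {(s, o) for s, p, o in triples if p == DOMAIN}
    ranges = {(s, o) for s, p, o in triples if p == RANGE}
    inferred = set()
    for s, p, o in triples:
        for prop, cls in domains:
            if p == prop:
                inferred.add((s, RDF_TYPE, cls))   # subject belongs to the domain
        for prop, cls in ranges:
            if p == prop:
                inferred.add((o, RDF_TYPE, cls))   # object belongs to the range
    return inferred

for t in sorted(infer_types(triples)):
    print(t)
```

Note that domain and range are used here to infer types, not to reject data: nothing stated that IBM is a Company, yet the conclusion follows from the property declaration.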
• Class Hierarchy/Taxonomy: rdfs:subClassOf
[Figure: Computer Technology Company, Banking Company and Insurance Company (subclasses) --rdfs:subClassOf--> Company (parent class)]
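Because rdfs:subClassOf is transitive, an instance of a subclass is also an instance of every ancestor class. A minimal sketch, assuming the three company subclasses from the slide plus one extra level (`ex:Organization`) added only to show transitivity; the `superclasses` and `all_types` helpers are hypothetical.

```python
# Direct subclass links: child class -> set of parent classes.
SUBCLASS_OF = {
    "ex:ComputerTechnologyCompany": {"ex:Company"},
    "ex:BankingCompany": {"ex:Company"},
    "ex:InsuranceCompany": {"ex:Company"},
    "ex:Company": {"ex:Organization"},  # extra level, assumed for illustration
}

def superclasses(cls):
    """Return every direct or indirect superclass of cls."""
    seen = set()
    stack = [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def all_types(asserted):
    """Extend a set of asserted classes with everything they entail."""
    result = set(asserted)
    for cls in asserted:
        result |= superclasses(cls)
    return result

print(sorted(all_types({"ex:ComputerTechnologyCompany"})))
```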
Ontology: A Working Definition
• Ontologies are shared conceptualizations of a domain represented in a formal language*
• Ontologies:
o Common representation model - facilitate interoperability, integration across different projects, and enforce consistent use of terminology
o Closely reflect domain-specific details (domain semantics) essential to answer end user
o Support reasoning to discover implicit knowledge
* Paraphrased from Gruber, 1993
Expressiveness Range: Knowledge Representation and Ontologies
[Figure: ontology spectrum from Simple Taxonomies to Expressive Ontologies. Levels shown: Catalog/ID; Terms/glossary; Thesauri (“narrower term” relation); Informal is-a; Formal is-a; Formal instance; Frames (properties); Value Restriction; Disjointness, Inverse, part of…; General Logical constraints. Examples placed along the spectrum: Wordnet, CYC, RDF, DAML, OODB Schema, RDFS, IEEE SUO, OWL, UMLS, GO, KEGG, TAMBIS, EcoCyc, BioPAX, GlycO, SWETO, Pharma]
Ontology Dimensions After McGuinness and Finin
• A language for modeling ontologies [OWL]
• OWL2 is declarative
• An OWL2 ontology (schema) consists of:
o Entities: Company, Person
o Axioms: Company employs Person
o Expressions: A Person Employed by a Company = CompanyEmployee
• Reasoning: Draw a conclusion given certain constraints are satisfied
o RDF(S) Entailment
o OWL2 Entailment
OWL2 Web Ontology Language
• Class Disjointness: Instance of class A cannot be instance of class B
• Complex Classes: Combining multiple classes with set theory operators:
o Union: Parent = ObjectUnionOf (:Mother :Father)
o Logical negation: UnemployedPerson = ObjectComplementOf (:EmployedPerson)
o Intersection: Mother = ObjectIntersectionOf (:Parent :Woman)
OWL2 Constructs
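The set-theoretic reading of these class constructors can be illustrated with Python sets, treating each class as the set of its known instances. This is a closed-world toy (OWL itself is open-world, so a reasoner would not conclude membership in a complement merely from missing data), and all individuals are made up.

```python
# Hypothetical individuals; each class is modelled as the set of its instances.
person = {"ann", "bob", "cat", "dan"}
woman = {"ann", "cat"}
mother = {"ann"}
father = {"bob"}
employed = {"ann", "dan"}

# ObjectUnionOf(:Mother :Father)
parent = mother | father

# ObjectComplementOf(:EmployedPerson), here taken relative to the known persons
unemployed = person - employed

# ObjectIntersectionOf(:Parent :Woman)
mother_derived = parent & woman

# Class disjointness: no individual may belong to both classes
mother_father_disjoint = mother.isdisjoint(father)

print(sorted(parent), sorted(unemployed), sorted(mother_derived), mother_father_disjoint)
```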
• Property restrictions: defined over property
• Existential Quantification:
o Parent = ObjectSomeValuesFrom (:hasChild :Person)
o To capture incomplete knowledge
• Universal Quantification:
o US President = ObjectAllValuesFrom (:hasBirthPlace :UnitedStates)
• Cardinality Restriction
OWL2 Constructs
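A rough sketch of how the three kinds of property restriction select individuals, again over closed, made-up data. The helper names and the sample individuals are assumptions; a real OWL reasoner works under the open-world assumption and would not treat absent values as known to be absent.

```python
individuals = {"ann", "bob", "cat"}
person = set(individuals)

# Property values per individual (hypothetical data).
has_child = {"ann": {"bob", "cat"}}
has_birth_place = {"ann": {"UnitedStates"}, "bob": {"Canada"}}

def some_values_from(prop, cls):
    """Existential: at least one value of the property lies in cls."""
    return {i for i in individuals if prop.get(i, set()) & cls}

def all_values_from(prop, cls):
    """Universal: every value of the property lies in cls (vacuously true if none)."""
    return {i for i in individuals if prop.get(i, set()) <= cls}

def min_cardinality(prop, n):
    """Cardinality: the property has at least n distinct values."""
    return {i for i in individuals if len(prop.get(i, set())) >= n}

print(sorted(some_values_from(has_child, person)))
print(sorted(all_values_from(has_birth_place, {"UnitedStates"})))
print(sorted(min_cardinality(has_child, 2)))
```

The universal restriction also admits "cat", who has no recorded birth place: a universal restriction constrains the values that exist and does not require any to exist.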
SPARQL: Querying Semantic Web Data
• A SPARQL query pattern is composed of triples
• Triples correspond to the RDF triple structure, but have a variable at:
o Subject: ?company ex:hasHeadquaterLocation ex:NewYork.
o Predicate: ex:IBM ?whatislocatedin ex:NewYork.
o Object: ex:IBM ex:hasHeadquaterLocation ?location.
• Result of a SPARQL query is a list of values; values can replace the variables in the query pattern
SPARQL: Query Patterns
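Matching a single triple pattern against data can be sketched without any SPARQL engine. The `match` and `query` functions below are hypothetical helpers, not a SPARQL implementation; property and resource names follow the slide, and the two data triples are sample values.

```python
def match(pattern, triple, bindings=None):
    """Unify one triple pattern with one triple; return the bindings or None."""
    bindings = dict(bindings or {})
    for p, value in zip(pattern, triple):
        if p.startswith("?"):
            # A variable binds to the value, and must agree with any earlier binding.
            if bindings.setdefault(p, value) != value:
                return None
        elif p != value:
            return None
    return bindings

DATA = [
    ("ex:IBM", "ex:hasHeadquaterLocation", "ex:NewYork"),
    ("ex:Oracle", "ex:hasHeadquaterLocation", "ex:RedwoodCity"),
]

def query(pattern, data):
    """Return one binding dictionary per matching triple."""
    return [b for b in (match(pattern, t) for t in data) if b is not None]

# Variable at subject, predicate and object position respectively.
print(query(("?company", "ex:hasHeadquaterLocation", "ex:NewYork"), DATA))
print(query(("ex:IBM", "?whatislocatedin", "ex:NewYork"), DATA))
print(query(("ex:IBM", "ex:hasHeadquaterLocation", "?location"), DATA))
```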
• An example query pattern
PREFIX ex: <http://www.eecs600.case.edu/>
SELECT ?company ?location
WHERE { ?company ex:hasHeadquaterLocation ?location . }
• Query Result (multiple matches)
company                 location
IBM                     NewYork
Oracle                  RedwoodCity
MicrosoftCorporation    Bellevue
SPARQL: Query Forms
• SELECT: Returns the values bound to the variables
• CONSTRUCT: Returns an RDF graph
• DESCRIBE: Returns a description (RDF graph) of a resource (e.g. IBM)
o The contents of the RDF graph are determined by the SPARQL query processor
• ASK: Returns a Boolean
o True
o False
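The difference between the query forms is only in what is done with the solutions of the pattern. A toy sketch, assuming in-memory tuples and hypothetical `solve`, `select`, `ask` and `construct` helpers (DESCRIBE is omitted because its result is left to the query processor); `ex:isHeadquarterOf` is an invented property used only for the CONSTRUCT template.

```python
def unify(pattern, triple, bindings):
    """Extend bindings so that pattern equals triple, or return None."""
    new = dict(bindings)
    for p, value in zip(pattern, triple):
        if p.startswith("?"):
            if new.setdefault(p, value) != value:
                return None
        elif p != value:
            return None
    return new

def solve(patterns, data, bindings=None):
    """Yield every binding that satisfies all triple patterns (a join)."""
    bindings = {} if bindings is None else bindings
    if not patterns:
        yield bindings
        return
    for triple in data:
        new = unify(patterns[0], triple, bindings)
        if new is not None:
            yield from solve(patterns[1:], data, new)

def select(variables, patterns, data):
    """SELECT: a table of the values bound to the requested variables."""
    return [tuple(b[v] for v in variables) for b in solve(patterns, data)]

def ask(patterns, data):
    """ASK: True if at least one solution exists."""
    return any(True for _ in solve(patterns, data))

def construct(template, patterns, data):
    """CONSTRUCT: a new graph built by filling the template for each solution."""
    return {tuple(b.get(t, t) for t in template) for b in solve(patterns, data)}

DATA = [
    ("ex:IBM", "ex:hasHeadquaterLocation", "ex:NewYork"),
    ("ex:Oracle", "ex:hasHeadquaterLocation", "ex:RedwoodCity"),
    ("ex:MicrosoftCorporation", "ex:hasHeadquaterLocation", "ex:Bellevue"),
]
WHERE = [("?company", "ex:hasHeadquaterLocation", "?location")]

print(select(["?company", "?location"], WHERE, DATA))
print(ask([("ex:IBM", "ex:hasHeadquaterLocation", "ex:Bellevue")], DATA))
print(construct(("?location", "ex:isHeadquarterOf", "?company"), WHERE, DATA))
```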
a little bit about ontologies
Open Biomedical Ontologies
http://bioportal.bioontology.org/ , http://obo.sourceforge.net/
Many Ontologies Available Today
From simple ontologies
Drug Ontology Hierarchy (showing is-a relationships)
[Figure: class hierarchy rooted at owl:thing. Classes shown: prescription_drug, prescription_drug_brand_name, brandname_undeclared, brandname_composite, brandname_individual, prescription_drug_generic, generic_individual, generic_composite, monograph_ix_class, cpnum_group, non_drug_reactant, property, prescription_drug_property, indication_property, formulary_property, interaction_property, formulary, indication, interaction, interaction_with_prescription_drug, interaction_with_non_drug_reactant, interaction_with_monograph_ix_class]
to complex ontologies
N-Glycosylation metabolic pathway
GNT-I attaches GlcNAc at position 2
UDP-N-acetyl-D-glucosamine + alpha-D-Mannosyl-1,3-(R1)-beta-D-mannosyl-R2 <=> UDP + N-Acetyl-beta-D-glucosaminyl-1,2-alpha-D-mannosyl-1,3-(R1)-beta-D-mannosyl-R2
GNT-V attaches GlcNAc at position 6
UDP-N-acetyl-D-glucosamine + G00020 <=> UDP + G00021
N-acetyl-glucosaminyl_transferase_V
N-glycan_beta_GlcNAc_9
N-glycan_alpha_man_4
A little bit about semantic metadata extractions and annotations
[Figure: sources (WWW, Enterprise Repositories, Digital Maps, Nexis/UPI/AP Feeds/Documents, Digital Audios, Data Stores, Digital Videos, Digital Images, ...) feeding METADATA EXTRACTORS]
Create/extract as much (semantic) metadata automatically as possible;
use ontologies to improve and enhance extraction
Extraction for Metadata Creation
Automatic Semantic Metadata Extraction/Annotation
Semantics & Semantic Web in 1999-2002
Sample applications
• Early Semantic Search, use baby steps of today’s engines
• Enterprise applications – healthcare & life sciences, financial, security
• Driving the innovation with new types of data: sensor (Semantic Sensor Web), social (Semantic Social Web), semantic IoT/WoT
BLENDED BROWSING & QUERYING INTERFACE
ATTRIBUTE & KEYWORD QUERYING
uniform view of worldwide distributed assets of similar type
SEMANTIC BROWSING
Targeted e-shopping/e-commerce
assets access
Taalee Semantic/Faceted Search & Browsing (1999-2001)
Search for company ‘Commerce One’
Links to news on companies that compete against Commerce One
Links to news on companies Commerce One competes against
(To view news on Ariba, click on the link for Ariba)
Crucial news on Commerce One’s competitors (Ariba) can
be accessed easily and automatically
Semantic Search/Browsing/Directory (2001-….)
System recognizes ENTITY & CATEGORY
Relevant portion of the Directory is automatically presented.
Semantic Search/Browsing/Directory (2001-….)
Users can explore semantically related information.
Semantic Search/Browsing/Directory (2001-….)
Focused relevant content organized by topic (semantic categorization)
Automatic content aggregation from multiple content providers and feeds
Related relevant content not explicitly asked for (semantic associations)
Competitive research inferred automatically
Automatic 3rd party content integration
Equity Research Dashboard with Blended Semantic Querying and Browsing
Semagix Freedom for building ontology-driven information system
Extracting Semantic Metadata from Semistructured and Structured Sources (1999 – 2002)
Managing Semantic Content on the Web
Ontology
Semantic Query Server
1. Ontology Model Creation (Description)
2. Knowledge Agent Creation
3. Automatic aggregation of Knowledge
4. Querying the Ontology
Ontology Creation and Maintenance Steps
© Semagix, Inc.
2004 SEMAGIX
[Figure: Ahmed Yaseer --appears on Watchlist--> FBI Watchlist (Watch list); Ahmed Yaseer --member of organization--> Hamas (Organization); Ahmed Yaseer --works for Company--> WorldCom (Company)]
Ahmed Yaseer:
• Appears on Watchlist ‘FBI’
• Works for Company ‘WorldCom’
• Member of a banned organization
Semantic Associations - Connecting the Dots
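"Connecting the dots" amounts to finding a path between two entities in the relationship graph. A minimal breadth-first sketch over the three relationships on the slide; the `neighbours` and `association` functions are hypothetical, and this is not a description of how the Semagix product computed associations.

```python
from collections import deque

# Relationships taken from the slide's example.
EDGES = [
    ("Ahmed Yaseer", "appears on Watchlist", "FBI Watchlist"),
    ("Ahmed Yaseer", "works for Company", "WorldCom"),
    ("Ahmed Yaseer", "member of organization", "Hamas"),
]

def neighbours(node):
    """Yield (relationship label, next node), following edges in both directions."""
    for s, p, o in EDGES:
        if s == node:
            yield p, o
        elif o == node:
            yield p + " (inverse)", s

def association(start, goal):
    """Breadth-first search for the shortest chain of relationships linking two entities."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for label, nxt in neighbours(node):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [(node, label, nxt)]))
    return None  # no association found

for step in association("WorldCom", "FBI Watchlist"):
    print(step)
```

The path from the company to the watch list runs through the shared person, which is the kind of indirect link a fraud-prevention check would surface.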
Global Investment Bank
Fraud Prevention application used in financial services – Related KYC application is deployed at Majority of Global Banks
User will be able to navigate the ontology using a number of different interfaces
World Wide Web content
Public Records
BLOGS, RSS
Unstructured text, Semi-structured Data
Watch Lists
Law Enforcement
Regulators
Semi-structured Government Data
Scores the entity based on the content and entity relationships
Establishing New Account
Fast forward to 2005-2006
Semantic Web + Clinical Practice Informatics = Active Semantic Electronic Medical Record (ASEMR)
Operationally deployed in January 2006, in use (as of 2012)
ASEMR: SW application in use
In daily use at Athens Heart Center
– 28 person staff
• Interventional Cardiologists
• Electrophysiology Cardiologists
– Deployed since January 2006
– 40-60 patients seen daily
– 3000+ active patients
– Serves a population of 250,000 people
Information Overload in Clinical Practice
• New drugs added to market
– Adds interactions with current drugs
– Changes possible procedures to treat an illness
• Insurance coverage changes
– Insurance may pay for drug X but not drug Y even though drugs X and Y are equivalent
– Patient may need a certain diagnosis before some expensive tests are run
• Physicians need a system to keep track of the ever-changing landscape
Active Semantic Document (ASD)
A document (typically in XML) with the following features:
• Semantic annotations
– Linking entities found in a document to ontology
– Linking terms to a specialized lexicon [TR]
• Actionable information
– Rules over semantic annotations
– Violated rules can modify the appearance of the document (show an alert)
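The idea of rules over semantic annotations can be sketched as follows. Everything here is hypothetical: the annotation format, the drug names `drug_x` and `drug_y`, the interaction table and the `check_interactions` function are invented for illustration and do not reflect the actual ASEMR rule language or drug ontology.

```python
# Hypothetical interaction knowledge, as it might be drawn from a drug ontology.
INTERACTS = {frozenset(["drug_x", "drug_y"])}

def check_interactions(annotations):
    """Return an alert for every pair of annotated drugs known to interact."""
    drugs = [a["entity"] for a in annotations if a["type"] == "prescription_drug"]
    alerts = []
    for i, first in enumerate(drugs):
        for second in drugs[i + 1:]:
            if frozenset([first, second]) in INTERACTS:
                alerts.append("ALERT: %s interacts with %s" % (first, second))
    return alerts

# Hypothetical annotations extracted from one patient record.
record_annotations = [
    {"entity": "drug_x", "type": "prescription_drug"},
    {"entity": "allergy_note", "type": "observation"},
    {"entity": "drug_y", "type": "prescription_drug"},
]

for alert in check_interactions(record_annotations):
    print(alert)
```

In an active document the alert would change how the record is rendered, for example by highlighting the two prescriptions, rather than just printing a message.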
Active Semantic Patient Record
• An application of ASD
• Three Ontologies
– Practice: Information about practice such as patient/physician data
– Drug: Information about drugs, interaction, formularies, etc.
– ICD/CPT: Describes the relationships between CPT and ICD codes
• Medical Records in XML created from database
Active Semantic Electronic Medical Record App
In Use Today at Athens Heart Center For Clinical Decision Support since January 2006
Amit P. Sheth, S. Agrawal, Jonathan Lathem, Nicole Oldham, H. Wingate, P. Yadav, and K. Gallagher, Active Semantic Electronic Medical Record, Proc. of the 5th International Semantic Web Conference, 2006
Demo of ASEMR and other applications
http://knoesis.org/showcase
http://archive.knoesis.org/library/demos/
Benefits of ASEMR
• Error prevention (drug interactions, allergy)
– Patient care
– Insurance
• Decision Support (formulary, billing)
– Patient satisfaction
– Reimbursement
• Efficiency/time
– Real-time chart completion
– “semantic” and automated linking with billing
Using large data sets for Structured Data on the web:
Linked Open Data – samples from 2005 to 2010
Linked Open Data
Publish Open Data Sets in RDF
By 2010, 203 data sets
25 billion Triples
Image: http://richard.cyganiak.de/2007/10/lod/
You publish the raw data…
Ivan Herman, "Semantic Web Adoption and Application”, http://www.w3.org/People/Ivan/CorePresentations/Applications/
… and others can use it
Using the LOD to build Web site: BBC
GoodRelations Ontology - RDFa
Fast forward to 2010-2011
Schema.org
Shared Vocabulary
Amazing things can happen
Will give some on-line examples
Twitris: Semantic Social Web Mash-up
Select topic
Select date
Topic tree
Spatial Marker
N-gram summaries
Wikipedia articles
Reference news
Related tweets
Images & Videos
Tweet traffic
Sentiment Analysis
More: TWITRIS
Web (and associated computing) is evolving
[Figure: evolution of the Web on a timeline from 1997 to 2007; vertical axis: Semantic Technology Used]
• Web of pages - text, manually created links - extensive navigation
• Web of databases - dynamically generated pages - web query interfaces
• Web of resources - data, services, mashups - 4 billion mobile computing
• Web of people, Sensor Web - social networks, user-created casual content - 40 billion sensors, 500M+ FB users, 1B tweets/wk
• Web as an oracle / assistant / partner - “ask the Web”: using semantics to leverage text + data + services - Powerset
Computing for Human Experience
Keywords
Patterns
Objects
Situations, Events
Enhanced Experience, Tech assimilated in life
[Figure labels. Sources: Structured text (Scientific publications / white papers), Experimental Results, Clinical Trial Data, Public domain knowledge (PubMed). Processing: Metadata Extraction/Semantic Annotations; Ontologies/Domain Models/Knowledge; Metadata / Semantic Annotations. Uses: Semantic Search/Browsing/Personalization/Analysis, Knowledge Discovery, Visualization, Situational Awareness. Layers: Big data; Search and browsing; Patterns / Inference / Reasoning; 2D-3D & Immersive Visualization, Human Computer Interfaces; Impacting bottom line; Knowledge discovery. Example graph: Migraine, Stress, Patient, Magnesium, Calcium Channel Blockers, connected by the relationships affects, isa, inhibit]
SEMANTICS, MEANING PROCESSING
Semantics as core enabler, enhancer @ Kno.e.sis
Ohio Center of Excellence in Knowledge-enabled Computing: one of the two largest academic groups in Semantic Web; multidisciplinary
Take Home Message (Cont.)
Semantics plays a key role in inferring the "meaning" behind the data. This requires progress from keywords -> entities -> relationships -> events, and from raw data to human-centric abstractions.
Take Home Message (Cont.)
A wide variety of semantic models and KBs (vocabularies, social dictionaries, community created semi-structured knowledge, domain-specific datasets, ontologies) empower semantic solutions. This can lead to Semantic Scalability – scalability that is meaningful to human activities and decision making.
Interested in more?
Kno.e.sis Wiki for the following and more:
• Computing for Human Experience
• Continuous Semantics to Analyze Real-Time Data
• Semantic Modeling for Cloud Computing
• Citizen Sensing, Social Signals, and Enriching Human Experience
• Semantics-Empowered Social Computing
• Semantic Sensor Web
• Traveling the Semantic Web through Space, Theme and Time
• Relationship Web: Blazing Semantic Trails between Web Resources
• SA-REST: Semantically Interoperable and Easier-to-Use Services and Mashups
• Semantically Annotating a Web Service
Tutorials:
• Semantic Web: Technologies and Applications for the Real-World (WWW2007)
• Citizen Sensor Data Mining, Social Media Analytics and Development Centric Web Applications (WWW2011)
Partial Funding: NSF (Semantic Discovery: IIS: 071441, Spatio Temporal Thematic: IIS-0842129), AFRL and DAGSI (Semantic Sensor Web), Microsoft Research (Semantic Search), IBM Research (Analysis of Social Media Content), and HP Research (Knowledge Extraction from Community-Generated Content).
http://knoesis.org
Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
Wright State University, Dayton, Ohio, USA
Vision Paper: Computing for Human Experience: http://wiki.knoesis.org/index.php/Computing_For_Human_Experience
Future: Computing for Human Experience