Semantic Web Technologies
Lecture
Dr. Harald Sack
Hasso-Plattner-Institut für IT Systems Engineering
University of Potsdam
Winter Semester 2012/13
Lecture Blog: http://semweb2013.blogspot.com/
This file is licensed under the Creative Commons Attribution-NonCommercial 3.0 (CC BY-NC 3.0)
Tuesday, 22 January 2013
Semantic Web Technologies, Dr. Harald Sack, Hasso-Plattner-Institut, Universität Potsdam
Semantic Web Technologies Content
1. Introduction
2. Semantic Web - Basic Architecture
   Languages of the Semantic Web - Part 1
3. Knowledge Representation and Logics
   Languages of the Semantic Web - Part 2
4. Applications in the 'Web of Data'
Ontological Engineering
Linked Data Applications & Semantic Search
Semantic Web Technologies Content
4. Applications in the Web of Data
4.1 Ontological Engineering
4.2 Linked Data Engineering
4.3 Semantic Search
(Image: The Tower of Babel, Pieter Brueghel, 1563)
How do we get Data from the Web...?
4.2 Linked Data Engineering
4.2.1 APIs vs. Linked Data
4.2.2 Linked Data Principles
4.2.3 Linked Data @ Work
How to get Data from the Web?
• Data can only be found on the Web if it is available on some website
(Figure: Database, JDBC, Web server, HTTP/HTML, Browser)
How to get Data from the Web?
• There are a number of different (proprietary) Web APIs and data exchange formats, with mashups built on top of them
(Figure: Database 1 to Database 4, Web API 1 to Web API 4, and a mashup on top)
In the Web today...
• Data is locked up in small data islands
• Other applications usually cannot access this data...
(Figure: many isolated databases)
http://www.w3.org/2009/Talks/0204-ted-tbl/#(22)
Problems ahead....
But there is a solution:
• ...open up proprietary data islands
• ...publish all data that are of public interest
• ...in a way that
  • other applications can access, utilize, and process this data, and
  • all applications can access additional (meta)data for the available data
(Figure: Database 1, Database 2, Database 3)
But there is a solution:
• Apply semantic technologies:
  • to publish structured data on the Web
  • to draw connections from one data source to data in other data sources
(Figure: Database 1 to Database 4, each published as RDF data, connected by RDF links)
4.2.2 Linked Data Principles
Linked Data and the ‘Web of Data‘
■ The term refers to an idea originally proposed by Tim Berners-Lee (Tim Berners-Lee, Linked Data, 2006, http://www.w3.org/DesignIssues/LinkedData.html)
  □ A set of best practices for publishing and linking structured data on the Web
  □ Basic assumption: the value of data on the Web increases when it is connected to other data sources
The Web of Data is about a data and naming model on the Web
M. Hausenblas, Quick Linked Data Introduction, http://www.slideshare.net/mediasemanticweb/quick-linked-data-introduction
Linked Data Principles
(1) Use URIs as names for things.
(2) Use HTTP URIs, so that people can look up those names.
(3) When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
(4) Include links to other URIs, so that they can discover more things.
Linked Data Principles
(1) Use URIs as names for things.
• URIs do not only identify documents but also arbitrary objects of the real world as well as abstract concepts
http://dbpedia.org/resource/Albert_Einstein
http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d
http://semweb2013.blogspot.com
Linked Data Principles
(2) Use HTTP URIs, so that people can look up those names.
• HTTP URIs (URLs) as globally unique names enable dereferencing of associated information on the Web
  • via HTTP content negotiation
  • 303 URIs: HTTP response code 303 'See Other' (redirect)
  • Hash URIs, e.g. http://example.com/Harald#me
Linked Data for Humans and Computers
■ A URI should deliver information for humans as well as for computers, i.e.
(Figure: the URI of a thing leads to RDF data for "Accept: application/rdf+xml" and to an HTML page for "Accept: text/html")
■ The server delivers different HTTP responses depending on the HTTP Accept header (content negotiation)
http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/
■ A URI should deliver information for humans as well as for computers, e.g.
Thing: http://dbpedia.org/resource/Ernest_Hemingway
RDF data (Accept: application/rdf+xml): http://dbpedia.org/data/Ernest_Hemingway.rdf
HTML page (Accept: text/html): http://dbpedia.org/page/Ernest_Hemingway
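Content negotiation can be illustrated without any network access. The following is a minimal sketch that simulates what a Linked Data server might do for the Hemingway example: it inspects the Accept header and answers with a 303 redirect to either the RDF document or the HTML page. The function name `negotiate`, the fallback to HTML, and the simplified header parsing (quality values are ignored) are assumptions made for illustration, not DBpedia's actual implementation.

```python
# Hypothetical server-side sketch of content negotiation with 303 redirects.
# The URLs are the DBpedia examples from the slide; the logic is simplified.

RESOURCE = "http://dbpedia.org/resource/Ernest_Hemingway"

# Media type -> document that describes the thing in that format.
REPRESENTATIONS = {
    "application/rdf+xml": "http://dbpedia.org/data/Ernest_Hemingway.rdf",
    "text/html": "http://dbpedia.org/page/Ernest_Hemingway",
}


def negotiate(accept_header: str) -> tuple[int, str]:
    """Return (status code, Location) for a request to RESOURCE.

    Media types are checked in the order the client lists them;
    quality values (";q=...") are ignored in this sketch.
    """
    for part in accept_header.split(","):
        media_type = part.split(";")[0].strip().lower()
        if media_type in REPRESENTATIONS:
            return 303, REPRESENTATIONS[media_type]  # 303 'See Other'
    # Assumption: fall back to the HTML page for unknown media types.
    return 303, REPRESENTATIONS["text/html"]


print(negotiate("application/rdf+xml"))
print(negotiate("text/html,application/xhtml+xml"))
```

A Semantic Web client would send the first header and be redirected to the RDF document; a browser would send something like the second and end up on the HTML page.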
Linked Data Principles
(3) When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
• RDF as universal data model for publishing structured data on the Web
• Make all URIs in the RDF graph dereferenceable
• Avoid RDF constructs that cause problems in a Linked Data context
  • RDF reification
  • RDF collections and containers
  • unnamed blank nodes
Linked Data Principles
(4) Include links to other URIs, so that they can discover more things.
• Set RDF links between data from different data sources, to find information related by content
  • Relationship links: links to external LOD entities related to the original entity
  • Identity links: links to external LOD entities referring to the same object or concept
  • Vocabulary links: links to definitions of the original entity
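The three kinds of RDF links can be told apart mechanically. The sketch below represents triples as plain Python tuples and classifies each outgoing link with a simple heuristic. The local namespace `http://example.org/`, the predicate `http://example.org/vocab/bornIn`, and the classification rule itself are hypothetical and only serve as illustration; owl:sameAs, rdf:type, and foaf:Person are the real vocabulary terms.

```python
# Triples as (subject, predicate, object) tuples; no RDF library is needed.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

LOCAL = "http://example.org/"  # hypothetical local data set
einstein = LOCAL + "resource/Albert_Einstein"

triples = [
    # Relationship link: a related entity in another data set.
    (einstein, LOCAL + "vocab/bornIn", "http://dbpedia.org/resource/Ulm"),
    # Identity link: the same real-world object in another data set.
    (einstein, OWL_SAME_AS, "http://dbpedia.org/resource/Albert_Einstein"),
    # Vocabulary link: the definition of a term used to describe the entity.
    (einstein, RDF_TYPE, "http://xmlns.com/foaf/0.1/Person"),
]


def classify_link(triple, local_prefix=LOCAL):
    """Heuristically classify a triple as one of the Linked Data link types."""
    _subject, predicate, obj = triple
    if obj.startswith(local_prefix):
        return "internal"  # stays inside the local data set
    if predicate == OWL_SAME_AS:
        return "identity"
    if predicate == RDF_TYPE:
        return "vocabulary"
    return "relationship"


for t in triples:
    print(classify_link(t), "->", t[2])
```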
The application of the Linked Data Principles leads to a 'Web of Data'
Development of the 'Web of Data'
(Figures: snapshots of the 'Web of Data' for May 2007, Nov 2007, July 2009, September 2010, and September 2011)
September 2011: 300 datasets, 31B RDF triples, 504M links
Semantic Mashups
□ Semantic mashups are applications that use linked RDF data from various data sources
□ In contrast to interfaces, exchange formats, or ordinary Web APIs, Linked Data offers the following benefits:
  □ a flexible and standardized data format (RDF)
  □ a standardized access mechanism (HTTP)
  □ the possibility to set links (RDF links) between different data sources
    » enables navigation
    » is supported by search engines (crawlers)
    » enables expressive search facilities over the crawled data and beyond
S. Auer, J. Lehmann, Ch. Bizer: Semantische Mashups auf Basis vernetzter Daten, in T. Pellegrini, A. Blumauer (eds.): Social Semantic Web, Springer, 2009.
Linked Data Sources in the Web
□ Native publication
  □ D2R Server, OpenLink Virtuoso, Pubby, etc.
□ Implementation of wrappers around existing applications / APIs
  □ SIOC Exporter for WordPress, Drupal, phpBB, ...
  □ RDF Book Mashup (Amazon API, Google Base API, ...)
□ Linking Open Data Project
  □ Semantic Web Education and Outreach W3C working group
  □ Catalogue of all known sources of linked data with an open source license
    » DBpedia, Flickr, OpenCyc, FOAF, SIOC, GeoNames, ...
Browser for Linked Data
■ Differences to arbitrary RDF browsers
  □ The RDF data to be visualized does not necessarily reside in a local repository, but is distributed across the Web
  □ requires dynamic reloading of RDF resources
■ Tabulator (Tim Berners-Lee, MIT)
  (T. Berners-Lee et al.: Tabulator: Exploring and analyzing linked data on the semantic web, in Proc. 3rd Int. Semantic Web User Interaction Workshop, 2006, http://swui.semanticweb.org/swui06/papers/Berners-Lee/Berners-Lee.pdf)
■ OpenLink RDF Data Explorer, http://ode.openlinksw.com/
  □ enables visualization as graph, timeline, map, etc.
■ Zitgist Browser, http://browser.zitgist.com/
■ DISCO Browser, http://sites.wiwiss.fu-berlin.de/suhl/bizer/ng4j/disco/
Search Engines for Linked Data
■ Crawler-based: follow links in datasets to create an index that can be queried
■ Swoogle, http://swoogle.umbc.edu/
  □ keyword-based full-text search (Apache Lucene), uses only limited semantic annotation
■ Semantic Web Search Engine (SWSE), http://swse.deri.org/
  □ additionally uses rdf:type properties as search filter
■ Sindice, http://www.sindice.com/
■ Falcons, http://iws.seu.edu.cn/services/falcons/
  □ with data browser for result analysis
■ Sig.ma - Semantic Information Mashup (based on Sindice), http://sig.ma/
http://dbpedia.neofonie.com
Linked Open Data
■ Public Linked Data resources on the Web, licensed as "Creative Commons CC-BY"
■ 5-Star Criteria for Linked Open Data
  ★ Available on the Web (whatever format) but with an open licence, to be Open Data
  ★★ Available as machine-readable structured data (e.g. Excel instead of an image scan of a table)
  ★★★ As (2), plus non-proprietary format (e.g. CSV instead of Excel)
  ★★★★ All the above, plus: use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff
  ★★★★★ All the above, plus: link your data to other people's data to provide context
4.2.3 Linked Data @ Work
Linked Data
□ ordered by categories
  □ Media
  □ User Generated Content
  □ Publications
  □ Government
  □ Geographic
  □ Life Sciences
  □ Cross-Domain
Linking Open Data
■ Some statistics (as of 09/2011)
(Figures: distribution of RDF triples by domain; distribution of links by domain)
Linked Data Ontologies
□ Ontologies hold the Linked Data Cloud together
□ e.g. OWL
  □ owl:sameAs connects identical individuals
  □ owl:equivalentClass connects equivalent classes
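Because owl:sameAs is symmetric and transitive, a set of identity links partitions identifiers into groups that denote the same individual. The following sketch computes these groups with a small union-find structure. The prefixed identifiers `exA:` and `exB:` and the link list are made up for illustration; only the idea of merging identifiers along owl:sameAs links comes from the slide.

```python
def same_as_clusters(links):
    """Group identifiers that are connected by owl:sameAs links."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Merge the two groups of every sameAs pair.
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Collect all identifiers under their group representative.
    clusters = {}
    for node in parent:
        clusters.setdefault(find(node), set()).add(node)
    return list(clusters.values())


# Hypothetical owl:sameAs statements between three data sets.
LINKS = [
    ("dbpedia:Neil_Armstrong", "exA:armstrong_neil"),
    ("exA:armstrong_neil", "exB:person/17"),
    ("dbpedia:Louis_Armstrong", "exB:person/99"),
]

for cluster in same_as_clusters(LINKS):
    print(sorted(cluster))
```

An application that consumes several data sets could use such groups to merge the descriptions of one individual before querying.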
Linked Data Ontologies
□ e.g. UMBEL (version 1.0, Feb. 2011)
  □ "Upper Mapping and Binding Exchange Layer"
  □ Subset of OpenCyc as RDF triples based on SKOS and OWL 2
  □ Upper ontology with 28,000 concepts (skos:Concept)
  □ 46,000 mappings into DBpedia, GeoNames, and others (owl:equivalentClass, rdfs:subClassOf)
  □ Links to more than 2 million Wikipedia pages
Linked Data Ontologies
□ e.g. SKOS
  □ "Simple Knowledge Organization System"
  □ based on RDF and RDFS
  □ applied for definitions and mappings of vocabularies and ontologies
  □ skos:Concept (classes)
  □ skos:narrower
  □ skos:broader
  □ skos:related
  □ skos:exactMatch, skos:narrowMatch, skos:broadMatch, skos:relatedMatch
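A concept scheme built from skos:broader and skos:narrower can be walked like a graph. The sketch below stores skos:broader statements in a dictionary, derives skos:narrower by inversion, and collects all concepts reachable in either direction. The three-level hierarchy is a toy example that merely borrows class names used elsewhere in this lecture; it is not taken from a real SKOS vocabulary.

```python
# skos:broader statements of a hypothetical concept scheme.
BROADER = {
    "Astronaut": ["Science Occupation"],
    "Science Occupation": ["Employment"],
}


def invert(relation):
    """Derive the inverse relation (skos:narrower from skos:broader)."""
    inverse = {}
    for concept, targets in relation.items():
        inverse.setdefault(concept, [])
        for target in targets:
            inverse.setdefault(target, []).append(concept)
    return inverse


def transitive(concept, relation):
    """All concepts reachable from `concept` by following `relation`."""
    found, stack = [], list(relation.get(concept, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.append(current)
            stack.extend(relation.get(current, []))
    return found


NARROWER = invert(BROADER)
print(transitive("Astronaut", BROADER))
print(transitive("Employment", NARROWER))
```

A search application could use the narrower concepts to expand a query for a general term, and the broader concepts to generalize a query that returns too few results.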
Linked Data (Research) Applications
□ WhoKnows, http://apps.facebook.com/whoknows_/
□ RISQ!, http://141.89.225.43/whoknowsmovies/game.html
□ for data cleansing
□ for relevance ranking of facts
□ for entity summarization
Semantic Search
(Image: Albrecht Dürer: Melancholia I, 1514)
4.3 Semantic Search
4.3.1 Information Retrieval
4.3.3 Semantic Analysis and Retrieval
4.3.4 Exploratory Search
The 'Google Dilemma'
Classical Information Retrieval
(after Salton, G., McGill, M.J.: Introduction to Modern Information Retrieval. McGraw-Hill, New York 1983)
(Figure: a set of documents (files of records) and a set of queries (information requests) are both mapped into an indexing language, by indexing and by query formulation respectively, and compared there by similarity)
Classical Information Retrieval (simplified version)
(Figure: keywords are extracted from a set of documents, illustrated by a German dictionary entry on "suchen" (to search), and stored in a search index; a search query with its search term(s), e.g. "search", is matched against this index)
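Evaluation of Information Retrieval Systems
R: relevant documents
P: retrieved documents
R ∩ P: relevant documents that have been retrieved
\[ \mathrm{Recall} = \frac{|R \cap P|}{|R|} \]
\[ \mathrm{Precision} = \frac{|R \cap P|}{|P|} \]
\[ F_\alpha = \frac{(1+\alpha)\cdot(\mathrm{Recall}\cdot\mathrm{Precision})}{\alpha\cdot(\mathrm{Recall}+\mathrm{Precision})} \]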
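The measures can be computed directly from the two document sets. The sketch below does this for a made-up example and reports the F-measure for α = 1, i.e. the harmonic mean of recall and precision. The function name `evaluate` and the document identifiers are illustrative assumptions.

```python
def evaluate(relevant, retrieved):
    """Return (recall, precision, F1) for sets of document ids."""
    hits = relevant & retrieved  # relevant documents that were retrieved
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    total = recall + precision
    # F-measure for alpha = 1: harmonic mean of recall and precision.
    f1 = 2 * recall * precision / total if total else 0.0
    return recall, precision, f1


R = {"d1", "d2", "d3", "d4"}  # relevant documents (hypothetical)
P = {"d3", "d4", "d5"}        # retrieved documents (hypothetical)

recall, precision, f1 = evaluate(R, P)
print(f"recall={recall:.3f} precision={precision:.3f} F1={f1:.3f}")
```

Here two of the four relevant documents are found (recall 1/2) and two of the three retrieved documents are relevant (precision 2/3), which gives F1 = 4/7.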
Search Engines in the Web
• The World Wide Web is a distributed hypermedia system
  • consisting of multimedia documents
  • that are linked via hyperlinks
Web Crawler (Web Robot)
(Figure, steps 1 to 4: the crawler keeps a URL list (http://www.xxxx.de/1234..., http://www.xxxx.de/2234..., ...), sends an HTTP request to the WWW server, the WWW server delivers the requested HTML documents to the web crawler, and the hyperlinks (<a href="..." .../>) found in the HTML documents extend the URL list)
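The crawl loop can be sketched in a few lines. To keep the example self-contained, the "Web" below is a small dictionary of made-up pages and `fetch` is a stand-in for the HTTP request; a real crawler would additionally need politeness rules (robots.txt, rate limits) and error handling, which are left out here.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href values of all <a> tags in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical stand-in for the Web: URL -> HTML document.
FAKE_WEB = {
    "http://example.org/": '<a href="/a.html">A</a> <a href="/b.html">B</a>',
    "http://example.org/a.html": '<a href="/b.html">B</a>',
    "http://example.org/b.html": '<a href="/">home</a> <a href="/missing.html">?</a>',
}


def fetch(url):
    """Stand-in for an HTTP request; returns None for unknown URLs."""
    return FAKE_WEB.get(url)


def crawl(seed, max_pages=100):
    url_list = deque([seed])  # URLs still to be visited
    seen = {seed}             # URLs that were ever put on the list
    visited = []
    while url_list and len(visited) < max_pages:
        url = url_list.popleft()
        html = fetch(url)
        if html is None:
            continue  # page could not be retrieved
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                url_list.append(absolute)
    return visited


print(crawl("http://example.org/"))
```

Using a queue gives a breadth-first traversal; the `seen` set prevents the crawler from looping on pages that link back to each other.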
Search Engines in the WWW: Preprocessing and Indexing
(Figure: pipeline of Web Crawler, Data Normalization, and Data Analysis and creation of index data structures)
Document Preprocessing
• Tokenization
• Language Identification
• Word Stemming
• POS-Tagging
• Descriptor Generation
Search Engines in the WWW: Efficient Index Data Structures
Index (sorted terms): Aachen, Altavista, Ananas, ..., Zustand, Zypern
Inverted File, entry for "Ananas":
DocID  Pos          Frequency  Weight
D123   1;13;77;132  4          9.4
D456   22;38        2          6.7
…      …            …          …
D998   15           1          1.2
Location List for D123:
Frequency  URL  <H1>  …  <H6>  <title>  …  text
4          1    1     …  0     1        …  1
File:
D123 http://producers.ananas.org/index.htm
<html><head><title=“Ananas around the World“></head><body> … </body></html>
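An inverted file of this kind maps every term to the documents that contain it, together with the positions of the term. The following is a minimal sketch with two made-up documents; weights and the per-field location list are omitted, and the regular-expression tokenizer is a simplifying assumption.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(documents):
    """Map term -> {doc_id: [positions]}; positions start at 1."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for position, term in enumerate(tokenize(text), start=1):
            index[term].setdefault(doc_id, []).append(position)
    return index


def search(index, term):
    """Return (doc_id, frequency) pairs for a single search term."""
    postings = index.get(term.lower(), {})
    return sorted((doc_id, len(pos)) for doc_id, pos in postings.items())


# Hypothetical documents, keyed by document id.
docs = {
    "D123": "Ananas around the world: ananas producers",
    "D456": "Fresh ananas from Zypern",
}

index = build_inverted_index(docs)
print(index["ananas"])
print(search(index, "Ananas"))
```

Looking up a term is a single dictionary access, independent of the number of documents, which is the reason search engines build this structure ahead of query time.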
Search Engines in the WWW: Relevance Ranking
• Link Popularity (Google PageRank)
Start: PR(A) = PR(B) = PR(C) = PR(D) = 1.0
Iteration of the PageRank computation:
Nr.  PR(A)  PR(B)   PR(C)   PR(D)
1    1.0    1.0     1.0     1.0
2    1.0    0.575   2.275   0.15
3    2.083  0.575   1.1912  0.15
…    …      …       …       …
n    1.49   0.7833  1.577   0.15
Resulting PageRank: A = 1.49, B = 0.78, C = 1.57, D = 0.15
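The iteration can be reproduced with a few lines of code. The link structure of the four pages is not part of the extracted text, so the sketch below assumes one (A links to B and C, B links to C, C links to A, D links to C) together with a damping factor d = 0.85 and the update rule PR(p) = (1 - d) + d · Σ PR(q)/out(q) over all pages q linking to p. Under these assumptions the first steps and the limit agree with the numbers in the table.

```python
def pagerank_step(ranks, links, d=0.85):
    """One synchronous PageRank update over all pages."""
    new_ranks = {}
    for page in ranks:
        # Each linking page passes on its rank, split over its out-links.
        incoming = sum(
            ranks[source] / len(targets)
            for source, targets in links.items()
            if page in targets
        )
        new_ranks[page] = (1 - d) + d * incoming
    return new_ranks


# Assumed link structure: page -> pages it links to.
LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}

ranks = {page: 1.0 for page in LINKS}  # row 1: start values
second = pagerank_step(ranks, LINKS)   # row 2
third = pagerank_step(second, LINKS)   # row 3

final = third
for _ in range(100):                   # iterate until practically stable
    final = pagerank_step(final, LINKS)

print({page: round(value, 3) for page, value in final.items()})
```

Page D has no incoming links and therefore keeps the minimum value 1 - d = 0.15, while C collects rank from three pages and passes all of it on to A.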
The Web is big. Really big. You just won't believe how vastly, hugely, mind-bogglingly big it is. (...according to Douglas Adams)
Language has its fallacies...
in particular, if we don't know the language
4.3.3 Semantic Analysis and Retrieval
Semantic Search: Definition (first try)
• Annotation of (text-based) metadata with semantic entities
• Entity-based Information Retrieval
• Make use of semantic relations, e.g. content-based similarities of relationships
• Interoperable metadata via semantic annotations
  • for content-based description
  • for structural / technical description (Multimedia Ontologies)
Overall Goal: Quantitative and qualitative improvement of Information Retrieval
Multimedia Ontologies: Semantic Metadata
• MPEG-7 has been re-engineered to become an OWL-DL ontology (2007: Arndt et al., COMM model)
• Localize a region → Draw a bounding box
• Annotate the content → Interpret the content → Tag 'Astronaut'
Multimedia Ontologies: Semantic Metadata
Example: Tagging with an MPEG-7 Ontology
(Figure: RDF graph with an mpeg7:image that mpeg7:depicts "Man on the Moon" and has an mpeg7:spatial_decomposition into the region Reg1; Reg1 has rdf:type mpeg7:StillRegion and mpeg7:depicts dbpedia:Astronaut; the region is described by mpeg7:SpatialMask, mpeg7:polygon, and mpeg7:Coords)
Named Entity Recognition
"locating and classifying atomic elements ... into predefined categories such as names, persons, organizations, locations, expressions of time, quantities, monetary values, etc."
C.J. Rijsbergen, Information Retrieval (1979)
(Figure: the entity Neil Armstrong with two "is a" edges to the classes Astronaut and Person; two subClassOf edges among the classes Astronaut, Person, Science Occupation, and Employment)
Named Entity Recognition: Entity Mapping
Text: "Armstrong was the first man on the Moon."
The text is mapped to the entity http://dbpedia.org/resource/Neil_Armstrong, which is described by
rdfs:label Neil Armstrong
rdf:type dbpedia-owl:Astronaut
rdf:type foaf:Person
How do I find the right entity?
Named Entity Recognition
Text: "Armstrong was the first man on the Moon."
How do I find the right entity?
In natural language text
• nouns correspond to semantic concepts / entities
• verbs correspond to semantic relations
Identify nouns in natural language text:
• determination of language
• Part-of-Speech Tagger
• Word Stemming
• e.g. with http://gate.ac.uk/
Named Entity Recognition
Text: "Armstrong was the first man on the Moon."
How do I find the right entity?
Determine possible Entity Mapping Candidates:
• Armstrong, Florida
• Armstrong, Ontario
• Armstrong County, Texas
• Armstrong Tunnel
• Louis Armstrong
• Armstrong Tools
• Armstrong (moon crater)
• Armstrong (car)
• The Armstrongs
• Craig Armstrong
• Anton Armstrong
• Edward Armstrong
• Gary Armstrong
• George Armstrong
• The Armstrong Twins
• Ian Armstrong
• + 400 more...
Context Dimensions for Audiovisual Media
• Temporal Context
• Spatial Context
• Provenance Context
• Structural Context
• User Context
Context provides information for
• Disambiguation
• Reliability
• Trustworthiness
Named Entity Recognition
Text: "Armstrong was the first man on the Moon."
How do I find the right entity?
• Determine Named Entities (nouns) from text: Armstrong, man, moon
• Create all possible Sets of Mapping Candidates
• We have to examine the Context to understand the semantics
Named Entity Recognition
Text: "Armstrong was the first man on the Moon."
How do I find the right entity?
Create all possible sets of Mapping Candidates:
Armstrong: George Armstrong Custer; Neil Armstrong; The Armstrong Twins; Armstrong, Florida; Armstrong, Ontario; Armstrong Automobile; Joe Armstrong; Armstrong County, Texas; Armstrong Gun; Craig Armstrong; Armstrong (moon crater); Louis Armstrong; Armstrong Tunnel; Louis Armstrong International Airport; Armstrong's Theorem; Sir Thomas Armstrong; Ian Armstrong
Man: Human; Bill Man; Bob Man; David Man; Homer Man; Louise Man; Half Man; Man ärgere Dich nicht; Man Computer; Peter van Man; Daniel Man; Man (album)
Moon: The Moon (Opera); Moon; Moon Nickel Company; Brunner Moon; Bernard Moon; Peter Moon; Julian Moon; Ludwig Moon; Violet Moon; Moon Technologies; Robert Moon; Henry Moon; Alfred Moon; Chava Moon
Named Entity Recognition
Terms: Armstrong, man, moon
(1) Co-occurrence Analysis
(2) Semantic Analysis
(3) Machine Learning
Example combination: Armstrong, Florida; Man (album); Moon Technologies
‣ For all possible combinations do:
  ‣ Determine the probability of the co-occurrence of a term combination in an arbitrary text document corpus, e.g. Wikipedia
‣ Select the entity combination with the maximum probability of co-occurrence
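The co-occurrence procedure can be sketched as an exhaustive search over all candidate combinations. In the sketch below, the candidate lists are shortened versions of the ones on the slides, and the corpus is a tiny made-up stand-in for a real document collection such as Wikipedia: each "document" is simply the set of entities it mentions. The function names are illustrative.

```python
from itertools import product

# Shortened candidate sets for the three terms found in the text.
CANDIDATES = {
    "Armstrong": ["Neil Armstrong", "Louis Armstrong", "Armstrong, Florida"],
    "man": ["Human", "Man (album)"],
    "moon": ["Moon", "Moon Technologies"],
}

# Hypothetical corpus: every document is the set of entities it mentions.
CORPUS = [
    {"Neil Armstrong", "Moon", "Human"},
    {"Neil Armstrong", "Moon"},
    {"Louis Armstrong", "Human"},
    {"Armstrong, Florida"},
    {"Moon Technologies", "Man (album)"},
]


def cooccurrence_probability(entities, corpus):
    """Fraction of documents that mention all of the given entities."""
    wanted = set(entities)
    return sum(1 for doc in corpus if wanted <= doc) / len(corpus)


def best_combination(candidates, corpus):
    """Try every combination and keep the one that co-occurs most often."""
    terms = list(candidates)
    return max(
        product(*(candidates[term] for term in terms)),
        key=lambda combo: cooccurrence_probability(combo, corpus),
    )


print(best_combination(CANDIDATES, CORPUS))
```

The number of combinations is the product of the candidate set sizes, so with several hundred candidates per term the exhaustive search becomes expensive; this is one motivation for the semantic and machine-learning approaches listed alongside it.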
Named Entity Recognition
Terms: Armstrong, man, moon
(1) Co-occurrence Analysis
(2) Semantic Analysis
(3) Machine Learning
Remaining mapping candidates:
Armstrong: George Armstrong Custer; Neil Armstrong; Armstrong, Florida; Armstrong, Ontario; Armstrong Gun; Craig Armstrong; Armstrong (moon crater); Louis Armstrong; Sir Thomas Armstrong
Man: Human; Bob Man; David Man; Homer Man; Louise Man; Half Man; Dead Man Walking; Man Machine; Man (album)
Moon: The Moon (Opera); moon (planet); Moon Nickel Company; Brunner Moon; Bernard; Peter Moon; Julian Moon; Ludwig Moon; Henry Moon; Alfred Moon; Chava Moon
How to use semantic data in Retrieval?
(Image: The Tower of Babel, Pieter Brueghel, 1563)
Semantic metadata enable an improvement of traditional keyword-based retrieval by
(1) Query String Refinement: enables more precise or more complete search results
(2) Cross Referencing: complements search results with additional associated or similar information
(3) Exploratory Search: enables visualization and navigation of the search space
(4) Reasoning: complements search results with implicitly given information
4.3.4 Exploratory Search
Searching is not always just searching
I'm looking for the book "Brave New World" by Aldous Huxley in the first German edition...
Brave New World. - Aldous H U X L E Y.
- The Albatros Continental Library, 47
(Hamburg usw., Albatros Verlag, 1933)
257 S. 8“
II 1, 2506, 34548
I really liked "Brave New World" by Aldous Huxley, but how do I find out what to read next...?
Exploratory Search
• What if the user does not know which query string to use?
• What if the user is looking for complex answers?
• What if the user does not know the domain he/she is looking for?
• What if the user wants to know all(!) about a specific topic?
• ...'Browsing' instead of 'Searching'
• ...to find something by chance, i.e. Serendipity
• ...to get an overview
• ...enable content-based navigation
Enable Exploratory Search based on Linked Open Data
Gather knowledge about dbpedia:Brave_New_World and decide which interesting fact to follow...
http://dbpedia.org/page/Brave_New_World
dbpedia:Brave_New_World dbpedia-owl:author dbpedia:Aldous_Huxley
(Figure: three further dbpedia-owl:author edges attached to dbpedia:Aldous_Huxley)
dbpedia:Brave_New_World dbpedia-owl:author dbpedia:Aldous_Huxley
(Figure: dbpedia:Aldous_Huxley is connected via dbpedia:ontology/influences edges to dbpedia:H._G._Wells, dbpedia:George_Orwell, and dbpedia:Michel_Houellebecq)
dbpedia:H._G._Wells dbpedia-owl:notableWork dbpedia:The_Time_Machine
dbpedia:George_Orwell dbpedia-owl:notableWork dbpedia:Nineteen_Eighty-Four
dbpedia:Michel_Houellebecq dbpedia-owl:notableWork dbpedia:Les_Particules_élémentaires
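The walk from a book to reading suggestions is a short chain of link traversals. The sketch below stores the statements from the preceding slides as plain tuples and follows author, influence, and notable-work links in turn. The direction of the dbpedia:ontology/influences edges (stored here with Aldous Huxley as subject) is an assumption, since the extracted figure does not show it; a real application would query the DBpedia SPARQL endpoint instead of a local list.

```python
# Statements from the slides as (subject, predicate, object) tuples.
TRIPLES = [
    ("dbpedia:Brave_New_World", "dbpedia-owl:author", "dbpedia:Aldous_Huxley"),
    ("dbpedia:Aldous_Huxley", "dbpedia:ontology/influences", "dbpedia:H._G._Wells"),
    ("dbpedia:Aldous_Huxley", "dbpedia:ontology/influences", "dbpedia:George_Orwell"),
    ("dbpedia:Aldous_Huxley", "dbpedia:ontology/influences", "dbpedia:Michel_Houellebecq"),
    ("dbpedia:H._G._Wells", "dbpedia-owl:notableWork", "dbpedia:The_Time_Machine"),
    ("dbpedia:George_Orwell", "dbpedia-owl:notableWork", "dbpedia:Nineteen_Eighty-Four"),
    ("dbpedia:Michel_Houellebecq", "dbpedia-owl:notableWork",
     "dbpedia:Les_Particules_élémentaires"),
]


def follow(start_nodes, predicate, triples):
    """Objects reachable from any start node via the given predicate."""
    return [o for s, p, o in triples if p == predicate and s in start_nodes]


def suggest_reading(book, triples):
    """Book -> its authors -> related authors -> their notable works."""
    authors = follow([book], "dbpedia-owl:author", triples)
    related = follow(authors, "dbpedia:ontology/influences", triples)
    return follow(related, "dbpedia-owl:notableWork", triples)


for work in suggest_reading("dbpedia:Brave_New_World", TRIPLES):
    print(work)
```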
...and now please surprise me..... SERENDIPITY
(Figure: dbpedia:Aldous_Huxley (dbpedia-owl:author) has rdf:type Yago:EnglishExpatriatesInTheUnitedStates; the same rdf:type connects to dbpedia:Tim_Berners-Lee (dbpprop:inventor, dbpedia:World_Wide_Web) and to dbpedia:Patrick_Stewart (dbpedia-owl:starring, dbpedia:Star_Trek:_The_Next_Generation))
Exploratory Search
(Figure: dbpedia:Neil_Armstrong, dbpedia:Buzz_Collins, and dbpedia:Michael_Collins are connected via dbpedia-owl:mission to dbpedia:Apollo_11; dbpedia:Apollo_11 and dbpedia:Apollo_13 share dcterms:subject category:Apollo_program; dbpedia:Apollo_13 and dbpedia:Space_Shuttle_Challenger share rdf:type yago:Space_accidents_and_incidents)
Exploratory Search and Serendipity
•Find something that you were not looking for on purpose ...
dbpedia:Buzz_Collins
dbpedia:Cookie_Monster
dbpedia:Strictly_Come_Dancing
Exploratory Search with yovisto
Waitelonis, Sack: Augmenting Video Search with Linked Open Data, in Proc. I-Semantics, Graz 2009.
http://mediaglobe.yovisto.com:8080/
http://mediaglobe.yovisto.com:8080/mggui/#start
4. Semantic Web Applications
4.2 Linked Data Engineering
4.3 Semantic Search
Literature
• T. Heath, Ch. Bizer: Linked Data - Evolving the Web into a Global Data Space, Morgan & Claypool, 2011.
□ Blog: http://semweb2013.blogspot.com/
□ Website: http://www.hpi.uni-potsdam.de/studium/lehrangebot/itse/veranstaltung/semantic_web_technologien-3.html
□ bibsonomy bookmarks: http://www.bibsonomy.org/user/lysander07/swt1213_13