Transcript
Page 1: (13) Semantic Web Technologies - Linked Data & Semantic Search

Semantic Web Technologies

Lecture
Dr. Harald Sack

Hasso-Plattner-Institut für IT Systems Engineering
University of Potsdam

Winter Semester 2012/13

Lecture Blog: http://semweb2013.blogspot.com/
This file is licensed under the Creative Commons Attribution-NonCommercial 3.0 (CC BY-NC 3.0)

Tuesday, 22 January 2013

Page 2: (13) Semantic Web Technologies - Linked Data & Semantic Search

Semantic Web Technologies , Dr. Harald Sack, Hasso-Plattner-Institut, Universität Potsdam

Semantic Web Technologies: Content

1. Introduction
2. Semantic Web - Basic Architecture
   Languages of the Semantic Web - Part 1
3. Knowledge Representation and Logics
   Languages of the Semantic Web - Part 2
4. Applications in the 'Web of Data'

Page 3: (13) Semantic Web Technologies - Linked Data & Semantic Search

Ontological Engineering

Page 4: (13) Semantic Web Technologies - Linked Data & Semantic Search

Linked Data Applications & Semantic Search

Page 5: (13) Semantic Web Technologies - Linked Data & Semantic Search

Semantic Web Technologies: Content

4. Applications in the Web of Data
   4.1 Ontological Engineering
   4.2 Linked Data Engineering
   4.3 Semantic Search

Page 6: (13) Semantic Web Technologies - Linked Data & Semantic Search

The Tower of Babel (Turmbau zu Babel), Pieter Brueghel, 1563

How do we get Data from the Web...?

4.2 Linked Data Engineering
    4.2.1 APIs vs. Linked Data
    4.2.2 Linked Data Principles
    4.2.3 Linked Data @ Work

Page 7: (13) Semantic Web Technologies - Linked Data & Semantic Search

How to get Data from the Web?

• Data can only be found on the Web if it is available on some website

Diagram: Database → (JDBC) → Web-Server → (HTTP, HTML) → Browser

Page 8: (13) Semantic Web Technologies - Linked Data & Semantic Search

How to get Data from the Web?

• There are a number of different (proprietary) Web APIs and data exchange formats, with Mashups built on top of them

Diagram: Database 1 to Database 4, each exposed through its own Web API (WebAPI 1 to WebAPI 4), combined by a Mashup

Page 9: (13) Semantic Web Technologies - Linked Data & Semantic Search

In the Web today...

• Data is locked up in small data islands
• Other applications usually cannot access this data...

Diagram: nine separate databases, not connected to each other

Page 10: (13) Semantic Web Technologies - Linked Data & Semantic Search

Problems ahead...

http://www.w3.org/2009/Talks/0204-ted-tbl/#(22)

Page 11: (13) Semantic Web Technologies - Linked Data & Semantic Search

But there is a solution:

• ...open up proprietary data islands
• ...publish all data that is of public interest
• ...in a way that
  • other applications can access, utilize, and process this data, and
  • all applications can access additional (meta)data for the available data

Diagram: Database 1, Database 2, Database 3

Page 12: (13) Semantic Web Technologies - Linked Data & Semantic Search

But there is a solution:

• Apply semantic technologies:
  • to publish structured data on the web
  • to draw connections from one data source to data from other data sources

Diagram: Database 1 to Database 4 are each published as RDF Data, and the RDF datasets are connected by RDF Links

Page 13: (13) Semantic Web Technologies - Linked Data & Semantic Search

4.2 Linked Data Engineering
    4.2.1 APIs vs. Linked Data
    4.2.2 Linked Data Principles
    4.2.3 Linked Data @ Work

Page 14: (13) Semantic Web Technologies - Linked Data & Semantic Search

Linked Data and the 'Web of Data'

■ The term refers to an idea originally proposed by Tim Berners-Lee
  (Tim Berners-Lee, Linked Data, 2006, http://www.w3.org/DesignIssues/LinkedData.html)
  □ Set of best practices for the publication and linking of structured data on the web
  □ Basic assumption: the value of data on the web increases when it is connected to other data sources

The Web of data is about a data and naming model on the Web
(M. Hausenblas, Quick Linked Data Introduction, http://www.slideshare.net/mediasemanticweb/quick-linked-data-introduction)

Page 15: (13) Semantic Web Technologies - Linked Data & Semantic Search

Linked Data Principles

(1) Use URIs as names for things.
(2) Use HTTP URIs, so that people can look up those names.
(3) When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
(4) Include links to other URIs, so that they can discover more things.

Page 16: (13) Semantic Web Technologies - Linked Data & Semantic Search

Linked Data Principles

(1) Use URIs as names for things.

• URIs do not only identify documents but also arbitrary objects of the real world as well as abstract concepts

http://dbpedia.org/resource/Albert_Einstein

http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d

http://semweb2013.blogspot.com

Page 17: (13) Semantic Web Technologies - Linked Data & Semantic Search

Linked Data Principles

(2) Use HTTP URIs, so that people can look up those names.

• HTTP URIs (URLs) as globally unique names enable dereferencing of associated information on the Web
  • via HTTP Content Negotiation
  • 303 URIs: HTTP response code 303 'See Other' (redirect)
  • Hash URIs, e.g. http://example.com/Harald#me

Page 18: (13) Semantic Web Technologies - Linked Data & Semantic Search

Linked Data for Humans and Computers

■ A URI should deliver information for humans as well as for computers, i.e.

Diagram: the URI of a (Thing) is dereferenced with
  Accept: application/rdf+xml → (RDF data)
  Accept: text/html → (HTML page)

Page 19: (13) Semantic Web Technologies - Linked Data & Semantic Search

Linked Data for Humans and Computers

■ The server delivers different HTTP responses depending on the HTTP Accept header (Content Negotiation)

http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/
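Content negotiation with a 303 redirect can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two document URIs and the helper names parse_accept and negotiate are invented here, and a real server would also have to handle wildcards such as text/* and further header parameters.

```python
# Hypothetical representations of one "thing" URI; both document URIs are invented.
REPRESENTATIONS = {
    "application/rdf+xml": "http://example.com/data/Harald.rdf",
    "text/html": "http://example.com/page/Harald.html",
}


def parse_accept(header):
    """Return the media types of an Accept header, highest q-value first."""
    prefs = []
    for part in header.split(","):
        fields = part.strip().split(";")
        media_type = fields[0].strip().lower()
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.strip().partition("=")
            if name == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if media_type:
            prefs.append((q, media_type))
    # sorted() is stable, so equal q-values keep their order in the header
    return [m for q, m in sorted(prefs, key=lambda p: -p[0]) if q > 0]


def negotiate(accept_header):
    """Answer a lookup of the thing URI with (status code, Location)."""
    for media_type in parse_accept(accept_header):
        if media_type in REPRESENTATIONS:
            return 303, REPRESENTATIONS[media_type]  # 303 'See Other'
        if media_type == "*/*":
            return 303, REPRESENTATIONS["text/html"]
    return 406, None  # Not Acceptable


rdf_client = negotiate("application/rdf+xml")
browser = negotiate("text/html,application/xhtml+xml;q=0.9,*/*;q=0.8")
```

A Linked Data client that sends Accept: application/rdf+xml is redirected to the RDF document, while a browser is redirected to the HTML page describing the same thing.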

Page 21: (13) Semantic Web Technologies - Linked Data & Semantic Search

Linked Data Principles

(3) When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)

• RDF as universal data model for publishing structured data on the Web
• Make all URIs in the RDF graph dereferenceable
• Avoid RDF constructs that cause problems in a Linked Data context
  • RDF Reification
  • RDF Collections and Containers
  • unnamed Blank Nodes

Page 22: (13) Semantic Web Technologies - Linked Data & Semantic Search

Linked Data Principles

(4) Include links to other URIs, so that they can discover more things.

• Set RDF links between data from different data sources, to find information related by content
  • Relationship Links: links to external LOD entities related to the original entity
  • Identity Links: links to external LOD entities referring to the same object or concept
  • Vocabulary Links: links to the definitions of the vocabulary terms used to describe the original entity
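The three kinds of RDF links can be illustrated with plain Python tuples standing in for triples. This is a rough sketch, not a complete rule: classifying a link only by its predicate is a simplifying assumption, the helper link_kind is invented, and the URI http://example.org/people/harald is hypothetical.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def link_kind(triple):
    """Classify an outgoing RDF link by its predicate (simplified heuristic)."""
    _subject, predicate, _obj = triple
    if predicate == OWL_SAME_AS:
        return "identity"      # same object or concept in another dataset
    if predicate == RDF_TYPE:
        return "vocabulary"    # points to the definition of a class
    return "relationship"      # any other link to a related entity


me = "http://example.com/Harald#me"
triples = [
    (me, OWL_SAME_AS, "http://example.org/people/harald"),  # hypothetical URI
    (me, RDF_TYPE, "http://xmlns.com/foaf/0.1/Person"),
    (me, "http://xmlns.com/foaf/0.1/based_near", "http://dbpedia.org/resource/Potsdam"),
]
kinds = [link_kind(t) for t in triples]
```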

Page 23: (13) Semantic Web Technologies - Linked Data & Semantic Search

The application of the Linked Data Principles leads to a 'Web of Data'

Page 24: (13) Semantic Web Technologies - Linked Data & Semantic Search

Development of the 'Web of Data'

May 2007

Page 25: (13) Semantic Web Technologies - Linked Data & Semantic Search

Development of the 'Web of Data'

Nov 2007

Page 26: (13) Semantic Web Technologies - Linked Data & Semantic Search

Development of the 'Web of Data'

Page 27: (13) Semantic Web Technologies - Linked Data & Semantic Search

Development of the 'Web of Data'

July 2009

Page 28: (13) Semantic Web Technologies - Linked Data & Semantic Search

Development of the 'Web of Data'

September 2010

Page 29: (13) Semantic Web Technologies - Linked Data & Semantic Search

Development of the 'Web of Data'

September 2011

300 datasets, 31B RDF triples, 504M links

Page 30: (13) Semantic Web Technologies - Linked Data & Semantic Search

Semantic Mashups

□ Semantic Mashups are applications that use linked RDF data from various data sources
□ In contrast to interfaces and exchange formats or ordinary Web APIs, Linked Data offers the following benefits:
  □ a flexible and standardized data format (RDF)
  □ a standardized access mechanism (HTTP)
  □ the possibility to set links (RDF Links) among different data sources
    » enables navigation
    » is supported by search engines (crawlers)
    » enables expressive search facilities over the crawled data and beyond

S. Auer, J. Lehmann, Ch. Bizer: Semantische Mashups auf Basis vernetzter Daten, in T. Pellegrini, A. Blumauer (eds.): Social Semantic Web, Springer, 2009.

Page 31: (13) Semantic Web Technologies - Linked Data & Semantic Search

Linked Data Sources in the Web

□ Native publication
  □ D2R-Server, OpenLink Virtuoso, Pubby, etc.
□ Implementation of wrappers around existing applications / APIs
  □ SIOC Exporter for Wordpress, Drupal, phpBB, ...
  □ RDF Book Mashup (Amazon API, Google Base API, ...)
□ Linking Open Data Project
  □ Semantic Web Education and Outreach W3C working group
  □ Catalogue of all known sources of linked data with an open source license
    » DBpedia, Flickr, OpenCyc, FOAF, SIOC, GeoNames, ...

Page 32: (13) Semantic Web Technologies - Linked Data & Semantic Search

Browser for Linked Data

■ Differences to arbitrary RDF browsers
  □ The RDF data to be visualized does not necessarily reside in a local repository, but is distributed across the Web
  □ requires dynamic reloading of RDF resources
■ Tabulator (Tim Berners-Lee, MIT)
  (T. Berners-Lee et al.: Tabulator: Exploring and analyzing linked data on the semantic web, in Proc. 3rd Int. Semantic Web User Interaction Workshop, 2006, http://swui.semanticweb.org/swui06/papers/Berners-Lee/Berners-Lee.pdf)
■ OpenLink RDF Data Explorer
  □ enables visualization as graph, timeline, map, etc.
  http://ode.openlinksw.com/
■ Zitgist Browser
  http://browser.zitgist.com/
■ DISCO Browser
  http://sites.wiwiss.fu-berlin.de/suhl/bizer/ng4j/disco/

Page 33: (13) Semantic Web Technologies - Linked Data & Semantic Search

Search Engines for Linked Data

■ Crawler-based: follow links in datasets to create an index that can be queried
■ Swoogle
  □ keyword-based full-text search (Apache Lucene), uses only limited semantic annotation
  http://swoogle.umbc.edu/
■ Semantic Web Search Engine (SWSE)
  □ additionally uses rdf:type properties as search filter
  http://swse.deri.org/
■ Sindice
  http://www.sindice.com/
■ Falcons
  □ with data browser for result analysis
  http://iws.seu.edu.cn/services/falcons/
■ Sig.ma - Semantic Information Mashup (based on Sindice)
  http://sig.ma/

Page 34: (13) Semantic Web Technologies - Linked Data & Semantic Search

http://dbpedia.neofonie.com

Page 35: (13) Semantic Web Technologies - Linked Data & Semantic Search

Linked Open Data

■ Public Linked Data resources on the Web, licensed as "Creative Commons CC-BY"
■ 5-Star Criteria for Linked Open Data

★ Available on the web (whatever format) but with an open licence, to be Open Data
★★ Available as machine-readable structured data (e.g. Excel instead of an image scan of a table)
★★★ As (2), plus a non-proprietary format (e.g. CSV instead of Excel)
★★★★ All the above, plus: use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff
★★★★★ All the above, plus: link your data to other people's data to provide context

Page 36: (13) Semantic Web Technologies - Linked Data & Semantic Search

Linked Open Data

Page 37: (13) Semantic Web Technologies - Linked Data & Semantic Search

4.2 Linked Data Engineering
    4.2.1 APIs vs. Linked Data
    4.2.2 Linked Data Principles
    4.2.3 Linked Data @ Work

Page 38: (13) Semantic Web Technologies - Linked Data & Semantic Search

Linked Data

□ ordered by categories

Page 39: (13) Semantic Web Technologies - Linked Data & Semantic Search

Linked Data

Media

User Generated Content

Page 40: (13) Semantic Web Technologies - Linked Data & Semantic Search

Linked Data Publications

Page 41: (13) Semantic Web Technologies - Linked Data & Semantic Search

Government

Linked Data

Geographic

Page 42: (13) Semantic Web Technologies - Linked Data & Semantic Search

Life Sciences Linked Data Cross-Domain

Page 43: (13) Semantic Web Technologies - Linked Data & Semantic Search

Linking Open Data

■ Some statistics (as of 09/2011)

Chart: distribution of RDF triples by domain

Page 44: (13) Semantic Web Technologies - Linked Data & Semantic Search

Linking Open Data

■ Some statistics (as of 09/2011)

Chart: distribution of links by domain

Page 45: (13) Semantic Web Technologies - Linked Data & Semantic Search

Linked Data Ontologies

□ Ontologies hold the Linked Data Cloud together

Page 46: (13) Semantic Web Technologies - Linked Data & Semantic Search

Linked Data Ontologies

□ e.g. OWL
  □ owl:sameAs connects identical individuals
  □ owl:equivalentClass connects equivalent classes
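Because owl:sameAs is symmetric and transitive, a set of identity links partitions URIs into groups that all name the same individual. The following sketch computes these groups with a small union-find structure. The function name same_as_groups and the prefixed names in the example (ex1:, ex2:, ex3:) are invented for illustration; this is one possible way to consolidate identity links, not a method taken from the lecture.

```python
def same_as_groups(pairs):
    """Group URIs connected by owl:sameAs statements (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = {}
    for uri in parent:
        groups.setdefault(find(uri), set()).add(uri)
    return sorted(sorted(group) for group in groups.values())


# Each pair stands for one triple "a owl:sameAs b"; the ex* names are hypothetical.
links = [
    ("dbpedia:Neil_Armstrong", "ex1:armstrong"),
    ("ex1:armstrong", "ex2:N_Armstrong"),
    ("dbpedia:Potsdam", "ex3:potsdam"),
]
groups = same_as_groups(links)
```

Here the first two links end up in one group of three URIs, even though dbpedia:Neil_Armstrong and ex2:N_Armstrong are never linked directly.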

Page 47: (13) Semantic Web Technologies - Linked Data & Semantic Search

Linked Data Ontologies

□ e.g. UMBEL (version 1.0, Feb. 2011)
  □ "Upper Mapping and Binding Exchange Layer"
  □ Subset of OpenCyc as RDF triples based on SKOS and OWL 2
  □ Upper ontology with 28,000 concepts (skos:Concept)
  □ 46,000 mappings into DBpedia, GeoNames, and others (owl:equivalentClass, rdfs:subClassOf)
  □ Links to more than 2 million Wikipedia pages

Page 48: (13) Semantic Web Technologies - Linked Data & Semantic Search

Linked Data Ontologies

□ e.g. SKOS
  □ "Simple Knowledge Organization System"
  □ based on RDF and RDFS
  □ applied for definitions and mappings of vocabularies and ontologies
  □ skos:Concept (classes)
  □ skos:narrower
  □ skos:broader
  □ skos:related
  □ skos:exactMatch, skos:narrowMatch, skos:broadMatch, skos:relatedMatch

Page 49: (13) Semantic Web Technologies - Linked Data & Semantic Search

Linked Data (Research) Applications

□ WhoKnows
  http://apps.facebook.com/whoknows_/
□ RISQ!
  http://141.89.225.43/whoknowsmovies/game.html
  □ for data cleansing
  □ for relevance ranking of facts
  □ for entity summarization

Page 50: (13) Semantic Web Technologies - Linked Data & Semantic Search

4.2 Linked Data Engineering
    4.2.1 APIs vs. Linked Data
    4.2.2 Linked Data Principles
    4.2.3 Linked Data @ Work

Page 51: (13) Semantic Web Technologies - Linked Data & Semantic Search

Semantic Web Technologies: Content

4. Applications in the Web of Data
   4.1 Ontological Engineering
   4.2 Linked Data Engineering
   4.3 Semantic Search

Page 52: (13) Semantic Web Technologies - Linked Data & Semantic Search

Semantic Search

Albrecht Dürer: Melancholia I, 1514

Page 53: (13) Semantic Web Technologies - Linked Data & Semantic Search

4.3 Semantic Search
    4.3.1 Information Retrieval
    4.3.3 Semantic Analysis and Retrieval
    4.3.4 Exploratory Search

Page 54: (13) Semantic Web Technologies - Linked Data & Semantic Search

The 'Google Dilemma'

Page 55: (13) Semantic Web Technologies - Linked Data & Semantic Search

Page 56: (13) Semantic Web Technologies - Linked Data & Semantic Search

Classical Information Retrieval

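(after Salton, G., McGill, M.J.: Introduction to Modern Information Retrieval. McGraw-Hill, New York 1983)

Diagram: a set of documents (files of records) is mapped by indexing, and a set of queries (information requests) is mapped by query formulation, into a common indexing language, in which documents and queries are compared by similarity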

Page 57: (13) Semantic Web Technologies - Linked Data & Semantic Search

Classical Information Retrieval (simplified version)

Diagram: keywords are extracted from a set of documents into a search index; a search query with its search term(s), here "search", is matched against this index

Sample document shown on the slide: excerpt from a German dictionary entry (Bd. 20, Sp. 835) on the etymology and meaning of the verb "suchen" (to search)

Page 58: (13) Semantic Web Technologies - Linked Data & Semantic Search

Evaluation of Information Retrieval Systems

R: relevant documents
P: retrieved documents
R ∩ P: relevant documents that have been retrieved

Recall = |R ∩ P| / |R|

Precision = |R ∩ P| / |P|

F_α = ((1 + α) ⋅ (Recall ⋅ Precision)) / (α ⋅ (Recall + Precision))
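As a small worked example, the measures on this slide can be computed with Python sets. The document identifiers d1 to d5 and the function name evaluate are made up for illustration. The F formula is implemented exactly as it is written on the slide; for α = 1 it reduces to the usual harmonic mean of recall and precision, and α must not be 0.

```python
def evaluate(relevant, retrieved, alpha=1.0):
    """Return (recall, precision, F_alpha) for two sets of document ids."""
    hits = relevant & retrieved  # R ∩ P: relevant documents that were retrieved
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    if recall + precision == 0:
        return recall, precision, 0.0
    # F formula as written on the slide (alpha != 0)
    f = ((1 + alpha) * recall * precision) / (alpha * (recall + precision))
    return recall, precision, f


R = {"d1", "d2", "d3", "d4"}  # relevant documents (invented)
P = {"d3", "d4", "d5"}        # retrieved documents (invented)
recall, precision, f1 = evaluate(R, P)
# recall = 2/4, precision = 2/3, f1 = 4/7
```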

Page 59: (13) Semantic Web Technologies - Linked Data & Semantic Search

Search Engines in the Web

• The World Wide Web is a distributed hypermedia system with
  • multimedia documents
  • that are linked via hyperlinks

Page 60: (13) Semantic Web Technologies - Linked Data & Semantic Search

Web-Crawler (Web Robot)

Diagram (four numbered steps, 1 to 4) with the elements:
• URL list: http://www.xxxx.de/1234..., http://www.xxxx.de/2234..., http://www.xxxx.de/3234..., ...
• HTTP Request to the WWW server
• WWW server delivers requested HTML documents to the web crawler
• HTML documents containing links (<a href="..." .../>)
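The crawl loop can be sketched as follows: take a URL from the list, request the document, extract its links, and append unseen URLs to the list. To keep the sketch runnable without network access, the hypothetical fetch function reads from an in-memory dictionary FAKE_WEB instead of sending real HTTP requests. All page contents and names are invented, and a real crawler would also respect robots.txt, rate limits, and error handling.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href values of all <a> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Stand-in for the Web: URL -> HTML document (all content invented).
FAKE_WEB = {
    "http://example.org/": '<a href="/a.html">A</a> <a href="b.html">B</a>',
    "http://example.org/a.html": '<a href="/">home</a> <a href="/c.html">C</a>',
    "http://example.org/b.html": "<p>no links here</p>",
    "http://example.org/c.html": '<a href="http://example.org/a.html">A again</a>',
}


def fetch(url):
    """Hypothetical replacement for an HTTP request; returns None if unknown."""
    return FAKE_WEB.get(url)


def crawl(seed, limit=100):
    url_list = deque([seed])  # URLs still to be visited
    seen = {seed}             # URLs already queued, to avoid loops
    fetched = []
    while url_list and len(fetched) < limit:
        url = url_list.popleft()
        html = fetch(url)
        if html is None:
            continue
        fetched.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                url_list.append(absolute)
    return fetched


visited = crawl("http://example.org/")
```

The seen set is what keeps the crawler from looping forever on pages that link back to each other.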

Page 61: (13) Semantic Web Technologies - Linked Data & Semantic Search

Search Engines in the WWW: Preprocessing and Indexing

Stages shown in the diagram: Web Crawler, Data Normalization, Document Preprocessing, Data Analysis and creation of index data structures

Document Preprocessing:
• Tokenization
• Language Identification
• Word Stemming
• POS-Tagging
• Descriptor Generation

Page 62: (13) Semantic Web Technologies - Linked Data & Semantic Search

Search Engines in the WWW: Efficient Index Data Structures

Index (alphabetical list of terms): Aachen, Altavista, Ananas, ..., Zustand, Zypern

Inverted File, entry for the term "Ananas":

DocID  Pos          Frequency  Weight
D123   1;13;77;132  4          9.4
D456   22;38        2          6.7
...    ...          ...        ...
D998   15           1          1.2

Location List for D123:

Frequency  URL  <H1>  ...  <H6>  <title>  ...  text
4          1    1     ...  0     1        ...  1

File D123: http://producers.ananas.org/index.htm

<html><head><title>Ananas around the World</title></head><body> … </body></html>
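A minimal inverted file with positions and term frequencies might be built as sketched below. The two sample documents are invented, the tokenizer is deliberately naive (lowercasing and splitting on non-word characters, without stemming), and the Weight column of the slide is left out because the transcript does not say how it is computed.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Naive tokenization: lowercase words, no stemming or stop-word removal."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(documents):
    """Map each term to {doc_id: [positions]}; positions start at 1."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for position, term in enumerate(tokenize(text), start=1):
            index[term].setdefault(doc_id, []).append(position)
    return index


def lookup(index, term):
    """Return (doc_id, positions, frequency) rows for a term, sorted by doc_id."""
    postings = index.get(term.lower(), {})
    return sorted((doc_id, positions, len(positions))
                  for doc_id, positions in postings.items())


documents = {  # invented sample texts
    "D123": "Ananas around the world: ananas producers and ananas recipes",
    "D456": "Fresh ananas from Zypern",
}
index = build_inverted_index(documents)
result = lookup(index, "Ananas")
# result == [("D123", [1, 5, 8], 3), ("D456", [2], 1)]
```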

Page 63: (13) Semantic Web Technologies - Linked Data & Semantic Search

Search Engines in the WWW: Relevance Ranking

• Link Popularity (Google PageRank)

Diagram: four linked pages A, B, C, D

Start: PR(A) = 1.0, PR(B) = 1.0, PR(C) = 1.0, PR(D) = 1.0

Iteration of the PageRank computation:

Nr.  PR(A)  PR(B)   PR(C)   PR(D)
1    1.0    1.0     1.0     1.0
2    1.0    0.575   2.275   0.15
3    2.083  0.575   1.1912  0.15
...  ...    ...     ...     ...
n    1.49   0.7833  1.577   0.15

Resulting PageRank: PR(A) = 1.49, PR(B) = 0.78, PR(C) = 1.57, PR(D) = 0.15
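The iteration table can be reproduced with a few lines of Python. Two things here are assumptions rather than content of the transcript: the link structure (A links to B and C, B to C, C to A, D to C) is not recoverable from the extracted slide and was chosen because it reproduces the numbers shown, and the update rule PR(p) = (1 - d) + d ⋅ Σ PR(q)/outdegree(q) with damping factor d = 0.85 is the commonly quoted non-normalized PageRank formula.

```python
# Assumed link structure (page -> pages it links to); not given in the transcript.
LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}


def pagerank_step(ranks, links, d=0.85):
    """One synchronous update of all PageRank values."""
    new_ranks = {}
    for page in links:
        incoming = sum(ranks[src] / len(targets)
                       for src, targets in links.items() if page in targets)
        new_ranks[page] = (1 - d) + d * incoming
    return new_ranks


def pagerank(links, iterations):
    ranks = {page: 1.0 for page in links}  # start: every page has rank 1.0
    for _ in range(iterations):
        ranks = pagerank_step(ranks, links)
    return ranks


row2 = pagerank(LINKS, 1)     # corresponds to row 2 of the table
row3 = pagerank(LINKS, 2)     # corresponds to row 3 of the table
final = pagerank(LINKS, 100)  # close to the converged values in row n
```

Page D has no incoming links, so its rank drops to 1 - d = 0.15 after the first update and stays there.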

Page 64: (13) Semantic Web Technologies - Linked Data & Semantic Search

The Web is big. Really big. You just won't believe how vastly, hugely, mind-bogglingly big it is.
(...according to Douglas Adams)

Page 65: (13) Semantic Web Technologies - Linked Data & Semantic Search

Language has its fallacies...

Page 66: (13) Semantic Web Technologies - Linked Data & Semantic Search

in particular, if we don't know the language

Page 67: (13) Semantic Web Technologies - Linked Data & Semantic Search

4.3 Semantic Search
    4.3.1 Information Retrieval
    4.3.3 Semantic Analysis and Retrieval
    4.3.4 Exploratory Search

Page 68: (13) Semantic Web Technologies - Linked Data & Semantic Search

Semantic Search: Definition (first try)

• Annotation of (text-based) metadata with semantic entities
• Entity-based Information Retrieval
• Make use of semantic relations, e.g. content-based similarities of relationships
• Interoperable metadata via semantic annotations
  • for content-based description
  • for structural / technical description (Multimedia Ontologies)

Overall Goal: Quantitative and qualitative improvement of Information Retrieval

Page 69: (13) Semantic Web Technologies - Linked Data & Semantic Search

Semantic Metadata: Multimedia Ontologies

• Localize a region → Draw a bounding box
• Annotate the content → Interpret the content → Tag 'Astronaut'
• MPEG-7 has been re-engineered to become an OWL-DL ontology (2007: Arndt et al., COMM model)

Page 70: (13) Semantic Web Technologies - Linked Data & Semantic Search

Semantic Metadata: Multimedia Ontologies

Example: Tagging with an MPEG-7 Ontology

Diagram (RDF graph for the image "Man on the Moon"):
• Nodes: mpeg7:image, Man on the Moon, Reg1, mpeg7:StillRegion, dbpedia:Astronaut, mpeg7:SpatialMask, mpeg7:Coords
• Edges: mpeg7:depicts, mpeg7:spatial_decomposition, rdf:type, mpeg7:polygon

Page 71: (13) Semantic Web Technologies - Linked Data & Semantic Search

Named Entity Recognition

"locating and classifying atomic elements ... into predefined categories such as names, persons, organizations, locations, expressions of time, quantities, monetary values, etc."
C.J. Rijsbergen, Information Retrieval (1979)

Diagram:
• Entities: Neil Armstrong
• Classes: Astronaut, Person, Science Occupation, Employment
• Edges: Neil Armstrong "is a" Astronaut, Neil Armstrong "is a" Person, and two subClassOf edges between classes

Page 72: (13) Semantic Web Technologies - Linked Data & Semantic Search

Named Entity Recognition

Diagram: entity Neil Armstrong ("is a" Astronaut, "is a" Person) and the classes Astronaut, Person, Science Occupation, Employment, connected by subClassOf edges

Page 73: (13) Semantic Web Technologies - Linked Data & Semantic Search

Named Entity Recognition

Text: "Armstrong was the first man on the Moon."

Entity Mapping: the text is mapped to entities of the graph

Diagram: entity Neil Armstrong ("is a" Astronaut, "is a" Person) and the classes Astronaut, Person, Science Occupation, Employment, connected by subClassOf edges

Page 74: (13) Semantic Web Technologies - Linked Data & Semantic Search

Named Entity Recognition

Diagram: entity Neil Armstrong ("is a" Astronaut, "is a" Person) and the classes Astronaut, Person, Science Occupation, Employment, connected by subClassOf edges. The text is mapped to the entity with the properties:
• rdfs:label Neil Armstrong
• rdf:type dbpedia-owl:Astronaut
• rdf:type foaf:Person

Page 75: (13) Semantic Web Technologies - Linked Data & Semantic Search

Named Entity Recognition

How do I find the right entity?

Text: "Armstrong was the first man on the Moon."

Entity Mapping: text → http://dbpedia.org/resource/Neil_Armstrong

Page 76: (13) Semantic Web Technologies - Linked Data & Semantic Search

Named Entity Recognition

How do I find the right entity?

Text: "Armstrong was the first man on the Moon."

In natural language text
• nouns correspond to semantic concepts / entities
• verbs correspond to semantic relations

Identify nouns in natural language text:
• determination of language
• Part-of-Speech Tagger
• Word Stemming
• e.g. with http://gate.ac.uk/

Page 77: (13) Semantic Web Technologies - Linked Data & Semantic Search

Named Entity Recognition

How do I find the right entity?

Text: "Armstrong was the first man on the Moon."

Determine possible Entity Mapping Candidates:
• Armstrong, Florida
• Armstrong, Ontario
• Armstrong County, Texas
• Armstrong Tunnel
• Louis Armstrong
• Armstrong Tools
• Armstrong (moon crater)
• Armstrong (car)
• The Armstrongs
• Craig Armstrong
• Anton Armstrong
• Edward Armstrong
• Gary Armstrong
• George Armstrong
• The Armstrong Twins
• Ian Armstrong
• + 400 more...

Page 78: (13) Semantic Web Technologies - Linked Data & Semantic Search

Context Dimensions for Audiovisual Media

Diagram: a Text surrounded by its Temporal Context, Spatial Context, Provenance Context, Structural Context, and User Context

Context provides information for
• Disambiguation
• Reliability
• Trustworthiness

Page 79: (13) Semantic Web Technologies - Linked Data & Semantic Search

Named Entity Recognition

How do I find the right entity?

Text: "Armstrong was the first man on the Moon."

• Determine Named Entities (nouns) from text: Armstrong, man, moon
• Create all possible Sets of Mapping Candidates
• We have to examine the Context to understand the semantics

Dienstag, 22. Januar 13

Page 80: (13) Semantic Web Technologies - Linked Data & Semantic Search

4242244242

Vorlesung Semantic Web, Dr. Harald Sack, Hasso-Plattner-Institut, Universität Potsdam

80

Named Entity Recognition

Text: „Armstrong was the first man on the Moon.“

Create all possible sets of Mapping Candidates

Armstrong:
• George Armstrong Custer
• Neil Armstrong
• The Armstrong Twins
• Armstrong, Florida
• Armstrong, Ontario
• Armstrong Automobile
• Joe Armstrong
• Armstrong County, Texas
• Armstrong Gun
• Craig Armstrong
• Armstrong (moon crater)
• Louis Armstrong
• Armstrong Tunnel
• Louis Armstrong International Airport
• Armstrong's Theorem
• Sir Thomas Armstrong
• Ian Armstrong

Man:
• Human
• Bill Man
• Bob Man
• David Man
• Homer Man
• Louise Man
• Half Man
• Man ärgere Dich nicht
• Man Computer
• Peter van Man
• Daniel Man
• Man (album)

Moon:
• The Moon (Opera)
• Moon
• Moon Nickel Company
• Brunner Moon
• Bernard Moon
• Peter Moon
• Julian Moon
• Ludwig Moon
• Violet Moon
• Moon Technologies
• Robert Moon
• Henry Moon
• Alfred Moon
• Chava Moon
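Creating "all possible sets of mapping candidates" amounts to a Cartesian product over the per-term candidate lists. The following sketch assumes a shortened, hypothetical `candidates` table with a few entries per term; the number of combinations is the product of the list lengths, which is why the full lists on the slide already yield thousands of combinations.

```python
from itertools import product

# Shortened candidate lists per term (a hypothetical subset of the slide's lists).
candidates = {
    "Armstrong": ["Neil Armstrong", "Louis Armstrong", "Armstrong, Florida"],
    "man": ["Human", "Man (album)"],
    "moon": ["Moon", "Moon Technologies"],
}

def candidate_combinations(cands):
    """Yield one mapping per way of choosing a single candidate for every term."""
    terms = list(cands)
    for combo in product(*(cands[t] for t in terms)):
        yield dict(zip(terms, combo))

combos = list(candidate_combinations(candidates))
```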

How do I find the right entity?

Page 81: (13) Semantic Web Technologies - Linked Data & Semantic Search

Named Entity Recognition

Armstrong man moon

(1) Co-occurrence Analysis
(2) Semantic Analysis
(3) Machine Learning

Armstrong, Florida ? Man (album) ? Moon Technologies

‣ For all possible combinations do:
‣ Determine the probability of the co-occurrence of the term combination in an arbitrary text document corpus, e.g. Wikipedia
‣ Select the entity combination with the maximum probability of co-occurrence
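The co-occurrence procedure can be sketched as below. All data here is assumed for illustration: `CORPUS` is a hypothetical toy corpus in which each document has already been reduced to the set of entities it mentions, and the probability is estimated simply as the fraction of documents mentioning every entity of a combination. A real system would estimate this over a large corpus such as Wikipedia.

```python
from itertools import product

# Hypothetical toy corpus: each document is the set of entities it mentions.
CORPUS = [
    {"Neil Armstrong", "Human", "Moon"},
    {"Neil Armstrong", "Moon"},
    {"Louis Armstrong", "Human"},
    {"Armstrong, Florida"},
    {"Moon", "Human"},
]

# Shortened candidate lists per term (hypothetical subset of the slide's lists).
CANDIDATES = {
    "Armstrong": ["Neil Armstrong", "Louis Armstrong", "Armstrong, Florida"],
    "man": ["Human", "Man (album)"],
    "moon": ["Moon", "Moon Technologies"],
}

def cooccurrence_probability(entities, corpus):
    """Fraction of documents that mention every entity of the combination."""
    hits = sum(1 for doc in corpus if set(entities) <= doc)
    return hits / len(corpus)

def best_combination(candidates, corpus):
    """Score every combination and return (score, combination) with the top score."""
    scored = [
        (cooccurrence_probability(combo, corpus), combo)
        for combo in product(*candidates.values())
    ]
    return max(scored, key=lambda pair: pair[0])

score, combo = best_combination(CANDIDATES, CORPUS)
```

In this toy corpus only one combination co-occurs at all, so it is selected; combinations such as "Armstrong, Florida / Man (album) / Moon Technologies" score zero.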

Page 82: (13) Semantic Web Technologies - Linked Data & Semantic Search

Named Entity Recognition

Armstrong man moon

(1) Co-occurrence Analysis
(2) Semantic Analysis
(3) Machine Learning

Armstrong:
• George Armstrong Custer
• Neil Armstrong
• Armstrong, Florida
• Armstrong, Ontario
• Armstrong Gun
• Craig Armstrong
• Armstrong (moon crater)
• Louis Armstrong
• Sir Thomas Armstrong

Man:
• Human
• Bob Man
• David Man
• Homer Man
• Louise Man
• Half Man
• Dead Man Walking
• Man Machine
• Man (album)

Moon:
• The Moon (Opera)
• moon (planet)
• Moon Nickel Company
• Brunner Moon
• Bernard Moon
• Peter Moon
• Julian Moon
• Ludwig Moon
• Henry Moon
• Alfred Moon
• Chava Moon

Page 83: (13) Semantic Web Technologies - Linked Data & Semantic Search

Turmbau zu Babel, Pieter Brueghel, 1563

How can semantic data be used in retrieval?

Page 84: (13) Semantic Web Technologies - Linked Data & Semantic Search

Semantic metadata enable an improvement of traditional keyword-based retrieval by

(1) Query String Refinement
    enables more precise or more complete search results
(2) Cross Referencing
    complements search results with additional associated or similar information
(3) Exploratory Search
    enables visualization and navigation of the search space
(4) Reasoning
    complements search results with implicitly given information
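As a rough illustration of point (1), the sketch below compares a plain keyword query with a query refined through entity metadata. Everything in it is assumed for the example: the three `DOCUMENTS`, the label table `ENTITY_LABELS`, and the substring-based `keyword_search`, which stands in for a real retrieval engine.

```python
# Hypothetical document collection.
DOCUMENTS = {
    1: "Neil Alden Armstrong commanded Apollo 11.",
    2: "Louis Armstrong played the trumpet.",
    3: "Neil Armstrong walked on the Moon.",
}

# Hypothetical semantic metadata: known labels per entity.
ENTITY_LABELS = {
    "dbpedia:Neil_Armstrong": ["Neil Armstrong", "Neil Alden Armstrong"],
}

def keyword_search(terms, documents):
    """Return ids of documents containing at least one term (case-insensitive)."""
    return sorted(
        doc_id for doc_id, text in documents.items()
        if any(term.lower() in text.lower() for term in terms)
    )

def refined_search(entity, labels, documents):
    """Replace the ambiguous keyword by all labels of the intended entity."""
    return keyword_search(labels[entity], documents)

plain = keyword_search(["Armstrong"], DOCUMENTS)
refined = refined_search("dbpedia:Neil_Armstrong", ENTITY_LABELS, DOCUMENTS)
```

The plain query also returns the document about Louis Armstrong, while the refined query is more precise and, compared with searching for "Neil Armstrong" alone, more complete.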

Page 85: (13) Semantic Web Technologies - Linked Data & Semantic Search

4.3 Semantic Search
4.3.1 Information Retrieval
4.3.3 Semantic Analysis and Retrieval
4.3.4 Exploratory Search

Page 86: (13) Semantic Web Technologies - Linked Data & Semantic Search

Searching is not always just searching

Page 87: (13) Semantic Web Technologies - Linked Data & Semantic Search

I'm looking for the book „Brave New World“ by Aldous Huxley in the first German edition...

Brave New World. - Aldous H U X L E Y.

- The Albatros Continental Library, 47

(Hamburg usw., Albatros Verlag, 1933)

257 S. 8“

II 1, 2506, 34548

Page 88: (13) Semantic Web Technologies - Linked Data & Semantic Search

I really liked „Brave New World“ by Aldous Huxley, but how do I find what to read next...?

Page 89: (13) Semantic Web Technologies - Linked Data & Semantic Search

Exploratory Search
• What if the user does not know which query string to use?
• What if the user is looking for complex answers?
• What if the user does not know the domain he/she is looking for?
• What if the user wants to know all(!) about a specific topic?

• ... 'Browsing' instead of 'Searching'
• ... to find something by chance, i.e. Serendipity
• ... to get an overview
• ... to enable content-based navigation

Page 90: (13) Semantic Web Technologies - Linked Data & Semantic Search

Gather knowledge about dbpedia:Brave_New_World and decide which interesting fact to follow...

http://dbpedia.org/page/Brave_New_World

Enable Exploratory Search based on Linked Open Data

Page 91: (13) Semantic Web Technologies - Linked Data & Semantic Search

dbpedia:Brave_New_World  dbpedia-owl:author  dbpedia:Aldous_Huxley

[Diagram: three further edges labeled dbpedia-owl:author]

Page 92: (13) Semantic Web Technologies - Linked Data & Semantic Search

dbpedia:Brave_New_World  dbpedia-owl:author  dbpedia:Aldous_Huxley

dbpedia:Aldous_Huxley  dbpedia:ontology/influences  dbpedia:H._G._Wells
dbpedia:Aldous_Huxley  dbpedia:ontology/influences  dbpedia:George_Orwell
dbpedia:Aldous_Huxley  dbpedia:ontology/influences  dbpedia:Michel_Houellebecq

Page 93: (13) Semantic Web Technologies - Linked Data & Semantic Search

dbpedia:H._G._Wells  dbpedia-owl:notableWork  dbpedia:The_Time_Machine
dbpedia:George_Orwell  dbpedia-owl:notableWork  dbpedia:Nineteen_Eighty-Four
dbpedia:Michel_Houellebecq  dbpedia-owl:notableWork  dbpedia:Les_Particules_élémentaires
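The walk over the last three slides (book, then author, then related authors, then their notable works) can be sketched as a path traversal over triples. The in-memory `TRIPLES` list below is a hypothetical stand-in for querying DBpedia, and the direction of the influences edges (from Aldous Huxley to the other authors) is an assumption about how the diagram is drawn.

```python
AUTHOR = "dbpedia-owl:author"
INFLUENCES = "dbpedia:ontology/influences"
NOTABLE_WORK = "dbpedia-owl:notableWork"

# Triples as shown on the slides; the edge direction for INFLUENCES is assumed.
TRIPLES = [
    ("dbpedia:Brave_New_World", AUTHOR, "dbpedia:Aldous_Huxley"),
    ("dbpedia:Aldous_Huxley", INFLUENCES, "dbpedia:H._G._Wells"),
    ("dbpedia:Aldous_Huxley", INFLUENCES, "dbpedia:George_Orwell"),
    ("dbpedia:Aldous_Huxley", INFLUENCES, "dbpedia:Michel_Houellebecq"),
    ("dbpedia:H._G._Wells", NOTABLE_WORK, "dbpedia:The_Time_Machine"),
    ("dbpedia:George_Orwell", NOTABLE_WORK, "dbpedia:Nineteen_Eighty-Four"),
    ("dbpedia:Michel_Houellebecq", NOTABLE_WORK, "dbpedia:Les_Particules_élémentaires"),
]

def objects(subject, predicate, triples):
    """Return all objects o of triples (subject, predicate, o)."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def follow_path(start, predicates, triples):
    """Follow a chain of predicates from start and return the nodes reached."""
    frontier = [start]
    for predicate in predicates:
        frontier = [o for node in frontier for o in objects(node, predicate, triples)]
    return frontier

suggestions = follow_path(
    "dbpedia:Brave_New_World", [AUTHOR, INFLUENCES, NOTABLE_WORK], TRIPLES
)
```

Each element of `suggestions` is a candidate answer to "what to read next".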

Page 94: (13) Semantic Web Technologies - Linked Data & Semantic Search

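...and now please surprise me... SERENDIPITY

dbpedia:Brave_New_World  dbpedia-owl:author  dbpedia:Aldous_Huxley

dbpedia:Aldous_Huxley  rdf:type  Yago:EnglishExpatriatesInTheUnitedStates
dbpedia:Tim_Berners-Lee  rdf:type  Yago:EnglishExpatriatesInTheUnitedStates
dbpedia:Patrick_Stewart  rdf:type  Yago:EnglishExpatriatesInTheUnitedStates

dbpedia:World_Wide_Web  dbpprop:inventor  dbpedia:Tim_Berners-Lee
dbpedia:Star_Trek:_The_Next_Generation  dbpedia-owl:starring  dbpedia:Patrick_Stewart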
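One way to read this slide is that surprising suggestions arise from entities sharing an rdf:type with the current one. The sketch below implements that reading over a hypothetical in-memory `TRIPLES` list built from the slide; it is an illustration of the idea, not the lecture's actual method.

```python
RDF_TYPE = "rdf:type"
EXPATS = "Yago:EnglishExpatriatesInTheUnitedStates"

# Triples reconstructed from the slide (assumed stand-in for DBpedia).
TRIPLES = [
    ("dbpedia:Aldous_Huxley", RDF_TYPE, EXPATS),
    ("dbpedia:Tim_Berners-Lee", RDF_TYPE, EXPATS),
    ("dbpedia:Patrick_Stewart", RDF_TYPE, EXPATS),
    ("dbpedia:World_Wide_Web", "dbpprop:inventor", "dbpedia:Tim_Berners-Lee"),
    ("dbpedia:Star_Trek:_The_Next_Generation", "dbpedia-owl:starring",
     "dbpedia:Patrick_Stewart"),
]

def shared_type_neighbours(entity, triples):
    """Return other entities that share at least one rdf:type with entity."""
    types = {o for s, p, o in triples if s == entity and p == RDF_TYPE}
    return sorted(
        {s for s, p, o in triples if p == RDF_TYPE and o in types and s != entity}
    )

def incoming_facts(entity, triples):
    """Return (subject, predicate) pairs of triples pointing at entity."""
    return [(s, p) for s, p, o in triples if o == entity]

surprises = {
    neighbour: incoming_facts(neighbour, TRIPLES)
    for neighbour in shared_type_neighbours("dbpedia:Aldous_Huxley", TRIPLES)
}
```

Starting from a novelist, the traversal ends at the World Wide Web and a television series, which is the kind of chance discovery the slide calls serendipity.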

Page 95: (13) Semantic Web Technologies - Linked Data & Semantic Search

Exploratory Search

dbpedia:Neil_Armstrong  dbpedia-owl:mission  dbpedia:Apollo_11
dbpedia:Buzz_Aldrin  dbpedia-owl:mission  dbpedia:Apollo_11
dbpedia:Michael_Collins  dbpedia-owl:mission  dbpedia:Apollo_11

dbpedia:Apollo_11  dcterms:subject  category:Apollo_program
dbpedia:Apollo_13  dcterms:subject  category:Apollo_program

dbpedia:Apollo_13  rdf:type  yago:Space_accidents_and_incidents
dbpedia:Space_Shuttle_Challenger  rdf:type  yago:Space_accidents_and_incidents


Page 96: (13) Semantic Web Technologies - Linked Data & Semantic Search

Exploratory Search and Serendipity

• Find something that you were not looking for on purpose...

dbpedia:Buzz_Aldrin

dbpedia:Cookie_Monster

dbpedia:Strictly_Come_Dancing

Page 97: (13) Semantic Web Technologies - Linked Data & Semantic Search

Exploratory Search with yovisto

Waitelonis, Sack: Augmenting Video Search with Linked Open Data, in Proc. I-Semantics, Graz 2009.

http://mediaglobe.yovisto.com:8080/

Page 98: (13) Semantic Web Technologies - Linked Data & Semantic Search

http://mediaglobe.yovisto.com:8080/mggui/#start

Page 99: (13) Semantic Web Technologies - Linked Data & Semantic Search

4.3 Semantic Search
4.3.1 Information Retrieval
4.3.3 Semantic Analysis and Retrieval
4.3.4 Exploratory Search

Page 100: (13) Semantic Web Technologies - Linked Data & Semantic Search

4. Applications in the Web of Data
4.1. Ontological Engineering
4.2. Linked Data Engineering
4.3. Semantic Search

Semantic Web Technologies Content

Page 101: (13) Semantic Web Technologies - Linked Data & Semantic Search

4. Semantic Web Applications
4.2 Linked Data Engineering
4.3 Semantic Search

Literature

• T. Heath, Ch. Bizer: Linked Data - Evolving the Web into a Global Data Space, Morgan & Claypool, 2011.
