Towards an Empirical Semantic Web Science: Knowledge Pattern Extraction and Usage

Andrea Nuzzolese, Ph.D. Student

Università di Bologna STLab, ISTC-CNR

Outline

• Empirical Semantic Web Science and Knowledge Patterns (KPs)

• A possible methodology for making KPs emerge from the Web of Data

• The work done so far in KP extraction

• Evaluating KPs' efficacy through Exploratory Search

Does a Web science exist?

• A science is usually applied to clear research objects

✦ Physical and biological science analyzes the natural world and tries to find microscopic laws that, extrapolated to the macroscopic realm, would generate the observed behavior

• The Web is an engineered space created through formally specified languages and protocols

• Web pages with their content and links are created by humans with a particular task governed by social conventions and laws

• A Web science exists [Berners-Lee et al., 2006] and is oriented to:

✦ Growth of the engineered space

✦ Human-web interaction patterns

What about a Web of Data science?

• Linked Data offers a huge amount of data for empirical research

What are the research objects of the empirical SW science?

• The Semantic Web and Linked Data give us the chance to study empirically which patterns are used in organizing and representing knowledge

• The research objects of the Semantic Web as an empirical science are Knowledge Patterns (KPs)

Knowledge Patterns

• KPs are small, well-connected units of meaning, which are

✦ task-based

✦ well-grounded

✦ cognitively sound

• KPs find their theoretical grounding in frames

✦ “… a frame is a data-structure for representing a stereotyped situation.” [Minsky 1975]

✦ “...the availability of global patterns of knowledge cuts down on non-determinacy enough to offset idiosyncratic bottom-up input that might otherwise be confusing.” [Beaugrande 1980]

An example of KP

Empirical Semantic Web and KPs

• KPs emerge from the knowledge soup deriving from the Web

• A methodology for KP extraction from the Web

KP extraction

• The Web is populated by heterogeneous sources

• We can classify sources into two categories

✦ Formal and semi-formal sources modeled by adopting a top-down approach

✴ e.g., foundational ontologies, frames, thesauri, etc.

✦ Non-formal sources modeled by adopting a bottom-up approach

✴ e.g., RDBs, Linked Data, Web pages, XML documents, etc.

• Our KP extraction methodology is based on two complementary approaches

✦ A top-down approach

✦ A bottom-up approach

KP boundary

KP detection and discovery

• The top-down approach aims to extract KPs that already exist in a formal or semi-formal structure

✦ Possible techniques: reengineering, refactoring based on association rules, key concept identification, ontology mapping, etc.

• The bottom-up approach aims to discover or detect KPs from data

✦ Possible techniques: inductive techniques, machine learning, data mining, ontology mining, etc.

KP validation

• The top-down and the bottom-up approaches concur in the validation of KPs

• KP extraction is a matter of understanding how the world or specific domains have been described from different perspectives

✦ The perspective of domain experts, ontologists, etc., who try to formalize either the world or specific domains

✦ The perspective of users, data entry staff, etc., who actually populate and manage data that report facts about the world

• For example, it would be cognitively relevant if an occurrence of a KP emerged from both the top-down and the bottom-up approach

KP extraction methodology

KP reengineering from FrameNet’s frames

• FrameNet is a cognitively sound lexical knowledge base, which is grounded in a large corpus

• FrameNet consists of a set of frames, which have frame elements, lexical units (which pair words, i.e. lexemes, with frames), and relations to corpus elements

✦ Each frame can be interpreted as a class of situations

An example of frame

Using Semion for reengineering and refactoring FrameNet’s frames

[Figure labels: Structural Schema, Linguistic Schema, Linguistic Data, Corpus Data, Referential Data]

FrameNet as LOD

FrameNet as KPs

KP discovery from Wikipedia links

• Hypothesis

✦ the types of linked resources that occur most often for a certain type of resource constitute its KP

✦ since we expect that any cognitive invariance in explaining/describing things is reflected in the wikilink graph, discovered KPs are cognitively sound

• Contribution

✦ an EKP (Encyclopedic Knowledge Pattern) discovery procedure

✦ 184 EKPs published in OWL2

Collecting paths from wikilinks

[Figure: example wikilink graph. Resources: db:Mickey_Mouse, db:Minnie_Mouse, db:The_Walt_Disney_Company. Classes: dbpo:FictionalCharacter, dbpo:Person, dbpedia:Company, dbpedia:Organisation, owl:Thing. Edge labels: dbpo:wikiPageWikiLink, rdf:type, rdfs:subClassOf. Two edges labeled "Path" are drawn between classes, involving dbpo:FictionalCharacter.]
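As a rough illustration of how paths could be collected from wikilinks, the sketch below lifts each link between two resources to a pair (subject type, object type). The link directions, the single type per resource, and the helper name `collect_paths` are assumptions made for this example; the slides do not specify them, and the class hierarchy (rdfs:subClassOf) is ignored here.

```python
from collections import defaultdict

# Hypothetical toy data loosely mirroring the figure; link directions are assumed.
WIKILINKS = [
    ("db:Mickey_Mouse", "db:Minnie_Mouse"),
    ("db:Mickey_Mouse", "db:The_Walt_Disney_Company"),
]

# Simplification: one asserted type per resource (no subclass reasoning).
TYPES = {
    "db:Mickey_Mouse": "dbpo:FictionalCharacter",
    "db:Minnie_Mouse": "dbpo:FictionalCharacter",
    "db:The_Walt_Disney_Company": "dbpedia:Company",
}


def collect_paths(wikilinks, types):
    """Map each (subject type, object type) path to the subject resources that use it."""
    paths = defaultdict(set)
    for subject, obj in wikilinks:
        # Untyped resources fall back to owl:Thing.
        subject_type = types.get(subject, "owl:Thing")
        object_type = types.get(obj, "owl:Thing")
        paths[(subject_type, object_type)].add(subject)
    return dict(paths)


paths = collect_paths(WIKILINKS, TYPES)
for path, subjects in sorted(paths.items()):
    print(path, sorted(subjects))
```

Keeping the set of subject resources per path, rather than a plain count, makes it straightforward to count distinct subjects later.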

Path popularity

[Figure: example resources Dave_Grohl, Foo Fighters, John_Lennon, Michael_Jackson, Paul_McCartney, Jackson_5, Beatles, Jackie_Jackson, Nirvana, Madonna, Prince, Charlie_Parker, Keith_Jarrett]

pathPopularity(P_{i,j}, S_i) = nSubjectRes(P_{i,j}) / nRes(S_i)
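The ratio nSubjectRes(P_{i,j}) / nRes(S_i) can be sketched in a few lines. The reading assumed here is that nSubjectRes counts the distinct resources of type S_i that appear as subjects of the path, and nRes counts all resources of type S_i; the function name and the toy resource identifiers are hypothetical.

```python
def path_popularity(subjects_on_path, resources_of_type):
    """Fraction of the resources of a subject type that take part in a path as subjects."""
    all_resources = set(resources_of_type)
    if not all_resources:
        raise ValueError("the subject type has no resources")
    # Only count subjects that actually belong to the subject type.
    participating = set(subjects_on_path) & all_resources
    return len(participating) / len(all_resources)


# Hypothetical example: 4 resources of the subject type, 3 of which use the path.
resources_of_type = {"res:a", "res:b", "res:c", "res:d"}
subjects_on_path = {"res:a", "res:b", "res:c"}
popularity = path_popularity(subjects_on_path, resources_of_type)
print(popularity)
```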

Boundaries of KPs

• A KP(S_i) is a set of paths, such that

P_{i,j} ∈ KP(S_i) ⇔ pathPopularity(P_{i,j}, S_i) ≥ t

• t is a threshold, below which a path is not included in a KP

• How to get a good value for t?
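The membership rule above amounts to a simple filter over path popularity values. The following is a minimal sketch under assumed inputs: the popularity values and the threshold `t=0.25` are made up for illustration and are not results from the work presented here.

```python
def kp_boundary(popularity_by_path, t):
    """Return the set of paths whose popularity is at least the threshold t."""
    return {path for path, popularity in popularity_by_path.items() if popularity >= t}


# Hypothetical popularity values for paths of one subject type.
popularity_by_path = {
    ("dbpo:FictionalCharacter", "dbpo:FictionalCharacter"): 0.5,
    ("dbpo:FictionalCharacter", "dbpedia:Company"): 0.25,
    ("dbpo:FictionalCharacter", "owl:Thing"): 0.03125,
}

kp = kp_boundary(popularity_by_path, t=0.25)
print(sorted(kp))
```

A lower t yields larger KPs that include rarer paths; a higher t keeps only the most typical ones.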

Boundary induction

Step  Description
1     For each path, calculate the path popularity
2     For each subject type, get the 40 top-ranked path popularity values*
3     Apply multiple correlation (Pearson ρ) between the paths of all subject types by rank, and check for homogeneity of ranks across subject types
4     For each of the 40 path popularity ranks, calculate its mean across all subject types
5     Apply k-means clustering on the 40 ranks
6     Decide threshold(s) based on k-means as well as other indicators (e.g. FrameNet roles distribution)
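Steps 2, 4 and 5 can be sketched in plain Python as follows. This is only an illustration under several assumptions: the k-means is a simple one-dimensional variant with deterministic initialization, the clustering is applied to the mean popularity value of each rank, the input values are invented, and taking the lowest value of the top cluster as a candidate threshold is one possible reading of step 6, not the procedure actually used. Step 3 (correlation) and the other indicators of step 6 are omitted.

```python
def top_ranked(popularities, top_n=40):
    """Step 2: the top_n path popularity values of one subject type, highest first."""
    return sorted(popularities, reverse=True)[:top_n]


def mean_by_rank(ranked_lists):
    """Step 4: mean popularity at each rank across all subject types."""
    n_ranks = min(len(ranked) for ranked in ranked_lists)
    return [
        sum(ranked[k] for ranked in ranked_lists) / len(ranked_lists)
        for k in range(n_ranks)
    ]


def kmeans_1d(values, k=2, iterations=100):
    """Step 5: one-dimensional k-means with centroids initialized evenly over the range."""
    if k < 2:
        raise ValueError("k must be at least 2")
    low, high = min(values), max(values)
    centroids = [low + (high - low) * i / (k - 1) for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for value in values:
            nearest = min(range(k), key=lambda c: abs(value - centroids[c]))
            clusters[nearest].append(value)
        # Empty clusters keep their previous centroid.
        updated = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


# Hypothetical popularity values for two subject types (4 ranks instead of 40).
by_type = {
    "type_a": [0.25, 1.0, 0.125, 0.75],
    "type_b": [0.5, 0.75, 0.25, 0.125],
}
ranked = [top_ranked(values, top_n=4) for values in by_type.values()]
means = mean_by_rank(ranked)
centroids, clusters = kmeans_1d(means, k=2)

# Assumed reading of step 6: lowest mean popularity in the highest cluster.
top_cluster = clusters[centroids.index(max(centroids))]
candidate_threshold = min(top_cluster)
print(means, candidate_threshold)
```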

Boundary induction

How can KPs be evaluated and used?

• KPs should be evaluated on how cognitively sound they are in capturing and representing knowledge

• One scenario for evaluating the efficacy of KPs is exploratory search combined with user studies

Why exploratory search?

• Exploratory search is characterized “by uncertainty about the space being searched and the nature of the problem that motivates the search” [White et al., 2005]

• KPs can be used to support exploratory search

✦ They can be used to filter knowledge by drawing a meaningful boundary around the retrieved data

✦ They allow exploratory paths to be suggested based on cognitive criteria of relevance

• We can investigate how KPs help users in exploratory search tasks
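To make the filtering idea concrete, the sketch below applies a KP as a lens over the resources linked from an entity: a linked resource is kept only if its type forms a path inside the KP of the entity's type. The function name, the KP contents and the linked resources (including `db:1928`) are hypothetical and do not describe how any existing tool is implemented.

```python
def apply_kp_lens(subject_type, linked_resources, kp_paths):
    """Keep only linked resources whose type lies inside the KP boundary."""
    return [
        resource
        for resource, object_type in linked_resources
        if (subject_type, object_type) in kp_paths
    ]


# Hypothetical KP for fictional characters, as a set of (subject type, object type) paths.
kp_paths = {
    ("dbpo:FictionalCharacter", "dbpo:FictionalCharacter"),
    ("dbpo:FictionalCharacter", "dbpedia:Company"),
}

# Hypothetical resources linked from one fictional character, with their types.
linked = [
    ("db:Minnie_Mouse", "dbpo:FictionalCharacter"),
    ("db:The_Walt_Disney_Company", "dbpedia:Company"),
    ("db:1928", "owl:Thing"),
]

visible = apply_kp_lens("dbpo:FictionalCharacter", linked, kp_paths)
print(visible)
```

The resources that pass the lens are candidates for suggested exploratory paths; the rest can still be offered as secondary, less typical links.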

Aemoo: KP-based exploratory search

• A Web application that supports exploratory search on the Web based on KPs extracted from Wikipedia links

• It aggregates knowledge from Linked Data, Wikipedia, Twitter and Google News by applying KPs as knowledge lenses over data

• It provides an effective summary of knowledge about an entity, including explanations

Exploring knowledge with Aemoo (1)

Exploring knowledge with Aemoo (2)

Conclusions

• We want to contribute to the realization of the Semantic Web as an empirical science by providing a methodology for KP extraction

• Our methodology for extracting KPs is based on two approaches

✦ a top-down approach

✦ a bottom-up approach

• We have presented our experience in KP extraction so far

✦ KPs from FrameNet’s frames

✦ KPs from Wikipedia links

• The evaluation we have in mind should be performed by means of exploratory search tasks

✦ Aemoo

Thanks
