Towards an Empirical Semantic Web Science: Knowledge Pattern Extraction and Usage Andrea Nuzzolese Ph.D. Student Università di Bologna STLab, ISTC-CNR

Upload: andrea-nuzzolese

Post on 10-May-2015



Towards an Empirical Semantic Web Science: Knowledge Pattern Extraction and Usage

Andrea Nuzzolese, Ph.D. Student

Università di Bologna STLab, ISTC-CNR


Outline

• Empirical Semantic Web Science and Knowledge Patterns (KPs)

• A possible methodology for making KPs emerge from the Web of Data

• The work done so far in KP extraction

• Evaluating KPs' efficacy through Exploratory Search


Does a Web science exist?

• A science is usually applied to clearly defined research objects

✦ Physical and biological science analyzes the natural world, and tries to find microscopic laws that, extrapolated to the macroscopic realm, would generate the behavior observed

• The Web is an engineered space created through formally specified languages and protocols

• Web pages, with their content and links, are created by humans with particular tasks in mind and are governed by social conventions and laws

• A Web science exists [Berners-Lee et al., 2006] and is oriented to:

✦ Growth of the engineered space

✦ Human-Web interaction patterns


What about a Web of Data science?

• Linked Data offers a huge amount of data for empirical research


What are the research objects of the empirical SW science?

• The Semantic Web and Linked Data give us the chance to study empirically which patterns are used in organizing and representing knowledge

• The research objects of the Semantic Web as an empirical science are Knowledge Patterns (KPs)


Knowledge Patterns

• KPs are small, well-connected units of meaning, which are

✦ task-based

✦ well-grounded

✦ cognitively sound

• KPs find their theoretical grounding in frames

✦ “… a frame is a data-structure for representing a stereotyped situation.” [Minsky 1975]

✦ “...the availability of global patterns of knowledge cuts down on non-determinacy enough to offset idiosyncratic bottom-up input that might otherwise be confusing.” [Beaugrande 1980]


An example of KP


Empirical Semantic Web and KPs

• KPs emerge from the knowledge soup deriving from the Web

• A methodology for KP extraction from the Web


KP extraction

• The Web is populated by heterogeneous sources

• We can classify sources into two categories

✦ Formal and semi-formal sources, modeled by adopting a top-down approach

✴ e.g., foundational ontologies, frames, thesauri, etc.

✦ Non-formal sources, modeled by adopting a bottom-up approach

✴ e.g., RDBs, Linked Data, Web pages, XML documents, etc.

• Our KP extraction methodology is based on two complementary approaches

✦ A top-down approach

✦ A bottom-up approach


KP boundary


KP detection and discovery

• The top-down approach aims to extract KPs that already exist in a formal or semi-formal structure

✦ Possible techniques: reengineering, refactoring based on association rules, key concept identification, ontology mapping, etc.

• The bottom-up approach aims to discover or detect KPs from data

✦ Possible techniques: inductive techniques, machine learning, data mining, ontology mining, etc.


KP validation

• The top-down and the bottom-up approaches concur in the validation of KPs

• KP extraction is a matter of understanding how the world or specific domains have been described from different perspectives

✦ The perspective of domain experts, ontologists, etc., who try to formalize either the world or specific domains

✦ The perspective of users, data entry staff, etc., who actually populate and manage the data that report facts about the world

• For example, it would be cognitively relevant if an occurrence of a KP emerged from both the top-down and the bottom-up approach


KP extraction methodology


KP reengineering from FrameNet’s frames

• FrameNet is a cognitively sound lexical knowledge base, which is grounded in a large corpus

• FrameNet consists of a set of frames, which have frame elements; lexical units, which pair words (lexemes) to frames; and relations to corpus elements

✦ Each frame can be interpreted as a class of situations
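
A minimal sketch of how a frame could be held as a data structure, assuming only what the slide states: a frame has frame elements, and lexical units pair lexemes to frames. The class names, the `evokes` helper and the sample frame content are illustrative assumptions, not FrameNet's actual schema or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LexicalUnit:
    """Pairs a word (lexeme) with the frame it evokes."""
    lexeme: str
    frame_name: str


@dataclass
class Frame:
    """A frame read as a class of situations (hypothetical, simplified model)."""
    name: str
    frame_elements: List[str] = field(default_factory=list)
    lexical_units: List[LexicalUnit] = field(default_factory=list)

    def evokes(self, lexeme: str) -> bool:
        """Return True if one of the frame's lexical units uses this lexeme."""
        return any(lu.lexeme == lexeme for lu in self.lexical_units)


# Illustrative content only; not copied from the FrameNet release.
frame = Frame(
    name="Commerce_buy",
    frame_elements=["Buyer", "Goods", "Seller"],
    lexical_units=[
        LexicalUnit("buy.v", "Commerce_buy"),
        LexicalUnit("purchase.v", "Commerce_buy"),
    ],
)
```

Read this way, each `Frame` instance is a candidate KP: its frame elements are the roles that bound the situation it describes.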


An example of frame


Using Semion for reengineering and refactoring FrameNet’s frames

[Figure labels: Structural Schema, Linguistic Schema, Linguistic Data, Corpus Data, Referential Data]

FrameNet as LOD


FrameNet as KPs


KP discovery from Wikipedia links

• Hypothesis

✦ the types of linked resources that occur most often for a certain type of resource constitute its KP

✦ since we expect that any cognitive invariance in explaining/describing things is reflected in the wikilink graph, discovered KPs are cognitively sound

• Contribution

✦ an EKP (Encyclopedic Knowledge Pattern) discovery procedure

✦ 184 EKPs published in OWL2


Collecting paths from wikilinks

[Figure: paths collected from wikilinks. Resources: db:Mickey_Mouse, db:Minnie_Mouse, db:The_Walt_Disney_Company. Types: dbpo:FictionalCharacter, dbpo:Person, dbpedia:Company, dbpedia:Organisation, owl:Thing. Relations: dbpo:wikiPageWikiLink, rdf:type, rdfs:subClassOf]
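
A rough sketch of path collection, assuming a path is the pair (type of the linking resource, type of the linked resource) derived from a wikilink. The resource and type names come from the slide; the function name, the single-type simplification and the exact link list are assumptions made for illustration.

```python
from collections import defaultdict

# Toy data: one asserted type per resource (a simplification; real resources
# also inherit supertypes through rdfs:subClassOf).
TYPES = {
    "db:Mickey_Mouse": "dbpo:FictionalCharacter",
    "db:Minnie_Mouse": "dbpo:FictionalCharacter",
    "db:The_Walt_Disney_Company": "dbpedia:Company",
}

# Toy dbpo:wikiPageWikiLink statements as (subject, object) pairs.
WIKILINKS = [
    ("db:Mickey_Mouse", "db:Minnie_Mouse"),
    ("db:Mickey_Mouse", "db:The_Walt_Disney_Company"),
]


def collect_paths(wikilinks, types):
    """Map each (subject type, object type) path to the subjects that use it."""
    paths = defaultdict(set)
    for subject, obj in wikilinks:
        # Skip links whose endpoints have no known type.
        if subject in types and obj in types:
            paths[(types[subject], types[obj])].add(subject)
    return dict(paths)


paths = collect_paths(WIKILINKS, TYPES)
```

Keeping the set of subject resources per path, rather than a plain count, makes it possible to count distinct subjects later.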


Path popularity

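[Figure: example resources used to illustrate path popularity: Dave_Grohl, Foo Fighters, Nirvana, John_Lennon, Paul_McCartney, Beatles, Michael_Jackson, Jackie_Jackson, Jackson_5, Madonna, Prince, Charlie_Parker, Keith_Jarrett]

$\mathrm{pathPopularity}(P_{i,j}, S_i) = \dfrac{\mathrm{nSubjectRes}(P_{i,j})}{\mathrm{nRes}(S_i)}$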
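
A small worked sketch of the ratio, assuming that nSubjectRes counts the distinct subject resources that take part in a path and that nRes counts all resources of the subject type. The slide does not spell out these definitions, and the link structure below is invented rather than read from the figure.

```python
def path_popularity(subjects_on_path, resources_of_subject_type):
    """Return nSubjectRes(P) / nRes(S) under the assumptions stated above."""
    n_subject_res = len(set(subjects_on_path))
    n_res = len(set(resources_of_subject_type))
    if n_res == 0:
        raise ValueError("the subject type has no resources")
    return n_subject_res / n_res


# Hypothetical example: four resources of one subject type, three of which
# link to at least one resource of the object type.
all_subjects = {"Dave_Grohl", "John_Lennon", "Paul_McCartney", "Madonna"}
subjects_with_link = {"Dave_Grohl", "John_Lennon", "Paul_McCartney"}

popularity = path_popularity(subjects_with_link, all_subjects)  # 3 / 4
```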


Boundaries of KPs

• A $KP(S_i)$ is a set of paths such that

$P_{i,j} \in KP(S_i) \iff \mathrm{pathPopularity}(P_{i,j}, S_i) \geq t$

• $t$ is a threshold below which a path is not included in a KP

• How to get a good value for t?
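
The boundary condition above can be sketched as a simple filter. The popularity values and the threshold value used here are invented for illustration; the slides do not give concrete numbers.

```python
def knowledge_pattern(popularities, t):
    """Keep the paths whose path popularity is at least the threshold t."""
    return {path for path, value in popularities.items() if value >= t}


# Hypothetical path popularity values for one subject type.
popularities = {
    ("dbpo:FictionalCharacter", "dbpo:FictionalCharacter"): 0.61,
    ("dbpo:FictionalCharacter", "dbpedia:Company"): 0.34,
    ("dbpo:FictionalCharacter", "dbpo:Place"): 0.05,
}

kp = knowledge_pattern(popularities, t=0.2)  # t chosen arbitrarily here
```

With these toy numbers the third path falls outside the boundary, which shows how sensitive the resulting KP is to the choice of t.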


Boundary induction

Step  Description
1  For each path, calculate the path popularity
2  For each subject type, get the 40 top-ranked path popularity values*
3  Apply multiple correlation (Pearson ρ) between the paths of all subject types by rank, and check for homogeneity of ranks across subject types
4  For each of the 40 path popularity ranks, calculate its mean across all subject types
5  Apply k-means clustering on the 40 ranks
6  Decide threshold(s) based on k-means as well as other indicators (e.g. FrameNet roles distribution)
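
A hedged sketch of steps 2, 4 and 5, using a tiny pure-Python 1D k-means with a deterministic start. The toy values, the rank depth of 6 instead of 40, the initialisation rule and the reading of the threshold as the lowest mean in the top cluster are all assumptions. The correlation check of step 3 and the additional indicators of step 6 are left out.

```python
def top_ranked(values, n):
    """Step 2: the n highest path popularity values, in rank order."""
    return sorted(values, reverse=True)[:n]


def mean_by_rank(ranked_lists):
    """Step 4: mean popularity at each rank across all subject types."""
    depth = min(len(ranked) for ranked in ranked_lists)
    count = len(ranked_lists)
    return [sum(r[k] for r in ranked_lists) / count for k in range(depth)]


def assign(values, centroids):
    """Label each value with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: abs(v - centroids[c]))
            for v in values]


def kmeans_1d(values, k, iterations=100):
    """Step 5: Lloyd's algorithm on scalars with evenly spaced initial picks."""
    ordered = sorted(values)
    centroids = [ordered[(2 * c + 1) * len(ordered) // (2 * k)] for c in range(k)]
    for _ in range(iterations):
        labels = assign(values, centroids)
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(values, centroids)


# Hypothetical popularity values for two subject types.
per_type = {
    "TypeA": [0.60, 0.50, 0.45, 0.08, 0.05, 0.02, 0.01],
    "TypeB": [0.70, 0.55, 0.40, 0.10, 0.04, 0.03],
}
means = mean_by_rank([top_ranked(v, n=6) for v in per_type.values()])
centroids, labels = kmeans_1d(means, k=2)

# One possible reading of step 6: the lowest mean in the top cluster.
top = centroids.index(max(centroids))
threshold = min(v for v, lab in zip(means, labels) if lab == top)
```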


Boundary induction


How can KPs be evaluated and used?

• KPs should be evaluated in terms of how cognitively sound they are in capturing and representing knowledge

• One scenario for evaluating the efficacy of KPs is exploratory search combined with user studies


Why exploratory search?

• Exploratory search is characterized “by uncertainty about the space being searched and the nature of the problem that motivates the search” [White et al., 2005]

• KPs can be used for supporting exploratory search

✦ They can be used to filter knowledge by drawing a meaningful boundary around the retrieved data

✦ They make it possible to suggest exploratory paths based on cognitive criteria of relevance

• We can investigate how KPs help users in exploratory search tasks
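
The filtering role of a KP can be sketched as a lens that splits the resources linked to an entity into those inside and those outside the KP boundary. The function name, the KP content and the `db:Some_City` resource are hypothetical and serve only to illustrate the idea.

```python
def apply_lens(kp_object_types, linked_resources):
    """Split (resource, type) pairs by whether the type belongs to the KP."""
    inside = [(r, t) for r, t in linked_resources if t in kp_object_types]
    outside = [(r, t) for r, t in linked_resources if t not in kp_object_types]
    return inside, outside


# Hypothetical KP for the subject type dbpo:FictionalCharacter.
kp_object_types = {"dbpo:FictionalCharacter", "dbpedia:Company"}

# Hypothetical resources linked from db:Mickey_Mouse.
linked = [
    ("db:Minnie_Mouse", "dbpo:FictionalCharacter"),
    ("db:The_Walt_Disney_Company", "dbpedia:Company"),
    ("db:Some_City", "dbpo:Place"),
]

inside, outside = apply_lens(kp_object_types, linked)
```

Resources in `inside` would be shown as the core summary, while those in `outside` could still be offered as optional exploratory paths.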


Aemoo: KP-based exploratory search

• A Web application that supports exploratory search on the Web based on KPs extracted from Wikipedia links

• It aggregates knowledge from Linked Data, Wikipedia, Twitter and Google News by applying KPs as knowledge lenses over data

• It provides an effective summary of knowledge about an entity, including explanations


Exploring knowledge with Aemoo (1)


Exploring knowledge with Aemoo (2)


Conclusions

• We want to contribute to the realization of the Semantic Web as an empirical science by providing a methodology for KP extraction

• Our methodology for extracting KPs is based on two approaches

✦ a top-down approach

✦ a bottom-up approach

• We have presented our experience in KP extraction so far

✦ KPs from FrameNet’s frames

✦ KPs from Wikipedia links

• The evaluation we have in mind should be performed by means of exploratory search tasks

✦ Aemoo


Thanks
