Towards an Empirical Semantic Web Science: Knowledge Pattern Extraction and Usage
Andrea Nuzzolese, Ph.D. Student
Università di Bologna; STLab, ISTC-CNR
Outline
• Empirical Semantic Web Science and Knowledge Patterns (KPs)
• A possible methodology for making KPs emerge from the Web of Data
• The work done so far in KP extraction
• Evaluating KPs' efficacy through Exploratory Search
Does a Web science exist?
• A science is usually applied to clear research objects
✦ Physical and biological sciences analyze the natural world and try to find microscopic laws that, extrapolated to the macroscopic realm, would generate the observed behavior
• The Web is an engineered space created through formally specified languages and protocols
• Web pages with their content and links are created by humans with a particular task governed by social conventions and laws
• A Web science exists [Berners-Lee et al., 2006] and is oriented to:
✦ Growth of the engineered space
✦ Human-Web interaction patterns
What about a Web of Data science?
• Linked Data offers a huge amount of data for empirical research
What are the research objects of the empirical SW science?
• The Semantic Web and Linked Data give us the chance to study empirically which patterns are used in organizing and representing knowledge
• The research objects of the Semantic Web as an empirical science are Knowledge Patterns (KPs)
Knowledge Patterns
• KPs are small, well-connected units of meaning, which are
✦ task-based
✦ well-grounded
✦ cognitively sound
• KPs find their theoretical grounding in frames
✦ “… a frame is a data-structure for representing a stereotyped situation.” [Minsky 1975]
✦ “...the availability of global patterns of knowledge cuts down on non-determinacy enough to offset idiosyncratic bottom-up input that might otherwise be confusing.” [Beaugrande 1980]
An example of KP
Empirical Semantic Web and KPs
• KPs emerge from the knowledge soup deriving from the Web
• A methodology for KP extraction from the Web
KP extraction
• The Web is populated by heterogeneous sources
• We can classify sources into two categories
✦ Formal and semi-formal sources modeled by adopting a top-down approach
✴ e.g., foundational ontologies, frames, thesauri, etc.
✦ Non-formal sources modeled by adopting a bottom-up approach
✴ e.g., RDBs, Linked Data, Web pages, XML documents, etc.
• Our KP extraction methodology is based on two complementary approaches
✦ A top-down approach
✦ A bottom-up approach
KP boundary
KP detection and discovery
• The top-down approach aims to extract KPs that already exist in a formal or semi-formal structure
✦ Possible techniques: reengineering, refactoring based on association rules, key concept identification, ontology mapping, etc.
• The bottom-up approach aims to discover or detect KPs from data
✦ Possible techniques: inductive techniques, machine learning, data mining, ontology mining, etc.
KP validation
• The top-down and the bottom-up approaches concur in the validation of KPs
• KP extraction is a matter of understanding how the world or specific domains have been described from different perspectives
✦ The perspective of domain experts, ontologists, etc., who try to formalize either the world or specific domains
✦ The perspective of users, data entry personnel, etc., who actually populate and manage data that report facts about the world
• For example, it would be cognitively relevant if an occurrence of a KP emerged from both the top-down and the bottom-up approach
KP extraction methodology
KP reengineering from FrameNet’s frames
• FrameNet is a cognitively sound lexical knowledge base, which is grounded in a large corpus
• FrameNet consists of a set of frames, which have frame elements; lexical units, which pair words (lexemes) with frames; and relations to corpus elements
✦ Each frame can be interpreted as a class of situations
An example of frame
Using Semion for reengineering and refactoring FrameNet’s frames
[Figure labels: Structural Schema, Linguistic Schema, Linguistic Data, Corpus Data, Referential Data]
FrameNet as LOD
FrameNet as KPs
KP discovery from Wikipedia links
• Hypothesis
✦ the types of linked resources that occur most often for a certain type of resource constitute its KP
✦ since we expect that any cognitive invariance in explaining/describing things is reflected in the wikilink graph, discovered KPs are cognitively sound
• Contribution
✦ an EKP discovery procedure
✦ 184 EKPs published in OWL2
Collecting paths from wikilinks
[Figure: wikilink graph example. Resources: db:Mickey_Mouse, db:Minnie_Mouse, db:The_Walt_Disney_Company. Types: dbpo:FictionalCharacter, dbpo:Person, dbpedia:Company, dbpedia:Organisation, owl:Thing. Edge labels: dbpo:wikiPageWikiLink, rdf:type, rdfs:subClassOf. Two edges are marked "Path".]
Path popularity
[Figure: example resources: Dave_Grohl, Foo Fighters, John_Lennon, Michael_Jackson, Paul_McCartney, Jackson_5, Beatles, Jackie_Jackson, Nirvana, Madonna, Prince, Charlie_Parker, Keith_Jarrett]
pathPopularity(P_{i,j}, S_i) = nSubjectRes(P_{i,j}) / nRes(S_i)
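The ratio above can be sketched in code. This is a minimal illustration, not the authors' implementation: it assumes that nRes(S_i) counts the resources of type S_i and that nSubjectRes(P_i,j) counts the distinct resources of type S_i that link to at least one resource of type O_j. The toy data, the single type per resource, and the function name `path_popularity` are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: one type per resource, wikilinks as (subject, object) pairs.
TYPES = {
    "Mickey_Mouse": "FictionalCharacter",
    "Minnie_Mouse": "FictionalCharacter",
    "Donald_Duck": "FictionalCharacter",
    "The_Walt_Disney_Company": "Company",
}
WIKILINKS = [
    ("Mickey_Mouse", "Minnie_Mouse"),
    ("Mickey_Mouse", "The_Walt_Disney_Company"),
    ("Minnie_Mouse", "Mickey_Mouse"),
    ("Donald_Duck", "Mickey_Mouse"),
    ("Donald_Duck", "The_Walt_Disney_Company"),
]


def path_popularity(types, wikilinks):
    """Return {(subject_type, object_type): popularity} for every observed path."""
    # nRes(S_i): number of resources having type S_i (assumed reading).
    n_res = defaultdict(int)
    for resource_type in types.values():
        n_res[resource_type] += 1

    # nSubjectRes(P_i,j): distinct subjects that take part in the path (assumed reading).
    subjects = defaultdict(set)
    for subject, obj in wikilinks:
        if subject in types and obj in types:
            subjects[(types[subject], types[obj])].add(subject)

    return {path: len(subs) / n_res[path[0]] for path, subs in subjects.items()}


popularity = path_popularity(TYPES, WIKILINKS)
for path, value in sorted(popularity.items()):
    print(path, round(value, 2))
```

Counting distinct subjects rather than individual links keeps one heavily linked page from inflating the popularity of a path.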
Boundaries of KPs
• A KP(S_i) is a set of paths such that
P_{i,j} ∈ KP(S_i) ⇔ pathPopularity(P_{i,j}, S_i) ≥ t
• t is a threshold below which a path is not included in a KP
• How to get a good value for t?
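As a rough sketch of the membership rule above, the boundary of a KP can be drawn by keeping only the paths whose popularity reaches t. The popularity values, the value t = 0.2, and the function name `kp_boundary` are hypothetical and serve only as illustration.

```python
# Hypothetical path popularity values, keyed by (subject_type, object_type).
POPULARITY = {
    ("MusicalArtist", "Band"): 0.62,
    ("MusicalArtist", "Album"): 0.48,
    ("MusicalArtist", "City"): 0.09,
    ("Band", "MusicalArtist"): 0.71,
}


def kp_boundary(popularity, subject_type, t):
    """Return the paths of subject_type with popularity >= t, most popular first."""
    selected = [
        (path, value)
        for path, value in popularity.items()
        if path[0] == subject_type and value >= t
    ]
    return sorted(selected, key=lambda item: item[1], reverse=True)


# t = 0.2 is an arbitrary illustrative threshold, not a value from the slides.
kp = kp_boundary(POPULARITY, "MusicalArtist", t=0.2)
for path, value in kp:
    print(path, value)
```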
Boundary induction
Step  Description
1     For each path, calculate the path popularity
2     For each subject type, get the 40 top-ranked path popularity values*
3     Apply multiple correlation (Pearson ρ) between the paths of all subject types by rank, and check for homogeneity of ranks across subject types
4     For each of the 40 path popularity ranks, calculate its mean across all subject types
5     Apply k-means clustering on the 40 ranks
6     Decide threshold(s) based on k-means as well as other indicators (e.g. FrameNet roles distribution)
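Steps 2, 4, 5 and 6 of the table can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: the popularity values are made up, only 5 ranks are used instead of 40, step 3 (the correlation check) is omitted, k-means is a plain one-dimensional Lloyd's algorithm with k ≥ 2, and step 6 simply proposes the lowest value of each cluster except the bottom one as a candidate threshold. All function names are hypothetical.

```python
import statistics

TOP_N = 5  # the slides use the 40 top-ranked values; 5 keeps the toy example small

# Hypothetical path popularity values per subject type (step 1 is assumed done).
POPULARITY = {
    "MusicalArtist": [0.05, 0.9, 0.3, 0.8, 0.1, 0.01],
    "Company": [0.7, 0.3, 0.6, 0.05, 0.1, 0.02],
}


def top_ranked(popularity, top_n):
    """Step 2: keep the top_n popularity values of each subject type, best first."""
    return {s: sorted(v, reverse=True)[:top_n] for s, v in popularity.items()}


def mean_by_rank(ranked, top_n):
    """Step 4: mean popularity at each rank across all subject types."""
    return [statistics.mean(v[r] for v in ranked.values()) for r in range(top_n)]


def assign(points, centroids):
    """Put every point in the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda c: abs(p - centroids[c]))
        clusters[nearest].append(p)
    return clusters


def kmeans_1d(points, k, iterations=100):
    """Step 5: one-dimensional k-means (k >= 2) with deterministic initial centroids."""
    ordered = sorted(points)
    # Spread the initial centroids evenly over the sorted values.
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    clusters = assign(points, centroids)
    for _ in range(iterations):
        updated = [statistics.mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
        clusters = assign(points, centroids)
    return clusters


def candidate_thresholds(clusters):
    """Step 6 (assumption): lowest value of every cluster except the bottom one."""
    non_empty = sorted((c for c in clusters if c), key=statistics.mean, reverse=True)
    return [min(c) for c in non_empty[:-1]]


means = mean_by_rank(top_ranked(POPULARITY, TOP_N), TOP_N)
clusters = kmeans_1d(means, k=2)
thresholds = candidate_thresholds(clusters)
print(thresholds)
```

The candidates are only a starting point; the table itself says the final decision also weighs other indicators, such as the FrameNet roles distribution.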
Boundary induction
How can KPs be evaluated and used?
• The evaluation of KPs should be performed in terms of their capability to be cognitively sound in capturing and representing knowledge
• A scenario that can be used for evaluating the efficacy of KPs is exploratory search combined with user studies
Why exploratory search?
• Exploratory search is characterized “by uncertainty about the space being searched and the nature of the problem that motivates the search” [White et al., 2005]
• KPs can be used for supporting exploratory search
✦ They can be used to filter knowledge by drawing a meaningful boundary around the retrieved data
✦ They make it possible to suggest exploratory paths based on cognitive criteria of relevance
• We can investigate how KPs help users in exploratory search tasks
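The filtering role of a KP described above can be illustrated with a small sketch. It assumes a KP is available as the set of object types inside its boundary; the entity, its linked resources, the types, and the function name `apply_kp_lens` are hypothetical examples, not the behavior of any specific tool.

```python
# Hypothetical KP for the subject type MusicalArtist: object types inside its boundary.
KP_MUSICAL_ARTIST = {"Band", "MusicalArtist", "Album"}

# Hypothetical resources retrieved for the entity Paul_McCartney, as (resource, type) pairs.
LINKED = [
    ("Beatles", "Band"),
    ("John_Lennon", "MusicalArtist"),
    ("Liverpool", "City"),
    ("1942", "Year"),
]


def apply_kp_lens(linked, kp_types):
    """Keep only the resources whose type lies inside the KP, grouped by type."""
    lens = {}
    for resource, resource_type in linked:
        if resource_type in kp_types:
            lens.setdefault(resource_type, []).append(resource)
    return lens


lens = apply_kp_lens(LINKED, KP_MUSICAL_ARTIST)
print(lens)
```

Each surviving type can then be offered to the user as one exploratory path starting from the entity.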
Aemoo: KP-based exploratory search
• A Web application that supports exploratory search on the Web based on KPs extracted from Wikipedia links
• It aggregates knowledge from Linked Data, Wikipedia, Twitter and Google News by applying KPs as knowledge lenses over data
• It provides an effective summary of knowledge about an entity, including explanations
Exploring knowledge with Aemoo (1)
Exploring knowledge with Aemoo (2)
Conclusions
• We want to contribute to the realization of the Semantic Web as an empirical science by providing a methodology for KP extraction
• Our methodology for extracting KPs is based on two approaches
✦ a top-down approach
✦ a bottom-up approach
• We have presented our experience in KP extraction so far
✦ KPs from FrameNet’s frames
✦ KPs from Wikipedia links
• The evaluation we have in mind should be performed by means of exploratory search tasks
✦ Aemoo
Thanks