Creating Semantic Fingerprints for Web Documents

OVGU presentation by Katrin Krieger, Jens Schneider, Christian Nywelt, Dietmar Rösner, Otto-von-Guericke Universität Magdeburg (Germany)

Upload: katrin-krieger

Post on 22-Jan-2018

TRANSCRIPT

Page 1: Creating Semantic Fingerprints for Web Documents

Creating Semantic Fingerprints for Web Resources

Katrin Krieger, Jens Schneider, Christian Nywelt, Dietmar Rösner

Otto-von-Guericke Universität Magdeburg (Germany)

Page 2: Creating Semantic Fingerprints for Web Documents

Motivation

• Automatic extraction of information and generation of formal semantic descriptions are important aspects of Semantic Web research

[Figure labels: query, compare, combine. Image source: http://mehmetveysiadam.com]

Page 3: Creating Semantic Fingerprints for Web Documents

Page 4: Creating Semantic Fingerprints for Web Documents

Page 5: Creating Semantic Fingerprints for Web Documents

Semantic Fingerprints (SF)

• Semantic signatures of Web documents

• Represent the concepts found in a document as well as the relationships between these concepts

• Graph structures with concepts as nodes and relationships as edges

• Can be used to compute semantic relatedness, e.g. in e-learning scenarios
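To make the graph view concrete, here is a minimal Python sketch of a fingerprint as a set of concept nodes plus labeled edges. The class name, the method names, and the relationship labels are assumptions made for illustration; the slides do not prescribe a particular data structure.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticFingerprint:
    """Graph with concepts as nodes and relationships as labeled edges."""
    concepts: set[str] = field(default_factory=set)
    # Each edge is (source concept, relationship label, target concept).
    relations: set[tuple[str, str, str]] = field(default_factory=set)

    def add_relation(self, source: str, label: str, target: str) -> None:
        # Adding an edge also registers both endpoints as nodes.
        self.concepts.update((source, target))
        self.relations.add((source, label, target))

    def neighbors(self, concept: str) -> set[str]:
        # Relationships are treated as undirected when looking up neighbors.
        out = {t for s, _, t in self.relations if s == concept}
        return out | {s for s, _, t in self.relations if t == concept}


# Illustrative concepts and labels only, not taken from a real dataset.
sf = SemanticFingerprint()
sf.add_relation("Haskell", "paradigm", "Functional programming")
sf.add_relation("Fold", "subject", "Higher-order function")
```

Storing edges as triples keeps the relationship label available, which a plain adjacency list would lose.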

Page 6: Creating Semantic Fingerprints for Web Documents

Desired Properties of Semantic Fingerprints

P1 Concepts are distinct and unambiguous

P2 Concepts are connected through relationships

P3 Documents with similar content will yield similar SF

P4 A SF covers all essential concepts belonging to a document

Page 7: Creating Semantic Fingerprints for Web Documents

General Idea

• Hypothesis: semantically related concepts of a domain are connected through relationships

• This information is inherent in Linked Open Data (LOD) datasets, which we can exploit to disambiguate concepts

• This information is sufficient to build semantic fingerprints

Page 8: Creating Semantic Fingerprints for Web Documents

How to automatically obtain a Semantic Fingerprint

1. Extract keywords from Web document

2. Create nodes by mapping keywords to semantic concepts

3. Add edges by finding relations

4. Remove irrelevant nodes and edges

5. Identify all connected subgraphs

6. Choose semantic fingerprint from connected subgraphs

Page 9: Creating Semantic Fingerprints for Web Documents

Extracting Keywords and Mapping to Concepts

• Use Natural Language Processing (NLP) tools to extract nouns and noun phrases

• Query the dataset to find concepts whose labels correspond to the keywords
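The label lookup can be sketched as follows. This is only an illustration under stated assumptions: the in-memory `CONCEPT_LABELS` dictionary stands in for a real LOD dataset query, the `ex:` identifiers are made up, and the `normalize` and `map_keywords` helpers are hypothetical names. Keyword extraction itself is assumed to have happened already.

```python
# Toy stand-in for a LOD dataset: concept identifier -> label.
# These identifiers are made up for illustration, not real dataset entries.
CONCEPT_LABELS = {
    "ex:Fold_(higher-order_function)": "fold",
    "ex:Fold_(geology)": "fold",
    "ex:Haskell_(programming_language)": "haskell",
    "ex:Haskell,_Oklahoma": "haskell",
    "ex:Higher-order_function": "higher order function",
}


def normalize(text: str) -> str:
    # Lowercase and unify hyphens and whitespace so labels and keywords compare equal.
    return " ".join(text.lower().replace("-", " ").split())


def map_keywords(keywords, labels=CONCEPT_LABELS):
    """Return {keyword: sorted list of candidate concepts with a matching label}."""
    index = {}
    for concept, label in labels.items():
        index.setdefault(normalize(label), []).append(concept)
    return {kw: sorted(index.get(normalize(kw), [])) for kw in keywords}


candidates = map_keywords(["Haskell", "fold", "higher-order function", "prove"])
```

A keyword may map to several candidate concepts or to none, so the result of this step is still ambiguous; disambiguation is left to the later graph steps.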

Page 10: Creating Semantic Fingerprints for Web Documents

Result of steps 1 and 2

Disconnected graph with n concepts per keyword

Page 11: Creating Semantic Fingerprints for Web Documents

Find relationships

• Expand each node by breadth-first search (BFS), adding neighboring concepts to “grow” the graph up to a maximum path length n
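A bounded breadth-first expansion might look like the sketch below. The `NEIGHBORS` dictionary is toy data standing in for relationship queries against the dataset, and `expand` is a hypothetical helper name; how the original system bounds or orders the search is not stated in the slides.

```python
from collections import deque

# Toy undirected relationship data; a real system would query the dataset.
NEIGHBORS = {
    "Haskell": ["Functional programming"],
    "Functional programming": ["Haskell", "Higher-order function"],
    "Higher-order function": ["Functional programming", "Fold"],
    "Fold": ["Higher-order function"],
    "Fold (geology)": ["Geology"],
    "Geology": ["Fold (geology)"],
}


def expand(seeds, max_length, neighbors=NEIGHBORS):
    """Breadth-first expansion: collect edges within max_length hops of any seed."""
    depth = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    edges = set()
    while queue:
        node = queue.popleft()
        if depth[node] == max_length:
            continue  # do not grow beyond the path length limit
        for nxt in neighbors.get(node, []):
            edges.add(frozenset((node, nxt)))
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return set(depth), edges


seeds = ["Haskell", "Fold", "Fold (geology)"]
nodes_1, edges_1 = expand(seeds, 1)
nodes_2, edges_2 = expand(seeds, 2)
```

In this toy data, a limit of 1 leaves "Haskell" and "Fold" in separate subgraphs, while a limit of 2 adds the edge that links them. The geology sense of "fold" stays isolated in both cases.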

Page 12: Creating Semantic Fingerprints for Web Documents

Result of step 3

• Graph with connected subgraphs

Page 13: Creating Semantic Fingerprints for Web Documents

Removing irrelevant nodes and edges

Which nodes and edges are really relevant for the semantic fingerprint?

Heuristics:

• Path length

• Number of connecting paths

• Occurrences in paths

• Number of corresponding keywords

• Interconnection property
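The slides name these heuristics without defining them, so the following is only one possible reading of two of them (path length and occurrences in paths), not the authors' method. It keeps the nodes that lie on a sufficiently short shortest path between two seed concepts and counts how often each node occurs on such paths. The function names and the toy graph are assumptions.

```python
from collections import deque
from itertools import combinations


def shortest_path(graph, start, goal):
    """Return one shortest path from start to goal as a list, or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


def prune(graph, seeds, max_path_length):
    """Keep only nodes that lie on a short path connecting two seed concepts."""
    occurrences = {}
    for a, b in combinations(sorted(seeds), 2):
        path = shortest_path(graph, a, b)
        # len(path) - 1 is the number of edges on the path.
        if path is not None and len(path) - 1 <= max_path_length:
            for node in path:
                occurrences[node] = occurrences.get(node, 0) + 1
    kept = set(occurrences)
    # Keep every original edge whose endpoints both survived.
    pruned = {n: [m for m in graph.get(n, ()) if m in kept] for n in kept}
    return pruned, occurrences


# Toy graph for illustration only.
graph = {
    "Haskell": ["Functional programming", "Haskell Curry"],
    "Functional programming": ["Haskell", "Higher-order function"],
    "Higher-order function": ["Functional programming", "Fold"],
    "Fold": ["Higher-order function"],
    "Haskell Curry": ["Haskell"],
    "Fold (geology)": ["Geology"],
    "Geology": ["Fold (geology)"],
}
pruned, occurrences = prune(graph, {"Haskell", "Fold", "Fold (geology)"}, 3)
```

Here the dangling node "Haskell Curry" and the unconnected geology sense of "fold" are dropped, because neither lies on a path between two seed concepts.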

Page 14: Creating Semantic Fingerprints for Web Documents

Identifying subgraphs and picking the SF

• Identify subgraphs by performing BFS

• Determine which of the subgraphs is the semantic fingerprint

  • Cover as many keywords as possible

  • Number of concepts in the subgraph
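Both steps can be sketched in a few lines of Python. The component search is a standard BFS; the selection rule shown here (most keywords covered, then most concepts) is an assumption about how the two criteria combine, since the slides list them without an order. `concept_keywords` is a hypothetical mapping from each seed concept to the keyword it was derived from.

```python
from collections import deque


def connected_subgraphs(graph):
    """Split an undirected adjacency dict into its connected node sets via BFS."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nxt in graph.get(node, ()):
                if nxt not in component:
                    component.add(nxt)
                    queue.append(nxt)
        seen |= component
        components.append(component)
    return components


def choose_fingerprint(components, concept_keywords):
    """Pick the subgraph covering the most keywords; break ties by concept count."""
    def score(component):
        covered = {concept_keywords[c] for c in component if c in concept_keywords}
        return (len(covered), len(component))
    return max(components, key=score)


# Toy data for illustration only.
graph = {
    "Haskell": ["Functional programming"],
    "Functional programming": ["Haskell", "Fold"],
    "Fold": ["Functional programming"],
    "Fold (geology)": ["Geology"],
    "Geology": ["Fold (geology)"],
}
concept_keywords = {"Haskell": "haskell", "Fold": "fold", "Fold (geology)": "fold"}
components = connected_subgraphs(graph)
fingerprint = choose_fingerprint(components, concept_keywords)
```

The programming subgraph wins in this example because it covers two keywords, whereas the geology subgraph covers only one.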

Page 15: Creating Semantic Fingerprints for Web Documents

Evaluation

P1 Concepts are distinct and unambiguous

P2 Concepts are connected through relationships

P3 Documents with similar content will yield similar SF

P4 A SF covers all essential concepts belonging to a document

Page 16: Creating Semantic Fingerprints for Web Documents

Quantitative Evaluation

• P3: Documents with similar content will yield similar SF

• Extraction of 11 different keyword (KW) lists from real-world e-learning documents

• Generation of SF for all KW lists

• Generation of SF for all (|KWi| − k)-tuple subsets of each KWi, with |KWi| denoting the number of keywords in KWi and k varying from 1 to 4

• Comparison of the SF of the original KW lists with those of the varied KW lists

  • Number of contained concepts

  • Number of common concepts
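The subset generation and the comparison can be illustrated as below. `reduced_keyword_lists` and `common_concept_ratio` are hypothetical helper names, and expressing "common concepts" as a ratio relative to the original fingerprint is an assumption; the slides only name the two counts. Fingerprints are represented simply as sets of concepts here.

```python
from itertools import combinations


def reduced_keyword_lists(keywords, k):
    """All subsets of the keyword list that leave out exactly k keywords."""
    return [set(c) for c in combinations(sorted(keywords), len(keywords) - k)]


def common_concept_ratio(original_sf, varied_sf):
    """Share of the original fingerprint's concepts also present in the varied one."""
    if not original_sf:
        return 0.0
    return len(original_sf & varied_sf) / len(original_sf)


kw = ["haskell", "fold", "higher order function", "prove"]
# This short example list only allows k = 1..3 before the subset becomes empty.
variants = {k: reduced_keyword_lists(kw, k) for k in range(1, 4)}
ratio = common_concept_ratio({"a", "b", "c", "d"}, {"a", "b", "c", "x"})
```

For a list of m keywords this yields "m choose k" variants per k, so the number of fingerprints to generate grows quickly with list length.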

Page 17: Creating Semantic Fingerprints for Web Documents

Quantitative Evaluation (2)

• The number of concepts in the 1992 SF varies from 0 to 22

• SF with 14-16 concepts make up 8.3%

• SF with 10-13 concepts make up 20.8%

• Grouping into bins

• The majority of SF with one KW less still have ≥90% KW in common with the original SF

Page 18: Creating Semantic Fingerprints for Web Documents

Qualitative Evaluation

• P1: Concepts are distinct and unambiguous

• P4: A SF covers all essential concepts belonging to a document

• Evaluation with human reviewers:

  • The reviewers rated the behavior of our algorithm as comprehensible and the fingerprints as suitable for the keyword lists

  • The reviewers also found that some concepts seem to be more important than others

Page 19: Creating Semantic Fingerprints for Web Documents

Conclusion

• New method to create a formal semantic description of a document

• Exploits inherent properties and structures in LOD datasets

• No need for other methods such as statistics

Open Issues

• Runtime is rather high, and the method is expensive in computing resources

• Not all semantic relations from the documents are also present in the dataset

• Scalability

Page 20: Creating Semantic Fingerprints for Web Documents

Outlook

• Exploit DOM structure of the document

• Add weights to keywords

• Investigate other data structures and adapted expansion algorithms

• Study other methods to capture semantic relationships from text

Page 21: Creating Semantic Fingerprints for Web Documents

Thank you for your attention.

What are your questions?

img src: https://flic.kr/p/6DBVxb

Page 22: Creating Semantic Fingerprints for Web Documents

SF for KW = {“haskell”, “fold”, “higher order function”, “prove”}

Page 23: Creating Semantic Fingerprints for Web Documents

In use

[Diagram labels: REST-based Web service (codename: Guinan); Educational metadata; SlideshareConnector; StackOverflowConnector; FreebaseConnector; DBpediaConnector; LectureSlideConnector]