Creating Semantic Fingerprints for Web Documents


100.00.2009 OVGU Präsentation

/22Katrin Krieger – Creating Semantic Fingerprints for Web Documents 1

Creating Semantic Fingerprints for Web Resources

Katrin Krieger, Jens Schneider, Christian Nywelt, Dietmar Rösner
Otto-von-Guericke Universität Magdeburg (Germany)


Motivation

• Automatic extraction of information and generation of formal semantic descriptions are important aspects of Semantic Web research

[Figure: query, compare, combine]

img src: http://mehmetveysiadam.com


Semantic Fingerprints (SF)

• Semantic signatures of Web documents

• Representing concepts to be found in documents as well as relationships between these concepts

• Graph structures with concepts as nodes and relationships as edges

• Can be used to compute semantic relatedness, e.g. in e-learning scenarios
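To make the graph view concrete, here is a minimal sketch of an SF as a set of concept nodes plus (subject, predicate, object) edges. The slides do not say how relatedness is computed, so the Jaccard overlap of concept sets used here is an assumption for illustration, and the `ex:` identifiers are made up.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticFingerprint:
    """Concepts are nodes; relationships are (subject, predicate, object) edges."""
    concepts: set[str] = field(default_factory=set)
    relations: set[tuple[str, str, str]] = field(default_factory=set)

    def add_relation(self, subj: str, pred: str, obj: str) -> None:
        # Adding an edge also registers both endpoints as concept nodes.
        self.concepts.update((subj, obj))
        self.relations.add((subj, pred, obj))


def relatedness(a: SemanticFingerprint, b: SemanticFingerprint) -> float:
    """Jaccard overlap of the concept sets (an assumed measure, not the authors')."""
    union = a.concepts | b.concepts
    if not union:
        return 0.0
    return len(a.concepts & b.concepts) / len(union)


# Hypothetical concept identifiers; the ex: prefix is invented for this sketch.
sf1 = SemanticFingerprint()
sf1.add_relation("ex:Fold", "ex:isA", "ex:HigherOrderFunction")
sf1.add_relation("ex:HigherOrderFunction", "ex:usedIn", "ex:Haskell")

sf2 = SemanticFingerprint()
sf2.add_relation("ex:Haskell", "ex:supports", "ex:HigherOrderFunction")
sf2.add_relation("ex:HigherOrderFunction", "ex:relatedTo", "ex:LambdaCalculus")

print(relatedness(sf1, sf2))  # 2 shared of 4 distinct concepts -> 0.5
```

A measure that also compares edges would be a natural refinement; concept overlap is simply the smallest example that uses both structures.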


Desired Properties of Semantic Fingerprints

P1 Concepts are distinct and unambiguous

P2 Concepts are connected through relationships

P3 Documents with similar content will yield similar SF

P4 An SF covers all essential concepts belonging to a document


General Idea

• Hypothesis: semantically related concepts of a domain are connected through relationships

• This information is inherent in Linked Open Data (LOD) datasets, which we can exploit to disambiguate concepts

• This information is sufficient to build semantic fingerprints


How to automatically obtain a Semantic Fingerprint

1. Extract keywords from Web document

2. Create nodes by mapping keywords to semantic concepts

3. Add edges by finding relations

4. Remove irrelevant nodes and edges

5. Identify all connected subgraphs

6. Choose semantic fingerprint from connected subgraphs


Extracting Keywords and Mapping to Concepts

• Use Natural Language Processing (NLP) tools to extract nouns and noun phrases as keywords

• Query the dataset to find concepts whose labels correspond to the keywords
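The mapping step can be sketched with an in-memory label index standing in for the dataset query. This assumes keywords are already extracted and uses case-insensitive exact label matching; a real system would query the LOD dataset (for example over its label property), and the `ex:` concepts below are hypothetical. The keyword list is the one from the example slide.

```python
from collections import defaultdict


def build_label_index(pairs):
    """Map lower-cased labels to the set of concepts carrying that label."""
    index = defaultdict(set)
    for concept, label in pairs:
        index[label.lower()].add(concept)
    return index


def map_keywords(keywords, label_index):
    """Return {keyword: sorted candidate concepts} via case-insensitive exact match."""
    return {kw: sorted(label_index.get(kw.strip().lower(), ())) for kw in keywords}


# Hypothetical (concept, label) pairs standing in for an LOD dataset.
LABEL_PAIRS = [
    ("ex:Haskell_(programming_language)", "Haskell"),
    ("ex:Haskell_(given_name)", "Haskell"),
    ("ex:Fold_(higher-order_function)", "fold"),
    ("ex:Fold_(geology)", "fold"),
    ("ex:Higher-order_function", "higher order function"),
]

index = build_label_index(LABEL_PAIRS)
candidates = map_keywords(["haskell", "fold", "higher order function", "prove"], index)

# Every candidate becomes a node; no edges exist yet, so the graph is disconnected.
nodes = {c for cs in candidates.values() for c in cs}
print(candidates["fold"])  # ['ex:Fold_(geology)', 'ex:Fold_(higher-order_function)']
print(candidates["prove"])  # []
```

Ambiguous labels such as "fold" deliberately yield several candidate nodes; disambiguation is left to the later graph steps.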


Result of step #1

Disconnected graph with n concepts per keyword


Find relationships

• Expand each node and search for neighboring concepts to “grow” the graph by breadth-first search (BFS), up to a maximum path length n
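A minimal sketch of the bounded BFS expansion, assuming the dataset is available as an in-memory map from a concept to its outgoing (predicate, neighbor) pairs. That map, the toy node names, and following outgoing links only are assumptions. Each seed is expanded independently, so two seeds become connected once their expansions reach a shared node.

```python
from collections import deque


def expand(seeds, neighbors, max_depth):
    """Grow the graph by BFS from every seed, up to max_depth hops per seed.

    neighbors maps a concept to (predicate, neighbor) pairs; it is a hypothetical
    in-memory stand-in for dataset lookups. Returns the set of discovered edges.
    """
    edges = set()
    for seed in seeds:
        depth = {seed: 0}  # hop distance from this seed
        queue = deque([seed])
        while queue:
            node = queue.popleft()
            if depth[node] == max_depth:
                continue  # do not expand beyond the path-length limit
            for pred, nxt in neighbors.get(node, ()):
                edges.add((node, pred, nxt))
                if nxt not in depth:
                    depth[nxt] = depth[node] + 1
                    queue.append(nxt)
    return edges


# Toy graph: seeds A and B are linked only through the intermediate concept X.
NEIGHBORS = {
    "A": [("p", "X")],
    "X": [("q", "B")],
    "B": [("r", "Y")],
}

print(sorted(expand(["A", "B"], NEIGHBORS, max_depth=1)))  # [('A', 'p', 'X'), ('B', 'r', 'Y')]
print(sorted(expand(["A", "B"], NEIGHBORS, max_depth=2)))  # also contains ('X', 'q', 'B')
```

The path-length limit is what keeps the expansion tractable: each extra hop can multiply the number of dataset lookups.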


Result of Step #2

• Graph with connected subgraphs


Removing irrelevant nodes and edges

Which nodes and edges are really relevant for the semantic fingerprint?

Heuristics:

• Path length

• Number of connecting paths

• Occurrences in paths

• Number of corresponding keywords

• Interconnection property
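The slides name these heuristics without defining them. The sketch below illustrates only one possible pruning rule and is not taken from the slides: it repeatedly drops non-seed nodes that touch a single edge, i.e. expansion branches that lead nowhere. Seeds are the concepts mapped directly from keywords, and the toy node names are hypothetical.

```python
from collections import Counter


def prune_dead_ends(edges, seeds):
    """Repeatedly remove edges that end in a non-seed node of degree 1.

    Such nodes are expansion branches that connect nothing else. This is a
    hypothetical stand-in for one heuristic, not the authors' rule set.
    """
    edges = set(edges)
    while True:
        degree = Counter()
        for subj, _, obj in edges:
            degree[subj] += 1
            degree[obj] += 1
        dead = {n for n, d in degree.items() if d == 1 and n not in seeds}
        if not dead:
            return edges
        edges = {e for e in edges if e[0] not in dead and e[2] not in dead}


# B -> Y -> W is a dangling branch; A -> X -> B links two seeds.
EDGES = {("A", "p", "X"), ("X", "q", "B"), ("B", "r", "Y"), ("Y", "t", "W")}
print(sorted(prune_dead_ends(EDGES, seeds={"A", "B"})))  # [('A', 'p', 'X'), ('X', 'q', 'B')]
```

The loop terminates because every round with a dead node removes at least one edge.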


Identifying subgraphs and picking the SF

• Identify subgraphs by performing BFS

• Determine which of the subgraphs is the semantic fingerprint, based on:

  • coverage of as many keywords as possible

  • the number of concepts in the subgraph
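A sketch of the final two steps: BFS over an undirected view of the edges finds the connected subgraphs, and the candidate covering the most keywords wins. The slides do not say how the concept count is weighed, so using it as a tie-breaker that prefers larger subgraphs is an assumption. The concept-to-keyword map is hypothetical input that would come from the mapping step.

```python
from collections import defaultdict, deque


def connected_components(edges):
    """Find connected subgraphs with BFS, treating edges as undirected."""
    adjacency = defaultdict(set)
    for subj, _, obj in edges:
        adjacency[subj].add(obj)
        adjacency[obj].add(subj)
    seen, components = set(), []
    for start in list(adjacency):
        if start in seen:
            continue
        seen.add(start)
        component, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            component.add(node)
            for nxt in adjacency[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return components


def pick_fingerprint(components, concept_keywords):
    """Prefer the subgraph covering most keywords, then more concepts (assumed)."""
    def score(component):
        covered = set().union(*(concept_keywords.get(c, set()) for c in component))
        return len(covered), len(component)
    return max(components, key=score, default=set())


EDGES = {("A", "p", "X"), ("X", "q", "B"), ("C", "s", "Z")}
KEYWORDS = {"A": {"haskell"}, "B": {"fold"}, "C": {"prove"}}
print(sorted(pick_fingerprint(connected_components(EDGES), KEYWORDS)))  # ['A', 'B', 'X']
```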


Evaluation

P1 Concepts are distinct and unambiguous

P2 Concepts are connected through relationships

P3 Documents with similar content will yield similar SF

P4 An SF covers all essential concepts belonging to a document


Quantitative Evaluation

• P3: Documents with similar content will yield similar SF

• Extraction of 11 different keyword (KW) lists from real-world e-learning documents

• Generation of SF for all KW lists

• Generation of SF for all (|KWi| − k)-tuple subsets of each KWi, with |KWi| denoting the number of keywords in KWi and k varied from 1 to 4

• Comparison of the SF of the original KW lists with the SF of the varied KW lists:

  • Number of contained concepts

  • Number of common concepts
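The subset-generation scheme can be sketched with `itertools.combinations`: dropping k keywords from a list of |KWi| keywords yields all (|KWi| − k)-tuples. Capping k so that no subset is empty, and reading "contained concepts" as the size of the varied SF, are assumptions; the SF generation itself is left out.

```python
from itertools import combinations


def reduced_keyword_lists(keywords, max_removed=4):
    """Yield (k, subset) for every (|KW| - k)-tuple subset, k = 1..max_removed.

    k is capped at len(keywords) - 1 so that no subset is empty (an assumption).
    """
    n = len(keywords)
    for k in range(1, min(max_removed, n - 1) + 1):
        for subset in combinations(keywords, n - k):
            yield k, subset


def compare(original, varied):
    """Count the concepts in the varied SF and those it shares with the original."""
    return {"contained": len(varied), "common": len(original & varied)}


kws = ["haskell", "fold", "higher order function", "prove"]
variants = list(reduced_keyword_lists(kws))
print(len(variants))  # C(4,3) + C(4,2) + C(4,1) = 14
print(compare({"ex:A", "ex:B", "ex:C"}, {"ex:A", "ex:B", "ex:D"}))  # {'contained': 3, 'common': 2}
```

For a list of six keywords, k = 1..4 gives 6 + 15 + 20 + 15 = 56 variants, which shows how quickly the number of SF to generate grows.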


Quantitative Evaluation (2)

● The number of concepts in the 1992 SF varies from 0 to 22

● SF with 14-16 concepts make up 8.3%

● SF with 10-13 concepts make up 20.8%

● Grouping into bins

● The majority of SF generated with one KW less still have ≥90% KW in common with the original SF


Qualitative Evaluation

• P1: Concepts are distinct and unambiguous

• P4: An SF covers all essential concepts belonging to a document

• Evaluation with human reviewers:

  • The reviewers rated the behavior of our algorithm as comprehensible and the fingerprints as suitable for the keyword lists

  • The reviewers also found that some concepts seem to be more important than others


Conclusion

• New method to create a formal semantic description of a document

• Exploits inherent properties and structures in LOD datasets

• No need for additional methods such as statistical analysis

Open Issues

• Runtime is rather high, and the method is expensive in computing resources

• Not all semantic relations expressed in the documents are present in the dataset

• Scalability


Outlook

• Exploit DOM structure of the document

• Add weights to keywords

• Investigate other data structures and adapted expansion algorithms

• Study other methods to capture semantic relationships from text


Thank you for your attention.

What are your questions?

img src: https://flic.kr/p/6DBVxb

katrin.krieger@ovgu.de


SF for KW = {“haskell”, “fold”, “higher order function”, “prove”}


In use

[Figure: SlideshareConnector, StackOverflowConnector, FreebaseConnector, DBpediaConnector, LectureSlideConnector; educational metadata; REST-based Web service (codename: Guinan)]
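One way to read this diagram is as pluggable connectors behind a single service. The sketch below assumes two hypothetical connector kinds: document sources (as Slideshare, StackOverflow, or lecture slides might be) and LOD datasets (as DBpedia or Freebase might be). It uses in-memory stubs only and calls no real API. The HTTP layer that a REST service such as Guinan would add is omitted, and whitespace tokenization stands in for NLP keyword extraction. All class and method names are invented for illustration.

```python
from abc import ABC, abstractmethod


class DocumentConnector(ABC):
    """Hypothetical interface for sources of documents."""

    @abstractmethod
    def fetch_text(self, resource_id: str) -> str: ...


class DatasetConnector(ABC):
    """Hypothetical interface for LOD datasets that map labels to concepts."""

    @abstractmethod
    def concepts_for_label(self, label: str) -> list[str]: ...


class InMemoryDocuments(DocumentConnector):
    # Stub document source; no real service is contacted.
    def __init__(self, docs: dict[str, str]):
        self.docs = docs

    def fetch_text(self, resource_id: str) -> str:
        return self.docs[resource_id]


class InMemoryDataset(DatasetConnector):
    # Stub dataset; no real endpoint is queried.
    def __init__(self, labels: dict[str, list[str]]):
        self.labels = {k.lower(): v for k, v in labels.items()}

    def concepts_for_label(self, label: str) -> list[str]:
        return self.labels.get(label.lower(), [])


class FingerprintService:
    """Routes requests to registered connectors (the REST layer is omitted)."""

    def __init__(self):
        self.documents: dict[str, DocumentConnector] = {}
        self.datasets: list[DatasetConnector] = []

    def register_documents(self, name: str, connector: DocumentConnector) -> None:
        self.documents[name] = connector

    def register_dataset(self, connector: DatasetConnector) -> None:
        self.datasets.append(connector)

    def candidate_concepts(self, source: str, resource_id: str) -> dict[str, list[str]]:
        # Naive whitespace tokenization stands in for NLP keyword extraction.
        text = self.documents[source].fetch_text(resource_id)
        result = {}
        for token in text.lower().split():
            found = [c for ds in self.datasets for c in ds.concepts_for_label(token)]
            if found:
                result[token] = found
        return result


svc = FingerprintService()
svc.register_documents("slides", InMemoryDocuments({"lec1": "Haskell fold example"}))
svc.register_dataset(InMemoryDataset({"haskell": ["ex:Haskell"], "fold": ["ex:Fold"]}))
print(svc.candidate_concepts("slides", "lec1"))  # {'haskell': ['ex:Haskell'], 'fold': ['ex:Fold']}
```

Separating the two connector kinds would keep new document sources independent of the dataset used for concept lookup.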
