semantic knowledge acquisition of information for syntactic web

SEMANTIC KNOWLEDGE ACQUISITION OF INFORMATION FOR SYNTACTIC WEB
By G. Nagarajan and K. K. Thyagharajan
International Journal of Web & Semantic Technology

By Raef Mchaymech

Outline

1. Introduction
2. The Problem
3. The Proposed Solution
4. The Proposed Architecture
5. Conclusion
6. Critics

Introduction

• People are using the web for everything

A googled fact

The First Problem

• Search engines are returning:
– Billions of results, informative and non-informative

So what is the problem now?

The introduction of the Semantic Web in 2000 encouraged researchers to create the concept of semantic search engines.

Semantic search engines are indeed widely adopted by developers and engineers.

Querying the Semantic Web with semantic search engines returns the expected results.

Current Solutions

[Diagram: Semantic search engines → Semantic Web → Expected Result]

The Second Problem

• There are not enough resources to search:
– Searches and queries are very domain-dependent
– E.g.:
  – DBpedia to search Wikipedia
  – LinkedMDB to search IMDb

Current Solutions vs. Proposed Solution

[Diagram comparing the Semantic Web and the Current Web. Labels: Write ontologies, Transformation]

The Proposed Architecture

[Diagram: WWW → Web Crawler → List of URLs → Filtering → Conversion to XML → Conversion to RDF/OWL → Ontology Repository]

About the Crawler

Templates

A Web crawler starts with a list of URLs to visit, called the seeds. As the crawler visits these URLs, it identifies all the hyperlinks in the page and adds them to the list of URLs to visit. URLs from the list are recursively visited according to a set of policies.
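The crawl loop described above can be sketched in a few lines of Python. This is only an illustration, not the crawler used in the paper: the `fetch` callable, the `max_pages` limit (standing in for the crawl policies, which the slides do not detail) and the in-memory `PAGES` dictionary are all assumptions. The sketch uses a queue rather than recursion, which visits the same URLs breadth-first.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds, fetch, max_pages=100):
    """Visit pages breadth-first, starting from the seed URLs.

    `fetch` is a hypothetical callable mapping a URL to its HTML text
    (or None if the page cannot be retrieved).
    """
    frontier = deque(seeds)   # URLs still to visit
    seen = set(seeds)         # URLs already queued, to avoid revisits
    visited = []              # URLs in the order they were visited

    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)

        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return visited


# A tiny in-memory "web" so the sketch runs without network access.
PAGES = {
    "http://example.org/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.org/a": '<a href="/b">B</a>',
    "http://example.org/b": '<a href="/">home</a>',
}

order = crawl(["http://example.org/"], PAGES.get)
print(order)
```

In a real crawler `fetch` would perform an HTTP request, and the policies would typically also cover politeness delays and which domains to follow.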

Web Crawler

HTML to XML Conversion

The conversion is based on Natural Language Processing (NLP), specifically on Named Entity Recognition (NER)


NER can classify the entity as: Person Name, Organization Name, Location…
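To make the idea concrete, here is a minimal sketch of entity tagging that produces XML. It is not the paper's method: a plain lexicon lookup stands in for a real NER component, and the `LEXICON` entries, the `text_to_xml` function and the `<document>` root element are all assumptions chosen for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical lexicon: surface form -> entity type.
LEXICON = {
    "Tim Berners-Lee": "Person",
    "W3C": "Organization",
    "Geneva": "Location",
}


def text_to_xml(text, lexicon=LEXICON):
    """Wrap every lexicon match in an element named after its entity type."""
    # Longest names first, so a longer entry wins over a shorter one it contains.
    names = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(n) for n in names))

    root = ET.Element("document")
    last = None   # most recently added entity element
    pos = 0
    for match in pattern.finditer(text):
        plain = text[pos:match.start()]
        # ElementTree keeps text before the first child in .text and
        # text following a child in that child's .tail.
        if last is None:
            root.text = plain
        else:
            last.tail = plain
        last = ET.SubElement(root, lexicon[match.group()])
        last.text = match.group()
        pos = match.end()

    rest = text[pos:]
    if last is None:
        root.text = rest
    else:
        last.tail = rest
    return root


sentence = "Tim Berners-Lee founded the W3C."
xml_string = ET.tostring(text_to_xml(sentence), encoding="unicode")
print(xml_string)
```

A statistical or rule-based recognizer would replace the lexicon lookup in practice; the XML-building part would stay much the same.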


The Proposed Framework

HTML to XML Conversion

[Diagram: HTML Web Page → HTML Document Preprocessing → Entity Recognition → Corresponding XML file. Supporting components: Lexicon & Pattern Repository, Domain Hierarchy]

XML to RDFS/OWL Conversion

• Two main definitions are needed:
– RDFS, which provides the rules of the web page
– OWL, which defines the conceptual ontology of the web page

• Two techniques should be used:
– Syntactic Analysis
– Semantic Analysis

Syntactic Analysis

• It is a simple mapping between XML elements and OWL elements

For more rules, please refer to the paper
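A mapping of this kind might look like the sketch below. The three rules in the docstring are a common XML-to-OWL convention assumed here for illustration; they are not taken from the paper, and the `xml_to_owl_triples` function, the `has` prefix and the sample `<Movie>` document are hypothetical.

```python
import xml.etree.ElementTree as ET


def xml_to_owl_triples(xml_text):
    """Map XML elements to OWL declarations, returned as (s, p, o) triples.

    Assumed rules (illustrative only):
      * an element with child elements becomes an owl:Class
      * a leaf element becomes an owl:DatatypeProperty of its parent's class
      * a class nested in another class is linked by an owl:ObjectProperty
    """
    triples = []

    def add(triple):
        if triple not in triples:   # repeated elements declare nothing new
            triples.append(triple)

    def visit(element, parent):
        if len(element):  # has child elements -> class
            add((element.tag, "rdf:type", "owl:Class"))
            if parent is not None:
                prop = "has" + element.tag
                add((prop, "rdf:type", "owl:ObjectProperty"))
                add((prop, "rdfs:domain", parent.tag))
                add((prop, "rdfs:range", element.tag))
            for child in element:
                visit(child, element)
        else:  # leaf element -> datatype property
            add((element.tag, "rdf:type", "owl:DatatypeProperty"))
            if parent is not None:
                add((element.tag, "rdfs:domain", parent.tag))

    visit(ET.fromstring(xml_text), None)
    return triples


sample = """
<Movie>
  <title>Example Film</title>
  <Director><name>Jane Doe</name></Director>
</Movie>
"""
owl_triples = xml_to_owl_triples(sample)
for s, p, o in owl_triples:
    print(s, p, o, ".")
```

Because every element is mapped mechanically, any irrelevant element in the XML also ends up in the ontology, so a filtering step before the mapping would likely be needed.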

Semantic Analysis

Strongly based on NLP techniques:
• The analyzer identifies nouns, verbs, etc.
• A probability reasoner is used to separate concepts from relations
• Relationships also include is-a and part-of
• TBox and ABox are used to define logic and rules
– The TBox provides the classes and properties
– The ABox provides the instances
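The split between terminology and instances can be illustrated with a small sketch. The class names, the `jane_doe` individual and both helper functions are hypothetical, and the only reasoning shown is following is-a links transitively; a real system would use a description logic reasoner instead.

```python
# TBox: terminology, i.e. classes and properties and how they relate.
TBOX = {
    ("Director", "is-a", "Person"),
    ("Person", "is-a", "Agent"),
    ("Scene", "part-of", "Movie"),
}

# ABox: assertions about individual instances.
ABOX = {
    ("jane_doe", "instance-of", "Director"),
}


def superclasses(cls, tbox):
    """Return every class reachable from `cls` through is-a links."""
    found = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for subject, relation, obj in tbox:
            if subject == current and relation == "is-a" and obj not in found:
                found.add(obj)
                stack.append(obj)
    return found


def classes_of(individual, tbox, abox):
    """Asserted classes of an individual plus those inferred via is-a."""
    asserted = {o for s, r, o in abox if s == individual and r == "instance-of"}
    inferred = set()
    for cls in asserted:
        inferred |= superclasses(cls, tbox)
    return asserted | inferred


result = classes_of("jane_doe", TBOX, ABOX)
print(sorted(result))
```

Here the ABox only states that `jane_doe` is a Director; membership in Person and Agent follows from the TBox.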

The Overall Architecture

Conclusion

• Intelligent information retrieval system

• The reusability concept is applied here:
– The authors reused the HTML pages
– They converted them to ontologies

Critics

• The English is a complete disaster

• The authors did not show any real example:
– They did not convert syntactic pages to semantic ones
– An example of the HTML to XML conversion is provided, but not of the XML to OWL conversion
• The mapping from XSD elements to OWL is inefficient and error-prone: irrelevant elements could easily be inserted into the ontology
• The authors did not benefit from the expressive power of ontologies (restrictions, types of properties…)

Critics

• They wrote exactly:

• They discussed the architecture of the syntactic/semantic conversion, but no search engine was designed
• No evaluation at all: speed of the solution, amount of resource consumption…

THANK YOU!
