Full Text Indexing Based On Lexical Relations; An Application: Software Libraries, by Y.S. Maarek and F.A. Smadja

Presented by: AKHIL GADA, CSCI 572, University of Southern California


Full Text Indexing Based On Lexical Relations; An Application: Software Libraries. July 15th, 2010

REQUIREMENT FOR SEARCH IN SOFTWARE LIBRARY

SEARCH FOR FUNCTIONALLY SIMILAR COMPONENTS

E.g. the query “I want to search pages” should return both the Yahoo Search API and the Google Search API


A.I. OR KNOWLEDGE BASE APPROACH
• DOMAIN KNOWLEDGE MUST BE ENTERED
• MANUAL OR SEMI-AUTOMATIC
• SPECIFIC AND DIFFICULT TO SCALE TO A NEW DOMAIN
• SEMANTIC UNDERSTANDING OF DOCUMENTS

I.R. OR FREE TEXT BASED APPROACH
• NO PRIOR KNOWLEDGE REQUIRED
• COMPLETELY AUTOMATIC
• GENERIC AND VERY EASY TO SCALE TO A NEW DOMAIN
• NO SEMANTIC UNDERSTANDING OF DOCUMENTS


SINGLE KEYWORD
• CONTEXT INFORMATION IS LOST. E.g. Apple (fruit) vs. Apple (computers)
• HIGH FREQUENCY GENERIC TERMS MIGHT INTRODUCE NOISE. E.g. the word “file” in the UNIX manual does not characterize the functionality of any command

VS

LEXICAL RELATION
• REVEALS CONTEXT INFORMATION
• HIGH FREQUENCY OF A LEXICAL RELATION PROVIDES HIGH FUNCTIONAL INFORMATION ABOUT THE DOCUMENT. E.g. “copy file” in UNIX


LINEAR IR USING INVERTED INDEX

CLUSTERING IR USING HAC (Hierarchical Agglomerative Clustering)


LEXICAL RELATIONS: TWO WORDS IN A SENTENCE THAT HAVE A SYNTACTIC RELATIONSHIP BETWEEN THEM: Subject-Verb, Verb-Direct object, Verb-Indirect object, etc.

OPEN CLASS WORDS – NOUNS, VERBS, ADJECTIVES, ADVERBS. THESE ARE MEANING BEARING.

CLOSED CLASS WORDS – Conjunctions (and, or), Articles (the, a), Demonstratives (this, that), and Prepositions (to, from, at, with). These contribute little meaning of their own to the sentence.


EXTRACT [1] LEXICAL RELATIONS ALGO. [2]

[Figure: 5-word window spanning the words W1, W2, W3, W4, W5]
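A minimal sketch of window-based extraction, under stated assumptions rather than as the algorithm of [1] or [2]: relations are approximated by unordered pairs of open-class words that fall inside the same window of five consecutive words, windows do not cross sentence boundaries, and `CLOSED_CLASS` is a small hypothetical stop list. The function name `extract_relations` is illustrative only.

```python
import re
from collections import Counter

# Hypothetical, deliberately small closed-class list (assumption, not from the slides).
CLOSED_CLASS = {"a", "an", "the", "and", "or", "this", "that",
                "to", "from", "at", "with", "of", "in", "is", "it"}

def extract_relations(text, window=5):
    """Count unordered pairs of open-class words that co-occur inside a
    window of `window` consecutive words within the same sentence."""
    counts = Counter()
    for sentence in re.split(r"[.!?]+", text.lower()):
        words = re.findall(r"[a-z]+", sentence)
        for i, w1 in enumerate(words):
            if w1 in CLOSED_CLASS:
                continue
            # w1 plus the next (window - 1) words make up one window.
            for w2 in words[i + 1 : i + window]:
                if w2 in CLOSED_CLASS or w2 == w1:
                    continue
                counts[tuple(sorted((w1, w2)))] += 1
    return counts

relations = extract_relations("Copy the file to the directory. Copy file.")
print(relations)
```

In this toy input, "copy" and "directory" are six words apart, so they fall outside the window and no pair is recorded for them.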


RESOLVING POWER

OUTPUT FROM EXTRACT [1] ALGORITHM. [0]
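The slide text does not define resolving power, so the following is only a sketch under an assumption: a relation's score is its frequency in the document multiplied by the information content of its two words, taken here as -log2(P(w1) * P(w2)) with probabilities estimated from corpus word counts. The function name and the toy counts are hypothetical.

```python
import math

def resolving_power(doc_relations, corpus_word_counts):
    """Score each relation as (frequency in document) * (information content).

    Assumption: information content of (w1, w2) is -log2(P(w1) * P(w2)),
    where P is the relative frequency of a word in the corpus."""
    total = sum(corpus_word_counts.values())
    scores = {}
    for (w1, w2), freq in doc_relations.items():
        p1 = corpus_word_counts[w1] / total
        p2 = corpus_word_counts[w2] / total
        scores[(w1, w2)] = freq * -math.log2(p1 * p2)
    return scores

# Toy corpus: "file" is very common, "copy" and "directory" less so.
corpus = {"file": 50, "copy": 25, "directory": 25}
doc = {("copy", "file"): 2, ("directory", "file"): 1}
print(resolving_power(doc, corpus))
```

Under this assumption a pair built from corpus-wide common words earns little information per occurrence, which matches the earlier point that a generic term such as “file” adds noise on its own.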


FOR EACH DOCUMENT, SELECT THE TOP N MOST INFORMATIVE LEXICAL RELATIONS (BY RESOLVING POWER); THESE FORM THE PROFILE OF THE DOCUMENT.

CREATE INVERTED INDEX. [2]
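The two steps above can be sketched as follows. This is an illustration, not GURU's implementation: profiles are plain dictionaries from relation to score, and the names `build_profile` and `build_inverted_index` as well as the toy scores are assumptions.

```python
from collections import defaultdict

def build_profile(scores, n):
    """Keep the n relations with the highest resolving power."""
    top = sorted(scores.items(), key=lambda item: item[1], reverse=True)[:n]
    return dict(top)

def build_inverted_index(profiles):
    """Map each relation to the documents whose profile contains it."""
    index = defaultdict(dict)
    for doc_id, profile in profiles.items():
        for relation, score in profile.items():
            index[relation][doc_id] = score
    return index

profiles = {
    "cp": build_profile({("copy", "file"): 6.0,
                         ("directory", "file"): 3.0,
                         ("file", "name"): 1.0}, n=2),
    "mv": build_profile({("file", "move"): 5.0,
                         ("directory", "file"): 2.0}, n=2),
}
index = build_inverted_index(profiles)
print(index[("directory", "file")])
```

Relations that do not make the top N of any document never enter the index, so the index size is bounded by N times the number of documents.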


SIMILARITY MEASURE BETWEEN TWO DOCUMENTS [2]

• LET X = set of top N resolving power lexical relations for document dx
• Y = set of top N resolving power lexical relations for document dy
• (X ∩ Y) = set of lexical relations common to dx and dy

[Figure: documents dx and dy connected by their similarity ∂(dx,dy)]
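The formula on this slide did not survive extraction. As a hedged sketch, one measure consistent with the definitions above sums, over the relations in X ∩ Y, the product of the scores the relation has in each profile. That particular formula is an assumption here, and the name `similarity` is illustrative.

```python
def similarity(profile_x, profile_y):
    """Sum of score products over the relations shared by both profiles.

    Assumption: the source slide only defines X, Y and X ∩ Y; the
    product-sum form is one plausible choice, not a quoted formula."""
    shared = profile_x.keys() & profile_y.keys()
    return sum(profile_x[r] * profile_y[r] for r in shared)

cp = {("copy", "file"): 6.0, ("directory", "file"): 3.0}
mv = {("file", "move"): 5.0, ("directory", "file"): 2.0}
print(similarity(cp, mv))
```

Documents with no relation in common get a similarity of zero, whatever the size of their profiles.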


CLUSTER SIMILAR FUNCTIONAL COMPONENTS USING HIERARCHICAL AGGLOMERATIVE CLUSTERING[2]

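[Figure: cluster hierarchy over the singleton clusters {d1}, {d2}, {d3}, {d4}, {d5}. {d1} and {d2} merge at ∂({d1},{d2}); {d3} and {d4} merge at ∂({d3},{d4}); {d3,d4} then merges with {d5} at ∂({d3,d4},{d5}).]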
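A small sketch of hierarchical agglomerative clustering over document profiles. The slide does not say how ∂ is extended from documents to clusters, so this sketch assumes group-average linkage and a product-sum document similarity; both choices, and the names `similarity` and `hac`, are assumptions.

```python
from itertools import combinations

def similarity(profile_x, profile_y):
    """Assumed document similarity: score products summed over shared relations."""
    shared = profile_x.keys() & profile_y.keys()
    return sum(profile_x[r] * profile_y[r] for r in shared)

def hac(profiles):
    """Repeatedly merge the two most similar clusters.

    Returns the merge history as (cluster_a, cluster_b, similarity) tuples.
    Cluster similarity is the average pairwise document similarity
    (group-average linkage, an assumption)."""
    def link(a, b):
        total = sum(similarity(profiles[x], profiles[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    clusters = [frozenset([doc_id]) for doc_id in profiles]
    merges = []
    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2), key=lambda pair: link(*pair))
        merges.append((a, b, link(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges

toy = {"d1": {"r1": 1.0}, "d2": {"r1": 1.0},
       "d3": {"r2": 2.0}, "d4": {"r2": 2.0}, "d5": {"r2": 1.0}}
for a, b, sim in hac(toy):
    print(sorted(a), sorted(b), sim)
```

On this toy input the merges reproduce the grouping in the figure: {d3} with {d4}, then {d3,d4} with {d5}, and {d1} with {d2}.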


INFORMATION RETRIEVAL [2]

• USER SPECIFIES A FREE TEXT QUERY
• SEARCH AND RETURN RESULTS: LINEAR I.R. USING INVERTED INDEX
• USER SATISFIED? IF NO, ALLOW USER TO TRAVERSE THROUGH THE CLUSTERED HIERARCHY


LINEAR INFORMATION RETRIEVAL[2]

[Figure: the query dq is compared with each document d1, d2, ..., dn through the similarities ∂(dq,d1), ∂(dq,d2), ..., ∂(dq,dn)]
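A sketch of how linear retrieval could use the inverted index, assuming the query has already been turned into a profile in the same way as a document and that ∂ is a sum of score products over shared relations (an assumption). The index layout, the name `linear_retrieve`, and the toy data are hypothetical.

```python
from collections import defaultdict

def linear_retrieve(query_profile, index):
    """Rank documents by summed score products over relations shared with the query.

    `index` maps relation -> {doc_id: score}; only documents that share at
    least one relation with the query are ever touched."""
    scores = defaultdict(float)
    for relation, q_score in query_profile.items():
        for doc_id, d_score in index.get(relation, {}).items():
            scores[doc_id] += q_score * d_score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

index = {
    ("copy", "file"): {"cp": 6.0},
    ("directory", "file"): {"cp": 3.0, "mv": 2.0},
    ("file", "move"): {"mv": 5.0},
}
query = {("copy", "file"): 1.0, ("directory", "file"): 1.0}
print(linear_retrieve(query, index))
```

Because the loop walks only the index entries for the query's relations, the cost depends on the query size and posting-list lengths rather than on the total number of documents.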


GURU : WORKING SYSTEM SNAPSHOT [2]


EVALUATION[2]

MAINTENANCE COST: INCREMENTAL INSERTION [3] OF NEW COMPONENTS IS EASY

EFFICIENCY: 2.5 secs on RT; 0.15 secs on IBM RISC for a query containing 5 to 15 lexical relations

RETRIEVAL EFFECTIVENESS: Contd…


EVALUATION: Precision-Recall Curve [2]

If
• c = total number of records retrieved after executing query q
• R = total number of expected correct results, determined before the query is executed
• r = total number of correct results retrieved after executing query q

Then
• Recall = r/R
• Precision = r/c
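The two ratios can be checked with a few lines of code. The function name and the sample command names below are illustrative only; returning 0.0 when a denominator is zero is a convention chosen for this sketch.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) using the slide's symbols:
    c = records retrieved, R = expected correct results, r = correct results retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    r = len(retrieved & relevant)
    c = len(retrieved)
    R = len(relevant)
    precision = r / c if c else 0.0
    recall = r / R if R else 0.0
    return precision, recall

# 4 records retrieved, 3 expected, 2 of the retrieved ones are correct.
p, rec = precision_recall(["cp", "mv", "ls", "rm"], ["cp", "mv", "rcp"])
print(p, rec)
```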


PROS:

EASY TO EXTEND TO ANY DOMAIN i.e. GENERIC APPROACH

VERY SIMPLE AND ELEGANT APPROACH

PAPER ADEQUATELY PROVIDED BACKGROUND BY DESCRIBING PAST RESEARCH


CONS: MAY FAIL WHEN FUNCTIONALLY SIMILAR COMPONENTS ARE DOCUMENTED WITH DIFFERENT VOCABULARY

E.g. ‘xcalc’ and ‘bc’


FURTHER RESEARCH:

COMBINE THE KNOWLEDGE BASE APPROACH WITH THIS TECHNIQUE. E.g. the knowledge “bc = calculator” can be added to GURU to increase recall.

IMPROVED ALGORITHMS FOR INCREMENTAL UPDATING OF INDICES.


References

• 0 - Yoelle S. Maarek and Frank A. Smadja, Full Text Indexing Based On Lexical Relations; An Application: Software Libraries.

• 1 - F. de Saussure, Cours de Linguistique Générale, Quatrième édition. Librairie Payot, Paris, France, 1949.

• 2 - Y. S. Maarek, Daniel M. Berry, Gail E. Kaiser, GURU: Information Retrieval for Reuse.

• 3 - Kaplan and Maarek, 1990: Incremental maintenance of semantic links in dynamically changing hypertext systems. Interacting with Computers.


Q & A