HyKSS: Hybrid Keyword and Semantic Search
![Page 1: HyKSS: Hybrid Keyword and Semantic Search](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813a12550346895da1eb11/html5/thumbnails/1.jpg)
HyKSS: Hybrid Keyword and Semantic Search
Andrew Zitzelberger
1
![Page 2: HyKSS: Hybrid Keyword and Semantic Search](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813a12550346895da1eb11/html5/thumbnails/2.jpg)
Keyword Search
2
![Page 3: HyKSS: Hybrid Keyword and Semantic Search](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813a12550346895da1eb11/html5/thumbnails/3.jpg)
Form Based Search
3
![Page 4: HyKSS: Hybrid Keyword and Semantic Search](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813a12550346895da1eb11/html5/thumbnails/4.jpg)
What about?
• over 8,000 meters in elevation
• less than 100K miles
• faster than 100 mph
4
![Page 5: HyKSS: Hybrid Keyword and Semantic Search](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813a12550346895da1eb11/html5/thumbnails/5.jpg)
5
![Page 6: HyKSS: Hybrid Keyword and Semantic Search](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813a12550346895da1eb11/html5/thumbnails/6.jpg)
HyKSS
• Hybrid Keyword and Semantic Search
• Semantics – extracted annotations
  – Multiple ontologies
• Keywords – text
6
![Page 7: HyKSS: Hybrid Keyword and Semantic Search](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813a12550346895da1eb11/html5/thumbnails/7.jpg)
Thesis Statement
• HyKSS (hybrid search)
  – Outperforms keyword and semantic search
  – Dynamic query weighting outperforms various other hybrid search approaches
  – Allows queries over multiple ontologies
  – Allows pay-as-you-go improvement
7
![Page 8: HyKSS: Hybrid Keyword and Semantic Search](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813a12550346895da1eb11/html5/thumbnails/8.jpg)
Extraction Ontologies
8
![Page 9: HyKSS: Hybrid Keyword and Semantic Search](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813a12550346895da1eb11/html5/thumbnails/9.jpg)
Data Frames
9
![Page 10: HyKSS: Hybrid Keyword and Semantic Search](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813a12550346895da1eb11/html5/thumbnails/10.jpg)
Indexing Architecture
10
Document Collection → Keyword Indexer → Keyword Index
Document Collection → Semantic Indexer → Semantic Index
![Page 11: HyKSS: Hybrid Keyword and Semantic Search](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813a12550346895da1eb11/html5/thumbnails/11.jpg)
Indexing Architecture Implementation
11
Keyword Indexer
Semantic Indexer
Keyword Index
Semantic Index
Document Collection
OntoES
Ontology Library
Sesame
Lucene
![Page 12: HyKSS: Hybrid Keyword and Semantic Search](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813a12550346895da1eb11/html5/thumbnails/12.jpg)
Query Processing
12
Free Form Query
• Keyword Processing: Pre-Process Query → Execute Query → Post-Process Query
• Semantic Processing: Pre-Process Query → Execute Query → Post-Process Query
Combine Results
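The flow on this slide can be sketched as two three-stage pipelines whose outputs are merged. This is only an illustrative skeleton: `Pipeline`, `process_query`, and the `combine` callback are hypothetical names, and the slides do not specify how the stages are wired together in the actual system.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Scores = Dict[str, float]  # document id -> score (assumed result shape)


@dataclass
class Pipeline:
    """One branch of the diagram: pre-process, execute, post-process."""
    pre_process: Callable[[str], object]
    execute: Callable[[object], Scores]
    post_process: Callable[[Scores], Scores]

    def run(self, free_form_query: str) -> Scores:
        prepared = self.pre_process(free_form_query)
        return self.post_process(self.execute(prepared))


def process_query(
    query: str,
    keyword: Pipeline,
    semantic: Pipeline,
    combine: Callable[[Scores, Scores], Scores],
) -> Scores:
    """Run both branches on the same free-form query, then combine results."""
    return combine(keyword.run(query), semantic.run(query))
```

Keeping `combine` as a parameter leaves the merging strategy open, so different weighting schemes can be swapped in without touching either branch.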
![Page 13: HyKSS: Hybrid Keyword and Semantic Search](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813a12550346895da1eb11/html5/thumbnails/13.jpg)
Keyword Query Pre-Processing
13
• Remove Lucene special characters (except quotes)
• Remove (inequality) comparison constraints
• Remove non-phrase stopwords
hondas in "excellent condition" in orem for under 12 grand
→ hondas "excellent condition" orem
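A minimal sketch of the three pre-processing steps, applied to the slide's example query. The stopword list and the `CONSTRAINT` pattern are small stand-ins assumed for illustration; the slides do not say how HyKSS actually detects comparison constraints or which stopwords it uses.

```python
import re

# Assumed for illustration: a tiny stopword list.
STOPWORDS = {"in", "for", "the", "a", "an", "of", "and", "or", "to"}

# Lucene query-syntax characters, excluding the double quote.
LUCENE_SPECIAL = set('+-&|!(){}[]^~*?:\\/')

# Assumed for illustration: a simple pattern for inequality constraints
# such as "under 12 grand" or "over 8,000 meters".
CONSTRAINT = re.compile(
    r"\b(?:under|over|less than|more than|at least|at most)\s+\d[\d,.]*"
    r"(?:\s*(?:k|grand|miles|mph|meters))?\b",
    re.IGNORECASE,
)


def preprocess_keyword_query(query: str) -> str:
    # Splitting on a capturing group puts quoted phrases at odd indices
    # and the free text between them at even indices.
    parts = re.split(r'"([^"]*)"', query)
    out = []
    for i, part in enumerate(parts):
        cleaned = "".join(" " if c in LUCENE_SPECIAL else c for c in part)
        if i % 2 == 1:
            # Quoted phrase: keep its words (stopwords included) and re-quote.
            phrase = " ".join(cleaned.split())
            if phrase:
                out.append(f'"{phrase}"')
            continue
        # Free text: drop comparison constraints, then stopwords.
        cleaned = CONSTRAINT.sub(" ", cleaned)
        out.extend(w for w in cleaned.split() if w.lower() not in STOPWORDS)
    return " ".join(out)


print(preprocess_keyword_query(
    'hondas in "excellent condition" in orem for under 12 grand'
))
# prints: hondas "excellent condition" orem
```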
![Page 14: HyKSS: Hybrid Keyword and Semantic Search](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813a12550346895da1eb11/html5/thumbnails/14.jpg)
Keyword Query Execution and Post-Processing
• Executed by Lucene
• Empty Post-Processing step
14
![Page 15: HyKSS: Hybrid Keyword and Semantic Search](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813a12550346895da1eb11/html5/thumbnails/15.jpg)
Semantic Query Pre-Processing: Individual Ontology Scoring
hondas in "excellent condition" in orem for under 12 grand
15
![Page 16: HyKSS: Hybrid Keyword and Semantic Search](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813a12550346895da1eb11/html5/thumbnails/16.jpg)
Semantic Query Pre-Processing: Ontology Set Creation
• For each ontology sorted by score:
  – For each remaining ontology:
    • Add point for each new or subsuming match
    • If added points > 0 add ontology
• Completely subsumed ontologies are removed during query generation
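One possible reading of this greedy selection is sketched below. The slide does not define "new" or "subsuming" precisely, so the sketch assumes each match is a character span in the query: a match is "new" if it overlaps nothing already covered, and "subsuming" if it strictly contains an already covered span. The function names, the span representation, and the example offsets are all hypothetical.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets of a match in the query


def subsumes(a: Span, b: Span) -> bool:
    """True if span a covers span b and the two are not identical."""
    return a[0] <= b[0] and b[1] <= a[1] and a != b


def overlaps(a: Span, b: Span) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def select_ontologies(
    scores: Dict[str, float], matches: Dict[str, List[Span]]
) -> List[str]:
    """Visit ontologies from highest to lowest score; keep each one that
    contributes at least one new or subsuming match."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    selected: List[str] = []
    covered: List[Span] = []
    for name in ranked:
        points = 0
        for span in matches.get(name, []):
            is_new = not any(overlaps(span, c) for c in covered)
            is_subsuming = any(subsumes(span, c) for c in covered)
            if is_new or is_subsuming:
                points += 1
        if points > 0:
            selected.append(name)
            covered.extend(matches[name])
    return selected


# Illustrative scores and spans for the running query
# ("hondas" at 0-6, "orem" at 35-39, "under 12 grand" at 44-58).
scores = {"Vehicle": 3.0, "ContractualServices": 2.0, "Location": 1.0}
matches = {
    "Vehicle": [(0, 6), (44, 58)],
    "ContractualServices": [(44, 58)],
    "Location": [(35, 39)],
}
print(select_ontologies(scores, matches))  # prints: ['Vehicle', 'Location']
```

Under these assumptions, ContractualServices is skipped because its only match duplicates one that Vehicle already covers.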
16
![Page 17: HyKSS: Hybrid Keyword and Semantic Search](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813a12550346895da1eb11/html5/thumbnails/17.jpg)
Semantic Query Pre-Processing: Ontology Set Creation
17
[Diagram: ontology set creation for the running query. Ontologies shown: Vehicle, ContractualServices, Location. Matched constraints: Price < 12000, US_City="orem". Score annotations: Vehicle_Score + 1, ContractualServices_Score + 1.]
![Page 18: HyKSS: Hybrid Keyword and Semantic Search](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813a12550346895da1eb11/html5/thumbnails/18.jpg)
Semantic Query Pre-Processing: Structured Query Generation
• Open world assumption
• SPARQL query
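One way to reflect an open-world reading in SPARQL is to wrap each constraint in an `OPTIONAL` block, so that a document missing an extracted value is not excluded outright. The sketch below builds such a query string under that assumption; the `ex:` prefix, the `ex:Document` class, the property names, and the `build_sparql` helper are all hypothetical, and the slides do not show the queries HyKSS actually generates.

```python
from typing import List, Optional, Tuple

# (property name, variable name, optional FILTER expression)
Constraint = Tuple[str, str, Optional[str]]


def build_sparql(constraints: List[Constraint]) -> str:
    """Build a SELECT query with one OPTIONAL block per constraint."""
    variables = " ".join(f"?{var}" for _, var, _ in constraints)
    lines = [
        "PREFIX ex: <http://example.org/hypothetical#>",  # placeholder namespace
        f"SELECT ?doc {variables}".rstrip(),
        "WHERE {",
        "  ?doc a ex:Document .",
    ]
    for prop, var, filter_expr in constraints:
        body = f"?doc ex:{prop} ?{var} ."
        if filter_expr:
            body += f" FILTER({filter_expr})"
        lines.append(f"  OPTIONAL {{ {body} }}")
    lines.append("}")
    return "\n".join(lines)


query = build_sparql([
    ("Make", "make", 'regex(?make, "honda", "i")'),
    ("Price", "price", "?price < 12000"),
    ("US_City", "city", 'regex(?city, "^orem$", "i")'),
])
print(query)
```

With every constraint optional, all documents are returned and the variables left unbound show which constraints a document failed to satisfy, which a later ranking step can use.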
18
![Page 19: HyKSS: Hybrid Keyword and Semantic Search](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813a12550346895da1eb11/html5/thumbnails/19.jpg)
Semantic Query Execution and Post-Processing
• Sesame query execution
• Semantic ranking:
  – 1 point for each requested projection satisfied
  – Normalized by # of projections requested
hondas in "excellent condition" in orem for under 12 grand
  – Projections on Make, Price and US_City
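The ranking rule amounts to the fraction of requested projections that a result satisfies. A minimal sketch, assuming a result is represented as a mapping from projection name to extracted value (the `semantic_score` name and that representation are illustrative, not taken from the slides):

```python
from typing import Iterable, Mapping, Optional


def semantic_score(
    requested: Iterable[str], extracted: Mapping[str, Optional[object]]
) -> float:
    """One point per requested projection with a value, divided by the
    number of projections requested."""
    requested = list(requested)
    if not requested:
        return 0.0  # assumption: no projections requested means no semantic evidence
    satisfied = sum(1 for name in requested if extracted.get(name) is not None)
    return satisfied / len(requested)


# A result that satisfies Make and US_City but has no Price scores 2/3.
score = semantic_score(
    ["Make", "Price", "US_City"],
    {"Make": "Honda", "US_City": "orem"},
)
```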
19
![Page 20: HyKSS: Hybrid Keyword and Semantic Search](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813a12550346895da1eb11/html5/thumbnails/20.jpg)
Hybrid Query Processing
• Linear interpolation:
  – (kw_weight * kw_score) + (sm_weight * sm_score)
• Dynamic solution:
  – # keywords remaining (#kw)
  – concept match score (cms) = ½ * (selections + projections)
  – kw_weight = #kw/(#kw + cms)
  – sm_weight = cms/(#kw + cms)
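The formulas on this slide translate directly into code. The sketch below follows them term by term; the function names and the equal-weight fallback for an empty query are assumptions, since the slide does not cover the case where #kw + cms is zero.

```python
from typing import Tuple


def dynamic_weights(
    num_keywords: int, selections: int, projections: int
) -> Tuple[float, float]:
    """Return (kw_weight, sm_weight) from the counts of remaining keywords,
    selection constraints, and projections."""
    cms = 0.5 * (selections + projections)  # concept match score
    total = num_keywords + cms
    if total == 0:
        return 0.5, 0.5  # assumption: split evenly when nothing was matched
    return num_keywords / total, cms / total


def hybrid_score(
    kw_score: float, sm_score: float, kw_weight: float, sm_weight: float
) -> float:
    """Linear interpolation of the keyword and semantic scores."""
    return kw_weight * kw_score + sm_weight * sm_score


# Illustrative counts: 3 remaining keywords, 1 selection, 3 projections.
kw_weight, sm_weight = dynamic_weights(3, 1, 3)  # cms = 2, weights 0.6 and 0.4
combined = hybrid_score(0.5, 1.0, kw_weight, sm_weight)
```

Because the two weights always sum to 1, a query made mostly of recognized concepts leans on the semantic score, while a query made mostly of leftover keywords leans on the keyword score.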
20
![Page 21: HyKSS: Hybrid Keyword and Semantic Search](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813a12550346895da1eb11/html5/thumbnails/21.jpg)
Basic Search
21
![Page 22: HyKSS: Hybrid Keyword and Semantic Search](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813a12550346895da1eb11/html5/thumbnails/22.jpg)
Results Display
22
![Page 23: HyKSS: Hybrid Keyword and Semantic Search](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813a12550346895da1eb11/html5/thumbnails/23.jpg)
23
Form Based Search
![Page 24: HyKSS: Hybrid Keyword and Semantic Search](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813a12550346895da1eb11/html5/thumbnails/24.jpg)
Results Display
![Page 25: HyKSS: Hybrid Keyword and Semantic Search](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813a12550346895da1eb11/html5/thumbnails/25.jpg)
Experimental Setup – Ontology Libraries
• 5 Ontology Levels
  – Number
  – Generic Units
  – Vehicle Units
  – Vehicle
  – Vehicle+
25
![Page 26: HyKSS: Hybrid Keyword and Semantic Search](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813a12550346895da1eb11/html5/thumbnails/26.jpg)
Experimental Setup – Query Sets
• 113 syntactically unique queries from database students
• 60 syntactically unique queries from linguistics students
26
![Page 27: HyKSS: Hybrid Keyword and Semantic Search](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813a12550346895da1eb11/html5/thumbnails/27.jpg)
Experimental Setup – Document Collection
• 250 vehicle advertisements (Craigslist)
  – 100 training, 50 validation, 100 test
• 318 mountain pages (Wikipedia)
• 66 roller coaster pages (Wikipedia)
• 88 video game advertisements (Craigslist)
27
![Page 28: HyKSS: Hybrid Keyword and Semantic Search](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813a12550346895da1eb11/html5/thumbnails/28.jpg)
Experiments
1) Training queries over test vehicle documents
2) Test queries over test vehicle documents
3) Training queries over test vehicle documents + additional noise
4) Test queries over test vehicle documents + additional noise
5) 5 queries over noisy data (Generic Units only)
28
![Page 29: HyKSS: Hybrid Keyword and Semantic Search](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813a12550346895da1eb11/html5/thumbnails/29.jpg)
Experiments - Metric
• Mean Average Precision
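For reference, a short sketch of the standard definition of mean average precision: for each query, average the precision at every rank where a relevant document appears, divide by the number of relevant documents, then average across queries. The function names and the data layout are illustrative; the slides do not show the evaluation code used in the experiments.

```python
from typing import Iterable, List, Set, Tuple


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average of precision@k over the ranks k that hold a relevant document."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs: Iterable[Tuple[List[str], Set[str]]]) -> float:
    """Mean of average precision over (ranked list, relevant set) pairs."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
    (["d2", "d1"], {"d1"}),                    # AP = 1/2
]
map_score = mean_average_precision(runs)
```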
29
![Page 30: HyKSS: Hybrid Keyword and Semantic Search](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813a12550346895da1eb11/html5/thumbnails/30.jpg)
Experimental Results
30
![Page 31: HyKSS: Hybrid Keyword and Semantic Search](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813a12550346895da1eb11/html5/thumbnails/31.jpg)
Experimental Results
31
![Page 32: HyKSS: Hybrid Keyword and Semantic Search](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813a12550346895da1eb11/html5/thumbnails/32.jpg)
Experimental Results
32
![Page 33: HyKSS: Hybrid Keyword and Semantic Search](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813a12550346895da1eb11/html5/thumbnails/33.jpg)
Conclusions
• Hybrid search outperforms keyword and semantic search
• HyKSS’s dynamic query weighting approach outperforms various other weighting techniques
• Using multiple ontologies does not outperform selecting and using a single ontology
33
![Page 34: HyKSS: Hybrid Keyword and Semantic Search](https://reader035.vdocuments.site/reader035/viewer/2022062517/56813a12550346895da1eb11/html5/thumbnails/34.jpg)
External Image Citations
• Slide 2 Google search screenshot: http://www.google.com (07/30/11)
• Slide 3 partial car search form screenshots: http://autotrader.com/fyc (07/30/11)
• Slide 4 mountain image: http://en.wikipedia.org/wiki/Lhotse (04/26/11)
• Slide 4 car image: http://en.wikipedia.org/wiki/Honda (04/26/11)
• Slide 4 roller coaster image: http://en.wikipedia.org/wiki/Kingda_Ka (04/26/11)
• Slide 4 Wikipedia logo: http://en.wikipedia.org/wiki/Main_Page (04/26/11)
• Slide 4 craigslist logo: http://provo.craigslist.org/ (04/26/11)
34