Ontology-Based Free-Form Query Processing for the Semantic Web
Mark Vickers
Brigham Young University
MS Thesis Defense
Supported by:
Presentation Overview
• Web Queries
• Explanation of AskOntos
• Demo
• Evaluation
• Future Work and Conclusion
Web Queries: Challenges
Example: Searching for a car
• Cannot specify constraints
• Documents returned (usually too many)
• Takes time to read through documents
• Determine relevance
• Find information (price, year, etc.)
Web Queries: Opportunities
• Semantic web
  • Proposed ontology-based framework for making information machine-readable
  • Uses markup languages to identify information
  • “[A] search program can look for only those pages that refer to a precise concept…” - Tim Berners-Lee
• How should the semantic web be searched?
Solution: AskOntos – a Query System for the Semantic Web
• Allows free-form queries over semantically annotated pages
• Processes queries using information extraction
• Returns tables of extracted values
AskOntos Overview
Extraction Ontologies
Object sets
Relationship sets
Participation constraints
Lexical
Non-lexical
Primary object set
Aggregation
Generalization/Specialization
Extraction Ontologies
Data Frame:
  Internal Representation: float
  Value Phrase
    Value Expression: \s*[$]\s*(\d{1,3})*(\.\d{2})?
    Left Context: $
  Key Word Phrase
    Key Word Expression: ([Pp]rice)|([Cc]ost)| …
  Operation Phrase
    Operator: >
    Expression: (more\s*than)|(more\s*costly)|…
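As a rough illustration of how the expressions in a data frame might be applied to text, the Python sketch below compiles the value, key word, and operation expressions from this slide and reports which of them fire. The function name `recognize` and the sample strings are hypothetical, the elided alternatives ("…") are simply left out, and nothing here is taken from the AskOntos implementation itself.

```python
import re

# Patterns copied from the slide; the elided alternatives ("…") are omitted.
VALUE_EXPRESSION = re.compile(r"\s*[$]\s*(\d{1,3})*(\.\d{2})?")
KEYWORD_EXPRESSION = re.compile(r"([Pp]rice)|([Cc]ost)")
OPERATION_EXPRESSION = re.compile(r"(more\s*than)|(more\s*costly)")


def recognize(text):
    """Report which data-frame expressions fire on text (hypothetical helper)."""
    value = VALUE_EXPRESSION.search(text)
    keyword = KEYWORD_EXPRESSION.search(text)
    operation = OPERATION_EXPRESSION.search(text)

    # Internal representation is float: drop the "$" and surrounding spaces.
    digits = value.group(0).strip().lstrip("$").strip() if value else ""

    return {
        "value": value.group(0).strip() if value else None,
        "internal": float(digits) if digits else None,
        "keyword": keyword.group(0) if keyword else None,
        "operator": ">" if operation else None,
    }


print(recognize("price more than $4500"))
```

For the sample string, all three expressions match: the value phrase yields `$4500` (stored as `4500.0`), the key word phrase yields `price`, and the operation phrase maps to the `>` operator.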
Annotating Web Pages
Step 1. Parse Query
“Find me the price and mileage of all red Nissans – I want a 1996 or newer”
Matched phrases: price, mileage, red, Nissan, 1996, or newer (>= operator)
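The parsing step can be pictured as running every recognizer over the query and recording what matched. The sketch below is a minimal Python illustration under stated assumptions: the three dictionaries are toy stand-ins for the key word, value, and operation expressions of a car-ads ontology, and `parse_query` is a hypothetical name, not part of AskOntos.

```python
import re

# Hypothetical recognizers; real data frames would be far richer than this.
KEYWORDS = {"Price": r"\bprice\b", "Mileage": r"\bmileage\b"}
VALUES = {
    "Color": r"\b(red|blue|green)\b",
    "Make": r"\b(Nissan|Toyota|Ford)",
    "Year": r"\b(19|20)\d{2}\b",
}
OPERATORS = {">=": r"\bor\s+newer\b"}


def parse_query(query):
    """Return (requested object sets, matched values, matched operators)."""
    requested = [
        name for name, pattern in KEYWORDS.items()
        if re.search(pattern, query, re.IGNORECASE)
    ]
    values = {}
    for name, pattern in VALUES.items():
        match = re.search(pattern, query, re.IGNORECASE)
        if match:
            values[name] = match.group(0)
    operators = [
        op for op, pattern in OPERATORS.items()
        if re.search(pattern, query, re.IGNORECASE)
    ]
    return requested, values, operators


query = "Find me the price and mileage of all red Nissans – I want a 1996 or newer"
print(parse_query(query))
```

On the example query this marks Price and Mileage as requested, finds the values red, Nissan, and 1996, and recognizes "or newer" as the `>=` operator.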
Step 2. Find Related Ontology
Similarity value: 5
Similarity value: 2
“Find me the price and mileage of all red Nissans – I want a 1996 or newer”
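One simple way to picture the ontology selection is to score each ontology by how many of its recognizers match the query and keep the highest score. The slides give only the two similarity values, not the formula, so the match-count scoring below is an assumption; the ontology names and toy patterns are also hypothetical and were chosen so that the counts coincide with the 5 and 2 shown above.

```python
import re

# Hypothetical recognizers per ontology (assumption, not from AskOntos).
ONTOLOGIES = {
    "car ads": [
        r"\bprice\b", r"\bmileage\b", r"\bred\b",
        r"\bNissan", r"\b(19|20)\d{2}\b",
    ],
    "house ads": [r"\bprice\b", r"\bbedrooms?\b", r"\b(19|20)\d{2}\b"],
}


def similarity(query, patterns):
    """Assumed similarity value: the number of recognizers that match."""
    return sum(1 for p in patterns if re.search(p, query, re.IGNORECASE))


def best_ontology(query):
    """Return the top-scoring ontology name and all scores."""
    scores = {name: similarity(query, pats) for name, pats in ONTOLOGIES.items()}
    return max(scores, key=scores.get), scores


query = "Find me the price and mileage of all red Nissans – I want a 1996 or newer"
print(best_ontology(query))
```

With these toy patterns the car-ads ontology scores 5 and the house-ads ontology scores 2, so car ads is selected.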
Step 3. Formulate XQuery Expression
• Conjunctive and aggregate queries run over selected ontology’s extracted values
• Value-phrase-matching words determine conditions
• Conditions:
  Color = “red”
  Make = “Nissan”
  Year >= 1996 (>= operator)
Step 3. Formulate XQuery Expression
• For
• Let
• Where
• Return
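To make the Where and Return clauses concrete, the following Python sketch assembles them as strings from a list of conditions and a list of requested names. It follows the clause shape that appears in the Query Translation Metrics slide (each condition paired with an `empty(...)` alternative); the helper names `where_clause` and `return_clause` are hypothetical, and the For and Let clauses are not generated because the source elides them.

```python
def where_clause(conditions):
    """Build a conjunctive where clause from (name, operator, value) triples."""
    parts = [
        f'(${name}{op}"{value}" or empty(${name}))'
        for name, op, value in conditions
    ]
    return "where " + " and ".join(parts)


def return_clause(names):
    """Build a return clause that emits one element per requested name."""
    fields = " ".join(f"<{n}>{{${n}}}</{n}>" for n in names)
    return f'return <Record ID="{{$id}}"> {fields} </Record>'


conditions = [("Color", "=", "red"), ("Make", "=", "Nissan"), ("Year", ">=", "1996")]
print(where_clause(conditions))
print(return_clause(["Price", "Mileage"]))
```

Keeping the conditions as triples means the same list can drive both the generated query text and any later comparison against a hand-written query.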
Step 4. Run XQuery Expression Over Ontology’s Extracted Data
• Uses Qexo 1.7, GNU’s XQuery engine for Java
• Orders results according to number of values
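The ordering of results can be sketched as a sort keyed on how many fields of each record hold a value. The slide does not say which direction is used, so placing the most complete records first is an assumption, and `order_by_value_count` and the sample records are hypothetical.

```python
def order_by_value_count(records):
    """Sort records so those with the most non-empty values come first.

    Descending order is an assumption; ties keep their original order
    because Python's sort is stable.
    """
    def value_count(record):
        return sum(1 for v in record.values() if v not in (None, ""))

    return sorted(records, key=value_count, reverse=True)


records = [
    {"Price": "4500", "Mileage": None, "Color": "red"},
    {"Price": "5200", "Mileage": "80000", "Color": "red"},
    {"Price": None, "Mileage": None, "Color": "red"},
]
for record in order_by_value_count(records):
    print(record)
```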
Demo
Evaluation of AskOntos
• Success measure: ability to translate free-form queries into formal queries
• Extraction ontologies: car ads, house ads, countries, movies, and diamond ads
• 3 rounds of testing
  • 50 queries each (gathered from other CS students)
  • 1st round discarded due to queries
  • Minor improvements on system between rounds
Query Translation Metrics
“Find me the price and mileage of all red Nissans – I want a 1996 or newer.”

Automated conversion:
    for $doc in document("file:///.../Car.OWL")/rdf:RDF
    for $Record in $doc/owl:Thing
    …
    where ($Color="red" or empty($Color))
      and ($Make="Nissan" or empty($Make))
      and ($Year="1996" or empty($Year))
    return <Record ID="{$id}"> <Price>{$Price}</Price> <Color>{$Color}</Color> <Make>{$Make}</Make> <Year>{$Year}</Year> </Record>
  Return-Clause Names: {Price, Color, Make, Year}
  Conditions: {(Color,=,“red”), (Make,=,“Nissan”), (Year,=,“1996”)}

Human conversion:
  Return-Clause Names: {Price, Mileage, Color, Make, Year}
  Conditions: {(Color,=,“red”), (Make,=,“Nissan”), (Year,>=,“1996”)}

                      Precision   Recall
Return-Clause Names   100%        80%
Conditions            66%         66%
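The precision and recall figures on this slide can be reproduced by treating each conversion as a set and comparing the two. The Python sketch below is a worked check, assuming the four-name set is the automated output and the five-name set is the human one (which is what the 100% and 80% figures imply); `precision_recall` is a hypothetical helper and expects non-empty sets.

```python
def precision_recall(automated, human):
    """Precision = correct / automated items; recall = correct / human items."""
    correct = len(automated & human)
    return correct / len(automated), correct / len(human)


auto_names = {"Price", "Color", "Make", "Year"}
human_names = {"Price", "Mileage", "Color", "Make", "Year"}

auto_conditions = {("Color", "=", "red"), ("Make", "=", "Nissan"), ("Year", "=", "1996")}
human_conditions = {("Color", "=", "red"), ("Make", "=", "Nissan"), ("Year", ">=", "1996")}

print(precision_recall(auto_names, human_names))            # (1.0, 0.8)
print(precision_recall(auto_conditions, human_conditions))  # 2/3 for both
```

All four automated names are correct but Mileage is missed, giving 100% precision and 80% recall. Two of the three conditions agree (the Year operator differs), giving 2/3 for both measures, which the slide shows as 66%.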
Results
Result Analysis
Common reasons for errors:
1. Word not in lexicon:
   “5 Bedrooms, 3 Bath, study, game room, 2 car garage, and < $250,000”
2. Mistakes in regular expressions:
   “Which countries use the euro?”
3. Not enough context:
   “What are the models from 2005”
Conclusion/Contributions
• AskOntos
  • Is a free-form query system for the semantic web
  • Applies information extraction for query processing
  • Answers questions with extracted data values
• Contributions
  • Web queries that use semantic annotations
  • Web queries returning answers from extracted data
  • Processing free-form queries using ontologies
Future Work
• Disjunction and negation
• Fuzzy queries
• Spellchecker
TREC 2004 QA Question Topics
Related Research

QUEST (1999)
  Similarities:
  • Uses ontologies
  Differences:
  • Graphic-based interface
  • Returns generated documents and graphs

SHOE (2000)
  Similarities:
  • Returns tables of data
  Differences:
  • Form-based interface

AQUA (2004)
  Similarities:
  • Natural language interface
  • Uses ontology as part of query translation process
  Differences:
  • For single domain environment
  • Part-of-speech recognition
  • Uses ontology for term replacement
  • Returns passages

Bernstein et al. (2005)
  Similarities:
  • Natural language interface
  Differences:
  • Allows only subset of English (Attempto Controlled English) queries

SWSE (2005)
  Similarities:
  • Natural language interface
  • Returns semantically annotated data
  • No part-of-speech recognition
  Differences:
  • Query context found by matching RDF labels, comments and literals
  • Uses WordNet

NaLIX (2006)
  Similarities:
  • Converts natural language query to same XML query language
  Differences:
  • Limited to parsing ability of MINIPAR
  • For XML database
  • Query terms expanded with WordNet
Simple Multiple-Record Documents
Genealogy Domain – from Troy Walker’s thesis

VSM Separator
document   records  returned  correct  precision  recall
simple1    19       20        19       95.00%     100.00%
simple2    19       17        17       100.00%    89.47%
simple3    11       11        11       100.00%    100.00%
simple4    9        9         9        100.00%    100.00%
simple5    12       13        11       84.62%     91.67%
simple6    12       11        10       90.91%     83.33%
simple7    14       10        10       100.00%    71.43%
simple8    5        7         5        71.43%     100.00%
simple9    14       14        14       100.00%    100.00%
simple10   15       15        15       100.00%    100.00%
Total      130      127       121      95.28%     93.08%

Highest-Fanout Separator
document   records  returned  correct  precision  recall
simple1    19       22        19       86.36%     100.00%
simple2    19       20        0        0.00%      0.00%
simple3    11       14        11       78.57%     100.00%
simple4    9        10        9        90.00%     100.00%
simple5    12       16        12       75.00%     100.00%
simple6    12       23        9        39.13%     75.00%
simple7    14       22        13       59.09%     92.86%
simple8    5        10        0        0.00%      0.00%
simple9    14       16        14       87.50%     100.00%
simple10   15       16        0        0.00%      0.00%
Total      130      169       87       51.48%     66.92%
Complex Multiple-Record Documents

document   records  returned  missed  extra  correct  precision  recall
complex1   10       10        0       0      10       100.00%    100.00%
complex2   15       15        0       0      15       100.00%    100.00%
complex3   12       12        0       0      12       100.00%    100.00%
complex4   7        9         1       3      6        66.67%     85.71%
complex5   16       15        1       0      15       100.00%    93.75%
complex6   15       16        2       3      13       81.25%     86.67%
complex7   13       12        1       0      12       100.00%    92.31%
complex8   10       10        0       0      10       100.00%    100.00%
complex9   19       20        1       2      18       90.00%     94.74%
complex10  10       10        1       1      9        90.00%     90.00%
complex11  15       11        4       0      11       100.00%    73.33%
complex12  15       15        0       0      15       100.00%    100.00%
complex13  11       11        0       0      11       100.00%    100.00%
complex14  16       18        1       3      15       83.33%     93.75%
complex15  8        8         2       2      6        75.00%     75.00%
complex16  8        9         0       1      8        88.89%     100.00%
complex17  10       11        0       0      11       100.00%    110.00%
complex18  4        1         3       0      1        100.00%    25.00%
complex19  8        11        0       3      8        72.73%     100.00%
complex20  16       13        4       1      12       92.31%     75.00%
Total      238      237       21      19     218      91.98%     91.60%
Scaling to the Web
• Ontologies crawl and harvest web pages
• Ontologies extract values from pages
• Ontologies indexed
• Queries extracted by relevant ontologies
• Rely on Google-like technology