NetIKX Semantic Search Presentation
![Page 1: NetIKX Semantic Search Presentation](https://reader033.vdocuments.site/reader033/viewer/2022051608/5441ef1dafaf9f52208b4839/html5/thumbnails/1.jpg)
Semantic Search
Ready to Use?
Dr Victoria Uren
![Page 2: NetIKX Semantic Search Presentation](https://reader033.vdocuments.site/reader033/viewer/2022051608/5441ef1dafaf9f52208b4839/html5/thumbnails/2.jpg)
Motivation
“A little semantics goes a long way”
Jim Hendler
“The classic keyword search box exerts a powerful gravitational pull.
Academics and industry researchers need to achieve the intellectual
‘escape velocity’ necessary to revolutionize search. They must invest
much more in bold strategies that can achieve natural-language
searching and answering, rather than providing the electronic
equivalent of the index at the back of a reference book.”
Oren Etzioni, Search needs a shake-up, Nature, 4 Aug. 2011, v.476,
pp. 25-26
![Page 3: NetIKX Semantic Search Presentation](https://reader033.vdocuments.site/reader033/viewer/2022051608/5441ef1dafaf9f52208b4839/html5/thumbnails/3.jpg)
Plan
Introduction - What is semantic search?
Research Background
How it works
Interface types
Research Issues
What is usable?
For web search
For corporate data management
![Page 4: NetIKX Semantic Search Presentation](https://reader033.vdocuments.site/reader033/viewer/2022051608/5441ef1dafaf9f52208b4839/html5/thumbnails/4.jpg)
Introduction
![Page 5: NetIKX Semantic Search Presentation](https://reader033.vdocuments.site/reader033/viewer/2022051608/5441ef1dafaf9f52208b4839/html5/thumbnails/5.jpg)
Search as we know it
Full text search
TF-IDF & other statistical approaches
PageRank – exploiting hyperlink graph
Controlled term search
OPAC
MESH etc.
Other metadata
Date of publication, author etc.
Output typically ranked pages, records, documents
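As a rough illustration of the statistical ranking named on this slide, here is a minimal TF-IDF sketch in Python. The three-document corpus, the function names and the particular weighting (term frequency times log(N/df)) are assumptions for illustration and are not taken from the talk.

```python
import math
from collections import Counter

# Hypothetical toy corpus (not from the talk).
DOCS = {
    "d1": "semantic search over linked data",
    "d2": "keyword search over web pages",
    "d3": "linked data for corporate data management",
}

def tf_idf_scores(query, docs):
    """Score each document by the sum of tf * idf over the query terms."""
    tokenized = {name: text.split() for name, text in docs.items()}
    n_docs = len(tokenized)
    scores = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        score = 0.0
        for term in query.split():
            # Document frequency: how many documents contain the term.
            df = sum(1 for toks in tokenized.values() if term in toks)
            if df == 0:
                continue  # term appears nowhere, so it contributes nothing
            tf = counts[term] / len(tokens)
            idf = math.log(n_docs / df)
            score += tf * idf
        scores[name] = score
    return scores

def rank(query, docs):
    """Return document names, best match first."""
    scores = tf_idf_scores(query, docs)
    return sorted(scores, key=scores.get, reverse=True)
```

Calling `rank("linked data", DOCS)` returns ranked document names, which matches the slide's point that the output is typically a ranked list of pages, records or documents rather than facts.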
![Page 6: NetIKX Semantic Search Presentation](https://reader033.vdocuments.site/reader033/viewer/2022051608/5441ef1dafaf9f52208b4839/html5/thumbnails/6.jpg)
Semantic Search
Classic IR perspective
Improve statistical / link-based search of documents and webpages
by better understanding the user’s information need
Resolve ambiguity
Clustering
Query expansion
Past searches, WordNet etc. to suggest related terms
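Query expansion can be sketched in a few lines of Python. The related-term table below is a hypothetical stand-in for WordNet or a log of past searches; the terms and the function name are assumptions for illustration only.

```python
# Hypothetical related-term table standing in for WordNet or past-search logs.
RELATED_TERMS = {
    "car": ["automobile", "vehicle"],
    "seat": ["chair"],
}

def expand_query(query, related=RELATED_TERMS):
    """Return the original terms followed by any related terms, without duplicates."""
    expanded = []
    for term in query.lower().split():
        for candidate in [term] + related.get(term, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded
```

For example, `expand_query("car seat")` keeps the user's own terms first and appends the suggested ones, so a search engine could weight the originals more heavily if it chose to.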
![Page 7: NetIKX Semantic Search Presentation](https://reader033.vdocuments.site/reader033/viewer/2022051608/5441ef1dafaf9f52208b4839/html5/thumbnails/7.jpg)
![Page 8: NetIKX Semantic Search Presentation](https://reader033.vdocuments.site/reader033/viewer/2022051608/5441ef1dafaf9f52208b4839/html5/thumbnails/8.jpg)
![Page 9: NetIKX Semantic Search Presentation](https://reader033.vdocuments.site/reader033/viewer/2022051608/5441ef1dafaf9f52208b4839/html5/thumbnails/9.jpg)
Semantic Search
Web 3.0 perspective
Improve search over machine understandable data which
may, or may not, include annotated documents
Search for entities (people, products …)
Search for facts (capital of Georgia?)
Fuse knowledge from different sources
Exploit structure of formal knowledge
Broader / narrower plus much more
![Page 10: NetIKX Semantic Search Presentation](https://reader033.vdocuments.site/reader033/viewer/2022051608/5441ef1dafaf9f52208b4839/html5/thumbnails/10.jpg)
Web 3.0 Search is
Metadata search
So more like
Searching a relational database
E.g. an OPAC
Search of the deep web
BUT linked data is “heterogeneous”
Multiple domains mixed together
Microformats & RDFa are from multiple sources
Quality & consistency variable
![Page 11: NetIKX Semantic Search Presentation](https://reader033.vdocuments.site/reader033/viewer/2022051608/5441ef1dafaf9f52208b4839/html5/thumbnails/11.jpg)
Benefits of Semantic Search
Machine understandability
i.e. controlled by “ontologies” so you can reason over it
Supports entity search
Ambiguity
Seat/SEAT
Broader/narrower
Exploiting hierarchical class relations
Complex queries over triples
E.g. Joint between mild steel and stainless steel
Heterogeneity
Mappings between ontologies (silo bridging)
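The broader / narrower benefit can be illustrated with a small Python sketch: a search for a class also returns individuals asserted under any narrower class. The class hierarchy and the individuals below are hypothetical, loosely inspired by the steel example on this slide, and the function names are assumptions.

```python
# Hypothetical class hierarchy: child -> parent (not from the talk).
SUBCLASS_OF = {
    "MildSteel": "Steel",
    "StainlessSteel": "Steel",
    "Steel": "Metal",
    "Aluminium": "Metal",
}

# Hypothetical instance data: individual -> asserted class.
INSTANCE_OF = {
    "plate_17": "MildSteel",
    "pipe_04": "StainlessSteel",
    "sheet_09": "Aluminium",
}

def broader(cls):
    """All ancestors of a class, nearest first."""
    ancestors = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        ancestors.append(cls)
    return ancestors

def instances_of(cls):
    """Individuals whose asserted class is cls or any narrower class."""
    return sorted(
        item for item, asserted in INSTANCE_OF.items()
        if asserted == cls or cls in broader(asserted)
    )
```

A query for `instances_of("Steel")` finds both the mild steel and the stainless steel items even though neither is labelled "Steel" directly, which a plain keyword match on the class name would miss.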
![Page 12: NetIKX Semantic Search Presentation](https://reader033.vdocuments.site/reader033/viewer/2022051608/5441ef1dafaf9f52208b4839/html5/thumbnails/12.jpg)
Research Systems
![Page 13: NetIKX Semantic Search Presentation](https://reader033.vdocuments.site/reader033/viewer/2022051608/5441ef1dafaf9f52208b4839/html5/thumbnails/13.jpg)
Formal queries over RDF
SQL-like languages
SPARQL, SeRQL
XPath-like languages
XQuery, RPath
Others
Metalog (controlled English)
F-logic
RDF-QBE (query by example)
James Bailey et al., Web and Semantic Web Query
Languages: A Survey. Reasoning Web 2005: 35-133
![Page 14: NetIKX Semantic Search Presentation](https://reader033.vdocuments.site/reader033/viewer/2022051608/5441ef1dafaf9f52208b4839/html5/thumbnails/14.jpg)
Sample SPARQL
SELECT ?x
WHERE { ?x <http://www.w3.org/2001/vcard-rdf/3.0#FN> "John Smith" }
PREFIX vcard: <http://www.w3.org/2001/vcard-rdf/3.0#>
SELECT ?y ?givenName
WHERE { ?y vcard:Family "Smith" .
        ?y vcard:Given ?givenName . }
Examples from http://jena.sourceforge.net/ARQ/Tutorial/
Triple pattern order: Subject, Predicate, Object
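To show what evaluating such triple patterns involves, here is a minimal pure-Python sketch of conjunctive pattern matching over an in-memory list of triples. It is not how a SPARQL engine is implemented; the `ex:` identifiers, the data and the function names are hypothetical, and only the vCard namespace comes from the slide.

```python
VCARD = "http://www.w3.org/2001/vcard-rdf/3.0#"

# Hypothetical data as (subject, predicate, object) triples.
TRIPLES = [
    ("ex:JohnSmith", VCARD + "Family", "Smith"),
    ("ex:JohnSmith", VCARD + "Given", "John"),
    ("ex:BeckySmith", VCARD + "Family", "Smith"),
    ("ex:BeckySmith", VCARD + "Given", "Rebecca"),
    ("ex:SarahJones", VCARD + "Family", "Jones"),
    ("ex:SarahJones", VCARD + "Given", "Sarah"),
]

def match(pattern, triple, bindings):
    """Extend bindings so that pattern equals triple; return None on failure."""
    new = dict(bindings)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # Variable: bind it, or check it agrees with an earlier binding.
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None  # constant term does not match
    return new

def query(patterns, triples):
    """Evaluate a conjunction of triple patterns, one pattern at a time."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = match(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions

# The second sample query, written as two patterns sharing the variable ?y.
SMITH_QUERY = [
    ("?y", VCARD + "Family", "Smith"),
    ("?y", VCARD + "Given", "?givenName"),
]
```

Running `query(SMITH_QUERY, TRIPLES)` yields one binding set per person whose family name is "Smith", because the shared variable `?y` forces both patterns to match the same subject.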
![Page 15: NetIKX Semantic Search Presentation](https://reader033.vdocuments.site/reader033/viewer/2022051608/5441ef1dafaf9f52208b4839/html5/thumbnails/15.jpg)
Interfaces for Query Generation
Keyword
Forms
Graph based
Question answering
Tabular browsers
![Page 16: NetIKX Semantic Search Presentation](https://reader033.vdocuments.site/reader033/viewer/2022051608/5441ef1dafaf9f52208b4839/html5/thumbnails/16.jpg)
Keyword based
Aims to be as close as possible to Google-like keyword search
Pluses
Minimal learning curve for users
Can handle heterogeneity
Minus
Query complexity is limited to entity search and simple triples
![Page 17: NetIKX Semantic Search Presentation](https://reader033.vdocuments.site/reader033/viewer/2022051608/5441ef1dafaf9f52208b4839/html5/thumbnails/17.jpg)
SemSearch
Y. Lei, V. Uren, and E. Motta, A Ranking-Driven
Approach to Semantic Search, Poster in ASWC 2008
![Page 18: NetIKX Semantic Search Presentation](https://reader033.vdocuments.site/reader033/viewer/2022051608/5441ef1dafaf9f52208b4839/html5/thumbnails/18.jpg)
SemSearch
4 matches
(2 classes & 2 individuals)
6 matches
(relations)
Total queries generated for “News: Victoria” = 4 × 6 = 24
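The arithmetic on this slide is a Cartesian product: every match for one keyword is paired with every match for the other. A minimal Python sketch, assuming placeholder match names (only the counts, 4 and 6, come from the slide, and the slide text does not say which keyword produced which set):

```python
from itertools import product

# Placeholder names; only the counts (4 and 6) come from the slide.
ENTITY_MATCHES = ["class_1", "class_2", "individual_1", "individual_2"]
RELATION_MATCHES = [f"relation_{i}" for i in range(1, 7)]

def candidate_queries(entity_matches, relation_matches):
    """Pair every entity match with every relation match."""
    return list(product(entity_matches, relation_matches))
```

`len(candidate_queries(ENTITY_MATCHES, RELATION_MATCHES))` gives 24. Because the count grows multiplicatively with each extra keyword, ranking the candidate queries before running them matters.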
![Page 19: NetIKX Semantic Search Presentation](https://reader033.vdocuments.site/reader033/viewer/2022051608/5441ef1dafaf9f52208b4839/html5/thumbnails/19.jpg)
Forms
Familiar interface metaphor
Database search
Product search
Plus
Allows construction of more complex searches
Minus
Can’t handle heterogeneous open web - forms need to be pre-defined
![Page 20: NetIKX Semantic Search Presentation](https://reader033.vdocuments.site/reader033/viewer/2022051608/5441ef1dafaf9f52208b4839/html5/thumbnails/20.jpg)
![Page 21: NetIKX Semantic Search Presentation](https://reader033.vdocuments.site/reader033/viewer/2022051608/5441ef1dafaf9f52208b4839/html5/thumbnails/21.jpg)
![Page 22: NetIKX Semantic Search Presentation](https://reader033.vdocuments.site/reader033/viewer/2022051608/5441ef1dafaf9f52208b4839/html5/thumbnails/22.jpg)
Graph-based Search
Aim is to expose the structure of the ontology to the user to
scaffold query formulation
Pluses
Good for single ontology environments
Helps the user comprehend the domain
Minuses
Can become unwieldy with big and complex domains
![Page 23: NetIKX Semantic Search Presentation](https://reader033.vdocuments.site/reader033/viewer/2022051608/5441ef1dafaf9f52208b4839/html5/thumbnails/23.jpg)
![Page 24: NetIKX Semantic Search Presentation](https://reader033.vdocuments.site/reader033/viewer/2022051608/5441ef1dafaf9f52208b4839/html5/thumbnails/24.jpg)
Question Answering
Natural language input
“What is the capital of Georgia?”
Translation process transforms the natural language into a formal query
Pluses
Relatively complex queries possible (intersection of 2 triples)
Can deal with heterogeneity
User doesn’t need to understand the ontology
Minuses
Heavy computation
![Page 25: NetIKX Semantic Search Presentation](https://reader033.vdocuments.site/reader033/viewer/2022051608/5441ef1dafaf9f52208b4839/html5/thumbnails/25.jpg)
AquaLog: question answering
Pipeline: Natural Language Query → Linguistic Triple → Logical Triples → Answer
Components: GATE components, Relation Similarity Service, Semantic match
Example
Query: “What are the projects of Vanessa?”
Linguistic triple: <which is, projects, vanessa>
Logical triples: <project, has-project-member / has-project-leader, vanessa>
Answer: AKT, Dot.KoM
Lopez, V., Uren, V., Motta, E. and Pasin, M. (2007) AquaLog: An
ontology-driven question answering system for organizational
semantic intranets, Journal of Web Semantics, 5, 2, pp. 72-105.
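The last step of the example on this slide, going from logical triples to an answer, can be sketched in Python. This covers only the final lookup; the linguistic analysis and relation similarity stages are not modelled. The knowledge base contents are hypothetical (in particular, which relation links Vanessa to which project is an assumption), and the function name is illustrative.

```python
# Hypothetical knowledge base; the relation used for each project is assumed.
KB = [
    ("AKT", "has-project-member", "vanessa"),
    ("Dot.KoM", "has-project-leader", "vanessa"),
    ("AKT", "has-project-member", "enrico"),
]

def answer(logical_triple, kb):
    """Return subjects linked to the object by any of the candidate relations."""
    _subject_class, relations, obj = logical_triple
    return sorted({s for s, p, o in kb if p in relations and o == obj})

# Logical triple from the slide: two candidate relations for "projects of".
LOGICAL = ("project", ["has-project-member", "has-project-leader"], "vanessa")
```

`answer(LOGICAL, KB)` returns the projects reachable through either candidate relation, which is why the slide lists both `has-project-member` and `has-project-leader` in the logical triple.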
![Page 26: NetIKX Semantic Search Presentation](https://reader033.vdocuments.site/reader033/viewer/2022051608/5441ef1dafaf9f52208b4839/html5/thumbnails/26.jpg)
Tabular Browsing
Start with keyword search, then expand by browsing through links
Pluses
Supports data exploration
Output as sets of facts
Minuses
Not suitable for heterogeneous datasets
Can be slow
![Page 27: NetIKX Semantic Search Presentation](https://reader033.vdocuments.site/reader033/viewer/2022051608/5441ef1dafaf9f52208b4839/html5/thumbnails/27.jpg)
Parallax (http://www.freebase.com/labs/parallax/)
![Page 28: NetIKX Semantic Search Presentation](https://reader033.vdocuments.site/reader033/viewer/2022051608/5441ef1dafaf9f52208b4839/html5/thumbnails/28.jpg)
Research Challenges
Usability / expressivity trade off
Heterogeneity
Ontologies, quality, provenance
Mapping, filtering
Security & Privacy
Personal data, social web
Scalability
![Page 29: NetIKX Semantic Search Presentation](https://reader033.vdocuments.site/reader033/viewer/2022051608/5441ef1dafaf9f52208b4839/html5/thumbnails/29.jpg)
Near Commercial Systems
![Page 30: NetIKX Semantic Search Presentation](https://reader033.vdocuments.site/reader033/viewer/2022051608/5441ef1dafaf9f52208b4839/html5/thumbnails/30.jpg)
Usable Web 3.0 Tools
For Web search
For Corporate data management
NOTE – a personal selection – I’m not endorsing any of these!
![Page 31: NetIKX Semantic Search Presentation](https://reader033.vdocuments.site/reader033/viewer/2022051608/5441ef1dafaf9f52208b4839/html5/thumbnails/31.jpg)
Sig.ma (Semantic Information Mashup) http://sig.ma
Runs off Sindice crawl of pages with embedded RDFa and
other microformats
Uses a keyword search for entities
No attempt at fusion or disambiguation
![Page 32: NetIKX Semantic Search Presentation](https://reader033.vdocuments.site/reader033/viewer/2022051608/5441ef1dafaf9f52208b4839/html5/thumbnails/32.jpg)
Web Search - Sig.ma
![Page 33: NetIKX Semantic Search Presentation](https://reader033.vdocuments.site/reader033/viewer/2022051608/5441ef1dafaf9f52208b4839/html5/thumbnails/33.jpg)
![Page 34: NetIKX Semantic Search Presentation](https://reader033.vdocuments.site/reader033/viewer/2022051608/5441ef1dafaf9f52208b4839/html5/thumbnails/34.jpg)
![Page 35: NetIKX Semantic Search Presentation](https://reader033.vdocuments.site/reader033/viewer/2022051608/5441ef1dafaf9f52208b4839/html5/thumbnails/35.jpg)
Google Rich Snippets
Entity data based on microformats, RDFa, microdata
Reviews
People
Products (GoodRelations)
Businesses & Organizations
Recipes
Events
Video
Supports entity search, with keyword search & facetted browsing
Harvested from sites which supply the data in the required formats
![Page 36: NetIKX Semantic Search Presentation](https://reader033.vdocuments.site/reader033/viewer/2022051608/5441ef1dafaf9f52208b4839/html5/thumbnails/36.jpg)
![Page 37: NetIKX Semantic Search Presentation](https://reader033.vdocuments.site/reader033/viewer/2022051608/5441ef1dafaf9f52208b4839/html5/thumbnails/37.jpg)
Wolfram|Alpha http://www.wolframalpha.com/
Focus is on computational knowledge
Natural language question input
Uses its own proprietary knowledge base
![Page 38: NetIKX Semantic Search Presentation](https://reader033.vdocuments.site/reader033/viewer/2022051608/5441ef1dafaf9f52208b4839/html5/thumbnails/38.jpg)
![Page 39: NetIKX Semantic Search Presentation](https://reader033.vdocuments.site/reader033/viewer/2022051608/5441ef1dafaf9f52208b4839/html5/thumbnails/39.jpg)
DBpedia http://dbpedia.neofonie.de/browse/
Searches factual information extracted from Wikipedia as RDF
Facetted browse approach in the home page
BUT used in many other research & Linked Open Data
sites (e.g. Sig.ma)
![Page 40: NetIKX Semantic Search Presentation](https://reader033.vdocuments.site/reader033/viewer/2022051608/5441ef1dafaf9f52208b4839/html5/thumbnails/40.jpg)
![Page 41: NetIKX Semantic Search Presentation](https://reader033.vdocuments.site/reader033/viewer/2022051608/5441ef1dafaf9f52208b4839/html5/thumbnails/41.jpg)
Usable Web 3.0 Tools
For Web Search
For Corporate Data Management
Opportunity for bridging data silos
Keyword search has never been as good for CMS and
Intranet as for internet
Need experts to configure free text search well
Distribution of terms can be skewed – impossible to
configure
Web 3.0 is a network-native technology
![Page 42: NetIKX Semantic Search Presentation](https://reader033.vdocuments.site/reader033/viewer/2022051608/5441ef1dafaf9f52208b4839/html5/thumbnails/42.jpg)
Drupal 7
One of the most popular CMS
E.g. Recovery.gov was originally on Drupal
Semantic Drupal research pioneered by DERI Galway
Open Source
Developers often prefer it to SharePoint
RDFa export as standard from CMS structure (no annotation needed)
Publish structured data that Google, Sindice etc. can harvest
API methods built in
Search NOT built in
![Page 43: NetIKX Semantic Search Presentation](https://reader033.vdocuments.site/reader033/viewer/2022051608/5441ef1dafaf9f52208b4839/html5/thumbnails/43.jpg)
![Page 44: NetIKX Semantic Search Presentation](https://reader033.vdocuments.site/reader033/viewer/2022051608/5441ef1dafaf9f52208b4839/html5/thumbnails/44.jpg)
Virtuoso (http://virtuoso.openlinksw.com/)
Hybrid server
XML
SQL
RDF
Free Text
Supporting
Merging of data silos in different formats
Production of Web applications & services
Large Scale
Open Source version
![Page 45: NetIKX Semantic Search Presentation](https://reader033.vdocuments.site/reader033/viewer/2022051608/5441ef1dafaf9f52208b4839/html5/thumbnails/45.jpg)
Ready to use?
Beyond the TRL 3-5 “Valley of Death”
TRL7? for facetted browse, server technology
Not yet a stable market - technologies like SearchMonkey may come & go
![Page 46: NetIKX Semantic Search Presentation](https://reader033.vdocuments.site/reader033/viewer/2022051608/5441ef1dafaf9f52208b4839/html5/thumbnails/46.jpg)
Acknowledgements
People: Fabio Ciravegna, Aba-Sah Dadzie, Khadija
Elbedweihy, Miriam Fernandez, Yuangui Lei, Vanessa Lopez,
Enrico Motta
Projects: X-Media, OpenKnowledge, AKT, SmartProducts