Why semantic search is hard
DESCRIPTION
Describes problems with semantic search and how Truevert technology overcomes them.
TRANSCRIPT
© 2008 OrcaTec LLC
Why is Semantic Search so Hard?
and What Truevert Does About It
Powered by
www.truevert.com
www.orcatec.com
Semantic search harnesses the meaning of words to improve the quality of search results
Using meaning is difficult
Language is dynamic
Jabberwocky Effect: making up new words (e.g., blog)
Humpty Dumpty Syndrome: using old words in new ways (e.g., Twitter)
Words are ambiguous
Examples: strike, bank
How ambiguous? Look it up!
The companies have agreed to a brief delay in implementing their agreement.
Number of definitions for each word:
The 37, companies 14, have 39, agreed 17, to 54, a 62, brief 20, delay 8, in 84, implementing 8, their 7, agreement 9
7,788,584,618,680,320 possible interpretations
Each word disambiguates the others
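The interpretation count on this slide is the product of the twelve per-word definition counts. A minimal Python sketch of that arithmetic follows; pairing each count with a word by position is an assumption based on the order in which the numbers appear.

```python
from math import prod

# Sentence and per-word definition counts as listed on the slide.
# Pairing them by position is an assumption.
sentence = "The companies have agreed to a brief delay in implementing their agreement"
definition_counts = [37, 14, 39, 17, 54, 62, 20, 8, 84, 8, 7, 9]

words = sentence.split()
assert len(words) == len(definition_counts)

# If every word could take any of its senses independently, the number of
# readings of the sentence is the product of the counts.
interpretations = prod(definition_counts)
print(f"{interpretations:,}")  # 7,788,584,618,680,320
```

In practice most of these combinations are ruled out at once, because each word narrows the plausible senses of its neighbors.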
Isn’t the Semantic Web supposed to fix these problems?
The Semantic Web was intended to support machine-to-machine communication to manage the day-to-day mechanisms of trade, bureaucracy, and daily life (Berners-Lee, 1999).
Web Ontology Language: OWL
Semantic Web
Line up the information in web pages with predefined categories
Ontologies cast meaning into categories
Ontology: set of concepts, categories, relations
[Diagram: example ontology rooted at "Sports Recreation". Nodes: Baseball, Basketball, Cricket, Gloves, Basketballs, Baseballs, Wicket, Batter, Player. Edges are labeled "Is a" and "Uses".]
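An ontology of this kind can be sketched as a list of (subject, relation, object) triples. The Python below is illustrative only: the node names and the two relation labels come from the slide, but which node connects to which is a guess, and the helper names `related` and `is_a` are hypothetical.

```python
# Hypothetical triples. Node names and the "is_a" / "uses" relations come
# from the slide; the specific connections are assumptions.
TRIPLES = [
    ("Baseball", "is_a", "Sports Recreation"),
    ("Basketball", "is_a", "Sports Recreation"),
    ("Cricket", "is_a", "Sports Recreation"),
    ("Batter", "is_a", "Player"),
    ("Baseball", "uses", "Gloves"),
    ("Baseball", "uses", "Baseballs"),
    ("Basketball", "uses", "Basketballs"),
    ("Cricket", "uses", "Wicket"),
]


def related(subject, relation):
    """Return the objects linked to `subject` by `relation`, sorted."""
    return sorted(o for s, r, o in TRIPLES if s == subject and r == relation)


def is_a(subject, category):
    """Follow "is_a" edges upward and report whether `category` is reached."""
    frontier, seen = [subject], set()
    while frontier:
        node = frontier.pop()
        if node == category:
            return True
        if node in seen:
            continue
        seen.add(node)
        frontier.extend(o for s, r, o in TRIPLES if s == node and r == "is_a")
    return False


print(related("Baseball", "uses"))
print(is_a("Cricket", "Sports Recreation"))
```

Only questions that match a predefined relation can be answered; anything the ontology's authors did not anticipate returns nothing.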
Ontologies limit thinking to known tracks
People are creative
For example: 20-25% of the searches on Google on any given day have never been seen before
What categories matter to you? “basketball?”
Bouncy things
Round things
Things to dribble
Things that my brother hates
Things with a pebbly surface
Things that Barack Obama likes
Things that float
An infinite number of ways to categorize
What’s Truevert’s solution?
“The meaning of a word is its use in the language.”
—Ludwig Wittgenstein
Philosophical Investigations, § 43.
Truevert learns the meaning of words in the same way that people do, from the context in which they are used
Truevert works in any language
Gabbro is a dark, coarse-grained, igneous rock formed underground. It is chemically equivalent to basalt.
Gabbro is rarely used as a building stone.
Do you know the meaning of the word “gabbro”?
Computer creates model of word use patterns from documents in its vertical
Legal vertical: Blah blah blah court blah blah blah lawyer blah blah blah blah bailiff blah blah blah blah blah.
Sport vertical: Blah blah court blah blah blah basketball blah blah blah blah blah blah freethrow blah blah blah blah.
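One simple way to picture a model of word use patterns is to count which words share a document with a target word in each vertical. The Python sketch below does this for the two toy documents on the slide. It is an illustration under simplifying assumptions (document-level co-occurrence, "blah" treated as filler) and not a description of Truevert's actual algorithm; `context_counts` is a hypothetical helper.

```python
from collections import Counter

# The two toy documents from the slide, keyed by vertical.
DOCS = {
    "legal": "Blah blah blah court blah blah blah lawyer blah blah blah "
             "blah bailiff blah blah blah blah blah.",
    "sport": "Blah blah court blah blah blah basketball blah blah blah "
             "blah blah blah freethrow blah blah blah blah.",
}
FILLER = {"blah"}  # assumption: "blah" stands in for uninformative words


def context_counts(text, target):
    """Count the non-filler words that appear in the same document as `target`."""
    words = [w.strip(".,").lower() for w in text.split()]
    if target not in words:
        return Counter()
    return Counter(w for w in words if w != target and w not in FILLER)


# One context profile for "court" per vertical.
model = {vertical: context_counts(text, "court") for vertical, text in DOCS.items()}
for vertical, counts in model.items():
    print(vertical, sorted(counts))
```

The same word, "court", ends up with a different context profile in each vertical, and that difference is what separates the two senses.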
Model identifies characteristic word patterns for each vertical
Court & (lawyer or bailiff or jury or attorney or …) = legal
Court & (basketball or hoops or freethrow or …) = sports
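The two patterns above can be read as Boolean rules: "court" together with at least one cue word from a vertical. A minimal Python sketch of that reading follows. The cue sets contain only the words named on the slide (the original lists are open-ended), and `classify_court` is a hypothetical name, not part of any Truevert API.

```python
# Cue words named on the slide; the original lists end in "…", so these
# sets are deliberately incomplete.
PATTERNS = {
    "legal": {"lawyer", "bailiff", "jury", "attorney"},
    "sports": {"basketball", "hoops", "freethrow"},
}


def classify_court(text):
    """Return the verticals whose pattern matches: "court" AND any cue word."""
    words = {w.strip(".,;:!?").lower() for w in text.split()}
    if "court" not in words:
        return []
    return sorted(v for v, cues in PATTERNS.items() if words & cues)


print(classify_court("The lawyer addressed the court."))
print(classify_court("She sank a freethrow from center court."))
print(classify_court("The court was empty."))
```

A text that mentions "court" with no cue word matches neither pattern, and a text with cues from both verticals matches both, so a real system would need some way to weigh the evidence.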
Word use patterns are meaning
Follow your own path
Truevert delivers results tuned to your interests
Truevert’s patterns let YOU find the results that YOU are looking for
Green Vertical Semantic Search Results
Truevert is a project of OrcaTec LLC.
Headquartered in Ojai, CA, OrcaTec is a leading provider of information discovery software, including intelligent semantic search, near-duplicate clustering, language identification, email threading, and interesting phrase finding.
OrcaTec-developed software was nominated by the Jet Propulsion Laboratory as NASA software of the year 2008.
OrcaTec software has been used in electronic discovery and advertising applications as well as knowledge management.
Core OrcaTec software is patent pending.