Why Semantic Search Is Hard

Upload: truevert

Post on 30-Oct-2014

DESCRIPTION

Describes problems with semantic search and how Truevert technology overcomes them

TRANSCRIPT

Page 1: Why Semantic Search Is Hard

© 2008 OrcaTec LLC

Why is Semantic Search so Hard?

and What Truevert Does About It

Powered by

www.truevert.com

www.orcatec.com

Page 2: Why Semantic Search Is Hard

Semantic search harnesses the meaning of words to improve the quality of search results

Page 3: Why Semantic Search Is Hard

Using meaning is difficult

Page 4: Why Semantic Search Is Hard

Language is dynamic

Jabberwocky Effect: making up new words (e.g., blog)

Humpty Dumpty Syndrome: using old words in new ways (e.g., Twitter)

Page 5: Why Semantic Search Is Hard

Words are ambiguous

Strike

Bank

Page 6: Why Semantic Search Is Hard

How ambiguous? Look it up!

"The companies have agreed to a brief delay in implementing their agreement."

Word            # definitions
The             37
companies       14
have            39
agreed          17
to              54
a               62
brief           20
delay            8
in              84
implementing     8
their            7
agreement        9

7,788,584,618,680,320 possible interpretations

Each word disambiguates the others
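The interpretation count on this slide is the product of the per-word definition counts, since each word's sense could in principle be chosen independently of the others. A minimal Python sketch of that arithmetic follows; the counts come from the slide, while the names `SENSE_COUNTS` and `count_interpretations` are illustrative assumptions.

```python
from math import prod

# Definition counts per word, in sentence order, as listed on the slide.
SENSE_COUNTS = [
    ("The", 37), ("companies", 14), ("have", 39), ("agreed", 17),
    ("to", 54), ("a", 62), ("brief", 20), ("delay", 8),
    ("in", 84), ("implementing", 8), ("their", 7), ("agreement", 9),
]


def count_interpretations(sense_counts):
    """Multiply the per-word counts: one independent sense choice per word."""
    return prod(count for _, count in sense_counts)


total = count_interpretations(SENSE_COUNTS)
print(f"{total:,} possible interpretations")
# prints "7,788,584,618,680,320 possible interpretations"
```

The product grows multiplicatively with sentence length, which is why context (each word narrowing the senses of its neighbors) matters so much.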

Page 7: Why Semantic Search Is Hard

Isn’t the Semantic Web supposed to fix these problems?

The Semantic Web was intended to support machine-to-machine communication to manage the day-to-day mechanisms of trade, bureaucracy, and daily life (Berners-Lee, 1999).

Page 8: Why Semantic Search Is Hard

Web Ontology Language: OWL

Semantic Web

Line up the information in web pages with predefined categories

Page 9: Why Semantic Search Is Hard

Ontologies cast meaning into categories

Ontology: set of concepts, categories, relations

[Diagram: an example ontology with the nodes Sports, Recreation, Baseball, Basketball, Cricket, Gloves, Basketballs, Baseballs, Wicket, Batter, and Player, connected by "Is a" and "Uses" relations.]
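To make the idea of an ontology as "concepts, categories, relations" concrete, here is a minimal Python sketch that stores an ontology as subject-relation-object triples. The concept names and the two relation types come from the slide's diagram, but the specific links between them are not recoverable from the transcript, so every edge below is an assumption made for illustration.

```python
from collections import defaultdict

# Hypothetical edges: the node names and the "Is a" / "Uses" relation types
# appear on the slide, but these particular links are assumed.
TRIPLES = [
    ("Sports", "is_a", "Recreation"),
    ("Baseball", "is_a", "Sports"),
    ("Basketball", "is_a", "Sports"),
    ("Cricket", "is_a", "Sports"),
    ("Batter", "is_a", "Player"),
    ("Baseball", "uses", "Gloves"),
    ("Baseball", "uses", "Baseballs"),
    ("Basketball", "uses", "Basketballs"),
    ("Cricket", "uses", "Wicket"),
]


def build_index(triples):
    """Index triples as relation -> subject -> set of objects."""
    index = defaultdict(lambda: defaultdict(set))
    for subject, relation, obj in triples:
        index[relation][subject].add(obj)
    return index


def ancestors(concept, index):
    """Collect every category reachable from `concept` via is_a links."""
    seen = set()
    stack = [concept]
    while stack:
        node = stack.pop()
        for parent in index["is_a"].get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


INDEX = build_index(TRIPLES)
print(sorted(ancestors("Baseball", INDEX)))  # ['Recreation', 'Sports']
print(sorted(ancestors("Quidditch", INDEX)))  # [] (not in the ontology)
```

A concept that nobody entered in advance, such as the hypothetical "Quidditch" query, simply has no categories: the structure can only answer questions along the links someone predefined.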

Page 10: Why Semantic Search Is Hard

Ontologies

Limit thinking to known tracks

Page 11: Why Semantic Search Is Hard

People are creative

For example, 20 to 25% of the searches on Google on any given day have never been seen before

Page 12: Why Semantic Search Is Hard

What categories matter to you? Consider “basketball”:

Bouncy things

Round things

Things to dribble

Things that my brother hates

Things with a pebbly surface

Things that Barack Obama likes

Things that float

An infinite number of ways to categorize

Page 13: Why Semantic Search Is Hard

What’s Truevert’s solution?

Page 14: Why Semantic Search Is Hard

“The meaning of a word is its use in the language.”

—Ludwig Wittgenstein

Philosophical Investigations, § 43.

Page 15: Why Semantic Search Is Hard

Truevert learns the meaning of words in the same way that people do, from the context in which they are used

Truevert works in any language

Page 16: Why Semantic Search Is Hard

Gabbro is a dark, coarse-grained, igneous rock formed underground. It is chemically equivalent to basalt.

Gabbro is rarely used as a building stone.

Do you know the meaning of the word “gabbro”?

Page 17: Why Semantic Search Is Hard

Computer creates model of word use patterns from documents in its vertical

Legal vertical: Blah blah blah court blah blah blah lawyer blah blah blah blah bailiff blah blah blah blah blah.

Sport vertical: Blah blah court blah blah blah basketball blah blah blah blah blah blah freethrow blah blah blah blah.
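One simple way to picture a "model of word use patterns" is a table of which words appear together in the same document. The Python sketch below builds such a table from the slide's two example sentences. The slides do not say how Truevert actually builds its model, so the document-level co-occurrence counting, the function names, and the treatment of "blah" as filler are all assumptions.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_cooccurrence(documents, filler=frozenset({"blah"})):
    """Count, for each word, the other words sharing a document with it."""
    model = defaultdict(Counter)
    for doc in documents:
        words = {w for w in tokenize(doc) if w not in filler}
        for word in words:
            for other in words:
                if other != word:
                    model[word][other] += 1
    return model


# Example documents taken verbatim from the slide.
legal_docs = [
    "Blah blah blah court blah blah blah lawyer blah blah blah blah "
    "bailiff blah blah blah blah blah."
]
sport_docs = [
    "Blah blah court blah blah blah basketball blah blah blah blah "
    "blah blah freethrow blah blah blah blah."
]

legal_model = build_cooccurrence(legal_docs)
sport_model = build_cooccurrence(sport_docs)

print(sorted(legal_model["court"]))  # ['bailiff', 'lawyer']
print(sorted(sport_model["court"]))  # ['basketball', 'freethrow']
```

The same word, "court", ends up with different neighbors in each vertical, and no hand-built category scheme was needed to get there.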

Page 18: Why Semantic Search Is Hard

Model identifies characteristic word patterns for vertical

Court & (lawyer or bailiff or jury or attorney or …) = legal

Court & (basketball or hoops or freethrow or …) = sports
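The two patterns above read like Boolean rules: the target word plus at least one cue word selects a vertical. A minimal Python sketch of that reading follows. The cue words are the ones named on the slide (whose lists end in "…", so these sets are incomplete), and the `classify` function with its overlap-count scoring is an illustrative assumption, not Truevert's documented method.

```python
import re

# Cue words as listed on the slide; the slide's lists are open-ended ("…"),
# so these sets are known to be incomplete.
CUES = {
    "legal": {"lawyer", "bailiff", "jury", "attorney"},
    "sports": {"basketball", "hoops", "freethrow"},
}


def classify(text, target="court", cues=CUES):
    """Return the vertical whose cue words co-occur with `target`, else None."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    if target not in words:
        return None
    # Score each vertical by how many of its cue words appear in the text.
    scores = {label: len(words & cue_words) for label, cue_words in cues.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(classify("The court asked the jury to return."))     # legal
print(classify("She sank a freethrow from mid court."))    # sports
print(classify("The court was empty."))                    # None
```

When two verticals score equally above zero, this sketch returns whichever comes first in `CUES`; a real system would need a more considered tie-break.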

Page 19: Why Semantic Search Is Hard

Word use patterns are meaning

Page 20: Why Semantic Search Is Hard

Follow your own path

Truevert delivers results tuned to your interests

Page 21: Why Semantic Search Is Hard

Truevert’s patterns let YOU find the results that YOU are looking for

Page 22: Why Semantic Search Is Hard

Green Vertical Semantic Search Results

Page 23: Why Semantic Search Is Hard

Truevert is a project of OrcaTec LLC.

Headquartered in Ojai, CA, OrcaTec is a leading provider of information discovery software, including intelligent semantic search, near-duplicate clustering, language identification, email threading, and interesting-phrase finding.

OrcaTec-developed software was nominated by the Jet Propulsion Laboratory as NASA software of the year 2008.

OrcaTec software has been used in electronic discovery and advertising applications as well as knowledge management.

Core OrcaTec software is patent pending.

Page 24: Why Semantic Search Is Hard

Contact [email protected]

805-918-4612