Gregory Grefenstette, Exalead
Exalead S.A. © 2009
Search-Based Applications: the Maturation of Search
www.exalead.com/search: 8 billion URLs, 2 billion images, 200 million videos; Wikipedia, cloud tags. Also Labs.exalead.com
Two ways to find information
DATABASES vs. SEARCH ENGINES
Recent Past
DATABASES
• Structured Data
• Transactions
• Precise
• All tuples
• SQL
• Slow
SEARCH ENGINES
• Text
• Similarity
• Ranking
• Intuitive
• Fast
• Partial
More Recent
DATABASES
• Structured Data
• Transactions
• Precise
• All tuples
• SQL
• Slow
Added more recently:
• Top-K
• Column store
• Map Reduce
• Data Cube
SEARCH ENGINES
• Text
• Similarity
• Ranking
• Intuitive
• Fast
• Partial
Added more recently:
• Connectors
• Facets
• Map Reduce
• Tables
NOW
DATABASES | SEARCH BASED APPLICATIONS | SEARCH ENGINES
Search based Application
An application that uses a search engine component, but whose final purpose is not searching for a document but rather producing a domain-oriented process result
– Examples:
• Custom response management
• Logistic tracking and tracing
• Contextual Advertising
• Database reporting after offloading
Current situation
Databases are the backbone of search in information systems
[Diagram labels: Front-office users; Database; Data Warehouse; Data Mart; BI reports; Business processes]
Search-enabled application
Optimized solution for information access
[Diagram labels: Database; Data Warehouse; Search Engine; Front-office users; BI reports; Business processes]
Drawbacks of Using Database Search as a Component
Search Based Architecture vs. Standard Architecture
How does a Search Based Application work?
• Business items are concrete objects directly understandable by end-users
– Product, Customer, Purchase order, Technical support call
• Each business item becomes a document
• The straightforward, simple format of the document index allows performance and ease of use
• The search engine can offer a rich and powerful query language that allows queries as complex and advanced as SQL despite the flat data model
• The search engine must support
– typed fields, intra-field scope search, categories/facets
Database converted to Business Items, stored as structured documents

Product_ID  Product_Name       Manufacturer_Names
123         control switch     ACME Inc ; The Control Switch Company ; Karl GmbH
124         red warning light  …
Database into structured documents
Scope Search

Product_ID  Product_Name
123         control switch
124         red warning light

Product_ID  Manufacturer_ID
123         345
123         8574
123         4483

Manufacturer_ID  Manufacturer_NAME
345              ACME Inc.
8574             The Control Switch Company
4483             Karl GmbH

Product_ID  Product_Name       Manufacturer_Names
123         control switch     ACME Inc ; The Control Switch Company ; Karl GmbH
124         red warning light  …

… but the manufacturer names can still be searched as individual records with scope search ("ACME GmbH" does not match the document here)
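
The flattening and the scope search shown above can be sketched in a few lines of Python. This is only an illustration, not Exalead's implementation: the function names build_documents and scope_match, the tokenizer, and the dictionary layout are assumptions made for the example. The rows come from the tables on the slide; product 124 is left without manufacturers because the slide elides them.

```python
from collections import defaultdict

# Rows copied from the three normalized tables on the slide.
products = [(123, "control switch"), (124, "red warning light")]
product_manufacturer = [(123, 345), (123, 8574), (123, 4483)]
manufacturers = {
    345: "ACME Inc.",
    8574: "The Control Switch Company",
    4483: "Karl GmbH",
}


def build_documents(products, links, manufacturers):
    """Flatten the join into one document per product (one business item)."""
    names_by_product = defaultdict(list)
    for product_id, manufacturer_id in links:
        names_by_product[product_id].append(manufacturers[manufacturer_id])
    return [
        {
            "Product_ID": product_id,
            "Product_Name": name,
            # Multi-valued field: each manufacturer stays a separate value.
            "Manufacturer_Names": names_by_product.get(product_id, []),
        }
        for product_id, name in products
    ]


def tokenize(text):
    """Lowercase, split on whitespace, drop surrounding punctuation."""
    return [token.strip(".;,") for token in text.lower().split()]


def scope_match(doc, field, query):
    """True only if all query words occur inside ONE value of the field."""
    words = tokenize(query)
    return any(
        all(word in tokenize(value) for word in words)
        for value in doc[field]
    )


docs = build_documents(products, product_manufacturer, manufacturers)
print(scope_match(docs[0], "Manufacturer_Names", "ACME Inc"))   # same value
print(scope_match(docs[0], "Manufacturer_Names", "ACME GmbH"))  # spans two values
```

The second query fails because "ACME" and "GmbH" belong to different manufacturer values, which is the behavior the slide describes.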
Hierarchical categories
Product_ID  Color  Brand  Fragile  Nb of wheels  Wheel type
123         Red    ACME   Y        3             2

Product_ID  Country
123         France
123         UK
123         Germany

Product_ID  Attributes
123         Color/Red ; Brand/ACME ; Fragile/Y ; Nb_wheels/3 ; Wheel_type/2 ; Country/France ; Country/UK ; Country/Germany
124         …

Multiple kinds of attributes can be mixed in the same category field. The hierarchical tree structure of the categories preserves the differences between attribute types.

Multi-valued attributes can also be represented by categories. A single category field can be used to store hundreds or thousands of attribute columns.
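
A minimal Python sketch of this encoding, using the values from the tables above. The helper names to_categories and values_of and the dictionary layout are hypothetical, chosen for the example rather than taken from any product API.

```python
def to_categories(attributes, countries):
    """Encode single- and multi-valued attributes as 'Type/Value' paths."""
    paths = [f"{name}/{value}" for name, value in attributes.items()]
    # A multi-valued attribute simply contributes one path per value.
    paths += [f"Country/{country}" for country in countries]
    return paths


def values_of(doc, attribute_type):
    """Recover the values of one attribute type from the mixed field."""
    prefix = attribute_type + "/"
    return [p[len(prefix):] for p in doc["Attributes"] if p.startswith(prefix)]


# Product 123 from the slide: one row of columns plus three country rows.
attributes = {
    "Color": "Red",
    "Brand": "ACME",
    "Fragile": "Y",
    "Nb_wheels": 3,
    "Wheel_type": 2,
}
countries = ["France", "UK", "Germany"]

doc = {"Product_ID": 123, "Attributes": to_categories(attributes, countries)}

print(" ; ".join(doc["Attributes"]))
print(values_of(doc, "Country"))
```

Because the attribute type is the first level of the path, adding a new attribute column needs no schema change: it only adds new paths to the same field.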
Multi-dimensional facets
• Search result facets provide aggregate values computed on the fly with the search results list
– One single search query can return the equivalent of dozens of “GROUP BY” SQL clauses
– Numerical values associated with facets (count, score, …) can be used to perform complex computations on the results list
• Search performance is not affected by the size of the category tree
– Thousands of attribute types can be represented by categories
– Facets are dynamically selected by the search results: the displayed attributes are always consistent with the search query (e.g. “color” and “engine type” when searching for a car, “screen size” and “CPU speed” when searching for a laptop)
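
The facet behavior described above can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not how a real engine computes facets: the three sample documents, the naive search function, and facet_counts are all invented for the example. One pass over the result list yields a count per value for every attribute type, which is what a separate GROUP BY per attribute column would return in SQL.

```python
from collections import Counter, defaultdict

# Hypothetical sample documents; attributes use the 'Type/Value' encoding.
docs = [
    {"name": "city car", "Attributes": ["color/red", "engine type/diesel"]},
    {"name": "family car", "Attributes": ["color/blue", "engine type/diesel"]},
    {"name": "office laptop", "Attributes": ["screen size/15", "CPU speed/2.4"]},
]


def search(docs, term):
    """Naive stand-in for the engine: keep documents whose name has the term."""
    return [doc for doc in docs if term in doc["name"].split()]


def facet_counts(results):
    """For each attribute type, count the matching documents per value."""
    facets = defaultdict(Counter)
    for doc in results:
        for path in set(doc["Attributes"]):
            attribute_type, value = path.split("/", 1)
            facets[attribute_type][value] += 1
    return {attribute_type: dict(c) for attribute_type, c in facets.items()}


car_facets = facet_counts(search(docs, "car"))
laptop_facets = facet_counts(search(docs, "laptop"))

print(sorted(car_facets))     # only attribute types present in car results
print(sorted(laptop_facets))  # only attribute types present in laptop results
```

Only attribute types that occur in the result list appear as facets, so the displayed attributes follow the query without any per-query configuration.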
CASE STUDY
LOGISTICS TRACK & TRACE
Gefco overview
• A subsidiary of French car maker PSA (Peugeot, Citroën)
– Now does most of its business outside of PSA
• Logistics operator
– Carries cars from factories to dealers (road, rail)
– Carries freight (parcels; originally spare parts)
– Supply chain and logistic platform design
• 3.5B€, 10 000 employees, 100 countries
The original pain
• Classical multi-criteria search over Oracle, 2 million rows
• Poor performance despite 2 years of optimization
– Response times measured in minutes
– Users asked to run only simple queries, preferably at set hours
From forms to a search box
New application with operational reporting
French Post Office
Partner
• Tracing of incidents
• Real-time system
• Used as an internal audit tool for the mail
• Suggestion of addresses for customers
• Search in file numbers, addresses, names, etc.
Case Study: RightMove
Rightmove: Reduce Costs and Improve Performance through Database
Advantages of Search Based Applications
Conclusions
• Search engines mature
– Structured data, high volume, high speed
• Search based Applications offer
– Usage: search interface familiar to user
– Performance: search engine geared to search, eases load on database platform
– Agility: original database design untouched, reconfiguring output is lightweight