7/28/2019 Hidden Costs of Scaling Search Whitepaper English
EXALEAD WHITEPAPER
Foreword
This whitepaper is intended to aid you in evaluating search and information access
proposals for your organization by detailing a very important, often overlooked, cost
component: scaling your search solution. Too many customers are surprised to find
that almost immediately after deploying a search engine, they need to scale their
platform, and that the cost of scaling can be exorbitant.
This paper therefore:
Identifies the reasons why search needs escalate so frequently and dramatically,
Explains why scaling is often expensive,
Provides practical advice for anticipating and controlling costs, and
Furnishes performance benchmarks for more effectively making cost comparisons between
solutions.
We hope this information will aid you in developing a complete TCO forecast for your search platform,
one that effectively incorporates the costs associated with scaling functionality and/or performance
in addition to more easily identifying direct, indirect and upgrade costs.
The Authors
We Welcome Your Feedback
Whatever your role (IT analyst, system administrator, application end user, business manager,
security expert, or simply a curious reader), your feedback is important to us. We invite you to
contact us at the address below with your comments, suggestions or questions.
Frédéric Catherine, Marketing Supervisor, [email protected]
+33 1 55 35 26 81
www.exalead.com
Table of Contents
1 Why Search Demands & Costs Escalate
1.1 Users Demand Wider Access, More Features
1.2 IT Discovers New Uses, Additional Value
2 Anticipating and Controlling Costs
3 Forecasting Demand: Five Rules
3.1 Double Your Estimated Volume; Anticipate Double-Digit Growth
3.2 Plan for Additional Data Sources, including the Web
3.3 Anticipate Demand for a Web-Style Experience, and Real Web Integration
3.4 Plan for Increased Compliance Demands
3.5 Position Yourself for the Unexpected
4 Understanding Search Types & Limitations
4.1 Legacy Enterprise Search Products
4.2 Search Add-Ons from Mainstream Application Providers
4.3 Web Search Engines Ported to the Enterprise
5 Establishing Apples-to-Apples Cost Comparisons
6 About Exalead CloudView™
6.1 Dual Web/Enterprise DNA
6.2 High Performance with Minimal Resources
6.3 Infinite Scaling
6.4 True Unified Data Access
6.5 Rapid Time to Market, Agile Development
7 CloudView Benchmarks
7.1 Enterprise Search
7.2 Business Applications - Database Offloading
7.3 Web Applications - Online Directories
7.4 Web Applications - Online Classifieds
Figures
Fig. 1: Data Volume Managed by Companies Worldwide (IDC)
Fig. 2: CloudView Scales with Minimal Resources
Fig. 3: CloudView Scales Infinitely in Five Directions
Fig. 4: Scaling with a Distributed Architecture + Commodity Hardware
Fig. 5: Transform Unstructured & Structured Data into a Single Structured Resource
1 Why Search Demands & Costs Escalate
At root, search demands and costs escalate because search works. Users are hungry for better,
easier information access. In fact, IDC estimates that information workers spend 48% of
their time searching for and analyzing information, with one-third of that time resulting in
failed searches (and re-created work), costing organizations $28,000 per worker per year.¹
1.1 Users Demand Wider Access, More Features
Once enterprise search is deployed and users get a taste of unified, universal data access, demands
to scale the system in functionality and performance appear almost immediately. Often, this is
because organizations begin with an overly basic search solution: simple keyword searching of a
finite set of resources, often HTML-centric, delivered via an appliance, hosted service or open source
solution, and provided to a restricted user base.
Even when more advanced systems are deployed, and a wider initial user base is served, users still
quickly demand access to a wider range of data sources, and insist on more sophisticated features
and functionality, such as automatic clustering and categorization, multilingual indexing, natural
language querying and Web-style collaboration tools. And, of course, whatever the scope or functionality,
users expect the sub-second responsiveness they've become accustomed to on the Internet.
1.2 IT Discovers New Uses, Additional Value
In addition to this user-driven escalation, IT departments often discover that their search engine can
provide value beyond simply locating information. They learn that search engines can be used to
derive new value from existing information assets while adding much-needed IT agility. Specifically,
these engines can be used to:
Create new, exploitable assets from unstructured content like email, Office documents, chat and Web pages
Increase the value of existing structured content (i.e., database systems)
Provide a unified data platform for constructing agile business applications
Transforming Unstructured Content into an Exploitable Asset
Search engines automatically classify and categorize unstructured data. Once this data is structured
and indexed, it can be incorporated into business information systems and processes. Enterprises
find this can provide a significant competitive advantage given that unstructured data makes up on
average 80% of corporate information assets, and that it contains highly valuable emotive and
qualitative data.
1. IDC Predictions 2009: An Economic Pressure Cooker Will Accelerate the IT Industry Transformation, IDC, 12/2008
Exalead Whitepaper: The Hidden Costs of Scaling Search, v 1.0 2009 Exalead
Information workers spend 48% of their time searching for information, with 1/3 of that time resulting in failed searches and recreated work
Increasing the Value of Structured Data
Because of performance limitations (databases are optimized for storing, not accessing, data), and
heavy licensing and infrastructure costs, database resources are frequently under-utilized in the
enterprise. However, when IT managers discover that index-based querying is as rich as
relational database querying yet ten times faster and cheaper, they begin to use search engines to
provide alternative access to essential database content.
Unified Data Access for Agile Applications
Unified data access is essential for meeting escalating compliance requirements, and for satisfying
Web-savvy users' appetite for fast, easy information access. However, by decoupling data from
traditional application layers, enterprises are also learning that search engines can enable a new
breed of light business applications.
Known as Search-Based Business Applications (SBAs), these applications can be created on-the-fly
to satisfy evolving business needs using information drawn from any source (from legacy
databases to email, blogs, and the Web) while leaving existing systems and structures untouched,
an approach that preserves existing IS investments and is clearly less complex and costly than
traditional data and application integration strategies.
Maximize Benefit; Avoid Sticker Shock
Given these benefits to both end users and IT managers, it is no wonder that functional and
performance-related search demands escalate so frequently. And it is in this attempt to meet these
escalating search demands by scaling hardware, infrastructure and functionality that organizations
frequently encounter search sticker shock. They boost RAM, add servers, increase bandwidth, add or
upgrade licenses, and set about the difficult (sometimes impossible) task of trying to make simple
search tools perform complex analytic functions.
But, given that search is too often a complex, resource-intensive process, with infrastructure
requirements increasing exponentially with increases in functional requirements, and that scaling is
often tied to proprietary hardware or to unreasonable user or document counts, it is easy to see why
costs can quickly mount. Even some solutions that begin at only a few thousand dollars can skyrocket
to millions of dollars within just a few short years (sometimes even within one year) when functional
or performance needs escalate.
Search-Based Business Applications (SBAs) are fast and easy to construct and can incorporate data from any source
Without built-in scalability, even low cost solutions can skyrocket to millions of dollars in just a few years
3.3 Anticipate Demand for a Web-Style Experience, and Real Web Integration
These same Web-savvy users are also demanding that enterprise search, and the business applications
built upon search platforms, be as easy and intuitive to use as Web applications, if not seamlessly
integrated with those same tools. Make sure prospective bidders can meet users' demands for:
Zero-training usage (for search and search-based applications)
The ability to leverage Web and personal information for business tasks (e.g., using their LinkedIn network for sales and recruiting or integrating Facebook data in CRM applications)
Web 2.0/3.0 interactive capabilities, such as workflow integration and collaborative tools like resource tagging, bookmarking and sharing
Fresh, up-to-date data
"As people spend more time online actively participating in Web 2.0 technologies such as rich user
interfaces based on Ajax and Flash, social networking and tagging, blogs and wikis, Web mashups, and
on-demand services in general, information workers will start expecting Enterprise 2.0 applications
in the workplace that focus on providing easy-to-use and many-to-many personalized online experiences
for creating, publishing, locating, and sharing information with colleagues, customers, and partners."
Susan Feldman, IDC, Worldwide Search and Discovery Software 2008-2012 Forecast Update and
2007 Vendor Shares
3.4 Plan for Increased Compliance Demands
While IT has been working to meet increased legal and regulatory compliance demands for several
years, regulatory pressures are revving up again in response to mismanagement issues underlying
the recent economic crisis. Expect a trickle-down impact on your own compliance strategy, with
heightened internal demand for better risk management as well.
3.5 Position Yourself for the Unexpected
As the evolution of the Internet and Cloud computing attests, the information landscape is changing
so fast that many demands simply cannot be anticipated. To make sure you have an enterprise
search tool that provides you with maximum agility in responding to the unexpected, look for:
An SOA architecture, with core services that can be easily replicated and distributed
Open, standards-based APIs for flexibility in managing and interacting with the platform
Support for Web formats and protocols (SOAP, REST, OWL, XML, RDF, RSS, etc.) as well as
major programming environments (Java, C#, .Net)
A single, unified base of unstructured and structured data
Linear scaling using commodity hardware
With these platform attributes, you can quickly modify existing applications, rapidly construct new
applications and easily scale on demand.
4 Understanding Search Types & Limitations
Another aid to accurately forecasting costs is understanding the three basic types of enterprise
search engines and their unique performance capabilities and limitations. These types are:
Legacy enterprise search products
Search add-ons from mainstream business application providers
Web-based search engines ported to an enterprise environment
4.1 Legacy Enterprise Search Products
Designed from inception for enterprise search, these engines were constructed for cross-repository
data access and use statistical and linguistics-based text analytics to automate content processing.
This enables them to produce the kind of faceted results navigation required for task-based searching
in the enterprise. Most also provide good support for existing enterprise security infrastructures.
While this native enterprise focus enables these engines to accommodate a wide range of functional
requirements, they are often complex to use and lag in Web-style features. They can also be
expensive to scale as they were designed from the outset for a relatively small user base and a
limited (often internal) set of data sources.
4.2 Search Add-Ons from Mainstream Application Providers
Another class of engine is that developed by leading business application providers (IBM, SAP, SAS,
Microsoft, Oracle, etc.) who sought first to improve the search function within their own
database-centered products, then to extend that search functionality to external repositories.
As they were originally designed for database querying, these products typically offer limited text
analytics (i.e., limited ability to process unstructured data), are expensive to connect to external
data sources, and expensive to scale due to restrictive licensing policies and resource-intense
engineering. Many of these vendors have attempted to address these shortcomings by acquiring native
enterprise search companies, with limited success in product integration and support.
4.3 Web Search Engines Ported to the Enterprise
These search engines scale well, up to tens of billions of documents and hundreds of queries per
second. However, they are feature-poor, designed for light keyword searching of mainly HTML
content. They typically return a laundry list of search results rather than the faceted navigation
required for task-based enterprise search (popularity-driven Web relevancy is meaningless in an
enterprise context).
Originally designed for limited data collections and a small, trained user base, traditional enterprise systems are often difficult to scale
Search add-ons from non-search vendors are typically poor in text analytics, limited in source connectors, and expensive to scale
They have limited text analytic capabilities and a limited capacity to
ingest, process and integrate structured content. They likewise have
limited built-in support for the special security constraints of an
enterprise environment. Therefore, extending the functionality of such
systems to better meet enterprise needs can be very expensive, when
it is doable at all.
Lastly, even when scaling is limited to content well-suited to these engines, it can still be
surprisingly expensive. These products are often sold with licenses tied to unrealistically low
document counts, or scaling necessitates the purchase of expensive proprietary hardware. Consider,
for example, the cost of scaling a search solution from one popular Web vendor to hundreds of
millions of documents when, for only 30 million documents, the solution requires a $500k bi-annual
license of proprietary hardware.
5 Establishing Apples-to-Apples Cost Comparisons
Finally, you can better anticipate and control costs by conducting a more accurate, more complete
comparison of vendor cost proposals. To do so, first, detail your now-revised demand forecast,
specifying:
The Number of Users and Simultaneous Queries to be Processed
The Number and Type of Sources and Documents to be Indexed
The Range of Search and Indexing Features Required
The Data Refresh Rate
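One way to keep the initial and scaled forecasts comparable is to record both in the same structure. The following is a minimal sketch in Python, not a prescribed format: the class name, field names and all figures are hypothetical placeholders, and the default doubling factor simply echoes the title of Rule 3.1.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DemandForecast:
    """One demand scenario to send to vendors (field names are illustrative)."""
    users: int                 # number of users
    peak_qps: float            # simultaneous queries to be processed, per second
    sources: int               # number of data sources to be indexed
    documents: int             # number of documents to be indexed
    features: tuple[str, ...]  # search and indexing features required
    refresh_seconds: int       # maximum acceptable data refresh delay


def scaled(base: DemandForecast, volume_factor: float = 2.0,
           extra_sources: int = 0) -> DemandForecast:
    """Derive the scaled-demand scenario from the initial one."""
    return replace(
        base,
        users=round(base.users * volume_factor),
        peak_qps=base.peak_qps * volume_factor,
        documents=round(base.documents * volume_factor),
        sources=base.sources + extra_sources,
    )


# Placeholder figures, not taken from any real deployment.
initial = DemandForecast(
    users=5_000,
    peak_qps=10.0,
    sources=4,
    documents=20_000_000,
    features=("faceted navigation", "entity extraction"),
    refresh_seconds=900,
)
scaled_demand = scaled(initial, volume_factor=2.0, extra_sources=2)
```

Both scenarios can then be sent to each vendor so that every proposal is priced against identical numbers.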
Next, ask prospective vendors to provide 5-year costs to cover both the initial demand and the scaled
demand. To realistically forecast TCO, these costs should include the following:
Direct Costs:
Software Licensing Fees
Hardware & Operating System (servers, server clusters, back-up systems) - Initial Purchase and Upgrade Costs
3 Year 24*7 Support
5 Year Maintenance & Support
Indirect Costs:
Staffing Costs for Software Implementation, and Software and Hardware Administration
Hardware Floor Space
Hardware Power
Cooling Hardware
Bandwidth
Note: Keep in mind that you can not only reduce costs by selecting a resource-efficient solution,
you can also help your organization meet Green IT objectives.
Products from Web search vendors are weak in structured data handling, faceted navigation, and security
Though they technically scale well, licensing policies often make scaling Web search engines expensive
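The direct and indirect cost categories in this section can be rolled up into a single 5-year figure per vendor and per demand scenario. The sketch below is one possible way to do that in Python. It assumes that licensing, maintenance and all indirect costs recur annually, that hardware purchase and upgrades are one-time amounts, and that 24*7 support is bought for 3 of the 5 years. All field names and amounts are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostProposal:
    """A vendor's costs for one demand scenario (hypothetical fields)."""
    # Direct costs
    license_per_year: float
    hardware_initial: float       # one-time purchase
    hardware_upgrades: float      # one-time total over the horizon
    support_24x7_per_year: float
    maintenance_per_year: float
    # Indirect costs (all assumed to recur annually)
    staffing_per_year: float
    floor_space_per_year: float
    power_per_year: float
    cooling_per_year: float
    bandwidth_per_year: float


def tco(p: CostProposal, years: int = 5, support_years: int = 3):
    """Return (direct, indirect, total) cost over the horizon."""
    direct = (
        p.license_per_year * years
        + p.hardware_initial
        + p.hardware_upgrades
        + p.support_24x7_per_year * min(support_years, years)
        + p.maintenance_per_year * years
    )
    indirect_per_year = (
        p.staffing_per_year
        + p.floor_space_per_year
        + p.power_per_year
        + p.cooling_per_year
        + p.bandwidth_per_year
    )
    indirect = indirect_per_year * years
    return direct, indirect, direct + indirect


# Placeholder amounts for illustration only.
example = CostProposal(
    license_per_year=50_000, hardware_initial=40_000, hardware_upgrades=20_000,
    support_24x7_per_year=10_000, maintenance_per_year=8_000,
    staffing_per_year=30_000, floor_space_per_year=2_000, power_per_year=3_000,
    cooling_per_year=2_000, bandwidth_per_year=3_000,
)
direct, indirect, total = tco(example)
```

Running the same function on each vendor's initial-demand and scaled-demand proposals shows how much of the total comes from scaling rather than from the initial purchase.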
6 About Exalead CloudView™
As a final tool for anticipating and controlling costs, we provide several performance benchmarks
for CloudView that you can use in comparing vendor solutions. But first, it is helpful to understand
why CloudView provides an important comparative model for cost-efficient search scaling.
6.1 Dual Web/Enterprise DNA
First, and most importantly, CloudView was designed from inception for both the Web, driving an 8
billion (soon to be 16 billion) page public search engine and serving 100 million unique researchers
a month, and the enterprise market, with advanced semantic processing of unstructured data,
superior structured data handling, and full compliance with existing security systems.
6.2 High Performance with Minimal Resources
Furthermore, CloudView was designed to achieve this balance of Web scalability
and enterprise functionality using minimal resources. The end result is a platform
that uses on average 1/5th the hardware resources of competitors, providing real-time
indexing of 100 million documents and processing 20 queries per second on
a single commodity server, all while providing advanced semantic features
like dynamic categorization and clustering.
Fig. 2: CloudView Scales with Minimal Resources
6.3 Infinite Scaling
Fig. 3: CloudView Scales Infinitely in Five Directions
CloudView easily and cost-effectively scales in five directions:
The Total Number of System Users
Proven capacity to serve 100 million unique monthly visitors
System Features and Functionality
Extensive built-in functionality with full administrator control over features activated;
open APIs for endlessly extending functionality
Volume of Data Indexed
Index and index build services can easily be distributed across commodity servers; built-in
index partitioning and replication services further extend performance and availability
Number of Simultaneous Queries Processed
Average throughput of 20 Queries per Second (QPS) per server; easily scales by distributing
query processing across multiple commodity servers
Index Refresh Rate
Supports any data refresh strategy: 1) real-time, 2) interval, and 3) just in time (on query
reception). Dictionaries, thesauri, etc., are automatically updated as the index is updated.
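The per-server figures quoted above (100 million documents and 20 QPS on a single commodity server) can be turned into a rough first-pass sizing estimate. The sketch below is a back-of-the-envelope model in Python, not a vendor sizing tool. It assumes that the index is split into equal partitions, that every query touches every partition, and that each full copy of the partition set sustains the per-server QPS figure. The function name is hypothetical.

```python
import math

# Per-server figures quoted in Section 6.2 of this paper.
DOCS_PER_SERVER = 100_000_000
QPS_PER_SERVER = 20


def estimate_search_servers(documents, peak_qps,
                            docs_per_server=DOCS_PER_SERVER,
                            qps_per_server=QPS_PER_SERVER):
    """Return (partitions, replica_sets, total_servers) under the stated assumptions."""
    # Enough partitions so that no server holds more than docs_per_server documents.
    partitions = max(1, math.ceil(documents / docs_per_server))
    # Enough copies of the whole partition set to absorb the peak query rate.
    replica_sets = max(1, math.ceil(peak_qps / qps_per_server))
    return partitions, replica_sets, partitions * replica_sets


# 250 million documents at a peak of 50 QPS.
print(estimate_search_servers(250_000_000, 50))  # (3, 3, 9)
```

Real requirements also depend on document size, the features activated and availability targets, so an estimate like this is only a starting point for the vendor discussion.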
Fig. 4: Scaling with a Distributed Architecture + Commodity Hardware
CloudView is designed to maximize performance and availability through process
distribution, load balancing, index partitioning and index replication.
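For readers unfamiliar with these terms, the toy sketch below illustrates index partitioning, replication and round-robin load balancing in general. It is a generic Python illustration and does not describe how CloudView implements them. The class and function names are hypothetical, and plain dictionaries stand in for real index servers.

```python
import hashlib


def partition_for(doc_id: str, num_partitions: int) -> int:
    """Map a document id to a partition using a stable hash."""
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


class PartitionedIndex:
    """Toy index: each partition is stored on several identical replicas."""

    def __init__(self, num_partitions: int, replicas_per_partition: int):
        # replicas[p][r] is a dict (doc_id -> text) standing in for one server.
        self.replicas = [
            [{} for _ in range(replicas_per_partition)]
            for _ in range(num_partitions)
        ]
        self._next_replica = [0] * num_partitions

    def add(self, doc_id: str, text: str) -> None:
        """Partitioning: a document lives in one partition, on all its replicas."""
        p = partition_for(doc_id, len(self.replicas))
        for replica in self.replicas[p]:
            replica[doc_id] = text

    def search(self, term: str) -> list:
        """Query every partition, rotating through replicas to spread the load."""
        hits = []
        for p, copies in enumerate(self.replicas):
            r = self._next_replica[p]
            self._next_replica[p] = (r + 1) % len(copies)
            for doc_id, text in copies[r].items():
                if term.lower() in text.lower():
                    hits.append(doc_id)
        return sorted(hits)


index = PartitionedIndex(num_partitions=3, replicas_per_partition=2)
index.add("a", "Vehicle transport route")
index.add("b", "Hotel listing")
index.add("c", "Transport hub")
print(index.search("transport"))  # ['a', 'c']
```

In this toy model, adding partitions increases the volume that can be indexed, while adding replicas increases query capacity and lets a partition survive the loss of one server.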
"Exalead's ability to scale is comparable to Google's. Most enterprise search and content processing
systems cannot handle billions of documents; Exalead does. Exalead's search and content
processing solutions give the company a technical advantage over vendors whose systems choke
when thousands of users simultaneously want access to information."
Stephen Arnold, ArnoldIT
6.4 True Unified Data Access
Because CloudView was developed simultaneously for Web and enterprise search, the platform's
natural language processing modules (text processing and annotation, automatic document
classification, named entity extraction, etc.) are especially adept at analyzing, categorizing and
classifying very high volumes of unstructured data: content like Word documents, Web pages, blog
entries, email messages, PowerPoint presentations, PDFs, etc.
This automatic structuring not only makes previously unstructured data directly accessible as a
new information channel, it also enables CloudView to synthesize it with existing structured data,
such as that from corporate databases and business applications. This meaningful correlation forms
the foundation for value-added uses such as database offloading, data migration, and content mash-ups.
Fig. 5: Transform Unstructured & Structured Data into a Single Structured Resource
6.5 Rapid Time to Market, Agile Development
Rapid Time to Market
CloudView is both a fully packaged, off-the-shelf product designed for plug and play use, and a
white box solution that can be quickly adapted to specific needs using standards-based APIs. As
a result, CloudView typically deploys in just days for enterprise search, and on average within only
4-6 weeks for advanced business applications and data mash-ups, with little to no need for
professional services support.
Agile Development
Beyond initial deployment, CloudView provides an agile base for rapidly constructing new business
applications, and can be quickly scaled to meet evolving demands. Application agility is assured
by CloudView's fully unified data access platform, SOA architecture and open API framework,
while the ability to scale quickly is made possible by built-in distribution and replication facilities
that simply require the addition of commodity hardware.
7 CloudView Benchmarks
To help you form a baseline of functional and performance requirements for comparing solutions,
we provide below benchmarks for actual Exalead CloudView™ installations. These benchmarks
include statistics such as:
Number of documents indexed
Refresh rate for the index
Queries processed per second
Servers required
Time to market
Data source connectors used
We invite you to use these specifications when evaluating vendor offerings. Furthermore, we
encourage you to demand that prospective vendors contractually agree to meet your requirements
with the resources they have proposed. Exalead can, and does.
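The benchmark statistics listed above can double as a checklist when comparing what each vendor is willing to commit to. The following is a minimal Python sketch under stated assumptions: the class names, field names and every figure are hypothetical and are not drawn from any vendor's contract or from the benchmarks in this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    documents: int                # number of documents to index
    max_refresh_seconds: int      # slowest acceptable index refresh
    min_qps: float                # queries per second to sustain
    max_servers: int              # most servers you are prepared to run
    max_time_to_market_days: int  # deployment deadline
    connectors: frozenset         # data source connectors needed


@dataclass(frozen=True)
class VendorCommitment:
    vendor: str
    documents: int
    refresh_seconds: int
    qps: float
    servers: int
    time_to_market_days: int
    connectors: frozenset


def unmet_requirements(req: Requirements, offer: VendorCommitment) -> list:
    """List every requirement the vendor's committed figures fail to meet."""
    gaps = []
    if offer.documents < req.documents:
        gaps.append("documents indexed")
    if offer.refresh_seconds > req.max_refresh_seconds:
        gaps.append("refresh rate")
    if offer.qps < req.min_qps:
        gaps.append("queries per second")
    if offer.servers > req.max_servers:
        gaps.append("servers required")
    if offer.time_to_market_days > req.max_time_to_market_days:
        gaps.append("time to market")
    missing = req.connectors - offer.connectors
    if missing:
        gaps.append("connectors: " + ", ".join(sorted(missing)))
    return gaps


# Placeholder figures for two imaginary vendors.
req = Requirements(50_000_000, 300, 40.0, 4, 90, frozenset({"ODBC", "HTTP"}))
vendor_a = VendorCommitment("Vendor A", 50_000_000, 60, 40.0, 3, 60,
                            frozenset({"ODBC", "HTTP", "XML"}))
vendor_b = VendorCommitment("Vendor B", 50_000_000, 900, 40.0, 10, 60,
                            frozenset({"ODBC"}))
gaps_a = unmet_requirements(req, vendor_a)
gaps_b = unmet_requirements(req, vendor_b)
```

An empty list means the commitment covers every stated requirement; any other result names the items to raise with the vendor before signing.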
7.1 Enterprise Search
COFACE EXTRANET
Coface, a world leader in trade-credit information and protection with offices in 60 countries, selected
CloudView for this extranet which provides customers with key data on 100 million companies.
Performance Benchmarks
Documents Indexed: 100 million (Oracle db records)
Processing: 2000 documents indexed per second; 1.7 million company profiles
added per hour
Refresh Rate: Less than 1 minute
Servers Required: 2 for indexing + 2 for searching
Time to Market: 60 days
Connectors: Standard PAPI and ODBC Connectors
Competitors: Sinequa, Fast
Note: Response rate is five times faster than legacy system
"The indexing capacity and performance of CloudView impressed us, and we quickly realized that
this solution would enable us to create the kind of research services we wanted for our clients
while letting us retain control over our costs, software, services, servers and maintenance. What's
more, the Exalead solution integrated transparently into our infrastructure, and offered essential
security guarantees."
Jean-Luc Brizard, ISD, Coface Services
SANGER INSTITUTE INTRANET
The Sanger Institute, a world-renowned research center dedicated to the study and analysis of
genomes, uses CloudView for its knowledgebase of resources including genome data and
genome-related scientific articles. Features include dynamic categorization and clustering, entity
extraction (people, places, organizations), faceted results navigation, reverse search, proximity
search, approximate search, spell checker.
Documents: 1.2 billion (XML files, database records, scientific documents); growing by
120 million documents every 2 months; projected to eventually reach 20
billion documents
Processing: 5 Queries Per Second (QPS)
Servers: 1 for indexing + 1 for searching
Time to Market: 6 weeks; search component ready in 10 days
Staffing: 1 part-time technician: 2 days per month
Connectors: Native ODBC Connector; XML API
Competitors: Lucene; CloudView replaced Altavista
"Our in-house staff and our external researcher community are now instantaneously in touch with
all the information they need... We have to provide the context behind the search that allows our
users to navigate to the specific area of interest in a few clicks. It is a unique solution over our size
of index."
Tony Cox, Head of Software, The Sanger Institute
7.2 Business Applications - Database Offloading
GEFCO EXTRANET/DATABASE OFFLOADING
GEFCO selected CloudView for its redesigned logistics portal, reducing the load on its Oracle
databases and allowing staff, partners and customers to locate, track and optimize vehicle transport
in real-time across 80 countries and 500 international routes. Features include dynamic categorization
and clustering, entity extraction (people, places, organizations), faceted results navigation,
geolocalization, reverse search, proximity search, approximate search, spell checker.
Documents: 1 million (representing 600,000 daily transactions)
Processing: 2000 documents indexed per second
Refresh Rate: Quasi real-time (30 seconds)
Servers: 1 for index build + 1 for search + 1 for high availability
Time to Market: Prototype 10 days; deployment in 60 days
Connectors: Native ODBC Connector
Notes: Improved functionality, performance and data freshness while offloading
central databases and reducing IS infrastructure. Enforces strong firewalling
of confidential client data.
"Exalead CloudView has dramatically improved system efficiency across the board. Before we
installed CloudView it could take a day to get the results of such CPU-intensive queries, by which
time the information was out of date. Now we get these answers almost instantly."
Guillaume Rabier, Manager of Studies and Projects, GEFCO
7.3 Web Applications - Online Directories
118 218.fr
This hybrid online yellow and white page directory from France's leading directory service company
uses CloudView to dynamically enrich database content with Web content (Web/database mash-up).
Features include geolocalization, faceted results navigation, dynamic categorization and
clustering, entity extraction (people, places, organizations), reverse search, proximity search,
approximate search, spell checker.
Documents: 30 million (database records and Web pages)
Processing: 40 QPS per server
Refresh Rate: 15 minutes
Servers: 1 for build + 2 for search
Time to Market: 60 days
Connectors: Built-In HTTP and ODBC Connectors; XML API
Competitors: FAST
Notes: Features powerful natural language interpretation capabilities.
"Deploying an online directory is highly complex and usually requires 12 to 24 months. Exalead
allowed us to launch our site in 2 months while bringing unmatched differentiating innovation."
Bruno Massiet Dubiest, CEO, 118 218
VIAMICHELIN
Travel publishing and services leader Michelin selected CloudView for its high-traffic travel portal,
ViaMichelin. Features include rich mapping, dynamic categorization and clustering, entity extraction
(people, places, organizations), faceted results navigation, geolocalization, reverse search, proximity
search, approximate search, spell checker.
Documents: 15 million points of interest (hotels, restaurants, attractions, etc.)
Processing: 800 QPS; 150 milliseconds per query
Servers: 8
Time to Market: 4 weeks
Connectors: Built-In HTTP and ODBC Connectors; XML API
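As a worked example of reading these figures, the Python sketch below derives the average load per server and, using Little's law (requests in flight = arrival rate x time in system), the approximate number of queries being processed at any instant. It assumes that the 150 milliseconds is a mean response time and that all 8 servers share query traffic evenly; the benchmark does not state how the servers are divided between indexing and searching. The function names are hypothetical.

```python
def qps_per_server(total_qps: float, servers: int) -> float:
    """Average query rate each server handles if load is spread evenly."""
    return total_qps / servers


def queries_in_flight(qps: float, latency_ms: float) -> float:
    """Little's law: average number of queries being processed at once."""
    return qps * latency_ms / 1000


# Figures from the ViaMichelin benchmark above.
per_server = qps_per_server(800, 8)                    # 100.0 QPS per server
in_flight = queries_in_flight(800, 150)                # 120.0 queries at once
in_flight_per_server = queries_in_flight(per_server, 150)  # 15.0 per server
```

Applying the same normalization to each vendor's reference installations puts their throughput claims on a common per-server basis.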
7.4 Web Applications - Online Classifieds
YAKAZ
This classified ad site uses CloudView to aggregate listings from more than 500 public websites.
Features include dynamic categorization and clustering, entity extraction (people, places, organizations),
faceted results navigation, reverse search, proximity search, approximate search, spell checker.
Documents: 1 million announcements from 500 databases in 15 languages
Processing: 40 QPS; 6 million unique monthly visitors, with traffic growing rapidly
(18% in most recent quarter)
Servers: 1 index build + 1 search + 1 high availability
Staffing: 100% of the work done by Yakaz team; Exalead provided only training
Connectors: Built-In HTTP Crawler + Extractors
Notes: The system is very non-intrusive; indexing has no impact on source
databases.
RIGHTMOVE
Rightmove, the UK's top real estate classifieds portal, selected CloudView to enhance the end user
experience, improve system performance, and reduce IT costs. Features include dynamic categorization
and clustering, faceted results navigation, and geolocalization.
Documents: 2 million (real estate ads)
Processing: 400 QPS; 1.2 million records indexed in 1 hour; 29 million monthly visitors
Refresh Rate: Less than 2 minutes
Servers: 3 datacenters for high availability: each has 1 build + 2 search servers
Deployment: 3 months
Connectors: Built-In ODBC Connector
Notes: Cost of search successfully reduced from .06 pence to .01 pence per 1000
queries (with more powerful and intuitive search and navigation features).
99.99% reliability achieved. 30 Oracle CPUs replaced by 9 Exalead CPUs.
"Rightmove has already found that Exalead CloudView has allowed the speedy development of
advanced search functionality whilst reducing search costs by 83%."
Peter Brooks-Johnson, Product Director, Rightmove
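The 83% figure in the quotation can be checked against the per-query costs reported in the notes, and the same arithmetic gives the CPU reduction. The short Python sketch below uses a hypothetical helper name and only the numbers stated in the Rightmove benchmark.

```python
def percent_reduction(before: float, after: float) -> float:
    """Reduction from `before` to `after`, as a percentage of `before`."""
    return (before - after) * 100 / before


# Cost per 1000 queries: from .06 pence to .01 pence.
cost_cut = percent_reduction(0.06, 0.01)   # roughly 83 percent

# CPUs: 30 Oracle CPUs replaced by 9 Exalead CPUs.
cpu_cut = percent_reduction(30, 9)         # 70.0 percent
```

The cost figure rounds to the 83% cited by Rightmove, and the CPU count falls by 70%.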