8/9/2019 Implementing Oracle Secure Enterprise Search 11g
http://slidepdf.com/reader/full/implementing-oracle-secure-enterprise-search-11g 1/9
Implementing Oracle Secure Enterprise Search 11g
An Oracle White Paper
July 2011
Introduction
Planning Considerations
Hardware / OS platform
Hardware Sizing
The need for secure versus public searching
The Identity Platform
User interface (standard or customized Web Services application)
The data sources to be crawled
Custom Connectors
Recrawl schedules
Index defragmentation schedules
High Availability Strategies
Firewall / DMZ issues
Process and Timescales
Installation
Source Setup
User Interface Building
Administration
User Training
Conclusion
Introduction
Oracle Secure Enterprise Search (SES) is an Oracle product that lets you search all your
enterprise data sources from a single, simple, convenient interface. Searching enterprise data is made as
simple for users as searching the internet.
One of the overriding design goals of Secure Enterprise Search was simplicity of installation and
implementation.
Nevertheless, there is still some planning that needs to take place in order to achieve a successful
implementation.
This document covers the steps necessary to install and deploy Oracle Secure Enterprise Search. It is
expected that customers will refer to this document when creating their own project plan for
implementing Secure Enterprise Search.
Here’s a brief list of decisions that need to be made as part of the initial plan. We will expand on each
of them later.
Hardware/OS platform
The need for secure versus public searching
The identity platform (such as Oracle Internet Directory or Microsoft Active Directory) to use
User interface (standard or customized Web Services application)
The data sources to be crawled
Any custom crawler plug-ins that need to be written
Recrawl schedules
Index defragmentation schedules
High Availability Strategies
Firewall / DMZ issues
Planning Considerations
Hardware / OS platform
Secure Enterprise Search runs on Microsoft Windows, and also on a variety of Unix/Linux platforms.
SES is platform-neutral: all connectors and datasources will work on any supported platform.
For certain datasources (current examples are the NTFS file crawler and the Exchange email crawler) it
is necessary to run an agent program on a Windows system. Hence to crawl these sources it will be
necessary to have access to a suitable Windows server system (and a suitably privileged account on that
system) in order to run the agent.
Note that the choice of identity platforms (see following sections) does not need to influence the OS
platform – you can connect to Microsoft Active Directory from a Linux or Unix machine, for example.
Hardware Sizing
The hardware must be adequately specified for the expected system load. This document will not cover
sizing as such, since sizing guidelines change rapidly as hardware improves. However, the following list
shows the information you would need to gather in order to calculate the correct hardware
specification:
The quantity of data (e.g. in GB) to be indexed
The mix of data - is it all plain text, is it mixed document types, is much of the data in non-indexable
formats such as audio, image or video files?
The number of discrete documents to be indexed
The number of searches expected per minute, which can often be estimated by finding the number of
users and working out how many searches each of them might run per day.
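As a rough illustration of the last item, the query rate can be worked out from the user count. This is a minimal sketch only: the function name, the assumption that searches are spread evenly over an 8-hour working day, and the optional peak factor are illustrative choices, not Oracle sizing guidance.

```python
def searches_per_minute(users, searches_per_user_per_day,
                        busy_hours_per_day=8.0, peak_factor=1.0):
    """Estimate the query rate during working hours.

    Assumes (for illustration only) that searches are spread evenly
    over busy_hours_per_day; peak_factor scales the average up to
    allow for busy periods.
    """
    if busy_hours_per_day <= 0:
        raise ValueError("busy_hours_per_day must be positive")
    total_per_day = users * searches_per_user_per_day
    return peak_factor * total_per_day / (busy_hours_per_day * 60)


# Hypothetical figures: 2,000 users running 6 searches each per day.
rate = searches_per_minute(2000, 6)
print(f"{rate:.1f} searches per minute")
```

Real usage is rarely this even, so it may be worth re-running the estimate with a peak factor greater than 1 before settling on a hardware specification.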
The need for secure versus public searching
Do you need to index and search private content? If all the content you search is public, then there is
no need to use an identity platform. This will simplify your implementation. If you do need to search
private content, you will need to connect to a directory server of some sort, which is covered in the
next section.
The Identity Platform
For secure searching, you need a user directory of some sort, against which users can be authenticated
when they log in, and which may provide the Access Control List (ACL) information that is
stored in Secure Enterprise Search for certain types of secure document.
Directory plug-ins can be either for LDAP (Lightweight Directory Access Protocol) directories such as
Microsoft Active Directory or OpenLDAP, or they can be special plug-ins which work with the
internal user directory within a particular application - for example Lotus Notes or Oracle Content
Server.
The actual choice of platform will largely depend on what is already in use to secure your datasources.
Note: It is important to realise that only one identity plug-in can be active in one SES instance. You
may have a mixture of sources, for example
Oracle Portal secured by Oracle Internet Directory, and
Exchange email secured by Microsoft Active Directory
In that case the normal solution is to install two instances of SES, each secured by a different identity
manager, and federate between them. The federation process allows a user to log onto one SES system but
search seamlessly on a remote system, even when the user's username on the remote system is
different from the username on the SES system they logged onto. For more information on configuring
SES Federation, please see the Secure Enterprise Search Administration Guide.
Another option may be to use a virtual identity management system such as Oracle Access Manager,
which is able to link multiple directories together.
User interface (standard or customized Web Services application)
The simplest way to use SES is to use the default searching GUI (Graphical User Interface). This is
available “out of the box” and can be used with no further customization for any type of data source.
The default GUI may be customized in a number of ways. Initially, you may customize the search
result set ("hitlist") by adding XSLT (Extensible Stylesheet Language Transformations) code to specify the
layout and content of the hitlist. Going further, you can use templates to specify the overall layout and
"look-and-feel" of the default search GUI. For more information on hitlist and template
customization, see the Secure Enterprise Search Administration Guide.
Taking customization further, it is possible to create a complete custom search application - or to
embed the SES search within other applications - by using the Web Services API provided as part of
SES.
The data sources to be crawled
This is perhaps the most important, and most time-consuming, part of the process. You must decide what data
you wish to index, what method (“source type”) you will use to crawl those sources, and define the
information needed to crawl them.
For each source type, there are various parameters to be set. We will consider here a Web source type,
as this is a common choice.
A web source type must have one or more entry points. This is the point from which the crawler will
start searching for links to follow. We could choose http://www.oracle.com as a starting point, for
example. We might also wish to add http://technology.oracle.com as a second starting point, to
ensure we don’t miss any links.
Then we need to consider include and exclude rules. By default, if http://www.oracle.com is our start
point, then a page on http://download.oracle.com would not be crawled, as it does not fit the
default include rule (which only matches URLs containing the string www.oracle.com). We would need
to set up a new include rule, either for just oracle.com or, more narrowly, for download.oracle.com.
We might also want to exclude some pages, where they are not considered helpful to include in search
results. A “logs” directory might be excluded with an exclude rule of www.oracle.com/logs.
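The interplay of include and exclude rules can be sketched in a few lines. This is an illustration of the idea only, not the SES rule engine: it assumes simple "URL contains this string" rules, as in the examples above, and it assumes that an exclude rule takes precedence over an include rule. The function name and sample URLs are made up for the example.

```python
def should_crawl(url, include_rules, exclude_rules):
    """Decide whether a URL falls inside the crawl boundary.

    Assumption for this sketch: a rule matches when its text appears
    anywhere in the URL, and any matching exclude rule wins.
    """
    if any(rule in url for rule in exclude_rules):
        return False
    return any(rule in url for rule in include_rules)


include_rules = ["www.oracle.com"]       # the default rule for our start point
exclude_rules = ["www.oracle.com/logs"]  # keep the logs directory out of results

for url in [
    "http://www.oracle.com/index.html",
    "http://www.oracle.com/logs/today.txt",
    "http://download.oracle.com/docs/index.html",
]:
    print(url, should_crawl(url, include_rules, exclude_rules))

# Adding a second include rule brings the download site into scope.
include_rules.append("download.oracle.com")
print(should_crawl("http://download.oracle.com/docs/index.html",
                   include_rules, exclude_rules))
```

Running a list of known URLs through a check like this before the first crawl is a cheap way to confirm that the rules cover what you expect.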
In some cases, it may not be possible to directly crawl all the pages in a site. Possible reasons for this
include:
The pages are normally served as dynamic content depending on customer input (for example help
pages)
The pages are linked to via Flash-based hyperlinks (or complex JavaScript links), which are not
followed by the SES crawler.
In this case, the best method is to create a page consisting of a long list of URLs to each page you want
to be indexed, similar to a site map. This page can then be used as an extra starting point for the
crawler. Obviously, it is best if this page can be automatically generated and kept up to date by
whatever production system you use to create web pages.
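A link page of this kind is simple to generate. The sketch below assumes you can obtain the list of URLs from your own publishing system; the function name and the sample URLs are hypothetical, and the output is a bare HTML page intended only as a crawler entry point.

```python
from html import escape


def build_link_page(urls, title="Crawler entry page"):
    """Return a minimal HTML page with one plain hyperlink per URL."""
    items = "\n".join(
        f'    <li><a href="{escape(u, quote=True)}">{escape(u)}</a></li>'
        for u in urls
    )
    return (
        "<html>\n"
        f"  <head><title>{escape(title)}</title></head>\n"
        "  <body>\n"
        "  <ul>\n"
        f"{items}\n"
        "  </ul>\n"
        "  </body>\n"
        "</html>\n"
    )


# Hypothetical dynamic help pages that the crawler cannot reach by itself.
urls = [
    "http://www.oracle.com/help?topic=install&lang=en",
    "http://www.oracle.com/help?topic=crawl&lang=en",
]
print(build_link_page(urls))
```

The generated file can be published on the web server and added as an extra entry point; regenerating it as part of the normal publishing process keeps it current.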
Other settings that are worth paying particular attention to are the “Crawl Depth”, “Index Dynamic
Page” and “Honor Robots Exclusion” settings under the Crawling Parameters tab. Setting these incorrectly
(or leaving them at the defaults) may cause you to index far fewer pages, or far more, than you
intended.
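To see why the robots exclusion setting in particular can change the page count so much, it helps to check what a site's robots.txt actually allows. The sketch below uses the Python standard library's robots.txt parser on a made-up robots.txt; it shows the general robots exclusion convention, not how SES evaluates it, and the agent name and URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt that hides a logs directory from all crawlers.
robots_txt = """\
User-agent: *
Disallow: /logs/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())


def allowed(url, agent="example-crawler"):
    """True if a crawler that honors robots.txt may fetch the URL."""
    return parser.can_fetch(agent, url)


for url in [
    "http://www.example.com/index.html",
    "http://www.example.com/logs/today.txt",
]:
    print(url, allowed(url))
```

If a large part of a site is disallowed in this way, a crawler that honors robots exclusion will skip it, which is one common reason for an index that is smaller than expected.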
Custom Connectors
There are several different source types provided with SES, and for many customers these will be
sufficient to index all their content. Other customers, however, may find that they have information
repositories that cannot easily be crawled using the standard crawl types. In this case, a crawler plug-in
will be required. Oracle is constantly developing and releasing new connectors, so a customer looking
for a connector is advised to:
Check whether the required connector is provided with the latest release of SES, and consider
upgrading to that if appropriate
See if there is a suitable connector listed for download on the SES page at
http://technology.oracle.com
Consider writing a custom connector plug-in for the data source.
Recrawl schedules
For each of your data sources, you must decide how often they should be recrawled, and at what times
of day. This will depend partly on how efficient the recrawl is. For example, Oracle Portal is able to tell
SES which pages have changed, without SES having to do any actual crawling. On the other hand, a
web source must always be fully crawled in order to find new or modified documents. For a source
that is to be recrawled daily, or less frequently, it is best to identify the quietest time of day so that the
crawler can be launched when it will have minimum impact on users running queries.
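One simple way to find the quietest time of day is to count queries per hour from whatever query statistics you have available. The sketch below assumes you have already extracted the hour of day (0 to 23) for each logged query; how you obtain those values is outside its scope, and the function name and sample data are illustrative.

```python
from collections import Counter


def quietest_hour(query_hours):
    """Return the hour of day (0-23) with the fewest logged queries.

    Ties are broken in favour of the earliest hour.
    """
    counts = Counter(query_hours)
    return min(range(24), key=lambda hour: (counts[hour], hour))


# Made-up sample: steady background load, extra daytime queries,
# and slightly less activity at 03:00.
sample = [hour for hour in range(24) for _ in range(5)]
sample += [9, 10, 11, 14, 15]
sample.remove(3)
print(quietest_hour(sample))
```

A week or more of data gives a more reliable answer than a single day, particularly if usage differs between weekdays and weekends.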
Index defragmentation schedules
Over time, as content changes, the Oracle Text indexes used by Secure Enterprise Search become
fragmented. This fragmentation slows down queries. The amount of fragmentation can be monitored
on the Global Settings / Index Optimization page. Typically, a fragmentation level greater than 30%
should be a cause for concern.
For most customers, scheduling an optimization task once a week will be sufficient. However, when data
sources have a high turnover of documents, it may be beneficial to run daily optimizations.
High Availability Strategies
An integral part of planning for SES is creating a strategy to handle software or hardware failure on the
SES server. In the current release of SES, database backup and recovery is not supported. It is possible
to back up system metadata, and then recover this to a freshly installed SES system, but the actual
datasources must be recrawled.
This process can take some time, so will not always be acceptable where SES is a critical resource. In
this case, Oracle Corporation recommends running two SES instances in parallel. The two instances
should be given the same set of datasources to crawl, and the two systems can be used – if required –
for load sharing as well as backup purposes.
Firewall / DMZ issues
This last topic will probably only apply to public-facing SES servers. A common scenario is that a
corporation wishes to index public material which is either inside or outside its corporate firewall, but
does not wish to place the SES server outside the firewall.
In this case, most customers will decide that the SES server itself should be inside the firewall, or in a
“De-Militarized Zone” (DMZ), with a web server such as Oracle Application Server (OAS) acting as
the front end on another machine outside the firewall.
Instructions for fronting SES with OAS are in the SES Administrators Guide.
Process and Timescales
Once all the above considerations have been settled, it is time to set out a schedule for implementation.
Obviously, the actual timescale for each stage will depend very much on the amount of work required
– if several custom crawlers must be written it’s going to take considerably longer than if all sources
will use standard source types.
Listed below are some suggested timescales.
Installation
Preinstallation – checking packages, OS settings, etc: 2 hours
Installation of SES: 0.5 hours
Connecting to an Identity Management system: 2-24 hours (one quarter through four working days),
depending on complexity, familiarity, availability of local expertise and (if necessary) gaining secure
access to systems.
Source Setup
Public source creation (per source): 1 hour
Secure source creation (per source): 2-24 hours (one quarter through four working days), depending on
complexity, familiarity, availability of local expertise and (if necessary) gaining secure access to systems.
Crawling time: dependent on source size, type, etc: 1 hour to several days
Source verification, testing and updating: 4 hours per source
Creation of custom source plug-in: 2 to 20 days, depending on complexity
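The installation and source setup figures above can be combined into a rough range for a project plan. The sketch below simply adds up the suggested hours; it leaves out crawling time and custom plug-in development, which vary too much to total meaningfully, and the function name and the example source counts are illustrative.

```python
def setup_estimate_hours(public_sources, secure_sources,
                         use_identity_platform=True):
    """Return (low, high) effort in hours from the suggested timescales.

    Covers preinstallation, installation, the identity connection,
    source creation and source verification only.
    """
    low = high = 2.0 + 0.5           # preinstallation checks + installation
    if use_identity_platform:
        low += 2                     # connecting to identity management
        high += 24
    # Public source: 1 hour to create plus 4 hours verification and testing.
    low += public_sources * (1 + 4)
    high += public_sources * (1 + 4)
    # Secure source: 2-24 hours to create plus 4 hours verification and testing.
    low += secure_sources * (2 + 4)
    high += secure_sources * (24 + 4)
    return low, high


# Hypothetical project: three public sources and two secure sources.
low, high = setup_estimate_hours(public_sources=3, secure_sources=2)
print(f"Estimated effort: {low} to {high} hours")
```

The spread between the two figures comes almost entirely from the secure sources and the identity connection, which is where local expertise and access to systems matter most.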
User Interface Building
Customizing hitlist results through XSLT: 1 day
Altering standard look and feel through template customization: 1 - 2 days
Building a “standard” Web Services Application from sample: 2 hours (user familiar with Web Services
and Oracle JDeveloper) to 2 days (novice)
Building a complete custom UI: variable, typically 2 to 10 days including testing
Administration
Source and index optimization scheduling: 2 hours
Monitoring of query statistics, and query tuning: variable, suggest 2-4 hours per week
Fixing failed crawler schedules: variable, suggest 0-2 hours per week
User Training
Use of Basic Search page: 5 minutes per user
Use of Advanced Search page: 30 minutes per user
Conclusion
Oracle Secure Enterprise Search is easy to install, configure and use. However, as with any Enterprise
software product, it pays to plan in advance the steps required to prepare and roll out the project to
your users.
Implementing Oracle Secure Enterprise Search 11g
July 2011
Author: Roger Ford
Contributing Authors: Stefan Buchta, Jinyu
Wang, Shijun Cheng
Oracle Corporation
World Headquarters
500 Oracle Parkway
Redwood Shores, CA 94065
U.S.A.
Worldwide Inquiries:
Phone: +1.650.506.7000
Fax: +1.650.506.7200
oracle.com
Copyright © 2011, Oracle and/or its affiliates. All rights reserved. This document is provided for information purposes only and the
contents hereof are subject to change without notice. This document is not warranted to be error-free, nor subject to any other
warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or
fitness for a particular purpose. We specifically disclaim any liability with respect to this document and no contractual obligations are
formed either directly or indirectly by this document. This document may not be reproduced or transmitted in any form or by any
means, electronic or mechanical, for any purpose, without our prior written permission.
Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.
AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices.
Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license
and are trademarks or registered trademarks of SPARC International, Inc. UNIX is a registered trademark licensed through X/Open
Company, Ltd.