
Implementing Oracle Secure Enterprise Search 11g

 An Oracle White Paper

July 2011



Introduction

Planning Considerations

Hardware / OS platform

Hardware Sizing

The need for secure versus public searching

The Identity Platform

User interface (standard or customized Web Services application)

The data sources to be crawled

Custom Connectors

Recrawl schedules

Index defragmentation schedules

High Availability Strategies

Firewall / DMZ issues

Process and Timescales

Installation

Source Setup

User Interface Building

Administration

User Training

Conclusion


Introduction

Oracle Secure Enterprise Search (SES) is an Oracle product that lets you search all your enterprise data sources from a single, simple, convenient interface. Searching enterprise data is made as simple for users as searching the internet.

One of the overriding design goals of Secure Enterprise Search was simplicity of installation and implementation. Nevertheless, some planning is still needed to achieve a successful implementation.

This document covers the steps necessary to install and deploy Oracle Secure Enterprise Search. It is expected that customers will refer to this document when creating their own project plan for implementing Secure Enterprise Search.

Here is a brief list of decisions that need to be made as part of the initial plan. We will expand on each of them later.

Hardware/OS platform

 The need for secure versus public searching

 The identity platform (such as Oracle Internet Directory or Microsoft Active Directory) to use

User interface (standard or customized Web Services application)

 The data sources to be crawled

 Any custom crawler plug-ins that need to be written

Recrawl schedules

Index defragmentation schedules

High Availability Strategies

Firewall / DMZ issues

Planning Considerations

Hardware / OS platform

Secure Enterprise Search runs on Microsoft Windows and on a variety of Unix and Linux platforms. SES is platform-neutral: all connectors and data sources work on any supported platform.

For certain data sources (current examples are the NTFS file crawler and the Exchange email crawler) an agent program must run on a Windows system. To crawl these sources you will therefore need access to a suitable Windows server, and a suitably privileged account on it, in order to run the agent.

Note that the choice of identity platform (see the following sections) does not need to influence the OS platform. You can connect to Microsoft Active Directory from a Linux or Unix machine, for example.

Hardware Sizing


The hardware must be adequately specified for the expected system load. This document does not cover sizing as such, since sizing guidelines change rapidly as hardware improves. However, the following list shows the information you would need to gather in order to calculate the correct hardware specification:

The quantity of data (e.g. in GB) to be indexed

The mix of data: is it all plain text, is it mixed document types, is much of the data in non-indexable formats such as audio, image or video files?

The number of discrete documents to be indexed

The number of searches expected per minute, which can often be estimated by finding the number of users and working out how many searches each might run per day.
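The last item in the list is simple arithmetic. A minimal sketch in Python shows how a user count can be turned into a searches-per-minute estimate. The eight-hour working day and the peak factor are illustrative assumptions, not figures from this paper:

```python
def searches_per_minute(users, searches_per_user_per_day,
                        working_hours=8.0, peak_factor=1.0):
    """Estimate query load from a user count.

    working_hours and peak_factor are illustrative assumptions:
    searches are assumed to be spread evenly over the working day,
    and peak_factor scales the average up to allow for busy periods.
    """
    if working_hours <= 0:
        raise ValueError("working_hours must be positive")
    daily_searches = users * searches_per_user_per_day
    average_per_minute = daily_searches / (working_hours * 60)
    return average_per_minute * peak_factor


# Example: 2,000 users each running 6 searches per day.
average = searches_per_minute(2000, 6)
peak = searches_per_minute(2000, 6, peak_factor=3.0)
print(f"average: {average:.1f}/min, assumed peak: {peak:.1f}/min")
```

The resulting figure is only one sizing input. It needs to be considered together with the data volume, data mix and document count listed above.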

The need for secure versus public searching

Do you need to index and search private content? If all the content you search is public, then there is no need to use an identity platform, which simplifies your implementation. If you do need to search private content, you will need to connect to a directory server of some sort, which is covered in the next section.

The Identity Platform

For secure searching, you need a user directory of some sort, against which users can be authenticated when they log in, and which may provide the Access Control List (ACL) information that is stored in Secure Enterprise Search for certain types of secure document.

Directory plug-ins can work either with LDAP (Lightweight Directory Access Protocol) directories such as Microsoft Active Directory or OpenLDAP, or with the internal user directory of a particular application, for example Lotus Notes or Oracle Content Server.

The actual choice of platform will largely depend on what is already in use to secure your data sources.

Note: It is important to realise that only one identity plug-in can be active in one SES instance. If you have a mixture of sources, for example

Oracle Portal secured by Oracle Internet Directory, and

Exchange email secured by Microsoft Active Directory

the normal solution is to install two instances of SES, each secured by a different identity manager, and federate between them. The federation process allows a user to log on to one SES system but search seamlessly on a remote system, even when the user's username on the remote system differs from the username on the SES system where the user logged on. For more information on configuring SES Federation, please see the Secure Enterprise Search Administrator's Guide.

Another option may be to use a virtual identity management system such as Oracle Access Manager, which is able to link multiple directories together.
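As a planning aid, the one-plug-in-per-instance rule can be sketched as a grouping of sources by the directory that secures them. The following Python fragment is only an illustration. The function name and the third source are hypothetical, and nothing here calls SES itself:

```python
from collections import defaultdict


def plan_instances(source_directories):
    """Group sources by the identity platform that secures them.

    source_directories maps a source name to a directory name.
    Each group corresponds to one SES instance, because an
    instance can have only one active identity plug-in.
    """
    instances = defaultdict(list)
    for source, directory in source_directories.items():
        instances[directory].append(source)
    return dict(instances)


sources = {
    "Oracle Portal": "Oracle Internet Directory",
    "Exchange email": "Microsoft Active Directory",
    # Hypothetical extra source, added only for illustration.
    "NTFS file shares": "Microsoft Active Directory",
}
plan = plan_instances(sources)
for directory, grouped in plan.items():
    print(f"SES instance secured by {directory}: {', '.join(grouped)}")
if len(plan) > 1:
    print("More than one directory in use: plan for federation "
          "or a virtual identity management system.")
```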


User interface (standard or customized Web Services application)

The simplest way to use SES is through the default search GUI (Graphical User Interface). This is available “out of the box” and can be used with no further customization for any type of data source.

The default GUI may be customized in a number of ways. Initially, you may customize the search result set ("hitlist") by adding XSLT (Extensible Stylesheet Language Transformations) code to specify the layout and content of the hitlist. Going further, you can use templates to specify the overall layout and "look-and-feel" of the default search GUI. For more information on hitlist and template customization, see the SES Administrator's Guide.

Taking customization further, it is possible to create a complete custom search application, or to embed SES search within other applications, by using the Web Services API provided as part of SES.

The data sources to be crawled

This is perhaps the most important, and most time-consuming, part of the process. You must decide what data you wish to index, what method (“source type”) you will use to crawl each source, and define the information needed to crawl them.

For each source type, there are various parameters to be set. We will consider here a Web source type, as this is a common choice.

A Web source must have one or more entry points. An entry point is a page from which the crawler starts searching for links to follow. We could choose http://www.oracle.com as a starting point, for example. We might also wish to add http://technology.oracle.com as a second starting point, to ensure we don’t miss any links.

Then we need to consider include and exclude rules. By default, if http://www.oracle.com is our start point, then a page at http://download.oracle.com would not be crawled, as it does not fit the default include rule (which only matches URLs containing the string www.oracle.com). We would need to set up a new include rule, either for just oracle.com or perhaps for download.oracle.com.

We might also want to exclude some pages that are not considered helpful in search results. A “logs” directory might be excluded with an exclude rule of www.oracle.com/logs.
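The effect of these rules can be sketched in a few lines of Python. This is a simplified model that treats every rule as a plain substring match, as in the examples above. It is not the SES rule engine, and the function name is hypothetical:

```python
def should_crawl(url, include_rules, exclude_rules):
    """Return True if url matches an include rule and no exclude rule.

    Rules are plain substrings here; this is a simplification for
    illustration and not the actual SES implementation.
    """
    if any(rule in url for rule in exclude_rules):
        return False
    return any(rule in url for rule in include_rules)


default_includes = ["www.oracle.com"]
widened_includes = ["www.oracle.com", "download.oracle.com"]
excludes = ["www.oracle.com/logs"]

# Not crawled: the default include rule does not match this host.
print(should_crawl("http://download.oracle.com/", default_includes, excludes))
# Crawled once an include rule for download.oracle.com is added.
print(should_crawl("http://download.oracle.com/", widened_includes, excludes))
# Not crawled: the exclude rule for the logs directory applies.
print(should_crawl("http://www.oracle.com/logs/crawl.txt", widened_includes, excludes))
```

Checking exclude rules first means that an exclusion always wins over an inclusion in this sketch. That ordering is an assumption made to keep the example unambiguous.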

In some cases, it may not be possible to directly crawl all the pages in a site. Possible reasons for this include:

The pages are normally served as dynamic content depending on customer input (for example help pages)

The pages are linked to via Flash-based hyperlinks (or complex JavaScript links), which are not followed by the SES crawler.

In this case, the best method is to create a page consisting of a long list of URLs, one for each page you want indexed, similar to a site map. This page can then be used as an extra starting point for the crawler. Ideally, this page is generated automatically and kept up to date by whatever production system you use to create web pages.
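Generating such a link page is straightforward to automate. The following is a minimal sketch in Python, assuming the list of URLs is already available from your publishing system. The function name and the example URLs are hypothetical:

```python
from html import escape


def build_link_page(urls, title="Crawler entry points"):
    """Return an HTML page with one plain hyperlink per URL."""
    items = "\n".join(
        f'    <li><a href="{escape(url, quote=True)}">{escape(url)}</a></li>'
        for url in urls
    )
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        f"<head><title>{escape(title)}</title></head>\n"
        "<body>\n"
        "  <ul>\n"
        f"{items}\n"
        "  </ul>\n"
        "</body>\n"
        "</html>\n"
    )


# Hypothetical URLs for pages that are only reachable through
# dynamic content or script-driven links.
pages = [
    "http://www.example.com/help?topic=install",
    "http://www.example.com/help?topic=search&lang=en",
]
print(build_link_page(pages))
```

The page contains only static anchor tags, so a crawler that follows ordinary links can reach every listed URL. The output would be published at a URL that is then added as an entry point for the Web source.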

Other settings that are worth paying particular attention to are the “crawl depth”, “Index Dynamic Page” and “Honor Robots Exclusion” settings under the Crawling Parameters tab. Setting these incorrectly (or leaving them at the defaults) may cause you to index far fewer pages, or far more, than you intended.

Custom Connectors

There are several different source types provided with SES, and for many customers these will be sufficient to index all their content. Other customers, however, may find that they have information repositories that cannot easily be crawled using the standard source types. In this case, a crawler plug-in will be required. Oracle is constantly developing and releasing new connectors, so a customer looking for a connector is advised to

Check whether the required connector is provided with the latest release of SES, and consider upgrading to that release if appropriate

See if there is a suitable connector listed for download on the SES page at http://technology.oracle.com

Consider writing a custom connector plug-in for the data source.

Recrawl schedules

For each of your data sources, you must decide how often it should be recrawled, and at what time of day. This will depend partly on how efficient the recrawl is. For example, Oracle Portal is able to tell SES which pages have changed, without SES having to do any actual crawling. On the other hand, a web source must always be fully crawled in order to find new or modified documents. For a source that is to be recrawled daily, or less frequently, it is best to identify the quietest time of day so that the crawler can be launched when it will have minimum impact on users running queries.
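Identifying the quietest time of day can be as simple as finding the hour with the fewest queries. The sketch below assumes you have already collected 24 hourly query counts, for example from query statistics or web server logs. The counts shown are made up for illustration:

```python
def quietest_hour(queries_per_hour):
    """Return the hour (0-23) with the fewest queries.

    queries_per_hour is a list of 24 counts, where index 0 covers
    midnight to 01:00. Ties resolve to the earliest hour.
    """
    if len(queries_per_hour) != 24:
        raise ValueError("expected 24 hourly counts")
    return min(range(24), key=lambda hour: queries_per_hour[hour])


# Made-up hourly query counts for one day.
counts = [12, 8, 5, 3, 4, 9, 40, 120, 300, 420, 450, 430,
          380, 410, 440, 400, 350, 200, 90, 60, 45, 30, 20, 15]
print(f"Launch the crawl at about {quietest_hour(counts):02d}:00")
```

In practice it is worth averaging counts over several weeks, and allowing for the expected duration of the crawl, before fixing a schedule.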

Index defragmentation schedules

Over time, as content changes, the Oracle Text indexes used by Secure Enterprise Search become fragmented. This fragmentation slows down queries. The amount of fragmentation can be monitored on the Global Settings / Index Optimization page. Typically, a fragmentation level greater than 30% should be a cause for concern.

For most customers, scheduling an optimization task once a week will be sufficient. However, when data sources have a high turnover of documents, it may be beneficial to run daily optimizations.

High Availability Strategies

An integral part of planning for SES is creating a strategy to handle software or hardware failure on the SES server. In the current release of SES, database backup and recovery is not supported. It is possible to back up system metadata and then recover it to a freshly installed SES system, but the actual data sources must be recrawled.

This process can take some time, so it will not always be acceptable where SES is a critical resource. In this case, Oracle Corporation recommends running two SES instances in parallel. The two instances should be given the same set of data sources to crawl, and the two systems can be used, if required, for load sharing as well as backup purposes.

Firewall / DMZ issues

This last topic will probably only apply to public-facing SES servers. A common scenario is that a corporation wishes to index public material which is either inside or outside its corporate firewall, but does not wish to place the SES server outside the firewall.

In this case, most customers will decide that the SES server itself should be inside the firewall, or in a “De-Militarized Zone” (DMZ), with a web server such as Oracle Application Server (OAS) acting as the front end on another machine outside the firewall. Instructions for fronting SES with OAS are in the SES Administrator's Guide.

Process and Timescales

Once all the above considerations have been settled, it is time to set out a schedule for implementation. Obviously, the actual timescale for each stage will depend very much on the amount of work required: if several custom crawlers must be written, it is going to take considerably longer than if all sources use standard source types.

Listed below are some suggested timescales.

Installation

Preinstallation (checking packages, OS settings, etc.): 2 hours

Installation of SES: 0.5 hours

Connecting to an Identity Management system: 2-24 hours (one quarter through four working days), depending on complexity, familiarity, availability of local expertise and (if necessary) gaining secure access to systems.

Source Setup

Public source creation (per source): 1 hour

Secure source creation (per source): 2-24 hours (one quarter through four working days), depending on complexity, familiarity, availability of local expertise and (if necessary) gaining secure access to systems.

Crawling time (dependent on source size, type, etc.): 1 hour to several days

Source verification, testing and updating: 4 hours per source

Creation of custom source plug-in: 2 to 20 days, depending on complexity
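The figures for installation and source setup can be combined into a rough range for a project plan. The following Python sketch simply adds up the suggested timescales listed above. It leaves out crawling time and custom plug-in development because they vary too widely, and it assumes that the identity management connection is only needed when at least one secure source exists. The function name is hypothetical:

```python
def setup_effort_hours(public_sources, secure_sources):
    """Return (low, high) estimated hours for installation and source setup.

    Uses the suggested figures from this section. Crawling time and
    custom plug-in development are deliberately excluded.
    """
    fixed = 2.0 + 0.5  # preinstallation + installation of SES
    # Assumption: no identity platform is needed for public-only content.
    identity_low, identity_high = (2.0, 24.0) if secure_sources else (0.0, 0.0)
    total_sources = public_sources + secure_sources
    verification = total_sources * 4.0  # verification, testing and updating
    low = fixed + identity_low + public_sources * 1.0 + secure_sources * 2.0 + verification
    high = fixed + identity_high + public_sources * 1.0 + secure_sources * 24.0 + verification
    return low, high


# Example: three public sources and two secure sources.
low, high = setup_effort_hours(public_sources=3, secure_sources=2)
print(f"Estimated effort: {low} to {high} hours")
```

The wide gap between the low and high figures comes almost entirely from the secure sources, which is where local expertise and access to systems matter most.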


User Interface Building

Customizing hitlist results through XSLT: 1 day

Altering standard look and feel through template customization: 1-2 days

Building a “standard” Web Services application from sample: 2 hours (user familiar with Web Services and Oracle JDeveloper) to 2 days (novice)

Building a complete custom UI: variable, typically 2 to 10 days including testing

 Administration

Source and index optimization scheduling: 2 hours

Monitoring of query statistics, and query tuning: variable, suggest 2-4 hours per week

Fixing failed crawler schedules: variable, suggest 0-2 hours per week

User Training

Use of Basic Search page: 5 minutes per user

Use of Advanced Search page: 30 minutes per user

Conclusion

Oracle Secure Enterprise Search is easy to install, configure and use. However, as with any enterprise software product, it pays to plan in advance the steps required to prepare and roll out the project to your users.


July 2011

 Author: Roger Ford

Contributing Authors: Stefan Buchta, Jinyu Wang, Shijun Cheng

Oracle Corporation

World Headquarters

500 Oracle Parkway

Redwood Shores, CA 94065

U.S.A.

Worldwide Inquiries:

Phone: +1.650.506.7000

Fax: +1.650.506.7200

oracle.com

Copyright © 2011, Oracle and/or its affiliates. All rights reserved. This document is provided for information purposes only and the

contents hereof are subject to change without notice. This document is not warranted to be error-free, nor subject to any other

warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or

fitness for a particular purpose. We specifically disclaim any liability with respect to this document and no contractual obligations are

formed either directly or indirectly by this document. This document may not be reproduced or transmitted in any form or by any

means, electronic or mechanical, for any purpose, without our prior written permission.

Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.

 AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices.

Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license

and are trademarks or registered trademarks of SPARC International, Inc. UNIX is a registered trademark licensed through X/Open

Company, Ltd. 1010