netezza-pdf

© Copyright IBM Corporation 2009 Trademarks Loading a large volume of Master Data Management data quickly: Using MDM Server maintenance services batch

Page 1 of 34

Loading a large volume of Master Data Management data quickly: Using MDM Server maintenance services batch

Neeraj Singh ([email protected]), Advisory Software Engineer, IBM

Yongli An ([email protected]), MDM Performance Manager, IBM

14 August 2009

The maintenance services solution for IBM InfoSphere™ Master Data Management Server addresses the needs of clients in the first phase of implementing initial load solutions. Using MDM, clients need to perform initial and delta loads, typically as a batch. This article focuses on the maintenance transaction approach to performing initial loads, including an introduction, installation, and setup. It also covers performance tuning tips and best practices. You can use the recommendations in this article as guidance for your own MDM Server initial load solutions that use maintenance services.

Introduction

IBM InfoSphere Master Data Management Server (MDM Server) is an enterprise application that helps companies gain control of business information by enabling them to manage and maintain a complete and accurate view of their master data. MDM Server provides a unified operational view of their customers, accounts, and products, and it provides an environment that processes updates to and from multiple channels. It aligns these front office systems with multiple back office systems in real time, providing a single source of truth for master data.

The maintenance services solution for IBM InfoSphere Master Data Management (MDM) Server is built to address the needs of clients in the first phase of implementing initial load solutions. At this stage, clients deploy InfoSphere MDM Server for master data management: data is loaded into the MDM Server repository, but most data changes still come from existing legacy systems. With MDM Server, the client performs initial and delta loads, typically in a batch. An initial load is the original movement of data from source systems into the MDM Server repository when the repository is empty. Delta loads are regular (such as daily) data updates from source systems into InfoSphere MDM Server.

developerWorks® ibm.com/developerWorks/

There are two different approaches to loading data into InfoSphere MDM Server in batch. The maintenance service batch approach loads data into InfoSphere MDM Server using the maintenance services invoked by the Batch Processor. Alternatively, data can be loaded directly into the database using DataStage jobs.

This article shares an IBM team's experience from case studies that focused on the maintenance transaction approach, using InfoSphere MDM Server version 8.0.1.

The article starts with an introduction to MDM Server maintenance transactions. It then covers the basic installation and setup steps of the MDM Server environment, including DB2® database server, WebSphere® Application Server, InfoSphere MDM Server, MDM Server maintenance transactions, and the batch processor. The article gives a high-level summary of key performance results based on internal case studies, and it concludes with a list of performance tuning tips and best practices for getting optimal performance during the initial data load. Using this article, you can leverage the IBM team's experience and use its recommendations as guidance in your own InfoSphere MDM Server initial load solutions.

Introducing the MDM Server service batch approach

The MDM Server service batch approach loads data into MDM Server using maintenance transactions that are invoked by the batch processor or by any other batch framework. Because MDM Server services process the data during the load, this approach provides the best level of business data validation. You can use the same set of maintenance transactions for both initial and delta loads.

To create the setup that uses this option, you need to install InfoSphere MDM Server capable of running maintenance transactions. You also need to prepare the input data in a format that the Batch Processor can consume.

What are maintenance transactions?

InfoSphere MDM Server creates a unique internal identifier for each record or business entity that serves as its internal key. The regular InfoSphere MDM Server services expect the internal key to be provided as part of the update service request, to ensure that services can identify the correct business entity in the database. However, when data flows into InfoSphere MDM Server directly from external applications such as legacy systems, the internal key is not known, and often the nature of the data change is also not known.

Maintenance transactions address this problem. These transactions do not require the internal key as part of the input. They also do not require the external system to specify whether this entity needs to be added or updated in InfoSphere MDM Server. Instead of the internal key, maintenance transactions expect the business key as part of the input, which is the unique identifier of the business entity in external applications. Maintenance transactions use the business key provided in the load operation to locate the correct instance of the business entity in the database. If an existing entity is found, it is updated using the appropriate transaction, such as updateParty. If no existing entity is found, a new entity is created in InfoSphere MDM Server using the appropriate transaction, such as addParty.
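The add-or-update decision can be illustrated with a toy simulation. This is a hypothetical sketch, not MDM Server code: a bash associative array stands in for the MDM Server repository, maintain_party is an invented function name, and the "internal key" is just a counter. It requires bash 4 or later.

```bash
#!/usr/bin/env bash
# Hypothetical toy model of the maintenance-transaction decision; this is not
# MDM Server code. Requires bash 4+ for associative arrays.

# Stand-in "repository": business key -> internal key, internal key -> name.
declare -A INTERNAL_KEY_BY_BUSINESS_KEY=()
declare -A PARTY_NAME=()
NEXT_INTERNAL_KEY=1000   # the repository, not the caller, assigns internal keys
LAST_ACTION=""

# maintain_party BUSINESS_KEY NAME
# Looks the party up by business key: updates it if found, adds it otherwise.
# The caller never supplies an internal key or says whether to add or update.
maintain_party() {
  local business_key=$1 name=$2 internal_key
  if [[ -n ${INTERNAL_KEY_BY_BUSINESS_KEY[$business_key]+found} ]]; then
    internal_key=${INTERNAL_KEY_BY_BUSINESS_KEY[$business_key]}
    PARTY_NAME[$internal_key]=$name
    LAST_ACTION="updateParty $internal_key"
  else
    internal_key=$NEXT_INTERNAL_KEY
    NEXT_INTERNAL_KEY=$((NEXT_INTERNAL_KEY + 1))
    INTERNAL_KEY_BY_BUSINESS_KEY[$business_key]=$internal_key
    PARTY_NAME[$internal_key]=$name
    LAST_ACTION="addParty $internal_key"
  fi
  echo "$LAST_ACTION"
}
```

Called directly (not inside `$( )`, which would discard the state changes), `maintain_party SRC-42 "Jane Doe"` prints `addParty 1000`; a second call with the same business key prints `updateParty 1000`, even though the request never carried the internal key.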

There are many types of maintenance transactions, including maintainParty, maintainPersonName, and maintainContractPlus. For a complete list of the transactions and more details about them, refer to the MDMRapidDeploymentPackage_CompositeMaintenanceServices.pdf document, available as part of the EntryLevelMDM patch.

Maintenance transactions are not part of the default InfoSphere MDM Server 8.0.1 distribution and installation. You need to obtain and install the EntryLevelMDM patch to use these transactions.

Note: Maintenance transactions are part of the default InfoSphere MDM Server 8.5 distribution. They are provided with source code as part of the MDM Server Samples distribution archive, and you need to install them on top of an existing InfoSphere MDM Server 8.5 instance. See Resources for a link to instructions. It is recommended that you get the assets from the FTP site mentioned in the "Get the installer" step of this article to ensure you have the latest version.

Batch transaction processing

You can use maintenance transactions to load data using MDM Server Batch, or they can be invoked like any other service exposed by MDM Server using the RMI or JMS messaging mechanisms. This article focuses on the batch invocation method. InfoSphere MDM Server provides two ways to perform batch transaction processing: the J2SE Batch Processor framework or the WebSphere Application Server eXtended Deployment batch framework. This article focuses on the first option, the J2SE Batch Processor framework.

The J2SE Batch Processor framework is a J2SE client application, and it is part of a default InfoSphere MDM Server installation. The batch processor is a multi-threaded application that can process large volumes of batch data. It can process multiple records from the same batch input simultaneously, increasing the throughput. Additionally, you can run multiple instances of the batch processor simultaneously, each one processing a separate batch input and pointing to the same server or to different servers.

Each batch record in the batch input flows through the batch processor in the following sequence:

1. The reader consumer reads the record from the batch input. The submitter consumer sends it to the request/response framework for parsing and processing.

2. The parser transforms the input request into one or more business objects.

3. After the request passes through the business proxy, business processing and persistence logic are applied to the business objects.

4. The application responses are sent to the constructor to construct the desired batch output response.

5. The constructed response is returned to the batch processor.

6. The writer records the transaction outcome in the writer log, if necessary. For example, FailedWriter logs any failed messages.
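As a rough mental model only, the reader and writer ends of this flow resemble the bash loop below. Everything here is hypothetical: process_request stands in for the parse, business proxy, persistence, and construct steps that run inside MDM Server, and the output and failed-log files are invented. Unlike the real batch processor, this sketch handles records one at a time rather than on multiple threads.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for the server-side steps 2-5: succeeds unless the
# request line contains the word FAIL, and prints a response on success.
process_request() {
  local request=$1
  if [[ $request == *FAIL* ]]; then
    return 1
  fi
  printf '<response>%s</response>\n' "$request"
}

# run_batch INPUT OUTPUT FAILED_LOG   (hypothetical)
# Reader: one request per input line. Writers: responses go to OUTPUT and
# failed requests go to FAILED_LOG, in the spirit of FailedWriter.
run_batch() {
  local input=$1 output=$2 failed_log=$3 line response
  : > "$output"
  : > "$failed_log"
  while IFS= read -r line || [[ -n $line ]]; do
    [[ -z $line ]] && continue          # skip blank lines
    if response=$(process_request "$line"); then
      printf '%s\n' "$response" >> "$output"
    else
      printf '%s\n' "$line" >> "$failed_log"
    fi
  done < "$input"
}
```

Keeping failed requests verbatim in their own file means they can be corrected and fed back in as a new batch input.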

The batch processor is shipped with pre-built readers and writers that can be used as is. The default reader expects the batch input to be in XML format, with one XML request per line. The default writer writes the responses in XML format. You can also use the InfoSphere MDM Server batch processor to process batch files containing messages in SIF format.

If your input data is not in the format described above, you need to convert it to the required format or use a customized reader and parser. Many of the Batch Processor components can be customized, but customization is not within the scope of this article.
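If your extract produces one pretty-printed XML request per file, a sketch like the following can join each request onto a single line for the default reader. It assumes each file holds exactly one request and that whitespace between tags is insignificant; the function names and any file names are hypothetical, and this text-based approach is not a general XML processor (for example, it also drops whitespace-only text between tags).

```bash
#!/usr/bin/env bash
# flatten_xml_request FILE   (hypothetical helper)
# Prints the single XML request in FILE on one line: line breaks become
# spaces, whitespace between '>' and '<' is dropped, and the line is trimmed.
# Assumes whitespace between tags is insignificant (no mixed content).
flatten_xml_request() {
  local flat
  flat=$(tr '\r\n' '  ' < "$1" |
    sed -e 's/>[[:space:]]*</></g' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')
  printf '%s\n' "$flat"
}

# build_batch_input OUTPUT FILE...   (hypothetical helper)
# Writes one flattened request per line to OUTPUT, in argument order.
build_batch_input() {
  local output=$1 file
  shift
  : > "$output"
  for file in "$@"; do
    flatten_xml_request "$file" >> "$output"
  done
}
```

For example, `build_batch_input batch_input.txt requests/*.xml` would produce a file with one request per line, assuming the requests sit in a requests directory.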

Understanding software and hardware requirements

The following is a typical system topology for an InfoSphere MDM Server deployment that uses QualityStage from Information Server for standardization and matching:

• Application Server and InfoSphere MDM Server are installed on one physical box or LPAR (Server1) with adequate CPU capacity. The number of CPUs depends on the overall throughput requirements.

• The database server is installed on another physical box or LPAR (Server2) with ample I/O capacity.

• The IIS server should be installed either on the database server or on a third physical box or LPAR (Server3) with adequate I/O bandwidth.

• The IIS client is used to configure QualityStage (QS) jobs, and it is installed on a Windows® computer.

To make the best use of a given configuration, follow these general guidelines:

• The ratio of the number of CPUs on the InfoSphere MDM Server box to the number on the database server can range from 2:1 to 3:1. For example, if you have a database server with 4 CPUs, the recommended number of CPUs on the MDM Server box is at least 8, so that the CPU capacity on the database server is well utilized.

• You should have 5 to 10 physical disk spindles available for each CPU on the database server.

• The ratio of the number of CPUs on the InfoSphere MDM Server box to the number on the IIS server can range from 2:1 to 1:1. For example, if you have MDM Server with 8 CPUs, the recommended number of CPUs on the IIS server box is between 4 and 8.

Note: You need the IIS server only if you plan to use QualityStage for standardization and matching (such as suspect processing). The InfoSphere MDM Server default configuration does not use QualityStage.
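These guidelines reduce to simple arithmetic. The bash functions below are a hypothetical helper, not an IBM tool; they only apply the rules of thumb stated above (MDM:database CPUs from 2:1 to 3:1, 5 to 10 spindles per database CPU, MDM:IIS CPUs from 2:1 to 1:1), with the IIS minimum rounded up as an added assumption.

```bash
#!/usr/bin/env bash
# Hypothetical sizing helpers that apply the rule-of-thumb ratios above.

# mdm_cpu_range DB_CPUS -> "MIN MAX" MDM Server CPUs (MDM:DB from 2:1 to 3:1)
mdm_cpu_range() {
  local db_cpus=$1
  echo "$((2 * db_cpus)) $((3 * db_cpus))"
}

# db_spindle_range DB_CPUS -> "MIN MAX" physical disk spindles (5 to 10 per CPU)
db_spindle_range() {
  local db_cpus=$1
  echo "$((5 * db_cpus)) $((10 * db_cpus))"
}

# iis_cpu_range MDM_CPUS -> "MIN MAX" IIS server CPUs (MDM:IIS from 2:1 to 1:1).
# The minimum is rounded up so an odd MDM CPU count is not under-provisioned.
iis_cpu_range() {
  local mdm_cpus=$1
  echo "$(((mdm_cpus + 1) / 2)) $mdm_cpus"
}
```

With a 4-CPU database server, `mdm_cpu_range 4` prints `8 12`, matching the "at least 8" example, and `iis_cpu_range 8` prints `4 8`. For comparison, the example environment's 8-core database server has 40 external disks, which is 5 per core, the low end of the spindle guideline.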

Exploring the example environment

This section briefly describes the example environment, including hardware and software information for each layer in the stack. It also describes the system topology used in the tests.

Software and hardware stack

• Server 1 (AppServer and InfoSphere MDM Server)
  • Hardware
    • Machine type: IBM 9116-561, PowerPC® POWER5™
    • CPUs: 8-core POWER5 with 16 threads, 1.5 GHz, 64-bit
    • Memory/IO: 32 GB RAM, 6 internal disks
  • Software
    • OS: AIX® Version 5300-06 (64-bit)
    • WebSphere® Application Server ND 6.1.0.11 (32-bit)
    • InfoSphere MDM Server 8.0.1 + EntryLevelMDM patch

• Server 2 (DB2® database server)
  • Hardware
    • Machine type: IBM 9116-561, PowerPC POWER5
    • CPUs: 8-core POWER5 with 16 threads, 1.5 GHz, 64-bit
    • Memory/IO: 32 GB RAM, 6 internal disks + 40 external disks
  • Software
    • OS: AIX Version 5300-06 (64-bit)
    • DB2® database server v9.5 (64-bit)

• Server 3 (Information Server)
  • Hardware
    • Machine type: IBM 9116-561, PowerPC POWER5
    • CPUs: 8-core POWER5 with 16 threads, 1.5 GHz, 64-bit
    • Memory/IO: 32 GB RAM, 6 internal disks
  • Software
    • OS: AIX Version 5300-06 (64-bit)
    • IIS v8.0.1

• Server 4 (IIS client, used to configure QualityStage jobs; not needed while running the tests)
  • Hardware
    • 32-bit x86 machine
  • Software
    • OS: Windows 2003 Server
    • IIS client version 8.0.1 for Windows

System topology

For InfoSphere MDM Server to use QualityStage jobs for standardization and matching, you need Server3 and Server4, as shown in Figure 1. For the default standardization and matching algorithms from InfoSphere MDM Server, Server1 and Server2 are sufficient.

Figure 1. System topology

Installing the components

This section shows the high-level steps required to install the needed software in the test environment. The steps focus on the maintenance services, while briefly mentioning the prerequisite software installation, including WebSphere® Application Server, DB2 database server, InfoSphere MDM Server, and InfoSphere Information Server.

Installation prerequisites

The prerequisite installations include WebSphere Application Server, DB2 database server, and InfoSphere Information Server. For installation instructions, see each product's Information Center in Resources.

1. On Server1, install IBM WebSphere Application Server Network Deployment, Version 6.1, and upgrade it with Fixpack 11.

2. On Server2, install DB2 Database Server, Version 9.5.

3. On Server3, install IIS Server, Version 8.0.1.

4. On Server4 (Windows machine), install the IIS client.

InfoSphere MDM Server Installation

For InfoSphere MDM Server installation, see Resources for a link to the information center. You can install it on a standalone WebSphere Application Server or on a WebSphere Application Server cluster.

Installation of Entry Level MDM Server patch for maintenance services

Follow the steps in this section to apply the Entry Level MDM (ELMDM) Server patch, which enables you to use maintenance transactions.

These instructions assume that you have already installed InfoSphere MDM Server and applied all the required fixpacks. They are based on the software stack described in the Software and hardware stack section of the example environment.

Step 1. Get the installer.

Maintenance transactions are not part of the default installation of MDM Server, and they need to be installed separately. If you have a service agreement with IBM, you can get the installer for maintenance transactions by logging in to the Secure File Transfer site and finding https://testcase.boulder.ibm.com/www/prot/MDM_RDP/?T. At the time of writing, the latest installable package is https://testcase.boulder.ibm.com/www/prot/MDM_RDP/MDMServer801_RDP801/ELMDM-20090407.tar.gz. Contact your IBM service representative if you need help getting this package.

For more instructions, see the chapter titled Installing Rapid Deployment Package for MDM Server Maintenance transactions and MDM Customizations in the document MDMRapidDeploymentPackage_InstallGuide.pdf. You can find this document in the Docs directory when you uncompress the installer.

Step 2. Make required backups before installing.

The installer makes changes to the InfoSphere MDM Server database. As a precaution, you might want to back up this database before running the installer. The installer creates backup copies of the files that it changes; these files are named *.beforeELMDM. However, they get overwritten during subsequent installer runs, so before you invoke the installer again for any reason, make sure you have moved the previous set of files to a safe place.

The files modified by the installer are:

• The installable .ear file in the MDM Server home directory. For example, /usr/IBM/MDM_801/installableApps/MDM.ear

• A set of files in the <MDM_Instance>.ear directory under WebSphere Application Server. For example, /opt/IBM/WebSphere/AppServer/profiles/AppSrv1/installedApps/myHostCell01/MDM_801.ear/
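Before re-running the installer, the *.beforeELMDM copies can be moved aside with a sketch like the one below. The function name and destination directory are hypothetical; the source would be a directory such as the installedApps path above, and the destination must be outside it. The relative paths are kept so that each copy can be traced back to the file it belongs to. For the database itself, DB2's own `db2 backup database` command is the usual tool.

```bash
#!/usr/bin/env bash
# stash_elmdm_backups SRC_DIR DEST_DIR   (hypothetical helper)
# Moves every *.beforeELMDM file under SRC_DIR into DEST_DIR, recreating the
# relative directory structure, so the next installer run cannot overwrite
# them. DEST_DIR must not be inside SRC_DIR.
stash_elmdm_backups() {
  local src=$1 dest=$2
  mkdir -p "$dest" || return 1
  (cd "$src" && find . -type f -name '*.beforeELMDM') |
    while IFS= read -r rel; do
      mkdir -p "$dest/$(dirname "$rel")"
      mv "$src/$rel" "$dest/$rel"
    done
}
```

A timestamped destination, for example `"$HOME/elmdm-backups/$(date +%Y%m%d%H%M%S)"`, keeps the copies from successive runs apart.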

Step 3. Prepare the installer.

Complete the following steps to prepare the installer.

a. Create a new base directory named setup.

b. Extract the installer (.tar.gz file) in this directory. It creates several directories, including one named install.

c. Go to directory setup/install/DB2 database server.

d. Give execute permissions to all the scripts using the command chmod 755 *.sh

e. Connect to the InfoSphere MDM Server database and execute the SQL below. The schema name is assumed to be mySchema.

Listing 1. SQL to execute

db2 "insert into mySchema.DataAssociation values(25083715210700005,'a_name',current_timestamp,'a_description',null)"
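Step 3 can be scripted roughly as follows. This is a sketch under assumptions: the function names, tarball path, and base directory are placeholders; instead of relying on the exact name of the script directory, it marks every .sh file under the extracted install directory as executable; and the database part uses the standard DB2 command line processor syntax (`db2 connect to ... user ... using ...`) with Listing 1's statement.

```bash
#!/usr/bin/env bash
# prepare_elmdm_installer TARBALL BASE_DIR   (hypothetical helper)
# Steps a-d: create the base directory, extract the installer into it, and
# give execute permission to the installer's shell scripts.
prepare_elmdm_installer() {
  local tarball=$1 base=$2
  mkdir -p "$base" || return 1
  # gunzip -c piped into tar avoids relying on tar's -z option
  gunzip -c "$tarball" | (cd "$base" && tar -xf -) || return 1
  [ -d "$base/install" ] || { echo "no install directory in $base" >&2; return 1; }
  find "$base/install" -type f -name '*.sh' -exec chmod 755 {} +
}

# insert_data_association DB_NAME DB_USER DB_PASSWORD SCHEMA   (hypothetical)
# Step e: run the Listing 1 insert. Requires the DB2 command line processor.
insert_data_association() {
  local db=$1 user=$2 password=$3 schema=$4 rc
  db2 connect to "$db" user "$user" using "$password" || return 1
  db2 "insert into ${schema}.DataAssociation values(25083715210700005,'a_name',current_timestamp,'a_description',null)"
  rc=$?
  db2 connect reset
  return "$rc"
}
```

With the example values from this article, the calls would look like `prepare_elmdm_installer ELMDM-20090407.tar.gz setup` followed by `insert_data_association MDMDB myDBuser myDBpassword mySchema`.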

Step 4. Customize a clustered environment.

This step is not required if your MDM Server is a standalone server. If you are installing ELMDM on a clustered MDM Server installation (MDM Server running on a cluster of WebSphere Application Servers), make the following modifications in the scripts.

a. In setVariables.sh, add the line in Listing 2 at the beginning of the script. NAME_OF_SERVER refers to the name of the WebSphere Application Server instance that is a member of the cluster.

Listing 2. Added line

#add the line below
export SRV_NAME=NAME_OF_SERVER

b. In the scripts install_DisableHVL.sh, install_EnableHVL.sh, and install_ELPCustom.sh, make the changes shown in Listing 3.

Listing 3. Changes to script files

#comment out the line below and replace with the new line as shown below
#$CURRENT/restartServer.sh $WAS_HOME $NODE_NAME $APP_NAME $ADMIN_USER $ADMIN_PASSWORD
#add the line below
$CURRENT/restartServer.sh $WAS_HOME $NODE_NAME $SRV_NAME $ADMIN_USER $ADMIN_PASSWORD

c. In the install_ELPTx.sh script, make the changes in Listing 4.

Listing 4. The install_ELPTx.sh script

#comment out the line below and replace with the new line as shown below
#$LOC/restartServer.sh $WAS_HOME $NODE_NAME $APP_NAME $ADMIN_USER $ADMIN_PASSWORD
#add the line below
$LOC/restartServer.sh $WAS_HOME $NODE_NAME $SRV_NAME $ADMIN_USER $ADMIN_PASSWORD
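The Step 4 edits can also be applied with standard tools. The sketch below is hypothetical (the function names are invented) and assumes that each restartServer.sh call sits on its own line exactly as in Listings 3 and 4. Like the listings, it keeps the original call commented out, and it saves a .bak copy of each file first. Following Listing 2, it puts the export at the very top of setVariables.sh.

```bash
#!/usr/bin/env bash
# add_srv_name SETVARS_FILE SERVER_NAME   (hypothetical helper)
# Listing 2: prepend "export SRV_NAME=..." to setVariables.sh. Run it once.
add_srv_name() {
  local file=$1 server=$2
  cp "$file" "$file.bak" || return 1
  { printf '#add the line below\nexport SRV_NAME=%s\n' "$server"; cat "$file.bak"; } > "$file"
}

# use_srv_name SCRIPT...   (hypothetical helper)
# Listings 3 and 4: for each uncommented restartServer.sh call that passes
# $APP_NAME, print a commented-out copy, then the call with $SRV_NAME instead.
use_srv_name() {
  local script
  for script in "$@"; do
    cp "$script" "$script.bak" || return 1
    sed -e '/^[^#]*restartServer\.sh \$WAS_HOME \$NODE_NAME \$APP_NAME/{
h
s/^/#/
p
g
s/\$APP_NAME/$SRV_NAME/
}' "$script.bak" > "$script"
  done
}
```

Because already-commented lines and converted calls no longer match the pattern, running `use_srv_name install_DisableHVL.sh install_EnableHVL.sh install_ELPCustom.sh install_ELPTx.sh` a second time leaves the scripts unchanged.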

Step 5. Optionally modify the installer to help in debugging.

Complete the following steps to modify the installer for debugging.

a. At the beginning of each script, add set -x.

b. Add the verbose option to db2 calls by replacing all occurrences of db2 -tf with db2 -tvf in the scripts below:
• runsql.sh
• install_ELPCustom.sh
• install_EnableHVL.sh
• install_DisableHVL.sh
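Both debugging edits can be made mechanically. The helpers below are a hypothetical sketch: add_set_x places set -x after a #! line if the script has one (so the interpreter line still works), and make_db2_verbose is meant to be run only on the four scripts listed in step b. Each keeps the original as SCRIPT.bak.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for Step 5; each keeps the original as SCRIPT.bak.

# add_set_x SCRIPT...
# Step 5a: insert "set -x" as the first command, after a "#!" line if present.
add_set_x() {
  local script
  for script in "$@"; do
    cp "$script" "$script.bak" || return 1
    awk 'NR == 1 && /^#!/ { print; print "set -x"; next }
         NR == 1          { print "set -x" }
                          { print }' "$script.bak" > "$script"
  done
}

# make_db2_verbose SCRIPT...
# Step 5b: replace every "db2 -tf" with "db2 -tvf".
make_db2_verbose() {
  local script
  for script in "$@"; do
    cp "$script" "$script.bak" || return 1
    sed 's/db2 -tf/db2 -tvf/g' "$script.bak" > "$script"
  done
}
```

Because the output is redirected into the existing file rather than a new one, the scripts keep the execute permissions granted in Step 3.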

Step 6. Set your environment variables

Modify the setVariables.sh script according to your environment. The values given in Listing 5 are examples. Read the comments and instructions embedded within the example.

Listing 5. Extract from the setVariables.sh script

export WAS_HOME=/opt/IBM/WebSphere/AppServer
export CELL_NAME=myhostCell01

#Set the profile name used by the WAS instance running MDM Server, such as AppSrv01 or Custom01
export NODE_NAME=Custom01
export APP_NAME=MDM_801

#The name of the WebSphere Application Server running MDM Server.
#You need this only if you followed Step 4 above.
export SRV_NAME=Cluster_member1

export INSTALL_HOME=/usr/IBM/MDM_801

# IIS Server Version: Could be 801 or 81
export IIS_SRV_VERSION=801

export DB_NAME=MDMDB
export DB_USER=myDBuser
export DB_PASSWORD=myDBpassword
export TABLE_SPACE=TABLESPACE1
export INDEX_SPACE=INDEXSPACE1
export LONG_SPACE=LONGSPACE1

export TRIG=COMPOUND
export DEL_TRIG=TRUE

export APPLICATION_NAME='WebSphere Customer Center'
export APPLICATION_VERSION=8.0.1.0
export DEPLOY_NAME=MDM_801

#You need to set this only if you are integrating QualityStage with MDM Server.
#Please note the back slashes. The number 2809 here refers to the
#bootstrap port of the WebSphere Application Server instance running the IIS server.
export ISP_URL='iiop:\/\/myIISserver.mylab.ibm.com:2809'
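Before running the installer, it can help to source the edited file and confirm that nothing was left empty. The sketch below is a hypothetical helper, not part of the installer. Its variable list is taken from Listing 5, leaving out SRV_NAME and ISP_URL because they are needed only in some setups. It sources the file in a subshell so that your current shell is unaffected. A second helper shows one way to produce the backslash-escaped form that ISP_URL uses from a plain URL.

```bash
#!/usr/bin/env bash
# check_set_variables FILE   (hypothetical helper)
# Sources FILE in a subshell and reports every required variable that is
# unset or empty. Returns non-zero if anything is missing.
check_set_variables() {
  (
    . "$1" || exit 1
    missing=0
    for name in WAS_HOME CELL_NAME NODE_NAME APP_NAME INSTALL_HOME \
                IIS_SRV_VERSION DB_NAME DB_USER DB_PASSWORD TABLE_SPACE \
                INDEX_SPACE LONG_SPACE TRIG DEL_TRIG APPLICATION_NAME \
                APPLICATION_VERSION DEPLOY_NAME; do
      if [ -z "${!name}" ]; then          # bash indirect expansion
        echo "missing: $name" >&2
        missing=1
      fi
    done
    exit "$missing"
  )
}

# escape_slashes URL   (hypothetical helper)
# Prints URL with every "/" written as "\/", the form used for ISP_URL.
escape_slashes() {
  printf '%s\n' "$1" | sed 's|/|\\/|g'
}
```

For example, `escape_slashes 'iiop://myIISserver.mylab.ibm.com:2809'` prints the escaped value shown in Listing 5, and `check_set_variables ./setVariables.sh` lists anything still blank.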

Step 7. Execute the scripts.

a. Execute install_ELPTx.sh.

b. If you are integrating InfoSphere MDM Server with QualityStage, run the install_ELPCustom.sh script as well.

Step 8. Check for errors.

Go through all the log files to ensure there are no errors.
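A quick first pass over the logs can be automated with grep. The log directory is an assumption (this article does not say where the scripts write their logs), and the pattern is a heuristic. It flags DB2 error messages, which have the form SQLnnnnN, as well as any line containing ERROR or Exception. It does not replace reading the logs.

```bash
#!/usr/bin/env bash
# scan_install_logs LOG_DIR [IGNORE_CODE...]   (hypothetical helper)
# Prints file:line:text for every suspicious line in files under LOG_DIR.
# Lines containing any IGNORE_CODE (for example SQL0803N) are filtered out.
# Returns 0 if nothing suspicious remains, 1 otherwise.
scan_install_logs() {
  local dir=$1 code hits
  shift
  # /dev/null makes grep print file names even when there is only one log
  hits=$(find "$dir" -type f -exec grep -nE 'SQL[0-9]{4,5}N|ERROR|Exception' /dev/null {} +)
  for code in "$@"; do
    hits=$(printf '%s\n' "$hits" | grep -v -- "$code")
  done
  if [ -n "$hits" ]; then
    printf '%s\n' "$hits"
    return 1
  fi
  return 0
}
```

Re-running a script that repeats database inserts can produce duplicate-key errors, which DB2 reports as SQL0803N. Passing SQL0803N as an ignore code filters them out, but only do so once you have confirmed that this is their only cause.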

Step 9. Repeat steps for a clustered environment.

If you are installing in a clustered environment, complete the steps below for each cluster member.

a. Reconfigure setVariables.sh to point to another cluster member.

b. Run the additionalClusterInstall.sh script.

c. If you are integrating InfoSphere MDM Server with QualityStage, run the install_ELPCustom.sh script.

Note: The install_ELPCustom.sh script makes changes to the InfoSphere MDM Server database. Some of these changes cannot be executed more than once (such as a DB insert). Either ignore these errors during repeated execution of this script, or alter the script so that it does not attempt to repeat the database operations.

Step 10. Configure the SIF parser.

Complete this step only if you want to use a SIF parser; otherwise, skip to Step 11. The example uses the default XML parser. To configure the batch processor to use the SIF parser, modify the following:

a. In the DWLCommon_extention.property file, which is in properties.jar in the server runtime environment, set sif_compatibility_mode = on.

b. In the batch extension property file, set ParserAndExecConfiguration.Parser = SIF.

For more details, see the section SIF Parser in MDMRapidDeploymentPackage_CompositeMaintenanceServices.pdf.
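Both changes are simple key = value edits. The helper below is a hypothetical sketch that sets a property in a plain properties file, replacing an existing definition or appending one if the key is absent. For the first change you would also need to extract the file from properties.jar and put it back afterwards (for example with the JDK jar tool); that packaging step, and the exact name of the batch extension property file, depend on your installation and are not shown.

```bash
#!/usr/bin/env bash
# set_property FILE KEY VALUE   (hypothetical helper)
# Writes "KEY = VALUE" in place of the first uncommented KEY line in FILE,
# drops any later duplicates of KEY, or appends the line if KEY is absent.
# Lines starting with "#" or "!" are treated as comments and left alone.
set_property() {
  local file=$1 key=$2 value=$3
  awk -v key="$key" -v value="$value" '
    {
      line = $0
      sub(/^[ \t]+/, "", line)            # ignore leading indentation
      name = line
      sub(/[ \t]*=.*$/, "", name)         # keep the text before the first "="
      if (line !~ /^[#!]/ && name == key) {
        if (!found) { print key " = " value; found = 1 }
        next                              # drop the old (or duplicate) line
      }
      print
    }
    END { if (!found) print key " = " value }
  ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

The two settings would then be `set_property DWLCommon_extention.property sif_compatibility_mode on` and `set_property <batch extension property file> ParserAndExecConfiguration.Parser SIF`.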

Step 11. Restart the InfoSphere MDM Server.

Restart the InfoSphere MDM Server, including all the servers in a cluster.

Integration of InfoSphere MDM Server with QualityStage

If you want to use the default standardization and matching algorithms from InfoSphere MDM Server, these steps are not needed, and you can continue to Optimizing performance with key configuration parameters. However, if you want InfoSphere MDM Server to use QualityStage for standardization and matching, this section describes how to configure them.

These instructions assume the following:

• InfoSphere MDM Server is installed and all the required fixpacks are applied.
• EntryLevelMDM is installed.
• The IIS server and IIS client are installed. The version of the IIS client must be the same as that of the IIS server.
• The software stack is similar to that described in the Software and hardware stack section of the example environment.

See Resources to access the documentation for InfoSphere MDM Server and QualityStage integration (MDM Server Developers Guide, chapter titled Integrating IBM Information Server QualityStage with IBM InfoSphere Master Data Management Server). The instructions in this article complement those in the developer's guide, and they include a few additional configuration changes that are helpful during the installation.

Step 1. Change security settings.

If global security is enabled on the WebSphere Application Server running IIS, the transaction protocol security on that server must be disabled. To disable protocol security on a server, complete the following steps in the administrative console:

a. In the administrative console, click Servers > Application Servers > server_name. The properties of the application server are displayed in the content pane.

b. Under Container Settings, expand Container Services and click Transaction Service to display the properties page for the transaction service.

c. Under Additional Properties, click Custom Properties.

d. On the Custom Properties page, click New.

e. Type DISABLE_PROTOCOL_SECURITY in the Name field, and type TRUE in the Value field.

f. Click Apply or OK.

g. Click Save to save your changes to the master configuration.

h. Restart the server.

If WebSphere Application Server application security is also turned on for InfoSphere MDM Server, the LTPA keys need to be shared between the MDM WebSphere Application Server cell and the IIS WebSphere Application Server cell. For detailed instructions, refer to the WebSphere Application Server Information Center (see Resources).

Step 2. Get the installer.

The installable components are part of the same bundle that you used while installing maintenance services. You will find them in the QualityStage folder.

Step 3. Create the IIS project.

Use the IIS Administrator Client to connect to the IIS server. Create a new project called ELMDMQS.

Step 4. Import the IIS project.

1. Log in to the ELMDMQS project through the DataStage and QualityStage Designer.
2. Click Import > DataStage Components.
3. Browse to the ELMDMQS.dsx file in the EntryLevelMDM\QualityStage folder that you extracted earlier.
4. Import the file.

Step 5. Provision imported rule sets.

You need to provision imported rule sets to the Designer client before a job that uses them can be compiled. Complete the following steps to provision imported rule sets.

a. In the Designer client, find the rule set within the repository tree ELMDMQS > ELMDMRT > Standardization Rules > MDMQS.

b. Right-click the rule set and select Provision All from the menu, as shown in Figure 2.

Figure 2. Provisioning rule sets

c. Repeat the steps for all the rule sets listed below:
• MDMQS\Standardization Rules\MDMCanada\CAADDR\MDMCAADDR
• MDMQS\Standardization Rules\MDMCanada\CAAREA\MDMCAAREA
• MDMQS\Standardization Rules\MDMUSA\USADDR\MDMUSADDR
• MDMQS\Standardization Rules\MDMUSA\USAREA\MDMUSAREA
• MDMQS\Standardization Rules\MNADKEYS\MNADKEYS
• MDMQS\Standardization Rules\MNNAME\MNNAME
• MDMQS\Standardization Rules\MNNMKEYS
• MDMQS\Standardization Rules\MNPHONE\MNPHONE
• MDMQS\Standardization Rules\MNSPOST\MNSPOST

Step 6. Prepare test data and configure parameters

a. Copy the provided test data (*.csv and *.txt files) into a directory called /data01/ELMDMQS on your IIS server (not the IIS client).

b. Open the parameter set ELMDMQS_Data_Directory under ELMDMQS\ELMDMRT\Parameter Sets (in the Repository view of the Designer).

c. Double-click the parameter set.

d. Go to the Values tab and set the value of the parameter DATADIR to the directory path into which you just copied the test data (/data01/ELMDMQS/ in this example), as shown in Figure 3. Note the slash (/) at the end of the parameter value.
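Copying the test data and working out the DATADIR value can be scripted. In this hypothetical sketch the source directory is wherever you extracted the QualityStage test data, which this article does not name. The function copies the *.csv and *.txt files and prints the destination with the trailing slash that DATADIR needs.

```bash
#!/usr/bin/env bash
# stage_qs_test_data SRC_DIR DEST_DIR   (hypothetical helper)
# Copies the *.csv and *.txt test files from SRC_DIR into DEST_DIR (created
# if needed) and prints DEST_DIR with a trailing slash, ready for DATADIR.
stage_qs_test_data() {
  local src=$1 dest=${2%/} f copied=0
  mkdir -p "$dest" || return 1
  for f in "$src"/*.csv "$src"/*.txt; do
    [ -f "$f" ] || continue     # an unmatched glob stays literal; skip it
    cp "$f" "$dest/" || return 1
    copied=$((copied + 1))
  done
  [ "$copied" -gt 0 ] || { echo "no *.csv or *.txt files in $src" >&2; return 1; }
  printf '%s/\n' "$dest"
}
```

For this example, `stage_qs_test_data <extracted test data directory> /data01/ELMDMQS` would print `/data01/ELMDMQS/`, the value entered for DATADIR in Figure 3.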

Figure 3. Parameter set

e. Under the ELMDMQS\ELMDMRT\Shared Containers folder, double-click to open the shared container MDMQSPartySuspectReferenceMatchOrganization.

f. Set the file paths of the data set stages Data_Frequency and Reference_Frequency to the same path that you provided for ELMDMQS_Data_Directory.DATADIR in the previous step, as shown in Figure 4.

Figure 4. Edit input file path

g. Click OK to save the changes.

h. Close the stage, clicking Yes when it prompts you to save the changes in the stage.

i. Repeat the above steps for MDMQSPartySuspectReferenceMatchPerson.

Step 7. Compile the jobs.

a. Compile all the jobs inside the ELMDMQS\ELMDMRT\Jobs folder and its subfolders using Tools > Multiple Job Compile from the Designer client's menu.

b. Follow the instructions in the wizard, and start compiling.

Note: Batch versions of the jobs can be found in the ELMDMQS\ELMDMRT\Jobs folder. Information Services Director (ISD) versions of these jobs can be found in the ELMDMQS\ELMDMRT\Jobs\ISD folder.

Step 8. Generate match frequency data

a. Use the Director client to run the job ELMDMQS\ELMDMRT\Jobs\MDMQS_Person_Match_Frequency_Generation to generate the match frequency data. When it completes, it generates the files PersonRefMatchTransFreq.txt and PersonRefMatchCandFreq.txt, as shown in Figure 5.

Figure 5. Generating match frequency data

b. Similarly, run ELMDMQS\ELMDMRT\Jobs\MDMQS_Org_Match_Frequency_Generation to generate the files OrgRefMatchTransFreq.txt and OrgRefMatchCandFreq.txt.
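After both jobs finish, checking that all four frequency files exist and are not empty can catch a failed run early. This hypothetical sketch assumes that the files are written to the DATADIR directory configured in Step 6; Figure 5 shows them being generated, but the article does not name the directory. If you prefer the command line to the Director client, DataStage's dsjob client can also run jobs, but check its documentation for your release before relying on it.

```bash
#!/usr/bin/env bash
# check_match_frequency_files DATADIR   (hypothetical helper)
# Verifies that the person and organization match frequency files exist and
# are non-empty (assumes they are written to DATADIR). Reports each missing
# file and returns non-zero if any are missing.
check_match_frequency_files() {
  local dir=${1%/} f missing=0
  for f in PersonRefMatchTransFreq.txt PersonRefMatchCandFreq.txt \
           OrgRefMatchTransFreq.txt OrgRefMatchCandFreq.txt; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For the example setup, the call would be `check_match_frequency_files /data01/ELMDMQS/`.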

Step 9. Run the test jobs.

a. Use the Director client to run the following batch jobs to test that they execute successfully on your system before you use the ISD jobs:

• All jobs in ELMDMQS\ELMDMRT\Standardization Testing
• All jobs in ELMDMQS\ELMDMRT\Match Testing

b. After running the jobs, view the output in the Sequential file to check the results.

Step 10. Deploy services using ISD.

a. Log on to the IBM Information Server (IIS) console.

b. Click File > Import Information Services Project, and browse to the file ELMDMQS_ISDProject.xml in the EntryLevelMDM\QualityStage directory.

c. Keep all the default settings, and click Import.

d. Open the Information Services Application (ELMDMQS) contained in the imported project.

e. Click Develop, as shown in Figure 6.

Figure 6. Selecting the Develop icon

f. Click Information Services Application.

g. On the resulting screen, double-click the ELMDMQS application to open it.

h. Go into Edit mode.

i. In the Select a View window, click Services > ELMDMQSService, as shown in Figure 7.

Figure 7. Configuring jobs using ISD

j. In the expanded tree, select Operations, and double-click each operation in turn to edit it.

Figure 8. Checking the project name

k. Edit each of the operations as follows:

i. Ensure that the project name is correct, as shown in Box 1 in Figure 8. If you chose ELMDMQS as the project name when you created the new project using the administration client, you can keep the defaults. If you specified another name, ensure that the project name and the job names are correct. To check the project and job names, click the Edit button, and browse to the project and job in the ISD folder.

ii. Ensure that the Group Arguments into Structure option is enabled for inputs, as shown in Box 2 in Figure 8.

iii. Change the input data type according to Table 1 below, as shown in Box 3 in Figure 8.

iv. Select or clear the Accept array check boxes according to Table 1, as shown in Box 4 in Figure 8 (the check box should be selected if the table entry indicates Yes).

v. On the output tab, set the output data type and select or clear the Accept array check box according to Table 1.

Table 1. ISD job configuration

Operation name | Operation job name | Inputs accept array | Input data type | Outputs return array | Output data type
standardizeAddress | ISD_MDMQS_Address_Standardization | No | AddressInput | No | AddressOutput
elPersonMatch | ISD_MDMQS_Party_Suspect_Reference_Match_Person | Yes | ELPersonMatchInput | Yes | ELPersonMatchOutput
elOrgMatch | ISD_MDMQS_Party_Suspect_Reference_Match_Org | Yes | ELOrgMatchInput | Yes | ELOrgMatchOutput
standardizePhoneNumber | ISD_MDMQS_Phone_Standardization | No | PhoneNumberInput | No | PhoneNumberOutput
standardizeOrgName | ISD_MDMQS_Organization_Standardization | No | OrgNameInput | No | OrgNameOutput
standardizePersonName | ISD_MDMQS_Person_Standardization | No | PersonNameInput | No | PersonNameOutput

l. On the Provider Properties tab, modify the credentials to match your setup, as shown in Figure 9.

Figure 9. Modifying your credentials

m. Save and close the application.

n. Deploy the application from the Develop menu, as shown in the example in Figure 10. Note the highlighted box, which shows Select the Application ELMDMQS.

Figure 10. Deploying the application

o. Click Deploy, as shown in Figure 10.

p. Leave the defaults, and click Deploy to start the deployment.

Step 11. Set configuration values for QualityStage.

Note: This example integration is for an InfoSphere MDM Server installation on which maintenance services are installed. If you ran install_ELPCustom.sh during the installation of maintenance services, you can skip to Optimizing performance with key configuration parameters.

Set the configuration values listed in Table 2 so that MDM Server can communicate with the IIS QualityStage server.

Table 2. Configuration modifications

Configuration name | Value
/IBM/ThirdPartyAdapters/IIS/defaultCountry | 185
/IBM/ThirdPartyAdapters/IIS/initialContextFactory | Used together with the provider URL to obtain the JNDI registry initial context. A typical value is com.ibm.websphere.naming.WsnInitialContextFactory.
/IBM/ThirdPartyAdapters/IIS/providerURL | iiop://<yourQSServer>:<QSServerBootstrapPort>, for example iiop://myIIS.torolab.ibm.com:2809
/IBM/Party/Standardizer/Name/className | com.ibm.mdm.thirdparty.integration.iis8.adapter.InfoServerStandardizerAdapter
/IBM/Party/Standardizer/Address/className | com.ibm.mdm.thirdparty.integration.iis8.adapter.InfoServerStandardizerAdapter
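The initialContextFactory and providerURL elements correspond to the two standard JNDI environment properties, Context.INITIAL_CONTEXT_FACTORY and Context.PROVIDER_URL. The following Java sketch shows how a JNDI client generally combines them. It is generic JNDI code, not the MDM Server adapter's own implementation, and the class and method names (IisJndiEnvironment, buildEnvironment) are hypothetical. The sketch only builds and validates the environment, because actually creating the context requires the WebSphere client libraries on the classpath.

```java
import java.net.URI;
import java.util.Hashtable;
import javax.naming.Context;

/** Hypothetical helper: builds a JNDI environment from the two IIS configuration values. */
public class IisJndiEnvironment {

    static Hashtable<String, String> buildEnvironment(String initialContextFactory, String providerUrl) {
        URI uri = URI.create(providerUrl);
        // Expect iiop://<host>:<bootstrapPort>, for example iiop://myIIS.torolab.ibm.com:2809
        if (!"iiop".equals(uri.getScheme()) || uri.getHost() == null || uri.getPort() < 0) {
            throw new IllegalArgumentException("Expected iiop://<host>:<port> but got " + providerUrl);
        }
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, initialContextFactory); // class that creates the context
        env.put(Context.PROVIDER_URL, providerUrl);                     // bootstrap address of the IIS server
        return env;
    }

    public static void main(String[] args) {
        Hashtable<String, String> env = buildEnvironment(
                "com.ibm.websphere.naming.WsnInitialContextFactory",
                "iiop://myIIS.torolab.ibm.com:2809");
        System.out.println(env);
        // With the WebSphere client libraries on the classpath, the context would be created with:
        // Context ctx = new javax.naming.InitialContext(env);
    }
}
```

Validating the URL up front catches an unedited placeholder such as iiop://<yourQSServer>:<QSServerBootstrapPort>, which URI.create rejects.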

Step 12. Use QualityStage (QS) for name and address standardization.

Use QS to standardize the names and addresses that are entered into InfoSphere MDM Server. For more information, see Standardizing name, address and phone number information in the MDM developer's guide (see Resources).

Step 13. Use QualityStage in suspect duplicate processing.

QualityStage can be used with the InfoSphere MDM Server Suspect Duplicate Processing (SDP) feature. For more information on using QualityStage with SDP, see Configuring IBM Information Server QualityStage integration for SDP in the MDM developer's guide (see Resources).

Optimizing performance with key configuration parameters

After you install InfoSphere MDM Server, tune the key configuration parameters for optimal performance.

InfoSphere MDM Server and batch processor configuration

1. Increase the number of submitters to increase parallelism. Do this by editing the file <MDM_installation_Folder>/BatchProcessor/properties/Batch.properties. In the example setup, 24 submitters were optimal on an 8-way MDM Server machine.

2. Increase the JVM heap settings for the batch processor. Do this by editing the file <MDM_installation_Folder>/BatchProcessor/bin/runbatch.sh. For example, 512 MB of heap was sufficient for 24 submitters.

3. Reduce BatchProcessor logging by setting the logging threshold to ERROR, if it is not already. Do this by editing <MDM_installation_Folder>/BatchProcessor/Log4J.properties, for example: log4j.appender.file.Threshold=ERROR.

4. Reduce MDM Server logging by setting the threshold to ERROR. Do this by editing Log4J.properties inside the properties.jar file at <WebSphere_Location>/profiles/<ServerName>/installedApps/<CellName>/<InstanceName>/properties.jar.
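Because both logging changes are easy to lose when a file is replaced or properties.jar is redeployed, a small pre-flight check can help before a long load. The Java sketch below is hypothetical (it is not an MDM Server tool): it reports every Log4j 1.x key ending in .Threshold, plus the global log4j.threshold, in the batch processor's Log4J.properties and, optionally, in properties.jar. It assumes that Log4J.properties sits at the root of properties.jar; adjust the entry name if your JAR differs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;
import java.util.jar.JarFile;
import java.util.zip.ZipEntry;

/** Hypothetical pre-flight check for Log4j 1.x logging thresholds. */
public class Log4jThresholdCheck {

    /** Collects keys such as log4j.appender.file.Threshold, plus the global log4j.threshold. */
    static Map<String, String> thresholds(Properties props) {
        Map<String, String> result = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.endsWith(".Threshold") || key.equals("log4j.threshold")) {
                result.put(key, props.getProperty(key).trim());
            }
        }
        return result;
    }

    /** True only if at least one threshold is set and every threshold found is ERROR. */
    static boolean allError(Properties props) {
        Map<String, String> found = thresholds(props);
        return !found.isEmpty() && found.values().stream().allMatch("ERROR"::equalsIgnoreCase);
    }

    static Properties loadFile(Path file) throws IOException {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            props.load(in);
        }
        return props;
    }

    /** Reads a properties entry from inside a JAR, for example Log4J.properties in properties.jar. */
    static Properties loadFromJar(Path jar, String entryName) throws IOException {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            ZipEntry entry = jarFile.getEntry(entryName);
            if (entry == null) {
                throw new IOException(entryName + " not found in " + jar);
            }
            Properties props = new Properties();
            try (InputStream in = jarFile.getInputStream(entry)) {
                props.load(in);
            }
            return props;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("Usage: java Log4jThresholdCheck <Log4J.properties> [<properties.jar>]");
            return;
        }
        Properties batch = loadFile(Paths.get(args[0]));
        System.out.println("Batch processor: " + thresholds(batch) + ", all ERROR: " + allError(batch));
        if (args.length > 1) {
            Properties server = loadFromJar(Paths.get(args[1]), "Log4J.properties"); // assumed entry name
            System.out.println("MDM Server: " + thresholds(server) + ", all ERROR: " + allError(server));
        }
    }
}
```

The check is read-only on purpose: rewriting the files with Properties.store would discard their comments and ordering.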

WebSphere Application Server configuration

1. Increase the JDBC connection pool size to support the increased parallelism.

a. From the WebSphere Administration Console, go to Resources > JDBC > Data sources > DWLCustomer > Connection pool properties.

b. Increase the value for Maximum connections. The example setup uses 50.

2. Increase the prepared statement cache size.

a. The appropriate size of the prepared statement cache depends on the number of unique SQL statements used in your application. For InfoSphere MDM Server, set it to 300, and monitor the application to determine whether the cache size needs to be increased.

b. To change it, use the WebSphere Administration Console to go to Resources > JDBC > Data sources > DWLCustomer > Connection pools > WebSphere Application Server data source properties.

3. Increase the EJB cache size. Do this by using the WebSphere Administration Console to go to Servers > Application servers > [ServerName] > EJB Container Settings > EJB cache settings. The example uses 4000.

4. Change the JVM heap size and GC policy.

a. From the WebSphere Administration Console, go to Servers > Application servers > [ServerName] > Java and Process Management > Process Definition > Java Virtual Machine.

b. Set the initial heap size to 512 MB and the maximum heap size to 1024 MB.

c. Use the gencon GC policy for better performance. To use this GC policy, specify -Xgcpolicy:gencon under Generic JVM arguments. While the example was being tested with the gencon GC policy, WebSphere Application Server sometimes generated unnecessary heap dumps. To disable this behavior, do the following after the server is started:

i. From the WebSphere Administration Console, go to Servers > Application servers > [ServerName] > Performance > Performance and Diagnostic Advisor Configuration > Runtime (tab).

ii. Clear the Enable automatic heap dump collection check box.
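To confirm that a JVM actually picked up the heap and GC settings, you can read them back through the standard java.lang.management API. The following sketch is an assumption-based illustration, not part of WebSphere or MDM Server: it reports the arguments of the JVM it runs in, so it reflects the application server's settings only if the same calls run inside the server process (for example, from a diagnostic servlet you deploy yourself). It also assumes that the heap sizes appear as -Xms/-Xmx options. The -Xgcpolicy:gencon option is specific to the IBM JVM, and the class and method names are hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

/** Hypothetical diagnostic: reads back the heap and GC settings of the JVM it runs in. */
public class JvmSettingsReport {

    /** Returns the last value of a heap option such as -Xmx1024m, in MB, or -1 if absent. */
    static long heapOptionMb(List<String> jvmArgs, String prefix) {
        long result = -1;
        for (String arg : jvmArgs) {
            if (arg.startsWith(prefix) && arg.length() > prefix.length()) {
                result = toMb(arg.substring(prefix.length())); // a later occurrence overrides an earlier one
            }
        }
        return result;
    }

    private static long toMb(String value) {
        String v = value.toLowerCase();
        char unit = v.charAt(v.length() - 1);
        if (Character.isDigit(unit)) {
            return Long.parseLong(v) / (1024 * 1024); // no suffix: value is in bytes
        }
        long number = Long.parseLong(v.substring(0, v.length() - 1));
        switch (unit) {
            case 'g': return number * 1024;
            case 'm': return number;
            case 'k': return number / 1024;
            default: throw new IllegalArgumentException("Unknown size suffix in " + value);
        }
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Initial heap (-Xms, MB): " + heapOptionMb(jvmArgs, "-Xms"));
        System.out.println("Maximum heap (-Xmx, MB): " + heapOptionMb(jvmArgs, "-Xmx"));
        System.out.println("Max memory reported by the runtime (MB): "
                + Runtime.getRuntime().maxMemory() / (1024 * 1024));
        System.out.println("gencon requested: " + jvmArgs.contains("-Xgcpolicy:gencon"));
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("Active collector: " + gc.getName());
        }
    }
}
```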

Database tuning (DB2)

Follow established best practices when you set up the database server, monitor database performance closely, and tune the database as needed for optimal performance and efficient resource usage. This section briefly describes several recommendations for configuring and tuning a DB2 database. The basic concepts also apply to other types of databases.

• Use one set of dedicated disks for the DB2 transaction logs and another set of dedicated disks for the DB2 table spaces. If possible, also use separate disk controllers for the transaction logs and the table spaces, so that each controller can be configured independently for its own I/O pattern; for example, the controller for the logs can be configured to favor writes instead of a mix of writes and reads.

• Ensure that read and write caching is enabled on the storage system. Monitor the cache effectiveness, and size the cache accordingly.

• Plan the table spaces to balance I/O operations across all of the available disks. This avoids hot spots in the database and keeps overall database performance from being limited to the bandwidth of a few of the busiest disks, so that all the I/O bandwidth available from the physical disks can be used.

• In addition to a well-planned table space layout over the I/O system, the database buffer pool size is one of the configuration parameters with the greatest effect on performance. Pay close attention to the overall buffer pool hit ratio, which reflects how often DB2 must go to the physical disks (which is very expensive) for data that is not found in the database buffer pools.

• Aim for a buffer pool hit ratio of 80% or higher for data and 90% or higher for indexes. In typical MDM Server implementations, start with one large buffer pool for both data and indexes. If necessary, separate data and indexes into two buffer pools to help ensure a good index buffer pool hit ratio.

• Because MDM Server allows extensive customization and extension, identify the most expensive SQL statements from database snapshots or other tools, and ensure that those statements have optimal access plans with the best indexes in place.

Consider these recommendations together, because poor behavior in one area might only be a symptom of another area that is misconfigured or misbehaving.
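The buffer pool hit ratio is commonly computed as (1 - physical reads / logical reads) x 100, separately for data pages and for index pages. The Java sketch below applies that formula and the 80% and 90% targets given above. The class name is hypothetical, and the counter values in main are placeholders rather than measurements; in DB2 you would take them from snapshot monitor elements such as pool_data_l_reads, pool_data_p_reads, pool_index_l_reads, and pool_index_p_reads.

```java
/** Hypothetical helper: buffer pool hit ratio checks against the targets in this article. */
public class BufferPoolHitRatio {

    /** Percentage of logical reads that did not need a physical read; NaN if there were no reads. */
    static double hitRatioPercent(long logicalReads, long physicalReads) {
        if (logicalReads <= 0) {
            return Double.NaN; // undefined until the buffer pool has been read from
        }
        return (1.0 - (double) physicalReads / logicalReads) * 100.0;
    }

    static boolean meetsTarget(double hitRatio, double targetPercent) {
        return hitRatio >= targetPercent; // NaN compares false, so an idle pool never "passes"
    }

    public static void main(String[] args) {
        // Placeholder counter values; substitute the values from your own DB2 snapshot.
        long dataLogical = 1_000_000L, dataPhysical = 150_000L;
        long indexLogical = 2_000_000L, indexPhysical = 100_000L;

        double data = hitRatioPercent(dataLogical, dataPhysical);
        double index = hitRatioPercent(indexLogical, indexPhysical);
        System.out.printf("Data hit ratio:  %.1f%% (target 80%%, met: %b)%n", data, meetsTarget(data, 80));
        System.out.printf("Index hit ratio: %.1f%% (target 90%%, met: %b)%n", index, meetsTarget(index, 90));
    }
}
```

Computing data and index ratios separately, as here, is what tells you whether splitting data and indexes into two buffer pools is worthwhile.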

Understanding performance test methodology used in the example

Input data preparation

The maintainContractPlus transaction was used for testing the example. Because the default parser of the BatchProcessor was used, the input data had to be line-feed-delimited XML transactions, with one transaction per line.

The first step in building the input data set was to create seed data. The seed data was generated using a home-grown, Java-based tool with key distributions based on U.S. Census data (2000). Some realistic data was added so that the overall parties closely match a typical MDM business scenario. The seed data contained details such as name, gender, date of birth, and addresses.

In the second step, a template for the maintainContractPlus transaction was created. This template had variables for key party details that were filled in with the generated seed data. Another home-grown, Java-based tool was used to generate the XML transactions. Each transaction yielded one person with one name, one address, one contract, and one contact method. Table 4 shows the resulting population of the database tables. The example run used a total of one million such transactions as one input data set, each representing one party and its associated attributes.
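The home-grown generator tools are not published. The following Java sketch shows the general technique under stated assumptions: a template with ${name} placeholders is filled from seed records, the values are XML-escaped, and each transaction is written on a single line to match the line-feed-delimited input that the batch processor's default parser expects. The template's element names, the placeholder syntax, the class name, and the seed values are all hypothetical; the real maintainContractPlus request schema is not reproduced here.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.List;
import java.util.Map;

/** Hypothetical generator: fills a transaction template from seed records, one transaction per line. */
public class TransactionFileGenerator {

    // Simplified, made-up template; NOT the real maintainContractPlus request schema.
    static final String TEMPLATE =
            "<Request type=\"maintainContractPlus\"><Person>"
          + "<FirstName>${firstName}</FirstName><LastName>${lastName}</LastName>"
          + "<Gender>${gender}</Gender><BirthDate>${birthDate}</BirthDate>"
          + "<Address>${address}</Address></Person></Request>";

    static String xmlEscape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    /** Replaces each ${name} placeholder with the escaped seed value and returns a single line. */
    static String fill(String template, Map<String, String> seed) {
        String result = template;
        for (Map.Entry<String, String> e : seed.entrySet()) {
            result = result.replace("${" + e.getKey() + "}", xmlEscape(e.getValue()));
        }
        if (result.contains("${")) {
            throw new IllegalArgumentException("Seed record is missing a value for: " + result);
        }
        // The input is line-feed delimited, so a transaction must not contain line breaks.
        return result.replace("\r", " ").replace("\n", " ");
    }

    static void write(List<Map<String, String>> seeds, Writer out) throws IOException {
        for (Map<String, String> seed : seeds) {
            out.write(fill(TEMPLATE, seed));
            out.write('\n');
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        List<Map<String, String>> seeds = List.of(
                Map.of("firstName", "Ann", "lastName", "Smith", "gender", "F",
                        "birthDate", "1970-01-31", "address", "1 Main St"),
                Map.of("firstName", "Raj", "lastName", "Patel", "gender", "M",
                        "birthDate", "1982-07-04", "address", "22 Oak Ave"));
        StringWriter out = new StringWriter();
        write(seeds, out);
        System.out.print(out);
    }
}
```

For a real input file, pass a Writer from java.nio.file.Files.newBufferedWriter instead of the StringWriter used here for illustration.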

Suspect duplicate data preparation

The data generated so far was primarily clean. A similar approach was used to generate dirty data that included 40% duplicates. This data set was used when Suspect Duplicate Processing was turned on.

During an initial load, the input data might contain duplicate entries, where the details of one record closely resemble those of another. Such records are called suspect duplicates. Depending on how closely two records match, suspect duplicates are assigned a match category. To determine the match category, certain critical data fields are compared between the two records: first name, last name, address, date of birth, gender, and social security number. Based on the comparison results, each suspect duplicate is assigned a match score and a non-match score, from which the match category is derived. InfoSphere MDM Server then takes the appropriate action for the suspect duplicate, depending on its match category.

When testing the example, two sets of data were used:

• 100% clean data with no suspect duplicates in the input data set
• 60% clean data with 40% of the records as suspect duplicates

The 60% clean data set included four types of suspect duplicates. Each type made up an equal share of the records, and the suspect duplicates were randomly distributed through the data using home-grown, Java-based tools.

The details of this data set are shown in Table 3.

Table 3. Details of input data with suspects

No. | Matching critical data | Non-matching critical data | Population | Weight (match/non-match score) | Match category
1 | Gender, FirstName, LastName, Address, DOB, SSN | None | 10% | 63/0 | A1
2 | Gender, FirstName, LastName, DOB, SSN | Address | 10% | 60/3 | A2
3 | Gender, Address, DOB, SSN | FirstName, LastName | 10% | 55/4 | A2
4 | Gender, Address, LastName, DOB | FirstName (and SSN field is empty) | 10% | 46/1 | B

The scores and categories in Table 3 are calculated by InfoSphere MDM Server's deterministic matching approach, which is the default implementation for party matching. In contrast, QualityStage matching uses a probabilistic approach and calculates only one composite weight.
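The mix of record types described above (60% clean records and 10% for each of the four suspect duplicate types in Table 3, randomly distributed) can be planned with a shuffled list. The Java sketch below is a hypothetical illustration, not the authors' tool; it only decides which kind of record goes in each position. Building each duplicate from a clean original by changing the non-matching fields listed in Table 3 is not shown.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical planner for the 60% clean / 40% suspect duplicate input mix. */
public class SuspectMixPlanner {

    /** CLEAN plus the four suspect duplicate types, numbered as in Table 3. */
    enum Kind { CLEAN, TYPE1_A1, TYPE2_A2, TYPE3_A2, TYPE4_B }

    /** Returns a randomly ordered plan: 10% of each suspect type, the rest CLEAN. */
    static List<Kind> plan(int total, long seed) {
        int perType = total / 10; // equal population for each suspect type
        List<Kind> kinds = new ArrayList<>(total);
        for (Kind k : Kind.values()) {
            if (k != Kind.CLEAN) {
                for (int i = 0; i < perType; i++) {
                    kinds.add(k);
                }
            }
        }
        while (kinds.size() < total) {
            kinds.add(Kind.CLEAN); // remaining records (about 60%) are clean
        }
        Collections.shuffle(kinds, new Random(seed)); // random distribution, reproducible by seed
        return kinds;
    }

    static Map<Kind, Integer> counts(List<Kind> kinds) {
        Map<Kind, Integer> result = new EnumMap<>(Kind.class);
        for (Kind k : kinds) {
            result.merge(k, 1, Integer::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(counts(plan(1_000_000, 42L)));
    }
}
```

Using a fixed seed makes the data set reproducible, so repeated test runs load exactly the same sequence of records.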

Data profile

Table 4 shows the population of InfoSphere MDM Server database tables when each of the two sets of input data is loaded.

Table 4. Database population

Table name | 100% clean data | 60% clean data
ADDRESS | 1,000,000 | 700,000
ADDRESSGROUP | 1,000,000 | 900,000
CONTACT | 1,000,000 | 900,000
CONTACTMETHOD | 1,000,000 | 900,000
CONTACTMETHODGROUP | 1,000,000 | 900,000
CONTEQUIV | 1,000,000 | 1,000,000
CONTRACT | 1,000,000 | 1,000,000
CONTRACTCOMPONENT | 1,000,000 | 1,000,000
CONTRACTROLE | 1,000,000 | 1,000,000
IDENTIFIER | 1,000,000 | 900,000
LOBREL | 1,000,000 | 900,000
LOCATIONGROUP | 2,000,000 | 1,800,000
MISCVALUE | 1,000,000 | 1,000,000
PERSON | 1,000,000 | 900,000
PERSONNAME | 1,000,000 | 900,000
PERSONSEARCH | 1,000,000 | 900,000
SUSPECT | 0 | 300,000

Test methodology

Different tests were performed to check stability and scalability and to measure the overhead associated with several commonly used features. All the tests were conducted in two solution configurations:

• The MDM Server only solution, where InfoSphere MDM Server uses its own algorithms for standardization and matching. In this case, IBM Information Server is not required.

• The MDM Server + QS solution, where InfoSphere MDM Server uses QualityStage to do the standardization and matching.

The methodology for all these tests was similar:

1. Set up the systems. Configure and tune the various components as described in the previous sections.

2. Prepare a set of input data with 10,000 records using the approach described above.

3. Load the 10,000 input records using 1 submitter in the batch processor. This avoids deadlocks while working with an empty database.

4. Run DB2 reorgchk on all the tables to update statistics.

5. Back up the MDM Server database at this stage, and use the backup as the starting point for all the tests.

The following steps were used to run the example test:

1. Restore the database from the backup copy.

2. Change the database configuration if required for the test. For example, you might want to switch OFF Suspect Duplicate Processing.

3. Restart the WebSphere Application Server running InfoSphere MDM Server.

4. Start data collection scripts in the background to collect CPU statistics, I/O statistics, and database snapshots.

5. Start the test to load the selected input data set.

6. Collect the logs from InfoSphere MDM Server, WebSphere Application Server, and the DB2 database server.

7. Derive the response time and throughput from the transactiondata.log file generated by InfoSphere MDM Server.

Measuring performance results

This section describes the following performance measurements:

• Results showing very stable throughput and response time
• Performance overhead of some commonly used features in the context of initial data loading
• Scalability of throughput

Test 1: Stability of throughput and response time

The purpose of this test is to show whether throughput and response time remain stable as the load progresses and the database grows. The test also measures the pattern of system resource usage over the course of the run. The throughput and response time data is derived from transactiondata.log, as generated by InfoSphere MDM Server.

Various tests were conducted for both the MDM Server only and the MDM Server + QS scenarios, and all of them showed good stability. Table 5 shows the configuration settings for the first test.

Table 5. Test 1 configuration

Parameter | Value
Hardware/Software stack | As described in example test environment
InfoSphere MDM Server heap size | Initial: 512 MB; Max: 1024 MB
InfoSphere MDM Server JVM GC policy | gencon
Number of submitters in batch processor | 24
Batch processor JVM memory | 512 MB
ISD job configurations (applicable to MDM Server + QS scenario only) | Default
Type of transaction used | MaintainContractPlus
Total volume | 1 million parties and their associated records
Input data quality | 60% clean, 40% suspect duplicates of various types
Name standardization | ON (default)
Address standardization | ON (StandardFormatingIndicator set to N in the request XML)
Suspect duplicate processing | ON
History triggers | Enabled

Test 1 results: Stability results

Figure 11 shows the throughput and response times captured for the MDM Server only scenario. The chart shows that throughput and response time are stable for the whole duration of the run. The results for the MDM Server + QS scenario are similar.

Figure 11. Throughput and response time

Figure 12 shows that, with a sufficient number of submitters configured, almost all CPU resources on the WebSphere Application Server machine running InfoSphere MDM Server can be used, and the system has no other bottlenecks. Figure 12 also shows the resource usage on the other systems.

Figure 12. Resource usage

Test 2: Feature overheads

These tests measure the overhead of four commonly used features of InfoSphere MDM Server:

• Name standardization
• Address standardization
• Suspect duplicate processing
• History triggers

Overhead is expressed as the percentage reduction in throughput when the feature is enabled. For example, a 5% overhead for a particular feature means that if throughput was 100 transactions per second (TPS) with the feature disabled, it drops to 95 TPS when the feature is enabled. Throughput is calculated as the total data volume loaded divided by the total time taken.
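The throughput and overhead definitions above reduce to two small formulas. The Java sketch below, with hypothetical class and method names, computes both; main reproduces the 100 TPS to 95 TPS example from the text, and the second calculation uses made-up numbers, not a measured result.

```java
import java.time.Duration;

/** Hypothetical helpers for the throughput and overhead definitions used in Test 2. */
public class FeatureOverhead {

    /** Throughput in TPS: total transactions loaded divided by total elapsed time. */
    static double throughputTps(long transactions, Duration elapsed) {
        if (elapsed.isZero() || elapsed.isNegative()) {
            throw new IllegalArgumentException("Elapsed time must be positive");
        }
        return transactions / (elapsed.toMillis() / 1000.0);
    }

    /** Overhead in percent: relative throughput loss when the feature is enabled. */
    static double overheadPercent(double tpsFeatureOff, double tpsFeatureOn) {
        return (tpsFeatureOff - tpsFeatureOn) / tpsFeatureOff * 100.0;
    }

    public static void main(String[] args) {
        // The example from the text: 100 TPS without the feature, 95 TPS with it.
        System.out.printf("Overhead: %.1f%%%n", overheadPercent(100.0, 95.0));

        // Made-up run for illustration: one million transactions in 150 minutes.
        double tps = throughputTps(1_000_000L, Duration.ofMinutes(150));
        System.out.printf("Throughput: %.1f TPS%n", tps);
    }
}
```

Because the overhead is relative to the feature-off throughput, overheads measured on systems with different absolute throughput can still be compared.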

Various tests were conducted for both the MDM Server only and the MDM Server + QS scenarios, enabling one or more features at a time. In the MDM Server + QS scenario, the overheads of standardization and suspect duplicate processing are expected to be higher because they involve extra processing by QualityStage.

Table 6 shows the configuration settings for the second test.

Table 6. Test 2 configuration

Parameter | Value
Hardware/Software stack | As described in example test environment
InfoSphere MDM Server heap size | Initial: 512 MB; Max: 1024 MB
InfoSphere MDM Server JVM GC policy | Default
Number of submitters in batch processor | 24
Batch processor JVM memory | 512 MB
ISD job configurations (applicable to MDM Server + QS scenario only) | Default
Type of transaction used | MaintainContractPlus
Total volume | 1 million parties and their associated records
Input data quality | a) 100% clean; b) 60% clean

Following are some notes about the configuration:

• Name standardization was turned ON or OFF by setting /IBM/Party/ExcludePartyNameStandardization/enabled to FALSE or TRUE, respectively.

• Address standardization was effectively switched ON or OFF by setting StandardFormatingIndicator to N or Y, respectively, in the transaction request XMLs.

• Suspect duplicate processing was switched ON or OFF by setting both of the following to TRUE or FALSE, respectively, in the configuration table:

• /IBM/Party/SuspectProcessing/enabled
• /IBM/Party/SuspectProcessing/AddParty/returnSuspect

Test 2 results: Feature overheads

Standardization

Table 7 shows the overhead of standardization for the MDM Server only scenario. Tests were conducted with suspect duplicate processing switched OFF and, using both data sets (100% clean and 60% clean), with it switched ON. History triggers were enabled during these tests.

Table 7. Overhead of standardization

Overhead | SDP OFF | SDP ON (100% clean) | SDP ON (60% clean)
Overhead of name standardization | 2% | 3% | 3%
Overhead of address standardization | 2% | 2% | 0%
Overhead of name and address standardization | 4% | 3% | 2%

Note: The 60% clean data contains fewer unique addresses, which can result in less address standardization overhead.

Suspect duplicate processing

Table 8 shows the overhead of suspect duplicate processing, with and without standardization, in the MDM Server only scenario. Tests were conducted with both data sets (100% clean and 60% clean). History triggers were enabled during these tests.

Table 8. Overhead of suspect duplicate processing

Overhead | 100% clean data | 60% clean data
Overhead of suspect duplicate processing | 3% | 20%
Overhead of suspect duplicate processing along with name and address standardization | 6% | 21%

History triggers

If history triggers are enabled, the I/O requirement on the database server increases significantly (it nearly doubles). With enough I/O bandwidth available, the overhead is small (approximately 5%).

Test 3: Scalability tests

Scalability is a measure of how well throughput increases when more load is put on the system. In the example test, the number of processors was not varied. Instead, the number of parallel requests to InfoSphere MDM Server was varied by changing the number of submitters in the batch processor. Data points were collected from 1 submitter up to 24 submitters, at which point the system was clearly saturated.

The test was conducted for both the MDM Server only and the MDM Server + QS scenarios. Tests were conducted in different configurations, and all of them showed near-linear scalability.
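One common way to quantify near-linear scalability is scaling efficiency: the throughput with n submitters divided by n times the throughput with one submitter, where 100% means perfectly linear growth. The Java sketch below assumes that definition, which the article does not state explicitly. The class name is hypothetical, and the sketch takes your measured throughputs as command-line arguments rather than using any data from these tests.

```java
/** Hypothetical helpers for judging how close throughput scaling is to linear. */
public class ScalingEfficiency {

    static double speedup(double tpsOneSubmitter, double tpsNSubmitters) {
        return tpsNSubmitters / tpsOneSubmitter;
    }

    /** 100% means throughput grew exactly in proportion to the number of submitters. */
    static double efficiencyPercent(double tpsOneSubmitter, double tpsNSubmitters, int submitters) {
        return speedup(tpsOneSubmitter, tpsNSubmitters) / submitters * 100.0;
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.out.println("Usage: java ScalingEfficiency <tps at 1 submitter> <n>:<tps at n> ...");
            return;
        }
        double base = Double.parseDouble(args[0]);
        for (int i = 1; i < args.length; i++) {
            String[] parts = args[i].split(":");
            int n = Integer.parseInt(parts[0]);
            double tps = Double.parseDouble(parts[1]);
            System.out.printf("%d submitters: speedup %.2fx, efficiency %.0f%%%n",
                    n, speedup(base, tps), efficiencyPercent(base, tps, n));
        }
    }
}
```

For example, run it as java ScalingEfficiency <tps1> 8:<tps8> 24:<tps24> with your own measurements. Efficiency that stays close to 100% and then drops sharply typically marks the point where a resource, such as CPU on the MDM Server machine, saturates.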

Table 9 shows the configuration settings for the third test.

Table 9. Test 3 configuration

Parameter | Value
Hardware/Software stack | As described in example test environment
InfoSphere MDM Server heap size | Initial: 512 MB; Max: 1024 MB
InfoSphere MDM Server JVM GC policy | Default
Number of submitters in batch processor | Varied from 1 to 24
Batch processor JVM memory | 512 MB
ISD job configurations (applicable to the MDM Server + QS scenario only) | Default
Type of transaction used | MaintainContractPlus
Total volume | 15,000 to 100,000 records
Input data quality | 60% clean
Name standardization | ON (default)
Address standardization | ON (StandardFormatingIndicator set to N in the request XML)
Suspect duplicate processing | ON
History triggers | Enabled

Test 3 results: Scalability results

Figure 13 shows the scalability for the MDM Server only scenario. As the green line shows, throughput increases almost linearly with the number of submitters. The example configuration used more than 90% of the CPU capacity on the server running InfoSphere MDM Server. The results for MDM Server + QS are similar.

Figure 13. Scalability of InfoSphere MDM Server with SDP ON

Conclusion

Designed to provide flexibility in its deployments, developed on leading technology, and offering unmatched performance and scalability, InfoSphere Master Data Management Server has been the leading choice for a large number of organizations across a range of industries when implementing their MDM solutions. As the leader, IBM has the largest number of successfully deployed MDM implementations in the market today.

This article explained what maintenance services are and how to set them up in an InfoSphere MDM Server environment. It provided enough configuration and tuning detail for you to get maintenance services batch up and running with high performance. It also covered the steps for setting up Information Server QualityStage for standardization and matching, if such a configuration is required. Key performance data points from several common scenarios show that maintenance services, when used for initial load, provide sustained high performance and excellent scalability. Finally, the article summarized the performance overhead of some key features commonly used in MDM Server implementations. You might find these measurements useful for capacity planning of an MDM Server system based on the chosen features, and for ensuring the required performance during initial load.

Acknowledgments

We would like to thank Lena Woolf, Berni Schiefer, and Karen Chouinard for their input and suggestions. We would also like to thank the other MDM Server team members for their support during this project.

Notices

©IBM Corporation 2009. All Rights Reserved.

THE INFORMATION CONTAINED IN THIS DOCUMENT IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY. ALTHOUGH EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS DOCUMENT, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE.

IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS DOCUMENT OR ANY OTHER DOCUMENTATION. NOTHING CONTAINED IN THIS DOCUMENT IS INTENDED TO, OR SHALL HAVE THE EFFECT OF CREATING ANY WARRANTY OR REPRESENTATION FROM IBM (OR ITS AFFILIATES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS); OR ALTERING THE TERMS AND CONDITIONS OF THE APPLICABLE LICENSE AGREEMENT GOVERNING THE USE OF IBM SOFTWARE.

All performance data contained in this publication was obtained in the specific operating environment and under the conditions described above and is presented as an illustration only. Performance obtained in other operating environments may vary and customers should conduct their own testing.

Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.

The information in this document concerning non-IBM products was obtained from the supplier(s) of those products. IBM has not tested such products and cannot confirm the accuracy of the performance, compatibility or any other claims related to non-IBM products. Questions about the capabilities of non-IBM products should be addressed to the supplier(s) of those products.

The information contained in this publication is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information contained in this publication, it is provided AS IS without warranty of any kind, express or implied. In addition, this information is based on IBM’s current product plans and strategy, which are subject to change by IBM without notice. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this publication or any other materials. Nothing contained in this publication is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software.

References in this publication to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. Product release dates and/or capabilities referenced in this presentation may change at any time at IBM’s sole discretion based on market opportunities or other factors, and are not intended to be a commitment to future product or feature availability in any way. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth, savings or other results.


Resources

Learn

• See the IBM Redbook™ Master Data Management: Rapid Deployment Package for MDM for more instructions.

• Refer to the IBM InfoSphere MDM Server Information Center for more instructions.

• Refer to the WebSphere Application Server, Version 6.1 Information Center to install IBM WebSphere Application Server Network Deployment, Version 6.1, and upgrade it with Fixpack 11.

• Refer to the IBM DB2 Database for Linux®, UNIX®, and Windows Information Center to install DB2 Database Server, Version 9.5.

• Refer to the IBM Information Server Information Center to install IIS Server, Version 8.0.1.

• Learn more from IBM Redpaper WebSphere Customer Center: Understanding Performance.

• Discover DB2 Tuning Tips for OLTP Applications from this classic developerWorks article.

• Explore the Information Management Software for z/OS Solutions Information Center.

• Learn more about Information Management at the developerWorks Information Management zone. Find technical documentation, how-to articles, education, downloads, product information, and more.

• Stay current with developerWorks technical events and webcasts.

Get products and technologies

• Build your next development project with IBM trial software, available for download directly from developerWorks.

Discuss

• Participate in the discussion forum for this content.

• Check out the developerWorks blogs and get involved in the developerWorks community.


About the authors

Neeraj Singh

Neeraj R Singh is currently a senior performance engineer working on Master Data Management Server performance. He has prior experience leading the Java technologies test team for functional, system, and performance tests as technical lead and test project leader. He joined IBM in 2000 and holds a bachelor's degree in Electronics and Communications Engineering.

Yongli An

Yongli An is an experienced performance engineer focusing on Master Data Management products and solutions. He is also experienced in DB2 database server and WebSphere performance tuning and benchmarking. He is an IBM Certified Application Developer and Database Administrator - DB2 for Linux, UNIX, and Windows. He joined IBM in 1998. He holds a bachelor's degree in Computer Science and Engineering and a master's degree in Computer Science. Currently, Yongli is the manager of the MDM performance and benchmarks team, focusing on Master Data Management Server performance and benchmarks, and helping customers achieve optimal performance for their MDM systems.

© Copyright IBM Corporation 2009 (www.ibm.com/legal/copytrade.shtml). Trademarks (www.ibm.com/developerworks/ibm/trademarks/)