
IBM DB2 Intelligent Miner

Application Programming Interface and Utility Reference

Version 6 Release 1

SH12-6395-00

Note! Before using this information and the product it supports, be sure to read the general information under “Appendix B. Notices” on page 329.

First Edition, September 1999

This edition applies to Version 6 Release 1 of:

IBM DB2 Intelligent Miner for Data, 5697-IM3
IBM DB2 Intelligent Miner for Data for AS/400, 5655-IM3
IBM DB2 Intelligent Miner for Data for OS/390, 5733-IM3

and to all subsequent releases and modifications until otherwise indicated in new editions.

This edition replaces SH12-6326-01.

© Copyright International Business Machines Corporation 1996, 1999. All rights reserved.
US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

Contents

About This Book . . . v

Chapter 1. Overview of the Intelligent Miner . . . 1
  Introduction . . . 1
  The Intelligent Miner architecture . . . 2
    Client components . . . 2
    Server components . . . 3
    Interfaces . . . 4
    Platforms . . . 4
  Using the Environment Layer API . . . 5
    Storage management . . . 5
  Using the Result API . . . 8
    Client tool registration . . . 8
    Environment variables for tool registration . . . 9
  Building and running your application . . . 9
    Building your application for AIX, OS/2, WIN32, or Sun Solaris . . . 10
    Building your application for AS/400 . . . 11
    Building your application for OS/390 . . . 12
    Running your application . . . 13
  Migration considerations . . . 15
    Using mining bases from previous versions of the Intelligent Miner . . . 15
    Using the statistics mode of the Clustering mining functions of Version 1 . . . 15
    Migrating applications . . . 16

Chapter 2. The Environment Layer API . . . 19
  The structure of the Environment Layer API . . . 19
    General structure . . . 19
    Detailed structure . . . 20
    General class description . . . 24
    DataTable class and the Data class . . . 25
    RunSelection class and Settings class . . . 27
  Data type definitions . . . 28
    Basic data types . . . 28
    Enumerated types . . . 29
    The return data type . . . 29
    Exception handling . . . 29
  Base class . . . 30
    IDMBase . . . 30
  Auxiliary classes . . . 33
    IDMTimeStamp . . . 33
    IDMBaseMatrix . . . 35
    IDMMatrix . . . 36
    IDMMatrix0 . . . 37
  Data table . . . 37
    IDMDataTable . . . 38
    IDMStaticTable . . . 40
    IDMFlatFileTable . . . 41
    IDMPipeTable . . . 43
    IDMDB2Table . . . 44
    IDMMatrixTable . . . 49
  Data fields . . . 51
    IDMDataField . . . 51
    IDMFlatFileField . . . 55
    IDMMatrixField . . . 56
    Computed fields . . . 57
  Mining base . . . 71
    IDMMiningBase . . . 71
  The base class of the mining classes . . . 80
    IDMMiningClass . . . 80
  Mining settings . . . 82
    IDMData . . . 83
    IDMNameMapping . . . 84
    IDMValueMapping . . . 86
    IDMDiscretization . . . 88
    IDMItemCategory . . . 90
    IDMTaxonomyRelation . . . 91
    IDMTaxonomy . . . 93
  Mining run selection . . . 95
    IDMAtomicSelection . . . 95
    IDMAndSelections . . . 97
    IDMSelections . . . 97
  Item constraints . . . 97
    IDMAtomicConstraint . . . 97
    IDMAndConstraints . . . 98
    IDMConstraints . . . 98
  Mining results . . . 98
    IDMResult . . . 99
    IDMResultSet . . . 104
    IDMBrowseFormat . . . 105
    IDMBrowseFormatKey . . . 106
  Settings . . . 107
    IDMSettings . . . 107
  Data settings . . . 113
    IDMAssocSettings . . . 113
    IDMSeqPatternSettings . . . 115
    IDMSimSeqSettings . . . 118
    IDMDescQuantSampleSettings . . . 121
    IDMClusteringSettings . . . 125
    IDMClassifySettings . . . 135
    IDMPredictionSettings . . . 142
  Preprocessing settings . . . 150
    IDMProcessingSettings . . . 151
    IDMAggregateValues . . . 153
    IDMCalculateValues . . . 158
    IDMCleanUpDataSources . . . 163
    IDMConvertToLowercaseOrUppercase . . . 165
    IDMCopyRecordsToFile . . . 170
    IDMDiscardRecordsWithMissingValues . . . 173
    IDMDiscretizationIntoQuantiles . . . 178
    IDMDiscretizationUsingRanges . . . 184
    IDMEncodeMissingValues . . . 190
    IDMEncodeNonvalidValues . . . 195
    IDMFilterFields . . . 202
    IDMFilterRecords . . . 207
    IDMFilterRecordsUsingAValueSet . . . 211
    IDMGetRandomSample . . . 217
    IDMGroupRecords . . . 222
    IDMJoinDataSources . . . 228
    IDMMapValues . . . 233
    IDMPivotFieldsToRecords . . . 240
    IDMRunSQL . . . 246
  Statistics settings . . . 247
    IDMStatLinearRegression . . . 247
    IDMStatUnivariateCurve . . . 252
    IDMStatPrinComAnalysis . . . 256
    IDMStatFactorAnalysis . . . 258
  Repeatable sequences settings . . . 262
    IDMSequence . . . 262

Chapter 3. The Result API . . . 267
  Result APIs for associations and sequential patterns . . . 267
    Result APIs for associations and sequential patterns Version 2 . . . 268
  Data structures . . . 269
    Associations rules structure . . . 269
    Frequent item set structure . . . 269
    Statistics structure . . . 269
    Sort order enumeration . . . 270
    Sequential Patterns structure . . . 270
    Structures of the Result API Version 6 . . . 270
  General functions for the Associations and Sequential Patterns Result API . . . 271
    Functions of the Result API Version 2 . . . 271
    Functions of the Result API Version 6 . . . 272
  Functions for associations rules . . . 273
    Functions of the Result API Version 2 . . . 273
    Functions of the Result API Version 6 . . . 274
  Functions for frequent item sets . . . 275
    Functions of the Result API Version 2 . . . 275
    Functions of the Result API Version 6 . . . 276
  Statistics functions for the Associations and Sequential Patterns Result API . . . 277
    Functions of the Result API Version 2 . . . 277
    Functions of the Result API Version 6 . . . 278
  Sequential Patterns functions . . . 278
    Functions of the Result API Version 2 . . . 279
    Functions of the Result API Version 6 . . . 280
  Result APIs for classification, clustering, prediction, and descriptive statistics . . . 281
    IDMDBasicDescrStatsResult . . . 282
    IDMDBasicPartition . . . 284
    IDMDClassificationResult . . . 285
    IDMDClusteringResult . . . 286
    IDMDCluster . . . 288
    IDMDPredictionResult . . . 288
    IDMDRegion . . . 290
    IDMDQuantileResult . . . 291
    IDMDDescrStatsQuantResult . . . 292
    IDMDPartition . . . 293
  Result APIs for Statistical functions . . . 293
    IDMStatisticsResult . . . 293
    IDMStatTable . . . 294
    IDMStatCovarianceMatrix . . . 295
    IDMStatLinearRegressionResult . . . 295
    IDMStatLinRegTable . . . 296
    IDMStatLinRegANOVA . . . 296
    IDMStatUnivariateCurveResult . . . 297
    IDMStatPrinComAnalysisResult . . . 298
    IDMStatFactorAnalysisResult . . . 299
    IDMStatFactorInputAnalysis . . . 299
    IDMStatFactorStatistic . . . 300
    IDMStatFactorRotation . . . 300
    IDMStatFactorStructure . . . 301
    IDMStatFactorRegression . . . 301
  Data Sample Result API . . . 302
    IDMDDataSample . . . 302
  Auxiliary classes for the Result API . . . 303
    Overview on IDMField Classes . . . 303
    IDMGeneralField . . . 304
    IDMField . . . 305
    IDMNumericField . . . 306
    IDMCategoricalField . . . 308
    IDMMultiField . . . 308
    IDMMultiNumericField . . . 308
    IDMMultiCategoricalField . . . 308
    IDMContinuousStatistics . . . 308
    IDMDiscreteStatistics . . . 310
    IDMArray . . . 311

Appendix A. Sample applications using the Environment Layer API . . . 313
  Sample application using flat files . . . 313
    Source code of the flat file sample program . . . 314
    Running the flat file sample program . . . 319
  Sample application using DB2 . . . 320
    Source code of the DB2 sample application . . . 320
    Running the DB2 sample program . . . 326
    Output of the DB2 sample . . . 326

Appendix B. Notices . . . 329
  Trademarks and Service Marks . . . 330

Glossary . . . 331

Bibliography . . . 337
  IBM Corporation . . . 337
  Other Documentation . . . 337

Index . . . 339

About This Book

The IBM DB2 Intelligent Miner for Data Version 6, in the following chapters referred to as Intelligent Miner, is a suite of mining, processing, and statistics functions that you can use to analyze large databases. It also provides visualization tools for viewing and interpreting mining results. The server software runs on AIX, AS/400, OS/390, Sun Solaris, and Windows NT operating systems. You can use AIX, OS/2, and Windows clients. Unless otherwise stated, all functions that are described apply to all supported servers and clients.

This book is intended for application builders and database or data warehouse administrators.

You should be familiar with C++ classes, C functions, and C structures. You should also know how to use the IBM Collection Class Library delivered with the C++ Compiler of IBM. You should be familiar with the functions of the Intelligent Miner and know how to use them. Using the Intelligent Miner for Data provides all the information required to install and run the program.

You can use the Intelligent Miner functions through the following Application Programming Interfaces (APIs):
- The Environment Layer API
- The Result API

The Environment Layer API addresses the following phases of the data-mining process:
- Data definition
- Data mining
- Export of results

The Result API provides a set of C++ classes and C structures to represent the results of a mining run together with C++ and C functions to retrieve and sort them.

This book describes the functions of the various parts of these APIs. It explains how you can use these functions to integrate the Intelligent Miner into larger solutions.


Chapter 1. Overview of the Intelligent Miner

This chapter gives an overview of the Intelligent Miner architecture and a brief description of the individual components. The interface concepts are introduced and the APIs described in this book are explained briefly. Information on the platforms supported and some programming tips on building API applications are also provided.

Introduction

A data-mining operation is typically made up of several distinct steps combining the execution of data-mining functions with data preprocessing functions. The user of the data-mining application wants to be able to select and control the execution of these functions in a flexible way.

The Intelligent Miner provides the means for this selection and control. It contains the following features:
- A set of data-mining functions providing the most frequently required data-mining technologies. These include statistical functions that provide various statistical and forecasting methods to help users in analyzing the input data and support the business decision.
- A processing library providing functions for data transformation.
- A set of API functions called Environment Layer API to control the execution of the data-mining functions.
- A set of API functions called Result API to manage the results of data-mining runs.
- A client/server structure to provide communication between data-mining and data preprocessing functions on the server, and administrative and visualization functions on the client.
- An Administration Graphical User Interface (Administration GUI) to control the execution of the data-mining functions and the preprocessing functions, and to manage the results of data-mining runs.
- The capability to define and execute sequences of functions and mining operations both through the GUI and the Environment Layer API.


The Intelligent Miner architecture

Client components

Figure 1 shows the following client components:

Administration Graphical User Interface (Administration GUI)
    The Administration GUI provides the means for end users to specify input, output, and control parameters for the mining functions and data preprocessing functions as well as the management of results.

Visualizer
    Different visualizers are available to display the results to the user.

Environment Layer API
    The Environment Layer API provides a set of data types and classes to control the behavior of the data-mining functions and data preprocessing functions. These classes are made available as C++ classes that allow you to build applications combining different functions.

Figure 1. The Intelligent Miner architecture. The Environment Layer API and the Result API are also available on the server.


Result API (load)
    This is the Result API. It provides ways of handling the results of the mining runs. On the client side the results can be loaded for visualization and other applications.

Client/server component
    This component addresses the communication between data types, client classes, and server classes. It is implemented using Remote Procedure Calls (RPC).

Server components

Figure 1 shows the following server components:

Processing library
    The Processing library allows users to collect and prepare the input data from various databases.

Data mining functions
    The following mining functions are available:
    - Associations
    - Classification
      – Tree Classification
      – Neural Classification
    - Clustering
      – Demographic Clustering
      – Neural Clustering
    - Repeatable Sequences
    - Sequential Patterns
    - Statistics Functions
    - Similar Sequences
    - Value Prediction
      – Radial Basis Function (RBF) Prediction
      – Neural Prediction

    You can use classification, clustering, value prediction, and statistics functions in a cooperative way. This means that you can use the output of one function as input to another function.

Data Access component
    The Data Access component provides C++ classes to:
    - Access data fields
    - Provide field statistics

    Access is provided to flat files and to relational database tables in a way transparent to the user of the Data Access component. Principal users of the Data Access component are the mining functions and the data preprocessing functions. The Data Access component provides a logical view of the data, which must be in tabular format. Data can be of type real, integer, or string.

Result API (save)
    The Result API provides ways of handling the results of the mining runs. On the server side the results are written to result files by the mining functions.


Note: The Data Access component and the Result API (save) are internal components of the Intelligent Miner. These APIs are not public. They can only be used directly by the preprocessing functions and the mining functions. Customer applications make implicit use of these APIs through the Environment Layer API.

Interfaces

The Environment Layer API externalizes the Intelligent Miner functions through C++ classes. The methods of these classes support the following tasks:
- Setting of parameters that control the operation of individual functions
- Requesting execution of the function at the server. Execution can be synchronous or asynchronous with respect to the client process
- Requesting the status of functions executing asynchronously
- Requesting the results of functions that have completed normally
- Requesting execution of a sequence of functions at the server
- Requesting reason codes for abnormal completion
- Stopping an asynchronous function request

Functions provided by the Environment Layer API are made available to the user through the Administration GUI.

Viewers are available to assist users in interpreting the results of mining runs. Vendor tools can be registered and integrated to enhance the analysis and visualizer capabilities of the Intelligent Miner.

Platforms

The Intelligent Miner Version 6 servers including the Environment Layer API and the Result API (load) are available on the following platforms:
- AIX
- AS/400
- OS/390
- Sun Solaris
- Windows NT

The Result API (load) for associations and sequential patterns is not available on OS/390 and AS/400 at this time. You can work with results on the OS/390 platform by using the Result API from the other available platforms.

A parallel version of the Intelligent Miner is available for AIX, Sun Solaris, and Windows NT. Support for DB2/PE or UDB EEE is included.

The client functions (API, GUI, and visualization components) are delivered for the following platforms:
- AIX
- OS/2
- WIN32 (that is, Windows 95 and Windows NT)


The Intelligent Miner imposes as few restrictions as possible on the platform and format of the input data and delivers the mining results in a format that can be used by other software products for display or for further analysis.

Using the Environment Layer API

The Intelligent Miner uses a client/server architecture. Data mining is performed on the server. Data definition and the interpretation of the results are done on the client. The Environment Layer API provides an interface to the client functions triggering the execution of the server functions. These functions include the following tasks:
- Definition of the data to be mined
- Management of the mining process itself
- Handling of the results of the mining runs

The Administration GUI is based on this API.

The Environment Layer API provides C++ classes and methods as well as C structures and C functions that allow application programs to use the functions of the Intelligent Miner.

The IBM Collection Class Library is used to define groups of objects and to provide access to the objects within the groups using the methods for storage and retrieval of objects.

The export part of the Environment Layer API enables the transfer of mining results to other software products, like spreadsheets, visualization tools, and text processing systems. The C++ structures and functions required to export the mining results are provided by this part of the API.

Storage management

This section describes how the classes of the Environment Layer API manage their storage. Refer to “Chapter 2. The Environment Layer API” on page 19 for an overview of the objects and classes.

The mining base (class IDMMiningBase) holds, in IKeySortedSet or ISortedRelation collections, pointers to all of the following objects that are associated with the mining base:
- Data settings
- Mining settings
- Preprocessing settings
- Statistics settings
- Repeatable Sequences settings
- Mining results (only of the classes IDMResult and IDMResultSet)

If such an object is constructed by a constructor (but not the default constructor) or by the createObject method, the pointer to the constructed object is added to the appropriate collection in the mining base object. The object is not copied when the pointer is added to the appropriate collection.

Certain classes have objects of other classes as mandatory or optional members. Examples are the mining settings classes that contain a data object as a mandatory member specifying the data to be analyzed by a mining function. Such a reference is implemented as a pointer to the object in the appropriate collection.

To keep the mining base and its objects in a consistent state, an object is removed from the appropriate collection of the mining base if this object is deleted by the destructor. Furthermore, if there are other objects containing this object as an obligatory member, these objects are deleted, too. For optional references, if an object contains a pointer to the deleted object, this pointer is set to NULL.

Deleting an object might result in a cascade of deletes of other objects. This has the consequence that if the memory for such an object is allocated on the stack, a core dump occurs when the object goes out of scope. To prevent this, objects should always be constructed using the new-operator or the createObject method.

The cascade of deletes might not always be desired for other reasons. Therefore, the deleteObject method is provided for each class. It checks first for references to the deleted object and calls the destructor only if no references exist. It returns an error if it detects any such references.
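The ownership rules described above can be illustrated with a small, self-contained sketch. It does not use the Intelligent Miner classes: Base and Object are hypothetical stand-ins for the mining base and for the objects it holds, written in current standard C++ rather than for the compilers listed in this chapter. It only assumes the behavior stated in the text: the destructor removes the object from the collection, deletes objects that hold it as a mandatory member, and clears optional pointers, while deleteObject refuses to delete a referenced object.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

class Base;  // hypothetical stand-in for the mining base, not IDMMiningBase

// Hypothetical object that registers itself in a Base by pointer.
class Object {
public:
    explicit Object(Base& base, Object* mandatory = nullptr,
                    Object* optional = nullptr);
    Object(const Object&) = delete;             // the base holds pointers, not copies
    Object& operator=(const Object&) = delete;
    ~Object();
    Object* optional() const { return optional_; }

private:
    friend class Base;
    Base& base_;
    Object* mandatory_;  // this object cannot exist without it
    Object* optional_;   // set to nullptr when the target is deleted
};

class Base {
public:
    std::size_t size() const { return objects_.size(); }
    // Deletes o only if no other object refers to it; returns false otherwise.
    bool deleteObject(Object* o);

private:
    friend class Object;
    std::vector<Object*> objects_;  // pointers only; objects are never copied
};

Object::Object(Base& base, Object* mandatory, Object* optional)
    : base_(base), mandatory_(mandatory), optional_(optional) {
    base_.objects_.push_back(this);
}

Object::~Object() {
    std::vector<Object*>& all = base_.objects_;
    // Remove this object from the collection.
    all.erase(std::remove(all.begin(), all.end(), this), all.end());
    // Optional references to this object are cleared.
    for (Object* o : all)
        if (o->optional_ == this) o->optional_ = nullptr;
    // Objects holding this one as a mandatory member are deleted too.
    // The delete below is why every object must live on the heap.
    for (;;) {
        auto it = std::find_if(all.begin(), all.end(),
            [this](const Object* o) { return o->mandatory_ == this; });
        if (it == all.end()) break;
        delete *it;  // the dependent removes itself and cascades further
    }
}

bool Base::deleteObject(Object* o) {
    for (const Object* other : objects_)
        if (other->mandatory_ == o || other->optional_ == o) return false;
    delete o;
    return true;
}
```

In this sketch, deleting a data object with delete also deletes a settings object that was constructed with it as mandatory member, whereas calling deleteObject on the same data object returns false and leaves both objects untouched.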

Figure 2 on page 7 shows which class objects are mandatory or optional members of other class objects. Class X => class Y indicates that class X is a mandatory member of class Y. Class X -> class Y indicates that class X is an optional member of class Y.

IDMNameMapping                    -> IDMDataField
                                  -> IDMItemCategory
IDMItemCategory                   => IDMTaxonomyRelation
IDMTaxonomyRelation               => IDMTaxonomy
IDMValueMapping                   => IDMValueMappingField
                                  -> IDMClassifySettings
                                  -> IDMClusFieldParams
IDMDiscretization                 => IDMDiscretizationField
IDMData                           => IDMAssocSettings
                                  => IDMSeqPatternSettings
                                  => IDMSimSeqSettings
                                  => IDMClassifySettings
                                  => IDMClusteringSettings
                                  => IDMPredictionSettings
                                  => IDMDesQuantSampleSettings
                                  => IDMStatCrossCorrelation
                                  => IDMStatCorrMatrices
                                  => IDMStatLinearRegression
                                  => IDMStatUnivariateCurve
                                  => IDMStatPrinComAnalysis
                                  => IDMStatFactorAnalysis
IDMAssocSettings                  -> IDMSequence
IDMSeqPatternSettings             ->
IDMSimSeqSettings                 ->
IDMClusteringSettings             ->
IDMClassifySettings               ->
IDMPredictionSettings             ->
IDMDesQuantSampleSettings         ->
IDMProcessingSettings + children  ->
IDMStatCrossCorrelation           ->
IDMStatCorrMatrices               ->
IDMStatLinearRegression           ->
IDMStatUnivariateCurve            ->
IDMStatPrinComAnalysis            ->
IDMStatFactorAnalysis             ->
IDMSequence                       ->

Figure 2. Mandatory and optional members of class objects

All objects that are not held in IKeySortedSet or ISortedRelation collections in the mining base are copied when a pointer or a reference to the object is an input parameter (directly or as an element of a collection) to a method of another class. For these objects you have to manage the storage yourself. This is valid for the following classes and class hierarchies:

IDMDataTable
    IDMStaticTable
    IDMFlatFileTable
    IDMDB2Table
    IDMPipeTable
    IDMMatrixTable
IDMDataField
IDMFlatFileField
IDMMatrixField
IDMComputedField
    IDMValueMappingField
    IDMDiscretizationField
    IDMFunctionField
IDMMatrix
IDMMatrix0
IDMClusFieldParams

See “Appendix A. Sample applications using the Environment Layer API” on page 313 for sample applications.

Using the Result API

The Result API provides a set of C++ classes and C structures to represent the results of a mining run together with C++ and C functions to retrieve and sort them. It provides the basis for writing export routines to other software products.

The Result API is independent of the Environment Layer API to make its use as flexible as possible. This independence allows the creation of conversion routines for mining results that run on platforms different from the current one.

Client tool registration

The Intelligent Miner supports client tool registration, which API users can use to register executables specific to a platform or result type. You can use these executables for converting or visualizing results.

Tool registration files

The Intelligent Miner provides executables for converting or visualizing results. If you want to provide your own executables, they must be registered. You can register platform-specific executables to be started by Intelligent Miner by modifying the tool registration files.

Two tool registration files are delivered with Intelligent Miner, a client-registration file and a server-registration file.

Client-registration file idmcsctr.dat
    As shipped, the client-registration file idmcsctr.dat contains entries used to start programs that allow you to visualize results. It also contains entries used by the client to define parameters for exporting results.

    You might find it necessary to change the client-registration file if you are:
    - Providing an executable program on the client that uses the Result API to get results and display them for a user to browse
    - Providing a browser path name for a spreadsheet to browse the results in comma-separated variables format
    - Modifying the list of browsers used by Intelligent Miner

    The client-registration file idmcsctr.dat contains comments and examples that explain how to specify the parameters expected by the Intelligent Miner. It is installed in the directory specified by IDM_BIN_DIR on the client.

    Exception: On AIX clients in language versions other than English of the Intelligent Miner, the file is read from the directory /usr/lpp/IMiner/nls/<lang>, where <lang> is substituted by the language identifier. For example, in Simplified Chinese, the file is read from the directory /usr/lpp/IMiner/nls/zh_CN.

Note: Always make a backup copy of the file idmcsctr.dat before modifying it.

Server-registration file idmcsstr.dat
    As shipped, the server-registration file idmcsstr.dat contains information used by the server to start executables. These executables convert results into comma-separated variables format or a format suitable for use by the Result API at the client (IdmApi format).

    You might find it necessary to change the server-registration file if you are providing an executable program on the server that uses the Result API to get results, and converts them into a format not already supported by the Intelligent Miner.

    The server-registration file idmcsstr.dat is installed in the directory specified by IDM_BIN_DIR on the host machine. This file contains comments that explain the parameters expected by the Intelligent Miner when it reads the file.

Note: Always make a backup copy of the file idmcsstr.dat before modifying it.

Accessing client tool registration files from your application

The mining result classes of the Environment Layer API (IDMResult, IDMBrowseFormat, IDMBrowseFormatKey) provide methods to export and browse the mining results according to the parameters you specified in the client tool registration files.

Environment variables for tool registration

You can specify the environment variable IDM_MAX_EXP_RESULT_SIZE at the client. It is used as the default disk size limit for results copied to the client machine. Specify this value as a number of bytes, without commas or other punctuation marks. The Intelligent Miner server executable that converts results into comma-separated variables format recognizes this parameter.
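An application that sets or inspects this variable may want to check that the value is a plain byte count before relying on it. The following is a minimal sketch of such a check in standard C++. The function names parseByteCount and resultSizeLimit are hypothetical and not part of any Intelligent Miner API, and how the Intelligent Miner itself parses the value is not described here.

```cpp
#include <climits>
#include <cstdlib>
#include <string>

// Hypothetical helper: accepts only decimal digits, as required for
// IDM_MAX_EXP_RESULT_SIZE (a number of bytes without punctuation).
// Returns true and stores the value in bytes on success.
bool parseByteCount(const std::string& text, unsigned long long& bytes) {
    if (text.empty()) return false;
    unsigned long long value = 0;
    for (char c : text) {
        if (c < '0' || c > '9') return false;  // rejects "1,048,576" or "10MB"
        const unsigned digit = static_cast<unsigned>(c - '0');
        if (value > (ULLONG_MAX - digit) / 10) return false;  // would overflow
        value = value * 10 + digit;
    }
    bytes = value;
    return true;
}

// Hypothetical helper: reads the variable from the client environment.
// Returns false if it is not set or is not a plain byte count.
bool resultSizeLimit(unsigned long long& bytes) {
    const char* raw = std::getenv("IDM_MAX_EXP_RESULT_SIZE");
    return raw != nullptr && parseByteCount(raw, bytes);
}
```

With this check, a value such as 1048576 is accepted, while 1,048,576 and 10MB are rejected because they contain characters other than digits.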

Building and running your application

You can build and run your own applications on the different platforms. The Intelligent Miner Application Development Toolkit (ADT) must be installed together with the Intelligent Miner server or client on the system where you want to build an application using the Intelligent Miner APIs.


Building your application for AIX, OS/2, WIN32, or Sun Solaris

The libraries and the files required to build an application for a standard AIX installation are provided under the following paths:

Libraries      /usr/lpp/IMiner/lib/

Header files   /usr/lpp/IMiner/include/

The libraries and the files required to build an application for a standard WIN32 or OS/2 installation are provided under the following paths:

Libraries      <BASE-DIRECTORY>\lib\

Header files   <BASE-DIRECTORY>\include\

The libraries and the files required to build an application for a standard Sun Solaris installation are provided under the following paths:

Libraries      /opt/IMiner/lib

Header files   /opt/IMiner/include

The Environment Layer API for the product is contained in the following libraries:

libidm.a AIX and Sun Solaris

idm.lib WIN32 and OS/2

On AIX, the libidmts.a library contains a thread safe version of the Environment Layer API. Although the library is built as thread safe, all calls of the Environment Layer API must be serialized. Applications that use this library have to be compiled and linked with the command xlC_r of the IBM CSet++ compiler.
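One common way to serialize calls from several threads is to route every call through a single lock. The sketch below shows the idea in current standard C++ (std::mutex), which is an assumption: the compilers listed in this chapter predate it, and an application built with xlC_r would use the thread primitives of its platform instead. The names apiMutex and serialized are hypothetical, and the callable stands for whatever Environment Layer API call the application makes.

```cpp
#include <mutex>
#include <utility>

// Hypothetical: one process-wide lock guarding every call into the API.
inline std::mutex& apiMutex() {
    static std::mutex m;
    return m;
}

// Runs the callable while holding the lock and returns its result.
// The lock is not recursive, so do not nest serialized() calls.
template <typename Call>
auto serialized(Call&& call) -> decltype(call()) {
    std::lock_guard<std::mutex> guard(apiMutex());
    return std::forward<Call>(call)();
}
```

A thread would then write, for example, serialized([&] { return runMyMiningStep(); }), where runMyMiningStep is a placeholder for application code that calls the API. Only one such call executes at a time, regardless of how many threads issue them.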

The Result API for associations and sequential patterns as described in “Result APIs for associations and sequential patterns” on page 267 is included in the following libraries:

libidmex.a Result API Version 2 for AIX and Sun Solaris

libidmasres.a Result API Version 6 for AIX and Sun Solaris

idmex.lib Result API Version 2 for WIN32 and OS/2

idmasres.lib Result API Version 6 for WIN32 and OS/2

idmex.dll Result API Version 2 additionally for OS/2 and WIN32

idmasres.dll Result API Version 6 additionally for OS/2 and WIN32

A Java Result API for associations and sequential patterns results is stored in the library idmasres.jar.

The Result API for classification, clustering, prediction, and descriptive statistics is contained in the following libraries:


libidmac.a AIX and Sun Solaris

idmac.lib WIN32 and OS/2

This Result API is described under “Result APIs for classification, clustering, prediction, and descriptive statistics” on page 281.

The Result API for the statistics functions Factor Analysis, Principal Component Analysis, Linear Regression, and Univariate Curve Fitting is included in the following libraries:

libidmsr.a AIX and Sun Solaris

idmsr.lib WIN32 and OS/2

This Result API is described in “Result APIs for Statistical functions” on page 293.

Prerequisites for AIX, OS/2, WIN32, and Sun Solaris

The Intelligent Miner APIs are programmed in C++. Depending on the platform, the following compilers are required to build Intelligent Miner applications:

AIX           IBM CSet++ Version 3.1.4

OS/2          IBM VisualAge C++ Version 3.0 and the following fixpacks:
              CTO307   For class libraries
              CTC306   For the compiler

Sun Solaris   IBM CSet++ for the Solaris Operating Environment Version 1.1.1.4

WIN32         IBM VisualAge C++ Version 3.5 and the following fixpacks:
              WTO353   For class libraries
              WTC354   For the compiler

Building your application for AS/400

The libraries and the files required to build your own application are provided with the standard IBM DB2 Intelligent Miner for AS/400 installation. The following files are provided:

Message file
    /QIBM/ProdData/IMiner/bin/idmall.msg (English only)

Environment Layer API library
    Qsys.lib/QIDM.lib/LIBIDM.SRVPGM

Result API for classification, clustering, prediction, and descriptive statistics
    /Qsys.lib/QIDM.lib/IDMDADLL.SRVPGM

Result API for the statistical functions Factor Analysis, Principal Component Analysis, Linear Regression, and Univariate Curve Fitting
    /Qsys.lib/QIDM.lib/IDMSRES.SRVPGM


Prerequisites

The Intelligent Miner APIs are written in C++. For AS/400, a native C++ compileris not available. Therefore a cross-compiler and the AS/400 cross-compilerback-end are required. For WIN32, a cross-compiler is available. Therefore thenecessary header files are not part of the Intelligent Miner for AS/400. They mustbe installed together with the Intelligent Miner ADT.

The following components are required on the client:
- IBM VisualAge C++ Version 3.5
- WTO353 for the class libraries
- WTC354 for the compiler
- IBM VisualAge C++ for OS/400 V3R7 (5716-CX5)
- Client Access/400 Windows/95 Client or Client Access/400 Windows/NT Client

On the server, VisualAge for C++ for AS/400 (host components 5716-CX4 and 5716-CX5-Base) is required.

Building your application for OS/390

The header files required to build an application are provided on your Intelligent Miner client CD. You can use FTP to transmit these files from the directory /include into an HFS directory of the OS/390 UNIX system services.

The following files are provided:

Message file
/usr/lpp/IMiner/bin/idm.msg

Side definition decks

hlq.SIDMSDEF(IDMSDELS) For the Environment Layer API

hlq.SIDMSDEF(IDMSDDAS) For the DataAccess API (serial)

hlq.SIDMSDEF(IDMSDDAP) For the DataAccess API (parallel)

Prerequisites, compile options, and link considerations for OS/390

The Intelligent Miner APIs are programmed in C++. The OS/390 C/C++ compiler is a prerequisite for building Intelligent Miner applications.

If your application is accessing DB2 tables, the preprocessor macro IDM_DB2_NO_CLI must be defined when compiling your application. You can do this by adding the compiler option DEFINE(IDM_DB2_NO_CLI) or, if you are using OS/390 Unix System Services, by adding -DIDM_DB2_NO_CLI.

You must include the header files provided for the Environment Layer API.

In JCL, the LSEARCH compiler option should list the Intelligent Miner MVS data set header files, and the SEARCH compiler option should list the following in this order:
- The HFS directory one level above the Intelligent Miner rpc directory
- The system HFS directory /usr/include/
- The Intelligent Miner MVS data set header files
- LE SCEEH.+ header files
- C++ SCLBH.+ header files

In OS/390 Unix System Services, the files listed above for the SEARCH compiler option should be included with the -I compiler option.

After your application has compiled cleanly, you need to link it with the Environment Layer API library. You can do this using JCL or under the OS/390 Unix System Services (USS).

In the prelink step, you must add DD statements that point to the side definition decks. For example:
//ELSD DD DISP=SHR,DSN=hlq.SIDMSDEF(IDMSDELS)
//DASD DD DISP=SHR,DSN=hlq.SIDMSDEF(IDMSDDAS)

You must also add the following binder control statements to the binder input stream:
INCLUDE ELSD
INCLUDE DASD

Running your application

This section describes the environment required for your application to run. It also lists commonly used environment variables.

Environment for AIX, OS/2, WIN32, or Solaris applications

You can build applications that can run on the client or on the server. If your application runs on a machine different from the Intelligent Miner server, the application must select the host on which the server is running at run time. User ID and password for the server host must also be provided. The class IDMBase provides static data members to hold the server host name, user ID, and password. If your application runs on the same machine as the Intelligent Miner server in stand-alone mode, no host name is required. If user ID and password are not specified for the server, the client user ID is used instead.

Users working with DB2 must specify the DB2 user ID and password, if the user ID and password specified for the server are not allowed to connect to the databases. See Using the Intelligent Miner for Data for information on how to start and stop the Intelligent Miner server process.
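The following is a minimal sketch of how an application might supply the server host name, user ID, and password through static members of IDMBase. So that the sketch compiles without the product headers, it uses a hypothetical stand-in class that mirrors only the setter and getter names of IDMBase; IString is replaced by std::string, and the helper functions configureServer and isStandAlone are illustrative names, not part of the API.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins so the sketch compiles without the product headers.
typedef std::string IString;   // the real IString comes from the IBM class library
typedef int IDMRETURN;         // assumption: modeled as a plain integer here
const IDMRETURN IDM_SUCCESS = 0;

// Minimal mock of the connection-related static members of IDMBase.
class IDMBase {
    static IString& host()     { static IString v; return v; }
    static IString& user()     { static IString v; return v; }
    static IString& password() { static IString v; return v; }
public:
    static IDMRETURN setHostName(IString h) { host() = h; return IDM_SUCCESS; }
    static IString   getHostName()          { return host(); }
    static void      setUserId(IString u)   { user() = u; }
    static IString   getUserId()            { return user(); }
    static void      setPassword(IString p) { password() = p; }
    static IString   getPassword()          { return password(); }
};

// Selects the server host and the credentials used to log on to it.
inline bool configureServer(const IString& hostName,
                            const IString& userId,
                            const IString& passwd) {
    if (IDMBase::setHostName(hostName) != IDM_SUCCESS) return false;
    IDMBase::setUserId(userId);
    IDMBase::setPassword(passwd);
    return true;
}

// No host name set: the application runs in stand-alone mode.
inline bool isStandAlone() { return IDMBase::getHostName().empty(); }
```

An application would call configureServer once, before creating or loading any mining-base objects. The host name and credentials shown in any call are placeholders.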

Environment for an OS/390 application

The application you build runs on a server with the IBM Intelligent Miner for OS/390 installed, where the mining functions also run. When your application runs on a different server than the mining functions, your application must select the host for the server where the mining functions are to run. User ID and password must also be provided. The class IDMBase provides static data members to hold the server host name, the user ID, and the password. If the application runs on the same server as the mining functions in stand-alone mode, no host name is required. When accessing DB2, a host name is always required, even if the application and the mining functions are running on the same machine. Also the DB2 subsystem ID must be provided.


Environment for AS/400

The application you build runs on the server where the Intelligent Miner for AS/400 is installed. The mining functions are also running on this server.

Your application must select the host for the server where you want to run the mining functions, even if the functions and your application are on the same machine. You must always specify the user ID and the password.

Environment variables

The following environment variables might be useful when you are running your application:

IDM_BIN_DIR
You can use this variable to define the directory in which the executable modules (for example, mining functions, viewers, and auxiliary programs) are installed. This variable is set automatically when the Intelligent Miner server process is started. You must set IDM_BIN_DIR when your application runs on the same machine as the Intelligent Miner, and the mining, processing, or statistics functions run synchronously. In that case no Intelligent Miner server process needs to be started.

IDM_MNB_DIR
This variable is used to define the directory that contains the mining bases. The Environment Layer API creates a new subdirectory called idmmnb in the defined directory. This idmmnb directory contains the individual mining bases. If client and server are running on different machines, and the variable IDM_MNB_DIR is not set, the $HOME directory is used. If this variable is set on the server before starting the server daemon, all connecting clients will share the same set of mining bases. If you want to run the client and the server on the same machine, and if you want to start the mining run synchronously, this variable must be set. In that case no Intelligent Miner server process needs to be started.

If you are using Windows NT and the variable is not set on the server, the value of the variable IDM_HOME_DIR is used.

IDM_RES_DIR
You can use this variable to define the directory that contains the results of a mining run. The Environment Layer API creates a new subdirectory called idmres in the defined directory. This idmres directory contains the individual result files (.dat) and some auxiliary files: setting file (.set), output parameter file (.out), kernel error file (.err), kernel last error file (.lst), trace file (.trc), and status files (.sts).

If the variable IDM_RES_DIR is not set and client and server are running on different machines, the directory pointed to by IDM_MNB_DIR is used. If you want to run client and server on the same machine, and if you want to start the mining run synchronously, you must specify this variable. In that case no Intelligent Miner server process needs to be started.

If you are using Windows NT and if the variable is not set on the server, the value of the variable IDM_HOME_DIR is used.

IDM_DEBUG
On the client, if this variable is set to on, the system shows all function and viewer calls as they are submitted. If this variable is not set or is set to some other value, the function and viewer call information is suppressed.


IDM_CLI_USED
If this variable is set, the client is accessing DB2 directly. Note that DB2 CAE or DB2 Connect must be installed and configured. If this variable is not set (default), the client is accessing DB2 through the Intelligent Miner server.

IDM_HOME_DIR
This variable applies only to Windows NT. You can use it on the server to define your own directory to store the mining bases and the result files. For each user, a subdirectory is located in the directory specified by IDM_HOME_DIR. In this subdirectory, the Environment Layer API creates new subdirectories called idmmnb and idmres. For example, if the environment variable IDM_HOME_DIR is set to E:\imhome, the mining bases of user Ted are located in E:\imhome\ted\idmmnb and the result files are located in E:\imhome\ted\idmres. IDM_HOME_DIR is used only if the variables IDM_MNB_DIR and IDM_RES_DIR are not set.

See Using the Intelligent Miner for Data for more information on how to use and set environment variables for Intelligent Miner.
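The fallback rules for the Windows NT server directories can be sketched as follows. This is an illustration only: the function names are hypothetical, the environment is passed in as a map so that the logic can be exercised without touching the real process environment, and the precedence for the result directory (IDM_RES_DIR, then IDM_MNB_DIR, then IDM_HOME_DIR) is an assumption drawn from the descriptions above.

```cpp
#include <cassert>
#include <map>
#include <string>

typedef std::map<std::string, std::string> Env;

// Returns the value of a variable, or an empty string if it is not set.
inline std::string lookup(const Env& env, const std::string& name) {
    Env::const_iterator it = env.find(name);
    return it == env.end() ? std::string() : it->second;
}

// Directory holding the mining bases of one user on a Windows NT server.
inline std::string miningBaseDir(const Env& env, const std::string& user) {
    const std::string mnb = lookup(env, "IDM_MNB_DIR");
    if (!mnb.empty()) return mnb + "\\idmmnb";
    // Fallback: per-user subdirectory below IDM_HOME_DIR.
    return lookup(env, "IDM_HOME_DIR") + "\\" + user + "\\idmmnb";
}

// Directory holding the result files of one user on a Windows NT server.
inline std::string resultDir(const Env& env, const std::string& user) {
    const std::string res = lookup(env, "IDM_RES_DIR");
    if (!res.empty()) return res + "\\idmres";
    // Assumed precedence: the mining-base directory comes before IDM_HOME_DIR.
    const std::string mnb = lookup(env, "IDM_MNB_DIR");
    if (!mnb.empty()) return mnb + "\\idmres";
    return lookup(env, "IDM_HOME_DIR") + "\\" + user + "\\idmres";
}
```

With only IDM_HOME_DIR set to E:\imhome and the user ted, the two functions yield the paths given in the IDM_HOME_DIR example above.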

Migration considerations

The following sections describe how you can use mining bases created with previous versions of the Intelligent Miner.

Using mining bases from previous versions of the Intelligent Miner

The structure of the mining bases has changed due to the new and enhanced functions of the Intelligent Miner Version 6. Nevertheless, you can use mining bases of previous versions, because the Intelligent Miner Version 6 can load these mining bases. You only need to open an existing mining base of a previous version, perform changes, if required, and save it. The mining base is automatically converted to the new Version 6 format.

Note: Be aware that this mining base cannot be used any longer by the previous versions of the Intelligent Miner. If you need to use previous versions of the Intelligent Miner again, save all the files residing in the subdirectories idmmnb and idmres before using the Intelligent Miner Version 6.

Using the statistics mode of the Clustering mining functions of Version 1

The Clustering mining functions of Version 1 provided a statistics mode. This mode was replaced by the Bivariate Statistics function. If mining bases created with Version 1 contain one or more Clustering settings in which statistics mode was selected, these settings will automatically be converted to IDMDescQuantSampleSettings. On the graphical user interface (GUI), these objects are represented as Bivariate Statistics settings.


Migrating applications

Migrating applications written for Intelligent Miner for Data Version 1

Functions of the Environment Layer API of the Intelligent Miner for Data Version 1 are not supported any longer. If you have built an application using the Environment Layer API of the Intelligent Miner for Data Version 1 or if you used the Version 1 compatibility support of the Intelligent Miner Version 2, you must replace all functions of Version 1 with the appropriate functions of Version 6.

Migrating applications written for Intelligent Miner Version 2

Migrating applications using the Environment Layer API: If you have built an application using the Environment Layer API of Version 2, you must recompile your application and link it to the Environment Layer API of Version 6. You must also change the data type of the parameter quantileLimits of the class IDMDescQuantSampleSettings from ISequence<IDMINTEGER>& to ISequence<IDMREAL>& in the constructor and the methods createObject, get, and update.

Starting a mining run on the first node listed in the hostlist file is not supported any longer. If you specify 1 as number of processes in the start method of class IDMSettings, the number of processes is set to 0. This means that a serial mining run is started independently from the hostlist file.
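The rule for the number of processes can be written down as a one-line mapping. The sketch below only illustrates that rule; effectiveProcessCount is a hypothetical helper, not an API function, and IDMINTEGER is redefined locally as long so that the fragment stands alone.

```cpp
#include <cassert>

typedef long IDMINTEGER;  // local stand-in for the API type

// A request for 1 process is treated as 0, that is, a serial mining run
// that does not depend on the hostlist file. Other values pass through.
inline IDMINTEGER effectiveProcessCount(IDMINTEGER requested) {
    return requested == 1 ? 0 : requested;
}
```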

The method startAll of the class IDMSequence does not support the number of processes parameter any longer. The following definition is new:
startAll( IDMBOOLEAN synchRunFlag=IDM_TRUE, IDMINTEGER tracelevel=0 );

Migrating programs using the Result API: Minor code changes are required to make a Version 1 application work with the Result API of the Intelligent Miner Version 6.

The Result API of the Intelligent Miner for Data Version 1 supported the following field types:
- IDMContinuousField
- IDMDiscreteField
- IDMDiscreteNumericField
- IDMDiscreteNonNumericField

With Intelligent Miner Version 6 these fields are replaced by the following fields:
- IDMField
- IDMNumericField
- IDMCategoricalField

The header file idmdfld.hpp contains typedef statements that allow an application built with the Intelligent Miner for Data Version 1 to still use the Version 1 field types. However, the method getStatistics() needs to be replaced by the following methods:
- getDiscrStatistics() for IDMDiscreteField
- getContStatistics() for IDMContinuousField

While making these changes, consider introducing the new field types that are supported by the Intelligent Miner Version 6 according to the typedef statements in the header file idmdfld.hpp:


// definitions for backward compatibility with IM Version 1
typedef IDMField IDMDiscreteField;
typedef IDMNumericField IDMContinuousField;
typedef IDMNumericField IDMDiscreteNumericField;
typedef IDMCategoricalField IDMDiscreteNonNumericField;
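The effect of these typedef statements can be illustrated with a small stand-alone sketch. The three classes below are empty placeholders, not the real Result API classes, and isNumeric and sameType are hypothetical helpers. The sketch shows that an old type name is only an alias, so Version 1 style code resolves to the new classes.

```cpp
#include <cassert>
#include <type_traits>

// Empty placeholder classes; the real definitions come from the Result API headers.
class IDMField {};
class IDMNumericField {};
class IDMCategoricalField {};

// Compatibility typedefs as quoted from idmdfld.hpp.
typedef IDMField            IDMDiscreteField;
typedef IDMNumericField     IDMContinuousField;
typedef IDMNumericField     IDMDiscreteNumericField;
typedef IDMCategoricalField IDMDiscreteNonNumericField;

// Code written against the Version 1 names binds to the Version 6 classes.
inline bool isNumeric(const IDMContinuousField&)         { return true; }
inline bool isNumeric(const IDMDiscreteNonNumericField&) { return false; }

// True if both names denote the same type.
template <class A, class B>
constexpr bool sameType() { return std::is_same<A, B>::value; }
```

One consequence worth noting: because IDMContinuousField and IDMDiscreteNumericField both name IDMNumericField, two functions that were overloaded on these two Version 1 types become the same declaration and need to be merged.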


Chapter 2. The Environment Layer API

The mining functions run on mining data and produce mining results. The Environment Layer API manages the information about this data, thus dealing with the so-called meta-mining data. This consists of the schema definitions for flat file data, the parameters of the mining runs, and references to the results. To make this information reusable for later mining runs, the meta-mining data is stored in mining bases.

This chapter contains a detailed description of the objects and methods that are provided by the Environment Layer API.

The structure of the Environment Layer API

The structure of the Environment Layer API is shown first in diagram form and then as a detailed tree with all classes and subclasses.

General structure

[Figure 3. General class structure with primary and secondary classes. The key distinguishes two kinds of arrows: B inherits all features from A, and A uses B.]

Detailed structure


The Environment Layer API object classes do not hold the current mining data or the mining results. They hold only the information about this data.

Object classes have the following characteristics:

- They have attributes
- You can create instances
- Their behaviour is defined by methods

Many classes use objects from other classes as attributes. You can modify the relationship between these objects in the following ways:

Embedded objects
An embedded object can be mandatory or optional for the master object. A change in the embedded object affects all master objects that use this object. This is a child-master update.

Pasted objects
The master object holds only a copy of the child object. When the child object is changed, the master objects using this child object are not updated.

See “Storage management” on page 5 to check which of the embedded objects are mandatory or optional for other objects, and which of the class objects are pasted (copied) into other objects.
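The difference between embedded and pasted objects corresponds to the difference between sharing an object and copying it. The sketch below illustrates this with plain C++ types; Child, EmbeddingMaster, and PastingMaster are hypothetical names chosen for the illustration and do not correspond to API classes.

```cpp
#include <cassert>
#include <memory>
#include <string>

struct Child {
    std::string name;
    int value;
};

// Master that embeds the child: master and caller share one object.
struct EmbeddingMaster { std::shared_ptr<Child> child; };

// Master that pastes the child: the master keeps its own private copy.
struct PastingMaster { Child child; };

// Changes the child after embedding it; the master sees the change.
inline int embeddedValueAfterUpdate(int before, int after) {
    std::shared_ptr<Child> c = std::make_shared<Child>(Child{"discretization", before});
    EmbeddingMaster m{c};
    c->value = after;          // child-master update
    return m.child->value;
}

// Changes the child after pasting it; the master's copy is unaffected.
inline int pastedValueAfterUpdate(int before, int after) {
    Child c{"taxonomy", before};
    PastingMaster m{c};
    c.value = after;           // the master is not updated
    return m.child.value;
}
```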

Figure 4 on page 23 shows the most important master-child relationships.


The class IDMMiningBase includes instances of subclasses of the class IDMMiningClass in so-called extents. You can identify the extent instances of these classes by their name attribute that must be unique within the object class.

The following generic methods are defined for almost all subclasses of the class IDMMiningClass:

constructor
Creates an object initialized by the specified attribute values and inserts itself into the corresponding mining-base extent if the attribute values satisfy certain consistency conditions.

createObject
Calls the constructor. If the object is inconsistent, it is deleted.

[Figure 4. Relationships between classes. The figure shows the major embedded-object and pasted-object relationships between the classes IDMSettings, IDMData, IDMDataField, IDMDataTable, IDMStaticTable, IDMComputedField, IDMDiscretization, IDMTaxonomy, IDMTaxonomyRelation, IDMItemCategory, IDMValuesMapping, and IDMNameMapping.]


destructor
Releases the allocated memory. If the object is contained in a mining-base extent, it removes itself from the extent. If other objects refer to this object as optionally embedded, the reference is removed. If objects refer to this object as mandatory embedded, they are deleted.

deleteObject
Checks if the object is mandatory or optionally embedded in other objects. If the object is mandatory or optionally embedded, the deleteObject method fails. If the object is not mandatory or optionally embedded, the destructor is called.

Note: You should always use the createObject and the deleteObject methods to ensure that the objects are not created on the stack. Future releases of the Intelligent Miner will not support the constructors as public methods any longer.

get Retrieves the attribute values of an object.

update Updates an object with the specified attribute values.
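The createObject and deleteObject pattern described above can be sketched with a miniature class that keeps its instances in an extent keyed by name. Everything in the sketch is hypothetical: NamedObject, its embedder counter, and lifecycleDemo only imitate the documented behaviour (unique names, creation on the heap only, deletion refused while the object is embedded) and do not reproduce the real signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

class NamedObject {
public:
    // Returns the new object, or a null pointer if the name is empty or
    // already used in the extent (the consistency condition of this sketch).
    static NamedObject* createObject(const std::string& name) {
        if (name.empty() || extent().count(name)) return 0;
        NamedObject* obj = new NamedObject(name);
        extent()[name] = obj;
        return obj;
    }
    // Fails while another object still embeds this one.
    static bool deleteObject(NamedObject* obj) {
        if (!obj || obj->embeddedIn_ > 0) return false;
        extent().erase(obj->name_);
        delete obj;
        return true;
    }
    void addEmbedder()    { ++embeddedIn_; }
    void removeEmbedder() { if (embeddedIn_ > 0) --embeddedIn_; }
    static std::size_t extentSize() { return extent().size(); }
private:
    // Private constructor and destructor: no instances on the stack.
    explicit NamedObject(const std::string& n) : name_(n), embeddedIn_(0) {}
    ~NamedObject() {}
    static std::map<std::string, NamedObject*>& extent() {
        static std::map<std::string, NamedObject*> e;
        return e;
    }
    std::string name_;
    int embeddedIn_;
};

// Walks through create, duplicate rejection, refused delete, and delete.
inline bool lifecycleDemo() {
    NamedObject* data = NamedObject::createObject("sales");
    if (!data) return false;
    bool ok = NamedObject::createObject("sales") == 0;  // duplicate name rejected
    data->addEmbedder();
    ok = ok && !NamedObject::deleteObject(data);        // still embedded
    data->removeEmbedder();
    ok = ok && NamedObject::deleteObject(data);         // now removable
    return ok && NamedObject::extentSize() == 0;
}
```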

General class description

MiningBase class

A mining base is the repository for the meta-mining data. It is the logical unit for collecting the information of the following objects:
- Data settings
- Mining settings
- Preprocessing settings
- Statistics settings
- Repeatable Sequences objects
- Mining results (only the classes IDMResult and IDMResultSet)

The objects are collected in so-called extents. References to objects of other mining bases are not allowed within a mining base.

Mining bases allow you to structure the meta-mining data. For example, a mining base can be created to analyze the sales transaction data of supermarkets and another one to analyze the respective data for department stores.

When a mining base has been loaded, it is locked for other users. Two users can have access to two different mining bases simultaneously. Multiple users can have access to the same mining base simultaneously, but only the first user can save changes to the mining base.

When deleting a mining base, all objects associated with it are also deleted because they become inaccessible after the removal of the parent mining base.

In addition to the generic methods described in “The structure of the Environment Layer API” on page 19, the following methods are defined for mining bases only:

load Loads the objects belonging to the mining base into main memory. The loaded mining base is accessible to other users in read-only mode.

save Saves the objects belonging to the mining base on hard disk and unlocks the mining base.

saveAs Saves all mining-base objects on your hard disk under a different mining-base name.

export Exports all mining-base objects to files of your choice on the client.

import Imports all previously exported mining-base objects from files on the client into main memory.

getExtend Retrieves all objects of the appropriate class of the mining base. Its signature determines the class of objects that are retrieved.

getElement Retrieves an object of the appropriate class by name. Its signature determines the class of object to which the name refers.

DataTable class and the Data class

The objects of these classes are created and modified in the data-definition phase.

DataTable class

Mining data can reside in a flat file, a pipe, or in a database. In all cases it must be in tabular form, that is, it can be described as a table consisting of rows (records) and columns (fields). A data table is a set of such structured data fields.

For example, a data table containing the orders of a mail order company can have the following data fields:
- Date
- Customer number
- Order number
- Article number

Data types: The elements of one data field belong to one data type. The Environment Layer API supports the following basic data types for fields:

Binary (categorical with only two different values)
Data containing categorical fields with a specific NULL and NONNULL value.

Continuous numeric
Data containing real values. The number of distinct values is unlimited. Statistics are maintained for the ranges of continuous data. Intelligent Miner maps database data types of DOUBLE, REAL, DECIMAL, FLOAT, NUMERIC, DEC, DOUBLE PRECISION, and PACKED to the continuous numeric data type.

Discrete nonnumeric (categorical)
Data containing string values. Intelligent Miner maps database data types of CHAR, VARCHAR, DATE, TIME, TIMESTAMP, TIMESTMP, GRAPHIC, VARGRAPHIC, and LONG VARGRAPHIC to the categorical data type.

Discrete numeric
Data containing integer or real values. The number of distinct values is limited, for example, in age or salary data. Statistics are maintained for each value in discrete numeric data. Discrete numeric values can also be cyclic. For example, the days of a week represented by numbers from 1 to 7 are cyclic.

Numeric
Data containing integer or real values. Use this data type to let Intelligent Miner determine if a numeric field should be treated as a discrete numeric or a continuous numeric field. If the number of distinct values in a field is:
- Below a certain threshold, Intelligent Miner treats it as a discrete numeric field
- Above a certain threshold, Intelligent Miner treats it as a continuous numeric field

The default threshold value is 100. Intelligent Miner maps database data types of INTEGER, BIGINT, and SMALLINT to the numeric data type.
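The threshold rule for the Numeric data type can be sketched as follows. This is not the product's implementation; classifyNumeric and ramp are hypothetical helpers, and the treatment of a field whose number of distinct values equals the threshold exactly is an assumption (here it counts as continuous), because the text only covers the cases below and above the threshold.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

enum FieldKind { DISCRETE_NUMERIC, CONTINUOUS_NUMERIC };

// Counts the distinct values of a field and compares the count with the
// threshold (default 100, as stated in the text).
inline FieldKind classifyNumeric(const std::vector<double>& values,
                                 std::size_t threshold = 100) {
    const std::set<double> distinct(values.begin(), values.end());
    return distinct.size() < threshold ? DISCRETE_NUMERIC : CONTINUOUS_NUMERIC;
}

// Helper for experiments: a field with n distinct values 0, 1, ..., n-1.
inline std::vector<double> ramp(std::size_t n) {
    std::vector<double> v;
    for (std::size_t i = 0; i < n; ++i) v.push_back(static_cast<double>(i));
    return v;
}
```

A field with the seven weekday numbers is classified as discrete numeric, while a field with several hundred distinct measurements is classified as continuous numeric. Repeated values do not matter, only the number of distinct ones.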

Multi-value fields: Usually, one row contains one value per column. However, this might not be appropriate for all kinds of data. An example is sales transaction data where a sales transaction identified by a transaction key can contain several items.

Note: This feature will not be provided for this release. However, it is reflected in the signature of the API functions to allow an easier migration to later releases.

Computed fields: Data tables can be extended by computed fields. Computed fields enhance the flexibility of the Intelligent Miner because they allow a certain part of the preprocessing functionality to be executed dynamically during a mining run.

Because you can use a computed field as the input to another computed field, you can compose functions. The following types of functions are supported:

Discretization of continuous variables
This mapping is defined by splitting the value range of a continuous variable into intervals and associating each interval with a discrete value. Discretization functions are represented by discretization objects.

Value mapping
A value mapping maps discrete values to other values. For example, the days of the week can be mapped to integers, Monday to 1, Tuesday to 2, Wednesday to 3, until each day of the week is mapped to an integer. Value mapping is represented by a value mapping object.

Functions defined on the values of data fields and other attributes
These functions are implemented by C++ functions. A set of predefined functions exists (like sum, which is the sum of the values of continuous numeric fields). On AIX, a mechanism is provided that allows you to define and add functions.
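The composition of computed fields can be illustrated with plain functions. The sketch below is independent of the API: discretize, mapValue, and ageGroup are hypothetical names, and the interval bounds and labels are invented for the example. It feeds the output of a discretization into a value mapping, which is the kind of chaining the text describes.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Discretization: the bounds split the value range into intervals, and
// interval number i (counted from 0) becomes the discrete value i.
inline int discretize(double x, const std::vector<double>& bounds) {
    std::size_t i = 0;
    while (i < bounds.size() && x >= bounds[i]) ++i;
    return static_cast<int>(i);
}

// Value mapping: maps a discrete value to another value.
inline std::string mapValue(int v, const std::map<int, std::string>& mapping,
                            const std::string& fallback) {
    std::map<int, std::string>::const_iterator it = mapping.find(v);
    return it == mapping.end() ? fallback : it->second;
}

// Composition: age -> interval number -> label.
inline std::string ageGroup(double age) {
    static const double limits[] = {18.0, 65.0};
    const std::vector<double> bounds(limits, limits + 2);
    std::map<int, std::string> labels;
    labels[0] = "minor";
    labels[1] = "adult";
    labels[2] = "senior";
    return mapValue(discretize(age, bounds), labels, "unknown");
}
```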

Data class

These are the mining data objects that describe the data to be mined. They consist of a data table. A particular use of a field by a mining task, for example, the usage of a field called Product as the item identifier for the Associations mining function, is not specified at data-definition time.


A data object has the following attributes:

Name The name of a data object.

Data table
The data table where the mining data is located.

Use mode
The use mode of the data object: input only, write once, or input output.

RunSelection class and Settings class

When you have defined the mining-data objects in the data-definition phase and processed this data, the next step is to run the mining functions on this data. The first thing you have to do is specify the parameters for the functions.

One of the parameters is the data object representing the data on which the mining function is to run. You can also select a subset of this data for mining using data selections.

RunSelection class

Data selections allow the selection of a part of the data specified by a data object for mining. For example, if you have sales transaction data from a certain period of time and there are month and day data fields in the data tables of the corresponding data object, data selection allows you to mine only on the transactions coming from certain months or certain days.

The selection conditions are specified in disjunctive normal form, that is, as a disjunction of conjunctions of atomic selections.

Atomic selections consist of the following attributes:

A sign Indicating if the selection condition is negated

A predicate
The name of the selection predicate

A list of names of data fields
The arguments of the selection predicate

A set of predefined predicates (like equal, less than, or greater than) is provided.
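A selection in disjunctive normal form can be evaluated with two nested loops. The sketch below models an atomic selection with the three attributes listed above (sign, predicate name, field names). It is an illustration under stated assumptions: the types, the record representation as a map from field name to number, and the sample fields are hypothetical, and only the three predicates named in the text are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

typedef std::map<std::string, double> Record;

struct AtomicSelection {
    bool negated;                     // the sign
    std::string predicate;            // "equal", "less than", or "greater than"
    std::vector<std::string> fields;  // names of the two argument data fields
};
typedef std::vector<AtomicSelection> Conjunction;  // all atoms must hold
typedef std::vector<Conjunction> Selection;        // any conjunction may hold

// Evaluates one atomic selection; throws std::out_of_range if a field is missing.
inline bool evalAtomic(const AtomicSelection& a, const Record& r) {
    const double lhs = r.at(a.fields.at(0));
    const double rhs = r.at(a.fields.at(1));
    bool result = false;
    if (a.predicate == "equal")             result = lhs == rhs;
    else if (a.predicate == "less than")    result = lhs < rhs;
    else if (a.predicate == "greater than") result = lhs > rhs;
    return a.negated ? !result : result;
}

// A record is selected if at least one conjunction is satisfied completely.
inline bool selects(const Selection& s, const Record& r) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        bool all = true;
        for (std::size_t j = 0; j < s[i].size() && all; ++j)
            all = evalAtomic(s[i][j], r);
        if (all) return true;
    }
    return false;
}

// (month equal wantedMonth AND NOT day less than firstDay) OR (amount greater than limit)
inline Selection sampleSelection() {
    AtomicSelection monthMatches = {false, "equal", {"month", "wantedMonth"}};
    AtomicSelection dayTooEarly  = {true, "less than", {"day", "firstDay"}};
    AtomicSelection largeAmount  = {false, "greater than", {"amount", "limit"}};
    Conjunction first;
    first.push_back(monthMatches);
    first.push_back(dayTooEarly);
    Selection s;
    s.push_back(first);
    s.push_back(Conjunction(1, largeAmount));
    return s;
}

// Sample record; the comparison values are stored as fields of the record.
inline Record sale(double month, double day, double amount) {
    Record r;
    r["month"] = month;   r["wantedMonth"] = 12;
    r["day"] = day;       r["firstDay"] = 15;
    r["amount"] = amount; r["limit"] = 1000;
    return r;
}
```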

Settings class

The Settings class provides the parameters of the mining functions, the preprocessing functions, the statistics functions, or the sequence settings objects. It includes the following settings:
- Associations
- Classification
- Clustering
- Preprocessing
- Sequential Patterns
- Similar Sequences
- Statistics
- Prediction
- Repeatable Sequences


Result class

The results of a mining run can be accessed through the result objects. Several result objects can be united in a result set object. It is possible to export a result to a file as well as to import results from a file. After having exported a result to a file, it is possible to export these mining results to other products, like spreadsheets or visualization tools, by using the Result API. The following class-specific methods have been defined for result objects:

export results
Exports the results to a file whose name is specified as an input parameter.

import results
Imports the results from a file whose name is specified as an input parameter.

Data type definitions

This section gives an overview of the data types available in the Environment Layer API.

Basic data types

The following basic data types are used for programming so that code is independent of the platform-dependent implementations of the C data types. Most of the names have been adopted from the type definition of the Call Level Interface of DB2.

Table 1. The basic data types of the Intelligent Miner

Data type      Description                    Example C-Type
IDMCHAR        character                      char
IDMINTEGER     integer                        long int
IDMBOOLEAN     integer                        int
IDMSMALLINT    small integer                  short int
IDMLONGINT     long integer                   long long int
IDMNUMBER      unsigned integer               unsigned long
IDMREAL        double precision real number   double
IDMDOUBLE      double precision real number   double

IDMCHAR

Note that the type IDMCHAR is not guaranteed to be implemented as char. Future releases of the Intelligent Miner might use wchar_t to define IDMCHAR. That is, the size of IDMCHAR might be greater than 1.

IDMBOOLEAN

The type IDMBOOLEAN is used for truth values, where a value x is interpreted as false if it is zero and as true if it is nonzero. For convenience, two constants IDM_TRUE and IDM_FALSE are defined with their natural interpretations.

Note: Do not use equality with the IDMBOOLEAN data type. For example, do not write if ( x == IDM_TRUE ), use if ( x ) instead.
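The reason for this note becomes visible with a nonzero value other than IDM_TRUE. The sketch below redefines the type and constants locally; the value 1 for IDM_TRUE is an assumption made for the illustration, and fragileTest and robustTest are hypothetical names.

```cpp
#include <cassert>

typedef int IDMBOOLEAN;            // Table 1 lists int as the example C type
const IDMBOOLEAN IDM_TRUE  = 1;    // assumed value
const IDMBOOLEAN IDM_FALSE = 0;

// Compares against the constant: misses nonzero values other than IDM_TRUE.
inline bool fragileTest(IDMBOOLEAN x) { return x == IDM_TRUE; }

// Uses the value directly: every nonzero value counts as true.
inline bool robustTest(IDMBOOLEAN x) { return x ? true : false; }
```

For a value such as 2, robustTest reports true while fragileTest reports false, although both agree for IDM_TRUE and IDM_FALSE.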


Enumerated types

The libraries contain several enumerated types. Types and constants are not listed here, but can be found in the files idmaglob.h and idmppglb.h (preprocessing library).

The return data type

IDMRETURN is the data type of the return value of several member functions. It provides a return code indicating success of the function or an error. It can have the following values:

IDM_SUCCESS             0
IDM_WARNING             1
IDM_WRONG_PARAMETERS   -1
IDM_ERROR              -2
IDM_SEVERE             -3

In general, an operation is successful if its return code is IDM_SUCCESS or IDM_WARNING.
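A small helper can capture this success rule so that call sites do not repeat the comparison. The sketch below is an illustration: it models IDMRETURN as a local enumeration with the numeric values listed above (whether the real type is an enumeration or an integer typedef is not stated here), and idmSucceeded and idmDescribe are hypothetical helper names.

```cpp
#include <cassert>
#include <string>

// Local stand-in carrying the documented numeric values.
enum IDMRETURN {
    IDM_SEVERE           = -3,
    IDM_ERROR            = -2,
    IDM_WRONG_PARAMETERS = -1,
    IDM_SUCCESS          =  0,
    IDM_WARNING          =  1
};

// An operation counts as successful for IDM_SUCCESS and IDM_WARNING.
inline bool idmSucceeded(IDMRETURN rc) {
    return rc == IDM_SUCCESS || rc == IDM_WARNING;
}

// Short description of each return code, following the text of this section.
inline const char* idmDescribe(IDMRETURN rc) {
    switch (rc) {
        case IDM_SUCCESS:          return "succeeded";
        case IDM_WARNING:          return "succeeded with warnings";
        case IDM_WRONG_PARAMETERS: return "wrong parameters";
        case IDM_ERROR:            return "internal or runtime error";
        case IDM_SEVERE:           return "no memory or severe runtime error";
    }
    return "unknown return code";
}
```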

Exception handling

The C++ API functions are either member or static member functions. Most API functions except for constructors and destructors return a value of type IDMRETURN.

Constructors have a separate return parameter of type IDMRETURN, which indicates the return value in case the constructor itself fails. Destructors whose purpose is to free the memory occupied by an object are always successful. Use the member function deleteObject for safe and consistent delete operations.

This return value indicates:
- If the operation succeeded: IDMRETURN = IDM_SUCCESS
- If the operation succeeded with some warnings: IDMRETURN = IDM_WARNING
- If the operation failed due to wrong parameters: IDMRETURN = IDM_WRONG_PARAMETERS
- If the operation failed due to an internal or runtime error: IDMRETURN = IDM_ERROR
- If the operation failed due to no memory or a severe runtime error: IDMRETURN = IDM_SEVERE

Information about an exception is stored in instances of the class IDMException. The base class IDMBase has a (private) static member variable holding the exception object corresponding to the last execution of the member functions of its derived classes. This exception object can be retrieved by the static member function IDMException* getException() that is defined in class IDMBase.

To retrieve the detailed exception information from an IDMException object, use the following member functions:
- get( IDMExceptionType &excType, IDMCOMPONENT &comp, IDMINTEGER &excId, IDMRETURN &severity )
- get( IDMExceptionType &excType, IDMCOMPONENT &comp, IDMINTEGER &excId, IDMRETURN &severity, IString &excMessage )
- getInfo( IString &info )
- getInfo( IString &info1, IString &info2, IString &info3, IString &info4, IString &info5, IString &info6, IString &info7, IString &info8, IString &info9 )

The getInfo function can only be used if the exception type IDMExceptionType returned by the get function is IDM_API_EXCEPTION. It should be called after calling the get function to retrieve the variable part of the exception text excMessage, which could be a data field name or a file name.

IDMCOMPONENT is defined by:

typedef enum { IDM_DATA_ACCESS_API,
               IDM_PPROCESSING,
               IDM_KERNEL,
               IDM_STATISTICS,
               IDM_ENVLAYER_API,
               IDM_CS_API,
               IDM_IBM_CLASS_LIB,
               IDM_DB2_CLI,
               IDM_RESULT_API,
               IDM_GUI } IDMCOMPONENT;

This indicates in which component of Intelligent Miner the exception occurred.

IDMExceptionType is defined by:

typedef enum { IDM_CLI_EXCEPTION,
               IDM_IBMCL_EXCEPTION,
               IDM_API_EXCEPTION } IDMExceptionType;

This indicates whether the exception occurred in the DB2 Call Level Interface, in a function of the IBM collection and application classes, or in the API.

Base class

All following classes inherit from the IDMBase class. When running in client/server mode the server name must be set using this class. The server name can also be queried using this class.

IDMBase

Header file: idmcbase.hpp

Format:

typedef enum { IDM_UNKNOWN_PLATFORM,
               IDM_AIX,      // AIX serial
               IDM_MVS,      // OS/390 serial
               IDM_OS400,    // OS/400
               IDM_AIX_PE,   // AIX parallel
               IDM_MVS_PE,   // OS/390 parallel
               IDM_SUN,      // Sun Solaris serial
               IDM_NT,       // WIN NT serial
               IDM_SUN_PE,   // Sun Solaris parallel
               IDM_NT_PE     // WIN NT parallel
             } IDM_Platform;

typedef struct IDMDB2ServerUserPair
{
    IString databaseServer;
    IString uid;
    IString passwd;
};

class IDMBase {

protected:
    static IString cvHostName;
    static IDMPLATFORM cvPlatform;
    static IString cvUserId;
    static IString cvPassword;
    static IString cvDB2UserId;
    static IString cvDB2Password;
    static IString cvWorkingDirectory;
    static IDMINTEGER cvMemorySize;
    static IKeySet< IDMDB2ServerUserPair, IString > cvDB2ServerUser;
    static IString cvVersion;
    static IDMException *pcvException;

public:
    static IDMINTEGER ivTraceLevel;
    static IDMRETURN setHostName( IString );
    static IString getHostName();
    static void setUserId( IString );
    static IString getUserId();
    static void setPassword( IString );
    static IString getPassword();
    static IDM_Platform getPlatform();
    static void setDB2UserId( IString );
    static IString getDB2UserId();
    static void setDB2Password( IString );
    static IString getDB2Password();

    static void setDB2ServerUserPairs(
        IKeySet< IDMDB2ServerUserPair, IString > serverUserPairs );

    static IKeySet< IDMDB2ServerUserPair, IString > getDB2ServerUserPairs();

    static IDMBOOLEAN getDB2ServerUser( IString databaseServer,
                                        IDMDB2ServerUserPair &serverUser );

    static IDMRETURN addDB2ServerUser( IDMDB2ServerUserPair serverUser );

    static IDMRETURN setMemorySize( IDMINTEGER );
    static IDMINTEGER getMemorySize();

    static IDMRETURN setWorkingDirectory( IString );
    static IString getWorkingDirectory();

    static const IString getVersion();

    static IDMException* getException();
    virtual const IDMException* getException() const;
    static void setTraceLevel( IDMINTEGER );
    static IDMINTEGER getTraceLevel();

};

Data members:

cvHostName
    The server name. It is the name of the host where the IBM Intelligent Miner server is running. If cvHostName is not set, Intelligent Miner runs in stand-alone mode.

cvPlatform
    The server platform. It is the platform on which the Intelligent Miner server is running. On the platforms where parallel mining is supported, a distinction is made between Intelligent Miner and Intelligent Miner parallel.

cvUserId
    The user ID for the server. It must be set if the Intelligent Miner client and server are running on different machines. It can be set if client and server are running on the same machine.

cvPassword
    The password for the server. It must be set, together with cvUserId, if the Intelligent Miner client and server are running on different machines. It can be set, together with cvUserId, if client and server run on the same machine.

cvDB2UserId
    The DB2 user ID. If working with DB2, this user ID is used to connect to database servers that are not specified in the cvDB2ServerUser key set.

cvDB2Password
    The DB2 password. If working with DB2, this DB2 password must be set if cvDB2UserId is also set.

cvDB2ServerUser
    It is possible to specify different DB2 user IDs and passwords for different database servers. The database server and user ID combinations are stored in this key set. For database servers not specified in this key set, the values in cvDB2UserId and cvDB2Password are used.

cvWorkingDirectory
    The name of the working directory on the Intelligent Miner server where temporary information necessary to compute the mining run results should be stored. Supplying a value for the working directory may result in faster execution of the mining run.

cvMemorySize
    The size of the available main memory. This value is used only by the Associations, Sequential Patterns, Time Sequences, and Tree Classification functions. Changing this value does not affect the result.

cvVersion
    The version number of Intelligent Miner.

pcvException
    A pointer to the exception object created for the most recent exception raised by a method applied to IDMBase or one of its derived classes.

ivTraceLevel
    If set to a value between 1 and 10, a trace file resx.trc is created during a run. It resides in the directory identified by the IDM_RES_DIR environment variable on the Intelligent Miner server.

Member functions:

setHostName
    Sets the server name (cvHostName). If you want to run the Intelligent Miner in stand-alone mode, this method must still be called, as setHostName(""), because it performs some initializations.

getHostName
    Retrieves the server name (cvHostName).

setUserId
    Sets the user ID (cvUserId).

getUserId
    Retrieves the user ID (cvUserId).

setPassword
    Sets the password (cvPassword).

getPassword
    Retrieves the password (cvPassword).

getPlatform
    Retrieves the platform (cvPlatform).

setDB2UserId
    Sets the DB2 user ID (cvDB2UserId).

getDB2UserId
    Retrieves the DB2 user ID (cvDB2UserId).

setDB2Password
    Sets the DB2 password (cvDB2Password).

setDB2ServerUserPairs
    Sets the database server and user ID combinations (cvDB2ServerUser).

getDB2ServerUserPairs
    Retrieves the database server and user ID combinations (cvDB2ServerUser).

getDB2ServerUser
    Retrieves the database server and user ID combination for the specified database server.

addDB2ServerUser
    Adds a database server and user ID combination to the key set.

setMemorySize
    Sets the memory size (cvMemorySize).

getMemorySize
    Retrieves the memory size (cvMemorySize).

setWorkingDirectory
    Sets the working directory (cvWorkingDirectory).

getWorkingDirectory
    Retrieves the working directory (cvWorkingDirectory).

getVersion
    Retrieves the Intelligent Miner version (cvVersion).

getException
    Static method for returning a pointer to the last exception object (pcvException).

getStaticException
    Virtual method for returning a pointer to the last exception object.
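The role of the cvDB2ServerUser key set can be pictured as a lookup table keyed by database server name. The following is a minimal sketch of that idea only; it does not use the Intelligent Miner headers. ServerUserPair and ServerUserRegistry are hypothetical stand-ins, std::string and std::map replace IString and IKeySet, and the rejection of a second entry for the same server is an assumption.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for IDMDB2ServerUserPair (std::string replaces IString).
struct ServerUserPair {
    std::string databaseServer;
    std::string uid;
    std::string passwd;
};

// Hypothetical stand-in for the cvDB2ServerUser key set, keyed by server name.
class ServerUserRegistry {
    std::map<std::string, ServerUserPair> pairs_;

public:
    // Plays the role of addDB2ServerUser. Assumption: a second entry for the
    // same database server is rejected instead of overwriting the first one.
    bool add(const ServerUserPair& pair) {
        return pairs_.insert({pair.databaseServer, pair}).second;
    }

    // Plays the role of getDB2ServerUser: returns true and fills `out`
    // only if an entry exists for `databaseServer`.
    bool get(const std::string& databaseServer, ServerUserPair& out) const {
        auto it = pairs_.find(databaseServer);
        if (it == pairs_.end()) return false;
        out = it->second;
        return true;
    }
};
```

A caller would register one entry per database server and query it by name; a failed lookup corresponds to the case where the default DB2 user ID and password apply.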

Auxiliary classes

IDMTimeStamp

The class IDMTimeStamp attaches a time stamp to objects of other classes, for example class IDMMiningBase, when they are created or updated.


Header file: idmctime.hpp

Format:
class IDMTimeStamp {
    IDate ivDate;
    ITime ivTime;

public:
    IDMTimeStamp();
    ~IDMTimeStamp();
    IDMTimeStamp( const IDMTimeStamp &ts );
    IDMTimeStamp& operator= (const IDMTimeStamp &ts );
    void refresh();
    const IDate& getDate() const;
    const ITime& getTime() const;
    IString asString() const;
    void elapsedTime(IDMINTEGER &nbDays,
                     IDMINTEGER &nbHours,
                     IDMINTEGER &nbMinutes,
                     IDMINTEGER &nbSeconds) const;
};

Data members:

ivDate
    The date of the time stamp.

ivTime
    The time of the time stamp.

Member functions:

IDMTimeStamp
    The constructor. It initializes ivDate and ivTime with the current date and time.

~IDMTimeStamp
    The destructor.

IDMTimeStamp( const IDMTimeStamp &ts )
    The copy constructor.

IDMTimeStamp& operator= (const IDMTimeStamp &ts )
    The assignment operator.

refresh
    Updates ivDate and ivTime with the current date and time.

getDate
    Retrieves the date. It returns an object of class IDate.

getTime
    Retrieves the time. It returns an object of class ITime.

asString
    Returns the time stamp as a string according to the value of the LC_TIME locale.

elapsedTime
    Returns the time elapsed between the time stamp value and now, that is, the moment at which the method is called. The result is returned as days, hours, minutes, and seconds.
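The arithmetic behind an elapsedTime-style result can be sketched as follows. This is an illustration with standard C++ only, not the IDMTimeStamp implementation: ElapsedParts, splitElapsed, and elapsedSince are hypothetical names, and std::chrono replaces IDate and ITime.

```cpp
#include <chrono>

// Hypothetical result type; elapsedTime itself fills four IDMINTEGER references.
struct ElapsedParts {
    long long days;
    long long hours;
    long long minutes;
    long long seconds;
};

// Splits a non-negative number of seconds into days, hours, minutes, seconds.
ElapsedParts splitElapsed(std::chrono::seconds total) {
    long long rest = total.count();
    ElapsedParts parts;
    parts.days = rest / 86400;   // 86400 seconds per day
    rest %= 86400;
    parts.hours = rest / 3600;
    rest %= 3600;
    parts.minutes = rest / 60;
    parts.seconds = rest % 60;
    return parts;
}

// Elapsed time from a stored time point to the moment of the call.
ElapsedParts elapsedSince(std::chrono::system_clock::time_point stamp) {
    const auto now = std::chrono::system_clock::now();
    return splitElapsed(
        std::chrono::duration_cast<std::chrono::seconds>(now - stamp));
}
```

For example, 93784 seconds split into 1 day, 2 hours, 3 minutes, and 4 seconds.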


IDMBaseMatrix

You can use the following classes to specify matrix tables (class IDMMatrixTable). Class IDMBaseMatrix is the abstract base class of class IDMMatrix and class IDMMatrix0. It collects the member variables and member functions that are common to class IDMMatrix and class IDMMatrix0.

Header file: idmcmat.hpp

Format:
template <class Type>
class IDMBaseMatrix {
protected:
    IDMINTEGER ivNbRows;
    IDMINTEGER ivNbColumns;
    Type ivInitialValue;
    Type *pivMatrix;
    IDMBOOLEAN ivIsSymmetrical;

public:
    IDMBaseMatrix( const IDMBaseMatrix<Type> &matrix );
    IDMBaseMatrix<Type>& operator= (const IDMBaseMatrix<Type> &matrix );

    IDMINTEGER setValue( IDMINTEGER rowIndex, IDMINTEGER columnIndex,
                         Type value);
    Type getValue(IDMINTEGER rowIndex, IDMINTEGER columnIndex) const;

    IDMINTEGER getNbRows() const;
    IDMINTEGER getNbColumns() const;

    IDMBOOLEAN isSymmetrical() const;
};

Data members:

ivNbRows
    Number of rows in the matrix.

ivNbColumns
    Number of columns in the matrix.

ivInitialValue
    Value that is used to initialize the matrix.

pivMatrix
    Pointer to the matrix.

ivIsSymmetrical
    If this value is IDM_TRUE, the matrix is symmetrical, that is, every update of a cell (rowIndex, columnIndex) implies an update of the cell (columnIndex, rowIndex) with the same value.

Member functions:

IDMBaseMatrix( const IDMBaseMatrix<Type> &matrix )
    The copy constructor.

IDMBaseMatrix<Type>& operator= (const IDMBaseMatrix<Type> &matrix )
    The assignment operator.

setValue
    Sets a value at the given coordinates, that is, at (rowIndex, columnIndex). If the matrix is symmetrical, the cell (columnIndex, rowIndex) is updated with the same value. The row and column indices range from 1 to the number of rows and columns, respectively. If one of these indices is out of range, an error ID different from 0 is returned.

getValue
    Retrieves a value from the given coordinates. If they are out of range, the initial value is returned.

getNbRows
    Returns the number of rows of the matrix.

getNbColumns
    Returns the number of columns of the matrix.

isSymmetrical
    Returns IDM_TRUE if the matrix is symmetrical.
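The behavior described for setValue and getValue (mirrored updates in a symmetrical matrix, a nonzero error ID for out-of-range indices, the initial value as the fallback on reads) can be sketched with a small stand-in class. SketchMatrix is a hypothetical name, the indices are 1-based as for IDMMatrix, and row-major storage in a std::vector is an assumption; the real classes may store cells differently.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical 1-based matrix; not the IDMMatrix implementation.
template <class T>
class SketchMatrix {
    int rows_;
    int cols_;
    T initial_;
    bool symmetrical_;
    std::vector<T> cells_;  // row-major storage (assumption)

    bool inRange(int r, int c) const {
        return r >= 1 && r <= rows_ && c >= 1 && c <= cols_;
    }
    std::size_t index(int r, int c) const {
        return static_cast<std::size_t>(r - 1) * cols_ + (c - 1);
    }

public:
    SketchMatrix(int rows, int cols, T initial, bool symmetrical = false)
        : rows_(rows), cols_(cols), initial_(initial),
          symmetrical_(symmetrical),
          cells_(static_cast<std::size_t>(rows) * cols, initial) {}

    // Returns 0 on success and a nonzero error ID for out-of-range indices.
    int setValue(int r, int c, T value) {
        if (!inRange(r, c)) return 1;
        cells_[index(r, c)] = value;
        // Mirror the update for a symmetrical matrix.
        if (symmetrical_ && inRange(c, r)) cells_[index(c, r)] = value;
        return 0;
    }

    // Returns the stored value, or the initial value if out of range.
    T getValue(int r, int c) const {
        return inRange(r, c) ? cells_[index(r, c)] : initial_;
    }
};
```

With a symmetrical 3 by 3 matrix, writing 7 to cell (1, 3) makes cell (3, 1) read 7 as well, while a read at (4, 4) yields the initial value.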

IDMMatrix

Class IDMMatrix specifies a matrix where the indices start with 1.


Format:
template <class Type>
class IDMMatrix : public IDMBaseMatrix<Type> {
public:
    IDMMatrix();
    IDMMatrix( IDMINTEGER &errorId,
               IDMINTEGER nbRows, IDMINTEGER nbColumns,
               Type initialValue,
               IDMBOOLEAN isSymmetrical=IDM_FALSE);

    IDMMatrix( const IDMMatrix<Type> &matrix );
    IDMMatrix<Type>& operator= ( const IDMMatrix<Type> &matrix );

    ~IDMMatrix();
};

Member functions:

IDMMatrix()
    The default constructor.

IDMMatrix( IDMINTEGER &errorId, IDMINTEGER nbRows, IDMINTEGER nbColumns, Type initialValue, ... )
    Builds the matrix with the given number of rows and columns, and initializes it with the given initialValue. An error ID different from 0 is returned if:
    - The value for the number of rows or columns is not positive
    - The values for the number of rows and columns are different if the matrix is symmetrical
    - A memory allocation error occurs

~IDMMatrix
    The destructor.

IDMMatrix( const IDMMatrix<Type> &matrix )
    The copy constructor.

IDMMatrix<Type>& operator= ( const IDMMatrix<Type> &matrix )
    The assignment operator.


IDMMatrix0

Class IDMMatrix0 specifies a matrix where the indexes start with 0.


Format:
template <class Type>
class IDMMatrix0 : public IDMBaseMatrix<Type> {
public:
    IDMMatrix0();
    IDMMatrix0( IDMINTEGER &errorId,
                IDMINTEGER nbRows, IDMINTEGER nbColumns,
                Type initialValue,
                IDMBOOLEAN isSymmetrical=IDM_FALSE);

    IDMMatrix0( const IDMMatrix0<Type> &matrix );
    IDMMatrix0<Type>& operator= ( const IDMMatrix0<Type> &matrix );

    ~IDMMatrix0();
};

Member functions:

IDMMatrix0()
    The default constructor.

IDMMatrix0( IDMINTEGER &errorId, IDMINTEGER nbRows, IDMINTEGER nbColumns, Type initialValue, ... )
    Builds the matrix with the given number of rows and columns, and initializes it with the given initialValue. An error ID different from 0 is returned if:
    - The value for the number of rows or columns is not positive
    - The values for the number of rows and columns are different if the matrix is symmetrical
    - A memory allocation error occurs

~IDMMatrix0
    The destructor.

IDMMatrix0( const IDMMatrix0<Type> &matrix )
    The copy constructor.

IDMMatrix0<Type>& operator= ( const IDMMatrix0<Type> &matrix )
    The assignment operator.
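The constructor checks listed for IDMMatrix and IDMMatrix0, and the difference in their first index, can be illustrated with two small helper functions. Everything in this sketch is hypothetical: the error ID values are invented for illustration (this section only states that they differ from 0), the memory allocation case is not modeled, and the row-major offset is an assumption about storage.

```cpp
// Hypothetical error IDs; the manual only states that errors are nonzero.
enum MatrixArgError {
    MATRIX_ARGS_OK = 0,
    MATRIX_BAD_SIZE = 1,    // number of rows or columns is not positive
    MATRIX_NOT_SQUARE = 2   // symmetrical matrix with rows != columns
};

// Checks the two argument conditions described for the constructors.
MatrixArgError checkMatrixArgs(int nbRows, int nbColumns, bool isSymmetrical) {
    if (nbRows <= 0 || nbColumns <= 0) return MATRIX_BAD_SIZE;
    if (isSymmetrical && nbRows != nbColumns) return MATRIX_NOT_SQUARE;
    return MATRIX_ARGS_OK;
}

// Maps (row, column) to a row-major offset. firstIndex is 1 for an
// IDMMatrix-style matrix and 0 for an IDMMatrix0-style matrix.
// Returns -1 if the coordinates are out of range.
int flatOffset(int row, int column, int nbRows, int nbColumns, int firstIndex) {
    const int r = row - firstIndex;
    const int c = column - firstIndex;
    if (r < 0 || r >= nbRows || c < 0 || c >= nbColumns) return -1;
    return r * nbColumns + c;
}
```

The same cell is addressed as (2, 3) with a first index of 1 and as (1, 2) with a first index of 0.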

Data table

The IDMDataTable objects describe the data tables containing the data to be mined as well as auxiliary data defining a name mapping, a taxonomy relation, a value mapping, or a discretization function.

The data table consists of a static part (represented by the class IDMStaticTable) that can reside in flat files (subclass IDMFlatFileTable), in a database (IDMDB2Table), in a pipe (IDMPipeTable), or can consist of a matrix (IDMMatrixTable), and a dynamic part consisting of a set of computed fields whose values are computed simultaneously.
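The split into a static part and a dynamic part of computed fields can be pictured with a short sketch. It is not based on IDMComputedField: Row, ComputedFieldSketch, and withComputedFields are hypothetical names, values are simplified to double, and reading "computed simultaneously" as "every computed field sees only the static values" is an assumption.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// One record of the static part: field name mapped to value (simplified).
using Row = std::map<std::string, double>;

// Hypothetical stand-in for a computed field: a name plus a rule that
// derives its value from the static fields of the same record.
struct ComputedFieldSketch {
    std::string name;
    std::function<double(const Row&)> compute;
};

// Returns the static record extended by all computed fields. Every rule
// reads only the static values, so the order of the rules does not matter.
Row withComputedFields(const Row& staticRow,
                       const std::vector<ComputedFieldSketch>& fields) {
    Row result = staticRow;
    for (const auto& field : fields) {
        result[field.name] = field.compute(staticRow);
    }
    return result;
}
```

A record with price 4 and qty 3 plus a computed field total, defined as price times qty, yields a record with three fields in which total is 12.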


IDMDataTable

The class IDMDataTable describes a data table containing data accessed by this application.

Header file: idmcdatb.hpp

Format:
typedef enum {
    IDM_FILE_TABLE_TYPE,
    IDM_PIPE_TABLE_TYPE,
    IDM_DB2_TABLE_TYPE,
    IDM_MATRIX_TABLE_TYPE
} IDM_TableType;

class IDMDataTable : public IDMBase {
    IDMStaticTable* pivStaticTable;
    ISequence<IDMComputedField*> ivComputedFields;

public:
    IDMDataTable();
    ~IDMDataTable();
    IDMDataTable(IDMRETURN& rc,
                 IDMStaticTable* pStaticTable,
                 ISequence<IDMComputedField*>& compFields);
    IDMDataTable( const IDMDataTable &dataTable );
    IDMDataTable& operator= (const IDMDataTable & );

    IDMRETURN update(IDMStaticTable* pStaticTable);
    IDMRETURN update(IDMStaticTable* pStaticTable,
                     ISequence<IDMComputedField*>& compFields);
    IDMRETURN get(IDMStaticTable *&pStaticTable,
                  ISequence<IDMComputedField*>& compFields);

    IDMRETURN addComputedField( IDMComputedField* pCompField);
    IDMRETURN removeComputedField( IString fieldName);

    IDMRETURN getFieldNames(ISortedSet<IString>& fieldNames);
    const IDMDataField* getField(IString fieldName) const;
    IDMBOOLEAN checkDataTable();
    IDMRETURN getTableType( IDM_TableType &type );
    virtual const IDMException* getStaticException() const;
};

Data members:

pivStaticTable
    A pointer to the static table object representing the static part of the data table.

ivComputedFields
    The sequence of computed fields representing the dynamic part of the table.

Member functions:

IDMDataTable()
    The default constructor. It constructs a data table object and sets the pointer to the static table to NULL. This object is invalid for the mining process. A data table is valid if it contains a pointer to a valid static table and either no computed fields or a sequence of pointers to valid computed field objects. Use the update method to turn the object into a valid data table.

IDMDataTable(IDMRETURN&, IDMStaticTable*, ISequence<IDMComputedField*>&)
    The constructor initializes the member variables with a pointer to a static table and a sequence of computed fields.

IDMDataTable(const IDMDataTable &dataTable)
    The copy constructor.

IDMDataTable& operator= (const IDMDataTable &)
    The assignment operator.

~IDMDataTable()
    The destructor.

update
    Updates the member variables with new values, either for the static table alone or for the static table and the computed fields.

get
    Retrieves the data members of the data table object.

addComputedField
    Adds a new computed field at the end of the sequence of computed fields.

removeComputedField
    Removes the computed field with the given name from the sequence of computed fields.

getFieldNames
    Retrieves the set of the names of the fields (static as well as computed ones) of a data table.

getField
    Returns the field with the specified name. A NULL pointer is returned if such a field does not exist.

checkDataTable
    Returns IDM_TRUE if the data table object is valid. Otherwise, the method returns IDM_FALSE. A data table is valid if it contains a pointer to a valid static table and either no computed fields or a sequence of pointers to valid computed field objects. A data table that is constructed by the default constructor is invalid, because the pointer to the static table is set to NULL. It becomes valid if the object is updated using the update method.

getTableType
    Retrieves the type of the static table.
    - Type is set to IDM_FILE_TABLE_TYPE if the static table is an instance of the class IDMFlatFileTable.
    - Type is set to IDM_PIPE_TABLE_TYPE if the static table is an instance of the class IDMPipeTable.
    - Type is set to IDM_DB2_TABLE_TYPE if the static table is an instance of the class IDMDB2Table.
    - Type is set to IDM_MATRIX_TABLE_TYPE if the static table is an instance of the class IDMMatrixTable.

getStaticException
    Returns a pointer to the last exception object.
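The relationship between a data table, its static table, and the reported table type can be sketched with a small class hierarchy. The names below are hypothetical stand-ins, and using a virtual function to report the type is an assumption; this section does not say how the real getTableType determines the subclass.

```cpp
// Hypothetical counterparts of the IDM_TableType values.
enum TableTypeSketch { FILE_TABLE, PIPE_TABLE, DB2_TABLE, MATRIX_TABLE };

// Hypothetical stand-in for the static table base class.
struct StaticTableSketch {
    virtual ~StaticTableSketch() {}
    virtual TableTypeSketch tableType() const = 0;
};

struct FlatFileSketch : StaticTableSketch {
    TableTypeSketch tableType() const override { return FILE_TABLE; }
};

struct DB2Sketch : StaticTableSketch {
    TableTypeSketch tableType() const override { return DB2_TABLE; }
};

// Hypothetical stand-in for the data table: it only holds a pointer to
// its static part, which is NULL after default construction.
struct DataTableSketch {
    const StaticTableSketch* staticTable = nullptr;

    // A default-constructed table has no static part and is invalid.
    bool isValid() const { return staticTable != nullptr; }

    // Fills `type` and returns true if a static table is present.
    bool getTableType(TableTypeSketch& type) const {
        if (staticTable == nullptr) return false;
        type = staticTable->tableType();
        return true;
    }
};
```

In this sketch, assigning a static table plays the role of the update method: the table becomes valid and reports the type of its static part.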


IDMStaticTable

IDMStaticTable is an abstract base class for data tables containing a static part. It has the following subclasses: IDMFlatFileTable, IDMPipeTable, IDMDB2Table, and IDMMatrixTable.

Header file: idmcfptb.hpp

Format:
typedef enum {
    IDM_FILE_TABLE_TYPE,
    IDM_PIPE_TABLE_TYPE,
    IDM_DB2_TABLE_TYPE,
    IDM_MATRIX_TABLE_TYPE
} IDM_TableType;

class IDMStaticTable : public IDMBase {
public:
    virtual ~IDMStaticTable();
    virtual IDMRETURN getFieldNames(ISortedSet<IString>& fieldNames) =0;
    virtual const IDMDataField* getField(IString fieldName) const =0;
    IDM_TableType getTableType();
    virtual const IDMException* getStaticException() const;
};

Member functions:

getFieldNames
    Retrieves the set of field names of a static table. Because this is a pure virtual function, it is implemented by the subclasses.

getField
    Retrieves the field with the given name. A NULL pointer is returned if such a field does not exist. Because this is a pure virtual function, it is implemented by the subclasses.

getTableType
    Retrieves the type of the static table:
    - IDM_FILE_TABLE_TYPE is returned if the static table is an instance of class IDMFlatFileTable.
    - IDM_PIPE_TABLE_TYPE is returned if the static table is an instance of class IDMPipeTable.
    - IDM_DB2_TABLE_TYPE is returned if the static table is an instance of class IDMDB2Table.
    - IDM_MATRIX_TABLE_TYPE is returned if the static table is an instance of class IDMMatrixTable.


Figure 5. IDMStaticTable class structure


IDMFlatFileTable

The class IDMFlatFileTable describes tables that reside in flat files. Data in flat files must be in fixed record format, that is, the record length is fixed and the data fields are at fixed positions in the records.
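Fixed record format means that a record can be cut out of the data by its length alone, and a field out of a record by its offset and length. The following sketch illustrates this with standard C++ only; FieldLayout, splitRecords, and fieldOf are hypothetical names and are unrelated to IDMFlatFileField.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one field: where it starts and how long it is.
struct FieldLayout {
    std::string name;
    std::size_t offset;  // zero-based position inside the record
    std::size_t length;  // number of bytes
};

// Cuts the data into records of recordLength bytes each (the newline
// character counts toward the length). An incomplete tail is dropped.
std::vector<std::string> splitRecords(const std::string& data,
                                      std::size_t recordLength) {
    std::vector<std::string> records;
    if (recordLength == 0) return records;
    for (std::size_t pos = 0; pos + recordLength <= data.size();
         pos += recordLength) {
        records.push_back(data.substr(pos, recordLength));
    }
    return records;
}

// Extracts one field from a record; returns an empty string if the
// field starts beyond the end of the record.
std::string fieldOf(const std::string& record, const FieldLayout& field) {
    if (field.offset >= record.size()) return std::string();
    return record.substr(field.offset, field.length);
}
```

With 11-byte records consisting of a 4-byte ID, a 6-byte name, and a newline character, the name of any record is always found at offset 4.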

Header file: idmcfptb.hpp

Format:
class IDMFlatFileTable : public IDMStaticTable {
    IDMINTEGER ivRecordLength;
    ISequence<IString> ivFileNames;
    IKeySortedSet<IDMFlatFileField*, IString> ivDataFields;

public:
    IDMFlatFileTable();
    ~IDMFlatFileTable();

    IDMFlatFileTable(IDMRETURN &rc,
                     ISequence<IString>& fileNames,
                     IDMINTEGER recordLength,
                     IKeySortedSet<IDMFlatFileField*, IString> &dataFields);

    IDMFlatFileTable(IDMRETURN &rc,
                     ISequence<IString>& fileNames);

    IDMFlatFileTable( const IDMFlatFileTable &fileTable );
    IDMFlatFileTable& operator= (const IDMFlatFileTable &fileTable );

    IDMRETURN get(ISequence<IString>& fileNames,
                  IDMINTEGER& recordLength,
                  IKeySortedSet<IDMFlatFileField*, IString>& dataFields);

    IDMRETURN update(ISequence<IString>& fileNames,
                     IDMINTEGER recordLength,
                     IKeySortedSet<IDMFlatFileField*, IString>& dataFields);

    virtual IDMRETURN getFieldNames(ISortedSet<IString> &fieldNames );
    virtual const IDMDataField* getField(IString fieldName) const;

    static IDMRETURN listDir( const ISequence<IString>& dirPath,
                              ISortedSet<IString> &dirs,
                              ISortedSet<IString> &files );

    static IDMRETURN listDir( const IString dirPath,
                              ISortedSet<IString> &dirs,
                              ISortedSet<IString> &files );

    static IDMRETURN getFileInfo( IDMINTEGER &recordLength,
                                  IString &fileSample,
                                  const IString fileName,
                                  IDMINTEGER maxLookAhead=
                                      IDM_LOOKAHEAD_DEFAULT,
                                  IDMINTEGER nbSampleLines=
                                      IDM_NB_SAMPLE_LINES_DEFAULT );
};

Data members:

ivRecordLength
    The length of each record in the file table. The record length is counted in bytes, including newline characters.

ivFileNames
    The names of the files this file table consists of.

ivDataFields
    The list of flat file data fields of this file table.


Member functions:

IDMFlatFileTable()
    The default constructor.

IDMFlatFileTable(IDMRETURN&, ISequence<IString>&, IDMINTEGER, IKeySortedSet<IDMFlatFileField*,IString>&)
    Constructs an IDMFlatFileTable object and initializes the member variables with the given input arguments. If 0 is specified as the record length, the record length is computed if possible.

IDMFlatFileTable(IDMRETURN&, ISequence<IString>&)
    Constructs an IDMFlatFileTable object with a list of file names and initializes the set of data fields with an empty set. This constructor should be used for creating output tables. Therefore, it returns IDM_WARNING if one of the files already exists.

~IDMFlatFileTable()
    The destructor.

IDMFlatFileTable( const IDMFlatFileTable &fileTable )
    The copy constructor.

IDMFlatFileTable& operator= (const IDMFlatFileTable &fileTable )
    The assignment operator.

get
    Retrieves the values of the member variables.

update
    Updates the member variables with the given input arguments.

getField
    Retrieves the field with the given name. A NULL pointer is returned if such a field does not exist.

getFieldNames
    Retrieves the set of field names of a flat file table.

listDir(const ISequence<IString>& dirPath, ... )
    Computes the files and (sub)directories of a directory. Each sequence element of dirPath contains one part of the whole directory path. For example, the path /u/home/user has the sequence elements u, home, and user.

listDir(const IString dirPath, ... )
    Computes the files and (sub)directories of a directory. dirPath specifies the whole path of the directory.

getFileInfo
    Computes the record length and a sample of the specified file. maxLookAhead specifies the number of bytes of the file to be examined for determining the record length. The default is 4096. If the record length can be determined, the first nbSampleLines records are returned as the file sample; otherwise, the first maxLookAhead characters are returned. The default for nbSampleLines is 100.

    The returned record length can be used as an input parameter for the constructor. In the returned file sample, the individual records are delimited by the newline character. A record in the file sample (newline character included) does not necessarily have the same length as the returned record length; the two can differ by one byte because the Intelligent Miner server platforms handle newline characters differently.
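How a record length might be derived from the first bytes of a file can be sketched as follows. This is not the algorithm used by getFileInfo, which this section does not describe. FileInfoSketch and inspectContent are hypothetical names, the file content is passed in as a string, and the rule "all complete lines in the examined prefix have the same length" is an assumption. Platform-specific newline handling is ignored.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical result: recordLength is 0 if no fixed length was found.
struct FileInfoSketch {
    std::size_t recordLength;
    std::string sample;
};

// Examines at most maxLookAhead bytes. If every complete line in that
// prefix has the same length (newline included), that length is reported
// together with the first nbSampleLines records. Otherwise the examined
// prefix itself is returned as the sample.
FileInfoSketch inspectContent(const std::string& content,
                              std::size_t maxLookAhead = 4096,
                              std::size_t nbSampleLines = 100) {
    const std::string head = content.substr(0, maxLookAhead);
    std::size_t recordLength = 0;
    std::size_t lines = 0;
    std::size_t start = 0;
    for (std::size_t nl = head.find('\n'); nl != std::string::npos;
         nl = head.find('\n', start)) {
        const std::size_t len = nl - start + 1;  // length including newline
        if (lines == 0) {
            recordLength = len;
        } else if (len != recordLength) {
            return {0, head};  // lines differ: no fixed record length
        }
        ++lines;
        start = nl + 1;
    }
    if (lines == 0) return {0, head};  // no complete line in the prefix
    const std::size_t wanted = std::min(lines, nbSampleLines) * recordLength;
    return {recordLength, head.substr(0, wanted)};
}
```

The defaults of 4096 and 100 mirror the defaults stated for maxLookAhead and nbSampleLines.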


IDMPipeTable

The class IDMPipeTable describes tables whose records are computed by a pipe. Pipe tables are supported under AIX only. They must be in fixed record format, that is, the record length is fixed and the data fields are at fixed positions in the records.

Format:
class IDMPipeTable : public IDMStaticTable {
    IDMINTEGER ivRecordLength;
    IString ivCommand;
    IKeySortedSet<IDMFlatFileField*, IString> ivDataFields;

public:
    IDMPipeTable();
    ~IDMPipeTable();

    IDMPipeTable(IDMRETURN &rc,
                 IString command,
                 IDMINTEGER recordLength,
                 IKeySortedSet<IDMFlatFileField*,IString> &dataFields );

    IDMPipeTable( const IDMPipeTable &pipeTable );
    IDMPipeTable& operator= (const IDMPipeTable & );

    IDMRETURN get(IString &command,
                  IDMINTEGER& recordLength,
                  IKeySortedSet<IDMFlatFileField*,IString>& dataFields);

    IDMRETURN update(IString command,
                     IDMINTEGER recordLength,
                     IKeySortedSet<IDMFlatFileField*,IString>& dataFields);

    virtual IDMRETURN getFieldNames(ISortedSet<IString>&);
    virtual const IDMDataField* getField(IString fieldName) const;
};

Data members:

ivRecordLength
    The record length of the pipe table, including any characters indicating a new line.

ivCommand
    The command to be executed.

ivDataFields
    The list of flat file data fields of this pipe table.

Member functions:

IDMPipeTable()
    The default constructor.

IDMPipeTable(IDMRETURN&, IString, IDMINTEGER, IKeySortedSet<IDMFlatFileField*,IString>&)
    Constructs an IDMPipeTable object and initializes the member variables with the given input arguments.

~IDMPipeTable()
    The destructor.

IDMPipeTable( const IDMPipeTable &pipeTable )
    The copy constructor.

IDMPipeTable& operator= (const IDMPipeTable & )
    The assignment operator.

getField
    Retrieves the field with the given name. A NULL pointer is returned if such a field does not exist.

IDMDB2Table

The class IDMDB2Table describes a DB2 table. When an application runs as an Intelligent Miner client connected to an Intelligent Miner server, the environment determines which static methods of this class can be used. One set of static methods does not require DB2 Client Application Enabler (CAE) V2.1 or higher on the Intelligent Miner client side. These methods access the DB2 catalog tables on the Intelligent Miner server side through the Intelligent Miner client/server component.

Another set of static methods either uses CAE on the Intelligent Miner client side to access the DB2 catalog tables, or accesses the DB2 catalog tables on the Intelligent Miner server side through the Intelligent Miner client/server component, depending on the environment variable IDM_CLI_USED. If IDM_CLI_USED is set, the methods require CAE on the Intelligent Miner client side. If this variable is not set, the methods do not need DB2 on the Intelligent Miner client side. Using this set of methods, you can switch between the two ways of accessing DB2.
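The switch controlled by IDM_CLI_USED can be sketched as a simple environment check. The enum and function names below are hypothetical, and treating any value of the variable, including an empty one, as "set" is an assumption; only the variable name IDM_CLI_USED comes from this section.

```cpp
#include <cstdlib>

// Hypothetical names for the two access paths described above.
enum CatalogAccess {
    VIA_CLIENT_CAE,   // DB2 CAE on the Intelligent Miner client side
    VIA_MINER_SERVER  // Intelligent Miner client/server component
};

// Decides from the raw environment value. Assumption: the variable counts
// as set whenever it exists, regardless of its value.
CatalogAccess accessFor(const char* idmCliUsed) {
    return idmCliUsed != nullptr ? VIA_CLIENT_CAE : VIA_MINER_SERVER;
}

// Reads IDM_CLI_USED from the process environment.
CatalogAccess chooseCatalogAccess() {
    return accessFor(std::getenv("IDM_CLI_USED"));
}
```

Separating accessFor from the environment lookup keeps the decision easy to check without changing the environment of the running process.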

Header file: idmcdb2.hpp

Format:
typedef struct IDMDB2TableInfo {
    IString name;
    IString schema;
};

typedef struct IDMDBHandles {
    IString dbname;
    int henv;
    int hdbc;
};

class IDMDB2Table : public IDMStaticTable {
    IString ivDBServerName;
    IString ivSchema;
    IString ivTableName;
    IKeySortedSet<IDMDataField*, IString> ivDataFields;
    IString ivTablespace;
    IString ivDatabase;
    IDMBOOLEAN ivAppendOutputFlag;

public:
    IDMDB2Table();
    IDMDB2Table( IDMRETURN &rc, IString DBServerName,
                 IString schema, IString tableName,
                 IString uid="", IString password="" );

    IDMDB2Table( IDMRETURN &rc, IString DBServerName,
                 IString schema,
                 IString tableName,
                 IKeySortedSet<IDMDataField*,IString> &DBColumns );

    IDMDB2Table( const IDMDB2Table &db2Table );
    IDMDB2Table& operator= (const IDMDB2Table &db2Table );

    ~IDMDB2Table();

    IDMRETURN get( IString &DBServerName,
                   IString &schema,
                   IString &tableName,
                   IKeySortedSet<IDMDataField*,IString> &dbColumns);

    IDMRETURN getFieldNames(ISortedSet<IString> &fieldNames);
    virtual const IDMDataField* getField(IString fieldName) const;
    IDMRETURN refresh( SQLHENV henv, SQLHDBC hdbc );
    IDMRETURN refresh( IString uid="", IString password="" );
    IDMRETURN isPartitionedDB2Table( IDMBOOLEAN &isPT,
                                     IDMINTEGER &nbPartitions );
    IDMRETURN setTablespace( const IString tablespace );
    IDMRETURN getTablespace( IString &tablespace );

    IDMRETURN setMVSDatabase( const IString database );
    IDMRETURN getMVSDatabase( IString &database );

    IDMRETURN supportsTablespaces( IDMBOOLEAN &supports );
    static IDMRETURN supportsTablespaces( IString databaseServer,
                                          IDMBOOLEAN &supports );

    IDMRETURN supportsMVSDatabase( IDMBOOLEAN &supports );
    static IDMRETURN supportsMVSDatabase( IString databaseServer,
                                          IDMBOOLEAN &supports );

    void setAppendOutputFlag( IDMBOOLEAN append);
    IDMBOOLEAN& getAppendOutputFlag();

    // Using the following static methods you can switch between DB2
    // access on the Intelligent Miner client side using DB2 CAE
    // or DB2 access on the Intelligent Miner server side
    // via the Intelligent Miner client/server component.
    // The switch is done via the environment variable IDM_CLI_USED.

    static IDMRETURN initializeDB2( SQLHENV *henv, SQLHDBC *hdbc );
    static IDMRETURN terminateDB2( SQLHENV henv, SQLHDBC hdbc );
    static IDMRETURN connectDB2DatabaseServer( SQLHENV henv,
                                               SQLHDBC hdbc,
                                               IString databaseServer,
                                               IString uid="",
                                               IString password="" );

    static IDMRETURN disconnectDB2DatabaseServer( SQLHDBC hdbc );
    static IDMRETURN getDB2DatabaseServers( SQLHENV henv,
                                            ISortedSet< IString > &databaseServers );
    static IDMRETURN getDB2Tables( SQLHENV henv, SQLHDBC hdbc,
                                   ISequence< IString > &schemas,
                                   ISequence< IString > &tables );

    static IDMRETURN getDB2Tables( SQLHENV henv, SQLHDBC hdbc,
                                   ISequence< IDMDB2TableInfo > &aTableList );

    static IDMRETURN getDB2Tables( SQLHENV henv, SQLHDBC hdbc,
                                   IString schema,
                                   ISequence< IString > &tables );

    static IDMRETURN getDB2Schemas( SQLHENV henv, SQLHDBC hdbc,
                                    ISequence< IString > &schemas );

    static IDMRETURN getDB2Columns( SQLHENV henv, SQLHDBC hdbc,
                                    IString schema,
                                    IString table,
                                    ISortedSet< IString > &columns );

    static IDMRETURN getDB2Columns( SQLHENV henv, SQLHDBC hdbc,
                                    IString schema,
                                    IString table,
                                    IKeySortedSet< IDMDataField*,
                                                   IString > &columns );

    static IDMRETURN queryDatabaseServer( SQLHENV henv,
                                          SQLHDBC hdbc,
                                          IString DBServerName,
                                          IDMBOOLEAN &uid );

    static IDMRETURN queryDatabaseServer( IString DBServerName,
                                          IDMBOOLEAN &uid );

    // Using the following static methods DB2 access is only possible via
    // the Intelligent Miner client/server component. No DB2 CAE V2
    // is needed on the Intelligent Miner client side.

    IDMRETURN refreshColumns(IString uid="", IString passwd="");
    IDMRETURN refreshColumns(IDMDBHandles *pHandles);

    static IDMRETURN getDB2DatabaseServers( ISortedSet<IString> &databases );

    static IDMRETURN connectDB2DatabaseServer( IString databaseServer,
                                               IDMDBHandles *&pHandles,
                                               IString uid="",
                                               IString passwd="");

    static IDMRETURN disconnectDB2DatabaseServer( IDMDBHandles *pHandles );

    static IDMRETURN getDB2Schemas( IDMDBHandles *pHandles,
                                    ISequence< IString > &schemas);

    static IDMRETURN getDB2Tables( IDMDBHandles *pHandles,
                                   ISequence< IString > &schemas,
                                   ISequence< IString > &tables );

    static IDMRETURN getDB2Tables( IDMDBHandles *pHandles,
                                   ISequence< IDMDB2TableInfo > &aTableList);

    static IDMRETURN getDB2Tables( IDMDBHandles *pHandles,
                                   IString schema,
                                   ISequence< IString > &tables );

    static IDMRETURN getDB2Columns( IDMDBHandles *pHandles,
                                    IString schema,
                                    IString table,
                                    ISortedSet< IString > &columns );

    static IDMRETURN getDB2Columns( IDMDBHandles *pHandles,
                                    IString schema,
                                    IString table,
                                    IKeySortedSet< IDMDataField*,
                                                   IString > &columns );
};

Data members:

ivDBServerName
    The name of the database server.

ivSchema
    The schema of the table.

ivTableName
    The name of the database table.

ivDataFields
    The set of data fields whose names are the names of the columns of the DB2 table.

ivTablespace
    The name of the tablespace to be used to create the output table. See Using the Intelligent Miner for Data for information on how to use the tablespace attribute.

ivDatabase
    The name of the MVS database to be used to create the output table. The database is only used if the platform is IDM_MVS or IDM_MVS_PE. If this value is not set for platform IDM_MVS or IDM_MVS_PE, the table is created in the default database.

ivAppendOutputFlag
    If set to IDM_TRUE, the output of a mining run is appended to the table if it exists. If the table does not exist, it is created.

Member functions:

IDMDB2TableThe default constructor.

IDMDB2Table(IDMRETURN &rc, IString DBServerName, IString schema,IString tableName, IString uid=″″, IString password=″″ )

Constructs an IDMDB2Table object with a given database server name,schema name, and table name. The set of data fields is determined in theconstructor, which means that a database connect happens in theconstructor. The user ID and password are optional. They must bespecified if the owner of the process running the constructor is not allowedto connect to the database. If the user ID and the password are specified,they must be the same as in the cvDB2UserId and cvDB2Password datamembers in class IDMBase or as specified in data membercvDB2ServerUser in class IDMBase for the appropriate database server. Anerror occurred during construction if the return code is not equal toIDM_SUCCESS.

IDMDB2Table( IDMRETURN &rc, IString DBServerName, IString schema,IString tableName, IKeySortedSet<IDMDataField*,IString> DBColumns );

Constructs an IDMDB2Table object with a given database server name,schema name, table name, and table columns. The set of columns can beempty. No database connect happens. An error occurred duringconstruction if rc is not equal to IDM_SUCCESS.

IDMDB2Table( const IDMDB2Table &db2Table )The copy constructor.

IDMDB2Table& operator= (const IDMDB2Table &db2Table )The assignment operator.

∼IDMDB2TableThe destructor.

get Retrieves the database server name, the table name, the schema name, andthe column names of the database table.

getFieldNamesRetrieves the column names.

getFieldRetrieves the field with the specified name. A NULL pointer is returned ifsuch a field does not exist.

refresh( SQLHENV henv, SQLHDBC hdbc );Refreshes the DB2 object, this means that the set of column names areupdated.

refresh( IString uid=″″, IString password=″″ )Refreshes the DB2 object, this means that the set of column names areupdated. This method connects to the database first. The user ID andpassword are optional. If the user ID and the password are specified, theyare used for the connection. If they are not specified and there is an entry


in the IDMBase::cvDB2ServerUser key set for the database server, the userID and the password of this entry are used to connect. If there is no entry,IDMBase::cvDB2UserID and IDMBase::cvDB2Password are used to connect.

isPartitionedDB2TableIf the table is partitioned, IDM_TRUE is returned in isPT and the numberof partitions is returned in nbPartitions. Otherwise, isPT is set toIDM_FALSE.

setTablespaceSets the tablespace where the output table should be created in. Thetablespace name is ignored if platform is IDM_OS400 or if platform isIDM_AIX or IDM_AIX_PE and the database server belongs to a paralleldatabase instance.

getTablespaceRetrieves the tablespace. An empty string is returned if the tablespace wasnot set.

setMVSDatabaseSets the database where the output table should be created in. Thedatabase name is ignored if the platform is not IDM_MVS orIDM_MVS_PE.

getMVSDatabaseRetrieves the database. An empty string is returned if the database was notset.

supportsTablespacesIf the database management system the database server belongs tosupports tablespaces, supports is set to IDM_TRUE.

supportsMVSDatabaseIf the database management system the database server belongs to isDB2/MVS, supports is set to IDM_TRUE.

setAppendOutputFlagSets ivAppendOutputFlag to IDM_TRUE or IDM_FALSE.

getAppendOutputFlagReturns the value of ivAppendOutputFlag.

initializeDB2Initializes the DB2 environment. This method must be called before anyother method is called that deals with DB2 (for example, terminateDB2,connectDB2DatabaseServer, disconnectDB2DatabaseServer,getDB2Databases, getDB2Tables, getDB2Columns). InitializeDB2 returns anenvironment handle henv and a connection hdbc. These handles areneeded as input arguments in subsequent DB2 method calls.

terminateDB2Invalidates and frees the environment handle and connection handle. Thismethod must be called to terminate dealing with DB2.

connectDB2DatabaseServer
    Establishes a connection to the database server with the given name. The user ID and password are optional. If the user ID and the password are specified, they are used for the connection. If they are not specified and there is an entry in the IDMBase::cvDB2ServerUser key set for the database server, the user ID and the password of this entry are used to connect. If there is no entry, IDMBase::cvDB2UserID and IDMBase::cvDB2Password are used to connect.

disconnectDB2DatabaseServer
    Closes the connection associated with the database connection handle hdbc.

getDB2DatabaseServers
    Returns a sorted set of database servers available in the environment. A database server must be cataloged to be available.

getDB2Tables( ...., ISequence< IString > &schemas, ISequence< IString > &tables )
    Returns a sequence of the data table names available in the connected database server and a sequence of the corresponding schema names. The two sequences are parallel: the first elements of each sequence combine to <schema name>.<table name>, as do the second elements of each sequence, and so on.

getDB2Tables( ...., ISequence< IDMDB2TableInfo > &aTableList )
    Returns a sequence of IDMDB2TableInfo structures. Each structure contains a DB2 table name and its schema name.

getDB2Tables( ...., IString schema, ISequence< IString > &tables )
    For a given schema, returns all DB2 table names in a sequence.

getDB2Schemas
    Returns all schema names in a sequence.

getDB2Columns
    Returns a sorted set of columns available in the given data table, or an IKeySortedSet collection of pointers to data field objects where each data field object represents one column in the data table.

queryDatabaseServer( SQLHENV henv, SQLHDBC hdbc, IString DBName, IDMBOOLEAN &uid )
    Checks whether the database server with the given name needs a user ID and a password to connect to it. If uid is IDM_FALSE, the database needs no user ID and password.

queryDatabaseServer( IString DBName, IDMBOOLEAN &uid )
    Checks whether the database server with the given name needs a user ID and a password to access it. In this method, DB2 is initialized first. If uid is IDM_FALSE, the database needs no user ID and password.

refreshColumns(IString uid="", IString passwd="")
    Refreshes the set of columns of the IDMDB2Table object. This method connects to the database first. The user ID and password are optional. If the user ID and the password are specified, they are used for the connection. If they are not specified and there is an entry in the IDMBase::cvDB2ServerUser key set for the database server, the user ID and the password of this entry are used to connect. If there is no entry, IDMBase::cvDB2UserID and IDMBase::cvDB2Password are used to connect.

refreshColumns(IDMDBHandles *pHandles)
    Refreshes the set of columns of the IDMDB2Table object.
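The lookup order for user ID and password that connectDB2DatabaseServer and refreshColumns describe can be pictured with the following sketch. It is an illustration only: the Credentials structure, the resolveCredentials function, and the std::map that stands in for the IDMBase::cvDB2ServerUser key set are assumptions made for this example and are not part of the Intelligent Miner API.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a user ID and password pair.
struct Credentials {
    std::string userId;
    std::string password;
};

// Mirrors the documented precedence:
//   1. the explicitly supplied user ID and password, if any,
//   2. the entry for this server (stand-in for IDMBase::cvDB2ServerUser),
//   3. the global defaults (stand-in for cvDB2UserID and cvDB2Password).
Credentials resolveCredentials(const std::string& server,
                               const Credentials* supplied,  // nullptr: not specified
                               const std::map<std::string, Credentials>& perServer,
                               const Credentials& globalDefault)
{
    assert(!server.empty());
    if (supplied != nullptr) return *supplied;
    std::map<std::string, Credentials>::const_iterator it = perServer.find(server);
    if (it != perServer.end()) return it->second;
    return globalDefault;
}
```

With a per-server entry for a server named SALESDB, a call without supplied credentials yields that entry, while a call for a server without an entry falls back to the global defaults.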

IDMMatrixTable

The class IDMMatrixTable describes matrix tables.

Header file: idmcmtb.hpp

Format:

    class IDMMatrixTable : public IDMStaticTable {

        IDMMatrix<IString> ivMatrix;
        IKeySortedSet<IDMMatrixField*, IString> ivDataFields;

    public:
        IDMMatrixTable();
        ~IDMMatrixTable();

        IDMMatrixTable(IDMRETURN &rc,
                       const IDMMatrix<IString> &matrix,
                       const IKeySortedSet<IDMMatrixField*, IString> &dataFields);

        IDMMatrixTable(IDMRETURN &rc,
                       const IDMMatrix<IString> *pMatrix,
                       const IKeySortedSet<IDMMatrixField*, IString> &dataFields);

        IDMMatrixTable(const IDMMatrixTable &MatrixTable);
        IDMMatrixTable& operator= (const IDMMatrixTable &);
        IDMRETURN get(IDMMatrix<IString> &matrix,
                      IKeySortedSet<IDMMatrixField*, IString>& dataFields);
        IDMRETURN get(IDMMatrix<IString> *&pMatrix,
                      IKeySortedSet<IDMMatrixField*, IString>& dataFields);
        IDMRETURN update(const IDMMatrix<IString> &matrix,
                         const IKeySortedSet<IDMMatrixField*, IString>& dataFields);
        IDMRETURN update(const IDMMatrix<IString> *pMatrix,
                         const IKeySortedSet<IDMMatrixField*, IString>& dataFields);
        virtual IDMRETURN getFieldNames(ISortedSet<IString>&);
        virtual const IDMDataField* getField(IString fieldName) const;

    };

Data members:

ivMatrix
    The matrix itself.

ivDataFields
    The list of data fields for this matrix table.

Member functions:

IDMMatrixTable()
    The default constructor.

IDMMatrixTable(IDMRETURN&, ... )
    Constructs an IDMMatrixTable object and initializes the member variables with the given input arguments.

~IDMMatrixTable()
    The destructor.

IDMMatrixTable( const IDMMatrixTable &matrixTable )
    The copy constructor.

IDMMatrixTable& operator= ( const IDMMatrixTable & )
    The assignment operator.

getField
    Retrieves the field with the given name. A NULL pointer is returned if such a field does not exist.


Data fields

Data fields are described by the class IDMDataField and its subclasses. There are no specific field types for databases. Instances of IDMDataField are sufficient for DB2 tables.

IDMDataField

The class IDMDataField describes the common properties of data fields. A data field has a name, a type, and a cardinality.

The field type can be IDM_NUMERIC (numeric fields), IDM_CONT_NUMERIC (continuous fields), IDM_DISCR_NUMERIC (numeric discrete fields), IDM_CATEGORICAL (nonnumeric discrete fields), or IDM_BINARY (nonnumeric discrete fields with only two values). The field type IDM_NUMERIC lets the system decide whether a numeric field should be treated as discrete or continuous. If the number of distinct values in a field is below a certain threshold (100), the Intelligent Miner treats it as a discrete numeric field. If the number is above the threshold, the Intelligent Miner treats it as a continuous field. Every field type is associated with a field data type. The value of a numeric field (field types IDM_NUMERIC, IDM_CONT_NUMERIC, IDM_DISCR_NUMERIC) is represented by a real value (field data type IDM_REAL_TYPE). The values of categorical and binary fields (field types IDM_CATEGORICAL and IDM_BINARY) are character strings (field data type IDM_STRING_TYPE).
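The threshold rule for IDM_NUMERIC fields can be illustrated with a small sketch. The names resolveNumericType, ResolvedNumericType, and kDistinctValueThreshold are made up for this example, and the handling of a count exactly equal to the threshold is an assumption, because the text only covers the cases below and above it. How the Intelligent Miner actually counts distinct values is not described here.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Local stand-in for the two possible outcomes; this is not the
// IDM_FieldType enumeration itself.
enum ResolvedNumericType { DISCRETE_NUMERIC, CONTINUOUS_NUMERIC };

// Threshold quoted in the text above.
const std::size_t kDistinctValueThreshold = 100;

// Decides how a numeric field would be treated, based on the number of
// distinct values it contains.
ResolvedNumericType resolveNumericType(const std::vector<double>& values)
{
    std::set<double> distinct(values.begin(), values.end());
    // Assumption: a count exactly equal to the threshold counts as continuous.
    return distinct.size() < kDistinctValueThreshold ? DISCRETE_NUMERIC
                                                     : CONTINUOUS_NUMERIC;
}
```

A field holding the days of the week as the numbers 1 to 7 has seven distinct values and would be resolved as discrete; a field with several hundred distinct amounts would be resolved as continuous.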

The cardinality can be IDM_SINGLE_VALUE (single-value field) or IDM_MULTIPLE_VALUE (multi-value field). Furthermore, discrete fields can be associated with a name mapping that maps the field values to meaningful descriptions. Numeric fields can be cyclic (like the days of the week represented by the numbers from 1 to 7). A cycle is defined by its beginning and length. For the statistics of continuous fields, you can define the number of histogram buckets or the bucket width.

Header file: idmcffld.hpp

Figure 6. Data fields (class hierarchy: IDMDataField is the base class of IDMFlatFileField, IDMMatrixField, and IDMComputedField; IDMComputedField is the base class of IDMValueMappingField, IDMDiscretizationField, and IDMFunctionField)

Format:

    typedef enum {IDM_SINGLE_VALUE, IDM_MULTIPLE_VALUE} IDM_Cardinality;

    typedef enum {
        IDM_CATEGORICAL = 0,
        IDM_CONT_NUMERIC,
        IDM_DISCR_NUMERIC,
        IDM_UNDEFINED,
        IDM_NUMERIC,
        IDM_BINARY
    } IDM_FieldType;

    typedef enum {
        IDM_STRING_TYPE = 0,
        IDM_REAL_TYPE = 1,
        IDM_UNDEFINED_TYPE
    } IDM_FieldDataType;

    typedef enum {
        IDM_ABSOLUTE,
        IDM_STAND_DEV,
        IDM_RANGE,
        IDM_NICE_ABSOLUTE,
        IDM_NICE_STAND_DEV,
        IDM_NICE_RANGE
    } IDM_WidthUnit;

    class IDMDataField {

    protected:
        IString ivName;
        IDM_FieldType ivFieldType;
        IDMREAL ivCycleLength;
        IDMREAL ivCycleBegin;
        IDMREAL ivNbBuckets;
        IDMREAL ivBucketWidth;
        IDMNameMapping *pivNameMapping;
        IDM_Cardinality ivCardinality;

        IDM_WidthUnit ivBucketWidthUnit;
        ISequence<IDMREAL> ivBucketsLimits;
        ISequence<IString> ivValueRange;
        IDMINTEGER ivPrecision;
        IDMREAL ivLowestLimit;
        IDMREAL ivHighestLimit;
        IString ivDBDataType;

    public:
        IDMDataField();
        IDMDataField(IDMRETURN &rc, IString& name,
                     IDM_FieldType fieldType);
        IDMDataField(const IDMDataField &dataField);
        IDMDataField& operator= (const IDMDataField &dataField);
        ~IDMDataField();

        IString getName();
        IDMRETURN get(IString &fieldName, IDM_FieldType& fieldType,
                      IDM_Cardinality& card);

        void setCardinality(IDM_Cardinality card);

        IDMRETURN setFieldType(IDM_FieldType fieldType);
        IDM_FieldType getFieldType() const;
        IDM_FieldDataType getDataType() const;

        IString getDB2DataType() const;

        IDMRETURN setCycleVars(IDMREAL cycleLength=0,
                               IDMREAL cycleBegin=1);
        IDMRETURN getCycleVars(IDMREAL &cycleLength,
                               IDMREAL &cycleBegin);

        IDMRETURN setBucketVars(IDMREAL nbBuckets, IDMREAL bucketWidth);
        IDMRETURN getBucketVars(IDMREAL& nbBuckets, IDMREAL& bucketWidth);

        IDMRETURN setNameMapping(IDMNameMapping* pNmp);
        IDMRETURN getNameMapping(IDMNameMapping *&pNmp) const;

        IDMRETURN setBucketWidth(IDMREAL width,
                                 IDM_WidthUnit bucketWidthUnit=IDM_ABSOLUTE);
        IDMRETURN getBucketWidth(IDMREAL &width,
                                 IDM_WidthUnit &bucketWidthUnit);

        IDMRETURN setNbOfBuckets(IDMINTEGER nbBuckets,
                                 IDMREAL lowestLimit=-2,
                                 IDMREAL highestLimit=+2,
                                 IDM_WidthUnit bucketWidthUnit=IDM_NICE_STAND_DEV);
        IDMRETURN getNbOfBuckets(IDMINTEGER &nbBuckets,
                                 IDMREAL &lowestLimit,
                                 IDMREAL &highestLimit,
                                 IDM_WidthUnit &bucketWidthUnit);

        IDMRETURN getBucketLimits(ISequence<IDMREAL> &bucketsLimits,
                                  IDM_WidthUnit &bucketUnit);
        IDMRETURN setBucketLimits(ISequence<IDMREAL> &bucketsLimits,
                                  IDM_WidthUnit bucketUnit=IDM_ABSOLUTE);

        IDMRETURN setPrecision(IDMINTEGER);
        IDMREAL getPrecision();
        IDMRETURN setValueRange(ISequence<IString> &valueRange);
        IDMRETURN getValueRange(ISequence<IString> &valueRange);
        IDMBOOLEAN checkField();
        virtual const IDMException* getStaticException() const;

    };

Data members:

ivName
    The name of the data field.

ivFieldType
    The field type.

ivCycleLength
    Cycle length for numeric cyclic fields. Initialized to 0.

ivCycleBegin
    Cycle begin for numeric cyclic fields. Initialized to 1.

ivNbBuckets
    Number of histogram buckets for continuous (real) fields.

ivBucketWidth
    Width of histogram buckets for continuous (real) fields.

ivCardinality
    The cardinality is either IDM_SINGLE_VALUE or IDM_MULTIPLE_VALUE (currently only single value is supported).

pivNameMapping
    Pointer to a name mapping object describing a mapping from discrete values to descriptions.

ivBucketWidthUnit
    Specifies the unit of the bucket width. If set to IDM_ABSOLUTE, the bucket width is specified by an absolute value. If set to IDM_STAND_DEV, the bucket width is specified according to the standard deviation. If set to IDM_RANGE, the bucket width is specified according to the range between the minimum and maximum value.

ivBucketsLimits
    Specifies the bucket limits.

ivValueRange
    Specifies a value range of the field, for example, red yellow blue for a discrete field.

ivPrecision
    Specifies the precision if the field is a numeric field (IDM_NUMERIC, IDM_DISCR_NUMERIC, or IDM_CONT_NUMERIC).

ivDBDataType
    SQL data type of the field in case the field belongs to a DB2 table.

Member functions:

IDMDataField()
    Default constructor.

IDMDataField(IDMRETURN &rc, IString& name, IDM_FieldType fieldType)
    Constructs a data field object with a given name and field type.

IDMDataField(const IDMDataField &dataField)
    The copy constructor.

IDMDataField& operator= (const IDMDataField &dataField)
    The assignment operator.

~IDMDataField()
    The destructor.

getName
    Retrieves the name of the field.

get
    Retrieves the name, the field type, and the cardinality of the field.

setCardinality
    Used for setting the cardinality of data fields of DB2 tables.

setFieldType
    Updates the field type. This function is useful for DB2 fields to overwrite the field type derived from the type of the column of the database table.

getFieldType
    Retrieves the type of the field.

getDataType
    Retrieves the field data type that is associated with the field type of the field.

getDB2DataType
    Retrieves the SQL data type in case the field belongs to a DB2 table.

setCycleVars
    Sets the values for the cycle length and cycle begin. The field type is not checked because it can be changed by the setFieldType function.

getCycleVars
    Retrieves the values for the cycle length and cycle begin. The field type is not checked because it can be changed by the setFieldType function.

setBucketVars
    Sets the values for the number of buckets and the bucket width. The field type is not checked because it can be changed by the setFieldType function.

getBucketVars
    Retrieves the values for the number of buckets and the bucket width. The field type is not checked because it can be changed by the setFieldType function.


setNameMapping
    Sets the name mapping of this field. The field type is not checked because it can be changed by the setFieldType function.

getNameMapping
    Retrieves the name mapping of this field if one is specified; otherwise, a NULL pointer is returned. The field type is not checked because it can be changed by the setFieldType function.

setBucketWidth
    Sets the bucket width. If IDM_ABSOLUTE is given as bucketWidthUnit, width specifies an absolute value. If bucketWidthUnit is set to IDM_STAND_DEV, the bucket width is width times the standard deviation. If bucketWidthUnit is set to IDM_RANGE, the bucket width is width times the range between the minimum and maximum value.

getBucketWidth
    Retrieves the bucket width and the corresponding bucketWidthUnit.

setNbOfBuckets
    Sets the number of buckets and, if given, the lowest and highest limit and the bucketWidthUnit.

getNbOfBuckets
    Retrieves the number of buckets, the lowest and highest limit, and the bucketWidthUnit.

setBucketLimits
    Sets the bucket limits.

getBucketLimits
    Retrieves the bucket limits.

setPrecision
    Sets the precision.

getPrecision
    Retrieves the precision.

setValueRange
    Sets the value range.

getValueRange
    Retrieves the value range.
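The way setBucketWidth interprets its width argument for the three basic units can be written down as a short sketch. The WidthUnit enumeration and the effectiveBucketWidth function are local stand-ins invented for this example; the IDM_NICE_* units are left out because their behavior is not described in this section.

```cpp
#include <cassert>

// Local copy of the three basic width units (stand-in for IDM_ABSOLUTE,
// IDM_STAND_DEV, and IDM_RANGE).
enum WidthUnit { ABSOLUTE_UNIT, STAND_DEV_UNIT, RANGE_UNIT };

// Computes the bucket width that results from a width argument and a unit,
// given the statistics of the field.
double effectiveBucketWidth(double width, WidthUnit unit,
                            double stdDev, double minValue, double maxValue)
{
    assert(maxValue >= minValue);
    switch (unit) {
        case STAND_DEV_UNIT: return width * stdDev;                 // width times standard deviation
        case RANGE_UNIT:     return width * (maxValue - minValue);  // width times value range
        case ABSOLUTE_UNIT:
        default:             return width;                          // width used as given
    }
}
```

For a field with values between 0 and 100 and a standard deviation of 2, a width of 0.25 in range units gives buckets that are 25 wide, and a width of 0.5 in standard deviation units gives buckets that are 1 wide.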


IDMFlatFileField

A flat file field is a logical field that can consist of several column ranges on a flat file. A column itself is described by its start and end position.

Header file: idmcffld.hpp

Format:

    class IDMFlatFileField : public IDMDataField {

        ISequence<IDMINTEGER> ivBeginEndPositions;

    public:
        IDMFlatFileField();
        IDMFlatFileField(IDMRETURN &rc,
                         IString fieldName,
                         IDM_FieldType fieldType,
                         ISequence<IDMINTEGER>& beginEndPositions,
                         IDM_Cardinality=IDM_SINGLE_VALUE);
        IDMFlatFileField(const IDMFlatFileField &field);
        IDMFlatFileField& operator= (const IDMFlatFileField &field);
        ~IDMFlatFileField();
        IDMRETURN get(IString &fieldName,
                      IDM_FieldType& fieldType,
                      ISequence<IDMINTEGER>& beginEndPositions,
                      IDM_Cardinality& card);
        IDMRETURN update(ISequence<IDMINTEGER>&);
    };

Data members:

ivBeginEndPositions
    If n is the number of physical columns a flat file field consists of, ivBeginEndPositions has 2×n elements. For the i-th column, the element number 2×i-1 indicates its start position and the element number 2×i its end position.

    For example, if the sequence has the elements

        1 5 7 9 11 14

    the first column of the field starts at position 1 and ends at position 5, the second column of the field starts at position 7 and ends at position 9, and the third column of the field starts at position 11 and ends at position 14.
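The pairing of start and end positions in ivBeginEndPositions can be made explicit with the following sketch. It uses std::vector<int> in place of ISequence<IDMINTEGER>, and the columnRanges function is a hypothetical helper written for this example, not an Intelligent Miner function.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Splits a flat sequence (start1, end1, start2, end2, ...) into one
// (start, end) pair per physical column.
std::vector<std::pair<int, int> > columnRanges(const std::vector<int>& beginEnd)
{
    assert(beginEnd.size() % 2 == 0);  // 2 x n elements for n columns
    std::vector<std::pair<int, int> > ranges;
    for (std::size_t i = 0; i + 1 < beginEnd.size(); i += 2) {
        ranges.push_back(std::make_pair(beginEnd[i], beginEnd[i + 1]));
    }
    return ranges;
}
```

Applied to the example sequence 1 5 7 9 11 14, the helper yields the three ranges (1, 5), (7, 9), and (11, 14).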

Member functions:

IDMFlatFileField()
    Default constructor.

IDMFlatFileField(IDMRETURN&, IString, IDM_FieldType, ISequence<IDMINTEGER>&, IDM_Cardinality)
    Constructor initializing the member variables with their arguments.

IDMFlatFileField( const IDMFlatFileField &field )
    The copy constructor.

IDMFlatFileField& operator= (const IDMFlatFileField &field )
    The assignment operator.

~IDMFlatFileField()
    The destructor.

get
    Retrieves the name, the type, the sequence of the begin-and-end positions, and the cardinality of a field.

update
    Updates the begin-and-end positions of a flat-file field. The field name and the cardinality cannot be updated because computed fields might depend on this field as an input parameter.

IDMMatrixField

A matrix field is a logical field that consists of one matrix column.

Header file: idmcmtb.hpp

Format:

    class IDMMatrixField : public IDMDataField {
        friend class IDMMatrixTable;
        IDMINTEGER ivMatrixColumn;

    public:
        IDMMatrixField();
        IDMMatrixField(IDMRETURN &rc,
                       IString fieldName,
                       IDM_FieldType fieldType,
                       IDMINTEGER matrixColumn,
                       IDM_Cardinality=IDM_SINGLE_VALUE);
        IDMMatrixField(const IDMMatrixField &field);
        IDMMatrixField& operator= (const IDMMatrixField &field);
        ~IDMMatrixField();
        IDMRETURN get(IString &fieldName,
                      IDM_FieldType& fieldType,
                      IDMINTEGER &matrixColumn,
                      IDM_Cardinality& card);
        IDMRETURN update(IDMINTEGER matrixColumn);
    };

Member functions:

IDMMatrixField()
    Default constructor.

IDMMatrixField(IDMRETURN&, IString, IDM_FieldType, IDMINTEGER, IDM_Cardinality)
    The constructor initializing the member variables with their arguments.

IDMMatrixField( const IDMMatrixField &field )
    The copy constructor.

IDMMatrixField& operator= (const IDMMatrixField &field )
    The assignment operator.

~IDMMatrixField()
    The destructor.

get
    Retrieves the name, type, column number, and cardinality of a field.

update
    Updates the column number of a matrix field. The field name and cardinality cannot be updated because computed fields can depend on this field as an input parameter.

Computed fields

Computed fields are fields whose values are computed dynamically from the values of other fields in a record. A computed field has field names or constants as input parameters.

IDMComputedField

IDMComputedField is the base class of the computed field classes. It provides static methods for encoding constant values as strings.

Header file: idmccfld.hpp

Format:

    typedef enum {
        IDM_VALUE_MAPPING_FIELD_TYPE,
        IDM_DISCRETIZATION_FIELD_TYPE,
        IDM_FUNCTION_FIELD_TYPE
    } IDM_CompFieldType;

    class IDMComputedField : public IDMDataField {

    public:
        static IString encode(IString);
        static IString encode(IDMREAL);
        static IDMBOOLEAN decode(const IString encodedString,
                                 IString &decodedString,
                                 IDMBOOLEAN &isConstant,
                                 IDM_FieldDataType &fDataType);
        IDM_CompFieldType getCompFieldType();
    };

Static member functions:

encode(IString)
    Encodes a string constant as a string that is distinguishable from a field name.

encode(IDMREAL)
    Encodes a real or integer number as a string.

decode
    Decodes an encoded argument. It returns IDM_TRUE if the encodedString can be decoded. For constants, fDataType is set to the field data type (IDM_REAL_TYPE or IDM_STRING_TYPE) that corresponds to the data type of the constant.

getCompFieldType
    Returns the type of the computed field:
    - IDM_VALUE_MAPPING_FIELD_TYPE is returned if the computed field is an instance of class IDMValueMappingField.
    - IDM_DISCRETIZATION_FIELD_TYPE is returned if the computed field is an instance of class IDMDiscretizationField.
    - IDM_FUNCTION_FIELD_TYPE is returned if the computed field is an instance of class IDMFunctionField.

IDMValueMappingField

A value mapping is defined by an instance of the class IDMValueMapping (see “IDMValueMapping” on page 86). It defines a mapping from values to other values. A default value can be defined that is used if a mapping does not exist for a given value. If no default value is defined, the resulting value is not valid.
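The role of the default value can be shown with a minimal sketch. It models a mapping with a std::map and reports validity in a small structure; MappedValue and applyValueMapping are names invented for this example and do not reflect how IDMValueMapping is implemented.

```cpp
#include <cassert>
#include <map>
#include <string>

// Result of applying a mapping: the mapped value plus a validity flag
// (hypothetical stand-in).
struct MappedValue {
    bool valid;
    std::string value;
};

// Looks up 'input' in 'mapping'. If no entry exists, the default value is
// used when one is defined; otherwise the result is marked as not valid.
MappedValue applyValueMapping(const std::map<std::string, std::string>& mapping,
                              const std::string& input,
                              const std::string* defaultValue)  // nullptr: no default
{
    MappedValue result = { false, std::string() };
    std::map<std::string, std::string>::const_iterator it = mapping.find(input);
    if (it != mapping.end()) {
        result.valid = true;
        result.value = it->second;
    } else if (defaultValue != nullptr) {
        result.valid = true;
        result.value = *defaultValue;
    }
    return result;
}
```

With a mapping from M and F to descriptive strings, an unmapped input such as X yields the default value if one is supplied, and an invalid result if none is.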


Format:

    class IDMValueMappingField : public IDMComputedField {

        IDMValueMapping* pivValueMapping;
        ISequence<IString> ivArgumentFields;
        IString ivDefaultValue;

    public:
        IDMValueMappingField();
        IDMValueMappingField(IDMRETURN &rc,
                             IString name,
                             IDMValueMapping* pValueMapping,
                             ISequence<IString> argumentFields);
        IDMValueMappingField(const IDMValueMappingField &field);
        IDMValueMappingField& operator= (const IDMValueMappingField &field);
        ~IDMValueMappingField();
        IDMRETURN get(IString &name,
                      IDM_FieldType&,
                      IDMValueMapping *&pValueMapping,
                      ISequence<IString> &argumentFields,
                      IDM_Cardinality &card);
        IDMRETURN update(ISequence<IString> argumentFields);
        IDMRETURN setDefaultValue(IString);
        IDMRETURN setDefaultValue(IDMREAL);
        void unsetDefaultValue();
        IDMBOOLEAN getDefaultValue(IString&);
        IDMBOOLEAN getDefaultValue(IDMREAL&);
        IDMRETURN getValueMapping(IDMValueMapping *&pVmp);
        IDMValueMapping* getValueMapping() const;

    };

Data members:

pivValueMapping
    The value mapping object.

ivArgumentFields
    Argument fields are the fields the value mappings are applied to. They should be discrete (numeric or nonnumeric) and of the same data type as the argument fields of the value mapping object.

ivDefaultValue
    The default value as a string.

Member functions:

IDMValueMappingField()
    The default constructor.

IDMValueMappingField(IDMRETURN&, IString, IDMValueMapping*, ISequence<IString>)
    The constructor initializing the member variables with their arguments. The field type (IDM_FieldType) is equal to the field type (IDM_FieldType) of the value field of the value mapping object.

IDMValueMappingField( const IDMValueMappingField &field )
    The copy constructor.

IDMValueMappingField& operator= (const IDMValueMappingField &field )
    The assignment operator.

~IDMValueMappingField()
    The destructor.

update
    Updates the argument fields.

setDefaultValue(IString)
    Sets the string default value for nonnumeric fields (field type = IDM_CATEGORICAL). For continuous fields the string is transformed to a real number; for numeric fields it is transformed to an integer.

setDefaultValue(IDMREAL)
    Sets the default value for numeric fields.

unsetDefaultValue
    Resets the default value; a default value is not defined after this operation.

getDefaultValue(IString&)
    Retrieves the default value as a string. Returns IDM_FALSE if a default value has not been set.

getDefaultValue(IDMREAL&)
    Retrieves the default value as a real number for numeric fields. Returns IDM_FALSE if a default value has not been set.

getValueMapping
    Returns a pointer to the value mapping object.

IDMDiscretizationField

A discretization field has a numeric field as an argument field. Its values are the discretized values of this numeric field. The discretization is defined by an object of the class IDMDiscretization.


Format:

    class IDMDiscretizationField : public IDMComputedField {

        IString ivArgumentField;
        IDMDiscretization *pivDiscretization;

    public:
        IDMDiscretizationField();
        IDMDiscretizationField(IDMRETURN &rc,
                               IString name,
                               IDMDiscretization* pDiscr,
                               IString argumentField);
        IDMDiscretizationField(const IDMDiscretizationField &field);
        IDMDiscretizationField& operator= (const IDMDiscretizationField &field);
        ~IDMDiscretizationField();
        IDMRETURN get(IString &name,
                      IDM_FieldType &fieldType,
                      IDMDiscretization* &pDiscr,
                      IString argumentField,
                      IDM_Cardinality &card);
        IDMRETURN update(IString argumentField);
        IDMRETURN getDiscretization(IDMDiscretization *&pDis);
        IDMDiscretization *getDiscretization() const;

    };

Data members:

pivDiscretization
    Pointer to the discretization object that describes the discretization.

ivArgumentField
    Name of the field to be discretized. The field should be numeric, that is, its field data type should be IDM_REAL_TYPE.

Member functions:

IDMDiscretizationField()
    The default constructor.

IDMDiscretizationField(IDMRETURN&, IString, IDMDiscretization*, IString argumentField)
    The constructor that initializes the member variables with the values of the input arguments. The field type and cardinality are not specified. The field type (IDM_FieldType) is equal to the field type (IDM_FieldType) of the value field of the discretization object. The cardinality is equal to that of the argument field.

IDMDiscretizationField( const IDMDiscretizationField &field )
    The copy constructor.

IDMDiscretizationField& operator= (const IDMDiscretizationField &field )
    The assignment operator.

~IDMDiscretizationField()
    The destructor.

update
    Updates the name of the argument field.

getDiscretization
    Returns a pointer to the discretization object.

IDMFunctionField

Function fields are fields whose values are computed by a C++ function that has other fields as input parameters.

In addition to this, a mechanism is provided that allows you to add your own C++ functions.


Format:

    class IDMFunctionField : public IDMComputedField {

        ISequence<IString> ivArgumentFields;
        IString ivFunctionName;

    public:
        IDMFunctionField();
        ~IDMFunctionField();
        IDMFunctionField(IDMRETURN &rc,
                         IString fieldName,
                         IString functionName,
                         ISequence<IString> argumentFields,
                         IDM_FieldType fieldType,
                         IDM_Cardinality=IDM_SINGLE_VALUE);
        IDMFunctionField(const IDMFunctionField &field);
        IDMFunctionField& operator= (const IDMFunctionField &field);
        IDMRETURN get(IString &fieldName,
                      IDM_FieldType &fieldType,
                      IString &functionName,
                      ISequence<IString> &argumentFields,
                      IDM_Cardinality &card);
        IDMRETURN update(ISequence<IString> &argumentFields);
    };

Data members:

ivArgumentFields
    The list of argument fields.

ivFunctionName
    The name of the function. For a list of built-in functions, see “Built-in functions for computed fields” on page 64.

Member functions:

IDMFunctionField()
    The default constructor.

IDMFunctionField(IDMRETURN&, IString, IString, ISequence<IString>, IDM_FieldType, IDM_Cardinality)
    The constructor that initializes the member variables with the input arguments. It checks whether the types (IDM_FieldType) of the argument fields are compatible with the signature of the function.

IDMFunctionField( const IDMFunctionField &field )
    The copy constructor.

IDMFunctionField& operator= (const IDMFunctionField &field )
    The assignment operator.

~IDMFunctionField()
    The destructor.

update
    Updates the sequence of argument fields.

IDMFunctionDeclaration

This class is used for retrieving the declaration of functions that can be used for defining computed fields.


Format:

    class IDMFunctionDeclaration : public IDMBase {

        IString ivFunctionName;
        IDM_FieldDataType ivReturnDataType;
        IDM_Cardinality ivCardinality;
        IDMINTEGER ivNumberOfArgs;
        static IDMBOOLEAN cvFunctionsDeclared;
        static IKeySet<IDMFunctionDeclaration*, IString> cvFunctionDecls;
        ISequence<IDMINTEGER> ivArgTypes;
        IDMFunctionDeclaration();
        IDMFunctionDeclaration(IString functName,
                               IDM_FieldDataType returnDataType,
                               IDM_Cardinality card,
                               IDMINTEGER nbArgs,
                               ISequence<IDMINTEGER>& argTypes);
        ~IDMFunctionDeclaration();
        static IDMRETURN load();

    public:
        static ISortedSet<IString> getAllFunctionNames();
        static const IDMFunctionDeclaration*
            getFunctionDeclaration(IString functName);
        void get(IString& functName,
                 IDM_FieldDataType& returnDataType,
                 IDM_Cardinality& card,
                 IDMINTEGER nbArgs,
                 ISequence<IDMINTEGER>& argType);
        static void decodeArgType(IDMINTEGER argType,
                                  IDM_Cardinality &card,
                                  IDMBOOLEAN &isNumeric,
                                  IDMBOOLEAN &isCategorical);
        IString const& functionName() const {
            return ivFunctionName;
        }
        virtual const IDMException* getStaticException() const;

    };


Data members:

cvFunctionDecls
    The key set of the declarations of all available functions. The function declarations are stored in the file idmdfnct.dcl in the IDM_BIN_DIR directory on the server. The first time the function declarations are accessed, this file is read and the key set is initialized. After this initialization the file is not read again.

ivFunctionName
    The name of the function.

ivReturnDataType
    The return field data type of the function.

ivCardinality
    The cardinality of the function.

ivNumberOfArgs
    The number of arguments of the function. The value 0 means that the function can have an arbitrary number of arguments.

ivArgTypes
    The sequence of required argument types of the function. An argument type denotes the required field data types (IDM_STRING_TYPE, IDM_REAL_TYPE, or both) together with the required cardinality. The sequence can have fewer elements than the number of arguments. If this is the case and the sequence has N elements, the arguments N+1, ..., ivNumberOfArgs must match the N-th argument type. For example, if ivArgTypes contains only 1 element, this argument type is relevant for all arguments.
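The rule for argument type sequences that are shorter than the number of arguments can be expressed as a small lookup. In this sketch std::vector<int> stands in for ISequence<IDMINTEGER>, argTypeFor is a hypothetical helper, and the integers are placeholders only: the actual integer encoding of argument types is not described here and is interpreted by decodeArgType.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the argument type that applies to the argument at the 1-based
// position 'argPos'. If the type sequence has N elements and argPos is
// greater than N, the N-th (last) element applies.
int argTypeFor(const std::vector<int>& argTypes, std::size_t argPos)
{
    assert(!argTypes.empty() && argPos >= 1);
    std::size_t index = argPos <= argTypes.size() ? argPos - 1
                                                  : argTypes.size() - 1;
    return argTypes[index];
}
```

For a sequence with the two placeholder types 10 and 20, the first argument is checked against 10 and every argument from the second onward against 20.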

Member functions:

getAllFunctionNames()
    Returns the names of the available functions as a sorted set.

getFunctionDeclaration(IString)
    Returns the function declaration of the function with the given name. A NULL pointer is returned if such a function does not exist.

get(IString&, ...)
    Returns the data members of the function declaration, that is, name, return data type, cardinality, number of arguments, and the sequence of argument types.

decodeArgType(IDMINTEGER, ...)
    Decodes the integer value of an argument type to the required cardinality and the allowed field data types. If IDM_REAL_TYPE is allowed as a field data type, isNumeric is true. If IDM_STRING_TYPE is allowed, isCategorical is true. It is possible that both field data types are allowed.

functionName()
    Returns the name of the function.



Built-in functions for computed fields

The following built-in functions are available for defining computed fields. The cardinality of the arguments is always single-value.

real abs ( real x )
    Returns the absolute value of x, that is, |x|. The returned value is valid if x is valid.

real round ( real x )
    Returns the rounded value of x, that is, the floating-point integer value nearest to x. If x lies exactly halfway between the two nearest floating-point integer values, either of those two values can be returned. The returned value is valid if x is valid.

real sqrt ( real x )
    Returns the square root of x. The returned value is valid if x is valid and x>0.

real log ( real x )
    Returns the natural logarithm of x. The returned value is valid if x is valid and x>0.

real log10 ( real x )
    Returns the logarithm base 10 of x. The returned value is valid if x is valid and x>0.

real exp ( real x )
    Returns e^x, that is, the Euler number e raised to the power of x. The returned value is valid if x is valid and x>0.

real sin ( real x ), real cos ( real x ), real tan ( real x )
    The sin, cos, and tan functions return the sine, cosine, and tangent, respectively, of their parameters, which are in radians. The returned value is valid if x is valid. These functions lose accuracy when passed a large value for the x parameter.

real min ( real x1, real x2, ... real xN )
    Returns the minimum value of the parameter values x1, x2, ..., xN. Invalid parameter values are ignored. The returned value is valid if there is at least one valid parameter value.

real max ( real x1, real x2, ... real xN )
    Returns the maximum value of the parameter values x1, x2, ..., xN. Invalid parameter values are ignored. The returned value is valid if there is at least one valid parameter value.

real average ( real x1, real x2, ... real xN )
    Returns the average value of the parameter values x1, x2, ..., xN. Invalid parameter values are ignored. The returned value is valid if there is at least one valid parameter value.

real sum ( real x1, real x2, ... real xN )
    Returns the sum of the parameter values x1, x2, ..., xN. Invalid parameter values are ignored. If there is no valid parameter value, the value 0 is returned.

real product ( real x1, real x2, ... real xN )
    Returns the product of the parameter values x1, x2, ..., xN. Invalid parameter values are ignored. If there is no valid parameter value, the value 1 is returned.


real equal ( real x1, real x2 )
    Returns 1 if x1 and x2 are equal and 0 otherwise. The returned value is valid if x1 and x2 are valid.

real notEqual ( real x1, real x2 )
    Returns 1 if x1 and x2 are not equal and 0 otherwise. The returned value is valid if x1 and x2 are valid.

real lessOrEqual ( real x1, real x2 )
    Returns 1 if x1 is less than or equal to x2 and 0 otherwise. The returned value is valid if x1 and x2 are valid.

real lessThan ( real x1, real x2 )
    Returns 1 if x1 is less than x2 and 0 otherwise. The returned value is valid if x1 and x2 are valid.

real subtract ( real x1, real x2 )
    Returns the difference between x1 and x2. The returned value is valid if x1 and x2 are valid.

real divide ( real x1, real x2 )
    Returns the quotient of x1 and x2. The returned value is valid if x1 and x2 are valid and x2 is not 0.

real ifr ( real b, real x1, real x2 )
    Returns x1 if b is valid and b is not equal to 0; otherwise, x2 is returned.

real ifrValid ( real/string v, real x1, real x2 )
    Returns x1 if v is valid; otherwise, x2 is returned. This function can be used to encode invalid values.

real movingAverage ( real n, real x )
    Computes the average of the last n values of x. The value of n should be a constant; otherwise, the value of n of the first record is taken.

real stringToReal ( string s )
    Converts the string s to a real number. The returned value is valid if s is valid and s is a string representing a real number.

string stringToUpper ( string s )
    Converts all characters of string s to uppercase letters. The returned value is valid if s is valid.

string stringToLower ( string s )
    Converts the characters of string s to lowercase letters. The returned value is valid if s is valid.

real stringEqual ( string s1, string s2 )
    Returns 1 if s1 and s2 are equal and 0 otherwise. The returned value is valid if s1 and s2 are valid.

string concat ( string s1, string s2 )
    Returns the strings s1 and s2 concatenated. The returned value is valid if s1 and s2 are valid.

string ifs ( real b, string s1, string s2 )
    Returns s1 if b is valid and b is not equal to 0; otherwise, s2 is returned.

string ifsValid ( real/string v, string s1, string s2 )
    Returns s1 if v is valid; otherwise, s2 is returned. This function can be used to encode invalid values.

string realToString ( real x )
    Converts the real number x to a string. The returned value is valid if x is valid.
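The way the aggregate functions treat invalid parameter values can be illustrated with the following sketch of sum, product, average, and ifrValid. It is not the Intelligent Miner implementation: the Value structure and the function names are invented for this example, and marking the results of sum and product as valid when no parameter is valid is an assumption, because the descriptions above state only the returned number.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A real value together with a validity flag (hypothetical stand-in).
struct Value {
    bool valid;
    double x;
};

// sum: invalid parameters are ignored; 0 if no parameter is valid.
// Assumption: the result is always marked valid.
Value sumOf(const std::vector<Value>& args)
{
    Value r = { true, 0.0 };
    for (std::size_t i = 0; i < args.size(); ++i)
        if (args[i].valid) r.x += args[i].x;
    return r;
}

// product: invalid parameters are ignored; 1 if no parameter is valid.
// Assumption: the result is always marked valid.
Value productOf(const std::vector<Value>& args)
{
    Value r = { true, 1.0 };
    for (std::size_t i = 0; i < args.size(); ++i)
        if (args[i].valid) r.x *= args[i].x;
    return r;
}

// average: invalid parameters are ignored; the result is valid only if
// at least one parameter is valid.
Value averageOf(const std::vector<Value>& args)
{
    double total = 0.0;
    std::size_t count = 0;
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (args[i].valid) {
            total += args[i].x;
            ++count;
        }
    }
    Value r = { count > 0, count > 0 ? total / count : 0.0 };
    return r;
}

// ifrValid: x1 if v is valid, otherwise x2.
Value ifrValidOf(const Value& v, const Value& x1, const Value& x2)
{
    return v.valid ? x1 : x2;
}
```

For the parameters 2, an invalid value, and 4, the sketch gives a sum of 6, a product of 8, and an average of 3, because the invalid value is skipped in each case.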

Defining your own functions for computed fields

You can define functions for computed fields in a C++ program which isdynamically linked to the Intelligent Miner executable. This program shouldcontain a call to idmAddRealFunction or to idmAddStringFunction for eachfunction that is to be added to the set of functions allowed in computed fields.Once added, a user-defined function will be called on demand whenever a newdata record is processed during a mining run.

Adding a user-defined function is done along the following steps:
1. The function itself needs to be defined.
2. The function needs to be made known to the data access API, that is, it needs to be declared as a user-defined function.
3. An object file needs to be created.
4. The function declaration needs to be generated by calling the idmefdcl executable on the server.

Note: This information applies only to AIX:

The C++ program file containing the definitions for the user-defined functions is referenced as YourUDF.cpp. YourUDF.cpp might consist of more than one file. You can add compiler options if required.

Intelligent Miner programs are stored in the directory identified by the IDM_BIN_DIR environment variable. Ensure that you have write permission for this directory. It contains an executable file called idmUDFinstall.

Change to the directory identified by the IDM_BIN_DIR environment variable and type:

idmUDFinstall YourUDF.cpp

This command compiles YourUDF.cpp into an executable file stored in this directory. It also updates an internal table of available functions for computed fields. The new executable file is loaded dynamically at run time when you start a mining function.

To de-install, remove the new executable and call idmefdcl.

Defining a user-defined function

A user-defined function, for example, someUdf, needs to have the following signature:

IDMBOOLEAN someUdf(IDMField** ppFields,
                   IDMINTEGER nbFields,
                   IDM_ComputationState compState,
                   IDMREAL/IDMCHAR* &value,
                   IDMINTEGER &fieldWidth,
                   void *&pAuxStorage);

with

typedef enum {
    IDM_COMP_INIT,
    IDM_COMP_EVAL,
    IDM_COMP_UNDO,
    IDM_COMP_REINIT,
    IDM_COMP_CLEANUP
} IDM_ComputationState;

ppFields
    An array of IDMField pointers with nbFields elements. These are the argument fields of the function.

compState
    Indicates the state of the computation. It can have the following values:

    IDM_COMP_INIT
        Initialization of the computation, which is called when a data input is opened, that is, before the first record is read. In this state the fieldWidth value of the computed field needs to be computed. This value needs to be known in case an output table is created with this computed field. The field width value must not be modified later, as this would result in flat file output tables with non-constant record length. For functions computing string values, a character string of length fieldWidth can be allocated and assigned to the "value" return parameter. Furthermore, the auxiliary storage can be allocated. Auxiliary storage can be useful in case the function needs to memorize information that is not contained in the argument fields. Cases where this is necessary are functions that need to know the values of a field of preceding records or some aggregated information (like the current average value).

    IDM_COMP_EVAL
        This evaluation state is called when the field values of a new record are retrieved. The evaluation of the function is based on the information contained in the argument field pointer array ppFields (in particular the value of a field and its validity) together with the information available in pAuxStorage. The computed value needs to be assigned to the return parameter "value". If the computed value is valid, IDM_TRUE needs to be returned, otherwise IDM_FALSE.

    IDM_COMP_UNDO
        Undo the computation. It is called when a record does not satisfy a filtering condition. If the computation of the last value has some side effect in the auxiliary storage, this side effect should be undone in this state.

    IDM_COMP_REINIT
        Reinitialize the auxiliary storage. It is called when the records of a data input have been read and the input is rewound to the beginning. At the beginning of a subsequent pass over the data it might be necessary to reinitialize certain parts of the information stored in the auxiliary storage.

    IDM_COMP_CLEANUP
        Clean up the computation. It is called when a data input is closed. The memory allocated during the initialization needs to be deleted. This concerns the value-character array for functions computing string values, and the auxiliary storage.

value
    The return value of the function. Its data type is IDMREAL for functions computing real values and IDMCHAR* for those returning character strings.

fieldWidth
    The width of the field computed by this function.

pAuxStorage
    A pointer to auxiliary storage managed by the function itself.
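
The interplay of the computation states and the auxiliary storage can be illustrated with a function that keeps a running sum. The following is a sketch only. The typedefs and the IDMField struct at the top are simplified stand-ins so that the fragment compiles on its own; in a real build they come from the Intelligent Miner headers. The function name runningSum, the field width of 16, and the return values in states other than IDM_COMP_EVAL are assumptions made for illustration.

```cpp
// Stand-in declarations (assumptions); the real ones come from the product headers.
typedef int IDMBOOLEAN;
const IDMBOOLEAN IDM_TRUE = 1;
const IDMBOOLEAN IDM_FALSE = 0;
typedef double IDMREAL;
typedef long IDMINTEGER;
typedef enum {
    IDM_COMP_INIT, IDM_COMP_EVAL, IDM_COMP_UNDO, IDM_COMP_REINIT, IDM_COMP_CLEANUP
} IDM_ComputationState;

struct IDMField {                       // minimal mock of an argument field
    IDMREAL v;
    IDMBOOLEAN valid;
    IDMBOOLEAN getRealValue(IDMREAL& out) const { out = v; return valid; }
};

// Auxiliary storage: the sum so far and the last contribution,
// so that IDM_COMP_UNDO can take the last contribution back.
struct RunningSum {
    IDMREAL sum;
    IDMREAL last;
};

// runningSum(x): sum of all valid values of x seen so far in the current pass.
IDMBOOLEAN runningSum(IDMField** ppFields, IDMINTEGER nbFields,
                      IDM_ComputationState compState, IDMREAL& value,
                      IDMINTEGER& fieldWidth, void*& pAuxStorage)
{
    RunningSum* aux = static_cast<RunningSum*>(pAuxStorage);
    switch (compState) {
    case IDM_COMP_INIT:
        fieldWidth = 16;                 // set once, never changed afterwards
        pAuxStorage = new RunningSum();  // value-initialized: sum = last = 0
        return IDM_TRUE;
    case IDM_COMP_EVAL: {
        if (nbFields < 1) return IDM_FALSE;
        IDMREAL x = 0;
        if (!ppFields[0]->getRealValue(x)) {
            aux->last = 0;               // nothing was added, nothing to undo
            return IDM_FALSE;            // invalid input gives an invalid result
        }
        aux->last = x;
        aux->sum += x;
        value = aux->sum;
        return IDM_TRUE;
    }
    case IDM_COMP_UNDO:                  // record was filtered out
        aux->sum -= aux->last;
        aux->last = 0;
        return IDM_TRUE;
    case IDM_COMP_REINIT:                // input rewound for another pass
        aux->sum = 0;
        aux->last = 0;
        return IDM_TRUE;
    case IDM_COMP_CLEANUP:               // release what INIT allocated
        delete aux;
        pAuxStorage = nullptr;
        return IDM_TRUE;
    }
    return IDM_FALSE;
}
```

Because this function depends on preceding records, it would be declared with functionOfCurrentRecord set to IDM_FALSE so that the undo state is actually delivered to it.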

Declaring a user-defined function

User-defined functions are declared by calling the following functions:
- idmAddRealFunction for functions computing real values
- idmAddStringFunction for functions returning character strings

IDMRETURN idmAddRealFunction(const IDMCHAR* pName,
                             IDMBOOLEAN (*pFunct)(IDMField**,
                                                  IDMINTEGER,
                                                  IDM_ComputationState,
                                                  IDMREAL&,
                                                  IDMINTEGER&,
                                                  void*&),
                             IDMArray<IDMExtendedFieldType*>* pSignature,
                             IDMINTEGER nbOfArguments,
                             IDMBOOLEAN functionOfCurrentRecord = IDM_TRUE,
                             IDMINTEGER stableAfterPass = 0);

IDMRETURN idmAddStringFunction(const IDMCHAR* pName,
                               IDMBOOLEAN (*pFunct)(IDMField**,
                                                    IDMINTEGER,
                                                    IDM_ComputationState,
                                                    IDMCHAR*&,
                                                    IDMINTEGER&,
                                                    void*&),
                               IDMArray<IDMExtendedFieldType*>* pSignature,
                               IDMINTEGER nbOfArguments,
                               IDMBOOLEAN functionOfCurrentRecord = IDM_TRUE,
                               IDMINTEGER stableAfterPass = 0);

These functions have the following parameters:

pName
    The (external) name of the function used in the computed field definitions. It need not be identical with the name of the C++ function defining it.

pFunct
    The function pointer of the defining C++ function.

pSignature
    The array of extended field types (class IDMExtendedFieldType) describing the possible field data types and cardinality of the argument fields. The array may contain fewer elements than the possible number of arguments specified as the next parameter. In that case the last extended field type of this array is relevant for the remaining arguments too. A special case is an array that contains only one element, which means that all arguments need to match the same extended field type.

nbOfArguments
    The number of arguments. The value 0 means that an arbitrary number of arguments is allowed.

functionOfCurrentRecord
    The function depends on the values of the current record only; that is, an auxiliary buffer is not required and no undo action is necessary if the function is called with the IDM_COMP_UNDO computation state. If the functions of all computed fields have this property, the functions are never called with the undo computation state.

stableAfterPass
    The computation yields identical results for a given record after the given pass number. This can occur, for example, for a function that replaces an invalid value by the mean value of a field, which is known only after the first pass. This information is not exploited in the current version.
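
The rule for signatures that are shorter than the argument list can be sketched as follows. This is an illustration of the rule as described for pSignature and nbOfArguments, not the product's implementation; FieldDataType, ExtendedType, typeForArgument, and argumentCountOk are hypothetical names, and it is assumed here that a nonzero nbOfArguments requires exactly that many arguments.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-ins for the field data type of an argument.
enum FieldDataType { ANY_TYPE, REAL_TYPE, STRING_TYPE };
struct ExtendedType {
    FieldDataType type;
};

// Returns the extended type that argument number argIndex (0-based) must match.
// If the signature has fewer entries than there are arguments, the last entry
// applies to all remaining arguments. The signature must not be empty.
const ExtendedType& typeForArgument(const std::vector<ExtendedType>& signature,
                                    std::size_t argIndex)
{
    std::size_t i = argIndex < signature.size() ? argIndex : signature.size() - 1;
    return signature[i];
}

// nbOfArguments == 0 means that any number of arguments is accepted.
bool argumentCountOk(std::size_t nbOfArguments, std::size_t actual)
{
    return nbOfArguments == 0 || nbOfArguments == actual;
}
```

With a signature of (string, real), the first argument has to be a string and every further argument a real value; a one-element signature constrains all arguments in the same way.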

IDMExtendedFieldType
    The extended field type specifies the field data types and cardinality of an argument of a user-defined function.

class IDMExtendedFieldType {

public:

    IDMExtendedFieldType(IDMCARDINALITY card);
    IDMExtendedFieldType(IDMCARDINALITY card, IDM_FieldDataType fType);
    ~IDMExtendedFieldType();

    static IDMArray<IDMExtendedFieldType*>* signature(IDMCARDINALITY card);

    static IDMArray<IDMExtendedFieldType*>* signature(IDMCARDINALITY card,
                                                      IDM_FieldDataType fType);
};

Member functions:

IDMExtendedFieldType(IDMCARDINALITY)
    Constructs an extended field type object for the cardinality card. The field data types are irrelevant, that is, IDM_REAL_DATA_TYPE and IDM_STRING_DATA_TYPE are allowed as field data types for arguments with this extended field type.

IDMExtendedFieldType(IDMCARDINALITY card, IDM_FieldDataType fType);
    Constructs an extended field type object for the cardinality card and field data type fType.

~IDMExtendedFieldType();
    The destructor.

signature(IDMCARDINALITY card)
    Creates an array containing an extended field type object constructed by the IDMExtendedFieldType(IDMCARDINALITY) constructor. This signature means that all arguments need to have the cardinality card.

signature(IDMCARDINALITY card, IDM_FieldDataType fType)
    Creates an array containing an extended field type object constructed by the IDMExtendedFieldType(IDMCARDINALITY, IDM_FieldDataType) constructor. This signature means that all arguments need to have the cardinality card and the field data type fType.

Compile the program

A program with user-defined functions must have a main function which must be named IDM_UDF_MAIN. The name IDM_UDF_MAIN is defined as a preprocessor symbol in the file idmdglob.hpp. This main function must call idmAddRealFunction() or idmAddStringFunction() for each function to be referenced in computed fields. The main function must have C linkage, that is, declare it with extern "C" int IDM_UDF_MAIN();. The main function must return an integer value, where 0 is interpreted as success and any other value signals a failure. The compile and link options depend on the operating system on which the Intelligent Miner server program is running. Refer to the system-specific installation documentation for further details.

Update database containing function declarations

The Intelligent Miner maintains a simple database containing declarations for all functions which can be used in computed fields. This information is stored in the file idmdfnct.dcl in the standard Intelligent Miner binary directory. In order to update this file, the executable idmefdcl must be called whenever the set of user-defined functions changes.

Example

The following C++ program shows an example where a user-defined function called myDistance is defined.

Header file: myfdcl.cpp

Format:
#include <stdio.h>      // for test output
#include "idmdfnct.hpp"

extern "C" int IDM_UDF_MAIN(); // must have a C name externally

// myDistance takes two input fields with real values
// and returns the absolute difference
IDMBOOLEAN myDistance(IDMField** argFields,
                      IDMINTEGER nbArgs,
                      IDM_ComputationState state,
                      IDMREAL& value,
                      IDMINTEGER& size,
                      void*& dummy)
{
    IDMBOOLEAN isValid = IDM_TRUE;

    switch(state) {
    case IDM_COMP_INIT:
        size = 10;              // field width of the computed field
        break;

    case IDM_COMP_EVAL: {
        IDMREAL argvalA = 0;
        IDMREAL argvalB = 0;
        IDMBOOLEAN isValidA = argFields[0]->getRealValue(argvalA);
        IDMBOOLEAN isValidB = argFields[1]->getRealValue(argvalB);
        // the result is valid only if both arguments are valid
        isValid = isValidA && isValidB;
        value = argvalA>argvalB ? argvalA-argvalB : argvalB-argvalA;
        break;
    }

    default:                    // no auxiliary storage, nothing to undo or clean up
        break;
    }

    return isValid;
}

int IDM_UDF_MAIN()
{
    printf("%s() called\n", IDM_STR_UDF_MAIN);
    fflush(stdout);

    IDMRETURN frc = IDM_SUCCESS;

    // last argument 0: an arbitrary number of arguments is accepted
    frc = idmAddRealFunction("myDistance",
                             myDistance,
                             IDMExtendedFieldType::signature(IDM_SINGLE_VALUE,
                                                             IDM_REAL_TYPE),
                             0);

    return frc;
}

Mining base

A mining base is the repository for the meta-mining data. It is the logical unit for collecting the information of the following objects:
- Data settings
- Mining settings
- Preprocessing settings
- Statistics settings
- Repeatable Sequences settings
- Mining results (only IDMResult and IDMResultSets)

The objects are collected in so-called extents. References to objects of other mining bases are not allowed within a mining base.

When a mining base has been loaded, it is locked for other users. Two users can have access to two different mining bases simultaneously. Multiple users can have access to the same mining base simultaneously, but only the first user can save changes to the mining base.

When deleting a mining base, all objects associated with it are deleted too, because they become inaccessible after the removal of the parent mining base.

This class describes all the objects belonging to a mining base as well as all the functions associated with it.

IDMMiningBase

The following objects belong to one mining base. They are collected in the class IDMMiningBase:
- Data objects
- Item-Category objects
- Taxonomy-Relation objects
- Taxonomy objects
- Name-Mapping objects
- Value-Mapping objects
- Discretization objects
- Associations-settings objects
- Sequential-Patterns-settings objects
- Time-Sequences-settings objects
- Clustering-settings objects
- Classification-settings objects
- Value-Prediction-settings objects
- Repeatable-Sequence-settings objects
- Preprocessing-settings objects
- Statistics-settings objects
- Result-Set objects
- Result objects

The different object types are grouped together in IKeySortedSet collections (extends).

Header file: idmcmnb.hpp

Format:
#define IDM_NB_MINING_CLASSES 43

typedef enum {
    IDM_READ,
    IDM_WRITE
} IDM_Permission;

typedef enum {
    IDM_NO_TYPE,
    IDM_DATA,
    IDM_NAME_MAPPING,
    IDM_TAXONOMY,
    IDM_ITEM_CATEGORY,
    IDM_TAXONOMY_RELATION,
    IDM_VALUE_MAPPING,
    IDM_DISCRETIZATION,
    IDM_ASSOC_SETTINGS,
    IDM_SEQ_PATTERN_SETTINGS,
    IDM_SIM_SEQ_SETTINGS,
    IDM_CLASSIFY_SETTINGS,
    IDM_CLUSTERING_SETTINGS,
    IDM_PREDICTION_SETTINGS,
    IDM_SEQUENCE,
    IDM_COPY_RECORDS_TO_FILE,
    IDM_AGGREGATE_VALUES,
    IDM_FILTER_FIELDS,
    IDM_CLEANUP_DATA_SOURCES,
    IDM_CALCULATE_VALUES,
    IDM_DISCRETIZATION_INTO_QUANTILES,
    IDM_DISCARD_RECORDS_WITH_MISSING_VALUES,
    IDM_DISCRETIZATION_USING_RANGES,
    IDM_ENCODE_MISSING_VALUES,
    IDM_ENCODE_NONVALID_VALUES,
    IDM_FILTER_RECORDS_USING_A_VALUE_SET,
    IDM_GROUP_RECORDS,
    IDM_RUN_SQL,
    IDM_JOIN_DATA_SOURCES,
    IDM_CONVERT_TO_LOWERCASE_OR_UPPERCASE,
    IDM_PIVOT_FIELDS_TO_RECORDS,
    IDM_FILTER_RECORDS,
    IDM_GET_RANDOM_SAMPLE,
    IDM_MAP_VALUES,
    IDM_DESC_QUANT_SAMPLE_SETTINGS,
    IDM_STAT_FACTOR_ANALYSIS,
    IDM_STAT_LINEAR_REGRESSION,
    IDM_STAT_PRIN_COM_ANALYSIS,
    IDM_STAT_UNIVARIATE_CURVE,
    IDM_RESULT_SET,
    IDM_RESULT
} IDM_MiningClass;

class IDMMiningBaseInfo {

public:

    IString ivName;      // Mining base name
    IDMText ivComment;   // Mining base comment
    IBoolean operator==(IDMMiningBaseInfo const& mnbInfo) const
        { return ivName == mnbInfo.ivName; }
    IBoolean operator<(IDMMiningBaseInfo const& mnbInfo) const
        { return ivName < mnbInfo.ivName; }
    IDMMiningBaseInfo() {}
    ~IDMMiningBaseInfo() {}

};

class IDMMiningBase : public IDMBase {

    IDMRETURN ivRc;
    IString ivName;
    IDM_Permission ivPermission;
    IDMBOOLEAN ivChange;
    IDMText ivComment;
    IDMTimeStamp ivCreationTimeStamp;
    IDMTimeStamp ivUpdateTimeStamp;
    ISortedRelation<IDMTaxonomyRelation*, IString> ivTaxonomyRelations;

    IKeySortedSet<IDMMiningClass*, IString> ivMclExtends[IDM_NB_MINING_CLASSES+1];
    ISequence<IDMSettings*> ivRunningSettings;

public:

    IDMMiningBase();
    IDMMiningBase( IDMRETURN &rc, IString name );
    IDMMiningBase( IDMRETURN &rc, IString name, IString comment );
    ~IDMMiningBase();
    IDMRETURN update( IString name, IString comment="" );
    IDMRETURN get( IString &name );
    IDMRETURN get( IString &name, IString &comment,
                   IDM_Permission &permission,
                   IDMTimeStamp &creationTimeStamp,
                   IDMTimeStamp &updateTimeStamp );

    IDMRETURN deleteObject( );

    IDMRETURN getElement( IString, IDM_MININGCLASS, IDMMiningClass *&object );
    IDMRETURN getElement( IString, IDM_MININGCLASS, IDMSettings *&object );

    IDMRETURN getElement( IString, IDMData *&data );
    IDMRETURN getElement( IString, IDMNameMapping *&nameMapping );
    IDMRETURN getElement( IString, IDMValueMapping *&valueMapping );
    IDMRETURN getElement( IString, IDMDiscretization *&discretization );
    IDMRETURN getElement( IString, IDMItemCategory *&itemCategory );
    IDMRETURN getElement( IString, IString, IDMTaxonomyRelation *&taxRel );
    IDMRETURN getElement( IString, IDMTaxonomy *&tax );
    IDMRETURN getElement( IString, IDMAssocSettings *&ass );
    IDMRETURN getElement( IString, IDMSeqPatternSettings *&sps );
    IDMRETURN getElement( IString, IDMResultSet *&resultSet );
    IDMRETURN getElement( IString, IDMResult *&result );
    IDMRETURN getElement( IString, IDMClassifySettings *&clf );
    IDMRETURN getElement( IString, IDMSimSeqSettings *&sim );
    IDMRETURN getElement( IString, IDMClusteringSettings *&seg );
    IDMRETURN getElement( IString, IDMPredictionSettings *&pred );
    IDMRETURN getElement( IString, IDMSequence *&seq );
    IDMRETURN getElement( IString, IDMDiscardRecordsWithMissingValues *&ppf );
    IDMRETURN getElement( IString, IDMFilterRecords *&ppf );
    IDMRETURN getElement( IString, IDMJoinDataSources *&ppf );
    IDMRETURN getElement( IString, IDMGroupRecords *&ppf );
    IDMRETURN getElement( IString, IDMGetRandomSample *&ppf );
    IDMRETURN getElement( IString, IDMFilterFields *&ppf );
    IDMRETURN getElement( IString, IDMFilterRecordsUsingAValueSet *&ppf );
    IDMRETURN getElement( IString, IDMMapValues *&ppf );
    IDMRETURN getElement( IString, IDMCopyRecordsToFile *&ppf );
    IDMRETURN getElement( IString, IDMPivotFieldsToRecords *&ppf );
    IDMRETURN getElement( IString, IDMRunSQL *&ppf );
    IDMRETURN getElement( IString, IDMAggregateValues *&ppf );
    IDMRETURN getElement( IString, IDMCleanUpDataSources *&ppf );
    IDMRETURN getElement( IString, IDMCalculateValues *&ppf );
    IDMRETURN getElement( IString, IDMDiscretizationUsingRanges *&ppf );
    IDMRETURN getElement( IString, IDMEncodeNonvalidValues *&ppf );
    IDMRETURN getElement( IString, IDMDiscretizationIntoQuantiles *&ppf );
    IDMRETURN getElement( IString, IDMConvertToLowercaseOrUppercase *&ppf );
    IDMRETURN getElement( IString, IDMEncodeMissingValues *&ppf );
    IDMRETURN getElement( IString, IDMStatFactorAnalysis *&sts );
    IDMRETURN getElement( IString, IDMStatLinearRegression *&sts );
    IDMRETURN getElement( IString, IDMStatPrinComAnalysis *&sts );
    IDMRETURN getElement( IString, IDMStatUnivariateCurve *&sts );

    IDMRETURN getExtend( IKeySortedSet<IDMData*, IString> &instances );
    IDMRETURN getExtend( IKeySortedSet<IDMNameMapping*, IString> &instances );
    IDMRETURN getExtend( IKeySortedSet<IDMValueMapping*, IString> &instances );
    IDMRETURN getExtend( IKeySortedSet<IDMDiscretization*, IString> &instances );
    IDMRETURN getExtend( IKeySortedSet<IDMItemCategory*, IString> &instances );
    IDMRETURN getExtend( ISortedRelation<IDMTaxonomyRelation*, IString> &instances );
    IDMRETURN getExtend( IKeySortedSet<IDMTaxonomy*, IString> &instances );
    IDMRETURN getExtend( IKeySortedSet<IDMAssocSettings*, IString> &instances );
    IDMRETURN getExtend( IKeySortedSet<IDMSeqPatternSettings*, IString> &instances );
    IDMRETURN getExtend( IKeySortedSet<IDMResultSet*, IString> &instances );
    IDMRETURN getExtend( IKeySortedSet<IDMResult*, IString> &instances );
    IDMRETURN getExtend( IKeySortedSet<IDMClassifySettings*, IString> &instances );
    IDMRETURN getExtend( IKeySortedSet<IDMSimSeqSettings*, IString> &instances );
    IDMRETURN getExtend( IKeySortedSet<IDMClusteringSettings*, IString> &instances );
    IDMRETURN getExtend( IKeySortedSet<IDMPredictionSettings*, IString> &instances );
    IDMRETURN getExtend( IKeySortedSet<IDMProcessingSettings*, IString> &instances );
    IDMRETURN getExtend( IKeySortedSet<IDMSequence*, IString> &instances );
    IDMRETURN getExtend( IKeySortedSet<IDMDiscardRecordsWithMissingValues*, IString> &instances );
    IDMRETURN getExtend( IKeySortedSet<IDMFilterRecords*, IString> &instances );
    IDMRETURN getExtend( IKeySortedSet<IDMJoinDataSources*, IString> &instances );
    IDMRETURN getExtend( IKeySortedSet<IDMGroupRecords*, IString> &instances );
    IDMRETURN getExtend( IKeySortedSet<IDMGetRandomSample*, IString> &instances );
    IDMRETURN getExtend( IKeySortedSet<IDMFilterFields*, IString> &instances );
    IDMRETURN getExtend( IKeySortedSet<IDMFilterRecordsUsingAValueSet*, IString> &instances );
    IDMRETURN getExtend( IKeySortedSet<IDMMapValues*, IString> &instances );
    IDMRETURN getExtend( IKeySortedSet<IDMCopyRecordsToFile*, IString> &instances );
    IDMRETURN getExtend( IKeySortedSet<IDMPivotFieldsToRecords*, IString> &instances );
    IDMRETURN getExtend( IKeySortedSet<IDMRunSQL*, IString> &instances );
    IDMRETURN getExtend( IKeySortedSet<IDMAggregateValues*, IString> &instances );
    IDMRETURN getExtend( IKeySortedSet<IDMCleanUpDataSources*, IString> &instances );
    IDMRETURN getExtend( IKeySortedSet<IDMCalculateValues*, IString> &instances );
    IDMRETURN getExtend( IKeySortedSet<IDMDiscretizationUsingRanges*, IString> &instances );
    IDMRETURN getExtend( IKeySortedSet<IDMEncodeNonvalidValues*, IString> &instances );
    IDMRETURN getExtend( IKeySortedSet<IDMDiscretizationIntoQuantiles*, IString> &instances );
    IDMRETURN getExtend( IKeySortedSet<IDMConvertToLowercaseOrUppercase*, IString> &instances );
    IDMRETURN getExtend( IKeySortedSet<IDMEncodeMissingValues*, IString> &instances );
    IDMRETURN getExtend( IKeySortedSet<IDMStatisticsSettings*, IString> &instances );
    IDMRETURN getExtend( IKeySortedSet<IDMStatFactorAnalysis*, IString> &instances );
    IDMRETURN getExtend( IKeySortedSet<IDMStatLinearRegression*, IString> &instances );
    IDMRETURN getExtend( IKeySortedSet<IDMStatPrinComAnalysis*, IString> &instances );
    IDMRETURN getExtend( IKeySortedSet<IDMStatUnivariateCurve*, IString> &instances );

    IDMRETURN save();
    IDMRETURN saveAs( IString name );
    IDMRETURN saveAsWithoutResults( IString name );
    IDMRETURN export( IString path, IString filenameWithoutExtension );
    IDMRETURN exportWithoutResults( IString path,
                                    IString filenameWithoutExtension );
    IDMRETURN import( IString path, IString filenameWithoutExtension );

    IString getName();
    IDMRETURN setName(IString);
    IDMRETURN setComment(IString);
    IString getComment();
    const IDMTimeStamp& getCreationTimeStamp();
    IString getCreationTimeStampString();
    const IDMTimeStamp& getUpdateTimeStamp();
    IString getUpdateTimeStampString();

    IDM_Permission queryPermission();
    IDMBOOLEAN queryChange();

    static IDMRETURN createObject( IString name,
                                   IDMMiningBase *&pMiningBase );

    static IDMRETURN createObject( IString name, IString comment,
                                   IDMMiningBase *&pMiningBase );

    static IDMRETURN load( IString miningBaseName,
                           IDMMiningBase *&pMiningBase );

    static IDMRETURN loadFromString( IString loadString,
                                     IDMMiningBase *&pMnb );

    static IDMRETURN loadComment( IString miningBaseName,
                                  IString &comment );

    static IDMRETURN getMiningBaseNames( ISortedSet<IString> &miningBaseNames );

    static IDMRETURN getMiningBaseInfos( ISortedSet<const IDMMiningBaseInfo> &info );

    ISequence<IDMSettings*> getRunningSettings();
    virtual const IDMException* getStaticException() const;

};

Data members:

ivName
    The name of a mining base.

ivPermission
    Specifies whether read (IDM_READ) or write (IDM_WRITE) permission to the mining base is given. If a mining base is loaded but locked by another user, only read permission is given to the mining base. With permission IDM_READ the mining base cannot be changed. To change it, it must be saved under a different name using the saveAs method. The mining base cannot be saved with the old name.

ivChange
    Specifies whether the mining base has been changed after loading.

ivComment
    Any text.

ivCreationTimeStamp
    The date and time the mining base was created.

ivUpdateTimeStamp
    The date and time of the last update of the mining base.

ivTaxonomyRelations
    The ISortedRelation collection of pointers to the taxonomy-relation objects that belong to the mining base.

ivMclExtends
    An array of IKeySortedSet collections. For each class that is derived from IDMMiningClass one array element is available in this array. Refer to the description of the class IDMMiningClass. The enumeration IDM_MiningClass lists all classes that are derived from IDMMiningClass as enumerated values.

ivRunningSettings
    All settings that are currently running are stored in this sequence.

Member functions:

IDMMiningBase()
    The default constructor.

IDMMiningBase( IDMRETURN &rc, IString name )
    Constructs a mining base object with a given mining base name. An error occurred during construction if the return code is not equal to IDM_SUCCESS. The mining base object should then be deleted using the method deleteObject(). The constructed mining base has write permission. Other users cannot change it until the mining base is closed. Calling the destructor closes the mining base.

IDMMiningBase( IDMRETURN &rc, IString name, IString comment );
    Constructs a mining base object with a given mining base name and a mining base comment. The name and the comment are written to a file called mnbases.dat. This file is contained in the idmmnb directory that is located in the directory identified by the IDM_MNB_DIR environment variable. An error occurred during construction if the return code is not equal to IDM_SUCCESS. The mining base object should then be deleted using the method deleteObject(). The constructed mining base has write permission. Other users cannot change it until the mining base is closed. Calling the destructor closes the mining base.

~IDMMiningBase
    The destructor. It deletes the mining base in memory. All objects in its extends are deleted too. The mining base is not deleted on disk. If the destructor is called before the mining base has been saved, all changes in the mining base are lost.

createObject
    Constructs a mining base object with a given name and returns the object if no error occurred. Otherwise the object is deleted and pMiningBase is set to NULL. The constructed mining base has write permission. Other users cannot change it until it is deleted using the destructor.

createObject with comment
    Constructs a mining base object with a given name and comment and returns the object if no error occurred. Otherwise, the object is deleted and pMiningBase is set to NULL. The name and the comment are written to a file called mnbases.dat. This file is contained in the idmmnb directory, which is located in the directory identified by the IDM_MNB_DIR environment variable. The constructed mining base has write permission. Other users cannot change it until it is closed by using the destructor.

deleteObject
    Removes the mining base from disk and calls the destructor. The mining base cannot be removed if it is loaded with read permission.

update
    Updates the name and the comment of the mining base in the mnbases.dat file. This file is contained in the idmmnb directory that is located in the directory identified by the IDM_MNB_DIR environment variable.

get(IString&)
    Returns the name of the mining base.

get(IString&, IString&, ...)
    Returns the name, comment, read/write permission, creation time stamp, and update time stamp of the mining base.

getExtend
    Returns the IKeySortedSet or ISortedRelation collection of the meta-data.

getElement
    Returns a pointer to the object with the given name out of the IKeySortedSet or ISortedRelation collection of the meta-data. If no object with the given name exists, a NULL pointer is returned.

save
    Saves the mining base with all objects. The mining base cannot be saved if it is loaded with read permission.

saveAs
    Saves a mining base with a new name. The newly saved mining base is accessed with write permission. The old mining base is not deleted from the disk and can be loaded again later if required. After saveAs the mining-base object represents the newly saved mining base.

saveAsWithoutResults
    Saves a mining base with a new name. The results of the mining base are not saved in the new mining base. The newly saved mining base is accessed with write permission. The old mining base is not deleted from the disk and can be loaded again later if required. After saveAsWithoutResults the mining-base object represents the newly saved mining base.

export
    Exports the mining base into the given path on the client. Different files are created, all beginning with the given filenameWithoutExtension. The created files have the following extensions:

    .mnb
        Mining base file.

    .des
        Description file that contains descriptions of the data objects, name mapping objects, discretization objects, value mapping objects, taxonomy relation objects, and preprocessing objects. This file can be changed before the mining base is imported. See Using the Intelligent Miner for Data for an explanation of the description file.

    .X1, .X2, ..., .Xn
        Result files of the mining base. X1 to Xn correspond to the names of the result files resXn.dat.

exportWithoutResults
    Exports the mining base into the given path on the client. The results of the mining base are not exported. Different files are created, all beginning with the given filenameWithoutExtension. The created files have the following extensions:

    .des
        Description file that contains descriptions of the data objects, name mapping objects, discretization objects, value mapping objects, taxonomy relation objects, and processing objects. This file can be changed before the mining base is imported. See Using the Intelligent Miner for Data for an explanation of the description file.

import
    Imports the mining base from the given path. The mining base object that imports the mining base in the given path can either be a new mining base or an existing mining base. If it is an existing mining base and objects, for example, data objects, have the same names in the existing mining base and in the mining base to be imported, the names in the mining base to be imported get a suffix like _1 or _2 until all duplicate names have such a suffix. See Using the Intelligent Miner for Data for more detailed information.

    In the given path all files beginning with the given filenameWithoutExtension and the following extensions are used for importing:

    .des
        Description file that contains descriptions of the data objects, name mapping objects, discretization objects, value-mapping objects, taxonomy-relation objects, and processing objects. This file can be changed before the mining base is imported if the data sources are different, for example, different names and different locations. The field names and column definitions of the data sources must be the same. The data sources in this file are integrated into the mining base file during importing. See Using the Intelligent Miner for Data for an explanation of the description file and how to change it.

    .X1, .X2, ..., .Xn
        Result files of the exported mining base. These files are integrated into the mining base during importing.
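
The renaming of duplicate object names during import can be pictured with the following sketch. It is not the product's algorithm: the function uniqueImportName is hypothetical, and the exact numbering scheme is an assumption based only on the statement that duplicates get a suffix like _1 or _2.

```cpp
#include <set>
#include <sstream>
#include <string>

// Returns a name that does not collide with any name in `existing` by
// appending _1, _2, ... as needed, and records the chosen name in `existing`.
std::string uniqueImportName(std::set<std::string>& existing,
                             const std::string& name)
{
    std::string candidate = name;
    for (int n = 1; existing.count(candidate) != 0; ++n) {
        std::ostringstream os;
        os << name << '_' << n;          // try name_1, name_2, ...
        candidate = os.str();
    }
    existing.insert(candidate);
    return candidate;
}
```

Under this assumption, importing a data object named Customers into a mining base that already contains Customers yields Customers_1, and a further import of the same name yields Customers_2.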

getName
    Returns the name of the mining base.

setName
    Sets the name or updates the name of the mining base.

setComment
    Sets a comment for the mining base. The comment is only set in memory. When the mining base is saved, it is written to the mnbases.dat file. This file is contained in the idmmnb directory that is located in the directory identified by the IDM_MNB_DIR environment variable.

getComment
    Returns the comment of the mining base.

getCreationTimeStamp
    Retrieves the time stamp of the creation of the mining base.

getCreationTimeStampString
    Retrieves the time stamp of the creation of the mining base as a string according to the value of the LC_TIME locale.

getUpdateTimeStamp
    Retrieves the time stamp of the last update of the mining base.

getUpdateTimeStampString
    Retrieves the time stamp of the last update of the mining base as a string according to the value of the LC_TIME locale.

getMiningBaseNames
    Retrieves the names of all created mining bases and returns them in a sorted set.

getMiningBaseInfos
    Retrieves the names and comments of all created mining bases and returns them as objects of the class IDMMiningBaseInfo in a sorted set.

load
    Loads the mining base with the given name and its objects into main memory and returns a pointer to the mining base object. The mining base object is not deleted if an error occurred during loading. If pMiningBase is NULL, no memory was available to construct the object.

    Before loading a mining base, the method getMiningBaseNames can be called to get a sorted set of all available mining base names. The loaded mining base is accessed with write permission if it is not locked by another user. If it is locked by another user, the mining base is accessed with read permission. In this case the mining base can be changed temporarily, but for changes on the disk the method saveAs must be used.

loadFromString
    Loads the mining base from the string loadString and returns a pointer to the mining base. Note that the mining base has no name and is not saved on disk, but all methods like getElement or getExtend can be used to get data or settings information from the mining base.

    This method can be used by visualizers that use the Result API to display the result of a mining run and that also want to display the mining settings or the data source that were used for the mining run. The mining settings are saved in the result and can be retrieved by the method IDMDBasicDescrStatsResult::getMiningSettings( ). This method returns an IDMCHAR*, which is then input to loadFromString.

loadComment
    Reads the comment from the specified mining base.

queryPermission
    Returns the permission status of the mining base. IDM_READ means read permission, IDM_WRITE means write permission.

queryChange
    Returns a Boolean value that specifies whether the mining base has been changed after loading or creating. IDM_TRUE means that it has been changed. IDM_FALSE means that it has not been changed.

getRunningSettings
    Returns the currently running settings.
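
The permission rules described for load, save, and saveAs can be summarized in a toy model. The sketch below does not use the Intelligent Miner classes: MiningBaseLocks, LoadedBase, and the free functions save and saveAs are hypothetical stand-ins. That saveAs fails when the new name is already locked, and that it releases the lock on the old name, are assumptions made for the illustration.

```cpp
#include <set>
#include <string>

enum Permission { PERM_READ, PERM_WRITE };   // stand-ins for IDM_READ / IDM_WRITE

// Toy registry of the mining bases that are currently locked by a writer.
class MiningBaseLocks {
public:
    // The first loader of a name gets write permission; later loaders get read.
    Permission load(const std::string& name) {
        return locked_.insert(name).second ? PERM_WRITE : PERM_READ;
    }
    // Closing a mining base held with write permission releases its lock.
    void close(const std::string& name, Permission p) {
        if (p == PERM_WRITE) locked_.erase(name);
    }
private:
    std::set<std::string> locked_;
};

// A loaded mining base as one user sees it.
struct LoadedBase {
    std::string name;
    Permission permission;
};

// save is refused when the mining base was loaded with read permission.
bool save(const LoadedBase& mb) {
    return mb.permission == PERM_WRITE;
}

// saveAs stores the mining base under a new name. Afterwards the object
// represents the new mining base and is accessed with write permission.
bool saveAs(MiningBaseLocks& locks, LoadedBase& mb, const std::string& newName) {
    if (locks.load(newName) != PERM_WRITE) return false;   // assumption
    locks.close(mb.name, mb.permission);                   // assumption
    mb.name = newName;
    mb.permission = PERM_WRITE;
    return true;
}
```

A second user who loads a mining base that is already open therefore receives read permission, cannot call save successfully, and has to use saveAs with a new name to make changes persistent.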


The base class of the mining classes

The class IDMMiningClass is the abstract base class of all classes that have an extent in a mining base; that is, the mining data, the settings, and the result classes. Its main purpose is to collect the member variables and functions that are common to all these classes.

IDMMiningClass

Header file: idmcmcl.hpp

Format:

    typedef enum {
        IDM_NO_TYPE,
        IDM_DATA,
        IDM_NAME_MAPPING,
        IDM_TAXONOMY,
        IDM_ITEM_CATEGORY,
        IDM_TAXONOMY_RELATION,
        IDM_VALUE_MAPPING,
        IDM_DISCRETIZATION,
        IDM_ASSOC_SETTINGS,
        IDM_SEQ_PATTERN_SETTINGS,
        IDM_SIM_SEQ_SETTINGS,
        IDM_CLASSIFY_SETTINGS,
        IDM_CLUSTERING_SETTINGS,
        IDM_PREDICTION_SETTINGS,
        IDM_SEQUENCE,
        IDM_COPY_RECORDS_TO_FILE,
        IDM_AGGREGATE_VALUES,
        IDM_FILTER_FIELDS,
        IDM_CLEANUP_DATA_SOURCES,
        IDM_CALCULATE_VALUES,
        IDM_DISCRETIZATION_INTO_QUANTILES,
        IDM_DISCARD_RECORDS_WITH_MISSING_VALUES,
        IDM_DISCRETIZATION_USING_RANGES,
        IDM_ENCODE_MISSING_VALUES,
        IDM_ENCODE_NONVALID_VALUES,
        IDM_FILTER_RECORDS_USING_A_VALUE_SET,
        IDM_GROUP_RECORDS,
        IDM_RUN_SQL,
        IDM_JOIN_DATA_SOURCES,
        IDM_CONVERT_TO_LOWERCASE_OR_UPPERCASE,
        IDM_PIVOT_FIELDS_TO_RECORDS,
        IDM_FILTER_RECORDS,
        IDM_GET_RANDOM_SAMPLE,
        IDM_MAP_VALUES,
        IDM_DESC_QUANT_SAMPLE_SETTINGS,
        IDM_STAT_FACTOR_ANALYSIS,
        IDM_STAT_LINEAR_REGRESSION,
        IDM_STAT_PRIN_COM_ANALYSIS,
        IDM_STAT_UNIVARIATE_CURVE,
        IDM_RESULT_SET,
        IDM_RESULT
    } IDM_MiningClass;

    class IDMMiningClass : public IDMBase {
    protected:
        IDMRETURN ivRc;
        IString ivName;
        IDMText ivComment;
        IDMTimeStamp ivCreationTimeStamp;
        IDMTimeStamp ivUpdateTimeStamp;
        IDM_MiningClass ivClassType;
        IDMMiningBase *pivMiningBase;

    public:
        IString getName();
        virtual IDMRETURN setName(IString)=0;
        IDMMiningBase* getMiningBase();
        IDMRETURN setComment(IString);
        IString getComment();
        const IDMTimeStamp& getCreationTimeStamp();
        IString getCreationTimeStampString();
        const IDMTimeStamp& getUpdateTimeStamp();
        IString getUpdateTimeStampString();
        IDM_MiningClass getClassType();

        IDMRETURN get(IDM_MiningClass& classType,
                      IString& name,
                      IString& comment,
                      IDMTimeStamp& creationTimeStamp,
                      IDMTimeStamp& updateTimeStamp);

        IDMRETURN get(IDM_MiningClass& classType,
                      IString& name,
                      IString& comment,
                      IString& creationTimeStampString,
                      IString& updateTimeStampString);

        virtual const IDMException* getStaticException() const;
    };

Data members:

ivName The name of the mining class object. The name should not contain new line characters. For each derived class the name is used as a key in the corresponding extent of the mining base class (IDMMiningBase).

ivComment A string that can contain new lines. It can be used as a comment or as a description of the object.

ivCreationTimeStamp The time and date the object was created.

ivUpdateTimeStamp The time and date the object was last changed.

ivClassType The type of the derived class the object belongs to.

pivMiningBase A pointer to the mining base the object belongs to.

Member functions:

getName Retrieves the name of the object.

setName Sets the name of the object.

getMiningBase Returns a pointer to the mining base the object belongs to.

setComment Updates the comment for this object.

getComment Retrieves the comment for this object.

getCreationTimeStamp Returns the creation timestamp of this object.

getCreationTimeStampString Returns the creation timestamp of this object as a string in a format according to the NLS settings.

getUpdateTimeStamp Returns the timestamp of the last update of this object.

getUpdateTimeStampString Returns the timestamp of the last update of this object as a string in a format according to the NLS settings.

getClassType Returns the class type.

get Returns the members of IDMMiningClass.


Mining settings

The classes IDMData, IDMNameMapping, IDMValueMapping, IDMDiscretization, IDMItemCategory, IDMTaxonomyRelation, and IDMTaxonomy are described here. The correct definition of data objects and, if applicable, name mappings, value mappings, discretizations, and taxonomies are prerequisites for a successful mining run.


IDMData

Each of the mining settings classes and statistics function classes uses an IDMData object to hold the data table for the mining run.

Header file: idmcdat.hpp


Format:

    typedef enum {
        IDM_INPUT_OUTPUT,
        IDM_INPUT_ONLY,
        IDM_OUTPUT_ONCE
    } IDM_DataUseMode;

    class IDMData : public IDMMiningClass {
        IDMDataTable ivDataTable;
        IDM_DataUseMode ivUseMode;

    public:
        IDMData();
        IDMData( IDMRETURN &rc, IString name, IDMMiningBase *pMnb,
                 IDMDataTable &dataTable,
                 IDM_DataUseMode=IDM_INPUT_OUTPUT );
        IDMData( IDMRETURN &rc, IString name, IDMMiningBase *pMnb,
                 IDMDataTable *pDataTable,
                 IDM_DataUseMode=IDM_INPUT_OUTPUT );
        ~IDMData();

        static IDMRETURN createObject( IString name,
                                       IDMMiningBase *pMnb,
                                       IDMDataTable &dataTable,
                                       IDM_DataUseMode dataUseMode,
                                       IDMData *&pData );
        static IDMRETURN createObject( IString name,
                                       IDMMiningBase *pMnb,
                                       IDMDataTable *pDataTable,
                                       IDM_DataUseMode dataUseMode,
                                       IDMData *&pData );

        IDMRETURN deleteObject( );
        IDMRETURN update( IString name, IDMDataTable &dataTable,
                          IDM_DataUseMode=IDM_INPUT_OUTPUT );
        IDMRETURN update( IString name, IDMDataTable *pDataTable,
                          IDM_DataUseMode=IDM_INPUT_OUTPUT );
        IDMRETURN get( IString &name,
                       IDMMiningBase *&pMnb,
                       IDMDataTable &dataTable,
                       IDM_DataUseMode& );
        IDMRETURN get( IString &name,
                       IDMMiningBase *&pMnb,
                       IDMDataTable *&pDataTable,
                       IDM_DataUseMode& );

        IDMRETURN setDataUseMode(IDM_DataUseMode useMode);
        IDM_DataUseMode getDataUseMode();
    };

Data members:

ivDataTable A data table object describing the data source.

ivUseMode Specifies how the data object can be used:
- As an input and output data object (IDM_INPUT_OUTPUT)
- Only as an input data object (IDM_INPUT_ONLY)
- Only once as an output data object and then as input only (IDM_OUTPUT_ONCE)
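The three use modes can be pictured as a small state machine. The following C++ sketch is a reading aid only and is not part of the Intelligent Miner API: the names UseMode, DataObjectModel, canBeOutput, and markWritten are hypothetical, and the rule that an output-once object behaves like an input-only object after its first use as output is an interpretation of the description above.

```cpp
#include <cassert>

// Hypothetical stand-in for IDM_DataUseMode; not the library's enum.
enum class UseMode { InputOutput, InputOnly, OutputOnce };

// Hypothetical model of a data object that tracks how it may be used.
class DataObjectModel {
public:
    explicit DataObjectModel(UseMode mode) : mode_(mode) {}

    // Every mode allows the object to be read as mining input.
    bool canBeInput() const { return true; }

    // Output is allowed for InputOutput and for a still unused OutputOnce.
    bool canBeOutput() const {
        return mode_ == UseMode::InputOutput || mode_ == UseMode::OutputOnce;
    }

    // Records one use as output; an OutputOnce object becomes input-only.
    void markWritten() {
        assert(canBeOutput() && "object is not usable as output");
        if (mode_ == UseMode::OutputOnce) mode_ = UseMode::InputOnly;
    }

    UseMode mode() const { return mode_; }

private:
    UseMode mode_;
};
```

In this model, an object created with UseMode::OutputOnce accepts one call to markWritten() and reports canBeOutput() as false afterwards.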

Member functions:

IDMData The default constructor.

IDMData( IDMRETURN &rc, IString name, ...) Constructs a data object with a given name, mining base, and data table. The object is added to the IKeySortedSet collection of pointers to IDMData instances located in the mining base (class IDMMiningBase). If the return code is not equal to IDM_SUCCESS, an error occurred during construction. The IDMData object should be deleted using the method deleteObject().

~IDMData The destructor. If the object belongs to the data object extent of its mining base, it is removed from this IKeySortedSet. Furthermore, all objects that have this object as a data member are deleted. These objects belong to one of the IDMSettings subclasses.

createObject Constructs a data object with given values and returns it if no error occurred. The object is added to the IKeySortedSet collection of pointers to IDMData instances located in the mining base (class IDMMiningBase). If an error occurred, the object is deleted and pData is set to NULL.

deleteObject If the object is referenced by other objects, it returns IDM_ERROR. Otherwise, it calls the destructor.

update Updates the data members.

get Returns the name, the mining base, the data table, and the data use mode.

setDataUseMode Sets ivUseMode to the given value.

getDataUseMode Returns the value of ivUseMode.

IDMNameMapping

In transactions the items can be represented by numbers (like barcodes). To obtain a readable output you can provide a mapping of the identifiers of the items to meaningful names or descriptions. This is called name mapping.

The class IDMNameMapping defines name mappings on categorical fields.
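The idea of a name mapping can be illustrated with a small stand-alone C++ sketch. It does not use the Intelligent Miner classes: NameMappingModel, add, and describe are hypothetical names, the pairs are kept in memory instead of being read from the item field and description field of a data table, and the fallback to the raw item code for unmapped items is an assumption made for the example.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical in-memory name mapping: item code -> readable description.
class NameMappingModel {
public:
    // Registers the description for one item code.
    void add(const std::string& itemCode, const std::string& description) {
        assert(!itemCode.empty() && "item code must not be empty");
        names_[itemCode] = description;
    }

    // Returns the description, or the raw item code if none is mapped
    // (an assumption of this sketch).
    std::string describe(const std::string& itemCode) const {
        std::map<std::string, std::string>::const_iterator it =
            names_.find(itemCode);
        return it == names_.end() ? itemCode : it->second;
    }

private:
    std::map<std::string, std::string> names_;
};
```

A visualizer built on such a model would call describe() for every item identifier before printing a rule, so that barcodes appear as product names wherever a mapping exists.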

Header file: idmcnmp.hpp

Format:

    class IDMNameMapping : public IDMMiningClass {
        IDMDataTable ivDataTable;
        IString ivItemField;
        IString ivDescriptionField;

    public:
        IDMNameMapping();
        IDMNameMapping( IDMRETURN &rc,
                        IString name,
                        IDMMiningBase *pMiningBase,
                        IDMDataTable &dataTable,
                        IString itemField,
                        IString descriptionField );
        IDMNameMapping( IDMRETURN &rc,
                        IString name,
                        IDMMiningBase *pMiningBase,
                        IDMDataTable *pDataTable,
                        IString itemField,
                        IString descriptionField );
        ~IDMNameMapping();

        static IDMRETURN createObject( IString name,
                                       IDMMiningBase *pMiningBase,
                                       IDMDataTable &dataTable,
                                       IString itemField,
                                       IString descriptionFields,
                                       IDMNameMapping *&pNameMapping );
        static IDMRETURN createObject( IString name,
                                       IDMMiningBase *pMiningBase,
                                       IDMDataTable *pDataTable,
                                       IString itemField,
                                       IString descriptionFields,
                                       IDMNameMapping *&pNameMapping );

        IDMRETURN deleteObject( );
        IDMRETURN update( IString name,
                          IDMDataTable &dataTable,
                          IString itemField,
                          IString descriptionField );
        IDMRETURN update( IString name,
                          IDMDataTable *pDataTable,
                          IString itemField,
                          IString descriptionField );
        IDMRETURN get( IString &name,
                       IDMMiningBase *&pMiningBase,
                       IDMDataTable &dataTable,
                       IString &itemField,
                       IString &descriptionFields );
        IDMRETURN get( IString &name,
                       IDMMiningBase *&pMiningBase,
                       IDMDataTable *&pDataTable,
                       IString &itemField,
                       IString &descriptionFields );
    };

Data members:

ivDataTable A data table object describing the name mapping data source.

ivItemField A data field name of the data table specifying the field where the item code is located.

ivDescriptionField A data field name of the data table specifying the field where the description of the item code is located.

Member functions:

IDMNameMapping The default constructor.

IDMNameMapping( IDMRETURN &rc, IString name, ...) Constructs a name mapping object with a given name, mining base, data table, item field, and description field. The object is added to the IKeySortedSet collection of pointers to IDMNameMapping instances located in the mining base (class IDMMiningBase). If the return code is not equal to IDM_SUCCESS, an error occurred during construction. Delete the IDMNameMapping object using the method deleteObject().

~IDMNameMapping The destructor. If the object belongs to the name mapping object extent of its mining base, it is removed from this IKeySortedSet. Furthermore, all pointers in other objects to this object are set to NULL. Pointers to name mapping objects occur in IDMItemCategory and IDMDataField objects.

createObject Constructs a name mapping object with given values and returns it if no error occurred. The object is added to the IKeySortedSet collection of pointers to IDMNameMapping instances located in the mining base (class IDMMiningBase). If an error occurred, the object is deleted and pNameMapping is set to NULL.

update Updates the name, the data table, the item field, and the description field.

get Returns the name, the mining base, the data table, the item field, and the description field.

IDMValueMapping

A value mapping maps discrete values to other values. It is defined by a table that consists of one or multiple argument columns and a value column. For example, the days of the week can be mapped to integers by a table where the argument column contains the names for the week days (like Monday, Tuesday) and the value column the corresponding number (for example, 1, 2).

The class IDMValueMapping describes a table that defines a value mapping.
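The table lookup behind a value mapping can be sketched as follows. The sketch is self-contained and does not use the Intelligent Miner classes: ValueMappingModel, addRow, and lookup are hypothetical names, and values are held as strings for simplicity. Each row consists of one or more argument values and the value they map to.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of a value mapping table with a fixed number of
// argument columns and one value column.
class ValueMappingModel {
public:
    explicit ValueMappingModel(std::size_t argCount) : argCount_(argCount) {
        assert(argCount_ >= 1 && "at least one argument column is required");
    }

    // Adds one table row; returns false if the number of arguments
    // does not match the number of argument columns.
    bool addRow(const std::vector<std::string>& args,
                const std::string& value) {
        if (args.size() != argCount_) return false;
        rows_[args] = value;
        return true;
    }

    // Looks up the mapped value; returns false if no row matches.
    bool lookup(const std::vector<std::string>& args,
                std::string& value) const {
        std::map<std::vector<std::string>, std::string>::const_iterator it =
            rows_.find(args);
        if (it == rows_.end()) return false;
        value = it->second;
        return true;
    }

private:
    std::size_t argCount_;
    std::map<std::vector<std::string>, std::string> rows_;
};
```

For the weekday example, a model with one argument column would hold the rows (Monday, 1) and (Tuesday, 2); a lookup for a day that has no row reports failure instead of inventing a value.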

Header file: idmcvmp.hpp

Format:

    class IDMValueMapping : public IDMMiningClass {
        IDMDataTable ivDefinitionTable;
        ISequence<IString> ivArgFieldNames;
        IString ivValueFieldName;

    public:
        IDMValueMapping();
        IDMValueMapping(IDMRETURN &rc,
                        IString name,
                        IDMMiningBase *pMnb,
                        IDMDataTable &definitionTable,
                        ISequence<IString> &argFieldNames,
                        IString valueFieldName);
        IDMValueMapping(IDMRETURN &rc,
                        IString name,
                        IDMMiningBase *pMnb,
                        IDMDataTable *pDefinitionTable,
                        ISequence<IString> &argFieldNames,
                        IString valueFieldName);

        IDMRETURN createObject(IString name,
                               IDMMiningBase *pMnb,
                               IDMDataTable &definitionTable,
                               ISequence<IString> &argFieldNames,
                               IString valueFieldName,
                               IDMValueMapping *&pValueMapping);
        IDMRETURN createObject(IString name,
                               IDMMiningBase *pMnb,
                               IDMDataTable *pDefinitionTable,
                               ISequence<IString> &argFieldNames,
                               IString valueFieldName,
                               IDMValueMapping *&pValueMapping);

        ~IDMValueMapping();
        IDMRETURN deleteObject();
        IDMRETURN update(IString name,
                         IDMDataTable &definitionTable,
                         ISequence<IString> &argFieldNames,
                         IString valueFieldName);
        IDMRETURN update(IString name,
                         IDMDataTable *pDefinitionTable,
                         ISequence<IString> &argFieldNames,
                         IString valueFieldName);
        IDMRETURN get(IString& name,
                      IDMMiningBase *&pMnb,
                      IDMDataTable &definitionTable,
                      ISequence<IString> &argFieldNames,
                      IString &valueFieldName);
        IDMRETURN get(IString& name,
                      IDMMiningBase *&pMnb,
                      IDMDataTable *&pDefinitionTable,
                      ISequence<IString> &argFieldNames,
                      IString &valueFieldName);
    };

Data members:

ivDefinitionTable The data table defining the value mapping.

ivArgFieldNames The names of the argument fields of this table.

ivValueFieldName The name of the value field of this table.

Member functions:

IDMValueMapping() The default constructor.

IDMValueMapping(IDMRETURN &rc, IString name, ...) Initializes the member variables with the given input parameters. Furthermore, it adds the object to the value mapping extent of the mining base it belongs to, if the return code is equal to IDM_SUCCESS or IDM_WARNING.

~IDMValueMapping() The destructor. If the object belongs to the value mapping object extent of its mining base, it is removed from this IKeySortedSet. Furthermore, all pointers in other objects to this object are set to NULL. Pointers to value mapping objects occur in IDMValueMappingField objects.

createObject Creates a value mapping object and returns it if no error occurs. Furthermore, it adds the object to the value mapping extent of the mining base it belongs to. If an error occurred, it returns a NULL pointer.

deleteObject Checks whether there are fields defined by this value mapping. If this is the case, it returns IDM_ERROR. Otherwise it calls the destructor.

update Updates the values of the member variables.


IDMDiscretization

Real values can be discretized; for example, the real value range can be split into intervals and every interval can be mapped to a discrete value. Such discretization functions are described by discretization objects. They are represented by a three-column table where each row represents an interval. This table consists of the following fields:

- A boundary field, indicating the boundary of the interval; the boundaries have to be in ascending order; also, duplicate boundary values are not allowed.
- A field indicating which interval the boundary belongs to: “<” means that the boundary belongs to the interval of the next row, other values like “<=” mean that the boundary belongs to the interval of the current row.
- A value field with the associated discrete value for that interval.

Because n boundaries split the range of real numbers into n+1 intervals, the last row of this table contains an entry for the value field only.

For example, on a highway a speed below 80 km/h might be considered as slow, between 80 km/h and 120 km/h as medium, and above 120 km/h as high. This can be represented by the following discretization table:

    Boundary   Interval flag   Value
    80         <               slow
    120        <=              medium
                               high

The class IDMDiscretization describes a table that defines a discretization function.
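How such a table is evaluated can be shown with a short stand-alone C++ sketch. It is not the Intelligent Miner implementation: BoundaryRow, DiscretizationModel, and discretize are hypothetical names, the interval flag is reduced to a Boolean, and the table is held in memory rather than in a data table.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical row of a discretization table.
struct BoundaryRow {
    double boundary;         // upper boundary of this row's interval
    bool boundaryInNextRow;  // true for flag "<", false for flags like "<="
    std::string value;       // discrete value of this row's interval
};

// Hypothetical model: n boundary rows plus the value of the last,
// unbounded interval (the row that has a value field only).
struct DiscretizationModel {
    std::vector<BoundaryRow> rows;  // boundaries in strictly ascending order
    std::string lastValue;
};

// Maps a real number to the discrete value of the interval it falls into.
std::string discretize(const DiscretizationModel& d, double x) {
    for (std::size_t i = 0; i < d.rows.size(); ++i) {
        // The table must be sorted and free of duplicate boundaries.
        assert(i == 0 || d.rows[i - 1].boundary < d.rows[i].boundary);
        const BoundaryRow& row = d.rows[i];
        // With "<" the boundary itself belongs to the next row, so this
        // row covers x < boundary; otherwise it covers x <= boundary.
        bool inRow = row.boundaryInNextRow ? (x < row.boundary)
                                           : (x <= row.boundary);
        if (inRow) return row.value;
    }
    return d.lastValue;
}
```

With the highway table, a speed of exactly 80 km/h is classified as medium because the flag “<” assigns the boundary to the next row, and a speed of exactly 120 km/h is also medium because the flag “<=” keeps the boundary in the current row.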

Header file: idmcdis.hpp

Format:

    class IDMDiscretization : public IDMMiningClass {
        IDMDataTable ivDefinitionTable;
        IString ivBoundaryFieldName;
        IString ivFlagFieldName;
        IString ivValueFieldName;

    public:
        IDMDiscretization(IDMRETURN& rc,
                          IString name,
                          IDMMiningBase *pMnb,
                          IDMDataTable &definitionTable,
                          IString boundaryFieldName,
                          IString flagFieldName,
                          IString valueFieldName);
        IDMDiscretization(IDMRETURN& rc,
                          IString name,
                          IDMMiningBase *pMnb,
                          IDMDataTable *pDefinitionTable,
                          IString boundaryFieldName,
                          IString flagFieldName,
                          IString valueFieldName);

        IDMRETURN createObject(IString functionName,
                               IDMMiningBase *pMnb,
                               IDMDataTable &definitionTable,
                               IString boundaryFieldName,
                               IString flagFieldName,
                               IString valueFieldName,
                               IDMDiscretization *&pDiscr );
        IDMRETURN createObject(IString functionName,
                               IDMMiningBase *pMnb,
                               IDMDataTable *pDefinitionTable,
                               IString boundaryFieldName,
                               IString flagFieldName,
                               IString valueFieldName,
                               IDMDiscretization *&pDiscr );

        ~IDMDiscretization();
        IDMRETURN deleteObject();
        IDMRETURN update(IString name,
                         IDMDataTable &definitionTable,
                         IString boundaryFieldName,
                         IString flagFieldName,
                         IString valueFieldName);
        IDMRETURN update(IString name,
                         IDMDataTable *pDefinitionTable,
                         IString boundaryFieldName,
                         IString flagFieldName,
                         IString valueFieldName);
        IDMRETURN get(IString &name,
                      IDMMiningBase *&pMnb,
                      IDMDataTable &definitionTable,
                      IString &boundaryFieldName,
                      IString &flagFieldName,
                      IString &valueFieldName);
        IDMRETURN get(IString &name,
                      IDMMiningBase *&pMnb,
                      IDMDataTable *&pDefinitionTable,
                      IString &boundaryFieldName,
                      IString &flagFieldName,
                      IString &valueFieldName);
    };

Data members:

ivDefinitionTable The data table defining the discretization.

ivBoundaryFieldName The name of the boundary field of the data table.

ivFlagFieldName The field whose value indicates to which interval the boundary value belongs.

ivValueFieldName The name of the value field of the data table.

Member functions:

IDMDiscretization() The default constructor.

IDMDiscretization(IDMRETURN &rc, IString name, ...) Initializes the member variables with the given input parameters. Furthermore, the object is added to the discretization extent of the mining base it belongs to, if the return code is equal to IDM_SUCCESS or IDM_WARNING.

~IDMDiscretization() The destructor. If the object belongs to the discretization object extent of its mining base, it is removed from this IKeySortedSet. Furthermore, all pointers in other objects to this object are set to NULL. Pointers to discretization objects occur in IDMDiscretizationField objects.

createObject Creates a discretization object and returns it if no error occurs. Furthermore, the object is added to the discretization extent of the mining base it belongs to. If an error occurred, it returns a NULL pointer.

deleteObject Checks whether there are fields defined by this discretization object. If this is the case, it returns IDM_ERROR. Otherwise, it calls the destructor.

update Updates the field with the values of the given parameters.


IDMItemCategory

An item category consists of a name and an optional name-mapping object. Item categories are global within a mining base; that is, an IKeySortedSet collection of item category objects is located in the appropriate mining-base object.

Header file: idmctax.hpp

Format:

    class IDMItemCategory : public IDMMiningClass {
        IDMNameMapping *pivNameMapping;

    public:
        IDMItemCategory( );
        IDMItemCategory( IDMRETURN &rc,
                         IString name,
                         IDMMiningBase *pMiningBase,
                         IDMNameMapping *pNameMapping );
        ~IDMItemCategory( );

        static IDMRETURN createObject( IString name,
                                       IDMMiningBase *pMiningBase,
                                       IDMNameMapping *pNameMapping,
                                       IDMItemCategory *&pItemCategory );

        IDMRETURN deleteObject();
        IDMRETURN update( IString name,
                          IDMNameMapping *pNameMapping );
        IDMRETURN get( IString &name, IDMMiningBase *&pMiningBase,
                       IDMNameMapping *&pNameMapping );
    };


Data members:

pivNameMapping A pointer to an optional name-mapping object.

Member functions:

IDMItemCategory The default constructor.

IDMItemCategory( IDMRETURN &rc, IString name, ...) Constructs an item category object with a given name, mining base, and name-mapping object. The object is added to the IKeySortedSet collection of pointers to IDMItemCategory instances located in the mining base (class IDMMiningBase). If the return code is not equal to IDM_SUCCESS, an error occurred during construction. The IDMItemCategory object should be deleted using the method deleteObject().

~IDMItemCategory( ) The destructor. If the object belongs to the item category object extent of its mining base, it is removed from this IKeySortedSet. Furthermore, all objects that reference this object are deleted. These objects belong to the IDMTaxonomy and IDMTaxonomyRelation classes.

createObject Constructs an item category object with given values and returns it if no error occurred. The object is added to the IKeySortedSet collection of pointers to IDMItemCategory instances located in the mining base (class IDMMiningBase). If an error occurred, the object is deleted and pItemCategory is set to NULL.

update Updates the name and the name-mapping object.

get Returns the name, the mining base, and the name-mapping object.

IDMTaxonomyRelation

Items, like specific articles, can be classified into different categories. For example, a bottle of Riesling from the Rheingau can be classified as a bottle of white wine or as a bottle of German wine. If such categories are not taken into account during mining, interesting correlations might be missed. This would mean that the Associations mining function could not find out that white wine sells together with seafood if the mining data contains only items like mussels, salmon, red snapper, Chablis, or Rhein Riesling.

A taxonomy object represents such a hierarchy or lattice of item categories. Because lattices are allowed, one category may have multiple parent categories. This is the case, for example, for the Rheingau Riesling that has white wine and German wine as parent categories.

A taxonomy relation defines the relation between a child and a parent-item category. Taxonomy relations are global within a mining base; that is, an IKeySortedSet collection of taxonomy relation objects is located in the appropriate mining base object.
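The effect of child-parent relations on mining can be illustrated by collecting all categories an item generalizes to. The following C++ sketch is self-contained and does not use the Intelligent Miner classes: TaxonomyModel, addRelation, and ancestors are hypothetical names, and relations are stored per item instead of being described by a data table with a child field and a parent field.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: each relation links a child item or category to
// one parent category; a child may have several parents (a lattice).
class TaxonomyModel {
public:
    void addRelation(const std::string& child, const std::string& parent) {
        assert(child != parent && "an item cannot be its own parent");
        parents_[child].insert(parent);
    }

    // Collects every category reachable from the item via parent links.
    std::set<std::string> ancestors(const std::string& item) const {
        std::set<std::string> seen;
        std::vector<std::string> todo(1, item);
        while (!todo.empty()) {
            std::string current = todo.back();
            todo.pop_back();
            std::map<std::string, std::set<std::string> >::const_iterator it =
                parents_.find(current);
            if (it == parents_.end()) continue;
            for (std::set<std::string>::const_iterator p = it->second.begin();
                 p != it->second.end(); ++p) {
                // Visit each parent only once, even if reached twice.
                if (seen.insert(*p).second) todo.push_back(*p);
            }
        }
        return seen;
    }

private:
    std::map<std::string, std::set<std::string> > parents_;
};
```

If the Rheingau Riesling is related to both white wine and German wine, ancestors() returns both categories, so a transaction containing the Riesling can also be counted as a white wine purchase.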



Format:

    class IDMTaxonomyRelation : public IDMMiningClass {
        IDMItemCategory *pivChildItem;
        IDMItemCategory *pivParentItem;
        IDMDataTable ivDataTable;
        IString ivChildItemField;
        IString ivParentItemField;

    public:
        IDMTaxonomyRelation( );
        IDMTaxonomyRelation( IDMRETURN &rc,
                             IDMItemCategory *pChildItem,
                             IDMItemCategory *pParentItem,
                             IDMMiningBase *pMiningBase,
                             IDMDataTable &dataTable,
                             IString childItemField,
                             IString parentItemField );
        IDMTaxonomyRelation( IDMRETURN &rc,
                             IDMItemCategory *pChildItem,
                             IDMItemCategory *pParentItem,
                             IDMMiningBase *pMiningBase,
                             IDMDataTable *pDataTable,
                             IString childItemField,
                             IString parentItemField );
        ~IDMTaxonomyRelation( );

        static IDMRETURN createObject( IDMItemCategory *pChildItem,
                                       IDMItemCategory *pParentItem,
                                       IDMMiningBase *pMiningBase,
                                       IDMDataTable &dataTable,
                                       IString childItemField,
                                       IString parentItemField,
                                       IDMTaxonomyRelation *&pTaxonomyRelation );
        static IDMRETURN createObject( IDMItemCategory *pChildItem,
                                       IDMItemCategory *pParentItem,
                                       IDMMiningBase *pMiningBase,
                                       IDMDataTable *pDataTable,
                                       IString childItemField,
                                       IString parentItemField,
                                       IDMTaxonomyRelation *&pTaxonomyRelation );

        IDMRETURN deleteObject();
        IDMRETURN update( IDMDataTable &dataTable,
                          IString childItemField,
                          IString parentItemField );
        IDMRETURN update( IDMDataTable *pDataTable,
                          IString childItemField,
                          IString parentItemField );
        IDMRETURN get( IDMItemCategory *&pChildItem,
                       IDMItemCategory *&pParentItem,
                       IDMMiningBase *&pMiningBase,
                       IDMDataTable &dataTable,
                       IString &childItemField,
                       IString &parentItemField );
        IDMRETURN get( IDMItemCategory *&pChildItem,
                       IDMItemCategory *&pParentItem,
                       IDMMiningBase *&pMiningBase,
                       IDMDataTable *&pDataTable,
                       IString &childItemField,
                       IString &parentItemField );
    };

Data members:


pivChildItem A pointer to the child item of the taxonomy relation. Together with pivParentItem it is the key of the IKeySortedSet collection of IDMTaxonomyRelation instances in class IDMMiningBase.

pivParentItem A pointer to the parent item of the taxonomy relation.

ivDataTable A data table object describing the taxonomy relation data source.

ivChildItemField A data field name of the data table specifying the field where the child item is located.

ivParentItemField A data field name of the data table specifying the field where the parent item is located.

Member functions:

IDMTaxonomyRelation The default constructor.

IDMTaxonomyRelation( IDMRETURN &rc, ...) Constructs a taxonomy relation object with given values for the private member variables. The object is added to the IKeySortedSet collection of pointers to IDMTaxonomyRelation instances located in the mining base (class IDMMiningBase). If the return code is not equal to IDM_SUCCESS, an error occurred during construction. The IDMTaxonomyRelation object should be deleted using the method deleteObject().

~IDMTaxonomyRelation( ) The destructor. If the object belongs to the taxonomy relation object extent of its mining base, it is removed from this IKeySortedSet. Furthermore, all objects that have this object as a data member are deleted. These objects belong to the IDMTaxonomy class.

createObject Constructs a taxonomy relation object with given values and returns it if no error occurred. The object is added to the IKeySortedSet collection of pointers to IDMTaxonomyRelation instances located in the mining base (class IDMMiningBase). If an error occurred, the object is deleted and pTaxonomyRelation is set to NULL.

update Updates the child item field, the parent item field, and the data table.

get Returns the values of the private member variables of the taxonomy relation object.

IDMTaxonomy

A taxonomy is a directed acyclic graph that consists of nodes and directed arcs. A node represents an item category and an arc a taxonomy relation.

A taxonomy object consists of a set of taxonomy relation objects.
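Because a taxonomy must be acyclic, an application that assembles taxonomy relations itself may want to validate them first. The following stand-alone C++ sketch shows one way to do this with Kahn's algorithm. It is an illustration only: Arc and isAcyclic are hypothetical names, categories are identified by plain strings, and the source does not state whether or how the Intelligent Miner performs such a check.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical arc of a taxonomy graph: first = child, second = parent.
typedef std::pair<std::string, std::string> Arc;

// Returns true if the arcs form a directed acyclic graph. Kahn's
// algorithm repeatedly removes nodes that have no incoming arc.
bool isAcyclic(const std::vector<Arc>& arcs) {
    std::map<std::string, std::vector<std::string> > out;  // child -> parents
    std::map<std::string, int> inDegree;                   // incoming arcs
    for (std::size_t i = 0; i < arcs.size(); ++i) {
        assert(!arcs[i].first.empty() && !arcs[i].second.empty());
        out[arcs[i].first].push_back(arcs[i].second);
        inDegree[arcs[i].second] += 1;
        inDegree[arcs[i].first] += 0;  // registers the child node as well
    }
    std::vector<std::string> ready;
    for (std::map<std::string, int>::const_iterator it = inDegree.begin();
         it != inDegree.end(); ++it) {
        if (it->second == 0) ready.push_back(it->first);
    }
    std::size_t removed = 0;
    while (!ready.empty()) {
        std::string node = ready.back();
        ready.pop_back();
        ++removed;
        const std::vector<std::string>& next = out[node];
        for (std::size_t i = 0; i < next.size(); ++i) {
            if (--inDegree[next[i]] == 0) ready.push_back(next[i]);
        }
    }
    // All nodes can be removed only if no cycle blocks the process.
    return removed == inDegree.size();
}
```

A lattice in which one child has two parents that share a common parent passes this check, whereas a chain of relations that leads back to its starting category does not.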



Format:

    class IDMTaxonomy : public IDMMiningClass {
        ISortedRelation < IDMTaxonomyRelation*, IString > ivTaxonomyRelations;

    public:
        IDMTaxonomy( );
        IDMTaxonomy( IDMRETURN &rc, IString name,
                     IDMMiningBase *pMiningBase,
                     ISortedRelation < IDMTaxonomyRelation*, IString > &taxonomyRelations );
        ~IDMTaxonomy( );

        static IDMRETURN createObject( IString name,
                                       IDMMiningBase *pMiningBase,
                                       ISortedRelation < IDMTaxonomyRelation*, IString > &taxonomyRelations,
                                       IDMTaxonomy *&pTaxonomy );

        IDMRETURN update( IString name,
                          ISortedRelation < IDMTaxonomyRelation*, IString > &taxonomyRelations );
        IDMRETURN get( IString &name, IDMMiningBase *&pMiningBase,
                       ISortedRelation < IDMTaxonomyRelation*, IString > &taxonomyRelations );

        IDMRETURN addRelation( IDMTaxonomyRelation *&pTaxonomyRelation );
        IDMRETURN removeRelation( IDMTaxonomyRelation *&pTaxonomyRelation );
    };

Data members:

ivTaxonomyRelations A SortedRelation collection of taxonomy relation objects.

Member functions:

IDMTaxonomy The default constructor.

IDMTaxonomy( IDMRETURN &rc, IString name, ...) Constructs a taxonomy object with a given name, mining base, and set of taxonomy relation objects. The object is added to the IKeySortedSet collection of pointers to IDMTaxonomy instances located in the mining base (class IDMMiningBase). If the return code rc is not equal to IDM_SUCCESS, an error occurred during construction. The IDMTaxonomy object should be deleted using the method deleteObject().

~IDMTaxonomy( ) The destructor. If the object belongs to the taxonomy object extent of its mining base, it is removed from this IKeySortedSet. Furthermore, all pointers in other objects to this object are set to NULL. Pointers to taxonomy objects occur in IDMAssocSettings and IDMSeqPatternSettings objects.

createObject Constructs a taxonomy object with given values and returns it if no error occurred. The object is added to the IKeySortedSet collection of pointers to IDMTaxonomy instances located in the mining base (class IDMMiningBase). If an error occurred, the object is deleted and pTaxonomy is set to NULL.

update Updates the name and the SortedRelation collection of taxonomy relation objects. To update the taxonomy relation objects, you must get the whole collection and then change the appropriate elements of the collection.

get Returns the name, the mining base, and the SortedRelation collection of taxonomy relation objects.

addRelation Adds a taxonomy relation to the SortedRelation collection.

removeRelation Removes a taxonomy relation from the SortedRelation collection.

Mining run selection

Data selections allow the selection of a part of the data specified by a data object for mining. For example, if you have sales transaction data from a certain period of time and there are month and day data fields in the data tables of the corresponding data object, data selection allows you to mine only on the transactions coming from certain months or certain days.

The selection conditions are specified in disjunctive normal form; that is, they are a disjunction of conjunctions of atomic selections. Atomic selections consist of:

A sign Indicating whether the selection condition is negated.

A predicate The name of the selection predicate.

A list of names of data fields The arguments of the selection predicate.

A set of predefined predicates (like equal, less than, greater than) is provided.

A partition of a data table, on which a mining run is to be done, can be specified using the following selection classes:

IDMAtomicSelection

The class IDMAtomicSelection defines atomic selection conditions for the values of data fields. More complex selections are formed by using the IDMSelections and IDMAndSelections classes.

Header file: idmcsel.hpp

Format:

    class IDMAtomicSelection : public IDMBase {
        IDMBOOLEAN ivSign;
        IString ivPredicate;
        ISequence<IString> ivArgFields;

    public:
        IDMAtomicSelection();
        IDMAtomicSelection( IDMRETURN &rc, IDMBOOLEAN sign,
                            IString predicate,
                            ISequence<IString> &argFields);
        IDMAtomicSelection( IDMRETURN &rc,
                            IDMBOOLEAN sign,
                            IString predicate,
                            IString argField1,
                            IString argField2);
        ~IDMAtomicSelection();

        IDMRETURN update( IDMBOOLEAN sign,
                          IString predicate,
                          ISequence<IString> &argFields);
        IDMRETURN get( IDMBOOLEAN &sign,
                       IString& predicate,
                       ISequence<IString> &argFields);

        static IString encode( IString constStr );
        static IString encode( IDMREAL constReal );
        static IDMBOOLEAN decode( const IString encodedString,
                                  IString &decodedString,
                                  IDMBOOLEAN &isConstant,
                                  IDM_FieldDataType &fDataType );

        static ISet<IString> getPredicateNames();
        virtual const IDMException* getStaticException() const;
    };

Data members:

ivSign The sign indicating whether the predicate should be negated (ivSign == IDM_FALSE) or not (ivSign == IDM_TRUE).

ivPredicate The name of the predicate.

ivArgFields The names of the argument fields. Constants can be encoded as a string using the static encode member functions.

Member functions:

IDMAtomicSelection The default constructor.

IDMAtomicSelection(IDMRETURN &rc, IDMBOOLEAN sign, ...) Initializes the member variables with the given input variables. It checks if the predicate exists. The compatibility of the signature with the types of the argument fields is checked by the methods of the settings objects that have selections as a member variable.

~IDMAtomicSelection The destructor.

update Changes the values of the private data members of the atomic selection object.

get Returns the values of the private data members of the atomic selection object.

encode( IString constStr ) Encodes a string argument value as a string.

encode( IDMREAL constReal ) Encodes an integer or real argument value as a string.

decode Decodes an encoded argument. IDM_TRUE is returned if the encodedString can be decoded. For constants, fDataType is set to the field data type (IDM_REAL_TYPE or IDM_STRING_TYPE) that corresponds to the data type of the constant.

getPredicateNames Retrieves the set of all available predicates. Currently, these are the basic predicates of DB2: =, <>, >, >=, <, <=. All of them have an arbitrary number of arguments.


IDMAndSelections

Defines the conjunction of a sequence of IDMAtomicSelection instances.

Format: typedef ISequence<IDMAtomicSelection> IDMAndSelections;

IDMSelections

Defines the disjunction of a sequence of IDMAndSelections instances.

Format: typedef ISequence<IDMAndSelections> IDMSelections;
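The nesting of the three selection types mirrors the disjunctive normal form: a record is selected if at least one conjunction is satisfied, and a conjunction is satisfied if all of its atomic selections hold. The following stand-alone C++ sketch evaluates such a structure against one record. It is a simplified model, not the Intelligent Miner implementation: Record, AtomicSel, AndSel, Selections, holds, and selects are hypothetical names, only two-argument comparisons between numeric fields are modeled, encoded constants are left out, and negation is stored as an explicit flag rather than as the sign value used by IDMAtomicSelection.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical record: field name -> numeric value.
typedef std::map<std::string, double> Record;

// Hypothetical atomic selection that compares two fields of a record.
struct AtomicSel {
    bool negated;           // true if the predicate result is inverted
    std::string predicate;  // one of =, <>, >, >=, <, <=
    std::string left;       // name of the first argument field
    std::string right;      // name of the second argument field
};

typedef std::vector<AtomicSel> AndSel;   // conjunction of atomic selections
typedef std::vector<AndSel> Selections;  // disjunction of conjunctions

// Evaluates one atomic selection; unknown predicates count as false.
bool holds(const AtomicSel& s, const Record& r) {
    assert(r.count(s.left) == 1 && r.count(s.right) == 1);
    double a = r.find(s.left)->second;
    double b = r.find(s.right)->second;
    bool result = false;
    if (s.predicate == "=") result = (a == b);
    else if (s.predicate == "<>") result = (a != b);
    else if (s.predicate == ">") result = (a > b);
    else if (s.predicate == ">=") result = (a >= b);
    else if (s.predicate == "<") result = (a < b);
    else if (s.predicate == "<=") result = (a <= b);
    return s.negated ? !result : result;
}

// A record is selected if at least one conjunction is fully satisfied.
bool selects(const Selections& dnf, const Record& r) {
    for (std::size_t i = 0; i < dnf.size(); ++i) {
        bool all = true;
        for (std::size_t j = 0; j < dnf[i].size(); ++j) {
            if (!holds(dnf[i][j], r)) { all = false; break; }
        }
        if (all) return true;
    }
    return false;
}
```

In this sketch an empty disjunction selects no record; whether the product treats missing selections the same way is not stated in the source.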

Item constraints

Item constraints are used for the Associations mining function only for specifying the items that should appear in the association rules or sequential patterns. Item constraints are a disjunction of conjunctions of atomic constraints.

IDMAtomicConstraint

The class IDMAtomicConstraint defines atomic constraints the items in a rule should satisfy. More complex constraints are formed by using the IDMItemConstraints and IDMAndConstraints classes.

Header file: idmcitem.hpp

Format:

    typedef enum { IDM_EQUAL } IDMConstraintOperator;

    class IDMAtomicConstraint : public IDMBase {
        IDMBOOLEAN ivSign;
        IDMConstraintOperator ivOperator;
        IString ivValue;

    public:
        IDMAtomicConstraint();
        IDMAtomicConstraint(IDMRETURN &rc,
                            IDMBOOLEAN sign,
                            IDMConstraintOperator op,
                            IString value );
        ~IDMAtomicConstraint();

        IDMRETURN update(IDMBOOLEAN sign,
                         IDMConstraintOperator op,
                         IString value);
        IDMRETURN get(IDMBOOLEAN &sign,
                      IDMConstraintOperator& op,
                      IString& value);

        virtual const IDMException* getStaticException() const;
    };

Data members:

ivSign The sign indicating whether the constraint operator should be negated (ivSign == IDM_FALSE) or not (ivSign == IDM_TRUE).

ivOperator The constraint operator. For Version 2, only equality is supported.

ivValue The value the items should be compared with.

Member functions:

IDMAtomicConstraint
The default constructor.

IDMAtomicConstraint(IDMRETURN &rc, IDMBOOLEAN sign, ...)
Initializes the member variables with the given input variables.

~IDMAtomicConstraint
The destructor.

update
Changes the values of the private data members of the atomic constraint object.

get Returns the values of the private data members of the atomic constraint object.


IDMAndConstraints

Defines the conjunction of a sequence of IDMAtomicConstraints instances.

Format:
typedef ISequence<IDMAtomicConstraints> IDMAndConstraints;

IDMConstraints

Defines the disjunction of a sequence of IDMAndConstraints instances.

Format:
typedef ISequence<IDMAndConstraints> IDMConstraints;
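
The structure of item constraints, a disjunction of conjunctions of atomic constraints, can be sketched with standard containers. The sketch is illustrative only: std::vector and std::string stand in for ISequence and IString, all struct and function names are hypothetical, and the reading that a non-negated equality constraint holds when the rule contains the given item is an assumption, not a statement about the mining kernel.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for IDMAtomicConstraint (not the real class).
struct AtomicConstraint {
    bool sign;          // true: use the operator as is; false: negate it
    std::string value;  // item value to compare with (equality only)
};

// Stand-ins for the conjunction and for the disjunction of conjunctions.
using AndConstraints = std::vector<AtomicConstraint>;
using ItemConstraints = std::vector<AndConstraints>;

// Assumed semantics: an equality constraint holds if the rule contains the item.
bool satisfies(const std::vector<std::string>& ruleItems, const AtomicConstraint& c) {
    bool contains =
        std::find(ruleItems.begin(), ruleItems.end(), c.value) != ruleItems.end();
    return c.sign ? contains : !contains;
}

// A conjunction holds if every atomic constraint in it holds.
bool satisfiesAll(const std::vector<std::string>& ruleItems, const AndConstraints& conj) {
    return std::all_of(conj.begin(), conj.end(),
                       [&](const AtomicConstraint& c) { return satisfies(ruleItems, c); });
}

// The disjunction holds if at least one conjunction holds.
// An empty disjunction yields false in this sketch.
bool satisfiesAny(const std::vector<std::string>& ruleItems, const ItemConstraints& disj) {
    return std::any_of(disj.begin(), disj.end(),
                       [&](const AndConstraints& conj) { return satisfiesAll(ruleItems, conj); });
}
```

With the constraint "(beer and not wine) or milk", a rule over the items beer and chips would pass in this sketch, while a rule over beer and wine would not.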

Mining results

You can access the results of a mining run through the result objects. Several result objects are united in a result set object.

IDMResult

The class IDMResult holds information about a mining result, not the mining result itself. The mining result itself is located in a file.

Header file: idmcres.hpp


Format:
typedef enum {
  IDM_NO_RESULT,
  IDM_ASS_RESULT,
  IDM_SPS_RESULT,
  IDM_SIM_RESULT,
  IDM_CLF_TREE_RESULT,
  IDM_CLF_TREE_TEST_RESULT,
  IDM_CLF_TREE_APPL_RESULT,
  IDM_CLF_NEURAL_RESULT,
  IDM_CLF_NEURAL_TEST_RESULT,
  IDM_CLF_NEURAL_APPL_RESULT,
  IDM_CLUS_DEMO_RESULT,
  IDM_CLUS_DEMO_TEST_RESULT,
  IDM_CLUS_DEMO_APPL_RESULT,
  IDM_CLUS_NEURAL_RESULT,
  IDM_CLUS_NEURAL_TEST_RESULT,
  IDM_CLUS_NEURAL_APPL_RESULT,
  IDM_PRE_RESULT,
  IDM_PRE_TEST_RESULT,
  IDM_PRE_NEURAL_RESULT,
  IDM_PRE_NEURAL_TEST_RESULT,
  IDM_DESC_STAT_QUANT_RESULT,
  IDM_STAT_UNIVARIATE_CURVE_RESULT,
  IDM_STAT_LINEAR_REGRESSION_RESULT,
  IDM_STAT_PRIN_COM_ANALYSIS_RESULT,
  IDM_STAT_FACTOR_ANALYSIS_RESULT,
  IDM_DATA_SAMPLE_RESULT,
  IDM_GENERIC_RESULT,
  IDM_DESC_STAT_QUANT_SAMPLE_RESULT
} IDM_ResultType;

typedef IKeySortedSet< IDMBrowseFormat, IDMBrowseFormatKey > IDMBrowseFormatDefs;

class IDMResult : public IDMMiningClass {
public:

  IDMResult();
  IDMResult( IDMRETURN &rc, IString name, IDMMiningBase *pMiningBase,
             IString resultSetName = "");
  IDMResult( IDMRETURN &rc, IString name, IDMResult &result,
             IString resultSetName = "");
  ~IDMResult();
  static IDMRETURN createObject( IString name,
                                 IDMMiningBase *pMiningBase,
                                 IDMResult *&pResult,
                                 IString resultSetName = "");

  IDMRETURN deleteObject();
  IDMRETURN update(IString name);
  IDMRETURN put(IString newResultName, IString resultSet );
  IDMRETURN get( IString &name, IDMMiningBase *&pMiningBase);

  IDMRETURN export(IString fileName, IString exportFormat,
                   IString exportOptionDesc ) const;

  IDMRETURN export(IString fileName ) const;

  IDMRETURN export( const IString& exportFile, const IDMBrowseFormat& aFormatDef ) const;

  static IDMRETURN getExportFormats( const IDM_RESULT_TYPE& resultType,
                                     IDMBrowseFormatDefs& outputFormats );

  static IDMRETURN getBrowseFormats( const IDM_RESULT_TYPE& resultType,
                                     IDMBrowseFormatDefs& outputFormats ) ;

  IDMRETURN launchBrowser( const IDMBrowseFormat& browseFormat ) const;

  IDMRETURN import(IString fileName );
  static IDMRETURN initializeExport();
  static IDMRETURN terminateExport();
  static IDMRETURN getExportFormats(ISortedSet<IString> &exportFormats );
  static IDMRETURN getExportOptions(IString exportFormat,
                                    ISortedSet<IString> &exportOptions );

  IDMRETURN setResultSetPointer( IDMResultSet *pResultSet );
  IDM_RESULT_TYPE getResultType();
  IDMRETURN setResultType( IDM_RESULT_TYPE type );
  IString getResultTypeString() const;

};

Data members:

ivResultType
Specifies which mining function wrote this result. The type of the result is set when a mining run is started. For each result type, an entry is available in the client tool registration files idmcsstr.dat and idmcsctr.dat. Depending on the result type, result transformer program names and browser names are specified in these files. The following result types are available:

IDM_NO_RESULT
The result has no specific type.

IDM_ASS_RESULT
The result was produced by an Associations mining run.

IDM_SPS_RESULT
The result was produced after a Sequential Patterns mining run.

IDM_SIM_RESULT
The result was produced after a Similar Sequences mining run.

IDM_CLF_TREE_RESULT
The result was produced after a Tree Classification mining run.

IDM_CLF_TREE_TEST_RESULT
The result was produced by a test run (classification) or a statistics run (clustering).

IDM_CLF_TREE_APPL_RESULT
The result was produced by an application run.

IDM_CLF_NEURAL_RESULT
The result was produced after a Neural Classification mining run.


IDM_CLF_NEURAL_TEST_RESULT
The result was produced by a test run (classification) or a statistics run (clustering).

IDM_CLF_NEURAL_APPL_RESULT
The result was produced by an application run.

IDM_CLUS_DEMO_RESULT
The result was produced after a Demographic Clustering mining run.

IDM_CLUS_DEMO_TEST_RESULT
The result was produced by a test run (classification) or a statistics run (clustering).

IDM_CLUS_DEMO_APPL_RESULT
The result was produced by an application run.

IDM_CLUS_NEURAL_RESULT
The result was produced after a Neural Clustering mining run.

IDM_CLUS_NEURAL_TEST_RESULT
The result was produced by a test run (classification) or a statistics run (clustering).

IDM_CLUS_NEURAL_APPL_RESULT
The result was produced by an application run.

IDM_PRE_RESULT
The result was produced after an RBF Prediction training run.

IDM_PRE_TEST_RESULT
The result was produced after an RBF Prediction test run.

IDM_PRE_NEURAL_RESULT
The result was produced after a Neural Prediction training run.

IDM_PRE_NEURAL_TEST_RESULT
The result was produced after a Neural Prediction test run.

IDM_DESC_STAT_QUANT_RESULT
The result contains descriptive statistics and quantile results of the function for descriptive statistics, quantiles, and sampling (DQS).

IDM_STAT_UNIVARIATE_CURVE_RESULT
The result was produced by the Univariate Curve Fitting function.

IDM_STAT_LINEAR_REGRESSION_RESULT
The result was produced by the Linear Regression function.

IDM_STAT_PRIN_COM_ANALYSIS_RESULT
The result was produced by the Principal Component Analysis function.

IDM_STAT_FACTOR_ANALYSIS_RESULT
The result was produced by the Factor Analysis function.

IDM_DATA_SAMPLE_RESULT
The result contains an embedded data table, produced by the DQS function as a data sample or by any other function.

IDM_GENERIC_RESULT
The result was produced by a function whose parameters are defined by the IDMGenericSettingsClass.


IDM_DESC_STAT_QUANT_SAMPLE_RESULT
The result contains an embedded data table produced by the DQS function as a data sample if the data member ivSampleType of class IDMDescQuantSampleSettings is set to IDM_EXPLORE_FORMAT.

ivResultFile
The name of the file the result should be written to.

pivResultSet
Pointer to the result set if the result was assigned to a result set.

Member functions:

IDMResult
The default constructor.

IDMResult( IDMRETURN &rc, IString name, IDMMiningBase *pMiningBase, IString resultSetName )

Constructs a result object with the given name and mining base. A file name for the result file is specified by the constructor. The result file is locked if the mining base is accessed with write access. If no resultSetName is given, the result object is added to the ivResult IKeySortedSet collection located in the mining base (class IDMMiningBase). If the resultSetName is set, the IDMResult object is not added to the ivResult IKeySortedSet collection located in the mining base. An error has occurred if the return code is not equal to IDM_SUCCESS. The result object should be deleted using deleteObject().

IDMResult( IDMRETURN &rc, IString name, IDMResult &result, IString resultSetName )

Constructs a new result object with a new name from an existing result object. The result file of the new result object is locked if the mining base is accessed with write permission. If no resultSetName is given, the result object is added to the ivResult IKeySortedSet collection located in the mining base (class IDMMiningBase). If the resultSetName is set, the IDMResult object is not added to the ivResult IKeySortedSet collection located in the mining base. It is added to the result set with the given name.

~IDMResult
The destructor. The file containing the result is deleted.

createObject
Constructs a result object with the given values and returns it if no error occurred. The result file of the new result object is locked if the mining base is accessed with write permission. If no resultSetName is given, the result object is added to the ivResult IKeySortedSet collection located in the mining base (class IDMMiningBase). If the resultSetName is set, the IDMResult object is not added to the ivResult IKeySortedSet collection located in the mining base. It is added to the result set with the given name. If an error occurred, the result object is deleted and pResult is set to NULL.

deleteObject
If the result object is located in a result set, it is removed from the result set. Otherwise, it is removed from the IKeySortedSet collection located in the mining base object. Finally, the destructor is called.


update
Changes the name of the result object.

get Returns the name of the result object and a pointer to the mining base object.

put Puts a result object, which is located in the mining base, into a result set with the specified name. You can choose a new name.

export (IString fileName)
Copies a result into the given fileName. When running in client/server mode, the file into which the result is copied will be located on the client.

IDMRETURN export( const IString& exportFile, const IDMBrowseFormat& aFormatDef ) const

Exports the result into a file (name specified in parameter 1) in a format determined by the export format (specified in parameter 2).

If an environment variable named IDM_MAX_EXP_RESULT_SIZE was specified, and the format definition (parameter 2) does not specify a maximum result size, then this method uses the value specified in the environment variable.

To export a result, you can write an application that performs the following steps:
1. Call getExportFormats().
2. Choose an IDMBrowseFormat from the list of export formats.
3. Call this export method using the chosen export format.

static IDMRETURN getExportFormats( const IDM_RESULT_TYPE& resultType, IDMBrowseFormatDefs& outputFormats );

This method updates your IDMBrowseFormatDefs (parameter 2) with all the export formats that match the requested result type (parameter 1). The formats are stored on the client in a file named idmcsctr.dat. File idmcbrwf.hpp contains the definitions for export format definitions.

static IDMRETURN getBrowseFormats( const IDM_RESULT_TYPE& resultType, IDMBrowseFormatDefs& outputFormats ) ;

This method updates your IDMBrowseFormatDefs (parameter 2) with all the browse formats that match the requested result type (parameter 1). The formats are stored on the client in a file named idmcsctr.dat. File idmcbrwf.hpp contains the definitions for browse format definitions.

import
Imports the rules that are contained in the file with the given file name into the result file of the result object. When running in client/server mode, the file with the given file name is located on the client.

Import is not allowed if the mining base is accessed with read permission and the result file is locked by a user who has accessed the mining base with write permission.

The result type of the result object (member ivResultType) is not changed during importing. You have to ensure that the type of the result matches the content of the result file. Use the method setResultType to set the appropriate type of the result.

IDMRETURN launchBrowser (const IDMBrowseFormat& browseFormat) const;
This method executes the browser named in the IDMBrowseFormat passed as parameter 1. To launch a browser, you can write an application that performs the following steps:
1. IDMResult::getBrowseFormats(...)
2. Choose one IDMBrowseFormat from the set of IDMBrowseFormatDefs from the previous step. The chosen IDMBrowseFormat is used in the next step.
3. resultObject->launchBrowser(...)

The browser is executed synchronously, so you might want to call the launchBrowser method from a different thread than the rest of your process.

IDMResultSet

The class IDMResultSet groups the result objects of mining runs.

Format:
class IDMResultSet : public IDMMiningClass {

public:

  IDMResultSet();

  IDMResultSet( IDMRETURN &rc, IString name,
                IDMMiningBase *pMiningBase );

  ~IDMResultSet() ;

  static IDMRETURN createObject( IString name,
                                 IDMMiningBase *pMiningBase,
                                 IDMResultSet *&pResultSet );

  IDMRETURN deleteObject();
  IDMRETURN update(IString Name );

  IDMRETURN get( IString &name,
                 IKeySortedSet<IDMResult*, IString> &results,
                 IDMMiningBase *&pMiningBase );

  IDMRETURN copyResult( IString resultName, IString newResultName,
                        IString newResultSet );

  IDMRETURN moveResult( IString resultName, IString newResultName,
                        IString newResultSet );

};

Data members:

ivResults
An IKeySortedSet collection of IDMResult instances.

Member functions:

IDMResultSet
The default constructor.

IDMResultSet( IDMRETURN &rc, IString name,...)
Constructs a result set object with the given name and mining base. The result set object is added to the ivResultSet IKeySortedSet collection located in the mining base (class IDMMiningBase). An error has occurred if the return code is not equal to IDM_SUCCESS. The result set object should be deleted using deleteObject().

~IDMResultSet
The destructor. The destructor is called for each result object of the result set.

createObject
Constructs a result set object with the given values and returns it if no error occurred. The result set object is added to the ivResultSet IKeySortedSet collection located in the mining base (class IDMMiningBase). If an error occurred, the result set object is deleted and pResultSet is set to NULL.

deleteObject
Removes the object from the IKeySortedSet collection located in the mining base object and calls the destructor. All IDMResult objects located in ivResults are deleted.

update
Changes the name of the result set object.

get Retrieves the name of the result set object, the mining base, and the ivResults IKeySortedSet collection.

copyResult
Copies an IDMResult object from the result set to another result set with the given name. You can specify a new name for the IDMResult object.

moveResult
Moves an IDMResult object from the result set to another result set with the given name. You can specify a new name for the IDMResult object.

IDMBrowseFormat

Class IDMBrowseFormat represents a format definition that has been loaded from the client's tool registration file (idmcsctr.dat). See idmcsctr.dat for comments on valid values for format definitions.

Format:
class IDMBrowseFormat{

  IDMBrowseFormatKey ivKey;
  IString ivBrowserName;
  IString ivBrowserParms;
  IDMREAL ivMaxResultSize;
  IDMRETURN ivState;

public:
  IDMBrowseFormat();
  IDMBrowseFormat( IDMRETURN rc
                 , const IDM_ResultType& resultType
                 , const IString& resultFormat
                 , const IString& menuText = ""
                 , const IString& browserName = ""
                 , const IString& browserParms = ""
                 , const IDMREAL maxResultSize = 0 );

  IDMBrowseFormat( const IDMBrowseFormat& fmt );
  ~IDMBrowseFormat();

  IDMBrowseFormat& operator= (IDMBrowseFormat const&);
  IBoolean operator== (IDMBrowseFormat const&) const;
  IBoolean operator< (IDMBrowseFormat const&) const;

  IDMRETURN update( const IDM_ResultType& resultType
                  , const IString& resultFormat
                  , const IString& menuText
                  , const IString& browserName
                  , const IString& browserParameters
                  , const IDMREAL& maxNumberOfBytes );

  IDMRETURN get ( IDM_ResultType& resultType
                , IString& resultFormat
                , IString& menuText
                , IString& browserName
                , IString& browserParms
                , IDMREAL& maxNumberOfBytes ) const;

  IDMRETURN getLaunchString( IString& exeAndParms
                           , IString& fileName ) const;

  IDMBrowseFormatKey const& key() const;

  IDMBOOLEAN isValid() const;

};

Data members:

ivKey A unique IDMBrowseFormatKey, defined below.

ivBrowserName
The name of an executable program which can be called to show results.

ivBrowserParms
The parameter string for ivBrowserName.

ivMaxResultSize
An optional IDMREAL which specifies the number of bytes allowed in a csv-formatted exported result. If not specified, the value in the IDM_MAX_EXP_RESULT_SIZE environment variable is used.
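
The fallback from ivMaxResultSize to the IDM_MAX_EXP_RESULT_SIZE environment variable could be pictured as follows. This is a minimal sketch, not the product's implementation: the helper name resolveMaxResultSize is hypothetical, and treating 0 as "not specified" is an assumption based on the default value of maxResultSize in the constructor.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: returns the size limit to apply to an exported result.
// formatMax: value of ivMaxResultSize from the format definition
//            (0 is treated as "not specified", which is an assumption).
// envValue:  text of IDM_MAX_EXP_RESULT_SIZE, or nullptr if it is not set.
double resolveMaxResultSize(double formatMax, const char* envValue) {
    if (formatMax > 0.0) {
        return formatMax;            // the format definition takes precedence
    }
    if (envValue != nullptr) {
        char* end = nullptr;
        double parsed = std::strtod(envValue, &end);
        if (end != envValue && parsed > 0.0) {
            return parsed;           // fall back to the environment variable
        }
    }
    return 0.0;                      // no limit known in this sketch
}
```

An application might call it as resolveMaxResultSize(maxFromFormat, std::getenv("IDM_MAX_EXP_RESULT_SIZE")), where maxFromFormat is the value obtained from the format definition.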

IDMBrowseFormatKey

Class IDMBrowseFormatKey represents a unique key for a set of browser format definitions. Definitions in idmcsctr.dat must not have the same combination of result type, result format, and browser format definitions.

Format:
class IDMBrowseFormat;
class IDMBrowseFormatKey;

typedef IKeySortedSet< IDMBrowseFormat, IDMBrowseFormatKey > IDMBrowseFormatDefs;

IDMBrowseFormatKey const& key( const IDMBrowseFormat& aFmt);

class IDMBrowseFormatKey{

  IDM_ResultType ivResultType;
  IString ivResultFormat;
  IString ivMenuText;
  IDMRETURN ivState;

public:
  IDMBrowseFormatKey();
  IDMBrowseFormatKey(const IDM_ResultType& type,
                     const IString& format,
                     const IString& menuText );
  ~IDMBrowseFormatKey();
  IDMBrowseFormatKey(const IDMBrowseFormatKey&);
  IDMBrowseFormatKey& operator= (IDMBrowseFormatKey const&);
  IBoolean operator== (IDMBrowseFormatKey const&) const;
  IBoolean operator< (IDMBrowseFormatKey const&) const;
  void setResultType( const IDM_ResultType& resultType);
  void setResultFormat( const IString& format );
  void setMenuText( const IString& string);
  IDM_ResultType resultType() const;
  IString resultFormat() const;
  IString menuText() const;
  IDMBOOLEAN isValid() const;
};

Data members:

See file idmcsctr.dat for examples of valid values for the data members ivResultType, ivResultFormat, and ivMenuText.

ivResultType
A string which corresponds to an IDM_RESULT_TYPE.

ivResultFormat
A string which specifies the format of data in the result file.

ivMenuText
A string (translatable) which can be displayed to explain the browser format definition.
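
To illustrate how a key built from these three data members can keep entries in a key-sorted set unique, the following sketch uses std::set and std::tie in place of the IKeySortedSet collection. The struct BrowseFormatKey and the function registerKey are hypothetical, and the lexicographic ordering is an assumption; the manual does not state how operator< is implemented.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <tuple>

// Hypothetical stand-in for IDMBrowseFormatKey; the result type is kept as a
// string here, following the data member description.
struct BrowseFormatKey {
    std::string resultType;
    std::string resultFormat;
    std::string menuText;
};

// Lexicographic comparison over all three parts (an assumed ordering).
bool operator<(const BrowseFormatKey& a, const BrowseFormatKey& b) {
    return std::tie(a.resultType, a.resultFormat, a.menuText) <
           std::tie(b.resultType, b.resultFormat, b.menuText);
}

// Two keys are equal only if all three parts match.
bool operator==(const BrowseFormatKey& a, const BrowseFormatKey& b) {
    return std::tie(a.resultType, a.resultFormat, a.menuText) ==
           std::tie(b.resultType, b.resultFormat, b.menuText);
}

// Inserts a key and reports whether it was new; a duplicate is rejected.
bool registerKey(std::set<BrowseFormatKey>& keys, const BrowseFormatKey& key) {
    return keys.insert(key).second;
}
```

Registering the same combination twice returns false the second time, which mirrors the requirement that definitions must not share a key.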

Settings

General settings common to all mining run settings classes.

IDMSettings

The abstract class IDMSettings combines all data definitions and other specifications that are common to all mining run settings classes, preprocessing functions settings classes, statistics functions settings classes, and repeatable sequences settings classes.

Header file: idmcset.hpp

Format:
class IDMSettings : public IDMMiningClass {

protected:

  IDMData *pivData;
  IDMData *pivOutputData;
  IDMSelections ivSelection;
  IDMRemoteJob *pivRemoteJob;
  IDMResult ivResult;
  IDMText ivOptions;
  IDMBOOLEAN ivOptimizedForTime;
  IString ivResultName;
  IDMBOOLEAN ivOverwriteResult;
  IDMText ivResultComment;

  static ISequence<IDMSettings*> cvRunningSettings;

  IDMINTEGER ivNbProcesses;
  IDMTimeStamp ivStartTimeStamp;

public:

  IDMRETURN start(IDMBOOLEAN syncRunFlag=IDM_TRUE,
                  IDMINTEGER numberOfNodes=-2,
                  IDMINTEGER traceLevel=0);

  IDMRETURN stop();

  IDMRETURN getState(IDMJobState &jobState,
                     IDMINTEGER &exitCode,
                     IDMException *&pExc );

  IDMRETURN getMsgFileNames(IString &errMsgFile, IString &traceFile);

  IDMResult const& getResult() const;
  const IDMResult* getResultPointer() const;

  void setNumberProcesses( IDMINTEGER number );

  void getNumberProcesses( IDMINTEGER &number ) const;

  IDMINTEGER getNumberProcesses( )const;

  IDMRETURN getStatusInfo(ISequence<IString> &settingsNames,
                          IDMMININGCLASS &settingsType,
                          IString &kernelPhase,
                          IDMINTEGER &iterationNumber,
                          IDMREAL &progress,
                          IString &qualityMeasure,
                          IDMREAL &qualityValue,
                          IString &statusMessage);

  IDMRETURN getStatusInfo(ISequence<IString> &settingsNames,
                          IString &settingsTypeString,
                          IString &kernelPhase,
                          IDMINTEGER &iterationNumber,
                          IDMREAL &progress,
                          IString &qualityMeasure,
                          IDMREAL &qualityValue,
                          IString &statusMessage);

  IDMSelections const& getSelection() const;
  const IDMSelections* getSelectionPointer() const;
  IDMRETURN setSelection(IDMSelections& selection);
  IDMRETURN setOptions(IString);

  IString getOptions() const;

  virtual IDMRETURN optimizeForTime(IDMBOOLEAN optTime);

  IDMBOOLEAN isOptimizedForTime() const;

  virtual IDMRETURN setResultName( IString resName,
                                   IDMBOOLEAN overwriteResult=IDM_TRUE);

  IDMRETURN setResultComment(IString comment);

  IString getResultComment() const;

  void getResultName(IString& resName,
                     IDMBOOLEAN &overwriteResult) const;

  static ISequence<IDMSettings*> getRunningSettings();

  IDMRETURN get(IDMMININGCLASS& classType,
                IString& name,
                IString& comment,
                IDMTimeStamp& creationTimeStamp,
                IDMTimeStamp& updateTimeStamp,
                IString &options,
                IString &resultName,
                IDMBOOLEAN &overwriteResult,
                IDMBOOLEAN &optimizedForTime,
                IDMINTEGER &nbProcesses) const;

  IDMRETURN get(IDMMININGCLASS& classType,
                IString& name,
                IString& comment,
                IString& creationTimeStampString,
                IString& updateTimeStampString,
                IString &options,
                IString &resultName,
                IDMBOOLEAN &overwriteResult,
                IDMBOOLEAN &optimizedForTime,
                IDMINTEGER &nbProcesses) const;

  IDMRETURN set(IString comment,
                IString options,
                IString resultName,
                IDMBOOLEAN overwriteResult,
                IDMBOOLEAN optimizedForTime,
                IDMINTEGER nbProcesses = -2);

  const IDMTimeStamp& getStartTimeStamp() const;

  IString getStartTimeStampString() const;

  IDMBOOLEAN elapsedTimeSinceStart(IDMINTEGER &nbDays,
                                   IDMINTEGER &nbHours,
                                   IDMINTEGER &nbMinutes,
                                   IDMINTEGER &nbSeconds);

};

Data members:

pivData
The data object which contains the specification of the input data table.

pivOutputData
Some mining, processing, or statistics functions produce output data in addition to the result of a run or instead of a result of a run. The output data is in tabular form and therefore can be used as input for another mining run.

Where the output data should be written to (relational database table or flat file table) must be specified in this pivOutputData object. The specification of the fields of the output data is determined by the mining function. After the mining run, the pivOutputData object is updated with the field specifications.

Any existing field specifications are overwritten by those determined by the function. In general, the produced output data consists of a subset of the input fields and a set of fields specific to the type of settings, for example, the cluster number and the cluster score for each record determined in a neural or demographic clustering run. If the settings type does not support the generation of output data, this pointer will be null.

ivSelection
In this selection object, filters on the input data table can be specified. Depending on the filters that are set, only a part of the input data table is used by the mining function. Specifying filters is optional.

ivResult
The result object of the mining or statistics run. This result object contains the specification of the file into which the result of the mining or statistics run is written. The current result exists until a new run is done.

ivOptimizedForTime
If set to IDM_TRUE, the mining run is optimized for time; if set to IDM_FALSE, the mining run is optimized for memory.

ivOptions
A string that can contain newline characters. It allows users to specify function-specific power options.

ivResultName
When a mining run has terminated successfully, the result can be saved automatically under a result object; ivResultName is the name of this object. If ivResultName is the empty string, the mining result can still be accessed through the result object of the settings object (ivResult).

ivOverwriteResult
If this flag is set to IDM_FALSE and a result object with the name ivResultName already exists, the mining result cannot be saved under a result object with that name. The start method checks this condition and returns IDM_ERROR if this is the case.
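
The interplay of ivResultName and ivOverwriteResult can be written down as a small pre-check. This is only a sketch of the rule as described here: the function name canStartWithResultName is hypothetical, and std::set stands in for the collection of result objects in the mining base.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical pre-check mirroring the ivResultName / ivOverwriteResult rules.
// existingResults: names of result objects already present in the mining base.
// Returns false in the case where the text says start returns IDM_ERROR.
bool canStartWithResultName(const std::set<std::string>& existingResults,
                            const std::string& resultName,
                            bool overwriteResult) {
    if (resultName.empty()) {
        return true;   // nothing is saved by name; the result stays in ivResult
    }
    bool exists = existingResults.count(resultName) > 0;
    return overwriteResult || !exists;
}
```

An application could run such a check before calling start to report a name clash to the user early.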

ivNbProcesses
Specifies the number of processes of the appropriate mining technique that should be started.

ivStartTimeStamp
The time the last job was started. The initial value is the creation time stamp.

Member functions:

start Starts a function run. The parameter syncRunFlag specifies whether the run is called synchronously (IDM_TRUE) or asynchronously (IDM_FALSE). If the run is started asynchronously, the state of the job can be retrieved using the getState method and the run can be stopped using the stop method.

The parameter numberOfNodes can be specified to override the ivNbProcesses attribute that can be set with method setNumberProcesses. The value -2 means that ivNbProcesses should not be overridden. Refer to the description of method setNumberProcesses for the possible values of this parameter.

If the function does not support parallel runs and ivNbProcesses is set to a value greater than 1, ivNbProcesses is set to 0 and the serial function is started. If you want to run a function in parallel mode and the parallel functions are not installed, an error occurs.
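
The process-count rules for start can be condensed into one function. The sketch below is an illustration under assumptions: the names effectiveProcessCount and kNoOverride are hypothetical, and the case of an automatic count (-1) on a function without parallel support is left unchanged because the text does not cover it.

```cpp
#include <cassert>

// Mirrors the default value -2 of the second parameter of start().
const int kNoOverride = -2;

// Hypothetical helper.
// stored:           current value of ivNbProcesses
// startArgument:    process count passed to start()
// supportsParallel: whether the function can run in parallel
int effectiveProcessCount(int stored, int startArgument, bool supportsParallel) {
    // -2 keeps the stored value; any other value replaces it.
    int count = (startArgument == kNoOverride) ? stored : startArgument;
    if (!supportsParallel && count > 1) {
        count = 0;   // fall back to the serial function
    }
    return count;
}
```

For example, a stored value of 4 with the default start argument stays 4 for a parallel-capable function and becomes 0 for a serial-only function.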


The processors to be used are specified in the host list file named idmhost.list. This file is located in the directory that is identified by the IDM_MNB_DIR environment variable (private host list file) or, if the file is not there, in the directory that is identified by the IDM_BIN_DIR environment variable (global host list file). See Using the Intelligent Miner for Data for more information on the host list file and its use.
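
The lookup order for the host list file can be sketched as follows. The helper resolveHostList is hypothetical, the existence check is supplied by the caller, and the '/' path separator is an assumption that does not hold on every supported platform.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical helper: picks the host list file in the order described above.
// mnbDir / binDir: values of IDM_MNB_DIR and IDM_BIN_DIR.
// fileExists:      existence check supplied by the caller.
std::string resolveHostList(const std::string& mnbDir,
                            const std::string& binDir,
                            const std::function<bool(const std::string&)>& fileExists) {
    const std::string fileName = "idmhost.list";
    std::string privateList = mnbDir + "/" + fileName;
    if (fileExists(privateList)) {
        return privateList;            // private host list file
    }
    return binDir + "/" + fileName;    // global host list file
}
```

In an application, the two directories would typically come from std::getenv, and fileExists could try to open the file with std::ifstream.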

To make efficient use of the parallel processes, the data must be partitioned and distributed to the various nodes used in a mining run. See Using the Intelligent Miner for Data for detailed information on how to partition the input data.

If a traceLevel is specified, the mining function writes trace messages into a trace file. The trace level can be between 1 and 9.

stop Stops the run if started asynchronously. This method need not be called if the run has stopped automatically.

getState
Retrieves the state of the mining run if started asynchronously. If jobState is IDM_CS_JOB_STATE_EXITED, exitCode holds the exit code of the mining run. If exitCode is not 0, pExc holds the exception object corresponding to the kernel exception. Note that pExc has to be initialized before calling the getState() method.
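
For an asynchronous run, an application typically polls getState until the job has exited. The following sketch shows the shape of such a loop with stand-in types only: JobState, FakeSettings, and waitForExit are hypothetical, and the two-parameter getState is a simplification of the real method, which also returns an exception pointer.

```cpp
#include <cassert>

// Stand-ins for illustration only; the real types come from the product headers.
// JOB_EXITED plays the role of IDM_CS_JOB_STATE_EXITED.
enum JobState { JOB_RUNNING, JOB_EXITED };

// Fake settings object that reports "running" a fixed number of times.
struct FakeSettings {
    int pollsLeft;
    int finalExitCode;
    void getState(JobState& state, int& exitCode) {
        if (pollsLeft > 0) {
            --pollsLeft;
            state = JOB_RUNNING;
        } else {
            state = JOB_EXITED;
            exitCode = finalExitCode;
        }
    }
};

// Polls until the job has exited or maxPolls is reached.
// Returns true and sets exitCode only if the job exited.
template <typename Settings>
bool waitForExit(Settings& settings, int maxPolls, int& exitCode) {
    for (int i = 0; i < maxPolls; ++i) {
        JobState state = JOB_RUNNING;
        int code = 0;
        settings.getState(state, code);
        if (state == JOB_EXITED) {
            exitCode = code;
            return true;
        }
        // A real application would sleep or do other work here.
    }
    return false;
}
```

Bounding the number of polls lets the caller decide whether to keep waiting or to call stop.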

getResult
Retrieves the result of the settings object after a mining run (ivResult). Use this method only if you have not set ivResultName using setResultName.

getResultPointer
Retrieves a pointer to the result object of the settings object after the mining run (ivResult). Use this method only if you have not set ivResultName using setResultName.

getStatusInfo
Returns the status information to be displayed by the progress indicator. It consists of the following parts:
- A sequence of settings names representing the current call stack of settings objects. This call stack starts with the name of the outermost sequence object and ends with the name of the currently executing function.
- The type of the currently executing kernel.
- The kernel phase, which needs to be specified for each function.
- The number of the current iteration within this function phase.
- The progress within the current iteration. This needs to be a number between 0 and 1.
- A quality measure and a quality value giving an idea of the qualitative progress of the function.
- A message providing additional information.

getData
Returns a pointer to the (input) data object pivData. If there is no such object, it returns a NULL pointer.

getOutputData
Returns a pointer to the output data object pivOutputData. If there is no such object, it returns a NULL pointer.

getSelection
Retrieves the IDMSelections object of this settings object.

setSelection
Updates the IDMSelections object of this settings object.

setOptions
Updates the power options of this settings object.

getOptions
Retrieves the power options of this settings object.

optimizeForTime
Most functions can optimize their performance with respect to time or space. This flag indicates which kind of optimization you prefer. A warning is returned if the function cannot be optimized in the desired way, for example, tree classification runs in time-optimized mode only.

isOptimizedForTime
Returns IDM_TRUE if the function runs in time-optimized mode and IDM_FALSE if the space requirements are optimized.

setResultName
Updates the name of the result object that should be created or updated when a mining run has terminated successfully, together with the overwrite result flag. This function returns IDM_ERROR if the settings do not generate results, for example, preprocessing functions.

getResultName
Retrieves the result name and the value of the overwrite result flag.

getRunningSettings
Determines the settings objects that are currently running.

setNumberProcesses
Sets the number of processes ivNbProcesses that should be started when calling the start method. Set it to 0 if the serial function should be started. Set it to a value greater than 1 if the parallel function should be started. Set it to -1 if Intelligent Miner should determine the number of processes depending on the available partitions of the input data or output data.

getNumberProcesses
Returns the number of processes that were set with method setNumberProcesses.

initializeMiningBase
Initializes IDMMiningClass::pivMiningBase if a settings object has been constructed with the default constructor. This is necessary for applying the update method to the settings object.

getMsgFileNames
Retrieves the names of the files where error messages and trace information are written on the server.

get Retrieves the attributes of class IDMSettings and parent class IDMMiningClass.

set Sets attributes that cannot be modified by the derived settings classes.

getStartTimeStamp
Retrieves the start time of the last invocation of this settings object.

getStartTimeStampString
Retrieves the start time of the last invocation of this settings object as a string.


elapsedTimeSinceStart
Retrieves the elapsed time since the last start of the function. Returns IDM_FALSE if no job is currently running.
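
The four output parameters of elapsedTimeSinceStart are presumably the components of a single duration. Under that assumption, the decomposition could look like the sketch below; the helper splitElapsed is hypothetical and is not part of the API.

```cpp
#include <cassert>

// Hypothetical helper showing how a duration in seconds maps onto the four
// output parameters nbDays, nbHours, nbMinutes, and nbSeconds.
void splitElapsed(long totalSeconds, int& nbDays, int& nbHours,
                  int& nbMinutes, int& nbSeconds) {
    assert(totalSeconds >= 0);
    nbDays    = static_cast<int>(totalSeconds / 86400);           // 24 * 3600
    nbHours   = static_cast<int>((totalSeconds % 86400) / 3600);
    nbMinutes = static_cast<int>((totalSeconds % 3600) / 60);
    nbSeconds = static_cast<int>(totalSeconds % 60);
}
```

A duration of 93784 seconds, for instance, splits into 1 day, 2 hours, 3 minutes, and 4 seconds.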

setResultComment
Sets a comment for the result object that can be specified with setResultName.

getResultComment
Retrieves the comment of the result object that can be specified with setResultName.

Data settings

These classes contain the settings information for the different types of mining runs.

IDMAssocSettings

The class IDMAssocSettings combines all data definitions and other specifications that are necessary to start an association run and to compute the appropriate association rules.

Header file: idmcass.hpp

Format:
class IDMAssocSettings : public IDMSettings {

  IDMTaxonomy *pivTaxonomy;
  IString ivItemField;
  IString ivTransactionField;
  IDMDOUBLE ivMinConfidence;
  IDMDOUBLE ivMinSupport;
  IDMBOOLEAN ivSortFlag;
  IDMItemConstraints ivItemConstraints;
  IDMINTEGER ivMaxRuleSize;

public:
  IDMAssocSettings();
  IDMAssocSettings( IDMRETURN &rc,
                    IString name,
                    IDMMiningBase *pMiningBase,
                    IDMData *pData,
                    IDMTaxonomy *pTaxonomy,
                    IDMSelections &selection,
                    IDMItemConstraints &itemConstraints,
                    IString itemField,
                    IString transactionField,
                    IDMDOUBLE fMinConfidence,
                    IDMDOUBLE fMinSupport,
                    IDMINTEGER maxRuleSize=0);

  ~IDMAssocSettings() ;
  static IDMRETURN createObject( IString name,
                                 IDMMiningBase *pMiningBase,
                                 IDMData *pData,
                                 IDMTaxonomy *pTaxonomy,
                                 IDMSelections &selection,
                                 IDMItemConstraints &itemConstraints,
                                 IString itemField,
                                 IString transactionField,
                                 IDMDOUBLE fMinConfidence,
                                 IDMDOUBLE fMinSupport,
                                 IDMINTEGER maxRuleSize,
                                 IDMAssocSettings *&pAssocSettings );
  IDMRETURN deleteObject();
  IDMRETURN update( IString name,
                    IDMData *pData,
                    IDMTaxonomy *pTaxonomy,
                    IDMSelections &selection,
                    IDMItemConstraints &itemConstraints,
                    IString itemField,
                    IString transactionField,
                    IDMDOUBLE fMinConfidence,
                    IDMDOUBLE fMinSupport,
                    IDMINTEGER maxRuleSize=0);

  IDMRETURN get( IString &name,
                 IDMMiningBase *&pMiningBase,
                 IDMData *&pData,
                 IDMTaxonomy *&pTaxonomy,
                 IDMSelections &selection,
                 IDMItemConstraints &itemConstraints,
                 IString &itemField,
                 IString &transactionField,
                 IDMDOUBLE &fMinConfidence,
                 IDMDOUBLE &fMinSupport,
                 IDMINTEGER &maxRuleSize);

  void setSortFlag( IDMBOOLEAN sortFlag );
  IDMBOOLEAN getSortFlag();
  IDMItemConstraints& getItemConstraints();
  IDMRETURN setItemConstraints(const IDMItemConstraints&);
  IDMRETURN setMaxRuleSize(IDMINTEGER);
  IDMINTEGER getMaxRuleSize();

};

Data members:

pivTaxonomy
    A pointer to a taxonomy object.

ivItemField
    The name of the field in the input data to be used as the item
    identifier.

ivTransactionField
    The name of the field in the input data to be used as the transaction
    identifier.

ivMinConfidence
    The minimum confidence, expressed as a percentage in decimal
    representation (may contain digits after the decimal point).

ivMinSupport
    The minimum support, expressed as a percentage in decimal
    representation (may contain digits after the decimal point).

ivSortFlag
    This sort flag specifies whether the data needs to be sorted according
    to the transaction ID. IDM_TRUE means that the data needs to be sorted.

ivItemConstraints
    The item constraints on the rules to be computed.

ivMaxRuleSize
    The maximum number of items of an association rule, that is, the
    generation of longer rules will be suppressed by the function.
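The thresholds ivMinSupport and ivMinConfidence are percentages. The following sketch shows how such percentages could be computed for a single rule and compared with the thresholds. It uses the usual textbook definitions of support and confidence, which this section does not spell out, and plain standard-library containers rather than Intelligent Miner classes. The names ItemSet, supportPercent, confidencePercent, and passesThresholds are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in: one transaction is the set of item identifiers
// that share the same transaction identifier.
typedef std::set<std::string> ItemSet;

// Returns true if every item of `needle` occurs in `transaction`.
static bool containsAll(const ItemSet &transaction, const ItemSet &needle) {
    for (ItemSet::const_iterator it = needle.begin(); it != needle.end(); ++it)
        if (transaction.count(*it) == 0) return false;
    return true;
}

// Percentage of transactions that contain all items in `items`.
double supportPercent(const std::vector<ItemSet> &transactions,
                      const ItemSet &items) {
    assert(!transactions.empty());
    std::size_t hits = 0;
    for (std::size_t i = 0; i < transactions.size(); ++i)
        if (containsAll(transactions[i], items)) ++hits;
    return 100.0 * hits / transactions.size();
}

// Union of rule body and rule head.
static ItemSet unionOf(const ItemSet &body, const ItemSet &head) {
    ItemSet both(body);
    both.insert(head.begin(), head.end());
    return both;
}

// Confidence of the rule body => head as a percentage:
// support(body and head together) divided by support(body).
double confidencePercent(const std::vector<ItemSet> &transactions,
                         const ItemSet &body, const ItemSet &head) {
    double bodySupport = supportPercent(transactions, body);
    if (bodySupport == 0.0) return 0.0;
    return 100.0 * supportPercent(transactions, unionOf(body, head)) / bodySupport;
}

// Assumption: a rule is kept only if it reaches both thresholds.
bool passesThresholds(const std::vector<ItemSet> &transactions,
                      const ItemSet &body, const ItemSet &head,
                      double minSupport, double minConfidence) {
    return supportPercent(transactions, unionOf(body, head)) >= minSupport &&
           confidencePercent(transactions, body, head) >= minConfidence;
}
```

With four transactions of which three contain bread and two contain both bread and milk, the rule bread => milk has a support of 50 and a confidence of about 66.7 under these definitions.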


Member functions:

IDMAssocSettings
    The default constructor.

IDMAssocSettings( IDMRETURN &rc, IString name,...)
    Constructs an association settings object with given values for the
    data members and adds the association object to the ivAssocSettings
    IKeySortedSet collection located in the mining base (class
    IDMMiningBase). If no taxonomy should be used, specify NULL for
    pTaxonomy. An error has occurred if the return code is not equal to
    IDM_SUCCESS. The association settings object should be deleted using
    deleteObject().

~IDMAssocSettings
    The destructor. If the object belongs to the association settings
    object extent of its mining base, it is removed from this
    IKeySortedSet.

createObject
    Constructs an association settings object with given values and returns
    it if no error occurred. The association object is added to the
    ivAssocSettings IKeySortedSet collection located in the mining base
    (class IDMMiningBase). If an error occurred, the object is deleted and
    pAssocSettings is set to NULL.

deleteObject
    Calls the destructor and removes auxiliary files (such as error message
    and trace files) from the disk.

update
    Changes the values of the data members.

get
    Retrieves the values of the data members.

setSortFlag
    Sets the sort flag ivSortFlag to IDM_TRUE or IDM_FALSE.

getSortFlag
    Retrieves the value of ivSortFlag.

getItemConstraints
    Retrieves the item constraints.

setItemConstraints
    Updates the item constraints.

setMaxRuleSize
    Updates the value of the ivMaxRuleSize variable.

getMaxRuleSize
    Retrieves the value of the ivMaxRuleSize variable.
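The createObject and deleteObject descriptions above follow a common calling convention: a static factory reports success through its return code, hands the object back through a pointer reference, and sets that pointer to NULL on error. The sketch below imitates that convention with stand-in types so that it compiles on its own. DemoSettings, DemoReturn, DEMO_SUCCESS, DEMO_ERROR, and the validation rule are all hypothetical and are not part of the Intelligent Miner API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-ins; NOT the Intelligent Miner types or return codes.
typedef int DemoReturn;
const DemoReturn DEMO_SUCCESS = 0;
const DemoReturn DEMO_ERROR = 1;

class DemoSettings {
    std::string ivName;
    double ivMinSupport;

    // Construction and destruction go through createObject/deleteObject.
    DemoSettings(const std::string &name, double minSupport)
        : ivName(name), ivMinSupport(minSupport) {}
    ~DemoSettings() {}

  public:
    // Factory in the style of createObject(): the return code reports the
    // outcome, and the out-parameter is NULL whenever an error occurred.
    static DemoReturn createObject(const std::string &name, double minSupport,
                                   DemoSettings *&pSettings) {
        pSettings = NULL;
        // Assumed validation rule, for illustration only.
        if (name.empty() || minSupport < 0.0 || minSupport > 100.0)
            return DEMO_ERROR;
        pSettings = new DemoSettings(name, minSupport);
        return DEMO_SUCCESS;
    }

    // Counterpart in the style of deleteObject(): the caller never uses
    // the delete operator directly.
    DemoReturn deleteObject() {
        delete this;
        return DEMO_SUCCESS;
    }

    double getMinSupport() const { return ivMinSupport; }
};
```

A caller written against this convention checks the return code first and only then uses the pointer, and releases the object with deleteObject() rather than with delete.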

IDMSeqPatternSettings

The class IDMSeqPatternSettings combines all data definitions and other specifications that are necessary to start a sequential-patterns run and to compute the appropriate sequential pattern rules.

Header file: idmcsps.hpp

Format:


class IDMSeqPatternSettings : public IDMSettings {
    IDMTaxonomy *pivTaxonomy;
    IString ivItemField;
    IString ivCustomerField;
    IString ivOrderingKeyField;
    IString ivTimeFormat;
    IDMDOUBLE ivMinCustSupport;
    IDMBOOLEAN ivSortFlag;
    IDMItemConstraints ivItemConstraints;
    IDMINTEGER ivMaxRuleSize;

  public:
    IDMSeqPatternSettings();
    IDMSeqPatternSettings( IDMRETURN &rc,
            IString name,
            IDMMiningBase *pMiningBase,
            IDMData *pData,
            IDMTaxonomy *pTaxonomy,
            IDMSelections &selection,
            IDMItemConstraints &itemConstraints,
            IString itemField,
            IString customerField,
            IString orderingKeyField,
            IString timeFormat,
            IDMDOUBLE fMinCustSupport,
            IDMINTEGER maxPatternSize=0 );

    ~IDMSeqPatternSettings();
    static IDMRETURN createObject( IString name,
            IDMMiningBase *pMiningBase,
            IDMData *pData,
            IDMTaxonomy *pTaxonomy,
            IDMSelections &selection,
            IDMItemConstraints &itemConstraints,
            IString itemField,
            IString customerField,
            IString orderingKeyField,
            IString timeFormat,
            IDMDOUBLE fMinCustSupport,
            IDMINTEGER maxPatternSize,
            IDMSeqPatternSettings *&pSeqPatternSettings );

    IDMRETURN deleteObject();
    IDMRETURN update( IString name,
            IDMData *pData,
            IDMTaxonomy *pTaxonomy,
            IDMSelections &selection,
            IDMItemConstraints &itemConstraints,
            IString itemField,
            IString customerField,
            IString orderingKeyField,
            IString timeFormat,
            IDMDOUBLE fMinCustSupport,
            IDMINTEGER maxPatternSize=0 );

    IDMRETURN get( IString &name,
            IDMMiningBase *&pMiningBase,
            IDMData *&pData,
            IDMTaxonomy *&pTaxonomy,
            IDMSelections &selection,
            IDMItemConstraints &itemConstraints,
            IString &itemField,
            IString &customerField,
            IString &orderingKeyField,
            IString &timeFormat,
            IDMDOUBLE &fMinCustSupport,
            IDMINTEGER &maxPatternSize );

    void setSortFlag( IDMBOOLEAN sortFlag );
    IDMBOOLEAN getSortFlag();
    IDMItemConstraints& getItemConstraints();
    IDMRETURN setItemConstraints(const IDMItemConstraints&);
    IDMRETURN setMaxRuleSize(IDMINTEGER);
    IDMINTEGER getMaxRuleSize();
};

Data members:

pivTaxonomy
    A pointer to a taxonomy object.

ivItemField
    The name of the field in the input data to be used as the item
    identifier.

ivCustomerField
    The name of the field in the input data to be used as the customer
    identifier.

ivOrderingKeyField
    The name of the field in the input data to be used as the ordering key.
    The ordering key defines the transaction sequences: date or time stamp,
    for example.

ivTimeFormat
    A string specifying the time format, for example, ddmmyy.

ivMinCustSupport
    The minimum customer support, expressed as a percentage in decimal
    representation (may contain digits after the decimal point).

ivSortFlag
    The sort flag specifies whether the transactions need to be sorted
    according to the customer ID as primary key and the ordering key as
    secondary key. IDM_TRUE means that the data needs to be sorted.

ivItemConstraints
    The item constraints on the sequential patterns to be computed.

ivMaxRuleSize
    The maximum number of transactions in a sequential pattern, that is,
    the generation of longer rules will be suppressed by the function.

Member functions:

IDMSeqPatternSettings
    The default constructor.

IDMSeqPatternSettings( IDMRETURN &rc, IString name,...)
    Constructs a sequential pattern settings object with given values for
    all data members and adds the sequential pattern object to the
    ivSeqPatternSettings IKeySortedSet collection located in the mining
    base (class IDMMiningBase). If no taxonomy object should be used, NULL
    must be specified for pTaxonomy. An error has occurred if the return
    code is not equal to IDM_SUCCESS. The sequential pattern settings
    object can be deleted using deleteObject().

~IDMSeqPatternSettings
    The destructor. If the object belongs to the sequential pattern
    settings object extent of its mining base, it is removed from this
    IKeySortedSet.

createObject
    Constructs a sequential pattern settings object with given values and
    returns it if no error occurred. The sequential pattern object is added
    to the ivSeqPatternSettings IKeySortedSet collection located in the
    mining base (class IDMMiningBase). If an error occurred, the result
    object is deleted and pSeqPatternSettings is set to NULL.

get
    Retrieves the values of the data members.

setSortFlag
    Sets the sort flag ivSortFlag to IDM_TRUE or IDM_FALSE.

getItemConstraints
    Retrieves the item constraints.

setItemConstraints
    Updates the item constraints.

setMaxRuleSize
    Updates the value of the ivMaxRuleSize variable.

getMaxRuleSize
    Retrieves the value of the ivMaxRuleSize variable.

IDMSimSeqSettings

The class IDMSimSeqSettings combines all data definitions and other specifications that are necessary to start a similar-time-sequences run and to compute the appropriate similarities in time sequences.

Header file: idmcsim.hpp

Format:

class IDMSimSeqSettings : public IDMSettings {
    IString ivSeqField;
    IString ivTimeField;
    IString ivValueField;
    IString ivTimeFormat;
    IDMDOUBLE ivEpsilon;
    IDMINTEGER ivGap;
    IDMREAL ivWindowSize;
    IDMREAL ivMatchFraction;
    IDMBOOLEAN ivSortFlag;

  public:
    IDMSimSeqSettings();
    IDMSimSeqSettings( IDMRETURN &rc,
            IString name,
            IDMMiningBase *pMiningBase,
            IDMData *pData,
            IDMSelections &selection,
            IString seqField,
            IString timeField,
            IString valueField,
            IString timeFormat,
            IDMDOUBLE epsilon,
            IDMINTEGER gap,
            IDMREAL windowSize,
            IDMREAL matchFraction );

    ~IDMSimSeqSettings();
    static IDMRETURN createObject( IString name,
            IDMMiningBase *pMiningBase,
            IDMData *pData,
            IDMSelections &selection,
            IString seqField,
            IString timeField,
            IString valueField,
            IString timeFormat,
            IDMDOUBLE epsilon,
            IDMINTEGER gap,
            IDMREAL windowSize,
            IDMREAL matchFraction,
            IDMSimSeqSettings *&pSimSeqSettings );

    IDMRETURN deleteObject();
    IDMRETURN update( IString name,
            IDMData *pData,
            IDMSelections &selection,
            IString seqField,
            IString timeField,
            IString valueField,
            IString timeFormat,
            IDMDOUBLE epsilon,
            IDMINTEGER gap,
            IDMREAL windowSize,
            IDMREAL matchFraction );

    IDMRETURN get( IString &name,
            IDMMiningBase *&pMiningBase,
            IDMData *&pData,
            IDMSelections &selection,
            IString &seqField,
            IString &timeField,
            IString &valueField,
            IString &timeFormat,
            IDMDOUBLE &epsilon,
            IDMINTEGER &gap,
            IDMREAL &windowSize,
            IDMREAL &matchFraction );

    void setSortFlag( IDMBOOLEAN sortFlag );
    IDMBOOLEAN getSortFlag();
};

Data members:

ivSeqField
    A field name identifying the input sequence. Normally a data file
    consists of a series of input sequences.

ivTimeField
    A field name identifying the time stamp in the sequence.

ivValueField
    A field name identifying the value belonging to the time stamp in the
    sequence.

ivTimeFormat
    A string specifying the time format, for example, ddmmyy.

ivEpsilon
    The width of an envelope drawn around a given input sequence S (P). If
    a subsequence Pj of P is a projection of a subsequence Si of S (that
    is, the matching length for atomic subsequences is greater than the
    window size, and consecutive outliers smaller than the gap size between
    these subsequences are allowed) and if Pj lies within the envelope of
    Si, Si and Pj are called "similar".

ivGap
    The maximum length of successive outliers (that is, data points that do
    not fit into the user-defined envelope for a given sequence) which can
    be ignored.

ivWindowSize
    The length of a subsequence (window) specifying the atomic unit for
    matching in which no outlier is allowed.

ivMatchFraction
    The minimum length of the similar subsequences that should be shown by
    the visualizer. The matching length is given as the fraction of the sum
    of the lengths of the unscaled similar subsequences Si, Pj to twice the
    total length of the shorter of the two input sequences S, P that are
    compared (range from 0 to 1). As a rule of thumb, a match fraction of
    0.01 selects all similar subsequences longer than one percent of the
    shorter input sequence.

ivSortFlag
    The sort flag specifies whether the transactions need to be sorted
    according to the sequence ID (ivSeqField). IDM_TRUE means that the data
    needs to be sorted.
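The ratio described for ivMatchFraction can be written out as a short calculation. The sketch below is only an illustration of that ratio with plain numbers. The function names matchFraction and isShown are hypothetical, and the use of "greater than or equal" for the comparison with the threshold is an assumption, because the description does not say how equality is treated.

```cpp
#include <algorithm>
#include <cassert>

// Ratio described for ivMatchFraction:
// (length of Si + length of Pj) / (2 * length of the shorter input sequence).
double matchFraction(double lenSi, double lenPj, double lenS, double lenP) {
    assert(lenS > 0.0 && lenP > 0.0);
    return (lenSi + lenPj) / (2.0 * std::min(lenS, lenP));
}

// Assumption: a pair of similar subsequences is shown when its ratio
// reaches the configured minimum.
bool isShown(double lenSi, double lenPj, double lenS, double lenP,
             double minMatchFraction) {
    return matchFraction(lenSi, lenPj, lenS, lenP) >= minMatchFraction;
}
```

For two input sequences of 1000 and 2000 points, a pair of similar subsequences of 10 points each gives (10 + 10) / (2 * 1000) = 0.01, which corresponds to one percent of the shorter sequence, in line with the rule of thumb in the description.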

Member functions:

IDMSimSeqSettings
    The default constructor.

IDMSimSeqSettings( IDMRETURN &rc, IString name,...)
    Constructs a similar-time-sequences-settings object with given values
    for all data members and adds the object to the ivSimSeqSettings
    IKeySortedSet collection located in the mining base (class
    IDMMiningBase). An error has occurred if the return code is not equal
    to IDM_SUCCESS. The similar-time-sequences-settings object should be
    deleted using deleteObject().

~IDMSimSeqSettings
    The destructor. If the object belongs to the similar-sequence-settings
    object extent of its mining base, it is removed from this
    IKeySortedSet.

createObject
    Constructs a similar-sequences-settings object with given values and
    returns it if no error occurred. The object is added to the
    ivSimSeqSettings IKeySortedSet collection located in the mining base
    (class IDMMiningBase). If an error occurred, the object is deleted and
    pSimSeqSettings is set to NULL.

get
    Retrieves the values of all private data members.

setSortFlag
    Sets the sort flag ivSortFlag to IDM_TRUE or IDM_FALSE.


IDMDescQuantSampleSettings

The class IDMDescQuantSampleSettings holds the parameters of the function for descriptive statistics, quantiles, and sampling.

The term quantile is defined as follows:
    Q is an N%-quantile of a value set S when approximately N% of the
    values in S are less than or equal to Q and approximately (100-N)% of
    the values are greater than or equal to Q. The approximation is less
    exact when there are many values equal to Q. N is called
    quantileLimit. The 50%-quantile is generally known as the median.

Note that a quantile (as a generalization of the median) is a value in the range of the field and not the percentage marker N. For example, given the values {2,2,6,12,40}, the limits 20 and 50 determine the quantiles 2 and 6, respectively.
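The definition above can be illustrated with a nearest-rank calculation on a sorted copy of the values. This is a minimal sketch under the assumption that the nearest-rank method is an acceptable reading of "approximately"; the manual does not state which method the function uses internally. The function name quantile is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Nearest-rank quantile: the smallest value Q in the set such that at
// least limitPercent percent of the values are less than or equal to Q.
double quantile(std::vector<double> values, double limitPercent) {
    assert(!values.empty());
    assert(limitPercent > 0.0 && limitPercent <= 100.0);
    std::sort(values.begin(), values.end());
    // Multiply before dividing so that limits such as 20 or 50 stay exact.
    std::size_t rank = static_cast<std::size_t>(
        std::ceil(limitPercent * values.size() / 100.0));
    if (rank == 0) rank = 1;  // defensive guard for very small limits
    return values[rank - 1];
}
```

For the values {2,2,6,12,40}, the limit 20 yields rank 1 and therefore the quantile 2, and the limit 50 yields rank 3 and therefore the quantile 6, which matches the example in the text.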

Header file: idmcdqs.hpp


Format:

typedef enum {
    IDM_NO_SAMPLING,
    IDM_FIRST_RECORDS,
    IDM_LAST_RECORDS,
    IDM_FIRST_LAST_RECORDS,
    IDM_EVERY_NTH_RECORD,
    IDM_RANDOM_PERCENT_REPRODUCIBLE,
    IDM_RANDOM_PERCENT_NONREPRODUCIBLE,
    IDM_RANDOM_NTH_REPRODUCIBLE,
    IDM_RANDOM_NTH_NONREPRODUCIBLE,
    IDM_RANDOM_FIXED_SIZE_REPRODUCIBLE,
    IDM_RANDOM_FIXED_SIZE_NONREPRODUCIBLE,
    IDM_BROWSE_SAMPLE,
    IDM_EXPLORE_SAMPLE
} IDM_SampleType;

class IDMDescQuantSampleSettings : public IDMSettings {
    ISequence<IString> ivStatisticsFields;
    IString ivBivarStatField;
    IDMBOOLEAN ivComputeFTest;

    ISequence<IString> ivQuantileFields;
    ISequence<IDMREAL> ivQuantileLimits;
    IDMINTEGER ivNbExtremeValues;

    ISequence<IString> ivOutputFields;

    ISequence<IString> ivSampleFields;
    IDMINTEGER ivSampleSize;
    IDM_SampleType ivSampleType;

  public:
    IDMDescQuantSampleSettings();
    IDMDescQuantSampleSettings( IDMRETURN &rc,
            IString name,
            IDMMiningBase *pMnb,
            IDMData * pData,
            IDMSelections &selection,
            ISequence<IString> &statisticsFields,
            IString bivarStatField,
            IDMBOOLEAN computeFTest,
            ISequence<IString> &quantileFields,
            ISequence<IDMINTEGER> &quantileLimits,
            IDMINTEGER nbExtremeValues,
            IDMData * pOutputData,
            ISequence<IString> &outputFields,
            ISequence<IString> &sampleFields,
            IDMINTEGER sampleSize,
            IDM_SampleType sampleType );

    ~IDMDescQuantSampleSettings();

    static IDMRETURN createObject( IString name,
            IDMMiningBase *pMnb,
            IDMData * pData,
            IDMSelections &selection,
            ISequence<IString> &statisticsFields,
            IString bivarStatField,
            IDMBOOLEAN computeFTest,
            ISequence<IString> &quantileFields,
            ISequence<IDMREAL> &quantileLimits,
            IDMINTEGER nbExtremeValues,
            IDMData * pOutputData,
            ISequence<IString> &outputFields,
            ISequence<IString> &sampleFields,
            IDMINTEGER sampleSize,
            IDM_SampleType sampleType,
            IDMDescQuantSampleSettings *&pDQS );

    IDMRETURN update( IString name,
            IDMData * pData,
            IDMSelections &selection,
            ISequence<IString> &statisticsFields,
            IString bivarStatField,
            IDMBOOLEAN computeFTest,
            ISequence<IString> &quantileFields,
            ISequence<IDMREAL> &quantileLimits,
            IDMINTEGER nbExtremeValues,
            IDMData * pOutputData,
            ISequence<IString> &outputFields,
            ISequence<IString> &sampleFields,
            IDMINTEGER sampleSize,
            IDM_SampleType sampleType );

    IDMRETURN get( IString &name,
            IDMMiningBase *&pMnb,
            IDMData *&pData,
            IDMSelections &selection,
            ISequence<IString> &statisticsFields,
            IString &bivarStatField,
            IDMBOOLEAN &computeFTest,
            ISequence<IString> &quantileFields,
            ISequence<IDMREAL> &quantileLimits,
            IDMINTEGER &nbExtremeValues,
            IDMData *&pOutputData,
            ISequence<IString> &outputFields,
            ISequence<IString> &sampleFields,
            IDMINTEGER &sampleSize,
            IDM_SampleType &sampleType );
};

Data members:

ivStatisticsFields
    The list of field names for which descriptive univariate statistics
    should be computed.

ivBivarStatField
    The name of the field for which bivariate statistics should be
    computed.

ivComputeFTest
    Specifies whether the F-test should be applied to all pairs of numeric
    statistics fields.

ivQuantileFields
    The names of the continuous fields for which quantiles and extreme
    values should be computed.

ivQuantileLimits
    The list of limits, in percent, determining the quantiles to be
    computed for each quantile field.

ivNbExtremeValues
    The number of extreme values to be computed for each quantile field.
    Extreme values are the highest or lowest values in the range of the
    corresponding field.

ivOutputFields
    The list of names of fields whose values should be copied into the
    output table.

ivSampleFields
    The list of names of fields whose values should be sampled.

ivSampleSize
    A positive integer specifying the size of the sample. For details see
    ivSampleType below.

ivSampleType
    The sampling type. The following sampling types are possible:

    IDM_NO_SAMPLING
        No sample is created. The value of ivSampleSize is ignored.

    IDM_FIRST_RECORDS
        The sample consists of the beginning of the data. It contains the
        number of records given by ivSampleSize.

    IDM_LAST_RECORDS
        The sample consists of the end of the data. It contains the number
        of records given by ivSampleSize.

    IDM_FIRST_LAST_RECORDS
        The sample consists of records from the beginning and the end of
        the data. The size is twice as large as ivSampleSize.

    IDM_EVERY_NTH_RECORD
        Every n-th record belongs to the sample; n is given by
        ivSampleSize.

    IDM_RANDOM_PERCENT_REPRODUCIBLE
        The sample consists of records chosen at random. The seed for the
        random number generation is a fixed value such that the sequence
        of selected records is reproducible. ivSampleSize determines the
        percentage of records.

    IDM_RANDOM_PERCENT_NONREPRODUCIBLE
        The sample consists of records chosen at random. The seed for the
        random number generation is not fixed. ivSampleSize determines the
        percentage of records.

    IDM_RANDOM_NTH_REPRODUCIBLE
        Approximately every n-th record is selected; n is given by
        ivSampleSize.

    IDM_RANDOM_NTH_NONREPRODUCIBLE
        Same as above, but the selection cannot be reproduced.

    IDM_RANDOM_FIXED_SIZE_REPRODUCIBLE
        The sample consists of n records selected at random; n is given by
        ivSampleSize.

    IDM_RANDOM_FIXED_SIZE_NONREPRODUCIBLE
        Same as above, but the selection cannot be reproduced.

    IDM_BROWSE_SAMPLE
        A small sample consisting of the first 1000 records is selected.

    IDM_EXPLORE_SAMPLE
        The sample consists of a random selection of up to 5000 records.

    Note that, in the case of input from a database, the sample cannot be
    guaranteed to be reproducible, regardless of the sample type.
    Independent runs with the same settings can therefore lead to
    different results when DB2 is used.
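Several of the sampling types listed above can be pictured as simple selections over an in-memory list of records. The sketch below is an illustration only and does not reproduce the sampling code of the product. Records is a hypothetical stand-in for the input data, the function names are hypothetical, and two points are assumptions: record counting for the every-n-th selection starts at 1, and the random-percent selection keeps each record independently with the given probability. The fixed seed shows why a reproducible type returns the same records on every run over the same flat input.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in for the input data: one int per record.
typedef std::vector<int> Records;

// In the style of IDM_FIRST_RECORDS: the first n records.
Records firstRecords(const Records &data, std::size_t n) {
    if (n > data.size()) n = data.size();
    return Records(data.begin(),
                   data.begin() + static_cast<Records::difference_type>(n));
}

// In the style of IDM_LAST_RECORDS: the last n records.
Records lastRecords(const Records &data, std::size_t n) {
    if (n > data.size()) n = data.size();
    return Records(data.end() - static_cast<Records::difference_type>(n),
                   data.end());
}

// In the style of IDM_FIRST_LAST_RECORDS: n records from each end,
// so the sample holds 2 * n records when the data is large enough.
Records firstLastRecords(const Records &data, std::size_t n) {
    Records sample = firstRecords(data, n);
    Records tail = lastRecords(data, n);
    sample.insert(sample.end(), tail.begin(), tail.end());
    return sample;
}

// In the style of IDM_EVERY_NTH_RECORD. Assumption: counting starts at 1,
// so records n, 2n, 3n, ... are kept.
Records everyNthRecord(const Records &data, std::size_t n) {
    assert(n > 0);
    Records sample;
    for (std::size_t i = n - 1; i < data.size(); i += n)
        sample.push_back(data[i]);
    return sample;
}

// In the style of the RANDOM_PERCENT types. Assumption: each record is
// kept independently with probability percent / 100. Passing the same
// seed again repeats the selection exactly.
Records randomPercent(const Records &data, double percent, unsigned seed) {
    assert(percent >= 0.0 && percent <= 100.0);
    std::mt19937 generator(seed);
    std::bernoulli_distribution keep(percent / 100.0);
    Records sample;
    for (std::size_t i = 0; i < data.size(); ++i)
        if (keep(generator)) sample.push_back(data[i]);
    return sample;
}
```

A non-reproducible variant would differ only in how the seed is chosen, for example by drawing it from std::random_device instead of passing a fixed value.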

Member functions:

IDMDescQuantSampleSettings()
    The default constructor.

~IDMDescQuantSampleSettings
    The destructor. If the object belongs to the extent of its mining base,
    it is removed from this IKeySortedSet.

IDMDescQuantSampleSettings( IDMRETURN &rc, IString name, ... )
    Constructs a settings object with the given values. The object is added
    to the corresponding IKeySortedSet collection that is located in the
    mining base (class IDMMiningBase). An error has occurred if the return
    code is not equal to IDM_SUCCESS.

createObject
    Constructs a settings object with given values and returns it if no
    error occurred. The object is added to the corresponding IKeySortedSet
    collection that is located in the mining base (class IDMMiningBase).
    If an error occurred, the object is deleted and the reference
    parameter pDQS is set to NULL.

get
    Retrieves the values of all data members of this class.

IDMClusteringSettings

The class IDMClusteringSettings combines all data definitions and other specifications that are necessary to start a neural clustering or demographic clustering run.

Header file: idmcclus.hpp

Format:


typedef enum {
    IDM_ABSOLUTE,
    IDM_STAND_DEV,
    IDM_RANGE,
    IDM_NICE_ABSOLUTE,
    IDM_NICE_STAND_DEV,
    IDM_NICE_RANGE
} IDM_WidthUnit;

typedef enum {
    IDM_NO_WEIGHTING,
    IDM_INFO_WEIGHTING,
    IDM_PROB_WEIGHTING,
    IDM_COMP_INFO_WEIGHTING,
    IDM_COMP_PROB_WEIGHTING
} IDM_WeightingType;

class IDMClusFieldParams: public IDMBase {
    IString ivFieldName;
    IDMREAL ivFieldWeight;
    IDM_WeightingType ivWeightingType;
    IDMREAL ivDistanceUnit;
    IDM_WidthUnit ivUnitOfDistanceUnit;
    IDMValueMapping *pivSimValueMapping;
    IString similarityFunction;

  public:
    IDMClusFieldParams();
    IDMClusFieldParams( IDMRETURN &rc,
            IString fieldName,
            IDMREAL fieldWeight=1,
            IDM_WeightingType weightType= IDM_NO_WEIGHTING,
            IDMREAL distanceUnit=-1,
            IDM_WidthUnit unitOfDistanceUnit = IDM_ABSOLUTE,
            IDMValueMapping *pSimVmp=NULL,
            IString similarityFunction = "");

    IDMClusFieldParams( const IDMClusFieldParams &fieldParams );
    IDMClusFieldParams& operator= (const IDMClusFieldParams & );

    ~IDMClusFieldParams();
    IDMRETURN get( IDMREAL &fieldWeight,
            IDM_WeightingType &weightingType,
            IDMREAL &distanceUnit,
            IDM_WidthUnit &unitOfDistanceUnit,
            IDMValueMapping *&pSimVmp,
            IString &similarityFunction );

    IDMRETURN update( IDMREAL fieldWeight=1,
            IDM_WeightingType weightType= IDM_NO_WEIGHTING,
            IDMREAL distanceUnit=-1,
            IDM_WidthUnit unitOfDistanceUnit = IDM_ABSOLUTE,
            IDMValueMapping *pSimVmp=NULL,
            IString similarityFunction = "");

    IString getFieldName() const;
};

typedef enum {
    IDM_CLUS_TYPE_NO,
    IDM_CLUS_TYPE_DEMO,
    IDM_CLUS_TYPE_NEURAL
} IDM_ClusteringType;

typedef enum {
    IDM_DEFAULT_MODEL,
    IDM_KOHONEN_MAP,
    IDM_BACKPROP
} IDM_NetworkModel;

typedef enum {
    IDM_AS_VALID_VALUES,
    IDM_INCREASING_BUCKET_SIZES,
    IDM_AS_MISSING_VALUES,
    IDM_AS_EXTREME_VALUES
} IDM_OutlierTreatment;

typedef enum {
    IDM_TRAINING_MODE,
    IDM_TEST_MODE,
    IDM_APPLICATION_MODE
} IDM_UseMode;

class IDMClusteringSettings: public IDMSettings {
    IDM_ClusteringType ivClusType;
    IDM_UseMode ivUseMode;
    ISequence<IString> ivActiveFields;
    ISequence<IString> ivSupplementaryFields;
    ISequence<IString> ivOutputFields;
    IString ivClusterField;
    IString ivScoreField;
    IDMINTEGER ivMaxNumberOfClusters;
    IDMINTEGER ivMaxNumberOfPasses;
    IString ivClusterResult;
    IString ivConfidenceField;
    IString ivSecondClusterField;
    IString ivSecondScoreField;
    IKeySet<IDMClusFieldParams*,IString> ivClusFieldParams;
    IDM_OutlierTreatment ivTreatmentOfOutliers;

    // demographic clustering special parameters
    IDMREAL ivAccuracy;
    IDMREAL ivSimilarityThreshold;

    // neural clustering special parameters
    IDM_NetworkModel ivNetworkModel;
    IDMBOOLEAN ivNormalizeData;
    IDMINTEGER ivNbOfNetworkRows;
    IDMINTEGER ivNbOfNetworkColumns;

    // only used for parallel demographic clustering
    IDMINTEGER ivNPara;
    IDMINTEGER ivNStripe;
    IDMINTEGER ivNSpan;

  public:
    IDMClusteringSettings();
    IDMClusteringSettings( IDMRETURN &rc,
            IDM_ClusteringType clusType,
            IString name,
            IDMMiningBase *pMiningBase,
            IDMData *pData,
            IDMSelections &selection,
            IDM_UseMode useMode,
            ISequence<IString> &activeFields,
            ISequence<IString> &supplementaryFields,
            IDMData *pOutputData,
            ISequence<IString> &outputFields,
            IString clusterField,
            IString scoreField,
            IString confidenceField,
            IString secondClusterField,
            IString secondScoreField,
            IKeySet<IDMClusFieldParams*,IString> &clusFieldParams,
            IDMINTEGER maxNumberOfPasses,
            IDMINTEGER maxNumberOfClusters,
            IDM_OutlierTreatment treatmentOfOutliers,
            IString clusterResult );

    ~IDMClusteringSettings();
    static IDMRETURN createObject( IDM_ClusteringType clusType,
            IString name,
            IDMMiningBase *pMiningBase,
            IDMData *pData,
            IDMSelections &selection,
            IDM_UseMode useMode,
            ISequence<IString> &activeFields,
            ISequence<IString> &supplementaryFields,
            IDMData *pOutputData,
            ISequence<IString> &outputFields,
            IString clusterField,
            IString scoreField,
            IString confidenceField,
            IString secondClusterField,
            IString secondScoreField,
            IKeySet<IDMClusFieldParams*,IString> &clusFieldParams,
            IDMINTEGER maxNumberOfPasses,
            IDMINTEGER maxNumberOfClusters,
            IDM_OutlierTreatment treatmentOfOutliers,
            IString clusterResult,
            IDMClusteringSettings *&pClus );

    IDMRETURN deleteObject();
    IDMRETURN update( IDM_ClusteringType clusType,
            IString name,
            IDMData *pData,
            IDMSelections &selection,
            IDM_UseMode useMode,
            ISequence<IString> &activeFields,
            ISequence<IString> &supplementaryFields,
            IDMData *pOutputData,
            ISequence<IString> &outputFields,
            IString clusterField,
            IString scoreField,
            IString confidenceField,
            IString secondClusterField,
            IString secondScoreField,
            IKeySet<IDMClusFieldParams*,IString> &clusFieldParams,
            IDMINTEGER maxNumberOfPasses,
            IDMINTEGER maxNumberOfClusters,
            IDM_OutlierTreatment treatmentOfOutliers,
            IString clusterResult );

    IDMRETURN get( IDM_ClusteringType &clusType,
            IString &name,
            IDMMiningBase *&pMiningBase,
            IDMData *&pData,
            IDMSelections &selection,
            IDM_UseMode &useMode,
            ISequence<IString> &activeFields,
            ISequence<IString> &supplementaryFields,
            IDMData *&pOutputData,
            ISequence<IString> &outputFields,
            IString &clusterField,
            IString &scoreField,
            IString &confidenceField,
            IString &secondClusterField,
            IString &secondScoreField,
            IKeySet<IDMClusFieldParams*,IString> &clusFieldParams,
            IDMINTEGER &maxNumberOfPasses,
            IDMINTEGER &maxNumberOfClusters,
            IDM_OutlierTreatment treatmentOfOutliers,
            IString &clusterResult );

    IDMRETURN setNeuralClusParameters( IDM_NetworkModel networkModel,
            IDMBOOLEAN normalizeData,
            IDMINTEGER nbOfNetworkRows,
            IDMINTEGER nbOfNetworkColumns);

    IDMRETURN getNeuralClusParameters( IDM_NetworkModel &networkModel,
            IDMBOOLEAN &normalizeData,
            IDMINTEGER &nbOfNetworkRows,
            IDMINTEGER &nbOfNetworkColumns);

    IDMRETURN setDemoClusParameters( IDMREAL similarityThreshold,
            IDMREAL accuracy);

    IDMRETURN getDemoClusParameters( IDMREAL &similarityThreshold,
            IDMREAL &accuracy);

    IDMRETURN setDemoClusParallelParameters( IDMINTEGER nPara,
            IDMINTEGER nStripe,
            IDMINTEGER nSpan);

    IDMRETURN getDemoClusParallelParameters( IDMINTEGER &nPara,
            IDMINTEGER &nStripe,
            IDMINTEGER &nSpan);

    IDM_ClusteringType getClusType();
    IDMRETURN setClusType( IDM_ClusteringType clusType);

    IDMRETURN setTreatmentOfOutliers( IDM_OutlierTreatment treat);
    IDM_OutlierTreatment getTreatmentOfOutliers();
    virtual const IDMException* getException() const;
};

Data members of IDMClusFieldParams:

The class IDMClusFieldParams specifies parameters for a data field.

ivFieldName
    The name of the field.

ivFieldWeight
    The weight for this field. The weight specifies the relative importance
    of this field with respect to all other fields. For example, if the
    field weight is 2.3, the value of the field is used with a factor of
    2.3 during the clustering of this record.

ivWeightingType
    Indicates which type of value weighting should be applied to the field:
    - No weighting (IDM_NO_WEIGHTING)
    - Probabilistic weighting (IDM_PROB_WEIGHTING)
    - Compensated probabilistic weighting (IDM_COMP_PROB_WEIGHTING)
    - Information theoretic weighting (IDM_INFO_WEIGHTING)
    - Compensated information weighting (IDM_COMP_INFO_WEIGHTING)

    This value is used only for demographic clustering.

ivDistanceUnit
    Specifies the distance unit. Two records whose values in a field differ
    by one distance unit are considered 50% similar with respect to this
    field. This value is used only for demographic clustering.

ivUnitOfDistanceUnit
    Indicates whether the distance unit is specified as an absolute number
    or as a factor to be applied to the standard deviation or range. This
    value is used only for demographic clustering and only applies to
    numeric fields.

pivSimValueMapping
    For discrete fields, this allows you to specify the similarity for
    pairs of possible values. These similarities must be values between 0
    and 1, 0 meaning completely different, and 1 meaning identical. This
    value is used only for demographic clustering.

ivSimilarityFunction
    This parameter has been created for future use and is not supported in
    Intelligent Miner Version 2. (The name of the similarity function if
    the field is multi-valued. This value is used only for demographic
    clustering.)

Member functions of IDMClusFieldParams:

IDMClusFieldParams()
    The standard constructor.


IDMClusFieldParams( IDMRETURN &rc,... )
    Constructs an IDMClusFieldParams object with given values for the
    member variables.

IDMClusFieldParams( const IDMClusFieldParams &fieldParams );
    The copy constructor.

IDMClusFieldParams& operator= (const IDMClusFieldParams & );
    The assignment operator.

~IDMClusFieldParams
    The destructor.

update
    Updates the values of the member variables.

getFieldName
    Retrieves the name of the field.
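The ivDistanceUnit description states only one property of the similarity measure for numeric fields: a difference of one distance unit counts as 50% similar. The sketch below shows one curve with that property, purely to make the role of the distance unit concrete. The halving curve, the function names fieldSimilarity and effectiveDistanceUnit, and the handling of the factor are all assumptions; the manual does not give the formula that demographic clustering actually uses.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical similarity curve: 1 for identical values, halved for every
// distance unit of difference, so one unit of difference gives 0.5.
double fieldSimilarity(double a, double b, double distanceUnit) {
    assert(distanceUnit > 0.0);
    return std::pow(0.5, std::fabs(a - b) / distanceUnit);
}

// Assumed reading of ivUnitOfDistanceUnit: when the unit is not absolute,
// the configured number is a factor applied to a reference width such as
// the standard deviation or the range of the field.
double effectiveDistanceUnit(double factor, double referenceWidth) {
    assert(factor > 0.0 && referenceWidth > 0.0);
    return factor * referenceWidth;
}
```

With a distance unit of 5, the values 10 and 15 are 50% similar under this curve and the values 10 and 20 are 25% similar. A factor of 0.5 applied to a standard deviation of 10 gives the same unit of 5.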

Data members of IDMClusteringSettings:

ivClusType
    Identifies whether the object is a neural-clustering-settings object
    (IDM_CLUS_TYPE_NEURAL) or a demographic-clustering-settings object
    (IDM_CLUS_TYPE_DEMO).

ivUseMode
    Specifies the mode for the clustering run. IDM_TRAINING_MODE means that
    the clusters are built. IDM_APPLICATION_MODE means that clusters of
    previous runs are used to place new records.

ivActiveFields
    A sequence collection of field names that are active in the clustering.
    These fields are included when deciding into which cluster a record
    should be placed. Statistics are maintained for these fields.

ivSupplementaryFields
    A sequence collection of field names that are supplementary to the
    clustering. These fields are ignored when deciding in which cluster a
    record is placed. Only statistics are maintained for these fields.

ivOutputFields
    A sequence collection of field names that appear in the produced output
    data if pivOutputData is set.

ivClusterField
    The name of the field in the output data into which the name of the
    cluster that a record was assigned to is written. This name must be
    specified if pivOutputData is set.

ivScoreField
    The name of the field in the output data that holds the score for each
    record against the cluster the record was assigned to. This name is
    optional if pivOutputData is set.

ivConfidenceField
    The name of the field in the output data into which the confidence
    value is written. Specifying this field is optional. The confidence
    value describes how close the decision was to put the record in the
    best and not in the second best cluster. The value is between 0 and 1.
    A value near 1 indicates that the record fits a lot better into the
    best than into the second best cluster. A value near 0 indicates that
    the difference between putting the record into the best and the second
    best cluster is small.

ivSecondClusterField
    The name of the field in the output data into which the number of the
    second best cluster is written. Specifying this field is optional.

ivSecondScoreField
    The name of the field in the output data into which the score for each
    record against the second best cluster is written. Specifying this
    field is optional.

ivClusFieldParams
    The key set of clustering field parameters objects. The fields
    described by this key set are a subset of the active fields. The number
    of objects in this key set can also be zero.

ivMaxNumberOfClusters
    The maximum number of clusters that are to be created during a
    clustering run. Negative values are illegal. The value of 0 (zero) is
    interpreted as "unlimited".

ivMaxNumberOfPasses
    The maximum number of clustering passes. If you do not specify a value,
    a default value of 2 is used.

ivClusterResult
    The name of a result object. For use mode IDM_APPLICATION_MODE, the
    statistics and the cluster information are read from this result and
    are used to place the records of the input data.

    In IDM_TRAINING_MODE an input result can also be specified. In this
    case the statistics of the fields are read out of the specified result.
    Note that the active fields must be a subset of the fields used in the
    cluster result. Names and types of the fields must match. The same is
    valid for the supplementary fields. Specifying this result object name
    in IDM_TRAINING_MODE is optional. If no input result is to be used,
    this must be an empty string.

ivAccuracy
    The accuracy describes the change in the overall quality (score) of the
    clustering between two passes relative to the number of field entries.
    Therefore accuracy must be between 0 and 1. A reasonable value would be
    1:1000 (or 0.001), which means that, given 5 fields, one record in 5000
    has increased its score by an average of 1. The function stops if the
    maximum number of passes or the accuracy is reached.

ivSimilarityThreshold
    Defines the threshold of the similarity of two records. For example, in
    a simple case with only categorical fields and no user-specified
    similarity matrix, if the SimilarityThreshold is set to 0.25, two
    records will end up in the same cluster if they coincide in 25% of
    their field values. The value for the similarity threshold must be
    between 0 and 1. The default is 0.5.

ivTreatmentOfOutliers
    Indicates how outliers should be treated:
    - As valid values (IDM_AS_VALID_VALUES).
    - The number of buckets should be extended, adding buckets of
      increasing size, so that the outliers fit in
      (IDM_INCREASING_BUCKET_SIZES).
    - As missing values (IDM_AS_MISSING_VALUES).
    - As extreme values (IDM_AS_EXTREME_VALUES). The lowest limits or
      highest limits are used instead of these values.

ivNetworkModel
    The network model that should be used.

ivNormalizeData
    Normalizing numeric data means that the minimum value found in an input
    field is mapped to zero, the maximum to one, and the intermediate
    inputs are scaled within that range.

ivNbOfNetworkRows
    The number of network rows. When the number of network rows and the
    number of network columns (ivNbOfNetworkColumns) are set, the neural
    clustering function creates the clusters in the form of a rectangular
    grid. The number of network rows is one dimension of the grid. The
    number of network columns is the other dimension of the grid. The
    number of network columns is appropriate to the maximum number of
    clusters. If working with ivNbOfNetworkRows and ivNbOfNetworkColumns,
    ivMaxNumberOfClusters does not need to be set.

ivNbOfNetworkColumns
    The number of network columns. See also the description of parameter
    ivNbOfNetworkRows.

ivNPara
    This value is only used by the parallel demographic function. It
    specifies the number of records after which the clustering models that
    are built by the parallel processes are merged. The default is 10000.

ivNStripe
    This value is only used by the parallel demographic function. It
    specifies the number of records per data stripe during the partitioning
    of the data. If the program idmdiseg was used to partition the data and
    a stripe value was specified as input, this value should be the same.
    The default value is 1.

ivNSpan
    This value is only used by the parallel demographic function. It
    specifies how many records should be used to build the initial
    clustering model in a serial way. The default is 1440.

Member functions:

IDMClusteringSettings
    The default constructor.

~IDMClusteringSettings
    The destructor. If the object belongs to the clustering-settings object extent of its mining base, it is removed from this IKeySortedSet.

IDMClusteringSettings( IDMRETURN rc, ... )
    Constructs a clustering-settings object with the given values. The object is added to the clustering-settings IKeySortedSet collection, which is located in the mining base (class IDMMiningBase). An error has occurred if the return code is not equal to IDM_SUCCESS. The clustering-settings object should be deleted using deleteObject().

createObject
    Constructs a clustering-settings object with the given values and returns it if no error occurred. The object is added to the clustering-settings IKeySortedSet collection, which is located in the mining base (class IDMMiningBase). If an error occurred, the object is deleted and the reference parameter pClus is set to NULL.

update
    Changes the values of the data members that are common to demographic and neural clustering.

get
    Retrieves the values of the data members that are common to demographic and neural clustering.

setNeuralClusParameters
    Sets the special neural clustering parameters.

getNeuralClusParameters
    Retrieves the special neural clustering parameters.

setDemoClusParameters
    Sets the special demographic clustering parameters.

getDemoClusParameters
    Retrieves the special demographic clustering parameters.

setDemoClusParallelParameters
    Sets the special demographic parallel parameters.

getDemoClusParallelParameters
    Retrieves the special demographic parallel parameters.

getClusType
    Returns the type of the clustering-settings object. IDM_CLUS_TYPE_DEMO means that the object is a demographic-clustering-settings object and IDM_CLUS_TYPE_NEURAL means that the object is a neural-clustering-settings object. IDM_CLUS_TYPE_NO means that no type is specified.

setClusType
    Sets the type of the clustering-settings object. IDM_CLUS_TYPE_DEMO means that the object is a demographic-clustering-settings object and IDM_CLUS_TYPE_NEURAL means that the object is a neural-clustering-settings object. IDM_CLUS_TYPE_NO means that the object has no type.

setTreatmentOfOutliers
    Sets the outlier treatment value.

getTreatmentOfOutliers
    Retrieves the outlier treatment value.



IDMClassifySettings

The class IDMClassifySettings combines all data definitions and other specifications that are necessary to start a neural classification run or a tree classification run, and to compute the appropriate classification results.

Header file: idmcclf.hpp

Format:
typedef enum {
    IDM_CLF_TYPE_NO,
    IDM_CLF_TYPE_TREE,
    IDM_CLF_TYPE_NEURAL
} IDM_ClassificationType;

typedef enum {
    IDM_PRUNED_TREE,
    IDM_UNPRUNED_TREE
} IDM_TreeType;

typedef enum {
    IDM_TRAINING_MODE,
    IDM_TEST_MODE,
    IDM_APPLICATION_MODE
} IDM_UseMode;

typedef enum {
    IDM_DEFAULT_MODEL,
    IDM_KOHONEN_MAP,
    IDM_BACKPROP
} IDM_NetworkModel;

typedef enum {
    IDM_AS_VALID_VALUES,
    IDM_INCREASING_BUCKET_SIZES,
    IDM_AS_MISSING_VALUES,
    IDM_AS_EXTREME_VALUES
} IDM_OutlierTreatment;

class IDMClassifySettings: public IDMSettings {

    IDM_ClassificationType ivClfType;
    IDM_UseMode ivUseMode;
    ISequence<IString> ivInputFields;
    ISequence<IString> ivOutputFields;
    IString ivClassField;
    IString ivClassifyResult;
    IString ivOutputClassField;
    IString ivClassStrengthField;
    IDMValueMapping* pivErrorWeightValueMapping;

    // neural classification special parameters
    IDMREAL ivMinCorrectClassifyRate;
    IDMREAL ivMaxIncorrectClassifyRate;
    IDMBOOLEAN ivComputeSensitivity;
    IDMBOOLEAN ivNormalizeData;
    IDM_NetworkModel ivNetworkModel;
    IDMINTEGER ivInFraction;
    IDMINTEGER ivOutFraction;
    IDMINTEGER ivMaxNbOfPasses;
    IDMINTEGER ivNbOfHiddenUnits1;
    IDMINTEGER ivNbOfHiddenUnits2;
    IDMINTEGER ivNbOfHiddenUnits3;
    IDMREAL ivLearnRate;
    IDMREAL ivMomentum;
    IDM_OutlierTreatment ivTreatmentOfOutliers;

    // tree classification special parameters
    IDM_TreeType ivTreeType;
    IDMINTEGER ivMaxTreeDepth;
    ISequence<IDMREAL> ivAttributeWeights;
    IDMREAL ivPurity;
    IDMINTEGER ivMinNbOfRecordsPerNode;
    IDMBOOLEAN ivProduceUnprunedTree;
    IDMBOOLEAN ivProduceDistributions;

public:
    IDMClassifySettings();
    IDMClassifySettings( IDMRETURN &rc,
        IDM_ClassificationType clfType,
        IString name,
        IDMMiningBase *pMiningBase,
        IDMData *pData,
        IDMSelections selections,
        ISequence<IString> &inputFields,
        ISequence<IString> &outputFields,
        IString classField,
        IDM_UseMode useMode,
        IString classifyResult,
        IDMData *pOutputData,
        IString outputClassField );

    ~IDMClassifySettings();

    static IDMRETURN createObject(
        IDM_ClassificationType clfType,
        IString name,
        IDMMiningBase *pMiningBase,
        IDMData *pData,
        IDMSelections selections,
        ISequence<IString> &inputFields,
        ISequence<IString> &outputFields,
        IString classField,
        IDM_UseMode useMode,
        IString classifyResult,
        IDMData *pOutputData,
        IString outputClassField,
        IDMClassifySettings *&pClassifSettings );

    IDMRETURN deleteObject();

    IDMRETURN update( IDM_ClassificationType clfType,
        IString name,
        IDMData *pData,
        IDMSelections selections,
        ISequence<IString> &inputFields,
        ISequence<IString> &outputFields,
        IString classField,
        IDM_UseMode useMode,
        IString classifyResult,
        IDMData *pOutputData,
        IString outputClassField );

    IDMRETURN get( IDM_ClassificationType &clfType,
        IString &name,
        IDMMiningBase *&pMiningBase,
        IDMData *&pData,
        IDMSelections &selections,
        ISequence<IString> &inputFields,
        ISequence<IString> &outputFields,
        IString &classField,
        IDM_UseMode &useMode,
        IString &classifyResult,
        IDMData *&pOutputData,
        IString &outputClassField );

    IDMRETURN setNeuralClfParameters(
        IDM_NetworkModel networkModel,
        IDMREAL minCorrectClassifyRate,
        IDMREAL maxIncorrectClassifyRate,
        IDMBOOLEAN computeSensitivity,
        IDMBOOLEAN normalizeData,
        IDMINTEGER inFraction,
        IDMINTEGER outFraction,
        IDMINTEGER maxNbOfPasses,
        IDMINTEGER nbOfHiddenUnits1,
        IDMINTEGER nbOfHiddenUnits2,
        IDMINTEGER nbOfHiddenUnits3,
        IDMREAL learnRate,
        IDMREAL momentum);

    IDMRETURN getNeuralClfParameters(
        IDM_NetworkModel &networkModel,
        IDMREAL &minCorrectClassifyRate,
        IDMREAL &maxIncorrectClassifyRate,
        IDMBOOLEAN &computeSensitivity,
        IDMBOOLEAN &normalizeData,
        IDMINTEGER &inFraction,
        IDMINTEGER &outFraction,
        IDMINTEGER &maxNbOfPasses,
        IDMINTEGER &nbOfHiddenUnits1,
        IDMINTEGER &nbOfHiddenUnits2,
        IDMINTEGER &nbOfHiddenUnits3,
        IDMREAL &learnRate,
        IDMREAL &momentum);

    IDMRETURN setTreeClfParameters(
        IDM_TreeType treeType,
        IDMBOOLEAN produceUnprunedTree,
        IDMBOOLEAN produceDistributions,
        IDMValueMapping *pErrorWeightTable,
        IDMINTEGER maxTreeDepth,
        IDMREAL purity,
        IDMINTEGER minNbOfRecordsPerNode,
        ISequence<IDMREAL> attributeWeights);

    IDMRETURN getTreeClfParameters(
        IDM_TreeType treeType,
        IDMBOOLEAN &produceUnprunedTree,
        IDMBOOLEAN &produceDistributions,
        IDMValueMapping *&pErrorWeightTable,
        IDMINTEGER &maxTreeDepth,
        IDMREAL &purity,
        IDMINTEGER &minNbOfRecordsPerNode,
        ISequence<IDMREAL> &attributeWeights);

    IDM_ClassificationType getClassifyAlg();
    IDMRETURN setClassifyAlg( IDM_ClassificationType alg );
    IDMRETURN setClassStrengthField(IString);
    IString getClassStrengthField();
    IDMRETURN setErrorWeightTable( const IDMValueMapping* );
    IDMValueMapping* getErrorWeightTable();

    // for tree classification only
    IDMRETURN setMaxTreeDepth(IDMINTEGER);
    IDMINTEGER getMaxTreeDepth();
    IDMRETURN setAttributeWeights(const ISequence<IDMREAL>& attrWeights);
    ISequence<IDMREAL>& getAttributeWeights();

    // for neural classification only
    IDMRETURN setTreatmentOfOutliers( IDM_OutlierTreatment );
    IDM_OutlierTreatment getTreatmentOfOutliers();
};

Data members:

ivClfType
    Identifies whether the object is a neural classification object (IDM_CLF_TYPE_NEURAL) or a tree classification object (IDM_CLF_TYPE_TREE). [ Tree-induction and neural networks ]

ivUseMode
    Specifies the operation mode of the classification function. IDM_TRAINING_MODE means that a model is built using the input fields and the known classification in the class field. IDM_TEST_MODE means that the built model is tested against the known classification in the class field, and the accuracy or error of the model is calculated. IDM_APPLICATION_MODE means that the model is used to classify new data, where the records do not need to have a class field. [ Tree-induction and neural networks ]

ivInputFields
    A sequence collection of field names that are used to build the classification model. [ Tree-induction and neural networks ]

ivOutputFields
    A sequence collection of fields that appear in the produced output data. Output data is produced only if pivOutputData is set, and it must be produced in use mode IDM_APPLICATION_MODE. [ Tree-induction and neural networks ]

ivClassField
    The field that holds the known classification in the input data. [ Tree-induction and neural networks ]

ivClassifyResult
    The name of a result object. In use mode IDM_TEST_MODE, this result is tested against the known classification in the input data. In use mode IDM_APPLICATION_MODE, this result is used to assign a classification out of the existing model to the records in the new input data. [ Tree-induction and neural networks ]

ivOutputClassField
    The name of the field in the output data into which the predicted classification of a record is written when producing output data. This name must be specified if pivOutputData is set. [ Tree-induction and neural networks ]

ivClassStrengthField
    The name of the output field whose value indicates the strength of a classification. [ Tree-induction and neural networks ]

ivMinCorrectClassifyRate
    Specifies the minimum percentage of the input data that should be classified correctly. [ Neural networks ]

ivMaxIncorrectClassifyRate
    Specifies the maximum percentage of the input data that might be classified incorrectly. [ Neural networks ]

ivComputeSensitivity
    When this flag is set, the relative importance of the input variables is computed in use mode IDM_TRAINING_MODE. For each input field, the percentage of contribution is added to the results. [ Neural networks ]

ivNormalizeData
    Normalizing numeric data means that the minimum value found in an input field is mapped to zero, the maximum to one, and the intermediate inputs are scaled within that range. [ Neural networks ]


ivInFraction
    The input data is divided into in-fraction and out-fraction data. The in-fraction data is used to build a model and the out-fraction data is used to test the validity of the built model. The ivInFraction number specifies the number of records that are read consecutively before out-fraction records are read. For example, if 3 is specified as in-fraction data and 2 as out-fraction data, then 3 records are read as in-fraction records, the next 2 as out-fraction records, and the next 3 as in-fraction records. This process is continued until the last record of the input data is read. [ Neural networks ]

ivOutFraction
    The input data is divided into in-fraction and out-fraction data. The in-fraction data is used to build a model and the out-fraction data is used to test the validity of the built model. The ivOutFraction number specifies the number of records that are read consecutively before in-fraction records are read. For example, if 3 is specified as in-fraction data and 2 as out-fraction data, then 3 records are read as in-fraction records, the next 2 as out-fraction records, and the next 3 as in-fraction records. This process is continued until the last record of the input data is read. [ Neural networks ]
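The alternating read pattern described for ivInFraction and ivOutFraction can be sketched as a repeating cycle over the record index. The function name markInFraction is hypothetical, and the sketch assumes that the cycle starts with in-fraction records, as in the example in the text:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: for each record index, true means the record belongs
// to the in-fraction (model building) part, false to the out-fraction part.
std::vector<bool> markInFraction(std::size_t recordCount,
                                 std::size_t inFraction,
                                 std::size_t outFraction)
{
    assert(inFraction > 0);
    const std::size_t cycle = inFraction + outFraction;
    std::vector<bool> isIn(recordCount);
    for (std::size_t i = 0; i < recordCount; ++i) {
        // The first inFraction positions of every cycle are in-fraction records.
        isIn[i] = (i % cycle) < inFraction;
    }
    return isIn;
}
```

With an in-fraction of 3 and an out-fraction of 2, records 0 to 2 are in-fraction, records 3 and 4 are out-fraction, and the pattern repeats from record 5.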

ivMaxNbOfPasses
    Specifies the maximum number of passes. [ Neural networks ]

ivNetworkModel
    The network model that should be used. [ Neural networks ]

ivNbOfHiddenUnits1
    By specifying the number of hidden units (ivNbOfHiddenUnits1, ivNbOfHiddenUnits2, ivNbOfHiddenUnits3), you set the architecture of the neural network manually. This requires less processing time than automatic architecture determination (number of hidden units not specified). With automatic architecture determination, the neural classification function evaluates different neural network architectures with different numbers of hidden layers and of processing units in these layers. These alternative models are trained for a fixed number of passes, and then the best network architecture is selected for further training. [ Neural networks ]

ivNbOfHiddenUnits2
    See the description of ivNbOfHiddenUnits1. [ Neural networks ]

ivNbOfHiddenUnits3
    See the description of ivNbOfHiddenUnits1. [ Neural networks ]

ivLearnRate
    The learn rate controls the extent to which the weights (adaptable connections between processing units in different layers) are dynamically adjusted during the training process to improve the model convergence. Larger values indicate greater change, so the network trains more quickly with higher values, but the accuracy is lower. If set to 0, the learn rate is determined automatically. [ Neural networks ]

ivMomentum
    The momentum adjusts the change applied to a weight (adaptable connection between processing units in different layers) by factoring in previous weight updates. It acts as a smoothing parameter that reduces oscillation and helps attain convergence. Lower values mean more training time and more influence of the outlier data on the weights. If set to 0, the momentum is determined automatically. [ Neural networks ]
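The interplay of learn rate and momentum can be illustrated with the textbook gradient-descent-with-momentum update. This is a generic sketch of that well-known rule, not the update used inside the neural classification function, which this reference does not specify. The names WeightState and applyUpdate are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for the weights and the change applied last time.
struct WeightState {
    std::vector<double> weights;
    std::vector<double> lastChange;
};

// Textbook rule: change = -learnRate * gradient + momentum * previous change.
void applyUpdate(WeightState& state,
                 const std::vector<double>& gradient,
                 double learnRate,
                 double momentum)
{
    assert(state.weights.size() == gradient.size());
    assert(state.lastChange.size() == gradient.size());
    for (std::size_t i = 0; i < gradient.size(); ++i) {
        const double change =
            -learnRate * gradient[i] + momentum * state.lastChange[i];
        state.weights[i] += change;
        state.lastChange[i] = change;  // remembered for the next update
    }
}
```

A larger learn rate scales every step, while the momentum term carries part of the previous step forward, which smooths successive changes.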

ivTreatmentOfOutliers
    Indicates how outliers should be treated. [ Neural networks ]
    - As valid values (IDM_AS_VALID_VALUES).
    - The number of buckets is extended, adding buckets of increasing size, so that the outliers fit in (IDM_INCREASING_BUCKET_SIZES).
    - As missing values (IDM_AS_MISSING_VALUES).
    - As extreme values (IDM_AS_EXTREME_VALUES). The lowest limits or highest limits are used instead of these values.
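The effect of the outlier treatment options on a single numeric value can be sketched as follows. The enumeration is a local stand-in that mirrors the IDM_OutlierTreatment values, and TreatedValue and treatOutlier are hypothetical names. How the limits are chosen and how buckets are extended are not modeled here.

```cpp
#include <cassert>

// Local stand-in for IDM_OutlierTreatment (not the product's header).
enum OutlierTreatment {
    AS_VALID_VALUES,
    INCREASING_BUCKET_SIZES,
    AS_MISSING_VALUES,
    AS_EXTREME_VALUES
};

// Hypothetical result type: either a usable value or a missing marker.
struct TreatedValue {
    bool missing;
    double value;
};

TreatedValue treatOutlier(double x, double lowerLimit, double upperLimit,
                          OutlierTreatment treatment)
{
    assert(lowerLimit <= upperLimit);
    TreatedValue result = { false, x };
    const bool isOutlier = (x < lowerLimit) || (x > upperLimit);
    if (!isOutlier) {
        return result;
    }
    switch (treatment) {
    case AS_MISSING_VALUES:
        result.missing = true;
        break;
    case AS_EXTREME_VALUES:
        // The nearest limit replaces the outlying value.
        result.value = (x < lowerLimit) ? lowerLimit : upperLimit;
        break;
    case AS_VALID_VALUES:
    case INCREASING_BUCKET_SIZES:
        // Value is kept unchanged; bucket extension is outside this sketch.
        break;
    }
    return result;
}
```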

ivTreeType
    The type of the tree to be built. You can specify either IDM_PRUNED_TREE or IDM_UNPRUNED_TREE. If in doubt, specify IDM_PRUNED_TREE, because IDM_UNPRUNED_TREE tends to lead to overfitting. [ Tree-induction ]

pivErrorWeightValueMapping
    This mapping specifies the error weight associated with pairs of values for the class field. When the referenced 2-argument value mapping is defined, the first argument identifies the actual value and the second argument identifies the predicted value. [ Tree-induction and neural networks ]
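An error weight mapping keyed by (actual value, predicted value) can be pictured as a lookup table that makes some misclassifications more costly than others. The sketch below uses a standard map instead of IDMValueMapping. The names ErrorWeights and weightedError are hypothetical, and the fallback weights for pairs that are not listed (0 for a correct prediction, 1 for any other error) are an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Key: (actual class value, predicted class value). Value: error weight.
typedef std::map<std::pair<std::string, std::string>, double> ErrorWeights;

// Hypothetical helper: sums the error weights over all records.
double weightedError(const std::vector<std::string>& actual,
                     const std::vector<std::string>& predicted,
                     const ErrorWeights& weights)
{
    assert(actual.size() == predicted.size());
    double total = 0.0;
    for (std::size_t i = 0; i < actual.size(); ++i) {
        ErrorWeights::const_iterator it =
            weights.find(std::make_pair(actual[i], predicted[i]));
        if (it != weights.end()) {
            total += it->second;      // explicitly weighted pair
        } else if (actual[i] != predicted[i]) {
            total += 1.0;             // assumption: unlisted errors weigh 1
        }
    }
    return total;
}
```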

ivMaxTreeDepth
    Specifies the maximum depth of the unpruned classification tree. [ Tree-induction ]

ivPurity
    Specifies the minimum purity for a leaf node. As soon as a leaf node of the unpruned tree reaches this purity value, it is not split anymore. [ Tree-induction ]
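The stop criterion described for ivPurity can be sketched with one common definition of purity: the share of records in a node that belong to the node's most frequent class. This reference does not state how the tree classification function measures purity or on which scale the value is given, so both the definition and the 0 to 1 scale below are assumptions. The names nodePurity and shouldStopSplitting are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Assumed definition: fraction of records carrying the majority class label.
double nodePurity(const std::vector<std::string>& classLabels)
{
    assert(!classLabels.empty());
    std::map<std::string, std::size_t> counts;
    std::size_t best = 0;
    for (std::size_t i = 0; i < classLabels.size(); ++i) {
        best = std::max(best, ++counts[classLabels[i]]);
    }
    return static_cast<double>(best) /
           static_cast<double>(classLabels.size());
}

// A node that has reached the minimum purity is not split further.
bool shouldStopSplitting(const std::vector<std::string>& classLabels,
                         double minPurity)
{
    return nodePurity(classLabels) >= minPurity;
}
```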

ivMinNbOfRecordsPerNode
    Specifies the minimum number of records per internal (non-leaf) node. [ Tree-induction ]

ivAttributeWeights
    Specifies the weight for each input field. [ Tree-induction ]

ivProduceUnprunedTree
    If set to IDM_TRUE, an unpruned tree is produced. [ Tree-induction ]

ivProduceDistributions
    If set to IDM_TRUE, distributions are produced. [ Tree-induction ]

Member functions:

IDMClassifySettings
    The default constructor.

IDMClassifySettings( IDMRETURN rc, ... )
    Constructs a classification-settings object with the given values for all common data members. The object is added to the classification-settings IKeySortedSet collection located in the mining base (class IDMMiningBase). An error has occurred if the return code is not equal to IDM_SUCCESS. The classification-settings object should be deleted using deleteObject().

~IDMClassifySettings
    The destructor. If the object belongs to the classification-settings object extent of its mining base, it is removed from this IKeySortedSet.

createObject
    Constructs a classification-settings object with the given values for all common data members. The object is added to the classification-settings IKeySortedSet collection located in the mining base (class IDMMiningBase). If an error occurred, the object is deleted and pClassifSettings is set to NULL.
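The calling pattern of createObject and deleteObject (check the return code, rely on a NULL pointer after a failure, and release the object through deleteObject rather than delete) can be tried out with a small self-contained mock. MockSettings, ReturnCode, RC_SUCCESS, and RC_INVALID_NAME are hypothetical stand-ins, not Intelligent Miner classes or codes, and the mock does not register the object in a mining base.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical return codes standing in for IDMRETURN values.
enum ReturnCode { RC_SUCCESS = 0, RC_INVALID_NAME = 1 };

// Mock that mimics only the factory and deletion pattern described above.
class MockSettings {
public:
    // On failure the reference parameter is left at NULL.
    static ReturnCode createObject(const std::string& name,
                                   MockSettings*& pSettings)
    {
        pSettings = NULL;
        if (name.empty()) {
            return RC_INVALID_NAME;
        }
        pSettings = new MockSettings(name);
        return RC_SUCCESS;
    }

    // Objects are released through deleteObject(), never with delete.
    ReturnCode deleteObject()
    {
        delete this;
        return RC_SUCCESS;
    }

    const std::string& name() const { return ivName; }

private:
    explicit MockSettings(const std::string& name) : ivName(name) {}
    ~MockSettings() {}
    std::string ivName;
};
```

Making the constructor and destructor private in the mock forces callers through the factory and deleteObject(), which is one way to enforce the rule stated in the text.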


update
    Changes the values of the data members that are common to the Neural Classification and Tree Classification functions.

get
    Retrieves the values of all data members that are common to the Neural Classification and Tree Classification functions.

setNeuralClfParameters
    Sets the special neural classification parameters.

getNeuralClfParameters
    Retrieves the special neural classification parameters.

setTreeClfParameters
    Sets the special tree classification parameters.

getTreeClfParameters
    Retrieves the special tree classification parameters.

getClassifyAlg
    Returns the type of the classification-settings object. IDM_CLF_TYPE_TREE means that the object is a tree-induction classification-settings object and IDM_CLF_TYPE_NEURAL means that the object is a neural-classification-settings object. IDM_CLF_TYPE_NO means that no type is specified.

setClassifyAlg
    Sets the type of the classification-settings object. IDM_CLF_TYPE_TREE means that the object is a tree-induction classification-settings object and IDM_CLF_TYPE_NEURAL means that the object is a neural-classification-settings object. IDM_CLF_TYPE_NO means that the object has no special type.

setClassStrengthField
    Updates the classification strength field.

getClassStrengthField
    Retrieves the classification strength field.

setMaxTreeDepth
    Updates the maximum depth for the classification tree.

getMaxTreeDepth
    Retrieves the maximum depth for the classification tree.

setErrorWeightTable
    Updates the error weight value mapping table.

getErrorWeightTable
    Retrieves the error weight value mapping table.

setAttributeWeights
    Updates the attribute weights.

getAttributeWeights
    Retrieves the attribute weights.



IDMPredictionSettings

The class IDMPredictionSettings combines all data definitions and other specifications that are necessary to start an RBF prediction run or a neural prediction run and to compute the appropriate results.

Header file: idmcpre.hpp


Format:
typedef enum {
    IDM_TRAINING_MODE,
    IDM_TEST_MODE,
    IDM_APPLICATION_MODE
} IDM_UseMode;

typedef enum {
    IDM_PRED_TYPE_NO,
    IDM_PRED_TYPE_RBF,
    IDM_PRED_TYPE_NEURAL
} IDM_PredictionType;

typedef enum {
    IDM_DEFAULT_MODEL,
    IDM_KOHONEN_MAP,
    IDM_BACKPROP
} IDM_NetworkModel;

typedef enum {
    IDM_AS_VALID_VALUES,
    IDM_INCREASING_BUCKET_SIZES,
    IDM_AS_MISSING_VALUES,
    IDM_AS_EXTREME_VALUES
} IDM_OutlierTreatment;

typedef enum {
    IDM_NO_FEEDBACK,
    IDM_FROM_1ST_HIDDEN_LAYER,
    IDM_FROM_OUTPUT_LAYER
} IDM_FeedbackType;

class IDMPredictionSettings : public IDMSettings {

    IDM_UseMode ivUseMode;
    ISequence<IString> ivActiveFields;
    ISequence<IString> ivSupplementaryFields;
    IString ivPredictedField;
    ISequence<IString> ivOutputFields;
    IString ivValueField;
    IDMINTEGER ivMaxNumberOfPasses;
    IDMINTEGER ivInFraction;
    IDMINTEGER ivOutFraction;
    IString ivPredictionResult;
    IDM_PredictionType ivPredType;
    IDMBOOLEAN ivProduceGainChart;
    IDMBOOLEAN ivComputeQuantiles;
    IString ivLowerQuantileField;
    IString ivUpperQuantileField;
    ISequence<IDMREAL> ivQuantileLimits;

    // RBF prediction specific parameters
    IString ivRegionNbField;
    IDMINTEGER ivMinNumberOfPasses;
    IDMINTEGER ivMaxNumberOfCentres;
    IDMINTEGER ivMinRegionSize;
    IDMREAL ivAccuracy;
    ISet<IString> ivPredictedValues;

    // neural prediction specific parameters
    IDM_NetworkModel ivNetworkModel;
    IDMINTEGER ivNbOfHiddenUnits1;
    IDMINTEGER ivNbOfHiddenUnits2;
    IDMINTEGER ivNbOfHiddenUnits3;
    IDMBOOLEAN ivNormalizeData;
    IDMREAL ivLearnRate;
    IDMREAL ivMomentum;
    IDMREAL ivMaxErrorRate;
    IDMREAL ivAvgErrorRate;
    IDMINTEGER ivWindowsSize;
    IDMINTEGER ivForecastHorizon;
    IDM_FeedbackType ivFeedbackType;
    IDMREAL ivDecayFactor;
    IDM_OutlierTreatment ivTreatmentOfOutliers;

public:
    IDMPredictionSettings();
    IDMPredictionSettings( IDMRETURN &rc,
        IDM_PredictionType predType,
        IString name,
        IDMMiningBase *pMiningBase,
        IDMData *pData,
        IDMSelections &selection,
        IDM_UseMode useMode,
        ISequence<IString> &activeFields,
        ISequence<IString> &supplementaryFields,
        IString predictedField,
        IDMData* pOutputData,
        ISequence<IString> &outputFields,
        IString valueField,
        IDMINTEGER maxNumberOfPasses,
        IDMINTEGER inFraction,
        IDMINTEGER outFraction,
        IString predictionResult,
        IDMBOOLEAN produceGainChart,
        IDMBOOLEAN computeQuantiles,
        IString lowerQuantileField,
        IString upperQuantileField );

    ~IDMPredictionSettings();

    static IDMRETURN createObject( IDM_PredictionType predType,
        IString name,
        IDMMiningBase *pMiningBase,
        IDMData *pData,
        IDMSelections &selection,
        IDM_UseMode useMode,
        ISequence<IString> &activeFields,
        ISequence<IString> &supplementaryFields,
        IString predictedField,
        IDMData* pOutputData,
        ISequence<IString> &outputFields,
        IString valueField,
        IDMINTEGER maxNumberOfPasses,
        IDMINTEGER inFraction,
        IDMINTEGER outFraction,
        IString predictionResult,
        IDMBOOLEAN produceGainChart,
        IDMBOOLEAN computeQuantiles,
        IString lowerQuantileField,
        IString upperQuantileField,
        IDMPredictionSettings *&pPreSettings );

    IDMRETURN deleteObject();

    IDMRETURN update( IDM_PredictionType predType,
        IString name,
        IDMData *pData,
        IDMSelections &selection,
        IDM_UseMode useMode,
        ISequence<IString> &activeFields,
        ISequence<IString> &supplementaryFields,
        IString predictedField,
        IDMData* pOutputData,
        ISequence<IString> &outputFields,
        IString valueField,
        IDMINTEGER maxNumberOfPasses,
        IDMINTEGER inFraction,
        IDMINTEGER outFraction,
        IString predictionResult,
        IDMBOOLEAN produceGainChart,
        IDMBOOLEAN computeQuantiles,
        IString lowerQuantileField,
        IString upperQuantileField );

    IDMRETURN get( IDM_PredictionType &predType,
        IString &name,
        IDMMiningBase *&pMiningBase,
        IDMData *&pData,
        IDMSelections &selection,
        IDM_UseMode &useMode,
        ISequence<IString> &activeFields,
        ISequence<IString> &supplementaryFields,
        IString &predictedField,
        IDMData* &pOutputData,
        ISequence<IString> &outputFields,
        IString &valueField,
        IDMINTEGER &maxNumberOfPasses,
        IDMINTEGER &inFraction,
        IDMINTEGER &outFraction,
        IString &predictionResult,
        IDMBOOLEAN produceGainChart,
        IDMBOOLEAN computeQuantiles,
        IString lowerQuantileField,
        IString upperQuantileField );

    IDMRETURN setValuePredParameters(
        IString regionNbField,
        IDMINTEGER minNumberOfPasses,
        IDMINTEGER maxNumberOfCenters,
        IDMINTEGER minRegionSize,
        IDMREAL accuracy,
        ISet<IString>& predictedValues );

    IDMRETURN getValuePredParameters(
        IString &regionNbField,
        IDMINTEGER &minNumberOfPasses,
        IDMINTEGER &maxNumberOfCenters,
        IDMINTEGER &minRegionSize,
        IDMREAL &accuracy,
        ISet<IString>& predictedValues );

    IDMRETURN setNeuralPredParameters(
        IDM_NetworkModel networkModel,
        IDMINTEGER nbOfHiddenUnits1,
        IDMINTEGER nbOfHiddenUnits2,
        IDMINTEGER nbOfHiddenUnits3,
        IDMREAL learnRate,
        IDMREAL momentum,
        IDMREAL maxErrorRate,
        IDMREAL avgErrorRate,
        IDMINTEGER windowsSize,
        IDMINTEGER forecastHorizon,
        IDM_FeedbackType feedbackType,
        IDMREAL decayFactor);

    IDMRETURN getNeuralPredParameters(
        IDM_NetworkModel &networkModel,
        IDMINTEGER &nbOfHiddenUnits1,
        IDMINTEGER &nbOfHiddenUnits2,
        IDMINTEGER &nbOfHiddenUnits3,
        IDMREAL &learnRate,
        IDMREAL &momentum,
        IDMREAL &maxErrorRate,
        IDMREAL &avgErrorRate,
        IDMINTEGER &windowsSize,
        IDMINTEGER &forecastHorizon,
        IDM_FeedbackType &feedbackType,
        IDMREAL &decayFactor);

    IDMRETURN setLogRegressionSettings();
    IDMRETURN setTimeSeriesSettings(
        IDMINTEGER windowsSize,
        IDMINTEGER forecastHorizon,
        IDM_FeedbackType feedbackType=IDM_NO_FEEDBACK);

    IDMRETURN setQuantileLimits( ISequence<IDMREAL>& quantileLimits);
    ISequence<IDMREAL>& getQuantileLimits();

    IDMRETURN setTreatmentOfOutliers( IDM_OutlierTreatment );
    IDM_OutlierTreatment getTreatmentOfOutliers();
};

Data members:

ivUseMode
    Specifies the mode of the prediction run. IDM_TRAINING_MODE means that a prediction model is built. IDM_TEST_MODE is used to test a prediction model against data that is not used for building the model. IDM_APPLICATION_MODE means that a model of a previous run is used to predict the outcome for new input data records.

ivActiveFields
    A sequence collection of field names that are active in the prediction. These fields are relevant when a model is built, tested, or applied. Statistics are maintained for these fields.

ivSupplementaryFields
    A sequence collection of field names that are supplementary to the prediction. These fields are ignored when building a model. Only statistics are maintained for these fields.

ivOutputFields
    A sequence collection of field names that appear in the produced output data, if pivOutputData is set.

ivValueField
    The name of the field in the output data into which the predicted outcome is written in use mode IDM_APPLICATION_MODE. This name must be specified if the use mode is IDM_APPLICATION_MODE.

ivRegionNbField
    The name of the field in the output data that contains the name of the region to which the record was assigned. The field is filled in IDM_TRAINING_MODE and in IDM_APPLICATION_MODE if pivOutputData is set.

ivPredictedField
    The name of the field containing the values to be predicted.

ivMaxNumberOfPasses
    Maximum number of training passes over the input data.

ivMinNumberOfPasses
    Minimum number of training passes over the input data. During these passes, no stop criteria are checked; in particular, there is no test for overtraining.

ivMaxNumberOfCentres
    Maximum number of centres that should be built.

ivMinRegionSize
    Minimum size of a region, that is, the minimum number of records that must belong to a region.

ivAccuracy
    Specifies the tolerated error, that is, no more iterations are done if the global error of the model is less than ivAccuracy. Regions are also no longer refined if the regional error is smaller. The error is the root mean squared error computed on the out-fraction data only, normalized with 4 standard deviations of the predicted field to make the value comparable across different models.
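The normalized error described for ivAccuracy can be sketched as the root mean squared error divided by four standard deviations of the predicted field. The function name normalizedRmse is hypothetical. Using the population standard deviation of the actual values of the predicted field is an assumption, because this reference does not say which estimator the RBF prediction function uses.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: RMSE over the given records divided by
// 4 standard deviations of the actual values of the predicted field.
double normalizedRmse(const std::vector<double>& actual,
                      const std::vector<double>& predicted)
{
    assert(actual.size() == predicted.size() && actual.size() > 1);
    const double n = static_cast<double>(actual.size());

    double mean = 0.0;
    double squaredError = 0.0;
    for (std::size_t i = 0; i < actual.size(); ++i) {
        mean += actual[i];
        const double diff = actual[i] - predicted[i];
        squaredError += diff * diff;
    }
    mean /= n;

    double variance = 0.0;
    for (std::size_t i = 0; i < actual.size(); ++i) {
        variance += (actual[i] - mean) * (actual[i] - mean);
    }
    variance /= n;  // assumption: population variance
    assert(variance > 0.0);

    const double rmse = std::sqrt(squaredError / n);
    return rmse / (4.0 * std::sqrt(variance));
}
```

In practice the records passed in would be the out-fraction records only, as stated in the description.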

ivInFraction
    The input data is divided into in-fraction and out-fraction data. The in-fraction data is used to build a model and the out-fraction data is used to test the validity of the built model. The ivInFraction number specifies the number of records that are read consecutively before out-fraction records are read. For example, if 3 is specified as in-fraction data and 2 as out-fraction data, then 3 records are read as in-fraction records, the next 2 as out-fraction records, and the next 3 as in-fraction records. This process is continued until the last record of the input data is read.

ivOutFraction
    The input data is divided into in-fraction and out-fraction data. The in-fraction data is used to build a model and the out-fraction data is used to test the validity of the built model. The ivOutFraction number specifies the number of records that are read consecutively before in-fraction records are read. For example, if 3 is specified as in-fraction data and 2 as out-fraction data, then 3 records are read as in-fraction records, the next 2 as out-fraction records, and the next 3 as in-fraction records. This process is continued until the last record of the input data is read.

ivPredictionResult
    The name of a result object. In use mode IDM_APPLICATION_MODE, the statistics and the region information are read out of this result and used to predict the outcome of the input data. In IDM_APPLICATION_MODE no result is written; only output data is produced. Note that the active fields must be all or a subset of the fields used in that result. Names and types of the fields must match. The same is true for the supplementary fields. A result is produced only in use modes IDM_TRAINING_MODE and IDM_TEST_MODE. It contains the information used for prediction or for display of the model quality. In IDM_APPLICATION_MODE an output table is produced with predicted values for all records. In other modes an output table is optional.

ivPredictedValues
    Specifies the values to be predicted in case the predicted field is categorical. The prediction is the probability of a record having one of these values.

ivPredType
    Identifies whether the object is an RBF Prediction settings object (IDM_PRED_TYPE_RBF) or a Neural Prediction settings object (IDM_PRED_TYPE_NEURAL).

ivProduceGainChart
    If set to IDM_TRUE, an embedded data sample is produced that can be used to visualize aspects of the model quality.

ivComputeQuantiles
    If set to IDM_TRUE, quantiles are computed for the value to be predicted.

ivLowerQuantileField
    The name of the field in the output data into which the lower limit of the quantile that the record was assigned to is written. Specify this name if pivOutputData and ivComputeQuantiles are set.

ivUpperQuantileField
    The name of the field in the output data into which the upper limit of the quantile that the record was assigned to is written. Specify this name if pivOutputData and ivComputeQuantiles are set.

ivQuantileLimits
    The list of limits, in percent, that determine the quantiles to be computed for the predicted field. If not set and ivComputeQuantiles is set to IDM_TRUE, the default quantile limits 2, 10, 25, 50, 75, 90, 98 are used. If set to an empty sequence, the quantile limits are 0 and 100.

ivNetworkModel
    The network model that should be used.

ivNbOfHiddenUnits1
    By specifying the number of hidden units (ivNbOfHiddenUnits1, ivNbOfHiddenUnits2, ivNbOfHiddenUnits3), you set the architecture of the neural network manually. This requires less processing time than automatic architecture determination (number of hidden units not specified). With automatic architecture determination, the neural prediction function evaluates different neural network architectures with different numbers of hidden layers and of processing units in these layers. These alternative models are trained for a fixed number of passes, and then the best network architecture is selected for further training.

ivNbOfHiddenUnits2
    See the description of ivNbOfHiddenUnits1.

ivNbOfHiddenUnits3
    See the description of ivNbOfHiddenUnits1.

ivLearnRate
    The learn rate controls the extent to which the weights (adaptable connections between processing units in different layers) are dynamically adjusted during the training process to improve the model convergence. Larger values indicate greater change, so the network trains more quickly with higher values, but the accuracy is lower. If set to 0, the learn rate is determined automatically.

ivMomentum
    The momentum adjusts the change applied to a weight (adaptable connection between processing units in different layers) by factoring in previous weight updates. It acts as a smoothing parameter that reduces oscillation and helps attain convergence. Lower values mean more training time and more influence of the outlier data on the weights. If set to 0, the momentum is determined automatically.

ivMaxErrorRate
    The maximum error rate. For the neural mining functions, the error is expressed as the root mean squared (RMS) error.

ivAvgErrorRate
    The average error represents the percentage of records in the testing sample used to determine if the specified error limit has been met. Values between 0.25 and 0.01 are reasonable. A value of 0.0 indicates a perfect prediction.

ivWindowsSize
    The window size represents the number of records in the input data used by the mining function to predict a value. By default, one record is used to predict a value. For time-series forecasting, 3 or more records are required to accurately predict a value. The values for forecast horizon and window size are combined to create the size of a logical input record.

ivForecastHorizon
    The forecast horizon represents the relationship between the input data and the prediction field. If the forecast horizon is set to 0, input fields and prediction fields are based on the same record. To predict values in the future, specify the number of periods to predict.
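One way to picture how window size and forecast horizon combine into a logical input record is a sliding window over the input records: the window supplies the inputs, and the target is taken the given number of periods after the last record of the window. The sketch below uses a single input field and a single predicted field. LogicalRecord and buildLogicalRecords are hypothetical names, and the exact record layout used by the neural prediction function is not documented here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical logical record: a window of inputs plus one target value.
struct LogicalRecord {
    std::vector<double> inputs;  // windowSize consecutive input values
    double target;               // predicted-field value at the horizon
};

std::vector<LogicalRecord> buildLogicalRecords(
    const std::vector<double>& inputField,
    const std::vector<double>& predictedField,
    std::size_t windowSize,
    std::size_t forecastHorizon)
{
    assert(windowSize >= 1);
    assert(inputField.size() == predictedField.size());
    std::vector<LogicalRecord> records;
    // 'last' is the index of the most recent record inside the window.
    for (std::size_t last = windowSize - 1;
         last + forecastHorizon < inputField.size(); ++last) {
        LogicalRecord r;
        const std::size_t first = last + 1 - windowSize;
        for (std::size_t i = first; i <= last; ++i) {
            r.inputs.push_back(inputField[i]);
        }
        // Horizon 0 takes the target from the same record as the last input.
        r.target = predictedField[last + forecastHorizon];
        records.push_back(r);
    }
    return records;
}
```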

ivFeedbackType
    The feedback type. It specifies whether there is no feedback (IDM_NO_FEEDBACK), feedback from the first hidden layer (IDM_FROM_1ST_HIDDEN_LAYER), or feedback from the output layer (IDM_FROM_OUTPUT_LAYER). Feedback adds state information to the neural functions and can help with time-series problems. This is not surfaced on the GUI Version 2.

ivDecayFactor
    The decay factor. When a feedback type other than IDM_NO_FEEDBACK is specified, this parameter is used to control the relative strength of the prior activation value and the current activation value.

ivTreatmentOfOutliers
    Indicates how outliers should be treated:
    - As valid values (IDM_AS_VALID_VALUES).
    - The number of buckets is extended, adding buckets of increasing size, so that the outliers fit in (IDM_INCREASING_BUCKET_SIZES).
    - As missing values (IDM_AS_MISSING_VALUES).
    - As extreme values (IDM_AS_EXTREME_VALUES). The lowest limits or highest limits are used instead of these values.

Member functions:

IDMPredictionSettings
    The default constructor.

~IDMPredictionSettings
    The destructor. If the object belongs to the prediction-settings object extent of its mining base, it is removed from this IKeySortedSet.

IDMPredictionSettings( IDMRETURN &rc, IString name, .... )
    Constructs a prediction-settings object with the given values. The object is added to the prediction-settings IKeySortedSet collection that is located in the mining base (class IDMMiningBase). An error has occurred if the return code is not equal to IDM_SUCCESS. The prediction-settings object should be deleted using deleteObject().

createObject
    Constructs a prediction-settings object with the given values and returns it if no error occurred. The object is added to the prediction-settings IKeySortedSet collection that is located in the mining base (class IDMMiningBase). If an error occurred, the object is deleted and the reference parameter pPreSettings is set to NULL.

update
    Changes the values of the data members that are common to the Neural Prediction and RBF Prediction functions.

get
    Retrieves the values of all data members that are common to the Neural Prediction and RBF Prediction functions.

setValuePredParameters
    Sets the special RBF Prediction parameters.

getValuePredParameters
    Retrieves the special RBF Prediction parameters.

setNeuralPredParameters
    Sets the special Neural Prediction parameters.

getNeuralPredParameters
    Retrieves the special Neural Prediction parameters.

setLogRegressionSettings
    Sets logistic regression settings.

setTimeSeriesSettings
    Sets window size, forecast horizon, and feedback type.

setQuantileLimits
    Sets the list of quantile limits.

getQuantileLimits
    Retrieves the list of quantile limits.




Preprocessing settings

The Environment Layer API includes a preprocessing library of data transformation functions. You can use these functions to transform data before, between, and after mining runs.

Input data: Each preprocessing function operates on input data. Input data can be tables or views in a relational database.

Files: You can use the preprocessing functions on data in files by first loading the files into a relational database.

Output data: The preprocessing functions produce output data, which can be relational database tables or views, or files.

With a few exceptions, which are noted in the descriptions of the individual functions, all preprocessing functions support output data types of relational database tables and views. To avoid duplication of data, output data is usually in the form of a view. Tables can also be used as permanent output data.

Files are the only type of output data produced by the Copy Records To File function.

Output data can be used as input to other preprocessing functions or to mining function runs.

Unloading data: You can use the Copy Records To File function to unload data from database tables and views to files before running mining functions.

Cleaning up the database: You can use the Clean Up Data Sources function at the end of a mining run to delete tables and views created by the preprocessing functions.

Completion: If a function completes successfully, all data transformations complete and the output data is committed to the database server.

If a function completes unsuccessfully, any data transformations in progress when the function fails are rolled back from the database server. This means that the database server is returned to the same state it was in before the function was run.

One exception to this is when overwriting existing tables and views in DB2 for MVS. After performing a DROP on the original output data, a COMMIT is performed. Thus, even if the function fails to locate the output data, the original output data is deleted and cannot be restored automatically.

Indexes: Some preprocessing functions attempt to create indexes for the output data which match those defined for the input data. The attempt is made if all of the following criteria are met:
- The input data is a DB table.
- The output data is a DB table.
- All of the index fields of a given index on the input data are included in the output data.

For some database management systems, creating an index fails if any of the index fields in the input data contain a + or - symbol in their name. These symbols cause ambiguity when creating an index.

If the preprocessing function cannot create a particular index, the function does not fail, but a warning is issued.
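The index criteria and the naming restriction described above can be expressed as two small checks. This is a minimal C++ sketch; the function names and the use of plain string vectors for field lists are assumptions made for illustration and are not part of the Environment Layer API.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// True if the field name contains a '+' or '-' symbol, which some database
// management systems reject when an index is created.
bool hasAmbiguousSymbol(const std::string& fieldName) {
    return fieldName.find_first_of("+-") != std::string::npos;
}

// True if all three criteria hold for one index defined on the input data:
// input is a table, output is a table, and every index field is in the output.
bool canMirrorIndex(bool inputIsTable, bool outputIsTable,
                    const std::vector<std::string>& indexFields,
                    const std::vector<std::string>& outputFields) {
    assert(!indexFields.empty());
    if (!inputIsTable || !outputIsTable) return false;
    for (const std::string& field : indexFields) {
        if (std::find(outputFields.begin(), outputFields.end(), field) ==
            outputFields.end()) {
            return false;  // an index field is missing from the output data
        }
    }
    return true;
}
```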

Include files: To use the preprocessing functions, include the header file idmcmnb.hpp, which includes all of the preprocessing function header files.

DB2 partitioned table considerations: See Using the Intelligent Miner for Data for detailed rules on partitioning.

IDMProcessingSettings

This class is an abstract base class for all preprocessing functions. All preprocessing function classes are derived from this class. It contains attributes and methods common to all the preprocessing functions.

IDMProcessingSettings is derived from the class IDMSettings. See “IDMSettings” on page 107 for more information.

Header file: idmpcpar.hpp

Format:
class IDMProcessingSettings : public IDMSettings
{
protected:
    IString ivServerName;
    IString ivDatabaseName;
    IString ivTablespaceName;
    IString ivInputSchemaName;
    IString ivInputDataSourceName;
    IDM_OutputType ivOutputType;
    IString ivOutputSchemaName;
    IString ivOutputDataSourceName;
    IDM_OutputOption ivOverwriteExistingDataSource;
    IString ivOutputDataSourceComment;

public:
    virtual IDMRETURN setInputData(IDMData &inputData);

    virtual IDMRETURN setOutputData(IDMData &outputData);

    virtual IDMRETURN getOutputData(const IString &name,
                                    IDMData *&pOutputData,
                                    IDM_DataUseMode=IDM_INPUT_OUTPUT );

    virtual IDMRETURN setResultName( IString resName,
                                     IDMBOOLEAN overwriteResult=IDM_TRUE);

    virtual IDMRETURN optimizeForTime(IDMBOOLEAN optTime);
};

Data members:

ivServerName
    The name of the database server. To query the list of available database servers, use the IDMDB2Table::getDB2DatabaseServers method. See “IDMDB2Table” on page 44 for more information on how to use the class and its methods.

ivDatabaseName
    This value is optional. If the output data type is IDM_OUTPUT_TABLE, you can use this value to specify the name of the database to which you want to write the output data.

    This value is ignored unless you are connected to a DB2 for MVS database server.

ivTablespaceName
    This value is optional. If the output data type is IDM_OUTPUT_TABLE, you can use this value to specify the table space to which you want to write the output data. See Using the Intelligent Miner for Data for more information about table space rules.

ivInputSchemaName
    The name of the schema to which the input data is assigned.

ivInputDataSourceName
    The name of the input data.

ivOutputType
    The format of the output data. This value can be either IDM_OUTPUT_TABLE or IDM_OUTPUT_VIEW.

ivOutputSchemaName
    The name of the schema to which the output data will be assigned.

ivOutputDataSourceName
    The name of the output data.

ivOverwriteExistingDataSource
    Specifies whether to replace existing data with the same name.

    This value can be IDM_NO_OVERWRITE, IDM_OVERWRITE, or IDM_APPEND_TO_TABLE.

    IDM_NO_OVERWRITE
        If a table with the same name as the output data exists in the database, an error is returned.

    IDM_OVERWRITE
        If a table or view with the same name as the output data exists in the database, it is dropped and a new table or view is created.

    IDM_APPEND_TO_TABLE
        If a table with the same name as the output data exists in the database, a new table is not created; the rows are appended to the existing table.

        If a view with the same name as the output data exists in the database, the settings object fails.

        When rows are appended to a relational database table, the corresponding columns must be of compatible data types and lengths. For the existing table, the correspondence of columns is based on the position of the column names in the CREATE TABLE statement. For appended rows, the correspondence of columns is based on the position of the column names in the INSERT statement.
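The three overwrite options can be summarized as a small decision function. The C++ sketch below uses hypothetical enums in place of IDM_OutputOption and is an interpretation of the descriptions above, not the product's implementation. In particular, treating an existing view under IDM_NO_OVERWRITE as an error is an assumption; the text mentions only tables for that option.

```cpp
#include <cassert>

// Hypothetical stand-ins; not the IDM_* types of the Environment Layer API.
enum class Existing { None, Table, View };       // what already has the output name
enum class OutputOption { NoOverwrite, Overwrite, AppendToTable };
enum class Action { Create, DropAndCreate, Append, Fail };

// Decides what happens to the output data for a given option and database state.
Action resolveOutput(OutputOption option, Existing existing) {
    if (existing == Existing::None) return Action::Create;
    switch (option) {
        case OutputOption::NoOverwrite:
            return Action::Fail;             // name already taken: error
        case OutputOption::Overwrite:
            return Action::DropAndCreate;    // drop the table or view, then create
        case OutputOption::AppendToTable:
            // Rows can be appended only to a table; an existing view fails.
            return existing == Existing::Table ? Action::Append : Action::Fail;
    }
    assert(false && "all options are handled above");
    return Action::Fail;
}
```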


ivOutputDataSourceComment
    Provides a descriptive comment about the output data. If the comment is longer than 254 characters, it is truncated. See the appropriate DB2 SQL reference for details on creating comments.
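If you prefer to control where a long comment is cut, you can shorten it before passing it to a settings object. A minimal sketch, assuming the limit counts single-byte characters; the constant and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

const std::size_t kMaxCommentLength = 254;  // limit stated in the text above

// Returns the comment unchanged if it fits, otherwise its first 254 characters.
std::string truncateComment(const std::string& comment) {
    std::string result = comment.substr(0, kMaxCommentLength);
    assert(result.size() <= kMaxCommentLength);
    return result;
}
```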

Notes on inherited methods:

setInputData
    Changes the server name, the input schema name, and the input data name attributes of an object. The IDMData object must reference a relational database table or view. The database server attribute of the IDMData object must match ivServerName.

    This method returns an error if the type of object is RunSQL or CleanUpDataSources.

setOutputData
    Changes the output schema name and the output data name attributes of an object. The IDMData object must reference a relational database.

    The database server attribute of the IDMData object must match ivServerName.

    Exception: For the setOutputData method on a Copy Records to File object, the IDMData object parameter must reference a flat file.

    This method returns an error if the type of object is RunSQL or CleanUpDataSources.

getOutputData
    Creates an IDMData object using the name supplied in the signature. This object references the output data and is added to the mining base IDMData extent.

    This method returns an error if the type of object is RunSQL, CleanUpDataSources, or CopyRecordsToFile.

print
    Prints the object’s attributes.

setResultName()
    This function returns IDM_ERROR because preprocessing functions do not generate results.

setOptimizeForTime()
    The Copy Records to File preprocessing function can optimize its performance with regard to time or space. This flag indicates which kind of optimization you prefer.

    For all other preprocessing functions, setOptimizeForTime() returns a warning because the attribute is not used by those functions.

IDMAggregateValues

This class performs aggregation functions on the selected fields. The set of aggregation expressions is specified in the ivAggregationSeq entry.

Compounded aggregation expressions, like AVG(SALARY+COMM), are supported. All field names in the aggregation expression must be specified within an SQL column function.

Null aggregation expressions are not supported. Record filtering is not supported.

The aggregate values function produces output data that contains one record. The record contains one field per ivAggregationSeq entry.

Example:

Input data

Student#   Course#   Grade
321        p-101     90
101        a-099     88
101        a-440     96
003        m-101     89
003        p-101     77
321        a-440     97

Data member        Value
ivAggregationSeq   AVG("Grade"),Grade

Output data

Grade
89
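The exact mean of the six grades in the example is 89.5, while the output data shows 89. That is consistent with an average computed in integer arithmetic over an integer column, which the following C++ sketch reproduces. The integer interpretation and the function name averageGrade are assumptions; the actual result type depends on the column type and the database server.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Integer average: the sum divided by the count, with the fraction discarded.
int averageGrade(const std::vector<int>& grades) {
    assert(!grades.empty());
    const int sum = std::accumulate(grades.begin(), grades.end(), 0);
    return sum / static_cast<int>(grades.size());
}
```

Called with the grades 90, 88, 96, 89, 77, and 97, the function sums them to 537 and returns 537 / 6 = 89.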

Header file: idmpcagg.hpp

Format:
class IDMAggregateValues : public IDMProcessingSettings
{
private:
    AggregationSeq ivAggregationSeq;

public:
    static IDMRETURN createObject(
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        const IString &serverName,
        const IString &databaseName,
        const IString &tablespaceName,
        const IString &inputSchemaName,
        const IString &inputDataSourceName,
        IDM_OutputType outputType,
        const IString &outputSchemaName,
        const IString &outputDataSourceName,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const AggregationSeq &aggregationSeq,
        IDMAggregateValues* &pAggSettings);

    static IDMRETURN createObject(
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        IDMData &inputData,
        const IString &databaseName,
        const IString &tablespaceName,
        IDM_OutputType outputType,
        const IString &outputSchemaName,
        const IString &outputDataSourceName,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const AggregationSeq &aggregationSeq,
        IDMAggregateValues* &pAggSettings);

    static IDMRETURN createObject(
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        const IString &serverName,
        const IString &databaseName,
        const IString &tablespaceName,
        const IString &inputSchemaName,
        const IString &inputDataSourceName,
        IDM_OutputType outputType,
        IDMData &outputData,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const AggregationSeq &aggregationSeq,
        IDMAggregateValues* &pAggSettings);

    static IDMRETURN createObject(
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        IDMData &inputData,
        const IString &databaseName,
        const IString &tablespaceName,
        IDM_OutputType outputType,
        IDMData &outputData,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const AggregationSeq &aggregationSeq,
        IDMAggregateValues* &pAggSettings);

    IDMRETURN update(
        const IString &objName,
        const IString &serverName,
        const IString &databaseName,
        const IString &tablespaceName,
        const IString &inputSchemaName,
        const IString &inputDataSourceName,
        IDM_OutputType outputType,
        const IString &outputSchemaName,
        const IString &outputDataSourceName,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const AggregationSeq &aggregationSeq);

    IDMRETURN get(
        IString &objName,
        IDMMiningBase *&pMiningBase,
        IString &serverName,
        IString &databaseName,
        IString &tablespaceName,
        IString &inputSchemaName,
        IString &inputDataSourceName,
        IDM_OutputType &outputType,
        IString &outputSchemaName,
        IString &outputDataSourceName,
        IDM_OutputOption &overwriteExistingDataSource,
        IString &outputDataSourceComment,
        AggregationSeq &aggregationSeq) const;

    IDMAggregateValues(
        IDMRETURN &rc,
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        const IString &serverName,
        const IString &databaseName,
        const IString &tablespaceName,
        const IString &inputSchemaName,
        const IString &inputDataSourceName,
        IDM_OutputType outputType,
        const IString &outputSchemaName,
        const IString &outputDataSourceName,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const AggregationSeq &aggregationSeq);

    IDMAggregateValues(
        IDMRETURN &rc,
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        IDMData &inputData,
        const IString &databaseName,
        const IString &tablespaceName,
        IDM_OutputType outputType,
        const IString &outputSchemaName,
        const IString &outputDataSourceName,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const AggregationSeq &aggregationSeq);

    IDMAggregateValues(
        IDMRETURN &rc,
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        const IString &serverName,
        const IString &databaseName,
        const IString &tablespaceName,
        const IString &inputSchemaName,
        const IString &inputDataSourceName,
        IDM_OutputType outputType,
        IDMData &outputData,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const AggregationSeq &aggregationSeq);

    IDMAggregateValues(
        IDMRETURN &rc,
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        IDMData &inputData,
        const IString &databaseName,
        const IString &tablespaceName,
        IDM_OutputType outputType,
        IDMData &outputData,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const AggregationSeq &aggregationSeq);

    virtual ~IDMAggregateValues();
};

Data members:

ivAggregationSeq
    Names a field and specifies the expression that defines the value of the field. See “IDMGroupRecords” on page 222 for information on the AggregationPairStruct.

Member functions:


createObject(const IString &objName, IDMMiningBase *const pMiningBase, const IString &serverName, ...)
    Calls the constructor with the corresponding signature and, if successful, adds the object to the mining base.

    If an error occurs in the constructor, the returned pointer will be null.

    Delete the object using the deleteObject() method.
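The create-and-delete convention described here (a static factory that returns a code, sets the pointer to null on failure, and registers the object with its mining base on success) can be pictured with stand-in types. Every name in the following C++ sketch (DemoReturn, DemoMiningBase, DemoSettings) is hypothetical, and the validation rule in the constructor is invented for the illustration; the sketch does not use or reproduce the Intelligent Miner classes.

```cpp
#include <cassert>
#include <set>
#include <string>

typedef int DemoReturn;              // stand-in for IDMRETURN
const DemoReturn DEMO_SUCCESS = 0;   // stand-in for IDM_SUCCESS
const DemoReturn DEMO_ERROR = -1;

class DemoSettings;

// Stand-in for the mining base: it only tracks the registered settings objects.
struct DemoMiningBase {
    std::set<DemoSettings*> settings;
};

class DemoSettings {
public:
    // Constructs the object and, if successful, adds it to the mining base.
    // On failure nothing is registered and pSettings is set to null.
    static DemoReturn createObject(const std::string& name,
                                   DemoMiningBase* base,
                                   DemoSettings*& pSettings) {
        assert(base != nullptr);
        DemoReturn rc = DEMO_SUCCESS;
        DemoSettings* obj = new DemoSettings(rc, name, base);
        if (rc < DEMO_SUCCESS) {
            delete obj;
            pSettings = nullptr;
            return rc;
        }
        base->settings.insert(obj);
        pSettings = obj;
        return rc;
    }

    // Removes the object from the mining base, then destroys it.
    void deleteObject() {
        ivBase->settings.erase(this);
        delete this;
    }

    const std::string& name() const { return ivName; }

private:
    DemoSettings(DemoReturn& rc, const std::string& name, DemoMiningBase* base)
        : ivName(name), ivBase(base) {
        // Invented rule for the sketch: an empty name is an error.
        rc = name.empty() ? DEMO_ERROR : DEMO_SUCCESS;
    }
    ~DemoSettings() {}

    std::string ivName;
    DemoMiningBase* ivBase;
};
```

A caller checks the return code before using the pointer and releases the object through deleteObject() rather than through delete, so that the mining base never holds a dangling entry.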

createObject(const IString &objName, IDMMiningBase *const pMiningBase, IDMData &inputData, ...)
    Calls the constructor with the corresponding signature and, if successful, adds the object to the mining base. The IDMData object provided is used to determine the server name, the input schema name, and the input data name.

createObject(const IString &objName, IDMMiningBase *const pMiningBase, const IString &serverName, ... IDMData &outputData ...)
    Calls the constructor with the corresponding signature and, if successful, adds the object to the mining base. The IDMData object provided is used to determine the output schema name and the output data name.

    The server name from the IDMData object must match the server name input parameter.

createObject(const IString &objName, IDMMiningBase *const pMiningBase, IDMData &inputData, ... IDMData &outputData ...)
    Calls the constructor with the corresponding signature and, if successful, adds the object to the mining base. The first IDMData object provided is used to determine the server name, the input schema name, and the input data name. The second IDMData object provided is used to determine the output schema name and the output data name.

    The server name from the second IDMData object must match the server name from the first IDMData object.

deleteObject()
    Removes the object from the mining base, and then destructs it.

update
    Changes the values of the object’s data members.

get
    Returns the values of the object’s data members.

IDMAggregateValues(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, const IString &serverName, ...)
    Constructs an aggregate values object with the given values.

    If rc is less than IDM_SUCCESS, an error occurred. Do not use the object to invoke any methods.

IDMAggregateValues(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, IDMData &inputData, ...)
    Constructs an aggregate values object using an IDMData object. The IDMData object provides the server name, input schema name, and input data name.

IDMAggregateValues(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, const IString &serverName, ... IDMData &outputData ...)
    Constructs an aggregate values object with the given values. The IDMData object provides the output schema name and output data name. The server name from the IDMData object must match the server name input parameter.

IDMAggregateValues(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, IDMData &inputData, ... IDMData &outputData ...)
    Constructs an aggregate values object using two IDMData objects. The first IDMData object provides the server name, input schema name, and input data name. The second IDMData object provides the output schema name and output data name. The server name from the second IDMData object must match the server name from the first IDMData object.

~IDMAggregateValues
    The destructor.

IDMCalculateValues

This class creates new fields using SQL expressions.

This class supports, for example, SQL date functions, SQL arithmetic functions, and SQL string manipulation functions. It does not support SQL column functions.

The output data contains all the fields from the input data plus one additional field for each ivMathSeq entry.

An attempt is made to create indexes for the output data. See “Preprocessing settings” on page 150 for more information on indexes.

Example:

Input data

Empl#   Name             Salary     Comm
123     John Q. Public   34567.89   7654.32
234     Jane Doe         45678.90   6543.21

Data member   Value
ivMathSeq     "Salary"+"Comm", TotalPay
              (100*"Comm")/("Salary"+"Comm"),CommPct

Output data

Empl#   Name             Salary     Comm      TotalPay   CommPct
123     John Q. Public   34567.89   7654.32   42222.21   18
234     Jane Doe         45678.90   6543.21   52222.11   13
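The two expressions in the example can be checked with ordinary arithmetic. The C++ sketch below recomputes TotalPay and CommPct for both rows; PayRow, totalPay, and commPct are hypothetical names. Rounding CommPct to the nearest integer is an assumption chosen to match the values shown in the output data (18.13 and 12.53 before rounding); how the database server rounds depends on the column data types.

```cpp
#include <cassert>
#include <cmath>

// One input record, reduced to the two fields the expressions use.
struct PayRow {
    double salary;
    double comm;
};

// "Salary"+"Comm"
double totalPay(const PayRow& row) {
    return row.salary + row.comm;
}

// (100*"Comm")/("Salary"+"Comm"), rounded to the nearest integer.
long commPct(const PayRow& row) {
    const double total = totalPay(row);
    assert(total != 0.0);
    return std::lround(100.0 * row.comm / total);
}
```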

Header file: idmpccnv.hpp

Format:
struct MathPairStruct
{
    IString mathExpression;
    IString newFieldName;
};

typedef ISequence<MathPairStruct> MathSeq;

class IDMCalculateValues : public IDMProcessingSettings
{
private:
    MathSeq ivMathSeq;

public:
    static IDMRETURN createObject(
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        const IString &serverName,
        const IString &databaseName,
        const IString &tablespaceName,
        const IString &inputSchemaName,
        const IString &inputDataSourceName,
        IDM_OutputType outputType,
        const IString &outputSchemaName,
        const IString &outputDataSourceName,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const MathSeq &mathSeq,
        IDMCalculateValues* &pCvSettings);

    static IDMRETURN createObject(
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        IDMData &inputData,
        const IString &databaseName,
        const IString &tablespaceName,
        IDM_OutputType outputType,
        const IString &outputSchemaName,
        const IString &outputDataSourceName,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const MathSeq &mathSeq,
        IDMCalculateValues* &pCvSettings);

    static IDMRETURN createObject(
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        const IString &serverName,
        const IString &databaseName,
        const IString &tablespaceName,
        const IString &inputSchemaName,
        const IString &inputDataSourceName,
        IDM_OutputType outputType,
        IDMData &outputData,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const MathSeq &mathSeq,
        IDMCalculateValues* &pCvSettings);

    static IDMRETURN createObject(
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        IDMData &inputData,
        const IString &databaseName,
        const IString &tablespaceName,
        IDM_OutputType outputType,
        IDMData &outputData,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const MathSeq &mathSeq,
        IDMCalculateValues* &pCvSettings);

    IDMRETURN deleteObject();

    IDMRETURN update(
        const IString &objName,
        const IString &serverName,
        const IString &databaseName,
        const IString &tablespaceName,
        const IString &inputSchemaName,
        const IString &inputDataSourceName,
        IDM_OutputType outputType,
        const IString &outputSchemaName,
        const IString &outputDataSourceName,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const MathSeq &mathSeq);

    IDMRETURN get(
        IString &objName,
        IDMMiningBase *&pMiningBase,
        IString &serverName,
        IString &databaseName,
        IString &tablespaceName,
        IString &inputSchemaName,
        IString &inputDataSourceName,
        IDM_OutputType &outputType,
        IString &outputSchemaName,
        IString &outputDataSourceName,
        IDM_OutputOption &overwriteExistingDataSource,
        IString &outputDataSourceComment,
        MathSeq &mathSeq) const;

    IDMCalculateValues(
        IDMRETURN &rc,
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        const IString &serverName,
        const IString &databaseName,
        const IString &tablespaceName,
        const IString &inputSchemaName,
        const IString &inputDataSourceName,
        IDM_OutputType outputType,
        const IString &outputSchemaName,
        const IString &outputDataSourceName,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const MathSeq &mathSeq);

    IDMCalculateValues(
        IDMRETURN &rc,
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        IDMData &inputData,
        const IString &databaseName,
        const IString &tablespaceName,
        IDM_OutputType outputType,
        const IString &outputSchemaName,
        const IString &outputDataSourceName,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const MathSeq &mathSeq);

    IDMCalculateValues(
        IDMRETURN &rc,
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        const IString &serverName,
        const IString &databaseName,
        const IString &tablespaceName,
        const IString &inputSchemaName,
        const IString &inputDataSourceName,
        IDM_OutputType outputType,
        IDMData &outputData,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const MathSeq &mathSeq);

    IDMCalculateValues(
        IDMRETURN &rc,
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        IDMData &inputData,
        const IString &databaseName,
        const IString &tablespaceName,
        IDM_OutputType outputType,
        IDMData &outputData,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const MathSeq &mathSeq);

    virtual ~IDMCalculateValues();
};

Struct members:

MathPairStruct
    Defines the math expressions and output field names.

    mathExpression
        The SQL expression used to create the new field.

    newFieldName
        The name of the new field being created.

Data member:

ivMathSeq
    Lists the pairs of math expressions and output field names.

Member functions:







createObject(const IString &objName, IDMMiningBase *const pMiningBase, IDMData &inputData, ...)
    Calls the constructor with the corresponding signature and, if successful, adds the object to the mining base. The IDMData object provided is used to determine the server name, the input schema name, and the input data name.









createObject(const IString &objName, IDMMiningBase *const pMiningBase, IDMData &inputData, ... IDMData &outputData ...)
    Calls the constructor with the corresponding signature and, if successful, adds the object to the mining base. The first IDMData object is used to determine the server name, input schema name, and input data name. The second IDMData object provided is used to determine the output schema name and the output data name.







IDMCalculateValues(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, const IString &serverName, ...)
    Constructs a calculate values object with the given values.

IDMCalculateValues(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, IDMData &inputData, ...)
    Constructs a calculate values object using an IDMData object. The IDMData object provides the server name, input schema name, and input data name.

IDMCalculateValues(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, const IString &serverName, ... IDMData &outputData ...)
    Constructs a calculate values object with the given values. The IDMData object provides the output schema name and output data name. The server name from the IDMData object must match the server name input parameter.

IDMCalculateValues(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, IDMData &inputData, ... IDMData &outputData ...)
    Constructs a calculate values object using two IDMData objects. The first IDMData object provides the server name, input schema name, and input data name. The second IDMData object provides the output schema name and output data name. The server name from the second IDMData object must match the server name from the first IDMData object.

~IDMCalculateValues
    The destructor.

IDMCleanUpDataSources

This class removes input data and output data from a database. If an attempt to remove one of the listed input data or output data fails, none of the listed data is removed.

If listed input data or output data does not exist, this class does not attempt to remove it.
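The all-or-nothing behavior described above can be modeled on an in-memory set of names. In the following C++ sketch the set db stands in for the tables and views in a database, and the set locked stands in for entries whose removal fails; both, like the function name cleanUp, are assumptions made for illustration and do not reflect how the class talks to a database server.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Removes every listed name from `db`. Names that do not exist are skipped.
// If removing any listed name fails, `db` is left unchanged and false is returned.
bool cleanUp(std::set<std::string>& db,
             const std::set<std::string>& locked,
             const std::vector<std::string>& toRemove) {
    std::set<std::string> working = db;  // work on a copy until all removals succeed
    for (const std::string& name : toRemove) {
        if (working.count(name) == 0) continue;      // does not exist: no attempt
        if (locked.count(name) != 0) return false;   // removal fails: remove nothing
        working.erase(name);
    }
    assert(working.size() <= db.size());  // clean up never adds entries
    db = working;                         // all removals succeeded: commit
    return true;
}
```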

Header file: idmpccln.hpp

Format:
struct CleanupPairStruct
{
    IDM_OutputType outputType;
    IString schemaName;
    IString dataSourceName;
};

typedef ISequence<CleanupPairStruct> CleanupSeq;

class IDMCleanUpDataSources : public IDMProcessingSettings
{
private:
    CleanupSeq ivCleanupSeq;

public:
    static IDMRETURN createObject(
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        const IString &serverName,
        const IString &databaseName,
        const IString &tablespaceName,
        const CleanupSeq &cleanupSeq,
        IDMCleanUpDataSources* &pClnSettings);

    IDMRETURN update(
        const IString &objName,
        const IString &serverName,
        const IString &databaseName,
        const IString &tablespaceName,
        const CleanupSeq &cleanupSeq);

    IDMRETURN get(
        IString &objName,
        IDMMiningBase *&pMiningBase,
        IString &serverName,
        IString &databaseName,
        IString &tablespaceName,
        CleanupSeq &cleanupSeq) const;

    IDMCleanUpDataSources(
        IDMRETURN &rc,
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        const IString &serverName,
        const IString &databaseName,
        const IString &tablespaceName,
        const CleanupSeq &cleanupSeq);

    virtual ~IDMCleanUpDataSources();
};

Struct members:

CleanupPairStruct
    Defines the format of the output data, the schema to which the data is assigned, and the name of the output data.

    outputType
        The format of the data being removed from the database. This value can be either IDM_OUTPUT_TABLE or IDM_OUTPUT_VIEW.

    schemaName
        The name of the schema to which the data is assigned.

    dataSourceName
        Specifies the name of the data being removed from the database.

Data members:

ivCleanupSeq
    Specifies the name, type, and schema of the data being removed from the database.

Member functions:

createObject
    Calls the constructor and, if successful, adds the object to the mining base.

    Delete the object using the deleteObject method.

deleteObject
    Removes the object from the mining base and then destructs it.

IDMCleanUpDataSources
    Constructs a clean up data sources object with the given values.

~IDMCleanUpDataSources
    The destructor.

Notes on inherited methods: The following methods always return an error:
- setInputData()
- setOutputData()
- getOutputData()

IDMConvertToLowercaseOrUppercase

This class converts one or more fields in the input data to either uppercase or lowercase.

The output data contains all the fields from the input data, plus a new field for each ivCastingSeq entry.

If your database server does not support the UCASE and LCASE functions (for example, DB2 for OS/390, DB2PE, DataJoiner, or DB2 for AS400), this function produces output data in the form of a table only.


Example:

Input data

Cust#   Name             Address          Favorite Fruit
123     John Q. Public   123 Main St.     APPles
234     Jane Doe         10 Downing St.   bAnAnAs

Data member    Value
ivCastingSeq   Name,TRUE,UCName
               Favorite Fruit,FALSE,LCFruit

Output data

Cust#   Name             Address          Favorite Fruit   UCName           LCFruit
123     John Q. Public   123 Main St.     APPles           JOHN Q. PUBLIC   apples
234     Jane Doe         10 Downing St.   bAnAnAs          JANE DOE         bananas
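The conversion shown in the example amounts to applying an uppercase or lowercase mapping to each character of a field value, selected by a flag. The C++ sketch below reproduces the UCName and LCFruit values with the standard library; the function name convertCase is hypothetical, and the sketch handles single-byte characters in the default C locale only, whereas the preprocessing function relies on the database server.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Returns a copy of `value` converted to uppercase (upperCase == true)
// or to lowercase (upperCase == false), mirroring the role of upperCaseFlag.
std::string convertCase(const std::string& value, bool upperCase) {
    std::string out = value;
    for (char& c : out) {
        // Cast through unsigned char, as required by std::toupper and std::tolower.
        const unsigned char uc = static_cast<unsigned char>(c);
        c = static_cast<char>(upperCase ? std::toupper(uc) : std::tolower(uc));
    }
    return out;
}
```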

Header file: idmpcluc.hpp

Format:


struct CastingStruct{

IString fieldName;IDMBOOLEAN upperCaseFlag;IString newFieldName;

};

typedef ISequence<CastingStruct> CastingSeq;

class IDMConvertToLowercaseOrUppercase : public IDMProcessingSettings{

private:CastingSeq ivCastingSeq;


const IString &objName,IDMMiningBase *const pMiningBase,const IString &serverName,const IString &databaseName,const IString &tablespaceName,const IString &inputSchemaName,const IString &inputDataSourceName,IDM_OutputType outputType,const IString &outputSchemaName,const IString &outputDataSourceName,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,const CastingSeq &castingSeq,IDMConvertToLowercaseOrUppercase* &pLucSettings );


const IString &objName,IDMMiningBase *const pMiningBase,IDMData &inputData,const IString &databaseName,const IString &tablespaceName,IDM_OutputType outputType,const IString &outputSchemaName,const IString &outputDataSourceName,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,const CastingSeq &castingSeq,IDMConvertToLowercaseOrUppercase* &pLucSettings );


const IString &objName,IDMMiningBase *const pMiningBase,const IString &serverName,const IString &databaseName,const IString &tablespaceName,const IString &inputSchemaName,const IString &inputDataSourceName,IDM_OutputType outputType,IDMData &outputData,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,const CastingSeq &castingSeq,IDMConvertToLowercaseOrUppercase* &pLucSettings );


const IString &objName,


IDMMiningBase *const pMiningBase,IDMData &inputData,const IString &databaseName,const IString &tablespaceName,IDM_OutputType outputType,IDMData &outputData,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,const CastingSeq &castingSeq,IDMConvertToLowercaseOrUppercase* &pLucSettings );


IDMRETURN update(const IString &objName,const IString &serverName,const IString &databaseName,const IString &tablespaceName,const IString &inputSchemaName,const IString &inputDataSourceName,IDM_OutputType outputType,const IString &outputSchemaName,const IString &outputDataSourceName,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,const CastingSeq &castingSeq);

IDMRETURN get(IString &objName,IDMMiningBase *&pMiningBase,IString &serverName,IString &databaseName,IString &tablespaceName,IString &inputSchemaName,IString &inputDataSourceName,IDM_OutputType &outputType,IString &outputSchemaName,IString &outputDataSourceName,IDM_OutputOption &overwriteExistingDataSource,IString &outputDataSourceComment,CastingSeq &castingSeq) const;

IDMConvertToLowercaseOrUppercase(IDMRETURN &rc,const IString &objName,IDMMiningBase *const pMiningBase,const IString &serverName,const IString &databaseName,const IString &tablespaceName,const IString &inputSchemaName,const IString &inputDataSourceName,IDM_OutputType outputType,const IString &outputSchemaName,const IString &outputDataSourceName,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,const CastingSeq &castingSeq);

IDMConvertToLowercaseOrUppercase(IDMRETURN &rc,const IString &objName,IDMMiningBase *const pMiningBase,IDMData &inputData,const IString &databaseName,const IString &tablespaceName,IDM_OutputType outputType,const IString &outputSchemaName,const IString &outputDataSourceName,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,const CastingSeq &castingSeq);


IDMConvertToLowercaseOrUppercase(IDMRETURN &rc,const IString &objName,IDMMiningBase *const pMiningBase,const IString &serverName,const IString &databaseName,const IString &tablespaceName,const IString &inputSchemaName,const IString &inputDataSourceName,IDM_OutputType outputType,IDMData &outputData,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,const CastingSeq &castingSeq);

IDMConvertToLowercaseOrUppercase(IDMRETURN &rc,const IString &objName,IDMMiningBase *const pMiningBase,IDMData &inputData,const IString &databaseName,const IString &tablespaceName,IDM_OutputType outputType,IDMData &outputData,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,const CastingSeq &castingSeq);

virtual ~IDMConvertToLowercaseOrUppercase();
};

Struct members:

CastingStruct
    Defines the name of the field being converted, the flag, and the name of the field that contains the converted values.

fieldName
    The name of the field being converted.

upperCaseFlag
    A flag that specifies how the field is converted.

    Set the flag to TRUE to convert to uppercase, or set it to FALSE to convert to lowercase.

newFieldName
    The name of the field that contains the converted values.

Data members:

ivCastingSeq
    Specifies the field name, conversion flag, and new field name for each field being converted.
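The effect of one casting sequence on a single record can be sketched in standard C++. This is only an illustration, not the Intelligent Miner implementation: std::string, bool, and std::vector stand in for IString, IDMBOOLEAN, and ISequence, and the names CastingEntry, Record, and applyCasting are hypothetical. The sketch also assumes that the original fields are kept alongside the new ones.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for CastingStruct, using standard library types.
struct CastingEntry {
    std::string fieldName;     // field being converted
    bool upperCaseFlag;        // true: uppercase, false: lowercase
    std::string newFieldName;  // field that receives the converted value
};

// A record maps field names to character values (assumption of this sketch).
using Record = std::map<std::string, std::string>;

// Returns a copy of the record with one additional field per casting entry.
Record applyCasting(const Record &in, const std::vector<CastingEntry> &seq) {
    Record out = in;
    for (const CastingEntry &c : seq) {
        assert(!c.newFieldName.empty());
        auto it = in.find(c.fieldName);
        if (it == in.end()) continue;  // assumption: unknown fields are skipped
        std::string v = it->second;
        const bool upper = c.upperCaseFlag;
        std::transform(v.begin(), v.end(), v.begin(), [upper](unsigned char ch) {
            return static_cast<char>(upper ? std::toupper(ch) : std::tolower(ch));
        });
        out[c.newFieldName] = v;
    }
    return out;
}
```

Calling applyCasting with an entry whose upperCaseFlag is true writes the uppercase form of the named field into the new field; with the flag set to false it writes the lowercase form.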

Member functions:







Calls the constructor with the corresponding signature and, if successful, adds the object to the mining base. The IDMData object provided is used to determine the server name, input schema name, and input data name.









Calls the constructor with the corresponding signature and, if successful, adds the object to the mining base. The first IDMData object provided is used to determine the server name, input schema name, and input data name. The second IDMData object provided is used to determine the output schema name and the output data name.







IDMConvertToLowercaseOrUppercase(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, const IString &serverName, ...)

Constructs a convert to lowercase or uppercase object with the given values.


IDMConvertToLowercaseOrUppercase(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, IDMData &inputData, ...)

Constructs a convert to lowercase or uppercase object using an IDMData object. The IDMData object provides the server name, input schema name, and input data name.


IDMConvertToLowercaseOrUppercase(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, const IString &serverName, ... IDMData &outputData ...)

Constructs a convert to lowercase or uppercase object with the given values. The IDMData object provides the output schema name and output data name. The server name from the IDMData object must match the server name input parameter.


IDMConvertToLowercaseOrUppercase(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, IDMData &inputData, ... IDMData &outputData ...)

Constructs a convert to lowercase or uppercase object using two IDMData objects. The first IDMData object provides the server name, input schema name, and input data name. The second IDMData object provides the output schema name and output data name. The server name from the second IDMData object must match the server name from the first IDMData object.


~IDMConvertToLowercaseOrUppercase
    The destructor.

IDMCopyRecordsToFile

This class copies data from relational database tables or views to a flat file. This class can optionally sort the records in the input data.

The order in which the sorting fields are specified determines the sort order. The first field specified has the highest precedence. For example, if you specified DEPT, JOB, the output data will be ordered by DEPT, then by JOB within DEPT.

For each field specified as a sort field, you must specify the sort sequence as either ascending or descending.

The number of sort fields must be within the DB2 limits specified for the ORDER BY clause.
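The precedence rule for sort fields can be illustrated with a small standard C++ sketch. It is not the Intelligent Miner implementation, which delegates sorting to the database. SortPair mirrors SortPairStruct with standard library types; Record and sortRecords are hypothetical names, and all field values are compared as strings for simplicity.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Mirrors SortPairStruct, with std::string and bool as stand-ins.
struct SortPair {
    std::string fieldName;
    bool ascendingFlag;  // true: ascending, false: descending
};

using Record = std::map<std::string, std::string>;

// Sorts records by the fields in sortSeq; earlier entries take precedence.
// Assumes every record contains every sort field.
void sortRecords(std::vector<Record> &records, const std::vector<SortPair> &sortSeq) {
    std::stable_sort(records.begin(), records.end(),
        [&sortSeq](const Record &a, const Record &b) {
            for (const SortPair &s : sortSeq) {
                assert(a.count(s.fieldName) == 1 && b.count(s.fieldName) == 1);
                const std::string &x = a.at(s.fieldName);
                const std::string &y = b.at(s.fieldName);
                if (x == y) continue;  // tie: decide on the next sort field
                return s.ascendingFlag ? x < y : y < x;
            }
            return false;  // equal on every sort field
        });
}
```

With the sequence DEPT (ascending), JOB (descending), records are grouped by DEPT first, and JOB only decides the order of records that share the same DEPT value.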

The output data is a flat file that contains all fields from the input data.

See “IDMFlatFileTable” on page 41 for more information on flat files.

Header file: idmpcutf.hpp

Format:

struct SortPairStruct
{
    IString fieldName;
    IDMBOOLEAN ascendingFlag;
};

typedef ISequence<SortPairStruct> SortSeq;

class IDMCopyRecordsToFile : public IDMProcessingSettings{


private:

SortSeq ivSortSeq;


const IString &objName,IDMMiningBase *const pMiningBase,const IString &serverName,const IString &databaseName,const IString &tablespaceName,const IString &inputSchemaName,const IString &inputDataSourceName,IDMData *const pOutputData,const SortSeq &sortSeq,IDMCopyRecordsToFile* &pCrfSettings );


const IString &objName,IDMMiningBase *const pMiningBase,IDMData &inputData,const IString &databaseName,const IString &tablespaceName,IDMData *const pOutputData,const SortSeq &sortSeq,IDMCopyRecordsToFile* &pCrfSettings );


IDMRETURN update(const IString &objName,const IString &serverName,const IString &databaseName,const IString &tablespaceName,const IString &inputSchemaName,const IString &inputDataSourceName,IDMData *const pOutputData,const SortSeq &sortSeq);

IDMRETURN get(IString &objName,IDMMiningBase *&pMiningBase,IString &serverName,IString &databaseName,IString &tablespaceName,IString &inputSchemaName,IString &inputDataSourceName,SortSeq &sortSeq) const;

IDMCopyRecordsToFile(IDMRETURN &rc,const IString &objName,IDMMiningBase *const pMiningBase,const IString &serverName,const IString &databaseName,const IString &tablespaceName,const IString &inputSchemaName,const IString &inputDataSourceName,IDMData *const pOutputData,const SortSeq &sortSeq);

IDMCopyRecordsToFile(IDMRETURN &rc,const IString &objName,IDMMiningBase *const pMiningBase,IDMData &inputData,const IString &databaseName,const IString &tablespaceName,IDMData *const pOutputData,const SortSeq &sortSeq);

virtual ~IDMCopyRecordsToFile();
};

Struct members:

SortPairStruct
    Defines the name of the field being copied and the sort sequence for the field.

fieldName
    The name of the field being copied.

ascendingFlag
    Specifies the sort sequence for a field.

    Set the flag to TRUE to sort in ascending order, or set it to FALSE to sort in descending order.

Data members:

ivSortSeq
    Lists the pairs of field names and sort flags.

Member functions:












IDMCopyRecordsToFile(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, const IString &serverName, ... IDMData *const pOutputData, ...)

Constructs a copy records to file object with the given values.

The IDMData object used for the output data must reference a file.



IDMCopyRecordsToFile(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, IDMData &inputData, ... IDMData *const pOutputData, ...)

Constructs a copy records to file object using an IDMData object. The IDMData object provides the server name, input schema name, and input data name.

The IDMData object used for the output data must reference a file.


~IDMCopyRecordsToFile
    The destructor.

Notes on inherited methods:

setOutputData()
    Changes the output data. The IDMData object parameter must reference a flat file.

getOutputData(const IString &name, IDMData *&OutputData, IDM_DataUseMode=IDM_INPUT_OUTPUT)

This method returns an error.

IDMDiscardRecordsWithMissingValues

This class removes records that contain missing values (NULLs) in any of the specified fields.

The output data contains all the fields from the input data.


Example:

Input data

Name          Age   City
P. Smith      −     New York
A. Dupont     34    Paris
K. Schmidt    52    Hamburg

Data member          Value
ivCheckFieldNames    Age

Output data

The age for P. Smith in the input data is missing, so the record containing P. Smith is not included in the output data:

Name          Age   City
A. Dupont     34    Paris
K. Schmidt    52    Hamburg
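The filtering rule can be sketched in standard C++ as follows. This is an illustration under stated assumptions, not the product's implementation: a record is modelled as a map from field names to values, a missing value (NULL) is modelled by the field being absent from the map, and Record and discardRecordsWithMissingValues are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// A missing value (NULL) is modelled by the field being absent from the map.
using Record = std::map<std::string, std::string>;

// Keeps only the records that have a value in every field of checkFieldNames.
std::vector<Record> discardRecordsWithMissingValues(
        const std::vector<Record> &input,
        const std::vector<std::string> &checkFieldNames) {
    std::vector<Record> output;
    for (const Record &rec : input) {
        const bool complete = std::all_of(
            checkFieldNames.begin(), checkFieldNames.end(),
            [&rec](const std::string &field) { return rec.count(field) > 0; });
        if (complete) output.push_back(rec);  // all fields of the record are kept
    }
    return output;
}
```

Only the fields listed in checkFieldNames are inspected; a record with a missing value in some other field is still copied to the output.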


Header file: idmpcdrn.hpp


Format:

typedef ISequence<IString> FieldSeq;

class IDMDiscardRecordsWithMissingValues : public IDMProcessingSettings
{
private:
    FieldSeq ivCheckFieldNames;


const IString &objName,IDMMiningBase *const pMiningBase,const IString &serverName,const IString &databaseName,const IString &tablespaceName,const IString &inputSchemaName,const IString &inputDataSourceName,IDM_OutputType outputType,const IString &outputSchemaName,const IString &outputDataSourceName,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,const FieldSeq &checkFieldNames,IDMDiscardRecordsWithMissingValues* &pDrnSettings);


const IString &objName,IDMMiningBase *const pMiningBase,IDMData &inputData,const IString &databaseName,const IString &tableSpaceName,IDM_OutputType outputType,const IString &outputSchemaName,const IString &outputDataSourceName,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,const FieldSeq &checkFieldNames,IDMDiscardRecordsWithMissingValues* &pDrnSettings);


const IString &objName,IDMMiningBase *const pMiningBase,const IString &serverName,const IString &databaseName,const IString &tablespaceName,const IString &inputSchemaName,const IString &inputDataSourceName,IDM_OutputType outputType,IDMData &outputData,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,const FieldSeq &checkFieldNames,IDMDiscardRecordsWithMissingValues* &pDrnSettings);


const IString &objName,IDMMiningBase *const pMiningBase,IDMData &inputData,const IString &databaseName,const IString &tableSpaceName,IDM_OutputType outputType,IDMData &outputData,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,const FieldSeq &checkFieldNames,IDMDiscardRecordsWithMissingValues* &pDrnSettings);


IDMRETURN update(const IString &objName,const IString &serverName,const IString &databaseName,const IString &tablespaceName,const IString &inputSchemaName,const IString &inputDataSourceName,IDM_OutputType outputType,const IString &outputSchemaName,const IString &outputDataSourceName,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,const FieldSeq &checkFieldNames);

IDMRETURN get(IString &objName,IDMMiningBase *&pMiningBase,IString &serverName,IString &databaseName,IString &tablespaceName,IString &inputSchemaName,IString &inputDataSourceName,IDM_OutputType &outputType,IString &outputSchemaName,IString &outputDataSourceName,IDM_OutputOption &overwriteExistingDataSource,IString &outputDataSourceComment,FieldSeq &checkFieldNames) const;

IDMDiscardRecordsWithMissingValues(IDMRETURN &rc,const IString &objName,IDMMiningBase *const pMiningBase,const IString &serverName,const IString &databaseName,const IString &tablespaceName,const IString &inputSchemaName,const IString &inputDataSourceName,IDM_OutputType outputType,const IString &outputSchemaName,const IString &outputDataSourceName,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,const FieldSeq &checkFieldNames);

IDMDiscardRecordsWithMissingValues(IDMRETURN &rc,const IString &objName,IDMMiningBase *const pMiningBase,IDMData &inputData,const IString &databaseName,const IString &tableSpaceName,IDM_OutputType outputType,const IString &outputSchemaName,const IString &outputDataSourceName,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,const FieldSeq &checkFieldNames);

IDMDiscardRecordsWithMissingValues(IDMRETURN &rc,const IString &objName,IDMMiningBase *const pMiningBase,const IString &serverName,const IString &databaseName,const IString &tablespaceName,const IString &inputSchemaName,const IString &inputDataSourceName,IDM_OutputType outputType,IDMData &outputData,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,const FieldSeq &checkFieldNames);

IDMDiscardRecordsWithMissingValues(IDMRETURN &rc,const IString &objName,IDMMiningBase *const pMiningBase,IDMData &inputData,const IString &databaseName,const IString &tableSpaceName,IDM_OutputType outputType,IDMData &outputData,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,const FieldSeq &checkFieldNames);

virtual ~IDMDiscardRecordsWithMissingValues();

};

Data members:

ivCheckFieldNames
    Lists the fields to be checked for null values.

Member functions:























IDMDiscardRecordsWithMissingValues(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, const IString &serverName, ...)

Constructs a discard records with missing values object with the given values.


IDMDiscardRecordsWithMissingValues(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, IDMData &inputData, ...)

Constructs a discard records with missing values object using an IDMData object. The IDMData object provides the server name, input schema name, and input data name.


IDMDiscardRecordsWithMissingValues(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, const IString &serverName, ... IDMData &outputData ...)

Constructs a discard records with missing values object with the given values. The IDMData object provides the output schema name and output data name. The server name from the IDMData object must match the server name input parameter.


IDMDiscardRecordsWithMissingValues(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, IDMData &inputData, ... IDMData &outputData ...)

Constructs a discard records with missing values object using two IDMData objects. The first IDMData object provides the server name, input schema name, and input data name. The second IDMData object provides the output schema name and output data name. The server name from the second IDMData object must match the server name from the first IDMData object.



~IDMDiscardRecordsWithMissingValues
    The destructor.

IDMDiscretizationIntoQuantiles

This class assigns all the records of the input data to a specified number of quantile ranges in the output data.

The input data is first sorted in ascending order on the specified input data field. The input data is then divided into the number of quantile ranges specified.

1. The number of records in the first quantile is determined by dividing the number of records in the input data by the number of quantile ranges specified. That number of records, starting with the first record of the sorted data, is placed into the first quantile.

2. If the input field value of the last record in a quantile is equal to the input field value of the next record read, that record is placed into the same quantile. The records are handled in this manner until a record is reached that has an input field value that does not equal that of the last record in the quantile.

3. Starting with the first record that did not have an input field value that matched the input field value of the previous record, the number of remaining records is divided by the remaining number of quantile ranges. That number of records is placed in the next quantile.

Steps 2 and 3 are repeated until all the quantile ranges are filled (see the example below).

This function always creates a table as the output data. The output data contains fields in accordance with the following rules:

1. If the input data is a table and contains a primary key, the output data contains all the primary key fields from the input data, plus a new field that indicates which quantile each record is in.
   However, if any of the primary key fields in the input data contain a + or − symbol in their name, continue to rule 2.

2. Else, if the input data is a table and contains a unique index, the output data contains all of the unique index fields from the input data, plus a new field that indicates which quantile each record is in.
   However, if any of the unique index fields in the input data contain a + or − symbol in their name, the attempt to create the index fails, and another attempt is made using the next unique index in the input data (if one exists).
   If all attempts to create indexes in the output data fail, continue to rule 3.

3. Else the output data contains all the fields from the input data, plus a new field that indicates which quantile each record is in.

You can use the Map Values function in a subsequent step to map the quantile numbers to symbols.

If you want additional input data fields in your output data, you can use the Join Data Sources function in a subsequent step to join the output data with the input data.

Example:


Input data

Stud#   Course   Grade
123     German   78
234     German   79
345     German   80
456     German   68
567     German   69
678     German   70
789     German   91
890     German   90

Data members         Value
ivInputFieldName     Grade
ivNQuantileRanges    3
ivNewFieldName       3Tile

Output Data

Stud#   Course   Grade   3Tile
456     German   68      1
567     German   69      1
678     German   70      2
123     German   78      2
234     German   79      2
345     German   80      3
890     German   90      3
789     German   91      3
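The three steps described above can be sketched in standard C++ as follows. This is an illustrative reading of the description, not the Intelligent Miner implementation. The function name assignQuantiles is hypothetical, and the use of integer (truncating) division is an assumption that is consistent with the example, where 8 records and 3 quantile ranges put 2 records into the first quantile.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sorts the values in ascending order and returns, for each sorted value,
// its 1-based quantile number. Integer division is an assumption.
std::vector<int> assignQuantiles(std::vector<int> &values, int nQuantiles) {
    assert(nQuantiles > 0);
    std::sort(values.begin(), values.end());
    std::vector<int> quantile(values.size());
    std::size_t pos = 0;  // index of the first record not yet assigned
    for (int q = 1; q <= nQuantiles; ++q) {
        const std::size_t remainingRecords = values.size() - pos;
        const std::size_t remainingRanges =
            static_cast<std::size_t>(nQuantiles - q + 1);
        // Steps 1 and 3: remaining records divided by remaining ranges.
        std::size_t end = pos + remainingRecords / remainingRanges;
        // Step 2: records equal to the last record stay in the same quantile.
        while (end > pos && end < values.size() && values[end] == values[end - 1]) {
            ++end;
        }
        for (; pos < end; ++pos) quantile[pos] = q;
    }
    return quantile;
}
```

For the grades in the example and three quantile ranges, the function sorts the grades to 68, 69, 70, 78, 79, 80, 90, 91 and assigns the quantile numbers 1, 1, 2, 2, 2, 3, 3, 3, which agrees with the 3Tile column of the output data.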

Header file: idmpcdiq.hpp

Format:

class IDMDiscretizationIntoQuantiles : public IDMProcessingSettings
{
private:
    IString ivInputFieldName;
    IDMINTEGER ivNQuantiles;
    IString ivNewFieldName;


const IString &objName,IDMMiningBase *const pMiningBase,const IString &serverName,const IString &databaseName,const IString &tablespaceName,const IString &inputSchemaName,const IString &inputDataSourceName,const IString &outputSchemaName,const IString &outputTableName,IDM_OutputOption overwriteExistingTable,const IString &outputTableComment,const IString &inputFieldName,IDMINTEGER nQuantiles,const IString &newFieldName,IDMDiscretizationIntoQuantiles* &pDiqSettings);


const IString &objName,IDMMiningBase *const pMiningBase,IDMData &inputData,const IString &databaseName,const IString &tablespaceName,const IString &outputSchemaName,const IString &outputTableName,IDM_OutputOption overwriteExistingTable,const IString &outputTableComment,const IString &inputFieldName,IDMINTEGER nQuantiles,const IString &newFieldName,IDMDiscretizationIntoQuantiles* &pDiqSettings);


const IString &objName,IDMMiningBase *const pMiningBase,const IString &serverName,const IString &databaseName,const IString &tablespaceName,const IString &inputSchemaName,const IString &inputDataSourceName,IDMData &outputData,IDM_OutputOption overwriteExistingTable,const IString &outputTableComment,const IString &inputFieldName,IDMINTEGER nQuantiles,const IString &newFieldName,IDMDiscretizationIntoQuantiles* &pDiqSettings);


const IString &objName,IDMMiningBase *const pMiningBase,IDMData &inputData,const IString &databaseName,const IString &tablespaceName,IDMData &outputData,IDM_OutputOption overwriteExistingTable,const IString &outputTableComment,const IString &inputFieldName,IDMINTEGER nQuantiles,const IString &newFieldName,IDMDiscretizationIntoQuantiles* &pDiqSettings);


IDMRETURN update(const IString &objName,const IString &serverName,const IString &databaseName,const IString &tablespaceName,const IString &inputSchemaName,const IString &inputDataSourceName,const IString &outputSchemaName,const IString &outputTableName,IDM_OutputOption overwriteExistingTable,const IString &outputTableComment,const IString &inputFieldName,IDMINTEGER nQuantiles,const IString &newFieldName);

IDMRETURN get(IString &objName,IDMMiningBase *&pMiningBase,IString &serverName,IString &databaseName,IString &tablespaceName,IString &inputSchemaName,IString &inputDataSourceName,IString &outputSchemaName,IString &outputTableName,IDM_OutputOption &overwriteExistingTable,IString &outputTableComment,IString &inputFieldName,IDMINTEGER &nQuantiles,IString &newFieldName) const;

IDMDiscretizationIntoQuantiles(IDMRETURN &rc,const IString &objName,IDMMiningBase *const pMiningBase,const IString &serverName,const IString &databaseName,const IString &tablespaceName,const IString &inputSchemaName,const IString &inputDataSourceName,const IString &outputSchemaName,const IString &outputTableName,IDM_OutputOption overwriteExistingTable,const IString &outputTableComment,const IString &inputFieldName,IDMINTEGER nQuantiles,const IString &newFieldName);

IDMDiscretizationIntoQuantiles(IDMRETURN &rc,const IString &objName,IDMMiningBase *const pMiningBase,IDMData &inputData,const IString &databaseName,const IString &tablespaceName,const IString &outputSchemaName,const IString &outputTableName,IDM_OutputOption overwriteExistingTable,const IString &outputTableComment,const IString &inputFieldName,IDMINTEGER nQuantiles,const IString &newFieldName);

IDMDiscretizationIntoQuantiles(IDMRETURN &rc,const IString &objName,IDMMiningBase *const pMiningBase,const IString &serverName,const IString &databaseName,const IString &tablespaceName,const IString &inputSchemaName,const IString &inputDataSourceName,IDMData &outputData,IDM_OutputOption overwriteExistingTable,const IString &outputTableComment,const IString &inputFieldName,IDMINTEGER nQuantiles,const IString &newFieldName);

IDMDiscretizationIntoQuantiles(IDMRETURN &rc,const IString &objName,IDMMiningBase *const pMiningBase,IDMData &inputData,const IString &databaseName,const IString &tablespaceName,IDMData &outputData,IDM_OutputOption overwriteExistingTable,const IString &outputTableComment,const IString &inputFieldName,IDMINTEGER nQuantiles,const IString &newFieldName);

virtual ~IDMDiscretizationIntoQuantiles();

};

Data members:

ivInputFieldName
    The name of the field in the input data used to sort and divide the records into quantile ranges.

ivNQuantileRanges
    The number of quantile ranges to create.

ivNewFieldName
    The name of the new field in the output data.

Member functions:






Calls the constructor with the corresponding signature and, if successful, adds the object to the mining base. The IDMData object provided is used as the input for the settings object.









Calls the constructor with the corresponding signature and, if successful, adds the object to the mining base. The first IDMData object provided is used to determine the server name, input schema name, and input data name. The second IDMData object provided is used to determine the output schema name and the output data name.







IDMDiscretizationIntoQuantiles(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, const IString &serverName, ...)

Constructs a discretization into quantiles object with the given values.


IDMDiscretizationIntoQuantiles(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, IDMData &inputData, ...)

Constructs a discretization into quantiles object using an IDMData object. The IDMData object provides the server name, input schema name, and input data name.


IDMDiscretizationIntoQuantiles(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, const IString &serverName, ... IDMData &outputData ...)

Constructs a discretization into quantiles object with the given values. The IDMData object provides the output schema name and output data name. The server name from the IDMData object must match the server name input parameter.


IDMDiscretizationIntoQuantiles(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, IDMData &inputData, ... IDMData &outputData ...)

Constructs a discretization into quantiles object using two IDMData objects. The first IDMData object provides the server name, input schema name, and input data name. The second IDMData object provides the output schema name and output data name. The server name from the second IDMData object must match the input server name.


~IDMDiscretizationIntoQuantiles
    The destructor.


IDMDiscretizationUsingRanges

This class maps ranges to discrete values by first splitting the value range of a continuous field into intervals and then mapping each interval to a discrete value.

The mapping is defined using a range table that contains three fields. The boundary field contains the interval limits. The flag field contains a flag that indicates whether the greatest value in the interval belongs to this interval or the interval of the next record. The value field contains the value used for the mapping. See “IDMDiscretization” on page 88 for more information.

Records that contain null values do not fall within any interval, and are not processed.

This function always creates a table as the output data. The output data contains fields in accordance with the following rules:

1. If the input data is a table and contains a primary key, the output data contains all of the primary key fields from the input data, plus a new field that indicates which range each record is in.
   However, if any of the primary key fields in the input data contain a + or − symbol in their name, continue to rule 3.

2. Else, if the input data is a table and contains a unique index, the output data contains all of the unique index fields from the input data, plus a new field that indicates which range each record is in.
   If any of the unique index fields in the input data contain a + or − symbol in their name, the attempt to create the index fails, and another attempt is made using the next unique index in the input data (if one exists).
   If all attempts to create indexes in the output data fail, continue to rule 3.

3. Else, the output data contains all the fields from the input data, plus a new field that indicates which range each record is in.

Example:

Input data

Lic#   Name             Address          State   Rate of Speed
123    John Q. Public   123 Main St.     CA      55
234    Jane Doe         57 1st St.       CA      63
345    Jack Green       10 Downing St.   CA      75

Range Data

Limit   Flag   Category
55      <      slow
65      <=     legal
               fast

Data members           Value
ivInputFieldName       Rate of Speed
ivBoundaryFieldName    Limit
ivFlagFieldName        Flag
ivValueFieldName       Category
ivNewFieldName         Speed Category

Output Data

Lic#   Name             Address          State   Rate of Speed   Speed Category
123    John Q. Public   123 Main St.     CA      55              legal
234    Jane Doe         57 1st St.       CA      63              legal
345    Jack Green       10 Downing St.   CA      75              fast
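The lookup against the range table can be sketched in standard C++ as follows. This is an illustration only, not the Intelligent Miner implementation. RangeRow and discretize are hypothetical names, the rows are assumed to be ordered by ascending limit, and the final row without a limit is modelled with a hasLimit flag.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical in-memory form of one record of the range table.
struct RangeRow {
    bool hasLimit;      // false for the final, open-ended interval
    double limit;       // upper boundary value of the interval
    std::string flag;   // "<": the boundary value belongs to the next interval
    std::string value;  // discrete value that the interval is mapped to
};

// Returns the discrete value for x, or an empty string if no interval matches.
// Assumes the rows are ordered by ascending limit, open-ended row last.
std::string discretize(double x, const std::vector<RangeRow> &ranges) {
    for (const RangeRow &r : ranges) {
        if (!r.hasLimit) return r.value;   // open-ended interval catches the rest
        if (x < r.limit) return r.value;   // strictly below the boundary
        if (x == r.limit && r.flag != "<") return r.value;  // boundary stays here
    }
    return std::string();
}
```

With the range data from the example, a speed of 55 maps to legal rather than slow, because the < flag on the first row assigns the boundary value 55 to the next interval.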

Header file: idmpcdur.hpp

Format:

class IDMDiscretizationUsingRanges : public IDMProcessingSettings
{
private:
    IString ivInputFieldName;
    IString ivRangeSchemaName;
    IString ivRangeDataSourceName;
    IString ivBoundaryFieldName;
    IString ivFlagFieldName;
    IString ivValueFieldName;
    IString ivNewFieldName;


const IString &objName,IDMMiningBase *const pMiningBase,const IString &serverName,const IString &databaseName,const IString &tablespaceName,const IString &inputSchemaName,const IString &inputDataSourceName,const IString &outputSchemaName,const IString &outputTableName,IDM_OutputOption overwriteExistingTable,const IString &outputTableComment,const IString &inputFieldName,const IString &rangeSchemaName,const IString &rangeDataSourceName,const IString &boundaryFieldName,const IString &flagFieldName,const IString &valueFieldName,const IString &newFieldName,IDMDiscretizationUsingRanges* &pDurSettings);


const IString &objName,IDMMiningBase *const pMiningBase,IDMData &inputData,const IString &databaseName,const IString &tablespaceName,const IString &outputSchemaName,const IString &outputTableName,IDM_OutputOption overwriteExistingTable,const IString &outputTableComment,const IString &inputFieldName,const IString &rangeSchemaName,const IString &rangeDataSourceName,const IString &boundaryFieldName,const IString &flagFieldName,const IString &valueFieldName,const IString &newFieldName,IDMDiscretizationUsingRanges* &pDurSettings);


const IString &objName,IDMMiningBase *const pMiningBase,const IString &serverName,const IString &databaseName,const IString &tablespaceName,const IString &inputSchemaName,const IString &inputDataSourceName,IDMData &outputData,IDM_OutputOption overwriteExistingTable,const IString &outputTableComment,const IString &inputFieldName,const IString &rangeSchemaName,const IString &rangeDataSourceName,const IString &boundaryFieldName,const IString &flagFieldName,const IString &valueFieldName,const IString &newFieldName,IDMDiscretizationUsingRanges* &pDurSettings );


const IString &objName,IDMMiningBase *const pMiningBase,IDMData &inputData,const IString &databaseName,const IString &tablespaceName,IDMData &outputData,IDM_OutputOption overwriteExistingTable,const IString &outputTableComment,const IString &inputFieldName,const IString &rangeSchemaName,const IString &rangeDataSourceName,const IString &boundaryFieldName,const IString &flagFieldName,const IString &valueFieldName,const IString &newFieldName,IDMDiscretizationUsingRanges* &pDurSettings);


IDMRETURN update(const IString &objName,const IString &serverName,const IString &databaseName,const IString &tablespaceName,const IString &inputSchemaName,const IString &inputDataSourceName,const IString &outputSchemaName,const IString &outputTableName,IDM_OutputOption overwriteExistingTable,const IString &outputTableComment,const IString &inputFieldName,const IString &rangeSchemaName,const IString &rangeDataSourceName,const IString &boundaryFieldName,const IString &flagFieldName,const IString &valueFieldName,const IString &newFieldName);

IDMRETURN setRangeData(IDMData &rangeData);


IDMRETURN get(IString &objName,IDMMiningBase *&pMiningBase,IString &serverName,IString &databaseName,IString &tablespaceName,IString &inputSchemaName,IString &inputDataSourceName,IString &outputSchemaName,IString &outputTableName,IDM_OutputOption &overwriteExistingTable,IString &outputTableComment,IString &inputFieldName,IString &rangeSchemaName,IString &rangeDataSourceName,IString &boundaryFieldName,IString &flagFieldName,IString &valueFieldName,IString &newFieldName) const;

IDMDiscretizationUsingRanges(IDMRETURN &rc,const IString &objName,IDMMiningBase *const pMiningBase,const IString &serverName,const IString &databaseName,const IString &tablespaceName,const IString &inputSchemaName,const IString &inputDataSourceName,const IString &outputSchemaName,const IString &outputTableName,IDM_OutputOption overwriteExistingTable,const IString &outputTableComment,const IString &inputFieldName,const IString &rangeSchemaName,const IString &rangeDataSourceName,const IString &boundaryFieldName,const IString &flagFieldName,const IString &valueFieldName,const IString &newFieldName);

IDMDiscretizationUsingRanges(IDMRETURN &rc,const IString &objName,IDMMiningBase *const pMiningBase,IDMData &inputData,const IString &databaseName,const IString &tablespaceName,const IString &outputSchemaName,const IString &outputTableName,IDM_OutputOption overwriteExistingTable,const IString &outputTableComment,const IString &inputFieldName,const IString &rangeSchemaName,const IString &rangeDataSourceName,const IString &boundaryFieldName,const IString &flagFieldName,const IString &valueFieldName,const IString &newFieldName);

IDMDiscretizationUsingRanges(IDMRETURN &rc,const IString &objName,IDMMiningBase *const pMiningBase,const IString &serverName,const IString &databaseName,const IString &tablespaceName,const IString &inputSchemaName,const IString &inputDataSourceName,IDMData &outputData,IDM_OutputOption overwriteExistingTable,const IString &outputTableComment,const IString &inputFieldName,const IString &rangeSchemaName,const IString &rangeDataSourceName,const IString &boundaryFieldName,const IString &flagFieldName,const IString &valueFieldName,const IString &newFieldName);

IDMDiscretizationUsingRanges(IDMRETURN &rc,const IString &objName,IDMMiningBase *const pMiningBase,IDMData &inputData,const IString &databaseName,const IString &tablespaceName,IDMData &outputData,IDM_OutputOption overwriteExistingTable,const IString &outputTableComment,const IString &inputFieldName,const IString &rangeSchemaName,const IString &rangeDataSourceName,const IString &boundaryFieldName,const IString &flagFieldName,const IString &valueFieldName,const IString &newFieldName);

virtual ~IDMDiscretizationUsingRanges();

};

Data members:

ivInputFieldName
    Name of the input data field that contains the continuous field.

ivRangeSchemaName
    The name of the schema to which the range data is assigned.

ivRangeDataSourceName
    The name of the range table or view.

ivBoundaryFieldName
    The name of the field in the range data that contains the upper boundary value of the interval.

ivFlagFieldName
    The name of the field in the range data that contains a flag specifying whether the boundary value belongs to the current interval or the next interval.

    The < character indicates that the next interval is used. Any other value or character indicates that the current interval is used.

ivValueFieldName
    The name of the field in the range data that contains the values used for mapping.

ivNewFieldName
    Name of the new field. If a name is not specified, the field name specified by the ivValueFieldName is used. If this field name already occurs in the output data, a new, unique field name is generated by adding an underscore (_) and an integer to the existing field name. The integer is incremented until the new field name is unique.
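The rule for generating a unique field name can be sketched in standard C++ as follows. This is a minimal illustration, not the product's code. The function name makeUniqueFieldName is hypothetical, and starting the integer suffix at 1 is an assumption, because the description does not state the starting value.

```cpp
#include <cassert>
#include <set>
#include <string>

// Returns name unchanged if it is not yet used; otherwise appends "_" and an
// integer, incrementing the integer until the result is unused.
// Assumption: the integer suffix starts at 1.
std::string makeUniqueFieldName(const std::string &name,
                                const std::set<std::string> &existing) {
    assert(!name.empty());
    if (existing.count(name) == 0) return name;
    for (int i = 1;; ++i) {
        const std::string candidate = name + "_" + std::to_string(i);
        if (existing.count(candidate) == 0) return candidate;
    }
}
```

If the output data already contains the fields Category and Category_1, the sketch returns Category_2 for the requested name Category.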


Member functions:













The server name from the IDMData object must match the input server name.








setRangeData
    Sets the range data using an IDMData object. The IDMData object must reference a relational database table or view.

    The database server attribute of the IDMData object must match the ivServerName attribute of the IDMDiscretizationUsingRanges object.



IDMDiscretizationUsingRanges(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, const IString &serverName, ...)

Constructs a discretization using ranges object with the given values.


IDMDiscretizationUsingRanges(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, IDMData &inputData, ...)

Constructs a discretization using ranges object using an IDMData object. The IDMData object provides the server name, input schema name, and input data name.


IDMDiscretizationUsingRanges(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, const IString &serverName, ... IDMData &outputData ...)

Constructs a discretization using ranges object with the given values. The IDMData object provides the output schema name and output data name. The server name from the IDMData object must match the server name input parameter.


IDMDiscretizationUsingRanges(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, IDMData &inputData, ... IDMData &outputData ...)

Constructs a discretization using ranges object using two IDMData objects. The first IDMData object provides the server name, input schema name, and input data name. The second IDMData object provides the output schema name and output data name. The server name from the second IDMData object must match the input server name.


~IDMDiscretizationUsingRanges
The destructor.

IDMEncodeMissingValues

This class encodes missing values (NULLs) in specified fields to the encoding values. The encoded values are placed in a new field in the output data. Each input data field can be encoded with a different value.

The output data contains all the fields in the input data, plus a field for each ivEncodingSeq entry.


Example:


Input data

Cust#  Name            Address          State
123    John Q. Public  123 Main St.     CA
222    -               29 Bayberry Rd.  NY
234    Jane Doe        57 1st St.       AL
333    -               35 Stephen Dr.   NY

Data member    Value
ivEncodingSeq  Name,Occupant,Addressee

Output data

The names for customer numbers 222 and 333 are missing, so the encoding value is placed in the Addressee field of the output data for those records:

Cust#  Name            Address          State  Addressee
123    John Q. Public  123 Main St.     CA     John Q. Public
222    -               29 Bayberry Rd.  NY     Occupant
234    Jane Doe        57 1st St.       AL     Jane Doe
333    -               35 Stephen Dr.   NY     Occupant
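The effect shown in the example can be sketched in standalone C++. This is only an illustration of the described behavior, not the Intelligent Miner implementation: Record, Encoding, and encodeMissingValues are hypothetical names, a missing value (NULL) is modeled with an empty std::optional, and the sketch assumes that every named input field exists in the record.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical record type: field name -> value, empty optional = NULL.
using Record = std::map<std::string, std::optional<std::string>>;

// Hypothetical stand-in with the same three members as EncodingStruct.
struct Encoding {
    std::string inputFieldName;
    std::string encodingValue;
    std::string newFieldName;
};

// Returns a copy of `rec` with one extra field per encoding entry.
// A present input value is copied; a missing one becomes encodingValue.
// Throws std::out_of_range if an input field is not in the record.
Record encodeMissingValues(Record rec, const std::vector<Encoding> &encodings)
{
    for (const Encoding &e : encodings) {
        const std::optional<std::string> &in = rec.at(e.inputFieldName);
        rec[e.newFieldName] = in.value_or(e.encodingValue);
    }
    return rec;
}
```

With the entry Name,Occupant,Addressee from the example, a record without a name gets Addressee = Occupant, and a record with a name gets a copy of that name.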

Header file: idmpcemv.hpp

Format:

struct EncodingStruct
{
    IString inputFieldName;
    IString encodingValue;
    IString newFieldName;
};

typedef ISequence<EncodingStruct> EncodingSeq;

class IDMEncodeMissingValues : public IDMProcessingSettings
{
private:
    EncodingSeq ivEncodingSeq;


public:
    static IDMRETURN createObject(
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        const IString &serverName,
        const IString &databaseName,
        const IString &tablespaceName,
        const IString &inputSchemaName,
        const IString &inputDataSourceName,
        IDM_OutputType outputType,
        const IString &outputSchemaName,
        const IString &outputDataSourceName,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const EncodingSeq &encodingSeq,
        IDMEncodeMissingValues* &pEmvSettings);

    static IDMRETURN createObject(
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        IDMData &inputData,
        const IString &databaseName,
        const IString &tablespaceName,
        IDM_OutputType outputType,
        const IString &outputSchemaName,
        const IString &outputDataSourceName,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const EncodingSeq &encodingSeq,
        IDMEncodeMissingValues* &pEmvSettings);

    static IDMRETURN createObject(
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        const IString &serverName,
        const IString &databaseName,
        const IString &tablespaceName,
        const IString &inputSchemaName,
        const IString &inputDataSourceName,
        IDM_OutputType outputType,
        IDMData &outputData,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const EncodingSeq &encodingSeq,
        IDMEncodeMissingValues* &pEmvSettings);

    static IDMRETURN createObject(
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        IDMData &inputData,
        const IString &databaseName,
        const IString &tablespaceName,
        IDM_OutputType outputType,
        IDMData &outputData,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const EncodingSeq &encodingSeq,
        IDMEncodeMissingValues* &pEmvSettings);


    IDMRETURN update(
        const IString &objName,
        const IString &serverName,
        const IString &databaseName,
        const IString &tablespaceName,
        const IString &inputSchemaName,
        const IString &inputDataSourceName,
        IDM_OutputType outputType,
        const IString &outputSchemaName,
        const IString &outputDataSourceName,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const EncodingSeq &encodingSeq);

    IDMRETURN get(
        IString &objName,
        IDMMiningBase *&pMiningBase,
        IString &serverName,
        IString &databaseName,
        IString &tablespaceName,
        IString &inputSchemaName,
        IString &inputDataSourceName,
        IDM_OutputType &outputType,
        IString &outputSchemaName,
        IString &outputDataSourceName,
        IDM_OutputOption &overwriteExistingDataSource,
        IString &outputDataSourceComment,
        EncodingSeq &encodingSeq) const;

    IDMEncodeMissingValues(
        IDMRETURN &rc,
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        const IString &serverName,
        const IString &databaseName,
        const IString &tablespaceName,
        const IString &inputSchemaName,
        const IString &inputDataSourceName,
        IDM_OutputType outputType,
        const IString &outputSchemaName,
        const IString &outputDataSourceName,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const EncodingSeq &encodingSeq);

    IDMEncodeMissingValues(
        IDMRETURN &rc,
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        IDMData &inputData,
        const IString &databaseName,
        const IString &tablespaceName,
        IDM_OutputType outputType,
        const IString &outputSchemaName,
        const IString &outputDataSourceName,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const EncodingSeq &encodingSeq);

    IDMEncodeMissingValues(
        IDMRETURN &rc,
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        const IString &serverName,
        const IString &databaseName,
        const IString &tablespaceName,
        const IString &inputSchemaName,
        const IString &inputDataSourceName,
        IDM_OutputType outputType,
        IDMData &outputData,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const EncodingSeq &encodingSeq);

    IDMEncodeMissingValues(
        IDMRETURN &rc,
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        IDMData &inputData,
        const IString &databaseName,
        const IString &tablespaceName,
        IDM_OutputType outputType,
        IDMData &outputData,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const EncodingSeq &encodingSeq);

    virtual ~IDMEncodeMissingValues();
};

Struct members:


EncodingStruct
Defines the name of the field being encoded, the encoding value, and the name of the new field.

inputFieldName
The name of the field in the input data to be encoded.

encodingValue
The value to use for encoding.

newFieldName
The name of the field in the output data that contains the encoded values.

Data members:

ivEncodingSeq
Specifies the input data field name, encoding value, and new field name for each field being encoded.

Member functions:























IDMEncodeMissingValues(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, const IString &serverName, ...)

Constructs an encode missing values object with the given values.


IDMEncodeMissingValues(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, IDMData &inputData, ...)

Constructs an encode missing values object using an IDMData object. The IDMData object provides the server name, input schema name, and input data name.


IDMEncodeMissingValues(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, const IString &serverName, ... IDMData &outputData ...)

Constructs an encode missing values object with the given values. The IDMData object provides the output schema name and output data name. The server name from the IDMData object must match the server name input parameter.


IDMEncodeMissingValues(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, IDMData &inputData, ... IDMData &outputData ...)

Constructs an encode missing values object using two IDMData objects. The first IDMData object provides the server name, input schema name, and input data name. The second IDMData object provides the output schema name and output data name. The server name from the second IDMData object must match the input server name.


~IDMEncodeMissingValues
The destructor.

IDMEncodeNonvalidValues

This class searches a field in the expected-values data for values that exist in the specified input data field.

Values in the expected-values field are called valid values. Values in the input data field that are not found in the expected-values field of the expected-values data are called nonvalid values.

If an input field value matches one of the valid values, the input field value is copied to the new field. If an input field value is nonvalid, the encoding value is copied into the new field.

The expected-values data must be a relational database table or view.

The output data contains all the fields in the input data, plus a field for the copied or encoded values.


Example:

Input data

Cust#  Name            Address         State
123    John Q. Public  123 Main St.    CA
234    Jane Doe        57 1st St.      AL
345    Jack Green      10 Downing St.  UK

Expected-values data

State Code  State Name
AK          Alaska
AL          Alabama
CA          California
...         ...

Data members               Value
ivInputFieldName           State
ivExpectedValuesFieldName  State Code
ivEncodingValue            00
ivNewFieldName             State2

Output data

Cust#  Name            Address         State  State2
123    John Q. Public  123 Main St.    CA     CA
234    Jane Doe        57 1st St.      AL     AL
345    Jack Green      10 Downing St.  UK     00
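The rule that produces the State2 column can be sketched in standalone C++. This illustrates the described behavior only and is not the Intelligent Miner implementation: encodeNonvalid is a hypothetical helper, and the valid values are assumed to have been read from the expected-values field into a std::set beforehand.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical helper: returns the value written to the new field.
// A valid input value is copied unchanged; a nonvalid one is replaced
// by the encoding value.
std::string encodeNonvalid(const std::string &inputValue,
                           const std::set<std::string> &validValues,
                           const std::string &encodingValue)
{
    return validValues.count(inputValue) > 0 ? inputValue : encodingValue;
}
```

With the valid values AK, AL, and CA and the encoding value 00, the input value CA is copied and the input value UK is replaced by 00, as in the output data above.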

Header file: idmpcevn.hpp

Format:

class IDMEncodeNonvalidValues : public IDMProcessingSettings
{
private:
    IString ivInputFieldName;
    IString ivExpectedValuesSchemaName;
    IString ivExpectedValuesDataSourceName;
    IString ivExpectedValuesFieldName;
    IString ivEncodingValue;
    IString ivNewFieldName;


public:
    static IDMRETURN createObject(
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        const IString &serverName,
        const IString &databaseName,
        const IString &tablespaceName,
        const IString &inputSchemaName,
        const IString &inputDataSourceName,
        IDM_OutputType outputType,
        const IString &outputSchemaName,
        const IString &outputDataSourceName,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const IString &inputFieldName,
        const IString &expectedValuesSchemaName,
        const IString &expectedValuesDataSourceName,
        const IString &expectedValuesFieldName,
        const IString &encodingValue,
        const IString &newFieldName,
        IDMEncodeNonvalidValues* &pEnvSettings);

    static IDMRETURN createObject(
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        IDMData &inputData,
        const IString &databaseName,
        const IString &tablespaceName,
        IDM_OutputType outputType,
        const IString &outputSchemaName,
        const IString &outputDataSourceName,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const IString &inputFieldName,
        const IString &expectedValuesSchemaName,
        const IString &expectedValuesDataSourceName,
        const IString &expectedValuesFieldName,
        const IString &encodingValue,
        const IString &newFieldName,
        IDMEncodeNonvalidValues* &pEnvSettings);

    static IDMRETURN createObject(
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        const IString &serverName,
        const IString &databaseName,
        const IString &tablespaceName,
        const IString &inputSchemaName,
        const IString &inputDataSourceName,
        IDM_OutputType outputType,
        IDMData &outputData,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const IString &inputFieldName,
        const IString &expectedValuesSchemaName,
        const IString &expectedValuesDataSourceName,
        const IString &expectedValuesFieldName,
        const IString &encodingValue,
        const IString &newFieldName,
        IDMEncodeNonvalidValues* &pEnvSettings);

    static IDMRETURN createObject(
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        IDMData &inputData,
        const IString &databaseName,
        const IString &tablespaceName,
        IDM_OutputType outputType,
        IDMData &outputData,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const IString &inputFieldName,
        const IString &expectedValuesSchemaName,
        const IString &expectedValuesDataSourceName,
        const IString &expectedValuesFieldName,
        const IString &encodingValue,
        const IString &newFieldName,
        IDMEncodeNonvalidValues* &pEnvSettings);


    IDMRETURN update(
        const IString &objName,
        const IString &serverName,
        const IString &databaseName,
        const IString &tablespaceName,
        const IString &inputSchemaName,
        const IString &inputDataSourceName,
        IDM_OutputType outputType,
        const IString &outputSchemaName,
        const IString &outputDataSourceName,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const IString &inputFieldName,
        const IString &expectedValuesSchemaName,
        const IString &expectedValuesDataSourceName,
        const IString &expectedValuesFieldName,
        const IString &encodingValue,
        const IString &newFieldName);

    IDMRETURN setExpectedValuesData(IDMData &expectedValuesData);

    IDMRETURN get(
        IString &objName,
        IDMMiningBase *&pMiningBase,
        IString &serverName,
        IString &databaseName,
        IString &tablespaceName,
        IString &inputSchemaName,
        IString &inputDataSourceName,
        IDM_OutputType &outputType,
        IString &outputSchemaName,
        IString &outputDataSourceName,
        IDM_OutputOption &overwriteExistingDataSource,
        IString &outputDataSourceComment,
        IString &inputFieldName,
        IString &expectedValuesSchemaName,
        IString &expectedValuesDataSourceName,
        IString &expectedValuesFieldName,
        IString &encodingValue,
        IString &newFieldName) const;

    IDMEncodeNonvalidValues(
        IDMRETURN &rc,
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        const IString &serverName,
        const IString &databaseName,
        const IString &tablespaceName,
        const IString &inputSchemaName,
        const IString &inputDataSourceName,
        IDM_OutputType outputType,
        const IString &outputSchemaName,
        const IString &outputDataSourceName,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const IString &inputFieldName,
        const IString &expectedValuesSchemaName,
        const IString &expectedValuesDataSourceName,
        const IString &expectedValuesFieldName,
        const IString &encodingValue,
        const IString &newFieldName);

    IDMEncodeNonvalidValues(
        IDMRETURN &rc,
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        IDMData &inputData,
        const IString &databaseName,
        const IString &tablespaceName,
        IDM_OutputType outputType,
        const IString &outputSchemaName,
        const IString &outputDataSourceName,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const IString &inputFieldName,
        const IString &expectedValuesSchemaName,
        const IString &expectedValuesDataSourceName,
        const IString &expectedValuesFieldName,
        const IString &encodingValue,
        const IString &newFieldName);

    IDMEncodeNonvalidValues(
        IDMRETURN &rc,
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        const IString &serverName,
        const IString &databaseName,
        const IString &tablespaceName,
        const IString &inputSchemaName,
        const IString &inputDataSourceName,
        IDM_OutputType outputType,
        IDMData &outputData,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const IString &inputFieldName,
        const IString &expectedValuesSchemaName,
        const IString &expectedValuesDataSourceName,
        const IString &expectedValuesFieldName,
        const IString &encodingValue,
        const IString &newFieldName);

    IDMEncodeNonvalidValues(
        IDMRETURN &rc,
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        IDMData &inputData,
        const IString &databaseName,
        const IString &tablespaceName,
        IDM_OutputType outputType,
        IDMData &outputData,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const IString &inputFieldName,
        const IString &expectedValuesSchemaName,
        const IString &expectedValuesDataSourceName,
        const IString &expectedValuesFieldName,
        const IString &encodingValue,
        const IString &newFieldName);

    virtual ~IDMEncodeNonvalidValues();
};

Data members:

ivInputFieldName
The name of the input data field. Values in this field are compared with the valid values in the expected-values field.

ivExpectedValuesSchemaName
The name of the schema to which the expected-values data is assigned.

ivExpectedValuesDataSourceName
The name of the table or view that contains the valid values.

ivExpectedValuesFieldName
The name of the field in the expected-values data that contains the valid values.

ivEncodingValue
The encoding value used when the input data field contains a nonvalid value.

ivNewFieldName
The name of the new field in the output data. If no name is specified, the name of the expected-values field in the expected-values data is used. If the name used for this field matches the name of a field in the input data, a unique name, generated from the name of the expected-values field, is used.

Member functions:






















setExpectedValuesData
Sets the expected-values data using an IDMData object. The IDMData object must reference a relational database table or view.



IDMEncodeNonvalidValues(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, const IString &serverName, ...)

Constructs an encode nonvalid values object with the given values.


IDMEncodeNonvalidValues(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, IDMData &inputData, ...)

Constructs an encode nonvalid values object using an IDMData object. The IDMData object provides the server name, input schema name, and input data name.


IDMEncodeNonvalidValues(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, const IString &serverName, ... IDMData &outputData ...)

Constructs an encode nonvalid values object with the given values. The IDMData object provides the output schema name and output data name. The server name from the IDMData object must match the server name input parameter.


IDMEncodeNonvalidValues(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, IDMData &inputData, ... IDMData &outputData ...)

Constructs an encode nonvalid values object using two IDMData objects. The first IDMData object provides the server name, input schema name, and input data name. The second IDMData object provides the output schema name and output data name. The server name from the second IDMData object must match the input server name.


~IDMEncodeNonvalidValues
The destructor.

IDMFilterFields

This class keeps or removes fields specified in the ivKeepOmitFieldNames parameter, based on the value in the ivKeepFieldType parameter.

The output data contains either:
- Only the fields in the list
- All fields except those in the list


Example:

Input data

Student#  Course#  Grade
321       p-101    90
101       a-099    88
101       a-440    96
003       m-101    89
003       p-101    77
321       a-440    97

Data members          Value
ivKeepFieldType       IDM_OMIT_FIELD
ivKeepOmitFieldNames  Course#

Output data

Student#  Grade
321       90
101       88
101       96
003       89
003       77
321       97
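The keep/omit selection can be sketched in standalone C++. This is an illustration of the described behavior, not the Intelligent Miner implementation: KeepFieldType and filterFields are hypothetical stand-ins for IDM_KeepFieldType and the processing function, and preserving the input field order in the output is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for IDM_KEEP_FIELD / IDM_OMIT_FIELD.
enum class KeepFieldType { Keep, Omit };

// Returns the names of the fields that appear in the output data.
// Keep: only listed fields survive. Omit: all but the listed fields survive.
// Assumption: the output keeps the field order of the input.
std::vector<std::string> filterFields(const std::vector<std::string> &inputFields,
                                      KeepFieldType type,
                                      const std::vector<std::string> &names)
{
    std::vector<std::string> out;
    for (const std::string &field : inputFields) {
        const bool listed =
            std::find(names.begin(), names.end(), field) != names.end();
        if (listed == (type == KeepFieldType::Keep)) {
            out.push_back(field);
        }
    }
    return out;
}
```

Omitting Course# from Student#, Course#, Grade leaves Student# and Grade, which matches the output data of the example.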

Header file: idmpccf.hpp

Format:

typedef ISequence<IString> KeepOmitFieldSeq;

class IDMFilterFields : public IDMProcessingSettings
{
private:
    IDM_KeepFieldType ivKeepFieldType;
    KeepOmitFieldSeq ivKeepOmitFieldNames;


public:
    static IDMRETURN createObject(
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        const IString &serverName,
        const IString &databaseName,
        const IString &tablespaceName,
        const IString &inputSchemaName,
        const IString &inputDataSourceName,
        IDM_OutputType outputType,
        const IString &outputSchemaName,
        const IString &outputDataSourceName,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        IDM_KeepFieldType keepFieldType,
        const KeepOmitFieldSeq &keepOmitFieldNames,
        IDMFilterFields* &pFfSettings);

    static IDMRETURN createObject(
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        IDMData &inputData,
        const IString &databaseName,
        const IString &tablespaceName,
        IDM_OutputType outputType,
        const IString &outputSchemaName,
        const IString &outputDataSourceName,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        IDM_KeepFieldType keepFieldType,
        const KeepOmitFieldSeq &keepOmitFieldNames,
        IDMFilterFields* &pFfSettings);

    static IDMRETURN createObject(
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        const IString &serverName,
        const IString &databaseName,
        const IString &tablespaceName,
        const IString &inputSchemaName,
        const IString &inputDataSourceName,
        IDM_OutputType outputType,
        IDMData &outputData,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        IDM_KeepFieldType keepFieldType,
        const KeepOmitFieldSeq &keepOmitFieldNames,
        IDMFilterFields* &pFfSettings);

    static IDMRETURN createObject(
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        IDMData &inputData,
        const IString &databaseName,
        const IString &tablespaceName,
        IDM_OutputType outputType,
        IDMData &outputData,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        IDM_KeepFieldType keepFieldType,
        const KeepOmitFieldSeq &keepOmitFieldNames,
        IDMFilterFields* &pFfSettings);


    IDMRETURN update(
        const IString &objName,
        const IString &serverName,
        const IString &databaseName,
        const IString &tablespaceName,
        const IString &inputSchemaName,
        const IString &inputDataSourceName,
        IDM_OutputType outputType,
        const IString &outputSchemaName,
        const IString &outputDataSourceName,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        IDM_KeepFieldType keepFieldType,
        const KeepOmitFieldSeq &keepOmitFieldNames);

    IDMRETURN get(
        IString &objName,
        IDMMiningBase *&pMiningBase,
        IString &serverName,
        IString &databaseName,
        IString &tablespaceName,
        IString &inputSchemaName,
        IString &inputDataSourceName,
        IDM_OutputType &outputType,
        IString &outputSchemaName,
        IString &outputDataSourceName,
        IDM_OutputOption &overwriteExistingDataSource,
        IString &outputDataSourceComment,
        IDM_KeepFieldType &keepFieldType,
        KeepOmitFieldSeq &keepOmitFieldNames) const;

    IDMFilterFields(
        IDMRETURN &rc,
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        const IString &serverName,
        const IString &databaseName,
        const IString &tablespaceName,
        const IString &inputSchemaName,
        const IString &inputDataSourceName,
        IDM_OutputType outputType,
        const IString &outputSchemaName,
        const IString &outputDataSourceName,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        IDM_KeepFieldType keepFieldType,
        const KeepOmitFieldSeq &keepOmitFieldNames);

    IDMFilterFields(
        IDMRETURN &rc,
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        IDMData &inputData,
        const IString &databaseName,
        const IString &tablespaceName,
        IDM_OutputType outputType,
        const IString &outputSchemaName,
        const IString &outputDataSourceName,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        IDM_KeepFieldType keepFieldType,
        const KeepOmitFieldSeq &keepOmitFieldNames);

    IDMFilterFields(
        IDMRETURN &rc,
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        const IString &serverName,
        const IString &databaseName,
        const IString &tablespaceName,
        const IString &inputSchemaName,
        const IString &inputDataSourceName,
        IDM_OutputType outputType,
        IDMData &outputData,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        IDM_KeepFieldType keepFieldType,
        const KeepOmitFieldSeq &keepOmitFieldNames);

    IDMFilterFields(
        IDMRETURN &rc,
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        IDMData &inputData,
        const IString &databaseName,
        const IString &tablespaceName,
        IDM_OutputType outputType,
        IDMData &outputData,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        IDM_KeepFieldType keepFieldType,
        const KeepOmitFieldSeq &keepOmitFieldNames);

    virtual ~IDMFilterFields();
};

Data members:

ivKeepFieldType
The keep/omit flag.

Set the flag to IDM_KEEP_FIELD to include only the fields in the ivKeepOmitFieldNames list. Set the flag to IDM_OMIT_FIELD to keep all the fields except those in the ivKeepOmitFieldNames list.

ivKeepOmitFieldNames
Lists the field names to either keep or omit.

Member functions:























IDMFilterFields(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, const IString &serverName, ...)

Constructs a filter fields object with the given values.


IDMFilterFields(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, IDMData &inputData, ...)

Constructs a filter fields object using an IDMData object. The IDMData object provides the server name, input schema name, and input data name.


IDMFilterFields(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, const IString &serverName, ... IDMData &outputData ...)

Constructs a filter fields object with the given values. The IDMData object provides the output schema name and output data name. The server name from the IDMData object must match the server name input parameter.


IDMFilterFields(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, IDMData &inputData, ... IDMData &outputData ...)

Constructs a filter fields object using two IDMData objects. The first IDMData object provides the server name, input schema name, and input data name. The second IDMData object provides the output schema name and output data name. The server name from the second IDMData object must match the input server name.


~IDMFilterFields
The destructor.

IDMFilterRecords

This class filters out records that do not meet the conditions specified in the filtering condition parameter. This class can also be used to discard the records that contain nonvalid values or that do not fall within a range.

The output data contains all the fields in the input data.


Example:

Input data

Name       Age  City
P. Smith   15   New York
A. Dupont  34   Paris

Data member           Value
ivFilteringCondition  "Age" >= 18

Output data

The age for P. Smith does not meet the filtering condition, so the record containing P. Smith is not included in the output data:

Name       Age  City
A. Dupont  34   Paris
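The example can be reproduced with a short standalone C++ sketch. It only illustrates the filtering semantics and is not the Intelligent Miner implementation: Person, filterRecords, and exampleOutput are hypothetical names, and the filtering condition, which the class takes as a string, is expressed here as a C++ predicate for simplicity.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical record type matching the columns of the example.
struct Person {
    std::string name;
    int age;
    std::string city;
};

// Keeps only the records for which the condition holds.
std::vector<Person> filterRecords(
    const std::vector<Person> &input,
    const std::function<bool(const Person &)> &condition)
{
    std::vector<Person> out;
    for (const Person &p : input) {
        if (condition(p)) {
            out.push_back(p);
        }
    }
    return out;
}

// The example data filtered with the condition "Age" >= 18.
std::vector<Person> exampleOutput()
{
    const std::vector<Person> input = {{"P. Smith", 15, "New York"},
                                       {"A. Dupont", 34, "Paris"}};
    return filterRecords(input, [](const Person &p) { return p.age >= 18; });
}
```

Only the record for A. Dupont satisfies the condition, so it is the single record in the result.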


Header file: idmpcrf.hpp

Format:

class IDMFilterRecords : public IDMProcessingSettings
{
private:
    IString ivFilteringCondition;


public:
    static IDMRETURN createObject(
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        const IString &serverName,
        const IString &databaseName,
        const IString &tablespaceName,
        const IString &inputSchemaName,
        const IString &inputDataSourceName,
        IDM_OutputType outputType,
        const IString &outputSchemaName,
        const IString &outputDataSourceName,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const IString &filteringCondition,
        IDMFilterRecords* &pFrSettings);

    static IDMRETURN createObject(
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        IDMData &inputData,
        const IString &databaseName,
        const IString &tablespaceName,
        IDM_OutputType outputType,
        const IString &outputSchemaName,
        const IString &outputDataSourceName,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const IString &filteringCondition,
        IDMFilterRecords* &pFrSettings);

    static IDMRETURN createObject(
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        const IString &serverName,
        const IString &databaseName,
        const IString &tablespaceName,
        const IString &inputSchemaName,
        const IString &inputDataSourceName,
        IDM_OutputType outputType,
        IDMData &outputData,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const IString &filteringCondition,
        IDMFilterRecords* &pFrSettings);

    static IDMRETURN createObject(
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        IDMData &inputData,
        const IString &databaseName,
        const IString &tablespaceName,
        IDM_OutputType outputType,
        IDMData &outputData,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const IString &filteringCondition,
        IDMFilterRecords* &pFrSettings);


    IDMRETURN update(
        const IString &objName,
        const IString &serverName,
        const IString &databaseName,
        const IString &tablespaceName,
        const IString &inputSchemaName,
        const IString &inputDataSourceName,
        IDM_OutputType outputType,
        const IString &outputSchemaName,
        const IString &outputDataSourceName,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const IString &filteringCondition);

    IDMRETURN get(
        IString &objName,
        IDMMiningBase *&pMiningBase,
        IString &serverName,
        IString &databaseName,
        IString &tablespaceName,
        IString &inputSchemaName,
        IString &inputDataSourceName,
        IDM_OutputType &outputType,
        IString &outputSchemaName,
        IString &outputDataSourceName,
        IDM_OutputOption &overwriteExistingDataSource,
        IString &outputDataSourceComment,
        IString &filteringCondition) const;

    IDMFilterRecords(
        IDMRETURN &rc,
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        const IString &serverName,
        const IString &databaseName,
        const IString &tablespaceName,
        const IString &inputSchemaName,
        const IString &inputDataSourceName,
        IDM_OutputType outputType,
        const IString &outputSchemaName,
        const IString &outputDataSourceName,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const IString &filteringCondition);

    IDMFilterRecords(
        IDMRETURN &rc,
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        IDMData &inputData,
        const IString &databaseName,
        const IString &tablespaceName,
        IDM_OutputType outputType,
        const IString &outputSchemaName,
        const IString &outputDataSourceName,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const IString &filteringCondition);

    IDMFilterRecords(
        IDMRETURN &rc,
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        const IString &serverName,
        const IString &databaseName,
        const IString &tablespaceName,
        const IString &inputSchemaName,
        const IString &inputDataSourceName,
        IDM_OutputType outputType,
        IDMData &outputData,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const IString &filteringCondition);

    IDMFilterRecords(
        IDMRETURN &rc,
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        IDMData &inputData,
        const IString &databaseName,
        const IString &tablespaceName,
        IDM_OutputType outputType,
        IDMData &outputData,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const IString &filteringCondition);

    virtual ~IDMFilterRecords();
};

Data members:

ivFilteringCondition
The condition used to filter the records.

Member functions:























IDMFilterRecords(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, const IString &serverName, ...)

Constructs a filter records object with the given values.


IDMFilterRecords(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, IDMData &inputData, ...)

Constructs a filter records object using an IDMData object. The IDMData object provides the server name, input schema name, and input data name.


IDMFilterRecords(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, const IString &serverName, ... IDMData &outputData ...)

Constructs a filter records object with the given values. The IDMData object provides the output schema name and output data name. The server name from the IDMData object must match the server name input parameter.


IDMFilterRecords(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, IDMData &inputData, ... IDMData &outputData ...)

Constructs a filter records object using two IDMData objects. The first IDMData object provides the server name, input schema name, and input data name. The second IDMData object provides the output schema name and output data name. The server name from the second IDMData object must match the server name from the first IDMData object.


~IDMFilterRecords
The destructor.

IDMFilterRecordsUsingAValueSet

This class keeps or discards records in the input data. If an input data field value matches a value in the matching-values table, the record is either kept or discarded, based on the value of the keep/omit flag.

The matching-values table must be a relational database table or view.




Example:

Input data




345 Jack Green 10 Downing St. UK

Matching-values data

State Code  State Name
AK          Alaska
AL          Alabama
CA          California
...         ...

Data members               Value
ivKeepRecordType           IDM_KEEP_RECORD
ivInputFieldName           State
ivMatchingValuesFieldName  State Code

Output data

The state for Jack Green does not match a value in the matching-values data, so the record containing Jack Green is not included in the output data:
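The keep/omit decision for a single record can be sketched in standalone C++. This is an illustration only, not the Intelligent Miner implementation: KeepRecordType and keepRecord are hypothetical names, the Omit case is an assumption made by analogy with IDM_OMIT_FIELD (this section names only IDM_KEEP_RECORD), and the sketch assumes that the State field is compared against the state codes of the matching-values data.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for IDM_KeepRecordType. Only IDM_KEEP_RECORD is
// named in this section; the Omit counterpart is assumed.
enum class KeepRecordType { Keep, Omit };

// Returns true if a record whose input field holds `fieldValue`
// appears in the output data.
bool keepRecord(const std::string &fieldValue,
                const std::set<std::string> &matchingValues,
                KeepRecordType type)
{
    const bool matches = matchingValues.count(fieldValue) > 0;
    return type == KeepRecordType::Keep ? matches : !matches;
}
```

With the matching values AK, AL, and CA and the Keep setting, a record with State UK is dropped, which corresponds to the Jack Green record in the example.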




Header file: idmpcfmr.hpp

Format:

class IDMFilterRecordsUsingAValueSet : public IDMProcessingSettings
{
private:
    IDM_KeepRecordType ivKeepRecordType;
    IString ivInputFieldName;
    IString ivMatchingValuesSchemaName;
    IString ivMatchingValuesDataSourceName;
    IString ivMatchingValuesFieldName;


const IString &objName,IDMMiningBase *const pMiningBase,const IString &serverName,const IString &databaseName,const IString &tablespaceName,const IString &inputSchemaName,const IString &inputDataSourceName,IDM_OutputType outputType,const IString &outputSchemaName,const IString &outputDataSourceName,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,IDM_KeepRecordType keepRecordType,const IString &inputFieldName,const IString &matchingValuesSchemaName,const IString &matchingValuesDataSourceName,const IString &matchingValuesFieldName,IDMFilterRecordsUsingAValueSet* &pFruSettings );


const IString &objName,IDMMiningBase *const pMiningBase,IDMData &inputData,const IString &databaseName,const IString &tablespaceName,IDM_OutputType outputType,const IString &outputSchemaName,const IString &outputDataSourceName,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,IDM_KeepRecordType keepRecordType,const IString &inputFieldName,const IString &matchingValuesSchemaName,const IString &matchingValuesDataSourceName,const IString &matchingValuesFieldName,IDMFilterRecordsUsingAValueSet* &pFruSettings );


const IString &objName,IDMMiningBase *const pMiningBase,const IString &serverName,const IString &databaseName,const IString &tablespaceName,const IString &inputSchemaName,const IString &inputDataSourceName,IDM_OutputType outputType,IDMData &outputData,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,IDM_KeepRecordType keepRecordType,const IString &inputFieldName,const IString &matchingValuesSchemaName,const IString &matchingValuesDataSourceName,const IString &matchingValuesFieldName,IDMFilterRecordsUsingAValueSet* &pFruSettings );


const IString &objName,IDMMiningBase *const pMiningBase,IDMData &inputData,const IString &databaseName,const IString &tablespaceName,IDM_OutputType outputType,IDMData &outputData,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,IDM_KeepRecordType keepRecordType,const IString &inputFieldName,const IString &matchingValuesSchemaName,const IString &matchingValuesDataSourceName,const IString &matchingValuesFieldName,IDMFilterRecordsUsingAValueSet* &pFruSettings );



IDMRETURN update(const IString &objName,const IString &serverName,const IString &databaseName,const IString &tablespaceName,const IString &inputSchemaName,const IString &inputDataSourceName,IDM_OutputType outputType,const IString &outputSchemaName,const IString &outputDataSourceName,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,IDM_KeepRecordType keepRecordType,const IString &inputFieldName,const IString &matchingValuesSchemaName,const IString &matchingValuesDataSourceName,const IString &matchingValuesFieldName);

IDMRETURN setMatchingValuesData(IDMData &matchingValuesData);

IDMRETURN get(IString &objName,IDMMiningBase *&pMiningBase,IString &serverName,IString &databaseName,IString &tablespaceName,IString &inputSchemaName,IString &inputDataSourceName,IDM_OutputType &outputType,IString &outputSchemaName,IString &outputDataSourceName,IDM_OutputOption &overwriteExistingDataSource,IString &outputDataSourceComment,IDM_KeepRecordType &keepRecordType,IString &inputFieldName,IString &matchingValuesSchemaName,IString &matchingValuesDataSourceName,IString &matchingValuesFieldName) const;

IDMFilterRecordsUsingAValueSet(IDMRETURN &rc,const IString &objName,IDMMiningBase *const pMiningBase,const IString &serverName,const IString &databaseName,const IString &tablespaceName,const IString &inputSchemaName,const IString &inputDataSourceName,IDM_OutputType outputType,const IString &outputSchemaName,const IString &outputDataSourceName,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,IDM_KeepRecordType keepRecordType,const IString &inputFieldName,const IString &matchingValuesSchemaName,const IString &matchingValuesDataSourceName,const IString &matchingValuesFieldName);

IDMFilterRecordsUsingAValueSet(IDMRETURN &rc,const IString &objName,IDMMiningBase *const pMiningBase,IDMData &inputData,const IString &databaseName,const IString &tablespaceName,IDM_OutputType outputType,const IString &outputSchemaName,const IString &outputDataSourceName,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,IDM_KeepRecordType keepRecordType,const IString &inputFieldName,const IString &matchingValuesSchemaName,const IString &matchingValuesDataSourceName,const IString &matchingValuesFieldName);

IDMFilterRecordsUsingAValueSet(IDMRETURN &rc,const IString &objName,IDMMiningBase *const pMiningBase,const IString &serverName,const IString &databaseName,const IString &tablespaceName,const IString &inputSchemaName,const IString &inputDataSourceName,IDM_OutputType outputType,IDMData &outputData,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,IDM_KeepRecordType keepRecordType,const IString &inputFieldName,const IString &matchingValuesSchemaName,const IString &matchingValuesDataSourceName,const IString &matchingValuesFieldName);

IDMFilterRecordsUsingAValueSet(IDMRETURN &rc,const IString &objName,IDMMiningBase *const pMiningBase,IDMData &inputData,const IString &databaseName,const IString &tablespaceName,IDM_OutputType outputType,IDMData &outputData,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,IDM_KeepRecordType keepRecordType,const IString &inputFieldName,const IString &matchingValuesSchemaName,const IString &matchingValuesDataSourceName,const IString &matchingValuesFieldName);

virtual ~IDMFilterRecordsUsingAValueSet();
};

Data members:

ivKeepRecordType
The keep/omit records flag.

Set the flag to IDM_KEEP_RECORD to keep records. Set the flag to IDM_OMIT_RECORD to remove records.

ivInputFieldName
The name of the field in the input data to compare to the matching-values field in the matching-values table.

ivMatchingValuesSchemaName
The name of the schema to which the matching-values data is assigned.

ivMatchingValuesDataSourceName
The name of the matching-values data.

ivMatchingValuesFieldName
The name of the matching-values field.


Member functions:





















setMatchingValuesData
Sets the matching-values data using an IDMData object. The IDMData object must reference a relational database table or view.




IDMFilterRecordsUsingAValueSet(IDMRETURN &rc, const IString &objName,IDMMiningBase *const pMiningBase, const IString &serverName, ...)

Constructs a filter records using a value set object with the given values.


IDMFilterRecordsUsingAValueSet(IDMRETURN &rc, const IString &objName,IDMMiningBase *const pMiningBase, IDMData &inputData, ...)

Constructs a filter records using a value set object from an IDMData object. The IDMData object provides the server name, input schema name, and input data name.


IDMFilterRecordsUsingAValueSet(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, const IString &serverName, ... IDMData &outputData ...)

Constructs a filter records using a value set object with the given values. The IDMData object provides the output schema name and output data name. The server name from the IDMData object must match the server name input parameter.


IDMFilterRecordsUsingAValueSet(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, IDMData &inputData, ... IDMData &outputData ...)

Constructs a filter records using a value set object from two IDMData objects. The first IDMData object provides the server name, input schema name, and input data name. The second IDMData object provides the output schema name and output data name. The server name from the second IDMData object must match the server name from the first IDMData object.


~IDMFilterRecordsUsingAValueSet
The destructor.

IDMGetRandomSample

This class reduces the input data to a smaller set of records, called a sample. The size of the sample is expressed as a percentage of the number of records in the input data.

A random selection method is used to select records for the sample. Therefore, the number of records in the sample might not exactly match the specified percentage.

If you are using the DB2 Common Server for AIX, the sample can be a table or a view, depending on the value specified in the ivOutputType entry.

If your database server does not support the RAND function, for example, DB2 for OS/390, DB2PE, DataJoiner, or DB2 for AS/400, this function produces output data in the form of a table only.
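To see why a random selection might not return exactly the requested percentage, consider the following sketch in standard C++. It assumes that each record is selected independently with a probability equal to the sample percentage. That is an assumption made for illustration only; the manual does not describe the selection method beyond its reliance on a random function. The `sampleIndices` function and its `seed` parameter are hypothetical.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Selects each record independently with probability percent/100 and
// returns the positions of the selected records in ascending order.
// Because every record is decided on its own, the size of the result
// varies around recordCount * percent / 100 rather than hitting it exactly.
std::vector<std::size_t> sampleIndices(std::size_t recordCount,
                                       int percent,
                                       unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> roll(0, 99);
    std::vector<std::size_t> picked;
    for (std::size_t i = 0; i < recordCount; ++i) {
        if (roll(rng) < percent) {
            picked.push_back(i);
        }
    }
    return picked;
}
```

Under this model, a 25 percent sample of eight records yields two records on average, but any single run can return more or fewer.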




Example:

Input data

Stud#    Course    Grade
123      German    78
234      German    79
345      German    80
456      German    68
567      German    69
678      German    70
789      German    91
890      German    90

Data member     Value
ivSampleSize    25

Output data

Stud#    Course    Grade
345      German    80
890      German    90

Header file: idmpcrs.hpp

Format:

class IDMGetRandomSample : public IDMProcessingSettings
{
private:
  IDMINTEGER ivSampleSize;


const IString &objName,IDMMiningBase *const pMiningBase,const IString &serverName,const IString &databaseName,const IString &tablespaceName,const IString &inputSchemaName,const IString &inputDataSourceName,IDM_OutputType outputType,const IString &outputSchemaName,const IString &outputDataSourceName,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,IDMINTEGER sampleSize,IDMGetRandomSample* &pGrsSettings );



const IString &objName,IDMMiningBase *const pMiningBase,IDMData &inputData,const IString &databaseName,const IString &tablespaceName,IDM_OutputType outputType,const IString &outputSchemaName,const IString &outputDataSourceName,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,IDMINTEGER sampleSize,IDMGetRandomSample* &pGrsSettings );


const IString &objName,IDMMiningBase *const pMiningBase,const IString &serverName,const IString &databaseName,const IString &tablespaceName,const IString &inputSchemaName,const IString &inputDataSourceName,IDM_OutputType outputType,IDMData &outputData,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,IDMINTEGER sampleSize,IDMGetRandomSample* &pGrsSettings );


const IString &objName,IDMMiningBase *const pMiningBase,IDMData &inputData,const IString &databaseName,const IString &tablespaceName,IDM_OutputType outputType,IDMData &outputData,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,IDMINTEGER sampleSize,IDMGetRandomSample* &pGrsSettings );


IDMRETURN update(const IString &objName,const IString &serverName,const IString &databaseName,const IString &tablespaceName,const IString &inputSchemaName,const IString &inputDataSourceName,IDM_OutputType outputType,const IString &outputSchemaName,const IString &outputDataSourceName,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,IDMINTEGER sampleSize);

IDMRETURN get(IString &objName,IDMMiningBase *&pMiningBase,IString &serverName,IString &databaseName,IString &tablespaceName,IString &inputSchemaName,IString &inputDataSourceName,IDM_OutputType &outputType,IString &outputSchemaName,IString &outputDataSourceName,IDM_OutputOption &overwriteExistingDataSource,IString &outputDataSourceComment,IDMINTEGER &sampleSize) const;

friend ostream &operator<<(ostream &,const IDMGetRandomSample &);

IDMGetRandomSample(IDMRETURN &rc,const IString &objName,IDMMiningBase *const pMiningBase,const IString &serverName,const IString &databaseName,const IString &tablespaceName,const IString &inputSchemaName,const IString &inputDataSourceName,IDM_OutputType outputType,const IString &outputSchemaName,const IString &outputDataSourceName,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,IDMINTEGER sampleSize);

IDMGetRandomSample(IDMRETURN &rc,const IString &objName,IDMMiningBase *const pMiningBase,IDMData &inputData,const IString &databaseName,const IString &tablespaceName,IDM_OutputType outputType,const IString &outputSchemaName,const IString &outputDataSourceName,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,IDMINTEGER sampleSize);

IDMGetRandomSample(IDMRETURN &rc,const IString &objName,IDMMiningBase *const pMiningBase,const IString &serverName,const IString &databaseName,const IString &tablespaceName,const IString &inputSchemaName,const IString &inputDataSourceName,IDM_OutputType outputType,IDMData &outputData,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,IDMINTEGER sampleSize);

IDMGetRandomSample(IDMRETURN &rc,const IString &objName,IDMMiningBase *const pMiningBase,IDMData &inputData,const IString &databaseName,const IString &tablespaceName,IDM_OutputType outputType,IDMData &outputData,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,IDMINTEGER sampleSize);

virtual ~IDMGetRandomSample();
};

Data members:


ivSampleSize
The size of the sample, expressed as a percentage (1 to 99) of the input data.

Member functions:






















IDMGetRandomSample(IDMRETURN &rc, const IString &objName,IDMMiningBase *const pMiningBase, const IString &serverName, ...)

Constructs a get random sample object with the given values.



IDMGetRandomSample(IDMRETURN &rc, const IString &objName,IDMMiningBase *const pMiningBase, IDMData &inputData, ...)

Constructs a get random sample object using an IDMData object. The IDMData object provides the server name, input schema name, and input data name.


IDMGetRandomSample(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, const IString &serverName, ... IDMData &outputData ...)

Constructs a get random sample object with the given values. The IDMData object provides the output schema name and output data name. The server name from the IDMData object must match the server name input parameter.


IDMGetRandomSample(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, IDMData &inputData, ... IDMData &outputData ...)

Constructs a get random sample object using two IDMData objects. The first IDMData object provides the server name, input schema name, and input data name. The second IDMData object provides the output schema name and output data name. The server name from the second IDMData object must match the server name from the first IDMData object.


~IDMGetRandomSample
The destructor.

IDMGroupRecords

This class creates a single record from a group of records. If the input data contains more than one group of records, a single record is created for each group. Grouping by multiple fields is supported.

For example, if the grouping fields list contains only the DEPT field, the output data contains one record per department. This is an example of grouping by a single field.

If the grouping fields list contains the DEPT and JOB fields, the output data contains one record per unique combination of DEPT and JOB. This is an example of grouping by multiple fields.

The output data contains all of the grouping fields, in the same order that they are specified in the grouping field list. Grouping fields in the output data retain their original names. The output data also contains a field for each ivAggregationSeq entry.

You can use the Filter Records and Filter Fields functions in subsequent steps to filter the output data.
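The effect of grouping by a single field with an averaging aggregation can be sketched in standard C++ as follows. This is an in-memory illustration only; the product performs the grouping in the database. The `GradeRecord` structure, its `course` member name, and the `averageByStudent` function are hypothetical, and the use of integer division is an assumption chosen to produce whole-number averages.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record layout: a student identifier, a course code,
// and a grade.
struct GradeRecord {
    std::string student;
    std::string course;
    int grade;
};

// Groups the records by student and returns one average grade per
// student. The result has exactly one entry per distinct student,
// mirroring "one output record per group".
std::map<std::string, int> averageByStudent(const std::vector<GradeRecord>& input) {
    std::map<std::string, std::pair<long, int>> sums;  // student -> (sum, count)
    for (const GradeRecord& r : input) {
        std::pair<long, int>& acc = sums[r.student];
        acc.first += r.grade;
        acc.second += 1;
    }
    std::map<std::string, int> result;
    for (const auto& entry : sums) {
        // Integer division: the fractional part of the average is discarded.
        result[entry.first] =
            static_cast<int>(entry.second.first / entry.second.second);
    }
    return result;
}
```

For a student with grades 90 and 97, the function returns 93, because 187 divided by 2 is truncated to a whole number.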


Example:

Input data


321    p-101    90
101    a-099    88
101    a-440    96
003    m-101    89
003    p-101    77
321    a-440    97

Data members            Value
ivAggregationSeq        AVG("Grade"),Grade
ivGroupingFieldNames    Student#

Output data

Student#    Grade
321         93
101         92
003         83

Header file: idmpcgrp.hpp

Format:

struct AggregationPairStruct
{
  IString aggregationExpression;
  IString newFieldName;
};

typedef ISequence<AggregationPairStruct> AggregationSeq;
typedef ISequence<IString> GroupingFieldSeq;

class IDMGroupRecords : public IDMProcessingSettings
{
private:
  AggregationSeq ivAggregationSeq;
  GroupingFieldSeq ivGroupingFieldNames;


const IString &objName,IDMMiningBase *const pMiningBase,const IString &serverName,const IString &databaseName,const IString &tablespaceName,const IString &inputSchemaName,const IString &inputDataSourceName,IDM_OutputType outputType,const IString &outputSchemaName,const IString &outputDataSourceName,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,const AggregationSeq &aggregationSeq,const GroupingFieldSeq &groupingFieldNames,IDMGroupRecords* &pGrpSettings );


const IString &objName,IDMMiningBase *const pMiningBase,IDMData &inputData,const IString &databaseName,const IString &tablespaceName,IDM_OutputType outputType,const IString &outputSchemaName,const IString &outputDataSourceName,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,const AggregationSeq &aggregationSeq,const GroupingFieldSeq &groupingFieldNames,IDMGroupRecords* &pGrpSettings );


const IString &objName,IDMMiningBase *const pMiningBase,const IString &serverName,const IString &databaseName,const IString &tablespaceName,const IString &inputSchemaName,const IString &inputDataSourceName,IDM_OutputType outputType,IDMData &outputData,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,const AggregationSeq &aggregationSeq,const GroupingFieldSeq &groupingFieldNames,IDMGroupRecords* &pGrpSettings );


const IString &objName,IDMMiningBase *const pMiningBase,IDMData &inputData,const IString &databaseName,const IString &tablespaceName,IDM_OutputType outputType,IDMData &outputData,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,const AggregationSeq &aggregationSeq,const GroupingFieldSeq &groupingFieldNames,IDMGroupRecords* &pGrpSettings );


IDMRETURN update(const IString &objName,const IString &serverName,const IString &databaseName,const IString &tablespaceName,const IString &inputSchemaName,const IString &inputDataSourceName,IDM_OutputType outputType,const IString &outputSchemaName,const IString &outputDataSourceName,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,const AggregationSeq &aggregationSeq,const GroupingFieldSeq &groupingFieldNames);


IDMRETURN get(IString &objName,IDMMiningBase *&pMiningBase,IString &serverName,IString &databaseName,IString &tablespaceName,IString &inputSchemaName,IString &inputDataSourceName,IDM_OutputType &outputType,IString &outputSchemaName,IString &outputDataSourceName,IDM_OutputOption &overwriteExistingDataSource,IString &outputDataSourceComment,AggregationSeq &aggregationSeq,GroupingFieldSeq &groupingFieldNames) const;

IDMGroupRecords(IDMRETURN &rc,const IString &objName,IDMMiningBase *const pMiningBase,const IString &serverName,const IString &databaseName,const IString &tablespaceName,const IString &inputSchemaName,const IString &inputDataSourceName,IDM_OutputType outputType,const IString &outputSchemaName,const IString &outputDataSourceName,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,const AggregationSeq &aggregationSeq,const GroupingFieldSeq &groupingFieldNames);

IDMGroupRecords(IDMRETURN &rc,const IString &objName,IDMMiningBase *const pMiningBase,IDMData &inputData,const IString &databaseName,const IString &tablespaceName,IDM_OutputType outputType,const IString &outputSchemaName,const IString &outputDataSourceName,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,const AggregationSeq &aggregationSeq,const GroupingFieldSeq &groupingFieldNames);

IDMGroupRecords(IDMRETURN &rc,const IString &objName,IDMMiningBase *const pMiningBase,const IString &serverName,const IString &databaseName,const IString &tablespaceName,const IString &inputSchemaName,const IString &inputDataSourceName,IDM_OutputType outputType,IDMData &outputData,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,const AggregationSeq &aggregationSeq,const GroupingFieldSeq &groupingFieldNames);

IDMGroupRecords(IDMRETURN &rc,const IString &objName,IDMMiningBase *const pMiningBase,IDMData &inputData,const IString &databaseName,const IString &tablespaceName,IDM_OutputType outputType,IDMData &outputData,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,const AggregationSeq &aggregationSeq,const GroupingFieldSeq &groupingFieldNames);

virtual ~IDMGroupRecords();
};

Struct members:

AggregationPairStruct
Defines the aggregation expression and the new field name.

aggregationExpression
The aggregation expression. All fields in the aggregation expression must either exist in the grouping fields list or be specified within an SQL column function. Null aggregation expressions are not allowed.

newFieldName
The name of the new field that contains the value of the expression.

Data members:

ivAggregationSeq
Specifies the aggregation expressions and output field names.

ivGroupingFieldNames
Specifies the fields being grouped.

Member functions:























IDMGroupRecords(IDMRETURN &rc, const IString &objName,IDMMiningBase *const pMiningBase, const IString &serverName, ...)

Constructs a group records object with the given values.


IDMGroupRecords(IDMRETURN &rc, const IString &objName,IDMMiningBase *const pMiningBase, IDMData &inputData, ...)

Constructs a group records object using an IDMData object. The IDMData object provides the server name, input schema name, and input data name.


IDMGroupRecords(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, const IString &serverName, ... IDMData &outputData ...)

Constructs a group records object with the given values. The IDMData object provides the output schema name and output data name. The server name from the IDMData object must match the server name input parameter.


IDMGroupRecords(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, IDMData &inputData, ... IDMData &outputData ...)

Constructs a group records object using two IDMData objects. The first IDMData object provides the server name, input schema name, and input data name. The second IDMData object provides the output schema name and output data name. The server name from the second IDMData object must match the server name from the first IDMData object.


~IDMGroupRecords
The destructor.


IDMJoinDataSources

This class joins two relational database tables or views. Only inner equijoins are supported.

Each pair of fields named in the structure (struct FieldPairStruct) must contain two equality-compatible fields.

If a field in the second input data is not included in the list of join field pairs, but has the same name as a field in the first input data, a unique name is generated and used in the output data. The unique name is generated by adding an underscore (_) and an integer to the existing field name. The integer is incremented until the new field name is unique.

If the unique name exceeds the maximum length for a field name, the original field name is truncated and the underscore and integer are added. The integer is incremented until the new name is unique.

You can use this function repeatedly to join more than two sets of input data.

The output data contains all the fields from both input data except the join field from the second input data.
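The naming rule for clashing field names can be sketched in standard C++ as follows. This is an illustration under stated assumptions, not the product's code: the `makeUniqueFieldName` function is hypothetical, the counter is assumed to start at 1, and `maxLength` stands in for whatever field-name limit the database server imposes.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Returns a field name that does not clash with any name in `taken`.
// If `original` is already free it is returned unchanged. Otherwise an
// underscore and an integer are appended, and the integer is
// incremented until the result is unique. When the suffixed name would
// exceed maxLength, the original name is truncated to make room.
std::string makeUniqueFieldName(const std::string& original,
                                const std::set<std::string>& taken,
                                std::size_t maxLength) {
    if (taken.count(original) == 0) {
        return original;
    }
    for (int n = 1;; ++n) {
        const std::string suffix = "_" + std::to_string(n);
        std::string base = original;
        if (base.size() + suffix.size() > maxLength) {
            // Truncate the original name so that the suffix still fits.
            base.resize(maxLength > suffix.size() ? maxLength - suffix.size() : 0);
        }
        const std::string candidate = base + suffix;
        if (taken.count(candidate) == 0) {
            return candidate;
        }
    }
}
```

For instance, if the first input data already has a field called `Name`, a second `Name` field becomes `Name_1`. With a hypothetical limit of five characters, it would become `Nam_1` instead.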

Example:

Input data 1

Dept#    Name               Budget
123      AIX Development    345678.90
234      MVS Development    987654.32

Input data 2

Empl#    Name               DeptNo    Salary
123      John Q. Public     123       23456.78
234      Jane D. Eau        234       34567.89
345      Jack B. Nimble     123       45678.90
456      Adelaide Erdate    234       56789.01

Data member       Value
ivFieldPairSeq    Dept#,DeptNo

Output data

Dept#    Name               Budget       Empl#    Name_1             Salary
123      AIX Development    345678.90    123      John Q. Public     23456.78
123      AIX Development    345678.90    345      Jack B. Nimble     45678.90
234      MVS Development    987654.32    234      Jane D. Eau        34567.89
234      MVS Development    987654.32    456      Adelaide Erdate    56789.01

Header file: idmpcjid.hpp

Format:


struct FieldPairStruct
{
  IString table1FieldName;
  IString table2FieldName;
};

typedef ISequence<FieldPairStruct> FieldPairSeq;

class IDMJoinDataSources : public IDMProcessingSettings
{
private:
  IString ivInputSchemaName2;
  IString ivInputDataSourceName2;
  FieldPairSeq ivFieldPairSeq;


const IString &objName,IDMMiningBase *const pMiningBase,const IString &serverName,const IString &databaseName,const IString &tablespaceName,const IString &inputSchemaName1,const IString &inputDataSourceName1,const IString &inputSchemaName2,const IString &inputDataSourceName2,IDM_OutputType outputType,const IString &outputSchemaName,const IString &outputDataSourceName,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,const FieldPairSeq &fieldPairSeq,IDMJoinDataSources* &pJdsSettings );


const IString &objName,IDMMiningBase *const pMiningBase,IDMData &inputData,const IString &databaseName,const IString &tablespaceName,const IString &inputSchemaName2,const IString &inputDataSourceName2,IDM_OutputType outputType,const IString &outputSchemaName,const IString &outputDataSourceName,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,const FieldPairSeq &fieldPairSeq,IDMJoinDataSources* &pJdsSettings );


const IString &objName,IDMMiningBase *const pMiningBase,const IString &serverName,const IString &databaseName,const IString &tablespaceName,const IString &inputSchemaName1,const IString &inputDataSourceName1,const IString &inputSchemaName2,const IString &inputDataSourceName2,IDM_OutputType outputType,IDMData &outputData,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,const FieldPairSeq &fieldPairSeq,IDMJoinDataSources* &pJdsSettings );


const IString &objName,IDMMiningBase *const pMiningBase,IDMData &inputData,const IString &databaseName,const IString &tablespaceName,const IString &inputSchemaName2,const IString &inputDataSourceName2,IDM_OutputType outputType,IDMData &outputData,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,const FieldPairSeq &fieldPairSeq,IDMJoinDataSources* &pJdsSettings );


IDMRETURN update(const IString &objName,const IString &serverName,const IString &databaseName,const IString &tablespaceName,const IString &inputSchemaName1,const IString &inputDataSourceName1,const IString &inputSchemaName2,const IString &inputDataSourceName2,IDM_OutputType outputType,const IString &outputSchemaName,const IString &outputDataSourceName,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,const FieldPairSeq &fieldPairSeq);

IDMRETURN setInput2Data(IDMData &input2Data);

IDMRETURN get(IString &objName,IDMMiningBase *&pMiningBase,IString &serverName,IString &databaseName,IString &tablespaceName,IString &inputSchemaName1,IString &inputDataSourceName1,IString &inputSchemaName2,IString &inputDataSourceName2,IDM_OutputType &outputType,IString &outputSchemaName,IString &outputDataSourceName,IDM_OutputOption &overwriteExistingDataSource,IString &outputDataSourceComment,FieldPairSeq &fieldPairSeq) const;

IDMJoinDataSources(IDMRETURN &rc,const IString &objName,IDMMiningBase *const pMiningBase,const IString &serverName,const IString &databaseName,const IString &tablespaceName,const IString &inputSchemaName1,const IString &inputDataSourceName1,const IString &inputSchemaName2,const IString &inputDataSourceName2,IDM_OutputType outputType,const IString &outputSchemaName,const IString &outputDataSourceName,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,const FieldPairSeq &fieldPairSeq);

IDMJoinDataSources(IDMRETURN &rc,const IString &objName,IDMMiningBase *const pMiningBase,IDMData &inputData,const IString &databaseName,const IString &tablespaceName,const IString &inputSchemaName2,const IString &inputDataSourceName2,IDM_OutputType outputType,const IString &outputSchemaName,const IString &outputDataSourceName,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,const FieldPairSeq &fieldPairSeq);

IDMJoinDataSources(IDMRETURN &rc,const IString &objName,IDMMiningBase *const pMiningBase,const IString &serverName,const IString &databaseName,const IString &tablespaceName,const IString &inputSchemaName1,const IString &inputDataSourceName1,const IString &inputSchemaName2,const IString &inputDataSourceName2,IDM_OutputType outputType,IDMData &outputData,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,const FieldPairSeq &fieldPairSeq);

IDMJoinDataSources(IDMRETURN &rc,const IString &objName,IDMMiningBase *const pMiningBase,IDMData &inputData,const IString &databaseName,const IString &tablespaceName,const IString &inputSchemaName2,const IString &inputDataSourceName2,IDM_OutputType outputType,IDMData &outputData,IDM_OutputOption overwriteExistingDataSource,const IString &outputDataSourceComment,const FieldPairSeq &fieldPairSeq);

virtual ~IDMJoinDataSources();
};

Struct members:

FieldPairStruct
Defines the names of the fields being joined.

table1FieldName
The name of the field in the first input data used for joining.

table2FieldName
The name of the field in the second input data used for joining.

Data members:

ivInputSchemaName2
The name of the schema to which the second input data is assigned.

ivInputDataSourceName2
The name of the second input data.

ivFieldPairSeq
Lists the field pairs from the two input data.

Member functions:






















setInput2Data
Sets the second input data using an IDMData object. The IDMData object must reference a relational database table or view.

The database server attribute of the IDMData object must match the ivServerName attribute of the IDMJoinDataSources object.


IDMJoinDataSources(IDMRETURN &rc, const IString &objName,IDMMiningBase *const pMiningBase, const IString &serverName, ...)

Constructs a join data object with the given values.


IDMJoinDataSources(IDMRETURN &rc, const IString &objName,IDMMiningBase *const pMiningBase, IDMData &inputData, ...)

Constructs a join data object using an IDMData object. The IDMData object provides the server name, input schema name, and input data name.


IDMJoinDataSources(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, const IString &serverName, ... IDMData &outputData ...)

Constructs a join data object with the given values. The IDMData object provides the output schema name and output data name. The server name from the IDMData object must match the server name input parameter.


IDMJoinDataSources(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, IDMData &inputData, ... IDMData &outputData ...)

Constructs a join data object using two IDMData objects. The first IDMData object provides the server name, input schema name, and input data name. The second IDMData object provides the output schema name and output data name. The server name from the second IDMData object must match the server name from the first IDMData object.


~IDMJoinDataSources
The destructor.

IDMMapValues

This class provides one-to-one or many-to-one mappings of discrete values. The mapping is defined by mapping data that contains a lookup field and a mapping value field.

The mapping data must be a relational database table or view.

The output data contains all the fields in the input data, plus a field that contains the mapped values.

The field type of the field added to the output data is always that of the mapping value field.

If null values are not allowed in the mapping value field, then null values are not allowed in the output data fields. However, null values are allowed if the ivValueMappingType entry is IDM_REPLACE_INPUT_VALUE_WITH_NULL, or if the ivValueMappingType entry is IDM_COPY_INPUT_VALUE and the input data field allows null values.

If the input data field value matches a value in the lookup field, the ivValueMappingType entry is ignored and the value from the mapping value field is copied to the output data.

If the ivValueMappingType entry is IDM_REPLACE_INPUT_VALUE_WITH_NULL and the input data field value does not match a value in the lookup field, a null value is copied to the output data.

If the ivValueMappingType entry is IDM_COPY_INPUT_VALUE and the input data field value does not match a value in the lookup field, the input data field value is copied to the output data.

If the value mapping type is IDM_COPY_INPUT_VALUE, then the input data field and the mapping value field must be of the same data type. The output data field can be of equal or greater length than the input data field.

If the name of the new field is not specified, the name of the mapping value field from the mapping data is used. If the name of this field matches the name of a field in the input data, a unique name, generated from the name of the mapping value field, is used.


Example:

Input data

Soup            Day
Potato          Monday
Minestrone      Wednesday
Clam chowder    Friday
Vegetable       Saturday

Value Mapping

DayString       Day#
Sunday          1
Monday          2
Tuesday         3
Wednesday       4
Thursday        5
Friday          6
Saturday        7

Data members            Value
ivValueMappingType      IDM_REPLACE_INPUT_VALUE_WITH_NULL
ivInputFieldName        Day
ivMappingLookupField    DayString
ivMappingValueField     Day#
ivNewFieldName          Day#

Output Data

Soup            Day          Day#
Potato          Monday       2
Minestrone      Wednesday    4
Clam chowder    Friday       6
Vegetable       Saturday     7
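The matching rules of the example above can be tried outside the Intelligent Miner with a small standalone C++ sketch. It is not part of the API: the enum ValueMappingType and the function mapValue are hypothetical names, the mapping data is held in a std::map instead of a database table, and a null value is modeled with std::optional (C++17).

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for IDM_ValueMappingType.
enum class ValueMappingType { ReplaceInputValueWithNull, CopyInputValue };

// Returns the value written to the new output field for one input value.
// A match in the lookup field always wins; the mapping type only decides
// what happens when the input value has no match.
std::optional<std::string> mapValue(
    const std::map<std::string, std::string> &lookup,
    const std::string &inputValue,
    ValueMappingType type)
{
    const auto hit = lookup.find(inputValue);
    if (hit != lookup.end())
        return hit->second;          // matched: copy the mapping value
    if (type == ValueMappingType::CopyInputValue)
        return inputValue;           // unmatched: copy the input value
    return std::nullopt;             // unmatched: null value
}
```

With the mapping data of the example, the input value Monday is mapped to 2 under either mapping type. A value that does not occur in the lookup field, for example Holiday, gives a null value under ReplaceInputValueWithNull and is copied unchanged under CopyInputValue.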

Header file: idmpcvm.hpp

Format:

class IDMMapValues : public IDMProcessingSettings
{
  private:
    IDM_ValueMappingType ivValueMappingType;
    IString ivInputFieldName;
    IString ivMappingSchemaName;
    IString ivMappingDataSourceName;
    IString ivMappingLookupField;
    IString ivMappingValueField;
    IString ivNewFieldName;

  public:
    static IDMRETURN createObject(
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        const IString &serverName,
        const IString &databaseName,
        const IString &tablespaceName,
        const IString &inputSchemaName,
        const IString &inputDataSourceName,
        IDM_OutputType outputType,
        const IString &outputSchemaName,
        const IString &outputDataSourceName,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        IDM_ValueMappingType valueMappingType,
        const IString &inputFieldName,
        const IString &mappingSchemaName,
        const IString &mappingDataSourceName,
        const IString &mappingLookupField,
        const IString &mappingValueField,
        const IString &newFieldName,
        IDMMapValues* &pMvSettings);

    static IDMRETURN createObject(
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        IDMData &inputData,
        const IString &databaseName,
        const IString &tablespaceName,
        IDM_OutputType outputType,
        const IString &outputSchemaName,
        const IString &outputDataSourceName,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        IDM_ValueMappingType valueMappingType,
        const IString &inputFieldName,
        const IString &mappingSchemaName,
        const IString &mappingDataSourceName,
        const IString &mappingLookupField,
        const IString &mappingValueField,
        const IString &newFieldName,
        IDMMapValues* &pMvSettings);

    static IDMRETURN createObject(
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        const IString &serverName,
        const IString &databaseName,
        const IString &tablespaceName,
        const IString &inputSchemaName,
        const IString &inputDataSourceName,
        IDM_OutputType outputType,
        IDMData &outputData,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        IDM_ValueMappingType valueMappingType,
        const IString &inputFieldName,
        const IString &mappingSchemaName,
        const IString &mappingDataSourceName,
        const IString &mappingLookupField,
        const IString &mappingValueField,
        const IString &newFieldName,
        IDMMapValues* &pMvSettings);

    static IDMRETURN createObject(
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        IDMData &inputData,
        const IString &databaseName,
        const IString &tablespaceName,
        IDM_OutputType outputType,
        IDMData &outputData,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        IDM_ValueMappingType valueMappingType,
        const IString &inputFieldName,
        const IString &mappingSchemaName,
        const IString &mappingDataSourceName,
        const IString &mappingLookupField,
        const IString &mappingValueField,
        const IString &newFieldName,
        IDMMapValues* &pMvSettings);


    IDMRETURN update(
        const IString &objName,
        const IString &serverName,
        const IString &databaseName,
        const IString &tablespaceName,
        const IString &inputSchemaName,
        const IString &inputDataSourceName,
        IDM_OutputType outputType,
        const IString &outputSchemaName,
        const IString &outputDataSourceName,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        IDM_ValueMappingType valueMappingType,
        const IString &inputFieldName,
        const IString &mappingSchemaName,
        const IString &mappingDataSourceName,
        const IString &mappingLookupField,
        const IString &mappingValueField,
        const IString &newFieldName);

    IDMRETURN setMappingData(IDMData &mappingData);

    IDMRETURN get(
        IString &objName,
        IDMMiningBase *&pMiningBase,
        IString &serverName,
        IString &databaseName,
        IString &tablespaceName,
        IString &inputSchemaName,
        IString &inputDataSourceName,
        IDM_OutputType &outputType,
        IString &outputSchemaName,
        IString &outputDataSourceName,
        IDM_OutputOption &overwriteExistingDataSource,
        IString &outputDataSourceComment,
        IDM_ValueMappingType &valueMappingType,
        IString &inputFieldName,
        IString &mappingSchemaName,
        IString &mappingDataSourceName,
        IString &mappingLookupField,
        IString &mappingValueField,
        IString &newFieldName) const;

    friend ostream &operator<<(ostream &, const IDMMapValues &);

    IDMMapValues(
        IDMRETURN &rc,
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        const IString &serverName,
        const IString &databaseName,
        const IString &tablespaceName,
        const IString &inputSchemaName,
        const IString &inputDataSourceName,
        IDM_OutputType outputType,
        const IString &outputSchemaName,
        const IString &outputDataSourceName,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        IDM_ValueMappingType valueMappingType,
        const IString &inputFieldName,
        const IString &mappingSchemaName,
        const IString &mappingDataSourceName,
        const IString &mappingLookupField,
        const IString &mappingValueField,
        const IString &newFieldName);

    IDMMapValues(
        IDMRETURN &rc,
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        IDMData &inputData,
        const IString &databaseName,
        const IString &tablespaceName,
        IDM_OutputType outputType,
        const IString &outputSchemaName,
        const IString &outputDataSourceName,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        IDM_ValueMappingType valueMappingType,
        const IString &inputFieldName,
        const IString &mappingSchemaName,
        const IString &mappingDataSourceName,
        const IString &mappingLookupField,
        const IString &mappingValueField,
        const IString &newFieldName);

    IDMMapValues(
        IDMRETURN &rc,
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        const IString &serverName,
        const IString &databaseName,
        const IString &tablespaceName,
        const IString &inputSchemaName,
        const IString &inputDataSourceName,
        IDM_OutputType outputType,
        IDMData &outputData,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        IDM_ValueMappingType valueMappingType,
        const IString &inputFieldName,
        const IString &mappingSchemaName,
        const IString &mappingDataSourceName,
        const IString &mappingLookupField,
        const IString &mappingValueField,
        const IString &newFieldName);

    IDMMapValues(
        IDMRETURN &rc,
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        IDMData &inputData,
        const IString &databaseName,
        const IString &tablespaceName,
        IDM_OutputType outputType,
        IDMData &outputData,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        IDM_ValueMappingType valueMappingType,
        const IString &inputFieldName,
        const IString &mappingSchemaName,
        const IString &mappingDataSourceName,
        const IString &mappingLookupField,
        const IString &mappingValueField,
        const IString &newFieldName);

    virtual ~IDMMapValues();
};

Data members:

ivValueMappingType
The replace/copy flag.
Set the flag to IDM_REPLACE_INPUT_VALUE_WITH_NULL to replace the input data field value with a null value in the output data, or set the flag to IDM_COPY_INPUT_VALUE to copy the input data field value to the output data.

ivInputFieldName
The field to check for mapped-from values.

ivMappingSchemaName
The name of the schema to which the mapping data is assigned.

ivMappingDataSourceName
The name of the mapping data.

ivMappingLookupField
The lookup field in the mapping data.

ivMappingValueField
The field in the mapping data that contains the mapping value.

ivNewFieldName
The name of the new field created by the mapping. If no name is specified, the name of the new field is the same as the name of the mapping value field in the mapping data.

Member functions:






















setMappingData
Sets the mapping data using an IDMData object. The IDMData object must reference a relational database table or view.

The database server attribute of the IDMData object must match the ivServerName attribute of the IDMMapValues object.


IDMMapValues(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, const IString &serverName, ...)

Constructs a map values object with the given values.


IDMMapValues(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, IDMData &inputData, ...)

Constructs a map values object using an IDMData object. The IDMData object provides the server name, input schema name, and input data name.


IDMMapValues(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, const IString &serverName, ... IDMData &outputData ...)

Constructs a map values object with the given values. The IDMData object provides the output schema name and output data name. The server name from the IDMData object must match the server name input parameter.


IDMMapValues(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, IDMData &inputData, ... IDMData &outputData ...)

Constructs a map values object using two IDMData objects. The first IDMData object provides the server name, input schema name, and input data name. The second IDMData object provides the output schema name and output data name. The server name from the second IDMData object must match the server name from the first IDMData object.


~IDMMapValues
The destructor.

IDMPivotFieldsToRecords

This class splits each record into multiple records. The fields in the list of pivot fields are repeated in each record and each pivoted input data field becomes part of a separate record.

Null values in the fields being pivoted are preserved in the output data. The records with null values can be discarded or encoded in a subsequent step.

The output data contains the pivot fields, plus a field that contains the values from each pivoted input data field.

If you are using OS/390 or AS/400, the output data is always in the form of a table. You cannot create views.


Example:

Input data

Cust#   Date       Prod Purch 1   Prod Purch 2   Prod Purch 3
1273    10/12/95   Apples         Bananas        Bread
2578    12/14/95   Coffee         Paper Cups     -

Data members           Value
ivPivotFieldNames      Cust#,Date
ivFieldNamesToPivot    Prod Purch 1,Prod Purch 2,Prod Purch 3
ivPivotedFieldName     Prod Purchased

Output data

Cust#   Date       Prod Purchased
1273    10/12/95   Apples
2578    12/14/95   Coffee
1273    10/12/95   Bananas
2578    12/14/95   Paper Cups
1273    10/12/95   Bread
2578    12/14/95   -
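The transformation in the example above can be sketched in standalone C++. This is only an illustration and does not use the Intelligent Miner API: the names Field, Record, and pivotFieldsToRecords are hypothetical, fields are addressed by position instead of by name, and a null value (shown as - in the tables) is modeled with an empty std::optional (C++17). The row order follows the example; the text does not state which order the real function guarantees.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

using Field = std::optional<std::string>;   // empty optional models a null value
using Record = std::vector<Field>;

// Splits each input record into one output record per pivoted field.
// pivotIdx:   positions of the fields repeated in every output record.
// toPivotIdx: positions of the fields whose values go into the new field.
std::vector<Record> pivotFieldsToRecords(
    const std::vector<Record> &input,
    const std::vector<std::size_t> &pivotIdx,
    const std::vector<std::size_t> &toPivotIdx)
{
    std::vector<Record> output;
    // The outer loop runs over the pivoted fields so that the rows appear
    // in the same order as in the example output.
    for (std::size_t col : toPivotIdx) {
        for (const Record &rec : input) {
            assert(col < rec.size());
            Record out;
            for (std::size_t p : pivotIdx)
                out.push_back(rec.at(p));   // repeat the pivot fields
            out.push_back(rec.at(col));     // null values are preserved
            output.push_back(out);
        }
    }
    return output;
}
```

Called with the two input records of the example, pivot positions 0 and 1, and positions 2, 3, and 4 as the fields to pivot, the sketch returns six records, the last of which carries a null value in the pivoted field.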

Header file: idmpcpiv.hpp

Format:

class IDMPivotFieldsToRecords : public IDMProcessingSettings
{
  private:
    PivotFieldSeq ivPivotFieldNames;
    PivotFieldSeq ivFieldNamesToPivot;
    IString ivPivotedFieldName;

  public:
    static IDMRETURN createObject(
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        const IString &serverName,
        const IString &databaseName,
        const IString &tablespaceName,
        const IString &inputSchemaName,
        const IString &inputDataSourceName,
        IDM_OutputType outputType,
        const IString &outputSchemaName,
        const IString &outputDataSourceName,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const PivotFieldSeq &pivotFieldNames,
        const PivotFieldSeq &fieldNamesToPivot,
        const IString &pivotedFieldName,
        IDMPivotFieldsToRecords* &pPivSettings);

    static IDMRETURN createObject(
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        IDMData &inputData,
        const IString &databaseName,
        const IString &tablespaceName,
        IDM_OutputType outputType,
        const IString &outputSchemaName,
        const IString &outputDataSourceName,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const PivotFieldSeq &pivotFieldNames,
        const PivotFieldSeq &fieldNamesToPivot,
        const IString &pivotedFieldName,
        IDMPivotFieldsToRecords* &pPivSettings);

    static IDMRETURN createObject(
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        const IString &serverName,
        const IString &databaseName,
        const IString &tablespaceName,
        const IString &inputSchemaName,
        const IString &inputDataSourceName,
        IDM_OutputType outputType,
        IDMData &outputData,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const PivotFieldSeq &pivotFieldNames,
        const PivotFieldSeq &fieldNamesToPivot,
        const IString &pivotedFieldName,
        IDMPivotFieldsToRecords* &pPivSettings);

    static IDMRETURN createObject(
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        IDMData &inputData,
        const IString &databaseName,
        const IString &tablespaceName,
        IDM_OutputType outputType,
        IDMData &outputData,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const PivotFieldSeq &pivotFieldNames,
        const PivotFieldSeq &fieldNamesToPivot,
        const IString &pivotedFieldName,
        IDMPivotFieldsToRecords* &pPivSettings);


    IDMRETURN update(
        const IString &objName,
        const IString &serverName,
        const IString &databaseName,
        const IString &tablespaceName,
        const IString &inputSchemaName,
        const IString &inputDataSourceName,
        IDM_OutputType outputType,
        const IString &outputSchemaName,
        const IString &outputDataSourceName,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const PivotFieldSeq &pivotFieldNames,
        const PivotFieldSeq &fieldNamesToPivot,
        const IString &pivotedFieldName);

    IDMRETURN get(
        IString &objName,
        IDMMiningBase *&pMiningBase,
        IString &serverName,
        IString &databaseName,
        IString &tablespaceName,
        IString &inputSchemaName,
        IString &inputDataSourceName,
        IDM_OutputType &outputType,
        IString &outputSchemaName,
        IString &outputDataSourceName,
        IDM_OutputOption &overwriteExistingDataSource,
        IString &outputDataSourceComment,
        PivotFieldSeq &pivotFieldNames,
        PivotFieldSeq &fieldNamesToPivot,
        IString &pivotedFieldName) const;

    IDMPivotFieldsToRecords(
        IDMRETURN &rc,
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        const IString &serverName,
        const IString &databaseName,
        const IString &tablespaceName,
        const IString &inputSchemaName,
        const IString &inputDataSourceName,
        IDM_OutputType outputType,
        const IString &outputSchemaName,
        const IString &outputDataSourceName,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const PivotFieldSeq &pivotFieldNames,
        const PivotFieldSeq &fieldNamesToPivot,
        const IString &pivotedFieldName);

    IDMPivotFieldsToRecords(
        IDMRETURN &rc,
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        IDMData &inputData,
        const IString &databaseName,
        const IString &tablespaceName,
        IDM_OutputType outputType,
        const IString &outputSchemaName,
        const IString &outputDataSourceName,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const PivotFieldSeq &pivotFieldNames,
        const PivotFieldSeq &fieldNamesToPivot,
        const IString &pivotedFieldName);

    IDMPivotFieldsToRecords(
        IDMRETURN &rc,
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        const IString &serverName,
        const IString &databaseName,
        const IString &tablespaceName,
        const IString &inputSchemaName,
        const IString &inputDataSourceName,
        IDM_OutputType outputType,
        IDMData &outputData,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const PivotFieldSeq &pivotFieldNames,
        const PivotFieldSeq &fieldNamesToPivot,
        const IString &pivotedFieldName);

    IDMPivotFieldsToRecords(
        IDMRETURN &rc,
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        IDMData &inputData,
        const IString &databaseName,
        const IString &tablespaceName,
        IDM_OutputType outputType,
        IDMData &outputData,
        IDM_OutputOption overwriteExistingDataSource,
        const IString &outputDataSourceComment,
        const PivotFieldSeq &pivotFieldNames,
        const PivotFieldSeq &fieldNamesToPivot,
        const IString &pivotedFieldName);

    virtual ~IDMPivotFieldsToRecords();
};

Data members:

ivPivotFieldNames
List of the pivot field names.

ivFieldNamesToPivot
List of the field names to pivot.

ivPivotedFieldName
Name of the pivoted field.

Member functions:























IDMPivotFieldsToRecords(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, const IString &serverName, ...)

Constructs a pivot fields to records object with the given values.


IDMPivotFieldsToRecords(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, IDMData &inputData, ...)

Constructs a pivot fields to records object using an IDMData object. The IDMData object provides the server name, input schema name, and input data name.


IDMPivotFieldsToRecords(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, const IString &serverName, ... IDMData &outputData ...)

Constructs a pivot fields to records object with the given values. The IDMData object provides the output schema name and output data name. The server name from the IDMData object must match the server name input parameter.


IDMPivotFieldsToRecords(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, IDMData &inputData, ... IDMData &outputData ...)

Constructs a pivot fields to records object using two IDMData objects. The first IDMData object provides the server name, input schema name, and input data name. The second IDMData object provides the output schema name and output data name. The server name from the second IDMData object must match the server name from the first IDMData object.


~IDMPivotFieldsToRecords
The destructor.

IDMRunSQL

This class submits SQL statements to the database server for immediate execution. Do not use this class to run SELECT statements.

The SQL statement must conform to the level of SQL supported by the database server.

Header file: idmpcigs.hpp

Format:

class IDMRunSQL : public IDMProcessingSettings
{
  private:
    IString ivSQLStatement;

  public:
    static IDMRETURN createObject(
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        const IString &serverName,
        const IString &databaseName,
        const IString &tablespaceName,
        const IString &SQLStatement,
        IDMRunSQL* &pSqlSettings);

    IDMRETURN update(
        const IString &objName,
        const IString &serverName,
        const IString &databaseName,
        const IString &tablespaceName,
        const IString &SQLStatement);

    IDMRETURN get(
        IString &objName,
        IDMMiningBase *&pMiningBase,
        IString &serverName,
        IString &databaseName,
        IString &tablespaceName,
        IString &SQLStatement) const;

    IDMRunSQL(
        IDMRETURN &rc,
        const IString &objName,
        IDMMiningBase *const pMiningBase,
        const IString &serverName,
        const IString &databaseName,
        const IString &tablespaceName,
        const IString &SQLStatement);

    virtual ~IDMRunSQL();
};


Data members:

ivSQLStatement
The SQL statement to be passed to the database.

Member functions:

createObject
Calls the constructor and, if successful, adds the object to the mining base.


Delete the object using the deleteObject method.




IDMRunSQL(IDMRETURN &rc, const IString &objName, IDMMiningBase *const pMiningBase, const IString &serverName, ...)

Constructs a run SQL object with the given values.


~IDMRunSQL
The destructor.

Notes on inherited methods: The following methods always return an error:
- setInputData()
- setOutputData()
- getOutputData()
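Because this class is not meant for SELECT statements, an application might screen a statement before it creates an IDMRunSQL object. The following helper is a hypothetical sketch in standalone C++ and not part of the Intelligent Miner API. It only inspects the first keyword of the statement, so queries that begin differently (for example, with a parenthesis or a common table expression) are not detected.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Returns true if the first keyword of the statement is SELECT
// (case-insensitive, leading white space ignored).
bool startsWithSelect(const std::string &sql)
{
    // Skip leading white space.
    const auto first = std::find_if(sql.begin(), sql.end(),
        [](unsigned char c) { return !std::isspace(c); });
    // The keyword ends at the first non-alphabetic character.
    const auto last = std::find_if(first, sql.end(),
        [](unsigned char c) { return !std::isalpha(c); });

    std::string keyword(first, last);
    std::transform(keyword.begin(), keyword.end(), keyword.begin(),
        [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return keyword == "SELECT";
}
```

An application could call startsWithSelect on the statement text and refuse to build the settings object when the function returns true.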

Statistics settings

IDMStatLinearRegression

Use the Linear Regression function to determine the best linear relationship between the dependent variable and one or more independent variables.

For example, if you change the price of a product, you might want to know if the product price has a linear relationship with its sales. By using linear regression analysis, you can find the best linear function for your data and predict future sales.

Analysis options

1. Dependent variable
   Specify a field to be used as the dependent variable. The field used as the dependent variable must be measured on at least an interval scale.

2. Independent variables
   Specify one or more fields to be used as independent variables. The fields used as independent variables must be measured on at least an interval scale.
   If you specify one independent variable, a simple linear regression is fitted. If you select two or more variables, a multiple linear regression is fitted.

3. Apply equation through origin
   Specify:
   Yes   to force the regression line to pass through the origin; that is, the estimated value of the dependent variable is zero when all independent variables are zero.
   No    to use the linear equation with a nonzero constant term.
   An example of this requirement occurs in some growth models. Because the sales revenue (dependent) is zero at first, that is, when the time (independent) is zero, you can fit a regression line that passes through the origin.

4. Level for confidence intervals
   Specify a value in the range of 0.5 through 0.9999 to calculate confidence intervals for any single observation or population mean.
   For example, if you type 0.95, the 95% confidence intervals for each observation and the population mean are calculated.

5. Lag for Durbin-Watson statistics
   Specify a positive integer in the range of 1 through (number of observations)/4 to calculate a Durbin-Watson statistic with the specified lag, or 0 if you do not want it.
   The Durbin-Watson statistic tests for the existence of first-order autocorrelation.
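The effect of the through-origin option can be illustrated with a standalone C++ sketch of ordinary least squares for a single independent variable. It is not the Intelligent Miner implementation: the names LineFit and fitLine are hypothetical, and only the textbook formulas for simple linear regression are used.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct LineFit {
    double intercept;   // constant term; 0 when the line is forced through the origin
    double slope;
};

// Ordinary least squares for one independent variable x and a dependent
// variable y. With thruOrigin set, the model is y = slope * x.
LineFit fitLine(const std::vector<double> &x,
                const std::vector<double> &y,
                bool thruOrigin)
{
    assert(x.size() == y.size() && x.size() >= 2);
    const double n = static_cast<double>(x.size());
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sx += x[i];
        sy += y[i];
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
    }
    if (thruOrigin) {
        assert(sxx > 0);                 // x must not be all zeros
        return {0.0, sxy / sxx};         // minimizes sum of (y - slope * x)^2
    }
    const double denom = n * sxx - sx * sx;
    assert(denom != 0);                  // x must not be constant
    const double slope = (n * sxy - sx * sy) / denom;
    return {(sy - slope * sx) / n, slope};
}
```

For data that lies on y = 1 + 2x, the call without the origin constraint recovers an intercept of 1 and a slope of 2, whereas the constrained fit has no constant term and therefore reports a different slope for the same data.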

Header file: idmsclin.hpp

Format:

#define IDMS_FITTED_FIELD "FITTED"
#define IDMS_RESIDUAL_FIELD "RESIDUAL"
#define IDMS_STD_RESIDUAL_FIELD "STDRESIDUAL"
#define IDMS_LOWER_MEAN_FIELD "LOWMEAN"
#define IDMS_UPPER_MEAN_FIELD "UPPMEAN"
#define IDMS_LOWER_INDIVIDUAL_FIELD "LOWINDIVIDUAL"
#define IDMS_UPPER_INDIVIDUAL_FIELD "UPPINDIVIDUAL"


} IDM_UseMode;

class IDMStatLinearRegression : public IDMSettings
{
  protected:
    ISequence<IString> ivOrderedBy;
    IDM_UseMode ivUseMode;
    IString ivDependentVariable;
    ISequence<IString> ivIndependentVariables;
    ISequence<IString> ivOutputFields;
    IString ivFittedField;
    IString ivResidualField;
    IString ivStdResidualField;
    IString ivLowerMeanField;
    IString ivUpperMeanField;
    IString ivLowerIndividualField;
    IString ivUpperIndividualField;
    IString ivLinearRegressionResult;

    IDMBOOLEAN ivThruOrigin;   // thru origin?
    IDMREAL rvCILevel;         // confidence interval level
    IDMINTEGER ivLagDurbin;    // lag for Durbin-Watson stat.

  public:
    IDMStatLinearRegression();

    IDMStatLinearRegression(
        IDMRETURN &rc,
        IString name,
        IDMMiningBase *pMnb,
        IDMData *pInputData,
        IDMSelections &selection,
        ISequence<IString> &orderedBy,
        IDM_UseMode useMode,
        IDMData *pOutputData,
        ISequence<IString> &outputFields,
        IString dependentVariable,
        ISequence<IString> &independentVariables,
        IDMBOOLEAN thruOrigin,
        IDMDOUBLE CILevel,
        IDMINTEGER lagDurbin,
        IString linearRegressionResult,
        IString fittedField = IDMS_FITTED_FIELD,
        IString residualField = IDMS_RESIDUAL_FIELD,
        IString stdResidualField = IDMS_STD_RESIDUAL_FIELD,
        IString lowerMeanField = IDMS_LOWER_MEAN_FIELD,
        IString upperMeanField = IDMS_UPPER_MEAN_FIELD,
        IString lowerIndividualField = IDMS_LOWER_INDIVIDUAL_FIELD,
        IString upperIndividualField = IDMS_UPPER_INDIVIDUAL_FIELD);

    virtual ~IDMStatLinearRegression();

    static IDMRETURN createObject(
        IString name,
        IDMMiningBase *pMnb,
        IDMData *pInputData,
        IDMSelections &selection,
        ISequence<IString> &orderedBy,
        IDM_UseMode useMode,
        IDMData *pOutputData,
        ISequence<IString> &outputFields,
        IString dependentVariable,
        ISequence<IString> &independentVariables,
        IDMBOOLEAN thruOrigin,
        IDMDOUBLE CILevel,
        IDMINTEGER lagDurbin,
        IString linearRegressionResult,
        IDMStatLinearRegression *&pLIN,
        IString fittedField = IDMS_FITTED_FIELD,
        IString residualField = IDMS_RESIDUAL_FIELD,
        IString stdResidualField = IDMS_STD_RESIDUAL_FIELD,
        IString lowerMeanField = IDMS_LOWER_MEAN_FIELD,
        IString upperMeanField = IDMS_UPPER_MEAN_FIELD,
        IString lowerIndividualField = IDMS_LOWER_INDIVIDUAL_FIELD,
        IString upperIndividualField = IDMS_UPPER_INDIVIDUAL_FIELD);




    IDMRETURN update(
        IString name,
        IDMData *pInputData,
        IDMSelections &selection,
        ISequence<IString> &orderedBy,
        IDM_UseMode useMode,
        IDMData *pOutputData,
        ISequence<IString> &outputFields,
        IString dependentVariable,
        ISequence<IString> &independentVariables,
        IDMBOOLEAN thruOrigin,
        IDMDOUBLE CILevel,
        IDMINTEGER lagDurbin,
        IString linearRegressionResult,
        IString fittedField = IDMS_FITTED_FIELD,
        IString residualField = IDMS_RESIDUAL_FIELD,
        IString stdResidualField = IDMS_STD_RESIDUAL_FIELD,
        IString lowerMeanField = IDMS_LOWER_MEAN_FIELD,
        IString upperMeanField = IDMS_UPPER_MEAN_FIELD,
        IString lowerIndividualField = IDMS_LOWER_INDIVIDUAL_FIELD,
        IString upperIndividualField = IDMS_UPPER_INDIVIDUAL_FIELD);



    IDMRETURN get(
        IString &name,
        IDMMiningBase *&pMnb,
        IDMData *&pInputData,
        IDMSelections &selection,
        ISequence<IString> &orderedBy,
        IDM_UseMode &useMode,
        IDMData *&pOutputData,
        ISequence<IString> &outputFields,
        IString &dependentVariable,
        ISequence<IString> &independentVariables,
        IDMBOOLEAN &thruOrigin,
        IDMDOUBLE &CILevel,
        IDMINTEGER &lagDurbin,
        IString &linearRegressionResult,
        IString &fittedField,
        IString &residualField,
        IString &stdResidualField,
        IString &lowerMeanField,
        IString &upperMeanField,
        IString &lowerIndividualField,
        IString &upperIndividualField);

};

Data members:

ivOrderedBy
A sequence of order fields that orders the observations into the different time lags.

ivUseMode
Specifies the mode in which the linear regression is run. IDM_TRAINING_MODE means that the regression equation is built. IDM_APPLICATION_MODE means that the dependent variable is determined for the observations of the independent variables.

ivDependentVariable
The name of the field specified for the dependent variable.

ivIndependentVariables
The names of the fields specified for the independent variables.

ivOutputFields
A sequence collection of field names that appear in the produced output, if pOutputData is specified. Output data can be produced in the use modes IDM_TRAINING_MODE and IDM_APPLICATION_MODE.

ivFittedField
The name of the field in the output data into which the fitted value is written.

ivResidualField
The name of the field in the output data into which the residual of the observed and fitted value is written.

ivStdResidualField
The name of the field in the output data into which the standardized residual is written.

ivLowerMeanField
The name of the field in the output data into which the lower bound mean value of the given confidence interval is written.

ivUpperMeanField
The name of the field in the output data into which the upper bound mean value of the given confidence interval is written.

ivLowerIndividualField
The name of the field in the output data into which the lower bound individual value is written.

ivUpperIndividualField
The name of the field in the output data into which the upper bound individual value is written.

ivThruOrigin
Specifies whether the regression line goes through the origin. Specify 0 for the intercept model, or another number for the non-intercept model.

cvCILevel
Specifies the confidence level used to calculate the confidence intervals. The value should be in the range of 0.5 through 0.9999.

ivLagDurbin
Specify a positive integer in the range of 1 through (number of observations)/4 to calculate a Durbin-Watson statistic with the specified lag, or 0 if you do not want it.

ivLinearRegressionResult
The name of the result object. In use mode IDM_APPLICATION_MODE, the statistical information is read from this result and used to calculate the output data. In IDM_APPLICATION_MODE no result is written; only the output data is produced.
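The lag setting can be illustrated with a standalone C++ sketch of the Durbin-Watson statistic computed from a series of residuals. The function name durbinWatson is hypothetical. For a lag of 1 the formula is the classical statistic; applying the same formula to residuals that are lag positions apart is an assumption about how larger lags are handled, because the formula itself is not given in this section.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Durbin-Watson statistic of the residuals for a given lag:
//   sum over t >= lag of (e[t] - e[t - lag])^2, divided by sum of e[t]^2.
// Values near 2 suggest no autocorrelation at that lag, values near 0
// positive autocorrelation, and values near 4 negative autocorrelation.
double durbinWatson(const std::vector<double> &residuals, std::size_t lag)
{
    const std::size_t n = residuals.size();
    assert(lag >= 1 && lag <= n / 4);    // range stated for the lag setting
    double num = 0, den = 0;
    for (std::size_t t = 0; t < n; ++t) {
        den += residuals[t] * residuals[t];
        if (t >= lag) {
            const double d = residuals[t] - residuals[t - lag];
            num += d * d;
        }
    }
    assert(den > 0);                     // residuals must not all be zero
    return num / den;
}
```

Residuals that alternate in sign from one observation to the next give a large value at lag 1, while residuals that repeat every second observation give a value of 0 at lag 2.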

Member functions:

IDMStatLinearRegression()
The default constructor.

IDMStatLinearRegression(IDMRETURN &rc, IString name,...)
Constructs a linear regression object with the given parameters. The object is added to the Statistics-settings IKeySortedSet collection that is located in the mining base (class IDMMiningBase). An error has occurred if the return code is not equal to IDM_SUCCESS. The linear regression object should be deleted using deleteObject().

~IDMStatLinearRegression()
The destructor.

createObject
Constructs a linear regression object with the given values and returns it if no error occurred. The object is added to the Statistics-settings IKeySortedSet collection that is located in the mining base (class IDMMiningBase). If an error occurred, the object is deleted and pLIN is set to NULL.

deleteObject
Removes the object from the Key-Set collection located in the mining base object and calls the destructor.

update
Changes the values of the data members. To update some values of the Sequence collection, the whole collection must be retrieved with a get command and the appropriate elements must be changed.


IDMStatUnivariateCurve

Use the Univariate Curve Fitting function to fit a curve between the data and time. You can specify one of the following six curves to fit your data.

For a non-seasonal model, the six curves are:

For a seasonal model, the six curves are:

If you choose Best Fit in the Curve option, the curve with the smallest residual standard deviation is fitted.

Analysis options

1. Variable
   Specify a field measured on at least an interval scale.

2. Curve type
   Specify a curve of one of the following types:
   – Linear
   – Exponential
   – Power
   – Hyperbola
   – Reciprocal
   – Rational
   – Best fit
   If you select Best Fit, the curve with the smallest residual standard deviation is fitted.

3. Forecast periods
   Specify an integer between 0 and 99 to define the number of forecast periods.

4. Seasonal periods
   Specify an integer from 2 through 13 for a seasonal model, or 0 for a non-seasonal model.
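The Best Fit rule can be sketched in standalone C++ for two of the curve types. Everything in this sketch is an assumption made for illustration: the curve equations are not reproduced in this section, so the common textbook forms y = a + b*t (linear) and y = a*exp(b*t) (exponential, fitted on the logarithm of y) are used, time is taken as t = 1, 2, ..., n, the residual standard deviation is computed as sqrt(SSE/(n-2)), and the names CurveFit and bestFit are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

struct CurveFit {
    std::string name;        // "linear" or "exponential"
    double a;
    double b;
    double residualStdDev;
};

// Least-squares estimates for v = a + b * u.
static void leastSquares(const std::vector<double> &u,
                         const std::vector<double> &v,
                         double &a, double &b)
{
    const double n = static_cast<double>(u.size());
    double su = 0, sv = 0, suu = 0, suv = 0;
    for (std::size_t i = 0; i < u.size(); ++i) {
        su += u[i];
        sv += v[i];
        suu += u[i] * u[i];
        suv += u[i] * v[i];
    }
    b = (n * suv - su * sv) / (n * suu - su * su);
    a = (sv - b * su) / n;
}

// Residual standard deviation on the original scale of y, with t = 1..n.
template <class Predict>
static double residualStdDev(const std::vector<double> &y, Predict predict)
{
    double sse = 0;
    for (std::size_t i = 0; i < y.size(); ++i) {
        const double d = y[i] - predict(static_cast<double>(i + 1));
        sse += d * d;
    }
    return std::sqrt(sse / static_cast<double>(y.size() - 2));
}

// Fits both curves and returns the one with the smaller residual
// standard deviation.
CurveFit bestFit(const std::vector<double> &y)
{
    assert(y.size() >= 3);
    std::vector<double> t(y.size()), logY(y.size(), 0.0);
    bool allPositive = true;
    for (std::size_t i = 0; i < y.size(); ++i) {
        t[i] = static_cast<double>(i + 1);
        allPositive = allPositive && y[i] > 0;
        if (allPositive) logY[i] = std::log(y[i]);
    }

    double a = 0, b = 0;
    leastSquares(t, y, a, b);
    CurveFit best{"linear", a, b,
                  residualStdDev(y, [a, b](double x) { return a + b * x; })};

    if (allPositive) {       // the log transform needs positive data
        leastSquares(t, logY, a, b);
        const double scale = std::exp(a);
        const double sd = residualStdDev(
            y, [scale, b](double x) { return scale * std::exp(b * x); });
        if (sd < best.residualStdDev)
            best = CurveFit{"exponential", scale, b, sd};
    }
    return best;
}
```

A series that doubles in every period is reported as exponential, and a series that grows by a constant amount is reported as linear. The same comparison extends to the other curve types once their equations are known.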

Header file: idmscuni.hpp

Format:

#define IDMS_PERIOD_FIELD "PERIOD"
#define IDMS_FITTED_FIELD "FITTED"
#define IDMS_RESIDUAL_FIELD "RESIDUAL"

typedef enum {
    IDM_CURVE_LINEAR,
    IDM_CURVE_EXPONENTIAL,
    IDM_CURVE_POWER,
    IDM_CURVE_HYPERBOLA,
    IDM_CURVE_RECIPROCAL,
    IDM_CURVE_RATIONAL,
    IDM_CURVE_BESTFIT
} IDM_StCurveType;

class IDMStatUnivariateCurve : public IDMSettings
{
  protected:
    ISequence<IString> ivOrderedBy;
    IString ivDependentVariable;
    ISequence<IString> ivOutputFields;
    IString ivPeriodField;
    IString ivFittedField;
    IString ivResidualField;

    IDM_StCurveType ivCurveType;
    IDMINTEGER ivForecastPeriod;
    IDMINTEGER ivSeasonalPeriod;

  public:
    IDMStatUnivariateCurve();

    IDMStatUnivariateCurve(
        IDMRETURN &rc,
        IString name,
        IDMMiningBase *pMnb,
        IDMData *pInputData,
        IDMSelections &selection,
        ISequence<IString> &orderedBy,
        IDMData *pOutputData,
        ISequence<IString> &outputFields,
        IString dependentVariable,
        IDM_StCurveType curveType,
        IDMINTEGER forecastPeriod,
        IDMINTEGER seasonalPeriod,
        IString periodField = IDMS_PERIOD_FIELD,
        IString fittedField = IDMS_FITTED_FIELD,
        IString residualField = IDMS_RESIDUAL_FIELD);

    virtual ~IDMStatUnivariateCurve();

    static IDMRETURN createObject(
        IString name,
        IDMMiningBase *pMnb,
        IDMData *pInputData,
        IDMSelections &selection,
        ISequence<IString> &orderedBy,
        IDMData *pOutputData,
        ISequence<IString> &outputFields,
        IString dependentVariable,
        IDM_StCurveType curveType,
        IDMINTEGER forecastPeriod,
        IDMINTEGER seasonalPeriod,
        IDMStatUnivariateCurve *&pUNI,
        IString periodField = IDMS_PERIOD_FIELD,
        IString fittedField = IDMS_FITTED_FIELD,
        IString residualField = IDMS_RESIDUAL_FIELD);

    IDMRETURN update(
        IString name,
        IDMData *pInputData,
        IDMSelections &selection,
        ISequence<IString> &orderedBy,
        IDMData *pOutputData,
        ISequence<IString> &outputFields,
        IString dependentVariable,
        IDM_StCurveType curveType,
        IDMINTEGER forecastPeriod,
        IDMINTEGER seasonalPeriod,
        IString periodField = IDMS_PERIOD_FIELD,
        IString fittedField = IDMS_FITTED_FIELD,
        IString residualField = IDMS_RESIDUAL_FIELD);

    IDMRETURN get(
        IString &name,
        IDMMiningBase *&pMnb,
        IDMData *&pInputData,
        IDMSelections &selection,
        ISequence<IString> &orderedBy,
        IDMData *&pOutputData,
        ISequence<IString> &outputFields,
        IString &dependentVariable,
        IDM_StCurveType &curveType,
        IDMINTEGER &forecastPeriod,
        IDMINTEGER &seasonalPeriod,
        IString &periodField,
        IString &fittedField,
        IString &residualField);

};

Data members:

ivOrderedBy
    A sequence of order fields that orders the observations into the different time periods and seasons.

ivDependentVariable
    The name of the field specified for the dependent variable.

ivOutputFields
    A sequence collection of field names that appear in the produced output, if pOutputData is specified.

ivPeriodField
    The name of the field in the output data into which the period is written.

ivFittedField
    The name of the field in the output data into which the fitted value is written.

ivResidualField
    The name of the field in the output data into which the residual of the observed and fitted value is written.

ivCurveType
    Specifies the curve type the function tries to fit.

ivForecastPeriod
    Specifies the number of periods for which forecasting is required. If set to 0, no forecast is done.

ivSeasonalPeriod
    Specifies the number of seasonal periods for a seasonal model. The value must be between 2 and 13. If set to 0, no seasonal model is used.
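The value rules for ivForecastPeriod and ivSeasonalPeriod can be checked before a settings object is built. The following is a minimal sketch only: the helper names are hypothetical and not part of the Intelligent Miner API, plain long stands in for IDMINTEGER, and the range 2 to 13 is assumed to be inclusive.

```cpp
#include <cassert>

// Hypothetical helpers; they are not part of the Intelligent Miner API.

// 0 selects "no seasonal model"; any other value must lie between 2 and 13
// (the bounds are assumed to be inclusive).
bool isValidSeasonalPeriod(long seasonalPeriod) {
    return seasonalPeriod == 0 || (seasonalPeriod >= 2 && seasonalPeriod <= 13);
}

// A forecast is produced only when the number of periods is not 0.
bool isForecastRequested(long forecastPeriod) {
    return forecastPeriod != 0;
}
```

For example, a seasonal period of 12 (monthly data) passes the check, while 1 and 14 do not.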

Member functions:

IDMStatUnivariateCurve()
    The default constructor.

IDMStatUnivariateCurve(IDMRETURN &rc, IString name,...)
    Constructs a univariate curve fit object with the given parameters. The object is added to the Statistics-settings IKeySortedSet collection that is located in the mining base (class IDMMiningBase). An error has occurred if the return code is not equal to IDM_SUCCESS. The univariate curve fit object should be deleted using deleteObject().

~IDMStatUnivariateCurve()
    The destructor.

createObject
    Constructs a univariate curve fit object with given values and returns it, if no error occurred. The object is added to the Statistics-settings IKeySortedSet collection that is located in the mining base (class IDMMiningBase). If an error occurred, the object is deleted and pUNI is set to NULL.





IDMStatPrinComAnalysis

Principal Component Analysis seeks the standardized linear combination of the original variables which can be used to summarize the data and identify linear relationships among variables. This attempt to reduce dimensionality can be used to reduce the number of variables in regression, clustering, and other analysis methods for multivariate data.


1. Specify more than two fields.

2. Analysis target

   Select:

   Correlation
       To calculate principal components from the correlation matrix.

   Covariance
       To calculate principal components from the covariance matrix.
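The difference between the two analysis targets can be illustrated outside the API. The sketch below is an illustration only and does not show how Intelligent Miner computes its matrices; the type and function names are hypothetical. It builds a sample covariance matrix and rescales it to a correlation matrix, so that every variable has unit variance. It assumes at least two observations and no constant column.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper type: one inner vector per row.
using Matrix = std::vector<std::vector<double>>;

// Sample covariance matrix of the columns of `rows` (one row per observation).
Matrix covarianceMatrix(const Matrix& rows) {
    const std::size_t n = rows.size();
    assert(n >= 2);
    const std::size_t p = rows[0].size();
    std::vector<double> mean(p, 0.0);
    for (const auto& r : rows)
        for (std::size_t j = 0; j < p; ++j) mean[j] += r[j] / n;
    Matrix cov(p, std::vector<double>(p, 0.0));
    for (const auto& r : rows)
        for (std::size_t j = 0; j < p; ++j)
            for (std::size_t k = 0; k < p; ++k)
                cov[j][k] += (r[j] - mean[j]) * (r[k] - mean[k]) / (n - 1);
    return cov;
}

// Correlation matrix: the covariance matrix rescaled by the standard
// deviations, so the scale of each variable no longer matters.
Matrix correlationMatrix(const Matrix& cov) {
    const std::size_t p = cov.size();
    Matrix cor(p, std::vector<double>(p, 0.0));
    for (std::size_t j = 0; j < p; ++j)
        for (std::size_t k = 0; k < p; ++k)
            cor[j][k] = cov[j][k] / std::sqrt(cov[j][j] * cov[k][k]);
    return cor;
}
```

When the fields are measured in very different units, the covariance matrix is dominated by the field with the largest variance, whereas the correlation matrix weights all fields equally.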

Header file: idmscpri.hpp

Format:
#define IDMS_PRINCIPLE_COMP_FIELD_PREFIX "COMP_"

typedef enum { IDM_FROM_CORRELATION,
               IDM_FROM_COVARIANCE } IDM_StPrinComFrom;

class IDMStatPrinComAnalysis : public IDMSettings
{
protected:
    ISequence<IString> ivSelectedColSeq; // selected columns
    IDM_StPrinComFrom ivPrinComFrom; // target analysis
    ISequence<IString> ivOutputFields;
    IString ivPrincipalCompFieldPrefix;

public:
    IDMStatPrinComAnalysis();
    IDMStatPrinComAnalysis(
        IDMRETURN &rc,
        IString name,
        IDMMiningBase *pMnb,
        IDMData *pInputData,
        IDMSelections &selection,
        IDMData *pOutputData,
        ISequence<IString> &outputFields,
        ISequence<IString> &selectedColSeq,
        IDM_StPrinComFrom prinComFrom,
        IString principalCompFieldPrefix = IDMS_PRINCIPLE_COMP_FIELD_PREFIX);

    virtual ~IDMStatPrinComAnalysis();

    static IDMRETURN createObject(
        IString name,
        IDMMiningBase *pMnb,
        IDMData *pInputData,
        IDMSelections &selection,
        IDMData *pOutputData,
        ISequence<IString> &outputFields,
        ISequence<IString> &selectedColSeq,
        IDM_StPrinComFrom prinComFrom,
        IDMStatPrinComAnalysis *&pPRI,
        IString principalCompFieldPrefix = IDMS_PRINCIPLE_COMP_FIELD_PREFIX);

    IDMRETURN update(
        IString name,
        IDMData *pInputData,
        IDMSelections &selection,
        IDMData *pOutputData,
        ISequence<IString> &outputFields,
        ISequence<IString> &selectedColSeq,
        IDM_StPrinComFrom prinComFrom,
        IString principalCompFieldPrefix = IDMS_PRINCIPLE_COMP_FIELD_PREFIX);

    IDMRETURN get(
        IString &name,
        IDMMiningBase *&pMnb,
        IDMData *&pInputData,
        IDMSelections &selection,
        IDMData *&pOutputData,
        ISequence<IString> *&outputFields,
        ISequence<IString> &selectedColSeq,
        IDM_StPrinComFrom &prinComFrom,
        IString &principalCompFieldPrefix);

};

Data members:

ivSelectedColSeq
    Lists the fields to be used for the analysis.

ivPrinComFrom
    Specifies the analysis target.

ivOutputFields
    A sequence collection of field names that appear in the produced output, if pOutputData is specified.

ivPrincipalCompFieldPrefix
    The name of the field prefix in the output data into which the principal component scores are written.

Member functions:

IDMStatPrinComAnalysis()
    The default constructor.

IDMStatPrinComAnalysis(IDMRETURN &rc, IString name,...)
    Constructs a principal component analysis object with the given parameters. The object is added to the Statistics-settings IKeySortedSet collection that is located in the mining base (class IDMMiningBase). An error has occurred if the return code is not equal to IDM_SUCCESS. The principal component analysis object should be deleted using deleteObject().

~IDMStatPrinComAnalysis()
    The destructor.

createObject
    Constructs a principal component analysis object with given values and returns it, if no error occurred. The object is added to the Statistics-settings IKeySortedSet collection that is located in the mining base (class IDMMiningBase). If an error occurred, the object is deleted and pPRI is set to NULL.





IDMStatFactorAnalysis

The purpose of factor analysis is to describe, if possible, the relationships among many variables in terms of a few underlying, but unobservable, random quantities called factors.

For example, correlations from the group of test scores in French, English, Mathematics, and music classes collected by Spearman suggested an underlying intelligence factor. A second group of variables, representing physical-fitness scores, if available, might correspond to another factor.


1. Specify more than two fields.

2. Change diagonal

   Select:

   NO      If you do not want to change the leading diagonal of the correlation matrix.

   SMC     If you want the Squared Multiple Correlation coefficient in the leading diagonal.

   MAR     If you want the Maximum Absolute Row value in the leading diagonal.

3. Determination of the number of factors

   Select:

   AUTO    If you want the system to determine the number of factors.

   PEV     If you want to retain as many factors as are needed to explain at least p% of the variance. You must also specify a real number for p with the percentage value of variance you want the model to explain.

   RETAIN  If you want to retain a specified number of factors. You must specify an integer value for the required number of factors. The number must not exceed the number of variables being analyzed.

4. Factor rotation

   Select:

   NO      If you do not want to rotate the factor matrix.

   QRM     For a Quartimax rotation.

   VRM     For a Varimax rotation.

5. Show factor structure

   Select:

   YES     If you want to see the structures of the factors. This rearranges the correlation matrix by grouping the variables by their dominant factor and arranging them as percentages in descending order based on the influence that the factor has on them.
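The PEV option can be read as a simple cumulative rule. The following sketch is not the product's implementation. It assumes that the variance explained by a factor is measured by its eigenvalue, and that the eigenvalues are non-negative and sorted in descending order; the function name factorsForPEV is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns how many leading factors are needed so that
// their share of the total variance reaches at least `pev` percent.
// `eigenvalues` is assumed to be sorted in descending order.
std::size_t factorsForPEV(const std::vector<double>& eigenvalues, double pev) {
    double total = 0.0;
    for (double e : eigenvalues) total += e;
    assert(total > 0.0);

    double explained = 0.0;
    std::size_t count = 0;
    // Add factors one at a time until the requested percentage is reached.
    while (count < eigenvalues.size() && 100.0 * explained / total < pev) {
        explained += eigenvalues[count];
        ++count;
    }
    return count;
}
```

With eigenvalues 2.5, 1.0, 0.3, and 0.2, one factor explains 62.5% of the variance and two factors explain 87.5%, so a PEV value of 80 retains two factors.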

Header file: idmscfac.hpp

Format:
#define IDMS_FACTOR_FIELD_PREFIX "FACT_"

typedef enum { IDM_NO_CHANGE_DIAGONAL,
               IDM_SMC_DIAGONAL,
               IDM_MAR_DIAGONAL } IDM_StDiagonalChange;

typedef enum { IDM_AUTO_SELECT_NFACTOR,
               IDM_PRE_PEV_NFACTOR,
               IDM_PRE_RETAIN_NFACTOR } IDM_StHowNumberOfFactor;

typedef enum { IDM_NO_ROTATE,
               IDM_VRM_ROTATE,
               IDM_QRM_ROTATE } IDM_StFactorRotation;

typedef enum { IDM_TRAINING_MODE,
               IDM_TEST_MODE,
               IDM_APPLICATION_MODE } IDM_UseMode;

class IDMStatFactorAnalysis : public IDMSettings
{
protected:
    IDM_UseMode ivUseMode;
    ISequence<IString> ivSelectedColSeq; // selected columns
    ISequence<IString> ivOutputFields;
    IDM_StDiagonalChange ivDiagonalChange; // diagonal change
    IDM_StHowNumberOfFactor ivHowNumberOfFactor; // how to decide # of factors
    IDMDOUBLE rvPEV; // PEV
    IDMINTEGER ivRetain; // retain
    IDM_StFactorRotation ivFactorRotation; // factor rotation
    IDMBOOLEAN ivFactorStructure; // show factor structure?
    IString ivFactorFieldPrefix;
    IString ivFactorAnalysisResult;

public:
    IDMStatFactorAnalysis();
    IDMStatFactorAnalysis(
        IDMRETURN &rc,
        IString name,
        IDMMiningBase *pMnb,
        IDMData *pInputData,
        IDMSelections &selection,
        IDM_UseMode useMode,
        IDMData *pOutputData,
        ISequence<IString> &outputFields,
        ISequence<IString> &selectedColSeq,
        IDM_StDiagonalChange diagonalChange,
        IDM_StHowNumberOfFactor howNumberOfFactor,
        IDMDOUBLE PEV,
        IDMINTEGER retain,
        IDM_StFactorRotation factorRotation,
        IDMBOOLEAN factorStructure,
        IString factorAnalysisResult,
        IString factorFieldPrefix = IDMS_FACTOR_FIELD_PREFIX);

    virtual ~IDMStatFactorAnalysis();

    static IDMRETURN createObject(
        IString name,
        IDMMiningBase *pMnb,
        IDMData *pInputData,
        IDMSelections &selection,
        IDM_UseMode useMode,
        IDMData *pOutputData,
        ISequence<IString> &outputFields,
        ISequence<IString> &selectedColSeq,
        IDM_StDiagonalChange diagonalChange,
        IDM_StHowNumberOfFactor howNumberOfFactor,
        IDMDOUBLE PEV,
        IDMINTEGER retain,
        IDM_StFactorRotation factorRotation,
        IDMBOOLEAN factorStructure,
        IString factorAnalysisResult,
        IDMStatFactorAnalysis *&pFAC,
        IString factorFieldPrefix = IDMS_FACTOR_FIELD_PREFIX);

    IDMRETURN update(
        IString name,
        IDMData *pInputData,
        IDMSelections &selection,
        IDM_UseMode useMode,
        IDMData *pOutputData,
        ISequence<IString> &outputFields,
        ISequence<IString> &selectedColSeq,
        IDM_StDiagonalChange diagonalChange,
        IDM_StHowNumberOfFactor howNumberOfFactor,
        IDMDOUBLE PEV,
        IDMINTEGER retain,
        IDM_StFactorRotation factorRotation,
        IDMBOOLEAN factorStructure,
        IString factorAnalysisResult,
        IString factorFieldPrefix = IDMS_FACTOR_FIELD_PREFIX);

    IDMRETURN get(
        IString &name,
        IDMMiningBase *&pMnb,
        IDMData *&pInputData,
        IDMSelections &selection,
        IDM_UseMode &useMode,
        IDMData *&pOutputData,
        ISequence<IString> *&outputFields,
        ISequence<IString> &selectedColSeq,
        IDM_StDiagonalChange &diagonalChange,
        IDM_StHowNumberOfFactor &howNumberOfFactor,
        IDMDOUBLE &PEV,
        IDMINTEGER &retain,
        IDM_StFactorRotation &factorRotation,
        IDMBOOLEAN &factorStructure,
        IString &factorAnalysisResult,
        IString &factorFieldPrefix);

};

Data members:


ivUseMode
    Specifies the mode the factor analysis should be run in. IDM_TRAINING_MODE means that the factors are built. IDM_APPLICATION_MODE means that each observation of the input fields is transformed to factors according to the regression model.

ivOutputFields
    A sequence collection of field names that appear in the produced output, if pOutputData is specified. Output data can be produced in the use modes IDM_TRAINING_MODE and IDM_APPLICATION_MODE.

ivFactorFieldPrefix
    The name of the field prefix in the output data into which the factor scores are written.

ivSelectedColSeq
    Lists the fields to be used for the analysis.

ivDiagonalChange
    Specifies the diagonal change method.

ivHowNumberOfFactor
    Specifies how to determine the number of factors.

rvPEV
    Specifies the percentage value of variance the model is required to explain if ivHowNumberOfFactor is specified as IDM_PRE_PEV_NFACTOR.

ivRetain
    Specifies the number of factors the model is required to retain if ivHowNumberOfFactor is specified as IDM_PRE_RETAIN_NFACTOR.

ivFactorRotation
    Specifies the method of factor rotation.

ivFactorStructure
    Flag for the calculation of the factor structure.

ivFactorAnalysisResult
    The name of the result object. In use mode IDM_APPLICATION_MODE, the statistical information is read from this result and used to calculate the output data. In IDM_APPLICATION_MODE, no result is written; only the output data is produced.

Member functions:

IDMStatFactorAnalysis()
    The default constructor.

IDMStatFactorAnalysis(IDMRETURN &rc, IString name,...)
    Constructs a factor analysis object with the given parameters. The object is added to the Statistics-settings IKeySortedSet collection that is located in the mining base (class IDMMiningBase). An error has occurred if the return code is not equal to IDM_SUCCESS. The factor analysis object should be deleted using deleteObject().

~IDMStatFactorAnalysis()
    The destructor.

createObject
    Constructs a factor analysis object with given values and returns it, if no error occurred. The object is added to the Statistics-settings IKeySortedSet collection that is located in the mining base (class IDMMiningBase). If an error occurred, the object is deleted and pFAC is set to NULL.




Repeatable sequences settings

You can use repeatable sequences to combine multiple settings objects into a single logical unit. The settings objects (sequence elements) can include multiples of each of the preprocessing functions, statistical functions, mining run classes, or additional repeatable sequences. You can run all sequence elements at the server with a single invocation from the client.

A sequence object which is contained in the collection of settings objects of another sequence object is called a nested sequence.

IDMSequence

This class contains an ordered collection of Intelligent Miner settings objects (sequence elements). It provides methods for adding and removing sequence elements and for running a subset of or all of the elements in the collection. To preserve the semantics of a single logical unit, a sequence terminates by default if one of the elements fails to complete successfully. This behavior can be overridden by setting the stopOnFailure attribute to IDM_FALSE, either at construction time or with the stopOnFailure method.

The start member function runs the repeatable sequence. Only sequence elements for which the enableForStart struct member is set to IDM_TRUE are run.

If the numberOfProcesses parameter is specified with a value other than 0, it is treated as 0 because repeatable sequences do not support parallel processes. For more details on the numberOfProcesses parameter, refer to the description of the start method under IDMSettings.

The stop member function ends the run if it was started asynchronously. An attempt is made to stop the currently running sequence element. This method does not return until the currently running sequence element is stopped or completes normally, whichever occurs first.

Header file: idmcrseq.hpp

Format:
struct SeqElement
{
    IDMSettings *settingsObject;
    IDMBOOLEAN enableForStart;
};

class IDMSequence : public IDMSettings
{
public:
    IDMSequence();
    IDMSequence( IDMRETURN &rc
               , const IString &sequenceObjName
               , IDMMiningBase *const pMiningBase
               , const ISequence< SeqElement > &seqElements = ISequence< SeqElement >()
               , IDMBOOLEAN stopOnFailure = IDM_TRUE );

    virtual ~IDMSequence();

    static IDMRETURN createObject( const IString &name
                                 , IDMMiningBase *const pMiningBase
                                 , const ISequence< SeqElement > &seqElements
                                 , IDMBOOLEAN stopOnFailure
                                 , IDMSequence *&pSeq );

    IDMRETURN optimizeForTime( IDMBOOLEAN optTime );

    IDMRETURN setResultName( IString resName, IDMBOOLEAN overwriteResult );

    IDMRETURN update( const IString &sequenceObjName
                    , const ISequence< SeqElement > &seqElements
                    , IDMBOOLEAN stopOnFailure );

    IDMRETURN stopOnFailure( IDMBOOLEAN isStoppedOnFailure );

    IDMRETURN setSeqElements( const ISequence< SeqElement > &seqElements );

    IDMRETURN get( IString &objName
                 , IDMMiningBase *&pMiningBase
                 , ISequence< SeqElement > &seqElements
                 , IDMBOOLEAN &stopOnFailure ) const;

    IDMBOOLEAN isStoppedOnFailure() const;

    IDMRETURN getSeqElements( ISequence< SeqElement > &seqElements ) const;

    IDMRETURN getSeqResults( ISequence<IString> &seqResultNames );

    IDMRETURN startAll( IDMBOOLEAN syncRunFlag=IDM_TRUE, IDMINTEGER traceLevel=0 );

    IString const& keySeqName() const;
};


struct SeqElement

The elements in the collection contained in an IDMSequence object are of type SeqElement.

Struct members:

settingsObject
    A pointer to an instance, located in the mining base, of a subclass of IDMSettings.

enableForStart
    If set to IDM_TRUE, this flag indicates that the sequence element is included when the sequence is run using the start() method or the startAll() method. If set to IDM_FALSE, the sequence element is only included when the sequence is run using the startAll() method.
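The interaction of enableForStart, start(), startAll(), and the stopOnFailure attribute can be modeled in a few lines. The sketch below is a stand-alone simulation, not the Intelligent Miner classes: SketchElement and runSequence are hypothetical names, and a callable that returns true or false stands in for running a settings object at the server.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for SeqElement: a named task plus the enable flag.
struct SketchElement {
    std::string name;
    std::function<bool()> run;   // returns true if the element completed successfully
    bool enableForStart;
};

// Simulates start() (runAll == false) and startAll() (runAll == true).
// Returns the names of the elements that were actually run, in order.
std::vector<std::string> runSequence(const std::vector<SketchElement>& elements,
                                     bool runAll, bool stopOnFailure) {
    std::vector<std::string> executed;
    for (const auto& element : elements) {
        // start() skips elements whose enableForStart flag is not set.
        if (!runAll && !element.enableForStart) continue;
        executed.push_back(element.name);
        // By default the sequence terminates on the first failing element.
        if (!element.run() && stopOnFailure) break;
    }
    return executed;
}
```

In this model, a disabled element that would fail is never reached by start(), but it ends a startAll() run early unless stopOnFailure is switched off.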

Data members:

ivSeqElements
    Specifies the settings objects that make up the sequence and which settings objects should be run when the start() method is issued against the sequence.

ivStopOnFailure
    Specifies whether the sequence will continue running if one of the sequence elements fails to complete successfully.

Member functions:

IDMSequence()
    Default constructor.

IDMSequence(IDMRETURN &rc, const IString &sequenceObjName, IDMMiningBase *const pMiningBase, ...)
    Constructs a repeatable sequence object with the given values.

    The collection of sequence elements can be provided in the constructor, or manipulated after construction using the getSeqElements and setSeqElements methods.

~IDMSequence()
    The destructor.

stopOnFailure
    Sets the behavior of the sequence when one of the sequence elements fails to complete successfully.

setSeqElements
    Replaces the collection of sequence elements.

optimizeForTime
    Returns a warning because ivOptimizedForTime is not an attribute that is used by the repeatable sequence.

setResultName
    Returns IDM_ERROR because ivResultName should never be set for a repeatable sequence object. Repeatable sequences do not generate a result.

get
    Returns the values of the object's parameters.

isStoppedOnFailure
    Queries the behavior of the sequence when one of the sequence elements fails to complete successfully.

getSeqElements
    Returns the collection of sequence elements.

    The collection can be manipulated as necessary for later use with the setSeqElements method.

getSeqResults
    Returns a collection of result names.

    The collection consists of the non-NULL result name data members contained in each of the sequence elements, regardless of nesting, provided the enableForStart struct member of the sequence element is IDM_TRUE.
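The rule that getSeqResults applies can be pictured as a recursive walk over the sequence elements. This is a sketch under stated assumptions, not the library's code: SketchNode and collectResultNames are hypothetical, an empty string stands in for a NULL result name, and the elements of a nested sequence are assumed to be skipped when the nested sequence itself is not enabled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a sequence element and the settings object it points to.
struct SketchNode {
    std::string resultName;            // empty stands in for a NULL result name
    bool enableForStart;
    std::vector<SketchNode> children;  // non-empty only for a nested sequence
};

// Appends the result names of all enabled elements, descending into nested sequences.
void collectResultNames(const std::vector<SketchNode>& elements,
                        std::vector<std::string>& names) {
    for (const auto& element : elements) {
        if (!element.enableForStart) continue;          // disabled elements contribute nothing
        if (!element.resultName.empty()) names.push_back(element.resultName);
        collectResultNames(element.children, names);    // "regardless of nesting"
    }
}
```

A nested sequence has no result name of its own, because repeatable sequences do not generate a result, so only the names of its elements appear in the collection.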

startAll
    Runs the repeatable sequence. All sequence elements are run, regardless of the enableForStart struct member setting.

    In all other respects, this method is identical to start.


Chapter 3. The Result API

The Result API provides a set of C++ classes and C structures that represent the results of a mining run together with C++ and C functions for retrieving and sorting them. It provides the basis for writing routines to export data to other software products.

This Result API is independent of the Environment Layer API. The independence makes it possible to create conversion routines for mining results that run on platforms different from the current client or server platforms of Intelligent Miner.

To use the Result API, first compute a mining result, which is represented by an IDMResult object, and then export it to a result file. The Result API works on the exported result file.

Result APIs for associations and sequential patterns

The Result API for associations and sequential patterns provides operations for retrieving association rules, frequent item sets, and sequential patterns from the exported result files. Because these operations run directly on the result files, they do not rely on the objects and methods for data definition and mining.

The Result API for associations and sequential patterns results generated by earlier versions of the Intelligent Miner is shipped with the Intelligent Miner Version 6. However, you should use the new Result API shipped with the Intelligent Miner Version 6 because the API of earlier versions will not be supported in the future.

The following libraries contain the Result APIs:

libidmex.a
    V2 API AIX

libasres.a
    V6 API AIX

idmex.lib
    V2 API WIN32 and OS/2

idmasres.lib
    V6 API WIN32 and OS/2

idmex.dll
    V2 API additionally for OS/2

idmasres.dll
    Additional V6 API for OS/2

The following header files contain the result structures and the functions:

idmxpasr.h
    V2 API

idmasres.h
    V6 API


The message catalog for these libraries is called IDMparse.cat for the V2 API. The catalogs for the U.S. English version are located in the directory /usr/lpp/IMiner/bin. The catalogs for the other supported languages are located in the respective directories.

A Java Result API for associations and sequential patterns results is stored in idmasres.jar.

Result APIs for associations and sequential patterns Version 2

For association rules, the Intelligent Miner Version 2 offers the following features to enhance the output of the rules:
- Rule ordering
- Rule grouping
- Rule filtering

The Intelligent Miner Version 6 no longer supports these features.

Rule ordering

Rules can be ordered according to their values for confidence or support. User-defined orderings, like the product of confidence and support, are supported too.

Rule grouping

Two items I₁ and I₂ belong to the same group G if one of the following is the case:
- They appear in the same association rule.
- There are items I’₁ and I’₂ belonging to G, and there are association rules R₁ and R₂ such that I₁ and I’₁ occur in R₁, and I₂ and I’₂ occur in R₂.

Items in one group are directly or indirectly interrelated by association rules.

Given such a partitioning of the item set, the set of association rules can also be partitioned into disjoint groups. The rules inside one group are ordered according to the specified ordering criterion.

The groups themselves can also be ordered according to this criterion, either with respect to the values for the best rule of each group or the average value for the rules in a group. Other sorting criteria, like the size of the groups, are also possible.

For example, given the rules

R₁ : beer, soda → wine,
R₂ : beer → soda,
R₃ : cereals, banana → apple,
R₄ : banana → orange,
R₆ : wine → cider,

you can see that the rules R₁, R₂, and R₆ belong to one group and the rules R₃ and R₄ belong to the other group.
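The grouping in this example can be reproduced with a union-find structure over the items. The sketch below is one possible way to compute such groups and is not taken from the Intelligent Miner code; the names Rule, findRoot, and groupRules are hypothetical, and each rule is represented simply as the list of all its items, head and body together.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical representation: all items of a rule, head and body together.
using Rule = std::vector<std::string>;

// Union-find lookup with path compression; unknown items start as their own root.
std::string findRoot(std::map<std::string, std::string>& parent, const std::string& item) {
    auto it = parent.find(item);
    if (it == parent.end()) {
        parent[item] = item;
        return item;
    }
    if (it->second == item) return item;
    const std::string next = it->second;
    const std::string root = findRoot(parent, next);
    parent[item] = root;
    return root;
}

// Returns one group number per rule; groups are numbered in order of first appearance.
std::vector<int> groupRules(const std::vector<Rule>& rules) {
    std::map<std::string, std::string> parent;
    for (const auto& rule : rules) {
        assert(!rule.empty());
        for (const auto& item : rule) {
            // Items that occur in the same rule are merged into one group.
            const std::string first = findRoot(parent, rule.front());
            const std::string other = findRoot(parent, item);
            parent[other] = first;
        }
    }
    std::map<std::string, int> groupOfRoot;
    std::vector<int> groups;
    for (const auto& rule : rules) {
        const std::string root = findRoot(parent, rule.front());
        const int nextId = static_cast<int>(groupOfRoot.size());
        groups.push_back(groupOfRoot.insert({root, nextId}).first->second);
    }
    return groups;
}
```

Applied to the five rules above, the function puts the first, second, and last rule into group 0 and the remaining two rules into group 1, because wine links the last rule to the beer and soda rules.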

Rule filtering

Filtering can be performed on items occurring in the rule head or rule body. This means that you can retrieve rules where head or body contain some specific items.

Filtering can have the effect that several groups of rules are split into subgroups. An option indicates whether these subgroups are computed or not.

Similar options also exist for frequent item sets and sequential patterns.
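Rule filtering on head or body items amounts to a membership test per rule. The following sketch only illustrates the idea; SketchRule, RulePart, and filterRules are hypothetical names and do not correspond to functions of the V2 API.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical stand-in for an association rule: body items imply head items.
struct SketchRule {
    std::vector<std::string> body;
    std::vector<std::string> head;
};

enum class RulePart { Head, Body };

// Keeps the rules whose head (or body) contains the given item.
std::vector<SketchRule> filterRules(const std::vector<SketchRule>& rules,
                                    RulePart part, const std::string& item) {
    std::vector<SketchRule> kept;
    std::copy_if(rules.begin(), rules.end(), std::back_inserter(kept),
                 [&](const SketchRule& rule) {
                     const auto& items = (part == RulePart::Head) ? rule.head : rule.body;
                     return std::find(items.begin(), items.end(), item) != items.end();
                 });
    return kept;
}
```

For the rules of the grouping example, filtering on the body item banana keeps two rules, and filtering on the head item wine keeps one.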

Data structures

The following C structures are provided in the V2 API to represent association rules, frequent item sets, sequential patterns, and related information.

Association rules structure

typedef enum {
    IDM_POSITIVE, IDM_NEGATIVE, IDM_NEUTRAL, IDM_RULE_NOT_SPECIFIED
} IDMRuleType;

typedef struct
{
    IDMINTEGER assocRuleId;   /* association rule identifier */
    IDMINTEGER groupId;       /* group ID */
    IDMREAL conf;             /* actual confidence */
    IDMREAL sup;              /* actual support */
    IDMINTEGER nHead;         /* number of items in the head */
    IDMCHAR **ppHeadItems;    /* array of items in the head */
    IDMINTEGER nBody;         /* number of items in the body */
    IDMCHAR **ppBodyItems;    /* array of the items in the body */
    IDMRuleType ruleKind;     /* type of the rule */
    IDMREAL pvalue;           /* subtractive lift */
    IDMREAL lift;             /* lift */
} IDMAssocRule;

Frequent item set structure

typedef struct
{
    IDMINTEGER largeItemSetId;   /* large itemset identifier */
    IDMINTEGER groupId;          /* group ID */
    IDMREAL sup;                 /* actual support */
    IDMINTEGER nItems;           /* number of items in the large itemset */
    IDMCHAR **ppItems;           /* array of items in the large itemset */
} IDMLargeItemSet;

Statistics structure

typedef struct
{
    IDMINTEGER noOfTrans;         /* Number of transactions */
    IDMINTEGER noOfCustomers;     /* Number of customers */
    IDMINTEGER noOfRules;         /* Number of rules generated */
    IDMINTEGER noOfSequences;     /* Number of sequential patterns */
    IDMINTEGER noOfLargeItemSets; /* Number of large itemsets */
    IDMINTEGER noOfItems;         /* Number of items in the database */
    IDMINTEGER noOfLargeItems;    /* Number of large items */
    IDMINTEGER noOfPasses;        /* Number of passes */
    IDMINTEGER tranMaxSize;       /* Maximum no of items per transaction */
    IDMREAL tranAvgSize;          /* Average no of items per transaction */
    IDMINTEGER tranPerCustMax;    /* Maximum no of transact per customer */
    IDMREAL tranPerCustAvg;       /* Average no of transact per customer */
    IDMINTEGER minSupTran;        /* Minimum support in Transactions */
    IDMREAL minSupPnct;           /* Minimum support as a Percentage */
    IDMINTEGER maxSupTran;        /* Maximum support in Transactions */
    IDMREAL maxSupPnct;           /* Maximum support as a Percentage */
    IDMREAL minConfidence;        /* Minimum confidence as a Percentage */
    IDMREAL maxConfidence;        /* Maximum confidence as a Percentage */
    IDMREAL execTime;             /* not supported */
    IDMCHAR *paramFile;           /* Name of the parameter file: NULL if */
                                  /* not specified in the input file */
    IDMCHAR **taxonomy;           /* Always NULL */
} IDMAssocStat;

Sort order enumeration

The enumeration IDMSortOrder specifies the possible sorting criteria supported by the V2 API function IDMSortAssocRules.

typedef enum {
    IDM_SUP,          /* Sort on Support */
    IDM_CONF,         /* Sort on Confidence */
    IDM_SUP_CONF,     /* Sort on Support x Confidence */
    IDM_RULE_HEAD,    /* Sort on Rule head */
    IDM_BODY_ITEMS,   /* Sort on Body items */
    IDM_G_SUP,        /* Sort on Group, and Support within each group */
    IDM_G_CONF,       /* Sort on Group, Confidence within each group */
    IDM_G_SUP_CONF,   /* Sort on Group, and Sup x Conf within each group */
    IDM_G_RULE_HEAD,  /* Sort on Group, Rule head within each group */
    IDM_G_BODY_ITEMS  /* Sort on Group, and Body items within each group */
} IDMSortOrder;
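The numeric sorting criteria can be pictured as comparators over an array of rule pointers, which is sorted in place. The sketch below is illustrative only and is not the implementation of IDMSortAssocRules. The SketchRule struct and the SKETCH_ constants are hypothetical stand-ins for a subset of the V2 definitions, and the sort direction (largest value first, groups in ascending order of group ID) is an assumption, because the text does not state it.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for the numeric fields of the V2 rule structure.
struct SketchRule {
    int groupId;
    double conf;
    double sup;
};

// Hypothetical subset of the sort orders.
enum SketchSortOrder { SKETCH_SUP, SKETCH_CONF, SKETCH_SUP_CONF, SKETCH_G_CONF };

// Value that a rule is ranked by under the given order.
double sortKey(const SketchRule& rule, SketchSortOrder order) {
    switch (order) {
        case SKETCH_SUP:      return rule.sup;
        case SKETCH_CONF:     return rule.conf;
        case SKETCH_G_CONF:   return rule.conf;
        case SKETCH_SUP_CONF: return rule.sup * rule.conf;
    }
    return 0.0;
}

// Sorts the pointer array in place; the rules themselves are not copied.
void sortRules(std::vector<SketchRule*>& rules, SketchSortOrder order) {
    std::sort(rules.begin(), rules.end(),
              [order](const SketchRule* a, const SketchRule* b) {
                  // Grouped orders compare the group first (assumed ascending).
                  if (order == SKETCH_G_CONF && a->groupId != b->groupId)
                      return a->groupId < b->groupId;
                  // Within a group, or for ungrouped orders: largest key first (assumed).
                  return sortKey(*a, order) > sortKey(*b, order);
              });
}
```

Sorting pointers rather than structures mirrors the array-of-pointers layout that the V2 functions use for their results.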

Sequential Patterns structure

typedef struct
{
    IDMINTEGER seqSetId;        /* sequence itemset identifier */
    IDMINTEGER groupId;         /* group ID */
    IDMREAL sup;                /* actual support */
    IDMINTEGER nItemsets;       /* number of itemsets in sequence */
    IDMINTEGER *pItInItemset;   /* number of items in the itemsets */
    IDMCHAR **ppItems;          /* array of items in the sequence */
} IDMSeqSet;

Structures of the Result API Version 6

The following C structures are provided in the Result API Version 6 to represent association rules, frequent item sets, sequential patterns, and related information. Rather than providing data structures, the Result API Version 6 provides functions to access detailed information. Therefore, the data structures represent opaque data types.

typedef enum {
    IDM_POSITIVE, IDM_NEGATIVE, IDM_NEUTRAL, IDM_RULE_NOT_SPECIFIED
} IDMRuleType;

typedef void *IDMAssocResult;      /* Represents an associations result */
typedef void *IDMSeqPatResult;     /* Represents a sequential pattern result */

typedef void *IDMAssocSeqPatStat;  /* Represents the result statistics */
typedef void *IDMRuleCursor;       /* A cursor to access association rules */
typedef void *IDMItemsetCursor;    /* A cursor to access frequent itemsets */
typedef void *IDMSeqPatCursor;     /* A cursor to access sequential patterns */

typedef void *IDMAssocRule;        /* Represents an association rule */
typedef void *IDMFrequentItemset;  /* Represents a frequent itemset */
typedef void *IDMSeqPattern;       /* Represents a sequential pattern */

typedef void *IDMItemsetArray;     /* Represents a sequence of itemsets */
typedef void *IDMItemArray;        /* Represents a set of items */


General functions for the Associations and Sequential Patterns Result API

The following general functions deal with the results of mining runs.

Functions of the Result API Version 2

IDMAsrParserInitialize

Before calling IDMAsrGetAssocRules, IDMAsrGetLargeItemSets, or IDMAsrGetSeqs, the function IDMAsrParserInitialize must be called. The input argument is the name of the file containing the results of a mining run.

The function returns a file ID that can be used in subsequent calls of the export API functions. Note that every call to IDMAsrParserInitialize generates a new file ID.

Format:
IDMRETURN IDMAsrParserInitialize(
    IDMOID *pFileOid,          /* out */
    IDMCHAR * achFileName )    /* in */

Parameters:

pFileOid
    A pointer to the generated ID for the result file.

achFileName
    The name of the result file.

IDMAsrParserCleanup

For every call to IDMAsrParserInitialize, the function IDMAsrParserCleanup must be called to clean up some internal memory structures.

Format:
IDMRETURN IDMAsrParserCleanup(
    IDMOID asrOid )    /* in */

Parameters:

asrOid
    The ID for the result file generated in IDMAsrParserInitialize.

IDMAsrGetErrorMsg

Returns the error ID and the error message when an error occurred during parsing. This function should be called if any other function returns with a return code not equal to IDM_SUCCESS.

Format:
void IDMAsrGetErrorMsg(
    IDMRETURN * errorId,     /* out */
    IDMCHAR ** errorMsg );   /* out */

Parameters:

errorId
    Error ID.

errorMsg
    The address of the pointer to the error message.

IDMAsrFreeErrorMsg

Frees the memory that was allocated to hold the error text errorMsg.

Format:
void IDMAsrFreeErrorMsg(
    IDMCHAR ** errorMsg );   /* in */

Parameters:

errorMsg
    The address of the pointer to the error message.


IDMAsrGetErrorMsg

Retrieves the error message as a string given a return code. The calling application is responsible for releasing the memory allocated for the error message.

void IDMAsrGetErrorMsg( IDMRETURN retCode,     /* in */
                        IDMCHAR **errorMsg )   /* out */

IDMOpenAssocResult

Before you can invoke any function on an associations result, IDMOpenAssocResult must be called.

IDMRETURN IDMOpenAssocResult ( IDMCHAR *resultFile,             /* in */
                               IDMAssocResult *resultObject);   /* out */

Parameters:

resultFile
    The path and the name of the result file.

resultObject
    A pointer to the object through which functions are called on the result.

IDMCloseAssocResult

Releases a result object. All memory allocated for the result object is released.

IDMRETURN IDMCloseAssocResult ( IDMAssocResult *resultObject);   /* in/out */

IDMOpenSeqPatResult

Before you can invoke any function on a sequential patterns result, you must call IDMOpenSeqPatResult.

IDMRETURN IDMOpenSeqPatResult ( IDMCHAR *resultFile,              /* in */
                                IDMSeqPatResult *resultObject);   /* out */

Parameters:

resultFile
    The path and the name of the result file.

resultObject
    A pointer to the object through which functions are called on the result.

IDMCloseSeqPatResult

Releases a result object. All memory allocated for the result object is released.

IDMRETURN IDMCloseSeqPatResult ( IDMSeqPatResult *resultObject);   /* in/out */

Functions for association rules

The following general functions deal with the association rules.


IDMAsrGetAssocRules

IDMAsrGetAssocRules retrieves the association rules from a result file.

Format:
IDMRETURN IDMAsrGetAssocRules(
    IDMOID asrOid,                   /* in */
    IDMREAL minConf,                 /* in */
    IDMREAL maxConf,                 /* in */
    IDMREAL minSup,                  /* in */
    IDMREAL maxSup,                  /* in */
    IDMREAL minSupxConf,             /* in */
    IDMREAL maxSupxConf,             /* in */
    IDMINTEGER* pnAssocRules,        /* out */
    IDMAssocRule*** pppAssocRules )  /* out */

Parameters:

asrOid
    The ID of the result file.

minConf
    The minimum confidence value for the retrieved rules in percent (%).

maxConf
    The maximum confidence value for the retrieved rules in percent (%).

minSup
    The minimum support value for the retrieved rules in percent (%).

maxSup
    The maximum support value for the retrieved rules in percent (%).

minSupxConf
    The minimum value of the product of support and confidence for the retrieved rules.

maxSupxConf
    The maximum value of the product of support and confidence for the retrieved rules.

pnAssocRules
    The number of association rules satisfying the constraints specified by the input parameters.

pppAssocRules
    A pointer to the pointer to the array of pointers to the retrieved association rules.

IDMSortAssocRules

IDMSortAssocRules sorts the association rules according to different sorting criteria without requiring additional memory.

Format:
IDMRETURN IDMSortAssocRules(
    IDMINTEGER* nRules,            /* in */
    IDMAssocRule** ppAssocRules,   /* in, out */
    IDMSortOrder sortOrder )       /* in */

Parameters:

nRules
    The number of association rules in ppAssocRules.

ppAssocRules
    A pointer to the array of pointers to the association rules to be sorted. After the execution the pointer points to the sorted association rules.

sortOrder
    Specifies the criterion according to which the association rules are to be sorted.

IDMAsrFreeAssocRules

IDMAsrFreeAssocRules frees the memory occupied by the association rules that were retrieved by IDMAsrGetAssocRules.

Format:
IDMRETURN IDMAsrFreeAssocRules(
    IDMAssocRule** ppAssocRules )   /* in */

Parameters:

ppAssocRules
    A pointer to the array of pointers to the association rules to be removed.


IDMOpenRuleCursor

Retrieves a cursor to access association rules.

IDMRETURN IDMOpenRuleCursor ( IDMAssocResult resultObject,   /* in */
                              IDMRuleCursor *ruleCursor);    /* out */


IDMCloseRuleCursor

Closes a rule cursor and releases all memory allocated for the cursor.

IDMRETURN IDMCloseRuleCursor ( IDMRuleCursor *ruleCursor);   /* in/out */

IDMGetNextRule

Retrieves the next rule from the rule cursor. If there are no more rules, the function returns IDM_WARNING and sets rule to NULL.

IDMRETURN IDMGetNextRule( IDMRuleCursor ruleCursor,   /* in */
                          IDMAssocRule *rule);        /* out */

IDMFreeRule

Releases all memory allocated for a rule.

IDMRETURN IDMFreeRule ( IDMAssocRule *rule);   /* in/out */

There is a number of functions to access the attributes of a rule:IDMREAL IDMGetRuleSupport ( IDMAssocRule rule);IDMREAL IDMGetRuleConfidence ( IDMAssocRule rule);IDMREAL IDMGetRuleLift ( IDMAssocRule rule);IDMREAL IDMGetRuleSubLift(IDMAssocRule rule);IDMRuleType IDMGetRuleType ( IDMAssocRule rule);

The API allocates memory in the following functions. The calling application isresponsible for releasing this memory.IDMRETURN IDMGetRuleHead ( IDMAssocRule rule, IDMFrequentItemset**head);IDMRETURN IDMGetRuleBody ( IDMAssocRule rule, IDMFrequentItemset**body);
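The rule cursor functions suggest an open, get-next, free, close sequence. The following C++ sketch illustrates that sequence only. Because the Intelligent Miner headers are not reproduced here, everything in the namespace sketch (types, function names, and the numeric values of the return codes) is a hypothetical stand-in for the corresponding IDM function, not the API itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins: the real IDM types, return codes, and headers are
// not reproduced here. Only the call sequence mirrors the cursor functions.
namespace sketch {

enum Return { OK = 0, WARNING = 1 };   // assumed values, for illustration only

struct Rule { double support; double confidence; };
struct RuleCursor { const std::vector<Rule>* rules; std::size_t pos; };

// Mirrors IDMOpenRuleCursor: creates a cursor positioned before the first rule.
Return openRuleCursor(const std::vector<Rule>& result, RuleCursor** cursor) {
    *cursor = new RuleCursor{&result, 0};
    return OK;
}

// Mirrors IDMGetNextRule: WARNING and a null rule signal that no rules remain.
Return getNextRule(RuleCursor* cursor, Rule** rule) {
    if (cursor->pos >= cursor->rules->size()) { *rule = nullptr; return WARNING; }
    *rule = new Rule((*cursor->rules)[cursor->pos++]);
    return OK;
}

// Mirrors IDMFreeRule: releases one rule and clears the caller's pointer.
Return freeRule(Rule** rule) { delete *rule; *rule = nullptr; return OK; }

// Mirrors IDMCloseRuleCursor: releases the cursor and clears the pointer.
Return closeRuleCursor(RuleCursor** cursor) { delete *cursor; *cursor = nullptr; return OK; }

}  // namespace sketch

// Counts the rules whose confidence reaches minConfidence. Each rule is freed
// as soon as it has been inspected, and the cursor is closed at the end.
int countConfidentRules(const std::vector<sketch::Rule>& result, double minConfidence) {
    sketch::RuleCursor* cursor = nullptr;
    if (sketch::openRuleCursor(result, &cursor) != sketch::OK) return -1;

    int count = 0;
    sketch::Rule* rule = nullptr;
    while (sketch::getNextRule(cursor, &rule) == sketch::OK) {
        if (rule->confidence >= minConfidence) ++count;
        sketch::freeRule(&rule);
    }
    sketch::closeRuleCursor(&cursor);
    return count;
}
```

The loop ends on the first return code other than OK, which covers the documented end-of-rules case. An application that must distinguish an error from the end of the rules would need to inspect the return code further.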

Functions for frequent item sets

The following general functions deal with frequent item sets.


IDMAsrGetLargeItemSets

IDMAsrGetLargeItemSets retrieves the frequent item sets from a result file.

Format:
IDMRETURN IDMAsrGetLargeItemSets(
    IDMOID asrOid,                        /* in */
    IDMREAL minSup,                       /* in */
    IDMREAL maxSup,                       /* in */
    IDMINTEGER* pnLargeItemSets,          /* out */
    IDMLargeItemSet*** pppLargeItemSets ) /* out */

Parameters:

asrOid
    The ID of a result file.

minSup
    The minimum support value for the retrieved frequent item sets.

maxSup
    The maximum support value for the retrieved frequent item sets.

pnLargeItemSets
    The number of frequent item sets satisfying the constraints on the support value.

pppLargeItemSets
    A pointer to the pointer to the array of pointers to the retrieved frequent item sets.

IDMSortLargeItemSets

IDMSortLargeItemSets sorts the frequent item sets according to the specified sorting criterion without requiring additional memory.

Format:
IDMRETURN IDMSortLargeItemSets(
    IDMLargeItemSet** ppLargeItemSets, /* in, out */
    IDMINTEGER* nLargeItemSets,        /* in */
    IDMSortOrder sortOrder )           /* in */

Parameters:

ppLargeItemSets
    A pointer to the array of pointers to the frequent item sets to be sorted. After execution, the pointer points to the sorted frequent item sets.

nLargeItemSets
    The number of frequent item sets in ppLargeItemSets.

sortOrder
    Specifies the criterion by which the frequent item sets are to be sorted.

IDMAsrFreeLargeItemSets

IDMAsrFreeLargeItemSets frees the memory occupied by the frequent item sets that were retrieved by IDMAsrGetLargeItemSets.

Format:
IDMRETURN IDMAsrFreeLargeItemSets(
    IDMLargeItemSet** ppLargeItemSets ) /* in */

Parameters:

ppLargeItemSets
    A pointer to the array of pointers to the frequent item sets to be removed.


IDMOpenAssocItemsetCursor

Retrieves a cursor to access the frequent item sets of an associations result.

IDMRETURN IDMOpenAssocItemsetCursor ( IDMAssocResult resultObject,      /* in */
                                      IDMItemsetCursor *itemsetCursor); /* out */

IDMOpenSeqPatItemsetCursor

Retrieves a cursor to access the frequent item sets of a sequential patterns result.

IDMRETURN IDMOpenSeqPatItemsetCursor ( IDMSeqPatResult resultObject,     /* in */
                                       IDMItemsetCursor *itemsetCursor); /* out */

IDMCloseItemsetCursor

Closes an item set cursor and releases all memory allocated for the cursor.

IDMRETURN IDMCloseItemsetCursor ( IDMItemsetCursor *itemsetCursor); /* in/out */

IDMGetNextItemset

Retrieves the next item set from the item set cursor. If there are no more item sets, the function returns IDM_WARNING and NULL.

IDMRETURN IDMGetNextItemset ( IDMItemsetCursor itemsetCursor, /* in */
                              IDMFrequentItemset *itemset);   /* out */

IDMFreeItemset

Releases all memory allocated for an item set.

IDMRETURN IDMFreeItemset ( IDMFrequentItemset *itemset); /* in/out */

The following functions access the attributes of an item set:

IDMINTEGER IDMGetItemsetGroup ( IDMFrequentItemset itemset);
IDMREAL IDMGetItemsetSupport ( IDMFrequentItemset itemset);
IDMREAL IDMGetItemsetLift ( IDMFrequentItemset itemset);
IDMREAL IDMGetItemsetSubLift ( IDMFrequentItemset itemset);

The API allocates memory in the following functions. The calling application is responsible for releasing this memory.

IDMRETURN IDMGetItemsetString ( IDMFrequentItemset itemset, IDMCHAR **itemsetString);
IDMRETURN IDMGetItemsetItems ( IDMFrequentItemset itemset, IDMItemArray **items);

/* Use this function to release an itemset string */
IDMRETURN IDMReleaseItemsetString ( IDMCHAR **itemsetString);

Statistics functions for the Associations and Sequential Patterns Result API

The following general statistics functions are provided for the Associations and Sequential Patterns Result API.


IDMAsrGetStatistics

IDMAsrGetStatistics retrieves statistical information about a mining run.

Format:
IDMRETURN IDMAsrGetStatistics(
    IDMOID asrOid,              /* in */
    IDMAssocStat** ppAssocStat ) /* out */

Parameters:

asrOid
    The ID of the result file containing the results of the association run whose statistical information is to be retrieved.

ppAssocStat
    A pointer to the statistics structure holding the statistical information.

IDMAsrFreeStatistics

IDMAsrFreeStatistics frees the memory occupied by the IDMAssocStat structure.

Format:
IDMRETURN IDMAsrFreeStatistics(
    IDMAssocStat* pAssocStat ) /* in */

Parameters:

pAssocStat
    A pointer to the IDMAssocStat structure to be removed.


IDMGetAssocStatistics

Retrieves the statistics of an associations result.

IDMRETURN IDMGetAssocStatistics ( IDMAssocResult resultObject,     /* in */
                                  IDMAssocSeqPatStat *statistics); /* out */

IDMGetSeqPatStatistics

Retrieves the statistics of a sequential patterns result.

IDMRETURN IDMGetSeqPatStatistics ( IDMSeqPatResult resultObject,    /* in */
                                   IDMAssocSeqPatStat *statistics); /* out */

IDMFreeStatistics

Releases all memory allocated for statistics.

IDMRETURN IDMFreeAssocSeqPatStatistics ( IDMAssocSeqPatStat *statistics);

You can access the attributes of the statistics by using the following functions:

IDMINTEGER IDMGetNumberOfTransactions ( IDMAssocSeqPatStat statistics);
IDMINTEGER IDMGetNumberOfCustomers ( IDMAssocSeqPatStat statistics);
IDMINTEGER IDMGetNumberOfRules ( IDMAssocSeqPatStat statistics);
IDMINTEGER IDMGetNumberOfSequences ( IDMAssocSeqPatStat statistics);
IDMINTEGER IDMGetNumberOfItemsets ( IDMAssocSeqPatStat statistics);
IDMINTEGER IDMGetNumberOfItems ( IDMAssocSeqPatStat statistics);
IDMINTEGER IDMGetNumberOfPasses ( IDMAssocSeqPatStat statistics);
IDMINTEGER IDMGetMaxTransactionSize ( IDMAssocSeqPatStat statistics);
IDMREAL IDMGetAvgTransactionSize ( IDMAssocSeqPatStat statistics);
IDMINTEGER IDMGetMaxTranPerCustomer ( IDMAssocSeqPatStat statistics);
IDMREAL IDMGetAvgTranPerCustomer ( IDMAssocSeqPatStat statistics);
IDMREAL IDMGetMinimumSupport ( IDMAssocSeqPatStat statistics);
IDMREAL IDMGetMinimumConfidence ( IDMAssocSeqPatStat statistics);

Sequential Patterns functions

These are general Sequential Patterns functions.



IDMAsrGetSeqs

IDMAsrGetSeqs retrieves the sequence item sets from a result file.

Format:
IDMRETURN IDMAsrGetSeqs(
    IDMOID asrOid,          /* in */
    IDMREAL minSup,         /* in */
    IDMREAL maxSup,         /* in */
    IDMINTEGER* pnSeqs,     /* out */
    IDMSeqSet*** pppSeqs )  /* out */

Parameters:

asrOid
    The ID of the result file.

minSup
    The minimum support value for the retrieved sequence item sets.

maxSup
    The maximum support value for the retrieved sequence item sets.

pnSeqs
    The number of sequence item sets satisfying the constraints on the support value.

pppSeqs
    A pointer to the pointer to the array of pointers to the retrieved sequence item sets.

IDMSortSequences

IDMSortSequences sorts the sequence item sets according to the specified sorting criterion without requiring additional memory.

Format:
IDMRETURN IDMSortSequences(
    IDMSeqSet** ppSeqSets,   /* in, out */
    IDMINTEGER nSeqs,        /* in */
    IDMSortOrder sortOrder ) /* in */

Parameters:

ppSeqSets
    A pointer to the array of pointers to the sequence item sets to be sorted.

nSeqs
    The number of sequence item sets in ppSeqSets.

sortOrder
    Specifies the criterion by which the sequence item sets are to be sorted.

IDMAsrFreeSeqs

IDMAsrFreeSeqs frees the memory occupied by the sequence item sets that were retrieved by IDMAsrGetSeqs.

Format:
IDMRETURN IDMAsrFreeSeqs(
    IDMSeqSet** ppSeqSets ) /* in */

Parameters:

ppSeqSets
    A pointer to the array of pointers to the sequence item sets to be removed.


IDMOpenSeqPatCursor

Retrieves a cursor to access sequential patterns.

IDMRETURN IDMOpenSeqPatCursor ( IDMSeqPatResult resultObject,    /* in */
                                IDMSeqPatCursor *patternCursor); /* out */

IDMCloseSeqPatCursor

Closes a sequential patterns cursor and releases all memory allocated for the cursor.

IDMRETURN IDMCloseSeqPatCursor ( IDMSeqPatCursor *patternCursor); /* in/out */

IDMGetNextPattern

Retrieves the next sequential pattern from the pattern cursor. If there are no more patterns, the function returns IDM_WARNING and NULL.

IDMRETURN IDMGetNextPattern ( IDMSeqPatCursor patternCursor, /* in */
                              IDMSeqPattern *pattern);       /* out */

IDMFreePattern

Releases all memory allocated for a sequential pattern.

IDMRETURN IDMFreePattern ( IDMSeqPattern *pattern); /* in/out */

You can use the following functions to access the attributes of a sequential pattern:

IDMINTEGER IDMGetPatternGroup ( IDMSeqPattern pattern);
IDMREAL IDMGetPatternSupport ( IDMSeqPattern pattern);
IDMREAL IDMGetPatternLift ( IDMSeqPattern pattern);
IDMREAL IDMGetPatternSubLift ( IDMSeqPattern pattern);

The API allocates memory in the following function. The calling application is responsible for releasing this memory.

IDMRETURN IDMGetPatternString ( IDMSeqPattern pattern, IDMCHAR **patternString);

Functions to access arrays

All memory allocated for arrays is released when the original structures are freed.

To retrieve the number of item sets in an item set array, use the following:

IDMINTEGER IDMGetNumberOfItemsetsInArray ( IDMItemsetArray itemsets);

To retrieve the item set at position index of an item set array, use the following:

IDMRETURN IDMGetItemsetFromArray ( IDMItemsetArray itemsets,
                                   IDMINTEGER index,
                                   IDMFrequentItemset *element);

To retrieve the number of items in an item array, use the following:

IDMINTEGER IDMGetNumberOfItemsInArray ( IDMItemArray items);

To retrieve the item at position index of an item array, use the following:

IDMRETURN IDMGetItemFromArray ( IDMItemArray items,
                                IDMINTEGER index,
                                IDMCHAR **item);

Result APIs for classification, clustering, prediction, and descriptive statistics

The following libraries contain the result APIs:

libidmac.a
    AIX and Sun Solaris

idmac.lib
    Windows 95 and NT, OS/2

idmac.dll
    Additionally for OS/2

This is the class structure for these result APIs.

The common parts of the result APIs for:
- Classification
- Clustering
- Prediction
- Descriptive statistics

are:
- Univariate statistics.
- Statistics about a certain partition of the data (bivariate statistics part). These partitions can be tree nodes, clusters, regions, or partitions with respect to the value of a field or predicted field (quantiles).

These commonalities are reflected in the following class structures:

[Figure 7. IDMDBasicDescrStatsResult class structure: IDMDDescrStatsQuantResult, IDMDPredictionResult, IDMDClassificationResult, and IDMDClusteringResult derive from IDMDBasicDescrStatsResult.]


IDMDBasicDescrStatsResult

The class IDMDBasicDescrStatsResult contains the univariate statistics of the selected fields and an array of IDMDBasicPartition objects representing the statistics about the partitions. In addition, it contains indicator arrays for discrete and continuous fields that describe their use (for example, they can be used to distinguish between active and supplementary fields).

Header file: idmdrbds.hpp

Format:
class IDMDBasicDescrStatsResult {

public:
    const IDMCHAR* getName() const;

    const IDMCHAR* getResultFile() const;

    const IDMArray<IDMDBasicPartition*>* getBasicPartitions() const;

    const IDMArray<IDMField*>* getDiscreteFields() const;
    const IDMArray<IDMNumericField*>* getContinuousFields() const;

    const IDMArray<IDMINTEGER>* getDiscFieldIndicators() const;
    const IDMArray<IDMINTEGER>* getContFieldIndicators() const;

    const IDMCHAR* getMiningSettings() const;

    IDMLONGINT getTotalFrequency() const;

    IDMRETURN checkFieldsForStatistic(IDMArray<IDMField*> *pFields );

    IDMRETURN checkFieldsForApplication(IDMArray<IDMField*> *pFieldsParamFile,
                                        IDMArray<IDMField*> *&pDiscrFields,
                                        IDMArray<IDMNumericField*> *&pContFields,
                                        IDMBOOLEAN setStatistic=IDM_FALSE,
                                        IDMBOOLEAN setMetaStaticOnly=IDM_FALSE);

    IDMRETURN checkPredictedField(IDMField *pPredictedField,
                                  IDMBOOLEAN setStatistic=IDM_TRUE,
                                  IDMBOOLEAN setMetaStaticOnly=IDM_TRUE);
};

Member functions:

getName
    Gets the name of the settings object that produced the result.

[Figure 8. IDMDBasicPartition class structure: IDMDPartition, IDMDTreeNode, IDMDCluster, and IDMDRegion derive from IDMDBasicPartition.]

getResultFile
    Gets the name of the result file.

getBasicPartitions
    Gets an array of the basic partition objects representing the statistics of each partition.

getDiscreteFields
    Gets the array of discrete fields together with their discrete background statistics representing the discrete statistics of the analyzed data.

getContinuousFields
    Gets the array of continuous fields together with their continuous background statistics representing the continuous statistics of the analyzed data.

getDiscFieldIndicators
    Gets the array of discrete field indicators. The n-th element in this array matches the n-th element in the array returned by getDiscreteFields().

    For results of class IDMDClusteringResult, the indicators have the following meaning:

    -1   means the field is a supplementary field.
    0    means the field is an active field.

    For results of class IDMDPredictionResult, the indicators have the following meaning:

    -2   means the field is the dependent field.
    >=0  indicates the index in the arrays returned by IDMDRegion::getCenterCoordinates() and by IDMDRegion::getVariances() or IDMDRegion::getCoVariances().

getContFieldIndicators
    Gets the array of continuous field indicators. The n-th element in this array matches the n-th element in the array returned by getContinuousFields().

    For results of class IDMDClusteringResult, the indicators have the following meaning:

    0    means the field is an active field.

    For results of class IDMDPredictionResult, the indicators have the following meaning:

    -2   means the field is the dependent field.
    >=0  indicates the index in the arrays returned by IDMDRegion::getCenterCoordinates() and by IDMDRegion::getVariances() or IDMDRegion::getCoVariances().

getMiningSettings
    Retrieves the mining settings that were used by the mining kernel that produced the result. The character string that is returned by getMiningSettings can be read with the method IDMMiningBase::loadFromString.

getTotalFrequency
    Gets the total frequency of valid and invalid values of all fields.

checkFieldsForStatistic

    1. Checks whether the fields (passed as parameter) occur in the result object. The check is done by name comparison.

    2. Checks whether the data types are identical and, if necessary and possible, changes them to the data types of the result.

    3. Copies statistics from the result fields to the new input fields.

checkFieldsForApplication

    1. Checks whether the fields (passed as parameter) occur in the result object as active fields. The check is done by name comparison.

    2. Checks whether the data types are identical and, if necessary and possible, changes them to the data types of the result.

    3. Copies statistics from the result fields to the new input fields if setMetaStaticOnly==IDM_FALSE. Otherwise, only the meta-statistics (that is, allowed values, bucket specifications, and so on) are copied. In this case, all frequencies are initialized to 0.

    4. Reorders the discrete and continuous field arrays according to the result. The ordered arrays are returned.

checkPredictedField

    1. Checks whether the field (passed as parameter) occurs in the result object as the predicted field. The check is done by name comparison.

    2. Checks whether the data types are identical and, if necessary and possible, changes the data type to that of the result.

    3. Copies statistics from the result fields to the new input fields if setStatistic==IDM_TRUE and setMetaStaticOnly==IDM_FALSE. If both are IDM_TRUE, only the meta-statistics (that is, allowed values, bucket specifications, and so on) are copied. In this case, all frequencies are initialized to 0. If setStatistic==IDM_FALSE, only the name and the type are checked.
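The indicator values described under getDiscFieldIndicators and getContFieldIndicators can be decoded with a small lookup. The following C++ sketch uses two hypothetical helper functions, clusteringRole and predictionRole, which are not part of the API and operate on plain integers rather than on the IDMArray<IDMINTEGER> objects that the methods return. Values that this reference does not describe are reported as unknown. Note that the value -1 is documented here only for discrete fields of clustering results.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of the API: names the role that one indicator
// value assigns to a field of a clustering result.
std::string clusteringRole(int indicator) {
    if (indicator == -1) return "supplementary";
    if (indicator == 0) return "active";
    return "unknown";
}

// Hypothetical helper, not part of the API: for a prediction result, -2 marks
// the dependent field and a value >= 0 is an index into the per-region arrays.
std::string predictionRole(int indicator) {
    if (indicator == -2) return "dependent";
    if (indicator >= 0) return "region index " + std::to_string(indicator);
    return "unknown";
}
```

Because the n-th indicator matches the n-th field, an application would typically walk both arrays with the same index.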

IDMDBasicPartition

The class IDMDBasicPartition represents the statistics of one partition. The derived classes determine whether the partition is, for example, a cluster in a clustering result or a region in a prediction result.

Header file: idmdrbds.hpp

Format:
class IDMDBasicPartition {

public:
    const IDMCHAR* getName() const;

    const IDMArray<IDMDiscreteStatistics*>* getDiscreteStatistics() const;
    const IDMArray<IDMContinuousStatistics*>* getContinuousStatistics() const;

    IDMLONGINT getFrequency() const;
};

Member functions:

getName
    Gets the name of the partition.

getDiscreteStatistics
    Gets the array of discrete statistics of a partition. The i-th element of the statistics array corresponds to the i-th discrete field in the array returned by IDMDBasicDescrStatsResult::getDiscreteFields().

getContinuousStatistics
    Gets the array of continuous statistics of a partition. The i-th element of the statistics array corresponds to the i-th continuous field in the array returned by IDMDBasicDescrStatsResult::getContinuousFields().

getFrequency
    Gets the frequency of valid and invalid values of the partition.

IDMDClassificationResult

The class IDMDClassificationResult describes the results of classification. In addition to the background statistics, it describes the confusion matrix of the correctly and wrongly classified training or test instances.

Header file: idmdrclf.hpp

Format:
class IDMDClassificationResult : public IDMDBasicDescrStatsResult {

public:
    ~IDMDClassificationResult();
    IDMDClassificationResult(IDMRETURN &rc, const IDMCHAR* resultFileName);

    const IDMCHAR* getClassFieldName();

    const IDMArray<IDMCHAR*>* getLabels() const;

    const IDMArray<IDMINTEGER>* getPredictions(IDMCHAR* pLabel) const;
    const IDMArray<IDMREAL>* getSensitivity() const;
    const IDMArray<IDMCHAR*>* getFieldNames() const;
    const IDMCHAR* getNetworkArchitecture(void) const;
};

Member functions:

~IDMDClassificationResult
    The destructor.

IDMDClassificationResult
    The constructor is called by components that access classification results (for example, a result visualizer). The argument is the name of the file where the classification results are stored. The constructor reads the result file and builds up a complete classification result object.

getClassFieldName
    Retrieves the name of the class field.

getLabels
    Retrieves the array of class labels. The class labels represent the set of values of the class field.

getPredictions
    Retrieves the array of predictions for a specified label from the confusion matrix. The confusion matrix is a two-dimensional array. The i-th element contains the predictions for the i-th class label. Its j-th value is the frequency with which the j-th class label was predicted for the i-th class label. This array of predictions usually has the same number of elements as the array of class labels. If it contains one additional element, this element represents the frequency with which no value could be predicted (which is possible for neural classification).

getSensitivity
    Retrieves the array of sensitivity values. The i-th element is the sensitivity value for the i-th element of the array of field names.

getFieldNames
    Retrieves the array of field names for the sensitivity array.

getNetworkArchitecture(void)
    Returns a string A B C D E describing the architecture of the backpropagation network, where:
    - A is the number of input units
    - B, C, and D are the number of units in the first, second, and third (hidden) layer
    - E is the number of output units
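The confusion matrix layout described under getPredictions can be summarized in a single number. The following C++ sketch computes the fraction of correctly classified instances. It is an illustration under stated assumptions: the function fractionCorrect is hypothetical, and plain std::vector objects stand in for the IDMArray<IDMINTEGER> arrays that getPredictions returns for each label, collected in the order of getLabels().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Rows are indexed by the class label i, columns by the predicted label j.
// A row may carry one extra trailing entry for "no value could be predicted".
// Hypothetical helper: plain vectors stand in for the API's array objects.
double fractionCorrect(const std::vector<std::vector<long>>& confusion) {
    long correct = 0;
    long total = 0;
    for (std::size_t i = 0; i < confusion.size(); ++i) {
        for (std::size_t j = 0; j < confusion[i].size(); ++j) {
            total += confusion[i][j];
            if (i == j) correct += confusion[i][j];   // diagonal entries are correct
        }
    }
    return total == 0 ? 0.0 : static_cast<double>(correct) / total;
}
```

The extra trailing entry never lies on the diagonal, so instances for which no value could be predicted count toward the total but not toward the correct predictions.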

IDMDClusteringResult

The clustering result describes the result of neural and demographic clustering. For clustering results, the array of basic partitions is an array of pointers to IDMDCluster objects.

Header file: idmdrclu.hpp

Format:
class IDMDClusteringResult : public IDMDBasicDescrStatsResult {

public:
    ~IDMDClusteringResult();
    IDMDClusteringResult(IDMRETURN &rc,
                         const IDMCHAR* fileName);

    const IDMArray<IDMDCluster*>* getClusters() const;

    IDMREAL getScore() const;

    IDMREAL getCondorcetValue();

    const IDMMatrix<IDMREAL>& getClusterSimilarities() const;

    const IDMArray<IDMREAL>* getContFieldDistanceUnits();
    const IDMArray<IDMREAL>* getDiscrFieldDistanceUnits();

    const IDMArray<IDMREAL>* getDiscrFieldCondorcetValues() const;
    const IDMArray<IDMREAL>* getContFieldCondorcetValues() const;

    IDMINTEGER getNbOfPasses();

    IDMREAL getNormalizedError();
};

Member functions:

~IDMDClusteringResult
    The destructor.

IDMDClusteringResult
    The constructor is called by components that access clustering results (for example, a result visualizer). The argument is the name of the file where the clustering results are stored. The constructor reads the result file and builds up a complete clustering result object.

getClusters
    Retrieves the array of clusters.

getScore
    Retrieves the score of the clustering.

getCondorcetValue
    Retrieves the Condorcet value for the clustering.

getClusterSimilarities
    Retrieves the matrix of cluster similarities.

getContFieldDistanceUnits
    Gets the array of distance units of the continuous fields. The i-th element of the array corresponds to the i-th continuous field in the array returned by IDMDBasicDescrStatsResult::getContinuousFields().

getDiscrFieldDistanceUnits
    Gets the array of distance units of the discrete fields. The i-th element of the array corresponds to the i-th discrete field in the array returned by IDMDBasicDescrStatsResult::getDiscreteFields().

getDiscrFieldCondorcetValues
    Gets the array of Condorcet values of the discrete fields. The i-th element of the array corresponds to the i-th discrete field in the array returned by IDMDBasicDescrStatsResult::getDiscreteFields().

getContFieldCondorcetValues
    Gets the array of Condorcet values of the continuous fields. The i-th element of the array corresponds to the i-th continuous field in the array returned by IDMDBasicDescrStatsResult::getContinuousFields().

getNbOfPasses
    Retrieves the number of passes the kernel took to produce the clustering result.

getNormalizedError
    Retrieves the accuracy value of the clustering.


IDMDCluster

Header file: idmdrclu.hpp

Format:
class IDMDCluster : public IDMDBasicPartition {

public:
    IDMREAL getScore() const;
    IDMREAL getCondorcetValue();
    const IDMArray<IDMREAL>* getDiscrFieldCondorcetValues() const;
    const IDMArray<IDMREAL>* getContFieldCondorcetValues() const;
};

Members of class IDMDCluster:

getScore
    Retrieves the score of the cluster. The cluster score shows how homogeneous a single cluster is. The global score shows how good all clusters are.

getCondorcetValue
    Gets the Condorcet value of a cluster.

getDiscrFieldCondorcetValues
    Gets the array of Condorcet values of the discrete fields for a cluster. The i-th element of the array corresponds to the i-th discrete field in the array returned by IDMDBasicDescrStatsResult::getDiscreteFields().

getContFieldCondorcetValues
    Gets the array of Condorcet values of the continuous fields for a cluster. The i-th element of the array corresponds to the i-th continuous field in the array returned by IDMDBasicDescrStatsResult::getContinuousFields().

IDMDPredictionResult

The prediction result describes the result of RBF prediction. For prediction results, the array of basic partitions pivBasicPartitions is an array of pointers to IDMDRegion objects.

Header file: idmdrpre.hpp

Format:
typedef enum {
    IDM_TRAINING_MODE,
    IDM_TEST_MODE,
    IDM_APPLICATION_MODE
} IDM_UseMode;

class IDMDPredictionResult : public IDMDBasicDescrStatsResult {

public:
    ~IDMDPredictionResult();
    IDMDPredictionResult(IDMRETURN &rc, const IDMCHAR* fileName);

    IDMINTEGER getNbOfPredictedValues() const;

    const IDMArray<IDMDRegion*>* getRegions() const;

    IDM_UseMode getUseMode();
    IDMREAL getRootMeanSquaredError();
    IDMArray<IDMDRegion*>* getQuantileRegions() const;
    IDMREAL getNormalizedError() const;
    IDMREAL getAvgError();
    IDMREAL getMaxError();
    const IDMCHAR* getNetworkArchitecture(void) const;
    const IDMArray<IDMREAL>* getQuantileLimits() const;
    const IDMArray<IDMREAL>* getQuantiles() const;
};

Member functions:

~IDMDPredictionResult
    The destructor.

IDMDPredictionResult
    The constructor is called by components that access prediction results (for example, a result visualizer). The argument is the name of the file where the prediction results are stored. The constructor reads the result file and builds up a complete prediction result object.

getNbOfPredictedValues
    If the predicted field is categorical, this method retrieves the number of values to be predicted.

getRegions
    Retrieves the regions belonging to the prediction result.

getUseMode
    Gets the use mode in which the prediction result was produced.

getRootMeanSquaredError
    Gets the root mean squared error.

getQuantileRegions
    Retrieves the quantile regions belonging to the prediction result. Returns an array with N elements, where N is the number of quantile regions in the result. If a result contains RBF regions but no quantile regions, the array contains 0 elements. Returns NULL if a result is invalid. The returned array must be deleted by the caller.

getNormalizedError
    Retrieves the accuracy of the prediction result.

getAvgError
    Retrieves the average error.

getMaxError
    Retrieves the maximum absolute error.

getNetworkArchitecture(void)
    Returns a string A B C D E describing the architecture of the backpropagation network, where:
    - A is the number of input units
    - B, C, and D are the number of units in the first, second, and third (hidden) layer
    - E is the number of output units

    For results produced by the RBF Prediction mining function, this method returns a null pointer.

getQuantileLimits
    Returns an array of N+1 numbers describing the upper and lower quantile limits in percent. N is the number of elements in getQuantileRegions(). If a quantile range is empty, N can be less than the value you specified for the number of quantiles in the prediction settings.

getQuantiles
    Returns an array of N+1 elements describing the upper and lower limits in the original value range. N is the number of elements in getQuantileRegions(). With qi being the quantile range of getQuantileRegions()->element(i), getQuantiles()->element(i+1) is the lower limit of qi and getQuantiles()->element(i) is the upper limit of qi.
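The indexing rule stated for getQuantiles can be expressed as a short loop that turns N+1 boundary values into N ranges. The following C++ sketch is an illustration only: the function quantileRanges is hypothetical, and a plain std::vector stands in for the IDMArray<IDMREAL> that the method returns. It follows the rule exactly as stated, taking element(i+1) as the lower limit and element(i) as the upper limit of range i.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Pairs up the N+1 boundary values into N (lower, upper) ranges, following the
// indexing stated for getQuantiles(): element(i+1) is the lower limit and
// element(i) is the upper limit of quantile range i.
// Hypothetical helper: a plain vector stands in for the API's array object.
std::vector<std::pair<double, double>> quantileRanges(const std::vector<double>& quantiles) {
    std::vector<std::pair<double, double>> ranges;
    for (std::size_t i = 0; i + 1 < quantiles.size(); ++i) {
        ranges.emplace_back(quantiles[i + 1], quantiles[i]);
    }
    return ranges;
}
```

An array with fewer than two elements yields no ranges, which matches a result that contains no quantile regions.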

IDMDRegion

Header file: idmdrpre.hpp

Format:
typedef enum {
    IDM_INACT_BRANCH,
    IDM_ACTIVE_BRANCH,
    IDM_INACT_LEAF,
    IDM_ACTIVE_LEAF,
    IDM_QUANTILE_REGION
} IDM_RegionType;

class IDMDRegion : public IDMDBasicPartition {

public:
    const IDMArray<IDMREAL>* getCenterCoordinates() const;
    const IDMArray<IDMREAL>* getVariances() const;
    const IDMArray<IDMREAL>* getCoVariances() const;
    IDM_RegionType getRegionType() const;
    IDMREAL getRegionWeight() const;
    IDMCHAR* getParentName() const;
    IDMREAL getRootMeanSquaredError();
    IDMREAL getNormalizedError() const;
    IDMREAL getAvgError();
    IDMREAL getMaxError();
};

Member functions:

getCenterCoordinates
    Gets the array of center coordinates. The first n elements in the array belong to the active continuous fields. The next m elements belong to the active discrete numeric fields, and the last k elements belong to the active categorical fields, where the center position is stored for each possible value of the categorical field. That means that each categorical field matches numberOfPossibleValues elements in this array.

getVariances
    Gets the array of variances. The first n elements in the array belong to the active continuous fields. The next m elements belong to the active discrete numeric fields, and the last k elements belong to the active categorical fields, where the variance is stored for each possible value of the categorical field. That means that each categorical field matches numberOfPossibleValues elements in this array.

getCoVariances
    Gets the array of covariances with respect to the predicted field. The first n elements in the array belong to the active continuous fields. The next m elements belong to the active discrete numeric fields, and the last k elements belong to the active categorical fields, where the covariance is stored for each possible value of the categorical field. That means that each categorical field matches numberOfPossibleValues elements in this array.

getRegionType
    Gets the region type. Possible types are active and inactive leaves and branches, as well as quantile regions.

getRegionWeight
    Gets the center weight of a region.

getParentName
    Gets the name of the parent region.

getRootMeanSquaredError
    Retrieves the root mean squared error of a region.

getNormalizedError
    Retrieves the normalized error of the region.

getAvgError
    Gets the average error of a region.

getMaxError
    Gets the maximum error of a region.
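The array layout shared by getCenterCoordinates, getVariances, and getCoVariances implies a simple offset calculation for the categorical fields. The following C++ sketch shows one way to compute it. The function categoricalOffset and its parameters are hypothetical, and the caller is assumed to know the number of possible values of each active categorical field, in array order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the index of the first array element that belongs to the active
// categorical field number `field`, given n active continuous fields, m active
// discrete numeric fields, and the number of possible values of each active
// categorical field. Hypothetical helper, not part of the API.
std::size_t categoricalOffset(std::size_t n, std::size_t m,
                              const std::vector<std::size_t>& possibleValues,
                              std::size_t field) {
    assert(field < possibleValues.size());
    std::size_t offset = n + m;               // skip the numeric fields
    for (std::size_t c = 0; c < field; ++c) {
        offset += possibleValues[c];          // one element per possible value
    }
    return offset;
}
```

For example, with two continuous fields, one discrete numeric field, and categorical fields that have three and four possible values, the elements of the second categorical field start at index 6.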

IDMDQuantileResult

The descriptive statistics and quantile result contains univariate and bivariate statistics and quantile results. Sampled data tables that are computed by the DQS kernel are handled by the IDMDDataResult class.

A separate class describes the quantile result for each field:

Header file: idmdrdsq.hpp

Format:
class IDMDQuantileResult {

public:
    IDMDQuantileResult(IDMRETURN &rc,
                       IDMDDescrStatsQuantResult* pDSQRes,
                       const IDMCHAR* pFieldName,
                       const IDMArray<IDMREAL> *pQuantileLimits,
                       const IDMArray<IDMREAL> *pQuantiles,
                       const IDMArray<IDMREAL> *pLowerExtremeValues,
                       const IDMArray<IDMREAL> *pHigherExtremeValues);

    ~IDMDQuantileResult();
    const IDMDDescrStatsQuantResult* getDescrStatsQuantResult() const {
        return pivDSQResult;
    }
    const IDMCHAR* getFieldName() const;
    const IDMArray<IDMREAL>* getQuantileLimits() const;
    const IDMArray<IDMREAL>* getQuantiles() const;
    const IDMArray<IDMREAL>* getLowerExtremeValues() const;
    const IDMArray<IDMREAL>* getHigherExtremeValues() const;
    IDMBOOLEAN getQuantile(IDMINTEGER quantile,
                           IDMREAL& quantileLimit) const;
};

Member functions:

getDescrStatsQuantResult
    Gets a pointer to the descriptive statistics and quantile result.

getFieldName
    Retrieves the field name.

getQuantileLimits
    Retrieves the quantile limits of the field. The quantile limits are the percentage values for which the corresponding quantile values were computed.

getQuantiles
    Retrieves the quantiles of the field. Quantiles are values in the range of the quantile field.

getLowerExtremeValues
    Retrieves the lower extreme values for the field.

getHigherExtremeValues
    Retrieves the higher extreme values for the field.

IDMDDescrStatsQuantResult

The descriptive statistics and quantile result contains univariate and bivariate statistics and quantile results. Sampled data tables, which are also computed by the DQS kernel, are handled by the IDMDDataResult class. For descriptive statistics and quantile results, the array of basic partitions is an array of pointers to IDMDPartition objects.

Format:
class IDMDDescrStatsQuantResult : public IDMDBasicDescrStatsResult {

public:
    ~IDMDDescrStatsQuantResult();
    IDMDDescrStatsQuantResult(IDMRETURN &rc,
                              const IDMCHAR* fileName);

    const IDMArray<IDMDPartition*>* getPartitions() const;

    const IDMArray<IDMDQuantileResult*>* getQuantileResults() const;
    const IDMDQuantileResult* getQuantileResult(const IDMCHAR* chFieldName );

    const IDMCHAR* getBivarStatsField();
};

Additional members of class IDMDDescrStatsQuantResult:

~IDMDDescrStatsQuantResult
    The destructor.

IDMDDescrStatsQuantResult
    The constructor is called by components that access descriptive statistics and quantile results (for example, a result visualizer). The argument is the name of the file where the results are stored. The constructor reads the result file and builds up a complete descriptive statistics and quantile result object.

getPartitions
    Gets the array of partitions of the result.

getQuantileResults
    Gets the array of quantile results.

getQuantileResult
    Gets the quantile result of a specific field.

getBivarStatsField
    Gets the name of the bivariate statistics field.

IDMDPartition

The class IDMDPartition adds no public members to those of IDMDBasicPartition.

Format:
class IDMDPartition : public IDMDBasicPartition {

public:

};

Result APIs for Statistical functions

These are Data Access API classes used in the Statistical Result API functions. Only the parts relevant for the Result API classes and functions are described here.

The following library contains the Statistical Result API:
- libidmsr.a

IDMStatisticsResult

IDMStatisticsResult provides a static method to identify the type of the statistical result.

IDM_StatResultType IDMStatisticsResult::getResultType(const IDMCHAR* resultFileName)

Header file: idmsrsta.hpp

Format:
typedef enum {
    IDM_NO_STATISTIC_RESULT,
    IDM_UNIVARIATE_CURVE_RESULT,
    IDM_PRIN_COM_ANALYSIS_RESULT,
    IDM_FACTOR_ANALYSIS_RESULT,
    IDM_LINEAR_REGRESSION_RESULT
} IDM_StatResultType;

class IDMStatisticsResult {
public:
    IDMStatisticsResult(IDMRETURN &rc,
                        const IDMCHAR* resultFileName,
                        IDMUSEMODE useMode);

    ~IDMStatisticsResult();

    static IDM_StatResultType getResultType(const IDMCHAR* resultFileName);

    virtual IDMRETURN load() = 0;
    virtual IDMRETURN save() = 0;

    inline const IDMCHAR* getResultFile() const { return pivResultFile; };

    IDMRETURN updateResultFile(const IDMCHAR* resultFileName);

    IDMUSEMODE getUseMode();
};

getResultType(const IDMCHAR* resultFileName)
    Returns the type of the statistical result stored under resultFileName. See IDM_StatResultType for further information.

IDMStatTable

The statistics table is used as a generic result within the other statistics results to provide result data such as fitting tables.

Header file: idmsrsta.hpp

Format:
class IDMStatTable {
public:
    IDMStatTable();
    IDMStatTable(IDMStatTable *);
    ~IDMStatTable();

    void getSize(IDMINTEGER &rows, IDMINTEGER &columns);
    IDMCHAR * getColumnHeading(IDMINTEGER column);
    IDMBOOLEAN isStringColumn(IDMINTEGER column);
    IDMBOOLEAN isDecimalColumn(IDMINTEGER column);
    IDMCHAR * getString(IDMINTEGER row, IDMINTEGER column);
    IDMREAL getDecimal(IDMINTEGER row, IDMINTEGER column);
};

Member functions:

getSize()
    Returns the maximum number of rows and columns of the table.

getColumnHeading()
    Returns the name of the specified column.

isStringColumn()
    Returns IDM_TRUE if the specified column has the data type string. Otherwise IDM_FALSE is returned.

isDecimalColumn()
    Returns IDM_TRUE if the specified column has the data type decimal. Otherwise IDM_FALSE is returned.

getString()
    Returns the data value of the specified column and row as a string. The data type of the column has to be string.

getDecimal()
    Returns the data value of the specified column and row as a decimal. The data type of the column has to be decimal.
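Because getString() and getDecimal() each require a matching column type, a caller has to test the column type before reading a cell. The following is a minimal sketch of that pattern, not product code: MockStatTable and MockColumn are hypothetical stand-ins that only mimic the accessor names, and zero-based row and column numbering is an assumption that the reference does not confirm.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a statistics table column (not an IBM type).
struct MockColumn {
    std::string heading;
    bool isString;                     // true: string column, false: decimal column
    std::vector<std::string> strings;  // used when isString is true
    std::vector<double> decimals;      // used when isString is false
};

// Hypothetical stand-in that mirrors the IDMStatTable accessor names.
class MockStatTable {
public:
    void addColumn(const MockColumn& column) { columns_.push_back(column); }

    // Reports the largest row count over all columns and the column count.
    void getSize(int& rows, int& columns) const {
        columns = static_cast<int>(columns_.size());
        rows = 0;
        for (const MockColumn& c : columns_) {
            const std::size_t n = c.isString ? c.strings.size() : c.decimals.size();
            if (static_cast<int>(n) > rows) rows = static_cast<int>(n);
        }
    }
    bool isStringColumn(int column) const { return columns_.at(column).isString; }
    bool isDecimalColumn(int column) const { return !columns_.at(column).isString; }
    const std::string& getString(int row, int column) const {
        assert(isStringColumn(column));   // the column type has to be string
        return columns_.at(column).strings.at(row);
    }
    double getDecimal(int row, int column) const {
        assert(isDecimalColumn(column));  // the column type has to be decimal
        return columns_.at(column).decimals.at(row);
    }

private:
    std::vector<MockColumn> columns_;
};

// Formats one row as tab-separated text, choosing the accessor per column type.
std::string formatRow(const MockStatTable& table, int row) {
    int rows = 0, columns = 0;
    table.getSize(rows, columns);
    std::ostringstream out;
    for (int c = 0; c < columns; ++c) {
        if (c > 0) out << '\t';
        if (table.isStringColumn(c)) out << table.getString(row, c);
        else out << table.getDecimal(row, c);
    }
    return out.str();
}
```

With a real result, the same loop would call the corresponding accessors on the table pointer returned by, for example, getLinRegModelFitting().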

IDMStatCovarianceMatrix

The covariance matrix part of the correlation matrices result.

Header file: idmsrcma.hpp

Format:
class IDMStatCovarianceMatrix {
public:
    IDMStatCovarianceMatrix();
    IDMStatCovarianceMatrix(IDMStatCovarianceMatrix *);
    ~IDMStatCovarianceMatrix();

    void getSize(IDMINTEGER &rows, IDMINTEGER &columns);
    IDMCHAR * getVariableName(IDMINTEGER index);
    IDMREAL getCovariance(IDMINTEGER row, IDMINTEGER column);
};

Member functions:

getSize()
    Returns the maximum number of rows and columns of the covariance matrix, where the n-th row and the n-th column match the n-th variable name.
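Since the covariance matrix is part of the correlation matrices result, it can help to see how the two are related. The sketch below applies the textbook identity corr(i,j) = cov(i,j) / sqrt(cov(i,i) * cov(j,j)) to a plain matrix. It is an illustration only: the Matrix alias and correlationFromCovariance are hypothetical names, and whether the product derives its correlation values in exactly this way is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Converts a square covariance matrix into a correlation matrix.
// Entries involving a zero-variance variable are set to 0 by convention here.
Matrix correlationFromCovariance(const Matrix& cov) {
    const std::size_t n = cov.size();
    Matrix corr(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        assert(cov[i].size() == n);  // the matrix must be square
        for (std::size_t j = 0; j < n; ++j) {
            const double denom = std::sqrt(cov[i][i] * cov[j][j]);
            corr[i][j] = denom > 0.0 ? cov[i][j] / denom : 0.0;
        }
    }
    return corr;
}
```

In an application, the input values would be read cell by cell with getCovariance(row, column) after querying the dimensions with getSize().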

IDMStatLinearRegressionResult

This is the base class of linear regression. Particular results are described under IDMStatLinRegTable and IDMStatLinRegANOVA.

Header file: idmsrlin.hpp

Format:
class IDMStatLinearRegressionResult : public IDMStatisticsResult {
public:
    IDMStatLinearRegressionResult(IDMRETURN &rc,
                                  const IDMCHAR* resultFileName,
                                  IDMUSEMODE useMode = IDM_TRAINING_MODE);

    ~IDMStatLinearRegressionResult();

    IDMRETURN load();
    IDMRETURN save();

    IDMCHAR * getIndependentVariableName();
    IDMREAL getRSquared();
    IDMREAL getStandardError();
    void getDurbinWatsonProbability(IDMINTEGER &lag,
                                    IDMREAL &prob);
    IDMREAL getConfidenceLevel();

    const IDMStatLinRegTable * getLinRegTable();
    const IDMStatLinRegANOVA * getLinRegANOVA();
    const IDMStatTable * getLinRegModelFitting();
};

Member functions:

load()
    Loads result data from the result file.

save()
    Saves result data to the result file.

IDMStatLinRegTable

The regression table part of the linear regression class.


Format:
class IDMStatLinRegTable {
public:
    IDMStatLinRegTable();
    IDMStatLinRegTable(IDMStatLinRegTable *);
    ~IDMStatLinRegTable();

    IDMINTEGER getSize();
    IDMCHAR * getDependentVariableName(IDMINTEGER index);
    IDMREAL getRegressionCoefficient(IDMINTEGER index);
    IDMREAL getStdError(IDMINTEGER index);
    IDMREAL getBetaCoefficient(IDMINTEGER index);
    IDMREAL getFValue(IDMINTEGER index);
    IDMREAL getFProbability(IDMINTEGER index);

    void getConstant(IDMREAL &regressionCoefficient,
                     IDMREAL &stdError);
};

Member functions:

getSize()
    Returns the maximum linear regression table size, where the n-th table element matches the n-th dependent variable name.

IDMStatLinRegANOVA

The ANOVA part of the linear regression class.


Format:
class IDMStatLinRegANOVA {
public:
    IDMStatLinRegANOVA();
    IDMStatLinRegANOVA(IDMStatLinRegANOVA *);
    ~IDMStatLinRegANOVA();

    void getRegression(IDMREAL &degreeOfFreedom,
                       IDMREAL &sumOfSquares,
                       IDMREAL &meanSquares,
                       IDMREAL &fValue,
                       IDMREAL &fProbability);

    void getResidual(IDMREAL &degreeOfFreedom,
                     IDMREAL &sumOfSquares,
                     IDMREAL &meanSquares);

    void getTotal(IDMREAL &degreeOfFreedom,
                  IDMREAL &sumOfSquares);
};

IDMStatUnivariateCurveResult

The univariate curve fitting result, providing the fitting curve and the fitting equation.

Header file: idmsruni.hpp

Format:
typedef enum {
    IDM_CURVE_LINEAR,
    IDM_CURVE_EXPONENTIAL,
    IDM_CURVE_POWER,
    IDM_CURVE_HYPERBOLA,
    IDM_CURVE_RECIPROCAL,
    IDM_CURVE_RATIONAL,
    IDM_CURVE_BESTFIT
} IDM_StCurveType;

class IDMStatUnivariateCurveResult : public IDMStatisticsResult {
public:
    IDMStatUnivariateCurveResult(IDMRETURN &rc,
                                 const IDMCHAR* resultFileName,
                                 IDMUSEMODE useMode = IDM_TRAINING_MODE);

    ~IDMStatUnivariateCurveResult();

    IDMCHAR * getVariableName();
    IDMREAL getCorrelation();
    IDMREAL getStandardDeviation();
    IDMINTEGER getSeasonalPeriods();
    IDMINTEGER getNumberOfForecastPeriods();
    IDM_StCurveType getCurveType();
    void getEquation(IDMREAL &aCoeff, IDMREAL &bCoeff);

    const IDMStatTable * getCurveFitting();
};

Member functions:

getCurveType()
    Returns the fitted curve type.


IDMStatPrinComAnalysisResult

The results of the principal component analysis.

Header file: idmsrpri.hpp

Format:
typedef enum {
    IDM_FROM_CORRELATION,
    IDM_FROM_COVARIANCE
} IDM_StPrinComFrom;

class IDMStatPrinComAnalysisResult : public IDMStatisticsResult {
public:
    IDMStatPrinComAnalysisResult(IDMRETURN &rc,
                                 const IDMCHAR* resultFileName,
                                 IDMUSEMODE useMode = IDM_TRAINING_MODE);

    ~IDMStatPrinComAnalysisResult();

    void getSize(IDMINTEGER &maxVariables,
                 IDMINTEGER &maxPrincipals);

    IDMCHAR * getVariableName(IDMINTEGER variable);
    IDMCHAR * getPrincipalName(IDMINTEGER principal);
    IDM_StPrinComFrom getType();
    IDMREAL getMatrixOfVariables(IDMINTEGER variable1,
                                 IDMINTEGER variable2);
    void getPrincipalAttributes(IDMINTEGER principal,
                                IDMREAL &eigenValue,
                                IDMREAL &difference,
                                IDMREAL &proportion,
                                IDMREAL &cumulative);

    IDMREAL getEigenVector(IDMINTEGER variable,
                           IDMINTEGER principal);
};

Member functions:

getSize(IDMINTEGER &maxVariables, IDMINTEGER &maxPrincipals)
    Returns the maximum number of variables and principal components.

setSize(IDMINTEGER maxVariables, IDMINTEGER maxPrincipals)
    Destroys previous principal component data.

getMatrixOfVariables
    The type of this matrix is given by the getType() method. It is a correlation matrix or a covariance matrix of the input variables.

setMatrixOfVariables
    The type of this matrix is set with the setType() method. It is a correlation matrix or a covariance matrix of the input variables.


IDMStatFactorAnalysisResult

The results of the factor analysis. This is the base class for factor analysis. Particular results are described under IDMStatFactorInputAnalysis, IDMStatFactorStatistic, IDMStatFactorRotation, IDMStatFactorRegression, and IDMStatFactorStructure.

Header file: idmsrfac.hpp

Format:
class IDMStatFactorAnalysisResult : public IDMStatisticsResult {
public:
    IDMStatFactorAnalysisResult(IDMRETURN &rc,
                                const IDMCHAR* resultFileName,
                                IDMUSEMODE useMode = IDM_TRAINING_MODE);

    ~IDMStatFactorAnalysisResult();

    IDMBOOLEAN existsFactorRotation();
    IDMBOOLEAN existsFactorStructure();

    const IDMStatFactorInputAnalysis * getInputAnalysis();
    const IDMStatFactorStatistic * getFactorStatistic();
    const IDMStatFactorRotation * getFactorRotation();
    const IDMStatFactorRegression * getFactorRegression();
    const IDMStatFactorStructure * getFactorStructure();
};

Member functions:



IDMStatFactorInputAnalysis

This class handles the input variables for factor analysis.


Format:
typedef enum {
    IDM_NO_CHANGE_DIAGONAL,
    IDM_SMC_DIAGONAL,
    IDM_MAR_DIAGONAL
} IDM_StDiagonalChange;

class IDMStatFactorInputAnalysis {
public:
    IDMStatFactorInputAnalysis();
    IDMStatFactorInputAnalysis(IDMStatFactorInputAnalysis *);
    ~IDMStatFactorInputAnalysis();

    IDMINTEGER getSize();
    IDM_StDiagonalChange getDiagonalChangeType();
    IDMCHAR * getVariableName(IDMINTEGER variable);

    IDMREAL getEigenValue(IDMINTEGER variable);
    IDMREAL getCorrelation(IDMINTEGER variable1,
                           IDMINTEGER variable2);
};

Member functions:

getSize()
    Returns the maximum number of input variables.

IDMStatFactorStatistic

This class handles the factors for factor analysis.


Format:
class IDMStatFactorStatistic {
public:
    IDMStatFactorStatistic();
    IDMStatFactorStatistic(IDMStatFactorStatistic *);
    ~IDMStatFactorStatistic();

    void getSize(IDMINTEGER &maxVariables,
                 IDMINTEGER &maxFactors);

    IDMCHAR * getVariableName(IDMINTEGER variable);
    IDMCHAR * getFactorName(IDMINTEGER factor);
    void getPercentageOfFactor(IDMINTEGER factor,
                               IDMREAL &percent,
                               IDMREAL &cumulative);

    IDMREAL getEigenVectorOfFactor(IDMINTEGER factor,
                                   IDMINTEGER variable);

    IDMREAL getFactorOfVariable(IDMINTEGER variable,
                                IDMINTEGER factor);

    IDMREAL getCommunalityOfVariable(IDMINTEGER variable);
};

Member functions:

getSize()
    Returns the maximum number of variables and factors.

IDMStatFactorRotation

This class handles the factor rotation for factor analysis.


Format:
typedef enum {
    IDM_NO_ROTATE,
    IDM_VRM_ROTATE,
    IDM_QRM_ROTATE
} IDM_StFactorRotation;

class IDMStatFactorRotation {
public:
    IDMStatFactorRotation();
    IDMStatFactorRotation(IDMStatFactorRotation *);
    ~IDMStatFactorRotation();

    IDM_StFactorRotation getRotationType();
    IDMCHAR * getVariableName(IDMINTEGER variable);
    IDMCHAR * getFactorName(IDMINTEGER factor);
    IDMREAL getPercentageOfVariance(IDMINTEGER factor);
    IDMREAL getFactorOfVariable(IDMINTEGER variable,
                                IDMINTEGER factor);
    void getCommunalityOfVariable(IDMINTEGER variable,
                                  IDMREAL &original,
                                  IDMREAL &final,
                                  IDMREAL &difference);

    IDMINTEGER getNbOfIterations();
    IDMREAL getIterationVariance(IDMINTEGER iteration);
};

Member functions:

getNbOfIterations()
    Returns the maximum number of iterations.

IDMStatFactorStructure

This class handles the structure of the factor analysis.


Format:
class IDMStatFactorStructure {
public:
    IDMStatFactorStructure();
    IDMStatFactorStructure(IDMStatFactorStructure *);
    ~IDMStatFactorStructure();

    IDMINTEGER getSize();
    IDMCHAR * getRearrangedVariableName(IDMINTEGER variable);
    IDMREAL getRearrangedFactorOfVariable(IDMINTEGER variable);
    IDMREAL getRearrangedCorrelationOfVariables(IDMINTEGER variable1,
                                                IDMINTEGER variable2);
};

Member functions:

getSize()
    Returns the maximum number of variables.

IDMStatFactorRegression

This class handles the regression coefficients for factor analysis.



Format:
class IDMStatFactorRegression {
public:
    IDMStatFactorRegression();
    IDMStatFactorRegression(IDMStatFactorRegression *);
    ~IDMStatFactorRegression();

    IDMCHAR * getVariableName(IDMINTEGER variable);
    IDMCHAR * getFactorName(IDMINTEGER factor);
    IDMREAL getRegressionCoefficient(IDMINTEGER variable,
                                     IDMINTEGER factor);
};

Member functions:


Data Sample Result API

The data sample result is essentially a flat-file data table with its metadata in a result file. In addition, it can contain the specification of a default view of the data (such as a line plot or scatter plot).

IDMDDataSample

Header file: idmdrdat.hpp

Format:
class IDMDDataSample {
public:
    ~IDMDDataSample();

    const IDMCHAR* getName() const;

    const IDMCHAR* getTitle();

    void getChartingInfo(IDM_CHARTING_TYPE &chartType,
                         const IDMArray<IDMCHAR*> *&pFieldArgs);

    static IDMRETURN loadDataSamples(const IDMCHAR* pResFile,
                                     IDMArray<IDMDDataSample*>& dataSamples);

    IDMArray<IDMCategoricalField*>* getCategoricalFields();
    IDMArray<IDMNumericField*>* getNumericFields();

    IDMRETURN getTableAsMatrix(IDMMatrix0<IDMCHAR*>*& catValuesMatrix,
                               IDMMatrix0<IDMREAL>*& numValuesMatrix);
};

Member Functions

~IDMDDataSample
    The destructor.

getName
    Gets the name.

getTitle
    Gets the title.

getChartingInfo
    Gets the charting information. For example, (*pFieldArgs)[0] is the x axis and (*pFieldArgs)[1] is the y axis.

loadDataSamples
    Loads the data samples from the result file.

getCategoricalFields
    Gets the categorical fields.

getNumericFields
    Gets the numeric fields.

getTableAsMatrix
    Gets the table as a matrix. catValuesMatrix is the subtable with categorical values. The columns are numbered from 0 to number-of-cat-fields - 1. Column i contains the values of (*getCategoricalFields())[i]. numValuesMatrix is the subtable with numeric values. As the numbering of rows starts with 0 for IDMMatrix0 objects, the record numbering starts with 0, too.

Auxiliary classes for the Result API

These are Data Access API classes used in the clustering, the prediction, and the classification test Result API functions. Only the parts relevant for the Result API classes and functions are described here.

Overview on IDMField Classes

A Field object represents a field in a record read from a file or DB2 relation or written to an output file. The field contains the basic value, that is, some string or real value. Fields have the following characteristics:
- The cardinality: a field may be single-valued or multi-valued
- The data type of the value may be string or real
- The values may be discrete or continuous
- A field may be defined by a (C++) function internally
- A categorical field may be restricted to have only binary values

Figure 9 shows the class hierarchy.


The class IDMField represents a single-valued field rather than a field in general. Multi-valued fields are not used.

The characteristics of an IDMField object can be inspected using the member functions getDataType(), getFieldType(), isDiscrete(), isComputedFunction(), and getCardinality().

The following table shows possible combinations for single-valued fields:

In any case, a field can be a computed field.

Note: By default, a numeric field is assumed to have a discrete value range. If too many different values appear for a field that is assumed to be discrete by default, the characteristics of this field can change to being a continuous field. Due to the reduction of the number of field types from 5 to 3, the Data Access API Version 6 is incompatible with the Data Access API Version 1. This incompatibility concerns the Result APIs with respect to the following functions, which should be replaced as shown:

Version 1                              Version 6
IDMDiscreteField::getStatistics()      IDMField::getDiscreteStatistics()
IDMContinuousField::getStatistics()    IDMNumericField::getContinuousStatistics()

Figure 9. IDMField class hierarchy

IDMGeneralField
    IDMField
        IDMCategoricalField
        IDMNumericField
    IDMMultiField
        IDMMultiCategoricalField
        IDMMultiNumericField

Figure 10. Characteristics of a single-valued IDMField object

IDMGeneralField

Header file: idmdfld.hpp

Format:
class IDMGeneralField {
public:
    IDM_FieldType getFieldType() const;
    IDM_FieldType getDataType() const;
    IDM_FieldDataType fieldTypeToDataType(IDM_FieldType fieldType);
    IDMBOOLEAN isNumeric() const;
    IDMBOOLEAN isCategorical() const;
    IDMBOOLEAN isBinary() const;
    IDMBOOLEAN isContinuous() const;
};

Member functions:

getFieldType()
    Returns the type of the field, that is, IDM_CATEGORICAL, IDM_CONT_NUMERIC, IDM_DISCR_NUMERIC, IDM_NUMERIC, IDM_BINARY, or IDM_UNDEFINED. The field type is an internal indicator describing which kind of values is expected to be contained in this field. Instead of reading the field type directly, application code should use derived functions such as isNumeric() or isDiscrete().

fieldTypeToDataType()
    An IDM_FieldType uniquely defines an IDM_FieldDataType. If the field type is IDM_CATEGORICAL or IDM_BINARY, the function returns the data type IDM_STRING_TYPE; if the field type is IDM_NUMERIC, IDM_CONT_NUMERIC, or IDM_DISCR_NUMERIC, the function returns the data type IDM_REAL_TYPE; if the field type is none of the types listed above, it returns the data type IDM_UNDEFINED_TYPE.
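The mapping from field type to data type described for fieldTypeToDataType() can be written as a small switch. The sketch below restates that mapping with locally defined enumerations. The enumerator names are taken from the description, but their declarations, order, and numeric values here are assumptions placed in a hypothetical namespace sketch, not the definitions from idmdfld.hpp.

```cpp
namespace sketch {

// Local stand-ins for IDM_FieldType and IDM_FieldDataType (values are assumed).
enum FieldType {
    IDM_UNDEFINED,
    IDM_CATEGORICAL,
    IDM_BINARY,
    IDM_NUMERIC,
    IDM_CONT_NUMERIC,
    IDM_DISCR_NUMERIC
};

enum FieldDataType {
    IDM_UNDEFINED_TYPE,
    IDM_STRING_TYPE,
    IDM_REAL_TYPE
};

// Categorical and binary fields carry strings; all numeric variants carry reals.
inline FieldDataType fieldTypeToDataType(FieldType fieldType) {
    switch (fieldType) {
        case IDM_CATEGORICAL:
        case IDM_BINARY:
            return IDM_STRING_TYPE;
        case IDM_NUMERIC:
        case IDM_CONT_NUMERIC:
        case IDM_DISCR_NUMERIC:
            return IDM_REAL_TYPE;
        default:
            return IDM_UNDEFINED_TYPE;
    }
}

}  // namespace sketch
```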

IDMField


Format:
class IDMField : public IDMGeneralField {
public:
    IDMBOOLEAN getStringValue( const IDMCHAR*& ) const;
    IDMBOOLEAN getRealValue( IDMREAL& ) const;
    IDMBOOLEAN getIndexValue( IDMINTEGER& ) const;
    IDMINTEGER getIndexValue() const;
    IDMINTEGER getFieldWidth() const;

    IDMArray<IDMCHAR*>* getAllStringValues() const;
    IDMArray<IDMREAL>* getAllRealValues() const;
    IDMDiscreteStatistics* getDiscrStatistics() const;
    IDMBOOLEAN isOrdered() const;
    IDMINTEGER getCycleLength() const;
    IDMINTEGER getCycleBegin() const;
    IDMArray<IDMREAL>* getAllDescriptions() const;
};

Member functions:

getStringValue(const IDMCHAR*&)
    Retrieves a value of a field as a string value.

getStringValue()
    Retrieves a value of a field as a string value.

getRealValue(IDMREAL&)
    Retrieves a value of a field as a real value.

getRealValue()
    Retrieves a value of a field as a real value.

getIndexValue(IDMINTEGER&)
    Returns -1 if the value is out of the value range and -2 if the value is invalid.

getIndexValue()
    Returns -1 if the value is out of the value range and -2 if the value is invalid.

getFieldWidth()
    Returns 0 if the field width is not defined.

getAllRealValues()
    Retrieves all values of a field as real values.

getAllStringValues()
    Retrieves all values of a field as string values. Auxiliary function for IDMTableTerm updateStringValue(). Retrieves all values of a discrete data field that are registered in the hash table. Return type should be IDMArray<const IDMCHAR *> *.

getDiscrStatistics()
    Returns a pointer to the discrete statistics object of the field.

isOrdered()
    Returns IDM_TRUE for numeric data fields and IDM_FALSE for nonnumeric data fields.

getCycleLength()
    Returns 0 if the field is not cyclic.

getCycleBegin()
    Returns the beginning of the cycle.

getAllDescriptions()
    If a name mapping has been defined, it returns the set of all descriptions. If not, it returns the result of getAllStringValues(). The i-th description corresponds to the i-th string value.

IDMNumericField

This class is used to describe real value data fields. IDMNumericField is derived from IDMField.

Format:
class IDMNumericField : public IDMField {
public:
    IDMNumericField();
    IDMNumericField( IDMREAL );
    IDMNumericField( IDMCHAR* fieldName,
                     IDMINTEGER fieldWidth,
                     IDM_FieldType=IDM_NUMERIC );

    IDMBOOLEAN getRealValue(IDMREAL& realVal) const;
    IDMREAL getRealValue() const;
    IDMINTEGER getIndexValue() const;
    IDMBOOLEAN getIndexValue(IDMINTEGER& intVal) const;

    IDMBOOLEAN getStringValue(const IDMCHAR*& pStr) const;
    IDMCHAR* getStringValue() const;
    IDMContinuousStatistics* getContStatistics() const;
    IDMArray<IDMCHAR*>* getAllStringValues() const;
    IDMArray<IDMREAL>* getAllRealValues() const;

    IDMINTEGER getCycleLength() const;
    IDMINTEGER getCycleBegin() const;
    IDMBOOLEAN isOrdered() const;
};

Member functions:

IDMNumericField()
    Creates a field.

IDMNumericField( IDMREAL )
    Creates a field with a constant value.

IDMNumericField( IDMCHAR*, IDMINTEGER, IDM_FieldType=IDM_NUMERIC )
    Creates a field where the second argument, fieldWidth, is the number of bytes to be used for printing values. If fieldWidth <= 0, the precision is used as default.

getRealValue()
    If the field has a cyclic value range, getRealValue returns the normalized value, that is, CycleBegin <= realVal < CycleBegin+CycleLength. Statistics and indexing use this normalized value. It is safe and advisable to use getRealValue for noncyclic fields.

getIndexValue()
    Returns -1 if the value is out of the value range and -2 if the value is invalid.

getIndexValue( IDMINTEGER& )
    Returns an index value together with the validity.

getStringValue( const IDMCHAR*& )
    Returns a string value together with the validity.

getStringValue()
    Returns the string representation of getRealValue().

getContStatistics()
    Returns a pointer to the IDMContinuousStatistics object of that field.

getAllStringValues()
    Returns all string values of the field.

getAllRealValues()
    Returns all real values of the field.

getCycleLength()
    Returns the cycle length.

getCycleBegin()
    Returns the cycle begin.

isOrdered()
    Returns IDM_TRUE if a field is ordered.
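For a cyclic field, getRealValue() is described as returning a value normalized into the range CycleBegin <= realVal < CycleBegin+CycleLength. The sketch below shows one way such a normalization can be computed with std::fmod. normalizeCyclic is a hypothetical helper, and treating a cycle length of 0 as "not cyclic" follows the getCycleLength() description; the actual implementation in the product is not documented here.

```cpp
#include <cmath>

// Maps value into [cycleBegin, cycleBegin + cycleLength).
// A cycle length of 0 (or less) means the field is not cyclic; the value is returned as is.
double normalizeCyclic(double value, double cycleBegin, double cycleLength) {
    if (cycleLength <= 0.0) return value;
    double offset = std::fmod(value - cycleBegin, cycleLength);
    if (offset < 0.0) offset += cycleLength;  // fmod keeps the sign of the dividend
    return cycleBegin + offset;
}
```

For example, with a cycle that begins at 0 and has length 360, the values 370 and -90 normalize to 10 and 270.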

IDMCategoricalField

This class is used to describe string data fields. IDMCategoricalField is derived from IDMField.


All relevant member functions are inherited from IDMField.

IDMMultiField

The class IDMMultiField has been designed for future versions of the Intelligent Miner.

IDMMultiNumericField

The class IDMMultiNumericField has been designed for future versions of the Intelligent Miner.

IDMMultiCategoricalField

The class IDMMultiCategoricalField has been designed for future versions of the Intelligent Miner.

IDMContinuousStatistics

The class IDMContinuousStatistics provides methods for computing and retrieving the statistics for continuous fields. The collection of statistics for continuous data fields comprises mean, variance, maximum, minimum, and distribution. The distribution of values is computed for the values between the lowest and the highest limit. This range is split into intervals of equal width (histogram buckets), and the frequencies are computed for each bucket.

The functions that are relevant for the use of the Result API are described in the following sections.

Header file: idmdstat.hpp

Format:
class IDMContinuousStatistics {
public:
    IDMREAL getMean() const;
    IDMREAL getVariance() const;
    IDMREAL getValuesSum() const;
    IDMREAL getSquaresSum() const;
    IDMREAL getMin() const;
    IDMREAL getMax() const;
    IDMREAL getLowestLimit() const;
    IDMREAL getHighestLimit() const;

    IDM_WidthUnit getBucketWidthUnit() const;
    IDMREAL getBucketWidth() const;
    IDMINTEGER getNumberOfBuckets() const;
    IDMINTEGER getNumberOfHistoBuckets() const;
    IDMArray<IDMREAL>* getBucketLimits() const;

    IDMLONGINT getFrequency( IDMREAL value ) const;
    IDMLONGINT getFrequency( IDMINTEGER bucketNumber ) const;
    IDMREAL getValuesSum( IDMINTEGER bucketNumber ) const;
    IDMREAL getSquaresSum( IDMINTEGER bucketNumber ) const;
    IDMLONGINT getTotalFrequency() const;
    IDMLONGINT getNbValidValues() const { return ivNbValidValues; };
    IDMBOOLEAN getHistoBucket( IDMREAL value,
                               IDMINTEGER& bucketNumber );
    IDMBOOLEAN getBucketLimits( IDMINTEGER bucketNumber,
                                IDMREAL& lowerLimit,
                                IDMREAL& upperLimit ) const;

    IDMBOOLEAN getBucketLimitsAsString( IDMINTEGER bucketNumber,
                                        IDMCHAR*& pBucketLimits ) const;

    IDMLONGINT getSumOfFrequencies( IDMBOOLEAN computeFromScratch=IDM_FALSE ) const;
};

Member functions:

getMean()
    Returns the mean value of the field.

getVariance()
    Returns the variance value of the field. Used for segmentation results.

getValuesSum()
    Returns the sum of all values of the field.

getSquaresSum()
    Returns the sum of the squares of all values of the field.

getMin()
    Returns the minimum value of the field.

getMax()
    Returns the maximum value of the field.

getLowestLimit()
    Returns the lowest limit of a field for which the distribution of values is computed.

getHighestLimit()
    Returns the highest limit of a field for which the distribution of values is computed.

getBucketWidthUnit()
    Returns the bucket width unit.

getBucketWidth()
    Returns the bucket width.

getNumberOfBuckets()
    Returns the number of buckets.

getNumberOfHistoBuckets()
    Returns the number of histogram buckets. getNumberOfHistoBuckets is an old name for getNumberOfBuckets. Both functions are identical.

getBucketLimits()
    Returns NULL if the number of elements is 0; otherwise returns the bucket limits.

getFrequency( IDMREAL )
    Returns the frequency of occurrences of values for the bucket this value belongs to.

getFrequency( IDMINTEGER )
    Returns the frequency of occurrences of values for the histogram bucket with this number. Bucket numbering starts with 0.

getValuesSum( IDMINTEGER )
    Returns the sum of all values in the bucket with this number.

getSquaresSum( IDMINTEGER )
    Returns the sum of the squares of all values in the bucket with this number.

getTotalFrequency()
    Returns the total number of occurrences of values for this field.

getNbValidValues()
    Returns the number of all valid values.

getHistoBucket()
    Returns the number of the bucket this value belongs to.

getBucketLimits( IDMINTEGER, IDMREAL&, IDMREAL& )
    Returns the limits of the bucket with this number.

getBucketLimitsAsString()
    Returns the string <lowerLimit>, <upperLimit>.

getSumOfFrequencies()
    Returns the sum of all frequencies.
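The distribution for a continuous field splits the range between the lowest and highest limit into buckets of equal width, numbered from 0. The sketch below shows how a value can be mapped to its bucket and how bucket limits follow from the bucket number. EqualWidthHistogram is a hypothetical type that borrows the member names for readability; placing a value equal to the highest limit into the last bucket, and rejecting values outside the limits, are assumptions about boundary handling.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical equal-width histogram over [lowestLimit, highestLimit].
struct EqualWidthHistogram {
    double lowestLimit;
    double highestLimit;
    std::vector<long> frequencies;  // one counter per bucket

    EqualWidthHistogram(double low, double high, int buckets)
        : lowestLimit(low),
          highestLimit(high),
          frequencies(static_cast<std::size_t>(buckets > 0 ? buckets : 1), 0L) {
        assert(buckets > 0 && high > low);
    }

    int numberOfBuckets() const { return static_cast<int>(frequencies.size()); }
    double bucketWidth() const { return (highestLimit - lowestLimit) / numberOfBuckets(); }

    // Finds the bucket for a value; returns false if the value lies outside the limits.
    bool getHistoBucket(double value, int& bucketNumber) const {
        if (value < lowestLimit || value > highestLimit) return false;
        bucketNumber = static_cast<int>((value - lowestLimit) / bucketWidth());
        if (bucketNumber >= numberOfBuckets()) bucketNumber = numberOfBuckets() - 1;
        return true;
    }

    // Computes the limits of a bucket; returns false for an invalid bucket number.
    bool getBucketLimits(int bucketNumber, double& lowerLimit, double& upperLimit) const {
        if (bucketNumber < 0 || bucketNumber >= numberOfBuckets()) return false;
        lowerLimit = lowestLimit + bucketNumber * bucketWidth();
        upperLimit = lowerLimit + bucketWidth();
        return true;
    }

    // Counts one occurrence of value; returns false if it was out of range.
    bool add(double value) {
        int bucket = 0;
        if (!getHistoBucket(value, bucket)) return false;
        ++frequencies[static_cast<std::size_t>(bucket)];
        return true;
    }
};
```

With limits 0 and 10 and five buckets, the bucket width is 2, so the value 3.0 falls into bucket 1, whose limits are 2 and 4.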

IDMDiscreteStatistics

The class IDMDiscreteStatistics provides methods for computing and retrieving the statistics for discrete fields.

The functions that are relevant for the use of the Result API are described in the following sections.

Header file: idmdstat.hpp

Format:
class IDMDiscreteStatistics {
public:
    IDMDiscreteStatistics();
    IDMDiscreteStatistics( const IDMDiscreteStatistics& );
    IDMDiscreteStatistics( IDMRETURN &rc,
                           const IDMArray<IDMLONGINT> *pFrequencies,
                           const IDMLONGINT freqOutOfRange,
                           const IDMLONGINT totalFrequency,
                           const IDMCHAR* pFieldName = NULL );

    IDMINTEGER getNumberOfValues() const;
    const IDMArray<IDMINTEGER>* getFrequencies() const;
    IDMLONGINT getFrequency(IDMINTEGER indexValue) const;
    IDMLONGINT getTotalFrequency() const;

    IDMINTEGER getNumberOfBuckets() const;
    IDMArray<IDMLONGINT>* getFrequencies() const;
    IDMLONGINT getSumOfFrequencies() const;
    IDMLONGINT getOutOfRangeFrequency() const;
};

Member functions:

getNumberOfValues()
    Retrieves the number of distinct values for that field.

getNumberOfBuckets()
    Returns the number of buckets in the frequency array. Note that the number of buckets might be greater than getNumberOfValues(), for example, if there are buckets with frequency 0.

getFrequencies()
    Returns a pointer to an array of integers; the frequency for an index value i is the i-th element of this array. It represents the frequency of the i-th element of the result of getAllStringValues(), getAllIntegerValues(), or getAllRealValues() of the discrete field this discrete statistics object is a member of.

getFrequency()
    Returns the number of occurrences for that index value.

getTotalFrequency()
    Returns the total number of occurrences of values for that field.

getFrequencies()
    Returns NULL if the number of elements is 0; otherwise returns the frequencies.

getSumOfFrequencies()
    Returns the sum of all frequencies.

getOutOfRangeFrequency()
    Returns the frequency of values that are out of range.
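The relationship between index values, the frequency array, and the out-of-range count can be illustrated with a small tally. In the sketch below, index values follow the getIndexValue() convention (-1 for out of range, -2 for invalid). DiscreteTally and tally are hypothetical names; counting "distinct values" as the buckets with a nonzero frequency, and keeping out-of-range and invalid counts separate from the sum of frequencies, are assumptions that the reference does not spell out.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical frequency tally for a discrete field.
struct DiscreteTally {
    std::vector<long> frequencies;  // frequencies[i] counts index value i
    long outOfRange = 0;            // index value -1, or beyond the bucket range
    long invalid = 0;               // index value -2

    int numberOfBuckets() const { return static_cast<int>(frequencies.size()); }

    // Assumed meaning: number of buckets that were hit at least once.
    int numberOfValues() const {
        int n = 0;
        for (long f : frequencies) if (f > 0) ++n;
        return n;
    }

    long sumOfFrequencies() const {
        return std::accumulate(frequencies.begin(), frequencies.end(), 0L);
    }
};

// Counts index values into numberOfBuckets buckets.
DiscreteTally tally(const std::vector<int>& indexValues, int numberOfBuckets) {
    assert(numberOfBuckets >= 0);
    DiscreteTally t;
    t.frequencies.assign(static_cast<std::size_t>(numberOfBuckets), 0L);
    for (int index : indexValues) {
        if (index == -2) ++t.invalid;
        else if (index < 0 || index >= numberOfBuckets) ++t.outOfRange;
        else ++t.frequencies[static_cast<std::size_t>(index)];
    }
    return t;
}
```

A tally with empty buckets shows why the number of buckets can exceed the number of distinct values.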

IDMArray

IDMArray provides methods for an array that extends its size automatically when elements are added.

Header file: idmdarr.hpp

Format:
template <class Type>
class IDMArray {
public:
    IDMArray(const IDMArray<Type>&);
    IDMArray(Type initValue, IDMINTEGER initSize = IDM_MIN_ARRAY_SIZE);
    ~IDMArray();
    IDMINTEGER numberOfElements() const;
    IDMINTEGER size() const;
    const Type& operator[](IDMINTEGER index) const;
    const Type get(IDMINTEGER index) const;
    IDMINTEGER find(Type element) const;
    void append(IDMArray<Type>&);
    void append(IDMArray<Type>*);
    void addAtPosition(IDMINTEGER index, Type newElement);
    inline void addAsLast(Type newElement);
    void removeInitValues();
    inline void removeAll();
    Type removeAtPosition(IDMINTEGER index = 0);
    Type replaceAtPosition(IDMINTEGER index, Type newElement);
};

Member functions:

IDMArray(const IDMArray<Type>&)
    The copy constructor.

IDMArray(Type, IDMINTEGER)
    Creates an array with the given initial size and initial value. It allocates memory for initSize elements of type Type.

numberOfElements()
    Returns the number of elements of the array.

size()
    Returns the size of the internal array, that is, the number of elements for which memory has been allocated.

array()
    Returns a pointer to the internal array. This can be useful for faster access to array elements.

operator[]
    Returns the i-th element.

get()
    Returns the i-th element.

find()
    Returns the index of the element in the array if the array contains it; otherwise returns -1.

append(IDMArray<Type>&)
    Appends another array at the end of the array. The size of the array is extended if necessary.

append(IDMArray<Type>*)
    Appends another array at the end of the array. The size of the array is extended if necessary.

addAtPosition()
    Adds an element at a certain position. The size of the array is extended if necessary.

addAsLast()
    Adds an element at the end of the array. The size of the array is extended if necessary.

removeInitValues()
    Removes all elements equal to the initial value.

removeAll()
    Removes all elements.

removeAtPosition()
    Removes the element at the given position.

replaceAtPosition()
    Replaces the element at the given position by another element.
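The difference between numberOfElements() (elements stored) and size() (elements allocated) is easiest to see in a small growable array. The sketch below is an illustration under assumptions, not the IDMArray implementation: GrowableArray is a hypothetical class, only a few members are modeled, and doubling the allocation when it is full is an assumed growth policy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical array that pre-allocates slots filled with an initial value.
template <class Type>
class GrowableArray {
public:
    explicit GrowableArray(Type initValue, int initSize = 4)
        : initValue_(initValue),
          slots_(static_cast<std::size_t>(initSize > 0 ? initSize : 1), initValue),
          count_(0) {}

    // Number of elements that have been added.
    int numberOfElements() const { return count_; }

    // Number of slots for which memory has been allocated.
    int size() const { return static_cast<int>(slots_.size()); }

    const Type& operator[](int index) const {
        assert(index >= 0 && index < count_);
        return slots_[static_cast<std::size_t>(index)];
    }

    // Returns the index of the first matching element, or -1 if there is none.
    int find(const Type& element) const {
        for (int i = 0; i < count_; ++i)
            if (slots_[static_cast<std::size_t>(i)] == element) return i;
        return -1;
    }

    // Appends an element, doubling the allocation when all slots are in use.
    void addAsLast(const Type& newElement) {
        if (count_ == size()) slots_.resize(slots_.size() * 2, initValue_);
        slots_[static_cast<std::size_t>(count_)] = newElement;
        ++count_;
    }

private:
    Type initValue_;
    std::vector<Type> slots_;
    int count_;
};
```

Starting with two slots and adding three elements leaves three elements stored in four allocated slots under this growth policy.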


Appendix A. Sample applications using the Environment Layer API

Two sample programs are provided to illustrate the usage of the Environment Layer API.

Sample application using flat files

The sample Environment Layer API program for flat files does the following:
- Builds a mining base
- Builds a name mapping object
    - Constructs the flat-file field objects
    - Constructs the flat-file table
    - Constructs the data table
    - Constructs the name-mapping object
- Builds a data object
    - Constructs the flat-file field objects
    - Assigns the name mapping object to the ITEMID flat-file field object
    - Constructs the flat-file table
    - Constructs the data table
    - Constructs the data object
- Builds an Associations-settings object
- Starts an Associations-mining run
- Gets the result of the Associations-mining run and launches the browser
- Saves the mining base
- Closes the mining base
- Loads the mining base
- Removes the mining base

You can build and run this sample program yourself. The following files are included with the product in the directory /usr/lpp/IMiner/samples:

idmenvsa.cpp
    Code of the sample program

Makefile
    Makefile to generate an executable out of the sample program code

sample.names
    Name-mapping sample data file

sample.datawk1
    First flat file containing sample transaction data

sample.datawk2
    Second flat file containing sample transaction data


Source code of the flat file sample program

Following is the complete source code of the sample program idmenvsa.cpp for your reference.

/* *//* Licensed Materials - Property of IBM *//* *//* 5648-127 5655-161 5733-IM1 *//* 5697-IM2 5655-IM2 5733-IM2 5697-IMP *//* 5697-IM3 5655-IM3 5733-IM3 5697-IMQ *//* (C) Copyright IBM Corporation 1996, 1999 *//* *//* All rights reserved. *//* US Government Users Restricted Rights - *//* Use, duplication or disclosure restricted by GSA ADP *//* Schedule Contract with IBM Corporation. *//* */

#include "idmcmnb.hpp"#include "idmcffld.hpp"#include "idmcfptb.hpp"#include "idmcdatb.hpp"#include "idmdglob.hpp"

/**************************************************************************//* DECLARATIONS *//**************************************************************************/

IDMRETURN rc = IDM_SUCCESS; // Returncode of the API methods

// Variables used to build a mining baseIDMMiningBase *pBase = NULL;IString base("BASE1");

// Variables used to define the data fields of the flat file table and// the flat file itself which is// used for the mining run ( /usr/lpp/IMiner/samples/sample.datawk1 and// /usr/lpp/IMiner/samples/sample.datawk2 )

IKeySortedSet dataFields;IDMFlatFileField *pField1 = NULL;IDMFlatFileField *pField2 = NULL;ISequence field1Positions;ISequence field2Positions;IDMFlatFileTable *pDataTable = NULL;ISequence fileNamesData;

// Variables used to define the data fields of a name mapping// flat file table and the flat file table itself// ( /usr/lpp/IMiner/samples/sample.names ) which is used// to assign a name to the items in ITEMID field in files sample.datawk1// and sample.datawk2

IKeySortedSet nameFields;IDMFlatFileField *pName1 = NULL;IDMFlatFileField *pName2 = NULL;ISequence name1Positions;ISequence name2Positions;IDMFlatFileTable *pNameTable = NULL;ISequence fileNamesNmp;

// Variables used to define a data table objectISequence compFields;


// Variables used to define a data objectIString dataObjName( "DataObj");IDMData *pDataObject = NULL;

// Valriables used to define a name mapping objectIString nmpObjName( "NmpObj");IDMNameMapping *pNmpObject = NULL;IString item( "ITEM");IString desc( "DESC");

// Variables used to define an association settings object
IString assObjName( "AssObj");
IDMAssocSettings *pAssObject = NULL;
IDMDOUBLE confidence = 25.0;
IDMDOUBLE support = 3.0;
IDMINTEGER maxRuleLength = 10;
IDMTaxonomy *pTaxonomy=NULL;
IDMSelections selections;
IDMItemConstraints itemConstraints;

// Variables used to define a result object
IDMResult *pAssResult;

// Variables used to launch a browser
IDMBrowseFormatDefs formatDefs;
IDMBrowseFormat format;
IDMBrowseFormatDefs::Cursor formatDefCur( formatDefs );

// Variables used to get exceptions
IDMExceptionType excType;
IString excText;
IDMINTEGER excId = 0;
IDMCOMPONENT comp;
IDMRETURN severity;

void deleteObjects() {
  if ( pBase ) pBase->deleteObject();
  if ( pField1 ) delete pField1;
  if ( pField2 ) delete pField2;
  if ( pDataTable ) delete pDataTable;
  if ( pName1 ) delete pName1;
  if ( pName2 ) delete pName2;
  if ( pNameTable ) delete pNameTable;
}

void handleException( IString text ) {
  IDMException *pExc;
  cout << text << ": ";
  if (rc != IDM_SUCCESS) {
    pExc = IDMBase::getException();
    pExc->get( excType, comp, excId, severity, excText );
    cout << excId << "," << excText << endl;
    if (rc < IDM_SUCCESS) { // rc could be IDM_WARNING
      deleteObjects();
      exit(1);
    }
  } else {
    cout << "OK!" << endl;
  }
}


/**************************************************************************/
/* MAIN PROGRAM */
/**************************************************************************/

int main(int argc, char *argv[])
{

/**************************************************************************/
/* Set the hostname of the server process */
/**************************************************************************/

  if (argc==4) {
    IDMBase::setHostName( argv[1] );
    IDMBase::setUserId( argv[2] );
    IDMBase::setPassword( argv[3] );
  } else {
    cout << "Usage: " << argv[0] << " Hostname UserID Password" << endl;
    exit(1);
  }

/**************************************************************************/
/* Check whether IDM_BIN_DIR environment variable is set */
/**************************************************************************/

  if (getenv(IDM_BIN_DIR)==NULL) {
    cout << "Warning: environment variable \""
         << IDM_BIN_DIR
         << "\" is not set - no detailed warnings.\n";
  } /* endif */

/**************************************************************************/
/* Build a mining base */
/**************************************************************************/

  pBase = new IDMMiningBase( rc, base );
  handleException("Create Mining Base");

/**************************************************************************/
/* Build a name mapping object */
/**************************************************************************/

  // Set the positions of the fields. The item id field ITEM begins
  // at column 1 and ends at column 3. The description field DESC begins at
  // column 5 and ends at column 23.

  name1Positions.add(1);
  name1Positions.add(3);
  name2Positions.add(5);
  name2Positions.add(23);

  // Construct the field objects

  pName1 = new IDMFlatFileField( rc, "ITEM", IDM_CATEGORICAL, name1Positions );
  pName2 = new IDMFlatFileField( rc, "DESC", IDM_CATEGORICAL, name2Positions );

  nameFields.add( pName1 );
  nameFields.add( pName2 );

  // Add the filename of the name mapping flat file table to the sequence
  // of file names

#ifdef _SOLARIS
  fileNamesNmp.add( "/opt/IMiner/samples/sample.names" );
#else
#ifdef WIN32
  fileNamesNmp.add( "/im/sample/sample.names" );
#else
  fileNamesNmp.add( "/usr/lpp/IMiner/samples/sample.names" );
#endif
#endif

  // Construct the name mapping flat file table. The record length of one
  // record in the file sample.names is 24

  pNameTable = new IDMFlatFileTable( rc, fileNamesNmp, 24, nameFields );

  // Build the data table holding a pointer to the flat file table object
  // and a list of computed fields, which are not used in this program.

  IDMDataTable nameTable( rc, pNameTable, compFields );

  // Construct the name mapping object

  rc = IDMNameMapping::createObject( nmpObjName, pBase, nameTable,
                                     item, desc, pNmpObject );

  handleException("Create Name Mapping Object");

/**************************************************************************/
/* Build a data object */
/**************************************************************************/

  // Set the positions of the fields. The transaction id field TRANSID begins
  // at column 20 and ends at column 24. The item id field ITEMID begins at
  // column 26 and ends at column 28.

  field1Positions.add(20);
  field1Positions.add(24);
  field2Positions.add(26);
  field2Positions.add(28);

  // Construct the field objects

  pField1 = new IDMFlatFileField( rc, "TRANSID", IDM_CATEGORICAL,
                                  field1Positions );

  pField2 = new IDMFlatFileField( rc, "ITEMID", IDM_CATEGORICAL,
                                  field2Positions );

  // Assign the name mapping object to ITEMID field
  rc = pField2->setNameMapping(pNmpObject);

  // Add the field objects to the IKeySortedSet Collection of IDMFlatFileField
  // objects

  dataFields.add( pField1 );
  dataFields.add( pField2 );

  // Add the filenames of the data flat file table to the sequence
  // of file names

#ifdef _SOLARIS
  fileNamesData.add( "/opt/IMiner/samples/sample.datawk1" );
  fileNamesData.add( "/opt/IMiner/samples/sample.datawk2" );
#else
#ifdef WIN32
  fileNamesData.add( "/im/sample/sample.datawk1" );
  fileNamesData.add( "/im/sample/sample.datawk2" );
#else
  fileNamesData.add( "/usr/lpp/IMiner/samples/sample.datawk1" );
  fileNamesData.add( "/usr/lpp/IMiner/samples/sample.datawk2" );
#endif
#endif

  // Construct the data flat file table. The record length of one
  // record in the files sample.datawk1 and sample.datawk2 is 29

  pDataTable = new IDMFlatFileTable( rc, fileNamesData, 29, dataFields );

  // Build the data table holding a pointer to the flat file table object
  // and a list of computed fields, which are not used in this program.

  IDMDataTable dataTable( rc, pDataTable, compFields );

  // Construct the data object

  rc = IDMData::createObject( dataObjName, pBase, dataTable,
                              IDM_INPUT_ONLY, pDataObject );

  handleException("Create Data Object");

/**************************************************************************/
/* Build an associations settings object */
/**************************************************************************/

  rc = IDMAssocSettings::createObject( assObjName, pBase, pDataObject,
                                       pTaxonomy, selections,
                                       itemConstraints, "ITEMID",
                                       "TRANSID", confidence, support,
                                       maxRuleLength, pAssObject );

  handleException("Create Association Settings Object");

#ifndef WIN32
  IDMBase::setWorkingDirectory("/tmp");
#endif

  IDMBase::setMemorySize(32);

/**************************************************************************/
/* Sets the name of the result object that is generated after the */
/* association run. If a result object with the name "AssResult" */
/* already exists, it is overwritten. */
/* Note: This method is one possibility to make a result of a mining run */
/* persistent in the mining base. The other possibility is shown in */
/* sample file idmsdb2.cpp. */
/**************************************************************************/

  pAssObject->setResultName("AssResult");

/**************************************************************************/
/* Start an association run */
/**************************************************************************/

  rc = pAssObject->start();
  handleException("Start Association Run");

/**************************************************************************/
/* Retrieve the result object "AssResult" from the mining base */
/**************************************************************************/

  rc = pBase->getElement("AssResult", pAssResult );
  handleException("Get result element from mining base");

/**************************************************************************/
/* Launch the browser for Associations */
/* 1) Retrieve all available browse formats for Associations from file */
/* idmcsctr.dat ( client registration tool client file ) */
/* 2) Out of the available browse formats select the one you want to use */
/* 3) Launch the browser for your selected browse format */
/**************************************************************************/

  rc = IDMResult::getBrowseFormats( IDM_ASS_RESULT, formatDefs );
  handleException("Get Browse Formats");

  IDMBrowseFormatKey key(IDM_ASS_RESULT, "IdmApi", "Browse Associations" );

  if (formatDefs.locateElementWithKey( key, formatDefCur )) {
    format = formatDefs.elementAt( formatDefCur );
  }

  rc = pAssResult->launchBrowser( format );
  handleException("Launch Browser");

/**************************************************************************/
/* Save the mining base */
/**************************************************************************/

  rc = pBase->save();
  handleException("Save Mining Base");

/**************************************************************************/
/* Close the mining base */
/**************************************************************************/

  delete pBase;

/**************************************************************************/
/* Load the mining base */
/**************************************************************************/

  rc = IDMMiningBase::load( base, pBase );
  handleException("Load Mining Base");

/**************************************************************************/
/* Remove the mining base */
/**************************************************************************/

  rc = pBase->deleteObject();
  handleException("Delete Mining Base");

  pBase = NULL;
  deleteObjects();

  return 0;
} /* end of main */
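The field positions in the listing above are given as a first and a last column (for example, ITEM occupies columns 1 to 3 and DESC columns 5 to 23 of each record). As a rough illustration of what such a column range selects, the following sketch extracts a field from a fixed-width record using only the standard C++ library. The function extractField and the sample record are hypothetical and are not part of the Intelligent Miner API; the sketch assumes 1-based, inclusive column numbers and strips trailing blanks.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper for illustration; not part of the Intelligent Miner API.
// Returns the text in columns firstCol..lastCol (1-based, inclusive) of a
// fixed-width record, with trailing blanks removed.
std::string extractField(const std::string& record,
                         std::size_t firstCol, std::size_t lastCol)
{
    if (firstCol == 0 || lastCol < firstCol || firstCol > record.size()) {
        return std::string();            // range lies outside the record
    }
    // substr() clamps the length if the record is shorter than lastCol.
    std::string field = record.substr(firstCol - 1, lastCol - firstCol + 1);
    std::size_t lastNonBlank = field.find_last_not_of(' ');
    if (lastNonBlank == std::string::npos) {
        return std::string();            // field contains only blanks
    }
    return field.substr(0, lastNonBlank + 1);
}
```

For a made-up 23-character record that starts with "101 Cola", extractField(record, 1, 3) yields the item id and extractField(record, 5, 23) yields the description.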

Running the flat file sample program

To run the sample program on AIX, do the following steps:

1. On the client, type make idmsflat in the /usr/lpp/IMiner/samples directory. The output of the compile and link job is an executable called idmsflat. Ensure that you have write permission to the directory mentioned, or copy the Makefile and idmenvsa.cpp to another directory where you can write.

2. On the server side, set the environment variable IDM_MNB_DIR to a directory that should contain the meta-data of the mining bases. The home directory is used if you do not set this variable.

3. On the server side, set the environment variable IDM_RES_DIR to a directory that should contain the results of the mining runs. The home directory is used if you do not set this variable.

4. On the server side, start the server daemon by entering idmstart in the /usr/lpp/IMiner/bin directory.

5. On the client side, start the idmsflat executable by typing: idmsflat <server hostname> <server userid> <server password>
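The fallback rule for IDM_MNB_DIR and IDM_RES_DIR (use the variable if it is set, otherwise the home directory) can be sketched as follows. This is only an illustration of the rule as stated in the steps above, not the server's actual code. The function names are hypothetical, and the wrapper assumes that the HOME variable holds the home directory, as is usual on UNIX systems.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Illustrative only: picks the configured directory, or the home directory
// when the variable is not set. Both parameters are plain values so the rule
// can be checked without touching the real environment.
std::string chooseDirectory(const char* configured, const std::string& homeDir)
{
    return (configured != nullptr) ? std::string(configured) : homeDir;
}

// Convenience wrapper that reads the process environment.
// Assumption: HOME holds the home directory.
std::string resolveFromEnvironment(const char* variableName)
{
    const char* home = std::getenv("HOME");
    return chooseDirectory(std::getenv(variableName),
                           home != nullptr ? std::string(home) : std::string());
}
```

Calling resolveFromEnvironment("IDM_MNB_DIR") in a shell where the variable is unset returns the value of HOME, which shows where mining base meta-data would end up under this rule.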

Sample application using DB2

The Intelligent Miner provides a DB2 sample application. You can build and run this sample application. It performs the following tasks:

• Building the mining base IMSAMPLE
• Building a DB2 table
  – Initializing the DB2 environment
  – Retrieving the database name, table, and schema
  – Printing the field names of the selected database
• Building an input data object
• Building an output data object
• Building a Clustering settings object
  You can change the following settings:
    Values of IKeyset< IDMClusFieldParams*, IString> clusFieldParams;
    IDMINTEGER maxNumberOfPasses = 4;
    IDMINTEGER maxNumberOfClusters = 10;
    IDMREAL accuracy = 5;
    IDMREAL similarityThreshold = 0.9;
• Setting the active, supplementary, and output fields
  If the field names in your database differ from those written into the active, supplementary, and output fields, change them in the source code. The valid fields are printed as field names of the input object.
• Starting a Demographic Clustering run in training mode and saving its result.
• Starting a Demographic Clustering run in application mode and retrieving data on the output data object:
  – The name of the object
  – The table type (DB2/Flat file)
  – The names of the fields in the output object
• Terminating the DB2 environment
• Removing the mining base

The following sections describe how to use this application on AIX.

Source code of the DB2 sample application

On AIX, the source code of the DB2 sample application is stored in the directory /usr/lpp/IMiner/samples. This directory contains the following files:

idmsdb2.cpp
    Code of the DB2 sample program

Makefile
    Makefile to generate an executable out of idmsdb2.cpp

Following is the complete DB2 sample application for your reference.


/* */
/* Licensed Materials - Property of IBM */
/* */
/* 5648-127 5655-161 5733-IM1 */
/* 5697-IM2 5655-IM2 5733-IM2 5697-IMP */
/* 5697-IM3 5655-IM3 5733-IM3 5697-IMQ */
/* (C) Copyright IBM Corporation 1996, 1999 */
/* */
/* All rights reserved. */
/* US Government Users Restricted Rights - */
/* Use, duplication or disclosure restricted by GSA ADP */
/* Schedule Contract with IBM Corporation. */
/* */

/* Note: All lines marked with //CHANGE! are to be changed */
/* if running on another system! */

#include "idmcmnb.hpp"
#include "idmcdb2.hpp"
#include "idmcdatb.hpp"
#include "idmdglob.hpp"
#include "idmcclus.hpp"

/**************************************************************************/
/* DECLARATIONS */
/**************************************************************************/

IDMRETURN rc = IDM_SUCCESS; // Returncode of the API methods

// Variables used to build a mining base
// These parameters are to be changed for a particular database
IDMMiningBase *pBase = NULL;
IString base("IMSAMPLE");     //CHANGE! // the name of the database
IString tname("STANDARD");    //CHANGE! // the name of the table
IString tnameOut("STANDOUT");           // the name of the output table
IString schema("IDMDB2");     //CHANGE! // the name of the schema

IKeySortedSet get_dataFields;
IKeySortedSet::Cursor fieldCur(get_dataFields );

IKeySet clusFieldParams;
IKeySet::Cursor clusCur( clusFieldParams );

// Variables used to define the data fields of the DB2 database
IDMDB2Table *pDataTable = NULL;
IDMDB2Table *pDataTableOut = NULL;
SQLHENV henv; //environment handle
SQLHDBC hdbc; //connection
IKeySortedSet dataFieldsOut;

// Variables used to define a data table object
ISequence compFields;

// Variables used to define a data object
IString dataObjName("DataObj");
IDMData *pData = NULL;

// Variables used to define an output data object
IString dataOutObjName("DataOutObj");
IDMData *pOutputData = NULL;

// Variables used to define a clustering settings object
IDMClusteringSettings *pClObject = NULL;
IString clObjName("clObj");
IDMUSEMODE useMode = IDM_TRAINING_MODE;
IDMSelections selection;
ISequence activeFields;
ISequence supplementaryFields;
ISequence outputFields;
ISequence fieldWeights;
IString clusterField("clClusterField");
IString scoreField("clScoreField");
IString clusterResult("");
IDMINTEGER maxNumberOfPasses = 4;
IDMINTEGER maxNumberOfClusters = 10;
IDMREAL accuracy = 5;
IDMREAL similarityThreshold = 0.9;
ISequence probeWeightingFlags;
ISequence infoWeightingFlags ;
ISequence distanceUnits ;

// Variables used to define the result object
IDMResult result;
IDMResult *pClResult; // ClusteringResult

// Variables used to get exceptions
IDMExceptionType excType;
IString excText;
IDMINTEGER excId = 0;
IDMCOMPONENT comp;
IDMRETURN severity;
/**************************************************************************/
/* Functions used to get exceptions: void handleException(Message) */
/**************************************************************************/

void deleteObjects() {
  if (pBase) pBase->deleteObject();
  if (pDataTable) delete pDataTable;
  if (pDataTableOut) delete pDataTableOut;

  forCursor(fieldCur) {
    delete fieldCur.element();
  }

  forCursor(clusCur) {
    delete clusCur.element();
  }
}

void handleException( IString text ) {
  IDMException *pExc;
  cout << text << " ";
  if (rc != IDM_SUCCESS) {
    pExc = IDMBase::getException();
    pExc->get( excType, comp, excId, severity, excText );
    cout << excId << "," << excText << endl;
    if (rc < IDM_SUCCESS) { // rc could be IDM_WARNING
      deleteObjects();
      exit(1);
    }
  } else {
    cout << "OK!" << endl;
  }
}

/**************************************************************************/
/* MAIN PROGRAM */
/**************************************************************************/

int main(int argc, char *argv[])
{

  cout << "Begin : "<< argv[0] << endl;

/**************************************************************************/
/* Important information for IDMBase */
/**************************************************************************/

  if (argc==6) {
    IDMBase::setHostName( argv[1]);
    IDMBase::setUserId( argv[2]);
    IDMBase::setPassword( argv[3]);
    IDMBase::setDB2UserId( argv[4]);
    IDMBase::setDB2Password(argv[5]);
  } else {
    cout << "Usage: " << argv[0]
         << " Hostname UserID Password DB2UserID DB2Password" << endl;
    exit (1);
  }

/**************************************************************************/
/* Check whether IDM_BIN_DIR environment variable is set */
/**************************************************************************/

  if (getenv(IDM_BIN_DIR)==NULL) {
    cout << "Warning: environment variable \""
         << IDM_BIN_DIR
         << "\" is not set - no detailed warnings.\n";
  } /* endif */

/**************************************************************************/
/* Open a mining base */
/**************************************************************************/

  pBase = new IDMMiningBase( rc , base );
  handleException(" new Mnb: ");

/**************************************************************************/
/* Build the description for the DB2 table.... */
/**************************************************************************/

  pDataTable = new IDMDB2Table(rc, base, schema, tname,
                               IDMBase::getDB2UserId(),
                               IDMBase::getDB2Password());

  handleException(" new IDMDB2Table: ");

  pDataTableOut = new IDMDB2Table(rc, base, schema, tnameOut, dataFieldsOut); //,dataFieldsOut
  handleException(" new IDMDB2Table: ");

/**************************************************************************/
/* Print the fields of pDataTable: */
/**************************************************************************/

  // define a cursor ...
  ISortedSet fieldNames1;
  pDataTable->getFieldNames(fieldNames1);
  ISortedSet::Cursor myCursor1(fieldNames1);

  myCursor1.setToFirst();
  cout << " Fieldnames of the INPUT Object: " << endl;
  cout << " -" << fieldNames1.elementAt(myCursor1) << endl; // first element
  while (myCursor1.setToNext()) {
    cout <<" -";
    cout << fieldNames1.elementAt(myCursor1) << endl;
  }

/**************************************************************************/
/* Build a data object */
/**************************************************************************/
  IDMDataTable dataTable(rc, pDataTable, compFields );

  rc = IDMData::createObject(dataObjName, pBase, dataTable, IDM_INPUT_ONLY,
                             pData );

  handleException(" IDMData::createObject(i): ");

/**************************************************************************/
/* Build an output data object */
/**************************************************************************/
  IDMDataTable dataOutTable( rc , pDataTableOut , compFields);

  rc = IDMData::createObject(dataOutObjName , pBase, dataOutTable ,
                             IDM_INPUT_OUTPUT, pOutputData);

  handleException(" IDMData::createObject(o): ");

/**************************************************************************/
/* Initializes the DB2 environment ... */
/**************************************************************************/
  rc = IDMDB2Table::initializeDB2(,);
  handleException(" initializeDB2: ");

/**************************************************************************/
/* gets the database name, the table name, and so on */
/**************************************************************************/
  IString get_base;   // database_name
  IString get_tname;  // tablename
  IString get_schema; // schema

  rc = pDataTable->get(get_base,get_schema,get_tname,get_dataFields);
  handleException(" getDB2: ");

cout << " -> dbname:"<setDemoClusParameters( similarityThreshold, accuracy );

handleException(" new IDMClustering(T): ");

/**************************************************************************/
/* Start a clustering settings run ... (in IDM_TRAINING_MODE) */
/**************************************************************************/

  rc = pClObject->start();

  handleException(" clustering run(T): ");

/**************************************************************************/
/* Save the result of the clustering settings run...(in IDM_TRAINING_MODE)*/
/* Note: The following code shows one possibility how a result of a */
/* mining run can be made persistent in the mining base. Use this */
/* possibility only if you have not assigned a name for a result object */
/* to your settings object using method IDMSettings::setResultName(IString*/
/* name) before starting the mining run. */
/* The other possibility of making a result of a mining run persistent */
/* in the mining base is shown in sample file idmenvsa.cpp */
/**************************************************************************/
  result = pClObject->getResult();

  pClResult = new IDMResult(rc,"ClResult",result);

  handleException(" get result(T): ");

/**************************************************************************/
/* switch to IDM_APPLICATION_MODE */
/**************************************************************************/

  useMode = IDM_APPLICATION_MODE;
  clusterResult = "ClResult";

  rc = pClObject->update( IDM_CLUS_TYPE_DEMO,
                          clObjName ,
                          pData,
                          selection,
                          useMode,
                          activeFields,
                          supplementaryFields,
                          pOutputData,
                          outputFields,
                          clusterField,
                          scoreField,
                          "", "", "",
                          clusFieldParams,
                          maxNumberOfPasses,
                          maxNumberOfClusters,
                          IDM_AS_VALID_VALUES,
                          clusterResult );

  handleException(" update IDMClustering(A): ");

/**************************************************************************/
/* Start a clustering settings run ... */
/**************************************************************************/

  rc = pClObject->start();

  handleException(" clustering run(A): ");

/**************************************************************************/
/* Show some data about the output object ... */
/**************************************************************************/

  IString name2;
  IDMMiningBase *pMnb=NULL;
  IDMDataTable dataTable2;
  IDM_DataUseMode dataUseMode;
  rc = pOutputData->get(name2,pMnb,dataTable2, dataUseMode);
  handleException(" IDMData->get: ");

  cout << " name of object : " << name2 << endl;

  IDMStaticTable *pStaticTable=NULL;
  rc = dataTable2.get(pStaticTable, compFields);

  ISortedSet fieldNames2;
  rc = pStaticTable->getFieldNames(fieldNames2);
  // Notice: for getTableType 1 .. flatFileTable
  //                          2 .. DB2Table
  cout << " staticTable->getTableType:" << pStaticTable->getTableType() << endl;

  // define a cursor ...
  ISortedSet::Cursor myCursor(fieldNames2);
  myCursor.setToFirst();

  cout << " Fieldnames of the OUTPUT Object:"<< endl;
  cout << " -" << fieldNames2.elementAt(myCursor) << endl; // first element
  while (myCursor.setToNext()) {
    cout <<" -";
    cout << fieldNames2.elementAt(myCursor) << endl;
  }

  delete pStaticTable;

/**************************************************************************/
/* Terminates the DB2 environment ... */
/**************************************************************************/
  rc = IDMDB2Table::terminateDB2(henv,hdbc);
  handleException(" terminateDB2: ");

  rc = pBase->deleteObject();
  handleException(" Delete Mining Base: ");

  pBase = NULL;

  deleteObjects();

  cout << "End : "<< argv[0] << endl;
  return 0;

} /* end of main */
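Both sample listings handle return codes the same way in handleException(): a successful call prints "OK!", a code below IDM_SUCCESS ends the program, and any other code is reported and execution continues (the listing's comment notes that rc could be IDM_WARNING). The following stand-alone sketch mirrors that decision. The type and constant names are hypothetical stand-ins, not the API's definitions; the only value taken from the manual is that success corresponds to rc=0, as noted in the sample output.

```cpp
#include <cassert>

// Hypothetical stand-ins for the API's return codes. The only value taken
// from the sample output is that success corresponds to rc=0.
typedef int ReturnCode;
const ReturnCode kSuccess = 0;

enum Outcome { OUTCOME_OK, OUTCOME_WARNING, OUTCOME_ERROR };

// Mirrors the branch structure of handleException() in the listings:
// success prints "OK!", a negative code is fatal, anything else is a warning.
Outcome classify(ReturnCode rc)
{
    if (rc == kSuccess) return OUTCOME_OK;
    if (rc < kSuccess)  return OUTCOME_ERROR;
    return OUTCOME_WARNING;
}
```

Separating the classification from the reaction (printing, cleanup, exit) makes it easier for an application to continue after warnings while still releasing its objects on errors.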

Running the DB2 sample program

To run the sample program on AIX, follow these steps:

1. Edit/replace data fields in the file /usr/lpp/IMiner/samples/idmsdb2.cpp (activeFields, ...) with valid ones. The (data) fields to be changed are marked with a comment: //CHANGE!

2. On the client side, in the /usr/lpp/IMiner/samples directory, type: make idmsdb2
   Ensure that you have write permission to this directory or copy the Makefile and idmsdb2.cpp to another writable directory. The output will be the executable: idmsdb2

3. On the server side, set the environment variable IDM_MNB_DIR to a directory that should contain the meta-data of the mining bases. The home directory is used if you do not set this variable.

4. On the server side, set the environment variable IDM_RES_DIR to a directory that should contain the results of the mining runs. The home directory is used if you do not set this variable.

5. On the server side, start the server daemon by entering idmstart in the /usr/lpp/IMiner/bin directory.

6. On the client side, start the idmsdb2 executable by typing: idmsdb2 <Hostname> <UserID> <Password> <DB2UserID> <DB2Password>

Output of the DB2 sample

The output of the program should look like this:

Begin : idmsdb2
new Mnb: OK! <------ OK is normal (rc=0)
new IDMDB2Table: OK!
new IDMDB2Table: OK!
Field names of the INPUT Object: <------ valid fields
-AGE
-COMMUTE_DIST
-CYCLES
-MARITAL_STATUS
-NUM_CLAIMS
-NUM_DEPENDENTS
-RENEWAL_MONTH
-SALARY
-SEX
-YEAR_1ST_POLICY
IDMData::createObject(i): OK!
IDMData::createObject(o): OK!
initialiseDB2: OK!
getDB2: OK!
-> dbname:IMSAMPLE
-> tbname:STANDARD
-> schema:IDMDB2
new IDMClustering(T): OK!
clustering run(T): OK!
get result(T): OK!
update IDMClustering(A): OK!
clustering run(A): OK!
IDMData->get: OK!
name of object : DataOutObj
staticTable->getTableType:2 <------ 2 means: IDM_DB2_TABLE
Field names of the OUTPUT Object:
-AGE
-COMMUTE_DIST
-RENEWAL_MONTH
-SALARY
-clClusterField
-clScoreField
terminateDB2: OK!
End : idmsdb2


Appendix B. Notices

This information was developed for products and services offered in the U.S.A. IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user’s responsibility to evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to the

IBM Director of Licensing
IBM Corporation,
North Castle Drive
Armonk, NY 10504–1785
U.S.A.

For license inquiries regarding double-byte (DBCS) information, contact the IBM Intellectual Property Department in your country or send inquiries, in writing, to:

IBM World Trade Asia Corporation
Licensing
2-31 Roppongi 3-chome, Minato-ku
Tokyo 106, Japan

The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law:

INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION AS IS WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

Licensees of this program who wish to have information about it for the purpose of enabling: (i) the exchange of information between independently created programs and other programs (including this one) and (ii) the mutual use of the information which has been exchanged, should contact:

IBM Deutschland Informationssysteme GmbH
Department 3982
Pascalstrasse 100
70569 Stuttgart
Germany

Such information may be available, subject to appropriate terms and conditions, including in some cases, payment of a fee.

This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

COPYRIGHT LICENSE:

This information contains sample application programs in source language, which illustrates programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.

The Intelligent Miner incorporates code generated by the Purdue Compiler Construction Tool Set (PCCTS).

Trademarks and Service Marks

The following terms are trademarks of the IBM Corporation in the United States, or other countries, or both:

AIX                     AS/400
Common User Access      DATABASE 2
DataJoiner              DB2
IBM                     Intelligent Miner
MVS                     MVS/ESA
OS/2                    OS/390
PowerPC                 POWERparallel
PS/2                    RETAIN
RISC System/6000        RS/6000
SAA                     Scalable POWERparallel Systems
SP2                     VisualAge

Java and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.

Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.

UNIX is a registered trademark in the United States, other countries, or both and is licensed exclusively through X/Open Company Limited.

Other company, product, or service names may be the trademarks or service marks of others.


Glossary

This glossary defines terms as they are used in this book. If you do not find the term you are looking for, see the IBM Dictionary of Computing.

A

adaptive connection. A numeric weight used to describe the strength of the connection between two processing units in a neural network. The connection is called adaptive because it is adjusted during training. Values typically range from zero to one, or -0.5 to +0.5.

AFS. Andrew File System. A distributed file system developed by IBM and Carnegie-Mellon University.

aggregate. To summarize data in a field.

application programming interface (API). A functional interface supplied by the operating system or a separately orderable licensed program that allows an application program written in a high-level language to use specific data or functions of the operating system or the licensed program.

architecture. The number of processing units in the input, output, and hidden layers of a neural network. The number of units in the input and output layers is calculated from the mining data and input parameters.

associations. The relationship of items in a transaction in such a way that items imply the presence of other items in the same transaction.

attribute. Characteristics or properties that can be controlled, usually to obtain a required appearance. For example, the color is an attribute of a line. In object-oriented programming, a data element defined within a class.

B

back propagation. A general-purpose neural network named for the method used to adjust its weights while learning data patterns. The Neural Classification mining function uses such a network.

boundary field. The upper limit of an interval as used for the Discretization using ranges processing function.

bucket. One of the bars in a bar chart representing the frequency distribution of a continuous field. A bucket shows how many values lie within a specific frequency range.

C

Central Processing Complex (CPC). A group of up to 10 interconnected processors in a parallel sysplex.

chi-square test. A test to check whether two variables are statistically dependent. Chi-square is calculated by subtracting the expected frequencies (imaginary values) from the observed frequencies (actual values). The expected frequencies represent the values that were to be expected if the variables in question were statistically independent.
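As a worked illustration of the comparison between observed and expected frequencies, the following sketch computes the usual Pearson chi-square statistic for a table of observed counts: each squared difference between observed and expected frequency is divided by the expected frequency, and the results are summed. This is the textbook form of the statistic and is offered as an assumption; the glossary does not state which exact formula the Intelligent Miner uses. The function name chiSquare is hypothetical, and the table is assumed to be rectangular.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson chi-square statistic for a rectangular table of observed counts.
// Expected count of a cell = rowTotal * columnTotal / grandTotal, which is
// the count expected if the two variables were statistically independent.
double chiSquare(const std::vector<std::vector<double> >& observed)
{
    const std::size_t rows = observed.size();
    const std::size_t cols = rows ? observed[0].size() : 0;
    std::vector<double> rowTotal(rows, 0.0), colTotal(cols, 0.0);
    double grandTotal = 0.0;

    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            rowTotal[r] += observed[r][c];
            colTotal[c] += observed[r][c];
            grandTotal  += observed[r][c];
        }
    }

    double statistic = 0.0;
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            double expected = rowTotal[r] * colTotal[c] / grandTotal;
            if (expected > 0.0) {          // skip empty rows or columns
                double diff = observed[r][c] - expected;
                statistic += diff * diff / expected;
            }
        }
    }
    return statistic;
}
```

A table whose rows are proportional to each other, such as 10, 20 and 20, 40, gives a statistic of zero because the observed counts equal the expected ones; the further the counts depart from that pattern, the larger the statistic.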

class. In object-oriented design or programming, a group of objects that share a common definition and that therefore share common properties, operations, and behavior. Members of the group are called instances of the class. A collection of defined entities (users, groups, and resources) with similar characteristics. Any category to which things are assigned or defined. The specification of an object, including its attributes and behaviors.

classification. The assignment of objects into groups or categories based on their characteristics.

cluster. A group of records with similar characteristics.

cluster prototype. The attribute values that are typicalof all records in a given cluster. Used to compare theinput records to determine if a record should beassigned to the cluster represented by these values.

clustering mining function. A mining function thatcreates groups of data records within the input data onthe basis of similar characteristics. Each group is calleda cluster.

confidence factor. Indicates the strength or thereliability of the associations detected.

comma-separated variables format (CSV). A fileformat used by spreadsheet, database, and statisticalapplications.

CPC. See Central Processing Complex.

CSV. See comma-separated variables format.

D

DATABASE 2 (DB2). An IBM relational database management system.

database table. A table residing in a database.

database view. An alternative representation of data from one or more database tables. A view can include all or some of the columns contained in the database table or tables on which it is defined.

data field. In a database table, the intersection from table description and table column where the corresponding data is entered.

data format. There are different kinds of data formats, for example, database tables, database views, pipes, or flat files.

data table. A data table, regardless of the data format it contains.

data type. There are different kinds of Intelligent Miner data types, for example, categorical, continuous, or discrete-numeric.

delimiter. A character used to indicate the beginning and end of a character string.

discrete. Pertaining to data that consists of distinct elements such as characters, or to physical quantities having a finite number of distinctly recognizable values.

discretization. The act of assigning continuous values to intervals.
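As an illustration of discretization, the sketch below assigns continuous values to intervals that are described by their upper limits. The boundary values, the function name `discretize`, and the choice to treat each upper limit as inclusive are assumptions made for the example; they are not taken from the Discretization using ranges processing function.

```python
from bisect import bisect_left

# Hypothetical upper limits of the first three intervals.
# Values above the last limit fall into a final open-ended interval.
upper_limits = [18, 35, 65]


def discretize(value, limits):
    """Return the index of the interval that contains value.

    Assumption: each upper limit belongs to its own interval
    (interval 0 covers values <= 18, interval 1 covers 18 < value <= 35, ...).
    """
    return bisect_left(limits, value)


interval_indexes = [discretize(v, upper_limits) for v in [5, 18, 19, 35, 70]]
```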

distributed file system. A file system composed of files or directories that physically reside on more than one computer in a communication network.

dotted decimal. A common notation for Internet host addresses that divides the 32-bit address into four 8-bit fields. The value of each field is specified as a decimal number and the fields are separated by periods, for example, 010.002.000.052 or 10.2.0.52.
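The relationship between a 32-bit address and its dotted decimal form can be shown with a short sketch. The helper names `to_dotted_decimal` and `from_dotted_decimal` are hypothetical; the conversion itself only uses shifts and masks on the four 8-bit fields.

```python
def to_dotted_decimal(address):
    """Format a 32-bit integer address as four decimal fields."""
    if not 0 <= address <= 0xFFFFFFFF:
        raise ValueError("address must fit in 32 bits")
    fields = [(address >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    return ".".join(str(field) for field in fields)


def from_dotted_decimal(text):
    """Parse dotted decimal text; leading zeros in a field are accepted."""
    fields = [int(part, 10) for part in text.split(".")]
    if len(fields) != 4 or any(not 0 <= field <= 255 for field in fields):
        raise ValueError("expected four fields in the range 0 to 255")
    address = 0
    for field in fields:
        address = (address << 8) | field
    return address


address = from_dotted_decimal("010.002.000.052")
```

Both spellings from the glossary entry parse to the same 32-bit value, and formatting that value produces the short form without leading zeros.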

double-byte character set (DBCS). A set of characters in which each character is represented by two bytes.

E

envelope. The area between two curves that are parallel to a curve of time-sequence data. The first curve runs above the curve of time-sequence data, the second one below. Both curves have the same distance to the curve of time-sequence data. The width of the envelope, that is, the distance from the first parallel curve to the second, is defined by epsilon.

epsilon. The maximum width of an envelope that encloses a sequence. Another sequence is epsilon-similar if it fits in this envelope.

epsilon-similar. Two sequences are epsilon-similar if one sequence does not go beyond the envelope that encloses the other sequence.
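The envelope test can be sketched as follows. This is a deliberately simplified reading of the three entries above: it assumes both sequences have the same length and are compared point by point, and it takes the envelope to extend epsilon/2 above and below the reference sequence so that its total width is epsilon. The function name `is_epsilon_similar` is hypothetical, and the sketch does not claim to reproduce how the Intelligent Miner matches sequences.

```python
def is_epsilon_similar(reference, candidate, epsilon):
    """Check whether candidate stays inside the envelope around reference.

    Assumption: the envelope has total width epsilon, so each candidate
    value may differ from the reference value by at most epsilon / 2.
    """
    if len(reference) != len(candidate):
        raise ValueError("sequences must have the same length")
    half_width = epsilon / 2
    return all(abs(r - c) <= half_width for r, c in zip(reference, candidate))


reference = [1.0, 2.0, 3.0]
candidate = [1.2, 1.9, 3.4]
fits_wide = is_epsilon_similar(reference, candidate, epsilon=1.0)
fits_narrow = is_epsilon_similar(reference, candidate, epsilon=0.5)
```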

equality compatible. Pertaining to different data types that can be operands for the = logical operator.

Euclidean distance. The square root of the sum of the squared differences between two numeric vectors. The Euclidean distance is used to calculate the error between the calculated network output and the target output in neural classification, and to calculate the difference between a record and a prototype cluster value in neural clustering. A zero value indicates an exact match; larger numbers indicate greater differences.
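The definition translates directly into code. The following minimal sketch computes the Euclidean distance between two numeric vectors of equal length; the function name `euclidean_distance` is chosen for the example.

```python
import math


def euclidean_distance(a, b):
    """Square root of the sum of squared differences between a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


distance = euclidean_distance([0.0, 0.0], [3.0, 4.0])
exact_match = euclidean_distance([1.5, 2.5], [1.5, 2.5])
```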

F

field. A set of one or more related data items grouped for processing. In this document, with regard to database tables and views, field is synonymous to column.

file. A collection of related data that is stored and retrieved by an assigned name.

file name. (1) A name assigned or declared for a file. (2) The name used by a program to identify a file.

file-selection box. A box that enables the user to choose a file to work with by selecting a file name from the ones listed or by typing a file name into the space provided.

file specification. In the AIX operating system, the name and location of a file. A file specification consists of a drive specifier, a path name, and a file name.

file system. In the AIX operating system, the collection of files and file management structures on a physical or logical mass storage device, such as a diskette or minidisk. See distributed file system, virtual file system.

flat file. (1) A one-dimensional or two-dimensional array: a list or table of items. (2) A file that has no hierarchical structure.

formatted information. An arrangement of information into discrete units and structures in a manner that facilitates its access and processing. Contrast with narrative information.

frequent item sets. The total volume of items above the specified support factor returned by the Associations mining function.

F-test. A statistical test that checks whether two estimates of the variances of two independent samples are the same. In addition, the F-test checks whether the null hypothesis is true or false.

function. Any instruction or set of related instructions that perform a specific operation.


fuzzy logic. In artificial intelligence, a technique using approximate rules of inference in which truth values and quantifiers are defined as possibility distributions that carry linguistic labels.

H

hidden layer. A set of processing units in a neural network used to calculate its outputs. Hidden layer processing units take their inputs from the preceding hidden layer units, or from the input layer. Their outputs are passed to either a succeeding hidden layer or the network’s output layer. The number of hidden layers and the number of processing units in each hidden layer is part of the network architecture.

host. Pertaining to a computer controlling all or part of a network, and providing an access method to that network.

I

index. In SQL, pointers that are logically arranged by the values of a key. Indexes provide quick access and can enforce uniqueness on the rows in a table.

input data. The metadata of the database table, database view, or flat file containing the data you specified to be mined.

input layer. A set of processing units in a neural network which present the numeric values derived from user data to the network. The number of fields and type of data in those fields is used to calculate the number of processing units in the input layer.

instance. In object-oriented programming, a single, actual occurrence of a particular object. Any level of the object class hierarchy can have instances. An instance can be considered in terms of a copy of the object type frame that is filled in with particular information.

interval. A set of real numbers between two numbers either including or excluding both of them.

interval boundaries. Values that represent the upper and lower limits of an interval.

item category. A categorization of an item. For example, a room in a hotel can have the following categories: Standard, Comfort, Superior, Luxury. The lowest category is called child item category. Each child item category can have several parent item categories. Each parent item category can have several grandparent item categories.

item description. The descriptive name of a character string in a data table.

item ID. The identifier for an item.

item set. A collection of items. For example, all items bought by one customer during one visit to a department store.

K

key. In SQL, a column or an ordered collection of columns identified in the description of an index.

Kohonen Feature Map. A neural network model comprised of processing units arranged in an input layer and output layer. All processors in the input layer are connected to each processor in the output layer by an adaptive connection. The learning algorithm used involves competition between units for each input pattern and the declaration of a winning unit. Used in neural clustering to partition data into similar record groups.

L

learning algorithm. The set of well-defined rules used during the training process to adjust the connection weights of a neural network. The criteria and methods used to adjust the weights define the different learning algorithms.

learning parameters. The variables used by each neural network model to control the training of a neural network which is accomplished by modifying network weights.

lift. Confidence factor divided by expected confidence.
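To make the relationship between support, confidence, and lift concrete, the sketch below computes all three for one rule on a tiny, invented set of transactions. It uses the common textbook definitions, which are assumptions here: support is the fraction of transactions that contain all items of the rule, confidence is that support divided by the support of the rule body, and expected confidence is the support of the rule head. The function names are hypothetical.

```python
# Invented sample transactions; each transaction is a set of items.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
]


def support(items, all_transactions):
    """Fraction of transactions that contain every item in items."""
    hits = sum(1 for t in all_transactions if items <= t)
    return hits / len(all_transactions)


def rule_measures(body, head, all_transactions):
    """Return (support, confidence, lift) for the rule head <= body."""
    rule_support = support(body | head, all_transactions)
    confidence = rule_support / support(body, all_transactions)
    # Assumption: expected confidence is the support of the rule head.
    expected_confidence = support(head, all_transactions)
    return rule_support, confidence, confidence / expected_confidence


rule_support, confidence, lift = rule_measures({"bread"}, {"butter"}, transactions)
```

A lift above 1 suggests that the rule body makes the rule head more likely than it is in the data overall.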

M

metadata. In databases, data that describes data objects.

mining. Synonym for analyzing or searching.

mining base. A repository where all the information about the mining data, the mining run settings, and the corresponding results is stored.

model. A specific type of neural network and its associated learning algorithm. Examples include the Kohonen Feature Map and back propagation.

mount. (1) To place a data medium in a position to operate. (2) To make recording media accessible.

N

name mapping. A table containing descriptive names or translations of other languages mapped to the numerals or the character strings of a data table.

named pipe. A named buffer that provides client-to-server, server-to-client, or full duplex communication between unrelated processes.

narrative information. Information that is presented according to the syntax of a natural language. Contrast with formatted information.

neural network. A collection of processing units and adaptive connections that is designed to perform a specific processing function.

Neural Network Utility (NNU). A family of IBM application development products for creating neural network and fuzzy rule system applications.

nonsupervised learning. A learning algorithm that requires only input data to be present in the data source during the training process. No target output is provided; instead, the desired output is discovered during the mining run. A Kohonen Feature Map, for example, uses nonsupervised learning.

O

offset. (1) The number of measuring units from an arbitrary starting point in a record, area, or control block, to some other point. (2) The distance from the beginning of an object to the beginning of a particular field.

operator. (1) A symbol that represents an operation to be done. (2) In a language statement, the lexical entity that indicates the action to be performed on operands.

output data object. The metadata of the database table, database view, or flat file containing the data being produced or to be produced by a function.

output layer. A set of processing units in a neural network which contain the output calculated by the network. The number of outputs depends on the number of classification categories or maximum clusters value in neural classification and neural clustering, respectively.

P

Parallel Operating Environment (POE or poe). POE is a program (poe in AIX) required for parallel processing. It assigns the tasks to the different parallel processing nodes or CPCs.

pass. One cycle of processing a body of data. During a pass, each record is read once.

path. The route used to locate files; the storage location of a file. A fully qualified path lists the drive identifier, directory name, subdirectory name (if any), and file name with the associated extension.

pipe. A named or unnamed buffer used to pass data between processes.

POE. See Parallel Operating Environment.

prediction model. A model of the dependency and the variation of one field’s value within a record on the other fields within the same record. A profile is then generated that can predict a value for the particular field in a new record of the same form, based on its other field values.

processing unit. A processing unit in a neural network is used to calculate an output value by summing all incoming values multiplied by their respective adaptive connection weights.
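The weighted sum described for a processing unit can be sketched as follows. The function name `processing_unit_output` and the sample numbers are invented for illustration. The sketch covers only the summation named in the entry; any further transformation of the sum that a particular network model may apply is left out.

```python
def processing_unit_output(incoming_values, weights):
    """Sum of incoming values multiplied by their connection weights."""
    if len(incoming_values) != len(weights):
        raise ValueError("each incoming value needs exactly one weight")
    return sum(value * weight for value, weight in zip(incoming_values, weights))


# Invented sample data: three incoming connections.
output = processing_unit_output([1.0, 0.5, -2.0], [0.2, 0.4, 0.1])
```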

Q

quantile range. One of a finite number of nonoverlapping subranges or intervals, each of which is represented by an assigned value.

Q is an N%-quantile of a value set S when:

• Approximately N percent of the values in S are lower than or equal to Q.

• Approximately (100-N) percent of the values are greater than or equal to Q.

The approximation is less exact when there are many values equal to Q. N is called the quantile label or quantile limit. The 50%-quantile represents the median.
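One simple way to find a value Q that satisfies the two conditions approximately is the nearest-rank method, sketched below. The choice of nearest rank and the function name `quantile` are assumptions made for illustration; the entry does not say which method the Intelligent Miner uses.

```python
import math


def quantile(values, n_percent):
    """Return an N%-quantile of values using the nearest-rank method."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # Smallest rank such that at least N percent of the values are <= Q.
    rank = max(1, math.ceil(n_percent / 100 * len(ordered)))
    return ordered[rank - 1]


sample = [7, 1, 5, 3, 9]
median = quantile(sample, 50)
maximum = quantile(sample, 100)
```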

R

Radial Basis Function (RBF). The individual Radial Basis Functions are functions of the distance or the radius from a particular point. They are used to build up approximations to more complicated functions. The RBF-Prediction mining function uses Radial-Basis-Functions to predict values.
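The idea of building an approximation from functions of distance can be sketched with a Gaussian basis function. The Gaussian shape, the `width` parameter, and the names `gaussian_rbf` and `rbf_approximation` are assumptions chosen for illustration; the entry does not state which basis function the RBF-Prediction mining function uses.

```python
import math


def gaussian_rbf(point, center, width=1.0):
    """Gaussian function of the distance between point and center."""
    squared_distance = sum((p - c) ** 2 for p, c in zip(point, center))
    return math.exp(-squared_distance / (2 * width ** 2))


def rbf_approximation(point, centers, weights, width=1.0):
    """Weighted sum of radial basis functions, one per center."""
    return sum(
        weight * gaussian_rbf(point, center, width)
        for center, weight in zip(centers, weights)
    )


# Invented sample data: two centers on a line with weights 2.0 and 1.0.
value = rbf_approximation((0.0,), [(0.0,), (1.0,)], [2.0, 1.0])
```

Each basis function contributes most near its own center and fades with distance, which is what lets a weighted sum of them follow a more complicated function.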

record. A set of one or more related data items grouped for processing. In reference to a database table, record is synonymous to row.

region. (Sub)set of records with similar characteristics in their active fields. Regions are used to visualize a prediction result.

root. In the AIX operating system, the user name for the system user with the highest authority.

round-robin method. A method by which items are sequentially assigned to units. When an item has been assigned to the last unit in the series, the next item is assigned to the first again. This process is repeated until the last item has been assigned. The Intelligent Miner uses this method, for example, to store records in output files during a partitioning job.
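The round-robin method is easy to express with the remainder of the item position. The following sketch distributes records across a fixed number of units, standing in for output files; the function name `round_robin_partition` is hypothetical.

```python
def round_robin_partition(records, unit_count):
    """Assign records to units in turn, wrapping around after the last unit."""
    if unit_count < 1:
        raise ValueError("unit_count must be at least 1")
    units = [[] for _ in range(unit_count)]
    for position, record in enumerate(records):
        units[position % unit_count].append(record)
    return units


partitions = round_robin_partition(list(range(7)), 3)
```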


rule. A clause in the form head ⇐ body. It specifies that the head is true if the body is true.

rule body. Represents the specified input data for a mining function.

rule group. Covers all rules containing the same items in different variations.

rule head. Represents the derived items detected by the Associations mining function.

S

scale. A system of mathematical notation: fixed-point or floating-point scale of an arithmetic value.

scaling. To adjust the representation of a quantity by a factor in order to bring its range within prescribed limits.

scale factor. A number used as a multiplier in scaling. For example, a scale factor of 1/1000 would be suitable to scale the values 856, 432, -95, and -182 to lie in the range from -1 to +1, inclusive.
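Applying a scale factor is a single multiplication per value. The sketch below scales three of the sample values from the entry by 1/1000 and confirms that the results lie within the range from -1 to +1; the function name `scale` is chosen for the example.

```python
def scale(values, scale_factor):
    """Multiply every value by the scale factor."""
    return [value * scale_factor for value in values]


scaled = scale([856, 432, -95], 1 / 1000)
in_range = all(-1 <= value <= 1 for value in scaled)
```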

schema. A logical grouping for database objects. When a database object is created, it is assigned to one schema, which is determined by the name of the object. For example, the following command creates table X in schema C:

CREATE TABLE C.X

self-organizing feature map. See Kohonen Feature Map.

sensitivity analysis report. An output from the Neural Classification mining function that shows which input fields are relevant to the classification decision.

sequential patterns. Intertransaction patterns such that the presence of one set of items is followed by another set of items in a database of transactions over a period of time.

similar sequences. Occurrences of similar sequences in a database of sequences.

Structured Query Language (SQL). An established set of statements used to manage information stored in a database. By using these statements, users can add, delete, or update information in a table, request information through a query, and display the results in a report.

server node. The processing node of the Intelligent Miner server that you logged on to. If you run the Intelligent Miner in standalone mode, this is the node of your local workstation.

supervised learning. A learning algorithm that requires input and resulting output pairs to be presented to the network during the training process. Back propagation, for example, uses supervised learning and makes adjustments during training so that the value computed by the neural network will approach the actual value as the network learns from the data presented. Supervised learning is used in the techniques provided for predicting classifications as well as for predicting values.

support factor. Indicates the occurrence of the detected association rules and sequential patterns based on the input data.

swapping. A process that interchanges the contents of an area of real storage with the contents of an area in auxiliary storage.

symbolic name. In a programming language, a unique name used to represent an entity such as a field, file, data structure, or label. In the Intelligent Miner you specify symbolic names, for example, for input data, name mappings, or taxonomies.

T

taxonomy. Represents a hierarchy or a lattice of associations between the item categories of an item. These associations are called taxonomy relations.

taxonomy relation. The hierarchical associations between the item categories you defined for an item. A taxonomy relation consists of a child item category and a parent item category.
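A taxonomy built from child and parent pairs can be sketched as a plain list of relations. The category names and the helper `ancestors` below are invented for illustration and do not correspond to the IDMTaxonomy or IDMTaxonomyRelation classes. Because a taxonomy can be a lattice, a category may have several parents, so the sketch collects every category reachable by following relations upward.

```python
# Hypothetical taxonomy relations as (child item category, parent item category).
relations = [
    ("Standard", "Room"),
    ("Luxury", "Room"),
    ("Room", "Accommodation"),
]


def ancestors(category, taxonomy_relations):
    """Return all categories above category in the taxonomy."""
    found = set()
    pending = [category]
    while pending:
        current = pending.pop()
        for child, parent in taxonomy_relations:
            # Skip parents already seen so shared parents are visited once.
            if child == current and parent not in found:
                found.add(parent)
                pending.append(parent)
    return found


standard_ancestors = ancestors("Standard", relations)
```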

trained network. A neural network containing connection weights that have been adjusted by a learning algorithm. A trained network can be considered a virtual processor; it transforms inputs to outputs.

training. The process of developing a model which understands the input data. In neural networks, the model is created by reading the records of the input data and modifying the network weights until the network calculates the desired output data.

translation process. Converting the data provided in the database to scaled numeric values in the appropriate range for a mining kernel using neural networks. Different techniques are used depending on whether the data is numeric or symbolic. Also, converting neural network output back to the units used in the database.

transaction. A set of items or events that are linked by a common key value, for example, the articles (items) bought by a customer (customer number) on a particular date (transaction identifier). In this example, the customer number represents the key value.

transaction ID. The identifier for a transaction, for example, the date of a transaction.

transaction group. The identifier for a set of transactions. For example, a customer number can represent a transaction group that includes all purchases of a particular customer during the month of May.
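How items, transaction IDs, and transaction groups relate can be sketched by grouping flat rows. The rows, the column order, and the function name `group_transactions` are invented for illustration: each row holds a customer number (the transaction group), a date (the transaction ID), and an item.

```python
from collections import defaultdict

# Invented sample rows: (customer number, transaction date, item).
rows = [
    ("C1", "1999-05-03", "bread"),
    ("C1", "1999-05-03", "milk"),
    ("C1", "1999-05-17", "butter"),
    ("C2", "1999-05-03", "milk"),
]


def group_transactions(input_rows):
    """Map each transaction group to its transactions and their items."""
    groups = defaultdict(lambda: defaultdict(set))
    for group_id, transaction_id, item in input_rows:
        groups[group_id][transaction_id].add(item)
    return groups


groups = group_transactions(rows)
```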

V

vector. A quantity usually characterized by an ordered set of numbers.

virtual file system. In the AIX operating system, a remote file system that has been mounted so that it is accessible to the local user.

W

weight. The numeric value of an adaptive connection representing the strength of the connection between two processing units in a neural network.

winner. The index of the cluster which has the minimum Euclidean distance from the input record. Used in the Kohonen Feature Map to determine which output units will have their weights adjusted.





Part Number: CT8FMIE
Program Number: 5697-IM3 IBM DB2 Intelligent Miner for Data
5655-IM3 IBM DB2 Intelligent Miner for Data for AS/400
5733-IM3 IBM DB2 Intelligent Miner for Data for OS/390

Printed in Denmark by IBM Danmark A/S
