ibm research reportibm research report kmap - a visualizer for kohonen self-organizing map...
TRANSCRIPT
-
RC22517 (W0207-065) July 15, 2002Computer Science
IBM Research Report
KMAP - A Visualizer for Kohonen Self-organizing MapClustering Results
George S. Almasi, Richard D. LawrenceIBM Research Division
Thomas J. Watson Research CenterP.O. Box 218
Yorktown Heights, NY 10598
Research DivisionAlmaden - Austin - Beijing - Delhi - Haifa - India - T. J. Watson - Tokyo - Zurich
LIMITED DISTRIBUTION NOTICE: This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Reportfor early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests.After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g. , payment of royalties). Copies may be requested from IBM T. J. Watson Research Center , P. O. Box 218, Yorktown Heights, NY 10598 USA (email: [email protected]). Some reports are available on the internet at http://domino.watson.ibm.com/library/CyberDig.nsf/home .
-
KMAP - A Visualizer for KohonenSelf-organizing Map Clustering Results
George S. Almasi and Richard D. LawrenceIBM T.J. Watson Research Center
P.O. Box 218Yorktown Heights, New York 10598
[email protected], [email protected]
August 17, 2002
Abstract
We describe a Java-based visualizer that preserves the topological information in-herent in the results of cluster data mining that used the Kohonen self-organizing mapapproach. The visualizer provides a tool with a number of views and drilldowns that ananalyst can use to circumvent the data interpretation bottleneck and quickly evaluatethe results of the map produced by the datamining run. Being written in Java, it isportable across a large number of computing platforms.
Keywords: visualizer, data mining, Kohonen, self-organizing map, clustering,Java.
1 Introduction
The Kmap visualizer described in this paper was designed to allow rapid evaluation ofthe results of data mining performed using the Kohonen self-organizing map [8] clusteringapproach. A key feature of this approach is that, in addition to grouping similar datarecords into clusters, it places the clusters on a two-dimensional map in such a way thatsimilar clusters are near each other. The nature of a cluster in a particular neighborhood ofthe map says something about that neighborhood. In other words, the contents of individualclusters and the topography of the cluster map are both of interest.
The current authors wrote a highly scalable parallel version of the self-organizing mapalgorithm [9] which is used by IBM’s Intelligent Miner for Data [7] . This program wasrecently put to the test in a demonstration of mining very large databases [4], and theparallel speedup that was obtained was so good (over 14X for a 16-processor SP2) that thenormal ratio of time needed to perform the datamining vs. interpreting its results becameinverted – data analysis became the new bottleneck. The reason was that the existing
1
-
visualizer was good at conveying the nature of individual clusters, but the map had to beconstructed basically by hand. Kmap was designed to remedy this problem.
This paper is organized as follows: we first describe the design of the Kmap visualizer,its modes of operation and its drilldown capabilities, and its implementation in Java. Thisis followed by four case studies that show examples of Kmap’s features. We then presentillustrative examples of Kmap’s features in the form of four case studies using stock market,retail supermarket, credit bureau, and census data.
2 Implementation
The main Java classes in Kmap reflect the program’s functionality:
• Kmap collects the clustering results from the input file (generic neural weights file orIntelligent Miner results file) and creates and manipulates the rectangular grid map ofclusters.
• Kdetail handles the drilldowns into an individual cluster• Kplot creates the contents of the rectangles for Kmap and Kdetail, using the JClass
chart and table java components provided by KL Group Inc. [5].
• Kmenubar handles user interactions via the menu bar that is used for both Kmapand Kdetail.
• NNClustRes provides an API to Intelligent Miner results files in the form of a dozen“get” methods for obtaining
1. the number of clusters (neural nodes) and data attributes (fields),
2. the neural weights (mean clustering attributes for each node), and
3. several sets of attribute statistics, used for drilldowns and described further below.
There are three different “modes” for viewing clustering results in Kmap and its drill-downs:
• “weights” mode, in which a plot of all the neural weights (i.e., attribute mean valuse)is shown for each cluster. This mode is useful when the attributes consist of a logicalsequence like month-by-month prices or spending. Figure 1 shows such an example.
• “topten” mode, in which the names of the attributes with the top 3 or top 10 weightsare shown for each cluster. This mode can be very useful when the number of attributesis large. Figure 3 shows a comparison of results viewed in weights and in topten modes,and Figure 4 gives a detailed example of topten mode in action.
• “sizes” mode, which simply shows the size of each cluster as a filled circle. Figure 3contains an example of a “sizes” view.
2
-
The mode can be selected from the “View” menu, or by specifying w, t, or s after the filenamewhen Kmap is called from the command line.
The cluster size is indicated by the background color in weights mode, by the graynessof the frame in topten mode, and by the size of the circle in sizes mode.
The available supplemental attributes, such as Total Spending or Age, can also be usedto color the inner rectangle in topten mode or the background in weights or sizes mode. Thisis done via the “Color By” pulldown on the menu bar. The functions provided by the othermenu bar pulldowns are as follows:
• “File” handles the opening of files and the printing of the displays.• “View” provides a large number of methods for controlling the appearance of the
visualization.
• “Sort By” allows sorting the attributes in topten mode by weights or by six otherquantities.
The attributes used in neural clustering are often normalized first, and so the neuralweights (the means of the normalized attributes) for each cluster can be different from themeans of the raw values. The un-normalized values of the attributes can be included assupplemental fields, and Intelligent Miner will compute their statistics along with those ofthe active fields. Kmap has provision to read these supplemental field statistics and makethem available for alternative ways to sort the attributes in topten mode, using the ”SortBy” menu. These quantities available for each cluster are:
1. “mean”, the mean value of the un-normalized attribute
2. “partcp”, the fraction of records with non-zero values of this attribute
3. “meanNZ”, the mean non-zero value of the un-normalized attribute, computed fromthe ratio of the two quantities above
4. “Rmean”, the mean value of the un-normalized attribute for this cluster divided bythe mean value of this attribute for the entire data set.
5. “Rpartcp”, the quantity “partcp” normalized to the entire data set as above
6. “RmeanNZ”, the quantity “meanNZ” normalized to the entire data set as above
There are also popup menus and other panels available for drilldown and other purposes;these are shown in the examples that follow.
3 Case Studies
This section describes the use of Kmap to visualize several different kinds of dataminingresults.
3
-
• Standard and Poor 500 monthly closings (weights, drilldowns)• Safeway (topten mode, more drilldowns)• Credit Bureau(bankruptcy colorby)• Census (categorical variables)
3.1 Stock Prices
For our first example, Figure 1 shows Kmap in “weights” mode, displaying the results of a64-node clustering performed on the monthly closing prices of the Standard & Poor’s stocksfor the period September 1995 - August 1996. Each stock price is normalized by its maximumvalue during this period. The action of the Kohonen algorithm in placing similar clustersnear each other on the map shows clearly in the gradual transition from gainers on the leftto decliners on the right.
Kmap offers two special drilldowns for relatively small datasets like this one. It canread a file listing the cluster for each record and display the membership of each clusteron a popup menu, as shown in Figure 1. In addition, the “View” menu has an entry thatleads to a “Record Finder” panel that can find which cluster a particular record is in andhighlight that cluster, as shown for the case of Western Atlas and cluster 25 in the figure.A hypothetical scenario for using these capabilities would be to check the membership of againer cluster like node 32, find that it contains two petroleum industry members (Mobiland Texaco), wonder where Western Atlas (also a petroleum industry member) had beenplaced, use the Record Finder, and discover it in cluster 25, a direct neighbor of cluster 32.There is a substantial number of such intuitively satisfying relationships in the map.
If the cluster membership file mentioned above also contains the attribute values for theindividual records, Kmap can provide an additional drilldown that allows comparing thebehavior of individual records in a cluster, as shown in Figure 2.
3.2 Supermarket Shoppers
The example in this section explains the motivation for “topten” mode and describes someof the features of that mode. The clustering in this case was performed on a month’sspending data by some 36,000 supermarket customers in about 100 product categories (dairy,confectionery, baby products, etc.). As shown in Figure 3, the Kmap “weights” mode usedin the previous example can be used to view these clustering results as well. Kmap providespoint labels to help identify each point even in a crowded plot, and this feature can beused to identify the three attributes with the largest weights in cluster 3 as Sugar, Tea,and Desserts/Puddings, for example. But the large number of attributes makes this a slowprocess. Furthermore, unlike the months of the previous example, there is no sequentialrelationship among the attributes that makes a plot view particularly useful. A much fastergrasp of the clusters’ nature can be obtained in this case from an initial map that shows whatthe top 3 attributes are for each cluster, with a rich set of available drilldowns for furtherinvestigation. This is what “topten” mode does.
4
-
Figure 1: Monthly closing prices for the Standard & Poor’s 500 stocks over a one-year period
5
-
Figure 2: Drilldowns available in weights mode (for example in Figure 1)
In topten mode, each rectangle in the map conveys the following information about itscorresponding cluster: the outer frame’s degree of grayness is proportional to its population.the inner rectangle’s color indicates the value of the supplemental attribute chosen from the“Color By” menu. In Figure 3, for example, blue indicates above-average customer Age,while pink denotes below-average attribute values. The numerical values of the populationand Age are also shown for each cluster.
By default, the top 3 attributes are sorted by their neural weights, but this can be changed(for the main map as well as the drilldowns) via the “Sort By” menu, which allows re-sortingby the attribute mean value or the five other statistics mentioned earlier. The color of thebullets in this example correspond to a rainbow spectrum matched to the range of categorynumbers, so attributes with similar colors have similar category numbers. The bullet shapescarry no information in this example. However, for the wine recommender described in [3],we used a different scheme in which the bullet color indicated the wine color and the bulletshape indicated price (squares cost more than circles, etc.).
Clicking on a cluster’s rectangle pops up a menu that can be used for further drilldown,as shown in Figure 3. The first item on the menu allows getting more information on aparticular attribute (field), such as the distribution of its revenue among the clusters. Forexample, the 11% of the total customer population represented by cluster 3 accounted for24% of the spending on Tea, for a “wallet share enrichment” of 2.18 – i.e., chances of findinga tea buyer in this cluster are more than twice that for the overall customer population.
The second item on the drilldown menu allows getting more detail on a particular cluster.Clicking it brings up a scrollable table showing the neural weights and other field statisticsfor all the attributes. For convenience, the distribution map for a given attribute can bebrought up by clicking the attribute in this table.
The third item on the drilldown menu brings up an enlarged plot of the neural weightsfor that cluster, as shown in the figure.
The third Kmap view, “sizes”, is shown at the top left of Figure 3 for reference. Inthis mode, the cluster population is conveyed by the size of the corresponding circle (thepopulations are very close in this case), and the color convention is the same as for the“topten” view.
The drilldowns shown are available in all 3 modes, and each drilldown figure can beprinted individually.
6
-
Figure 3: Comparison of views and drilldowns. Starting at top left, the “sizes”, “weights”,and “topten” views of the supermarket shopper clustering results for 9 clusters. The topright view also shows popup menu leading to several drilldowns: into a specific attribute(left center and right), into the weights view of a cluster (right center), and into the toptenview of a cluster (bottom right). Clicking an attribute’s button in the view at bottom rightis a second way to obtain the attribute map at bottom left.
7
-
Figure 4: Supermarket retail data clustering results
8
-
One application of this clustering capability is in the generation of personalized recom-mendations, as we describe in [2]. These recommendations were being generated for partic-ipants in a trial program using PDAs(personal digital assistants) for supermarket shopping.Since all participants had above-average spending 1, we clustered only the above-averagespenders and obtained the results shown in Figure 4. Note the emergence of a cluster (4)with strong spending on baby products. As the three drilldown maps of attribute shareshow, these clusters are sharper than the previous ones in that the “wallet share enrich-ments” are larger. For example, cluster 4 has 10% of the total population but 56% of thetotal spending on baby products, for a an enrichment of 5.6. This sort of enrichment canbe quite valuable for customer relationship and marketing purposes. It also impacted thegeneration of recommendations. For example, the most popular chocolates in cluster 4 turnout to be quite different from those most frequently preferred by the population as a whole.[2].
3.3 Credit Data
The example in this section shows the value of adroitness in the selection of the attributesused for clustering. Figure 5 shows clustering results from a study on mining a large databasefrom a credit bureau. Twelve of the 44 attributes that were used were created by categorizingeach outstanding credit card balance into a small matrix of several ranges of balance andseveral ranges of the age of the account. This helped the domain expert to find about adozen different segments on the map. For example, the top attributes for clusters 18 and28 are “BANK BAL03” and “BANK BAL06” , which both correspond to large balances inaccounts less than 6 months old, hence the analyst’s label ”BIG CHURN” for people whocontinually accept bank offers for new credit cards with low initial interest rates, and thentransfer to another card at the end of the low interest rate period. This ability to discernvarious segments of credit usage can be very helpful to a bank’s efforts to market its variousproducts.
The second example in this section shows the clustering results on a subset of the popu-lation whose credit rating is within a “gray area” that normally makes it difficult to obtaincredit. One of the supplemental attributes (not used in the clustering, of course) was whetheran individual in this subset actually went bankrupt within the next six months. When thisattribute is used to color the cluster map (Figure 6) , an interesting pattern appears: farfrom being randomly distributed, the actual bankruptcies are concentrated on the right edgeof the map. The bankruptcy rate for the population as a whole is about 0.5%, and most ofthe clusters have a rate below this value, while one of the clusters at the right of the maphas an 8% bankruptcy rate, which is 16 times higher than the average value. The dominantattributes of this cluster are all old accounts with very large balances, which tend to beprominent in the other clusters with above-average bankruptcies as well. The suggestion isthat a number of people are being denied credit without good reason, and that a more refinedrating system could free up credit to a group that could probably really use it. Customersand the bank would both benefit.
1like the children in Lake Wobegon
9
-
File View Color By Sort By Exit
credit100 Kmap -- grayest = 30% of 1073960 records, sorted by weights, colored by BEACON_96_SCR (mean=633.494)
BANK_BAL05BANK_BAL08BANK_BAL11
564.5 116799
BANK_BAL08BANK_BAL11BANK_BAL05
592.03 19598
BANK_BAL11BANK_BAL05BANK_BAL08
609.34 24897
BANK_BAL11BANK_BAL10POPDEPT_BAL
549.62 155396
AUTO_BALINSTFIN_BALOTHDEPT_BAL
471.49 431(95
OTHDEPT_BALPOPDEPT_BALBANK_BAL11
568.35 1725(0%)94
POPDEPT_BALBANK_BAL11OTHDEPT_BAL
597.69 370793
POPDEPT_BALBANK_BAL11OTHDEPT_BAL
604.53 494492
HOMEFURN_BALPOPDEPT_BALHOME_BAL
553.64 1371(91
HOMEFURN_BALHOME_BALOTHDEPT_BAL
581.72 2373(90
BANK_BAL08BANK_BAL05BANK_BAL02
587.01 28089
BANK_BAL08BANK_BAL11BANK_BAL05
584.87 28888
BANK_BAL11BANK_BAL08BANK_BAL05
610.58 44787
BANK_BAL11BANK_BAL12BANK_BAL10
596.92 470586
OTHDEPT_BALBANK_BAL11BANK_BAL10
581.69 165785
OTHDEPT_BALBANK_BAL11POPDEPT_BAL
588.18 3649(0%)84
POPDEPT_BALOTHDEPT_BALBANK_BAL11
585.21 404483
POPDEPT_BALBANK_BAL11MORT_BAL
612.58 10K(82
HOMEFURN_BALHOME_BALAUTO_BAL
607.99 7956(81
ELEC_BALHOMEFURN_BALBANK_BAL11
630.97 4319(80
BANK_BAL08BANK_BAL05BANK_BAL11
611.89 36479
BANK_BAL08BANK_BAL05BANK_BAL11
625.23 63378
BANK_BAL08BANK_BAL11MORT_BAL
639.46 77277
BANK_BAL11MORT_BALPOPDEPT_BAL
620.66 10K(76
BANK_BAL11OTHDEPT_BALPOPDEPT_BAL
608.83 428675
OTHDEPT_BALAUTO_BALMORT_BAL
600.93 4763(0%)74
OTHDEPT_BALPOPDEPT_BALBANK_BAL11
610.86 948073
POPDEPT_BALOTHDEPT_BALBANK_BAL11
619.82 16K(72
CREDITU_BALAUTO_BALMORT_BAL
680.76 9648(71
CREDITU_BALAUTO_BALMORT_BAL
635.64 3146(70
BANK_BAL05BANK_BAL02BANK_BAL08
620.21 30369
BANK_BAL05BANK_BAL08BANK_BAL02
640 6698(068
BANK_BAL05BANK_BAL08MORT_BAL
666.35 94567
BANK_BAL11BANK_BAL08BANK_BAL05
643.55 599566
BANK_BAL11MORT_BALBANK_BAL10
646.42 17K(65
CLOTHING_BALOTHDEPT_BALBANK_BAL11
605.9 1443(0%)64
OTHDEPT_BALPOPDEPT_BALAUTO_BAL
625.28 14K(63
AUTO_BALPOPDEPT_BALOTHDEPT_BAL
599.59 582662
AUTO_BALBANK_BAL11MORT_BAL
648.65 13K(161
AUTO_BALMORT_BALBANK_BAL11
595.68 3060(60
BANK_BAL02BANK_BAL05BANK_BAL08
621.92 24759
BANK_BAL02BANK_BAL05BANK_BAL08
656.61 54458
BANK_BAL05BANK_BAL02BANK_BAL08
678.22 79357
BANK_BAL07BANK_BAL04BANK_BAL08
592.4 11K(156
BANK_BAL10BANK_BAL11POPDEPT_BAL
572.57 643255
BANK_BAL10BANK_BAL11POPDEPT_BAL
585.82 2455(0%)54
BANK_BAL10BANK_BAL11OTHDEPT_BAL
592.15 16K(53
AUTO_BALMORT_BALBANK_BAL11
673.48 43K(52
SECUR_BALMORT_BALAUTO_BAL
658.7 2911(051
STUDENT_BALINSTFIN_BALBANK_BAL11
638.2 5578(050
BANK_BAL02BANK_BAL05MORT_BAL
671.21 44949
BANK_BAL02MORT_BALBANK_BAL05
685.3 11K48
BANK_BAL02BANK_BAL05MORT_BAL
705.06 15K47
BANK_BAL07BANK_BAL04BANK_BAL08
541.33 199246
BANK_BAL04BANK_BAL01BANK_BAL07
579.84 525045
OILCARD_BALPOPDEPT_BALBANK_BAL11
546.99 1071(0%)44
OILCARD_BALOTHDEPT_BALBANK_BAL11
571.79 185143
SECUR_BALINSTFIN_BALAUTO_BAL
628.16 396442
INSTFIN_BALMORT_BALAUTO_BAL
607.79 2162(41
STUDENT_BALINSTFIN_BALBANK_BAL11
630.63 2416(40
BANK_BAL12BANK_BAL11BANK_BAL08
622.64 18939
BANK_BAL09BANK_BAL05BANK_BAL02
640.46 11238
BANK_BAL02BANK_BAL01BANK_BAL05
614.82 28337
BANK_BAL01BANK_BAL04BANK_BAL05
571.51 191636
BANK_BAL01BANK_BAL04BANK_BAL02
608.55 711835
BANK_BAL01BANK_BAL04BANK_BAL02
643.51 17K(2%)34
OILCARD_BALOTHDEPT_BALPOPDEPT_BAL
587.68 458033
POPDEPT_BALOTHDEPT_BALAUTO_BAL
636.55 21K(32
INSTFIN_BALMORT_BALAUTO_BAL
659.73 9814(31
STUDENT_BALINSTFIN_BALBANK_BAL10
647.16 13K(130
BANK_BAL12MORT_BALBANK_BAL11
688.92 43329
BANK_BAL06BANK_BAL09MORT_BAL
687.38 18228
BANK_BAL03BANK_BAL06MORT_BAL
688.73 16127
MORT_BALBANK_BAL01BANK_BAL02
651.77 461726
REVFINAN_BALMORT_BALBANK_BAL11
638.43 316225
JUMBO_BALBANK_BAL12CHARGECARD_BAL
733.6 7103(0%)24
BANK_BAL12BANK_BAL11MORT_BAL
692.11 558623
BANK_BAL12MORT_BALBANK_BAL11
712.44 11K(22
JEWELRY_BALOTHDEPT_BALAUTO_BAL
576.49 2296(21
GOV_BALSTUDENT_BALMORT_BAL
593.14 1820(20
MORT_BALBANK_BAL12BANK_BAL11
684.04 39919
BANK_BAL09BANK_BAL06MORT_BAL
689.5 414318
BANK_BAL03BANK_BAL06MORT_BAL
697 4349(017
MORT_BALAUTO_BALBANK_BAL11
694.96 14K(16
REVFINAN_BALMORT_BALBANK_BAL11
666.58 454115
CHARGECARD_BALBANK_BAL11MORT_BAL
659.36 2232(0%)14
BANK_BAL08BANK_BAL05BANK_BAL11
703.63 14K(13
OTHDEPT_BALBANK_BAL11POPDEPT_BAL
661.06 27K(12
GOV_BALSTUDENT_BALOTHDEPT_BAL
575.52 6223(11
OTHUTIL_BALOTHDEPT_BALAUTO_BAL
561.92 1608(10
MORT_BALBANK_BAL11AUTO_BAL
677.99 2719
MORT_BALBANK_BAL11AUTO_BAL
701.01 8118
MORT_BALBANK_BAL11BANK_BAL12
727.16 18K7
MORT_BALBANK_BAL11OTHDEPT_BAL
723.64 26K(6
MORT_BALAUTO_BALBANK_BAL11
725.88 34K(5
MORT_BALBANK_BAL11OTHDEPT_BAL
727.19 30K(3%)4
BANK_BAL11BANK_BAL10BANK_BAL12
690.89 22K(3
CREDITU_BALAUTO_BALMORT_BAL
693.1 13K(12
HOMEFURN_BALHOME_BALELEC_BAL
621.66 12K(11
BANK_BAL11INSTFIN_BALBANK_BAL12
718.52 326K(0
Figure 5: Credit usage customer segments
10
-
File View Color By Sort By Exit
seg650E Kmap -- grayest = 18% of 95157 records, sorted by weights, colored by ENH_DAS_SCR (mean=0.003)
POPDEPT_BALBANK_BAL11BANK_BAL12
0 208(0%)99
BANK_BAL11BANK_BAL12BANK_BAL08
0.02 458(98
BANK_BAL11MORT_BALBANK_BAL12
0 957(1%)97
BANK_BAL11OTHDEPT_BALBANK_BAL08
0.01 252(096
OTHDEPT_BALBANK_BAL11CLOTHING_BAL
0 402(0%)95
OTHDEPT_BALBANK_BAL12BANK_BAL11
0 112(0%)94
BANK_BAL12MORT_BALBANK_BAL11
0 576(0%)93
BANK_BAL06BANK_BAL09MORT_BAL
0 279(0%)92
BANK_BAL03BANK_BAL06MORT_BAL
0 379(0%)91
MORT_BALBANK_BAL11AUTO_BAL
0 463(0%)90
BANK_BAL12BANK_BAL11BANK_BAL08
0.01 261(089
BANK_BAL11BANK_BAL02BANK_BAL05
0.01 391(88
BANK_BAL11POPDEPT_BALOTHDEPT_BAL
0.02 308(087
POPDEPT_BALBANK_BAL11OTHDEPT_BAL
0 534(0%)86
OTHDEPT_BALPOPDEPT_BALBANK_BAL11
0 231(0%)85
BANK_BAL11BANK_BAL12MORT_BAL
0 428(0%)84
BANK_BAL12BANK_BAL11BANK_BAL02
0 589(0%)83
BANK_BAL09BANK_BAL12MORT_BAL
0 345(0%)82
BANK_BAL06BANK_BAL09MORT_BAL
0 654(0%)81
MORT_BALBANK_BAL11OTHDEPT_BAL
0 2237(2%)80
BANK_BAL12BANK_BAL05BANK_BAL08
0.04 236(079
BANK_BAL11BANK_BAL08BANK_BAL05
0.03 308(78
POPDEPT_BALBANK_BAL08BANK_BAL11
0.02 154(077
POPDEPT_BALBANK_BAL11OTHDEPT_BAL
0 398(0%)76
POPDEPT_BALOTHDEPT_BALAUTO_BAL
0 656(0%)75
POPDEPT_BALMORT_BALBANK_BAL11
0 637(0%)74
BANK_BAL12BANK_BAL11MORT_BAL
0 1200(1%)73
MORT_BALBANK_BAL12BANK_BAL11
0 517(0%)72
MORT_BALAUTO_BALBANK_BAL11
0 1377(1%)71
MORT_BALBANK_BAL11AUTO_BAL
0 4365(5%)70
BANK_BAL05BANK_BAL08BANK_BAL11
0.08 208(069
BANK_BAL08BANK_BAL11BANK_BAL05
0.02 310(68
POPDEPT_BALBANK_BAL08BANK_BAL11
0 254(0%)67
POPDEPT_BALBANK_BAL02BANK_BAL05
0.02 250(066
POPDEPT_BALOTHDEPT_BALAUTO_BAL
0 1310(1%)65
POPDEPT_BALOTHDEPT_BALBANK_BAL11
0 2143(2%)64
HOMEFURN_BAHOME_BALMORT_BAL
0 896(0%)63
HOMEFURN_BAHOME_BALBANK_BAL11
0.02 177(0%62
HOMEFURN_BALHOME_BALMORT_BAL
0 509(0%)61
CREDITU_BALAUTO_BALBANK_BAL11
0 1568(2%)60
BANK_BAL08BANK_BAL05BANK_BAL11
0.01 353(059
BANK_BAL08BANK_BAL05BANK_BAL11
0.02 493(58
BANK_BAL08MORT_BALBANK_BAL05
0 520(0%)57
BANK_BAL11BANK_BAL08BANK_BAL05
0.01 579(056
POPDEPT_BALBANK_BAL08BANK_BAL05
0 430(0%)55
BANK_BAL11POPDEPT_BALBANK_BAL10
0 869(0%)54
ELEC_BALHOMEFURN_BAMORT_BAL
0 569(0%)53
ELEC_BALHOMEFURN_BABANK_BAL11
0 289(0%)52
CREDITU_BALMORT_BALAUTO_BAL
0 616(0%)51
CREDITU_BALAUTO_BALBANK_BAL11
0 571(0%)50
BANK_BAL05BANK_BAL08BANK_BAL02
0.02 347(049
BANK_BAL05BANK_BAL08BANK_BAL02
0 478(0%)48
BANK_BAL05BANK_BAL08BANK_BAL02
0 860(0%)47
BANK_BAL08BANK_BAL05AUTO_BAL
0 877(0%)46
BANK_BAL08BANK_BAL11AUTO_BAL
0 951(0%)45
BANK_BAL11BANK_BAL05BANK_BAL08
0 697(0%)44
BANK_BAL11MORT_BALBANK_BAL10
0 1736(2%)43
BANK_BAL11AUTO_BALMORT_BAL
0 628(0%)42
SECUR_BALMORT_BALAUTO_BAL
0 206(0%)41
AUTO_BALMORT_BALBANK_BAL11
0 365(0%)40
BANK_BAL02BANK_BAL05BANK_BAL08
0.03 309(039
BANK_BAL05BANK_BAL02BANK_BAL08
0 628(0%)38
BANK_BAL05MORT_BALBANK_BAL08
0 594(0%)37
BANK_BAL05BANK_BAL08BANK_BAL02
0 1436(2%)36
BANK_BAL07BANK_BAL04BANK_BAL08
0 563(0%)35
BANK_BAL08BANK_BAL11BANK_BAL05
0 1393(1%)34
BANK_BAL11AUTO_BALOTHDEPT_BAL
0 3050(3%)33
OTHDEPT_BALBANK_BAL11MORT_BAL
0 653(0%)32
SECUR_BALAUTO_BALINSTFIN_BAL
0 656(0%)31
AUTO_BALBANK_BAL11POPDEPT_BAL
0 1412(1%)30
BANK_BAL02BANK_BAL05BANK_BAL08
0 401(0%)29
BANK_BAL02BANK_BAL08BANK_BAL05
0.02 492(28
BANK_BAL05BANK_BAL02BANK_BAL08
0 1064(1%)27
BANK_BAL01BANK_BAL04BANK_BAL05
0 807(0%)26
BANK_BAL04BANK_BAL07BANK_BAL08
0 144(0%)25
BANK_BAL07BANK_BAL04BANK_BAL08
0 1686(2%)24
BANK_BAL10BANK_BAL11MORT_BAL
0 287(0%)23
OTHDEPT_BALPOPDEPT_BALMORT_BAL
0 895(0%)22
OTHDEPT_BALBANK_BAL11AUTO_BAL
0 1892(2%)21
AUTO_BALBANK_BAL11MORT_BAL
0 4098(4%)20
BANK_BAL02BANK_BAL05MORT_BAL
0 684(0%)19
BANK_BAL02BANK_BAL05BANK_BAL11
0 1332(1%18
BANK_BAL02BANK_BAL05AUTO_BAL
0 1762(2%)17
BANK_BAL01BANK_BAL02BANK_BAL04
0 1404(1%)16
BANK_BAL01BANK_BAL04BANK_BAL02
0 2132(2%)15
STUDENT_BALINSTFIN_BALBANK_BAL11
0 1427(1%)14
REVFINAN_BALMORT_BALBANK_BAL11
0 570(0%)13
REVFINAN_BALMORT_BALBANK_BAL11
0 378(0%)12
CHARGECARD_BAMORT_BALBANK_BAL12
0 284(0%)11
BANK_BAL10OILCARD_BALBANK_BAL11
0 1276(1%)10
BANK_BAL02MORT_BALBANK_BAL05
0 389(0%)9
BANK_BAL02MORT_BALBANK_BAL05
0 944(0%)8
BANK_BAL02BANK_BAL01BANK_BAL05
0 356(0%)7
BANK_BAL01BANK_BAL04BANK_BAL02
0 307(0%)6
STUDENT_BALINSTFIN_BALBANK_BAL11
0 640(0%)5
STUDENT_BALINSTFIN_BALBANK_BAL11
0 310(0%)4
INSTFIN_BALMORT_BALBANK_BAL11
0 174(0%)3
INSTFIN_BALMORT_BALAUTO_BAL
0 831(0%)2
GOV_BALAUTO_BALBANK_BAL11
0 155(0%)1
OTHDEPT_BALBANK_BAL11INSTFIN_BAL
0 17K(18%)0
Figure 6: Actual bankruptcies among customers rated as marginal credit risks.
11
-
File
Vie
wC
olor
By
Sor
t By
Exi
t
seg
650E
Km
ap -
- g
raye
st =
18%
of
9515
7 re
cord
s, s
ort
ed b
y w
eig
hts
, co
lore
d b
y E
NH
_DA
S_S
CR
(m
ean
=0.0
03)
PO
PD
EP
T_B
AL
BA
NK
_BA
L11
BA
NK
_BA
L12
0 208(0%)
99
BA
NK
_BA
L11
BA
NK
_BA
L12
BA
NK
_BA
L08
0.02 458(
98
BA
NK
_BA
L11
MO
RT
_BA
LB
AN
K_B
AL
12
0 957(1%)
97
BA
NK
_BA
L11
OT
HD
EP
T_B
AL
BA
NK
_BA
L08
0.01 252(0
96
OT
HD
EP
T_B
AL
BA
NK
_BA
L11
CL
OT
HIN
G_B
AL
0 402(0%)
95
OT
HD
EP
T_B
AL
BA
NK
_BA
L12
BA
NK
_BA
L11
0 112(0%)
94
BA
NK
_BA
L12
MO
RT
_BA
LB
AN
K_B
AL
11
0 576(0%)
93
BA
NK
_BA
L06
BA
NK
_BA
L09
MO
RT
_BA
L
0 279(0%)
92
BA
NK
_BA
L03
BA
NK
_BA
L06
MO
RT
_BA
L
0 379(0%)
91
MO
RT
_BA
LB
AN
K_B
AL
11A
UT
O_B
AL
0 463(0%)
90
BA
NK
_BA
L12
BA
NK
_BA
L11
BA
NK
_BA
L08
0.01 261(0
89
BA
NK
_BA
L11
BA
NK
_BA
L02
BA
NK
_BA
L05
0.01 391(
88
BA
NK
_BA
L11
PO
PD
EP
T_B
AL
OT
HD
EP
T_B
AL
0.02 308(0
87
PO
PD
EP
T_B
AL
BA
NK
_BA
L11
OT
HD
EP
T_B
AL
0 534(0%)
86
OT
HD
EP
T_B
AL
PO
PD
EP
T_B
AL
BA
NK
_BA
L11
0 231(0%)
85
BA
NK
_BA
L11
BA
NK
_BA
L12
MO
RT
_BA
L
0 428(0%)
84
BA
NK
_BA
L12
BA
NK
_BA
L11
BA
NK
_BA
L02
0 589(0%)
83
BA
NK
_BA
L09
BA
NK
_BA
L12
MO
RT
_BA
L
0 345(0%)
82
BA
NK
_BA
L06
BA
NK
_BA
L09
MO
RT
_BA
L
0 654(0%)
81
MO
RT
_BA
LB
AN
K_B
AL
11O
TH
DE
PT
_BA
L
0 2237(2%)
80
BA
NK
_BA
L12
BA
NK
_BA
L05
BA
NK
_BA
L08
0.04 236(0
79
BA
NK
_BA
L11
BA
NK
_BA
L08
BA
NK
_BA
L05
0.03 308(
78
PO
PD
EP
T_B
AL
BA
NK
_BA
L08
BA
NK
_BA
L11
0.02 154(0
77
PO
PD
EP
T_B
AL
BA
NK
_BA
L11
OT
HD
EP
T_B
AL
0 398(0%)
76
PO
PD
EP
T_B
AL
OT
HD
EP
T_B
AL
AU
TO
_BA
L
0 656(0%)
75
PO
PD
EP
T_B
AL
MO
RT
_BA
LB
AN
K_B
AL
11
0 637(0%)
74
BA
NK
_BA
L12
BA
NK
_BA
L11
MO
RT
_BA
L
0 1200(1%)
73
MO
RT
_BA
LB
AN
K_B
AL
12B
AN
K_B
AL
11
0 517(0%)
72
MO
RT
_BA
LA
UT
O_B
AL
BA
NK
_BA
L11
0 1377(1%)
71
MO
RT
_BA
LB
AN
K_B
AL
11A
UT
O_B
AL
0 4365(5%)
70
BA
NK
_BA
L05
BA
NK
_BA
L08
BA
NK
_BA
L11
0.08 208(0
69
BA
NK
_BA
L08
BA
NK
_BA
L11
BA
NK
_BA
L05
0.02 310(
68
PO
PD
EP
T_B
AL
BA
NK
_BA
L08
BA
NK
_BA
L11
0 254(0%)
67
PO
PD
EP
T_B
AL
BA
NK
_BA
L02
BA
NK
_BA
L05
0.02 250(0
66
PO
PD
EP
T_B
AL
OT
HD
EP
T_B
AL
AU
TO
_BA
L
0 1310(1%)
65
PO
PD
EP
T_B
AL
OT
HD
EP
T_B
AL
BA
NK
_BA
L11
0 2143(2%)
64
HO
ME
FU
RN
_BA
HO
ME
_BA
LM
OR
T_B
AL
0 896(0%)
63
HO
ME
FU
RN
_BA
HO
ME
_BA
LB
AN
K_B
AL
11
0.02 177(0%
62
HO
ME
FU
RN
_BA
LH
OM
E_B
AL
MO
RT
_BA
L
0 509(0%)
61
CR
ED
ITU
_BA
LA
UT
O_B
AL
BA
NK
_BA
L11
0 1568(2%)
60
BA
NK
_BA
L08
BA
NK
_BA
L05
BA
NK
_BA
L11
0.01 353(0
59
BA
NK
_BA
L08
BA
NK
_BA
L05
BA
NK
_BA
L11
0.02 493(
58
BA
NK
_BA
L08
MO
RT
_BA
LB
AN
K_B
AL
05
0 520(0%)
57
BA
NK
_BA
L11
BA
NK
_BA
L08
BA
NK
_BA
L05
0.01 579(0
56
PO
PD
EP
T_B
AL
BA
NK
_BA
L08
BA
NK
_BA
L05
0 430(0%)
55
BA
NK
_BA
L11
PO
PD
EP
T_B
AL
BA
NK
_BA
L10
0 869(0%)
54
EL
EC
_BA
LH
OM
EF
UR
N_B
AM
OR
T_B
AL
0 569(0%)
53
EL
EC
_BA
LH
OM
EF
UR
N_B
AB
AN
K_B
AL
11
0 289(0%)
52
CR
ED
ITU
_BA
LM
OR
T_B
AL
AU
TO
_BA
L
0 616(0%)
51
CR
ED
ITU
_BA
LA
UT
O_B
AL
BA
NK
_BA
L11
0 571(0%)
50
BA
NK
_BA
L05
BA
NK
_BA
L08
BA
NK
_BA
L02
0.02 347(0
49
BA
NK
_BA
L05
BA
NK
_BA
L08
BA
NK
_BA
L02
0 478(0%)
48
BA
NK
_BA
L05
BA
NK
_BA
L08
BA
NK
_BA
L02
0 860(0%)
47
BA
NK
_BA
L08
BA
NK
_BA
L05
AU
TO
_BA
L
0 877(0%)
46
BA
NK
_BA
L08
BA
NK
_BA
L11
AU
TO
_BA
L
0 951(0%)
45
BA
NK
_BA
L11
BA
NK
_BA
L05
BA
NK
_BA
L08
0 697(0%)
44
BA
NK
_BA
L11
MO
RT
_BA
LB
AN
K_B
AL
10
0 1736(2%)
43
BA
NK
_BA
L11
AU
TO
_BA
LM
OR
T_B
AL
0 628(0%)
42
SE
CU
R_B
AL
MO
RT
_BA
LA
UT
O_B
AL
0 206(0%)
41
AU
TO
_BA
LM
OR
T_B
AL
BA
NK
_BA
L11
0 365(0%)
40
BA
NK
_BA
L02
BA
NK
_BA
L05
BA
NK
_BA
L08
0.03 309(0
39
BA
NK
_BA
L05
BA
NK
_BA
L02
BA
NK
_BA
L08
0 628(0%)
38
BA
NK
_BA
L05
MO
RT
_BA
LB
AN
K_B
AL
08
0 594(0%)
37
BA
NK
_BA
L05
BA
NK
_BA
L08
BA
NK
_BA
L02
0 1436(2%)
36
BA
NK
_BA
L07
BA
NK
_BA
L04
BA
NK
_BA
L08
0 563(0%)
35
BA
NK
_BA
L08
BA
NK
_BA
L11
BA
NK
_BA
L05
0 1393(1%)
34
BA
NK
_BA
L11
AU
TO
_BA
LO
TH
DE
PT
_BA
L
0 3050(3%)
33
OT
HD
EP
T_B
AL
BA
NK
_BA
L11
MO
RT
_BA
L
0 653(0%)
32
SE
CU
R_B
AL
AU
TO
_BA
LIN
ST
FIN
_BA
L
0 656(0%)
31
AU
TO
_BA
LB
AN
K_B
AL
11P
OP
DE
PT
_BA
L
0 1412(1%)
30
BA
NK
_BA
L02
BA
NK
_BA
L05
BA
NK
_BA
L08
0 401(0%)
29
BA
NK
_BA
L02
BA
NK
_BA
L08
BA
NK
_BA
L05
0.02 492(
28
BA
NK
_BA
L05
BA
NK
_BA
L02
BA
NK
_BA
L08
0 1064(1%)
27
BA
NK
_BA
L01
BA
NK
_BA
L04
BA
NK
_BA
L05
0 807(0%)
26
BA
NK
_BA
L04
BA
NK
_BA
L07
BA
NK
_BA
L08
0 144(0%)
25
BA
NK
_BA
L07
BA
NK
_BA
L04
BA
NK
_BA
L08
0 1686(2%)
24
BA
NK
_BA
L10
BA
NK
_BA
L11
MO
RT
_BA
L
0 287(0%)
23
OT
HD
EP
T_B
AL
PO
PD
EP
T_B
AL
MO
RT
_BA
L
0 895(0%)
22
OT
HD
EP
T_B
AL
BA
NK
_BA
L11
AU
TO
_BA
L
0 1892(2%)
21
AU
TO
_BA
LB
AN
K_B
AL
11M
OR
T_B
AL
0 4098(4%)
20
BA
NK
_BA
L02
BA
NK
_BA
L05
MO
RT
_BA
L
0 684(0%)
19
BA
NK
_BA
L02
BA
NK
_BA
L05
BA
NK
_BA
L11
0 1332(1%
18
BA
NK
_BA
L02
BA
NK
_BA
L05
AU
TO
_BA
L
0 1762(2%)
17
BA
NK
_BA
L01
BA
NK
_BA
L02
BA
NK
_BA
L04
0 1404(1%)
16
BA
NK
_BA
L01
BA
NK
_BA
L04
BA
NK
_BA
L02
0 2132(2%)
15
ST
UD
EN
T_B
AL
INS
TF
IN_B
AL
BA
NK
_BA
L11
0 1427(1%)
14
RE
VF
INA
N_B
AL
MO
RT
_BA
LB
AN
K_B
AL
11
0 570(0%)
13
RE
VF
INA
N_B
AL
MO
RT
_BA
LB
AN
K_B
AL
11
0 378(0%)
12
CH
AR
GE
CA
RD
_BA
MO
RT
_BA
LB
AN
K_B
AL
12
0 284(0%)
11
BA
NK
_BA
L10
OIL
CA
RD
_BA
LB
AN
K_B
AL
11
0 1276(1%)
10
BA
NK
_BA
L02
MO
RT
_BA
LB
AN
K_B
AL
05
0 389(0%)
9
BA
NK
_BA
L02
MO
RT
_BA
LB
AN
K_B
AL
05
0 944(0%)
8
BA
NK
_BA
L02
BA
NK
_BA
L01
BA
NK
_BA
L05
0 356(0%)
7
BA
NK
_BA
L01
BA
NK
_BA
L04
BA
NK
_BA
L02
0 307(0%)
6
ST
UD
EN
T_B
AL
INS
TF
IN_B
AL
BA
NK
_BA
L11
0 640(0%)
5
ST
UD
EN
T_B
AL
INS
TF
IN_B
AL
BA
NK
_BA
L11
0 310(0%)
4
INS
TF
IN_B
AL
MO
RT
_BA
LB
AN
K_B
AL
11
0 174(0%)
3
INS
TF
IN_B
AL
MO
RT
_BA
LA
UT
O_B
AL
0 831(0%)
2
GO
V_B
AL
AU
TO
_BA
LB
AN
K_B
AL
11
0 155(0%)
1
OT
HD
EP
T_B
AL
BA
NK
_BA
L11
INS
TF
IN_B
AL
0 17K(18%)
0
Fig
ure
7:A
ctual
ban
kru
ptc
ies
amon
gcu
stom
ers
rate
das
mar
ginal
cred
itri
sks.
12
-
3.4 Census Data
The example in this section shows how Kmap can handle categorical values. The clusteringwas performed using publicly available census data with a mixture of 12 continuous andcategorical attributes for each person. The data comes from a municipality located near aNational Laboratory, which accounts for the unusually large number of advanced degreesand large diversity of backgrounds.
Categorical attributes pose an extra degree of complexity in neural clustering becausethey are expanded to a one-to-N mapping: for example, if the original attribute were “animaltype” with values “cat, dog, cow”, it would be replaced for clustering purposes by threeattributes “is a cat”, “is a dog”, “is a cow” that can each assume values 1 or 0. So the totalnumber of attributes actually used in the clustering is data-dependent.
Kmap handles this by making two passes through the results file, obtaining the number ofdistinct values for each categorical attribute on the first pass and constructing the effectiveattributes coming from each original attribute on the second pass. Sample Kmap resultsare shown in Figure 9 and are compared to the default Intelligent Miner visualizer’s viewin Figure 8. The two views each present interesting information in different ways, but theyare indeed consistent with each other. For example, one of the supplemental attributes (i.e.,not used for clustering) is whether household income is above or below $50,000. Using thisattribute to color the Kmap in Figure 9 points to clusters 18 and 24 as being highest inthis attribute, with 70% of the households in cluster 24 having income above $50,000. Thisagrees with the corresponding pie chart in Figure 8.
One of the original categorical attributes was “education”, and one of its values was“doctorate”. Drilling down into the effective categorical attribute “Doctorate(education)”(or “has a PhD” in our earlier terminology) in Figure 9 (comfortingly, perhaps) an overlapbetween clusters with high PhD populations and clusters with high income. Drilling downinto the “educ num” attribute, a number proportional to the number of years of education,shows similar results. Both results are consistent with the drilldowns in Figure 8. Kmapdoes a good job showing the attribute distributions on the map. The default visualizer doesa good job showing the categorical attributes in pie charts. Both can be useful.
3.5 Appendix: A Tale of Two Clusters
Figure 10 offers a more detailed look at the kind of information that Kmap can extract fromthe supermarket customer clustering results mentioned earlier.
4 Summary
Kmap has proved to be a versatile visualizer for Kohonen self-organizing map clusteringresults obtained from a variety of datamining engagements. Analysts value being able tosee the map topography, as well as the drilldowns provided, many of which resulted fromtheir suggestions. We provided examples of Kmap’s use in four different areas of commerce.Currently we are applying Kmap to technical data generated by the simulators and hardwarecontrol systems of massively parallel supercomputers.
13
-
census36
8 seg0
7seg30
6seg185
seg34
5
seg32
5
seg12
5
seg24
5
seg5
4
seg23
education
HS-grad
relationship
Own-childHusband
marital_status
Married-AF-spouseMarried-civ-spouse
educ_num sex
MaleFemale
occupation
Exec-managerialSales
Transport-movingMachine-op-inspct
Craft-repairOther
workclass
Federal-govState-gov
Self-emp-incLocal-gov
Self-emp-not-incOther
relationship
Not-in-familyOther-relative
Own-child
Husband
marital_status
Married-AF-spouseMarried-civ-spouse
occupation
Transport-movingSales
Farming-fishingExec-managerial
Craft-repairOther
sex
MaleFemale
education
10thAssoc-voc
7th-8thSome-college
HS-gradOther
education
Assoc-acdmDoctorate
Prof-schoolMasters
Bachelors
educ_num relationship
WifeNot-in-family
Own-childOther-relative
Husband
marital_status
Married-civ-spouse
occupation
Tech-supportCraft-repair
SalesProf-specialty
Exec-managerialOther
sex
Male
relationship
Unmarried
marital_status
Divorced
sex
MaleFemale
age occupation
SalesOther-service
Exec-managerialProf-specialty
Adm-clericalOther
hours_per_week
relationship
Own-childWife
sex
MaleFemale
marital_status
Married-AF-spouseMarried-civ-spouse
occupation
?Other-service
Exec-managerialProf-specialty
Adm-clericalOther
age hours_per_week
education
Some-college
relationship
Own-childOther-relative
Husband
marital_status
Married-AF-spouseMarried-civ-spouse
educ_num sex
Male
workclass
State-govFederal-gov
Private
workclass
Federal-govState-govLocal-gov
Self-emp-incSelf-emp-not-inc
education
Assoc-acdmDoctorate
Prof-schoolMasters
BachelorsOther
educ_num relationship
UnmarriedOwn-child
Other-relative
Husband
marital_status
WidowedMarried-AF-spouse
Married-civ-spouse
occupation
Adm-clericalProtective-serv
SalesExec-managerial
Prof-specialtyOther
relationship
Own-child
education
Some-college
age marital_status
WidowedMarried-civ-spouse
Married-spouse-absentSeparated
Never-married
educ_num hours_per_week
relationship
Not-in-family
marital_status
Never-married
sex
Female
occupation
Exec-managerialSales
Prof-specialtyOther-service
Adm-clericalOther
education
Assoc-vocAssoc-acdm
MastersHS-grad
Some-collegeOther
age
ev36_11 Cluster seg24 4.57% of population
[income]
>50K
-
Figure 9: Census data clustering results seen with Kmap
15
-
Figure 10: A Tale of Two Clusters: rather different lifestyles emerge during examination ofthe supermarket shopper clustering results. Cluster 4 seems to be an older group of serioushomebakers likely to be found having a tea party, whereas cluster 9 seems to be a younger,health-conscious group more likely to be found jogging - and not smoking!
16
-
5 Acknowledgements
We are grateful for many helpful discussions with our colleagues Joseph Bigus and MichaelRothman.
References
[1] Kotlyar V., Viveros M.S., Duri S.S., Lawrence R.D., Almasi G.S. “A Case Studyin Information Delivery to Mass Retail Markets” In the Proceedings of the 10thInternational Conference on Database and Expert Systems Applications (DEXA),Florence, Italy, August/September 1999, pp. 842-851. Published by Springer-Verlagas Lecture Notes in Computer Science, vol 1677
[2] R. D. Lawrence, G. S. Almasi, V. Kotlyar, M. S. Viveros, and S. S. Duri , “Person-alization of Product Recomendations in Mass Retail Markets”, Data Mining andDiscovery 3(2): 171-195 (1999).
[3] G. S. Almasi and A. J. Lee, “PDA-based Personalized Recommender Agent”, Pro-ceedings of the 5th International Conference on the Practical Application of In-telligent Agent and Multi Agent Technology (PAAM 2000), Manchester, England,April 2000, pp. 299-309.
[4] H. Edelstein, “Mining Large Databases: A Case Study”,www.twocrows.com/largedb.pdf
[5] “JClass Java Components”, www.klgroup.com
[6] G. Almasi, “Kmap Demo”, www.gsalmasi.com
[7] Intelligent Miner for Data, www.ibm.com/software/data/iminer/fordata
[8] T. Kohonen, “Self-Organizing Maps”, Springer-Verlag, 1995.
[9] R. D. Lawrence, G. S. Almasi, and H. E. Rushmeier, “A Scalable Parallel Algorithmfor Self-Organizing Maps with Applications to Sparse Data Mining Problems”,Data Mining and Knowledge Discovery, Vol. 3, 1999, pp. 171-195. (U. S. Patent6,260,036, July 10, 2001.)
[10] J. C. Bezdek and N. R. Pal, “Some New Indexes of Cluster Validity”, IEEE Trans-actions on Systems, Man, and Cybernetics – Part B: Cybernetics Vol. 28, No. 3,June 1998, pp. 301 - 315.
17