fuzzy trees

8/8/2019 Fuzzy Trees


The use of Fuzzy Decision Tree Analysis in Monitoring a Minimum Wage

Malcolm Beynon

and

Keith Whitfield

Cardiff Business School, Cardiff University, Wales, UK.

Address for correspondence: Dr Malcolm Beynon,

Cardiff Business School,

Colum Drive, Cardiff, CF10 3EU,

Wales, U.K.

Telephone: +44 (0)29 2087 5747,

Fax +44 (0)29 2087 4419

E-mail: [email protected]



Abstract

Effective monitoring of a minimum wage requires that establishments potentially paying low wages are effectively identified. This paper investigates the identification of establishments paying low wages prior to the introduction of the British National Minimum Wage in 1999, through the use of fuzzy decision trees. Incorporating a fuzzy aspect within this problem (using membership functions) enables judgements to be made with linguistic scales. An intelligent technique for constructing the required membership functions is introduced, which greatly reduces the need for expert opinion in their construction. The Parzen windows method of estimating a probability distribution and the FUSINTER method of continuous variable discretisation are incorporated in this technique. An illustration of the use of the constructed fuzzy if-then rules is included.

JEL Classification. C14, C15, C44, J31

Keywords. FUSINTER, Fuzzy decision trees, Labour economics, Low pay, Membership

functions, Parzen windows.

1 Introduction

In April 1999, the UK government introduced a National Minimum Wage (NMW) of £3.60 per hour for workers over the age of 21. Enforcing such a regulation is a major task. The method chosen was targeted monitoring, whereby workplaces are investigated according to their probability of employing workers on low pay. Such a procedure needs to be based on an appropriate model for identifying potentially low-paying workplaces. Fuzzy decision tree analysis seems highly appropriate for such a task.

Inductive decision trees were first introduced with the Concept Learning System framework of Hunt (1962). Since then they have continued to be developed and applied. A decision tree starts with a root decision node, from which all branches originate. A branch is a series of nodes, with a decision made at each node that enables progression through (down) the tree. A progression stops at a leaf node, where a classification is given, based on the rule associated with the full branch from the root node to that leaf node.
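As a minimal illustration of this structure, the sketch below represents a crisp tree as nested nodes and reads off one rule per root-to-leaf branch. The `Node` class, the `rules` function and the toy attributes (`outlook`, `humidity`) are hypothetical and are not taken from this paper's data.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple

@dataclass
class Node:
    attribute: Optional[str] = None  # attribute tested at a decision node
    children: Dict[str, "Node"] = field(default_factory=dict)  # branch value -> child node
    label: Optional[str] = None  # classification given at a leaf node

def rules(node: Node, path: Tuple[Tuple[str, str], ...] = ()) -> Iterator[Tuple[List[Tuple[str, str]], str]]:
    """Yield (conditions, class) for every branch from the root to a leaf."""
    if node.label is not None:  # leaf node: the full branch forms one rule
        yield list(path), node.label
        return
    for value, child in node.children.items():
        yield from rules(child, path + ((node.attribute, value),))

# Hypothetical toy tree: the root tests "outlook"; one branch then tests "humidity".
tree = Node("outlook", {
    "sunny": Node("humidity", {"high": Node(label="no"), "normal": Node(label="yes")}),
    "overcast": Node(label="yes"),
})

for conditions, cls in rules(tree):
    print(" and ".join(f"{a} = {v}" for a, v in conditions), "->", cls)
```

Each printed line is one if-then rule; the fuzzy trees used later in the paper have the same shape, but with linguistic terms on the branches and a degree of truth at each leaf.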

As with many data analysis techniques (e.g. traditional regression models), decision trees have been developed within a fuzzy environment. For example, the well-known decision tree method ID3 (Quinlan, 1986) was developed to include fuzzy entropy measures (see Cios and Sztandera (1992) and Weber (1992)). The fuzzy decision tree method used in this paper was introduced by Yuan and Shaw (1995) to take account of cognitive uncertainty, i.e. vagueness and ambiguity. One reason for using fuzzy set theory is its simplicity and similarity to human reasoning (Hong and Chen, 1999). This similarity includes the use of linguistic terms, expressed through membership functions.

A membership function converts crisp numerical values into levels over a set of linguistic terms. Central to any method within a fuzzy environment is the definition of the required membership functions. This area has itself been the subject of research (see Hong and Chen (1999), Sancho-Royo and Verdegay (1999) and Kahraman et al. (2000)), with many studies using the opinions of experts to construct the necessary functions, e.g. see Tarrazo and Gutierrez (2000). In this paper an intelligent technique for constructing the membership functions is introduced, which takes into account the individual continuous values in the original data used to construct the fuzzy decision tree.

The main aim of this paper is to illustrate how fuzzy decision tree analysis can be used to

help monitor a minimum wage. It uses data derived from a survey of British workplaces

(WERS98) which was undertaken just before the introduction of the NMW and which

contains information on low pay.

The rest of the paper is structured as follows. Section 2 describes the problem considered. Section 3 introduces an intelligent method of membership function construction. Section 4 gives a brief description of the fuzzy decision tree method. Section 5 describes the construction of the fuzzy decision tree for this problem, and section 6 concludes.

2 Problem description and data set

In this paper the proportion of employees paid less than £3.50 per hour is defined as the decision attribute %pay.1 In WERS98, over two thirds of establishments reported no low-paid employees. In the study of McNabb and Whitfield (2000) certain intervals (classes) of %pay values were considered. Similarly, here three classes are used to offer an initial partitioning of the decision attribute %pay: zero percent (zero, Z), between 0 and 10 percent (low, L) and above 10 percent (high, H).

Since a full analysis of this problem is beyond the scope of this paper, a subset of the whole data set is used: details on 100 establishments are used to construct the fuzzy decision tree (see section 5). Furthermore, only a subset of the condition attributes (characteristics) of the establishments is used, namely the six condition attributes introduced and described in Table 1.

Attribute   Description
age         Age of the organisation (years)
emps        Number of employees in establishment
%yng        Percentage of employees < 20 years old
%old        Percentage of employees > 51 years old
%fem        Percentage of female employees
%prt        Percentage of part-time employees

Table 1: Description of condition attributes.

For a full description of the data (condition and decision attributes) the reader is directed to

the study by McNabb and Whitfield (2000).

Currently the Inland Revenue uses a model aimed at identifying those sectors and geographical areas where non-compliance is likely to be most prevalent (Low Pay Commission, 2000). The ability to identify (predict) those establishments with a high percentage of low-paid employees is therefore important: because the Inland Revenue has limited resources for inspecting establishments, it needs efficient ways to target those establishments more likely to pay low wages. This efficiency includes using external characteristics of the establishment that are quick and inexpensive (ideally free) to acquire. Many of these characteristics will be approximations (e.g. the percentage of young or female employees). A further factor is that the WERS98 data on low pay are based on answers from the managers most responsible for personnel matters, so their answers may be immediate judgements rather than accurate facts. A fuzzy approach goes some way to addressing these issues, and within a decision tree setting the resulting (readable) rules do not require particular expertise in specific analysis techniques.

1 With a one-year difference between the WERS98 data and the introduction of the NMW in 1999, the level of £3.50 takes inflation into account, i.e. it corresponds to the £3.60 level.

3 Construction of membership functions

As described in section 1, membership functions are used to convert a crisp numerical value into levels over a set of linguistic terms. In this section an intelligent technique is introduced for constructing the membership functions required by the subsequent fuzzy decision tree method. The technique is made up of three parts, namely:

a) Discretisation of the data set, to provisionally intervalise the values of the continuous condition attributes.

b) Construction of estimated distributions, to offer a functional form for the spread of the values in an identified interval.

c) Definition of the membership functions from the constructed estimated distributions.

Each of these parts is described here, using the NMW problem and data set described in section 2.

3.1 Discretisation of Data Set

This section is concerned with Continuous Variable Discretisation (CVD). Research on CVD has suggested several alternatives, based on whether the discretisation is supervised (uses the decision class) or unsupervised (considers only the continuous condition attribute values in question). CVD methods can further be separated into local methods, which operate on a single variable at a time, and global methods, which discretise a group of variables at the same time.

In this paper the supervised CVD method FUSINTER is used (Zighed et al., 1998). One reason for using FUSINTER is that it derives the appropriate number of intervals from the distribution of the data, removing the need for expert opinion at this stage. FUSINTER is a bottom-up algorithm (merging sub-intervals rather than introducing new interval boundary values) whose objective is to partition a condition attribute by optimising a certain entropy measure. The method partitions only one attribute at a time; one advantage is its ability to avoid very thin partitioning, i.e. intervals that contain a very small number of objects.2

2 For a detailed discussion of the FUSINTER algorithm see Zighed et al. (1998). In this paper the quadratic entropy measure is used, with the default parameter values α = 0.975 and λ = 1.

Since FUSINTER is a supervised technique, the actual value of the decision attribute (%pay) is employed in the discretisation of each of the six continuous condition attributes given in Table 1; the results are given in Table 2. As with the decision attribute, this is a provisional discretisation, aiming to group the condition attribute values intelligently before further analysis. The decision classes (Z, L and H) defined in section 2 for %pay are used to provisionally discretise the six condition attributes.

Attribute   Interval 1          Interval 2             Interval 3
age         [0, 7.5), 15        [7.5, 19.0), 41        [19.0, ∞), 44
emps        [0, 30.5), 21       [30.5, 98.5), 35       [98.5, ∞), 44
%yng        [0, 0.035), 46      [0.035, 0.120), 24     [0.120, 1], 30
%old        [0, 0.045), 13      [0.045, 0.135), 44     [0.135, 1], 43
%fem        [0, 0.455), 35      [0.455, 0.575), 15     [0.575, 1], 50
%prt        [0, 0.325), 53      [0.325, 1], 47

Table 2: Intervals from FUSINTER discretisation (each entry: interval, number of objects).

Table 2 shows that each of the six condition attributes is partitioned into two or three intervals. The number of objects in each interval is also given; the smallest interval contains 13 objects (the first interval of %old), which shows the avoidance of particularly small intervals (i.e. thin partitioning).
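The bottom-up merging described in section 3.1 can be sketched as follows. This is a simplified illustration, not the published algorithm: the criterion below (α times a λ-smoothed quadratic entropy, plus (1 − α) times a penalty that grows as intervals shrink) is an assumed paraphrase of the quadratic-entropy measure of Zighed et al. (1998), whose exact definition should be checked before use. The names `criterion` and `fusinter_like` are hypothetical; the defaults mirror the α = 0.975 and λ = 1 quoted in footnote 2.

```python
from collections import Counter

def criterion(groups, classes, n, alpha=0.975, lam=1.0):
    """ASSUMED FUSINTER-style criterion (lower is better): smoothed quadratic
    entropy of the partition plus a penalty that grows for small intervals."""
    k = len(classes)
    entropy = penalty = 0.0
    for counts in groups:
        n_j = sum(counts.values())
        # lambda-smoothed class frequencies within the interval
        probs = [(counts[c] + lam) / (n_j + k * lam) for c in classes]
        entropy += (n_j / n) * sum(p * (1 - p) for p in probs)
        penalty += k * lam / n_j
    return alpha * entropy + (1 - alpha) * penalty

def fusinter_like(values, labels, alpha=0.975, lam=1.0):
    """Hypothetical bottom-up discretiser; returns (cut points, interval sizes)."""
    classes = sorted(set(labels))
    n = len(values)
    by_value = {}
    for v, y in zip(values, labels):
        by_value.setdefault(v, Counter())[y] += 1
    points = sorted(by_value)
    # Step 1: initial intervals are runs of adjacent values that are pure in the same class.
    groups = [Counter(by_value[points[0]])]
    bounds = [[points[0], points[0]]]
    for v in points[1:]:
        cur, last = by_value[v], groups[-1]
        if len(cur) == 1 and len(last) == 1 and set(cur) == set(last):
            groups[-1] = last + cur
            bounds[-1][1] = v
        else:
            groups.append(Counter(cur))
            bounds.append([v, v])
    # Step 2: apply the adjacent merge that most lowers the criterion, until none does.
    best = criterion(groups, classes, n, alpha, lam)
    while len(groups) > 1:
        score, i = min(
            (criterion(groups[:j] + [groups[j] + groups[j + 1]] + groups[j + 2:],
                       classes, n, alpha, lam), j)
            for j in range(len(groups) - 1))
        if score >= best:
            break
        best = score
        groups[i:i + 2] = [groups[i] + groups[i + 1]]
        bounds[i:i + 2] = [[bounds[i][0], bounds[i + 1][1]]]
    # Cut points are midpoints between neighbouring intervals, as in Table 2.
    cuts = [(bounds[i][1] + bounds[i + 1][0]) / 2 for i in range(len(bounds) - 1)]
    return cuts, [sum(g.values()) for g in groups]
```

With a single mislabelled value inside an otherwise pure run (for example values 1 to 20, with value 5 in the "wrong" class), the penalty term makes the one-object interval merge back into its neighbours, which is the thin-partitioning avoidance discussed above.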

3.2 Construction of estimated distributions

The method of Parzen windows (Parzen, 1962) constructs a probability density function (pdf) based on the values in the domain of the interval. In its general form (assuming each value $x_i$ is represented by a zero-mean, unit-variance univariate density function; see Thompson and Tapia (1990)), the estimated pdf is given by:

$$\mathrm{pdf}(x) = \frac{1}{m}\sum_{i=1}^{m}\frac{1}{h_m\sqrt{2\pi}}\exp\left(-\frac{1}{2}\left(\frac{x - x_i}{h_m}\right)^2\right),$$

where $m$ is the number of values in the interval and $h_m$ is the window width. Duda and Hart (1973, p. 89) consider the problem of constructing $h_m$; they give $h_m = h_1/\sqrt{m}$, where $h_1$ is a parameter to be defined. In this paper $h_1$ is the range of the individual values in the interval under consideration. Defining $I_j$ to be the $j$th interval, $h_1 = \max(I_j) - \min(I_j)$, where $\min(I_j)$ and $\max(I_j)$ signify the smallest and largest of the values in the $j$th interval, respectively.

Hence, the associated pdf (i.e. $\mathrm{pdf}_j(x)$) for the $j$th interval is given by:

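$$\mathrm{pdf}_j(x) = \frac{1}{m_j}\sum_{i=1}^{m_j}\frac{\sqrt{m_j}}{\left(\max(I_j) - \min(I_j)\right)\sqrt{2\pi}}\exp\left(-\frac{m_j}{2}\left(\frac{x - x_{ij}}{\max(I_j) - \min(I_j)}\right)^2\right) \tag{1}$$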

where $m_j$ is the number of values in $I_j$. The $\mathrm{pdf}_j(x)$ function is the mean of the univariate density functions centred at each of the values in the $j$th interval.
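Equation (1) translates directly into code. The sketch below builds $\mathrm{pdf}_j(x)$ for one interval's values with $h_1$ defaulting to the range of those values and $h_m = h_1/\sqrt{m_j}$; the function name `parzen_pdf` is hypothetical. The optional `h1` argument covers the zero-range case, where a fixed width must be supplied instead (the paper uses 0.05 for the %pay = Z class).

```python
import math

def parzen_pdf(values, h1=None):
    """Estimated pdf_j(x) of equation (1) for the values of one interval."""
    m = len(values)
    if h1 is None:
        h1 = max(values) - min(values)  # h1 = max(I_j) - min(I_j)
    if h1 <= 0:
        raise ValueError("all values are equal: pass h1 explicitly (e.g. 0.05)")
    h = h1 / math.sqrt(m)  # window width h_m = h1 / sqrt(m)
    norm = 1.0 / (m * h * math.sqrt(2 * math.pi))

    def pdf(x):
        # Mean of normal kernels with standard deviation h centred at each value.
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in values)

    return pdf
```

Because each kernel integrates to one, the resulting function integrates to one over $(-\infty, \infty)$; restricting it to an attribute's feasible domain, as done for Figure 1, is a separate step.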



Using the original data values of the condition attributes and the intervals defined in Table 2, the associated estimated distributions (i.e. pdfs) can be constructed; see Figure 1.

[Figure 1 panels: age, emps, %yng, %old, %fem and %prt; in each panel the estimated distributions are labelled 1, 2 and 3 (1 and 2 for %prt) by interval.]

Figure 1: Estimated distributions of condition attributes.

In Figure 1, each set of estimated distributions is shown over the domain of the intervals given in Table 2. The constructed $\mathrm{pdf}_j(x)$ functions have a domain of $(-\infty, \infty)$, but here a check is made on the feasible domain for each attribute; e.g. %yng is a percentage and hence has a feasible domain of [0, 100], shown as a proportion with domain [0, 1] in Figure 1. The labels 1, 2 and 3 associate the estimated distributions with the intervals given in Table 2.

A similar set of estimated distributions can be constructed for the decision attribute (%pay),

as shown in Figure 2.

[Figure 2 panel: %pay; the estimated distributions are labelled 1, 2 and 3.]

Figure 2: Estimated distributions of decision attribute.

Figure 2 shows the three associated pdfs. Of special note is the pdf with label 1, relating to the %pay = Z class. Since every establishment in this class has a zero percentage of low-paid workers, the interval has zero width, so equation (1) cannot be used. In this case an interval width h1 = 0.05 is used instead, enabling a pdf to be constructed. The reasoning is that allowing a pdf to exist for a relatively crisp value introduces a level of fuzziness: within a workplace, the manager answering the questions may report a zero level of low pay while aware that a very small proportion exists.



3.3 Definition of the membership functions

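This section is concerned with the construction of the required membership functions. Within related studies, a number of different types of membership function have been investigated. These include triangular and trapezoidal functions, and whether they should be linear or non-linear and possibly piecewise (see Hu and Fang (1998), Medasani et al. (1998) and Roa-Sepulveda and Herrera (2000)). Here, linear trapezoidal membership functions are used. For the $j$th interval, the membership function $\mu_j$ is described by four defining values $p_{j,1} \le p_{j,2} \le p_{j,3} \le p_{j,4}$, with membership value $z = 1$ between $p_{j,2}$ and $p_{j,3}$ and $z > 0$ between $p_{j,1}$ and $p_{j,4}$; its general functional form is

$$\mu_j(x) = \begin{cases} 0, & x \le p_{j,1}, \\ \dfrac{x - p_{j,1}}{p_{j,2} - p_{j,1}}, & p_{j,1} < x < p_{j,2}, \\ 1, & p_{j,2} \le x \le p_{j,3}, \\ \dfrac{p_{j,4} - x}{p_{j,4} - p_{j,3}}, & p_{j,3} < x < p_{j,4}, \\ 0, & x \ge p_{j,4}. \end{cases}$$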

Using the estimated distributions given in section 3.2, with the z=1 value set to 0.1 and the z>0 value set to 0.97, the defining values for each membership function can be found. For the case z>0 = 0.97, the associated membership function has a value greater than zero over the central 97% of the area of the pdf for the interval, which may remove the influence of any outliers in the data. By comparison with a possibility distribution, the z=1 and z>0 values define the necessity and possibility measures for the membership functions (Bandemer and Gottwald, 1995). These defining values enable the membership functions to be constructed, as given in Figure 3.
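The step from an estimated distribution to the four defining values can be sketched as below. This assumes that "central area" means equal tail areas on each side, so z>0 = 0.97 gives the 1.5% and 98.5% quantiles and z=1 = 0.1 gives the 45% and 55% quantiles of the Parzen estimate. Quantiles are found by bisection on the Parzen CDF (a mean of normal CDFs, computed with `math.erf`). All function names are hypothetical, and the open-ended outermost functions that Figure 3 appears to use are not modelled here.

```python
import math

def parzen_cdf(values, h, x):
    # CDF of the Parzen estimate: the mean of normal CDFs centred at each value.
    return sum(0.5 * (1 + math.erf((x - xi) / (h * math.sqrt(2)))) for xi in values) / len(values)

def quantile(values, h, q, tol=1e-10):
    # Bisection on the monotone CDF; the bracket extends 10 window widths beyond the data.
    lo, hi = min(values) - 10 * h, max(values) + 10 * h
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if parzen_cdf(values, h, mid) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def defining_values(values, z_one=0.1, z_pos=0.97, h1=None):
    """[p1, p2, p3, p4]: bounds of the central z_pos area (z > 0) and central z_one area (z = 1)."""
    if h1 is None:
        h1 = max(values) - min(values)
    h = h1 / math.sqrt(len(values))  # same window width h_m as in equation (1)
    t_pos, t_one = (1 - z_pos) / 2, (1 - z_one) / 2
    return [quantile(values, h, t_pos), quantile(values, h, t_one),
            quantile(values, h, 1 - t_one), quantile(values, h, 1 - t_pos)]
```

A larger z>0 widens the support of the membership function, while a smaller z=1 narrows its plateau; 0.97 and 0.1 are the values chosen in the text.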

[Figure 3 panels: age, emps, %yng, %old, %fem and %prt; membership functions labelled L, M and H (L and H only for %prt), annotated with their defining values.]

Figure 3: Sets of membership functions for condition attributes.

Figure 3 shows the membership functions. For example, for the second interval of the %yng attribute the defining values are [0.019, 0.058, 0.065, 0.127]; this membership function is labelled M, representing the linguistic term medium. Its neighbouring membership functions in Figure 3 are labelled L (low) and H (high). In summary, for the %yng attribute a linguistic scale of low, medium and high has been constructed, with the only input required from an expert being the choice of the z=1 and z>0 values. The same applies to age, emps, %old and %fem, while the attribute %prt has only the linguistic terms L (low) and H (high).
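Evaluating a linear trapezoidal membership function is straightforward. The sketch below uses the %yng = M defining values quoted above and the crisp %yng value 0.05 from Table 3; the function name `trapezoid` is hypothetical. Open-ended (shoulder) functions are modelled with infinite defining values, an assumption suggested by Table 3, where %fem = 0.12 has membership 1 in L.

```python
import math

def trapezoid(x, p1, p2, p3, p4):
    """Linear trapezoidal membership with defining values p1 <= p2 <= p3 <= p4."""
    if p2 <= x <= p3:
        return 1.0                   # plateau: z = 1
    if p1 < x < p2:
        return (x - p1) / (p2 - p1)  # rising edge
    if p3 < x < p4:
        return (p4 - x) / (p4 - p3)  # falling edge
    return 0.0                       # outside the support

medium = (0.019, 0.058, 0.065, 0.127)  # %yng = M defining values from the text
print(round(trapezoid(0.05, *medium), 3))

# Hypothetical left shoulder: membership 1 for every value up to 0.2.
print(trapezoid(-5, -math.inf, -math.inf, 0.2, 0.4))
```

The first value, about 0.795, is close to the 0.793 reported for %yng = M in Table 3; the small difference presumably reflects rounding of the defining values to three decimals.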

A similar set of fuzzy membership functions can be constructed for the decision attribute using the estimated distributions given in Figure 2; see Figure 4.


[Figure 4 panel: %pay; membership functions labelled Z, L and H.]

Figure 4: Membership functions for decision attribute.

In Figure 4, the membership functions for the decision attribute are given. In this case the

associated linguistic terms are Z - zero, L - low and H - high.

To further illustrate the construction of fuzzy values from the original data, the details of one establishment are given in Table 3, along with the resulting fuzzy values.

Attribute   Crisp value   Fuzzy value
age         15            [0, 0.417, 0.157]
emps        91            [0, 0.278, 0.222]
%yng        0.05          [0, 0.793, 0.126]
%old        0.16          [0, 0, 0.571]
%fem        0.12          [1, 0, 0]
%prt        0.01          [1, 0]
%pay        0.02          [0, 0.599, 0.007]

Table 3: Original and fuzzy attribute values (fuzzy values in the order L, M, H; L, H for %prt; Z, L, H for %pay).

Using the membership functions defined above, the fuzzified values given in Table 3 can also be written as

{0, 0.417, 0.157; 0, 0.278, 0.222; 0, 0.793, 0.126; 0, 0, 0.571; 1, 0, 0; 1, 0; 0, 0.599, 0.007}

where the semicolons separate the sets of fuzzy values for each attribute (condition and decision attributes included).4

4 Summary of fuzzy decision tree method

In this section the functions used in the fuzzy decision tree method introduced by Yuan and Shaw (1995) are briefly described. A fuzzy set $A$ in a universe of discourse $U$ is characterised by a membership function $\mu_A$ that takes values in the interval [0, 1]. For all $u \in U$, the intersection $A \cap B$ of two fuzzy sets is given by $\mu_{A \cap B}(u) = \min(\mu_A(u), \mu_B(u))$.

A membership function $\mu(x)$ of a fuzzy variable $Y$ defined on $X$ can be viewed as a possibility distribution of $Y$ on $X$, i.e. $\pi(x) = \mu(x)$ for all $x \in X$. The possibilistic measure of ambiguity $E_\alpha(Y)$ is defined as:

4 This is the same way of presenting fuzzy values as used in Wang et al. (2000).



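$$E_\alpha(Y) = g(\pi) = \sum_{i=1}^{n}\left(\pi_i^* - \pi_{i+1}^*\right)\ln[i],$$

where $\pi^* = \{\pi_1^*, \pi_2^*, \ldots, \pi_n^*\}$ is the permutation of the possibility distribution $\pi = \{\pi(x_1), \pi(x_2), \ldots, \pi(x_n)\}$,5 sorted so that $\pi_i^* \ge \pi_{i+1}^*$ for $i = 1, \ldots, n$, and $\pi_{n+1}^* = 0$; see Zadeh (1978) and Higashi and Klir (1983). The ambiguity of attribute $A$ is then: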

$$E_\alpha(A) = \frac{1}{m}\sum_{i=1}^{m} E_\alpha(A(u_i)),$$

where $E_\alpha(A(u_i)) = g\left(\mu_{T_s}(u_i) \big/ \max_{1 \le j \le s}\left(\mu_{T_j}(u_i)\right)\right)$, with $T_1, \ldots, T_s$ the linguistic terms used within an attribute, evaluated over $m$ cases. Ambiguity exists when there is overlap between the linguistic terms of an attribute or between classes.
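The measure $g(\pi)$ and the attribute ambiguity $E_\alpha(A)$ can be sketched as follows: each case's memberships are normalised by their largest value, sorted in decreasing order, padded with $\pi_{n+1}^* = 0$, and the weighted logarithms summed. The function names are hypothetical, and no significance-level filtering of small memberships is applied.

```python
import math

def g(pi):
    """Possibilistic ambiguity g(pi) = sum_i (pi*_i - pi*_{i+1}) ln(i)."""
    top = max(pi)
    if top == 0:
        return 0.0
    # Normalise by the largest value, sort decreasingly, and pad with pi*_{n+1} = 0.
    p = sorted((v / top for v in pi), reverse=True) + [0.0]
    return sum((p[i] - p[i + 1]) * math.log(i + 1) for i in range(len(pi)))

def attribute_ambiguity(rows):
    """E(A): mean over the m cases of g applied to each case's term memberships."""
    return sum(g(row) for row in rows) / len(rows)

# %yng fuzzy value of the Table 3 establishment: mostly M, a little H.
print(g([0, 0.793, 0.126]))
```

A crisp case such as [1, 0, 0] has ambiguity 0, while two equally possible terms give ln 2; the Table 3 value sits between these, at (0.126/0.793) ln 2.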

The fuzzy subsethood $S(A, B)$ measures the degree to which $A$ is a subset of $B$ (see Kosko, 1986) and is given by:6

$$S(A, B) = \frac{\sum_{u \in U}\min\left(\mu_A(u), \mu_B(u)\right)}{\sum_{u \in U}\mu_A(u)}.$$

Given fuzzy evidence $E$, the possibility of classifying an object to class $C_i$ can be defined as:

$$\pi(C_i \mid E) = \frac{S(E, C_i)}{\max_j S(E, C_j)},$$

where $S(E, C_i)$ represents the degree of truth for the classification rule "if $E$ then $C_i$".

Knowing a single piece of evidence (i.e. a fuzzy value from an attribute), the classification ambiguity based on this fuzzy evidence is defined as:

$$G(E) = g\left(\pi(C \mid E)\right).$$

The classification ambiguity with fuzzy partitioning $P = \{E_1, \ldots, E_k\}$ on the fuzzy evidence $F$, denoted $G(P \mid F)$, is the weighted average of the classification ambiguity of each subset of the partition:

$$G(P \mid F) = \sum_{i=1}^{k} w(E_i \mid F)\, G(E_i \cap F),$$

where $G(E_i \cap F)$ is the classification ambiguity with fuzzy evidence $E_i \cap F$, and $w(E_i \mid F)$ is the weight representing the relative size of the subset $E_i \cap F$ in $F$:

$$w(E_i \mid F) = \frac{\sum_{u \in U}\min\left(\mu_{E_i}(u), \mu_F(u)\right)}{\sum_{j=1}^{k}\sum_{u \in U}\min\left(\mu_{E_j}(u), \mu_F(u)\right)}.$$

5 That is, the values $\{\pi(x_1), \pi(x_2), \ldots, \pi(x_n)\}$ are normalised by the largest value.

6 To calculate $S(A, B)$, $A$ and $B$ should be defined on the same universe of discourse. In this case all attributes are defined over the same set of objects (workplaces).

The fuzzy decision tree method considered here uses these functions. In summary, attributes are assigned to nodes based on the lowest level of classification ambiguity. A node becomes a leaf node if the level of subsethood (based on the conjunction (intersection) of the branches from the root) is higher than a truth level β assigned to the whole decision tree. The classification from the leaf node is to the decision class with the largest subsethood value. For a full description of this method see Yuan and Shaw (1995) and Wang et al. (2000).
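The functions of this section fit together as in the sketch below, which computes subsethood, the classification ambiguity $G(E)$ and the partition ambiguity $G(P \mid F)$. It is a minimal illustration under stated assumptions: memberships are lists over the same objects, all names are hypothetical, `g` is repeated so the block stands alone, and the four-object data in the usage note are invented for illustration only.

```python
import math

def g(pi):
    # Possibilistic ambiguity; normalising by max(pi) turns S(E, C_i) values into pi(C_i | E).
    top = max(pi)
    if top == 0:
        return 0.0
    p = sorted((v / top for v in pi), reverse=True) + [0.0]
    return sum((p[i] - p[i + 1]) * math.log(i + 1) for i in range(len(pi)))

def intersect(a, b):
    # Fuzzy intersection: pointwise minimum of membership values.
    return [min(x, y) for x, y in zip(a, b)]

def subsethood(a, b):
    # S(A, B) = sum_u min(mu_A(u), mu_B(u)) / sum_u mu_A(u)
    return sum(intersect(a, b)) / sum(a)

def classification_ambiguity(evidence, classes):
    # G(E) = g(pi(C | E)), with pi(C_i | E) = S(E, C_i) / max_j S(E, C_j).
    return g([subsethood(evidence, c) for c in classes])

def partition_ambiguity(partition, f, classes):
    # G(P | F) = sum_i w(E_i | F) G(E_i ∩ F), where w is the relative size of E_i ∩ F.
    parts = [intersect(e, f) for e in partition]
    total = sum(sum(p) for p in parts)
    return sum(sum(p) / total * classification_ambiguity(p, classes)
               for p in parts if sum(p) > 0)
```

With classes Z = [1, 1, 0, 0] and H = [0, 0, 1, 1] and evidence F = [1, 1, 1, 1], a partition that matches the classes exactly has $G(P \mid F) = 0$, while a partition that cuts across them ([1, 0, 1, 0], [0, 1, 0, 1]) has $G(P \mid F) = \ln 2$; the attribute giving the lower value would be preferred for the node.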

5 Fuzzy decision tree construction

Using the definitions given in section 4, this section illustrates the fuzzy decision tree method, using the fuzzy values for the low pay problem described in section 2. A truth level of β = 0.6 is used throughout. The final fuzzy decision tree is given in Figure 5, which can be used as a reference while its construction is described below.

To find the root node attribute, the classification ambiguity of each attribute is found: G(age) = 0.6607, G(emps) = 0.4568, G(%yng) = 0.4195, G(%old) = 0.7244, G(%fem) = 0.5189 and G(%prt) = 0.4546. Since G(%yng) is the lowest of these values, %yng is chosen as the root node attribute. The subsethood of each branch from %yng to each class of the decision attribute (%pay) is then calculated. For the branch (%yng = L) the values are S(%yng = L, %pay = Z) = 0.8666, S(%yng = L, %pay = L) = 0.0569 and S(%yng = L, %pay = H) = 0.0834. The largest of these values (0.8666) is above the required truth level (β = 0.6), hence this branch ends in a leaf node from which a rule can be constructed.
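The two decisions made at each step (choose the attribute with the lowest ambiguity; end a branch when its largest subsethood exceeds β) can be checked against the values reported in this section. This is a sketch only: the dictionaries hold the published numbers, the function names are hypothetical, and for the (%yng = M) and (%yng = H) branches only the largest subsethood value is reported, so only that value is passed in.

```python
# Classification ambiguities reported for each candidate root attribute.
root_ambiguity = {"age": 0.6607, "emps": 0.4568, "%yng": 0.4195,
                  "%old": 0.7244, "%fem": 0.5189, "%prt": 0.4546}
BETA = 0.6  # truth level used throughout this section

def choose_attribute(ambiguity):
    """Pick the attribute with the lowest classification ambiguity."""
    return min(ambiguity, key=ambiguity.get)

def leaf_decision(subsethood, beta=BETA):
    """Return (class with the largest subsethood, whether that value exceeds beta)."""
    best = max(subsethood, key=subsethood.get)
    return best, subsethood[best] > beta

print(choose_attribute(root_ambiguity))                        # %yng
print(leaf_decision({"Z": 0.8666, "L": 0.0569, "H": 0.0834}))  # ('Z', True)
```

The same test applied to S(%yng = M, %pay = L) = 0.4246 returns False, which is why that branch is partitioned further.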

Similar considerations are given to the branches (%yng = M) and (%yng = H); in these cases the largest subsethood values are S(%yng = M, %pay = L) = 0.4246 and S(%yng = H, %pay = H) = 0.5431, respectively. Since both of these values are less than the truth level, these branches require further partitioning using other attributes. For the (%yng = M) branch, the classification ambiguity G(%yng = M) = 0.7820 is first calculated and then compared with the classification ambiguity with fuzzy partitioning for each of the other attributes on this branch, e.g. G(age | %yng = M) = 0.6598. An inspection of these values shows that G(%prt | %yng = M) = 0.4856 is the lowest, hence %prt is the chosen attribute for the decision node on this branch. Similarly, G(%yng = H) = 0.3831, and %fem is the chosen attribute for the (%yng = H) branch, with G(%fem | %yng = H) = 0.2725.

The branches from the decision node (%prt | %yng = M) are considered next. The largest subsethood values for its two branches are S(%yng = M and %prt = L, %pay = L) = 0.6101 and S(%yng = M and %prt = H, %pay = H) = 0.4970. Only S(%yng = M and %prt = L, %pay = L) is above the truth level, so that branch ends in a leaf node, while the other branch may require further partitioning by attributes. For the decision node (%fem | %yng = H), the largest subsethood values for each branch are S(%yng = H and %fem = L, %pay = L) = 0.5860, S(%yng = H and %fem = M, %pay = H) = 0.6482 and S(%yng = H and %fem = H, %pay = H) = 0.6597. Hence only the branch (%yng = H and %fem = L) may require further partitioning by attributes.

This process continues until every branch ends in a leaf node, or no further attributes can be added to nodes.7 The final results of the fuzzy decision tree method are illustrated in Figure 5.

[Figure 5: fuzzy decision tree with root node %yng (branches %yng = L, M and H) and further decision nodes on %prt, %fem and %old; its 11 leaf nodes classify %pay as Z, L or H, with degrees of truth 86.7%, 61.0%, 82.2%, 66.9%, 77.2%, 65.7%, 93.6%, 78.9%, 100.0%, 64.8% and 65.9%.]

Figure 5: Fuzzy decision tree.

Figure 5 shows the fuzzy decision tree for the NMW problem. There are 11 fuzzy rules (leaf nodes), shown by the larger rectangular boxes. Each rule is described by the downward progression from the root to a leaf node. Each non-leaf node (excluding the root) has two parts within its rectangular box: above the dashed line is the condition attribute linguistic term to be satisfied, and below the dashed line is the next condition attribute to consider.

At a leaf node, above the dashed line is the final condition attribute linguistic term to be satisfied, and below the dashed line is the class of the decision attribute %pay to which the rule classifies, along with the degree of truth of the classification. For example, one rule is given in Figure 6, along with a wording of the rule.

7 This may be because there is no further improvement (reduction) in the classification ambiguity of a branch, or because no further attributes are available to be added.

[Figure 6: path Root → %yng → %yng = M → %prt → %prt = L → %pay = L (61.0%).]

If %yng = M and %prt = L then %pay = L with degree of truth 61.0%.

That is, the rule applies when the membership value for %yng = M is the largest for that attribute, and similarly for the %prt = L condition attribute.

Figure 6: Description of a fuzzy decision rule.

To illustrate this decision tree, the classification of the establishment given in Table 3 is considered. Its fuzzy values are repeated below; the largest value within each attribute determines the dominant linguistic term:

{0, 0.417, 0.157; 0, 0.278, 0.222; 0, 0.793, 0.126; 0, 0, 0.571; 1, 0, 0; 1, 0; 0, 0.599, 0.007}

It follows that the dominant linguistic terms are age = M (since its largest value is 0.417), emps = M, %yng = M, %old = H, %fem = L, %prt = L and %pay = L. Using this information, the fuzzy rule given in Figure 6 is the rule that classifies this establishment. The rule gives %pay = L, which matches the establishment's actual class, so the classification is correct, although the degree of truth (61.0%) is an indication of the fuzzy nature of this analysis.
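The classification procedure just described (take the dominant term of each attribute, then find the rule whose conditions all hold) can be sketched as below. Only two of the 11 rules are included: the Figure 6 rule and the (%yng = L) leaf from section 5 (subsethood 0.8666, shown as 86.7% in Figure 5); the full rule set would come from Figure 5. The names `TERMS`, `RULES`, `dominant_terms` and `classify` are hypothetical, and the establishment's %pay value is carried only so that the result can be checked.

```python
# Term order for each attribute's fuzzy values, as in Table 3.
TERMS = {"age": "LMH", "emps": "LMH", "%yng": "LMH", "%old": "LMH",
         "%fem": "LMH", "%prt": "LH", "%pay": "ZLH"}
establishment = {"age": [0, 0.417, 0.157], "emps": [0, 0.278, 0.222],
                 "%yng": [0, 0.793, 0.126], "%old": [0, 0, 0.571],
                 "%fem": [1, 0, 0], "%prt": [1, 0], "%pay": [0, 0.599, 0.007]}

# Partial rule set as (conditions, (class, degree of truth)).
RULES = [({"%yng": "L"}, ("Z", 0.867)),
         ({"%yng": "M", "%prt": "L"}, ("L", 0.610))]

def dominant_terms(values):
    """Map each attribute to the linguistic term with its largest fuzzy value."""
    return {a: TERMS[a][max(range(len(v)), key=v.__getitem__)] for a, v in values.items()}

def classify(values, rules):
    terms = dominant_terms(values)
    for conditions, outcome in rules:
        if all(terms[a] == t for a, t in conditions.items()):
            return outcome
    return None  # no rule in this partial rule set applies

print(dominant_terms(establishment))
print(classify(establishment, RULES))  # ('L', 0.61)
```

Because the dominant term of %prt is L (membership 1), the Figure 6 rule fires and predicts %pay = L with degree of truth 61.0%, agreeing with the establishment's own dominant %pay term.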

6 Conclusions

This paper has illustrated the use of a fuzzy decision tree approach to identifying establishments that pay low wages. Through the use of Parzen windows and FUSINTER, the required membership functions are intelligently constructed, so that expert opinion is not required in many parts of the analysis.

The results of the fuzzy decision tree are fuzzy classification rules, each with an associated degree of truth in its classification. These rules are relatively simple to read and apply, i.e. a person may calculate the specific fuzzy values from crisp data or simply use the low (L), medium (M) and high (H) labels as linguistic terms. This removes the need for any further analysis, apart from the linguistic judgements of personnel.

References

Bandemer, H. and Gottwald, S. (1995). Fuzzy Sets, Fuzzy Logic, Fuzzy Methods. Wiley, New York.

Cios, K. J. and Sztandera, L. M. (1992). Continuous ID3 algorithm with fuzzy entropy measure. Proceedings IEEE International Conference on Fuzzy Systems, San Diego, CA, 469-476.

Duda, R. O. and Hart, P. E. (1973). Pattern Classification and Scene Analysis. Wiley, New York.

Higashi, M. and Klir, G. J. (1983). Measure of uncertainty and information based on possibility distributions. International Journal of General Systems, 9: 43-58.

Hong, T-P. and Chen, J-B. (1999). Finding relevant attributes and membership functions. Fuzzy Sets and Systems, 103: 389-404.

Hu, C-F. and Fang, S-C. (1998). Solving fuzzy inequalities with concave membership functions. Fuzzy Sets and Systems, 99: 233-240.

Hunt, E. B. (1962). Concept Learning: An Information Processing Problem. Wiley, New York.

Kahraman, C., Tolga, E. and Ulukan, Z. (2000). Justification of manufacturing technologies using fuzzy benefit/cost ratio analysis. International Journal of Production Economics, 66: 45-52.

Kosko, B. (1986). Fuzzy entropy and conditioning. Information Sciences, 30: 165-174.

Low Pay Commission (2000). The National Minimum Wage: The Story So Far: Second Report of the Low Pay Commission. Cm 4571, HMSO, London.

McNabb, R. and Whitfield, K. (2000). Worth So Appallingly Little: A Workplace-Level Analysis of Low Pay. British Journal of Industrial Relations, 38(4): 585-609.

Medasani, S., Kim, J. and Krishnapuram, R. (1998). An overview of membership function generation techniques for pattern recognition. International Journal of Approximate Reasoning, 19: 391-417.

Parzen, E. (1962). On estimation of a probability density function and mode. Annals of Mathematical Statistics, 33: 1065-1076.

Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1): 81-106.

Roa-Sepulveda, C. A. and Herrera, M. (2000). A solution to the economic dispatch problem using decision trees. Electric Power Systems Research, 56: 255-259.

Sancho-Royo, A. and Verdegay, J. L. (1999). Methods for the construction of membership functions. International Journal of Intelligent Systems, 14: 1213-1230.

Tarrazo, M. and Gutierrez, L. (2000). Economic expectation, fuzzy sets and financial planning. European Journal of Operational Research, 126: 89-105.

Thompson, J. R. and Tapia, R. A. (1990). Nonparametric Function Estimation, Modeling, and Simulation. Society for Industrial and Applied Mathematics, Philadelphia.

Wang, X., Chen, B., Qian, G. and Ye, F. (2000). On the optimization of fuzzy decision trees. Fuzzy Sets and Systems, 112: 117-125.

Weber, R. (1992). Fuzzy-ID3: a class of methods for automatic knowledge acquisition. Proceedings of the 2nd International Conference on Fuzzy Logic and Neural Networks, Iizuka, Japan, 265-268.

Yuan, Y. and Shaw, M. J. (1995). Induction of fuzzy decision trees. Fuzzy Sets and Systems, 125-139.

Zadeh, L. A. (1978). Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets and Systems, 1: 3-28.

Zighed, D. A., Rabaseda, S. and Rakotomala, R. (1998). FUSINTER: A method for discretisation of continuous attributes. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 6(3): 307-326.
