fuzzy trees

Upload: raman-pradhan

Post on 10-Apr-2018


  • 8/8/2019 Fuzzy Trees


    The use of Fuzzy Decision Tree Analysis in Monitoring a Minimum Wage

    Malcolm Beynon

    and

    Keith Whitfield

    Cardiff Business School, Cardiff University, Wales, UK.

    Address for correspondence: Dr Malcolm Beynon,

    Cardiff Business School,

    Colum Drive, Cardiff, CF10 3EU,

    Wales, U.K.

    Telephone: +44 (0)29 2087 5747,

    Fax +44 (0)29 2087 4419

    E-mail: [email protected]



    Abstract

    Effective monitoring of a minimum wage requires that establishments potentially paying low
    wages are effectively identified. This paper investigates the identification of establishments
    paying low wages prior to the introduction of the British National Minimum Wage in 1999,

    through the utilization of fuzzy decision trees. Incorporating a fuzzy aspect within this

    problem (using membership functions) enables the judgements to be made with linguistic

    scales. An intelligent technique for constructing the required membership functions is

    introduced, which greatly reduces the necessity of any expert opinion within their

    construction. The Parzen windows method of estimating a probability distribution and the

    FUSINTER method of continuous variable discretisation are incorporated in this technique.

    An illustration of the utilization of the constructed fuzzy if-then rules is included.

    JEL Classification. C14, C15, C44, J31

    Keywords. FUSINTER, Fuzzy decision trees, Labour economics, Low pay, Membership

    functions, Parzen windows.

    1 Introduction

    In April 1999, the UK government introduced a National Minimum Wage (NMW) of £3.60

    per hour for workers over the age of 21. Enforcing such a regulation is a major task. The

    method chosen was targeted monitoring, whereby workplaces are investigated according to

    their probability of employing workers on low pay. Such a procedure needs to be based on an

    appropriate model for identifying potentially low-paying workplaces. Fuzzy decision tree
    analysis seems highly appropriate for such a task.

    Inductive decision trees were first introduced in 1963 with the Concept Learning System
    framework (Hunt, 1962). Since then they have continued to be developed and applied. The

    structure of a decision tree starts with a root decision node, from which all branches

    originate. A branch is a series of nodes where decisions are made at each node enabling

    progression through (down) the tree. A progression stops at a leaf node, where a decision

    classification is given, based on the rule associated with the full branch from the root node to

    the individual leaf node.

    As with many data analysis techniques (e.g., traditional regression models), decision trees

    have been developed within a fuzzy environment. For example, the well known decision tree

    method ID3 (Quinlan, 1986) was developed to include fuzzy entropy measures (see Cios and

    Sztandera (1992) and Weber (1992)). The fuzzy decision tree method used in this paper was

    introduced by Yuan and Shaw (1995), to take account of cognitive uncertainty, i.e. vagueness

    and ambiguity. One reason for the utilization of fuzzy set theory is its simplicity and

    similarity to human reasoning (Hong and Chen 1999). This similarity includes the use of

    linguistic terms through the utilization of certain membership functions.

    The membership function converts crisp numerical values into levels over a set of linguistic
    terms. Central to any method within a fuzzy environment is the defining of the required
    membership functions. This area has itself been the subject of research studies (see Hong and


    Chen (1999), Sancho-Royo and Verdegay (1999) and Kahraman et al. (2000)), with many

    studies using opinions of experts to construct the necessary functions, e.g. see Tarrazo and

    Gutierrez (2000). In this paper an intelligent technique for constructing the membership

    functions is introduced which takes into account the information of the individual continuous

    values in the original data used to construct the fuzzy decision tree.

    The main aim of this paper is to illustrate how fuzzy decision tree analysis can be used to

    help monitor a minimum wage. It uses data derived from a survey of British workplaces

    (WERS98) which was undertaken just before the introduction of the NMW and which

    contains information on low pay.

    The rest of the paper is structured as follows. In section 2 a description of the problem

    considered is given. In section 3 an intelligent method of membership function construction

    is introduced. In section 4 a brief description of the fuzzy decision tree method is given. In

    section 5 the construction of the fuzzy decision tree for this problem is exposited.

    2 Problem description and data set

    In this paper the proportion of employees paid less than £3.50 per hour is defined as the
    decision attribute %pay.¹ In WERS98, over two thirds of establishments reported a zero level
    of low-paid employees. In the study of McNabb and Whitfield (2000) certain intervals
    (classes) of %pay values were considered. Similarly here, three classes are used to offer an
    initial partitioning of this decision attribute %pay. These are: zero percent (zero - Z),
    between 0 and 10 percent (low - L) and above 10 percent (high - H).
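As a small illustration, the three-class partition of %pay can be written as a function. This is only a sketch: the function name is hypothetical, values are assumed to be proportions in [0, 1], and placing a value of exactly 10 percent in class L is an assumption, since the text does not say which class it belongs to.

```python
def pay_class(pct_low_paid: float) -> str:
    """Map the proportion of low-paid employees (0..1) to the crisp class Z, L or H."""
    if not 0.0 <= pct_low_paid <= 1.0:
        raise ValueError("proportion must lie in [0, 1]")
    if pct_low_paid == 0.0:
        return "Z"  # zero percent of employees on low pay
    # Assumption: exactly 10 percent is treated as low (L).
    return "L" if pct_low_paid <= 0.10 else "H"
```

These crisp classes are only the provisional partition; the fuzzy treatment of %pay replaces the hard boundaries with overlapping membership functions.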

    Since a full analysis of this problem is not the basis of this paper, a subset of the whole data

    set is used, i.e. details on 100 establishments are used to enable the construction of the fuzzy

    decision tree (see later). Furthermore, only a subset of the condition attributes
    (characteristics) of the establishments is used. That is, six condition attributes are used
    here; see Table 1 for their introduction and description.

    Attribute   Description
    age         Age of the organisation (years)
    emps        Number of employees in establishment
    %yng        Percentage of employees < 20 years old
    %old        Percentage of employees > 51 years old
    %fem        Percentage of female employees
    %prt        Percentage of part-time employees

    Table 1: Description of condition attributes.

    For a full description of the data (condition and decision attributes) the reader is directed to

    the study by McNabb and Whitfield (2000).

    ¹ With a one-year difference between the WERS98 data and the NMW in 1999, the level of £3.50 takes
    inflation into account, i.e. relative to the £3.60 level.

    Currently, a model used by the Inland Revenue is aimed at identifying those sectors of
    geographical areas where non-compliance is likely to be most prevalent (Low Pay
    Commission, 2000). The ability to successfully identify (predict) those establishments with a

    high percentage of low-paid employees is an important factor. That is, the limited resources
    available (to the Inland Revenue) for inspecting establishments require efficient ways to target those

    establishments more likely to pay low wages. This efficiency includes using external

    characteristics of the establishment which are quick (free) to acquire. Many of these

    characteristics will be approximations (e.g. percentage of young or female employees). One
    further factor is that WERS98 includes data about low pay based on answers from the managers
    most responsible for personnel matters, i.e. their answers may not be accurate facts but more
    an immediate judgement. Consequently, a fuzzy approach would go some way towards
    addressing these issues and, within a decision tree setting, the resultant (readable) rules do not

    require particular expertise in specific analysis techniques.

    3 Construction of membership functions

    As described in section 1, certain membership functions are used to convert a crisp numerical

    value into levels over a set of linguistic terms. In this section an intelligent technique is
    introduced for constructing the required membership functions used in the subsequent fuzzy
    decision tree method. This intelligent technique is made up of three parts, namely:

    a) Discretisation of the data set to provisionally intervalise the values of the continuous
       condition attributes.

    b) Construction of estimated distributions to offer a functional form for the spread of the
       values in an identified interval.

    c) Definition of a membership function from the constructed estimated distribution.

    Each of these parts will be described here, using the NMW problem and data set
    described in section 2.

    3.1 Discretisation of Data Set

    This section is concerned with Continuous Variable Discretisation (CVD). Research within

    CVD has suggested several alternatives, based on whether the discretisation is supervised

    (utilise the decision class) or unsupervised (consider only the group of continuous (condition

    attribute) variables in question). CVD can further be separated into whether they are local

    methods, i.e. operate on a single variable at a time, or global methods when they discretise a

    group of objects at the same time.

    In this paper the supervised CVD method FUSINTER is used (Zighed et al., 1998). One reason for
    using FUSINTER is that it derives the appropriate number of intervals from the
    distribution of the data, hence removing the need for an expert opinion here. FUSINTER is a
    bottom-up algorithm (merging sub-intervals rather than introducing new interval boundary
    values) whose objective is to partition a condition attribute subject to the optimising of a
    certain entropy measure. The method partitions only one attribute at a time. One advantage of
    this method is its ability to avoid very thin partitioning, i.e. intervals which include a very
    small number of objects.²

    ² For a detailed discussion of the FUSINTER algorithm see Zighed et al. (1998). In this paper the quadratic
    entropy method is used, including the default values α = 0.975 and λ = 1.


    Since FUSINTER is a supervised technique, the actual value of the decision attribute

    (%pay), is employed to enable the discretisation of each of the six continuous condition

    attributes (given in Table 1) to take place, see Table 2. Here, as with the decision attribute, it

    is a provisional discretisation aiming to intelligently group the condition attributes before

    further analysis. The decision classes (Z, L and H) defined in section 2 for %pay are used to
    provisionally discretise the six condition attributes.

    Attribute   Interval 1        Interval 2            Interval 3
    age         [0, 7.5), 15      [7.5, 19.0), 41       [19.0, ∞), 44
    emps        [0, 30.5), 21     [30.5, 98.5), 35      [98.5, ∞), 44
    %yng        [0, 0.035), 46    [0.035, 0.120), 24    [0.120, 1], 30
    %old        [0, 0.045), 13    [0.045, 0.135), 44    [0.135, 1], 43
    %fem        [0, 0.455), 35    [0.455, 0.575), 15    [0.575, 1], 50
    %prt        [0, 0.325), 53    [0.325, 1], 47

    Table 2: Intervals from FUSINTER discretisation.

    From Table 2, the six condition attributes are each partitioned into 2 or 3
    intervals. Also given is the number of objects in each interval (the value following each
    interval), which clearly shows the avoidance of particularly small intervals (i.e. thin partitioning).
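To make the use of Table 2 concrete, the following sketch looks up which provisional interval a crisp value falls in. It does not implement FUSINTER itself; it only applies the boundary values reported in Table 2. The names CUTS and interval_index are hypothetical, and the percentage attributes are assumed to be expressed as proportions, as in the table.

```python
from bisect import bisect_right

# Interior boundary values taken from Table 2 (intervals are closed on the left).
CUTS = {
    "age": [7.5, 19.0],
    "emps": [30.5, 98.5],
    "%yng": [0.035, 0.120],
    "%old": [0.045, 0.135],
    "%fem": [0.455, 0.575],
    "%prt": [0.325],
}


def interval_index(attribute: str, value: float) -> int:
    """Return the 1-based Table 2 interval that contains value."""
    # bisect_right counts the boundaries <= value, so a value equal to a
    # boundary is placed in the interval that starts at that boundary.
    return bisect_right(CUTS[attribute], value) + 1
```

For example, an organisation aged 15 years falls in interval 2 of age, and a part-time proportion of 0.01 falls in interval 1 of %prt.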

    3.2 Construction of estimated distributions

    The method of Parzen windows (Parzen, 1962) constructs a probability density function (pdf)
    based on the values in the domain of the interval. In its general form (assuming each value $x_i$
    is represented by a zero mean, unit variance, univariate density function, see Thompson and
    Tapia (1990)), the estimated pdf is given by:

    $$\mathrm{pdf}(x) = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{h_m} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{x - x_i}{h_m}\right)^2\right),$$

    where $m$ is the number of values in the interval and $h_m$ is the window width. Duda and Hart
    (1973, p. 89) consider the problem of constructing $h_m$. They give $h_m = h_1/\sqrt{m}$, where $h_1$ is a
    parameter to define. In this paper $h_1$ is the range of the individual values in the interval under
    consideration. Defining $I_j$ to be the $j$th interval, then $h_1 = \max(I_j) - \min(I_j)$, where $\min(I_j)$
    and $\max(I_j)$ signify the smallest and largest of the values in the $j$th interval respectively.
    Hence, the associated pdf (i.e. $\mathrm{pdf}_j(x)$) for the $j$th interval is given by:

    $$\mathrm{pdf}_j(x) = \frac{1}{\sqrt{2\pi m_j}\,\bigl(\max(I_j) - \min(I_j)\bigr)} \sum_{i=1}^{m_j} \exp\left(-\frac{1}{2}\left(\frac{\sqrt{m_j}\,(x - x_i)}{\max(I_j) - \min(I_j)}\right)^2\right) \qquad (1)$$

    where $m_j$ is the number of values in $I_j$. The $\mathrm{pdf}_j(x)$ function is the mean of the univariate
    density functions centred at each of the values in the $j$th interval.
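The estimate in equation (1) can be sketched in a few lines of Python. This is an illustration rather than the authors' code: the function name parzen_pdf and the sample values are hypothetical, and the optional h1 argument is an assumption added so that a width can be supplied when all values in an interval coincide.

```python
import math
from typing import Optional, Sequence


def parzen_pdf(x: float, values: Sequence[float], h1: Optional[float] = None) -> float:
    """Evaluate the estimated pdf of one interval at x, following equation (1)."""
    m = len(values)
    if m == 0:
        raise ValueError("the interval must contain at least one value")
    if h1 is None:
        h1 = max(values) - min(values)  # range of the values in the interval
    if h1 <= 0:
        raise ValueError("h1 must be positive; pass one explicitly for a zero-width interval")
    h = h1 / math.sqrt(m)  # window width h_m = h_1 / sqrt(m)
    scale = 1.0 / (m * h * math.sqrt(2.0 * math.pi))
    # Mean of Gaussian kernels of width h centred on each value.
    return scale * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in values)


# Hypothetical values for one interval (not taken from WERS98).
sample = [0.04, 0.05, 0.07, 0.10, 0.11]
step = 0.0005
# Riemann sum over [-1, 1]; a density should integrate to approximately 1.
area = sum(parzen_pdf(k * step, sample) for k in range(-2000, 2001)) * step
```

The numerical area is a quick sanity check that the normalising constant is right. Note that the estimate has support on the whole real line, so any restriction to a feasible domain has to be applied separately.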


    Using the original data values of the condition attributes and the intervals defined in Table 2,

    the associated estimated distributions (i.e., pdfs) can be constructed, see Figure 1.

    [Figure 1 comprises six panels, one per condition attribute (age, emps, %yng, %old, %fem,
    %prt), each showing the estimated pdfs of that attribute's intervals, labelled 1, 2 and 3.]

    Figure 1: Estimated distributions of condition attributes.

    In Figure 1, each set of estimated distributions is shown over the domain of the intervals
    given in Table 2. It is noted that the constructed pdf_j(x) functions have a domain over
    (−∞, ∞), but here a check is made on the feasible domain for each attribute, e.g. %yng is a
    percentage and hence has a feasible domain [0, 100], given as a proportion with [0, 1] domain in
    Figure 1. The labels 1, 2 and 3 identify the estimated distributions with the intervals given
    in Table 2.

    A similar set of estimated distributions can be constructed for the decision attribute (%pay),

    as shown in Figure 2.

    [Figure 2 shows the three estimated pdfs of %pay, labelled 1, 2 and 3, over the domain [0, 1].]

    Figure 2: Estimated distributions of decision attribute.

    In Figure 2, the three associated pdfs are shown. Of special note is the pdf with label 1,
    relating to the %pay = Z class. While it represents those establishments with a zero
    percentage of low-paid workers, it would have zero interval width, so equation (1) cannot
    be used. In this case an interval width h1 = 0.05 is used, enabling a pdf to be constructed.
    The reasoning for this is that allowing a pdf to exist for a relatively crisp value includes a
    level of fuzziness. That is, within a workplace the manager answering the questions may
    answer with a zero level of low pay while aware of a very small proportion existing.


    3.3 Definition of the membership functions

    This section is concerned with the construction of the required membership functions. Within

    related studies, a number of different types of membership functions have been investigated.

    These include triangular and trapezoidal functions, and whether they should be
    linear or non-linear and possibly piecewise (see Hu and Fang (1998), Medasani et al. (1998) and
    Roa-Sepulveda and Herrera (2000)). Here, linear trapezoidal membership functions are
    utilised. For each interval, i.e. membership function, the general functional form is a
    trapezoid fixed by four defining values, which are set from the estimated distribution of the
    interval through two parameters, written here as z=1 and z>0.

    Using the estimated distributions given in section 3.2, and with the z=1 value set to 0.1 and
    the z>0 value set to 0.97, the defining values for each membership function can be found. The
    z>0 value of 0.97 implies that the associated membership function has a value greater than
    zero for the central 97% area of the pdf for this interval, hence possibly removing the
    influence of any particular outliers in the data. If comparing to a possibility distribution, the
    z=1 and z>0 values define the necessity and possibility measures for the membership functions
    (Bandemer and Gottwald, 1995). These defining values enable the membership functions to be
    constructed, as given in Figure 3.
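One way to read the construction above is that the four defining values are quantiles of the estimated pdf. The sketch below follows that reading under stated assumptions: the text confirms that the outer pair bounds the central 97% of the area, while treating the inner pair as bounding the central 10% (the z=1 value of 0.1) is an assumption made by analogy. All function names and the sample values are hypothetical, and no truncation to a feasible domain is applied.

```python
import math
from typing import List, Sequence


def mixture_cdf(x: float, values: Sequence[float], h: float) -> float:
    """CDF of the Parzen estimate: the mean of Gaussian CDFs centred on each value."""
    root2 = math.sqrt(2.0)
    return sum(0.5 * (1.0 + math.erf((x - xi) / (h * root2))) for xi in values) / len(values)


def mixture_quantile(p: float, values: Sequence[float], h: float) -> float:
    """Find x with CDF(x) = p by bisection (the CDF is strictly increasing)."""
    lo, hi = min(values) - 10.0 * h, max(values) + 10.0 * h
    for _ in range(200):  # fixed iteration count avoids any stalling on tolerance
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid, values, h) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def defining_values(values: Sequence[float], z_one: float = 0.1, z_pos: float = 0.97) -> List[float]:
    """Four trapezoid defining values from the central z_pos and z_one areas of the pdf."""
    h = (max(values) - min(values)) / math.sqrt(len(values))  # h_m = h_1 / sqrt(m)
    probs = [(1 - z_pos) / 2, (1 - z_one) / 2, (1 + z_one) / 2, (1 + z_pos) / 2]
    return [mixture_quantile(p, values, h) for p in probs]


sample = [1.0, 2.0, 3.0]  # hypothetical interval values
a1, a2, a3, a4 = defining_values(sample)
```

With these settings the membership function is positive between a1 and a4 and equal to 1 between a2 and a3, so widening z_pos extends the support while widening z_one flattens the top of the trapezoid.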

    [Figure 3 comprises six panels, one per condition attribute, each plotting membership (0 to 1)
    against the attribute value. The breakpoints marked in the panels are:]

    Attribute   L (1 up to, 0 from)   M (defining values)              H (0 up to, 1 from)
    age         4.62, 9.81            [5.71, 10.27, 10.99, 17.87]      2.93, 79.98
    emps        22.78, 35.45          [17.58, 48.78, 54.93, 104.88]    14.06, 360.94
    %yng        0.017, 0.037          [0.019, 0.058, 0.065, 0.127]     0.020, 0.262
    %old        0.025, 0.057          [0.032, 0.081, 0.090, 0.141]     0.027, 0.260
    %fem        0.227, 0.523          [0.442, 0.519, 0.529, 0.604]     0.500, 0.700
    %prt        0.127, 0.348          (no M term)                      0.215, 0.592

    Figure 3: Sets of membership functions for condition attributes.

    From Figure 3, the membership functions are shown, e.g. for the 2nd interval of the %yng
    attribute the defining values are [0.019, 0.058, 0.065, 0.127]; this membership function is
    labelled M, representing the linguistic term medium. Further labels are also given to its
    neighbouring intervals in Figure 3, i.e. L - low and H - high. In summary, for the %yng
    attribute a linguistic scale of low, medium and high has been constructed, with the only
    requirement from an expert being the choice of the z=1 and z>0 values. This follows
    also for age, emps, %old and %fem, with the attribute %prt having linguistic scales L - low
    and H - high only.

    A similar set of fuzzy membership functions can be constructed for the decision attribute

    using the estimated distribution given in Figure 2, see Figure 4.


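    [Figure 4 plots the membership functions Z, L and H of %pay on a vertical axis from 0 to 1;
    the breakpoints marked are 0.002, 0.006, 0.019, 0.032, 0.039, 0.102 and 0.352.]

    Figure 4: Membership functions for decision attribute.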

    In Figure 4, the membership functions for the decision attribute are given. In this case the

    associated linguistic terms are Z - zero, L - low and H - high.

    To further illustrate the construction of the fuzzy numbers from the original data, the details

    of an establishment are given in Table 3 along with the subsequent fuzzy values.

    Attribute   Crisp value   Fuzzy value
    age         15            [0, 0.417, 0.157]
    emps        91            [0, 0.278, 0.222]
    %yng        0.05          [0, 0.793, 0.126]
    %old        0.16          [0, 0, 0.571]
    %fem        0.12          [1, 0, 0]
    %prt        0.01          [1, 0]
    %pay        0.02          [0, 0.599, 0.007]

    Table 3: Original and Fuzzy attribute values.

    Using the membership functions previously defined, the resultant fuzzified values given in
    Table 3 can also be written:

    {0, 0.417, 0.157; 0, 0.278, 0.222; 0, 0.793, 0.126; 0, 0, 0.571; 1, 0, 0; 1, 0; 0, 0.599, 0.007}

    where the semi-colons separate the sets of fuzzy values for each attribute (condition and
    decision attributes included).⁴
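The fuzzification of the Table 3 establishment can be checked with a short sketch. The breakpoints below are read from the tick labels of Figure 3; which label belongs to which linguistic term is an assumption, although the results agree closely with Table 3. The names trapezoid, MEMBERSHIP and fuzzify are hypothetical, and the end terms (L and H) are modelled as shoulders with infinite plateaus.

```python
import math

INF = math.inf


def trapezoid(x: float, a: float, b: float, c: float, d: float) -> float:
    """Linear trapezoid: 0 up to a, rising to 1 at b, 1 until c, falling to 0 at d."""
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


# (a, b, c, d) per linguistic term, in the order L, M, H (L, H for %prt).
MEMBERSHIP = {
    "age": [(-INF, -INF, 4.62, 9.81), (5.71, 10.27, 10.99, 17.87), (2.93, 79.98, INF, INF)],
    "emps": [(-INF, -INF, 22.78, 35.45), (17.58, 48.78, 54.93, 104.88), (14.06, 360.94, INF, INF)],
    "%yng": [(-INF, -INF, 0.017, 0.037), (0.019, 0.058, 0.065, 0.127), (0.020, 0.262, INF, INF)],
    "%old": [(-INF, -INF, 0.025, 0.057), (0.032, 0.081, 0.090, 0.141), (0.027, 0.260, INF, INF)],
    "%fem": [(-INF, -INF, 0.227, 0.523), (0.442, 0.519, 0.529, 0.604), (0.500, 0.700, INF, INF)],
    "%prt": [(-INF, -INF, 0.127, 0.348), (0.215, 0.592, INF, INF)],
}

# Crisp condition attribute values of the establishment in Table 3.
ESTABLISHMENT = {"age": 15, "emps": 91, "%yng": 0.05, "%old": 0.16, "%fem": 0.12, "%prt": 0.01}


def fuzzify(attribute: str, value: float) -> list:
    """Return the membership of value in each linguistic term of the attribute."""
    return [round(trapezoid(value, *points), 3) for points in MEMBERSHIP[attribute]]


fuzzy = {name: fuzzify(name, value) for name, value in ESTABLISHMENT.items()}
```

Under these assumptions the values for age, emps, %old, %fem and %prt match Table 3 to three decimal places, while %yng gives 0.795 and 0.124 against the tabulated 0.793 and 0.126, a difference that is presumably due to the rounding of the breakpoints shown in the figure.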

    4 Summary of fuzzy decision tree method

    In this section a brief description is given of the functions used in the fuzzy decision tree
    method introduced by Yuan and Shaw (1995). A fuzzy set $A$ in a universe of discourse
    $U$ is characterized by a membership function $\mu_A$ which takes values in the interval [0, 1]. For
    all $u \in U$, the intersection $A \cap B$ of two fuzzy sets is given by
    $\mu_{A \cap B}(u) = \min(\mu_A(u), \mu_B(u))$.

    A membership function $\mu(x)$ of a fuzzy variable $Y$ defined on $X$ can be viewed as a
    possibility distribution of $Y$ on $X$, i.e. $\pi(x) = \mu(x)$, for all $x \in X$. The possibilistic
    measure $E_\alpha(Y)$ of ambiguity is defined as:

    ⁴ The same method of illustrating fuzzy values is used in Wang et al. (2000).


    $$E_\alpha(Y) = g(\pi) = \sum_{i=1}^{n} (\pi^*_i - \pi^*_{i+1}) \ln[i],$$

    where $\pi^* = \{\pi^*_1, \pi^*_2, \ldots, \pi^*_n\}$ is the permutation of the possibility distribution
    $\pi = \{\pi(x_1), \pi(x_2), \ldots, \pi(x_n)\}$,⁵ sorted so that $\pi^*_i \ge \pi^*_{i+1}$ for $i = 1, \ldots, n$, and
    $\pi^*_{n+1} = 0$, see Zadeh (1978) and Higashi and Klir (1983). The ambiguity of attribute $A$ is then:

    $$E_\alpha(A) = \frac{1}{m} \sum_{i=1}^{m} E_\alpha(A(u_i)),$$

    where $E_\alpha(A(u_i)) = g\left(\mu_{T_s}(u_i) / \max_{1 \le j \le s} \mu_{T_j}(u_i)\right)$, with $T_j$ the linguistic scales used within an
    attribute for $m$ cases. Ambiguity exists when there is overlapping between linguistic terms of
    an attribute or between classes.
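The ambiguity measure g can be sketched as follows. The function names are hypothetical; the normalisation by the largest value follows the footnote on the possibility distribution, and the natural logarithm is used as in the formula.

```python
import math
from typing import Sequence


def ambiguity(possibilities: Sequence[float]) -> float:
    """g(pi): sum of (pi*_i - pi*_{i+1}) * ln(i) over the sorted, normalised values."""
    top = max(possibilities)
    if top <= 0.0:
        raise ValueError("at least one possibility must be positive")
    # Normalise by the largest value, then sort in descending order.
    ordered = sorted((p / top for p in possibilities), reverse=True)
    ordered.append(0.0)  # pi*_{n+1} = 0
    # i in the formula is 1-based, hence ln(i + 1) for the 0-based index i.
    return sum((ordered[i] - ordered[i + 1]) * math.log(i + 1) for i in range(len(ordered) - 1))


def attribute_ambiguity(rows: Sequence[Sequence[float]]) -> float:
    """Mean ambiguity over the cases, each row holding one case's memberships."""
    return sum(ambiguity(row) for row in rows) / len(rows)
```

A crisp value such as [1, 0, 0] has ambiguity 0, a completely undecided value such as [1, 1, 1] has ambiguity ln 3, and the fuzzy age value [0, 0.417, 0.157] from Table 3 gives (0.157/0.417) ln 2, about 0.261.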

    The fuzzy subsethood $S(A, B)$ measures the degree to which $A$ is a subset of $B$ (see Kosko,
    1986) and is given by:⁶

    $$S(A, B) = \frac{\sum_{u \in U} \min(\mu_A(u), \mu_B(u))}{\sum_{u \in U} \mu_A(u)}.$$

    Given fuzzy evidence $E$, the possibility of classifying an object to class $C_i$ can be defined as:

    $$\pi(C_i \mid E) = \frac{S(E, C_i)}{\max_j S(E, C_j)},$$

    where $S(E, C_i)$ represents the degree of truth for the classification rule, i.e. if $E$ then $C_i$.
    Knowing a single piece of evidence (i.e., a fuzzy value from an attribute), the classification
    ambiguity based on this fuzzy evidence is defined as:

    $$G(E) = g(\pi(C \mid E)).$$

    The classification ambiguity with fuzzy partitioning $P = \{E_1, \ldots, E_k\}$ on the fuzzy evidence
    $F$, denoted as $G(P \mid F)$, is the weighted average of classification ambiguity with each subset
    of partition:

    $$G(P \mid F) = \sum_{i=1}^{k} w(E_i \mid F)\, G(E_i \cap F),$$

    where $G(E_i \cap F)$ is the classification ambiguity with fuzzy evidence $E_i \cap F$, and $w(E_i \mid F)$ is the
    weight which represents the relative size of subset $E_i \cap F$ in $F$:

    $$w(E_i \mid F) = \frac{\sum_{u \in U} \min(\mu_{E_i}(u), \mu_F(u))}{\sum_{j=1}^{k} \sum_{u \in U} \min(\mu_{E_j}(u), \mu_F(u))}.$$

    ⁵ That is, the values $\{\pi(x_1), \pi(x_2), \ldots, \pi(x_n)\}$ are normalised based on the largest value.

    ⁶ To calculate $S(A, B)$, $A$ and $B$ should be defined on the same universe of discourse. In this case all
    attributes are over the same set of objects (workplaces).

    The fuzzy decision tree method considered here utilizes these functions. In summary,
    attributes are assigned to nodes based on the lowest level of ambiguity. A node becomes a
    leaf node if the level of subsethood (based on the conjunction (intersection) of the branches
    from the root) is higher than some truth value assigned to the whole of the decision tree.
    The classification from the leaf node is to the decision class with the largest subsethood
    value. For a full description of this method see Yuan and Shaw (1995) and Wang et al.
    (2000).
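The functions summarised in this section can be sketched together as below. This is an illustrative reading of the definitions, not the implementation of Yuan and Shaw (1995): all names are hypothetical, each fuzzy set is assumed to be stored as a list of membership values over the same ordered set of objects, and the toy data are invented for the example.

```python
import math
from typing import List, Optional, Sequence

FuzzySet = Sequence[float]  # membership of each object (workplace) in one fuzzy set


def ambiguity(possibilities: Sequence[float]) -> float:
    """g(pi) on the values normalised by their maximum and sorted in descending order."""
    top = max(possibilities)
    if top <= 0.0:
        raise ValueError("at least one possibility must be positive")
    ordered = sorted((p / top for p in possibilities), reverse=True) + [0.0]
    return sum((ordered[i] - ordered[i + 1]) * math.log(i + 1) for i in range(len(ordered) - 1))


def intersect(a: FuzzySet, b: FuzzySet) -> List[float]:
    """Fuzzy intersection: pointwise minimum."""
    return [min(x, y) for x, y in zip(a, b)]


def subsethood(a: FuzzySet, b: FuzzySet) -> float:
    """S(A, B): degree to which A is a subset of B."""
    return sum(intersect(a, b)) / sum(a)


def classification_ambiguity(evidence: FuzzySet, classes: Sequence[FuzzySet]) -> float:
    """G(E): ambiguity of the possibilities pi(C_i | E)."""
    # Dividing each S(E, C_i) by the largest one happens inside ambiguity().
    return ambiguity([subsethood(evidence, c) for c in classes])


def partition_ambiguity(partition: Sequence[FuzzySet], classes: Sequence[FuzzySet],
                        given: Optional[FuzzySet] = None) -> float:
    """G(P | F): weighted average of G(E_i intersect F) over the terms E_i of an attribute."""
    if given is None:
        given = [1.0] * len(partition[0])  # no prior evidence: F is the whole universe
    joint = [intersect(term, given) for term in partition]
    total = sum(sum(j) for j in joint)
    return sum((sum(j) / total) * classification_ambiguity(j, classes) for j in joint if sum(j) > 0)


# Invented toy data: four objects, two decision classes, two candidate attributes.
classes = [[1.0, 0.8, 0.1, 0.0], [0.0, 0.2, 0.9, 1.0]]
sharp = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
vague = [[0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.5]]
g_sharp = partition_ambiguity(sharp, classes)
g_vague = partition_ambiguity(vague, classes)
```

On the toy data the attribute whose terms separate the classes (sharp) has the lower classification ambiguity, so it would be preferred when choosing the attribute for a node.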

    5 Fuzzy decision tree construction

    Using the definitions given in section 4, in this section the fuzzy decision tree method is
    illustrated, using the fuzzy values for the low pay problem described in section 2. A truth
    level of 0.6 is used throughout. The final fuzzy decision tree is given in Figure 5 and can
    be used as reference while its construction is described below.

    To find the root node attribute, the classification ambiguity values are found for each
    attribute: G(age) = 0.6607, G(emps) = 0.4568, G(%yng) = 0.4195, G(%old) = 0.7244, G(%fem) =
    0.5189 and G(%prt) = 0.4546. Since G(%yng) is the lowest of these values, %yng is chosen as
    the root node attribute. The subsethood of each of the branches from %yng to the classes of
    the decision attribute (%pay) is then calculated. For the branch (%yng = L) the values are S(%yng =
    L, %pay = Z) = 0.8666, S(%yng = L, %pay = L) = 0.0569 and S(%yng = L, %pay = H) =
    0.0834. The largest of these values (0.8666) is above the required truth level (0.6), hence
    this branch ends in a leaf node from which a rule can be constructed.
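The two decisions described above, choosing the root attribute and testing whether a branch ends in a leaf, can be reproduced directly from the reported figures. The function names are hypothetical; the numbers are those quoted in the text.

```python
from typing import Dict, Optional, Tuple

TRUTH_LEVEL = 0.6

# Classification ambiguity of each condition attribute, as reported in the text.
G_ROOT = {"age": 0.6607, "emps": 0.4568, "%yng": 0.4195,
          "%old": 0.7244, "%fem": 0.5189, "%prt": 0.4546}

# Subsethood of the branch (%yng = L) in each class of %pay, as reported in the text.
S_YNG_L = {"Z": 0.8666, "L": 0.0569, "H": 0.0834}


def choose_attribute(ambiguities: Dict[str, float]) -> str:
    """Pick the attribute with the lowest classification ambiguity."""
    return min(ambiguities, key=ambiguities.get)


def leaf_class(subsethoods: Dict[str, float],
               truth_level: float = TRUTH_LEVEL) -> Optional[Tuple[str, float]]:
    """Return (class, degree of truth) if the branch is a leaf, otherwise None."""
    label = max(subsethoods, key=subsethoods.get)
    value = subsethoods[label]
    return (label, value) if value > truth_level else None


root = choose_attribute(G_ROOT)
leaf = leaf_class(S_YNG_L)
```

Here root is %yng and the branch (%yng = L) becomes a leaf classifying to %pay = Z with degree of truth 0.8666, which appears as 86.7% in the final tree.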

    Similar considerations are given to the branches (%yng = M) and (%yng = H). In these cases
    the largest subsethood values are S(%yng = M, %pay = L) = 0.4246 and S(%yng = H, %pay
    = H) = 0.5431 respectively. Since both of these values are less than the acceptable truth
    level, these branches require further partitioning, with different attributes needing to be
    considered. For the (%yng = M) branch we first calculate the classification ambiguity
    G(%yng = M) = 0.7820, then compare this with the classification ambiguity with fuzzy
    partitioning for each of the other attributes from this branch, e.g. G(age | %yng = M) =
    0.6598. An inspection of the possible values shows G(%prt | %yng = M) = 0.4856 is the least,
    hence %prt is the chosen attribute for the decision node at this branch. Similarly,
    G(%yng = H) = 0.3831 and G(%fem | %yng = H) = 0.2725, so %fem is the chosen attribute for
    this branch.

    The branches from the decision node (%prt | %yng = M) are next considered. The largest
    subsethood values for the subsequent branches are S(%yng = M and %prt = L,
    %pay = L) = 0.6101 and S(%yng = M and %prt = H, %pay = H) = 0.4970. Of these values,
    only S(%yng = M and %prt = L, %pay = L) is above the truth level, hence that branch ends in a
    leaf node; the other branch requires possible further partitioning with attributes. For the decision

    node (%fem | %yng = H) it follows the largest subsethood values for each branch are S(%yng

    = H and %fem = L, %pay = L) = 0.5860, S(%yng = H and %fem = M, %pay = H) = 0.6482


    and S(%yng = H and %fem = H, %pay = H) = 0.6597. Hence only branch (%yng = H and

    %fem = L) requires further possible partitioning by attributes.

    This process is continued until only leaf nodes are at the end of each branch, or no further
    augmentation of attributes to nodes can be made.⁷ The final results of the fuzzy decision tree
    method are illustrated in Figure 5.

    [Figure 5 shows the tree with root node %yng and branches %yng = L, %yng = M and %yng = H;
    lower decision nodes use %prt, %fem and %old. The eleven leaf nodes classify to %pay = Z, L or
    H, with degrees of truth 86.7%, 61.0%, 82.2%, 66.9%, 77.2%, 65.7%, 93.6%, 78.9%, 100.0%, 64.8%
    and 65.9%.]

    Figure 5: Fuzzy decision tree.

    In Figure 5, the fuzzy decision tree is shown for the NMW problem considered. There are
    11 fuzzy rules (leaf nodes), described by the larger rectangular boxes. Each rule
    is described by the downward progression from the root to a leaf node. In each non-leaf
    node (excluding the root) there are two parts: above the dashed line in its rectangular box
    is the particular condition attribute linguistic term to be satisfied, and below
    the dashed line is the next condition attribute to consider.

    At a leaf node, above the dashed line is the final condition attribute linguistic term to be
    satisfied and below the dashed line is the class of the decision attribute %pay the rule classifies
    to, along with the degree of truth in the classification. For example, one rule is given in Figure
    6 along with a wording of the rule.

    ⁷ This may be based on no improvement (reduction) of the classification ambiguity value of a branch, or no
    further attributes able to be augmented.

    [Figure 6 shows the branch from the root (%yng) through %yng = M (%prt) to the leaf node
    %prt = L, %pay = L, 61.0%, alongside the following wording of the rule.]

    If %yng = M and %prt = L then %pay = L with degree of truth 61.0%. That is, when the fuzzy
    value of the membership function for %yng = M is the largest for that attribute, and similarly
    for the %prt = L condition attribute.

    Figure 6: Description of a fuzzy decision rule.

    To illustrate this decision tree, the classification of the establishment given in Table 3 is
    considered. The fuzzy values for the establishment are:

    {0, 0.417, 0.157; 0, 0.278, 0.222; 0, 0.793, 0.126; 0, 0, 0.571; 1, 0, 0; 1, 0; 0, 0.599, 0.007}

    Taking the largest value for each attribute, the dominant linguistic terms are age = M (since
    its largest value is 0.417), emps = M, %yng = M, %old = H, %fem = L, %prt = L and %pay = L.
    Using this information, the fuzzy rule given in Figure 6 is the rule which classifies this
    establishment. An inspection of the result shows the correct classification was given, even
    though the degree of truth is an indication of the fuzzy nature of this analysis.
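The classification of this establishment can be traced with a short sketch. The fuzzy values are those of Table 3 and the rule is the one worded in Figure 6; the dictionary layout and the names TERMS, FUZZY, RULE and dominant are hypothetical.

```python
# Linguistic terms of each attribute, in the order of its fuzzy values.
TERMS = {"age": "LMH", "emps": "LMH", "%yng": "LMH", "%old": "LMH",
         "%fem": "LMH", "%prt": "LH", "%pay": "ZLH"}

# Fuzzy values of the establishment, from Table 3.
FUZZY = {
    "age": [0, 0.417, 0.157],
    "emps": [0, 0.278, 0.222],
    "%yng": [0, 0.793, 0.126],
    "%old": [0, 0, 0.571],
    "%fem": [1, 0, 0],
    "%prt": [1, 0],
    "%pay": [0, 0.599, 0.007],
}

# The rule of Figure 6: if %yng = M and %prt = L then %pay = L (truth 61.0%).
RULE = {"if": {"%yng": "M", "%prt": "L"}, "then": "L", "truth": 0.610}


def dominant(attribute: str) -> str:
    """Return the linguistic term with the largest fuzzy value for the attribute."""
    values = FUZZY[attribute]
    return TERMS[attribute][values.index(max(values))]


# The rule applies when every condition matches the dominant term.
rule_fires = all(dominant(name) == term for name, term in RULE["if"].items())
predicted = RULE["then"] if rule_fires else None
correct = predicted == dominant("%pay")
```

The rule fires because the dominant terms of %yng and %prt are M and L respectively, and the predicted class L agrees with the dominant term of %pay.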

    6 Conclusions

    This paper has illustrated the use of a fuzzy decision tree approach to identifying
    establishments that pay low wages. Through the use of Parzen windows and
    FUSINTER, the required membership functions are intelligently constructed, with no need
    for an expert opinion within many parts of the analysis.

    The results of the fuzzy decision tree are fuzzy classification rules, each with an associated
    degree of truth in their classification. These rules are relatively simple to read and apply, i.e.
    a person may calculate the specific fuzzy values from crisp data or simply use the low (L),
    medium (M) and high (H) labels as simple linguistic terms. This removes the need for any
    further analysis to be undertaken, except the personnel linguistic judgements.

    References

    Bandemer, H. and Gottwald, S. (1995). Fuzzy Sets, Fuzzy Logic, Fuzzy Methods. Wiley,
    New York.


    Cios, K. J. and Sztandera, L. M. (1992). Continuous ID3 algorithm with fuzzy entropy
    measure. Proceedings IEEE International Conference on Fuzzy Systems, San Diego, CA,
    469–476.

    Duda, R. O. and Hart, P. E. (1973). Pattern Classification and Scene Analysis. Wiley, New
    York.

    Higashi, M. and Klir, G. J. (1983). Measure of uncertainty and information based on
    possibility distributions. International Journal of General Systems, 9: 43–58.

    Hong, T-P. and Chen, J-B. (1999). Finding relevant attributes and membership functions.
    Fuzzy Sets and Systems, 103: 389–404.

    Hu, C-F. and Fang, S-C. (1998). Solving fuzzy inequalities with concave membership
    functions. Fuzzy Sets and Systems, 99: 233–240.

    Hunt, E. B. (1962). Concept learning: An information processing problem. Wiley, New York.

    Kahraman, C., Tolga, E. and Ulukan, Z. (2000). Justification of manufacturing technologies
    using fuzzy benefit/cost ratio analysis. International Journal of Production Economics, 66:
    45–52.

    Kosko, B. (1986). Fuzzy entropy and conditioning. Information Sciences, 30: 165–174.

    Low Pay Commission (2000). The National Minimum Wage: The Story So Far: Second
    Report of the Low Pay Commission. Cm 4571, London: HMSO.

    McNabb, R. and Whitfield, K. (2000). Worth So Appallingly Little: A Workplace-Level
    Analysis of Low Pay. British Journal of Industrial Relations, 38(4): 585–609.

    Medasani, S., Kim, J. and Krishnapuram, R. (1998). An overview of membership function
    generation techniques for pattern recognition. International Journal of Approximate
    Reasoning, 19: 391–417.

    Parzen, E. (1962). On estimation of a probability density function and mode. Annals of
    Mathematical Statistics, 33: 1065–1076.

    Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1): 81–106.

    Roa-Sepulveda, C. A. and Herrera, M. (2000). A solution to the economic dispatch problem
    using decision trees. Electric Power Systems Research, 56: 255–259.

    Sancho-Royo, A. and Verdegay, J. L. (1999). Methods for the Construction of Membership
    Functions. International Journal of Intelligent Systems, 14: 1213–1230.


    Tarrazo, M. and Gutierrez, L. (2000). Economic expectation, fuzzy sets and financial
    planning. European Journal of Operational Research, 126: 89–105.

    Thompson, J. R. and Tapia, R. A. (1990). Nonparametric Function Estimation, Modeling,
    and Simulation. Society for Industrial and Applied Mathematics, Philadelphia.

    Wang, X., Chen, B., Qian, G. and Ye, F. (2000). On the optimization of fuzzy decision
    trees. Fuzzy Sets and Systems, 112: 117–125.

    Weber, R. (1992). Fuzzy-ID3: a class of methods for automatic knowledge acquisition.
    Proceedings of 2nd International Conference on Fuzzy Logic and Neural Networks, Iizuka,
    Japan, 265–268.

    Yuan, Y. and Shaw, M. J. (1995). Induction of fuzzy decision trees. Fuzzy Sets and
    Systems, 125–139.

    Zadeh, L. A. (1978). Fuzzy Sets as a basis for a theory of possibility. Fuzzy Sets and
    Systems, 1: 3–28.

    Zighed, D. A., Rabaseda, S. and Rakotomalala, R. (1998). FUSINTER: A method for
    discretisation of continuous attributes. International Journal of Uncertainty, Fuzziness and
    Knowledge-Based Systems, 6(3): 307–326.