
Participatory Learning With Granular Observations

Ronald R. Yager, Fellow, IEEE

Abstract—We introduce and discuss the participatory learning paradigm. A formal system implementing this type of learning agent is described. We then extend this system so that it can learn from interval-type observations. We further extend this system to the case when the observation is a more general granular object such as a fuzzy set. In the initial stage, while we allowed our observations to be granular, we restricted the learning to be precise values. In the next part, we allow both the observations and learned object to be granular. An important issue that arises when learning granular values relates to the specificity of the learned value. Learned values that are too unspecific can be useless. We suggest methods for controlling the specificity of the values learned.

Index Terms—Fuzzy sets, learning, participatory learning paradigm (PLP), specificity, trapezoidal membership.

I. INTRODUCTION

PARTICIPATORY learning provides a paradigm for learning that emphasizes the pervasive role of what is already known or believed in the learning process [1]–[3]. Central to this framework is the idea that in order for new information to contribute to learning, it must display some compatibility or consistency with what is already believed.

Zadeh [4], [5] discussed the "role" of perception in human cognition and communication. He noted that much of the information provided by humans is in terms of perceptions. Often this perception-based information is expressed in granular terms, such as with natural language. This situation requires that digital-based agents with a capacity for learning be able to learn from granularly described information. Our focus here is to extend the participatory learning paradigm (PLP) to the situation in which our observations and beliefs are granules.

II. PARTICIPATORY LEARNING PARADIGM

As noted by Quine [6], [7], human learning is not context free, as it is pervasively affected by the belief state of the agent doing the learning. To help include this reality in computational learning schemes, we introduced the paradigm of participatory learning [1], [8]. The basic premise of the PLP is that learning takes place in the framework of what is already learned and believed. The implication of this is that every aspect of the learning process is affected and guided by the current belief system. Our choice of name, participatory learning, is meant to highlight the fact that when learning, we are in an environment in which our current knowledge participates in the process of learning about itself. The classic work by Kuhn [8] describes related ideas in the framework of scientific advancement.

Manuscript received March 12, 2007; revised October 23, 2007; accepted February 1, 2008. First published September 9, 2008; current version published February 4, 2009.

The author is with the Machine Intelligence Institute, Iona College, New Rochelle, NY 10801 USA (e-mail: [email protected]).

Digital Object Identifier 10.1109/TFUZZ.2008.2005690

Fig. 1. Partial view of a prototypical participatory learning process.

In Fig. 1, we provide a partial view of a prototypical participatory learning process that highlights the enhanced role played by the current belief system. Central to the PLP is the idea that observations too conflicting with our current beliefs are generally discounted. An experience presented to the system is first sent to the acceptance or censor component. This component, which is under the control of the current belief state, decides whether the experience is compatible with the current state of belief; if it is deemed as being compatible, the experience is passed along to the learning component that uses this experience to update the current belief. If the experience is deemed as being too incompatible, it is rejected and not used for learning. Thus, we see that the acceptance component acts as a kind of filter with respect to deciding which experiences are to be used for learning. We emphasize here that the state of the current beliefs participates in this filtering operation. We note that many learning paradigms do not include a filtering mechanism of the type provided by the acceptance function, and let all data pass through to modify the current belief state. Often in these systems, stability is obtained by using slow learning rates. Participatory learning has the characteristic of protecting our belief structures from wide swings due to erroneous and anomalous observations while still allowing the learning of new knowledge.

Because of the aforementioned structure, a central character-istic of the PLP is that an experience has the greatest impact incausing learning or belief revision when it is compatible withour current belief system. In particular, observations too con-flicting with our current beliefs are discounted. As shown in [1]and [9], the rate of learning using the PLP is optimized for situ-ations in which we are just trying to change a small part of ourcurrent belief system. The structure of the participatory learningsystem (PLS) is such that it is most receptive to learning whenconfronted with experiences that convey the message “what youknow is correct except for this little part.” On the other hand,a PLS when confronted with an experience that says “you areall wrong, this is the truth” responds by discounting what isbeing told to it. In its nature, it is a conservative learning systemand hence very stable. We can see that the participatory learn-ing environment uses sympathetic experiences to modify itself.Unsympathetic observations are discounted as being erroneous.Generally, a system based on the PLP uses the whole contextof an observation (experience) to judge something about the

1063-6706/$25.00 © 2009 IEEE

Page 2: Participatory Learning With Granular Observations

2 IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 17, NO. 1, FEBRUARY 2009

Fig. 2. Fully developed prototypical participatory learning process.

credibility of the observation with respect to the learning agent’sbeliefs; if it finds the whole experience credible, it can modify itsbelief to accommodate any portion of the experience in conflictwith its belief. That is, if most of an experience or observationis compatible with the learning agent’s current belief, the agentcan use the portion of the observation that deviates from its cur-rent belief to learn. It should be pointed out that in this system,as in human learning and most other formal learning models,the order of experiences affects the model.

While the acceptance function in the PLP acts to protect an agent from responding to "bad" data, it has an associated downside. If the agent using a PLP has an incorrect belief system about the world, it is constrained to remain in this state of blissful ignorance by the blocking out of correct observations that may conflict with this erroneous belief model. In Fig. 2, we provide a more fully developed version of the PLP that addresses this issue by introducing an arousal mechanism in the guise of a critic.

The role of the arousal mechanism, which is an autonomous component not under the control of the current belief state, is to observe the performance of the acceptance function. In particular, if too many observations are rejected as being incompatible with the agent's belief model, the learning agent becomes aroused that something may be wrong with its current state of belief, i.e., a loss of confidence is incurred. The effect of this loss of confidence is to weaken the filtering aspect of the acceptance component and let incoming experiences that are not necessarily compatible with the current state of belief be used to help update the current belief. This situation can result in rapid learning in the case of a changing environment once the agent has been aroused. Essentially, the role of the arousal mechanism is to help the agent get out of a state of belief that is deemed as false.

Fundamentally, we see two collaborating mechanisms at play in this PLP. The primary mechanism, manifested by the acceptance function and controlled by the current state of belief, is a conservative one; it assumes that the current state of belief is substantially correct and only requires slight tuning. It rejects strongly incompatible experiences and does not allow them to modify its current belief. This mechanism manifests its effect on each individual learning experience. The secondary mechanism, controlled by the arousal mechanism, being less conservative, allows for the possibility that the agent's current state of belief may be wrong. This secondary mechanism is generally kept dormant unless activated by being aroused by an accumulation of input observations in conflict with the current belief. What must be emphasized is that the arousal mechanism contains no knowledge of the current beliefs; all knowledge resides in the belief system. It is basically a scoring system calculating how the current belief system is performing. It essentially does this by noting how often the system has encountered incompatible observations. Its effect is not manifested by an individual incompatible experience but by an accumulation of these.

III. BASIC PARTICIPATORY LEARNING MODEL

We provided an example of a learning agent based on the PLP in [1] in which we have a context consisting of a collection of variables, $X(i)$, $i = 1$ to $n$. We assumed that the values of the $X(i) \in [0, 1]$. The current state of the agent's belief consists of a vector $V_{k-1}$ whose components, $V_{k-1}(i)$, $i = 1$ to $n$, are the agent's current beliefs about the values of the $X(i)$. This is what the agent has learned after $k - 1$ observations. The current observation (learning experience) consists of a vector $D_k$, whose component $d_k(i)$, $i = 1$ to $n$, is an observation about the variable $X(i)$. Using the participatory learning mechanism, the updation of our current belief is the vector $V_k$ whose components are

$$V_k(i) = V_{k-1}(i) + \alpha \rho_k^{(1-a_k)}\left(d_k(i) - V_{k-1}(i)\right). \tag{1}$$

Using vector notation, we can express this as

$$V_k = V_{k-1} + \alpha \rho_k^{(1-a_k)}\left(D_k - V_{k-1}\right)$$

$$V_k = \alpha \rho_k^{(1-a_k)} D_k + \left(1 - \alpha \rho_k^{(1-a_k)}\right) V_{k-1}.$$

In the previous equation, $\alpha \in [0, 1]$ is the basic learning rate. We see that $\alpha$ functions like the learning rate in most updation algorithms; the larger the $\alpha$, the more the updated value is affected by the current observation. The term $\rho_k$ is the compatibility of the observation $D_k$ with the current belief $V_{k-1}$. It was suggested in [1] that we calculate

$$\rho_k = 1 - \frac{1}{n}\sum_{i=1}^{n} \left|d_k(i) - V_{k-1}(i)\right|. \tag{2}$$

It is noted that $\rho_k \in [0, 1]$. The larger the $\rho_k$, the more compatible the observation is with the current belief. The agent looks at the individual compatibilities as expressed by $|d_k(i) - V_{k-1}(i)|$ to determine the overall compatibility $\rho_k$. Here, we see that if $\rho_k$ is small, the observation is not compatible, and then we tend to filter out the current observation. On the other hand, if $\rho_k \to 1$, then the current observation is in agreement with the belief and is allowed to affect the belief. We refer to observations with large $\rho_k$ as "kindred" observations.

The term $a_k$, also lying in the unit interval, is called the arousal rate. It is inversely related to the confidence in the model based upon its past performance. The smaller the $a_k$, the more confident it is. The larger the arousal rate, the more suspect we are about the credibility of our current beliefs. The calculation for $a_k$ suggested in [1] is

$$a_k = (1 - \beta)a_{k-1} + \beta(1 - \rho_k). \tag{3}$$


Here, $\beta \in [0, 1]$ is a learning rate; $\beta$ is generally less than $\alpha$.
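As an illustration, the following Python sketch implements one iteration of (1)–(3) for point-valued observations on the unit interval. The function and variable names, as well as the sample values of $\alpha$ and $\beta$, are our own and not from the paper.

```python
import numpy as np

def pl_update(v, d, a, alpha=0.5, beta=0.1):
    """One step of basic participatory learning.

    v : current belief vector V_{k-1}, entries in [0, 1]
    d : observation vector D_k, entries in [0, 1]
    a : previous arousal rate a_{k-1} in [0, 1]
    Returns the updated belief V_k and arousal a_k.
    """
    # Compatibility of the observation with the belief, eq. (2).
    rho = 1.0 - np.mean(np.abs(d - v))
    # Arousal update, eq. (3): a_k = (1 - beta) a_{k-1} + beta (1 - rho_k).
    a_new = (1.0 - beta) * a + beta * (1.0 - rho)
    # Belief update, eq. (1); higher arousal weakens the rho filter.
    v_new = v + alpha * rho ** (1.0 - a_new) * (d - v)
    return v_new, a_new

v = np.array([0.2, 0.8, 0.5])
d = np.array([0.25, 0.75, 0.55])   # a "kindred" observation
v, a = pl_update(v, d, a=0.0)
```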

While here we shall restrict the values of $X(i)$ to be the unit interval, this is not necessary. We can draw $X(i)$ from some subset $S_i$ of the real line. However, in this case, we must use a proximity relationship, $\text{Prox}_i$ [10]. We recall

$$\text{Prox}_i : S_i \times S_i \to [0, 1]$$

such that $\text{Prox}(x, x) = 1$ and $\text{Prox}(x, y) = \text{Prox}(y, x)$. More generally, we need some methodology for taking an observation $D_k$ and the current belief $V_{k-1}$, and determining $\text{Comp}(D_k, V_{k-1})$ as a value in the unit interval.

IV. LEARNING FROM INTERVAL OBSERVATIONS

We shall now consider the issue of learning when our observations can be granular objects [11]–[14]. Here, we shall then assume a collection of $n$ variables $X(i)$, $i = 1$ to $n$. Again, for simplicity, we shall assume that each $X(i)$ takes its value in the unit interval. We note, as pointed out earlier, that more generally, $X(i)$ can be drawn from a space $S_i$ on which there exists some proximity relationship, $\text{Prox}_i(x, y) \in [0, 1]$ for $x$ and $y \in S_i$. Here, an observation $D_k$ has components $d_k(i)$ that are observations about the variable $X(i)$. Here, we shall, however, allow $d_k(i)$ to be granular objects.

We shall use the basic participatory learning algorithm

$$V_k(i) = V_{k-1}(i) + \alpha \rho_k^{(1-a_k)}\left(d_k(i) - V_{k-1}(i)\right)$$

where $\rho_k = 1 - (1/n)\sum_{i=1}^{n} |d_k(i) - V_{k-1}(i)|$. We shall calculate $a_k$ as in the preceding, $a_k = (1 - \beta)a_{k-1} + \beta(1 - \rho_k)$.

Initially, we shall consider the case where, while the observations are granular, our desire is to learn precise values for the variables $X(i)$. We shall begin our investigation by further assuming that the granular observations are intervals. Thus, here our observations are $d_k(i) = [L_k(i), U_k(i)]$, where $L_k(i)$ and $U_k(i)$ are the lower and upper ends of the interval. It is clear that if $L_k(i) = U_k(i)$, then we have a point observation.

Let us first consider the calculation of $V_k(i)$ in this case. Here, we shall assume that we have already calculated $\rho_k$ and $a_k$. Using these values, we calculate $\alpha \rho_k^{(1-a_k)} = \delta_k$. Now we must calculate

$$V_k(i) = V_{k-1}(i) + \delta_k\left(d_k(i) - V_{k-1}(i)\right)$$

where $d_k(i) = [L_k(i), U_k(i)]$. Using interval arithmetic, we get that

$$d_k(i) - V_{k-1}(i) = \left[L_k(i) - V_{k-1}(i),\; U_k(i) - V_{k-1}(i)\right].$$

Since $\delta_k \geq 0$, then $\delta_k[L_k(i) - V_{k-1}(i),\; U_k(i) - V_{k-1}(i)] = [\delta_k(L_k(i) - V_{k-1}(i)),\; \delta_k(U_k(i) - V_{k-1}(i))]$. From this, we get $V_k(i) = [V_{k-1}(i) + \delta_k(L_k(i) - V_{k-1}(i)),\; V_{k-1}(i) + \delta_k(U_k(i) - V_{k-1}(i))]$.

Since we desire a precise value for $V_k(i)$, we must collapse this interval to a single value. The most natural method is to take the midpoint of this

$$V_k(i) = \frac{2V_{k-1}(i) + \delta_k\left(L_k(i) + U_k(i) - 2V_{k-1}(i)\right)}{2}$$

$$V_k(i) = V_{k-1}(i) + \delta_k\left(M_k(i) - V_{k-1}(i)\right).$$

Here, $M_k(i) = [L_k(i) + U_k(i)]/2$ is the midpoint of the observed interval.

We now turn to the calculation of $\rho_k$, which we previously assumed was available. Since

$$\rho_k = \frac{1}{n}\sum_{i=1}^{n} \text{Comp}_i\left(d_k(i), V_{k-1}(i)\right)$$

the value of $\rho_k$ depends on the calculation of the individual $\text{Comp}_i(d_k(i), V_{k-1}(i))$. Here, since we have assumed the variable $X(i)$ lies in the unit interval

$$\text{Comp}_i\left(d_k(i), V_{k-1}(i)\right) = 1 - \text{Dist}\left(d_k(i), V_{k-1}(i)\right).$$

Since $d_k(i) = [L_k(i), U_k(i)]$ is an interval, we are faced with the problem of calculating the distance of a point $V_{k-1}(i)$ from an interval.

In the following discussion, for the sake of clarity, we shall temporarily suppress unnecessary indices, and just consider the calculation of the distance between a generic interval $[L, U]$ and a point $V$, all lying in the unit interval. We can associate with the interval $[L, U]$ and $V$ two points, the points in the interval closest to and farthest from $V$. We denote these as $C$ and $F$.

We see that these points are as follows.
1) If $V \leq L$, then $C = L$ and $F = U$.
2) If $V \geq U$, then $C = U$ and $F = L$.
3) If $V \in [L, U]$ and $U - V \geq V - L$, then $C = V$ and $F = U$.
4) If $V \in [L, U]$ and $U - V < V - L$, then $C = V$ and $F = L$.

Using the values $C$ and $F$, we can calculate $D_C = |V - C|$ and $D_F = |V - F|$ as the closest and farthest distances of $V$ from the observation. We note that if we are using a proximity relationship, then we calculate $\text{Prox}(V, F)$ and $\text{Prox}(V, C)$. Since in the unit interval case we can calculate $\text{Prox}(x, y) = 1 - \text{Dist}(x, y)$, we shall use the more general concept of proximity. At this point, the learning agent has some degree of freedom in calculating the compatibility, $\text{Comp}(D, V)$. In order to model this freedom, we introduce a parameter $\lambda \in [0, 1]$ and express

$$\text{Comp}(D, V) = \lambda\, \text{Prox}(V, C) + \bar{\lambda}\, \text{Prox}(V, F)$$

where $\bar{\lambda} = 1 - \lambda$.

If $\lambda = 1$, we use the closest value in $D$ to calculate the compatibility, and if $\lambda = 0$, we use the farthest.

Here, $\lambda$ provides a parameter by which we can characterize our learning agent. We note that if $\lambda$ is large, closer to one, the agent is more willing to accept new observations and hence more open to learning. On the other hand, the smaller the $\lambda$, i.e., closer to zero, the less open the agent is to using observations to learn. It is more conservative.

We should note that while building agents that learn, we have the additional freedom of specifying $\lambda$ globally for all variables or providing a different value $\lambda_i$ for each variable. An additional degree of sophistication can be obtained by allowing $\lambda$ to change rather than being fixed. In particular, $\lambda$ can be a function of the arousal rate.

We note that once we have determined $\rho_k$, the calculation of $a_k$ poses no new challenges. However, there is an interesting observation we can make. We note that selecting $\lambda$ to be large will result in larger compatibilities and hence a larger value for $\rho_k$. This will, of course, as we already noted, make the learning of $V_k$ more open to accepting the observation and hence make for faster learning. However, there is an interesting counterbalancing effect. Making $\rho_k$ large has the effect of making the arousal factor $a_k$ smaller. Since the full term used in (1) is $\rho_k^{(1-a_k)}$, a smaller value for $a_k$ tends to slow down the learning because we are assuming that the model is good. Selecting $\lambda$ to be small will, of course, have the opposite effect: it will diminish $\rho_k$, which tends to reduce the effect of the current observation, but on the other hand, it will tend to increase $a_k$, the arousal rate, which has the effect of making the system more willing to receive new observations. Thus, the effect of $\lambda$ has very interesting dynamics that appear to be a kind of delayed feedback. This effect appears to be balancing and does not allow the learning to go to extremes.

An interesting possible type of learning agent is one that has two values for the parameter $\lambda$ used to determine $\rho_k$. Let us denote one as $\lambda_1$ and the other as $\lambda_3$. Here, $\lambda_1$ is used to determine the compatibility, and eventually, the value of $\rho_k$ used in formula (1), while $\lambda_3$ is used to determine the compatibility value $\rho_k$ used in calculating $a_k$. We see that by making $\lambda_1$ large and $\lambda_3$ small, we are tending to be extremely open to learning. In this case, we tend to believe that the current observation is compatible with our beliefs for learning $V_k$. On the other hand, we tend to assume that the observation was not compatible for determining our rate of arousal. This tends to make $a_k$ large. In combination, these together lead to larger values for $\rho_k^{(1-a_k)}$, making the learning faster.

An opposite effect can be observed if $\lambda_1$ is made small and $\lambda_3$ is made large. Here, the agent is tending to believe that the observation is incompatible with his current beliefs for learning $V_k$. On the other hand, he is assuming that the observation is compatible for determining the arousal rate. This tends to make $a_k$ smaller. In combination, these together lead to smaller values for $\rho_k^{(1-a_k)}$, tending to block any learning. Here, the agent is being very conservative.

That the values of $\lambda_1$ and $\lambda_3$ can be very different may be useful for modeling kinds of neurotic behavior.

V. LEARNING FROM FUZZY SET OBSERVATIONS

We now turn to the case where the granular observation, rather than being an interval, is a fuzzy subset. Here, we are still interested in learning a precise value for $V_k(i)$. Thus, here $d_k(i)$ is a fuzzy subset over the universe of $X(i)$, which we will assume to be the unit interval. We shall denote $d_k(i)$ as $F_{i/k}$. In implementing formula (1) for the calculation of $V_k(i)$, a natural approach is to use the center of gravity of $F_{i/k}$. In particular, we calculate

$$m_k(i) = \int_{y \in I} y\, F_{i/k}(y)\, dy$$

where $I = [0, 1]$. If the domain of $X(i)$ is not $I$ but $Y_i$, then we integrate over $Y_i$.

Fig. 3. Basic normal unimodal fuzzy set.

Fig. 4. Point V to the left of the fuzzy set.

Using $m_k(i)$, we calculate

$$V_k(i) = V_{k-1}(i) + \alpha \rho_k^{(1-a_k)}\left(m_k(i) - V_{k-1}(i)\right).$$

In this case, the determination of $\rho_k$ becomes slightly more complicated, as we deal with fuzzy subsets rather than intervals.

Again here, the issue becomes determining the closest and farthest distance between the fuzzy subset $F_{i/k}$ and the point $V_{k-1}(i)$. For clarity, in the following discussion, we shall again suppress the unnecessary indices and consider the distance between a fuzzy subset $F$ and a point $V$, both from the unit interval.

Since $F$ is an observation or perception of the value of a variable, it is natural to assume that $F$ is a normal unimodal fuzzy subset. It has the form shown in Fig. 3 and has the following characteristics:

$$F(a) \leq F(b), \quad \text{for } a \leq b \leq y_1$$

$$F(y) = 1, \quad \text{for } y_1 \leq y \leq y_2$$

$$F(a) \geq F(b), \quad \text{for } y_2 \leq a \leq b.$$

The first step in the process is to calculate the fuzzy subset $E = |F - V|$. To calculate $E$, we use Zadeh's extension principle [15], [16]

$$E = \bigcup_{y \in [0,1]} \left\{ \frac{F(y)}{|y - V|} \right\}.$$

In particular, $E$ is a fuzzy subset of $[0, 1]$ such that

$$E(z) = \max_{y:\, |y - V| = z} \left[F(y)\right].$$

Note: If the domain of $Y$ is not the unit interval, we then calculate $E = \bigcup_{y \in Y} \{F(y)/(1 - \text{Prox}(y, V))\}$. Since $\text{Prox}(x, y) \in [0, 1]$, we still get a fuzzy subset of the unit interval.

A relatively easy method for obtaining $E$ can be formulated, especially in the case where $V$ is outside $F$, $F(V) = 0$. First consider the situation where $F$ and $V$ are as given in Fig. 4.


Fig. 5. Point V to the right of the fuzzy set.

Fig. 6. Point V has membership in the fuzzy set.

In this case, to obtain the fuzzy set $E$, we will calculate

$$E(z) = F(V + z), \quad \text{for } 0 \leq z \leq 1 - V$$

$$E(z) = 0, \quad \text{for } 1 - V < z \leq 1.$$

Now consider the case shown in Fig. 5. Again, $F(V) = 0$. In this case, we obtain $E$ as

$$E(z) = F(V - z), \quad 0 \leq z \leq V$$

$$E(z) = 0, \quad V < z \leq 1.$$

Now consider the case where $F(V) \neq 0$, as shown in Fig. 6. In this case

$$E(z) = \max\left[F(V + z),\, F(V - z)\right].$$

Actually, all three cases can be expressed as given earlier. However, in the case of Fig. 4, we have $F(V - z) = 0$ for all $z \geq 0$, while in the case of Fig. 5, we have $F(V + z) = 0$ for all $z \geq 0$.

It should be noted that $E(z)$ will itself be unimodal. For the situations shown in Figs. 4 and 5, the unimodality of $E$ is obvious. For the case in Fig. 6, it requires a little explanation. Again assume that $F$ is unimodal with some interval $(y_1, y_2)$ where $F(y) = 1$. Assume that $V$ is to the right of this interval, $V > y_2$. For all $z$ such that $0 \leq z \leq V - y_1$, we have that $F(V - z) \geq F(V) \geq F(V + z)$. Hence, $E(z) = F(V - z)$ and therefore is nondecreasing as $z$ increases. For all $V - y_2 \leq z \leq V - y_1$, we have $F(V - z) = 1$, and hence, $E(z) = 1$. For all $z \geq V - y_1$, both $F(V - z)$ and $F(V + z)$ are monotonically decreasing, and hence, $E(z)$ must be decreasing.

An analogous argument can be used to show that if $V < y_1$, then $E$ must be unimodal.

In the case where $V \in [y_1, y_2]$, we get a special case of a unimodal function for $E$. In this case, $E(0) = 1$. In particular, $E(z) = 1$ for $0 \leq z \leq (|y_2 - V| \vee |V - y_1|)$, and elsewhere $E(z) = \max[F(V + z), F(V - z)]$. Since both these terms are decreasing, $E$ is also decreasing and looks as shown in Fig. 7.

We now have a fuzzy subset $E$ of the unit interval corresponding to the distance of $V$ from $F$. We must now use this to calculate the compatibility between $V$ and $F$. Let us recall the case where we had an interval instead of a fuzzy set to get some inspiration. In the interval case, we determined the closest distance $a$ and the farthest distance $b$. We then calculated the distance as $\lambda a + \bar{\lambda} b$ where $\lambda \in [0, 1]$.

Fig. 7. Decreasing membership.

Fig. 8. Illustration of the α-level set.

We can make use of the idea of level sets [17], [18] to generalize this approach to the case where we have a unimodal fuzzy subset $E$ instead of an interval. We recall that the $\alpha$-level set of $E$, $E_\alpha$, is a crisp subset of the unit interval such that $E_\alpha = \{z \mid E(z) \geq \alpha\}$.

It is well known that in the case when $E$ is unimodal, $E_\alpha$ is an interval. In Fig. 8, this is clearly illustrated.

Furthermore, for any level set $E_\alpha$, we can easily obtain the minimal and maximal distances between $F$ and $V$ as the ones corresponding to the end points of the interval $E_\alpha$. We denote them as $c_\alpha$ and $f_\alpha$. In Fig. 9, we illustrate this.

Using a method suggested for extending set-based operations to fuzzy set operations [19], [20], we can calculate

$$c = \int_0^1 c_\alpha\, d\alpha$$

and

$$f = \int_0^1 f_\alpha\, d\alpha.$$

Thus, the shortest distance is the average of the shortest distances of the level sets weighted by the level grade. A very efficient way of obtaining these values can be suggested. We shall first consider the calculation of $c$. This is based on the fact that the calculation of $c$ is a kind of Choquet integral [21], which is shown in Fig. 10. The striped area is the value of $c$.

In Fig. 10, $z_1$ is the value of $c_1$, the smallest point in the level set $E_1$. It is also the smallest value of $z$ where $E(z) = 1$.


Fig. 9. Minimal and maximal distance between F and V .

Fig. 10. Calculation of c.

Fig. 11. Calculation of f .

Using $z_1$, we can easily obtain $c$ as

$$c = \int_0^1 c_\alpha\, d\alpha = z_1 - \int_0^{z_1} E(z)\, dz.$$

In essence, we calculate the area under $E$ from 0 to $z_1$, and then subtract it from $z_1$ to obtain the striped area, the value of $c$.

The calculation of $f = \int_0^1 f_\alpha\, d\alpha$ can also be made very efficiently. In Fig. 11, we show the calculation of $f$; it is also the striped area. With $z_2$ the largest value of $z$ for which $E(z) = 1$, the calculation of $f$ can be obtained as

$$f = z_2 + \int_{z_2}^1 E(z)\, dz.$$
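To illustrate, here is a rough numerical sketch that builds $E(z) = \max(F(V+z), F(V-z))$ on a grid and recovers $c$ and $f$ by the area constructions of Figs. 10 and 11. The function name, the grid discretization, and the triangular example are our own assumptions, not from the paper.

```python
import numpy as np

def level_set_distances(F, V, n=2001):
    """Build E(z) = max(F(V + z), F(V - z)) on a grid over [0, 1] and
    return (c, f), the level-weighted closest and farthest distances."""
    z = np.linspace(0.0, 1.0, n)
    Fv = np.vectorize(F)
    up = np.where(V + z <= 1.0, Fv(np.clip(V + z, 0.0, 1.0)), 0.0)
    dn = np.where(V - z >= 0.0, Fv(np.clip(V - z, 0.0, 1.0)), 0.0)
    E = np.maximum(up, dn)
    peak = np.where(E >= E.max() - 1e-9)[0]     # the E(z) = 1 plateau
    z1, z2 = z[peak[0]], z[peak[-1]]
    c = z1 - np.trapz(E[z <= z1], z[z <= z1])   # c = z1 - int_0^z1 E(z) dz
    f = z2 + np.trapz(E[z >= z2], z[z >= z2])   # f = z2 + int_z2^1 E(z) dz
    return c, f

# Triangular observation centered at 0.3 (half-width 0.1), point V = 0.7:
tri = lambda y: max(0.0, 1.0 - abs(y - 0.3) / 0.1)
c, f = level_set_distances(tri, 0.7)            # c ~= 0.35, f ~= 0.45
```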

VI. LEARNING INTERVAL VALUES

We now turn to the situation in which, rather than imposing the restriction that the learned values are precise values, we allow them to be granular objects. One fundamental consideration when using granular values involves the specificity [22]–[24] of the learned value. While precise values have specificity one and are maximally specific, granular objects can have varying degrees of specificity. Generally, the wider and less precise the granular object, the smaller the specificity. As noted in [23] and [25], for an interval, the specificity is directly related to its width. Of great significance here is the fact that the specificity or precision of information is closely related to its usefulness. For example, knowing that the temperature is between 10° and 70° is less useful than knowing the temperature is between 50° and 60°. Thus, generally, the less specific the information, the less useful it is. Since usefulness is an important reason behind our learning, issues related to specificity play an important role in learning granular objects.

As noted in [23] and [25], if a variable takes its value in the interval $[a, b]$, then the specificity associated with a subinterval $[c, d]$ is obtained as $\text{Sp}([c, d]) = 1 - (d - c)/(b - a)$. In the special case where $[a, b]$ is the unit interval $[0, 1]$, then $\text{Sp}([c, d]) = 1 - (d - c) = c + (1 - d)$.

As an initial situation of learning granular objects, we shall consider the case where our observations are intervals, and we also allow the learned values to be intervals. This will allow us to focus on some fundamental issues raised by the introduction of granular beliefs without the complexity of the graduality introduced by fuzzy objects. Here again, for simplicity, we assume that the variables take their values in the unit interval. With the use of proximity relations, the ideas are easily extended to any domain.

Consider the variable $X(i)$. Our belief of its value after $k - 1$ iterations is $V_{k-1}(i) = [L_1, U_1]$ and the current observation is $d_k(i) = [L_2, U_2]$. Let us assume that using these values, we calculate $\rho_k$ and $a_k$, and using these, we get $\delta = \alpha \rho_k^{(1-a_k)}$. In this case, our new belief is

$$V_k(i) = V_{k-1}(i) + \delta\left(d_k(i) - V_{k-1}(i)\right) = \delta d_k(i) + \bar{\delta} V_{k-1}(i)$$

where $\bar{\delta} = 1 - \delta$.

Putting in the interval values for $d_k(i)$ and $V_{k-1}(i)$, we get

$$V_k(i) = [L_1, U_1] + \delta\left([L_2, U_2] - [L_1, U_1]\right) = [L_3, U_3].$$

Here

$$L_3 = L_1 + \delta(L_2 - L_1) = \delta L_2 + \bar{\delta} L_1$$

$$U_3 = U_1 + \delta(U_2 - U_1) = \delta U_2 + \bar{\delta} U_1.$$

Let us look at the specificity and the related idea of usefulness. As we earlier indicated, for any interval value $[L, U]$, its specificity is $\text{Sp}([L, U]) = 1 - (U - L)$. If we denote $\Delta = U - L$, then $\text{Sp}([L, U]) = 1 - \Delta$. Thus, in the case of $V_{k-1}(i) = [L_1, U_1]$, we have $\text{Sp}(V_{k-1}) = 1 - \Delta_1$.

Now consider the updated value $V_k(i) = [L_3, U_3]$. Denoting $\Delta_3 = U_3 - L_3$, we get $\text{Sp}(V_k) = 1 - \Delta_3$. For convenience, we shall refer to $\Delta_3$ as the imprecision in our knowledge of $V_k$. Thus, we see that the larger the imprecision, the smaller the specificity. Thus, increasing $\Delta_3$ decreases the specificity and, in turn, leads to a reduction in the usefulness of the information.

Consider the calculation of $\Delta_3$

$$\Delta_3 = U_3 - L_3 = (U_1 - L_1) + \delta(U_2 - L_2) - \delta(U_1 - L_1).$$

If we denote $\Delta_2 = U_2 - L_2$ and $\Delta_1 = U_1 - L_1$, then

$$\Delta_3 = \bar{\delta}\Delta_1 + \delta\Delta_2$$

where $\bar{\delta} = 1 - \delta$.

The first thing we observe is that if $\Delta_2 > \Delta_1$, then $\Delta_3 \geq \Delta_1$, and we have decreased the specificity and potential usefulness of $X(i)$. More generally, large values of $\Delta_2$, highly imprecise observations, tend to decrease the specificity of the resulting learned values. An extreme example is $d_k(i) = [0, 1]$, and in this case, $\Delta_3 = \bar{\delta}\Delta_1 + \delta$. Clearly, this observation has not provided any help in determining the value of the variable $X(i)$ but has just introduced more uncertainty into the situation. On the other hand, a precise observation that has $\Delta_2 = 0$ will tend to increase the specificity.

The effect of the most recent observation is, of course, mediated by the compatibility through the value $\delta$. It would seem that there can be some benefit in controlling the effect or influence of very imprecise observations on the learning. Before turning to this possibility, we look at the calculation of the compatibility $\rho_k$.

We recall that $\rho_k = (1/n)\sum_{i=1}^{n} \text{Comp}_i(V_{k-1}(i), d_k(i))$. Here, $\text{Comp}_i(V_{k-1}(i), d_k(i))$ is the compatibility of the observation of $X(i)$ with the current belief. In the case where the variable takes values in the unit interval, we suggested that

$$\text{Comp}_i\left(V_{k-1}(i), d_k(i)\right) = 1 - \text{Dist}\left(V_{k-1}(i), d_k(i)\right).$$

In the following, we shall suppress unnecessary subscripts and simply refer to $V_{k-1}(i)$ as $V(i)$ and $d_k(i)$ as $d(i)$. They are the current belief and current observation. In the case where both $V(i)$ and $d(i)$ are points, then $\text{Dist}(V(i), d(i)) = |d(i) - V(i)|$, a value that lies in the unit interval. In the case where $d(i)$ is an interval $d(i) = [L, U]$ and $V(i)$ is a precise value, we calculated

$$\text{Dist}(V(i), d(i)) = \lambda\,|C - V(i)| + \bar{\lambda}\,|F - V(i)|$$

where $C$ is the closest point, in $d$, to $V$ and $F$ is the farthest point, in $d$, from $V$. Here, $\lambda \in [0, 1]$ is a parameter characterizing the learner's attitude to resolving imprecision. If $\lambda = 1$, the learner uses the most optimistic resolution, and when $\lambda = 0$, he uses the most pessimistic. We recall that the larger the value of $\lambda$, the more open the agent is to learning. In this case, the learner gets a bigger compatibility.

Let us now consider the determination of $\text{Comp}_i(V(i), d(i))$ in the case where both $V(i)$ and $d(i)$ are intervals. We denote these as $V(i) = [L_1, U_1]$ and $d(i) = [L_2, U_2]$. Again, we can express

$$\text{Comp}_i(V(i), d(i)) = 1 - \text{Dist}(V(i), d(i)).$$

Since the distance between $V(i)$ and $d(i)$ is imprecise, we can use a method similar to the preceding technique. Let $x$ and $y$ denote arbitrary points in $V(i)$ and $d(i)$. Using this, we define

$$M_* = \min_{x,y}\left[|x - y|\right] \qquad M^* = \max_{x,y}\left[|x - y|\right].$$

Thus, $M_*$ is the smallest distance between points from $V(i)$ and $d(i)$, and $M^*$ is the farthest distance between points in these intervals. Using this, we define

$$\text{Dist}(V(i), d(i)) = \lambda M_* + \bar{\lambda} M^*$$

where again the parameter $\lambda \in [0, 1]$.

Let us now consider the calculation of $M_*$ and $M^*$.

We first consider the case when the two intervals intersect, $[L_1, U_1] \cap [L_2, U_2] \neq \emptyset$. In this case, since they have at least one element in common, we have $M_* = 0$. On the other hand, $M^* = |L_1 - U_2| \vee |L_2 - U_1|$. Here, we use the join $\vee$ to indicate the Max operator. Similarly, we shall use the meet $\wedge$ to indicate the Min operator.

Fig. 12. Nonoverlapping case.

Now consider the case where the intervals do not intersect, as illustrated in Fig. 12. Here, we see

$$M_* = |U_1 - L_2| \wedge |U_2 - L_1| \qquad M^* = |L_1 - U_2| \vee |L_2 - U_1|.$$

Summarizing, we see that if $\delta_{12} = |U_1 - L_2|$ and $\delta_{21} = |U_2 - L_1|$, then we have

$$M^* = \max[\delta_{21}, \delta_{12}]$$

$$M_* = 0, \quad \text{if } [L_1, U_1] \cap [L_2, U_2] \neq \emptyset$$

$$M_* = \delta_{21} \wedge \delta_{12}, \quad \text{if } [L_1, U_1] \cap [L_2, U_2] = \emptyset.$$

Some special cases are worth noting. Assume $\lambda = 1$; then $\text{Dist}(V(i), d(i)) = M_*$. If the intervals intersect, then we get $\text{Dist}(V(i), d(i)) = 0$. If they do not intersect, then $\text{Dist}(V(i), d(i)) = \min(|L_2 - U_1|, |L_1 - U_2|)$. If $\lambda = 0$, then $\text{Dist}(V(i), d(i)) = M^* = \max(|L_2 - U_1|, |L_1 - U_2|)$.
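The following sketch collects these cases in the unit-interval setting; the names are illustrative:

```python
def interval_distance(V, d, lam):
    """Dist between belief interval V = [L1, U1] and observation d = [L2, U2]
    as lam * M_min + (1 - lam) * M_max (unit-interval case)."""
    (L1, U1), (L2, U2) = V, d
    d12, d21 = abs(U1 - L2), abs(U2 - L1)
    M_max = max(d12, d21)
    overlap = max(L1, L2) <= min(U1, U2)   # intervals share at least one point
    M_min = 0.0 if overlap else min(d12, d21)
    return lam * M_min + (1.0 - lam) * M_max

comp = 1.0 - interval_distance((0.2, 0.4), (0.5, 0.7), lam=0.5)  # = 1 - 0.3
```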

Another way to view this measure of distance between the intervals is with the aid of the concept of possibility introduced by Zadeh [26]. Assume $A$ and $B$ are two subsets on the same space; then $\text{Poss}[A/B] = \max_z[E(z)]$ where $E = A \cap B$. In the situation of interest to us here, we will make use of $\text{Poss}[V/d]$. We observe here that

$$\text{Poss}[V/d] = 1, \quad \text{if } V \cap d \neq \emptyset$$

$$\text{Poss}[V/d] = 0, \quad \text{if } V \cap d = \emptyset.$$

Since

$$M_* = 0, \quad \text{if } V \cap d \neq \emptyset$$

$$M_* = \delta_{12} \wedge \delta_{21}, \quad \text{if } V \cap d = \emptyset$$

then we can express $M_* = (1 - \text{Poss}[V/d])(\delta_{12} \wedge \delta_{21})$. With $\text{Dist}(V, d) = \lambda M_* + \bar{\lambda} M^*$, we get

$$\text{Dist}(V, d) = \lambda\left(1 - \text{Poss}[V/d]\right)(\delta_{12} \wedge \delta_{21}) + \bar{\lambda}(\delta_{12} \vee \delta_{21}).$$

Recalling [26] that $\text{Cert}[\bar{V}/d] = 1 - \text{Poss}[V/d]$, we have

$$\text{Dist}(V, d) = \lambda\, \text{Cert}[\bar{V}/d](\delta_{12} \wedge \delta_{21}) + \bar{\lambda}(\delta_{12} \vee \delta_{21}).$$

In the case with $\lambda = 1$, $\text{Dist}(V, d) = \text{Cert}[\bar{V}/d](\delta_{12} \wedge \delta_{21})$. Given $d$, if we are certain that $V$ is not true, then we take $\delta_{12} \wedge \delta_{21}$; otherwise we take zero.

Let us now investigate the effect of imprecision in the observation on the compatibility, i.e., the distance. Consider $V(i) = [L_1, U_1]$ and an observation $d(i) = [L_2, U_2]$. Here, then, the specificity of the observation is $1 - \Delta_2$, where $\Delta_2 = U_2 - L_2$. As we noted, by increasing the imprecision, i.e., increasing $\Delta_2$, we cause an increase in the imprecision in the final value $[L_3, U_3]$.

As we just showed

$$\text{Dist}(V, d) = \lambda M_* + \bar{\lambda} M^*$$

$$\text{Dist}(V, d) = \lambda\left(1 - \text{Poss}[V/d]\right)(\delta_{12} \wedge \delta_{21}) + \bar{\lambda}(\delta_{12} \vee \delta_{21}).$$

Let $d' = [L_2', U_2']$ where $L_2' \leq L_2$ and $U_2' \geq U_2$. That is, $d \subseteq d'$; the interval has increased. Let $M_*$ be the shortest distance between points in $d$ and $V$, and let $M_*'$ be the shortest distance between points in $d'$ and $V$. It is easy to see that $M_*' \leq M_*$. Thus, increasing the imprecision in $d$ can only serve to decrease the shortest distance. The situation with respect to $M^*$ is more complex; in some cases, increasing the imprecision in the observation can cause an increase in $M^*$ and in other cases can cause a decrease. What we can say with certainty is that if $\lambda = 1$, then an increase in imprecision will tend to increase the compatibility.

Our earlier analyses made it clear that observations that are overly imprecise can have negative effects on the learning process. They can, by increasing the imprecision of what we learn, result in a decrease in the usefulness of our knowledge. They can also complicate the process of trying to determine the compatibility of the current belief with the observation. This situation leads us to feel that some precaution should be taken in the learning process to try to control the effects of imprecise data. In the following, we suggest a revised version of our basic learning algorithm to try to accomplish this in the case of interval observations.

We first start with the determination of the compatibility of the observation with the current belief, which we suggested calculating as $\rho_k = (1/n)\sum_{i=1}^{n} \text{Comp}_i(V_{k-1}(i), d_k(i))$ where $\text{Comp}_i(V_{k-1}(i), d_k(i)) = 1 - \text{Dist}(V_{k-1}(i), d_k(i))$. Letting $M^*_k(i)$ and $M_{*k}(i)$ be the longest and shortest distances between points in the observation and the current value of $X(i)$, we have

$$\text{Dist}\left(V_{k-1}(i), d_k(i)\right) = \lambda M_{*k}(i) + \bar{\lambda} M^*_k(i)$$

where $\lambda \in [0, 1]$ is a parameter associated with the learning agent's attitude.

We then use this to calculate

$$V_k(i) = V_{k-1}(i) + \alpha \rho_k^{(1-a_k)} \gamma_k(i)\left(d_k(i) - V_{k-1}(i)\right).$$

Here, $\gamma_k(i)$ is the term that is being inserted to control the effect of the imprecision in the observation $d_k(i)$. As we shall subsequently describe, $\gamma_k(i)$ will be some function of the relationship between $\text{Sp}(d_k(i))$ and $\text{Sp}(V_{k-1}(i))$.

Finally, we turn to the calculation of $a_k$, the arousal rate. Here, we shall use

$$a_k = a_{k-1} + \beta\, \psi_k\left(\bar{\rho}_k - a_{k-1}\right)$$

where $\bar{\rho}_k = 1 - \rho_k$ and $\psi_k$ is calculated as

$$\psi_k = \frac{1}{n}\sum_{i=1}^{n} \text{Sp}(d_k(i))$$

which is the average specificity of the observations. The role of $\psi_k$ is to reduce the effect on the arousal level of observations that are extremely imprecise. That is, we do not want to change our arousal rate in the face of an extremely imprecise observation.

Fig. 13. Normal trapezoidal fuzzy subset.

One possible way to calculate $\gamma_k(i)$ is based upon the comparison of the specificities of $d_k(i)$ and $V_{k-1}(i)$. In particular, we can use the function

$$\gamma_k(i) = \frac{\text{Sp}(d_k(i))}{\text{Sp}(V_{k-1}(i))}, \quad \text{if } \text{Sp}(d_k(i)) \leq \text{Sp}(V_{k-1}(i))$$

$$\gamma_k(i) = 1, \quad \text{if } \text{Sp}(d_k(i)) > \text{Sp}(V_{k-1}(i)).$$

In some ways, the lack of specificity of an observation is a reflection of the learner's confidence in the observation.¹
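Putting the pieces together, a sketch of one iteration of this revised interval-learning scheme might look as follows. It assumes the unit interval, repeats the interval distance helper from the preceding section, and uses the specificity-damped arousal form given above; all names are illustrative.

```python
import numpy as np

def spec(L, U):
    """Specificity of an interval in [0, 1]: Sp([L, U]) = 1 - (U - L)."""
    return 1.0 - (U - L)

def interval_distance(V, d, lam):
    (L1, U1), (L2, U2) = V, d
    d12, d21 = abs(U1 - L2), abs(U2 - L1)
    overlap = max(L1, L2) <= min(U1, U2)
    return lam * (0.0 if overlap else min(d12, d21)) + (1.0 - lam) * max(d12, d21)

def revised_interval_step(V, D, a, alpha=0.5, beta=0.1, lam=0.5):
    """One iteration with interval beliefs V and interval observations D.

    V, D : lists of (L, U) pairs, one per variable; a : arousal a_{k-1}.
    """
    n = len(V)
    rho = float(np.mean([1.0 - interval_distance(V[i], D[i], lam)
                         for i in range(n)]))
    # Arousal change damped by psi_k, the average observation specificity.
    psi = float(np.mean([spec(*D[i]) for i in range(n)]))
    a_new = a + beta * psi * ((1.0 - rho) - a)
    delta = alpha * rho ** (1.0 - a_new)
    V_new = []
    for (L1, U1), (L2, U2) in zip(V, D):
        # gamma_k(i) discounts observations less specific than the belief.
        gamma = min(1.0, spec(L2, U2) / max(spec(L1, U1), 1e-12))
        g = gamma * delta
        V_new.append((L1 + g * (L2 - L1), U1 + g * (U2 - U1)))
    return V_new, a_new
```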

VII. LEARNING TRAPEZOIDAL VALUES

We now turn to the case when we allow both our observations and learned values to be granules. We shall restrict ourselves to the case when these granules are normal trapezoidal fuzzy subsets of the type shown in Fig. 13.

Our focus on trapezoidal fuzzy subsets stems from the fact that they provide very suitable objects for the representation of granular information in our application. This suitability is based on at least two important features associated with the trapezoidal fuzzy subset. One, as we shall subsequently see, is the ease with which they can be manipulated by the required mathematical operations needed in the learning algorithms. A second is related to the information input process, where the user must provide information about the observations. Starting with some imprecise perception or description of the value of the current observation, the user must represent it in terms of some formal granular object. As noted by Zadeh [27], this representation step affords the user some degree of freedom in their selection of the representing granule. Various considerations can affect a user's choice of representation. Foremost among these is what Zadeh [27] calls cointention, the ability of the representing object to convey the meaning of the concept it is being used to represent. Another important consideration is the ease with which any needed parameters can be acquired. An additional consideration is the ease with which the representing object can be manipulated in the context of the application. With this understanding, we now see some of the benefits of the trapezoidal fuzzy sets. First, we see how the form of the trapezoidal fuzzy set allows it to model a wide class of imprecise observations. Another beneficial feature of the trapezoid is the ease of acquiring the parameters. Here, we only need four parameters, all of which are related to real features that are not cognitively complex.

¹Implicitly, we have been assuming that X(i) is a variable, such as temperature, that has an exact value, and that imprecision in an observation is a reflection of a lack of quality. There exist other types of variables whose actual values can be imprecise. Such a variable can be a rating or scoring of some criteria.


As we indicated, a trapezoidal fuzzy subset is completely determined by four parameters $a$, $b$, $c$, and $d$. If $a = b$ and $c = d$, then we get the special case of an interval fuzzy subset. If $a = b = c = d$, then we have a simple point value. If $b = c$, then we get a triangle.

Here, in addition to using trapezoids to represent our observations, we are also using trapezoids to represent our learned values. This has the simplifying effect of just requiring us to obtain the four defining parameters associated with the learned trapezoid.

We can view a trapezoidal fuzzy set $F$ as being described by two intervals $[b, c]$ and $[a, d]$. These are actually two specific crisp sets associated with $F$, the core and support. We recall that $\text{Core}(F) = [b, c]$ and the support $\text{Supp}(F) = [a, d]$. The core of $F$ is the set of values with full membership in $F$, while the support is the set of elements with any membership in $F$. The membership grade of an element can be easily determined from these sets

$$F(x) = 0, \quad x \leq a$$

$$F(x) = \frac{x - a}{b - a}, \quad a \leq x \leq b$$

$$F(x) = 1, \quad b \leq x \leq c$$

$$F(x) = \frac{d - x}{d - c}, \quad c \leq x \leq d$$

$$F(x) = 0, \quad x \geq d.$$

Thus, having the core and support completely determines $F$. The area of a trapezoid is easy to calculate, and is given as

$$\text{Area}(F) = (c - b) + \frac{1}{2}(b - a) + \frac{1}{2}(d - c) = \frac{1}{2}(c - b) + \frac{1}{2}(d - a).$$

We say that the trapezoid is symmetric if $b - a = d - c$. If $F$ is symmetric, then

$$\text{Area}(F) = (c - b) + (b - a) = c - a = d - b.$$

Using the area, we can calculate the specificity [25] of a trapezoidal fuzzy set as

$$\text{Sp}(F) = 1 - \frac{\text{Area}(F)}{\text{Range}(X)}.$$

If the domain is the unit interval, $\text{Range}(X) = 1$, and then $\text{Sp}(F) = 1 - \text{Area}(F)$.

Let us now turn to the performance of the participatory learning model in the case where our observations and current belief are trapezoids. Again, our basic learning rule is

$$V_k(i) = V_{k-1}(i) + \alpha \rho_k^{(1-a_k)}\left(d_k(i) - V_{k-1}(i)\right).$$

Letting $\alpha \rho_k^{(1-a_k)} = \delta$ and suppressing the index $i$ when it is not needed, we have

$$V_k = V_{k-1} + \delta(d_k - V_{k-1}).$$

Letting $\bar{\delta} = 1 - \delta$, we can rewrite this as

$$V_k = \delta d_k + \bar{\delta} V_{k-1}.$$

It can be shown that if $d_k$ and $V_{k-1}$ are trapezoidal fuzzy subsets, then $V_k$ obtained by the previous linear operation will also be a trapezoid.

In the following, let $V_{k/S}$ denote the support interval and $V_{k/C}$ denote the core interval of $V_k$. In this trapezoidal environment, we can work with the core and support separately. In this case, we have that

$$V_{k/S} = \delta d_{k/S} + \bar{\delta} V_{k-1/S}$$

$$V_{k/C} = \delta d_{k/C} + \bar{\delta} V_{k-1/C}.$$

Using the indexing 1, 2, and 3 corresponding to $V_{k-1}$, $d_k$, and $V_k$, we have

$$V_{k-1/S} = [a_1, d_1] \qquad V_{k-1/C} = [b_1, c_1]$$

$$d_{k/S} = [a_2, d_2] \qquad d_{k/C} = [b_2, c_2]$$

$$V_{k/S} = [a_3, d_3] \qquad V_{k/C} = [b_3, c_3].$$

Putting these in the preceding equations, we have

$$V_{k/S} = \delta[a_2, d_2] + \bar{\delta}[a_1, d_1]$$

$$V_{k/C} = \delta[b_2, c_2] + \bar{\delta}[b_1, c_1].$$

Here, using interval arithmetic, we get

$$V_{k/S} = [a_3, d_3] = \left[\delta a_2 + \bar{\delta} a_1,\; \delta d_2 + \bar{\delta} d_1\right]$$

$$V_{k/C} = [b_3, c_3] = \left[\delta b_2 + \bar{\delta} b_1,\; \delta c_2 + \bar{\delta} c_1\right].$$

Here, we should strongly emphasize the simplicity of these operations. We just need to do a few simple arithmetic operations. We also retain the trapezoidal nature.
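Continuing the sketch above, the update is a parameter-wise convex combination (assuming the illustrative Trapezoid class from the previous sketch):

```python
def trapezoid_update(prev: Trapezoid, obs: Trapezoid, delta: float) -> Trapezoid:
    """V_k = delta * d_k + (1 - delta) * V_{k-1}, endpoint-wise on core/support."""
    mix = lambda p, q: delta * q + (1.0 - delta) * p
    return Trapezoid(mix(prev.a, obs.a), mix(prev.b, obs.b),
                     mix(prev.c, obs.c), mix(prev.d, obs.d))
```

The result is again a trapezoid, as the text notes.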

Consider the calculation of the area for this new trapezoid $F_3$ corresponding to $V_k$. Here

$$\text{Area}(F_3) = \frac{1}{2}(c_3 - b_3) + \frac{1}{2}(d_3 - a_3)$$

$$\text{Area}(F_3) = \frac{1}{2}\left(\delta c_2 + \bar{\delta} c_1 - \delta b_2 - \bar{\delta} b_1 + \delta d_2 + \bar{\delta} d_1 - \delta a_2 - \bar{\delta} a_1\right)$$

$$\text{Area}(F_3) = \delta \frac{1}{2}\left((c_2 - b_2) + (d_2 - a_2)\right) + \bar{\delta} \frac{1}{2}\left((c_1 - b_1) + (d_1 - a_1)\right)$$

$$\text{Area}(F_3) = \delta\, \text{Area}(F_2) + \bar{\delta}\, \text{Area}(F_1).$$

Furthermore

$$\text{Sp}(F_3) = 1 - \text{Area}(F_3) = 1 - \delta\, \text{Area}(F_2) - \bar{\delta}\, \text{Area}(F_1)$$

$$\text{Sp}(F_3) = \delta\, \text{Sp}(F_2) + \bar{\delta}\, \text{Sp}(F_1).$$

Thus, again, we see that if $F_2$, the trapezoidal fuzzy subset representing our observation $d_k$, is not very specific, then we may want to limit the effect of the current observation on the learning process.

As in the preceding situation, we can consider limiting the effect of observations that are overly imprecise by including a term in the calculation of $V_k$ to suppress the effect of such observations. In particular, for the $i$th variable, we can add the term $\gamma_k(i)$ and use the formula

$$V_k(i) = V_{k-1}(i) + \gamma_k(i)\, \alpha \rho_k^{(1-a_k)}\left(d_k(i) - V_{k-1}(i)\right).$$

Here, $\gamma_k(i) \in [0, 1]$ is defined as

$$\gamma_k(i) = \frac{\text{Sp}(d_k(i))}{\text{Sp}(V_{k-1}(i))}, \quad \text{if } \text{Sp}(d_k(i)) \leq \text{Sp}(V_{k-1}(i))$$

$$\gamma_k(i) = 1, \quad \text{if } \text{Sp}(d_k(i)) > \text{Sp}(V_{k-1}(i)).$$

More complex forms for $\gamma_k(i)$ can be considered.

We turn to the issue of determining the compatibility $\rho_k$ when we have trapezoidal observations and trapezoidal current beliefs. Here again, we shall assume, for computational simplicity, that we are restricted to the unit interval. This will allow us to use distances instead of similarity to calculate the compatibility. Thus, here, $\rho_k = (1/n)\sum_{i=1}^{n} \text{Comp}_i(V_{k-1}(i), d_k(i))$. Restricting ourselves to the case where the fuzzy subsets are restricted to the unit interval, then

$$\rho_k = \frac{1}{n}\sum_{i=1}^{n}\left(1 - \text{Dist}\left(V_{k-1}(i), d_k(i)\right)\right)$$

$$\rho_k = 1 - \frac{1}{n}\sum_{i=1}^{n} \text{Dist}\left(V_{k-1}(i), d_k(i)\right).$$

So, then, the issue becomes determining the distance between two trapezoidal fuzzy subsets. We should note that this problem is closely related to the issue of comparing clusters in hierarchical clustering [28].

One approach to calculating the distance between two trapezoidal fuzzy sets is to calculate the distance between their centers. Assume $F$ is a trapezoid, as shown in Fig. 13, with parameters $a$, $b$, $c$, and $d$. We see that for these fuzzy sets, we can calculate the center of the core interval

$$M_1 = \frac{b + c}{2}$$

and the center of the support

$$M_0 = \frac{a + d}{2}.$$

Using these and the trapezoidal nature of $F$, we can calculate the center $M_\alpha$ of every $\alpha$-level set as

$$M_\alpha = (M_1 - M_0)\alpha + M_0.$$

To find the midpoint of $F$, we calculate

$$\text{Mid}(F) = \int_0^1 M_\alpha\, d\alpha = \int_0^1 \left((M_1 - M_0)\alpha + M_0\right) d\alpha$$

$$\text{Mid}(F) = \left.\left((M_1 - M_0)\frac{\alpha^2}{2} + M_0\alpha\right)\right|_0^1$$

$$\text{Mid}(F) = \frac{1}{2}(M_1 + M_0) = \frac{1}{4}(a + b + c + d).$$

Using this, if $E$ and $F$ are two trapezoids with parameters $(a_1, b_1, c_1, d_1)$ and $(a_2, b_2, c_2, d_2)$, respectively, then the distance between them is

$$\text{Dist}(E, F) = \left|\text{Mid}(E) - \text{Mid}(F)\right|$$

$$\text{Dist}(E, F) = \frac{1}{4}\left|(a_1 + b_1 + c_1 + d_1) - (a_2 + b_2 + c_2 + d_2)\right|.$$

We can also calculate the distance between trapezoids based on the minimal and maximal distances between the trapezoids. In this regard, it is important to emphasize that if $F$ is a trapezoid with core $F_1$ and support $F_0$, then $F_1 \subseteq F_0$; the core is contained in the support.

Let $E$ and $F$ be two trapezoidal fuzzy sets. Let $\text{Min}[E, F]$ and $\text{Max}[E, F]$ denote the minimal and maximal distances between these two trapezoids.

We shall let $\text{Min}_\alpha(E, F)$ denote the minimal distance between the $\alpha$-level sets of $E$ and $F$. Similarly, we shall use $\text{Max}_\alpha(E, F)$ to denote the maximal distance between their $\alpha$-level sets.

We first focus on the determination of the minimal distance. The minimal distance between $E$ and $F$ is

$$\text{Min}[E, F] = \int_0^1 \text{Min}_\alpha(E, F)\, d\alpha.$$

We see that $\text{Min}_1[E, F] = \text{Min}[E_1, F_1]$ is the minimal distance between the two cores. Similarly, $\text{Min}_0[E, F] = \text{Min}[E_0, F_0]$ is the minimal distance between the two supports. Now, we must determine $\text{Min}_\alpha[E, F]$. We must consider three cases:
1) $E_1 \cap F_1 \neq \emptyset$;
2) $E_0 \cap F_0 = \emptyset$;
3) $E_1 \cap F_1 = \emptyset$ and $E_0 \cap F_0 \neq \emptyset$.
We look at these in turn.
1) $E_1 \cap F_1 \neq \emptyset$. In this case, since $E_1 \subseteq E_\alpha$ and $F_1 \subseteq F_\alpha$ for all $\alpha$, then $E_\alpha \cap F_\alpha \neq \emptyset$. Thus, here $\text{Min}_\alpha(E, F) = 0$ for all $\alpha$.
2) $E_0 \cap F_0 = \emptyset$. This implies that $E_1 \cap F_1 = \emptyset$. In this case,

$$\text{Min}_\alpha[E, F] = \alpha\, \text{Min}[E_1, F_1] + \bar{\alpha}\, \text{Min}[E_0, F_0]$$

where $\bar{\alpha} = 1 - \alpha$. Assuming that $E$ is to the left of $F$, this becomes

$$\text{Min}_\alpha[E, F] = \alpha(b_2 - c_1) + \bar{\alpha}(a_2 - d_1).$$

3) $E_1 \cap F_1 = \emptyset$ and $E_0 \cap F_0 \neq \emptyset$. In this case, we shall, without loss of generality, assume that $E_1$ is less than $F_1$ and $c_1 < b_2$.

The rightmost point of the $\alpha$-level set of $E$, $R\text{-}E_\alpha$, as a function of $\alpha$ is $R\text{-}E_\alpha = \alpha c_1 + \bar{\alpha} d_1$. The leftmost point of the $\alpha$-level set of $F$, $L\text{-}F_\alpha$, is $L\text{-}F_\alpha = \alpha b_2 + \bar{\alpha} a_2$. These trapezoids intersect where

$$\alpha c_1 + \bar{\alpha} d_1 = \alpha b_2 + \bar{\alpha} a_2$$

$$\alpha(c_1 - d_1 - b_2 + a_2) = a_2 - d_1$$

and this intersection occurs at $\alpha^* = (a_2 - d_1)/(a_2 - d_1 + c_1 - b_2)$.

Using this, we get that

$$\text{Min}_\alpha[E, F] = 0, \quad 0 \leq \alpha \leq \alpha^*$$

$$\text{Min}_\alpha[E, F] = L\text{-}F_\alpha - R\text{-}E_\alpha, \quad \alpha^* \leq \alpha \leq 1.$$

Here, $L\text{-}F_\alpha - R\text{-}E_\alpha = \alpha(b_2 - c_1) + \bar{\alpha}(a_2 - d_1)$.

We now turn to the calculation of $\text{Max}(E, F)$. If $\text{Max}_\alpha(E, F)$ is the maximal distance between the $\alpha$-levels of $E$ and $F$, then

$$\text{Max}(E, F) = \int_0^1 \text{Max}_\alpha(E, F)\, d\alpha.$$

In this case

$$\text{Max}_\alpha(E, F) = \alpha\, \text{Max}(E_1, F_1) + \bar{\alpha}\, \text{Max}(E_0, F_0).$$

We can calculate

$$\text{Max}(E_1, F_1) = |b_1 - c_2| \vee |b_2 - c_1|$$

$$\text{Max}(E_0, F_0) = |a_1 - d_2| \vee |a_2 - d_1|.$$

Once having obtained $\text{Max}(E, F)$ and $\text{Min}(E, F)$, we can obtain

$$\text{Dist}(E, F) = \lambda\, \text{Min}(E, F) + \bar{\lambda}\, \text{Max}(E, F)$$

where $\lambda \in [0, 1]$ is a parameter capturing the learner's attitude.

Once having $\rho_k$, we can now update $a_k$ using the method usually used in participatory learning

$$a_k = a_{k-1} + \beta\left(\bar{\rho}_k - a_{k-1}\right).$$

Again, we can include information about the specificity of the observation to control the change in $a_k$. Here, with

$$\Psi_k = \frac{1}{n}\sum_{i=1}^{n} \gamma_k(i)$$

we calculate

$$a_k = a_{k-1} + \Psi_k\, \beta\left(\bar{\rho}_k - a_{k-1}\right).$$

VIII. ALTERNATIVE FORMS FOR INCLUDING AROUSAL RATE

Here, we shall discuss a possible variation of the basic updation model. In the updation equation, we indicated that the arousal rate is included by the term $\rho_k^{(1-a_k)}$. We see that if the arousal rate is zero, we get $\rho_k^{(1-a_k)} = \rho_k$. The effect of the observation is completely determined by the compatibility of the observation and the current belief. On the other hand, if the arousal rate is one, then $\rho_k^{(1-a_k)} = (\rho_k)^0 = 1$. In this case, we let the observation have the most influence. We also observe that $\rho_k^{(1-a_k)}$ is increasing in both $\rho_k$ and $a_k$. We also observe that if $\rho_k = 1$, then $\rho_k^{(1-a_k)} = 1$.

We can consider a more general expression of our updation equation by replacing $\rho_k^{(1-a_k)}$ with $F(\rho_k, a_k)$

$$V_k(i) = V_{k-1}(i) + \alpha F(\rho_k, a_k)\left(d_k(i) - V_{k-1}(i)\right).$$

Let us look at the requirements on $F(\rho_k, a_k)$ so that it functions like $\rho_k^{(1-a_k)}$. In the following, we shall suppress the subscripts and generically use $F(\rho, a)$. First, we note that both $\rho$ and $a \in [0, 1]$. We also require that $F(\rho, a)$ lie in the unit interval. From the preceding, we see that $F$ should have the following properties.
1) $F(\rho_1, a_1) \geq F(\rho_2, a_2)$ if $\rho_1 \geq \rho_2$ and $a_1 \geq a_2$.
2) $F(1, a) = F(\rho, 1) = 1$.
3) $F(\rho, 0) = \rho$.
The situation where $\rho = 0$ is open. Clearly, we want $F(0, 0) = 0$. Using $F(\rho, a) = \rho^{(1-a)}$, we get $F(0, 1) = 1$. Requiring that $F(0, a) = a$ is consistent with these observations. However, a less restrictive requirement is $0 \leq F(0, a) \leq a$.

In the case where we impose the condition that $F(0, a) = a$, then $F$ belongs to a known class of aggregation operators called conjunctors [29]. These conjunctors are closely related to logical "ORing" operators. Using this semantic, we see that the effect of the term $F(\rho, a)$ can be interpreted as allowing the use of the current observation for learning if it is compatible with the current belief "OR" we are aroused with respect to the reliability of our model.

We shall refer to F as the filtering function. We note that if F1 and F2 are two filtering functions such that F1(ρ, a) ≥ F2(ρ, a) for all ρ and a, then a learning agent using F1 is more open to learning than an agent using F2.

A special class of conjunctors are the t-conorm operators [30], [31], which have been used in fuzzy set theory to model the union of fuzzy sets and, more generally, to model the multivalued logic “OR” operator. A t-conorm has the additional property of symmetry, F(ρ, a) = F(a, ρ). It also has the property of associativity; however, associativity is superfluous in this environment since we are always considering binary operations. An important feature of t-conorms is that they have been extensively researched. A large number of examples of these operators are presented in [31]. Among the most notable are

F (ρ, a) = Max(ρ, a) (maximum)

F (ρ, a) = a + ρ − ρa (probabilistic sum)

F (ρ, a) = Min(ρ + a, 1) (bounded sum).

It is well known that Max is the smallest of the t-conorm operators.

A useful family of t-conorms was introduced in [32]. For this family

F (ρ, a) = Max(ρ, a), if Min[ρ, a] ≤ β

F (ρ, a) = 1, otherwise.

Here, we specify some level β; if both ρ and a are above this level, we completely let the observation have effect. If either ρ or a is at or below β, we use the maximum of ρ and a.
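In code, these candidate filtering functions are one-liners (our sketch; note that β here is the openness threshold of the family above, not the learning rate used earlier):

    f_max     = lambda r, a: max(r, a)           # maximum
    f_prob    = lambda r, a: r + a - r * a       # probabilistic sum
    f_bounded = lambda r, a: min(r + a, 1.0)     # bounded sum

    def f_threshold(beta):
        """Family from [32]: F = 1 once both rho and a exceed beta,
        otherwise fall back to the maximum."""
        return lambda r, a: 1.0 if min(r, a) > beta else max(r, a)

    F = f_threshold(0.6)
    print(F(0.7, 0.5))   # 0.7: a is at or below the level, so Max(rho, a) applies
    print(F(0.7, 0.8))   # 1.0: both above the level, the observation is fully admitted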

Another class of operators that can be used to formalize F are cocopulas [33]. While many cocopulas are t-conorms, many are not. An important distinction is that cocopulas are not necessarily symmetric. One example of an operator that is a cocopula but not a t-conorm is

F(ρ, a) = 1 − (1 − ρ)(1 − a)(1 + λρa)

for λ ∈ [−1, 1]. When λ = 0, we get the probabilistic sum. When λ = −1, we get F(ρ, a) = 1 − (1 − ρ)(1 − a)(1 − ρa).
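A sketch of this family, reading the formula as above (the function name is ours):

    def f_cocopula(lam):
        """F(rho, a) = 1 - (1 - rho)(1 - a)(1 + lam*rho*a), lam in [-1, 1];
        lam = 0 recovers the probabilistic sum."""
        assert -1.0 <= lam <= 1.0
        return lambda r, a: 1.0 - (1.0 - r) * (1.0 - a) * (1.0 + lam * r * a)

    print(f_cocopula(0.0)(0.5, 0.5))    # 0.75, the probabilistic sum
    print(f_cocopula(-1.0)(0.5, 0.5))   # 0.8125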

The important point we want to emphasize here is that bychoosing different forms for F , we can model different types oflearning agents.

IX. CONCLUSION

We described the PLP in this paper. This paradigm, inspired by human learning, can be seen as addressing the plasticity–stability dilemma noted by Grossberg [34] in that it allows the learning of new knowledge in a stable fashion. A notable feature associated with this paradigm is the use of the compatibility between observations and the current belief to modulate the learning. Another feature of this paradigm is the use of two levels of learning: the first, constrained by the current beliefs, dealing with the immediate task of learning from the current observation, and the second, independent of the current belief system, observing the performance of the current belief system. A formal system implementing this type of learning agent was described. This system involved equations for updation, for determining the compatibility of the observation with the current belief, and for calculating the arousal level. We extended this system so that it can learn from interval-type observations. We then further extended this system to the case where the observation is a fuzzy set. In the next portion, we allowed both the observations and the learned object to be granular. An important issue that arises when learning granular values is related to the specificity of the learned value. Learned values that are too unspecific can be useless. We suggested methods for controlling the specificity of the values learned. This has been accomplished by discounting observations that are too unspecific. We should note that specific observations, whether compatible or not with the current belief, play a role in the learning process.

We see that this PLP can be used in at least two modes. One mode is as a methodology for building learning algorithms such as those used in data mining. The second mode is as a framework for building intelligent agents. Here, the values assigned to the parameters will allow us to implement agents of different personality types. We note that while we have described, at a granular level, the effect of different choices for the various parameters, we feel that there would be benefit from both experimental and analytic research providing a more detailed understanding of the effect of different choices for the values of the parameters associated with this model.

REFERENCES

[1] R. R. Yager, “A model of participatory learning,” IEEE Trans. Syst., Man, Cybern., vol. 20, no. 5, pp. 1229–1234, Sep./Oct. 1990.
[2] R. R. Yager, “Participatory learning: A paradigm for building better digital and human agents,” Law, Probab. Risk, vol. 3, pp. 133–145, 2004.
[3] R. R. Yager, “Extending the participatory learning paradigm to include source credibility,” Fuzzy Optim. Decis. Making, vol. 6, pp. 85–97, 2007.
[4] L. A. Zadeh, “Toward a perception-based theory of probabilistic reasoning with imprecise probabilities,” J. Stat. Planning Inference, vol. 105, pp. 233–264, 2002.
[5] L. A. Zadeh, “From imprecise to granular probabilities,” Fuzzy Sets Syst., vol. 154, no. 3, pp. 370–374, Sep. 2005.
[6] W. V. O. Quine, “Two dogmas of empiricism,” Philos. Rev., vol. 60, pp. 20–43, 1951.
[7] W. V. O. Quine, From a Logical Point of View. Cambridge, MA: Harvard Univ. Press, 1953.
[8] R. R. Yager and K. M. Ford, “Participatory learning: A constructionist model,” in Proc. 6th Int. Workshop Mach. Learn., 1989, pp. 420–423.
[9] R. R. Yager and D. Z. Zhang, “Effective suggestion based on participatory learning,” Expert Syst.: Res. Appl., vol. 7, pp. 423–432, 1994.
[10] A. Kaufmann, Introduction to the Theory of Fuzzy Subsets, vol. I. New York: Academic, 1975.
[11] T. Y. Lin, “Granular computing: From rough sets and neighborhood systems to information granularization and computing with words,” in Proc. Eur. Congr. Intell. Tech. Soft Comput., 1997, pp. 1602–1606.
[12] T. Y. Lin, Y. Y. Yao, and L. A. Zadeh, Data Mining, Rough Sets and Granular Computing. Heidelberg, Germany: Physica-Verlag, 2002.
[13] A. Bargiela and W. Pedrycz, Granular Computing: An Introduction. Amsterdam, The Netherlands: Kluwer, 2003.
[14] R. R. Yager, “Perception based granular probabilities in risk modeling and decision making,” IEEE Trans. Fuzzy Syst., vol. 14, no. 2, pp. 329–339, Apr. 2006.
[15] L. A. Zadeh, “Fuzzy sets,” Inf. Control, vol. 8, pp. 338–353, 1965.
[16] L. A. Zadeh, “The concept of a linguistic variable and its application to approximate reasoning: Part 1,” Inf. Sci., vol. 8, pp. 199–249, 1975.
[17] L. A. Zadeh, “Similarity relations and fuzzy orderings,” Inf. Sci., vol. 3, pp. 177–200, 1971.
[18] C. V. Negoita and D. Ralescu, Applications of Fuzzy Sets to Systems Analysis. New York: Wiley, 1975.
[19] R. R. Yager, “A procedure for ordering fuzzy subsets of the unit interval,” Inf. Sci., vol. 24, pp. 143–161, 1981.
[20] D. Dubois and H. Prade, “Measuring properties of fuzzy sets: A general technique and its use in fuzzy query evaluation,” Fuzzy Sets Syst., vol. 38, pp. 137–152, 1990.
[21] G. Choquet, “Theory of capacities,” Annales de l’Institut Fourier, vol. 5, pp. 131–295, 1953.
[22] R. R. Yager, “Entropy and specificity in a mathematical theory of evidence,” Int. J. Gen. Syst., vol. 9, pp. 249–260, 1983.
[23] R. R. Yager, “On measures of specificity,” in Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration With Applications, O. Kaynak, L. A. Zadeh, B. Turksen, and I. J. Rudas, Eds. Berlin, Germany: Springer-Verlag, 1998, pp. 94–113.
[24] G. J. Klir, Uncertainty and Information. New York: Wiley, 2006.
[25] R. R. Yager, “Measures of specificity over continuous spaces under similarity relations,” Mach. Intell. Inst., Iona College, New Rochelle, NY, Tech. Rep. MII-2707, 2007.
[26] L. A. Zadeh, “Fuzzy sets as a basis for a theory of possibility,” Fuzzy Sets Syst., vol. 1, pp. 3–28, 1978.
[27] L. A. Zadeh, “Generalized theory of uncertainty (GTU)-principal concepts and ideas,” Comput. Stat. Data Anal., vol. 51, pp. 15–46, 2006.
[28] R. R. Yager, “Intelligent control of the hierarchical agglomerative clustering process,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 30, no. 6, pp. 835–845, Dec. 2000.
[29] S. Janssens, B. De Baets, and H. De Meyer, “Bell-type inequalities for quasi-copulas,” Fuzzy Sets Syst., vol. 148, pp. 263–278, 2004.
[30] C. Alsina, E. Trillas, and L. Valverde, “On some logical connectives for fuzzy set theory,” J. Math. Anal. Appl., vol. 93, pp. 15–26, 1983.
[31] E. P. Klement, R. Mesiar, and E. Pap, Triangular Norms. Norwell, MA: Kluwer, 2000.
[32] R. R. Yager, “Generalized triangular norm and conorm aggregation operators on ordinal spaces,” Int. J. Gen. Syst., vol. 32, pp. 475–490, 2003.
[33] R. B. Nelsen, An Introduction to Copulas. New York: Springer-Verlag, 1999.
[34] S. Grossberg, “Adaptive pattern classification and universal recoding, I: Parallel development and coding of neural feature detectors,” Biol. Cybern., vol. 23, pp. 121–134, 1976.


Ronald R. Yager (S’66–M’68–SM’93–F’97) received the B.E.E. degree from the City College of New York, New York, and the Ph.D. degree from the Polytechnic University of New York, New York.

He was a National Aeronautics and Space Administration (NASA)/Stanford Visiting Fellow and a Research Associate at the University of California, Berkeley. He has been a Lecturer at North Atlantic Treaty Organization (NATO) Advanced Study Institutes. He is currently the Director of the Machine Intelligence Institute, Iona College, New Rochelle, NY, where he is also a Professor of information and decision technologies. He is the Editor-in-Chief of the International Journal of Intelligent Systems. He serves on the Editorial Boards of a number of journals, including Neural Networks, Data Mining and Knowledge Discovery, Fuzzy Sets and Systems, Journal of Approximate Reasoning, and International Journal of General Systems. He has been engaged in the area of fuzzy sets and related disciplines of computational intelligence for over 25 years. In addition to his pioneering work in the area of fuzzy logic, he has made fundamental contributions in decision making under uncertainty and the fusion of information. He has authored or coauthored over 500 published papers and 15 books.

Prof. Yager was the recipient of the IEEE Computational Intelligence Society Pioneer Award in Fuzzy Systems. He is a Fellow of the New York Academy of Sciences and the Fuzzy Systems Association. He was given an award by the Polish Academy of Sciences for his contributions. He was the Program Director in the Information Sciences program at the National Science Foundation. He is a member of the Editorial Boards of a number of journals, including the IEEE TRANSACTIONS ON FUZZY SYSTEMS and IEEE INTELLIGENT SYSTEMS.