
Undesirable effects of output normalization in multiple classifier systems

Hakan Altınçay *, Mübeccel Demirekler

Department of Computer Engineering, Eastern Mediterranean University, Gazi Mağusa, KKTC, Mersin 10, Turkey

Department of Electrical and Electronics Engineering, Middle East Technical University, P.K. 06531, Ankara, Turkey

Received 5 December 2001; received in revised form 20 May 2002

Abstract

Incomparability of the classifier output scores is a major problem in the combination of different classification systems. In order to deal with this problem, the measurement level classifier outputs are generally normalized. However, empirical results have shown that output normalization may lead to some undesirable effects. This paper presents analyses of the most frequently used normalization methods and shows that the main reason for these undesirable effects of output normalization is the dimensionality reduction in the output space. An artificial classifier combination example and a real-data experiment are provided where these effects are further clarified.

© 2002 Elsevier Science B.V. All rights reserved.

Keywords: Output score normalization; Dimensionality reduction; Class separability; Output post-processing; Measurement level classifier combination

1. Introduction

Extensive research has been carried out in the last decade on the use of multiple classifier systems (MCSs) for complex classification problems, and their potential for performance improvement has been proven. Plenty of different combination methods have been proposed, and it has been shown that, with the use of a set of classifiers providing complementary information for each other, the classification accuracy can be highly improved (Xu et al., 1992; Battiti and Colla, 1994; Bloch, 1996; Benediktsson and Swain, 1992; Kittler et al., 1998; Jain et al., 1999).

MCSs are generally categorized according to the levels of the classifier outputs. Abstract level MCSs take into account the most likely pattern class provided by each classifier. A well known combination approach for this category is majority voting (Ho et al., 1994). Rank level MCSs make use of a ranked list of pattern classes, where the ranking is based on decreasing likelihood. The Borda count method is the most frequently used rank level combination approach. In measurement level combination, the likelihood values of the pattern classes provided by classifiers that are based on a parametric representation like


Gaussian mixture modeling (Reynolds and Rose, 1995), or the cumulative distance values provided by a non-parametric modeling approach like vector quantization, are used (Campbell, 1997; Chen et al., 1997). The linear combination method, where the output vectors from different classifiers are added, and Bayesian combination with the independence assumption, where the classifier outputs are multiplied, are typical measurement level combination methodologies (Altınçay and Demirekler, 2000).
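To make these two rules concrete, the following is a minimal Python sketch (ours, not from the paper); the function names and the toy numbers are illustrative assumptions.

```python
import numpy as np

def linear_combination(outputs):
    """Sum rule: add the output vectors of all classifiers and pick the class with the largest sum."""
    return int(np.argmax(np.sum(outputs, axis=0)))

def bayesian_combination(outputs):
    """Product rule under the independence assumption: multiply the output vectors element-wise."""
    return int(np.argmax(np.prod(outputs, axis=0)))

# Two classifiers, three classes: each row is one classifier's output vector.
outputs = np.array([[0.80, 0.200, 0.000],
                    [0.05, 0.475, 0.475]])

# On this toy input the two rules even disagree: the sum rule picks class index 0,
# while the product rule penalizes the very small first entry of classifier 2 and picks index 1.
print(linear_combination(outputs))    # 0
print(bayesian_combination(outputs))  # 1
```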

Among the three levels of classifier outputs, the measurement level is the one conveying the greatest amount of information about the relative degree to which each particular class may or may not be the correct one, and this information can be quite useful during combination. For example, when the measurement values of all classes are very close to each other, the classifier may be considered as not being sure about its most likely pattern class. Consider the case of three pattern classes and two different output vectors p1 = [1.5, 0.0, 0.0] and p2 = [0.11, 0.10, 0.10]. Let the correct class be the first class. For abstract or rank based combination approaches, these outputs cannot be differentiated from each other. However, the first output vector conveys strong evidence that the correct class is the first one, which is not the case for the second. Suppose instead that the outputs p1 = [0.0, 2.0, 0.0] and p2 = [0.10, 0.11, 0.10] are obtained when the first class is tested. For the abstract or rank level combination approaches, both outputs have equivalent effects in the combination operation, whereas, in the case of measurement level combination, the erroneous information coming from the second output vector can be much more easily compensated by another correct output.

These discussions mainly emphasize the advantages of using measurement level classifier outputs in combination. However, a major problem in this approach is the incomparability of the classifier outputs (Ho et al., 1994). Classifiers based on parametric modeling provide likelihood values, whereas non-parametric classifiers provide cost or distance values. Also, different classifiers may depend on different feature vectors, and the dynamic ranges of these vectors are not generally the same. As a matter of fact, the scales of the outputs from different classifiers are incomparable and need preprocessing before combination (Ho et al., 1994).

In order to tackle this problem, basically two different approaches are generally used. The first approach is based on density estimation in the output space. For this purpose, either non-parametric techniques like Parzen windows or k-nearest neighbors, or parametric techniques where the density of the outputs for each class is approximated by a multi-dimensional Gaussian, are considered (Denker and leCun, 1991; Duin and Tax, 1998; Giacinto and Roli, 1999; Woods et al., 1997). These density estimates are later used to convert the actual outputs into probabilities. In this approach, the classifiers can be considered as preprocessors producing more regular distributions than the original input space, or the density estimation can be treated as a statistical post-processor (Denker and leCun, 1991).

The second approach is based on normalization of the classifier outputs so that they satisfy the axioms of probability (Brunelli and Falavigna, 1995; Huang and Suen, 1994; Chen et al., 1997; Tax et al., 2000). However, output normalization may result in quite undesirable effects, and hence the combined system may provide a smaller correct classification accuracy than the best individual classifier (Huang and Suen, 1994; Battiti and Colla, 1994).

In this paper, the output normalization problem is addressed and the resultant effects of some normalization techniques are analyzed. It is shown that, if not carefully done, normalization may remove the valuable measurement level information from the classifier outputs, even converting the problem into abstract level combination. In Section 2, analyses of the three most frequently used normalization methods are presented. An artificial classifier combination example and a real-data experiment illustrating the undesirable effects of output normalization are provided in Section 3. A series of conclusions are drawn in Section 4.

2. Analyses of normalization methods

Consider an N-class pattern classification problem where the actual classifier outputs are


denoted by x = [x1, x2, ..., xN], and let x' = [x'1, x'2, ..., x'N] denote the normalized form of x. Output normalization can be considered as a transformation x → x' that should satisfy some constraints, summarized as follows (Huang and Suen, 1994); a small sketch checking these properties is given after the list:

• The transformed outputs are desired to be in the interval [0, 1], i.e. 0 ≤ x'i ≤ 1.
• The summation of all of the transformed classifier outputs should be equal to 1, i.e. x'1 + x'2 + ... + x'N = 1.
• The ranking of the pattern classes should not be modified by the transformation, i.e. if xi < xj then x'i < x'j.
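As an illustrative aside (not part of the original paper), the following Python sketch checks these three properties for a candidate normalization function; the helper name `check_normalization` and the use of NumPy are assumptions of this sketch.

```python
import numpy as np

def check_normalization(normalize, x, tol=1e-9):
    """Check the three constraints of Huang and Suen (1994) for one output vector."""
    x = np.asarray(x, dtype=float)
    x_prime = normalize(x)

    in_unit_interval = np.all((x_prime >= -tol) & (x_prime <= 1 + tol))  # 0 <= x'_i <= 1
    sums_to_one = np.isclose(x_prime.sum(), 1.0)                         # sum of x'_i equals 1
    # Ranking preserved: the ordering of the classes must not change (argsort used as a proxy).
    rank_preserved = np.array_equal(np.argsort(x), np.argsort(x_prime))

    return bool(in_unit_interval and sums_to_one and rank_preserved)

# Example with output sum normalization (Eq. (1) below, generalized to N classes).
sum_normalize = lambda x: x / x.sum()
print(check_normalization(sum_normalize, [1.5, 0.0, 0.0]))     # True
print(check_normalization(sum_normalize, [0.11, 0.10, 0.10]))  # True
```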

In this context, it is assumed that the larger measurement value a pattern class has, the more likely it is to be the correct class. Consider a 2-class problem where the entries of the output vector x = [x1, x2] denote the measurement values for classes 1 and 2, respectively. Let x' = [x'1, x'2] denote the output vector after output normalization, where x'1 + x'2 = 1.0. A scatter plot of the actual outputs of a classifier is given in Fig. 1(a). Assume that the outputs denoted by 'o' correspond to the samples of pattern class 1, whereas the outputs denoted by '+' correspond to the samples of class 2. Consider the very general case where there is no constraint on the outputs of the classifiers, so that the measurement value obtained for a class can be any positive real number. As seen in the figure, the outputs for different classes do not overlap. Hence, the classes are well separated, which means that an optimum classification rule can be designed so that, if the test data are similarly distributed, perfect recognition can be obtained (Duda et al., 2000; Theodoridis and Koutroumbas, 1999).

Assume that the classifier is going to be used

together with some others in an MCS and, for this purpose, the constraint that the sum of the elements of the output vector should be equal to one is satisfied by setting x'1 = x1/(x1 + x2) and x'2 = x2/(x1 + x2). In order to see the resultant effect of this normalization, consider the scatter plot of the normalized outputs given in Fig. 1(b). As seen in the figure, the actual output space of the classifier is mapped onto the line x1 + x2 = 1.0 in the unit square, and hence the dimension of the output space is reduced by one. This reduction in dimension results in a loss of information: the non-overlapping and separable (or well separated) actual classifier outputs in the region x1 > x2 overlap on the transformed line, as seen in Fig. 1(b). In mathematical terms, normalization has the resultant effect of a many-to-one mapping. The dimensionality reduction occurs for larger numbers of classes as well. For instance, for the case of three classes, the three-dimensional output space is mapped onto the two-dimensional plane x1 + x2 + x3 = 1.

Fig. 1. A typical output space of a classifier with two classes: 'o' denotes class 1 and '+' denotes class 2 in (a); (b) shows the location of the outputs after normalization, which has the resulting effect of overlapping originally separable pattern classes.

Another important aspect of classifier normalization is the deformation of the outputs. In order to clarify this, consider two different outputs of the


same classifier, [2.0, 6.0] and [0.1, 0.3]. These outputs will be mapped onto the same point [0.25, 0.75] if each element of these vectors is normalized by the sum of the elements of the corresponding vector. In other words, these outputs will be interpreted as the same after normalization. However, do these outputs mean the same thing from the classifier's point of view, so that they should be treated this way during combination? Similarly, it should also be questioned whether the actual outputs would have the same effect during combination. Although the effect depends on the combination rule, it is quite natural to predict that they would not. For instance, the actual forms of these outputs do not have an equivalent effect in linear combination.

Both of the concepts mentioned above are important aspects of output normalization in MCSs that deserve further attention. In this context, the main interest here is the analysis of the dimensionality reduction effect of classifier output normalization. In the following subsections, the three most frequently used normalization techniques are analyzed and the resultant overlapping behavior in the output space is formulated. The analysis can easily be extended to some other approaches described in (Chen et al., 1997).

2.1. Output sum normalization method

In this method, the normalized values of the classifier outputs are calculated as (Xu et al., 1992; Kittler et al., 1998)

$$x'_1 = \frac{x_1}{x_1 + x_2}, \qquad x'_2 = \frac{x_2}{x_1 + x_2} \qquad (1)$$

Consider a particular normalized output [x10, x20]. The set of classifier outputs whose normalized values will be identical to this particular one can be calculated by the simultaneous solution of the above equations for x'1 = x10 and x'2 = x20. After some manipulations, the solution is obtained as x1(1 − x10) = x10 x2, or equivalently x2 = (x20/x10) x1. The solution set includes infinitely many outputs lying on the line described by this equation. The resultant effect of this normalization is illustrated in Fig. 2. As seen in the figure, all the output vectors lying on the line x2 = 1.5 x1 are mapped onto the point [0.4, 0.6]. In the general case, all the classifier outputs on a line passing through the origin are mapped onto the point of intersection between that line and the line x1 + x2 = 1.0. Hence, the output [0.20, 0.30] in the figure cannot be differentiated from [0.30, 0.45] after normalization.

Fig. 2. Transformation and dimensionality reduction effect in the output sum normalization method: all classifier outputs along the line x2 = 1.5 x1 are mapped onto the intersection point (0.4, 0.6) on the line x1 + x2 = 1.0.
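A small numeric check of this many-to-one behaviour, given as an illustrative sketch rather than code from the paper:

```python
import numpy as np

def sum_normalize(x):
    """Output sum normalization, Eq. (1): divide each output by the sum of all outputs."""
    x = np.asarray(x, dtype=float)
    return x / x.sum()

# Both vectors lie on the line x2 = 1.5 * x1, so they collapse onto the same point [0.4, 0.6].
print(sum_normalize([0.20, 0.30]))  # [0.4 0.6]
print(sum_normalize([0.30, 0.45]))  # [0.4 0.6]
```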

2.2. Minimum output normalization method

In this method, the normalized values of the classifier outputs are calculated as (Yu et al., 1997)

$$x'_1 = \frac{x_1 - \min\{x_1, x_2\}}{(x_1 - \min\{x_1, x_2\}) + (x_2 - \min\{x_1, x_2\})}, \qquad x'_2 = \frac{x_2 - \min\{x_1, x_2\}}{(x_1 - \min\{x_1, x_2\}) + (x_2 - \min\{x_1, x_2\})} \qquad (2)$$

In the output region x1 > x2, since min{x1, x2} = x2,

$$x'_1 = \frac{x_1 - x_2}{x_1 - x_2} = 1, \qquad x'_2 = 0 \qquad (3)$$

and similarly, in the output region x2 > x1, since min{x1, x2} = x1, x'1 = 0 and x'2 = 1. The resultant effect is that the whole region x1 > x2 is mapped onto the point [1.0, 0.0]. Hence, all the outputs corresponding to the pattern samples of both classes 1 and 2 falling into this region will be mapped onto a single point. The same situation is also true for the region x2 > x1. The resultant effect can also be seen in Fig. 3. It can be argued that the measurement level outputs are converted into abstract level outputs for the two-class case.

Fig. 3. Transformation and dimensionality reduction effect in the minimum output normalization method: separable classifier outputs are mapped onto a single point.
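The collapse to abstract level outputs for two classes can be seen with a short illustrative sketch (ours, assuming Eq. (2) exactly as written above):

```python
import numpy as np

def min_normalize(x):
    """Minimum output normalization, Eq. (2): subtract the smallest output, then divide by the sum."""
    x = np.asarray(x, dtype=float)
    shifted = x - x.min()
    return shifted / shifted.sum()

# For two classes, every output with x1 > x2 maps to [1, 0] and every output with x2 > x1 maps
# to [0, 1]; only the identity of the top-ranked class survives the transformation.
print(min_normalize([0.70, 0.20]))  # [1. 0.]
print(min_normalize([0.21, 0.20]))  # [1. 0.]
print(min_normalize([0.20, 0.90]))  # [0. 1.]
```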

2.3. Output square sum normalization method

This method was proposed by Huang et al., where the normalized values of the classifier outputs are calculated as (Huang and Suen, 1994; Lin et al., 1998; Chen et al., 1997)

$$x'_1 = \frac{x_1^2}{x_1^2 + x_2^2}, \qquad x'_2 = \frac{x_2^2}{x_1^2 + x_2^2} \qquad (4)$$

Consider a particular normalized output (x10, x20). The set of classifier outputs whose normalized values will be identical to this output can be calculated as the simultaneous solution of the above equations for x'1 = x10 and x'2 = x20. After some manipulations, the solution is obtained as

$$x_2 = \sqrt{\frac{x_{20}}{x_{10}}}\, x_1 \qquad (5)$$

The resultant effect can be seen more easily in Fig. 4. For instance, the outputs on the line x2 = 2 x1 are mapped onto the point [0.2, 0.8], and the outputs on the line x2 = 0.5 x1 are mapped onto the point [0.8, 0.2]. This transformation is interesting since, after normalization, the actual outputs are moved away from the decision boundary.

Fig. 4. Transformation and dimensionality reduction effect in the output square sum normalization method: classifier outputs along the line x2 = 2 x1 are all mapped onto the point (0.2, 0.8).
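A brief illustrative sketch of Eq. (4) (not from the paper) reproducing the two mappings quoted above:

```python
import numpy as np

def square_sum_normalize(x):
    """Output square sum normalization, Eq. (4): square each output, then divide by the sum of squares."""
    x = np.asarray(x, dtype=float)
    return x**2 / np.sum(x**2)

# Any output on the line x2 = 2*x1 maps to [0.2, 0.8]; any output on x2 = 0.5*x1 maps to [0.8, 0.2].
print(square_sum_normalize([0.3, 0.6]))  # [0.2 0.8]
print(square_sum_normalize([1.0, 0.5]))  # [0.8 0.2]
```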

In this section, three frequently used normalization methods and their dimensionality reduction effects have been described. In the next section, the undesirable effects due to output normalization are investigated in an artificial example. This is followed by a real-data experiment where it is shown that, depending on the distribution of the output vectors, the performance achieved by the combination of the normalized outputs may be worse than the accuracy of the best individual classifier.

3. Artificial and real-data simulations

Example 1: In this experiment, a 2-class, 2-classifier combination experiment is considered, and it is assumed that the outputs for classes 1 and 2 are normally distributed random vectors. Example scatter plots of the output vectors for the two classifiers are given in Fig. 5(a) and (b). Let '+' denote class 1 and 'o' denote class 2. As seen in the figure, x1 < x2 and y1 < y2 in general, which means that class 2 samples are generally correctly classified and class 1 samples are misclassified. The mean vectors of the normal distributions are fixed


to [0.3, 0.5] and [0.6, 0.9] for the first classifier, and [0.7, 1.0] and [0.3, 0.5] for the second classifier. Since the covariance matrix entries are selected to be small, the randomly generated output vectors generally provide well separated classes for both of the classifiers.

In order to analyze the undesirable effects of output normalization, the outputs of these classifiers are combined using the neural network toolbox of MATLAB. For this purpose, a neural network is created with 4 neurons at the input layer, 10 neurons at the hidden layer and 2 neurons at the output layer. The target values are selected as [0.9, 0.1] and [0.1, 0.9] for class 1 and class 2 outputs, respectively. Two neural networks are trained using two different approaches, namely the resilient backpropagation and the Fletcher–Powell conjugate gradient backpropagation algorithms. On tens of randomly generated datasets as described above, the networks are trained and tested with the same data, separately for the case when output sum normalization is applied and for the case when normalization is not applied. It should be noted that, after the transformation onto the line x1 + x2 = 1 due to the normalization, the originally well separated classes are transformed into a confused mixture of output vectors from both of the classes. Simulation experiments have shown that, independent of which of the training algorithms mentioned above is used, the networks trained on the actual output values performed 100% correct classification in almost all cases, whereas, depending on the distribution of the output vectors, the classification performance came out to be between 60% and 97.5% in the case of the normalized output vectors. It can be argued that the extent to which output normalization will reduce the combined accuracy mainly depends on the distribution of the output vectors.
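A rough reconstruction of this experiment is sketched below in Python, with scikit-learn's MLPClassifier standing in for the MATLAB toolbox and plain class labels standing in for the [0.9, 0.1] targets. The means, covariance scale and hidden layer size follow the description above; everything else (sample counts, seeds, training settings) is an assumption of this sketch, so the exact accuracies will differ from the figures reported in the paper.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n_per_class = 200
cov = 0.02 * np.eye(2)  # small covariance, so each classifier's two classes are well separated

def simulate_outputs():
    """Draw 2-D output vectors for two classifiers: class 1 then class 2 samples."""
    c1 = np.vstack([rng.multivariate_normal([0.3, 0.5], cov, n_per_class),   # classifier 1, class 1
                    rng.multivariate_normal([0.6, 0.9], cov, n_per_class)])  # classifier 1, class 2
    c2 = np.vstack([rng.multivariate_normal([0.7, 1.0], cov, n_per_class),   # classifier 2, class 1
                    rng.multivariate_normal([0.3, 0.5], cov, n_per_class)])  # classifier 2, class 2
    labels = np.array([0] * n_per_class + [1] * n_per_class)
    return c1, c2, labels

def sum_normalize(outputs):
    """Output sum normalization applied row-wise, Eq. (1)."""
    return outputs / outputs.sum(axis=1, keepdims=True)

def combine_and_score(c1, c2, labels):
    """Train a 4-10-2 combining network and score it on the same data, as in the paper."""
    features = np.hstack([c1, c2])  # 4 inputs: both classifiers' output vectors
    net = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)
    net.fit(features, labels)
    return net.score(features, labels)

c1, c2, labels = simulate_outputs()
print("raw outputs:    ", combine_and_score(c1, c2, labels))
print("sum-normalized: ", combine_and_score(sum_normalize(c1), sum_normalize(c2), labels))
```

In runs of this sketch the network trained on the raw outputs typically separates the classes perfectly, while the sum-normalized version may not; the exact figures depend on the random draw, in line with the data-dependent range reported above.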

Example 2: Real-data application. The example given above has shown that output normalization may deteriorate the classification accuracy. Different normalization methods correspond to different transformations, and hence the undesirable effects caused by different normalization techniques can be considered data dependent. In other words, for a given data set, one normalization method may perform better than another, but this cannot be generalized to say that the former is a better normalization method. In order to illustrate this fact, some experiments are conducted on real data. For this purpose, the speaker identification problem is considered and experiments are conducted on 25 speakers from the POLYCOST database (Petrovska et al., 1998). This database contains around 10 sessions for each of the 74 male and 60 female speakers from 14 different countries. The recordings were made over telephone lines, sampled at 8 kHz and a-law coded. The first three sessions are used for training and cross validation.

Fig. 5. Scatter plots for two classifiers with two classes: (a) and (b) show the scatter plots for the first and second classifier, respectively. '+' denotes class 1 and 'o' denotes class 2.


In the simulation experiments, four different classifiers are used. Table 1 gives the list of these classifiers. They are based on vector quantization (VQ) modeling, where the models are trained with the LBG algorithm and each speaker is represented by 32 code vectors. Twelve linear prediction derived cepstral coefficients (LPCC) and 12 Mel-frequency cepstral coefficients (MFCC) are extracted and used as features in the classifiers (Picone, 1993; Markhoul, 1975a,b; Atal, 1974). Delta features (differences between the next and previous feature vectors) are also appended to obtain 24-element feature vectors. Cepstral mean subtraction (CMS), which corresponds to subtracting the mean of the feature vectors over the entire speech record from each individual feature vector, is also used for some classifiers.

The outputs of classifiers e1 and e2 are linearly combined without any normalization, and a correct classification rate of 92.3% is obtained. The correct classification rate corresponding to the linear combination of output sum normalized classifier outputs is 91.9%, whereas the accuracy increases to 93.0% when minimum output normalization is used. In all cases, the combined accuracy is better than that of the best individual classifier. Output sum normalization provides a smaller correct classification accuracy compared to the case when normalization is not applied, while the minimum output normalization approach provides the highest accuracy.

The outputs of classifiers e3 and e4 are linearly combined without any normalization, and a correct classification rate of 87.9% is obtained. The classification rate corresponding to the linear combination of output sum normalized classifier outputs is 87.7%, whereas the accuracy decreases to 86.2% when minimum output normalization is used. In this case, the minimum output normalization method provides a worse result than the best individual classifier. For this pair of classifiers, both normalization methods provide decreased accuracy compared to the case when normalization is not applied. Hence, it can be concluded that the relative performance of different normalization approaches depends on the distribution of the output vectors and, due to the reduced separability effect of normalization, the combined system may provide less accuracy than the best classifier.

4. Conclusion

In this study, the many-to-one mapping and dimensionality reduction effects of some widely used classifier output normalization techniques are addressed. The exact forms of these effects are formulated and the reduced output spaces are specified. It is observed that the reduction of the output space dimension by one may decrease the separability of the pattern classes. The analyses presented in this study have clarified the basic reason for the undesirable effects of output normalization on classifier outputs. The proposed analyses can also be applied to other normalization methods to derive the exact forms of their mappings.

The example given at the end should be correctly understood. The main intention in giving this example is not to say that normalization is unnecessary for developing better measurement level MCSs. On the contrary, we believe that it is in general necessary. However, it should be done quite carefully, so as not to reduce the combined system performance by reducing the separability of the pattern classes.

The use of abstract or rank level output information can also be considered as a form of output normalization, since each abstract or rank level output is a mapping of a set of different measurement level outputs onto a point in the transformed output space. For instance, in the case of three classes, the outputs x = [0.6, 0.4, 0.3] and x = [0.4, 0.1, 0.01] would both be mapped onto x' = [2, 1, 0].
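For completeness, a tiny illustrative sketch (ours, not from the paper) of this measurement-to-rank mapping:

```python
import numpy as np

def to_ranks(x):
    """Map measurement level outputs to rank level: the largest output receives the highest rank."""
    return np.argsort(np.argsort(x))  # position of each class in the ascending ordering

print(to_ranks([0.6, 0.4, 0.3]))   # [2 1 0]
print(to_ranks([0.4, 0.1, 0.01]))  # [2 1 0], i.e. both outputs collapse to the same rank vector
```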

Table 1
Classifiers used in the simulation experiments

Classifier   Model   Feature   CMS   Perf. (%)
e1           VQ      LPCC      no    89.4
e2           VQ      LPCC      yes   84.4
e3           VQ      MFCC      no    86.3
e4           VQ      MFCC      yes   79.9


We believe that further research has to be carried out on the statistical post-processing of classifier outputs in order to convert them into probabilities. As a conclusion of this study, it can be argued that, for measurement level classifier combination, post-processing of the classifier outputs is necessary, but it should be done carefully so as not to reduce the separability of the pattern classes as a result of dimensionality reduction.

References

Altınçay, H., Demirekler, M., 2000. An information theoretic framework for weight estimation in the combination of probabilistic classifiers for speaker identification. Speech Commun. 30 (4), 255–272.
Atal, B.S., 1974. Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification. J. Acoust. Soc. Am. 55 (6), 1304–1312.
Battiti, R., Colla, A.M., 1994. Democracy in neural nets: Voting schemes for classification. Neural Networks 7 (4), 691–707.
Benediktsson, J.A., Swain, P., 1992. Consensus theoretic classification methods. IEEE Trans. Systems Man Cybernet. 22 (4), 688–704.
Bloch, I., 1996. Information combination operators for data fusion: A comparative review with classification. IEEE Trans. Systems Man Cybernet. 26 (1), 52–67.
Brunelli, R., Falavigna, D., 1995. Person identification using multiple cues. IEEE Trans. Pattern Anal. Machine Intell. 17 (10), 955–966.
Campbell, J.P., 1997. Speaker recognition: A tutorial. Proc. IEEE 85 (9), 1437–1462.
Chen, K., Wang, L., Chi, H., 1997. Methods of combining multiple classifiers with different features and their applications to text-independent speaker identification. Int. J. Pattern Recognit. Artificial Intell. 11 (3), 417–445.
Denker, J.S., leCun, Y., 1991. Transforming neural-net output levels to probability distributions. Technical Report, AT&T Bell Laboratories.
Duda, R.O., Hart, P.E., Stork, D.G., 2000. Pattern Classification. John Wiley and Sons.
Duin, R.P.W., Tax, M.J., 1998. Classifier conditional posterior probabilities. SSPR/SPR, pp. 611–619.
Giacinto, G., Roli, F., 1999. Methods for dynamic classifier selection. ICIAP'99, 10th International Conference on Image Analysis and Processing, Italy, pp. 659–664.
Ho, T., Hull, J., Srihari, S., 1994. Decision combination in multiple classifier systems. IEEE Trans. Pattern Anal. Machine Intell. 16, 66–75.
Huang, Y.S., Suen, C.Y., 1994. A method of combining multiple classifiers: a neural network approach. In: Proceedings of the 12th IAPR International Conference, vol. 2, pp. 473–475.
Jain, A.K., Prabhakar, S., Chen, S., 1999. Combining multiple matchers for a high security fingerprint verification system. Pattern Recognition Lett. 20 (11–13), 1371–1379.
Kittler, J. et al., 1998. On combining classifiers. IEEE Trans. Pattern Anal. Machine Intell. 20 (3), 226–239.
Lin, X., Ding, X., Chen, M., Zhang, R., Wu, Y., 1998. Adaptive confidence transform based classifier combination for Chinese character recognition. Pattern Recognition Lett. 19, 975–988.
Markhoul, J., 1975a. Linear prediction: A tutorial review. Proc. IEEE 63, 561–580.
Markhoul, J., 1975b. Spectral linear prediction: Properties and applications. IEEE Trans. Acoust. Speech Signal Process. 23 (3), 283–296.
Petrovska, D., Hennebert, J., Melin, H., Genoud, D., 1998. POLYCOST: A telephone-speech database for speaker recognition. RLA2C Proceedings, Avignon, France.
Picone, J.W., 1993. Signal modelling techniques in speech recognition. Proc. IEEE 81 (9), 1215–1247.
Reynolds, D.A., Rose, R.C., 1995. Robust text-independent speaker recognition using Gaussian mixture speaker models. IEEE Trans. Speech Audio Process. 3 (1), 72–83.
Tax, D.M.J., Breukelen, M., Duin, R.P.W., Kittler, J., 2000. Combining multiple classifiers by averaging or by multiplying. Pattern Recognit. 33, 1475–1485.
Theodoridis, S., Koutroumbas, K., 1999. Pattern Recognition. Academic Press, London.
Woods, K., Kegelmeyer, W.P., Bowyer, K., 1997. Combination of multiple classifiers using local accuracy estimates. IEEE Trans. Pattern Anal. Machine Intell. 19 (4), 405–410.
Xu, L., Krzyzak, A., Suen, C.Y., 1992. Methods of combining multiple classifiers and their applications to handwriting recognition. IEEE Trans. Systems Man Cybernet. 22, 418–435.
Yu, K., Jiang, X., Bunke, H., 1997. Lipreading: A classifier combination approach. Pattern Recognition Lett. 18, 1421–1426.
