Undesirable effects of output normalization in multiple classifier systems
Hakan Altınçay *, Mübeccel Demirekler
Department of Computer Engineering, Eastern Mediterranean University, Gazi Mağusa, KKTC, Mersin 10, Turkey
Department of Electrical and Electronics Engineering, Middle East Technical University, P.K. 06531, Ankara, Turkey
Received 5 December 2001; received in revised form 20 May 2002
Abstract
Incomparability of the classifier output scores is a major problem in the combination of different classification
systems. In order to deal with this problem, the measurement level classifier outputs are generally normalized. However,
empirical results have shown that output normalization may lead to some undesirable effects. This paper presents
analyses for some most frequently used normalization methods and it is shown that the main reason for these unde-
sirable effects of output normalization is the dimensionality reduction in the output space. An artificial classifier
combination example and a real-data experiment are provided where these effects are further clarified.
© 2002 Elsevier Science B.V. All rights reserved.
Keywords: Output score normalization; Dimensionality reduction; Class separability; Output post-processing; Measurement level
classifier combination
1. Introduction
Extensive research has been carried out in the last decade on the use of multiple classifier systems
(MCSs) for complex classification problems and
the potential of performance improvement is pro-
ven. Plenty of different combination methods are
proposed and it is shown that, with the use of a set
of classifiers providing complementary informa-
tion for each other, the classification accuracy can
be highly improved (Xu et al., 1992; Battiti and
Colla, 1994; Bloch, 1996; Benediktsson and Swain,
1992; Kittler et al., 1998; Jain et al., 1999).

MCSs are generally categorized according to
the levels of the classifier outputs. Abstract level
MCS takes into account the most likely pattern
class provided by each classifier. A well known
combination approach for this category is the
majority voting (Ho et al., 1994). Rank level MCSs make use of a ranked list of pattern classes where the ranking is based on decreasing likelihood. The Borda count method is the most frequently used
rank level combination approach. In the mea-
surement level combination, likelihood values of
the pattern classes provided by the classifiers that
are based on a parametric representation like
Gaussian mixture modeling (Reynolds and Rose,
1995) or the cumulative distance values provided
by a non-parametric modeling approach like vec-
tor quantization are used (Campbell, 1997; Chen
et al., 1997). The linear combination method, where the output vectors from different classifiers are added, and Bayesian combination under the independence assumption, where the classifier outputs are multiplied, are typical measurement level combination methodologies (Altınçay and Demirekler, 2000).
Among the three levels of classifier outputs, the measurement level conveys the greatest amount of information about the relative degree to which each particular class may be the correct one, and this information can be quite useful during combination. For example, when the measurement values of all classes are very close to each other, the classifier may be considered as not being sure about its most likely pattern class. Consider the case of three pattern classes and two different output vectors $p_1 = [1.5, 0.0, 0.0]$ and $p_2 = [0.11, 0.10, 0.10]$. Let the correct class be the first
class. For abstract or rank based combination
approaches, these outputs cannot be differentiated
from each other. However, the first output vector
conveys strong evidence that the correct class is the
first one which is not the case for the second.
Suppose that the outputs $p_1 = [0.0, 2.0, 0.0]$ and $p_2 = [0.10, 0.11, 0.10]$ are obtained when the first
class is tested. For the abstract or rank level
combination approaches, both outputs have
equivalent effects in the combination operation whereas, for the case of measurement level combi-
where, for the case of measurement level combi-
nation, the erroneous information coming from
the second output vector can be much more easily
compensated with another correct output.
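A minimal numerical sketch of this compensation effect is given below. The vectors and the combining classifier's output are our own illustrative choices, not values from the paper; the weakly erroneous output is outvoted by another classifier's confident, correct output under linear (sum) combination.

```python
# A minimal sketch (hypothetical numbers) of how a weakly erroneous measurement
# level output can be compensated by another classifier under linear (sum)
# combination, whereas abstract level voting only sees the two top choices.
import numpy as np

p_weak_wrong = np.array([0.10, 0.11, 0.10])   # barely favours class 2 (wrong)
p_correct    = np.array([0.50, 0.20, 0.10])   # clearly favours class 1 (correct)

combined = p_weak_wrong + p_correct           # linear (sum) combination
print(combined.argmax())                      # -> 0, i.e. class 1 is recovered
```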
These discussions mainly emphasize the advantages of using measurement level classifier
outputs in combination. However, a major prob-
lem in this approach is the incomparability of
classifier outputs (Ho et al., 1994). Classifiers
based on parametric modeling provide likelihood
values where non-parametric classifiers provide
some cost or distance values. Also, different classifiers may depend on different feature vectors, and the dynamic ranges of these vectors are not generally the same. As a matter of fact, the scales of the outputs from different classifiers are incomparable and need preprocessing before combination
(Ho et al., 1994).
In order to tackle this problem, basically two different approaches are generally used. The first approach is based on density estimation in the output space. For this purpose, either non-parametric techniques like Parzen windows or k-nearest neighbors, or parametric techniques where the
density of the outputs for each class is approxi-
mated by a multi-dimensional Gaussian are con-
sidered (Denker and leCun, 1991; Duin and Tax,
1998; Giacinto and Roli, 1999; Woods et al.,
1997). These density estimates are later used to convert the actual outputs into probabilities. In
this approach, the classifiers can be considered as
some preprocessors producing more regular dis-
tributions than the original input space or the
density estimation can be treated as a statistical
post-processor (Denker and leCun, 1991).
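The sketch below illustrates the general idea of this density-estimation style post-processing: a multi-dimensional Gaussian is fitted to the output vectors of each class and new outputs are converted to posteriors with Bayes' rule. The function names and the equal-prior assumption are ours; this is not the exact procedure of any of the cited works.

```python
# A minimal sketch of converting raw classifier outputs into probabilities via
# per-class Gaussian density estimates in the output space (equal priors).
import numpy as np
from scipy.stats import multivariate_normal

def fit_output_densities(outputs, labels):
    """Fit one Gaussian per class to the training-time output vectors."""
    models = {}
    for c in np.unique(labels):
        oc = outputs[labels == c]
        models[c] = multivariate_normal(mean=oc.mean(axis=0),
                                        cov=np.cov(oc, rowvar=False))
    return models

def to_posteriors(models, x):
    """Convert an output vector x into class posteriors assuming equal priors."""
    likelihoods = np.array([models[c].pdf(x) for c in sorted(models)])
    return likelihoods / likelihoods.sum()
```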
The second approach is based on normalization
of the classifier outputs so that they satisfy the axioms of probability (Brunelli and Falavigna,
1995; Huang and Suen, 1994; Chen et al., 1997;
Tax et al., 2000). However, output normalization
may result in quite undesirable effects and hence the combined system may provide a lower correct classification accuracy than the best individual classifier
(Huang and Suen, 1994; Battiti and Colla, 1994).
In this paper, the output normalization problem is addressed and the resultant effects of some
normalization techniques are analyzed. It is shown
that, if not carefully done, normalization may re-
move the valuable measurement level information
from the classifier outputs, even converting the
problem into abstract level combination. In Section 2, analyses of the three most frequently used normalization methods are presented. An artificial classifier combination example and a real-data
experiment illustrating the undesirable effects of
output normalization are provided in Section 3. A
series of conclusions are drawn in Section 4.
2. Analyses of normalization methods
Consider an $N$-class pattern classification problem where the actual classifier outputs are denoted by $x = [x_1, x_2, \ldots, x_N]$ and let $x' = [x'_1, x'_2, \ldots, x'_N]$ denote the normalized form of $x$. Output normalization can be considered as a transformation $x \mapsto x'$ that should satisfy some constraints summarized as follows (Huang and Suen, 1994):
• The transformed outputs are desired to be in the interval $[0, 1]$, i.e. $0 \le x'_i \le 1$.
• The summation of all of the transformed classifier outputs should be 1, i.e. $\sum_{i=1}^{N} x'_i = 1$.
• The ranking of the pattern classes should not be modified after this transformation, i.e. $x_i < x_j \Rightarrow x'_i < x'_j$ (a small sketch checking these constraints is given below).
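The following helper is our own illustration (not part of the paper) of how a candidate normalization can be tested against the three constraints listed above.

```python
# A small sketch that checks whether a candidate normalization x -> x'
# satisfies the three constraints: unit interval, unit sum, rank preservation.
import numpy as np

def satisfies_constraints(x, x_norm, tol=1e-9):
    x, x_norm = np.asarray(x, float), np.asarray(x_norm, float)
    in_unit_interval = np.all((x_norm >= -tol) & (x_norm <= 1 + tol))
    sums_to_one = abs(x_norm.sum() - 1.0) < tol
    # Rank preservation: x_i < x_j must imply x'_i < x'_j.
    rank_preserved = all(not (x[i] < x[j]) or (x_norm[i] < x_norm[j])
                         for i in range(len(x)) for j in range(len(x)))
    return in_unit_interval and sums_to_one and rank_preserved
```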
In this context, it is assumed that the larger measurement value a pattern class has, the more likely it is to be the correct class. Consider a 2-class problem where the entries of the output vector $x = [x_1, x_2]$ denote the measurement values for classes 1 and 2, respectively. Let $x' = [x'_1, x'_2]$ denote the output vector after output normalization, where $x'_1 + x'_2 = 1.0$. A scatter plot for the actual
outputs of a classifier is given in Fig. 1(a). Assume that the outputs of the classifier denoted by 'o' correspond to the samples of pattern class 1 whereas the outputs denoted by '+' correspond to the samples of class 2. Consider the very general case where there is no constraint on the outputs of the classifiers such that the measurement value obtained for a class can be any positive real
number. As seen in the figure, the outputs for
different classes do not overlap. Hence, the classes
are well separated which means that an optimum
classification rule can be designed so that, if the
test data are similarly distributed, perfect recog-
nition can be obtained (Duda et al., 2000; Theo-
doridis and Koutroumbas, 1999).

Assume that the classifier is going to be used
together with some others in a MCS and for this
purpose, the constraint that the sum of the elements of the output vector should be equal to one is satisfied by setting $x'_1 = x_1/(x_1 + x_2)$ and $x'_2 = x_2/(x_1 + x_2)$. In order to see the resultant effect of this normalization, consider the scatter plot of the normalized outputs given in Fig. 1(b). As seen in the figure, the actual output space of the classifier is mapped onto the line $x_1 + x_2 = 1.0$ in the unit square and hence the dimension of the output space is reduced by one. This reduction in dimension results in a loss of information: the non-overlapping and separable (or well separated) actual classifier outputs in the region $x_1 > x_2$ overlap on the transformed line, as seen in (b) of the figure. In mathematical terms, normalization
has a resultant effect of many-to-one mapping. The
dimensionality reduction occurs for larger numbers of classes as well. For instance, for the case of three classes, the three-dimensional output space will be mapped onto the two-dimensional plane $x_1 + x_2 + x_3 = 1$.
Another important aspect of classifier normalization is the deformation of the outputs. In order
to clarify this, consider two different outputs of the
Fig. 1. A typical output space of a classifier with two classes. 'o' denotes class 1 and '+' denotes class 2 as given in (a). (b) The location of the outputs after normalization, which has a resulting effect of overlapping of originally separable pattern classes.
same classifier as $[2.0, 6.0]$ and $[0.1, 0.3]$. These outputs will be mapped onto the same point $[0.25, 0.75]$ if each element of these vectors is normalized by the sum of the elements of the corresponding vector. In other words, these outputs will be interpreted as the same after normalization. However, do these outputs mean the same thing
from the classifier point of view so that they will be
treated this way during combination? Similarly, it
should also be questioned whether the actual
outputs would have the same effect during com-
bination or not. Although the effect is dependent
on the combination rule, it is quite natural to
predict that they would not. For instance, the actual forms of these outputs do not have an
equivalent effect for linear combination.
Both of the concepts mentioned above are im-
portant aspects of output normalization in MCSs
that deserve further attention. In this context, the
main interest will be on the analysis of the dimensionality reduction effect of classifier output normalization. In the following subsections, the three most frequently used normalization techniques are analyzed and the resultant overlapping behavior in the
output space is formulated. The analysis can be
easily extended to some other approaches de-
scribed in (Chen et al., 1997).
2.1. Output sum normalization method
In this method, the normalized values of the
classifier outputs are calculated as (Xu et al., 1992;
Kittler et al., 1998),
$$x'_1 = \frac{x_1}{x_1 + x_2}, \qquad x'_2 = \frac{x_2}{x_1 + x_2} \qquad (1)$$
Consider a particular normalized output $[x_1^0, x_2^0]$. The set of classifier outputs whose normalized values will be identical to this particular one can be calculated by the simultaneous solution of the above equations for $x'_1 = x_1^0$ and $x'_2 = x_2^0$. After some manipulations, the solution is obtained as $x_1(1 - x_1^0) = x_1^0 x_2$, or equivalently $x_2 = (x_2^0/x_1^0)x_1$. The solution set includes infinitely many outputs lying on the line described by the solution. The resultant effect of this normalization is illustrated in Fig. 2. As seen in the figure, all the output vectors lying on the line $x_2 = 1.5x_1$ will be mapped onto the point $[0.4, 0.6]$. For the general case, all the classifier outputs on a line passing through the origin will be mapped onto the point of intersection between that line and the line $x_1 + x_2 = 1.0$. Hence, the output $[0.20, 0.30]$ in the figure cannot be differentiated from $[0.30, 0.45]$ after normalization.
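A minimal sketch of this many-to-one behaviour is given below; the two example outputs are the ones mentioned above, and the helper name is ours.

```python
# Output sum normalization, Eq. (1): outputs on the same line through the
# origin become indistinguishable after normalization.
import numpy as np

def sum_normalize(x):
    x = np.asarray(x, float)
    return x / x.sum()

print(sum_normalize([0.20, 0.30]))   # -> [0.4 0.6]
print(sum_normalize([0.30, 0.45]))   # -> [0.4 0.6], identical after normalization
```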
2.2. Minimum output normalization method
In this method, normalized values of the clas-
sifier outputs are calculated as (Yu et al., 1997),
$$x'_1 = \frac{x_1 - \min\{x_1, x_2\}}{(x_1 - \min\{x_1, x_2\}) + (x_2 - \min\{x_1, x_2\})}, \qquad x'_2 = \frac{x_2 - \min\{x_1, x_2\}}{(x_1 - \min\{x_1, x_2\}) + (x_2 - \min\{x_1, x_2\})} \qquad (2)$$

In the output region $x_1 > x_2$, since $\min\{x_1, x_2\} = x_2$,

$$x'_1 = \frac{x_1 - x_2}{x_1 - x_2} = 1, \qquad x'_2 = 0 \qquad (3)$$

and similarly, in the output region $x_2 > x_1$, since $\min\{x_1, x_2\} = x_1$, $x'_1 = 0$ and $x'_2 = 1$. The resultant effect is that the whole region $x_1 > x_2$ is mapped onto the point $[1.0, 0.0]$. Hence, all the outputs
corresponding to the pattern samples of both
Fig. 2. Transformation and dimensionality reduction effect in the output sum normalization method.
classes 1 and 2 falling into this region will be
mapped onto a single point. The same situation is
also true for the region $x_2 > x_1$. The resultant effect can also be seen in Fig. 3. It can be argued that the measurement level outputs are converted into abstract level outputs for the two-class case.
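The collapse to abstract level outputs in the two-class case can be seen in the short sketch below; the example vectors are our own.

```python
# Minimum output normalization, Eq. (2), for two classes: every output with
# x1 > x2 collapses to [1, 0], regardless of how weak the evidence is.
import numpy as np

def min_normalize(x):
    x = np.asarray(x, float)
    shifted = x - x.min()
    return shifted / shifted.sum()

print(min_normalize([0.9, 0.1]))    # -> [1. 0.]
print(min_normalize([0.51, 0.49]))  # -> [1. 0.], same point despite weak evidence
```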
2.3. Output square sum normalization method
This method was proposed by Huang and Suen, where
the normalized values of the classifier outputs are
calculated as (Huang and Suen, 1994; Lin et al.,
1998; Chen et al., 1997),
$$x'_1 = \frac{x_1^2}{x_1^2 + x_2^2}, \qquad x'_2 = \frac{x_2^2}{x_1^2 + x_2^2} \qquad (4)$$
Consider a particular normalized output $(x_1^0, x_2^0)$. The set of classifier outputs whose normalized values will be identical to this output can be calculated as the simultaneous solution of the above equations for $x'_1 = x_1^0$ and $x'_2 = x_2^0$. After some manipulations the solution can be obtained as

$$x_2 = \sqrt{\frac{x_2^0}{x_1^0}}\, x_1 \qquad (5)$$
The resultant effect can more easily be seen in Fig. 4. For instance, the outputs on the line $x_2 = 2x_1$ are mapped onto the point $[0.2, 0.8]$ and the outputs on the line $x_2 = 0.5x_1$ are mapped onto the point $[0.8, 0.2]$. This transformation is interesting since,
after normalization, the actual outputs are moved
away from the decision boundary.
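The sketch below reproduces the two mappings mentioned above; the sample points on the two lines are our own choices.

```python
# Output square sum normalization, Eq. (4): points on x2 = 2*x1 and x2 = 0.5*x1
# are mapped onto [0.2, 0.8] and [0.8, 0.2], away from the boundary x1 = x2.
import numpy as np

def square_sum_normalize(x):
    x = np.asarray(x, float) ** 2
    return x / x.sum()

print(square_sum_normalize([1.0, 2.0]))   # on x2 = 2*x1   -> [0.2 0.8]
print(square_sum_normalize([2.0, 1.0]))   # on x2 = 0.5*x1 -> [0.8 0.2]
```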
In this section, three different and frequently used normalization methods and their dimensionality reduction effects have been described. In the next section, the undesirable effects due to output nor-
malization are investigated in an artificial example.
This is followed by a real-data experiment where it
is shown that, depending on the distribution of the
output vectors, the performance achieved by the
combination of the normalized outputs may be
worse than the accuracy of the best individual
classifier.
3. Artificial and real-data simulations
Example 1: In this experiment, a 2-class 2-clas-
sifier combination experiment is considered and it
is assumed that the outputs of classes 1 and 2 are
normally distributed random vectors. An example scatter plot of the output vectors for two different classifiers is given in Fig. 5(a) and (b). Let '+' denote class 1 and 'o' denote class 2. As seen in the figure, $x_1 < x_2$ and $y_1 < y_2$ in general, which means that class 2 samples are generally correctly classified and class 1 samples are misclassified. The
mean vectors of the normal distributions are fixed
Fig. 3. Transformation and dimensionality reduction effect in the minimum output normalization method.

Fig. 4. Transformation and dimensionality reduction effect in the output square sum normalization method.
to $[0.3, 0.5]$ and $[0.6, 0.9]$ for the first classifier and $[0.7, 1.0]$ and $[0.3, 0.5]$ for the second classifier. Since the covariance matrix entries are selected to be small, the randomly generated output vectors
generally provide well separated classes for both of
the classifiers.
In order to analyze the undesirable effects of
output normalization, the outputs of these classi-
fiers are combined using the neural network toolbox
of MATLAB. For this purpose, a neural network
is created with 4 neurons at the input layer, 10 neurons at the hidden layer and 2 neurons at the output layer. The target values are selected as $[0.9, 0.1]$ and $[0.1, 0.9]$ for class 1 and class 2 outputs, respectively.
Two neural networks are trained using two dif-
ferent approaches, namely the resilient backprop-
agation and Fletcher–Powell conjugate gradient
backpropagation algorithms. In tens of randomly
generated datasets as described above, the net-works are trained and tested with the same data
separately for the case when output sum normal-
ization is applied and for the case when normal-
ization is not applied. It should be noted that, after the transformation onto the line $x_1 + x_2 = 1$ due to the normalization, originally well separated classes are transformed into a confused mixture of output vectors from both of the classes. Simulation experiments have shown that, independent of which of the training algorithms mentioned above is
used, the networks trained on the actual output
values performed 100% correct classification in
almost all the cases whereas, depending on the
distribution of the output vectors, the classification performance was between 60% and 97.5% in the case of the normalized output vectors. It can be argued that the extent to which
output normalization will reduce the combined
accuracy mainly depends on the distribution of the
output vectors.
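A rough sketch of this artificial experiment is given below using scikit-learn instead of the MATLAB neural network toolbox. Only the class mean vectors follow the text; the covariance, sample sizes, network settings and the assignment of means to classes are our own illustrative choices, so the exact accuracies will differ from those reported above.

```python
# Sketch of Example 1: outputs of two 2-class classifiers are drawn from
# Gaussians, concatenated into 4-dimensional vectors and combined by an MLP,
# with and without output sum normalization applied first.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n, cov = 200, 0.002 * np.eye(2)
means = {1: ([0.3, 0.5], [0.7, 1.0]),   # class 1: (classifier 1, classifier 2)
         2: ([0.6, 0.9], [0.3, 0.5])}   # class 2: (classifier 1, classifier 2)

def make_outputs(normalize):
    X, y = [], []
    for cls, (m1, m2) in means.items():
        o1 = rng.multivariate_normal(m1, cov, n)
        o2 = rng.multivariate_normal(m2, cov, n)
        if normalize:                        # output sum normalization, Eq. (1)
            o1 = o1 / o1.sum(axis=1, keepdims=True)
            o2 = o2 / o2.sum(axis=1, keepdims=True)
        X.append(np.hstack([o1, o2]))        # 4-dimensional combined input
        y.append(np.full(n, cls))
    return np.vstack(X), np.concatenate(y)

for normalize in (False, True):
    X, y = make_outputs(normalize)
    net = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)
    net.fit(X, y)
    # Trained and tested on the same data, as in the text.
    print(normalize, accuracy_score(y, net.predict(X)))
```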
Example 2: Real-data application. The example
given above has shown that output normalization may deteriorate the classification accuracy. Dif-
ferent normalization methods correspond to dif-
ferent transformations and hence the undesirable
effects caused by different normalization tech-
niques can be considered as data dependent. In
other words, for a given data set, one normaliza-
tion method may perform better than another, but this cannot be generalized to say that the former is a better normalization method. In order to illus-
trate this fact, some experiments are conducted on
real-data. For this purpose, speaker identification
problem is considered and experiments are con-
ducted on 25 speakers from the POLYCOST
database (Petrovska et al., 1998). This database
contains around 10 sessions for each of the 74 male and 60 female speakers from 14 different countries. The recordings were made over telephone lines, sampled at 8 kHz and A-law coded. The first three sessions are used for training and cross validation.
Fig. 5. A scatter plot for two classifiers with two classes. (a) and (b) show the scatter plots for the first and second classifier, respectively. '+' denotes class 1 and 'o' denotes class 2.
In the simulation experiments, four different
classifiers are used. Table 1 gives the list of these
classifiers. They are based on vector quantization
(VQ) modeling where the models are trained with
the LBG algorithm and each speaker is represented by 32 code vectors. Twelve linear prediction derived cepstral coefficients (LPCC) and 12 Mel-
frequency cepstral coefficients (MFCC) type fea-
tures are extracted and used in the classifiers
(Picone, 1993; Markhoul, 1975a,b; Atal, 1974).
Delta features (difference between next and previ-
ous feature vectors) are also appended to obtain 24
element feature vectors. Cepstral mean subtraction (CMS), which corresponds to the subtraction of the mean of the feature vectors over the entire speech record from each individual feature vector, is also used for some classifiers.
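A minimal sketch of these two feature post-processing steps is given below; the edge handling at the first and last frames and the function names are our own choices, not details taken from the cited works.

```python
# Cepstral mean subtraction over one recording and delta features computed as
# the difference between the next and previous frames.
import numpy as np

def cepstral_mean_subtraction(frames):
    """frames: (num_frames, num_coeffs) cepstral features of one recording."""
    return frames - frames.mean(axis=0, keepdims=True)

def append_delta(frames):
    """Append delta features (next minus previous frame) -> 2*num_coeffs."""
    padded = np.pad(frames, ((1, 1), (0, 0)), mode="edge")
    delta = padded[2:] - padded[:-2]
    return np.hstack([frames, delta])
```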
The outputs of classifiers e1 and e2 are linearly combined without any normalization and a correct classification rate of 92.3% is obtained. The correct classification rate corresponding to the linear combination of output sum normalized classifier outputs is 91.9%, whereas the accuracy increases to 93.0% when minimum output normalization is used. In all cases, the combined accuracy is better than that of the best individual classifier. Output sum normalization provides a lower correct classification accuracy compared to the case when normalization is not applied, while the minimum output normalization approach provides the highest accuracy.
The outputs of classifiers e3 and e4 are linearly
combined without any normalization and a correct
classification rate of 87.9% is obtained. The clas-
sification rate corresponding to the linear combi-
nation of output sum normalized classifier outputs
is obtained as 87.7% whereas the accuracy is de-
creased to 86.2% when minimum output normal-
ization is used. In this case, the minimum output
normalization method provided a worse result
than the best individual classifier. For this pair of
classifiers, both of the normalization methods
provided a decreased accuracy compared to the
case when normalization is not applied. Hence, it can be concluded that the relative performance of different normalization approaches depends on the distribution of the output vectors and, due to the reduced separability effect of normalization, the combined system may provide less accuracy than the best classifier.
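A rough sketch of how such comparisons can be run is given below. The per-utterance score matrices, label array and helper names are assumptions of ours (the scores are taken to be similarity-like, i.e. larger means more likely); none of this data is reproduced from the paper.

```python
# Compare linearly combined identification accuracy under no normalization,
# output sum normalization and minimum output normalization, given score
# matrices S1, S2 of shape (num_test_utterances, num_speakers).
import numpy as np

def sum_norm(S):
    return S / S.sum(axis=1, keepdims=True)

def min_norm(S):
    shifted = S - S.min(axis=1, keepdims=True)
    return shifted / shifted.sum(axis=1, keepdims=True)

def combined_accuracy(S1, S2, labels, norm=None):
    if norm is not None:
        S1, S2 = norm(S1), norm(S2)
    predictions = (S1 + S2).argmax(axis=1)      # linear combination
    return np.mean(predictions == labels)

# Usage (with hypothetical arrays scores_e1, scores_e2 and true speaker labels):
# for name, norm in [("none", None), ("sum", sum_norm), ("min", min_norm)]:
#     print(name, combined_accuracy(scores_e1, scores_e2, labels, norm))
```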
4. Conclusion
In this study, many-to-one mapping and the
dimensionality reduction effects of some widely
used classifier output normalization techniques
are addressed. The exact forms of these effects are
formulated and the reduced output spaces are
specified. It is observed that the reduction in the dimension of the output space by one may de-
crease the separability of the pattern classes. The
analyses presented in this study have clarified the
basic reason for the undesirable effects of output
normalization on classifier outputs. The proposed
analyses can also be applied to some other nor-
malization methods to analyze the exact forms of
their mapping.

The example given at the end should be cor-
rectly understood. The main intention in giving
this example is not to say that normalization is not
necessary for developing better measurement level
MCSs. On the contrary, we believe that it is in
general necessary. However, it should be done
quite carefully so as not to reduce the combined
system performance by reducing the separability of the pattern classes.
The use of abstract or rank level output infor-
mation can also be considered as different forms of
output normalization since each abstract or rank
level output is a mapping of a set of different
measurement level outputs onto a point in the
transformed output space. For instance, in the case of three classes the outputs $x = [0.6, 0.4, 0.3]$ and $x = [0.4, 0.1, 0.01]$ would both be mapped onto $x' = [2, 1, 0]$.
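The tiny sketch below illustrates this measurement-to-rank mapping; the helper name is ours.

```python
# Any measurement vector with the same ordering maps onto the same rank vector.
import numpy as np

def to_ranks(x):
    """Rank vector: 0 for the least likely class, N-1 for the most likely."""
    return np.argsort(np.argsort(np.asarray(x)))

print(to_ranks([0.6, 0.4, 0.3]))    # -> [2 1 0]
print(to_ranks([0.4, 0.1, 0.01]))   # -> [2 1 0], same rank level output
```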
Table 1
Classifiers used in the simulation experiments

Classifier  Model  Feature  CMS  Perf. (%)
e1          VQ     LPCC     no   89.4
e2          VQ     LPCC     yes  84.4
e3          VQ     MFCC     no   86.3
e4          VQ     MFCC     yes  79.9
We believe that further research has to be car-
ried out in statistical post-processing of classifier
outputs in order to convert them into probabili-
ties. As a conclusion of this study, it can be argued
that for measurement level classifier combination,
the post-processing of the classifier outputs is necessary, but it should be carefully done so as not
to reduce the separability of the pattern classes as a
result of dimensionality reduction.
References
Altınçay, H., Demirekler, M., 2000. An information theoretic
framework for weight estimation in the combination of
probabilistic classifiers for speaker identification. Speech
Commun. 30 (4), 255–272.
Atal, B.S., 1974. Effectiveness of linear prediction characteris-
tics of the speech wave for automatic speaker identification
and verification. J. Acoust. Soc. Am. 55 (6), 1304–1312.
Battiti, R., Colla, A.M., 1994. Democracy in neural nets:
Voting schemes for classification. Neural Networks 7 (4),
691–707.
Benediktsson, J.A., Swain, P., 1992. Consensus theoretic
classification methods. IEEE Trans. Systems Man Cybernet.
22 (4), 688–704.
Bloch, I., 1996. Information combination operators for data
fusion: A comparative review with classification. IEEE
Trans. Systems Man Cybernet. 26 (1), 52–67.
Brunelli, R., Falavigna, D., 1995. Person identification using
multiple cues. IEEE Trans. Pattern Anal. Machine Intell.
17 (10), 955–966.
Campbell, J.P., 1997. Speaker recognition: A tutorial. Proc.
IEEE 85 (9), 1437–1462.
Chen, K., Wang, L., Chi, H., 1997. Methods of combining
multiple classifiers with different features and their appli-
cations to text-independent speaker identification. Int. J.
Pattern Recognit. Artificial Intell. 11 (3), 417–445.
Denker, J.S., leCun, Y., 1991. Transforming neural-net output
levels to probability distributions. Technical Report. AT&T
Bell Laboratories.
Duda, R.O., Hart, P.E., Stork, D.G., 2000. Pattern Classifica-
tion. John Wiley and Sons.
Duin, R.P.W., Tax, M.J., 1998. Classifier conditional posteriori
probabilities. SSPR/SPR, pp. 611–619.
Giacinto, G., Roli, F., 1999. Methods for dynamic classifier
selection. ICIAP'99, 10th International Conference on Image
Analysis and Processing, Italy, pp. 659–664.
Ho, T., Hull, J., Srihari, S., 1994. Decision combination in
multiple classifier systems. IEEE Trans. Pattern Anal.
Machine Intell. 16, 66–75.
Huang, Y.S., Suen, C.Y., 1994. A method of combining
multiple classifiers––a neural network approach. In: Pro-
ceedings of the 12th IAPR International Conference 2,
pp. 473–475.
Jain, A.K., Prabhakar, S., Chen, S., 1999. Combining multiple
matchers for a high security fingerprint verification system.
Pattern Recognition Lett. 20 (11–13), 1371–1379.
Kittler, J. et al., 1998. On combining classifiers. IEEE Trans.
Pattern Anal. Machine Intell. 20 (3), 226–239.
Lin, X., Ding, X., Chen, M., Zhang, R., Wu, Y., 1998.
Adaptive confidence transform based classifier combination
for Chinese character recognition. Pattern Recognition Lett.
19, 975–988.
Markhoul, J., 1975a. Linear prediction: A tutorial review. Proc.
IEEE 63, 561–580.
Markhoul, J., 1975b. Spectral linear prediction: Properties and
applications. IEEE Trans. Acoust. Speech Signal Process.
23 (3), 283–296.
Petrovska, D., Hennebert, J., Melin, H., Genoud, D., 1998.
POLYCOST: A telephone-speech database for speaker
recognition. RLA2C Proceedings, Avignon, France.
Picone, J.W., 1993. Signal modelling techniques in speech
recognition. Proc. IEEE 81 (9), 1215–1247.
Reynolds, D.A., Rose, R.C., 1995. Robust text-independent
speaker recognition using Gaussian mixture speaker models.
IEEE Trans. Speech Audio Process. 3 (1), 72–83.
Tax, D.M.J., Breukelen, M., Duin, R.P.W., Kittler, J., 2000.
Combining multiple classifiers by averaging or by multiply-
ing. Pattern Recognit. 33, 1475–1485.
Theodoridis, S., Koutroumbas, K., 1999. Pattern Recognition.
Academic Press, London.
Woods, K., Kegelmeyer, W.P., Bowyer, K., 1997. Combination
of multiple classifiers using local accuracy estimates. IEEE
Trans. Pattern Anal. Machine Intell. 19 (4), 405–410.
Xu, L., Krzyzak, A., Suen, C.Y., 1992. Methods of combining
multiple classifiers and their applications to handwriting
recognition. IEEE Trans. Systems Man Cybernet. 22, 418–
435.
Yu, K., Jiang, X., Bunke, H., 1997. Lipreading: A classifier
combination approach. Pattern Recognition Lett. 18, 1421–
1426.