
IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 61, NO. 12, JUNE 15, 2013

Adaptive Learning Vector Quantization for Online Parametric Estimation

Pascal Bianchi, Member, IEEE, and Jérémie Jakubowicz, Member, IEEE

Abstract—This paper addresses the problem of parameter estimation in a quantized and online setting. A sensing unit collects random vector-valued samples from the environment. These samples are quantized and transmitted to a central processor which generates an online estimate of the unknown parameter. This paper provides a closed-form expression of the excess mean square error (MSE) caused by quantization in the high-rate regime, i.e., when the number of quantization levels is supposed to be large. Next, we determine the quantizers which mitigate the excess MSE. The optimal quantization rule unfortunately depends on the unknown parameter. To circumvent this issue, we introduce a novel adaptive learning vector quantization scheme which allows us to simultaneously estimate the parameter of interest and select an efficient quantizer.

Index Terms—Adaptive algorithms, vector quantization, wireless sensor networks.

I. INTRODUCTION

CONSIDER a sensing unit which transmits a sequence of measurements to an estimation device whose mission is to estimate a given parameter. For example, a CCTV camera in a surveillance system transmits its data to a remote controller interested in the localization of a particular object in its field of view or in estimating its features. This situation also arises in the context of wireless sensor networks (WSN), where a fusion center collects the individual measurements of a large number of identical sensors and processes these measurements in order to estimate certain features, to localize or track a target, etc. In such applications, due to bandwidth, delay or storage limitations, transmitted data rates are often limited. Therefore, measurements must be quantized prior to transmission. This quantization step may significantly reduce the overall estimation performance of the system.

In the past decades, numerous papers were dedicated to the search for relevant quantization strategies and their practical design [9]. The most popular criterion used to select quantizers is the mean square error (MSE) between the quantized signal and the initial source [8]. An analytical characterization of quantizers minimizing the MSE is difficult in the general case.

Manuscript received July 19, 2012; revised December 21, 2012 and February 26, 2013; accepted March 26, 2013. Date of publication April 12, 2013; date of current version May 21, 2013. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Tongtong Li.

P. Bianchi is with Telecom Paris-Tech, Paris 75013, France (e-mail: [email protected]).

J. Jakubowicz is with Telecom SudParis, Evry 91000, France (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TSP.2013.2258017

Fig. 1. Illustration of the framework. Raw observations are quantized by the sensing unit. The quantized process is transmitted to the estimation device.

Bennett [2] pioneered the study of high-resolution (or high-rate) quantization for the reconstruction of scalar signals. The idea of Bennett was to study the MSE in the asymptotic regime where the number of quantization levels tends to infinity. A closed-form expression of the (properly normalized) MSE can be determined in that case, and the families of quantizers minimizing the asymptotic MSE can be directly characterized. Extension of the work of Bennett to vector-valued observations was later achieved in [20]. However, the MSE criterion is especially relevant when the aim is to reconstruct the source. On the other hand, it can be inappropriate as far as other applications are concerned. For this reason, various distortion measures have been proposed in the literature in a task-oriented setting for estimation, classification and detection [11], [19], [33], [22], [12], [26], [25], [23], [31], [28], [30], [10], [32].

In this paper, we consider the framework illustrated by Fig. 1.

A sensing unit collects an independent identically distributed (i.i.d.) stochastic process on , where is an integer. The distribution of depends on an unknown vector-valued parameter to be estimated. Each observation is quantized on bits and then transmitted to an estimation unit through an error-free communication channel. Integer represents the size of the quantization alphabet. This paper focuses on online Maximum Likelihood Estimation (MLE). By online, we mean that the quantized data is not stored but is used only once to update the estimate. The estimation unit generates a sequence of estimates of the unknown parameter . The main objective is twofold: First, we need to quantify the impact of quantization on the estimation error . Second, we need to propose relevant quantization strategies to mitigate the estimation error.

Our contributions are threefold.

i) Under suitable assumptions, it can be shown using results of stochastic approximation theory that the normalized estimation error behaves as a zero mean Gaussian random variable when the length of the observation window goes to infinity, where the covariance coincides with the inverse of the Fisher Information Matrix (FIM) associated with the quantized samples. Note the analogy with [35], which establishes convergence to a Gaussian for the reconstruction error, while we address the estimation error.




Fig. 2. Illustration of the proposed sensor management scheme. The quantizer is recursively updated based on the most recent value of the estimate.

As the number of quantization levels is large, we prove that:

where is the ideal FIM that one would obtain if perfect/unquantized measurements were available at the estimation device, is a nonnegative matrix and stands for a term which can be neglected as goes to infinity. Hence, as converges to the ideal FIM at rate and matrix represents the asymptotic Fisher information loss.

The normalized Fisher information loss depends on the quantization strategy through the so-called model point density and model covariation profile. The model point density can be interpreted as the asymptotic density of cells in the neighborhood of each point of the observation space. The model covariation profile captures the shape of the cells.

ii) Our second contribution is the characterization of the high-rate quantizers which reduce the asymptotic mean square error (MSE). As a consequence of the previous result, the selection of relevant high-rate quantizers reduces to the determination of relevant point densities and covariation profiles. In case of scalar quantization , our compact expression of immediately yields a simple characterization of optimal high-rate quantizers. In case of vector quantization , an exact characterization of optimal quantizers is more difficult. Nevertheless, relevant and practical families of quantizers with a low asymptotic mean square error (MSE) can be determined.

iii) As may be expected, optimal high-rate quantizers depend on the true parameter value which is of course unknown. To circumvent this issue, we propose a new adaptive quantization scheme illustrated by Fig. 2. In the proposed system, the quantizer at the sensing unit is replaced by a Learning Vector Quantizer (LVQ) governed by the estimation unit, whose aim is to track the optimal unknown quantizer. Simulation results demonstrate the benefits of the adaptive quantization scheme w.r.t. fixed quantization.

The paper is organized as follows. Section II details the Maximum Likelihood Estimation with a fixed quantizer: convergence results are established and the asymptotic loss incurred by quantization is explicitly computed. The framework of high-resolution quantizers is presented in Section III and previous asymptotic loss computations are then derived in this framework. Section IV presents a new adaptive quantized estimator and Section V illustrates its good behavior on simulated data.

II. RECURSIVE MLE WITH A FIXED QUANTIZER

A. Framework

On a probability space , consider a random i.i.d. multivariate process on , where is an integer. Define and assume that for some density . Let be a measurable set containing the support of . Consider a collection of densities w.r.t. the Lebesgue measure on , where is some open subset of and is an integer. We define . We make the following assumption.

Assumption 1:
a) For any , is absolutely continuous w.r.t. .
b) For any , the function is continuously differentiable on .
c) For any compact set.

Consider a fixed integer . We refer to a -point quantizer as a tessellation of the set formed by measurable sets with nonzero volume, i.e., is strictly positive for any . For such a quantizer , we define the process as:

where represents the indicator function of a set . Otherwise stated, if and only if the sample hits the cell. Note that is an i.i.d. process whose distribution is given by . Fig. 1 illustrates the system of interest. The whole device is formed by the sensing unit communicating with an estimation unit. The sensing unit observes the raw data and transmits the quantized process to the estimation unit.

Our aim is to search for the value of such that the distribution best fits the observations in a maximum likelihood sense.
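For concreteness, the quantized process can be produced by mapping each raw sample to the index of the cell it falls in. The sketch below assumes, for illustration only, that the cells are the Voronoi cells of a given codebook (the paper allows arbitrary measurable cells); the names used here are not taken from the paper.

```python
import numpy as np

def quantize(x, codebook):
    """Return the index of the cell that the raw sample x hits.

    Here the cells are taken to be the Voronoi cells of `codebook`
    (shape (K, d)); the paper allows arbitrary measurable cells.
    """
    d2 = np.sum((codebook - x) ** 2, axis=1)  # squared distance to each codepoint
    return int(np.argmin(d2))

# toy usage: quantize one 2-D Gaussian sample with a random 16-point codebook
rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 2))
cell_index = quantize(rng.normal(size=2), codebook)
```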

In this paper, we moreover focus on recursive (on-line) estimation, i.e., one estimate is generated at each time instant based on the most recent observation and the past estimate. To that end, a natural approach is to use a stochastic gradient descent of the form:

(1)

for any , where represents the gradient operator w.r.t. , and is a deterministic step size. Generally speaking, it is known that stochastic gradient ML estimators are likely to produce a relatively significant residual estimation error. In order to improve the accuracy of the estimate, several methods have been proposed. A first approach consists in weighting the gradient in (1) with an on-line estimate of the inverse Fisher Information Matrix, in order to mimic a Newton-Raphson method [29], [16]. We also mention on-line Expectation-Maximization algorithms as an alternative to gradient descents, which are of special interest in case of models with latent variables [29], [4]. In this paper, we focus on averaging techniques. In [24], [15],


it is shown that the following averaged estimate asymptotically reduces the estimation error:

In this paper, we follow this approach and investigate the impact of quantization on the performance of the estimate .
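To make the recursion (1) and the averaging step concrete, here is a minimal sketch in which the user supplies the score of the quantized model, i.e., the gradient of the log-probability of each cell with respect to the parameter. The function name `grad_log_q`, the step-size exponent and the initialization are illustrative choices, not the paper's exact settings.

```python
import numpy as np

def online_quantized_mle(cell_indices, grad_log_q, theta0, step=lambda n: n ** -0.7):
    """Stochastic-gradient ML estimation from quantized samples, with averaging.

    cell_indices : sequence of observed cell indices Z_n
    grad_log_q   : callable (i, theta) -> gradient of log q_theta(i) w.r.t. theta
    theta0       : initial estimate
    step         : step-size sequence (should satisfy Assumption 2)
    Returns the last iterate and its running (Polyak-type) average.
    """
    theta = theta0
    theta_bar = theta0
    for n, i in enumerate(cell_indices, start=1):
        theta = theta + step(n) * grad_log_q(i, theta)   # recursion (1)
        theta_bar = theta_bar + (theta - theta_bar) / n  # averaged estimate
    return theta, theta_bar
```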

B. Asymptotic Behavior of the MLE

In this section, we investigate the behavior of the estimate as in terms of both almost sure convergence and central limit theorem. We assume the following.

Assumption 2: Sequence satisfies:
a) ,
b) .

We shall say that the algorithm is stable if, with probability one, there exists a compact set such that the sequence remains in . We introduce the Kullback-Leibler divergence:

for any . The following result is a consequence of standard results in stochastic approximation theory; see for instance [5] or [15]. The proof is provided in Appendix A.

Theorem 1: Let Assumption 1 hold true. Assume in addition that the algorithm is stable and that the set is finite. Then, both and converge almost surely to some point in .

The stability condition may not be easy to check in practice. There are several ways to guarantee stability. A possible approach is to confine the sequence to a predetermined bounded set. This can be achieved by introducing a projection step at each iteration of the stochastic gradient algorithm. Each time an estimate falls outside some convex compact set , it is brought back by replacing with the nearest point in . In that case, differential inclusion arguments may show that the conclusions of Theorem 1 remain true: converges to the set of Karush-Kuhn-Tucker points of the functional over the set . Refer to [1] or [14] for further details on projected stochastic approximation algorithms. Alternatively, one can stipulate additional assumptions for the weak classifier functions; see for instance [5].

Theorem 1 proves that quantization does not modify the limit points of the algorithm.

However, the result does not allow us to quantify the effect of quantization on the fluctuations of the error. To that end, we need further assumptions. We denote by the Hessian matrix of a function . We denote by the transpose of a matrix . Set . Also set . For any -tuple , denote .

Assumption 3: Let and a neighborhood of such that the following holds.
a) , the function is three times continuously differentiable on .
b) for any and any .

Introduce the following matrices:

where the subscript is used to recall that each data sample is quantized on bits. In case , it is well known that both matrices coincide (see for instance [17, p. 125]) and are then referred to as the Fisher Information Matrix (FIM). We make the following assumption.

Assumption 4:
a) The step size satisfies where .
b) Matrices and are positive definite.

We state the main result of this paragraph. Notation stands for the convergence in distribution as . Notation stands for the multivariate normal distribution with zero mean and covariance matrix . The following Theorem relies on results of [21] and [6]. The proof is provided in Appendix B.

Theorem 2: Let Assumptions 1, 3 and 4 hold true. Given the event

Moreover, if , then

Let us provide some comments. To illustrate the results, consider the case where the Kullback-Leibler divergence is strictly convex and where for some "true" parameter . It is clear that is reduced to the singleton . By Theorem 1, is a consistent estimate of . Moreover, the normalized error converges to by Theorem 2. Hence, the estimator is asymptotically efficient in the sense that it asymptotically achieves the Cramér-Rao bound . Under suitable uniform integrability conditions [3], this implies that the mean square error is such that:

(2)

where represents the Euclidean norm. When is no longer convex, the risk for the estimate to be trapped in a local minimum of is of course unavoidable. Nevertheless, Theorem 2 implies that, on the event , the (conditional) MSE still behaves as in (2).

In any case, the quantity is crucial, as it allows us to quantify the asymptotic mean square error associated with . In the sequel, these considerations shall incite us to search for quantizers which minimize the value . Unfortunately, the FIM has no closed-form expression as a function of quantizer . It seems therefore hopeless to analytically characterize the quantizers which minimize . For these reasons, the next section investigates the case where the number of cells is large. In the literature, such quantizers are usually referred to as high-resolution quantizers, high-rate quantizers, or sometimes small-cell quantizers.


III. HIGH-RESOLUTION QUANTIZERS

A. Notations and Assumptions

The aim of this section is to derive an approximation of the FIM which is accurate when the number of cells is large. The following will be assumed in this section.

Assumption 5:
a) for some .
b) is a compact convex set.
c) Functions and are three times continuously differentiable on .
d) .

For any measurable set , we define:
• its volume by ,
• its diameter by ,
• its centroid by ,
• its covariation as the matrix: .

In this section, we consider a sequence of quantizers . For any , notation denotes , i.e., represents the th cell of quantizer . We respectively refer to the specific density profile and the specific covariation profile of quantizer as the piecewise constant functions and defined by:

Assumption 6: The family of quantizers is such that the following holds as :
a) converges uniformly to a function such that .
b) converges uniformly to a matrix-valued function such that .
c) The sequence is bounded.

Assumption 6 implies that the volume of each cell vanishes at speed . Assumption 6-c) moreover implies that the cells shrink at the same speed in each dimension. We refer to as the model point density of the family . It represents the fraction of cells in the neighborhood of a given point . Function will be referred to as the model covariation profile. For each , is a non-negative matrix. In the literature, the function is usually referred to as the inertial profile [9], [20], [10], [32]. Function provides information about the shape of the cells.

Remark 1: Assumption 6 is, for instance, valid for sequences of quantizers constructed as companders [2], [9]. Such quantizers can be written as the composition of an invertible function (the so-called compressor) and a uniform quantizer. Since [2], it has been known that any scalar quantizer can be written as a compander. Under mild conditions on the compressor, it can be shown that any sequence of companders with a given fixed compressor satisfies Assumption 6 (in this case, the asymptotic density profile is fully determined by the first-order derivative of the compressor).

Intuitively, high-rate quantizers should be constructed in such a way that is large at those points for which a fine quantization is essential to estimate the parameter . Theorem 3 below provides a more rigorous formulation of this intuition.
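The point density of a given quantizer can also be inspected numerically. A common convention, assumed in the sketch below, is to attach to each point of a cell the value 1/(N · vol(cell)); cell volumes and centroids of Voronoi cells are estimated here by plain Monte Carlo over a bounding box, and all names are illustrative rather than taken from the paper.

```python
import numpy as np

def empirical_point_density(codebook, box, n_mc=100_000, rng=None):
    """Monte-Carlo estimate of cell volumes, centroids and point density.

    codebook : (N, d) codepoints defining Voronoi cells
    box      : (d, 2) array of lower/upper bounds of the compact set X
    Returns (volumes, centroids, density) with density[i] = 1 / (N * vol(A_i)),
    which is the convention assumed here for the specific density profile.
    """
    rng = np.random.default_rng() if rng is None else rng
    lo, hi = box[:, 0], box[:, 1]
    u = lo + (hi - lo) * rng.random((n_mc, codebook.shape[1]))
    labels = np.argmin(((u[:, None, :] - codebook[None]) ** 2).sum(-1), axis=1)
    n_cells = len(codebook)
    box_vol = np.prod(hi - lo)
    volumes = np.array([np.mean(labels == i) * box_vol for i in range(n_cells)])
    volumes = np.maximum(volumes, 1e-12)  # guard against cells missed by the MC draw
    centroids = np.array([u[labels == i].mean(axis=0) if np.any(labels == i) else codebook[i]
                          for i in range(n_cells)])
    density = 1.0 / (n_cells * volumes)
    return volumes, centroids, density
```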

B. Asymptotic Result

We introduce the following matrices:

It is worth noting that the above quantities are well defined for any and for any in a neighborhood of whenever Assumptions 1-a), 3, 4-c) and 5 are satisfied. Matrix can be interpreted as the Fisher information matrix which would have been obtained in the absence of quantization, i.e., if the raw data sequence were perfectly observed.

We now state the main result of this section. The proof is provided in Appendix C.

Theorem 3: Suppose that Assumptions 1, 3, 4-c), 5 and 6 hold true. For any , denote by the FIM associated with . Then,

(3)

The right-hand side of (3) can be interpreted as the asymptotic Fisher information loss caused by the quantization. Similarly, the quantity

will be referred to as the asymptotic MSE loss for the reason discussed in Section II.B.

Corollary 1: Under the assumptions of Theorem 3,

Proof: Denote by the asymptotic Fisher information loss in the right-hand side of (3). We write

and thus

which completes the proof.

C. Search for Relevant Quantizers

We search for a relevant couple . In this paragraph, we restrict ourselves to functions of the form:

where is some constant. The reason for introducing this restriction is twofold. First, it is not known which functions are allowable as covariation profiles. It is thus hopeless to maximize the Fisher information w.r.t. without further hints


on the precise domain of . Second, we are eventually interested in proposing practical quantizers with a finite number of cells: we shall therefore rely on practical design rules. Some of the most common quantizer design algorithms are the generalized Lloyd algorithm, or its recursive (on-line) counterparts [27], [18], [13]. Such algorithms aim to find the -point quantizer which minimizes the mean square error between a random variable and its quantized version, based on a training set composed of independent realizations of the latter random variable. The widely acknowledged conjecture of Gersho [7] states that for such quantizers, all cells (except those in the boundary of the domain ) become congruent as becomes large. On the other hand, Zamir et al. [35] state that the cells of an MSE quantizer become close to balls as the number of cells increases. Therefore, as long as we plan to make use of known MSE quantization schemes, it is reasonable to search for covariation profiles of the form . In that case,

Proposition 1: Set for some . Then,

and equality is achieved when coincides with the function given by:

(4)

Proof: Set and . By Hölder's inequality,

(5)

where we used the fact that . One may easily check that the right-hand side of (5) is equal to:

Noticing that equality is achieved when , this proves the desired result.

Proposition 1 has crucial practical consequences. Indeed, if the system designer manages to select a quantizer with point density , then a better estimation accuracy is expected. Unfortunately, finding such a quantizer is not an easy task because the optimal density depends on the , which is unknown by definition. In the next section, we propose a joint method to search for the optimal quantizer while estimating .

IV. ADAPTIVE LEARNING VECTOR QUANTIZATION

In this section, we detail the closed-loop sensor management scheme illustrated by Fig. 2. This scheme relies on the simultaneous use of the online MLE and a Kohonen Learning Algorithm (KLA) whose aim is to search for the desired quantizer of model point density . We first recall the principle of the KLA before describing the whole sensor management system.

A. Preliminaries: Kohonen Learning Algorithm (KLA)

Kohonen's algorithm is a learning algorithm for neural networks, which has also been extensively used in the framework of Learning Vector Quantization (see [34] for an analysis). We describe the algorithm in its simplest WTA (Winner Takes All) version, also known as the k-means algorithm [27], [18]. For any , we let

be the Voronoi tessellation associated with , i.e., for any ,

Set . KLA is a recursive procedure which generates a codebook sequence of points in based on a training set . At iteration , KLA is fed with a new training sample in the set . The codebook is updated as follows:

(6)
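A minimal sketch of one WTA iteration, cf. (6): the codepoint closest to the new training sample (the "winner") is moved a fraction of the way toward it, and all other codepoints are left unchanged. The step size `mu` is an illustrative choice.

```python
import numpy as np

def kla_wta_update(codebook, sample, mu):
    """One Winner-Takes-All (online k-means) step of the Kohonen Learning Algorithm."""
    winner = int(np.argmin(np.sum((codebook - sample) ** 2, axis=1)))
    updated = codebook.copy()
    updated[winner] += mu * (sample - updated[winner])  # move the winner toward the sample
    return updated
```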

B. Proposed Adaptive Quantization Scheme

We now describe the proposed sensor management system. The latter is based on the simultaneous use of MLE and KLA. We set:

for any . Consider a fixed integer . At each time instant , suppose that an estimate is available, along with a codebook . The proposed quantizer is the Voronoi tessellation associated with . The update goes in two steps.

• Update of the estimate. Define the quantized process as:

The on-line estimate is given by:

(7)


• Update of the codebook. Draw a random variable under the distribution , where we set:

and otherwise. Then, the codebook is updated according to (6). We may equivalently write, for any :

(8)

Remark 2: Note that, in the proposed scheme, the estimation unit must send the current value of to the sensing unit. In practice, quantization is unavoidable on the feedback link as well: the sensing unit will receive a quantized version of , say , where denotes the feedback quantizer. Nevertheless, provided that feedback quantization is fine enough, the quantizers respectively derived from shall not significantly differ from the one derived from .
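Putting the two updates together gives the closed loop of Fig. 2. The sketch below is schematic: `grad_log_q` supplies the gradient used in (7), and `draw_training_sample(theta)` stands for the sampler that feeds the codebook update (8) with synthetic data matched to the current estimate; the exact distribution of these training samples is the one specified by (8) in the paper and is treated here as user-supplied.

```python
import numpy as np

def adaptive_lvq_mle(raw_samples, grad_log_q, draw_training_sample, theta0, codebook0,
                     gamma=lambda n: n ** -0.7, mu=lambda n: n ** -0.7):
    """Joint online MLE (7) and learning-vector-quantizer update (8), schematic version."""
    theta, theta_bar = theta0, theta0
    codebook = np.array(codebook0, dtype=float)
    for n, x in enumerate(raw_samples, start=1):
        # sensing unit: quantize the raw sample with the current codebook
        i = int(np.argmin(np.sum((codebook - x) ** 2, axis=1)))
        # estimation unit: stochastic-gradient update of the estimate, cf. (7)
        theta = theta + gamma(n) * grad_log_q(i, theta)
        theta_bar = theta_bar + (theta - theta_bar) / n
        # codebook update, cf. (8): WTA step driven by a synthetic training sample
        v = draw_training_sample(theta)
        w = int(np.argmin(np.sum((codebook - v) ** 2, axis=1)))
        codebook[w] += mu(n) * (v - codebook[w])
    return theta_bar, codebook
```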

V. NUMERICAL EXPERIMENTS

A. Illustration

We begin this section with a visual example comparing the optimal quantizer obtained in Section III.C and the usual MSE-optimal quantizer which minimizes the mean square error between the observed sample and the quantized one. Our model is as follows. We set and keep cells in all experiments. Observations are drawn from a multivariate Gaussian distribution with covariance matrix , where is set to one. The parametric model is thus given by: . The optimal quantizer described in Section III.C, shown in Fig. 3(a), has been obtained by drawing an i.i.d. sequence of 20,000 random variables distributed w.r.t. the density defined by (4). Straightforward derivations prove that in this case:

As a matter of fact, turns out to be the distribution of the random variable , where is uniformly distributed in and follows a Gamma distribution of parameters . As opposed to the MSE-optimal quantizer provided in Fig. 3(b), the quantizer based on has a low density of cells near the origin.
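The construction of the quantizer of Fig. 3(a) can be reproduced along the following lines: draw a large training set from the density (4) and fit a codebook to it with a few Lloyd (batch k-means) iterations. In the sketch below the training points are generated by the radius/direction decomposition described above, but the Gamma shape and scale, whether the radius or its square is Gamma-distributed, and the number of cells are placeholders to be read off (4) and the experimental setup.

```python
import numpy as np

def sample_training_points(n, gamma_shape=2.0, gamma_scale=1.0, rng=None):
    """Draw n points as (radius x uniform direction); here the squared radius is
    taken Gamma(gamma_shape, gamma_scale) -- placeholder parameters."""
    rng = np.random.default_rng() if rng is None else rng
    angle = rng.uniform(0.0, 2.0 * np.pi, size=n)
    direction = np.stack([np.cos(angle), np.sin(angle)], axis=1)
    radius = np.sqrt(rng.gamma(gamma_shape, gamma_scale, size=n))
    return radius[:, None] * direction

def lloyd_codebook(train, k, n_iter=50, rng=None):
    """Plain batch k-means (Lloyd) iterations turning the training set into a codebook."""
    rng = np.random.default_rng() if rng is None else rng
    codebook = train[rng.choice(len(train), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = np.argmin(((train[:, None, :] - codebook[None]) ** 2).sum(-1), axis=1)
        for i in range(k):
            if np.any(labels == i):
                codebook[i] = train[labels == i].mean(axis=0)
    return codebook

train = sample_training_points(20_000)
codebook = lloyd_codebook(train, k=16)  # number of cells is illustrative
```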

B. Performance Study

In this paragraph, we numerically validate the proposed adaptive quantizer using the following model. We consider scalar observations and set . We simulate according to , using in our simulations, and follow the implementation given by (7) and (8).

The proposed adaptive quantizer is compared to a standard online unquantized maximum likelihood estimate using stochastic gradient ascent, and to other fixed quantization schemes. The first comparison is obviously unfair. Its goal is

Fig. 3. Comparison of the optimal quantizer (a) given in Section III.C and the MSE-optimal quantizer (b).

not to assess which estimator is the best but to measure how much is lost using quantization and confront this loss with theory. Unquantized estimators are an idealized abstraction providing us with an ideal MSE bound. The second comparison is fair as it is performed using the same number of cells for all quantizers. Several computations are needed:
• .
• .
• Assume is a random variable following the distribution and denote a random variable such that ; then has a density proportional to .

We are now in a position to describe the performed experiments, which consist in comparing three algorithms that estimate the variance parameter :


unquantized online maximum likelihood, quantized online "maximum likelihood", and online adaptive quantized "maximum likelihood".

1) Unquantized Online Maximum Likelihood: Let us denote by the unquantized ML sequence and by the corresponding averaged version. The iterations of the corresponding algorithm are described by:

(9)

where are i.i.d. and follow a distribution, and forms a standard stochastic approximation decreasing step-size sequence. We chose in our experiments.

2) Quantized Online Maximum Likelihood: Let us denote the quantized estimate by (and by its averaged version) in order to distinguish the quantized estimate from the ideal one. The iterations are described by:

(10)

where denotes the quantizer cell centers and is a decreasing step-size. We chose in our experiments, and used two different quantizers: the uniform quantizer between and the Gaussian quantizer having the form , which is MSE optimal for the reconstruction of , and denotes the c.d.f. of a standard Gaussian distribution (hence is the quantile function).
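For reference, the two fixed codebooks of this comparison can be generated as follows. The uniform centers are equispaced on an interval whose bounds are placeholders, and the "Gaussian" centers are placed at standard-normal quantiles of equispaced probabilities, which is one plausible reading of the form given above rather than the paper's exact formula.

```python
import numpy as np
from scipy.stats import norm

def uniform_codebook(k=10, lo=-4.0, hi=4.0):
    """k equispaced cell centers on [lo, hi] (bounds are illustrative)."""
    return np.linspace(lo, hi, k)

def gaussian_codebook(k=10, sigma=1.0):
    """k centers at Gaussian quantiles: sigma * Phi^{-1}((2i - 1) / (2k)), i = 1..k."""
    probs = (2.0 * np.arange(1, k + 1) - 1.0) / (2.0 * k)
    return sigma * norm.ppf(probs)
```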

the algorithm proposed in this paper. Its iterations are describedby (11), shown at the bottom of the page, where is iiddistributed , and are i.i.d., drawn accordingto .4) Results: We implemented the three previous scheme,

namely: the (unquantized) online maximum likelihood es-timator , the quantized online maximum likelihoodestimator using two different quantizers, uniform andGaussian, and the algorithm proposed in this paper which

Fig. 4. Boxplots of over 1000 independent experiments for (i) the (unquantized) online maximum likelihood estimator, (ii) the quantized online maximum likelihood estimator using the uniform quantizer, (iii) the quantized online maximum likelihood estimator using the Gaussian quantizer, (iv) the adaptive quantized maximum likelihood estimator proposed in this paper. The data are simulated according to a distribution with . One can see that for most experiments the results are comparable (the boxes are nearly the same for the various estimators) and is correctly estimated (near the red line which corresponds to the median). However, the uniform quantizer has larger outliers (not represented for scaling reasons), which incurs a larger loss in its variance (0.8). The adaptive estimator reaches the same variance as the unquantized estimator, while being slightly superior to the Gaussian quantizer.

is an adaptive quantized version of the maximum likelihood estimator .

The results are represented in Fig. 4. The data are simulated according to a distribution with . One can see that for most experiments the results are comparable (the boxes are nearly the same for the various estimators) and is correctly estimated (near the red line corresponding to the median). However, the uniform quantizer has larger outliers, which incurs a larger loss in its variance (0.8). The adaptive estimator reaches the same variance as the unquantized estimator , while being slightly superior to the Gaussian quantizer . Corresponding quantizers are shown in Fig. 5. Fig. 5 illustrates that data reconstruction and parameter estimation are two separate tasks that should be addressed using separate quantizers. The proposed adaptive quantizer addresses the problem of parameter estimation.

(11)


Fig. 5. Top: Uniform quantizer with cells. Middle: Gaussian quantizer with cells (MSE optimal). Bottom: Proposed adaptive quantizer with 10 cells. Note again that the density of the proposed adaptive quantizer is lower near the origin. This illustrates the fact that data reconstruction and parameter estimation are two separate goals that should be addressed using separate quantizers.

VI. CONCLUSION

In this paper, the loss incurred by quantization was studied using an asymptotic viewpoint relying on Fisher information. In the context of high-rate quantization, this allowed us to derive useful guidelines to design vector quantizers minimizing the estimation error. This paper also introduced a novel estimation scheme by coupling the online estimator with a learning vector quantizer trained with simulated data. Further work would include the generalization of our algorithm to contexts other than maximum likelihood estimation, and the study of constant step-size stochastic approximation algorithms that are able to track evolving parameters.

APPENDIX A
PROOF OF THEOREM 1

Lemma 1 below is needed to ensure that the iteration (7) is always well defined. The proof of Lemma 1 is a direct consequence of the dominated convergence theorem, and is therefore left to the reader.

Lemma 1: Under Assumption 1, the following holds for any measurable set such that .
• The function is continuously differentiable on and .
• .
• The function is continuously differentiable on .

Denote by the natural filtration, i.e., . By the above lemma, . Thus, iteration (7) can be equivalently written as:

where is a martingale difference noise adapted to the natural filtration, i.e., . Let us now define the function by:

It is straightforward to show from Minkowski's inequality that for any compact set and for any :

The final result is obtained by application of [5, Theorem 13, p. 26].

APPENDIX B
PROOF OF THEOREM 2

We first need the following technical lemma. The latter is a straightforward consequence of the dominated convergence theorem. The proof is left to the reader.

Lemma 2: Under Assumptions 1-a) and 3, the following holds for any measurable set such that .
• The function is three times continuously differentiable on and .
• The function is three times continuously differentiable on .

Lemma 2 allows us to show that for any

The final result directly follows from [6, Theorem 1.4, p. 7].

APPENDIX C
PROOF OF THEOREM 3

Consider the sequence of quantizers , where we write . Consider fixed integers in . Denote respectively by and the th component of and . It is straightforward to show that:

(12)

(13)

For any and any , we define the vector:

Theorem 3 is a direct consequence of the following proposition.

Proposition 2: Let be a measurable set with nonzero volume and set for convenience. Then,

We now complete the proof of Theorem 3 and postpone the proof of Proposition 2 to the end of this section. Denote . Putting together (12), (13) and Proposition 2, we obtain:

By Assumption 6, letting in the above inequality yields the final result.

We now proceed with the proof of Proposition 2. Proposition 2 is the immediate consequence of Lemma 3 and Lemma 4 below.


Lemma 3: Let be a measurable set with nonzero volume and set . Then,

where

Proof: Recall that . We first consider the Taylor-Lagrange expansion of at point . For any , note that . Thus,

(14)

Integrate the above equation on the set and normalize by :

(15)

Using the same expansion for , we obtain similarly:

(16)

Putting together (15) and (16) yields the desired result after standard algebra.

Lemma 4: Let be a measurable set with nonzero volume and set . Then,

Proof: Similarly to (14), we obtain:

Thus,

(17)

where we set:

We need to expand . Using (14):

(18)

where we set:

Putting together (17) and (18), we obtain:

(19)

where we set:

After some algebra, one may rewrite matrix as:

Integrating (19) on the set , we obtain:

This completes the proof.

REFERENCES

[1] M. Benaim, J. Hofbauer, and S. Sorin, "Stochastic approximations and differential inclusions," SIAM J. Contr. Optimiz., vol. 44, no. 1, pp. 328–348, 2005.

[2] W. Bennett, "Spectra of quantized signals," Bell Syst. Tech. J., vol. 27, pp. 446–472, 1948.

[3] P. Billingsley, Probability and Measure, 3rd ed. New York, NY, USA: Wiley-Intersci., 1995.


[4] O. Cappé and E. Moulines, "On-line expectation-maximization algorithm for latent data models," J. Roy. Statist. Soc., Ser. B (Statist. Methodol.), vol. 71, no. 3, pp. 593–613, 2009.

[5] B. Delyon, "Stochastic approximation with decreasing gain: Convergence and asymptotic theory," Unpublished Lecture Notes, 2000 [Online]. Available: http://perso.univ-rennes1.fr/bernard.delyon/as_cours.ps

[6] G. Fort, "A central limit theorem for a stochastic approximation algorithm and its Polyak-averaged version," Tech. Rep., 2012 [Online]. Available: http://perso.telecom-paristech.fr/~gfort/Preprints/CLT-forSA.pdf, preprint.

[7] A. Gersho, "Asymptotically optimal block quantization," IEEE Trans. Inf. Theory, vol. 25, no. 4, pp. 373–380, 1979.

[8] A. Gersho and R. Gray, Vector Quantization and Signal Compression. New York, NY, USA: Kluwer, 1992.

[9] R. Gray and D. Neuhoff, "Quantization," IEEE Trans. Inf. Theory, vol. 44, no. 6, pp. 2325–2383, 1998.

[10] R. Gupta and A. Hero, "High-rate vector quantization for detection," IEEE Trans. Inf. Theory, vol. 49, no. 8, pp. 1951–1969, 2003.

[11] T. Han and S. Amari, "Statistical inference under multiterminal data compression," IEEE Trans. Inf. Theory, vol. 44, no. 6, pp. 2300–2324, 1998.

[12] S. Kassam, "Optimum quantization for signal detection," IEEE Trans. Commun., vol. 25, no. 5, pp. 479–484, 1977.

[13] T. Kohonen, "The self-organizing map," Proc. IEEE, vol. 78, no. 9, pp. 1464–1480, 1990.

[14] H. Kushner and D. Clark, Stochastic Approximation Methods for Constrained and Unconstrained Systems. New York, NY, USA: Springer-Verlag, 1978.

[15] H. Kushner and G. Yin, Stochastic Approximation and Recursive Algorithms and Applications. New York, NY, USA: Springer, 2003.

[16] K. Lange, "A gradient algorithm locally equivalent to the EM algorithm," J. Roy. Statist. Soc., Ser. B (Methodol.), pp. 425–437, 1995.

[17] E. Lehman and G. Casella, Theory of Point Estimation, ser. Springer Texts in Statistics, 2nd ed. New York, NY, USA: Springer, 1998.

[18] J. MacQueen, "On convergence of k-means and partitions with minimum average variance," Ann. Math. Statist., vol. 36, p. 1084, 1965.

[19] V. Misra, V. K. Goyal, and L. R. Varshney, "Distributed scalar quantization for computing: High-resolution analysis and extensions," IEEE Trans. Inf. Theory, vol. 57, no. 8, pp. 5298–5325, 2011.

[20] S. Na and D. Neuhoff, "Bennett's integral for vector quantizers," IEEE Trans. Inf. Theory, vol. 41, no. 4, pp. 886–900, 1995.

[21] M. Pelletier, "Weak convergence rates for stochastic approximation with application to multiple targets and simulated annealing," Ann. Appl. Probabil., vol. 8, no. 1, pp. 10–44, 1998.

[22] K. Perlmutter, S. Perlmutter, R. Gray, R. Olshen, and K. Oehler, "Bayes risk weighted vector quantization with posterior estimation for image compression and classification," IEEE Trans. Image Process., vol. 5, no. 2, pp. 347–360, 1996.

[23] B. Picinbono and P. Duvaut, "Optimum quantization for detection," IEEE Trans. Commun., vol. 36, no. 11, pp. 1254–1258, 1988.

[24] B. Polyak and A. Juditsky, "Acceleration of stochastic approximation by averaging," SIAM J. Contr. Optimiz., vol. 30, pp. 838–855, 1992.

[25] H. Poor, "Fine quantization in signal detection and estimation," IEEE Trans. Inf. Theory, vol. 34, no. 5, pp. 960–972, 1988.

[26] H. Poor and J. Thomas, "Applications of Ali-Silvey distance measures in the design of generalized quantizers for binary decision systems," IEEE Trans. Commun., vol. 25, no. 9, pp. 893–900, 1977.

[27] H. Steinhaus, "Sur la division des corps materiels en parties," (in French) Bull. Acad. Polon. Sci., vol. 1, pp. 801–804, 1956.

[28] R. Tenney and N. Sandell, "Detection with distributed sensors," IEEE Trans. Aerosp. Electron. Syst., vol. 17, no. 4, pp. 501–510, 1981.

[29] D. Titterington, "Recursive parameter estimation using incomplete data," J. Roy. Statist. Soc. Ser. B (Methodol.), pp. 257–267, 1984.

[30] J. Tsitsiklis, "Decentralized detection by a large number of sensors," Math. Control, Signals, Syst., vol. 1, no. 2, pp. 167–182, 1988.

[31] J. Tsitsiklis, "Extremal properties of likelihood-ratio quantizers," IEEE Trans. Commun., vol. 41, no. 4, pp. 550–558, 1993.

[32] J. Villard and P. Bianchi, "High-rate vector quantization for the Neyman-Pearson detection of some mixing processes," IEEE Trans. Inf. Theory, vol. 57, no. 8, pp. 5387–5409, 2011.

[33] J.-J. Xiao, A. Ribeiro, Z.-Q. Luo, and G. Giannakis, "Distributed compression-estimation using wireless sensor networks," IEEE Signal Process. Mag., vol. 23, no. 4, pp. 27–41, 2006.

[34] E. Yair, K. Zeger, and A. Gersho, "Competitive learning and soft competition for vector quantizer design," IEEE Trans. Signal Process., vol. 40, no. 2, pp. 294–309, Feb. 1992.

[35] R. Zamir and M. Feder, "On lattice quantization noise," IEEE Trans. Inf. Theory, vol. 42, no. 4, pp. 1152–1159, 1996.

Pascal Bianchi (M'12) was born in 1977 in Nancy, France. He received the M.Sc. degree from the University of Paris XI and Supélec in 2000 and the Ph.D. degree from the University of Marne-la-Vallée in 2003.

From 2003 to 2009, he was an Associate Professor at the Telecommunication Department of Supélec. In 2009, he joined the Statistics and Applications Group, LTCI-Telecom ParisTech. His current research interests are in the area of distributed algorithms for multiagent networks.

Jérémie Jakubowicz (M'09) received the M.S. and Ph.D. degrees in applied mathematics, both from the Ecole Normale Supérieure de Cachan, in 2004 and 2007, respectively.

He was an Assistant Professor with Télécom ParisTech. Since 2011, he has been an Assistant Professor with Télécom SudParis and an Associate Researcher with the CNRS. His current research interests include distributed statistical signal processing, image processing, and data mining.