IEEE 2014 Information Theory and Applications Workshop (ITA) - San Diego, CA, USA



Performance improvement with decoder output smoothing in differential predictive coding

Jerry D. Gibson and Malavika Bhaskaranand
Department of Electrical and Computer Engineering

University of California, Santa Barbara

Santa Barbara, CA 93106, USA

Email: {gibson, malavika}@ece.ucsb.edu

Abstract—We provide a theoretical analysis of the performance of differential predictive coding using fixed-lag smoothing of the standard decoder output. This performance is compared to related results for coding using latency at the encoder, and causal encoding with delayed decoding, as well as with some prior theoretical analyses of these methods. Surprisingly, it is shown that fixed-lag smoothing of the standard decoder output with causal encoding achieves the asymptotic and finite lag performance promised by a completely reoptimized decoder.

I. INTRODUCTION

Differential predictive coding, whether for the removal of

temporal redundancy in speech, for motion compensation in

video, or spatial redundancy removal for still images and video

frames, is one of the most powerful and ubiquitous building

blocks for practical lossy source coding. This is due, in part, to the basic coding paradigm of removing any

redundancy present in a source at the encoder prior to coding

that can then be added back in at the decoder, thus reducing the

transmitted bit rate for a desired target distortion. This basic

predictive coding approach is strongly motivated by physical

models of practical sources as well as by theorems from rate

distortion theory for Gaussian autoregressive source models,

although in the latter, the source model and its attendant

parameters are assumed known, which is seldom possible in

physical applications.

Encoder and decoders that employ the differential predictive

coding paradigm are notoriously difficult to optimize and

analyze since the prediction is around a coding method,

often a quantizer, contained within a closed feedback loop.

Further complicating the analysis and optimization is that the

prediction usually has a finite memory of a few past coded

values—the prediction is not and cannot be accomplished

using the uncoded input with asymptotic memory. Therefore,

although difficult, performance analyses and the optimization

of differential predictive coders are fertile areas for research.

It is well known that rate distortion optimal source coding

implies the exploitation of latency at the encoder as motivated

by proofs of the fundamental theorems of rate distortion

theory. To further exploit the power of the predictive coding

structure, more than 35 years ago, researchers added multipath

searching to harvest some of the performance gains promised

by increased latency at the encoder.

However, it is often advantageous to incorporate latency

into the design of the decoder if the encoding is not rate

distortion optimal. When latency is utilized at the decoder,

it can be inherent to the decoder design or latency can be

exploited outside of the decoding loop as an add-on filtering or

smoothing operation. The attractiveness of this latter approach

to incorporating latency is that it is a pure add-on technology

and can be used to improve the performance of any codec.

Furthermore, it requires no reoptimization or modification of

the basic matched encoder.

Periodically, theoreticians become aware of the predictive

coding paradigm and its variants and attempt to provide

theoretical performance analyses to understand the fundamental limits of the predictive coding method. Because of the

differential predictive coding structure involving prediction

around a nonlinearity (the quantizer) as stated above, a number

of assumptions are invoked in order to make progress. Often

these assumptions are counter to the basic differential encoding

structure being analyzed and to any possible physical problem.

Furthermore, these theoretical contributions often do not cite

the rich prior literature on the topic, and thus do not build

upon existing prior results.

The primary goal of this paper is to provide a theoretical analysis of differential predictive coding with an add-on

smoothing operation outside of the basic decoding structure.

This performance is then contrasted with rate distortion optimal coding, coding using latency at the encoder, and delayed

decoding methods, as well as with some prior theoretical

analyses of these methods. In order to do so, we provide some

context for each of the approaches and key prior results from

differential predictive coding and rate distortion theory.

In addition to examining the performance improvement

available using a smoother at the output of the decoder, we also

suggest a specific smoother structure to obtain the promised

performance improvement.

II. THE SHANNON BACKWARD CHANNEL

While it is difficult to obtain closed form solutions for R(D) in general, there have been a number of lower bounds that have

been developed. The most useful for many applications is the

Shannon lower bound (SLB) for difference distortion measures

derived by Shannon in his 1959 paper [1]. Shannon showed

that for a continuous source X and a difference distortion

measure d(x, x̂) = d(x − x̂) subject to the constraint

∫∫ p(x) p(x̂ | x) d(x − x̂) dx dx̂ ≤ D,

Fig. 1. ADPCM encoder and decoder: prediction loop around the quantizer, binary encoder/decoder over the channel, and N-tap predictor Σ_{i=1}^{N} a_i z^{−i} producing ŝ(k) from e_q(k).

the lower bound is

R(D) ≥ h(X) + ∫ g(e) log g(e) de,

which simplifies to

R(D) ≥ h(X) − max h(g),

where g(e) = g(x − x̂) and the maximum is taken over all probability densities for the error that satisfy the average distortion constraint [2]. Equality is achieved iff

∫ p(x̂) g(x − x̂) dx̂ = p(x).

The equality condition produces the Shannon optimum

backward channel, which states that the reconstructed value

and the encoding error are statistically independent and sum

to the input source value. It is this statistical independence

result via the Shannon optimum backward channel that is so

useful in lossy source coding analyses and designs.
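As a standard illustration (not derived in this paper, but consistent with [1], [2]): for a zero-mean Gaussian source under squared-error distortion, the SLB is tight and the backward channel is explicit.

```latex
% Gaussian source X ~ N(0, \sigma^2), squared-error distortion, D \le \sigma^2.
% h(X) = \tfrac{1}{2}\log 2\pi e \sigma^2, and the maximum-entropy error
% density with E\{e^2\} \le D is g = N(0, D), with h(g) = \tfrac{1}{2}\log 2\pi e D.
% Hence
R(D) \;\ge\; \tfrac{1}{2}\log 2\pi e \sigma^2 - \tfrac{1}{2}\log 2\pi e D
     \;=\; \tfrac{1}{2}\log\frac{\sigma^2}{D},
% which the Gaussian source meets with equality via the backward channel
% X = \hat{X} + E, \quad E \sim N(0, D) \text{ independent of } \hat{X}.
```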

III. PRIOR WORK

A. Differential Predictive Coding

The basic predictive coding block diagram is shown in Fig.

1, wherein it can be seen that the prediction is around the quantizer so that the prediction of the input utilizes reconstructed

versions of the source samples containing quantization noise.

This structure is necessary in order to avoid accumulation

of quantization noise at the decoder. The consequence of

this structure is that analyses must necessarily contend with

modeling the quantization noise. Since the most interesting

applications for lossy source coding are not high rate with

small distortion or negligible quantization noise, providing

analyses that are relevant to actual applications is difficult.

The power of the differential predictive coding structure is

that it incorporates a model of the source for the prediction,

and perhaps more importantly, a representation of the prediction error is coded and transmitted to the decoder. The critical

importance of this latter observation is that even if the source

model is not accurate, the error in the model contains any

residual source correlation and can be sent to the decoder to

use in reconstruction. As will be pointed out in what follows,

this property is fundamental to the on-going importance and

impact of differential predictive coding on applications over

many decades. Optimal encoding and decoding structures can

destroy this property and introduce perhaps unforeseen and

unexpected drops in perceptual performance for real sources.

For more extensive discussions of differential predictive

coding, see Jayant [3], Gibson [4], Jayant and Noll [5], Gibson

[6], and O’Neal [7], among many, many others.

B. Tree Coding

While differential predictive coding came out of the practical source coding community, tree coding and its combinations

with differential predictive coding were direct offshoots of

rate distortion theory. The seminal papers in this area are

by Anderson [8] and his students, with other early important

contributions by Berger [2], Wilson [9], and Stewart and

Gray [10]. As shown in the block diagram in Fig. 2, the

three principal components of tree coders were (and are) the

code generator (which includes the predictor), the distortion

measure, and the tree search algorithm.

A surprise that came out of the practical applications of tree

coding was that while tree coding promised and could attain

significant improvements in the reduction of mean squared

reconstruction error, or identically, significant increases in

output signal to noise ratio, these objective increases did not

translate to improved perceptual performance. In fact, for

speech sources, the perceptual performance was degraded.

This led to the consideration of weighted squared error as

the distortion measure to be minimized. Like differential pulse

code modulation (DPCM), there is a rich tree coding literature

going back to the 1970’s.

C. Delayed Decoding

Delayed decoding indicates changing the decoder structure

to take advantage of latency to improve the performance of

suboptimal encoding. Ishwar and Ramchandran [11] investigate a decoder that implements noncausal minimum mean squared error (MMSE) smoothing coupled with a differential predictive encoder with a causal MMSE predictor in the prediction loop around a quantizer modeled as a rate distortion (R-D) optimal encoder. The causal MMSE predictor is matched

to the source input, not the quantized signal in the feedback

Fig. 2. Tree coder: code generator, distortion calculation, and tree search with symbol release rule, mapping the input sequence s(k) through candidate outputs and path maps to ŝ(k).

loop at the decoder. Under these and other assumptions to

be elaborated later, they show that performance gains of 1 to

1.5 dB can be obtained for a first order Gauss-Markov source

with high temporal correlation at low bit rates and low delay

values. They also argue that at high rates there is little to

be gained by incorporating latency at the decoder and that

at low rates their conclusion is true as well. Their imagined

application is video coding. We compare our results using

fixed-lag smoothing outside the decoder loop and without

assumptions on the optimality of encoding later in this paper.

Melkote and Rose [12] propose and study decoder designs

that modify the decoder to incorporate delay for a standard

differential predictive encoder. They utilize the quantizer indices as input and propagate the MMSE optimal conditional

densities at the decoder which allow for delay. They reoptimize

the quantizer output values at the decoder in addition to using

delay. They also develop a lower-complexity codebook-based decoder. They show up to a 1 dB improvement over

Smoothed DPCM (SDPCM) [13] for autoregressive sources

with correlation coefficient of 0.95. These results are revisited

later in the paper for comparisons with the current analysis.

D. Smoothing

Apparently the idea of using latency at the decoder was

introduced by Gibson in 1981 at the Information Theory Symposium [14]. At that time, the ISIT only published abstracts

and not summaries or full papers. A quote from the abstract

of this paper is as follows: “A new tree coder structure is

proposed that uses a reduced rate code in conjunction with

a delayed decoding method based on fixed-lag smoothing.”

Much of the content of the presentation was issued as an

unpublished research report in July, 1980 [15]. We provide

some of these results and additional details in subsequent

sections.

Variations on this smoothing idea were pursued later by

Sethia and Anderson [16], through Interpolative DPCM (IDPCM), and Chang and Gibson with Smoothed DPCM [13].

These papers considered low rates of 1 and 2 bits/sample

and both synthetic and speech sources. The work for speech

sources in Chang and Gibson used adaptive quantization and

least squares lattice adaptive predictor estimation. For synthetic sources, both SDPCM and IDPCM outperform DPCM,

and SDPCM performs considerably better than IDPCM.

IV. FIXED LAG SMOOTHING

We now consider the decoder block diagram shown in Fig.

3, wherein fixed-lag smoothing is incorporated at the output

of a standard differential predictive coder to improve the

performance [14]. This is quite different from modifying the

decoder itself and is a structure that can be used to improve the

performance of any basic differential predictive coder without

reoptimization of the encoder or decoder.

In order to find the maximum gains possible by fixed-lag

smoothing of the reconstructed source output at the decoder,

we compute the steady state smoothing errors as a function

of the lag. The state model for the first order autoregressive

source is given as

x(k + 1) = αx(k) + w(k + 1), (1)

where α ∈ [−1, 1] and w(k + 1) is a stationary, zero mean,

white process with variance q. It is important to notice that

the driving term process can also be nonzero mean and cross-

correlated with the observation model noise. The observation

model can be expressed as

z(k + 1) = x(k + 1) + v(k + 1), (2)

where v(k + 1) is the zero mean quantization error with

variance r. Again, the observation noise can be nonzero mean,

temporally correlated, and cross correlated with the driving

term of the state model process with suitable modifications

to the fixed-lag smoothing algorithms. We keep the simplified models here since they allow a more direct comparison to prior

results. The steady-state expression for the fixed-lag smoothing

error covariance as a function of the lag L is [14], [17]

P_L = P − Σ_{j=1}^{L} [(1 − K)α]^{2j} (P̄ − P),    (3)

Fig. 3. Delayed decoding with fixed-lag smoothing: the received path map is mapped to quantization levels e_q(k−1), e_q(k), decoded through the predictor loop Σ_{i=1}^{N} a_i z^{−i} to ŝ(k), and smoothed by the fixed-lag smoother to ŝ_s(k).

where

P̄ = α^2 P + q    (4)
K = P̄ (P̄ + r)^{−1}    (5)
P = (1 − K) P̄.    (6)

Given α, q, and r, P can be computed as the positive root of the following quadratic equation

α^2 P^2 + (q + r − α^2 r) P − qr = 0.    (7)

Then P̄, K, and P_L can be evaluated using Equations (4), (5), and (3), respectively.

The asymptotic expression for the smoothing error covariance as L goes to infinity is given by

P_Lmin = P − Σ_{j=1}^{∞} [(1 − K)α]^{2j} (P̄ − P)
       = P − (P̄ − P) · (1 − K)^2 α^2 / (1 − (1 − K)^2 α^2).    (8)

This result can be used to determine what value should be

selected for the maximum delay to obtain near asymptotic

performance.
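To make the computation concrete, Equations (3)-(8) can be evaluated in a few lines. The following Python sketch (the function name steady_state_smoothing is ours, not from the paper) reproduces the Example 1 / Table I values reported in Section V:

```python
import math

def steady_state_smoothing(alpha, q, r, lags):
    """Steady-state fixed-lag smoothing error covariances for the scalar
    model x(k+1) = alpha*x(k) + w(k+1), z(k+1) = x(k+1) + v(k+1)."""
    # Filtered covariance P: positive root of the quadratic in Equation (7).
    a = alpha ** 2
    b = q + r - a * r
    P = (-b + math.sqrt(b * b + 4 * a * q * r)) / (2 * a)
    P_bar = a * P + q             # Equation (4): one-step prediction covariance
    K = P_bar / (P_bar + r)       # Equation (5): steady-state Kalman gain
    rho = ((1 - K) * alpha) ** 2  # ratio of the geometric series in Equation (3)
    # Equation (3): error covariance after smoothing with lag L.
    P_L = [P - (P_bar - P) * sum(rho ** j for j in range(1, L + 1)) for L in lags]
    # Equation (8): asymptotic (infinite-lag) smoothing covariance.
    P_inf = P - (P_bar - P) * rho / (1 - rho)
    return P, P_L, P_inf

P, (P1, P10), P_inf = steady_state_smoothing(0.864, 0.2535, 0.5, [1, 10])
```

For α = 0.864, q = 0.2535, r = 0.5 this yields P ≈ 0.2297, P_1 ≈ 0.1871, and P_Lmin ≈ 0.1753, matching Table I.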

A. Non-zero mean errors and Colored Noises

When the state model error w(k + 1) and the observation

error v(k + 1) in equations (1) and (2) have non-zero means,

equations (3) - (6) remain unchanged, since they all involve

variances which do not depend on the means. See [18].

For the analyses presented here, we have used what are called the white noise assumption Kalman fixed-lag smoothing algorithms, which assume both the message model driving

term noise w(k + 1) and the observation or measurement

noise v(k + 1) are white. However, these assumptions can be removed; colored noise versions of the Kalman fixed-lag smoothers are available [19], and we have studied them

for several different cases. The colored noise versions have

the potential to improve performance if the model(s) for the

colored noise(s) are accurate.

V. PERFORMANCE RESULTS

Some indication of the possible improvement with fixed-lag smoothing is available from three scalar examples in [17]. For one example, α = 0.95, q = 1, and r = 10, which yields P = 2.4098 and P_Lmin = P_17 = 1.5811. Thus, the percent reduction in mean squared error (MSE) is (P − P_Lmin)/P × 100% = 34%. Additional lags reduce P_L negligibly. If, however, α = 0.95, but q = 10 and r = 1, then P = 0.9154 and P_Lmin = P_2 = 0.8511. The percent reduction is thus only 7%. Another interesting example from [17] is when α = 0.1, q = 1, and r = 1, so that P = 0.5012 and P_Lmin = P_1 = 0.500. The percent reduction is less than half of one percent. Two conclusions can be drawn from these examples. First, smoothing can provide substantial improvement when r > q but not when r ≤ q. Second, if the correlation is low, improvement due to smoothing is not likely (depending on r and q).
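These three cases can be checked directly from Equations (4), (5), (7), and (8); a short Python sketch (the helper name percent_mse_reduction is ours) computes the infinite-lag reduction for each:

```python
import math

def percent_mse_reduction(alpha, q, r):
    """Percent MSE reduction from infinite-lag smoothing, per Equations (4)-(8)."""
    a = alpha ** 2
    b = q + r - a * r
    P = (-b + math.sqrt(b * b + 4 * a * q * r)) / (2 * a)  # positive root of (7)
    P_bar = a * P + q                                      # Equation (4)
    K = P_bar / (P_bar + r)                                # Equation (5)
    rho = ((1 - K) * alpha) ** 2
    P_inf = P - (P_bar - P) * rho / (1 - rho)              # Equation (8)
    return 100 * (P - P_inf) / P

# The three scalar cases from [17]:
print(percent_mse_reduction(0.95, 1, 10))  # r > q: substantial gain (about 34%)
print(percent_mse_reduction(0.95, 10, 1))  # r < q: small gain (about 7%)
print(percent_mse_reduction(0.1, 1, 1))    # low correlation: negligible gain
```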

For source coding of speech, q is the variance of the linear

prediction model driving term and r is the variance of the

reconstruction error before smoothing, and the values used in

the examples from [17] may not fit the speech coding problem.

More typical values are used in the following two examples

[14].

Example 1: Consider the scalar models in Equations (1) and (2) with α = 0.864 [20], q = 0.2535, and r = 0.5. These values of α and q yield E{x^2(k + 1)} = 1 in Equation (1) (assuming stationarity). The choice of r = 0.5 corresponds to a fairly noisy reconstructed signal. Table I lists the smoothing error covariance as a function of L. The percent reduction in MSE due to smoothing is thus 24%.

TABLE I
IMPROVEMENT DUE TO SMOOTHING IN EXAMPLE 1 WITH α = 0.864, q = 0.2535, r = 0.5, P = 0.2297

  L    P_L
  1    0.1871
  2    0.1778
  3    0.1758
  4    0.1754
  5    0.1753
 10    0.17525 ≈ P_Lmin

Example 2: Although the average first order autocorrelation

for speech is about 0.85, the short term correlation may be

as high as 0.95 for highly voiced segments. To investigate

the smoothing improvement for this case, the parameters in

Equations (1) and (2) are set to α = 0.95, q = 0.0975, and r = 0.5. The results are shown in Table II. The percent reduction

in MSE is approximately 34%.

TABLE II
IMPROVEMENT DUE TO SMOOTHING IN EXAMPLE 2 WITH α = 0.95, q = 0.0975, r = 0.5, P = 0.1651

  L    P_L
  1    0.1322
  2    0.1188
  3    0.1134
  4    0.1112
  5    0.1103
 10    0.1097 ≈ P_Lmin

For Examples 1 and 2, it is concluded that the improvement

due to fixed-lag smoothing in the decoder may be sufficient to warrant further investigation. This smoothing technique can be used with differential encoders and existing tree

coders. It was proposed for use in [14] with reduced rate tree

coders for two reasons: (1) the use of a reduced rate is an

advantage tree coders can exploit more easily than differential

encoders, and (2) the fixed lag smoothing should help “fill in

the gaps” left due to disallowed sequences. The analysis of

the smoothing gain available presented in this section is valid

for DPCM and tree coders, however.

Example 3: We now consider the scalar models in Equations (1) and (2) with α = 0.98, q = 0.118, and r = 1.18.

The choice of this ratio of q/r = 0.1 corresponds to a low

transmitted bit rate and a very noisy reconstructed signal. Table

III lists the smoothing error covariance as a function of L. The

percent reduction in MSE due to smoothing is thus 39%.

TABLE III
IMPROVEMENT DUE TO SMOOTHING IN EXAMPLE 3 WITH α = 0.98, q = 0.118, r = 1.18, P = 0.3045

  L    P_L
  1    0.2485
  2    0.2189
  3    0.2033
  4    0.1950
  5    0.1906
 15    0.1857 ≈ P_Lmin
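Table III can be regenerated lag by lag from Equations (3)-(7); a minimal Python check (variable names ours) follows:

```python
import math

# Example 3 parameters: alpha = 0.98, q = 0.118, r = 1.18.
alpha, q, r = 0.98, 0.118, 1.18
a = alpha ** 2
b = q + r - a * r
P = (-b + math.sqrt(b * b + 4 * a * q * r)) / (2 * a)  # positive root of (7)
P_bar = a * P + q                                      # Equation (4)
K = P_bar / (P_bar + r)                                # Equation (5)
rho = ((1 - K) * alpha) ** 2
for L in (1, 2, 3, 4, 5, 15):
    # Equation (3): smoothing error covariance at lag L.
    P_L = P - (P_bar - P) * sum(rho ** j for j in range(1, L + 1))
    print(L, round(P_L, 4))
```

The printed covariances agree with the Table III entries to four decimal places, giving the 39% reduction quoted above.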

These substantial improvements are available without modifying the decoder, using only add-on signal processing with well known fixed-lag smoothing algorithms [18] and [17].

VI. COMPARISONS TO PRIOR RESULTS

In this section we compare to asymptotic performance

results for exploiting delay at the decoder published elsewhere.

To begin, we consider the paper by Ishwar and Ramchandran

[11]. As summarized earlier, Ishwar and Ramchandran investigate a decoder that implements noncausal minimum mean squared error (MMSE) smoothing coupled with a differential predictive encoder with a causal MMSE predictor in the prediction loop around a quantizer modeled as a rate distortion (R-D) optimal encoder. The causal MMSE predictor is matched to

the source input, not the quantized signal in the feedback loop

at the decoder. The smoother uses the transmitted quantized

error signal as input and not the reconstructed output of a

standard matched DPCM decoder. As such, it is in the spirit

of Melkote and Rose [12].

To obtain their results, Ishwar and Ramchandran invoke

several assumptions in addition to those mentioned above. The

input source is a first order Gaussian AR source, and the R-D optimal encoding implies the Shannon backward channel

result, and since the source is Gaussian and the distortion

measure is MSE, there is a corresponding forward channel

result as described by Berger [2]. The existence of this forward

channel is unique to Gaussian sources. Along with the other

assumptions, the result of this forward channel is that the

prediction error process is Gaussian and stationary. Further,

the quantization noise is assumed temporally independent and

also assumed independent of the input source process.

Their analysis is primarily done in the spectral domain. Two

of their primary results are that for α = 0.98 and bit rate

R = 0.1, there is a maximum of about 40% improvement

in performance for large delays with about half of this gain

available with latency of only 2 or 3 samples. Examining Example 3 in Section V and the corresponding Table III, we see

that the fixed lag smoothing approach presented there achieves

a virtually identical maximum gain of 39%, and from the table,

half of this gain is available at lags of only 2 or 3 samples. This

is intriguing since our add-on fixed lag smoothing approach

does not require any redesign of the standard DPCM decoder

as in [11] and [12].

Comparing to the results of Melkote and Rose [12]: they do not consider the low rate of R = 0.1 and the high correlation

of α = 0.98, but their plots imply a gain of 0.25 dB or less

at these rates with α = 0.95 for SDPCM over DPCM. Their

method does appear to obtain a gain of about 1 dB at a rate

of 0.25 for this value of α. It is not clear what values for the

driving term and observation noise variances were chosen in

their results.

The proposed decoder structure including the smoother is

shown in Fig. 3. The decoder at the transmitter has the same

structure, but the smoother is outside the encoding loop (as

it must be). The smoothed value of ŝ(k), denoted ŝ_s(k), is used to replace the stored value of ŝ(k) after the smoothing

has been accomplished. The path map symbols are generated

before the smoothing takes place, and therefore, are unchanged

by the smoothing process.

VII. OPTIMAL SMOOTHING AND THE SHANNON LOWER BOUND

It is interesting to note that the smoother is choosing ŝ_s(k) to produce [18]

E{[s(k) − ŝ_s(k)] ŝ_s(k) | s(k + L)} = 0.    (9)

Letting n(k) = s(k) − ŝ_s(k), Equation (9) becomes

E{n(k) ŝ_s(k) | s(k + L)} = 0.    (10)

If n(k) and ŝ_s(k) are Gaussian, then they are also independent,

and therefore the smoother is implementing a version of the

Shannon optimum backward channel [2, p. 94], but not at

the encoder; and further, it is outside of the decoding loop.

However, the channel input and output are not the same as

would be obtained for an optimum encoder, and as a result,

smoothing outside the decoding loop and only at the decoder

(as opposed to processing at the encoder) should be bounded

away from the rate distortion bound for the original source

and squared error distortion measure.

VIII. OPTIMALITY, ASSUMPTIONS, AND CAUTIONARY TALES

When using the analytical results presented here and the

theoretical analyses in the several cited references to obtain

inferences for practical applications, there are a few caveats.

First, for any claims of source coding optimality or for any

specified rate distortion functions, the source model must be

accurate. For example, if the Shannon backward channel result

is invoked for an inaccurate source model, the independence

of the additive noise (error) will not accurately portray the

physical problem. Second, invoking the Gaussian assumption

in optimal encoding to obtain the forward channel with addi-

tive independent noise should be avoided when modeling the

behavior of a quantizer, since the quantization noise is not

independent of the quantizer input. Third, the assumption of

white noise to model the measurement or observation noise

for quantization error, used here and in the references, is

inaccurate in most cases. As noted in a prior section, this

assumption can be removed for our proposed approach using

colored noise fixed-lag smoothing algorithms as derived in

[19].

IX. CONCLUSIONS

The performance of differential predictive codecs can be

improved substantially by employing fixed-lag smoothing algorithms to smooth the output of the standard decoder for

a causal encoder. In particular, the fixed-lag smoother can

provide performance improvements equivalent to a complete

reoptimization of the decoder. Therefore, the performance of

existing differential predictive coders can be improved without

redesign of the core codecs.

REFERENCES

[1] C. E. Shannon, “Coding theorems for a discrete source with a fidelity criterion,” IRE Nat. Conv. Rec., pt. 4, pp. 142–163, 1959.

[2] T. Berger, Rate Distortion Theory. Englewood Cliffs, NJ: Prentice Hall, 1971.

[3] N. S. Jayant, Ed., Waveform Quantization and Coding. IEEE Press, 1976.

[4] J. D. Gibson, “Adaptive prediction in speech differential encoding systems,” Proc. IEEE, vol. 68, no. 4, pp. 488–525, 1980.

[5] N. S. Jayant and P. Noll, Digital Coding of Waveforms. Prentice Hall, 1984.

[6] J. D. Gibson, “Adaptive prediction for speech encoding,” IEEE ASSP Mag., vol. 1, no. 3, pp. 12–26, 1984.

[7] J. B. O’Neal, “A bound on signal-to-quantizing noise ratios for digital encoding systems,” Proc. IEEE, vol. 55, no. 3, pp. 287–292, 1967.

[8] J. Anderson and J. Bodie, “Tree encoding of speech,” IEEE Trans. Inf. Theory, vol. 21, no. 4, pp. 379–387, 1975.

[9] S. Wilson, “Adaptive tree encoding of speech at 8000 bits/s with a frequency-weighted error criterion,” IEEE Trans. Commun., vol. 27, no. 1, pp. 165–170, 1979.

[10] L. Stewart and R. M. Gray, “The design of trellis waveform coders,” IEEE Trans. Commun., vol. 30, no. 4, pp. 702–710, 1982.

[11] P. Ishwar and K. Ramchandran, “On decoder-latency versus performance tradeoffs in differential predictive coding,” in International Conference on Image Processing (ICIP), vol. 2, 2004, pp. 1097–1100.

[12] V. Melkote and K. Rose, “Optimal delayed decoding of predictively encoded sources,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2010, pp. 3470–3473.

[13] W.-W. Chang and J. D. Gibson, “Smoothed DPCM codes,” IEEE Trans. Commun., vol. 39, pp. 1351–1359, 1991.

[14] J. D. Gibson, “Tree coding versus differential encoding of speech: A perspective,” in IEEE International Symposium on Information Theory (ISIT), Feb. 1981, p. 108.

[15] ——, “Tree coding versus differential encoding of speech: A perspective,” Texas A&M University, Tech. Rep., July 1980.

[16] M. L. Sethia and J. B. Anderson, “Interpolative DPCM,” IEEE Trans. Commun., vol. 32, no. 6, pp. 729–736, 1984.

[17] S. Chirarattananon and B. Anderson, “The fixed-lag smoother as a stable finite-dimensional linear system,” Automatica, vol. 7, no. 6, pp. 657–669, Nov. 1971.

[18] A. P. Sage and J. L. Melsa, Estimation Theory with Applications to Communications and Control. McGraw-Hill, 1971.

[19] R. J. Brown and A. P. Sage, “New algorithms for estimation with sequentially correlated measurement noise,” International Journal of Systems Science, vol. 2, pp. 253–270, 1971.

[20] R. A. McDonald, “Signal-to-noise and idle channel performance of differential pulse code modulation systems — particular applications to voice signals,” Bell System Technical Journal, vol. 45, no. 7, pp. 1123–1151, 1966.