A survey on methods for blind acoustic dereverberation
Master's Thesis in Electrical Engineering (ETD018)
Osunkunle Biodun Isaac and Sayed Ali Shekarchi
Supervisor: Benny Sallberg
Abstract
Reverberation is a phenomenon that occurs in enclosures such as concert halls and churches. Reverberation consists of a combination of multiple echoes, and its intensity and duration depend on factors such as the dimensions of the enclosure, the materials used in its construction and its shape. Reverberation is desirable in music reproduction; however, it can render speech unintelligible. There is therefore a need to control the reverberation of speech. This thesis investigates the performance of different signal processing algorithms applied to suppress reverberation. Theoretical methods that have been verified with simulations are tested with real measurements, which gives a practical evaluation of the performance to be expected when using the algorithms.
CONTENTS
List of Figures
1 Introduction
1.1 Reverberation
1.2 The basics of room acoustics
1.3 Reverberation Estimation
1.4 Example Reverberation Calculation
1.5 System types
1.5.1 Single Input Single Output System - SISO
1.5.2 Single Input Multiple Output (SIMO) Model
1.5.3 Multiple Input Single Output (MISO) Systems
1.5.4 Multiple Input Multiple Output (MIMO) Systems
1.6 Performance Criteria
1.6.1 Normalized Projection Misalignment
1.6.2 Signal to Deviation Ratio
2 Supervised Inverse filtering based dereverberation
2.1 The NLMS algorithm for a Single Input Single Output System Identification
2.1.1 Supervised inverse filtering
2.1.2 Impulse Response Measurements - Channel identification using the NLMS algorithm
2.2 Impulse Response Measurement Results
2.2.1 Inverse Filtering and Performance Evaluation
2.3 Inverse Filtering - Least squares method
2.4 Inverse Filtering - The Multichannel Inverse Theorem - MINT
3 Robust Inverse filtering with MINT
3.1 Generalized MINT performance
3.2 Regularization performance
3.3 Algorithm Optimization Procedure
3.3.1 Algorithm Optimization Procedure Step 1 - Delay
3.3.2 Algorithm Optimization Procedure Step 2 - Regularization
3.3.3 Algorithm Optimization Procedure Step 3 - Filter Length
3.3.4 Algorithm Optimization Results
4 Unsupervised Inverse filtering based dereverberation
4.1 Basic Principles
4.2 Identifiability Conditions
4.3 Algorithms
4.4 Constrained Time Domain Multichannel LMS
4.5 Constrained Time Domain Multichannel Newton Algorithm
4.6 Unconstrained Blind Multichannel LMS algorithm with Optimal Step Size control
4.7 Frequency Domain Normalized Multichannel LMS
4.8 Performance of Selected Blind Methods
5 Conclusion
6 Matlab Scripts
6.1 Two Channel Blind Identification : 3-tap channels
6.2 Identification with the NLMS Algorithm
6.3 Normalized LMS
6.4 Single Channel Dereverberation with NLMS
6.5 System Identification using NLMS
6.6 Blind SIMO LMS Well Conditioned Inputs
6.7 Blind SIMO LMS Bad Conditioned Inputs
LIST OF FIGURES
1.1 Multiple sound propagation paths from source (S) to receiver (R)
1.2 Example Reverberation Calculation [1]
1.3 SISO Model
1.4 SIMO System
1.5 MISO System
1.6 Multiple Input Multiple Output - MIMO System
2.1 Acoustic Channel Identification with the NLMS algorithm
2.2 Impulse Response - Distance : 1m, Room : Amethyst, Channels : 1 - 4
2.3 Impulse Response - Distance : 1m, Room : Amethyst, Channels : 5 - 7
2.4 Smoothed Filter Coefficients - Distance : 1m, Room : Amethyst, Channels : 1 - 4
2.5 Smoothed Filter Coefficients - Distance : 1m, Room : Amethyst, Channels : 5 - 7
2.6 Impulse Response - Distance : 2m, Room : Amethyst, Channels : 1 - 4
2.7 Impulse Response - Distance : 2m, Room : Amethyst, Channels : 5 - 7
2.8 Smoothed Filter Coefficients - Distance : 2m, Room : Amethyst, Channels : 1 - 4
2.9 Smoothed Filter Coefficients - Distance : 2m, Room : Amethyst, Channels : 5 - 7
2.10 Measurement Setup
2.11 Direct Inverse
2.12 Mean Square Error Inverse
2.13 MINT Method
2.14 Impulse Response Convolution with LS Inverse channels 1 to 4
2.15 Impulse Response Convolution with LS Inverse channels 5 to 7
2.16 A SISO acoustic system, with Loudspeaker S1, microphone M, and acoustic room impulse response G
2.17 Inverse filtering a SISO system with the LSE Method
2.18 SDR with the supervised inverse filtering method
2.19 MINT Filtering of a 1-Input 2-Output system
3.1 SDR dependence on MINT filter delay with different SNR 2 channel system
3.2 SDR dependence on MINT filter delay with different SNR 3 channel system
3.3 MINT performance with a regularization parameter and different SNR values for a two channel system
3.4 MINT performance with a regularization parameter and different SNR values for a three channel system
3.5 SDR dependence on MINT filter length with different SNR 2 channel system
3.6 SDR dependence on MINT filter length with different SNR 3 channel system
3.7 Effects of filter length and delay on SDR with the MINT method with 2 Channels
3.8 Effects of filter length and delay on SDR with the MINT method with 3 Channels
4.1 Constrained Time Domain Multichannel LMS algorithm
4.2 Frequency Domain Normalized Multichannel LMS
4.3 Algorithm performance in well conditioned 2 channel, 3-tap system, 40dB SNR
4.4 Scaled impulse response estimates of a well conditioned system obtained using the SIMO LMS
4.5 Algorithm performance in badly conditioned 2 channel, 3-tap system, SNR = 40dB
4.6 Scaled impulse response estimates of a badly conditioned system obtained using the Multichannel Newton Algorithm
4.7 Performance of algorithms on 16 tap 3-channel system
4.8 Scaled filter estimates using the blind identification methods for a 16 tap 3-channel system
4.9 Normalized Projection Misalignment using a Simulated Room Response, 10dB SNR
4.10 Impulse Response Estimates of Simulated Room Response
5.1 System Diagram
5.2 System performance
CHAPTER 1
Introduction
1.1 Reverberation
Reverberation is a phenomenon that occurs when acoustic waves propagate in an enclosure such as an auditorium. Possible sound paths are shown in Figure 1.1 [2]. If the sound source is suddenly switched off, residual sound can still be observed for some time.
The reverberation effect is desirable in music; however, it is undesirable for speech. For speech to be intelligible, the reverberation time must be less than two seconds. Because the dimensions and other physical properties of an enclosure determine its reverberation properties [1], such enclosures face conflicting requirements. When an individual needs to speak to the audience, reverberation must be eliminated or minimized, and when an orchestral performance is about to begin, reverberation should be restored. Auditorium design therefore requires acoustic considerations.
The human body is a good sound absorber [1], so the acoustic properties of an auditorium change significantly between the time it is empty and the time it is full. For this reason, auditorium seats are made of materials that absorb sound similarly to the human body, so that equipment settings such as equalization and amplitude do not need to be readjusted.
This chapter outlines the concepts of reverberation and the models used to study it. In Chapter 2, supervised identification of the acoustic channels using the Normalized Least Mean Squares (NLMS) algorithm is considered. These channels may not be minimum phase, which implies that they do not have a stable causal inverse; however, a least squares inverse is obtained using the NLMS algorithm. To improve the performance of the system, the MINT (Multichannel Inverse Theorem) method is applied. MINT computes a set of inverse filters jointly over several channels, so that the sum of the filtered channel outputs equals an impulse; this makes an exact inverse possible, provided the channels share no common zeros. The sensitivity of MINT to noise and channel identification errors is investigated in Chapter 3. In Chapter 4, algorithms for unsupervised blind channel identification are derived and implemented. To conclude the study, we use the MINT method to find the inverse of the blindly identified channels.
Figure 1.1: Multiple sound propagation paths from source (S) to receiver (R)
1.2 The basics of room acoustics
The terminology used in the audio community has the following physical interpretation. A room with a long reverberation time is referred to as a live room [3], while one with a short reverberation time is referred to as a dead room. Intimacy is the psychological sense of proximity to the sound source that a listener experiences. Intimacy is experienced if the listener hears the first reverberant sound less than 20 milliseconds after the arrival of the direct sound. A small room is usually considered intimate; in auditoriums, intimacy can be achieved with special measures such as a canopy placed above the stage or orchestral shells on the stage [3]. The amplitude of the reverberant sound relative to the direct sound is termed the fullness of the sound. Fullness is the opposite of clarity: fullness is due to a long reverberation time, and clarity to a short one. When the reverberation time is longer at low frequencies than at high frequencies, the sound is considered warm, while if the reverberation time decreases as frequency decreases, the sound is considered brilliant. The time difference between the arrival of the direct sound and the first few reflections determines the texture of the sound. Good texture requires that the first five reflections arrive at the listener within about 60 milliseconds of the direct sound, and that the amplitude of the reverberation decreases at a constant rate. When the sounds from different performers combine with a uniform distribution at the listener, the sound is considered to have a good blend. The reverberant sound must be heard only by the audience, not by the performers on stage.
1.3 Reverberation Estimation
A mathematical relationship between the dimensions of a room and its reverberation was derived in 1898 by Wallace Sabine [1]. Sabine defined the reverberation time as the time it takes for the sound intensity to drop by 60 dB after the source is switched off. He found that the reverberation time is proportional to the volume of the room and inversely proportional to its total absorption, which depends on the materials of which the walls and objects in the room are made. Sabine proposed the following formula for the reverberation time:
$$RT_{60} = \frac{0.049\,V}{S\,a} \qquad (1.1)$$
where
$RT_{60}$ - reverberation time in seconds
$V$ - volume of the room in cubic feet
$S$ - total surface area of the room in square feet
$a$ - average absorption coefficient of the room surfaces
$Sa$ - total absorption in sabins
A perfectly absorbing square foot of material is considered to have the absorption
value of 1 sabin. It has been observed that the absorption coefficients usually provided by
manufacturers of building materials are Sabine coefficients.
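A minimal Python sketch of Eq. 1.1 is given below. The total absorption Sa is computed as the sum of each surface area multiplied by its absorption coefficient, which equals the total area times the area-weighted average coefficient. The function names, the room dimensions and the absorption coefficients are hypothetical values chosen for illustration; they are not taken from Figure 1.2.

```python
# Sabine's formula (Eq. 1.1), imperial units: volume in cubic feet, areas in square feet.

def total_absorption_sabins(surfaces):
    """Total absorption S*a in sabins from (area_sqft, absorption_coefficient) pairs."""
    return sum(area * coeff for area, coeff in surfaces)


def sabine_rt60(volume_cuft, surfaces):
    """Reverberation time in seconds: RT60 = 0.049 V / (S a)."""
    absorption = total_absorption_sabins(surfaces)
    if absorption <= 0:
        raise ValueError("total absorption must be positive")
    return 0.049 * volume_cuft / absorption


# Hypothetical 50 ft x 40 ft x 20 ft room; the coefficients are made up, not from Figure 1.2.
length, width, height = 50.0, 40.0, 20.0
volume = length * width * height                 # 40,000 cubic feet
floor_area = ceiling_area = length * width       # 2,000 sq ft each
wall_area = 2 * (length + width) * height        # 3,600 sq ft
surfaces = [(floor_area, 0.02), (ceiling_area, 0.05), (wall_area, 0.05)]
rt60 = sabine_rt60(volume, surfaces)
```

With these assumed values the total absorption is 320 sabins and the formula gives an RT60 of about 6.1 seconds.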
Figure 1.2: Example Reverberation Calculation[1]
1.4 Example Reverberation Calculation
An example reverberation calculation is shown in Figure 1.2, which lists the absorption coefficients for an enclosure with a concrete floor and gypsum board walls and ceiling. The absorption of gypsum board is greater than that of concrete, and the absorption coefficients of both materials depend on frequency. The last row of the table gives the reverberation time calculated for the frequency of each column. At 1 kHz the reverberation time is 3.39 seconds, which would make speech unintelligible.
In order to develop algorithms applicable to any enclosure, we adopt system models that make assumptions about the reverberation and the noise. These are described next.
1.5 System types
Applying signal processing techniques to the control of reverberation requires a model of the system under consideration. This model is used in algorithm development and simulations, before the concepts are tested on real systems. The models considered in this thesis are assumed to be linear and shift invariant. They are generally classified into four types:
• Single Input Single Output (SISO) model
• Single Input Multiple Output (SIMO) model
• Multiple Input Single Output (MISO) model
• Multiple Input Multiple Output (MIMO) model
1.5.1 Single Input Single Output System - SISO
This is the simplest model used in signal processing; it is widely used and very useful for initial analysis. The SISO model is shown in Figure 1.3 and, as the name suggests, it describes a system with a single input and a single output signal. The output signal depends on the input and is given by:
$$x(k) = h * s(k) + b(k) \qquad (1.2)$$
where x(k) is the output signal, s(k) is the source signal of interest, and b(k) is additive noise at the output of the system. The symbol * represents the linear convolution operation. There may also be noise at the input of the system, added to the source signal s(k) before it is convolved with the system h, but this kind of noise is disregarded in this work.
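Eq. 1.2 can be simulated directly with NumPy. The sketch below assumes a hypothetical 3-tap channel, models b(k) as white Gaussian noise with a chosen standard deviation, and truncates the convolution to the length of the source signal; these choices are for illustration only.

```python
import numpy as np


def siso_output(h, s, noise_std=0.0, rng=None):
    """Eq. 1.2: x(k) = h * s(k) + b(k), truncated to the length of s.

    b(k) is modelled here (an assumption) as white Gaussian noise with
    standard deviation noise_std.
    """
    rng = np.random.default_rng() if rng is None else rng
    clean = np.convolve(h, s)[: len(s)]              # linear convolution h * s(k)
    b = noise_std * rng.standard_normal(len(s))      # additive output noise b(k)
    return clean + b


h = np.array([1.0, 0.5, 0.25])                 # hypothetical 3-tap channel
s = np.array([1.0, 0.0, 0.0, 0.0, 0.0])        # unit impulse as source
x = siso_output(h, s)                          # noise-free: equals h padded with zeros
x_noisy = siso_output(h, np.ones(10), noise_std=0.1, rng=np.random.default_rng(0))
```

With noise_std = 0 and a unit-impulse source, the output reproduces the channel impulse response followed by zeros.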
[Block diagram: s(k) passes through H(z); additive noise b(k) is summed with the output to give x(k).]
Figure 1.3: SISO Model
1.5.2 Single Input Multiple Output (SIMO) Model
This model forms the core of this study. The system is shown in Figure 1.4.
[Block diagram: the source s(k) drives the channels H1(z), H2(z), ..., HN(z); noise bn(k) is added to the output of channel n to give xn(k).]
Figure 1.4: SIMO System
In this system, a single source signal s(k) produces multiple output signals. The system
can be represented mathematically as
$$x_n(k) = h_n * s(k) + b_n(k), \quad n = 1, 2, \ldots, N \qquad (1.3)$$
where the subscript n identifies the output channel, and the remaining signals have the same meaning as in the SISO system. Using vector notation, the system can be rewritten as
$$x_n(k) = \mathbf{h}_n^T \mathbf{s}(k) + b_n(k) \qquad (1.4)$$
where
$$\mathbf{h}_n = [h_{n,0}\ h_{n,1}\ \cdots\ h_{n,L-1}]^T$$
$$\mathbf{s}(k) = [s(k)\ s(k-1)\ \cdots\ s(k-L+1)]^T$$
and $[\cdot]^T$ denotes the transpose of a vector.
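The subscripts can be discarded if matrix notation is adopted for the SIMO system, which gives:
$$\mathbf{x}(k) = \mathbf{H}\mathbf{s}(k) + \mathbf{b}(k) \qquad (1.5)$$
where
$$\mathbf{x}(k) = [x_1(k)\ x_2(k)\ \cdots\ x_N(k)]^T$$
$$\mathbf{H} = \begin{bmatrix} h_{1,0} & h_{1,1} & \cdots & h_{1,L-1} \\ h_{2,0} & h_{2,1} & \cdots & h_{2,L-1} \\ \vdots & \vdots & \ddots & \vdots \\ h_{N,0} & h_{N,1} & \cdots & h_{N,L-1} \end{bmatrix}_{N \times L}$$
$$\mathbf{b}(k) = [b_1(k)\ b_2(k)\ \cdots\ b_N(k)]^T$$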
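The following sketch, which assumes NumPy and hypothetical random channels, builds the regressor vector s(k) of Eq. 1.4 explicitly and checks that the matrix form of Eq. 1.5 produces the same samples as the per-channel convolutions of Eq. 1.3 (noise-free, with s(k) = 0 for k < 0).

```python
import numpy as np


def regressor(s, k, L):
    """s(k) = [s(k), s(k-1), ..., s(k-L+1)]^T, with s(j) = 0 for j < 0."""
    return np.array([s[k - i] if k - i >= 0 else 0.0 for i in range(L)])


def simo_output(H, s, b=None):
    """Eq. 1.5: x(k) = H s(k) + b(k) for every k; returns an N x K array."""
    _, L = H.shape
    X = np.column_stack([H @ regressor(s, k, L) for k in range(len(s))])
    return X if b is None else X + b


rng = np.random.default_rng(0)
H = rng.standard_normal((3, 4))    # hypothetical N = 3 channels with L = 4 taps
s = rng.standard_normal(20)
X = simo_output(H, s)

# Each row of X equals the (truncated) linear convolution of Eq. 1.3.
for n in range(H.shape[0]):
    assert np.allclose(X[n], np.convolve(H[n], s)[: len(s)])
```

Building s(k) explicitly is slow but mirrors the notation; in practice the per-channel convolution of Eq. 1.3 computes the same values.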
1.5.3 Multiple Input Single Output (MISO) Systems
It is possible to have an acoustic system with multiple input sources and a single microphone
signal as output. This is referred to as a Multiple Input Single Output (MISO) system and
is shown below in Figure 1.5.
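[Block diagram: sources s1(k), s2(k), ..., sM(k) pass through H1(z), H2(z), ..., HM(z); the channel outputs and the noise are summed at a single microphone to give x1(k).]
Figure 1.5: MISO System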
1.5.4 Multiple Input Multiple Output (MIMO) Systems
A system with multiple signal sources and multiple microphone pickups (outputs) is referred to as a Multiple Input Multiple Output (MIMO) system, and is illustrated in Figure 1.6. This is the most general configuration of all acoustic system models. Because the system is linear, any MIMO system can be decomposed, by superposition, into a set of SIMO systems (one per source); this study therefore focuses on the SIMO system.
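The decomposition into SIMO systems follows from linearity: each source drives its own SIMO subsystem, and each microphone signal is the sum of the subsystem outputs. A short NumPy sketch with assumed sizes and random hypothetical channels (noise omitted) checks this superposition numerically. Indexing h[n][m] as the path from source m to microphone n is a convention assumed here.

```python
import numpy as np


def mimo_output(h, sources):
    """Noise-free MIMO output x_n(k) = sum_m h[n][m] * s_m(k), truncated to the source length.

    h[n][m] is the impulse response from source m to microphone n (assumed convention).
    """
    K = len(sources[0])
    return np.array([
        sum(np.convolve(h[n][m], sources[m])[:K] for m in range(len(sources)))
        for n in range(len(h))
    ])


def simo_output(h_column, s):
    """Noise-free SIMO output: one convolution per microphone for a single source."""
    K = len(s)
    return np.array([np.convolve(h_n, s)[:K] for h_n in h_column])


rng = np.random.default_rng(1)
N, M, L, K = 3, 2, 5, 50                       # hypothetical sizes
h = [[rng.standard_normal(L) for _ in range(M)] for _ in range(N)]
sources = [rng.standard_normal(K) for _ in range(M)]

x_mimo = mimo_output(h, sources)
# Superposition: the MIMO output is the sum of M SIMO subsystems, one per source.
x_from_simo = sum(simo_output([h[n][m] for n in range(N)], sources[m]) for m in range(M))
```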
[Block diagram: sources s1(k), s2(k), ..., sM(k) reach each microphone through the channels Hnm(z); noise bn(k) is added at microphone n to give the outputs x1(k), x2(k), x3(k), ..., xN(k).]
Figure 1.6: Multiple Input Multiple Output - MIMO System
1.6 Performance Criteria
1.6.1 Normalized Projection Misalignment
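The Normalized Projection Misalignment (NPM) [4] criterion projects the true impulse response onto the estimate, so that scaling factors are ignored. This is necessary in many situations, because the estimated impulse response is usually a scaled version of the true impulse response.
$$\text{NPM}(k) = \frac{\lVert \boldsymbol{\varsigma}(k) \rVert}{\lVert \mathbf{h} \rVert} \qquad (1.6)$$
where
$$\boldsymbol{\varsigma}(k) = \mathbf{h} - \frac{\mathbf{h}^T \hat{\mathbf{h}}(k)}{\hat{\mathbf{h}}^T(k)\,\hat{\mathbf{h}}(k)}\,\hat{\mathbf{h}}(k) \qquad (1.7)$$
and $\mathbf{h}$ is the true impulse response and $\hat{\mathbf{h}}(k)$ its estimate at time k. This implies that with a perfectly identified set of channels (up to a scale factor), the NPM will be zero.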
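A minimal NumPy sketch of Eqs. 1.6 and 1.7 follows. The channels of a SIMO system are assumed to be stacked into one vector for both the true and the estimated responses. The optional dB form 20 log10(NPM) is a common convention in the literature and is an assumption here, since Eq. 1.6 is written as a plain ratio. The example channels are hypothetical.

```python
import numpy as np


def npm(h_true, h_est, in_db=False):
    """Normalized projection misalignment (Eqs. 1.6 and 1.7).

    h_true and h_est hold the stacked channel responses and must have the same shape.
    The component of h_true along h_est is removed, so any scaling of h_est is ignored.
    """
    h_true = np.asarray(h_true, dtype=float).ravel()
    h_est = np.asarray(h_est, dtype=float).ravel()
    scale = (h_true @ h_est) / (h_est @ h_est)
    misalignment = h_true - scale * h_est          # varsigma(k) in Eq. 1.7
    ratio = np.linalg.norm(misalignment) / np.linalg.norm(h_true)
    return 20 * np.log10(ratio) if in_db else ratio


h = np.array([[1.0, -0.5, 0.2],
              [0.8, 0.3, -0.1]])                   # hypothetical 2-channel, 3-tap system
npm_scaled = npm(h, -3.0 * h)                      # scaled estimate: zero up to rounding
npm_perturbed = npm(h, h + 0.01)                   # small error: small positive NPM
```

Note that a sign flip and a factor of 3 in the estimate leave the NPM at zero, which is exactly the scale ambiguity the criterion is designed to ignore.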
1.6.2 Signal to Deviation Ratio
The Signal to Deviation Ratio (SDR) evaluates the difference between the original and dereverberated signals. It is determined as
$$\text{SDR} = 10 \log_{10} \left( \frac{\sum_{k=0}^{N} s^2(k)}{\sum_{k=0}^{N} \left(s(k) - \hat{s}(k)\right)^2} \right) \qquad (1.8)$$
where s(k) is the original source signal, $\hat{s}(k)$ is the dereverberated source signal, and the sums run over all samples of the signal. The higher the SDR, the more closely the dereverberated signal approaches the original signal.
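Eq. 1.8 can be sketched as below, assuming NumPy and that the dereverberated signal has already been aligned in time and scale with the original (for example, by removing any modelling delay). The test signal is hypothetical.

```python
import numpy as np


def sdr_db(s, s_hat):
    """Signal to deviation ratio (Eq. 1.8): energy of s over energy of s - s_hat, in dB.

    Assumes s and s_hat are already time-aligned and equally scaled.
    """
    s = np.asarray(s, dtype=float)
    s_hat = np.asarray(s_hat, dtype=float)
    deviation_energy = np.sum((s - s_hat) ** 2)
    return 10 * np.log10(np.sum(s ** 2) / deviation_energy)


s = np.array([1.0, -1.0, 1.0, -1.0])   # hypothetical source samples
sdr = sdr_db(s, 0.9 * s)                # 10 % amplitude deviation
```

A deviation of 10 % of the signal amplitude corresponds to an energy ratio of 100, that is 20 dB.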
CHAPTER 2
Supervised Inverse filtering based dereverberation
2.1 The NLMS algorithm for a Single Input Single Output System Identification
The NLMS algorithm belongs to a class of adaptive algorithms that provide an iterative approximation of the Wiener solution for a given system setup [5][6][7]. In the first part of this work, we used the NLMS algorithm as described in [6] to obtain a supervised inverse filter of an acoustic channel.
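To make the identification step concrete, the sketch below implements a standard NLMS update for the setup of Figure 2.1 in NumPy. It is not the Matlab script used in this thesis: the step size mu, the regularization constant eps, the 5-tap channel and the noise level are all assumptions chosen for illustration.

```python
import numpy as np


def nlms_identify(u, d, num_taps, mu=0.5, eps=1e-8):
    """Normalized LMS system identification.

    u is the excitation (the white noise k of Figure 2.1) and d the microphone
    signal x. Returns the final weights w and the error e(k) = d(k) - w^T u(k).
    """
    w = np.zeros(num_taps)
    buf = np.zeros(num_taps)                      # [u(k), u(k-1), ..., u(k-num_taps+1)]
    e = np.zeros(len(u))
    for k in range(len(u)):
        buf = np.roll(buf, 1)
        buf[0] = u[k]
        e[k] = d[k] - w @ buf                     # error against adaptive output y(k)
        w += mu * e[k] * buf / (eps + buf @ buf)  # step normalized by input power
    return w, e


rng = np.random.default_rng(0)
h = np.array([0.9, -0.4, 0.25, 0.1, -0.05])      # hypothetical room response
u = rng.standard_normal(5000)
n = 0.01 * rng.standard_normal(len(u))           # ambient noise n_i (assumed level)
x = np.convolve(h, u)[: len(u)] + n              # Eq. 2.2: x = k * h + n
w, e = nlms_identify(u, x, num_taps=len(h))
final_rel_err_db = 10 * np.log10(np.var(e[-1000:]) / np.var(x))
```

After convergence the error should be dominated by the ambient noise, so final_rel_err_db should approach the noise-to-signal ratio, in line with the analysis in Eqs. 2.4 to 2.6.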
2.1.1 Supervised inverse filtering
The steps that were taken are detailed below:
1. Channel identification using the NLMS algorithm
2. Estimation of minimum steady state error in ideal system identification
3. Calculation of observed relative error in measurement
4. Selection of RT60 using averaged squared-impulse response amplitude decay
5. Re-estimation of impulse response using RT60
6. Inverse filtering of output signal using observed impulse response
2.1.2 Impulse Response Measurements - Channel identification using the NLMS algorithm
The system setup for channel identification based on the SISO model is shown in Figure 2.1. In addition to the signal from the loudspeaker, the signal received by the microphone contains ambient noise, comprising ventilation noise, computer power supply noise, computer fan noise, fluorescent lighting noise and other possible noise contributors. This noise limits the error performance of the system identification. The source signal (white noise bandlimited to 4 kHz) is generated using the CoolEditPro software for Windows. The theoretical minimum error is analysed as follows:
$$y_i = k * w_i \qquad (2.1)$$
$$x_i = k * h_i + n_i \qquad (2.2)$$
$$e_i = k * h_i + n_i - k * w_i \qquad (2.3)$$
In the initial state, $w_i = 0$, so the error equals the microphone signal. The ratio of the error variance to the signal variance is evaluated as the relative error:
$$\text{relative error} = 10 \log_{10} \left( \frac{\operatorname{var}(k * h_i + n_i)}{\operatorname{var}(k * h_i + n_i)} \right) = 0\ \text{dB} \qquad (2.4)$$
After convergence, $w_i \approx h_i$, i.e. the adaptive filter provides an estimate of the room impulse response, and the error can be approximated as
$$e_i = k * h_i + n_i - k * h_i \approx n_i \qquad (2.5)$$
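This means that the minimum measurable relative error is given by:
$$\text{minimum relative error} = 10 \log_{10} \left( \frac{\operatorname{var}(n_i)}{\operatorname{var}(k * h_i + n_i)} \right) = 10 \log_{10} \left( \frac{\operatorname{var}(n_i)}{\operatorname{var}(x_i)} \right) \qquad (2.6)$$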
Equation 2.6 shows that the achievable relative error is bounded by the ratio of the ambient-noise power to the received-signal power, so the best results are obtained when the noise power is small. The theoretical minimum relative error and the measured relative error for each channel are listed in Tables 2.1, 2.2 and 2.3.
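Eq. 2.6 can be checked against the measurements. The short sketch below (plain Python; the function and variable names are illustrative) recomputes the minimum error column of Table 2.1 from its two variance columns.

```python
import math


def relative_error_db(var_error, var_signal):
    """10 log10(var_error / var_signal): Eq. 2.6 when var_error is the ambient-noise variance."""
    return 10 * math.log10(var_error / var_signal)


# (channel, ambient-noise variance, received-signal variance, tabulated minimum error in dB)
TABLE_2_1 = [
    (2, 9.91301e-06, 0.000741502, -18.73907176),
    (3, 1.6257e-05, 0.00138798, -19.31342429),
    (4, 1.05435e-05, 0.000777626, -18.67785155),
    (5, 5.67785e-06, 0.000404648, -18.5289299),
    (6, 2.62405e-05, 0.001071439, -16.10995427),
    (7, 8.69834e-06, 0.000621331, -18.53886558),
    (8, 2.5568e-05, 0.001025777, -16.03356823),
]

for channel, var_n, var_x, tabulated in TABLE_2_1:
    computed = relative_error_db(var_n, var_x)
    print(f"channel {channel}: computed {computed:.3f} dB, table {tabulated:.3f} dB")
```

The recomputed values agree with the tabulated minimum errors to within the rounding of the variances.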
| Channel number | Variance of ambient noise | Variance of desired signal | Minimum error (dB) | Measured error (dB) | RT60 (samples) | Filter length | beta |
|---|---|---|---|---|---|---|---|
| 2 | 9.91301E-06 | 0.000741502 | -18.73907176 | -17.83349399 | 3500 | 5000 | 0.3 |
| 3 | 1.6257E-05 | 0.00138798 | -19.31342429 | -18.38510005 | 3500 | 5000 | 0.3 |
| 4 | 1.05435E-05 | 0.000777626 | -18.67785155 | -18.0108574 | 3500 | 5000 | 0.3 |
| 5 | 5.67785E-06 | 0.000404648 | -18.5289299 | -17.35749282 | 3500 | 5000 | 0.3 |
| 6 | 2.62405E-05 | 0.001071439 | -16.10995427 | -14.620191 | 3500 | 5000 | 0.3 |
| 7 | 8.69834E-06 | 0.000621331 | -18.53886558 | -17.55140998 | 3500 | 5000 | 0.3 |
| 8 | 2.5568E-05 | 0.001025777 | -16.03356823 | -14.2437262 | 3500 | 5000 | 0.3 |

Table 2.1: Room:Amethyst - Error Analysis and Measurement at a distance of 2m

| Channel number | Variance of ambient noise | Variance of desired signal | Minimum error (dB) | Measured error (dB) | RT60 (samples) | Filter length | beta |
|---|---|---|---|---|---|---|---|
| 2 | 1.55207E-05 | 0.000454341 | -14.66469448 | -15.06718021 | 3500 | 5000 | 0.3 |
| 3 | 2.72855E-05 | 0.000830839 | -14.83584301 | -14.85222986 | 3500 | 5000 | 0.3 |
| 4 | 1.53082E-05 | 0.000458906 | -14.76798329 | -14.67470254 | 3500 | 5000 | 0.3 |
| 5 | 8.60381E-06 | 0.000243119 | -14.51127992 | -14.38037285 | 3500 | 5000 | 0.3 |
| 6 | 4.89463E-05 | 0.000631809 | -11.10865978 | -11.38598911 | 3500 | 5000 | 0.3 |
| 7 | 1.34268E-05 | 0.000360833 | -14.29335597 | -14.28905174 | 3500 | 5000 | 0.3 |
| 8 | 4.71883E-05 | 0.000639579 | -11.32060273 | -10.98358546 | 3500 | 5000 | 0.3 |

Table 2.2: Room:Amethyst - Error Analysis and Measurement at a distance of 3m

| Channel number | Variance of ambient noise | Variance of desired signal | Minimum error (dB) | Measured error (dB) | RT60 (samples) | Filter length | beta |
|---|---|---|---|---|---|---|---|
| 2 | N/A | 0.000857483 | N/A | -17.8348108 | 2100 | 5000 | 0.3 |
| 3 | N/A | 0.001721477 | N/A | -18.35951894 | 2100 | 5000 | 0.3 |
| 4 | N/A | 0.000990037 | N/A | -18.30126177 | 2100 | 5000 | 0.3 |
| 5 | N/A | 0.000438182 | N/A | -17.21057975 | 2100 | 5000 | 0.3 |
| 6 | N/A | 0.001236768 | N/A | -15.44003787 | 2100 | 5000 | 0.3 |
| 7 | N/A | 0.00078554 | N/A | -18.2797752 | 2100 | 5000 | 0.3 |
| 8 | N/A | 0.001378459 | N/A | -15.71278552 | 2100 | 5000 | 0.3 |

Table 2.3: Room:Smaragden - Error Analysis and Measurement at a distance of 1m
2.2 Impulse Response Measurement Results
The impulse responses identified at different distances in two rooms are shown next. The RT60 of the reverberation was estimated from the impulse response by measuring the time for the filter weights to decay to 60 dB below the initial level. The observed impulse responses and the corresponding smoothed filter coefficients for measurements taken at a distance of 1 meter in room Amethyst are shown in Figures 2.2, 2.3, 2.4 and 2.5. The impulse responses and filter coefficients obtained at a distance of 2 meters are shown in Figures 2.6, 2.7, 2.8 and 2.9. From the smoothed filter coefficients, the RT60 is estimated to be 2100 samples in room Smaragden and 3500 samples in room Amethyst. The measurement setup used in this experiment is shown in Figure 2.10.
k - white noise generated using CoolEditPro, band-limited to 4 kHz
hi - room impulse response for each channel (i = 1, 2, 3, 4, 5, 6, 7)
wi - impulse response of the adaptive filter after convergence (system identification)
di - filtered broadband noise output from the room impulse response
ni - observed ambient noise at each channel (obtained from the first 5000 samples before the excitation signal is activated, or from the end of the recorded signal after the reverberation has decayed by more than 99%)
xi - recorded signal at each microphone output (desired signal) for channels 1, 2, ..., 7
ei - error signal of the NLMS algorithm for each channel
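[Block diagram: the white noise k drives the room h, whose output d is summed with the observation noise n to give x; k also drives the adaptive filter w, whose output y is subtracted from x to give the error e.]
Figure 2.1: Acoustic Channel Identification with the NLMS algorithm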
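The RT60 estimate from smoothed filter coefficients described above can be sketched as follows. The moving-average window of 101 taps, the choice to count the decay from the peak of the smoothed curve, and the function name are assumptions, since the smoothing used in the measurements is not specified here. The result is in samples, as in the text; dividing by the sampling rate would give seconds. The synthetic response is hypothetical: white noise with an exponential envelope whose energy falls by 60 dB after 3 tau ln(10) samples.

```python
import numpy as np


def estimate_rt60_samples(w, window=101, drop_db=60.0):
    """Estimate RT60 in samples from filter weights w.

    The squared weights are smoothed with a moving average of `window` taps,
    expressed in dB relative to their maximum, and the decay time is counted
    from that maximum to the first sample at least `drop_db` below it.
    Returns None if the curve never decays that far.
    """
    energy = np.convolve(np.asarray(w, dtype=float) ** 2,
                         np.ones(window) / window, mode="same")
    level_db = 10 * np.log10(energy / energy.max() + 1e-300)
    peak = int(np.argmax(energy))
    below = np.nonzero(level_db[peak:] <= -drop_db)[0]
    return None if below.size == 0 else int(below[0])


rng = np.random.default_rng(0)
tau = 200.0
n = np.arange(4000)
w_synth = np.exp(-n / tau) * rng.standard_normal(n.size)   # hypothetical decaying response
rt60 = estimate_rt60_samples(w_synth)
# theory for this envelope: a 60 dB energy drop after 3 * tau * ln(10), about 1382 samples
```

On measured responses the ambient noise floor limits how far the smoothed curve can decay, so in practice the 60 dB point may have to be extrapolated from the early part of the decay.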
2.2.1 Inverse Filtering and Performance Evaluation
The following steps were taken to perform inverse filtering using the NLMS algorithm:
1. Filter the source signal using the impulse response estimated with the NLMS algorithm
2. Identify the inverse filter using the NLMS algorithm
3. Convolve the identified impulse response with the inverse filter and verify that the result approximates a delta function
4. Analyse the convolution obtained in step 3
The result of convolving the inverse filter with the impulse response for room Amethyst at a distance of 1 meter is shown for all channels in Figures 2.14 and 2.15. A perfect inverse filter would give a Kronecker delta as the result. The figures indicate that the result approaches a Kronecker delta.
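Steps 1 to 3 above can be sketched with the same NLMS update, trained so that the inverse filter output tracks a delayed copy of the source (the mean square error inverse of Figure 2.12). The channel h_est, the delay, the inverse-filter length and the step size are hypothetical. The channel is chosen with one zero outside the unit circle, so a causal inverse only exists in this delayed, approximate sense.

```python
import numpy as np


def nlms(u, d, num_taps, mu=0.5, eps=1e-8):
    """NLMS: adapt w so that w^T u(k) tracks d(k); returns the final weights."""
    w = np.zeros(num_taps)
    buf = np.zeros(num_taps)                     # [u(k), u(k-1), ..., u(k-num_taps+1)]
    for k in range(len(u)):
        buf = np.roll(buf, 1)
        buf[0] = u[k]
        e = d[k] - w @ buf
        w += mu * e * buf / (eps + buf @ buf)
    return w


rng = np.random.default_rng(0)
h_est = np.array([0.4, 1.0, 0.3])   # hypothetical identified channel; zeros near -0.35 and -2.15
delay, inv_len = 15, 40             # modelling delay and inverse-filter length (assumed)

# Step 1: filter a white source with the identified impulse response.
s = rng.standard_normal(20000)
x = np.convolve(h_est, s)[: len(s)]
# Step 2: NLMS inverse filter trained to reproduce the delayed source s(k - delay).
target = np.concatenate([np.zeros(delay), s[: len(s) - delay]])
g = nlms(x, target, inv_len)
# Step 3: the convolution of channel and inverse should approximate a delayed delta.
c = np.convolve(h_est, g)
peak = int(np.argmax(np.abs(c)))
```

Without the delay (delay = 0) the same channel cannot be inverted accurately, because the stable inverse of its zero outside the unit circle is anticausal.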
[Plot: filter coefficients W versus filter tap # (0 to 5000), one panel per channel. Channel 1: Min Error = -17.8891, Observed Error = -16.5325; Channel 2: Min Error = -18.1333, Observed Error = -16.7462; Channel 3: Min Error = -17.5344, Observed Error = -16.6053; Channel 4: Min Error = -17.4135, Observed Error = -16.637.]
Figure 2.2: Impulse Response - Distance : 1m, Room : Amethyst, Channels : 1 - 4
[Plot: filter coefficients W versus filter tap # (0 to 5000), one panel per channel. Channel 5: Min Error = -15.4438, Observed Error = -13.4331; Channel 6: Min Error = -16.9427, Observed Error = -15.9498; Channel 7: Min Error = -15.3329, Observed Error = -13.3704.]
Figure 2.3: Impulse Response - Distance : 1m, Room : Amethyst, Channels : 5 - 7
[Plot: averaged squared filter coefficients (Averaged W^2, vertical axis from -70 to 10) versus filter tap # (0 to 5000), one panel per channel. Channel 1: Min Error = -17.8891, Observed Error = -16.5325; Channel 2: Min Error = -18.1333, Observed Error = -16.7462; Channel 3: Min Error = -17.5344, Observed Error = -16.6053; Channel 4: Min Error = -17.4135, Observed Error = -16.637.]
Figure 2.4: Smoothed Filter Coefficients - Distance : 1m, Room : Amethyst, Channels : 1 - 4
[Plot: averaged squared filter coefficients (Averaged W^2, vertical axis from -70 to 10) versus filter tap # (0 to 5000), one panel per channel. Channel 5: Min Error = -15.4438, Observed Error = -13.4331; Channel 6: Min Error = -16.9427, Observed Error = -15.9498; Channel 7: Min Error = -15.3329, Observed Error = -13.3704.]
Figure 2.5: Smoothed Filter Coefficients - Distance : 1m, Room : Amethyst, Channels : 5 - 7
[Plot: filter coefficients W versus filter tap # (0 to 5000), one panel per channel. Channel 1: Min Error = -18.7391, Observed Error = -17.8335; Channel 2: Min Error = -19.3134, Observed Error = -18.3851; Channel 3: Min Error = -18.6779, Observed Error = -18.0109; Channel 4: Min Error = -18.5289, Observed Error = -17.3575.]
Figure 2.6: Impulse Response - Distance : 2m, Room : Amethyst, Channels : 1 - 4
[Plot: filter coefficients W versus filter tap # (0 to 5000), one panel per channel. Channel 5: Min Error = -16.11, Observed Error = -14.6202; Channel 6: Min Error = -18.5389, Observed Error = -17.5514; Channel 7: Min Error = -16.0336, Observed Error = -14.2437.]
Figure 2.7: Impulse Response - Distance : 2m, Room : Amethyst, Channels : 5 - 7
[Plot: averaged squared filter coefficients (Averaged W^2, vertical axis from -70 to 10) versus filter tap # (0 to 5000), one panel per channel. Channel 1: Min Error = -18.7391, Observed Error = -17.8335; Channel 2: Min Error = -19.3134, Observed Error = -18.3851; Channel 3: Min Error = -18.6779, Observed Error = -18.0109; Channel 4: Min Error = -18.5289, Observed Error = -17.3575.]
Figure 2.8: Smoothed Filter Coefficients - Distance : 2m, Room : Amethyst, Channels : 1 - 4
[Plot: averaged squared filter coefficients (Averaged W^2, vertical axis from -70 to 10) versus filter tap # (0 to 5000), one panel per channel. Channel 5: Min Error = -16.11, Observed Error = -14.6202; Channel 6: Min Error = -18.5389, Observed Error = -17.5514; Channel 7: Min Error = -16.0336, Observed Error = -14.2437.]
Figure 2.9: Smoothed Filter Coefficients - Distance : 2m, Room : Amethyst, Channels : 5 - 7
[Diagram: speaker and microphone array, with dimension labels 2m and 4cm; height of microphone array = 1m, height of speaker = 1m.]
Figure 2.10: Measurement Setup
[Block diagram: s(k) passes through H(z); noise b(k) is added to give x(k), which is filtered by G(z) = 1/H(z) to give the estimate ŝ(k).]
Figure 2.11: Direct Inverse
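[Block diagram: s(k) passes through H(z); noise b(k) is added to give x(k), which is filtered by G(z) to give ŝ(k); ŝ(k) is subtracted from the delayed source s(k - kd), obtained through the delay z^(-kd), to form the error.]
Figure 2.12: Mean Square Error Inverse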
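[Block diagram: in the SIMO system, s(k) passes through H1(z), H2(z), ..., HN(z) and noise bn(k) is added to give x1(k), x2(k), ..., xN(k); the MINT inverse filters G1(z), G2(z), ..., GN(z) are applied to the microphone signals and their outputs are summed to give ŝ(k).]
Figure 2.13: MINT Method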
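A minimal least-squares formulation of MINT for the structure of Figure 2.13 is sketched below in NumPy. It assumes noise-free, exactly known channels and a target of an undelayed impulse; conv_matrix and mint_filters are hypothetical helper names. For two channels of length L without common zeros, inverse filters of length L - 1 make the stacked convolution matrix square and invertible, so the equalized response is an exact impulse. Noise, identification errors and regularization, which Chapter 3 investigates, are not modelled here.

```python
import numpy as np


def conv_matrix(h, n_cols):
    """Convolution matrix C with C @ g == np.convolve(h, g) whenever len(g) == n_cols."""
    C = np.zeros((len(h) + n_cols - 1, n_cols))
    for j in range(n_cols):
        C[j : j + len(h), j] = h
    return C


def mint_filters(channels, inv_len, delay=0):
    """Least-squares MINT: choose g_n so that sum_n h_n * g_n = delta(k - delay)."""
    H = np.hstack([conv_matrix(h, inv_len) for h in channels])
    target = np.zeros(H.shape[0])
    target[delay] = 1.0
    g, *_ = np.linalg.lstsq(H, target, rcond=None)
    return np.split(g, len(channels))


rng = np.random.default_rng(0)
L = 8                                            # hypothetical channel length
h1, h2 = rng.standard_normal(L), rng.standard_normal(L)
g1, g2 = mint_filters([h1, h2], inv_len=L - 1)   # L - 1 taps make the 2-channel system square
equalized = np.convolve(h1, g1) + np.convolve(h2, g2)
```

Unlike the single-channel case, no modelling delay is needed here: combining two channels without common zeros makes an exact FIR inverse possible even when each channel is non-minimum-phase.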
[Plot: convolution of inverse filter and impulse response, amplitude (0 to 0.8) versus offset # (1000 to 10000), one panel per channel: Channel = 1, 2, 3, 4.]
Figure 2.14: Impulse Response Convolution with LS Inverse channels 1 to 4
2.3 Inverse Filtering - Least squares method
Following system identification, the inverse of the identified filters must be obtained. One might expect the inverse filter to follow from direct mathematical inversion of the impulse response; for room impulse responses, however, the resulting causal inverse filter is unstable, because room impulse responses are generally non-minimum-phase [8].
As shown in figure 2.16 and depicted in figure 2.17, consider a SISO acoustic system consisting of a loudspeaker S1 and a microphone M, with transfer function $G(z^{-1})$. The inverse of the transfer function is

$$H(z^{-1}) = \frac{1}{G(z^{-1})} \tag{2.7}$$
An approximate inverse filter can be obtained using the least squares error criterion [9]. This least squares inverse is realized as a stable FIR filter.

[Figure 2.15: Impulse response convolution with the LS inverse, channels 5 to 7. Each panel plots the amplitude of the convolution of the inverse filter with the impulse response against offset #.]

[Figure 2.16: A SISO acoustic system with loudspeaker S1, microphone M, and acoustic room impulse response G. The input passes through the acoustic transmission channel $G(z^{-1})$ and then the inverse filter $H(z^{-1})$ to give the output.]

[Figure 2.17: Inverse filtering a SISO system with the LSE method. The input passes through the linear SISO system $g(k)$ and then the FIR filter $h(k)$ to give the output.]

Considering the system in figure 2.16, the following relationship exists between the inverse filter $h(k)$ and the system impulse response $g(k)$:
$$d(k) = g(k) * h(k) \tag{2.8}$$

where

$$d(k) = \begin{cases} 1, & k = 0 \\ 0, & k = 1, 2, \ldots \end{cases}$$
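and $*$ denotes linear convolution. In matrix form, equation 2.8 can be written as

$$\underbrace{\begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}}_{(L+1)\times 1} = \underbrace{\begin{bmatrix} g(0) & 0 & \cdots & 0 \\ g(1) & g(0) & \ddots & \vdots \\ \vdots & g(1) & \ddots & 0 \\ g(m) & \vdots & \ddots & g(0) \\ 0 & g(m) & \ddots & g(1) \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & g(m) \end{bmatrix}}_{(L+1)\times(i+1)} \underbrace{\begin{bmatrix} h(0) \\ h(1) \\ \vdots \\ h(i) \end{bmatrix}}_{(i+1)\times 1} \tag{2.9}$$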
Equation 2.9 can also be written as

$$\mathbf{D} = \mathbf{G}\mathbf{H} \tag{2.10}$$

where
$m$ = order of the z-transform of $g(k)$
$i$ = order of the FIR inverse filter
$L = i + m$
$\mathbf{D}$ = $(L+1) \times 1$ column vector
$\mathbf{H}$ = $(i+1) \times 1$ column vector
$\mathbf{G}$ = $(L+1) \times (i+1)$ matrix
Equation 2.10 is overdetermined, since matrix $\mathbf{G}$ has more rows than columns:

$$L + 1 = m + i + 1 > i + 1 \tag{2.11}$$

An approximate solution in the least squares sense is

$$\mathbf{H} = (\mathbf{G}^T\mathbf{G})^{-1}\mathbf{G}^T\mathbf{D} \tag{2.12}$$

where $\mathbf{G}^T$ is the transpose of matrix $\mathbf{G}$. The error energy of the approximate filter is

$$\mathrm{MSE} = (\mathbf{D} - \mathbf{G}\mathbf{H})^T(\mathbf{D} - \mathbf{G}\mathbf{H}) \tag{2.13}$$
This error does not converge to zero even when the order of the inverse filter is increased, because $g(k)$ is a non-minimum-phase impulse response [10]. The least squares error method therefore has a performance limit. An exact inverse of the room impulse response can instead be obtained using the multiple input/output inverse theorem (MINT). The performance of the LSE method as an inverse filter is shown in figure 2.18. To apply MINT, additional channels (microphones) are introduced into the system.
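The least squares inverse of equations 2.9 to 2.13, and the error floor caused by a non-minimum-phase channel, can be illustrated with a short NumPy sketch. This is not the code used in this study: the helper names `convolution_matrix` and `ls_inverse` are hypothetical, the two-tap channels are toy examples, and `numpy.linalg.lstsq` is used in place of the explicit inverse of equation 2.12 (it returns the same minimizer but is numerically better behaved).

```python
import numpy as np


def convolution_matrix(g, n_cols):
    """Matrix G of eq. (2.9): convolution_matrix(g, n) @ h equals np.convolve(g, h)."""
    g = np.asarray(g, dtype=float)
    G = np.zeros((len(g) + n_cols - 1, n_cols))
    for col in range(n_cols):
        G[col:col + len(g), col] = g  # column `col` is g shifted down by `col` samples
    return G


def ls_inverse(g, i):
    """Least squares FIR inverse h(0..i) of g(k), eqs. (2.9)-(2.13), with zero modelling delay."""
    G = convolution_matrix(g, i + 1)   # (L+1) x (i+1) with L = m + i
    D = np.zeros(G.shape[0])
    D[0] = 1.0                         # target d(k): 1 at k = 0, 0 elsewhere
    # lstsq gives the same minimizer as eq. (2.12) without forming (G^T G)^-1 explicitly
    H, *_ = np.linalg.lstsq(G, D, rcond=None)
    r = D - G @ H
    return H, float(r @ r)             # error energy of eq. (2.13)


# Toy two-tap channels (hypothetical): zero at z = -0.5 (minimum phase) vs. z = -2 (non-minimum phase)
h_min, mse_min = ls_inverse([1.0, 0.5], i=30)
h_non, mse_non = ls_inverse([0.5, 1.0], i=30)
print(f"minimum phase: {mse_min:.3e}, non-minimum phase: {mse_non:.4f}")
```

For the minimum-phase channel the error energy becomes negligible, whereas for the non-minimum-phase channel it cannot fall below $1 - (0.5/1)^2 = 0.75$ however large `i` is made, which is the saturation described above.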
[Figure 2.18: SDR with the supervised inverse filtering method. Title: "Nonblind Inverse Filtering". Axes: SNR in dB vs. SDR in dB. Legend: NLMS method, filter length = 300 taps, step size = 0.07, delay = 165 samples.]
[Figure 2.19: MINT filtering of a 1-input 2-output SIMO system. The input $x(k)$ passes through the channels $g_1(k)$ and $g_2(k)$; their outputs are filtered by the inverse filters $h_1(k)$ and $h_2(k)$ and summed to give $y(k)$.]
2.4 Inverse Filtering - The Multichannel Inverse Theorem - MINT
The MINT method is illustrated in figure 2.19, in which an extra channel has been added to the SISO system of figure 2.17. The exact inverse filter is obtained by satisfying

$$D(z^{-1}) = 1 = G_1(z^{-1})H_1(z^{-1}) + G_2(z^{-1})H_2(z^{-1}) \tag{2.14}$$
where $D(z^{-1})$ is the z-transform of $d(k)$ in equation 2.8. Because $G_1(z^{-1})$, $G_2(z^{-1})$, $H_1(z^{-1})$ and $H_2(z^{-1})$ are polynomials in $z^{-1}$, equation 2.14 has a unique solution when the following conditions are satisfied:

1. $G_1(z^{-1})$ and $G_2(z^{-1})$ are relatively prime, i.e. they have no common zero in the z-plane.
2. The order of $H_1(z^{-1})$ is less than that of $G_2(z^{-1})$, and the order of $H_2(z^{-1})$ is less than that of $G_1(z^{-1})$.
The solution of the polynomial equation 2.14 gives the coefficients of a pair of FIR filters $H_1(z^{-1})$ and $H_2(z^{-1})$ which together form an exact inverse of the system [10].
To compute the coefficients of these FIR filters, equation 2.14 is written in the time domain as

$$d(k) = g_1(k) * h_1(k) + g_2(k) * h_2(k) \tag{2.15}$$

where $g_1(k)$ and $g_2(k)$ denote the impulse responses of $G_1(z^{-1})$ and $G_2(z^{-1})$, and $h_1(k)$ and $h_2(k)$ denote the impulse responses of $H_1(z^{-1})$ and $H_2(z^{-1})$.
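A compact time-domain notation is obtained with a matrix formulation:

$$\mathbf{D} = \mathbf{G}_1\mathbf{H}_1 + \mathbf{G}_2\mathbf{H}_2 = \begin{bmatrix} \mathbf{G}_1 & \mathbf{G}_2 \end{bmatrix}\begin{bmatrix} \mathbf{H}_1 \\ \mathbf{H}_2 \end{bmatrix} \tag{2.16}$$

or

$$\underbrace{\begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}}_{(L+1)\times 1} = \underbrace{\left[\begin{array}{cccc|cccc} g_1(0) & 0 & \cdots & 0 & g_2(0) & 0 & \cdots & 0 \\ g_1(1) & g_1(0) & \ddots & \vdots & g_2(1) & g_2(0) & \ddots & \vdots \\ \vdots & g_1(1) & \ddots & 0 & \vdots & g_2(1) & \ddots & 0 \\ g_1(m) & \vdots & \ddots & g_1(0) & g_2(n) & \vdots & \ddots & g_2(0) \\ 0 & g_1(m) & \ddots & g_1(1) & 0 & g_2(n) & \ddots & g_2(1) \\ \vdots & \ddots & \ddots & \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & g_1(m) & 0 & \cdots & 0 & g_2(n) \end{array}\right]}_{(L+1)\times(i+j+2)} \underbrace{\begin{bmatrix} h_1(0) \\ \vdots \\ h_1(i) \\ h_2(0) \\ \vdots \\ h_2(j) \end{bmatrix}}_{(i+j+2)\times 1} \tag{2.17}$$

where
$m+1$ and $n+1$ denote the lengths of the impulse responses $g_1(k)$ and $g_2(k)$
$i+1$ and $j+1$ denote the lengths of the impulse responses of $H_1(z^{-1})$ and $H_2(z^{-1})$ (so $i$ and $j$ are their orders)
$L = m + i = n + j$
$\mathbf{D}$ denotes an $(L+1) \times 1$ column vector
$[\mathbf{H}_1^T\ \mathbf{H}_2^T]^T$ denotes an $(i+j+2) \times 1$ column vector
$[\mathbf{G}_1\ \mathbf{G}_2]$ denotes an $(L+1) \times (i+j+2)$ matrix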
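$[\mathbf{G}_1\ \mathbf{G}_2]$ becomes a square matrix when the orders $i$ and $j$ of the FIR inverse filters are chosen to satisfy

$$i = n - 1, \qquad j = m - 1$$

The coefficients of the FIR filters $h_1(k)$ and $h_2(k)$ can then be computed as

$$\begin{bmatrix} \mathbf{H}_1 \\ \mathbf{H}_2 \end{bmatrix} = [\mathbf{G}_1\ \mathbf{G}_2]^{-1}\mathbf{D} \tag{2.18}$$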
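Equations 2.15 to 2.18 translate almost directly into code. The sketch below, an illustration under stated assumptions rather than the implementation used in this study, builds the square matrix $[\mathbf{G}_1\ \mathbf{G}_2]$ with $i = n - 1$ and $j = m - 1$ and solves equation 2.18; the function names are hypothetical, and the example channels are toy two-tap responses with their zeros outside the unit circle.

```python
import numpy as np


def convolution_matrix(g, n_cols):
    """Convolution matrix: convolution_matrix(g, n) @ h equals np.convolve(g, h)."""
    g = np.asarray(g, dtype=float)
    G = np.zeros((len(g) + n_cols - 1, n_cols))
    for col in range(n_cols):
        G[col:col + len(g), col] = g
    return G


def mint_two_channel(g1, g2):
    """Exact FIR pair (h1, h2) with g1*h1 + g2*h2 = unit impulse, eqs. (2.15)-(2.18)."""
    m, n = len(g1) - 1, len(g2) - 1      # orders of g1(k) and g2(k)
    i, j = n - 1, m - 1                  # inverse filter orders that make [G1 G2] square
    G = np.hstack([convolution_matrix(g1, i + 1), convolution_matrix(g2, j + 1)])
    D = np.zeros(G.shape[0])
    D[0] = 1.0
    # np.linalg.solve raises LinAlgError if [G1 G2] is singular, e.g. when g1 and g2 share a zero
    H = np.linalg.solve(G, D)
    return H[:i + 1], H[i + 1:]


# Toy channels with zeros at z = -2 and z = -10/3 (both non-minimum phase)
h1, h2 = mint_two_channel([0.5, 1.0], [0.3, 1.0])
print(h1, h2)   # h1 ≈ [5.], h2 ≈ [-5.]
```

Although neither toy channel has a stable causal inverse on its own, the one-tap filters $h_1 = 5$ and $h_2 = -5$ invert the two-channel system exactly: $0.5 \cdot 5 + 0.3 \cdot (-5) = 1$ and $1 \cdot 5 + 1 \cdot (-5) = 0$.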
A SIMO system with three MINT channels is also considered in this study. In this case

$$D(z^{-1}) = 1 = G_1(z^{-1})H_1(z^{-1}) + G_2(z^{-1})H_2(z^{-1}) + G_3(z^{-1})H_3(z^{-1}) \tag{2.19}$$
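Similarly to equation 2.16,

$$\mathbf{D} = \mathbf{G}_1\mathbf{H}_1 + \mathbf{G}_2\mathbf{H}_2 + \mathbf{G}_3\mathbf{H}_3 = \begin{bmatrix} \mathbf{G}_1 & \mathbf{G}_2 & \mathbf{G}_3 \end{bmatrix}\begin{bmatrix} \mathbf{H}_1 \\ \mathbf{H}_2 \\ \mathbf{H}_3 \end{bmatrix} \tag{2.20}$$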
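and the filter coefficients are obtained as

$$\begin{bmatrix} \mathbf{H}_1 \\ \mathbf{H}_2 \\ \mathbf{H}_3 \end{bmatrix} = [\mathbf{G}_1\ \mathbf{G}_2\ \mathbf{G}_3]^{-1}\mathbf{D} \tag{2.21}$$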
Dereverberation using MINT filters with two and with three channels is compared in this study.
CHAPTER 3
Robust Inverse Filtering with MINT
The transfer functions between a source and a receiver in a room, referred to as room transfer functions (RTFs), cannot be assumed to be invariant. In this chapter, the performance of the MINT method is investigated with respect to three parameters:
• Regularization
• Filter length
• Modeling delay
These parameters are varied under different signal-to-noise ratio conditions. They were adjusted to reduce the filter energy, since filters with lower energy cause less signal degradation [11]. A SIMO system is considered in this performance investigation.
3.1 Generalized MINT performance
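The performance of the MINT method can be improved with a regularization parameter. The starting point is the two-channel relation of equation 2.16:

$$\mathbf{D} = \mathbf{G}_1\mathbf{H}_1 + \mathbf{G}_2\mathbf{H}_2 = \begin{bmatrix} \mathbf{G}_1 & \mathbf{G}_2 \end{bmatrix}\begin{bmatrix} \mathbf{H}_1 \\ \mathbf{H}_2 \end{bmatrix} \tag{3.1}$$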
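Generalizing equation 2.16 to $P$ channels gives

$$\mathbf{D} = \mathbf{G}_1\mathbf{H}_1 + \mathbf{G}_2\mathbf{H}_2 + \cdots + \mathbf{G}_P\mathbf{H}_P = \begin{bmatrix} \mathbf{G}_1 & \mathbf{G}_2 & \cdots & \mathbf{G}_P \end{bmatrix}\begin{bmatrix} \mathbf{H}_1 \\ \mathbf{H}_2 \\ \vdots \\ \mathbf{H}_P \end{bmatrix} \tag{3.2}$$

which can be rewritten as

$$\mathbf{D} = \mathbf{G}\mathbf{H} \tag{3.3}$$

where $\mathbf{G} = [\mathbf{G}_1\ \mathbf{G}_2\ \cdots\ \mathbf{G}_P]$ and

$$\mathbf{G}_i = \underbrace{\begin{bmatrix} g_i(0) & 0 & \cdots & 0 \\ g_i(1) & g_i(0) & \ddots & \vdots \\ \vdots & g_i(1) & \ddots & 0 \\ g_i(J) & \vdots & \ddots & g_i(0) \\ 0 & g_i(J) & \ddots & g_i(1) \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & g_i(J) \end{bmatrix}}_{(J+M)\times M}$$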
$$\mathbf{H} = [\mathbf{H}_1^T\ \mathbf{H}_2^T\ \cdots\ \mathbf{H}_P^T]^T = [h_1(1), \ldots, h_1(M), \ldots, h_P(1), \ldots, h_P(M)]^T$$

$$\mathbf{D} = [\underbrace{0, \ldots, 0}_{d}, 1, 0, \ldots, 0]^T$$

where
$M$ denotes the filter length for each channel
$\mathbf{H}$ is the inverse filter vector
$P$ denotes the number of channels
$g_i(n)$ is the impulse response estimate between the source and the $i$-th microphone
$J$ denotes the number of taps of the impulse response estimate
$d$ denotes a modelling delay
3.2 Regularization performance
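Designing the inverse filter with a regularized cost function gives

$$C = \|\mathbf{D} - \mathbf{G}\mathbf{H}\|^2 + \delta\|\mathbf{H}\|^2 \tag{3.4}$$

where $\delta$ is the regularization parameter. The minimizer of the cost function in equation 3.4 is [12]

$$\mathbf{H}(\delta) = (\mathbf{G}^T\mathbf{G} + \delta\mathbf{I})^{-1}\mathbf{G}^T\mathbf{D} \tag{3.5}$$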
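where $\mathbf{I}$ is an identity matrix. The squared L2 norm of this filter vector satisfies (using $\|\mathbf{D}\| = 1$)

$$\begin{aligned} \|\mathbf{H}(\delta)\|^2 &\le \|(\mathbf{G}^T\mathbf{G} + \delta\mathbf{I})^{-1}\mathbf{G}^T\|^2 \\ &= \|\mathbf{G}(\mathbf{G}^T\mathbf{G} + \delta\mathbf{I})^{-1}(\mathbf{G}^T\mathbf{G} + \delta\mathbf{I})^{-1}\mathbf{G}^T\| \\ &= \|\mathbf{G}^T\mathbf{G}(\mathbf{G}^T\mathbf{G} + \delta\mathbf{I})^{-1}(\mathbf{G}^T\mathbf{G} + \delta\mathbf{I})^{-1}\| \\ &\approx \|(\mathbf{G}^T\mathbf{G} + \delta\mathbf{I})^{-1}\| \\ &= \frac{1}{\mu_{\min}[\mathbf{G}^T\mathbf{G} + \delta\mathbf{I}]} \\ &\le \|\mathbf{H}\|^2 \end{aligned} \tag{3.6}$$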
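The approximation above follows from a Taylor series expansion of $(\mathbf{G}^T\mathbf{G} + \delta\mathbf{I})^{-1}$. Let $\mathbf{T} = (\mathbf{G}^T\mathbf{G})^{-1}$. If $\delta$ is sufficiently small, so that $\|\mathbf{I}\| \gg \|\delta\mathbf{T}\|$, then

$$\begin{aligned} (\mathbf{G}^T\mathbf{G} + \delta\mathbf{I})^{-1} &= \left(\mathbf{G}^T\mathbf{G}\,(\mathbf{I} + \delta(\mathbf{G}^T\mathbf{G})^{-1})\right)^{-1} \\ &= \left(\mathbf{G}^T\mathbf{G}\,(\mathbf{I} + \delta\mathbf{T})\right)^{-1} \\ &= (\mathbf{I} + \delta\mathbf{T})^{-1}(\mathbf{G}^T\mathbf{G})^{-1} \\ &= (\mathbf{I} - \delta\mathbf{T} + \delta^2\mathbf{T}^2 - \cdots)(\mathbf{G}^T\mathbf{G})^{-1} \\ &\approx (\mathbf{G}^T\mathbf{G})^{-1} \end{aligned} \tag{3.7}$$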
The L2 norm used in equation 3.6 satisfies [12]

$$\|\mathbf{A}^{-1}\| = \frac{1}{\mu_{\min}[\mathbf{A}]}$$
where $\mu_{\min}[\mathbf{A}]$ is the minimum singular value of matrix $\mathbf{A}$. The regularization parameter $\delta$ increases the minimum singular value and therefore reduces the norm of the inverse filter, which in turn reduces its sensitivity to impulse response variations [11]. Because regularization also reduces the accuracy of the inverse filter, a compromise value must be selected. The effect of regularization on a three-channel MINT system is shown in figure 3.4. This system was simulated with a fixed delay of 175 taps and different SNRs. The optimum value of the regularization parameter is observed to be $10^{-7}$; at this value the SDR of the dereverberated signal is 23 dB for an input SNR of 50 dB. The delay was chosen as the optimum found in the study of MINT performance with varying delay. Figure 3.3 shows the corresponding SDR values for a two-channel MINT system. The delay used for the two-channel system is 115 taps, as obtained from the performance curve of the two-channel MINT system with filter length as a variable.
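The regularized design of equation 3.5, and the effect of $\delta$ on the filter norm and on the minimum singular value discussed above, can be explored with the following NumPy sketch. It is illustrative only: `regularized_mint` is a hypothetical helper, the random channels stand in for measured room impulse responses, and the delay and filter length are arbitrary choices rather than the values found in this study.

```python
import numpy as np


def convolution_matrix(g, n_cols):
    """Convolution matrix: convolution_matrix(g, n) @ h equals np.convolve(g, h)."""
    g = np.asarray(g, dtype=float)
    G = np.zeros((len(g) + n_cols - 1, n_cols))
    for col in range(n_cols):
        G[col:col + len(g), col] = g
    return G


def regularized_mint(channels, M, d, delta):
    """Regularized P-channel inverse of eqs. (3.2)-(3.5); all channels must have the same length."""
    G = np.hstack([convolution_matrix(g, M) for g in channels])   # G = [G1 G2 ... GP]
    D = np.zeros(G.shape[0])
    D[d] = 1.0                                                     # unit impulse delayed by d samples
    GtG = G.T @ G
    H = np.linalg.solve(GtG + delta * np.eye(GtG.shape[0]), G.T @ D)   # eq. (3.5)
    err = float(np.sum((D - G @ H) ** 2))                          # equalization error ||D - GH||^2
    # ||(G^T G + delta I)^-1|| = 1 / (mu_min[G^T G] + delta): delta lifts the smallest singular value
    inv_norm = 1.0 / (max(np.linalg.eigvalsh(GtG).min(), 0.0) + delta)
    return H.reshape(len(channels), M), err, inv_norm


rng = np.random.default_rng(0)
channels = [rng.standard_normal(64) for _ in range(3)]   # hypothetical random 3-channel system
for delta in (1e-7, 1e-3, 1e-1):
    H, err, inv_norm = regularized_mint(channels, M=32, d=20, delta=delta)
    print(f"delta={delta:g}  ||H||^2={np.sum(H ** 2):.3g}  error={err:.3g}  1/(mu_min+delta)={inv_norm:.3g}")
```

Increasing $\delta$ lowers $\|\mathbf{H}\|^2$ and raises the equalization error, which is the compromise described above.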
3.3 Algorithm Optimization Procedure
A three-step procedure was used to obtain the optimal parameters. The steps were applied to both the two-channel and the three-channel MINT inverse filter systems.
3.3.1 Algorithm Optimization Procedure Step 1 - Delay
The appropriate delay for the MINT filter is obtained as shown in figures 3.1 and 3.2, for signals at three different input SNRs. Choosing the appropriate delay improves performance significantly in both the two-channel and the three-channel MINT systems. The figures also show that the optimum delay is independent of the noise power in the signal (SNR).
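Step 1 amounts to a one-dimensional search over the modelling delay. The sketch below performs that search with an assumption that differs from the procedure used for figures 3.1 and 3.2: it scores each delay by the noise-free equalization error $\|\mathbf{D} - \mathbf{G}\mathbf{H}\|^2$ of the regularized solution instead of the SDR of a noisy simulation. The helper names, channels, filter length and candidate delays are hypothetical.

```python
import numpy as np


def convolution_matrix(g, n_cols):
    """Convolution matrix: convolution_matrix(g, n) @ h equals np.convolve(g, h)."""
    g = np.asarray(g, dtype=float)
    G = np.zeros((len(g) + n_cols - 1, n_cols))
    for col in range(n_cols):
        G[col:col + len(g), col] = g
    return G


def equalization_error(channels, M, d, delta):
    """Noise-free residual ||D - G H(delta)||^2 of the regularized MINT solution for delay d."""
    G = np.hstack([convolution_matrix(g, M) for g in channels])
    D = np.zeros(G.shape[0])
    D[d] = 1.0
    H = np.linalg.solve(G.T @ G + delta * np.eye(G.shape[1]), G.T @ D)
    return float(np.sum((D - G @ H) ** 2))


def sweep_delay(channels, M, delta, delays):
    """Evaluate every candidate delay and return the best one with all scores."""
    errors = np.array([equalization_error(channels, M, d, delta) for d in delays])
    return delays[int(np.argmin(errors))], errors


rng = np.random.default_rng(1)
channels = [rng.standard_normal(40) for _ in range(2)]   # hypothetical 2-channel system
delays = list(range(0, 60, 5))
best, errors = sweep_delay(channels, M=39, delta=1e-3, delays=delays)
print("best delay:", best)
```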
[Figure 3.1: SDR dependence on MINT filter delay with different SNR, 2 channel system. Title: "Effect of delay on the output". Axes: delay vs. SDR in dB; curves for SNR = 10, 20 and 30 dB; optimum delay = 115.]

[Figure 3.2: SDR dependence on MINT filter delay with different SNR, 3 channel system. Title: "Optimum delay for 3 channel MINT filter". Axes: delay vs. SDR in dB; curves for SNR = 10, 20 and 30 dB; optimum delay = 175.]
3.3.2 Algorithm Optimization Procedure Step 2 - Regularization
The optimum regularization parameter is obtained as shown in figures 3.3 and 3.4. The noisier the environment, the larger the regularization parameter required; regularization is also more effective in noisy environments.
[Figure 3.3: MINT performance with a regularization parameter and different SNR values for a two channel system. Axes: regularization parameter ($10^x$, $x$ from −15 to 0) vs. SDR in dB; curves for SNR = 10, 20, 30, 40 and 50 dB; delay = 115 taps.]
3.3.3 Algorithm Optimization Procedure Step 3 - Filter Length
The filter norm depends on the filter length $M$, so the inverse filter should be kept at the minimum possible length. The observed MINT performance as the filter length is varied is shown in figures 3.5 and 3.6 for two-channel and three-channel MINT systems. The minimum required filter length is

$$L_h = \frac{L - 1}{N - 1} \tag{3.8}$$

where
$L$ denotes the channel length
$N$ denotes the number of channels
The figures show the performance as extra taps are added to this minimum filter length.
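Equation 3.8 is the length at which $[\mathbf{G}_1 \cdots \mathbf{G}_N]$ becomes square: with $N$ channels of length $L$ and inverse filters of length $L_h$, the matrix has $L + L_h - 1$ rows and $N L_h$ columns, which are equal when $L_h = (L-1)/(N-1)$. The small sketch below checks this. The function names are hypothetical, and rounding up when the ratio is not an integer is an assumption of the sketch, chosen so that the system has at least as many unknowns as equations.

```python
def minimum_filter_length(L, N):
    """Eq. (3.8): minimum per-channel inverse filter length for N channels of length L."""
    if N < 2:
        raise ValueError("MINT needs at least two channels")
    Lh, remainder = divmod(L - 1, N - 1)
    if remainder:
        Lh += 1          # assumption: round up when (L - 1)/(N - 1) is not an integer
    return Lh


def mint_matrix_shape(L, N, Lh):
    """Shape of [G1 ... GN]: each G_i is an (L + Lh - 1) x Lh convolution matrix."""
    return (L + Lh - 1, N * Lh)


for L, N in [(8, 2), (9, 3), (10, 3)]:
    Lh = minimum_filter_length(L, N)
    print(L, N, Lh, mint_matrix_shape(L, N, Lh))
# 8 2 7 (14, 14)
# 9 3 4 (12, 12)
# 10 3 5 (14, 15)
```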
[Figure 3.4: MINT performance with a regularization parameter and different SNR values for a three channel system. Axes: regularization parameter ($10^x$, $x$ from −15 to 0) vs. SDR in dB; curves for SNR = 10, 20, 30, 40 and 50 dB; delay = 175 taps; annotation $\lambda = 10^{-3}$.]
3.3.4 Algorithm Optimization Results
To evaluate how the optimum decision delay depends on filter length in two-channel and three-channel MINT filters, figures 3.7 and 3.8 are plotted. In the two-channel MINT filter, the optimum decision delay changes as coefficients are added to the filter; in the three-channel MINT filter, a longer filter does not change the best decision delay.

Comparing the least squares inverse with the two-channel and three-channel MINT algorithms, the least squares inverse method has lower computational complexity; however, even in a noisy environment the signal dereverberated with MINT has less distortion than with the least squares method.

The least squares inverse method cannot improve its performance (SDR) as the SNR increases: its SDR saturates because the room impulse response is non-minimum phase. MINT, in contrast, improves the SDR in proportion to the SNR.
In a two-channel MINT system these parameters are highly dependent on each other, so finding the best value for each is difficult, while in a three-channel MINT system the dependency is much weaker. No significant improvement in the dereverberated signal is observed when going from two-channel to three-channel MINT, whereas the computational complexity of the MINT method grows with the number of channels.

[Figure 3.5: SDR dependence on MINT filter length with different SNR, 2 channel system. Axes: taps added to the minimum filter length vs. SDR in dB; curves for SNR = 10, 20, 30, 40 and 50 dB; delay = 115 taps, optimum $\lambda$.]
[Figure 3.6: SDR dependence on MINT filter length with different SNR, 3 channel system. Axes: taps added to the minimum filter length vs. SDR in dB. Curves: SNR = 50 dB with regularization parameter $10^{-7}$; SNR = 40 dB, $10^{-6}$; SNR = 30 dB, $10^{-5}$; SNR = 20 dB, $10^{-4}$; SNR = 10 dB, $10^{-3}$.]
[Figure 3.7: Effects of filter length and delay on SDR with the MINT method with 2 channels. Surface plot of SDR in dB against delay and added filter length; SNR = 30 dB.]
[Figure 3.8: Effects of filter length and delay on SDR with the MINT method with 3 channels. Surface plot of SDR in dB against delay and added filter length.]
CHAPTER 4
Unsupervised Inverse Filtering Based Dereverberation
4.1 Basic Principles
The method employed uses the second-order statistics of the signals. As shown in [13], if the single input multiple output system shown in Figure 1.4 is considered, then for any pair of noise-free output signals $x_i(k)$ and $x_j(k)$,

$$x_i(k) = h_i(k) * s(k) \tag{4.1}$$

$$x_j(k) = h_j(k) * s(k) \tag{4.2}$$

and therefore

$$\begin{aligned} h_j(k) * x_i(k) &= h_j(k) * [h_i(k) * s(k)] \\ &= h_i(k) * [h_j(k) * s(k)] \\ &= h_i(k) * x_j(k) \end{aligned} \tag{4.3}$$
This equation shows that a relationship, the cross relation, exists between the outputs of each channel pair, because both outputs come from the same source; the relationship depends only on the individual channel responses. By exploiting it, an overdetermined system of equations can be formulated from the received output data alone. Under certain conditions, the impulse responses $h_i(k)$ and $h_j(k)$ can then be obtained uniquely up to a scalar multiple [13]. The identification proceeds as follows. For $k = L, \ldots, N$, where $N$ is the last sample index of the received data $x_i(k)$ and $x_j(k)$, equation 4.3 gives $N - L + 1$ linear equations in $h_j(\cdot)$ and $h_i(\cdot)$:
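$$\begin{bmatrix} X_i(L) & -X_j(L) \end{bmatrix}\begin{bmatrix} \mathbf{h}_j \\ \mathbf{h}_i \end{bmatrix} = \mathbf{0} \tag{4.4}$$

where $\mathbf{h}_m \triangleq [h_m(L), \ldots, h_m(0)]^T$ and

$$X_m(L) = \begin{bmatrix} x_m(0) & x_m(1) & \cdots & x_m(L) \\ x_m(1) & x_m(2) & \cdots & x_m(L+1) \\ \vdots & \vdots & \ddots & \vdots \\ x_m(N-L) & x_m(N-L+1) & \cdots & x_m(N) \end{bmatrix} \tag{4.5}$$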
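Because this equation can be written for every pair of channels $(i, j)$, the sets of linear equations are combined as shown in [13] to solve for all the channel impulse responses simultaneously. Denote all the channel impulse responses by $\mathbf{h} \triangleq [\mathbf{h}_1^T, \ldots, \mathbf{h}_M^T]^T$, where $M$ is the number of channels, and define

$$\mathcal{X}_i(L) = \begin{bmatrix} \mathbf{0} & \cdots & \mathbf{0} & X_{i+1}(L) & -X_i(L) & \mathbf{0} & \cdots & \mathbf{0} \\ \vdots & & \vdots & \vdots & & \ddots & & \vdots \\ \mathbf{0} & \cdots & \mathbf{0} & X_M(L) & \mathbf{0} & \cdots & \mathbf{0} & -X_i(L) \end{bmatrix} \tag{4.6}$$

where the leading zero blocks occupy the first $i - 1$ block columns. In the noise-free case, $\mathbf{h}$ lies in the null space of the matrix

$$\mathcal{X}(L) = \begin{bmatrix} \mathcal{X}_1(L) \\ \vdots \\ \mathcal{X}_{M-1}(L) \end{bmatrix} \tag{4.7}$$

each block row of which has $M$ block columns.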
This can be written as

$$\mathcal{X}(L)\mathbf{h} = \mathbf{0} \tag{4.8}$$

For this equation to have a unique solution (up to scale), certain conditions must be met; these are discussed in the next section.
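The null-space formulation of equations 4.4 to 4.8 can be checked numerically. The NumPy sketch below stacks the pairwise cross-relation blocks and takes the right singular vector belonging to the smallest singular value, which is the least squares null-space estimate. It assumes noise-free outputs, a known channel order and a white source; the function names are hypothetical, and the columns of each data matrix are ordered so that $\mathbf{h}_m = [h_m(0), \ldots, h_m(L)]^T$, the reverse of the ordering in equation 4.5.

```python
import numpy as np


def data_matrix(x, L):
    """Rows [x(k), x(k-1), ..., x(k-L)] for k = L, ..., len(x)-1, so data_matrix(x, L) @ h = (h * x)(k)."""
    return np.array([x[k - L:k + 1][::-1] for k in range(L, len(x))])


def cross_relation_matrix(xs, L):
    """Stack the pairwise blocks of eq. (4.6) into the matrix of eq. (4.7)."""
    M = len(xs)
    Xs = [data_matrix(x, L) for x in xs]
    rows = []
    for i in range(M - 1):
        for j in range(i + 1, M):
            blocks = [np.zeros_like(Xs[0]) for _ in range(M)]
            blocks[i], blocks[j] = Xs[j], -Xs[i]   # X_j h_i - X_i h_j = h_i*x_j - h_j*x_i = 0
            rows.append(np.hstack(blocks))
    return np.vstack(rows)


def identify_channels(xs, L):
    """Unit-norm right singular vector of the smallest singular value (null space of eq. 4.8)."""
    _, _, Vt = np.linalg.svd(cross_relation_matrix(xs, L), full_matrices=False)
    return Vt[-1].reshape(len(xs), L + 1)


rng = np.random.default_rng(0)
M, L = 3, 4                                    # 3 channels of order L (L + 1 taps), order assumed known
h_true = rng.standard_normal((M, L + 1))
s = rng.standard_normal(400)                   # white source
xs = [np.convolve(s, h) for h in h_true]       # noise-free outputs
h_est = identify_channels(xs, L)
cos = abs(h_true.ravel() @ h_est.ravel()) / np.linalg.norm(h_true)
print(f"|cos(angle)| between true and estimated responses: {cos:.12f}")
```

The estimate is a unit-norm vector, so it matches the true responses only up to the scale factor noted above; the absolute cosine of the angle between the two stacked vectors is therefore the appropriate measure.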
4.2 Identifiability Conditions
Blind channel identifiability with the cross-relation method depends on properties of both the channels and the input signal:
• The channel coefficients do not share any common roots (they are coprime).
• There must be an input signal to excite the channel
• The autocorrelation matrix of the input signal is of full rank
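The first and third conditions above can be tested numerically. In the sketch below, which is an illustration with hypothetical helper names, coprimeness is checked with the Sylvester matrix (the square matrix $[\mathbf{G}_1\ \mathbf{G}_2]$ of equation 2.17 with $i = n-1$, $j = m-1$), which is nonsingular exactly when the two polynomials share no zero, assuming their last coefficients $g_1(m)$ and $g_2(n)$ are nonzero. The excitation condition is checked through the rank of a matrix of input windows, whose Gram matrix is a sample autocorrelation matrix.

```python
import numpy as np


def convolution_matrix(g, n_cols):
    """Convolution matrix: convolution_matrix(g, n) @ h equals np.convolve(g, h)."""
    g = np.asarray(g, dtype=float)
    G = np.zeros((len(g) + n_cols - 1, n_cols))
    for col in range(n_cols):
        G[col:col + len(g), col] = g
    return G


def coprime(g1, g2):
    """g1 and g2 share no zero iff the square Sylvester matrix [G1 G2] is nonsingular."""
    m, n = len(g1) - 1, len(g2) - 1
    S = np.hstack([convolution_matrix(g1, n), convolution_matrix(g2, m)])   # (m+n) x (m+n)
    return bool(np.linalg.matrix_rank(S) == m + n)


def excitation_full_rank(s, size):
    """Rank test on the matrix of length-`size` windows of s; its Gram matrix is a
    sample autocorrelation matrix of order `size`."""
    W = np.array([s[k:k + size] for k in range(len(s) - size + 1)])
    return bool(np.linalg.matrix_rank(W) == size)


print(coprime([1.0, -0.5], [1.0, 0.3]))                            # True: zeros at 0.5 and -0.3
print(coprime([1.0, -0.5], np.convolve([1.0, -0.5], [1.0, 0.2])))  # False: common zero at 0.5
rng = np.random.default_rng(0)
print(excitation_full_rank(rng.standard_normal(200), 6))          # True: white noise
print(excitation_full_rank(np.sin(0.3 * np.arange(200)), 6))      # False: a sinusoid spans rank 2
```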
4.3 Algorithms
The set of simultaneous linear equations obtained from the channel outputs can be solved using adaptive algorithms [4]. The following adaptive algorithms are considered:
• Constrained time domain Multichannel LMS
• Constrained Time Domain Multichannel Newton Algorithm
• Unconstrained Blind Multichannel LMS with Optimal Stepsize Control
• Frequency Domain Normalized Multichannel LMS
The derivation of the constrained time domain multichannel LMS algorithm, as outlined in [4], proceeds as follows. In the absence of noise,

$$x_i * h_j = s * h_i * h_j = x_j * h_i, \quad i, j = 1, 2, \ldots, N,\ i \ne j \tag{4.9}$$

so that at time $k$

$$\mathbf{x}_i^T(k)\mathbf{h}_j = \mathbf{x}_j^T(k)\mathbf{h}_i, \quad i, j = 1, 2, \ldots, N,\ i \ne j \tag{4.10}$$

where

$$\mathbf{x}_n(k) = [x_n(k)\ x_n(k-1)\ \cdots\ x_n(k-L+1)]^T, \quad n = 1, 2, \ldots, N \tag{4.11}$$
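Multiplying equation 4.10 by $\mathbf{x}_i(k)$ and taking the expectation yields

$$\mathbf{R}_{x_ix_i}\mathbf{h}_j = \mathbf{R}_{x_ix_j}\mathbf{h}_i, \quad i, j = 1, 2, \ldots, N,\ i \ne j \tag{4.12}$$

where $\mathbf{R}_{x_ix_j} = E\{\mathbf{x}_i(k)\mathbf{x}_j^T(k)\}$. Equation 4.12 comprises $N(N-1)$ distinct equations. Summing the $N - 1$ cross relations associated with one particular channel $\mathbf{h}_j$ yields

$$\sum_{i=1, i\ne j}^{N}\mathbf{R}_{x_ix_i}\mathbf{h}_j = \sum_{i=1, i\ne j}^{N}\mathbf{R}_{x_ix_j}\mathbf{h}_i, \quad j = 1, 2, \ldots, N \tag{4.13}$$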
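This gives $N$ equations when all the channels are considered, and the set of equations for all the channels can be written as

$$\mathbf{R}_{x+}\mathbf{h} = \mathbf{0} \tag{4.14}$$

where

$$\mathbf{R}_{x+} = \begin{bmatrix} \sum_{n\ne 1}\mathbf{R}_{x_nx_n} & -\mathbf{R}_{x_2x_1} & \cdots & -\mathbf{R}_{x_Nx_1} \\ -\mathbf{R}_{x_1x_2} & \sum_{n\ne 2}\mathbf{R}_{x_nx_n} & \cdots & -\mathbf{R}_{x_Nx_2} \\ \vdots & \vdots & \ddots & \vdots \\ -\mathbf{R}_{x_1x_N} & -\mathbf{R}_{x_2x_N} & \cdots & \sum_{n\ne N}\mathbf{R}_{x_nx_n} \end{bmatrix} \tag{4.15}$$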
When the identifiability conditions are met and there is no noise, the matrix $\mathbf{R}_{x+}$ is rank deficient by one, so the channel impulse response can be determined uniquely up to a scale factor. In the presence of noise, however, the right-hand side of equation 4.14 is no longer zero but becomes an error vector

$$\mathbf{e} = \mathbf{R}_{x+}\mathbf{h} \tag{4.16}$$
A cost function based on this error can be defined as

$$J = \|\mathbf{e}\|^2 = \mathbf{e}^T\mathbf{e} \tag{4.17}$$

This cost function is minimized in the least squares sense, subject to the unit-norm constraint $\|\mathbf{h}\| = 1$, to determine the estimate $\hat{\mathbf{h}}$:

$$\hat{\mathbf{h}} = \arg\min_{\mathbf{h}} J = \arg\min_{\mathbf{h}}\mathbf{h}^T\mathbf{R}_{x+}^T\mathbf{R}_{x+}\mathbf{h} \tag{4.18}$$

The estimate $\hat{\mathbf{h}}$ is the eigenvector of $\mathbf{R}_{x+}$ corresponding to its smallest eigenvalue; $\mathbf{R}_{x+}$ is symmetric and positive semidefinite. The estimated channel impulse response is a nonzero scaled version of the true impulse response: it is oriented along the true impulse response in vector space, but its scale is arbitrary. The adaptive algorithms below therefore include constraints that prevent the estimate from converging to the trivial all-zero solution.
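Equations 4.14 to 4.18 can also be applied in batch form: estimate $\mathbf{R}_{x+}$ from the data and take the eigenvector of its smallest eigenvalue. The NumPy sketch below does this under assumptions of its own (noise-free outputs, sample averages in place of expectations, a known channel length, hypothetical helper names); the unit norm of the eigenvectors returned by `numpy.linalg.eigh` plays the role of the constraint that excludes the all-zero solution.

```python
import numpy as np


def tap_matrix(x, L):
    """Row k holds x_n^T(k) = [x(k), x(k-1), ..., x(k-L+1)] of eq. (4.11), for k = L-1, ..., len(x)-1."""
    return np.array([x[k - L + 1:k + 1][::-1] for k in range(L - 1, len(x))])


def build_Rx_plus(xs, L):
    """Sample estimate of R_x+ in eq. (4.15): block (j, i) is -R_{x_i x_j} for i != j and
    sum_{n != j} R_{x_n x_n} on the diagonal."""
    N = len(xs)
    T = [tap_matrix(x, L) for x in xs]
    K = T[0].shape[0]
    R = [[T[i].T @ T[j] / K for j in range(N)] for i in range(N)]   # R[i][j] ~ E{x_i x_j^T}
    Rp = np.zeros((N * L, N * L))
    for j in range(N):
        for i in range(N):
            if i == j:
                block = sum(R[n][n] for n in range(N) if n != j)
            else:
                block = -R[i][j]
            Rp[j * L:(j + 1) * L, i * L:(i + 1) * L] = block
    return Rp


rng = np.random.default_rng(0)
N, L = 3, 4                                   # hypothetical: 3 channels of L = 4 taps
h_true = rng.standard_normal((N, L))
s = rng.standard_normal(1000)
xs = [np.convolve(s, h) for h in h_true]      # noise-free outputs
eigvals, eigvecs = np.linalg.eigh(build_Rx_plus(xs, L))   # eigenvalues in ascending order
h_est = eigvecs[:, 0]                         # eigenvector of the smallest eigenvalue, eq. (4.18)
cos = abs(h_true.ravel() @ h_est) / np.linalg.norm(h_true)
print(f"smallest eigenvalue {eigvals[0]:.2e}, |cos(angle)| {cos:.10f}")
```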
4.4 Constrained Time Domain Multichannel LMS
The update equation of the constrained time domain multichannel LMS algorithm (also referred to as multichannel LMS, MCLMS) is derived from the cross relation between outputs $i$ and $j$ in equation 4.10. In the presence of noise, the a priori error signal is

$$e_{ij}(k+1) = \mathbf{x}_i^T(k+1)\hat{\mathbf{h}}_j(k) - \mathbf{x}_j^T(k+1)\hat{\mathbf{h}}_i(k), \quad i, j = 1, 2, \ldots, N \tag{4.19}$$

where $\hat{\mathbf{h}}_i(k)$ is the estimate of channel $i$ at time $k$. If all the channel estimation errors are equally important, a cost function can be defined as

$$\chi(k+1) = \sum_{i=1}^{N-1}\sum_{j=i+1}^{N} e_{ij}^2(k+1) \tag{4.20}$$
This expression excludes the terms $e_{ii}(k) = 0$ $(i = 1, 2, \ldots, N)$ and counts each pair $e_{ij}(k) = -e_{ji}(k)$ only once. A unit-norm constraint is imposed on the channel estimate $\hat{\mathbf{h}}(k)$ to avoid the all-zero estimate, giving the normalized error signal

$$\epsilon_{ij}(k+1) = \frac{e_{ij}(k+1)}{\|\hat{\mathbf{h}}(k)\|} \tag{4.21}$$

The cost function is then

$$J(k+1) = \sum_{i=1}^{N-1}\sum_{j=i+1}^{N}\epsilon_{ij}^2(k+1) = \frac{\chi(k+1)}{\|\hat{\mathbf{h}}(k)\|^2} \tag{4.22}$$
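The update equation of the constrained time domain multichannel LMS is

$$\hat{\mathbf{h}}(k+1) = \hat{\mathbf{h}}(k) - \mu\nabla J(k+1) \tag{4.23}$$

As in the LMS algorithm, $\mu$ is a small positive step size and $\nabla J(k+1)$ is the gradient of $J(k+1)$ with respect to $\hat{\mathbf{h}}(k)$:

$$\begin{aligned} \nabla J(k+1) &= \frac{\partial J(k+1)}{\partial\hat{\mathbf{h}}(k)} = \frac{\partial}{\partial\hat{\mathbf{h}}(k)}\left[\frac{\chi(k+1)}{\|\hat{\mathbf{h}}(k)\|^2}\right] = \frac{\partial}{\partial\hat{\mathbf{h}}(k)}\left[\frac{\chi(k+1)}{\hat{\mathbf{h}}^T(k)\hat{\mathbf{h}}(k)}\right] \\ &= \frac{1}{\|\hat{\mathbf{h}}(k)\|^2}\left[\frac{\partial\chi(k+1)}{\partial\hat{\mathbf{h}}(k)} - 2J(k+1)\hat{\mathbf{h}}(k)\right] \end{aligned} \tag{4.24}$$

where

$$\frac{\partial\chi(k+1)}{\partial\hat{\mathbf{h}}(k)} = \left[\left(\frac{\partial\chi(k+1)}{\partial\hat{\mathbf{h}}_1(k)}\right)^T \left(\frac{\partial\chi(k+1)}{\partial\hat{\mathbf{h}}_2(k)}\right)^T \cdots \left(\frac{\partial\chi(k+1)}{\partial\hat{\mathbf{h}}_N(k)}\right)^T\right]^T \tag{4.25}$$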
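Next, the partial derivative of $\chi(k+1)$ is evaluated with respect to the coefficients of each channel impulse response $\hat{\mathbf{h}}_n(k)$ $(n = 1, 2, \ldots, N)$:

$$\begin{aligned} \frac{\partial\chi(k+1)}{\partial\hat{\mathbf{h}}_n(k)} &= \frac{\partial\left[\sum_{i=1}^{N-1}\sum_{j=i+1}^{N} e_{ij}^2(k+1)\right]}{\partial\hat{\mathbf{h}}_n(k)} \\ &= \sum_{i=1}^{n-1} 2e_{in}(k+1)\mathbf{x}_i(k+1) + \sum_{j=n+1}^{N} 2e_{nj}(k+1)[-\mathbf{x}_j(k+1)] \\ &= \sum_{i=1}^{n-1} 2e_{in}(k+1)\mathbf{x}_i(k+1) + \sum_{j=n+1}^{N} 2e_{jn}(k+1)\mathbf{x}_j(k+1) \\ &= \sum_{i=1}^{N} 2e_{in}(k+1)\mathbf{x}_i(k+1) \end{aligned} \tag{4.26}$$

where the last step uses $e_{nn}(k+1) = 0$. In matrix form this becomes

$$\begin{aligned} \frac{\partial\chi(k+1)}{\partial\hat{\mathbf{h}}_n(k)} &= 2\mathbf{X}(k+1)\mathbf{e}_n(k+1) \\ &= 2\mathbf{X}(k+1)[\mathbf{C}_n(k+1) - \mathbf{D}_n(k+1)]\hat{\mathbf{h}}(k) \end{aligned} \tag{4.27}$$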
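where

$$\mathbf{X}(k+1) = [\mathbf{x}_1(k+1)\ \mathbf{x}_2(k+1)\ \cdots\ \mathbf{x}_N(k+1)]_{L\times N}$$

$$\mathbf{e}_n(k+1) = [e_{1n}(k+1)\ e_{2n}(k+1)\ \cdots\ e_{Nn}(k+1)]^T = \begin{bmatrix} \mathbf{x}_1^T(k+1)\hat{\mathbf{h}}_n(k) - \mathbf{x}_n^T(k+1)\hat{\mathbf{h}}_1(k) \\ \vdots \\ \mathbf{x}_N^T(k+1)\hat{\mathbf{h}}_n(k) - \mathbf{x}_n^T(k+1)\hat{\mathbf{h}}_N(k) \end{bmatrix} = [\mathbf{C}_n(k+1) - \mathbf{D}_n(k+1)]\hat{\mathbf{h}}(k)$$

$$\mathbf{C}_n(k+1) = \begin{bmatrix} 0 & \cdots & 0 & \mathbf{x}_1^T(k+1) & 0 & \cdots & 0 \\ \vdots & & \vdots & \vdots & \vdots & & \vdots \\ 0 & \cdots & 0 & \mathbf{x}_N^T(k+1) & 0 & \cdots & 0 \end{bmatrix}_{N\times NL} = \begin{bmatrix} \mathbf{0}_{N\times(n-1)L} & \mathbf{X}^T(k+1) & \mathbf{0}_{N\times(N-n)L} \end{bmatrix}$$

$$\mathbf{D}_n(k+1) = \begin{bmatrix} \mathbf{x}_n^T(k+1) & 0 & \cdots & 0 \\ 0 & \mathbf{x}_n^T(k+1) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \mathbf{x}_n^T(k+1) \end{bmatrix}_{N\times NL} \tag{4.28}$$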
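The two matrix products in equation 4.27 evaluate to

$$\mathbf{X}(k+1)\mathbf{C}_n(k+1) = \mathbf{X}(k+1)\begin{bmatrix} \mathbf{0}_{N\times(n-1)L} & \mathbf{X}^T(k+1) & \mathbf{0}_{N\times(N-n)L} \end{bmatrix} = \begin{bmatrix} \mathbf{0}_{L\times(n-1)L} & \sum_{i=1}^{N}\tilde{\mathbf{R}}_{x_ix_i}(k+1) & \mathbf{0}_{L\times(N-n)L} \end{bmatrix}_{L\times NL} \tag{4.29}$$

and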
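$$\mathbf{X}(k+1)\mathbf{D}_n(k+1) = \begin{bmatrix} \tilde{\mathbf{R}}_{x_1x_n}(k+1) & \tilde{\mathbf{R}}_{x_2x_n}(k+1) & \cdots & \tilde{\mathbf{R}}_{x_Nx_n}(k+1) \end{bmatrix}_{L\times NL} \tag{4.30}$$

where $\tilde{\mathbf{R}}_{x_ix_j}(k+1) = \mathbf{x}_i(k+1)\mathbf{x}_j^T(k+1)$ $(i, j = 1, 2, \ldots, N)$; the tilde denotes the instantaneous value of $\mathbf{R}_{x_ix_j}$. Substituting equations 4.29 and 4.30 into equation 4.27 gives

$$\frac{\partial\chi(k+1)}{\partial\hat{\mathbf{h}}_n(k)} = 2\begin{bmatrix} -\tilde{\mathbf{R}}_{x_1x_n}(k+1) & -\tilde{\mathbf{R}}_{x_2x_n}(k+1) & \cdots & \sum_{i\ne n}\tilde{\mathbf{R}}_{x_ix_i}(k+1) & \cdots & -\tilde{\mathbf{R}}_{x_Nx_n}(k+1) \end{bmatrix}\hat{\mathbf{h}}(k) \tag{4.31}$$

where the diagonal-sum term occupies the $n$-th block column.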
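Stacking equation 4.31 for $n = 1, \ldots, N$ as in equation 4.25, and substituting the result into equation 4.24, yields

$$\frac{\partial\chi(k+1)}{\partial\hat{\mathbf{h}}(k)} = 2\tilde{\mathbf{R}}_{x+}(k+1)\hat{\mathbf{h}}(k) \tag{4.32}$$

$$\nabla J(k+1) = \frac{1}{\|\hat{\mathbf{h}}(k)\|^2}\left[2\tilde{\mathbf{R}}_{x+}(k+1)\hat{\mathbf{h}}(k) - 2J(k+1)\hat{\mathbf{h}}(k)\right] \tag{4.33}$$

where

$$\tilde{\mathbf{R}}_{x+}(k) = \begin{bmatrix} \sum_{n\ne 1}\tilde{\mathbf{R}}_{x_nx_n}(k) & -\tilde{\mathbf{R}}_{x_2x_1}(k) & \cdots & -\tilde{\mathbf{R}}_{x_Nx_1}(k) \\ -\tilde{\mathbf{R}}_{x_1x_2}(k) & \sum_{n\ne 2}\tilde{\mathbf{R}}_{x_nx_n}(k) & \cdots & -\tilde{\mathbf{R}}_{x_Nx_2}(k) \\ \vdots & \vdots & \ddots & \vdots \\ -\tilde{\mathbf{R}}_{x_1x_N}(k) & -\tilde{\mathbf{R}}_{x_2x_N}(k) & \cdots & \sum_{n\ne N}\tilde{\mathbf{R}}_{x_nx_n}(k) \end{bmatrix} \tag{4.34}$$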
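Substituting equation 4.33 into equation 4.23 gives the update equation

$$\hat{\mathbf{h}}(k+1) = \hat{\mathbf{h}}(k) - \frac{2\mu}{\|\hat{\mathbf{h}}(k)\|^2}\left[\tilde{\mathbf{R}}_{x+}(k+1)\hat{\mathbf{h}}(k) - J(k+1)\hat{\mathbf{h}}(k)\right] \tag{4.35}$$

Normalizing this estimate to satisfy the unit-norm constraint (so that $\|\hat{\mathbf{h}}(k)\| = 1$ and $J(k+1) = \chi(k+1)$) gives the final form of the update equation

$$\hat{\mathbf{h}}(k+1) = \frac{\hat{\mathbf{h}}(k) - 2\mu\left[\tilde{\mathbf{R}}_{x+}(k+1)\hat{\mathbf{h}}(k) - \chi(k+1)\hat{\mathbf{h}}(k)\right]}{\left\|\hat{\mathbf{h}}(k) - 2\mu\left[\tilde{\mathbf{R}}_{x+}(k+1)\hat{\mathbf{h}}(k) - \chi(k+1)\hat{\mathbf{h}}(k)\right]\right\|} \tag{4.36}$$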
For this update algorithm to converge, it can be shown that the step size must satisfy [9]

$$0 < \mu < \frac{1}{\lambda_{\max}} \tag{4.37}$$

where $\lambda_{\max}$ is the largest eigenvalue of $E\{\tilde{\mathbf{R}}_{x+}(k) - J(k)\mathbf{I}_{NL\times NL}\}$ and $\mathbf{I}_{NL\times NL}$ is the $NL \times NL$ identity matrix. Taking the expectation of equation 4.35 at steady state gives

$$\frac{\mathbf{R}_{x+}\hat{\mathbf{h}}(\infty)}{\|\hat{\mathbf{h}}(\infty)\|} = E\{J(\infty)\}\frac{\hat{\mathbf{h}}(\infty)}{\|\hat{\mathbf{h}}(\infty)\|} \tag{4.38}$$

which is the desired outcome: $\hat{\mathbf{h}}$ converges in the mean to the eigenvector of $\mathbf{R}_{x+}$ corresponding to its smallest eigenvalue, $E\{J(\infty)\}$. The Constrained Time Domain Multichannel LMS algorithm is illustrated in figure 4.1.
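The MCLMS iteration of equations 4.19 to 4.36 can be sketched compactly. If the rows of an $N \times L$ array hold $\mathbf{x}_n^T(k+1)$ and $\mathbf{Y}$ collects all products $\mathbf{x}_i^T\hat{\mathbf{h}}_j$, then the errors $e_{ij}$ form the antisymmetric matrix $\mathbf{Y} - \mathbf{Y}^T$, and block $n$ of $\tilde{\mathbf{R}}_{x+}\hat{\mathbf{h}}$ equals $\sum_i e_{in}\mathbf{x}_i$ by equations 4.26 and 4.32. The sketch is illustrative only: the helper names, the step size, the channel length, the noise-free simulated data and the scaling of the figure 4.1 initialization to unit total norm are all assumptions.

```python
import numpy as np


def tap_matrix(x, L):
    """Row k holds x_n^T(k) = [x(k), ..., x(k-L+1)] of eq. (4.11)."""
    return np.array([x[k - L + 1:k + 1][::-1] for k in range(L - 1, len(x))])


def misalignment(h_true, h_est):
    """Scale-invariant misalignment 1 - cos^2(angle), i.e. d(k) of eq. (4.56) divided by ||h||^2."""
    c = h_true @ h_est / (np.linalg.norm(h_true) * np.linalg.norm(h_est))
    return 1.0 - c ** 2


def mclms(xs, L, mu, n_iter):
    """Constrained time-domain MCLMS, eqs. (4.19), (4.20), (4.32) and (4.36)."""
    N = len(xs)
    X_all = np.stack([tap_matrix(x, L) for x in xs], axis=1)   # shape (K, N, L)
    H = np.zeros((N, L))
    H[:, 0] = 1.0 / np.sqrt(N)       # [1 0 ... 0]^T per channel, scaled to unit total norm (assumed)
    for k in range(min(n_iter, X_all.shape[0])):
        Xk = X_all[k]                # row n is x_n^T(k+1)
        Y = Xk @ H.T                 # Y[i, j] = x_i^T h_j
        E = Y - Y.T                  # E[i, j] = e_ij(k+1)
        chi = 0.5 * np.sum(E ** 2)   # sum over i < j of e_ij^2
        RH = E.T @ Xk                # row n = sum_i e_in x_i, i.e. block n of R~_x+ h
        H = H - 2.0 * mu * (RH - chi * H)
        H /= np.linalg.norm(H)       # unit-norm constraint of eq. (4.36)
    return H.ravel()


rng = np.random.default_rng(0)
N, L = 3, 4
h_true = rng.standard_normal((N, L))
h_true /= np.linalg.norm(h_true, axis=1, keepdims=True)   # unit-norm channels, unit-variance source
s = rng.standard_normal(20000 + L)
xs = [np.convolve(s, h) for h in h_true]
h0 = np.zeros((N, L))
h0[:, 0] = 1.0 / np.sqrt(N)
h_est = mclms(xs, L, mu=0.005, n_iter=20000)
print(misalignment(h_true.ravel(), h0.ravel()), misalignment(h_true.ravel(), h_est))
```

With these settings the final misalignment should be clearly below the initial one; how fast it falls depends strongly on $\mu$ and on the colouring of the channel outputs, which is the slow-convergence issue addressed by the algorithms in the following sections.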
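[Figure 4.1: Constrained Time Domain Multichannel LMS algorithm. Block diagram for three channels $x_1$, $x_2$, $x_3$ with estimates $\hat{\mathbf{h}}_1$, $\hat{\mathbf{h}}_2$, $\hat{\mathbf{h}}_3$: initialize $\hat{\mathbf{h}}_i = [1\ 0\ 0\ \cdots\ 0]^T$ (normalized), $i = 1, 2, \ldots, N$; form the cross-relation errors $e_{ij}(k+1) = \mathbf{x}_i^T(k+1)\hat{\mathbf{h}}_j(k) - \mathbf{x}_j^T(k+1)\hat{\mathbf{h}}_i(k)$ from the filtered outputs; update $\hat{\mathbf{h}}(k+1)$ using $\tilde{\mathbf{R}}_{x+}(k+1)$ and $\chi(k+1)$; normalize $\hat{\mathbf{h}}(k+1)$ to unit norm.]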
4.5 Constrained Time Domain Multichannel Newton
Algorithm
The Constrained Time Domain Multichannel Newton Algorithm (also referred to as the multichannel Newton algorithm, MCN) generally performs better than the MCLMS algorithm [4]. However, it is computationally intensive because of the nature of Newton's method [14]. The Newton update equation is

$$\hat{\mathbf{h}}(k+1) = \hat{\mathbf{h}}(k) - E^{-1}\{\nabla^2 J(k+1)\}\nabla J(k+1) \tag{4.39}$$
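where $\nabla^2 J(k+1)$ is the Hessian matrix of $J(k+1)$ with respect to $\hat{\mathbf{h}}(k)$. It is obtained by differentiating equation 4.33 with respect to $\hat{\mathbf{h}}(k)$ and using the formula [4]

$$\frac{\partial}{\partial\hat{\mathbf{h}}(k)}\left[J(k+1)\hat{\mathbf{h}}(k)\right] = \hat{\mathbf{h}}(k)\left[\frac{\partial J(k+1)}{\partial\hat{\mathbf{h}}(k)}\right]^T + J(k+1)\mathbf{I}_{NL\times NL} \tag{4.40}$$

$$= \hat{\mathbf{h}}(k)[\nabla J(k+1)]^T + J(k+1)\mathbf{I}_{NL\times NL} \tag{4.41}$$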
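The resulting Hessian matrix is

$$\nabla^2 J(k+1) = \frac{2\left\{\tilde{\mathbf{R}}_{x+}(k+1) - \left[\hat{\mathbf{h}}(k)(\nabla J(k+1))^T + J(k+1)\mathbf{I}_{NL\times NL}\right]\right\}}{\|\hat{\mathbf{h}}(k)\|^2} - \frac{4\left[\tilde{\mathbf{R}}_{x+}(k+1)\hat{\mathbf{h}}(k) - J(k+1)\hat{\mathbf{h}}(k)\right]\hat{\mathbf{h}}^T(k)}{\|\hat{\mathbf{h}}(k)\|^4} \tag{4.42}$$

With the unit-norm constraint $\|\hat{\mathbf{h}}(k)\| = 1$, equation 4.42 can be rewritten as

$$\nabla^2 J(k+1) = 2\left\{\tilde{\mathbf{R}}_{x+}(k+1) - \hat{\mathbf{h}}(k)[\nabla J(k+1)]^T - J(k+1)\mathbf{I}_{NL\times NL}\right\} - 4\left[\tilde{\mathbf{R}}_{x+}(k+1)\hat{\mathbf{h}}(k) - J(k+1)\hat{\mathbf{h}}(k)\right]\hat{\mathbf{h}}^T(k) \tag{4.43}$$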
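Taking the mathematical expectation of equation 4.43 and using the independence assumption [12] yields

$$E\{\nabla^2 J(k+1)\} = 2\mathbf{R}_{x+} - 4\hat{\mathbf{h}}(k)\hat{\mathbf{h}}^T(k)\mathbf{R}_{x+} - 4\mathbf{R}_{x+}\hat{\mathbf{h}}(k)\hat{\mathbf{h}}^T(k) - 2E\{J(k+1)\}\left[\mathbf{I}_{NL\times NL} - 4\hat{\mathbf{h}}(k)\hat{\mathbf{h}}^T(k)\right] \tag{4.44}$$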
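Because $\mathbf{R}_{x+}$ and $E\{J(k+1)\}$ are unknown, estimates are used. $J(k+1)$ decreases as the estimate approaches convergence and becomes small afterwards, so the term involving $E\{J(k+1)\}$ can be neglected in equation 4.44 for simplicity. The matrix $\mathbf{R}_{x+}$ is conventionally estimated recursively as

$$\hat{\mathbf{R}}_{x+}(1) = \mathrm{diag}\{\sigma_{x_1}^2, \ldots, \sigma_{x_1}^2, \sigma_{x_2}^2, \ldots, \sigma_{x_2}^2, \ldots, \sigma_{x_N}^2, \ldots, \sigma_{x_N}^2\}$$

$$\hat{\mathbf{R}}_{x+}(k+1) = \lambda\hat{\mathbf{R}}_{x+}(k) + \tilde{\mathbf{R}}_{x+}(k+1), \quad k \ge 1 \tag{4.45}$$

where $\sigma_{x_n}^2$ $(n = 1, 2, \ldots, N)$ is the power of $x_n(k)$ and $\lambda$ $(0 < \lambda < 1)$ is an exponential forgetting factor. An estimate $\mathbf{W}(k+1)$ of the mean Hessian matrix of $J(k+1)$ is then

$$\mathbf{W}(k+1) = 2\hat{\mathbf{R}}_{x+}(k+1) - 4\hat{\mathbf{h}}(k)\hat{\mathbf{h}}^T(k)\hat{\mathbf{R}}_{x+}(k+1) - 4\hat{\mathbf{R}}_{x+}(k+1)\hat{\mathbf{h}}(k)\hat{\mathbf{h}}^T(k) \tag{4.46}$$

from which the multichannel Newton algorithm is obtained:

$$\hat{\mathbf{h}}(k+1) = \frac{\hat{\mathbf{h}}(k) - 2\rho\mathbf{W}^{-1}(k+1)\left[\tilde{\mathbf{R}}_{x+}(k+1)\hat{\mathbf{h}}(k) - \chi(k+1)\hat{\mathbf{h}}(k)\right]}{\left\|\hat{\mathbf{h}}(k) - 2\rho\mathbf{W}^{-1}(k+1)\left[\tilde{\mathbf{R}}_{x+}(k+1)\hat{\mathbf{h}}(k) - \chi(k+1)\hat{\mathbf{h}}(k)\right]\right\|} \tag{4.47}$$

where $\rho$ is a step size close to, but less than, 1.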
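One MCN iteration, equations 4.45 to 4.47, can be sketched as follows. The code is an illustration, not the implementation of [4] or of this study: the helper names are hypothetical, $\mathbf{W}^{-1}(k+1)$ is applied with a linear solve rather than an explicit inverse, and the $E\{J(k+1)\}$ term is dropped as described above.

```python
import numpy as np


def instantaneous_Rx_plus(Xk):
    """R~_x+(k) of eq. (4.34) from the current tap vectors Xk (row n = x_n^T(k))."""
    N, L = Xk.shape
    R = np.zeros((N * L, N * L))
    total = Xk.T @ Xk                                  # sum_n x_n x_n^T
    for j in range(N):
        for i in range(N):
            if i == j:
                block = total - np.outer(Xk[j], Xk[j])  # sum_{n != j} x_n x_n^T
            else:
                block = -np.outer(Xk[i], Xk[j])         # -R~_{x_i x_j}
            R[j * L:(j + 1) * L, i * L:(i + 1) * L] = block
    return R


def mcn_step(h, Xk, R_hat, lam, rho):
    """One multichannel Newton update, eqs. (4.45)-(4.47); h is the unit-norm stacked estimate."""
    N, L = Xk.shape
    H = h.reshape(N, L)
    Y = Xk @ H.T
    E = Y - Y.T                                         # e_ij(k+1)
    chi = 0.5 * np.sum(E ** 2)                          # chi(k+1)
    R_tilde = instantaneous_Rx_plus(Xk)
    R_hat = lam * R_hat + R_tilde                       # eq. (4.45)
    hh = np.outer(h, h)
    W = 2 * R_hat - 4 * hh @ R_hat - 4 * R_hat @ hh     # eq. (4.46)
    step = np.linalg.solve(W, R_tilde @ h - chi * h)    # W^-1 [R~ h - chi h] without forming W^-1
    h_new = h - 2 * rho * step
    return h_new / np.linalg.norm(h_new), R_hat


def initial_R_hat(xs, L):
    """R_hat(1) of eq. (4.45): channel powers repeated L times on the diagonal."""
    return np.diag(np.repeat([np.mean(x ** 2) for x in xs], L))
```

A full run calls `mcn_step` for successive samples, starting from `initial_R_hat(xs, L)` and a unit-norm initial estimate. Each step costs $O((NL)^3)$ operations because of the solve, which is the computational burden mentioned at the start of this section.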
4.6 Unconstrained Blind Multichannel LMS algorithm
with Optimal Step Size control
The unconstrained blind multichannel LMS algorithm with optimal step-size control (also referred to as the variable step-size unconstrained multichannel LMS, VSS-UMCLMS) can overcome the slow convergence of the constrained blind multichannel LMS algorithm, because its step size is controlled dynamically according to the signal properties. The gradient of the cost function in the multichannel LMS update equation 4.35 is

$$\nabla J(k+1) \approx \frac{2\tilde{\mathbf{R}}_{x+}(k+1)\hat{\mathbf{h}}(k)}{\|\hat{\mathbf{h}}(k)\|^2} \tag{4.48}$$

and removing the unit-norm constraint gives the update

$$\hat{\mathbf{h}}(k+1) = \hat{\mathbf{h}}(k) - 2\mu\tilde{\mathbf{R}}_{x+}(k+1)\hat{\mathbf{h}}(k) \tag{4.49}$$

This update algorithm does not converge to the all-zero estimate provided the initial estimate $\hat{\mathbf{h}}(0)$ is not orthogonal to the true channel impulse response $\mathbf{h}$. This can be shown by pre-multiplying equation 4.49 by $\mathbf{h}^T$:

$$\mathbf{h}^T\hat{\mathbf{h}}(k+1) = \mathbf{h}^T\hat{\mathbf{h}}(k) - 2\mu\mathbf{h}^T\tilde{\mathbf{R}}_{x+}(k+1)\hat{\mathbf{h}}(k) \tag{4.50}$$
As shown for the cross-relation approach in equation 4.10, in the absence of noise the following relationship holds:

$$\mathbf{h}^T\tilde{\mathbf{R}}_{x+}(k+1) = \mathbf{0}^T \tag{4.51}$$

This implies that the gradient $\nabla J(k+1)$ is orthogonal to $\mathbf{h}$ at every time $k$, so equation 4.50 becomes

$$\mathbf{h}^T\hat{\mathbf{h}}(k+1) = \mathbf{h}^T\hat{\mathbf{h}}(k) \tag{4.52}$$

Thus $\mathbf{h}^T\hat{\mathbf{h}}(k)$ is time invariant for the unconstrained multichannel LMS (MCLMS) algorithm, and if $\mathbf{h}^T\hat{\mathbf{h}}(0) \ne 0$, then $\hat{\mathbf{h}}(k)$ cannot converge to zero. To analyse the unconstrained MCLMS update, the model filter is decomposed into two components, $\hat{\mathbf{h}}_\parallel(k)$ parallel to the true impulse response $\mathbf{h}$ and $\hat{\mathbf{h}}_\perp(k)$ perpendicular to it:

$$\hat{\mathbf{h}}(k) = \hat{\mathbf{h}}_\perp(k) + \hat{\mathbf{h}}_\parallel(k) \tag{4.53}$$

Because the gradient $\nabla J(k+1)$ is orthogonal to $\mathbf{h}$, and $\mathbf{h}$ is parallel to $\hat{\mathbf{h}}_\parallel(k)$, $\nabla J(k+1)$ is also orthogonal to $\hat{\mathbf{h}}_\parallel(k)$. The update equation 4.49 of the unconstrained MCLMS algorithm can therefore be represented by the pair of equations

$$\hat{\mathbf{h}}_\perp(k+1) = \hat{\mathbf{h}}_\perp(k) - \mu\nabla J(k+1) \tag{4.54}$$

and

$$\hat{\mathbf{h}}_\parallel(k+1) = \hat{\mathbf{h}}_\parallel(k) \tag{4.55}$$
This pair of equations shows that the unconstrained MCLMS algorithm adapts the model filter coefficients only in the direction perpendicular to $\mathbf{h}$. Because blind SIMO FIR identification can only recover a scaled version of the true channel impulse response, the misalignment of the estimate $\hat{\mathbf{h}}(k)$ with respect to the true channel impulse response vector $\mathbf{h}$ is measured as

$$d(k) = \min_{\alpha}\|\mathbf{h} - \alpha\hat{\mathbf{h}}(k)\|^2 \tag{4.56}$$

where $\alpha$ is an arbitrary scale. Inserting the decomposition of $\hat{\mathbf{h}}(k)$ from equation 4.53 into equation 4.56 gives the minimum value

$$\begin{aligned} d(k) &= \min_{\alpha}\left[\|\hat{\mathbf{h}}(k)\|^2\alpha^2 - 2\|\hat{\mathbf{h}}_\parallel(k)\|\|\mathbf{h}\|\alpha + \|\mathbf{h}\|^2\right] \\ &= \frac{\|\mathbf{h}\|^2}{1 + \left(\|\hat{\mathbf{h}}_\parallel(k)\|/\|\hat{\mathbf{h}}_\perp(k)\|\right)^2} \end{aligned} \tag{4.57}$$
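The ratio of $\|\hat{\mathbf{h}}_\parallel(k)\|$ to $\|\hat{\mathbf{h}}_\perp(k)\|$ therefore indicates how close the estimate is to the true impulse response. Since $\hat{\mathbf{h}}_\parallel$ does not change, the optimal step size $\mu_{opt}(k+1)$ for the unconstrained multichannel LMS algorithm at time $k+1$ is the one that gives $\hat{\mathbf{h}}_\perp(k+1)$ minimum norm:

$$\begin{aligned} \mu_{opt}(k+1) &= \arg\min_{\mu}\|\hat{\mathbf{h}}_\perp(k+1)\| \\ &= \arg\min_{\mu}\|\hat{\mathbf{h}}_\perp(k) - \mu\nabla J(k+1)\| \end{aligned} \tag{4.58}$$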
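Because $\hat{\mathbf{h}}_\parallel(k)$ is fixed and orthogonal to $\nabla J(k+1)$, this is equivalent to minimizing the norm of $\hat{\mathbf{h}}(k+1) = \hat{\mathbf{h}}(k) - \mu(k+1)\nabla J(k+1)$, which requires $\hat{\mathbf{h}}(k+1)$ to be orthogonal to $\nabla J(k+1)$. Projecting $\hat{\mathbf{h}}(k)$ onto $\nabla J(k+1)$ therefore gives the optimum step size

$$\mu_{opt}(k+1) = \frac{\hat{\mathbf{h}}^T(k)\nabla J(k+1)}{\|\nabla J(k+1)\|^2} \tag{4.59}$$

This algorithm is termed the Variable Step-Size Unconstrained Multichannel LMS (VSS-UMCLMS) [4].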
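The VSS-UMCLMS update replaces the fixed $\mu$ by the projection of equation 4.59. The sketch below implements it on noise-free data under assumptions of its own: hypothetical names, an arbitrary non-normalized initial estimate, and the $1/\|\hat{\mathbf{h}}\|^2$ factor of equation 4.48 dropped as in equation 4.49. It also makes the invariance of equation 4.52 visible.

```python
import numpy as np


def tap_matrix(x, L):
    """Row k holds x_n^T(k) = [x(k), ..., x(k-L+1)] of eq. (4.11)."""
    return np.array([x[k - L + 1:k + 1][::-1] for k in range(L - 1, len(x))])


def misalignment(h_true, h_est):
    """Scale-invariant misalignment 1 - cos^2(angle)."""
    c = h_true @ h_est / (np.linalg.norm(h_true) * np.linalg.norm(h_est))
    return 1.0 - c ** 2


def vss_umclms(xs, L, h0, n_iter):
    """Unconstrained MCLMS with the optimal step size of eq. (4.59)."""
    N = len(xs)
    X_all = np.stack([tap_matrix(x, L) for x in xs], axis=1)   # shape (K, N, L)
    H = np.array(h0, dtype=float).reshape(N, L)
    for k in range(min(n_iter, X_all.shape[0])):
        Xk = X_all[k]
        Y = Xk @ H.T
        E = Y - Y.T                            # e_ij(k+1)
        grad = 2.0 * (E.T @ Xk)                # 2 R~_x+(k+1) h(k), norm factor of (4.48) dropped
        gg = np.sum(grad ** 2)
        if gg < 1e-30:                         # cross relations already satisfied: nothing to update
            continue
        mu_opt = np.sum(H * grad) / gg         # eq. (4.59): projection of h(k) onto the gradient
        H = H - mu_opt * grad                  # unconstrained update with mu = mu_opt
    return H.ravel()


rng = np.random.default_rng(1)
N, L = 3, 4
h_true = rng.standard_normal((N, L))
s = rng.standard_normal(5000 + L)
xs = [np.convolve(s, h) for h in h_true]       # noise-free outputs
h0 = np.zeros((N, L))
h0[:, 0] = 1.0                                 # hypothetical initial estimate
h_est = vss_umclms(xs, L, h0, n_iter=5000)
print(h_true.ravel() @ h0.ravel(), h_true.ravel() @ h_est)   # equal, as in eq. (4.52)
print(misalignment(h_true.ravel(), h0.ravel()), misalignment(h_true.ravel(), h_est))
```

In the noise-free case the two inner products agree, as equation 4.52 predicts, while the misalignment decreases; $\|\hat{\mathbf{h}}(k)\|$ never grows, because each step minimizes it along the current gradient.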
4.7 Frequency Domain Normalized Multichannel LMS
The frequency-domain normalized multichannel LMS algorithm (FNMCLMS) is presented in two stages. First, the unnormalized frequency-domain multichannel LMS algorithm is derived; second, the algorithm is modified for normalization. Normalization is desirable because the unnormalized frequency-domain multichannel LMS converges slowly, for two reasons [4]:
• the cross coupling between the channels
• the overall convergence rate is determined by the slowest converging channel
Newton's method is applied in the normalization procedure, and approximations are used to simplify the resulting update. This reduces the eigenvalue spread and improves convergence.
The following notation is used to derive the frequency-domain normalized multichannel LMS:
Time-domain vector: $\mathbf{x}$
Time-domain matrix: $\mathbf{X}$
Frequency-domain vector: $\underline{\mathbf{x}}$
Frequency-domain matrix: $\underline{\mathbf{X}}$
We begin by defining the signal $y_{ij}(k+1)$ as the result of convolving $x_i(k+1)$ with the $j$-th model filter $\hat{\mathbf{h}}_j(k)$:

$$y_{ij}(k+1) \triangleq x_i(k+1) * \hat{\mathbf{h}}_j(k) \tag{4.60}$$
Using the overlap and save method, let vector yij(t + 1) of length 2L and denote the
result of the circular convolution between xi(t + 1) and hj(t) yields
yij(t + 1) = Cxi(t + 1)h
10
j (t) (4.61)
where
yij(t + 1) = [yij(tL) yij(tL + 1) . . . yij(tL + 2L − 1)]T
Cxi(t + 1) =
xi(tL) xi(tL + 2L − 1) . . . xi(tL + 1)
xi(tL + 1) xi(tL) . . . xi(tL + 2)...
.... . .
...
xi(tL + 2L − 1) xi(tL + 2L − 2) . . . xi(tL)
h10
j (t) =[
hT
j (t) 0TL×1
]T
=[
hj,0(t) · · · hj,L−1(t) 0 · · · 0]T
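The matrix $\mathbf{C}_{x_i}(t+1)$ is circulant. The last $L$ points of the length-$2L$ circular-convolution vector $\mathbf{y}^{c}_{ij}(t+1)$ of (4.61) correspond to the results of a linear convolution:

$$\mathbf{y}_{ij}(t+1) = \mathbf{W}^{01}_{L\times 2L}\,\mathbf{y}^{c}_{ij}(t+1) = \mathbf{W}^{01}_{L\times 2L}\,\mathbf{C}_{x_i}(t+1)\,\mathbf{h}^{10}_j(t) = \mathbf{W}^{01}_{L\times 2L}\,\mathbf{C}_{x_i}(t+1)\,\mathbf{W}^{10}_{2L\times L}\,\mathbf{h}_j(t) \qquad (4.62)$$

where

$$\mathbf{y}_{ij}(t+1) = \big[\,y_{ij}(tL+L)\;\; y_{ij}(tL+L+1)\;\; \cdots\;\; y_{ij}(tL+2L-1)\,\big]^T,$$

$$\mathbf{W}^{01}_{L\times 2L} = \big[\,\mathbf{0}_{L\times L}\;\; \mathbf{I}_{L\times L}\,\big], \qquad \mathbf{W}^{10}_{2L\times L} = \big[\,\mathbf{I}_{L\times L}\;\; \mathbf{0}_{L\times L}\,\big]^T,$$

$$\mathbf{h}_j(t) = \big[\,h_{j,0}(t)\;\; h_{j,1}(t)\;\; \cdots\;\; h_{j,L-1}(t)\,\big]^T.$$

In other words, the input data blocks overlap by $L$ points and the first $L$ circular-convolution results are discarded.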
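The overlap-save identity (4.62) can be verified directly with the explicit matrices. The NumPy sketch below is illustrative only. The helper `circulant`, the block length L = 4, the block index t = 3, and the random signals are all assumptions. It builds $\mathbf{C}_{x_i}(t+1)$ column by column. It then checks that $\mathbf{W}^{01}\mathbf{C}_{x_i}\mathbf{W}^{10}\mathbf{h}_j$ equals the linear convolution at samples tL+L, …, tL+2L−1.

```python
import numpy as np


def circulant(c):
    """Circulant matrix with first column c, i.e. C[r, k] = c[(r - k) mod n]."""
    return np.column_stack([np.roll(c, k) for k in range(len(c))])


L, t = 4, 3
rng = np.random.default_rng(0)
x = rng.standard_normal(10 * L)          # one microphone signal x_i
h = rng.standard_normal(L)               # model filter h_j(t)

block = x[t * L:t * L + 2 * L]           # overlapped block x_i(t+1), eq. (4.67)
W01 = np.hstack([np.zeros((L, L)), np.eye(L)])   # W^{01}_{L x 2L}: keep the last L points
W10 = np.vstack([np.eye(L), np.zeros((L, L))])   # W^{10}_{2L x L}: zero-pad h_j to 2L

y_block = W01 @ circulant(block) @ W10 @ h       # eq. (4.62)
y_linear = np.convolve(x, h)[t * L + L:t * L + 2 * L]
print(np.allclose(y_block, y_linear))            # True
```

The first L circular outputs are discarded because they wrap around the block boundary. That is exactly what the selection by $\mathbf{W}^{01}$ does.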
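An a priori error signal based on the cross-relations is

$$\mathbf{e}_{ij}(t+1) = \mathbf{y}_{ij}(t+1) - \mathbf{y}_{ji}(t+1) = \mathbf{W}^{01}_{L\times 2L}\Big[\mathbf{C}_{x_i}(t+1)\,\mathbf{W}^{10}_{2L\times L}\,\mathbf{h}_j(t) - \mathbf{C}_{x_j}(t+1)\,\mathbf{W}^{10}_{2L\times L}\,\mathbf{h}_i(t)\Big] \qquad (4.63)$$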
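The DFT is used to perform the circular convolution, with the $L\times L$ DFT matrix defined as

$$\mathbf{F}_{L\times L} = \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & e^{-j2\pi/L} & e^{-j4\pi/L} & \cdots & e^{-j2\pi(L-1)/L} \\ 1 & e^{-j4\pi/L} & e^{-j8\pi/L} & \cdots & e^{-j4\pi(L-1)/L} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & e^{-j2\pi(L-1)/L} & e^{-j4\pi(L-1)/L} & \cdots & e^{-j2\pi(L-1)^2/L} \end{bmatrix} \qquad (4.64)$$

where $j = \sqrt{-1}$ (not to be confused with the channel index $j$). The matrix $\mathbf{F}_{L\times L}$ and its inverse are related by

$$\mathbf{F}^H_{L\times L} = L\,\mathbf{F}^{-1}_{L\times L} \qquad (4.65)$$

where $(\cdot)^H$ denotes the Hermitian transpose. Using $\mathbf{F}_{2L\times 2L}$ and $\mathbf{F}^{-1}_{2L\times 2L}$, the circulant matrix $\mathbf{C}_{x_i}(t+1)$ can be decomposed as

$$\mathbf{C}_{x_i}(t+1) = \mathbf{F}^{-1}_{2L\times 2L}\,\underline{\mathbf{D}}_{x_i}(t+1)\,\mathbf{F}_{2L\times 2L} \qquad (4.66)$$

where $\underline{\mathbf{D}}_{x_i}(t+1)$ is a diagonal matrix whose diagonal elements are the DFT of the first column of $\mathbf{C}_{x_i}(t+1)$. That first column is the overlapped $i$th channel output of the $(t+1)$th block:

$$\mathbf{x}_i(t+1)_{2L\times 1} = \big[\,x_i(tL)\;\; x_i(tL+1)\;\; \cdots\;\; x_i(tL+2L-1)\,\big]^T \qquad (4.67)$$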
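Relations (4.64) to (4.66) are easy to confirm numerically. The NumPy sketch below assumes L = 4 and uses random data. The helper names `dft` and `circulant` are hypothetical. It builds the unnormalized DFT matrix and checks three things: that it matches `np.fft.fft`, that $\mathbf{F}^H = L\mathbf{F}^{-1}$, and that the $2L\times 2L$ DFT diagonalizes a circulant matrix.

```python
import numpy as np


def dft(n):
    """Unnormalized n x n DFT matrix of eq. (4.64)."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n)


def circulant(c):
    """Circulant matrix with first column c."""
    return np.column_stack([np.roll(c, k) for k in range(len(c))])


L = 4
F = dft(L)
rng = np.random.default_rng(0)
v = rng.standard_normal(L)
print(np.allclose(F @ v, np.fft.fft(v)))                  # True: F computes the DFT
print(np.allclose(F.conj().T, L * np.linalg.inv(F)))      # True: eq. (4.65)

x_blk = rng.standard_normal(2 * L)                        # overlapped block, eq. (4.67)
F2 = dft(2 * L)
D = np.diag(F2 @ x_blk)                                   # DFT of the first column of C
print(np.allclose(circulant(x_blk), np.linalg.inv(F2) @ D @ F2))  # True: eq. (4.66)
```

In practice the matrices are never formed. Multiplying by $\mathbf{F}$ is an FFT, and multiplying by $\underline{\mathbf{D}}_{x_i}$ is an element-wise product.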
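Multiplying (4.63) by $\mathbf{F}_{L\times L}$ and using (4.66) gives the block error sequence in the frequency domain:

$$\begin{aligned} \underline{\mathbf{e}}_{ij}(t+1) &= \mathbf{F}_{L\times L}\,\mathbf{e}_{ij}(t+1) \\ &= \mathbf{F}_{L\times L}\,\mathbf{W}^{01}_{L\times 2L}\Big[\mathbf{C}_{x_i}(t+1)\,\mathbf{W}^{10}_{2L\times L}\,\mathbf{h}_j(t) - \mathbf{C}_{x_j}(t+1)\,\mathbf{W}^{10}_{2L\times L}\,\mathbf{h}_i(t)\Big] \\ &= \underline{\mathbf{W}}^{01}_{L\times 2L}\Big[\underline{\mathbf{D}}_{x_i}(t+1)\,\underline{\mathbf{W}}^{10}_{2L\times L}\,\underline{\mathbf{h}}_j(t) - \underline{\mathbf{D}}_{x_j}(t+1)\,\underline{\mathbf{W}}^{10}_{2L\times L}\,\underline{\mathbf{h}}_i(t)\Big] \end{aligned} \qquad (4.68)$$

where

$$\underline{\mathbf{W}}^{01}_{L\times 2L} = \mathbf{F}_{L\times L}\,\mathbf{W}^{01}_{L\times 2L}\,\mathbf{F}^{-1}_{2L\times 2L}, \qquad \underline{\mathbf{W}}^{10}_{2L\times L} = \mathbf{F}_{2L\times 2L}\,\mathbf{W}^{10}_{2L\times L}\,\mathbf{F}^{-1}_{L\times L},$$

and $\underline{\mathbf{h}}_i(t) = \mathbf{F}_{L\times L}\,\mathbf{h}_i(t)$ is the $L$-point DFT of the vector $\mathbf{h}_i(t)$ at the $t$th block.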
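The step from (4.63) to (4.68) can be checked by building both sides explicitly for a small block length. The NumPy sketch below is a sanity check under assumed values (L = 4, random blocks and estimates). Its helper names are hypothetical. Explicit matrices are used purely for illustration; an efficient implementation would use FFTs. It confirms that the frequency-domain error equals the $L$-point DFT of the time-domain error.

```python
import numpy as np


def dft(n):
    """Unnormalized n x n DFT matrix of eq. (4.64)."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n)


def circulant(c):
    """Circulant matrix with first column c."""
    return np.column_stack([np.roll(c, k) for k in range(len(c))])


L = 4
rng = np.random.default_rng(1)
x_i, x_j = rng.standard_normal((2, 2 * L))      # overlapped blocks x_i(t+1), x_j(t+1)
h_i, h_j = rng.standard_normal((2, L))          # current estimates h_i(t), h_j(t)
F_L, F_2L = dft(L), dft(2 * L)
W01 = np.hstack([np.zeros((L, L)), np.eye(L)])  # W^{01}_{L x 2L}
W10 = np.vstack([np.eye(L), np.zeros((L, L))])  # W^{10}_{2L x L}

# time-domain a priori error, eq. (4.63)
e_time = W01 @ (circulant(x_i) @ W10 @ h_j - circulant(x_j) @ W10 @ h_i)

# frequency-domain form, eq. (4.68)
W01_f = F_L @ W01 @ np.linalg.inv(F_2L)
W10_f = F_2L @ W10 @ np.linalg.inv(F_L)
D_i, D_j = np.diag(F_2L @ x_i), np.diag(F_2L @ x_j)
e_freq = W01_f @ (D_i @ W10_f @ (F_L @ h_j) - D_j @ W10_f @ (F_L @ h_i))
print(np.allclose(e_freq, F_L @ e_time))        # True
```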
A frequency-domain mean-square error criterion, analogous to the time-domain one, is now formulated:

$$J_f = E\{J_f(t)\} \qquad (4.69)$$

where the instantaneous squared error of the $t$th block is

$$J_f(t) = \sum_{i=1}^{N-1}\sum_{j=i+1}^{N}\underline{\mathbf{e}}^H_{ij}(t)\,\underline{\mathbf{e}}_{ij}(t) \qquad (4.70)$$
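Take the partial derivative of $J_f$ with respect to $\underline{\mathbf{h}}^*_n(t)$, $n = 1, 2, \ldots, N$, where $(\cdot)^*$ denotes complex conjugation. Treating $\underline{\mathbf{h}}_n(t)$ as a constant during this differentiation yields

$$\frac{\partial J_f}{\partial\underline{\mathbf{h}}^*_n(t)} = E\left\{\frac{\partial J_f(t+1)}{\partial\underline{\mathbf{h}}^*_n(t)}\right\} \qquad (4.71)$$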
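As proposed in the LMS algorithm, a single sample is used as an estimate of the expectation. The instantaneous gradient is therefore

$$\begin{aligned} \frac{\partial J_f(t+1)}{\partial\underline{\mathbf{h}}^*_n(t)} &= \frac{\partial}{\partial\underline{\mathbf{h}}^*_n(t)}\left[\sum_{i=1}^{N-1}\sum_{j=i+1}^{N}\underline{\mathbf{e}}^H_{ij}(t+1)\,\underline{\mathbf{e}}_{ij}(t+1)\right] \\ &= \frac{\partial}{\partial\underline{\mathbf{h}}^*_n(t)}\left[\sum_{i=1}^{n-1}\underline{\mathbf{e}}^H_{in}(t+1)\,\underline{\mathbf{e}}_{in}(t+1)\right] + \frac{\partial}{\partial\underline{\mathbf{h}}^*_n(t)}\left[\sum_{j=n+1}^{N}\underline{\mathbf{e}}^H_{nj}(t+1)\,\underline{\mathbf{e}}_{nj}(t+1)\right] \\ &= \sum_{i=1}^{n-1}\Big[\underline{\mathbf{W}}^{01}_{L\times 2L}\,\underline{\mathbf{D}}_{x_i}(t+1)\,\underline{\mathbf{W}}^{10}_{2L\times L}\Big]^H\underline{\mathbf{e}}_{in}(t+1) - \sum_{j=n+1}^{N}\Big[\underline{\mathbf{W}}^{01}_{L\times 2L}\,\underline{\mathbf{D}}_{x_j}(t+1)\,\underline{\mathbf{W}}^{10}_{2L\times L}\Big]^H\underline{\mathbf{e}}_{nj}(t+1) \\ &= \sum_{i=1}^{N}\Big[\underline{\mathbf{W}}^{01}_{L\times 2L}\,\underline{\mathbf{D}}_{x_i}(t+1)\,\underline{\mathbf{W}}^{10}_{2L\times L}\Big]^H\underline{\mathbf{e}}_{in}(t+1) \end{aligned} \qquad (4.72)$$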
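The last step uses $\underline{\mathbf{e}}_{nj}(t+1) = -\underline{\mathbf{e}}_{jn}(t+1)$ and $\underline{\mathbf{e}}_{nn}(t+1) = \mathbf{0}$. Using this gradient with a small positive step size $\mu_f$, the frequency-domain unconstrained multichannel LMS algorithm is obtained:

$$\underline{\mathbf{h}}_n(t+1) = \underline{\mathbf{h}}_n(t) - \mu_f\,\underline{\mathbf{W}}^{10}_{L\times 2L}\sum_{i=1}^{N}\underline{\mathbf{D}}^*_{x_i}(t+1)\,\underline{\mathbf{W}}^{01}_{2L\times L}\,\underline{\mathbf{e}}_{in}(t+1) \qquad (4.73)$$

where

$$\underline{\mathbf{W}}^{10}_{L\times 2L} = \mathbf{F}_{L\times L}\,\mathbf{W}^{10}_{L\times 2L}\,\mathbf{F}^{-1}_{2L\times 2L} = \tfrac{1}{2}\big(\underline{\mathbf{W}}^{10}_{2L\times L}\big)^H,$$

$$\underline{\mathbf{W}}^{01}_{2L\times L} = \mathbf{F}_{2L\times 2L}\,\mathbf{W}^{01}_{2L\times L}\,\mathbf{F}^{-1}_{L\times L} = 2\big(\underline{\mathbf{W}}^{01}_{L\times 2L}\big)^H,$$

$$\mathbf{W}^{10}_{L\times 2L} = \big[\,\mathbf{I}_{L\times L}\;\; \mathbf{0}_{L\times L}\,\big], \qquad \mathbf{W}^{01}_{2L\times L} = \big[\,\mathbf{0}_{L\times L}\;\; \mathbf{I}_{L\times L}\,\big]^T.$$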
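The factors 1/2 and 2 in these identities follow from (4.65) applied to both DFT sizes. The NumPy sketch below checks them for an assumed L = 4. The helper name `dft` is hypothetical.

```python
import numpy as np


def dft(n):
    """Unnormalized n x n DFT matrix of eq. (4.64)."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n)


L = 4
F_L, F_2L = dft(L), dft(2 * L)
eye, zero = np.eye(L), np.zeros((L, L))
inv = np.linalg.inv

W10_f_L2L = F_L @ np.hstack([eye, zero]) @ inv(F_2L)   # F W^{10}_{L x 2L} F^{-1}
W10_f_2LL = F_2L @ np.vstack([eye, zero]) @ inv(F_L)   # F W^{10}_{2L x L} F^{-1}
W01_f_2LL = F_2L @ np.vstack([zero, eye]) @ inv(F_L)   # F W^{01}_{2L x L} F^{-1}
W01_f_L2L = F_L @ np.hstack([zero, eye]) @ inv(F_2L)   # F W^{01}_{L x 2L} F^{-1}

print(np.allclose(W10_f_L2L, 0.5 * W10_f_2LL.conj().T))   # True
print(np.allclose(W01_f_2LL, 2.0 * W01_f_L2L.conj().T))   # True
```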
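Selecting a unit-norm constraint in the time domain, $\|\mathbf{h}(t)\| = 1$, yields in the frequency domain (since (4.65) implies $\|\mathbf{F}_{L\times L}\mathbf{h}\|^2 = L\|\mathbf{h}\|^2$)

$$\|\underline{\mathbf{h}}(t)\|^2 = L\,\|\mathbf{h}(t)\|^2 = L \qquad (4.74)$$

where

$$\underline{\mathbf{h}}(t) \triangleq \big[\,\underline{\mathbf{h}}^T_1(t)\;\; \underline{\mathbf{h}}^T_2(t)\;\; \cdots\;\; \underline{\mathbf{h}}^T_N(t)\,\big]^T \qquad (4.75)$$

To deduce the frequency-domain constrained multichannel LMS (FCMCLMS) algorithm, the unit-norm constraint is enforced on (4.73):

$$\underline{\mathbf{h}}_n(t+1) = \sqrt{L}\,\frac{\underline{\mathbf{u}}_n(t+1)}{\|\underline{\mathbf{u}}(t+1)\|}, \qquad \underline{\mathbf{u}}_n(t+1) = \underline{\mathbf{h}}_n(t) - \mu_f\,\underline{\mathbf{W}}^{10}_{L\times 2L}\sum_{i=1}^{N}\underline{\mathbf{D}}^*_{x_i}(t+1)\,\underline{\mathbf{W}}^{01}_{2L\times L}\,\underline{\mathbf{e}}_{in}(t+1), \qquad n = 1, 2, \ldots, N \qquad (4.76)$$

where $\underline{\mathbf{u}}(t+1)$ stacks $\underline{\mathbf{u}}_1(t+1), \ldots, \underline{\mathbf{u}}_N(t+1)$ in the same way as (4.75).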
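The constraint (4.74) and the rescaling in (4.76) can be illustrated with a short NumPy sketch. It assumes N = 3 channels of L = 8 taps and a random unconstrained update. The helper name `renormalize` is hypothetical. The sketch shows that scaling the stacked frequency-domain vector to squared norm L returns a time-domain estimate of unit norm.

```python
import numpy as np


def renormalize(U):
    """Scale stacked L-point DFTs (one row per channel) so that ||U||^2 = L."""
    L = U.shape[1]
    return np.sqrt(L) * U / np.linalg.norm(U)


rng = np.random.default_rng(2)
N, L = 3, 8
u = rng.standard_normal((N, L))          # unconstrained time-domain update, one row per channel
U = np.fft.fft(u, axis=1)                # frequency-domain update u_n(t+1)
print(np.isclose(np.linalg.norm(U) ** 2, L * np.linalg.norm(u) ** 2))   # True: Parseval
H = renormalize(U)                       # eq. (4.76) applied to the stacked vector
h = np.fft.ifft(H, axis=1).real
print(np.isclose(np.linalg.norm(H) ** 2, L), np.isclose(np.linalg.norm(h), 1.0))  # True True
```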
Normalization
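As previously stated, Newton's method is applied. Define

$$\underline{\mathbf{S}}_{x_n}(t+1) \triangleq \underline{\mathbf{W}}^{01}_{L\times 2L}\,\underline{\mathbf{D}}_{x_n}(t+1)\,\underline{\mathbf{W}}^{10}_{2L\times L}, \qquad n = 1, 2, \ldots, N \qquad (4.77)$$

The frequency-domain block error in (4.68) can then be written as

$$\underline{\mathbf{e}}_{ij}(t+1) = \underline{\mathbf{S}}_{x_i}(t+1)\,\underline{\mathbf{h}}_j(t) - \underline{\mathbf{S}}_{x_j}(t+1)\,\underline{\mathbf{h}}_i(t) \qquad (4.78)$$

and the instantaneous gradient in (4.72) as

$$\frac{\partial J_f(t+1)}{\partial\underline{\mathbf{h}}^*_n(t)} = \sum_{i=1}^{N}\underline{\mathbf{S}}^H_{x_i}(t+1)\,\underline{\mathbf{e}}_{in}(t+1) = \underline{\mathbf{S}}^H(t+1)\,\underline{\mathbf{e}}_n(t+1) \qquad (4.79)$$

where

$$\underline{\mathbf{S}}(t+1) = \big[\,\underline{\mathbf{S}}^H_{x_1}(t+1)\;\; \underline{\mathbf{S}}^H_{x_2}(t+1)\;\; \cdots\;\; \underline{\mathbf{S}}^H_{x_N}(t+1)\,\big]^H,$$

$$\underline{\mathbf{e}}_n(t+1) = \big[\,\underline{\mathbf{e}}^T_{1n}(t+1)\;\; \underline{\mathbf{e}}^T_{2n}(t+1)\;\; \cdots\;\; \underline{\mathbf{e}}^T_{Nn}(t+1)\,\big]^T.$$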
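The Hessian matrix of $J_f(t+1)$ with respect to the filter coefficients is computed by taking the row gradient of (4.79):

$$\begin{aligned} \underline{\mathbf{T}}_n(t+1) &= \frac{\partial}{\partial\underline{\mathbf{h}}^T_n(t)}\left[\frac{\partial J_f(t+1)}{\partial\underline{\mathbf{h}}^*_n(t)}\right] = \frac{\partial}{\partial\underline{\mathbf{h}}^T_n(t)}\Big[\underline{\mathbf{S}}^H(t+1)\,\underline{\mathbf{e}}_n(t+1)\Big] \\ &= \underline{\mathbf{S}}^H(t+1)\,\frac{\partial\underline{\mathbf{e}}_n(t+1)}{\partial\underline{\mathbf{h}}^T_n(t)} = \sum_{i=1,\, i\neq n}^{N}\underline{\mathbf{S}}^H_{x_i}(t+1)\,\underline{\mathbf{S}}_{x_i}(t+1) \end{aligned} \qquad (4.80)$$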
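and the filter coefficients are updated as

$$\underline{\mathbf{h}}_n(t+1) = \underline{\mathbf{h}}_n(t) - \rho_f\,\underline{\mathbf{T}}^{-1}_n(t+1)\,\underline{\mathbf{S}}^H(t+1)\,\underline{\mathbf{e}}_n(t+1) \qquad (4.81)$$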
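Here $\rho_f$ is the step size of this algorithm. Further simplifications, shown in [4], yield the frequency-domain normalized multichannel LMS update equation

$$\underline{\mathbf{h}}^{10}_n(t+1) = \underline{\mathbf{h}}^{10}_n(t) - \rho_f\,\underline{\mathbf{P}}^{-1}_{\not n}(t+1)\sum_{i=1}^{N}\underline{\mathbf{D}}^*_{x_i}(t+1)\,\underline{\mathbf{e}}^{01}_{in}(t+1), \qquad n = 1, 2, \ldots, N \qquad (4.82)$$

where the symbols used are

$$\underline{\mathbf{h}}^{10}_n(t) = \underline{\mathbf{W}}^{10}_{2L\times L}\,\underline{\mathbf{h}}_n(t) = \mathbf{F}_{2L\times 2L}\big[\,\mathbf{h}^T_n(t)\;\; \mathbf{0}_{1\times L}\,\big]^T,$$

$$\underline{\mathbf{e}}^{01}_{in}(t+1) = \underline{\mathbf{W}}^{01}_{2L\times L}\,\underline{\mathbf{e}}_{in}(t+1) = \mathbf{F}_{2L\times 2L}\big[\,\mathbf{0}_{1\times L}\;\; \mathbf{e}^T_{in}(t+1)\,\big]^T,$$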
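and

$$\underline{\mathbf{P}}_{\not n}(t+1) = \lambda_p\,\underline{\mathbf{P}}_{\not n}(t) + (1-\lambda_p)\sum_{i=1,\, i\neq n}^{N}\underline{\mathbf{D}}_{x_i}(t+1)\,\underline{\mathbf{D}}^*_{x_i}(t+1) \qquad (4.83)$$

$\underline{\mathbf{P}}_{\not n}$ is the power spectrum of the outputs of all channels except the $n$th. It is obtained using the recursion (4.83), with the forgetting factor $\lambda_p$ set to

$$\lambda_p = \left[1 - \frac{1}{3L}\right]^L$$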
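Since $\lambda_p^{1/L} = 1 - 1/(3L)$, the block-wise forgetting factor is equivalent to a per-sample forgetting factor with an effective memory of about $3L$ samples. As $L$ grows, $\lambda_p$ approaches $e^{-1/3}$. The short sketch below illustrates this and applies one bin of the recursion (4.83) to a constant input power. The function names are hypothetical.

```python
import math


def forgetting_factor(L):
    """lambda_p = (1 - 1/(3L))^L used in eq. (4.83)."""
    return (1.0 - 1.0 / (3 * L)) ** L


def smooth_power(p_prev, inst_power, lam):
    """One step of the recursion (4.83) for a single frequency bin."""
    return lam * p_prev + (1.0 - lam) * inst_power


for L in (16, 256, 1024):
    lam = forgetting_factor(L)
    per_sample = lam ** (1.0 / L)                          # equals 1 - 1/(3L)
    print(L, round(lam, 4), round(1.0 / (1.0 - per_sample)))   # memory of about 3L samples
```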
When the input signal is very small, the inverse of the power estimate can diverge. A small regularization parameter $\delta$ can be added to the normalized algorithm to prevent this. The frequency-domain normalized multichannel LMS algorithm is then implemented as

$$\underline{\mathbf{h}}^{10}_n(t+1) = \underline{\mathbf{h}}^{10}_n(t) - \rho_f\Big[\underline{\mathbf{P}}_{\not n}(t+1) + \delta\,\mathbf{I}_{2L\times 2L}\Big]^{-1}\sum_{i=1}^{N}\underline{\mathbf{D}}^*_{x_i}(t+1)\,\underline{\mathbf{e}}^{01}_{in}(t+1), \qquad n = 1, 2, \ldots, N \qquad (4.84)$$

The Frequency Domain Normalized Multichannel LMS algorithm is illustrated in figure 4.2.
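The compact NumPy sketch below shows one FNMCLMS block update in the spirit of (4.82) to (4.84). All diagonal matrices are stored as vectors, so every product is element-wise. Several details are assumptions of this sketch, not taken from [4]: the function names, the default values of ρ_f and δ, and the final step that keeps only the first L time-domain taps after each update. It also checks two properties: the FFT-based errors match a direct time-domain computation, and the true channels are a stationary point of the update.

```python
import numpy as np


def cross_relation_errors(h_hat, x_blocks):
    """Block errors e_ij(t+1) = y_ij - y_ji via overlap-save, eqs. (4.62)-(4.63).

    h_hat    : (N, L) current estimates h_j(t)
    x_blocks : (N, 2L) overlapped blocks x_i(t+1) of eq. (4.67)
    returns  : (N, N, L) array with e[i, j] = y_ij - y_ji (last L samples)
    """
    N, L = h_hat.shape
    X = np.fft.fft(x_blocks, axis=1)                                 # diagonals of D_{x_i}
    H10 = np.fft.fft(np.hstack([h_hat, np.zeros((N, L))]), axis=1)  # h_j zero-padded to 2L
    e = np.zeros((N, N, L))
    for i in range(N):
        for j in range(N):
            y_ij = np.fft.ifft(X[i] * H10[j]).real[L:]   # x_i circularly convolved with h_j
            y_ji = np.fft.ifft(X[j] * H10[i]).real[L:]   # x_j circularly convolved with h_i
            e[i, j] = y_ij - y_ji
    return e


def fnmclms_step(h_hat, x_blocks, P, rho_f=0.5, delta=1e-3):
    """One regularized FNMCLMS update; P holds the (N, 2L) diagonals of P_{not n}(t)."""
    N, L = h_hat.shape
    lam_p = (1.0 - 1.0 / (3 * L)) ** L
    X = np.fft.fft(x_blocks, axis=1)
    power = np.abs(X) ** 2                          # |D_{x_i}|^2 on the diagonal
    e = cross_relation_errors(h_hat, x_blocks)
    h_new, P_new = np.empty_like(h_hat), np.empty_like(P)
    for n in range(N):
        # eq. (4.83): smoothed power of every channel except n
        P_new[n] = lam_p * P[n] + (1 - lam_p) * (power.sum(axis=0) - power[n])
        # sum_i D*_{x_i} e^{01}_{in}, with e^{01}_{in} = FFT([0_L, e_in])
        grad = np.zeros(2 * L, dtype=complex)
        for i in range(N):
            grad += np.conj(X[i]) * np.fft.fft(np.concatenate([np.zeros(L), e[i, n]]))
        h10 = np.fft.fft(np.concatenate([h_hat[n], np.zeros(L)]))
        h10 -= rho_f * grad / (P_new[n] + delta)    # eq. (4.84) with diagonal matrices
        # sketch assumption: re-impose the zero padding by keeping the first L taps
        h_new[n] = np.fft.ifft(h10).real[:L]
    return h_new, P_new
```

A typical loop would slide the 2L-sample window forward by L samples per block. It would pass the returned P back in and track the NPM against known channels, as in the tests below.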
4.8 Performance of Selected Blind Methods
The performance of the selected blind identification methods was studied using increasingly long channel impulse responses and varying SNR. As the purpose of this study is to dereverberate room impulse responses, algorithms that work with channel impulse responses longer than 256 taps are required. At the end of the tests, the frequency-domain normalized multichannel LMS algorithm was selected for the complete two-stage dereverberation system. It was chosen for its speed, which results from operating in the frequency domain. The tests were executed in three stages, with different SNR values in each stage:
• 3-tap channel impulse response
• 16-tap random channel impulse response
• room impulse response with ≥256 taps
The simulations were performed in the following sequence: a) white noise is filtered with the impulse responses, b) white noise is added to the filtered signals to set the SNR, c) the algorithm is applied to the filtered signals to recover the impulse responses.
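Steps a) and b) can be sketched in NumPy as follows. The noise scaling, mean(std of the clean outputs) · 10^(−SNR/20), mirrors the MATLAB scripts of Sections 6.6 and 6.7. The function name `simulate_simo` and the use of `np.convolve` truncated to the input length (equivalent to `filter(h, 1, s)`) are choices of this sketch.

```python
import numpy as np


def simulate_simo(channels, n_samples, snr_db, rng):
    """Filter one white source through every channel and add white noise at snr_db."""
    s = rng.standard_normal(n_samples)                                     # a) white source
    clean = np.array([np.convolve(s, h)[:n_samples] for h in channels])   # like filter(h, 1, s)
    noise_std = clean.std(axis=1).mean() * 10 ** (-snr_db / 20)
    noisy = clean + noise_std * rng.standard_normal(clean.shape)           # b) additive noise
    return s, clean, noisy


theta, v = np.pi / 10, np.pi
h1 = np.array([1.0, -2 * np.cos(theta), 1.0])
h2 = np.array([1.0, -2 * np.cos(theta + v), 1.0])
s, clean, noisy = simulate_simo([h1, h2], 100_000, 40.0, np.random.default_rng(0))
```

Step c) then passes `noisy` to the blind identification algorithm under test.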
Minimal length channel impulse response estimation
Initial tests for convergence were carried out using a simple two-channel 3-tap system given by

$$\mathbf{h}_1 = \big[\,1\;\; -2\cos(\theta)\;\; 1\,\big]^T, \qquad \mathbf{h}_2 = \big[\,1\;\; -2\cos(\theta+v)\;\; 1\,\big]^T$$
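Both test channels have their zeros on the unit circle, at angles ±θ and ±(θ + v). A common-zero (or near-common-zero) pair cannot be identified blindly, so one simple proxy for the conditioning controlled by v is the smallest distance between a zero of H1(z) and a zero of H2(z). The sketch below compares the two settings used in Sections 6.6 and 6.7. The function names are hypothetical.

```python
import numpy as np


def channel_pair(theta, v):
    """The 3-tap test channels h1 and h2."""
    h1 = np.array([1.0, -2.0 * np.cos(theta), 1.0])
    h2 = np.array([1.0, -2.0 * np.cos(theta + v), 1.0])
    return h1, h2


def min_zero_distance(h1, h2):
    """Smallest distance between a zero of H1(z) and a zero of H2(z)."""
    z1, z2 = np.roots(h1), np.roots(h2)
    return min(abs(a - b) for a in z1 for b in z2)


theta = np.pi / 10
well = min_zero_distance(*channel_pair(theta, np.pi))       # v = pi (well conditioned)
bad = min_zero_distance(*channel_pair(theta, np.pi / 10))   # v = pi/10 (badly conditioned)
print(well, bad)   # 2*cos(pi/10) ~ 1.902 versus 2*sin(pi/20) ~ 0.313
```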
Figure 4.2: Frequency Domain Normalized Multichannel LMS (block diagram: initialize ĥi = [1 0 0 … 0]^T, i = 1, 2, …, N; FFT of the microphone blocks x1, x2, x3 and of the estimates ĥ1, ĥ2, ĥ3; cross-relation errors eij(t+1) = xi(t+1) * ĥj(t) − xj(t+1) * ĥi(t), i, j = 1, 2, …, N, keeping the last L elements of eij; conjugate of X and average power of x, regularized and inverted; Schur product of the summed error between each channel and the others with the regularized inverse of the average power; IFFT to obtain the updated ĥ1, ĥ2, ĥ3)
Figure 4.3: Algorithm performance in a well-conditioned 2-channel, 3-tap system, 40 dB SNR (top: normalized projection misalignment in dB vs. sample; bottom: error magnitude vs. sample; curves: VSS-UMCLMS, MCLMS µ = 0.005, MCLMS µ = 0.01)
The relative conditioning of the channels can be controlled by changing the value of v. All the blind identification algorithms performed satisfactorily with this two-channel system when the channels were well conditioned. Figure 4.3 shows two responses. One is the unconstrained multichannel LMS algorithm with optimal step-size control. The other is the constrained time-domain multichannel LMS with µ = 0.005 and µ = 0.01. The algorithms achieve an NPM between −30 dB and −55 dB after 300 samples. Figure 4.4 compares the scaled impulse-response estimates obtained with the constrained time-domain multichannel LMS against the original channel impulse responses. The constrained time-domain multichannel LMS algorithm did not perform well with a badly conditioned channel pair. Figure 4.5 shows the performance of the previous algorithms and of the multichannel Newton algorithm with ρ = 0.5. It also shows the error as the algorithms progress up to 8000 samples. The SNR is 40 dB in this configuration. The multichannel LMS algorithm diverges for both values of µ. The unconstrained multichannel LMS with optimal step-size control and the multichannel Newton algorithm converge, with NPMs of −30 dB and −40 dB, respectively. The estimated impulse responses are shown in figure 4.6.
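The NPM used in these figures is scale-invariant, which suits blind identification, where channels are recovered only up to a gain. The sketch below implements the NPM as computed in the MATLAB listing of Section 6.1. It then applies the NPM to the batch least-squares cross-relation estimate of [13]: the unit-norm vector minimizing $\|\mathbf{x}_2 * \mathbf{h}_1 - \mathbf{x}_1 * \mathbf{h}_2\|$, which is the same cost the adaptive multichannel LMS algorithms minimize recursively. Function names, the noise-free data, and the 2000-sample length are assumptions of this sketch.

```python
import numpy as np


def npm(h_true, h_est):
    """Normalized projection misalignment as a linear ratio (20*log10 gives dB)."""
    h_true = np.ravel(h_true).astype(float)
    h_est = np.ravel(h_est).astype(float)
    residual = h_true - (h_true @ h_est) / (h_est @ h_est) * h_est
    return np.linalg.norm(residual) / np.linalg.norm(h_true)


def regressors(x, L):
    """Rows [x(k), x(k-1), ..., x(k-L+1)] for k = L-1, ..., len(x)-1."""
    return np.array([x[k - L + 1:k + 1][::-1] for k in range(L - 1, len(x))])


def cross_relation_ls(x1, x2, L):
    """Batch cross-relation estimate: unit-norm minimizer of ||x2*h1 - x1*h2||."""
    A = np.hstack([regressors(x2, L), -regressors(x1, L)])
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    v = Vt[-1]                      # right singular vector of the smallest singular value
    return v[:L], v[L:]


theta, v_shift = np.pi / 10, np.pi
h1 = np.array([1.0, -2 * np.cos(theta), 1.0])
h2 = np.array([1.0, -2 * np.cos(theta + v_shift), 1.0])
rng = np.random.default_rng(0)
s = rng.standard_normal(2000)
x1, x2 = np.convolve(s, h1)[:2000], np.convolve(s, h2)[:2000]
h1_est, h2_est = cross_relation_ls(x1, x2, 3)
print(20 * np.log10(npm(np.r_[h1, h2], np.r_[h1_est, h2_est])))   # NPM in dB
```

With noise-free data the batch estimate is exact to rounding error. The adaptive algorithms in the figures trade that accuracy for sample-by-sample operation.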
Figure 4.4: Scaled impulse response estimates of a well-conditioned system obtained using the SIMO LMS (panels: h1 to be identified, h2 to be identified, h1 estimate, h2 estimate; taps 1 to 3)
Figure 4.5: Algorithm performance in a badly conditioned 2-channel, 3-tap system, SNR = 40 dB (top: normalized projection misalignment in dB vs. sample, 0 to 8000; bottom: error magnitude vs. sample; curves: VSS-UMCLMS, MCLMS µ = 0.025, MCLMS µ = 0.01, MCN ρ = 0.5)
Figure 4.6: Scaled impulse response estimates of a badly conditioned system obtained using the Multichannel Newton Algorithm (panels: h1 to be identified, h2 to be identified, h1 estimate, h2 estimate; taps 1 to 3)
Algorithm performance with random 16 tap channel impulse responses
The following algorithms were tested using random 16-tap channel responses for a 3-channel system: MCN, VSS-UMCLMS, MCLMS, and FNMCLMS. Figure 4.7 shows the performance of the four algorithms. Three of the algorithms perform satisfactorily, with NPM between −38 dB and −48 dB at a signal-to-noise ratio of 40 dB. The multichannel Newton algorithm diverged in this simulation. Figure 4.8 shows the scaled estimated impulse responses obtained using FNMCLMS.
Figure 4.7: Performance of algorithms on a 16-tap 3-channel system (top: normalized projection misalignment in dB vs. sample, up to 3 × 10^4 samples; curves: VSS-UMCLMS, MCLMS µ = 0.0075, FNMCLMS ρf = 1.2, MCN ρ = 0.5; bottom: error magnitude vs. sample)
Figure 4.8: Scaled filter estimates using the blind identification methods for a 16-tap 3-channel system (panels: h1, h2, h3 to be identified; h1, h2, h3 estimates; taps 1 to 16)
Algorithm performance with higher order impulse responses
Time-domain algorithms were very computationally intensive with longer channel impulse responses. The time-domain multichannel Newton and frequency-domain normalized multichannel LMS algorithms were observed to estimate the channel impulse responses, and the focus was placed on the frequency-domain normalized multichannel LMS. Simulated room impulse responses were used to observe the performance of the algorithms. They were obtained with the room impulse response generator based on the image method [15], using the implementation in [16]. The algorithm attained an NPM of −11 dB at an SNR of 10 dB for a 3-channel system with 256 taps. The input parameters used to generate the room impulse responses are shown below:
sv = 340; % Sound velocity (m/s)
fs = 8000; % Sample frequency (samples/s)
r = [2 1.5 2; 1 1.5 2; 1.5 1.5 1.5]; % Receiver positions [x_1 y_1 z_1; x_2 y_2 z_2; x_3 y_3 z_3] (m)
s = [2 3.5 2]; % Source position [x y z] (m)
L = [5 4 6]; % Room dimensions [x y z] (m)
c = 0.3; % Reverberation time (s)
n = 296; % Number of samples
mtype = 'omnidirectional'; % Type of microphone
order = -1; % -1 equals maximum reflection order
dim = 3; % Room dimension
orientation = 0; % Microphone orientation (rad)
hp_filter = 1; % Enable high-pass filter
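The geometry in these parameters fixes the direct-path arrival time at each microphone, before any reflection. The sketch below computes those delays in samples from the listed positions, sv, and fs, using only the distance over sound speed. It does not reproduce the image-method generator of [16]. It also shows that n = 296 samples at 8 kHz span 37 ms, far shorter than the 0.3 s reverberation time, so the simulated responses are strongly truncated. The function name `direct_path_delay` is hypothetical.

```python
import math

sv = 340.0      # sound velocity (m/s)
fs = 8000.0     # sampling frequency (samples/s)
receivers = [(2.0, 1.5, 2.0), (1.0, 1.5, 2.0), (1.5, 1.5, 1.5)]
source = (2.0, 3.5, 2.0)
n_samples = 296


def direct_path_delay(receiver, src, c=sv, rate=fs):
    """Direct-sound propagation delay from src to receiver, in samples."""
    return math.dist(receiver, src) / c * rate


delays = [direct_path_delay(r, source) for r in receivers]
print([round(d, 1) for d in delays])   # [47.1, 52.6, 49.9]
print(n_samples / fs)                  # RIR length in seconds
```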
The performance of the FNMCLMS algorithm is shown in figure 4.9, and the corresponding estimated and scaled filter coefficients are shown in figure 4.10.
Figure 4.9: Normalized Projection Misalignment using a Simulated Room Response, 10 dB SNR (NPM in dB, 0 to −12, vs. samples, up to 2000)
Figure 4.10: Impulse Response Estimates of Simulated Room Response (panels: h1, h2, h3 to be identified; h1, h2, h3 estimates; taps 0 to 250)
CHAPTER 5
Conclusion
This work has been a study of algorithms suitable for the dereverberation of room acoustics. The focus was on two-stage dereverberation based on blind identification followed by channel inversion with the MINT method. A representation of this approach is shown in figure 5.1, where Stage 1 illustrates identification. In a real-world scenario blind identification will usually be required, as the source signal x is usually unknown. In Stage 2, channel inversion using the LS or the MINT method is executed, followed by filtering. Stage 1 produces estimates ĥ1, ĥ2 and ĥ3 of the room transfer functions h1, h2 and h3 between the source and each microphone. These room transfer functions, which are usually not minimum phase, are then inverted to give the inverse responses. The microphone signals x1, x2 and x3 are filtered with these inverse responses to recover the original input signal x. Various algorithms were analyzed for blind identification performance, with further study of the frequency-domain normalized multichannel LMS algorithm. Blind identification was possible with truncated simulated impulse responses. The early part of each simulated impulse response had to be truncated for the algorithm to perform satisfactorily. This implies that the algorithms are not suitable for real-time operation with unknown room impulse responses. The identified impulse responses were inverted using the MINT method. The complete system improved the SDR from 0.3 dB to 7.3 dB in a system with an SNR of 10 dB, as shown in figure 5.2.
Figure 5.1: System Diagram (Stage 1: identification, non-blind or blind, of h1, h2, h3 from the microphone signals x1, x2, x3 produced by the source x; Stage 2: channel inversion with MINT or LS, followed by inverse filtering of x1, x2, x3 to recover x̂)
There is currently much ongoing research on the dereverberation of room acoustics. Several methods have been proposed to improve the convergence properties of blind identification algorithms, so better identification is likely to be possible. The algorithms also diverge after prolonged operation, and many methods have been suggested to improve their long-term performance.
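The Stage 2 inversion can be illustrated with a least-squares form of MINT [10]. It looks for inverse filters g_i such that the sum of the convolutions h_i * g_i is a unit impulse. This requires that the channels share no common zeros. The NumPy sketch below assumes two random 8-tap channels as stand-ins for the identified responses, a delay-free target, and inverse filters of length L − 1, which make the two-channel system square. The function names are hypothetical, and this is not the thesis's own implementation.

```python
import numpy as np


def conv_matrix(h, n_cols):
    """Convolution matrix such that conv_matrix(h, n) @ g == np.convolve(h, g)."""
    H = np.zeros((len(h) + n_cols - 1, n_cols))
    for k in range(n_cols):
        H[k:k + len(h), k] = h
    return H


def mint_inverse(channels, Lg):
    """Least-squares MINT: filters g_i with sum_i h_i * g_i equal to a unit impulse."""
    H = np.hstack([conv_matrix(h, Lg) for h in channels])
    d = np.zeros(H.shape[0])
    d[0] = 1.0                                   # desired overall response delta(n)
    g, *_ = np.linalg.lstsq(H, d, rcond=None)
    return g.reshape(len(channels), Lg)


rng = np.random.default_rng(0)
h = rng.standard_normal((2, 8))       # stand-ins for identified 8-tap responses
g = mint_inverse(h, Lg=7)             # Lg = L - 1 gives a square 14 x 14 system
overall = sum(np.convolve(h[i], g[i]) for i in range(2))
print(np.round(overall, 6))           # close to [1, 0, 0, ...]
```

Non-minimum-phase channels are not a problem here, because the multichannel system as a whole is inverted exactly rather than each channel separately.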
Figure 5.2: System performance (power spectral density in dB/rad/sample vs. normalized frequency ×π rad/sample for the speaker signal, the microphone 1, 2 and 3 signals, and the dereverberated signal)
CHAPTER 6
Matlab Scripts
6.1 Two Channel Blind Identification: 3-tap channels
clear all; close all;
L = 4;
N = 2;
theta = pi/10;
v = pi;
h1 = [1 -2*cos(theta) 1]';
h2 = [1 -2*cos(theta+v) 1]';
h1 = [h1; zeros(L-length(h1),1)];
h2 = [h2; zeros(L-length(h2),1)];
% a single source drives both channels (SIMO model)
x1 = wgn(100000, 1, 40, 'real');
x1_filt = filter(h1, 1, x1);
x2_filt = filter(h2, 1, x1);
x = [x1_filt x2_filt];
h = ones(L,N);
e = zeros(N,N);
H_model = [h1;h2];
npm = []; x_chi_s = [];
[m,n] = size(h);
pos = L;
for k = 1:9000
for i = 1:N
for j = 1:N
e(i,j) = (x(pos:-1:pos-L+1,i)'*h(:,j)) - (x(pos:-1:pos-L+1,j)'*h(:,i));
end
end
x_chi = 0;
for i = 1:N-1
for j = i+1:N
x_chi = x_chi + e(i,j)^2;
end
end
x_chi_s = [x_chi_s x_chi];
xtemp = x(pos:-1:pos-L+1,:);
rxx = rx(xtemp); % rx(): helper not listed in this chapter (assumed to return the cross-relation correlation matrix)
pos = pos+1;
h_long = reshape(h,m*n,1);
J = 2*rxx*h_long / (norm(h_long))^2;
mu = (h_long.'*J)/((norm(J)).^2); % optimal step size, eq. (4.59)
h_long = h_long - mu*J;
npm_temp = H_model - (((H_model'*h_long)/(h_long'*h_long))*h_long);
npm = [npm norm(npm_temp)/norm(H_model)];
h = reshape(h_long,m,n);
end
figure;
subplot(2,2,1);plot(abs(fft(h1,512)).^2);title('h1 response');
subplot(2,2,2);plot(abs(fft(h2,512)).^2);title('h2 response');
subplot(2,2,3);plot(abs(fft(h(:,1),512)).^2);title('h1 estimate');
subplot(2,2,4);plot(abs(fft(h(:,2),512)).^2);title('h2 estimate');
figure; plot(20*log10(npm(1:300))); % npm is a ratio of norms, so 20*log10 gives dB
6.2 Identification with the NLMS Algorithm
clear all; close all;
amethyst_1m_start = 11820; amethyst_1m_stop = 102630;
amethyst_2m_start = 11833; amethyst_2m_stop = 104400;
amethyst_3m_start = 11820; amethyst_3m_stop = 103170;
smaragden_1m_start = 2750; smaragden_1m_stop = 128790;
smaragden_2m_start = 6341; smaragden_2m_stop = 132340;
smaragden_3m_start = 11820; smaragden_3m_stop = 103200;
ali1m_start = 6300; ali1m_stop = 92000;
ali2m_start = 1; ali2m_stop= 90000;
ali3m_start = 1; ali3m_stop = 90000;
[xx, dd1, dd2, dd3, dd4, dd5, dd6, dd7] = xload('amethyst1m'); % xload(): helper not listed in this chapter
start = amethyst_1m_start; stop = amethyst_1m_stop;
x = xx(start:stop); d1 = dd1(start:stop); d2 = dd2(start:stop);
d3 = dd3(start:stop); d4 = dd4(start:stop); d5 = dd5(start:stop);
d6 = dd6(start:stop); d7 = dd7(start:stop);
DD = [dd1 dd2 dd3 dd4 dd5 dd6 dd7];
s=5000;%length of the adaptive filter
beta = 0.3;
D = [d1 d2 d3 d4 d5 d6 d7];
[m,n] = size(D);
ERROR = zeros(n,1); ENERGY = zeros(m,n);
W = zeros(s,n); RT60 = zeros(s,n);
min_error = zeros(1,7);
sig_variance = zeros(1,7);
noise_variance = zeros(1,7);
[ERROR, ENERGY, W, RT60] = batchnlms(beta,s,x,D);
for i = 1:7
d_ = DD(:,i);
sig_variance(i) = var(d_(12030:35400));
noise_variance(i) = var(d_(512:5761));
min_error(i) = 10*log10((noise_variance(i))/ sig_variance(i));
end
figure;
position = 0;
for i = 1:7
subplot(2,2,i-position);plot(ENERGY(:,i));
title(['Channel = ',num2str(i),' Min Error = ',...
num2str(min_error(i)),' Observed Error = ',num2str(ERROR(i))]);
xlabel('Filter tap #');
ylabel('W^2');
axis([ 0 5000 -140 1])
if(i == 4)
figure;
position = 4;
end
end
figure;
position = 0;
for i = 1:7
subplot(2,2,i-position);plot(W(:,i));
title(['Channel = ',num2str(i),' Min Error = ',...
num2str(min_error(i)),' Observed Error = ',num2str(ERROR(i))]);
xlabel('Filter tap #');
ylabel('W');
axis([ 0 5000 -0.1 0.16])
if(i == 4)
figure;
position = 4;
end
end
figure;
position = 0;
for i = 1:7
subplot(2,2,i-position);plot(RT60(:,i));
title(['Channel = ',num2str(i),' Min Error = ',...
num2str(min_error(i)),' Observed Error = ',num2str(ERROR(i))]);
xlabel('Filter tap #');
ylabel('Averaged W^2');
if(i == 4)
figure;
position = 4;
end
end
6.3 Normalized LMS
function [error, energy, w1, rt60, e] = nlms2(beta,s,x1,d)
w1=zeros(s,1);
% pre-allocate the variables
e=zeros(length(x1),1);
y1=zeros(length(x1),1);
mu=zeros(length(x1),1);
rt=zeros(s,1);
% beta=0.3;
% prepend zeros to the input signal so that the adaptive filter
% can start at the first sample
X1=[zeros((s-1),1); x1];
for i=s:length(X1)
% step size normalized by the input power in the current window
mu(i-s+1)=beta/(X1(i-s+1:i)'*X1(i-s+1:i)+eps);
y1(i-s+1)=w1'*X1(i:-1:i-s+1);
e(i-s+1)=d(i-s+1)-y1(i-s+1);
w1=w1+mu(i-s+1)*e(i-s+1)*X1(i:-1:i-s+1);
end
error=10*log10(var(e(end-5000:end))/var(d));
% normalized energy of the estimated impulse response (dB)
energy = 10*log10(eps+abs(w1/max(abs(w1))).^2);
for i=1:length(w1)/10
if abs(w1(i))==max(abs(w1))
j=i;
end
end
%calculating the RT60
for i=1:s
if i<100
rt(i)=1;
else
rt(i)=sum((w1(i-100+1:i)/max(abs(w1))).^2)/100;
end
end
% figure;
rt60 = (10*log10(eps+rt));
6.4 Single Channel Dereverberation with NLMS
clear all;
load('sg2'); % provides w1, the estimated room impulse responses (one per column)
W = w1;
for j=1:1
g1=W(:,j); % room impulse response of channel j
delay=3100;
s=6000;
smaragden_1m_start = 2750; smaragden_1m_stop = 128790;
d_1 = audioread('smaragden1m\Track 1.wav');
d1 = d_1(smaragden_1m_start:smaragden_1m_stop).'; % the desired signal
x=filter(g1,1,d1); % x is the input data for the adaptive algorithm
% synchronize the delayed desired signal d with the input signal x
d = d1(1:end-delay).';
% length of the adaptive filter
% initialize the filter
w1=zeros(1,s);
% pre-allocate the variables
e=zeros(1,length(d));
y1=zeros(1,length(d));
mu=zeros(1,length(d));
beta=.65;
% prepend zeros to the input signal for the adaptive algorithm
% X1=[zeros(1,(s-1)), x1];
X1=[zeros(1,(s-delay-2)), x];
for i=s:length(X1)
% step size normalized by the input power in the current window
mu(i-s+1)=beta/(X1(i-s+1:i)*X1(i-s+1:i)'+eps);
y1(i-s+1)=w1*X1(i:-1:i-s+1)';
e(i-s+1)=d(i-s+1)-y1(i-s+1);
w1=w1+mu(i-s+1)*e(i-s+1)*X1(i:-1:i-s+1);
end
error(j)=10*log10(var(e(end-5000:end))/var(d));
winv(:,j)=w1;
end
figure;
plot(conv(W(:,1),winv(:,1))); % cascade of the room response and its adaptive inverse
title(['Channel=',num2str(j)]);
ylabel('amplitude')
6.5 System Identification using NLMS
function [ERROR, ENERGY, W, RT60] = batchnlms(beta,s,x,D)
[m,n] = size(D);
for i = 1:n
[ERROR(i), ENERGY(:,i), W(:,i), RT60(:,i)] = nlms2(beta,s,x,D(:,i));
end
6.6 Blind SIMO LMS Well Conditioned Inputs
clear all; close all;
L = 3;
N = 2;
theta = pi/10;
v = pi;
mu = 0.005;
runs = 300;
source_length = 100000;
h1 = [1 -2*cos(theta) 1]';
h2 = [1 -2*cos(theta+v) 1]';
ensemble = 1;
hnew = 0; npm_new = 0; x_chi_s_new = 0;
hmcnew = 0; npmmcnew = 0; x_chi_s_mc_new = 0;
hmcnew_01_new = 0; npmmc_01_new = 0;
x_chi_s_mc_01_new = 0;
for i = 1:ensemble
x1 = wgn(source_length, 1, 0, 'real');
xsig1 = filter(h1, 1, x1);
xsig2 = filter(h2, 1, x1);
meanstd = mean([std(xsig1), std(xsig2)]);
noise_snr = 40;
x1_filt = xsig1 + randn(source_length,1)*meanstd*10^(-noise_snr/20);
x2_filt = xsig2 + randn(source_length,1)*meanstd*10^(-noise_snr/20);
x = [x1_filt x2_filt];
[h,npm,x_chi_s] = vssumclms(x,L,runs,h1,h2);
mu = 0.005;
[hmc,npmmc,x_chi_s_mc] = mclms(x,L,mu,runs,h1,h2);
mu = 0.01;
[hmc_01,npmmc_01,x_chi_s_mc_01] = mclms(x,L,mu,runs,h1,h2);
hnew = h + hnew;
npm_new = npm + npm_new;
x_chi_s_new = abs(x_chi_s) + x_chi_s_new;
hmcnew = hmc + hmcnew; npmmcnew = npmmc + npmmcnew;
x_chi_s_mc_new = abs(x_chi_s_mc) + x_chi_s_mc_new;
hmcnew_01_new = hmc_01 + hmcnew_01_new; npmmc_01_new = npmmc_01 + npmmc_01_new;
x_chi_s_mc_01_new = abs(x_chi_s_mc_01) + x_chi_s_mc_01_new;
end
npmvss = npm_new/ensemble; npmmc = npmmcnew/ensemble;
npmmc_01 = npmmc_01_new/ensemble;
x_chi_s_vss = x_chi_s_new/ensemble;
x_chi_s_mc = x_chi_s_mc_new/ensemble;
x_chi_s_mc_01 = x_chi_s_mc_01_new/ensemble;
hvss = h;
figure; subplot(2,1,1);
h = hnew/ensemble;
plot(1:runs,20*log10(npmvss),1:runs,20*log10(npmmc),1:runs,20*log10(npmmc_01));
legend('VSS-UMCLMS','MCLMS \mu = 0.005','MCLMS \mu = 0.01');
title('Normalized Projection Misalignment'); xlabel('Sample');
ylabel('Misalignment dB');
subplot(2,1,2);
plot(1:runs,x_chi_s_vss,1:runs,x_chi_s_mc,1:runs,x_chi_s_mc_01);
legend('VSS-UMCLMS','MCLMS \mu = 0.005','MCLMS \mu = 0.01');
title('Error');xlabel('Sample');
ylabel('Magnitude');
figure; subplot(2,2,1);stem(h1,'-x');title('h1 to be identified');
subplot(2,2,2);stem(h2,'-x');title('h2 to be identified');
subplot(2,2,3);stem(hvss(:,1),'-x');title('h1 estimate');
subplot(2,2,4);stem(hvss(:,2),'-x');title('h2 estimate');
6.7 Blind SIMO LMS Badly Conditioned Inputs
clear all; close all;
L = 3;
N = 2;
theta = pi/10;
v = pi/10;
mu = 0.025;
rho = 0.5;
h1 = [1 -2*cos(theta) 1]';
h2 = [1 -2*cos(theta+v) 1]';
source_length = 100000;
x1 = wgn(source_length, 1, 0, 'real');
xsig1 = filter(h1, 1, x1);
xsig2 = filter(h2, 1, x1);
meanstd = mean([std(xsig1), std(xsig2)]);
noise_snr = 40;
x1_filt = xsig1 + randn(source_length,1)*meanstd*10^(-noise_snr/20);
x2_filt = xsig2 + randn(source_length,1)*meanstd*10^(-noise_snr/20);
x = [x1_filt x2_filt];
runs = 8000;
[hvss,npmvss,x_chi_s_vss] = vssumclms(x,L,runs,h1,h2);
[hmc,npmmc,x_chi_s_mc] = mclms(x,L,mu,runs,h1,h2);
mu = 0.01;
[hmc_01,npmmc_01,x_chi_s_mc_01] = mclms(x,L,mu,runs,h1,h2);
[hnewton,npmnewton,x_chi_s_newton] = newtonmclms_t(x,L,rho,runs,h1,h2); % Newton step size rho = 0.5
figure; subplot(2,1,1);
plot(1:runs,20*log10(npmvss),1:runs,20*log10(npmmc),1:runs,20*log10(npmmc_01),...
1:runs,20*log10(npmnewton));
legend('VSS-UMCLMS','MCLMS \mu = 0.025','MCLMS \mu = 0.01','MCN \rho = 0.5');
title('Normalized Projection Misalignment'); xlabel('Sample');
ylabel('Misalignment dB');
subplot(2,1,2);
plot(1:runs,abs(x_chi_s_vss),1:runs,abs(x_chi_s_mc),1:runs,abs(x_chi_s_mc_01),...
1:runs,abs(x_chi_s_newton));
legend('VSS-UMCLMS','MCLMS \mu = 0.025','MCLMS \mu = 0.01','MCN \rho = 0.5');
title('Error');xlabel('Sample');
ylabel('Magnitude');
figure; subplot(2,2,1);stem(h1,'-x');title('h1 to be identified');
subplot(2,2,2);stem(h2,'-x');title('h2 to be identified');
subplot(2,2,3);stem(hvss(:,1),'-x');title('h1 estimate');
subplot(2,2,4);stem(hvss(:,2),'-x');title('h2 estimate');
BIBLIOGRAPHY
[1] F. Everest. The Master Handbook of Acoustics. McGraw-Hill, 1994.
[2] Thomas Funkhouser, Jean-Marc Jot, and Nicolas Tsingos. Sounds good to me. SIGGRAPH, 2002.
[3] Encyclopaedia Britannica Online. Acoustics. 2007 (retrieved April 3, 2007).
[4] Yiteng Huang, Jacob Benesty, and Jingdong Chen. Acoustic MIMO Signal Processing (Signals and Communication Technology). Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2006.
[5] Monson H. Hayes. Statistical Digital Signal Processing and Modeling. 1996.
[6] Sen M. Kuo and Dennis Morgan. Active Noise Control Systems: Algorithms and DSP Implementations. John Wiley & Sons, Inc., New York, NY, USA, 1995.
[7] Osunkunle Biodun Isaac, Sani AlMoudarress, and Sayed Ali Shekarchi. Adaptive echo cancellation implementation in MATLAB and DSP. Adaptive Signal Processing ETC004, Blekinge Tekniska Högskola.
[8] Stephen T. Neely and Jont B. Allen. Invertibility of a room impulse response. Journal of the Acoustical Society of America, 66(1):165–169, July 1979.
[9] Bernard Widrow. Adaptive Signal Processing. Prentice-Hall, Upper Saddle River, New Jersey 07458, 1985.
[10] Masato Miyoshi and Yutaka Kaneda. Inverse filtering of room acoustics. IEEE Transactions on Acoustics, Speech, and Signal Processing, 36(2):145–152, February 1988.
[11] Takafumi Hikichi, Marc Delcroix, and Masato Miyoshi. On robust inverse filter design for room transfer function fluctuations. European Signal Processing Conference (EUSIPCO), 2006.
[12] S. Haykin. Adaptive Filter Theory. Prentice-Hall, Englewood Cliffs, NJ, 1986.
[13] G. Xu, H. Liu, L. Tong, and T. Kailath. A least-squares approach to blind channel identification. IEEE Transactions on Signal Processing, 43(12):2982–2993, December 1995.
[14] T. K. Moon and W. C. Stirling. Mathematical Methods and Algorithms. Prentice-Hall, Upper Saddle River, New Jersey 07458.
[15] Jont B. Allen and David A. Berkley. Image method for efficiently simulating small-room acoustics. Journal of the Acoustical Society of America, 65(4):943–950, 1979.
[16] E. A. P. Habets. Room impulse response generator. Technische Universiteit Eindhoven, The Netherlands, 2006.