IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 59, NO. 11, NOVEMBER 2011 5101
Complex-Valued Signal Processing: The Proper Way
to Deal With Impropriety
Tülay Adalı, Fellow, IEEE, Peter J. Schreier, Senior Member, IEEE, and Louis L. Scharf, Life Fellow, IEEE
Abstract—Complex-valued signals occur in many areas of science and engineering and are thus of fundamental interest. In the past, it has often been assumed, usually implicitly, that complex random signals are proper or circular. A proper complex random variable is uncorrelated with its complex conjugate, and a circular complex random variable has a probability distribution that is invariant under rotation in the complex plane. While these assumptions are convenient because they simplify computations, there are many cases where proper and circular random signals are very poor models of the underlying physics. When taking impropriety and noncircularity into account, the right type of processing can provide significant performance gains. There are two key ingredients in the statistical signal processing of complex-valued data: 1) utilizing the complete statistical characterization of complex-valued random signals; and 2) the optimization of real-valued cost functions with respect to complex parameters. In this overview article, we review the necessary tools, among which are widely linear transformations, augmented statistical descriptions, and Wirtinger calculus. We also present some selected recent developments in the field of complex-valued signal processing, addressing the topics of model selection, filtering, and source separation.
Index Terms—CR calculus, estimation, improper, independent component analysis, model selection, noncircular, widely linear, Wirtinger calculus.
I. INTRODUCTION
COMPLEX-VALUED signals arise in many areas of science and engineering, such as communications, electromagnetics, optics, and acoustics, and are thus of fundamental
Manuscript received November 09, 2010; revised April 09, 2011 and June 30, 2011; accepted July 01, 2011. Date of publication July 25, 2011; date of current version October 12, 2011. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Jean Pierre Delmas. The work of T. Adalı was supported by the NSF Grants NSF-CCF 0635129 and NSF-IIS 0612076; the work of P. J. Schreier was supported by the Australian Research Council (ARC) under Discovery Project Grant DP0986391; and the work of L. L. Scharf was supported by NSF Grant NSF-CCF 1018472 and AFOSR Grant FA 9550-10-1-0241. The authors gratefully acknowledge kind permission from Wiley Interscience to use some material from Chapter 1 of the book Adaptive Signal Processing: Next Generation Solutions, and from Cambridge University Press to use some material from the book Statistical Signal Processing of Complex-Valued Data: The Theory of Improper and Noncircular Signals.
T. Adalı is with the Department of Computer Science and Electrical Engineering, University of Maryland, Baltimore County, Baltimore, MD 21250 USA (e-mail: [email protected]).
P. J. Schreier is with the Signal and System Theory Group, Universität Paderborn, 33098 Paderborn, Germany (e-mail: [email protected]).
L. L. Scharf is with the Department of Electrical and Computer Engineering, Colorado State University, Ft. Collins, CO 80523 USA (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TSP.2011.2162954
interest. In the past, it has often been assumed—usually implicitly—that complex random signals are proper or circular. A
proper complex random variable is uncorrelated with its com-
plex conjugate, and a circular complex random variable has a
probability distribution that is invariant under rotation in the
complex plane. These assumptions are convenient because they
simplify computations and, in many respects, make complex
random signals look and behave like real random signals. While
these assumptions can often be justified, there are also many
cases where proper and circular random signals are very poor
models of the underlying physics. This fact has been known and appreciated by oceanographers since the early 1970s [80], but
it has only more recently started to influence the thinking of the
signal processing community.
In the last two decades, there have been important advances
in this area that show impropriety and noncircularity as an important characteristic of many signals of practical interest. When taking impropriety and noncircularity into account, the right type of processing can provide significant performance gains. For instance, in mobile multiuser communications, it can enable an improved tradeoff between spectral efficiency and power consumption. Important examples of digital modulation schemes that produce improper complex baseband signals are binary phase shift keying (BPSK), pulse amplitude modulation (PAM), Gaussian minimum shift keying (GMSK), offset quaternary phase shift keying (OQPSK), and baseband (but not passband) orthogonal frequency division multiplexing (OFDM), which is commonly called discrete multitone (DMT) [119]. A small sample of papers exploiting the impropriety of these signals is [22], [31], [45], [46], [57], [62], [78], [79], [83], [85], [123], [138], [140], and [23].
Improper baseband communication signals can also arise due to
imbalance between their in-phase and quadrature (I/Q) components. I/Q imbalance degrades the signal-to-noise ratio (SNR) and thus bit error rate performance. Some papers proposing ways of compensating I/Q imbalance in different types of communication systems include [15], [110], and [142]. Techniques for wideband system identification when the system (e.g., a wideband wireless communication channel) is not rotationally invariant are presented in [81] and [82].
In array processing, a typical goal is to estimate the direction
of arrival (DOA) of one or more signals-of-interest impinging
on a sensor array. If the signals-of-interest or the interference are
improper (as they would be if they originated, e.g., from a BPSK
transmitter), this can be exploited to achieve higher DOA reso-
lution or higher signal-to-interference-plus-noise ratio (SINR).
Some papers addressing the adaptation of array processing al-
gorithms to improper signals include [25], [26], [36], [49], [77],
and [107].
1053-587X/$26.00 © 2011 IEEE
Data-driven methods for signal processing, in particular latent variable analysis (LVA), are another area where exploiting
impropriety and noncircularity has led to important advances.
In LVA, much interest has centered around independent com-
ponent analysis (ICA), where the multivariate data are decom-
posed into additive components that are as independent as pos-
sible. It has been shown that, under certain conditions, ICA
can be achieved by exploiting impropriety [41], [63]. Signif-
icant performance gains are noted when algorithms explicitly
take the noncircular nature of the data into account [7], [69],
[87], [88], [96], [141]. Among the many applications of ICA,
medical data analysis and communications have been two of
the most active. In [141], noncircularity is exploited for feature
extraction in electrocardiograms (ECGs). In [67], [68], noncir-
cularity is used in the analysis of functional magnetic resonance
imaging (fMRI) data, leading to improved estimation of neural
activity.
There are two key ingredients in the statistical signal pro-
cessing of complex-valued data: 1) utilizing the complete
statistical characterization of complex-valued random signals; and 2) the optimization of real-valued cost functions with
respect to complex parameters. With respect to 1), Brown
and Crane [20] in 1969 were among the first to consider the
complementary correlation, which is the correlation of a com-
plex signal with its complex conjugate. They also introduced
linear-conjugate linear (or widely linear) transformations that
allow access to the information contained in this correlation.
At first, the impact of this paper was limited. It took until the
1990s before a string of papers showed renewed interest in this
topic. Many were from the French school of signal processing.
Among the important early contributions were the fundamental
studies by Comon, Duvaut, Amblard, and Lacoume [10], [32], [39], [60], the derivation of the widely linear minimum
mean-square error filter by Picinbono and Chevalier [103],
the work on higher order statistics by Lacoume, Gaeta, and
Amblard [11], [12], [43], [61], [76], [109], the introduction of
the noncircular Gaussian distribution by van den Bos [129],
and the derivation of the optimum Wiener filter for cyclosta-
tionary complex processes by Gardner [44]. We should also
mention the more mathematically oriented work by Vakhania
and Kandelaki [125] on random vectors with values in complex
Hilbert spaces.
The results concerning the optimization of real-valued func-
tions with complex arguments are much older still, although
they have gone largely unnoticed by the engineering commu-
nity. It was already in 1927 that Wirtinger proposed generalized
derivatives of nonholomorphic (including real-valued) func-
tions with respect to complex arguments [137]. However,
differentiating such functions separately with respect to real
and imaginary parts—and using varying definitions for the
derivatives—has been the common practice. This approach is
tedious, and in an attempt to make the terms and analysis more
manageable, simplifying assumptions are usually introduced.
A common such assumption is circularity. Wirtinger calculus
(also called the CR calculus [59]) presents an elegant alternative, which allows keeping all computations in the complex
domain. In the engineering community, Wirtinger calculus was rediscovered by Brandwood in 1983 [19], and then further developed by van den Bos, albeit for gradient and Hessian formulations, in [126], [128]. The extensions of gradient and Hessian expressions to the complex domain are more recent [59], [64], [65].
Our goal in this article is to review all these fundamental re-
sults, and to present some selected recent developments in the
field of complex-valued signal processing. Naturally, in a field
as wide as this, one faces the difficult decision of what topics to
cover. We have decided to focus on three key problems, namely
model selection, filtering, and source separation. Some of these
results have been drawn from the first author’s chapter in the
edited book [5], and the second and third authors’ book [115],
which provides a comprehensive review of this field. We thank
Wiley Interscience and Cambridge University Press for their
kind permission to use some material from these books. An-
other recent book on this topic is [75], which has a focus on
neural networks.
We also provide some critical perspectives in this paper. As is
often the case when new tools are introduced, there may also be
some overexcitement and abuse. We comment on several points: There are certain instances where the use of improper/noncircular models is simply unnecessary and will not improve performance. The fact that a signal is improper or noncircular does
not guarantee that this can be exploited to improve a metric
such as mean-square error. Moreover, employing improper/non-
circular models can even be disadvantageous because they are
more complex than proper/circular models. Performance should
be defined in a broader sense that includes not only optimality
with respect to a selected metric, but also other properties of
the algorithm, such as robustness and convergence speed. So an
improvement in mean-square error performance may come at
the price of less robust and more slowly converging iterative al-
gorithms. Therefore, the question of whether to use proper/circular or improper/noncircular models requires careful deliberation. We show that circular models may be preferable when
the degree of impropriety/noncircularity is small, the number of
samples is small, or the signal-to-noise ratio is low. We provide
a number of concrete examples that clarify these points.
In terms of terminology, a general question that has di-
vided researchers is how to extend definitions from the real
to the complex case. For example, since the complementary
covariance has traditionally been ignored, definitions of uncor-
relatedness and orthogonality are given new names when the
complementary covariance is taken into account. This practice
leads to some counterintuitive results such as “two uncorrelated
Gaussian random vectors need not be independent” (because
their complementary covariance matrix need not be diagonal).
We would like to avoid such displeasing results, and therefore
always adhere to the following general principle: definitions and conditions derived for the real and complex domains should be equivalent. This means that complementary covariances
must be considered if they are nonzero.
The rest of the paper is organized as follows: In the next
section, we provide a review of the tools needed for 1) utilizing
the complete statistical characterization of complex-valued
random signals and 2) the optimization of real-valued cost
functions with respect to complex parameters. These are widely
linear transformations, augmented statistical descriptions, and Wirtinger calculus. Section III addresses the question of how
to detect whether a given signal is circular or noncircular,
and more generally, how to detect the number of circular and
noncircular signals in a given signal subspace. Section IV
discusses widely linear estimation, which takes the information
in the complementary covariance into account, and Section V
deals with ICA. We hope that our paper will illuminate the role
impropriety and noncircularity play in signal processing, and
demonstrate when and how they should be taken into account.
II. PRELIMINARIES
A. Wirtinger Calculus
In statistical signal processing, we often deal with a real-
valued cost function, such as a likelihood function or a quadratic
form, which is then either analytically or numerically optimized
with respect to a vector or matrix of parameters. What happens
when the parameters are complex-valued? That is, how do we
differentiate a real-valued function with respect to a complex
argument?
Consider first a complex-valued function f(z) = u(x, y) + jv(x, y), where z = x + jy. The classical definition of complex differentiability requires that the derivative, defined as the limit

f'(z) = lim_{Δz→0} [f(z + Δz) − f(z)] / Δz    (1)

be independent of the direction in which Δz approaches 0 in the complex plane. This requires that the Cauchy–Riemann equations [3], [5]

∂u/∂x = ∂v/∂y  and  ∂v/∂x = −∂u/∂y    (2)
be satisfied. These conditions are necessary for f(z) to be complex-differentiable. If the partial derivatives of u(x, y) and v(x, y) are continuous, then they are sufficient as well.
A function that is complex-differentiable on its entire domain
is called holomorphic or analytic. Obviously, since real-valued cost functions have v(x, y) = 0, the Cauchy–Riemann conditions do not hold, and hence cost functions are not analytic. Indeed, the Cauchy–Riemann equations impose a rigid structure on u and v and thus f. A simple demonstration of this fact is that either u or v alone suffices to express the derivatives of an analytic function.
The usual approach to overcome this basic limitation is to evaluate separate derivatives with respect to the real and imaginary parts of a nonanalytic function, which may then be stacked in a vector of twice the original dimension. In the end, this solution is converted back into the complex domain. Needless to say, this approach is cumbersome. In some cases (for example, [5]), it even requires additional assumptions to make it work.
Wirtinger calculus [137]—also called the CR calculus [59]—provides a framework for differentiating nonanalytic functions. Importantly, it allows performing all the derivations in the complex domain, in a manner very similar to the real-valued case. All the evaluations become quite straightforward, making many tools and methods developed for the real case readily available for the complex case. By keeping the expressions simple, we also eliminate the need for simplifying, but often inaccurate, assumptions such as circularity.
Wirtinger calculus only requires that f be differentiable when expressed as a function f(x, y) of the real variables x and y. Such a function is called real-differentiable. If u(x, y) and v(x, y) have continuous partial derivatives with respect to x and y, then f is real-differentiable. We now define the two generalized complex derivatives

∂f/∂z = (1/2)(∂f/∂x − j ∂f/∂y)  and  ∂f/∂z* = (1/2)(∂f/∂x + j ∂f/∂y)    (3)
which can be formally “derived” by writing x = (z + z*)/2 and y = (z − z*)/(2j) and then using the chain rule [104]. These generalized derivatives can be formally implemented by regarding f as a bivariate function f(z, z*) and treating z and z* as independent variables. That is, when applying ∂/∂z, we take the derivative with respect to z, while formally treating z* as a constant. Similarly, ∂/∂z* yields the derivative with respect to z*, formally regarding z as a constant. Thus, there is no need to develop new
differentiation rules. This was shown in [19] in 1983 without
a specific reference to Wirtinger's earlier work [137]. Interestingly, many of the references that refer to [19] and use the generalized derivatives (3) evaluate them by computing derivatives with respect to x and y, instead of z and z*. This leads to unnecessarily complicated derivations.
The Cauchy–Riemann equations can simply be stated as ∂f/∂z* = 0. In other words, an analytic function cannot depend on z*. If f is analytic, then the usual complex derivative in (1) and ∂f/∂z in (3) coincide. So Wirtinger calculus contains standard complex calculus as a special case.
For real-valued f, we have (∂f/∂z)* = ∂f/∂z*, i.e., the derivative and the conjugate derivative are complex conjugates of each other. Because they are related through conjugation, we only need to compute one or the other. As a consequence, a necessary and sufficient condition for real-valued f to have a stationary point is ∂f/∂z = 0. An equivalent necessary and sufficient condition is ∂f/∂z* = 0 [19].
As an example of the application of Wirtinger derivatives, consider the real-valued function f(z) = |z|⁴ = (x² + y²)². We can evaluate ∂f/∂z* by differentiating separately with respect to x and y,

∂f/∂z* = (1/2)(∂f/∂x + j ∂f/∂y) = (1/2)[4x(x² + y²) + j 4y(x² + y²)] = 2(x + jy)(x² + y²)    (4)

or we can write the function as f(z, z*) = z²(z*)² and differentiate by treating z as a constant,

∂f/∂z* = 2 z² z* = 2z|z|².    (5)

The second approach is clearly simpler. It can be easily shown that the two expressions, (4) and (5), are equal. However, while the expression in (4) can easily be derived from (5), it is not quite as straightforward the other way round. Because f is real-valued, there is no need to compute ∂f/∂z: it is simply the conjugate of ∂f/∂z*.
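These manipulations are easy to check numerically. The sketch below is our own illustration (not from the paper), using NumPy and f(z) = |z|⁴ as a test function of our choosing: it approximates ∂f/∂x and ∂f/∂y by central differences and assembles the generalized derivatives in (3).

```python
import numpy as np

def f(z):
    # Real-valued, nonanalytic test function: f(z) = |z|^4 = (x^2 + y^2)^2
    return np.abs(z) ** 4

def wirtinger_derivs(f, z, h=1e-6):
    """Numerical Wirtinger derivatives from central differences in x and y:
    df/dz = (1/2)(df/dx - j df/dy), df/dz* = (1/2)(df/dx + j df/dy)."""
    dfdx = (f(z + h) - f(z - h)) / (2 * h)
    dfdy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    return 0.5 * (dfdx - 1j * dfdy), 0.5 * (dfdx + 1j * dfdy)

z0 = 1.0 + 2.0j
d_z, d_zconj = wirtinger_derivs(f, z0)
# Treating z and z* as independent in f = z^2 (z*)^2 gives
# df/dz = 2 z (z*)^2 = 2 z* |z|^2 and df/dz* = 2 z^2 z* = 2 z |z|^2.
print(np.allclose(d_z, 2 * np.conj(z0) * abs(z0) ** 2))   # True
print(np.allclose(d_zconj, 2 * z0 * abs(z0) ** 2))        # True
```

The finite-difference route works through x and y, while the closed forms come from the much simpler z, z* route; the two agree, as the calculus predicts.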
Evaluation of integrals, e.g., for calculating probabilities, is also commonly encountered. The same observation that allows the use of a representation in the form f(z, z*) and treating z and z*
as independent variables—when obviously they are not—can also be made when calculating integrals. When we consider f as a function of real and imaginary parts, the definition of an integral is well understood as the integral of the function f(x, y) over a region R defined on (x, y), ∫∫_R f(x, y) dx dy. The integral ∫∫ f(z, z*) dz dz*, on the other hand, is not meaningful, as we cannot vary the two variables z and z* independently, and we cannot define the region corresponding to R in the complex domain. However, using Green's theorem [3] or Stokes's theorem [51], [52], we can write the real-valued integral as a contour integral in the complex domain [91]

∫∫_R f(z, z*) dx dy = (1/2j) ∮_C F(z, z*) dz    (6)

where ∂F(z, z*)/∂z* = f(z, z*). Here, we have to assume that F(z, z*) is continuous through the simply connected region R, and C describes its contour. By transforming the integral defined in the real domain to a contour integral in the complex domain, the formula shows the dependence on the two variables z and z* in a natural manner.
In [91], the application of the integral relationship in (6) is dis-
cussed in detail, along with examples, for evaluating probability
masses when f(z, z*) is a probability density function. Another example is given in [7], where (6) is used to evaluate the
expectation of the score function with a generalized Gaussian
distribution.
Wirtinger calculus extends straightforwardly to functions of vectors or matrices, f: ℂ^N → ℂ or f: ℂ^{N×M} → ℂ. There is no need to develop new differentiation rules for Wirtinger derivatives. All rules for taking derivatives for real functions remain valid. However, care must be taken to properly distinguish between the variables with respect to which differentiation is performed and those that are formally regarded as constants. So all the expressions from the real-valued case given, for example, in [98], can be straightforwardly applied to the complex case. For instance, for f(z, z*) = z^H z, we obtain

∂f/∂z = z*  and  ∂f/∂z* = z.
There are a number of comprehensive references (e.g., [5], [42],
[59], and [115]) on Wirtinger calculus that deal with the chain
rule for nonanalytic functions, complex gradients and Hessians,
and complex Taylor series expansions.

In summary, Wirtinger calculus defines a framework in which
the derivatives of nonanalytic functions can be computed in a
manner very similar to the real-valued or the analytic case. The
framework is general in that it also includes derivatives of an-
alytic functions as a special case. It allows keeping the computations in the complex domain and eliminates the need to double the dimensionality as in the approach taken by [126]. Most of the tools
developed for real-valued signals can thus be easily extended
to the complex setting; see, e.g., [5], [65], and [66]. In this
overview paper, we use Wirtinger calculus for a straightforward
derivation of the complex least-mean-square (LMS) algorithm
in Section IV-C, and an extension of ICA, based on higher order
statistics, to complex data in Section V-B. Without Wirtinger
calculus, these extensions would be significantly more difficult.
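To preview how Wirtinger calculus streamlines such derivations: applying the generalized derivatives to the cost |e|², treating w and w* as independent, gives ∂|e|²/∂w* = −e*x and hence the complex LMS update w ← w + μ x e* for the filter output y = w^H x. The following sketch is our own toy system-identification example (the channel, input model, and step size are our choices, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: identify a strictly linear channel w_true driven
# by a proper white complex input (unit power per entry).
n, N, mu = 4, 5000, 0.01
w_true = rng.standard_normal(n) + 1j * rng.standard_normal(n)
x = (rng.standard_normal((N, n)) + 1j * rng.standard_normal((N, n))) / np.sqrt(2)
d = x @ np.conj(w_true)            # desired output d = w_true^H x per sample

# Complex LMS: the Wirtinger gradient dJ/dw* = -e* x of J = |e|^2
# yields the stochastic-gradient update w <- w + mu * x * conj(e).
w = np.zeros(n, dtype=complex)
for k in range(N):
    e = d[k] - np.vdot(w, x[k])    # error e = d - w^H x
    w = w + mu * x[k] * np.conj(e)

print(np.allclose(w, w_true, atol=1e-2))  # True: the filter converges
```

Because the input here is proper and the channel strictly linear, a strictly linear filter suffices; Section IV discusses the widely linear case.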
B. Widely Linear Transformations, Inner Products, and
Quadratic Forms
Let us now explore the different ways that linear transformations can be described in the real and complex domains. In order to do so, we construct three closely related vectors from two real vectors x ∈ ℝ^n and y ∈ ℝ^n. The first is the real composite 2n-dimensional vector u = [x^T, y^T]^T, obtained by stacking x on top of y. The second is the complex vector z = x + jy, and the third is the complex augmented vector z̲ = [z^T, z^H]^T, obtained by stacking z on top of its complex conjugate z*. The space of complex augmented vectors, whose bottom n entries are the complex conjugates of the top n entries, is denoted by ℂ_*^{2n}. Augmented vectors are always underlined. In much of our discussion, our focus will be on complex-valued quantities, where we will be using z and its augmentation z̲.

The complex augmented vector z̲ is related to the real composite vector u as z̲ = T_n u and u = (1/2) T_n^H z̲, where the real-to-complex transformation

T_n = [ I   jI ]
      [ I  −jI ]    (7)

is unitary up to a factor of 2, i.e., T_n T_n^H = T_n^H T_n = 2I. The complex augmented vector z̲ is obviously an equivalent redundant, but convenient, representation of z. When the size of T_n is clear, we may drop the subscript for economy.
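As a quick numerical sanity check (our own illustration in NumPy), the sketch below builds u, z, and the augmented vector, and verifies the relations around (7):

```python
import numpy as np

n = 3
rng = np.random.default_rng(1)
x, y = rng.standard_normal(n), rng.standard_normal(n)

u = np.concatenate([x, y])              # real composite vector [x; y]
z = x + 1j * y                          # complex vector
z_aug = np.concatenate([z, z.conj()])   # augmented vector [z; z*]

I = np.eye(n)
T = np.block([[I, 1j * I], [I, -1j * I]])  # real-to-complex transformation (7)

print(np.allclose(T @ u, z_aug))                       # z_aug = T u
print(np.allclose(T @ T.conj().T, 2 * np.eye(2 * n)))  # T T^H = 2I
print(np.allclose(u, 0.5 * T.conj().T @ z_aug))        # u = (1/2) T^H z_aug
```

All three prints are True: the augmented vector carries the same information as the real composite vector, up to the fixed invertible map T.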
If a real linear transformation w = M u, with M a real matrix of compatible dimensions, is applied to the real composite vector u = [x^T, y^T]^T, it yields a real composite vector w = [a^T, b^T]^T. The augmented complex version of w is

ζ̲ = W z̲    (8)

with W = (1/2) T M T^H. The matrix W is called an augmented matrix because it satisfies a particular block pattern, where the SE block is the conjugate of the NW block, and the SW block is the conjugate of the NE block:

W = [ W₁   W₂  ]
    [ W₂*  W₁* ]    (9)

Hence, W is an augmented description of the widely linear [103] or linear-conjugate-linear transformation

ζ = W₁ z + W₂ z*.

Even though the representation in (9) contains some redundancy, as the northern blocks determine the southern blocks, it proves to be a powerful tool. For instance, it enables easy concatenation of widely linear transformations.
Obviously, the set of complex linear transformations, ζ = W₁ z with W₂ = 0, is a subset of the set of widely linear transformations. A complex linear transformation (sometimes
called strictly linear for emphasis) has the equivalent real representation

M = [ A  −B ]
    [ B   A ],  with W₁ = A + jB.    (10)
To summarize, linear transformations on ℝ^{2n} are linear on ℂ^n only if they have the particular structure (10). Otherwise, the equivalent operation on ℂ^n is widely linear. For analysis, it is often preferable to represent ℝ-linear operations as ℂ-widely linear operations. However, from a hardware implementation point of view, ℂ-linear transformations are usually preferable over ℂ-widely linear transformations because the former require fewer real operations (additions and multiplications) than the latter.
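The correspondence between a real-linear map M on the composite vector and a widely linear map on z can be verified numerically. This sketch is our own illustration (the sizes and random matrices are arbitrary choices): it forms the augmented matrix W = (1/2) T M T^H, checks its block pattern (9), and compares the two descriptions of the same transformation.

```python
import numpy as np

n = 2
rng = np.random.default_rng(2)
M = rng.standard_normal((2 * n, 2 * n))   # arbitrary real-linear map on R^{2n}

I = np.eye(n)
T = np.block([[I, 1j * I], [I, -1j * I]])
W = 0.5 * T @ M @ T.conj().T              # augmented matrix W = (1/2) T M T^H

W1, W2 = W[:n, :n], W[:n, n:]             # NW and NE blocks
# Block pattern of an augmented matrix: SE = conj(NW), SW = conj(NE)
print(np.allclose(W[n:, n:], W1.conj()) and np.allclose(W[n:, :n], W2.conj()))

x, y = rng.standard_normal(n), rng.standard_normal(n)
z = x + 1j * y
ab = M @ np.concatenate([x, y])           # real-composite output [a; b]
zeta = W1 @ z + W2 @ z.conj()             # widely linear output W1 z + W2 z*
print(np.allclose(zeta, ab[:n] + 1j * ab[n:]))  # same transformation
```

A generic real matrix M produces a nonzero conjugate-linear part W₂; only matrices with the structure (10) give W₂ = 0, i.e., a strictly linear map.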
Next, we look at the different representations of inner products and quadratic forms. Consider the two 2n-dimensional real composite vectors u = [a^T, b^T]^T and v = [c^T, d^T]^T, the corresponding n-dimensional complex vectors x = a + jb and y = c + jd, and their complex augmented descriptions x̲ and y̲. We may now relate the inner products defined on ℝ^{2n}, ℂ_*^{2n}, and ℂ^n as

u^T v = (1/2) x̲^H y̲ = Re(x^H y).

Thus, the usual inner product defined on ℝ^{2n} equals (up to a factor of 1/2) the inner product defined on ℂ_*^{2n}, and also the real part of the usual inner product defined on ℂ^n. Both the inner product on ℂ_*^{2n} and the inner product on ℂ^n are useful.
Another common real-valued expression is the quadratic form u^T A u, which may be written as a (real-valued) widely quadratic form in z:

u^T A u = (1/2) z̲^H W z̲.

The augmented matrix W and the real matrix A are connected as before in (9), with W = (1/2) T A T^H. Thus, we obtain

u^T A u = Re(z^H W₁ z + z^H W₂ z*).
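Both of these identities are easy to confirm numerically. The sketch below is our own illustration (dimensions and random data are arbitrary): it checks the inner-product relation u^T v = (1/2) x̲^H y̲ = Re(x^H y) and the widely quadratic form.

```python
import numpy as np

n = 4
rng = np.random.default_rng(3)
a, b, c, d = (rng.standard_normal(n) for _ in range(4))

u, v = np.concatenate([a, b]), np.concatenate([c, d])  # real composites
x, y = a + 1j * b, c + 1j * d                          # complex vectors
x_aug = np.concatenate([x, x.conj()])
y_aug = np.concatenate([y, y.conj()])

# Inner products: u^T v = (1/2) x_aug^H y_aug = Re(x^H y)
print(np.allclose(u @ v, 0.5 * np.vdot(x_aug, y_aug).real))
print(np.allclose(u @ v, np.vdot(x, y).real))

# Widely quadratic form: u^T A u = (1/2) x_aug^H W x_aug, W = (1/2) T A T^H
A = rng.standard_normal((2 * n, 2 * n))
A = A + A.T                                            # real symmetric matrix
I = np.eye(n)
T = np.block([[I, 1j * I], [I, -1j * I]])
W = 0.5 * T @ A @ T.conj().T
print(np.allclose(u @ A @ u, 0.5 * np.vdot(x_aug, W @ x_aug).real))
```

Note that `np.vdot` conjugates its first argument, so `np.vdot(x, y)` computes x^H y.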
C. Statistics of Complex-Valued Random Variables and Vectors
We define a complex random variable as z = x + jy, where x and y are a pair of real random variables. Similarly, an n-dimensional complex random vector is defined as z = x + jy, where x and y are a pair of n-dimensional real random vectors. Note that we do not differentiate in notation between a random vector and its realization, as the meaning should be clear from the context. The probability distribution (density) of a complex random vector is interpreted as the 2n-dimensional joint distribution (density) of its real and imaginary parts. If the probability density function (pdf) exists, we write this as

p_z(z) = p_{x,y}(x, y).

If there is no risk of confusion, the subscripts may also be dropped. For a function g(z) = g(z, z*) whose domain includes the range of z, the expectation operator is defined as

E[g(z)] = ∫∫ g(z, z*) p(z) dx dy.    (11)

This integral can also be evaluated using contour integrals as in (6), writing the function as g(z, z*) and the pdf as p(z, z*).
Unless otherwise stated, we assume that all random vectors have zero mean. In order to characterize the second-order statistical properties of z, we consider the real composite random vector u = [x^T, y^T]^T. Its covariance matrix is

E[u u^T] = [ R_xx    R_xy ]
           [ R_xy^T  R_yy ]

with R_xx = E[x x^T], R_yy = E[y y^T], and R_xy = E[x y^T]. The augmented covariance matrix of z is [101], [114], [129]

R̲ = E[z̲ z̲^H] = [ R    R̃  ]
                 [ R̃*   R* ]    (12)

The NW block of the augmented covariance matrix is the usual (Hermitian and positive semi-definite) covariance matrix

R = E[z z^H]    (13)

and the NE block is the complementary covariance matrix

R̃ = E[z z^T]    (14)
which uses a regular transpose rather than a Hermitian (conjugate) transpose. Other names for R̃ include pseudo-covariance matrix [84], conjugate covariance matrix [44], and relation matrix [102]. It is important to note that, in general, both R and R̃ are required for a complete second-order characterization of z. In the important special case where the complementary covariance matrix vanishes, R̃ = 0, z is called proper, otherwise improper [84].
The conditions for propriety on the covariance and cross-covariance of real and imaginary parts x and y are R_xx = R_yy and R_xy = −R_xy^T. When z is scalar, x and y must therefore have equal variance and be uncorrelated for propriety. If z is proper, its Hermitian covariance matrix is

R = 2(R_xx + j R_yx)

and its augmented covariance matrix is block-diagonal. If complex z is proper and scalar, then σ_z² = 2σ_x². It is easy to see that propriety is preserved by strictly linear transformations, which are represented by block-diagonal augmented matrices.
If R is nonsingular, the following three conditions are necessary and sufficient for R and R̃ to be the covariance and complementary covariance matrices of a complex random vector z [101]:
1) the covariance matrix R is Hermitian and positive definite;
2) the complementary covariance matrix R̃ is symmetric, R̃ = R̃^T;
3) the Schur complement of R within the augmented covariance matrix, R* − R̃^H R^{−1} R̃, is positive semi-definite.
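These three conditions translate directly into a small validity check. The helper below is our own sketch (the function name, tolerances, and test matrices are our choices):

```python
import numpy as np

def is_valid_pair(R, C, tol=1e-10):
    """Check the three conditions on a (covariance, complementary covariance)
    pair: R Hermitian positive definite, C symmetric, and the Schur complement
    conj(R) - C^H R^{-1} C positive semi-definite."""
    herm_pd = np.allclose(R, R.conj().T) and np.all(np.linalg.eigvalsh(R) > tol)
    symm = np.allclose(C, C.T)
    S = R.conj() - C.conj().T @ np.linalg.solve(R, C)
    psd = np.all(np.linalg.eigvalsh((S + S.conj().T) / 2) > -tol)
    return herm_pd and symm and psd

R = np.eye(2)
print(is_valid_pair(R, 0.5 * np.eye(2)))  # True: complementary part small enough
print(is_valid_pair(R, 2.0 * np.eye(2)))  # False: Schur complement not PSD
```

Intuitively, condition 3) bounds how "large" the complementary covariance can be relative to the covariance, just as a correlation coefficient is bounded by one in magnitude.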
Circularity: It is also possible to define a stronger version of
propriety in terms of the probability distribution of a random
vector. A vector z is called circular if its probability distribution is rotationally invariant, i.e., z and e^{jα}z have the same probability distribution for any given real α. Circularity does not imply any condition on the standard covariance matrix R because

E[(e^{jα}z)(e^{jα}z)^H] = E[z z^H].    (15)

On the other hand,

E[(e^{jα}z)(e^{jα}z)^T] = e^{j2α} E[z z^T] = E[z z^T]    (16)

is true for arbitrary α if and only if R̃ = E[z z^T] = 0. Because the Gaussian distribution is completely determined by second-order statistics, a complex Gaussian random vector is circular if and only if it is zero-mean and proper [48].

Propriety requires that second-order moments be rotationally invariant, whereas circularity requires that the distribution, and thus all moments (if they exist), be rotationally invariant. Therefore, circularity implies zero mean and propriety, but not vice versa, and impropriety implies noncircularity, but not vice versa.
By extending the reasoning of (15) and (16) to higher order moments, we see that if z is circular, a pth-order moment can be nonzero only if it has the same number of conjugated and nonconjugated terms [100]. In particular, all odd-order moments must be zero. This holds for arbitrary order p.
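A quick Monte Carlo check of this moment structure (our own illustration; the sample size and tolerances are our choices) for a circular scalar complex Gaussian:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 500_000

# Zero-mean proper complex Gaussian -> circular: moments with unequal numbers
# of conjugated and nonconjugated factors vanish.
z = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)

print(abs(np.mean(z**2)) < 0.02)              # E[z^2]    ~ 0 (2 vs 0 conjugates)
print(abs(np.mean(z**2 * z.conj())) < 0.02)   # E[z^2 z*] ~ 0 (odd order)
print(abs(np.mean(np.abs(z)**2) - 1) < 0.01)  # E[z z*] = 1 (balanced, survives)

# Rotation invariance of a balanced moment: e^{j a} z has the same power
a = 0.7
w = np.exp(1j * a) * z
print(abs(np.mean(np.abs(w)**2) - np.mean(np.abs(z)**2)) < 1e-10)
```

Only the moment with equal numbers of conjugated and nonconjugated factors survives; the others are zero up to sampling error, as the rotation argument predicts.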
We call a vector z pth-order circular [100] or pth-order proper if the only nonzero moments up to order p have the same number of conjugated and nonconjugated terms. In particular, all odd moments up to order p must be zero. Therefore, for zero-mean random vectors, the terms proper and
second-order circular are equivalent. We note that, while this
terminology is most common, there is not uniform agreement
in the literature. Some researchers use the terms “proper” and
“circular” interchangeably. More detailed discussions of higher
order statistics, Taylor series expansions, and characteristic
functions for improper random vectors are provided in [11],
[42], and [115].
D. Statistics of Complex-Valued Random Processes
In this section, we provide a very brief introduction to complex discrete-time random processes. Complex random
processes have been studied by [12], [33], [84], [100], [102],
[111], and [132], among others. For a comprehensive review,
we refer the reader to [115].
The covariance function of a complex-valued discrete-time random process x(n) is defined as

c(n₁, n₂) = E[x(n₁) x*(n₂)] − μ(n₁) μ*(n₂)

where the first term is the correlation function. To completely describe the second-order statistics, we also define the complementary covariance function (also called the pseudo-covariance or relation function) as

c̃(n₁, n₂) = E[x(n₁) x(n₂)] − μ(n₁) μ(n₂).

If the complementary covariance function vanishes for all pairs (n₁, n₂), then x(n) is called proper, otherwise improper. If and only if x(n) is zero-mean and proper is it second-order circular.
The definitions in the continuous-time case are completely
analogous.
A random signal is stationary if all its statistical prop-
erties are invariant to any given time-shift (translations of the
origin), or equivalently, if the family of distributions that de-
scribe the random process as a collection of random variables
is shift-invariant. To define wide-sense stationarity (WSS), we
therefore consider the complete second-order statistical char-
acterization including the complementary covariance function.
We call a complex random process x(n) WSS if and only if the mean μ(n), the covariance function c(n + τ, n), and the complementary covariance function c̃(n + τ, n) are all independent of n.
Since the complementary covariance function is commonly
ignored, the “traditional” definition of WSS for complex
processes only considers the covariance function, allowing a
shift-dependent complementary covariance function. Some re-
searchers then introduce a different term (such as second-order stationary [100]) to refer to the condition where both the
covariance and complementary covariance functions are
shift-invariant.
Let $x[n]$ be a second-order zero-mean WSS process, and let $\underline{x}[n] = [x[n], x^*[n]]^T$ denote the corresponding augmented process. We define the augmented power spectral density matrix of $x[n]$ as the discrete-time Fourier transform of the covariance function of $\underline{x}[n]$, which is given by
$$\underline{S}_{xx}(f) = \begin{bmatrix} S_{xx}(f) & \tilde{S}_{xx}(f) \\ \tilde{S}_{xx}^*(f) & S_{xx}(-f) \end{bmatrix}.$$
In this equation, $S_{xx}(f)$ is the power spectral density and $\tilde{S}_{xx}(f)$ is called the complementary power spectral density. When the complementary power spectral density vanishes for all $f$, $\tilde{S}_{xx}(f) = 0$, the process is proper.
The augmented power spectral density matrix is positive
semi-definite. As a consequence, there exists a WSS random process with power spectral density $S_{xx}(f)$ and complementary power spectral density $\tilde{S}_{xx}(f)$ if and only if [102]:
1) $S_{xx}(f) \ge 0$;
2) $\tilde{S}_{xx}(f) = \tilde{S}_{xx}(-f)$ (but $\tilde{S}_{xx}(f)$ is generally complex);
3) $|\tilde{S}_{xx}(f)|^2 \le S_{xx}(f)\,S_{xx}(-f)$.
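These admissibility conditions can be checked numerically on a frequency grid. The following is our own sketch with an assumed example pair of spectra; the particular functional forms are arbitrary and chosen only so that all three conditions hold.

```python
import numpy as np

# Candidate spectra on a symmetric normalized-frequency grid.
f = np.linspace(-0.5, 0.5, 257)
S = 1.0 + 0.5 * np.cos(2 * np.pi * f)                  # PSD: real, nonnegative
St = 0.4 * np.exp(1j * np.pi * np.cos(2 * np.pi * f))  # complementary PSD (complex)

S_neg = S[::-1]    # S(-f): the grid is symmetric about f = 0
St_neg = St[::-1]  # S~(-f)

cond1 = bool(np.all(S >= 0))                                # 1) S(f) >= 0
cond2 = bool(np.allclose(St, St_neg))                       # 2) S~(f) = S~(-f)
cond3 = bool(np.all(np.abs(St) ** 2 <= S * S_neg + 1e-12))  # 3) |S~(f)|^2 <= S(f) S(-f)
valid = cond1 and cond2 and cond3  # True: a WSS process with this pair exists
```

Violating condition 3, e.g. by scaling the complementary spectrum above the PSD envelope, would make the augmented power spectral density matrix indefinite.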
The third condition allows us to establish the classical result that analytic WSS signals without a DC component must be proper [100]. If $x[n]$ is analytic, then $S_{xx}(f) = 0$ for $f < 0$. If it does not have a DC component, then $S_{xx}(f) = 0$ for $f = 0$. Therefore, $S_{xx}(f)S_{xx}(-f) = 0$ for all $f$ implies $\tilde{S}_{xx}(f) = 0$. There are further connections between stationarity
and circularity, which are explored in detail in [100]. An al-
gorithm for the simulation of improper WSS processes having
specified covariance and complementary covariance functions
is given in [108].
E. Gaussian Random Variables and Vectors
Next, we look at the ever-important Gaussian distribution. In
order to derive the general complex multivariate Gaussian pdf
(proper/circular or improper/noncircular) for zero-mean $\mathbf{x}$
, we begin with the Gaussian pdf of the composite random vector of real and imaginary parts $\mathbf{z} = [\mathbf{x}_r^T, \mathbf{x}_i^T]^T$:
$$p(\mathbf{z}) = \frac{1}{(2\pi)^n \det^{1/2}(\mathbf{R}_{zz})} \exp\left(-\tfrac{1}{2}\,\mathbf{z}^T \mathbf{R}_{zz}^{-1}\mathbf{z}\right). \quad (17)$$
We first consider the nondegenerate case where all covariance matrices are invertible. Using $\underline{\mathbf{x}} = \mathbf{T}\mathbf{z}$, $\underline{\mathbf{R}}_{xx} = \mathbf{T}\mathbf{R}_{zz}\mathbf{T}^H$, and $\det \underline{\mathbf{R}}_{xx} = 4^n \det \mathbf{R}_{zz}$, where $\mathbf{T}$ was defined in (7), we obtain the pdf of complex $\mathbf{x}$ [101], [129]:
$$p(\mathbf{x}) = \frac{1}{\pi^n \det^{1/2}(\underline{\mathbf{R}}_{xx})} \exp\left(-\tfrac{1}{2}\,\underline{\mathbf{x}}^H \underline{\mathbf{R}}_{xx}^{-1}\underline{\mathbf{x}}\right). \quad (18)$$
This pdf depends algebraically on $\underline{\mathbf{x}}$, i.e., $\mathbf{x}$ and $\mathbf{x}^*$, but is interpreted as the joint pdf of $\mathbf{x}_r$ and $\mathbf{x}_i$, and can be used for proper/circular or improper/noncircular $\mathbf{x}$. In the past, the term "complex Gaussian distribution" often implicitly assumed circularity. Therefore, some researchers call a noncircular complex Gaussian random vector "generalized complex Gaussian." The simplification that occurs when $\tilde{\mathbf{R}}_{xx} = \mathbf{0}$ is obvious and leads to the pdf of a complex proper and circular Gaussian random vector [47], [139]:
$$p(\mathbf{x}) = \frac{1}{\pi^n \det(\mathbf{R}_{xx})} \exp\left(-\mathbf{x}^H \mathbf{R}_{xx}^{-1}\mathbf{x}\right).$$
A generalization of the Gaussian distribution is the family of
elliptical distributions. A study of complex elliptical distribu-
tions has been provided by [94] and in [115, sec. 2.3.4].
The scalar complex Gaussian: The case of a scalar Gaussian $x = x_r + jx_i$ provides some interesting insights [92], [115]. The real bivariate Gaussian vector $[x_r, x_i]^T$ can be characterized by $\sigma_r^2$ (the variance of $x_r$), $\sigma_i^2$ (the variance of $x_i$), and $\rho_{ri}$ (the correlation coefficient between $x_r$ and $x_i$). From (18), the pdf of the complex Gaussian variable $x$ can be expressed as
$$p(x) = \frac{1}{\pi\sigma_x^2\sqrt{1-|\rho|^2}} \exp\left(-\frac{|x|^2 - \mathrm{Re}(\rho^* x^2)}{\sigma_x^2(1-|\rho|^2)}\right), \quad (19)$$
where the complex correlation coefficient $\rho$ between $x$ and $x^*$ is defined as
$$\rho = \frac{\mathrm{E}\{x^2\}}{\mathrm{E}\{|x|^2\}} = \frac{\mathrm{E}\{x^2\}}{\sigma_x^2}.$$
So there are two equivalent parametrizations of a complex Gaussian $x$: We can use the three real parameters $\sigma_r^2$, $\sigma_i^2$, and $\rho_{ri}$, or the three real parameters $\sigma_x^2$, $|\rho|$, and $\angle\rho$. We can obtain the second set of parameters from the first set as
$$\sigma_x^2 = \sigma_r^2 + \sigma_i^2, \quad (20)$$
$$\rho = \frac{\sigma_r^2 - \sigma_i^2 + j\,2\rho_{ri}\sigma_r\sigma_i}{\sigma_r^2 + \sigma_i^2}, \quad (21)$$
and vice versa as
$$\sigma_r^2 = \frac{\sigma_x^2}{2}\left(1 + \mathrm{Re}\,\rho\right), \quad (22)$$
$$\sigma_i^2 = \frac{\sigma_x^2}{2}\left(1 - \mathrm{Re}\,\rho\right), \quad (23)$$
$$\rho_{ri} = \frac{\mathrm{Im}\,\rho}{\sqrt{1 - (\mathrm{Re}\,\rho)^2}}. \quad (24)$$
The complex correlation coefficient satisfies $0 \le |\rho| \le 1$. If $|\rho| = 1$, then $x = e^{j\angle\rho/2}u$ for a real random variable $u$ with probability 1. Equivalent conditions for $|\rho| = 1$ are $\sigma_r^2 = 0$, or $\sigma_i^2 = 0$, or $\rho_{ri} = \pm 1$. The first of these conditions makes the complex variable purely imaginary and the second makes it real. The third condition means $x_i = (\sigma_i/\sigma_r)x_r$ for $\rho_{ri} = 1$ and $x_i = -(\sigma_i/\sigma_r)x_r$ for $\rho_{ri} = -1$. All these cases with $|\rho| = 1$ are called maximally noncircular because the support of the pdf for the complex random variable $x$ degenerates into a line in the complex plane.
For $|\rho| < 1$, there are basically four different cases:
1) If $x_r$ and $x_i$ have identical variances, $\sigma_r^2 = \sigma_i^2$, and are independent, $\rho_{ri} = 0$, then $x$ is proper and circular, i.e., $\rho = 0$, and its pdf takes on the simple and much better-known form
$$p(x) = \frac{1}{\pi\sigma_x^2}\exp\left(-\frac{|x|^2}{\sigma_x^2}\right). \quad (25)$$
2) If $x_r$ and $x_i$ have different variances, $\sigma_r^2 \ne \sigma_i^2$, but $x_r$ and $x_i$ are still independent, $\rho_{ri} = 0$, then $\rho$ is real, $\angle\rho \in \{0, \pi\}$, and $x$ is noncircular.
3) If $x_r$ and $x_i$ have identical variances, $\sigma_r^2 = \sigma_i^2$, but $x_r$ and $x_i$ are correlated, $\rho_{ri} \ne 0$, then $\rho$ is purely imaginary, $\angle\rho = \pm\pi/2$, and $x$ is noncircular.
4) If $x_r$ and $x_i$ have different variances, $\sigma_r^2 \ne \sigma_i^2$, and are correlated, $\rho_{ri} \ne 0$, then $\rho$ is generally complex.
With $x = re^{j\theta}$ and $\rho = |\rho|e^{j\angle\rho}$, we see that the pdf in (19) is constant on the level curve $r^2(1 - |\rho|\cos(2\theta - \angle\rho)) = \text{const}$. This contour is an ellipse, and $r$ is maximum when $1 - |\rho|\cos(2\theta - \angle\rho)$ is minimum. This establishes that the ellipse orientation (the angle between the real axis and the major ellipse axis) is $\theta = \angle\rho/2$, which is half the angle of the complex correlation coefficient $\rho$. It can be shown (see [92]) that the degree of noncircularity is the square of the ellipse eccentricity.
Fig. 1 shows contours of constant probability density for cases 1–4 listed above. In Fig. 1(a), we see the circular case with $\rho = 0$, which exhibits circular contour lines. All remaining plots are noncircular, with elliptical contour lines. We can make two observations: First, increasing the degree of noncircularity of the signal by increasing $|\rho|$ leads to ellipses with greater eccentricity. Second, the angle of the ellipse orientation is half the angle of $\rho$, as shown above. Fig. 1(b) shows case 2: $x_r$ and $x_i$ have different variances but are still independent. In this situation, the ellipse orientation is either 0° or 90°, depending on whether $x_r$ or $x_i$ has greater variance. Fig. 1(c) shows case 3: $x_r$ and $x_i$ have the same variance but are now correlated. In this situation, the ellipse orientation is either 45° or 135°. The general case 4 is depicted in Fig. 1(d). Now, the ellipse can have an arbitrary orientation, which is controlled by the angle of $\rho$.
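The half-angle relation between $\rho$ and the ellipse orientation is easy to verify by simulation. The sketch below is our own illustration (not from the paper); the covariance values are arbitrary, and the orientation is taken as the principal-axis direction of the sample covariance of the real and imaginary parts.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200_000

# Arbitrary real covariance of (x_r, x_i): unequal variances, correlated
Cuv = np.array([[2.0, 0.6],
                [0.6, 0.5]])
L = np.linalg.cholesky(Cuv)
uv = L @ rng.normal(size=(2, N))
x = uv[0] + 1j * uv[1]

# Complex correlation coefficient rho = E{x^2} / E{|x|^2}
rho = np.mean(x ** 2) / np.mean(np.abs(x) ** 2)

# Ellipse orientation: direction of the principal axis of the real covariance
evals, evecs = np.linalg.eigh(np.cov(uv))
major = evecs[:, np.argmax(evals)]
theta = np.arctan2(major[1], major[0]) % np.pi  # orientation in [0, pi)

half_angle = (np.angle(rho) / 2) % np.pi        # should match theta
```

For this covariance, $\rho \approx 0.6 + 0.48j$, and both angles come out near 0.34 rad, confirming the half-angle rule.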
Marginal and von Mises distributions: With $x = re^{j\theta}$, $x_r = r\cos\theta$, $x_i = r\sin\theta$, it is possible to change variables and obtain the joint pdf $p(r, \theta)$ for the polar coordinates
Fig. 1. Probability-density contours of complex Gaussian random variables with different complex correlation coefficient $\rho$.
Fig. 2. Marginal pdfs for magnitude and angle in the bivariate Gaussian distribution with unit variance and various values for $|\rho|$.
where $r \ge 0$ and $-\pi \le \theta < \pi$. The marginal pdf for $r$ is obtained by integrating over $\theta$,
$$p(r) = \frac{2r}{\sigma_x^2\sqrt{1-|\rho|^2}} \exp\left(-\frac{r^2}{\sigma_x^2(1-|\rho|^2)}\right) I_0\left(\frac{|\rho|\,r^2}{\sigma_x^2(1-|\rho|^2)}\right), \quad (26)$$
where $I_0$ is the modified Bessel function of the first kind of order 0. The pdf $p(r)$ is invariant to $\angle\rho$. In the circular case $\rho = 0$, it is the Rayleigh pdf
$$p(r) = \frac{2r}{\sigma_x^2}\exp\left(-\frac{r^2}{\sigma_x^2}\right).$$
This suggests that we call $p(r)$ in (26) the improper/noncircular Rayleigh pdf. Integrating $p(r, \theta)$ over $r$ yields the marginal pdf for $\theta$:
$$p(\theta) = \frac{\sqrt{1-|\rho|^2}}{2\pi\left(1 - |\rho|\cos(2\theta - \angle\rho)\right)}.$$
In the circular case $\rho = 0$, this is a uniform distribution.
These marginals are illustrated in Fig. 2 for unit variance and various values of $|\rho|$. The larger $|\rho|$, the more noncircular $x$ becomes, and the marginal distribution of $\theta$ develops two peaks at $\theta = \angle\rho/2$ and $\theta = \angle\rho/2 - \pi$. At the same time, the maximum of the pdf of $r$ is shifted to the left. Having derived the joint and marginal distributions of $r$ and $\theta$, it is also straightforward to write down the von Mises pdf, which is the conditional distribution of $\theta$ given $r$:
$$p(\theta \mid r) = \frac{\exp\left(\kappa\cos(2\theta - \angle\rho)\right)}{2\pi I_0(\kappa)} \quad \text{with} \quad \kappa = \frac{|\rho|\,r^2}{\sigma_x^2(1-|\rho|^2)}.$$
All these results show that the parameters $\sigma_x^2$, $|\rho|$, $\angle\rho$ for complex $x$, rather than the parameters $\sigma_r^2$, $\sigma_i^2$, $\rho_{ri}$ for real $x_r$ and $x_i$, are the most natural parametrization for the joint and marginal pdfs of the polar coordinates $r$ and $\theta$.
It is worthwhile to comment on the fact that the Central Limit
Theorem still applies to noncircular random variables. That is,
adding more and more independent and identically distributed
noncircular random variables leads to a sample average that
is asymptotically Gaussian and noncircular. However, this ad-
dition has to be done coherently, preserving the phase of the
random samples. If a large number of noncircular random variables are added noncoherently, i.e., with randomized phases, then the phase information is washed out, and the resulting sample average becomes more and more circular.
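This coherent-versus-noncoherent effect is easy to demonstrate numerically. The following is our own sketch; the impropriety measure $|\mathrm{E}\{z^2\}|/\mathrm{E}\{|z|^2\}$ and all sizes are illustrative choices, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
K, trials = 64, 20_000

# Maximally noncircular i.i.d. samples (real-valued): E{x^2} = E{|x|^2}
x = rng.normal(size=(trials, K)).astype(complex)

# Coherent sum: phases preserved, so the normalized average stays noncircular
coh = x.sum(axis=1) / np.sqrt(K)

# Noncoherent sum: each term rotated by an independent random phase
phases = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, size=(trials, K)))
noncoh = (x * phases).sum(axis=1) / np.sqrt(K)

def impropriety(z):
    """|E{z^2}| / E{|z|^2}: 1 for maximally noncircular, 0 for proper."""
    return np.abs(np.mean(z ** 2)) / np.mean(np.abs(z) ** 2)

k_coh = impropriety(coh)        # remains close to 1
k_noncoh = impropriety(noncoh)  # washed out toward 0
```

The coherent average keeps the full impropriety of the summands, while the randomized-phase average is, for all practical purposes, circular.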
F. Examples
Fig. 3 shows scatter plots for three signals: (a) ice multiparameter imaging X-band radar (IPIX) data from the website http://soma.crl.mcmaster.ca/ipix/; (b) a 16-quadrature amplitude modulated (QAM) signal; and (c) wind data obtained from http://mesonet.agron.iastate.edu. Fig. 4 shows their corresponding covariance and complementary covariance functions. The radar signal in Fig. 4(a) is narrow-band. Evidently, the gain and phase of the in-phase and quadrature channels are matched, as the data appear circular (and therefore proper). The uniform phase is due to carrier
Fig. 3. Scatter plots for (a) circular, (b) proper but noncircular, and (c) improper (and thus noncircular) data.
Fig. 4. Covariance and complementary covariance function plots for the corresponding processes in Fig. 3: (a) circular; (b) proper but noncircular; and (c) improper.
Fig. 5. (a) Scatter plot of the average voxel values of the motor component estimated using ICA from 16 subjects. (b) Magnitude and (c) phase spatial maps using Mahalanobis $z$-score thresholding; only voxels above the threshold are shown.
phase fluctuation from pulse to pulse, and the amplitude fluctuations are due to variations in the scattering cross-section. The 16-QAM signal in Fig. 4(b) has zero complementary covariance function and is therefore proper (second-order circular).
However, its distribution is not rotationally invariant and there-
fore it is noncircular. The wind data in Fig. 4(c) is noncircular
and improper.
In Fig. 5(a), we show the scatter plot of a motor component
estimated using ICA of functional MRI data [106], which is nat-
urally represented as complex valued [4]. The paradigm used in
the collection of the data is a simple motor task with a box-car
type time-course, i.e., the stimulus has periodic on and off pe-
riods. As can be observed in the figure, the given fMRI motor component has a highly noncircular distribution. In Fig. 5(b) and (c), we show the spatial map for the same component using a Mahalanobis $z$-score threshold, which is defined as $z_v = \sqrt{(\mathbf{s}_v - \boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{s}_v - \boldsymbol{\mu})}$. In this expression, $\mathbf{s}_v$ is the vector of real and imaginary parts of the estimated source at voxel $v$, and $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ are the corresponding estimated spatial image mean vector and covariance matrix.
As demonstrated by these examples, noncircular signals do
arise in practice, even though circularity has been commonly as-
sumed in the derivation of many signal processing algorithms.
As we will elaborate, taking the noncircular or improper nature
of signals into account in their processing may provide signifi-
cant payoffs. It is worth noting that, in these examples, we have
classified signals as circular or noncircular simply by inspec-
tion of their scatter plots and estimated covariance functions.
But such classification should be done based on sound statistical arguments. This is the topic of the next section.
III. MODEL SELECTION
When should we take noncircularity into account? On the
one hand, if the signals are indeed noncircular, we would ex-
pect a noncircular model to capture their properties more ac-
curately. On the other hand, noncircular models have more de-
grees of freedom than circular models, and the principle of par-
simony says that one should seek simple models to avoid overfitting to noise fluctuations. So how do we choose between circular/proper and noncircular/improper models? This raises the
question of how to detect whether a given signal is circular or
noncircular, and more generally, how to detect the number of
circular and noncircular signals in a given signal subspace. The
latter problem can be combined with the detection of the dimen-
sion of the signal subspace so it becomes a simultaneous order
and model selection problem. For either problem, we show that
circular models should be preferred over noncircular models
when the SNR is low, the number of samples is small, or the
signals’ degree of noncircularity is low.
A. Circularity Coefficients
Throughout this section, we drop the subscripts on covariance matrices for economy. We first derive a maximal invariant
(a complete set of invariants) for the augmented covariance matrix $\underline{\mathbf{R}}$ under nonsingular strictly linear transformation. Such a set is given by the canonical correlations between $\mathbf{x}$ and its conjugate $\mathbf{x}^*$ [112], which [41] calls the circularity coefficients of $\mathbf{x}$. "Maximal invariant" means two things: 1) the circularity coefficients are invariant under nonsingular linear transformation and 2) if two jointly Gaussian random vectors $\mathbf{x}$ and $\mathbf{y}$ have the same circularity coefficients, then $\mathbf{x}$ and $\mathbf{y}$ are related by a nonsingular linear transformation, $\mathbf{y} = \mathbf{T}\mathbf{x}$.
Assuming $\mathbf{R}$ has full rank, the canonical correlations between $\mathbf{x}$ and $\mathbf{x}^*$ are determined by starting with the coherence matrix [116]
$$\mathbf{M} = \mathbf{R}^{-1/2}\,\tilde{\mathbf{R}}\,\mathbf{R}^{-T/2}. \quad (27)$$
Since $\mathbf{M}$ is complex symmetric, $\mathbf{M} = \mathbf{M}^T$, yet not Hermitian symmetric, i.e., $\mathbf{M} \ne \mathbf{M}^H$, there exists a special singular value decomposition (SVD), called the Takagi factorization [53], which is
$$\mathbf{M} = \mathbf{F}\mathbf{K}\mathbf{F}^T. \quad (28)$$
The complex matrix $\mathbf{F}$ is unitary, and $\mathbf{K} = \mathrm{diag}(k_1, \ldots, k_n)$ contains the canonical correlations $k_1 \ge k_2 \ge \cdots \ge k_n \ge 0$ on its diagonal. The squared canonical correlations are the eigenvalues of the squared coherence matrix $\mathbf{M}\mathbf{M}^*$, or equivalently, of the matrix $\mathbf{R}^{-1}\tilde{\mathbf{R}}\mathbf{R}^{-*}\tilde{\mathbf{R}}^*$ [116]. The canonical correlations between $\mathbf{x}$ and $\mathbf{x}^*$ are invariant to the choice of a square root for $\mathbf{R}$, and they are a maximal invariant for $\underline{\mathbf{R}}$ under nonsingular strictly linear transformation of $\mathbf{x}$. Therefore, any function of $\underline{\mathbf{R}}$ that is invariant under nonsingular strictly linear transformation must be a function of these canonical correlations only [112].
Following [41], we call these canonical correlations $k_i$ the circularity coefficients, and the set $\{k_1, \ldots, k_n\}$ the circularity spectrum of $\mathbf{x}$. However, the term "circularity coefficient" is not entirely accurate as the circularity coefficients only characterize second-order circularity, or (im)propriety. Thus, the name impropriety coefficients would have been more suitable. For a scalar random variable $x$, there is only one circularity coefficient $k = |\rho|$. For Gaussian $x$, $k$ characterizes the degree of noncircularity, and for non-Gaussian $x$, the degree of impropriety.
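The computation of circularity coefficients, and their invariance under nonsingular strictly linear transformations, can be sketched numerically. This is our own illustration (dimensions, mixing matrices, and sample size are arbitrary); the Takagi values are obtained here as ordinary singular values of the coherence matrix, which is valid because a complex symmetric matrix has the same singular values in both factorizations.

```python
import numpy as np

rng = np.random.default_rng(3)
n, N = 3, 100_000

def circularity_coefficients(x):
    """Singular values of the coherence matrix R^{-1/2} R~ R^{-T/2}."""
    N = x.shape[1]
    R = (x @ x.conj().T) / N   # sample covariance E{x x^H}
    Rt = (x @ x.T) / N         # sample complementary covariance E{x x^T}
    w, V = np.linalg.eigh(R)   # Hermitian inverse square root of R
    R_isqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.conj().T
    M = R_isqrt @ Rt @ R_isqrt.T  # coherence matrix (complex symmetric)
    return np.linalg.svd(M, compute_uv=False)  # descending, in [0, 1]

# Improper data: widely linear mixture of a circular Gaussian vector
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
B = 0.5 * (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
z = (rng.normal(size=(n, N)) + 1j * rng.normal(size=(n, N))) / np.sqrt(2)
x = A @ z + B @ z.conj()

k = circularity_coefficients(x)

# Maximal-invariant property: unchanged under a nonsingular linear transform
T = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
k_T = circularity_coefficients(T @ x)
```

Because the sample covariances transform congruently under $\mathbf{y} = \mathbf{T}\mathbf{x}$, the invariance holds exactly, not just asymptotically.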
Differential entropy¹: The entropy of a complex random vector $\mathbf{x}$ is defined to be the entropy of the real composite vector $[\mathbf{x}_r^T, \mathbf{x}_i^T]^T$. The differential entropy of a complex Gaussian random vector with augmented covariance matrix $\underline{\mathbf{R}}$ is thus
$$H(\mathbf{x}) = \log\left((\pi e)^n \det{}^{1/2}\underline{\mathbf{R}}\right).$$
Combining our results so far, we may factor $\underline{\mathbf{R}}$ as
$$\underline{\mathbf{R}} = \begin{bmatrix}\mathbf{R}^{1/2} & \mathbf{0}\\ \mathbf{0} & \mathbf{R}^{*/2}\end{bmatrix} \begin{bmatrix}\mathbf{I} & \mathbf{M}\\ \mathbf{M}^* & \mathbf{I}\end{bmatrix} \begin{bmatrix}\mathbf{R}^{H/2} & \mathbf{0}\\ \mathbf{0} & \mathbf{R}^{T/2}\end{bmatrix}. \quad (29)$$
Note that each factor is an augmented matrix. This factorization establishes
$$\det\underline{\mathbf{R}} = \det{}^2(\mathbf{R}) \prod_{i=1}^{n}\left(1 - k_i^2\right). \quad (30)$$
This allows us to write the entropy of a complex noncircular Gaussian random vector as [41], [112]
$$H(\mathbf{x}) = \log\left((\pi e)^n \det\mathbf{R}\right) + \frac{1}{2}\sum_{i=1}^{n}\log\left(1 - k_i^2\right), \quad (31)$$
where the first term is the entropy of a circular Gaussian random vector with the same Hermitian covariance matrix $\mathbf{R}$ (but $\tilde{\mathbf{R}} = \mathbf{0}$). The entropy is maximized if and only if $\mathbf{x}$ is circular. If $\mathbf{x}$ is noncircular, the loss in entropy compared to the circular case is given by the second term in (31), which is a function of the circularity spectrum. Thus, this term can be used as a measure for the degree of impropriety of a random vector, and it is also a test statistic in the generalized likelihood ratio test for noncircularity. Other measures for the degree of impropriety have been proposed and studied by [112].
B. Testing for Circularity
In this section, we present hypothesis tests for circularity
based on a generalized likelihood ratio test (GLRT). In a GLRT, the unknown parameters ($\mathbf{R}$ and $\tilde{\mathbf{R}}$ in our case) are replaced by maximum likelihood estimates. The GLR is always invariant to transformations for which the hypothesis testing problem itself is invariant [58]. As propriety is preserved by strictly linear, but not widely linear, transformations, the hypothesis test must
¹Since we do not consider discrete random variables in this paper, we refer to differential entropy simply as entropy from now on.
8/2/2019 Complex Processing
http://slidepdf.com/reader/full/complex-processing 11/25
ADALI et al.: COMPLEX-VALUED SIGNAL PROCESSING: THE PROPER WAY TO DEAL WITH IMPROPRIETY 5111
be invariant to strictly linear, but not widely linear, transformations. Since the GLR must be a function of a maximal invariant statistic, the GLR is a function of the circularity coefficients.
Let $\mathbf{x}$ be a complex zero-mean Gaussian random vector with augmented covariance matrix $\underline{\mathbf{R}}$. Consider $N$ independent and identically distributed (i.i.d.) random samples drawn from this distribution and arranged in the sample matrix $\mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_N]$, and let $\underline{\mathbf{X}} = [\underline{\mathbf{x}}_1, \ldots, \underline{\mathbf{x}}_N]$ denote the corresponding augmented sample matrix. The joint probability density function of these samples is
$$p(\underline{\mathbf{X}}) = \frac{1}{\pi^{nN}\det^{N/2}(\underline{\mathbf{R}})} \exp\left(-\frac{N}{2}\,\mathrm{tr}\!\left(\underline{\mathbf{R}}^{-1}\underline{\mathbf{S}}\right)\right), \quad (32)$$
with augmented sample covariance matrix
$$\underline{\mathbf{S}} = \frac{1}{N}\,\underline{\mathbf{X}}\,\underline{\mathbf{X}}^H.$$
The GLR test of the hypotheses $H_0$: $\mathbf{x}$ is circular vs. $H_1$: $\mathbf{x}$ is noncircular compares the GLRT statistic
$$\ell = \frac{\max_{\tilde{\mathbf{R}} = \mathbf{0}}\, p(\underline{\mathbf{X}})}{\max_{\underline{\mathbf{R}}}\, p(\underline{\mathbf{X}})} \quad (33)$$
to a threshold. This statistic is the ratio of the likelihood with $\underline{\mathbf{R}}$ constrained to have zero off-diagonal blocks, $\tilde{\mathbf{R}} = \mathbf{0}$, to the likelihood with $\underline{\mathbf{R}}$ unconstrained. We are thus testing whether or not $\underline{\mathbf{R}}$ is block-diagonal. The unconstrained maximum likelihood (ML) estimate of $\underline{\mathbf{R}}$ is the augmented sample covariance matrix $\underline{\mathbf{S}}$. The ML estimate of $\underline{\mathbf{R}}$ under the constraint $\tilde{\mathbf{R}} = \mathbf{0}$ is
$$\hat{\underline{\mathbf{R}}}_0 = \begin{bmatrix}\mathbf{S} & \mathbf{0}\\ \mathbf{0} & \mathbf{S}^*\end{bmatrix}.$$
After a little algebra, the GLR can be expressed as [94], [116]
$$\ell^{2/N} = \frac{\det\underline{\mathbf{S}}}{\det^2(\mathbf{S})} = \prod_{i=1}^{n}\left(1 - \hat{k}_i^2\right). \quad (34)$$
In this equation, $\hat{k}_i$ denotes the estimated circularity coefficients, which are computed from the augmented sample covariance matrix $\underline{\mathbf{S}}$. A full-rank implementation of this test relies on the middle expression in (34). A reduced-rank implementation that considers only the largest estimated circularity coefficients is based on the last expression in (34).
This test was first proposed by [13], in complex notation by [94], and the connection with canonical correlations was established by [116]. It is shown in [13] that a different statistic, rather than the GLR, yields the locally most powerful (LMP) test for circularity. An LMP test has the highest possible power (i.e., probability of detection) for $H_1$ close to $H_0$, where all circularity coefficients are small. Testing for circularity is reexamined by [133], which studies the null distributions of the GLR and LMP statistics, and derives a distributional approximation for the GLR. It also shows that no uniformly most powerful (UMP) test exists for this problem because the GLR and the LMP tests are different for dimensions $n \ge 2$. (In the scalar case $n = 1$, the GLR and LMP tests are identical.)
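Using the last expression in (34), the Gauss-GLR reduces to a product over estimated circularity coefficients. The following is our own minimal sketch (sample sizes are arbitrary); it contrasts proper samples, where the statistic stays near 1, with maximally improper (real-valued) samples, where it collapses to 0.

```python
import numpy as np

rng = np.random.default_rng(4)
n, N = 2, 5_000

def gauss_glr(x):
    """prod(1 - k_i^2) over estimated circularity coefficients:
    close to 1 for proper data, close to 0 for maximally improper data."""
    N = x.shape[1]
    R = (x @ x.conj().T) / N
    Rt = (x @ x.T) / N
    w, V = np.linalg.eigh(R)
    R_isqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.conj().T
    k = np.linalg.svd(R_isqrt @ Rt @ R_isqrt.T, compute_uv=False)
    return float(np.prod(1.0 - k ** 2))

# H0: proper (circular) complex Gaussian samples
x0 = (rng.normal(size=(n, N)) + 1j * rng.normal(size=(n, N))) / np.sqrt(2)
# H1: maximally improper samples (real-valued, so x = x*)
x1 = rng.normal(size=(n, N)).astype(complex)

glr0 = gauss_glr(x0)  # near 1: no evidence of impropriety
glr1 = gauss_glr(x1)  # near 0: strong evidence of impropriety
```

In a practical detector, the statistic would be compared with a threshold chosen from its null distribution for a specified probability of false alarm.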
Extensions of this test to non-Gaussian data: There have been
recent extensions of this test to non-Gaussian data. In [95], the
test is made asymptotically robust with respect to violations of
Gaussianity, provided the distribution remains elliptically sym-
metric. This is achieved by dividing the GLRT statistic by an
estimated normalized fourth-order moment (which is closely
related to the kurtosis). Generation and estimation of samples
from a complex generalized Gaussian distribution (GGD) and
estimation of its parameters is addressed in [90] and a circu-
larity test for the complex GGD is given in [89]. These results
have been recently extended to elliptically symmetric distribu-
tions [93].
The complex GGD is a complex elliptically symmetric dis-
tribution with a simple probabilistic characterization. It allows
for a direct connection to the Gaussian test given in (34). If $x$ is a scalar complex GGD random variable, it has pdf
(35)
where $c > 0$ is the shape parameter, $\Gamma(\cdot)$ is the Gamma function, and $\underline{\mathbf{R}}$ is the augmented covariance matrix. The complex GGD comprises a large number of symmetric distributions, from super-Gaussian (for $c < 1$), Gaussian (for $c = 1$), to sub-Gaussian (for $c > 1$), including other distributions such as the bivariate Laplacian. As for all complex elliptically symmetric distributions, zero mean and propriety is equivalent to circularity.
A GLRT statistic based on the complex GGD, as derived by [89], is given by
(36)
Here, $\hat{c}$ is the ML estimate of the shape parameter $c$, and $\hat{\underline{\mathbf{R}}}_0$ and $\hat{\underline{\mathbf{R}}}_1$ are the ML estimates of the augmented covariance matrix under $H_0$ and $H_1$, respectively. These ML estimates have been derived in [90]. Asymptotically for large $N$, the statistic is $\chi^2$-distributed with two degrees of freedom. This allows choosing a threshold to yield a specified probability of false alarm. In the scalar Gaussian case, with $c = 1$, the last term in (36) vanishes, and the complex GGD and Gaussian GLRT become equivalent. Using the complex GGD model and the ML estimators for its parameters, it is also possible to design a test for Gaussianity [89].
To compare the performance of the different circularity detectors, we generate independent realizations from a complex GGD with unit variance, as in [89].²
Fig. 6(a)–(c) shows the probability of detection versus the degree of impropriety for the three detectors: the Gauss-GLRT in (34), the adjusted GLRT (Adj-GLRT) [95], and
²Matlab code to implement the detectors and to generate complex GGD samples can be found at http://mlsp.umbc.edu/resources.
Fig. 6. Detection performance of circularity detectors versus degree of impropriety, for sub-Gaussian, Gaussian, and super-Gaussian complex GGD samples. Probability of false alarm is fixed at 0.01.
the complex GGD-GLRT in (36). Each data point in the fig-
ures is the result of the average of 1000 runs, with the detec-
tion threshold set for a probability of false alarm of 0.01. As
can be observed in the figures, all three detectors yield similar performance for sub-Gaussian and identical performance for Gaussian data. So the Gauss-GLRT detector performs well even with sub-Gaussian data. When the samples are super-Gaussian, however, the complex GGD-GLRT significantly outperforms the other two. In [89], examples are presented to show that the circularity test based on the complex GGD model provides good performance even with data that are not complex GGD, such as BPSK data.
C. Order Selection With a General Signal Subspace Model
Determining the effective order of the signal subspace is an
important problem in many signal processing and communica-
tions applications. The popular solution proposed by Wax and
Kailath [134] uses information theoretic criteria by considering
principal component analysis (PCA) of the observed data to select the order. However, the model assumes circular multivariate Gaussian signals, and is suboptimal in the presence of noncircular signals in the subspace [73].
In [73], a noncircular PCA (ncPCA) approach is proposed
along with an ML procedure for estimating the free parameters
in the model. The procedure can be used for model selection in
the presence of circular Gaussian noise, whereby the numbers
of circular and noncircular signals are determined together with
the total model order. The method reduces to Wax and Kailath’s
signal detection method [134] when all signals are circular, but
may provide a significant performance gain in the presence of
noncircular signals. We first introduce the model for ncPCA and
then demonstrate its use for model selection.
Given a set of observations $\mathbf{x}[n]$, with $n$ as the observation index, we assume the linear model
$$\mathbf{x}[n] = \mathbf{A}\mathbf{s}[n] + \mathbf{v}[n], \quad (37)$$
where $\mathbf{s}[n]$ is the $r$-dimensional complex-valued signal vector, $\mathbf{A}$ is an $m \times r$ complex-valued matrix with full column rank, $r \le m$, and $\mathbf{v}[n]$ is the $m$-dimensional complex-valued random vector modeling the additive noise. Given the
set of observations $\mathbf{x}[n]$, $n = 1, \ldots, N$, a key step in many applications is to determine the dimension $r$ of the signal subspace. Traditional PCA provides a decomposition into signal and noise subspaces. In ncPCA, the $r$-dimensional signal subspace is further decomposed into noncircular and circular signals. As in [134], the noise term $\mathbf{v}[n]$ is assumed to be a circular, isotropic, stationary, complex Gaussian random process with zero mean, statistically independent of the signals, and the signals are assumed to be multivariate Gaussian. However, in contrast to [134], we let $r_{nc}$ signals be noncircular and $r - r_{nc}$ signals be circular. The underlying assumption here is that the rank of the source covariance matrix is $r$ and that of the complementary covariance matrix is $r_{nc}$. In the degenerate case, the actual numbers of signals and of noncircular signals can be greater than $r$ and/or $r_{nc}$.
In order to determine the orders $r$ and $r_{nc}$, given $N$ snapshots of $\mathbf{x}[n]$, $n = 1, \ldots, N$, we write the likelihood of the observations as
(38)
where $\Theta$ denotes the set of all adjustable parameters in the likelihood function.
For the model in (37), the covariance and complementary covariance matrices of $\mathbf{x}[n]$ have the parametric forms [72], [73]
$$\mathbf{R} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^H + \sigma^2\mathbf{I}, \qquad \tilde{\mathbf{R}} = \tilde{\mathbf{U}}\tilde{\boldsymbol{\Lambda}}\tilde{\mathbf{U}}^T, \quad (39)$$
where $\mathbf{U}$ is an $m \times r$ complex-valued matrix with orthonormal columns spanning the signal subspace, $\boldsymbol{\Lambda}$ is a diagonal matrix with diagonal elements $\lambda_1, \ldots, \lambda_r$, $\sigma^2$ is the noise variance, $\tilde{\mathbf{U}}$ is an $m \times r_{nc}$ complex-valued matrix with orthonormal columns, and $\tilde{\boldsymbol{\Lambda}}$ is a diagonal matrix with complex-valued entries $\tilde{\lambda}_1, \ldots, \tilde{\lambda}_{r_{nc}}$. The absolute values of these entries equal the circularity coefficients. Compared to the Takagi factorization in (28), which leads to nonnegative circularity coefficients, the decomposition in (39) incorporates an additional phase factor into the quantities $\tilde{\lambda}_i$, making them complex valued. This is simply a notational convenience [73].
Given the ML estimates for all the free parameters in (38), we may follow [134] and select $r$ and $r_{nc}$ using information-theoretic criteria such as Akaike's information criterion (AIC) [8], the Bayesian information criterion (BIC) [118], or the minimum description length (MDL) [105]. This leads to the estimated orders
(40)
where $\hat{\Theta}$ is the optimal parameter set that maximizes the likelihood for given orders, which can be obtained as in [72], and results in the likelihood
(41)
The penalty term in (40) contains the degrees of freedom of the parameter set, and depends on the number of samples and the chosen criterion. For example, in the BIC (or the MDL criterion) [105], [118], the penalty is half the number of degrees of freedom multiplied by $\log N$.
The signal detection method given in [134] can be obtained as a special case without noncircular sources ($r_{nc} = 0$). In the presence of noncircular signals, the presented approach will lead to a smaller data-fit term in (40). At the same time, however, a noncircular model has more degrees of freedom, so the penalty term in (40) will increase. This requires the right tradeoff, resulting in a good model fit without overfitting. The following example demonstrates this tradeoff and shows that a circular model can be preferable when the noise level is high, the degree of noncircularity is low, and/or the number of samples is small.
In this example, seven Gaussian sources of unit variance and identical degree of noncircularity are mixed through a randomly chosen 20 × 7 matrix, after which circular Gaussian noise is added to the mixture. We study the BIC of a circular model and of a noncircular model. The gain of the noncircular model over the circular model is defined as (BIC of circular model)$/N$ minus (BIC of noncircular model)$/N$. It is clear that a positive gain suggests that the noncircular model is preferred over the circular model.
Fig. 7 shows the overall information-theoretic (BIC) gain of
using a noncircular model for varying degree of noncircularity,
SNR, and sample size. The results are averaged over 1000 in-
dependent runs. We observe that the noncircular model is pre-
ferred when there is ample evidence that the signals are noncircular: the cases of large degree of noncircularity, high SNR, and large sample sizes. On the other hand, the simpler circular
model is preferred when there is scarce evidence that the signals
are noncircular: the cases where the degree of noncircularity is
low, the SNR is low, or the number of samples is small. The
ncPCA approach can model both circular and noncircular sig-
nals and hence can avoid the use of an unnecessarily complex
noncircular model when a circular model should be preferred.
We also give an example in array signal processing to
demonstrate the direct performance gain using ncPCA rather
than a circular model [134] for signal subspace estimation
and model selection. We model three far-field, independent narrowband sources (one BPSK signal, one QPSK signal, and one 8-QAM signal, with degrees of impropriety of 1, 0, and 2/3, and variances 1, 2, and 6, respectively) emitting plane
Fig. 7. BIC gain of noncircular over circular model for (a) varying degree of noncircularity, (b) varying SNR, and (c) varying sample size. Each simulation point is averaged over 1000 independent runs. A positive gain suggests that the noncircular model is preferred over the circular model.
Fig. 8. Comparison of probability of detection (Pd) and subspace distance gain for (a) ncPCA and (b) circular PCA (cPCA), as a function of SNR.
waves impinging upon a uniform linear array of sensors with half-wavelength inter-sensor spacing. The received observations are
$$\mathbf{x}[n] = \sum_{k=1}^{3}\mathbf{a}(\theta_k)\,s_k[n] + \mathbf{v}[n],$$
where $n$ is the snapshot index, $s_k[n]$ is the waveform of the $k$th source, $\mathbf{v}[n]$ the circular antenna noise, $\mathbf{a}(\theta_k)$ is the steering vector associated with the $k$th source, and $\theta_k$ is the direction-of-arrival (DOA) of the $k$th source. In Fig. 8, we show the gain using ncPCA compared to a circular model [134] both in terms of probability of detection and
cular model [134] both in terms of probability of detection and
in terms of subspace distance. Probability of detection is de-
fined as the fraction of trials where the order was detected correctly, and subspace distance is the squared Euclidean distance
between the estimated and the true signal subspace. Each sim-
ulation point is averaged over 1000 independent runs. As ob-
served in Fig. 8, ncPCA outperforms circular PCA by approxi-
mately 4.0 dB in SNR. This holds for both the detection of the
order $r$ alone and the joint detection of the orders $(r, r_{nc})$, which
perform almost identically for different SNR levels. In addition,
ncPCA consistently leads to a smaller subspace distance. Fur-
ther examples and additional discussion on the performance of
ncPCA are given in [72].
IV. WIDELY LINEAR ESTIMATION
In Section II-B, we have discussed widely linear (linear-con-
jugate linear) transformations, which allow access to the
information contained in the complementary covariance. In
this section, we consider widely linear minimum mean-square
error (WLMMSE) estimation [103], widely linear minimum
variance distortionless response (WLMVDR) estimation [26],[27], [29], [77], and the widely linear least-mean-square (LMS)
algorithm. Using the augmented vector and matrix notation,
many of the results for widely linear estimation are straight-
forward extensions of the corresponding results for linear
estimation.
A. Widely Linear MMSE Estimation
We begin with WLMMSE estimation of the message (or signal) $\mathbf{s}$ from the measurement $\mathbf{x}$. To extend results for LMMSE estimation to WLMMSE estimation, we need only replace the signal $\mathbf{s}$ by the augmented signal $\underline{\mathbf{s}}$ and the measurement $\mathbf{x}$ by the augmented measurement $\underline{\mathbf{x}}$, and proceed as usual. Thus, most of the results for LMMSE
estimation apply straightforwardly to WLMMSE estimation. It
is, however, still worthwhile to summarize some of these re-
sults and to compare LMMSE with WLMMSE estimation. The
widely linear estimator is
$$\hat{\underline{\mathbf{s}}} = \underline{\mathbf{W}}\,\underline{\mathbf{x}}, \quad (42)$$
where $\underline{\mathbf{W}}$ is determined such that the mean-square error $\mathrm{E}\,\|\underline{\mathbf{s}} - \underline{\mathbf{W}}\,\underline{\mathbf{x}}\|^2$ is minimized. Keep in mind that the augmented notation on the left-hand side of (42) is simply a convenient, but redundant, representation of the right-hand side of (42). Obviously, it is sufficient to estimate $\mathbf{s}$ because $\mathbf{s}^*$ can be obtained from $\hat{\mathbf{s}}$ through conjugation.
The WLMMSE estimator is found by applying the orthogonality principle $\mathrm{E}\{(\mathbf{s} - \hat{\mathbf{s}})\mathbf{x}^H\} = \mathbf{0}$ and $\mathrm{E}\{(\mathbf{s} - \hat{\mathbf{s}})\mathbf{x}^T\} = \mathbf{0}$ [103], or equivalently, $\mathrm{E}\{(\underline{\mathbf{s}} - \hat{\underline{\mathbf{s}}})\underline{\mathbf{x}}^H\} = \mathbf{0}$. This says that the error between the augmented estimator and the augmented signal must be orthogonal to the augmented measurement. This leads to
$$\underline{\mathbf{W}}\,\underline{\mathbf{R}}_{xx} = \underline{\mathbf{R}}_{sx},$$
and thus
$$\underline{\mathbf{W}} = \underline{\mathbf{R}}_{sx}\,\underline{\mathbf{R}}_{xx}^{-1}. \quad (43)$$
Thus, $\hat{\mathbf{s}} = \mathbf{W}_1\mathbf{x} + \mathbf{W}_2\mathbf{x}^*$, or equivalently [103], the weights $\mathbf{W}_1$ and $\mathbf{W}_2$ can be expressed in terms of the covariance and complementary covariance matrices of $\mathbf{s}$ and $\mathbf{x}$. In these expressions, the Schur complement $\mathbf{R}_{xx} - \tilde{\mathbf{R}}_{xx}\mathbf{R}_{xx}^{-*}\tilde{\mathbf{R}}_{xx}^*$ is the error covariance matrix for linearly estimating $\mathbf{x}$ from $\mathbf{x}^*$. The augmented error covariance matrix of the error vector $\underline{\mathbf{e}} = \underline{\mathbf{s}} - \hat{\underline{\mathbf{s}}}$ is
$$\underline{\mathbf{Q}} = \underline{\mathbf{R}}_{ss} - \underline{\mathbf{R}}_{sx}\,\underline{\mathbf{R}}_{xx}^{-1}\,\underline{\mathbf{R}}_{sx}^H.$$
A competing estimator $\hat{\underline{\mathbf{s}}}' = \underline{\mathbf{W}}'\,\underline{\mathbf{x}}$ will produce an augmented error with covariance matrix
$$\underline{\mathbf{Q}}' = \underline{\mathbf{Q}} + (\underline{\mathbf{W}}' - \underline{\mathbf{W}})\,\underline{\mathbf{R}}_{xx}\,(\underline{\mathbf{W}}' - \underline{\mathbf{W}})^H, \quad (44)$$
which shows $\underline{\mathbf{Q}}' \succeq \underline{\mathbf{Q}}$. As a consequence, all real-valued increasing functions of the error covariance matrix are minimized, in particular, $\mathrm{tr}\,\underline{\mathbf{Q}}$ and $\det\underline{\mathbf{Q}}$.
These statements hold for the error vector as well as the
augmented error vector because . The
error covariance matrix of the error vector is the
NW block of the augmented error covariance matrix , which
can be evaluated as
(45)
A particular choice for a generally suboptimum filter is the LMMSE filter $\hat{\mathbf{x}}_{\mathrm{L}} = \mathbf{R}_{xy}\mathbf{R}_{yy}^{-1}\mathbf{y}$, which ignores complementary covariance matrices.

Special cases: If the signal $\mathbf{x}$ is real, we have $\underline{\mathbf{x}} = [\mathbf{x}^{T}, \mathbf{x}^{T}]^{T}$. This leads to the simplified expression $\hat{\mathbf{x}} = \mathbf{W}_{1}\mathbf{y} + \mathbf{W}_{1}^{*}\mathbf{y}^{*} = 2\operatorname{Re}\{\mathbf{W}_{1}\mathbf{y}\}$. While the WLMMSE estimate of a real signal from a complex signal is always real [103], the LMMSE estimate is generally complex.
The WLMMSE and LMMSE estimates are identical if and only if the error of the LMMSE estimate is orthogonal to $\mathbf{y}^{*}$, i.e.,

$$E[(\hat{\mathbf{x}}_{\mathrm{L}} - \mathbf{x})\,\mathbf{y}^{T}] = \mathbf{0}. \qquad (46)$$

There are two important special cases where (46) holds:
• The signal and measurement are cross-proper, i.e., $\widetilde{\mathbf{R}}_{xy} = \mathbf{0}$, and the measurement is proper, $\widetilde{\mathbf{R}}_{yy} = \mathbf{0}$. Joint propriety of $\mathbf{x}$ and $\mathbf{y}$ will suffice, but it is not necessary that $\mathbf{x}$ be proper.
• The measurement is maximally improper, i.e., $\mathbf{y}^{*} = c\,\mathbf{y}$ with probability 1 for a constant $c$ with $|c| = 1$. In this case, $\widetilde{\mathbf{R}}_{yy} = c^{*}\mathbf{R}_{yy}$ and $\widetilde{\mathbf{R}}_{xy} = c^{*}\mathbf{R}_{xy}$. WL estimation is unnecessary since $\mathbf{y}$ and $\mathbf{y}^{*}$ both carry exactly the same information about $\mathbf{x}$. This is irrespective of whether or not $\mathbf{x}$ is proper.

In these cases, WLMMSE estimation has no performance advantage over LMMSE estimation. The other extreme case is where a WL operation allows perfect estimation while LMMSE estimation yields a nonzero estimation error. An example of such a case is $y = x + jn$, where the signal $x$ is real and the noise $jn$ purely imaginary. Here the WL operation $(y + y^{*})/2$ yields a perfect estimate of $x$, whereas the LMMSE estimate is not even real valued. If $y = x + n$ with proper and white noise $n$, then the maximum performance advantage of WLMMSE estimation over LMMSE estimation is a factor of 2 [117].
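The two extremes can be checked numerically. The following sketch (a toy of our own choosing, with unit-variance real Gaussian $x$ and $n$; not code from the paper) builds the WLMMSE and LMMSE filters from sample augmented statistics for the example $y = x + jn$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance of the example in the text: y = x + j*n with real signal x
# and purely imaginary noise j*n.  The widely linear (WL) estimator recovers
# x exactly via (y + y*)/2; the strictly linear MMSE estimator cannot.
N = 100_000
x = rng.standard_normal(N)              # real, zero-mean, unit-variance signal
n = rng.standard_normal(N)              # real noise amplitude
y = x + 1j * n                          # complex measurement

# Augmented measurement [y; y*] and its sample second-order statistics.
y_aug = np.vstack([y, y.conj()])                 # shape (2, N)
R_aug = y_aug @ y_aug.conj().T / N               # augmented covariance (2x2)
p_aug = y_aug @ x.conj() / N                     # cross term E[y_aug x*]

# WLMMSE weights (estimate is w^H y_aug) from the augmented normal equations.
w_wl = np.linalg.solve(R_aug, p_aug)
mse_wl = np.mean(np.abs(x - w_wl.conj() @ y_aug) ** 2)

# Strictly linear MMSE: ignore the conjugate channel entirely.
w_lin = (y @ x.conj() / N) / np.mean(np.abs(y) ** 2)
mse_lin = np.mean(np.abs(x - w_lin.conjugate() * y) ** 2)

print(mse_wl, mse_lin)   # WL MSE ~ 0; linear MSE ~ 0.5
```

With the widely linear filter the in-sample error is essentially zero, since $x = (y + y^{*})/2$ lies in the span of the augmented measurement.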
WLMMSE estimation is optimum in the Gaussian case, but may be improved upon for non-Gaussian data if we have access to higher order statistics. The next logical extension of widely linear processing is to widely linear-quadratic processing [28], [30], [115], which requires statistical information up to fourth order. We should add the cautionary note here that there is no difference between the optimum, generally nonlinear, conditional mean estimator $E[\mathbf{x} \mid \mathbf{y}]$ and $E[\mathbf{x} \mid \mathbf{y}, \mathbf{y}^{*}]$. Conditioning on $\mathbf{y}$ and $\mathbf{y}^{*}$ changes nothing, since $\mathbf{y}$ already extracts all the information there is about $\mathbf{x}$. So the “widely nonlinear” estimator is simply $E[\mathbf{x} \mid \mathbf{y}]$.
B. Widely Linear MVDR Estimation
Next, we extend linear minimum-variance distortionless response (LMVDR) estimators to widely linear MVDR estimators that account for complementary covariance. The measurement model is

$$\mathbf{y} = \mathbf{H}\boldsymbol{\alpha} + \mathbf{n} \qquad (47)$$

where the matrix $\mathbf{H}$ consists of modes that are assumed to be known (as, for instance, in beamforming), and the vector $\boldsymbol{\alpha}$ consists of complex deterministic amplitudes. Without loss of generality we assume $\mathbf{H}^{H}\mathbf{H} = \mathbf{I}$. The noise $\mathbf{n}$ has covariance matrix $\mathbf{R}_{nn}$. Because $\boldsymbol{\alpha}$ is modeled as unknown but deterministic, the covariance matrix of the measurement equals the covariance matrix of the noise: $\mathbf{R}_{yy} = \mathbf{R}_{nn}$. The matched filter estimator of $\boldsymbol{\alpha}$ is

$$\hat{\boldsymbol{\alpha}}_{\mathrm{MF}} = \mathbf{H}^{H}\mathbf{y} \qquad (48)$$

with mean $\boldsymbol{\alpha}$ and error covariance matrix $\mathbf{H}^{H}\mathbf{R}_{nn}\mathbf{H}$. The LMVDR estimator $\hat{\boldsymbol{\alpha}} = \mathbf{W}^{H}\mathbf{y}$, with weight matrix $\mathbf{W}$, is derived by minimizing the trace of the error covariance under an unbiasedness constraint:

$$\min_{\mathbf{W}} \operatorname{tr}(\mathbf{W}^{H}\mathbf{R}_{nn}\mathbf{W}) \quad \text{under constraint} \quad \mathbf{W}^{H}\mathbf{H} = \mathbf{I}.$$

The solution is

$$\mathbf{W} = \mathbf{R}_{nn}^{-1}\mathbf{H}\,(\mathbf{H}^{H}\mathbf{R}_{nn}^{-1}\mathbf{H})^{-1} \qquad (49)$$

with mean $\boldsymbol{\alpha}$ and error covariance matrix $(\mathbf{H}^{H}\mathbf{R}_{nn}^{-1}\mathbf{H})^{-1}$. If there is only a single mode $\mathbf{h}$, then the solution is

$$\mathbf{w} = \frac{\mathbf{R}_{nn}^{-1}\mathbf{h}}{\mathbf{h}^{H}\mathbf{R}_{nn}^{-1}\mathbf{h}} \qquad (50)$$

and the error variance is $(\mathbf{h}^{H}\mathbf{R}_{nn}^{-1}\mathbf{h})^{-1}$.
The measurement model (47) in augmented form is

$$\underline{\mathbf{y}} = \underline{\mathbf{H}}\,\underline{\boldsymbol{\alpha}} + \underline{\mathbf{n}}, \quad \text{where} \quad \underline{\mathbf{H}} = \begin{bmatrix}\mathbf{H} & \mathbf{0} \\ \mathbf{0} & \mathbf{H}^{*}\end{bmatrix}$$

and the noise $\underline{\mathbf{n}}$ is generally improper with augmented covariance matrix $\underline{\mathbf{R}}_{nn}$. The matched filter estimator of $\boldsymbol{\alpha}$, however, does not take into account noise, and thus the widely linear matched filter solution is still the linear solution (48). The WLMVDR estimator, on the other hand, is obtained as the solution to

$$\min_{\underline{\mathbf{W}}} \operatorname{tr}(\underline{\mathbf{W}}^{H}\underline{\mathbf{R}}_{nn}\underline{\mathbf{W}}) \quad \text{under constraint} \quad \underline{\mathbf{W}}^{H}\underline{\mathbf{H}} = \mathbf{I} \qquad (51)$$

with $\underline{\mathbf{W}}$ an augmented matrix with blocks $\mathbf{W}_{1}$ and $\mathbf{W}_{2}$ as in (42). This solution is widely linear [26], [29], [77],

$$\underline{\mathbf{W}} = \underline{\mathbf{R}}_{nn}^{-1}\underline{\mathbf{H}}\,(\underline{\mathbf{H}}^{H}\underline{\mathbf{R}}_{nn}^{-1}\underline{\mathbf{H}})^{-1} \qquad (52)$$

with mean $\underline{\boldsymbol{\alpha}}$ and augmented error covariance matrix $(\underline{\mathbf{H}}^{H}\underline{\mathbf{R}}_{nn}^{-1}\underline{\mathbf{H}})^{-1}$. The variance of the WLMVDR estimator is less than or equal to the variance of the LMVDR estimator because of the following argument. The optimization (51) is performed under the constraint $\underline{\mathbf{W}}^{H}\underline{\mathbf{H}} = \mathbf{I}$, or equivalently, $\mathbf{W}_{1}^{H}\mathbf{H} = \mathbf{I}$ and $\mathbf{W}_{2}^{H}\mathbf{H}^{*} = \mathbf{0}$. Thus, the WLMVDR optimization problem contains the LMVDR optimization problem as a special case in which $\mathbf{W}_{2} = \mathbf{0}$ is enforced. As such, WLMVDR estimation cannot be worse than LMVDR estimation. However, the additional degree of freedom of being able to choose $\mathbf{W}_{2} \neq \mathbf{0}$ can reduce variance if the noise is improper.
The reduction in variance of the WLMVDR compared to the LMVDR estimator is entirely due to exploiting the complementary correlation of the noise $\mathbf{n}$. Indeed, it is easy to see that for proper noise $\mathbf{n}$, the WLMVDR solution (52) simplifies to the LMVDR solution (49). Since $\boldsymbol{\alpha}$ is not assigned statistical properties, the solution is independent of whether or not $\boldsymbol{\alpha}$ is improper. This stands in marked contrast to WLMMSE estimation discussed in the previous section. A detailed analysis of the WLMVDR estimator is provided by [26]. Further results are presented in [27].
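The variance comparison can be evaluated directly from the covariances, without simulation. The sketch below uses a toy setup of our own choosing (a single real mode in improper noise whose per-sensor complementary variances alternate in sign); it is an illustration of (49)-(52), not code from the paper:

```python
import numpy as np

# Single known mode h in improper noise; compare LMVDR and WLMVDR error
# variances computed from the exact covariances.  Noise model (an assumption
# for illustration): per-sensor variance 2, complementary variances +/-1.8j.
M = 4
h = np.ones(M, dtype=complex) / np.sqrt(M)         # unit-norm mode
R = 2.0 * np.eye(M)                                # noise covariance E[n n^H]
C = 1.8j * np.diag([1.0, -1.0, 1.0, -1.0])         # complementary E[n n^T]

# LMVDR (49)-(50): error variance 1 / (h^H R^{-1} h).
var_lmvdr = 1.0 / np.real(h.conj() @ np.linalg.solve(R, h))

# Augmented quantities: R_aug = [[R, C], [C*, R*]], H_aug = blkdiag(h, h*).
R_aug = np.block([[R, C], [C.conj(), R.conj()]])
H_aug = np.zeros((2 * M, 2), dtype=complex)
H_aug[:M, 0] = h
H_aug[M:, 1] = h.conj()

# WLMVDR (52): augmented error covariance (H_aug^H R_aug^{-1} H_aug)^{-1};
# the variance of the estimate of alpha is its (1,1) entry.
G = H_aug.conj().T @ np.linalg.solve(R_aug, H_aug)
var_wlmvdr = np.real(np.linalg.inv(G)[0, 0])

print(var_lmvdr, var_wlmvdr)   # WLMVDR variance is smaller: the noise is improper
```

For this toy geometry the WLMVDR variance drops from 2 to 0.38; note that the gain comes entirely from the improper noise, as the text states.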
C. Linear and Widely Linear Filtering and the LMS Algorithm
In this section, we consider the linear and widely linear filtering problem, where we estimate a scalar signal $d(n)$ from the vector of observations taken over $M$ time instants, $\mathbf{y}(n) = [y(n), y(n-1), \ldots, y(n-M+1)]^{T}$. We first derive the normal equations and the LMS algorithm [135], [136], and then extend them to the widely linear case. The linear estimate of $d(n)$ is $\hat{d}(n) = \mathbf{w}^{H}\mathbf{y}(n)$. For convenience, we will suppress the time-dependency of the filter $\mathbf{w}$. We could derive the solution for $\mathbf{w}$ using orthogonality arguments as in (43). Alternatively, we can take the Wirtinger derivative of the linear mean-square error term $J(\mathbf{w}) = E|d(n) - \mathbf{w}^{H}\mathbf{y}(n)|^{2}$, which is real valued, with respect to $\mathbf{w}^{*}$ (by treating $\mathbf{w}$ as a constant):

$$\frac{\partial J}{\partial \mathbf{w}^{*}} = E[\mathbf{y}(n)\mathbf{y}^{H}(n)]\,\mathbf{w} - E[\mathbf{y}(n)d^{*}(n)]. \qquad (53)$$

By setting (53) to zero and assuming a nonsingular covariance matrix, we obtain the normal equations
$\mathbf{R}_{yy}\mathbf{w} = \mathbf{r}_{yd}$, where $\mathbf{R}_{yy} = E[\mathbf{y}(n)\mathbf{y}^{H}(n)]$ and $\mathbf{r}_{yd} = E[\mathbf{y}(n)d^{*}(n)]$. If $y(n)$ and $d(n)$ are jointly WSS, this equation is independent of $n$. The weight vector can also be computed adaptively using gradient descent updates

$$\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\,E[\mathbf{y}(n)e^{*}(n)], \qquad e(n) = d(n) - \mathbf{w}^{H}(n)\mathbf{y}(n) \qquad (54)$$

or using stochastic gradient updates

$$\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\,\mathbf{y}(n)e^{*}(n)$$

which replace the expected value in (54) with its instantaneous estimate. This is the popular LMS algorithm. The stepsize $\mu$ determines the tradeoff between the rate of convergence and the minimum MSE.
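The recursion is a few lines of code. The sketch below is a system-identification toy of our own choosing (names such as `h` and `u` are ours), using the stochastic gradient update obtained from the Wirtinger derivative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Complex LMS sketch: identify an unknown FIR response h from input/desired
# pairs.  Error e = d - w^H u; stochastic gradient update (instantaneous
# version of (54)): w <- w + mu * u * e*.
M = 4
h = (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)

w = np.zeros(M, dtype=complex)
mu = 0.01
for _ in range(20_000):
    u = (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)
    noise = 0.01 * (rng.standard_normal() + 1j * rng.standard_normal())
    d = h.conj() @ u + noise          # desired response of the unknown system
    e = d - w.conj() @ u              # a priori estimation error
    w = w + mu * u * np.conj(e)       # LMS update

print(np.linalg.norm(w - h))          # small residual: w has converged toward h
```

Here the circular white input has covariance $\mathbf{I}$, so the condition number is 1 and convergence is fast, which anticipates the eigenvalue-spread discussion below.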
Widely linear filter and LMS algorithm: As shown in (42), a widely linear filter forms the estimate of $d(n)$ through the inner product

$$\hat{d}(n) = \underline{\mathbf{w}}^{H}\underline{\mathbf{y}}(n) \qquad (55)$$

where the weight vector $\underline{\mathbf{w}} = [\mathbf{w}_{1}^{T}, \mathbf{w}_{2}^{T}]^{T}$ has twice the dimension of the linear filter. The minimization of the widely linear MSE cost is obtained analogously by setting $\partial J/\partial \underline{\mathbf{w}}^{*} = \mathbf{0}$. This results in the widely linear complex normal equation

$$\underline{\mathbf{R}}_{yy}\,\underline{\mathbf{w}} = \underline{\mathbf{r}}_{yd}$$

where $\underline{\mathbf{r}}_{yd} = E[\underline{\mathbf{y}}(n)d^{*}(n)]$. The difference between the widely linear MSE and the linear MSE is given in [103]:

(56)

The widely linear LMS algorithm is written similarly to the linear case as

$$\underline{\mathbf{w}}(n+1) = \underline{\mathbf{w}}(n) + \mu\,\underline{\mathbf{y}}(n)e^{*}(n) \qquad (57)$$

where $\mu$ is the stepsize and $e(n) = d(n) - \underline{\mathbf{w}}^{H}(n)\underline{\mathbf{y}}(n)$.
The study of the LMS filter has been an active research topic. A thorough account of its properties is given in [50], [74], based on the different types of assumptions that can be invoked to simplify the analysis. With the augmented vector notation, most of the results for the linear LMS filter can be readily extended to the widely linear case, although care must be taken with assumptions such as uncorrelatedness and orthogonality of the signals [6], [38].
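In augmented notation, the widely linear LMS update (57) is just the linear update applied to the stacked regressor. A sketch, with a hypothetical widely linear plant $d = \mathbf{g}_{1}^{H}\mathbf{u} + \mathbf{g}_{2}^{H}\mathbf{u}^{*}$ of our own choosing (all names are ours):

```python
import numpy as np

rng = np.random.default_rng(2)

def wl_lms_step(w, u, d, mu):
    """One widely linear LMS update (57): linear LMS applied to the
    augmented regressor [u; u*] with a weight vector of twice the length."""
    ua = np.concatenate([u, u.conj()])
    e = d - w.conj() @ ua
    return w + mu * ua * np.conj(e), e

# Identify a widely linear system d = g1^H u + g2^H u* (noise-free toy).
M = 3
g1 = rng.standard_normal(M) + 1j * rng.standard_normal(M)
g2 = rng.standard_normal(M) + 1j * rng.standard_normal(M)

w = np.zeros(2 * M, dtype=complex)
for _ in range(30_000):
    u = (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)
    d = g1.conj() @ u + g2.conj() @ u.conj()
    w, _ = wl_lms_step(w, u, d, mu=0.01)

print(np.linalg.norm(w - np.concatenate([g1, g2])))   # converges to [g1; g2]
```

Because the input here is circular, the augmented covariance is the identity and the widely linear filter converges as fast as the linear one; the impropriety penalty discussed next does not apply.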
The convergence of the LMS algorithm depends on the eigenvalues of the input covariance matrix [21], [50]. For the widely linear LMS filter, this is the augmented covariance matrix. A measure typically used for the eigenvalue disparity of a given matrix is the condition number. For a Hermitian matrix $\mathbf{R}$, the condition number is the ratio of largest to smallest eigenvalue: $\kappa = \lambda_{\max}/\lambda_{\min}$. When the signal is proper, the augmented covariance matrix is block-diagonal and has eigenvalues that occur with even multiplicity. In this case, the condition numbers of the augmented covariance matrix $\underline{\mathbf{R}}$ and the Hermitian covariance matrix $\mathbf{R}$ are the same. If the signal is improper, then the eigenvalue spread of $\underline{\mathbf{R}}$ is always greater than the eigenvalue spread of $\mathbf{R}$. The maximally improper case leads to the most spread out eigenvalues. This can be shown via majorization theory [115], [117].
Thus, the MSE performance advantage of the widely linear
LMS algorithm for improper signals comes at the price of
slower convergence compared to the linear LMS algorithm.
An update scheme such as the recursive least squares (RLS)
algorithm [50], which is less sensitive to the eigenvalue spread,
may be preferable in some cases. In the next example, we demonstrate the impact of impropriety on the convergence of the LMS algorithm, and show that when the underlying system is linear, there is no performance gain with a widely linear model—a simple point, but not always acknowledged in the literature.
D. Examples
We define a random process

$$s(n) = \sqrt{\tfrac{1+\rho}{2}}\,v_{1}(n) + j\sqrt{\tfrac{1-\rho}{2}}\,v_{2}(n) \qquad (58)$$

where $v_{1}(n)$ and $v_{2}(n)$ are two uncorrelated real-valued random processes, both Gaussian distributed with zero mean and unit variance. By changing the value of $\rho \in [-1, 1]$, we can change the degree of noncircularity of $s(n)$: the variance of $s(n)$ is 1 and its complementary variance is $E[s^{2}(n)] = \rho$. For $\rho = 0$, the random process $s(n)$ becomes circular (and hence proper), whereas $\rho = 1$ and $\rho = -1$ lead to maximally noncircular $s(n)$. We then construct $x(n)$ by passing $s(n)$ through a linear system with impulse response $h(n)$, normalized to ensure unit weight norm, and we also add uncorrelated white circular Gaussian noise at a signal-to-noise ratio of 20 dB.
Because $x(n)$ was obtained from $s(n)$ as the output of a linear system plus uncorrelated noise, the optimum filter for estimating $x(n)$ from $s(n)$ is obviously linear. However, we use this simple example to demonstrate what would happen if one were to use a widely linear filter instead. In order to estimate $x(n)$ using the LMS algorithm, we assemble $M$ snapshots of $s(n)$ in the vector $\mathbf{s}(n)$. Its covariance matrix is $\mathbf{R} = \mathbf{I}$, and its complementary covariance matrix is $\widetilde{\mathbf{R}} = \rho\,\mathbf{I}$. The eigenvalues of the augmented covariance matrix can be shown to be $1 + \rho$ and $1 - \rho$, each with multiplicity $M$. Hence, the condition number is $\kappa = (1 + |\rho|)/(1 - |\rho|)$, i.e., $\kappa = 1$ for $\rho = 0$ and $\kappa = 19$ for $\rho = 0.9$.
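This eigenvalue disparity is easy to reproduce numerically. The sketch below (helper names are ours) samples the scalar process (58) and computes the condition number of its 2×2 augmented covariance:

```python
import numpy as np

rng = np.random.default_rng(3)

def noncircular_process(rho, n):
    """Sample (58): s = sqrt((1+rho)/2) * v1 + 1j * sqrt((1-rho)/2) * v2."""
    v1 = rng.standard_normal(n)
    v2 = rng.standard_normal(n)
    return np.sqrt((1 + rho) / 2) * v1 + 1j * np.sqrt((1 - rho) / 2) * v2

def aug_condition_number(s):
    """Condition number of the 2x2 augmented covariance of a scalar process."""
    r = np.mean(np.abs(s) ** 2)            # variance
    c = np.mean(s ** 2)                    # complementary variance (~ rho here)
    R_aug = np.array([[r, c], [np.conj(c), r]])
    ev = np.linalg.eigvalsh(R_aug)         # real eigenvalues, ascending
    return ev[-1] / ev[0]

k_proper = aug_condition_number(noncircular_process(0.0, 200_000))
k_improper = aug_condition_number(noncircular_process(0.9, 200_000))
print(k_proper, k_improper)    # ~1 and ~(1 + 0.9)/(1 - 0.9) = 19
```

The eigenvalues are $r \pm |c|$, so the sample condition numbers land near 1 and 19, matching the values quoted for the LMS experiment.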
In Fig. 9, we show the convergence behavior of a linear and a widely linear LMS filter for estimating $x(n)$ from $\mathbf{s}(n)$. The step size is fixed at the same value for all runs, and we choose the length $M$ of the filter to match the length of the system's impulse response. Fig. 9(a) shows the learning curve for a circular input, i.e., $\rho = 0$, and Fig. 9(b) for a noncircular input with $\rho = 0.9$. The figures confirm that the linear and widely linear filters
Fig. 9. Convergence of the linear and widely linear LMS algorithm for a linear system, for (a) circular input and (b) noncircular input.
Fig. 10. Convergence of the linear and widely linear LMS algorithm for a widely linear system, for (a) circular input and (b) noncircular input.
yield the same steady-state mean-square error values for circular and noncircular $s(n)$. However, in the noncircular case, the use of the widely linear LMS filter is even detrimental because of its decreased convergence rate. This is due to the fact that, for circular $s(n)$, the condition numbers for both $\mathbf{R}$ and $\underline{\mathbf{R}}$ are unity, whereas for noncircular $s(n)$, $\kappa(\mathbf{R}) = 1$ but $\kappa(\underline{\mathbf{R}}) = 19$.

Let us now consider a widely linear system, which filters both $s(n)$ and $s^{*}(n)$, with the same $s(n)$ as before. Fig. 10 shows the corresponding learning curves for the linear and widely linear LMS filters with the same settings for the simulation as before. As can be observed in the figures, the widely linear filter provides smaller MSE for both circular and noncircular $s(n)$. For circular input $s(n)$, the reduction in MSE by using the widely linear filter can be obtained from (56). However, this performance gain again comes at the price of slower convergence due to the increased eigenvalue spread.
V. I NDEPENDENT COMPONENT A NALYSIS
Data-driven signal processing methods are based on simple
generative models and hence can minimize the assumptions
on the nature of the data. They have emerged as promising
alternatives to the traditional model-based approaches in many
signal processing applications where the underlying dynamics
are hard to characterize. The most common generative model
used in data-driven decompositions is a linear mixing model,
where the mixture (the given set of observations) is written as
a linear combination of source components, i.e., it is decomposed into a mixing matrix and a source matrix. For uniqueness of the decomposition (subject to a scaling and permutation ambiguity), constraints such as sparsity, nonnegativity, or independence are applied to the two matrices. ICA is a popular data-driven blind source separation technique that imposes
the constraint of statistical independence on the components,
i.e., the source distributions. It has been successfully applied
to numerous signal processing problems in areas as diverse
as biomedicine, communications, finance, and remote sensing
[5], [34], [35], [56].
We consider the typical ICA problem, where $\mathbf{x}$ is a linear mixture of independent components (sources) $\mathbf{s}$, as described by

$$\mathbf{x} = \mathbf{A}\mathbf{s} \qquad (59)$$

where the mixing matrix $\mathbf{A}$ is square, i.e., the number of sources and observations are equal, and nonsingular. The objective is to blindly recover the sources from the observations $\mathbf{x}$, without knowledge of $\mathbf{A}$, using the demixing (or separating) matrix $\mathbf{W}$ such that the source estimates are $\mathbf{y} = \mathbf{W}\mathbf{x}$, where ideally $\mathbf{W} = \mathbf{A}^{-1}$. Arbitrary scaling of $\mathbf{s}$, i.e., multiplication by a diagonal matrix (which may have complex entries), and reordering the components of $\mathbf{s}$, i.e., multiplication by a permutation matrix, preserve the independence of its components. The product of a diagonal and a permutation matrix is a monomial matrix, which has exactly one nonzero entry in each
column and row. Hence, we can determine $\mathbf{W}$ only up to multiplication with a monomial matrix.
A limitation of ICA for the real-valued case is that when the sources are white—i.e., we cannot exploit the sample correlation structure in the data—we can only allow one Gaussian source in the mixture for successful separation. In the complex case, on the other hand, we can perform ICA of multiple Gaussian sources as long as they all have distinct circularity coefficients. In addition, the complex domain enables the separation of improper sources through a direct application of the invariance property of the circularity coefficients. When all the sources in the mixture are improper with distinct circularity coefficients, we can achieve ICA through joint diagonalization of the covariance and complementary covariance matrices using the strong uncorrelating transform (SUT) [41], [63]. For the real-valued case, separation using second-order statistics can be achieved only when the sources have sample correlation.

Using higher order statistical information, we can perform ICA for any type of distribution, circular or noncircular, as long as there are no two complex Gaussian sources with the same circularity coefficient [41]. To achieve ICA, we can either compute
the higher order statistics explicitly, or we can generate them implicitly through the use of nonlinear functions. Among the former group is JADE (which stands for joint approximate diagonalization of eigenmatrices) [24], which computes cumulants. JADE can be used directly for ICA of complex-valued data. A recent extension of this algorithm [122] enables joint diagonalization of matrices that can be Hermitian or complex symmetric. Hence, it allows for more efficient ICA solutions by considering also the complementary statistics, which are neglected in the original formulation of JADE.
Algorithms that rely on joint diagonalization of cumulant matrices are robust. However, their performance suffers as the number of sources increases, and the cost of computing and diagonalizing cumulant matrices may become prohibitive for separating a large number of sources. ICA techniques that use nonlinear functions to implicitly generate higher order statistics may present attractive alternatives. Among these are ML-ICA [99], information maximization (Infomax) [16], and maximization of non-Gaussianity (e.g., the FastICA algorithm) [55], which are all intimately related to each other. These algorithms can be easily extended to the complex domain using Wirtinger calculus, as shown in [7]. In addition, one can develop complex ICA algorithms that adapt to different source distributions, using general models such as complex generalized Gaussian distributions [90] as in [87], or more flexible models through efficient entropy estimation techniques [69].
A. ICA Using Second-Order Statistics
We first present the second-order approach to ICA [40], [41], [63], which is based on the fact that the circularity coefficients of $\mathbf{s}$ are invariant under the linear mixing transformation [113]. We discussed the circularity coefficients in Section III-A. There is a corresponding coordinate system, called the canonical coordinate system, where the latent description has identity correlation matrix, $\mathbf{I}$, and diagonal complementary correlation matrix with the circularity coefficients $k_{1} \geq k_{2} \geq \cdots \geq k_{N}$ on the diagonal, $\mathbf{K} = \operatorname{diag}(k_{1}, \ldots, k_{N})$.

Fig. 11. Two-channel model for second-order complex ICA. The vertical arrows show the complementary correlation matrix between the upper and lower lines.
In [41], vectors that are uncorrelated with unit variance, but possibly improper, are called strongly uncorrelated, and the transformation that maps $\mathbf{x}$ into canonical coordinates is called the strong uncorrelating transform (SUT). The SUT is found as $\mathbf{A}_{\mathrm{SUT}} = \mathbf{F}^{H}\mathbf{R}_{xx}^{-1/2}$, where the unitary matrix $\mathbf{F}$ is obtained from the Takagi factorization (28) of the coherence matrix of $\mathbf{x}$. If all circularity coefficients are distinct and nonzero, the SUT is unique up to the sign of its rows [41].
We will now show that the SUT $\mathbf{A}_{\mathrm{SUT}}$, computed from the mixture $\mathbf{x}$, is a separating matrix for the complex linear ICA problem provided that all circularity coefficients are distinct. The assumption of independent components in $\mathbf{s}$ in the given ICA model (59) implies that the correlation matrix $\mathbf{R}_{ss}$ and the complementary correlation matrix $\widetilde{\mathbf{R}}_{ss}$ are both diagonal. It is therefore easy to compute canonical coordinates between $\mathbf{s}$ and $\mathbf{s}^{*}$, denoted by $\boldsymbol{\varepsilon}$. In the SUT of $\mathbf{s}$, $\boldsymbol{\varepsilon} = \mathbf{P}\boldsymbol{\Lambda}\mathbf{s}$, $\boldsymbol{\Lambda}$ is a diagonal scaling matrix, and $\mathbf{P}$ is a permutation matrix that rearranges the canonical coordinates such that $\varepsilon_{1}$ corresponds to the largest circularity coefficient $k_{1}$, $\varepsilon_{2}$ to the second largest coefficient $k_{2}$, and so on. This makes the SUT of $\mathbf{s}$ monomial. As a consequence, $\boldsymbol{\varepsilon}$ has independent components.

The mixture $\mathbf{x} = \mathbf{A}\mathbf{s}$ has correlation matrix $\mathbf{R}_{xx} = \mathbf{A}\mathbf{R}_{ss}\mathbf{A}^{H}$ and complementary correlation matrix $\widetilde{\mathbf{R}}_{xx} = \mathbf{A}\widetilde{\mathbf{R}}_{ss}\mathbf{A}^{T}$. The canonical coordinates between $\mathbf{x}$ and $\mathbf{x}^{*}$ are computed as $\boldsymbol{\xi} = \mathbf{A}_{\mathrm{SUT}}\mathbf{x}$, and the SUT $\mathbf{A}_{\mathrm{SUT}}$ is determined analogously to the SUT of $\mathbf{s}$.

Fig. 11 shows the connection between the different coordinate systems. The important observation is that $\boldsymbol{\varepsilon}$ and $\boldsymbol{\xi}$ are both in canonical coordinates with the same circularity coefficients $k_{i}$. It remains to show that $\boldsymbol{\varepsilon}$ and $\boldsymbol{\xi}$ are related by a diagonal unitary matrix $\mathbf{Q}$ as $\boldsymbol{\xi} = \mathbf{Q}\boldsymbol{\varepsilon}$, provided that all circularity coefficients are distinct. Since $\boldsymbol{\varepsilon}$ and $\boldsymbol{\xi}$ are both in canonical coordinates with the same diagonal canonical correlation matrix $\mathbf{K}$,

$$\mathbf{Q}\mathbf{Q}^{H} = \mathbf{I} \quad \text{and} \quad \mathbf{Q}\mathbf{K}\mathbf{Q}^{T} = \mathbf{K}.$$
This shows that $\mathbf{Q}$ is unitary and $\mathbf{Q}\mathbf{K} = \mathbf{K}\mathbf{Q}^{*}$. The latter can only be true if $q_{ij} = 0$ whenever $k_{i} \neq k_{j}$. Therefore, $\mathbf{Q}$ is diagonal and unitary if all circularity coefficients are distinct. Since $\mathbf{K}$ is real, the diagonal entries of $\mathbf{Q}$ corresponding to all nonzero circularity coefficients are actually $\pm 1$. Thus, $\boldsymbol{\xi}$ has independent components because $\boldsymbol{\varepsilon}$ has independent components. Hence, we have shown that the SUT $\mathbf{A}_{\mathrm{SUT}}$ is a separating matrix for the complex linear ICA problem if all circularity coefficients are distinct, and the ICA solution is $\mathbf{y} = \mathbf{A}_{\mathrm{SUT}}\mathbf{x} = \boldsymbol{\xi}$ with $\mathbf{W} = \mathbf{A}_{\mathrm{SUT}}$.

This development requires that correlation and complementary correlation matrices exist. For some distributions (e.g., the Cauchy distribution) this is not the case. Ollila and Koivunen [96] have presented a generalization of the SUT that works for such distributions.
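The SUT construction can be sketched end to end. The code below is our own illustration (an SVD-based Takagi construction, valid when the singular values are distinct; all function names are ours); it separates two improper Gaussian sources with circularity coefficients 0.8 and 0.2:

```python
import numpy as np

rng = np.random.default_rng(4)

def takagi(M):
    """Takagi factorization M = F @ diag(k) @ F.T of a complex symmetric M
    (SVD-based construction, assuming distinct singular values)."""
    U, s, Vh = np.linalg.svd(M)
    phi = np.diag(U.conj().T @ Vh.T)   # conj(V) = U @ diag(phi), |phi_i| = 1
    F = U * np.sqrt(phi)               # scale column j of U by phi_j^{1/2}
    return F, s

def sut(X):
    """Strong uncorrelating transform of the rows-as-channels data X:
    returns A_SUT and the estimated circularity coefficients."""
    n = X.shape[1]
    R = X @ X.conj().T / n             # covariance
    C = X @ X.T / n                    # complementary covariance
    ew, Ev = np.linalg.eigh(R)
    R_isqrt = Ev @ np.diag(ew ** -0.5) @ Ev.conj().T   # whitening R^{-1/2}
    M_coh = R_isqrt @ C @ R_isqrt.T    # coherence matrix (complex symmetric)
    F, k = takagi(M_coh)
    return F.conj().T @ R_isqrt, k     # A_SUT = F^H R^{-1/2}

def source(rho, n):
    """Improper Gaussian source as in (58), circularity coefficient |rho|."""
    return (np.sqrt((1 + rho) / 2) * rng.standard_normal(n)
            + 1j * np.sqrt((1 - rho) / 2) * rng.standard_normal(n))

N = 200_000
S = np.vstack([source(0.8, N), source(0.2, N)])
A = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
X = A @ S

W, k = sut(X)
G = W @ A                      # should be (nearly) monomial
print(np.round(k, 3))          # ~ [0.8, 0.2]
```

Because the circularity coefficients are distinct, the product $\mathbf{W}\mathbf{A}$ comes out nearly diagonal up to sign, as the derivation above predicts.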
A recent source separation algorithm, entropy rate minimization (ERM) [71], uses second-order statistics along with sample correlation to improve the overall separation performance; the second-order source separation by the SUT described above becomes a special case of the ERM algorithm.
B. ICA Using Higher Order Statistics
As discussed in Section V-A, ICA can be achieved using only second-order statistics as long as all the sources are noncircular with distinct circularity coefficients. Using higher order statistics, however, one can develop more powerful ICA algorithms that achieve separation under more general conditions, separating the sources as long as there are no two complex Gaussians with the same circularity coefficient in the given mixture.
A natural cost for achieving independence is mutual information, which can be expressed as the Kullback–Leibler distance between the joint density of the source estimates and the product of the marginal source densities:

$$I(\mathbf{y}) = \sum_{n=1}^{N} H(y_{n}) - H(\mathbf{x}) - \log\left|\det\mathbf{W}\right|^{2} \qquad (60)$$

where $H(\cdot)$ denotes differential entropy and $y_{n} = \mathbf{w}_{n}^{H}\mathbf{x}$ is the estimate of source $n$, $n = 1, \ldots, N$. To write (60), we used the Jacobian transformation

$$p_{\mathbf{y}}(\mathbf{y}) = \frac{p_{\mathbf{x}}(\mathbf{x})}{\left|\det\mathbf{W}\right|^{2}} \qquad (61)$$

where $|\det\mathbf{W}|^{2}$ is the Jacobian determinant of the real representation of the complex transformation $\mathbf{y} = \mathbf{W}\mathbf{x}$. When we have $T$ independent samples $\mathbf{x}(1), \ldots, \mathbf{x}(T)$, we can use the mean ergodic theorem to write the first term in (60), the total source entropy, as

$$\sum_{n=1}^{N} H(y_{n}) \approx -\frac{1}{T}\sum_{t=1}^{T}\sum_{n=1}^{N}\log p_{n}\big(\mathbf{w}_{n}^{H}\mathbf{x}(t)\big) \qquad (62)$$

where $\mathbf{w}_{n}^{H}$ denotes the $n$th row of $\mathbf{W}$ and $p_{n}$ is the pdf of source $n$. Since $H(\mathbf{x})$ is constant with respect to the weight matrix, it is easy to see that the minimization of the mutual information given in (60) is equivalent to maximum likelihood estimation for the given samples. The maximum likelihood weight matrix is found by maximizing

$$\mathcal{L}(\mathbf{W}) = \frac{1}{T}\sum_{t=1}^{T}\sum_{n=1}^{N}\log p_{n}\big(\mathbf{w}_{n}^{H}\mathbf{x}(t)\big) + \log\left|\det\mathbf{W}\right|^{2}. \qquad (63)$$

In writing (63), we identified the model densities with the true source densities, ignoring the scaling and permutation ambiguity to simplify the notation. We have also omitted the sample index of $\mathbf{x}$ where possible for economy.
A simple but effective learning rule for maximizing the likelihood function can be obtained by directly using Wirtinger derivatives, as explained in Section II-A. We write the real-valued cost function in terms of $\mathbf{W}$ and $\mathbf{W}^{*}$, and derive the relative (natural) gradient updates for the demixing matrix [7]

$$\Delta\mathbf{W} = \mu\left(\mathbf{I} - \boldsymbol{\psi}(\mathbf{y})\mathbf{y}^{H}\right)\mathbf{W} \qquad (64)$$

where the $n$th element of the score function $\boldsymbol{\psi}(\mathbf{y})$ is the Wirtinger derivative

$$\psi_{n}(y_{n}) = -\frac{\partial \log p_{n}(y_{n})}{\partial y_{n}^{*}}. \qquad (65)$$

This way the development avoids the need for simplifying assumptions such as circularity [14].
Another natural cost function for performing ICA is negentropy, which measures the entropic distance of a given distribution from the Gaussian. Independence is achieved by moving the distribution of each transformed mixture $y_{n} = \mathbf{w}_{n}^{H}\mathbf{x}$—the independent source estimate—away from the Gaussian, i.e., by maximizing non-Gaussianity. Note that in this case, the objective function is defined for each individual source rather than for the whole vector $\mathbf{y}$ as in ML ICA. With negentropy maximization as the objective, the cost function for estimating all sources is written as in (62), under the constraint that $\mathbf{W}$ be unitary. Since $\log|\det\mathbf{W}|^{2} = 0$ for unitary $\mathbf{W}$, the second term in (63) vanishes. So maximizing likelihood and maximizing negentropy are equivalent goals for unitary $\mathbf{W}$.

If we approximate the negentropy—or, equivalently, the entropy (62) under a variance constraint—using a nonlinear function $G$ as

$$J(\mathbf{w}) = E\left[G\big(|\mathbf{w}^{H}\mathbf{x}|^{2}\big)\right] \qquad (66)$$

we can estimate the columns of the unitary demixing matrix in a deflationary procedure, where the sources are estimated one at a time. We can again use Wirtinger calculus to obtain either gradient updates [7] or modified Newton-type updates [7], [86] such that the weight vector $\mathbf{w}$ is updated by

(67)
where $g$ and $g'$ denote the first and second derivatives of $G$. After each update (67), an orthogonalization and normalization step is used to ensure that the resulting $\mathbf{W}$ is unitary.
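The orthogonalization and normalization step can be sketched as a Gram-Schmidt projection against the already extracted unit-norm columns (function name ours):

```python
import numpy as np

def deflate(w, extracted):
    """Project w against previously extracted unit-norm weight vectors and
    renormalize, so the accumulated demixing matrix stays unitary."""
    for v in extracted:
        w = w - (v.conj() @ w) * v     # remove the component along v
    return w / np.linalg.norm(w)

# Example: make a second weight vector orthonormal to a fixed first one.
w1 = np.array([1.0, 1.0j]) / np.sqrt(2)
w2 = deflate(np.array([1.0 + 0.5j, 0.3]), [w1])
print(abs(w1.conj() @ w2))     # ~0: the two weight vectors are orthonormal
```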
To summarize, ICA using higher order statistics can be achieved by maximizing likelihood (which is equivalent to minimizing mutual information) or by maximizing negentropy. When the demixing matrix is constrained to be unitary—as it is for the negentropy cost—the two costs are equivalent. In both cases, the performance is optimized when the nonlinear functions, the score function $\boldsymbol{\psi}$ in (64) and the nonlinearity $G$ in (67), are chosen to match the source distributions. This is discussed in more detail in the next section.
C. ICA Algorithms and Noncircularity
As shown in Section V-A, noncircularity (or rather impropriety) allows the separation of sources using only second-order statistics as long as all the sources in the mixture are noncircular with distinct circularity coefficients. Noncircularity also plays an important role in the performance of ICA algorithms. To be identifiable, the mixing matrix in (59) must have full column rank, and there must not be two Gaussian sources with the same circularity coefficient [41]. In terms of local stability, noncircularity plays a more critical role because algorithms become less stable as the degree of noncircularity increases. As noted in the previous section, the ML and negentropy cost functions are equivalent for unitary $\mathbf{W}$. Nevertheless, the stability conditions differ for these two cost functions because, for the negentropy cost, the unitary constraint allows a desirable decoupling for each source estimate.
When the negentropy cost is approximated as in (66), we can obtain local stability conditions, i.e., conditions for the cost function to have a local minimum or maximum [88], following the approach in [18], which assumed circularity. In Fig. 12, we plot the regions of stability for three nonlinear functions, $G(u) = \sqrt{b + u}$, $G(u) = \log(b + u)$, and $G(u) = u^{2}/2$, where $b$ is an arbitrary constant, chosen as 0.1 as in [18]. The region of stability depends on the kurtosis (which is normalized such that the Gaussian has zero kurtosis) and the degree of impropriety of the sources. We see that stability for sub-Gaussian sources, whose normalized kurtosis is negative, is always affected by impropriety. Examples of such sources include quadrature amplitude modulation, BPSK, and uniformly distributed signals. However, for super-Gaussian sources, whose kurtosis is greater than zero, impropriety affects stability only for sources whose kurtosis is close to the Gaussian. Hence, for super-Gaussian sources, matching the density to the improper nature of the signals is less of a concern than for sub-Gaussians.
For ML ICA, on the other hand, it is shown in [66] that, when all sources are circular and non-Gaussian, the updates converge to the inverse of the mixing matrix up to a phase shift. However, when the sources are noncircular and non-Gaussian, the stability conditions become more difficult to satisfy, especially when the demixing matrix is constrained to be unitary. When the sources are Gaussian, though, ML ICA is stable as long as the circularity coefficients of all sources are distinct. This is the case that can
Fig. 12. Regions of stability for the negentropy cost.
also be solved using the second-order ICA approach described in Section V-A.

For ML ICA of circular sources, the $n$th entry of the score function vector in (65) can be shown to be [7]

$$\psi_{n}(y) = -\frac{p_{n}'(|y|)}{2\,p_{n}(|y|)}\,\frac{y}{|y|}$$

because for this case the pdf depends on $y$ only through $|y|$. Thus, the score function always has the same phase as its argument. This is the form of the score function proposed in [14], where all sources are assumed to be circular. When the source is circular Gaussian, the score function becomes linear: $\psi_{n}(y) = y/\sigma_{n}^{2}$. This is expected because circular Gaussian sources cannot be separated. However, when they are noncircular, the score function is $\psi_{n}(y) = (\sigma_{n}^{2}y - \widetilde{\sigma}_{n}y^{*})/(\sigma_{n}^{4} - |\widetilde{\sigma}_{n}|^{2})$, where $\sigma_{n}^{2}$ and $\widetilde{\sigma}_{n}$ are the variance and the complementary variance of the $n$th source. This function is widely linear in $y$, which allows separation as long as the sources have distinct circularity coefficients.
In addition to identifiability and local stability conditions, noncircularity also plays a role in terms of the overall performance of the algorithms. The performance of both ML-ICA and ICA by maximization of non-Gaussianity is optimal when the nonlinearity used in the algorithm (the score function for ML and $G$ for maximization of non-Gaussianity) is matched to the density of each estimated source. This is also when the estimators assume their desirable large sample properties [54], [99].

Given the richer structure of possible distributions in the complex (two-dimensional) space compared to the real (one-dimensional) space, the pdf estimation problem becomes more challenging for complex-valued ICA. In the matching procedure, we
need to consider not only the sub- or super-Gaussian but also the circular/noncircular nature of the sources. One way to do this is to employ the complex GGD (35) to derive a class of flexible ICA algorithms. Its shape parameter and the augmented covariance matrix can be estimated during the ICA updates. This is the approach taken in the adaptive complex maximization of non-Gaussianity (ACMN) algorithm [86], [87], based on the updates given in (67). Another approach to density matching is entropy estimation through the use of a number of suitably chosen measuring functions [70]. Such an approach can provide robust performance even with a small set of chosen measuring functions [70]. Since the demixing matrix is not constrained in the maximum likelihood approach, exact density matching for each
Fig. 13. Performance comparison of eight complex ICA algorithms for the separation of (a) mixtures of five Gaussian sources with degrees of noncircularity
0, 0.2, 0.4, 0.6, and 0.8; and (b) mixtures of nine noncircular complex sources drawn from GGD distributions, both as a function of sample size. Each
simulation point is averaged over 100 independent runs.
source becomes a more challenging task. This is due to the fact that in the update (64), the score function (65) for each source affects the whole demixing matrix estimate $\mathbf{W}$. In the complex ICA by entropy bound minimization (ICA-EBM) algorithm [69], the maximum likelihood updates are combined with a decoupling approach that enables easier density matching for each source independently. We study the performance of these algorithms in Section V-D.

Finally, it is important to note that exact density matching is not generally critical for the performance of ICA algorithms. As the discussion in Section III highlights, it is important only when the sample size is small or the sources are highly noncircular. In many cases, simpler algorithms that do not explicitly estimate the distributions, and may not even match the circular/noncircular nature of the data, can provide satisfactory performance for practical problems such as fMRI analysis [68].
D. Examples
In this section, we present two examples to show the performance of the ICA algorithms we have discussed so far, for Gaussian and non-Gaussian sources with varying degrees of noncircularity [69]. Three indices are used to evaluate performance. Ideally, the product $\mathbf{G} = \mathbf{W}\mathbf{A}$ of demixing and mixing matrix should be monomial. Based on that, we define the first performance index as follows: For each row of $\mathbf{G}$, we keep only the entry whose magnitude is largest. If the matrix we thus obtain is monomial, we declare success, otherwise failure. The ratio of failed trials is the first performance index. The second performance index is the average interference-to-signal ratio (ISR), which is calculated as the average over all successful runs. The ISR for a given source is defined as the ratio of the sum of squared magnitudes of the remaining entries in the corresponding row of $\mathbf{G}$ to the largest squared magnitude in that row. The third performance index is the average CPU time.
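The first two indices translate directly into code; the sketch below (helper names ours) applies them to a hypothetical near-monomial product $\mathbf{G} = \mathbf{W}\mathbf{A}$:

```python
import numpy as np

def is_success(G):
    """Keep only the largest-magnitude entry in each row of G = W @ A and
    declare success if the result is monomial, i.e., the kept entries occupy
    distinct columns."""
    cols = np.argmax(np.abs(G), axis=1)
    return len(set(cols.tolist())) == G.shape[0]

def average_isr(G):
    """Average interference-to-signal ratio: per row, the sum of squared
    magnitudes of the non-dominant entries divided by the largest squared
    magnitude."""
    P = np.abs(G) ** 2
    peak = P.max(axis=1)
    return float(np.mean((P.sum(axis=1) - peak) / peak))

G_good = np.array([[0.02j, 1.0], [0.9, 0.03]])   # hypothetical near-monomial G
G_bad = np.array([[1.0, 0.1], [0.9, 0.2]])       # both rows peak in column 0
print(is_success(G_good), is_success(G_bad), average_isr(G_good))
```

A perfectly separated mixture gives an ISR of zero; values well below one indicate that each estimated source is dominated by a single true source.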
We consider three different ICA algorithms: i) the second-order SUT [41]; ii) ACMN, which maximizes negentropy by adaptively estimating the parameters of the GGD model in (35) [86], [87]; and iii) complex ICA-EBM, which maximizes the likelihood and approximates the entropy (source distributions) through a set of flexible measuring functions [69].
The first example is the separation of noncircular Gaussian
sources with distinct circularity coefficients. Five Gaussian
signals with unit variance and complementary variances of
0, 0.2, 0.4, 0.6, and 0.8 are mixed with a randomly
chosen square matrix whose real and imaginary elements are
independently drawn from a zero-mean, unit-variance Gaussian
distribution. Fig. 13(a) shows the performance indices. We observe that the second-order SUT algorithm exhibits the best
separation performance and also consumes the least CPU
time. ACMN fails to separate the mixtures primarily because
of the simplification it employs in the derivation. Complex
ICA-EBM performs as well as SUT only for large sample
sizes. The entropy estimator used in this algorithm can exactly
match the entropy of a Gaussian source, which accounts for the
asymptotic efficiency of complex ICA-EBM for the separation
of Gaussian sources.
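To make the setup concrete, here is a minimal sketch of how such sources and mixtures could be generated (the construction and all names are ours, not from [69]): a zero-mean complex Gaussian with unit variance and real complementary variance rho is obtained by giving the independent real and imaginary parts powers (1 + rho)/2 and (1 - rho)/2.

```python
import numpy as np

def noncircular_gaussian(n, rho, seed=None):
    """Draw n samples of a zero-mean complex Gaussian with unit variance
    E|z|^2 = 1 and real complementary variance E[z^2] = rho, 0 <= rho < 1.
    Independent real and imaginary parts with powers (1 + rho)/2 and
    (1 - rho)/2 yield exactly these two moments."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n) * np.sqrt((1 + rho) / 2)
    y = rng.standard_normal(n) * np.sqrt((1 - rho) / 2)
    return x + 1j * y

# Five unit-variance sources with complementary variances 0, 0.2, ..., 0.8,
# mixed by a random square matrix whose real and imaginary parts are i.i.d.
# zero-mean, unit-variance Gaussian, as in the example above.
rng = np.random.default_rng(0)
S = np.vstack([noncircular_gaussian(5000, rho, seed=int(10 * rho))
               for rho in (0.0, 0.2, 0.4, 0.6, 0.8)])
A = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
X = A @ S  # observed mixtures handed to the ICA algorithm
```

The sample covariance and complementary covariance of each row of `S` then match the prescribed unit variance and complementary variance up to sampling error.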
In the second example, we study the performance of the
algorithms for complex GGD sources whose pdf is given
in (35). Nine sources are generated with nine distinct shape parameters, the largest being 9, hence including four super-Gaussian, one Gaussian, and four sub-Gaussian sources.
For each run, the complex correlation coefficient is selected randomly inside the unit circle, resulting in different circularity coefficients. Fig. 13(b) depicts the performance indices. We
observe that the flexible ICA-EBM outperforms the other
algorithms at a reasonable computational cost. ACMN, which is based on a complex GGD model, provides the next best performance, but still with a significant performance gap, primarily because of the unitary constraint it imposes on the demixing matrix.
In summary, the second-order SUT is efficient for the estimation of Gaussian sources provided that they all have distinct circularity coefficients. By adaptively estimating the parameters of a GGD model, ACMN can provide satisfactory performance in most cases, as shown in [69], [87], but it is also computationally demanding. Flexible density matching in ICA-EBM, on the other hand, provides an attractive trade-off between performance and computational cost. In addition, in a maximum likelihood framework, the demixing matrix is not constrained. This can lead to better performance compared with algorithms that do impose constraints, such as ACMN. Except when prior information such as distinct circularity coefficients is available, a flexible algorithm such as ICA-EBM thus provides the best choice, with robust performance.
VI. CONCLUSION
In this overview paper, we have tried to illuminate the
role that impropriety and noncircularity play in statistical
signal processing of complex-valued data. We considered three
questions: Why should we care about impropriety and non-
circularity? When does it make sense to take it into account?
How do we do it? In a nutshell, the answers to these questions
are: We should care because it can lead to significant perfor-
mance improvements in estimation, detection, and time-series
analysis. We should take it into account whenever there is sufficient statistical evidence that the signals and/or the underlying nature of the problem are indeed improper. We can do it by
considering the complete statistical characterization of the
signals and employing Wirtinger calculus for the optimization
of cost functions with complex parameters.
In second-order methods such as mean-square error estima-
tion, a complete statistical characterization requires considera-
tion of the complementary correlation; in problems such as ICA,
it requires the use of flexible density models that can account for
noncircularity. The caveat is that noncircular models have more
degrees of freedom than circular models, and can hence lead
to performance degradation in certain scenarios, even when the
signals are noncircular. We noted that circular models are to be
preferred when the SNR is low, the number of samples is small,
or the degree of noncircularity is low. In addition, in the imple-
mentation of adaptive algorithms such as the LMS algorithm,
taking the complete second-order statistical information into ac-
count may come at the expense of slower convergence.
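As a toy illustration of the role of the complementary correlation (our own example, not from the paper): when a real source is observed in circular complex noise, the observation is improper, and a widely linear estimate that also uses the conjugated observation beats the best strictly linear one.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200000

# Toy scenario: a real source s observed in circular complex noise.  The
# observation x = s + noise is improper (E[x^2] != 0), so its conjugate
# carries extra information about s.
s = rng.standard_normal(n)                                    # E[s^2] = 1
noise = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
x = s + noise                                                 # E|noise|^2 = 1

# Strictly linear MMSE estimate s_hat = w x, with w = E[s x*] / E|x|^2.
w = np.mean(s * np.conj(x)) / np.mean(np.abs(x) ** 2)
mse_l = np.mean(np.abs(s - w * x) ** 2)

# Widely linear MMSE estimate s_hat = h x + g x*.  The augmented normal
# equations use the covariance r = E[x x*] AND the complementary
# covariance c = E[x x], which the strictly linear filter ignores.
r = np.mean(np.abs(x) ** 2)
c = np.mean(x * x)
h, g = np.linalg.solve(np.array([[r, np.conj(c)], [c, r]]),
                       np.array([np.mean(s * np.conj(x)), np.mean(s * x)]))
mse_wl = np.mean(np.abs(s - h * x - g * np.conj(x)) ** 2)

print(mse_l, mse_wl)  # the widely linear filter wins: about 1/3 vs 1/2 here
```

In this example the widely linear solution reduces to h = g = 1/3, i.e., it averages the observation with its conjugate and thereby discards the noise component in quadrature with the real source.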
We have reviewed some of the fundamental results, and some selected more recent developments in the field. There is a lot that
we have not included, such as Cramér–Rao-type performance
bounds [37], [97], [115], [127] and a much more in-depth dis-
cussion of random processes. We have also paid only scant at-
tention to the singular case of maximally improper/noncircular
signals (also called “rectilinear” or “strict-sense noncircular”
signals). This case deserves more attention because many al-
gorithms developed for improper/noncircular signals either fail
in the maximally improper case or may even give worse per-
formance than algorithms developed for proper/circular signals.
Examples of papers dealing explicitly with maximally improper
signals are [1], [2], [49], [107]. In addition, we would like to note
the growing interest in the extension of these results to hyper-
complex numbers, in particular quaternions (see, e.g., [9], [17],
[120], [121], [124], [130], and [131]).
We hope that this overview paper will encourage more re-
searchers to take full advantage of the power of complex-valued
signal processing.
ACKNOWLEDGMENT
The authors would like to thank the anonymous reviewers. T. Adalı would particularly like to thank X.-L. Li, M. Novey, and H. Li for helpful comments and suggestions.
R EFERENCES
[1] H. Abeida and J. P. Delmas, "MUSIC-like estimation of direction of arrival for non-circular sources," IEEE Trans. Signal Process., vol. 54, no. 7, pp. 2678–2690, 2006.
[2] H. Abeida and J. P. Delmas, "Statistical performance of MUSIC-like algorithms in resolving noncircular sources," IEEE Trans. Signal Process., vol. 56, no. 9, pp. 4317–4329, 2008.
[3] M. J. Ablowitz and A. S. Fokas, Complex Variables. Cambridge, U.K.: Cambridge Univ. Press, 2003.
[4] T. Adalı and V. D. Calhoun, "Complex ICA of medical imaging data," IEEE Signal Process. Mag., vol. 24, no. 5, pp. 136–139, Sep. 2007.
[5] T. Adalı and H. Li, "Complex-valued signal processing," in Adaptive Signal Processing: Next Generation Solutions, T. Adalı and S. Haykin, Eds. New York: Wiley Interscience, 2010.
[6] T. Adalı, H. Li, and R. Aloysius, "On properties of the widely linear MSE filter and its LMS implementation," presented at the Conf. Inf. Sci. Syst., Baltimore, MD, Mar. 2009.
[7] T. Adalı, H. Li, M. Novey, and J.-F. Cardoso, "Complex ICA using nonlinear functions," IEEE Trans. Signal Process., vol. 56, no. 9, pp. 4536–4544, Sep. 2008.
[8] H. Akaike, “A new look at statistical model identification,” IEEE Trans. Autom. Control , vol. AC-19, no. 6, pp. 716–723, 1974.
[9] P. O. Amblard and N. L. Bihan, “On properness of quaternion valuedrandom variables,” in Proc. Int. Conf. Math. (IMA) Signal Process.,2004, pp. 23–26.
[10] P. O. Amblard and P. Duvaut, "Filtrage adapté dans le cas Gaussien complexe non circulaire," in Proc. GRETSI Conf., 1995, pp. 141–144.
[11] P. O. Amblard, M. Gaeta, and J. L. Lacoume, "Statistics for complex variables and signals—Part 1: Variables," Signal Process., vol. 53, no. 1, pp. 1–13, 1996.
[12] P. O. Amblard, M. Gaeta, and J. L. Lacoume, "Statistics for complex variables and signals—Part 2: Signals," Signal Process., vol. 53, no. 1, pp. 15–25, 1996.
[13] S. A. Andersson and M. D. Perlman, "Two testing problems relating the real and complex multivariate normal distribution," J. Multivar. Anal., vol. 15, pp. 21–51, 1984.
[14] J. Anemüller, T. J. Sejnowski, and S. Makeig, "Complex independent component analysis of frequency-domain electroencephalographic data," Neural Netw., vol. 16, pp. 1311–1323, 2003.
[15] L. Anttila, M. Valkama, and M. Renfors, "Circularity-based I/Q imbalance compensation in wideband direct-conversion receivers," IEEE Trans. Veh. Tech., vol. 57, no. 4, pp. 2099–2113, 2008.
[16] A. Bell and T. Sejnowski, "An information maximization approach to blind separation and blind deconvolution," Neural Comput., vol. 7, pp. 1129–1159, 1995.
[17] N. Le Bihan and P. O. Amblard, "Detection and estimation of Gaussian proper quaternion valued random processes," in Proc. Int. Conf. Math. (IMA) Signal Process., 2006, pp. 23–26.
[18] E. Bingham and A. Hyvärinen, "A fast fixed-point algorithm for independent component analysis of complex valued signals," Int. J. Neural Syst., vol. 10, pp. 1–8, 2000.
[19] D. H. Brandwood, "A complex gradient operator and its application in adaptive array theory," Proc. Inst. Electr. Eng., vol. 130, no. 1, pp. 11–16, Feb. 1983.
[20] W. M. Brown and R. B. Crane, "Conjugate linear filtering," IEEE Trans. Inf. Theory, vol. 15, no. 4, pp. 462–465, 1969.
[21] H. J. Butterweck, "A steady-state analysis of the LMS adaptive algorithm without the use of the independence assumption," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (IEEE ICASSP), Detroit, MI, 1995, pp. 1404–1407.
[22] S. Buzzi, M. Lops, and S. Sardellitti, "Widely linear reception strategies for layered space-time wireless communications," IEEE Trans. Signal Process., vol. 54, no. 6, pp. 2252–2262, 2006.
[23] A. S. Cacciapuoti, G. Gelli, and F. Verde, "FIR zero-forcing multiuser detection and code designs for downlink MC-CDMA," IEEE Trans. Signal Process., vol. 55, no. 10, pp. 4737–4751, 2007.
[24] J.-F. Cardoso and A. Souloumiac, "Blind beamforming for non-Gaussian signals," Proc. Inst. Electr. Eng.—Radar Signal Process., vol. 140, pp. 362–370, 1993.
[25] P. Charge, Y. Wang, and J. Saillard, "A non-circular sources direction finding method using polynomial rooting," Signal Process., vol. 81, pp. 1765–1770, 2001.
[26] P. Chevalier and A. Blin, "Widely linear MVDR beamformers for the reception of an unknown signal corrupted by noncircular interferences," IEEE Trans. Signal Process., vol. 55, no. 11, pp. 5323–5336, 2007.
[27] P. Chevalier, J. P. Delmas, and A. Oukaci, "Optimal widely linear MVDR beamforming for noncircular signals," in Proc. Int. Conf. Acoust., Speech, Signal Process. (IEEE ICASSP), 2009, pp. 3573–3576.
[28] P. Chevalier, P. Duvaut, and B. Picinbono, "Complex transversal Volterra filters optimal for detection and estimation," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (IEEE ICASSP), Toronto, ON, 1991, pp. 3537–3540.
[29] P. Chevalier and A. Maurice, "Constrained beamforming for cyclostationary signals," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (IEEE ICASSP), 1997, pp. 3789–3792.
[30] P. Chevalier and B. Picinbono, "Complex linear-quadratic systems for detection and array processing," IEEE Trans. Signal Process., vol. 44, no. 10, pp. 2631–2634, Oct. 1996.
[31] P. Chevalier and F. Pipon, "New insights into optimal widely linear array receivers for the demodulation of BPSK, MSK, and GMSK interferences—Application to SAIC," IEEE Trans. Signal Process., vol. 54, no. 3, pp. 870–883, 2006.
[32] P. Comon, "Estimation multivariable complexe," Traitement du Signal, vol. 3, pp. 97–102, 1986.
[33] P. Comon, "Circularité et signaux aléatoires à temps discret," Traitement du Signal, vol. 11, no. 5, pp. 417–420, 1994.
[34] P. Comon, "Independent component analysis—A new concept?," Signal Process., vol. 36, pp. 287–314, 1994.
[35] P. Comon and C. Jutten, Handbook of Blind Source Separation. New York: Academic, 2010.
[36] J. P. Delmas, "Asymptotically minimum variance second-order estimation for noncircular signals with application to DOA estimation," IEEE Trans. Signal Process., vol. 52, no. 5, pp. 1235–1241, 2004.
[37] J. P. Delmas and H. Abeida, "Stochastic Cramér-Rao bound for noncircular signals with application to DOA estimation," IEEE Trans. Signal Process., vol. 52, no. 11, pp. 3192–3199, 2004.
[38] S. Douglas and D. Mandic, "Performance analysis of the conventional complex LMS and augmented complex LMS algorithms," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (IEEE ICASSP), Dallas, TX, Mar. 2010, pp. 3794–3797.
[39] P. Duvaut, "Processus et vecteurs complexes elliptiques," in Proc. GRETSI Conf., 1995, pp. 129–132.
[40] J. Eriksson and V. Koivunen, "Complex-valued ICA using second order statistics," in Proc. IEEE Int. Workshop Mach. Learn. Signal Process. (MLSP), São Luís, Brazil, Sep. 2004, pp. 183–192.
[41] J. Eriksson and V. Koivunen, "Complex random vectors and ICA models: Identifiability, uniqueness and separability," IEEE Trans. Inf. Theory, vol. 52, no. 3, pp. 1017–1029, 2006.
[42] J. Eriksson, E. Ollila, and V. Koivunen, "Essential statistics and tools for complex random variables," IEEE Trans. Signal Process., vol. 58, no. 10, pp. 5400–5408, 2010.
[43] M. Gaeta, “Higher-order statistics applied to source separation,” Ph.D.dissertation, INP, Grenoble, France, 1991.
[44] W. A. Gardner, “Cyclic Wiener filtering: Theory and method,” IEEE Trans. Commun., vol. 41, no. 1, pp. 151–163, 1993.
[45] G. Gelli, L. Paura, and A. R. P. Ragozini, "Blind widely linear multiuser detection," IEEE Commun. Lett., vol. 4, no. 6, pp. 187–189, 2000.
[46] H. Gerstacker, R. Schober, and A. Lampe, "Receivers with widely linear processing for frequency-selective channels," IEEE Trans. Commun., vol. 51, no. 9, pp. 1512–1523, 2003.
[47] N. R. Goodman, "Statistical analysis based on a certain multivariate complex Gaussian distribution," Ann. Math. Stat., vol. 34, pp. 152–176, 1963.
[48] T. L. Grettenberg, "A representation theorem for complex normal processes," IEEE Trans. Inf. Theory, vol. 11, no. 2, pp. 305–306, 1965.
[49] M. Haardt and F. Roemer, "Enhancements of unitary ESPRIT for non-circular sources," in Proc. Int. Conf. Acoust., Speech, Signal Process. (IEEE ICASSP), 2004, pp. 101–104.
[50] S. Haykin, Adaptive Filter Theory, 4th ed. Upper Saddle River, NJ: Prentice-Hall, 2002.
[51] P. Henrici, Applied and Computational Complex Analysis, vol. III. New York: Wiley, 1986.
[52] L. Hörmander, An Introduction to Complex Analysis in Several Variables. Oxford, U.K.: North-Holland, 1990.
[53] R. A. Horn and C. R. Johnson, Matrix Analysis. New York: Cambridge Univ. Press, 1999.
[54] A. Hyvärinen, "One-unit contrast functions for independent component analysis: A statistical analysis," in Proc. IEEE Workshop Neural Netw. Signal Process. (NNSP), Amelia Island, FL, Sep. 1997, pp. 388–397.
[55] A. Hyvärinen, “Fast and robust fixed-point algorithms for independentcomponent analysis,” IEEE Trans. Neural Netw., vol. 10, no. 3, pp.626–634, 1999.
[56] A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis. New York: Wiley, 2001.
[57] J. J. Jeon, J. G. Andrews, and K. M. Sung, "The blind widely linear minimum output energy algorithm for DS-CDMA systems," IEEE Trans. Signal Process., vol. 54, no. 5, pp. 1926–1931, 2006.
[58] S. M. Kay and J. R. Gabriel, "An invariance property of the generalized likelihood ratio test," IEEE Signal Process. Lett., vol. 10, no. 12, pp. 352–355, 2003.
[59] K. Kreutz-Delgado, "The complex gradient operator and the CR-calculus. ECE275A: Parameter Estimation I, Lecture Supplement on Complex Vector Calculus," Univ. of California, San Diego, CA, 2007 [Online]. Available: http://arxiv.org/abs/0906.4835
[60] J. L. Lacoume, "Variables et signaux aléatoires complexes," Traitement du Signal, vol. 15, pp. 535–544, 1998.
[61] J. L. Lacoume and M. Gaeta, "Complex random variable: A tensorial approach," in Proc. IEEE Workshop on Higher-Order Statistics, 1991, vol. 15.
[62] A. Lampe, R. Schober, and W. Gerstacker, "A novel iterative multiuser detector for complex modulation schemes," IEEE J. Sel. Areas Commun., vol. 20, no. 2, pp. 339–350, 2002.
[63] L. De Lathauwer and B. De Moor, "On the blind separation of non-circular sources," presented at the Eur. Signal Process. Conf. (EUSIPCO), Toulouse, France, 2002.
[64] H. Li and T. Adalı, "Optimization in the complex domain for nonlinear adaptive filtering," in Proc. 33rd Asilomar Conf. Signals, Syst., Comput., Pacific Grove, CA, Nov. 2006, pp. 263–267.
[65] H. Li and T. Adalı, "Complex-valued adaptive signal processing using nonlinear functions," J. Adv. Signal Process., p. 9, 2008, Article ID 765615.
[66] H. Li and T. Adalı, "Algorithms for complex ML ICA and their stability analysis using Wirtinger calculus," IEEE Trans. Signal Process., vol. 58, no. 12, pp. 6156–6167, 2010.
[67] H. Li and T. Adalı, "Application of independent component analysis with adaptive density model to complex-valued fMRI data," IEEE Trans. Biomed. Eng., preprint, DOI: 10.1109/TBME.2011.2159841.
[68] H. Li, T. Adalı, N. M. Correa, P. A. Rodriguez, and V. D. Calhoun, "Flexible complex ICA of fMRI data," presented at the IEEE Int. Conf. Acoust., Speech, Signal Process. (IEEE ICASSP), Dallas, TX, Mar. 2010.
[69] X.-L. Li and T. Adalı, “Complex independent component analysis by entropy bound minimization,” IEEE Trans. Circuits Syst. I, Fund.Theory Appl., vol. 57, no. 7, pp. 1417–1430, Jul. 2010.
[70] X.-L. Li and T. Adalı, “Independent component analysis by entropy bound minimization,” IEEE Trans. Signal Process., vol. 58, no. 10,
pp. 5151–5164, Oct. 2010.
[71] X.-L. Li and T. Adalı, "Blind separation of noncircular correlated sources using Gaussian entropy rate," IEEE Trans. Signal Process., vol. 59, no. 6, pp. 2969–2975, Jun. 2011.
[72] X.-L. Li, T. Adalı, and M. Anderson, "Noncircular principal component analysis and its application to model selection," IEEE Trans. Signal Process., vol. 59, no. 10, pp. 4516–4528, 2011.
[73] X.-L. Li, M. Anderson, and T. Adalı, "Detection of circular and noncircular signals in the presence of circular white Gaussian noise," presented at the Asilomar Conf. Signals, Syst., Comput., Pacific Grove, CA, Nov. 2010.
[74] O. Macchi, LMS Adaptive Processing With Applications in Transmission. New York: Wiley, 1996.
[75] D. P. Mandic and V. S. L. Goh , Complex Valued Nonlinear Adaptive Filters. New York: Wiley, 2009.
[76] P. Marchand, P. O. Amblard, and J. L. Lacoume, "Statistiques d'ordre supérieur à deux pour des signaux cyclostationnaires à valeurs complexes," in Proc. GRETSI Conf., 1995, pp. 69–72.
[77] T. McWhorter and P. Schreier, “Widely-linear beamforming,” in Proc.37th Asilomar Conf. Signals, Syst., Comput., 2003, pp. 753–759.
[78] R. Meyer, W. H. Gerstacker, R. Schober, and J. B. Huber, "A single antenna interference cancellation algorithm for increased GSM capacity," IEEE Trans. Wireless Commun., vol. 5, pp. 1616–1621, 2006.
[79] A. Mirbagheri, N. Plataniotis, and S. Pasupathy, "An enhanced widely linear CDMA receiver with OQPSK modulation," IEEE Trans. Commun., vol. 54, pp. 261–272, 2006.
[80] C. N. K. Mooers, "A technique for the cross spectrum analysis of pairs of complex-valued time series, with emphasis on properties of polarized components and rotational invariants," Deep-Sea Res., vol. 20, pp. 1129–1141, 1973.
[81] D. R. Morgan, "Variance and correlation of square-law detected allpass channels with bandpass harmonic signals in Gaussian noise," IEEE Trans. Signal Process., vol. 54, no. 8, pp. 2964–2975, 2006.
[82] D. R. Morgan and C. K. Madsen, "Wide-band system identification using multiple tones with allpass filters and square-law detectors," IEEE Trans. Circuits Syst. I, vol. 53, no. 5, pp. 1151–1165, 2006.
[83] A. Napolitano and M. Tanda, "Doppler-channel blind identification for noncircular transmissions in multiple-access systems," IEEE Trans. Commun., vol. 52, no. 12, pp. 2073–2078, 2004.
[84] F. D. Neeser and J. L. Massey, "Proper complex random processes with applications to information theory," IEEE Trans. Inf. Theory, vol. 39, pp. 1293–1302, Jul. 1993.
[85] R. Nilsson, F. Sjoberg, and J. P. LeBlanc, "A rank-reduced LMMSE canceller for narrowband interference suppression in OFDM-based systems," IEEE Trans. Commun., vol. 51, no. 12, pp. 2126–2140, 2003.
[86] M. Novey, "Complex ICA using nonlinear functions," Ph.D. dissertation, Univ. of Maryland Graduate School, Baltimore, MD, 2009.
[87] M. Novey and T. Adalı, “Complex ICA by negentropy maximization,” IEEE Trans. Neural Netw., vol. 19, no. 4, pp. 596–609, Apr. 2008.
[88] M. Novey and T. Adalı, “On extending the complex fastICA algorithmto noncircular sources,” IEEE Trans. Signal Process., vol. 56, no. 5, pp. 2148–2154, Apr. 2008.
[89] M. Novey, T. Adalı, and A. Roy, "Circularity and Gaussianity detection using the complex generalized Gaussian distribution," IEEE Signal Process. Lett., vol. 16, no. 1, pp. 993–996, Nov. 2009.
[90] M. Novey, T. Adalı, and A. Roy, "A complex generalized Gaussian distribution—Characterization, generation, and estimation," IEEE Trans. Signal Process., vol. 58, no. 3, pp. 1427–1433, Mar. 2010.
[91] S. Olhede, "On probability density functions for complex variables," IEEE Trans. Inf. Theory, vol. 52, no. 3, pp. 1212–1217, Mar. 2006.
[92] E. Ollila, "On the circularity of a complex random variable," IEEE Signal Process. Lett., vol. 15, pp. 841–844, 2008.
[93] E. Ollila, J. Eriksson, and V. Koivunen, "Complex elliptically symmetric random variables—Generation, characterization and circularity tests," IEEE Trans. Signal Process., vol. 59, no. 1, pp. 58–69, 2011.
[94] E. Ollila and V. Koivunen, "Generalized complex elliptical distributions," in Proc. 3rd Sensor Array Multichannel Signal Process. Workshop, Sitges, Spain, Jul. 2004, pp. 460–464.
[95] E. Ollila and V. Koivunen, "Adjusting the generalized likelihood ratio test of circularity robust to non-normality," presented at the IEEE Int. Workshop Signal Process., Perugia, Italy, Jun. 2009.
[96] E. Ollila and V. Koivunen, "Complex ICA using generalized uncorrelating transform," Signal Process., vol. 89, pp. 365–377, Apr. 2009.
[97] E. Ollila, V. Koivunen, and J. Eriksson, "On the Cramér-Rao bound for the constrained and unconstrained complex parameters," in Proc. IEEE Sensor Array Multichannel Signal Process. Workshop (SAM), Darmstadt, Germany, Jul. 2008, pp. 414–418.
[98] K. B. Petersen and M. S. Pedersen, The Matrix Cookbook, Version 20081110, 2008 [Online]. Available: http://www2.imm.dtu.dk/pubdb/p.php?3274
[99] D. Pham and P. Garat, "Blind separation of mixtures of independent sources through a quasi maximum likelihood approach," IEEE Trans. Signal Process., vol. 45, no. 7, pp. 1712–1725, 1997.
[100] B. Picinbono, “On circularity,” IEEE Trans. Signal Process., vol. 42,no. 12, pp. 3473–3482, Dec. 1994.
[101] B. Picinbono, "Second-order complex random vectors and normal distributions," IEEE Trans. Signal Process., vol. 44, no. 10, pp. 2637–2640, Oct. 1996.
[102] B. Picinbono and P. Bondon, "Second-order statistics of complex signals," IEEE Trans. Signal Process., vol. 45, no. 2, pp. 411–419, Feb. 1997.
[103] B. Picinbono and P. Chevalier, "Widely linear estimation with complex data," IEEE Trans. Signal Process., vol. 43, pp. 2030–2033, Aug. 1995.
[104] R. Remmert , Theory of Complex Functions. Harrisonburg, VA:Springer-Verlag, 1991.
[105] J. Rissanen, “Modeling by the shortest data description,” Automatica,vol. 14, pp. 465–471, 1978.
[106] P. A. Rodriguez, V. D. Calhoun, and T. Adalı, "De-noising, phase ambiguity correction and visualization techniques for complex-valued ICA of group fMRI data," Pattern Recognit., to be published.
[107] F. Römer and M. Haardt, "Multidimensional unitary tensor-ESPRIT for non-circular sources," in Proc. Int. Conf. Acoust., Speech, Signal Process. (IEEE ICASSP), 2009, pp. 3577–3580.
[108] P. Rubin-Delanchy and A. T. Walden, "Simulation of improper complex-valued sequences," IEEE Trans. Signal Process., vol. 55, no. 11, pp. 5517–5521, 2007.
[109] P. Ruiz and J. L. Lacoume, "Extraction of independent sources from correlated inputs: A solution based on cumulants," in Proc. Workshop Higher-Order Spectral Anal., Jun. 1989, pp. 146–151.
[110] P. Rykaczewski, M. Valkama, and M. Renfors, "On the connection of I/Q imbalance and channel equalization in direct-conversion transceivers," IEEE Trans. Veh. Tech., vol. 57, no. 3, pp. 1630–1636, 2008.
[111] L. L. Scharf, P. J. Schreier, and A. Hanssen, "The Hilbert space geometry of the Rihaczek distribution for stochastic analytic signals," IEEE Signal Process. Lett., vol. 12, no. 4, pp. 297–300, 2005.
[112] P. J. Schreier, "Bounds on the degree of impropriety of complex random vectors," IEEE Signal Process. Lett., vol. 15, pp. 190–193, 2008.
[113] P. J. Schreier, T. Adalı, and L. L. Scharf, "On ICA of improper and noncircular sources," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (IEEE ICASSP), Taipei, Taiwan, Apr. 2009, pp. 3561–3564.
[114] P. J. Schreier and L. L. Scharf, “Second-order analysis of improper complex random vectors and processes,” IEEE Trans. Signal Process.,vol. 51, no. 3, pp. 714–725, Mar. 2003.
[115] P. J. Schreier and L. L. Scharf, Statistical Signal Processing of Complex-Valued Data: The Theory of Improper and Noncircular Signals. Cambridge, U.K.: Cambridge Univ. Press, 2010.
[116] P. J. Schreier, L. L. Scharf, and A. Hanssen, "A generalized likelihood ratio test for impropriety of complex signals," IEEE Signal Process. Lett., vol. 13, no. 7, pp. 433–436, Jul. 2006.
[117] P. J. Schreier, L. L. Scharf, and C. T. Mullis, "Detection and estimation of improper complex random signals," IEEE Trans. Inf. Theory, vol. 51, no. 1, pp. 306–312, Jan. 2005.
[118] G. E. Schwarz, "Estimating the dimension of a model," Ann. Stat., vol. 6, no. 2, pp. 461–464, 1978.
[119] G. Tauböck, “Complex noise analysis of DMT,” IEEE Trans. Signal Process., vol. 55, no. 12, pp. 5739–5754, 2007.
[120] C. C. Took and D. P. Mandic, "Quaternion-valued stochastic gradient-based adaptive IIR filtering," IEEE Trans. Signal Process., vol. 58, pp. 3895–3901, Jul. 2010.
[121] C. C. Took and D. P. Mandic, “Augmented second-order statistics of quaternion random signals,” Signal Process., vol. 91, pp. 214–224,Feb. 2011.
[122] T. Trainini, X.-L. Li, E. Moreau, and T. Adalı, "A relative gradient algorithm for joint decompositions of complex matrices," presented at the Eur. Signal Process. Conf. (EUSIPCO), Aalborg, Denmark, Aug. 2010.
[123] H. Trigui and D. T. M. Slock, "Performance bounds for cochannel interference cancellation within the current GSM standard," Signal Process., vol. 80, pp. 1335–1346, 2000.
[124] N. N. Vakhania, "Random vectors with values in quaternion Hilbert spaces," Theory Probability Appl., vol. 43, no. 1, pp. 99–115, 1999.
[125] N. N. Vakhania and N. P. Kandelaki, "Random vectors with values in complex Hilbert spaces," Theory Probability Appl., vol. 41, no. 1, pp. 116–131, Feb. 1996.
[126] A. van den Bos, “Complex gradient and Hessian,” Proc. Inst. Electr. Eng.—Vision, Image, Signal Process., vol. 141, no. 6, pp. 380–382,Dec. 1994.
[127] A. van den Bos, "A Cramér-Rao lower bound for complex parameters," IEEE Trans. Signal Process., vol. 42, no. 10, p. 2859, 1994.
[128] A. van den Bos, “Estimation of complex parameters,” in Proc. 10th IFAC Symp., Jul. 1994, vol. 3, pp. 495–499.
[129] A. van den Bos, "The multivariate complex normal distribution—A generalization," IEEE Trans. Inf. Theory, vol. 41, pp. 537–539, 1995.
[130] J. Via, D. Ramirez, I. Santamaria, and L. Vielva, "Properness and widely linear processing of quaternion random vectors," IEEE Trans. Inf. Theory, vol. 56, no. 7, pp. 3502–3515, Jul. 2010.
[131] J. Via, D. Ramirez, I. Santamaria, and L. Vielva, “Widely and semi-widely linear processing of quaternion vectors,” in Proc. IEEE Int.Conf. Acoust., Speech, Signal Process. (IEEE ICASSP), Dallas, TX,Mar. 2010, pp. 3946–3949.
[132] P. Wahlberg and P. J. Schreier, "Spectral relations for multidimensional complex improper stationary and (almost) cyclostationary processes," IEEE Trans. Inf. Theory, vol. 54, no. 4, pp. 1670–1682, 2008.
[133] A. T. Walden and P. Rubin-Delanchy, "On testing for impropriety of complex-valued Gaussian vectors," IEEE Trans. Signal Process., vol. 57, no. 3, pp. 825–834, Mar. 2009.
[134] M. Wax and T. Kailath, "Detection of signals by information theoretic criteria," IEEE Trans. Acoust., Speech, Signal Process., vol. 33, no. 2, pp. 387–392, Apr. 1985.
[135] B. Widrow, J. McCool, and M. Ball, "The complex LMS algorithm," Proc. IEEE, vol. 63, pp. 719–720, 1975.
[136] B. Widrow and M. E. Hoff, Jr., "Adaptive switching circuits," Conf. IRE WESCON, vol. 4, pp. 96–104, 1960.
[137] W. Wirtinger, "Zur formalen Theorie der Funktionen von mehr komplexen Veränderlichen," Math. Ann., vol. 97, pp. 357–375, 1927.
[138] M. Witzke, "Linear and widely linear filtering applied to iterative detection of generalized MIMO signals," Ann. Telecommun., vol. 60, pp. 147–168, 2005.
[139] R. A. Wooding, "The multivariate distribution of complex normal variables," Biometrika, vol. 43, pp. 212–215, 1956.
[140] Y. C. Yoon and H. Leib, "Maximizing SNR in improper complex noise and applications to CDMA," IEEE Commun. Lett., vol. 1, no. 1, pp. 5–8, 1997.
[141] V. Zarzoso and P. Comon, "Robust independent component analysis by iterative maximization of the kurtosis contrast with algebraic optimal step size," IEEE Trans. Neural Netw., vol. 21, no. 2, pp. 248–261, Feb. 2010.
[142] Y. Zou, M. Valkama, and M. Renfors, "Digital compensation of I/Q imbalance effects in space-time coded transmit diversity systems," IEEE Trans. Signal Process., vol. 56, no. 6, pp. 2496–2508, 2008.
Tülay Adalı (S'89–M'93–SM'98–F'09) received the Ph.D. degree in electrical engineering from North Carolina State University, Raleigh, in 1992.
She joined the faculty at the University of Maryland Baltimore County (UMBC), Baltimore, in 1992. She is currently a Professor in the Department of Computer Science and Electrical Engineering at UMBC. Her research interests are in the areas of statistical signal processing, machine learning for signal processing, and biomedical data analysis.
Prof. Adalı assisted in the organization of a number of international conferences and workshops, including the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), the IEEE International Workshop on Neural Networks for Signal Processing (NNSP), and the IEEE International Workshop on Machine Learning for Signal Processing (MLSP). She was the General Co-Chair of the NNSP workshops from 2001 to 2003; the Technical Chair of the MLSP workshops from 2004 to 2008; the Program Co-Chair of the MLSP workshops in 2008 and 2009 and of the International Conference on Independent Component Analysis and Source Separation in 2009; the Publicity Chair of the ICASSP in 2000 and 2005; and Publications Co-Chair of the ICASSP in 2008. She chaired the IEEE SPS Machine Learning for Signal Processing Technical Committee from 2003 to 2005; was a Member of the SPS Conference Board from 1998 to 2006; a Member of the Bio Imaging and Signal Processing Technical Committee from 2004 to 2007; and Associate Editor of the IEEE TRANSACTIONS ON SIGNAL PROCESSING from 2003 to 2006 and of the Elsevier Signal Processing journal from 2007 to 2010. She is currently Chair of the MLSP Technical Committee, serving on the Signal Processing Theory and Methods Technical Committee; Associate Editor of the IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING and the JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL, IMAGE, AND VIDEO TECHNOLOGY; and a Senior Editorial Board member of the IEEE JOURNAL OF SELECTED AREAS IN SIGNAL PROCESSING. She is an IEEE SPS Distinguished Lecturer for 2012 and 2013, the recipient of a 2010 IEEE Signal Processing Society Best Paper Award, and a past recipient of an NSF CAREER Award. She is a Fellow of the AIMBE.
Peter J. Schreier (S'03–M'04–SM'09) was born in Munich, Germany, in 1975. He received the Master of Science degree from the University of Notre Dame, Notre Dame, IN, in 1999 and the Ph.D. degree from the University of Colorado at Boulder in 2003, both in electrical engineering.
From 2004 until January 2011, he was a faculty member in the School of Electrical Engineering and Computer Science at the University of Newcastle, Australia. Since February 2011, he has been Chaired Professor of signal and system theory in the Faculty of Electrical Engineering, Computer Science, and Mathematics at Universität Paderborn, Germany. This chair receives financial support from the Alfried Krupp von Bohlen und Halbach foundation.
Prof. Schreier has received fellowships from the State of Bavaria, the Studienstiftung des deutschen Volkes (German National Academic Foundation), and the Deutsche Forschungsgemeinschaft (German Research Foundation). He currently serves as Area Editor and Associate Editor of the IEEE TRANSACTIONS ON SIGNAL PROCESSING. He is a member of the IEEE Technical Committee on Machine Learning for Signal Processing.
Louis L. Scharf (S'67–M'69–SM'77–F'86–LF'07) received the Ph.D. degree from the University of Washington, Seattle.
From 1971 to 1982, he served as Professor of Electrical Engineering and Statistics with Colorado State University (CSU), Ft. Collins. From 1982 to 1985, he was Professor and Chairman of Electrical and Computer Engineering, University of Rhode Island, Kingston. From 1985 to 2000, he was Professor of Electrical and Computer Engineering, University of Colorado, Boulder. In January 2001, he rejoined CSU as Professor of Electrical and Computer Engineering and Statistics. He has held several visiting positions here and abroad, including the École Supérieure d'Électricité, Gif-sur-Yvette, France; the École Nationale Supérieure des Télécommunications, Paris, France; EURECOM, Nice, France; the University of La Plata, La Plata, Argentina; Duke University, Durham, NC; the University of Wisconsin, Madison; and the University of Tromsø, Tromsø, Norway. His interests are in statistical signal processing, as it applies to adaptive radar, sonar, and wireless communication. His most important contributions to date are to invariance theories for detection and estimation; matched and adaptive subspace detectors and estimators for radar, sonar, and data communication; and canonical decompositions for reduced-dimensional filtering and quantizing. His current interests are in rapidly adaptive receiver design for space-time and frequency-time signal processing in radar/sonar and wireless communication channels.
Prof. Scharf was Technical Program Chair for the 1980 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Denver, CO; Tutorials Chair for ICASSP 2001, Salt Lake City, UT; and Technical Program Chair for the 2002 Asilomar Conference on Signals, Systems, and Computers. He is past Chair of the Fellow Committee of the IEEE Signal Processing Society and serves on the Technical Committee for Sensor Arrays and Multichannel Signal Processing. He has received numerous awards for his research contributions to statistical signal processing, including a College Research Award, an IEEE Distinguished Lectureship, an IEEE Third Millennium Medal, and the Technical Achievement and Society Awards from the IEEE Signal Processing Society.