IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 59, NO. 11, NOVEMBER 2011 5101
Complex-Valued Signal Processing: The Proper Way
to Deal With Impropriety
Tülay Adalı, Fellow, IEEE, Peter J. Schreier, Senior Member, IEEE, and Louis L. Scharf, Life Fellow, IEEE
Abstract—Complex-valued signals occur in many areas of science and engineering and are thus of fundamental interest. In the past, it has often been assumed, usually implicitly, that complex random signals are proper or circular. A proper complex random variable is uncorrelated with its complex conjugate, and a circular complex random variable has a probability distribution that is invariant under rotation in the complex plane. While these assumptions are convenient because they simplify computations, there are many cases where proper and circular random signals are very poor models of the underlying physics. When taking impropriety and noncircularity into account, the right type of processing can provide significant performance gains. There are two key ingredients in the statistical signal processing of complex-valued data: 1) utilizing the complete statistical characterization of complex-valued random signals; and 2) the optimization of real-valued cost functions with respect to complex parameters. In this overview article, we review the necessary tools, among which are widely linear transformations, augmented statistical descriptions, and Wirtinger calculus. We also present some selected recent developments in the field of complex-valued signal processing, addressing the topics of model selection, filtering, and source separation.
Index Terms—CR calculus, estimation, improper, independent component analysis, model selection, noncircular, widely linear, Wirtinger calculus.
I. INTRODUCTION
COMPLEX-VALUED signals arise in many areas of science and engineering, such as communications, electromagnetics, optics, and acoustics, and are thus of fundamental
Manuscript received November 09, 2010; revised April 09, 2011 and June 30, 2011; accepted July 01, 2011. Date of publication July 25, 2011; date of current version October 12, 2011. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Jean Pierre Delmas. The work of T. Adalı was supported by the NSF Grants NSF-CCF 0635129 and NSF-IIS 0612076; the work of P. J. Schreier was supported by the Australian Research Council (ARC) under Discovery Project Grant DP0986391; and the work of L. L. Scharf was supported by NSF Grant NSF-CCF 1018472 and AFOSR Grant FA 9550-10-1-0241. The authors gratefully acknowledge kind permission from Wiley Interscience to use some material from Chapter 1 of the book Adaptive Signal Processing: Next Generation Solutions, and from Cambridge University Press to use some material from the book Statistical Signal Processing of Complex-Valued Data: The Theory of Improper and Noncircular Signals.
T. Adalı is with the Department of Computer Science and Electrical Engineering, University of Maryland, Baltimore County, Baltimore, MD 21250 USA (e-mail: [email protected]).
P. J. Schreier is with the Signal and System Theory Group, Universität Paderborn, 33098 Paderborn, Germany (e-mail: [email protected]).
L. L. Scharf is with the Department of Electrical and Computer Engineering, Colorado State University, Ft. Collins, CO 80523 USA (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TSP.2011.2162954
interest. In the past, it has often been assumed—usually implicitly—that complex random signals are proper or circular. A
proper complex random variable is uncorrelated with its com-
plex conjugate, and a circular complex random variable has a
probability distribution that is invariant under rotation in the
complex plane. These assumptions are convenient because they
simplify computations and, in many respects, make complex
random signals look and behave like real random signals. While
these assumptions can often be justified, there are also many
cases where proper and circular random signals are very poor
models of the underlying physics. This fact has been known and appreciated by oceanographers since the early 1970s [80], but
it has only more recently started to influence the thinking of the
signal processing community.
In the last two decades, there have been important advances
in this area that show impropriety and noncircularity as an important characteristic of many signals of practical interest. When taking impropriety and noncircularity into account, the right type of processing can provide significant performance gains. For instance, in mobile multiuser communications, it can enable an improved tradeoff between spectral efficiency and power consumption. Important examples of digital modulation schemes that produce improper complex baseband signals are binary phase shift keying (BPSK), pulse amplitude modulation (PAM), Gaussian minimum shift keying (GMSK), offset quaternary phase shift keying (OQPSK), and baseband (but not passband) orthogonal frequency division multiplexing (OFDM), which is commonly called discrete multitone (DMT) [119]. A small sample of papers exploiting the impropriety of these signals is [22], [31], [45], [46], [57], [62], [78], [79], [83], [85], [123], [138], [140], and [23].
Improper baseband communication signals can also arise due to
imbalance between their in-phase and quadrature (I/Q) components. I/Q imbalance degrades the signal-to-noise ratio (SNR) and thus bit error rate performance. Some papers proposing ways of compensating I/Q imbalance in different types of communication systems include [15], [110], and [142]. Techniques for wideband system identification when the system (e.g., a wideband wireless communication channel) is not rotationally invariant are presented in [81] and [82].
In array processing, a typical goal is to estimate the direction
of arrival (DOA) of one or more signals-of-interest impinging
on a sensor array. If the signals-of-interest or the interference are
improper (as they would be if they originated, e.g., from a BPSK
transmitter), this can be exploited to achieve higher DOA reso-
lution or higher signal-to-interference-plus-noise ratio (SINR).
Some papers addressing the adaptation of array processing al-
gorithms to improper signals include [25], [26], [36], [49], [77],
and [107].
1053-587X/$26.00 © 2011 IEEE
Data-driven methods for signal processing, in particular latent variable analysis (LVA), are another area where exploiting
impropriety and noncircularity has led to important advances.
In LVA, much interest has centered around independent com-
ponent analysis (ICA), where the multivariate data are decom-
posed into additive components that are as independent as pos-
sible. It has been shown that, under certain conditions, ICA
can be achieved by exploiting impropriety [41], [63]. Signif-
icant performance gains are noted when algorithms explicitly
take the noncircular nature of the data into account [7], [69],
[87], [88], [96], [141]. Among the many applications of ICA,
medical data analysis and communications have been two of
the most active. In [141], noncircularity is exploited for feature
extraction in electrocardiograms (ECGs). In [67], [68], noncir-
cularity is used in the analysis of functional magnetic resonance
imaging (fMRI) data, leading to improved estimation of neural
activity.
There are two key ingredients in the statistical signal pro-
cessing of complex-valued data: 1) utilizing the complete
statistical characterization of complex-valued random signals; and 2) the optimization of real-valued cost functions with
respect to complex parameters. With respect to 1), Brown
and Crane [20] in 1969 were among the first to consider the
complementary correlation, which is the correlation of a com-
plex signal with its complex conjugate. They also introduced
linear-conjugate linear (or widely linear) transformations that
allow access to the information contained in this correlation.
At first, the impact of this paper was limited. It took until the
1990s before a string of papers showed renewed interest in this
topic. Many were from the French school of signal processing.
Among the important early contributions were the fundamental
studies by Comon, Duvaut, Amblard, and Lacoume [10], [32], [39], [60], the derivation of the widely linear minimum
mean-square error filter by Picinbono and Chevalier [103],
the work on higher order statistics by Lacoume, Gaeta, and
Amblard [11], [12], [43], [61], [76], [109], the introduction of
the noncircular Gaussian distribution by van den Bos [129],
and the derivation of the optimum Wiener filter for cyclosta-
tionary complex processes by Gardner [44]. We should also
mention the more mathematically oriented work by Vakhania
and Kandelaki [125] on random vectors with values in complex
Hilbert spaces.
The results concerning the optimization of real-valued func-
tions with complex arguments are much older still, although
they have gone largely unnoticed by the engineering commu-
nity. It was already in 1927 that Wirtinger proposed generalized
derivatives of nonholomorphic (including real-valued) func-
tions with respect to complex arguments [137]. However,
differentiating such functions separately with respect to real
and imaginary parts—and using varying definitions for the
derivatives—has been the common practice. This approach is
tedious, and in an attempt to make the terms and analysis more
manageable, simplifying assumptions are usually introduced.
A common such assumption is circularity. Wirtinger calculus
(also called the CR calculus [59]) presents an elegant alternative, which allows keeping all computations in the complex
domain. In the engineering community, Wirtinger calculus was rediscovered by Brandwood in 1983 [19], and then further developed by van den Bos, albeit for gradient and Hessian formulations, in [126], [128]. The extensions of gradient and Hessian expressions to the complex domain are more recent [59], [64], [65].
Our goal in this article is to review all these fundamental re-
sults, and to present some selected recent developments in the
field of complex-valued signal processing. Naturally, in a field
as wide as this, one faces the difficult decision of what topics to
cover. We have decided to focus on three key problems, namely
model selection, filtering, and source separation. Some of these
results have been drawn from the first author’s chapter in the
edited book [5], and the second and third authors’ book [115],
which provides a comprehensive review of this field. We thank
Wiley Interscience and Cambridge University Press for their
kind permission to use some material from these books. An-
other recent book on this topic is [75], which has a focus on
neural networks.
We also provide some critical perspectives in this paper. As is
often the case when new tools are introduced, there may also be
some overexcitement and abuse. We comment on several points: There are certain instances where the use of improper/noncircular models is simply unnecessary and will not improve performance. The fact that a signal is improper or noncircular does
not guarantee that this can be exploited to improve a metric
such as mean-square error. Moreover, employing improper/non-
circular models can even be disadvantageous because they are
more complex than proper/circular models. Performance should
be defined in a broader sense that includes not only optimality
with respect to a selected metric, but also other properties of
the algorithm, such as robustness and convergence speed. So an
improvement in mean-square error performance may come at
the price of less robust and more slowly converging iterative al-
gorithms. Therefore, the question of whether to use proper/circular or improper/noncircular models requires careful deliberation. We show that circular models may be preferable when
the degree of impropriety/noncircularity is small, the number of
samples is small, or the signal-to-noise ratio is low. We provide
a number of concrete examples that clarify these points.
In terms of terminology, a general question that has di-
vided researchers is how to extend definitions from the real
to the complex case. For example, since the complementary
covariance has traditionally been ignored, definitions of uncor-
relatedness and orthogonality are given new names when the
complementary covariance is taken into account. This practice
leads to some counterintuitive results such as “two uncorrelated
Gaussian random vectors need not be independent” (because
their complementary covariance matrix need not be diagonal).
We would like to avoid such displeasing results, and therefore
always adhere to the following general principle: definitions and conditions derived for the real and complex domains should be equivalent. This means that complementary covariances
must be considered if they are nonzero.
The rest of the paper is organized as follows: In the next
section, we provide a review of the tools needed for 1) utilizing
the complete statistical characterization of complex-valued
random signals and 2) the optimization of real-valued cost
functions with respect to complex parameters. These are widely
linear transformations, augmented statistical descriptions, and Wirtinger calculus. Section III addresses the question of how
to detect whether a given signal is circular or noncircular,
and more generally, how to detect the number of circular and
noncircular signals in a given signal subspace. Section IV
discusses widely linear estimation, which takes the information
in the complementary covariance into account, and Section V
deals with ICA. We hope that our paper will illuminate the role
impropriety and noncircularity play in signal processing, and
demonstrate when and how they should be taken into account.
II. PRELIMINARIES
A. Wirtinger Calculus
In statistical signal processing, we often deal with a real-
valued cost function, such as a likelihood function or a quadratic
form, which is then either analytically or numerically optimized
with respect to a vector or matrix of parameters. What happens
when the parameters are complex-valued? That is, how do we
differentiate a real-valued function with respect to a complex
argument?
Consider first a complex-valued function f(z) = u(x, y) + jv(x, y), where z = x + jy. The classical definition of complex differentiability requires that the derivative, defined as the limit

f'(z) = lim_{Δz→0} [f(z + Δz) − f(z)] / Δz    (1)

be independent of the direction in which Δz approaches 0 in the complex plane. This requires that the Cauchy–Riemann equations [3], [5]

∂u/∂x = ∂v/∂y  and  ∂v/∂x = −∂u/∂y    (2)
be satisfied. These conditions are necessary for f(z) to be complex-differentiable. If the partial derivatives of u(x, y) and v(x, y) are continuous, then they are sufficient as well.
A function that is complex-differentiable on its entire domain
is called holomorphic or analytic. Obviously, since real-valued cost functions have v(x, y) = 0, the Cauchy–Riemann conditions do not hold, and hence cost functions are not analytic. Indeed, the Cauchy–Riemann equations impose a rigid structure on u and v and thus f. A simple demonstration of this fact is that either u or v alone suffices to express the derivatives of an analytic function.
The usual approach to overcome this basic limitation is to evaluate separate derivatives with respect to the real and imaginary parts of a nonanalytic function, which may then be stacked in a vector of twice the original dimension. In the end, this solution is converted back into the complex domain. Needless to say, this approach is cumbersome. In some cases (for example, [5]), it even requires additional assumptions to make it work.
Wirtinger calculus [137]—also called the CR calculus [59]—provides a framework for differentiating nonanalytic functions. Importantly, it allows performing all the derivations in the complex domain, in a manner very similar to the real-valued case. All the evaluations become quite straightforward, making many tools and methods developed for the real case readily available for the complex case. By keeping the expressions simple, we also eliminate the need for simplifying, but often inaccurate, assumptions such as circularity.
Wirtinger calculus only requires that f be differentiable when expressed as a function f(x, y) of the real variables x and y. Such a function is called real-differentiable. If u(x, y) and v(x, y) have continuous partial derivatives with respect to x and y, then f is real-differentiable. We now define the two generalized complex derivatives

∂f/∂z = (1/2)(∂f/∂x − j ∂f/∂y)  and  ∂f/∂z* = (1/2)(∂f/∂x + j ∂f/∂y)    (3)
which can be formally “derived” by writing x = (z + z*)/2 and y = (z − z*)/(2j) and then using the chain rule [104]. These generalized derivatives can be formally implemented by regarding f as a bivariate function f(z, z*) and treating z and z* as independent variables. That is, when applying ∂/∂z, we take the derivative with respect to z, while formally treating z* as a constant. Similarly, ∂/∂z* yields the derivative with respect to z*, formally regarding z as a constant. Thus, there is no need to develop new
differentiation rules. This was shown in [19] in 1983 without
a specific reference to Wirtinger's earlier work [137]. Interestingly, many of the references that refer to [19] and use the generalized derivatives (3) evaluate them by computing derivatives with respect to x and y, instead of z and z*. This leads to unnecessarily complicated derivations.
The Cauchy–Riemann equations can simply be stated as ∂f/∂z* = 0. In other words, an analytic function cannot depend on z*. If f is analytic, then the usual complex derivative in (1) and ∂f/∂z in (3) coincide. So Wirtinger calculus contains standard complex calculus as a special case.
For real-valued f, we have (∂f/∂z)* = ∂f/∂z*, i.e., the derivative and the conjugate derivative are complex conjugates of each other. Because they are related through conjugation, we only need to compute one or the other. As a consequence, a necessary and sufficient condition for real-valued f to have a stationary point is ∂f/∂z = 0. An equivalent necessary and sufficient condition is ∂f/∂z* = 0 [19].
As an example of the application of Wirtinger derivatives, consider the real-valued function f(z) = |z|⁴ = (x² + y²)². We can evaluate ∂f/∂z* by differentiating separately with respect to x and y,

∂f/∂z* = (1/2)(∂f/∂x + j ∂f/∂y) = (1/2)[4x(x² + y²) + j 4y(x² + y²)] = 2(x + jy)(x² + y²)    (4)

or we can write the function as f(z, z*) = z²(z*)² and differentiate by treating z as a constant,

∂f/∂z* = 2 z² z* = 2z|z|².    (5)

The second approach is clearly simpler. It can be easily shown that the two expressions, (4) and (5), are equal. However, while the expression in (4) can easily be derived from (5), it is not quite as straightforward the other way round. Because f is real-valued, there is no need to compute ∂f/∂z: it is simply the conjugate of ∂f/∂z*.
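These manipulations are easy to check numerically. The sketch below is our own illustration (not from the paper), using NumPy and f(z) = |z|⁴ as a test function of our choosing: it approximates ∂f/∂x and ∂f/∂y by central differences and assembles the generalized derivatives in (3).

```python
import numpy as np

def f(z):
    # Real-valued, nonanalytic test function: f(z) = |z|^4 = (x^2 + y^2)^2
    return np.abs(z) ** 4

def wirtinger_derivs(f, z, h=1e-6):
    """Numerical Wirtinger derivatives from central differences in x and y:
    df/dz = (1/2)(df/dx - j df/dy), df/dz* = (1/2)(df/dx + j df/dy)."""
    dfdx = (f(z + h) - f(z - h)) / (2 * h)
    dfdy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    return 0.5 * (dfdx - 1j * dfdy), 0.5 * (dfdx + 1j * dfdy)

z0 = 1.0 + 2.0j
d_z, d_zconj = wirtinger_derivs(f, z0)
# Treating z and z* as independent in f = z^2 (z*)^2 gives
# df/dz = 2 z (z*)^2 = 2 z* |z|^2 and df/dz* = 2 z^2 z* = 2 z |z|^2.
print(np.allclose(d_z, 2 * np.conj(z0) * abs(z0) ** 2))   # True
print(np.allclose(d_zconj, 2 * z0 * abs(z0) ** 2))        # True
```

The finite-difference route works through x and y, while the closed forms come from the much simpler z, z* route; the two agree, as the calculus predicts.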
Evaluation of integrals, e.g., for calculating probabilities, is also commonly encountered. The same observation that allows the use of a representation in the form f(z, z*) and treating z and z*
as independent variables—when obviously they are not—can also be made when calculating integrals. When we consider f as a function of real and imaginary parts, the definition of an integral is well understood as the integral of the function f(x, y) over a region R defined on (x, y), ∫∫_R f(x, y) dx dy. The integral ∫∫ f(z, z*) dz dz*, on the other hand, is not meaningful, as we cannot vary the two variables z and z* independently, and we cannot define the region corresponding to R in the complex domain. However, using Green's theorem [3] or Stokes's theorem [51], [52], we can write the real-valued integral as a contour integral in the complex domain [91]

∫∫_R f(z, z*) dx dy = (1/2j) ∮_C F(z, z*) dz    (6)

where ∂F(z, z*)/∂z* = f(z, z*). Here, we have to assume that F(z, z*) is continuous through the simply connected region R, and C describes its contour. By transforming the integral defined in the real domain to a contour integral in the complex domain, the formula shows the dependence on the two variables z and z* in a natural manner.
In [91], the application of the integral relationship in (6) is dis-
cussed in detail, along with examples, for evaluating probability
masses when f(z, z*) is a probability density function. Another example is given in [7], where (6) is used to evaluate the
expectation of the score function with a generalized Gaussian
distribution.
Wirtinger calculus extends straightforwardly to functions of vectors or matrices, f: ℂ^N → ℂ or f: ℂ^{N×M} → ℂ. There is no need to develop new differentiation rules for Wirtinger derivatives. All rules for taking derivatives for real functions remain valid. However, care must be taken to properly distinguish between the variables with respect to which differentiation is performed and those that are formally regarded as constants. So all the expressions from the real-valued case given, for example, in [98], can be straightforwardly applied to the complex case. For instance, for f(z, z*) = z^H z, we obtain

∂f/∂z = z*  and  ∂f/∂z* = z.
There are a number of comprehensive references (e.g., [5], [42],
[59], and [115]) on Wirtinger calculus that deal with the chain
rule for nonanalytic functions, complex gradients and Hessians,
and complex Taylor series expansions.

In summary, Wirtinger calculus defines a framework in which
the derivatives of nonanalytic functions can be computed in a
manner very similar to the real-valued or the analytic case. The
framework is general in that it also includes derivatives of an-
alytic functions as a special case. It allows keeping the computations in the complex domain and eliminates the need to double the dimensionality as in the approach taken by [126]. Most of the tools
developed for real-valued signals can thus be easily extended
to the complex setting; see, e.g., [5], [65], and [66]. In this
overview paper, we use Wirtinger calculus for a straightforward
derivation of the complex least-mean-square (LMS) algorithm
in Section IV-C, and an extension of ICA, based on higher order
statistics, to complex data in Section V-B. Without Wirtinger
calculus, these extensions would be significantly more difficult.
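To preview how Wirtinger calculus streamlines such derivations: applying the generalized derivatives to the cost |e|², treating w and w* as independent, gives ∂|e|²/∂w* = −e*x and hence the complex LMS update w ← w + μ x e* for the filter output y = w^H x. The following sketch is our own toy system-identification example (the channel, input model, and step size are our choices, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: identify a strictly linear channel w_true driven
# by a proper white complex input (unit power per entry).
n, N, mu = 4, 5000, 0.01
w_true = rng.standard_normal(n) + 1j * rng.standard_normal(n)
x = (rng.standard_normal((N, n)) + 1j * rng.standard_normal((N, n))) / np.sqrt(2)
d = x @ np.conj(w_true)            # desired output d = w_true^H x per sample

# Complex LMS: the Wirtinger gradient dJ/dw* = -e* x of J = |e|^2
# yields the stochastic-gradient update w <- w + mu * x * conj(e).
w = np.zeros(n, dtype=complex)
for k in range(N):
    e = d[k] - np.vdot(w, x[k])    # error e = d - w^H x
    w = w + mu * x[k] * np.conj(e)

print(np.allclose(w, w_true, atol=1e-2))  # True: the filter converges
```

Because the input here is proper and the channel strictly linear, a strictly linear filter suffices; Section IV discusses the widely linear case.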
B. Widely Linear Transformations, Inner Products, and
Quadratic Forms
Let us now explore the different ways that linear transformations can be described in the real and complex domains. In order to do so, we construct three closely related vectors from two real vectors x ∈ ℝ^n and y ∈ ℝ^n. The first is the real composite 2n-dimensional vector u = [x^T, y^T]^T, obtained by stacking x on top of y. The second is the complex vector z = x + jy, and the third is the complex augmented vector z̲ = [z^T, z^H]^T, obtained by stacking z on top of its complex conjugate z*. The space of complex augmented vectors, whose bottom n entries are the complex conjugates of the top n entries, is denoted by ℂ_*^{2n}. Augmented vectors are always underlined. In much of our discussion, our focus will be on complex-valued quantities, where we will be using z and its augmentation z̲.

The complex augmented vector z̲ is related to the real composite vector u as z̲ = T_n u and u = (1/2) T_n^H z̲, where the real-to-complex transformation

T_n = [ I   jI ]
      [ I  −jI ]    (7)

is unitary up to a factor of 2, i.e., T_n T_n^H = T_n^H T_n = 2I. The complex augmented vector z̲ is obviously an equivalent redundant, but convenient, representation of z. When the size of T_n is clear, we may drop the subscript for economy.
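As a quick numerical sanity check (our own illustration in NumPy), the sketch below builds u, z, and the augmented vector, and verifies the relations around (7):

```python
import numpy as np

n = 3
rng = np.random.default_rng(1)
x, y = rng.standard_normal(n), rng.standard_normal(n)

u = np.concatenate([x, y])              # real composite vector [x; y]
z = x + 1j * y                          # complex vector
z_aug = np.concatenate([z, z.conj()])   # augmented vector [z; z*]

I = np.eye(n)
T = np.block([[I, 1j * I], [I, -1j * I]])  # real-to-complex transformation (7)

print(np.allclose(T @ u, z_aug))                       # z_aug = T u
print(np.allclose(T @ T.conj().T, 2 * np.eye(2 * n)))  # T T^H = 2I
print(np.allclose(u, 0.5 * T.conj().T @ z_aug))        # u = (1/2) T^H z_aug
```

All three prints are True: the augmented vector carries the same information as the real composite vector, up to the fixed invertible map T.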
If a real linear transformation w = M u, with M a real matrix of compatible dimensions, is applied to the real composite vector u = [x^T, y^T]^T, it yields a real composite vector w = [a^T, b^T]^T. The augmented complex version of w is

ζ̲ = W z̲    (8)

with W = (1/2) T M T^H. The matrix W is called an augmented matrix because it satisfies a particular block pattern, where the SE block is the conjugate of the NW block, and the SW block is the conjugate of the NE block:

W = [ W₁   W₂  ]
    [ W₂*  W₁* ]    (9)

Hence, W is an augmented description of the widely linear [103] or linear-conjugate-linear transformation

ζ = W₁ z + W₂ z*.

Even though the representation in (9) contains some redundancy, as the northern blocks determine the southern blocks, it proves to be a powerful tool. For instance, it enables easy concatenation of widely linear transformations.
Obviously, the set of complex linear transformations, ζ = W₁ z with W₂ = 0, is a subset of the set of widely linear transformations. A complex linear transformation (sometimes
called strictly linear for emphasis) has the equivalent real representation

M = [ A  −B ]
    [ B   A ],  with W₁ = A + jB.    (10)
To summarize, linear transformations on ℝ^{2n} are linear on ℂ^n only if they have the particular structure (10). Otherwise, the equivalent operation on ℂ^n is widely linear. For analysis, it is often preferable to represent ℝ-linear operations as ℂ-widely linear operations. However, from a hardware implementation point of view, ℂ-linear transformations are usually preferable over ℂ-widely linear transformations because the former require fewer real operations (additions and multiplications) than the latter.
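The correspondence between a real-linear map M on the composite vector and a widely linear map on z can be verified numerically. This sketch is our own illustration (the sizes and random matrices are arbitrary choices): it forms the augmented matrix W = (1/2) T M T^H, checks its block pattern (9), and compares the two descriptions of the same transformation.

```python
import numpy as np

n = 2
rng = np.random.default_rng(2)
M = rng.standard_normal((2 * n, 2 * n))   # arbitrary real-linear map on R^{2n}

I = np.eye(n)
T = np.block([[I, 1j * I], [I, -1j * I]])
W = 0.5 * T @ M @ T.conj().T              # augmented matrix W = (1/2) T M T^H

W1, W2 = W[:n, :n], W[:n, n:]             # NW and NE blocks
# Block pattern of an augmented matrix: SE = conj(NW), SW = conj(NE)
print(np.allclose(W[n:, n:], W1.conj()) and np.allclose(W[n:, :n], W2.conj()))

x, y = rng.standard_normal(n), rng.standard_normal(n)
z = x + 1j * y
ab = M @ np.concatenate([x, y])           # real-composite output [a; b]
zeta = W1 @ z + W2 @ z.conj()             # widely linear output W1 z + W2 z*
print(np.allclose(zeta, ab[:n] + 1j * ab[n:]))  # same transformation
```

A generic real matrix M produces a nonzero conjugate-linear part W₂; only matrices with the structure (10) give W₂ = 0, i.e., a strictly linear map.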
Next, we look at the different representations of inner products and quadratic forms. Consider the two 2n-dimensional real composite vectors u = [a^T, b^T]^T and v = [c^T, d^T]^T, the corresponding n-dimensional complex vectors x = a + jb and y = c + jd, and their complex augmented descriptions x̲ and y̲. We may now relate the inner products defined on ℝ^{2n}, ℂ_*^{2n}, and ℂ^n as

u^T v = (1/2) x̲^H y̲ = Re(x^H y).

Thus, the usual inner product defined on ℝ^{2n} equals (up to a factor of 1/2) the inner product defined on ℂ_*^{2n}, and also the real part of the usual inner product defined on ℂ^n. Both the inner product on ℂ_*^{2n} and the inner product on ℂ^n are useful.
Another common real-valued expression is the quadratic form u^T A u, which may be written as a (real-valued) widely quadratic form in z:

u^T A u = (1/2) z̲^H W z̲.

The augmented matrix W and the real matrix A are connected as before in (9), with W = (1/2) T A T^H. Thus, we obtain

u^T A u = Re(z^H W₁ z + z^H W₂ z*).
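Both of these identities are easy to confirm numerically. The sketch below is our own illustration (dimensions and random data are arbitrary): it checks the inner-product relation u^T v = (1/2) x̲^H y̲ = Re(x^H y) and the widely quadratic form.

```python
import numpy as np

n = 4
rng = np.random.default_rng(3)
a, b, c, d = (rng.standard_normal(n) for _ in range(4))

u, v = np.concatenate([a, b]), np.concatenate([c, d])  # real composites
x, y = a + 1j * b, c + 1j * d                          # complex vectors
x_aug = np.concatenate([x, x.conj()])
y_aug = np.concatenate([y, y.conj()])

# Inner products: u^T v = (1/2) x_aug^H y_aug = Re(x^H y)
print(np.allclose(u @ v, 0.5 * np.vdot(x_aug, y_aug).real))
print(np.allclose(u @ v, np.vdot(x, y).real))

# Widely quadratic form: u^T A u = (1/2) x_aug^H W x_aug, W = (1/2) T A T^H
A = rng.standard_normal((2 * n, 2 * n))
A = A + A.T                                            # real symmetric matrix
I = np.eye(n)
T = np.block([[I, 1j * I], [I, -1j * I]])
W = 0.5 * T @ A @ T.conj().T
print(np.allclose(u @ A @ u, 0.5 * np.vdot(x_aug, W @ x_aug).real))
```

Note that `np.vdot` conjugates its first argument, so `np.vdot(x, y)` computes x^H y.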
C. Statistics of Complex-Valued Random Variables and Vectors
We define a complex random variable as z = x + jy, where x and y are a pair of real random variables. Similarly, an n-dimensional complex random vector is defined as z = x + jy, where x and y are a pair of n-dimensional real random vectors. Note that we do not differentiate in notation between a random vector and its realization, as the meaning should be clear from the context. The probability distribution (density) of a complex random vector is interpreted as the 2n-dimensional joint distribution (density) of its real and imaginary parts. If the probability density function (pdf) exists, we write this as

p_z(z) = p_{x,y}(x, y).

If there is no risk of confusion, the subscripts may also be dropped. For a function g(z) = g(z, z*) whose domain includes the range of z, the expectation operator is defined as

E[g(z)] = ∫∫ g(z, z*) p(z) dx dy.    (11)

This integral can also be evaluated using contour integrals as in (6), writing the function as g(z, z*) and the pdf as p(z, z*).
Unless otherwise stated, we assume that all random vectors have zero mean. In order to characterize the second-order statistical properties of z, we consider the real composite random vector u = [x^T, y^T]^T. Its covariance matrix is

E[u u^T] = [ R_xx    R_xy ]
           [ R_xy^T  R_yy ]

with R_xx = E[x x^T], R_yy = E[y y^T], and R_xy = E[x y^T]. The augmented covariance matrix of z is [101], [114], [129]

R̲ = E[z̲ z̲^H] = [ R    R̃  ]
                 [ R̃*   R* ]    (12)

The NW block of the augmented covariance matrix is the usual (Hermitian and positive semi-definite) covariance matrix

R = E[z z^H]    (13)

and the NE block is the complementary covariance matrix

R̃ = E[z z^T]    (14)
which uses a regular transpose rather than a Hermitian (conjugate) transpose. Other names for R̃ include pseudo-covariance matrix [84], conjugate covariance matrix [44], and relation matrix [102]. It is important to note that, in general, both R and R̃ are required for a complete second-order characterization of z. In the important special case where the complementary covariance matrix vanishes, R̃ = 0, z is called proper, otherwise improper [84].
The conditions for propriety on the covariance and cross-covariance of real and imaginary parts x and y are R_xx = R_yy and R_xy = −R_xy^T. When z is scalar, x and y must therefore have equal variance and be uncorrelated for propriety. If z is proper, its Hermitian covariance matrix is

R = 2(R_xx + j R_yx)

and its augmented covariance matrix is block-diagonal. If complex z is proper and scalar, then σ_z² = 2σ_x². It is easy to see that propriety is preserved by strictly linear transformations, which are represented by block-diagonal augmented matrices.
If R is nonsingular, the following three conditions are necessary and sufficient for R and R̃ to be the covariance and complementary covariance matrices of a complex random vector z [101]:
1) the covariance matrix R is Hermitian and positive definite;
2) the complementary covariance matrix R̃ is symmetric, R̃ = R̃^T;
3) the Schur complement of R within the augmented covariance matrix, R* − R̃^H R^{−1} R̃, is positive semi-definite.
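These three conditions translate directly into a small validity check. The helper below is our own sketch (the function name, tolerances, and test matrices are our choices):

```python
import numpy as np

def is_valid_pair(R, C, tol=1e-10):
    """Check the three conditions on a (covariance, complementary covariance)
    pair: R Hermitian positive definite, C symmetric, and the Schur complement
    conj(R) - C^H R^{-1} C positive semi-definite."""
    herm_pd = np.allclose(R, R.conj().T) and np.all(np.linalg.eigvalsh(R) > tol)
    symm = np.allclose(C, C.T)
    S = R.conj() - C.conj().T @ np.linalg.solve(R, C)
    psd = np.all(np.linalg.eigvalsh((S + S.conj().T) / 2) > -tol)
    return herm_pd and symm and psd

R = np.eye(2)
print(is_valid_pair(R, 0.5 * np.eye(2)))  # True: complementary part small enough
print(is_valid_pair(R, 2.0 * np.eye(2)))  # False: Schur complement not PSD
```

Intuitively, condition 3) bounds how "large" the complementary covariance can be relative to the covariance, just as a correlation coefficient is bounded by one in magnitude.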
Circularity: It is also possible to define a stronger version of
propriety in terms of the probability distribution of a random
vector. A vector z is called circular if its probability distribution is rotationally invariant, i.e., z and e^{jα}z have the same probability distribution for any given real α. Circularity does not imply any condition on the standard covariance matrix R because

E[(e^{jα}z)(e^{jα}z)^H] = E[z z^H].    (15)

On the other hand,

E[(e^{jα}z)(e^{jα}z)^T] = e^{j2α} E[z z^T] = E[z z^T]    (16)

is true for arbitrary α if and only if R̃ = E[z z^T] = 0. Because the Gaussian distribution is completely determined by second-order statistics, a complex Gaussian random vector is circular if and only if it is zero-mean and proper [48].

Propriety requires that second-order moments be rotationally invariant, whereas circularity requires that the distribution, and thus all moments (if they exist), be rotationally invariant. Therefore, circularity implies zero mean and propriety, but not vice versa, and impropriety implies noncircularity, but not vice versa.
By extending the reasoning of (15) and (16) to higher order moments, we see that if z is circular, a pth-order moment can be nonzero only if it has the same number of conjugated and nonconjugated terms [100]. In particular, all odd-order moments must be zero. This holds for arbitrary order p.
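A quick Monte Carlo check of this moment structure (our own illustration; the sample size and tolerances are our choices) for a circular scalar complex Gaussian:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 500_000

# Zero-mean proper complex Gaussian -> circular: moments with unequal numbers
# of conjugated and nonconjugated factors vanish.
z = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)

print(abs(np.mean(z**2)) < 0.02)              # E[z^2]    ~ 0 (2 vs 0 conjugates)
print(abs(np.mean(z**2 * z.conj())) < 0.02)   # E[z^2 z*] ~ 0 (odd order)
print(abs(np.mean(np.abs(z)**2) - 1) < 0.01)  # E[z z*] = 1 (balanced, survives)

# Rotation invariance of a balanced moment: e^{j a} z has the same power
a = 0.7
w = np.exp(1j * a) * z
print(abs(np.mean(np.abs(w)**2) - np.mean(np.abs(z)**2)) < 1e-10)
```

Only the moment with equal numbers of conjugated and nonconjugated factors survives; the others are zero up to sampling error, as the rotation argument predicts.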
We call a vector z pth-order circular [100] or pth-order proper if the only nonzero moments up to order p have the same number of conjugated and nonconjugated terms. In particular, all odd moments up to order p must be zero. Therefore, for zero-mean random vectors, the terms proper and
second-order circular are equivalent. We note that, while this
terminology is most common, there is not uniform agreement
in the literature. Some researchers use the terms “proper” and
“circular” interchangeably. More detailed discussions of higher
order statistics, Taylor series expansions, and characteristic
functions for improper random vectors are provided in [11],
[42], and [115].
D. Statistics of Complex-Valued Random Processes
In this section, we provide a very brief introduction to complex discrete-time random processes. Complex random
processes have been studied by [12], [33], [84], [100], [102],
[111], and [132], among others. For a comprehensive review,
we refer the reader to [115].
The covariance function of a complex-valued discrete-time random process x(n) is defined as

c(n₁, n₂) = E[x(n₁) x*(n₂)] − μ(n₁) μ*(n₂)

where the first term is the correlation function. To completely describe the second-order statistics, we also define the complementary covariance function (also called the pseudo-covariance or relation function) as

c̃(n₁, n₂) = E[x(n₁) x(n₂)] − μ(n₁) μ(n₂).

If the complementary covariance function vanishes for all pairs (n₁, n₂), then x(n) is called proper, otherwise improper. If and only if x(n) is zero-mean and proper is it second-order circular.
The definitions in the continuous-time case are completely
analogous.
A random signal is stationary if all its statistical prop-
erties are invariant to any given time-shift (translations of the
origin), or equivalently, if the family of distributions that de-
scribe the random process as a collection of random variables
is shift-invariant. To define wide-sense stationarity (WSS), we
therefore consider the complete second-order statistical char-
acterization including the complementary covariance function.
We call a complex random process x(n) WSS if and only if the mean μ(n), the covariance function c(n + τ, n), and the complementary covariance function c̃(n + τ, n) are all independent of n.
Since the complementary covariance function is commonly
ignored, the “traditional” definition of WSS for complex
processes only considers the covariance function, allowing a
shift-dependent complementary covariance function. Some re-
searchers then introduce a different term (such as second-order stationary [100]) to refer to the condition where both the
covariance and complementary covariance functions are
shift-invariant.
Let $x[n]$ be a second-order zero-mean WSS process, and let $\underline{x}[n] = [x[n], x^*[n]]^T$ denote the corresponding augmented process. We define the augmented power spectral density matrix of $x[n]$ as the discrete-time Fourier transform of the covariance function of $\underline{x}[n]$, which is given by
$$\underline{S}_{xx}(f) = \begin{bmatrix} S_{xx}(f) & \tilde{S}_{xx}(f) \\ \tilde{S}_{xx}^*(f) & S_{xx}(-f) \end{bmatrix}.$$
In this equation, $S_{xx}(f)$ is the power spectral density and $\tilde{S}_{xx}(f)$ is called the complementary power spectral density. When the complementary power spectral density vanishes for all $f$, $\tilde{S}_{xx}(f) = 0$, the process is proper.
The augmented power spectral density matrix is positive
semi-definite. As a consequence, there exists a WSS random process with power spectral density $S_{xx}(f)$ and complementary power spectral density $\tilde{S}_{xx}(f)$ if and only if [102]:
1) $S_{xx}(f) \ge 0$;
2) $\tilde{S}_{xx}(f) = \tilde{S}_{xx}(-f)$ (but $\tilde{S}_{xx}(f)$ is generally complex);
3) $|\tilde{S}_{xx}(f)|^2 \le S_{xx}(f)\,S_{xx}(-f)$.
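These admissibility conditions can be checked numerically on a frequency grid. The following is our own sketch with an assumed example pair of spectra; the particular functional forms are arbitrary and chosen only so that all three conditions hold.

```python
import numpy as np

# Candidate spectra on a symmetric normalized-frequency grid.
f = np.linspace(-0.5, 0.5, 257)
S = 1.0 + 0.5 * np.cos(2 * np.pi * f)                  # PSD: real, nonnegative
St = 0.4 * np.exp(1j * np.pi * np.cos(2 * np.pi * f))  # complementary PSD (complex)

S_neg = S[::-1]    # S(-f): the grid is symmetric about f = 0
St_neg = St[::-1]  # S~(-f)

cond1 = bool(np.all(S >= 0))                                # 1) S(f) >= 0
cond2 = bool(np.allclose(St, St_neg))                       # 2) S~(f) = S~(-f)
cond3 = bool(np.all(np.abs(St) ** 2 <= S * S_neg + 1e-12))  # 3) |S~(f)|^2 <= S(f) S(-f)
valid = cond1 and cond2 and cond3  # True: a WSS process with this pair exists
```

Violating condition 3, e.g. by scaling the complementary spectrum above the PSD envelope, would make the augmented power spectral density matrix indefinite.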
The third condition allows us to establish the classical result that analytic WSS signals without a DC component must be proper [100]. If $x[n]$ is analytic, then $S_{xx}(f) = 0$ for $f < 0$. If it does not have a DC component, then $S_{xx}(f) = 0$ for $f = 0$. Therefore, $S_{xx}(f)S_{xx}(-f) = 0$ for all $f$ implies $\tilde{S}_{xx}(f) = 0$. There are further connections between stationarity
and circularity, which are explored in detail in [100]. An al-
gorithm for the simulation of improper WSS processes having
specified covariance and complementary covariance functions
is given in [108].
E. Gaussian Random Variables and Vectors
Next, we look at the ever-important Gaussian distribution. In
order to derive the general complex multivariate Gaussian pdf
(proper/circular or improper/noncircular) for zero-mean $\mathbf{x}$
, we begin with the Gaussian pdf of the composite random vector of real and imaginary parts $\mathbf{z} = [\mathbf{x}_r^T, \mathbf{x}_i^T]^T$:
$$p(\mathbf{z}) = \frac{1}{(2\pi)^n \det^{1/2}(\mathbf{R}_{zz})} \exp\left(-\tfrac{1}{2}\,\mathbf{z}^T \mathbf{R}_{zz}^{-1}\mathbf{z}\right). \quad (17)$$
We first consider the nondegenerate case where all covariance matrices are invertible. Using $\underline{\mathbf{x}} = \mathbf{T}\mathbf{z}$, $\underline{\mathbf{R}}_{xx} = \mathbf{T}\mathbf{R}_{zz}\mathbf{T}^H$, and $\det \underline{\mathbf{R}}_{xx} = 4^n \det \mathbf{R}_{zz}$, where $\mathbf{T}$ was defined in (7), we obtain the pdf of complex $\mathbf{x}$ [101], [129]:
$$p(\mathbf{x}) = \frac{1}{\pi^n \det^{1/2}(\underline{\mathbf{R}}_{xx})} \exp\left(-\tfrac{1}{2}\,\underline{\mathbf{x}}^H \underline{\mathbf{R}}_{xx}^{-1}\underline{\mathbf{x}}\right). \quad (18)$$
This pdf depends algebraically on $\underline{\mathbf{x}}$, i.e., $\mathbf{x}$ and $\mathbf{x}^*$, but is interpreted as the joint pdf of $\mathbf{x}_r$ and $\mathbf{x}_i$, and can be used for proper/circular or improper/noncircular $\mathbf{x}$. In the past, the term "complex Gaussian distribution" often implicitly assumed circularity. Therefore, some researchers call a noncircular complex Gaussian random vector "generalized complex Gaussian." The simplification that occurs when $\tilde{\mathbf{R}}_{xx} = \mathbf{0}$ is obvious and leads to the pdf of a complex proper and circular Gaussian random vector [47], [139]:
$$p(\mathbf{x}) = \frac{1}{\pi^n \det(\mathbf{R}_{xx})} \exp\left(-\mathbf{x}^H \mathbf{R}_{xx}^{-1}\mathbf{x}\right).$$
A generalization of the Gaussian distribution is the family of
elliptical distributions. A study of complex elliptical distribu-
tions has been provided by [94] and in [115, sec. 2.3.4].
The scalar complex Gaussian: The case of a scalar Gaussian $x = x_r + jx_i$ provides some interesting insights [92], [115]. The real bivariate Gaussian vector $[x_r, x_i]^T$ can be characterized by $\sigma_r^2$ (the variance of $x_r$), $\sigma_i^2$ (the variance of $x_i$), and $\rho_{ri}$ (the correlation coefficient between $x_r$ and $x_i$). From (18), the pdf of the complex Gaussian variable $x$ can be expressed as
$$p(x) = \frac{1}{\pi\sigma_x^2\sqrt{1-|\rho|^2}} \exp\left(-\frac{|x|^2 - \mathrm{Re}(\rho^* x^2)}{\sigma_x^2(1-|\rho|^2)}\right), \quad (19)$$
where the complex correlation coefficient $\rho$ between $x$ and $x^*$ is defined as
$$\rho = \frac{\mathrm{E}\{x^2\}}{\mathrm{E}\{|x|^2\}} = \frac{\mathrm{E}\{x^2\}}{\sigma_x^2}.$$
So there are two equivalent parametrizations of a complex Gaussian $x$: We can use the three real parameters $\sigma_r^2$, $\sigma_i^2$, and $\rho_{ri}$, or the three real parameters $\sigma_x^2$, $|\rho|$, and $\angle\rho$. We can obtain the second set of parameters from the first set as
$$\sigma_x^2 = \sigma_r^2 + \sigma_i^2, \quad (20)$$
$$\rho = \frac{\sigma_r^2 - \sigma_i^2 + j\,2\rho_{ri}\sigma_r\sigma_i}{\sigma_r^2 + \sigma_i^2}, \quad (21)$$
and vice versa as
$$\sigma_r^2 = \frac{\sigma_x^2}{2}\left(1 + \mathrm{Re}\,\rho\right), \quad (22)$$
$$\sigma_i^2 = \frac{\sigma_x^2}{2}\left(1 - \mathrm{Re}\,\rho\right), \quad (23)$$
$$\rho_{ri} = \frac{\mathrm{Im}\,\rho}{\sqrt{1 - (\mathrm{Re}\,\rho)^2}}. \quad (24)$$
The complex correlation coefficient satisfies $0 \le |\rho| \le 1$. If $|\rho| = 1$, then $x = e^{j\angle\rho/2}u$ for a real random variable $u$ with probability 1. Equivalent conditions for $|\rho| = 1$ are $\sigma_r^2 = 0$, or $\sigma_i^2 = 0$, or $\rho_{ri} = \pm 1$. The first of these conditions makes the complex variable purely imaginary and the second makes it real. The third condition means $x_i = (\sigma_i/\sigma_r)x_r$ for $\rho_{ri} = 1$ and $x_i = -(\sigma_i/\sigma_r)x_r$ for $\rho_{ri} = -1$. All these cases with $|\rho| = 1$ are called maximally noncircular because the support of the pdf for the complex random variable $x$ degenerates into a line in the complex plane.
For $|\rho| < 1$, there are basically four different cases:
1) If $x_r$ and $x_i$ have identical variances, $\sigma_r^2 = \sigma_i^2$, and are independent, $\rho_{ri} = 0$, then $x$ is proper and circular, i.e., $\rho = 0$, and its pdf takes on the simple and much better-known form
$$p(x) = \frac{1}{\pi\sigma_x^2}\exp\left(-\frac{|x|^2}{\sigma_x^2}\right). \quad (25)$$
2) If $x_r$ and $x_i$ have different variances, $\sigma_r^2 \ne \sigma_i^2$, but $x_r$ and $x_i$ are still independent, $\rho_{ri} = 0$, then $\rho$ is real, $\angle\rho \in \{0, \pi\}$, and $x$ is noncircular.
3) If $x_r$ and $x_i$ have identical variances, $\sigma_r^2 = \sigma_i^2$, but $x_r$ and $x_i$ are correlated, $\rho_{ri} \ne 0$, then $\rho$ is purely imaginary, $\angle\rho = \pm\pi/2$, and $x$ is noncircular.
4) If $x_r$ and $x_i$ have different variances, $\sigma_r^2 \ne \sigma_i^2$, and are correlated, $\rho_{ri} \ne 0$, then $\rho$ is generally complex.
With $x = re^{j\theta}$ and $\rho = |\rho|e^{j\angle\rho}$, we see that the pdf in (19) is constant on the level curve $r^2(1 - |\rho|\cos(2\theta - \angle\rho)) = \text{const}$. This contour is an ellipse, and $r$ is maximum when $1 - |\rho|\cos(2\theta - \angle\rho)$ is minimum. This establishes that the ellipse orientation (the angle between the real axis and the major ellipse axis) is $\theta = \angle\rho/2$, which is half the angle of the complex correlation coefficient $\rho$. It can be shown (see [92]) that the degree of noncircularity is the square of the ellipse eccentricity.
Fig. 1 shows contours of constant probability density for cases 1–4 listed above. In Fig. 1(a), we see the circular case with $\rho = 0$, which exhibits circular contour lines. All remaining plots are noncircular, with elliptical contour lines. We can make two observations: First, increasing the degree of noncircularity of the signal by increasing $|\rho|$ leads to ellipses with greater eccentricity. Second, the angle of the ellipse orientation is half the angle of $\rho$, as shown above. Fig. 1(b) shows case 2: $x_r$ and $x_i$ have different variances but are still independent. In this situation, the ellipse orientation is either 0° or 90°, depending on whether $x_r$ or $x_i$ has greater variance. Fig. 1(c) shows case 3: $x_r$ and $x_i$ have the same variance but are now correlated. In this situation, the ellipse orientation is either 45° or 135°. The general case 4 is depicted in Fig. 1(d). Now, the ellipse can have an arbitrary orientation, which is controlled by the angle of $\rho$.
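The half-angle relation between $\rho$ and the ellipse orientation is easy to verify by simulation. The sketch below is our own illustration (not from the paper); the covariance values are arbitrary, and the orientation is taken as the principal-axis direction of the sample covariance of the real and imaginary parts.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200_000

# Arbitrary real covariance of (x_r, x_i): unequal variances, correlated
Cuv = np.array([[2.0, 0.6],
                [0.6, 0.5]])
L = np.linalg.cholesky(Cuv)
uv = L @ rng.normal(size=(2, N))
x = uv[0] + 1j * uv[1]

# Complex correlation coefficient rho = E{x^2} / E{|x|^2}
rho = np.mean(x ** 2) / np.mean(np.abs(x) ** 2)

# Ellipse orientation: direction of the principal axis of the real covariance
evals, evecs = np.linalg.eigh(np.cov(uv))
major = evecs[:, np.argmax(evals)]
theta = np.arctan2(major[1], major[0]) % np.pi  # orientation in [0, pi)

half_angle = (np.angle(rho) / 2) % np.pi        # should match theta
```

For this covariance, $\rho \approx 0.6 + 0.48j$, and both angles come out near 0.34 rad, confirming the half-angle rule.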
Marginal and von Mises distributions: With $x = re^{j\theta}$, $x_r = r\cos\theta$, $x_i = r\sin\theta$, it is possible to change variables and obtain the joint pdf $p(r, \theta)$ for the polar coordinates
Fig. 1. Probability-density contours of complex Gaussian random variables with different complex correlation coefficient $\rho$.
Fig. 2. Marginal pdfs for magnitude and angle in the bivariate Gaussian distribution with unit variance and various values for $|\rho|$.
where $r \ge 0$ and $-\pi \le \theta < \pi$. The marginal pdf for $r$ is obtained by integrating over $\theta$,
$$p(r) = \frac{2r}{\sigma_x^2\sqrt{1-|\rho|^2}} \exp\left(-\frac{r^2}{\sigma_x^2(1-|\rho|^2)}\right) I_0\left(\frac{|\rho|\,r^2}{\sigma_x^2(1-|\rho|^2)}\right), \quad (26)$$
where $I_0$ is the modified Bessel function of the first kind of order 0. The pdf $p(r)$ is invariant to $\angle\rho$. In the circular case $\rho = 0$, it is the Rayleigh pdf
$$p(r) = \frac{2r}{\sigma_x^2}\exp\left(-\frac{r^2}{\sigma_x^2}\right).$$
This suggests that we call $p(r)$ in (26) the improper/noncircular Rayleigh pdf. Integrating $p(r, \theta)$ over $r$ yields the marginal pdf for $\theta$:
$$p(\theta) = \frac{\sqrt{1-|\rho|^2}}{2\pi\left(1 - |\rho|\cos(2\theta - \angle\rho)\right)}.$$
In the circular case $\rho = 0$, this is a uniform distribution.
These marginals are illustrated in Fig. 2 for unit variance and various values of $|\rho|$. The larger $|\rho|$, the more noncircular $x$ becomes, and the marginal distribution of $\theta$ develops two peaks at $\theta = \angle\rho/2$ and $\theta = \angle\rho/2 - \pi$. At the same time, the maximum of the pdf of $r$ is shifted to the left. Having derived the joint and marginal distributions of $r$ and $\theta$, it is also straightforward to write down the von Mises pdf, which is the conditional distribution of $\theta$ given $r$:
$$p(\theta \mid r) = \frac{\exp\left(\kappa\cos(2\theta - \angle\rho)\right)}{2\pi I_0(\kappa)} \quad \text{with} \quad \kappa = \frac{|\rho|\,r^2}{\sigma_x^2(1-|\rho|^2)}.$$
All these results show that the parameters $\sigma_x^2$, $|\rho|$, $\angle\rho$ for complex $x$, rather than the parameters $\sigma_r^2$, $\sigma_i^2$, $\rho_{ri}$ for real $x_r$ and $x_i$, are the most natural parametrization for the joint and marginal pdfs of the polar coordinates $r$ and $\theta$.
It is worthwhile to comment on the fact that the Central Limit
Theorem still applies to noncircular random variables. That is,
adding more and more independent and identically distributed
noncircular random variables leads to a sample average that
is asymptotically Gaussian and noncircular. However, this ad-
dition has to be done coherently, preserving the phase of the
random samples. If a large number of noncircular random variables are added noncoherently, i.e., with randomized phases, then the phase information is washed out, and the resulting sample average becomes more and more circular.
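This coherent-versus-noncoherent effect is easy to demonstrate numerically. The following is our own sketch; the impropriety measure $|\mathrm{E}\{z^2\}|/\mathrm{E}\{|z|^2\}$ and all sizes are illustrative choices, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
K, trials = 64, 20_000

# Maximally noncircular i.i.d. samples (real-valued): E{x^2} = E{|x|^2}
x = rng.normal(size=(trials, K)).astype(complex)

# Coherent sum: phases preserved, so the normalized average stays noncircular
coh = x.sum(axis=1) / np.sqrt(K)

# Noncoherent sum: each term rotated by an independent random phase
phases = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, size=(trials, K)))
noncoh = (x * phases).sum(axis=1) / np.sqrt(K)

def impropriety(z):
    """|E{z^2}| / E{|z|^2}: 1 for maximally noncircular, 0 for proper."""
    return np.abs(np.mean(z ** 2)) / np.mean(np.abs(z) ** 2)

k_coh = impropriety(coh)        # remains close to 1
k_noncoh = impropriety(noncoh)  # washed out toward 0
```

The coherent average keeps the full impropriety of the summands, while the randomized-phase average is, for all practical purposes, circular.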
F. Examples
Fig. 3 shows scatter plots for three signals: (a) ice multiparameter imaging X-band radar (IPIX) data from the website http://soma.crl.mcmaster.ca/ipix/; (b) a 16-quadrature amplitude modulated (QAM) signal; and (c) wind data obtained from http://mesonet.agron.iastate.edu. Fig. 4 shows their corresponding covariance and complementary covariance functions. The radar signal in Fig. 4(a) is narrow-band. Evidently, the gain and phase of the in-phase and quadrature channels are matched, as the data appear circular (and therefore proper). The uniform phase is due to carrier
Fig. 3. Scatter plots for (a) circular, (b) proper but noncircular, and (c) improper (and thus noncircular) data.
Fig. 4. Covariance and complementary covariance function plots for the corresponding processes in Fig. 3: (a) circular; (b) proper but noncircular; and (c) improper.
Fig. 5. (a) Scatter plot of the average voxel values of the motor component estimated using ICA from 16 subjects. (b) Magnitude and (c) phase spatial maps using Mahalanobis $z$-score thresholding; only voxels above the threshold are shown.
phase fluctuation from pulse to pulse, and the amplitude fluctuations are due to variations in the scattering cross-section. The 16-QAM signal in Fig. 4(b) has zero complementary covariance function and is therefore proper (second-order circular).
However, its distribution is not rotationally invariant and there-
fore it is noncircular. The wind data in Fig. 4(c) is noncircular
and improper.
In Fig. 5(a), we show the scatter plot of a motor component
estimated using ICA of functional MRI data [106], which is nat-
urally represented as complex valued [4]. The paradigm used in
the collection of the data is a simple motor task with a box-car
type time-course, i.e., the stimulus has periodic on and off pe-
riods. As can be observed in the figure, the given fMRI motor component has a highly noncircular distribution. In Fig. 5(b) and (c), we show the spatial map for the same component using a Mahalanobis $z$-score threshold, which is defined as $z_v = \sqrt{(\mathbf{s}_v - \boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{s}_v - \boldsymbol{\mu})}$. In this expression, $\mathbf{s}_v$ is the vector of real and imaginary parts of the estimated source at voxel $v$, and $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ are the corresponding estimated spatial image mean vector and covariance matrix.
As demonstrated by these examples, noncircular signals do
arise in practice, even though circularity has been commonly as-
sumed in the derivation of many signal processing algorithms.
As we will elaborate, taking the noncircular or improper nature
of signals into account in their processing may provide signifi-
cant payoffs. It is worth noting that, in these examples, we have
classified signals as circular or noncircular simply by inspec-
tion of their scatter plots and estimated covariance functions.
But such classification should be done based on sound statistical arguments. This is the topic of the next section.
III. MODEL SELECTION
When should we take noncircularity into account? On the
one hand, if the signals are indeed noncircular, we would ex-
pect a noncircular model to capture their properties more ac-
curately. On the other hand, noncircular models have more de-
grees of freedom than circular models, and the principle of par-
simony says that one should seek simple models to avoid overfitting to noise fluctuations. So how do we choose between circular/proper and noncircular/improper models? This raises the
question of how to detect whether a given signal is circular or
noncircular, and more generally, how to detect the number of
circular and noncircular signals in a given signal subspace. The
latter problem can be combined with the detection of the dimen-
sion of the signal subspace so it becomes a simultaneous order
and model selection problem. For either problem, we show that
circular models should be preferred over noncircular models
when the SNR is low, the number of samples is small, or the
signals’ degree of noncircularity is low.
A. Circularity Coefficients
Throughout this section, we drop the subscripts on covariance matrices for economy. We first derive a maximal invariant
(a complete set of invariants) for the augmented covariance matrix $\underline{\mathbf{R}}$ under nonsingular strictly linear transformation. Such a set is given by the canonical correlations between $\mathbf{x}$ and its conjugate $\mathbf{x}^*$ [112], which [41] calls the circularity coefficients of $\mathbf{x}$. "Maximal invariant" means two things: 1) the circularity coefficients are invariant under nonsingular linear transformation and 2) if two jointly Gaussian random vectors $\mathbf{x}$ and $\mathbf{y}$ have the same circularity coefficients, then $\mathbf{x}$ and $\mathbf{y}$ are related by a nonsingular linear transformation, $\mathbf{y} = \mathbf{T}\mathbf{x}$.
Assuming $\mathbf{R}$ has full rank, the canonical correlations between $\mathbf{x}$ and $\mathbf{x}^*$ are determined by starting with the coherence matrix [116]
$$\mathbf{M} = \mathbf{R}^{-1/2}\,\tilde{\mathbf{R}}\,\mathbf{R}^{-T/2}. \quad (27)$$
Since $\mathbf{M}$ is complex symmetric, $\mathbf{M} = \mathbf{M}^T$, yet not Hermitian symmetric, i.e., $\mathbf{M} \ne \mathbf{M}^H$, there exists a special singular value decomposition (SVD), called the Takagi factorization [53], which is
$$\mathbf{M} = \mathbf{F}\mathbf{K}\mathbf{F}^T. \quad (28)$$
The complex matrix $\mathbf{F}$ is unitary, and $\mathbf{K} = \mathrm{diag}(k_1, \ldots, k_n)$ contains the canonical correlations $k_1 \ge k_2 \ge \cdots \ge k_n \ge 0$ on its diagonal. The squared canonical correlations are the eigenvalues of the squared coherence matrix $\mathbf{M}\mathbf{M}^*$, or equivalently, of the matrix $\mathbf{R}^{-1}\tilde{\mathbf{R}}\mathbf{R}^{-*}\tilde{\mathbf{R}}^*$ [116]. The canonical correlations between $\mathbf{x}$ and $\mathbf{x}^*$ are invariant to the choice of a square root for $\mathbf{R}$, and they are a maximal invariant for $\underline{\mathbf{R}}$ under nonsingular strictly linear transformation of $\mathbf{x}$. Therefore, any function of $\underline{\mathbf{R}}$ that is invariant under nonsingular strictly linear transformation must be a function of these canonical correlations only [112].
Following [41], we call these canonical correlations $k_i$ the circularity coefficients, and the set $\{k_1, \ldots, k_n\}$ the circularity spectrum of $\mathbf{x}$. However, the term "circularity coefficient" is not entirely accurate as the circularity coefficients only characterize second-order circularity, or (im)propriety. Thus, the name impropriety coefficients would have been more suitable. For a scalar random variable $x$, there is only one circularity coefficient $k = |\rho|$. For Gaussian $x$, $k$ characterizes the degree of noncircularity, and for non-Gaussian $x$, the degree of impropriety.
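The computation of circularity coefficients, and their invariance under nonsingular strictly linear transformations, can be sketched numerically. This is our own illustration (dimensions, mixing matrices, and sample size are arbitrary); the Takagi values are obtained here as ordinary singular values of the coherence matrix, which is valid because a complex symmetric matrix has the same singular values in both factorizations.

```python
import numpy as np

rng = np.random.default_rng(3)
n, N = 3, 100_000

def circularity_coefficients(x):
    """Singular values of the coherence matrix R^{-1/2} R~ R^{-T/2}."""
    N = x.shape[1]
    R = (x @ x.conj().T) / N   # sample covariance E{x x^H}
    Rt = (x @ x.T) / N         # sample complementary covariance E{x x^T}
    w, V = np.linalg.eigh(R)   # Hermitian inverse square root of R
    R_isqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.conj().T
    M = R_isqrt @ Rt @ R_isqrt.T  # coherence matrix (complex symmetric)
    return np.linalg.svd(M, compute_uv=False)  # descending, in [0, 1]

# Improper data: widely linear mixture of a circular Gaussian vector
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
B = 0.5 * (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
z = (rng.normal(size=(n, N)) + 1j * rng.normal(size=(n, N))) / np.sqrt(2)
x = A @ z + B @ z.conj()

k = circularity_coefficients(x)

# Maximal-invariant property: unchanged under a nonsingular linear transform
T = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
k_T = circularity_coefficients(T @ x)
```

Because the sample covariances transform congruently under $\mathbf{y} = \mathbf{T}\mathbf{x}$, the invariance holds exactly, not just asymptotically.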
Differential entropy¹: The entropy of a complex random vector $\mathbf{x}$ is defined to be the entropy of the real composite vector $[\mathbf{x}_r^T, \mathbf{x}_i^T]^T$. The differential entropy of a complex Gaussian random vector with augmented covariance matrix $\underline{\mathbf{R}}$ is thus
$$H(\mathbf{x}) = \log\left((\pi e)^n \det{}^{1/2}\underline{\mathbf{R}}\right).$$
Combining our results so far, we may factor $\underline{\mathbf{R}}$ as
$$\underline{\mathbf{R}} = \begin{bmatrix}\mathbf{R}^{1/2} & \mathbf{0}\\ \mathbf{0} & \mathbf{R}^{*/2}\end{bmatrix} \begin{bmatrix}\mathbf{I} & \mathbf{M}\\ \mathbf{M}^* & \mathbf{I}\end{bmatrix} \begin{bmatrix}\mathbf{R}^{H/2} & \mathbf{0}\\ \mathbf{0} & \mathbf{R}^{T/2}\end{bmatrix}. \quad (29)$$
Note that each factor is an augmented matrix. This factorization establishes
$$\det\underline{\mathbf{R}} = \det{}^2(\mathbf{R}) \prod_{i=1}^{n}\left(1 - k_i^2\right). \quad (30)$$
This allows us to write the entropy of a complex noncircular Gaussian random vector as [41], [112]
$$H(\mathbf{x}) = \log\left((\pi e)^n \det\mathbf{R}\right) + \frac{1}{2}\sum_{i=1}^{n}\log\left(1 - k_i^2\right), \quad (31)$$
where the first term is the entropy of a circular Gaussian random vector with the same Hermitian covariance matrix $\mathbf{R}$ (but $\tilde{\mathbf{R}} = \mathbf{0}$). The entropy is maximized if and only if $\mathbf{x}$ is circular. If $\mathbf{x}$ is noncircular, the loss in entropy compared to the circular case is given by the second term in (31), which is a function of the circularity spectrum. Thus, this term can be used as a measure for the degree of impropriety of a random vector, and it is also a test statistic in the generalized likelihood ratio test for noncircularity. Other measures for the degree of impropriety have been proposed and studied by [112].
B. Testing for Circularity
In this section, we present hypothesis tests for circularity
based on a generalized likelihood ratio test (GLRT). In a GLRT, the unknown parameters ($\mathbf{R}$ and $\tilde{\mathbf{R}}$ in our case) are replaced by maximum likelihood estimates. The GLR is always invariant to transformations for which the hypothesis testing problem itself is invariant [58]. As propriety is preserved by strictly linear, but not widely linear, transformations, the hypothesis test must
¹Since we do not consider discrete random variables in this paper, we refer to differential entropy simply as entropy from now on.
8/2/2019 Complex Processing
http://slidepdf.com/reader/full/complex-processing 11/25
ADALI et al.: COMPLEX-VALUED SIGNAL PROCESSING: THE PROPER WAY TO DEAL WITH IMPROPRIETY 5111
be invariant to strictly linear, but not widely linear, transformations. Since the GLR must be a function of a maximal invariant statistic, the GLR is a function of the circularity coefficients.
Let $\mathbf{x}$ be a complex zero-mean Gaussian random vector with augmented covariance matrix $\underline{\mathbf{R}}$. Consider $N$ independent and identically distributed (i.i.d.) random samples drawn from this distribution and arranged in the sample matrix $\mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_N]$, and let $\underline{\mathbf{X}} = [\underline{\mathbf{x}}_1, \ldots, \underline{\mathbf{x}}_N]$ denote the corresponding augmented sample matrix. The joint probability density function of these samples is
$$p(\underline{\mathbf{X}}) = \frac{1}{\pi^{nN}\det^{N/2}(\underline{\mathbf{R}})} \exp\left(-\frac{N}{2}\,\mathrm{tr}\!\left(\underline{\mathbf{R}}^{-1}\underline{\mathbf{S}}\right)\right), \quad (32)$$
with augmented sample covariance matrix
$$\underline{\mathbf{S}} = \frac{1}{N}\,\underline{\mathbf{X}}\,\underline{\mathbf{X}}^H.$$
The GLR test of the hypotheses $H_0$: $\mathbf{x}$ is circular vs. $H_1$: $\mathbf{x}$ is noncircular compares the GLRT statistic
$$\ell = \frac{\max_{\tilde{\mathbf{R}} = \mathbf{0}}\, p(\underline{\mathbf{X}})}{\max_{\underline{\mathbf{R}}}\, p(\underline{\mathbf{X}})} \quad (33)$$
to a threshold. This statistic is the ratio of the likelihood with $\underline{\mathbf{R}}$ constrained to have zero off-diagonal blocks, $\tilde{\mathbf{R}} = \mathbf{0}$, to the likelihood with $\underline{\mathbf{R}}$ unconstrained. We are thus testing whether or not $\underline{\mathbf{R}}$ is block-diagonal. The unconstrained maximum likelihood (ML) estimate of $\underline{\mathbf{R}}$ is the augmented sample covariance matrix $\underline{\mathbf{S}}$. The ML estimate of $\underline{\mathbf{R}}$ under the constraint $\tilde{\mathbf{R}} = \mathbf{0}$ is
$$\hat{\underline{\mathbf{R}}}_0 = \begin{bmatrix}\mathbf{S} & \mathbf{0}\\ \mathbf{0} & \mathbf{S}^*\end{bmatrix}.$$
After a little algebra, the GLR can be expressed as [94], [116]
$$\ell^{2/N} = \frac{\det\underline{\mathbf{S}}}{\det^2(\mathbf{S})} = \prod_{i=1}^{n}\left(1 - \hat{k}_i^2\right). \quad (34)$$
In this equation, $\hat{k}_i$ denotes the estimated circularity coefficients, which are computed from the augmented sample covariance matrix $\underline{\mathbf{S}}$. A full-rank implementation of this test relies on the middle expression in (34). A reduced-rank implementation that considers only the largest estimated circularity coefficients is based on the last expression in (34).
This test was first proposed by [13], in complex notation by [94], and the connection with canonical correlations was established by [116]. It is shown in [13] that a different statistic, rather than the GLR, yields the locally most powerful (LMP) test for circularity. An LMP test has the highest possible power (i.e., probability of detection) for $H_1$ close to $H_0$, where all circularity coefficients are small. Testing for circularity is reexamined by [133], which studies the null distributions of the GLR and LMP statistics, and derives a distributional approximation for the GLR. It also shows that no uniformly most powerful (UMP) test exists for this problem because the GLR and the LMP tests are different for dimensions $n \ge 2$. (In the scalar case $n = 1$, the GLR and LMP tests are identical.)
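Using the last expression in (34), the Gauss-GLR reduces to a product over estimated circularity coefficients. The following is our own minimal sketch (sample sizes are arbitrary); it contrasts proper samples, where the statistic stays near 1, with maximally improper (real-valued) samples, where it collapses to 0.

```python
import numpy as np

rng = np.random.default_rng(4)
n, N = 2, 5_000

def gauss_glr(x):
    """prod(1 - k_i^2) over estimated circularity coefficients:
    close to 1 for proper data, close to 0 for maximally improper data."""
    N = x.shape[1]
    R = (x @ x.conj().T) / N
    Rt = (x @ x.T) / N
    w, V = np.linalg.eigh(R)
    R_isqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.conj().T
    k = np.linalg.svd(R_isqrt @ Rt @ R_isqrt.T, compute_uv=False)
    return float(np.prod(1.0 - k ** 2))

# H0: proper (circular) complex Gaussian samples
x0 = (rng.normal(size=(n, N)) + 1j * rng.normal(size=(n, N))) / np.sqrt(2)
# H1: maximally improper samples (real-valued, so x = x*)
x1 = rng.normal(size=(n, N)).astype(complex)

glr0 = gauss_glr(x0)  # near 1: no evidence of impropriety
glr1 = gauss_glr(x1)  # near 0: strong evidence of impropriety
```

In a practical detector, the statistic would be compared with a threshold chosen from its null distribution for a specified probability of false alarm.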
Extensions of this test to non-Gaussian data: There have been
recent extensions of this test to non-Gaussian data. In [95], the
test is made asymptotically robust with respect to violations of
Gaussianity, provided the distribution remains elliptically sym-
metric. This is achieved by dividing the GLRT statistic by an
estimated normalized fourth-order moment (which is closely
related to the kurtosis). Generation and estimation of samples
from a complex generalized Gaussian distribution (GGD) and
estimation of its parameters is addressed in [90] and a circu-
larity test for the complex GGD is given in [89]. These results
have been recently extended to elliptically symmetric distribu-
tions [93].
The complex GGD is a complex elliptically symmetric dis-
tribution with a simple probabilistic characterization. It allows
for a direct connection to the Gaussian test given in (34). If $x$ is a scalar complex GGD random variable, it has pdf
(35)
where $c > 0$ is the shape parameter, $\Gamma(\cdot)$ is the Gamma function, and $\underline{\mathbf{R}}$ is the augmented covariance matrix. The complex GGD comprises a large number of symmetric distributions, from super-Gaussian (for $c < 1$), Gaussian (for $c = 1$), to sub-Gaussian (for $c > 1$), including other distributions such as the bivariate Laplacian. As for all complex elliptically symmetric distributions, zero mean and propriety is equivalent to circularity.
A GLRT statistic based on the complex GGD, as derived by [89], is given by
(36)
Here, $\hat{c}$ is the ML estimate of the shape parameter $c$, and $\hat{\underline{\mathbf{R}}}_0$ and $\hat{\underline{\mathbf{R}}}_1$ are the ML estimates of the augmented covariance matrix under $H_0$ and $H_1$, respectively. These ML estimates have been derived in [90]. Asymptotically for large $N$, the statistic is $\chi^2$-distributed with two degrees of freedom. This allows choosing a threshold to yield a specified probability of false alarm. In the scalar Gaussian case, with $c = 1$, the last term in (36) vanishes, and the complex GGD and Gaussian GLRT become equivalent. Using the complex GGD model and the ML estimators for its parameters, it is also possible to design a test for Gaussianity [89].
To compare the performance of the different circularity detectors, we generate independent realizations from a complex GGD with unit variance, as in [89].²
Fig. 6(a)–(c) shows the probability of detection versus the degree of impropriety for the three detectors: the Gauss-GLRT in (34), the adjusted GLRT (Adj-GLRT) [95], and
²Matlab code to implement the detectors and to generate complex GGD samples can be found at http://mlsp.umbc.edu/resources.
Fig. 6. Detection performance of circularity detectors versus degree of impropriety, for sub-Gaussian, Gaussian, and super-Gaussian complex GGD samples. Probability of false alarm is fixed at 0.01.
the complex GGD-GLRT in (36). Each data point in the fig-
ures is the result of the average of 1000 runs, with the detec-
tion threshold set for a probability of false alarm of 0.01. As
can be observed in the figures, all three detectors yield similar performance for sub-Gaussian and identical performance for Gaussian data. So the Gauss-GLRT detector performs well even with sub-Gaussian data. When the samples are super-Gaussian, however, the complex GGD-GLRT significantly outperforms the other two. In [89], examples are presented to show that the circularity test based on the complex GGD model provides good performance even with data that are not complex GGD, such as BPSK data.
C. Order Selection With a General Signal Subspace Model
Determining the effective order of the signal subspace is an
important problem in many signal processing and communica-
tions applications. The popular solution proposed by Wax and
Kailath [134] uses information theoretic criteria by considering
principal component analysis (PCA) of the observed data to select the order. However, the model assumes circular multivariate Gaussian signals, and is suboptimal in the presence of noncircular signals in the subspace [73].
In [73], a noncircular PCA (ncPCA) approach is proposed
along with an ML procedure for estimating the free parameters
in the model. The procedure can be used for model selection in
the presence of circular Gaussian noise, whereby the numbers
of circular and noncircular signals are determined together with
the total model order. The method reduces to Wax and Kailath’s
signal detection method [134] when all signals are circular, but
may provide a significant performance gain in the presence of
noncircular signals. We first introduce the model for ncPCA and
then demonstrate its use for model selection.
Given a set of observations $\mathbf{x}[n]$, with $n$ as the observation index, we assume the linear model
$$\mathbf{x}[n] = \mathbf{A}\mathbf{s}[n] + \mathbf{v}[n], \quad (37)$$
where $\mathbf{s}[n]$ is the $r$-dimensional complex-valued signal vector, $\mathbf{A}$ is an $m \times r$ complex-valued matrix with full column rank, $r \le m$, and $\mathbf{v}[n]$ is the $m$-dimensional complex-valued random vector modeling the additive noise. Given the
set of observations $\mathbf{x}[n]$, $n = 1, \ldots, N$, a key step in many applications is to determine the dimension $r$ of the signal subspace. Traditional PCA provides a decomposition into signal and noise subspaces. In ncPCA, the $r$-dimensional signal subspace is further decomposed into noncircular and circular signals. As in [134], the noise term $\mathbf{v}[n]$ is assumed to be a circular, isotropic, stationary, complex Gaussian random process with zero mean, statistically independent of the signals, and the signals are assumed to be multivariate Gaussian. However, in contrast to [134], we let $r_{nc}$ signals be noncircular and $r - r_{nc}$ signals be circular. The underlying assumption here is that the rank of the source covariance matrix is $r$ and that of the complementary covariance matrix is $r_{nc}$. In the degenerate case, the actual numbers of signals and of noncircular signals can be greater than $r$ and/or $r_{nc}$.
In order to determine the orders $r$ and $r_{nc}$, given $N$ snapshots of $\mathbf{x}[n]$, $n = 1, \ldots, N$, we write the likelihood of the observations as
(38)
where $\Theta$ denotes the set of all adjustable parameters in the likelihood function.
For the model in (37), the covariance and complementary covariance matrices of $\mathbf{x}[n]$ have the parametric forms [72], [73]
$$\mathbf{R} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^H + \sigma^2\mathbf{I}, \qquad \tilde{\mathbf{R}} = \tilde{\mathbf{U}}\tilde{\boldsymbol{\Lambda}}\tilde{\mathbf{U}}^T, \quad (39)$$
where $\mathbf{U}$ is an $m \times r$ complex-valued matrix with orthonormal columns spanning the signal subspace, $\boldsymbol{\Lambda}$ is a diagonal matrix with diagonal elements $\lambda_1, \ldots, \lambda_r$, $\sigma^2$ is the noise variance, $\tilde{\mathbf{U}}$ is an $m \times r_{nc}$ complex-valued matrix with orthonormal columns, and $\tilde{\boldsymbol{\Lambda}}$ is a diagonal matrix with complex-valued entries $\tilde{\lambda}_1, \ldots, \tilde{\lambda}_{r_{nc}}$. The absolute values of these entries equal the circularity coefficients. Compared to the Takagi factorization in (28), which leads to nonnegative circularity coefficients, the decomposition in (39) incorporates an additional phase factor into the quantities $\tilde{\lambda}_i$, making them complex valued. This is simply a notational convenience [73].
Given the ML estimates for all the free parameters in (38), we may follow [134] and select $r$ and $r_{nc}$ using information-theoretic criteria such as Akaike's information criterion (AIC) [8], the Bayesian information criterion (BIC) [118], or the minimum description length (MDL) [105]. This leads to the estimated orders
(40)
where $\hat{\Theta}$ is the optimal parameter set that maximizes the likelihood for given orders, which can be obtained as in [72], and results in the likelihood
(41)
The penalty term in (40) contains the degrees of freedom of the parameter set, and depends on the number of samples and the chosen criterion. For example, in the BIC (or the MDL criterion) [105], [118], the penalty is half the number of degrees of freedom multiplied by $\log N$.
The signal detection method given in [134] can be obtained as a special case without noncircular sources ($r_{nc} = 0$). In the presence of noncircular signals, the presented approach will lead to a smaller data-fit term in (40). At the same time, however, a noncircular model has more degrees of freedom, so the penalty term in (40) will increase. This requires the right tradeoff, resulting in a good model fit without overfitting. The following example demonstrates this tradeoff and shows that a circular model can be preferable when the noise level is high, the degree of noncircularity is low, and/or the number of samples is small.
In this example, seven Gaussian sources of unit variance and identical degree of noncircularity are mixed through a randomly chosen 20 × 7 matrix, after which circular Gaussian noise is added to the mixture. We study the BIC of a circular model and of a noncircular model. The gain of the noncircular model over the circular model is defined as (BIC of circular model)$/N$ minus (BIC of noncircular model)$/N$. It is clear that a positive gain suggests that the noncircular model is preferred over the circular model.
Fig. 7 shows the overall information-theoretic (BIC) gain of
using a noncircular model for varying degree of noncircularity,
SNR, and sample size. The results are averaged over 1000 in-
dependent runs. We observe that the noncircular model is pre-
ferred when there is ample evidence that the signals are noncircular: the cases of large degree of noncircularity, high SNR, and large sample sizes. On the other hand, the simpler circular
model is preferred when there is scarce evidence that the signals
are noncircular: the cases where the degree of noncircularity is
low, the SNR is low, or the number of samples is small. The
ncPCA approach can model both circular and noncircular sig-
nals and hence can avoid the use of an unnecessarily complex
noncircular model when a circular model should be preferred.
We also give an example in array signal processing to
demonstrate the direct performance gain using ncPCA rather
than a circular model [134] for signal subspace estimation
and model selection. We model three far-field, independent narrowband sources (one BPSK signal, one QPSK signal, and one 8-QAM signal, with degrees of impropriety of 1, 0, and 2/3, and variances 1, 2, and 6, respectively) emitting plane
Fig. 7. BIC gain of noncircular over circular model for (a) varying degree of noncircularity, (b) varying SNR, and (c) varying sample size. Each simulation point is averaged over 1000 independent runs. A positive gain suggests that the noncircular model is preferred over the circular model.
Fig. 8. Comparison of probability of detection (Pd) and subspace distance gain for (a) ncPCA and (b) circular PCA (cPCA), as a function of SNR.
waves impinging upon a uniform linear array of sensors with half-wavelength inter-sensor spacing. The received observations are
$$\mathbf{x}[n] = \sum_{k=1}^{3}\mathbf{a}(\theta_k)\,s_k[n] + \mathbf{v}[n],$$
where $n$ is the snapshot index, $s_k[n]$ is the waveform of the $k$th source, $\mathbf{v}[n]$ the circular antenna noise, $\mathbf{a}(\theta_k)$ is the steering vector associated with the $k$th source, and $\theta_k$ is the direction-of-arrival (DOA) of the $k$th source. In Fig. 8, we show the gain using ncPCA compared to a circular model [134] both in terms of probability of detection and
cular model [134] both in terms of probability of detection and
in terms of subspace distance. Probability of detection is de-
fined as the fraction of trials where the order was detected correctly, and subspace distance is the squared Euclidean distance
between the estimated and the true signal subspace. Each sim-
ulation point is averaged over 1000 independent runs. As ob-
served in Fig. 8, ncPCA outperforms circular PCA by approxi-
mately 4.0 dB in SNR. This holds for both the detection of the
order $r$ alone and the joint detection of the orders $(r, r_{nc})$, which
perform almost identically for different SNR levels. In addition,
ncPCA consistently leads to a smaller subspace distance. Fur-
ther examples and additional discussion on the performance of
ncPCA are given in [72].
IV. WIDELY LINEAR ESTIMATION
In Section II-B, we have discussed widely linear (linear-con-
jugate linear) transformations, which allow access to the
information contained in the complementary covariance. In
this section, we consider widely linear minimum mean-square
error (WLMMSE) estimation [103], widely linear minimum
variance distortionless response (WLMVDR) estimation [26],[27], [29], [77], and the widely linear least-mean-square (LMS)
algorithm. Using the augmented vector and matrix notation,
many of the results for widely linear estimation are straight-
forward extensions of the corresponding results for linear
estimation.
A. Widely Linear MMSE Estimation
We begin with WLMMSE estimation of the message (or signal) $\mathbf{s}$ from the measurement $\mathbf{x}$. To extend results for LMMSE estimation to WLMMSE estimation, we need only replace the signal $\mathbf{s}$ by the augmented signal $\underline{\mathbf{s}}$ and the measurement $\mathbf{x}$ by the augmented measurement $\underline{\mathbf{x}}$, and proceed as usual. Thus, most of the results for LMMSE
estimation apply straightforwardly to WLMMSE estimation. It
is, however, still worthwhile to summarize some of these re-
sults and to compare LMMSE with WLMMSE estimation. The
widely linear estimator is
$$\hat{\underline{\mathbf{s}}} = \underline{\mathbf{W}}\,\underline{\mathbf{x}}, \quad (42)$$
where $\underline{\mathbf{W}}$ is determined such that the mean-square error $\mathrm{E}\,\|\underline{\mathbf{s}} - \underline{\mathbf{W}}\,\underline{\mathbf{x}}\|^2$ is minimized. Keep in mind that the augmented notation on the left-hand side of (42) is simply a convenient, but redundant, representation of the right-hand side of (42). Obviously, it is sufficient to estimate $\mathbf{s}$ because $\mathbf{s}^*$ can be obtained from $\hat{\mathbf{s}}$ through conjugation.
The WLMMSE estimator is found by applying the orthogonality principle $\mathrm{E}\{(\mathbf{s} - \hat{\mathbf{s}})\mathbf{x}^H\} = \mathbf{0}$ and $\mathrm{E}\{(\mathbf{s} - \hat{\mathbf{s}})\mathbf{x}^T\} = \mathbf{0}$ [103], or equivalently, $\mathrm{E}\{(\underline{\mathbf{s}} - \hat{\underline{\mathbf{s}}})\underline{\mathbf{x}}^H\} = \mathbf{0}$. This says that the error between the augmented estimator and the augmented signal must be orthogonal to the augmented measurement. This leads to
$$\underline{\mathbf{W}}\,\underline{\mathbf{R}}_{xx} = \underline{\mathbf{R}}_{sx},$$
and thus
$$\underline{\mathbf{W}} = \underline{\mathbf{R}}_{sx}\,\underline{\mathbf{R}}_{xx}^{-1}. \quad (43)$$
Thus, $\hat{\mathbf{s}} = \mathbf{W}_1\mathbf{x} + \mathbf{W}_2\mathbf{x}^*$, or equivalently [103], the weights $\mathbf{W}_1$ and $\mathbf{W}_2$ can be expressed in terms of the covariance and complementary covariance matrices of $\mathbf{s}$ and $\mathbf{x}$. In these expressions, the Schur complement $\mathbf{R}_{xx} - \tilde{\mathbf{R}}_{xx}\mathbf{R}_{xx}^{-*}\tilde{\mathbf{R}}_{xx}^*$ is the error covariance matrix for linearly estimating $\mathbf{x}$ from $\mathbf{x}^*$. The augmented error covariance matrix of the error vector $\underline{\mathbf{e}} = \underline{\mathbf{s}} - \hat{\underline{\mathbf{s}}}$ is
$$\underline{\mathbf{Q}} = \underline{\mathbf{R}}_{ss} - \underline{\mathbf{R}}_{sx}\,\underline{\mathbf{R}}_{xx}^{-1}\,\underline{\mathbf{R}}_{sx}^H.$$
A competing estimator $\hat{\underline{\mathbf{s}}}' = \underline{\mathbf{W}}'\,\underline{\mathbf{x}}$ will produce an augmented error with covariance matrix
$$\underline{\mathbf{Q}}' = \underline{\mathbf{Q}} + (\underline{\mathbf{W}}' - \underline{\mathbf{W}})\,\underline{\mathbf{R}}_{xx}\,(\underline{\mathbf{W}}' - \underline{\mathbf{W}})^H, \quad (44)$$
which shows $\underline{\mathbf{Q}}' \succeq \underline{\mathbf{Q}}$. As a consequence, all real-valued increasing functions of the error covariance matrix are minimized, in particular, $\mathrm{tr}\,\underline{\mathbf{Q}}$ and $\det\underline{\mathbf{Q}}$.
These statements hold for the error vector as well as the
augmented error vector because . The
error covariance matrix of the error vector is the
NW block of the augmented error covariance matrix , which
can be evaluated as
(45)
A particular choice for a generally suboptimum filter is the LMMSE filter $\hat{\mathbf{x}}_{\mathrm{L}} = \mathbf{R}_{xy}\mathbf{R}_{yy}^{-1}\mathbf{y}$, which ignores complementary covariance matrices.

Special cases: If the signal $\mathbf{x}$ is real, we have $\underline{\mathbf{x}} = [\mathbf{x}^{T}, \mathbf{x}^{T}]^{T}$. This leads to the simplified expression $\hat{\mathbf{x}} = \mathbf{W}_{1}\mathbf{y} + \mathbf{W}_{1}^{*}\mathbf{y}^{*} = 2\operatorname{Re}\{\mathbf{W}_{1}\mathbf{y}\}$. While the WLMMSE estimate of a real signal from a complex signal is always real [103], the LMMSE estimate is generally complex.
The WLMMSE and LMMSE estimates are identical if and only if the error of the LMMSE estimate is orthogonal to $\mathbf{y}^{*}$, i.e.,

$$E[(\hat{\mathbf{x}}_{\mathrm{L}} - \mathbf{x})\,\mathbf{y}^{T}] = \mathbf{0}. \qquad (46)$$

There are two important special cases where (46) holds:
• The signal and measurement are cross-proper, i.e., $\widetilde{\mathbf{R}}_{xy} = \mathbf{0}$, and the measurement is proper, $\widetilde{\mathbf{R}}_{yy} = \mathbf{0}$. Joint propriety of $\mathbf{x}$ and $\mathbf{y}$ will suffice, but it is not necessary that $\mathbf{x}$ be proper.
• The measurement is maximally improper, i.e., $\mathbf{y}^{*} = c\,\mathbf{y}$ with probability 1 for a constant $c$ with $|c| = 1$. In this case, $\widetilde{\mathbf{R}}_{yy} = c^{*}\mathbf{R}_{yy}$ and $\widetilde{\mathbf{R}}_{xy} = c^{*}\mathbf{R}_{xy}$. WL estimation is unnecessary since $\mathbf{y}$ and $\mathbf{y}^{*}$ both carry exactly the same information about $\mathbf{x}$. This is irrespective of whether or not $\mathbf{x}$ is proper.

In these cases, WLMMSE estimation has no performance advantage over LMMSE estimation. The other extreme case is where a WL operation allows perfect estimation while LMMSE estimation yields a nonzero estimation error. An example of such a case is $y = x + jn$, where the signal $x$ is real and the noise $jn$ purely imaginary. Here the WL operation $(y + y^{*})/2$ yields a perfect estimate of $x$, whereas the LMMSE estimate is not even real valued. If $y = x + n$ with proper and white noise $n$, then the maximum performance advantage of WLMMSE estimation over LMMSE estimation is a factor of 2 [117].
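The two extremes can be checked numerically. The following sketch (a toy of our own choosing, with unit-variance real Gaussian $x$ and $n$; not code from the paper) builds the WLMMSE and LMMSE filters from sample augmented statistics for the example $y = x + jn$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance of the example in the text: y = x + j*n with real signal x
# and purely imaginary noise j*n.  The widely linear (WL) estimator recovers
# x exactly via (y + y*)/2; the strictly linear MMSE estimator cannot.
N = 100_000
x = rng.standard_normal(N)              # real, zero-mean, unit-variance signal
n = rng.standard_normal(N)              # real noise amplitude
y = x + 1j * n                          # complex measurement

# Augmented measurement [y; y*] and its sample second-order statistics.
y_aug = np.vstack([y, y.conj()])                 # shape (2, N)
R_aug = y_aug @ y_aug.conj().T / N               # augmented covariance (2x2)
p_aug = y_aug @ x.conj() / N                     # cross term E[y_aug x*]

# WLMMSE weights (estimate is w^H y_aug) from the augmented normal equations.
w_wl = np.linalg.solve(R_aug, p_aug)
mse_wl = np.mean(np.abs(x - w_wl.conj() @ y_aug) ** 2)

# Strictly linear MMSE: ignore the conjugate channel entirely.
w_lin = (y @ x.conj() / N) / np.mean(np.abs(y) ** 2)
mse_lin = np.mean(np.abs(x - w_lin.conjugate() * y) ** 2)

print(mse_wl, mse_lin)   # WL MSE ~ 0; linear MSE ~ 0.5
```

With the widely linear filter the in-sample error is essentially zero, since $x = (y + y^{*})/2$ lies in the span of the augmented measurement.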
WLMMSE estimation is optimum in the Gaussian case, but may be improved upon for non-Gaussian data if we have access to higher order statistics. The next logical extension of widely linear processing is to widely linear-quadratic processing [28], [30], [115], which requires statistical information up to fourth order. We should add the cautionary note here that there is no difference between the optimum, generally nonlinear, conditional mean estimator $E[\mathbf{x} \mid \mathbf{y}]$ and $E[\mathbf{x} \mid \mathbf{y}, \mathbf{y}^{*}]$. Conditioning on $\mathbf{y}$ and $\mathbf{y}^{*}$ changes nothing, since $\mathbf{y}$ already extracts all the information there is about $\mathbf{x}$. So the “widely nonlinear” estimator is simply $E[\mathbf{x} \mid \mathbf{y}]$.
B. Widely Linear MVDR Estimation
Next, we extend linear minimum-variance distortionless response (LMVDR) estimators to widely linear MVDR estimators that account for complementary covariance. The measurement model is

$$\mathbf{y} = \mathbf{H}\boldsymbol{\alpha} + \mathbf{n} \qquad (47)$$

where the matrix $\mathbf{H}$ consists of modes that are assumed to be known (as, for instance, in beamforming), and the vector $\boldsymbol{\alpha}$ consists of complex deterministic amplitudes. Without loss of generality we assume $\mathbf{H}^{H}\mathbf{H} = \mathbf{I}$. The noise $\mathbf{n}$ has covariance matrix $\mathbf{R}_{nn}$. Because $\boldsymbol{\alpha}$ is modeled as unknown but deterministic, the covariance matrix of the measurement equals the covariance matrix of the noise: $\mathbf{R}_{yy} = \mathbf{R}_{nn}$. The matched filter estimator of $\boldsymbol{\alpha}$ is

$$\hat{\boldsymbol{\alpha}}_{\mathrm{MF}} = \mathbf{H}^{H}\mathbf{y} \qquad (48)$$

with mean $\boldsymbol{\alpha}$ and error covariance matrix $\mathbf{H}^{H}\mathbf{R}_{nn}\mathbf{H}$. The LMVDR estimator $\hat{\boldsymbol{\alpha}} = \mathbf{W}^{H}\mathbf{y}$, with weight matrix $\mathbf{W}$, is derived by minimizing the trace of the error covariance under an unbiasedness constraint:

$$\min_{\mathbf{W}} \operatorname{tr}(\mathbf{W}^{H}\mathbf{R}_{nn}\mathbf{W}) \quad \text{under constraint} \quad \mathbf{W}^{H}\mathbf{H} = \mathbf{I}.$$

The solution is

$$\mathbf{W} = \mathbf{R}_{nn}^{-1}\mathbf{H}\,(\mathbf{H}^{H}\mathbf{R}_{nn}^{-1}\mathbf{H})^{-1} \qquad (49)$$

with mean $\boldsymbol{\alpha}$ and error covariance matrix $(\mathbf{H}^{H}\mathbf{R}_{nn}^{-1}\mathbf{H})^{-1}$. If there is only a single mode $\mathbf{h}$, then the solution is

$$\mathbf{w} = \frac{\mathbf{R}_{nn}^{-1}\mathbf{h}}{\mathbf{h}^{H}\mathbf{R}_{nn}^{-1}\mathbf{h}} \qquad (50)$$

and the error variance is $(\mathbf{h}^{H}\mathbf{R}_{nn}^{-1}\mathbf{h})^{-1}$.
The measurement model (47) in augmented form is

$$\underline{\mathbf{y}} = \underline{\mathbf{H}}\,\underline{\boldsymbol{\alpha}} + \underline{\mathbf{n}}, \quad \text{where} \quad \underline{\mathbf{H}} = \begin{bmatrix}\mathbf{H} & \mathbf{0} \\ \mathbf{0} & \mathbf{H}^{*}\end{bmatrix}$$

and the noise $\underline{\mathbf{n}}$ is generally improper with augmented covariance matrix $\underline{\mathbf{R}}_{nn}$. The matched filter estimator of $\boldsymbol{\alpha}$, however, does not take into account noise, and thus the widely linear matched filter solution is still the linear solution (48). The WLMVDR estimator, on the other hand, is obtained as the solution to

$$\min_{\underline{\mathbf{W}}} \operatorname{tr}(\underline{\mathbf{W}}^{H}\underline{\mathbf{R}}_{nn}\underline{\mathbf{W}}) \quad \text{under constraint} \quad \underline{\mathbf{W}}^{H}\underline{\mathbf{H}} = \mathbf{I} \qquad (51)$$

with $\underline{\mathbf{W}}$ an augmented matrix with blocks $\mathbf{W}_{1}$ and $\mathbf{W}_{2}$ as in (42). This solution is widely linear [26], [29], [77],

$$\underline{\mathbf{W}} = \underline{\mathbf{R}}_{nn}^{-1}\underline{\mathbf{H}}\,(\underline{\mathbf{H}}^{H}\underline{\mathbf{R}}_{nn}^{-1}\underline{\mathbf{H}})^{-1} \qquad (52)$$

with mean $\underline{\boldsymbol{\alpha}}$ and augmented error covariance matrix $(\underline{\mathbf{H}}^{H}\underline{\mathbf{R}}_{nn}^{-1}\underline{\mathbf{H}})^{-1}$. The variance of the WLMVDR estimator is less than or equal to the variance of the LMVDR estimator because of the following argument. The optimization (51) is performed under the constraint $\underline{\mathbf{W}}^{H}\underline{\mathbf{H}} = \mathbf{I}$, or equivalently, $\mathbf{W}_{1}^{H}\mathbf{H} = \mathbf{I}$ and $\mathbf{W}_{2}^{H}\mathbf{H}^{*} = \mathbf{0}$. Thus, the WLMVDR optimization problem contains the LMVDR optimization problem as a special case in which $\mathbf{W}_{2} = \mathbf{0}$ is enforced. As such, WLMVDR estimation cannot be worse than LMVDR estimation. However, the additional degree of freedom of being able to choose $\mathbf{W}_{2} \neq \mathbf{0}$ can reduce variance if the noise is improper.
The reduction in variance of the WLMVDR compared to the LMVDR estimator is entirely due to exploiting the complementary correlation of the noise $\mathbf{n}$. Indeed, it is easy to see that for proper noise $\mathbf{n}$, the WLMVDR solution (52) simplifies to the LMVDR solution (49). Since $\boldsymbol{\alpha}$ is not assigned statistical properties, the solution is independent of whether or not $\boldsymbol{\alpha}$ is improper. This stands in marked contrast to WLMMSE estimation discussed in the previous section. A detailed analysis of the WLMVDR estimator is provided by [26]. Further results are presented in [27].
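The variance comparison can be evaluated directly from the covariances, without simulation. The sketch below uses a toy setup of our own choosing (a single real mode in improper noise whose per-sensor complementary variances alternate in sign); it is an illustration of (49)-(52), not code from the paper:

```python
import numpy as np

# Single known mode h in improper noise; compare LMVDR and WLMVDR error
# variances computed from the exact covariances.  Noise model (an assumption
# for illustration): per-sensor variance 2, complementary variances +/-1.8j.
M = 4
h = np.ones(M, dtype=complex) / np.sqrt(M)         # unit-norm mode
R = 2.0 * np.eye(M)                                # noise covariance E[n n^H]
C = 1.8j * np.diag([1.0, -1.0, 1.0, -1.0])         # complementary E[n n^T]

# LMVDR (49)-(50): error variance 1 / (h^H R^{-1} h).
var_lmvdr = 1.0 / np.real(h.conj() @ np.linalg.solve(R, h))

# Augmented quantities: R_aug = [[R, C], [C*, R*]], H_aug = blkdiag(h, h*).
R_aug = np.block([[R, C], [C.conj(), R.conj()]])
H_aug = np.zeros((2 * M, 2), dtype=complex)
H_aug[:M, 0] = h
H_aug[M:, 1] = h.conj()

# WLMVDR (52): augmented error covariance (H_aug^H R_aug^{-1} H_aug)^{-1};
# the variance of the estimate of alpha is its (1,1) entry.
G = H_aug.conj().T @ np.linalg.solve(R_aug, H_aug)
var_wlmvdr = np.real(np.linalg.inv(G)[0, 0])

print(var_lmvdr, var_wlmvdr)   # WLMVDR variance is smaller: the noise is improper
```

For this toy geometry the WLMVDR variance drops from 2 to 0.38; note that the gain comes entirely from the improper noise, as the text states.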
C. Linear and Widely Linear Filtering and the LMS Algorithm
In this section, we consider the linear and widely linear filtering problem, where we estimate a scalar signal $d(n)$ from the vector of observations taken over $M$ time instants, $\mathbf{y}(n) = [y(n), y(n-1), \ldots, y(n-M+1)]^{T}$. We first derive the normal equations and the LMS algorithm [135], [136], and then extend them to the widely linear case. The linear estimate of $d(n)$ is $\hat{d}(n) = \mathbf{w}^{H}\mathbf{y}(n)$. For convenience, we will suppress the time-dependency of the filter $\mathbf{w}$. We could derive the solution for $\mathbf{w}$ using orthogonality arguments as in (43). Alternatively, we can take the Wirtinger derivative of the linear mean-square error term $J(\mathbf{w}) = E|d(n) - \mathbf{w}^{H}\mathbf{y}(n)|^{2}$, which is real valued, with respect to $\mathbf{w}^{*}$ (by treating $\mathbf{w}$ as a constant):

$$\frac{\partial J}{\partial \mathbf{w}^{*}} = E[\mathbf{y}(n)\mathbf{y}^{H}(n)]\,\mathbf{w} - E[\mathbf{y}(n)d^{*}(n)]. \qquad (53)$$

By setting (53) to zero and assuming a nonsingular covariance matrix, we obtain the normal equations
$\mathbf{R}_{yy}\mathbf{w} = \mathbf{r}_{yd}$, where $\mathbf{R}_{yy} = E[\mathbf{y}(n)\mathbf{y}^{H}(n)]$ and $\mathbf{r}_{yd} = E[\mathbf{y}(n)d^{*}(n)]$. If $y(n)$ and $d(n)$ are jointly WSS, this equation is independent of $n$. The weight vector can also be computed adaptively using gradient descent updates

$$\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\,E[\mathbf{y}(n)e^{*}(n)], \qquad e(n) = d(n) - \mathbf{w}^{H}(n)\mathbf{y}(n) \qquad (54)$$

or using stochastic gradient updates

$$\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\,\mathbf{y}(n)e^{*}(n)$$

which replace the expected value in (54) with its instantaneous estimate. This is the popular LMS algorithm. The stepsize $\mu$ determines the tradeoff between the rate of convergence and the minimum MSE.
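The recursion is a few lines of code. The sketch below is a system-identification toy of our own choosing (names such as `h` and `u` are ours), using the stochastic gradient update obtained from the Wirtinger derivative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Complex LMS sketch: identify an unknown FIR response h from input/desired
# pairs.  Error e = d - w^H u; stochastic gradient update (instantaneous
# version of (54)): w <- w + mu * u * e*.
M = 4
h = (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)

w = np.zeros(M, dtype=complex)
mu = 0.01
for _ in range(20_000):
    u = (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)
    noise = 0.01 * (rng.standard_normal() + 1j * rng.standard_normal())
    d = h.conj() @ u + noise          # desired response of the unknown system
    e = d - w.conj() @ u              # a priori estimation error
    w = w + mu * u * np.conj(e)       # LMS update

print(np.linalg.norm(w - h))          # small residual: w has converged toward h
```

Here the circular white input has covariance $\mathbf{I}$, so the condition number is 1 and convergence is fast, which anticipates the eigenvalue-spread discussion below.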
Widely linear filter and LMS algorithm: As shown in (42), a widely linear filter forms the estimate of $d(n)$ through the inner product

$$\hat{d}(n) = \underline{\mathbf{w}}^{H}\underline{\mathbf{y}}(n) \qquad (55)$$

where the weight vector $\underline{\mathbf{w}} = [\mathbf{w}_{1}^{T}, \mathbf{w}_{2}^{T}]^{T}$ has twice the dimension of the linear filter. The minimization of the widely linear MSE cost is obtained analogously by setting $\partial J/\partial \underline{\mathbf{w}}^{*} = \mathbf{0}$. This results in the widely linear complex normal equation

$$\underline{\mathbf{R}}_{yy}\,\underline{\mathbf{w}} = \underline{\mathbf{r}}_{yd}$$

where $\underline{\mathbf{r}}_{yd} = E[\underline{\mathbf{y}}(n)d^{*}(n)]$. The difference between the widely linear MSE and the linear MSE is given in [103]:

(56)

The widely linear LMS algorithm is written similarly to the linear case as

$$\underline{\mathbf{w}}(n+1) = \underline{\mathbf{w}}(n) + \mu\,\underline{\mathbf{y}}(n)e^{*}(n) \qquad (57)$$

where $\mu$ is the stepsize and $e(n) = d(n) - \underline{\mathbf{w}}^{H}(n)\underline{\mathbf{y}}(n)$.
The study of the LMS filter has been an active research topic. A thorough account of its properties is given in [50], [74], based on the different types of assumptions that can be invoked to simplify the analysis. With the augmented vector notation, most of the results for the linear LMS filter can be readily extended to the widely linear case, although care must be taken with assumptions such as uncorrelatedness and orthogonality of the signals [6], [38].
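In augmented notation, the widely linear LMS update (57) is just the linear update applied to the stacked regressor. A sketch, with a hypothetical widely linear plant $d = \mathbf{g}_{1}^{H}\mathbf{u} + \mathbf{g}_{2}^{H}\mathbf{u}^{*}$ of our own choosing (all names are ours):

```python
import numpy as np

rng = np.random.default_rng(2)

def wl_lms_step(w, u, d, mu):
    """One widely linear LMS update (57): linear LMS applied to the
    augmented regressor [u; u*] with a weight vector of twice the length."""
    ua = np.concatenate([u, u.conj()])
    e = d - w.conj() @ ua
    return w + mu * ua * np.conj(e), e

# Identify a widely linear system d = g1^H u + g2^H u* (noise-free toy).
M = 3
g1 = rng.standard_normal(M) + 1j * rng.standard_normal(M)
g2 = rng.standard_normal(M) + 1j * rng.standard_normal(M)

w = np.zeros(2 * M, dtype=complex)
for _ in range(30_000):
    u = (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)
    d = g1.conj() @ u + g2.conj() @ u.conj()
    w, _ = wl_lms_step(w, u, d, mu=0.01)

print(np.linalg.norm(w - np.concatenate([g1, g2])))   # converges to [g1; g2]
```

Because the input here is circular, the augmented covariance is the identity and the widely linear filter converges as fast as the linear one; the impropriety penalty discussed next does not apply.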
The convergence of the LMS algorithm depends on the eigenvalues of the input covariance matrix [21], [50]. For the widely linear LMS filter, this is the augmented covariance matrix. A measure typically used for the eigenvalue disparity of a given matrix is the condition number. For a Hermitian matrix $\mathbf{R}$, the condition number is the ratio of largest to smallest eigenvalue: $\kappa = \lambda_{\max}/\lambda_{\min}$. When the signal is proper, the augmented covariance matrix is block-diagonal and has eigenvalues that occur with even multiplicity. In this case, the condition numbers of the augmented covariance matrix $\underline{\mathbf{R}}$ and the Hermitian covariance matrix $\mathbf{R}$ are the same. If the signal is improper, then the eigenvalue spread of $\underline{\mathbf{R}}$ is always greater than the eigenvalue spread of $\mathbf{R}$. The maximally improper case leads to the most spread out eigenvalues. This can be shown via majorization theory [115], [117].
Thus, the MSE performance advantage of the widely linear
LMS algorithm for improper signals comes at the price of
slower convergence compared to the linear LMS algorithm.
An update scheme such as the recursive least squares (RLS)
algorithm [50], which is less sensitive to the eigenvalue spread,
may be preferable in some cases. In the next example, we demonstrate the impact of impropriety on the convergence of the LMS algorithm, and show that when the underlying system is linear, there is no performance gain with a widely linear model—a simple point, but not always acknowledged in the literature.
D. Examples
We define a random process

$$s(n) = \sqrt{\tfrac{1+\rho}{2}}\,v_{1}(n) + j\sqrt{\tfrac{1-\rho}{2}}\,v_{2}(n) \qquad (58)$$

where $v_{1}(n)$ and $v_{2}(n)$ are two uncorrelated real-valued random processes, both Gaussian distributed with zero mean and unit variance. By changing the value of $\rho \in [-1, 1]$, we can change the degree of noncircularity of $s(n)$: the variance of $s(n)$ is 1 and its complementary variance is $E[s^{2}(n)] = \rho$. For $\rho = 0$, the random process $s(n)$ becomes circular (and hence proper), whereas $\rho = 1$ and $\rho = -1$ lead to maximally noncircular $s(n)$. We then construct $x(n)$ by passing $s(n)$ through a linear system with impulse response $h(n)$, normalized to ensure unit weight norm, and we also add uncorrelated white circular Gaussian noise at a signal-to-noise ratio of 20 dB.
Because $x(n)$ was obtained from $s(n)$ as the output of a linear system plus uncorrelated noise, the optimum filter for estimating $x(n)$ from $s(n)$ is obviously linear. However, we use this simple example to demonstrate what would happen if one were to use a widely linear filter instead. In order to estimate $x(n)$ using the LMS algorithm, we assemble $M$ snapshots of $s(n)$ in the vector $\mathbf{s}(n)$. Its covariance matrix is $\mathbf{R} = \mathbf{I}$, and its complementary covariance matrix is $\widetilde{\mathbf{R}} = \rho\,\mathbf{I}$. The eigenvalues of the augmented covariance matrix can be shown to be $1 + \rho$ and $1 - \rho$, each with multiplicity $M$. Hence, the condition number is $\kappa = (1 + |\rho|)/(1 - |\rho|)$, i.e., $\kappa = 1$ for $\rho = 0$ and $\kappa = 19$ for $\rho = 0.9$.
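This eigenvalue disparity is easy to reproduce numerically. The sketch below (helper names are ours) samples the scalar process (58) and computes the condition number of its 2×2 augmented covariance:

```python
import numpy as np

rng = np.random.default_rng(3)

def noncircular_process(rho, n):
    """Sample (58): s = sqrt((1+rho)/2) * v1 + 1j * sqrt((1-rho)/2) * v2."""
    v1 = rng.standard_normal(n)
    v2 = rng.standard_normal(n)
    return np.sqrt((1 + rho) / 2) * v1 + 1j * np.sqrt((1 - rho) / 2) * v2

def aug_condition_number(s):
    """Condition number of the 2x2 augmented covariance of a scalar process."""
    r = np.mean(np.abs(s) ** 2)            # variance
    c = np.mean(s ** 2)                    # complementary variance (~ rho here)
    R_aug = np.array([[r, c], [np.conj(c), r]])
    ev = np.linalg.eigvalsh(R_aug)         # real eigenvalues, ascending
    return ev[-1] / ev[0]

k_proper = aug_condition_number(noncircular_process(0.0, 200_000))
k_improper = aug_condition_number(noncircular_process(0.9, 200_000))
print(k_proper, k_improper)    # ~1 and ~(1 + 0.9)/(1 - 0.9) = 19
```

The eigenvalues are $r \pm |c|$, so the sample condition numbers land near 1 and 19, matching the values quoted for the LMS experiment.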
In Fig. 9, we show the convergence behavior of a linear and a widely linear LMS filter for estimating $x(n)$ from $\mathbf{s}(n)$. The step size is fixed at the same value for all runs, and we choose the length $M$ of the filter to match the length of the system's impulse response. Fig. 9(a) shows the learning curve for a circular input, i.e., $\rho = 0$, and Fig. 9(b) for a noncircular input with $\rho = 0.9$. The figures confirm that the linear and widely linear filters
Fig. 9. Convergence of the linear and widely linear LMS algorithm for a linear system, for (a) circular input and (b) noncircular input.
Fig. 10. Convergence of the linear and widely linear LMS algorithm for a widely linear system, for (a) circular input and (b) noncircular input.
yield the same steady-state mean-square error values for circular and noncircular $s(n)$. However, in the noncircular case, the use of the widely linear LMS filter is even detrimental because of its decreased convergence rate. This is due to the fact that, for circular $s(n)$, the condition numbers for both $\mathbf{R}$ and $\underline{\mathbf{R}}$ are unity, whereas for noncircular $s(n)$, $\kappa(\mathbf{R}) = 1$ but $\kappa(\underline{\mathbf{R}}) = 19$.

Let us now consider a widely linear system, which filters both $s(n)$ and $s^{*}(n)$, with the same $s(n)$ as before. Fig. 10 shows the corresponding learning curves for the linear and widely linear LMS filters with the same settings for the simulation as before. As can be observed in the figures, the widely linear filter provides smaller MSE for both circular and noncircular $s(n)$. For circular input $s(n)$, the reduction in MSE by using the widely linear filter can be obtained from (56). However, this performance gain again comes at the price of slower convergence due to the increased eigenvalue spread.
V. I NDEPENDENT COMPONENT A NALYSIS
Data-driven signal processing methods are based on simple
generative models and hence can minimize the assumptions
on the nature of the data. They have emerged as promising
alternatives to the traditional model-based approaches in many
signal processing applications where the underlying dynamics
are hard to characterize. The most common generative model
used in data-driven decompositions is a linear mixing model,
where the mixture (the given set of observations) is written as
a linear combination of source components, i.e., it is decomposed into a mixing matrix and a source matrix. For uniqueness of the decomposition (subject to a scaling and permutation ambiguity), constraints such as sparsity, nonnegativity, or independence are applied to the two matrices. ICA is a popular data-driven blind source separation technique that imposes
the constraint of statistical independence on the components,
i.e., the source distributions. It has been successfully applied
to numerous signal processing problems in areas as diverse
as biomedicine, communications, finance, and remote sensing
[5], [34], [35], [56].
We consider the typical ICA problem, where $\mathbf{x}$ is a linear mixture of independent components (sources) $\mathbf{s}$, as described by

$$\mathbf{x} = \mathbf{A}\mathbf{s} \qquad (59)$$

where the mixing matrix $\mathbf{A}$ is square, i.e., the number of sources and observations are equal, and nonsingular. The objective is to blindly recover the sources from the observations $\mathbf{x}$, without knowledge of $\mathbf{A}$, using the demixing (or separating) matrix $\mathbf{W}$ such that the source estimates are $\mathbf{y} = \mathbf{W}\mathbf{x}$, where ideally $\mathbf{W} = \mathbf{A}^{-1}$. Arbitrary scaling of $\mathbf{s}$, i.e., multiplication by a diagonal matrix (which may have complex entries), and reordering the components of $\mathbf{s}$, i.e., multiplication by a permutation matrix, preserve the independence of its components. The product of a diagonal and a permutation matrix is a monomial matrix, which has exactly one nonzero entry in each
column and row. Hence, we can determine $\mathbf{W}$ only up to multiplication with a monomial matrix.
A limitation of ICA for the real-valued case is that when the sources are white—i.e., we cannot exploit the sample correlation structure in the data—we can only allow one Gaussian source in the mixture for successful separation. In the complex case, on the other hand, we can perform ICA of multiple Gaussian sources as long as they all have distinct circularity coefficients. In addition, the complex domain enables the separation of improper sources through a direct application of the invariance property of the circularity coefficients. When all the sources in the mixture are improper with distinct circularity coefficients, we can achieve ICA through joint diagonalization of the covariance and complementary covariance matrices using the strong uncorrelating transform (SUT) [41], [63]. For the real-valued case, separation using second-order statistics can be achieved only when the sources have sample correlation.

Using higher order statistical information, we can perform ICA for any type of distribution, circular or noncircular, as long as there are no two complex Gaussian sources with the same circularity coefficient [41]. To achieve ICA, we can either compute
the higher order statistics explicitly, or we can generate them implicitly through the use of nonlinear functions. Among the former group is JADE (which stands for joint approximate diagonalization of eigenmatrices) [24], which computes cumulants. JADE can be used directly for ICA of complex-valued data. A recent extension of this algorithm [122] enables joint diagonalization of matrices that can be Hermitian or complex symmetric. Hence, it allows for more efficient ICA solutions by considering also the complementary statistics, which are neglected in the original formulation of JADE.
Algorithms that rely on joint diagonalization of cumulant matrices are robust. However, their performance suffers as the number of sources increases, and the cost of computing and diagonalizing cumulant matrices may become prohibitive for separating a large number of sources. ICA techniques that use nonlinear functions to implicitly generate higher order statistics may present attractive alternatives. Among these are ML-ICA [99], information maximization (Infomax) [16], and maximization of non-Gaussianity (e.g., the FastICA algorithm) [55], which are all intimately related to each other. These algorithms can be easily extended to the complex domain using Wirtinger calculus, as shown in [7]. In addition, one can develop complex ICA algorithms that adapt to different source distributions, using general models such as complex generalized Gaussian distributions [90] as in [87], or more flexible models through efficient entropy estimation techniques [69].
A. ICA Using Second-Order Statistics
We first present the second-order approach to ICA [40], [41], [63], which is based on the fact that the circularity coefficients of $\mathbf{s}$ are invariant under the linear mixing transformation [113]. We discussed the circularity coefficients in Section III-A. There is a corresponding coordinate system, called the canonical coordinate system, where the latent description has identity correlation matrix, $\mathbf{I}$, and diagonal complementary correlation matrix with the circularity coefficients $k_{1} \geq k_{2} \geq \cdots \geq k_{N}$ on the diagonal, $\mathbf{K} = \operatorname{diag}(k_{1}, \ldots, k_{N})$.

Fig. 11. Two-channel model for second-order complex ICA. The vertical arrows show the complementary correlation matrix between the upper and lower lines.
In [41], vectors that are uncorrelated with unit variance, but possibly improper, are called strongly uncorrelated, and the transformation that maps $\mathbf{x}$ into canonical coordinates is called the strong uncorrelating transform (SUT). The SUT is found as $\mathbf{A}_{\mathrm{SUT}} = \mathbf{F}^{H}\mathbf{R}_{xx}^{-1/2}$, where the unitary matrix $\mathbf{F}$ is obtained from the Takagi factorization (28) of the coherence matrix of $\mathbf{x}$. If all circularity coefficients are distinct and nonzero, the SUT is unique up to the sign of its rows [41].
We will now show that the SUT $\mathbf{A}_{\mathrm{SUT}}$, computed from the mixture $\mathbf{x}$, is a separating matrix for the complex linear ICA problem provided that all circularity coefficients are distinct. The assumption of independent components in $\mathbf{s}$ in the given ICA model (59) implies that the correlation matrix $\mathbf{R}_{ss}$ and the complementary correlation matrix $\widetilde{\mathbf{R}}_{ss}$ are both diagonal. It is therefore easy to compute canonical coordinates between $\mathbf{s}$ and $\mathbf{s}^{*}$, denoted by $\boldsymbol{\varepsilon}$. In the SUT of $\mathbf{s}$, $\boldsymbol{\varepsilon} = \mathbf{P}\boldsymbol{\Lambda}\mathbf{s}$, $\boldsymbol{\Lambda}$ is a diagonal scaling matrix, and $\mathbf{P}$ is a permutation matrix that rearranges the canonical coordinates such that $\varepsilon_{1}$ corresponds to the largest circularity coefficient $k_{1}$, $\varepsilon_{2}$ to the second largest coefficient $k_{2}$, and so on. This makes the SUT of $\mathbf{s}$ monomial. As a consequence, $\boldsymbol{\varepsilon}$ has independent components.

The mixture $\mathbf{x} = \mathbf{A}\mathbf{s}$ has correlation matrix $\mathbf{R}_{xx} = \mathbf{A}\mathbf{R}_{ss}\mathbf{A}^{H}$ and complementary correlation matrix $\widetilde{\mathbf{R}}_{xx} = \mathbf{A}\widetilde{\mathbf{R}}_{ss}\mathbf{A}^{T}$. The canonical coordinates between $\mathbf{x}$ and $\mathbf{x}^{*}$ are computed as $\boldsymbol{\xi} = \mathbf{A}_{\mathrm{SUT}}\mathbf{x}$, and the SUT $\mathbf{A}_{\mathrm{SUT}}$ is determined analogously to the SUT of $\mathbf{s}$.

Fig. 11 shows the connection between the different coordinate systems. The important observation is that $\boldsymbol{\varepsilon}$ and $\boldsymbol{\xi}$ are both in canonical coordinates with the same circularity coefficients $k_{i}$. It remains to show that $\boldsymbol{\varepsilon}$ and $\boldsymbol{\xi}$ are related by a diagonal unitary matrix $\mathbf{Q}$ as $\boldsymbol{\xi} = \mathbf{Q}\boldsymbol{\varepsilon}$, provided that all circularity coefficients are distinct. Since $\boldsymbol{\varepsilon}$ and $\boldsymbol{\xi}$ are both in canonical coordinates with the same diagonal canonical correlation matrix $\mathbf{K}$,

$$\mathbf{Q}\mathbf{Q}^{H} = \mathbf{I} \quad \text{and} \quad \mathbf{Q}\mathbf{K}\mathbf{Q}^{T} = \mathbf{K}.$$
This shows that $\mathbf{Q}$ is unitary and $\mathbf{Q}\mathbf{K} = \mathbf{K}\mathbf{Q}^{*}$. The latter can only be true if $q_{ij} = 0$ whenever $k_{i} \neq k_{j}$. Therefore, $\mathbf{Q}$ is diagonal and unitary if all circularity coefficients are distinct. Since $\mathbf{K}$ is real, the diagonal entries of $\mathbf{Q}$ corresponding to all nonzero circularity coefficients are actually $\pm 1$. Thus, $\boldsymbol{\xi}$ has independent components because $\boldsymbol{\varepsilon}$ has independent components. Hence, we have shown that the SUT $\mathbf{A}_{\mathrm{SUT}}$ is a separating matrix for the complex linear ICA problem if all circularity coefficients are distinct, and the ICA solution is $\mathbf{y} = \mathbf{A}_{\mathrm{SUT}}\mathbf{x} = \boldsymbol{\xi}$ with $\mathbf{W} = \mathbf{A}_{\mathrm{SUT}}$.

This development requires that correlation and complementary correlation matrices exist. For some distributions (e.g., the Cauchy distribution) this is not the case. Ollila and Koivunen [96] have presented a generalization of the SUT that works for such distributions.
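The SUT construction can be sketched end to end. The code below is our own illustration (an SVD-based Takagi construction, valid when the singular values are distinct; all function names are ours); it separates two improper Gaussian sources with circularity coefficients 0.8 and 0.2:

```python
import numpy as np

rng = np.random.default_rng(4)

def takagi(M):
    """Takagi factorization M = F @ diag(k) @ F.T of a complex symmetric M
    (SVD-based construction, assuming distinct singular values)."""
    U, s, Vh = np.linalg.svd(M)
    phi = np.diag(U.conj().T @ Vh.T)   # conj(V) = U @ diag(phi), |phi_i| = 1
    F = U * np.sqrt(phi)               # scale column j of U by phi_j^{1/2}
    return F, s

def sut(X):
    """Strong uncorrelating transform of the rows-as-channels data X:
    returns A_SUT and the estimated circularity coefficients."""
    n = X.shape[1]
    R = X @ X.conj().T / n             # covariance
    C = X @ X.T / n                    # complementary covariance
    ew, Ev = np.linalg.eigh(R)
    R_isqrt = Ev @ np.diag(ew ** -0.5) @ Ev.conj().T   # whitening R^{-1/2}
    M_coh = R_isqrt @ C @ R_isqrt.T    # coherence matrix (complex symmetric)
    F, k = takagi(M_coh)
    return F.conj().T @ R_isqrt, k     # A_SUT = F^H R^{-1/2}

def source(rho, n):
    """Improper Gaussian source as in (58), circularity coefficient |rho|."""
    return (np.sqrt((1 + rho) / 2) * rng.standard_normal(n)
            + 1j * np.sqrt((1 - rho) / 2) * rng.standard_normal(n))

N = 200_000
S = np.vstack([source(0.8, N), source(0.2, N)])
A = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
X = A @ S

W, k = sut(X)
G = W @ A                      # should be (nearly) monomial
print(np.round(k, 3))          # ~ [0.8, 0.2]
```

Because the circularity coefficients are distinct, the product $\mathbf{W}\mathbf{A}$ comes out nearly diagonal up to sign, as the derivation above predicts.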
A recent source separation algorithm, entropy rate minimization (ERM) [71], uses second-order statistics along with sample correlation to improve the overall separation performance; the second-order source separation by the SUT described above becomes a special case of the ERM algorithm.
B. ICA Using Higher Order Statistics
As discussed in Section V-A, ICA can be achieved using only second-order statistics as long as all the sources are noncircular with distinct circularity coefficients. Using higher order statistics, however, one can develop more powerful ICA algorithms that achieve separation under more general conditions, separating the sources as long as there are no two complex Gaussians with the same circularity coefficient in the given mixture.
A natural cost for achieving independence is mutual information, which can be expressed as the Kullback–Leibler distance between the joint density of the source estimates and the product of the marginal source densities:

$$I(\mathbf{y}) = \sum_{n=1}^{N} H(y_{n}) - H(\mathbf{x}) - \log\left|\det\mathbf{W}\right|^{2} \qquad (60)$$

where $H(\cdot)$ denotes differential entropy and $y_{n} = \mathbf{w}_{n}^{H}\mathbf{x}$ is the estimate of source $n$, $n = 1, \ldots, N$. To write (60), we used the Jacobian transformation

$$p_{\mathbf{y}}(\mathbf{y}) = \frac{p_{\mathbf{x}}(\mathbf{x})}{\left|\det\mathbf{W}\right|^{2}} \qquad (61)$$

where $|\det\mathbf{W}|^{2}$ is the Jacobian determinant of the real representation of the complex transformation $\mathbf{y} = \mathbf{W}\mathbf{x}$. When we have $T$ independent samples $\mathbf{x}(1), \ldots, \mathbf{x}(T)$, we can use the mean ergodic theorem to write the first term in (60), the total source entropy, as

$$\sum_{n=1}^{N} H(y_{n}) \approx -\frac{1}{T}\sum_{t=1}^{T}\sum_{n=1}^{N}\log p_{n}\big(\mathbf{w}_{n}^{H}\mathbf{x}(t)\big) \qquad (62)$$

where $\mathbf{w}_{n}^{H}$ denotes the $n$th row of $\mathbf{W}$ and $p_{n}$ is the pdf of source $n$. Since $H(\mathbf{x})$ is constant with respect to the weight matrix, it is easy to see that the minimization of the mutual information given in (60) is equivalent to maximum likelihood estimation for the given samples. The maximum likelihood weight matrix is found by maximizing

$$\mathcal{L}(\mathbf{W}) = \frac{1}{T}\sum_{t=1}^{T}\sum_{n=1}^{N}\log p_{n}\big(\mathbf{w}_{n}^{H}\mathbf{x}(t)\big) + \log\left|\det\mathbf{W}\right|^{2}. \qquad (63)$$

In writing (63), we identified the model densities with the true source densities, ignoring the scaling and permutation ambiguity to simplify the notation. We have also omitted the sample index of $\mathbf{x}$ where possible for economy.
A simple but effective learning rule for maximizing the likelihood function can be obtained by directly using Wirtinger derivatives, as explained in Section II-A. We write the real-valued cost function in terms of $\mathbf{W}$ and $\mathbf{W}^{*}$, and derive the relative (natural) gradient updates for the demixing matrix [7]

$$\Delta\mathbf{W} = \mu\left(\mathbf{I} - \boldsymbol{\psi}(\mathbf{y})\mathbf{y}^{H}\right)\mathbf{W} \qquad (64)$$

where the $n$th element of the score function $\boldsymbol{\psi}(\mathbf{y})$ is the Wirtinger derivative

$$\psi_{n}(y_{n}) = -\frac{\partial \log p_{n}(y_{n})}{\partial y_{n}^{*}}. \qquad (65)$$

This way the development avoids the need for simplifying assumptions such as circularity [14].
Another natural cost function for performing ICA is negentropy, which measures the entropic distance of a given distribution from the Gaussian. Independence is achieved by moving the distribution of each transformed mixture $y_{n} = \mathbf{w}_{n}^{H}\mathbf{x}$—the independent source estimate—away from the Gaussian, i.e., by maximizing non-Gaussianity. Note that in this case, the objective function is defined for each individual source rather than for the whole vector $\mathbf{y}$ as in ML ICA. With negentropy maximization as the objective, the cost function for estimating all sources is written as in (62), under the constraint that $\mathbf{W}$ be unitary. Since $\log|\det\mathbf{W}|^{2} = 0$ for unitary $\mathbf{W}$, the second term in (63) vanishes. So maximizing likelihood and maximizing negentropy are equivalent goals for unitary $\mathbf{W}$.

If we approximate the negentropy—or, equivalently, the entropy (62) under a variance constraint—using a nonlinear function $G$ as

$$J(\mathbf{w}) = E\left[G\big(|\mathbf{w}^{H}\mathbf{x}|^{2}\big)\right] \qquad (66)$$

we can estimate the columns of the unitary demixing matrix in a deflationary procedure, where the sources are estimated one at a time. We can again use Wirtinger calculus to obtain either gradient updates [7] or modified Newton-type updates [7], [86] such that the weight vector $\mathbf{w}$ is updated by

(67)
where $g$ and $g'$ denote the first and second derivatives of $G$. After each update (67), an orthogonalization and normalization step is used to ensure that the resulting $\mathbf{W}$ is unitary.
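The orthogonalization and normalization step can be sketched as a Gram-Schmidt projection against the already extracted unit-norm columns (function name ours):

```python
import numpy as np

def deflate(w, extracted):
    """Project w against previously extracted unit-norm weight vectors and
    renormalize, so the accumulated demixing matrix stays unitary."""
    for v in extracted:
        w = w - (v.conj() @ w) * v     # remove the component along v
    return w / np.linalg.norm(w)

# Example: make a second weight vector orthonormal to a fixed first one.
w1 = np.array([1.0, 1.0j]) / np.sqrt(2)
w2 = deflate(np.array([1.0 + 0.5j, 0.3]), [w1])
print(abs(w1.conj() @ w2))     # ~0: the two weight vectors are orthonormal
```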
To summarize, ICA using higher order statistics can be achieved by maximizing likelihood (which is equivalent to minimizing mutual information) or by maximizing negentropy. When the demixing matrix is constrained to be unitary—as it is for the negentropy cost—the two costs are equivalent. In both cases, the performance is optimized when the nonlinear functions, the score function $\boldsymbol{\psi}$ in (64) and the nonlinearity $G$ in (67), are chosen to match the source distributions. This is discussed in more detail in the next section.
C. ICA Algorithms and Noncircularity
As shown in Section V-A, noncircularity (or rather impropriety) allows the separation of sources using only second-order statistics as long as all the sources in the mixture are noncircular with distinct circularity coefficients. Noncircularity also plays an important role in the performance of ICA algorithms. To be identifiable, the mixing matrix in (59) must have full column rank, and there must not be two Gaussian sources with the same circularity coefficient [41]. In terms of local stability, noncircularity plays a more critical role because algorithms become less stable as the degree of noncircularity increases. As noted in the previous section, the ML and negentropy cost functions are equivalent for unitary $\mathbf{W}$. Nevertheless, the stability conditions differ for these two cost functions because, for the negentropy cost, the unitary constraint allows a desirable decoupling for each source estimate.
When the negentropy cost is approximated as in (66), we can obtain local stability conditions, i.e., conditions for the cost function to have a local minimum or maximum [88], following the approach in [18], which assumed circularity. In Fig. 12, we plot the regions of stability for three nonlinear functions, $G(u) = \sqrt{b + u}$, $G(u) = \log(b + u)$, and $G(u) = u^{2}/2$, where $b$ is an arbitrary constant, chosen as 0.1 as in [18]. The region of stability depends on the kurtosis (which is normalized such that the Gaussian has zero kurtosis) and the degree of impropriety of the sources. We see that stability for sub-Gaussian sources, whose normalized kurtosis is negative, is always affected by impropriety. Examples of such sources include quadrature amplitude modulation, BPSK, and uniformly distributed signals. However, for super-Gaussian sources, whose kurtosis is greater than zero, impropriety affects stability only for sources whose kurtosis is close to the Gaussian. Hence, for super-Gaussian sources, matching the density to the improper nature of the signals is less of a concern than for sub-Gaussians.
For ML ICA, on the other hand, it is shown in [66] that, when all sources are circular and non-Gaussian, the updates converge to the inverse of the mixing matrix up to a phase shift. However, when the sources are noncircular and non-Gaussian, the stability conditions become more difficult to satisfy, especially when the demixing matrix is constrained to be unitary. When the sources are Gaussian, though, ML ICA is stable as long as the circularity coefficients of all sources are distinct. This is the case that can
Fig. 12. Regions of stability for the negentropy cost.
also be solved using the second-order ICA approach described in Section V-A.

For ML ICA of circular sources, the $n$th entry of the score function vector in (65) can be shown to be [7]

$$\psi_{n}(y) = -\frac{p_{n}'(|y|)}{2\,p_{n}(|y|)}\,\frac{y}{|y|}$$

because for this case the pdf depends on $y$ only through $|y|$. Thus, the score function always has the same phase as its argument. This is the form of the score function proposed in [14], where all sources are assumed to be circular. When the source is circular Gaussian, the score function becomes linear: $\psi_{n}(y) = y/\sigma_{n}^{2}$. This is expected because circular Gaussian sources cannot be separated. However, when they are noncircular, the score function is $\psi_{n}(y) = (\sigma_{n}^{2}y - \widetilde{\sigma}_{n}y^{*})/(\sigma_{n}^{4} - |\widetilde{\sigma}_{n}|^{2})$, where $\sigma_{n}^{2}$ and $\widetilde{\sigma}_{n}$ are the variance and the complementary variance of the $n$th source. This function is widely linear in $y$, which allows separation as long as the sources have distinct circularity coefficients.
In addition to identifiability and local stability conditions, noncircularity also plays a role in terms of the overall performance of the algorithms. The performance of both ML-ICA and ICA by maximization of non-Gaussianity is optimal when the nonlinearity used in the algorithm (the score function for ML and $G$ for maximization of non-Gaussianity) is matched to the density of each estimated source. This is also when the estimators assume their desirable large sample properties [54], [99].

Given the richer structure of possible distributions in the complex (two-dimensional) space compared to the real (one-dimensional) space, the pdf estimation problem becomes more challenging for complex-valued ICA. In the matching procedure, we
need to consider not only the sub- or super-Gaussian but also the circular/noncircular nature of the sources. One way to do this is to employ the complex GGD (35) to derive a class of flexible ICA algorithms. Its shape parameter and the augmented covariance matrix can be estimated during the ICA updates. This is the approach taken in the adaptive complex maximization of non-Gaussianity (ACMN) algorithm [86], [87], based on the updates given in (67). Another approach to density matching is entropy estimation through the use of a number of suitably chosen measuring functions [70]. Such an approach can provide robust performance even with a small set of chosen measuring functions [70]. Since the demixing matrix is not constrained in the maximum likelihood approach, exact density matching for each
Fig. 13. Performance comparison of eight complex ICA algorithms for the separation of (a) mixtures of five Gaussian sources with degrees of noncircularity
0, 0.2, 0.4, 0.6, and 0.8; and (b) mixtures of nine noncircular complex sources drawn from GGD distributions, both as a function of sample size. Each
simulation point is averaged over 100 independent runs.
source becomes a more challenging task. This is due to the fact that in the update (64), the score function (65) for each source affects the whole demixing matrix estimate $\mathbf{W}$. In the complex ICA by entropy bound minimization (ICA-EBM) algorithm [69], the maximum likelihood updates are combined with a decoupling approach that enables easier density matching for each source independently. We study the performance of these algorithms in Section V-D.

Finally, it is important to note that exact density matching is not generally critical for the performance of ICA algorithms. As the discussion in Section III highlights, it is important only when the sample size is small or the sources are highly noncircular. In many cases, simpler algorithms that do not explicitly estimate the distributions, and may not even match the circular/noncircular nature of the data, can provide satisfactory performance for practical problems such as fMRI analysis [68].
D. Examples
In this section, we present two examples to show the performance of the ICA algorithms we have discussed so far, for Gaussian and non-Gaussian sources with varying degrees of noncircularity [69]. Three indices are used to evaluate performance. Ideally, the product $\mathbf{G} = \mathbf{W}\mathbf{A}$ of demixing and mixing matrix should be monomial. Based on that, we define the first performance index as follows: For each row of $\mathbf{G}$, we keep only the entry whose magnitude is largest. If the matrix we thus obtain is monomial, we declare success, otherwise failure. The ratio of failed trials is the first performance index. The second performance index is the average interference-to-signal ratio (ISR), which is calculated as the average over all successful runs. The ISR for a given source is defined as the ratio of the sum of squared magnitudes of the remaining entries in the corresponding row of $\mathbf{G}$ to the largest squared magnitude in that row. The third performance index is the average CPU time.
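The first two indices translate directly into code; the sketch below (helper names ours) applies them to a hypothetical near-monomial product $\mathbf{G} = \mathbf{W}\mathbf{A}$:

```python
import numpy as np

def is_success(G):
    """Keep only the largest-magnitude entry in each row of G = W @ A and
    declare success if the result is monomial, i.e., the kept entries occupy
    distinct columns."""
    cols = np.argmax(np.abs(G), axis=1)
    return len(set(cols.tolist())) == G.shape[0]

def average_isr(G):
    """Average interference-to-signal ratio: per row, the sum of squared
    magnitudes of the non-dominant entries divided by the largest squared
    magnitude."""
    P = np.abs(G) ** 2
    peak = P.max(axis=1)
    return float(np.mean((P.sum(axis=1) - peak) / peak))

G_good = np.array([[0.02j, 1.0], [0.9, 0.03]])   # hypothetical near-monomial G
G_bad = np.array([[1.0, 0.1], [0.9, 0.2]])       # both rows peak in column 0
print(is_success(G_good), is_success(G_bad), average_isr(G_good))
```

A perfectly separated mixture gives an ISR of zero; values well below one indicate that each estimated source is dominated by a single true source.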
We consider three different ICA algorithms: i) the second-order SUT [41]; ii) ACMN, which maximizes negentropy by adaptively estimating the parameters of the GGD model in (35) [86], [87]; and iii) complex ICA-EBM, which maximizes the likelihood and approximates the entropy (source distributions) through a set of flexible measuring functions [69].
The first example is the separation of noncircular Gaussian
sources with distinct circularity coefficients. Five Gaussian
signals with unit variance and complementary variances of
0, 0.2, 0.4, 0.6, and 0.8 are mixed with a randomly
chosen square matrix whose real and imaginary elements are
independently drawn from a zero-mean, unit-variance Gaussian
distribution. Fig. 13(a) shows the performance indices. We observe that the second-order SUT algorithm exhibits the best
separation performance and also consumes the least CPU
time. ACMN fails to separate the mixtures primarily because
of the simplification it employs in the derivation. Complex
ICA-EBM performs as well as SUT only for large sample
sizes. The entropy estimator used in this algorithm can exactly
match the entropy of a Gaussian source, which accounts for the
asymptotic efficiency of complex ICA-EBM for the separation
of Gaussian sources.
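To make the setup concrete, here is a minimal sketch of how such sources and mixtures could be generated (the construction and all names are ours, not from [69]): a zero-mean complex Gaussian with unit variance and real complementary variance rho is obtained by giving the independent real and imaginary parts powers (1 + rho)/2 and (1 - rho)/2.

```python
import numpy as np

def noncircular_gaussian(n, rho, seed=None):
    """Draw n samples of a zero-mean complex Gaussian with unit variance
    E|z|^2 = 1 and real complementary variance E[z^2] = rho, 0 <= rho < 1.
    Independent real and imaginary parts with powers (1 + rho)/2 and
    (1 - rho)/2 yield exactly these two moments."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n) * np.sqrt((1 + rho) / 2)
    y = rng.standard_normal(n) * np.sqrt((1 - rho) / 2)
    return x + 1j * y

# Five unit-variance sources with complementary variances 0, 0.2, ..., 0.8,
# mixed by a random square matrix whose real and imaginary parts are i.i.d.
# zero-mean, unit-variance Gaussian, as in the example above.
rng = np.random.default_rng(0)
S = np.vstack([noncircular_gaussian(5000, rho, seed=int(10 * rho))
               for rho in (0.0, 0.2, 0.4, 0.6, 0.8)])
A = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
X = A @ S  # observed mixtures handed to the ICA algorithm
```

The sample covariance and complementary covariance of each row of `S` then match the prescribed unit variance and complementary variance up to sampling error.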
In the second example, we study the performance of the
algorithms for complex GGD sources whose pdf is given
in (35). Nine sources are generated with nine distinct shape parameters, the largest being 9, hence including four super-Gaussian, one Gaussian, and four sub-Gaussian sources.
For each run, the complex correlation coefficient is selected randomly inside the unit circle, resulting in different circularity coefficients. Fig. 13(b) depicts the performance indices. We
observe that the flexible ICA-EBM outperforms the other
algorithms at a reasonable computational cost. ACMN, which is based on a complex GGD model, provides the next best performance, but still with a significant performance gap, primarily because of the unitary constraint it imposes on the demixing matrix.
In summary, the second-order SUT is efficient for the estimation of Gaussian sources provided that they all have distinct circularity coefficients. By adaptively estimating the parameters of a GGD model, ACMN can provide satisfactory performance in most cases, as shown in [69], [87], but it is also computationally demanding. Flexible density matching in ICA-EBM, on the other hand, provides an attractive trade-off between performance and computational cost. In addition, in a maximum likelihood framework, the demixing matrix is not constrained. This can lead to better performance compared with algorithms that do impose constraints, such as ACMN. Except when prior information such as distinct circularity coefficients is available, a flexible algorithm such as ICA-EBM thus provides the best choice, with robust performance.
VI. CONCLUSION
In this overview paper, we have tried to illuminate the
role that impropriety and noncircularity play in statistical
signal processing of complex-valued data. We considered three
questions: Why should we care about impropriety and non-
circularity? When does it make sense to take it into account?
How do we do it? In a nutshell, the answers to these questions
are: We should care because it can lead to significant perfor-
mance improvements in estimation, detection, and time-series
analysis. We should take it into account whenever there is sufficient statistical evidence that the signals and/or the underlying nature of the problem are indeed improper. We can do it by
considering the complete statistical characterization of the
signals and employing Wirtinger calculus for the optimization
of cost functions with complex parameters.
In second-order methods such as mean-square error estima-
tion, a complete statistical characterization requires considera-
tion of the complementary correlation; in problems such as ICA,
it requires the use of flexible density models that can account for
noncircularity. The caveat is that noncircular models have more
degrees of freedom than circular models, and can hence lead
to performance degradation in certain scenarios, even when the
signals are noncircular. We noted that circular models are to be
preferred when the SNR is low, the number of samples is small,
or the degree of noncircularity is low. In addition, in the imple-
mentation of adaptive algorithms such as the LMS algorithm,
taking the complete second-order statistical information into ac-
count may come at the expense of slower convergence.
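As a toy illustration of the role of the complementary correlation (our own example, not from the paper): when a real source is observed in circular complex noise, the observation is improper, and a widely linear estimate that also uses the conjugated observation beats the best strictly linear one.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200000

# Toy scenario: a real source s observed in circular complex noise.  The
# observation x = s + noise is improper (E[x^2] != 0), so its conjugate
# carries extra information about s.
s = rng.standard_normal(n)                                    # E[s^2] = 1
noise = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
x = s + noise                                                 # E|noise|^2 = 1

# Strictly linear MMSE estimate s_hat = w x, with w = E[s x*] / E|x|^2.
w = np.mean(s * np.conj(x)) / np.mean(np.abs(x) ** 2)
mse_l = np.mean(np.abs(s - w * x) ** 2)

# Widely linear MMSE estimate s_hat = h x + g x*.  The augmented normal
# equations use the covariance r = E[x x*] AND the complementary
# covariance c = E[x x], which the strictly linear filter ignores.
r = np.mean(np.abs(x) ** 2)
c = np.mean(x * x)
h, g = np.linalg.solve(np.array([[r, np.conj(c)], [c, r]]),
                       np.array([np.mean(s * np.conj(x)), np.mean(s * x)]))
mse_wl = np.mean(np.abs(s - h * x - g * np.conj(x)) ** 2)

print(mse_l, mse_wl)  # the widely linear filter wins: about 1/3 vs 1/2 here
```

In this example the widely linear solution reduces to h = g = 1/3, i.e., it averages the observation with its conjugate and thereby discards the noise component in quadrature with the real source.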
We have reviewed some of the fundamental results, and some selected more recent developments in the field. There is a lot that
we have not included, such as Cramér–Rao-type performance
bounds [37], [97], [115], [127] and a much more in-depth dis-
cussion of random processes. We have also paid only scant at-
tention to the singular case of maximally improper/noncircular
signals (also called “rectilinear” or “strict-sense noncircular”
signals). This case deserves more attention because many al-
gorithms developed for improper/noncircular signals either fail
in the maximally improper case or may even give worse per-
formance than algorithms developed for proper/circular signals.
Examples of papers dealing explicitly with maximally improper
signals are [1], [2], [49], [107]. In addition, we would like to note
the growing interest in the extension of these results to hyper-
complex numbers, in particular quaternions (see, e.g., [9], [17],
[120], [121], [124], [130], and [131]).
We hope that this overview paper will encourage more re-
searchers to take full advantage of the power of complex-valued
signal processing.
ACKNOWLEDGMENT
The authors would like to thank the anonymous reviewers. T. Adalı would particularly like to thank X.-L. Li, M. Novey, and H. Li for helpful comments and suggestions.
R EFERENCES
[1] H. Abeida and J. P. Delmas, "MUSIC-like estimation of direction of arrival for non-circular sources," IEEE Trans. Signal Process., vol. 54, no. 7, pp. 2678–2690, 2006.
[2] H. Abeida and J. P. Delmas, "Statistical performance of MUSIC-like algorithms in resolving noncircular sources," IEEE Trans. Signal Process., vol. 56, no. 9, pp. 4317–4329, 2008.
[3] M. J. Ablowitz and A. S. Fokas, Complex Variables. Cambridge, U.K.: Cambridge Univ. Press, 2003.
[4] T. Adalı and V. D. Calhoun, "Complex ICA of medical imaging data," IEEE Signal Process. Mag., vol. 24, no. 5, pp. 136–139, Sep. 2007.
[5] T. Adalı and H. Li, "Complex-valued signal processing," in Adaptive Signal Processing: Next Generation Solutions, T. Adalı and S. Haykin, Eds. New York: Wiley Interscience, 2010.
[6] T. Adalı, H. Li, and R. Aloysius, "On properties of the widely linear MSE filter and its LMS implementation," presented at the Conf. Inf. Sci. Syst., Baltimore, MD, Mar. 2009.
[7] T. Adalı, H. Li, M. Novey, and J.-F. Cardoso, "Complex ICA using nonlinear functions," IEEE Trans. Signal Process., vol. 56, no. 9, pp. 4536–4544, Sep. 2008.
[8] H. Akaike, “A new look at statistical model identification,” IEEE Trans. Autom. Control , vol. AC-19, no. 6, pp. 716–723, 1974.
[9] P. O. Amblard and N. L. Bihan, “On properness of quaternion valuedrandom variables,” in Proc. Int. Conf. Math. (IMA) Signal Process.,2004, pp. 23–26.
[10] P. O. Amblard and P. Duvaut, "Filtrage adapté dans le cas Gaussien complexe non circulaire," in Proc. GRETSI Conf., 1995, pp. 141–144.
[11] P. O. Amblard, M. Gaeta, and J. L. Lacoume, "Statistics for complex variables and signals—Part 1: Variables," Signal Process., vol. 53, no. 1, pp. 1–13, 1996.
[12] P. O. Amblard, M. Gaeta, and J. L. Lacoume, "Statistics for complex variables and signals—Part 2: Signals," Signal Process., vol. 53, no. 1, pp. 15–25, 1996.
[13] S. A. Andersson and M. D. Perlman, "Two testing problems relating the real and complex multivariate normal distribution," J. Multivar. Anal., vol. 15, pp. 21–51, 1984.
[14] J. Anemüller, T. J. Sejnowski, and S. Makeig, "Complex independent component analysis of frequency-domain electroencephalographic data," Neural Netw., vol. 16, pp. 1311–1323, 2003.
[15] L. Anttila, M. Valkama, and M. Renfors, "Circularity-based I/Q imbalance compensation in wideband direct-conversion receivers," IEEE Trans. Veh. Tech., vol. 57, no. 4, pp. 2099–2113, 2008.
[16] A. Bell and T. Sejnowski, "An information maximization approach to blind separation and blind deconvolution," Neural Comput., vol. 7, pp. 1129–1159, 1995.
[17] N. Le Bihan and P. O. Amblard, "Detection and estimation of Gaussian proper quaternion valued random processes," in Proc. Int. Conf. Math. (IMA) Signal Process., 2006, pp. 23–26.
[18] E. Bingham and A. Hyvärinen, "A fast fixed-point algorithm for independent component analysis of complex valued signals," Int. J. Neural Syst., vol. 10, pp. 1–8, 2000.
[19] D. H. Brandwood, "A complex gradient operator and its application in adaptive array theory," Proc. Inst. Electr. Eng., vol. 130, no. 1, pp. 11–16, Feb. 1983.
[20] W. M. Brown and R. B. Crane, "Conjugate linear filtering," IEEE Trans. Inf. Theory, vol. 15, no. 4, pp. 462–465, 1969.
[21] H. J. Butterweck, "A steady-state analysis of the LMS adaptive algorithm without the use of the independence assumption," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (IEEE ICASSP), Detroit, MI, 1995, pp. 1404–1407.
[22] S. Buzzi, M. Lops, and S. Sardellitti, "Widely linear reception strategies for layered space-time wireless communications," IEEE Trans. Signal Process., vol. 54, no. 6, pp. 2252–2262, 2006.
[23] A. S. Cacciapuoti, G. Gelli, and F. Verde, "FIR zero-forcing multiuser detection and code designs for downlink MC-CDMA," IEEE Trans. Signal Process., vol. 55, no. 10, pp. 4737–4751, 2007.
[24] J.-F. Cardoso and A. Souloumiac, "Blind beamforming for non-Gaussian signals," Proc. Inst. Electr. Eng.—Radar Signal Process., vol. 140, pp. 362–370, 1993.
[25] P. Charge, Y. Wang, and J. Saillard, "A non-circular sources direction finding method using polynomial rooting," Signal Process., vol. 81, pp. 1765–1770, 2001.
[26] P. Chevalier and A. Blin, "Widely linear MVDR beamformers for the reception of an unknown signal corrupted by noncircular interferences," IEEE Trans. Signal Process., vol. 55, no. 11, pp. 5323–5336, 2007.
[27] P. Chevalier, J. P. Delmas, and A. Oukaci, "Optimal widely linear MVDR beamforming for noncircular signals," in Proc. Int. Conf. Acoust., Speech, Signal Process. (IEEE ICASSP), 2009, pp. 3573–3576.
[28] P. Chevalier, P. Duvaut, and B. Picinbono, "Complex transversal Volterra filters optimal for detection and estimation," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (IEEE ICASSP), Toronto, ON, 1991, pp. 3537–3540.
[29] P. Chevalier and A. Maurice, "Constrained beamforming for cyclostationary signals," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (IEEE ICASSP), 1997, pp. 3789–3792.
[30] P. Chevalier and B. Picinbono, "Complex linear-quadratic systems for detection and array processing," IEEE Trans. Signal Process., vol. 44, no. 10, pp. 2631–2634, Oct. 1996.
[31] P. Chevalier and F. Pipon, "New insights into optimal widely linear array receivers for the demodulation of BPSK, MSK, and GMSK interferences—Application to SAIC," IEEE Trans. Signal Process., vol. 54, no. 3, pp. 870–883, 2006.
[32] P. Comon, "Estimation multivariable complexe," Traitement du Signal, vol. 3, pp. 97–102, 1986.
[33] P. Comon, "Circularité et signaux aléatoires à temps discret," Traitement du Signal, vol. 11, no. 5, pp. 417–420, 1994.
[34] P. Comon, "Independent component analysis—A new concept?," Signal Process., vol. 36, pp. 287–314, 1994.
[35] P. Comon and C. Jutten, Handbook of Blind Source Separation. New York: Academic, 2010.
[36] J. P. Delmas, "Asymptotically minimum variance second-order estimation for noncircular signals with application to DOA estimation," IEEE Trans. Signal Process., vol. 52, no. 5, pp. 1235–1241, 2004.
[37] J. P. Delmas and H. Abeida, "Stochastic Cramér-Rao bound for noncircular signals with application to DOA estimation," IEEE Trans. Signal Process., vol. 52, no. 11, pp. 3192–3199, 2004.
[38] S. Douglas and D. Mandic, "Performance analysis of the conventional complex LMS and augmented complex LMS algorithms," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (IEEE ICASSP), Dallas, TX, Mar. 2010, pp. 3794–3797.
[39] P. Duvaut, "Processus et vecteurs complexes elliptiques," in Proc. GRETSI Conf., 1995, pp. 129–132.
[40] J. Eriksson and V. Koivunen, "Complex-valued ICA using second order statistics," in Proc. IEEE Int. Workshop Mach. Learn. Signal Process. (MLSP), São Luís, Brazil, Sep. 2004, pp. 183–192.
[41] J. Eriksson and V. Koivunen, "Complex random vectors and ICA models: Identifiability, uniqueness and separability," IEEE Trans. Inf. Theory, vol. 52, no. 3, pp. 1017–1029, 2006.
[42] J. Eriksson, E. Ollila, and V. Koivunen, "Essential statistics and tools for complex random variables," IEEE Trans. Signal Process., vol. 58, no. 10, pp. 5400–5408, 2010.
[43] M. Gaeta, “Higher-order statistics applied to source separation,” Ph.D.dissertation, INP, Grenoble, France, 1991.
[44] W. A. Gardner, “Cyclic Wiener filtering: Theory and method,” IEEE Trans. Commun., vol. 41, no. 1, pp. 151–163, 1993.
[45] G. Gelli, L. Paura, and A. R. P. Ragozini, "Blind widely linear multiuser detection," IEEE Commun. Lett., vol. 4, no. 6, pp. 187–189, 2000.
[46] H. Gerstacker, R. Schober, and A. Lampe, "Receivers with widely linear processing for frequency-selective channels," IEEE Trans. Commun., vol. 51, no. 9, pp. 1512–1523, 2003.
[47] N. R. Goodman, "Statistical analysis based on a certain multivariate complex Gaussian distribution," Ann. Math. Stat., vol. 34, pp. 152–176, 1963.
[48] T. L. Grettenberg, "A representation theorem for complex normal processes," IEEE Trans. Inf. Theory, vol. 11, no. 2, pp. 305–306, 1965.
[49] M. Haardt and F. Roemer, "Enhancements of unitary ESPRIT for non-circular sources," in Proc. Int. Conf. Acoust., Speech, Signal Process. (IEEE ICASSP), 2004, pp. 101–104.
[50] S. Haykin, Adaptive Filter Theory, 4th ed. Upper Saddle River, NJ: Prentice-Hall, 2002.
[51] P. Henrici, Applied and Computational Complex Analysis, vol. III. New York: Wiley, 1986.
[52] L. Hörmander, An Introduction to Complex Analysis in Several Variables. Oxford, U.K.: North-Holland, 1990.
[53] R. A. Horn and C. R. Johnson, Matrix Analysis. New York: Cambridge Univ. Press, 1999.
[54] A. Hyvärinen, "One-unit contrast functions for independent component analysis: A statistical analysis," in Proc. IEEE Workshop Neural Netw. Signal Process. (NNSP), Amelia Island, FL, Sep. 1997, pp. 388–397.
[55] A. Hyvärinen, “Fast and robust fixed-point algorithms for independentcomponent analysis,” IEEE Trans. Neural Netw., vol. 10, no. 3, pp.626–634, 1999.
[56] A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis. New York: Wiley, 2001.
[57] J. J. Jeon, J. G. Andrews, and K. M. Sung, "The blind widely linear minimum output energy algorithm for DS-CDMA systems," IEEE Trans. Signal Process., vol. 54, no. 5, pp. 1926–1931, 2006.
[58] S. M. Kay and J. R. Gabriel, "An invariance property of the generalized likelihood ratio test," IEEE Signal Process. Lett., vol. 10, no. 12, pp. 352–355, 2003.
[59] K. Kreutz-Delgado, "The complex gradient operator and the CR-calculus. ECE275A: Parameter Estimation I, Lecture Supplement on Complex Vector Calculus," Univ. of California, San Diego, CA, 2007 [Online]. Available: http://arxiv.org/abs/0906.4835
[60] J. L. Lacoume, "Variables et signaux aléatoires complexes," Traitement du Signal, vol. 15, pp. 535–544, 1998.
[61] J. L. Lacoume and M. Gaeta, "Complex random variable: A tensorial approach," in Proc. IEEE Workshop on Higher-Order Statistics, 1991, vol. 15.
[62] A. Lampe, R. Schober, and W. Gerstacker, "A novel iterative multiuser detector for complex modulation schemes," IEEE J. Sel. Areas Commun., vol. 20, no. 2, pp. 339–350, 2002.
[63] L. De Lathauwer and B. De Moor, "On the blind separation of non-circular sources," presented at the Eur. Signal Process. Conf. (EUSIPCO), Toulouse, France, 2002.
[64] H. Li and T. Adalı, "Optimization in the complex domain for nonlinear adaptive filtering," in Proc. 33rd Asilomar Conf. Signals, Syst., Comput., Pacific Grove, CA, Nov. 2006, pp. 263–267.
[65] H. Li and T. Adalı, "Complex-valued adaptive signal processing using nonlinear functions," J. Adv. Signal Process., p. 9, 2008, Article ID 765615.
[66] H. Li and T. Adalı, "Algorithms for complex ML ICA and their stability analysis using Wirtinger calculus," IEEE Trans. Signal Process., vol. 58, no. 12, pp. 6156–6167, 2010.
[67] H. Li and T. Adalı, "Application of independent component analysis with adaptive density model to complex-valued fMRI data," IEEE Trans. Biomed. Eng., preprint, DOI: 10.1109/TBME.2011.2159841.
[68] H. Li, T. Adalı, N. M. Correa, P. A. Rodriguez, and V. D. Calhoun, "Flexible complex ICA of fMRI data," presented at the IEEE Int. Conf. Acoust., Speech, Signal Process. (IEEE ICASSP), Dallas, TX, Mar. 2010.
[69] X.-L. Li and T. Adalı, “Complex independent component analysis by entropy bound minimization,” IEEE Trans. Circuits Syst. I, Fund.Theory Appl., vol. 57, no. 7, pp. 1417–1430, Jul. 2010.
[70] X.-L. Li and T. Adalı, “Independent component analysis by entropy bound minimization,” IEEE Trans. Signal Process., vol. 58, no. 10,
pp. 5151–5164, Oct. 2010.
[71] X.-L. Li and T. Adalı, "Blind separation of noncircular correlated sources using Gaussian entropy rate," IEEE Trans. Signal Process., vol. 59, no. 6, pp. 2969–2975, Jun. 2011.
[72] X.-L. Li, T. Adalı, and M. Anderson, "Noncircular principal component analysis and its application to model selection," IEEE Trans. Signal Process., vol. 59, no. 10, pp. 4516–4528, 2011.
[73] X.-L. Li, M. Anderson, and T. Adalı, "Detection of circular and noncircular signals in the presence of circular white Gaussian noise," presented at the Asilomar Conf. Signals, Syst., Comput., Pacific Grove, CA, Nov. 2010.
[74] O. Macchi, LMS Adaptive Processing With Applications in Transmission. New York: Wiley, 1996.
[75] D. P. Mandic and V. S. L. Goh , Complex Valued Nonlinear Adaptive Filters. New York: Wiley, 2009.
[76] P. Marchand, P. O. Amblard, and J. L. Lacoume, "Statistiques d'ordre supérieur à deux pour des signaux cyclostationnaires à valeurs complexes," in Proc. GRETSI Conf., 1995, pp. 69–72.
[77] T. McWhorter and P. Schreier, “Widely-linear beamforming,” in Proc.37th Asilomar Conf. Signals, Syst., Comput., 2003, pp. 753–759.
[78] R. Meyer, W. H. Gerstacker, R. Schober, and J. B. Huber, "A single antenna interference cancellation algorithm for increased GSM capacity," IEEE Trans. Wireless Commun., vol. 5, pp. 1616–1621, 2006.
[79] A. Mirbagheri, N. Plataniotis, and S. Pasupathy, "An enhanced widely linear CDMA receiver with OQPSK modulation," IEEE Trans. Commun., vol. 54, pp. 261–272, 2006.
[80] C. N. K. Mooers, "A technique for the cross spectrum analysis of pairs of complex-valued time series, with emphasis on properties of polarized components and rotational invariants," Deep-Sea Res., vol. 20, pp. 1129–1141, 1973.
[81] D. R. Morgan, "Variance and correlation of square-law detected allpass channels with bandpass harmonic signals in Gaussian noise," IEEE Trans. Signal Process., vol. 54, no. 8, pp. 2964–2975, 2006.
[82] D. R. Morgan and C. K. Madsen, "Wide-band system identification using multiple tones with allpass filters and square-law detectors," IEEE Trans. Circuits Syst. I, vol. 53, no. 5, pp. 1151–1165, 2006.
[83] A. Napolitano and M. Tanda, "Doppler-channel blind identification for noncircular transmissions in multiple-access systems," IEEE Trans. Commun., vol. 52, no. 12, pp. 2073–2078, 2004.
[84] F. D. Neeser and J. L. Massey, "Proper complex random processes with applications to information theory," IEEE Trans. Inf. Theory, vol. 39, pp. 1293–1302, Jul. 1993.
[85] R. Nilsson, F. Sjoberg, and J. P. LeBlanc, "A rank-reduced LMMSE canceller for narrowband interference suppression in OFDM-based systems," IEEE Trans. Commun., vol. 51, no. 12, pp. 2126–2140, 2003.
[86] M. Novey, "Complex ICA using nonlinear functions," Ph.D. dissertation, Univ. of Maryland Graduate School, Baltimore, MD, 2009.
[87] M. Novey and T. Adalı, “Complex ICA by negentropy maximization,” IEEE Trans. Neural Netw., vol. 19, no. 4, pp. 596–609, Apr. 2008.
[88] M. Novey and T. Adalı, “On extending the complex fastICA algorithmto noncircular sources,” IEEE Trans. Signal Process., vol. 56, no. 5, pp. 2148–2154, Apr. 2008.
[89] M. Novey, T. Adalı, and A. Roy, "Circularity and Gaussianity detection using the complex generalized Gaussian distribution," IEEE Signal Process. Lett., vol. 16, no. 1, pp. 993–996, Nov. 2009.
[90] M. Novey, T. Adalı, and A. Roy, "A complex generalized Gaussian distribution—Characterization, generation, and estimation," IEEE Trans. Signal Process., vol. 58, no. 3, pp. 1427–1433, Mar. 2010.
[91] S. Olhede, "On probability density functions for complex variables," IEEE Trans. Inf. Theory, vol. 52, no. 3, pp. 1212–1217, Mar. 2006.
[92] E. Ollila, "On the circularity of a complex random variable," IEEE Signal Process. Lett., vol. 15, pp. 841–844, 2008.
[93] E. Ollila, J. Eriksson, and V. Koivunen, "Complex elliptically symmetric random variables—Generation, characterization and circularity tests," IEEE Trans. Signal Process., vol. 59, no. 1, pp. 58–69, 2011.
[94] E. Ollila and V. Koivunen, "Generalized complex elliptical distributions," in Proc. 3rd Sensor Array Multichannel Signal Process. Workshop, Sitges, Spain, Jul. 2004, pp. 460–464.
[95] E. Ollila and V. Koivunen, "Adjusting the generalized likelihood ratio test of circularity robust to non-normality," presented at the IEEE Int. Workshop Signal Process., Perugia, Italy, Jun. 2009.
[96] E. Ollila and V. Koivunen, "Complex ICA using generalized uncorrelating transform," Signal Process., vol. 89, pp. 365–377, Apr. 2009.
[97] E. Ollila, V. Koivunen, and J. Eriksson, "On the Cramér-Rao bound for the constrained and unconstrained complex parameters," in Proc. IEEE Sensor Array Multichannel Signal Process. Workshop (SAM), Darmstadt, Germany, Jul. 2008, pp. 414–418.
[98] K. B. Petersen and M. S. Pedersen, The Matrix Cookbook, Version 20081110, 2008 [Online]. Available: http://www2.imm.dtu.dk/pubdb/p.php?3274
[99] D. Pham and P. Garat, "Blind separation of mixtures of independent sources through a quasi maximum likelihood approach," IEEE Trans. Signal Process., vol. 45, no. 7, pp. 1712–1725, 1997.
[100] B. Picinbono, “On circularity,” IEEE Trans. Signal Process., vol. 42,no. 12, pp. 3473–3482, Dec. 1994.
[101] B. Picinbono, "Second-order complex random vectors and normal distributions," IEEE Trans. Signal Process., vol. 44, no. 10, pp. 2637–2640, Oct. 1996.
[102] B. Picinbono and P. Bondon, "Second-order statistics of complex signals," IEEE Trans. Signal Process., vol. 45, no. 2, pp. 411–419, Feb. 1997.
[103] B. Picinbono and P. Chevalier, "Widely linear estimation with complex data," IEEE Trans. Signal Process., vol. 43, pp. 2030–2033, Aug. 1995.
[104] R. Remmert , Theory of Complex Functions. Harrisonburg, VA:Springer-Verlag, 1991.
[105] J. Rissanen, “Modeling by the shortest data description,” Automatica,vol. 14, pp. 465–471, 1978.
[106] P. A. Rodriguez, V. D. Calhoun, and T. Adalı, "De-noising, phase ambiguity correction and visualization techniques for complex-valued ICA of group fMRI data," Pattern Recognit., to be published.
[107] F. Römer and M. Haardt, "Multidimensional unitary tensor-ESPRIT for non-circular sources," in Proc. Int. Conf. Acoust., Speech, Signal Process. (IEEE ICASSP), 2009, pp. 3577–3580.
[108] P. Rubin-Delanchy and A. T. Walden, "Simulation of improper complex-valued sequences," IEEE Trans. Signal Process., vol. 55, no. 11, pp. 5517–5521, 2007.
[109] P. Ruiz and J. L. Lacoume, "Extraction of independent sources from correlated inputs: A solution based on cumulants," in Proc. Workshop Higher-Order Spectral Anal., Jun. 1989, pp. 146–151.
[110] P. Rykaczewski, M. Valkama, and M. Renfors, "On the connection of I/Q imbalance and channel equalization in direct-conversion transceivers," IEEE Trans. Veh. Tech., vol. 57, no. 3, pp. 1630–1636, 2008.
[111] L. L. Scharf, P. J. Schreier, and A. Hanssen, "The Hilbert space geometry of the Rihaczek distribution for stochastic analytic signals," IEEE Signal Process. Lett., vol. 12, no. 4, pp. 297–300, 2005.
[112] P. J. Schreier, "Bounds on the degree of impropriety of complex random vectors," IEEE Signal Process. Lett., vol. 15, pp. 190–193, 2008.
[113] P. J. Schreier, T. Adalı, and L. L. Scharf, "On ICA of improper and noncircular sources," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (IEEE ICASSP), Taipei, Taiwan, Apr. 2009, pp. 3561–3564.
[114] P. J. Schreier and L. L. Scharf, “Second-order analysis of improper complex random vectors and processes,” IEEE Trans. Signal Process.,vol. 51, no. 3, pp. 714–725, Mar. 2003.
[115] P. J. Schreier and L. L. Scharf, Statistical Signal Processing of Complex-Valued Data: The Theory of Improper and Noncircular Signals. Cambridge, U.K.: Cambridge Univ. Press, 2010.
[116] P. J. Schreier, L. L. Scharf, and A. Hanssen, "A generalized likelihood ratio test for impropriety of complex signals," IEEE Signal Process. Lett., vol. 13, no. 7, pp. 433–436, Jul. 2006.
[117] P. J. Schreier, L. L. Scharf, and C. T. Mullis, "Detection and estimation of improper complex random signals," IEEE Trans. Inf. Theory, vol. 51, no. 1, pp. 306–312, Jan. 2005.
[118] G. E. Schwarz, "Estimating the dimension of a model," Ann. Stat., vol. 6, no. 2, pp. 461–464, 1978.
[119] G. Tauböck, “Complex noise analysis of DMT,” IEEE Trans. Signal Process., vol. 55, no. 12, pp. 5739–5754, 2007.
[120] C. C. Took and D. P. Mandic, "Quaternion-valued stochastic gradient-based adaptive IIR filtering," IEEE Trans. Signal Process., vol. 58, pp. 3895–3901, Jul. 2010.
[121] C. C. Took and D. P. Mandic, “Augmented second-order statistics of quaternion random signals,” Signal Process., vol. 91, pp. 214–224,Feb. 2011.
[122] T. Trainini, X.-L. Li, E. Moreau, and T. Adalı, "A relative gradient algorithm for joint decompositions of complex matrices," presented at the Eur. Signal Process. Conf. (EUSIPCO), Aalborg, Denmark, Aug. 2010.
[123] H. Trigui and D. T. M. Slock, "Performance bounds for cochannel interference cancellation within the current GSM standard," Signal Process., vol. 80, pp. 1335–1346, 2000.
[124] N. N. Vakhania, "Random vectors with values in quaternion Hilbert spaces," Theory Probability Appl., vol. 43, no. 1, pp. 99–115, 1999.
[125] N. N. Vakhania and N. P. Kandelaki, "Random vectors with values in complex Hilbert spaces," Theory Probability Appl., vol. 41, no. 1, pp. 116–131, Feb. 1996.
[126] A. van den Bos, “Complex gradient and Hessian,” Proc. Inst. Electr. Eng.—Vision, Image, Signal Process., vol. 141, no. 6, pp. 380–382,Dec. 1994.
[127] A. van den Bos, "A Cramér-Rao lower bound for complex parameters," IEEE Trans. Signal Process., vol. 42, no. 10, p. 2859, 1994.
[128] A. van den Bos, “Estimation of complex parameters,” in Proc. 10th IFAC Symp., Jul. 1994, vol. 3, pp. 495–499.
[129] A. van den Bos, "The multivariate complex normal distribution—A generalization," IEEE Trans. Inf. Theory, vol. 41, pp. 537–539, 1995.
[130] J. Via, D. Ramirez, I. Santamaria, and L. Vielva, "Properness and widely linear processing of quaternion random vectors," IEEE Trans. Inf. Theory, vol. 56, no. 7, pp. 3502–3515, Jul. 2010.
[131] J. Via, D. Ramirez, I. Santamaria, and L. Vielva, “Widely and semi-widely linear processing of quaternion vectors,” in Proc. IEEE Int.Conf. Acoust., Speech, Signal Process. (IEEE ICASSP), Dallas, TX,Mar. 2010, pp. 3946–3949.
[132] P. Wahlberg and P. J. Schreier, "Spectral relations for multidimensional complex improper stationary and (almost) cyclostationary processes," IEEE Trans. Inf. Theory, vol. 54, no. 4, pp. 1670–1682, 2008.
[133] A. T. Walden and P. Rubin-Delanchy, "On testing for impropriety of complex-valued Gaussian vectors," IEEE Trans. Signal Process., vol. 57, no. 3, pp. 825–834, Mar. 2009.
[134] M. Wax and T. Kailath, "Detection of signals by information theoretic criteria," IEEE Trans. Acoust., Speech, Signal Process., vol. 33, no. 2, pp. 387–392, Apr. 1985.
[135] B. Widrow, J. McCool, and M. Ball, "The complex LMS algorithm," Proc. IEEE, vol. 63, pp. 719–720, 1975.
[136] B. Widrow and M. E. Hoff, Jr., "Adaptive switching circuits," Conf. IRE WESCON, vol. 4, pp. 96–104, 1960.
[137] W. Wirtinger, "Zur formalen Theorie der Funktionen von mehr komplexen Veränderlichen," Math. Ann., vol. 97, pp. 357–375, 1927.
[138] M. Witzke, "Linear and widely linear filtering applied to iterative detection of generalized MIMO signals," Ann. Telecommun., vol. 60, pp. 147–168, 2005.
[139] R. A. Wooding, "The multivariate distribution of complex normal variables," Biometrika, vol. 43, pp. 212–215, 1956.
[140] Y. C. Yoon and H. Leib, "Maximizing SNR in improper complex noise and applications to CDMA," IEEE Commun. Lett., vol. 1, no. 1, pp. 5–8, 1997.
[141] V. Zarzoso and P. Comon, "Robust independent component analysis by iterative maximization of the kurtosis contrast with algebraic optimal step size," IEEE Trans. Neural Netw., vol. 21, no. 2, pp. 248–261, Feb. 2010.
[142] Y. Zou, M. Valkama, and M. Renfors, "Digital compensation of I/Q imbalance effects in space-time coded transmit diversity systems," IEEE Trans. Signal Process., vol. 56, no. 6, pp. 2496–2508, 2008.
Tülay Adalı (S'89–M'93–SM'98–F'09) received the Ph.D. degree in electrical engineering from North Carolina State University, Raleigh, in 1992.
She joined the faculty at the University of Maryland Baltimore County (UMBC), Baltimore, in 1992. She is currently a Professor in the Department of Computer Science and Electrical Engineering at UMBC. Her research interests are in the areas of statistical signal processing, machine learning for signal processing, and biomedical data analysis.
Prof. Adalı assisted in the organization of a number of international conferences and workshops, including the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), the IEEE International Workshop on Neural Networks for Signal Processing (NNSP), and the IEEE International Workshop on Machine Learning for Signal Processing (MLSP). She was the General Co-Chair of the NNSP workshops from 2001 to 2003; the Technical Chair of the MLSP workshops from 2004 to 2008; the Program Co-Chair of the MLSP workshops in 2008 and 2009 and of the International Conference on Independent Component Analysis and Source Separation in 2009; the Publicity Chair of the ICASSP in 2000 and 2005; and Publications Co-Chair of the ICASSP in 2008. She chaired the IEEE SPS Machine Learning for Signal Processing Technical Committee from 2003 to 2005; was a Member of the SPS Conference Board from 1998 to 2006; a Member of the Bio Imaging and Signal Processing Technical Committee from 2004 to 2007; and Associate Editor of the IEEE TRANSACTIONS ON SIGNAL PROCESSING from 2003 to 2006 and of the Elsevier Signal Processing journal from 2007 to 2010. She is currently Chair of the MLSP Technical Committee, serving on the Signal Processing Theory and Methods Technical Committee; Associate Editor of the IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING and the JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL, IMAGE, AND VIDEO TECHNOLOGY; and a Senior Editorial Board member of the IEEE JOURNAL OF SELECTED AREAS IN SIGNAL PROCESSING. She is an IEEE SPS Distinguished Lecturer for 2012 and 2013, the recipient of a 2010 IEEE Signal Processing Society Best Paper Award, and a past recipient of an NSF CAREER Award. She is a Fellow of the AIMBE.
Peter J. Schreier (S'03–M'04–SM'09) was born in Munich, Germany, in 1975. He received the Master of Science degree from the University of Notre Dame, Notre Dame, IN, in 1999 and the Ph.D. degree from the University of Colorado at Boulder in 2003, both in electrical engineering.
From 2004 until January 2011, he was a faculty member in the School of Electrical Engineering and Computer Science at the University of Newcastle, Australia. Since February 2011, he has been Chaired Professor of signal and system theory in the Faculty of Electrical Engineering, Computer Science, and Mathematics at Universität Paderborn, Germany. This chair receives financial support from the Alfried Krupp von Bohlen und Halbach foundation.
Prof. Schreier has received fellowships from the State of Bavaria, the Studienstiftung des deutschen Volkes (German National Academic Foundation), and the Deutsche Forschungsgemeinschaft (German Research Foundation). He currently serves as Area Editor and Associate Editor of the IEEE TRANSACTIONS ON SIGNAL PROCESSING. He is a member of the IEEE Technical Committee on Machine Learning for Signal Processing.
Louis L. Scharf (S'67–M'69–SM'77–F'86–LF'07) received the Ph.D. degree from the University of Washington, Seattle.
From 1971 to 1982, he served as Professor of Electrical Engineering and Statistics with Colorado State University (CSU), Ft. Collins. From 1982 to 1985, he was Professor and Chairman of Electrical and Computer Engineering, University of Rhode Island, Kingston. From 1985 to 2000, he was Professor of Electrical and Computer Engineering, University of Colorado, Boulder. In January 2001, he rejoined CSU as Professor of Electrical and Computer Engineering and Statistics. He has held several visiting positions here and abroad, including the École Supérieure d'Électricité, Gif-sur-Yvette, France; the École Nationale Supérieure des Télécommunications, Paris, France; EURECOM, Nice, France; the University of La Plata, La Plata, Argentina; Duke University, Durham, NC; the University of Wisconsin, Madison; and the University of Tromsø, Tromsø, Norway. His interests are in statistical signal processing, as it applies to adaptive radar, sonar, and wireless communication. His most important contributions to date are to invariance theories for detection and estimation; matched and adaptive subspace detectors and estimators for radar, sonar, and data communication; and canonical decompositions for reduced-dimensional filtering and quantizing. His current interests are in rapidly adaptive receiver design for space-time and frequency-time signal processing in radar/sonar and wireless communication channels.
Prof. Scharf was Technical Program Chair for the 1980 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Denver, CO; Tutorials Chair for ICASSP 2001, Salt Lake City, UT; and Technical Program Chair for the 2002 Asilomar Conference on Signals, Systems, and Computers. He is past Chair of the Fellow Committee of the IEEE Signal Processing Society and serves on the Technical Committee for Sensor Arrays and Multichannel Signal Processing. He has received numerous awards for his research contributions to statistical signal processing, including a College Research Award, an IEEE Distinguished Lectureship, an IEEE Third Millennium Medal, and the Technical Achievement and Society Awards from the IEEE Signal Processing Society.