Compressive Sensing





Shlomo Engelberg

When measuring a signal, there are rules that must be followed. Given a low-pass signal, the Nyquist sampling theorem states that if you want to be able to reconstruct the signal from its samples, you must sample the signal at a rate that is greater than twice the signal's bandwidth [1]. As with most rules, there are "exceptions" to this rule. As you will see, compressive sensing, a technique currently being developed by researchers the world over, can be considered an "exception" to the Nyquist sampling theorem. Under certain conditions, even when the Nyquist sampling theorem says that a sensor needs to store N samples of a signal per second, compressive sensing lets the sensor store M << N linear combinations of samples per second.

Though mathematical theorems like the Nyquist sampling theorem are 100% true and cannot really have exceptions, you have to be very careful when interpreting them. The Nyquist sampling theorem actually says that if all that you know about a low-pass signal is its bandwidth, then you must sample the signal at a rate that is slightly more than twice its bandwidth. When you know more about the signal, you can often get away with fewer samples. (Undersampling [1] is one well-known example of this phenomenon.)

Suppose that you know that your signal is a pure tone. Then, you know that the discrete Fourier transform of the (possibly windowed) signal will have most of its energy in a very few frequencies. Another way of saying this is that it is possible to represent this signal quite well by taking a linear combination of a very few elements of a particular set. As we will see, this additional knowledge allows you to store fewer samples per second than the Nyquist criterion seems to allow. This idea is at the very heart of compressive sensing. Before we can continue describing compressive sensing, however, we need some background from linear algebra.

A Little Linear Algebra

Starting late in high school and continuing into college you are told that if you have a set of linear equations with more variables than equations, then the solutions to the set of equations cannot be unique. This is a mathematical theorem, and it has no exceptions.

Let us consider a special case, and let us see how we can regain uniqueness – how we can produce something that is almost an exception to this theorem. Let the number of equations be m, the number of variables be n, and let m < n. Let's represent the coefficients of the equations by an m × n matrix, A, the variables by an n-dimensional column vector, x, and the values taken by each equation by an m-dimensional column vector, b. A standard result from linear algebra tells us that because there are fewer equations than variables, if A x = b has a solution, it has an infinite number of solutions.

Consider any two solutions x1 and x2. We find that Ax1 = Ax2 = b or, subtracting the leftmost terms from one another, that A(x1 – x2) = 0. That is, the difference of two solutions must be a vector that A sends to the zero vector; x1 – x2 must be an element of the null-space of A.


A Simple Example

Consider the underdetermined system defined by the equations x + z = b1 and x + 3y = b2. In matrix-vector form this can be written A[x y z]T = [b1 b2]T, where A is the 2 × 3 matrix whose first row is [1 0 1] and whose second row is [1 3 0].

Any pair of columns of the matrix is independent, and, therefore, any non-zero vector that causes the right-hand side to be zero must have three non-zero entries. It is easy to see that the vectors that make the right-hand side zero are all of the form [-3a a 3a]T, and the only way such a vector can have fewer than three non-zero elements is if all of its elements are zero.

Suppose that you want to find the values of x, y, and z for which the right-hand side equals [1 1]T. Clearly, x = 1 and y = z = 0 is a set of values that work, and, as we have seen, it must be the unique solution with only one non-zero element.
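The sidebar's claims are easy to check numerically. Here is a minimal MATLAB sketch (it simply restates the sidebar's matrix and right-hand side):

A = [1 0 1; 1 3 0];            % the system x + z = b1, x + 3y = b2
b = [1; 1];
null(A)                        % one basis vector, proportional to [-3 1 3]': three non-zero entries
x_sparse = [1; 0; 0];
A * x_sparse                   % returns [1; 1], so the sparse solution works
A * (x_sparse + [-3; 1; 3])    % also [1; 1], but this solution has no zero entries at all

Adding any multiple of the null-space vector to the sparse solution produces another solution, but, as the last line shows, one that is no longer sparse.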



Suppose that, in addition to the other information we have about A, we also know that any m columns of A are independent. Then, all non-zero vectors of A's null-space have at least m + 1 non-zero elements. Otherwise, there would be collections of m or fewer columns that are linearly dependent.

Suppose that a vector x with fewer than (m+1)/2 non-zero elements solves A x = b. Then, x is the unique solution of the equations for which there are fewer than (m+1)/2 non-zero elements. To see that this is so, note that any other solution must differ from x by a non-zero vector in A's null-space, and all such vectors have at least m+1 non-zero elements. Even if each non-zero element in x cancels an element of the vector from the null-space, there will still be more than (m+1)/2 elements of the vector "left standing." This shows that if there is a vector x with fewer than (m+1)/2 non-zero elements that solves the equation A x = b, then this solution is unique among all solutions with fewer than (m+1)/2 non-zero elements. (See the sidebar for a simple example of these ideas.)

Sometimes knowing that a sparse solution, a solution with few non-zero elements, is desired makes it possible to say that if the desired solution exists, it is unique among sparse solutions.

Compressive Sensing

Suppose you store n samples of a signal in the vector y, and suppose you know that the sampled signal, y, is a combination of a few vectors from some large set of n-element vectors. Let the size of the large set be k, and represent the set of vectors by REPRESENTATIONmatrix, an n × k matrix whose columns are the vectors of the large set. Then you know that y = REPRESENTATIONmatrix x, and you know that x is sparse. You know that in some representation, perhaps one made up of samples of sines and cosines, the k-dimensional vector x has only l non-zero components. Suppose that in addition to REPRESENTATIONmatrix there is an m × n matrix, B, where m < n, such that any m columns of the matrix A = B REPRESENTATIONmatrix are independent and that m is substantially larger than (and certainly more than twice as large as) l. Then, if we are given the m-dimensional vector b that satisfies b = A x, we can, in principle, recover x by searching for the sparse solution of A x = b, and, having found x, we can recover our original samples by calculating y = REPRESENTATIONmatrix x. That is, if there is a sparse representation of y, we can "compress" y from an n-dimensional vector down to an m-dimensional vector, b, without losing any information at all about y.
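To make the notation concrete, here is a minimal MATLAB sketch of the setup just described. The sizes (n = 8, k = 16, m = 4, l = 1) and the use of Gaussian random matrices are illustrative assumptions of mine, not requirements taken from the text:

n = 8;  k = 16;  m = 4;                     % illustrative sizes only
REPRESENTATIONmatrix = randn(n, k);         % columns: the large set of n-element vectors
x = zeros(k, 1);  x(3) = 2;                 % a sparse coefficient vector (l = 1 non-zero entry)
y = REPRESENTATIONmatrix * x;               % the n samples a conventional sensor would store
B = randn(m, n);                            % an m x n matrix with m < n
A = B * REPRESENTATIONmatrix;               % the m x k matrix whose sparse solutions we seek
b = A * x;                                  % equivalently B*y: the m numbers actually stored

The sensor need only store b (and know B); recovering x, and hence y, is the problem discussed next.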

A Practical Problem…

Given the right kind of A, we know that sparse solutions of A x = b are unique. If we know that we must find a sparse solution, then we know we are searching for a unique solution. You might think that we are now doing very well; we know what we are searching for, and we know it is unique. Unfortunately, it has been shown that searching for a sparse solution to a set of linear equations is generally a very hard problem. (In the language of computational complexity, the problem is known to be NP-hard.) It is not impossible to solve such problems, but it is generally impractical.

… and a Practical Solution

In a series of groundbreaking papers that appeared in the middle of the previous decade, Emmanuel Candès, David Donoho, Justin Romberg, and Terence Tao showed that in many cases a somewhat different approach to this problem leads to the same solutions. They showed that for sufficiently sparse solutions, instead of looking for solutions of A x = b with a minimal number of non-zero elements, you can look for solutions of A x = b for which the sum of the absolute values of the elements of x is minimal. (See, for example, [2], [3], or [4] for more about this subject and for the precise conditions under which this result holds.) There are many efficient ways of solving this problem, and programs that implement these algorithms are freely available. (Of course, the solutions of the two problems cannot always be the same. If they were, then there would be an efficient way of solving the original problem, and as that problem is NP-hard, no such general solution can exist.)
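One reason the new problem is tractable is that it is a linear program. The sketch below is my own illustration (it assumes the Optimization Toolbox's linprog function is available; the l1 magic package discussed later is an alternative) of how "minimize the sum of the absolute values of the elements of x subject to A x = b" can be recast by introducing auxiliary variables t with |xi| <= ti:

% Solve min sum(|x|) subject to A*x = b as a linear program in the variables [x; t].
[m, n] = size(A);
f     = [zeros(n,1); ones(n,1)];          % objective: sum of the t's
Aineq = [ eye(n), -eye(n);                %  x - t <= 0
         -eye(n), -eye(n)];               % -x - t <= 0
bineq = zeros(2*n, 1);
Aeq   = [A, zeros(m, n)];                 % the equality constraints involve only x
xt    = linprog(f, Aineq, bineq, Aeq, b);
x_l1  = xt(1:n);                          % the (hopefully sparse) minimizer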

What Is Wrong with Our Beloved Mean Square Error Criterion?

When studying engineering, it often seems as if the only way of measuring the size of an object is to consider the sum of the squares of the elements of which it is composed. In our case, we have a vector, x, whose components are x1, x2, …, xn. In college, we learned how to minimize the mean square value, (x1^2 + x2^2 + ⋯ + xn^2)/n, subject to A x = b. (This minimization leads to a pseudo-inverse of A.) Why have we suddenly decided to minimize |x1| + |x2| + ⋯ + |xn| subject to A x = b?
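Before giving the short answer, a two-line MATLAB experiment makes the contrast concrete. Reusing the sidebar's system as an (assumed) example, the minimum-energy solution produced by the pseudo-inverse has no zero entries at all, even though a solution with a single non-zero entry exists:

A = [1 0 1; 1 3 0];           % the sidebar's underdetermined system
b = [1; 1];
x_l2 = pinv(A) * b            % roughly [0.53; 0.16; 0.47]: spread over every entry
A * x_l2                      % still [1; 1], so it is a perfectly valid solution
% Compare with the sparse solution [1; 0; 0], which also satisfies A*x = b.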

The short answer is that minimizing the mean square error has a tendency to spread out a solution among the elements of x. It does the opposite of what we need. To see why, consider a simple example. Suppose that A x = b has a sparse solution; suppose, for example, that the vector x = [1, 0, 0, 0]T solves the equation, and that [1, 1, 1, 1]T is in A's null-space. Then the vector x – α[1, 1, 1, 1]T = [1 – α, –α, –α, –α]T solves the equation for all possible α. Let's consider small positive values of α and see what the sum of the squares looks like and what the sum of the absolute values looks like.

The sum of the squares is (1 – α)^2 + α^2 + α^2 + α^2 = 1 – 2α + 4α^2. Because the square of a small number is a very small number, for small positive values of α, this number is less than one. That is, by spreading out the values in x, we make the sum of the squares decrease. If we were to minimize the sum of the squares subject to A x = b, we would be "encouraging" non-sparse solutions.

Now consider the sum of the absolute values. We find that for small positive values of α, the sum of the absolute values is |1 – α| + |–α| + |–α| + |–α| = 1 + 2α. Here, taking small non-zero values of α increases the "size" of the vector. We see that (at least in this case and, in fact, more generally) using the sum of the absolute values of x's components to measure x's size makes minimizing x's size also minimize the number of non-zero components in x. If there is a sparse solution to A x = b, searching for the solution of A x = b for which the sum of the absolute values of the components of x is as small as possible will tend to find the sparsest vector x which solves the system of equations.
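Both computations are easy to reproduce; for example, with the (arbitrarily chosen) value α = 0.1:

alpha = 0.1;
x = [1 - alpha; -alpha; -alpha; -alpha];
sum(x.^2)        % 0.84: less than 1, so the sum of squares rewards spreading the solution out
sum(abs(x))      % 1.20: greater than 1, so the sum of absolute values favors the sparse [1 0 0 0]'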

Compressing a Windowed Sampled Sine Wave

We now consider a simple example of how you go about compressing a set of measurements. Suppose that you know that you are receiving a single sine wave, r(t) = a sin(2πf t), but you do not know the sine wave's frequency or amplitude. Assuming that you sample r(t) every Ts seconds and that you take n samples, after sampling you have a vector whose elements are r(0), r(Ts), …, r((n – 1)Ts).

According to the Nyquist sampling theorem (and neglecting the effects of only sampling for a finite period), if you want to be able to reconstruct r(t) from its samples, you must sample at a rate that is somewhat more than twice the highest frequency of interest, and you must save the samples you take.

To use compressive sensing, we must find a set of vectors in which the signal's representation is sparse. Recalling the properties of the discrete Fourier transform (DFT), we realize that the sine wave should have most of its energy in the elements of the DFT that correspond to frequencies near f (and its high frequency "image"). To make this even truer, we can multiply the sampled sine wave by the samples of a reasonable window function, w(t) – say a Hanning window – to reduce spectral leakage and keep the number of non-negligible elements in the spectrum as small as possible. The vector corresponding to the windowed, sampled function is y = [w(0)r(0), w(Ts)r(Ts), …, w((n – 1)Ts)r((n – 1)Ts)]T. We are interested in compressing this vector.
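In MATLAB, producing this vector takes only a few lines. The sketch below is minimal; the numbers (n = 512 samples, Ts = 1/512 s, a 100.5 Hz tone of unit amplitude) anticipate the example described later and are otherwise arbitrary, and the Hanning window is written out explicitly so that no toolbox is needed:

n  = 512;  Ts = 1/512;                        % one second of data
f  = 100.5;                                   % the tone's (unknown-to-the-sensor) frequency
t  = (0:n-1)' * Ts;
r  = sin(2*pi*f*t);                           % samples of the pure tone r(t)
w  = 0.5*(1 - cos(2*pi*(0:n-1)'/(n-1)));      % samples of a Hanning window
y  = w .* r;                                  % the windowed, sampled vector we wish to compress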

Let the DFT of the windowed, sampled signal be the vector Y. It is easy to write the DFT as a matrix product. That is, Y = DFTmatrix y (where DFTmatrix is a well known invertible n × n matrix). By assumption, Y is (approximately) sparse; because the original signal has only one frequency, the DFT of the windowed signal has very few elements whose values are not very near zero.
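The statement that the DFT is a matrix product is easy to verify. The sketch below builds the matrix explicitly (no toolbox functions assumed) and checks it against MATLAB's fft; the signal parameters are the same illustrative ones used above:

n  = 512;  Ts = 1/512;  f = 100.5;            % same illustrative numbers as above
t  = (0:n-1)' * Ts;
y  = 0.5*(1 - cos(2*pi*(0:n-1)'/(n-1))) .* sin(2*pi*f*t);   % windowed, sampled tone
[jj, kk]   = meshgrid(0:n-1);
DFTmatrix  = exp(-2i*pi*jj.*kk/n);            % the n x n DFT matrix
IDFTmatrix = conj(DFTmatrix)/n;               % its inverse (the DFT matrix is symmetric)
norm(DFTmatrix*y - fft(y))                    % essentially zero: Y = DFTmatrix*y
norm(IDFTmatrix*fft(y) - y)                   % essentially zero: y = IDFTmatrix*Y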

The DFT matrix is invertible. Thus, you can write y = IDFTmatrix Y, where IDFTmatrix is the n × n matrix inverse of DFTmatrix. Because the matrix has as many rows as columns, there is no compression going on here. How can you take advantage of the theory we have presented to store fewer than n samples of sine wave-related data without losing important information about the sine wave? What you do is take a random matrix, B, with n columns and m rows where m < n, and you consider b = AY = B IDFTmatrix Y. The matrix A ≡ B IDFTmatrix has fewer rows than columns, the vector b has fewer elements than the vectors Y and y, and, because of the random way B's elements are chosen, generally speaking any set of m of A's columns is independent. If such sets of its columns are independent (and certain additional, more technical conditions are met [4]), then the value of Y for which b = A Y, and for which the sum of the absolute values of the elements of Y is as small as possible, should give us the coefficients of the DFT of the windowed and sampled sine wave. That is, by saving the m < n elements of b, you can save (almost) all of the information you need about the windowed, sampled sine wave. You are able to compress your data without any complicated calculations. All that you have to do is save a relatively small number of weighted sums of the samples.

The fact that this type of compression does not require too many arithmetic operations or sophisticated mathematics makes it suitable for use in a not-too-smart sensor – hence the name compressive (or compressed) sensing.

A Practical Implementation Using MATLAB and Freely Available Software

Years ago, Romberg and Candès published a set of MATLAB routines that perform the computations necessary for a variety of tasks related to compressive sensing [5]. They called the programs l1 magic. (The l1 norm of a vector is the sum of the absolute values of the vector's elements. To find the solution of A Y = b with the fewest terms, you find the solution, Y, whose l1 norm is smallest.) After playing with the programs a bit, I began to understand why they call them magic. You give the programs your compressed measurements, b, and the matrix which generates them, A, and, as if by magic, the programs return x which allows you to "uncompress" and recover your measurements, y.

I wanted to try out the routines, so I wrote a MATLAB script that compresses the samples of a windowed sine wave. The script lets me pick m, the number of terms in the compressed signal, and the frequency of the sine wave. After I pick a frequency, MATLAB samples a sine wave at that frequency 512 times over the period of a second and multiplies each sample by the relevant sample of a Hanning window. (The sampling period, Ts, is (1/512) s. Consequently, the highest frequency the DFT can handle is just below 256 Hz.) Then, MATLAB calculates m linear combinations of the samples and stores them. (In the MATLAB code, the 512 samples are multiplied by elements of the Hanning window function and the 512 windowed samples are multiplied by an m × 512 random matrix B.) This is the "compressive sensing" step. Rather than saving 512 samples, MATLAB saves only m samples.
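The compression step itself is only a few lines; the sketch below is a minimal stand-in for the script just described (m, the frequency, and the Gaussian choice for the "random matrix" are my illustrative assumptions):

n = 512;  Ts = 1/512;  f = 100.5;  m = 50;    % illustrative values; m << 512
t = (0:n-1)' * Ts;
y = 0.5*(1 - cos(2*pi*(0:n-1)'/(n-1))) .* sin(2*pi*f*t);   % windowed samples
B = randn(m, n);                              % the m x 512 random matrix
b = B * y;                                    % the "compressive sensing" step: store only b (and B)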

Next, MATLAB uncompresses the signal. First, it finds the Fourier coefficients of the original 512 windowed samples given only the linear combinations of the windowed samples. Here, I had to be a little bit careful. Though MATLAB's FFT command returns the elements of the standard complex valued DFT, the l1 magic program that I needed to use only "understands" real valued matrices and vectors. To get the program to work properly, I had MATLAB translate all the complex valued items into real valued items. This is not hard work, but it does double the length of some of the vectors and matrices. Then, using the MATLAB l1 magic program, I had MATLAB minimize the sum of the absolute values of the elements of Y subject to b = AY (where A and Y have been adjusted so that they are a real matrix and a real vector, respectively).
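One way to carry out this complex-to-real translation (a sketch of the idea only; the script described above may organize it differently) is to split the unknown DFT vector into its real and imaginary parts. Because the windowed samples y are real and y = IDFTmatrix Y, we have y = real(IDFTmatrix) real(Y) – imag(IDFTmatrix) imag(Y), so the unknown becomes a real vector of twice the length:

% Real reformulation of b = B*IDFTmatrix*Y, where Y = Yr + 1i*Yi is complex.
% Assumes B and IDFTmatrix as in the earlier sketches.
Areal = B * [real(IDFTmatrix), -imag(IDFTmatrix)];   % m x 2n real matrix
% The unknown is now the real 2n-vector [Yr; Yi]; minimizing its l1 norm subject to
% Areal*[Yr; Yi] = b (e.g., with l1 magic's equality-constrained routine) recovers Y.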

I played with the program for quite a while to see what kind of results I could get. I found that with an m of fifty or more, the Fourier coefficients were generally quite close to those of the original sequence even though MATLAB did not store anything like the 512 values that the Nyquist sampling theorem seems to say that you need to store and even though the vector Y is only approximately sparse.
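To judge the recovery numerically, one simple check is to compare the recovered DFT magnitudes with those computed directly from the windowed samples. In the sketch below, Yri is a hypothetical name for the real 2n-vector returned by the solver, and y is the windowed, sampled vector from the earlier sketches:

Yest   = Yri(1:n) + 1i*Yri(n+1:2*n);                    % reassemble the complex DFT estimate
relerr = norm(abs(Yest) - abs(fft(y))) / norm(fft(y))   % small when the recovery succeeds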

To get an idea of the accuracy that the routines achieved, see Figs. 1 and 2. Fig. 1 is a plot of the absolute values of the elements of the DFT – the spectrum – as calculated directly from the original sequence and as calculated from the compressed form of the sequence. Fig. 2 is a plot of the windowed version of the sampled signal and the version reconstructed from the Fourier coefficients as estimated by the l1 magic program. If you would like a copy of the MATLAB script, please drop me a line at [email protected].

Fig. 1. A sampled and windowed 100.5 Hz sine wave was used to produce these results. The plusses represent the absolute value of the coefficients of the DFT of the signal in the region of 100 Hz. The dots represent the absolute value of the coefficients as estimated using the compressive sensing technique.

Fig. 2. The windowed signal and the samples of the reconstructed windowed signal.


Conclusions

Though mathematical theorems do not have exceptions, sometimes it is possible to "sneak around" the hypotheses of the theorems and achieve things that seem to be impossible. The Nyquist sampling theorem is a case in point. The theorem seems to say that if you have a low-pass signal, then you need to sample the signal at a rate that is more than twice the highest frequency in the signal.

In fact, there are many ways of supplementing the hypotheses of the theorem and achieving better results. In this brief introduction to compressive sensing, we present one such technique and a simple application. The literature on compressive sensing is vast and is growing all the time. There are many, many other interesting applications of compressive sensing. The interested reader might want to read about the one-pixel camera [6], for example.

References

[1] S. Engelberg, Digital Signal Processing: An Experimental Introduction, Springer, London, 2008.

[2] E. J. Candès, J. Romberg, and T. Tao, "Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information," IEEE Trans. Information Theory, vol. 52, no. 2, 2006.

[3] D. L. Donoho, "Compressed sensing," IEEE Trans. Information Theory, vol. 52, no. 4, 2006.

[4] D. L. Donoho, "For most large underdetermined systems of linear equations, the minimal l1-norm near-solution approximates the sparsest near-solution," Commun. Pure Appl. Math., vol. 59, no. 6, 2006.

[5] E. J. Candès and J. Romberg, "l1 magic," Caltech, CA. [Online] Available: http://users.ece.gatech.edu/~justin/l1magic/.

[6] "Compressive imaging: a new single-pixel camera," Rice University, TX. [Online] Available: http://dsp.rice.edu/cscamera.

Shlomo Engelberg ([email protected]) is the editor-in-chief of this magazine. He received his bachelor's and master's degrees in engineering from The Cooper Union and his Ph.D. degree in mathematics from New York University's Courant Institute. He is an associate professor in the electronics department of the Jerusalem College of Technology. He is the author of A Mathematical Introduction to Control Theory (Imperial College Press, 2005), Random Signals and Noise: A Mathematical Introduction (CRC Press, 2006), and Digital Signal Processing: An Experimental Approach (Springer, 2008). His research interests are applied mathematics, instrumentation and measurement, signal processing, and control theory.