lecture4 a6523 spring2017 - cornell...

2/7/17


A6523 Modeling, Inference, and Mining
Jim Cordes, Cornell University

•  Lecture 4 –  Fourier transform shift, convolution theorem examples –  DFT of complex exponential –  Stochastic processes

•  Reading –  See web page tomorrow

•  Webpage: www.astro.cornell.edu/~cordes/A6523


Fourier Transforms

•  Examples on board
   –  Shift theorem
      •  Finding maximum of function
      •  Shifting discrete data


Discrete Fourier Transform (DFT)

The DFT of a uniformly spaced array of data $\{x_n,\ n = 0, \ldots, N-1\}$ is defined as

$$X_k = N^{-1} \sum_{n=0}^{N-1} x_n\, e^{-2\pi i\, nk/N}$$

The inverse transform is

$$x_n = \sum_{k=0}^{N-1} X_k\, e^{+2\pi i\, nk/N}$$

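which may be shown to have the correct normalization, etc. by substituting for $X_k$:

$$
\begin{aligned}
x_n &= C \sum_{k=0}^{N-1} X_k\, e^{+2\pi i\, nk/N} \\
    &= C \sum_{k=0}^{N-1} N^{-1} \sum_{n'=0}^{N-1} x_{n'}\, e^{-2\pi i\, n'k/N}\, e^{+2\pi i\, nk/N} \\
    &= C\, N^{-1} \sum_{n'=0}^{N-1} x_{n'} \sum_{k=0}^{N-1} e^{2\pi i\, (n-n')k/N} \\
    &= C\, N^{-1} \sum_{n'=0}^{N-1} x_{n'}\, N\, \delta_{nn'} \\
    &= C\, x_n = x_n \quad \text{for } C \equiv 1
\end{aligned}
$$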


The FFT is simply a fast algorithm for calculating the DFT. It exploits redundancies in the exponential when $N$ is factorable, especially $N = 2^M$, but any prime factors will do.

A direct calculation of the DFT requires $\sim N^2$ operations. The FFT requires $\sim N \log N$ operations.
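The definitions above can be checked numerically. The following is a minimal sketch, assuming NumPy; the function names `dft_direct` and `idft_direct` are hypothetical helpers, not part of the lecture. It builds the direct $\sim N^2$ transform with the $N^{-1}$ normalization used here and compares it with `numpy.fft.fft`, which places the $1/N$ on the inverse instead.

```python
import numpy as np

def dft_direct(x):
    """Direct O(N^2) DFT with the 1/N factor on the forward transform."""
    x = np.asarray(x, dtype=complex)
    N = x.size
    n = np.arange(N)
    # W[k, n] = exp(-2 pi i n k / N)
    W = np.exp(-2j * np.pi * np.outer(n, n) / N)
    return W @ x / N

def idft_direct(X):
    """Inverse transform: positive exponent, no 1/N factor (C = 1)."""
    X = np.asarray(X, dtype=complex)
    N = X.size
    n = np.arange(N)
    W = np.exp(+2j * np.pi * np.outer(n, n) / N)
    return W @ X

rng = np.random.default_rng(0)
x = rng.normal(size=64) + 1j * rng.normal(size=64)

X_direct = dft_direct(x)
X_fft = np.fft.fft(x) / x.size   # rescale numpy's convention to the one used here
x_back = idft_direct(X_direct)   # round trip should return the input
```

The round trip `x_back` reproducing `x` is the numerical counterpart of the normalization derivation with $C \equiv 1$.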


• The normalization calculation expresses the orthogonality property of the basis functions (exponentials).

• In calculations of this kind, identifying the implied $\delta$-function is typical. Here it relied on summing over products of basis functions.

• In other contexts, the $\delta$-function will arise from statistical independence of random variables.

• What happens if the sampling of xn is not uniform? Orthogonality is broken. So what?

• Notation: we often designate that $x_n$ and $X_k$ are Fourier transform pairs by writing

$$x_n \Longleftrightarrow X_k$$

and we say that $n$ and $k$ are conjugate variables.

• n can be time, a spatial coordinate, a wavelength, anything.

• Extension to $N_D$ dimensions is trivial:
  – E.g. a 2D DFT of an $N \times M$ size object can be calculated as a series of $M$ 1D-DFTs of length $N$ followed by $N$ 1D-DFTs of length $M$

• From a systems point of view, the DFT is a linear operation and does not lose information.

• An alternative approach is to fit a sinusoidal model to the data using an assumed frequency $k/N$. It can be shown that the DFT is the least-squares solution for the amplitudes of the sinusoids for all $k$. We will show this later on by using matrix algebra to solve for least-squares solutions.
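The separability of the 2D transform mentioned in the list above can be illustrated with a short sketch, assuming NumPy and an arbitrary small array size chosen only for illustration. It applies $M$ transforms of length $N$ and then $N$ transforms of length $M$, and compares the result with `numpy.fft.fft2`.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 8, 5                       # illustrative sizes, not from the lecture
x2d = rng.normal(size=(N, M))

# Step 1: M one-dimensional DFTs of length N (one per column)
step1 = np.fft.fft(x2d, axis=0)
# Step 2: N one-dimensional DFTs of length M (one per row)
step2 = np.fft.fft(step1, axis=1)

X2d = step2 / (N * M)                  # 1/(NM) normalization on the forward transform
X2d_ref = np.fft.fft2(x2d) / (N * M)   # library 2D transform, same normalization
```

The order of the two steps does not matter because the transform is linear and the exponential factorizes over the two indices.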


Symmetry Properties of the DFT

$$X_k = N^{-1} \sum_{n=0}^{N-1} x_n\, e^{-2\pi i\, nk/N} \quad \text{and} \quad x_n = \sum_{k=0}^{N-1} X_k\, e^{+2\pi i\, nk/N}$$

• Periodic with period $N$ in both domains

• Time series:

  – Discrete functions with sample intervals $\Delta t$ and $\Delta f$

  – $T = N \Delta t$ = total time span

  – $\Delta f = 1/T$

  – Nyquist frequency = maximum frequency that is represented without distortion:

$$f_N = \frac{N \Delta f}{2} = \frac{1}{2 \Delta t}$$


• Hermitian (show by substituting into the DFT expression for $x_n$)

$$x_n^* \Longleftrightarrow X^*_{N-k}$$

If $x_n$ is real then

$$x_n \Longleftrightarrow X^*_{N-k}$$

$$X^*_{N-k} = X_k$$

• The symmetry properties tell us how to fill an array with data to achieve specific results

• What are the symmetry properties of a 2D DFT?

$$X_{kl} = \frac{1}{NM} \sum_{n} \sum_{m} x_{nm}\, e^{-2\pi i\, (nk/N + ml/M)}$$


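How do we fill an array to get a real signal in the other domain?

• $k = 0, 1, \ldots, N-1$
• Need to know $N/2 - 1 + 2 = N/2 + 1$ unique values

[Diagram: the array $X$ with indices 0, 1, 2, 3, ..., N/2-1, N/2, N/2+1, ..., N-2, N-1, N. The entries are paired as $X_0 \leftrightarrow X_N$, $X_1 \leftrightarrow X_{N-1}$, $X_2 \leftrightarrow X_{N-2}$, and the first $N/2 + 1$ entries are marked as the unique values.]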
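The array-filling recipe can be sketched as follows, assuming NumPy and an even $N$; the random values are placeholders. Only the $N/2 + 1$ values for $k = 0, \ldots, N/2$ are chosen freely, and the remaining entries follow from $X_{N-k} = X_k^*$. The bins $k = 0$ and $k = N/2$ pair with themselves, so they are made real.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 16   # illustrative even length

# The N/2 + 1 unique values, k = 0 .. N/2
half = rng.normal(size=N // 2 + 1) + 1j * rng.normal(size=N // 2 + 1)
half[0] = half[0].real            # k = 0 pairs with itself, so it must be real
half[N // 2] = half[N // 2].real  # k = N/2 also pairs with itself for even N

X = np.empty(N, dtype=complex)
X[: N // 2 + 1] = half
# Fill k = N/2+1 .. N-1 from X_{N-k} = conj(X_k), k = N/2-1 .. 1
X[N // 2 + 1 :] = np.conj(half[1 : N // 2][::-1])

# Inverse transform in the convention used here (no 1/N factor)
x = np.fft.ifft(X) * N
max_imag = np.max(np.abs(x.imag))   # should vanish to rounding error
```

`numpy.fft.rfft` and `numpy.fft.irfft` store only these $N/2 + 1$ values for the same reason.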


Gaussian example

DFT of a Complex Exponential + Noise

Consider a time series

$$x_n = A\, e^{i \omega_0 n \Delta t} + n_n, \quad n = 0, \ldots, N-1$$

where $n_n$ is complex white noise.

What are the properties of white noise? By definition, white noise has a flat spectrum. But this means flat in the mean. Mean over what? Over a statistical ensemble. What is an ensemble?

We will define these later. But it needs to be clear that we have one realization of data that is conceptually part of an ensemble of all possible realizations.


[Figure: spectrum of the complex exponential plus noise for N = 1024, P = 4.0 samples, (S/N)_t = 0.500. Upper panel: Spectrum vs. Frequency Index, with the negative-frequency and positive-frequency ranges labeled. Lower panel: Spectrum vs. Frequency (Hz), with a "Zoom in" inset on a logarithmic axis.]

Mapping Frequency to Frequency Bin

Consider

$$x(t) = e^{i \omega_0 t} = e^{2\pi i f_0 t}$$

In an $N$-point DFT, in which bin does the signal fall (mostly)?

As before we have $T = N \Delta t$ and $\Delta f = 1/T$.

The frequency mapping is

$$
f_j = \begin{cases}
j\, \Delta f & \text{for } j = 0, \ldots, N/2 \\
(N - j)\, \Delta f & \text{for } j = N/2 + 1, \ldots, N - 1
\end{cases}
$$

where the second line gives the magnitude of the frequency; those bins hold the negative frequencies $-(N - j)\, \Delta f$.

The Nyquist frequency is the maximum frequency in the spectrum

$$f_N = \frac{1}{2 \Delta t}$$

So for $0 \le f_0 \le f_N$ the corresponding frequency bin $k$ in the DFT is

$$k_0 = f_0 N \Delta t$$

Negative frequencies correspond to $k$ above the Nyquist bin $N/2$.

Frequencies $|f_0| > f_N$ are still represented in the DFT (remember ... it is lossless) but they appear at aliased frequencies.

The frequency bin $k_0$ varies with $N$ for fixed $f_0$ and $\Delta t$, and varies with $\Delta t$ for fixed $f_0$ and $N$.
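The bin mapping can be tried out with a small sketch, assuming NumPy. The helper name `bin_for_frequency` is hypothetical, and the values $N = 1024$, $\Delta t = 1$, $f_0 = 0.25$ are illustrative choices. Taking the index modulo $N$ is one way to cover positive, negative, and aliased frequencies in a single expression.

```python
import numpy as np

def bin_for_frequency(f0, N, dt):
    """Nearest DFT bin for exp(2 pi i f0 t); negative and aliased f0 wrap modulo N."""
    return int(np.rint(f0 * N * dt)) % N

N, dt = 1024, 1.0
f0 = 0.25                                  # a period of 4 samples

k_pos = bin_for_frequency(f0, N, dt)       # 256, below the Nyquist bin N/2
k_neg = bin_for_frequency(-f0, N, dt)      # 768, above the Nyquist bin
k_alias = bin_for_frequency(f0 + 1.0 / dt, N, dt)   # |f| > f_N lands back on 256

# Cross-check against an actual transform of the complex exponential
t = np.arange(N) * dt
X = np.fft.fft(np.exp(2j * np.pi * f0 * t)) / N
k_peak = int(np.argmax(np.abs(X)))

# numpy's own signed bin-to-frequency table
freqs = np.fft.fftfreq(N, d=dt)
```

`freqs[k_neg]` returns the signed value $-0.25$, which is the negative-frequency reading of a bin above $N/2$.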


1st and 2nd-order Moments

We characterize the noise using the statistical moments. These are the first moment (the mean) and the second moment, the autocorrelation function (ACF).

Ensemble average moments are designated with angular brackets $\langle \cdots \rangle$.

The mean of the white noise is

$$\langle n_n \rangle = 0 \qquad \text{(zero mean)}$$

and the ACF is written in terms of a Kronecker delta,

$$\langle n_n n_m^* \rangle = \sigma_n^2\, \delta_{nm} \qquad \text{(white noise).}$$

The $\delta$-form of the ACF is consistent with any pair of values $n_n$ and $n_m$ being statistically independent.

Note that we take a conjugate (*) because the noise is complex. One gets a different answer without the conjugate!

By definition (as we will see formally later on), the power spectrum is the Fourier transform of the ensemble-average ACF where the ACF is assumed to depend only on the difference between the two times $n$ and $m$. This is a property of stochastic processes that have stationary statistics.

For white noise we could write the ACF as

$$R(n,m) \longrightarrow R(n-m) = \sigma_n^2\, \delta_{(n-m),0}.$$

The DFT of a delta function is a constant so we have shown that our definitions are consistent.
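An ensemble average can be approximated by averaging over many simulated realizations. The sketch below assumes NumPy and assumes a specific noise model that the notes do not spell out: Gaussian noise whose real and imaginary parts are independent, each with variance $\sigma_n^2/2$. The sizes are arbitrary illustration values.

```python
import numpy as np

rng = np.random.default_rng(3)
n_real, N, sigma_n = 20000, 8, 2.0   # realizations, samples, rms noise (illustrative)

# Assumed model: independent real and imaginary parts, variance sigma_n^2 / 2 each
noise = (rng.normal(size=(n_real, N))
         + 1j * rng.normal(size=(n_real, N))) * sigma_n / np.sqrt(2)

# Ensemble estimate of <n_n n_m^*>: approaches sigma_n^2 times the identity
acf = noise.T @ noise.conj() / n_real

# Same average without the conjugate: approaches zero for this noise model
no_conj = noise.T @ noise / n_real
```

Under this model the diagonal of `acf` estimates $\sigma_n^2$, while `no_conj` averages to zero everywhere, which is the "different answer" obtained without the conjugate.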


Ensemble averages technically require knowledge of an N-dimensional PDF. We consider cases that are much simpler.


Characterizing the noise

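We can calculate the DFT of the complex exponential because it is simply a geometric series. Usually it is not so simple!

The DFT of $x_n$ is

$$
\begin{aligned}
X_k &= N^{-1} \sum_{n=0}^{N-1} x_n\, e^{-2\pi i\, nk/N} \\
    &= A\, N^{-1} \sum_{n=0}^{N-1} e^{i (\omega_0 \Delta t - 2\pi k/N) n} + N^{-1} \sum_{n=0}^{N-1} n_n\, e^{-2\pi i\, nk/N} \\
    &= A\, N^{-1}\, e^{i \phi_n}\, \frac{\sin \frac{N}{2} (\omega_0 \Delta t - 2\pi k/N)}{\sin \frac{1}{2} (\omega_0 \Delta t - 2\pi k/N)} + N_k
\end{aligned}
$$

where $\phi_n$ is an uninteresting phase factor and $N_k$ is the DFT of the white noise.

The amplitude of the spectral line term is $A$ (the limit where the arguments of the sin functions $\to 0$).

The noise term $N_k$ is a zero mean random process with second moment

$$
\begin{aligned}
\langle N_k N^*_{k'} \rangle &= N^{-2} \sum_{n} \sum_{n'} \langle n_n n^*_{n'} \rangle\, e^{-2\pi i (nk - n'k')/N} \\
    &= N^{-2} \sum_{n} \sum_{n'} \sigma_n^2\, \delta_{nn'}\, e^{-2\pi i (nk - n'k')/N} \\
    &= (\sigma_n^2 / N^2) \sum_{n} e^{-2\pi i\, n(k - k')/N} \\
    &= (\sigma_n^2 / N)\, \delta_{kk'}.
\end{aligned}
$$

The second moment of the noise has the same form in both the time and frequency domains.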
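Both results can be checked numerically. This is a sketch assuming NumPy, with arbitrary illustrative parameters and the same assumed noise model of independent real and imaginary parts. Summing the geometric series gives the phase explicitly as $\theta (N-1)/2$ with $\theta = \omega_0 \Delta t - 2\pi k/N$, which is what the code uses.

```python
import numpy as np

N, A = 64, 1.5
omega0_dt = 2 * np.pi * 0.2137   # omega_0 * dt, deliberately not on a bin center
n = np.arange(N)
k = np.arange(N)

# Signal term: FFT versus the closed-form ratio of sines
signal = A * np.exp(1j * omega0_dt * n)
X_signal = np.fft.fft(signal) / N
theta = omega0_dt - 2 * np.pi * k / N
closed = (A / N) * np.exp(1j * theta * (N - 1) / 2) \
         * np.sin(N * theta / 2) / np.sin(theta / 2)

# Noise term: ensemble estimate of <N_k N_k'^*>
rng = np.random.default_rng(4)
sigma_n, n_real = 1.0, 4000
noise = (rng.normal(size=(n_real, N))
         + 1j * rng.normal(size=(n_real, N))) * sigma_n / np.sqrt(2)
Nk = np.fft.fft(noise, axis=1) / N
var_k = np.mean(np.abs(Nk) ** 2, axis=0)          # expected sigma_n^2 / N in every bin
cross = np.mean(Nk[:, 3] * np.conj(Nk[:, 10]))    # expected ~0 for k != k'
```

Since the chosen frequency falls between bins, `np.abs(X_signal)` peaks slightly below $A$; the peak reaches $A$ only when the sine arguments go to zero.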


We can’t calculate the noise DFT. But we can calculate its second moment.

Often but not always, the Fourier transform of a stochastic process will be delta correlated.

What is the amplitude of the signal part as the argument of the sin()'s → 0?

What does the signal part look like?


Detection . . . or not?

Suppose you have a data set that you think may have the form of the model given above. To answer the question "is there a signal in the data" we have to assess what are the fluctuations in the DFT (or, more usefully, the squared magnitude of the DFT = an estimate for the power spectrum) due to the additive noise. We would like to have confidence that a feature in the DFT or the spectrum is "real" as opposed to being a noise fluctuation that is spurious. To quantify our confidence, we need to know the properties of our test statistic. The following develops an approach that is applicable to the particular problem and illustrates generally how we go about assessing test statistics.


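Signal to noise ratio:

The rms amplitude of the noise term (in the frequency domain) is therefore $\sigma_N = \sigma_n / \sqrt{N}$ and the signal-to-noise ratio is

$$(S/N)_{\rm DFT} = \frac{\text{line peak}}{\text{rms noise}} = \sqrt{N}\, \frac{A}{\sigma_n}.$$

Thus, the S/N of the line is $\sqrt{N}$ larger than the S/N of the time series

$$(S/N)_{\text{time series}} = \frac{\text{amplitude of exponential}}{\text{rms noise}} = \frac{A}{\sigma_n}.$$

In practice, we must investigate the S/N of the squared magnitude of the DFT. Let $\omega_0 \Delta t = 2\pi f_0 \Delta t = 2\pi k_0 / N$ so that the frequency is commensurate with the sampling in frequency space. Then $X_k = A\, \delta_{kk_0} + N_k$ and the spectral estimate becomes

$$
\begin{aligned}
S_k \equiv |X_k|^2 &= |A\, \delta_{kk_0} + N_k|^2 \qquad (1) \\
    &= A^2\, \delta_{kk_0} + A\, \delta_{kk_0} (N_k + N_k^*) + |N_k|^2.
\end{aligned}
$$

The ensemble average of the estimator is

$$
\begin{aligned}
\langle S_k \rangle = \langle |X_k|^2 \rangle &= A^2\, \delta_{kk_0} + \langle |N_k|^2 \rangle \qquad (2) \\
    &= A^2\, \delta_{kk_0} + \sigma_n^2 / N
\end{aligned}
$$

The ratio of the line term $A^2$ at the peak to the off-line mean $\sigma_n^2 / N$ is $N A^2 / \sigma_n^2$, consistent with $(S/N)_{\rm DFT}$ calculated before (it is its square).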
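The ensemble-average result (2) can be checked by simulation. This sketch assumes NumPy, a frequency placed exactly on bin $k_0$, the assumed noise model of independent real and imaginary parts, and arbitrary illustrative parameters. The off-line mean is subtracted from the peak to isolate the $A^2$ line term before forming the ratio.

```python
import numpy as np

rng = np.random.default_rng(5)
N, A, sigma_n, k0, n_real = 256, 0.5, 1.0, 40, 2000   # illustrative values

n = np.arange(N)
signal = A * np.exp(2j * np.pi * k0 * n / N)          # commensurate frequency
noise = (rng.normal(size=(n_real, N))
         + 1j * rng.normal(size=(n_real, N))) * sigma_n / np.sqrt(2)

# Spectral estimate S_k = |X_k|^2 for each realization, then the ensemble mean
S = np.abs(np.fft.fft(signal + noise, axis=1) / N) ** 2
S_mean = S.mean(axis=0)

off_line = np.delete(S_mean, k0).mean()     # estimates sigma_n^2 / N
line_term = S_mean[k0] - off_line           # estimates A^2
ratio = line_term / off_line
predicted = N * A**2 / sigma_n**2           # N A^2 / sigma_n^2
```

With a time-domain S/N of only 0.5, the line still stands far above the noise floor in the spectrum because of the factor $N$.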


The Probability of False Alarm:

Suppose we want to test whether a feature in a spectrum is signal or noise. Let's suppose that there is no signal (a 'null' hypothesis) in which case we can calculate the probability that a given amplitude is just a noise fluctuation.

If there is only noise, the probability density function of $S_k$ for any given $k$ is a one-sided exponential because $S_k$ is $\chi^2_2$:

$$f_{S_k}(S) = \frac{1}{\langle S_k \rangle}\, e^{-S / \langle S_k \rangle}\, U(S)$$


Why is the spectrum distributed as $\chi^2$ with two degrees of freedom?


Suppose there is a spike in the spectrum of amplitude $\eta \langle S_k \rangle$.

The noise-like aspect of $S_k$ implies that there can be spikes above a specified detection threshold that are spurious ("false alarms"). The probability that a spike has an amplitude $\ge \eta \langle S_k \rangle$ is

$$P(S \ge \eta \langle S_k \rangle) = \int_{\eta \langle S_k \rangle}^{\infty} ds\, f_{S_k}(s) \equiv e^{-\eta}$$

If the DFT length is $N_{\rm DFT}$, there are $N_{\rm DFT}$ unique values of the spectrum.

Note this is true for a complex process but not for a real one. Why?

The expected number of spurious (i.e. false-alarm) spikes that equal or exceed $\eta \langle S_k \rangle$ is

$$N_{\rm spurious} = N_{\rm DFT}\, e^{-\eta}$$

To have $N_{\rm spurious} \le 1$ we must have

$$N_{\rm DFT}\, e^{-\eta} \le 1$$

so we need

$$\eta \ge \ln N_{\rm DFT}$$
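The false-alarm count can be checked with a noise-only Monte Carlo run. This is a sketch assuming NumPy, complex Gaussian white noise with independent real and imaginary parts, and an arbitrary threshold $\eta = 5$; none of these specific values come from the lecture.

```python
import numpy as np

rng = np.random.default_rng(6)
N_dft, n_real, sigma_n, eta = 1024, 200, 1.0, 5.0   # illustrative values

# Noise-only data (the null hypothesis)
noise = (rng.normal(size=(n_real, N_dft))
         + 1j * rng.normal(size=(n_real, N_dft))) * sigma_n / np.sqrt(2)
S = np.abs(np.fft.fft(noise, axis=1) / N_dft) ** 2
S_mean = sigma_n**2 / N_dft                  # <S_k> for noise alone

# Number of bins at or above the threshold in each realization
counts = np.sum(S >= eta * S_mean, axis=1)
mean_count = counts.mean()
predicted = N_dft * np.exp(-eta)             # N_DFT * exp(-eta)

# Fraction of all bins above threshold, to compare with exp(-eta)
tail_fraction = np.mean(S >= eta * S_mean)
```

Raising `eta` toward $\ln N_{\rm DFT}$ drives `mean_count` toward one spurious spike per spectrum, but never to zero.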


N_DFT     η to have N_spurious ≤ 1

128       4.9
1k        6.9
16k       9.7
1M        13.9
1G        20.8
1T        27.7
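The tabulated thresholds follow from $\eta = \ln N_{\rm DFT}$. The short sketch below reproduces them, assuming that the size labels are powers of two (1k = $2^{10}$, 16k = $2^{14}$, 1M = $2^{20}$, 1G = $2^{30}$, 1T = $2^{40}$); that reading is an assumption, chosen because it matches the rounded values.

```python
import math

# Assumed interpretation of the size labels as powers of two
sizes = {
    "128": 2**7,
    "1k": 2**10,
    "16k": 2**14,
    "1M": 2**20,
    "1G": 2**30,
    "1T": 2**40,
}

# Threshold eta = ln(N_DFT), rounded to one decimal as in the table
thresholds = {label: round(math.log(n_dft), 1) for label, n_dft in sizes.items()}

for label, eta in thresholds.items():
    print(f"{label:>4}  {eta}")
```

The threshold grows only logarithmically, so going from $2^{10}$ to $2^{40}$ trials raises it by a factor of four.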


The larger the number of trials, the higher the threshold that is needed to have a specified number of false positives. There are never zero false positives!
