A LabKitty Primer is an introduction to some technical topic I am smitten with. They differ from most primers as to the degree of mathematical blasphemy on display. I gloss over details aplenty and mostly work only the simplest cases. I show mistakes and dead ends and things any normal person would try that don't work. Also, there's probably a swear word or two. Mostly I want to get you, too, excited about the topic, perhaps enough to get you motivated to learn it properly. And, just maybe, my little irreverent introduction might help you understand a more sober treatment when the time comes.
Previously on A LabKitty Primer on Fourier Analysis, we met the beast and poked it with a stick. We discovered -- glossing over the hard stuff that made Fourier famous -- a Fourier transform is just a fancy way of correlating sines and cosines with a given signal of interest.
That simple worldview works conceptually, but now we're going to look at how the pros do things. As I stated in Part I, my goal is to get you to understand the numbers Matlab's fft( ) function returns, and so we shall. We're still decomposing a signal into its frequency components. Nothing new there. However, Matlab is top-shelf engineering software and not just some lunatic ranting on the Internet. As such, there's a rash of new details we must encompass and eclipse. Some of those new details are unpleasant.
Yes, our tale today is long and fairly horrible. The good news is you have a sagacious and sympathetic guide, one who asks only for your patience and tolerance of occasional stilted prose (but would it kill you to purchase some swag from the LabKitty store in return for all of this free learning goodness? Answer: No, it would not). Still, if you wish to proceed, the gloves must come off. LabKitty degloved, we might say, which is an adjective I advise you do not Google.
It's like grandpap LabKitty used to say: Lace up your mukluks children, 'cause we're goin' to the slaughterhouse.
Let's press on.
I. Same as the Old Boss
Today we drive, relentlessly, toward an understanding of fft( ). So, right off, let's try it out. Let's take the 1 Hz + 8 Hz composite sinewave introduced in Part I and do an FFT on it.
The code is simple enough:
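tmax = 1;
t = [0:0.01:tmax-0.01];
x = sin(1*2*pi*t) + 0.25*sin(8*2*pi*t);
X = fft(x);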
It's standard practice to use lowercase letters for time series data and uppercase for the corresponding frequency domain representation (yep, Matlab* is case sensitive). So I have used x for the composite sinewave and X for its transform. (Some authors use a hat instead of uppercase. Hats are beyond my HTML capabilities.)
* actually, I used Octave for all of the examples, just to show I am a cat of the people (if you're following along in Matlab, there may be slight differences in your example numbers). Octave is also case sensitive.
Let's plot these. Here's the result of plot(x):
Looks correct enough. Here's the result of plot(X):
Well. That sucks. Perhaps you were expecting a nice correlation coefficient plot like in Part I. No such luck.
Apparently we have work to do.
II. The Complex Frequency Complex
A gander at the contents of X reveals the problem:
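> X'
ans =
   -0.0000
   -0.0000 +50.0000i
   -0.0000 + 0.0000i
    0.0000 - 0.0000i
    0.0000 + 0.0000i
   -0.0000 - 0.0000i
   -0.0000 - 0.0000i
    0.0000 - 0.0000i
   11.888 + 3.8627i . . .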
I only showed the first few entries so as to not printbomb your screen. But that's enough to see what's up: X is a vector of complex numbers which plot( ) can't make sense of. We're going to have to make sense of it ourself.
Let's switch to a smaller example. That'll make it easier to show the entire contents of X and, as we shall see, doing so will reveal useful information.
Here's eight samples of a 1 Hz sinewave, which is about the simplest example I can think of:
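> t = [0:1/8:1-1/8]';
> x = sin(2*pi*t)
x =
    0.00000
    0.70711
    1.00000
    0.70711
    0.00000
   -0.70711
   -1.00000
   -0.70711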
I made t a column vector so x and X come out as column vectors, which print nicer. Note we want t to start at zero but we also want length(t) to be 8, so the last entry in t is 1-1/8, not 1.
It's rather pointless to plot x because there's only eight points so it really wouldn't look like anything. What's important for us are numbers. FFT works as before, but now I can easily show all of X:
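> X = fft(x)
X =
    0.00000 + 0.00000i
   -0.00000 - 4.00000i
    0.00000 - 0.00000i
    0.00000 - 0.00000i
    0.00000 + 0.00000i
    0.00000 + 0.00000i
    0.00000 + 0.00000i
   -0.00000 + 4.00000i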
Interesting. There's a hint that the entries in X occur as complex conjugate pairs.
What happens if we double the frequency of the sinewave?
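> x = sin(4*pi*t);
> X = fft(x)
X =
    0.00000 + 0.00000i
    0.00000 + 0.00000i
   -0.00000 - 4.00000i
    0.00000 - 0.00000i
    0.00000 + 0.00000i
    0.00000 + 0.00000i
   -0.00000 + 4.00000i
    0.00000 - 0.00000i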
Whoa. The entries shifted by one slot. That suggests the elements of X are ordered by frequency. Perhaps the first entry is the lowest frequency? The lowest frequency I can think of is zero -- what EE nerds call a DC offset. What happens when we transform a pure DC signal:
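> X = fft(ones(8,1))
X =
   8
   0
   0
   0
   0
   0
   0
   0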
Sure enough, X(1) seems to be related to the DC offset. Why is it 8 and not 1? I don't know, but there's eight samples in the signal and maybe that's a clue. We'll return to this later.
Finally, let's try something that fills in all of X. How about a random vector:
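> X = fft(rand(8,1))
X =
    5.17434
    0.51624 + 0.11615i
    0.32745 - 0.08214i
    0.34990 + 0.42696i
   -1.22792 + 0.00000i
    0.34990 - 0.42696i
    0.32745 + 0.08214i
    0.51624 - 0.11615i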
Weird. The second half of X is the complex conjugate of the first half in reverse order (ignoring X(1)). What is going on here?
I think we've maieuticed this about as far as we can. It's time to break out the equations.
Looking Under the Hood
We first met the Fast Fourier Transform (FFT) back in Part I, wherein I described it as a fast version of the Discrete Fourier Transform (DFT). It stands to reason, then, fft( ) is implementing the DFT, just fastly. So, let's have a look at the DFT. Here it is from Part I:
x[n] = 1/N Σk=0,N−1 X[k] ⋅ exp(ikΩ0n) [ Eq. 1a ]
where
X[k] = Σn=0,N-1 x[n] ⋅ exp(−ikΩ0n) [ Eq. 1b ]
Equation 1b is what fft( ) implements. It returns X -- a vector of N frequency coefficients.
Footnote: The usual textbook convention is vector indexing goes from 0 to N-1. Of course, the Matlab convention goes from 1 to N. The difference will be a constant pain in the fur today. As such, I'll use square brackets (e.g., x[n]) in expressions that use 0 to N-1 indexing, and parens (e.g., x(n)) in expressions that use 1 to N. Example: X[0] and X(1) refer to the same thing. I'll also use parens for functions, but the usage should be clear from context.
We can immediately solve one mystery: The DC component X[0] = Σn=0,N-1 x[n] ⋅ exp(0) = Σn=0,N-1 x[n] ⋅ 1 = Σn=0,N-1 x[n], which for ones(8,1) sums to 8. That explains why fft(ones(8,1)) gave X(1) = 8 and not 1. To put it another way: X[0]/N -- or, equivalently, X(1)/N -- is the average value of x.
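If you want to see this with your own eyes, here's a quick check (any signal will do):

x = rand(8,1);             % any old signal will do
X = fft(x);
X(1)/8                     % the first coefficient divided by N ...
mean(x)                    % ... matches the average value of x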
So, one mystery solved. Yet, I can immediately think of two more:
- Why does fft( ) return N frequency coefficients? Sure, n in Eq. 1b must go 0 to N-1 because length(x) = N. But there's no reason k in Eq. 1a must be similarly restricted. Why is length(X) = N? Why not 2N or N/2 or some other number? Why not infinity?
- The X(k) returned by fft( ) are complex numbers. We use X to reconstruct x (cf. Eq. 1a). But x was real in all our examples. Where did the imaginary stuff go?
I'm afraid the Cenobites have arrived. Meet: Nyquist frequency.
III. Nyquist Frequency
In Part I, I wrote of adding two waveforms together on a point-by-point basis. This presents a problem in Part II because a true waveform contains an infinite number of points and we can't add together infinity points on a computer. On a computer we traffic in sampled waveforms. Long story short, I shouldn't be plotting waveforms as a continuous curve; I should be plotting waveforms as a collection of discrete samples:
The interval between samples constrains what fft( ) can say about the frequency content of a sampled signal. We don't usually speak in terms of the sampling interval but rather its inverse -- the sampling frequency, denoted Fs (example: Fs = 2 kHz corresponds to a sampling interval of 0.5 ms). The Nyquist Sampling Theorem states that a signal sampled at a rate Fs can only represent frequencies less than Fs/2. Why? A simple demonstration should drive home the point: Sample a 1 Hz sinewave twice per second (so Fs = 2 Hz). Our first sample is zero. Our second sample is zero. Our third sample is zero. In fact, all of our samples are zero. Our digitized waveform is not a 1 Hz sinewave. It isn't a sinewave at all -- it's a flat line!
What went wrong? We violated the Nyquist Theorem. We sampled at 2 Hz, so Nyquist says we can only represent signals less than 1 Hz. Our signal wasn't less than 1 Hz, it was (exactly) 1 Hz. Close, yes, but it's still a violation. Ergo, we were punished. In order to correctly represent a continuous waveform, Fs must be greater than twice the highest frequency present. (If you use other values of Fs < 2, you'll get other wrong sinewaves. You should try a few on your own.)
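If you want to watch the flat line appear for yourself, here's the two-second version (on a real machine the zeros come back as round-off dust, but the point stands):

Fs = 2;                    % sample a 1 Hz sinewave at exactly 2 Hz
t = 0:1/Fs:3;
x = sin(2*pi*1*t)          % every sample lands on a zero crossing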
The Fs/2 limit is called the Nyquist Frequency (Ny).
Here you may be thinking: Okie dokie. Nyquist frequency. I promise I won't make any claims about frequencies greater than Ny. Unfortunately, the problem is more insidious. Frequency content beyond the Nyquist limit doesn't simply vanish when we sample the data, it sneaks down into [0 Ny]. In the signal processing biz, this phenomenon is known as aliasing.
Here's a demonstration I showed in Part I:
The traces depict three continuous signals and the dots are the sampled versions. In the middle and bottom panel, the signal exceeds the Nyquist frequency. Aliased content appears in the sampled data (dots) that doesn't actually exist. Alas, fft( ) will process any aliased content along with the rest of the legitimate signal. Aliasing is like a virus infecting a cell with its DNA, and the cell machinery makes more copies of the virus because it doesn't know better.
This tragic affair is a fundamental unavoidable property of representing a continuous signal as a collection of discrete samples. We can't get around it using a cleverer algorithm or a better processor or a higher-precision word length. In the real world, you deal with aliasing by running your signal through an old-timey analog low-pass filter first thing (Nyquist is irrelevant for analog filters). Now the maximum frequency present is the filter cutoff. Sample accordingly.
Footnote: There's a tedious explanation of where the Nyquist factor of two comes from, which you can look up elsewhere. In the real world, nobody dances the edge by sampling just above the Nyquist rate. Fast A-to-D boards and disk space are cheap, so you crank up Fs as high as you can. Just make sure you satisfy the Nyquist argle-bargle.
Mr. Nyquist Explains why length(X) = N
Your intuition is probably telling you Nyquist frequency has something to do with the number of coefficients fft( ) returns -- that is, it explains why length(X) = N and not 2N or N/2 or some other number (cf. Mystery Question #1). Your intuition is correct.
Our conceptual view of Fourier series in Part I was that we correlate our input signal with a collection of sinusoids. That is also true of the DFT. The sinusoids are integer multiples of the signal's fundamental frequency. That is also still true. However, I gave the impression in Part I that we can keep extracting higher and higher frequency sinusoids forever. That's not true when working with sampled data. The Nyquist theorem says you can stop when you reach Fs/2. In fact, the theorem says you must stop when you reach Fs/2. Further correlation will happily keep returning non-zero numbers because it doesn't know better. If you chase the math around you'll discover it's repeating the coefficients from DC to Ny over and over. Were you ignorant of the Nyquist cutoff, you might assume these nonzero coefficients were genuine higher frequency content when in fact they're just ghosts.
It's easy to prove this implies length(X) = N.
Write the sinusoids as exp(-iω0kn), where ω0 is the fundamental frequency of the signal. As k increases, the frequency of the sinusoid does too. We cannot allow this to exceed the Nyquist limit, Fs/2. What is the corresponding maximum value of k? That will give us length(X).
The fundamental frequency is 2π over the fundamental period: ω0 = 2π/T0. There are N samples, and the time between them is Ts, so T0 = N⋅Ts. Substitute ω0 = 2π/(N⋅Ts) into exp(−iω0kn) to get exp(−2πi kn/(N⋅Ts)) = exp(−2πi (k/N) (1/Ts) n) = exp(−2πi (k/N) Fs n). The variable n plays the role of time here (it's the index into x), leaving (k/N) Fs to control the frequency. The frequency needs to stop at Fs/2. Ergo (k/N) Fs < Fs/2, or k < N/2.
From this we conclude length(X) is, um, N/2.
Sigh.
For the love of cats. Does anything in math ever work out the way it should? Ever? Can I derive one simple something one time without having to go look up a critical unexplained whatever everybody left out of the textbooks? Grr. No wonder mathematicians are such an unloved lot.
All right. Let's figure out what went sideways here.
If you check my derivation, you'll find it is correct. Multiples of the fundamental frequency indeed bust through the Nyquist ceiling at k=N/2. That's not the problem. The problem is it takes two quantities to define a sinusoid. One gets you amplitude, but two are necessary to represent both amplitude and phase.
I remind you of the pairings that appeared in alternate forms of the Fourier series I presented in Part I: i) sines and cosines, ii) frequency and phase, or iii) positive and negative frequencies. The DFT uses positive and negative frequencies. This reduces the number of independent entries in X by a factor of two. X[N/2] corresponds to the Nyquist frequency, but each frequency component requires two entries. Therefore, length(X) = N.
QED, I guess. But now we have a new problem: There aren't any negative frequencies in Eq. 1.
Hold my beer.
Rearranging X
We have been writing:
x[n] = 1/N Σk=0,N−1 X[k] ⋅ exp(ikΩ0n) [ Eq. 1a ]
to compute x. Break this up into two parts:
x[n] = 1/N [ Σk=0,N/2−1 X[k] ⋅ exp(ikΩ0n)
+ Σk=N/2,N−1 X[k] ⋅ exp(ikΩ0n) ]
Reverse the order of the summations:
x[n] = 1/N [ Σk=N/2,N−1 X[k] ⋅ exp(ikΩ0n)
+ Σk=0,N/2−1 X[k] ⋅ exp(ikΩ0n) ]
Substitute m = k-N in the first summation and m = k in the second:
x[n] = 1/N [ Σm=-N/2,-1 X[m+N] ⋅ exp(i(m+N)Ω0n)
+ Σm=0,N/2-1 X[m] ⋅ exp(imΩ0n) ]
Because of the periodicity of the complex exponential, exp(i(m+N)Ω0n) = exp(imΩ0n) ⋅ exp(iNΩ0n) = exp(imΩ0n) ⋅ exp(iN(2π/N)n) = exp(imΩ0n) ⋅ exp(i2πn) = exp(imΩ0n) ⋅ 1 = exp(imΩ0n). Apply this to simplify the first summation:
x[n] = 1/N [ Σm=-N/2,-1 X[m+N] ⋅ exp(imΩ0n)
+ Σm=0,N/2-1 X[m] ⋅ exp(imΩ0n) ]
Now, make a copy of X -- call it "X2" -- and create a new vector Y by pasting X2 onto the front of X (i.e., X2 = X; Y = [ X2; X ] in the Matlab parlance, for a column vector). Think of Y as X "shifted" by N/2 to move DC to the middle. (Yes, Y contains twice as many numbers as X, but we're only going to use half of them.)
In Matlab, the index on Y goes from 1 to 2N. However, in mathy square-bracket indexing, the index on Y goes from -N to N-1. Note Y[-N/2:-1] = X[N/2:N-1] and Y[0:N/2-1] = X[0:N/2-1]. So, in the first summation of the previous expression X[m+N] = Y[m]. In the second summation, X[m] = Y[m]. Substitute:
x[n] = 1/N [ Σm=-N/2,-1 Y[m] ⋅ exp(imΩ0n)
+ Σm=0,N/2-1 Y[m] ⋅ exp(imΩ0n) ]
Because the expressions in each summation are now identical, we can combine them:
x[n] = 1/N Σm=-N/2,N/2-1 Y[m] ⋅ exp(imΩ0n) [ Eq. 1a' ]
Equation 1a' gives the same result as Eq. 1a. The vector Y contains the same numbers as X; we're just indexing them differently. The salient feature presently is this arrangement of the DFT has obvious negative frequencies (i.e., m < 0), as claimed above. QED, Mystery #1.
But wait. There's more.
Each negative frequency entry in Eq. 1a' (i.e., m=-blerg) has a corresponding positive entry (m=+blerg). Using Euler's identity, we see each term of the form cos(-mΩ0n) + i sin(-mΩ0n) in the summation can be partnered with a term cos(+mΩ0n) + i sin(+mΩ0n). Each will be multiplied by its corresponding entry in X. Recall that the entries in X appeared as complex conjugate pairs in all our fft( ) examples. These occur at -m and +m using the indexing scheme of Eq. 1a'. If you also recall some trigonometry -- specifically that sin(-blerg) = -sin(blerg) and cos(-blerg) = cos(blerg) -- it might occur to you this arrangement exactly cancels the imaginary stuff in the summation. The x that comes out is a real-valued signal, which is a Good Thing because we put in a real-valued signal.
That's why there's pairs of complex conjugates in X; that's what happens to the imaginary stuff (cf. Mystery Question #2). It's how we reconstruct a real-valued x from a complex-valued X.
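You can watch the cancellation happen numerically if you like -- round-trip any real signal through fft( ) and ifft( ) and look at what's left in the imaginary part:

x = rand(16,1);
max(abs(imag(ifft(fft(x)))))   % something like 1e-17: the imaginary stuff cancels to round-off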
Footnote: The conjugate magic trick only happens if the input signal is real. If you feed a complex-valued signal to fft( ) -- which is something people do I guess -- the coefficients in X don't pair up as complex conjugates. Dogs and cats living together, etc.
Footnote: Rearranging X to move DC to the middle is evidently such a common operation that Matlab/Octave provides a function called fftshift( ) to do so. Here it is in action:
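> X = fft(rand(9,1))
X =
    4.45248 + 0.00000i
    1.10908 - 0.58652i
   -0.60386 + 0.89979i
    0.52309 + 0.24930i
   -0.77660 - 0.35250i
   -0.77660 + 0.35250i
    0.52309 - 0.24930i
   -0.60386 - 0.89979i
    1.10908 + 0.58652i
> fftshift(X)
ans =
   -0.77660 + 0.35250i
    0.52309 - 0.24930i
   -0.60386 - 0.89979i
    1.10908 + 0.58652i
    4.45248 + 0.00000i
    1.10908 - 0.58652i
   -0.60386 + 0.89979i
    0.52309 + 0.24930i
   -0.77660 - 0.35250i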
The DC component has been moved to the middle. Huzzah.
Footnote: "fftshift" is a terrible name -- there's really nothing FFT-specific about it. The function takes a vector and rotates the first entry to the middle. For example: fftshift([1 2 3 4 5]) = [4 5 1 2 3]. That's called vector folding. Furthermore, there exists something called the shifting theorem which explains how the Fourier transform changes when the input signal is shifted in time, which has nothing to do with fftshift( ). Get it together, MathWorks.
And so we have, after many (many) words, solved our two mysteries of the DFT. Nyquist frequency limits the number of components in X to N/2, but it takes two components to describe both amplitude and phase. Ergo, length(X) = N. Furthermore, by craftily pairing the entries in X as complex conjugates, we get a real signal out when we put a real signal in.
It would seem we are done. Are we? No, we are not. For there is one last mystery lurking here. One last stab at thee from Hell's heart. One last breath spit at thee for hate's sake. It's time for us to wade into the angry sea, stare down the great fish, and pull a Titleist from its blowhole.
It's time to expose the dirty little secret of the DFT.
The Dirty Little Secret of the DFT
When discussing Nyquist frequency, I wrote exp(-iω0kn), where ω0 = 2π/T0. These are the sinusoids used in Fourier series. These are not the sinusoids used in the DFT. The DFT uses exp(-iΩ0kn). The difference between ω0 and Ω0 is important.
Look at Ω0 in Eq. 1b. It plays the role of fundamental frequency in the DFT like ω0 does in Fourier series. We're still correlating against multiples of fundamental frequency in the DFT -- that's the "kΩ0" in exp(ikΩ0n). But Ω0 = 2π/N. That isn't a true frequency. The quantity 1/N isn't the inverse of "time" -- it's just a number. To obtain a true frequency, we would define Ω0 = 2π/(N⋅Ts). We don't do that. The DFT exists in some bizarro integer-time Tron world. Real-world time appears nowhere in the DFT.
Here, let's rewrite Eq. 1b expanding Ω0:
X[k] = Σn=0,N-1 x[n] ⋅ exp(-2πikn/N)
Using Euler's identity, you should convince yourself the values exp(-2πikn/N) are simply N evenly-spaced points around the unit circle in the complex plane. You may have come across these as the "Nth roots of unity" (I chose N=8 in the example to make the figure manageable). They are all possible values of exp(-2πink/N) that appear in Eq. 1b and all possible values of exp(2πink/N) that appear in Eq. 1a. Both n and k in the summations go 0 to N-1, so their product nk goes from 0 to (N-1)^2. However, because of the periodicity of the complex exponential, the values that go into the summations repeat with period N.
We can exploit this quirk to construct a matrix representation of the DFT. For a given value of n and k, exp(-2πikn/N) picks an Nth root of unity. The product x[n] ⋅ exp(-2πikn/N) multiplies this root by the corresponding entry of x. A sum over all n is the same as a dot product, ergo we can generate all the X[k] simultaneously as a matrix-vector product X = A x, where A(k,n) = exp(-2πikn/N) and bold indicates vector. Display like so:
The indices on X and x pick the row and column, respectively, of an entry in A (indicated in red). The first row computes the DC component (each glyph in the first row represents the number "1" on the unit circle -- that is, exp(-2πikn/N) for k=0). Frequency then increases with increasing row. The end result is a matrix representation of correlating the signal x with a finite collection of complex exponentials, aka the DFT.
The picture above is every (8-point) DFT that ever existed or ever will. Note carefully: Sampling frequency is nowhere to be found. There is no Fs in the DFT; there is only normalized angular frequency. The same X might be describing the leisurely rhythmic lovemaking of a sloth or the gigahertz oscillations of a superconductor. If you want to extract real-world frequencies from the DFT, you need to make a note of Fs somewhere and scale the output yourself. I'll show this later in some example code.
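Before we get to that, if you want to convince yourself the matrix picture really is what fft( ) computes, here's a quick sanity check you can type in. This is just Eq. 1b written as a matrix-vector product; nothing here is official Matlab machinery:

N = 8;
n = 0:N-1;
A = exp(-2i*pi*(n')*n/N);    % A(k+1,n+1) = exp(-2*pi*i*k*n/N), the Nth roots of unity
x = rand(N,1);
max(abs(A*x - fft(x)))       % difference is round-off, something like 1e-15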
Footnote: The length of x determines how many frequency coefficients we get (i.e., length(X) = length(x) = N). If we feed a longer signal to fft( ), we get back more coefficients. However, the coefficients always cover the same frequency range (DC to Ny). Simple arithmetic, then, tells us the fft( ) of a longer signal must give us a higher frequency resolution. This has practical ramifications for signal processing. It also has deeper ramifications: a signal that is narrow in time is necessarily broad in frequency (and vice-versa). This is Heisenberg's uncertainty principle, albeit expressed in time and frequency rather than position and momentum.
Taking Stock
Let's take stock. We now know: a) The coefficients returned by fft( ) cover DC to Ny, b) The total number of coefficients fft( ) returns is equal to the number of samples in the signal, and c) It takes two coefficients to define each frequency component (a complex coefficient and its conjugate). Hence, each X(k) gives the frequency content of the signal in a bandwidth of Ny/(N/2) = (Fs/2)/(N/2) = Fs/N. Think of the FFT as a virtual bank of tiny bandpass filters and each entry in X is the output of one of them. However, to convert the numbers returned by fft( ) to real-world frequencies, we need to scale the output using the sampling frequency.
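Here's that bin-to-Hertz bookkeeping in one line, with a made-up Fs and N just for illustration (we'll use the same idea for real in the PSD code below):

Fs = 1000; N = 250;        % made-up numbers, just for illustration
k = 1:N/2;                 % Matlab indices of the first half of X
f = (k-1)*Fs/N;            % entry X(k) sits at (k-1)*Fs/N Hz; X(1) is DC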
What remains is to turn this mess into numbers that are meaningful for humans.
IV. Turning this Mess into Numbers that are Meaningful for Humans
We now know what X is describing. Nonetheless, X remains a vector of complex frequency coefficients. Mathozoidic life forms might be able to read a vector of complex frequency coefficients like a Danielle Steel paperback, but we mortals require a more comprehensible frequency description -- something, say, that begins with a number and ends in the word "Hertz." Something we can plot, like the correlation coefficients from Part I. Additionally, we would like to take an X and extract the collection of sinusoids that make up the signal (see, again, Part I). Only then will we be able to say we "understand the vector of numbers fft( ) returns." Merit badge complete.
First up is a more comprehensible description of the signal frequency content. Meet: power spectral density.
I. Power Spectral Density
Each term in X describes the signal content in some finite frequency band, and what we want is its magnitude. Each term in X is a complex number, and you get the magnitude of a complex number by multiplying it by its complex conjugate and taking the square root: |z| = √(z ⋅ z*). This is the idea behind the power spectral density (PSD).
Here's how you roll your own PSD in Matlab (notes and comments to follow):
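Fs = 500; % Hz
t = [0:1/Fs:1-1/Fs];
% x_string = 'sin(10*2*pi*t) + sin(50*2*pi*t)';
% x_string = '2*sin(20*2*pi*t)';
% x_string = 'randn(size(t))';
x_string = 'cos(50*2*pi*t) + 2*randn(size(t))';
x = eval(x_string);
N = length(x);
X = fft(x);
subplot(2,1,1);
plot(x);
title(x_string);
grid on;
subplot(2,1,2);
pX = X .* conj(X) / N;
f_axis = Fs/N * [0:floor(N/2)-1];   % bin k+1 sits at k*Fs/N Hz (the first bin is DC)
plot(f_axis, pX(1:floor(N/2)));
grid on;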
Notes: 1) We construct an x-axis vector to make the abscissa meaningful. Behold scaling of fft( ) output using Fs! Stare at the definition of f_axis until it makes sense to you (there are N/2 independent coefficients, and they cover DC to Ny). 2) We only plot the first half of pX because the second half repeats the same information (the magnitude of a complex number is the same as the magnitude of its conjugate). 3) The floor(N/2) business avoids generating a non-integer vector index if N is odd. 4) The eval( ) business is just a little Matlab trick that allows us to verify the signal in the plot title by defining it as a string. 5) The ".*" operator does term-wise vector multiplication.
Comments: 1) We omit the square root when computing a PSD. Not because it saves a few cpu cycles; rather, because the square of the amplitude is a more meaningful quantity in many applications. For example, the heat generated in a resistor is proportional to the square of the voltage across it. Ergo: square. 2) We divide pX by N. Why? Well, if I make a PSD using 1000 samples of a signal and you make one using 100, my FFT coefficients are going to be bigger than yours even though our signals are the same -- my sums simply have more terms in them. A comparison of our results is only meaningful if we normalize for the record length. Ergo: divide.
That's really all there is to it. Physics tells us the square of a waveform's amplitude is proportional to its power, the coefficients in X describe the spectral content of the signal, and normalizing by the record length turns the result into a density. Ergo: power spectral density.
I uncommented one of the sample inputs defined in the code and plotted the results below (signal on top, PSD on bottom):
This example nicely illustrates a classic PSD application: pulling useful information out of noisy data. We know a signal is present here (see plot title) but it's nigh impossible to see it in the time series. Therefore we switch to the frequency domain. The PSD exhibits a conspicuous spectral peak, alerting us to the signal's presence.
Bonus! Four PSD Factoids Your Textbook Didn't Explain
II. Sinusoid Collection
Finally, we would like to extract the collection of sinusoids that make up the signal, an omega to the alpha of where our story began way back in Part I. To accomplish this, we need the frequency, amplitude, and phase of all the entries in X.
Consider the simplest case. Suppose we feed a pure sinewave to fft( ). We should get back an X containing only two non-zero entries -- one at whatever k corresponds to the frequency of our sinewave and also its conjugate doppelgänger. (In practice, there can be more than one non-zero pair in X even if the input is a pure sinewave due to frequency leakage. A little bit of signal energy spills over into neighboring coefficients because reasons. I will assume zero frequency leakage to simplify my derivation.)
The rest is just an exercise in converting from complex exponential form (what fft( ) returns) to explicit phase form (what we can plot).
Begin with our rotated form of the DFT:
x(n) = 1/N Σk=-N/2,+N/2-1 X[k] ⋅ exp(ikΩ0t)
(I'm working with the -N/2 to N/2 form rather than the 0 to N-1 form because it makes the algebra tidier.) Multiply both sides by N to keep from dragging a 1/N through everything:
N ⋅ x(n) = Σk=-N/2,+N/2-1 X[k] ⋅ exp(ikΩ0t)
Let the one non-zero conjugate pair in X be at k = ±m, so the frequency is mΩ0, which I will simplify to Ωm. All terms in the summation for k not equal to +m or -m are zero, and we're left with:
N ⋅ x(n) = X[-m] ⋅ exp(-iΩmt) + X[m] ⋅ exp(iΩmt)
Let X[m] = a+ib. Then X[-m] = a-ib (they are a conjugate pair). Substitute:
N ⋅ x(n) = (a+ib) exp(iΩmt) + (a-ib) exp(-iΩmt)
Rewrite the complex exponentials using Euler's identity:
N ⋅ x(n) = (a+ib) [ cos(Ωmt) + i ⋅ sin(Ωmt) ]
+ (a-ib) [ cos(-Ωmt) + i ⋅ sin(-Ωmt) ]
But cos(-blerg) = cos(blerg) and sin(-blerg) = -sin(blerg):
N ⋅ x(n) = (a+ib) ⋅ [ cos(Ωmt) + i ⋅ sin(Ωmt) ]
+ (a-ib) [ cos(Ωmt) − i ⋅ sin(Ωmt) ]
After some algebra mashing, this simplifies to:
N ⋅ x(n) = 2a ⋅ cos(Ωmt) − 2b ⋅ sin(Ωmt)
or, tweaking for style points:
N ⋅ x(n) = (2a) cos(Ωmt) + (-2b) sin(Ωmt)
The RHS is a one-term Fourier series in trigonometric form. We would like to express this in the I-can-plot-that form: x(n) = A cos(Ωmt - φ). This is a one-term Fourier series in explicit phase form. In Part I, I described how the constants in the latter (here, A and φ) are related to the constants in the former (here, 2a and -2b). We find:
A = √(4a² + 4b²)
φ = arctan(-b/a)
We use these to plot the component. Just remember to divide the amplitude by N.
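Here's a quick spot check of the recipe on a single sinewave with a made-up amplitude and phase (I use atan2 here, which dodges the quadrant headache we'll hit in a moment):

Fs = 8; t = (0:7)'/Fs;
x = 3*cos(2*pi*1*t - pi/4);      % amplitude 3, phase 45 degrees, frequency 1 Hz
X = fft(x);
a = real(X(2)); b = imag(X(2));  % the 1 Hz coefficient (Matlab index 2)
A = 2*sqrt(a^2 + b^2)/length(x)  % recovers 3
phi = atan2(-b, a)               % recovers pi/4 (0.7854 radians)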
Apply this formula to each entry in X to plot all the sinusoids in the input. You only have to crunch the first half of X because the coefficients come in pairs. Also, you have to skip the DC coefficient -- it has no phase for atan( ) to chew on. Instead, you simply extract the DC offset as X(1)/N.
Everything else is just looping and plotting:
Fs = 100;                          % sampling frequency (Hz)
t = [0:1/Fs:1-1/Fs];
x_string = '1+sin(1*2*pi*t)+0.25*sin(8*2*pi*t)';
x = eval(x_string);
X = fft(x);
subplot(2,1,1);                    % top panel: the signal
plot(x);
title(x_string);
grid on;
subplot(2,1,2);                    % bottom panel: the sinusoid collection
grid on; hold on;
N = length(x);
DC = X(1)/N;                       % the DC offset is just the average value
plot([1 N],[DC DC]);
for index = 2:N/2                  % first half of X only; skip the DC bin
    F = Fs * ((index-1)/N);        % convert bin number to Hz
    a = real(X(index));
    b = imag(X(index));
    d = -b;
    c = sqrt(4*a*a + 4*b*b) / N;   % amplitude: 2*sqrt(a^2+b^2)/N
    phi = atan(d/a) * 180 / pi;    % phase (degrees)
    if (phi < 0); phi = 180 + phi; end
    rc = c * cos(F*2*pi*(t-(phi/(F*360))));
    plot(rc);
end
The atan( ) function occasionally stabs us in the face because of quadrant weirdness -- it only returns values between -90° and +90°, and we want phases anywhere from 0° to 360°. Hence the phi < 0 check.
The example included in the code uses our favorite waveform (plus a DC offset). Results are plotted below (signal at top, sinusoid collection at bottom):
(Yes, I should have plotted these as discrete samples and not continuous curves. Think of it as a final exam. You passed.)
Footnote: There's an easier way to do sinusoid extraction: You repeatedly feed X to the inverse FFT function ifft( ) after zeroing out all entries except one conjugate pair (or, zeroing all entries except X(1) to extract the DC component). However, my way shows you the calculation details, and calculation details are why we have gathered here today.
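For the curious, here's roughly what that looks like using the X from the code above (a sketch; the [2 end] indexing assumes the 1 Hz pair sits in the second and last bins, as it does in this example):

Xk = zeros(size(X));
Xk([2 end]) = X([2 end]);        % keep only the 1 Hz conjugate pair
component = real(ifft(Xk));      % real( ) mops up round-off in the imaginary part
plot(component);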
Epilogue
If I have done my job, you now have an understanding of what comes out of fft( ). Funny, it somehow seems less mysterious to me than it did going in. Like driving after passing your driver's test, or buying beer with a real ID. Omne ignotum pro magnifico est, Tacitus tells us. Loosely translated: It wouldn't be a LabKitty joint unless I broke out some pretentious Latin at some point, and would it kill you to get a little culture outside of WoW and Rick and Morty? (Answer: No, it would not.)
However, if you would like a new challenge, a new challenge is not hard to find. Once nerds figured out the FFT, they did what nerds always do -- they flogged the idea to get tenure. This gives us fresh horrors like the Fourier Sine Transform and the Fourier Cosine Transform and the Fractional Fourier Transform and the Short-time Fourier Transform and Fourier transforms in space and time and spacetime. Also, by swapping out basis functions we get other transforms. If instead of sinusoids we use decaying sinusoids we get the Laplace transform. If we use basis functions chosen only to reproduce the data with the fewest number of them we get Principal Component Analysis. If we use polynomials we get something (splines maybe?). And if we use little weirdy blippo basis functions we get a Wavelet transform. Google will happily lead you down the rabbit hole.
Footnote: Wavelets are all the rage these days. Like rap and herpes, they just refuse to go away. I've never found them to do anything for me the FFT didn't. Still, if you don't use wavelets on your data certain types will snicker at your results. Consider yourself warned.
But for now, at least, we have, at last, a cease to all our exploring. Arriving where we started and knowing the place for the first time, even if the return voyage did take eleventy f'ing months. No matter. Let's focus less on who took how long to do what, and focus more on arriving at all. Break out the hats and hooters. Rev up the motor scooters. Jose has come home.
Recommended Reading
To wrap things up, let me point you to a few references I have found helpful and you might too.
Understanding Digital Signal Processing -- Richard Lyons
Lyons is the book that really explained the FFT to me, after struggling with countless hateful textbooks (not to mention the feckless thugs employed by my undergrad math department). I spent a summer working through it cover to cover. You should too. There's a great deal more in Lyons than just Fourier -- there's many pages on digital signals and filters and digital filters (as well as a description of the famous Radix-2 algorithm) -- but at the heart of it all is the FFT. Go buy it. Now. Do it do it do it.
My recommendations for Part II really begin and end with Lyons, but there's a few honorable mentions. Oppenheim and Schafer's Digital Signal Processing is a classic that is also quite accessible as an introductory treatment. Lyons is rather nuts-and-bolts, so the theoretical bent of O&S makes for a nice complement. Anything by Bendat and Piersol is the standard reference for the intersection of random processes and Fourier analysis, which explains things like error bars on a PSD. See also Wirshing's text on random vibrations, which is like Bendat and Piersol with a focus on stuff that breaks, augers, and explodes. Your ride in coach will never be the same.
Schaum offers at least three entrées that cover Fourier stuff. There's Signals and Systems by Hwei Hsu, which begins at signals and gets to the FFT around page 300, stopping along the way at the Fourier transform, the Laplace transform, the z transform, and state space analysis. There's Probability, Random Variables, & Random Processes by Hwei Hsu, which gets into Fourier analysis of random signals, as well as other cool stuff like queuing and decision theory. And there's Analog and Digital Communications by Hwei Hsu, which gets deeper into Fourier analysis of random signals, as well as other cool stuff like Shannon information theory.
Speaking of examples, here's a well-crafted pdf of DFT problems for you to ponder and solve. Very cool.
Finally, there's A Digital Signal Processing Primer by Ken Steiglitz, which has a nice explanation of phasors before getting into all things DSP in the context of digital music. What are our ears if not fft( ) implemented using squishy hardware?
As always, I have no affiliation with the authors or institutions listed in these sources, all of whom I suspect would be shocked and appalled by my approach to explaining mathematics.
Previously on A LabKitty Primer on Fourier Analysis, we met the beast and poked it with a stick. We discovered -- glossing over the hard stuff that made Fourier famous -- a Fourier transform is just a fancy way of correlating sines and cosines with a given signal of interest.
That simple worldview works conceptually, but now we're going to look at how the pros do things. As I stated in Part I, my goal is to get you to understand the numbers Matlab's fft( ) function returns, and so we shall. We're still decomposing a signal into its frequency components. Nothing new there. However, Matlab is top-shelf engineering software and not just some lunatic ranting on the Internet. As such, there's a rash of new details we must encompass and eclipse. Some of those new details are unpleasant.
Yes, our tale today is long and fairly horrible. The good news is you have a sagacious and sympathetic guide, one who asks only for your patience and tolerance of occasional stilted prose (but would it kill you to purchase some swag from the LabKitty store in return for all of this free learning goodness? Answer: No, it would not). Still, if you wish to proceed, the gloves must come off. LabKitty degloved, we might say, which is an adjective I advise you do not Google.
It's like grandpap LabKitty used to say: Lace up your mukluks children, 'cause we're goin' to the slaughterhouse.
Let's press on.
I. Same as the Old Boss
Today we drive, relentlessly, toward an understanding of fft( ). So, right off, let's try it out. Let's take the 1 Hz + 8 Hz composite sinewave introduced in Part I and do an FFT on it.
The code is simple enough:
tmax = 1;
t = [0:0.01:tmax-0.01];
x = sin(1*2*pi*t) + 0.25*sin(8*2*pi*t);
X = fft(x);
t = [0:0.01:tmax-0.01];
x = sin(1*2*pi*t) + 0.25*sin(8*2*pi*t);
X = fft(x);
It's standard practice to use lowercase letters for time series data and uppercase for the corresponding frequency domain representation (yep, Matlab* is case sensitive). So I have used x for the composite sinewave and X for its transform. (Some authors use a hat instead of uppercase. Hats are beyond my HTML capabilities.)
* actually, I used Octave for all of the examples, just to show I am a cat of the people (if you're following along in Matlab, there may be slight differences in your example numbers). Octave is also case sensitive.
Let's plot these. Here's the result of plot(x):
Looks correct enough. Here's the result of plot(X):
Well. That sucks. Perhaps you were expecting a nice correlation coefficient plot like in Part I. No such luck.
Apparently we have work to do.
II. The Complex Frequency Complex
A gander at the contents of X reveals the problem:
> X'
ans =
-0.0000
-0.0000 +50.0000i
-0.0000 + 0.0000i
0.0000 - 0.0000i
0.0000 + 0.0000i
-0.0000 - 0.0000i
-0.0000 - 0.0000i
0.0000 - 0.0000i
11.888 + 3.8627i . . .
ans =
-0.0000
-0.0000 +50.0000i
-0.0000 + 0.0000i
0.0000 - 0.0000i
0.0000 + 0.0000i
-0.0000 - 0.0000i
-0.0000 - 0.0000i
0.0000 - 0.0000i
11.888 + 3.8627i . . .
I only showed the first few entries so as to not printbomb your screen. But that's enough to see what's up: X is a vector of complex numbers which plot( ) can't make sense of. We're going to have to make sense of it ourself.
Let's switch to a smaller example. That'll make it easier to show the entire contents of X and, as we shall see, doing so will reveal useful information.
Here's eight samples of a 1 Hz sinewave, which is about the simplest example I can think of:
> t = [0:1/8:1-1/8]';
> x = sin(2*pi*t)
x =
0.00000
0.70711
1.00000
0.70711
0.00000
-0.70711
-1.00000
-0.70711
> x = sin(2*pi*t)
x =
0.00000
0.70711
1.00000
0.70711
0.00000
-0.70711
-1.00000
-0.70711
I made t a column vector so x and X come out as a column vectors which prints nicer. Note we want t to start at zero but also want length(t) to be 8 so the last entry in t is 1-1/8 not 1.
It's rather pointless to plot x because there's only eight points so it really wouldn't look like anything. What's important for us are numbers. FFT works as before, but now I can easily show all of X:
> X = fft(x)
X =
0.00000 + 0.00000i
-0.00000 - 4.00000i
0.00000 - 0.00000i
0.00000 - 0.00000i
0.00000 + 0.00000i
0.00000 + 0.00000i
0.00000 + 0.00000i
-0.00000 + 4.00000i
X =
0.00000 + 0.00000i
-0.00000 - 4.00000i
0.00000 - 0.00000i
0.00000 - 0.00000i
0.00000 + 0.00000i
0.00000 + 0.00000i
0.00000 + 0.00000i
-0.00000 + 4.00000i
Interesting. There's a hint entries in X occur as complex conjugate pairs.
What happens if we double the frequency of the sinewave?
> x = sin(4*pi*t);
> X = fft(x)
X =
0.00000 + 0.00000i
0.00000 + 0.00000i
-0.00000 - 4.00000i
0.00000 - 0.00000i
0.00000 + 0.00000i
0.00000 + 0.00000i
-0.00000 + 4.00000i
0.00000 - 0.00000i
> X = fft(x)
X =
0.00000 + 0.00000i
0.00000 + 0.00000i
-0.00000 - 4.00000i
0.00000 - 0.00000i
0.00000 + 0.00000i
0.00000 + 0.00000i
-0.00000 + 4.00000i
0.00000 - 0.00000i
Whoa. The entries shifted by one slot. That suggests the elements of X are ordered by frequency. Perhaps the first entry is the lowest frequency? The lowest frequency I can think of is zero -- what EE nerds call a DC offset. What happens when we transform a pure DC signal:
> X = fft(ones(8,1))
X =
8
0
0
0
0
0
0
0
X =
8
0
0
0
0
0
0
0
Sure enough, X(1) seems to be related to the DC offset. Why is it 8 and not 1? I don't know, but there's eight samples in the signal and maybe that's a clue. We'll return to this later.
Finally, let's try something that fills in all of X. How about a random vector:
> X = fft(rand(8,1))
X =
5.17434
0.51624 + 0.11615i
0.32745 - 0.08214i
0.34990 + 0.42696i
-1.22792 + 0.00000i
0.34990 - 0.42696i
0.32745 + 0.08214i
0.51624 - 0.11615i
X =
5.17434
0.51624 + 0.11615i
0.32745 - 0.08214i
0.34990 + 0.42696i
-1.22792 + 0.00000i
0.34990 - 0.42696i
0.32745 + 0.08214i
0.51624 - 0.11615i
Weird. The second half of X is the complex conjugate of the first half in reverse order (ignoring X(1)). What is going on here?
I think we've maieuticed this about as far as we can. It's time to break out the equations.
Looking Under the Hood
We first met the Fast Fourier Transform (FFT) back in Part I, wherein I described it as a fast version of the Discrete Fourier Transform (DFT). It stands to reason, then, fft( ) is implementing the DFT, just fastly. So, let's have a look at the DFT. Here it is from Part I:
x[n] = 1/N Σk=0,N−1 X[k] ⋅ exp(ikΩ0n) [ Eq. 1a ]
where
X[k] = Σn=0,N-1 x[n] ⋅ exp(−ikΩ0n) [ Eq. 1b ]
Equation 1b is what fft( ) implements. It returns X -- a vector of N frequency coefficients.
Footnote: The usual textbook convention is vector indexing goes from 0 to N-1. Of course, the Matlab convention goes from 1 to N. The difference will be a constant pain in the fur today. As such, I'll use square brackets (e.g., x[n]) in expressions that use 0 to N-1 indexing, and parens (e.g., x(n)) in expressions that use 1 to N. Example: X[0] and X(1) refer to the same thing. I'll also use parens for functions, but the exception should be clear from context.
We can immediately solve one mystery: The DC component X[0] = Σn=0,N-1 x[n] ⋅ exp(0) = Σn=0,N-1 x[n] ⋅ 1 = Σn=0,N-1 x[n] = N, which explains why fft(ones(8,1)) gave X(1) = 8 and not 1. To put it another way: X[0]/N -- or, equivalently, X(1)/N -- is the average value of x.
So, one mystery solved. Yet, I can immediately think of two more:
- Why does fft( ) return N frequency coefficients? Sure, n in Eq. 1b must go 0 to N-1 because length(x) = N. But there's no reason k in Eq. 1a must be similarly restricted. Why is length(X) = N? Why not 2N or N/2 or some other number? Why not infinity?
- The X(k) returned by fft( ) are complex numbers. We use X to reconstruct x (cf. Eq. 1a). But x was real in all our examples. Where did the imaginary stuff go?
I'm afraid the Cenobites have arrived. Meet: Nyquist frequency.
III. Nyquist Frequency
In Part I, I wrote of adding two waveforms together on a point-by-point basis. This presents a problem in Part II because a true waveform contains an infinite number of points and we can't add together infinity points on a computer. On a computer we traffic in sampled waveforms. Long story short, I shouldn't be plotting waveforms as a continuous curve; I should be plotting waveforms as a collection of discrete samples:
The interval between samples constrains what fft( ) can say about the frequency content of a sampled signal. We don't usually speak in terms of the sampling interval but rather its inverse -- the sampling frequency, denoted Fs (example: Fs = 2 KHz corresponds to a sampling interval of 0.5 ms). The Nyquist Sampling Theorem states that a signal sampled at a rate Fs can only represent frequencies less than Fs/2. Why? A simple demonstration should drive home the point: Sample a 1 Hz sinewave twice per second (so Fs = 2 Hz). Our first sample is zero. Our second sample is zero. Our third sample is zero. In fact, all of our samples are zero. Our digitized waveform is not a 1 Hz sinewave. It isn't a sinewave at all -- it's a flat line!
What went wrong? We violated the Nyquist Theorem. We sampled at 2 Hz, so Nyquist says we can only represent signals less than 1 Hz. Our signal wasn't less than 1 Hz, it was (exactly) 1 Hz. Close, yes, but it's still a violation. Ergo, we were punished. In order to correctly represent a continuous waveform, Fs must be greater than twice the highest frequency present. (If you use other values of Fs < 2, you'll get other wrong sinewaves. You should try a few on your own.)
The Fs/2 limit is called the Nyquist Frequency (Ny).
Here you may be thinking: Okie dokie. Nyquist frequency. I promise I won't make any claims about frequencies greater than Ny. Unfortunately, the problem is more insidious. Frequency content beyond the Nyquist limit doesn't simply vanish when we sample the data, it sneaks down into [0 Ny]. In the signal processing biz, this phenomenon is known as aliasing.
Here's a demonstration I showed in Part I:
The traces depict three continuous signals and the dots are the sampled versions. In the middle and bottom panel, the signal exceeds the Nyquist frequency. Aliased content appears in the sampled data (dots) that doesn't actually exist. Alas, fft( ) will process any aliased content along with the rest of the legitimate signal. Aliasing is like a virus infecting a cell with its DNA, and the cell machinery makes more copies of the virus because it doesn't know better.
This tragic affair is a fundamental unavoidable property of representing a continuous signal as a collection of discrete samples. We can't get around it using a cleverer algorithm or a better processor or a higher-precision word length. In the real world, you deal with aliasing by running your signal though an old-timey analog low-pass filter first thing (Nyquist is irrelevant for analog filters). Now the maximum frequency present is the filter cutoff. Sample accordingly.
Footnote: There's a tedious explanation where the Nyquist factor of two comes from which you can look up elsewhere. In the real world, nobody dances the edge by sampling just above the Nyquist frequency. Fast A-to-D boards and disk space are cheap, so you crank up Fs as high as you can. Just make sure you satisfy the Nyquist argle-bargle.
Mr. Nyquist Explains why length(X) = N
Your intuition is probably telling you Nyquist frequency has something to do with the number of coefficients fft( ) returns -- that is, it explains why length(X) = N and not 2N or N/2 or some other number (cf. Mystery Question #1). Your intuition is correct.
Our conceptual view of Fourier series in Part I was that we correlate our input signal with a collection of sinusoids. That is also true of the DFT. The sinusoids are integer multiples of the signal's fundamental frequency. That is also still true. However, I gave the impression in Part I that we can keep extracting higher and higher frequency sinusoids forever. That's not true when working with sampled data. The Nyquist theorem says you can stop when you reach Fs/2. In fact, the theorem says you must stop when you reach Fs/2. Further correlation will happily keep returning non-zero numbers because it doesn't know better. If you chase the math around you'll discover it's repeating the coefficients from DC to Ny over and over. Were you ignorant of the Nyquist cutoff, you might assume these nonzero coefficients were genuine higher frequency content when in fact they're just ghosts.
It's easy to prove this implies length(X) = N.
Write the sinusoids as exp(-iω0kn), where ω0 is the fundamental frequency of the signal. As k increases, the frequency of the sinusoid does too. We cannot allow this to exceed the Nyquist limit, Fs/2. What is the corresponding maximum value of k? That will give us length(X).
The fundamental frequency ω0 is the reciprocal of the fundamental period: ω0 = 2π/T0. There are N samples, and the time between them is Ts, so T0 = N⋅Ts. Substitute ω0 = 2π/(N⋅Ts) into exp(−iω0kn) to get exp(−2πi kn/(N⋅Ts)) = exp(−2πi (k/N) (1/Ts) n) = exp(−2πi (k/N) Fs n). The variable n plays the role of time here (it's the index into x), leaving (k/N) Fs to control the frequency. The frequency needs to stop at Fs/2. Ergo (k/N) Fs < Fs/2, or k < N/2.
From this we conclude length(X) is, um, N/2.
Sigh.
For the love of cats. Does anything in math ever work out the way it should? Ever? Can I derive one simple something one time without having to go look up a critical unexplained whatever everybody left out of the textbooks? Grr. No wonder mathematicians are such an unloved lot.
All right. Let's figure out what went sideways here.
If you check my derivation, you'll find it is correct. Multiples of the fundamental frequency indeed bust through the Nyquist ceiling at k=N/2. That's not the problem. The problem is it takes two quantities to define a sinusoid. One gets you amplitude, but two are necessary to represent both amplitude and phase.
I remind you of the pairings that appeared in alternate forms of the Fourier series I presented in Part I: i) sines and cosines, ii) frequency and phase, or iii) positive and negative frequencies. The DFT uses positive and negative frequencies. This reduces the number of independent entries in X by a factor of two. X(N/2) corresponds to the Nyquist frequency, but each frequency component requires two entries. Therefore, length(X) = N.
QED, I guess. But now we have a new problem: There aren't any negative frequencies in Eq. 1.
Hold my beer.
Rearranging X
We have been writing:
x[n] = 1/N Σk=0,N−1 X[k] ⋅ exp(ikΩ0n) [ Eq. 1a ]
to compute x. Break this up into two parts:
x[n] = 1/N [ Σk=0,N/2−1 X[k] ⋅ exp(ikΩ0n)
+ Σk=N/2,N−1 X[k] ⋅ exp(ikΩ0n) ]
Reverse the order of the summations:
x[n] = 1/N [ Σk=N/2,N−1 X[k] ⋅ exp(ikΩ0n)
+ Σk=0,N/2−1 X[k] ⋅ exp(ikΩ0n) ]
Substitute m = k-N in the first summation and m = k in the second:
x[n] = 1/N [ Σm=-N/2,-1 X[m+N] ⋅ exp(i(m+N)Ω0n)
+ Σm=0,N/2-1 X[m] ⋅ exp(imΩ0n) ]
Because of the periodicity of the complex exponential, exp(i(m+N)Ω0n) = exp(imΩ0n) ⋅ exp(iNΩ0n) = exp(imΩ0n) ⋅ exp(iN(2π/N)n) = exp(imΩ0n) ⋅ exp(i2πn) = exp(imΩ0n) ⋅ 1 = exp(imΩ0n). Apply this to simplify the first summation:
x[n] = 1/N [ Σm=-N/2,-1 X[m+N] ⋅ exp(imΩ0n)
+ Σm=0,N/2-1 X[m] ⋅ exp(imΩ0n) ]
Now, make a copy of X -- call it "X2" -- and create a new vector Y by pasting X2 onto the front of X (i.e., X2 = X; Y = [ X2 X ] in the Matlab parlance). Think of Y as X "shifted" by N/2 to move DC to the middle. (Yes, Y contains twice as many numbers as X, but we're only going to use half of them.)
In Matlab, the index on Y goes from 1 to 2N. However, in mathy square-bracket indexing, the index on Y goes from -N to N. Note Y[-N/2:0] = X[N/2:N-1] and Y[0:N/2-1] = X[0:N/2-1]. So, in the first summation of the previous expression X[m+N] = Y[m]. In the second summation, X[m] = Y[m]. Substitute:
x[n] = 1/N [ Σm=-N/2,-1 Y[m] ⋅ exp(imΩ0n)
+ Σm=0,N/2-1 Y[m] ⋅ exp(imΩ0n) ]
Because the expressions in each summation are now identical, we can combine them:
x[n] = 1/N Σm=-N/2,N/2-1 Y[m] ⋅ exp(imΩ0n) [ Eq. 1a' ]
Equation 1a' gives the same result as Eq. 1a. The vector Y contains the same numbers as X; we're just indexing them differently. The salient feature presently is this arrangement of the DFT has obvious negative frequencies (i.e., m < 0), as claimed above. QED, Mystery #1.
But wait. There's more.
Each negative frequency entry in Eq. 1a' (i.e., m=-blerg) has a corresponding positive entry (m=+blerg). Using Euler's identity, we see each term of the form cos(-mΩ0t) + i sin(-mΩ0t) in the summation can be partnered with a term cos(+mΩ0t) + i sin(+mΩ0t). Each will by multiplied by their corresponding entry in X. Recall that the entries in X appeared as complex conjugate pairs in all our ftt( ) examples. These occur at -m and +m using the indexing scheme of Eq. 1a'. If you also recall some trigonometry -- specifically that sin(-blerg) = -sin(blerg) and cos(-blerg) = cos(blerg) -- it might occur to you this arrangement exactly cancels the imaginary stuff in the summation. The x that comes out is a real-valued signal, which is a Good Thing because we put in a real-value signal.
That's why there's pairs of complex conjugates in X; that's what happens to the imaginary stuff (cf. Mystery Question #2). It's how we reconstruct a real-valued x from a complex-valued X.
Footnote: The conjugate magic trick only happens if the input signal is real. If you feed a complex-valued signal to fft( ) -- which is something people do I guess -- the coefficients in X don't pair up as complex conjugates. Dogs and cats living together, etc.
Footnote: Rearranging X to move DC to the middle is evidently such a common operation that Matlab/Octave provides a function called fftshift( ) to do so. Here it is in action:
> X = fft(rand(9,1))
X =
4.45248 + 0.00000i
1.10908 - 0.58652i
-0.60386 + 0.89979i
0.52309 + 0.24930i
-0.77660 - 0.35250i
-0.77660 + 0.35250i
0.52309 - 0.24930i
-0.60386 - 0.89979i
1.10908 + 0.58652i
> fftshift(X)
ans =
-0.77660 + 0.35250i
0.52309 - 0.24930i
-0.60386 - 0.89979i
1.10908 + 0.58652i
4.45248 + 0.00000i
1.10908 - 0.58652i
-0.60386 + 0.89979i
0.52309 + 0.24930i
-0.77660 - 0.35250i
X =
4.45248 + 0.00000i
1.10908 - 0.58652i
-0.60386 + 0.89979i
0.52309 + 0.24930i
-0.77660 - 0.35250i
-0.77660 + 0.35250i
0.52309 - 0.24930i
-0.60386 - 0.89979i
1.10908 + 0.58652i
> fftshift(X)
ans =
-0.77660 + 0.35250i
0.52309 - 0.24930i
-0.60386 - 0.89979i
1.10908 + 0.58652i
4.45248 + 0.00000i
1.10908 - 0.58652i
-0.60386 + 0.89979i
0.52309 + 0.24930i
-0.77660 - 0.35250i
The DC component has been moved to the middle. Huzzah.
Footnote: "fftshift" is a terrible name -- there's really nothing FFT-specific about it. The function takes a vector and rotates the first entry to the middle. For example: fftshift([1 2 3 4 5]) = [4 5 1 2 3]. That's called vector folding. Furthermore, there exists something called the shifting theorem which explains how the Fourier transform changes when the input signal is shifted in time, which has nothing to do with fftshift( ). Get it together, MathWorks.
And so we have, after many (many) words, solved our two mysteries of the DFT. Nyquist frequency limits the number of components in X to N/2, but it takes two components to describe both amplitude and phase. Ergo, length(X) = N. Furthermore, by craftily pairing the entries in X as complex conjugates, we get a real signal out when we put a real signal in.
It would seem we are done. Are we? No, we are not. For there is one last mystery lurking here. One last stab at thee from Hell's heart. One last breath spit at thee for hate's sake. It's time for us to wade into the angry sea, stare down the great fish, and pull a Titleist from its blowhole.
It's time to expose the dirty little secret of the DFT.
The Dirty Little Secret of the DFT
When discussing Nyquist frequency, I wrote exp(-iω0kn), where ω0 = 2π/T0. These are the sinusoids used in Fourier series. These are not the sinusoids used in the DFT. The DFT uses exp(-iΩ0kn). The difference between ω0 and Ω0 is important.
Look at Ω0 in Eq. 1b. It plays the role of fundamental frequency in the DFT like ω0 does in Fourier series. We're still correlating against multiples of fundamental frequency in the DFT -- that's the "kΩ0" in exp(ikΩ0n). But Ω0 = 2π/N. That isn't a true frequency. The quantity 1/N isn't the inverse of "time" -- it's just a number. To obtain a true frequency, we would define Ω0 = 2π/(N⋅Ts). We don't do that. The DFT exists in some bizarro integer-time Tron world. Real-world time appears nowhere in the DFT.
Here, let's rewrite Eq. 1b expanding Ω0:
X[k] = Σn=0,N-1 x[n] ⋅ exp(-2πikn/N)
Using Euler's identity, you should convince yourself exp(-2πikn/N) are simply N evenly-spaced points around the unit circle in the complex plane: You may have come across this as the "Nth roots of unity" (I chose N=8 in the example to make the figure manageable). They are all possible values of exp(-2πink/N) that appear in Eq. 1b and all possible values of exp(2πink/N) that appear in Eq. 1a. Both n and k in the summations go 0 to N-1, so their product nk goes from 0 to (N-1)^2. However, because of the periodicity of the complex exponential, the values that go into the summations repeat with period N.
We can exploit this quirk to construct a matrix representation of the DFT. For a given value of n and k, exp(-2πikn/N) picks an Nth root of unity. The product x[n] ⋅ exp(-2πikn/N) is the multiplication this term with a corresponding entry in x[n]. A sum over all n is the same as a dot product, ergo we can generate all the X[k] simultaneously as a matrix-vector product X = A x, where A(k,n) = exp(-2πikn/N) and bold indicates vector. Display like so:
The index on X and x pick the row and column, respectively, of an entry in A (indicated in red). The first row computes the DC component (each glyph in the first row represents the number "1" on the unit circle -- that is: exp(-2πikn/N) for k=0). Frequency then increases with increasing row. The end result is a matrix representation of correlating the signal x with a finite collection of complex exponentials, aka the DFT.
The picture above is every (8-point) DFT that ever existed or ever will. Note carefully: Sampling frequency is nowhere to be found. There is no Fs in the DFT; there is only normalized angular frequency. The same X might be describing the leisurely rhythmic lovemaking of a sloth or the gigahertz oscillations of a superconductor. If you want to extract real-world frequencies from the DFT, you need to make a note of Fs somewhere and scale the output yourself. I'll show this later in some example code.
Footnote: The length of x determines how many frequency coefficients we get (i.e., length(X) = length(x) = N). If we feed a longer signal to fft( ), we get back more coefficients. However, the coefficients always cover the same frequency range (DC to Ny). Simple arithmetic, then, tells us the fft( ) of a longer signal must give us a higher frequency resolution. This has practical ramifications for signal processing. It also has deeper ramifications, in that signals broad in time are narrow in frequency (and vice-versa). This is Heisenberg's uncertainty principle, albeit expressed in time and frequency rather than position and momentum.
Taking Stock
Let's take stock. We now know: a) The coefficients returned by fft( ) cover DC to Ny, b) The total number of coefficients fft( ) returns is equal to the number of samples in the signal, and c) It takes two coefficients to define each frequency component (a complex frequency and its conjugate). Hence, each X(k) gives the frequency content of the signal in a bandwidth of Ny/(N/2) = (Fs/2)/(N/2) = Fs/N. Think of the FFT as a virtual bank of tiny bandpass filters and each entry in X is the output of one of them. However, to convert the numbers returned by fft( ) to real-world frequencies, we need to scale the output using the sampling frequency.
What remains is to turn this mess into numbers that are meaningful for humans.
IV. Turning this Mess into Numbers that are Meaningful for Humans
We now know what X is describing. Nonetheless, X remains a vector of complex frequency coefficients. Mathozoidic life forms might be able to read a vector of complex frequency coefficients like a Danielle Steel paperback, but we mortals require a more comprehensible frequency description -- something, say, that begins with a number and ends in the word "Hertz." Something we can plot, like the correlation coefficients from Part I. Additionally, we would like to take an X and extract the collection of sinusoids that make up the signal (see, again, Part I). Only then will we be able to say we "understand the vector of numbers fft( ) returns." Merit badge complete.
First up is a more comprehensible description of the signal frequency content. Meet: power spectral density.
I. Power Spectral Density
Each term in X describes the signal contents in some finite frequency band. What we want is the magnitude of each term. Each term in X is a complex number, and you get the magnitude of a complex number by multiplying it by its complex conjugate and taking the square root: |z| = √(z ⋅ z*). Do that for every term (minus the square root, for reasons explained in the comments below) and you have the concept of power spectral density (PSD).
Here's how you roll your own PSD in Matlab (notes and comments to follow):
Fs = 500; % Hz (sampling frequency)
t = [0:1/Fs:1-1/Fs];              % one second of sample times
% x_string = 'sin(10*2*pi*t) + sin(50*2*pi*t)';
% x_string = '2*sin(20*2*pi*t)';
% x_string = 'randn(size(t))';
x_string = 'cos(50*2*pi*t) + 2*randn(size(t))';
x = eval(x_string);
N = length(x);
X = fft(x);
subplot(2,1,1);
plot(x);
title(x_string);
grid on;
subplot(2,1,2);
pX = X .* conj(X) / N;                % coefficient times conjugate, normalized by N
f_axis = Fs/N * [0:floor(N/2)-1];     % bin frequencies in Hz (pX(1) is DC, so the axis starts at 0)
plot(f_axis, pX(1:floor(N/2)));
grid on;
Notes: 1) We construct an x-axis vector to make the abscissa meaningful. Behold scaling of fft( ) output using Fs! Stare at the definition of f_axis until it makes sense to you (there are N/2 independent coefficients, and they cover DC to Ny). 2) We only plot the first half of pX because the second half repeats the same information (the magnitude of a complex number is the same as the magnitude of its conjugate). 3) The floor(N/2) business avoids generating a non-integer vector index if N is odd. 4) The eval( ) business is just a little Matlab trick that allows us to verify the signal in the plot title by defining it as a string. 5) The ".*" operator does term-wise vector multiplication.
Comments: 1) We omit the square root when computing a PSD. Not because it saves a few cpu cycles; rather, because the square of the amplitude is a more meaningful quantity in many applications. For example, the heat generated in a resistor is proportional to the square of the voltage across it. Ergo: square. 2) We divide pX by N. Why? Well, if I make a PSD using 1000 samples of a signal and you make one using 100, my FFT coefficients are going to be bigger than yours even though our signals are the same -- each of my coefficients sums over ten times as many samples, so more signal winds up "in" each one. A comparison of our results is only meaningful if we normalize by the record length. Ergo: divide.
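If the divide-by-N business feels hand-wavy, here's a sketch you can run (mine, not gospel): the same white-noise process analyzed with two different record lengths. The raw squared coefficients grow with N; the normalized ones stay put:
Fs = 100;
for T = [1 10]                        % one second versus ten seconds of the same noise process
  t = 0:1/Fs:T-1/Fs;
  x = randn(size(t));
  N = length(x);
  X = fft(x);
  raw = mean(abs(X).^2);              % average squared coefficient, no normalization (grows with N)
  nrm = raw / N;                      % the same thing divided by N (doesn't)
  fprintf('N = %4d   raw = %8.1f   normalized = %.2f\n', N, raw, nrm);
end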
That's really all there is to it. Physics tells us the square of a waveform is proportional to its power, the coefficients in X describe the spectral content of the signal, and the divide-by-N spreads that content out per frequency bin, which makes it a density. Ergo: power spectral density.
I uncommented one of the sample inputs defined in the PSD code above and plotted the results below (signal on top, PSD on bottom):
This example nicely illustrates a classic PSD application: pulling useful information out of noisy data. We know a signal is present here (see plot title) but it's nigh impossible to see it in the time series. Therefore we switch to the frequency domain. The PSD exhibits a conspicuous spectral peak, alerting us to the signal's presence.
Bonus! Four PSD Factoids Your Textbook Didn't Explain
- Units
The PSD y-axis has units of blerg-squared-per-Hertz. For example, if the signal is measured in volts, the PSD would be in V^2/Hz. Sometimes the squared part is expressed in literal power (e.g., "watts/Hz"). The square makes sense -- we're doing coefficient times conjugate-coefficient and omitting the square root. The per-Hz comes from dividing by the length of X to normalize for frequency resolution. Simple enough written in words, but it's confusing in a figure to see the per-Hz on the y-axis when the x-axis is also in Hz.
- Logarithmic scaling
Because the power in the various frequency bands of a signal may differ by many orders of magnitude, a PSD is often plotted with a logarithmic y-axis. Additionally, sometimes the y scale is in dB, and there are different dB definitions in different fields, and an author will invariably fail to explain which one s/he used.
Footnote: There's a function in Matlab called psd( ) that will make a PSD for you which does semilog scaling and other stuff too. I don't recommend it. The results look weird. Also, the calculations are hidden from you. Also, you have to buy the signal processing toolbox to get psd( ).
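Rolling your own log scaling is a one-liner anyway. Here's a sketch, assuming the pX, f_axis, and N variables from the PSD code above are still kicking around (and using the 10*log10 "power" flavor of dB, because that's the one I happen to like):
semilogy(f_axis, pX(1:floor(N/2)));              % logarithmic y-axis, linear units
% or, in decibels:
plot(f_axis, 10*log10(pX(1:floor(N/2)) + eps));  % the eps dodges log(0) in empty bins
grid on;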
- Windowing
Unless you are writing a blog where your FFT examples use contrived collections of well-behaved sinewaves, it's likely the signal you feed to fft( ) will not begin and end at zero. This makes fft( ) angry. The sudden appearance or disappearance of signal is the computational equivalent of whacking the algorithm with a hammer. Whacking generates high-frequency content. As you now know, any frequency content in your signal above the Nyquist limit is asking for Trouble. What to do?
The standard solution is to squash your data to zero at both ends, smoothly, before sending it to fft( ). This is called windowing.
The simplest window is linear -- a triangle that peaks at the center of your signal and falls to zero at each end. You multiply it and your signal together, point by point. Voila! Squashed signal. Other popular windows include the Hanning, which is something like a half cycle of a sinewave, and the Hamming, which is something like a half cycle of a sinewave except completely different. Every window has advocates and detractors. Google will lead you to both. (There's a code sketch after the footnote below.)
Footnote: Yes, "Hamming" and "Hanning." One might assume that's a typo. But they really are two different guys. Just to make everything as confusing as possible.
- The PSD is a statistic
Something genuinely hard to get your head around is that the PSD is a statistic. Every PSD comes with uncertainty attached, just like a t-test or ANOVA (picture little error bars drawn on each peak). That seems redonkulus. Where is "statistics" in anything we did?
Think of a power spectrum as a desperate grasp for some way to characterize a signal. If the signal was something simple -- a sinewave, for example -- we wouldn't need fancy FFT mathematics to describe it. We would simply say: it's a sinewave. If your signal was generated by a deterministic system (e.g., a bobbing spring-mass), we would simply write down the governing equation. In the real world, breaking out a PSD is like retaining a lawyer: You wouldn't unless you knew you did something wrong.
Using probability lingo, a PSD assumes the input is a random process. We analyze one realization of the process. (Quintessential example: accelerometer during an earthquake.) If we had analyzed a different realization, we would have obtained a different PSD. We expect our PSD is similar to the "population" PSD. But that expectation carries a degree of uncertainty. Ergo: statistic.
Footnote: If I just confused the bejeezus out of you with talk of statistical PSD, fear not. It's possible to go your entire career without bumping into it. However, it is indeed a thing, and someday somebody might spring it on you in a why didn't the eagles just fly the ring to Mordor sort of way.
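If you want to see the statistic-ness with your own eyes, here's a sketch (mine): twenty realizations of the same noisy-cosine process from the PSD example. Any single PSD bounces around; the average of all twenty is much better behaved, which is the idea behind Welch-style PSD averaging:
Fs = 100;
t = 0:1/Fs:1-1/Fs;
N = length(t);
n_trials = 20;                              % number of independent realizations
pX_all = zeros(n_trials, N);
for trial = 1:n_trials
  x = cos(25*2*pi*t) + 2*randn(size(t));    % same process, fresh noise every time
  pX_all(trial,:) = abs(fft(x)).^2 / N;
end
f_axis = Fs/N * (0:floor(N/2)-1);
keep = 1:floor(N/2);
subplot(2,1,1); plot(f_axis, pX_all(1,keep));       title('one realization');   grid on;
subplot(2,1,2); plot(f_axis, mean(pX_all(:,keep))); title('average of twenty'); grid on;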
II. Sinusoid Collection
Finally, we would like to extract the collection of sinusoids that make up the signal -- the omega to the alpha of where our story began way back in Part I. To accomplish this, we need the frequency, amplitude, and phase of all the entries in X.
Consider the simplest case. Suppose we feed a pure sinewave to fft( ). We should get back an X containing only two non-zero entries -- one at whatever k corresponds to the frequency of our sinewave, and also its conjugate doppelgänger. (In practice, there can be more than one non-zero pair in X even if the input is a pure sinewave due to frequency leakage. A little bit of signal energy spills over into neighboring coefficients because reasons. I will assume zero frequency leakage to simplify my derivation.)
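You can verify the two-entry claim in about four lines (a sketch; the 1 Hz sinewave lands exactly on a bin, so there's no leakage to muddy the water):
Fs = 8;
t = 0:1/Fs:1-1/Fs;                 % eight samples, one second
X = fft(sin(1*2*pi*t));            % 1 Hz sinewave: exactly one cycle in the record
find(abs(X) > 1e-10)               % prints 2 and 8: the k = 1 coefficient and its k = -1 conjugate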
The rest is just an exercise in converting from complex exponential form (what fft( ) returns) to explicit phase form (what we can plot).
Begin with our rotated form of the inverse DFT (the synthesis equation):
x(n) = 1/N Σk=-N/2,+N/2-1 X[k] ⋅ exp(ikΩ0t)
(I'm working with the -N/2 to N/2 form rather than the 0 to N-1 form because it makes the algebra tidier.) Multiply both sides by N to keep from dragging a 1/N through everything:
N ⋅ x(n) = Σk=-N/2,+N/2-1 X[k] ⋅ exp(ikΩ0t)
Let the non-zero pair of entries in X be at k = ±m, so the frequency is mΩ0, which I will simplify to Ωm. All terms in the summation for k not equal to +m or -m are zero, and we're left with:
N ⋅ x(n) = X[-m] ⋅ exp(-iΩmt) + X[m] ⋅ exp(iΩmt)
Let X[m] = a+ib. Then X[-m] = a-ib (the two are conjugates because x is real). Substitute:
N ⋅ x(n) = (a+ib) exp(iΩmt) + (a-ib) exp(-iΩmt)
Rewrite the complex exponentials using Euler's identity:
N ⋅ x(n) = (a+ib) [ cos(Ωmt) + i ⋅ sin(Ωmt) ]
+ (a-ib) [ cos(-Ωmt) + i ⋅ sin(-Ωmt) ]
But cos(-blerg) = cos(blerg) and sin(-blerg) = -sin(blerg):
N ⋅ x(n) = (a+ib) ⋅ [ cos(Ωmt) + i ⋅ sin(Ωmt) ]
+ (a-ib) ⋅ [ cos(Ωmt) − i ⋅ sin(Ωmt) ]
After some algebra mashing, this simplifies to:
N ⋅ x(n) = 2a ⋅ cos(Ωmt) − 2b ⋅ sin(Ωmt)
or, tweaking for style points:
N ⋅ x(n) = (2a) cos(Ωmt) + (-2b) sin(Ωmt)
The RHS is a one-term Fourier series in trigonometric form. We would like to express this in the I-can-plot-that form: x(n) = A cos(Ωmt - φ). This is a one-term Fourier series in explicit phase form. In Part I, I described how the constants in the latter (here, A and φ) are related to the constants in the former (here, 2a and -2b). We find:
A = √(4a^2 + 4b^2)
φ = arctan(-b/a)
We use these to plot the component. Just remember to divide the amplitude by N.
Apply this formula to each entry in X to plot all the sinusoids in the input. You only have to crunch the first half of X because the coefficients come in pairs. Also, you handle the DC coefficient separately: it has no conjugate partner (so no doubling) and no phase to speak of. You simply extract the DC offset as X(1)/N.
Everything else is just looping and plotting:
Fs = 100; % Hz
t = [0:1/Fs:1-1/Fs];
x_string = '1+sin(1*2*pi*t)+0.25*sin(8*2*pi*t)';
x = eval(x_string);
X = fft(x);
subplot(2,1,1);
plot(x);
title(x_string);
grid on;
subplot(2,1,2);
grid on; hold on;
N = length(x);
DC = X(1)/N;                          % DC offset: no conjugate partner, no phase
plot([1 N],[DC DC]);
for index = 2:N/2                     % first half only; the second half are conjugates
  F = Fs * ((index-1)/N);             % real-world frequency of this coefficient (Hz)
  a = real(X(index));
  b = imag(X(index));
  d = -b;
  c = sqrt(4*a*a + 4*b*b) / N;        % amplitude A
  phi = atan2(d, a) * 180 / pi;       % phase in degrees (atan2 gets the quadrant right)
  rc = c * cos(F*2*pi*(t-(phi/(F*360))));   % reconstructed component
  plot(rc);
end
The example included in the code uses our favorite waveform (plus a DC offset). Results are plotted below (signal at top, sinusoid collection at bottom):
(Yes, I should have plotted these as discrete samples and not continuous curves. Think of it as a final exam. You passed.)
Footnote: There's an easier way to do sinusoid extraction: You repeatedly feed X to the inverse FFT function ifft( ) after zeroing out all entries except one conjugate pair (or, zeroing all entries except X(1) to extract the DC component). However, my way shows you the calculation details, and calculation details are why we have gathered here today.
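For the curious, here's what the ifft( ) route looks like -- a sketch that assumes the X and N already defined in the loop code above, where m is the (1-based, Matlab-style) index of the coefficient you want to extract:
m = 2;                             % e.g., the 1 Hz component in our example signal
Xm = zeros(size(X));
Xm(m) = X(m);                      % keep the chosen coefficient...
Xm(N-m+2) = X(N-m+2);              % ...and its conjugate partner (skip this line for m = 1, the DC entry)
component = real(ifft(Xm));        % real( ) mops up roundoff-sized imaginary leftovers
plot(component);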
Epilogue
If I have done my job, you now have an understanding of what comes out of fft( ). Funny, it somehow seems less mysterious to me than it did going in. Like driving after passing your driver's test, or buying beer with a real ID. Omne ignotum pro magnifico est, Tacitus tells us. Loosely translated: It wouldn't be a LabKitty joint unless I broke out some pretentious Latin at some point, and would it kill you to get a little culture outside of WoW and Rick and Morty? (Answer: No, it would not.)
However, if you would like a new challenge, one is not hard to find. Once nerds figured out the FFT, they did what nerds always do -- they flogged the idea to get tenure. This gives us fresh horrors like the Fourier Sine Transform and the Fourier Cosine Transform and the Fractional Fourier Transform and the Short-time Fourier Transform and Fourier transforms in space and time and spacetime. Also, by swapping out basis functions we get other transforms. If instead of sinusoids we use decaying sinusoids we get the Laplace transform. If we use basis functions chosen only to reproduce the data with the fewest number of them we get Principal Component Analysis. If we use polynomials we get Chebyshev or Legendre expansions (or splines, if you go piecewise). And if we use little weirdy blippo basis functions we get a Wavelet transform. Google will happily lead you down the rabbit hole.
Footnote: Wavelets are all the rage these days. Like rap and herpes, they just refuse to go away. I've never found them to do anything for me the FFT didn't. Still, if you don't use wavelets on your data certain types will snicker at your results. Consider yourself warned.
But for now, at least, we have, at last, a cease to all our exploring. Arriving where we started and knowing the place for the first time, even if the return voyage did take eleventy f'ing months. No matter. Let's focus less on who took how long to do what, and focus more on arriving at all. Break out the hats and hooters. Rev up the motor scooters. Jose has come home.
Recommended Reading
To wrap things up, let me point you to a few references I have found helpful and you might too.
Understanding Digital Signal Processing -- Richard Lyons
Lyons is what really explained the FFT to me, after struggling with countless hateful textbooks (not to mention the feckless thugs employed by my undergrad math department). I spent a summer working through it cover to cover. You should too. There's a great deal more in Lyons than just Fourier -- there's many pages on digital signals and filters and digital filters (as well as a description of the famous Radix-2 algorithm) -- but at the heart of it all is the FFT. Go buy it. Now. Do it do it do it.
My recommendations for Part II really begin and end with Lyons, but there are a few honorable mentions. Oppenheim and Schafer's Digital Signal Processing is a classic that is also quite accessible as an introductory treatment. Lyons is rather nuts-and-bolts, so the theoretical bent of O&S makes for a nice complement. Anything by Bendat and Piersol is the standard reference for the intersection of random processes and Fourier analysis, which explains things like error bars on a PSD. See also Wirsching's text on random vibrations, which is like Bendat and Piersol with a focus on stuff that breaks, augers in, and explodes. Your ride in coach will never be the same.
Schaum offers at least three entrées that cover Fourier stuff. There's Signals and Systems by Hwei Hsu, which begins at signals and gets to the FFT around page 300, stopping along the way at the Fourier transform, the Laplace transform, the z transform, and state space analysis. There's Probability, Random Variables, & Random Processes by Hwei Hsu, which gets into Fourier analysis of random signals, as well as other cool stuff like queuing and decision theory. And there's Analog and Digital Communications by Hwei Hsu, which gets deeper into Fourier analysis of random signals, as well as other cool stuff like Shannon information theory.
Speaking of examples, here's a well-crafted pdf of DFT problems for you to ponder and solve. Very cool.
Finally, there's A Digital Signal Processing Primer by Ken Steiglitz, which has a nice explanation of phasors before getting into all things DSP in the context of digital music. What are our ears if not fft( ) implemented using squishy hardware?
As always, I have no affiliation with the authors or institutions listed in these sources, all of whom I suspect would be shocked and appalled by my approach to explaining mathematics.