Understanding how signals behave in the frequency domain is crucial for engineers working with digital speech. The Fourier Transform is our most powerful tool for this, allowing us to decompose complex signals into their constituent frequencies. Let’s embark on a step-by-step journey through its various forms, with a particular focus on accurate mathematical representation: the Continuous-Time Fourier Transform (CTFT), the Discrete-Time Fourier Transform (DTFT), and the Discrete Fourier Transform (DFT), culminating in the efficient Fast Fourier Transform (FFT).
Step 1: The Continuous-Time Fourier Transform (CTFT)
The journey begins with continuous-time signals, which exist for all real numbers \(t\). The CTFT converts a time-domain signal \(x(t)\) into its continuous frequency-domain representation \(X(j\omega)\).
Definition:
- Forward CTFT: $$X(j\omega) = \int_{-\infty}^{+\infty} x(t)e^{-j\omega t} \mathrm{d}t $$
- Inverse CTFT: $$x(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} X(j\omega)e^{+j\omega t} \mathrm{d}\omega$$
Example: Calculating the CTFT for $$x(t) = e^{-\alpha|t|}$$ (where \(\alpha > 0\))
- Break down the signal: The signal \(x(t) = e^{-\alpha|t|}\) can be expressed as the sum of two causal exponential signals using the step function \(\varepsilon(t)\): \(x(t) = \varepsilon(t)e^{-\alpha t} + \varepsilon(-t)e^{+\alpha t}\).
- Find the CTFT of the basic causal exponential: The CTFT of \(v(t) = \varepsilon(t)e^{-\alpha t}\) is \(V(j\omega) = \frac{1}{\alpha + j\omega}\).
- Apply time reversal property: Using the time reversal property, \(v(-t) \circ — \bullet V(-j\omega)\), the CTFT of \(\varepsilon(-t)e^{+\alpha t}\) is \(\frac{1}{\alpha – j\omega}\).
- Apply linearity: Since the CTFT is a linear transformation, we can sum the individual transforms: $$X(j\omega) = \frac{1}{\alpha + j\omega} + \frac{1}{\alpha – j\omega}$$
- Simplify the expression: $$X(j\omega) = \frac{(\alpha – j\omega) + (\alpha + j\omega)}{(\alpha + j\omega)(\alpha – j\omega)} = \frac{2\alpha}{\alpha^2 + \omega^2}$$ This shows how a single, continuous-time signal is represented across a continuous range of frequencies.
Step 2: Sampling and the Discrete-Time Fourier Transform (DTFT)
In digital signal processing, we work with discrete-time signals, which are samples of continuous signals taken at equidistant time instances \(t_n = nT\), where \(T\) is the sampling period. We denote the sampled signal as \(x(n) = x(t_n)\).
Transformation from Continuous to Discrete-Time Domain:
- Represent the sampled signal as a sum of Dirac impulses: The sampled continuous signal \(x(t)\)can be viewed as $$x_s(t) = \sum_{n=-\infty}^{+\infty} x(nT)\delta(t – nT)$$
- Apply the CTFT to the sampled signal: $$X_s(j\omega) = \int_{-\infty}^{+\infty} \left(\sum_{n=-\infty}^{+\infty} x(nT)\delta(t – nT)\right)e^{-j\omega t} \mathrm{d}t$$
- Exchange sum and integral, and use the sifting property of the Dirac delta function: $$X_s(j\omega) = \sum_{n=-\infty}^{+\infty} x(nT) \int_{-\infty}^{+\infty} \delta(t – nT)e^{-j\omega t} \mathrm{d}t = \sum_{n=-\infty}^{+\infty} x(nT)e^{-j\omega nT}$$
- Introduce the normalized angular frequency $$\theta = \omega T$$ This notation is common when working almost exclusively with discrete-time signals. The Discrete-Time Fourier Transform (DTFT): $$X(e^{j\theta}) = \sum_{n=-\infty}^{+\infty} x(n)e^{-j\theta n}$$ This definition is consistent with source, which uses \(x[n]\) for the discrete-time signal, and source, which uses \(w\) as the continuous argument.
Key Properties of the DTFT:
- Periodicity: The DTFT \(X(e^{j\theta})\) is continuous on the frequency axis and periodic with a period length of \(2\pi\) (or \(2\pi/T\) in real frequency \(\omega\)). This is why the argument is written as \(e^{j\theta}\) instead of \(j\theta\).
- Inverse DTFT: The inverse DTFT \(x(n)\) can be recovered by integrating \(X(e^{j\theta})\) over just one period, typically \([-\pi, \pi]\) or \([0, 2\pi]\): $$x(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} X(e^{j\theta})e^{j\theta n} \mathrm{d}\theta$$
- Existence: The DTFT exists if the signal \(x(n)\) is absolutely summable (\(\sum|x(k)| < \infty\)), or if it converges in the quadratic mean (\(\sum|x(k)|^2 < \infty\)).
- Gibbs Phenomenon: For signals that are not absolutely summable (e.g., ideal filters with sharp cut-offs), approximating the DTFT with finite sums can lead to the Gibbs phenomenon at points of discontinuity. This can be visualised in MATLAB simulations.
The Poisson Summation Formula: Connecting CTFT and DTFT The Poisson Summation Formula elegantly relates the DTFT of a sampled signal to the CTFT of the original continuous-time signal: $$X(e^{j\omega T}) = \sum_{n=-\infty}^{+\infty} x(nT)e^{-j\omega nT} = \frac{1}{T} \sum_{k=-\infty}^{+\infty} X\left(j\left(\omega – \frac{2\pi k}{T}\right)\right)$$ This formula, as shown in source, clearly shows that the DTFT of the sampled signal is a periodic repetition and summation of the original continuous-time signal’s CTFT along the frequency axis. When the sampling period \(T \to 0\), the repeated spectra are moved to infinity and vanish, meaning \(T \cdot X(e^{j\omega T})\) converges to \(X(j\omega)\).
Step 3: The Discrete Fourier Transform (DFT)
The DTFT, while powerful for theoretical analysis, is a continuous function of frequency \(\theta\) and therefore cannot be directly computed on digital devices. To enable computation, we sample the DTFT spectrum at equidistant discrete frequency points. This leads to the Discrete Fourier Transform (DFT).
Definition: For a finite-length signal \(x[n]\) of length \(N\) (or \(M\)), defined for \(n = 0, \dots, N-1\):
- Forward DFT: $$X[k] = \sum_{n=0}^{N-1} x[n]e^{-j\frac{2\pi}{N}kn} \quad \text{for } k = 0, \dots, N-1$$ This is also often written using the notation \(W_N = e^{-j\frac{2\pi}{N}}\), so the expression becomes $$X[k] = \sum_{n=0}^{N-1} x[n]W_N^{kn}$$.
- Inverse DFT (IDFT): $$x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k]e^{+j\frac{2\pi}{N}kn} \quad \text{for } n = 0, \dots, N-1$$ Or, using \(W_N^{-kn}\): $$x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k]W_N^{-kn}$$.
Key Aspects of the DFT:
- Exact Samples of DTFT: The DFT \(X[k]\) represents exact samples of the DTFT of the finite sequence \(x(n)\) at \(N\) equally spaced frequencies, \(\omega_k = \frac{2\pi}{N}k\).
- Periodic Interpretation: The DFT implicitly assumes that the time-domain signal \(x(n)\) is \(M\)-periodic (where \(M\) is the length). This periodic continuation plays a crucial role in operations like time shifts and convolution.
- Time-Domain Aliasing: Sampling in the frequency domain leads to periodic repetition in the time domain. If the number of frequency points is less than the signal length, this results in aliasing in the time domain.
- Symmetries for Real Signals: For real-valued input signals, the DFT exhibits Hermitian symmetry, \(X[N – k] = X^*[k]\). This has significant implications for computational efficiency:
- The Real part and Magnitude of the DFT are even functions.
- The Imaginary part and Phase of the DFT are odd functions.
- This means it is often sufficient to determine the DFT for \(k = 0, \dots, \lfloor N/2 \rfloor\), effectively reducing computational effort by a factor of 2. Source provides details on how to calculate the second half of the DFT using this symmetry.
Parseval’s Relation in Discrete Context: For discrete signals, Parseval’s relation states that the energy (or power for periodic signals) can be calculated in either the time or frequency domain. For a periodic signal \(x[n]\) with Fourier series coefficients \(c_k\), the power can be calculated as: $$P_x = \sum_{k=0}^{N-1} |c_k|^2$$ This relationship, resulting from the use of harmonic exponential functions as basis functions, can greatly simplify calculations for signals with few spectral lines compared to summing in the time domain.
Step 4: The Fast Fourier Transform (FFT)
While the DFT is computable, its direct implementation requires a computational effort that scales quadratically with the DFT length \(N\) (\(O(N^2)\)). The Fast Fourier Transform (FFT) is not a new transform, but a class of highly efficient algorithms for computing the DFT.
The Decimation-in-Time FFT Algorithm:
- Condition: The FFT algorithms are most efficient when the DFT length \(M\) (or \(N\)) is a power of two (\(M = 2^p\)).
- Splitting the Sum: The core idea is to split the DFT sum into two parts based on even and odd indices of the input signal \(x(n)\). For an \(M\)-point DFT \(X[k]\): $$X[k] = \sum_{n=0}^{M/2-1} x[2n]e^{-j\frac{2\pi}{M}k(2n)} + \sum_{n=0}^{M/2-1} x[2n+1]e^{-j\frac{2\pi}{M}k(2n+1)}$$ This can be rewritten using the twiddle factor \(W_M = e^{-j\frac{2\pi}{M}}\): $$X[k] = \sum_{n=0}^{M/2-1} x[2n]W_M^{2kn} + \sum_{n=0}^{M/2-1} x[2n+1]W_M^{k(2n+1)}$$ Recognising that \(W_M^{2kn} = W_{M/2}^{kn}\) and \(W_M^{k(2n+1)} = W_M^k W_{M/2}^{kn}\), the equation becomes: $$X[k] = \left(\sum_{n=0}^{M/2-1} x[2n]W_{M/2}^{kn}\right) + W_M^k \left(\sum_{n=0}^{M/2-1} x[2n+1]W_{M/2}^{kn}\right)$$ This cleverly transforms one \(M\)-point DFT into two \(M/2\)-point DFTs (one for the even-indexed samples and one for the odd-indexed samples).
- Recursive Application: This splitting process can be repeatedly applied until you are left with simple 2-point DFTs.
- Computational Savings: This recursive decomposition drastically reduces the computational complexity from \(O(M^2)\) to \(O(M \log_2 M)\). For large \(M\) (e.g., \(M = 1024\)), this can mean a factor of 10 or more in computing time savings compared to direct DFT calculation.
- Bit-Reversal: For efficient implementation, the input signal often needs to be re-sorted based on bit-reversed indices.
These Fourier transforms, particularly the computationally efficient FFT, are cornerstones of modern digital signal processing. They enable critical applications in speech processing, such as spectral analysis, filtering, and speech coding, by allowing us to analyze and manipulate signals in the frequency domain with unprecedented efficiency. For instance, the Fast Convolution method, which heavily relies on FFT, offers significant computational savings for filtering operations. This is particularly important when dealing with long impulse responses. The Overlap-Add and Overlap-Save methods are practical implementations that achieve linear convolution using cyclic convolution via FFT.
The application of these concepts extends to areas like noise reduction, where techniques like the Wiener filter operate in the frequency domain, often using FFT for efficient spectral estimation and manipulation. Furthermore, speech coding relies on efficiently representing speech signals in the frequency domain, with methods like LPC-based distance measures and vector quantization applied to short-term spectra. The use of Mel-filterbanks and Discrete Cosine Transform (DCT) to derive Mel-Frequency Cepstral Coefficients (MFCCs) is another direct application of frequency domain analysis for feature extraction in speech recognition systems.