A New Method for High-Quality Real-Time Resampling with an Elliptic Filter
Table Of Contents
Introduction
I’d like to extend a special thanks to my former colleague, R. Muller, with whom I collaborated at UVI. He developed a novel audio bitcrushing technique, which sparked the idea behind this recursive resampler. This article keeps the math light, making it accessible to everyone.
State of the Art
Resampling: The Traditional Approach
Traditional resampling techniques for audio sampling structures typically use:
- Polynomial Interpolators: These include Linear1, Cubic2, Bezier3, Hermite4, Lagrange5, Catmull-Rom6 interpolators, and more.
- Upsampled FIR Kernels, also known as polyphase filters789.
Both traditional approaches also require an additional filter to mitigate aliasing caused by downsampling. Unlike upsampling (increasing the pitch of the input), downsampling (decreasing the pitch) introduces a further risk of aliasing. To address this, an extra anti-aliasing filter is usually employed, often a simple IIR filter since the kernel size can be too extensive for a FIR filter.
Limitations of Traditional Resamplers
Polynomial interpolators are easy to implement and CPU-efficient, but they typically produce poor sound quality. They can introduce excessive aliasing and high-frequency damping, which are clearly audible in audio applications at a 44.1 kHz sampling rate.
Upsampled FIR kernels (polyphase filters) require more effort, necessitating the design of an efficient kernel that will ultimately be re-interpolated by a polynomial interpolator, complicating the aliasing analysis. Some implementations also use mip-mapped versions of the kernel9, which adds complexity, particularly at non-constant resampling rates. This approach can be both challenging and error-prone compared to the more straightforward polynomial methods.
Proposed Solution: A Recursive Resampler Using Elliptic Filtering
Concept
Instead of traditional resampling methods, we propose a recursive resampler using an analog elliptic filter as the primary anti-aliasing component. Recursive structures are well-suited for real-time processing, offering excellent audio quality and robustness to time-variant signals. Moreover, this approach is simpler to implement and requires minimal memory, as there is no need to store a complete sampled kernel. The algorithm is CPU-efficient, employing only an 8th-order IIR filter. Additionally, this method eliminates the need for an extra anti-aliasing filter for downsampling, simplifying the overall algorithm. As the IIR filter operates in minimum-phase mode, it achieves the best latency-to-aliasing ratio theoretically possible.
Digital-to-Analog Conversion of the Input Signal
The core component of our resampler is an analog elliptic filter. Since this filter is analog, the input signal must be converted to an analog format. This conversion involves generating an impulse train, with each impulse weighted by the corresponding sample value of the digital input signal.
This process transforms the digital input into an analog signal with an infinite spectrum, comprising mirrored versions of the bandwidth-limited spectrum of the original digital signal, as predicted by Shannon’s sampling theorem10. The elliptic filter’s primary role is to eliminate this mirrored spectrum content. To effectively filter out these unwanted spectral components and avoid aliasing, a brickwall filter is required, precisely the type of filter for which an elliptic filter is well-suited.
Intuitive Explanation of the Recursive Resampler
Implementing a Recursive Resampler with a Simple First-Order Lowpass Filter
Before realizing the analog filtering with an elliptic filter, let’s first consider our anti-aliasing filter as a simple first-order lowpass filter.
While this approach may produce suboptimal sound quality, it serves as a helpful starting point for understanding how recursive resampling works.
The code below illustrates a digital input signal in the first graph, represented by its discrete samples. This digital signal is then converted to an analog format and filtered using the first-order lowpass filter, which acts as the anti-aliasing filter. The output of this filtering process is displayed in the second graph. Finally, the analog signal is resampled at a different sampling rate, as shown in the third graph.
|
|
As the graphs illustrate, the continuous filtered version can be easily expressed mathematically as a series of decaying exponentials, resulting from the instantaneous integration of each periodic impulse from the input signal. When the analog input signal is silent (as it does during two of the impulses), the filter enters a “free-running” mode, exhibiting an exponentially decaying tail that approaches zero. When the filter is suddenly fed an input impulse, it integrates this impulse instantaneously, which translates to a simple addition operation on the current state of the filter’s tail.
To compute the final sampled values in the last graph (depicted as dots) without evaluating the entire analog signal, we integrate each impulse and apply a time offset using the exponentiation operator to target the new time position corresponding to the desired sampling rate.
To effectively remove aliasing, we set the filter’s cutoff frequency to the minimum of the Nyquist frequencies of both the source and target sampling rates:
$$f_{c} = \frac{\min\left(sr_{source}, sr_{target}\right)}{2}$$
This cutoff definition filters out additional aliasing caused by downsampling, eliminating the need for an extra anti-aliasing filter, as is common in traditional resampling schemes.
Since the filter currently used is a 6 dB/oct lowpass, the aliasing may still be audible. Therefore, we will increase the filter order by replacing our simple one-pole filter with an elliptic brickwall lowpass filter.
Implementing a Recursive Resampler with an Elliptic Filter
Now that we understand the principle of recursive resampling, we can implement a more robust elliptic filter instead of the basic one-pole lowpass filter. The approach involves decomposing the complex elliptic filter into parallel first-order sections and applying the same integration process used for the first-order filter to each of these sections. Finally, we sum the results to obtain the output analog response of the elliptic filter.
The code below follows the same process as for the first-order filter, but this time we employ an 8th-order elliptic filter. As we can see, the analog filtered response is much smoother, resulting in significantly improved sound quality.
|
|
Designing the Optimal Elliptic Filter
Desired Characteristics
-
Minimal Resonance: While an ideal anti-aliasing filter might seem to require a sharp, brickwall cutoff, in practice, this isn’t ideal. It’s especially important to consider that, when used for pitch shifting, the knee can move into the audible range, and a harsh knee would sound extremely unpleasant. To avoid this, we aim for a filter with a softer knee and reduced resonance, resulting in a smoother, more musical sound.
-
Unnoticeable Aliasing: A cutoff slope of -60 dB/octave is effective, while -90 dB/octave is ideal for pristine sound quality. Keep in mind post-processing by the sound engineer can sometimes elevate background noise, so a higher threshold is always advantageous.
-
Integer Latency at DC: Using an elliptic filter typically introduces a latency between 1 and 3 samples, which is imperceptible. Although not critical, setting an integer value for the group delay at DC (where the resampling rate factor is 1) helps maintain the original waveform shape.
-
Flat Response and Low Ripple: Keeping the ripple range within an imperceptible 0.1 dB difference meets relaxed limits of human hearing, so we can set the ripple target accordingly.
-
Wide Frequency Response: A flat response across the audible range is essential. At a 44.1 kHz sampling rate, maintaining response up to 17–18 kHz is generally sufficient.
-
Moderate CPU Usage: For an 8th-order elliptic filter, each sample requires four complex exponential calculations, plus a minor number of additions and multiplications. Efficient implementation is crucial, especially if the filter is used in a sampler playing up to more than 200 voices simultaneously.
Creating the Elliptic Filter Response
To make our anti-aliasing filter suitable for sampling, it must have more poles than zeros. This design choice ensures that the output is free from Dirac impulses, which cannot be sampled. Another way to interpret this is that any Dirac impulse in the output would prevent the spectral amplitude from decreasing as frequency increases, leading to convergence issues when the spectrum folds due to sampling. This problem is similar to what occurs in impulse invariance techniques.
A standard elliptic filter unfortunately has an extra zero. To address this, we either add a pole or remove a zero. Here, we chose to remove one zero by omitting a direct polynomial coefficient from the modal decomposition, as it was close to the threshold of the bottom ripple and negligible.
The filter that meets these requirements is generated below using signal.ellip()
, with additional refinements applied. To verify the design, we plot both the magnitude response and the group delay to ensure all constraints are met.
|
|
Smoothing the Knee
As discussed, a soft knee is essential. During pitch shifts, a sharp knee could enter the audible range, causing unpleasant smearing effects. To smooth the spectrum and eliminate these artifacts, we might reduce the resonance of the most prominent section of our elliptic filter. This is similar to the resonance adjustment used in resonant filters. The result will be a smooth anti-aliasing filter with a short decay, giving a more musical response.
Comparison with Other Interpolators
To illustrate the differences, we will plot the same graph for both linear interpolation and a polyphase filter, represented by a windowed cardinal sine function. As we can observe, linear interpolation not only attenuates the signal but also introduces excessive aliasing. In contrast, the polyphase filter provides a more favorable response, with its musical soft knee arising from the necessity to trim the kernel, to optimize CPU usage. Both methods possess the property of linear phase, which can be advantageous, however, it also increases latency.
Additionally, the aliasing profile does not account for the extra filtering required by these interpolators when used for downsampling. This additional filter can be a significant source of aliasing/damping unless the developer opts for an elliptic filter, which is somewhat ironic given that we aim to utilize an elliptic filter ourselves. Furthermore, incorporating this extra filtering would compromise the linear phase characteristic.
For a sampler usage, one approach to avoid this additional filtering step is to use mipmapping for the input samples, allowing the filtering process to occur offline during sample loading. While this method can help reduce aliasing, particularly at high frequencies, it does not completely eliminate it. Additionally, mipmapping adds significantly more complexity.
With that in mind, the proposed method clearly stands out as the optimal approach for resampling audio signals in real-time.
|
|
Modal Decomposition
We will now use the signal.residue()
function to decompose our elliptic filter into eight parallel first-order filters.
In the code below, only one element from each pair of conjugate poles and zeros is considered, as just one is sufficient for computing the filter’s output.
For convenience, the corresponding C++ code has been generated, allowing for easy copying and pasting of the coefficients into a C++ application for testing various filter designs.
|
|
--- Modal decomposition coefficients ---
[1.46497914 2.02030223 2.55371515 2.83976107] [ 2.70037723 2.11868088 -1.82535138 1.64563957]
[2.6749963 1.93138821 0.95614734 0.2702628 ] [-1.25165703 2.5613368 -0.16040834 -2.24113278]
--- Generated C++ code ---
static constexpr int numStages = 4;
static const auto poles = createComplexVec4(
{ -1.324682919562635, 0.625602948081901 },
{ -1.0523402651805296, 1.7245872111955274 },
{ -0.6430633215429776, -2.4714227916635765 },
{ -0.21233856556873273, 2.8318113006572734 }
);
static const auto twoResidues = createComplexVec4(
{ 1.678557147926121, -5.079849075065903 },
{ -3.230526473231284, 2.1177205484517803 },
{ 1.887744925877217, -0.3054342242923359 },
{ -0.3358013524237038, -0.42356271734465195 }
);
Delay formula of the designed filter
As the resampling is done in real time, we can notice some delay on the figure preseted above. You will always have delay by doing a real time resampling, even linear interpolation cause theorically 1/2 sample delay. The delay caused by the elliptic filter is minimal, as the minimal phase property involve. In practice, an audio sampler can compensate this delay and by precomputing and triming the output audio sample. Since 2 samples is absolutly not detectable for human earing, the only indication of compensentating this delay would be to align perfectly the phase. However, as with IIR filter, there is not linear phase, they will still be a slight difference between the input and the output in term of waveform even if the resampling factor is 1.
By the way, the formula of the delay of our filter is given by :
$$\Delta_{resample} = \displaystyle \tau _{g} \max\left(\frac{f _{s \, \text{out}}}{f _{s \, \text{in}}},1\right)$$
If the resampling is used to modify the pitch, by playing the sample faster or slower, then the delay become :
$$\Delta_{repitch} = \displaystyle \tau _{g} \min(\operatorname{speed}, 1)$$
Where \(\tau _{g}\) is the filter group delay.
Implementation
Now we get all our understanding of the recursive resampling and the right elliptic filter coefficient, we can build the algorythm as follow.
- Compute the elliptic filter output at the source sample’s sampling rate to ensure proper integration of each input sample.
- Each time a target output sample is closer than the next input sample to integrate, calculate that output sample using exponentiation to reach the sample’s timestamp.
|
|
Optimization
Avoiding Complex Exponentiation
To enhance the speed of our algorithm, it is essential to eliminate the complex exponentiation from the core loop. A complex exponentiation can be expressed as follows:
$$e^{a+bi} = e^{a} \left( \cos(b) + i \sin(b) \right)$$
The computational cost of this operation includes 1 exponentiation, 1 cosine, and 1 sine, making it relatively heavy for a CPU. Therefore, finding a more efficient alternative is advantageous.
At first glance, it may seem impossible to avoid recalculating the complex exponentiation for every sample, as its argument varies with each computation. However, the variation of this argument can only take on two values, corresponding to either incrementing the input index by 1 or incrementing the output index by a constant speed factor based on the block size. Thus, we can express the result of the exponentiation in terms of its previous value multiplied by one of two precomputed geometric increments. This transformation allows us to replace the complex exponentiation with a complex multiplication, significantly reducing the computational load to just 1 ADD, 1 SUB, and 4 MUL, as illustrated by:
$$\left(a + bi\right)\left(c + di\right) = \left(ac - bd\right) + \left(ad + bc\right)i$$
And now, we can achieve remarkably fast resampling! :)
To prevent error propagation from the recursive computation of complex exponentiation, it is wise to periodically recompute it using its closed forms. In a real-time audio application, this recalculation should ideally take place at the beginning of the audio processing block function, allowing for the state variable to be stored locally within that function.
In scenarios where the speed is not constant but varies linearly, the optimization technique can be applied twice. This involves recursively computing the variation, which itself is calculated recursively. Although this approach introduces one additional complex multiplication, it eliminates the need for exponentiations. To go further, we can incorporate as many multiplications as needed to increase the order of the speed modulation shape, providing greater flexibility in handling various modulation profiles.
In the end, by refactoring the complex multiplications, the algorithm can be optimized to utilize only a limited number of multiplication (MUL) and addition/subtraction (ADD/SUB) operations per section. With our 8th-order elliptic filter, we are left with just four sections to compute, thanks to the conjugate pair optimization mentioned earlier. While this optimization ensures that our resampler is highly competitive, there are still opportunities for further enhancement, as discussed below.
SIMD Instructions
The parallel decomposition of the filter presents an opportunity to leverage SIMD (Single Instruction, Multiple Data) operations in our code. By employing vectors of dimension 4, which corresponds to half the number of poles, SIMD instructions (such as AVX or NEON) can be seamlessly integrated. Utilizing NEON on a Mac M1 yielded an impressive speedup of over 3 times, showcasing the effectiveness of this approach.
Additional Mathematical Optimizations
The while loop can also be optimized by updating the state and exp_p_frac
variables less frequently, for instance, every 8 iterations. This approach means that the two complex multiplications within the loop are performed only every 8 iterations. When feeding the state with the input, we only need to multiply the input by \(exp^{p_{ik}}\) for \(1 \leq k < 8\), where \(p_i\) represents the poles of the filter. These values can be precomputed in a table and even calculated offline for the upsampling case.
This optimization resulted in a 30% reduction in the total cost of our C++ algorithm when the -O3
optimization flag was enabled on a Mac M1, with even greater improvements observed on Windows.
Additionally, moving the residue multiplication outside the loop contributed to the overall efficiency.
Hot Disabling for Speed = 1
In sound design, there are many scenarios where the interpolation speed perfectly matches 1, meaning no resampling is necessary. When this is the case, we can utilize memcpy(), which operates at remarkable speed. However, since our interpolator is not linear phase, even if we align the delay at DC to be an integer, there is no straightforward method to disable the interpolator without introducing artifacts.
In the context of our polyphonic sampler, we have the flexibility to switch between a linear interpolator and elliptic interpolation as the voice starts. This choice is informed by various indicators that help us predict whether resampling will be needed. The linear interpolator, which can be easily enabled or disabled, serves as a fallback in case our prediction is incorrect. Although this scenario may occur, it is relatively rare.
Oversampling Applications (Speed = \(2^n\))
Oversampling is a technique that allows for processing signals at a higher frequency rate. A common strategy for implementing oversampling is to use a simple polyphase filter. This approach capitalizes on the fact that oversampling involves both upsampling and downsampling with integer ratios, creating an ideal scenario for polyphase filters, as they eliminate the need to interpolate the filter kernel.
However, as previously discussed, polyphase filters can encounter several other challenges. Therefore, our solution remains a superior option in this context.
To adapt our algorithm for oversampling, we can implement specialized cases for speeds of 2, 4, 8, or 16, as well as for speeds of 1/2, 1/4, 1/8, or 1/16. This allows for further optimizations since the selected sample positions will now either be integers or subdivisions of integers. This change results in fewer variables, more constants, and ultimately, fewer operations.
By implementing this approach for specific speed cases, we enhance the efficiency of our algorithm by approximately 20% when using a C++ implementation with the -O3
optimization flag, further demonstrating that the initial algorithm is already performing exceptionally well.
Speed Interpolation Example
The code below demonstrates an example of resampling with varying sampling rates, without utilizing any complex exponentiation within the loop.
|
|
C++ Library
The algorithm has been successfully implemented in C++ and will soon be available on GitHub for both personal and commercial use. Feel free to share it if you find it helpful!
Conclusion
We have demonstrated that recursive resampling outperforms traditional methods in many aspects of real-time programming. The resampling quality is incredibly superior to polynomial interpolators and is equivalent to that of a windowed sinc interpolator in terms of magnitude fidelity, as illustrated in the previous graphs.
The algorithm itself is remarkably efficient and simple, as it incorporates the additional filtering typically required for downsampling scenarios. The elliptic resampler is resource-efficient compared to polyphase resamplers and is mostly comparable to polynomial interpolators regarding CPU and memory usage. Moreover, the elliptic resampler is particularly well-suited for real-time applications, with a latency of only 2 samples, thanks to its use of minimum phase filters.
We successfully utilized this solution in our resampler, allowing us to play more than 600 resampled voices simultaneously on a single CPU unit of a MacBook Air M1, which is a significant achievement.
I hope you enjoy this article, feel free to share your thoughts and feedback. See you next time! ☕
-
https://ccrma.stanford.edu/~jos/pasp06/Linear_Interpolation.html ↩︎
-
https://www.codeproject.com/Articles/31859/Draw-a-Smooth-Curve-through-a-Set-of-2D-Points-wit ↩︎
-
https://ccrma.stanford.edu/~jos/pasp/Lagrange_Interpolation.html ↩︎
-
https://en.wikipedia.org/wiki/Centripetal_Catmull%E2%80%93Rom_spline ↩︎
-
https://ccrma.stanford.edu/~jos/JFB/Polyphase_Filtering.html ↩︎
-
https://yehar.com/blog/wp-content/uploads/2009/08/deip.pdf ↩︎ ↩︎
-
https://ccrma.stanford.edu/~jos/mdft/Sampling_Theorem.html ↩︎