
A New Method for High-Quality Real-Time Resampling with an Elliptic Filter

Published: 11/08/2023 | Author: Louis Couka


Introduction

I’d like to extend a special thanks to my former colleague, R. Muller, with whom I collaborated at UVI. He developed a novel audio bitcrushing technique, which sparked the idea behind this recursive resampler. This article keeps the math light, making it accessible to everyone.

State of the Art

Resampling: The Traditional Approach

Traditional resampling techniques used in audio samplers and sampling engines typically rely on:

  • Polynomial Interpolators: These include Linear1, Cubic2, Bezier3, Hermite4, Lagrange5, Catmull-Rom6 interpolators, and more.
  • Upsampled FIR Kernels, also known as polyphase filters789.

Both traditional approaches also require an additional filter to mitigate aliasing caused by downsampling. Unlike upsampling (playing the input slower, which lowers its pitch), downsampling (playing it faster, which raises its pitch) introduces a further risk of aliasing. To address this, an extra anti-aliasing filter is usually employed, often a simple IIR filter, since the kernel length required for an FIR filter can be prohibitive.

Limitations of Traditional Resamplers

Polynomial interpolators are easy to implement and CPU-efficient, but they typically produce poor sound quality. They can introduce excessive aliasing and high-frequency damping, which are clearly audible in audio applications at a 44.1 kHz sampling rate.

Upsampled FIR kernels (polyphase filters) require more effort, necessitating the design of an efficient kernel that will ultimately be re-interpolated by a polynomial interpolator, complicating the aliasing analysis. Some implementations also use mip-mapped versions of the kernel9, which adds complexity, particularly at non-constant resampling rates. This approach can be both challenging and error-prone compared to the more straightforward polynomial methods.

Proposed Solution: A Recursive Resampler Using Elliptic Filtering

Concept

Instead of traditional resampling methods, we propose a recursive resampler using an analog elliptic filter as the primary anti-aliasing component. Recursive structures are well-suited for real-time processing, offering excellent audio quality and robustness to time-variant signals. Moreover, this approach is simpler to implement and requires minimal memory, as there is no need to store a complete sampled kernel. The algorithm is CPU-efficient, employing only an 8th-order IIR filter. Additionally, this method eliminates the need for an extra anti-aliasing filter for downsampling, simplifying the overall algorithm. As the IIR filter operates in minimum-phase mode, it achieves the best latency-to-aliasing ratio theoretically possible.

Digital-to-Analog Conversion of the Input Signal

The core component of our resampler is an analog elliptic filter. Since this filter is analog, the input signal must be converted to an analog format. This conversion involves generating an impulse train, with each impulse weighted by the corresponding sample value of the digital input signal.

This process transforms the digital input into an analog signal with an infinite spectrum, comprising mirrored versions of the bandwidth-limited spectrum of the original digital signal, as predicted by Shannon’s sampling theorem10. The elliptic filter’s primary role is to eliminate this mirrored spectrum content. To effectively filter out these unwanted spectral components and avoid aliasing, a brickwall filter is required, precisely the type of filter for which an elliptic filter is well-suited.

Intuitive Explanation of the Recursive Resampler

Implementing a Recursive Resampler with a Simple First-Order Lowpass Filter

Before implementing the analog filtering with an elliptic filter, let's first consider a simple first-order lowpass as our anti-aliasing filter.

While this approach may produce suboptimal sound quality, it serves as a helpful starting point for understanding how recursive resampling works.

The code below illustrates a digital input signal in the first graph, represented by its discrete samples. This digital signal is then converted to an analog format and filtered using the first-order lowpass filter, which acts as the anti-aliasing filter. The output of this filtering process is displayed in the second graph. Finally, the analog signal is resampled at a different sampling rate, as shown in the third graph.

Python
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal

fs = 44100
N = 30
t = np.arange(N) / fs
x = (np.sin(t * 2000 * 2 * np.pi) + np.sin(t * 3500 * 2 * np.pi)) * signal.windows.hann(N)
up = 100 # Fake analog using upsampling x 100.
x_analog = signal.upfirdn([1], x, up)
a1 = -0.99 # Arbitrary lowpass coefficient
y = signal.lfilter([1 + a1], [1, a1], x_analog)

plt.stem(np.arange(len(x)), x, basefmt=" ")
plt.title("Input Digital Signal")
plt.ylabel('Amplitude')
plt.xlabel('Samples')
plt.show()

t_analog = np.arange(len(y)) / up
plt.plot(t_analog, y)
plt.title("Output of the One-Pole Analog Anti-Aliasing Filter (Suboptimal Quality)")
plt.ylabel('Amplitude')
plt.xlabel('Samples')
plt.show()

plt.plot(t_analog, y)
plt.scatter(t_analog[::59], y[::59], zorder=2) # Resample the "analog" signal at an arbitrary new rate (every 59th fine-grid point)
plt.title("Resampled signal")
plt.ylabel('Amplitude')
plt.xlabel('Samples')
plt.show()


As the graphs illustrate, the continuous filtered signal can be expressed mathematically as a sum of decaying exponentials, each resulting from the instantaneous integration of one impulse of the input train. Between two consecutive impulses, when the analog input is silent, the filter runs freely, exhibiting an exponentially decaying tail that approaches zero. When an input impulse arrives, the filter integrates it instantaneously, which translates to a simple addition onto the current state of the filter's tail.

To compute the final sampled values in the last graph (depicted as dots) without evaluating the entire analog signal, we integrate each impulse and then decay the state with an exponentiation whose argument is the fractional time offset to the desired output position.
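
To make this concrete, here is a minimal one-pole sketch of that idea (a hypothetical helper; the pole is the same as in the snippet above, but expressed per input sample, and gain and edge handling are kept deliberately simple):

Python
import numpy as np

def one_pole_resample(x, speed, a1=-0.99**100):
    """Sketch: resample x by 'speed' with the one-pole lowpass y[n] = -a1*y[n-1] + (1+a1)*x[n],
    evaluated at fractional positions through exponentiation of the pole.
    The default -a1 is the 0.99 decay above raised to the oversampling factor (100),
    i.e. the same pole expressed per input sample instead of per "analog" sample."""
    p = np.log(-a1)                   # continuous-time pole per input sample, so that e^p = -a1
    n_out = int(len(x) / speed)
    t = np.arange(n_out) * speed      # output positions, in input-sample units
    y = np.zeros(n_out)
    i, state = 0, x[0]                # 'state' is the filter tail just after input sample i
    for l in range(n_out):
        while t[l] - i >= 1:          # integrate every input impulse up to position t[l]
            i += 1
            state = (-a1) * state + x[i]                  # decay over one sample, add the new impulse
        y[l] = (1 + a1) * np.exp((t[l] - i) * p) * state  # decay the tail by the fractional offset
    return y

For example, one_pole_resample(x, 0.59) evaluates the one-pole-filtered signal directly at the new sample positions, without ever constructing the dense "analog" version (up to an overall gain factor compared with the plots above).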

To effectively remove aliasing, we set the filter’s cutoff frequency to the minimum of the Nyquist frequencies of both the source and target sampling rates:

$$f_{c} = \frac{\min\left(sr_{source}, sr_{target}\right)}{2}$$

This cutoff definition filters out additional aliasing caused by downsampling, eliminating the need for an extra anti-aliasing filter, as is common in traditional resampling schemes.

Since the filter currently used is a 6 dB/oct lowpass, the aliasing may still be audible. Therefore, we will increase the filter order by replacing our simple one-pole filter with an elliptic brickwall lowpass filter.

Implementing a Recursive Resampler with an Elliptic Filter

Now that we understand the principle of recursive resampling, we can implement a more robust elliptic filter instead of the basic one-pole lowpass filter. The approach involves decomposing the complex elliptic filter into parallel first-order sections and applying the same integration process used for the first-order filter to each of these sections. Finally, we sum the results to obtain the output analog response of the elliptic filter.
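
Before looking at the full listing, here is a minimal sketch of that decomposition (the 0.8π cutoff and the overall gain are illustrative, x_analog and up are reused from the previous snippet, and the direct term returned by signal.residue is simply dropped for now; it is dealt with properly in the filter-design section below):

Python
# Split an analog elliptic prototype into parallel one-pole sections and filter the
# impulse train section by section; summing the sections rebuilds the elliptic response.
b_e, a_e = signal.ellip(8, 0.01, 90, 0.8 * np.pi, 'low', analog=True)
r_e, p_e, k_e = signal.residue(b_e, a_e)  # H(s) ~ sum_i r_i / (s - p_i) (+ direct term, dropped here)

y_e = np.zeros(len(x_analog))
for r_i, p_i in zip(r_e, p_e):
    decay = np.exp(p_i / up)          # per "analog" sample decay of this one-pole section
    state = 0j
    for n, xn in enumerate(x_analog):
        state = decay * state + xn    # decay the section's tail, then integrate the impulse
        y_e[n] += np.real(r_i * state)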

The code below follows the same process as for the first-order filter, but this time we employ an 8th-order elliptic filter. As we can see, the analog filtered response is much smoother, resulting in significantly improved sound quality.

Python
b, a = signal.ellip(8, 0.01, 90, 1/up, 'low', analog=False) # Fake analog via the x100 upsampling: cutoff at the original Nyquist, i.e. 1/up of the upsampled Nyquist.
y = signal.lfilter(b, a, x_analog)

plt.stem(np.arange(len(x)), x, basefmt=" ")
plt.title("Input Digital Signal")
plt.ylabel('Amplitude')
plt.xlabel('Samples')
plt.show()

t_analog = np.arange(len(y)) / up
plt.plot(t_analog, y)
plt.title("Output of the Analog Anti-Alising Filter")
plt.ylabel('Amplitude')
plt.xlabel('Samples')
plt.show()

plt.plot(np.arange(len(y)) / up, y)
plt.scatter(t_analog[::59], y[::59], zorder=2)
plt.title("Resampled Signal")
plt.ylabel('Amplitude')
plt.xlabel('Samples')
plt.show()


Designing the Optimal Elliptic Filter

Desired Characteristics

  • Minimal Resonance: While an ideal anti-aliasing filter might seem to require a sharp, brickwall cutoff, in practice, this isn’t ideal. It’s especially important to consider that, when used for pitch shifting, the knee can move into the audible range, and a harsh knee would sound extremely unpleasant. To avoid this, we aim for a filter with a softer knee and reduced resonance, resulting in a smoother, more musical sound.

  • Unnoticeable Aliasing: A cutoff slope of -60 dB/octave is effective, while -90 dB/octave is ideal for pristine sound quality. Keep in mind that post-processing by the sound engineer can sometimes elevate the background noise, so extra stopband attenuation is always advantageous.

  • Integer Latency at DC: Using an elliptic filter typically introduces a latency of between 1 and 3 samples, which is imperceptible. Although not critical, setting the group delay at DC to an integer number of samples helps preserve the original waveform shape when the resampling factor is 1.

  • Flat Response and Low Ripple: Keeping the passband ripple within 0.1 dB is imperceptible to human hearing, so we set the ripple target accordingly.

  • Wide Frequency Response: A flat response across the audible range is essential. At a 44.1 kHz sampling rate, maintaining response up to 17–18 kHz is generally sufficient.

  • Moderate CPU Usage: For an 8th-order elliptic filter, each sample requires four complex exponential calculations, plus a handful of additions and multiplications. Efficient implementation is crucial, especially if the filter is used in a sampler playing more than 200 voices simultaneously.

Creating the Elliptic Filter Response

To make our anti-aliasing filter suitable for sampling, it must have more poles than zeros. This design choice ensures that the output is free from Dirac impulses, which cannot be sampled. Another way to interpret this is that any Dirac impulse in the output would prevent the spectral amplitude from decreasing as frequency increases, leading to convergence issues when the spectrum folds due to sampling. This problem is similar to what occurs in impulse invariance techniques.

A standard even-order elliptic filter unfortunately has as many zeros as poles, which is one zero too many for our purpose. To address this, we either add a pole or remove a zero. Here, we chose to remove one zero by omitting the direct polynomial term of the modal decomposition, as its contribution is on the order of the stopband ripple floor and therefore negligible.

The filter that meets these requirements is generated below using signal.ellip(), with additional refinements applied. To verify the design, we plot both the magnitude response and the group delay to ensure all constraints are met.

Python
# Note to self: it would be smarter to use the zeros near Nyquist to center the position
def ellip(N, rp, rs, Wn, bytype, analog=True): # Quick-and-dirty wrapper: recenters Wn onto the stopband start frequency
  b, a = signal.ellip(N, rp, rs, 1, bytype, analog)
  w = np.linspace(1, 4*np.pi, 100000)
  _, H = signal.freqs(b, a, w)
  w0 = (w[abs(H) < 10**(-rs/20)][0])
  w0 **= 0.5
  return signal.ellip(N, rp, rs, Wn/w0, bytype, analog=True)

#b, a = signal.ellip(8, 0.01, 90, 2.477092313631182, 'low', analog=True)
#b, a = signal.ellip(16, 0.01, 110, 3, 'low', analog=True)
#b, a = signal.ellip(4, 0.1, 50, 2.26403, 'low', analog=True)
#b, a = signal.ellip(16, 0.0045, 130, 3, 'low', analog=True)
#b, a = signal.ellip(16, 0.01, 200, 2.8, 'low', analog=True)
#b, a = signal.ellip(8, 0.01, 114, 18662/22050*np.pi, 'low', analog=True)
#b, a = signal.ellip(8, 0.0001, 108, 18413.3/22050*np.pi, 'low', analog=True)
b, a = signal.ellip(8, 0.01, 108, 18413.3/22050*np.pi, 'low', analog=True)
b *= 10**(0.005/20) # center ripple

r, p, k = signal.residue(b, a)
b, a = signal.invres(r, p, 0)

### PLOT FILTER FREQ
len_w = 1000
w = np.linspace(0, np.pi, len_w)
_, H = signal.freqs(b, a, w)
f = w/(2*np.pi) * fs
magn = 20*np.log10(np.abs(H))
plt.plot(f, magn, label = "Passband Response")

num_folds = 1000
w_full = np.linspace(np.pi, (num_folds+1)*np.pi, len_w*num_folds)
_, H_full = signal.freqs(b, a, w_full)
energy = np.zeros(len_w)
for i in range(num_folds):
  energy += np.abs(H_full[i*len_w:][:len_w][::i%2*2-1])**2
aliasing = 10*np.log10(energy)
plt.plot(f, aliasing, label = "Cumulated Aliasing")

plt.ylim(-130, 5)
plt.axvline(x=fs/2, ls='--', linewidth=0.7, c="#BBC3CF")
plt.title("Magnitude Response of the Resampling Filter (Elliptic Resampling)")
plt.ylabel('Magnitude (dB)')
plt.xlabel('Frequency (Hz) for fs = ' + str(fs) + 'Hz')
plt.legend()
plt.show()

### PLOT FILTER GROUP DELAY
gd = np.unwrap(np.angle(H))
gd = -np.gradient(gd)/np.gradient(w)
plt.plot(f, gd)
plt.ylim(-0.1, 8.2)
plt.axvline(x=fs/2, ls='--', linewidth=0.7, c="#BBC3CF")
plt.title("Group Delay of the Resampling Filter (Elliptic Resampling)")
plt.ylabel('Group delay (samples)')
plt.xlabel('Frequency (Hz) for fs = ' + str(fs) + 'Hz')
plt.show()

# print("Group Delay at DC: " + str(gd[0]) + " Samples") # Better to have a integer delay.
# print("Magn at DC: " + str(aliasing[0]) + " dB")
# print("Magn at 4kHz: " + str(aliasing[f>4000][0]) + " dB")
# print("Magn at 10kHz: " + str(aliasing[f>10000][0]) + " dB")
# print("Magn at 12kHz: " + str(aliasing[f>12000][0]) + " dB")
# print("Slope per octave" + str(aliasing[f>1000][0] - aliasing[f>2000][0]) + " dB")
# print("Magn mean: " + str(np.mean(aliasing[f < 18000])) + "dB")
# print("Max deviation for f<1000Hz: ", np.max(np.abs(magn[f<=10000])), "dB")


Smoothing the Knee

As discussed, a soft knee is essential. During pitch shifts, a sharp knee could enter the audible range, causing unpleasant smearing effects. To smooth the spectrum and eliminate these artifacts, we might reduce the resonance of the most prominent section of our elliptic filter. This is similar to the resonance adjustment used in resonant filters. The result will be a smooth anti-aliasing filter with a short decay, giving a more musical response.
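
As a sketch of one possible adjustment (the exact tuning is not specified here, so the damping factor below is an arbitrary illustration), we can push the most resonant pole pair of the decomposition away from the imaginary axis:

Python
# b, a: the analog elliptic filter designed above
r_s, p_s, k_s = signal.residue(b, a)
q = np.abs(p_s) / (-2 * np.real(p_s))    # quality factor estimate of each section
worst = np.argmax(q)                     # the section closest to the imaginary axis
pair = np.isclose(p_s.real, p_s[worst].real) & np.isclose(np.abs(p_s.imag), np.abs(p_s[worst].imag))
damping = 1.5                            # > 1 increases the damping of that pole pair
p_soft = p_s.copy()
p_soft[pair] = p_s[pair].real * damping + 1j * p_s[pair].imag
b_soft, a_soft = signal.invres(r_s, p_soft, 0)

After such a change, the magnitude response should be re-checked against the criteria listed above, since moving a pole also affects the passband flatness.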

Comparison with Other Interpolators

To illustrate the differences, we plot the same graphs for several polynomial interpolators (drop-sample, linear, and Lagrange) and for a polyphase filter, represented by a windowed cardinal sine. As we can observe, linear interpolation not only attenuates the high end of the signal but also introduces excessive aliasing. In contrast, the polyphase filter provides a more favorable response, with a musically soft knee arising from the need to truncate (window) the kernel to keep CPU usage reasonable. Both methods are linear-phase, which can be advantageous; however, it also increases latency.

Additionally, the aliasing profile does not account for the extra filtering required by these interpolators when used for downsampling. This additional filter can be a significant source of aliasing/damping unless the developer opts for an elliptic filter, which is somewhat ironic given that we aim to utilize an elliptic filter ourselves. Furthermore, incorporating this extra filtering would compromise the linear phase characteristic.

For sampler use, one approach to avoid this additional filtering step is to mipmap the input samples, allowing the filtering to occur offline during sample loading. While this method can help reduce aliasing, particularly at high frequencies, it does not completely eliminate it, and it adds significantly more complexity.

With that in mind, the proposed method clearly stands out as the optimal approach for resampling audio signals in real-time.

Python
def drop_sample_kernel(t):
    return np.ones_like(t) * (0 <= t)* (t < 1)

def linear_kernel(t):
    return np.select([(0 <= t) * (t < 1), (1 <= t) * (t < 2)], [t, 2 - t])

# https://ccrma.stanford.edu/~jos/Interpolation/Lagrange_Interpolation_Coefficients_Orders.html
def lagrange2(t):
    return np.select([(0 <= t) * (t < 1), (1 <= t) * (t < 2), (2 <= t) * (t < 3)],
                     [t*(t-1)/2, -(t-1)*((t-1)-2), ((t-2)-1)*((t-2)-2)/2])

def lagrange3(t):
    return np.select([(0 <= t) * (t < 1), (1 <= t) * (t < 2), (2 <= t) * (t < 3), (3 <= t) * (t < 4)],
                     [(t*(t-1)*(t-2)/6), -(t-1)*((t-1)-1)*((t-1)-3)/2,  (t-2)*((t-2)-2)*((t-2)-3)/2, -((t-3)-1)*((t-3)-2)*((t-3)-3)/6])

def nuttall(t):
    t = (t + 1) / 2
    a = 7
    y  = 0.3635819
    y -= 0.4891775 * np.cos(2*np.pi*t)
    y += 0.1365995 * np.cos(4*np.pi*t)
    y -= 0.0106411 * np.cos(6*np.pi*t)
    y[t < 0] = 0
    y[t > 1] = 0
    return y

def sync_kernel(t):
    return np.piecewise(t, [t == 0, t != 0], [1, lambda t: np.sin(np.pi * t)/(np.pi * t)])

def sync_kernel_1(t, a):
    return sync_kernel(t) * nuttall(t/a)

oversampling = 40 # Kernel oversampling, need to be big enough so no aliasing interferences for analysis
kernel_duration = 1000 # Second = samples, increase in order to zero pad the kernel and get smoother spectrum
def linspace(start, stop, step):
  return np.linspace(start, stop, int((stop - start) / step + 1))
t = linspace(-kernel_duration, kernel_duration, 1 / oversampling)#[:-1]

for x_, name in zip([drop_sample_kernel(t), linear_kernel(t), lagrange2(t), lagrange3(t), sync_kernel_1(t, 20)], ("Neighbour Interpolator", "Linear Interpolator", "Lagrange Interpolator order=2", "Lagrange Interpolator order=3", "Polyphase Interpolator")):
  X = np.fft.rfft(x_) / oversampling
  w = np.fft.rfftfreq(len(x_), d=1/oversampling)
  f = w[:kernel_duration] * fs
  magn = 20*np.log10(np.abs(X[:kernel_duration]))
  energy = np.zeros(kernel_duration)
  for i in range(30):
    energy += np.abs(X[(i+1)*kernel_duration:][:kernel_duration][::i%2*2-1])**2
  aliasing = 10*np.log10(energy)
  
  plt.plot(f, magn, label = "Passband Response")
  plt.plot(f, aliasing, label = "Cumulated Aliasing")
  plt.ylim(-130, 5)
  plt.axvline(x=fs/2, ls='--', linewidth=0.7, c="#BBC3CF")
  plt.title("Magnitude Response of the Resampling Filter (" + name + ")")
  plt.ylabel('Magnitude (dB)')
  plt.xlabel('Frequency (Hz) for fs = ' + str(fs) + 'Hz')
  plt.legend()
  plt.show()

  # print("Aliasing @ f=10kHz: ", aliasing[f>=10000][0], "dB")
  # print("Max deviation for f<1000Hz: ", np.max(np.abs(magn[f<=10000])), "dB")


We will now use the signal.residue() function to decompose our elliptic filter into eight parallel first-order filters.

In the code below, only one element from each pair of conjugate poles and residues is considered, as one is sufficient for computing the filter's real output.
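
This works because, for a real input, the state of the discarded conjugate section is the complex conjugate of the kept one, so each pair contributes

$$r\,e^{p\tau}s + \overline{r\,e^{p\tau}s} = 2\,\operatorname{Re}\left(r\,e^{p\tau}s\right)$$

where \(s\) is the section state and \(\tau\) the fractional time offset. This is why the generated coefficients below store the doubled residues (twoResidues) and the code only keeps the real part.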

For convenience, the corresponding C++ code has been generated, allowing for easy copying and pasting of the coefficients into a C++ application for testing various filter designs.

Python
### COMPUTE COEFFICIENTS
r, p, k = signal.residue(b, a)

p2 = p[::2]
r2 = r[::2]
print("--- Modal decomposition coefficients --- \n")
print(abs(p2), np.angle(p2))
print(abs(r2), np.angle(r2))

print("\n--- Generated C++ code ---\n")
print("static constexpr int numStages = " + str(len(p2)) + ";")
print("static const auto poles = createComplexVec4(")
for i, p2_ in enumerate(p2):
    print("  { " + str(np.real(p2_)) + ", " + str(np.imag(p2_)) + (" }," if i < len(p2)-1 else " }"))
print(");")

print("static const auto twoResidues = createComplexVec4(")
for i, r2_ in enumerate(r2):
    print("  { " + str(np.real(2*r2_)) + ", " + str(np.imag(2*r2_)) + (" }," if i < len(r2)-1 else " }"))
print(");")
--- Modal decomposition coefficients --- 

[1.46497914 2.02030223 2.55371515 2.83976107] [ 2.70037723  2.11868088 -1.82535138  1.64563957]
[2.6749963  1.93138821 0.95614734 0.2702628 ] [-1.25165703  2.5613368  -0.16040834 -2.24113278]

--- Generated C++ code ---

static constexpr int numStages = 4;
static const auto poles = createComplexVec4(
  { -1.324682919562635, 0.625602948081901 },
  { -1.0523402651805296, 1.7245872111955274 },
  { -0.6430633215429776, -2.4714227916635765 },
  { -0.21233856556873273, 2.8318113006572734 }
);
static const auto twoResidues = createComplexVec4(
  { 1.678557147926121, -5.079849075065903 },
  { -3.230526473231284, 2.1177205484517803 },
  { 1.887744925877217, -0.3054342242923359 },
  { -0.3358013524237038, -0.42356271734465195 }
);

Delay Formula of the Designed Filter

As the resampling is done in real time, some delay is visible in the figures presented above. Any real-time resampling introduces delay; even linear interpolation theoretically causes a 1/2-sample delay. The delay caused by the elliptic filter is minimal, thanks to its minimum-phase property. In practice, an audio sampler can compensate for this delay by precomputing and trimming the output audio sample. Since a delay of 2 samples is absolutely undetectable by the human ear, the only reason to compensate for it would be to align the phase perfectly. However, since the IIR filter is not linear-phase, there will still be a slight difference in waveform between the input and the output, even when the resampling factor is 1.

The delay of our filter is given by:

$$\Delta_{\text{resample}} = \tau_{g} \max\left(\frac{f_{s\,\text{out}}}{f_{s\,\text{in}}}, 1\right)$$

If the resampling is used to modify the pitch, by playing the sample faster or slower, the delay becomes:

$$\Delta_{\text{repitch}} = \tau_{g} \min(\operatorname{speed}, 1)$$

where \(\tau_{g}\) is the filter's group delay.
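
As a quick sketch, we can plug the group delay at DC computed in the design code above (gd[0]) into these formulas; the sample rates and playback speed below are illustrative values:

Python
tau_g = gd[0]                                    # group delay at DC, in samples (from the design plots above)
fs_in, fs_out = 44100, 48000
delta_resample = tau_g * max(fs_out / fs_in, 1)  # plain sample-rate conversion
play_speed = 0.5                                 # repitch: playing the sample at half speed
delta_repitch = tau_g * min(play_speed, 1)
print(delta_resample, delta_repitch)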

Implementation

Now that we understand recursive resampling and have the right elliptic filter coefficients, we can build the algorithm as follows:

  • Compute the elliptic filter output at the source sample’s sampling rate to ensure proper integration of each input sample.
  • Whenever the next target output sample falls before the next input sample to integrate, compute that output sample by decaying the state with the exponential of the fractional time offset to its exact timestamp.
Python
speed = 0.38 # Resampling factor

def linspace2(start, stop, step):
  return np.linspace(start, stop, int((stop - start) / step + 1))
  
t = linspace2(0, len(x), speed)[:-1]
y = np.zeros(len(t))

cutoff_shift = 1/max(1, speed) # Lower the cutoff frequency in the downsampling case (speed > 1)
p_ = cutoff_shift*p[::2] # Keep one pole of each conjugate pair (SciPy orders them in pairs); scaling the cutoff scales the poles
r_ = cutoff_shift*2*r[::2] # Residues are scaled too; the factor 2 accounts for the dropped conjugate (2*Re of one term)
exp_p = np.exp(p_)

for j in range(0, len(p_)): # One loop per kept pole: each conjugate pair is computed in a single pass
    i, state = 0, x[0]
    for l in range(len(y)):
        while(t[l] - i >= 1):
            i += 1
            state = exp_p[j]*state + x[i] # Decay the section state by one input sample and integrate the next impulse
        y[l] += np.real(r_[j] * np.exp((t[l] - i) * p_[j])*state) # Decay by the fractional offset; Re(...) sums the conjugate pair
        # y[l] = x[int(t[l])] # Drop-sample (naive) interpolation, uncomment to compare

plt.scatter(np.arange(len(x)), x, zorder=2)

plt.stem(np.arange(len(x)), x, basefmt=" ")
plt.xlim(-0.2, N+0.3)
plt.title('Input Source')
plt.ylabel('Amplitude')
plt.xlabel('Samples')
plt.show()
plt.stem(t, y, basefmt=" ")
plt.xlim(-0.2, N+0.3)
plt.title('Resampled')
plt.ylabel('Amplitude')
plt.xlabel('Samples')
plt.show()


Optimization

Avoiding Complex Exponentiation

To enhance the speed of our algorithm, it is essential to eliminate the complex exponentiation from the core loop. A complex exponentiation can be expressed as follows:

$$e^{a+bi} = e^{a} \left( \cos(b) + i \sin(b) \right)$$

The computational cost of this operation includes 1 exponentiation, 1 cosine, and 1 sine, making it relatively heavy for a CPU. Therefore, finding a more efficient alternative is advantageous.

At first glance, it may seem impossible to avoid recalculating the complex exponentiation for every sample, as its argument varies with each computation. However, the variation of this argument can only take two values, corresponding to either incrementing the input index by 1 or advancing the output position by the speed factor (constant within a processing block). Thus, we can express the result of the exponentiation in terms of its previous value multiplied by one of two precomputed geometric increments. This transformation allows us to replace the complex exponentiation with a complex multiplication, significantly reducing the computational load to just 1 ADD, 1 SUB, and 4 MUL, as illustrated by:

$$\left(a + bi\right)\left(c + di\right) = \left(ac - bd\right) + \left(ad + bc\right)i$$

And now, we can achieve remarkably fast resampling! :)

To prevent error propagation from the recursive computation of the complex exponential, it is wise to periodically recompute it using its closed form. In a real-time audio application, this recalculation should ideally take place at the beginning of the audio processing block function, allowing the state variable to be stored locally within that function.
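
Here is a small self-contained sketch of this idea for one illustrative pole: the fractional-offset exponential is advanced with the two precomputed increments and resynchronized with its closed form once per block:

Python
import numpy as np

pole = -0.21 + 2.83j            # illustrative pole
spd = 0.38                      # constant resampling increment per output sample
step_out = np.exp(pole * spd)   # advance by one output sample
step_in = np.exp(-pole)         # compensate when the input index advances by one
block = 256                     # resync period, e.g. once per audio processing block

idx, z = 0, 1.0 + 0.0j          # z tracks exp(pole * (pos - idx)) recursively
max_err = 0.0
for l in range(1, 20000):
    pos = l * spd
    while pos - idx >= 1:
        idx += 1
        z *= step_in            # one complex multiply instead of a complex exp
    z *= step_out
    if l % block == 0:
        z = np.exp(pole * (pos - idx))   # exact closed form, once per block
    max_err = max(max_err, abs(z - np.exp(pole * (pos - idx))))
print(max_err)                  # stays at round-off level thanks to the periodic resync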

In scenarios where the speed is not constant but varies linearly, the optimization technique can be applied twice: the geometric increment itself is updated recursively. Although this approach introduces one additional complex multiplication, it still eliminates the need for exponentiations. To go further, we can chain as many of these multiplications as needed to increase the order of the speed modulation shape, providing greater flexibility in handling various modulation profiles.

In the end, by refactoring the complex multiplications, the algorithm can be optimized to utilize only a limited number of multiplication (MUL) and addition/subtraction (ADD/SUB) operations per section. With our 8th-order elliptic filter, we are left with just four sections to compute, thanks to the conjugate pair optimization mentioned earlier. While this optimization ensures that our resampler is highly competitive, there are still opportunities for further enhancement, as discussed below.

SIMD Instructions

The parallel decomposition of the filter presents an opportunity to leverage SIMD (Single Instruction, Multiple Data) operations in our code. By employing vectors of dimension 4, which corresponds to half the number of poles, SIMD instructions (such as AVX or NEON) can be seamlessly integrated. Utilizing NEON on a Mac M1 yielded an impressive speedup of over 3 times, showcasing the effectiveness of this approach.
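
In Python, the same layout can be mimicked with NumPy by advancing the four kept sections as a single vector (a sketch reusing p_, r_, exp_p, x and t from the implementation above; the complex exponential is kept here for clarity, and a real-time version would map this layout onto AVX or NEON registers):

Python
y_vec = np.zeros(len(t))
i = 0
state = np.full(len(p_), x[0], dtype=complex)   # one complex state per section
for l in range(len(t)):
    while t[l] - i >= 1:
        i += 1
        state = exp_p * state + x[i]            # all four sections advance in lock-step
    y_vec[l] = np.real(np.dot(r_, np.exp((t[l] - i) * p_) * state))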

Additional Mathematical Optimizations

The while loop can also be optimized by updating the state and exp_p_frac variables less frequently, for instance every 8 iterations. This means the two complex multiplications within the loop are performed only once every 8 iterations. When feeding the state with the input, we only need to multiply the input samples by \(e^{p_{i} k}\) for \(1 \leq k < 8\), where \(p_{i}\) are the poles of the filter. These values can be precomputed in a table, and even calculated offline in the upsampling case.
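
A sketch of the batched update for one section is shown below (p_ comes from the implementation above; the bookkeeping for partial blocks and for the exp_p_frac variable is omitted):

Python
import numpy as np

p0 = p_[0]                                  # one section's pole
table = np.exp(p0 * np.arange(7, -1, -1))   # precomputed [e^{7p}, e^{6p}, ..., e^{0p}]
exp_8p = np.exp(8 * p0)

def advance_8(state, x_block):
    """Equivalent to applying state = e^p * state + x eight times:
    one complex multiply plus a dot product against the precomputed table."""
    return exp_8p * state + np.dot(table, x_block)

# Check against the sample-by-sample recursion
rng = np.random.default_rng(0)
xb = rng.standard_normal(8)
s_ref = 0.3 + 0.1j
for xn in xb:
    s_ref = np.exp(p0) * s_ref + xn
print(np.allclose(advance_8(0.3 + 0.1j, xb), s_ref))   # True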

This optimization resulted in a 30% reduction in the total cost of our C++ algorithm when the -O3 optimization flag was enabled on a Mac M1, with even greater improvements observed on Windows.

Additionally, moving the residue multiplication outside the loop contributed to the overall efficiency.

Hot Disabling for Speed = 1

In sound design, there are many scenarios where the interpolation speed perfectly matches 1, meaning no resampling is necessary. When this is the case, we can utilize memcpy(), which operates at remarkable speed. However, since our interpolator is not linear phase, even if we align the delay at DC to be an integer, there is no straightforward method to disable the interpolator without introducing artifacts.

In the context of our polyphonic sampler, we have the flexibility to switch between a linear interpolator and elliptic interpolation as the voice starts. This choice is informed by various indicators that help us predict whether resampling will be needed. The linear interpolator, which can be easily enabled or disabled, serves as a fallback in case our prediction is incorrect. Although this scenario may occur, it is relatively rare.

Oversampling Applications (Speed = \(2^n\))

Oversampling is a technique that allows for processing signals at a higher frequency rate. A common strategy for implementing oversampling is to use a simple polyphase filter. This approach capitalizes on the fact that oversampling involves both upsampling and downsampling with integer ratios, creating an ideal scenario for polyphase filters, as they eliminate the need to interpolate the filter kernel.

However, as previously discussed, polyphase filters can encounter several other challenges. Therefore, our solution remains a superior option in this context.

To adapt our algorithm for oversampling, we can implement specialized cases for speeds of 2, 4, 8, or 16, as well as for speeds of 1/2, 1/4, 1/8, or 1/16. This allows for further optimizations since the selected sample positions will now either be integers or subdivisions of integers. This change results in fewer variables, more constants, and ultimately, fewer operations.
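
As an illustration of why these cases simplify, here is a sketch of the speed = 2 path (reusing x, r and p from above): the output positions are integers, so the fractional exponential is always 1 and disappears from the inner loop, leaving two fixed state updates per output sample.

Python
p2x = 0.5 * p[::2]            # cutoff_shift = 1/2 for the downsampling-by-two case
r2x = 0.5 * 2 * r[::2]
e2x = np.exp(p2x)
y2x = np.zeros(len(x) // 2)
for j in range(len(p2x)):
    state = complex(x[0])
    for l in range(len(y2x)):
        if l > 0:
            state = e2x[j] * state + x[2*l - 1]   # integrate the two input samples
            state = e2x[j] * state + x[2*l]       # between consecutive outputs
        y2x[l] += np.real(r2x[j] * state)         # no fractional exp() needed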

By implementing this approach for specific speed cases, we enhance the efficiency of our algorithm by approximately 20% when using a C++ implementation with the -O3 optimization flag, further demonstrating that the initial algorithm is already performing exceptionally well.

Speed Interpolation Example

The code below demonstrates an example of resampling with varying sampling rates, without utilizing any complex exponentiation within the loop.

Python
speed = np.linspace(0.1, 0.9, 60)
dspeed = speed[1] - speed[0]

t = np.cumsum(speed) - speed # Exclusive cumulative sum: output l sits at sum(speed[:l]), which keeps indices in range and matches the recursive increments below
y = np.zeros(len(t))

# Precomputed scale factors to avoid a CPU-expensive complex exp inside the loop
exp_p_inv, exp_p_speed, exp_p_dspeed = 1 / exp_p, np.exp(p_ * speed[0]), np.exp(p_ * dspeed)

for j in range(0, len(p_)):
    i, state, exp_p_frac = 0, x[0], r_[j]
    for l in range(len(y)):
        while t[l] - i >= 1:
            i += 1
            state = exp_p[j] * state + x[i]
            exp_p_frac *= exp_p_inv[j] # Compensate the fractional exponential for the integer advance
        y[l] += np.real(exp_p_frac * state)
        exp_p_frac *= exp_p_speed[j] # Advance by the current speed increment...
        exp_p_speed[j] *= exp_p_dspeed[j] # ...and update the increment itself recursively (linearly varying speed)

plt.stem(np.arange(len(x)), x, basefmt=" ")
plt.xlim(-0.2, N+0.3)
plt.title('Input Source')
plt.ylabel('Amplitude')
plt.xlabel('Samples')
plt.show()
plt.stem(t, y, basefmt=" ")
plt.xlim(-0.2, N+0.3)
plt.title('Resampled (Varying Sample Rate)')
plt.ylabel('Amplitude')
plt.xlabel('Samples')
plt.show()


C++ Library

The algorithm has been successfully implemented in C++ and will soon be available on GitHub for both personal and commercial use. Feel free to share it if you find it helpful!

Conclusion

We have demonstrated that recursive resampling outperforms traditional methods in many aspects of real-time programming. The resampling quality is vastly superior to polynomial interpolators and is equivalent to that of a windowed-sinc interpolator in terms of magnitude fidelity, as illustrated in the previous graphs.

The algorithm itself is remarkably efficient and simple, as it incorporates the additional filtering typically required for downsampling scenarios. The elliptic resampler is resource-efficient compared to polyphase resamplers and is mostly comparable to polynomial interpolators regarding CPU and memory usage. Moreover, the elliptic resampler is particularly well-suited for real-time applications, with a latency of only 2 samples, thanks to its use of minimum phase filters.

We successfully utilized this solution in our resampler, allowing us to play more than 600 resampled voices simultaneously on a single CPU core of a MacBook Air M1, which is a significant achievement.

I hope you enjoyed this article; feel free to share your thoughts and feedback. See you next time! ☕