Awesome! I haven’t tried the patch yet, but here are some suggestions from looking at the code:
float nextAttack = daisysp::fmax(0.01f, daisysp::fmin(env, 0.99f));
There’s no need to use daisysp here, because a call to std::min/max on floats compiles down to a single hardware instruction on ARM anyway. Also, in this case you could use std::clamp (I’d expect it to generate the same code, but it reads cleaner).
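For example (std::clamp is C++17, from <algorithm>):

float nextAttack = std::clamp(env, 0.01f, 0.99f);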
while (preDelay && outLen)
{
    ++outL;
    ++outR;
    --preDelay;
    --outLen;
}
This could probably run faster if you do it like this:
const int skipSamples = std::min(preDelay, outLen);
if (skipSamples) {
    outL += skipSamples;
    outR += skipSamples;
    preDelay -= skipSamples;
    outLen -= skipSamples;
}
Pass the data by value in the ::interpolated call to reduce the number of array lookups:
interpolated(left[i], left[j], t);
// ...
inline float interpolated(float a, float b, float t) const
{
    return a + t * (b - a);
}
You could optimize this part by making the buffer size a power of two and then using a bit mask to wrap the index instead of the modulo:
const int i = ((int)pos) % bufferSize;
const int j = (i + 1) % bufferSize;
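Something like this (only valid when bufferSize really is a power of two, otherwise the mask silently reads the wrong samples):

const int mask = bufferSize - 1; // e.g. 4096 - 1 = 0x0FFF
const int i = ((int)pos) & mask;
const int j = (i + 1) & mask;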
Those are trivial things, and the compiler likely does some of them already. Even if it doesn’t, there’s not much to gain from math optimizations here. The bigger issue is that you need to access the buffer data several times per grain, and this buffer only fits in SDRAM, which is relatively slow. So what you could do to improve performance is split the recording buffer into multiple pages (something like 4k each) that are allocated separately. That would move some of the data into SRAM, which is much faster to access. I’m not sure it’s worth the effort, though, since a pathological case could still occur where all grains read from pages in SDRAM.
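Very rough sketch of the paging idea (all names and sizes are made up for illustration; DSY_SDRAM_BSS is the libDaisy attribute that places a buffer in external SDRAM):

// Page table lives in fast memory; the first pages come from an SRAM pool,
// the rest fall back to SDRAM once the pool is exhausted.
constexpr int kPageSize  = 4096; // samples per page
constexpr int kNumPages  = 64;   // total buffer = 256k samples
constexpr int kSramPages = 8;    // pages that fit in the SRAM budget

static float sramPool[kSramPages * kPageSize]; // internal SRAM, fast
static float DSY_SDRAM_BSS sdramPool[(kNumPages - kSramPages) * kPageSize]; // external SDRAM, slow

static float* pages[kNumPages];

void InitPages()
{
    for (int p = 0; p < kNumPages; ++p)
    {
        pages[p] = (p < kSramPages)
                       ? &sramPool[p * kPageSize]
                       : &sdramPool[(p - kSramPages) * kPageSize];
    }
}

// All reads go through the page table instead of one flat array.
// kPageSize is a power of two, so the compiler turns / and % into shift/mask.
inline float ReadSample(int idx)
{
    return pages[idx / kPageSize][idx % kPageSize];
}

With this layout, grains whose read heads land in the first pages get SRAM speed; the rest behave as before.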