Handling memory when porting a Daisy patch to Lich with OWL3

Hirnlego · November 15, 2022, 5:08pm

Hi.

I have a complex C++ Daisy patch that I’m trying to port to a Lich with OWL3, but I’m having trouble making it fit.

Currently this is the memory usage as reported while compiling the Daisy patch:

       FLASH:      122888 B       128 KB     93.76%
     DTCMRAM:          0 GB       128 KB      0.00%
        SRAM:       31220 B       512 KB      5.95%
      RAM_D2:       16704 B       288 KB      5.66%
      RAM_D3:          0 GB        64 KB      0.00%
     ITCMRAM:          0 GB        64 KB      0.00%
       SDRAM:      12317 KB        64 MB     18.79%
   QSPIFLASH:          0 GB         8 MB      0.00%

I think the part giving me trouble (memory overflow) is the SDRAM used by the various audio buffers. I’m not quite sure, but I think that amount should fit in the OWL3’s SDRAM.

How much SDRAM does the OWL3 have exactly?

Should I define my variables in a specific way? This is the definition in the Daisy code:

 float DSY_SDRAM_BSS buffer1L[262144];
 float DSY_SDRAM_BSS buffer1R[262144];

 float DSY_SDRAM_BSS buffer2L[3][48000];
 float DSY_SDRAM_BSS buffer2R[3][48000];

 float DSY_SDRAM_BSS buffer3[1572864];

 float DSY_SDRAM_BSS buffer4L[4][96000];
 float DSY_SDRAM_BSS buffer4R[4][96000];

Are there any general gotchas or things to be aware of when porting from Daisy to OWL3?

antisvin · November 15, 2022, 10:43pm

Hi,

Don’t allocate those buffers statically. Instead, use FloatArray::create in the constructor and FloatArray::destroy in the destructor. Then your patch will use all available memory sections where possible. For multichannel buffers, use the AudioBuffer class the same way.

OWL2/3 has 8 MB of SDRAM, Xibeca (on AC/DC) has 32 MB. Your buffers need about 12 MB (the 12317 KB in your report) and won’t fit on an OWL3 board. There’s also ~1 MB of faster SRAM that would be used for buffers that fit in it, but that is insufficient in this case.

Hirnlego · November 16, 2022, 8:28am

Only 8 MB? Bummer, I hoped it had at least double that. Well, I guess I’ll have to cut here and there…

I’ll try the dynamic allocation, thanks for the info.

On a side note, what are the main differences between OWL3 and Xibeca? I couldn’t find exact specifications.

antisvin · November 16, 2022, 5:17pm

Memory aside, Xibeca has a slightly different MCU from the same family (H750 vs H743 on OWL3) with a different amount of internal flash (128KB vs 1MB). It’s not that important, as most of the data except firmware is stored on an external flash chip. But that chip is also different and is used with the QSPI peripheral, which is faster than the SPI used on OWL2/3. Xibeca also relies on DFU for flashing because, like Daisy, it can’t fit both bootloader and firmware on internal flash.

The most important change (SDRAM amount aside) is that Xibeca has a different codec with more channels (6+8), which opens up some new opportunities. It is also much more compact, fitting in just 6 HP, while 12 HP is required for the classic OWL board or 10 HP for the redesign used on Genius.

Befaco · November 16, 2022, 6:11pm

Bummer^2

The big problem here is that Xibeca cannot be built, while we have plenty of OWL3 boards.

Hirnlego · November 20, 2022, 4:47pm

Ok, I updated the code and used FloatArray and AudioBuffer wherever I needed a mono or a stereo buffer. I also reduced the size of the bigger buffer and now SDRAM should be sufficiently below 8 MB, but unfortunately I’m still getting “memory overflow” when loading the patch in the module.

Is there a way to know by how much the memory is overflowing? Or a usage report like the one displayed when compiling for Daisy?

antisvin · November 20, 2022, 7:22pm

Yes, if you run something like make grind PATCH=YourPatch then it will print various info including dynamically allocated data amount. This builds a native version of the patch and runs it for 1 cycle. Note that you must have a working version of valgrind in order to get this profiling info.

Hirnlego · November 21, 2022, 8:31am

Thank you, I’ll check it out!

Hirnlego · November 21, 2022, 4:16pm

Should I have anything else installed, apart from gcc? Or should I edit native.mk somehow?
I’m getting

Building patch Template
LibSource/FloatArray.cpp:429:22: error: use of undeclared identifier 'exp10f'
    destination[i] = exp10f(data[i]*0.05);
                     ^
1 error generated.
make[1]: *** [Build/Test/FloatArray.o] Error 1
make: *** [grind] Error 2

antisvin · November 21, 2022, 9:31pm

Do you follow the README file in that repo? What exactly are you doing to get that error?

Hirnlego · November 22, 2022, 8:21am

Yes, I followed the README, and by the way building the patch locally doesn’t throw any error:

> make PATCHNAME=<...> clean patch
Building patch <...>
>

Instead this won’t work:

> make PATCHNAME=<...> clean grind
Building patch <...>
LibSource/FloatArray.cpp:429:22: error: use of undeclared identifier 'exp10f'
    destination[i] = exp10f(data[i]*0.05);
                     ^
1 error generated.
make[1]: *** [Build/Test/FloatArray.o] Error 1
make: *** [grind] Error 2

My gcc:

> gcc -v
Apple clang version 13.1.6 (clang-1316.0.21.2.5)
Target: x86_64-apple-darwin21.5.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

My valgrind:

> valgrind --version
valgrind-3.21.0.GIT-lbmacos

I’ve run make libs, and that only gave an error because I’m missing emcc; otherwise everything seems fine.

antisvin · November 22, 2022, 11:51am

I think I’ve understood what’s happening. The exp10f function is defined only if -ffast-math is passed to the compiler, and that only happens for release builds. I think this is not the intended behavior, so maybe @mars will want to change this.

As a workaround you could try adding it in native.mk for debug builds. If my guess is correct, it’s not clear why you’re making debug builds, but a debug build is enabled if you select it explicitly with the CONFIG variable or have a TEST variable defined.

The error about missing emcc is not a problem, it only matters if you want to run patches in the browser.

Hirnlego · November 23, 2022, 4:28pm

Actually I’m not trying to make debug builds voluntarily. From what I see, building as I’m doing should already add the flag. In native.mk:

ifeq ($(CONFIG),Release)
CPPFLAGS    ?= -Os -ffast-math
endif

Release is the default value for CONFIG, I think. Even so, if I remove that if and leave the CPPFLAGS line, I keep getting the same error.

Looking at basicmath.h, it appears to me that this whole ifdef

#ifdef __FAST_MATH__ /* set by gcc option -ffast-math */

is inside this other block

#ifdef ARM_CORTEX

Maybe it doesn’t enter this one in the first place?

Hirnlego · November 23, 2022, 6:27pm

Ok, I managed to make it work by extracting from the ifdef all the fast math defines and also substituting in PatchRun.cpp

#include <malloc.h>

with

#include <stdlib.h>

because it wasn’t working on macOS (or so I read).

Now I’m getting a looong output that frankly I’m not able to decipher. These are the last few rows, what should I be looking at?

==73827== Process terminating with default action of signal 11 (SIGSEGV)
==73827==  Access not within mapped region at address 0x2F93D1CE8
==73827==    at 0x1000095EB: hlstk::Sampler::Process(float, float, float&, float&) (in ./Build/Test/patch)
==73827==    by 0x100007938: MyPatch::Process(float, float, float&, float&) (in ./Build/Test/patch)
==73827==    by 0x100005073: MyPatch::processAudio(AudioBuffer&) (in ./Build/Test/patch)
==73827==    by 0x100004467: main (in ./Build/Test/patch)
==73827==  If you believe this happened as a result of a stack
==73827==  overflow in your program's main thread (unlikely but
==73827==  possible), you can try to increase the size of the
==73827==  main thread stack using the --main-stacksize= flag.
==73827==  The main thread stack size used in this run was 67104768.
==73827== 
==73827== HEAP SUMMARY:
==73827==     in use at exit: 0 bytes in 0 blocks
==73827==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==73827== 
==73827== All heap blocks were freed -- no leaks are possible
==73827== 
==73827== For lists of detected and suppressed errors, rerun with: -s
==73827== ERROR SUMMARY: 463 errors from 35 contexts (suppressed: 0 from 0)
make[1]: *** [grind] Segmentation fault: 11
make: *** [grind] Error 2

antisvin · November 24, 2022, 9:43am

Nice! I suggest creating a PR on GitHub with your updates; I think Martin will want to make the current code more cross-platform if your changes don’t break anything. However, I can tell you that exp10f definitely works on Linux for native X86 builds.

After some more research:

exp10f (GNU Gnulib) :
This function is missing on some platforms: macOS 11.1...
https://github.com/muse-sequencer/muse/issues/584#issuecomment-492027865 - an attempt to define cross-platform exponential functions (specifically, exp10f is added for Apple and non-glibc platforms)

Can you confirm that __APPLE__ is defined when you build a native patch on MacOS? I think we’ll have to add exp10f and maybe something else in basicmath sources, just like we do it if ARM_CORTEX is defined.

Valgrind’s output is a bit cryptic, but I think that Access not within mapped region at address 0x2F93D1CE8 means that you’re trying to access an address that wasn’t allocated. This leads to a segfault when you run the program on desktop, or to the patch crashing on OWL. So your program is likely not failing due to lack of SDRAM, but because there’s some bug in the current code. If you don’t mind uploading it somewhere, I can have a look and might find something fishy going on.

Alternatively (I haven’t tried this myself yet) it should be possible to use make native CONFIG=Debug and debug it using GDB.

Hirnlego · November 24, 2022, 1:46pm

Yeah, I changed the whole block to this and it works fine, but I don’t think all of the ARM fast math defines should also apply to macOS; a curated block would be better:

#if defined(ARM_CORTEX) || defined(__APPLE__)

#ifdef ARM_CORTEX
#define sin(x) arm_sin_f32(x)
#define sinf(x) arm_sin_f32(x)
#define cos(x) arm_cos_f32(x)
#define cosf(x) arm_cos_f32(x)
#define sqrt(x) sqrtf(x)
/* #define sqrtf(x) arm_sqrtf(x) */
#define rand() arm_rand32()
#endif /* ARM_CORTEX */

#ifdef __FAST_MATH__ /* set by gcc option -ffast-math */

// fast lookup-based exponentials
#define pow(x, y) fast_powf(x, y)
#define powf(x, y) fast_powf(x, y)
#define exp(x) fast_expf(x)
#define expf(x) fast_expf(x)
#define exp2(x) fast_exp2f(x)
#define exp2f(x) fast_exp2f(x)
#define exp10(x) fast_exp10f(x)
#define exp10f(x) fast_exp10f(x)

// fast lookup-based logarithmics
#ifdef log2
#undef log2 /* defined in math.h */
#endif
#define log(x) fast_logf(x)
#define logf(x) fast_logf(x)
#define log2(x) fast_log2f(x)
#define log2f(x) fast_log2f(x)
#define log10(x) fast_log10f(x)
#define log10f(x) fast_log10f(x)

#else /* __FAST_MATH__ */

#define exp10(x) powf(10, x)
#define exp10f(x) powf(10, x)

#endif /* __FAST_MATH__ */

#undef RAND_MAX
#define RAND_MAX UINT32_MAX
#endif //ARM_CORTEX / __APPLE__

I’ll try it, thanks!

Hirnlego · December 23, 2022, 10:11am

Well, it appears that “memory overflow” is a catch-all error, I had an unrelated bug in the code.

Now I just have to figure out how to make MIDI work in the terminal, because having to use the web interface to load the patch is giving me headaches…

Hirnlego · December 23, 2022, 10:31am

Success!