Faust "Hello World"

dxinteractive · May 25, 2020, 12:47am

Hey, I’m keen to try Faust.

Is there a good starter template that simply passes through audio and shows how to get readings from knobs and buttons? I can gradually infer this from reading many patches in the patch library, but a “here’s how to Faust Owl in 2020” would be a more stable starting point if it exists.
Are there any difficulties I should watch out for if I choose to use Faust with OWL? Performance? Version incompatibilities with different compilers etc?
Are there any 3rd party libraries you like? I’ve noticed a few uses of guitarix.lib in the patch library for instance, which interests me as they have many different kinds of tube emulations just ready to go.

antisvin · May 26, 2020, 8:50am

Hi,

I was actually preparing some documentation that explains Owl-specific Faust usage. It’s mostly ready, so I can upload a link here. It’s meant to describe the next update, which means that a few features are not available yet, as they are not merged to the official branch of the OwlLibrary repo that the Rebeltech online compiler uses. Will try to finish it soon.

For now you should go through their official tutorials, since questions like “how do I get audio pass through” are not really specific to Owl (btw, it’s simply “process = _;”)

Generally, Faust is in a better shape than PD. Performance is great (I’ve seen better results than hand-written C++ for some common DSP code). Faust went through some compiler changes over the years (switch to LLVM and a few version upgrades), but everything seems fairly stable for now. You may consider setting up a local build environment if you’re working on a Linux machine - personally I find it easier to build locally than to use the online compiler.

Guitarix is great, but there’s so much stuff in the official Faust library that I haven’t needed to bother with anything else for a long time. Also, I’m pretty sure they’ve been merging some stuff originating from guitarix into it.

sletz · May 26, 2020, 9:40am

Concerning Faust recent developments:

we just added (in version 2.24.1, see commit “Modulo based -dlm 1 is now -dlm 2. Add a faster and correct 'if based…” · grame-cncm/faust@675cb0d on GitHub) an option to choose between three possible delay line implementations. -dlm 0 is the existing next power-of-two + mask based one, which produces fast code at the expense of more memory consumption (since the DL size takes the next power-of-two value). We added -dlm 1, which lowers memory consumption (using DL size + 1) and uses select based wrapping of read/write indexes (but is slower than -dlm 0), and -dlm 2, which lowers memory consumption (using DL size + 1) and uses modulo based wrapping of read/write indexes (even slower than -dlm 1). These options can be quite helpful in embedded use cases.
Yann is working on a new compilation model with even more choices between different “code shapes”. This is quite promising since even in pure scalar mode, we can see quite substantial improvements in CPU speed in a lot of cases.
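For readers who want to see what the three delay-line models trade off, here is a minimal Python sketch of the buffer sizing and index wrapping as described in the post above. It is an illustration of the arithmetic only, not the code Faust generates; the function names are made up for this example, and the exact rounding the compiler applies may differ.

```python
def next_pow2(n: int) -> int:
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def buffer_size(dl_size: int, model: int) -> int:
    """Buffer length in samples, following the wording of the post.

    model 0: next power of two (more memory, cheapest wrapping)
    model 1 and 2: dl_size + 1 (less memory, costlier wrapping)
    """
    if model == 0:
        return next_pow2(dl_size)
    return dl_size + 1


def advance_mask(idx: int, size: int) -> int:
    """-dlm 0 style: bit mask, only valid when size is a power of two."""
    return (idx + 1) & (size - 1)


def advance_select(idx: int, size: int) -> int:
    """-dlm 1 style: compare and select, works for any size."""
    nxt = idx + 1
    return 0 if nxt == size else nxt


def advance_modulo(idx: int, size: int) -> int:
    """-dlm 2 style: integer modulo, works for any size."""
    return (idx + 1) % size
```

For a delay line of 1000 samples this gives 1024 samples of storage with the mask model and 1001 with the other two, which is where the memory saving comes from.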

antisvin · May 26, 2020, 2:49pm

@sletz, do you think it would be possible to have dlm options that would work like dlm0 if delay line size is a power of 2, otherwise use algorithm from dlm1 or 2? This way there would be no performance loss for most common use case.

That said, I’ll run some benchmarks to see if there is any noticeable overhead from the new algorithms on Owl.

antisvin · May 26, 2020, 2:54pm

Here is the document I mentioned - https://github.com/antisvin/OpenWareLab/blob/faust-docs/Faust/Faust.md . For now you’ll have to ignore a few features that are not yet merged to the official branch (no VOct scaling support and shorter parameter lists). I was going to spend some more time on this doc, but I guess it would be useful as is.

sletz · May 26, 2020, 4:30pm

@antisvin: “do you think it would be possible to have dlm options that would work like dlm0 if delay line size is a power of 2, otherwise use algorithm from dlm1 or 2?” Yes, sure, this would be a possible way to mix options. Another one would be to use -dlm 0 for “small” delay lines and -dlm 1 for “bigger” ones.

antisvin · May 26, 2020, 5:10pm

I think that an even better heuristic would be to look at the difference between the required delay line length and the next power of two. This covers the use case that you’ve mentioned (very short delay lines), but also uses the faster algorithm when it doesn’t introduce significant memory overhead, even if the delay line itself is fairly long. Think something like length = pow(2, N) - 1 as worst case scenario here.
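The heuristic proposed here can be sketched in a few lines of Python. This is only an illustration of the idea, not anything the Faust compiler implements; `pick_model` and the `max_waste` cutoff of 256 samples are hypothetical names and values chosen for the example.

```python
def next_pow2(n: int) -> int:
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def pick_model(dl_size: int, max_waste: int = 256) -> int:
    """Choose a delay-line model from the padding a power of two would cost.

    Returns 0 (mask based) when rounding up wastes at most max_waste
    samples, otherwise 1 (select based). max_waste is an assumed cutoff.
    """
    waste = next_pow2(dl_size) - dl_size
    return 0 if waste <= max_waste else 1


# Exact powers of two and lengths just below one keep the fast model,
# while a length just above a power of two falls back to the compact one.
choices = {n: pick_model(n) for n in (100, 1024, 65535, 65537)}
```

With these inputs, 100, 1024 and 65535 all select the mask model, while 65537 (which would otherwise be padded to 131072) selects the select-based one.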

dxinteractive · May 26, 2020, 10:36pm

That’s great news! I’ll give your link a read. If the main library covers most things then that’s even better than having to keep an eye out for external libraries.

I’ve been trying out Faust on their online playground and so far it’s been good. I’ve never written in a functional language for audio / dsp before, it’s really suitable. I’ll keep an eye out for the release of the new compiler. In the meantime I’ll try to set up a local compiler and just get some test effects happening, thanks

sletz · January 23, 2021, 4:05pm

The Faust -dlm option to play with the delay-line model has finally been reworked as a -dlt one, with documentation here: Optimizing the Code - Faust Documentation.

mars · January 23, 2021, 5:07pm

Great, we can start using this option with the online compiler (once we upgrade the FAUST version).
@antisvin what do you think is a good threshold? I’m thinking something relatively small, like 4096 samples / 16k bytes.
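As a rough illustration of what such a threshold means, the sketch below assumes 32-bit float samples (4 bytes each) and assumes the threshold works as the Faust documentation describes it: mask-based buffers up to the threshold, select-based ones above it. The function names are invented for this example.

```python
BYTES_PER_SAMPLE = 4  # assumption: single-precision float samples


def bytes_for(samples: int) -> int:
    """Memory needed to store the given number of samples."""
    return samples * BYTES_PER_SAMPLE


def model_for(max_delay: int, dlt: int) -> str:
    """Which wrapping scheme a delay line would get for a given threshold."""
    return "mask" if max_delay <= dlt else "select"


threshold = 4096
threshold_bytes = bytes_for(threshold)  # 16384 bytes, i.e. 16k
```

Under these assumptions a 2000-sample delay would stay mask based with a threshold of 4096, while a 48000-sample delay would use the select-based buffer.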

antisvin · January 23, 2021, 5:22pm

I’ll do some benchmarking and post results here, just rebuilt FAUST for this.

antisvin · January 23, 2021, 6:14pm

The results are interesting. I’ve used this reverb patch from our library for testing: https://www.rebeltech.org/patch-library/patch/Owlgazer_Shimmer_Reverb

First of all, on current OWL (STM32F4 MCU) we get the expected effects of trading some extra RAM for lower CPU utilization:

dlt - RAM - CPU
1024 - 341k - 88%
2048 - 352k - 85%
4096 - 362k - 82%
8192 - 369k - 80%

DLT values outside of that range made no difference.
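To put numbers on the trade-off in the table above, here is a small Python sketch that computes how much RAM is spent and how much CPU is gained between the smallest and largest settings. The data is copied from the table; the helper name is made up for this example.

```python
# (dlt, RAM in k, CPU in %) as measured on the STM32F4 OWL
f4_results = [
    (1024, 341, 88),
    (2048, 352, 85),
    (4096, 362, 82),
    (8192, 369, 80),
]


def tradeoff(rows):
    """Return (extra RAM in k, CPU percentage points saved), first row vs last."""
    _, ram_first, cpu_first = rows[0]
    _, ram_last, cpu_last = rows[-1]
    return ram_last - ram_first, cpu_first - cpu_last


extra_ram_k, cpu_points_saved = tradeoff(f4_results)
```

Going from 1024 to 8192 costs 28k of RAM and saves 8 percentage points of CPU on this patch.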

But the real surprise is running it on an STM32H7 MCU, which is similar to what we’ll have on the next generation OWL board (am I leaking secrets here already?). The two important points are that it has cache and that there’s 512 KB of internal SRAM that is used for dynamic memory before SDRAM. SDRAM has much higher latency and things get worse as we get more cache misses on it.

dlt - RAM - CPU
1024 - 341k - 8%
2048 - 352k - 8%
4096 - 363k - 7%
8192 - 369k - 7%

So, as long as we’re not allocating on SDRAM, it doesn’t seem to make any measurable difference.

To confirm this, I’ve run another patch that adds several more long buffers and would require utilizing SDRAM. This patch can’t run on the current OWL, which is not performant enough for it.

dlt - RAM - CPU
1024 - 6M - 22%
2048 - 6M - 26%
4096 - 6M - 30%
8192 - 6M - 36%

So on H7 we have the opposite effect - increasing DLT requires more SDRAM access and it slows things down a lot. Luckily we can have separate settings for those platforms.

I’d suggest using 4k or 8k setting for OWL2 and 1k for OWL3.
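A build script could keep these per-platform settings in one place. The following Python sketch is only an illustration: the platform keys and the helper are hypothetical, 4096 is picked from the suggested 4k or 8k range for OWL2, and it assumes the threshold is passed to the Faust compiler as `-dlt <n>`.

```python
# Suggested thresholds from the benchmarks above (platform keys are hypothetical)
DLT_BY_PLATFORM = {
    "OWL2": 4096,  # STM32F4: trade some RAM for lower CPU load (8192 also suggested)
    "OWL3": 1024,  # STM32H7: keep buffers small to avoid SDRAM traffic
}


def faust_dlt_flags(platform: str) -> list[str]:
    """Extra compiler arguments selecting the delay-line threshold for a platform."""
    return ["-dlt", str(DLT_BY_PLATFORM[platform])]


owl2_flags = faust_dlt_flags("OWL2")
owl3_flags = faust_dlt_flags("OWL3")
```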