Getting a usable bootloader turned out to be too complicated. The simpler approach of executing from QSPI won't work, because QSPI flash can either be read (and executed from) or written to, but not both at once. Another approach suggested by ST is to use QSPI as ROM and upload data to external memory. This should work, but initialization hangs when it resets the RCC clock. I'm not sure how to solve this yet, plus it seems to require initializing FMC through registers before HAL is started.
So for now I will try to fit everything in flash - currently at 88k out of 128k. To allocate the math LUTs, I've written a script that converts binary data to resources (i.e. adds a resource header and uses little-endian format). It can also parse data from C headers, so I can convert the pow/log tables.
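As a rough illustration of what such a conversion produces, here is a minimal C sketch that packs a float table into a resource blob. The header layout (a magic word, the payload size, and a 16-byte name), the `RES_MAGIC` value, and the function names are all assumptions made up for this example; the actual resource format used by the firmware is not described in this log.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical resource header layout (not the firmware's real format):
 *   bytes 0..3   magic word, little endian ("RES1" when read as bytes)
 *   bytes 4..7   payload size in bytes, little endian
 *   bytes 8..23  NUL-padded resource name
 */
#define RES_MAGIC       0x31534552u
#define RES_NAME_LEN    16u
#define RES_HEADER_SIZE (4u + 4u + RES_NAME_LEN)

_Static_assert(sizeof(float) == 4, "this sketch assumes 32-bit floats");

/* Store a 32-bit value byte by byte, so the output is little endian
 * regardless of the byte order of the machine running the converter. */
static void put_u32_le(uint8_t *dst, uint32_t v)
{
    dst[0] = (uint8_t)(v & 0xFFu);
    dst[1] = (uint8_t)((v >> 8) & 0xFFu);
    dst[2] = (uint8_t)((v >> 16) & 0xFFu);
    dst[3] = (uint8_t)((v >> 24) & 0xFFu);
}

/* Pack a float table (e.g. a pow or log LUT) into `out` as header + data.
 * Returns the number of bytes written, or 0 if `out` is too small. */
size_t res_pack_floats(uint8_t *out, size_t out_cap, const char *name,
                       const float *table, uint32_t count)
{
    size_t payload = (size_t)count * 4u;
    size_t total = RES_HEADER_SIZE + payload;
    if (out_cap < total)
        return 0;

    put_u32_le(out, RES_MAGIC);
    put_u32_le(out + 4, (uint32_t)payload);

    /* Zero the name field first so a short name stays NUL-padded
     * and a long name is truncated but still NUL-terminated. */
    memset(out + 8, 0, RES_NAME_LEN);
    strncpy((char *)(out + 8), name, RES_NAME_LEN - 1u);

    for (uint32_t i = 0; i < count; i++) {
        uint32_t bits;
        memcpy(&bits, &table[i], sizeof bits); /* reinterpret float bits safely */
        put_u32_le(out + RES_HEADER_SIZE + 4u * (size_t)i, bits);
    }
    return total;
}
```

Writing each word byte by byte keeps the output identical on any host, which matters if the converter runs on a machine with a different byte order than the little-endian Cortex-M target.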
Once I confirm that this script works correctly, I will add some code to preload data from resources to RAM. That should be the last major step left until the FW is usable.
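The preload step could look roughly like the C sketch below: validate a resource header in flash, then copy the payload into a RAM buffer. The header layout (magic word, payload size, 16-byte name), the `RES_MAGIC` value, and the name `res_preload` are hypothetical and chosen only for this example; the real resource format is not shown here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical resource header layout (an assumption for this sketch):
 *   bytes 0..3   magic word, little endian
 *   bytes 4..7   payload size in bytes, little endian
 *   bytes 8..23  NUL-padded resource name
 */
#define RES_MAGIC       0x31534552u
#define RES_NAME_LEN    16u
#define RES_HEADER_SIZE (4u + 4u + RES_NAME_LEN)

/* Read a little-endian 32-bit value without relying on alignment. */
static uint32_t get_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Copy the payload of the resource at `res` into `dst`.
 * Returns the number of bytes copied, or 0 if the header is invalid,
 * the payload is truncated, or `dst` is too small. */
size_t res_preload(const uint8_t *res, size_t res_len,
                   void *dst, size_t dst_cap)
{
    if (res_len < RES_HEADER_SIZE)
        return 0;
    if (get_u32_le(res) != RES_MAGIC)
        return 0;

    uint32_t payload = get_u32_le(res + 4);
    if (payload > res_len - RES_HEADER_SIZE)
        return 0; /* header claims more data than is present */
    if (payload > dst_cap)
        return 0; /* RAM buffer too small */

    /* The payload is already little endian, which matches the Cortex-M
     * target, so a plain byte copy is enough. */
    memcpy(dst, res + RES_HEADER_SIZE, payload);
    return payload;
}
```

Returning 0 on any mismatch lets the caller fall back to reading the table directly from flash instead of using a partially filled RAM buffer.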
Made some experiments with using ITCM/DTCM memory. ITCM can be used for loading the most commonly used code via linker script + custom startup; I've confirmed that this works. DTCM will store data/bss/stack, and I may use half of it for a fast heap section, as was done with CCM on older devices.
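For reference, the custom startup part usually comes down to a word copy from the flash load address to the ITCM run address before `main` is reached. The C sketch below shows only that copy loop; the function name and the linker symbol names mentioned in the comments are hypothetical, and the actual linker script and startup code used here may differ.

```c
#include <stdint.h>

/* Copy a section word by word from its load address (flash) to its run
 * address (e.g. ITCM). In real startup code the three arguments would come
 * from linker script symbols, for example (hypothetical names):
 *
 *   extern uint32_t _sitcm;   // start of the section in ITCM
 *   extern uint32_t _eitcm;   // end of the section in ITCM
 *   extern uint32_t _siitcm;  // load address of the section in flash
 *   startup_copy_section(&_sitcm, &_eitcm, &_siitcm);
 *
 * Functions would be routed into that section with something like
 * __attribute__((section(".itcm_text"))), where the section name is also
 * an assumption and must match the linker script.
 */
void startup_copy_section(uint32_t *dst, const uint32_t *dst_end,
                          const uint32_t *src)
{
    while (dst < dst_end)
        *dst++ = *src++;
}
```

The same routine can serve the `.data` copy into DTCM, since that is also a flash-to-RAM copy bounded by linker symbols.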