M0AGX / LB9MG

Amateur radio and embedded systems

Reducing FFT code size of CMSIS DSP

CMSIS DSP is a fantastic library develeoped by ARM that provides various math primitives (like matrices, filters and FFT). It is usually my first pick when implementing signal processing on microcontrollers bacause it is highly optimized for ARM Cortex-M cores and is free. I was therefore quite surprised when my project suddenly stopped fitting into a 128 KB flash MCU after adding a simple FFT call.

I dived into the code of arm_rfft_init_q15.c and found out that the library uses lookup tables to speed up FFT. The problem is that no matter which FFT bin sizes you actually use, all possible bin size tables will be compiled in. This could be solved by enabling link-time optimization in the toolchain to constant propagate only the desired bin size, but it comes with its own set of risks so I decided to force the bin size in the init function:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
arm_status arm_rfft_init_q15(
    arm_rfft_instance_q15 * S,
    uint32_t fftLenReal,
    uint32_t ifftFlagR,
    uint32_t bitReverseFlag)
{
#if defined(OVERRIDE_CMSIS_DSP_RFFT_INIT_Q15_LENGTH)
    /* This saves flash space. Forcing it here will
     * constant propagate and optimize out other FFT tables.
     */
    fftLenReal = OVERRIDE_CMSIS_DSP_RFFT_INIT_Q15_LENGTH;
#endif

    /*  Initialise the default arm status */
    arm_status status = ARM_MATH_SUCCESS;

    /*  Initialize the Real FFT length */
    S->fftLenReal = (uint16_t) fftLenReal;

    /*  Initialize the Twiddle coefficientA pointer */
    S->pTwiddleAReal = (q15_t *) realCoefAQ15;

    /*  Initialize the Twiddle coefficientB pointer */
    S->pTwiddleBReal = (q15_t *) realCoefBQ15;

    /*  Initialize the Flag for selection of RFFT or RIFFT */
    S->ifftFlagR = (uint8_t) ifftFlagR;

    /*  Initialize the Flag for calculation Bit reversal or not */
    S->bitReverseFlagR = (uint8_t) bitReverseFlag;

    /*  Initialization of coef modifier depending on the FFT length */
    switch (S->fftLenReal)
    {
    case 8192U:
        S->twidCoefRModifier = 1U;
        S->pCfft = &arm_cfft_sR_q15_len4096;
        break;
    case 4096U:
        S->twidCoefRModifier = 2U;
        S->pCfft = &arm_cfft_sR_q15_len2048;
        break;
    case 2048U:
        S->twidCoefRModifier = 4U;
        S->pCfft = &arm_cfft_sR_q15_len1024;
        break;
    case 1024U:
        S->twidCoefRModifier = 8U;
        S->pCfft = &arm_cfft_sR_q15_len512;
        break;
    case 512U:
        S->twidCoefRModifier = 16U;
        S->pCfft = &arm_cfft_sR_q15_len256;
        break;
    case 256U:
        S->twidCoefRModifier = 32U;
        S->pCfft = &arm_cfft_sR_q15_len128;
        break;
    case 128U:
        S->twidCoefRModifier = 64U;
        S->pCfft = &arm_cfft_sR_q15_len64;
        break;
    case 64U:
        S->twidCoefRModifier = 128U;
        S->pCfft = &arm_cfft_sR_q15_len32;
        break;
    case 32U:
        S->twidCoefRModifier = 256U;
        S->pCfft = &arm_cfft_sR_q15_len16;
        break;
    default:
        /*  Reporting argument error if rfftSize is not valid value */
        status = ARM_MATH_ARGUMENT_ERROR;
        break;
    }

    /* return the status of RFFT Init function */
    return (status);
}

This tiny fix hardcodes the FFT bin size at compile time and optimizes out all other lookup tables. The hack could be supplemented by an assertion to give an obvious error when the demanded bin size is differen t to the one that is built. I am using CMSIS DSP V.1.5.1 in this particular project. This issue has been fixed by ARM. However updating complete libraries is often not easy in embedded projects (due to testing, compliance, certification and other reasons) so simple hacks like this can save the day. :)