Practical FFT on microcontrollers using CMSIS DSP

Fourier transform is a vast domain of knowledge with many practical applications within signal processing. Virtually all communications protocols use Fourier transform at one step or another (including LTE, GPS and WiFi). Another popular example are the "jumping bars" in music players showing levels of low and high tones in real time. In this post I show the basics of obtaining spectrum of an audio signal on an EFM32 Cortex-M3 microcontroller. No scary math!

Fourier Transform basics

Fourier transform itself can be treated like a mathematical black box that takes any function or combination of functions as the input and outputs a set of sine and cosine functions with different amplitudes, frequencies and phases. Fourier transform "works" on mathematical formulas.

Real-world signals don't come packaged as mathematical formulas but rather as series of samples from an analog-to-digital converter so their processing requires a slightly different tool called a discrete Fourier transform. This transform "works" on numbers. Fast Fourier transform (FFT) is a practical implementation of the discrete Fourier transform with reduced computational complexity.

FFT takes a series of numbers (signal samples). Length of the series has to be a power of two. This dimension is commonly called FFT length or FFT size. FFT output is also a series of numbers of the same length. Each of the output numbers is called an FFT bin. A bin can be thought as a group of frequencies (or as a single vertical jumping bar in a music player).

Example

Let's have a look at a simple system suitable for DTMF detection.

An ADC works with a specific sampling frequency. I will assume Fs = 8 kHz in this example. Per the sampling theorem the maximum input frequency (without aliasing) of the system can be 4 kHz - this is enough for voice communication and DTMF. FFT size is 256. This leads to the following numbers:

• There are 256 frequency bins.
• Each bin is 4000 Hz / 256 = 15.625 Hz wide (theoretically!).
• 256 samples are required to compute the FFT.
• Acquisition time is 0.032 seconds.

First important implication: close frequencies can not be distinguished (only groups of frequencies 15 Hz wide). Second important implication: within the acquisition time of 0.032 seconds it is not possible when each frequency started and ended. This is the fundamental limit of Fourier transform - the more bins (and better frequency resolution), the more samples have to be put it which leads to poor temporal resolution. On the other hand the smaller the size of the FFT - the better temporal resolution and worse frequency resolution. This is called the Gabor limit.

(Over)simplifications

For brevity I omitted phase information. FFT returns not only the amplitude per frequency bin but also the phase (a complex number), however for simple audio applications phase information is not needed. CMSIS DSP library has functions for both complex (with phase) and real (without phase) FFT. Real FFT is slightly faster.

Second simplification is that the frequency bin does not only "pick" adjacent frequencies but also frequencies further away (with diminishing amplitude). This is called spectral leakage and, if needed, can be dealt with by applying a window function to the incoming samples. In my example no window function is used so the effective window is rectangular.

ARM CMSIS DSP library

ARM provides a free and open source library optimized for all Cortex cores with most commonly used math and DSP functions. It is called CMSIS DSP and is available on Github.

CMSIS DSP is often present in blank projects of vendor-provided IDEs but is also easy to integrate with a Makefile project. The only tweak (for EFM32) I had to make was to add those two lines to `arm_math.h`:

 ```1 2``` ``````#include #define ARM_MATH_CM3 ``````

The library provides functions in two flavors: floating-point and fixed-point. Without diving deep into the theory, floating-point functions use types like `float` or `double`. Using floats require a floating-point unit (coprocessor) or very costly software emulation using libraries. On the other hand - the fixed-point counterparts use only integer types and interpret them as floats. For example instead of storing and processing \$1.23 as float (when it comes to money - bad idea anyway) it can be stored as (integer) 123 and interpreted as the original value, just scaled by a factor of 100. Scaling by powers of 10 is easy for humans but scaling by powers of 2 is easier on CPUs. A typical fixed-point format is the Q number format.

My MCU is a Cortex-M3 and has no floating-point unit so I obviously chose the fixed-point implementation. CMSIS DSP has an internal type called `q15_t`, that is simply an `int16_t` but interpreted as a real number between -1 and +1.

Fixed-point also has the benefit of avoiding conversion from int to float when data to be processed is acquired by the built-in ADC.

Making test data

Math functions can take any data in and output some numbers. But how am I going to figure out if they make sense? I start with test data, that I know what to expect in the end. In the case of FFT I started with pure sine waves to verify that I can actually see a proper spectrum after FFT.

CMSIS DSP library is optimized to work only on ARM processors so it would be hard to test it on a PC. To keep the testing simple I decided to hardcode the test data in C arrays and return the spectrum to the PC via the debugger.

How to generate tones in Audacity:

1. Select project rate 8000 (bottom left).
2. Use Generate->Tone with the particular frequency.
3. Length has to be 256 samples (does not have to be longer).
4. File->Export Audio
6. Select encoding "Signed 16-bit PCM"

Exported `.raw` files contain binary data. I used `xxd` command (part of `vim`) to convert them to C code. Example: `xxd -i 697hz.raw > 697hz.inc`

The output contains an `unsigned char` array and its length:

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46``` ``````unsigned char __697hz_raw[] = { 0x00, 0x00, 0x4d, 0x35, 0x04, 0x5b, 0x24, 0x66, 0x67, 0x53, 0x4e, 0x28, 0x68, 0xf1, 0xcd, 0xbe, 0x36, 0x9f, 0xf5, 0x9b, 0xe6, 0xb5, 0x87, 0xe5, 0xe1, 0x1c, 0xca, 0x4b, 0x8e, 0x64, 0xf0, 0x5f, 0x43, 0x3f, 0x1f, 0x0c, 0x6b, 0xd5, 0x2d, 0xab, 0xb5, 0x99, 0x29, 0xa6, 0xd6, 0xcc, 0x80, 0x02, 0x69, 0x37, 0x22, 0x5c, 0xee, 0x65, 0xf0, 0x51, 0xfe, 0x25, 0xf4, 0xee, 0xe2, 0xbc, 0x71, 0x9e, 0x81, 0x9c, 0xa6, 0xb7, 0xf2, 0xe7, 0x3f, 0x1f, 0x76, 0x4d, 0xfd, 0x64, 0x09, 0x5f, 0x4a, 0x3d, 0xa3, 0x09, 0x2a, 0xd3, 0xcd, 0xa9, 0xa2, 0x99, 0x5c, 0xa7, 0x0b, 0xcf, 0xf6, 0x04, 0x82, 0x39, 0x2f, 0x5d, 0xab, 0x65, 0x6a, 0x50, 0xac, 0x23, 0x7e, 0xec, 0x07, 0xbb, 0xb5, 0x9d, 0x20, 0x9d, 0x71, 0xb9, 0x5b, 0xea, 0xa5, 0x21, 0x08, 0x4f, 0x67, 0x65, 0x0e, 0x5e, 0x49, 0x3b, 0x27, 0x07, 0xef, 0xd0, 0x7c, 0xa8, 0x99, 0x99, 0xa6, 0xa8, 0x3a, 0xd1, 0x79, 0x07, 0x89, 0x3b, 0x33, 0x5e, 0x56, 0x65, 0xd8, 0x4e, 0x53, 0x21, 0x0e, 0xea, 0x34, 0xb9, 0x0a, 0x9d, 0xce, 0x9d, 0x41, 0xbb, 0xd2, 0xec, 0xf7, 0x23, 0x9e, 0x50, 0xb4, 0x65, 0x0d, 0x5d, 0x3c, 0x39, 0xa8, 0x04, 0xbe, 0xce, 0x37, 0xa7, 0xa0, 0x99, 0xfc, 0xa9, 0x73, 0xd3, 0xf6, 0x09, 0x8d, 0x3d, 0x24, 0x5f, 0xf4, 0x64, 0x3b, 0x4d, 0xf5, 0x1e, 0xa0, 0xe7, 0x6b, 0xb7, 0x6f, 0x9c, 0x8a, 0x9e, 0x20, 0xbd, 0x47, 0xef, 0x46, 0x26, 0x27, 0x52, 0xf1, 0x65, 0x01, 0x5c, 0x23, 0x37, 0x2b, 0x02, 0x94, 0xcc, 0xfc, 0xa5, 0xbd, 0x99, 0x58, 0xab, 0xb9, 0xd5, 0x6d, 0x0c, 0x88, 0x3f, 0x09, 0x60, 0x80, 0x64, 0x93, 0x4b, 0x91, 0x1c, 0x36, 0xe5, 0xb2, 0xb5, 0xdc, 0x9b, 0x5a, 0x9f, 0x05, 0xbf, 0xbe, 0xf1, 0x98, 0x28, 0x96, 0x53, 0x2c, 0x66, 0xdc, 0x5a, 0x09, 0x35, 0xab, 0xff, 0x6f, 0xca, 0xd5, 0xa4, 0xe2, 0x99, 0xca, 0xac, 0xfc, 0xd7, 0xeb, 0x0e, 0x73, 0x41, 0xe2, 0x60, 0xfd, 0x63, 0xde, 0x49, 0x2b, 0x1a, 0xd1, 0xe2, 0xfe, 0xb3, 0x61, 0x9b, 0x31, 0xa0, 0xf8, 0xc0, 0x39, 0xf4, 0xdb, 0x2a, 0x05, 0x55, 0x4b, 0x66, 0xb2, 0x59, 0xe2, 0x32, 0x2d, 0xfd, 0x53, 0xc8, 0xb9, 0xa3, 0x1a, 0x9a, 0x43, 0xae, 0x4d, 0xda, 0x5e, 0x11, 0x5b, 0x43, 0xa8, 0x61, 0x6d, 0x63, 0x1d, 0x48, 0xc3, 0x17, 0x6a, 0xe0, 0x5d, 0xb2, 0xef, 0x9a, 0x1a, 0xa1, 0xf6, 0xc2, 0xb0, 0xf6, 0x1f, 0x2d, 0x5f, 0x56, 0x62, 0x66, 0x76, 0x58, 0xb4, 0x30, 0xb1, 0xfa, 0x3d, 0xc6, 0xaf, 0xa2, 0x5d, 0x9a, 0xcc, 0xaf, 0x9f, 0xdc, 0xd5, 0x13, 0x34, 0x45, 0x63, 0x62, 0xc9, 0x62, 0x56, 0x46, 0x51, 0x15, 0x12, 0xde, 0xbe, 0xb0, 0x93, 0x9a, 0x10, 0xa2, 0xf9, 0xc4, 0x30, 0xf9, 0x54, 0x2f, 0xb3, 0x57, 0x65, 0x66, 0x2e, 0x57, 0x80, 0x2e, 0x32, 0xf8, 0x35, 0xc4, 0xac, 0xa1, 0xb6, 0x9a, 0x5d, 0xb1, 0xfb, 0xde, 0x42, 0x16, 0x08, 0x47, 0x0b, 0x63, 0x19, 0x62, 0x84, 0x44, 0xda, 0x12, 0xc0, 0xdb, 0x2b, 0xaf, 0x48, 0x9a, 0x0e, 0xa3, 0x0f, 0xc7, 0xa6, 0xfb, 0x8c, 0x31, 0xf3, 0x58, 0x5a, 0x66, 0xdc, 0x55, 0x3e, 0x2c, 0xbc, 0xf5, 0x2f, 0xc2, 0xbe, 0xa0, 0x1c, 0x9b, 0xf9, 0xb2, 0x5c, 0xe1, 0xad, 0x18, 0xd0, 0x48, 0xa4, 0x63, 0x5f, 0x61, 0x9d, 0x42, 0x6e, 0x10, 0x66, 0xd9, 0xaf, 0xad, 0x03, 0x9a, 0x25, 0xa4, 0x22, 0xc9, 0x28, 0xfe, 0xb4, 0x33, 0x29, 0x5a, 0x41, 0x66, 0x76, 0x54, 0x00, 0x2a, 0x3e, 0xf3, 0x3a, 0xc0, 0xd9, 0x9f, 0x90, 0x9b, 0xa6, 0xb4, 0xbc, 0xe3, 0x1a, 0x1b, 0x8a, 0x4a, 0x2e, 0x64 }; unsigned int __697hz_raw_len = 512; ``````

Test waveform has now type `unsigned char` (ie. 8-bits), the original export format was 16-bit signed and CMSIS DSP wants `q15_t`. Why this will work? Pointer to this array can be simply cast to `q15_t` (so as to treat two bytes as a single `int16_t` value). This only works because the raw format on x86 is little-endian and Cortex-M3 is also little endian (to be exact, the particular Cortex-M3 I am using is little-endian, just like 99% on the market).

FFT test code

The code uses SEGGER RTT for output. It processes all test signals in sequence, outputs (absolute value of) the spectrum and stops on a breakpoint.

All magic is done by a single call to `arm_rfft_q15`. The output is a vector (one dimensional array). Each of the elements represents a frequency bin. A positive value means that a particular bin is closer to a sine function, while a negative value, to cosine. Since I don't care about phase in this example - an absolute value is taken (using `arm_abs_q15`). The vector is then sent to the terminal.

To measure performance I use the cycle counter of Cortex-M3 Data Watchpoint and Trace Unit. It is a register that... simply counts how many clock cycles have elapsed. That of course counts interrupt traffic so the results are not always identical. This gives hard data on the execution time (better and easier than counting assembly output instructions). The cycle count register is read before doing the FFT and after. The difference is execution time in clock cycles.

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63``` ``````#include #include #include #include "697hz.inc" #include "770hz.inc" #include "852hz.inc" #include "941hz.inc" #include "1209hz.inc" #include "1336hz.inc" #include "1477hz.inc" #include "1633hz.inc" #include "whitenoise.inc" #include "DTMF_0.inc" #define FFT_SIZE 256 typedef struct { const char *desc; unsigned char *data; } test_wave_t; static const test_wave_t WAVES[] = { { "whitenoise", whitenoise_raw }, { "697Hz", __697hz_raw }, { "770Hz", __770hz_raw }, { "852Hz", __852hz_raw }, { "941Hz", __941hz_raw }, { "1209Hz", __1209hz_raw }, { "1336Hz", __1336hz_raw }, { "1477Hz", __1477hz_raw }, { "1633Hz", __1633hz_raw }, { "DTMF0", DTMF_0_raw }, }; void fft_test(void){ static arm_rfft_instance_q15 fft_instance; static q15_t output[FFT_SIZE*2]; //has to be twice FFT size arm_status status; status = arm_rfft_init_q15(&fft_instance, 256/*bin count*/, 0/*forward FFT*/, 1/*output bit order is normal*/); SEGGER_RTT_printf(0, "FFT init %d\n", status); for (uint32_t i = 0; i < sizeof(WAVES)/sizeof(WAVES[0]); i++){ uint32_t c_start = DWT->CYCCNT; arm_rfft_q15(&fft_instance, (q15_t*)WAVES[i].data, output); arm_abs_q15(output, output, FFT_SIZE); uint32_t c_stop = DWT->CYCCNT; SEGGER_RTT_printf(0, "%s %ld ", WAVES[i].desc, c_stop-c_start); for (uint32_t j = 0; j < FFT_SIZE; j++){ SEGGER_RTT_printf(0, "%d ", output[j]); } SEGGER_RTT_printf(0, "\n"); } __BKPT(1); } ``````

FFT results

 ``` 1 2 3 4 5 6 7 8 9 10``` ``````whitenoise 31840 477 0 1040 328 426 297 751 1476 651 358 85 101 531 175 433 67 55 195 164 888 870 556 75 7 847 1880 219 2036 276 725 864 383 667 421 205 305 372 283 345 980 304 1082 388 306 318 1118 962 301 374 433 463 1334 900 149 337 127 138 1054 656 201 767 555 770 422 495 282 272 325 531 842 303 263 662 485 151 626 78 455 1013 344 629 780 110 536 230 17 233 787 100 443 1438 234 370 183 303 153 190 538 1317 37 347 541 1176 110 1594 584 1122 738 1382 536 269 721 129 178 919 229 1950 347 714 82 1457 1974 1135 976 398 221 76 346 155 780 172 308 115 67 111 744 214 710 140 532 711 1042 816 195 472 390 411 101 462 369 227 833 395 133 521 468 16 689 942 653 373 165 222 375 455 660 584 582 290 25 1217 576 573 659 398 140 202 1480 687 206 382 628 1106 383 94 937 679 259 139 241 22 484 452 140 62 1261 547 756 490 1015 200 334 932 558 92 1086 579 228 19 339 888 1041 1011 1906 414 1631 586 410 29 324 175 224 810 863 598 804 1019 501 405 339 370 350 366 763 458 984 40 1167 954 356 76 475 264 681 188 988 135 659 878 1116 721 112 895 271 406 762 697Hz 31868 186 0 191 7 193 16 197 25 200 33 205 42 211 52 219 63 229 74 242 87 255 100 273 117 294 136 321 157 355 186 400 218 456 263 538 325 658 412 853 551 1216 811 2139 1468 9151 6430 3988 2865 1635 1198 1029 766 751 568 592 453 489 379 417 326 364 289 325 257 290 233 264 213 243 196 226 182 211 169 197 159 186 151 177 142 169 134 159 127 154 122 148 116 142 112 137 106 133 102 128 97 124 94 121 91 119 87 114 84 112 82 109 79 106 76 104 74 103 72 102 69 99 67 98 65 95 63 95 61 92 59 91 57 89 56 89 54 89 51 87 51 87 50 85 49 85 47 83 46 84 45 82 44 83 42 82 41 81 40 80 39 79 37 78 36 78 36 78 34 76 34 78 33 77 32 75 31 75 30 75 28 74 28 73 27 73 27 74 25 73 24 71 23 73 23 73 22 73 21 72 22 72 22 72 21 71 20 71 19 70 18 71 17 70 18 70 16 70 16 70 15 69 15 70 14 69 12 68 12 70 11 69 11 69 10 69 10 69 8 68 7 68 8 68 7 68 5 68 6 67 6 68 6 68 4 67 3 66 4 69 4 770Hz 32026 300 0 304 6 306 11 310 17 312 23 317 29 323 35 330 42 339 49 348 58 360 64 374 75 391 85 413 98 439 111 468 129 507 148 557 176 624 208 713 252 841 314 1041 411 1396 579 2183 951 5436 2483 9391 4492 2416 1209 1352 708 923 504 690 394 546 324 447 276 374 242 320 215 277 194 243 177 214 162 191 150 172 140 153 132 139 123 126 116 116 110 106 103 98 99 88 94 82 91 75 87 70 82 64 78 59 76 54 73 50 70 47 69 43 66 40 64 37 62 34 60 31 58 30 56 27 54 25 53 23 52 21 50 19 49 19 47 17 46 14 44 13 43 11 42 11 42 10 40 9 37 7 37 6 36 5 35 5 34 4 33 2 32 2 32 1 30 1 30 1 29 2 27 1 27 2 26 3 26 3 25 4 25 3 24 5 23 6 21 5 20 7 20 7 20 7 19 7 18 8 16 8 16 8 15 8 14 9 15 9 14 9 12 11 11 10 12 11 11 11 10 12 10 11 9 11 10 11 9 12 8 12 7 10 7 11 6 12 5 12 5 11 4 11 3 13 4 12 2 13 1 14 1 14 0 13 0 14 1 12 3 852Hz 31773 100 0 105 5 107 11 107 18 111 23 112 29 114 36 118 44 123 51 127 58 133 67 139 75 148 86 156 96 167 108 179 123 194 139 212 159 236 181 265 210 301 246 350 292 419 357 518 450 681 600 987 884 1779 1610 8578 7840 3101 2855 1326 1229 849 790 626 585 500 468 415 390 357 334 316 294 281 262 256 238 234 217 218 200 203 185 190 172 180 163 169 152 163 142 155 136 148 128 143 123 137 117 133 113 129 107 126 102 121 99 118 96 116 92 113 88 111 85 108 82 105 79 104 76 102 74 99 71 98 69 97 66 95 64 95 62 93 59 93 58 91 56 91 56 89 54 89 52 87 50 85 49 85 48 84 47 84 45 82 43 83 42 83 42 81 41 81 39 81 38 79 37 79 35 79 33 78 35 78 32 77 32 77 30 76 29 76 29 76 28 76 26 75 25 76 25 75 24 75 24 75 23 74 23 74 21 74 21 73 20 74 20 75 19 72 18 73 17 72 17 71 15 73 15 73 15 72 13 70 13 71 12 71 11 72 11 71 10 71 8 72 9 71 7 71 6 71 6 70 7 71 5 70 5 71 5 70 3 71 5 941Hz 32124 10 0 6 2 6 5 4 11 5 12 4 17 4 19 3 23 2 27 1 30 0 34 1 39 3 43 4 50 6 54 7 61 10 68 12 76 15 86 18 95 22 109 28 124 34 143 43 167 54 199 70 243 92 308 132 413 207 618 418 1195 4397 12033 587 1538 291 732 199 484 155 363 129 291 111 245 101 209 90 185 84 166 79 150 74 136 70 126 68 117 64 109 61 101 61 96 58 90 56 85 56 81 55 77 53 74 52 70 50 67 51 65 49 63 48 60 48 58 48 56 47 53 46 51 46 50 46 48 46 47 44 43 45 43 45 42 45 40 43 39 45 38 44 36 43 35 43 35 43 33 42 32 42 32 42 30 42 29 42 29 41 28 40 27 42 26 41 25 40 24 42 24 41 22 41 23 40 22 40 21 40 21 40 20 40 19 41 18 39 18 40 17 40 17 40 16 40 17 40 17 40 16 39 15 39 15 40 13 40 14 39 12 39 11 39 12 39 11 38 12 39 10 39 9 38 10 40 8 38 8 39 8 39 7 39 7 39 6 39 5 39 6 37 4 38 6 38 5 38 4 38 4 37 3 37 3 39 4 1209Hz 31861 175 0 179 4 182 6 183 9 183 12 184 14 185 17 186 21 189 24 192 27 193 30 195 34 198 36 201 41 205 45 209 49 213 53 218 58 224 63 229 69 238 74 244 82 253 89 264 95 276 106 289 117 305 128 324 142 346 158 373 177 406 201 449 229 507 267 581 319 689 393 855 504 1144 699 1777 1125 4256 2787 9153 6202 2121 1488 1174 851 798 598 598 462 472 377 386 319 325 277 278 245 242 219 212 198 188 182 166 167 150 156 136 144 123 136 111 127 103 121 93 114 85 107 77 103 71 98 66 94 62 89 56 85 51 82 48 79 45 75 42 73 39 70 36 67 33 65 30 62 27 61 26 58 22 57 21 54 20 52 18 50 15 49 14 48 12 46 11 44 10 43 9 41 8 39 7 38 6 37 5 37 5 35 3 33 2 33 1 33 0 31 0 30 0 28 2 26 2 27 2 24 3 22 4 21 6 20 4 20 5 20 5 18 6 17 7 17 7 16 7 14 8 14 8 13 9 13 9 11 9 12 9 11 8 10 9 10 10 8 8 8 10 7 9 6 11 4 11 3 9 3 11 2 11 0 11 0 11 0 11 4 1336Hz 31875 130 0 133 4 135 5 135 8 136 10 136 12 138 15 139 18 140 20 141 24 142 26 143 30 145 33 145 36 148 39 150 43 153 46 155 50 157 54 160 59 164 63 166 68 172 74 176 80 182 86 187 92 194 100 201 109 209 119 219 130 230 143 243 156 261 174 279 193 304 218 332 251 372 290 424 344 500 422 615 538 814 740 1241 1175 2805 2758 8241 8425 1587 1685 852 941 570 655 421 504 329 409 266 346 222 300 188 264 160 236 139 214 121 194 107 179 94 167 84 154 74 145 65 135 59 129 53 121 46 114 43 109 36 102 34 98 31 95 27 90 25 85 22 82 18 79 15 75 13 73 11 70 9 68 7 64 5 62 3 60 2 58 2 56 2 54 1 53 3 50 3 47 5 47 6 46 8 43 7 42 9 41 9 39 10 38 11 37 11 36 12 34 13 33 14 32 14 30 14 27 15 26 15 25 16 23 16 24 16 22 17 21 17 21 18 19 17 19 19 17 18 17 19 15 20 15 19 13 20 13 20 11 20 11 20 10 20 9 21 8 20 8 21 5 21 6 22 5 22 3 23 3 22 2 21 1 22 0 22 3 1477Hz 31865 26 0 29 1 31 4 32 6 31 8 32 10 33 12 33 14 34 16 35 20 36 22 37 25 38 26 40 29 42 33 43 35 46 37 44 39 49 43 51 46 53 49 56 53 59 57 62 62 65 65 69 70 75 75 78 81 84 85 90 93 96 100 104 108 115 118 125 127 135 139 149 153 166 168 185 185 208 209 239 237 276 272 325 317 394 381 493 475 656 624 964 906 1756 1632 8556 7860 3122 2833 1347 1208 870 768 647 565 519 448 436 370 379 315 335 275 303 245 275 219 254 199 236 182 221 169 209 156 198 146 188 134 179 126 172 120 167 114 160 107 154 101 150 97 145 92 142 87 139 84 134 80 131 77 129 74 126 71 123 68 122 64 119 62 118 60 116 58 113 55 113 53 109 51 109 49 108 48 106 46 106 44 105 43 103 41 103 39 102 38 99 37 99 35 99 33 98 32 96 34 96 32 96 29 96 29 94 27 94 27 95 26 93 25 93 23 93 22 92 21 92 20 90 19 92 18 90 17 90 16 91 15 90 13 90 14 89 11 90 11 88 10 88 9 88 9 88 8 86 7 88 7 88 6 87 5 88 4 87 5 1633Hz 32138 11 0 15 2 16 4 17 4 17 7 18 9 18 11 18 12 20 13 19 16 21 17 23 20 22 22 23 24 24 25 25 28 26 30 27 33 29 35 30 38 32 40 34 42 36 46 39 47 40 52 42 54 45 60 48 62 51 66 55 71 59 74 64 80 69 85 73 92 80 97 86 104 93 113 103 121 112 133 122 144 136 157 150 173 169 190 193 214 220 241 257 277 303 324 369 387 465 480 621 631 916 916 1680 1652 8412 8136 2953 2808 1285 1201 834 765 622 563 501 445 422 367 366 314 325 274 294 241 268 216 248 197 230 180 217 165 205 153 194 142 186 133 177 124 171 118 165 110 159 105 153 99 148 94 144 88 141 85 137 80 134 78 131 73 129 71 125 68 124 64 122 62 120 59 117 56 115 54 114 52 113 51 111 48 111 46 108 45 106 43 106 41 105 39 104 38 104 35 103 36 102 36 103 34 100 32 99 30 98 29 99 28 97 26 97 26 96 24 96 24 96 22 95 21 94 19 95 18 94 17 93 16 94 17 92 14 93 13 94 12 92 10 92 10 91 9 92 9 91 8 91 7 92 6 91 5 91 3 92 4 DTMF0 31820 9 0 5 5 4 6 5 6 4 1 4 2 2 2 1 0 1 6 3 10 5 13 6 10 6 4 2 1 1 0 0 5 2 18 7 29 12 34 13 29 9 15 3 1 2 2 2 28 22 88 57 180 100 283 138 370 143 376 55 145 1953 5287 492 1413 281 865 174 564 103 325 59 145 30 37 7 11 75 73 200 219 405 451 754 815 1676 1743 3727 3744 325 316 58 59 161 162 158 164 111 124 55 74 14 29 7 5 7 3 2 2 13 13 18 20 16 21 10 17 2 11 3 4 3 2 1 1 4 4 6 8 5 8 3 9 1 5 2 3 3 1 1 1 1 1 3 3 3 4 1 5 1 3 1 1 2 0 1 0 1 0 1 1 1 3 0 4 1 2 2 1 3 0 1 1 1 0 1 1 1 2 1 2 0 1 0 0 1 1 1 0 1 0 0 1 0 1 0 2 1 2 1 2 2 2 1 1 1 1 0 1 0 1 1 2 1 3 1 2 1 1 1 1 0 0 1 1 0 3 0 2 0 1 2 2 1 1 1 1 1 1 0 1 2 1 0 4 1 2 0 3 0 2 0 2 1 1 0 4 ``````

Raw output is not easy to follow. First column is waveform name, second column is execution time and all else are the frequency bins. After some gnuplot trickery everything looks clear (click to enlarge):

Wow! Exactly what I would expect:

• Single tones have a clear prominent peak (or two).
• The peaks move up as the frequency increases.
• White noise looks like it has all possible frequencies (yay!).
• DTMF tone has two peaks.

Performance

The average of all cycle counters is 31919. Remember that this function has to be eventually executed in real time on incoming ADC samples, all the time. Is that too much or is that acceptable? Let's do the math:

• Sampling frequency: 8 kHz (ie. 8000 samples per second)
• FFT size: 256
• CPU cycles to do the FFT: 31919
• CPU frequency: 48 MHz (EFM32GG332)
• Time to acquire 256 samples: 0.032 seconds (formula: 256/8000)
• Time to process 256 samples: 0.000664979 seconds (formula: 31919/48'000'000)
• Fraction of processing time to acquisition time: 2% (formula: 0.000664979/0.032)

What does the 2% figure mean in practice? While data is coming from the ADC to a memory buffer there is nothing yet to process (so the CPU can do something else or sleep). When a complete 256 sample chunk is ready it takes just 2% of the acquisition time to calculate the FFT (and another acquisition cycle is immediately started again). This is a very low CPU load for a non-trivial computation. All processing can be done in real-time and there is plenty of CPU power left. In my opinion, FFT on a fast Cortex-M3 is a justifiable solution, even for trivial tasks like tone detection. Tone detection could also be done by simpler algorithms (for example: banks of filters) that may be more power-efficient, however FFT gives the ultimate flexibility when it comes to analyzing the signal and drastically shortens development time (just two function calls!).