Interrupt remapping on Cortex-M0 (and other small chips)
Interrupt remapping is the ability to change the interrupt handlers at runtime. Why would you need to change handlers on the fly? During "normal" operation this is indeed unlikely. A simple branch like
if (something){ handle1(); } else { handle2(); }
in the handler is enough. Remapping is useful when there are totally different applications running, for example a manufacturing test application and a regular application, or a bootloader and the application.
Microcontrollers usually execute code from flash. The vectors are also kept in flash. Modification of flash in the field is slow, tricky, and has to be carefully designed to avoid bricking the device. I would avoid a brute force approach that modifies the vector table in flash. So what is a better approach?
If the vocabulary (vector, interrupt, handler etc.) sounds confusing I recommend one of my older articles.
Cortex-M0+ (where the chip vendor implements it) and larger cores have the VTOR register, which allows the vector table to be placed anywhere in memory (with certain alignment requirements). This makes remapping as simple as reserving a piece of RAM and putting the vectors there.
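For comparison, on a core with VTOR the whole remapping could look roughly like the sketch below. The register is passed in as a pointer so that the snippet also compiles on a host; with CMSIS on real hardware it would be &SCB->VTOR. The names ram_vectors, VECTOR_COUNT and relocate_vector_table are made up for this illustration, and the 256-byte alignment is an assumption that has to be checked against the reference manual of the actual core.

```c
#include <stddef.h>
#include <stdint.h>

#define VECTOR_COUNT 48u  /* hypothetical: 16 system exceptions + 32 IRQs */

/* RAM copy of the vector table; the alignment value is an assumption. */
static _Alignas(256) uint32_t ram_vectors[VECTOR_COUNT];

/* Copy the table from flash to RAM and point the VTOR register at the copy.
 * On real hardware vtor would be &SCB->VTOR (CMSIS) and interrupts should be
 * disabled around this call. */
uint32_t *relocate_vector_table(volatile uint32_t *vtor,
                                const uint32_t *flash_vectors)
{
    for (size_t i = 0; i < VECTOR_COUNT; i++) {
        ram_vectors[i] = flash_vectors[i];
    }
    /* The cast only truncates on a 64-bit host build. */
    *vtor = (uint32_t)(uintptr_t)ram_vectors;
    return ram_vectors;
}
```

After this call any entry of ram_vectors can be overwritten with a new handler address at runtime. The rest of the article is about chips where this register is not available.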
Cortex-M0 does not have this register, so the vectors have to be at a fixed address, which is usually in flash. The M0 is a pretty old core in 2024 but there are some devices with this core that can still be useful, like the W7500P. If the hardware does not support remapping then the only way around it is a software implementation. I will show a technique I call trampolines that uses a block of RAM and stub handlers to execute different handlers on the fly. All in pure C, no assembly required (pun intended). I have used this approach when the bootloader and the application had to have their own interrupt handlers.
The code shown in this article is needed only in the bootloader. The application does not even need to know that its interrupt handlers are called indirectly from a stub handler.
This approach is not limited to the Cortex-M0. It can work basically on any microcontroller with a small piece of RAM.
Reserving a piece of RAM
The first step is to look up all possible interrupt handlers and prepare a struct that looks similar to the regular vector table. In a CMSIS codebase the vectors are somewhere in the startup code. The struct holds the addresses of handlers for almost all interrupt sources. The only exception (in Cortex-M0) is the hard fault handler that needs some special treatment. To leave usable debugging information the hard fault handler needs some assembly so it is best to implement it fully in the bootloader's codebase and leave some debugging breadcrumbs for the application.
I use W7500P as an example. I created a struct for all the handlers starting from SysTick up to the last source.
This is just an ordinary struct with a function pointer for every interrupt source. It will be placed at the beginning of RAM by pretending in the linker script that the RAM start address is slightly higher. This prevents the struct contents from being erased by the application startup code. Of course you can make a "proper" custom section in the linker script as well.
The struct can be accessed for example using a macro like this:
#define GLOBAL_trampolines_block_ptr ((trampolines_t*)(RAM_BASE))
RAM_BASE can be provided by the build system or hardcoded somewhere else.
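A shortened sketch of such a struct together with the accessor macro could look like this. Only three entries are shown and the peripheral names (UART0_Handler, UART1_Handler) are placeholders, not the full W7500P list. The host_trampolines_block fallback and TRAMPOLINES_RESERVED_BYTES are my additions so that the snippet compiles on a PC; on the target RAM_BASE would be the real start of RAM.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*handler_t)(void);

/* One function pointer per relocated interrupt source, in vector table order.
 * Only three entries are shown; the peripheral names are placeholders. */
typedef struct {
    handler_t SysTick_Handler;
    handler_t UART0_Handler;   /* hypothetical IRQ 0 */
    handler_t UART1_Handler;   /* hypothetical IRQ 1 */
} trampolines_t;

#ifndef RAM_BASE
/* Host stand-in for the reserved block; on the target RAM_BASE would be the
 * real start of RAM (for example 0x20000000) supplied by the build system. */
static trampolines_t host_trampolines_block;
#define RAM_BASE (&host_trampolines_block)
#endif

#define GLOBAL_trampolines_block_ptr ((trampolines_t*)(RAM_BASE))

/* Number of bytes the linker script has to keep free at the start of RAM. */
#define TRAMPOLINES_RESERVED_BYTES (sizeof(trampolines_t))
```

On a Cortex-M0 every entry is 4 bytes, so the RAM origin in the linker script would have to move up by at least 4 bytes per relocated handler.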
Stub handlers
Every interrupt has a tiny handler that will call the function from the trampolines struct. Example:
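A minimal sketch of two such stubs, assuming a cut-down struct with one hypothetical peripheral interrupt (UART0_Handler) and a host-side stand-in (host_block) for the fixed RAM address:

```c
#include <stdint.h>

typedef void (*handler_t)(void);

typedef struct {
    handler_t SysTick_Handler;
    handler_t UART0_Handler;   /* hypothetical peripheral interrupt */
} trampolines_t;

/* Host stand-in for the RAM block; on the target this is a fixed address. */
static trampolines_t host_block;
#define GLOBAL_trampolines_block_ptr (&host_block)

/* Stubs carry the names the startup code expects in the flash vector table.
 * Each one does nothing except call through its slot in RAM. */
void SysTick_Handler(void)
{
    GLOBAL_trampolines_block_ptr->SysTick_Handler();
}

void UART0_Handler(void)
{
    GLOBAL_trampolines_block_ptr->UART0_Handler();
}

/* Example of a "real" handler that gets installed into the block. */
static volatile uint32_t tick_count;

void SysTick_Handler_bootloader(void)
{
    tick_count++;
}
```

Once SysTick_Handler_bootloader is stored in the SysTick_Handler slot, every call of the stub ends up in it. The stubs do not check for an empty slot, so every enabled interrupt needs a valid entry before it can fire.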
This is one of the very few places where I do not apply DRY. Of course there could be a single handler that would look up the active interrupt number, calculate where the appropriate handler is within the struct, and call it. That would mean more code executed at every interrupt, and interrupt latency should be as short as possible.
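For illustration, the single-handler alternative could be sketched as follows. This is not what I use. On the target the exception number would be read from the IPSR register (for example with the CMSIS intrinsic __get_IPSR()); here it is a function parameter so that the snippet runs on a host. The names common_handler, trampolines, RELOCATED_COUNT and demo_irq0 are hypothetical.

```c
#include <stdint.h>

typedef void (*handler_t)(void);

#define FIRST_RELOCATED_EXCEPTION 15u  /* SysTick is exception number 15 */
#define RELOCATED_COUNT 3u             /* hypothetical: SysTick + 2 IRQs */

static handler_t trampolines[RELOCATED_COUNT];

/* One handler shared by all sources: turn the exception number into a table
 * index, check it, then call through the table. */
void common_handler(uint32_t exception_number)
{
    /* Numbers below 15 wrap around to a huge index and fail the range check. */
    uint32_t index = exception_number - FIRST_RELOCATED_EXCEPTION;
    if (index < RELOCATED_COUNT && trampolines[index] != 0) {
        trampolines[index]();
    }
}

/* Demo handler for IRQ 0 (exception number 16). */
static uint32_t last_called;

static void demo_irq0(void)
{
    last_called = 16u;
}
```

The subtraction, the range check and the indexed load are all extra work compared to a dedicated stub, which is why I prefer the repetitive version.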
Memory map
This map shows all the elements of the bootloader and the application.
All diagrams were made with Ditaa.
Starting the bootloader
The bootloader has to place its real handlers in the trampolines struct. The real handlers should also use a name different from the handlers in the startup code, for example SysTick_Handler_bootloader.
The handler can be installed like this:
GLOBAL_trampolines_block_ptr->SysTick_Handler = SysTick_Handler_bootloader;
Unfortunately every handler that is used by the bootloader has to be explicitly assigned in the struct. There is no easy way around it. It will be slightly easier for the application.
Handling interrupts in the bootloader
Starting the application
Before the application is started the trampolines struct has to be filled with the vectors from the application. You can see the typical vector table layout in my older article. The application only has to be linked at a higher address in flash. No other changes are needed! If the application binary follows the standard Cortex-M convention then populating the trampolines struct is as simple as a memory copy.
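A minimal sketch of such a copy, assuming the relocated range starts at the SVC_Handler slot and ends at SysTick (5 words); a real bootloader would extend RELOCATED_VECTOR_COUNT to cover the peripheral interrupts as well. The function name load_application_vectors and the count are hypothetical, and the trampolines struct has to mirror exactly the range that is copied. The application vector table is passed in as a pointer; on the target it would be the address APPLICATION_BASE.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values; the real ones depend on the flash layout. */
#define APPLICATION_VECTORS_START_OFFSET (11u * 4u)  /* SVC_Handler slot */
#define RELOCATED_VECTOR_COUNT 5u  /* SVC .. SysTick; add IRQs as needed */

/* Copy the application's handler addresses, word by word, from its vector
 * table into the trampolines block. */
void load_application_vectors(uint32_t *trampolines_block,
                              const uint32_t *application_base)
{
    const uint32_t *src = application_base
        + APPLICATION_VECTORS_START_OFFSET / sizeof(uint32_t);

    for (size_t i = 0; i < RELOCATED_VECTOR_COUNT; i++) {
        trampolines_block[i] = src[i];
    }
}
```

Interrupts should stay disabled while the block is being rewritten, otherwise a stub could call a half-updated entry.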
I used a loop to avoid linking memcpy in my bootloader. It was the only place where it would be used and the code size was larger than with an explicit loop. memcpy would have worked identically.
APPLICATION_BASE is some compile-time constant where the application code can begin. For example if the bootloader occupies 9.2 kilobytes the application can start at the 10th kilobyte. A convenient address is the address of the first erasable chunk of flash after the bootloader.
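The rounding can be written down as a small helper. This is only a sketch: the erase unit of 1024 bytes and the flash base address of 0 are assumptions for the example, and the names align_up and application_base are made up.

```c
#include <stdint.h>

#define FLASH_BASE_ADDR   0x00000000u  /* assumption: flash starts at 0 */
#define ERASE_UNIT_SIZE   1024u        /* hypothetical erase granularity */

/* Round size up to the next multiple of unit (unit must be a power of two). */
uint32_t align_up(uint32_t size, uint32_t unit)
{
    return (size + unit - 1u) & ~(unit - 1u);
}

/* First erasable chunk of flash that the bootloader does not occupy. */
uint32_t application_base(uint32_t bootloader_size)
{
    return FLASH_BASE_ADDR + align_up(bootloader_size, ERASE_UNIT_SIZE);
}
```

With these assumptions a bootloader of 9421 bytes (about 9.2 kilobytes) gives an application base of 10240, that is 0x2800.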
APPLICATION_VECTORS_START_OFFSET is a constant that specifies the offset of the first relocated interrupt handler. If the layout is standard then this will be the SVC_Handler at offset 11 * 4 (every vector is 4 bytes long). If your application does not use an RTOS then the first vector to copy can be the SysTick_Handler.
After the copy is done the bootloader jumps to the function at the address specified by the application's reset vector (which is, again, by convention the second word of the vector table).
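The jump itself can be sketched like this. The names application_entry, start_application and vector_word_t are hypothetical. uintptr_t is used for the table entries because it is 32 bits wide on a Cortex-M0 and still lets the snippet compile on a 64-bit host.

```c
#include <stdint.h>

typedef void (*entry_t)(void);

/* uintptr_t is 32 bits wide on a Cortex-M0 and matches the vector size there;
 * it is used so that the sketch also compiles on a 64-bit host. */
typedef uintptr_t vector_word_t;

/* By convention word 0 is the initial stack pointer, word 1 the reset vector. */
entry_t application_entry(const vector_word_t *application_vectors)
{
    return (entry_t)application_vectors[1];
}

void start_application(const vector_word_t *application_vectors)
{
    entry_t entry = application_entry(application_vectors);
    /* On the target the stack pointer is usually reloaded from word 0 first,
     * e.g. with the CMSIS call __set_MSP(application_vectors[0]). */
    entry();  /* the application's reset handler is not expected to return */
}
```

On the target the argument would be the address APPLICATION_BASE cast to a pointer. The reset vector already has the Thumb bit set, so it can be called as an ordinary function pointer.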
Handling interrupts in the application
Overhead
Let's have a look at the disassembly. Every stub handler is pretty short, identical, and differs only by the address constant. There are 4 instructions before the actual handler is called (the blx) and 1 instruction after the actual handler (the pop). So the interrupt latency is worse by 4 instructions (5 in the case of tail-chaining due to the pop). Not too bad. 🙂
There is also a nop that is generated for function alignment. It will not be executed as the pop will change the program counter so it does not contribute to overall latency. It just wastes 2 bytes.