M0AGX / LB9MG

Amateur radio and embedded systems

Interrupt remapping on Cortex-M0 (and other small chips)

Interrupt remapping is the ability to change the interrupt handlers at runtime. Why would you need to change handlers on the fly? During "normal" operation this is indeed unlikely. A simple branch like if (something){ handle1(); } else { handle2(); } in the handler is enough. Remapping is useful when there are totally different applications running. For example a manufacturing test application and a regular application, or a bootloader and the application.

Microcontrollers usually execute code from flash. The vectors are also kept in flash. Modification of flash in the field is slow, tricky, and has to be carefully designed to avoid bricking the device. I would avoid a brute force approach that modifies the vector table in flash. So what is a better approach?

If the vocabulary (vector, interrupt, handler etc.) sounds confusing I recommend one of my older articles.

Cortex-M0+ and larger cores have the VTOR register, which lets you place the vector table anywhere in memory (with certain alignment requirements). On the M0+ the register is an optional feature, so check the datasheet of your chip. This makes remapping as simple as reserving a piece of RAM and putting the vectors there.
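For comparison, here is a rough sketch of what the "certain alignment requirements" mean in practice. It is not code from my bootloader. vtor_required_alignment is a hypothetical helper, and the rule it encodes (table size rounded up to a power of two, with a minimum of 128 bytes) is my reading of the ARM architecture manuals, so verify it against the reference manual of your chip. The register write itself is only shown as a comment because it needs the CMSIS headers, and FLASH_VECTORS in that comment is a made-up name.

```c
#include <stdint.h>

/* Hypothetical helper: alignment in bytes required for a vector table
 * with `num_vectors` entries (16 system exceptions + device interrupts).
 * Assumption: table size rounded up to a power of two, minimum 128 bytes. */
uint32_t vtor_required_alignment(uint32_t num_vectors) {
    uint32_t alignment = 128u;
    while (alignment < num_vectors * 4u) {
        alignment *= 2u;
    }
    return alignment;
}

/* On real hardware with CMSIS the remap would look roughly like this
 * (48 vectors assumed, FLASH_VECTORS is a hypothetical constant):
 *
 *   static uint32_t ram_vectors[48] __attribute__((aligned(256)));
 *   memcpy(ram_vectors, (const void *)FLASH_VECTORS, sizeof(ram_vectors));
 *   SCB->VTOR = (uint32_t)ram_vectors;
 *   __DSB();
 */
```

With 48 vectors the table is 192 bytes long, so under this assumption it has to sit on a 256-byte boundary.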

Cortex-M0 does not have this register so the vectors have to be at a fixed address, which is usually in flash. The M0 is a pretty old core in 2024 but there are some devices with this core that can still be useful, like the W7500P. If the hardware does not support remapping then the only way around it is a software implementation. I will show a technique I call trampolines that uses a block of RAM and stub handlers to execute different handlers on the fly. All in pure C, no assembly required (pun intended). I have used this approach when the bootloader and the application had to have their own interrupt handlers.

The code shown in this article is needed only in the bootloader. The application does not even need to know that its interrupt handlers are called indirectly from a stub handler.

This approach is not limited to the Cortex-M0. It can work basically on any microcontroller with a small piece of RAM.

Reserving a piece of RAM

The first step is to look up all possible interrupt handlers and prepare a struct that looks similar to the regular vector table. In a CMSIS codebase the vectors are somewhere in the startup code. The struct holds the addresses of handlers for almost all interrupt sources. The only exception (in Cortex-M0) is the hard fault handler that needs some special treatment. To leave usable debugging information the hard fault handler needs some assembly so it is best to implement it fully in the bootloader's codebase and leave some debugging breadcrumbs for the application.

I use W7500P as an example. I created a struct for all the handlers starting from SysTick up to the last source.

typedef void (*interrupt_handler_t)(void);

typedef struct {
    interrupt_handler_t SysTick_Handler;
    interrupt_handler_t SSP0_Handler;
    interrupt_handler_t SSP1_Handler;
    interrupt_handler_t UART0_Handler;
    interrupt_handler_t UART1_Handler;
    interrupt_handler_t UART2_Handler;
    interrupt_handler_t I2C0_Handler;
    interrupt_handler_t I2C1_Handler;
    interrupt_handler_t PORT0_Handler;
    interrupt_handler_t PORT1_Handler;
    interrupt_handler_t PORT2_Handler;
    interrupt_handler_t PORT3_Handler;
    interrupt_handler_t DMA_Handler;
    interrupt_handler_t DUALTIMER0_Handler;
    interrupt_handler_t DUALTIMER1_Handler;
    interrupt_handler_t PWM0_Handler;
    interrupt_handler_t PWM1_Handler;
    interrupt_handler_t PWM2_Handler;
    interrupt_handler_t PWM3_Handler;
    interrupt_handler_t PWM4_Handler;
    interrupt_handler_t PWM5_Handler;
    interrupt_handler_t PWM6_Handler;
    interrupt_handler_t PWM7_Handler;
    interrupt_handler_t RTC_Handler;
    interrupt_handler_t ADC_Handler;
    interrupt_handler_t WZTOE_Handler;
    interrupt_handler_t EXTI_Handler;
} trampolines_t;

This is just an ordinary struct with a function pointer for every interrupt source. It is placed at the beginning of RAM by pretending in the linker script that the RAM start address is slightly higher. This prevents the application startup code from erasing the struct contents. Of course you can make a "proper" custom section in the linker script as well.

The struct can be accessed for example using a macro like this:

#define GLOBAL_trampolines_block_ptr ((trampolines_t*)(RAM_BASE))

RAM_BASE can be provided by the build system or hardcoded somewhere else.
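To show the idea without any hardware, here is a minimal sketch that can be compiled on a PC. Everything prefixed with demo_ or DEMO_ is hypothetical: the struct is shortened to two members, and a static variable stands in for the reserved piece of RAM (on the target DEMO_RAM_BASE would be a fixed address). The static assertion checks the assumption that the struct is nothing more than a packed list of pointers.

```c
#include <stdint.h>

typedef void (*interrupt_handler_t)(void);

/* Shortened, hypothetical version of trampolines_t to keep the sketch small. */
typedef struct {
    interrupt_handler_t SysTick_Handler;
    interrupt_handler_t UART0_Handler;
} demo_trampolines_t;

/* The struct must be a plain list of pointers with no padding. */
_Static_assert(sizeof(demo_trampolines_t) == 2 * sizeof(interrupt_handler_t),
               "unexpected padding in the trampolines struct");

/* Stand-in for the reserved RAM; on the target this is a fixed address. */
static demo_trampolines_t demo_reserved_ram;
#define DEMO_RAM_BASE ((uintptr_t)&demo_reserved_ram)
#define DEMO_trampolines_block_ptr ((demo_trampolines_t *)(DEMO_RAM_BASE))

static int demo_tick_count;
static void demo_systick(void) { demo_tick_count++; }

/* Install a handler through the macro, then call it the way a stub would.
 * Returns how many times the handler has run so far. */
int demo_install_and_fire(void) {
    DEMO_trampolines_block_ptr->SysTick_Handler = demo_systick;
    DEMO_trampolines_block_ptr->SysTick_Handler();
    return demo_tick_count;
}
```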

Stub handlers

Every interrupt has a tiny handler that will call the function from the trampolines struct. Example:

void SysTick_Handler(void) {
    GLOBAL_trampolines_block_ptr->SysTick_Handler();
}

void SSP0_Handler(void) {
    GLOBAL_trampolines_block_ptr->SSP0_Handler();
}

void SSP1_Handler(void) {
    GLOBAL_trampolines_block_ptr->SSP1_Handler();
}

// more and more handlers here...

void EXTI_Handler(void) {
    GLOBAL_trampolines_block_ptr->EXTI_Handler();
}

This is one of the very few places where I do not apply DRY. Of course there could be a single handler that looks up the active interrupt number, calculates where the appropriate handler is within the struct, and calls it. That would execute more code on every interrupt, and interrupt latency should be as short as possible.
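For illustration, the single-handler alternative could look roughly like the sketch below. It is not what I use. On the target the exception number would be read from the IPSR register (__get_IPSR() in CMSIS); here it is a function parameter so the code runs on a PC. The table size, the demo_ helpers, and the return codes are all made up for the example. The only hardware fact it relies on is the standard Cortex-M numbering, where SysTick is exception 15 and the first device interrupt is 16.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*interrupt_handler_t)(void);

#define SYSTICK_EXCEPTION_NUMBER 15u  /* standard Cortex-M numbering */
#define DEMO_NUM_TRAMPOLINES 3u       /* hypothetical: SysTick, IRQ0, IRQ1 */

/* Stand-in for the trampolines block, viewed as an array of pointers. */
static interrupt_handler_t demo_table[DEMO_NUM_TRAMPOLINES];
static int demo_calls[DEMO_NUM_TRAMPOLINES];

static void demo_systick(void) { demo_calls[0]++; }
static void demo_irq1(void) { demo_calls[2]++; }

void demo_setup(void) {
    demo_table[0] = demo_systick;
    demo_table[2] = demo_irq1;   /* slot 1 (IRQ0) is left empty on purpose */
}

int demo_call_count(size_t slot) { return demo_calls[slot]; }

/* One handler shared by all sources.
 * Returns 0 if a handler was called, -1 if there was nothing to call. */
int generic_dispatch(uint32_t exception_number) {
    if (exception_number < SYSTICK_EXCEPTION_NUMBER) {
        return -1;
    }
    uint32_t index = exception_number - SYSTICK_EXCEPTION_NUMBER;
    if (index >= DEMO_NUM_TRAMPOLINES || demo_table[index] == NULL) {
        return -1;
    }
    demo_table[index]();
    return 0;
}
```

The register read, the subtraction, the checks, and the indexed load all run on every interrupt, which is exactly the extra work the individual stubs avoid.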

Memory map

This map shows all the elements of the bootloader and the application.

All diagrams were made with Ditaa.

[Diagram: memory map]

Starting the bootloader

The bootloader has to place its real handlers in the trampolines struct. The real handlers should also use a name different from the handlers in the startup code. For example SysTick_Handler_bootloader.

The handler can be installed like this:

GLOBAL_trampolines_block_ptr->SysTick_Handler = SysTick_Handler_bootloader;

Unfortunately every handler that is used by the bootloader has to be explicitly assigned in the struct. There is no easy way around it. It will be slightly easier for the application.
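RAM contents are undefined after power-up, so a slot that the bootloader never assigns holds garbage, and an unexpected interrupt would jump to a random address. One way to reduce that risk is to fill every slot with a catch-all handler first and then override the ones that are actually used. The sketch below is only an illustration that runs on a PC: the struct is shortened, the union and Default_Handler_bootloader are hypothetical names, and a static variable stands in for the reserved RAM.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*interrupt_handler_t)(void);

/* Shortened, hypothetical struct; the real one lists every source. */
typedef struct {
    interrupt_handler_t SysTick_Handler;
    interrupt_handler_t UART0_Handler;
    interrupt_handler_t EXTI_Handler;
} demo_trampolines_t;

/* View the same memory as named members or as an array of pointers. */
typedef union {
    demo_trampolines_t named;
    interrupt_handler_t slots[sizeof(demo_trampolines_t) / sizeof(interrupt_handler_t)];
} demo_trampolines_block_t;

static demo_trampolines_block_t demo_block;  /* stand-in for reserved RAM */
static int demo_unexpected_count;

/* Catch-all for sources the bootloader does not use. On the target it
 * could disable the offending interrupt or halt for the debugger. */
void Default_Handler_bootloader(void) { demo_unexpected_count++; }

void SysTick_Handler_bootloader(void) { /* real work goes here */ }

void bootloader_install_handlers(void) {
    size_t count = sizeof(demo_block.slots) / sizeof(demo_block.slots[0]);
    for (size_t i = 0; i < count; i++) {
        demo_block.slots[i] = Default_Handler_bootloader;
    }
    /* Explicit assignments for the handlers the bootloader really uses. */
    demo_block.named.SysTick_Handler = SysTick_Handler_bootloader;
}
```

The explicit assignments are still needed, but forgetting one now ends in a known handler instead of a random jump.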

[Diagram: memory map]

Handling interrupts in the bootloader

[Diagram: memory map]

Starting the application

Before the application is started the trampolines struct has to be filled with the vectors from the application. You can see the typical vector table layout in my older article. The application only has to be linked at a higher address in flash. No other changes are needed! If the application binary follows the standard Cortex-M convention then populating the trampolines struct is as simple as a memory copy.

// Copy application vectors to trampolines struct
uint32_t *src = (uint32_t *)(APPLICATION_BASE + APPLICATION_VECTORS_START_OFFSET);
uint32_t *dst = (uint32_t *)GLOBAL_trampolines_block_ptr;

for (uint32_t i = 0; i < sizeof(trampolines_t) / sizeof(uint32_t); i++) {
    *dst = *src;
    src++;
    dst++;
}
__DSB();

// Second word is the application reset handler
uint32_t app_entry_point_ptr = *((uint32_t *)(APPLICATION_BASE + sizeof(uint32_t)));

interrupt_handler_t app_entry_point = (interrupt_handler_t)app_entry_point_ptr;
app_entry_point();

I used a loop to avoid linking memcpy in my bootloader. It was the only place where it would be used and code size was larger than an explicit loop. memcpy would have worked identically.

APPLICATION_BASE is some compile-time constant where the application code can begin. For example if the bootloader occupies 9.2 kilobytes the application can start at the 10th kilobyte. A convenient address is the address of the first erasable chunk of flash after the bootloader.

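APPLICATION_VECTORS_START_OFFSET is a constant that specifies the offset of the first relocated interrupt handler within the application's vector table. Every vector is 4 bytes long. It has to match the first member of the trampolines struct: with the struct shown above, which starts at SysTick_Handler, the offset is 15 * 4. If your application uses an RTOS it will also need SVC_Handler and PendSV_Handler. In the standard layout SVC_Handler is at offset 11 * 4, so the struct and the stubs would have to start there instead, with placeholder members for the two reserved entries between SVC_Handler and PendSV_Handler.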
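The offsets follow directly from the standard Cortex-M0 vector table layout. The sketch below spells out the arithmetic and repeats the copy on plain word arrays so that it can be tried on a PC with a fake application image. The enum names, VECTOR_OFFSET, and copy_app_vectors are hypothetical helpers, not part of CMSIS or of my bootloader.

```c
#include <stddef.h>
#include <stdint.h>

/* Word indices in the standard Cortex-M0 vector table.
 * Entries not listed here (4..10, 12, 13) are reserved. */
enum {
    VECTOR_INITIAL_SP = 0,
    VECTOR_RESET      = 1,
    VECTOR_NMI        = 2,
    VECTOR_HARDFAULT  = 3,
    VECTOR_SVC        = 11,
    VECTOR_PENDSV     = 14,
    VECTOR_SYSTICK    = 15,
    VECTOR_IRQ0       = 16
};

/* Byte offset of a vector: every entry is one 32-bit word. */
#define VECTOR_OFFSET(index) ((uint32_t)(index) * (uint32_t)sizeof(uint32_t))

/* Copy `count` vectors, starting at word `first_index`, from an
 * application image into the trampolines block. Both are passed as
 * word arrays so the sketch works with fake data on a PC. */
void copy_app_vectors(const uint32_t *app_image, uint32_t *trampolines,
                      size_t first_index, size_t count) {
    for (size_t i = 0; i < count; i++) {
        trampolines[i] = app_image[first_index + i];
    }
}
```

Starting the copy at VECTOR_SYSTICK puts the application's SysTick vector into the first slot, which is where a struct that begins with SysTick_Handler expects it.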

After the copy is done the bootloader jumps to the function at the address specified by the application's reset vector (which is, again, by convention the second word).
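Jumping blindly is risky if the application area is empty or only half-programmed, so it can be worth checking the first two words before the jump. The following is a minimal sketch of such a check, not the one from my bootloader. The address ranges are hypothetical placeholders (take the real ones from the datasheet and your linker script) and app_vectors_look_valid is a made-up name. It relies on two facts: erased flash usually reads as 0xFFFFFFFF, and a Cortex-M vector must have bit 0 set because the core only executes Thumb code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical memory ranges, for illustration only. */
#define DEMO_APP_FLASH_START 0x00002800u
#define DEMO_APP_FLASH_END   0x00020000u
#define DEMO_RAM_START       0x20000000u
#define DEMO_RAM_END         0x20004000u

/* Sanity-check the first two words of the application vector table. */
bool app_vectors_look_valid(const uint32_t *app_vectors) {
    uint32_t initial_sp = app_vectors[0];
    uint32_t reset_vector = app_vectors[1];

    /* The stack grows down, so the top-of-RAM address itself is allowed. */
    if (initial_sp <= DEMO_RAM_START || initial_sp > DEMO_RAM_END) {
        return false;
    }
    if ((initial_sp & 3u) != 0u) {
        return false;                 /* stack pointer must be word-aligned */
    }
    if ((reset_vector & 1u) == 0u) {
        return false;                 /* Thumb bit missing */
    }
    uint32_t reset_address = reset_vector & ~1u;
    if (reset_address < DEMO_APP_FLASH_START || reset_address >= DEMO_APP_FLASH_END) {
        return false;                 /* entry point outside the application */
    }
    return true;
}
```

On the target the function would be called with a pointer to APPLICATION_BASE, and the bootloader would stay in its own code if the check fails. It only catches gross errors; a checksum over the whole image is a stronger test.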

[Diagram: memory map]

Handling interrupts in the application

[Diagram: memory map]

Overhead

Let's have a look at the disassembly. Every stub handler is pretty short and identical apart from the offset in the second load (the address constant is the same everywhere, it is the base of the trampolines struct). Four instructions run before the actual handler starts (the push, two loads, and the blx itself) and 1 instruction after the actual handler (the pop). So the interrupt latency is worse by 4 instructions (5 in the case of tail-chaining due to the pop). Not too bad. 🙂

0000050c <SysTick_Handler>:
     50c:       b510            push    {r4, lr}
     50e:       4b02            ldr     r3, [pc, #8]    @ (518 <SysTick_Handler+0xc>)
     510:       681b            ldr     r3, [r3, #0]
     512:       4798            blx     r3
     514:       bd10            pop     {r4, pc}
     516:       46c0            nop                     @ (mov r8, r8)
     518:       2000001c        .word   0x2000001c

0000051c <SSP0_Handler>:
     51c:       b510            push    {r4, lr}
     51e:       4b02            ldr     r3, [pc, #8]    @ (528 <SSP0_Handler+0xc>)
     520:       685b            ldr     r3, [r3, #4]
     522:       4798            blx     r3
     524:       bd10            pop     {r4, pc}
     526:       46c0            nop                     @ (mov r8, r8)
     528:       2000001c        .word   0x2000001c

0000052c <SSP1_Handler>:
     52c:       b510            push    {r4, lr}
     52e:       4b02            ldr     r3, [pc, #8]    @ (538 <SSP1_Handler+0xc>)
     530:       689b            ldr     r3, [r3, #8]
     532:       4798            blx     r3
     534:       bd10            pop     {r4, pc}
     536:       46c0            nop                     @ (mov r8, r8)
     538:       2000001c        .word   0x2000001c

There is also a nop after the pop. It is padding that keeps the address constant that follows it word-aligned. It will never be executed because the pop changes the program counter, so it does not contribute to overall latency. It just wastes 2 bytes.