M0AGX / LB9MG

Amateur radio and embedded systems

Kinetis - relocating variables to upper SRAM

NXP Kinetis microcontrollers have an inconvenient architectural feature: split RAM. The memory is split into two areas (of equal size on the MK22FN512 used below). You can run into this issue when the total size of all RAM variables (data + bss) approaches the size of the lower area, which is half of the available SRAM on this part. It shows up as a linker error similar to this:

ld: region 'm_data' overflowed by 132 bytes

I will use an MK22FN512 as an example, but this post applies equally to all Kinetis K-series MCUs.

Brief reminder about MCU memories and sections:

  • flash - holds the code (text section) and the initial values of variables initialized to something other than zero (data section)
  • RAM - holds all runtime variables (bss - variables initialized to zero, data - variables initialized to non-zero values)
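
As a rough illustration of the list above, the sketch below shows where a GCC-based toolchain typically places a few kinds of objects. All names are made up for this example, and the exact placement depends on compiler options and on the linker script.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names, used only to illustrate section placement. */

/* No initializer (zero-initialized): goes to .bss, occupies RAM only. */
static uint32_t boot_count;

/* Non-zero initial value: goes to .data, occupies RAM plus a copy of
   the initial value in flash that startup code copies over. */
static uint32_t retry_limit = 3;

/* Constant data: typically goes to .rodata, which stays in flash. */
static const uint8_t lookup[4] = {1, 2, 4, 8};

/* Code: goes to .text in flash. */
uint32_t next_delay(void)
{
    assert(retry_limit > 0);
    boot_count++;
    return lookup[boot_count % 4] * retry_limit;
}
```

Only boot_count and retry_limit count against the RAM budget discussed in this post.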

Placement of all sections in the output binary is configured by the linker script. In Kinetis Design Studio projects the linker script is located in the Project_Settings/Linker_Files subdirectory.

In my dummy MK22FN512 project the linker script starts with:

MEMORY
{
  m_interrupts          (RX)  : ORIGIN = 0x00000000, LENGTH = 0x00000400
  m_flash_config        (RX)  : ORIGIN = 0x00000400, LENGTH = 0x00000010
  m_text                (RX)  : ORIGIN = 0x00000410, LENGTH = 0x0007FBF0
  m_data                (RW)  : ORIGIN = 0x1FFF0000, LENGTH = 0x00010000
  m_data_2              (RW)  : ORIGIN = 0x20000000, LENGTH = 0x00010000
}

The first three regions reside in flash, m_data is the lower SRAM (also called SRAM_L) and m_data_2 is the upper SRAM (SRAM_U). Each SRAM region is 0x10000 bytes (64KB) long, so their combined length is 0x20000, which corresponds to the total RAM (128KB) available in the MK22FN512.
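
The arithmetic behind these numbers can be checked with a small host-side sketch. The constants are copied from the MEMORY block above; the function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the MEMORY block of the linker script. */
#define M_DATA_ORIGIN   0x1FFF0000u
#define M_DATA_LENGTH   0x00010000u
#define M_DATA_2_ORIGIN 0x20000000u
#define M_DATA_2_LENGTH 0x00010000u

/* First address past the end of a region. */
static uint32_t region_end(uint32_t origin, uint32_t length)
{
    assert(length <= UINT32_MAX - origin); /* guard against wrap-around */
    return origin + length;
}

/* Returns 1 when SRAM_L ends exactly where SRAM_U begins. */
int sram_is_contiguous(void)
{
    return region_end(M_DATA_ORIGIN, M_DATA_LENGTH) == M_DATA_2_ORIGIN;
}

/* Total SRAM covered by both regions, in KB. */
uint32_t sram_total_kb(void)
{
    return (M_DATA_LENGTH + M_DATA_2_LENGTH) / 1024u;
}
```

The two regions are adjacent in the address space, but the linker script still declares them as two separate regions.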

The problem

I made a very simple dummy project in Kinetis Design Studio with two 32KB arrays:

#include "MK22F51212.h"
#include <stdint.h>
static int i = 0;

static uint8_t my_big_array_1[32*1024]; //each uses 32KB of RAM
static uint8_t my_big_array_2[32*1024];

int main(void)
{
    for (;;) {
        my_big_array_1[1] = i; //use the array
        my_big_array_2[1] = i; //to prevent optimizing it out
        i++;
    }
    return 0;
}

Two 32KB arrays should easily fit into 128KB of RAM. The code compiles cleanly but fails to link:

Building target: dummy_relocation_project.elf
Invoking: Cross ARM GNU C++ Linker
arm-none-eabi-g++ -mcpu=cortex-m4 -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16 -O0 -fmessage-length=0 -fsigned-char -ffunction-sections -fdata-sections  -g3 -T "MK22FN512xxx12_flash.ld" -Xlinker --gc-sections -L"/home/lb9mg/kds_workspace/dummy_relocation_project/Project_Settings/Linker_Files" -Wl,-Map,"dummy_relocation_project.map" -specs=nano.specs -specs=nosys.specs -o "dummy_relocation_project.elf"  ./Sources/main.o  ./Project_Settings/Startup_Code/startup_MK22F51212.o ./Project_Settings/Startup_Code/system_MK22F51212.o   
/opt/Freescale/KDS_v3/toolchain/bin/../lib/gcc/arm-none-eabi/4.8.4/../../../../arm-none-eabi/bin/ld: dummy_relocation_project.elf section `.bss' will not fit in region `m_data'
/opt/Freescale/KDS_v3/toolchain/bin/../lib/gcc/arm-none-eabi/4.8.4/../../../../arm-none-eabi/bin/ld: region `m_data' overflowed by 132 bytes
collect2: error: ld returned 1 exit status
make: *** [dummy_relocation_project.elf] Error 1

The problem is that all variables go only into the m_data region, which is just 64KB large. The two arrays alone fill it completely, and a few extra bytes are needed for other variables besides the arrays.
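
The overflow figure can be reproduced with a short calculation. This is only a sketch: it assumes that all of .data and .bss are linked into m_data, and the function name and parameters are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

#define M_DATA_LENGTH  0x00010000u     /* 64KB, from the linker script */
#define BIG_ARRAY_SIZE (32u * 1024u)   /* size of one array in main.c */

/* Returns the number of bytes by which the request exceeds m_data,
   or 0 if everything fits. other_bytes stands for everything else
   that ends up in m_data besides the big arrays. */
uint32_t m_data_overflow(uint32_t array_count, uint32_t other_bytes)
{
    assert(array_count <= 4); /* keeps the multiplication well in range */
    uint32_t needed = array_count * BIG_ARRAY_SIZE + other_bytes;
    return needed > M_DATA_LENGTH ? needed - M_DATA_LENGTH : 0u;
}
```

Two 32KB arrays are exactly 0x10000 bytes, so m_data_overflow(2, 132) returns 132, which matches the number reported by the linker. In other words, the overflow equals whatever else is placed in m_data.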


The solution

The linker script needs a new output section, because a variable can only be assigned to a section, not directly to a memory region such as m_data_2:

.upper_RAM_section :
    {
        *(.upper_RAM_section )
    } > m_data_2

This block has to be placed inside SECTIONS, right after .bss and before .heap.

Variables must be manually placed in the new section using the following attribute:

static uint8_t my_big_array_2[32*1024] __attribute__((section(".upper_RAM_section,\"aw\",%nobits@")));
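
The attribute is long and easy to mistype, so it can be wrapped in a macro. The sketch below is one possible way to do that. UPPER_RAM, rx_buffer and the helper functions are hypothetical names, and the macro expands to nothing on non-ARM targets so that the same file still compiles in host-side tests.

```c
#include <assert.h>
#include <stdint.h>

/* UPPER_RAM is a hypothetical helper macro, not part of any SDK. */
#if defined(__GNUC__) && defined(__arm__)
#define UPPER_RAM __attribute__((section(".upper_RAM_section,\"aw\",%nobits@")))
#else
#define UPPER_RAM /* no-op elsewhere, e.g. in host-side unit tests */
#endif

/* Hypothetical buffer placed in SRAM_U when built for the target. */
static uint8_t rx_buffer[32 * 1024] UPPER_RAM;

/* SRAM_U bounds taken from the m_data_2 region of the linker script. */
#define SRAM_U_START 0x20000000u
#define SRAM_U_END   0x20010000u   /* first address past the region */

/* Returns 1 when addr lies inside SRAM_U, 0 otherwise. */
int address_in_sram_u(uintptr_t addr)
{
    return addr >= SRAM_U_START && addr < SRAM_U_END;
}

/* Call once on the target to catch a misplaced buffer early. */
void check_rx_buffer_placement(void)
{
    assert(address_in_sram_u((uintptr_t)rx_buffer));
}
```

The address of each variable can also be read from the .map file produced by the linker.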

Now the project links cleanly:

Building target: dummy_relocation_project.elf
Invoking: Cross ARM GNU C++ Linker
arm-none-eabi-g++ -mcpu=cortex-m4 -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16 -O0 -fmessage-length=0 -fsigned-char -ffunction-sections -fdata-sections  -g3 -T "MK22FN512xxx12_flash.ld" -Xlinker --gc-sections -L"/home/lb9mg/kds_workspace/dummy_relocation_project/Project_Settings/Linker_Files" -Wl,-Map,"dummy_relocation_project.map" -specs=nano.specs -specs=nosys.specs -o "dummy_relocation_project.elf"  ./Sources/main.o  ./Project_Settings/Startup_Code/startup_MK22F51212.o ./Project_Settings/Startup_Code/system_MK22F51212.o   
Finished building target: dummy_relocation_project.elf

Invoking: Cross ARM GNU Create Flash Image
arm-none-eabi-objcopy -O ihex "dummy_relocation_project.elf"  "dummy_relocation_project.hex"
Finished building: dummy_relocation_project.hex

Invoking: Cross ARM GNU Print Size
arm-none-eabi-size --format=berkeley "dummy_relocation_project.elf"
   text    data     bss     dec     hex filename
   1572   32876   34848   69296   10eb0 dummy_relocation_project.elf
Finished building: dummy_relocation_project.siz

Adding the attribute to many small variables makes little sense, so the most obvious candidates are big arrays, buffers and similar objects. The "aw" flags (allocatable, writable) and the %nobits type make the section behave like .bss: it occupies no space in the output binary and has no initial values to load, unlike .data. Without them, the .bin file produced by objcopy can be 512MB large. Variables placed in this section will most likely not be zeroed by the startup code (because technically the section is not .bss), but that rarely matters when they are used as buffers.
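
If a buffer in the new section does have to start out zeroed, it can be cleared explicitly before first use. This is a minimal sketch with made-up names (UPPER_RAM, log_buffer, upper_ram_init); the macro is defined as empty on non-ARM targets so the code can also be built on a host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper macro wrapping the section attribute. */
#if defined(__GNUC__) && defined(__arm__)
#define UPPER_RAM __attribute__((section(".upper_RAM_section,\"aw\",%nobits@")))
#else
#define UPPER_RAM /* no-op on other targets */
#endif

/* Hypothetical buffer that the startup code is not expected to clear. */
static uint8_t log_buffer[4096] UPPER_RAM;

/* Clears the buffer; call early in main(), before the buffer is used. */
void upper_ram_init(void)
{
    memset(log_buffer, 0, sizeof log_buffer);
}

/* Bounds-checked write access. */
void log_buffer_poke(size_t index, uint8_t value)
{
    assert(index < sizeof log_buffer);
    log_buffer[index] = value;
}

/* Bounds-checked read access. */
uint8_t log_buffer_peek(size_t index)
{
    assert(index < sizeof log_buffer);
    return log_buffer[index];
}
```

Clearing by hand costs a few cycles at startup, but it avoids modifying the vendor startup code.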

Why?

The Kinetis architecture is explained in AN4745. The Cortex-M4 core has three AHB-Lite buses. The ICODE and DCODE buses can only access the lower address space (including SRAM_L), while the system bus can only access the upper address space (including SRAM_U).

[Figure: Kinetis bus architecture diagram]

Both parts of RAM behave mostly the same from a C programmer's point of view. The only difference is that code executed from SRAM_L runs at full speed, while code executed from SRAM_U requires a wait state (so it runs at half speed). Data access is identical and runs at full speed in both.