Amateur radio and embedded systems

Bug of the day: missing entry point

This is a tale of a strange bug I ran into while working on my first ever SAM E70 project. A new MCU. A new type of product. A custom PCB. What could possibly go wrong? 🙂

Spoiler: Not strange. Not a bug.

I started with Atmel's START tool (pun intended), a web-based project generator. I selected my MCU, picked the clock configuration, added some extra components (like the USB stack) and downloaded the resulting zip file. The tool hits a sweet spot: it does not require you to download many gigabytes of SDKs and IDEs, yet it gives you a nice minimal project to start with (headers, startup code, clock configuration and a basic Makefile). START is still available but deprecated (as of 2023).

The project built nicely with gcc. I connected a J-Link to my custom board, opened the elf in Ozone, flashed the MCU, let it run and... immediately ended up in the hard fault handler. Not great, not terrible. The firmware crashed instantly, but on the other hand I could connect with a debugger and flash the chip without any errors.

By default, Ozone halts at main() after flashing, so the startup code has already run but none of the "application" code has. This initially led me to suspect the startup or clock setup code. Inside the hard fault handler the CPU behaved normally: I was able to single-step instructions, so the hardware could not be that bad. Unfortunately, all the diagnostic variables in the hard fault handler contained only junk addresses, so there was no obvious cause for the crash.
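Those diagnostic variables are typically filled from the exception frame that the core pushes onto the active stack when the fault is taken. Below is a minimal, host-testable sketch of that decoding step, assuming the ARMv7-M basic frame layout (R0, R1, R2, R3, R12, LR, PC, xPSR, lowest address first) and leaving out the small assembly shim that would read LR (EXC_RETURN), MSP and PSP on the real target. The names `exception_frame`, `select_stack` and `decode_frame` are hypothetical and not taken from the Atmel START code. If the selected stack holds garbage, every field decodes to junk.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: the basic exception frame an ARMv7-M core pushes on
 * exception entry, in memory order (lowest address first). */
typedef struct {
    uint32_t r0, r1, r2, r3, r12;
    uint32_t lr;   /* LR of the interrupted code */
    uint32_t pc;   /* return address; usually the faulting instruction for synchronous faults */
    uint32_t xpsr;
} exception_frame;

/* Bit 2 of the EXC_RETURN value found in LR inside the handler tells which
 * stack the frame was pushed on: 0 = main stack (MSP), 1 = process stack (PSP). */
static uint32_t select_stack(uint32_t exc_return, uint32_t msp, uint32_t psp)
{
    return (exc_return & 0x4u) ? psp : msp;
}

/* Copy the eight stacked words into a struct. On the target, `sp` would be
 * the address chosen by select_stack(); on a host it is just an array. */
static exception_frame decode_frame(const uint32_t *sp)
{
    assert(sp != 0);
    exception_frame f = {
        sp[0], sp[1], sp[2], sp[3], sp[4], /* r0..r3, r12 */
        sp[5],                             /* lr */
        sp[6],                             /* pc */
        sp[7]                              /* xpsr */
    };
    return f;
}
```

For example, EXC_RETURN 0xFFFFFFF9 (return to Thread mode on the main stack) selects MSP, while 0xFFFFFFFD selects PSP. Keeping the decoding in plain C makes it easy to test off-target, with only the register reads left to assembly.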

I started checking everything I could think of:

  • Supply voltages - nominal
  • Crystal signal and frequency - nominal
  • Solder joints - good
  • gcc build flags - all good for Cortex-M7
  • Part number on the package vs. the one in START - exact match
  • Ozone MCU selection - exact match
  • Oscillator config in START - matched the PCB
  • Reset handler code - seemed to make sense
  • Vector table contents - good (initial stack pointer at top of RAM and good reset handler address)
  • Binary inspection in hex editor - nominal

After exhausting all options, I added code that toggles GPIOs at various stages to "debug with a scope" or "debug with an LED" in order to find the last working step. To my surprise, the code worked after flashing and a power cycle. The code must have been correct all along, so the debugger had to be doing something wrong.

To confirm this, I loaded a raw binary file into Ozone. A raw binary carries no metadata, so I had to enter the load address manually. I also typed the initial stack pointer and program counter into the registers by hand using the debugger. Then I let the CPU run. And it worked!
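The values to type in come straight from the first two words of the vector table, which is also what a Cortex-M core fetches on reset: word 0 is the initial main stack pointer and word 1 is the address of Reset_Handler. A minimal host-side sketch, assuming the raw image starts with the vector table at the load address; `reset_state`, `read_le32` and `reset_state_from_image` are hypothetical names. Cortex-M only executes Thumb code, so bit 0 of the reset vector must be set and is stripped before the value ends up in the PC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit word from a byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical result type: what to put into the registers by hand. */
typedef struct {
    uint32_t sp;    /* value for the main stack pointer (MSP) */
    uint32_t pc;    /* value for the program counter */
    int thumb;      /* 1 if bit 0 of the reset vector is set, as it must be */
} reset_state;

/* Derive the reset register values from the first eight bytes of a raw
 * image that begins with the vector table. */
static reset_state reset_state_from_image(const uint8_t *image, size_t len)
{
    assert(image != NULL && len >= 8);
    uint32_t word0 = read_le32(image);      /* initial stack pointer */
    uint32_t word1 = read_le32(image + 4);  /* Reset_Handler address | 1 */
    reset_state s;
    s.sp = word0 & ~(uint32_t)0x3;          /* the core ignores SP bits [1:0] */
    s.pc = word1 & ~(uint32_t)0x1;          /* bit 0 selects Thumb state, not part of the address */
    s.thumb = (int)(word1 & 0x1u);
    return s;
}
```

A reset vector with bit 0 clear would make the core fault as soon as it tries to execute in ARM state, so the `thumb` flag doubles as a cheap sanity check of the image.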

This discovery made me certain that it must be something with either Ozone or the elf file. On one hand, the elf was loaded into flash perfectly, so the linker script could not be that broken (it was provided by Atmel START anyway). On the other hand, Ozone could not reach main(). The linker script looked sane at first glance, and it took me quite a while to spot the problem: a missing ENTRY(Reset_Handler). It seems Atmel START was using a wrong linker script template.

After adding the missing entry point to the linker script, Ozone was able to start the CPU properly and reach main(). Yay! 🙂

Why did the broken elf (partially) work?

An elf does not need an entry point in the first place. A shared library is one example (though shared libraries have standardized "constructors", which are a kind of entry point for the loader and operating system).
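The entry point is stored in the e_entry field of the ELF header, and the ELF specification says the field holds zero when the file has no associated entry point. ENTRY() in the linker script is what sets it; without ENTRY(), GNU ld falls back to other rules (for example the start of the code section), so the field is not necessarily zero, it may just point somewhere other than Reset_Handler. `readelf -h` prints it as "Entry point address". A minimal sketch of reading it directly, assuming a 32-bit little-endian ELF as produced for Cortex-M; `elf32_le_entry` and `read_le32` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit word from a byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Extract e_entry from the start of an ELF file.
 * Returns 0 on success, -1 if the buffer is not a 32-bit little-endian ELF.
 * ELF32 header layout: e_ident[16], e_type (2), e_machine (2),
 * e_version (4), then e_entry (4) at byte offset 24. */
static int elf32_le_entry(const uint8_t *buf, size_t len, uint32_t *entry)
{
    assert(buf != NULL && entry != NULL);
    if (len < 28)
        return -1;                           /* too short to hold e_entry */
    if (buf[0] != 0x7F || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return -1;                           /* bad magic */
    if (buf[4] != 1 /* ELFCLASS32 */ || buf[5] != 1 /* ELFDATA2LSB */)
        return -1;                           /* not 32-bit little-endian */
    *entry = read_le32(buf + 24);            /* 0 means "no entry point" */
    return 0;
}
```

Comparing this value with the address of Reset_Handler from the map file is a quick way to check whether a debugger that trusts the elf will start the CPU in the right place.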

The layout of the sections was specified correctly in the linker script, so all the code ended up in flash at the correct address and used RAM at the correct address. The vector table referenced Reset_Handler, so the reset handler was not optimized out during linking. Basically, the .hex or .bin file was fully usable once it was programmed into the MCU's flash.

That day I learned that the only problem was that Ozone could not tell where the firmware should start.