Formatting a number in decimal stops execution

Hello :slight_smile:

I created a similar issue more than 3 years ago and am now back at the same problem. I am working on bare metal with #![no_std], and formatting a number in decimal stops execution (without panicking) if the number has more than one digit.

log_line!("{}", 9); // runs just fine
log_line!("{}", 10); // will halt

But weirdly:

log_line!("{:x}", 1000); // also works fine
log_line!("{:b}", 1000); // and so does this

I assume it has something to do with the amount of space that is preallocated for the formatting, but I find it extremely difficult to verify this, let alone fix it.

If anyone has an idea what could cause this, I would greatly appreciate any help :slight_smile:

The only difference I can think of is that your halting line needs a division and the others don't.
But I don't know how that could cause a halt.
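To illustrate the point about division, here is a simplified sketch of digit extraction. It is not core's actual formatter, which is more elaborate and handles several digits per step; the function names `to_decimal` and `to_hex` are made up for this example. It only shows that base 10 needs divide and remainder operations while base 16 gets by with shifts and masks.

```rust
/// Writes the decimal digits of `n` into the end of `buf` and returns them.
/// Each digit costs one remainder and one division by 10.
fn to_decimal(mut n: u32, buf: &mut [u8; 10]) -> &[u8] {
    let mut i = buf.len();
    loop {
        i -= 1;
        buf[i] = b'0' + (n % 10) as u8;
        n /= 10;
        if n == 0 {
            break;
        }
    }
    &buf[i..]
}

/// Writes the lowercase hex digits of `n` into the end of `buf` and returns them.
/// Each digit only needs a mask and a shift, no division.
fn to_hex(mut n: u32, buf: &mut [u8; 8]) -> &[u8] {
    let mut i = buf.len();
    loop {
        i -= 1;
        let nibble = (n & 0xf) as u8;
        buf[i] = if nibble < 10 {
            b'0' + nibble
        } else {
            b'a' + (nibble - 10)
        };
        n >>= 4;
        if n == 0 {
            break;
        }
    }
    &buf[i..]
}
```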

That is a very good point but this code

log_line!("{:x}", 4 / 2);

works like a charm. The only difference I can think of is that this will actually use a udiv instruction, whereas the formatter seems to use a function from the compiler_builtins crate for division.

"4 / 2" is likely evaluated at compile time and "2" used directly in the code, so no division code is generated.

Yeah, I don't know what I was thinking; of course there was no division instruction. I tried it again with a volatile read to avoid the optimization and double-checked that there was a udiv instruction, which there was, and the code still ran successfully.
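For anyone repeating this test, a minimal sketch of the volatile-read approach might look like the following. The helper name `divide_at_runtime` is hypothetical; the idea is only that the compiler cannot fold a value it has to read through `read_volatile`, so the division must happen at run time.

```rust
use core::ptr;

/// Divides at run time: the volatile read hides the divisor's value from the
/// optimizer, so the division cannot be evaluated at compile time.
/// Like any integer division, this panics if `divisor` is zero.
fn divide_at_runtime(dividend: u32, divisor: u32) -> u32 {
    // SAFETY: `&divisor` points to a valid, aligned, initialized u32 on the stack.
    let d = unsafe { ptr::read_volatile(&divisor) };
    dividend / d
}
```

Whether the result is a udiv instruction or a call into a support routine depends on the target and integer width, so it is still worth checking the disassembly.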

Can you step through your code and see where it halts?

I am just beyond confused. I was playing around with some target parameters and tested it again and, wouldn't you know, it actually worked. Until I added a panic for testing way further down in my code, at which point the formatter halted again. So I removed the panic and said parameters again to see if it would still work, and it does. Now I have no idea what's going on. To me it makes no sense at all that a panic at the end of my kernel main should affect a format function that I call during setup, unless the problem is related to alignment and the panic changes the memory layout to the point where the formatter no longer works.

I wish I could :smiley: But I'm actually testing on real hardware, and when the processor halts I'm pretty much out of options.

You could try this to investigate; it should force the values to stay machine-word aligned, whatever that means on your platform.

log_line!("{}", 9_usize);
log_line!("{}", 10_usize);

What MCU and compiler flags are you using?
And what does log_line!() actually look like?

I think it may be better to just send you the link to the repository.

build script: https://github.com/vE5li/alb1/blob/master/bootloader/makefile
log_line macro: https://github.com/vE5li/alb1/blob/master/bootloader/src/peripherals/logger.rs

The fact that it does not seem to be directly related to the log_line! input indicates to me that there might be a more general issue. Alignment could be an issue, but I don't know enough about aarch64 or xargo to be of much help...

You could also try to isolate whether it's the formatting or the printing by only formatting the number and not printing it to the screen.

I would agree. Sadly, I don't really know what to do now.
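One way to format without printing is to write into a fixed-size buffer through core::fmt::Write. The sketch below assumes nothing about the linked repository; `FixedBuf` and `format_only` are hypothetical names, and the code uses only core, so it should also work under #![no_std].

```rust
use core::fmt::{self, Write};

/// Hypothetical fixed-size sink that collects formatted output in memory.
struct FixedBuf {
    bytes: [u8; 32],
    len: usize,
}

impl FixedBuf {
    fn new() -> Self {
        FixedBuf { bytes: [0; 32], len: 0 }
    }

    /// Returns what has been written so far.
    fn as_str(&self) -> &str {
        // Only whole `&str` chunks are copied in, so the bytes are valid UTF-8;
        // fall back to an empty string rather than panicking just in case.
        core::str::from_utf8(&self.bytes[..self.len]).unwrap_or("")
    }
}

impl Write for FixedBuf {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        let end = self.len + s.len();
        if end > self.bytes.len() {
            // Refuse to overflow the buffer instead of truncating silently.
            return Err(fmt::Error);
        }
        self.bytes[self.len..end].copy_from_slice(s.as_bytes());
        self.len = end;
        Ok(())
    }
}

/// Runs the decimal formatter without touching any output device.
fn format_only(value: u32) -> FixedBuf {
    let mut buf = FixedBuf::new();
    // A u32 has at most 10 decimal digits, so this cannot overflow 32 bytes.
    let _ = write!(buf, "{}", value);
    buf
}
```

If execution survives `format_only(10)` but halts once the result is drawn, the formatter itself is probably not at fault.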

Nice idea but that didn't have any effect either :slight_smile:

Interesting.

My experience with such "Heisenbugs", where things fail mysteriously in random ways, unrelated code triggers the failure, and trying to debug it makes it go away, is that there is often a memory corruption problem somewhere.

Perhaps running out of stack.
Something misusing the heap.
Some threads trampling each other.

It would be interesting to see what code log_line! actually expands to.

I think it's currently almost impossible for me to run out of stack, since I set the stack size to 0x60000 (out of paranoia). The heap is really super simple at the moment; you can see the heap allocator here: https://github.com/vE5li/alb1/blob/master/bootloader/src/memory/heap/mod.rs. I will definitely look into the heap some more, since I hadn't considered that yet. And lastly, 3 of the 4 cores are currently halted, so it's definitely not caused by multi-threading.

Can you find out what the exception is that causes the halt? Maybe by implementing a panic handler?

Your framebuffer code does a lot of pointer math; if something asks to draw to an off-screen position, it will happily write into arbitrary memory locations. While you're trying to debug the system, I'd put a check for this kind of overrun into draw_pixel(). It'll slow things down a lot, but might prevent some spurious halts.

EDIT: In particular, your logger currently has no protection against a message being too wide for the screen. In most cases, this will just write into the next line, but if you're at the bottom of the screen, it'll write into adjacent non-framebuffer memory instead.
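A rough sketch of such a check follows. The `Framebuffer` type here is hypothetical and backed by a slice rather than the raw pointer the real code uses; the part that matters is rejecting any coordinate outside the visible area before computing the offset.

```rust
/// Hypothetical framebuffer view; the real code works on a raw pointer instead.
struct Framebuffer<'a> {
    pixels: &'a mut [u32],
    width: usize,
    height: usize,
}

impl<'a> Framebuffer<'a> {
    /// Writes one pixel and returns true, or returns false without writing
    /// anything when (x, y) lies outside the screen.
    fn draw_pixel(&mut self, x: usize, y: usize, color: u32) -> bool {
        // Checking x separately matters: an x that is too large would otherwise
        // wrap into the next row, or past the buffer on the last row.
        if x >= self.width || y >= self.height {
            return false;
        }
        // `get_mut` also guards against a buffer shorter than width * height.
        match self.pixels.get_mut(y * self.width + x) {
            Some(pixel) => {
                *pixel = color;
                true
            }
            None => false,
        }
    }
}
```

With a raw pointer the same comparison against width and height would go in front of the pointer arithmetic, and a rejected write could be counted or logged to show whether overruns ever happen.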

I do have a panic handler implemented, but whatever is failing doesn't panic. It could be an unaligned memory access, but I currently have no way of verifying that.

That is true, but I don't have many messages currently. They take up hardly half of the screen, so I'm pretty sure that the framebuffer is not the culprit in this case. Also, the image comes out exactly as I expect, so the pointer math should be correct too.

Edit: just tested it and not using the framebuffer yields the exact same result.