Formatting a number in decimal stops execution

"4 / 2" is likely evaluated at compile time, with "2" used directly in the code, so no division instruction is generated.

Yeah... I don't know what I was thinking; of course there was no instruction. I tried again with a volatile read to prevent the optimization and double-checked that a udiv instruction was emitted. It was, and the code still ran successfully.
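For reference, the volatile-read trick can look roughly like this. It is a minimal sketch that assumes the only goal is to stop the compiler from constant-folding the division; divide_at_runtime is a hypothetical name, not code from the thread.

```rust
use core::ptr;

/// Hypothetical helper: divides two values the optimizer cannot see through.
/// Volatile reads must actually be performed, so `4 / 2` should no longer be
/// folded into the constant `2`, and a real division instruction is emitted.
pub fn divide_at_runtime(numerator: usize, denominator: usize) -> usize {
    // SAFETY: both pointers come from live, properly aligned locals.
    let n = unsafe { ptr::read_volatile(&numerator) };
    let d = unsafe { ptr::read_volatile(&denominator) };
    n / d
}
```

Calling divide_at_runtime(4, 2) and inspecting the disassembly is one way to confirm the division survives optimization.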

Can you step through your code and see where it halts?

I am just beyond confused. I was playing around with some target parameters and tested it again, and wouldn't you know it, it actually worked. Then I added a panic for testing much further down in my code, and the formatter halted again. So I removed the panic and those parameters again to see if it would still work, and it does. Now I have no idea what's going on. It makes no sense to me that a panic at the end of my kernel main should affect a format function that I call during setup, unless the problem is related to alignment and the panic changes the memory layout enough that the formatter no longer works.

I wish I could :smiley: But I'm testing on real hardware, and when the processor halts I'm pretty much out of options.

You could try this to investigate; it should force the values to stay machine-word aligned, whatever that means on your platform.

log_line!("{}", 9_usize);
log_line!("{}", 10_usize);

What MCU and compiler flags are you using?
And what does log_line!() actually look like?

I think it may be better to just send you the link to the repository.

build script: alb1/makefile at master · vE5li/alb1 · GitHub
log_line macro: alb1/logger.rs at master · vE5li/alb1 · GitHub

The fact that it does not seem to be directly related to the log_line! input indicates to me that there might be a more general issue. Alignment could be an issue, but I don't know enough about aarch64 or xargo to be of much help...

You could also try to isolate whether the problem is in the formatting or in the printing, by only formatting the value and not printing it to the screen.
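One way to format without printing is to write into a fixed-size buffer on the stack, which touches neither the heap nor the framebuffer. This is a minimal sketch assuming 64 bytes is enough for the test output; StackBuf is a hypothetical name and is not part of the alb1 logger.

```rust
use core::fmt::{self, Write};

/// Hypothetical fixed-capacity buffer: formatting into it exercises
/// core::fmt without allocating and without drawing anything.
pub struct StackBuf {
    bytes: [u8; 64],
    len: usize,
}

impl StackBuf {
    pub const fn new() -> Self {
        StackBuf { bytes: [0; 64], len: 0 }
    }

    /// Only whole &str values are ever copied in, so the contents are valid UTF-8.
    pub fn as_str(&self) -> &str {
        core::str::from_utf8(&self.bytes[..self.len]).unwrap_or("")
    }
}

impl Write for StackBuf {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        let end = self.len + s.len();
        if end > self.bytes.len() {
            // Refuse to overflow instead of writing past the buffer.
            return Err(fmt::Error);
        }
        self.bytes[self.len..end].copy_from_slice(s.as_bytes());
        self.len = end;
        Ok(())
    }
}
```

If write!(buf, "{}", 10_usize) on a StackBuf already halts the core, the printing path can be ruled out.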

I would agree. Sadly, I don't really know what to do now.

Nice idea, but that didn't have any effect either :slight_smile:

Interesting.

In my experience, such "Heisenbugs" (things fail mysteriously and seemingly at random, unrelated code changes trigger the failure, and trying to debug it makes it go away) have often turned out to be a memory corruption problem somewhere.

Perhaps running out of stack.
Something misusing the heap.
Some threads trampling each other.
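To rule out the first item with evidence rather than a generous stack size, a common technique is stack painting: fill the stack region with a marker early on, then later count how much of it was never overwritten. This minimal sketch uses a plain slice to stand in for the stack memory; obtaining the real region on the target (for example from linker symbols, painted before the stack is in use) is an assumption not shown here, and the names are hypothetical.

```rust
/// Arbitrary marker; any value unlikely to appear in real stack data works.
const PAINT: u32 = 0xDEAD_BEEF;

/// Fill an unused stack region with the marker pattern.
pub fn paint_stack(region: &mut [u32]) {
    region.fill(PAINT);
}

/// The AArch64 stack grows toward lower addresses, so words that were never
/// touched stay at the low end of the region. Returns how many of them still
/// hold the marker, i.e. the remaining headroom in words.
pub fn untouched_words(region: &[u32]) -> usize {
    region.iter().take_while(|&&word| word == PAINT).count()
}
```

If the headroom reported after the formatter runs is close to zero, the stack theory is back on the table.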

It would be interesting to see what code log_line! actually expands to.

I think it's currently almost impossible for me to run out of stack, since I set the stack size to 0x60000 (out of paranoia). The heap is really simple at the moment; you can see the heap allocator here: alb1/mod.rs at master · vE5li/alb1 · GitHub. I will definitely look into the heap some more, since I hadn't considered it yet. And lastly, 3 of the 4 cores are currently halted, so it's definitely not caused by multi-threading.
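Since the heap is a suspect, one thing worth checking is that the allocator honours Layout::align() and never hands out memory past the end of its region. Below is a minimal bump-allocator sketch that tracks addresses as plain integers; Bump and align_up are hypothetical names, this is not the alb1 allocator, and a real GlobalAlloc implementation would convert the result to a *mut u8.

```rust
use core::alloc::Layout;

/// Round `addr` up to the next multiple of `align`, which must be a power of two.
fn align_up(addr: usize, align: usize) -> usize {
    debug_assert!(align.is_power_of_two());
    (addr + align - 1) & !(align - 1)
}

/// Hypothetical bump allocator over the address range [next, end).
pub struct Bump {
    next: usize,
    end: usize,
}

impl Bump {
    pub fn new(start: usize, end: usize) -> Self {
        Bump { next: start, end }
    }

    /// Returns the start address of a block satisfying `layout`, or None if
    /// the region is exhausted. Memory is never freed.
    pub fn alloc(&mut self, layout: Layout) -> Option<usize> {
        let start = align_up(self.next, layout.align());
        let end = start.checked_add(layout.size())?;
        if end > self.end {
            return None;
        }
        self.next = end;
        Some(start)
    }
}
```

An allocator that skips the align_up step would return misaligned blocks after any odd-sized allocation, which is exactly the kind of bug that only shows up for some string lengths.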

Can you find out what the exception is that causes the halt? Maybe by implementing a panic handler?

Your framebuffer code does a lot of pointer math; if something asks to draw to an off-screen position, it will happily write into arbitrary memory locations. While you're trying to debug the system, I'd put a check for this kind of overrun into draw_pixel(). It'll slow things down a lot, but it might prevent some spurious halts.
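A bounds-checked draw_pixel() might look like the sketch below. The struct and its field names are hypothetical rather than taken from the alb1 code; on hardware the slice would presumably be built from the framebuffer base address and size reported by the firmware (for example with core::slice::from_raw_parts_mut), which is an assumption here.

```rust
/// Hypothetical framebuffer view; `stride` is the number of pixels per row
/// in memory, which can be larger than the visible width.
pub struct Framebuffer<'a> {
    pub width: usize,
    pub height: usize,
    pub stride: usize,
    pub buffer: &'a mut [u32],
}

impl<'a> Framebuffer<'a> {
    /// Writes one pixel and returns true, or returns false without writing
    /// anything when (x, y) is off-screen or outside the backing memory.
    pub fn draw_pixel(&mut self, x: usize, y: usize, color: u32) -> bool {
        if x >= self.width || y >= self.height {
            return false;
        }
        match self.buffer.get_mut(y * self.stride + x) {
            Some(pixel) => {
                *pixel = color;
                true
            }
            None => false,
        }
    }
}
```

Returning a bool instead of silently dropping the write makes it easy to count or log rejected pixels while debugging.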

EDIT: In particular, your logger currently has no protection against a message being too wide for the screen. In most cases this will just write into the next line, but if you're at the bottom of the screen, it'll write into adjacent non-framebuffer memory instead.
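A minimal sketch of a text cursor that can never leave the screen, assuming the logger works on a fixed grid of character cells. TextCursor is a hypothetical name, and wrapping back to the top row is just one choice; scrolling the framebuffer contents up is the other common one.

```rust
/// Hypothetical character-cell cursor for the on-screen logger.
pub struct TextCursor {
    pub col: usize,
    pub row: usize,
    cols: usize,
    rows: usize,
}

impl TextCursor {
    pub fn new(cols: usize, rows: usize) -> Self {
        assert!(cols > 0 && rows > 0, "screen must have at least one cell");
        TextCursor { col: 0, row: 0, cols, rows }
    }

    /// Move past the character just drawn, wrapping at the right edge.
    pub fn advance(&mut self) {
        self.col += 1;
        if self.col >= self.cols {
            self.newline();
        }
    }

    /// Start a new line; at the bottom edge, wrap back to the top row
    /// instead of running past the end of the framebuffer.
    pub fn newline(&mut self) {
        self.col = 0;
        self.row = (self.row + 1) % self.rows;
    }
}
```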

I do have a panic handler implemented, but whatever is failing doesn't panic. It could be an unaligned memory access, but I currently have no way of verifying that.
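A panic handler only runs for Rust panics; a CPU fault such as a data abort is delivered to the exception vector table instead, and if no usable table is installed the core often just appears to hang. If a vector table is set up, its handler can read the syndrome register (ESR_EL1, or ESR_EL2 if the kernel runs at EL2) with an mrs instruction, and FAR_EL1/FAR_EL2 holds the faulting address. An alignment fault shows up as a data abort (exception class 0x24 or 0x25) with a Data Fault Status Code of 0b100001. The sketch below covers only the decoding, as a pure function so it can be checked off-target; the type and function names are hypothetical.

```rust
/// Decoded subset of an AArch64 ESR_ELx (Exception Syndrome Register) value.
#[derive(Debug, PartialEq)]
pub enum Syndrome {
    /// Data abort; `dfsc` is the Data Fault Status Code (ISS bits [5:0]).
    DataAbort { same_el: bool, dfsc: u8 },
    /// Any other exception class (EC, bits [31:26]).
    Other { ec: u8 },
}

pub fn decode_esr(esr: u64) -> Syndrome {
    let ec = ((esr >> 26) & 0x3f) as u8;
    match ec {
        // 0x24: data abort from a lower EL, 0x25: from the current EL.
        0x24 | 0x25 => Syndrome::DataAbort {
            same_el: ec == 0x25,
            dfsc: (esr & 0x3f) as u8,
        },
        _ => Syndrome::Other { ec },
    }
}

/// DFSC 0b100001 means "alignment fault".
pub fn is_alignment_fault(esr: u64) -> bool {
    matches!(decode_esr(esr), Syndrome::DataAbort { dfsc: 0b10_0001, .. })
}
```

Drawing the raw ESR value to the framebuffer from the exception handler would be enough to tell an alignment fault apart from, say, a translation fault.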

That is true, but I don't have many messages yet. They hardly take up half of the screen, so I'm pretty sure the framebuffer is not the culprit here. Also, the image comes out exactly as I expect, so the pointer math should be correct too.

Edit: I just tested it, and not using the framebuffer yields exactly the same result.

I just realized that there is another bug that might be related, although I'm not sure. This code:

String::from("123456");

executes correctly but this:

String::from("1234567");

also locks up, just like the formatter. Every String shorter than 7 characters works fine; anything 7 characters or longer halts. Looking at it now, that would sort of support the heap theory, though I still don't know what exactly the problem would be.

Well, it only took me 3 years, but I think I finally found the bug. For the aarch64-unknown-none target there is a target feature called +strict-align, which stops the compiler from emitting unaligned memory accesses. After enabling it, everything seems to work as it should. I really hope nobody else has to go through the pain of debugging this. I am beyond happy that it's finally working, thank you! ^^

