Excessive stack usage for formatting


#1

I’ve been working on Rust for embedded work, in particular on the TI CC3200, which has 256K of RAM for code, stack, and heap.

We’ve got the basics running (if you’re interested, see: https://github.com/fabricedesre/cc3200-rs).

Anyway, we quickly ran into stack overflows when trying to use println! or logging, so I decided to investigate a bit. It looks like calling println! to print a simple string takes a whopping 650 bytes of stack. Adding a {} to format a single integer seems to add another 176 bytes, and each additional {} adds about 40 bytes.

Formatting the first float adds an additional 1500 bytes of stack (over and above the initial 650), and then roughly 50 bytes per additional {}.

I found the big culprit for formatting floats, which seems to be this:

Is this something that could be made configurable through a feature? I don’t mind allocating 32 or 64 bytes for a temp buffer, but 1K is totally prohibitive for embedded applications.
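To give an idea of what a small temp buffer can already cover, here is a minimal sketch that formats a float with three fixed decimals into a 32-byte caller-supplied buffer, without going through core::fmt. This is not what core::fmt does (it produces the shortest round-trippable representation, which is a much harder problem); it is a swapped-in fixed-point approach, and the name `format_fixed3`, the three-decimal precision, and the 1e15 range limit are all assumptions made for illustration.

```rust
/// Hypothetical helper: writes `value` with exactly three fractional digits
/// into `buf` and returns the formatted text, or `None` for NaN, infinities,
/// and magnitudes too large for this simple scheme.
pub fn format_fixed3(value: f32, buf: &mut [u8; 32]) -> Option<&str> {
    if !value.is_finite() {
        return None;
    }
    let negative = value < 0.0;
    let magnitude = if negative { -(value as f64) } else { value as f64 };
    if magnitude >= 1.0e15 {
        return None; // keeps the scaled value well inside u64
    }

    // Scale to thousandths and round half up, then work on integers only.
    let scaled = (magnitude * 1000.0 + 0.5) as u64;
    let mut int_part = scaled / 1000;
    let mut frac = scaled % 1000;

    // Fill the buffer from the end towards the front.
    let mut pos = buf.len();
    for _ in 0..3 {
        pos -= 1;
        buf[pos] = b'0' + (frac % 10) as u8;
        frac /= 10;
    }
    pos -= 1;
    buf[pos] = b'.';
    loop {
        pos -= 1;
        buf[pos] = b'0' + (int_part % 10) as u8;
        int_part /= 10;
        if int_part == 0 {
            break;
        }
    }
    if negative {
        pos -= 1;
        buf[pos] = b'-';
    }

    // Only ASCII bytes were written, so this conversion cannot fail.
    core::str::from_utf8(&buf[pos..]).ok()
}
```

The worst case here is a sign, 15 integer digits, a dot, and 3 decimals, so 20 bytes, which fits comfortably in the 32-byte buffer. The caller would declare `let mut buf = [0u8; 32];` on its own stack and pass `&mut buf` in.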

I need to dig around some more and find out why println! takes up so much. I figured I’d post what I have so far and see if anyone has some feedback.

And we’re not using libstd, so we actually created our own println! macro which can be seen here:


#2

cc @alexcrichton @brson
related issue rust-embedded/rfcs#18

Like I said in the linked issue, I think we should experiment with a new “out of tree” set of fmt traits. See how that goes and try to upstream some improvements to core::fmt.

Ultimately, I think we should try to make the formatting traits “pluggable”, as in: I should be able to use my own code- and stack-size-conscious set of formatting traits instead of the ones in core::fmt, without having to make every crate on crates.io add a derive(MyDebug) to every struct that already has derive(Debug). Now, I have no idea whether this last part is feasible or not (or how hard it would be).

Either that or we could add a set of formatting traits optimized for code/stack size to the core crate that replace the default ones via some Cargo feature.
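As a rough idea of what an “out of tree” set of traits could look like, here is a minimal sketch. Every name in it (`TinyWrite`, `TinyDisplay`, `FixedSink`) is hypothetical and made up for this post; nothing here is an existing API in core or in any particular crate. The point is only that integers can be formatted with a 10-byte scratch buffer and static dispatch.

```rust
/// Hypothetical sink trait: the only thing a target has to implement.
pub trait TinyWrite {
    fn write_bytes(&mut self, bytes: &[u8]);
}

/// Hypothetical counterpart to `core::fmt::Display`, static dispatch only.
pub trait TinyDisplay {
    fn tiny_fmt<W: TinyWrite>(&self, out: &mut W);
}

impl TinyDisplay for str {
    fn tiny_fmt<W: TinyWrite>(&self, out: &mut W) {
        out.write_bytes(self.as_bytes());
    }
}

impl TinyDisplay for u32 {
    fn tiny_fmt<W: TinyWrite>(&self, out: &mut W) {
        let mut buf = [0u8; 10]; // u32::MAX has 10 decimal digits
        let mut pos = buf.len();
        let mut n = *self;
        // Emit digits from least to most significant, filling from the end.
        loop {
            pos -= 1;
            buf[pos] = b'0' + (n % 10) as u8;
            n /= 10;
            if n == 0 {
                break;
            }
        }
        out.write_bytes(&buf[pos..]);
    }
}

impl TinyDisplay for i32 {
    fn tiny_fmt<W: TinyWrite>(&self, out: &mut W) {
        if *self < 0 {
            out.write_bytes(b"-");
        }
        // `unsigned_abs` also handles i32::MIN without overflow.
        self.unsigned_abs().tiny_fmt(out);
    }
}

/// Hypothetical test sink; on a real board this would be a UART or similar.
pub struct FixedSink {
    pub buf: [u8; 64],
    pub len: usize,
}

impl TinyWrite for FixedSink {
    fn write_bytes(&mut self, bytes: &[u8]) {
        // Silently drop whatever does not fit.
        let room = self.buf.len() - self.len;
        let n = bytes.len().min(room);
        self.buf[self.len..self.len + n].copy_from_slice(&bytes[..n]);
        self.len += n;
    }
}
```

Usage would be something like `"temp=".tiny_fmt(&mut sink); (-42i32).tiny_fmt(&mut sink);`. This does nothing for the derive(Debug) problem mentioned above; it only shows the shape of the traits.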


#3

Note that core::fmt has basically never been optimized, so improvements are of course always welcome!


#4

This came up again more recently, and I figured I’d chime in with support.

Looks like dhylands made some more optimized variants:

Does making this more efficient depend on some language feature (an alloca equivalent)?