far too often i've seen beginner programmers asking questions about how they can save a byte of stack space, so i guess this is a psa to say that's almost always a waste of time, and this is coming from someone who will spend an hour on a 1% performance improvement.
to allocate heap data, you need to perform a function call into the allocator, which then traverses its internal data structure and possibly executes mmap syscalls.
to allocate stack data, you adjust the stack pointer by an immediate. that's it. and because of how addition works, you (or in most cases the compiler) don't even need to do this once per variable, you can coalesce all of these into a single addition instruction.
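a minimal sketch of what that coalescing means in practice (my own illustrative example, not from the original post; the exact instruction depends on target and optimization level):

```rust
// Three locals of different sizes. The compiler reserves space for all
// of them with a single stack-pointer adjustment in the prologue
// (e.g. one `sub rsp, N` on x86-64), not one adjustment per variable.
fn locals_demo(seed: u32) -> u32 {
    let a: [u32; 4] = [seed; 4]; // 16 bytes
    let b: u64 = seed as u64 * 3; // 8 bytes
    let c: u8 = (seed % 251) as u8; // 1 byte, padded into the same frame
    a.iter().sum::<u32>() + b as u32 + c as u32
}

fn main() {
    println!("{}", locals_demo(2));
}
```

pasting something like this into godbolt and looking at the prologue is the easiest way to see it for yourself.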
as for memory use, memory is acquired from the OS in pages, not individual bytes, so 99.6% of the time a single extra byte won't actually result in your program consuming any more memory.
i've worked on 16 bit systems with no memory protection, writing raw bios calls, and even there, a single byte of stack space Does Not Matter.
any time you think you might want to spend optimizing your stack usage would be better spent doing something else instead, like reducing calls to clone or not repeating expensive calculations inside of inner loops. and really you should be profiling your code to see what's slowing it down, so you actually know what needs work.
The only time that optimizing stack space ever really makes sense is in a deeply recursive situation where you're trying to fit more frames on the stack before hitting the stack limit. And even there you're probably better off refactoring to use e.g. a manual stack on the heap.
and note that this only applies if you're writing a recursive function that cannot possibly be tail call optimized, otherwise your time would be better spent messing around in godbolt until you get tail call optimization to trigger.
and yeah, loops plus some Vec will almost always be more memory efficient for deep recursion, since it doesn't have to deal with the overhead of the return pointer
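a sketch of the "manual stack on the heap" refactor (my own example, assuming a simple binary tree type for illustration):

```rust
// Replacing deep recursion with an explicit Vec-based stack, so depth
// is bounded by heap size rather than the thread's stack limit.
enum Tree {
    Leaf(i64),
    Node(Box<Tree>, Box<Tree>),
}

fn sum_iterative(root: &Tree) -> i64 {
    let mut total = 0;
    let mut stack: Vec<&Tree> = vec![root]; // heap-allocated "call stack"
    while let Some(node) = stack.pop() {
        match node {
            Tree::Leaf(v) => total += v,
            // No return address or saved registers per frame: each
            // pending node costs exactly one pointer on the Vec.
            Tree::Node(l, r) => {
                stack.push(&**l);
                stack.push(&**r);
            }
        }
    }
    total
}

fn main() {
    let t = Tree::Node(
        Box::new(Tree::Node(Box::new(Tree::Leaf(1)), Box::new(Tree::Leaf(2)))),
        Box::new(Tree::Leaf(3)),
    );
    println!("{}", sum_iterative(&t));
}
```

the same shape works for any traversal; you just decide explicitly what state each "frame" needs to carry.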
also note that even this extreme example doesn't break my rule of "a single byte/native word on the stack never matters", since the deeply recursive nature means eliminating a single variable eliminates a lot more than 8 bytes. and even then it's dubious, since modern stacks are quite large. i don't think i've ever encountered a stack overflow that was the result of anything other than programmer error (even on tiny embedded systems!)
A common case you can hit in some real language parsers is throwing a big enough "1 + 1 + (...) + 1" expression at them (often just low hundreds of terms), if they used a fully recursive descent parser instead of Pratt parsing for their infix operators.
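the core of the fix is that a left-associative operator chain doesn't need one call frame per term. a deliberately minimal sketch (my own, ignoring parentheses and precedence for brevity, which is where full Pratt parsing comes in):

```rust
// Parsing "1+1+...+1" with a loop instead of one recursive call per `+`.
// Stack use is constant no matter how many terms the input contains,
// whereas naive recursive descent recurses once per operator.
fn parse_sum(input: &str) -> Option<i64> {
    let mut total = 0i64;
    for term in input.split('+') {
        total += term.trim().parse::<i64>().ok()?; // None on malformed input
    }
    Some(total)
}

fn main() {
    println!("{:?}", parse_sum("1 + 1 + 1"));
}
```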
i'm not sure deliberately crafting a malicious expression is exactly a "common case", and arguably not limiting the amount of recursion that can be produced by user input is unsound, as it is a possible denial of service vector on systems with memory protection, and an ACE vector on systems without it.
Back in the day many beginner programmers cut their teeth on the 6502 microprocessor. The 6502 had hardware support for only a 256 byte stack.
that's not really a full stack, it's used for return addresses and saving registers. i mean i guess if you use a lot of integer variables you're gonna have to save a lot of registers... but for large structs and arrays, the compiler will almost certainly use a supplemental stack.
What compiler? When I said "cut their teeth" I meant those nascent programmers were kids moving on from BASIC to assembler on their Commodore C64s and the like. Which they had to do to get their game ideas running at acceptable speeds.
And people complain about Rust being hard for beginners. I don't know. Kids today...
More seriously I have worked on many embedded systems built in C on small microcontrollers where memory and hence stack space was very limited. Rust can be used in such places as well.
I know a guy who implemented AES in 256 bytes for some kind of 4-bit microcontroller… but that was done by breaking all sensible rules of safe programming.
Something similar happens when you squeeze Rust into a few kilobytes of code: the only way to fit a large program into such a micro is to reuse parts of the code with unsafe. At that point the program becomes hard to tweak, because Rust tries to guarantee safety by insisting that different things don't alias, while the manual code works around Rust's limitations. You get a program that's neither safe nor easily tweakable: the worst of both worlds.
Rust makes a lot of sense on today's microcontrollers, but that's because even a very cheap STM32 is a 32-bit microprocessor with, usually, more memory than an Atari 400 home computer!
In theory it should be possible to port Rust to 8-bit CPUs, but it wouldn't be feasible to use, I'm afraid.
People have hit stack overflows in real compilers with real code before; it's just the reduced code in bug reports that looks like that. (Normally it's generated code, though I've seen some real scary hand written expressions!)
To be honest I have not used Rust on such small systems yet. However Cliff Biffle has an impressive demo in Rust on an STM32 with only 192K RAM: Rewriting m4vgalib in Rust - Cliffle. Rust is used on ESP32 devices with down to 320K RAM. Amazingly, I hear Rust can even be used on the 8-bit AVR ATmega devices, which are tiny.
What I had in mind was to use Rust with my RISC-V core on an Altera FPGA. That only has 32K RAM available.
And that's 3 times more than the C64 had. And that was considered a pretty damn “beefy” home computer at the time. Most home computers had less.
Rust is perfectly feasible on today's microcontrollers, but that's because said microcontrollers are more powerful than the home computers of yesterday, which ran IDEs, complicated games, spreadsheets and many other quite sophisticated programs.
I agree 1 byte isn't much, but this argument doesn't help. If 99.6% of the time you waste 0 bytes and 0.4% of the time you waste 4096 bytes, that comes out to about 16 bytes wasted on average. You probably meant 99.98%, which would still mean you're wasting about 1 byte of stack on average.
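the arithmetic behind that reply, written out (my own sketch; `p` is the fraction of allocations that spill into a fresh page):

```rust
// Expected bytes wasted per allocation, given a probability `p` of
// spilling into a fresh page of `page` bytes: simply p * page.
// 0.4%  of 4096 bytes ≈ 16.4 bytes on average;
// 0.02% of 4096 bytes ≈ 0.8 bytes on average.
fn expected_waste(p: f64, page: f64) -> f64 {
    p * page
}

fn main() {
    println!("{}", expected_waste(0.004, 4096.0));
    println!("{}", expected_waste(0.0002, 4096.0));
}
```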
The flip-side is that if you're not in a constrained environment, rather than engage in ever heavier contortions to save a byte, you should do the thing that makes the code easier to maintain when you come back to it. It's important to note here that as the compiler gets better, it can often elide stack allocations itself - and to do that, the compiler needs the code to be clear so that it's able to deduce that an allocation can be elided.
This is especially true for people still learning Rust, because the contortions you come up with as an early Rustacean are going to make it hard to refactor later for a much bigger reduction in stack consumption, once you know how to work with the compiler.
And I say this as someone whose professional software work has spanned everything from 8 bit machines with 256 bytes of RAM and 1024 bytes (1 KiB) of EPROM, all the way to giant server farms where individual hosts had RAM and SSD measured in TiB. I've had more experiences in Rust where doing the simple thing made the compiler elide the allocation completely (so it's present in the source code but not the output binary) than I've had success with any form of contortion to remove a stack allocation; while I've had to contort my code on occasion, it's never been a trivial thing to do.
I worked with 4-bit microcontrollers, but only with systems with memories and SSDs measured in gigabytes, not terabytes. And in my experience that is true for constrained environments, too:
In my experience, when your clear code is compiled into something unacceptably big, it's much easier to just give up, use asm and have full control over what you are producing than to keep the code in an ugly shape which the compiler may then turn into simple machine code.
Optimizations are always opportunistic; they are never guaranteed, while asm is stable. That makes a huge difference in practice, unless you just completely freeze your toolchain and never upgrade (a thing many embedded developers try to do).
With Rust, a “never upgrade” strategy quickly leaves you on an isolated island, detached from everyone else. Not a good place to be.
Back in the day the original PL/M for Intel 8080 had a neat optimisation re: stack usage. Procedures were not reentrant by default. Any parameters they needed were passed through static memory, local variables were held in static memory. Why? Because all that pushing and popping of parameters and accessing variables via the stack pointer took more instructions, made the code bigger, and slowed things down. If you really wanted a recursive function or something that could be used in an interrupt handler or other thread then you had to add the REENTRANT attribute to the procedure definition.
I notice there is still a C compiler today, C51, for the 8051 that does something similar.
Think I'll write an RFC for the opposite attribute in Rust.
this is basically exactly what i was talking about in this thread.
the main issue i see is that unless the compiler inserts enter/exit guards around the function to panic if it's called reentrantly, this constitutes an unchecked precondition that causes memory corruption if violated, which means it would have to require unsafe.
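a sketch of what such a guard could look like (my own illustration, not a proposed implementation: parameters/locals live in static storage like PL/M's non-reentrant procedures, and a flag turns a reentrant or concurrent call into a panic instead of silent corruption):

```rust
use std::ptr::addr_of_mut;
use std::sync::atomic::{AtomicBool, Ordering};

// Flag guarding the static "frame" below.
static IN_USE: AtomicBool = AtomicBool::new(false);
// Static storage standing in for what would normally be stack locals.
static mut SCRATCH: [u8; 64] = [0; 64];

fn non_reentrant(input: u8) -> u8 {
    // Enter guard: fail loudly if the static frame is already in use.
    if IN_USE.swap(true, Ordering::Acquire) {
        panic!("reentrant call to non_reentrant");
    }
    // SAFETY: the flag above ensures exclusive access to SCRATCH,
    // which is exactly the unchecked precondition being discussed.
    let out = unsafe {
        let scratch = addr_of_mut!(SCRATCH);
        (*scratch)[0] = input.wrapping_mul(2);
        (*scratch)[0]
    };
    IN_USE.store(false, Ordering::Release); // exit guard
    out
}

fn main() {
    println!("{}", non_reentrant(3));
}
```

note the atomic only catches the violation at runtime; without it, a signal handler or second thread calling in would silently clobber SCRATCH, which is why the hypothetical attribute would have to be unsafe to apply.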