What's the best way to deal with these four scenarios in Rust?

When you (the developer) allocate memory, there are four different possibilities regarding scope and size:

  • You know the scope and size
  • You know scope but not the size
  • You know size but not the scope
  • You know neither scope nor size

In an imaginary programming language, these situations could be dealt with using, respectively:

  • Stack allocation
  • Region, memory pool
  • Global variable
  • Some kind of GC

Example use-cases:

  • Array of fixed size, e.g. the number of bullets in a game is capped at 3
  • Linked list whose length depends on user input, say when parsing
  • Configuration variable
  • [todo] Factory method...?

So my question is, how would you deal with these in Rust?

Edit: Sorry, "scope" should be replaced by "lifetime"; I think that is clearer.

What does it mean that you don't know the scope?

It means that a developer allocates a variable but doesn't know or care to figure out its lifetime. For example, returning a new object from a factory method (in a traditional OOP language).

Well OK, maybe it would be more clear to say lifetime than scope above. Sorry.

Well you can always return a value. No heap allocations needed. Global variables have lots of concurrency problems like reentrancy.
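
To illustrate returning by value, here is a minimal sketch. The `Bullet` type and both function names are hypothetical, picked to match the "max 3 bullets" use case from the question. Ownership moves to the caller and no heap allocation is involved:

```rust
// Hypothetical type for illustration; it does not come from the thread.
#[derive(Debug, PartialEq)]
struct Bullet {
    x: f32,
    y: f32,
}

// A "factory" that returns by value: the caller becomes the owner,
// and the value lives wherever the caller stores it.
fn make_bullet(x: f32, y: f32) -> Bullet {
    Bullet { x, y }
}

// Known size and known lifetime: exactly 3 bullets stored inline in an array.
fn make_magazine() -> [Bullet; 3] {
    [
        make_bullet(0.0, 0.0),
        make_bullet(1.0, 0.0),
        make_bullet(2.0, 0.0),
    ]
}

fn main() {
    let bullet = make_bullet(1.0, 2.0); // owned by main, dropped at end of scope
    let magazine = make_magazine();
    println!("{:?}, magazine holds {}", bullet, magazine.len());
}
```

The caller decides how long the returned value lives, so the factory itself never needs to know the lifetime.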

And a linked list, especially a doubly linked one, is a bad choice. It's really hard to find a use case where it's more appropriate than the contiguous-buffer-based collections like Vec or VecDeque. On modern multicore processors with multilevel caches, cache misses ruin performance.

As always, the stdlib docs cover this nicely.

https://doc.rust-lang.org/stable/std/collections/index.html#when-should-you-use-which-collection
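
As a rough sketch of the "known lifetime, unknown size" case without a linked list, the following uses Vec and VecDeque from the standard library. The function names `tokenize` and `recent_window` are made up for this example:

```rust
use std::collections::VecDeque;

// The number of tokens depends on the input, so the result grows on the heap,
// but the elements stay in one contiguous buffer.
fn tokenize(input: &str) -> Vec<String> {
    input.split_whitespace().map(str::to_owned).collect()
}

// Keeps only the last `cap` items. VecDeque gives cheap pushes and pops at
// both ends. This sketch assumes `cap` is greater than zero.
fn recent_window(items: &[i32], cap: usize) -> VecDeque<i32> {
    let mut window = VecDeque::with_capacity(cap);
    for &item in items {
        if window.len() == cap {
            window.pop_front(); // discard the oldest element
        }
        window.push_back(item);
    }
    window
}

fn main() {
    let tokens = tokenize("let x = 1");
    let window = recent_window(&[1, 2, 3, 4, 5], 3);
    println!("{:?} {:?}", tokens, window);
}
```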

Interesting options there.

For many projects I have worked on over the years (embedded systems, safety-critical systems, etc.), the last three cases would have been considered unacceptably bad design or implementation and rejected immediately.

If you don't know the scope (be that the actual syntactic scope in the language, or where your data lives behind some smart pointer and when it is no longer required to exist), then you are opening yourself up to memory leaks and/or dangling pointers.

If you don't know the size you are opening yourself up to stack overflow or memory exhaustion.

Now, as you correctly imply, many programmers in many situations "don't know or care to figure out" these things. They assume they have infinite stack and heap. They assume that if there is a problem with either, it's not a big deal: the OS will kill the process, and the OS or the user will restart it. It does not happen often enough to worry about, right?

Kind of appealing really.

Last week one of our servers in the cloud went down. We could not even log in. It turned out my colleague, a Python head, had a Python service that had been running there just fine for many weeks, until one day it decided to eat all the memory and fill up the temp file space. He is still trying to find out where and why that happened.

To your question:

Rust works like pretty much every other language in common use.

If data only comes into existence in a function and is only needed in that function, then it lives in local variables or is perhaps passed in as a parameter. All of that happens on the stack.

If data needs to live longer than a function (or a function call chain, if you are passing references down), then it needs to be allocated on the heap. Rust has "smart pointers" like Rc and Arc to take care of that. Or you can just "move" ownership around.

The exception: if "local data" is actually going to be huge, put it on the heap so as not to blow the stack.
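
The three cases above might look roughly like this. All function names are hypothetical, and this is a sketch rather than a recommendation for any particular design:

```rust
use std::rc::Rc;

// Data needed only inside the function: plain locals on the stack.
fn sum_local() -> u32 {
    let values = [1u32, 2, 3];
    values.iter().sum()
}

// Data that must outlive the function: move ownership out to the caller.
fn build_names() -> Vec<String> {
    vec!["a".to_string(), "b".to_string()]
}

// Potentially huge "local" data: a Vec keeps its elements on the heap,
// so only a small handle lives on the stack.
fn big_buffer(len: usize) -> Vec<u8> {
    vec![0u8; len]
}

// Shared ownership: the String is freed when the last Rc is dropped.
fn share(config: String) -> (Rc<String>, Rc<String>) {
    let first = Rc::new(config);
    let second = Rc::clone(&first);
    (first, second)
}

fn main() {
    let names = build_names(); // main now owns the Vec
    let buffer = big_buffer(1 << 20);
    let (a, b) = share("verbose".to_string());
    println!("{} {} {} {}", sum_local(), names.len(), buffer.len(), Rc::strong_count(&a));
    drop(b); // one owner left; the String is freed once `a` goes too
}
```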

I really like this question because it's the sort of thing that makes languages designed for "systems programming" stand out from the others.

Often when you don't know the size you'll use dynamic dispatch (which necessitates an indirection like a reference or Box), and when you do know the size you can use the type directly.

When the memory has a fixed scope/lifetime you normally pass it around by value or store it in a variable on the stack and pass by reference (depending on whether copying/moving makes sense). When the lifetime is unknown or more complex you'll often put it behind some sort of smart pointer that manages the memory's lifetime at runtime and is passed around by value.

Common examples are Box<T> for single ownership, Arc<T> for shared ownership, or using some GC<T> smart pointer from a garbage collection library when you've got a more complicated web of objects.

Most pointer types support unsizing from a Box<T> to a Box<dyn Trait> (e.g. inside the factory function) so you get something like this:

                Known Lifetime    Unknown Lifetime
Known Size      T, &T             T, Rc<T>
Unknown Size    &dyn Trait        Box<dyn Trait>, Rc<dyn Trait>
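
A small sketch of how those quadrants can look in code. The `Shape` trait, the `Square` and `Rect` types, and the `make_shape` factory are all invented for illustration:

```rust
use std::rc::Rc;

// Hypothetical trait and types, used only to illustrate the table.
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);
struct Rect(f64, f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

impl Shape for Rect {
    fn area(&self) -> f64 {
        self.0 * self.1
    }
}

// Unknown size, known lifetime: borrow through &dyn Shape.
fn describe(shape: &dyn Shape) -> String {
    format!("area = {}", shape.area())
}

// Unknown size, unknown lifetime: a factory whose concrete Box<Square> or
// Box<Rect> is unsized to Box<dyn Shape> at the return site.
fn make_shape(kind: &str) -> Box<dyn Shape> {
    match kind {
        "square" => Box::new(Square(2.0)),
        _ => Box::new(Rect(2.0, 3.0)),
    }
}

fn main() {
    let local = Square(3.0); // known size, known lifetime: plain T
    println!("{}", describe(&local));

    let boxed = make_shape("square"); // single owner
    println!("{}", describe(boxed.as_ref()));

    let shared: Rc<dyn Shape> = Rc::new(Rect(1.0, 4.0)); // shared owners
    let other = Rc::clone(&shared);
    println!("{}", other.area());
}
```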

@Michael-F-Bryan I'm pretty sure that T and Box<T> have to be in the same square in your table.

Good point. I probably should remove Box<T> from the "known size" category altogether because it's only relevant to "unknown size and lifetime".

I'd say the two axes are whether you are forced to use type erasure (dyn) to hide the concrete type, and whether you need a smart pointer for lifetime management; from there you can construct your own comparison matrix... The real world is a bit less black and white (where would something like Cow<'a, str> fit in?), but I think that's a useful first approximation.

An interesting middle case in Rust is when the size is unknown, but can only be one of several alternatives that are known at compile time. In that case, you can represent the alternatives with an enum rather than a dyn Trait, which avoids the virtual call and the indirection associated with the latter. I find this case very common.
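
A minimal sketch of that enum approach, using a made-up `Figure` type. The enum is as large as its largest variant plus any discriminant, so values can be stored inline in a Vec or on the stack, and `match` replaces the virtual call:

```rust
// Hypothetical closed set of alternatives, all known at compile time.
enum Figure {
    Circle { radius: f64 },
    Rect { width: f64, height: f64 },
}

impl Figure {
    // Static dispatch: a match on the variant instead of a vtable lookup.
    fn area(&self) -> f64 {
        match self {
            Figure::Circle { radius } => std::f64::consts::PI * radius * radius,
            Figure::Rect { width, height } => width * height,
        }
    }
}

// Values are stored inline in the slice; no Box per element is needed.
fn total_area(figures: &[Figure]) -> f64 {
    figures.iter().map(Figure::area).sum()
}

fn main() {
    let figures = vec![
        Figure::Circle { radius: 1.0 },
        Figure::Rect { width: 2.0, height: 3.0 },
    ];
    println!("total area: {}", total_area(&figures));
    println!("size of Figure: {} bytes", std::mem::size_of::<Figure>());
}
```

The trade-off is that adding a new alternative means editing the enum and every `match` on it, whereas a dyn Trait can be implemented by new types without touching existing code.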

That is a bit of a miss, because any concrete type has a known size.
Whether you put things into an enum or use dynamic dispatch is a different matter.

I always know the scope.
If you do not know the exact scope, it means that allocation is dynamic and performed on demand (the same goes for deallocation).

The C++ object model, when it comes to storage duration, is pretty much applicable to Rust, so I would suggest reading up on it.
