Reference/Lifetime Program Design Tips

Do you have any general program design tips regarding lifetimes? As a new Rust programmer I keep running into the same pattern, and I'd really like to break it:

  1. I write a program the way a "normal programmer" would in a language I'm familiar with, like C.
  2. I end up having to refactor something, for example taking some variables and grouping them into a struct.
  3. During this refactoring I run into what I'll call "lifetime hell": something I hadn't planned for now has to be a reference. That sets off a chain reaction where I have to change bits and pieces of my functions and structs all over the place to accommodate the reference and introduce lifetimes, and I then face a long series of errors about undeclared lifetimes and similar.

Of course, all programming has this sort of thing: if a piece of data changes type in one place, all other areas must be updated unless they were written "generically", so that's not really my complaint. However, I'm wondering if you have any "high level design tips" for when I first design a program, before I start coding, that would save time when writing Rust, specifically involving references and lifetimes. One such tip might be, for example, "favor references over moves in most situations, for reason X", etc.

Thank you.


This can likely evolve into a mini-book (and maybe should, at some point), but let's start with some simple "thought snippets".

So, some random thoughts that spring to mind:

  • if the type does not survive a single thread's execution (i.e. it's active for 1+ call frames, but does not cross threads), it's likely a good candidate for borrowing data from elsewhere. Think iterators as the canonical example.

  • if the type represents some long-lived state of the application, then it's unlikely to be borrowing anything - it owns its data. It may be borrowed from by items in the previous bullet point.

  • if the type is a "message" passed between threads, it likely won't borrow anything but will be moved around (ultimately between threads). It should own the data (even if it's technically sharing ownership via Arc or whatnot).

  • in general, the longer lived something is the less likely it can borrow; the shorter lived, the more likely it can borrow.

  • Really nail down (possibly by iterating over the code/design space) who owns what data - these owners will not be borrowing anything. For each type you define, what's its purpose? Where does it get data it needs? Is it owning that data or is it merely a "helper" to do something and can borrow from elsewhere?

  • Really nail down how data flows across the components. This flow may entail moving lightweight "message"-like types around.

  • There's a technique sometimes called "state splitting": placing each piece of data in the right spot (i.e. struct) so that it's conducive to borrowing. It relies on the fact that the compiler understands that borrows of distinct fields of a struct are disjoint (and thus allowed at the same time).

  • Somewhat as a consequence of the above, don't be afraid of creating "lots of little dedicated types". The opposite, "too much bundling of state" into a few large types, is common in languages that don't make that pattern painful; in Rust it tends to make borrowing harder.

  • Experiment + experience! No amount of guidelines will really substitute experience, particularly across your specific application(s). It's likely you're not (yet) building libraries for other people to consume, so you have the freedom of radically changing the implementation without affecting anyone other than your own code. So you'll likely feel like you're playing a bit of "type tetris", but that's ok. Folks building libs for others to consume are in a much more difficult predicament because any design mistake along any of the axes is harder to correct (certainly without breaking users' code).
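As a rough illustration of the owner/borrower split in the bullets above, here is a minimal sketch assuming a hypothetical inventory program (`Inventory`, `Part`, and `LowStock` are made-up names, not from any crate). The long-lived `Inventory` owns its data and has no lifetime parameter; the short-lived `LowStock` iterator borrows from it and carries `'a`.

```rust
// Long-lived application state: owns its data, so no lifetime parameters.
struct Inventory {
    items: Vec<Part>,
}

struct Part {
    name: String,
    qty: u32,
}

// Short-lived helper: borrows from an `Inventory` for as long as it is
// being iterated, so it carries the lifetime `'a`.
struct LowStock<'a> {
    parts: std::slice::Iter<'a, Part>,
    threshold: u32,
}

impl<'a> Iterator for LowStock<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        let threshold = self.threshold;
        // Advance to the next part below the threshold and yield its name.
        self.parts
            .find(|part| part.qty < threshold)
            .map(|part| part.name.as_str())
    }
}

impl Inventory {
    // The borrow of `self` lasts only while the returned iterator is used.
    fn low_stock(&self, threshold: u32) -> LowStock<'_> {
        LowStock { parts: self.items.iter(), threshold }
    }
}

fn main() {
    let inventory = Inventory {
        items: vec![
            Part { name: "bolts".to_string(), qty: 3 },
            Part { name: "nuts".to_string(), qty: 40 },
            Part { name: "washers".to_string(), qty: 1 },
        ],
    };
    let low: Vec<&str> = inventory.low_stock(5).collect();
    println!("{:?}", low); // ["bolts", "washers"]
}
```

Because the lifetime only appears on the short-lived helper, adding fields to `Inventory` later does not push lifetime annotations through the rest of the program.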

Getting this right is a challenge, no matter one's Rust proficiency, so don't be discouraged if you find yourself rejigging things around a lot.
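A small sketch of the "state splitting" idea, assuming a hypothetical game `World`: the log is moved into its own `EventLog` type with its own method, so it can be mutated while `players` is being iterated mutably.

```rust
struct Player {
    name: String,
    hp: i32,
}

// The log lives in its own small type with its own method...
struct EventLog {
    entries: Vec<String>,
}

impl EventLog {
    fn record(&mut self, entry: String) {
        self.entries.push(entry);
    }
}

// ...so `World` is made of independently borrowable pieces.
struct World {
    players: Vec<Player>,
    log: EventLog,
}

impl World {
    fn damage_all(&mut self, amount: i32) {
        for player in &mut self.players {
            player.hp -= amount;
            // Borrows only `self.log`, which is disjoint from `self.players`.
            self.log.record(format!("{} now has {} hp", player.name, player.hp));
        }
    }
}

fn main() {
    let mut world = World {
        players: vec![Player { name: "ann".to_string(), hp: 10 }],
        log: EventLog { entries: Vec::new() },
    };
    world.damage_all(3);
    println!("{:?}", world.log.entries); // ["ann now has 7 hp"]
}
```

If `record` were instead a method on `World` taking `&mut self`, the call inside the loop would be rejected, since it would borrow all of `self` while `self.players` is already mutably borrowed.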

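To illustrate the "message" bullet, here is a sketch using `std::sync::mpsc`, where the `Job` enum and `run_jobs` function are hypothetical. The message owns its `String`s, so it can be moved into the channel and on to a worker thread. Note that `std::thread::spawn` requires its closure to be `'static`, so a message that borrowed from the sending thread's stack would not compile here.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical message type: it owns all of its data (no lifetime
// parameters), so it can be moved to another thread.
enum Job {
    Resize { path: String, width: u32 },
    Delete { path: String },
}

// Sends every job to a worker thread and returns what the worker did.
fn run_jobs(jobs: Vec<Job>) -> Vec<String> {
    let (tx, rx) = mpsc::channel::<Job>();

    let worker = thread::spawn(move || {
        let mut done = Vec::new();
        // `recv` returns Err once every Sender has been dropped.
        while let Ok(job) = rx.recv() {
            match job {
                Job::Resize { path, width } => done.push(format!("resized {path} to {width}")),
                Job::Delete { path } => done.push(format!("deleted {path}")),
            }
        }
        done
    });

    for job in jobs {
        // Ownership of the Job (and its Strings) moves into the channel.
        tx.send(job).expect("worker hung up");
    }
    drop(tx); // close the channel so the worker's loop ends

    worker.join().expect("worker panicked")
}

fn main() {
    let done = run_jobs(vec![
        Job::Resize { path: "a.png".to_string(), width: 100 },
        Job::Delete { path: "b.png".to_string() },
    ]);
    println!("{:?}", done); // ["resized a.png to 100", "deleted b.png"]
}
```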

I would add to this:

  • Try to structure your data as DAGs (Directed Acyclic Graphs): avoid creating cycles except where you have absolutely no other choice, and if you are forced to create cycles, consider breaking them with Weak references (from Rc/Arc, often combined with RefCell for mutation).
  • Instead of creating a bunch of structs to hold data (things), create traits (actions/sets of actions) and write your code to take and return either impl Trait (static dispatch/monomorphized) or Box<dyn Trait> (dynamic dispatch/single implementation). Only implement concrete structs for the things, and implement the traits for them, after you've written out the algorithms in terms of the traits (sets of actions) they deal with.
  • Be sure to separate "Data Holder Objects" (which will likely be owned by a collection) from "Functional Service Objects" (whose structs will likely live on the stack and only have fields that are "dependencies" for the needed functionality). Those dependencies should likely be Box<dyn Trait> (again, dynamic dispatch) or generic over one or more traits (static dispatch/monomorphized) rather than a specific struct.
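A minimal sketch of breaking a parent/child cycle with `Weak`, modeled loosely on the tree example in the Rust Book (`Node` and `add_child` are illustrative names). Children are owned through strong `Rc`s, while the back-pointer to the parent is a `Weak`, so dropping the root actually frees it.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Children are owned through strong `Rc`s; the link back to the parent is a
// `Weak`, so parent <-> child does not form a strong reference cycle.
struct Node {
    name: String,
    parent: RefCell<Weak<Node>>,
    children: RefCell<Vec<Rc<Node>>>,
}

impl Node {
    fn new(name: &str) -> Rc<Node> {
        Rc::new(Node {
            name: name.to_string(),
            parent: RefCell::new(Weak::new()),
            children: RefCell::new(Vec::new()),
        })
    }
}

// Hypothetical helper: attach `child` under `parent`.
fn add_child(parent: &Rc<Node>, child: Rc<Node>) {
    *child.parent.borrow_mut() = Rc::downgrade(parent);
    parent.children.borrow_mut().push(child);
}

fn main() {
    let root = Node::new("root");
    let leaf = Node::new("leaf");
    add_child(&root, Rc::clone(&leaf));

    // `upgrade` yields Some only while the parent is still alive.
    let parent_name = leaf.parent.borrow().upgrade().map(|p| p.name.clone());
    println!("{:?}", parent_name); // Some("root")
    println!("{} {}", Rc::strong_count(&root), Rc::weak_count(&root)); // 1 1

    // The Weak link does not keep `root` alive, so it is freed here.
    drop(root);
    println!("{}", leaf.parent.borrow().upgrade().is_none()); // true
}
```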
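The trait-first and service-object suggestions could look roughly like the following sketch. `Store`, `MemoryStore`, and `VisitCounter` are hypothetical names; the same `VisitCounter` works with static dispatch (a concrete or `impl Store` type) and, through the forwarding impl for `Box<T>`, with dynamic dispatch (`Box<dyn Store>`).

```rust
use std::collections::HashMap;

// A set of actions, described as a trait rather than a concrete struct.
trait Store {
    fn get(&self, key: &str) -> Option<u32>;
    fn put(&mut self, key: &str, value: u32);
}

// One concrete "data holder" that implements the trait.
#[derive(Default)]
struct MemoryStore {
    map: HashMap<String, u32>,
}

impl Store for MemoryStore {
    fn get(&self, key: &str) -> Option<u32> {
        self.map.get(key).copied()
    }
    fn put(&mut self, key: &str, value: u32) {
        self.map.insert(key.to_string(), value);
    }
}

// Forwarding impl so a `Box<dyn Store>` can be used wherever a `Store` is expected.
impl<T: Store + ?Sized> Store for Box<T> {
    fn get(&self, key: &str) -> Option<u32> {
        (**self).get(key)
    }
    fn put(&mut self, key: &str, value: u32) {
        (**self).put(key, value)
    }
}

// A "functional service object": its only field is its dependency,
// and it is generic over the trait rather than a specific struct.
struct VisitCounter<S: Store> {
    store: S,
}

impl<S: Store> VisitCounter<S> {
    fn visit(&mut self, user: &str) -> u32 {
        let n = self.store.get(user).unwrap_or(0) + 1;
        self.store.put(user, n);
        n
    }
}

// Returning `impl Store` hides the concrete type from callers.
fn new_store() -> impl Store {
    MemoryStore::default()
}

fn main() {
    // Static dispatch: `S` is the concrete type behind `impl Store`.
    let mut fast = VisitCounter { store: new_store() };
    fast.visit("ann");
    println!("{}", fast.visit("ann")); // 2

    // Dynamic dispatch: `S` is `Box<dyn Store>`.
    let boxed: Box<dyn Store> = Box::new(MemoryStore::default());
    let mut flexible = VisitCounter { store: boxed };
    println!("{}", flexible.visit("bob")); // 1
}
```

Because the service owns its dependency, it needs no lifetime parameters at all; choosing the generic or boxed form is mostly a trade-off between monomorphized code and runtime flexibility.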

Unlearn using pointers. Don't think of passing/returning/storing things by pointer (you may worry that returning "by value" does a lot of copying — it doesn't. Don't worry). For C programmers that's the usual source of self-inflicted pain caused by insisting on using references where they are not required.
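A small sketch of the "return by value" habit (the functions are illustrative): rather than having the caller pass a &mut out-parameter as in C, return an owned value and use `Option` or `Result` for failure. Moving a `Vec` only transfers its small pointer/length/capacity header; the heap buffer stays where it is, which the pointer comparison at the end shows.

```rust
// Return an owned value instead of filling a caller-provided `&mut` slot.
// `Option` signals failure, so no separate "success" flag is needed.
fn parse_pair(input: &str) -> Option<(i32, i32)> {
    let (a, b) = input.split_once(',')?;
    let a = a.trim().parse::<i32>().ok()?;
    let b = b.trim().parse::<i32>().ok()?;
    Some((a, b))
}

// Returning a `Vec` by value moves it: only its (pointer, length, capacity)
// header is transferred; the elements on the heap are not copied.
fn squares(n: u32) -> Vec<u32> {
    (1..=n).map(|i| i * i).collect()
}

fn main() {
    println!("{:?}", parse_pair("3, 4")); // Some((3, 4))
    println!("{:?}", parse_pair("oops")); // None

    let v = squares(4);
    let buffer = v.as_ptr();
    let moved = v; // another move: same heap buffer, no element copies
    println!("{:?} {}", moved, moved.as_ptr() == buffer); // [1, 4, 9, 16] true
}
```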

Really understand the concept of ownership. Know where each value is stored (lives) in the program, conceptually and in practice.

Learn about interior mutability. Large programs will need Arc<Mutex<>> sooner or later.
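A minimal sketch of shared mutable state with `Arc<Mutex<_>>` (the `count_in_parallel` function is hypothetical). `Arc` provides shared ownership across threads, and `Mutex` provides interior mutability, handing out exclusive access at runtime through `lock()`. In single-threaded code, `Rc<RefCell<_>>` plays the same role.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Each thread increments a shared counter `per_thread` times.
fn count_in_parallel(threads: usize, per_thread: u32) -> u32 {
    // Arc: shared ownership across threads.
    // Mutex: interior mutability; `lock()` grants exclusive access at runtime.
    let total = Arc::new(Mutex::new(0u32));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let total = Arc::clone(&total);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    *total.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // Bind the value first so the MutexGuard is dropped before `total` is.
    let result = *total.lock().unwrap();
    result
}

fn main() {
    println!("{}", count_in_parallel(4, 1000)); // 4000
}
```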
