Data ownership design patterns? Feasible or not?

While participating in this thread on good ol' Twitter, we came to the topic of data ownership modeling.

We came to this topic through the dreaded question: when is it right to use Rc or Arc rather than reshaping the program's architecture?

My understanding is that it's complicated, like many things in our domain 😅. It takes flair, expertise, and trial and error to know whether one solution or the other is the right answer to the problem.
More often than not, beginners feel intense pressure not to use them. I, for one, know that they are correct to use, but I will never know if I could have done better.

Out of this realization, I believe (and that's pure belief, a leap of faith of sorts) that by exposing those memory management primitives, Rust opens up a field of discussion around modelling ownership, one that was previously left to the garbage collector or the compiler. The borrow checker ensures that your ownership model is sane, not that it's appropriate.

In the same way that the GoF published design patterns for OOP by studying OO programs and abstracting away the various solutions found here and there, there may be an opportunity to study many Rust programs and focus on the data modelling solutions they employ.

Here's an example. ECS was mentioned during the discussion, in the context of game dev. But do we recognize this pattern somewhere else? Is it being used to solve similar problems in different problem spaces? What can we learn from this pattern: its strengths, its weaknesses, where it can be applied, examples of implementations, novel or in the wild, etc.
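For readers who haven't met ECS, here is a deliberately tiny sketch of the data layout at its core. It is not a real ECS library, and the names (World, Position, Velocity, spawn, movement_system) are hypothetical: entities are plain indices, each component type lives in its own Vec, and a "system" is a function that runs over every entity that has the components it needs.

```rust
// Toy sketch of the ECS data layout (hypothetical names, not a real ECS crate).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Position {
    x: f32,
    y: f32,
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct Velocity {
    dx: f32,
    dy: f32,
}

/// An entity is just an index into the component storages.
type Entity = usize;

#[derive(Default)]
struct World {
    // One "column" per component type; None means the entity lacks that component.
    positions: Vec<Option<Position>>,
    velocities: Vec<Option<Velocity>>,
}

impl World {
    fn spawn(&mut self, pos: Option<Position>, vel: Option<Velocity>) -> Entity {
        self.positions.push(pos);
        self.velocities.push(vel);
        self.positions.len() - 1
    }
}

/// A "system": updates every entity that has both a Position and a Velocity.
fn movement_system(world: &mut World) {
    // Disjoint field borrows: positions mutably, velocities immutably.
    for (pos, vel) in world.positions.iter_mut().zip(world.velocities.iter()) {
        if let (Some(p), Some(v)) = (pos.as_mut(), vel.as_ref()) {
            p.x += v.dx;
            p.y += v.dy;
        }
    }
}

fn main() {
    let mut world = World::default();
    let moving = world.spawn(
        Some(Position { x: 0.0, y: 0.0 }),
        Some(Velocity { dx: 1.0, dy: 2.0 }),
    );
    let still = world.spawn(Some(Position { x: 5.0, y: 5.0 }), None);
    movement_system(&mut world);
    println!("{:?}", world.positions[moving]); // Some(Position { x: 1.0, y: 2.0 })
    println!("{:?}", world.positions[still]); // Some(Position { x: 5.0, y: 5.0 })
}
```

Nothing in this layout holds a pointer to anything else: the World is the single owner, which is a big part of why the pattern sits comfortably with the borrow checker.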

You see where I'm going with it.

A Fool's Errand or a noble quest?


I feel that this is definitely a "noble quest"™. Consider all the forum responses that point new Rustaceans to ECS when they try to create structures such as graphs, which would be interlinked with pointers in other languages. The book could be titled something like Design Patterns for Rust.
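The answer usually given to the graph question is some variant of the following sketch, which assumes a simple adjacency-list graph: one Graph owns every node, and edges are indices into its Vec rather than pointers or Rc. The names Graph, Node, NodeId, add_node, add_edge and neighbors are hypothetical.

```rust
// Hypothetical index-based graph: nodes refer to each other by position in a Vec.
type NodeId = usize;

#[derive(Debug)]
struct Node {
    label: String,
    edges: Vec<NodeId>, // outgoing edges, stored as indices, not pointers
}

#[derive(Debug, Default)]
struct Graph {
    nodes: Vec<Node>, // the Graph is the single owner of every node
}

impl Graph {
    fn add_node(&mut self, label: &str) -> NodeId {
        self.nodes.push(Node { label: label.to_string(), edges: Vec::new() });
        self.nodes.len() - 1
    }

    fn add_edge(&mut self, from: NodeId, to: NodeId) {
        self.nodes[from].edges.push(to);
    }

    /// Labels of the nodes reachable in one step from `id`.
    fn neighbors(&self, id: NodeId) -> impl Iterator<Item = &str> + '_ {
        self.nodes[id].edges.iter().map(move |&n| self.nodes[n].label.as_str())
    }
}

fn main() {
    let mut g = Graph::default();
    let a = g.add_node("a");
    let b = g.add_node("b");
    let c = g.add_node("c");
    g.add_edge(a, b);
    g.add_edge(b, c);
    g.add_edge(c, a); // a cycle, with no Rc, Weak or RefCell involved
    let from_c: Vec<&str> = g.neighbors(c).collect();
    println!("{:?}", from_c); // ["a"]
}
```

The trade-off is that a NodeId is not tied to the Graph it came from, and removing nodes needs extra care (for example tombstones or generational indices).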


That sounds like an interesting project. Even one I would contribute to.

Interesting. It is certainly somewhat of an art to tell whether using an Arc is the right choice. I suppose some problems are just inherently about shared data, and others, not so much.


I could totally imagine 2-3 books about this topic with the titles: "Conquering Rust, Ownership Level: Beginner / Intermediate / World Conqueror" or something similar. 😁

Rust's type system gives you choices between:

  • owning vs borrowing
  • shared vs exclusive
  • single-threaded vs thread-safe

And when you need the particular combination of owning+shared+threaded, then you need Arc. If you don't need threads, then it's Rc. If you don't need shared, then it's Box. If you don't need owning, then it's &, etc.
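A rough illustration of that decision table, assuming read-only data so that only the ownership axes are in play. The helper function names are made up for the example.

```rust
use std::rc::Rc;
use std::sync::Arc;
use std::thread;

// Not owning: a plain shared borrow; the caller keeps ownership.
fn borrowed_total(values: &[u32]) -> u32 {
    values.iter().sum()
}

// Owning + exclusive: a Box has exactly one owner, and ownership moves in here.
fn boxed_total(values: Box<[u32]>) -> u32 {
    values.iter().sum()
}

// Owning + shared, single-threaded: Rc (cloning bumps a non-atomic counter).
fn rc_owner_count(data: Vec<u32>) -> usize {
    let first = Rc::new(data);
    let second = Rc::clone(&first);
    Rc::strong_count(&second)
}

// Owning + shared + thread-safe: Arc (cloning bumps an atomic counter).
fn arc_thread_sums(data: Vec<u32>, threads: usize) -> Vec<u32> {
    let shared = Arc::new(data);
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let owner = Arc::clone(&shared);
            thread::spawn(move || owner.iter().sum::<u32>())
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let data: Vec<u32> = vec![1, 2, 3];
    println!("borrowed: {}", borrowed_total(&data)); // 6
    println!("boxed: {}", boxed_total(data.clone().into_boxed_slice())); // 6
    println!("rc owners: {}", rc_owner_count(data.clone())); // 2
    println!("thread sums: {:?}", arc_thread_sums(data, 2)); // [6, 6]
}
```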

I'm not sure how to explain that as ownership design patterns. It's rather the other way: general programming patterns require certain types of ownership. You choose your design, you analyze what data it has to own, mutate, and share, and then you choose data types accordingly.


shared → shared read-only vs. shared read-write
shared read-write + thread-safe → lock vs. lock-free (vs. wait-free)
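The lock vs. lock-free split shows up with something as small as a shared counter. A minimal sketch with hypothetical function names: one version guards the count with Arc<Mutex<u64>>, the other uses Arc<AtomicU64> and fetch_add, which takes no lock. (AtomicU64 is only available on targets with native 64-bit atomics.)

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

// Shared read-write via a lock: every increment acquires the mutex.
fn count_with_mutex(threads: u64, per_thread: u64) -> u64 {
    let counter = Arc::new(Mutex::new(0u64));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}

// Shared read-write without a lock: one atomic read-modify-write per increment.
fn count_with_atomic(threads: u64, per_thread: u64) -> u64 {
    let counter = Arc::new(AtomicU64::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    // Relaxed is enough for a pure counter; join() provides the synchronization.
                    counter.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    counter.load(Ordering::Relaxed)
}

fn main() {
    println!("mutex:  {}", count_with_mutex(4, 1_000)); // 4000
    println!("atomic: {}", count_with_atomic(4, 1_000)); // 4000
}
```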

No, it's not about mutability! That's a common misconception.

For example, &AtomicUx is shared, but still allows writes. You can also declare lifetimes in a way that lets you return an immutable & reference that is still exclusive (when it has the same lifetime as some &mut).

Arc<Mutex> is shared and read-write, and certainly not wait-free.
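Both points can be checked in a few lines. This sketch (function names are hypothetical) writes through two & references to the same AtomicUsize, and shows a function that takes &mut and returns a & with the same lifetime, so the caller's exclusive borrow lasts as long as the returned reference does.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// A shared (&) reference that still permits writes, via the atomic's interior mutability.
fn bump(counter: &AtomicUsize) {
    counter.fetch_add(1, Ordering::Relaxed);
}

// Takes &mut, returns & with the same (elided) lifetime: the returned
// reference is immutable, but it keeps the caller's exclusive borrow alive.
fn first_after_push(v: &mut Vec<i32>, x: i32) -> &i32 {
    v.push(x);
    &v[0]
}

fn main() {
    let counter = AtomicUsize::new(0);
    let (a, b) = (&counter, &counter); // two shared references at the same time
    bump(a);
    bump(b);
    println!("{}", counter.load(Ordering::Relaxed)); // 2

    let mut v = vec![10];
    let first = first_after_push(&mut v, 20);
    // let n = v.len(); // would not compile: `v` is still mutably borrowed while `first` is alive
    println!("{first}"); // 10
}
```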


My bad, you're right, of course! I was thinking beyond pure ownership.

If I understand correctly, you're saying that one could not devise design patterns around memory ownership because it's more of a set of rules, a checklist of sorts, and I don't disagree that there's a strong checklist aspect to the problem.

My concern is not getting the checklist right, since ultimately the compiler tells you whether it is, but getting your program modelled right around those constraints.

And this is likely where we diverge. Using those rules, you can produce an unlimited number of different programs that all compile, but depending on the data ownership model, some will surely be better than others. The goal of the endeavour is to surface the best practices and patterns that may have arisen in the community over time.
In your opinion, @kornel, would looking into the problem not get very far, since it's always solvable through those rules?

Design patterns are great, as long as they don't become prescriptive, like trying to use singletons everywhere, or shoehorning REST APIs where they don't belong.


My rule of thumb is to get the compiler to do as much of my work for me as I can. That means I turn to reference counting only as a last resort. So far, I've managed to build a rather complete simulator of a distributed network of servers without any Rc, and with Arc only appearing as Arc<Mutex<_>>.

Getting there required long conversations with the borrow checker and more clone() operations than I would like, but I have no runtime panics as could happen when using RC in a way that violates the borrow rules. Things may change once I start tuning for performance, but so far I've been able to run examples as large as 100 nodes in a minute or so on my laptop.
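For illustration only, and not the poster's design: one way to avoid Rc entirely in a simulator like this is to give every piece of state a single owner and pass owned, cloned messages through a queue. This sketch floods one message through a line of nodes; Simulator, Node, Message, line and run are hypothetical names.

```rust
use std::collections::VecDeque;

// Hypothetical simulator: all state has exactly one owner, no Rc/RefCell.
#[derive(Debug)]
struct Message {
    to: usize,
    payload: String,
}

#[derive(Debug)]
struct Node {
    neighbors: Vec<usize>,
    seen: Vec<String>, // owned exclusively by this node
}

struct Simulator {
    nodes: Vec<Node>,         // the simulator owns every node
    queue: VecDeque<Message>, // in-flight messages are owned values too
}

impl Simulator {
    /// Builds nodes 0..len connected in a line and injects one message at node 0.
    fn line(len: usize, payload: &str) -> Simulator {
        let nodes = (0..len)
            .map(|i| {
                let mut neighbors = Vec::new();
                if i > 0 {
                    neighbors.push(i - 1);
                }
                if i + 1 < len {
                    neighbors.push(i + 1);
                }
                Node { neighbors, seen: Vec::new() }
            })
            .collect();
        let mut queue = VecDeque::new();
        queue.push_back(Message { to: 0, payload: payload.to_string() });
        Simulator { nodes, queue }
    }

    /// Delivers messages until the queue is empty; returns how many were delivered.
    fn run(&mut self) -> usize {
        let mut delivered = 0;
        while let Some(msg) = self.queue.pop_front() {
            delivered += 1;
            // Borrows only `self.nodes`, so `self.queue` stays usable below.
            let node = &mut self.nodes[msg.to];
            if node.seen.contains(&msg.payload) {
                continue; // already flooded through this node
            }
            node.seen.push(msg.payload.clone());
            for &n in &node.neighbors {
                // Each forwarded copy is a clone: simple, at some memory cost.
                self.queue.push_back(Message { to: n, payload: msg.payload.clone() });
            }
        }
        delivered
    }
}

fn main() {
    let mut sim = Simulator::line(3, "hello");
    let delivered = sim.run();
    println!("{delivered} deliveries"); // 5 deliveries
    for (id, node) in sim.nodes.iter().enumerate() {
        println!("node {id} saw {:?}", node.seen);
    }
}
```

The clone() calls on the payload are the price mentioned above; an Rc<str> or Arc<str> payload would make those clones cheap, at the cost of reintroducing reference counting.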


You're confusing it with RefCell, I think?

Cool. What are you simulating?

I 💯 agree with you. Having written a book about using the GoF design patterns in Swift, I can only concur that they can become a source of much confusion for beginners.
When you discover a new pattern, you tend to overuse it in the wrong scenarios.

I believe this endeavour would aim to achieve the opposite.

Focusing on the problems, not the solutions. Recognizing the patterns in the problems rather than in the solutions. It's kinda the opposite of a regular design pattern book, which focuses on premade recipes but often forgets to ask what kind of cake you wanna eat first.

Is it open source? This kind of project is likely a great source of inspiration when it comes to abstracting away how you solved your own memory management issues over time.

Yes, I should have said RefCell. It's an easy mistake for me to make, since I haven't resorted to using it.
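The runtime checks in question live in RefCell: it enforces the borrow rules while the program runs instead of at compile time, so a conflicting borrow_mut() compiles fine and then panics. A minimal sketch, using try_borrow_mut() so the conflict shows up as an Err rather than a panic:

```rust
use std::cell::RefCell;

fn main() {
    let cell = RefCell::new(vec![1, 2, 3]);

    let reader = cell.borrow(); // shared borrow, counted at runtime
    // Calling cell.borrow_mut() at this point would compile, then panic at runtime.
    // try_borrow_mut() reports the same conflict as an Err instead of panicking.
    assert!(cell.try_borrow_mut().is_err());
    drop(reader); // end the shared borrow

    cell.borrow_mut().push(4); // no other borrows are alive, so this succeeds
    println!("{:?}", cell.borrow()); // [1, 2, 3, 4]
}
```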

Mixing up Rc, Arc, Mutex and RefCell is something I encounter quite often. Rc is rarely used without a RefCell, and Arc is rarely used without a Mutex, which makes their distinct purposes unclear.
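One way to untangle them: Rc and Arc only answer "who owns this?", while RefCell and Mutex only answer "how can it be mutated through a shared handle?". A sketch showing each piece alone and then combined (all variable names are illustrative):

```rust
use std::cell::RefCell;
use std::rc::Rc;
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // Rc alone: shared ownership of data nobody mutates. No RefCell needed.
    let config = Rc::new(String::from("read-only"));
    let also_config = Rc::clone(&config);
    println!("{} owners of {:?}", Rc::strong_count(&config), also_config);

    // RefCell alone: one owner, mutation through a shared & checked at runtime. No Rc needed.
    let log = RefCell::new(Vec::new());
    let log_handle = &log;
    log_handle.borrow_mut().push("entry");

    // Rc<RefCell<T>>: several owners (Rc), each able to mutate (RefCell).
    let shared_log = Rc::new(RefCell::new(Vec::new()));
    let writer = Rc::clone(&shared_log);
    writer.borrow_mut().push("written through the clone");
    println!("{:?}", shared_log.borrow());

    // Arc alone: read-only data shared across threads. No Mutex needed.
    let table: Arc<Vec<i32>> = Arc::new(vec![1, 2, 3]);
    let reader = Arc::clone(&table);
    let sum = thread::spawn(move || reader.iter().sum::<i32>()).join().unwrap();

    // Arc<Mutex<T>>: several owners across threads (Arc), synchronized mutation (Mutex).
    let total = Arc::new(Mutex::new(0));
    let adder = Arc::clone(&total);
    thread::spawn(move || {
        *adder.lock().unwrap() += sum;
    })
    .join()
    .unwrap();
    println!("{}", *total.lock().unwrap()); // 6
}
```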

The startup I'm working for has a new design for datacenter networks. Since we don't have the money to buy a datacenter, I built a simulator that we use to verify that our algorithms all work. Its behavior is as close as I can make it to what will happen on a physical network.

Oh, and please let me know if you're looking for something to invest in. 😃

Unfortunately, our code is not open source. One plan is to open source the simulator once we start shipping our product. We think it could be a useful tool for developers of distributed applications.
