Writing code without memory layout assumptions

Say I am writing a function that needs mutable access to 4 objects. The idiomatic thing to do is create a struct with 4 fields and pass in a mutable reference to it. But what if the 4 objects are not necessarily adjacent or created at the same time? Well you can still have a struct, but have every field be a mutable reference, instead of the object itself. This way the 4 objects could be anywhere in memory.
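To make the two shapes concrete, here is a minimal sketch. The names `Owned`, `Borrowed`, `bump_owned`, `bump_borrowed` and `demo_layouts` are hypothetical and only for illustration, and four `u32` values stand in for the four objects.

```rust
// Hypothetical types for illustration; four u32s stand in for the four objects.

/// All four objects stored inline, so they are adjacent in memory.
#[derive(Debug, Default)]
struct Owned {
    a: u32,
    b: u32,
    c: u32,
    d: u32,
}

/// Four mutable references; the pointees can live anywhere.
struct Borrowed<'r> {
    a: &'r mut u32,
    b: &'r mut u32,
    c: &'r mut u32,
    d: &'r mut u32,
}

fn bump_owned(s: &mut Owned) {
    s.a += 1;
    s.b += 2;
    s.c += 3;
    s.d += 4;
}

fn bump_borrowed(s: &mut Borrowed<'_>) {
    // Same logic, but every field access needs an explicit dereference.
    *s.a += 1;
    *s.b += 2;
    *s.c += 3;
    *s.d += 4;
}

fn demo_layouts() -> (Owned, [u32; 4]) {
    let mut owned = Owned::default();
    bump_owned(&mut owned);

    // One value on the stack, one in a Box, two inside a Vec.
    let mut x = 0u32;
    let mut boxed = Box::new(0u32);
    let mut v = vec![0u32, 0];
    let (first, rest) = v.split_at_mut(1);
    let mut refs = Borrowed {
        a: &mut x,
        b: &mut *boxed,
        c: &mut first[0],
        d: &mut rest[0],
    };
    bump_borrowed(&mut refs);

    (owned, [x, *boxed, v[0], v[1]])
}
```

Calling `demo_layouts()` runs the same update against both shapes. The two function bodies differ only in the dereferences.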

Of course introducing more indirection can have a performance penalty. So in a perfect world our function would be generic, letting us choose to pass in a regular struct or a struct of references. I run into 2 issues trying to do this:

  • Generics in Rust require a trait, so I need accessor trait methods that return mutable references. The regular struct returns a mutable reference to its real object field, which I assume the optimizer will always see through. The reference struct just returns a reborrow of one of its internal references, which could point anywhere, so the optimizer still has to emit real loads of the references themselves in addition to loading the data they point to. These accessor methods borrow the whole struct, though, which is not great ergonomically.

  • Mutable references don’t work well for assigning. Users can’t write their code as if they were assigning to a regular struct field, they have to explicitly deref the reference to reassign.

This (1):

a.b = foo;

To remove memory layout assumptions becomes (2):

*a.b_mut() = foo;
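For reference, a sketch of what the accessor-trait version might look like. The trait `Fields` and the types `Inline` and `Scattered` are made-up names, and only two fields are shown to keep it short.

```rust
// Hypothetical accessor trait; names are illustrative only.
trait Fields {
    fn a_mut(&mut self) -> &mut u32;
    fn b_mut(&mut self) -> &mut u32;
}

struct Inline {
    a: u32,
    b: u32,
}

struct Scattered<'r> {
    a: &'r mut u32,
    b: &'r mut u32,
}

impl Fields for Inline {
    fn a_mut(&mut self) -> &mut u32 {
        &mut self.a
    }
    fn b_mut(&mut self) -> &mut u32 {
        &mut self.b
    }
}

impl Fields for Scattered<'_> {
    // Reborrow the stored reference for the duration of the call.
    fn a_mut(&mut self) -> &mut u32 {
        &mut *self.a
    }
    fn b_mut(&mut self) -> &mut u32 {
        &mut *self.b
    }
}

/// Generic over both layouts.
fn update<T: Fields>(t: &mut T, foo: u32) {
    *t.b_mut() = foo; // form (2) from the post
    // Each accessor borrows all of `t`, so `a` has to be copied out
    // before `b_mut()` can be called again.
    let a = *t.a_mut();
    *t.b_mut() += a;
}

fn demo_accessors() -> (u32, u32) {
    let mut inline = Inline { a: 1, b: 0 };
    update(&mut inline, 10);

    let (mut x, mut y) = (1u32, 0u32);
    let mut scattered = Scattered { a: &mut x, b: &mut y };
    update(&mut scattered, 10);

    (inline.b, y)
}
```

The whole-struct borrow shows up in `update`: holding the results of `a_mut()` and `b_mut()` at the same time would not compile, even though the two fields are disjoint.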

The irony is in C++ this is easy — references can’t be reassigned so they are always dereferenced implicitly, even for assignment, and generics are duck typed so both structs just work with (1) without any special hackery.

Is there a good way to write the generic function in Rust? I’m not even sure there is a good way to generate two functions from a single implementation with proc macros because you need a deeper analysis than just tokens/AST to know where to insert dereferences.

Ideally I could preserve ergonomics of naked struct access.

If you have a trait method return a struct-of-references, you can get near-ideal ergonomics.
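A minimal sketch of that idea, assuming a made-up trait `AsRefs` whose single method hands back a hypothetical `Refs` view with one mutable reference per field:

```rust
// Hypothetical names throughout; a sketch, not a definitive design.

/// View type: one mutable reference per logical field.
struct Refs<'r> {
    a: &'r mut u32,
    b: &'r mut u32,
}

trait AsRefs {
    fn refs(&mut self) -> Refs<'_>;
}

struct Inline {
    a: u32,
    b: u32,
}

struct Scattered<'s> {
    a: &'s mut u32,
    b: &'s mut u32,
}

impl AsRefs for Inline {
    fn refs(&mut self) -> Refs<'_> {
        // Borrowing disjoint fields in one expression is fine.
        Refs { a: &mut self.a, b: &mut self.b }
    }
}

impl AsRefs for Scattered<'_> {
    fn refs(&mut self) -> Refs<'_> {
        Refs { a: &mut *self.a, b: &mut *self.b }
    }
}

fn add_then_set<T: AsRefs>(t: &mut T, foo: u32) {
    let r = t.refs();
    // Both fields are usable at once, with dot syntax plus one `*`.
    *r.a += *r.b;
    *r.b = foo;
}

fn demo_refs() -> ((u32, u32), (u32, u32)) {
    let mut inline = Inline { a: 1, b: 2 };
    add_then_set(&mut inline, 10);

    let (mut x, mut y) = (1u32, 2u32);
    let mut scattered = Scattered { a: &mut x, b: &mut y };
    add_then_set(&mut scattered, 10);

    ((inline.a, inline.b), (x, y))
}
```

Compared with per-field accessors, the struct is borrowed once and all fields are then available together. The explicit `*` on assignment remains.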

Code Crimes

With DerefMut crimes, you can potentially have perfect ergonomics! Your input struct also contains a MaybeUninit of a struct-of-references. Deref panics. On DerefMut, you re-set the interior struct-of-references to have correct references, and return a reference to it. You now have dot syntax.

Please don't do this. This is for educational value only. The contents of this details panel are licensed under WTFPL-ease don't do this.

However, I'd suggest against doing this. This is a lot of additional complexity for, at best, a microöptimization. Especially if the function border isn't an optimization boundary (an optimization boundary being, e.g., a cross-crate call with no generics, no #[inline], and no LTO), I'd question whether it'd even have a notable impact after optimization.

If the input four objects are logically distinct units, pass them in as distinct references, or as a struct of references. If they make a logical unit, and it doesn't make sense to decompose the physical structure of that unit, pass in that logical unit.

Express the data vocabulary of your domain first, and then pollute it with the fact that it has to run on a real machine if (and only if) benchmarks show that the corrupted version of the data shape is meaningfully more performant than the correct one on real data.

Unless you can show your homework that you can do better than the optimizer, the optimizer can do better than you.


If the fields are non-Copy values rather than borrows, they can also be located anywhere in memory. The only values I expect to be stored inline in the value itself are Copy values (and the Copy parts of non-Copy values).

As you have demonstrated, in Rust it's easy too. The extra Deref isn't exactly difficult to use.

Most decidedly not. Duck typing works at runtime, and traits (just like anything else in the type system, except for trait objects) are resolved at compile time.

You could write setters for the fields. Either way those will need a value rather than a borrow. I wouldn't advise trying to abstract over mutability though. Just write them by hand.
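A rough sketch of hand-written setters, with a hypothetical trait `SetB` and a `String` field chosen only for illustration. The setter takes the new value by value, as described above.

```rust
// Hypothetical setter trait; written by hand for each layout.
trait SetB {
    fn set_b(&mut self, value: String);
}

struct Inline {
    b: String,
}

struct Scattered<'r> {
    b: &'r mut String,
}

impl SetB for Inline {
    fn set_b(&mut self, value: String) {
        self.b = value;
    }
}

impl SetB for Scattered<'_> {
    fn set_b(&mut self, value: String) {
        // Assign through the reference; the old String is dropped in place.
        *self.b = value;
    }
}

/// Generic caller: no dereference is visible at the call site.
fn rename<T: SetB>(t: &mut T) {
    t.set_b(String::from("foo"));
}

fn demo_setters() -> (String, String) {
    let mut inline = Inline { b: String::new() };
    rename(&mut inline);

    let mut elsewhere = String::new();
    let mut scattered = Scattered { b: &mut elsewhere };
    rename(&mut scattered);

    (inline.b, elsewhere)
}
```

This hides the dereference from callers, at the cost of replacing `a.b = foo` with `a.set_b(foo)`.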

OK, you mean like:

struct A {}
struct B {}
struct C {}
struct D {}

fn my_func(a: &mut A, b: &mut B, c: &mut C, d: &mut D) {
}

?

Hmm... I'm lost, why would one want to do that? Why not something like I imagine above?

Especially under those conditions.

But my A, B, C, D objects can be anywhere in memory already.

What am I missing? Perhaps if you have a concrete example to show of the problem you want to solve?

Sounds wicked. I'd love to see example code that does this :)

Something like this?

Woah, not so fast. Two things:

  1. The optimizer sees through &mut references just fine most of the time, since they are guaranteed not to alias.
  2. "More indirection" is probably not true just because you happened to write out a mutable reference type. You can't do anything with any object without having a pointer to it in the first place, so unless point (1) applies and the optimizer SROA's the value away into registers, you were going to have that indirection anyway.

Bottom line: if you need a struct with mutable reference fields, just go for it.


I mean that if I pass in a struct with 4 fields (held directly by value), the compiler knows the 4 fields are adjacent. If I pass in a struct that contains 4 references instead, the compiler does not know if the 4 locations they point to are adjacent or not.

In a situation where I am presenting an API for lots of other people to use, it's going to be exhausting to explain over and over why the idioms they expect do not work. People expect a.b = x to work.

I'm saying that in C++, generics (templates) are duck typed, in the sense that there is no trait system, which is why my example works there. C++ lets you write templates (generic functions) that use a type any way you want without declaring an interface for the type. All that matters is whether the final monomorphized version type checks.
