Is there a technique to inline the fields of another struct?

pub struct TransactionBody {
    pub sender: String,
    pub receiver: String,
    pub amount: f64,
    pub time: i64,
}

pub struct TransactionWrap {
    pub sender: String,
    pub receiver: String,
    pub amount: f64,
    pub time: i64,
    // ... TransactionBody,
    pub hash: String,
}

Something like this: ... TransactionBody

No, but you can store the struct as a field.

struct TransactionWrap {
    body: TransactionBody,
    hash: String,
}
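A minimal sketch of constructing the nested version (the types are repeated here so the example is self-contained; the concrete field values are invented for illustration). Note that the body is written directly inside the wrapper, in a single expression:

```rust
pub struct TransactionBody {
    pub sender: String,
    pub receiver: String,
    pub amount: f64,
    pub time: i64,
}

pub struct TransactionWrap {
    pub body: TransactionBody,
    pub hash: String,
}

fn main() {
    // The body is constructed in place inside the wrapper; there is no
    // separate "fill TransactionBody, then copy it over" step in the source.
    let wrap = TransactionWrap {
        body: TransactionBody {
            sender: "alice".to_string(),
            receiver: "bob".to_string(),
            amount: 1.5,
            time: 1_700_000_000,
        },
        hash: "deadbeef".to_string(),
    };
    println!("{} -> {}: {}", wrap.body.sender, wrap.body.receiver, wrap.body.amount);
}
```

The fields are then reached through the extra path segment (wrap.body.sender instead of wrap.sender), which is the main ergonomic difference from true field inlining.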

Hi Alice. Yes, but how do I initialize TransactionWrap so as to avoid an extra allocation of memory (even on the stack) for TransactionBody? Otherwise it looks like a sub-optimal solution, because first you have to allocate and fill TransactionBody, and then allocate TransactionWrap and copy all the fields from TransactionBody into it.

Can macros help?

There is no extra allocation of memory here. Rust is smart enough to construct the body in the right place, so it doesn't have to be moved around on the stack. Rust is not like Java, where every object is a separate allocation.


Alternatively, you can make a Transaction struct that's generic over its payload:

pub struct Transaction<T: ?Sized> {
    pub sender: String,
    pub receiver: String,
    pub amount: f64,
    pub time: i64,
    pub payload: T,
}

This way, functions that only care about the header can be made to work with all payload types T.
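A sketch of what such a header-only function could look like (the summary function and field values here are illustrative, not from the original post):

```rust
pub struct Transaction<T: ?Sized> {
    pub sender: String,
    pub receiver: String,
    pub amount: f64,
    pub time: i64,
    pub payload: T,
}

// Only touches the header fields, so it compiles for every payload type T.
fn summary<T: ?Sized>(tx: &Transaction<T>) -> String {
    format!("{} -> {} ({})", tx.sender, tx.receiver, tx.amount)
}

fn main() {
    // The payload is a hash string here, but it could be any type,
    // including `()` for "no payload".
    let tx = Transaction {
        sender: "alice".to_string(),
        receiver: "bob".to_string(),
        amount: 2.0,
        time: 0,
        payload: "deadbeef".to_string(),
    };
    println!("{}", summary(&tx));
}
```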


Structs don't exist at runtime, so the only memory your TransactionWrap will use is the sum of its fields.

The compiler may need to add some padding bytes to make sure fields are aligned in memory (unaligned reads = bad time), but that would happen regardless of whether the nested type was inlined or not.
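One way to see this concretely is std::mem::size_of. This is a sketch assuming a typical 64-bit target, where String is 24 bytes and every field below is 8-byte aligned, so no padding is even needed:

```rust
#![allow(dead_code)]
use std::mem::size_of;

pub struct TransactionBody {
    pub sender: String,
    pub receiver: String,
    pub amount: f64,
    pub time: i64,
}

pub struct TransactionWrap {
    pub body: TransactionBody,
    pub hash: String,
}

fn main() {
    // The wrapper is exactly the sum of its parts: nesting adds no overhead.
    println!("body: {} bytes", size_of::<TransactionBody>());
    println!("wrap: {} bytes", size_of::<TransactionWrap>());
    assert_eq!(
        size_of::<TransactionWrap>(),
        size_of::<TransactionBody>() + size_of::<String>()
    );
}
```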

That's not what you'll see when the optimiser is finished with your code.

The compiler will set aside enough stack space for a TransactionWrap instance and initialize the TransactionBody field in-place. These sorts of unnecessary copies or variables are quite straightforward for an optimiser to remove and one of the reasons you see such a massive performance difference between cargo run and cargo run --release.


The nested type is going to have the same layout as the independent type (so you can e.g. std::mem::swap(&mut nester.field, &mut standalone)). If you inline instead, Rust may intermix the other fields of the containing struct to find a smaller or otherwise more performant layout overall.

(But I still feel the benefits of nesting almost always outweigh that cost.)


I'm curious: Does this hold in debug mode?


There's a bit of nuance here. Nesting can sometimes result in slightly larger types, because it forces some fields to stay grouped together, and each group gets its own padding.

struct Inner {
    x: u8,
    y: u64,
}
struct Outer {
    inner1: Inner,
    inner2: Inner,
}

struct Inlined {
    x1: u8,
    y1: u64,
    x2: u8,
    y2: u64,
}

Inlined is 24 bytes. Outer is 32 bytes.
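Those numbers can be checked directly. This is a sketch: repr(Rust) layout is formally unspecified, but current compilers on 64-bit targets produce exactly these sizes:

```rust
#![allow(dead_code)]
use std::mem::size_of;

struct Inner {
    x: u8,
    y: u64,
}
struct Outer {
    inner1: Inner,
    inner2: Inner,
}

struct Inlined {
    x1: u8,
    y1: u64,
    x2: u8,
    y2: u64,
}

fn main() {
    // Each Inner must be padded to 16 bytes (y forces 8-byte alignment),
    // so Outer is 2 * 16 = 32. In Inlined the compiler is free to reorder
    // fields and group the two u8s: 8 + 8 + 1 + 1 = 18, rounded up to 24.
    println!("Inner:   {} bytes", size_of::<Inner>());
    println!("Outer:   {} bytes", size_of::<Outer>());
    println!("Inlined: {} bytes", size_of::<Inlined>());
}
```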

Depends on what you're asking about.

w.r.t. layout optimizations: Yes. I'm fairly certain that Rust performs identical layout optimizations on all types in both Debug and Release. I.e., unless there are some #[cfg(debug_assertions)] fields on a type, one should expect that the size and alignment of a type are always identical between debug and release.

w.r.t. heap allocation: Yes. Rust will never, ever, ever in a million years implicitly allocate something from the heap (unless perhaps it is compiling to some unusual target that doesn't have anything that resembles the stack?). Even in debug mode. This would go against its fundamental design goals and compromise its ability to be used for purposes like writing an operating system.

w.r.t. temporaries on the stack: No. Unlike C++, Rust has no guaranteed copy elision anywhere in the language. On the one hand, Rust doesn't really need it that badly, because all copies in the language are shallow memcpys (unlike in C++, where they may turn into arbitrary function calls that deeply clone arbitrary resources). The downside is that the unoptimized code emitted by Rust does contain a lot of unnecessary temporaries and memcpys, and thus the optimizer has a significantly greater impact.

Thus, something like

struct Bar { x: i32, y: i32 }
struct Foo { bar: Bar, z: i32 }

let foo = Foo {
    bar: Bar { x: 1, y: 1 },
    z: 2,
};

may create dedicated stack space for the temporary Bar before copying it into the Foo. In theory, it might even create dedicated stack space for the integer constants before copying them into the Bar. There are no guarantees here.


Interesting idea!

Thanks for the answers 🙂

As others have explained to you, it isn't; but even if it were: perceived, unmeasured micro-optimizations should not be used to guide the design of a data structure. Model your problem domain first, and optimize for readability. Even if you find a bottleneck in your code, it's unlikely to have come from the initialization order of fields or something like that. Modern optimizing compilers find this kind of low-hanging fruit with overwhelming probability.


Thanks, H2CO3. I am rather doing research than design.

That's not what I am talking about.

The "design" of a data structure is what it represents, how you lay it out in memory, what relations are between its elements, what kind of operations are easier and harder to perform on it, etc. You are necessarily doing this if you are creating any sort of data structure.

So my claim that "unmeasured micro-optimizations should not be used for guiding the designs of a data structure" refers to the fact that initially, you should not create a struct layout that you think is "optimal" in terms of performance. Instead, you should create a structure that communicates the purpose of the code as clearly as possible. Whatever you are doing with the code, this will be valuable.


Yes, I understand. Premature optimization is an anti-pattern.

That's actually not good advice. I have personally achieved 10x speedups in modules that were written with optimized-for-readability data structures, and usually the only realistic way to do that was to throw away those data structures, along with all the code that used them, and write everything from scratch.

The reason here is simple: while modern optimizing compilers do marvelous local optimizations (like elimination of needless temporary variables), they are still not that good at global optimizations (Rust has some, e.g. Option<Enum> can have the same size as Enum, but not that many).

It's never too early to think about global optimizations. Things like converting an array of structs into a struct of arrays may provide amazing speedups in many cases, but that's hard to do after the code is already written (and the compiler will most definitely not perform these for you).

That said, it's still a bad idea to do unmeasured micro-optimizations. Very often your intuition will fail you. E.g. an unoptimized 32-byte data structure may, in practice, be faster than an optimized 24-byte one.

Leave small, local optimizations to the last stage of development. If an optimization doesn't change your API, then it's not even worth thinking about at the early stages: 90% of the time the compiler will do it for you, and in the remaining 10% you can easily do it later.
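To make the array-of-structs vs struct-of-arrays point concrete, here is a hedged sketch (the Particle type and field names are invented for illustration; whether SoA actually wins in a given program has to be measured):

```rust
#![allow(dead_code)]

// Array of structs: each particle's fields are interleaved in memory,
// so a loop over positions also drags velocities through the cache.
struct Particle {
    pos: f32,
    vel: f32,
}

// Struct of arrays: all positions are contiguous, all velocities are
// contiguous. A positions-only loop now reads dense memory, which is
// friendlier to caches, prefetching, and auto-vectorization.
struct Particles {
    pos: Vec<f32>,
    vel: Vec<f32>,
}

fn total_pos_aos(ps: &[Particle]) -> f32 {
    ps.iter().map(|p| p.pos).sum()
}

fn total_pos_soa(ps: &Particles) -> f32 {
    ps.pos.iter().sum()
}

fn main() {
    let aos: Vec<Particle> = (0..4).map(|i| Particle { pos: i as f32, vel: 0.0 }).collect();
    let soa = Particles {
        pos: (0..4).map(|i| i as f32).collect(),
        vel: vec![0.0; 4],
    };
    // Both layouts store the same data; only the memory arrangement differs.
    assert_eq!(total_pos_aos(&aos), total_pos_soa(&soa));
    println!("sum = {}", total_pos_soa(&soa));
}
```

This also illustrates why such a change is hard to retrofit: switching from the first layout to the second touches every access site.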


I can assure you it is.

Congrats.

Those data structures must not have been terribly well encapsulated, then.

You see, I'm not saying you should never optimize. I'm saying that micro-optimizations shouldn't be the first thing you do with your code, especially in the early phases.

The memory layout problem above is absolutely not global.

Again, that is only a matter of encapsulation. Video games do get developed somehow without all of them ending up a tangled bowl of spaghetti, despite game programmers making extensive use of SoA. If you use e.g. an in-memory relational repository (something like 2e71828's thesis work, for example), you can completely hide the representation from the interface, and you can refactor your data structures with reasonable effort, without needing to rewrite everything from scratch.


Video games do get developed somehow without all of them ending up a tangled bowl of spaghetti, despite game programmers making extensive use of SoA.

Video games are a pretty poor example. Most of them are terribly wasteful and underoptimized. They don't need to perform as well as possible, they only need to perform well enough to be playable. Plus, when they use 3D graphics (which is most of them), the bottleneck is usually the GPU, not the CPU, anyway.

If you use e.g. an in-memory relational repository (something like 2e71828's thesis work, for example), you can completely hide the representation from the interface, and you can refactor your data structures with reasonable effort, without needing to rewrite everything from scratch.

And then you achieve about 10% of what's achievable in principle. Heck, think about the Rust compiler itself. What exactly does it do to make it approximately 10x slower than a C++ compiler, and why couldn't the developers make it faster despite years of effort?

Those data structures must not have been terribly well encapsulated, then.

They were encapsulated. It was enough to make everything slow.

Linus wasn't joking when he said, trust me: every problem in computer science may be solved by an indirection, but those indirections are expensive; pointer chasing is just about the most expensive thing you can do on modern CPUs.

Remember the Numbers Every Programmer Should Know? A single access to main memory, which you may incur if your data layout doesn't make it possible to fit all the required things in the cache and/or makes it impossible to read with prefetch, is equivalent to about 1000 simple operations.

If your API doesn't make it possible to fetch and return just the data you actually need, then you hit these issues pretty fast. And if your API has a bazillion knobs which make it possible to pick what you need to fetch and return, and those knobs survive until execution time, then you lose there as well (even well-predicted jumps are not free and, worse yet, they can make it impossible to vectorize code).

Sometimes you have no choice. E.g. when your target moves so fast that without a very flexible API you would never finish your work. Then skipping global optimization work may be sensible (that's how the Rust compiler became what it is today). But you always pay for it. Usually you lose 90% of speed, sometimes 95%; if you are lucky, you lose only 80%.

These figures are from my experience.


Are you saying the Rust compiler is sub-optimal? Could you please suggest an article explaining it, preferably with data?