How does the compiler manage struct field access?

Hi Rustaceans.

I've always thought of structs as some kind of hashmap, given that they look pretty similar to JS objects (I come from a JS background). Lately I've been studying the memory representation of various data structures and, to my surprise, the memory layout of both normal structs and tuple structs is almost the same as that of tuples: each value within the struct is laid out one after the other in stack memory with some alignment and padding. No reference to the field name or field index is stored in memory; only the values of the fields are stored. Additionally, the order of the fields in the Rust code is not guaranteed to be the same in memory, as the compiler might change it for memory optimization.

What I don't quite understand is how the Rust compiler performs access to the fields within those kinds of structs. Let me explain my reasoning:

For arrays the process is straightforward, as I know that an access expression is transformed into the address where the array is stored plus the index: an access like foo[3] means something like "go to the address where foo starts, add 3 times the size of the array item type, and there's the value you're looking for". However, in a tuple struct, an access like foo.3 can't work that way, since each value can have a different size and the order of the values is not guaranteed. In a normal struct, an access like foo.myfield clearly doesn't have a number, so number indexing does not apply, and given that the field name is not stored in memory, I'm not sure how the compiler performs the access under the hood.

So my question regarding struct field access is: since only values are stored in the actual memory layout, the size of each field can be different, and the order in which those values are stored might not be the order specified in the Rust code, how does the compiler know which part of memory to target when a specific field is accessed? Does the compiler internally record the final place in memory of each individual field within the struct, so that when a field is accessed it knows exactly which place in memory that access belongs to?

One of the important differences between Rust and JS is that Rust is statically typed and compiled. So there are two things to consider separately:

  1. Compile-time.
  2. Run-time.

The first is when you are running the compiler to create the executable file. The second is when the program actually runs. During compile-time, the Rust compiler keeps track of the type of every variable and knows a lot of information about each type such as the list of methods. One of the things it knows is the offset of each field. The compiler uses this to translate each field access into an offset operation, and the actual executable file is just going to hard-code the integer that corresponds to the offset.

So all of the information is known, but only when the compiler creates the executable. When you actually run it, each field access has been translated to a hard-coded integer offset. The string name is completely gone at that point (except for debugging information that is stored separately from the executable machine code).
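Here is a minimal sketch of that idea. The `Packet` struct and the helper functions are hypothetical names made up for illustration, and `#[repr(C)]` is only there to keep the fields in declaration order so the result is predictable. It reads a field once the normal way and once by hand as "start address plus a fixed number of bytes", which is roughly what the compiled code does for every field access:

```rust
// Hypothetical struct for illustration. `#[repr(C)]` keeps the fields in
// declaration order; with the default representation the compiler may
// pick a different order.
#[repr(C)]
struct Packet {
    tag: u8,
    len: u32,
    id: u16,
}

// Distance in bytes from the start of `p` to its `len` field.
// The compiler knows this number at compile time; here we recover it
// from the addresses at run time just to make it visible.
fn len_offset(p: &Packet) -> usize {
    let base = p as *const Packet as usize;
    let field = &p.len as *const u32 as usize;
    field - base
}

// Read `len` by hand: start address plus an offset, then load a u32.
fn read_len_manually(p: &Packet) -> u32 {
    let base = p as *const Packet as *const u8;
    // SAFETY: the offset is derived from the field's own address, so the
    // resulting pointer stays inside `p` and is correctly aligned for u32.
    unsafe { base.add(len_offset(p)).cast::<u32>().read() }
}

fn main() {
    let p = Packet { tag: 1, len: 42, id: 7 };
    println!("tag = {}, id = {}", p.tag, p.id);
    println!("offset of len in bytes: {}", len_offset(&p));
    println!("p.len = {}, manual read = {}", p.len, read_len_manually(&p));
}
```

Both reads give the same value, and no field name is involved in the manual one. On recent toolchains (Rust 1.77 and later) the `std::mem::offset_of!` macro exposes the same offset directly as a compile-time constant.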


Yes, precisely. When the compiler encounters a struct definition it decides how that particular struct type will be laid out in memory and keeps track of that information, and then every time a field is accessed it looks up the definition of that struct's type to find the offset of the field. Since all variables have a known type at compile time, it's just a straightforward lookup into the compiler's internal data.
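To see that layout decision from the outside, here is a small sketch that compares two hypothetical structs with identical fields. The struct names and the `sizes` helper are invented for this example. One struct uses the default representation, where the compiler is free to choose the field order, and the other uses `#[repr(C)]`, which keeps declaration order:

```rust
use std::mem::{align_of, size_of};

// Default representation: the compiler decides the field order.
#[allow(dead_code)]
struct DefaultLayout {
    a: u8,
    b: u32,
    c: u8,
}

// C representation: fields stay in declaration order, with padding
// inserted wherever alignment requires it.
#[allow(dead_code)]
#[repr(C)]
struct DeclaredOrder {
    a: u8,
    b: u32,
    c: u8,
}

// Sizes in bytes of the two layouts, as decided at compile time.
fn sizes() -> (usize, usize) {
    (size_of::<DefaultLayout>(), size_of::<DeclaredOrder>())
}

fn main() {
    let (default_size, c_size) = sizes();
    println!("default: {default_size} bytes, repr(C): {c_size} bytes");
    println!(
        "alignment: {} and {}",
        align_of::<DefaultLayout>(),
        align_of::<DeclaredOrder>()
    );
}
```

On common targets the default layout usually comes out smaller, because the compiler can place the two `u8` fields next to each other. The exact numbers are not guaranteed for the default representation, which is why code should never assume a particular field order unless it opts into `#[repr(C)]`.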


Ok. That makes sense. It's pretty amazing how much work the Rust compiler does under the hood. Thanks for the quick responses @alice and @tornewuff.
