Clarification on how the Rust compiler handles borrowing literals and static variables

Hi all! I'm struggling to understand how the Rust compiler handles borrowing literals and static lifetime variables. I put together this code sample:

```rust
use std::fmt::Display;

static MY_STR_REF: &str = "my str";

static MY_NUM: i32 = 10;
static MY_NUM_REF: &'static i32 = &MY_NUM;

fn print_static_val<T>(val: T)
where
    T: 'static + Display,
{
    println!("{}", val);
}

fn main() {
    print_static_val("hi");
    print_static_val(MY_STR_REF);

    print_static_val(20);
    print_static_val(&20);

    print_static_val(MY_NUM);
    print_static_val(&MY_NUM);

    print_static_val(MY_NUM_REF);
}
```

If I'm understanding correctly, this all works because each value passed in has a static lifetime and a size known at compile time, so it can be displayed. I assumed that the Rust compiler would simply place the literals "hi" and 10, in addition to the static variables, in the binary, allowing them to be referenced/borrowed later. However, this isn't reflected in the godbolt output of this program.
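A side note on the `T: 'static` bound, since it is easy to read as "stored in the binary": it only requires that `T` contains no non-`'static` borrows, so owned values created at runtime satisfy it too (and `T` is also implicitly `Sized`). A minimal sketch, using a hypothetical `format_static_val` that has the same bound as `print_static_val` but returns the string so it can be checked:

```rust
use std::fmt::Display;

// Hypothetical variant of `print_static_val`: same bounds, but it returns
// the formatted string instead of printing it.
fn format_static_val<T>(val: T) -> String
where
    T: 'static + Display,
{
    format!("{}", val)
}

fn main() {
    // An owned String satisfies `T: 'static`: it holds no borrows,
    // even though it is built on the heap at runtime.
    let owned = String::from("built at runtime");
    println!("{}", format_static_val(owned));

    // A borrow of a local does NOT satisfy `T: 'static`. Uncommenting the
    // next two lines fails to compile: `local` does not live long enough.
    // let local = String::from("local");
    // format_static_val(&local);
}
```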

Most of the output makes sense, but lines 217-218 (for &20) confuse me. They access the field .L__unnamed_6, which holds .asciz "\024\000\000", meaning that passing "\024\000\000" as a parameter to print_static_val works as intended and prints a borrowed value of 20. This, along with the way the compiler seems to treat static lifetime variables differently from references and literals (mov vs. lea), confuses me, especially since my asm skills are a bit 🥴. So any clarification or sources that explain this would be hugely appreciated! Thanks!

It's not really clear what your question is. What confuses you? What do you think should happen, and specifically how is reality different from your expectations?

Modern high-level languages don't map directly to any particular hardware or ISA. The compiler obeys the so-called "as-if rule": it's allowed to generate any and all kinds of machine code, as long as the high-level behavior, as specified by the language, is preserved. In particular, it need not naïvely emit instructions by transliterating the source code expression-by-expression.

When you write &20, the compiler implicitly promotes the 20 to a static variable through so-called "rvalue static promotion", so it ends up being the same as passing &MY_NUM.
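A small sketch of what the promotion buys you: a function can return a reference to a literal only because `&20` is a constant expression that gets promoted to static memory, which makes it behave like a reference to an explicit static such as `MY_NUM`. The function names here are made up for illustration.

```rust
// Compiles only thanks to rvalue static promotion: `&20` is a constant
// expression, so the 20 is placed in static memory and the reference
// has type `&'static i32`.
fn promoted() -> &'static i32 {
    &20
}

// For comparison: an explicit static, like `MY_NUM` in the question.
static MY_NUM: i32 = 10;

fn explicit() -> &'static i32 {
    &MY_NUM
}

fn main() {
    let a: &'static i32 = promoted();
    let b: &'static i32 = explicit();
    println!("{} {}", a, b);

    // Promotion only applies to constant expressions. A runtime value
    // lives on the stack, so this would be rejected (error[E0515]):
    // fn not_promoted(x: i32) -> &'static i32 { &x }
}
```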

Note that .L__unnamed_6 is not a field; it's just an assembly label, a human-friendly name for an address (offset) in the binary.
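Since a label just names an address, it may help to see that at the Rust level a reference is that address. The Rust Reference states that all references to a static refer to the same memory location, so `&MY_NUM` and `MY_NUM_REF` from the question should compare equal as pointers. A minimal sketch:

```rust
use std::ptr;

static MY_NUM: i32 = 10;
static MY_NUM_REF: &'static i32 = &MY_NUM;

fn main() {
    // A static has a single location in the binary; in the assembly that
    // location is what a label such as `.L__unnamed_6` stands for.
    let direct: &'static i32 = &MY_NUM;
    println!("&MY_NUM    = {:p}", direct);
    println!("MY_NUM_REF = {:p}", MY_NUM_REF);

    // Both references hold the same address.
    assert!(ptr::eq(direct, MY_NUM_REF));
}
```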


Thanks, that's exactly what I was looking for! Though, why don't I see a static variable being allocated in the asm? And how does "\024\000\000" map to the reference / borrow of that variable?

What do you expect to see there, and why? The whole point of a compiler is to take source code (where variables do exist) and turn it into machine code (where they don't). Lots of details are already gone at the assembly level.

"\024\000\000" is just an integer with the value 20.

I mentioned that I expected the compiler to allocate 20 as a static variable since it's being referenced/borrowed, but instead the compiler is working with .asciz "\024\000\000", which must somehow be equivalent since the program works as intended (as you mentioned). The "as-if" rule is definitely useful to keep in mind, but the assembly does show behavior that suggests something similar to that (at least, that's how I interpreted it).

I was expecting to see it as a label like the other variables, maybe something like .L__unnamed_20: .byte 20

The "20" in your code has type i32, i.e. a 32-bit number with the value 20. .asciz "\024\000\000" is 4 bytes (32 bits) representing the number 20: asciz means there is an extra zero byte at the end, and \024 is a byte written in octal notation (24 octal = 20 decimal).
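The byte arithmetic can be checked directly in Rust, which also has octal literals (`0o24`). The `asciz` helper below is hypothetical: it only mimics the directive by appending one terminating zero byte to the listed bytes.

```rust
// Hypothetical helper mimicking `.asciz`: emit the listed bytes,
// then append one terminating zero (NUL) byte.
fn asciz(bytes: &[u8]) -> Vec<u8> {
    let mut out = bytes.to_vec();
    out.push(0);
    out
}

fn main() {
    // `\024` in the assembly is octal: 2 * 8 + 4 = 20.
    assert_eq!(0o24, 20);

    // `.asciz "\024\000\000"`: three explicit bytes plus the implicit NUL.
    let emitted = asciz(&[0o24, 0o0, 0o0]);
    println!("{:?}", emitted); // [20, 0, 0, 0]

    // Those four bytes are exactly the little-endian encoding of 20_i32.
    assert_eq!(emitted, 20_i32.to_le_bytes());
}
```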


Ooh, I see what confused you. .asciz "\024\000\000" is a four-byte assembler string. The z in the name means that the assembler adds a zero byte at the end. \000 is another zero byte. And \024 is 20, because the assembler uses octal (you can guess it goes back to the 1960s, when octal was more popular than hex). Thus we have four bytes: 20, 0, 0, 0.

On a little-endian machine, that's the representation of the 32-bit integer 20.
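To see why byte order matters, here is a short sketch decoding the same four bytes both ways with the standard `from_le_bytes`/`from_be_bytes` constructors. The check on native order assumes a little-endian target such as x86-64, which the mov/lea in the question suggests but does not confirm.

```rust
fn main() {
    let bytes: [u8; 4] = [20, 0, 0, 0];

    // Little-endian: the first byte is the least significant one.
    let le = i32::from_le_bytes(bytes);
    // Big-endian: the same bytes would mean 20 << 24.
    let be = i32::from_be_bytes(bytes);

    println!("little-endian: {}", le); // 20
    println!("big-endian:    {}", be); // 335544320

    // Only on a little-endian target (assumed, e.g. x86-64) does the
    // native interpretation match `from_le_bytes`.
    if cfg!(target_endian = "little") {
        assert_eq!(i32::from_ne_bytes(bytes), 20);
    }
}
```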

The assembler doesn't care about types. Representing 20 as a string with the appropriate four bytes is a perfectly valid strategy for putting it in memory. You could use .4byte 20 instead of .asciz "\024\000\000" and get exactly the same binary after running it through the assembler.

What other variable besides that very .asciz literal do you expect? The literal 20 doesn't have a name; it's not a "variable" either. That looks like a perfectly literal translation to me.

Omg, I have never actually seen octal being used, LOL! That, combined with asciz adding an extra zero byte, is totally what was confusing me. I see now that the compiler is just inserting another literal with the value 20, not a true reference, since one isn't needed in this case. That's probably what the other commenter meant by compiler shortcuts, too. Thanks!

No, the label does result in indirection: it's a literal in the binary, and it's accessed by (de)referencing its address.
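A quick sketch of that point from the Rust side: the promoted `&20` is an ordinary reference, i.e. a pointer-sized address, and reading the 20 means dereferencing it.

```rust
use std::mem::size_of;

fn main() {
    // `&20` is promoted to static memory; `r` holds only its address.
    let r: &'static i32 = &20;

    // A reference to a sized type is a thin pointer: the same size as `usize`.
    assert_eq!(size_of::<&i32>(), size_of::<usize>());

    // Reading the value goes through that address (an indirection).
    println!("address = {:p}, value = {}", r, *r);
    assert_eq!(*r, 20);
}
```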

That's only an implementation detail at this point, though, because impl Display for &T simply forwards to impl Display for T, which is trivial enough to be inlined most, if not all, of the time. It might very well be the case that impl Display for i32 (no references in sight) is also implemented with indirection.
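The forwarding can be observed with a user-defined type. `Meters` below is hypothetical; the relevant facts are that the standard library provides a blanket `impl<T: ?Sized + Display> Display for &T`, and that `Display::fmt` takes `&self`, so even a by-value type is formatted through a reference.

```rust
use std::fmt;

// Hypothetical type, used only to show the forwarding impls.
struct Meters(i32);

impl fmt::Display for Meters {
    // `fmt` always receives `&self`, so formatting a by-value `Meters`
    // still goes through a reference.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} m", self.0)
    }
}

fn main() {
    let m = Meters(20);
    // No `impl Display for &Meters` is written here; the standard library's
    // blanket impl for `&T` forwards to the impl for `Meters`.
    let by_value = format!("{}", m);
    let by_ref = format!("{}", &m);
    let by_ref_ref = format!("{}", &&m);
    assert_eq!(by_value, by_ref);
    assert_eq!(by_ref, by_ref_ref);
    println!("{}", by_value); // 20 m
}
```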