mut &str: does preallocated read-only memory expand? Implicit lifetime of &str

a) What implicit lifetime does the &str have, is it &'static str or &'a str or something else?

let a: &str = "hey";


b) The data "hey" is in preallocated read-only memory.

let mut x = "hey"; // 3 bytes, UTF-8

What happens in the preallocated read-only memory when the value changes?

x = "world"; // 5 bytes, UTF-8

  • x will point to another memory location and the program will take 3 + 5 bytes in total, so that program memory swells forever?
  • "hey" will somehow be removed from read-only memory and "world" will be written elsewhere, for 5 bytes in total?
  • "hey" will be overwritten by "world" and the memory extended, for 5 bytes in total?

I wonder which type to use in a structure of which there will be only one instance for the duration of the program. The values in the structure will change dynamically.

struct User {
    username: String,
    email: String,
}

or

struct User {
    username: &'static str,
    email: &'static str,
}

or 

struct User<'a> {
    username: &'a str,
    email: &'a str,
}

a) It's &'static str.

b) None of these. String literals are placed into a read-only section of your executable at compile time, and hence loaded into memory together with the program code when you start the program. They are there for the whole duration of the program and they stay there forever. I.e. what happens is:

  • read-only memory already contains (3 bytes) "hey" and (5 bytes) "world" at the start of the program
  • let mut x = "hey" writes the address and length of the "hey" string into the variable x (pointer and length makes for 2 * 8 bytes on the stack on 64-bit architectures)
  • x = "world" changes the pointer and length information on the stack so that it now points to the "world" string that was already present in read-only memory
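The steps above can be checked with a small sketch. The function names `reassign_addresses` and `str_ref_size` are made up for illustration; the code only assumes that the two literals live at different addresses, which must hold because they start with different bytes.

```rust
use std::mem::size_of;

/// Returns the address of the string data before and after reassigning `x`.
/// (Hypothetical helper, for illustration only.)
fn reassign_addresses() -> (*const u8, *const u8) {
    let mut x: &'static str = "hey";
    let before = x.as_ptr();
    // Only the pointer and length stored in `x` change here;
    // nothing in read-only memory is written or moved.
    x = "world";
    let after = x.as_ptr();
    (before, after)
}

/// Size of the `&str` variable itself: one pointer plus one length.
fn str_ref_size() -> usize {
    size_of::<&str>()
}
```

Calling `reassign_addresses()` should yield two different addresses, and `str_ref_size()` should equal twice the size of `usize` (16 bytes on 64-bit targets), regardless of how long the string is.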

How does that work that there already is a string "world" in read-only memory available at the start of the program? Answer: during compilation, the compiler collects all string literals appearing in your entire program source (and its dependencies), possibly skipping the ones that are known to be never used (in reachable code), and possibly removing duplicates, and places them into read-only memory. If you re-use the same literal e.g. in a loop, then the same data in the same place in read-only memory can be re-used again. (This is one reason why it's important that the memory is read-only.) That's also why you can only assign string literals to such a variable of type &'static str; not any strings that are generated at run-time e.g. from user-input. Well, technically, you can explicitly leak memory (using Box::leak) to create a &'static str from run-time data, but in Rust it is almost impossible to accidentally leak memory, so you generally don't need to worry about any mechanism that will surprise you by making "program memory swell forever".
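For completeness, here is a minimal sketch of the explicit leak mentioned above. The helper name `leak_to_static` is hypothetical; `String::into_boxed_str` and `Box::leak` are standard library functions.

```rust
/// Turns a run-time `String` into a `&'static str` by deliberately leaking it.
/// The heap allocation is never freed, so every call grows memory use for good.
/// (Hypothetical helper, for illustration only.)
fn leak_to_static(s: String) -> &'static str {
    // `Box::leak` returns a `&'static mut str`, which coerces to `&'static str`.
    Box::leak(s.into_boxed_str())
}
```

Because the leak has to be written out like this, it is not something that happens behind your back when you merely reassign a `&'static str` variable.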


In this case, the data can obviously take more values than just those of string literals, so &'static str won't really work. A struct like User<'a> is also usually not what you want: structs with a lifetime argument are inflexible and hard to work with, since you need to keep some other place (a struct or similar) that owns the borrowed data. Typically you should go with owned Strings in this case, i.e. the first option.
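A minimal sketch of the owned-String option, assuming the values arrive at run time. The `new` and `set_email` methods are invented for illustration and are not part of the original question.

```rust
/// The first option from the question: the struct owns its data.
struct User {
    username: String,
    email: String,
}

impl User {
    /// Hypothetical constructor that copies the inputs into owned Strings.
    fn new(username: &str, email: &str) -> Self {
        User {
            username: username.to_owned(),
            email: email.to_owned(),
        }
    }

    /// Hypothetical setter: replaces the email with any run-time value.
    /// The old String's heap buffer is freed when it is overwritten.
    fn set_email(&mut self, email: &str) {
        self.email = email.to_owned();
    }
}
```

With this shape, the new value can come from user input, a file, or `format!`, and the previous value's memory is released automatically rather than accumulating.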

If you need to copy strings (or more precisely "clone" them) a lot (without modifying most of those copies) and want to avoid overhead from such cloning, you could also work with something like Arc<String>, and e.g. use Arc::make_mut to implement clone-on-write behavior when mutating one of the strings is necessary. But all of this is really just an optimization; cloning a few Strings is typically not all that expensive, and it's way easier, particularly in case you're just learning the language and aren't familiar with smart-pointers like Arc yet.
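The clone-on-write idea could look roughly like this. The function name `clone_on_write_demo` and the sample values are assumptions for illustration; `Arc::clone`, `Arc::ptr_eq` and `Arc::make_mut` are standard library functions.

```rust
use std::sync::Arc;

/// Returns (original value, modified copy, shared before write?, shared after write?).
/// (Hypothetical helper, for illustration only.)
fn clone_on_write_demo() -> (String, String, bool, bool) {
    let original: Arc<String> = Arc::new(String::from("alice"));

    // Cheap copy: only a reference count is incremented, no string data is copied.
    let mut copy = Arc::clone(&original);
    let shared_before = Arc::ptr_eq(&original, &copy);

    // `make_mut` notices that `original` still shares the allocation,
    // so it clones the inner String before handing out `&mut String`.
    Arc::make_mut(&mut copy).push_str("_2");
    let shared_after = Arc::ptr_eq(&original, &copy);

    ((*original).clone(), (*copy).clone(), shared_before, shared_after)
}
```

The original stays untouched while the copy is modified, and the actual string data is only duplicated at the moment of the first write.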

The lifetime in this case is 'static. This is because string literals have the type &'static str. They are statically allocated in the program's .rodata section:

$ cargo new hello
$ cd hello
$ cargo build --release
$ objdump --syms ./target/release/hello | grep rodata | head -n1
000000000003b000 l    d  .rodata        0000000000000000              .rodata
$ xxd ./target/release/hello | grep 03b000
0003b000: 4865 6c6c 6f2c 2077 6f72 6c64 210a 0000  Hello, world!...

(Note that the section may be given a different name depending on the target.)

Note that the &str case is closely related to a larger feature called constant promotion (also known as rvalue static promotion), wherein taking a reference to a literal or other constant expression results in the backing value being put into static memory and the reference being 'static. E.g. the same thing is at play when you use a &0.
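A small sketch of promotion with non-string values. The function names are made up for illustration; the point is only that both functions compile even though they return references to values written inline.

```rust
/// `&0` can be returned as `&'static i32` because the constant `0`
/// is promoted to static memory instead of living on the stack.
fn promoted_zero() -> &'static i32 {
    &0
}

/// The same applies to an array built only from constants.
fn promoted_array() -> &'static [u8; 3] {
    &[1, 2, 3]
}
```

Without promotion, both bodies would be rejected for returning a reference to a temporary.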

The number of hard-coded string literals is limited for any given, fixed source code (and literals are how the compiler generates &'static strs, as detailed very well in the previous responses), so memory can't swell forever:

let mut x = "hey";
x = "world";

thus consumes as much memory as let x = "heyworld";


Tip: using [u8]s to better grasp what is going on

As an aside, since str is an unsized type that you can't hold by value, it can sometimes be quite hard to experiment with. But since it represents a UTF-8 encoded string, it is, memory-wise, equivalent to [u8]: a slice of u8 bytes, i.e., a contiguous sequence of bytes whose length can vary from one instance to another. In other words, it is not Sized, whereas all instances of a Sized type have the same size.

So, [u8] is quite similar to str, including its un-Sized-ness, but it's suddenly way more concrete. Mainly, any fixed-len array of bytes, such as [u8; 3], when behind indirection, can be coerced to [u8].
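The coercion described here can be sketched as follows. The helper names `as_slice` and `reference_sizes` are hypothetical; the sizes are expressed relative to `usize` rather than as fixed byte counts, since they depend on the target.

```rust
use std::mem::size_of;

/// Unsized coercion: a reference to a fixed-length array becomes a slice reference.
/// (Hypothetical helper, for illustration only.)
fn as_slice(arr: &[u8; 3]) -> &[u8] {
    arr // &[u8; 3] -> &[u8]; the length 3 moves from the type into the reference
}

/// Sizes of a reference to an array (thin pointer) and to a slice (pointer + length).
fn reference_sizes() -> (usize, usize) {
    (size_of::<&[u8; 3]>(), size_of::<&[u8]>())
}
```

`as_slice(b"hey")` holds the same bytes as `"hey".as_bytes()`, and the slice reference is twice the size of the array reference because it carries the length alongside the pointer.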

For instance, the following snippet

let mut x: &'static [u8] = b"hey";
x = b"world";

can be desugared to the following equivalent code:

// read-only memory
// vvv
static HEY: [u8; 3] = [
    // there is also the `*b"hey"` shorthand syntax for this
    b'h',
    b'e',
    b'y',
];
static WORLD: [u8; 5] = *b"world";
// 3 + 5 = 8 bytes total

// x starts pointing to HEY
let mut x: &'static [u8] =
    &HEY // : &'static [u8; 3]
        as &'static [u8] // coercion
;
// make x point to WORLD, now
x =
    &WORLD // : &'static [u8; 5]
        as &'static [u8] // coercion
;