Why are string literals in borrowed form?

Why are string literals in borrowed form, but not other literals?
For example:

let a : &str = "hello";

If the variable 'a' holds a borrowed version of the string literal, then who owns that "hello" literal?
And why does this happen only for strings and not for other literals? If possible, please address this in terms of stack and heap memory.

2 Likes

It's baked into the executable.

Because their type, str, is dynamically-sized. It's not possible to hold onto DSTs by value directly. They must be behind indirection.

Because other literals (I assume you mean primitive numbers like integers and floats, and arrays) are statically-sized, so they can be created and stored by-value. If every literal worked by-reference, they would be very annoying to use, especially simple numbers.
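A minimal sketch of the size difference, using only `std::mem`. The function names `str_sizes` and `demo` are made up for illustration, and the two-word size of `&str` is what current compilers produce rather than something I'd call a formal layout guarantee.

```rust
use std::mem::{size_of, size_of_val};

/// Returns (size of the `&str` reference itself, size of the bytes it points to).
pub fn str_sizes(s: &str) -> (usize, usize) {
    // A `&str` is a "fat" pointer: a data address plus a length.
    let reference_size = size_of::<&str>();
    // The `str` behind it has no fixed size; it is only known at run time.
    let pointee_size = size_of_val(s);
    (reference_size, pointee_size)
}

pub fn demo() {
    let (ref_size, len) = str_sizes("hello");
    // Two machine words on current implementations (pointer + length).
    assert_eq!(ref_size, 2 * size_of::<usize>());
    assert_eq!(len, 5);

    // Statically-sized literals can simply be stored by value.
    let n: i32 = 42;
    let arr: [u8; 3] = [1, 2, 3];
    assert_eq!(size_of_val(&n), 4);
    assert_eq!(size_of_val(&arr), 3);

    // let s: str = *"hello";
    // ^ does not compile: the size of `str` is not known at compile time.
}
```

The reference always has the same size regardless of how long the string is, which is why it can live in a local variable while the `str` itself cannot.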

4 Likes

According to Rust's ownership rules, every value must have an owner. You said it's baked into the executable; doesn't that contradict the ownership rules?

In principle, the language could have been designed so that "hello" is of type str. But if that were the case, you'd always have to write &"hello" to pass it around. That'd be more code to write, for very little benefit. (The one benefit I can imagine is that you could easily write &mut "hello" to make a temporary mutable str. But that's very rarely useful.)

You can think of it as that the executable is the owner. When the program starts, it's loaded into memory by the OS and outlives all the running code.

But really, ownership is an abstraction that we make by writing code that obeys it.

  • There is code that obeys all the rules but produces a value that doesn't have an identifiable owner — Box::leak(). The Box was the owner, but is it still? You can argue it either way. If you like, you can think of “forever/leaked” as a special “null owner”.

  • If you use Rc or Arc, there is no one object that owns the value the Rc points to. Again, it's extending the ownership model. Again, this is fine.

“ownership” is really just the idea of “the responsibility for deallocating this data must be well-defined”. “This value owns that value” is the simplest and most common case, so we have simple terminology for it, but the responsibility doesn't have to be of that form. Literals, statics and leaking are “this is never deallocated”. Rc is “this is deallocated when all of the shared-owners are gone”. More variations are possible; the important thing is that the responsibility is always defined.
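To make those three kinds of responsibility concrete, here is a rough sketch. The function names `leaked`, `shared_counts` and `literal` are hypothetical; the calls themselves (`Box::leak`, `Rc::clone`, `Rc::strong_count`) are standard library APIs.

```rust
use std::rc::Rc;

/// "Never deallocated": leaking a Box yields a reference that stays valid
/// for the rest of the program, because nothing will ever free it.
pub fn leaked() -> &'static mut String {
    Box::leak(Box::new(String::from("leaked")))
}

/// "Deallocated when all shared owners are gone": returns the strong count
/// with one owner, with two owners, and after one of them is dropped.
pub fn shared_counts() -> (usize, usize, usize) {
    let a = Rc::new(String::from("shared"));
    let one = Rc::strong_count(&a);
    let b = Rc::clone(&a); // a second shared owner of the same String
    let two = Rc::strong_count(&a);
    drop(b); // the String survives; `a` still keeps it alive
    let after_drop = Rc::strong_count(&a);
    (one, two, after_drop)
}

/// "Never deallocated": a literal has no owner that would ever free it.
pub fn literal() -> &'static str {
    "hello"
}
```

In all three cases the question "who frees this, and when?" has a definite answer, even though only the `Rc` case involves anything you could point to as an owner.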

12 Likes

I'll try to explain it: If the owner of a value is dropped, the associated memory is freed. But you are not allowed to call free on static data, so it can't have an owner. That way, free is never called.

2 Likes

OK, then why are only string literals in borrowed form, and not integers or arrays?

Again, because they have a statically-known size.

It doesn't. It's simply a value living for the static lifetime (ie., the duration of the program).

Do you think the following contradicts ownership rules?

static BYTES: [u8; 5] = [b'h', b'e', b'l', b'l', b'o'];
static STR: &'static str = unsafe { core::str::from_utf8_unchecked(&BYTES) };
2 Likes

Note that you can make literals that are borrowed from the program memory too:

fn takes_static(_: &'static i32) {}

pub fn demo() {
   let x = &4;
   takes_static(x);
}

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=456344936db507b98665f19381f92de6

5 Likes

It’s not a contradiction, but it is a complication, insofar as the dedicated mechanism for “baking a value into the executable”, static variables, also needs to deal with the question of who owns the value and what implications this has. Put differently, your thoughts address exactly the question of why global variables are a bit more complicated to use in Rust than in some other languages.

The typical way ownership works in Rust is that a value is ultimately owned, at least indirectly, by a local variable of some function. (Or more precisely, of some specific function call; the “same” local variable in different calls to the same function makes for different owners.)

The “baked into the executable” form of ownership, i.e. the ownership in static variables, is different: the implication of global variables is that any function (and any call to any function) can access the variable, but no function owns it. So it’s inherently shared, which in Rust often means the same as “immutable”, and additionally it’s also inherently shared between threads, so it even requires Sync.
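A small sketch of what “shared, immutable, and Sync” looks like in practice. The names `GREETING`, `CALLS`, `greet` and `call_count` are made up for this example.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// A plain immutable static: any function can read it, none of them owns it.
static GREETING: &str = "hello";

// Mutating through a shared static needs a `Sync` type with interior
// mutability, such as an atomic.
static CALLS: AtomicUsize = AtomicUsize::new(0);

// static BAD: std::cell::Cell<usize> = std::cell::Cell::new(0);
// ^ does not compile: `Cell<usize>` is not `Sync`, so it cannot be a static.

pub fn greet() -> &'static str {
    // Every caller, on any thread, shares the same counter.
    CALLS.fetch_add(1, Ordering::Relaxed);
    GREETING
}

pub fn call_count() -> usize {
    CALLS.load(Ordering::Relaxed)
}
```

Neither `greet` nor `call_count` owns the statics; they only ever get shared access, which is why the counter has to go through an atomic instead of a plain `usize`.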

The other implication of “baked into the executable”, independent of ownership, is that the value must be known at compile-time (because there is no good time when some initialization code could be run), which for statics translates into the restriction to const-evaluable expressions as initial value.
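As a sketch of that restriction: a `const fn` call works as a static initializer, while an ordinary runtime call does not. The names below are hypothetical, and using `std::sync::OnceLock` (stable since Rust 1.70) for lazy runtime initialization is my own choice of workaround, not something from this thread.

```rust
use std::sync::OnceLock;

const fn square(x: u32) -> u32 {
    x * x
}

// Fine: the initializer can be evaluated entirely at compile time.
static AREA: u32 = square(7);

// static NAME: String = String::from("hello");
// ^ does not compile on current stable Rust: `String::from` is not
//   const-evaluable, since it would have to allocate at run time.

// One way around it: an empty cell that is filled on first use.
static NAME: OnceLock<String> = OnceLock::new();

pub fn area() -> u32 {
    AREA
}

pub fn name() -> &'static str {
    NAME.get_or_init(|| String::from("hello")).as_str()
}
```

The `OnceLock` itself is still created by a const-evaluable expression (`OnceLock::new()`); only its contents are produced later, at run time.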

Back to string literals: Their value is known at compile-time and strings are thread-safe; their data, being “baked into the executable”, can be accessed in a shared manner anyways, so making their type be &'static str to begin with makes sense. Though as others have mentioned, this design is also related to how str is a type without a fixed size known at compile time. (And b"hello world" syntax for byte-strings returning &[u8; LEN] instead of [u8; LEN] might be for consistency.)
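For reference, the literal types mentioned here can be written out explicitly. This is just an illustrative sketch; `literal_types` is a made-up name.

```rust
pub fn literal_types() -> (&'static str, &'static [u8; 5], &'static [u8]) {
    // A string literal is a reference to unsized `str` data.
    let s: &'static str = "hello";
    // A byte-string literal is a reference to a sized array...
    let b: &'static [u8; 5] = b"hello";
    // ...which coerces to a slice reference when the length should be erased.
    let slice: &'static [u8] = b;
    (s, b, slice)
}
```

So the byte-string case shows that the `&` on literals is not strictly forced by sizedness: `[u8; 5]` is sized, yet `b"hello"` is still a reference.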

For comparison, literals of types that do have a known size at compile-time can also be “baked into the executable” via so-called “static promotion” (the precise rules of which are beyond the scope of this comment), but for them you’ll have to write the & yourself, as … ah @scottmcm demonstrated that just now :)

The values of &str or static-promoted literals are a bit different from true statics in that

  • you cannot name them from other functions
  • there is no guarantee that they always refer to the same static piece of memory as true statics would; on the other hand
  • they may also be deduplicated if the compiler wants to do so. I’m not sure whether that’s allowed for actual statics, too (in cases where you do observe their address; and assuming no interior mutability)
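If you want to observe this yourself, something like the following sketch compares the addresses of two equal literals from different functions. The function names are made up, and I'm deliberately not stating what `same_address` returns: per the points above, it may differ between compilers, optimization levels and targets.

```rust
pub fn first() -> &'static str {
    "hello"
}

pub fn second() -> &'static str {
    "hello"
}

/// Whether the two literals happen to share one piece of static memory.
/// The result is not guaranteed either way.
pub fn same_address() -> bool {
    std::ptr::eq(first().as_ptr(), second().as_ptr())
}
```

The contents always compare equal; only the address comparison is unspecified, so code should not rely on it.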

Even though you cannot refer to the same baked-in value from a different function (other than using the same value and hoping for the compiler to de-duplicate them), as noted before, the relevant detail is that different function calls are different owners. Since different calls to the same function share access to the same values, essentially all the same considerations apply as with true statics. Edit: Funnily enough, thinking about this for this reply made me just re-discover an (apparently known) Rust-issue about one – possibly problematic – aspect of how they aren’t currently treated the same as statics (specifically, w.r.t. Sync).

7 Likes

It doesn't matter whether integer literals are borrowed or not, because the primitive types are Copy.
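A quick sketch of the difference, with made-up function names: a borrowed integer can be copied out by dereferencing, whereas a borrowed `str` cannot be moved or copied out and has to be converted into an owned `String`.

```rust
pub fn copy_out() -> (i32, i32) {
    // Borrowed from static memory via promotion.
    let r: &'static i32 = &4;
    // `i32` is `Copy`, so dereferencing just copies the value out,
    // as many times as you like.
    let a = *r;
    let b = *r;
    (a, b)
}

pub fn to_owned_string() -> String {
    let s: &'static str = "hello";
    // let t: str = *s;
    // ^ does not compile: `str` is unsized, so it cannot be held by value.
    // To get an owned value, the bytes are copied into a heap-allocated String.
    s.to_owned()
}
```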

To answer your specific question, it's neither in the stack nor the heap. Rather, it's in the data segment of the executable.

Furthermore, on some targets, string literals are stored in a read-only data segment. In this case, the OS will prevent the executable from modifying the underlying data, even from an unsafe block, or if you're trying to be malicious.
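One rough way to see the three regions is to look at where the bytes of the same text end up. The function name `addresses` is made up, and the actual numbers depend entirely on the target, the OS and the build, so treat this as a sketch for poking around rather than anything guaranteed.

```rust
/// Returns the data addresses of "hello" as a literal, as a local array,
/// and as a heap-allocated String, in that order.
pub fn addresses() -> (usize, usize, usize) {
    // Points into the executable's (typically read-only) data.
    let literal: &'static str = "hello";
    // A by-value copy of the bytes in a local variable.
    let on_stack: [u8; 5] = *b"hello";
    // A copy of the bytes in memory obtained from the allocator.
    let on_heap: String = String::from("hello");
    (
        literal.as_ptr() as usize,
        on_stack.as_ptr() as usize,
        on_heap.as_ptr() as usize,
    )
}
```

Printing the three values with `{:#x}` typically shows three clearly separated address ranges, with the literal's address close to the addresses of the program's functions and statics.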

1 Like

I am a member of T-opsem, but this is from my own memory and should not be considered indicative of T-opsem opinion or in any way authoritative.

I do not recall if/where immutable static items being disjoint objects (i.e. disjoint address if non-zero size) has been officially guaranteed, but essentially all discussion has taken that fact as a given.

I vaguely remember putting forward a potential model that would allow disjoint objects to observably overlap (in the context of eliding copies, however, not for overlapping between static items), but it was always more of "this is a possibility" than an actual proposal.
