Is Rust's built-in string null-terminated?

In C++, every string literal automatically gets a terminating null character appended. I wonder whether Rust does something similar.

To check, I ran an experiment. The code is:

fn main(){
   let s = "abcd";
   let p = s.as_ptr();
   for i in 0..5{
      unsafe{
         let current = p.add(i);
         let c = *current;
         println!("{c}");
      }
   }
}

The output is

97
98
99
100
0

We can observe the terminating null character 0. Is it true that Rust's string literals are also terminated with a null character?

No. What you've seen is an example of UB - namely, accessing memory beyond the available allocation.

18 Likes

No. Rust uses a pointer+length encoding, and there is no guarantee that there will be a null byte after the end of the buffer.
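As a rough sketch of what "pointer+length" means in practice: a `&str` carries its length next to its address, so nothing in the buffer has to mark the end. The helper name `parts` is made up for this illustration, and the two-word size printed below is what current compilers produce rather than something stated in this thread.

```rust
use std::mem::size_of;

/// Returns the (pointer, length) pair that a `&str` carries.
/// `parts` is a hypothetical helper name used only for this sketch.
fn parts(s: &str) -> (*const u8, usize) {
    (s.as_ptr(), s.len())
}

fn main() {
    let s = "abcd";
    let (ptr, len) = parts(s);
    // The length travels with the pointer, so `len()` never scans for a terminator.
    println!("ptr = {ptr:p}, len = {len}");
    // On current compilers a `&str` is two machine words wide: address plus length.
    println!("size_of::<&str>()      = {}", size_of::<&str>());
    println!("2 * size_of::<usize>() = {}", 2 * size_of::<usize>());
    // A 0 byte inside a string is ordinary content, not an end marker.
    println!("\"a\\0b\".len() = {}", parts("a\0b").1);
}
```

Because the end is defined by the stored length, a 0 byte can appear anywhere inside a `str` without cutting it short.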

11 Likes

Where can I find a source stating that Rust's strings are not terminated with a null byte?

Which part of the Rust Reference says this?

You can take a reference to a substring that will point to a part of the original string.

let s = "abcdef";
let s2 = &s[1..3];

I assume my example is also unsound, but it does contrast with your original example.
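To make the contrast concrete, here is a small sketch that stays entirely in safe code: it reads the byte following the sub-slice through the parent string rather than through a raw pointer. The helper `byte_after` is a hypothetical name introduced only for this example.

```rust
/// Returns the byte of `parent` that sits directly after the sub-slice
/// `parent[start..end]`, read safely through `parent` itself.
/// `byte_after` is a hypothetical helper for this sketch.
fn byte_after(parent: &str, start: usize, end: usize) -> Option<u8> {
    let _sub = &parent[start..end]; // panics if the range is invalid
    parent.as_bytes().get(end).copied()
}

fn main() {
    let s = "abcdef";
    let s2 = &s[1..3];
    assert_eq!(s2, "bc");
    // `s2` borrows the same buffer as `s`, one byte in; nothing was copied.
    assert_eq!(s2.as_ptr(), s.as_ptr().wrapping_add(1));
    // The byte that follows "bc" in memory is b'd' (100), not 0.
    println!("{:?}", byte_after(s, 1, 3));
}
```

Since `s2` is a perfectly valid `&str` whose last byte is followed by `d`, a `&str` in general cannot promise a trailing 0.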

2 Likes

If you're asking "why it's UB to access this?", then it's here:

Behavior considered undefined
...

...
A reference/pointer is "dangling" if it is null or not all of the bytes it points to are part of the same live allocation (so in particular they all have to be part of some allocation).

If you're asking about the string representation, then it's here:

A value of type str is represented the same way as [u8] , it is a slice of 8-bit unsigned bytes.

7 Likes

Here's one such place, and another, but you don't need documentation to tell it's true.

3 Likes

For an additional/alternative source, running the original post’s code through Miri (in the playground under TOOLS→Miri) gives this feedback directly, without any need to read through lots of documentation.

error: Undefined Behavior: dereferencing pointer failed: alloc1653 has size 4, so pointer to 1 byte starting at offset 4 is out-of-bounds
 --> src/main.rs:7:18
  |
7 |          let c = *current;
  |                  ^^^^^^^^ dereferencing pointer failed: alloc1653 has size 4, so pointer to 1 byte starting at offset 4 is out-of-bounds
  |
  = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
  = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
  = note: BACKTRACE:
  = note: inside `main` at src/main.rs:7:18: 7:26

The output up to the point of reaching UB is only

97
98
99
100

so clearly, since it did not appear here, the 0 we saw was just an arbitrary result of undefined behavior and doesn’t mean anything.

8 Likes

Maybe my question was not clear. I am asking: which part of the Rust Reference says that string literals are not terminated with a null character?

I want to find the relevant sections or rules that say Rust's string literals are not terminated with null characters.

A section on str layout:

String slices are a UTF-8 representation of characters that have the same layout as slices of type [u8] .

No mention of a trailing zero, as you can see. An explicit statement that they are not terminated is probably not in the Reference, since that is implied by the absence of any requirement for them to be terminated.
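A short sketch of what "same layout as `[u8]`" looks like from safe code; `raw_bytes` is a hypothetical helper name used only here.

```rust
/// Returns a copy of the bytes that make up `s`: exactly `s.len()` of them.
/// `raw_bytes` is a hypothetical helper for this sketch.
fn raw_bytes(s: &str) -> Vec<u8> {
    s.as_bytes().to_vec()
}

fn main() {
    let bytes = raw_bytes("abcd");
    // Four bytes for four ASCII characters; no fifth byte is part of the value.
    println!("{bytes:?}");
    // Viewing the same bytes as a `str` again only requires that they be valid UTF-8.
    let back = std::str::from_utf8(&bytes).expect("bytes came from a str");
    println!("{back}");
}
```

The byte view of `"abcd"` has length 4, which is everything the language defines for that value.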

12 Likes

There is no place that says there is a null character (and I’ll discuss below why there cannot be such a place, without anyone needing to actually read the whole reference), so there need not be a place explicitly calling out that there isn’t.

Null-terminated strings are considered bad language design by some people anyways, for reasons like: determining their length is an O(n) operation[1], and you cannot create a sub-string without copying all the data. Rust strings can be sliced into sub-strings efficiently because str doesn’t need to end with a null, as demonstrated by multiple people in previous responses in this thread, which should be sufficient proof that there cannot be any place in the reference that says that strings are null-terminated.
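For illustration, the O(n) cost can be sketched by emulating a C-style length lookup over a byte buffer. `c_style_len` is a hypothetical helper written for this comparison; it is not a standard-library function.

```rust
/// C-style length: walk the buffer until the first 0 byte, so the cost grows
/// with the length of the string. Returns `None` if there is no terminator.
/// `c_style_len` is a hypothetical helper for this sketch.
fn c_style_len(buf: &[u8]) -> Option<usize> {
    buf.iter().position(|&b| b == 0)
}

fn main() {
    // A buffer laid out the way C would store "abcd".
    let c_like = b"abcd\0";
    println!("{:?}", c_style_len(c_like));
    // A Rust `str` just reports the length it already stores.
    println!("{}", "abcd".len());
}
```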


If you mean your question more narrowly, for string literals only (i.e. values of type &'static str that are created with the "foo" syntax and compiled into static memory), then there is still, logically, nothing behind or terminating a string literal, since (as people have also demonstrated above) neither you nor any part of your code can access the byte behind the last character without causing immediate undefined behavior. This doesn’t yet rule out that, in practice, every string literal is followed by at least one 0 at runtime… but such considerations would no longer have anything to do with Rust the language. It would be a valid implementation strategy for the compiler[2], but it doesn’t matter, since any memory outside of the string literal cannot be accessed without causing undefined behavior.


  1. assuming the C-style approach where you don’t pass some length information alongside it; but if you did that the null-termination would become unnecessary, leaving only disadvantages (besides possibly better C-interop) ↩︎

  2. though I would be surprised if you couldn’t, with the correct length literal, come up with example code demonstrating that this implementation strategy is not the one taken by rustc+LLVM ↩︎

9 Likes

CString is nul-terminated. Regular strings don't provide such a guarantee.
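A minimal sketch of the difference, using `std::ffi::CString` from the standard library; the wrapper `to_c_bytes` is a hypothetical name introduced for this example.

```rust
use std::ffi::CString;

/// Copies `s` into a new, NUL-terminated buffer and returns its bytes.
/// Returns `None` if `s` contains an interior 0 byte, which `CString::new` rejects.
/// `to_c_bytes` is a hypothetical helper for this sketch.
fn to_c_bytes(s: &str) -> Option<Vec<u8>> {
    CString::new(s).ok().map(|c| c.into_bytes_with_nul())
}

fn main() {
    // The `str` view: four bytes, no terminator.
    println!("{:?}", "abcd".as_bytes());
    // The `CString` copy: the same four bytes plus a trailing 0.
    println!("{:?}", to_c_bytes("abcd"));
}
```

The terminator only exists because `CString::new` allocates a fresh buffer and appends it, which is why the conversion is explicit.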

4 Likes

I’m wondering why nobody pointed this out earlier (or perhaps this, too, has changed over time for some reason), but as far as I can tell the Rust playground very consistently does not print a 0 even for the original code in this thread, in all configurations: for any combination of stable/beta/nightly and debug/release. (At the moment, I’m getting either 10 or 100 printed, depending on debug vs. release mode.)[1]


  1. of course, since the code is already proven to produce UB anyways, this observation is technically entirely irrelevant, but at least it does – at least in my interpretation – even more clearly refute the original premise ↩︎

5 Likes

Yeah, I realized later the citations weren't about literals specifically, but I'd argue it's implicit given the runtime erasure of lifetimes, ability to subslice &str, type equality of leaked (&'static str) strings, no defined (non-UB) way to read past the end, etc. That said, I don't think a clarification in documentation somewhere for people coming from other languages would be a bad thing.

(Incidentally Rust has no spec or standard, and as such normative citations are scarce.)

You can also easily see that rustc adds no trailing NUL in the assembly (.ascii vs .asciz).

1 Like

The existence of https://doc.rust-lang.org/std/primitive.str.html#method.split_at.

It can't be null-terminated if you can split "text" into ("te", "xt") without copying.
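A small sketch of that argument in safe code: after `split_at`, the second half begins at the address right after the first half ends, so there is no room for a terminator between them. `halves_are_adjacent` is a hypothetical helper name.

```rust
/// Splits `s` at byte index `mid` and reports whether the second half starts
/// at the address right after the first half's last byte.
/// `halves_are_adjacent` is a hypothetical helper for this sketch.
fn halves_are_adjacent(s: &str, mid: usize) -> bool {
    let (left, right) = s.split_at(mid); // panics if `mid` is not a char boundary
    left.as_ptr().wrapping_add(left.len()) == right.as_ptr()
}

fn main() {
    let (left, right) = "text".split_at(2);
    println!("{left:?} {right:?}");
    // Both halves are views into the original buffer; "te" is followed directly by "xt".
    println!("{}", halves_are_adjacent("text", 2));
}
```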

4 Likes

That seems a rather odd request to me. Why should a specification describe all the possible ways it does not do things? That would be a rather large document!

4 Likes

"are not terminated with null characters" is what Rust does, according to previous answers. I just want to read the normative part regarding this.

What Rust does is "stores the strings as UTF-8 representation of characters". Every other storage format, including null-termination, is what Rust doesn't do.

11 Likes