Rust strings are not so friendly with C/C++

Why don't strings in Rust end in '\0'? Communicating with C would be much easier if they did.
To pass a string to C, I need to create a CString, get a pointer to it, and make sure the CString is not dropped while the pointer is in use. If I want to use it for longer, I have to store it alongside the pointer in the same struct.
Why are strings like this?
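For reference, the workflow described above can be sketched roughly as follows. This is only an illustration: `c_side_len` is a hypothetical stand-in for a real C function (it walks to the NUL terminator the way strlen would), and `Name` is a made-up wrapper type. The sketch stores only the CString and derives the pointer on demand, which is one way to avoid keeping both in the same struct.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical stand-in for a C function taking `const char *`.
/// A real one would be declared in an `extern "C"` block.
unsafe fn c_side_len(ptr: *const c_char) -> usize {
    // Walks to the NUL terminator, as C's strlen would.
    let c = unsafe { CStr::from_ptr(ptr) };
    c.to_bytes().len()
}

/// Owns the CString so that pointers handed to C stay valid.
struct Name {
    owned: CString,
}

impl Name {
    fn new(s: &str) -> Option<Name> {
        // CString::new fails if `s` contains an interior NUL byte.
        CString::new(s).ok().map(|owned| Name { owned })
    }

    /// The pointer is valid only while `self` is alive.
    fn as_ptr(&self) -> *const c_char {
        self.owned.as_ptr()
    }
}

fn name_len(s: &str) -> Option<usize> {
    let name = Name::new(s)?;
    // `name` outlives the call, so the pointer does not dangle.
    Some(unsafe { c_side_len(name.as_ptr()) })
}

fn main() {
    println!("{:?}", name_len("hello"));
}
```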

Here are some advantages:

  1. The len() function runs in constant time.
  2. A string slice can end earlier than where the zero terminator would be.
  3. A Rust string is able to contain null bytes.
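A small sketch of those three points, using only the standard library (`first_word` is a helper invented for this illustration):

```rust
/// Returns the text before the first space, or the whole string.
fn first_word(s: &str) -> &str {
    // The slice simply stops early; no terminator is written and nothing is copied.
    match s.find(' ') {
        Some(i) => &s[..i],
        None => s,
    }
}

fn main() {
    let text = "hello world";
    let word = first_word(text);

    // Point 1: len() reads the length stored alongside the pointer.
    println!("{} has {} bytes", word, word.len());

    // Point 2: the sub-slice starts at the same address as `text`.
    println!("same start: {}", std::ptr::eq(word.as_ptr(), text.as_ptr()));

    // Point 3: a null byte in the middle is just another byte.
    let with_nul = "a\0b";
    println!("{} bytes, contains NUL: {}", with_nul.len(), with_nul.contains('\0'));
}
```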

It's worth mentioning that if you have a Rust String object, you can also just push a zero byte to the end of the String and use a pointer into the String. Similarly, if you make a Box<[u8]> that is zero terminated, you can use Box::into_raw to get a raw pointer. Once you are done, you can use Box::from_raw to drop it.
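Both approaches might look something like this. It is a sketch, not a recommendation: `seen_by_c` is a hypothetical stand-in for the C side, and the helper names are made up. Note that nothing here checks for interior null bytes, so a C callee would see the string truncated at the first one.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical stand-in for C code: returns the bytes visible up to the first NUL.
unsafe fn seen_by_c(ptr: *const c_char) -> Vec<u8> {
    let c = unsafe { CStr::from_ptr(ptr) };
    c.to_bytes().to_vec()
}

/// Approach 1: push a zero byte onto a String and pass a pointer into it.
fn via_string(text: &str) -> Vec<u8> {
    let mut s = String::from(text);
    s.push('\0');
    // `s` must stay alive and unmodified while the pointer is in use.
    unsafe { seen_by_c(s.as_ptr() as *const c_char) }
}

/// Approach 2: a zero-terminated Box<[u8]> handed out as a raw pointer.
fn via_box(text: &str) -> Vec<u8> {
    let mut bytes = text.as_bytes().to_vec();
    bytes.push(0);
    let raw: *mut [u8] = Box::into_raw(bytes.into_boxed_slice());
    let seen = unsafe { seen_by_c(raw as *mut u8 as *const c_char) };
    // Rebuild the Box from the raw pointer so the allocation is freed.
    drop(unsafe { Box::from_raw(raw) });
    seen
}

fn main() {
    println!("{:?}", via_string("hello"));
    println!("{:?}", via_box("world"));
}
```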

When I use such strings, I write them with the 'b' prefix and add '\0' at the end. The problem is that forgetting the '\0' is a silent bug that is very difficult to notice. I don't see a good reason not to add the '\0' to the end of every string, even if it looks redundant.
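One way to make a forgotten terminator loud rather than silent is to route byte literals through CStr::from_bytes_with_nul, which returns an error when the trailing '\0' is missing or when a null byte appears before the end. A minimal sketch (`checked` is a name invented here):

```rust
use std::ffi::CStr;

/// Returns Some only if `bytes` ends in exactly one NUL and has no interior NUL.
fn checked(bytes: &[u8]) -> Option<&CStr> {
    CStr::from_bytes_with_nul(bytes).ok()
}

fn main() {
    // Correctly terminated literal.
    println!("{}", checked(b"hello\0").is_some());
    // Forgotten terminator: rejected instead of silently reading past the end.
    println!("{}", checked(b"hello").is_some());
    // Interior NUL: also rejected.
    println!("{}", checked(b"he\0llo\0").is_some());
}
```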

The cstr crate can add the trailing \0 automatically for you.

Another point is that if I have a C struct, I cannot put a CString or a similar object in it; the object that owns the string has to be kept separately.
Again, if strings ended in '\0', I would just need a pointer to a static string and everything would be resolved. What I mean is that something that could be simple has become too complicated.
I wanted to understand the reason for this decision, not a solution to the problem; I know the solutions, and others have already given them right here.
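For readers who arrive with the same struct problem, here is a rough sketch of the static-string case using an explicitly terminated byte literal. The `Widget` struct and its single field are hypothetical; a real C struct would dictate its own layout.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical C-compatible struct; the field layout is an assumption.
#[repr(C)]
struct Widget {
    name: *const c_char,
}

// The literal lives in the binary, so a pointer to it is valid for the whole program.
static WIDGET_NAME: &[u8] = b"widget\0";

fn make_widget() -> Widget {
    Widget {
        name: WIDGET_NAME.as_ptr() as *const c_char,
    }
}

/// Reads the name back the way C code would, by scanning to the NUL.
fn widget_name(w: &Widget) -> &CStr {
    unsafe { CStr::from_ptr(w.name) }
}

fn main() {
    let w = make_widget();
    println!("{:?}", widget_name(&w));
}
```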

If you want the reason for this decision, here it is in long and understandable form:

The primary design goal of str is to represent Unicode text in a conformant and performant manner, not to interoperate seamlessly with whatever C does. Everything you've said argues just as well the other way: C strings should actually work like Rust strings, since that makes interoperability between the languages simpler.
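To make the "Unicode text" part concrete: a str is always valid UTF-8, its length is counted in bytes rather than characters, and a null byte is an ordinary code point. A short sketch (the helper names are invented for illustration):

```rust
/// Returns (length in bytes, number of Unicode scalar values).
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

/// A byte sequence can become a &str only if it is valid UTF-8.
fn is_valid_text(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}

fn main() {
    // U+00E9 (e with acute accent) takes two bytes in UTF-8.
    println!("{:?}", byte_and_char_len("h\u{e9}llo"));
    // 0xFF can never appear in valid UTF-8.
    println!("{}", is_valid_text(&[0xff, 0xfe]));
    // U+0000 is a valid code point, so an embedded NUL is fine.
    println!("{}", is_valid_text(b"a\0b"));
}
```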

Technically, (1) and (3) don’t exclude a trailing null—C++11’s std::string stores internal null bytes and an explicit length while still being null-terminated for convenience with C and older C++ (and in practice so did earlier versions of std::string due to .c_str()’s easiest implementation). That said, C++17’s std::string_view makes no such guarantee, which often catches people out. I prefer Rust’s consistency and explicit use of string type to denote intention though.

EDIT: Rust (and C++ string_view) allow convenient non-modifying sub-stringing by not being null-terminated, which can be a significant gain in some cases.

Actually, I would argue that C's null-terminated strings were a mistake, precisely because people and libraries forget to 0-terminate their strings all the time. Basically the only remotely safe way to handle strings in C is to pass around a pointer and an explicit length at all times, which is just an elaborate and explicit way of writing the fat pointer that &str is.
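The "pointer plus explicit length" view of &str can be shown directly. The sketch below takes a string slice apart into its two halves and puts it back together; `split` and `join` are names made up for this example, and the size check reflects what current compilers do rather than a documented layout guarantee.

```rust
/// Size of a &str measured in machine words.
fn ref_width_in_words() -> usize {
    std::mem::size_of::<&str>() / std::mem::size_of::<usize>()
}

/// Takes a string slice apart into the (pointer, length) pair a C API would pass.
fn split(s: &str) -> (*const u8, usize) {
    (s.as_ptr(), s.len())
}

/// Reassembles the pair. The caller must guarantee that the pointer and length
/// describe live, valid UTF-8 for the chosen lifetime.
unsafe fn join<'a>(ptr: *const u8, len: usize) -> &'a str {
    let bytes = unsafe { std::slice::from_raw_parts(ptr, len) };
    unsafe { std::str::from_utf8_unchecked(bytes) }
}

fn main() {
    let text = "hello world";
    let (ptr, len) = split(&text[..5]);
    let again = unsafe { join(ptr, len) };
    println!("{} ({} words per reference)", again, ref_width_in_words());
}
```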

If you allow me a shameless self-plug: I recently released zstr, because I got annoyed with cstr making an unstable assumption about the layout of CStr and using transmute directly. zstr instead chooses to behave like a good citizen and expands to a call to CStr::from_bytes_with_nul_unchecked().
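The general idea of such a macro can be sketched with macro_rules and concat!. To be clear, this is not the actual source of zstr or cstr: `nul_terminated!` is a name invented here, and it deliberately uses the checked constructor CStr::from_bytes_with_nul, so a literal with an interior null byte panics at run time instead of relying on the unchecked call.

```rust
/// Hypothetical macro: appends "\0" to a string literal and wraps it in a &'static CStr.
macro_rules! nul_terminated {
    ($s:literal) => {
        ::std::ffi::CStr::from_bytes_with_nul(concat!($s, "\0").as_bytes())
            .expect("literal must not contain interior NUL bytes")
    };
}

fn greeting() -> &'static std::ffi::CStr {
    // The terminator is added by the macro, so it cannot be forgotten.
    nul_terminated!("hello")
}

fn main() {
    println!("{:?}", greeting());
}
```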

Hmm... off the top of my head, as still a relative Rust neophyte, I can think of some good reasons:

  1. C does not have a string type.

That might sound like an odd thing to say, but check the C standard and try to find one. There are string literals, but that does not help much. What C has is a bunch of standard library functions that work to a convention of what a string is: a bunch of bytes with a zero on the end. These library functions are notorious for being easy to use incorrectly, which causes bugs and security issues. Likewise, the programmer has to maintain this convention manually all the time, which is again prone to bugs and security issues.

  2. The C string handling functions don't support Unicode.

  3. C++ inherits all of those issues, but introduces a string class (std::string). That is great, but again hopeless for Unicode. Converting from a C++ std::string to a C string is about as difficult as doing the same in Rust.

In short I don't see any way that strings are more friendly in C or even C++.

It's worth noting that C++ is moving in the direction of Rust. Modern C++ guidelines like Bjarne Stroustrup and Herb Sutter’s CppCoreGuidelines recommend using std::string_view (which typically has the same representation as Rust &str) for passing around immutable string references. Since string_view is not guaranteed NUL-terminated, other types (like zstring) must be used when C interop is necessary.

I am thankful every day that Rust does not use null terminated strings. I am working on some code that has to deal with null terminated strings. Ugh. I can't say anything good, so I will say nothing at all.

For clarification: The 'nothing good' is about null terminated strings, not Rust's handling of them.

This is probably worth reading: The Most Expensive One-byte Mistake

Yes, I have read several texts referring to NUL as the billion-dollar mistake.

That article linked by parasyte is about zero terminated strings, not NULL values for reference types.

Does that make NUL the 1 BILLION mistake?

Isn't that the same thing the article is about, the NUL?

https://hinchman-amanda.medium.com/null-pointer-references-the-billion-dollar-mistake-1e616534d485

The article that @parasyte linked to is about null-terminated strings, but this article is about allowing null as a valid value for variables. These are not the same mistakes.

Perhaps the real billion dollar mistake is using the same word for '\0' and (void*)0? 😄
