Which side is handling lifetime specifier in rust compiler?

Ooh. Sorry about that. I was using examples from C on the assumption that “C is what everyone knows”.

The story with strstr is the following: "The strstr() function shall locate the first occurrence in the string pointed to by s1 of the sequence of bytes (excluding the terminating NUL character) in the string pointed to by s2."

When we swap the arguments it doesn't suddenly start returning NULL, no. It just returns a pointer into a different argument! If it returns a pointer to a modifiable string then the program works; if it returns a pointer to an unmodifiable string then the program crashes as soon as it writes through that pointer.

And literals, on modern OSes, are kept in read-only segments (the shenanigans of early FORTRAN compilers, where you could change the literal 2 to become 3 and then 2 * 2 would return 9, are no longer possible).

My point was: this function takes references to two strings and returns a pointer that points into s1 (or NULL) and not into s2… and you may only use that pointer for as long as the original string that you passed into strstr is valid… but the compiler has no idea about that; that's something only the programmer knows!

In Rust the signature for such a function would become, approximately:

fn strstr<'a, 'b>(haystack: &'a str, needle: &'b str) -> &'a str;

Now the compiler knows that the result of that function is tied to the existence of haystack, and it can detect attempts to use the result after haystack has been deallocated (needle, on the other hand, can be deallocated or dropped at any point, no problems there).
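A minimal sketch of how that plays out in practice. The body of strstr and its Option return type are my assumptions for illustration (the signature above is only approximate), and the strings are made up:

```rust
// Hypothetical Rust counterpart of C's strstr. Returning Option instead of
// a nullable pointer is an assumption made for this sketch.
fn strstr<'a, 'b>(haystack: &'a str, needle: &'b str) -> Option<&'a str> {
    // str::find gives the byte index of the first match, if there is one.
    haystack.find(needle).map(|i| &haystack[i..])
}

fn main() {
    let haystack = String::from("the quick brown fox");
    let found;
    {
        let needle = String::from("brown");
        found = strstr(&haystack, &needle);
    } // needle is dropped here; the result only borrows from haystack

    // Still valid: haystack is alive, and that is all the signature requires.
    assert_eq!(found, Some("brown fox"));
    println!("{:?}", found);
}
```

If the two arguments were swapped in the call, the result would borrow from needle instead, and the compiler would reject the later use of found because needle does not live long enough.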

You have to follow the same rules in C, too, but there, because all that information only exists in the documentation, the developer has to check all these rules!

And the logic is an extension of what C does. In C you can add a pointer to an integer, and the generated code would be identical to what adding two integers produces, but attempts to add two pointers together are rejected. Rust has a much richer type system and can detect many more errors, but Rust ultimately uses the same compiler core that C and C++ use.


Thanks for the info. I modified your example a little bit: Compiler Explorer

Here is what I understood from this example:

const char* a = strdup("fox"); // stores data in heap
const char* b = "fox"; // stores data in stack
char* c = strstr(a, b); // returns correct pointer
...

const char* a = strdup("fox"); // points to heap
const char* b = "fox"; // points to stack
char* c = strstr(b, a); // returns null, bcs b is stack, a is heap
...

const char* a = strdup("fox"); // heap
const char* b = strdup("fox"); // heap
char* c = strstr(b, a); // wrong pointer but at least it doesn't return null

For the strstr() function to work correctly, the first parameter must point to data on the heap and the second parameter must point to data on the stack. Right? Please correct me if I'm wrong.

Edit: I loved rust again after seeing strstr's strange behaviours :smiley:

The stack is never used in that example at all; that's probably where the confusion comes from.

const char* a = strdup("fox"); // stores data in heap
const char* b = "fox"; // stores data in READ-ONLY DATA SEGMENT
char* c = strstr(a, b); // returns a which can be modified
...

const char* a = strdup("fox"); // points to heap
const char* b = "fox"; // points to data in READ-ONLY DATA SEGMENT
char* c = strstr(b, a); // returns b, which can not be modified = program crash
...

const char* a = strdup("fox"); // heap
const char* b = strdup("fox"); // heap
char* c = strstr(b, a); // returns pointer to b, can be used

No, no, no. strstr is not that strange. It always returns a pointer into its first argument and never returns NULL (in the examples involved).

But it has an identity crisis: it accepts two read-only strings, yet, suddenly, returns a pointer to a modifiable substring. How can that ever work?

Well, in the first example a lives on the heap and thus is actually modifiable. It's passed as read-only into strstr, then comes back as a modifiable substring, and everyone is happy.

And then in the second example b lives in a read-only data segment and thus can't be modified. But the compiler doesn't know that! And, worse, strstr doesn't know it either!

That's why the program compiles successfully but crashes at runtime.

In Rust a function like strstr would have to decide what it works with: it may accept a read-only string (and return a read-only substring) or it may accept a modifiable string (and return a modifiable substring). Or, to sidestep the issue entirely, it may do what Rust actually does: return the index of the substring and not an actual substring (and then the caller may decide what to do with that index: get a read-only or a modifiable segment, depending on what was available in the beginning).
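Roughly, the index-based approach looks like the sketch below. str::find is the real standard-library method; tail_from and shout_from are hypothetical helper names invented for this illustration:

```rust
// Hypothetical read-only variant: takes a shared borrow, returns a shared slice.
fn tail_from<'a>(text: &'a str, needle: &str) -> Option<&'a str> {
    // str::find returns Option<usize>: an index, not a substring.
    text.find(needle).map(|i| &text[i..])
}

// Hypothetical modifiable variant: needs a mutable borrow to begin with.
// Upper-cases everything from the first match onward; returns whether it matched.
fn shout_from(text: &mut str, needle: &str) -> bool {
    match text.find(needle) {
        Some(i) => {
            text[i..].make_ascii_uppercase();
            true
        }
        None => false,
    }
}

fn main() {
    // A literal is only ever available as a shared &str.
    let literal = "quick fox";
    assert_eq!(tail_from(literal, "fox"), Some("fox"));
    // shout_from(literal, "fox") would not compile: &str is not &mut str.

    // An owned String can hand out a mutable borrow.
    let mut owned = String::from("quick fox");
    assert!(shout_from(&mut owned, "fox"));
    assert_eq!(owned, "quick FOX");
}
```

The caller's own borrow decides whether the found segment is read-only or modifiable, so there is no way to turn a literal into a writable substring the way the C strstr signature allows.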

But C was made by people who wrote a whole operating system in assembler, remember?

They never even asked themselves that question: would that string be modifiable or read-only? Heck, as I already hinted, back then even literals weren't read-only; the world was much simpler back then!

And when the C standardization committee needed to define how strstr works… it was much too late to split strstr into two functions. And, because some users pass literals into it, it had to accept const char*, but because some other users were using it with modifiable strings… it had to return char*.

That's an obvious soundness hole, but what can they do, at this point? It's too late to ask users to throw all their code away and rewrite it!

Sorry about the whole story: I kind of assumed that you knew a bit of C and that the story with strstr would highlight only the lifetime issues (the arguments have the exact same types, so how would we know which one is used for the result?), but instead we went deep into The Soundness Pledge.

Well… as they say: good APIs are easy to use, but great APIs are also hard to abuse. And Rust usually strives for the latter… but sometimes the end result is not great and not even good, because in the attempt to make an API hard to abuse it becomes hard to use instead.

Designing great APIs is very hard work, and as you can see, to design an API even for the very simple, almost trivial task of checking whether a needle exists in a haystack… you need to think about many things.


This is probably the best description of the billion-dollar mistake (the null pointer).

Fine, I understand lots of things about C and Rust now. Thanks for this. Maybe strstr's behaviour isn't directly related to this topic, but at least I better understand where the rules of Rust come from.

Thanks for everything bro.

Edit: Sometimes I think that I should spend a couple of years with C before Rust. What do you say about that? @khimru

I have no idea whether it's a good idea or not. C, because of its history, includes a lot of “rough” edges, but all these edges can be shown in unsafe Rust, too.

On the other hand, the ability to read (but not write!) C code is something that will be needed for many years yet, because it plays the role of a lingua franca.

I think maybe reading some book which describes C would be nice, but writing actual programs in C… that's not so nice and I'm not even sure how long that skill would be needed.

C is quickly becoming what Latin was in medieval times: a language that nobody actually speaks, but everybody still uses to describe things!

