Unsafe std::env::set_var change

The last three comments are:

  • Mentioning that is safe on Windows
  • Saying that it's not reasonable to wait for the OS to provide a thread safe option given the ecosystem inertia, nor to wait for Rust to add #[deprecated_safe], which would only deprecate using it without unsafe rather than fail the build
  • Describing the state of #[deprecated_safe]

Seems pretty on topic to me?

And I suppose someone could create a crate that simply wraps set_var with unsafe, but it's trivially unsound, and can be implemented yourself with far less effort than even the tiny effort to add a create, so you might be the only user of such a crate.

Nobody is going to stop you from calling set_var, and you are either in a situation where:

  • it won't race in your process and it's correct to just add unsafe
  • it will race, and unsafe is reminding you to check that

Where's the problem?

8 Likes

No, this memory-safety can't be guaranteed. And the official rust docs about the guaranteed safeties doesn't say this.
You can overwrite the whole heap memory in unsafe rust code, or stop the CPU, while your other thread would like to run perfectly safe rust code accessing the same heap addresses via safe Vec or String methods.

Let us stop this missleading, this doesn't answer the intention. The question is not, why calling clibs setenv is unsafe, the question is why Rust doesn't provide a safe std::env::set_var, like it does for hopefully all the other not MT-safe libc calls and what has been the cause to decide this?

I don't understand where the problem is, if there would be already a correct, working spinlock in set_var, like someone have written above, why there should be any possibility that other threads should be able to read from the env in a race. This threads are then blocked before the one with set_var would be finished with this critical part.

They were actually able. That's just one example.

5 Likes

Yes, then there is no working spinlock for the env.

Have you read this comment?

In particular this part:

3 Likes

Of course. There can't be, as long as we allow any crate to link to libc, since calls into libc (not only getenv itself, as the link shows) will bypass any Rust-side locking.

3 Likes

In theory Rust may introduce “big C lock” (like the infamous big kernel lock) and then spend two decades fine-graining it, but it seems as if simply marking set_env as unsafe is much more sane approach.

You have to use it with caution if C and you have to use it with caution in Rust.

Don't like it? Don't change the environment from inside of your Rust program, use bazillion other possible approaches.

3 Likes

If I understand this correctly (take the lock every thing a FFI function is called) it would be a pretty big breaking change since it could easily introduce deadlocks in perfectly correct code.

3 Likes

Yes I have read this.
It is no explanation, because you can't force any library or sys call to anything all the time.
This is not specific to environ, getenv or setenv.
It is generally valid and so it makes no sense in the context of decision.
Libraries are completely free what is called and how they change intern over time. Next there are roughly 30 POSIX clib API functions with documented comparable MT-unsafety situation like getenv/setenv.
The "getter" std::env::var is not becoming uncritical, after "setter" std::env::set_var is marked unsafe.
You can be only sure to retrieve values, when a getenv called by std::env::var doesn't race in a setenv, but up to now there is no control about this.
At least no more control than to collect/set your environ before any code is called which might spawn threads. Which is the usual usecase for the environ and then it is practical no problem.
POSIX getenv is only MT-safe when the environ is not modified in MT context. This is because environ is copied from fix sized process stack to heap and getenv has no chance to return the correct pointers before memory is allocated and set.
But to the situation that anyone might call setenv in threading context out of control of safe rust and a badly timed "safe" std::env::var call doesn't get the environ, you can't do anything. With or without marking the std::env::set_var unsafe. It doesn't matter in this case.
But this doesn't mean that getenv/setenv corrupts data, it may not find the environ or only parts. Search in the Rustonomicon for "resource race". It is not guaranteed.
Btw. getaddrinfo doesn't call getenv, but msg formating is configurable by environ and does a static initialise with getenv. And localtime also doesn't call it directly, but the timezone handling.
And getenv of glibc, freebsd and musl can return null and/or set an error for any key while there is modification to environ.
So the story of the segfaulting because of this is wrong.
std::env::var should result in an error, like getenv does error. And for std::env::var_os this is documented at the end of 1st sentence, for ::var it is missing.
And many of the comments in the corresponding issue(s) have similar discrepancies.

And it is questionable why I've to explain all the time, become criticised in the first comment, get a comment blocked because of alleged offense, although I've simply exemplified my reading of the issue comments without writing about anybody.
And nobody is capable to answer my question, or links to the comment where the decision has been conceived.

getaddrinfo does call getenv, to retrieve the values of the LOCALDOMAIN, RESOLV_HOST_CONF and RES_OPTIONS environment variables, among others. This can be easily verified with gdb:

(gdb) bt
#0  __GI_getenv (name=name@entry=0x7ffff7f7487b "LOCALDOMAIN") at ./stdlib/getenv.c:38
#1  0x00007ffff7f2532c in __nscd_getai (key=key@entry=0x55555555ee70 "www.example.com",
    result=result@entry=0x7fffffffde00, h_errnop=0x7ffff7dd8724) at ./nscd/nscd_getai.c:47
#2  0x00007ffff7ecc412 in get_nscd_addresses (res=0x7fffffffdde0, req=0x7fffffffe4d0,
    name=0x55555555ee70 "www.example.com") at ../sysdeps/posix/getaddrinfo.c:495
#3  gaih_inet (tmpbuf=0x7fffffffdf30, naddrs=<synthetic pointer>, pai=0x7fffffffddb0, req=0x7fffffffe4d0,
    service=<optimized out>, name=0x55555555ee70 "www.example.com") at ../sysdeps/posix/getaddrinfo.c:1173
#4  __GI_getaddrinfo (name=<optimized out>, service=<optimized out>, hints=0x7fffffffe4d0, pai=0x7fffffffe4c8)
    at ../sysdeps/posix/getaddrinfo.c:2398
#5  0x0000555555558aea in ?? ()
#6  0x0000555555557710 in ?? ()
#7  0x00007ffff7e0224a in __libc_start_call_main (main=main@entry=0x555555557630, argc=argc@entry=3,
    argv=argv@entry=0x7fffffffe718) at ../sysdeps/nptl/libc_start_call_main.h:58
#8  0x00007ffff7e02305 in __libc_start_main_impl (main=0x555555557630, argc=3, argv=0x7fffffffe718,
    init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe708)
    at ../csu/libc-start.c:360
#9  0x00005555555577d1 in ?? ()

One of the internals threads linked earlier includes a Rust program that fails with a segmentation violation using just set_var() and to_socket_addrs() in different threads, rather than failing safely with an error: https://gist.github.com/leo60228/d3709b9b3a4b5338ef89c98b28ad078e.

This issue has been extensively discussed in other forums and Github issue threads, and disputing these facts in the Rust Users Forum won’t change anything about the Rust standard library making set_var() unsafe.

9 Likes

If I understand correctly, one of your issues is that, for example, it's just as possible that the getenv() called by env::var() could race with an external setenv()? And therefore, just because Rust code can race with external code, there's no reason to mark Rust code itself as unsafe, if using nothing but safe Rust code is still guaranteed to be sound?

As I understand it, the point here is that a safety contact needs to be on an unsafe API, so without the proposed change lots of FFI calls that result in a getenv() are logically required to bubble up the unsafety to the application so it can ensure that no env::set_/var() calls are happening anywhere concurrently.

With this change, only FFI calls that result in a setvar() are, and those are vanishingly few and have little use for most Rust programs; and in exchange the caller of the dubiously useful set_var() must take on the responsibility of ensuring no external env access is happening concurrently.

Safety in Rust is largely about finding someone to blame, in other words.

5 Likes

No, it wouldn't. Big kernel lock was recursive. You could take it as many times as you want on a thread which already had it.

Kernel did that for more than ten years. Python still does it.

It works. Just very inefficient and probably not a good enough solution for a scope of the problem.

Why do you say so? You can say that set_var is now unsafe and then it's responsibility of someone who calls set_env to unsure that it doesn't break anything.

It does. With unsafe markup it's your reposibility to ensure dafety, not Rust's standard library.

But if function is not marked as unsafe then it should be, well… safe. It should be possible to call it at any point and be sure that program wouldn't corropt it's memory and crash.

That's how Rust work.

That's possible (see above), but highly unlikely to be feasible.

Look, sir Black Knight, no one may answer question if the other side doesn't want to accept any answers. And at some point the only decision would be to ignore your screams about being undefeated and victorious even without any limbs and move on.

And I think we are fast approaching that point.

I think the point that if someone looks on all the references and links and ignores everything and anything that may be found there, unless it supports their POV then s/he may remain in the “unconvinced” state forever, like Black Knight may remain in the “undefeated” state forever even after losing all limbs by simply refusing to admit defeat.

6 Likes

C code can call back into Rust code, and then this Rust code can cause another thread to call back into C and wait for this other thread to finish, which would deadlock if there is a big C lock. With the big kernel lock in the linux kernel this was not an issue as the lock would always be unlocked before returning to userspace. Unlike the Windows NT kernel, Unix doesn't do reentrant userspace entry. Even signals will return all the way back to the syscall entry point (which would unlock the big kernel lock) before calling the signal handler, hence why EINTR exists to exist that a syscall was aborted to run a signal handler.

2 Likes

It's possible to make that illegal and make an attempt to create a thread while holding C lock panic.

Possible, but as I have said, not really feasible.

Note that all that talk about backward compatibility is a red herring: even the very first blog post had this caveat: We reserve the right to fix compiler bugs, patch safety holes, and change type inference in ways that may occasionally require new type annotations.

This being said obviously marking set_var as unsafe and asking people to use it responsible is better than introducing insane complexity to support something that should have been forbidden from day one.

Environment variables are a means to pass information between processes, the fact that set_var even exists is a concession of sorts, making it “safe” is just to hard… and not needed: Rust code shouldn't use environment variables except when it communicates with other processes or with C code and while first part is obviously simple and same the second part is, just as obviously, dangerous and unsafe, the decision to mark set_var as unsafe to make it explicit that it should only be used rarely when one needs to interact with C code should be no-brainder and certainly doesn't deserve all that attention it got.

1 Like

That won't even work. The FFI function can spawn a thread, and return immediately, causing the lock to release. And if Rust was also supposed to track everything the C code does indirectly, including all other threads, then it's much easier to just monkeypatch setenv itself.

1 Like

And even that won't work, because extern char ** environ is a thing that exists in POSIX (as others have mentioned before).

I don't get what the big fyzz about this being made unsafe is. If you know it is safe in your use case, just add the unsafe block. If you have many instances you could even use something like ast-grep to automate adding the unsafe.

2 Likes

why doesn't to_socket_addrs hold the environment lock then?

Basically every C/C++ call can read from the environment. Heck you don't even need to explicitly call into C/C++ for an environment read to happen. You may have a C++ library with a static constructor which spawns a new thread reading from the environment.

4 Likes

Because the use of the environment is a detail of the specific C library implementation that underpins to_socket_addrs. By the POSIX definition of the program environment, any code running in your program can require you to "hold the environment lock", but POSIX provides no such lock. You could have a "not in Rust" lock, but then you hit a lot of pain taking and releasing the lock all the time.

This is what makes the whole problem so gnarly; while Rust could have an "FFI lock" for Rust code, the definition of the environment makes setenv dangerous unless you've audited all of the non-Rust code in your process, since that code may be assuming that you don't call its initialization function (which, thanks to dynamically deferred initialization, can run before main) until after all calls to setenv have been done. No amount of Rust-level locking helps at this point, since you need non-Rust code to take and release the lock around getenv calls, but that non-Rust code doesn't even know that a lock exists.

1 Like