Unsafe std::env::set_var change

arlecchino · June 9, 2024, 3:41pm

What do you think about the coming change of std::env::set_var() becoming unsafe?

I'm using this in tests to set the log levels to a predefined value.
I'm not sure what the reasoning behind this change was. By that logic you would also have to mark every file read as unsafe, because another thread or process could change the file's contents while you are reading or writing it.

paramagnetic · June 9, 2024, 3:49pm

No, files are synchronized at the OS level.

arlecchino · June 9, 2024, 3:52pm

Only after the sync-call.

Cerber-Ursi · June 9, 2024, 4:38pm

They can change it concurrently, but not racily. In other words, each read you do will be from either the old or the new version - they won't be interleaved and won't let you touch the non-existing values. That's the crucial difference.

paramagnetic · June 9, 2024, 5:04pm

No. That's not what "synchronized" means.

arlecchino · June 9, 2024, 7:58pm

Oh, come on, there are network file systems. Try it out with multiple mounts and appending to the same files from multiple threads without file locking.
This is why file locking exists.

khimru · June 9, 2024, 8:19pm

Which networking filesystem would randomly crash your program if you access files “incorrectly”? I guess in an era from before Netware, where “file sharing” was, essentially, raw access to the remote volume (akin to today's NBD with sharing between different devices), that was possible, but I don't think any of these would ever be used in today's world. Certainly Rust wouldn't be able to provide any safety guarantees in such a user-hostile environment.

File locking exists to provide correct results, not to prevent programs from malfunctioning. In today's world a filesystem would give you something for each call to read without crashing.

That may not be the information you seek, but it would be there.

Compare that to the set_var situation, where each and every access is a sure-fire way to crash your program unless you invent some external protocol to make it safe and crash-proof.

That's the textbook definition of what unsafe means in Rust, isn't it?

quinedot · June 9, 2024, 8:36pm

It is in POSIX terms MT-unsafe, so you can cause a segfault or other undefined behavior calling it in a multi-threaded context.^[1] You're not supposed to be able to cause UB in Rust without using unsafe, so this is a soundness fix.

The core idea^[2] is that the environment should be basically read-only; there may be a period where you know you're the only thread and set up the environment for everything else. During such a period it's up to you to ensure there's nothing else around that might call getenv or similar (which is effectively allowed to happen at any time, and does, because e.g. random libraries read the environment unannounced to look for FROBLIB_BEHAVIOR_HACK=17).^[3]
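A minimal sketch of the "set up the environment while you're the only thread" pattern described above. The variable name is the made-up one from this post, and the function names `configure_environment` and `read_in_worker` are hypothetical. It assumes a toolchain where `set_var` may be called inside an `unsafe` block (required in the 2024 edition, still accepted in earlier ones).

```rust
use std::env;
use std::thread;

/// Writes the variables the rest of the program expects to find.
///
/// # Safety
/// The caller must guarantee that no other thread is running yet, so nothing
/// can read or write the environment while this function executes.
unsafe fn configure_environment() {
    // SAFETY: forwarded to the caller, see the contract above.
    unsafe { env::set_var("FROBLIB_BEHAVIOR_HACK", "17") };
}

/// Reads the variable from a freshly spawned thread. Reading is fine here
/// because nothing modifies the environment after start-up.
fn read_in_worker() -> Option<String> {
    let worker = thread::spawn(|| env::var("FROBLIB_BEHAVIOR_HACK").ok());
    worker.join().expect("worker thread panicked")
}

fn main() {
    // SAFETY: assumption of this sketch: this is the very first thing `main`
    // does, before any thread is spawned or any library is initialised.
    unsafe { configure_environment() };

    // From here on the environment is treated as read-only.
    println!("worker saw: {:?}", read_in_worker());
}
```

The `unsafe` block does not make the call any safer; it only records that whoever wrote `main` has checked the "I am the only thread" condition themselves.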

One may ask, are there other possible fixes besides making the function unsafe? And indeed, that question was asked and various alternatives explored.

- Add environment locking in std?
  - Already present, actually, but there's no way to force others to use it; particularly C libraries and even other system calls (localtime, getaddrinfo, ...)
- Clone the environment in every program as part of the Rust runtime (Rust shadow environment)?
  - Breaks backwards compatibility as now there are two environments; similar problems in that you can't force everyone to use the Rust environment; opposition to the existence of a shadow environment at all; adds to startup cost even if you don't need it
- Detect threads and panic or something?
  - No guaranteed way to do this, false-positive panics are undesirable and may make set_var useless for some use cases, also makes set_var useless if you know you have an inert thread, arguments that you need "was never another thread", ...
- Change POSIX (and wait for "everything" to catch up)?
  - Good luck and we're talking decades even if it would fly

Here's a good starting point if you want to read more.

TL;DR it's not a frivolous or cavalier change, and has been hashed out over the course of years.

[1] File ops in contrast can't cause this.
[2] outside of Rust's control
[3] Or... RUST_BACKTRACE

arlecchino · June 9, 2024, 9:11pm

The environment is part of the process and is inherited by children; every child has its own copy of the environment. A child cannot change the environment of its parent. So where exactly is the problem with Rust taking care of its own process environment?

quinedot · June 9, 2024, 9:50pm

Threads share the environment.
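To illustrate the process/thread distinction: a child process gets its own copy of the environment, so handing it a variable never touches the parent's shared state and needs no `unsafe`. A small sketch, assuming a hypothetical helper `command_with_log_level` and a placeholder program name `my-tool` that is never actually spawned here:

```rust
use std::process::Command;

/// Builds a command whose child process will see `RUST_LOG=<level>`.
fn command_with_log_level(program: &str, level: &str) -> Command {
    let mut cmd = Command::new(program);
    // Only the environment that will be handed to the child is affected.
    // The parent's environment, which all of its threads share, is not written.
    cmd.env("RUST_LOG", level);
    cmd
}

fn main() {
    // "my-tool" is a placeholder; the command is only inspected, not run.
    let cmd = command_with_log_level("my-tool", "debug");

    // `get_envs` lists the explicit overrides recorded on the command.
    for (key, value) in cmd.get_envs() {
        println!("{key:?} = {value:?}");
    }
}
```

This covers the "configure a subprocess" use case. It does not help when code inside the current process has to observe the new value.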

khimru · June 9, 2024, 9:53pm

You lost one word, which has turned everything on its ear. This one small, simple, innocuous line in the setenv description:

The setenv() function need not be thread-safe.

The problem is that Rust doesn't take care of something that's out of Rust's control.

The semantics of setenv as defined by the POSIX standard are really nasty: if you call it in a multi-threaded environment then you are, essentially, playing with fire: many implementations of libc don't bother to do any locking when they modify the environment.

Worse, they couldn't do any locking, because raw access to internal data structures (not protected by any locks) is also part of the POSIX standard.

It's worse than that: not only do they share the environment, but the POSIX standard, essentially, mandates an extra-unsafe implementation! Which is wide open to any abuse as long as the environ variable is part of the API (and yes, there are lots of programs that access it directly).

And even that is not the actual disaster. The actual disaster, as that discussion very clearly shows, is that application developers have no idea that the environment-accessing API is this unsafe and unstable!

Why should they? Who reads the documentation, anyway?

CAD97 · June 9, 2024, 10:29pm

To make this explicit: even the people who pushed for set_var to be unsafe wish that it were safe. But since it fundamentally isn't, even when used entirely correctly and defensively, marking it unsafe reflects the reality that modifying the process environment is not thread-safe.

arlecchino · June 9, 2024, 10:35pm

Yes, but only threads spawned by this process, and that could be handled by Rust std.
You also provide a mutex lock for stdout, even though e/print(ln)! can still panic when the process changes the fd of stdout/stderr, e.g. during piping, and the macro uses the stored fd that has been closed in the meantime.

quinedot · June 9, 2024, 10:38pm

Sorry, I'm not interested in some Socratic rebuttal thread. The alternatives were explored in the links I provided; you can find discussion of the shortfalls of any such approach there.^[1]

[1] And any reply of mine that wasn't off the top of my head would be sourced from those discussions anyway.

arlecchino · June 9, 2024, 10:46pm

Who the hell should be able to modify the process environment of the Rust program next to the Rust program itself?

CAD97 · June 9, 2024, 11:45pm

The code which the Rust program calls. The OS functionality which you call could look at environment variables, and in fact there are notable cases where it does, already mentioned in this thread, e.g. localtime, getaddrinfo.

It's not just about writes racing with writes, it's writes racing with reads. If you call setenv in one thread and any OS call in another (which may potentially call getenv transitively), that's a potential data race and UB.

Not "unpredictable behavior" like with concurrent modification of files. UB as in use-after-free or other arbitrarily bad time traveling nonsensical executions of the code.

It's unsafe because the OS says it is. That's not a particularly satisfying answer, but it's the real one.

Rust std does mitigate this with a lock in the Rust library. But Rust isn't the only actor in a program, and it doesn't pretend it is either.
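To make the limits of a Rust-side lock concrete, here is a rough sketch of the kind of helper people write for tests. `ENV_LOCK` and `with_env_var` are hypothetical names, not std APIs. The lock only serializes callers that go through this helper; a C library calling getenv from another thread never sees it, which is why the helper itself still has to be `unsafe`.

```rust
use std::env;
use std::ffi::OsString;
use std::sync::Mutex;

// Hypothetical process-wide lock. It coordinates only code that opts in.
static ENV_LOCK: Mutex<()> = Mutex::new(());

/// Runs `body` with `key` set to `value`, then restores the previous state.
/// (If `body` panics, the old value is not restored in this simple sketch.)
///
/// # Safety
/// The caller must ensure that no code outside this helper, including C
/// libraries running on other threads, reads or writes the environment
/// while `body` runs.
unsafe fn with_env_var<T>(key: &str, value: &str, body: impl FnOnce() -> T) -> T {
    // A poisoned lock only means another caller panicked; keep going.
    let _guard = ENV_LOCK.lock().unwrap_or_else(|poisoned| poisoned.into_inner());
    let previous: Option<OsString> = env::var_os(key);

    // SAFETY: forwarded to the caller, see the contract above.
    unsafe { env::set_var(key, value) };
    let result = body();

    // SAFETY: as above.
    unsafe {
        match previous {
            Some(old) => env::set_var(key, old),
            None => env::remove_var(key),
        }
    }
    result
}

fn main() {
    // SAFETY: this sketch is single-threaded, nothing else touches the environment.
    let level = unsafe { with_env_var("RUST_LOG", "debug", || env::var("RUST_LOG").ok()) };
    println!("inside the helper RUST_LOG was {level:?}");
}
```

Note that the default test harness runs tests on several threads, so even with such a helper the safety condition is something the test author has to argue for, not something the lock proves.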

kornel · June 10, 2024, 5:16am

Please check the previous discussions. These functions and possible solutions have been thoroughly debated.

github.com/rust-lang/rust

Consider deprecating and/or modifying behavior of std::env::set_var

opened 03:55PM - 26 Oct 21 UTC

closed 02:26PM - 30 May 24 UTC

leo60228

T-libs-api E-help-wanted C-bug A-process

Observing an environment variable concurrently with a call to `setenv` constitutes a data race. Rust currently handles this by using a mutex, but that only protects against functions in `std::env`. My interpretation of POSIX (which appears to be the same as the glibc developers') is that any libc function is allowed to call `getenv`. This has already caused problems with `getaddrinfo` in #27970. Additionally, a large amount of C code calls `getenv` but is otherwise thread-safe. While this isn't necessarily the standard library's issue, it's impossible for third-party libraries to soundly use this code without requiring the user to certify that this isn't an issue via an `unsafe` block. Some examples of this happening in practice are in https://github.com/time-rs/time/issues/293 and https://github.com/rust-lang/flate2-rs/issues/272.

https://github.com/rustsec/advisory-db/issues/926 had several proposals on how this could be handled brought up.

## Make `std::env::set_var` unsafe

This is arguably the cleanest solution to the issue. This could be considered a soundness fix, which would make it an option, but the ecosystem impact feels like it'd be too big.

## Don't actually call `setenv`

`std` could keep track of the environment itself, and make `set_var` changes only visible to `var` and subprocesses. This is probably the way to solve the issue with the least impact on existing code, but the behavior is somewhat unexpected and not zero-cost.

## Only call `setenv` in single-threaded code

This would reduce the impact further, but seems even less expected for negligible benefit.

## Deprecate `std::env::set_var`

This would make it clear that setting an environment variable in the current process is discouraged. It could also be combined with not actually calling `setenv`, which would be my preferred solution to this issue.
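For readers wondering what the "don't actually call `setenv`" proposal could look like in user code, here is a rough sketch. `EnvOverlay` and its methods are hypothetical and are not what std does. Changes are recorded in a Rust-side map, visible to lookups that go through the overlay and to child processes, but never written to the real process environment.

```rust
use std::collections::HashMap;
use std::env;
use std::ffi::{OsStr, OsString};
use std::process::Command;
use std::sync::Mutex;

/// Hypothetical overlay on top of the real environment.
/// A value of `None` marks a variable as removed.
#[derive(Default)]
struct EnvOverlay {
    changes: Mutex<HashMap<OsString, Option<OsString>>>,
}

impl EnvOverlay {
    fn set(&self, key: impl Into<OsString>, value: impl Into<OsString>) {
        self.changes.lock().unwrap().insert(key.into(), Some(value.into()));
    }

    fn remove(&self, key: impl Into<OsString>) {
        self.changes.lock().unwrap().insert(key.into(), None);
    }

    /// Looks in the overlay first, then falls back to the real environment.
    fn get(&self, key: &OsStr) -> Option<OsString> {
        match self.changes.lock().unwrap().get(key) {
            Some(overridden) => overridden.clone(),
            None => env::var_os(key),
        }
    }

    /// Applies the recorded changes to a child process only.
    fn apply_to(&self, cmd: &mut Command) {
        for (key, value) in self.changes.lock().unwrap().iter() {
            match value {
                Some(v) => {
                    cmd.env(key, v);
                }
                None => {
                    cmd.env_remove(key);
                }
            }
        }
    }
}

fn main() {
    let overlay = EnvOverlay::default();
    overlay.set("RUST_LOG", "debug");
    overlay.remove("FROBLIB_BEHAVIOR_HACK");
    println!("overlay lookup: {:?}", overlay.get(OsStr::new("RUST_LOG")));

    // "my-tool" is a placeholder program name; the command is never spawned.
    let mut cmd = Command::new("my-tool");
    overlay.apply_to(&mut cmd);
    println!("child overrides: {:?}", cmd.get_envs().collect::<Vec<_>>());
}
```

The sketch also shows the drawback raised in the thread: anything that calls `getenv` directly, such as a C library, never sees the overlay, so there are effectively two environments.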

arlecchino · June 11, 2024, 7:18am

Then the other libs are unsafe, but not the secured stdlib.
Everywhere, glibc setenv, libc::setenv and std::env::set_var are being mixed up as if they were all the same.
The OS doesn't change the environment of your program.

paramagnetic · June 11, 2024, 7:36am

You are missing the point.

Soundness doesn't mean "I don't do bad things myself". It means "even if everyone else does bad things, I remain memory-safe".

Thus, when there is an external shared mutable resource that Rust can't protect, access to that resource must be marked as unsafe, because otherwise Rust code could exhibit UB within "safe" code, which would be unsound.

SkiFire13 · June 11, 2024, 7:48am

Sure, that was another option, but people still want to use those libraries safely. How would that possibly work?

And neither do those libraries, but they do read it (and that's very reasonable!). But if bad Rust code changes it under their feet, then that's a problem.
