Unsafe std::env::set_var change

What do you think about the coming change of std::env::set_var() becoming unsafe?

I'm using this in tests to set the log levels to a predefined value.
I'm not sure what the reasoning behind this change was. By the same logic you would also have to mark any file read as unsafe, because some other thread or process could change the file content while you are reading or writing the file.

1 Like

No, files are synchronized at the OS level.

2 Likes

Only after the sync-call.

They can change it concurrently, but not racily. In other words, each read you do will be from either the old or the new version - they won't be interleaved and won't let you touch the non-existing values. That's the crucial difference.

6 Likes

No. That's not what "synchronized" means.

Oh, come on, there are network file systems. Try it out: mount one in several places and append to the same files from multiple threads without file locking.
This is why file locking exists.

Which networking filesystem would randomly crash your program if you access files “incorrectly”? I guess in an era before NetWare, when “file sharing” was, essentially, raw access to the remote volume (akin to today's NBD with sharing between different devices), that was possible, but I don't think any of these would ever be used in today's world. Certainly Rust wouldn't be able to provide any safety guarantees in such a user-hostile environment.

File locking exists to provide correct results, not to prevent programs from malfunctioning. In today's world a filesystem will return something for each call to read without crashing.

That may not be the information you seek, but it would be there.

Compare that to the set_var situation, where each and every access is a sure-fire way to crash your program unless you invent some external protocol to make it safe and crash-proof.

That's the textbook definition of what unsafe means in Rust, isn't it?

It is in POSIX terms MT-unsafe, so you can cause a segfault or other undefined behavior calling it in a multi-threaded context.[1] You're not supposed to be able to cause UB in Rust without using unsafe, so this is a soundness fix.

The core idea[2] is that the environment should be basically read-only; there may be a period where you know you're the only thread and set up the environment for everything else. During such a period it's up to you to ensure there's nothing else around that might call getenv or similar (which is effectively allowed to happen at any time, and does, because e.g. random libraries read the environment unannounced to look for FROBLIB_BEHAVIOR_HACK=17).[3]
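A minimal sketch of that "set up first, then treat it as read-only" pattern. The variable name `DEMO_LOG_LEVEL` and both function names are made up for illustration, and the safety argument rests on the assumption that `init_environment` really runs before any other thread exists:

```rust
use std::env;
use std::thread;

/// Sets up the environment during the single-threaded startup window.
/// `DEMO_LOG_LEVEL` is a hypothetical variable name used for illustration.
fn init_environment() {
    // SAFETY (assumption): this is called at the very start of the program,
    // before any other thread has been spawned, so nothing can read or write
    // the environment concurrently.
    unsafe {
        env::set_var("DEMO_LOG_LEVEL", "debug");
    }
}

/// Spawns `n` workers that only read the environment and collects what each
/// of them saw.
fn read_from_workers(n: usize) -> Vec<Option<String>> {
    let handles: Vec<_> = (0..n)
        .map(|_| thread::spawn(|| env::var("DEMO_LOG_LEVEL").ok()))
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Calling `init_environment()` as the first statement of `main` and only then calling `read_from_workers` keeps every write inside the single-threaded window. It does not cover the test case from the original question, because the default test harness already runs tests on several threads.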

One may ask, are there other possible fixes besides making the function unsafe? And indeed, that question was asked and various alternatives explored.

  • Add environment locking in std?
    • Already present, actually, but there's no way to force others to use it; particularly C libraries and even other system calls (localtime, getaddrinfo, ...)
  • Clone the environment in every program as part of the Rust runtime (Rust shadow environment)?
    • Breaks backwards compatibility as now there are two environments; similar problems in that you can't force everyone to use the Rust environment; opposition to the existence of a shadow environment at all; adds to startup cost even if you don't need it
  • Detect threads and panic or something?
    • No guaranteed way to do this, false-positive panics are undesirable and may make set_var useless for some use cases, also makes set_var useless if you know you have an inert thread, arguments that you need "was never another thread", ...
  • Change POSIX (and wait for "everything" to catch up)
    • Good luck and we're talking decades even if it would fly
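To make the first bullet concrete, here is a rough sketch of what an application-level lock around the environment could look like. `ENV_LOCK`, `with_env_var` and `DEMO_SCOPED_VAR` are hypothetical names, and the whole thing is only a convention: it protects against other callers of `with_env_var`, not against a C library calling `getenv` on another thread.

```rust
use std::env;
use std::sync::Mutex;

/// Hypothetical process-wide lock that cooperating code agrees to take
/// before touching the environment.
static ENV_LOCK: Mutex<()> = Mutex::new(());

/// Sets `key` to `value`, runs `f`, then restores the previous state.
/// If `f` panics the variable is not restored; a drop guard would fix that.
fn with_env_var<T>(key: &str, value: &str, f: impl FnOnce() -> T) -> T {
    // Recover the guard even if an earlier holder panicked.
    let _guard = ENV_LOCK.lock().unwrap_or_else(|e| e.into_inner());
    let previous = env::var_os(key);

    // SAFETY (assumption): every other environment access in this process
    // goes through ENV_LOCK. Nothing enforces that, which is exactly the
    // weakness described in the bullet above.
    unsafe { env::set_var(key, value) };
    let result = f();
    match previous {
        // SAFETY: same assumption as above.
        Some(old) => unsafe { env::set_var(key, old) },
        None => unsafe { env::remove_var(key) },
    }
    result
}
```

A test could then write `with_env_var("DEMO_SCOPED_VAR", "trace", || run_the_test())`, where `run_the_test` stands in for the test body, with the caveat that the `unsafe` blocks are justified only as long as nothing outside the lock reads the environment at the same time.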

Here's a good starting point if you want to read more.

TL;DR it's not a frivolous or cavalier change, and has been hashed out over the course of years.


  1. File ops in contrast can't cause this. ↩︎

  2. outside of Rust's control ↩︎

  3. Or... RUST_BACKTRACE ↩︎

17 Likes

The environment is part of the process and is inherited by children; every child has its own copy of the environment. A child cannot change the environment of its parent. So where exactly is the problem with Rust taking care of its own process environment?

Threads share the environment.
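The copy-per-child behaviour does have a safe use, though: when only a spawned program needs the variable, `std::process::Command::env` sets it for the child without touching the current process. A small sketch, where the function names and `DEMO_CHILD_LOG_LEVEL` are placeholders:

```rust
use std::ffi::OsStr;
use std::process::Command;

/// Builds a command whose child will see `DEMO_CHILD_LOG_LEVEL=<level>`.
/// The parent's own environment is not modified.
fn command_with_log_level(program: &str, level: &str) -> Command {
    let mut cmd = Command::new(program);
    cmd.env("DEMO_CHILD_LOG_LEVEL", level);
    cmd
}

/// Looks up a variable that was explicitly configured on a command that has
/// not been spawned yet.
fn configured_value(cmd: &Command, key: &str) -> Option<String> {
    cmd.get_envs()
        .find(|(k, _)| *k == OsStr::new(key))
        .and_then(|(_, v)| v)
        .map(|v| v.to_string_lossy().into_owned())
}
```

This only helps when the code under test runs in a separate process; it does nothing for code that reads the variable inside the current one.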

3 Likes

You missed one word, and it turns everything on its ear. It's this one small, simple, innocuous line in the setenv description:

The setenv() function need not be thread-safe.

The problem is that Rust doesn't take care of something that's out of Rust control.

The semantics of setenv as defined by the POSIX standard are really nasty: if you call it in a multi-threaded environment then you are, essentially, playing with fire: many implementations of libc don't bother to do any locking when they modify the environment.

Worse, they couldn't do any locking, because raw access to internal data structures (not protected by any locks) is also part of the POSIX standard.

It's worse than that: not only do they share the environment, but the POSIX standard, essentially, mandates an extra-unsafe implementation! One which is wide open to any abuse as long as the environ variable is part of the API (and yes, there are lots of programs that access it directly).

And even that is not the actual disaster. The actual disaster, as that discussion very clearly shows, is that application developers have no idea that the environment-accessing API is this unsafe and unstable!

Why should they? Who reads the documentation, anyway?

To make this explicit: even the people who pushed for set_var to be unsafe wish that it were safe. But since it fundamentally isn't, even when used entirely correctly and defensively, marking it unsafe reflects the reality of the situation: modifying the process environment is not thread-safe.

10 Likes

Yes, but only threads spawned by this process, and this could be handled by Rust std.
You also provide a mutex lock for stdout, even though the e/print(ln)! macros can still panic when the process changes the fd of stdout/stderr (e.g. during piping) and the macro uses the stored one, which has been closed in the meantime.

Sorry, I'm not interested in some Socratic rebuttal thread. The alternatives were explored in the links I provided; you can find discussion of the shortfalls of any such approach there.[1]


  1. And any reply of mine that wasn't off the top of my head would be sourced from those discussions anyway. ↩︎

5 Likes

Who the hell should be able to modify the process environment of the Rust program besides the Rust program itself?

The code which the Rust program calls. The OS functionality which you call could look at environment variables, and in fact there are notable cases where it does, already mentioned in this thread, e.g. localtime, getaddrinfo.

It's not just about writes racing with writes, it's writes racing with reads. If you call setenv in one thread and make any OS call in another (which may call getenv transitively), that's a potential data race and UB.

Not "unpredictable behavior" like with concurrent modification of files. UB as in use-after-free or other arbitrarily bad time traveling nonsensical executions of the code.

It's unsafe because the OS says it is. That's not a particularly satisfying answer, but it's the real one.

Rust std does mitigate this with a lock in the Rust library. But Rust isn't the only actor in a program, and it doesn't pretend it is either.

15 Likes

Please check the previous discussions. These functions and possible solutions have been thoroughly debated.

10 Likes

Then the other libs are unsafe, but not the secured stdlib.
Everywhere, glibc setenv, libc::setenv and std::env::set_var are mixed together as if they were all the same.
The OS doesn't change the environment of your program.

You are missing the point.

Soundness doesn't mean "I don't do bad things myself". It means "even if everyone else does bad things, I remain memory-safe".

Thus, when there is an external shared mutable resource that Rust can't protect, access to that resource must be marked as unsafe, because otherwise Rust code could exhibit UB within "safe" code, which would be unsound.
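As a sketch of what "marked as unsafe" means for your own helpers: a wrapper around set_var cannot honestly be a safe function, so it forwards the obligation to its caller through a `# Safety` section. The names `set_log_level`, `current_log_level` and `DEMO_SOUNDNESS_LOG_LEVEL` are invented for this example.

```rust
use std::env;

/// Sets the hypothetical `DEMO_SOUNDNESS_LOG_LEVEL` variable.
///
/// # Safety
///
/// The caller must guarantee that no other thread reads or writes the
/// process environment while this call runs. That includes non-Rust code,
/// such as C libraries that call `getenv`.
unsafe fn set_log_level(level: &str) {
    // SAFETY: the requirement is passed on to our caller, see above.
    unsafe { env::set_var("DEMO_SOUNDNESS_LOG_LEVEL", level) }
}

/// Reading through std is a safe call.
fn current_log_level() -> Option<String> {
    env::var("DEMO_SOUNDNESS_LOG_LEVEL").ok()
}
```

A caller then has to write `unsafe { set_log_level("warn") }` and, ideally, a comment explaining why no other thread can be touching the environment at that point.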

3 Likes

Sure, that was another option, but people still want to use those libraries safely, how would that possibly work?

And neither do those libraries, but they do read it (and that's very reasonable!). But if bad Rust code changes it under their feet then that's a problem.