Preventing Rust from calling setenv/getenv from a library crate

We maintain hexchat-unsafe-plugin. We'd like libstd to bind to our own environ instead of using libc's. This is for soundness purposes. That's because global state is inherently unsound, so much so that there's work towards deprecating static mut, and yet libstd doesn't seem to care for that when it comes to environ.

How do we do this?

I don't believe that there's any way to do so.

The libs team is aware of the set_var soundness issue and is actively figuring out how to deprecate it, FYI.

var is also unsound. we just want to isolate the libstd concept of "vars" from the libc concept of "environ", so they don't interfere with each other. (tho ideally we'd still like to preload it from libc's environ, and Command should follow the rust/libstd env.)
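
A minimal sketch of that kind of isolation, done at the crate level rather than inside libstd. The PrivateEnv type and its methods are hypothetical names made up for illustration; they are not part of std or hexchat-unsafe-plugin. It assumes the preload happens while nothing else is calling setenv, since the initial snapshot still reads libc's environ.

```rust
use std::collections::HashMap;
use std::ffi::{OsStr, OsString};
use std::process::Command;
use std::sync::Mutex;

/// Hypothetical plugin-private environment (illustrative name only).
pub struct PrivateEnv {
    vars: Mutex<HashMap<OsString, OsString>>,
}

impl PrivateEnv {
    /// Snapshot the process environment once. This read still goes through
    /// libc, so it is assumed to run before any thread may call setenv.
    pub fn preload() -> Self {
        Self {
            vars: Mutex::new(std::env::vars_os().collect()),
        }
    }

    /// Look a variable up in the private copy only.
    pub fn get(&self, key: impl AsRef<OsStr>) -> Option<OsString> {
        self.vars.lock().unwrap().get(key.as_ref()).cloned()
    }

    /// Set a variable in the private copy; libc's environ is not touched.
    pub fn set(&self, key: impl Into<OsString>, value: impl Into<OsString>) {
        self.vars.lock().unwrap().insert(key.into(), value.into());
    }

    /// Build a Command whose child would see only the private environment.
    pub fn command(&self, program: impl AsRef<OsStr>) -> Command {
        let mut cmd = Command::new(program);
        cmd.env_clear().envs(self.vars.lock().unwrap().iter());
        cmd
    }
}
```

Anything set through PrivateEnv::set stays invisible to std::env::var and to C code reading environ, which is both the point of the isolation and the source of the consistency problems raised later in this thread.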

Ultimately, the unsoundness is the fault of the unsafe code that calls into the libc code. It is usually not possible to prevent incorrect unsafe code from doing bad things.

var is not unsound. The underlying libc call is explicitly thread-safe.

environ is not. how can getenv be thread-safe if environ is not? (also doesn't setenv invalidate getenv? how are we defining "thread-safe" here?)

Reading operations are considered MT-safe but writing operations are not (by POSIX), in brief.

There's a lot more conversation in #27970 and this IRLO thread.

This is not exactly accurate; POSIX explicitly allows getenv() not to be thread-safe:

The getenv() function need not be thread-safe.

The only guarantees we get are from glibc and other implementations on our supported targets.

POSIX gives you direct access to environ. How can anything be thread-safe in such configuration?

obviously none of this would be a problem if setenv leaked the environ, but sadly that's not what existing implementations do.

(we guess we should just carry on with making hexchat-plugin wasm-based, built upon hexchat-unsafe-plugin...)

That is answered on the same page:

Conforming multi-threaded applications shall not use the environ variable to access or modify any environment variable while any other thread is concurrently modifying any environment variable. A call to any function dependent on any environment variable shall be considered a use of the environ variable to access that environment variable.

In other words, an implementation is required to support multiple threads concurrently reading from environ, but it is not required to support multiple threads concurrently calling getenv(), nor reading from environ or calling getenv() while environ is being written to.
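
A rough sketch of that rule expressed as a reader-writer lock on the Rust side. ENV_LOCK, read_var and write_var are hypothetical names for illustration. The lock only protects callers that go through these wrappers; it does nothing for C code or other crates that touch environ, getenv or setenv directly.

```rust
use std::ffi::{OsStr, OsString};
use std::sync::RwLock;

// Hypothetical process-wide lock; only code that opts in is protected.
static ENV_LOCK: RwLock<()> = RwLock::new(());

/// Readers share the lock, so cooperating reads may run concurrently.
pub fn read_var(key: impl AsRef<OsStr>) -> Option<OsString> {
    let _guard = ENV_LOCK.read().unwrap();
    std::env::var_os(key)
}

/// A writer excludes every cooperating reader and writer.
#[allow(unused_unsafe)] // set_var is a safe fn before the 2024 edition
pub fn write_var(key: impl AsRef<OsStr>, value: impl AsRef<OsStr>) {
    let _guard = ENV_LOCK.write().unwrap();
    // ASSUMPTION: no code outside these wrappers accesses the environment
    // while this runs. The lock cannot enforce that.
    unsafe { std::env::set_var(key, value) }
}
```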

The problem is that once we have gotten ahold of environ, we have entered the twisted maze of defined and undefined behaviours. Some things are permitted, some things are not, but there is absolutely no way to make the whole thing sound.

The fact that environ is a plain pointer variable, and very much part of the API, makes it impossible.

(honestly tho so uh why can't we just do linker overrides on it? .-.)

If you mean we should use linker magic to replace the environ static variable with our own, then we still haven't solved the problem that there's an unsynchronized global variable anyone can use.

If you overrode setenv() and getenv() to use a function that accesses environ via a lock, then what's stopping code that's already been compiled from using environ directly?

If your overridden setenv() and getenv() used a synchronised copy of the original environ variable, you now have a situation where one piece of code could be writing to the synchronised copy and a different piece of code (maybe an installed C library) could be reading from the original environ variable. If your program relies on these two seeing a consistent set of environment variables, you're going to run into weird bugs.

Either way, there's no simple trick we can use to fix things - otherwise it would have been fixed years ago.

I don't see there being much point in worrying, though. The lack of thread safety as guaranteed by POSIX seems to be inconsequential in practice, and mostly a concern for people working in adversarial environments or academics.

what prevents a crate like hexchat-unsafe-plugin from preventing a crate like std from linking to the libc environ/setenv/etc and instead providing our own, with rust-level linker magic? (link_name attribute maybe?)

There is nothing preventing you from doing it.

My comment was pointing out that providing your own versions wouldn't solve your underlying problem. environ is still going to be an unsynchronized global variable that can be mutated directly by anyone wanting to do so.

at least it wouldn't interact with the hexchat environ, or that of the other plugins.

I think C stdlib symbols are mostly weak, so your program’s getenv calls glibc getenv on gnu systems if it didn’t find a redefined symbol. You could define your own getenv and it will override the libc symbol (similar to how you can redefine malloc):

use std::os::raw::c_char;

#[no_mangle]
extern "C" fn getenv(_: *const c_char) -> *mut c_char {
    // or override it however you wish
    std::ptr::null_mut()
}

fn main() {
    // this will fail even though we didn't call getenv directly
    println!("{}", std::env::var("HOME").unwrap());
}

Gives:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: NotPresent', main.rs:9:42
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Even though HOME is set on my system.
