Feasibility of single-threaded Unix shell in Rust with signal-hook or similar library?

After abandoning my first attempt at learning Rust, I'm now ready to give Rust 2018 a new try. With Rust being advertised as a system-level language, I would like to investigate whether it's possible to implement a traditional, single-threaded Unix/POSIX shell in Rust at this point.

In C, such shells are implemented using, among other things, a SIGCHLD handler that reaps exiting background processes while the shell sits at the prompt and waits for user input.

Is my understanding that it's generally impossible to write general signal handlers in Rust correct? I took a look at the signal-hook crate which appears to be painstakingly coded to avoid any interaction with the Rust runtime (such as alloc) in its signal handlers. Instead, these signal handlers either set atomics, or perhaps do a single call to an async-signal-safe function such as send(2) that communicates via an IPC mechanism with some receiving threads in the same process.
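To make the "set an atomic" pattern concrete, here is a minimal sketch of what such a handler body could look like. The names CHILD_EXITED, on_sigchld and take_child_exited are made up for illustration and are not signal-hook's API. Registering the handler (for example via sigaction(2)) is left out on purpose, so the sketch only shows the handler side and the main-loop side.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical flag name: set by the handler, cleared by the main loop.
static CHILD_EXITED: AtomicBool = AtomicBool::new(false);

// Handler body: a single atomic store, with no allocation, locking, or I/O.
// Registration (for example with sigaction(2)) is omitted from this sketch.
pub extern "C" fn on_sigchld(_signum: i32) {
    CHILD_EXITED.store(true, Ordering::SeqCst);
}

// Called from the main loop: reports whether a notification is pending and
// resets the flag in the same atomic step.
pub fn take_child_exited() -> bool {
    CHILD_EXITED.swap(false, Ordering::SeqCst)
}
```

The main loop would call take_child_exited() after each wakeup and do the actual reaping there, outside the handler. A flag alone does not wake a blocked read, which is the problem raised in the next paragraph.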

Unless either multiple threads or a multiplexing facility such as select(2) or equivalent are used, there appears to be no way to react to signals while the main thread is blocked waiting for user input. Is this assessment correct?

If so, what is the idiomatic way in Rust to perform necessary signal handling, ideally in the context of a single-threaded application?

It's definitely not impossible to write general signal handlers in Rust—if nothing else, you can always just translate whatever C code you would otherwise write into libc calls. What might be difficult or impossible is writing general signal handlers in safe Rust.

The only libc call would be a wait4 or waitpid call. The issue is that the signal handler code would then need to interact with data structures (such as lists or hash tables) also accessed by the main thread, which ideally are implemented in safe Rust.

When the main thread accesses these data structures, the signal is ordinarily blocked, which leads me to ask about the feasibility of an alternative idea. Suppose a signal is blocked (masked) most of the time, except during certain system calls (say, reading from the console) at well-defined points in the code (say, an outer loop). Would it then be safe to use safe Rust from inside the signal handler?

In other words:

block_signal();
loop {
   // ...
   unblock_signal();
   libc::read(0, ....); // Rust runtime may be reentered here via the signal handler path
   block_signal();
}

would there be any remaining risk in a single-threaded application?

That's fine.

The embedded world has to deal with this stuff all the time whenever interrupts (the hardware version of signals) are fired. The way to do this is by using safe synchronisation tools (atomics, channels, lock-free data structures, etc.) to communicate between the signal handler and the rest of the program. Sure, these are implemented using unsafe internally because they are the ones ensuring thread safety, but that's fine as long as the implementation is correct.

In general, if a type is Sync and is re-entrant then you should be fine. Note that mutexes typically aren't re-entrant because you'll encounter a deadlock if the thread executing the signal handler is already holding the lock.
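The mutex problem can be simulated without any real signal. In the sketch below (hypothetical function names), one function takes the lock and then calls a stand-in for the handler on the same thread. The stand-in uses try_lock so that the demonstration returns instead of hanging. A handler that called lock() at that point would never return.

```rust
use std::sync::{Mutex, TryLockError};

// Stand-in for what a handler would attempt. `try_lock` is used so the
// demonstration reports the conflict instead of blocking forever.
pub fn handler_can_lock(jobs: &Mutex<Vec<i32>>) -> bool {
    match jobs.try_lock() {
        Ok(_guard) => true,
        Err(TryLockError::WouldBlock) => false,
        Err(TryLockError::Poisoned(_)) => false,
    }
}

// Simulates the main flow being "interrupted" while it holds the lock:
// the handler stand-in runs on the same thread before the guard is dropped.
pub fn interrupted_while_locked(jobs: &Mutex<Vec<i32>>) -> bool {
    let _guard = jobs.lock().unwrap();
    handler_can_lock(jobs)
}
```

Here handler_can_lock succeeds when nothing holds the lock, while interrupted_while_locked reports false because std's Mutex is not re-entrant.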

How do I tell, in general? For instance, std::collections::HashMap implements the Sync trait (which I assume means it's thread-safe), but how would I tell whether it's reentrant? Is there a Rust trait or similar to describe this property or would I have to read its implementation/documentation? (Ctrl-F'ing the documentation page of HashMap for either mutex or lock yields no results.) The reference to the Abseil implementation discusses neither thread safety nor reentrancy.

It seems that implementing a data type in safe Rust that's reentrant could be very difficult to do due to memory management - can you point at some example types that may exist in the standard library or elsewhere?

The Sync trait means that it is safe to immutably access it from multiple threads at the same time.


In the case of HashMap you must follow the normal aliasing rules, that is, at any one time you may have several immutable references to it, or you may have exactly one mutable reference, but not both. As long as your code ensures that these are satisfied, it will be safe.

Reentrancy really only matters for things that allow mutation through immutable references such as mutexes. For things that don't do this, the "max 1 mutable reference" rule is enough even in the face of signal handlers (this is not taking memory allocators into account).

This is a good point, let's pursue this line further.

For the shell use case, let's assume the hash table contains information about which jobs and processes the shell currently knows about. Entries to this hash table would be added only in the main thread. The signal handler would need the ability to find entries and to update certain primitive fields in the state object, which is stored as the value in the hash table.

If my understanding of Rust is correct, I would need to accomplish this using only immutable accesses. Let's see if this is possible. The get() and get_key_value() methods take &self but they return an immutable reference to the value. The get_mut() method returns a mutable reference to the value, but requires a &mut self reference which I cannot obtain from the signal handler. Am I out of luck or can I add a layer of indirection, perhaps by obtaining a reference to an atomically modifiable object of some kind (I have not yet read up on Rust's features in this regard as I was shooting for a single-threaded implementation at first.)
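Purely at the type level, the indirection I have in mind could look like the sketch below: the field to be updated is an atomic, so a shared reference obtained through get() is enough to change it. ProcState, ProcTable, add_process and record_exit are hypothetical names. The sketch says nothing about whether calling record_exit from a real signal handler is sound, since that depends on what the main flow is doing with the map at that moment.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicI32, Ordering};

// Hypothetical per-process record; the field the handler updates is atomic.
pub struct ProcState {
    pub job_id: u32,
    pub exit_status: AtomicI32, // -1 means "still running" in this sketch
}

pub type ProcTable = HashMap<i32, ProcState>;

// Needs &mut: only the main flow adds entries.
pub fn add_process(table: &mut ProcTable, pid: i32, job_id: u32) {
    table.insert(pid, ProcState { job_id, exit_status: AtomicI32::new(-1) });
}

// Needs only &: get() yields &ProcState, and the store goes through the atomic.
pub fn record_exit(table: &ProcTable, pid: i32, status: i32) -> bool {
    match table.get(&pid) {
        Some(state) => {
            state.exit_status.store(status, Ordering::SeqCst);
            true
        }
        None => false,
    }
}
```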

I think I am out of luck, actually, upon further reflection on the caveat that this does not take memory allocators into account:

Memory allocators would likely be involved. Even if I were somehow able to call get() instead of get_mut() on the HashMap, a concurrent insert() call in the main thread could cause a rehashing which at least traditionally is implemented by reallocation.

I don't think it would be possible to modify the hash map directly from the signal handler if it is as you describe.

The typical solution is to not do the mutation inside the signal handler. Instead, you'll send a message to the main thread that something needs updating (e.g. with a lock-free queue) then return from the handler.

That means there is no concurrent mutation of the hashmap (the signal handler doesn't even have a reference to it) and if the queue is pre-allocated you won't need to touch the allocator.
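As a rough illustration of such a pre-allocated queue, here is a minimal single-producer, single-consumer ring buffer built only from atomics in safe Rust. PidQueue and CAP are names invented for this sketch, and it assumes exactly one producer (the handler, which is not re-entered while it runs) and one consumer (the main loop). A maintained lock-free queue crate would be the more realistic choice in practice.

```rust
use std::sync::atomic::{AtomicI32, AtomicUsize, Ordering};

// Fixed capacity chosen for the sketch; a power of two keeps the index
// arithmetic consistent even if the counters wrap around.
pub const CAP: usize = 64;

pub struct PidQueue {
    slots: [AtomicI32; CAP],
    head: AtomicUsize, // count of items popped (consumer side)
    tail: AtomicUsize, // count of items pushed (producer side)
}

impl PidQueue {
    // `const fn` so the queue can live in a `static` with no allocation.
    pub const fn new() -> Self {
        const EMPTY: AtomicI32 = AtomicI32::new(0);
        PidQueue {
            slots: [EMPTY; CAP],
            head: AtomicUsize::new(0),
            tail: AtomicUsize::new(0),
        }
    }

    // Producer side (the handler): returns false if the queue is full.
    pub fn push(&self, pid: i32) -> bool {
        let tail = self.tail.load(Ordering::Relaxed);
        let head = self.head.load(Ordering::Acquire);
        if tail.wrapping_sub(head) == CAP {
            return false;
        }
        self.slots[tail % CAP].store(pid, Ordering::Relaxed);
        self.tail.store(tail.wrapping_add(1), Ordering::Release);
        true
    }

    // Consumer side (the main loop): returns None if the queue is empty.
    pub fn pop(&self) -> Option<i32> {
        let head = self.head.load(Ordering::Relaxed);
        let tail = self.tail.load(Ordering::Acquire);
        if head == tail {
            return None;
        }
        let pid = self.slots[head % CAP].load(Ordering::Relaxed);
        self.head.store(head.wrapping_add(1), Ordering::Release);
        Some(pid)
    }
}
```

The handler would call push on a static PidQueue, and the main loop would drain it with pop and only then update the hash map. A full queue makes push return false, so the caller needs some fallback, such as a separate "overflow" flag that triggers a full rescan.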

This is also good in your shell case because what you are allowed to do inside signal handlers is quite limited.

Thanks. Note that I intentionally included the word "single-threaded" in the topic line of this thread.

My concern was that in a general multi-threaded design, it would be perhaps equally difficult or impossible to implement the fork() to exec() path in safe Rust, because this path would then operate under the same restrictions as a signal handler: an async-signal-unsafe operation could be in progress in a different thread, and thus suspended, at the time of the fork(). (Before someone suggests posix_spawn, let me point out that it doesn't work for job control shells.)

Perhaps additional effort would be needed to ensure that no other threads are active when the fork() occurs; I haven't fully thought it through. This would essentially require stopping the thread that handles the signal-derived notifications before each fork() and restarting it afterwards, which at first glance doesn't sound great, but perhaps it's the solution with the least overall complexity.