We sometimes encounter deadlocks at runtime (which are, to be clear, located and fixed) that I think could be caught by the compiler if it had some more information. My forum search didn't bring up any results, so maybe I'm asking the wrong question.
To visualize this I've made two diagrams. The deadlock behavior looks like this:
This, I think, could be caught by the compiler if the lock() call returned not only a MutexGuard but also some meta-information that is checked within the lock context, as in the following diagram.
The solution here would be to avoid the global LIST and pass the guard, or a &mut reference to what it guards, explicitly to the callback.
Though you generally can't use the iterator API while modifying the collection you're iterating over.
let mut list = list.lock(); // assuming `list` is a Vec or similar
for i in 0..list.len() {
    let item = list[i].clone(); // you may need to clone here so the borrow of `list` ends
    item.callback(&mut list);
}
Something to that effect, which effectively uses the borrowing rules to your advantage.
Thanks @erelde for your input, but I think handing over the guard is not an option. To show why, I've added another caller, detect_new_elem(), to the diagram.
The add_to_list(elem) function should be callable independently of the call chain. The user shouldn't have to care whether add_to_list(elem) sits at the end of a chain like the one in the diagram.
You could move your list implementation to an inner type which is owned by the mutex.
mod inner {
    pub struct List {
        // ...
    }

    impl List {
        // all your methods
    }
}

struct List(Mutex<inner::List>);

impl List {
    // forward all operations to the inner List after taking the lock
}
All operations on the outer List have to take the mutex, and the methods on the inner List never take any lock themselves. Though that will basically make all accesses to the inner list serial.
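A minimal sketch of that pattern, with a placeholder element type and illustrative method names (they're not from the original post):

use std::sync::Mutex;

mod inner {
    pub struct List {
        items: Vec<u32>, // placeholder element type
    }

    impl List {
        pub fn new() -> Self {
            List { items: Vec::new() }
        }

        // Lock-unaware: this method never touches a mutex itself.
        pub fn add(&mut self, elem: u32) {
            self.items.push(elem);
        }
    }
}

pub struct List(Mutex<inner::List>);

impl List {
    pub fn new() -> Self {
        List(Mutex::new(inner::List::new()))
    }

    // The outer method takes the lock, forwards, and drops the guard
    // before returning, so the lock is never held across a user call.
    pub fn add(&self, elem: u32) {
        self.0.lock().unwrap().add(elem);
    }
}

Because the guard never escapes the forwarding method, a caller can't accidentally hold the lock while invoking further operations.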
And I feel compelled to tell you again: get rid of the global state and pass your dependencies explicitly. Rust is designed to make using global state unergonomic; you'll be fighting an uphill battle.
You're right in the context of building an entire application, but in our context it's a bit more split up. One example where we use this logic is a timer implementation.
The user should just be able to do something like this without having to take care of a list handle from the implementation:
Very much simplified:
impl UserApp {
    fn init(&mut self) {
        self.timer_handle = register_timer(|| check_timeout());
    }

    fn check_timeout(&self) {
        if timeout_expired {
            // do something to handle the timeout
        } else {
            // Warning: `check_timeout` is called while the list lock is held,
            // and `reset` will lead to a deadlock because it tries to
            // manipulate the timer's timeout, which also needs the lock.
            self.timer_handle.reset();
        }
    }
}
In the simplest case, the call stack has the information needed at compile time. But the analysis would be slow: compilers only look at a single function at a time and don't track the implementations of the other functions it calls.
It is an easy bug to find with basic testing, though. Effort is best spent on the bugs developers can't catch with basic tests.
It isn't possible to detect all deadlocks. Even in your code, a small indirection would make the deadlock hard to detect without actually running the code.
E.g. elem: Box<dyn Callback>. (You would need something explicit that says everything that can be a dyn Callback must not take the lock.)
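For illustration, a sketch of why that indirection defeats the analysis, assuming std's Mutex and an illustrative Callback trait:

use std::sync::Mutex;

trait Callback: Send {
    fn run(&self);
}

static LIST: Mutex<Vec<Box<dyn Callback>>> = Mutex::new(Vec::new());

struct Reset; // hypothetical implementor the analysis cannot see through

impl Callback for Reset {
    fn run(&self) {
        // At the call site below, the compiler only sees `dyn Callback::run`;
        // it cannot know that this particular implementation re-takes LIST.
        LIST.lock().unwrap().clear(); // deadlocks (or panics) under the lock
    }
}

fn notify_all() {
    let list = LIST.lock().unwrap();
    for cb in list.iter() {
        cb.run(); // which `run` bodies lock LIST? Statically unknowable.
    }
}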
Note as a design principle that you should not call into user-supplied code while holding a lock, partly because of the risk of deadlocks, but also because it results in the lock hold time becoming unbounded (which means that you cannot provide useful guarantees of progress, since a callback can hold the lock forever).
There are a couple of routes that you can take to remove the need to hold a lock while calling user-supplied code (a sketch of the first follows the list):
Wrap up the callback in an Arc or similar reference counted structure, and use Arc::clone or similar under the lock as part of building a list of callbacks to call. Once you've got your list of callbacks, release the lock and consume the list, calling the callbacks one at a time.
Don't call user code at all; instead, trigger notifications of some form (send on a channel as the simplest option) to wake up the user task instead of running a callback, and rely on the user code handling that notification.
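A minimal sketch of the first route, with an illustrative Registry type (the names and the Callback trait are assumptions, not from the thread):

use std::sync::{Arc, Mutex};

trait Callback: Send + Sync {
    fn run(&self);
}

// Illustrative registry; the names are not from the original post.
struct Registry {
    callbacks: Mutex<Vec<Arc<dyn Callback>>>,
}

impl Registry {
    fn notify_all(&self) {
        // Clone the Arcs while the lock is held; the temporary guard is
        // dropped at the end of this statement, releasing the lock.
        let snapshot: Vec<Arc<dyn Callback>> = self.callbacks.lock().unwrap().clone();
        // Now call the callbacks with no lock held: a callback that
        // re-enters the registry blocks briefly instead of deadlocking.
        for cb in snapshot {
            cb.run();
        }
    }
}

The second route avoids calling user code entirely: under the lock you only send a message on, say, an std::sync::mpsc channel, and the user's own task runs its handler when the notification arrives.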
We already took care not to call callbacks while the lock is held. The question was more of the kind: would this be possible in Rust, maybe not now but with a future approach?
So I understand that this is a complicated thing to solve, as it would also involve knowledge of things like the current thread context. If the nested API call (like timer_handle.reset()) happens on the same thread, the compiler must reject it, whereas if the callback starts a new thread and calls timer_handle.reset() from there, it must not. That would require a lot more knowledge/context in the compiler.
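As a constructed illustration of that distinction (a hypothetical minimal setup, not the timer API from the post):

use std::sync::Mutex;
use std::thread;

static LOCK: Mutex<()> = Mutex::new(());

fn callback() {
    // Same thread: re-locking here would deadlock or panic (std's Mutex is
    // non-reentrant), so a checker would have to flag it:
    // let _again = LOCK.lock().unwrap();

    // Different thread: this merely blocks until the caller drops its
    // guard, so it must not be flagged, as long as we don't join here.
    thread::spawn(|| {
        let _guard = LOCK.lock().unwrap();
        // the equivalent of timer_handle.reset() could run here
    });
}

fn caller() {
    let _guard = LOCK.lock().unwrap();
    callback();
} // guard dropped here; the spawned thread can now make progress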
So thanks again all for your suggestions and clarifications.
If you're really determined to allow nesting of calls, keep an eye on the tracking issue for ReentrantLock; this is a variant on Mutex that allows the same thread to take the lock more than once, but which prohibits mutable access to the contents.
You should be able to design something where you take a ReentrantLock at the beginning of all operations on a data structure (like "LIST" in your diagrams), and take a Mutex briefly for the cases where you mutate the data structure, as in the sketch below. This is decidedly non-trivial to get right, and won't have compiler support for getting it right, but will let you implement your original design.
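A rough sketch of that layering, assuming a nightly toolchain, since ReentrantLock is unstable at the time of writing; all type and field names here are placeholders:

#![feature(reentrant_lock)] // unstable; see the tracking issue

use std::sync::{Mutex, ReentrantLock};

struct TimerList {
    // Re-entrant outer lock: the same thread may take it again, e.g. when a
    // callback calls back into the structure. It only gives shared access.
    outer: ReentrantLock<Inner>,
}

struct Inner {
    // Brief exclusive sections for actual mutation live behind this Mutex.
    timers: Mutex<Vec<u64>>, // placeholder element type
}

impl TimerList {
    fn add(&self, deadline: u64) {
        let inner = self.outer.lock(); // fine even if we already hold it
        inner.timers.lock().unwrap().push(deadline); // short critical section
    }
}

The non-trivial part is making sure the inner Mutex is never held across a call back into the structure, since that re-introduces the original deadlock one level down.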
I'd still prefer a design based around Mutex, where you guarantee by code inspection that lock hold periods are bounded (implying that you don't call user code such as callbacks while holding the lock), but this is an approach you could consider.
Sorry, there was a misunderstanding. I don't want to use reentrant locks at this point; I just described a bit of the logic the compiler would need in order to detect these kinds of situations, where there are complex call chains and deadlocks like the one in the diagram are not obvious to the programmer.
The Rust compiler will probably never be able to do that kind of analysis; it only looks at function signatures. Another static analyzer on top of it could, eventually. Possibly.
In principle, you could build a lint of the form you describe (where calling Mutex::lock or Mutex::try_lock in a function increments the reference count of a lock, and the reference count is decremented when the compiler is able to show that the matching MutexGuard must have been dropped by this point), but I see two challenges for you to deal with as you implement this:
As soon as dynamic dispatch (dyn Trait) is involved anywhere in the call stack, you're in trouble. The compiler doesn't have all the information it needs to do complete reference counting of locking and unlocking at this point, and therefore you need some way to manually annotate that a trait implementation can take, or must not take, a given lock - along with compile-time checking of those annotations.
For any complicated sequence of locking and unlocking, there will either be false positives or false negatives. The compiler does not know (cannot know) the values of conditionals at the point it's doing this analysis, and thus must make assumptions to proceed. You can choose false negatives, by assuming that conditional locking never happens or that conditional unlocking always happens, or false positives, by assuming that conditional locking always happens and conditional unlocking never happens. Any other combination gets you both false positives and false negatives; a constructed example follows below.
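A small constructed example of the conditional-locking problem (not code from the thread):

use std::sync::{Mutex, MutexGuard};

static LIST: Mutex<Vec<u32>> = Mutex::new(Vec::new());

fn conditional(flag: bool) {
    // Whether the lock is held below depends on a runtime value.
    let guard: Option<MutexGuard<'_, Vec<u32>>> =
        if flag { Some(LIST.lock().unwrap()) } else { None };

    // Is LIST locked here? A lint must guess: assume "always locked" and it
    // reports a false positive when `flag` is false; assume "never locked"
    // and it misses the real deadlock when `flag` is true.
    helper();

    drop(guard);
}

fn helper() {
    // Deadlocks or panics if the caller still holds LIST on this thread.
    let _len = LIST.lock().unwrap().len();
}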
My gut feeling is that between dynamic dispatch and the inability to analyze complex locking, you'll find that such a lint doesn't pay for the cost of implementing it and making it perform well. But I've been wrong before, and I could well be wrong this time.