Wrapping a RwLock*Guard along with something that borrows from it

I have a collection of collections. Each of the inner collections inside the outer collection is wrapped in a RwLock. I need to be able to iterate over corresponding entries from each inner collection, with some of the inner collections being borrowed shared (and locked for read) and others being borrowed mutably (and locked for write).

Distilled example:

#![feature(array_methods)]

use std::{
    collections::HashMap,
    slice::{Iter, IterMut},
    sync::{RwLock, RwLockReadGuard, RwLockWriteGuard},
};

type TestMap = HashMap<&'static str, RwLock<Vec<String>>>;

enum AccessType {
    Shared(&'static str),
    Exclusive(&'static str),
}

enum Accessor<'a> {
    Shared(RwLockReadGuard<'a, Vec<String>>, Iter<'a, String>),
    Exclusive(RwLockWriteGuard<'a, Vec<String>>, IterMut<'a, String>),
}

enum Access<'a> {
    Shared(&'a String),
    Exclusive(&'a mut String),
}

struct TestCollection {
    map: TestMap,
}

struct TestCollectionAccessor<'a, const N: usize> {
    accessors: [Accessor<'a>; N],
}

impl TestCollection {
    fn access<const N: usize>(&self, types: [AccessType; N]) -> TestCollectionAccessor<N> {
        TestCollectionAccessor { accessors: types.map(|access_type| {
            match access_type {
                AccessType::Shared(key) => {
                    let guard = self.map.get(key).unwrap().read().unwrap();
                    let iterator = guard.iter();
                    Accessor::Shared(guard, iterator) // error: cannot move out of `guard` because it is borrowed
                },
                AccessType::Exclusive(key) => {
                    let mut guard = self.map.get(key).unwrap().write().unwrap();
                    let iterator = guard.iter_mut();
                    Accessor::Exclusive(guard, iterator) // error: cannot move out of `guard` because it is borrowed
                },
            }
        })}
    }
}

impl<'a, const N: usize> Iterator for TestCollectionAccessor<'a, N> {
    type Item = [Access<'a>; N];
    fn next(&mut self) -> Option<Self::Item> {
        let many = self.accessors.each_mut().map(|accessor| {
            match accessor {
                Accessor::Shared(_, x) => x.next().map(|x| Access::Shared(x)),
                Accessor::Exclusive(_, x) => x.next().map(|x| Access::Exclusive(x)),
            }
        });
        if many.iter().all(|x| x.is_some()) {
            Some(many.map(Option::unwrap))
        }
        else {
            None
        }
    }
}

fn main() {
    let mut test = TestCollection { map: HashMap::new() };
    test.map.insert("a", RwLock::new([
        "foo", "bar", "baz", "bang",
    ].into_iter().map(str::to_string).collect()));
    test.map.insert("b", RwLock::new([
        "alpha", "beta", "gamma", "delta",
    ].into_iter().map(str::to_string).collect()));
    let accesses = [
        AccessType::Shared("a"),
        AccessType::Exclusive("b"),
    ];
    for x in test.access(accesses) {
        let (left_string, right_string) = match x {
            [Access::Shared(left_string), Access::Exclusive(right_string)]
                => (left_string, right_string),
            _ => panic!(),
        };
        right_string.push(' ');
        right_string.push_str(left_string);
    }
    for (key, value) in test.map.iter() {
        println!("{:?} = {:?}", key, *value.read().unwrap());
    }
}

This example wants to read-lock test.map.get("a"), write-lock test.map.get("b"), and modify each element of "b" to be followed by its counterpart from "a". Ideal output:

"a" = ["foo", "bar", "baz", "bang"]
"b" = ["alpha foo", "beta bar", "gamma baz", "delta bang"]

(or the other way around, depending on a flip of the HashMap coin)

The problem is that I want Accessor to wrap its respective iterator, but that iterator borrows from the guard, and I can't just return the guard together with the iterator, as I try to do in the example above, because that moves the guard while it is borrowed. I could write a custom iterator that iterates over a RwLock*Guard<Vec<String>> instead of using Vec<String>'s iterators directly, but that gives me an icky feeling. In my real application, instead of a Vec<String>, it's a complicated custom data structure, and I really don't want to implement all of the iteration logic twice. I want there to be a better way.

Any ideas?

P.S. I know this simple example could be done with test.map.get("a").unwrap().read().unwrap().iter().zip(test.map.get("b").unwrap().write().unwrap().iter_mut()). Just trust me that the application this is distilled from can't be rewritten in those terms. Among other reasons, in the real application the definition of "corresponding" is fairly complicated, so the real next() has some hefty logic in it, and each value yielded from the outer iterator may consume more than one value from any given inner iterator.

You simply can't store a reference along with what it refers to, for the exact reason you mentioned – that's a self-referential type, and if it's moved, the reference gets invalidated.

You should only return the lock guard, and instantiate the iterator where and when you need it, no earlier.
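A minimal sketch of that suggestion, assuming the TestMap type from the question; read, write and demo are hypothetical helper names. The helpers hand out only guards, and the caller keeps those guards in locals and creates the zipped iterators at the point of use:

```rust
use std::collections::HashMap;
use std::sync::{RwLock, RwLockReadGuard, RwLockWriteGuard};

type TestMap = HashMap<&'static str, RwLock<Vec<String>>>;

// Hypothetical helpers: they return only the lock guards, never an iterator.
fn read<'a>(map: &'a TestMap, key: &str) -> RwLockReadGuard<'a, Vec<String>> {
    map.get(key).unwrap().read().unwrap()
}

fn write<'a>(map: &'a TestMap, key: &str) -> RwLockWriteGuard<'a, Vec<String>> {
    map.get(key).unwrap().write().unwrap()
}

fn demo() -> Vec<String> {
    let mut map: TestMap = HashMap::new();
    map.insert("a", RwLock::new(vec!["foo".into(), "bar".into()]));
    map.insert("b", RwLock::new(vec!["alpha".into(), "beta".into()]));
    {
        // The guards sit in locals that are never moved...
        let left = read(&map, "a");
        let mut right = write(&map, "b");
        // ...so iterators borrowing from them are created here, where they are used.
        for (l, r) in left.iter().zip(right.iter_mut()) {
            r.push(' ');
            r.push_str(l);
        }
    } // both locks are released here
    // Bind first so the temporary read guard is dropped before `map`.
    let result = map["b"].read().unwrap().clone();
    result
}

fn main() {
    println!("{:?}", demo());
}
```

Because the guards are never moved after the iterators borrow them, the borrow checker accepts this. The cost is that the iteration logic lives at each call site, which is what the question is trying to avoid.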

The problem is that I need to wrap complicated logic around the iterator, and not just one iterator but several (potentially very many) iterators of the same type. Sometimes that complicated logic will itself need to be wrapped once more.

A solution I thought of since making that post is to use the "with" pattern, where instead of returning an iterator I take a callback that is called once per iteration. That solution brings some problems with it, but it's probably what I'll do if I can't find another.
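A rough sketch of the "with" pattern applied to the distilled example, assuming a for_each_pair method (a hypothetical name) on TestCollection: the locks and iterators never leave the method, and the caller's closure runs once per pair of corresponding entries.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

struct TestCollection {
    map: HashMap<&'static str, RwLock<Vec<String>>>,
}

impl TestCollection {
    // Hypothetical "with"-style API: guards and iterators stay in this stack frame.
    // `shared` and `exclusive` must name different locks; otherwise the write lock
    // would be requested while a read lock is held (deadlock or panic).
    fn for_each_pair(&self, shared: &str, exclusive: &str, mut f: impl FnMut(&String, &mut String)) {
        let left = self.map.get(shared).unwrap().read().unwrap();
        let mut right = self.map.get(exclusive).unwrap().write().unwrap();
        for (l, r) in left.iter().zip(right.iter_mut()) {
            f(l, r); // the caller supplies the loop body
        }
    }
}

fn demo() -> Vec<String> {
    let mut test = TestCollection { map: HashMap::new() };
    test.map.insert("a", RwLock::new(vec!["foo".into(), "bar".into()]));
    test.map.insert("b", RwLock::new(vec!["alpha".into(), "beta".into()]));
    test.for_each_pair("a", "b", |left, right| {
        right.push(' ');
        right.push_str(left);
    });
    let result = test.map["b"].read().unwrap().clone();
    result
}

fn main() {
    println!("{:?}", demo());
}
```

The trade-off is that callers cannot apply iterator adapters to the sequence of pairs; they only ever see one pair at a time.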

There has to be some way to "hang onto" a guard. Maybe involving a Box?

Then abstract that logic away using a function.

Box doesn't solve this problem (nor does any other smart pointer or heap allocation), because the reference handed out by a Box is still tied to the lifetime of the Box (and rightfully so, since it owns its referent). The borrow checker doesn't distinguish stack allocations from heap allocations (and it couldn't and shouldn't: references to stack and heap allocations have the same type), so it can't know that moving a Box doesn't invalidate the address of its referent.
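A small sketch to illustrate the point: moving a Box leaves its heap allocation at the same address, yet the borrow checker still reports a move-while-borrowed error, because it tracks the borrow of the Box value rather than where the data lives. The helper name is hypothetical.

```rust
// Moving a Box moves only the pointer-sized Box value; the heap allocation
// it owns is neither copied nor relocated.
fn heap_address_survives_move() -> bool {
    let original = Box::new(String::from("guarded data"));
    let before: *const String = &*original;
    let moved = original; // move the Box itself
    let after: *const String = &*moved;
    before == after
}

fn main() {
    println!("{}", heap_address_survives_move()); // prints "true"

    // The borrow checker still rejects borrowing from a Box and then moving it:
    // let b = Box::new(vec![1, 2, 3]);
    // let it = b.iter();
    // let moved = b; // error[E0505]: cannot move out of `b` because it is borrowed
    // println!("{}", it.count());
}
```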


I'm trying to write that function... T_T

Maybe if I rephrase the question...

I have a collection of collections. Every inner collection is protected by a RwLock. I want to make an Iterator over multiple, arbitrary inner collections, such that this Iterator will yield corresponding entries from each inner collection, borrowing some mutably and some immutably, skipping any entries that are missing from some of the inner collections.

If the inner collection Iterator looks like this:

struct SomeIter<'a> {
  collection: &'a SomeCollection,
  ...
}

...then I can't do that. MyIter (my outer iterator) would borrow through the guard, and once it does, there's no way to move the guard. I know this won't work.

But if the inner collection Iterator looked like this instead:

struct SomeIter<'a> {
  collection: RwLockReadGuard<'a, SomeCollection>,
  ...
}

Then it would trivially be possible to do what I need. My question is, is there some way to do that without reimplementing SomeIter? Is it completely impossible? I have a really strong feeling that it's possible to do this somehow, and that I'm just not clever enough to figure it out.

If I understand correctly, you need the functionality of a map() method on RwLock*Guard. That doesn't exist in the stable standard library, but you could try parking_lot, which does have one.

That is exactly what I need. parking_lot to the rescue (again)!

Thank you for the help.

Edit:

Curses, foiled again. The function passed to parking_lot::RwLockReadGuard::map must return a simple reference, and can't return a referencing type like an Iterator implementation. But maybe I can improve parking_lot to solve that problem, and thus solve my problem...

Thank you again for the help, regardless.

You might want to look at parking_lot pull request #290, "Allow owned data in MappedMutexGuard" by eira-fransham (Amanieu/parking_lot on GitHub).
It would be nice if someone found a viable solution; however, I feel that right now every "working" solution will have bad ergonomics due to poor type inference in the compiler.

I guess it's not possible, then, to change:

pub fn map<U: ?Sized, F>(s: Self, f: F) -> MappedMutexGuard<'a, R, U>
where
    F: FnOnce(&mut T) -> &mut U,

to:

pub fn map<'b, U: 'b, F>(s: Self, f: F) -> MappedMutexGuard<'a, R, U>
where
    F: FnOnce(&'b mut T) -> U,

That was what I was planning to try, though I expect I'll find that you can't use lifetimes that way.

No, that's unsound, the caller can just choose 'b = 'static and get a &'static mut T inside the closure, which is of course wrong because such a reference doesn't exist. The lifetime of the reference given in the closure needs to be defined as part of the FnOnce bound, not on the function arguments.

No you can't, because this says that the lifetime of U is independent of the lifetime of the mutex, which is obviously wrong.

I tried to implement my own MappedMutexGuard, but I failed: neither 'b: 'a nor 'a: 'b satisfies the borrow checker (and using only one lifetime parameter would also be obviously wrong, because &'a mut T<'a> is a common anti-pattern that requires the object to be borrowed for the rest of its lifetime, making it unusable or even un-droppable).

All in all, I don't think you can achieve projection to an arbitrary, non-reference type.

I'm not sure how these problems don't apply to the current prototype.

Isn't it the same as:

pub fn map<'b, U: ?Sized, F>(s: Self, f: F) -> MappedMutexGuard<'a, R, U>
where
    F: FnOnce(&'b mut T) -> &'b mut U,

If not, I definitely don't understand what's going on.

It's not. In the signature you described, 'b is still chosen by the caller. However, in an Fn bound, the syntax Fn(&T) -> … is actually shorthand for a higher-ranked trait bound (HRTB), i.e., for<'arg> Fn(&'arg T) -> ….
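A small sketch of the difference, using hypothetical functions over String rather than MutexGuard: with the elided (higher-ranked) bound, the function body picks the lifetime and can lend the closure a reference to its own local; with a caller-chosen 'b, the same body would not compile.

```rust
// Elided form: `FnOnce(&String) -> &str` is sugar for
// `for<'b> FnOnce(&'b String) -> &'b str`, so the closure must accept a
// reference of any lifetime, including one to a local of this function.
fn first_word_len<F>(owned: String, f: F) -> usize
where
    F: FnOnce(&String) -> &str,
{
    let local = owned; // lives only inside this function
    f(&local).len()
}

// The same bound written out as an explicit HRTB.
fn first_word_len_hrtb<F>(owned: String, f: F) -> usize
where
    F: for<'b> FnOnce(&'b String) -> &'b str,
{
    let local = owned;
    f(&local).len()
}

// With a caller-chosen lifetime instead, the body could not lend out a local:
// fn bad<'b, F: FnOnce(&'b String) -> &'b str>(owned: String, f: F) -> usize {
//     let local = owned;
//     f(&local).len() // error: `local` does not live long enough
// }

fn main() {
    let n = first_word_len(String::from("hello world"), |s| s.split(' ').next().unwrap_or(""));
    let m = first_word_len_hrtb(String::from("hello world"), |s| s.split(' ').next().unwrap_or(""));
    println!("{} {}", n, m);
}
```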


So, then... the problem is that we end up with:

pub fn map<U, F>(s: Self, f: F) -> MappedMutexGuard<'a, R, U>
where
    F: for<'b> FnOnce(&'b mut T) -> U,

and then get stuck, because there's no way to attach that 'b to U without moving 'b into map's type signature and giving the caller the ability to specify unsound bounds?

You can use a visitor/scoped pattern instead. Basically callers of visit supply the body of the iteration loop.
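One possible shape of a scoped API, sketched under assumptions: with_iters is a hypothetical name, and the inner collections are Vec<String> as in the distilled example. Instead of calling back once per item, it hands the closure the iterators themselves, so the closure can still chain adapters such as zip and filter:

```rust
use std::slice::{Iter, IterMut};
use std::sync::RwLock;

// Hypothetical scoped API: the guards stay inside `with_iters`, and the caller
// receives iterators that borrow from them only for the duration of `body`.
// The two locks must be distinct, or the write lock would be requested while
// the read lock is held on the same thread (deadlock or panic).
fn with_iters<R>(
    shared: &RwLock<Vec<String>>,
    exclusive: &RwLock<Vec<String>>,
    body: impl for<'g> FnOnce(Iter<'g, String>, IterMut<'g, String>) -> R,
) -> R {
    let left = shared.read().unwrap();
    let mut right = exclusive.write().unwrap();
    body(left.iter(), right.iter_mut())
}

fn demo() -> Vec<String> {
    let a = RwLock::new(vec!["foo".to_string(), "bar".to_string(), "baz".to_string()]);
    let b = RwLock::new(vec!["alpha".to_string(), "beta".to_string(), "gamma".to_string()]);
    // Inside the scope the caller can chain ordinary iterator adapters.
    let touched = with_iters(&a, &b, |left, right| {
        let mut touched = 0;
        for (l, r) in left.zip(right).filter(|(name, _)| name.starts_with('b')) {
            r.push(' ');
            r.push_str(l);
            touched += 1;
        }
        touched
    });
    assert_eq!(touched, 2);
    b.into_inner().unwrap()
}

fn main() {
    println!("{:?}", demo());
}
```

The for<'g> bound is what keeps this sound: the closure must work for any guard lifetime, so its result type R cannot contain a borrow of the guards.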


That's one of the two solutions I considered. In the end, I'm probably just going to make RwLock*Guard versions of the iterators for my custom collection instead. The reason I prefer that over the visitor pattern is that it's easier to chain.

No, you can do that by using higher-ranked types. In Rust they aren't first-class, but you can emulate them with GATs or HRTBs (the latter only for lifetime-generic types). The problem with this approach is that type inference breaks and you have to annotate types manually, which is of course not an ideal workflow. See for example https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=1e3950eb57722eff00dd5a08d405b3d2

And while you can improve the experience, for example by avoiding having to implement an extra type and trait, the experience will still be subpar (Rust Playground).

For posterity, I changed my custom collection's iterator to:

use std::{marker::PhantomData, ops::Deref};

pub struct MyCollectionIter<'a, T: Deref<Target=MyCollection>> {
    collection: T,
    // ... regular iterator state goes here ...
    _phantom: PhantomData<&'a MyCollection>,
}

In exchange for slightly clunkier syntax for getting the iterator in the first place, this seems to let me iterate over anything "reference-like" (including RwLockReadGuard), and I didn't need to add any unsafe blocks to make it work, nor did I have to implement my iteration logic twice.

If somebody spots a way I've created unsoundness, feel free to let me know.

Edit

The PhantomData stuff wasn't necessary. My final solution:

use std::ops::Deref;

pub struct MyCollectionIter<T: Deref<Target=MyCollection>> {
    collection: T,
    // ... regular iterator state goes here ...
}

If T really does borrow from some real MyCollection (for example, T = &'a MyCollection or RwLockReadGuard<'a, MyCollection>), that lifetime is carried by T itself, so I didn't need to add it myself. I would only have needed lifetime information in my struct if one of its fields definitely used that lifetime.
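A self-contained sketch of this approach, assuming a simplified MyCollection that wraps a Vec<u32> and a single pos field as the "regular iterator state" (both hypothetical stand-ins for the real types). The same iteration logic runs over a plain &MyCollection and over a RwLockReadGuard:

```rust
use std::ops::Deref;
use std::sync::RwLock;

// Hypothetical stand-in for the real custom collection.
struct MyCollection {
    items: Vec<u32>,
}

// Iteration logic is written once, generic over anything that derefs to MyCollection.
struct MyCollectionIter<T: Deref<Target = MyCollection>> {
    collection: T,
    pos: usize, // hypothetical "regular iterator state"
}

impl<T: Deref<Target = MyCollection>> MyCollectionIter<T> {
    fn new(collection: T) -> Self {
        MyCollectionIter { collection, pos: 0 }
    }
}

impl<T: Deref<Target = MyCollection>> Iterator for MyCollectionIter<T> {
    // Items are copied out: since the iterator owns T, `next(&mut self)` cannot
    // return references that borrow from `self.collection`.
    type Item = u32;
    fn next(&mut self) -> Option<u32> {
        let item = self.collection.items.get(self.pos).copied()?;
        self.pos += 1;
        Some(item)
    }
}

fn demo() -> (u32, u32) {
    let plain = MyCollection { items: vec![1, 2, 3] };
    let locked = RwLock::new(MyCollection { items: vec![10, 20, 30] });
    // Same iterator type over a plain reference and over a read guard.
    let from_ref: u32 = MyCollectionIter::new(&plain).sum();
    let from_guard: u32 = MyCollectionIter::new(locked.read().unwrap()).sum();
    (from_ref, from_guard)
}

fn main() {
    println!("{:?}", demo());
}
```

One limitation worth keeping in mind: because the iterator owns T, its Item type cannot borrow from the &mut self passed to next, so this sketch yields copies. An iterator that must hand out references (or &mut) into the guarded data still needs a different approach.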
