How to convince the borrow checker?

Hi everyone,
I was working on a logic where if there are items in a map satisfy a predicate, return those items in a borrowed form as Err, otherwise insert a new kv pair and return Ok.

Below is a simplified version of the logic. For insert_v1, it's clear that the compiler can not infer if items.len() == 0 disjoins the else block, so borrow check failure is expected.
But in insert_v2, the items will not available in the None arm, the compiler still complains.

Is there any way I can assure to the compiler the Err and Ok path won't overlap?

struct Records {
    data : HashMap<u32, String>,
}

impl Records {
    fn insert_v1(&mut self, newK : u32, newV : &str) -> Result<(), Vec<&str>> {
        let items : Vec<&str> = self.data.iter().filter_map(|(k,v)| if newK > *k { Some(v.as_ref())} else { None }).collect();
        if items.len() == 0 {
            self.data.insert(newK, newV.into());
            Ok(())
        } else {
            Err(items)
        }
    }

    fn insert_v2(&mut self, newK : u32, newV : &str) -> Result<(), Vec<&str>> {
        let smaller_items = {
            let items : Vec<&str> = self.data.iter().filter_map(|(k,v)| if newK > *k { Some(v.as_ref())} else { None }).collect();
            if items.len() == 0 {
                None
            } else {
                Some(items)
            }
        };

        match smaller_items {
            Some(items) => Err(items),
            None => { self.data.insert(newK, newV.into()); Ok(())},
        }
    }
}

As a nitpick: vec.len() == 0 should be vec.is_empty() (same goes for hashmap, lists, etc) :wink:

1 Like

You have to split the length (empty) check and the actual insertion, e.g. I went for your v1

fn insert_v1(&mut self, newK: u32, newV: &str) -> Result<(), Vec<&str>> {
    let items_empty = self.data.iter().all(|(&k, _)| newK < k);
    if items_empty {
        self.data.insert(newK, newV.into());
        Ok(())
    } else {
        Err(self
            .data
            .iter()
            .filter_map(|(k, v)| if newK > *k { Some(v.as_ref()) } else { None })
            .collect())
    }
}

I hope I didn't change your logic in the all closure. Not really sure, you may double check that :wink:

I also thought about that, but that means you have duplicate computation.

It would still be ideal if going through the vector is done only once, despite both being O(n), and the extra computation may not be too bad in this case.

I would probably go for fn any<F>(&mut self, f: F) -> bool to save some iterations, but that's still not an optimal solution, as it requires to iterate twice.

This is one of the current limitations of the borrow checker: you know that the borrows are disjoint, but Rust cannot see that. Maybe the next borrow checker, Polonius, will manage to solve your case.

In the meantime, you need to

  • either iterate with two passes, as shown by @hellow (careful with the inequality negation, though, in this case I'd rather use .any(p()).not() than all(not_p()) because it avoids this kind of mistakes);

  • Wrap the HashMap within a RefCell,

  • use Arc<str> instead of String and .clone() the values instead of .as_ref();

  • you can use unsafe because this pattern is known to be sound. This is dangerous if the code gets refactored by someone not fully aware of the invariants that made unsafe usage sound (and that person can be oneself after a few months):

    struct Records {
        data: HashMap<u32, String>,
    }
    
    impl Records {
        fn insert (
            self: &'_ mut Self,
            new_k: u32,
            new_v: impl Into<String>,
        ) -> Result<
                (),
                Vec<&'_ str>,
            >
        {
            let items: Vec<&str> =
                unsafe { &*(&self.data as *const HashMap<u32, String>) }
                    .iter()
                    .filter_map(|(&k, s)| if k < new_k {
                        Some(s.as_str())
                    } else {
                        None
                    })
                    .collect()
            ;
            if items.len() == 0 {
                mem::drop(items);
                self.data.insert(new_k, new_v.into());
                Ok(())
            } else {
                Err(items)
            }
        }
    }
    
  • and last but not least, for it is a 0-cost non-unsafe solution (it's just that the call-site ergonomics are not great), you can use continuation passing style:

    struct Records {
        data: HashMap<u32, String>,
    }
    
    impl Records {
        fn with_insert<R, F> (
            self: &'_ mut Self,
            new_k: u32,
            new_v: &'_ str,
            f: F,
        ) -> R
        where
            F : FnOnce(Result<(), Vec<&str>>) -> R,
        {
            let items: Vec<&str> =
                self.data
                    .iter()
                    .filter_map(move |(&k, s)| if k < new_k {
                        Some(s.as_str())
                    } else {
                        None
                    })
                    .collect()
            ;
            if items.len() == 0 {
                self.data.insert(new_k, new_v.into());
                f(Ok(()))
            } else {
                f(Err(items))
            }
        }
    }
    
    fn main ()
    {
        let mut records = Records {
            data: hash_map! {
                10  => "Hello",
                20  => "World",
                100 => "Big key",
            },
        };
        dbg!(&records);
        records.with_insert(30, "foobar", |ret| {
            let _ = dbg!(ret);
        });
        records.with_insert(0, "tiny key", |ret| {
            let _ = dbg!(ret);
        });
    }
    
6 Likes

Indeed, if you use the nightly toolchain and pass -Z polonius to rustc, then the original code compiles.

12 Likes

Can you show how you’d do this approach? I tried, but couldn’t figure it out due to the Ref being local to the function.

I'm just gonna guess that @Yandros meant something like this. (I also wanted some practice with RefCell and Arc :smiley:! so if I'm way off sorry.)

struct Records<'s> {
    data : RefCell<HashMap<u32, Arc<&'s str>>>,
}

impl<'s> Records<'s> {
    fn insert_v1(&mut self, new_k : u32, new_v : &'s str) -> Result<(), Vec<Arc<&str>>> {
        let items : Vec<Arc<&str>> = self.data.borrow().iter()
            .filter_map(|(k, v)| if new_k > *k { Some(Arc::clone(v))} else { None })
            .collect();
        
        if items.is_empty() {
            self.data.borrow_mut().insert(new_k, Arc::new(&new_v));
            Ok(())
        } else {
            Err(items)
        }
    }
}

playground

fn insert_v1(&mut self, new_k : u32, new_v : &'s str) -> Result<(), Vec<Arc<&str>>> {
        let items : Vec<Arc<&str>> = self.data.borrow().iter()

This function compiles just as fine without the RefCell. If you ever need to mutably borrow a RefCell (as you did with &mut self), you're probably doing something wrong, as the whole purpose of RefCell is that a &RefCell<T> can give you a &mut T.

3 Likes

Is it possible to do it with RefCell but without Arc? That's where I was getting tripped up.

I don't think it's possible with RefCell around the data field without resolving https://github.com/rust-lang/rust/issues/54776

The author of that issue tried to implement a working solution in the "cell" crate but its API is unsound. An discussion of why it's unsound can be found in Unsafety with advanced version of RefCell

I suggested that off the top of my head, but you are right, the fact that we are returning a borrow does not make RefCell or other Interior Mutability wrappers a solution (for it to work, the pattern with closures would be needed, and in that case RefCell is no longer necessary).

2 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.