Blog post series: Rust patterns

I'm opening this thread for questions or comments regarding the blog post series I've been writing about "Rust patterns". Each post aims to explain some tricky problem that Rust users may encounter -- both why the problem exists, how to solve it today, and how we might enable more elegant solutions in the future.

I'm a bit late in opening this thread, as there have been two blog posts thus far, and both were some time ago:

26 Likes

New post:

Iterating over an Rc<Vec<T>>

This post examines a particular, seemingly simple problem: given ownership of a Rc<Vec<u32>> , can we write a function that returns an impl Iterator<Item = u32> ? It turns out that this is a bit harder than it might at first appear – and, as we’ll see, for good reason. I’ll dig into what’s going on, how you can fix it, and how we might extend the language in the future to try and get past this challenge.

8 Likes

The bit about "hidden" auxiliary values on the caller slot is very interesting: I hit the need for this pattern every now and then, and I don't know of a practical workaround other than just cloning the data.

Though, the particular example is probably not illuminating: fn iterate<T>(data: Rc<Vec<T>>) -> impl Iterator<Item = &T> is exactly fn iterate(data: &'a [T]) -> impl Iterator<Item = &'a T> + 'a: you don't need an auxiliary caller-stored slot because, in this specific case, you can just accept the reference.

EDIT: it is possible to emulate aux slots today, but the call site looks super-ugly: Rust Playground

EDIT: funny enough, I need 'aux in the very code I am refactoring right now, to get rid of this clone

Can this case be handled by returning a Cow tied to the lifetime of self borrow?

It could, yeah, but then the return type would be Cow, and not just &str. There are cases though, where a Cow won't work (like, if you'd want to take a [3..] of result).

Yeah, return type is Cow<'self, File> but you avoid the clone at least :slight_smile:

Your blog post doesn't specify if unsafe is OK.

If it is fair game, I propose the following:
1° Exploit the fact that the Vec's address inside the Rc won't change as long as the refcount doesn't become zero.
2° "Augment" the signature of the iterate function with a PhantomData that expresses the desired lifetime for our iterator.

Here's the result.

Granted, my solution is maybe silly because I don't really understand what's at stake. In a similar situation, I'd just modify iterate's signature to accept a ref to Vec<T> and pass &*rc as argument rather than this complicated dance that is required to pass the Rc directly :slight_smile:.

EDIT: Slightly simplified version here.

EDIT2: Turns out I don't really need to change iterate's signature, other than adding the appropriate lifetime.

For reference, here's how you can solve the problem using rental:

#[macro_use]
extern crate rental;
use std::rc::Rc;

rental! {
    pub mod x {
        use std::slice;
        use std::rc::Rc;
        #[rental]
        pub struct RcVecIterator {
            data: Rc<Vec<u32>>,
            iter: slice::Iter<'data, u32>,
        }
    }
}
use self::x::RcVecIterator;

impl Iterator for RcVecIterator {
    type Item = u32;
    fn next(&mut self) -> Option<u32> {
        self.rent_mut(|iter| iter.next().cloned())
    }
}

fn iterate(data: Rc<Vec<u32>>) -> impl Iterator<Item = u32> {
    RcVecIterator::new(
        // `data` field:
        data,
        // closure to produce `iter` field from borrowed `data` field:
        |ref_data: &Vec<u32>| ref_data.iter())
    )
}
6 Likes

Also, this is somewhat pointless / overly specific to the problem at hand, but here's a solution using generators (playground link). It works but requires unnecessary boxing, because borrowing data across yield points requires the generator to be marked as immovable (static).

fn iterate(data: Rc<Vec<u32>>) -> impl Iterator<Item = u32> {
    IteratorWithGenerator(Box::new(static move || {
        for &n in data.iter() {
            yield n;
        }
    }))
}
2 Likes

Here's a few simple patterns that I sometimes run across:

Optional buffer allocation / Allocate and reference

Best example I can think of at the moment is where you may have an API that optionally allows the caller to provide their own buffer to reuse between calls, perhaps for a type that is constructed with a builder pattern. In the bellow snippet, a vector will be allocated and a reference to it returned in the event that no buffer is found. Otherwise, the provided buffer is simply obtained.

fn do_thing(buffer: Option<&mut [u8]>) {
    let mut allocated: Vec<u8>;
    let buffer: &mut [u8] = match buffer {
        Some(buffer) => buffer,
        None => {
            allocated = Vec::with_capacity(64 * 1024);
            &mut allocated
        }
    };
    
    do_thing_with_buffer(buffer);
}

The key is that you can assign a value to a previously-unassigned variable, and then take reference it so that the match arm suceeds in attaining a mutable reference with either arm.

Generic collection return type

Some APIs return Vec<T> instead of allowing the caller to to choose the type they want to collect into.

use std::iter::FromIterator;

// Where `X` is the type that the collection stores.
fn collect_thing<T: FromIterator<X>>(&self) -> T {
    self.iterable_thing().collect::<T>()
}

let thing_map = collect_thing::<HashMap>();
let thing_vec = collect_thing::<Vec>();

Although it's probably better to return impl Iterator<Item = X> nowadays.

Actions with pre and post conditions

It's often better to use a closure when you may perform tasks that have the same pre and post conditions. In the event that you'd also like the inner closure to reuse some variables that the function borrowed, you can pass the borrows to the closure to avoid an issue with the borrow checker.

fn backup(config: &Config, mut func: impl FnMut(&Config) -> io::Result<()>) -> io::Result<()> {
   let backup = do_backup_with(config)?;
   func()?;
   restore_backup(backup)
}

backup(config, || install_things_with(config));

Pass borrowed mutable borrows into inner closure

Sometimes you may run into a situation where you want to re-use a mutable borrow within the closure passed into a function that also borrows the same value.

fn do_thing(borrow: &mut Thing, mut func: impl FnMut(&mut Thing) {
    do_thing_with(borrow);
    func(borrow);
    do_other_thing();
}

do_thing(&mut value, |value| also_with(value));

Generic hash key

Some types don't work very well as keys for hash maps due to requiring an allocation and borrowing issues when getting and inserting keys. A solution is to hash the key in advance and use that.

fn hash_key<T: Hash>(key: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish()
}

let key = hash_key(&struct.name);
map.insert(key, struct);

Inherit methods of a base structure

In some situations it may be ideal to inherit the methods of a structure that you are extending. I've seen a need for this quite often when designing GTK widgets and views, where that widget should be capable of inheriting the methods of the base widget it extended. Implementing the Deref and AsRef traits on the type will do the job.


struct View { ... };

impl View {
    fn set_title(&self, title: &str);
    fn get_title(&self) -> &str;
}

struct SpecificView {
    view: View;
}

impl Deref for SpecificView {
    type Target = View;
    fn deref(&self) -> &Self::Target {
        &self.view
    }
}

impl AsRef<View> for SpecificView {
    fn as_ref(&self) -> &View {
        &self.view
    }
}

let specific = cascade!
    SpecificView::new();
    ..set_title("Specific View");
    | stack.add(specific.as_ref());
};

eprintln!("{}", specific.get_title());

Using Arc::strong_count to detect remaining threads

Sometimes you may want to know how many threads are left laying around instead of blocking to wait for them, which can be made more difficult when threads may spawn other threads to share a value. This can solve that:

let atomic_value = Arc::new(T);

let handles = spawn_many_threads_with(atomic_value.clone());

while Arc::strong_count(atomic_value) != 1 {
    // threads are still alive
    // possibly do other thing while we wait
    sleep(Duration::from_millis(1));
}

let result = handles.into_iter().map(|h| h.join().unwrap()).collect();
9 Likes

@nikomatsakis what is the difference between moving into a cloning closure and cloning the element beforehand since the beginning ? (Actually I feel like I'm already answering my question)
In other words, how does your solution compare with Rc::deref(&data).clone().into_iter() ?

1 Like

This would clone the entire underlying Vec, whereas the solution in the blog does not do that.

1 Like

Yes, since the point of iterators is producing on demand, the move closure allows to just clone on demand and is thus a good solution.

(I had still wanted to point out a correctly typed solution, for the sake of comparison)

I just got the (discourse) digest email and am super excited to read through these. My brain works so much better when I have patterns to apply to solving problems and I feel like I haven't a) found good Rust-related resources on the topic and b) haven't yet developed any of my own (after about 2 years of Rusting). TIA!

Alternatively, if the function took a borrowed vector of type &Vec<u32> , there would still be an easy solution. In that case, we couldn’t use into_iter , because that requires ownership of the vector. But we could write data.iter().cloned() – data.iter() gives us back references ( &u32 ) and the cloned() adapter then “clones” them to give us back a u32 (try it on play).

That play link is wrong and points to the same code as the previous link. I can't seem to get the code that you intended to compile (at least not with trivial corrections).

Apparently, you need this: Rust Playground

fn iterate<'a>(data: &'a Vec<u32>) -> impl Iterator<Item = u32> + 'a {
    data.iter().cloned()
}

fn main() {
    let v = vec![1, 2, 3];
    for x in iterate(&v) {
        println!("Hi! {}", x);
    }
}

What about a non-language solution that focuses on Rc, which is similar to shared_ptr's solution to the same problem:

template< class Y >
shared_ptr( const shared_ptr<Y>& r, element_type* ptr ) noexcept;

The aliasing constructor : constructs a shared_ptr which shares ownership information with r , but holds an unrelated and unmanaged pointer ptr . If this shared_ptr is the last of the group to go out of scope, it will call the stored deleter for the object originally managed by r . However, calling get() on this shared_ptr will always return a copy of ptr . It is the responsibility of the programmer to make sure that this ptr remains valid as long as this shared_ptr exists, such as in the typical use cases where ptr is a member of the object managed by r or is an alias (e.g., downcast) of r.get()

The issue in Rust would be whether this functionality can be made safe or not. Perhaps the best approach would be a map function which takes an Rc and a Fn(&T) -> &R which then stores the R. I suppose this should be safe, since &R can only refer to something in &T or something that is 'static, unless unsafe is used.

Not saying that language features would be a bad idea, but they aren't required to solve this problem, you'd just end up with a fn iterate(data: Rc<Vec<u32>>) -> Rc<impl Iterator<Item = u32>> instead.