How to move values to closure that indirectly depends on them

Hi, I am trying to compile a piece of code that IMO does not break lifetime rules but the compiler is not allowing me to.

I constructed a minimal example below. Assume that build_compare is an external library that I have no control over, and that the signature of some_compare cannot change, i.e. it cannot be a Vec or something that does not require an allocation. I am trying to return a new DynComparator on top of the one returned by build_compare (essentially build_compare is a factory, and I am decorating that factory).

My expectation is that because values_right and values_left was passed to build_compare, it should be moved to inside the closure, but this is not really happening, and I have been unable so far to move them to inside the closure.

Because compare is never used outside the closure, I would expect this to be fine, as there is nothing outliving.

use std::cmp::Ordering;

pub type DynComparator<'a> = Box<dyn Fn(usize, usize) -> Ordering + 'a>;

fn build_compare<'a>(left: &'a Vec<i32>, right: &'a Vec<i32>) -> DynComparator<'a> {
    Box::new(move |i: usize, j: usize| {
        left[i].cmp(&right[j])
    })
}

fn some_compare<'a>(
    keys_left: &'a[i32], values_left: &'a[i32],
    keys_right: &'a[i32], values_right: &'a[i32]
) -> DynComparator<'a> {

    let values_left = values_left.to_vec();
    let values_right = values_right.to_vec();
    
    let compare = build_compare(&values_left, &values_right);

    Box::new(move |i, j| {
        let key_left = keys_left[i] as usize;
        let key_right = keys_right[j] as usize;
        (compare)(key_left, key_right)
    })
}

fn main() {
    let a = vec![0i32];
    let b = vec![1i32];
    let c = vec![0i32];
    let d = vec![1i32];
    let cmp = some_compare(&a, &b, &c, &d);
    println!("{:?}", (cmp)(0, 0));
}

(Playground)

Errors:

   Compiling playground v0.0.1 (/playground)
error[E0515]: cannot return value referencing local variable `values_right`
  --> src/main.rs:21:5
   |
19 |       let compare = build_compare(&values_left, &values_right);
   |                                                 ------------- `values_right` is borrowed here
20 | 
21 | /     Box::new(move |i, j| {
22 | |         let key_left = keys_left[i] as usize;
23 | |         let key_right = keys_right[j] as usize;
24 | |         (compare)(key_left, key_right)
25 | |     })
   | |______^ returns a value referencing data owned by the current function

error[E0515]: cannot return value referencing local variable `values_left`
  --> src/main.rs:21:5
   |
19 |       let compare = build_compare(&values_left, &values_right);
   |                                   ------------ `values_left` is borrowed here
20 | 
21 | /     Box::new(move |i, j| {
22 | |         let key_left = keys_left[i] as usize;
23 | |         let key_right = keys_right[j] as usize;
24 | |         (compare)(key_left, key_right)
25 | |     })
   | |______^ returns a value referencing data owned by the current function

error: aborting due to 2 previous errors

For more information about this error, try `rustc --explain E0515`.
error: could not compile `playground`.

To learn more, run the command again with --verbose.

So the problem here is that build_compare produces a DynComparator containing the references to values_left and values_right that you passed it. Since values_left and values_right are local variables, the resulting compare may not escape the some_compare function. Putting it inside of the closure that’s being returned violates this.

AFAICT there is no solution that doesn’t involve new allocations on every call to the returned DynComparator<'a> without changing the signature of build_compare or some_compare. The least worst I can come up with is

fn some_compare<'a>(
    keys_left: &'a[i32], values_left: &'a[i32],
    keys_right: &'a[i32], values_right: &'a[i32]
) -> DynComparator<'a> {

    let values_left = values_left.to_vec();
    let values_right = values_right.to_vec();

    Box::new(move |i, j| {
        let key_left = keys_left[i] as usize;
        let key_right = keys_right[j] as usize;
        (build_compare(&values_left, &values_right))(key_left, key_right)
    })
}

but that will still create a new Box for the inner DynComparator creation/build_compare call every time it’s called.

Since you’re explaining that build_compare comes from an external library, I’m wondering: what kind of library uses &Vec<i32> in its API? This type should never be used since it’s strictly less general than &[i32] and has an extra indirection.

1 Like

Thanks a lot, Steffahn for your thoughts. I also though about that, but that is pretty expensive :frowning:.

What I do not understand is why this we can't move the values_left to inside the closure, which respects all lifetime bounds: compare is only used inside the closure, does not leak to outside of it, and the lifetime of values_* is also restricted to the return value of the closure. I can't see where this cause an invalid reference.

For reference, I used Vec as an example here. In the actual code Vec is actually a dynamically typed container, Arc<dyn Array>, whose all concrete implementations represent as byte buffers and reinterpret them via pointers for efficient pointer arithmetic. I actually control build_compare, but it has a lot of constraints:

build_compare creates a comparison between two such Arrays (when they represent the same type) by each of those concrete types (via downcast). Something like this

pub fn build_compare<'a>(
    left: &'a Arc<dyn Array>,
    right: &'a Arc<dyn Array>,
) -> DynComparator<'a> {
    match (left.data_type(), right.data_type()) {
        (a, b) if a != b => {
            unreachable!()
        }
        (Boolean, Boolean) => {
            let left = left.as_any().downcast_ref::<BooleanArray>().unwrap();
            let right = right.as_any().downcast_ref::<BooleanArray>().unwrap();
            Box::new(move |i, j| left.value(i).cmp(&right.value(j)))
        }
        ...
        (Dictionary, Dictionary) => some_comparison(...)
    }
}

However, the specification (arrow btw) also has a dictionary array, that uses keys to indicate the position of each of the values in the array (e.g. for compression, in case a value is repeated multiple times). Any array can be dictionary-encoded in this way.

some_compare above represents the implementation of a comparison between two dictionary-encoded arrays. For that, when I ask for index i, first it fetches the key corresponding to that index, and then the value, like values[key[i]]. The values[_] requires the indirection, which is why I need to re-cast the array of values (the values_right).

Maybe there is a smarter way of achieving this whole thing (re-interpret bytes through a pointer), but the point remains: we need to allocate a pointer to efficiently perform pointer arithmetic on the byte buffer. The to_vec is representing that operation (of allocating something).

Use rental

A lazy playground approach. Requires the DynComparator not to have a drop (true on stable rust) since no guarantee the pins won't drop first.

    let mut pin_values_left = Box::pin(values_left.to_vec());
    let mut pin_values_right = Box::pin(values_right.to_vec());
    let values_left:&'a Vec<i32> = unsafe { (pin_values_left.as_mut().get_unchecked_mut() as *mut Vec<i32>).as_ref().unwrap() };
    let values_right:&'a Vec<i32> = unsafe { (pin_values_right.as_mut().get_unchecked_mut() as *mut Vec<i32>).as_ref().unwrap() };
    let bc_dc = build_compare(values_left, values_right);
    let compare = Box::new(move |l,r| {
        if false {
            &pin_values_left;
            &pin_values_right;
        }
        bc_dc(l,r)
    });

(Careful with Pin-nin Unpin values, it does nothing except "signal an intent" to the readers, which a pin_ variable name already does (without the false sense of security)).


That being said, I agree that the pattern the OP has in mind is sound, provided our self-referential struct is somehow pinned.

I have thus achieved a dependency-less unsafe-less solution to this problem, it is, however, unstable, since it relies on generators and "their ability to be self-referential":

#![feature(generator_trait, generators)]
use ::std::cmp::Ordering;

pub
type DynComparator<'a> =
    Box<dyn Fn(usize, usize) -> Ordering + 'a>
;

fn build_compare<'a> (
    left: &'a Vec<i32>,
    right: &'a Vec<i32>,
) -> DynComparator<'a>
{
    Box::new(move |i: usize, j: usize| {
        left[i].cmp(&right[j])
    })
}

fn some_compare<'a> (
    keys_left: &'a [i32], values_left: &'a [i32],
    keys_right: &'a [i32], values_right: &'a [i32],
) -> DynComparator<'a> {

    let values_left = values_left.to_vec();
    let values_right = values_right.to_vec();
    
    let generator = static move |mut args| -> ::core::convert::Infallible {
        let compare = build_compare(&values_left, &values_right);
        loop {
            let (i, j) = args;
            let key_left = keys_left[i] as usize;
            let key_right = keys_right[j] as usize;
            args = yield (compare)(key_left, key_right);
        }
    };
    // Use a `RefCell` because generators need `&mut` to be polled.
    let generator = ::core::cell::RefCell::new(Box::pin(generator));
    
    Box::new(move |i, j| {
        use ::core::ops::{Generator, GeneratorState};
        match generator.borrow_mut().as_mut().resume((i, j)) {
            | GeneratorState::Yielded(it) => it,
            | GeneratorState::Complete(unreachable) => match unreachable {},
        }
    })
}

fn main ()
{
    let a = vec![0i32];
    let b = vec![1i32];
    let c = vec![0i32];
    let d = vec![1i32];
    let cmp = some_compare(&a, &b, &c, &d);
    println!("{:?}", (cmp)(0, 0));
}

You should be able to have this code run on stable using the unsafely-hacked versions of generators for stable Rust (the advantage compared to hand-rolling an unsafe-ly self-referential struct directly, is that you know that, eventually (once generators become stable), you will be able to remove this unsafe dependency).

1 Like