Anonymous generic closure inside a struct's iterator

Consider the following problem: we have bytes &[u8] representing &[P], and we want to apply the transformation &[u8] -> P -> T item by item, where each transformation is known at compile time but is selected according to some run-time parameter. We do not want to convert all the values at once; instead, an iterator-based mechanism should let the caller pull only some of the values.

An example where all items would be consumed at once for a specific P=i32 and T=i16 would be:

use std::convert::TryInto;

fn read_all<F: Fn(i32) -> i16>(data: &[u8], op: F) -> Vec<i16>{
    data
        .chunks_exact(std::mem::size_of::<i32>())
        // map to the P space
        .map(|chunk| {
            let chunk: [u8; 4] = match chunk.try_into() {
                Ok(v) => v,
                Err(_) => unreachable!(),
            };
            i32::from_le_bytes(chunk)
        })
        // map to the T space
        .map(op)
        .collect()
}

To allow only part of the iterator to be consumed, I need to store its state. My challenge has been to store the state correctly in the trait system.

This is as far as I have gotten (T is the i16 above, P is the i32 above):

use std::convert::{TryFrom, TryInto};

/// simple interface for POD types to read from little endian
trait FromBytes: Sized + Copy + 'static {
    type Bytes: AsRef<[u8]> + for<'a> TryFrom<&'a [u8]>;
    fn from_le_bytes(bytes: Self::Bytes) -> Self;
}

/// reads a chunk to T.
#[inline]
fn read_item<T: FromBytes>(chunk: &[u8]) -> T {
    let chunk: <T as FromBytes>::Bytes = match chunk.try_into() {
        Ok(v) => v,
        Err(_) => unreachable!(),
    };
    T::from_le_bytes(chunk)
}

/// struct containing an iterator of `T` through the intermediary op P -> T
struct AA<'a, T, P, F, G>
where
    P: FromBytes,
    F: Fn(P) -> T,
    G: for<'b> Fn(&'b [u8]) -> P,
{
    values: std::iter::Map<std::slice::ChunksExact<'a, u8>, G>,
    phantom: std::marker::PhantomData<P>,
    phantom_f: std::marker::PhantomData<F>,
}

impl<'a, T, P, F, G> AA<'a, T, P, F, G>
where
    P: FromBytes,
    F: Fn(P) -> T,
    G: for<'b> Fn(&'b [u8]) -> P,
{
    fn new(data: &'a[u8], op: F) -> Self {
        Self {
            values: data
                .chunks_exact(std::mem::size_of::<P>())
                .map(|x| op(read_item(x))),
            phantom: Default::default(),
            phantom_f: Default::default(),
        }
    }
}

This does not compile because of the anonymous closure |x| op(read_item(x)), whose type does not match the type parameter G. I also tried

.map(read_item).map(op)

and changing the type of values to

values: std::iter::Map<std::iter::Map<std::slice::ChunksExact<'a, u8>, G>, F>

but that does not work either; it fails with

impl<'a, T, P, F, G> RequiredPlainDataPage1<'a, T, P, F, G>
    |                     - this type parameter
...
230 |               values: page
    |  _____________________^
231 | |                 .buffer()
232 | |                 .chunks_exact(std::mem::size_of::<P>())
233 | |                 .map(read_item)
234 | |                 .map(op),
    | |________________________^ expected type parameter `G`, found fn item
    |
    = note: expected struct `std::iter::Map<std::iter::Map<ChunksExact<'a, _>, G>, _>`
               found struct `std::iter::Map<std::iter::Map<ChunksExact<'_, _>, for<'r> fn(&'r [u8]) -> P {read_item::<P>}>, _>`
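The error above says that read_item::<P> has its own unique "fn item" type, which is picked inside new and so cannot be the caller-chosen parameter G. One way to give the inner map a nameable type is to coerce the function to a plain function pointer. The following is only a sketch under assumptions: read_i32 and Values are hypothetical stand-ins fixed to P = i32 and T = i16, and calls through a function pointer may be harder for the compiler to inline than a generic parameter.

```rust
use std::convert::TryInto;
use std::iter::Map;
use std::slice::ChunksExact;

// Hypothetical stand-in for `read_item::<i32>` from the question.
fn read_i32(chunk: &[u8]) -> i32 {
    // `chunks_exact(4)` only yields 4-byte slices, so the conversion cannot fail.
    i32::from_le_bytes(chunk.try_into().unwrap())
}

// The inner mapping function is spelled as a function pointer type, so only
// the caller-supplied `F` remains generic.
struct Values<'a, F: Fn(i32) -> i16> {
    values: Map<Map<ChunksExact<'a, u8>, fn(&[u8]) -> i32>, F>,
}

impl<'a, F: Fn(i32) -> i16> Values<'a, F> {
    fn new(data: &'a [u8], op: F) -> Self {
        // Coerce the fn item to a fn pointer so its type matches the field.
        let read: fn(&[u8]) -> i32 = read_i32;
        Self {
            values: data
                .chunks_exact(std::mem::size_of::<i32>())
                .map(read)
                .map(op),
        }
    }
}

// Delegate to the stored adapter chain so values can be pulled one at a time.
impl<'a, F: Fn(i32) -> i16> Iterator for Values<'a, F> {
    type Item = i16;

    fn next(&mut self) -> Option<i16> {
        self.values.next()
    }
}
```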

Any ideas?

Note that I am using trait bounds for the functions because I want them to be inlined; this code is called very often and is performance critical.

Does something like this work for you (based on your first example)? This approach should work on stable, but if you need a concrete name for the iterator type, you'll need #![feature(type_alias_impl_trait)].

use std::convert::TryInto;

fn read_all<'a, F: Fn(i32) -> i16 + 'a>(data: &'a [u8], op: F) -> impl 'a + Iterator<Item=i16> {
    data
        .chunks_exact(std::mem::size_of::<i32>())
        // map to the P space
        .map(|chunk| {
            let chunk: [u8; 4] = match chunk.try_into() {
                Ok(v) => v,
                Err(_) => unreachable!(),
            };
            i32::from_le_bytes(chunk)
        })
        // map to the T space
        .map(op)
}
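Because the returned iterator is lazy, the caller decides how many items get decoded. Here is a small usage sketch with made-up input: read_all is repeated in condensed form (with an explicit F: 'a bound) only so the block stands on its own, and first_two is a hypothetical helper.

```rust
use std::convert::TryInto;

// Condensed version of `read_all`; nothing is decoded until an item is requested.
fn read_all<'a, F: Fn(i32) -> i16 + 'a>(
    data: &'a [u8],
    op: F,
) -> impl Iterator<Item = i16> + 'a {
    data.chunks_exact(std::mem::size_of::<i32>())
        // map to the P space
        .map(|chunk| i32::from_le_bytes(chunk.try_into().unwrap()))
        // map to the T space
        .map(op)
}

// Pull only the first two items; any remaining chunks are never decoded.
fn first_two(data: &[u8]) -> Vec<i16> {
    read_all(data, |x| x as i16).take(2).collect()
}
```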

You can get a concrete name for the iterator type if you implement Iterator yourself. Other notes:

  • I think you meant G: /*...*/ -> T not -> P and refactored accordingly
  • I made everything use the order <P, T, F> for consistency as part of that

But if I guessed wrong I think you'll still get the idea from the example.
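The example referred to here is not reproduced in this thread. As a rough idea of what a hand-written iterator could look like, here is a minimal sketch, not the poster's code. Decoded is a hypothetical name, the parameters follow the <P, T, F> order, and the FromBytes impl for i32 is an assumption added so the block compiles on its own.

```rust
use std::convert::{TryFrom, TryInto};
use std::marker::PhantomData;
use std::slice::ChunksExact;

trait FromBytes: Sized + Copy + 'static {
    type Bytes: AsRef<[u8]> + for<'a> TryFrom<&'a [u8]>;
    fn from_le_bytes(bytes: Self::Bytes) -> Self;
}

// Assumed impl for illustration; the inherent `i32::from_le_bytes` is called here.
impl FromBytes for i32 {
    type Bytes = [u8; 4];
    fn from_le_bytes(bytes: [u8; 4]) -> Self {
        i32::from_le_bytes(bytes)
    }
}

#[inline]
fn read_item<P: FromBytes>(chunk: &[u8]) -> P {
    let bytes: P::Bytes = match chunk.try_into() {
        Ok(v) => v,
        // chunks come from `chunks_exact(size_of::<P>())`
        Err(_) => unreachable!(),
    };
    P::from_le_bytes(bytes)
}

// Every field has a nameable type, so the struct itself is nameable.
struct Decoded<'a, P, T, F> {
    chunks: ChunksExact<'a, u8>,
    op: F,
    _marker: PhantomData<(P, T)>,
}

impl<'a, P: FromBytes, T, F: Fn(P) -> T> Decoded<'a, P, T, F> {
    fn new(data: &'a [u8], op: F) -> Self {
        Self {
            chunks: data.chunks_exact(std::mem::size_of::<P>()),
            op,
            _marker: PhantomData,
        }
    }
}

impl<'a, P: FromBytes, T, F: Fn(P) -> T> Iterator for Decoded<'a, P, T, F> {
    type Item = T;

    #[inline]
    fn next(&mut self) -> Option<T> {
        // &[u8] -> P -> T for one chunk at a time
        self.chunks
            .next()
            .map(|chunk| (self.op)(read_item::<P>(chunk)))
    }

    #[inline]
    fn size_hint(&self) -> (usize, Option<usize>) {
        // One output per chunk, so the inner hint is exact for this iterator too.
        self.chunks.size_hint()
    }
}

// Available on stable because `size_hint` above is exact.
impl<'a, P: FromBytes, T, F: Fn(P) -> T> ExactSizeIterator for Decoded<'a, P, T, F> {}
```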

Thanks both for the corrections and for the suggestions.

I thought about implementing Iterator, but if I implement the iterator myself, we lose the TrustedLen guarantee, because TrustedLen is unstable and can only be implemented on the nightly channel. TrustedLen is an important part of the inlining aspect of this problem.

I was able to work around this with the .map(...).map(...) idea, by passing both operators to new instead of naming the function inside the initializer. This allows the first operator to be bound to G, which compiles:

use std::convert::{TryFrom, TryInto};

trait FromBytes: Sized + Copy + 'static {
    type Bytes: AsRef<[u8]> + for<'a> TryFrom<&'a [u8]>;
    fn from_le_bytes(bytes: Self::Bytes) -> Self;
}


fn read_item<T: FromBytes>(chunk: &[u8]) -> T {
    let chunk: <T as FromBytes>::Bytes = match chunk.try_into() {
        Ok(v) => v,
        Err(_) => unreachable!(),
    };
    T::from_le_bytes(chunk)
}

struct AA<'a, T, P, F, G>
where
    P: FromBytes,
    F: Fn(P) -> T,
    G: for<'b> Fn(&'b [u8]) -> P,
{
    values: std::iter::Map<std::iter::Map<std::slice::ChunksExact<'a, u8>, G>, F>,
    // P and T appear only in the bounds, so both need a marker
    phantom: std::marker::PhantomData<(P, T)>,
}

impl<'a, T, P, F, G> AA<'a, T, P, F, G>
where
    P: FromBytes,
    F: Fn(P) -> T,
    G: for<'b> Fn(&'b [u8]) -> P,
{
    fn new(data: &'a[u8], op1: G, op2: F) -> Self {
        Self {
            values: data
                .chunks_exact(std::mem::size_of::<P>())
                .map(op1)
                .map(op2),
            phantom: Default::default(),
        }
    }
}
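For completeness, here is a sketch of how the two-operator constructor might be called. Several parts are assumptions added for illustration: the FromBytes impl for i32, the delegating Iterator impl, the first_n helper, and a marker that covers T as well as P (a parameter that appears only in where clauses is otherwise rejected as unused).

```rust
use std::convert::{TryFrom, TryInto};
use std::iter::Map;
use std::marker::PhantomData;
use std::slice::ChunksExact;

trait FromBytes: Sized + Copy + 'static {
    type Bytes: AsRef<[u8]> + for<'a> TryFrom<&'a [u8]>;
    fn from_le_bytes(bytes: Self::Bytes) -> Self;
}

// Assumed impl for illustration.
impl FromBytes for i32 {
    type Bytes = [u8; 4];
    fn from_le_bytes(bytes: [u8; 4]) -> Self {
        i32::from_le_bytes(bytes)
    }
}

fn read_item<T: FromBytes>(chunk: &[u8]) -> T {
    let chunk: <T as FromBytes>::Bytes = match chunk.try_into() {
        Ok(v) => v,
        Err(_) => unreachable!(),
    };
    T::from_le_bytes(chunk)
}

struct AA<'a, T, P, F, G>
where
    P: FromBytes,
    F: Fn(P) -> T,
    G: for<'b> Fn(&'b [u8]) -> P,
{
    values: Map<Map<ChunksExact<'a, u8>, G>, F>,
    phantom: PhantomData<(P, T)>,
}

impl<'a, T, P, F, G> AA<'a, T, P, F, G>
where
    P: FromBytes,
    F: Fn(P) -> T,
    G: for<'b> Fn(&'b [u8]) -> P,
{
    fn new(data: &'a [u8], op1: G, op2: F) -> Self {
        Self {
            values: data
                .chunks_exact(std::mem::size_of::<P>())
                .map(op1)
                .map(op2),
            phantom: Default::default(),
        }
    }
}

// Delegating keeps the std adapters in charge of iteration and size hints.
impl<'a, T, P, F, G> Iterator for AA<'a, T, P, F, G>
where
    P: FromBytes,
    F: Fn(P) -> T,
    G: for<'b> Fn(&'b [u8]) -> P,
{
    type Item = T;

    fn next(&mut self) -> Option<T> {
        self.values.next()
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        self.values.size_hint()
    }
}

// The caller names both operators, so G is bound to the type of `read_item::<i32>`.
fn first_n(data: &[u8], n: usize) -> Vec<i16> {
    AA::new(data, read_item::<i32>, |x: i32| x as i16)
        .take(n)
        .collect()
}
```

Note that consumers going through this wrapper's Iterator impl see only what the wrapper forwards; code that needs the TrustedLen property of the std adapters would have to use the values field directly.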
