Collect for ExactSizedIterator?

Is there any API to collect from an ExactSizedIterator? something like FromExactSizedIterator and corresponding collect?

If not, are there any plans to consider this? Alternatively, does collect have a specialization for ExactSizedIterator?

At the arrow crate (from Apache arrow), are working with immutable containers and we have a significant perf difference between creating a new container out of transformation applied over an existing container (which has a known size).

Is there something equivalent for Extend?

Iterator::collect uses the FromIterator trait. Most FromIterator implementations call Iterator::size_hint to pre-allocate if possible. For types that implement ExactSizeIterator correctly, this pre-allocation is guaranteed to be optimal.

For example, iter.collect::<Vec<_>>() always starts by allocating a Vec with capacity equal to the lower bound returned by iter.size_hint(). If iter implements ExactSizeIterator, then this must be equal to iter.len().

The same is true for typical Extend impls.

Can you be more specific about which types and iterators you are using, and the performance differences you are seeing?

3 Likes

Hi @mbrubeck , thanks a lot for your answer.

The code where we observe such differences is here.

Note how the only difference between extend_from_iter and extend_from_exact_iter in there is the if statement inside the for loop, which in extend_from_iter is necessary due to the unknown sized nature of the iterator.

Given a slice, there is a 2x difference between

// slices' items are monotonically increasing.
let lengths = slice.windows(2).map(|offset| offset[1] - offset[0]);

let mut buffer = MutableBuffer::new(0);
buffer.extend_from_exact_iter(lengths);

and

let buffer: MutableBuffer = slice
    .windows(2)
    .map(|offset| offset[1] - offset[0])
    .collect();

I could try to reproduce this in a smaller example, but note that because we use a custom allocator (to enforce that allocations are always along cache lines) and stable rust, our growable container (MutableBuffer) does not rely neither on Vec nor RawVec (nor many of the alloc functions).

It's not sound to rely on ExactSizeIterator implementations being correct. This code from your branch has undefined behavior if iterator.len() returns an incorrect value. Since ExactSizeIterator is a safe trait, this means that someone could cause undefined behavior in safe Rust, by implementing ExactSizeIterator incorrectly and then calling your safe method:

pub fn extend_from_exact_iter<T: ArrowNativeType, I: ExactSizeIterator<Item = T>>(
    &mut self,
    iterator: I,
) {
    let len = iterator.len() * std::mem::size_of::<T>();`

    self.reserve(len);

    let mut dst = unsafe { self.data.as_ptr().add(self.len) as *mut T };
    for item in iterator {
        // note how there is not reserve here (compared with `extend_from_iter`)
        unsafe {
            std::ptr::write(dst, item);
            dst = dst.add(1);
        }
    }
    self.len += len;
}

There is an unsafe trait TrustedLen that is designed for this sort of specialization, but it is not yet available in stable Rust. See the tracking issue for some details.

5 Likes

Good catch. However, that is solved through an assert_eq! inside the loop. It seems that the if inside the loop is expensive.

Could it be that the performance difference derives from the issue that SetLenOnDrop addresses, Nightly performance issue with vec::extend_with_element · Issue #32155 · rust-lang/rust · GitHub?

An assert_eq also contains an if that has to run in each iteration.

2 Likes

It's certainly possible. I had to replicate SetLenOnDrop to make SmallVec::extend perform well while still handling iterator panics correctly: Don't leak on panic in extend by mbrubeck · Pull Request #137 · servo/rust-smallvec · GitHub

Thanks a lot for your comments. This is outside my comfort zone and I am strugling even to understand why the SetLenOnDrop is needed / addresses the issue.

Could you clarify or point me to a resource about this? When I read the documentation of ExactSizeIterator, there is a very clear "must" on it. It is true that it does not describe what happens if people do not (e.g. panic or UB?).

Is it the argument that if the trait is safe, there should be no way for it to lead to undefined behavior, and that is why the consumers should also verify that contract?

Because in this case, I can't really see the use-case for ExactSizeIterator: the only optimization I can think of is to use unsafe to speed up iterations, and ExactSizeIterator does not allow that. Then what is it useful for?

Yes, precisely. Errors in safe code must not lead to undefined behavior. So if memory safety of some unsafe code relies on a trait implementation outside its control, that trait must be marked unsafe.

There are some other minor differences between ExactSizeIterator and TrustedLen (discussed in the original pull request), but for the discussion here, the only difference that matters is that one is unsafe and the other is not.

Indeed, I don't think it's very useful for optimizations. It can be useful in safe code that uses the length of the iterator for other purposes. These cases are somewhat rare, in my experience. For one example, the Zip iterator adaptor requires it for implementing DoubleEndedIterator::next_back.

Note that if ExactSizeIterator is implemented incorrectly, this may produce incorrect results, but it cannot cause undefined behavior.

If I understand correctly, it's needed only because of a bug in the optimizer where writing to the self.len field causes it to believe (incorrectly) that other fields from self might also be modified, which inhibits other loop optimizations.

1 Like

Safe code is allowed to produce garbage, it's just not allowed to be memory-unsafe.

So, for example, a sort is allowed to not sort if an Ord implementation doesn't meet the MUST requirements on the trait (totality, asymmetry, and transitivity). It can even panic in that case. It just can't invoke UB.

Or, to use a size_hint-related example, if the size_hint is (4, Some(4)), you're allowed to do anything safe if the iterator isn't actually that long -- for example you could collect using something like for _ in 0..4 { v.push(it.next().unwrap()); } and people wouldn't be allowed to complain about the panic that would emit if the iterator was actually shorter than 4 items, because it would mean that the iterator didn't meet its MUST requirements.

That's because it doesn't exist for optimization. It exists for two things:

  • because people like having a .len() on iterators sometimes
  • to be able to return a from-the-left position from Iterator::rposition (this is why it doesn't allow lengths above usize::MAX)
1 Like

By the way, if you do use SetLenOnDrop, I think it will be important to split your code into two loops: In the first loop, you only write to the pre-reserved buffer. The compiler will be able to see that self is never modified in this loop, and optimize it better. In the second loop, you write any remaining elements that don't fit in the pre-reserved space. In this loop you will need to potentially reserve more space before writing each element, so the same optimizations will not be possible.

One last note: I see that you also have an extend_from_slice method. In my experience, this covers the majority of uses cases for extend_from_exact_iter, without the implementation pitfalls.

@mbrubeck , thank you so much for all your help and explanations. It really made a difference in understanding this aspect, and I am humbled by having your time to explain this.

Yeah, I incorrectly assumed that ExactSizedIterator could be used in an unsafe env. Without that assumption, we do not benefit from extend_from_exact_iter at all.

It happens is that extend is not very fast. Basically, we currently ask our users to

  1. initialize a buffer with zeros (indirectly calling alloc_zeroed)
  2. use iter_mut to populate it

The hypothesis that I am testing is that we can offer a safe and faster API to build buffers by starting from an uninitialized buffer and growing it via extend.

It happens that, in the implementation I linked to, initializing with zero and using iter_mut is faster than extend_from_iter. This is not unexpected as the iter_mut does not perform extra checks. OTOH, the extend_from_exact_iter is faster (-25%) than iter_mut (but unsafe, as you mentioned).

So, we lose the convenience and expressiveness of collect (because people will go for faster option, even if it requires more code), as well as the speed of an implementation that a "TrustedLen" would offer.

One option is to offer an unsafe extend_from_trusted_len while TrustedLen is not stabilized.

Even if TrustedLen is stabilized, is there any trait to implement "ExtendFromTrustedLen" that would allow people to create a custom container and use TrustedLen (without the specialization stabilization, which I do not think I should approach given its complexity)), or would we need to start a new RFC for something like that?

I think there's a good chance this could be improved by optimizing extend with SetLenOnDrop and related changes.

In addition, methods like extend_from_slice (which you already have) and from_slice (which you could easily add) can provide safe, optimal replacements for extend and collect for many common ExactSizeIterator use cases. (How many of your use cases involve types that can be accessed as slices?)

It sounds like the current plan is to stabilize specialization first, and then stabilize traits like TrustedLen that are designed to be used with specialization. This will take longer, but the advantage is that we won't end up with new traits and methods that would become obsolete as soon as specialization is stable.

One other option for now would be to create your own unsafe TrustedLen-like trait and implement it for various standard iterators. Consumers of your crate could also implement the trait for their own iterators, but would need wrapper types to implement it for custom iterators from third-party libraries.

1 Like

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.