Determine the current index for a possibly advanced iterator

mickvangelderen · December 5, 2023, 5:54pm

Hey all! I would like to be able to determine the index for an iterator that may have already been advanced.

let v = vec!["a", "b", "c"];

let mut iter = v.into_iter();
assert_eq!(iter.next(), Some("a"));

let mut iter = iter.index(); // Similar to `Iterator::enumerate`, but counting from the beginning of the collection.
assert_eq!(iter.next(), (Some(1), "b")); // Note that the index is 1, not 0!

For std::vec::IntoIter I can determine the current index by reading some private fields of the struct. This requires some unsafe code.

let index = unsafe {
    pub struct Global;
    #[allow(unused)]
    pub struct IntoIter<T, A = Global> {
        buf: std::ptr::NonNull<T>,
        phantom: PhantomData<T>,
        cap: usize,
        alloc: std::mem::ManuallyDrop<A>,
        ptr: *const T,
        end: *const T,
    }
    let iter = &*std::ptr::addr_of!(iter).cast::<IntoIter<T>>();
    if std::mem::size_of::<T>() == 0 {
        iter.ptr as usize
    } else {
        iter.ptr.offset_from(iter.buf.as_ptr()) as usize
    }
};

Since the memory layout for #[repr(Rust)] is not well defined, this code is unsound. Even if the representation was defined, the definition of std::vec::IntoIter is free to change which could invalidate the correctness.

Is there an API available in the standard library to determine the current index of an element for an iterator over a collection where each element has a well defined index? If not, are there any relevant proposals?

P.S. I think the best approach right now would be to require a user to write collection.into_indexed_iter() instead of collection.into_iter().index() to guarantee that the iterator has not been advanced. I would like to make the API similar to enumerate() though, but this may not be possible without unsafe code or a custom IntoIter implementation that tracks the index regardless of whether it is used.

khimru · December 5, 2023, 6:20pm

Sounds very much like XYProblem to me.

If you want to work with iterator interface then that's, of course, impossible and such API wouldn't, shouldn't and couldn't be defined. E.g. “random iterator” which just generates random number when you need next one, couldn't know index of sequence in principle).

If you want to work with vectors and slices then you may, of course, create your own iterator which would be able to return index, when asked.

Some vital detail is missing.

Iterator is not guaranteed to track index and iterator is not guaranteed to be related to any container whatsoever.

Take Lines iterator, e.g.: it reads file line-by-line and can be applied to stdin, too (where it would request user to enter new line when it's needed).

What's the index here? Does it include part of stdin that was read before said iterator was created? How may it even know that? And even if not, is it worth keeping counter around just to ensure that someone, somewhere, sometime would be able to call index ?

mickvangelderen · December 5, 2023, 6:22pm

Sorry for not being clear. I am not asking to provide this functionality for all iterators. Only the ones where it makes sense. The iterators you mention indeed don't have a reasonable "index" defined.

I am indeed interested in Vec<T>, [T] and Box<[T]> mostly.

Like ExactSizeIterator, DoubleEndedIterator, there could be an IndexableIterator trait which can, if the iterator implementation allows it, provide an index. I can implement this iterator myself as mentioned before, but I would have to either 1) store additional data to track the index, or 2) read private fields with unsafe code and hope that the definition is correct.

khimru · December 5, 2023, 6:33pm

And that's precisely and exactly the difference between Rust and C++.

What you are seeking is perfecly valid approach in C++: if some template function doesn't make some sense for one type and is usable for the other type then everything is fine as long as you don't use that function in a way where it couldn't be used.

Rust approach is the exact opposite: if your trait have some function then said functions is supposed to work unconditionally, for all implementations of trait.

And, of course, there are 3): implement the whole thing manually. Both IndexableIterator trait and Iterator trait.

If you want to propose addition of this thing to standard Rust then you may talk on IRLO but there question of why such thing is needed. What's the usecase?

mickvangelderen · December 5, 2023, 6:36pm

Why would I need to reimplement Iterator?

The use case is for strongly typed indices which shouldn't start at an arbitrary position in the collection.

khimru · December 5, 2023, 6:46pm

Because, as you noticed, it's not possible to achieve what you want to achieve without doing unsafe (not just unsafe, but actually unsafe) tricks.

And if you really need something like that then it would be prudent to see if you may just implement what you need wuthout crazy stricks and only then look to see if you may use them to make something faster/smaller/etc.

mickvangelderen · December 5, 2023, 6:49pm

If you mean that I will have to reimplement Iterator implementations then I see that. Otherwise, I am not sure I need to reimplement the Iterator trait itself. An extension trait should do.

Completely agree with the sentiment towards using unsafe to read private fields. That was more for self educational purposes. I would rather store additional data than have that unsafe code if I was writing a library.

H2CO3 · December 5, 2023, 7:23pm

No you can't. The unsafe code you wrote is unsound, because the layout of default-repr structs is not guaranteed.

Require an Iterator<Item = (usize, T)> and call .enumerate() on the iterator.

tczajka · December 5, 2023, 7:24pm

What's wrong with enumerate()? collection.into_iter().enumerate() is the way to do this.

quinedot · December 5, 2023, 7:25pm

It's not sound. The layout of structs -- even if identically defined -- can differ, unless they have an explicit representation. (And you're otherwise relying on implementation details).

You can make a sound convertible version by wrapping up IntoIter from the beginning in various ways. Otherwise you need an additional method on the iterator to get the current offset or similar in addition to the existing methods.

mickvangelderen · December 5, 2023, 7:44pm

Right. Does that also mean that there is no way in general to access private fields of foreign types if the type is #[repr(Rust)]?

I realize how badly I've phrased the question. I meant to ask if there is an Iterator API that provides the index of the elements being iterated over, where the index is the well defined index of the element from the original collection.

I guess the answer is no though, the std lib does not need such a thing. It would be nice if some of the iterators that can could expose a current_index(&self) -> usize.

There is nothing wrong with this but it means something different. enumerate works for any iterator and always starts counting from 0; it does not guarantee to return the index of an element in the original collection. Using enumerate in my example would return (Some(0), "b") instead of (Some(1), "b").

Got it, thanks!

H2CO3 · December 5, 2023, 7:50pm

Correct. They are called "private" for a reason. (Besides being impossible, you should also not do it, even if it were possible – privacy is used for upholding guarantees of unsafe code, so messing around with private state can cause additional unsoundness, unrelated to layout guarantees.)

Not in itself, but you can create such a wrapper. T is essentially equivalent to calling enumerate directly on the iterator returned from the collection, which should be quite easy to wrap in a newtype and/or bound. (You can directly require an Enumerate<slice::Iter<_>>, wich is only possible to create in the correct/expected way, as far as I'm aware.)

Bruecki · December 5, 2023, 7:50pm

Is std::io::Cursor<&[T]> maybe a better fit for what you are trying to achive?

tczajka · December 5, 2023, 8:03pm

It depends on how you modify the example. Why not stick the enumerate on the initial iterator, instead of creating a new type of iterator?

If you have a slice that is a sub-slice of a longer slice, do you want to get the index in the longer slice? This is impossible.

mickvangelderen · December 5, 2023, 8:22pm

Is the to turn .into_iter().enumerate() into a single method so that .next() can not be called in between? While that is a fine solution, I want the API to resemble that of enumerate.

It may be impossible for [T], but you could track the index of the first element in the slice through a wrapper IndexSlice<'a T>(usize, &'a [T]) and IndexSliceMut<'a, T>(usize, &'a mut [T]) at the cost of storing an additional usize and having to reimplement the whole API or deref.

Edit: There is no way to implement Deref when you need to add some information to an &[T] and so that wouldn't work. You would have to obtain the IndexSlice* through methods that can return a newly constructed value.

jumpnbrownweasel · December 5, 2023, 8:22pm

If you haven't sliced the original yet, you could call a utility method that slices and then enumerates with the original indexes. Kinda strange, but...

    use std::ops::Range;

    fn enumerate_slice<T>(
        orig_slice: &[T],
        range: Range<usize>,
    ) -> impl Iterator<Item = (usize, &T)> {
        range.clone().zip(orig_slice[range].iter())
    }

    #[test]
    fn test() {
        let mut iter = enumerate_slice(&['a', 'b', 'c', 'd'], 1..3);
        assert_eq!(Some((1, &'b')), iter.next());
        assert_eq!(Some((2, &'c')), iter.next());
        assert_eq!(None, iter.next());
    }

system · March 4, 2024, 8:23pm

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.

Topic		Replies	Views
Access iterator without advancing it help	6	1369	March 31, 2024
Getting current item of iterator help	15	876	October 14, 2024
IntoIter<T> and current() help	16	399	October 14, 2024
Access iterator in for loop; refactorings not working; mutable referance to non-mutable object; sharing large base class	1	823	January 12, 2023
Add .iter_idx() and .iter_mut_idx()	4	1290	January 12, 2023

Determine the current index for a possibly advanced iterator

Related topics