Determine the current index for a possibly advanced iterator

Hey all! I would like to be able to determine the index for an iterator that may have already been advanced.

let v = vec!["a", "b", "c"];

let mut iter = v.into_iter();
assert_eq!(iter.next(), Some("a"));

let mut iter = iter.index(); // Similar to `Iterator::enumerate`, but counting from the beginning of the collection.
assert_eq!(iter.next(), (Some(1), "b")); // Note that the index is 1, not 0!

For std::vec::IntoIter I can determine the current index by reading some private fields of the struct. This requires some unsafe code.

let index = unsafe {
    pub struct Global;
    #[allow(unused)]
    pub struct IntoIter<T, A = Global> {
        buf: std::ptr::NonNull<T>,
        phantom: PhantomData<T>,
        cap: usize,
        alloc: std::mem::ManuallyDrop<A>,
        ptr: *const T,
        end: *const T,
    }
    let iter = &*std::ptr::addr_of!(iter).cast::<IntoIter<T>>();
    if std::mem::size_of::<T>() == 0 {
        iter.ptr as usize
    } else {
        iter.ptr.offset_from(iter.buf.as_ptr()) as usize
    }
};

Since the memory layout for #[repr(Rust)] is not well defined, this code is unsound. Even if the representation was defined, the definition of std::vec::IntoIter is free to change which could invalidate the correctness.

Is there an API available in the standard library to determine the current index of an element for an iterator over a collection where each element has a well defined index? If not, are there any relevant proposals?

P.S. I think the best approach right now would be to require a user to write collection.into_indexed_iter() instead of collection.into_iter().index() to guarantee that the iterator has not been advanced. I would like to make the API similar to enumerate() though, but this may not be possible without unsafe code or a custom IntoIter implementation that tracks the index regardless of whether it is used.

Sounds very much like XYProblem to me.

If you want to work with iterator interface then that's, of course, impossible and such API wouldn't, shouldn't and couldn't be defined. E.g. “random iterator” which just generates random number when you need next one, couldn't know index of sequence in principle).

If you want to work with vectors and slices then you may, of course, create your own iterator which would be able to return index, when asked.

Some vital detail is missing.

Iterator is not guaranteed to track index and iterator is not guaranteed to be related to any container whatsoever.

Take Lines iterator, e.g.: it reads file line-by-line and can be applied to stdin, too (where it would request user to enter new line when it's needed).

What's the index here? Does it include part of stdin that was read before said iterator was created? How may it even know that? And even if not, is it worth keeping counter around just to ensure that someone, somewhere, sometime would be able to call index ?

1 Like

Sorry for not being clear. I am not asking to provide this functionality for all iterators. Only the ones where it makes sense. The iterators you mention indeed don't have a reasonable "index" defined.

I am indeed interested in Vec<T>, [T] and Box<[T]> mostly.

Like ExactSizeIterator, DoubleEndedIterator, there could be an IndexableIterator trait which can, if the iterator implementation allows it, provide an index. I can implement this iterator myself as mentioned before, but I would have to either 1) store additional data to track the index, or 2) read private fields with unsafe code and hope that the definition is correct.

And that's precisely and exactly the difference between Rust and C++.

What you are seeking is perfecly valid approach in C++: if some template function doesn't make some sense for one type and is usable for the other type then everything is fine as long as you don't use that function in a way where it couldn't be used.

Rust approach is the exact opposite: if your trait have some function then said functions is supposed to work unconditionally, for all implementations of trait.

And, of course, there are 3): implement the whole thing manually. Both IndexableIterator trait and Iterator trait.

If you want to propose addition of this thing to standard Rust then you may talk on IRLO but there question of why such thing is needed. What's the usecase?

1 Like

Why would I need to reimplement Iterator?

The use case is for strongly typed indices which shouldn't start at an arbitrary position in the collection.

Because, as you noticed, it's not possible to achieve what you want to achieve without doing unsafe (not just unsafe, but actually unsafe) tricks.

And if you really need something like that then it would be prudent to see if you may just implement what you need wuthout crazy stricks and only then look to see if you may use them to make something faster/smaller/etc.

1 Like

If you mean that I will have to reimplement Iterator implementations then I see that. Otherwise, I am not sure I need to reimplement the Iterator trait itself. An extension trait should do.

Completely agree with the sentiment towards using unsafe to read private fields. That was more for self educational purposes. I would rather store additional data than have that unsafe code if I was writing a library.

No you can't. The unsafe code you wrote is unsound, because the layout of default-repr structs is not guaranteed.

Require an Iterator<Item = (usize, T)> and call .enumerate() on the iterator.

2 Likes

What's wrong with enumerate()? collection.into_iter().enumerate() is the way to do this.

2 Likes

It's not sound. The layout of structs -- even if identically defined -- can differ, unless they have an explicit representation. (And you're otherwise relying on implementation details).

You can make a sound convertible version by wrapping up IntoIter from the beginning in various ways. Otherwise you need an additional method on the iterator to get the current offset or similar in addition to the existing methods.

3 Likes

Right. Does that also mean that there is no way in general to access private fields of foreign types if the type is #[repr(Rust)]?

I realize how badly I've phrased the question. I meant to ask if there is an Iterator API that provides the index of the elements being iterated over, where the index is the well defined index of the element from the original collection.

I guess the answer is no though, the std lib does not need such a thing. It would be nice if some of the iterators that can could expose a current_index(&self) -> usize.

There is nothing wrong with this but it means something different. enumerate works for any iterator and always starts counting from 0; it does not guarantee to return the index of an element in the original collection. Using enumerate in my example would return (Some(0), "b") instead of (Some(1), "b").

Got it, thanks!

Correct. They are called "private" for a reason. (Besides being impossible, you should also not do it, even if it were possible – privacy is used for upholding guarantees of unsafe code, so messing around with private state can cause additional unsoundness, unrelated to layout guarantees.)

Not in itself, but you can create such a wrapper. T is essentially equivalent to calling enumerate directly on the iterator returned from the collection, which should be quite easy to wrap in a newtype and/or bound. (You can directly require an Enumerate<slice::Iter<_>>, wich is only possible to create in the correct/expected way, as far as I'm aware.)

1 Like

Is std::io::Cursor<&[T]> maybe a better fit for what you are trying to achive?

2 Likes

It depends on how you modify the example. Why not stick the enumerate on the initial iterator, instead of creating a new type of iterator?

If you have a slice that is a sub-slice of a longer slice, do you want to get the index in the longer slice? This is impossible.

Is the to turn .into_iter().enumerate() into a single method so that .next() can not be called in between? While that is a fine solution, I want the API to resemble that of enumerate.

It may be impossible for [T], but you could track the index of the first element in the slice through a wrapper IndexSlice<'a T>(usize, &'a [T]) and IndexSliceMut<'a, T>(usize, &'a mut [T]) at the cost of storing an additional usize and having to reimplement the whole API or deref.

Edit: There is no way to implement Deref when you need to add some information to an &[T] and so that wouldn't work. You would have to obtain the IndexSlice* through methods that can return a newly constructed value.

If you haven't sliced the original yet, you could call a utility method that slices and then enumerates with the original indexes. Kinda strange, but...

    use std::ops::Range;

    fn enumerate_slice<T>(
        orig_slice: &[T],
        range: Range<usize>,
    ) -> impl Iterator<Item = (usize, &T)> {
        range.clone().zip(orig_slice[range].iter())
    }

    #[test]
    fn test() {
        let mut iter = enumerate_slice(&['a', 'b', 'c', 'd'], 1..3);
        assert_eq!(Some((1, &'b')), iter.next());
        assert_eq!(Some((2, &'c')), iter.next());
        assert_eq!(None, iter.next());
    }