Sketch of Vec, Slice and Array -> Iterator: Helpful? Clear?

Hello Everyone,

Iterators are clearly a powerful tool. I recall reading once that fold is the universal constructor. Any transformation can be expressed in terms of a fold. If that's true, then all the more power to iterators.

When first delving into Rust, I took a look at how the standard library implemented an iterator for Vec. I suppose one way of describing that experience: "drinking water from a fire-hose". I got wet and did not take much in.

In trying to find inspiration for how to design my current project, I thought I would take another crack at how to unify and consolidate my understanding of how Vec, slice and array iterate over T.

I have attached a schematic of what I hope is a useful resource for the community at large. To make sure it's so, I wonder if you might take a look to share some of your thoughts on how it might be improved. It's dense, but reflects the implementation as best as I could tell. Notwithstanding, I'm hoping I was able to pull a few patterns that resonate with others.

last updated Feb 16, 2021

Roughly speaking, the goals were to:

  • better understand the "namesake" of into_iter() when used with &Vec<T> compared to iter()
  • understand how IntoIterator and Iterator work together for these types (it seems so unnecessarily complicated; what don't I know?)
  • The three types are "same but different"; being so similar, especially when borrowing, how do their differences "play out" when iterating over T elements (e.g., similarly, arrays in C are performant in how they use pointers; how does it play here?)
  • Identify "where the action happens" and why so much indirection
  • Finally, I wanted to understand how traits interact with what I've been calling "go-between" structures in a way that seems powerful and productive (e.g., Iter and IterMut); what is their role and how to think more generally about them?

The chart describes the following (for T, &T and &mut T):

Vec<T> -> Iterator
slice -> Iterator
array -> Iterator

It describes the single path to a "move" semantic and the choices for the "borrows" (read-only and mut).

My intention is to provide early Rustaceans the benefit of my prior efforts, and to develop the presentation to something that resonates with the community at large. Ideally, it would be fodder to how the Rust Book introduces and develops the use of iterators and more generally as one of the ways to unify our data types to benefit from code-reuse.

Thank you in advance for any comments/questions!

PS: I could only include a .png; please feel free to reach out to request a .pdf version of the chart.

1 Like

If there weren't separate Iterator and IntoIterator traits, then collections would have to implement Iterator directly.

That is, however, not great – I'd say it's outright impossible in some cases. Because how do you store the state of the iterator, then? For a Vec, it might be possible to store an additional current_index: usize, but then the question arises: how would it yield the elements efficiently, one-by-one? (Since a vector can't just remove an element from its beginning without shifting everything else to the left.)

In other cases, e.g. slices, it's downright impossible to store the state – a slice is just a pointer and a length, so it can't have additional information embedded into it as to where it is in the iteration. While you can certainly "solve" this problem by just mutating the slice so that it starts pointing to the next element upon every iteration, this is surprising (since iterating over immutable elements will have changed the slice and you can't iterate over it twice), and becomes even harder/impossible to realize when the collection does need additional information, not just a bare index or pointer.

So, the IntoIterator trait serves the purpose of providing specialized Iterator-implementing types, which have the opportunity to steal the contents of, or point back to, the original collection being iterated over, while also providing any private state necessary for performing the iteration. This decouples iteration from storage, and makes collections a tad more memory efficient as well, which can add up (just think about how many Strings or Vecs your programs allocate).

4 Likes

Comments on the schematic:
I don't really understand the schematic. It's too dense and not very clear. I'm not sure how to read it.

  • There are just arrows everywhere.
  • I don't know what the vertical arrows mean.
  • Same for the blue arrows next to the word "Move" and the rectangle of orange and yellow arrows around the world "Borrow".
  • What do you mean by a "witness"? :eyes:
  • There are pieces of floating text where it's not clear what the text is related to or what the text is doing.
    For example, an orange arrow pointing to
core::slice
self.iter()
self.iter_mut()

doesn't mean much to viewers of the schematic. There are more examples of stuff whose relevance isn't clear.

  • Overusing various symbols as shorthand to make the schematic more dense makes it confusing to read. Use shorthand symbols sparingly, if at all.
    When I see a cloud with the word "What?" in it, it's not immediately clear what it means. I have to go look at the legend, speaking of which:
  • The legend in the bottom right is confusing. I wasn't sure what the table(?) under the "Move" was supposed to show. It took a long time to realize that the 1st row that says "T as value, &T..." was the header.
  • Tables should have clear borders or alternating fill for their rows, and columns. Headers should be bolded or have darker fill. Tables look more aesthetic without them, but being able to tell table elements apart is needed for readability.

I think in general, sacrificing readability for density isn't worth it. If you can make it smaller without impacting readability, or if there's another way to frame the information that's smaller and clearer, go ahead.

2 Likes

I'll note that having two different "interfaces" involved is actually normal. Even in .Net you have IEnumerable<out T> that the containers implement, with a GetEnumerator method which returns an IEnumerator<out T> that's the thing which actually iterates as you call the MoveNext method. (That structure maps surprisingly well to IntoIterator::into_iter and Iterator::next.)

3 Likes

There's a lot of overlap between the various methods and the various IntoIterator implementations.
There are 3 different implementations of IntoIterator involving Vec . They are:

impl<T> IntoIterator for Vec<T>             // 1
impl<'a, T> IntoIterator for &'a Vec<T>     // 2
impl<'a, T> IntoIterator for &'a mut Vec<T> // 3

and if you look at the code for implementation numbers 2 and 3, you'll see that they just use .iter() and .iter_mut(), respectively:

impl<'a, T> IntoIterator for &'a Vec<T> {
    type Item = &'a T;
    type IntoIter = slice::Iter<'a, T>;

    fn into_iter(self) -> slice::Iter<'a, T> {
        self.iter()
    }
}

impl<'a, T> IntoIterator for &'a mut Vec<T> {
    type Item = &'a mut T;
    type IntoIter = slice::IterMut<'a, T>;

    fn into_iter(self) -> slice::IterMut<'a, T> {
        self.iter_mut()
    }
}
// https://doc.rust-lang.org/1.49.0/src/alloc/vec.rs.html#2003-2021

The difference between all three is that #1 consumes the vec and gives you an iterator over owned items (Item = T).#2 borrows the vec and gives you an iterator over borrowed items (Item = &T).
#3 mutably borrows the vec and gives you an iterator over mutably borrowed items (Item = &mut T).

The reason that there is overlap between impl<'a, T> IntoIterator for &'a Vec<T> and vec.iter() is that it's convenient. It allows you to write code like:

fn main() {
    let mut vec = vec![1, 2, 3, 4, 5];
    for i in &vec { // iterate over just references
        // i == &i32
        // equivalent to `for i in vec.iter() {`
    }
    
    for i in &mut vec { // iterate over mutable references
        // i == &mut i32
        // equivalent to `for i in vec.iter_mut() {`
    }
    
    for i in vec { // iterate over the owned values
        // i == i32
    }
}

That explains the need for Iter and IterMut, structs that host the state of the iterator (and the required ref to the data). They could implement Iterator (as they do). Why not have Vec implement the method that instantiates Iter, IterMut in addition to IntoIter (as it does for the latter when impl IntoIterator)?...

In my current understanding: To enable coercion to “keep looking”, use Deref if need be... where Rust will find slice.iter() and slice.iter_mut() (Rust never has to coerce to find into_ iter(); this method is generated using monomorphization so finds it on the receiver). So, to unify Vec with slice for implementing an iterator for borrows. Agree?

... What about the IntoIterator trait?... what value does this indirection/interface provide?

You can't make a trait for .iter()/.iter_mut() because they take the lifetime of their borrowed receiver type as a parameter. If you try, you'll find that there's no way to name the return type:

trait Iterable {
    type Iter;
    fn iter(&self) -> Self::Iter;
}

This might seem to work just fine until you try to implement it for something:

impl<T> Iterable for [T] {
    type Iter = std::slice::Iter<'?, T>;
    // what do you put here: ----^^
    // to represent the lifetime chosen by the caller of this method?
    fn iter<'a> (&'a self) -> std::slice::Iter<'a, T>;
}

What you need is a generic associated type that can be parameterized with a lifetime without giving up its genericity:

trait Iterable {
    type Iter<'s>; // this associated type can be parameterized with any lifetime...
    fn iter<'a>(&'a self) -> Self::Iter<'a>; // e.g., with that of self
}

But that's not possible in today's Rust. It's in the works, though. (If the above explanation is confusing, here is a decent place to start)

Fortunately, although an Iterable trait would be nice sometimes, it's not extremely necessary because IntoIterator can be implemented for references, as already noted.

impl<'a> IntoIterator for &'a [T] { // the `'a` being part of the Self type...
    type IntoIter = std::slice::Iter<'a, T>; // allows you to use it here without genericity
    fn into_iter(self) -> Self::IntoIter;
}
2 Likes

Thank you for your response. Does what you are saying hold for creating methods that are part of the Vec implementation? (so not traits) This is how slice implements iter() (a slice method that instantiates a struct Iter)

To be honest, I don't understand what you are asking here. There's already Vec::iter() and Vec::iter_mut(), as well as IntoIterator impls for Vec, &Vec and &mut Vec with matching semantics.

Having inherent impls directly on Vec is useful for reducing noise in contexts where deref coercion does not occur. This is also the reason why there is a separate Vec::len() or Vec::is_empty() method. For example, higher-order functions. The following code would not compile if there were no such inherent methods:

fn total_length<T>(iter: impl Iterator<Item=&Vec<T>>) -> usize {
    iter.iter()
        .map(Vec::len)
        .sum()
}

and you would have to write a seemingly redundant closure, |vec| vec.len() instead.

It also simplifies interfaces, in generic code. It basically spares us a lot of explicit typing of .into_iter(), .iter(), .iter_mut() where a function should "obviously" accept an "iterable" type. Basically, if a function takes an Iterator directly, then you can't pass a Vec nor a slice, because they are not themselves Iterators. But that makes interfaces look dumb and confusing. Therefore, functions usually take IntoIterator instead, and call .into_iter() inside. Which of the following piece of code would you rather read and write?

// without IntoIterator
[1, 2, 3]
    .iter()
    .chain([4, 5, 6].iter())
    .zip(['a', 'b', 'c', 'd', 'e', 'f'].iter())

or:

// with IntoIterator
vec![1, 2, 3]
    .iter()
    .chain(&[4, 5, 6])
    .zip(&['a', 'b', 'c', 'd', 'e', 'f'])

In Rust, every kind of construct, trait, type, etc. has its well-defined, reasonable purpose — it's not like some other design-by-committee languages where features are incidental or purely legacy/historical artefacts.

2 Likes

I did not think those were implemented on Vec. I thought they require Deref in order to Vec -> slice. Right?

I’ll think more about what you are conveying here. The example with and without IntoIterator is elucidating.

it's not like some other design-by-committee languages

I concur 100%.

I'm not sure I understand the question. I thought you were asking why there wasn't a trait for iter_mut and iter like there is for into_iter, so that's what I responded to. Inherent methods (those not part of a trait) can have any signature they want, so [T]::iter works fine.

If the question is "why is iter defined for [T] but not Vec<T>?" it's not because you couldn't write Vec<T>::iter, but because you don't need to; you get it for free because Vec<T> dereferences to [T]. The same mechanism lets you call .iter() on other kinds of pointers, like &mut [T], Rc<[T]>, Box<[T]>, &Rc<Box<&Ref<&mut Arc<&&mut [T]>>>>... it's a composable feature.

Sorry, yes, I was confusing that with <&[mut] Vec as IntoIterator>, and I think they'd be useful, although not critical.

Thank you for your detailed and constructive feedback. By definition of your experience reading it, I agree. There are a "boat-load" of arrows. If I were presenting this in person, I would do so over several slides and with a build for some of the ideas. But that is not the format. Notwithstanding, I have iterated on the chart to reduce the complexity of both the color scheme and number of arrows. I made other changes to try and clarify. This all said, it remains a WIP while I continue to integrate the feedback received to-date.

I have thus far kept the idea of a "witness" but changed the label to "instantiate". In my understanding, a "witness" in our context, is used to communicate to the compiler what generics and traits need to be implemented based on the concrete types instantiated in the code. The term is more often used to convey the existence and validity of a relation between types by definition of instantiation (an idea that is related to "proof by instantiation" when working with strongly typed languages).

I'm still playing with how this idea might be useful here; I could be off base. What I'm trying to internalize/intuit the "defining a function and function application" is to "specifying a composition of traits held together by a common output/input type spec and executing that chain"... rough. But note how IntoIterator and Iterator traits are composed using IntoIter (output of one, input to the other). The realization of that composition occurs when the code calls a function that instantiates an instance of IntoIter.

Yes, this is what I was talking about. Having gone through this exercise, I can clearly see why we don't need to... it comes for free from the slice representation of Vec... that and many other functions. Moreover, if I did implement it, I would prevent/frustrate an inherently powerful feature of how the dot operator works in Rust.

This is impressive: create a single layer of references using dot notation to access methods implemented at different layers of the data type.

  • use a trait method on Vec to move elements out
  • use the power of the dot notation to access the "down-casted" version of the type to borrow elements. This is made possible by in fact, actively not having the iter and iter_mute functions implemented for Vec.

More generally, when thinking about the functionality for the abstractions consider:

struct FullFeatured<T> {
  data: T,
  other: Other
}

// borrow some view of the data; but that's it, never own the data.
struct DataOnly<'a, T> {
   data: &'a T
}

I wonder if DataOnly can ever own the data? in other words, what we see play-out in Vec and slice is a general truism. Do you know what I mean?

If this is true, then a general pattern would also be to have the majority of the functions that operate on the data, do so as part of the DataOnly implementation... That also means that the functions implemented on FullFeatured must have something "special" about them; is it only that they need ownership?

Updated version v4.1. Still a WIP but is a simpler, and likely more accurate representation.

You might be drawing connections I didn't mean to make. Let's back up a minute.

  1. It's true that .into_iter() on a Vec<T> consumes the Vec and yields Ts, while .iter() and .iter_mut() do not consume and do not yield Ts.
  2. It's also true that .into_iter() is a method of a trait, IntoIterator, while .iter() and .iter_mut() are inherent methods.
  3. Finally, it's true that .into_iter() is implemented on Vec<T> precisely, while .iter() and .iter_mut() are implemented for [T] and used on Vec through Deref and DerefMut.

But these three facts are essentially orthogonal; they have almost nothing to do with each other. You can't connect any of them with the word "because" or "therefore" because they all have different reasons.

  1. is just the definition of how iteration for those items works.
  2. is a consequence of the fact that there is no trait for iter_mut() and iter(). Partly this is because such a trait is inexpressible without GATs, as I mentioned earlier; however, you also don't really need a trait version of those methods because slices (and all well-behaved data structures) just make foo.iter() an alias for (&foo).into_iter() and foo.iter_mut() an alias for (&mut foo).into_iter(). There is no need for Iterable and IterableMut traits to implement on Vec because they would be redundant with the IntoIterator implementations that already exist for &Vec and &mut Vec.
  3. is simply the consequence of two (unrelated) limitations in the Rust language itself. In the first place, it's not possible to implement IntoIterator for [T] without breaking backwards compatibility, although the limitation that originally prevented this from working has since been removed. But it's also not possible to call self-taking methods on non-Copy receivers through a pointer unless the pointer is specifically Box, because there's no DerefMove trait. This is another thing that has been proposed but no solid design for DerefMove has ever been established and it is less popular than GATs, so it probably won't be implemented any time soon.

My point is that there's not a grand unified theory that explains all these things, like you seem to be grasping for. All these facts are true but for different and mostly unrelated reasons. Don't look too hard for patterns or you'll start to see them where they aren't there.

As for other data structures that relate to each other like Vec<T> and [T], consider String and str, PathBuf and Path. The thing to note about these is that none of the "borrowed" types are references; instead, they're the thing being referenced, because that's the only way you can implement Deref. Your suggested system of FullFeatured<T> and DataOnly<'_, T> would not be able to implement Deref correctly (although I think you could do something with trait objects...)

2 Likes

Thank you. You are indeed getting at the heart of what I am trying to accomplish. So, I will lower my expectation (if you will :). What I'm after is a consistent sequence of "how to apply" in a way that is unified in that it is consistent throughout how the functions might be composed, and otherwise to re-use code in a scalable way.

You made several helpful points and demonstrations. When you said:

... in terms of "what comes first, chicken or the egg" kind of thing, are you suggesting implement the logic in (&foo).into_iter(), then use it to implement foo.iter()?

Am I right to relate what you are saying here to the array in C. So, "they're the thing being referenced" is like my using a pointer to access the array itself (the array variable is a pointer in C). &str is not a borrow but how we access the data itself. Yes?

In going from FullFeatured to DataOnly I believe you said that I would not be able to correctly implement Deref. I thought that would have been the case for AsRef but not Deref. I have been modeling deref as a version of my struct that is the serializable version of my struct (i.e., show me the data minus any functions)... also as the properties that are likely to define equality. Is that a flawed view in your opinion?

In the link you posted

Is there any reason Box<[T]> shouldn't implement IntoIter ?

IntoIter is a struct, or a type spec made by the IntoIterator trait. How does this suggestion make sense?

One last question: If you have a data type that hosts a collection of data, generic over T, how do you think about the design of how to iterate over the data?

Would that not undermine the ownership boundary? I.e., let a borrow take ownership is precisely not allowed. True unless working in an “inner mutability” context created using Box?

This makes sense. The chain, zip, eq_cmp etc... all require a second Iterator but does so via IntoIterator. Precisely via U: IntoIter; a spec expressed using a trait alias of IntoIterator with one of the two type parameters specified:
IntoIterator<Item = Self::Item> where IntoIterator has two type parameters Item and IntoIter.

fn chain<U>(self, other: U) -> Chain<Self, U::IntoIter>
    where
        Self: Sized,
        U: IntoIterator<Item = Self::Item>,
    {
        Chain::new(self, other.into_iter())
    }

Similarly, the for loop.

Something was still not fully resonating with me about the design until re-reading the posts this morning. Your intuition in a prior post to talk about why a Collection generally is not the right place to store the state of the Iterator, was on to something.

If we submit that a Collection cannot also be an Iterator AND we want to treat a Collection as Iterator "on demand", how might we accomplish this?

This is a powerful construct. You have two structures that are inherently incompatible with each other, in this case one is an observer of the other (not allowed to be self-aware :)), but using the type system and traits, we create an infrastructure that treats each type as if they were "one in the same".

That's big. I get the idea of unifying types, but I did not see this one before now.

Iterator<Item = T> ~ Collection<T>

// some combination of coercion and dispatch
Collection<T> -> Iterator<Item = T>
Iterator<Item = T> -> Collection<T>

I suspect this might help simplify and convey an "inherent truth" about iterating over Vec, Array and slices (keeping in mind what @trentj mentioned earlier; don't look for something that isn't there).

Do you know where else I see this general "incompatibility"? Types that read from themselves to generate a report/custom view of "self", i.e., a subset of self-referencing data structures. The ones that I would exclude are those where the "view" is an element of the collection used to extend the collection; so truly conceptually inseparable (e.g., a struct that hosts a list of types, where a new type is expressed in terms of the types already in the collection).

No, they all take/require an IntoIterator (but every Iterator is also an IntoIterator)

2 Likes