The Relation Between Collections and Iterators


#1

It seems to me the safety of iterators over collections is enforced by “into_iter” taking ownership of the collection.

This seems at odds with the common uses of collections, for example I have a free memory list that should last the lifetime of the program. I have various algorithms that need to iterate over the list, like find block, merge blocks, that get run at various times (on allocations, on free etc). It seems the need to insert or delete nodes requires access to the collection itself.

What is the ‘Rustic’ way to do these things?


#2

I think that you usually want iter() or iter_mut(), not into_iter().
There the safety is provided by borrowing, not by taking ownership.


#3

How would I do that myself. For example if I define:

trait Iterable {
    type IteratorType;
    fn begin(&mut self) -> Self::IteratorType;
    fn end(&mut self) -> Self::IteratorType;
}

I can create multiple iterators from the same collection, so how do I tie the returned iterator to the lifetime of the collection?


#4

The implementer should add a lifetime parameter on their IteratorType, like slice::Iter has.

Your begin and end put up a warning flag to me that you’re thinking like C++ iterators, basically cursors. A Rust iterator is usually a self-contained range, like what you’d have with a pair of C++ iterators.


#5

I am implementing the iterators from Stepanov’s Elements of Programming in Rust, and they currently work like C++ iterators. Stepanov’s iterators and coordinates go much further into the classification of access patterns as an abstract algebra than Rust does, so you have:

  • single-pass-forward-iterator
  • multi-pass-forward-iterator
  • bidirectional-iterator
  • indexed-iterator
  • random-access-iterator

and then you move on to multi-dimensional iterators and coordinate structures like:

  • bifurcating-coordinate
  • bidirectional-bifurcating-coordinate
  • coordinates with mutable successors

You can see the elements_in_rust code here.

I need to resolve the naming issue above, as I would rather have as systematic way of doing it to avoid having to think of new names all the time. I think that value semantics should be the default, so:

fn half_nonnegative(mut self) -> Self

That leaves the other case:

fn replace_with_half_nonnegative(&mut self)

seems a bit long? maybe:

fn alter_half_nonnegative(&mut self)

I am not sure if I am going to leave the iterators like C++ iterators, or if they should be ranges like the Rust ones. I don’t think Rust iterators will work on trees and higher dimensional structures. So I have two choices, and I would be interested in opinions. I could rename the iterators ‘Cursors’ or ‘Coordinates’ and leave them “unsafe” as an alternative to the Rust built-in ones, or I could try and express the algorithms in terms of Rusts ‘Iterators’ (that are actually ranges).


#6

For examples of Cursor-based APIs in Rust, have a look at these two crates:


#7

Those look interesting, but are a different approach focusing on the collections themselves. The elements of programming stuff is more like a library of algorithms, and you can use them by implementing the API on your collection, which is very minimal.


#8

I believe we’ve already talked about the need for a certain form of higher kindedness to express an iterator that can yield overlapping mutable references. The Iterable trait has the same problem, I think. You want to write something like this, but you cannot:

trait Iterable {
    type Elem;
    type Iter<'_> where for<'x> Iter<'x>: Iterator<Item=&'x Elem>;
    fn iter(&'a self) -> Self::Iter<'a>;
}

#9

I don’t understand this, as it seems like it can be done easily like this:

impl<T> Iterable for [T] {
    type IteratorType = SliceIterator<T>;
    fn begin(&mut self) -> Self::IteratorType {
        SliceIterator::new(self.first_mut().unwrap())
    }
    fn end(&mut self) -> Self::IteratorType {
        SliceIterator::new(unsafe{(self.last_mut().unwrap() as *mut T).offset(1)})
    }
}

#[derive(Clone, PartialEq, Debug)]
struct SliceIterator<T> {
    ptr : *mut T
}

impl<T> SliceIterator<T> {
    fn new(r : *mut T) -> SliceIterator<T> {
        SliceIterator {
            ptr : r
        }
    }
}

impl<T> Reference for SliceIterator<T> where T : Regular {
    type ValueType = T;
}

impl<T> Readable for SliceIterator<T> where T : Regular {
    fn source(&self) -> &T {
        let v : &T;
        unsafe {v = &*((*self).ptr);}
        v
    }
}

impl<T> Writable for SliceIterator<T> where T : Regular {
    fn sink(&self) -> &mut T {
        let v : &mut T;
        unsafe {v = &mut *((*self).ptr);}
        v
    }
}

impl<T> Mutable for SliceIterator<T> where T : Regular {
    fn deref(&self) -> &mut T {
        let v : &mut T;
        unsafe {v = &mut *((*self).ptr);}
        v
    }
}

impl<T> Iterator for SliceIterator<T> where SliceIterator<T> : PartialEq, T : Regular {
    type DistanceType = usize;
    fn alter_successor(&mut self) {
        unsafe {self.ptr = self.ptr.offset(1);}
    }
    fn alter_add(&mut self, n : Self::DistanceType) {
        let m : isize = num::NumCast::from(n).unwrap();
        self.ptr = unsafe {self.ptr.offset(m)}
    }
    fn dif(&self, f : Self) -> <Self as Iterator>::DistanceType {
        num::NumCast::from(
            (self.ptr as usize - f.ptr as usize) / mem::size_of::<T>()
        ).unwrap()
    }
}

#10

Let me rephrase the question. I implemented an iterable trait, free iterators (don’t have to be paired), that allow multiple mutable overlapping regions in a straightforward way and did not encounter a need for higher kinded types. In fact it seems if you hold the reference the the collection in a raw pointer you can do whatever you like as in C/C++. So I don’t understand the statement that you need HKT to do this.


#11

I can very easily produce a dangling pointer from the code you provided:

fn dangle() -> SliceIterator<i32> {
    let mut arr = [1, 2, 3];
    SliceIterator::new(&mut arr);
}

Try it on playpen (It should print 1, instead it prints 32767).

In order to avoid that, the lifetime of Self::IteratorType needs to be bound to the lifetime of the borrow in begin and end, but it can’t be, because that would require Self::IteratorType to be a type constructor of the kind lifetime -> type.

This is easily the most immediate use for higher kinded polymorphism in Rust - using lifetime -> type associated type constructors to create ‘view’ types.


#12

I have added a phantom lifetime parameter to the SliceIterator that I think prevents the dangling pointer. This appears to prevent the problem whist still allowing multiple mutable iterators. What do you think? https://is.gd/nVm1V1

Is there going to be a runtime cost (space or time) to using the phantom type?

Edit: It seems to me the formulation of Rust iterators which combine Stepanov’s Readable and Iterator traits into a single trait is the cause of the need for HKT. Separating the traits, so that Readable has a lifetime parameter but Iterator does not, appears to make the problem go away in a very simple way. My only concern is if the phantom type is generating any runtime code.


#13

Can’t you work around that with an intermediate trait and HRTB? https://is.gd/Hnr7pa

although I’m not sure if that works for non-'static types (maybe requires specialization?)


#14

Yes, but this is convoluted compared to something that just works.


#15

Your link doesn’t work, so I’m not sure what you did. Phantom types do not have a runtime cost.


#16

The link works for me, so Im not sure why (do I need to use Gist instead of Shorten?). Ill cut and paste the code from the playground here:

trait Readable<'a, T> {
    fn source(&'a self) -> &'a T;
}

trait Iterator : PartialEq {
    type DistanceType;
    fn alter_successor(&mut self);

    fn successor(mut self) -> Self where Self : Sized {
        self.alter_successor();
        self
    }
}

trait Iterable<'a> {
    type IteratorType : Iterator + 'a;
    fn begin(&mut self) -> Self::IteratorType;
    fn end(&mut self) -> Self::IteratorType;
}

#[derive(Clone, PartialEq, Debug)]
struct SliceIterator<'a, T> {
    ptr : *mut T,
    phantom: std::marker::PhantomData<&'a ()>
}

impl<'a, T> SliceIterator<'a, T> where T : 'a + PartialEq {
    fn new(r : *mut T) -> SliceIterator<'a, T> {
        SliceIterator {
            ptr : r,
            phantom : std::marker::PhantomData
        }
    }
}

impl<'a, T> Readable<'a, T> for SliceIterator<'a, T> {
    fn source(&self) -> &T {
        let v : &T;
        unsafe {v = &*((*self).ptr);}
        v
    }
}

impl<'a, T> Iterator for SliceIterator<'a, T> where SliceIterator<'a, T> : PartialEq {
    type DistanceType = usize;
    fn alter_successor(&mut self) {
        unsafe {self.ptr = self.ptr.offset(1);}
    }
}

impl<'a, T> Iterable<'a> for [T] where T : 'a + PartialEq {
    type IteratorType = SliceIterator<'a, T>;
    fn begin(&mut self) -> Self::IteratorType {
        SliceIterator::new(self.first_mut().unwrap())
    }
    fn end(&mut self) -> Self::IteratorType {
        SliceIterator::new(unsafe{(self.last_mut().unwrap() as *mut T).offset(1)})
    }
}

//fn dangle() -> SliceIterator<'a, i32> {
//    let mut arr = [1, 2, 3];
//    arr.begin();
//}

fn main() {
    let mut v = [4, 5, 6];
    let mut f = v.begin();
    let l = v.end();
    while f != l {
        println!("{:?}", f.source());
        f = f.successor();
    }
    //println!("{}", *dangle().source());
}

#17

The link works for me now, it was an issue with the server. However, that does not prevent dangling pointers because you haven’t associated the lifetime parameter with the borrow in begin or end. https://is.gd/biNeSv

Once you’ve done that like this, it will work:

trait Iterable<'a> {
    type IteratorType : Iterator + 'a;
    fn begin(&'a mut self) -> Self::IteratorType;
    fn end(&'a mut self) -> Self::IteratorType;
}

However, now a T: Iterable has to be a T: for<'a> Iterable<'a>. There may also be more intricate lifetime issues in complex scenarios because of the way this lifetime parameter hasn’t been encapsulated.

None of this is different between the way that you have defined the iterator trait and the standard library’s definition of iterators.


#18

I can see that now, and it answers my earlier question about how to link the iterator to the underlying collection lifetime.


#19

I am getting closer, I have fixed it so you cannot create a dangling pointer, but still allows multiple write iterators to be held concurrently see code in the Rust Playground.

First question: The Iterable trait I am using seems to work fine? What problem was Comex trying to fix above? Does my formulation not suffer from this problem, or am I missing the test case to pick it up (what is the problem case)?

There is another problem, as I am trying to separate the iteration from the dereferencing, currently you can generate a segmentation fault if you try to pass a read-only collection see broken code in the Rust Playground.

Second question: Obviously the ‘Writable’ trait should only be defined if we are iterating a mutable collection. Is there any way to check if something is mutable before borrowing it immutably?


#20

Regarding the naming of functions above, I notice that the type classes for “+” and “+=” are Add and AddAssign respectively.

So perhaps the following makes more sense:

fn half_nonnegative(mut self) -> Self

fn half_nonnegative_assign(&mut self)