Problems with Iterator impl (writing a Linked List)

Hi,

i'm trying to get into Rust and thought of writing a linked list, which by now helped me a lot to dig into the language and its documentation.

My approach was creating two structs, a Node and a List Struct and maybe an own implementation of a list iterator, either using a struct or implementing the standard into_iter impl of Rust itself on the List and the Node struct.

To get to the point, i wrote an add function to create a node holding a i32 value and a pointer that should be able to reference to the next node, if a new node is created. To test this i needed to implement an iterator to print out the actual nodes, but that is were i get stuck.

I can iterate through the created linked list, where i implemented the into_iter() trait, but that just gives me the address where the linked list is created, which means, i also need to iterate through all dependent nodes in the linked list to actually get the i32 values contained in the nodes.

this is the code:

list.rs

use std::ptr::read;
use std::fmt::{ Display, Formatter, Result };

pub struct List {
    head: *const Node
}

impl List {
    pub fn new() -> Self {
       List {
           head: ptr::null()
       }
    }

    pub fn add(&mut self, value: i32) {
        if self.head == ptr::null() {
            self.head = &Node::new(value);
            return
        }

        let mut _ptr1 = self.head;

        unsafe {
            while (*_ptr1).next != ptr::null() {
                _ptr1 = (*_ptr1).next;
            }
        }

        let mut _ptr2 = Node::new(value);

        unsafe {
            read(_ptr1).next = &_ptr2;
        }
    }
}

impl Display for List {
    fn fmt(&self, f: &mut Formatter) -> Result {
        write!(f, "List: {}", self)
    }
}

pub struct Node {
    value: i32,
    next: *const Node
}

impl Node {
    pub fn new(val: i32) -> Self {
        Node {
            value: val,
            next: ptr::null()
        }
    }
} 

impl IntoIterator for List {
    type Item = *const Node;
    type IntoIter = ListIntoIterator;

    fn into_iter(self) -> Self::IntoIter {
        ListIntoIterator { list: self, index: 0 }
    }
}

pub struct ListIntoIterator {
    list: List,
    index: usize,
}

impl Iterator for ListIntoIterator {
    type Item = *const Node;
    fn next(&mut self) -> Option<Self::Item> {
        let result = match self.index {
            0 => self.list.head, // List 1
            // {
            //     for node in self.list.head.into_iter() {
            //        return node.value;
            //     }
            // },
            1 => self.list.head,
            // 2 => self.list.head,
            _ => return None,
        };
        self.index += 1;
        Some(result)
    }
}

impl IntoIterator for Node {
    type Item = i32;
    type IntoIter = NodeIntoIterator;

    fn into_iter(self) -> Self::IntoIter {
        NodeIntoIterator { node: &self, index: 0 }
    }
}

pub struct NodeIntoIterator {
    node: *const Node,
    index: usize,
}

impl Iterator for NodeIntoIterator {
    type Item = i32;
    fn next(&mut self) -> Option<i32> {
        unsafe{
        let result = match self.index {
            0 => (*self.node).value, // Node 1
            1 => (*self.node).value, // Node 2
            2 => (*self.node).value, // Node 3
            _ => return None,
        };
        self.index += 1;
        Some(result)
        }
    }
}

impl Display for Node {
    fn fmt(&self, f: &mut Formatter) -> Result {
        write!(f, "Node: {}", self)
    }
}

main.rs

pub mod list;

use list::List;

fn main() {
    let mut list1 = List::new();
    list1.add(3);
    list1.add(6);
    
    // let mut list2 = List::new();

    for _i in list1.into_iter() {
        println!("{:?}", &_i);
        // for j in _i.into_iter() {
        //     println!("{}", j);
        // }
    };
}

What can i do to make this work? Also, how can i improve this code? ( e.g.: am i using unsafe where i shouldn't or do i have to implement a list iterator myself to make this work more properly by using raw pointers?)

i hope this is a legit thread and not too much to ask, but i'd be glad for any help to make this implementation of the linked list work.

kind regards,
Timo

I'm not sure why you're trying to use index and match on it. The iterator should hold a pointer to the next list element.

As for unsafety and impossibility to make the borrow checker happy, the obligatory reading: Learning Rust With Entirely Too Many Linked Lists

2 Likes

Thanks for the link! Really helps understanding the basics even more. By the way, even though there are solutions offered to it, i'm still tempted to work on my implementation just for learning sakes and was asking myself why my add implementation doesn't work as expected. While debugging i found out that adding the first two nodes, including values, isn't a problem, but after adding a third it acts like its memory is deallocating and starting the head node from new....could this be because im not using the heap aka Boxes to keep the content longer in memory?

Here's the slightly changed approach:

list.rs

use std::ptr;
use std::fmt::{ Display, Formatter, Result };

pub struct List {
    head: *mut Node
}

impl List {
    pub fn new() -> Self {
       List {
           head: ptr::null_mut()
       }
    }

    pub fn add(&mut self, value: i32) {
        if self.head == ptr::null_mut() {
            self.head = &mut Node::new(value);
            return ()
        }

        let mut _ptr1 = self.head;
        // let mut _ptr3 = ptr::null_mut();
        // self.head = _ptr3;

        unsafe {
            while (*_ptr1).next != ptr::null_mut() {
                _ptr1 = (*_ptr1).next;
            }
        }

        let mut _ptr2 = ptr::null_mut();
        _ptr2 = &mut Node::new(value);

        unsafe {
            if (*_ptr1).next == ptr::null_mut() {
                (*_ptr1).next = _ptr2;
            }
        }
    }
}

impl Display for List {
    fn fmt(&self, f: &mut Formatter) -> Result {
        write!(f, "List: {}", self)
    }
}

pub struct Node {
    value: i32,
    next: *mut Node
}

impl Node {
    pub fn new(val: i32) -> Self {
        Node {
            value: val,
            next: ptr::null_mut(),
        }
    }
}

impl IntoIterator for List {
    type Item = i32;
    type IntoIter = ListIntoIterator;

    fn into_iter(self) -> Self::IntoIter {
        ListIntoIterator { list: self }
    }
}

pub struct ListIntoIterator {
    list: List,
}

impl Iterator for ListIntoIterator {
    type Item = i32;
    fn next(&mut self) -> Option<i32> {
        let mut start = self.list.head;
        unsafe {
            if (*start).next != ptr::null_mut() {
                start = (*start).next;
            } 
        
            return Some((*start).value);
        }
    }
}

Yes. Let's take a look:

  • Node::new(value) constructs a value of type Node. Since rust is not a managed language, no allocation occurs; the value and next fields are stored directly on the stack.
  • &mut Node::new(value) takes the address of that temporary on the stack.
  • That address is stored in self.
  • return releases the stack frame, and thus the Node becomes uninitialized. No deallocation occurs, but the pointers are nonetheless invalid and the stack space they point to will be freely used by other functions.

There is a simple transformation you can usually apply to solve this sort of problem by performing heap allocation:

        if self.head == ptr::null_mut() {
            // allocate on the heap
            let node = Box::new(Node::new(value));

            // get address of heap data. This method also leaks the box,
            // so that it is not deallocated
            self.head = Box::into_raw(node);
            return ()
        }

// ...

impl Drop for List {
    fn drop(&mut self) {
        // ... do other stuff if necessary ...

        // Deallocate that node
        let node = unsafe { Box::from_raw(self.head) };
        drop(node);

        // ... do other stuff if necessary ...
    }
}

But mind that this is a pretty broad hammer. It can be used to solve any problem of this sort, but it is almost never the best way.

2 Likes

Thanks a lot @ExpHP ! I will update my example as soon as possible. Also, i got a few questions. unsafe is used at from_raw, but not into_raw, why is that? (aren't those both functions for raw pointers?) Hence, even though that may be a common question. What would happen if i would use RC for memory allocation? Would that make a lot of difference in this case?

For better examples i guess i look more into the link provided by @kornel :=)

p.s.: Another one, does the iterator implementation look fine to you?

BTW, don't think of & as a pointer, but as a temporary, shared read-only (or &mut exclusive read-write) lock on data. Note that &/&mut can't exist without the owned counterpart also existing somewhere else.

If you want problems to solve in Rust, https://exercism.io has very nice ones that fit Rust better.

If you use Rc for memory management here, all will be easy, because you'll be able to clone the smart poitners at will instead of trying to borrow and modify the only copy.

wants to dive into Rust

pointers, unsafe

something wrong

2 Likes

Thanks for the very helpful two liner, but i didn't ask for unfunny sarcasm. :kissing_heart:

This is a good observation! It is true that in rust, unsafe APIs are often asymmetric.

Don't think of it as "raw pointers are unsafe." Rather, think of it as "Box::from_raw can cause Undefined Behavior (UB) even without additional unsafe code." More specifically:

  • When a Box<T> is dropped or dereferenced-by-move (*p without a & out front), the memory it points to is deallocated using a sized deallocation on the global allocator. If this memory was not originally allocated by the global allocator (e.g. it is a static or an object from C code), or if T has the wrong size, this is UB.

Box::into_raw on the other hand cannot lead to UB in safe code. This is because all possible UB involving raw pointers is already forbidden by marking other functions as unsafe.


Part of it also has to do with the definition/scope of UB. Prior to Rust 1.0, Box::into_raw was unsafe because leaking memory was considered UB (allowing for functions like std::thread::scoped, which relied on deterministic destruction for safety).

When it was revealed that Arc could easily be used to create reference cycles that are never dropped in safe code, a choice had to be made:

  • Make certain methods of Arc unsafe, and continue to consider refcycles as UB?
  • Stop considering leaks to be UB. Get rid of scoped. Make Box::into_raw and mem::forget safe.

The second option was taken.

(note: my description may contain historical inaccuracies as I was not there)

1 Like

Two things here:

  • As @ExpHP already said, it's very common for one direction to be safe, but the other direction to be unsafe. For example, you can turn a &T into a *const T in safe code, but going the other way requires unsafe. And you can go from &mut T to &T in safe code, but no code can soundly go the other way.

  • But I think there's a more nuanced point in your question. There are lots of "function[s] for raw pointers" that are safe. For example, let p = 13 as *const String; is completely safe. As is then printing p with println!. Even pointer arithmetic with wrapping_offset is safe. It's only when need to dereference that pointer that you need unsafe (and of course dereferencing p from earlier is UB). Rust is very careful to pick a narrow set of unsafe superpowers.

1 Like

Thanks again for all the nice and specific explanations. I'm really glad i opened a thread here and asked for info. I got one more thing though.

Before i get to see other examples or try to make the linked list work without unsafe and raw pointers i wanted to make my implementation work. The iterator nearly works, but i'm getting a segmentation fault.

e.g.: i'm adding 3 values to my list, the Iterator implementation goes through it, returns all 3 values and prints them, but it seems to go through the list a fourth time where i'm getting the segmentation fault.

First attempt to solve this was to match the head to a null pointer and therefore return a None to stop the iteration.

Since it still gets the segmentation fault i found out that the Drop implementation seems to deallocate the list/nodes before the 4th iteration starts. Therefore, a segmentation fault happens.

Hence, removing the Drop implementation makes it work without getting the segmentation fault, but then again, the memory doesn't get deallocated? Is there a way to fix this somehow by keeping the Drop implementation?

Code/list.rs:

use std::ptr;
use std::fmt::{ Display, Formatter, Result };

pub struct List {
    head: *mut Node
}

impl List {
    pub fn new() -> Self {
       List {
           head: ptr::null_mut()
       }
    }

    pub fn add(&mut self, value: i32) {
        if self.head == ptr::null_mut() {
            // allocate on the heap
            let node = Box::new(Node::new(value));

            // get address of heap data. This method also leaks the box,
            // so that it is not deallocated
            self.head = Box::into_raw(node);
            return ()
        }

        let mut _ptr1 = self.head;
        // let mut _ptr3 = ptr::null_mut();
        // self.head = _ptr3;

        unsafe {
            while (*_ptr1).next != ptr::null_mut() {
                _ptr1 = (*_ptr1).next;
            }
        }

        let mut _ptr2 = ptr::null_mut();
        let next_node = Box::new(Node::new(value));
        _ptr2 = Box::into_raw(next_node);

        unsafe {
            if (*_ptr1).next == ptr::null_mut() {
                (*_ptr1).next = _ptr2;
            }
        }
    }
}

impl Display for List {
    fn fmt(&self, f: &mut Formatter) -> Result {
        write!(f, "List: {}", self)
    }
}

pub struct Node {
    value: i32,
    next: *mut Node
}

impl Node {
    pub fn new(val: i32) -> Self {
        Node {
            value: val,
            next: ptr::null_mut(),
        }
    }
}

impl IntoIterator for List {
    type Item = i32;
    type IntoIter = ListIntoIterator;

    fn into_iter(self) -> Self::IntoIter {
        ListIntoIterator { list: self }
    }
}

pub struct ListIntoIterator {
    list: List,
}

impl Iterator for ListIntoIterator {
    type Item = i32;
    // let mut _ptr = self.list.head;
    fn next(&mut self) -> Option<i32> {
        unsafe {
            let stop = match self.list.head {
                x if x == ptr::null_mut() => true,
                _ => false,
            };

            if stop {
                return None
            }

            let result = match self.list.head {
                _ => Some((*self.list.head).value),
            };

            self.list.head = (*self.list.head).next;
            
            result
        }
    }
}

impl Drop for List {
    fn drop(&mut self) {
        // ... do other stuff if necessary ...

        // Deallocate that node
        let node = unsafe { Box::from_raw(self.head) };
        drop(node);

        // ... do other stuff if necessary ...
    }
}

I haven't looked hard enough at the code to figure out the explanation for the behavior you are seeing (in part because I don't want to take away the learning experience), but here's some hints:

  • The Drop impl I wrote was incomplete. It only frees the head node (to match the other bit of code I presented in the example). You will want to add code that traverses the list and frees the others.

  • Because ListIntoIterator::next sets a new head (making the old one inaccessible), it should also free the old head.

    • Edit: Even better: Instead of dropping the old head, this method should move it onto the stack. You can do this by dereferencing a Box using the * operator. This will make your code work better for types other than i32, which don't implement Copy.
  • Unrelated (?) to your problem, but I believe your fmt::Display impl calls itself (recursing infinitely).

1 Like

One other suggestion:

Right now, I believe everything you have written thus far can be done by storing actual Boxes.

struct Node {
    value: i32,
    next: Option<Box<Node>>,
}

The None variant of the Option plays the same role as null does in your current code.

Starting with this, you won't need any unsafe code or Drop impls for what you have currently written. You should only start running into problems once you try to implement iter_mut (I think?) or if you want to make a double-ended list.

Once you do start running into those problems, you can replace the Box with *mut, and try to write Drop impls that simulate what happens in the safe version.

1 Like

hey, i read into lifetimes yesterday and i think that the problem is that the drop gets called on the head of the list before the actual end of the current scope of the iterator. (just a guess though) So i tried implementing it, but couldn't get it to work, since i'm still trying to figure out what would be the best way to implement a longer lifetime on my list in the iterator.

one of the reasons i'm trying to figure this out is that i'm learning a lot by practical examples and comparison to e.g.: c++

i also started with a new list implementation without using unsafe, like you suggested above :slight_smile: ...works out pretty great so far, but this segmentation fault on the first implementation still bugs me :stuck_out_tongue:

Lifetimes are a red herring here! Your type has no lifetimes because it contains no references.

It's a common misconception that lifetimes describe how long a value exists. In reality, they describe how long a & or &mut borrow exists. The sole purpose of lifetimes in rust is to prevent aliasing, i.e. to ensure that something which is mutably borrowed is not also borrowed anywhere else.

In response to this, I would also like to point out that lifetimes have no runtime effect. As long as your code compiles, it will have the same behavior no matter what lifetimes are used.

1 Like