When is a linked list not a linked list?

Over on this thread How to make rust usable in schools is a discussion about the suitability of using Rust as an introductory language for those that had never programmed before.

It was suggested that if one were to do that one should not teach the idea of a linked list to those beginner programmers. Especially not a doubly liked list. Which I would agree with if one thinks of a linked list in terms of nodes allocate to random memory locations and linked by references. As one might think of a linked list in C using pointers for example. Which every C beginner learns very early on, what with it being an example in the "White Book". This is a notoriously complicated thing to do in Rust: https://rust-unofficial.github.io/too-many-lists/

My claim however was that this was not a problem, one could introduce the concept of a linked list very easily as a collection of nodes in an array or vec. Linked together by a "next" index in each node. Nodes being linked together by array indices rather than raw memory pointers. All the operations one expects to perform on a linked list, insertion, removal, search, can be done on such an array based Linked list in exactly the same way a the traditional C style linked list.

Surprising to me, some people disagreed. Stating that a linked list in a vec is somehow not a real linked list.

Some called for code to demonstrate my point.

Ok. Last week or so I was busy generating random mazes as seen in the old Unix "Hunt The Wumpus" game. The thing about the mazes is that they are a graph of 20 nodes and 30 edges. Each node has 3 links to next nodes. The links are bidirectional. I made that Wumpus graph in a Vec. It looks like this:

type Exits = Vec<usize>;
#[derive(Eq, PartialEq, Clone, Debug, Serialize, Deserialize)]
pub struct Node {
    id: usize,
    exits: Exits,
    data: HashSet<i32>,
}

impl Node {
    pub fn new(id: usize) -> Node {
        Node {
            id,
            exits: Exits::new(),
            data: HashSet::new(),
        }
    }
}
#[derive(Hash, Eq, PartialEq, Debug, Clone, Serialize, Deserialize)]
pub struct Edge {
    node_1: usize,
    node_2: usize,
}

impl Edge {
    pub fn new(node_1: usize, node_2: usize) -> Edge {
        let mut edge = Edge { node_1, node_2 };
        if node_1 < node_2 {
            std::mem::swap(&mut edge.node_1, &mut edge.node_2);
        }
        edge
    }

    pub fn to_string(&self) -> String {
        format!(
            "    {} -- {};\n",
            //"    {} -> {} [dir=\"both\", weight=1];\n",
            self.node_1,
            self.node_2
        )
    }
}

pub type Nodes = Vec<Node>;

#[derive(Eq, PartialEq, Clone, Debug, Serialize, Deserialize)]
pub struct Graph {
    pub nodes: Vec<Node>,
    pub edges: HashSet<Edge>,
}

impl Graph {
    pub fn new() -> Graph {
        Graph {
            nodes: vec![],
            edges: HashSet::new(),
        }
    }

    fn add(&mut self, node: Node) {
        self.nodes.push(node);
    }

    fn connect(&mut self, start: usize, end: usize) {
        self.edges.insert(Edge::new(start, end));
        self.nodes[start].exits.push(end);
        //self.nodes[end].exits.push(start);
    }

    pub fn to_string(&mut self) -> String {
        let mut result: String = "".to_string();
        for edge in &self.edges {
            result.push_str(&edge.to_string());
        }
        result
    }
}

pub mod wumpus_graph {
    use super::{Graph, Node};
    use rand::Rng;


    pub fn generate(size: usize) -> super::Graph{
        let mut graph = super::Graph::new();
        let mut valence_2_nodes: Vec<usize> = vec![];

        // Create a set of numbered nodes
        for id in 0..size {
            graph.add(super::Node::new(id));
        }

        // Create bidirectional connections of all nodes in a simple loop, thus ensuring at least one Hamiltonian path/cycle in the graph.
        for id in 0..size {
            graph.connect(id, (id + 1) % size);
            valence_2_nodes.push(id);
        }

        graph.nodes[12].data.insert(4);

        while !valence_2_nodes.is_empty() {
            // Choose a random pair of nodes to connect.
            let (n1, n2) = remove_random_pair(&mut valence_2_nodes);

            // Connect the chosen nodes bidirectionally.
            graph.connect(n1, n2);
        }
        graph
    }

    fn remove_random_pair<T: PartialEq + Clone>(v: &mut Vec<T>) -> (T, T) {
        assert!(v.len() > 1);
        let mut rng = rand::thread_rng();
    
        let i = rng.gen_range(0, v.len());
        let n1 = v[i].clone();
        v.retain(|n| *n != n1);
    
        let i = rng.gen_range(0, v.len());
        let n2 = v[i].clone();
        v.retain(|n| *n != n2);
        (n1, n2)
    }
}

Probably not brilliant. Very probably not brilliant Rust. But still, a graph, in a Vec. A doubly linked list is a subset of that. In fact the random wumpus_graph enerator starts by creating such a doubly linked list.

Can it really be said that this is less a linked list than some memory allocated arrangement using pointers?

5 Likes

I would certainly call it a linked list. I made a linked cycle based on the same principle where it had an index you could move in either direction, and use to swap around elements efficiently.

OK. One supporter.

What I don't have in the code presented is the typical linked list insertion, removal and find operations. Not needed for the Wumpus Graph. But I'm sure one can how they could be done easily.

Internal memory representation should not matter in deciding whether something is a linked list or not. Implementation details may affect performance, but they don't change contracts made with other code.

+1 for mentioning Hunt the Wumpus, the first computer game I ever played.

1 Like

If I need to decide whether or not something is a linked list or not, I look at 2 things:

  1. Asymptotic behavior for commonly used operations such as add element, remove element, and indexing must match those for a linked list
  2. Does it actually use links to chain the elements together? If not, it's not a linked list regardless of asymptotic runtime behavior.
1 Like

A lot of people feel like it's "cheating" if you implement a list data structure... using a list data structure. Like if I said I implemented a Rust compiler called verycoolrustc and you examined verycoolrustc and found out that it was just a symlink to rustc then have I really implemented a Rust compiler? It's the same thing with using a Vec inside of a linked list, people think the Vec is doing too much of the actual work and it disqualifies the whole implementation.

I'm not saying I agree or disagree with the above explanation, just offering it.

For me, the essence of a linked list is that the nodes store a link to the next node. In a doubly linked list, each node stores two links. Those links could be indexes into an array, rather than pointers into memory, as long as there isn’t an extra requirement that all entries in the array must be filled for example. Then the array works like a custom arena memory allocator’s allocation region.

6 Likes

Linked lists are (almost) always embedded inside an enumerated sequence of random-access storage locations anyway: main memory. Using something like a Vec to reserve a bounded region to hold the list isn’t cheating any more than using virtual addresses instead of physical ones is— it’s just a sensible protection against bugs.

6 Likes

So when you apply your criteria to my "Linked List in a Vec" on which side do you come down. Real or fake?

Someone on the thread I linked to above quoted the wikipedia definition of a Linked List in computer science:

In computer science, a linked list is a linear collection of data elements whose order is not given by their physical placement in memory. Instead, each element points to the next. It is a data structure consisting of a collection of nodes which together represent a sequence. In its most basic form, each node contains: data, and a reference (in other words, a link) to the next node in the sequence. This structure allows for efficient insertion or removal of elements from any position in the sequence during iteration.

I think the Linked List in a Vec has all the check points checked there.

2 Likes

I even suspect a Linked List implemented as elements of an array might perform better than one that the traditional linked list that depends on malloc and pointers in C or whatever "smart pointers" in Rust.

Especially if one has an idea how big the thing might be. One gets to grab a whole lot of memory in one allocation of a Vec rather than many separate allocations. Using indices into the Vec as links might be quicker than using reference counted pointers, despite any array bounds checks.

Good point.

But that is not what happens here. A Vec has a strict sense of order, as determined by the indices used to access it's elements and operations like "push".

A linked list also has a strict sense of order. But that order is defined by it's links, not the order of elements in memory like a Vec. A key point in the wikipedia definition. The operations one implements to form that linked list are not the operations of the underlying Vec abstraction.

1 Like

truly....but nowdays linked list based upon pointers is rarely teached on itself sake only. It is just a mean to be quite familar with low-level aspects of RAM organization and manipulations with. .

Even the standard library uses an optimization trick for BTreeMap because working with independent nodes is inefficient. The "nodes" are actually contiguous sequences of items.

Whether you're using a Vec with a custom API to mimic a linked list or a HashMap to lookup nodes in a "graph" in constant time, the question is whether you want a legitimate textbook implementation or good software.

Interesting. I was about to add to my post that in many cases a Linked List in a Vec might likely be more cache friendly.

Also interesting. What is a "legitimate textbook implementation" now a days?

Those learning about linked lists in a first programming language like Java, Python, Javascript, Scheme, pretty much anything that is not C, C++, Pascal or Rust is for sure not using raw memory addresses.

Things were so much simpler when we programmed in assembler :slight_smile:

It's an inherently contextual question. There is the abstract point of view (diagrams of boxes and arrows), the API point of view (operations and their costs), an overview of the implementation, and other contexts as well. I might say "your library implements a linked list as a vector". But I wouldn't say your library isn't a linked list, assuming it has the operations I would naturally expect.

From a pedagogical point of view, though, I can understand the desire to enforce some amount of strictness between the abstract and the implementation in order to maximize understanding.

Take for example, a full and complete binary tree. It can be stored in an array in such a way that the pointers are bidirectional but implicit (simple math on the index), all leaves are contiguous, etc. It's a great optimization for certain use cases. Despite being implemented using Vec or an array, I would still call it a tree data structure.

But it probably wouldn't be the first implementation I tried to explain right after introducing someone to tree data structures.

(That said, I would consider it a good learning exercise at some point! Once one has a grasp on data structures generally, understanding how they can be implemented in a variety of ways and the trade-offs involved is valuable wisdom.)

5 Likes

Indeed.

I'm looking at it from the point of view of implementing that API. Not just simply using it and being aware of the costs. The point being having to learn what a linked list is and how to make one. Quite enough for a beginner to programming, the context in which the question arose.

I'm pretty sure that with enough traits and generics one could create a linked list API in Rust that could be implemented as elements of nodes in an array or individually memory allocated nodes and smart pointers.

As far as the big O performance characteristics go they would be indistinguishable.

I opened up Programming Principles and Practice Using C++ because it’s a textbook that covers this sort of thing. (Disclaimer: Like most people who own this book, I never bothered to learn C++.) Stroustrup demonstrates a linked list as having all of a vector’s operations for initializing and finding its size but without subscripting (so you can’t do “list[1000]” and accidentally visit 1001 elements) plus insertion after a given element, removal of a given element and a means of traversal (i.e. an iterator). He shows an example using two pointers (doubly-linked) and a value in a Link type. His List type would use the Link as its element and can refer directly to the first and last elements. Iteration starts at the first element and follows the forward pointer until reaching the last element.

So in this textbook the linked list is strictly implemented without hidden optimization. That makes sense as it would be a pretty bad textbook if it didn’t actually teach the concept and differentiate it from a vector.

He also says, “There are many ways of implementing linked lists and presenting them to users” and notes that a vector’s use of consecutive memory is not required by his* definition of a list but that adding elements without moving elements, just adjusting pointers, is required to fit his* definition. I think a few changes to the list implemented in The Book would match exactly what Stroustrup demonstrates in his own book.

One thing we’re all doing wrong, accordingly to Stroustrup, is talking about lists without drawing diagrams. “... we strongly encourage you to draw little diagrams to visualize the operations you are considering. Linked-list manipulation really is a topic where a picture is worth 1K words.”

[*] By “his” I mean the C++ “standard template library” depictions of data structures.

Totally agree with Stroustrup there.

The diagrams look the same no mater if your linked list is a bunch of structs linked by array indices or raw pointers.

Here is an example wumpus graph produced by my random graph builder:

Can you spot the error(s)?

Here is a better one:

Not without knowing what you were trying to achieve. Were you expecting these to be planar graphs?

13/14/15 could have been layed out differently to avoid crossing paths on the map, but the connections around 1/16/17 seem fundamentally non-planar to me.

Ah, I did not make that clear. Buried in my opening post is the requirement:

Last week or so I was busy generating random mazes as seen in the old Unix "Hunt The Wumpus" game. The thing about these mazes is that they are a graph of 20 nodes and 30 edges. Each node has 3 links to next nodes. The links are bidirectional.

If I understand correctly the original Hunt The Wumpus game was played in maze that formed a dodecahedron. Soon after someone, Dennis Ritchie I believe, made a version with randomly generated mazes, but still conforming to 20 nodes, 30 edges and 3 links per node. Being planar is not required.

So that was the problem I set about solving in Rust on a recent rainy Sunday afternoon. Little did I know I was tackling the impossible, building a doubly linked list in Rust. Nobody told me it was hard to do :slight_smile:

Anyway, the problem is that my algorithm, as shown above, kind of "paint's itself into a corner", as it choses random node pairs to connect up in it's second phase it runs out of nodes but still leaves nodes with only 2 exits in the finished graph. It manages a properly connected wumpus graph, 3 exits per node, about 30% of the time.