When is a linked list not a linked list?

Honestly, the first thing that came to my mind was "modeling an undirected graph as a doubly-linked directed graph". But I guess if you were to use an adjacency matrix or adjacency list, then it would hardly be relevant to a conversation about linked lists...

Regarding your graph-building problem

This actually resembles something I did at $day_job recently. Since the skeleton of any 3-dimensional convex polyhedron is a 3-connected graph, you can generate a 3-connected graph by starting with a tetrahedron and repeatedly doing random operations that increase the number of vertices while maintaining the polyhedron invariant, such as:

  1. pick a k-sided face of the polyhedron and add a vertex in the middle of the face by creating a k-sided pyramid with the original face as base;
  2. pick an edge of the polyhedron joining two faces and add a vertex in the middle of it, dividing the faces on either side.

Now -- this gives you a graph of which all the faces are triangles, and a given vertex may have any number of edges. To turn it into a graph where each vertex has exactly three edges, you take its dual so that the vertices become faces and faces become vertices.

This algorithm can't generate all polyhedral graphs. At least, there exist non-planar 3-connected graphs that it cannot generate. But it might be able to generate all regular polyhedral graphs of degree 3.

This has been brought up before on another forum where this problem was suggested.

The way I see it, if I have nodes A and B and I want be able to get from A to B then A had better contain a reference to B. And if I want be able to get from B to A then B had better contain a reference to A. That is a doubly linked graph. And sometimes I draw it that way to be sure all the references are made correctly.

Of course in the Wumpus maze has one tunnel between neighboring caverns and one can travers it either way. That is an undirected graph. My same graphs can be drawn like that as well:

For analogy - binary trees sometimes are backed by array of nodes (in a binary heap fashion), this doesn't make them less binary or less trees. Same for your linked list example, in fact you can abstract away the array storage by introduction of local custom allocator, which hides array parts under the carpet. The significant parts of your abstraction are still a linked list.

This is the "object graph" model. There are three other common ways to model a graph: edge list, adjacency list, and adjacency matrix. None of the other three require storing references inside of nodes.

There is something of an inversion of responsibility here. Instead of saying "the rooms have corridors to other rooms" you might say "the map contains rooms and corridors that connect them". Shifting the responsibility for understanding the connections from the rooms to the map (the graph). In my experience, this has the advantage of making graph traversals simpler as well as making the ownership of the nodes trivial. (I'm talking about mostly C++ and Python, not Rust -- the problems of the object graph model are not just that it interacts badly with the borrow checker.)

Object graphs also tend to be distressingly non-compact at times. Ignoring any per-node (and per-edge) data, a regular k-conected object graph of n nodes requires k*n*size_of<ptr> bytes (ptr being whatever type you need to reference another node) whereas an adjacency matrix needs n*(n+1)/2 bits. With a naive implementation where each node contains three pointers and its own ID, you could theoretically store the entire adjacency matrix for any 20-node Wumpus map in the space it takes to store a single node of the object graph! That's an extreme example, of course, but even with more realistic assumptions you can usually beat an object graph with one of the other approaches.

As it happens my Wumpus maze generator spits out an edge list as it proceeds. I need one to detect forward and backward links when I only want an undirected graph output. And hence to produce the `.dot' files from which graphviz draws the graphs. Like so:

graph wumpus_graph {
    node [margin=0 fontcolor=blue fontsize=10 width=0.3 shape=circle style=filled fillcolor=yellow]
    ratio=1.0;
    size="8.3,11.7!";
    margin=0;
    bgcolor="#564321";
0 [fillcolor = "#564321"];
1 [fillcolor = "#0000ff"];
2 [fillcolor = "#00ff00"];
3 [fillcolor = "#00ffff"];
4 [fillcolor = "#ff0000"];
5 [fillcolor = "#ff00ff"];
6 [fillcolor = "#ffff00"];
7 [fillcolor = "#ffffff"];
8 [fillcolor = "#123456"];
9 [fillcolor = "#000080"];
10 [fillcolor = "#008000"];
11 [fillcolor = "#008080"];
12 [fillcolor = "#800000"];
13 [fillcolor = "#800080"];
14 [fillcolor = "#808000"];
15 [fillcolor = "#808080"];
16 [fillcolor = "#148a7c"];
17 [fillcolor = "#fc1f32"];
18 [fillcolor = "#4d7ed8"];
19 [fillcolor = "#d24f4a"];
    4 -- 3;
    16 -- 6;
    12 -- 2;
    18 -- 10;
    5 -- 4;
    12 -- 10;
    16 -- 14;
    6 -- 5;
    15 -- 5;
    7 -- 6;
    18 -- 8;
    19 -- 11;
    18 -- 16;
    9 -- 8;
    17 -- 15;
    15 -- 13;
    3 -- 2;
    14 -- 4;
    11 -- 1;
    17 -- 7;
    14 -- 12;
    19 -- 9;
    9 -- 0;
    13 -- 11;
    10 -- 0;
    8 -- 7;
    19 -- 17;
    13 -- 3;
    2 -- 1;
    1 -- 0;
}

That edge list production is not required to create the random Wumpus graph. Maybe I should pull it out into a post processing step.

I'm not much of a graph guy, not since I worked on a CAD package for designing Printed Circuit Board layouts too many years ago: CADSTAR | eCADSTAR which was full of net lists. But a few things occur to me:

If there were such a thing as a real Wumpus maze I would see the possible exits from a cavern from the cavern I'm in. They are right there. No need to consult some external edge list or other map to find out where I can go. So I like my object graph.

It's not clear to me how my random Wumpus graph generator would work with only an edge list, adjacency list or adjacency matrix. But then, maybe I just need to find a different algorithm.

I'm not sold on the "Object graphs also tend to be distressingly non-compact" idea. If we assume the smallest numbers we can reasonably deal with are bytes then my object graph in an array boils down to 3 * 20 = 60 bytes. (I don't think I need the id field in a node, that is just it's "address" anyway and redundant). Meanwhile the edge list is 2 * 30 = 60 bytes as well!

I'm inclined to call this a genuine LL implementation, just using a different means of indirection than a blunt pointer.
But I add the caveat that I haven't done rigorous Big-Oh analysis on it as of yet. So basically it satisfies criterion 2 for sure, and I'm uncertain as of now about criterion 1.

I have not shown the operations for inserting, removing and traversal of a linked list in an array in my example code above. But they would look much the same as one might see for a traditional pointer based linked list. Except we are moving array indices around instead of pointers. Whether accessing by index or pointer those operations are both O(1) so there is no change in the overall Big-Oh.

Note: There is no moving of nodes up and down the array when inserting or removing nodes, as suggested in the thread that triggered this discussion, only swapping of index values around. Just like swapping pointers around in a pointer based list.

Big-O wise, the vector gives the insert operation an amortized constant runtime instead of worst case constant runtime due to resizing. The others remain unchanged.

Actually I was wondering how people count the operations on a linked list to arrive at a Big-O estimate.

Specifically, when inserting a new node does one count the operation of actually creating the new node?

In the typical C/pointer style linked list that involves a malloc or new to create the node. In the Vec based linked list that could be:

  1. A push() to the vec to get a new free node at the end of the Vec.

  2. A simple update of a 'next_fee' index, if the Vec was created with some large size to begin with.

  3. One could implement ones own allocation scheme within the Vec. A second "free list" for example. Node creation would then be a case of unlinking a node from the free list and linking it to the linked list in use.

  4. Or some combination of all the above.

It all seems a bit vague to me. Who knows how long a malloc will take in a pointer based list?

The Big-O thing (from those classes I never had in my mechanical engineering program, that I still get beat up with in interviews) is as best I can tell, intentionally vague, and all about the big picture, not worrying about the constants (e.g. the cost of an allocation). Its a useful logical tool, but it doesn't replace simulation and performance testing, particularly for small data sets (e.g. small number of nodes in a Vec/linked-list) which can be the most important case in the real world.

Yeah, I didn't get any Big-O in my Physics either. But I have intently watched undergraduate CS lecture course from the likes of Berkeley aired on YouTube (Brian Harvey is wonderful). Just to catch up with what I missed out on and see what the kids are up to now a days.

As far as I can tell Big-O always wins at some scale. But in the real world we can cheat a bit. For example if most of your queries on a big data set are for the same few items, might as well cache that and use a crappy-O algorithm to look them up, thus keeping most of your users happy most of the time.

A little while ago, on a lark, I implemented big integer multiplication using the Kartsuba divide and conqure algorithm, but having chopped the numbers down below a certain size with Karatsuba I fell back to slow old schoolboy long multiplication as it was quicker.

I find it interesting that you call it intentionally vague. I definitely see where you're coming from in the sense that it ignores constants and looks at the big picture, but it isn't vague per-se. In fact, it is very precisely defined mathematically. Of course, you're all right regarding small numbers — big-O doesn't care about small inputs at all, the only thing that matters is how it increases as it goes towards larger and larger inputs.

As for malloc, there is an interesting question here. We have to make an assumption about how fast it is to make a proper big-O analysis.

Or just ignore it all together. Why not?

Just assume that if ones task is to insert a node one already has that node to insert.

After all, the node one is inserting may well have just have been unlinked from some other linked list or whatever data structure. No allocation required.

I interpret "intentionally vague" in the sense that if your run time turns out to be some polynomial at some point the biggest power term contributes so much more that one might as well ignore the other terms. Or if there is an exponential in there might as well forget about the polynomial terms, and so on.

It intentionally ignores things that might be very significant, and often are, for small scale problems.

Sure. But that's not necessarily useful. If you were a pawn on a life-sized chess board you would see the adjacent squares, but that doesn't mean that you should model a chessboard as a 64-node, 161-edge object graph.

What you're implementing is a game, not a player. (Even when you are implementing the player, there can be a difference between the player's view of the game and the player's internal model of the game that it uses to play.) It can make sense to model the whole map at once even though the player only sees a part of it. For a game like Hunt the Wumpus maybe it doesn't really matter.

The problem with this assumption is that some representations make sub-byte data much easier to deal with "reasonably". An adjacency matrix lends itself to storing inside a BitVec in ways that an edge list or index-based linked list doesn't. Still, there may be scenarios where object graphs are the right way to go; I don't want to extend this discussion past its expiration date which was probably yesterday.

Pulling this all around to linked lists: if you're using the "object graph" model to store a sequence, that sounds to me like a linked list. Whether that is with indices or direct pointers, if you're storing objects with links to other objects (and the graph is in the shape of a sequence), I'd call it a linked list, big-O notwithstanding. On the other hand, you could consider a vector to be the degenerate case of storing an adjacency matrix (or adjacency list or edge list), where the content of the matrix (or list) is constant and predictable so it doesn't need to be explicitly stored, so the only thing that's "left" is the nodes themselves.

That is precisely what happens past a certain value of n, mathematically :slight_smile: They were conceived as a tool for numerical analysis before computers were a thing, incidentally.

While this is true, it's also true that a O(1) algorithm is almost never going to be slower than an O(n) algorithm* (etc.) in a way that matters in practice. Especially when talking about something as simple as linked list insertion.

(* Big theta technically)

The opposite can also be true; as Knuth has said, "log(log(n)) is practically a constant".

Of course, whenever in doubt given a concrete situation, measure. Even when an algorithmic analysis gives a complete formula (doesn't elide terms or constants), it probably doesn't take into account all the variables like cache lines or what kernel scheduler you're using this month.

I wouldn't consider it a real linked list. An insertion into a linked list does not move around the rest of the elements. If I had a pointer (here index) to one of the nodes, and someone else inserted an element somewhere in the list, my pointer should still point to the node it was pointing to before. For example, say you are making a graph, but you don't know all nodes ahead of time. You keep the nodes in this "linked list" (maybe because they are heavy) and use indices into it to form your edges. Say also that you need to keep the nodes list sorted.

Now when you encounter a new node, you need to insert it in the list and possibly go through all your edges and change the pointers there.
I wouldn't say that this is a linked list's behaviour.

I think you're envisioning a plain Vec being used in place of a linked list, rather than a linked list embedded inside the Vec. (This is why I asked for code samples in the original thread)

A structure like this is what I believe @ZiCog is talking about, and it doesn't have many of the properties you are ascribing to it. In particular, the node indices don't change on insert/delete because the list order is specified by the links instead of the storage order inside the Vec.

(Edit: I forgot that they posted code in the intro of this thread :confused:. This is indeed the sort of thing we're talking about.)

struct Node<T> {
    payload: Option<T>,
    next: Option<usize>,
    prev: Option<usize>
}

pub struct SortedList<T> {
    nodes: Vec<Node<T>>,
    free: Option<usize>,
    first: Option<usize>
}

Ah, but there is the point. When implementing a linked list that lives in array/Vec there is no moving around of the rest of the elements when inserting a new node.

In the simple case one just pushes a new node to the end of the Vec. Which is equivalent to allocating space for the new node any other way. Then one adjusts the "next" index of whatever existing node to make the relation to the new node. Whether that be an insertion at the head or tail or somewhere in the middle. Which is equivalent to tweaking the pointers in a C style linked list.

Similarly one can remove nodes from a linked list in a Vec without moving any other nodes around. Just tweak with the "next" index of the appropriate node in the list.

Of course this simple model will continuously eat memory as nodes are added, and leave unused "holes" in the vector as nodes are removed.

No problem. One could implement a node allocation/deallocation scheme that recycled nodes that became unlinked from the linked list.

The simple way to do that is maintain a second linked list within the Vec. A list of "free" nodes. When one inserts a new node first try and do it be unlinking a node from the "free list" and reusing that. Which is equivalent to having malloc get space for a new node in C.

All in all, the operations on a Linked list in a Vec are equivalent to a heap allocated linked list. Just tweaking indices instead of actual memory addresses.

Yes, exactly.

My example graph code in my first post does not have any "null" links to terminate the list. Being a wumpus graph it just goes around and around. So it's good you show use of 'Option' to indicate a null link.

Right. Sorry. I only looked at the code from afar. On a closer look, this seems a really nice idea. Maybe I'll use it somewhere someday too.