What are intrusive linked lists and does Rust really struggle with them?

Hi. From time to time I've heard term "intrusive linked list". I also remember that I've read claims that they are particularly (more than standard ones) difficult to implement in Rust. I've found a nice article Intrusive linked lists, to understand what they are, and maybe why their implementation would be difficult. This article is really put well, however after reading it I have more questions than before.

I think that the main point of this article is that in C you can define a linked list either like this:

struct list_node {
    struct list_node *next;
    struct list_node *prev;
    char *data; // type erased linked list value
};

Where data and node are allocated separately, or:

struct list_node {
    struct list_node *next;
    struct list_node *prev;
    struct Data data; // concrete Data type
};

Since C does not support generics the former definition is easier to use, since one can erase type of data using another layer of indirection. And the latter one does not result in memory fragmentation, but requires defining a unique node type for each "generic instance" of a list.

However an intrusive linked list is defined as:

struct list {
    struct list *next;
    struct list *prev;
};

Data stores it inline:

typedef struct item {
  int val;
  struct list items;
} item;

And later one can use pointer offsets to go from pointer to node, to pointer to data.

What I don't understand is what is the difference between intrusive linked list (as defined above) and a linked list, which defines node which stores data inline. It's as if node and data were two boxes, one containing the other. It doesn't matter witch one is the outer, and which one is the inner one. Outer box will always occupy the same amount of space, and getting from one to the other can always be defined as some pointer offset. Is the real usefulness of intrusive linked lists only that they allow C code to have a "generic" implementation of list algorithms?

Does any of this changes when implementation is written in Rust, and not in C? Is there anything more than standard headache when it comes to this variant of lists? I've found quite popular intrusive_collections crate, so there must be something that intrusive lists provide, that cannot be done using std::collections::LinkedList, is there?

3 Likes

There two things in your question:

  1. Intrusive data structures, in short they are the poor man's generics. We, mostly, don't need them, we have generics.
  2. Linked list and co are hard to implement in Rust, but not impossible, because they have intrinsic semantics hard to reconcile with the borrow checker.

Hm, maybe I'm wrong. I have always thought intrusive linked lists were used when the list element must be allocated separately, for whatever reason, and the links are placed in that element's data type.

  • This avoids another data structure with additional allocations and overhead. There is less indirection to traverse the list.

  • Perhaps more importantly, it allows putting elements into a linked list when those elements are part of some other data structure, such as a hashmap. I believe this is how linked hashmaps work.

I don't know of any reason that intrusive linked lists would be more or less difficult in Rust than extrusive linked lists.

3 Likes

Ad 2. I understand that. I've even linked Learn Rust With Entirely Too Many Linked Lists. :slight_smile:

Ad 1. This is my real question. If they are really that, then why would they be defficult to do in Rust. Is it caused by uncertainty around provenence (alluded by @the8472)?

Wouldn't that add space overhead of next/prev pointers to each element? Or is this tradeoff that such data structure makes? And how would it be different from storing Node<T> instead of T with adjacent data, other that conceptually being more straightforward?

1 Like

Yes, it is called intrusive because the links have to be defined when the element's data type is defined. For example, if you want the elements in two linked lists, you need to add two sets of link fields! However, the total overhead is the same for intrusive or extrusive -- the size of the links is the same in either case.

In an extrusive linked list (as in Rust's std linked list), the items (the wrapped elements along with the link fields) are allocated in a single array (a Vec for example), or perhaps in chunks. In the intrusive linked list, normally the elements are each allocated separately as I mentioned, or as part of some other data structure.

7 Likes

You can't implement shared ownership between multiple linked lists via Node<T>. If you have this kind of node wrapper, when would you drop the node? When it's removed from the list, the only reasonable answer. But if the node is supposed to be still present in a different list, how do you handle it? That's where the fundamental clash between Rust and intrusive linked lists happens: Rust wants unique ownership, while intrusive linked lists are fundamentally about shared ownership.

On its own, it's not a big deal. After all, ordinary (double) linked lists are also a form of shared ownership, as well as many other memory management primitives. This just means you need to implement it via unsafe code. But intrusive linked lists can include an object in an arbitrary number of lists, while you can't write code polymorphic over the number of fields. This means the user needs to write unsafe code, or use a macro.

There may also be issues with mutation. An object placed in two intrusive linked lists can be accessed by both lists simultaneously. Mutating the data requires some synchronization, but removing the node from different collections, or adding it to them, can be safely performed concurrently, since they use different links. I think this kind of constraint may be difficult (impossible?) to express via a safe API.

Overall, I don't think Rust handles intrusive linked lists worse than C, it's just not as good as we usually expect. And personally I don't think intrusive linked lists are common outside of certain niche C code. Linked lists are already niche collections, multiple-ownership intrusive linked lists are even more so.

Unfortunately, this niche includes the Linux kernel, where intrusive linked lists are commonly used to manage resources.

6 Likes

Take a look at the ListLinks type in the Rust-for-Linux documentation.

Do you mean this? Or do you mean some other documentation?

This is very interesting. I recall that intrusive linked lists were indeed mentioned in the context of many lists sharing the same nodes. However, now when I think about it, I cannot understand how it is possible. Say we have two circular double linked lists A & B, both having n elements. a_i points to a_(i - 1) and a_(i + 1). Even if b_i points to a_i as its successor, a_i already uses both of its pointers, so it cannot point to b_(i + 1) as "another" successor.

Could you point to me where I make deduction error? Because it feels like sharing nodes between lists makes sense, but I cannot make it click.

In this scenario, each node will contain 4 pointers, two for its position in list 𝒜 and another two for its position in list ℬ.

1 Like

You just include 2 pairs of pointers: one for a and one for b.

1 Like

It also includes the implementation of some better-behaved collection types, like B+ Trees, which link the leaf pages together for more efficient iteration.

Ok. That makes sense. So we must statically know in how many lists given node should be in. Still, how does this differ from something like this?

struct DoubleNode<T> {
    first_next: *mut DoubleNode,
    first_priv: *mut DoubleNode,
    second_next: *mut DoubleNode,
    second_priv: *mut DoubleNode,
    data: T,
}

Right. What makes it intrusive is that the payload type is aware of and cooperates with the list implementation instead of being an encapsulated passenger— The overall collection and the individual items can be entangled together in whatever arbitrary ways suit the problem domain.

2 Likes

That's pretty much how it works, except that a pair of list pointers would be its own type.

I don't think there is some deep reason why "Rust is bad with intrusive linked lists". Rust is bad with linked lists, period. That's well known. Intrusive linked lists just crank up the difficulty an extra level: now there are multiple lists with shared ownership of data, and there is no longer a clean separation between the data and the collection.

Right, but in that case the linked lists are an implementation detail, not part of a public API, so I'd argue that their complexities don't matter that much. It's just part of the implementation of an (internally unsafe) collection. Also those inner lists would probably be non-owning, which significantly simplifies the issue.

1 Like

Thanks. That makes sense on a conceptual level, but I still struggle to understand usefulness of this approach. Could you point me to some implementation, where payload indeed uses this information? I've found and linked earlier kernel crate and its List type. Would this be example of what I am looking for?

Are they doing it only for better cache locality (the data is stored along the next/prev pointers) or do they also use the feature, that one item can be in multiple lists?

That's an extremely old and incorrect the kernel List implementation. Please see this link for the latest version:

4 Likes