Data Structures appear to have an extra level of difficulty in Rust, how would you teach them?

When I first started learning Rust months ago, I thought I would start by building linked lists and some of the early structures that my professors taught in early classes with C++.

Since then I have stuck to just following the Rust book, but I have always been curious about breaking out the old data structures textbook and trying to build trees in Rust.

Just curious about people's thoughts on the topic.

4 Likes

The easiest way to implement a classic pointerful data structure in Rust is to use a Vec or array as your memory storage: Use indices instead of pointers and everything works as expected. (Assuming all your allocations are of the same type).

There’s no risk of accidentally clobbering memory outside the data structure, and array indexing is almost as fast as a direct memory lookup. It might even be faster due to things like cache coherency.

13 Likes

The thing is, Rust borrows aren't pointers even though they compile to pointers.
In particular, trying to use a borrow as a pointer is a surefire way to run into lifetime issues.

What @46756E said is true, but only for graph-like structures. If eg you have a self-referential structure, well... YMMV there.

So how would I teach datastructures in Rust?

  1. Start with ownership and borrowing. Understanding of these are so fundamental that without it, you're going to end up frustrating yourself.
  2. Start with the easier ones to actually implement in Rust, eg some linear structure like a Stack
  3. For more advanced stuff, teach the person about ways to opt out of borrowck (in a manner of speaking) with types like Arc, Rc, their Weak counterparts, Cell, CellRef, Mutex and ReadWriteLock, and then show how to use to to e.g. implement interior mutability. Then explain why interior mutability exists in Rust in the first place
  4. Finally, teach them about stuff like PhantomData and other things in the nomicon, for truly Black Magick.
5 Likes

https://rust-unofficial.github.io/too-many-lists/index.html

1 Like

The reality is that some data structures that involve a lot of pointers are difficult to write in Rust.

9 Likes

Binary trees are easy as allocations are owned and unaliased in their core semantics. This can be generalized into N-ary tree, including singly linked list which can be viewed as unary tree. But don't store the tail ptr next to the root ptr.

Doubly linked list and more compled trees like B+tree are full of aliasing by its nature which makes it hard even for experts.

8 Likes

Depends. If you simulate the ptrs with newtyped indices and combine that with arena allocation, suddenly the tsunami of borrowck errors you could expect turns into nice calm waters. It works for trees, and I don't see why it couldn't work for arbitrary graph-like structures, including doubly-linked lists.

Well, textbook data structures hardly use anything but malloc/free to manage its allocations.

Yeah it does depend on whether or not you view indexing as similar enough to pointer dereferencing.
But if you do, then there is a benefit in the form of a more cache-friendly data structure relative to the pointer-based one.

My understanding is that the classical data structures were invented a few decades ago when memory speed and latency was about at the same level as the CPU. Teaching them today might be interesting for a history class or "theoretical" computer science, but in practice things are far more complicated today.

The old "you own all memory" also does not apply anymore and pointers are just a nice abstraction over the actual physical memory that needs to be carefully used to stay on the CPUs happy path and not loose magnitudes in performance.

In my almost 20 years of self-taught programming I rarely actually dealt with (pointer based) data structures in C and almost never in Rust. I think to understand modern data structures one needs a deep understanding of computers and compilers, which I would only consider teaching in a dedicated masters class following classes on cache (layers), TLBs, speculative execution and pipelining.

2 Likes

That is quite a record.

Since first using C in 1984 I have only ever managed to avoid pointer based data structures in embedded systems where use of memory is limited and it's use strictly controlled. Everything ever needed for the system to run being statically defined. Similarly for embedded systems in other languages like PL/M, Ada, Coral.

"classical" data structures were invented a long time ago. But I'm pretty sure that does not make the irrelevant today. Many tasks have nice solutions using graphs and such that would take for ever to run my other means.

My take on it is that most materials that teach data structures, linked lists, trees, graphs etc seem to naturally assume that the implementations will end up as basically 'structs' and pointers in any actual implementation. And they leave the little issue of where that memory actually comes from to some magic allocator, malloc.

I think this need not be so. For example see this discussion on graph generation here: When is a linked list not a linked list? - #22 by ZiCog

3 Likes

To give one example: The QuakeIII engine uses a lot of indices and very few pointer based data because they can transfer those over the network and validate them. CPython is a great case for well written data structures, but they rely on reference counting and have reached pretty high bar to get any more performance out of it because of design decisions.

In Rust I prefer to use existing libraries over writing my own and so I very rarely have to write a new complex data structure.

I think it makes sense to teach ownership and references and how to avoid cycles. (Not that cyclic data structures in C are easier to write, the compiler just allows your most likely incorrect code to compile.)

3 Likes

To respond to the OP's question... I'd say data structures are harder to write in Rust than C for the same reasons that the language has a steep learning curve.

It's not good enough for your code to work correctly when used as intended, the compiler expects you to make sure it can't misbehave when faced with any safe code.

I would say that it's pretty uncommon for people to write their own data structures nowadays outside of very low level libraries or niche applications.

C doesn't have the concept of generics so every time you needed a hashmap or resizable array you needed to write it yourself. However in Rust we have nice things like std::collections and Vector<T> which largely makes that redundant, similarly if I needed a graph I would just add petgraph to my Cargo.toml and continue on.

That's not to say data structures are irrelevant, you still need to know how they are implemented internally if you want to use them properly, but I haven't seen them used much outside of first year CS units or programming interviews.

4 Likes

For what it's worth, I've used Rust a bunch for competitive programming questions where algorithms and data structures are the bread and butter. I have yet to run in to a question where Rust is a bad choice due to pointers being hard to work with.

Some examples that seem like they would be difficult, but weren't, include:

  • A circular linked list. I wrote it using a vector. I doubt that it would have been easier to do with pointers.
  • Graphs. Even if I weren't using Rust, I would be building my graphs with indexes.
  • A tree for partitioning 2d space. I used a Box for here, so I used real pointers. For this use-case, Rust was perfectly happy with the solution.
10 Likes

I'm really glad you said that. That is my natural inclination as well. As an old time programmer of embedded systems in C where randomly mallocing things and therefore using pointers everywhere is really not acceptable.

But after the discussion here: When is a linked list not a linked list? and reading Introduction - Learning Rust With Entirely Too Many Linked Lists I started to think I missed a point and was doing it all wrong.

Aside: Back in the day I went for an interview for a contract position on a military embedded project. The project lead asked me some probing questions about the use of pointers in C. I tried to explain how all that was a really bad idea in a memory constrained, safety critical, embedded system. But I did not have my thoughts about it in a concrete form at the time. I guess he assumed I was an idiot who did not understand pointers, I did not get that position.

Meanwhile, to this day we have introductory texts on data structures that assume mallocing and using pointers are the way it is done. For example the opendatastructures.org book nice book here: 3.1 : A Singly-Linked List

Hmm, perhaps there is a nice exercise for someone to contribute a Rust version of opendatastructures.org

1 Like

I remember that discussion. It made me think about what a pointer is, and about mechanism vs policy (which only sort of applies here). My conclusion was that indices are as valid as pointers in terms of creating indirection, because the mechanism may be different (i.e. how it works internally) but the policy (i.e. What they're trying to achieve) is the same with both pointers and indexing.

It's vaguely analog to methods vs fns: other than the syntactic sugar for methods, who cares where code ends up? They're different but serve practically the same purpose i.e. to encode behavior.

True that may be, but those texts are anything but holy dogma if you ask me. They're useful purely insofar they still reflect reality, which is to say, in spirit they're still going strong (binary trees are still useful for example) but in practice decreasing so apparently. When the state of the art is using indices, you're likely not going to get away with such a suboptimal solution in anything except a personal project.

2 Likes

I did exactly the same thing. It was a terrible experience. I thought Rust was a dead-end language.

Over a year later, I tried Rust again, and now it's my favorite programming language. Writing a linked list is about the worst place to start with Rust. :slight_smile:

I like the approach of using indices mentioned by many others in this thread.

1 Like

This. Every time a post begins: "I am new to Rust and am writing a linked list..." you know it won't end well.

To answer the OP, I'd start by teaching how to properly use data structures first. Creating them should be reserved for more experienced students and really isn't important for beginners.

Yes it is easier to create some structures in other languages, but those languages are forcing you into tradeoffs to give you that simplicity - either a lack of safety, or a hit to indirection and/or performance. Rust gives you control, at the cost of being more difficult for beginners.

5 Likes

I just woke up with a strange thought in mind.

Recently I read someone saying what amounted to: "Pointers are to using memory as goto is to programming". (I forget who it was or where unfortunately)

Well, the thought was that we don't teach use of goto to beginner programmers now a days. If our beginner is even given a language that has goto. It's almost universally considered a bad idea.

Similarly then we should not be exposing beginners to pointers. Ergo getting them writing traditional linked lists and such data structures with pointers is a bad idea.

3 Likes

That is like saying that because goto is considered harmful, we should not expose beginners to the concept of control flow. Pointers are not the problem. Indirection is pretty much unavoidable for any data structure more complicated than a flat array, and it is a tremendously important concept to understand. Instead, the problem is that languages before Rust treated pointers unsoundly, and the culture of sloppiness around memory management has permeated the field of programming.

Instead of not teaching pointers, we should be teaching them correctly – ideally in a language which has a clear notion of ownership and lifetimes.

1 Like