Data Structures appear to have an extra level of difficulty in Rust, how would you teach them?

To respond to the OP's question... I'd say data structures are harder to write in Rust than C for the same reasons that the language has a steep learning curve.

It's not good enough for your code to work correctly when used as intended, the compiler expects you to make sure it can't misbehave when faced with any safe code.

I would say that it's pretty uncommon for people to write their own data structures nowadays outside of very low level libraries or niche applications.

C doesn't have the concept of generics so every time you needed a hashmap or resizable array you needed to write it yourself. However in Rust we have nice things like std::collections and Vector<T> which largely makes that redundant, similarly if I needed a graph I would just add petgraph to my Cargo.toml and continue on.

That's not to say data structures are irrelevant, you still need to know how they are implemented internally if you want to use them properly, but I haven't seen them used much outside of first year CS units or programming interviews.

3 Likes

For what it's worth, I've used Rust a bunch for competitive programming questions where algorithms and data structures are the bread and butter. I have yet to run in to a question where Rust is a bad choice due to pointers being hard to work with.

Some examples that seem like they would be difficult, but weren't, include:

  • A circular linked list. I wrote it using a vector. I doubt that it would have been easier to do with pointers.
  • Graphs. Even if I weren't using Rust, I would be building my graphs with indexes.
  • A tree for partitioning 2d space. I used a Box for here, so I used real pointers. For this use-case, Rust was perfectly happy with the solution.
10 Likes

I'm really glad you said that. That is my natural inclination as well. As an old time programmer of embedded systems in C where randomly mallocing things and therefore using pointers everywhere is really not acceptable.

But after the discussion here: When is a linked list not a linked list? and reading Introduction - Learning Rust With Entirely Too Many Linked Lists I started to think I missed a point and was doing it all wrong.

Aside: Back in the day I went for an interview for a contract position on a military embedded project. The project lead asked me some probing questions about the use of pointers in C. I tried to explain how all that was a really bad idea in a memory constrained, safety critical, embedded system. But I did not have my thoughts about it in a concrete form at the time. I guess he assumed I was an idiot who did not understand pointers, I did not get that position.

Meanwhile, to this day we have introductory texts on data structures that assume mallocing and using pointers are the way it is done. For example the opendatastructures.org book nice book here: 3.1 : A Singly-Linked List

Hmm, perhaps there is a nice exercise for someone to contribute a Rust version of opendatastructures.org

1 Like

I remember that discussion. It made me think about what a pointer is, and about mechanism vs policy (which only sort of applies here). My conclusion was that indices are as valid as pointers in terms of creating indirection, because the mechanism may be different (i.e. how it works internally) but the policy (i.e. What they're trying to achieve) is the same with both pointers and indexing.

It's vaguely analog to methods vs fns: other than the syntactic sugar for methods, who cares where code ends up? They're different but serve practically the same purpose i.e. to encode behavior.

True that may be, but those texts are anything but holy dogma if you ask me. They're useful purely insofar they still reflect reality, which is to say, in spirit they're still going strong (binary trees are still useful for example) but in practice decreasing so apparently. When the state of the art is using indices, you're likely not going to get away with such a suboptimal solution in anything except a personal project.

2 Likes

I did exactly the same thing. It was a terrible experience. I thought Rust was a dead-end language.

Over a year later, I tried Rust again, and now it's my favorite programming language. Writing a linked list is about the worst place to start with Rust. :slight_smile:

I like the approach of using indices mentioned by many others in this thread.

1 Like

This. Every time a post begins: "I am new to Rust and am writing a linked list..." you know it won't end well.

To answer the OP, I'd start by teaching how to properly use data structures first. Creating them should be reserved for more experienced students and really isn't important for beginners.

Yes it is easier to create some structures in other languages, but those languages are forcing you into tradeoffs to give you that simplicity - either a lack of safety, or a hit to indirection and/or performance. Rust gives you control, at the cost of being more difficult for beginners.

5 Likes

I just woke up with a strange thought in mind.

Recently I read someone saying what amounted to: "Pointers are to using memory as goto is to programming". (I forget who it was or where unfortunately)

Well, the thought was that we don't teach use of goto to beginner programmers now a days. If our beginner is even given a language that has goto. It's almost universally considered a bad idea.

Similarly then we should not be exposing beginners to pointers. Ergo getting them writing traditional linked lists and such data structures with pointers is a bad idea.

3 Likes

That is like saying that because goto is considered harmful, we should not expose beginners to the concept of control flow. Pointers are not the problem. Indirection is pretty much unavoidable for any data structure more complicated than a flat array, and it is a tremendously important concept to understand. Instead, the problem is that languages before Rust treated pointers unsoundly, and the culture of sloppiness around memory management has permeated the field of programming.

Instead of not teaching pointers, we should be teaching them correctly – ideally in a language which has a clear notion of ownership and lifetimes.

1 Like

I think this is the main point. Of course you can write doubly-linked lists with pointers in Rust, you just need to implement everything unsafe-ly. Which is what you are doing in C anyway, there it's just implict.

The fact that it requires unsafe points out that it's hard to get right. So I would rather flip the OP statement: doubly-linked lists (etc) are just as hard to get right in C as in Rust. Singley-linked lists are easier in Rust than in C, because the compiler helps you.

2 Likes

You can use "weak", but the interface is clunky and you have to get away from single ownership.

Could this proposed feature, "derive owner()" work?

 
#[derive(Owner<Bar>)] // generates a function owner() that returns the owner of the object.
 struct Foo {
     s1: String
 }
 
 struct Bar {
     my_foo: &Foo, 
     s2: String
 }
 
 fn main() {
     let f1 = Foo { s1: "Hello".as_string() }; // initialize Foo
     let b1 = Bar { my_foo: &f1, s2: "World".as_string() };  // b1 now owns f1
     println!("{}", f1.my_foo.s1); // prints "Hello"
     println!("{}", f1.owner().unwrap().s2); // prints "World"

The idea here is that deriving an "owner" function does two things. First, it puts a hidden field in the struct that points back to the owner. Here, that hidden field would have type Option. Ownership by anything other than a Bar puts a None in the Option.
Second, it adds code to anything that takes ownership of that struct to update the hidden field.

Deriving "owner" for an object would of course limit it to single ownership, since otherwise the owner is ambiguous.

This allows making doubly linked lists and simple trees without reference counts.

OK, what's wrong with this proposal?

Derive macros can't modify their input declaration. You'd need another kind of proc-macro for that, e.g. an attribute.

How would that work? It looks like this would need "move constructors", which (I hope) basically won't happen.

2 Likes

As defined, types Foo and Bar are extremely limited and can't work. Bar is supposed to own Foo, but it is a view struct that actually needs a lifetime annotation:

struct Bar<'foo>{
    my_foo: &'foo Foo, 
    s2: String
}

But, that ties the lifetime of a Bar instance to the lifetime of a Foo instance, instead of the other way around (which is I think always the case for an owner).

In addition, what exactly should Foo's pointer to its owner look like? If it's a raw pointer, that's unsafe and will likely lead to UB and segfaults for some code using it. If it's a borrow, that will fail to compile because Bar already has a borrow to a Foo instance. And something like Rc or Arc means that Foo ends up owning Bar (in a shared fashion) rather than Bar being the owner of Foo.

I can't prove it atm, but currently I strongly believe that any proposal to add a link back to the owner is fatally flawed at the conceptual level.

You may be right. But if a back pointer to an owner was impossible, doing this with a weak pointer woudn't work either. You'd get the double-borrow panic if you tried to go back to the previous item in a list.

Weak works here because the node here is managed by the Rc system of shared ownership. While you're working with a back pointer, both the child and the grandparent share ownership of the parent node. This shared ownership prevents you from making any changes (outside of Cell / Mutex).

1 Like

What is your take on BAPC 2010 I Keylogger which you can submit here?

Scan the input and maintain two stacks: characters left of the cursor (top of stack is right-most) and characters right of the cursor (top of stack is left-most). The operations are

match input {
    '<' => pop_left_stack_and_then_push_it_onto_right_stack(),
    '>' => pop_right_stack_and_then_push_it_onto_left_stack(),
    '-' => pop_left_stack_and_then_drop_it(),
    _ => push_left_stack(input),
}

When done, read off the stacks (left stack: bottom to top, right stack: top to bottom).

Nice. This passes in 295ms:

// BAPC 2010 I Keylogger
// as suggested by https://users.rust-lang.org/t/data-structures-appear-to-have-an-extra-level-of-difficulty-in-rust-how-would-you-teach-them/61264/28
//
// @author godmar
use std::io::{self, BufRead};
 
fn main() {
    let stdin = io::stdin();
    for s in stdin.lock().lines().map(|l| l.unwrap()).skip(1) {
        let mut left : Vec<char> = vec![];
        let mut right : Vec<char> = vec![];
        for c in s.chars() {
            match c {
            '<' => if let Some(c) = left.pop() { right.push(c); },
            '>' => if let Some(c) = right.pop() { left.push(c); },
            '-' => { left.pop(); () },
            _ => left.push(c),
            }
        }
        for c in left.iter() {
            print!("{}", c);
        }
        for c in right.iter().rev() {
            print!("{}", c);
        }
        println!("");
    }
}

I had suggested this problem because it was the only problem we've ever encountered that would actually make use of Java's java.util.LinkedList and its removing iterators in a useful way.

Overall, upon reflection, it shouldn't perhaps come as a surprise that competitive problems can be solved in Rust; after all, the most commonly used paradigm to solving these problems is C++ without pointers.

Yes. If you do that with lists, you're chasing all over memory, getting cache misses, and doing way too much allocation and de-allocation for the amount of work done for each character.

2 Likes

It's more expensive, but not "too much" under the given problem bounds. It passes in about 0.75s (including JVM overhead and "slow IO" using java.util.Scanner. Though I haven't looked at Rust's I/O so I don't know whether stdin.lock().lines() is correct.)

What counts is how quickly you get the insight and can implement it. If you don't have the stack insight, you can implement it with a linked list straightforwardly.

PS: Using a faster I/O template for Rust suggested by a CF competitor, along with using u8 instead of char, drop Rust's time from 295ms to 30ms. Java's time with fast I/O dropped only from 748ms to 623ms.

PPS: Cache misses are not likely a problem here. The code has locality since elements are added at the cursor and each operation move the cursor only one element. Likely, the link elements are allocated in spatial locality in a contiguous area. What really kills performance is the per-character overhead (likely 24 bytes or so).

1 Like

What happens, if you just lock the stdout explicitly, as well, instead of locking it implicitly repeatedly? That seems to me like a simple change with perhaps a big impact.

// BAPC 2010 I Keylogger
// as suggested by https://users.rust-lang.org/t/data-structures-appear-to-have-an-extra-level-of-difficulty-in-rust-how-would-you-teach-them/61264/28
//
// @author godmar
use std::io::{self, BufRead};
 
fn main() {
    let stdin = io::stdin().lock();
    let stdout = io::stdout().lock();
    for s in stdin.lines().map(|l| l.unwrap()).skip(1) {
        let mut left : Vec<char> = vec![];
        let mut right : Vec<char> = vec![];
        for c in s.chars() {
            match c {
            '<' => if let Some(c) = left.pop() { right.push(c); },
            '>' => if let Some(c) = right.pop() { left.push(c); },
            '-' => { left.pop(); () },
            _ => left.push(c),
            }
        }
        for c in left.iter() {
            write!(&mut stdout, "{}", c);
        }
        for c in right.iter().rev() {
            write!(&mut stdout, "{}", c);
        }
       writeln!(&mut stdout, "");
    }
}