Manually implementing a heap to overcome safety rules considered harmful


#1

There’s been a fair bit of talk about using this pattern:

struct Nodes {
    nodes:Vec<Node>
}
struct Node {
    id:u32,
    refs:Vec<u32>
}

To implement a ‘safe’ arbitrary graph in rust.

Due to the single ownership rules, this pattern is usually quite difficult to represent in rust and typically would use raw pointer references (or some abstraction) instead of indexes into the master nodes list.

I believe the pattern was initially was popularized by http://bluss.github.io/ixlist/target/doc/ixlist/struct.List.html

Now… I consider this pattern to actively harmful.

You are effectively implementing a ‘virtual heap’ to store objects in, complete with virtual raw pointers into that heap; and just like normal raw pointers, it’s trivial to end up with null pointers and invalid pointers into that heap.

To be fair, it does work within the bounds of the safety rules, and will prevent the sort of arbitrary memory access errors that unsafe code and raw pointers can result it.

…but:

  • I would argue that this pattern discards even the limited safety that raw pointers provide for a completely unchecked set of custom indexes, and is extremely prone to errors and panics resulting in out of bound access, without any unsafe blocks to track where logic audits need to be applied.

  • This pattern is fundamentally not thread safe, as you cannot safely share (afaik) a vector between threads (although perhaps you could use a mutex controlled wrapper I suppose?)

  • It is always neccessary to implement this pattern with both a Node and Container type, with the latter storing the instances of the former, and maintaining the synchronization of the index values is not trivial to implement.

I’ve argued these points to a few people now, so I thought I’d put this thread up here for future reference.

What do other people think?

Does the safety of using the pattern out weigh the negatives?


#2

The answer to this is that Rust does not care about panicking at the “safety-level”. Everything that is memory-safe (these indexes are, because they’re checked when indexing into the vector) is considered safe by Rust.

On a side note: “considered harmful”, really?


#3

Note that these errors and panics would be memory safety violations with raw pointers/unchecked code. Instead of getting a direct debuggable exit, one gets the symptoms of memory safety violations… a segfault if you’re lucky, random heap corruption if you’re not (or, worse, absolutely no sign of it until an attacker feeds in some malicious input).

One can, and should, regard the panics as the language catching what could easily have been a horrible hole in the code, not something that’s worse than unsafe.

It’s perfectly thread-safe, at least, one cannot achieve thread unsafety without trying hard; if there’s no unsafe, there’s no thread unsafety, and if there’s inherited mutability (i.e. anything that mutates goes via &mut), like primitive integers and Vec, thread unsafety isn’t so likely with unsafe.

Raw pointers, on the other hand, discard these guarantees; especially if one wants to try to implement structure that’s supports manipulation from multiple threads without a mutex wrapper.


#4

At some point it could be nice to have specialized container classes

One idea is to have a graph where the nodes are stored in an arena. Such a graph would not allow nodes to be deleted, but ideally the arena should provide a guarantee against dangling pointers. As far as I understand this would require some changes to the lifetime system. (See Nicos comment http://www.reddit.com/r/rust/comments/2s2etw/leveraging_the_type_system_to_elide_bounds_checks/ )

Another idea is to allow a mutable Graph to coerce in to a kind of “mutable GraphSlice”. This mutable GraphSlice should not allow nodes to be added or deleted, but it should allow nodes to be mutated. A GraphSlices should be able to create shortlived checked indexes that are guaranteed to never outlive the parent Graphslice. These checked indexes should not own the parent GraphSlice, but they should provide safety from dangling pointer problems (similar to the checks indexes in the link above). As far as I understand this would also require the same changes to the lifetime system, but I may be wrong…


#5
  1. I don’t think it is popularized by me in the “ixlist” (index list) code. I just wanted to expose and explain it with a good example. I learned it (in Rust context) from rustc’s graph module. Trying it properly is a prerequisite to either of the choices of further adoption or rejection.

  2. It’s a useful technique for data structures in general. I’ve come across it in other languages too, like in Python. The basic idea is just to assign unique ids to your objects and then use them instead of language-level pointers. It can make sense for all kinds of persistent structures.


#6

There are multiple points here. [OT]Sadly we don’t have a tree structured messaging board. So now you get everything in one post[/OT]

Lets take on the (usual) uses of pointers

  1. use after allocate and before free
  • our usual case, we want this
  1. free after free (double free)
  • an id might be used again when an old id tries to call the free-equivalent
    • reference counting: Use a struct with some (private) meta-info and no Copy implementation as ID-type. Forcing you to get mutable access to the “heap” (vector) to copy “pointers” (ids) should do the job.
    • One word: HashMap
      • More words: hashmaps have id -> value mapping without requiring ids to be contiguous. Simply add 1 to every new id you issue, rust’s overflow mechanism will protect you.
  • an id might be free and freed again
    • you need to keep track of which fields are free anyway to allocate new fields
      • using an Option<T> instead of a T
      • refcounting
      • HashMap + incrementing ids
  1. use after free
  • see 2.
  1. use before allocate
  • see 2.
  1. never free (memory leak)
  • here you need reference counting, HashMap or Option don’t help.
    • And then you might leak memory due to cycles.
  1. never allocate
  • non-issue

Alternatively you can always use Rc to create a graph. The only issue is possible memory leaks due to cycles. So you need to do manual cleanup.

So all the issues go away with garbage collection. Yay. someone go write a gc crate for rust!


#7

You can click “reply as linked topic” to split them up, actually.


#8

It might be nice if Rust could eventually solve this with some kind of ‘GC arena’, which would be limited using lifetimes to a specific root rather than having to scan the whole heap.


#9

The point I was trying to make is that with a pointer based graph you can form N unrelated subgraphs and process (update) them in parallel without allocations; since this pattern uses a single point of backend storage, you cannot mutable share subgraphs across threads without allocating a new storage container.

‘safety’ and ‘robustness’ are mutually independent metrics for code.

You may get the ‘safe’, but robustness is also important; for a user, if your application crashes, that’s it. There is no ‘good failure’ and ‘bad failure’. It just crashed.

The core of my argument against this pattern is that I feel it significantly reduces code robustness: as I said before, the logic is complex, and auditing for bugs becomes more difficult.

I mean absolutely, if you create a safe abstraction that implements these patterns in a library; sure. …but that’s true of unsafe pointers for implementing a graph too.

Do we really win much from being safe with a high complexity, rather than having a safe abstraction that is unsafe internally with a low complexity and explicit zones for auditing?

I’m not convinced.


#10

That is a reasonable point (it definitely wasn’t obvious from your original post, though).

Although, humans are nothing if not fallible, and safe concurrent code is hard to write; I would certainly personally prefer to have classes of hard-to-debug concurrency bugs automatically disallowed, if possible, and writing code in a way that lets the compiler help you is a great to achieve this.

They are definitely not independent.

Robustness requires safety. If an application is unsafe, then it cannot be robust: all robustness guarantees are off the table if it misuses unsafe blocks and hence causes bad things like dangling pointers, memory corruption, undefined behaviour.

Of course, it is definitely true that that an application that is safe is not automatically robust, and additional thought/design/debugging may be required.

There is good failure and bad failure.

An application can use a monitor thread to “catch” and handle an unexpected panic as usefully as possible, e.g.

  • a web server that panics while serving a request would print a “500 internal server error” for that specific request, but continue handling everything else as normal
  • a text editor might try to save as much state as it can
  • in general, an application that panics while opening/processing a file could indicate that the specific file could not be handled due to an application bug (providing e.g. info about how to get help/file a bug) but let the user continue using the application,
  • if an image/video/audio decoder panics in a web browser, it could substitute a placeholder indicating an error, but not take down the whole browser, not even taking down the single web-page being displayed (I believe that servo may (try to) do this now, already)

It is harder to achieve this for memory safety problems, even handling/recovering from the best case (a segfault immediately after the memory safety violation) is non-trivial. And that’s just the best case, the problems may not manifest as a crash at all, just heap corruption and “heisenbugs”; silently breaking the user’s data, or letting their computer be pwned, or displaying their private keys to the whole internet (for example).

Of course, I definitely agree that high-complexity is not good at all, whether in safe or unsafe code, but please don’t ignore the difference between panic!s and unsafe.

It is definitely not clear to me that it will be possible to have a low-complexity graph abstraction using unsafe internally to expose a safe API that allows for e.g. mutating disjoint subgraphs concurrently.