Reducing verbosity while satisfying the borrow checker

Hey Rustaceans!

I'm pretty new to Rust, coming from a C++ background, and I'm scratching my head over a borrowing issue. I've got this Graph struct, and I'm trying to figure out the most idiomatic way to work with it without running into borrow checker issues.

Here's what I've got:

pub struct Graph<T> {
    indices: Vec<usize>,
    edges: Vec<usize>,
    vertices: Vec<T>,
}

pub fn edges<T>(graph: &Graph<T>, vertex: usize, offset: usize) -> Result<&[usize], Error> {
    let edge = graph.indices[vertex];
    let size = graph.edges[edge];
    if offset >= size {
        return Err(Error::NoHandle);
    }

    Ok(&graph.edges[edge + header_element_offset() + offset..edge + size + header_element_offset()])
}

// Usage
let e2_edges = edges(&graph, e2, 0)?;
for &edge in e2_edges {
    if edge == 0 {
        get_mut(&mut graph, *edge) = "2.1 edited" 
        assert_eq!(*get(&graph, *edge), "2.1.edited");
    }
}

The problem is, when I try to modify a vertex after getting the edges, the borrow checker throws a fit. I get that it's because I'm borrowing the whole Graph in the edges function, even though I'm only using some fields.

I've found a workaround by making separate functions for each field:

pub fn edges_from_indices(indices: &Vec<usize>, edges: &Vec<usize>, vertex: usize, offset: usize) -> Result<&[usize], Error> {
    // Similar implementation
}

// Usage
let e2_edges = edges_from_indices(&graph.indices, &graph.edges, e2, 0)?;
    for edge in e2_edges {
        match *edge {
            0 =>{
                //*graph.get_vertex_mut(*edge) = "2.2.edited";
                *get_from_vertices_mut(&mut graph.vertices, *edge) = "2.2.edited";
                assert_eq!(*get(&graph, *edge), "2.2.edited");
            }
            _ => continue,
        }
    }
}

But man, this feels super verbose. In C++, I'd just pass a reference to the whole Graph and be done with it.

So, my question is: what's the idiomatic Rust way to handle this? Is there a pattern that lets me pass the whole Graph into the function without running into borrow issues, but also without having to split every field into its own parameter?

I've thought about a few options like using RefCell for interior mutability (But that has performance implication), splitting the Graph structure, or even using unsafe code with raw pointers (which feels wrong coming from C++, ironically). But I'm not sure what's the best approach here.

Obviously the program in the background is the same in both cases, but in one case the borrow checker doesn't know that only some fields are being used and instead counts the whole &Graph as a reference

Any insights from you Rust veterans would be super appreciated! Thanks in advance!

Full code here. What I wrote in this thread is just a very shortened pseudo-code

1 Like

I wrote a response below, but it doesn't actually do much (if anything) as far as reducing verbosity goes. Rust does tend to have a lot of boilerplate. My response does add some encapsulation.


Here are two articles on or related to the topic:

You've gone the "free variables" route with edges_from_vertices.

Here's a version using a view struct.[1]

Another option would be some sort of visitor pattern:

impl<T> Graph<T> {
    pub fn visit_neighbors_mut<F>(
        &mut self, 
        vertex: usize, 
        offset: usize, 
        mut visitor: F,
    ) -> Result<(), Error>
    where
        F: FnMut(&mut T),
    { /* ... */ }
}

Another (extreme) option would be to temporarily take ownership of the vertices, ala the sentinel pattern article.


Here's some unsolicited feedback about your playground in general.

  • Take &[T] not &Vec<T>. (And similarly &mut [T] when it makes sense.)

  • You have a lot of freestanding functions that could be &self or &mut self taking methods.

  • Don't use return X; as the last line in a function/method.

  • Almost no-one can get unsafe right in the learning phase.[2] Avoid it until you're more grounded.[3] Your unsafe block is possible in safe code. You also imported transmute (but didn't use it) which is what actually prompted this comment initially.

  • Your get and get_mut should probably return Options and you can implement Index and IndexMut for the panicking versions instead, mirroring the API of slices and Vec. Ultimately I removed them as they were unused.

  • get_from_vertices_mut seems superfluous.

  • (There were other unused functions/methods including two edges, which I just deleted rather than figuring out which to keep.)

This is a midpoint I got to with your playground, taking the above points into consideration. I.e. before adding the view struct or visitor parts.


  1. and a bunch of other changes from below ↩︎

  2. Beyond some real trivial cases like eliding an index check, at least. ↩︎

  3. Incidentally, what's UB in C/C++ and what's UB in Rust are not the same. ↩︎

7 Likes

which are negligible, with overwhelming probability.


Anyway, if you can't clearly express your data structure as non-overlapping borrows, then just make copies of the data you need. Copying even a moderately-sized contiguous slice of integers is nothing compared to all the branching and random access you are going to do with your graph traversal anyway.

which are negligible, with overwhelming probability.

Meeh. Being a long time C/C++ coder I got used to the zero overhead mentality "You only pay for what you use."
I never used shared pointers in C++ because ownership can be almost always clearly defined. If not, that means my design wasn't good enough.

Seeing stuff like that in the code makes me a bit uneasy. I avoid it like a plague and use it as the absolutely, completely last resort.

if you can't clearly express your data structure as non-overlapping borrows, then just make copies of the data you need

Don't want to do that. Memory allocations are one of the most expensive operations you can make. It has a massive overhead if the call has to go through the kernel.


@ quinedo

Thanks a lot. Really appreciate it as you put a lot of effort into that post. There is nothing better than example code to learn from.

I will look into that and make adjustments.

My only gripe is that you are accessing the edges directly through an index---I don't like this because a user has no business of knowing how the data is internally represented.

2 Likes

I don't necessarily disagree with the sentiment (and you could make a newtype for the indices to address this), but I still feel my modifications have more encapsulation than your playground where you were reaching into the (publicly exposed) fields of the Graph. Moreover, your use of get_from_vertices_mut was no less direct or hiding-the-fact that usize was an index than implementing IndexMut -- it has exactly the same signature, and merely less sugar. Implementation-detail wise, it's just as leaky.

Sadly, the playground is busted right now, but basically: put everything but main in its own pub mod but make the fields of Graph private. Your OP will break, whereas the use of a view type keeps those details private. (And the visitor pattern even moreso.)

In Rust, you also only pay for what you use.

If you are doing shared mutability, then you are paying runtime safety checks for not having designed a data structure that obeys shared-xor-mutable.

  1. This is simply false, as most modern memory allocators (including the ones commonly and by default used with Rust) absolutely don't reach out to the kernel upon every single allocation.
  2. Pointer chasing in a graph already has a massive overhead compared to code that predictably (ie., linearly) scans a flat array. Have you actually measured that cloning in the required places would actually result in an inacceptable (or even detectible…) slowdown?
1 Like

What you are saying is misleading.

  • Memory allocations are always expensive, regardless of whether they go through the kernel. Even a simple in-app memory pool has to manage memory and deal with issues like reallocations, internal and external fragmentation (which will happen if you do many small reallocations). Not doing it and accessing memory directly via pointers is always better if possible.

  • The overhead of pointer chasing is irrelevant. Graph structures by default are hard to deal with in a flat-like manner. That doesn't imply that you should give up performance in places where you don't have to.

  • If you can write performant code, write it. Cloning is slower than not cloning—therefore, a solution that doesn't involve cloning is better. No need to do measurements.

If you are doing shared mutability, then you are paying runtime safety checks for not having designed a data structure that obeys shared-xor-mutable.

Then my design is wrong, and the point is to rework it—which is why I made this post because I am new and don't know all the cool Rust quirks.


quinedot

Moreover, your use of get_from_vertices_mut was no less direct or hiding-the-fact that usize was an index than implementing IndexMut -- it has exactly the same signature, and merely less sugar. Implementation-detail wise, it's just as leaky.

get_from_vertices_mut accepted two input parameters that did background magic and returned you a value.

Your direct index access requires you to know how the underlying implementation works and what the values mean. If the data layout was different and more complex than an offset by index, the implementation you provided would fall apart---There is no encapsulation involved.

The rest of the feedback is very valuable however and will try one of those patterns

1 Like

We fundamentally disagree here. Optimizing without doing measurements is a shot in the dark. The point of measuring is to decide what you can get away with in a simpler way and what you have to re-work at the expense of increased complexity.

Trotting out blanket assertions like "not cloning is always better than cloning" is unscientific and empirically false, since the whole point is that some runtime costs are acceptable if they are small compared to other things the program is doing, while they also allow the code to be simplified. If what you are stating were true, there would simply be no Clone or Copy trait in the standard library.

I think the key difference here is:

  1. Rust has safe code, but C++ does not.
  2. Safe code relies on the borrow checker to be able to prove the soundness of what you're doing with pointers, and there are many entirely sound uses of pointers that the borrow checker cannot prove.
  3. “Good Rust code” often pays some “non-zero cost”, such as using Arc where it could in theory be avoided, to get the benefit of staying in the safe subset.

That is, there may be situations where the ownership is clearly defined, but not in a way comprehensible to the compiler, and we therefore make the choice of using Rc or Arc instead of unsafe code, because we value being able to say “this is 100% safe code, so the memory bug cannot be here”. It's not that you must use only safe code, of course, but rather that it’s supposed to be a last resort after the safe options are not good enough.

But C++ has no equivalent of “safe code”, so this tradeoff simply doesn't come up, and you might as well carefully write the zero-overhead version, since you have to be that careful all the time.

3 Likes

Why would that be a shot in the dark?
Clones / copies are always slower than re-using existing data. That is a fundamental fact. There is no reason for you to need to do any measurements.

Sure, writing efficient code can be a compromise. But engineering is all about finding unions between the possibe solutions and your requirements.

If there is a solution that allows me to write the code efficiently without copies, I will always prefer it over a code that doesn't.

what you are stating were true, there would simply be no Clone or Copy trait in the standard library.

That is a bit of strawman. The fact that that I don't want to use it here doesn't mean there isn't a legitimate usecase anywhere else.

False: copies that place the results in a more cache-friendly pattern are very often faster. Any copy that avoids a branch is near certainly faster. Any copy that is folded together with an operation that would read then write every byte of some data is free (think updating a table in place vs writing an updated copy).

Lots and lots more examples. Performance has pretty much one rule: you're never right about what's slow.

4 Likes

Place both approaches into the same circumstances...

If you have a guaranteed cache locality, direct access to that part of the memory over many iterations is going to be faster than copying the memory somewhere else and then accessing it.

Sure in a more complex scenario, if you have to chose between a system where you perform copies and it guarantees high cache locality vs a system that does direct access but branches a lot, then doing copies is better.

But that wasn't my point.

It isn't as simple as saying False or True.

The thing that's false is that it's always faster, as you said.

5 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.