How to efficiently compute depth of generated tree

Hello :waving_hand: , I'm new to rust so I'm trying to implement some interesting problem I already have in another language. But i keep running into problems with the borrow system. In particular I have this virtual tree, generated from the root that has type System<'a>.

System
struct System<'a> {
    env: Rc<Environment<'a>>,
    set: Set<'a>,
    process: Process<'a>,
    rules: Rc<Vec<Rules<'a>>>,
}

Set is implemented as a hashset of atoms and process is a recursive enum.

In particular, env and rules are shared by all members, set and process instead are unique for each node.

The first problem i have is that i would like to know the depth of the tree. Since i know that its perfectly balanced, it is enough to check the depth of the leftmost leaf.

fn depth<'a>(system: &'a System<'a>) -> Result<i64, String> {
    let current =
	match one_transition(system)? {
	    Some((_, system)) => system,
	    None => return Ok(0),
	};
    Ok(depth(&current)? + 1)
}

The function one_transition simply follows the leftmost branch and has signature:

fn one_transition<'a>(system: &'a System<'a>) -> Result<Option<(Label<'a>, System<'a>)>, String>

where the label is metadata about the transition so its ignored in this case. The computation may fail so its wrapped in a Result, if there is a next node we return Some(...) otherwise we return None.

The problem with the function depth is that it keeps in memory all previous nodes until we have found the last one. I would like to rewrite it in a more imperative style like:

fn depth<'a>(system: &'a System<'a>) -> Result<i64, String> {
    let current = one_transition(system)?;
    if current.is_none() {
        return Ok(0);
    }
    let mut n = 1;
    let mut current = current.unwrap().1;
    while let Some((_, next)) = one_transition(&current)? {
        current = next; // incriminating line
        n += 1;
    }
    Ok(n)
}

except that it doesn't work since current is dropped. Cloning doesn't seem to help.

The second problem is if i need to return some information about the last element the compiler keeps again complaining about lifetimes and how i cant clone stuff. But solving the first problem probably solves the second too.

You haven't shown the full structure of your tree, but from what you wrote it's very likely that you wrote a self-referential struct. This is a common issue that people new to Rust face, so you might want to read about it to sort it out.

The first thing I would try is changing all occurrences of &'a Xxx<'a> to &Xxx<'a>. Assuming you're not using an arena for allocating all these objects, this is a very common mistake that causes "borrowing forever". See:
https://quinedot.github.io/rust-learning/pf-shared-nested.html

You may still have other lifetime problems, but this is a good first step.

3 Likes

There is no "tree", i just generate the new nodes upon request from the parent. If you are referring to Process<'a>, then its something like this (obviously simplified):

enum RSprocess<'a> {
    Zero,
    One{ identifier: &'a str },
    Two{ set: Set<'a>,
	     next_process: Rc<Process<'a>> },
    Multiple{ children: Vec<Rc<Process<'a>>> },
}

Thanks! unfortunately it doesnt help in this case: the only point i can remove explicit lifetimes is in the depth function:

Errors
fn depth<'a>(system: &System<'a>) -> Result<i64, String> {
    let current = one_transition(system)?;
    if current.is_none() {
        return Ok(0);
    }
    let mut n = 1;
    let mut current = current.unwrap().1;
    while let Some((_, next)) = one_transition(&current)? {
        current = next; // incriminating line
        n += 1;
    }
    Ok(n)
}

and desnt compile:

error[E0506]: cannot assign to `current` because it is borrowed
   --> .../file.rs:122:2
    |
121 |     while let Some((_, next)) = one_transition(&current)? {
    |                                                -------- `current` is borrowed here
122 |     current = next; // incriminating line
    |     ^^^^^^^
    |     |
    |     `current` is assigned to here but it was already borrowed
    |     borrow later used here

For more information about this error, try `rustc --explain E0506`.

You may have a larger design problem and if you can show the full code then someone may be able to spot it. The red flag is that you're using lots of references, all with the same lifetime.

If you can't post everything, by linking to a repo for example, then I suggest at least posting enough code to see how the returned System is constructed by one_transition.

I don't understand why the first lifetime needs to be 'a in system: &'a System<'a>. This is incompatible with the fact that you're storing a System in a local variable (current) and passing a reference to that local variable. Does that make any sense?

1 Like

It is probably a larger design problem. I will probably try to convert everything to not use references, its not clear to me yet when references are the same as pointers (in c for example) in this aspect, but ill try to read up on that.

The structure is: i parse some rules and questions on the resulting structure -> from these rules i generate some trees and check if rules are satisfied or not -> i return the results.
one_transition is deceptively simple:

fn one_transition<'a>(system: &'a System<'a>) -> Result<Option<(Label<'a>, System<'a>)>, String> {
    let mut tr = TransitionsIterator::from(system)?;
    Ok(tr.next())
}

The iterator is where everything is computed, and i reuse it also in the case where i need all the children. Its also 200 lines of code, ill try to summarize how it works.

Iterator
struct TransitionsIterator<'a> {
    all_iterator: std::vec::IntoIter<(Rc<Set<'a>>, Rc<Process<'a>>)>,
    system: &'a System<'a>,
}
impl<'a> TransitionsIterator<'a> {
    pub fn from(system: &'a RSsystem<'a>) -> Result<TransitionsIterator<'a>, String> {
        // here i build the iterator (it can fail sometimes)
        // all_iterator is storing succinctly how to generate the children, but they are not stored since they may be GB
    }
}

impl<'a> Iterator for TransitionsIterator<'a> {
    type Item = (Label<'a>, System<'a>);

    fn next(&mut self) -> Option<(Label<'a>, System<'a>)> {
        let (c, k) = self.choices_iterator.next()?;
        // ... after some computation i return the next children
        let new_system = System::from(
            Rc::clone(self.system.get_env()), // get_env returns Rc<Environment<'a>>
            set,
            (*process).clone(),
            Rc::clone(self.system.get_rules()), // get_rules returns Rc<Vec<Rules<'a>>>
        );
        Some((label, new_system))
    }
}

The thought process was: in the function one_transition i just need to read the content of system, and I don't want to pass the whole data structure, so a pointer would be best. I generate the corresponding label and new system that reference the previous system with reference counters.
System needs the lifetime because I'm storing &str somewhere in a subtype. It does make sense that the compiler wants to delete the previous node traversed, and i dont know how to tell it to actually ignore the lifetime by cloning or something else.

Pointers in C that are allocated with malloc correspond to Box/Arc/Rc in Rust, not to Rust references, which are meant to be short-lived. The first paragraph of this post is a good description of how references should be used in Rust, when first learning at least.

2 Likes

I'll read it and if I have the time will post here how I changed the code. Thanks again!

1 Like

Passing a reference is fine, but not with the same lifetime as the lifetime in System<'a>. The lifetime of a reference to a local variable should almost never be the same as the lifetime stored in a struct (although there are exceptions, this is not the direction to go when you're learning).

The problem is more that current has been borrowed in order to return next, so the lifetime of the latter cannot be longer than the lifetime of the former. In general it will help a lot to get rid of the lifetimes in the data structure, and then try to avoid explicit lifetime annotations ('a) on references. The default lifetimes for references usually work well when passing references as parameters.

One thing that could help (in addition to getting rid of references in the data structure) is to pass current (not &current) to one_transition. This will move/consume current, but that's actually what you want in this case. And more importantly it will avoid borrowing from current to create next.

2 Likes

This makes sense! So as a general rule: avoid borrows and instead use counters or boxes, or is this also wrong?

Wouldn't that copy/clone the value on the stack before passing to the function? If I don't want to have it copied in c i would simply pass the pointer, so i should pass a box or a rc, right?

For data structures, yes. For parameters and temporary local variables, references work well.

System has four fields and they all contain just a pointer plus length and/or ref count. Copying such a struct on the stack is of no concern, and this copy may even be optimized away by the compiler if the called function is inlined. The (potentially) larger content of these fields (behind the pointers) is not cloned when moving the variable containing the System.


If you can post it again after you rework it a little, I think that would be a good time to give more detailed comments.

2 Likes

You most likely want the iterator to have an additional lifetime 'b and store a &'b System<'a>.

This is misleading. C pointers definitely correspond to references sometimes. Part of the translation is then adding the lifetime information, which was implicit in the C code and never checked by an automated system like the Rust compiler. This step is however not always possible (if the C code had a weird or even wrong handling of its implicit lifetime) and you may also get it wrong if you don't think about how you're using your lifetimes (e.g. a common mistake is using the same lifetime for everything, which causes everything to become tangled up and may require you to keep unrelated stuff alive for no reason).

3 Likes

Rust does the same thing to C pointers that C itself did to word in B (C predecessor).

B only had one data type: word. C added char and short, but, more importantly for our discussion, pointers were split out of ints.

Trying to map “C pointers” to anything in Rust is the exact same excercise in futility as an attempt to map that “machine word” datatype of B into anything in C.

Some C pointers would be mapped to references, some would be mapped to Arc or Rc, some would even be mapped to Rust's raw pointers.

That's why Mistake 6 in the well-known article How not to learn Rust is so expansive, with seven submistakes: most other languages simply don't include so many different flavors of pointers! In fact last 20 years were spent on attempt to hide the complexitty from the developers while Rust exposes it.

This makes Rust so different from most other mainstream languages that attempts to bring over designs from other languages often hurts more than helps.

1 Like

Thats a problem i keep finding with rust blogs, documentation and even in some errors during compilation: you are telling me what not to do, but no solutions to my problem. In other languages I had to learn there was a clear path or idea to follow, classes for java, generic functions for ocaml, message passing for erlang, ... So, are there resources on how to solve common problems "the rust way"?

Other than that, it seems like Mistake 7 was written for me :slight_smile:

The issue is, there is no general and performant solution.

For example you could wrap everything in Rc and RefCell and never use long lives references, but this would just be a worse garbage collector. This is not far from what most GC languages tell you to do.

Or you may want more performance (after all that's probably one of the reasons you choose Rust, no?) and decide to use references, but then you "lock in" a design that forces you to architect around that, and this requires knowledge about the rest of the system that will interact with it and the restrictions that this imposes. When people talk about "most common problems" they most often won't include all these informations, nor is it really possible to write some guidelines including all those cases.

3 Likes

At the end i reworked the problem: instead of storing strings i have a step during the lexer/parser and before computation that converts all strings to unique integers with a hashmap, and a last step to convert back. So now I store only u32 and they dont need lifetimes. I can also store String in the hashmap since its used only twice.

I know this is not really "solving" my initial problem. Another solution would have been to replace all &str with String and let structures own the data.

Code

Now my functions are like this:

fn one_transition(
    system: &System,
) -> Result<Option<(Label, System)>, String> {
    let mut tr = TransitionsIterator::from(system)?;
    Ok(tr.next())
}

fn all_transitions(
    system: &RSsystem,
) -> Result<Vec<(Label, System)>, String> {
    let tr = TransitionsIterator::from(system)?;
    Ok(tr.collect::<Vec<_>>())
}

fn one_target(system: &System) -> Result<i64, String> {
    let current = one_transition(system)?;
    if current.is_none() {
	return Ok(0);
    }
    let mut n = 1;
    let mut current = current.unwrap().1;
    while let Some((_, next)) = one_transition(&current)? {
	current = next;
	n += 1;
    }
    Ok(n)
}

I still keep a reference in the iterator tho:

#[derive(Clone, Debug)]
pub struct TransitionsIterator<'a> {
    choices_iterator: std::vec::IntoIter<(Rc<Set>, Rc<Process>)>,
    system: &'a System,
}

Thank you, that's correct. I was thinking of C pointers allocated with malloc in particular, and I'll add that to my post.

1 Like

Great! You removed the invalid lifetimes on temporary references, so I'd say you did solve the problem.

PS. Since you know C you may be interested in this recent book:
https://rust-for-c-programmers.com/
It's been discussed here:
https://users.rust-lang.org/t/introduction-to-rust-for-c-programmers/119965

2 Likes