Callback to recursive function

Hi!

I'm very new to Rust, but been coding in C, C++ and Python for many years. I somehow managed to build a small JSON parser based on Serde and pure luck, no skills involved! The data comes from Figma's REST API.

My problem is this:
I need to understand the tricky parts of Rust better. I think if I can get below problem solved, and explained, it would help a lot. I understand the concept of dangling pointers and scope in general, but not how to handle them in Rust. The & and * operators seem similar to C/C++, but then there are all these things with lifetime syntax, as_ref(), collect() etc. I have a rough idea about what Box does (allocates data on the heap, right?).

I have already downloaded the JSON file locally.
I first make these calls in main():

let json: String;
json = fs::read_to_string(infile).expect("Read error");
let result = serde_json::from_str(&json);
let res: FullFigma = match result {
        Ok(file) => file,
        Err(error) => panic!("Problem reading JSON {:?}", error),
};

At this point I can directly access JSON elements via the res variable, which is the top of the structure. It has some sub elements of type Document, which are recursive and sequential, plus some other elements which I want to access once this problem is solved.

If I understand correctly, Serde puts the data on the heap, which is probably why I can build and return the ids vector from the traverse_children() function below?

Inside main(), I have defined a callback function named probe(), like this:

    fn probe(t: &Document) {
        println!("probe: {}", t.id);
    }

and then I make this call, also from inside main():
let ids = traverse_children(&res.document.children, probe);

The point in having a callback function, is to be able to make the traverse_children() function generic, so I only have to write different callbacks without having to change traverse_children(). These callback functions may build their own data structures and lists, but the original data will never be altered. The callback functions should be called for every node in the list every time I call traverse_children().

The traverse_children() function is separate, defined outside of main(). It itself contains the definition of a recursive() function, which is called for each child node it detects:

fn traverse_children<'x>(tree: &'x Vec<Document>,
    probe: fn(t: &Document)) -> Vec<&str> {

    fn recursive<'node>(node: &'node Document,
        ids: &mut Vec<&'node str>,
        probe: fn(t: &Document), level: u16) {
        probe(node);
        ids.push(&node.name);
        if !&node.children.is_none() {
            // This might be an empty array!
            let childcount = &node.children.as_ref().expect("IGNORED").len();
            if *childcount > 0 {
                for ii in 0..*childcount {
                    recursive(&node.children.as_ref().expect("DING")[ii],
                        files, probe, level + 1);
                }
            }
        }
    }

    // This is not important:
    let mut ids = Vec::new();

    for child in tree.iter() {
        recursive(child, &mut ids, probe, 0);     
    }

    ids
}

The return value, ids, is not of interest, and please ignore the ugly leaf-detection.

The probe() function successfully prints the id fields of all nodes.

But what if I wanted to let probe() create a list on the heap, where each element is some struct to hold values copied from the nodes? How do I do that?
When it's done, I want to be able to use the list from inside main().

Let's say I define a structure like this:

struct Elem {
  id: String,
  level: u16,
  node: *Document, // Pointer to the the original element
  parent: Some(*Document), // Pointer to target's parent (if any)
}

The node and parent fields don't exist in the original data, They have to be calculated by probe().

I don't want to copy Document nodes to my list. I want to point to them instead, to save space because they are quite large, and there will be many lists. The Elem definition above is probably wrong, just a guess.

Once the lists have been built, I don't think I need to change them until the program terminates.

I really would appreciate a good description, with explanations and quite "straight to the point". To summarize the questions:

  1. How do I make probe() create a list of Elem elements, which I can access from main(), not just from inside traverse_children()?
  2. How shall probe() be written, to create pointers and not copies to the original nodes and their immediate parents?
  3. How shall Elem be defined, to correctly point to the original nodes?
  4. Once the Elem list is created, how can it be used to access any of the original node fields (id, name etc.)?
  5. How would a Rust expert rewrite the code, to make it more rusty?

Many thanks in advance!

Can you share an example of the JSON you use?

I recommend you start with one or more of the things listed under the Get Started section here: Learn Rust - Rust Programming Language

The particular problem you're going after, callbacks + filling one data structure which references another + more, needs a solid foundation since merging these concepts together requires a higher level of Rust expertise than someone just starting out should expect from themself.

1 Like

Assuming res (and it’s fields, recursively…) already owns all the data – in particular the values of type Document – you want to point to, and you keep it around for long enough, you can probably work with references. So it’s possible to define a struct like

struct Elem<'x> {
  id: String,
  level: u16,
  node: &'x Document, // Pointer to the the original element
  parent: Some(&'x Document), // Pointer to target's parent (if any)
}

also note that these references only give immutable access.

In this situation, all of your additional information collected via probe, comes from a single, relatively long-lived borrow of the original &Document argument – they all have the same lifetime, and in cases where you need to name the lifetime, it’s probably easiest to use a consistent name, instead of switching between 'node and 'x.

A Vec<Elem<'node>> then works very similar to how Vec<&'node str> worked for you before.


Without understanding the whole use-case I of course cannot be certain that these constraints (all references refer to things already owned by a sufficiently long-lived variable, presumably res in the main function, and immutable access through the “pointers” is sufficient) work for you. Commonly, when such conditions aren’t met, with structs that have lifetime arguments, it’s quite easy in Rust to paint oneself into a corner of impossible to fulfill lifetime constraints.

2 Likes

On second look I notice the code structure better. You are saying that probe, which is a callback, should be the thing to create the list of Elem. This is quite possible, but requires some adjustment to the function signature.

For once, the type signature probe: fn(t: &Document) currently spells out an (implicit) higher-ranked types, explicitly for<'a> fn(t: &'a Document). This unnecessarily requires too much of probe, which – in recursive – always gets passed a &'node Document, i.e. one of particularly long lifetime (coupled to the node: &'node Document argument).

Also you want the callback to have access to some state in the main function, so you can build up the list. The way the callback can capture state is via the traits Fn/FnMut/FnOnce. The right choice for this case is FnMut, so you can call it multiple times, but it is allowed to have mutable access to some captured state during the call. Finally, in order to easily pass it to multiple recursive calls, inside of the recursion, the callback should best be a mutable reference. The only remaining question then is whether you want the recursive function to be monomorphized for every choice of probe function, at the cost of using non-static function calls, or whether you want it generic over the closure type. Let’s go with the latter; for the dynamic approach, you’d use something like &mut dyn FnMut(&'node Document) instead (or with 'x… eh, inconsistent naming… let’s all call them 'tree for now :smile:)

So here’s roughly how this should look at the end:

fn traverse_children<'tree>(tree: &'tree Vec<Document>,
    mut probe: impl FnMut(t: &'tree Document)) -> Vec<&'tree str> {

    fn recursive<'tree>(node: &'tree Document,
        ids: &mut Vec<&'tree str>,
        probe: &mut impl FnMut(t: &'tree Document), level: u16) {
        probe(node);
        ids.push(&node.name);
        if !&node.children.is_none() {
            // This might be an empty array!
            let childcount = &node.children.as_ref().expect("IGNORED").len();
            if *childcount > 0 {
                for ii in 0..*childcount {
                    recursive(&node.children.as_ref().expect("DING")[ii],
                        files, probe, level + 1);
                }
            }
        }
    }

    // This is not important:
    let mut ids = Vec::new();

    for child in tree.iter() {
        recursive(child, &mut ids, &mut probe, 0);     
    }

    ids
}

Note that I’ve left the outer probe argument as an owned value, for convenience. This essentially merely generalizes the use-case as for some F: FnMut(…) closure, the type &mut F will implement the same FnMut(…) trait bound.

Sorry for the piecemeal replies… anyways, the last aspect is probably the question of how probe is supposed to know the immediate parents of the Document. I assume one possible approach is to simply give it a second argument, Option<&'node Document> [so then it’s FnMut(Option<&'node Document>, &'node Document)] for the immediate parent to be provided by the recursive function, right? Similarly, I’m not sure if you need to calculate the level information, too, of it that’s something already found in the data. If it’s the thing computed by your recursive function, that’s a good candidate for another argument to probe, as otherwise, I wouldn’t know how it should (easily) keep track of the current level itself. If there’s any remaining problems with finishing this up with my information above, feel free to post follow-up questions :slight_smile:


Another note: iterating by-index, as you do in the for ii … loop, is usually unindiomatic, and you should instead try something like for child in node.children.as_ref().expect("IGNORED") … (you also don’t need the if *chouldcount > 0 part then; though that might’ve been redundante already).

On another note, doing a is_none check followed by unwrap or expect on the same Option is note idiomatic either, you should use something like if let Some(children) = &node.children.

I also wonder what the type of children is in the first place. Option<Vec<…>>? Maybe the Option layer ways unnecessary to begin with, as empty vecs are possible… though I don’t know how that might change the interaction with the JSON, off the top of my head – feel free to try it though!

2 Likes

These files are quite large, so I hesitated to include an example in my question. But fortunately I found this site: https://pavellaptev.github.io/JSON-from-Figma/
If you press Parse My File, You'll get a downloadable example, or you can expand it a bit down on the page.
What is of interest here is the document element(s). If you expand them you can see that they contain children elements which contain other children elements, thus being recursive. I don't know why the outer nodes are called document, since they seem to have exactly the same structure as children elements.

Thanks for the advice. I think I learn best by mixing reading with trying to solve a real, own problem. Once this recursive problem is solved and I understand the solution, I'll carry one reading! :smiley:

Re: Learning more (about) Rust.. especially on the basis of having some C++ background, for an overview over some interesting aspects of and principles in the Rust language, I can also highly recommend watching this video presentation at some point (or even right now; there are no prerequisites):

1 Like

Thank you very much Steffahn!

I will study your answers in detail. It will take a while I guess!

Here's some feedback on all your replies:

  1. Yes, we can assume res already owns the data. This is what I referred to as the "original" data which is unaltered throughout the entire session.
  2. I think it's OK with immutable access from the Elem structs.
  3. I think you have grasped the idea of probe() very well. This function is like a tourist following a guided tour through a city. One probe-tourist is very interested in animals, and creates a list of all animals he/she sees in the city. Another tourist is only interested in counting the number of steps taken, so doesn't even make a list, just counts steps, and so on. The recursive function is the guide, who always takes the same tour. The main() function is the hotel, where the tourists will review/list their findings after the tour. If I understand you correctly, FnMut allows the tourists to add notes to the their lists back in the hotel. That can be useful. I'll try it.
  4. The level calculation is not a problem. As you can see, I just add one for each recursion level.
  5. Thanks for the comments about unidiomatic parts!
  6. Those children are quite complex. You can find examples here: https://pavellaptev.github.io/JSON-from-Figma/. But I don't think they are very interesting.

I will now study your replies in details and try them out. I'm not yet sure I'll be able to handle the lists, that probe() shall generate, in the main() function properly, but let's find out!

Once again thanks!

1 Like

Cool! As I mentioned before, follow up questions are welcome, but I thought, I'd let you read and experiment yourself first. For example, as an intermediate step, (maybe even as a first), you could also try to use the simple, single-argument probe callback like the code I posted in my replies to make it generate the Vec<&str> or ids; then move on to the problem of Elem and extra arguments, afterwards.

For the level I wasn't concerned with any difficulty but with difficulty to find out the level within probe if it wasn't provided to probe as an additional argument. But maybe the idea that additional arguments might help was clear to begin with, anyways. The argument is similar for determining parent - not hard, but it would be hard if probe needed to do it without the right context.

On that note, I suppose it would even be possible to generalise this, if you don't want to hard code the nature of the extra information (like the level and the parent in this case) that's accumulated down the tree and passed into probe. We can discuss this point afterwards (once you got some working code) if you like - I think I have an interesting idea in mind on how this can be made more generic.

1 Like

The more generic, the better of course. But you're right, I need to fiddle with your suggestions first which will take a while. I'm also watching that excellent video comparing C++ and Rust. It is really helpful. Halfway through it now.

1 Like

I followed your instructions and watched the video. If I change the signature of the probe parameter from fn to FnMut, I keep getting this error message:

error: expected one of `!`, `(`, `)`, `+`, `,`, `::`, or `<`, found `:`
220 |         probe: &mut impl FnMut(t: &'tree Document), level: u16) {
    |                                 ^ expected one of 7 possible tokens

The "arrow", ^, points at he colon in t:.

FnMut seems to be associated with closures according to the documentation, so I redefined probe in main() to be a closure:

    let probe = |t: &Document| {
        println!("probe: {}", t.id);
    };

This worked exactly as when using the function as before. Declaring it as fn parameter to traverse_children() worked, but I got the above error message when declaring it as FnMut.

Do you know what is expected here - and why? I'm not familiar with the Fn, FnOnce and FnMut concept.

Remove the t:. You can't give FnMut parameters names (or binding patterns).

probe: &mut impl FnMut(&'tree Document)
1 Like

Ah, sorry, I hadn't actually tested the code examples yet that I sent you. On the other hand, handling a few compiler errors can be a good exercise :grin:

Fortunately, @quinedot already explained the solution to this problem sufficiently :innocent:

1 Like

Thanks to you and @quinedot, the updated code is now working. But now comes the tricky part:

I don't want traverse_children() and recursive() to care what probe() actually does with the Document record, and I don't need these functions to build up and return the ids list.

In C, I would just send a void * pointer to probe() and let it do whatever it want with it, but I guess I can't to that in Rust. And whatever probe() creates, it shall be available in main() after the call to traverse_children().

Example: Let's say I define probe1() to just count the number of times it has been called, returning a simple u16, and probe2() to create a list of struct Elem records (perhaps with some additional fields).
So I'd like to call traverse_children() twice, first passing probe1(), then passing probe2(), but never update traverse_children() or recursive().

How would this be done in Rust?

My answer depends a bit on what you want to do. I can of course write some complete example solution, or you can share what your code looks like right now, to give sufficient context for your description of shortcomings and desired improvements to make more precise sense, and so I / we can give pointers for changing / improving / generalizing / ... the thing step by step.

(In any case, I'd be curious what kind of thing you've written and made work yourself so far, and which ones of the pointers I have above [e. g. additional arguments; or writing a simple probe function call that replicates building the ids list without hardcoding that] you already addressed / tried.)

1 Like

I have now managed to let the probe closure create a vector of struct Elem which can be accessed in main() after the call to traverse_children().

This is my definition of Elem:

#[derive(Debug)]
struct Elem {
    id: String,
    level: u16,
}

In main() I do the following:

    let mut elems: Vec<Elem> = Vec::new();
    let mut nbr_of_calls_to_probe = 0;
    let probe = |ttt: &Document, lvl: u16| {
        nbr_of_calls_to_probe += 1;
        println!("probe: {}", ttt.id);
        let elem = Elem {
            id: ttt.id.clone(),
            level: lvl,
        };
        elems.push(elem);
    };

    traverse_children(&res.document.children, probe);

    println!("elems: {:#?}", elems);
    println!("nbr_of_calls_to_probe: {}", nbr_of_calls_to_probe);

This works fine. An extract from the output:

probe: 381:364
probe: 13:4
probe: 381:363
elems: [
    Elem {
        id: "61:0",
        level: 0,
    },
    Elem {
        id: "381:366",
        level: 1,
    },
]
nbr_of_calls_to_probe: 53

All I had to do in traverse_children() and recursive() was to add the extra level parameter. I think for now, it's acceptable to pass it to the probe, but it would be nice to make it optional later on. But as I understand it, Rust doesn't allow optional parameters as in Python, right? Anyway, I'd like to deal with that problem later.

I now want to add a pointer in the Elem struct, which points to the corresponding node so I can obtain more data from it, not just the id. F.x all Document records also have name fields. I understand that Rust allows raw pointers but they shall be avoided. I fully agree to that! Avoiding dangling pointers and buffer overflows are my main reason for learning Rust!

I can fix the problem, but only with raw pointers, as follows:

Add this field to struct Elem:
node: *const Document,

Add corresponding field in probe():
node: ttt,

At the end of main(), print out the name field of the first two elems records:

    unsafe {
        println!("elems[0].name: {}", (*elems[0].node).name);
        println!("elems[1].name: {}", (*elems[1].node).name);
    }

This works fine. Output example:

elems[0].name: Product
elems[1].name: BlueFrame

But how to do it ununsafe?

Indeed, optional parameters are not possible; the best you could do is offer different simplified alternative versions of the traverse_children (with different name, too) that don’t declare those closure arguments; but that’s not commonly worth it. With type inference, closure arguments can be quite compact. For example, with the new additional level argument, I would write a call to traverse_children that merely counts the number of calls, i.e. a function that even ignores both arguments, simply as

    let mut nbr_of_calls_to_probe = 0;

    traverse_children(&res.document.children, |_, _| {
        nbr_of_calls_to_probe += 1;
    });

    println!("nbr_of_calls_to_probe: {}", nbr_of_calls_to_probe);

so the additional level argument doesn’t require all that much extra typing, if you’re going to ignore it.

Even when I’d use the arguments, I’d commonly omit the : &Document and : u16 type annotations. I’m not certain off the top of my head if type inference works just as good, or if it’s worse (Edit: turns out it’s worse, see the following replies), in the style where you define the closure let probe = …; first, and then pass it subsequently, compared to the approach of writing the closure expression as an argument directly. Feel free to compare these within the full context of your code, I’ll skip on writing a mock-up, and I’d recommend working with the style of passing it directly (as I mentioned above) as that has less potential for running into any possible compiler limitations. With more experience, if you have more confidence in what should or shouldn’t be accepted by the compiler, you can of course always go back to any preferred style, but it may be nicer to proactively avoid even the potential for unnecessary complications :innocent:

In this context, I’d personally thus write your same code from the main function in this style instead:

    let mut elems = Vec::new();
    let mut nbr_of_calls_to_probe = 0;

    traverse_children(&res.document.children, |ttt, lvl| {
        nbr_of_calls_to_probe += 1;
        println!("probe: {}", ttt.id);
        let elem = Elem {
            id: ttt.id.clone(),
            level: lvl,
        };
        elems.push(elem);
    });

    println!("elems: {elems:#?}");
    println!("nbr_of_calls_to_probe: {nbr_of_calls_to_probe}");

(contains further minor changes such as using type inference for elem, or the ability to name variable names in format strings directly)

For the pointer, I’d suggest, as I had already hinted in some earlier comment, to use a shared/immutable reference. It should be as simple as

  #[derive(Debug)]
- struct Elem {
+ struct Elem<'tree> {
      id: String,
      level: u16,
+     node: &'tree Document,
  }

……

        let elem = Elem {
            id: ttt.id.clone(),
            level: lvl,
+           node: ttt,
        };

See if these changes work for you without problems.

On that note, in general, when working with any structs with lifetimes, I’d suggest activating the lint

#![deny(elided_lifetimes_in_paths)]

at the top level, just to make sure that you don’t accidentally use the (in my opinion too confusing) old syntax that allows you to leave out the lifetime argument of Elem entirely; for explicitly elided lifetimes, one would write Elem<'_> instead. It’s unfortunate that due to shortcomings of the lint in some corner cases, it isn’t active by-default yet. (What are “elided lifetimes”? Elided lifetimes are inferred for type annotations inside of functions, and follow elision rules (also in more detail) for function signatures. For the most typical types with lifetimes, references, elided lifetimes look like “&T” or &mut S, whereas explicit ones are written &'a T or &'b mut S)


You could also avoid the need for cloning the ids and use a reference like id: &'tree str for that field, too.

Feel free to try that out, too. FYI, the usage of &str instead of &String is more idiomatic in Rust, (and in other contexts clippy will also point this out).

2 Likes

Thanks! It's very good to see alternatives, such as your more compact lambda styled probe.
I forgot to tell you that I tried your variant of struct Elem. But this gives me these errors:

149 |     let mut elems: Vec<Elem> = Vec::new();
    |         --------- `elems` declared here, outside of the closure body
...
159 |     let probe1 = |ttt: &Document, lvl: u16| {
    |                   --- `ttt` is a reference that is only valid in the closure body
...
167 |         elems.push(elem);
    |

I understand why I get these errors, but not how to fix them. To me it seems as if the lifetime issues should be OK already? I somehow must transfer the reference of current Document from ttt to node in probe(), to make it stick even after ttt has been destroyed. But clone() doesn't seem to work.

I've tried a lot of variants but I find it meaningless with too much trial and error.