Algorithm/technique for minimum required indirection in a type structure?

I am working on code generation which takes in an input schema model (Entity Data Model to be precise), and derives Rust types from it.

Types in the schema may be self referential, which means that the generated code may require some indirection - probably Boxing. For example, my generated code might naively produce the following, causing an "infinite stack size" compile error:

struct Foo {
    bar: Option<Bar>,
}

struct Bar {
    baz: Baz,
}

struct Baz {
    foo: Foo,
}

The simplest solution would be to make all fields Box variants. But clearly, in this example, indirection is only required on one of the fields. With a more complex type structure however, it's not necessarily so clear (to me at least) where indirection should be added.

Is there an algorithm to calculate where one should insert indirection, given an arbitrary graph of related types, with reasonable efficiency?

As an aside, the actual data that the generated code works with is expressed as literal JSON without references. So I think that we can assume that schemas will never include a non-terminating cycle (i.e. there will always be an Option<T> somewhere along the way). This should mean that boxing is sufficient, rather than e.g. arena-backed references.

I suppose this isn't strictly a Rust-specific question, but it's at least more relevant to languages which use stack-based values by default.

I suppose minimizing the amount of Boxes necessary is not feasible:

I guess one could look for appropriate approximation algorithms. Or if you want an easier approach, some kind of graph traversal would probably be sufficient, e.g. breadth-first or depth-first search (I’m not quite sure which one I’d prefer here): Whenever you see an edge back to a type that you already visited that last step would need a Box indirection.

1 Like

Oh, nevermind, should’ve read the whole article

Whether the directed version is polynomial time approximable within constant ratio [...] is an open question.

On a third read of that article, I’ve noticed that this is about removing vertices, whereas what you need is removing edges......

leading to

which is apparently efficiently solvable for undirected graphs.

1 Like

This is great, thanks! I'm wondering if the "particularly simple algorithm" toward the end of the article might be good enough for my purposes. I will play around with it.

I’ve just remembered that I once saw a heuristic approach to a very similar problem in a lecture on automated graph drawing (the target there was not to remove some edges but to revert some edges to get an acyclic graph). The approach was to first find a decent order of all the nodes so that all the edges going the wrong way would have to be reversed; or in this case removed. (Thinking about it, even the optimal solution would have to be of this form, since a directed acyclic graph can be ordered such that all the edges point in increasing order.)

Anyways, I looked it up and the heuristic looked like this:

Here Sl and Sr are sequences of nodes, and the final linear order is given by the sequence Sl followed by Sr.

For some context, the previous slides (click me):

^^^ In step 2, reverse does not apply for you, but rather remove

1 Like

I will have a go at implementing this when I get a chance! Maybe even submit it to a suitable crate, if I can develop it fully enough (from a quick search, it doesn't look like it's already implemented).

Thank you for taking the time to dig in and finding a good solution, by the looks of it.

For anyone interested, I've opened a pull request to add this algorithm to petgraph. The last merged PR was 3 months ago though, so it might take a while...