Return sub-type, or use nested closures?

I'm building a framework for generating docs, in which there's a trait Generator that takes in displayable items and generates the docs in its format.

There is a need for nesting items (e.g. links with text, lists, tables), and I'm trying to decide between two patterns. My first option is to return a Nest type that exclusively borrows the generator, which you can then use just like the original generator. Some pseudocode:

let mut g = ExampleMarkdownGenerator::new();

g.add("h1", "Hello, World!");

// mutably borrows `g`, so you can't add to it until
// `list` goes out of scope with `end`
let mut list = g.nest("ul");
list.add("p", "Item 1");
  
let mut bold = list.nest("b");
bold.add("p", "This is bold");
bold.end(); // we can use `list` again here
  
let mut list2 = list.nest("ul");
// and so on...
list2.end();

list.end();
// now, and only now, can we use the original `g` again.

Option two is to use closures to provide nesting. That might look like this:

let mut g = ExampleMarkdownGenerator::new();

g.add("h1", "Hello, World!");

g.nest("ul", |g| {
  g.add("p", "Item 1");
  
  g.nest("b", |g| {
    g.add("p", "This is bold");
  }); // bold ends here
  
  g.nest("ul", |g| {
    // and so on...
  });
}); // list ends here

The closure form is theoretically cleaner, by far (it avoids nasty issues around e.g. forgetting to explicitly end a nest); however, a major goal of this library is ergonomics, and I worry a little that the excessive closure nesting could become annoying with scopes and indentation.

So, what do you all think?

(Also, I'm sure the first technique has a name, but I can't think of it. If anyone knows, I'd appreciate a pointer to some existing literature on it. :slightly_smiling_face:)

Whenever possible, you should always provide a non-callback API, because callbacks greatly restrict the caller:

  • Cannot return an error (unless your signatures include Result<(), E>, and then that becomes an annoying unconstrained E type parameter in many cases)
  • Cannot use await
  • Some code patterns may encounter additional drop-checking and borrow-checking errors because the compiler doesn't know the closure is always called

So, if you provide the callback-based API, don't make it the only option.

5 Likes

Good point! I think I intend to go forward with a primarily callback-based API, as that seems neatest for both most generators and most users of them. I'll see if there's a nice way to integrate both APIs into the crate without creating extra work and code duplication for downstream (after all, the primary and possibly sole user of this library will be me).

Also note that all methods return a Result with a concrete error type, so no worries about that first point.

Couldn't this be solved with lifetimes so that you borrow from the outer scope object?

Yes, each nest type exclusively borrows its parent, effectively locking it. I'm more worried about ergonomics issues and hard-to-notice bugs.

I would go with the objects that explicitly borrow the generator because that provides the most flexibility to users. Callbacks also introduce another level of indentation and more syntactic noise, which would make the code messier.

As an aside, you could get around the whole "forgot to ens()" problem by moving it to a Drop.

I would do that too, but in practice it does mean you end up with borrow checker errors when you forget to end instead. Saves you from logical errors but doesn't "just work" and the errors aren't great. (Maybe the compiler should suggest drop(the_offender) for cases like this...)

That's exactly the problem I'm worried about, and why I'm conflicted. On one hand, the closure form could give implementors more flexibility, and remove this pain point; on the other hand, the borrow chain form might make state keeping a little nicer for implementors, and give the end user more flexibility. I wonder if I could implement one in terms of the other...

The closure form if you have the Drop form is just something like

impl Thing {
    fn nest_with<'this, R, F: FnOnce(Nest<'this>) -> R>(&'this mut self, f: F) -> R {
         f(self.nest())
    }
}
1 Like

I feel like that wouldn't be an issue in practice. For maintainability, you'll generally introduce a new function for each level of nesting so it should solve itself.

You could also add a with(self, impl FnOnce()-> T) -> T method which just runs your callback in a method and let's the scope end before returning.

I've decided to try and implement the borrow-chain approach. My current difficulty is that I'd like to keep the Generator trait dyn-compatible, so as to enable things like layering arbitrary generators. However, for nesting to be viable with the current strategy, each implementor needs to carry some sort of state with its Nest. I'll have to consider this more, and find some workarounds (perhaps some one-size-fits-all state passing, perhaps some type erasure tricks, perhaps forcing layers to instead implement a separate, dyn-compatible trait).

See, now I have this issue: each nest has its own needs, its own state it may need to carry. For a simple example, an unordered list might carry no special state, whereas an ordered list would carry (at least) an index. One could imagine cases where you need significant amounts of state. At the very least, different kinds of nestings may support different kinds of operations (you might allow nesting a link in a heading, but not the other way around).

The natural solution is to make this nest an associated type on the generator trait; however, then it becomes impossible to abstract over generators in any dynamic way, which precludes abstracting over things that operate on generators in a dynamic way: you can't pass &dyn Generator around, meaning you have to make it a generic parameter... but you can't have generics on a dyn-compatible trait, nor any sort of dyn Fn type. Perhaps for<T> would help, but for now there's no way to allow generators to choose their own nest type.

Simply forcing generators to store state in some sort of abstracted, concrete Nest type doesn't fly either, due to the fact mentioned earlier that different nests, being themselves generators, may support different operations. The only way forward I could see would be to do something like storing a Box<dyn Generator> in the Nest type, but that seems overkill and ugly for such a core, repeated operation. If you were to nest just 4 layers deep, suddenly each call is a wade through at least 8 layers of indirection.

The closure method handily avoids this problem altogether, since the intermediate/nested state of a generator never "leaks" into user code. In this way, I feel it's the simpler and cleaner solution (I had a fully-documented, satisfying prototype in about 2 hours). I may try to redefine the borrow-chain method in terms of the closure method, but I'm not very hopeful. If it comes down to it, I will probably make the (frustrating) choice to ditch async support.

Nonetheless, I'd still like to find a way to mitigate the problems around async and borrowck. Are there established guidelines for making closure-based APIs more friendly?

Not sure what are you would like to achieve, and maybe a little bit off-topic, my thoughts are somewhat:

Handle data whenever possible as data, not as code. That is, something simple as:

pub enum Fragment {
    Heading { level: u8, text: String },
    OrderedList { items: Vec<Fragment> },
    Paragraph { text: String },
    ...
}

...you get the idea.

The rational behind this: A client can choose themself how to build a document, there is no sequential ordering on his control flow induced.

Rust has a powerful type system and often you can encode requirements and rules into types. You can go wild with this. But my personal experience is, there is somewhere a limit where this gets out of hands for practical reasons.

So, for your application, I would choose some type like the enum above to encode the syntax of documents, and I would be fine with functions/methods which verifies/ensure/enforce the semantic of a document, i.e. checks there are no headers in a list etc., maybe even with something like a struct VerifiedDocument.

In this respect, software engineering is an engineering discipline, which means tradeoffs have to be made.

That's a very reasonable solution, and it's my fault for not being more clear: the goal is to support an arbitrary number of document types, with all sorts of different features and oddities. Moreover, the hope of layers is that you can stack higher-level constructs on top of a core generator (think for example a markdown generator with a GH markdown layer). So trying to encode everything in a serde-style middleman fashion isn't ideal.

Besides, I'd really like to support streaming generators (heck, that's even part of my own original use-case for this project). Requiring the user to build one comprehensive document tree would just not work with that design goal. Thank you for your input, though!