Nine Rules for Elegant Rust Library APIs

Two months ago, I decided to see if a Rust version of our bioinformatics library could be as nice for users as the Python version. The answer is “yes”, but it wasn’t easy. This free article lays out what I learned:

I’m most proud of getting Python-style fancy indexing working in Rust. This means users can specify which data to download with an index number, any array-like collection of numbers, any range-like thing, or Booleans. To make example code simpler, the library also includes a function to download sample files to a cache directory controlled by an SHA hash.

I'd love to discuss any part of the project, other folks' experiences trying to make user-friendly APIs, or what rules you'd suggest.

-- Carl

6 Likes

A couple of suggestions:

Rule 2: Accept all kinds of strings, paths, vectors, arrays, and iterables.

The advice for implementing this rule should mention the pattern of having a generic function call a non-generic function containing most of the implementation, to minimize the code size and compilation time added by the generic part.
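For readers who have not seen that pattern, here is a minimal sketch. The function name `file_stem_upper` is made up for illustration and is not from the article or the library. The generic outer function only performs the `as_ref()` conversion, so each caller type adds just a tiny shim, while the non-generic `inner` function is compiled once.

```rust
use std::path::Path;

/// Generic entry point (hypothetical example): monomorphized once per
/// caller type, but it only does the cheap conversion.
pub fn file_stem_upper<P: AsRef<Path>>(path: P) -> Option<String> {
    // Non-generic inner function: compiled once, holds the real work.
    fn inner(path: &Path) -> Option<String> {
        let stem = path.file_stem()?.to_str()?;
        Some(stem.to_uppercase())
    }
    inner(path.as_ref())
}
```

Callers can pass a `&str`, a `String`, or a `PathBuf`, and all of them end up in the same compiled `inner`.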

Finally, if your Enum is used in a regular function, document that your users must use .into() when calling the function.

Functions can accept impl Into<YourMostGeneralType> rather than requiring the caller to do it — just as in your rule 2.
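A rough sketch of what that could look like, assuming a made-up `Index` enum and `select` function (neither is the article's actual API): the function takes `impl Into<Index>`, so the conversion happens inside it and callers never write `.into()` themselves.

```rust
use std::ops::Range;

/// Hypothetical "most general" index type, for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub enum Index {
    One(usize),
    Span(Range<usize>),
    Mask(Vec<bool>),
}

impl From<usize> for Index {
    fn from(i: usize) -> Self {
        Index::One(i)
    }
}

impl From<Range<usize>> for Index {
    fn from(r: Range<usize>) -> Self {
        Index::Span(r)
    }
}

impl From<Vec<bool>> for Index {
    fn from(mask: Vec<bool>) -> Self {
        Index::Mask(mask)
    }
}

/// Accepts anything convertible into `Index`; the function calls `.into()`.
/// Panics on out-of-bounds indices, like ordinary slice indexing.
pub fn select(data: &[i32], index: impl Into<Index>) -> Vec<i32> {
    match index.into() {
        Index::One(i) => vec![data[i]],
        Index::Span(r) => data[r].to_vec(),
        Index::Mask(mask) => data
            .iter()
            .zip(mask)
            .filter(|(_, keep)| *keep)
            .map(|(value, _)| *value)
            .collect(),
    }
}
```

With this shape, a call such as `select(&data, vec![true, false, true])` works directly, and a caller who already holds an `Index` can pass it unchanged.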

13 Likes
  1. Use builders, because you can’t use keyword parameters.

Builders aren't really necessary for imitating keyword arguments. An easier solution is to define a config struct with named fields and implement Default on it. Consumers of the code can then use functional record update (FRU) syntax, also known as struct update syntax, to override only some of the fields. (This has the added benefit of materializing the config, so you can e.g. serialize/deserialize it, should you need to support that.)
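A small sketch of the config-struct approach, with hypothetical names (`ReadOptions`, `read_rows`) chosen only for illustration:

```rust
/// Hypothetical options struct; every field has a sensible default.
#[derive(Debug, Clone, PartialEq)]
pub struct ReadOptions {
    pub skip_header: bool,
    pub delimiter: char,
    pub max_rows: Option<usize>,
}

impl Default for ReadOptions {
    fn default() -> Self {
        ReadOptions {
            skip_header: true,
            delimiter: '\t',
            max_rows: None,
        }
    }
}

/// Splits `text` into rows of fields according to `options`.
pub fn read_rows(text: &str, options: ReadOptions) -> Vec<Vec<String>> {
    let skip = if options.skip_header { 1 } else { 0 };
    text.lines()
        .skip(skip)
        .take(options.max_rows.unwrap_or(usize::MAX))
        .map(|line| line.split(options.delimiter).map(str::to_string).collect())
        .collect()
}
```

A caller overrides only the fields they care about, which reads much like keyword arguments:

```rust
let rows = read_rows(
    "id,value\n1,a\n2,b",
    ReadOptions { delimiter: ',', ..Default::default() },
);
```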

15 Likes

Thanks so much, @kpreid , for these tips. I've updated the article. - Carl

@H2CO3, thanks for this suggestion! I've updated the article with a tip about this alternative. I included links to this thread and to the Rust Book section on Struct Update Syntax. If anyone can suggest a reference that puts it all together, please share it. (There is a Medium article from 2020, but its code is no longer displaying.)

2 Likes

I have been stumbling upon that advice elsewhere already. I wonder: Isn't this something the compiler should do rather than every programmer of generic functions? Is the compiler really struggling so much with big generic functions that we must support it (and make our code less readable)? Is it planned to improve that situation, or is this problem not solvable in an automated fashion (for some reasons that I may not be aware of)?

I think I remember that even std uses these splits into generic and non-generic part, so it seems to be really important.

5 Likes

It would certainly be nice, but I don't know if there's a good enough heuristic for when it's the right thing to do.

Like it's usually a good idea for impl AsRef<Path> (like in https://doc.rust-lang.org/1.61.0/src/std/fs.rs.html#267-275), but for impl Iterator<Item = &str> it's probably more often a bad idea to do it.

3 Likes

Presumably such an optimization would not actually introduce dynamic dispatch. What I imagine it would have to do is find a “tail” of the function that can be cut off and compiled separately, where the “tail” is whatever code contains no further uses of the generic type. So, for an fn foo(x: impl Into<String>), it would automatically find the point just after x.into(), and for an iterator it would find the point after the end of the for loop or whatever.

Then the place where a heuristic is needed for optimization would be deciding whether the “tail” should be compiled inline (like it always is now) or like a separate function (which adds the costs of a function call boundary but reduces the code size).

19 Likes

I'm not sure how such generic functions are compiled when dealing with different modules or crates. I would assume that for each type, the function must be recompiled. If the function gets recompiled too often (e.g. two times or three times?), the compiler could switch to a different strategy. But I don't know enough about the compilation process for Rust to really understand how generics are compiled.

I think function inlining is already done elsewhere, is very common, and also relies on heuristics. I feel like Rust needs the opposite here (as you explained): not inlining the non-generic code, but creating a separate function for the tail. Not sure what that could be called. Maybe "generic function tail extraction".

1 Like

The reason why I brought this up: I have written some libraries that are deliberately generic to avoid having to allocate Vecs, for example. Consider sandkiste::Function::call, which expects a variable number of arguments (as it calls a function defined in a scripting language). You can use it like this: func.call(some_vec), but also like this: func.call([]).

I do this by accepting a generic type A: IntoIterator<Item = T> (where <A as IntoIterator>::IntoIter: ExactSizeIterator) as argument list instead of a Vec<T>.
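To make the bound concrete, here is a simplified stand-in. This `call` is a free function invented for illustration; it is not the real `sandkiste::Function::call` signature, and `i64` replaces the argument type `T`. The `ExactSizeIterator` bound lets the function learn the argument count up front without collecting into a `Vec`.

```rust
/// Hypothetical stand-in for a variadic call: reports how many arguments
/// it received and their sum, without allocating a Vec.
pub fn call<A>(args: A) -> String
where
    A: IntoIterator<Item = i64>,
    A::IntoIter: ExactSizeIterator,
{
    let iter = args.into_iter();
    // `len()` is available because of the ExactSizeIterator bound.
    let count = iter.len();
    let sum: i64 = iter.sum();
    format!("{} args, sum {}", count, sum)
}
```

Both `call(vec![1, 2, 3])` and `call([])` compile against this signature; the empty array needs no heap allocation.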

This makes using the library much nicer. But apparently it will bloat up code size. If I do this "tail extraction" manually, then my library code will be less readable. It's already a lot of noise to write:

fn foo<A>(/* … */ args: A)
where
    A: IntoIterator<Item = SomeType>,
    <A as IntoIterator>::IntoIter: ExactSizeIterator, 

Instead of:

fn foo(args: Vec<SomeType>)

I feel like I'm in a dilemma (or trilemma) here. Do I bloat up the source code of my library? (And if I do, should I use the pattern @kpreid suggested, which will make my library code even more verbose?) Or do I just keep things simple (which would require using vec![] instead of [] when calling my function, which is runtime overhead, I believe)?

I feel like something is missing on the compiler side to solve this trilemma. Or I just have to accept writing verbose source code for library code. 🙁 Or I accept bloating the binary size. 🫤

See, I still don't know what to do. 😅

1 Like

Note that the former technically has the advantage of also allowing other types that implement IntoIterator, while the latter only allows Vec.

What I do when using this pattern is extend it a little bit: I write the non-generic version (aka the tail) in a separate fn directly below the generic version.
In addition, it's named the same as the generic version except it starts with an underscore _.

What this allows is getting some of that lost clarity back, because all you have to do to see the intent (and how various generic args are used) is look at the non-generic version.

1 Like

nit: names starting with an underscore suppress the dead code lints, so I would suggest picking a different convention. That could just be putting the underscore at the end instead, for example.

(Personally I like using inner functions, but that's a stylistic choice. However you like best is fine.)
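As a sketch of the sibling-function variant with a trailing underscore (the names `normalize_label` and `normalize_label_` are hypothetical), the generic front converts its argument and hands off to a non-generic tail written directly below it:

```rust
/// Generic front: converts the argument, then delegates to the tail.
pub fn normalize_label(label: impl Into<String>) -> String {
    normalize_label_(label.into())
}

/// Non-generic tail, compiled once. The trailing underscore (rather than a
/// leading one) keeps the dead code lint active for this helper.
fn normalize_label_(label: String) -> String {
    label.trim().to_lowercase()
}
```

An inner function nested inside `normalize_label` would behave the same way; the choice between the two is stylistic.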

11 Likes

I wasn't aware of this for fns/methods. Thanks for the tip!

2 Likes

I think it’s called polymorphization.

3 Likes

Thanks for the link. I also found this one: rustc dev guide on Polymorphization.

There's a procedural macro to do this transformation automatically for you for Into, AsRef, and AsMut:

These are the traits most likely to benefit from this transformation, since you're likely to call the single access method at the front and then have a nongeneric tail.

3 Likes
