Programming against traits in Rust

One thing I find in my Rust projects is that I end up inadvertently exposing more and more details about internal structs / traits over time, and this makes refactoring messy (since other crates are now depending on these internal details.)

I'm wondering if there is a style of coding in Rust (and where I can read more about good design patterns for it) where:

Here, "lib" is used to refer to "a group of multiple crates aimed at solving a task"

  1. Upfront, we declare a list of all public traits for the lib.
  2. We declare a list of all structs/enums needed for the arguments / return types of the pub traits.
  3. Everything else is strictly private.

In theory, this sounds great. In practice, the orphan rule (impl X for Y must be in either the crate where X is declared or the crate where Y is declared) sometimes causes problems for this.
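To make the orphan rule concrete, here is a minimal sketch using only std. Both the trait (Display) and the type (Vec) are foreign to the current crate, so a direct impl is rejected; a local newtype wrapper is one common workaround. The name `Names` is hypothetical and only for illustration. The same situation arises when the trait lives in an API crate and the type lives in some third crate.

```rust
use std::fmt;

// Both `fmt::Display` and `Vec<String>` are defined outside this crate, so
// the following impl is rejected by the orphan rule (error E0117):
//
// impl fmt::Display for Vec<String> { /* ... */ }

/// Hypothetical local wrapper. Because `Names` is declared in this crate,
/// implementing the foreign trait for it is allowed.
struct Names(Vec<String>);

impl fmt::Display for Names {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0.join(", "))
    }
}

fn main() {
    let names = Names(vec!["a".to_string(), "b".to_string()]);
    println!("{names}");
}
```

The cost of the wrapper is that callers have to wrap and unwrap values at the boundary, which is part of why the rule feels awkward in a many-crate layout.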

Are there good resources / guides for this style of programming in Rust?

[not meant to be a "right answer" question; meant to be a discussion]

I feel like this question is too abstract/general to address, or at least it's not clear what specific dynamics you're getting at.

I just have a couple random thoughts.

Although it is conceptually pure, it may not be worth the trouble to use a trait for the interface between your libs (what I would normally call components), if there will always be just one implementation of that trait. However, a good justification for this is that alternate implementations are very useful for testing, if nothing else.
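As a rough illustration of the testing argument, the sketch below has one "real" implementation and one test double behind the same trait. All names here (`Clock`, `SystemClock`, `FixedClock`, `is_expired`) are made up for the example.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical interface between two components.
trait Clock {
    fn now_millis(&self) -> u64;
}

/// The single production implementation.
struct SystemClock;

impl Clock for SystemClock {
    fn now_millis(&self) -> u64 {
        SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_millis() as u64)
            .unwrap_or(0)
    }
}

/// Test double that always reports the same time.
struct FixedClock(u64);

impl Clock for FixedClock {
    fn now_millis(&self) -> u64 {
        self.0
    }
}

/// The code under test depends only on the trait, not on a concrete clock.
fn is_expired(clock: &dyn Clock, deadline_millis: u64) -> bool {
    clock.now_millis() >= deadline_millis
}

fn main() {
    println!("expired (real clock): {}", is_expired(&SystemClock, 0));

    let fake = FixedClock(1_000);
    assert!(is_expired(&fake, 500));
    assert!(!is_expired(&fake, 2_000));
}
```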

An example of "the trouble": How will objects be created, when using a trait interface? Would the trait have associated non-method functions (no self param) for that? If so, this rules out using it as a dyn trait object, if that matters to you. If not, will you have a factory trait or struct for creating objects?
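A small sketch of both options, with hypothetical names (`Store`, `MemStore`, `open_store`). An associated function without `self` normally prevents using the trait as a trait object, but marking it `where Self: Sized` excludes it from the object so `dyn Store` still works. The free factory function is the other route.

```rust
/// Hypothetical interface trait.
trait Store {
    // Constructor-style associated function (no `self`). The `Self: Sized`
    // bound keeps it out of the trait object, so `dyn Store` remains usable.
    fn open(capacity: usize) -> Self
    where
        Self: Sized;

    fn put(&mut self, value: i32);
    fn count(&self) -> usize;
}

struct MemStore {
    items: Vec<i32>,
}

impl Store for MemStore {
    fn open(capacity: usize) -> Self {
        MemStore { items: Vec::with_capacity(capacity) }
    }

    fn put(&mut self, value: i32) {
        self.items.push(value);
    }

    fn count(&self) -> usize {
        self.items.len()
    }
}

/// Factory function: callers receive a trait object and never name `MemStore`.
fn open_store(capacity: usize) -> Box<dyn Store> {
    Box::new(MemStore::open(capacity))
}

fn main() {
    let mut store = open_store(4);
    store.put(1);
    store.put(2);
    println!("{} items", store.count());
}
```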

Another random thought: Normally the difficulty in creating large applications with well defined interfaces is just designing the interfaces, independently of the language used. This is mostly domain specific and probably has little to do with Rust. So you may want to look at more general materials and advice on component interfaces and decoupling of components.

Valid criticism. Let's say we are building crate foobar. I am curious about organizing the codebase as:

foobar_api
foobar_a
foobar_b
foobar_c
foobar_all

foobar_api would contain the public traits + types required for inputs/outputs of said traits

everything in foobar_a/b/c/all would NOT be accessible directly (instead of returning types of things in these crates, we would return things that were Box<dyn T> or Rc<dyn T> for some T defined in foobar_api)

In the terminology of the question:

  • something that is "clean" would be something where things in foobar_api are public; things in foobar_a/b/c/all are private
  • something "leaky" would be types/traits of foobar_a/b/c/all being public

In theory: The nice thing about "clean" is that we can refactor the internals of foobar_* at will. The problem with "leaky" is that there are now all these hooks/expectations about the internals of foobar.

In practice: the orphan rule seems to make this difficult to implement.

This is true. In the foobar example I wrote after your post, we would need a "factory", so the public part consists of:

foobar_api
factory for constructing Box<dyn T> or Rc<dyn T> for the T in foobar_api
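Here is one way the layout could look, compressed into a single file with modules standing in for the crates foobar_api, foobar_a and foobar_all. The trait `Tally`, the type `Report` and the function `new_tally` are invented for the sketch; only the crate names come from the discussion.

```rust
use foobar_api::Tally;

mod foobar_api {
    /// Plain data type that appears in the trait's signatures.
    #[derive(Debug, Clone, PartialEq)]
    pub struct Report {
        pub total: u64,
    }

    /// The only trait callers program against.
    pub trait Tally {
        fn add(&mut self, amount: u64);
        fn report(&self) -> Report;
    }
}

mod foobar_a {
    use super::foobar_api::{Report, Tally};

    // With real crates this struct would have to be `pub` so foobar_all can
    // construct it; callers still never name it, they only see `dyn Tally`.
    pub(super) struct SimpleTally {
        pub(super) total: u64,
    }

    // Allowed by the orphan rule: the type is local to this "crate".
    impl Tally for SimpleTally {
        fn add(&mut self, amount: u64) {
            self.total += amount;
        }

        fn report(&self) -> Report {
            Report { total: self.total }
        }
    }
}

mod foobar_all {
    use super::foobar_a::SimpleTally;
    use super::foobar_api::Tally;

    /// Factory: the single place where a concrete type is chosen.
    pub fn new_tally() -> Box<dyn Tally> {
        Box::new(SimpleTally { total: 0 })
    }
}

fn main() {
    let mut tally: Box<dyn Tally> = foobar_all::new_tally();
    tally.add(2);
    tally.add(3);
    println!("{:?}", tally.report());
}
```

A downstream crate would list foobar_api (for the trait and data types) and foobar_all (for the factory) as dependencies, and nothing else.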

Another question to consider: if the problem space doesn't already prefer runtime dispatch, is it worth the runtime cost to add it?
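For reference, the two dispatch styles being weighed look roughly like this. `Shape` and `Square` are placeholder names; the generic version is monomorphized per concrete type, while the `dyn` version goes through a vtable on each call.

```rust
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

/// Static dispatch: compiled once per concrete `S`, so calls can be inlined.
fn total_area_static<S: Shape>(shapes: &[S]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

/// Dynamic dispatch: one compiled copy, each `area` call is an indirect call.
fn total_area_dyn(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn main() {
    let squares = [Square(1.0), Square(2.0)];
    let boxed: Vec<Box<dyn Shape>> = vec![Box::new(Square(1.0)), Box::new(Square(2.0))];
    println!("{} {}", total_area_static(&squares), total_area_dyn(&boxed));
}
```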

Perhaps you're splitting things up into packages too much.

This is a great question; and the argument I am currently going with is:

  1. within foobar_api/a/b/c/all , I want all the crazy llvm cross-function inline-stuff optimizations possible.

  2. At the boundaries between two libs, i.e. foobar_* and app_*, a call generally (in my case) involves either (1) crossing the wasm/js boundary or (2) large amounts of data transfer -- and in both cases, the cost of an extra ptr lookup is acceptable to me.

Isn't the usual organization for a library to have foobar_api be the top level module, with the other files as submodules? What is special about this? And what is foobar_all?

Why is that? The interface trait is defined and implemented in the same module, right?

  1. I think you are right.

  2. foo_all: depends on foo_a/b/c, contains the Factory method

  3. This has the additional benefit that if some_other_crate only depends on foo_api, it is impossible for it to depend on any internal structure "leaked" by foo_a/b/c/all.

At least in my case, I never intend to leak internal details, but here & there, it is convenient to make something in foo_a/b/c public ... and quickly the priv/public divide becomes swiss cheese.

That's a good reason to have an interface trait for each library, if the discipline of changing the trait helps you to avoid adding back doors (although you can still do it). But this doesn't dictate the structure of the library, i.e. the implementation of the interface trait.

Although I have to emphasize that to keep the interface clean requires good design more than it requires a trait or a certain structure. The design is the hard part.

I think your example is still too vague. Details matter in these sorts of discussions. Why do you end up exposing private details? Are you doing it just to have external tests & benchmarks, or do you end up using this stuff in downstream production code? In the latter case, perhaps it was wrong trying to keep those details private to begin with. Perhaps the implementation really matters for downstream code, and trying to be too abstract just gives you hoops to jump through.

After all, it's only really an abstraction if it's possible to use it purely through the defined interface without caring for implementation details. If those details end up leaking, it was probably wrong to hide them. It wasn't abstraction, it was just obfuscation.

In general, I would avoid making new traits, unless I expect myself or downstream consumers to provide new implementations of the trait. Traits are for generic programming, not for writing J2EE-style overengineered towers of interfaces. If you need encapsulation, ordinary modules, types, and their public & private functions serve that purpose well enough.
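To illustrate the last point, this is roughly what trait-free encapsulation looks like: a public type with private fields and a private helper inside a module. `counter` and `Counter` are hypothetical names used only for the example.

```rust
mod counter {
    /// Public type; its fields are private, so callers cannot depend on them.
    pub struct Counter {
        count: u32,
        step: u32,
    }

    impl Counter {
        pub fn with_step(step: u32) -> Self {
            Counter { count: 0, step }
        }

        /// Advances the counter and returns the new value.
        pub fn tick(&mut self) -> u32 {
            self.count = self.next_value();
            self.count
        }

        // Private helper: free to change or remove without breaking callers.
        fn next_value(&self) -> u32 {
            self.count.saturating_add(self.step)
        }
    }
}

fn main() {
    let mut c = counter::Counter::with_step(2);
    c.tick();
    println!("{}", c.tick());
    // c.count = 10; // would not compile: field `count` is private
}
```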

One thing to consider is that your crates may just be improperly chosen. Perhaps you should extract some self-contained low-level functionality into a crate bar which has much of its implementation exposed, so that you can write higher-level wrapper crates which hide most of those details. Your binary or root library crates would then declare only those higher-level crates as dependencies, insulating themselves from the low-level details.

I think we agree on the important points:

  1. Having clean, non-leaky, trait + struct/enum boundaries is good.

  2. Achieving this is very related to having good "design"

The issue here is -- I have bad design. My Rust codebases tend to "implode" in a mess of complexity at ~ 50k LOC. This is around the point where "leaky" abstractions / not having clear crate boundaries taxes my ability to keep context in my head and development time massively slows down. (How does XYZ work again?)

I think we both agree "clean design" minimizes the "mental context" one has to keep per library -- so now I'm looking for styles / strategies / structures that serve as a "forcing function" for me to maintain "small surface area" design -- and this is where foo_api comes in. It's forcing me to explicitly think "do I want the mess of having this be part of the api or not?"

My opinion is that creating separate libraries for internal components, and creating a process for yourself or your team about updating the interfaces, is as much a forcing function as you can get. But you have to be willing to change the design when new things come up, and keeping the interface clean is a lot of work. So you have to be willing to do that design work, it has to be high priority. It is not a technical issue, it's a priority and process issue.

I don't really see the difference between thinking "should this be a method of my public API trait" vs "should this be a public method of my public struct". In both cases, every time you're making something public, you should pause and think whether it makes sense as public API.

I will say I've seen the "API" crate you're talking about called a vocabulary crate: this shows up in public API crates as things like mint providing a common vocabulary for some mathematical types without providing the actual implementations for operations (eg it has a Matrix type, but not a Matrix multiplication operation), making it easier to glue code together.

It might not be the approach you'd want to take with internal crates, exactly, but at least you're not crazy for trying it!
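A toy version of the vocabulary-crate idea, with modules standing in for crates. This is not mint's actual API; `vocab`, `math_lib`, `render_lib` and their contents are invented for the sketch. The vocabulary module holds only a data layout, and the other two agree on it without depending on each other.

```rust
/// Stand-in for a vocabulary crate: data layout only, no operations.
mod vocab {
    #[derive(Debug, Clone, Copy, PartialEq)]
    pub struct Vector2 {
        pub x: f32,
        pub y: f32,
    }
}

/// Stand-in for a math library with its own type and operations.
mod math_lib {
    use super::vocab::Vector2;

    #[derive(Debug, Clone, Copy, PartialEq)]
    pub struct Vec2(pub f32, pub f32);

    impl Vec2 {
        pub fn scaled(self, k: f32) -> Vec2 {
            Vec2(self.0 * k, self.1 * k)
        }
    }

    impl From<Vector2> for Vec2 {
        fn from(v: Vector2) -> Self {
            Vec2(v.x, v.y)
        }
    }

    impl From<Vec2> for Vector2 {
        fn from(v: Vec2) -> Self {
            Vector2 { x: v.0, y: v.1 }
        }
    }
}

/// Stand-in for a library that only knows the vocabulary type.
mod render_lib {
    use super::vocab::Vector2;

    pub fn describe(p: Vector2) -> String {
        format!("point at x={} y={}", p.x, p.y)
    }
}

fn main() {
    let p = math_lib::Vec2(1.0, 2.0).scaled(2.0);
    // The conversion into the shared vocabulary type is the only glue needed.
    println!("{}", render_lib::describe(p.into()));
}
```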

I think the only sure-fire way to have your programs not melt around the 50kloc size is to write them three times: the first to figure out what you want, the second to figure out what amount of engineering is over-engineering (second system syndrome), and the third to actually do it properly. Hard to tell if that's actually worthwhile and holds up long term, though!

This is so true and painful. :)
