How to transport user-level values through infrastructure code in an opaque way?

Hi there,

When building libraries or infrastructure, it sometimes makes sense to transport a user-code value through the infrastructure in an opaque way and then access it again on the other side. I'm looking for a mechanism similar to the opaque pointers (void *userdata) that you can pass to library functions in C and that are later passed back to your callbacks.

Here are some ways to achieve that:

  • try to make the infrastructure generic (this works if all the possible values are statically the same type, e.g. by making an enum that fits the infrastructure, but let's assume heterogeneous, open types)
  • use custom trait objects (= vtable) to package types in a dynamic way
  • use std::any::Any (similar to trait objects)
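For comparison, the first option (generic infrastructure) might look roughly like this. It is a minimal sketch; EventQueue and its methods are hypothetical names, not an existing library. The payload keeps its static type end to end, so there is no vtable and no downcast, but every queue holds exactly one payload type.

```rust
use std::collections::VecDeque;

/// Hypothetical infrastructure that is generic over the user payload `T`.
/// It never inspects `T`: it stores it and hands it back with its static type.
struct EventQueue<T> {
    pending: VecDeque<T>,
}

impl<T> EventQueue<T> {
    fn new() -> Self {
        EventQueue { pending: VecDeque::new() }
    }

    fn push(&mut self, payload: T) {
        self.pending.push_back(payload);
    }

    /// Deliver every queued payload to `callback` (no downcast needed) and
    /// return how many were delivered. Undelivered payloads are dropped
    /// normally when the queue is dropped, because `T` is known statically.
    fn dispatch(&mut self, mut callback: impl FnMut(T)) -> usize {
        let mut delivered = 0;
        while let Some(payload) = self.pending.pop_front() {
            callback(payload);
            delivered += 1;
        }
        delivered
    }
}

fn main() {
    let mut queue = EventQueue::new();
    queue.push(String::from("hello"));
    queue.push(String::from("world"));
    let mut seen = Vec::new();
    assert_eq!(queue.dispatch(|s| seen.push(s)), 2);
    assert_eq!(seen, ["hello", "world"]);
}
```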

The dynamic way somewhat works, but it may be more heavyweight than necessary: all information is statically available to make the correct conversion, so a vtable seems not necessary.

The real question is probably in which situations this can be done memory-safely. The main issue is how the infrastructure should handle the case where it owns the opaque value and needs to drop it (because it is shutting down or cannot dispatch to the place that knows the static type).

I see two main ways this could be used:

  • pass references, in which case use-after-free must be prevented
  • pass owned values with primitive dropping behavior, i.e. any kind of compound value that can be trivially serialized in memory; for memory-management purposes, such a value could ideally be represented by (or cheaply converted to) a fat pointer.
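For the reference option, the borrow checker already does the use-after-free prevention, as long as the infrastructure carries the lifetime. A minimal sketch, where Registry, register and get are hypothetical names: the infrastructure only borrows the values, so it never has to drop them, and the dynamic part is reduced to a checked downcast_ref.

```rust
use std::any::Any;

/// Hypothetical infrastructure that only *borrows* user values. The lifetime
/// `'a` ties the registry to the data, so the borrow checker rejects any use of
/// the registry after a registered value has been dropped or moved.
struct Registry<'a> {
    entries: Vec<&'a dyn Any>,
}

impl<'a> Registry<'a> {
    fn new() -> Self {
        Registry { entries: Vec::new() }
    }

    fn register(&mut self, value: &'a dyn Any) -> usize {
        self.entries.push(value);
        self.entries.len() - 1
    }

    /// Checked downcast back to the concrete type. The registry never drops
    /// the values, because it never owned them.
    fn get<T: Any>(&self, handle: usize) -> Option<&'a T> {
        let entry: &'a dyn Any = *self.entries.get(handle)?;
        entry.downcast_ref::<T>()
    }
}

fn main() {
    let config = String::from("user config");
    let counter: u64 = 7;
    let mut reg = Registry::new();
    let h_config = reg.register(&config);
    let h_counter = reg.register(&counter);
    // drop(config); // would not compile: `config` is still borrowed by `reg`
    assert_eq!(reg.get::<String>(h_config).map(String::as_str), Some("user config"));
    assert_eq!(reg.get::<u64>(h_counter), Some(&7));
    assert_eq!(reg.get::<u64>(h_config), None); // wrong type is detected, not UB
}
```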

It feels like there is a trade-off between these options:

  • use static types to fit everything into one type that is transparent to the infrastructure (which adds extra complexity for the user, who has to fit a set of heterogeneous types into an enum statically and convert them into and out of that shared enum at runtime)
  • use dynamic trait objects / Any, which puts more restrictions on what kind of data you can transport (not necessarily bad) and involves some dispatching overhead for vtables / dynamic dispatch
  • use some kind of serialization (ideally zero-cost when you restrict yourself to only transport trivially serializable data objects)
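To make the first trade-off concrete, here is a rough sketch of the "fit everything into one enum" approach. Payload, relay and summarize are made-up names for illustration. The infrastructure only ever sees one concrete type; the extra work lands on the user, who writes the From/TryFrom conversions into and out of the shared enum.

```rust
use std::convert::TryFrom;

/// User-side enum that fits every payload type into one static type.
#[derive(Debug, Clone, PartialEq)]
enum Payload {
    Text(String),
    Count(u32),
}

// Conversions *into* the shared enum.
impl From<String> for Payload {
    fn from(s: String) -> Self {
        Payload::Text(s)
    }
}
impl From<u32> for Payload {
    fn from(n: u32) -> Self {
        Payload::Count(n)
    }
}

// Conversion back *out of* the enum; a mismatch hands the payload back.
impl TryFrom<Payload> for u32 {
    type Error = Payload;
    fn try_from(p: Payload) -> Result<Self, Self::Error> {
        match p {
            Payload::Count(n) => Ok(n),
            other => Err(other),
        }
    }
}

/// Hypothetical infrastructure: it only ever sees the single concrete type `Payload`.
fn relay(queue: Vec<Payload>, mut handler: impl FnMut(Payload)) {
    for p in queue {
        handler(p);
    }
}

/// User code on the other side: total text length and sum of counts.
fn summarize(queue: Vec<Payload>) -> (usize, u32) {
    let (mut text_len, mut total) = (0, 0);
    relay(queue, |p| match p {
        Payload::Text(s) => text_len += s.len(),
        Payload::Count(n) => total += n,
    });
    (text_len, total)
}

fn main() {
    let queue = vec![Payload::from(String::from("hi")), 3u32.into(), 4u32.into()];
    assert_eq!(summarize(queue), (2, 7));
    assert_eq!(u32::try_from(Payload::Count(9)), Ok(9));
}
```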

Any ideas or pointers on how that could be done (maybe the existing tools, i.e. dyn / Any, are just what I should be looking for)?

Passing around a Box<dyn Any> is probably the cheapest feasible option. Yes, it adds a vtable, but the vtable will only have:

  • the type's size and alignment
  • the drop glue function, which is necessary in case the value is dropped
  • the type_id() function, which is necessary for Any to work and is only called when you downcast back to the concrete type

And as to the execution cost, it's very likely that the dynamic check is utterly insignificant — unmeasurable — compared to everything else your program is doing. Beware the temptation to seek a “perfect” implementation instead of a practical one.
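A small sketch of what that looks like in practice, assuming a hypothetical Dispatcher that owns the opaque values. The vtable entries listed above are exactly what gets used: drop glue when the infrastructure drops leftovers on shutdown, and type_id only when the receiving side downcasts.

```rust
use std::any::Any;
use std::cell::Cell;
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical infrastructure that stores user data it never inspects.
struct Dispatcher {
    slots: HashMap<u32, Box<dyn Any>>,
}

impl Dispatcher {
    fn new() -> Self {
        Dispatcher { slots: HashMap::new() }
    }

    /// Type-erase the value. The `Box<dyn Any>` vtable carries size,
    /// alignment, drop glue and `type_id`.
    fn store<T: Any>(&mut self, id: u32, value: T) {
        self.slots.insert(id, Box::new(value));
    }

    /// The receiving side names the concrete type; `downcast` compares type ids.
    /// On a mismatch the value is put back instead of being lost.
    fn take<T: Any>(&mut self, id: u32) -> Option<T> {
        let boxed = self.slots.remove(&id)?;
        match boxed.downcast::<T>() {
            Ok(value) => Some(*value),
            Err(original) => {
                self.slots.insert(id, original);
                None
            }
        }
    }
}

/// Counts how often it is dropped, to show drop glue running through the vtable.
struct DropCounter(Rc<Cell<u32>>);

impl Drop for DropCounter {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

fn main() {
    let drops = Rc::new(Cell::new(0));
    let mut d = Dispatcher::new();
    d.store(1, String::from("payload"));
    d.store(2, DropCounter(Rc::clone(&drops)));
    assert_eq!(d.take::<u64>(1), None); // wrong type: value stays stored
    assert_eq!(d.take::<String>(1).as_deref(), Some("payload"));
    drop(d); // "shutdown": the infrastructure drops what it still owns
    assert_eq!(drops.get(), 1);
}
```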

all information is statically available to make the correct conversion, so a vtable seems not necessary.

In the cases where there is in fact enough information within a single function body to verify this, you probably can pass references instead of transferring ownership to the infrastructure, at which point you no longer need to downcast and the problem goes away.
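One common way to do that in Rust, sketched here with a hypothetical for_each_event function: where a C API takes a callback plus a void *userdata, the Rust closure is the userdata. It borrows the user state, the library never owns it, and the static type is never erased, so there is nothing to downcast.

```rust
/// Hypothetical library function. A C API would take a function pointer plus a
/// `void *userdata`; here the closure itself carries (borrows) the user state.
fn for_each_event(events: &[&str], mut callback: impl FnMut(&str)) {
    for &event in events {
        callback(event);
    }
}

/// User code: the state is borrowed, never handed over, so nothing needs downcasting.
fn collect_uppercase(events: &[&str]) -> Vec<String> {
    let mut log = Vec::new();
    for_each_event(events, |e| log.push(e.to_uppercase()));
    log
}

fn main() {
    assert_eq!(collect_uppercase(&["open", "close"]), ["OPEN", "CLOSE"]);
}
```

The trade-off is that for_each_event is generic and gets monomorphized per closure type; taking &mut dyn FnMut(&str) instead would compile it once at the cost of an indirect call.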


Thanks for the answer!

Yes, fully agree; I had already come to a similar conclusion while writing the question. After all, all potential solutions will probably share some drawbacks (like which types can be used as Any), but these restrictions are most likely even good ones, since they prevent overly complicated types in this kind of use case.

In specific cases, you could consider Box<dyn '_ + Destruct>, with Destruct being an empty trait blanket-implemented for all types. This allows your internals to deal only with the opaque handle, while the public API implements a wrapping layer that associates the known concrete generic type. However, this comes with some restrictions that make dyn Any the preferable solution almost always:

  • Strictly speaking, Rust only provides downcast_unchecked for dyn Any and never guarantees that a cast-based downcast works (except for slice to array); non-slice unsizing coercions could (they currently don't, but could) include data pointer fixups (e.g. like multiple inheritance in C++ does).
  • Your internals still have to be generic over the lifetime, so annotation burden is only reduced from <T> to <'a>, i.e., not much.
    • The reason it can still be beneficial is that lifetimes are polymorphic while type generics are monomorphized, permitting your type-agnostic code to exist only once (e.g. in a dylib), or at least not to be recompiled per downstream consumer.
    • If you use 'static instead, it's strictly worse than dyn Any: with dyn Any you can still build the same generic wrapping layer, using downcast_unchecked to shave off those potential unwinds, but you can additionally do a checked downcast under cfg(debug_assertions), with the only cost being (around[1]) one extra data-section pointer per instantiated type.
      • Which is a lot less than the monomorphization code size cost of using generics, even if exclusively for the downcasting wrapper.
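A rough stable-Rust sketch of that dyn Any variant: type-agnostic internals that only see Box<dyn Any>, plus a thin generic Handle<T> that remembers the concrete type. Store, Handle and their methods are hypothetical names. Since downcast_unchecked is unstable, this version uses the checked downcast throughout; on nightly, the release path could swap in the unchecked one as described above.

```rust
use std::any::Any;
use std::marker::PhantomData;

/// Type-agnostic internals: they only ever see `Box<dyn Any>`, so this code
/// is not monomorphized per user type.
#[derive(Default)]
struct Store {
    items: Vec<Option<Box<dyn Any>>>,
}

impl Store {
    fn insert_erased(&mut self, value: Box<dyn Any>) -> usize {
        self.items.push(Some(value));
        self.items.len() - 1
    }

    fn take_erased(&mut self, index: usize) -> Option<Box<dyn Any>> {
        self.items.get_mut(index)?.take()
    }
}

/// Thin generic wrapper: the handle remembers `T`, so callers never name it twice.
struct Handle<T> {
    index: usize,
    _type: PhantomData<fn() -> T>,
}

impl Store {
    fn insert<T: Any>(&mut self, value: T) -> Handle<T> {
        let index = self.insert_erased(Box::new(value));
        Handle { index, _type: PhantomData }
    }

    /// Assumes the handle came from this same store. The checked `downcast`
    /// costs a type-id comparison; the unstable `downcast_unchecked` would skip it.
    fn take<T: Any>(&mut self, handle: Handle<T>) -> T {
        let boxed = self.take_erased(handle.index).expect("handle from a different store");
        *boxed.downcast::<T>().expect("handle type matches stored type")
    }
}

fn main() {
    let mut store = Store::default();
    let text = store.insert(String::from("opaque"));
    let byte = store.insert(42_u8);
    assert_eq!(store.take(byte), 42);
    assert_eq!(store.take(text), "opaque");
}
```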

If you have an existing C library using struct { void *data; void (*dtor)(void*); } or need to provide a stable ABI, dyn '_ + Destruct style internals and a type-remembering generic wrapper for safe downcasting can make sense. Otherwise just use dyn Any.
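For reference, this is roughly what the vtable-free struct { void *data; void (*dtor)(void*); } pair can look like on the Rust side. OpaqueUserData and drop_boxed are hypothetical names, and this is a sketch rather than a vetted FFI wrapper: the infrastructure can always drop the value through the monomorphized dtor, but getting the value back is unsafe, because nothing checks the type. That unchecked step is exactly what a typed wrapper (or dyn Any) is there to protect.

```rust
use std::ffi::c_void;

/// Rust mirror of the C pair `struct { void *data; void (*dtor)(void*); }`:
/// a data pointer plus one destructor per stored type, no vtable.
#[repr(C)]
struct OpaqueUserData {
    data: *mut c_void,
    dtor: unsafe extern "C" fn(*mut c_void),
}

/// Instantiated once per `T`; rebuilds the `Box<T>` and drops it.
unsafe extern "C" fn drop_boxed<T>(data: *mut c_void) {
    unsafe { drop(Box::from_raw(data as *mut T)) }
}

impl OpaqueUserData {
    /// `T: 'static` so the erased value cannot hold borrows that might dangle.
    fn new<T: 'static>(value: T) -> Self {
        OpaqueUserData {
            data: Box::into_raw(Box::new(value)) as *mut c_void,
            dtor: drop_boxed::<T>,
        }
    }

    /// # Safety
    /// `T` must be exactly the type passed to `new`; nothing checks this.
    unsafe fn into_inner<T>(self) -> T {
        let data = self.data as *mut T;
        std::mem::forget(self); // ownership moves back to the caller; skip our Drop
        unsafe { *Box::from_raw(data) }
    }
}

impl Drop for OpaqueUserData {
    fn drop(&mut self) {
        // The infrastructure can free the value without knowing its type.
        unsafe { (self.dtor)(self.data) }
    }
}

fn main() {
    let opaque = OpaqueUserData::new(vec![1, 2, 3]);
    // The receiving side "knows" the type, just like casting `void *` back in C.
    let v: Vec<i32> = unsafe { opaque.into_inner() };
    assert_eq!(v, [1, 2, 3]);
    // Dropped without ever being unwrapped: `drop_boxed::<String>` frees it.
    let _unclaimed = OpaqueUserData::new(String::from("freed via dtor"));
}
```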

Case study: I'm slowly working on wrapping a C/C++[2] dylib which stores a pseudo-global void *userdata provided to callbacks, with the explicit goal of providing maximally typesafe bindings without prohibiting direct use of the underlying API (notably, integrating with Rust-external management of the pseudo-global resource), and the bindings already carry a lifetime (of said pseudo-global resource). I haven't exposed the userdata yet, but if I do, this is a perfect storm for a generic downcast_unchecked. And yet, I'm still considering compromising on the stated requirements, because the API complexity cost of making that work shouldn't be ignored either.


  1. vtables aren't guaranteed to be unique per type; in fact, you often get one per interested codegen unit before LTO might deduplicate them again.

  2. In this case, the "unified language" of C/C++ is correct, since the library deliberately exposes both C and C++ versions of the API from the dylib for consumption.


Thanks for the answer, @CAD97. I have to admit I don't fully understand most of it, but I will ponder it, though I'll likely stay with dyn :wink: