Polymorphic iterator/collection idiom? Trait object penalties?


I am working on some network programming and need to send some fields (of various types) delimited by "\0". These fields may be integers, str, String, etc. I come from an interpreted language (mostly python) background, but am very enthusiastic about the safety guarantees and performance of Rust. I am definitely not yet a systems programmer!

As I understand it there are a couple ways to have a polymorphic collection comprised of different types.

  1. Wrap each member in an enum. This seems pretty tedious from a coding standpoint, due to the number of requests I need to write by hand.
  2. Use a trait object, via the dyn keyword. So my collection could be something like &[&dyn MyTrait] or &[Box<dyn MyTrait>]. As I understand it this comes with a runtime penalty.

I am also curious about using a macro here to get the best of both worlds.

How big is the performance penalty with trait objects? Is it using a lookup table of some kind? Or am I misinformed about the trait object approach?

To illustrate, here is a simple use case where I need to write some integers and an empty string, along with a utility function to do the writing, generic over Display. Note that sometimes the fields will end up being complex structs that need custom trait implementations for the writing. The issue is at least illustrated by the need to serialize empty strings (which means only the \0 delimiter is written rather than {x}\0).

I would appreciate any advice on this issue.

Many thanks!

fn write_example(dst: &mut BytesMut, user_input: u32) -> Result<()> {
    write_fields(&[85, 2, user_input, ""], dst) // <- this obviously won't compile: u32 and &str mixed in one array
}

fn write_fields<T: std::fmt::Display>(fields: &[T], dst: &mut BytesMut) -> Result<()> {
    let msg: String = fields.iter().map(|x| format!("{x}\0")).collect();
    let bytes = prefix_length(&msg.into_bytes())?; // prefix_length: my own length-prefix helper
    dst.reserve(4 + bytes.len() + 1);
    // ... write `bytes` into `dst` and return
    Ok(())
}
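For context, here is a sketch of what option 2 (trait objects) would look like for the same fields; the join_fields name is made up, and this version does compile because every element of the slice has the same pointer-sized type:

```rust
use std::fmt::Display;

// Sketch: the same mixed fields behind `&dyn Display` trait objects.
fn join_fields(fields: &[&dyn Display]) -> String {
    fields.iter().map(|x| format!("{x}\0")).collect()
}

fn main() {
    let user_input: u32 = 7;
    // u32s and a &str can share one slice once each is a `&dyn Display`.
    let msg = join_fields(&[&85u32, &2u32, &user_input, &""]);
    assert_eq!(msg, "85\02\07\0\0"); // empty field contributes just the delimiter
}
```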

Is it the case that a given "field" (which can somehow be identified) will always have the same type? Because in that case, the usual approach is to use neither an enum nor a trait object, but parametric generics. See e.g. Serde.

It's a vtable. See also the Book.

I recommend not bothering about performance unless benchmarking shows a difference. Both ways will be far faster than Python 🙂 Each way has different performance characteristics, and it's hard to tell which one will be faster without a benchmark. Generally, you're right: an enum is likely to be faster, but various conditions (e.g. bad branch prediction or a bigger size) may make it actually slower.

The way dyn Trait is implemented is by using a vtable: it's like passing a function pointer around and calling it.
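A minimal illustration of that dispatch (the Speak trait and Dog/Cat types are made up for the example): each &dyn Speak is a fat pointer carrying a data pointer plus a pointer to a per-type vtable, and the call goes through that table at runtime.

```rust
trait Speak {
    fn speak(&self) -> String;
}

struct Dog;
struct Cat;

impl Speak for Dog {
    fn speak(&self) -> String { "woof".into() }
}
impl Speak for Cat {
    fn speak(&self) -> String { "meow".into() }
}

fn main() {
    // Heterogeneous collection: both elements are `&dyn Speak`.
    let animals: Vec<&dyn Speak> = vec![&Dog, &Cat];
    // Each call is an indirect call through the vtable.
    let sounds: Vec<String> = animals.iter().map(|a| a.speak()).collect();
    assert_eq!(sounds, ["woof", "meow"]);

    // A trait-object reference is a fat pointer: twice the size of a plain one.
    assert_eq!(
        std::mem::size_of::<&dyn Speak>(),
        2 * std::mem::size_of::<&Dog>()
    );
}
```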

I think not:

Is it the case that a given "field" (which can somehow be identified) will always have the same type?

I suppose it depends on the level of abstraction. To me, the fields are of different types: a "field" here could be a custom struct, an integer, a string, or an empty string like in the example. However, they do all have to "become" bytes to get written to the buffer, so they start as different types and then I need a generic translation to turn them into bytes. Since I have many different request functions to write, I'd like to abstract away most of this.

In the example, the code that has a u32 in the same array as a str doesn't compile, so in this sense they are different types. But maybe I am using the wrong abstraction?

Thanks, I will check out parametric generics, had not heard of that.


I recommend not bothering about performance unless benchmarking shows a difference. Both ways will be far faster than Python

Yep, premature optimization is a no-no. I guess I will try the dyn trait object approach and see if that dog hunts. The message sending part of this application is not as performance critical as processing the responses.


That's not what I mean. The question is: would it be possible to describe the protocol as a struct with statically-typed fields? The fact that you are trying to express it as an array doesn't matter much – if it's a heterogeneous collection where, however, every position corresponds to a constant type, then it should be better expressed as a struct or a tuple, not as an array.
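As a small sketch of what I mean (the field values are just the ones from your example): a tuple or a struct already expresses a heterogeneous, statically-typed sequence, which an array cannot.

```rust
fn main() {
    // let fields = [85u32, 2u32, ""];  // does not compile: mixed element types
    // A tuple holds a known, fixed type at each position.
    let fields: (u32, u32, &str) = (85, 2, "");
    let msg = format!("{}\0{}\0{}\0", fields.0, fields.1, fields.2);
    assert_eq!(msg, "85\02\0\0");
}
```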

Got it. Good question. The various collections of fields would not match a single protocol; some "Requests" may have 50 fields and some may have 3. I could make a trait that all request types share, each with its own implementation, but I think that is back to the same problem at a different level of abstraction. Usually a Request will have a couple of static integers to identify the type of message, and then the rest of the fields are evaluated at runtime based on what the caller wants. But for each Request variant those fields will be of a known type and quantity, so each variant could be represented as a struct.

I was planning on having a Request enum and writing a function for each variant. But I was still looking for a utility function that could take the collection of fields and write to the buffer, in order to make writing each variant's function easier.

I suppose I could wrap an underlying struct in each enum variant and have that struct handle the writing, but I don't know if that's any better until I try.

Yes, then that is indeed similar to the use case of e.g. Serde, and in this case, you'll be able to do this with generics instead of trait objects.

You can do this by implementing the writing for primitive types explicitly, and then just blindly calling the respective trait method on each field. (That's incidentally also why this method is easy to de-boilerplate using a proc-macro.)
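As a sketch of that pattern (the WriteField trait, the LoginRequest struct, and its field names are all hypothetical here): implement the writing once per primitive, then a request type just calls the trait method on each of its fields; that per-field delegation is exactly the part a derive macro could generate.

```rust
// Hypothetical trait: each implementor knows how to append itself
// plus a "\0" delimiter to a byte buffer.
trait WriteField {
    fn write_field(&self, dst: &mut Vec<u8>);
}

impl WriteField for u32 {
    fn write_field(&self, dst: &mut Vec<u8>) {
        dst.extend_from_slice(self.to_string().as_bytes());
        dst.push(0);
    }
}

impl WriteField for &str {
    fn write_field(&self, dst: &mut Vec<u8>) {
        dst.extend_from_slice(self.as_bytes());
        dst.push(0); // an empty string yields just the delimiter
    }
}

// Hypothetical request type with statically-typed fields.
struct LoginRequest<'a> {
    kind: u32,
    version: u32,
    name: &'a str,
}

impl WriteField for LoginRequest<'_> {
    fn write_field(&self, dst: &mut Vec<u8>) {
        // Blindly delegate to each field's impl; no trait objects, no enum.
        self.kind.write_field(dst);
        self.version.write_field(dst);
        self.name.write_field(dst);
    }
}

fn main() {
    let mut buf = Vec::new();
    LoginRequest { kind: 85, version: 2, name: "" }.write_field(&mut buf);
    assert_eq!(buf, b"85\x002\x00\x00");
}
```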

Got it, many thanks. I have not been able to get back to this code yet, but will try this when I can. Thank you both for the help on this!

Have a nice weekend!
