Does the async_trait crate really have overhead?

In every blog I've read, people say the async_trait crate has overhead, but when I benchmarked it, the difference from a plain async fn impl for a service was about 20 nanoseconds. That's pretty low: over 100K operations it adds up to about 2 ms across the whole system. So where is the overhead?! This is a joke.

The async-trait crate does a memory allocation for every async method call to type-erase the returned future. Maybe that is the reason for the perf hit? To know the exact reason for sure, you will have to use a profiler.
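To see where that allocation comes from, here is a rough, hand-written sketch of the shape async-trait gives a trait method: instead of an `async fn`, the method returns a pinned, boxed `dyn Future`. This is an approximation, not the macro's literal expansion (which adds further lifetime and `Send` handling); the `Service` trait, the `Doubler` type, and the small std-only `block_on` executor are hypothetical names used so the example runs without tokio.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

// Hypothetical trait, written roughly the way async-trait rewrites
// `async fn call(&self, req: u32) -> u32`: the concrete future type is
// erased behind `dyn Future`, so the future has to live on the heap.
trait Service {
    fn call<'a>(&'a self, req: u32) -> Pin<Box<dyn Future<Output = u32> + Send + 'a>>;
}

struct Doubler;

impl Service for Doubler {
    fn call<'a>(&'a self, req: u32) -> Pin<Box<dyn Future<Output = u32> + Send + 'a>> {
        // `Box::pin` moves the async block's state machine onto the heap:
        // this is the allocation paid on every call.
        Box::pin(async move { req * 2 })
    }
}

// Minimal std-only executor so the example runs without tokio.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = std::pin::pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}

fn main() {
    // Type erasure is what lets the trait be used as a trait object.
    let svc: Box<dyn Service> = Box::new(Doubler);
    println!("{}", block_on(svc.call(21))); // prints 42
}
```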


Rust is a performance-focused language, and in principle makes a distinction between zero-cost abstractions and abstractions that do add some overhead. It's up to the user to decide if that overhead matters for them.
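For contrast, a sketch of the zero-cost side of that distinction, assuming Rust 1.75 or later, where `async fn` can be written directly in a trait. With static dispatch the compiler knows each implementation's concrete future type, so nothing is boxed. `Service`, `Doubler`, `call_twice`, and the `block_on` helper are hypothetical names. The trade-off is that a trait with a native `async fn` cannot be used as `dyn Service`, which is one of the things async-trait's boxing buys.

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

// Native `async fn` in a trait (stable since Rust 1.75). Each impl gets its own
// compiler-generated future type, so calling it does not need a Box.
trait Service {
    async fn call(&self, req: u32) -> u32;
}

struct Doubler;

impl Service for Doubler {
    async fn call(&self, req: u32) -> u32 {
        req * 2
    }
}

// Generic caller: monomorphized per service type (static dispatch), so the
// concrete future types are known at compile time.
async fn call_twice<S: Service>(svc: &S, req: u32) -> u32 {
    let first = svc.call(req).await;
    svc.call(first).await
}

// Minimal std-only executor so the example runs without tokio.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = std::pin::pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}

fn main() {
    println!("{}", block_on(call_twice(&Doubler, 5))); // prints 20
}
```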


Tower uses that technique for async traits, and Tower is used widely without any issues!
I don't think this really has an impact on the system!

Rust also tries to cater to many use-cases. An extra allocation may be insignificant on a webserver, but a major issue on a microcontroller with embassy, for example.


Could you explain which trait in Tower you mean here? AFAIK, it is specifically designed to avoid these hidden costs.

Just because you don't notice it doesn't mean there's not an impact.

When libraries state that they have non-zero overhead (especially allocations), this doesn't mean they're slow or bad. It's just stating a fact, and whether that is totally fine and unnoticeable, or adds up to a problem, depends on where it's used.
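One way to make a hidden allocation visible without a profiler is a counting global allocator. The sketch below wraps `std::alloc::System`, counts every allocation, and compares a type-erased future (the approach async-trait takes) with a plain `async fn`. `CountingAlloc`, `allocs_for`, `boxed_call`, and `plain_call` are hypothetical names; the counter is process-wide, so allocations made by other threads would be counted too.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::future::Future;
use std::hint::black_box;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Global allocator that forwards to the system allocator and counts allocations.
struct CountingAlloc;

static ALLOCS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

// Waker that does nothing: the futures below are ready on their first poll.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Type-erased, as async-trait does: one Box per call.
fn boxed_call(req: u32) -> Pin<Box<dyn Future<Output = u32> + Send>> {
    Box::pin(async move { req + 1 })
}

// Plain async fn: the future is an ordinary stack value.
async fn plain_call(req: u32) -> u32 {
    req + 1
}

// Creates and polls `n` futures, returning how many heap allocations happened.
fn allocs_for<F, Fut>(n: u32, make: F) -> usize
where
    F: Fn(u32) -> Fut,
    Fut: Future<Output = u32>,
{
    // Build the waker before counting, since `Arc::new` allocates too.
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let before = ALLOCS.load(Ordering::Relaxed);
    for i in 0..n {
        // `black_box` discourages the optimizer from eliding the allocation.
        let mut fut = std::pin::pin!(black_box(make(i)));
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(v) => assert_eq!(v, i + 1),
            Poll::Pending => unreachable!("these futures complete on the first poll"),
        }
    }
    ALLOCS.load(Ordering::Relaxed) - before
}

fn main() {
    println!("boxed: {} allocations", allocs_for(1000, boxed_call));
    println!("plain: {} allocations", allocs_for(1000, plain_call));
}
```

In a quiet single-threaded program the boxed version should report one allocation per call and the plain version none. Whether 1000 extra allocations matter is exactly the "depends on where it's used" question.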

This is like being annoyed that sugar-free coke says it's sugar-free, even though you can drink regular coke and it won't kill you.


This example from the tokio blog for service_trait, which is heavily used in tower, allocates on each call!

Am I wrong?!

This example doesn't use async_trait; it explicitly opts in to dyn Future to simplify the example. One can use a statically determined type as Handler::Future just fine.
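A sketch of the statically determined alternative, in the style of tower's `Service` trait with its `Future` associated type: each implementation names its own future type, so one can return an unboxed future while another opts in to `Box<dyn Future>`. `Handler`, `Double`, `BoxedDouble`, and `run` are hypothetical, and `std::future::Ready` is just a convenient nameable future; tower's real `Service` also has `poll_ready`, returns a `Result`, and takes `&mut self`.

```rust
use std::future::{ready, Future, Ready};
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

// Hypothetical trait in the style of tower's `Service`: the future type is an
// associated type chosen by each implementation.
trait Handler {
    type Future: Future<Output = u32>;
    fn call(&self, req: u32) -> Self::Future;
}

// Statically determined future type: `Ready<u32>` is a plain value, no Box.
struct Double;

impl Handler for Double {
    type Future = Ready<u32>;
    fn call(&self, req: u32) -> Self::Future {
        ready(req * 2)
    }
}

// Opting in to type erasure, as the blog example does: one Box per call.
struct BoxedDouble;

impl Handler for BoxedDouble {
    type Future = Pin<Box<dyn Future<Output = u32> + Send>>;
    fn call(&self, req: u32) -> Self::Future {
        Box::pin(async move { req * 2 })
    }
}

// Generic over any Handler: with `Double`, nothing is boxed.
fn run<H: Handler>(handler: &H, req: u32) -> u32 {
    block_on(handler.call(req))
}

// Minimal std-only executor so the example runs without tokio.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = std::pin::pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}

fn main() {
    println!("{} {}", run(&Double, 21), run(&BoxedDouble, 21)); // prints "42 42"
}
```

Boxing shows up in such examples partly because the type of an `async` block cannot be written out in `type Future = ...` on stable Rust, so avoiding the allocation needs a nameable future type such as `Ready` or a hand-written one.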


Yes, I know that. I meant the same technique, not the async_trait crate itself!

So you mean it's not good for a big project?!

For a big project, the only good answer to any performance question is "measure it under realistic load". This overhead might be important if allocation/deallocation is in a tight loop or cache misses somehow dominate the other operations, but most likely the measurement will show no difference other than noise.
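In that spirit, a rough std-only micro-benchmark sketch that times creating and polling a boxed future versus a plain `async fn` future. It only measures the per-call difference in isolation and says nothing about realistic load; for serious numbers, a proper harness (for example the criterion crate) is the better choice. `bench`, `boxed_call`, and `plain_call` are hypothetical names.

```rust
use std::future::Future;
use std::hint::black_box;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::time::{Duration, Instant};

// Waker that does nothing: the futures below are ready on their first poll.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Type-erased future, as async-trait produces: one Box per call.
fn boxed_call(req: u64) -> Pin<Box<dyn Future<Output = u64> + Send>> {
    Box::pin(async move { req + 1 })
}

// Plain async fn: no allocation.
async fn plain_call(req: u64) -> u64 {
    req + 1
}

// Creates and polls `n` futures; returns the sum of their outputs (so the
// work is not optimized away) and the elapsed wall-clock time.
fn bench<F, Fut>(n: u64, make: F) -> (u64, Duration)
where
    F: Fn(u64) -> Fut,
    Fut: Future<Output = u64>,
{
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut sum = 0;
    let start = Instant::now();
    for i in 0..n {
        let mut fut = std::pin::pin!(make(black_box(i)));
        // These futures are ready on the first poll.
        if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) {
            sum += v;
        }
    }
    (black_box(sum), start.elapsed())
}

fn main() {
    let n = 1_000_000;
    let (sum_boxed, t_boxed) = bench(n, boxed_call);
    let (sum_plain, t_plain) = bench(n, plain_call);
    assert_eq!(sum_boxed, sum_plain);
    let ns_per_call = |t: Duration| t.as_nanos() as f64 / n as f64;
    println!("boxed: {:.1} ns/call", ns_per_call(t_boxed));
    println!("plain: {:.1} ns/call", ns_per_call(t_plain));
}
```

Run it with `cargo run --release`; debug-build timings are not meaningful.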


I measured async_trait vs. a plain async fn impl for a service.
The difference is 20-30 nanoseconds. I don't think that has any impact on a big system: over half a million concurrent calls, the total overhead is only about 10-15 milliseconds, which is effectively zero.
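The arithmetic behind such estimates is just per-call cost times number of calls. A tiny helper, hypothetically named `total_overhead_ms`, makes it easy to plug in other figures; it assumes the per-call costs simply add up, as they would on a single core.

```rust
/// Total extra time in milliseconds for `calls` operations that each cost
/// `ns_per_call` extra nanoseconds, assuming the costs simply add up.
fn total_overhead_ms(ns_per_call: f64, calls: u64) -> f64 {
    // 1 ms = 1_000_000 ns
    ns_per_call * calls as f64 / 1_000_000.0
}

fn main() {
    println!("{} ms", total_overhead_ms(20.0, 100_000)); // prints "2 ms"
    println!("{} ms", total_overhead_ms(20.0, 500_000)); // prints "10 ms"
    println!("{} ms", total_overhead_ms(30.0, 500_000)); // prints "15 ms"
}
```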