When I tried to benchmark the code above, I was expecting a few hundred hits per second, but for some reason it runs awfully slow, and CPU usage sits between 0% and 0.3% most of the time.
Profilers like cargo-flamegraph will help you figure out where the code is sitting idle.
It sounds like you are blocking a thread or an async task on something, which makes it do nothing for a while. There might also be a deadlock if several tasks are waiting on each other.
Try to profile it, and potentially use debug logging, to figure out what's really happening. If you still don't know how to fix it by then, post some more details here.
I cannot use flamegraph on Windows as it depends on Unix components, and the same goes for almost every profiler I tried. There is no explicit thread locking or thread sleeping, and there is no async functionality, so I have no clue how to do profiling. However, I am using channels heavily (small data is passed via channels).
Update: I tried on WSL, and flamegraph could not find the tools it needs.
Channels are either async or blocking. If they are blocking, they do block the thread. Likewise, TCP is either async or blocking. So there's a fair chance that's where your issue is.
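To illustrate the point about blocking channels, here is a minimal sketch (the producer delay and values are made up for the example) showing how `std::sync::mpsc::Receiver::recv` parks the calling thread: while it waits, the thread uses essentially no CPU, which matches the near-0% usage you're seeing.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Hypothetical demo: a blocking recv() on an mpsc channel.
fn blocking_recv_demo() -> u32 {
    let (tx, rx) = mpsc::channel::<u32>();
    thread::spawn(move || {
        // Simulate a slow producer on another thread.
        thread::sleep(Duration::from_millis(200));
        tx.send(42).unwrap();
    });
    // Blocks this thread (~0% CPU) until the message arrives.
    rx.recv().unwrap()
}

fn main() {
    let v = blocking_recv_demo();
    assert_eq!(v, 42);
    println!("received {} after blocking", v);
}
```

If many worker threads are parked like this waiting on channels (or on blocking TCP reads), overall CPU usage collapses even though the program is "busy" waiting.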
I suppose if there is no profiler working on Windows, you're down to debug logging and timing things manually.
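Timing things manually can be as simple as wrapping suspect calls with `std::time::Instant`. A minimal sketch (the `handle_request` body here is a hypothetical stand-in for your real handler):

```rust
use std::time::{Duration, Instant};

// Hypothetical stand-in for the real request handler.
fn handle_request() {
    std::thread::sleep(Duration::from_millis(10));
}

// Wrap the suspect call and return how long it took.
fn timed_handle() -> Duration {
    let start = Instant::now();
    handle_request();
    start.elapsed()
}

fn main() {
    let elapsed = timed_handle();
    // Anything unexpectedly large here points at a blocking call.
    println!("handle_request took {:?}", elapsed);
    assert!(elapsed >= Duration::from_millis(10));
}
```

Sprinkling these around the accept loop, the channel sends/receives, and the socket reads should narrow down where the wall-clock time is going.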
I just noticed that it zips through the benchmark on Linux, but the Windows version is problematic.
I use the profiler available in VS (not VS Code); couldn't that be used here?
On Windows, the closest equivalent to perf on Linux is Event Tracing for Windows (ETW)-based tooling, such as WPA and xperf.
I profiled my rust program some time ago, and I recall using Very Sleepy after some googling. It works practically out of the box and supports launching a program.
You’re doing blocking I/O in the main (accept) thread when you read() from the socket. If there’s no data on the socket yet, it’ll block, and you won’t be able to service any other connection.
If you want to try rolling your own threaded server, then accept the connection and dispatch it to the pool immediately - don’t do any work on the accept thread. But, if you’re not spawning a new thread per connection, and instead using a pool, you’ll run into similar issues of tying up worker threads in their blocking I/O.
If you don’t expect lots of connections, a thread-per-request can be sufficient and fast enough. If not, you need to explore evented I/O models instead.
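A minimal thread-per-connection sketch of the "dispatch immediately, do no work on the accept thread" approach (the echo behavior and buffer size are just placeholders for your real protocol):

```rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;

// Hypothetical echo server: the accept loop does no I/O itself;
// each connection is handed off to its own thread immediately.
fn run_echo_server() -> std::net::SocketAddr {
    let listener = TcpListener::bind("127.0.0.1:0").unwrap(); // OS-assigned port
    let addr = listener.local_addr().unwrap();
    thread::spawn(move || {
        for stream in listener.incoming() {
            let mut stream = match stream {
                Ok(s) => s,
                Err(_) => continue,
            };
            // Dispatch immediately: all reads/writes happen off the
            // accept thread, so one slow client can't stall accepts.
            thread::spawn(move || {
                let mut buf = [0u8; 1024];
                while let Ok(n) = stream.read(&mut buf) {
                    if n == 0 {
                        break; // client closed the connection
                    }
                    if stream.write_all(&buf[..n]).is_err() {
                        break;
                    }
                }
            });
        }
    });
    addr
}

fn main() {
    let addr = run_echo_server();
    let mut client = TcpStream::connect(addr).unwrap();
    client.write_all(b"ping").unwrap();
    let mut buf = [0u8; 4];
    client.read_exact(&mut buf).unwrap();
    assert_eq!(&buf, b"ping");
    println!("echo ok");
}
```

With a fixed-size pool instead of `thread::spawn` per connection, the same blocking reads would tie up workers, which is the trade-off mentioned above.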
Yes, I am doing blocking I/O on the main thread, but as soon as I have input, the request goes to a different thread pool (after a tiny bit of request processing). @vitalyd @najamelan I just noticed that the program is awfully slow when it goes via localhost, but becomes fast again when I use the full IP, even on Windows.