Thread scheduler

I am simulating a distributed system consisting of many servers, many more than I have cores.
Each simulated server runs in a std::thread. It looks like thread IDs are integers allocated in the order the threads are spawned. It also seems that the algorithm schedules the ready thread with the next largest thread ID. If there isn't one, it apparently selects the ready thread with the smallest ID.

The result is correlated behavior that I don't expect to see on the physical servers. Is there any way to randomize the selection of the next ready thread to run?

The std::thread module simply provides raw OS threads, and scheduling is done by the operating system.

That's what I thought, @alice. Are there any other thread libraries that would give me some control?

You can look into how you can tune the OS scheduler for your operating system.

Alternatively you could consider using a threadpool such as rayon (If you're CPU bound) or our async ecosystem (If you're IO bound).

1 Like

Hello,
A more complex but effective approch can be to create N-1 threads where N is number of cores, set threads affinities to 1 thread 1 core. Create a pool of fibers ( 1 fiber is 1 server).
Create a job scheduler that runs fiber in the desire order in a thread.

Edit 1 : When I say fibers I talk about Win32 Fiber or Posix Fiber (see here: https://nullprogram.com/blog/2019/03/28/)

Edit 2 : Take a look a the term Green-thread :). I don't know if rust has support for it now.

Thanks, @JLalu, but that sounds like more work than I did to build the simulator. :slightly_smiling_face: I was hoping for something like setting an OS configuration parameter somewhere.

Even without being "I/O bound" in the traditional sense, it might make sense to structure the simulated servers with async functions since you can provide your own user-space task scheduler fairly easily. You'd write a custom executor that selects the "ready" task to run at random. And rather than "thread IDs", you'd have "task IDs". Of course, if your "servers" are "CPU bound", then you'll need to provide explicit yield points, ala tokio::task::yield_now to allow your scheduler to swap out for another task.

For an idea of how to get started, take a look at this: https://rust-lang.github.io/async-book/02_execution/04_executor.html

You'd change the Executor::run bit toward the end to select a task at random from the ready_queue rather than the first in the queue.

I am personally a little bit skeptical of the idea that a random task selection is necessarily better than what the thread scheduler is doing - you would expect your servers to all be making forward progress more-or-less at the same rate, no?

I started this project long before async/await was available, but I'm looking into making that change. The pointer to the Executor is just what I'll need. Thanks.

A little more on the problem.

Screenshot 2020-01-07 09.43.24

The server at (0,1) sends a message to each of its neighbors. If that's the first time the receiving server sees this message, it creates a link to the sender and forwards the message to all its neighbors. Look at the server at (2,3). I would expect it would receive the message from (1,3) before getting it from (3,1). Trace files show a clear bias in the next scheduled thread, something that simply cannot happen on the physical system.

2 Likes

Gotcha, that makes sense. If you use something like tokio's channels for message passing, you'll get your cooperative multitasking for free. You may even try experimenting with just using tokio's default executor and see if it gets you the behavior you expect. It's going to run tasks as they're "ready". I would expect the shorter path to get through the ready states faster than the longer path because tokio isn't executing tasks in a "fair" way like a thread scheduler, but instead in a more eager way.