Is the Context type in futures 0.2.0 meant for managing the local state of a task?


#1

Hello. In Erlang/OTP, a gen_server holds process-local state. Is Context in futures 0.2.0 meant to be the same kind of thing?

If yes, can anyone please provide an example of how to store data locally on the context?


#2

It’s not really meant for that. It’s more of an infra piece for building custom schedulers, rather than something end users touch: https://github.com/rust-lang-nursery/futures-rfcs/blob/master/task-context.md


#3

So, can you tell me how I would inject a state component into my state machine? I might want to build something like a gen_server or a state machine at the task level that manages its own internal state, so that I don’t need global mutexes and the like.

To simplify the question: I want to pass state in when the task starts, and then, in the state machine, update that state on every state change and pass it on to the next state.


#4

I’m not familiar with gen_server so can’t really compare/contrast.

By “task” I suppose you mean a future - tasks are a separate thing in futures/tokio, which are the underlying things that drive a future (i.e. park/unpark for notifications).

Holding state in the futures world is done by having fields in your future impl, possibly wrapped with a shared ownership smart pointer (Rc or Arc, depending on threading concerns). If you’re running on a single threaded reactor/event loop, there should not be a need for a Mutex.
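To make the "state lives in fields" idea concrete, here is a minimal sketch of the kind of task-level state machine the question describes. It is plain Rust (not the futures 0.2 `Future` trait), and the `Phase`/`Connection` names are made up for illustration: a single task owns the machine, so state moves from transition to transition with no Mutex involved.

```rust
// Hypothetical task-local state machine: the state is just fields on
// the struct, owned by whoever drives the machine.

#[derive(Debug, Clone, Copy, PartialEq)]
enum Phase {
    Connecting,
    Connected { session_id: u64 },
    Closed,
}

struct Connection {
    phase: Phase,
    bytes_seen: usize, // per-task data travels along with the machine
}

impl Connection {
    fn new() -> Self {
        Connection { phase: Phase::Connecting, bytes_seen: 0 }
    }

    // Each event updates the owned state and moves to the next phase;
    // no locking is needed because nothing else can see this struct.
    fn on_event(&mut self, event: &str) {
        self.bytes_seen += event.len();
        self.phase = match self.phase {
            Phase::Connecting => Phase::Connected { session_id: 1 },
            Phase::Connected { .. } if event == "disconnect" => Phase::Closed,
            other => other,
        };
    }
}

fn main() {
    let mut conn = Connection::new();
    conn.on_event("connect");
    assert_eq!(conn.phase, Phase::Connected { session_id: 1 });
    conn.on_event("disconnect");
    assert_eq!(conn.phase, Phase::Closed);
    println!("final state: {:?}, bytes seen: {}", conn.phase, conn.bytes_seen);
}
```

In a futures-based server the same struct would simply become the type you implement `Future` (or a stream handler) on, with `poll` driving `on_event`.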


#5

What do you suggest? I am building a highly scalable MQTT broker in Rust. It will be handling lots and lots of TCP connections.

Questions.

  1. Should I use a single-threaded or multi-threaded runtime?
  2. Will Rc/Arc add any noticeable overhead? In the multi-threaded runtime case I might have to use Arc with a Mutex.
  3. Or should I be using MPSC channels for sharing resources?

Suggestions would be welcome. Thank you in advance.


#6

At a high-level, I personally would likely look into one of these two approaches:

  1. Single reactor + multithreaded executor.
  2. Multiple reactors with each reactor also being the executor.

#1 is good if a single thread (core) can sustain the entire I/O workload; the rest of the cores (or some subset) are dedicated to servicing the actual requests. This requires some communication/message passing between the executors and the I/O thread, so you’re looking at cross-core memory traffic and some synchronization/atomic operations.

#2 is good if you want to spread the I/O across cores. In this case, each thread will service its own connections and also execute the requests from those connections. Each reactor/eventloop is essentially single-threaded and you don’t need to move data around from/to IO and execution: it’s all on the same thread. You’ll want some mechanism to try and evenly distribute connections (and workload) across the separate reactors.
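Approach #2's shape can be sketched without any real I/O. This is an assumption-laden toy, not tokio code: threads stand in for the per-core reactors, channel sends stand in for accepted connections, and `run_acceptor` is an invented name. The point is the ownership structure: each worker thread services only the connections handed to it, so no state is shared.

```rust
use std::sync::mpsc;
use std::thread;

// Distribute `total` connection ids round-robin across `workers`
// per-thread "reactors"; each worker owns and services only its share.
fn run_acceptor(total: u32, workers: usize) -> Vec<usize> {
    let mut senders = Vec::new();
    let mut handles = Vec::new();

    for _ in 0..workers {
        let (tx, rx) = mpsc::channel::<u32>();
        senders.push(tx);
        handles.push(thread::spawn(move || {
            // This thread exclusively owns its connections: no Mutex.
            rx.iter().count() // "service" each connection as it arrives
        }));
    }

    // Acceptor loop: hand each incoming connection to the next worker.
    for conn_id in 0..total {
        senders[(conn_id as usize) % workers].send(conn_id).unwrap();
    }
    drop(senders); // close the channels so the workers finish

    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let served = run_acceptor(100, 4);
    println!("connections served per worker: {:?}", served);
    assert_eq!(served, vec![25, 25, 25, 25]);
}
```

Round-robin is the simplest distribution policy; a real broker might instead balance on current load or connection count.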

Rc and Arc are unlikely to add any significant overhead on their own unless you’re doing nothing but manipulating the refcount - unlikely in anything other than contrived examples.

You’ll want to avoid Mutex(es) as they’ll limit scalability. Try to use channels or approach #2 above.


#7

This is a really detailed answer, thank you. OK, I am a bit of a beginner at this, so how do I make sure I run the reactor on one core and the executors on the other cores?

If possible, can you give me a little bit of logic?

And also, which Rust API should I use to get this functionality? Is there any crate that will take care of this automatically?

I know I am asking a lot of questions, but honestly, the answers you are providing are pure gold. I just want to keep asking :slight_smile:


#8

Take a look at the tokio crate. It has just undergone some notable changes, but the guide appears to have been updated for it (at least from a quick look): https://tokio.rs/docs/getting-started/hello-world/.

I don’t know how familiar you are with Rust itself, but one suggestion would be to make sure you’re fairly comfortable with the language before jumping into futures + tokio. You can, of course, learn both at the same time but you’ll need to be very patient :slight_smile:.


#9

I think the tokio runtime uses approach #1 by default when we create a multi-threaded runtime.

Creating a Runtime does the following:

  - Spawn a background thread running a [Reactor] instance.
  - Start a ThreadPool for executing futures.

This is straight from the docs.

Now, all I need to figure out is how to do state management without a Mutex, for the server and the other clients.


#10

Yeah, that’s the default in the “reformed” Tokio; previously the default was a single thread that was both a reactor (i.e. the poller/notification handler for events) and an executor (i.e. it ran your futures).

If your executors (ie the threadpool futures) need to share mutable state, it’ll be a bit tricky to avoid a mutex. What you want ideally, IMO, is a sharded setup: each worker thread owns its mutable data and rarely communicates with other threads. Each shard, therefore, owns a subset of the overall data. When a shard does want to share data with another shard, they communicate over a channel using messages. The channels can be lock-free spsc style ring buffers/queues. From this standpoint, approach #2 is more amenable to that because you end up with, essentially, a mini full-fledged async server running on each thread, servicing IO and doing CPU work.
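The sharded setup described above can be sketched with standard library pieces. This is a toy under stated assumptions: `Msg`, `shard_of`, and the counter workload are all invented, and std's multi-producer channels stand in for the lock-free SPSC queues mentioned. Each shard thread exclusively owns one slice of the keyspace; everyone else reaches that data only by message, never by lock.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::mpsc;
use std::thread;

// Messages a shard understands: mutate a key, or read it back
// (the reply travels over a one-shot channel).
enum Msg {
    Incr(String),
    Get(String, mpsc::Sender<u64>),
}

// Deterministically route a key to the shard that owns it.
fn shard_of(key: &str, shards: usize) -> usize {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    (h.finish() as usize) % shards
}

fn main() {
    const SHARDS: usize = 2;
    let mut senders = Vec::new();

    for _ in 0..SHARDS {
        let (tx, rx) = mpsc::channel::<Msg>();
        senders.push(tx);
        thread::spawn(move || {
            // Shard-owned state: no other thread can touch this map.
            let mut counts: HashMap<String, u64> = HashMap::new();
            for msg in rx {
                match msg {
                    Msg::Incr(k) => *counts.entry(k).or_insert(0) += 1,
                    Msg::Get(k, reply) => {
                        let _ = reply.send(counts.get(&k).copied().unwrap_or(0));
                    }
                }
            }
        });
    }

    // Every operation goes to the shard that owns the key.
    for _ in 0..3 {
        let k = "client-42".to_string();
        senders[shard_of(&k, SHARDS)].send(Msg::Incr(k)).unwrap();
    }

    let (reply_tx, reply_rx) = mpsc::channel();
    let k = "client-42".to_string();
    senders[shard_of(&k, SHARDS)].send(Msg::Get(k, reply_tx)).unwrap();
    let count = reply_rx.recv().unwrap();
    assert_eq!(count, 3);
    println!("client-42 count = {}", count);
}
```

For an MQTT broker, the key being hashed might be the client id or topic, so all state for a given client lives on exactly one thread.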

Approach #2 is what the seastar C++ framework does, and I think it’s a great modern architecture for high scale and performant servers (they built scylladb on top of it). It would be interesting to see what a Rust version of such a thing would look and feel like.


#11

Thank you for the good guide on that.

Just for information: how much overhead do you think the current setup will create that could be avoided?

I guess, for 1M TCP connections, using approach #1 will be too much overhead?


#12

And above all, how do you think this will compare to Erlang? I mean, the Erlang scheduler is all about I/O-bound apps.

Just wanted to get an external person’s point of view on this.