I'm building my first webserver in Rust, having come from Python.
In Python, whenever I had jobs I wanted to outsource to workers, I would use rq and rq-scheduler. By jobs I mean anything that didn't immediately return to a user (e.g. a job that calculates something substantial and stores it in the db) or something that ran on a schedule (e.g. daily payment collections).
Which crates should I look at for similar functionality in Rust? I'd favor simplicity over a full feature set.
And a related question: my webserver is fully async (actix-web and sqlx), which made me wonder whether I even need a "worker", or whether I can achieve the same effects using async tasks. I'm new to async programming, so any advice would be greatly appreciated.
Python doesn't work well when you want to run compute-heavy tasks in multiple threads, because the GIL essentially serializes everything. Rust uses bare OS threads, so you can run tasks in the background as much as you want. There are libraries which provide a threadpool abstraction for doing exactly this. If you are using an async framework, it's usually enough to use its spawn() function (for async jobs) or spawn_blocking() (for sync jobs) to make sure a task is run on its threadpool.
Alternatively, if you want to use workers to distribute tasks among multiple computers, you can often use clients for ordinary message brokers like RabbitMQ or Redis (which has pub-sub capabilities) and follow the same patterns as you would in Python.
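To illustrate the point about bare OS threads, here is a minimal std-only sketch that splits a CPU-bound computation across several threads with std::thread::scope (stable since Rust 1.63). The function name and the sum-of-squares workload are hypothetical stand-ins for "something substantial"; unlike CPython threads, these chunks really do run in parallel.

```rust
use std::thread;

/// Computes the sum of squares of `data` on `n_threads` OS threads.
/// Each thread handles one chunk; the partial sums are added at the end.
fn parallel_sum_of_squares(data: &[u64], n_threads: usize) -> u64 {
    let n_threads = n_threads.max(1);
    // Round up so every element lands in some chunk; `chunks(0)` would panic.
    let chunk_size = ((data.len() + n_threads - 1) / n_threads).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = data
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || chunk.iter().map(|x| x * x).sum::<u64>()))
            .collect();
        // Join every thread and combine the partial results.
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}

fn main() {
    let data: Vec<u64> = (1..=100).collect();
    println!("{}", parallel_sum_of_squares(&data, 4)); // prints 338350
}
```

Scoped threads let the workers borrow `data` directly, so no Arc or cloning is needed.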
Your web server is almost certainly using an async runtime under the hood. To spawn a background task, you would just spawn a task on that runtime; typically that would be the tokio::spawn function. For doing expensive computations in async applications, you should read this article, which explains how you can do that.
Hey @alice - thanks for the reply, and for the article!
So your answer seems to suggest I don't need a worker at all? You mention tokio::spawn which is simply an async event loop and the article talks about multiple ways to spawn threads - but neither mentions workers.
To give a bit more context, the task I need to run is to pull 150,000 tweets from Twitter and save them to the database (so it's IO-bound).
How would you rank the 3 options (async tasks, threads, workers) from best to worst and why?
My task is pulling 150,000 tweets from Twitter and saving them to a db. I'm leaning towards a worker: I'm thinking I should spawn a new worker process AND have it run an async loop while all the API calls to Twitter are made.
Hm, what I mean by worker is a whole separate process with its own memory (unlike a thread). Think about running a separate container in docker-compose or Kubernetes - I would call that a worker.
That's what python-rq that I mentioned in the original post gives you. You start a separate process from the command line and communicate with it via a queue of events in Redis.
There's no point in spawning a new process. It only makes sense if you are using a language like Python with a GIL that makes single-process multithreading infeasible, or if you need to sandbox it in some way.
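For comparison with an rq worker process, here is a minimal std-only sketch of an in-process job queue: one background thread consumes jobs from a std::sync::mpsc channel, much like an rq worker reading from a Redis queue. The `Job` enum and the Vec standing in for the database are hypothetical.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical job type; with rq this would be a serialized function call.
enum Job {
    SquareAndStore(u64),
    Shutdown,
}

// Starts a worker thread that processes jobs from a channel, inside the same process.
fn start_worker() -> (mpsc::Sender<Job>, thread::JoinHandle<Vec<u64>>) {
    let (tx, rx) = mpsc::channel::<Job>();
    let handle = thread::spawn(move || {
        let mut stored = Vec::new(); // stand-in for the database
        // The loop also ends on its own once every Sender is dropped.
        for job in rx {
            match job {
                Job::SquareAndStore(n) => stored.push(n * n),
                Job::Shutdown => break,
            }
        }
        stored
    });
    (tx, handle)
}

fn main() {
    let (tx, worker) = start_worker();
    for n in 1..=5 {
        tx.send(Job::SquareAndStore(n)).expect("worker has stopped");
    }
    tx.send(Job::Shutdown).expect("worker has stopped");
    let stored = worker.join().expect("worker panicked");
    println!("{stored:?}"); // prints [1, 4, 9, 16, 25]
}
```

One trade-off to keep in mind: an in-memory channel does not persist anything, so queued jobs are lost if the process dies, whereas a Redis-backed queue survives a restart.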
See that is so interesting! I did not realize that. Again, beginner here, so pardon any ignorance.
Question then: is it possible to run an entire async event loop on a separate thread? Let's say I need to make 1500 calls to Twitter that each return 1000 tweets. Can I spawn 1 thread and run the 1500 calls in an async event loop on that thread? Is that how you'd do it?
That said, assuming your web server is also using a runtime, you can also just run both on the same runtime. This would be my first approach unless some reason for separating them comes up.
How should I think about comparing these two options? Code clarity-wise the first one is slightly better, but what about efficiency? Is there a way for me to compare the two? Are there tools / crates I could take a look at?
I would prefer the first version. There's no obvious reason why you need a separate runtime, so I wouldn't do that.
Some comments:
You would generally use the current_thread scheduler for stuff like the second version. Otherwise you are spawning quite a few threads.
You don't need the clokwerk crate. Tokio's time module has things you could use to do the same thing.
The fact that clokwerk requires you to loop with a sleep like that actually means that using Tokio directly would be more efficient since you don't wake every 100 ms if there's nothing to do.