[WASM/Rust] Multithreading / stack size control

I've started to research Rust/WASM by looking at the rustwasm book, which is a really nice introduction in how to get started. The reason for this research is job-related, we have a project that needs a large stack in order to operate properly. In Rust this means spawning an extra thread since stack size control of the main thread is not allowed at this time.

So naturally I then decided try and use thread::Builder::spawn(), only to find out at runtime that merely calling it is enough to induce a panic. Which is a bummer, but not completely unexpected. If it had worked it would have been very nice, since this is the API I currently use in order to control stack size.

I then also had a look at JavaScript, and web workers seem like a good fit, until the stack size requirement is taken into consideration. This is because web workers seem to offer no API with which to control the stack size of the threads they run on.

So what I'd like to know is: if multithreading and stack size control were implemented, in which project would that be? And what would be needed for such support?
If I can get a clear picture maybe I could do the work.

You might be better off re-writing your algorithm to use an explicitly pre-allocated Arena/Stack and not using recursion. My guess (without knowing more about the algorithm or what you are actually doing) is that this would be quite optimal and would not require a massive investment in the WASM ecosystem to get configurable stack sizes into the implementations.

If that doesn't sound reasonable, then, you'd probably have to work at the level of the WASM specification and WASM implementations across ALL BROWSERS to get "Stack Size Control" into the spec and implementations before Rust could ever leverage that. I Think.

There is no multithreading in wasm. Eventually it will get added.
If your doing a node.js project neon might (never tried) be an option instead of wasm.
In browser any large computation that can't be slit up should be done in a worker.
As @gbutler69 mentions, replacing recursive function isn't particular hard, just first time can daunting.

I'm aware that some (mainly tail-recursive) recursive fns can be rewritten to use iteration instead.
However, I'm not aware of any such possibility for multiple mutually recursive fns, and that is the case with my code. It's elegant in principle, but the downside is the requirement of a large stack. If you know of any way to do that then I'm all ears :slight_smile:

That said, on Linux, MacOS and even on Windows this really isn't a problem as std::thread just works there, and so my code can do its thing in its own thread. It's only now that I have to look at porting to WASM that the stack requirement is turning into a bit of an issue.

@jonh Currently the project does run using Neon. The problem with that is that I can't really run an NPM module with native code in it (neon generates an index.node for NodeJS) in the browser. This is something we would like to be able to do because it would allow us to get rid of a couple of thousand HTTP requests, each of which would request very specific information in order to build a big picture. We're acutely aware that a couple thousand HTTP requests is no way to achieve performance, hence our wish to port to WASM.

No, I'm not saying take a tail-recursive algorithm and re-write as iteration, I'm saying take an algorithm that uses the implicit stack by making recursive function calls and turn it into and algorithm that uses an explicit stack allocated on the heap (as a vector preferably). Any algorithm that you can write as recursive function calls you can re-write using an explicit stack instead.

I'm pretty sure this is going to be your only option with WASM currently (not a Rust issue, but, more of a WASM issue and where the WASM spec and browser support is at).

1 Like

right—the stack, and its depth, are outside control of the WebAssembly code
it is not part of the raw memory area
so this can't be changed from the rust side

It would be good to read up on that, do you have any resources that you can point me to?

Off the top of my head, I can't think of anything simple, but, there are definitely books on this sort of thing. That being said, a quick Google search turned up this Blog post:

Now, the important point is this: Every time you would make a recursive call, instead push a value or values or set of values onto a stack instead and loop to do the next thing. Every time you would return from a recursive call, pop something off the stack and do something with it. If you empty the stack, you are done.

So, instead of:

  • Call function
  • function does some computation and possibly calls itself (directly or indirectly)
  • function returns something up the stack
  • if a function calls itself directly/indirectly, when that returns, it completes and intermediate result and then returns that up the stack (generally)
  • Eventually when the function returns it is back to the original caller and you have a result

With:

  • Create one or more explicit "Stacks" on the Heap of appropriate size for computation
  • Start a loop
  • peform preliminary computation and push onto stack(s)
  • loop, perform more computation, perhaps pulling intermediate values from the stack(s)
  • perform final calculations using intermediate values
  • when you pop the last thing from the stack, the computation is complete

Simple as that. Of course, the specifics will vary depending on what you are trying to accomplish.

3 Likes

I figured it was time for a little update.

I've mostly converted the code (modulo some bugs).
Some observations:

  • The designs between old and new code bases, unsurprisingly, mirror each other somewhat
  • It's a lot more work and error-prone to manually do stack manipulations (read: get them just right) compared to outsourcing it to the call stack. This reminds me of manual memory allocation vs Rust's ownership/RAII-based approach, in the sense that going from automatic to manual is not an improvement. I guess I'll call it a penance for abandoning the Altar of Recursion.
  • Great Scott, I don't think I've written code this ugly in quite a while. This will undoubtedly improve over time, as the code is technically still an advanced-stage prototype. And I suppose some of that it inherently linked to the imperative programming paradigm, and thus can't be helped. But yeesh.
  • Interestingly, holding (almost) all state in a single stack, and then deriving Debug impls from it and interior structs, improves introspection and thus debugability. Not that it wasn't possible before, mind you; Before, it was simply a bit more code to get to the state.
2 Likes

How does it seem to perform on WASM? Inquiring minds want to know?

BTW: Great work and thanks for the update. I look forward to hearing about your final solution.

I don't have perf numbers yet for 2 reasons:

  1. I don't have the WASM integration in place just yet. But that's next on my todo-list once the biggest bugs are resolved i.e. once I can verify that the test results are correct irrespective of whether they run natively or on WASM.
  2. The new code is still quite young and thus unoptimized.

I'll post an update when I have more data.

1 Like

It's time for an update!

The code still isn't in production or anything, but I have managed* to port the code base to WASM.
I've also taken care to ensure the code can still target native, which is useful for comparative performance measurement.

Some observations in no particular order:

  • Based on some anecdotal evidence, the performance compares roughly like this:

    native Firefox 63.0 Chromium 69.0.3497.100
    ±0.350 milliseconds ±9.0 milliseconds ±2.0-3.0 milliseconds

    This is just 1 example of course, but when run again and again manually I keep getting roughly these numbers, with some variance. It is striking that native version is faster than either WASM version by roughly an order of magnitude, but it remains to be seen if that holds up for other inputs.

  • The previous point is also a bit of a wakeup call to Firefox IMO: more WASM optimizations would be very welcome, and the competition isn't standing still. I know the Firefox devs are far from lazy and probably occupied with other awesome things like project Quantum, but the difference between Chromium and Firefox on regarding WASM speed is certainly measurable.

  • The tooling, especially wasm-pack is pretty awesome. With the docs in hand it's easy to set up, and it takes care of all the nitty gritty, which is great because I don't want to do it :slight_smile:

*I had to take a few detours, like writing a few patches to enable the chrono crate to target WASM

2 Likes