Can either wasmer or wasmtime handle 1M concurrent wasm modules switching every 1 microsecond?

Suppose we have wasm modules with very low memory requirements, say 16 KB each. We can run 1M of these in only 16 GB.

Does either wasmer or wasmtime support switching between 1M wasm modules, with each one running for 1 microsecond at a time?
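A back-of-the-envelope check of those numbers. This sketch assumes "1M" means 2^20 modules, 16 KiB of linear memory per module, and a plain round-robin in which every module gets one 1 µs slice in turn; the scheduling model is an assumption for illustration, not something either engine provides.

```rust
/// Total linear-memory footprint of `modules` instances of `bytes_per_module` each.
fn total_memory_bytes(modules: u64, bytes_per_module: u64) -> u64 {
    modules * bytes_per_module
}

/// Microseconds for one round-robin pass in which each module runs for `slice_us`.
fn round_robin_period_us(modules: u64, slice_us: u64) -> u64 {
    modules * slice_us
}

fn main() {
    let modules: u64 = 1 << 20; // "1M" taken as 2^20 = 1,048,576
    let per_module: u64 = 16 * 1024; // 16 KiB per module
    let total = total_memory_bytes(modules, per_module);
    println!("total: {} GiB", total >> 30); // prints "total: 16 GiB"
    println!("one pass: {} us", round_robin_period_us(modules, 1)); // 1048576 us
}
```

Under these assumptions a full pass takes about 1.05 s, so each module would get its 1 µs slice roughly once per second.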

Thanks!

Wasm engines often place multi-gigabyte unmapped guard regions around each linear memory to reduce bounds-checking overhead. This quickly exhausts the mappable address space on x86_64. If wasmer and wasmtime take this approach, then it's not feasible.

Is this related to the idea that "we don't have to do bounds checks on untrusted (remote-execution) code", because an out-of-bounds access hits one of these guard regions and triggers an OS fault handler?

Yes

I want to run some numbers. To the best of my current knowledge:

  1. x86_64 only uses 48-bit pointers (virtual addresses)

  2. wasm heap + guard buffer comes out to (at least) 4 GB = 2^32 bytes

  3. 2^48 / 2^32 = 2^16 = 65536

  4. So, even if nothing else is running, we are limited to 65,536 wasm runtimes per x86_64 machine?
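The arithmetic in the list, written out. This sketch assumes every instance reserves exactly 2^32 bytes of address space and nothing else uses it; any guard region on top of the 4 GiB only lowers the result. The 47-bit case uses the user-space figure from the reply below.

```rust
/// How many fixed-size reservations fit in a virtual address space.
/// `va_bits`: usable virtual-address bits.
/// `reservation_bits`: log2 of the per-instance reservation (32 for 4 GiB).
fn max_instances(va_bits: u32, reservation_bits: u32) -> u64 {
    1u64 << (va_bits - reservation_bits)
}

fn main() {
    // 48-bit x86_64 virtual addresses, as in the list above.
    println!("48-bit: {}", max_instances(48, 32)); // 65536
    // 47 bits available to user space on Linux.
    println!("47-bit user space: {}", max_instances(47, 32)); // 32768
}
```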

You can customize the size of the guard page in both of them. Of course, smaller guard pages mean fewer bounds checks are elided, and therefore worse performance.
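A simplified model of why the guard size matters. It assumes a 32-bit wasm index, a contiguous reservation of `reservation` bytes (accessible up to the current heap length, inaccessible beyond it) followed by `guard` inaccessible bytes, and an access of `size` bytes at a static `offset`. The function name and parameters are illustrative, not either engine's API, and real engines apply their own rules.

```rust
const FOUR_GIB: u64 = 1 << 32;

/// Conservative model: may the explicit bounds check be dropped?
/// The worst-case index is 2^32 - 1, so the exclusive end of the access is at
/// most 2^32 - 1 + offset + size. Requiring 2^32 + offset + size to fit in
/// reservation + guard keeps it inside with one byte to spare, so any
/// out-of-bounds access faults instead of reaching unrelated memory.
fn can_elide_bounds_check(reservation: u64, guard: u64, offset: u64, size: u64) -> bool {
    FOUR_GIB + offset + size <= reservation + guard
}

fn main() {
    let kib: u64 = 1024;
    // 4 GiB reservation + 2 GiB guard (sizes picked for illustration).
    println!("{}", can_elide_bounds_check(FOUR_GIB, 2 << 30, 16, 4)); // true
    // 4 GiB reservation + 64 KiB guard: small static offsets are still covered...
    println!("{}", can_elide_bounds_check(FOUR_GIB, 64 * kib, 16, 4)); // true
    // ...but a 128 KiB static offset is not.
    println!("{}", can_elide_bounds_check(FOUR_GIB, 64 * kib, 128 * kib, 4)); // false
    // Only 16 KiB reserved (dynamic-style) + 64 KiB guard: never elided.
    println!("{}", can_elide_bounds_check(16 * kib, 64 * kib, 0, 4)); // false
}
```

In this model the guard only helps once the full 4 GiB index range is already reserved; with a small reservation, no guard of practical size can rule out a large index.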

Only 47 bits are available for user space; see "Memory Management" in the Linux kernel documentation.

Wasmtime supports both static memory, where 4 GB plus the guard pages' worth of address space is reserved for every module, and dynamic memory, where only the part actually in use is allocated. Dynamic memory comes at the cost of bounds checks everywhere, with a significant slowdown (roughly 1.5-2x, I believe) in many cases. You can use dynamic memory by calling config.static_memory_maximum_size(0).

  1. This still sounds faster than Erlang/BEAM or V8 isolates/JS. 🙂
  2. Do you know how much of an effect a 64 KB guard page would have? 1M of them is only 64 GB, and it might eliminate the bounds checks for many of the most common array indexing operations.

The bounds checks are for indexing the linear memory as a whole, and most of the time it is not statically known that a pointer is within the first 64 KB of linear memory. With a 64 KB guard it may be possible to coalesce the bounds checks for p and for p + 65535 into one, but Cranelift currently doesn't support this, so I don't know how much it would save. Maybe ask at https://bytecodealliance.zulipchat.com/#narrow/stream/217117-cranelift?
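A toy sketch of that coalescing idea, assuming the heap is followed by a `guard`-byte region in which every access traps. One explicit check that the base pointer `p` is at most the heap length then covers every later one-byte load at `p + off` with `off < guard`, because each such load lands either in the heap or in the guard. `Memory`, `Access`, and `load_fields` are hypothetical names; this is not how Cranelift works today.

```rust
/// Outcome of one simulated one-byte load.
#[derive(Debug, PartialEq)]
enum Access {
    Loaded(u8),
    Trapped,
}

/// Toy linear memory: `heap` holds the accessible bytes and is followed by
/// `guard` bytes in which every access faults.
struct Memory {
    heap: Vec<u8>,
    guard: usize,
}

impl Memory {
    /// Unchecked load standing in for the hardware: heap bytes are readable,
    /// guard bytes trap, and anything past the guard would be a sandbox escape.
    fn raw_load(&self, addr: usize) -> Access {
        if addr < self.heap.len() {
            Access::Loaded(self.heap[addr])
        } else if addr < self.heap.len() + self.guard {
            Access::Trapped
        } else {
            panic!("access at {} escaped the reservation", addr);
        }
    }
}

/// One explicit check on `p`, then unchecked loads at `p + off`.
/// Returns `None` when that single check fails (a real engine would trap).
fn load_fields(mem: &Memory, p: usize, offsets: &[usize]) -> Option<Vec<Access>> {
    if p > mem.heap.len() {
        return None;
    }
    Some(
        offsets
            .iter()
            .map(|&off| {
                assert!(off < mem.guard, "offset not covered by the guard");
                mem.raw_load(p + off)
            })
            .collect(),
    )
}

fn main() {
    let mem = Memory { heap: vec![7; 16], guard: 8 };
    println!("{:?}", load_fields(&mem, 4, &[0, 4])); // Some([Loaded(7), Loaded(7)])
    println!("{:?}", load_fields(&mem, 14, &[0, 4])); // Some([Loaded(7), Trapped])
    println!("{:?}", load_fields(&mem, 17, &[0])); // None
}
```

The `panic!` branch is never reached for offsets that pass the assertion, which is the property that would let an engine drop the per-access checks.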

I think I misunderstood bounds checks. Suppose we compile this Rust code to wasm32:

```rust
use std::rc::Rc;

pub struct Foo {
    a: i32,
    b: i32,
}

pub fn main() {
    let t = Rc::new(Foo { a: 1, b: 2 });
    let a = t.a; // <-- could this result in a bounds check, since it indexes into linear memory?
    println!("{} {}", a, t.b);
}
```

All wasm loads and stores to linear memory are bounds checked (explicitly, or implicitly via the guard pages). Both the heap and the stack live in linear memory, except when rustc optimizes values into wasm locals.
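A toy model of what reading `t.a` and `t.b` amounts to under dynamic memory: linear memory is a byte vector, and every `i32` load performs an explicit check before reading. The address `p`, the field offsets 0 and 4 (which assume a C-like layout; Rust's default layout does not guarantee field order), and the `Trap` type are assumptions for illustration. A real engine emits this check as inline machine code rather than calling a function.

```rust
#[derive(Debug, PartialEq)]
struct Trap;

/// Models a wasm `i32.load offset=off` on a memory without guard regions:
/// the effective address is computed in 64 bits so it cannot wrap, then
/// checked against the current memory length before the read.
fn i32_load(mem: &[u8], addr: u32, offset: u32) -> Result<i32, Trap> {
    let ea = addr as u64 + offset as u64;
    if ea + 4 > mem.len() as u64 {
        return Err(Trap); // out of bounds: wasm semantics require a trap
    }
    let s = ea as usize;
    let bytes = [mem[s], mem[s + 1], mem[s + 2], mem[s + 3]];
    Ok(i32::from_le_bytes(bytes)) // wasm memory is little-endian
}

fn main() {
    let mut mem = vec![0u8; 64 * 1024]; // one 64 KiB wasm page
    let p: u32 = 1024; // hypothetical address of the Foo value inside the Rc allocation
    let base = p as usize;
    mem[base..base + 4].copy_from_slice(&1i32.to_le_bytes()); // field a at offset 0
    mem[base + 4..base + 8].copy_from_slice(&2i32.to_le_bytes()); // field b at offset 4
    println!("{:?} {:?}", i32_load(&mem, p, 0), i32_load(&mem, p, 4)); // Ok(1) Ok(2)
    println!("{:?}", i32_load(&mem, 64 * 1024 - 2, 0)); // Err(Trap)
}
```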
