Async ML Inference on Apple Silicon

I've just open-sourced batch_forge, a specialized inference runner for the JAX/Equinox ecosystem. The project is written in pure Rust and uses metal-rs for custom compute kernels. Current features include a zero-copy Safetensors loader, an async request manager built on tokio, and stateful attention management. Feedback on the memory-mapped loader and on Metal buffer synchronization would be greatly appreciated.
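For anyone curious what "zero-copy" means here before diving into the repo: a safetensors file is an 8-byte little-endian header length, a JSON header of that length, then the raw tensor bytes, so the loader can hand out borrowed slices of the mapped file instead of copying tensors. A minimal std-only sketch of that slicing step (illustrative names, not batch_forge's actual API; the real loader memory-maps the file rather than using an in-memory buffer):

```rust
// Split a safetensors-formatted buffer into (JSON header, tensor data)
// without copying either region. Layout: 8-byte LE u64 header length,
// then the JSON header, then the raw tensor bytes.
fn split_safetensors(buf: &[u8]) -> Option<(&[u8], &[u8])> {
    // Read the 8-byte little-endian header length.
    let len_bytes: [u8; 8] = buf.get(..8)?.try_into().ok()?;
    let header_len = u64::from_le_bytes(len_bytes) as usize;
    // Both slices borrow from `buf`: no tensor bytes are copied.
    let header = buf.get(8..8 + header_len)?;
    let data = buf.get(8 + header_len..)?;
    Some((header, data))
}

fn main() {
    // Build a tiny fake safetensors buffer: 4-byte JSON header + 3 data bytes.
    let mut buf = Vec::new();
    buf.extend_from_slice(&4u64.to_le_bytes());
    buf.extend_from_slice(b"{  }");
    buf.extend_from_slice(&[1u8, 2, 3]);

    let (header, data) = split_safetensors(&buf).expect("valid buffer");
    println!("header = {:?}", std::str::from_utf8(header).unwrap());
    println!("data   = {:?}", data);
}
```

With the file mmapped, those borrowed slices point straight into the page cache, which is what makes large checkpoint loads cheap.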

1 Like