Take a look at the neural network framework I wrote, implemented in Rust + CUDA.

Highlights

  • Rust-first, not Rust-only implementation
    • Rust owns the framework structure and most high-level logic.
    • CUDA C++ is used for optional GPU acceleration.
    • CPU-only builds remain available without the cuda feature.
  • Dynamic autograd built around tensor graph construction
  • Module-style abstraction for model components
  • Separated layers / ops / models for easier experimentation
  • Flexible precision system
    • parameter dtype
    • runtime dtype
    • activation dtype
    • KV-cache dtype
  • Quantization-aware loading
    • load float weights normally
    • quantize on load to i8
    • generate offline quantized safetensors
  • CPU and CUDA execution paths with explicit kernel/backend work
  • Hugging Face tokenizers integration
  • Safetensors support with memory-mapped and streamed loading modes
  • Release profile tuned with lto, panic = "abort", and strip
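The "CPU-only builds remain available without the cuda feature" point usually comes down to Cargo feature gating. The sketch below is a minimal illustration under that assumption. It is not Lumen's actual API: `Device`, `default_device` and `backend_name` are hypothetical names. It also assumes the crate declares `cuda = []` under `[features]` in Cargo.toml. Without that feature, the CUDA variant and its match arm are compiled out entirely.

```rust
// Hypothetical device enum: the CUDA variant only exists when the
// (assumed) `cuda` Cargo feature is enabled.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Device {
    Cpu,
    #[cfg(feature = "cuda")]
    Cuda(usize), // GPU ordinal
}

// With the `cuda` feature, default to the first GPU.
#[cfg(feature = "cuda")]
pub fn default_device() -> Device {
    Device::Cuda(0)
}

// CPU-only build: no CUDA code is compiled or linked at all.
#[cfg(not(feature = "cuda"))]
pub fn default_device() -> Device {
    Device::Cpu
}

// Dispatch point where a real framework would pick a kernel/backend.
pub fn backend_name(device: Device) -> &'static str {
    match device {
        Device::Cpu => "cpu",
        #[cfg(feature = "cuda")]
        Device::Cuda(_) => "cuda",
    }
}

fn main() {
    let dev = default_device();
    println!("running on {:?} via the {} backend", dev, backend_name(dev));
}
```

A plain `cargo build` then produces the CPU-only binary, and `cargo build --features cuda` opts into the GPU path.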
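The "quantize on load to i8" item does not say which scheme is used. A common choice is symmetric per-tensor absmax quantization, where scale = max|w| / 127 and q = round(w / scale). The sketch below assumes that scheme. The names `QuantizedTensor`, `quantize_i8` and `dequantize_i8` are hypothetical and not taken from Lumen, which may use per-channel scales or a different rounding rule.

```rust
// Hypothetical container: int8 values plus one f32 scale per tensor.
pub struct QuantizedTensor {
    pub data: Vec<i8>,
    pub scale: f32,
}

// Assumed scheme: symmetric per-tensor absmax quantization to [-127, 127].
pub fn quantize_i8(weights: &[f32]) -> QuantizedTensor {
    // The largest magnitude sets the scale; guard against an all-zero tensor.
    let max_abs = weights.iter().fold(0.0f32, |m, &w| m.max(w.abs()));
    let scale = if max_abs > 0.0 { max_abs / 127.0 } else { 1.0 };
    let data = weights
        .iter()
        .map(|&w| (w / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    QuantizedTensor { data, scale }
}

// Recover approximate f32 weights: w ≈ q * scale.
pub fn dequantize_i8(q: &QuantizedTensor) -> Vec<f32> {
    q.data.iter().map(|&v| v as f32 * q.scale).collect()
}

fn main() {
    let w = [0.5f32, -1.27, 0.0, 1.0];
    let q = quantize_i8(&w);
    println!("scale = {}, data = {:?}", q.scale, q.data);
    println!("dequantized = {:?}", dequantize_i8(&q));
}
```

Storing a single scale per tensor keeps the overhead negligible. Per-channel scales cost slightly more storage but usually reduce error when rows have very different ranges.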

Hi JZ,

I want to give this a try and let you know about my experiences.

Awesome idea,

Jonas / t4ce

02.05.26

Yes, I can confirm: Lumen can in fact run inference on the CPU reasonably well via Burn, for example with the TinyLlama LLM, and it does indeed provide answers. I will push another commit to my fork to add GPGPU-accelerated execution.

Feel free to contact me if you ever need to run this outside of a conventional operating system.