Since I couldn't find a simple yet comprehensive statistical library, I created a small one based on the Python standard statistics library. I hope you'll try it out and give me some feedback. Thanks all
https://github.com/semmyenator/statsrust
https://deepwiki.com/semmyenator/statsrust
https://crates.io/search?q=statsrust
How does it compare with statrs?
If you're familiar with Python's statistical libraries, statsrust is a tool worth checking out. It provides Rust developers with a lightweight statistical toolset that's high-performance and better suited for focused scenarios within the Rust ecosystem. It's not intended to replace the full functionality of larger toolkits like statrs in Rust or NumPy or SciPy in Python.
- Getting an enum variant from a string looks very strange; it's better to use the enum directly.
- Instead of `Box<dyn Fn(f64) -> f64>`, you can use function pointers, since your closures don't capture anything and can freely coerce to function pointers:
    type KernelFn = fn(f64) -> f64;

    impl Kernel {
        /// Returns the kernel function
        fn kernel(&self) -> KernelFn {
            match self {
                Kernel::Normal => |t| (-(t * t) / 2.0).exp() / (2.0 * std::f64::consts::PI).sqrt(),
                ...
            }
        }
    }
- Using `ndarray` as a dependency solely for primitive operations on 1D vectors is suboptimal; it would be better to move the necessary logic to a separate module for working with `Vec<f64>`.
- Using `statrs` as a dependency for just a couple of functions is also expensive.
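To make the first two suggestions concrete, here is a minimal sketch of a kernel enum that hands out plain function pointers and is passed directly to a caller. Only `Kernel::Normal` and `KernelFn` come from the thread; the `Rectangular` and `Triangular` variants and the `kde_at` helper are hypothetical names added for illustration and are not taken from the statsrust source.

```rust
use std::f64::consts::PI;

/// Plain function pointer: no heap allocation, `Copy`, and `Send + Sync`.
type KernelFn = fn(f64) -> f64;

/// `Normal` appears in the thread; the other two variants are
/// hypothetical additions for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kernel {
    Normal,
    Rectangular,
    Triangular,
}

impl Kernel {
    /// Returns the kernel function. The closures capture nothing,
    /// so each one coerces to `fn(f64) -> f64`.
    fn kernel(&self) -> KernelFn {
        match self {
            Kernel::Normal => |t: f64| (-(t * t) / 2.0).exp() / (2.0 * PI).sqrt(),
            Kernel::Rectangular => |t: f64| {
                if t.abs() <= 1.0 { 0.5 } else { 0.0 }
            },
            Kernel::Triangular => |t: f64| (1.0 - t.abs()).max(0.0),
        }
    }
}

/// Hypothetical density estimate at `x` with bandwidth `h`. The kernel is
/// passed as an enum value, so there is no string to parse or reject.
fn kde_at(data: &[f64], h: f64, kernel: Kernel, x: f64) -> f64 {
    let k = kernel.kernel();
    let sum: f64 = data.iter().map(|&xi| k((x - xi) / h)).sum();
    sum / (data.len() as f64 * h)
}
```

A call such as `kde_at(&[1.0, 2.0, 2.5, 4.0], 1.0, Kernel::Normal, 2.0)` then fails at compile time if the kernel name is misspelled, rather than at run time.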
Response to StatsRust Improvement Feedback
Thanks for the actionable suggestions—they directly drove key optimizations: tighter Rust idioms, better performance, and reduced bloat. Here’s how we addressed each point:
- Replaced string-to-enum conversion with direct enum usage
  Removed `Kernel::from_name()`. Functions now take `Kernel` enums (e.g., `Kernel::Normal`) directly.
  → Why: Eliminates runtime string parsing, invalid input risks, and "magic string" ambiguity (e.g., `gauss` vs `normal`).
- Switched `Box<dyn Fn>` to function pointers
  Defined `type KernelFn = fn(f64) -> f64;` and updated all kernel methods to return it.
  → Why: Zero-cost abstraction: no heap allocation, faster calls, and inherent `Send + Sync`.
- Dropped `ndarray` for `Vec<f64>` logic
  Replaced all 1D array ops (mean, variance) with direct slice-based implementations.
  → Why: Avoids overkill dependency for trivial vector math; faster compilation, smaller binary.
- Removed `statrs` via targeted manual implementations
  Added minimal internal logic: Abramowitz-Stegun `erf`, custom `NormalDist` (PDF/CDF/sampling).
  → Why: Cuts heavy dependency for 2-3 niche functions; full control over numerical stability.
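As a rough illustration of the slice-based replacements for the 1D array operations, the sketch below computes a mean and a sample variance over `&[f64]`. The function names, the `Option` return type, and the choice of the n - 1 denominator (matching Python's `statistics.variance`) are assumptions made for this example, not the actual statsrust API.

```rust
/// Arithmetic mean of a slice; `None` for empty input.
fn mean(data: &[f64]) -> Option<f64> {
    if data.is_empty() {
        return None;
    }
    Some(data.iter().sum::<f64>() / data.len() as f64)
}

/// Sample variance (divides by n - 1); `None` for fewer than two points.
/// Assumption: two-pass algorithm, chosen here for clarity.
fn variance(data: &[f64]) -> Option<f64> {
    if data.len() < 2 {
        return None;
    }
    let m = mean(data)?;
    let sum_sq: f64 = data.iter().map(|&x| (x - m) * (x - m)).sum();
    Some(sum_sq / (data.len() as f64 - 1.0))
}
```

Because the functions take `&[f64]`, they accept a `Vec<f64>`, an array, or any sub-slice without copying.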
Tradeoff Clarification
Your point about "relying on mature libraries for stability" vs. "hand-rolled implementations" is spot-on. We prioritized:
- Control & minimalism over broad-case robustness (e.g., our `erf` targets typical input ranges, not edge cases `statrs` handles).
- Performance/scope fit over general-purpose safety (e.g., skipping `ndarray`’s multi-D checks for 1D-only needs).
- Dependency hygiene over "free" maintenance (no upstream breakage risks, but we own all logic now).
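The `erf` tradeoff can be made tangible with a sketch of one common Abramowitz-Stegun approximation, formula 7.1.26. The thread does not say which Abramowitz-Stegun formula statsrust uses, so this choice, along with the `erf` and `normal_cdf` signatures, is an assumption for illustration. The formula's documented maximum absolute error is about 1.5e-7, which is adequate for many uses but well short of full `f64` precision, especially as a relative error far out in the tails.

```rust
use std::f64::consts::SQRT_2;

/// Approximates erf(x) with Abramowitz-Stegun formula 7.1.26
/// (maximum absolute error about 1.5e-7).
fn erf(x: f64) -> f64 {
    const P: f64 = 0.3275911;
    const A: [f64; 5] = [
        0.254829592,
        -0.284496736,
        1.421413741,
        -1.453152027,
        1.061405429,
    ];
    // The formula is stated for x >= 0; erf is odd, so mirror negatives.
    let sign = if x < 0.0 { -1.0 } else { 1.0 };
    let x = x.abs();
    let t = 1.0 / (1.0 + P * x);
    // Horner evaluation of a1*t + a2*t^2 + ... + a5*t^5.
    let poly = A.iter().rev().fold(0.0, |acc, &a| (acc + a) * t);
    sign * (1.0 - poly * (-x * x).exp())
}

/// Normal CDF built on the approximate `erf` above (hypothetical helper).
fn normal_cdf(x: f64, mu: f64, sigma: f64) -> f64 {
    0.5 * (1.0 + erf((x - mu) / (sigma * SQRT_2)))
}
```

Code that needs tail probabilities or higher precision is the kind of edge case where a mature implementation would likely still be the safer choice.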
Final Decision
Given this focused scope, we’re hosting this leaner version on GitHub only (not publishing to crates.io). It’s optimized for specific use cases—not a general-purpose replacement.
leaner version of statsrust