Since I couldn't find a simple yet comprehensive statistics library, I created a small one based on Python's standard `statistics` library. I hope you'll try it out and give me some feedback. Thanks, all!
https://github.com/semmyenator/statsrust
https://deepwiki.com/semmyenator/statsrust
https://crates.io/search?q=statsrust
How does it compare with statrs?
If you're familiar with Python's statistical libraries, statsrust is a tool worth checking out. It gives Rust developers a lightweight, high-performance statistical toolset suited to focused scenarios within the Rust ecosystem. It isn't intended to replace the full functionality of larger toolkits such as statrs in Rust, or NumPy and SciPy in Python.
- Getting an enum variant from a string looks very strange; it's better to use the enum directly (see the first sketch after this list).
- Instead of `Box<dyn Fn(f64) -> f64>`, you can use function pointers, since your closures don't capture anything and can freely coerce to function pointers:
```rust
type KernelFn = fn(f64) -> f64;

impl Kernel {
    /// Returns the kernel function
    fn kernel(&self) -> KernelFn {
        match self {
            Kernel::Normal => |t| (-(t * t) / 2.0).exp() / (2.0 * std::f64::consts::PI).sqrt(),
            // ... other kernel variants
        }
    }
}
```
- Using `ndarray` as a dependency solely for primitive operations on 1D vectors is suboptimal; it would be better to move the necessary logic to a separate module for working with `Vec<f64>` (see the second sketch after this list).
- Using `statrs` as a dependency for just a couple of functions is also expensive.
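To make the enum point concrete, here is a minimal sketch; the `kernel_weight` function and the `Triangular` variant are made up for illustration, not taken from the crate:

```rust
#[derive(Clone, Copy, Debug)]
enum Kernel {
    Normal,
    Triangular,
}

// Hypothetical API sketch: taking the enum directly means there is no
// runtime string parsing and no "unknown kernel name" failure case.
fn kernel_weight(kernel: Kernel, t: f64) -> f64 {
    match kernel {
        Kernel::Normal => (-(t * t) / 2.0).exp() / (2.0 * std::f64::consts::PI).sqrt(),
        Kernel::Triangular => (1.0 - t.abs()).max(0.0),
    }
}

fn main() {
    // The caller names the variant; typos become compile errors, not runtime ones.
    println!("{}", kernel_weight(Kernel::Normal, 0.0));      // ≈ 0.3989
    println!("{}", kernel_weight(Kernel::Triangular, 0.25)); // 0.75
}
```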
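And for the `ndarray` point, a sketch of the sort of slice helpers that could live in a small internal module; the names and signatures here are illustrative, not the crate's API:

```rust
/// Arithmetic mean of a slice; returns None for empty input.
fn mean(data: &[f64]) -> Option<f64> {
    if data.is_empty() {
        return None;
    }
    Some(data.iter().sum::<f64>() / data.len() as f64)
}

/// Sample variance (Bessel's correction, n - 1 denominator);
/// returns None when fewer than two observations are given.
fn sample_variance(data: &[f64]) -> Option<f64> {
    if data.len() < 2 {
        return None;
    }
    let m = mean(data)?;
    let ss: f64 = data.iter().map(|x| (x - m) * (x - m)).sum();
    Some(ss / (data.len() - 1) as f64)
}

fn main() {
    let xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0];
    println!("mean     = {:?}", mean(&xs));            // Some(5.0)
    println!("variance = {:?}", sample_variance(&xs)); // Some(4.5714...)
}
```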
Response to StatsRust Improvement Feedback
Thanks for the actionable suggestions—they directly drove key optimizations: tighter Rust idioms, better performance, and reduced bloat. Here’s how we addressed each point:
- **Replaced string-to-enum conversion with direct enum usage**
  Removed `Kernel::from_name()`. Functions now take `Kernel` enums (e.g., `Kernel::Normal`) directly.
  → Why: Eliminates runtime string parsing, invalid-input risks, and "magic string" ambiguity (e.g., `gauss` vs `normal`).
- **Switched `Box<dyn Fn>` to function pointers**
  Defined `type KernelFn = fn(f64) -> f64;` and updated all kernel methods to return it.
  → Why: Zero-cost abstraction; no heap allocation, faster calls, and inherent `Send + Sync` (a quick check appears after this list).
- **Dropped `ndarray` for `Vec<f64>` logic**
  Replaced all 1D array ops (mean, variance) with direct slice-based implementations.
  → Why: Avoids an overkill dependency for trivial vector math; faster compilation, smaller binary.
- **Removed `statrs` via targeted manual implementations**
  Added minimal internal logic: an Abramowitz-Stegun `erf` approximation and a custom `NormalDist` (PDF/CDF/sampling); a rough sketch follows this list.
  → Why: Cuts a heavy dependency for 2-3 niche functions; full control over numerical stability.
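As a quick check of the `Send + Sync` claim above (illustrative only, not code from the crate): plain `fn` pointers are `Copy`, `Send`, and `Sync`, so the following compiles:

```rust
type KernelFn = fn(f64) -> f64;

// fn pointers are Copy, Send, and Sync, so they can be stored and shared
// across threads without boxing; this compiles only because that holds.
fn assert_send_sync<T: Send + Sync + Copy>(_: T) {}

fn main() {
    let k: KernelFn = |t| (-(t * t) / 2.0).exp() / (2.0 * std::f64::consts::PI).sqrt();
    assert_send_sync(k);
    println!("{}", k(0.0)); // ≈ 0.3989
}
```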
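And a rough sketch of the shape of the `erf`/`NormalDist` item; this is not the actual implementation, just the standard Abramowitz-Stegun formula 7.1.26 with a normal CDF built on top of it:

```rust
/// Abramowitz-Stegun approximation 7.1.26 of the error function.
/// Maximum absolute error is about 1.5e-7, which is fine for typical
/// statistics use but not for demanding numerical work.
fn erf(x: f64) -> f64 {
    // Coefficients from Abramowitz & Stegun, formula 7.1.26.
    const A1: f64 = 0.254829592;
    const A2: f64 = -0.284496736;
    const A3: f64 = 1.421413741;
    const A4: f64 = -1.453152027;
    const A5: f64 = 1.061405429;
    const P: f64 = 0.3275911;

    // erf is odd: erf(-x) = -erf(x), so work with |x| and restore the sign.
    let sign = if x < 0.0 { -1.0 } else { 1.0 };
    let x = x.abs();

    let t = 1.0 / (1.0 + P * x);
    let poly = (((((A5 * t + A4) * t) + A3) * t + A2) * t + A1) * t;
    sign * (1.0 - poly * (-x * x).exp())
}

/// CDF of a normal distribution with the given mean and standard deviation,
/// expressed through `erf`.
fn normal_cdf(x: f64, mean: f64, std_dev: f64) -> f64 {
    0.5 * (1.0 + erf((x - mean) / (std_dev * std::f64::consts::SQRT_2)))
}

fn main() {
    println!("{:.4}", normal_cdf(0.0, 0.0, 1.0)); // 0.5000
    println!("{:.4}", normal_cdf(1.0, 0.0, 1.0)); // ≈ 0.8413
}
```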
Tradeoff Clarification
Your point about "relying on mature libraries for stability" vs. "hand-rolled implementations" is spot-on. We prioritized:
- Control & minimalism over broad-case robustness (e.g., our
erf
targets typical input ranges, not edge casesstatrs
handles). - Performance/scope fit over general-purpose safety (e.g., skipping
ndarray
’s multi-D checks for 1D-only needs). - Dependency hygiene over "free" maintenance (no upstream breakage risks, but we own all logic now).
Final Decision
Given this focused scope, we're hosting this leaner version on GitHub only (not publishing to crates.io). It's optimized for specific use cases, not a general-purpose replacement.