Statsrust: A comprehensive Rust library for statistical analysis, providing a wide range of descriptive statistics, probability distributions, and non-parametric methods

Since I couldn't find a simple yet comprehensive statistical library, I created a small one modeled on the statistics module in Python's standard library. I hope you'll try it out and give me some feedback. Thanks, all.
https://github.com/semmyenator/statsrust
https://deepwiki.com/semmyenator/statsrust
https://crates.io/search?q=statsrust


How does it compare with statrs?


If you're familiar with Python's statistics module, statsrust is worth checking out. It gives Rust developers a lightweight statistical toolset aimed at focused scenarios, with an API modeled on the Python one. It's not intended to replace the full functionality of larger toolkits like statrs in Rust or NumPy and SciPy in Python.

  1. Getting an enum variant from a string looks very strange; it's better to use the enum directly.

  2. Instead of Box<dyn Fn(f64) -> f64>, you can use function pointers since your closures don't capture anything and can freely coerce to function pointers:

type KernelFn = fn(f64) -> f64;

impl Kernel {
    /// Returns the kernel function
    fn kernel(&self) -> KernelFn {
        match self {
            Kernel::Normal => |t| (-(t * t) / 2.0).exp() / (2.0 * std::f64::consts::PI).sqrt(),
            ...
        }
    }
}
  3. Using ndarray as a dependency solely for primitive operations on 1D vectors is suboptimal; it would be better to move the necessary logic to a separate module for working with Vec<f64>.
  4. Using statrs as a dependency for just a couple of functions is also expensive.
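
To illustrate the ndarray point above, here is a minimal sketch of what a dependency-free module over slices could look like. The function names and the Option-returning signatures are assumptions made for this example, not statsrust's actual API; the variance uses the n - 1 denominator, as Python's statistics.variance does.

```rust
/// Arithmetic mean of a slice; `None` for empty input.
pub fn mean(data: &[f64]) -> Option<f64> {
    if data.is_empty() {
        return None;
    }
    Some(data.iter().sum::<f64>() / data.len() as f64)
}

/// Sample variance (n - 1 denominator); `None` for fewer than two values.
pub fn variance(data: &[f64]) -> Option<f64> {
    if data.len() < 2 {
        return None;
    }
    let m = mean(data)?;
    // Sum of squared deviations from the mean.
    let ss: f64 = data.iter().map(|x| (x - m).powi(2)).sum();
    Some(ss / (data.len() - 1) as f64)
}
```

Taking &[f64] rather than Vec<f64> means callers can pass a Vec, an array, or a sub-slice without copying.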

Response to StatsRust Improvement Feedback
Thanks for the actionable suggestions. They led directly to the changes below: more idiomatic Rust, fewer allocations, and fewer dependencies. Here's how we addressed each point:

  1. Replaced string-to-enum conversion with direct enum usage
    Removed Kernel::from_name(). Functions now take Kernel enums (e.g., Kernel::Normal) directly.
    → Why: Eliminates runtime string parsing, invalid input risks, and "magic string" ambiguity (e.g., gauss vs normal).

  2. Switched Box<dyn Fn> to function pointers
    Defined type KernelFn = fn(f64) -> f64; and updated all kernel methods to return it.
    → Why: No heap allocation and no trait-object vtable; fn pointers are also Copy, Send, and Sync.

  3. Dropped ndarray for Vec<f64> logic
    Replaced all 1D array ops (mean, variance) with direct slice-based implementations.
    → Why: Avoids a heavyweight dependency for simple vector math; compiles faster and yields a smaller binary.

  4. Removed statrs via targeted manual implementations
    Added minimal internal logic: an Abramowitz-Stegun approximation of erf and a custom NormalDist (PDF/CDF/sampling).
    → Why: Drops a heavy dependency that was used for only 2-3 functions; full control over numerical stability.
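
Points 1 and 2 can be sketched together as follows. This is an illustrative sketch, not the code in the repository: only Kernel::Normal and KernelFn are named in this thread, so the Rectangular and Triangular variants and the kde_at helper are hypothetical names added for the example.

```rust
use std::f64::consts::PI;

/// Plain function pointer: no Box, no trait object.
pub type KernelFn = fn(f64) -> f64;

/// Hypothetical kernel set; only `Normal` is named in the thread.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Kernel {
    Normal,
    Rectangular,
    Triangular,
}

impl Kernel {
    /// Returns the kernel function. The closures capture nothing,
    /// so each one coerces to a `fn(f64) -> f64` pointer.
    pub fn kernel(self) -> KernelFn {
        match self {
            Kernel::Normal => |t: f64| (-(t * t) / 2.0).exp() / (2.0 * PI).sqrt(),
            Kernel::Rectangular => |t: f64| if t.abs() <= 1.0 { 0.5 } else { 0.0 },
            Kernel::Triangular => |t: f64| (1.0 - t.abs()).max(0.0),
        }
    }
}

/// Kernel density estimate at `x` with bandwidth `h`.
/// Assumes `data` is non-empty and `h > 0` (not checked here).
pub fn kde_at(data: &[f64], h: f64, kernel: Kernel, x: f64) -> f64 {
    let k = kernel.kernel();
    let sum: f64 = data.iter().map(|&xi| k((x - xi) / h)).sum();
    sum / (data.len() as f64 * h)
}
```

A call then reads kde_at(&samples, 0.5, Kernel::Normal, 1.0), and a misspelled kernel name becomes a compile error instead of a runtime failure.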

Tradeoff Clarification
Your point about "relying on mature libraries for stability" vs. "hand-rolled implementations" is spot-on. We prioritized:

  • Control & minimalism over broad-case robustness (e.g., our erf targets typical input ranges, not edge cases statrs handles).
  • Performance/scope fit over general-purpose safety (e.g., skipping ndarray’s multi-D checks for 1D-only needs).
  • Dependency hygiene over "free" maintenance (no upstream breakage risks, but we own all logic now).
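
The erf trade-off in the first bullet can be made concrete with a sketch. The code below assumes the Abramowitz-Stegun formula 7.1.26, which has a maximum absolute error of about 1.5e-7; whether statsrust uses this exact formula is an assumption, and the function names are chosen for the example only.

```rust
/// Approximate erf via Abramowitz-Stegun 7.1.26 (assumed variant).
/// Absolute error is at most about 1.5e-7 for all x.
pub fn erf(x: f64) -> f64 {
    const P: f64 = 0.3275911;
    const A: [f64; 5] = [
        0.254829592,
        -0.284496736,
        1.421413741,
        -1.453152027,
        1.061405429,
    ];
    // The formula is stated for x >= 0; erf is odd, so mirror the sign.
    let sign = if x < 0.0 { -1.0 } else { 1.0 };
    let ax = x.abs();
    let t = 1.0 / (1.0 + P * ax);
    // Horner evaluation of a1*t + a2*t^2 + ... + a5*t^5.
    let poly = A.iter().rev().fold(0.0, |acc, &a| (acc + a) * t);
    sign * (1.0 - poly * (-ax * ax).exp())
}

/// Normal CDF built on the approximation above. Assumes `sigma > 0`.
pub fn normal_cdf(x: f64, mu: f64, sigma: f64) -> f64 {
    0.5 * (1.0 + erf((x - mu) / (sigma * std::f64::consts::SQRT_2)))
}
```

An absolute error of that size is fine for typical probabilities, but in the far tails, where the true value of 1 - erf(x) is itself smaller than 1e-7, the relative error becomes large. That is the kind of edge case a mature library handles with more careful routines.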

Final Decision
Given this focused scope, we’re hosting this leaner version on GitHub only (not publishing to crates.io). It’s optimized for specific use cases—not a general-purpose replacement.
leaner version of statsrust
