I'm basically asking for a code review on a relatively complex project that uses non-trivial mathematics and numeric code. The short version is that I'm not getting exactly the same numbers as the reference version, and I don't really understand why: coding errors? differences in the numerical solver? something else?
The end goal is for me to be able to use this as part of an Elixir plotting library inspired by Matplotlib.
The main difference between the two versions happens in the discrete fourier transform calculations because of differences in normalization used by the Python and Rust implementations.
I am using the reference file from that project to compare the output of the KDE from my Rust implementation to the original Python. As part of my testing suite, I have plotted the two distributions (you can open the file as HTML in the test suite of the Rust GitHub project):
I have managed to track the difference in output to the most "numerically complex" part of the code, which is the part where the code tries to find the root of the "fixed point function" using the Brent method. I am using the Brent method from the Rust argmin-rs crate, while the Python implementation used an implementation from SciPy. The root of the function is t⋆ (or t_star in the code), and as I've shown in the plot, I get a value which is a bit more than double the one one gets from the Python code.
Is this kind of difference something that should be expected when using different implementations of the Brent method? Probably not, because the solver claims a much tighter tolerance than that, and the numbers are not that small (I wouldn't expect them to be subject to floating-point errors). I would appreciate any help from someone who is well-versed in numerical Rust code.
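For context on the tolerances involved: SciPy's `brentq` (which the Python reference uses) converges to near machine precision by default, so two correct Brent implementations bracketing the same root should agree to far better than a factor of two. A minimal sketch — the function and bracket below are hypothetical stand-ins with a known root, not the actual fixed-point function:

```python
from scipy.optimize import brentq

# Hypothetical stand-in with a known root at sqrt(2); the real code
# solves the KDE fixed-point equation instead.
f = lambda t: t * t - 2.0

# brentq's default xtol is 2e-12 and its default rtol is on the order
# of machine epsilon, so a factor-of-two disagreement between two
# correct Brent implementations would be very surprising.
root = brentq(f, 0.0, 2.0)
assert abs(root - 2.0 ** 0.5) < 1e-10
```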
The numpy code uses the minimal and maximal values from the array. You are just taking the first and last elements. (edit: After sorting. Thanks @steffahn for the correction!)
You are multiplying by f in the ZetaGamma cost function. The Python divides:
There may be other transcription errors. These are the ones I noticed. The code was a bit hard to compare with the Python. You have K and C swapped in your ZetaGamma cost function, and the caching code is incomplete and unused. The field names for your histogram don't match the names used by numpy. All of these things added to my confusion while trying to find any real differences. I may have missed some.
Thanks for taking the care to review my code, but:
This shouldn't be the cause of the discrepancy
The array is sorted in ascending order in the first line of this function before I call x.first() and x.last(), so x.first() should be the minimum and x.last() should be the maximum. I have added tests that confirm this:
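The property those tests rely on can be sketched like this (in Python/NumPy for brevity rather than Rust; this mirrors what the tests assert, it is not the actual test code):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)

# After an ascending sort, the first element is the minimum and the
# last element is the maximum, matching numpy's min()/max().
x_sorted = np.sort(x)
assert x_sorted[0] == x.min()
assert x_sorted[-1] == x.max()
```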
This also shouldn't be the cause of the discrepancy
The expressions are actually equivalent (up to floating-point errors). Two sequential divisions are the same as dividing by the product. For example: a / b / c == a / (b * c). In the case of the relevant expression:
# Original code (... / N / F)
t = (2*C*K/N/f)**(2/(3+2*s))
# Equivalent code
t = (2*C*K/(N*f))**(2/(3+2*s))
Additionally, just to confirm the discrepancy is not due to floating point errors, I have tried to run the code with the two divisions instead of dividing by the product and the results are exactly the same.
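The equivalence is easy to check numerically; the values below are arbitrary placeholders, not quantities taken from the actual KDE code:

```python
# Arbitrary placeholder values, not the real N, f, C, K, s from the KDE
N, f, C, K, s = 100.0, 0.5, 1.0, 2.0, 3

t1 = (2 * C * K / N / f) ** (2 / (3 + 2 * s))    # sequential divisions
t2 = (2 * C * K / (N * f)) ** (2 / (3 + 2 * s))  # divide by the product

# The two forms agree to within floating-point rounding error
assert abs(t1 - t2) <= 1e-15 * abs(t1)
```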
In any case, thanks for reviewing my code.
I have corrected this, so that now the K and C values are the same.
In any case, thanks for reviewing the code.
PS: the discrepancy here is not related to the histogram counts and I'm pretty sure it's related to the part of the code that solves the equation using the Brent method because if I artificially choose the same solution used by the Python code I get the same density. So all differences seem like they come from different solutions (unless there's something else I'm not seeing)
Don’t ask me why this difference exists, I have no idea! But the transform indeed somehow seems to differ – exactly by a factor of two! It looks like countering this change already fixes everything?
diff --git a/src/lib.rs b/src/lib.rs
index 0e3fbd4..13b4021 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -214,7 +214,7 @@ pub fn kde_1d(x: Vec<f64>, suggested_grid_size: usize, limits: (Option<f64>, Opt
let dct = planner.plan_dct2(grid_size);
dct.process_dct2(&mut transformed);
// TODO: explain why we have to adjust first component
- transformed[0] /= 2.0;
+ transformed.iter_mut().skip(1).for_each(|x| *x *= 2.0);
// Convert the transformed vector into an array to facilitate numerical operations
let transformed: Array1<f64> = transformed.into();
@@ -237,7 +237,7 @@ pub fn kde_1d(x: Vec<f64>, suggested_grid_size: usize, limits: (Option<f64>, Opt
let mut smoothed: Array1<f64> = transformed * (- 0.5 * PI.powi(2) * t_star * &k2).exp();
// Reverse transformation after adjusting first component
- smoothed[0] *= 2.0;
+ smoothed.iter_mut().skip(1).for_each(|x| *x /= 2.0);
let mut inverse = smoothed.to_vec();
let idct = planner.plan_dct3(grid_size);
idct.process_dct3(&mut inverse);
Which isn’t too surprising IMO, since this is a linear transformation, so the scaling is a bit arbitrary, anyway…
In fact it looks like within the python code, there’s actually a division by 2 again in a subsequent step… we are cancelling out preprocessing steps with preprocessing steps here…
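The factor of two can be seen directly with SciPy's unnormalized DCT-II (this is an illustrative sketch, not the actual KDE code): halving only the first coefficient versus doubling all the other coefficients produces arrays that differ by an overall factor of two. A purely linear pipeline would absorb that factor, but it plausibly matters for anything computed nonlinearly from the coefficients, such as the t_star root.

```python
import numpy as np
from scipy.fft import dct

x = np.array([1.0, 2.0, 3.0, 4.0])
y = dct(x, type=2)  # unnormalized DCT-II: y[0] == 2 * x.sum()

# Variant A (original Rust code): halve only the first coefficient
a = y.copy()
a[0] /= 2.0

# Variant B (the patch above): double every coefficient except the first
b = y.copy()
b[1:] *= 2.0

# The two variants differ by a global factor of two
assert np.allclose(2.0 * a, b)
```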
For those who might be interested, I've built a demo website for the package, which shows visually how well the diffusion-based algorithm performs on some weird densities. The densities were taken from the paper that first published the algorithm. The outputs are, of course, indistinguishable from those of the Python implementation.