I am working with this clustering code library: clustering/src/lib.rs at main · xgillard/clustering · GitHub
The basic principle is you run a function like:
let mut samples: Vec<Vec<f32>> = vec![]; // i.e. [[x, y], [x, y], [x, y]]
let clustering = kmeans(k, &samples, max_iter); // clustering: Clustering<'_, Vec<f32>>
And you get back:
/// This is the result of a kmeans clustering
pub struct Clustering<'a, T> {
    /// The set of elements that have been clustered (in that order)
    pub elements: &'a [T],
    /// The membership assignment
    /// membership[i] = y means that element[i] belongs to cluster y
    pub membership: Vec<usize>,
    /// The centroids of the clusters in this given clustering
    pub centroids: Vec<Centroid>,
}

pub struct Centroid(pub Vec<f64>);
This is very nice, except when you try to save the Clustering result object into another struct, which has a Mutex on it and is wrapped as a Resource you are accessing.
The absolute "Rust torture" begins as you must now propagate all those lifetime <'a> mentions over everything that now touches this thing. And I don't know where it all stops, or how to even look at all the visual mess and errors it is creating.
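To make the propagation concrete, here is a minimal sketch of what happens once the borrowed result is stored in another struct. The `Clustering` and `Centroid` definitions are local copies of the ones quoted above so the snippet compiles without the crate; `AppState` and `store` are hypothetical names, not anything from the library, and the dummy result stands in for a real `kmeans` call.

```rust
use std::sync::Mutex;

// Local stand-ins mirroring the structs quoted above, so the sketch
// compiles on its own without the crate.
pub struct Centroid(pub Vec<f64>);

pub struct Clustering<'a, T> {
    pub elements: &'a [T],
    pub membership: Vec<usize>,
    pub centroids: Vec<Centroid>,
}

// Hypothetical application state. Because `Clustering` borrows its
// elements, the wrapper has to declare `'a` too, and so does every
// type or function that holds or names the wrapper.
pub struct AppState<'a> {
    pub clustering: Mutex<Option<Clustering<'a, Vec<f32>>>>,
}

// Hypothetical helper. The signature ties the stored result to `samples`:
// the state can never outlive the sample data it points into.
pub fn store<'a>(state: &AppState<'a>, samples: &'a [Vec<f32>]) {
    // Dummy result (everything in cluster 0) in place of a real kmeans call.
    let result = Clustering {
        elements: samples,
        membership: vec![0; samples.len()],
        centroids: vec![Centroid(vec![0.0, 0.0])],
    };
    *state.clustering.lock().unwrap() = Some(result);
}
```

The `'a` on `AppState` is the part that spreads: any long-lived container for this state would need the sample data to live at least as long as the container itself.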
The misery of trying to manage this and all the endless errors frankly does not seem worth the hassle. I don't know how to figure it all out, and what for? All this obsession over managing a reference to an array of floats? Perhaps this is necessary in other situations, but here I can just copy by value and be done with it. It will take no computational effort compared to the actual work the algorithm must perform (which is thousands to millions of times more complex than a simple array copy).
Then I can escape the ugly syntax and the horrible, bizarre <'a> Rust idiosyncrasy that perhaps some might love but I would rather avoid.
My inclination is then just to edit the package and remove the & reference and <'a> (i.e. pass the Vec by value or copy it) so I don't have all this insane lifetime management to deal with for no good reason. Is that crazy or rational in your opinion?
It seems like lifetimes of references just become code pollution very quickly, and the only reason you should ever use them is if you actually truly have to.
If you look at the code, it seems the elements, which are taken as a reference and set off this whole catastrophe, are never mutated (because you would not want to), so besides the minimal cost of copying the data once at the start, there is no downside to just doing that. It shouldn't matter.
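One way to get the copy-by-value behaviour without editing the package is to convert the borrowed result into an owned one right after clustering. This is only a sketch under stated assumptions: `OwnedClustering`, `AppState`, `placeholder_clustering` and `cluster_and_store` are hypothetical names, the `Clustering` and `Centroid` definitions are local copies of the ones quoted above, and the placeholder function stands in for the real `kmeans` call.

```rust
use std::sync::Mutex;

// Local stand-ins mirroring the structs quoted above.
pub struct Centroid(pub Vec<f64>);

pub struct Clustering<'a, T> {
    pub elements: &'a [T],
    pub membership: Vec<usize>,
    pub centroids: Vec<Centroid>,
}

// Hypothetical owned counterpart: no reference, so no lifetime parameter.
pub struct OwnedClustering<T> {
    pub elements: Vec<T>,
    pub membership: Vec<usize>,
    pub centroids: Vec<Centroid>,
}

impl<T: Clone> OwnedClustering<T> {
    // Consumes the borrowed result. Only the elements are copied;
    // membership and centroids are already owned and are simply moved.
    pub fn from_borrowed(c: Clustering<'_, T>) -> Self {
        OwnedClustering {
            elements: c.elements.to_vec(),
            membership: c.membership,
            centroids: c.centroids,
        }
    }
}

// Hypothetical application state: no `<'a>` anywhere.
pub struct AppState {
    pub clustering: Mutex<Option<OwnedClustering<Vec<f32>>>>,
}

// Placeholder for the library's `kmeans` call: puts everything in cluster 0.
fn placeholder_clustering(samples: &[Vec<f32>]) -> Clustering<'_, Vec<f32>> {
    Clustering {
        elements: samples,
        membership: vec![0; samples.len()],
        centroids: vec![Centroid(vec![0.0, 0.0])],
    }
}

// The borrow of `samples` ends inside this function; what is stored
// behind the Mutex owns all of its data.
pub fn cluster_and_store(state: &AppState, samples: &[Vec<f32>]) {
    let owned = OwnedClustering::from_borrowed(placeholder_clustering(samples));
    *state.clustering.lock().unwrap() = Some(owned);
}
```

With this shape the stored result can outlive the original `samples` vector, at the cost of one clone of the input data per clustering run.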
Am I understanding this correctly? I am asking this partly philosophically as someone who has no specific "passion" for any coding language on earth (including Rust) and just wants to get a job done without learning every niche intricacy of every language that has no bearing on my life. There is not enough time in the day.
I don't know how you guys can enjoy the concept of lifetime management like this, and unless you are doing some incredibly performance-intensive task or you need something shared around and mutated in many places, it seems irrational.
This is me after a day of trying to get a basic K-Means clustering system running.