Why doesn't this Rust tutorial code work?

I am trying to perform K-means clustering in Rust, using the method from this tutorial:

https://rust-ml.github.io/book/3_kmeans.html

In theory there is a nice tutorial there, so I presumed it would be easy. But it breaks with an error I can't understand for such simple code.

1) Cargo.toml (Okay)

rand = "0.8"  # Check for the latest version
ndarray = "0.16.1" # needed for concatenate
linfa = "0.7.0"
linfa-nn = "0.7.0"
linfa-clustering = "0.7.0" # https://crates.io/crates/linfa-clustering

2) "Use" (Okay)

use rand::Rng;
use ndarray::{Array2, ArrayView1, ArrayView2, concatenate}; 
use std::str::FromStr;
use linfa::prelude::*;
use linfa_clustering::KMeans;
use linfa_nn::distance::L2Dist; // L2 (Euclidean) distance
use ndarray::prelude::*;
use rand::prelude::*;

3) Random Noise Squares (Okay)

// Modified ChatGPT-generated random-noise function, since the tutorial doesn't provide one
fn create_square(center: [f32; 2], half_width: f32, num_points: usize) -> Array2<f32> {
    let mut rng = rand::thread_rng();
    let mut points = Vec::with_capacity(num_points);

    for _ in 0..num_points {
        let x = rng.gen_range((center[0] - half_width)..(center[0] + half_width)); // gen_range(start..end): start inclusive, end exclusive
        let y = rng.gen_range((center[1] - half_width)..(center[1] + half_width));
        points.push(vec![x, y]);
    }
    let arr : Array2<f32> = Array2::from_shape_vec((num_points, 2), points.into_iter().flatten().collect()).unwrap();
    arr
}

4) Concatenate Random Noise Squares (Okay)

// Copy of the tutorial's approach to build a random distribution of points
fn get_random_points() -> Array2<f32> {
    let square_1: Array2<f32> = create_square([7.0, 5.0], 1.0, 150); // Cluster 1
    let square_2: Array2<f32> = create_square([2.0, 2.0], 2.0, 150); // Cluster 2
    let square_3: Array2<f32> = create_square([3.0, 8.0], 1.0, 150); // Cluster 3
    let square_4: Array2<f32> = create_square([5.0, 5.0], 9.0, 300); // A bunch of noise across them all

    let data: Array2<f32> = ndarray::concatenate(
        Axis(0),
        &[
            square_1.view(),
            square_2.view(),
            square_3.view(),
            square_4.view(),
        ],
    )
    .expect("An error occurred while stacking the dataset");

    data
}

5) Run the Model (FAILS)

// Run the model - this is the part that fails
fn run_model(data: Array2<f32>) {
    let dataset = DatasetBase::from(data);
    let rng = thread_rng(); // Random number generator
    let n_clusters = 3;
    let model = KMeans::params_with(n_clusters, rng, L2Dist)
        .max_n_iterations(200)
        .tolerance(1e-5)
        .fit(&dataset)
        .expect("Error while fitting KMeans to the dataset");

    let dataset = model.predict(dataset);
}

ERROR

I don't know what I have done wrong, but adding the final run_model function stops the project compiling. It returns this error:

error[E0277]: the trait bound `linfa::DatasetBase<_, _>: From<ArrayBase<OwnedRepr<f32>, Dim<[usize; 2]>>>` is not satisfied
   --> src/lib.rs:130:19
    |
130 |     let dataset = DatasetBase::from(data);
    |                   ^^^^^^^^^^^ the trait `From<ArrayBase<OwnedRepr<f32>, Dim<[usize; 2]>>>` is not implemented for `linfa::DatasetBase<_, _>`
    |
    = help: the following other types implement trait `From<T>`:
              `linfa::DatasetBase<ndarray::ArrayBase<D, I>, ndarray::ArrayBase<ndarray::data_repr::OwnedRepr<()>, ndarray::dimension::dim::Dim<[usize; 1]>>>` implements `From<ndarray::ArrayBase<D, I>>`
              `linfa::DatasetBase<ndarray::ArrayBase<D, ndarray::dimension::dim::Dim<[usize; 2]>>, ndarray::ArrayBase<S, I>>` implements `From<(ndarray::ArrayBase<D, ndarray::dimension::dim::Dim<[usize; 2]>>, ndarray::ArrayBase<S, I>)>`

I have followed the tutorial exactly as far as I can tell. So I don't know what the problem is. Might the tutorial be out of date?

Or is there something obvious?

Rust is a bizarre language to me with endless obscure type, reference, mut, borrowing, and trait complaints. I can't make sense of it any time something breaks. It is very hard for me to understand what the error is trying to tell me. I am just staring at it.

Thanks for any help. It is appreciated.

Where did you get these dependency versions? They do not appear in the tutorial you linked. And the kind of error message you're seeing is most likely a type mismatch caused by a version conflict.

linfa 0.7 depends on ndarray@0.15, and you are using ndarray@0.16. I would peg this as one of the problems.
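
If that is the cause, one way to fix it is to pin ndarray in your own Cargo.toml to the same 0.15 line that linfa 0.7 was built against. A minimal sketch of what that might look like (the exact versions are my guess; the point is that cargo must resolve a single ndarray shared by your code and by linfa):

[dependencies]
rand = "0.8"
ndarray = "0.15"           # must match the ndarray that linfa 0.7 uses
linfa = "0.7.0"
linfa-nn = "0.7.0"
linfa-clustering = "0.7.0"

You can check the build with cargo tree -d (the --duplicates flag), which lists crates that appear in the dependency graph under more than one version. If two ndarrays show up, the Array2 you construct and the Array2 that linfa expects are different types, which produces exactly this kind of "trait is not implemented" error.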

I'm not certain this will fix your issue, but it looks like you're using different versions in your Cargo.toml than the tutorial does. The GitHub repo you're following has this in its dependencies:

[dependencies]
plotters = "0.3"
ndarray = "0.15"
ndarray-stats = "0.5"
linfa = "0.7"
linfa-clustering = "0.7"
linfa-linear = "0.7"
linfa-datasets = {version = "0.7", features = ["winequality", "diabetes"]}
linfa-nn = "0.7"
rand = "0.8"

In particular, you're using ndarray 0.16 while the tutorial is using 0.15. There are also other linfa crates in the list above which may provide the proper trait implementation.
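
For what it's worth, here is a minimal sketch of the failing function as I would expect it to compile once a single ndarray 0.15 sits in the dependency graph. It is essentially the code from the first post; the only addition is printing the predicted labels at the end, and the dataset.targets access is my assumption about the linfa 0.7 API rather than something taken from the tutorial, so double-check it against the book:

use linfa::prelude::*;
use linfa_clustering::KMeans;
use linfa_nn::distance::L2Dist;
use ndarray::Array2;
use rand::prelude::*;

fn run_model(data: Array2<f32>) {
    // With a single ndarray version in the build, the From<Array2<f32>> impl
    // listed in the error's help text applies again: this wraps the records
    // in a dataset whose targets are empty ().
    let dataset = DatasetBase::from(data);

    let rng = thread_rng();
    let n_clusters = 3;

    // Same hyperparameters as in the original post.
    let model = KMeans::params_with(n_clusters, rng, L2Dist)
        .max_n_iterations(200)
        .tolerance(1e-5)
        .fit(&dataset)
        .expect("Error while fitting KMeans to the dataset");

    // predict() consumes the dataset and returns a new one whose targets
    // hold the cluster index assigned to each row (assumed; verify against
    // the book's example).
    let dataset = model.predict(dataset);
    println!("cluster assignments: {:?}", dataset.targets);
}

Nothing else about the fit/predict chain needs to change; the version alignment alone should make the From error go away.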

Also, have you read the rust book? That's probably a much better place to start than this tutorial you're reading.

It's an excellent place to start. But having been in this kind of situation with Rust a few times, I know one will never find the answer to this sort of problem in there.

When one finds oneself using the wrong versions of things, missing features, or the wrong use clauses, things become impenetrable.

You've got no argument from me there. What I mean is that someone new to the Rust language would be better served by reading the Rust book before jumping into machine learning libraries, not that this specific issue is answered in the Rust book.


Thanks guys. I suspect the design of the library has simply changed over time and the tutorial no longer matches it. I think there is something to be said for writing simple libraries rather than overcomplicating things. For example, I managed to get this package working in less than an hour because it is only 390 lines of code, including descriptive comments.

I will always prefer solutions like that for any given problem. When you build a project on top of something that already has half a dozen dependencies and endless bloat you don't need, you are building on a house of cards and will pay the price later.

Usually no one is being paid to maintain all these packages and libraries, or to solve problems when things stop working.

Same as everywhere with open source packages and libraries.

So I will just forget this linfa nonsense, with its massive bloat for what is (for me) a simple problem, and its likely obsolete tutorial.

Agreed. To which I would add: stay away from ChatGPT-generated code when trying to learn things. Or at least be prepared to study it and deal with its failings.

Presumably the examples in the book's repo actually compile and run: GitHub - rust-ml/book: The Rust Machine Learning Book. That is likely a good place to start.


I'd like to kindly point out this is not a language problem. Bloat is the default in every software domain. Wirth's Law in action.

I do sympathise with that feeling.

However, I'm never going to live long enough to build my own compiler and the OS my code runs on, even if I had or could acquire the skills. The same goes for much of the existing code I make use of in libraries/modules/crates that others have created. At the end of the day, a hard-core rejection of so-called bloat would result in progress grinding to a halt.

Anyway, did you try out the examples that come with the book you are reading, from the repository I linked above?
