Structuring an algorithm for multiple threads (Solved)

Not sure, if I'm correct with your algo, but I think, you can ditch the bitmaps:

Then the algo gets somewhat to:

strides = split pixels in equal sized blocks

handles = for each stride in strides
    handle = spawn thread with:
        sub_clusters = create N sub clusters initialized to zero, where N is the count of current clusters
        for each pixel in stride
            nearest_cluster = find nearest cluster
            if pixel.cluster != nearest_cluster
                sub values from sub_clusters[pixel.cluster]
                add values to sub_clusters[nearest_cluster]
                pixel.cluster = nearest_cluster
        return sub_clusters

while sub_clusters = join next handle in handles
    for each cluster in sub_clusters
        add sub_cluster to cluster

for each cluster:
    update averages

The idea, let the threads calculate differences, and the later apply these difference to your clusters. And when all differences are applied, calculate the averages.

Cool. I have never returned values from an async thread in a join, but I can see how that would work. I now have two additional algorithms to contemplate.

A subCluster in your terminology is probably the same as a ClusterTotals in the code I processed above.

If I follow this algorithm then I am getting close to the same number of de/allocations as I would If used channels to process deltas. Choices choices…..

Some changes……

A good idea in my opinion would be to write the straightforward algorithm (without optimizations and without even threading, at this point) and test it on a few smaller images.

Regarding Vec<Pixel>: when you're creating a Pixel(x,y,r,g,b) it's just an abstraction over a few integers on the stack (probably fits in 64 bits, and most certainly in 128 bits) - notably it does NOT require any calls, neither to allocator nor to a "constructor".
Then you'll call pixels_vec.push(pixel) which will ensure there is space for the element[1] and then move Pixel into the vector; the latter uses around 1-2 instructions because they are, once again, plain old integers.

A tip: compile in --release mode if checking performance.


  1. if there is no space then allocation size is doubled, and all previous elements are moved in bulk; use Vec::with_capacity(width*height) to avoid this ↩︎

1 Like

I know the underlying algorithm works because I have implemented it (several ways) in kotlin and c#.

And yes, it would be best to test a non-optimised version on smaller images. However getting the initial implementation correct does not in itself give an easy path to optimisation. What we are discussing here is how the algorithm could be implemented so that it could be most easily optimised for speed knowing the strengths and weaknesses of Rust.

What I have gained from this discussion is some knowledge of potential optimisations, and a possible structure that could fit into the language strictures.

And here is an important understanding of how Rust handles vectors. Do you mean that a vector is a contiguous area of memory treated as individual elements? So a vector containing 100 instances of a 20 byte struct is a contiguous area of at least 2000 bytes. If so then I am happy to forget about those costs. In java/kotlin there would be 100 allocations of the 20 byte objects and an allocation that contained references to those 100 objects. It sounds as if rust is different…. beneficially so.

So the choice of algorithm could be influenced by the cost of allocating/storing lots of small objects. This in turn depends upon the internal representation of the data (about which I can find little hard info - though this may be my lack of familiarity).

Rust doesn't do auto-boxing, and it's not a language where every value gets its own heap allocation and vtable.[1] Depending on your glossary, this might be phrased as "not everything is an object". Alignment is a factor, but if you have a size_of 20, then a single allocation of at least 2000 bytes can hold them all, yes.


  1. You can write no_std programs where there is no language-level concept of a heap (but you won't have std's Vec<_>). ↩︎

2 Likes

Thanks for that. This is probably going to be the biggest difference for me going from java/kotlin to rust. It sounds way more efficient. Is there somewhere that I can read more about it, as it will influence my choice of algorithm/implementation.

I don't have a good citation aimed at that exact topic/POV change, sorry. Perhaps someone else can suggest a good "Rust for Java programmers" guide.

1 Like