Improve this code without guessing capacity

Hello,

I find this bit of code quite ugly and would like to ask people to rewrite the function in a better and maybe more functional way.

The main problems I see with the code:

  • vector allocation and shrink_to_fit-ing - I wish we could just collect it all from some sort of iterator;
  • the imperative approach that I’d like to replace perhaps a more functional way.

https://play.rust-lang.org/?gist=f98e6ab762d40333326e3ee375ef0f2e&version=stable&mode=debug&edition=2015

Focus on points_to_shapes function; the rest is just boilerplate.

Any other suggestions are welcome.

Thank you in advance.

1 Like

Would you mind, if I rewrite the whole thing? :smiley:

Totally :rofl:
The only requirement - is the shape of input data (and the data structures)

I’m not 100% sure, but it should work :crossed_fingers: (playground).

fn points_to_shapes(points: &[(bool, Point)]) -> Vec<Shape> {
    points.iter().fold(vec![Shape::default()], |mut s, (ns, p)| {
        if *ns {
            s.push(Shape::default());
        }
        s.last_mut().unwrap().points.push(p.clone());
        s
    })
}

The idea is, that there is always a shape in your vec and you just use the last one to push something into it (last_mut...push()). If a new shape should be created, I just push a new shape (empty vec?) to the vec.

I’ve made three (small?!) changes to your code

  1. I added #[derive(Default)] to your shape (is avoidable, but makes things easier)
  2. I added #[derive(Clone)] to your Point (is avoidable, but very neat)
  3. I changed the signatur of points_to_shapes to accept a slice, instead of a Vec (can be changed back, should not change anything)

I haven’t tested it for performance, but this could be easily done, if you want, but you should do it with your (real) data, because the structs might be a little bit bigger and that would tamper the result=

Hope it helps :slight_smile:

1 Like

Thank you, nice use of fold :+1:

I also learned about the Default :smile:

But I’d like to avoid Copy/.clone() as the actual data structure is much bigger (was just trying to keep it simple for the example).
Possible?

1 Like

Sure. But then you have to accept a Vec (so you have to .clone() that if you want to use it in the future again).

fn points_to_shapes(points: Vec<(bool, Point)>) -> Vec<Shape> {
    points.into_iter().fold(vec![Shape::default()], |mut s, (ns, p)| {
        if ns {
            s.push(Shape::default());
        }
        s.last_mut().unwrap().points.push(p);
        s
    })
}
1 Like

Yeah, that would be fine. I don’t actually need to use it afterwards so might get away with a slice.

Thank you - that is so much cleaner than doing it imperatively :clap:

1 Like

@hellow thinking about this. It is still using push which means the Vec will need to be re-allocated at some point. Which means we’d probably need to call shrink_to_fit somewhere as well?

I wouldn’t bother too much with shrink_to_fit. Yes you could iterate over all Vecs and call shrink_to_fit, but why. That means the jmalloc library has to be called again, a syscall has to be done and maybe more.
Unless you deal with a very memory limited environment (I think of microcontrollers or such), I wouldn’t do that.

It isn’t even defined, that it will shrink down to the minimal possible value: Vec::shrink_to_fit

It will drop down as close as possible to the length but the allocator may still inform the vector that there is space for a few more elements.

2 Likes

yeah, seems reasonable. Cheers.

I suppose I can still preallocate if I need to (instead of using Default).

If the input slice is empty, presumably the returned Vec should be empty too, but yours has a dummy Shape in it.

It’s a matter of opinion, but I think I prefer the “imperative” version for ease of understanding, although I would also not bother with shrink_to_fit unless it was shown to be a problem for memory usage. I also don’t think with_capacity is worthwhile unless there is a reason to choose 10 (e.g., “most shapes will have at least 5 points but only 1% will have more than 10”).

One improvement that does come to mind is that instead of moving current_points into shapes and making up a new current_points, you can drain it and create a new one “in place”. This way each points vector (except the last one) will be created with the correct size from the beginning.

    if new_shape {
        shapes.push(Shape {
            points: current_points.drain(..).collect(),
        });
    };
1 Like

You’re right! A simple

if points.is_empty() { return vec![] }

at the very beginning should do it :slight_smile: Well done!

1 Like