Hello,
I have a data structure like this one:
enum SemanticSurface {
RoofSurface,
GroundSurface,
WallSurface,
}
struct Material {
name: String,
ambient_intensity: Option<f32>,
diffuse_color: Option<[f32; 3]>,
emissive_color: Option<[f32; 3]>,
specular_color: Option<[f32; 3]>,
shininess: Option<f32>,
transparency: Option<f32>,
is_smooth: Option<bool>,
}
struct Texture {
image: String,
}
type Point = [f64; 3];
type LineString = Vec<Point>; // There are many Points in a LineString.
struct Surface {
boundaries: Vec<LineString>, // Usually, there is one LineString in a Surface boundary, but there can be more.
semantics: Option<SemanticSurface>,
material: Option<Material>,
texture: Option<Texture>,
}
type Shell = Vec<Surface>; // There are many Surfaces in a Shell.
enum Geometry {
Solid {
lod: String,
boundaries: Vec<Shell> // Usually, there is one Shell in a Solid boundary, but there can be more.
}
}
I know the exact size of the vectors, but only at runtime. The data in the vectors comes from an intermediary container that was deserialized from a JSON file.
Since I know the size, I create each vector .with_capacity()
.
The data itself is in the Points. The other fields, Surface.semantics, Surface.material, Surface.texture
does not contain any data for now. I just left it there, because maybe it matters? Similarly, the Solid.lod
field is a string of a few characters only.
When I measure the allocated size of these structures I see that there is significantly more memory allocated for the vector in Geometry::Solid.boundaries
than what I can explain from multiplying the number of Points by 24 bytes.
I measure the heap allocation size with the datasize crate. I also check the peak memory use with /usr/bin/time -v
and valgrind massif. They confirm the memory usage that I measure for Geometry::Solid.boundaries
, and since I'm in the GB-range, I don't really care about a few Mb difference...
But to be specific, these are the numbers that I measured:
- total nr. points: 34543002 (calculated from this, there is about 829 Mb of Point data on the stack, right?)
- total nr. of Surface: 11514334
- total size of Surface-s: 1105 Mb (I think this is heap allocation only, since I measured it with
datasize::data_size
. - total nr. of Geometry-s : 16865
- total size of the
boundary
vector of all Geometry: 2948 Mb (measured the same way as the Surface, withdatasize::data_size
)
I checked if the vectors are over-allocated with Vec::capacity() - Vec::len()
, and this is 0 for each vector.
Could someone help me to find out why is there nearly 3x heap mem allocated for the Geometry.boundary
than for the Surfaces? Since the Geometry.boundary
is just a Vec<Vec<Surface>>
and it doesn't have unused capacity (capacity - len = 0), I would expect that the difference in the allocated mem would be negligible.
Or am I mixing concepts and measuring the wrong thing?