Is this code well optimized for importing a single glb mesh?

This uses the gltf library and bytemuck.

let object: (Vec<Vertex>, Vec<u8>) = {
                let obj = data
                    .0
                    .meshes()
                    .nth(0)
                    .unwrap_unchecked()
                    .primitives()
                    .nth(0)
                    .unwrap_unchecked();
                let r = obj.reader(|buf| Some(&data.1[buf.index()]));
                (
                    r.read_positions()
                        .unwrap_unchecked()
                        .zip(r.read_tex_coords(0).unwrap_unchecked().into_f32())
                        .zip(r.read_normals().unwrap_unchecked())
                        .map(|v| Vertex {
                            pos: v.0.0,
                            uv: v.0.1,
                            norm: v.1,
                        })
                        .collect::<Vec<Vertex>>(),
                    r.read_indices()
                        .map(|v| match v {
                            ReadIndices::U16(ind) => {
                                bytemuck::cast_vec(ind.collect::<Vec<u16>>())
                            }
                            ReadIndices::U32(ind) => {
                                bytemuck::cast_vec(ind.collect::<Vec<u32>>())
                            }
                            ReadIndices::U8(ind) => {
                                bytemuck::cast_vec(ind.map(|f| f as u16).collect::<Vec<u16>>())
                            }
                        })
                        .unwrap_unchecked(),
                )
            };

The most optimized option is to map the buffers directly into VRAM by providing the layout of the mesh to your GAPI, but that requires a new pipeline layout per vertex layout in your meshes. That's part of the reason engines often have a "baking" step that ensures there's no mapping required on load.

That said, this is a fairly typical approach for smaller projects, and the given code is close to optimal for it (as in, I expect anything you might find to optimize would give tiny returns, e.g. less than a percent).

I wouldn't use unwrap_unchecked(): the performance gain from removing the check is irrelevant next to any allocation library short of a trivial bump allocator (which the check is comparable to in cost), and it's really not worth the risk of undefined behavior. That API is mostly for carefully optimizing very tight loops.
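As a sketch of what the safe version looks like, the same lookup chain can bail out with ? instead (plain Options standing in for the gltf accessors here; first_of_first is just an illustrative name, not anything from the gltf crate):

```rust
// Stand-in for the mesh/primitive lookup: same shape as the original
// chain, but it returns None on empty input instead of invoking
// undefined behavior via unwrap_unchecked().
fn first_of_first(data: &[Vec<u32>]) -> Option<u32> {
    let mesh = data.iter().next()?; // was .nth(0).unwrap_unchecked()
    let prim = mesh.iter().next()?; // ditto
    Some(*prim)
}

fn main() {
    assert_eq!(first_of_first(&[vec![7, 8]]), Some(7));
    assert_eq!(first_of_first(&[]), None); // no UB, just None
}
```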

Wdym by "allocation library short of a trivial bump allocator"? And wdym by baking in regards to vertex layouts? I want to learn how to bake if you have any resources for wgpu on that as well. And in regards to using unwrap_unchecked, you could call it a lazy optimization since it's for a game and not for things like a library.

The cost of a bump allocation is very small, whereas a normal heap allocation has a larger cost. So compared to a normal heap allocation the unwrap cost is small, but not so small compared to a bump allocation.

Bump allocators (e.g., bumpalo) get their speed by allocating locally (they are not normally sharable between threads) and by not supporting deallocation of individual objects -- the entire bump allocator (all its allocations) must be freed at once.
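To make the cost model concrete, here's a toy hand-rolled bump allocator (a sketch of the idea, not bumpalo's actual implementation): an allocation is just an alignment round-up plus an offset bump, and "freeing" is rewinding the offset for everything at once.

```rust
// Toy bump allocator over a fixed byte buffer. Allocation is pointer
// arithmetic only; there is no way to free an individual allocation.
struct Bump {
    buf: Vec<u8>,
    offset: usize,
}

impl Bump {
    fn with_capacity(cap: usize) -> Self {
        Bump { buf: vec![0; cap], offset: 0 }
    }

    /// Reserves `size` bytes at `align` (power of two) and returns the
    /// start offset, or None if the buffer is exhausted.
    fn alloc(&mut self, size: usize, align: usize) -> Option<usize> {
        let start = (self.offset + align - 1) & !(align - 1);
        let end = start.checked_add(size)?;
        if end > self.buf.len() {
            return None;
        }
        self.offset = end;
        Some(start)
    }

    /// Frees every allocation at once by rewinding the offset.
    fn reset(&mut self) {
        self.offset = 0;
    }
}

fn main() {
    let mut bump = Bump::with_capacity(64);
    assert_eq!(bump.alloc(3, 1), Some(0));
    assert_eq!(bump.alloc(4, 4), Some(4)); // rounded up past offset 3
    bump.reset();
    assert_eq!(bump.alloc(1, 1), Some(0)); // everything reclaimed at once
}
```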

Unrelated, but does tokio have a bump allocator? Also, there seems to be a caveat to using them.

Huh? Do you mean for its internal data structures? I don't know.

Do you mean other than the ones I mentioned?

No to the latter. Bumpalo was just an example. It's a question only because of the way bumpalo's docs describe it, particularly the second paragraph in the bump allocation section.

Yes, the 2nd paragraph in their doc is one of the things I mentioned:

The disadvantage of bump allocation is that there is no general way to deallocate individual objects or reclaim the memory region for a no-longer-in-use object.

In addition, you can't allocate the boxes, collections, etc. defined in the standard library (or in other crates) using a bump allocator, because there is no stable allocator interface in Rust yet. So you can only use the boxes, collections, etc. defined by the bumpalo API, or implement your own data structures on top of bumpalo's raw allocator.

I don't know if that's what you were asking.

It was. I understand. I really hope Rust allows custom allocators in the future.

You actually can replace the global allocator already, which changes the allocator used by e.g. Vec, but bump allocators are not at all suitable for that.

The API under design is available behind the unstable feature allocator_api, with discussion happening here, which links you to the (unstable) Allocator trait.
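For reference, swapping the global allocator is already stable; a minimal sketch using std's System allocator as the stand-in replacement:

```rust
use std::alloc::System;

// Every Box/Vec/String in the program now allocates through System
// instead of the default global allocator.
#[global_allocator]
static GLOBAL: System = System;

fn main() {
    let v: Vec<u32> = (0..4).collect(); // allocated via GLOBAL
    assert_eq!(v, [0, 1, 2, 3]);
    println!("allocated {} elements through System", v.len());
}
```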


However, this is all beside the point: you're trying to shave something below microseconds while loading potentially megabytes into the GPU. The point was that the savings of unwrap_unchecked() are completely worthless in this context; you don't need it and should use unwrap() (or better, expect("mesh missing normals") etc.), or some other error handling mechanism.

If you're calling an unsafe API, you should be able to write a // SAFETY comment describing why you know the call cannot violate its safety contract, and you can't really do that here, because you're getting this data from an external format.
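For illustration, a sketch of that convention (first_byte is a made-up example, not from the code above):

```rust
/// Reads the first byte; the visible check is what makes the unsafe sound.
fn first_byte(data: &[u8]) -> u8 {
    assert!(!data.is_empty(), "first_byte called on empty slice");
    // SAFETY: the assert above guarantees index 0 is in bounds,
    // so get_unchecked(0) cannot read out of bounds.
    unsafe { *data.get_unchecked(0) }
}

fn main() {
    assert_eq!(first_byte(b"abc"), b'a');
}
```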


An alternative, if you're really concerned about the performance of parsing the metadata (rather than the buffer data, which almost certainly dominates by many orders of magnitude): parse the GLTF document directly, declaring only the fields you care about, e.g. using serde, something like:

#[derive(serde::Deserialize)]
struct GltfDocument
{
  meshes: [GltfMesh; 1], // expect exactly one.
  // ...
}

#[derive(serde::Deserialize)]
struct GltfMesh
{
  primitives: [GltfPrimitive; 1], // etc.
  // ...
}

// ...

fn main() -> anyhow::Result<()> // for simplicity
{
    let data = std::fs::read("input.glb")?;
    let glb = gltf::Glb::from_slice(&data)?;
    let doc: GltfDocument = serde_json::from_slice(&glb.json)?;

    let obj = &doc.meshes[0].primitives[0];
    // ...

    Ok(())
}

Use cargo-asm to see what the generated code looks like.

The unsafe unwrap_unchecked() may be unnecessary.

If you make this method return Option, then you can use ? on methods like .nth(n)?. This is generally cheap, and may be optimized out entirely if the bounds are obvious. Even when it isn't, it's cheap for the branch predictor, since the branch will never be taken. ?/return/break usually optimizes well, and ends up cheaper than code with a potential panic, such as slice[index].

.unwrap_or_default()/.unwrap_or(f32::NAN) is another safe and usually very cheap alternative. It may compile down to a conditional move, which for data read from RAM can be essentially zero cost.
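A small sketch of both patterns (first_uv is an illustrative stand-in, not a gltf API):

```rust
// `?` propagates missing data as None instead of panicking or UB.
fn first_uv(uvs: Option<Vec<[f32; 2]>>) -> Option<[f32; 2]> {
    let uvs = uvs?; // early return on None
    uvs.first().copied()
}

fn main() {
    // unwrap_or gives a cheap fallback, often just a conditional move.
    let uv = first_uv(Some(vec![[0.5, 0.25]])).unwrap_or([f32::NAN; 2]);
    assert_eq!(uv, [0.5, 0.25]);

    let missing = first_uv(None).unwrap_or([f32::NAN; 2]);
    assert!(missing[0].is_nan());
}
```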

And again, we're talking about 3 branches on cached data right before reading megabytes into VRAM, possibly more than 6 orders of magnitude more expensive.


Isn't your glb code similar to, if not exactly, what import_slice does?

The difference is you define exactly what structure you expect so you don't have to parse details you don't care about or traverse and revalidate optional fields.

I generally wouldn't recommend this; as mentioned, if you care about performance at this level you should be baking your resources so they can be mapped directly into VRAM without this sort of processing.

How would I bake them?

Do what you would do to load them into VRAM, then instead dump that to files however you like. In particular, you know the vertex format you expect, so you can just create a GPU buffer, map it, then read the file directly into that buffer.

You can even have all your GPU-side static data in one big flat file, and build whatever index/table/metadata file with the offsets you need as part of the higher-level loading process.

Exactly what that looks like depends on what you're doing, it's part of your engine design.
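A minimal sketch of that baking step, assuming a simple position + UV vertex layout (Vertex and bake are illustrative names, not an engine API):

```rust
use std::fs;

#[repr(C)]
#[derive(Clone, Copy)]
struct Vertex {
    pos: [f32; 3],
    uv: [f32; 2],
}

// "Baking": serialize vertices in exactly the layout the GPU expects,
// so loading later is a single read straight into a mapped buffer.
fn bake(vertices: &[Vertex]) -> Vec<u8> {
    let mut out = Vec::with_capacity(vertices.len() * std::mem::size_of::<Vertex>());
    for v in vertices {
        for f in v.pos.iter().chain(v.uv.iter()) {
            out.extend_from_slice(&f.to_le_bytes());
        }
    }
    out
}

fn main() -> std::io::Result<()> {
    let verts = [Vertex { pos: [0.0, 1.0, 2.0], uv: [0.5, 0.5] }];
    let bytes = bake(&verts);
    let path = std::env::temp_dir().join("mesh.bin");
    fs::write(&path, &bytes)?;
    // At load time you'd read this file directly into a mapped GPU buffer.
    assert_eq!(fs::read(&path)?.len(), std::mem::size_of::<Vertex>());
    Ok(())
}
```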

I understand. Never thought about that. Thank you.

#[derive(serde::Deserialize)]
struct GltfDocument
{
  meshes: [GltfMesh; 1], // expect exactly one.
  // ...
}

How do I derive Deserialize over something with members that don't derive it?

You generally don't; each field needs to implement Deserialize (explicitly or derived) if you're deriving it. Check out serde.rs for the various ways you can customize that with attributes, or you can implement it explicitly (though that's a particularly painful job in most cases).

In the example I gave, GltfMesh is defined directly below, also derived: it represents the shape in the GLTF document that the gltf crate returns from the meshes method, not the loaded-into-memory mesh. (The GLTF document normally doesn't contain the attribute data or indices itself; it only references locations within binary buffer data, though it can use base64 to embed a purely-JSON mesh too.)

You may want to reference the gltf cheat sheet for the shape as defined in the document.

I can't find GltfMesh anywhere.