I'm working on a parser for a binary file format that is basically a stream of events. Each time the parser encounters an event, it builds a struct for that event's data and passes it to a callback function.
The problem with this setup is that the parse function in this example wastes time building EventB despite the handler never needing it. If the parser could "know" the EventHandler doesn't need EventB it could easily skip over each occurrence to save time. What's a good way to design the interface to make this possible? Is this a textbook case for lazy parsing?
You could use a registration system of some sort rather than putting every callback on one trait. For example, let people register Box<dyn FnMut(EventA)> closures with the parser and call them all when you see such an event. The parser would also know when nothing is registered for an event type, so it could skip decoding the details.
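A minimal sketch of the registration idea, under some loud assumptions: the wire format ([tag: u8][len: u8][payload]), the fields of EventA and EventB, and the names Parser, on_event_a and on_event_b are all made up for illustration. The 'h lifetime is only there so handlers can borrow local state; plain Box<dyn FnMut(EventA)> (which is 'static) works the same way.

```rust
// Hypothetical event types; the real fields depend on the file format.
#[derive(Debug, Clone, PartialEq)]
pub struct EventA {
    pub id: u32,
}

#[derive(Debug, Clone, PartialEq)]
pub struct EventB {
    pub value: u16,
}

#[derive(Default)]
pub struct Parser<'h> {
    on_a: Vec<Box<dyn FnMut(EventA) + 'h>>,
    on_b: Vec<Box<dyn FnMut(EventB) + 'h>>,
}

impl<'h> Parser<'h> {
    pub fn on_event_a(&mut self, f: impl FnMut(EventA) + 'h) {
        self.on_a.push(Box::new(f));
    }

    pub fn on_event_b(&mut self, f: impl FnMut(EventB) + 'h) {
        self.on_b.push(Box::new(f));
    }

    /// Assumed wire format: [tag: u8][len: u8][payload: len bytes], repeated.
    /// Returns how many events were actually decoded (skipped ones don't count).
    pub fn parse(&mut self, mut input: &[u8]) -> Result<usize, &'static str> {
        let mut decoded = 0;
        while !input.is_empty() {
            if input.len() < 2 {
                return Err("truncated header");
            }
            let (tag, len) = (input[0], input[1] as usize);
            let body = &input[2..];
            if body.len() < len {
                return Err("truncated payload");
            }
            let (payload, rest) = body.split_at(len);
            match tag {
                // Only build the struct if somebody registered for it.
                0 if !self.on_a.is_empty() => {
                    if payload.len() != 4 {
                        return Err("bad EventA length");
                    }
                    let id = u32::from_le_bytes([payload[0], payload[1], payload[2], payload[3]]);
                    let ev = EventA { id };
                    for handler in &mut self.on_a {
                        handler(ev.clone());
                    }
                    decoded += 1;
                }
                1 if !self.on_b.is_empty() => {
                    if payload.len() != 2 {
                        return Err("bad EventB length");
                    }
                    let ev = EventB { value: u16::from_le_bytes([payload[0], payload[1]]) };
                    for handler in &mut self.on_b {
                        handler(ev.clone());
                    }
                    decoded += 1;
                }
                // Known event type with no listeners: skip the payload entirely.
                0 | 1 => {}
                _ => return Err("unknown tag"),
            }
            input = rest;
        }
        Ok(decoded)
    }
}
```

With only an EventA handler registered, every EventB in the stream is stepped over using its length byte and never turned into a struct. Note that skipped payloads are also not validated in this sketch.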
You could make EventA be a thin wrapper around a &[u8] into the buffer so it's trivial to construct, but the methods on the struct would pull things out of the serialized stream on demand.
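Here is roughly what such a thin wrapper could look like. The payload layout (a u32 id, a u64 timestamp, then a UTF-8 name) is purely an assumption for the sake of the example; the point is that constructing the struct costs one length check, and each accessor decodes only what it is asked for.

```rust
/// Hypothetical layout for EventA's payload (an assumption for illustration):
/// bytes 0..4 = id (u32, little-endian), bytes 4..12 = timestamp (u64, LE),
/// remaining bytes = UTF-8 name.
#[derive(Clone, Copy)]
pub struct EventA<'a> {
    raw: &'a [u8],
}

impl<'a> EventA<'a> {
    const HEADER_LEN: usize = 12;

    /// Only checks the length, so the fixed-offset accessors cannot panic.
    pub fn new(raw: &'a [u8]) -> Option<Self> {
        if raw.len() >= Self::HEADER_LEN {
            Some(Self { raw })
        } else {
            None
        }
    }

    pub fn id(&self) -> u32 {
        u32::from_le_bytes([self.raw[0], self.raw[1], self.raw[2], self.raw[3]])
    }

    pub fn timestamp(&self) -> u64 {
        let mut bytes = [0u8; 8];
        bytes.copy_from_slice(&self.raw[4..12]);
        u64::from_le_bytes(bytes)
    }

    /// UTF-8 is checked lazily, only if someone asks for the name.
    pub fn name(&self) -> Result<&'a str, std::str::Utf8Error> {
        std::str::from_utf8(&self.raw[Self::HEADER_LEN..])
    }
}
```

A handler that only calls id() never pays for the UTF-8 check on the name, and a handler that ignores the event pays for nothing beyond the length comparison.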
You could decide that it's probably so cheap to deserialize a flat event from a binary format that you might not need to worry.
From this, I'm assuming that the binary format is structured such that after determining the event type, you either know the event payload size or can easily read it from header information, in order to skip over it.
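To make that assumption concrete, here is a sketch of length-prefixed framing where skipping an event is just a slice split. The framing itself ([tag: u8][len: u16 little-endian][payload]) and the function names split_frame and count_tag are hypothetical; a real format will differ.

```rust
/// Assumed framing (hypothetical): [tag: u8][len: u16 LE][payload: len bytes].
/// Splits one frame off the front of `input` without looking inside the payload.
/// Returns (tag, payload, remaining input).
pub fn split_frame(input: &[u8]) -> Result<(u8, &[u8], &[u8]), &'static str> {
    if input.len() < 3 {
        return Err("truncated header");
    }
    let tag = input[0];
    let len = u16::from_le_bytes([input[1], input[2]]) as usize;
    let body = &input[3..];
    if body.len() < len {
        return Err("truncated payload");
    }
    let (payload, rest) = body.split_at(len);
    Ok((tag, payload, rest))
}

/// Counts frames with the given tag; every payload is skipped, never decoded.
pub fn count_tag(mut input: &[u8], wanted: u8) -> Result<usize, &'static str> {
    let mut count = 0;
    while !input.is_empty() {
        let (tag, _payload, rest) = split_frame(input)?;
        if tag == wanted {
            count += 1;
        }
        input = rest;
    }
    Ok(count)
}
```

If the format has no length field and the payload size depends on its contents, this kind of skipping isn't available and you have to walk the payload anyway.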
If the event payload has internal structure that you save time by not parsing, though, then you are potentially skipping over ill-formed data without validating it. It's generally good practice to validate that the data is well-formed even if you're not processing it any further. Once you've done the validation, actually putting references into an Event structure should be a trivial amount of work, and perhaps even work that the optimizer can cut out if the code is monomorphized (generic rather than using dyn).
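A sketch of that "always validate, let the optimizer drop unused work" shape, with assumptions spelled out: the framing ([tag: u8][len: u8][payload]), the event fields, and the validity rules (EventA is UTF-8, EventB has an even length) are invented for the example. The handler trait has no-op default methods, and parse is generic over the handler type rather than taking &mut dyn EventHandler.

```rust
/// Borrowed views into the input; field layouts are assumptions for this sketch.
pub struct EventA<'a> {
    pub name: &'a str,
}

pub struct EventB<'a> {
    pub samples: &'a [u8],
}

/// Default methods do nothing, so a handler only overrides what it needs.
pub trait EventHandler {
    fn on_a(&mut self, _ev: EventA<'_>) {}
    fn on_b(&mut self, _ev: EventB<'_>) {}
}

/// Generic over `H` (not `dyn EventHandler`), so each handler type gets its own
/// monomorphized copy, giving the optimizer a chance to inline empty callbacks.
pub fn parse<H: EventHandler>(mut input: &[u8], handler: &mut H) -> Result<(), &'static str> {
    while !input.is_empty() {
        // Assumed framing: [tag: u8][len: u8][payload: len bytes].
        if input.len() < 2 {
            return Err("truncated header");
        }
        let (tag, len) = (input[0], input[1] as usize);
        if input.len() - 2 < len {
            return Err("truncated payload");
        }
        let (payload, rest) = input[2..].split_at(len);
        match tag {
            0 => {
                // Validation runs whether or not the handler cares about EventA.
                let name = std::str::from_utf8(payload).map_err(|_| "EventA: invalid UTF-8")?;
                handler.on_a(EventA { name });
            }
            1 => {
                if payload.len() % 2 != 0 {
                    return Err("EventB: odd sample length");
                }
                handler.on_b(EventB { samples: payload });
            }
            _ => return Err("unknown tag"),
        }
        input = rest;
    }
    Ok(())
}

/// Example handler that only cares about EventA.
#[derive(Default)]
pub struct Names(pub Vec<String>);

impl EventHandler for Names {
    fn on_a(&mut self, ev: EventA<'_>) {
        self.0.push(ev.name.to_owned());
    }
}
```

For Names, building EventB is just wrapping an already-validated slice and passing it to an empty function, which is the kind of work a compiler can often remove. Whether it actually does is something to check with a profiler or by reading the generated code, not something this sketch guarantees.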
If the binary format is structured such that you can't use a zero-copy view into the buffer to pass the validated payload to the event handler, that's an unfortunately poorly designed format. (To be clear, zero-copy doesn't actually mean zero copies; it means zero large/heap copies or allocations, where variable-length data is used directly from the deserialization buffer rather than from a separate heap copy. Fixed, constant-size fields absolutely do get copied into structured data.)
An approach the wasmparser crate uses is to have a top-level parser which splits the WebAssembly file up by section and gives you an iterator of "payloads", where the wasmparser::Payload is an enum that might contain the data in-line (e.g. if it's something cheap like the version number) or will give you sub-readers for parsing that section.
That way you can defer the extra parsing work until it is needed.
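The same shape can be sketched in a few lines. To be clear, this is not wasmparser's actual API: the Payload variants, the Sections and SampleReader types, and the framing ([tag: u8][len: u8][body]) are all hypothetical and only loosely modeled on the idea described above.

```rust
/// Hypothetical top-level item (not wasmparser's real `Payload`).
pub enum Payload<'a> {
    /// Cheap data is decoded eagerly and stored inline.
    Version(u8),
    /// Expensive sections hand back a sub-reader instead of decoded records.
    Samples(SampleReader<'a>),
}

/// Lazily decodes little-endian u16 samples from a section body.
pub struct SampleReader<'a> {
    raw: &'a [u8],
}

impl<'a> Iterator for SampleReader<'a> {
    type Item = u16;

    fn next(&mut self) -> Option<u16> {
        if self.raw.len() < 2 {
            return None;
        }
        let (head, tail) = self.raw.split_at(2);
        self.raw = tail;
        Some(u16::from_le_bytes([head[0], head[1]]))
    }
}

/// Top-level parser: splits the input into sections without decoding them.
pub struct Sections<'a> {
    input: &'a [u8],
}

impl<'a> Sections<'a> {
    pub fn new(input: &'a [u8]) -> Self {
        Self { input }
    }
}

impl<'a> Iterator for Sections<'a> {
    type Item = Result<Payload<'a>, &'static str>;

    fn next(&mut self) -> Option<Self::Item> {
        // Take the input, leaving an empty slice so any error ends iteration.
        let input = std::mem::take(&mut self.input);
        if input.is_empty() {
            return None;
        }
        // Assumed framing: [tag: u8][len: u8][body: len bytes].
        if input.len() < 2 || input.len() - 2 < input[1] as usize {
            return Some(Err("truncated section"));
        }
        let (body, rest) = input[2..].split_at(input[1] as usize);
        Some(match input[0] {
            0 if body.len() == 1 => {
                self.input = rest;
                Ok(Payload::Version(body[0]))
            }
            1 if body.len() % 2 == 0 => {
                self.input = rest;
                Ok(Payload::Samples(SampleReader { raw: body }))
            }
            _ => Err("unknown or malformed section"),
        })
    }
}
```

A caller that only wants the version can match on Payload::Version and drop every SampleReader unread, so the per-sample decoding never runs for sections nobody looks at.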