I'm working on implementing Simple Binary Encoding for Rust for a
high-performance embedded use case, and have bumped up against some
interesting challenges to the hope of zero-cost abstractions.
In short, I'd like to be able to decode raw bytes (
- meaningfully typed value references (
enums, etc) without copying
- with an abstraction that makes incorrectly-ordered component decoding impossible
- with minimal per-field-decoded allocations or runtime checks
- for messages which may include arbitrarily-sized collections of substructures
I believe the first goal can be accomplished largely through
pointer magic, and being careful with any compound structs that
map on top of a given slab of bytes.
The second goal is about making the decoder API minimize opportunities
for the end users to misinterpret regions of bytes. A straightforward example of
this is using an
interpret_as_my_enum method on bytes
3..5 when the data for that
enum within the context of the encoded message is actually over at bytes
My present solution to 2 uses session types, which amount to a long chain of structs that wrap the underlying bytes and
expose a single method that spits out a decoded value and the next step of the
type-level state machine of interpretation. This pattern makes tight
APIs that leverage Rust's data ownership system to do correct
byte-offset management, but unfortunately runs afoul of constraint 3, minimizing
allocations. If each step in a session-typed decoding chain requires a new
wrapper struct to enforce the decoding state machine's currently-exposed
capabilities, that implies a costly allocation per field decoded. At high speeds,
that's a bit worrying.
I would love to learn if there are patterns by which those allocations could be avoided,
either through a different ordering-enforcement scheme, or perhaps by providing
additional hints to the compiler. Perhaps @alexcrichton or SilverWingedSeraph would know?
Constraint 4, handling messages with subsections exhibiting
variable-at-runtime-multiplicity seems like a simple matter of coding at first
glance, but has taken on a sinister aspect as I've wrestled with it. The most
pressing consequence is that it highlights the difficulty of
working with unsized byte array slices. Namely, without elevating the size
of the byte source from which data is being read into the
type system, we are forced to do runtime
offset + length <= size style checks
and furthermore pay the allocation cost of
Result wrapping each
field-interpetation attempt in case the bounds check fails.
Constraint 3, minimize checks and allocations, rides again.
My early inclination to was to generate APIs which rely on sized byte arrays,
but there are several ergonomic problems. First, since we're dealing with
arbitrary message schema containing repeating groups, the sizing of the
byte arrays would have to be exposed to the end user for input for each distinct
fixed-length block within the message.
Second, even if we could specify a sized
array that represents the whole of a message, generating sized read-only views on
that data in order to decompose processing into well-typed helper functions
seems tough in Rust right now. At very least, I don't know how to do it without
doing either data copies (into sized array buffers) or leaning on truly horrific
code generation of custom sized wrapper types that skip bounds checks.
In other words, it would be nice to start from a
[u8; 1024] and pass down
&[u8; 256] and
&[u8; 768] references to decoder functions without allocating fresh arrays.
Is this a planned area for possible language or compiler improvements?
I'm hopeful that others who have forayed into the low-copy space
( dtolnay , @dwrensha , et al?) could comment on how to
aim for compile-time-assuredness of zero-copy byte array slicing and dicing?
Any and all advice would be welcome,