Creating an enum which is generic over owned and borrowed Strings

When parsing a byte slice into an AST, it's efficient to have the AST reference strings in the underlying byte slice rather than copying the bytes to an owned String.

I've managed this in an implementation of Redis's RESP protocol here. In short:

pub enum RESP<'a> {
    SimpleString(&'a str),
    // ...
}

pub fn parse(buf: &[u8]) -> Result<(usize, RESP<'_>), ParseError> {
    // ...
}

This works great when you know that the byte buffer will live longer than the parsed AST. However, sometimes you want to own your AST. Maybe you want to clone the AST and use it after the byte buffer has been de-allocated.

Using a String works fine, but then you lose out on avoiding the copy.

pub enum RESP {
    SimpleString(String),
    // ...
}

How do I idiomatically create an AST enum which can both reference an underlying buffer and be cloned to produce an owned value which can be used after the buffer has been de-allocated?

Things I've tried:

  • Making RESP generic over T and subbing T for String or &'a str. Besides being terribly ugly, I've been unable to create functions that accept both RESP<String> and RESP<&'a str>.
  • I've read a bit about Cow and its ability to house both an owned and a borrowed value. Though this is somewhat what I want, it doesn't feel like its intended use case. Having an additional indirection when constructing the value seems like overkill.

How should this be done idiomatically? Can it be done? Should it be done? Thanks! <3

There's a Cow<'a, T> in std for efficient copy-on-write semantics. Check its definition. Enjoy your cow power with Cow<'a, str>!
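To make the suggestion concrete, here is a minimal sketch of what the AST could look like with a Cow payload. The parse_simple function and the into_owned method are hypothetical names for illustration, not the original library's API, and only the SimpleString variant is shown.

```rust
use std::borrow::Cow;

/// Sketch of the AST with a Cow payload; other variants are omitted.
#[derive(Debug, Clone, PartialEq)]
pub enum RESP<'a> {
    SimpleString(Cow<'a, str>),
}

impl<'a> RESP<'a> {
    /// Hypothetical helper: copies any borrowed data so the result
    /// no longer depends on the lifetime of the input buffer.
    pub fn into_owned(self) -> RESP<'static> {
        match self {
            RESP::SimpleString(s) => RESP::SimpleString(Cow::Owned(s.into_owned())),
        }
    }
}

/// Hypothetical stand-in for the real parser: handles only "+<text>\r\n"
/// and borrows the text from `buf` without copying.
pub fn parse_simple(buf: &[u8]) -> Option<RESP<'_>> {
    let text = std::str::from_utf8(buf).ok()?;
    let body = text.strip_prefix('+')?.strip_suffix("\r\n")?;
    Some(RESP::SimpleString(Cow::Borrowed(body)))
}
```

Parsing stays zero-copy by default, and a caller who needs the AST to outlive the buffer calls into_owned() once and pays for the copy only then.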

I've read a bit about Cow and its ability to house both an owned and a borrowed value. Though this is somewhat what I want, it doesn't feel like its intended use case. Having an additional indirection when constructing the value seems like overkill.

As a library writer, is it a good idea to sprinkle your AST with Cow so that some users get the ability to clone and own their previously borrowed strings? I can't articulate why this feels a bit off to me.

I feel like what I want is a way to say <T: StringTrait> which only accepts String and &str for T.

To clarify, AFAICT this is 100% Cow's intended use case, and it does not introduce any additional indirections / pointer dereferences. What it does introduce at runtime is a bool (plus padding) to track the "am I owned or still just borrowed?" state, and of course some branching on that bool. Just like any other enum.

What's a little unclear to me from your post is whether you would ever want one of these values to change from borrowed to owned at runtime. If that never happens, then perhaps the prominence of that operation in Cow's API is why Cow seems off somehow. But if I'm reading your original post correctly, even if you never make that change, you still don't know until runtime whether you'll be creating the borrowed or the owned variant, which means you'll need an enum equivalent to Cow anyway.

But, if you do know at compile time whether you'll be creating a String or a &str, then the trait you're looking for is probably <T: AsRef<str>>: https://doc.rust-lang.org/std/convert/trait.AsRef.html
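As a rough sketch of the AsRef<str> route, assuming the enum is made generic over its payload type: the payload_len function below is a hypothetical example, not part of any existing library, and only one variant is shown.

```rust
/// Sketch: the payload type is chosen at compile time.
#[derive(Debug, Clone, PartialEq)]
pub enum RESP<T> {
    SimpleString(T),
}

/// Hypothetical function that accepts both RESP<&str> and RESP<String>,
/// because both &str and String implement AsRef<str>.
pub fn payload_len<T: AsRef<str>>(resp: &RESP<T>) -> usize {
    match resp {
        RESP::SimpleString(s) => s.as_ref().len(),
    }
}
```

The function body only ever sees a &str via as_ref(), so it does not care which of the two payload types the caller picked.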

I implemented the Cow approach in a branch and it works fine. The only issue is that it litters constructors with Borrow.

But, if you do know at compile time whether you'll be creating a String or a &str, then the trait you're looking for is probably <T: AsRef<str>>: https://doc.rust-lang.org/std/convert/trait.AsRef.html

In this case, I'd expect the user of the library to know whether or not they'd want to clone the AST.
But I can imagine cases where they might want to do it dynamically too...

I can imagine creating a to_string() method on RESP<T: AsRef<str>> which converts a RESP<&str> into a RESP<String>. Gonna try hacking that up and comparing the two approaches.
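A quick sketch of how that conversion might look, under the assumption that the enum is generic over T. The method is named to_owned_resp here (a made-up name) to avoid confusion with ToString::to_string, and only one variant is shown.

```rust
/// Sketch of the generic AST; other variants are omitted.
#[derive(Debug, Clone, PartialEq)]
pub enum RESP<T> {
    SimpleString(T),
}

impl<T: AsRef<str>> RESP<T> {
    /// Hypothetical conversion: copies the payload into a String,
    /// so the result is independent of the source buffer.
    pub fn to_owned_resp(&self) -> RESP<String> {
        match self {
            RESP::SimpleString(s) => RESP::SimpleString(s.as_ref().to_owned()),
        }
    }
}
```

Compared with Cow, the owned-versus-borrowed decision is fixed in the type, so there is no runtime branch, but RESP<&str> and RESP<String> are distinct types and cannot be stored together without another wrapper.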

If you're still not sure, note that serde_json can deserialize a JSON string into Cow<'a, str> in order to handle escaped strings properly.
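To illustrate why escaping makes the borrowed-or-owned choice a runtime one, here is a minimal sketch. The unescape function is hypothetical and handles only \n and \\; it is not how serde_json is implemented, just the general pattern.

```rust
use std::borrow::Cow;

/// Hypothetical unescaper: borrows the input when there is nothing to
/// unescape, and allocates a new String only when an escape is present.
pub fn unescape(input: &str) -> Cow<'_, str> {
    if !input.contains('\\') {
        // Nothing to rewrite, so the original slice can be reused as-is.
        return Cow::Borrowed(input);
    }
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            Some('n') => out.push('\n'),
            Some('\\') => out.push('\\'),
            // Unknown escape: keep it verbatim (an arbitrary choice for this sketch).
            Some(other) => {
                out.push('\\');
                out.push(other);
            }
            None => out.push('\\'),
        }
    }
    Cow::Owned(out)
}
```

Whether the result borrows or owns depends on the input data, which is the situation where an enum like Cow is hard to avoid.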