How to choose an owned vs. borrowed struct member

Rust newbie and generally a pointer newbie here, so please excuse if this is obvious.

I am trying to wrap my head around when to choose an owned vs. a borrowed struct member. Let's make this concrete. Suppose I have a struct like this:

pub struct Endpoint {
    pub uri: String,
    pub methods: Vec<String>, // e.g. "GET", "POST"
}
impl Endpoint {
    pub fn new(uri: String, methods: Vec<String>) -> Endpoint {
        Endpoint {
            uri: normalize_uri(uri),
            methods,
        }
    }
...

The struct may be instantiated from the command line, so we don't know what the uri is at compile time. However, when we instantiate it, we normalize the uri with a normalize_uri function. I don't expect the uri to be written to after this. For simplicity's sake, the normalize_uri function looks something like this:

use regex::Regex;

fn normalize_uri(uri: String) -> String {
    let re = Regex::new(r"^/|/$").unwrap();
    re.replace_all(&uri, "").to_string()
}

replace_all returns Cow<'t, str>. As I said I'm not at all experienced in any of this, but I assume replace_all returns this "Cow" to give the caller flexibility as to whether the returned data is owned or borrowed.

So, now we have an Endpoint struct with a uri member that has to come in as an allocated String, which is passed to a function that calls another function that returns a Cow, which is then allocated again (?) by calling to_string(). This seems neither right nor efficient to me. Though perhaps there is no extra allocation when to_string is called on a Cow<str>?

Now in this case, extra heap allocations (if there are indeed any) probably don't matter, because the vast majority of the time and resources this program spends will go to network IO. Regardless, I am using this as a learning experience, so I am curious about best practices and idiomatic Rust. TIA.


In general, you should avoid references for anything that will live longer than a few function calls unless you have a specific reason otherwise (benchmark results, safety, etc.). That tends to be the inflection point where the complexity of handling lifetimes becomes more trouble than it's worth.


In this case, it's the replace_all implementation that chooses: it won't go to the trouble of allocating a new String unless it actually finds something to replace (and maybe in a few other situations). If you don't need to keep the result long-term, you can just use the Cow like a &str. If you want to store it for later, you'll usually want to convert it to a String.

Also, in the case of your normalize_uri function, you have to convert it to an owned String: The Cow returned from replace_all potentially refers to the internal buffer of uri, which gets dropped at the end of the function.
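
To make the two uses concrete, here is a minimal sketch. It uses only the standard library, so collapse_slashes is a hypothetical stand-in for the regex call (it rewrites "//" to "/" instead of trimming), and is_root is likewise an invented helper:

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for `Regex::replace_all`: rewrites each "//" to "/"
/// and only allocates when the input actually contains one.
fn collapse_slashes(uri: &str) -> Cow<'_, str> {
    if uri.contains("//") {
        Cow::Owned(uri.replace("//", "/"))
    } else {
        Cow::Borrowed(uri)
    }
}

/// Short-lived use: the `Cow` derefs to `str`, so it can be read like a `&str`.
fn is_root(uri: &str) -> bool {
    let cleaned = collapse_slashes(uri);
    &*cleaned == "/"
}

/// Long-lived use: the result must outlive the local `uri`, so it is
/// converted to an owned `String` before the function returns.
fn normalize_uri(uri: String) -> String {
    // `into_owned` hands back the inner `String` unchanged for `Cow::Owned`
    // and copies only in the `Cow::Borrowed` case.
    collapse_slashes(&uri).into_owned()
}
```

On the original question: to_string() on a Cow<str> builds a fresh String even when the Cow is already Owned, while into_owned() reuses the existing String in that case, so into_owned() avoids the second allocation.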


So for my context, what I have chosen to do is "fine"?

Yes, your current approach is fine. The copying would only really become an issue if the URIs were many kilobytes long, which is rarely the case.


First, remember that "references in structs [are] the evil-hardmode of Rust that will ruin your day."

But if you'd like some more nuanced guidelines, you might find this thread interesting:

As a specific place where you might lean slightly less on owned data, it's plausible that the methods are commonly string literals. So you could consider something like

pub struct Endpoint {
    pub uri: String,
    pub methods: Vec<Cow<'static, str>>, // e.g. "GET", "POST"
}

As that way you wouldn't need to allocate for common things.
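
A rough sketch of how such a field could be filled, assuming a hypothetical constructor named endpoint_with_extra that mixes a literal with a method name only known at runtime:

```rust
use std::borrow::Cow;

pub struct Endpoint {
    pub uri: String,
    pub methods: Vec<Cow<'static, str>>,
}

/// Hypothetical constructor: combines a compile-time literal with a method
/// name that only becomes known at runtime (say, from a command-line argument).
pub fn endpoint_with_extra(uri: String, extra_method: String) -> Endpoint {
    Endpoint {
        uri,
        methods: vec![
            Cow::Borrowed("GET"),     // points at the literal, no allocation
            Cow::Owned(extra_method), // takes over the existing String
        ],
    }
}
```

Both variants deref to str, so code that reads the methods does not need to care which one it got.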


It might be worth changing methods: Vec<String> to something less stringly typed. As a bonus, that should also eliminate borrow vs ownership issues for the method strings.

use std::fmt;

#[derive(Clone, Copy)]
pub enum EndpointMethod {
    Get,
    Post,
    // ...
}

// Implementing fmt::Display is probably not the right way to expose the HTTP
// method. You'll likely want something more specific to your API, but this
// example should demonstrate the idea of how you can separate the typed
// representation of your endpoint method from how it's rendered as a string.
impl fmt::Display for EndpointMethod {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            EndpointMethod::Get => write!(f, "GET"),
            EndpointMethod::Post => write!(f, "POST"),
            // ...
        }
    }
}

fn main() {
    println!("{} {}", EndpointMethod::Get, EndpointMethod::Post);
}
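
Since the struct may be built from command-line input, the reverse direction (text to enum) could be sketched with FromStr. The String error type and the exact method spellings accepted here are assumptions made for illustration:

```rust
use std::str::FromStr;

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum EndpointMethod {
    Get,
    Post,
}

impl FromStr for EndpointMethod {
    // A plain String error keeps the sketch short; a real API would
    // probably define its own error type.
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "GET" => Ok(EndpointMethod::Get),
            "POST" => Ok(EndpointMethod::Post),
            other => Err(format!("unsupported method: {}", other)),
        }
    }
}
```

With this in place, "GET".parse::<EndpointMethod>() yields the typed value, and the parsed enum is Copy, so no ownership question arises for it.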

You are right. I was thinking of an enum. But actually, methods should be a hashmap from a method enum to payload structs.

First, I wanted to say how great the Rust community is to newcomers. Not a snob in sight. Also, the compiler is really awesome. I have never seen such user-friendly messages from a compiler. It's almost like having a personal tutor.

Now, let's say I want to add a field on Endpoint, description. This is a static string that we know at compile time:

pub struct Endpoint {
    // ...
    pub description: &'static str,
}

Questions:

  1. If a string is known at compile time, does that mean it should always be typed as a static string? Should it always live as long as the program? The other option is parameterizing with a non-static lifetime, like: description: &'a str
  2. I assume the Rust community makes trade-offs based on real-life considerations. For example, if we are working in a non-memory-sensitive scenario, do folks make "unnecessary" string allocations to avoid the propagation of explicit lifetime typing? So, for example, we might type it as a String just for readability/simplicity's sake? I ask this because I see Rust as quite useful beyond its core performance-sensitive, memory-restricted domain. It seems very well designed and safe in areas apart from strict memory safety and avoidance of undefined behavior.

If it is a string literal, then it is a &'static str.

It will live as long as the program, since it is baked into the executable. Any references to it can live as short as they want to, of course.
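
As a small sketch of both points (USERS_DESCRIPTION, first_word and users_endpoint are hypothetical names): a &'static str field needs no lifetime parameter on the struct, and the same value can still be passed wherever a shorter-lived &'a str is expected.

```rust
pub struct Endpoint {
    pub uri: String,
    // 'static is a concrete lifetime, so the struct needs no lifetime parameter.
    pub description: &'static str,
}

// Hypothetical constant for illustration; string literals are &'static str.
const USERS_DESCRIPTION: &str = "Lists registered users";

/// Accepts a borrow of any lifetime; a `&'static str` can always be passed
/// where a shorter-lived `&'a str` is expected.
fn first_word<'a>(text: &'a str) -> &'a str {
    text.split_whitespace().next().unwrap_or("")
}

pub fn users_endpoint() -> Endpoint {
    Endpoint {
        uri: String::from("users"),
        description: USERS_DESCRIPTION,
    }
}
```

first_word works the same on e.description and on a borrow of a runtime String, which is why typing the field as &'static str does not restrict how readers of the field use it.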

Probably yes. "Clone to satisfy the borrow checker" is sometimes thought of as an antipattern, but in practice this is not a problem if one understands what clone() does.
If the strings in question are immutable (but owned, not statically stored), one might want to reach for Rc<str> or Arc<str> instead - this will allow for cheap cloning, since there's no need to clone the string itself, just a pointer to it.
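
A minimal sketch of that cheap cloning, with fan_out as a hypothetical helper: the bytes are copied once into the reference-counted allocation, and every later clone only bumps a counter.

```rust
use std::rc::Rc;

/// Hypothetical helper: shares one runtime string among `n` owners.
fn fan_out(text: String, n: usize) -> Vec<Rc<str>> {
    // One copy of the bytes into the reference-counted allocation.
    let shared: Rc<str> = Rc::from(text);
    // Each clone copies a pointer and increments the count, not the string.
    (0..n).map(|_| Rc::clone(&shared)).collect()
}
```

Swapping Rc for Arc (from std::sync) gives the same shape for data shared across threads.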


Oh we have snobs here: it's just about things like "crashing is bad" and "error messages shouldn't suck" :yum:


So, let's say that we have a struct, but we don't know in advance how it should be instantiated (if that's the right word). It could be used by another library, or it could be populated by a yaml file at run-time.

This is a case where we really don't know if the string-like fields are static or should be dynamically allocated. In this case is Cow the answer?

If you're not sure, use String. It's simple and fast enough for most use cases - that's why it has such a generic name.


Cow<'a, str> is pretty much an enum of either &'a str or String, with all the costs and benefits of that, and it tends to be a special case. When 'a is 'static, it's more generally useful, but still not likely to actually be all that much better than just using a String.

It's intended for situations where:

  • you're trying to be a zero cost library,
  • where not needing to allocate is the majority situation, like parsing string literals or the like where most of the time you are just going to point into the source,
  • but sometimes there's an escape sequence so you need to allocate,
  • and also the user of the library is not going to need to just get a String anyway: perhaps they are only comparing to other strs, or interning into their own compact string table, or whatever.

I actually haven't seen it used outside of parsing specifically, but there's no reason you couldn't use it for other things.
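
The parsing situation described above might look like the following sketch. unescape is a hypothetical function with deliberately simplified escape rules (only \n is special; any other escaped character is taken literally):

```rust
use std::borrow::Cow;

/// Hypothetical unescaper: borrows from the source in the common case and
/// allocates only when an escape sequence is present.
fn unescape(src: &str) -> Cow<'_, str> {
    if !src.contains('\\') {
        // Majority case: point straight into the source, no allocation.
        return Cow::Borrowed(src);
    }
    let mut out = String::with_capacity(src.len());
    let mut chars = src.chars();
    while let Some(c) = chars.next() {
        if c == '\\' {
            match chars.next() {
                Some('n') => out.push('\n'),
                Some(other) => out.push(other), // e.g. `\\` becomes `\`
                None => out.push('\\'),         // trailing backslash kept as-is
            }
        } else {
            out.push(c);
        }
    }
    Cow::Owned(out)
}
```

A caller that only compares the result against other strs never pays for an allocation on unescaped input; a caller that needs a String anyway can call into_owned().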


Others have great replies. I recently went through this exercise, unsuccessfully trying to avoid String copies. Short of dynamically producing slices on the fly or resorting to ref-counting, I couldn't find a Java/C-like approach (and as I think about it, I'm not sure how the borrow checker could prove I'm not mutating the peer structure away).

But for the Cow, I did find that it's a great use for return of static/dynamic error codes.

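use std::borrow::Cow::{self, Borrowed, Owned};

// ZERO_VALUE and INVALID_VALUE are errno-style u8 constants defined elsewhere.
fn foo(input: u8) -> Result<(), (u8, Cow<'static, str>)> {
    match input {
        0 => Err((ZERO_VALUE, Borrowed("zero value"))),
        1 => Ok(()),
        _ => Err((INVALID_VALUE, Owned(format!("invalid value: {}", input)))),
    }
}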

Granted, performance in the error case isn't critical, but I'm writing a C-API layer, so I needed errno + err msg, and I love that I can produce dynamic error messages in the above way (of course it requires the C API to 'close' all returned message objects so Rust can drop stuff). But the above is fast enough that for COMMON static error messages there is zero allocation - for number validators this is money! You've got a 50/50 shot that a given int will parse correctly, so avoiding the heap just to say Owned(format!("'{}' isn't valid", value)) is a big performance difference.

As others said, you can easily have

fn decode(msg: &str) -> Cow<'static, str> {
    match msg {
        "POST" => Borrowed("POST"),
        "GET" => Borrowed("GET"),
        _ => Owned(String::from(msg)), // custom is more expensive
    }
}

though enums with a custom extra might be a better usability choice

enum ReqType { GET, POST, UNKNOWN(String) }
fn decode(msg: &str) -> ReqType {
    match msg {
        "GET" => ReqType::GET,
        "POST" => ReqType::POST,
        _ => ReqType::UNKNOWN(String::from(msg)),
    }
}

To add to this

For example, if we are working in a non-memory sensitive scenario, do folks make "unnecessary" string allocations to avoid the propagation of explicit life time typing?

If performance isn't the issue, then use Rc or Arc to avoid borrow checker issues. I admit, I reach for clone() before Rc myself for no measurably good reason; I'm just allergic to Python-style reference counting. Back in my Perl days there was a well-documented bug-like use case where circular reference-counted rings produced MASSIVE memory leaks (on the order of gigabytes; in the 1990s that was a lot). So I always try to avoid Rcs if possible. Clearly an Rc<String> can't exhibit that bug, but I'm algorithmically prejudiced at this point, I guess. :slight_smile:

On rethinking, the Rc only "costs" two extra words of storage (the strong and weak counts) and 2 more drop if-statements (for 1 clone of an Rc), which is more or less a fixed cost, vs. a String::clone, which is unbounded in cost and requires extra heap allocations beyond the initial Rc.

I think maybe my other hesitation is the extra indirection that an Rc forces. You essentially have a Box'd reference to an Rc+String struct with a raw pointer to the dynamically sized string. So 2 loads to get the first byte, vs. a String or &str, where the struct that points to the 1st byte of the string is already in your local registers. But, again, you said "if performance wasn't an issue".

Agreed that String is far more readable. I know that I have a minor panic attack when I look at my old code with generic lifetimes and associated references. I think: will this change break things??? And I've got about a 50/50 chance these days (String always fixes everything, but it always feels wrong).

For the immutable case, use Rc<str> (or Arc<str>) not Rc<String> to remove some indirection.


Rc<str> - does that work practically? You need a lifetime specifier unless it's static, and I can't think of a situation where that wouldn't constrain the Rc clone's lifetime as well. (But I'm still new to all the power-lifetime use cases :slight_smile: - I'm still in awe of GhostCell.)

You don't need a lifetime because it is owned, not borrowed. Rc and Arc are "owning pointers", like Box and (by some definition of pointer) Vec. That's why their signature is Rc<T>, not Rc<'a, T>; note also that str does not have a lifetime either (in contrast with &'_ str). The memory will be freed when the refcount falls to zero.
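
A sketch of what that means in practice, with hypothetical names throughout: the struct carries no lifetime parameter even though its text is built at runtime, and a Weak observer shows the point at which the last owner goes away.

```rust
use std::rc::{Rc, Weak};

// No lifetime parameter: the struct owns a share of the string data.
pub struct Endpoint {
    pub description: Rc<str>,
}

/// Hypothetical constructor: the text is built at runtime and moved into
/// the Rc, so nothing borrowed escapes this function.
pub fn endpoint_from_runtime_text(prefix: &str) -> Endpoint {
    let text = format!("{} endpoint", prefix);
    Endpoint { description: Rc::from(text) }
}

/// Returns true if the data stayed reachable while one owner remained and
/// became unreachable once the last owner was dropped.
pub fn released_after_last_owner() -> bool {
    let endpoint = endpoint_from_runtime_text("users");
    let second_owner = Rc::clone(&endpoint.description);
    let observer: Weak<str> = Rc::downgrade(&second_owner);

    drop(endpoint); // strong count 2 -> 1
    let alive_with_one_owner = observer.upgrade().is_some();

    drop(second_owner); // strong count 1 -> 0
    alive_with_one_owner && observer.upgrade().is_none()
}
```

The Weak is only there to observe the count reaching zero; ordinary code would just hold and clone the Rc<str>.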

