Do I want a Cow?

I have a small template cache thing that holds strings (we'll get to that in a bit) and allows re-rendering them with new data.

In the first version, I figured the template strings could be hardcoded, so it's all just &'static str (and slices of that).

However, now I want to be able to optionally load the templates at runtime, so I have a String in addition to a &'static str.

Do I want to use Cow for this? From reading up on it a bit, it seems the docs mainly recommend it for return types... but is that just the usual case, or am I thinking about this all wrong?

The docs may emphasize its use in specific places, but it's still just a type much like any other. If it makes sense to use it as a member of another type, go right ahead and do so.

Since you're using &'static str as the borrowed type, this doesn't have to infect your container type with a lifetime, either.
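As a rough sketch of that point (the Template type, its source field and the new constructor are hypothetical names, not the original poster's code), a container can hold a Cow<'static, str> without any lifetime parameter of its own, and accept either a literal or a loaded String through the standard From impls:

```rust
use std::borrow::Cow;

// Hypothetical template type: the borrowed case is always `&'static str`,
// so the struct itself needs no lifetime parameter.
struct Template {
    source: Cow<'static, str>,
}

impl Template {
    // Accepts either a `&'static str` (stored as `Cow::Borrowed`, no allocation)
    // or a `String` (stored as `Cow::Owned`), via the standard `From` impls.
    fn new(source: impl Into<Cow<'static, str>>) -> Self {
        Template { source: source.into() }
    }

    // `Cow<str>` derefs to `str`, so callers never need to know which variant it is.
    fn source(&self) -> &str {
        &self.source
    }
}

fn main() {
    let builtin = Template::new("Hello, {name}!");
    let loaded = Template::new(String::from("Goodbye, {name}!"));
    println!("{}", builtin.source());
    println!("{}", loaded.source());
}
```

Because both &'static str and String implement Into<Cow<'static, str>>, one constructor covers both cases, and the hardcoded path never allocates.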

Yes, Cow<'static, str> is a good way to store a string literal that may be replaced with a dynamically-loaded string.

Your use case is pretty much the definition of Cow<'static, str>. :cow:
I would normally consider writing an enum for a use case like this, but Cow is exactly that enum.
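For comparison, a hand-written enum for this case might look like the following (TemplateSource and as_str are hypothetical names). Cow<'static, str> has essentially this shape, and additionally comes with Deref to str, Clone, From conversions, and methods such as into_owned and to_mut:

```rust
// Hypothetical hand-written enum covering the same two cases as Cow<'static, str>.
enum TemplateSource {
    Static(&'static str), // hardcoded template
    Loaded(String),       // template loaded at runtime
}

impl TemplateSource {
    // Both variants can be viewed as a plain `&str`.
    fn as_str(&self) -> &str {
        match self {
            TemplateSource::Static(s) => *s,
            TemplateSource::Loaded(s) => s.as_str(),
        }
    }
}

fn main() {
    let sources = vec![
        TemplateSource::Static("Hello, {name}!"),
        TemplateSource::Loaded(String::from("Goodbye, {name}!")),
    ];
    for s in &sources {
        println!("{}", s.as_str());
    }
}
```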

So I'm running into trouble...

In this playground, the basic usage of accepting either one works fine.

But then I try to hold a slice of the input too, and that's where I get stuck. I don't quite follow this... surely the slice lives as long as the original, since it's taken from it?

use std::borrow::Cow;

struct Template<'a> {
    original: Cow<'a, str>,
    slice: &'a str,
}

impl<'a> Template<'a> {
    pub fn new_string(s: String) -> Self {
        Self::new_cow(Cow::Owned(s))
    }

    pub fn new_str(s: &'a str) -> Self {
        Self::new_cow(Cow::Borrowed(s))
    }

    pub fn new_cow(s: Cow<'a, str>) -> Self {
        // Does not compile: `slice` borrows from the local `s`,
        // which is then moved into the struct while still borrowed.
        let slice = &s[..];
        Self { original: s, slice }
    }

    pub fn render(&self) -> String {
        self.original.to_string()
    }
}

fn main() {
    println!("{}", Template::new_string("hello".to_string()).render());
    println!("{}", Template::new_str("world").render());
}

You might also ask yourself whether there is any reason not to just leak the dynamically loaded strings. It sounds like a use case where that might be reasonable, and it would allow you to stick with &'static str. If this is a "read on startup" kind of loading, then these strings won't get freed until the program exits in any case, so leaking them would be harmless.
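A minimal sketch of the leaking approach, assuming the loaded template arrives as a String (leak_template is a hypothetical helper). Box::leak turns the heap allocation into a &'static str, so the cache can keep its original &'static str type:

```rust
// Hypothetical helper: turn a runtime-loaded `String` into a `&'static str`
// by leaking it. The memory is never freed before the process exits.
fn leak_template(contents: String) -> &'static str {
    // `into_boxed_str` drops any spare capacity first; `Box::leak` then
    // gives up ownership and hands back a reference that lives forever.
    Box::leak(contents.into_boxed_str())
}

fn main() {
    let hardcoded: &'static str = "Hello, {name}!";
    // In real code this would come from e.g. `std::fs::read_to_string`.
    let loaded: &'static str = leak_template(String::from("Goodbye, {name}!"));
    // Both have the same type, so the cache can keep storing `&'static str`.
    let templates: Vec<&'static str> = vec![hardcoded, loaded];
    for t in &templates {
        println!("{t}");
    }
}
```

On Rust 1.72 and later, String::leak does the same in one call, but it does not shrink the allocation, so any unused capacity is leaked along with the text.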

@droundy - thanks, I put that on the playground here for the sake of easy reference :slight_smile:

This scares me a bit... but maybe it shouldn't...

It looks like you're trying to make a self-referential struct in this code, which is usually more trouble than it's worth in Rust. Both original and slice have the same lifetime parameter, but it doesn't mean that one is borrowed from the other; the compiler won't stop you from making another function that swaps out original, which would be an issue if slice referred to it.
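One common way around the self-referential borrow, sketched here under the assumption that the slice can be described by byte offsets, is to store a Range<usize> instead of a &str and re-slice original whenever the slice is needed. The rule for choosing the slice below (everything after the first space) is purely illustrative:

```rust
use std::borrow::Cow;
use std::ops::Range;

// Instead of a `&str` that points into `original` (which would make the
// struct self-referential), store byte offsets and re-slice on demand.
struct Template<'a> {
    original: Cow<'a, str>,
    slice: Range<usize>,
}

impl<'a> Template<'a> {
    fn new(s: impl Into<Cow<'a, str>>) -> Self {
        let original = s.into();
        // Hypothetical choice of slice: everything after the first space
        // (or the whole string if there is no space).
        let start = original.find(' ').map_or(0, |i| i + 1);
        let slice = start..original.len();
        Template { original, slice }
    }

    // The returned `&str` borrows from `self`, which the compiler can check.
    fn slice(&self) -> &str {
        &self.original[self.slice.clone()]
    }
}

fn main() {
    let borrowed = Template::new("hello world");
    let owned = Template::new(String::from("goodbye moon"));
    println!("{}", borrowed.slice()); // world
    println!("{}", owned.slice()); // moon
}
```

Since slice() borrows from &self, the returned &str is tied to the struct. If original were ever swapped out, the stored range could only become stale (making the indexing panic), never dangling.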

Why would that be a lifetime problem, since original must still be alive (otherwise it wouldn't have the same lifetime as the slice)?

An owned Cow would get dropped along with its inner String when it's swapped out. This is something I'm a bit uncertain about as well, so I wrote an example to see what actually happens:

use std::borrow::Cow;

#[derive(Clone)]
struct DropString(String);

impl Drop for DropString {
    fn drop(&mut self) {
        println!("dropped {}", self.0)
    }
}

struct Template<'a> {
    original: Cow<'a, DropString>,
    slice: &'a str,
}

fn main() {
    let original = DropString("original".to_string());
    let mut template = Template {
        original: Cow::Owned(original),
        slice: "sliced",
    };
    // Assigning a new value drops the old owned DropString right here.
    template.original = Cow::Owned(DropString("new string".to_string()));
    println!("doing stuff with {}", template.slice);
}

prints out

dropped original
doing stuff with sliced
dropped new string

I believe the bigger issue with self-referential structs is that moving the struct would invalidate the references it contains, but I understand the specifics of that even less, so I went with something that was more intuitive to me :sweat_smile:

Ohh, I think I see... my mental model was &'a Cow<'a, str> instead of Cow<'a, str>, if I understand correctly...
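A small sketch of that distinction (borrowed_part and any_part are hypothetical helper names): the 'a in Cow<'a, str> only describes the Borrowed variant. A &'a str can be copied out of that variant independently of the Cow, but slicing through Deref always borrows from the Cow itself, which is what the &s[..] in new_cow did:

```rust
use std::borrow::Cow;

// Only the `Borrowed` variant holds a `&'a str`; copying it out gives a
// reference that does not depend on the Cow staying alive.
fn borrowed_part<'a>(c: &Cow<'a, str>) -> Option<&'a str> {
    match c {
        Cow::Borrowed(s) => Some(*s), // copies the `&'a str` out of the Cow
        Cow::Owned(_) => None,        // the String's data lives inside the Cow itself
    }
}

// Slicing through `Deref` borrows from the Cow, so the result is tied to
// the `&'c` borrow of the Cow, not to the Cow's own `'a` parameter.
fn any_part<'c>(c: &'c Cow<'_, str>) -> &'c str {
    &c[..]
}

fn main() {
    let b: Cow<'static, str> = Cow::Borrowed("static text");
    let o: Cow<'static, str> = Cow::Owned(String::from("owned text"));
    let keep: Option<&'static str> = borrowed_part(&b);
    drop(b); // fine: `keep` does not borrow from `b`
    println!("{:?}", keep);
    println!("{:?}", borrowed_part(&o));
    println!("{}", any_part(&o));
}
```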

Instead of leaking, think of it like transferring ownership to the OS: it’ll still get cleaned up at program exit, just not through the same mechanism.

But according to that definition, user-space memleaks don't exist. And since they clearly do...

I haven’t seen unbounded memory usage in a long time that wasn’t automatically fixed when the program terminated. This is a service that the OS provides, and there’s no shame in using it judiciously. As long as your program’s memory footprint stays a manageable size throughout its execution, what harm does a “memory leak” cause?

Leaking resources is generally problematic with long-running services, particularly when the leaks are unbounded (because all resources are finite). There are always exceptions. I think leaking data that is intended to be globally accessible or effectively static is a good example of such an exception.

I see a few issues with that:

  1. Empirically, small successful tools tend not to stay small, especially if staying small wasn't an explicit goal, as it was with coreutils. If the author of such a tool decides at some point to leak memory and doesn't clearly communicate that fact out-of-band*, another author can easily overlook it and blow the leaks up to problematic proportions while implementing a new feature.
  2. As a philosophical point: is it OK to perform some problematic action just because the scale at which it is done is small? I would argue that the nature of the issue doesn't change, so no. To use an admittedly extreme example: would it be OK to scream at / steal from / murder a person just because you don't do it to many people? I would argue not.
  3. This line of reasoning, when applied to UB, leads to such claims as "this code (which is technically UB) is fine because the problematic situation in which UB is triggered doesn't actually occur". I'd still call that buggy code.

So if the code we're talking about is some personal, never-will-be-shared, instantly-deprecated thing, ok fine go right ahead.
If it's something that is meant to go into production or even more importantly, if others will work on it too, then I would advise against leaking the memory.

*Out of band communication of intentional memleaks is necessary because the actual leak itself is often quite subtle, and easily missed even when read multiple times. But, as such communication requires extra effort, this is generally unlikely to happen.
On top of all that, memleaks generally are not expected in Rust, even though the language doesn't (claim to) prevent them.

Everything has both benefits and drawbacks, so there isn’t a clear distinction between problematic and beneficial actions. The world is a complicated place, and I find a situational cost/benefit analysis to be a better framework for decision making (taking externalities into account, of course).

Automobiles, for example, are a leading cause of death in many countries but we, as a society, tolerate them because of the benefits they bring. Does that make driving a problematic activity that must be eliminated?


That’s the justification used for just about every unsafe block of code written, including the ones in the standard library. The only thing that varies is the amount of effort that goes into verifying the claim that the potential UB is never actually triggered. The code is only buggy if the analysis is wrong.

Also, I don’t understand how relying on a well-documented and long-established operating system feature is analogous to maneuvering around behavior that’s explicitly forbidden. One is taking a shortcut through an alleyway and the other is walking close to a cliff edge.

Memory leaks are simply not a closed, coherent concept. It's a leaky abstraction, or at least, a highly contextual one -- one person's memory leak is just another person's long-lived allocation. Rust (and most other languages) don't have any reliable way to prevent memory leaks, because if you walk too quickly down that road, you end up trying to ban Turing completeness.

You can't just say "clearly memory leaks exist." What you mean is "clearly there are memory usage patterns that people find problematic." The point is, it is the people who find it to be a problem, it is not a fact intrinsic to the system. Trying to ban memory leaks is like trying to ban poetry from being printed to stdout.

I think it may be worth talking about the levels of memory patterns that are sometimes called leaking memory. I'll introduce them in order of badness, worst first:

  1. A leak in a loop that runs indefinitely. You allocate memory regularly, and fail to free it (even though it is not needed), so that your memory use increases monotonically until the machine runs out of swap space. This is bad news. In any modern language this is pretty rare (reference counting or garbage collection basically solves it). In C or old-style C++, it's almost universal in larger programs, due to the difficulty of manually identifying when memory is no longer needed.

  2. A finite number of allocations which aren't freed even though they are no longer needed, and which change the scaling of memory use, e.g. from O(n) to O(n^2). This is serious, because it limits the size of problem that can be solved. This kind of leak is particularly easy to introduce via recursion, and is one reason why the stack is still kept very small, to limit the impact of this kind of bug.

  3. Memory leaks that scale with the size of the problem, e.g. one that just doubles the peak memory use of the program. This may not be serious at all if the program is not memory-limited, but on the whole you'd rather avoid this sort of leak.

  4. Leaking memory that increases peak memory use by O(1). Unless the size happens to be huge, this is probably no problem.

  5. A memory leak in which memory fails to be freed when it is no longer used, but does not increase the peak memory use at all. This happens when the memory is still needed until after memory use has passed its high-water mark. It's very hard to imagine this causing a problem; it could only matter if the process keeps running basically indefinitely after allocating and failing to free a lot of memory.

  6. A program that fails to free some of its memory upon exit. This is actually a (very tiny) performance win, since freeing allocations on exit has zero benefit and just uses CPU time. The kernel has to tear down the page tables of the process regardless, which frees all its memory, and calling free on the memory does nothing to help.

I would encourage leaking where you are confident you are in scenarios 5 or 6 above (and it simplifies your code). If you're absolutely sure your case is in category 4 above and you know it's a small allocation, by all means leak if it makes the code significantly simpler. Simpler code is far more likely to be both performant and bug-free.
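As an illustration of a bounded, load-once leak of the kind described above, here is a sketch assuming the templates are loaded a single time at startup (TEMPLATES, load_templates, leak and templates are hypothetical names). std::sync::OnceLock (stable since Rust 1.70) ensures the loading and leaking happen at most once, so memory use does not grow while the program runs:

```rust
use std::sync::OnceLock;

// Hypothetical global template table, filled in once and never freed.
static TEMPLATES: OnceLock<Vec<&'static str>> = OnceLock::new();

// Stand-in for reading template files at startup (assumption).
fn load_templates() -> Vec<String> {
    vec![String::from("Hello, {name}!"), String::from("Goodbye, {name}!")]
}

// Leak one loaded string so it can be shared as `&'static str`.
fn leak(s: String) -> &'static str {
    Box::leak(s.into_boxed_str())
}

fn templates() -> &'static [&'static str] {
    // The closure runs at most once, so the total leaked memory is fixed
    // by what is loaded at startup rather than growing over time.
    TEMPLATES
        .get_or_init(|| load_templates().into_iter().map(leak).collect())
        .as_slice()
}

fn main() {
    let first = templates();
    let second = templates();
    // Same underlying slice both times: nothing new was loaded or leaked.
    assert!(std::ptr::eq(first, second));
    for t in first {
        println!("{t}");
    }
}
```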
