Are memory leaks considered safe in a long-running service in Rust?

Hi forum,

I do not understand the statement "memory leaks are memory safe in Rust" from this link: Reference Cycles Can Leak Memory - The Rust Programming Language

In a long-running program that is intended to be up for months or years, what happens if the program leaks memory continuously? Because manual reference counting is still needed, there is a chance of creating reference cycles by mistake in Rust.

For example, with the following bad example in C, memory will be exhausted and the program will go down not eventually, but very soon. I think the memory leak is not safe here.

  1. Is a memory leak really a memory leak in Rust? Do memory leaks even exist in Rust?

  2. Why are memory leaks safe in Rust, but not safe in other languages?

Thanks

// bad example
#include <stdlib.h>

for (;;) { // loop forever
    char *p = malloc(1024);

    // doing something...

    if (!bad_luck()) {
        free(p); // with bad luck, the free() is skipped and the block leaks
    }
}

Rust's reference counted pointer types perform the counting automatically. There is no need for manual reference counting.
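That automatic counting can be observed directly with Rc::strong_count from the standard library; a minimal sketch:

```rust
use std::rc::Rc;

fn main() {
    // The count starts at 1 when the Rc is created...
    let a = Rc::new(vec![1, 2, 3]);
    assert_eq!(Rc::strong_count(&a), 1);

    {
        // ...is incremented automatically by clone()...
        let b = Rc::clone(&a);
        assert_eq!(Rc::strong_count(&b), 2);
    } // ...and decremented automatically when `b` goes out of scope.

    assert_eq!(Rc::strong_count(&a), 1);
    // When `a` is dropped at the end of main, the count reaches 0 and
    // the Vec is freed -- no manual increment/decrement calls anywhere.
}
```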

Yes, absolutely.

Memory leaks are safe in other languages, too. They don't cause erroneous values to be read, they don't cause arbitrary/invalid pointers to be dereferenced, they don't cause race conditions, etc.

The fact that you can crash a program with resource exhaustion doesn't make it memory unsafe. You can just as well crash a program by making it use up too much disk/swap space, or by explicitly panic!()ing. That doesn't make these operations memory-unsafe.

23 Likes

More broadly, there’s a subtle distinction between correctness and safety. Programs that behave as @H2CO3 describes here are probably incorrect, in that they don’t behave how their author intended. No compiler can completely avoid producing buggy programs like this: short of reading the programmer’s mind, it can’t know what the programmer actually intended.

Safety is a more technical concept which concerns consistency. Some things that a program can do, like reading from a memory location before writing it, will interact with the rest of the computer system in unpredictable ways. This unpredictability makes the operations useless, as the program behavior will change due to factors outside the users' and programmer's control. Also, problems that arise from these operations are notoriously hard to debug, as the behavior might not happen during the debugging process. The "safety" of a programming language is a subjective measure of how difficult the language makes it for programmers to accidentally include these problematic operations.

16 Likes

The inability is even more fundamental than this on many modern platforms. Linux, for example, will overcommit memory by default, and there is no in-program defense against the OOM killer. That is, even with a fallible allocation model, guarding against crashes from memory exhaustion requires measures outside of the programming language itself. (And that is but one resource axis to consider.)

3 Likes

You can even go further than this.

Rust can't prevent you from unplugging the computer. Thus, arbitrarily killing the program at any point can't really be "unsafe", since there's nothing the programmer can do to prevent it.

So while Rust helps avoid a bunch of mistakes around handle leaks and such, you still need a watchdog to restart your Rust service if something bad enough happens to it -- memory leak, stack overflow, abort from a panic inside drop, etc.

5 Likes

I would like to point out that it's important to distinguish between "safe" and "safe". :grin:

It's a matter of terminology. In the Rust context, when we say "safe" (in contrast to unsafe), we mean that particular things are guaranteed not to happen (in particular, no erroneous access to memory; not sure what else?).

But when we colloquially speak of "safety" outside the Rust world, we usually mean something entirely different. Of course, an integer wraparound is dangerous (I'm trying to avoid saying "not safe" here, because in Rust it is considered safe):

fn fly_higher(height: &mut u16) {
    *height += 1000;
}

fn main() {
    let mut height: u16 = 65000;
    fly_higher(&mut height);
    if height < 50000 {
        println!(">>> SELF DESTRUCTION <<<")
    }
}

(Playground with profile "release")

Output:

>>> SELF DESTRUCTION <<<

The confusing thing is that in Rust a lot of dangerous things are "safe":

Behavior not considered unsafe

The Rust compiler does not consider the following behaviors unsafe, though a programmer may (should) find them undesirable, unexpected, or erroneous.

Deadlocks
Leaks of memory and other resources
Exiting without calling destructors
Exposing randomized base addresses through pointer leaks
Integer overflow

But of course, an integer overflow can destroy a rocket, shut down a life-support system, corrupt your file system, or cause your alarm clock to fail waking you up for your final exam.

In that sense, I find the wording "undesirable" or "unexpected" a bit of an understatement. :sweat_smile:

I particularly find the deadlock case tricky. But still, "safe" Rust protects us from a lot of bad things, particularly those things that corrupt memory and cause weird behavior at entirely different parts of our program. I spent a lot of time chasing bugs in (my own :see_no_evil:) C programs, where program behavior simply didn't make sense at all to me. It sometimes turned out to be a mistake in an entirely different section of the program, which corrupted memory that wasn't used until later. This is what "safe" Rust protects us from.

The problem, however, is that the word "safe" is usually used differently. (Edit: not really a problem, as long as we know which definition of "safe" we are using, of course.)

4 Likes

I think it might help to distinguish between three different possible meanings of "memory leak." The first is any case where allocated memory is inaccessible but hasn't been (and won't be) freed. The second is when memory may be accessible but is guaranteed never to be freed. The third is what you called a "continuous memory leak": memory is repeatedly allocated but not freed when it becomes inaccessible throughout the running of a program. Each of these scenarios is a memory leak, but they differ in severity.

In the first case, it's hard to consider it good. Why keep around something you can never use? But it might not be a problem if you never free a copy of your command line flags, for instance.

The second case is where leaking memory can be really useful, in either Rust or C/C++. It can allow you to have a single data structure shared across many others without the complexity and overhead of reference counting, and without the danger of use-after-free bugs showing up. Years back I used to use this approach in C++ (way before reference-counted pointers were in the STL) for data structures describing pseudopotentials for elements. You knew there could never be very many of them, and there was no need to free them. Yes, you could reference-count them, but why bother? In Rust a nicer use case would be zero-copy parsing of a buffer read from a file. If you know you're reading just the one file, you can avoid the nuisance of tracking lifetimes to ensure your buffer stays around by just calling Box::leak().
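The Box::leak() pattern can be sketched like this (a toy stand-in for the zero-copy-parsing use case; the load_config function is made up for illustration):

```rust
// Deliberately leak a buffer to obtain a &'static str, avoiding the need
// to thread a lifetime parameter through every type that borrows from it.
fn load_config() -> &'static str {
    // Imagine this String was read from a file at startup.
    let buffer = String::from("key=value");
    // Box::leak trades ownership away forever for a 'static reference.
    Box::leak(buffer.into_boxed_str())
}

fn main() {
    let config: &'static str = load_config();
    // `config` can now be stored anywhere without lifetime annotations;
    // the buffer is simply never freed -- a deliberate, one-time "leak".
    assert_eq!(config, "key=value");
}
```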

The third case, of course, is what gives memory leaks a bad name. In C, in moderately complex code, it is very hard to figure out the right time to free everything, and a memory leak is way nicer than a use-after-free bug. In Rust, however, it is pretty hard to accidentally introduce a repeated memory leak. Yes, you could do it with reference-counted cycles, but it's actually quite awkward to implement reference-counted cycles, since it requires interior mutability. I expect more leaks come from using Box::leak(), but those aren't likely to be accidental.
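To illustrate the awkwardness: here is roughly the minimum ceremony a leaking Rc cycle takes, needing RefCell (interior mutability) to close the loop after both nodes exist:

```rust
use std::cell::RefCell;
use std::rc::Rc;

struct Node {
    // An Rc alone can't be mutated once shared, so closing the cycle
    // requires interior mutability via RefCell.
    next: RefCell<Option<Rc<Node>>>,
}

fn main() {
    let a = Rc::new(Node { next: RefCell::new(None) });
    let b = Rc::new(Node { next: RefCell::new(Some(Rc::clone(&a))) });
    // Close the cycle: a -> b -> a.
    *a.next.borrow_mut() = Some(Rc::clone(&b));

    assert_eq!(Rc::strong_count(&a), 2);
    assert_eq!(Rc::strong_count(&b), 2);
    // When `a` and `b` go out of scope, each count only drops to 1 (the
    // reference held inside the cycle), so neither Node is ever freed --
    // a leak, but a memory-safe one.
}
```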

I will also add that there is another kind of bug that causes the same problem as a continuous memory leak in a long-running program: monotonically increasing memory use without a memory leak (i.e., all the memory will be freed before the program exits), or, in general, using algorithms with poor space scaling. This has the same downside as a repeated memory leak, and is probably more likely to occur in Rust code, at least for beginning programmers.

13 Likes

As a general guide, things are not unsafe so long as they maintain your capability to reason about the program. The worst thing about UB is that you can't understand the program any more. For example, if there's UB and your code says if a { println!("A"); }, then seeing A in the console doesn't guarantee that a is true.

You might not want all your financial details uploaded to pastebin, but doing so isn't unsafe because you can still understand and debug the flow of a program that does that.

4 Likes

Oh, right, if I'm not mistaken, then "safe" Rust (as in Rust's terminology) cannot exhibit undefined behavior, which is distinct from unspecified behavior.

I wasn't familiar with this distinction, but @H2CO3 explained them to me in the thread cited below. Basically, UB can do anything:

Thus, it would remove our "capability to reason about the program", as @scottmcm pointed out. Thanks for bringing this up, it made me remember the difference between undefined behavior and unspecified behavior (and other errors that don't fall within one of these categories but are still bad).

That said, there's still erroneous code that exhibits neither undefined behavior nor unspecified behavior but still does bad things (Playground).

I hope I'm right in saying that "safe" Rust (as in Rust's definition) can never exhibit undefined behavior, unless some unsound unsafe code is also involved.

"Safe" Rust code, however, can still have unspecified behavior or have logic errors/bugs even when all behavior is specified.

1 Like

Very good point. I tried to make a small example exhibiting such a "leak" without reference cycles or explicit leaking:

#[derive(Debug)]
pub struct BlurFilter {
    previous: Vec<f64>,
}

impl BlurFilter {
    pub fn new() -> Self {
        Self {
            previous: vec![0.0],
        }
    }
    pub fn blur(&mut self, value: f64) -> f64 {
        let blurred = (self.previous[self.previous.len() - 1] + value) / 2.0;
        // This "leaks" memory as the `Vec` will grow and grow:
        self.previous.push(blurred);
        blurred
    }
}

fn main() {
    let mut f = BlurFilter::new();
    println!("{}", f.blur(7.0));
    println!("{}", f.blur(100.0));
    println!("{}", f.blur(50.0));
    // Now imagine we do:
    for i in 0..100 {
        f.blur(i as f64);
    }
    // Then we have a lot of "garbage" in our memory:
    println!("Inspection: {f:?}");
    // Of course, in this example, dropping `f` would free the memory, but what
    // if we never drop it for the runtime of our program? Then this would lead
    // to continuous leaking, eventually ending in memory exhaustion (timeout
    // on Playground though):
    // loop {
    //     f.blur(12.0);
    // }
}

(Playground)

Output:

3.5
51.75
50.875
Inspection: BlurFilter { previous: [0.0, 3.5, 51.75, 50.875, 25.4375, 13.21875, 7.609375, 5.3046875, 4.65234375, 4.826171875, 5.4130859375, 6.20654296875, 7.103271484375, 8.0516357421875, 9.02581787109375, 10.012908935546875, 11.006454467773438, 12.003227233886719, 13.00161361694336, 14.00080680847168, 15.00040340423584, 16.00020170211792, 17.00010085105896, 18.00005042552948, 19.00002521276474, 20.00001260638237, 21.000006303191185, 22.000003151595592, 23.000001575797796, 24.000000787898898, 25.00000039394945, 26.000000196974725, 27.000000098487362, 28.00000004924368, 29.00000002462184, 30.00000001231092, 31.00000000615546, 32.00000000307773, 33.000000001538865, 34.00000000076943, 35.000000000384716, 36.00000000019236, 37.00000000009618, 38.00000000004809, 39.000000000024045, 40.00000000001202, 41.00000000000601, 42.000000000003006, 43.000000000001506, 44.00000000000075, 45.00000000000038, 46.000000000000185, 47.00000000000009, 48.00000000000004, 49.00000000000002, 50.000000000000014, 51.00000000000001, 52.0, 53.0, 54.0, 55.0, 56.0, 57.0, 58.0, 59.0, 60.0, 61.0, 62.0, 63.0, 64.0, 65.0, 66.0, 67.0, 68.0, 69.0, 70.0, 71.0, 72.0, 73.0, 74.0, 75.0, 76.0, 77.0, 78.0, 79.0, 80.0, 81.0, 82.0, 83.0, 84.0, 85.0, 86.0, 87.0, 88.0, 89.0, 90.0, 91.0, 92.0, 93.0, 94.0, 95.0, 96.0, 97.0, 98.0] }

The Wikipedia article on unspecified behavior states:

A program can be said to contain unspecified behavior when its source code may produce an executable that exhibits different behavior when compiled on a different compiler, or on the same compiler with different settings, or indeed in different parts of the same executable.

So not just "can happen". Any Rust program that relies on usize (or isize), directly or indirectly, is dependent upon unspecified behavior. That pretty much covers every Rust program, does it not?

One might say this is implementation defined rather than unspecified behavior, but I don't find anything to this effect in The Rust Reference.

I'm not sure whether the term "unspecified behavior" is well defined (in all contexts), and the above link is just a Wikipedia article :sweat_smile:.

An example of unspecified behavior in Rust is integer overflow in the release profile. Thus, the >>> SELF DESTRUCTION <<< example above isn't guaranteed to self-destruct. It might also panic:

Integer overflow

[…]

Other kinds of builds may result in panics or silently wrapped values on overflow, at the implementation's discretion.

Relying on usize may happen in different ways. A program might choose to abort (gracefully or via panic) when a calculation's result would exceed usize::MAX, for example. Or it might get a calculation wrong that then causes a chain reaction leading to a security vulnerability. Both cases, while being totally different in regard to severity, seem to be covered by the cited definition for "unspecified behavior". So maybe the definition of "unspecified behavior" is either not very helpful or perhaps incomplete?
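The "abort gracefully" option is what the standard checked arithmetic methods are for; a small sketch:

```rust
fn main() {
    let len: usize = usize::MAX - 10;
    let extra: usize = 100;

    // checked_add returns None instead of wrapping or panicking, so the
    // program decides explicitly what overflow should mean.
    match len.checked_add(extra) {
        Some(total) => println!("total size: {total}"),
        None => eprintln!("refusing: size would exceed usize::MAX"),
    }

    // saturating_add is another everywhere-defined alternative:
    assert_eq!(len.saturating_add(extra), usize::MAX);
}
```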

The Wikipedia article actually says:

Definition

[…]

The exact definition of unspecified behavior varies.

Maybe "unspecified behavior" and "implementation defined" behavior have a kinda fuzzy borderline that is drawn in regards to how "bad" the deviant behavior is (which would be pretty subjective, of course). But I don't really know how these terms are commonly used and/or defined.

In either case, unspecified behavior is still different from undefined behavior where everything could happen. So "undefined" is well-defined :stuck_out_tongue_winking_eye: by causing total breakdown of the capability to reason about a program (perhaps similar to "ex falso quodlibet").

1 Like

This is an extremely important and good point. Having switched to Rust from C for a lot of service-style systems, there are plenty of problems I no longer have to worry about. But that does not include

  • sending items to a channel but not draining them from the receiving end
  • opening file descriptors in low level code but leaving a code path that doesn't close them
  • sending a raw pointer from a Box over FFI, but not reboxing and dropping it when it's no longer needed

Etc. None of these things will corrupt memory, thwart the control flow of my program or destroy a peripheral. What they will do is make a service lock up or crash on a device that cannot be restarted without expensive manual intervention.
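The first bullet can be reproduced with a plain std::sync::mpsc channel: the sender keeps pushing, nobody drains, and the channel's internal buffer grows without bound, with no unsafe or forgotten free() anywhere. A small sketch:

```rust
use std::sync::mpsc;

fn main() {
    let (tx, rx) = mpsc::channel::<Vec<u8>>();

    // Producer side: keeps sending 1 KiB payloads...
    for _ in 0..1000 {
        tx.send(vec![0u8; 1024]).unwrap();
    }

    // ...while the consumer side never (or too rarely) calls recv().
    // Every queued Vec stays alive inside the channel -- in a real
    // service this runs for months and memory use only grows.
    // Draining the receiver releases it all:
    let drained = rx.try_iter().count();
    assert_eq!(drained, 1000);
}
```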

So when I get a report of a "memory leak" (or see what looks like one, myself), I know it may well have nothing to do with forgetting to call free(), and I keep an open mind about where to look for the problem.

2 Likes

One important point I haven't seen anyone state explicitly yet is that the very concept of a "memory leak" (or a "deadlock", or most of the other "safe but undesirable" behaviors) is fundamentally defined in part by the programmer's intent, which is why no programming language can categorically prevent it.

RAII / proper destructors do prevent the overwhelming majority of unintended leaks in practice, but if you create a data structure that lives for the entire duration of the program and then populate it with gigabytes of data, only you can decide if that was intended behavior or a "memory leak". Some programs (like rustc) legitimately need to do that.

Similarly, the difference between a "deadlock" and a very long-running bottleneck task that all the other tasks are stuck waiting on is mostly a question of intent. It's often easy to create "unnecessary" concurrency bottlenecks by mistake, but sometimes the bottleneck is unavoidable (or not worth the hassle of avoiding), such as reading a large file from the disk/network into memory before trying to edit it.

The distinction between a data race and a race condition is another good example. Data races are instant UB, never desirable, and impossible in safe Rust (modulo toolchain bugs). Race conditions are whenever some code's behavior depends on timing, but you didn't want it to. If multiple threads are appending to a list (with proper synchronization so it's not a data race), and some code later on depends on the order of the list, only you know if that's intended behavior or a race condition.

In contrast, the "memory safety" that unsafe and "safe Rust" refer to can be defined without any appeal to human intent. While we haven't yet committed to a single mathematically precise definition of it, it's pretty clear that "memory safety" implies the absence of undefined behavior, i.e. not violating any of the assumptions that the compiler has to make about your code in order to meaningfully and correctly compile it at all.

For what it's worth, the basic concept of "unspecified behavior" (i.e., stuff that the standard/spec doesn't pin down because it needs to vary by platform/environment) is one I've run across many times. AFAICT, the phrase has been consistently used for this concept since well before I started programming, mostly in the C and C++ communities, but also in many other languages that care about specs and portability. I agree that it's probably not the best possible phrase for the concept, but it is a well-established term of art we can safely (heh) use for expressing that concept.

12 Likes

Good distinction! This one though is less a matter of intent, I think:

A real deadlock, as opposed to a bottleneck, is when each task is waiting for a lock that is held by the other task. This will indeed hang indefinitely, and I can't see a good use case for it. Unfortunately, it is difficult to detect mechanically and to differentiate from a long-hanging bottleneck, which might have a legitimate purpose, leaving the programmer with the task of debugging it.
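The classic two-lock recipe for such a deadlock, sketched here with the hazardous ordering described only in comments and the conventional fix (a fixed lock order) actually executed so the program terminates:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let a = Arc::new(Mutex::new(0));
    let b = Arc::new(Mutex::new(0));

    // Deadlock recipe (don't do this): thread 1 locks `a` then `b`,
    // while thread 2 locks `b` then `a`. Each can end up holding one
    // lock and waiting forever for the other.
    //
    // The usual fix, shown here: every thread takes the locks in the
    // same global order.
    let handles: Vec<_> = (0..2)
        .map(|_| {
            let (a, b) = (Arc::clone(&a), Arc::clone(&b));
            thread::spawn(move || {
                let mut ga = a.lock().unwrap(); // always `a` first...
                let mut gb = b.lock().unwrap(); // ...then `b`
                *ga += 1;
                *gb += 1;
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
    assert_eq!(*a.lock().unwrap(), 2);
    assert_eq!(*b.lock().unwrap(), 2);
}
```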

Similarly, memory which is no longer accessible and will not be released (such as a leaked Box where the pointer has been lost, or a free-wheelin' Rc-cycle (edit: meaning without any references to it from outside the cycle)) is also almost definitely a bug, since it can't affect the behaviour of the program anymore.

1 Like

An Rc cycle can be completely legitimate. Rc is basically the only straightforward way of creating self-referential structures in Rust, and that is sometimes needed, especially for representing arbitrary graphs (e.g., I needed it while writing a type checker, because recursive or mutually recursive types can cause cyclic references between a set of 1 or more types). In that case, the actual bug is not the cycle itself but the lack of a Weak link. In garbage collected languages, this wouldn't even be an issue, since GCs can and do detect cycles.
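The Weak fix mentioned here, in miniature: a parent/child pair where the back edge is a Weak, so parent and child no longer form a strong-reference cycle:

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Node {
    children: RefCell<Vec<Rc<Node>>>,
    // The back edge is Weak: it doesn't keep the parent alive, so
    // parent <-> child is not a strong-reference cycle.
    parent: RefCell<Weak<Node>>,
}

fn main() {
    let parent = Rc::new(Node {
        children: RefCell::new(vec![]),
        parent: RefCell::new(Weak::new()),
    });
    let child = Rc::new(Node {
        children: RefCell::new(vec![]),
        parent: RefCell::new(Rc::downgrade(&parent)),
    });
    parent.children.borrow_mut().push(Rc::clone(&child));

    // Only strong references count; the weak back edge does not.
    assert_eq!(Rc::strong_count(&parent), 1);
    assert_eq!(Rc::strong_count(&child), 2);
    // The weak edge can still be followed while the parent is alive:
    assert!(child.parent.borrow().upgrade().is_some());
    // When `parent` goes out of scope, both nodes are freed -- no leak.
}
```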

1 Like

Of course! That's why I specified "memory which is no longer accessible". By "free-wheelin'" I meant that the cycle only references itself, but all external references are gone (not the best phrasing; it made sense to me at the time, it was late :sweat_smile:). So as long as it's also accessible from somewhere else it might have a legitimate use; once that external link is lost, and the thing isn't dropped (because, as you said, no weak references), then it's surely a bug.

1 Like

That is a good point I forgot about. I think I missed this because it still technically requires that your tools know what a "lock" looks like, and/or what a "waiting" thread looks like, which at least in principle ends up back at the "defined by programmer intent" thing. In some languages, it's arguably a reasonable assumption that no one would ever roll their own non-standard lock or concurrency primitives in practice (especially if OS-level threads/locks are off-limits), but IIUC even in those languages you can still confuse the deadlock detectors if you do, so the principle still applies even if the tool is mostly correct (just like how destructors mostly solve the leak problem).

That's why I think it's good to differentiate between what is the ideally correct behaviour and what is reasonable to expect of tooling.

In case of a "real" deadlock, ideally you want to be notified of it, perhaps have it somehow terminated, because it always implies buggy behaviour. Practically, this is sometimes possible with the right tooling, assuming you don't roll your own locks and probably with some runtime cost, so there's tradeoffs and even then it's probably not foolproof. (I wonder, assuming only standard locks, can a GC-like mark-and-sweep somehow always detect a deadlock?)

OTOH, in the case of a long-running bottleneck, like you said, even in the ideal case, you don't want your language/system/runtime intervening, since this might be actually intended behaviour.

I'm pretty sure that the answer is yes:

  1. Mark all of the threads that can currently make progress, i.e. are not currently waiting for locks
  2. Mark all of the threads that are waiting only for locks held by marked threads; iterate until you reach a fixed point
  3. Any remaining unmarked threads are deadlocked
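That fixed-point computation can be sketched over a made-up wait-for table (the thread/lock ids and the deadlocked function are invented for illustration):

```rust
use std::collections::HashMap;

/// Returns the deadlocked thread ids, given which lock each thread is
/// waiting on (None = can make progress) and which thread holds each lock.
fn deadlocked(
    waiting_on: &HashMap<u32, Option<u32>>,
    held_by: &HashMap<u32, u32>,
) -> Vec<u32> {
    // Step 1: mark every thread that is not waiting at all.
    let mut marked: Vec<u32> = waiting_on
        .iter()
        .filter(|(_, lock)| lock.is_none())
        .map(|(&t, _)| t)
        .collect();

    // Step 2: mark threads waiting only on locks held by marked threads;
    // iterate until a fixed point is reached.
    loop {
        let newly: Vec<u32> = waiting_on
            .iter()
            .filter(|(t, _)| !marked.contains(*t))
            .filter(|(_, lock)| match lock {
                Some(l) => held_by.get(l).is_some_and(|h| marked.contains(h)),
                None => false,
            })
            .map(|(&t, _)| t)
            .collect();
        if newly.is_empty() {
            break;
        }
        marked.extend(newly);
    }

    // Step 3: any remaining unmarked thread is deadlocked.
    let mut dead: Vec<u32> = waiting_on
        .keys()
        .copied()
        .filter(|t| !marked.contains(t))
        .collect();
    dead.sort();
    dead
}

fn main() {
    // Threads 1 and 2 wait on each other's locks; thread 3 runs freely.
    let waiting_on = HashMap::from([(1, Some(20)), (2, Some(10)), (3, None)]);
    let held_by = HashMap::from([(10, 1), (20, 2)]); // lock -> holder
    assert_eq!(deadlocked(&waiting_on, &held_by), vec![1, 2]);
}
```

This won't catch a thread that is "waiting" by spinning rather than blocking on a known lock, which is one way the programmer-intent problem sneaks back in.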

This won't detect livelocks, however, where some of the threads involved can make progress, but not in a meaningful way. For example, a backoff-and-retry system might end up in an infinite cycle of acquires and releases where no thread ever holds all of the locks it needs at the same time.

1 Like