Where to look for memory leaks in Rust?

I have a web server written in Rust. After a month of running, its memory usage has increased by 250 MB. Where can the memory be going? The server uses a thread pool, so the number of threads is not growing. In Java I use a memory dump that shows where all the memory is used. Is there a similar tool for Rust? What else should I pay attention to when hunting the leak?

Without having a specific solution to your problem, I would vaguely point in the direction of the leak sanitizer, described in the "sanitizer" chapter of The Rust Unstable Book.

I've found the sanitizers to be a little hit-or-miss. Sometimes they work perfectly, and then - a few months later - not so much. With that said, I've found a few memory leaks thanks to the address sanitizer over the years.

Read the contents of the memory (for example with gcore pid and then xxd core.pid).

Maybe give heaptrack a try.

You can use Valgrind on Rust code, but it's rare to leak memory accidentally in Rust. The usual causes are cycles in reference-counted data structures, explicitly leaking data with Box::leak, or, most likely, bugs in some (probably C/C++) dependency. Valgrind can find leaks even in C/C++ libraries, but it can be tricky to use and its output hard to interpret, especially if the leak is small and slow.

Alternatively, you might not have a leak at all but heap fragmentation. It's less of an issue now, but older system allocators were focused on short-lived, single-threaded processes, so they weren't designed to avoid fragmentation. The simplest fix is to use a different allocator that is designed to limit memory fragmentation. I think the most popular non-system allocators are jemalloc and mimalloc.

Many years ago, in C, I used jemalloc's built-in profiling API to find leaks on live production systems. But I'm unaware of any Rust tooling around those APIs.

Depending on the crates you use (you provided no list), some data structures could be growing indefinitely. I mainly suspect a cache, but I have some other options in mind.
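To illustrate the cache suspicion, here is a minimal sketch of a cache whose size is capped so it cannot grow without limit. The `BoundedCache` type and its oldest-first eviction policy are assumptions made for illustration; nothing in the thread says the server has such a cache or how it should evict.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical cache that evicts the oldest entry once it holds
/// more than `capacity` keys.
struct BoundedCache {
    capacity: usize,
    map: HashMap<String, Vec<u8>>,
    /// Keys in insertion order, oldest at the front.
    order: VecDeque<String>,
}

impl BoundedCache {
    fn new(capacity: usize) -> Self {
        Self {
            capacity,
            map: HashMap::new(),
            order: VecDeque::new(),
        }
    }

    fn insert(&mut self, key: String, value: Vec<u8>) {
        if self.map.insert(key.clone(), value).is_some() {
            // Existing key: the value was replaced, the order is unchanged.
            return;
        }
        self.order.push_back(key);
        if self.order.len() > self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.map.remove(&oldest);
            }
        }
    }

    fn get(&self, key: &str) -> Option<&Vec<u8>> {
        self.map.get(key)
    }

    fn len(&self) -> usize {
        self.map.len()
    }
}

fn main() {
    let mut cache = BoundedCache::new(2);
    for name in ["a", "b", "c"] {
        cache.insert(name.to_owned(), vec![0u8; 16]);
    }
    println!("entries kept: {}", cache.len());
    println!("oldest still cached: {}", cache.get("a").is_some());
}
```

Without the eviction step in `insert`, the map would keep every key it has ever seen, which is the kind of slow, steady growth that looks like a leak from the outside.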

That is what I currently think. So I added some pre-allocations and will check whether memory usage continues to grow.

I do not use any crates besides some internal ones, which do most of their work at startup. But thanks for the pointer; I will review them thoroughly.

Another option is various "high water mark" buffers getting bumped up and never chopped back down.

For example, if there's a request parser that is buffering the entire request first in a shared buffer, then it will over time grow to the size of the maximum request.

If you have that happening on a bunch of layers you could get to 250 MB pretty easily, but the pattern should also be more obvious: memory increases rapidly at the start, levels off, jumps when a spike comes in, and then stays steady.

If you have pretty linear growth, it's not that.
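The high-water-mark effect can be sketched with a reused Vec. `RequestBuffer`, `handle`, and `trim` are hypothetical names for illustration, not taken from any real server; the only standard library behaviour relied on is that `Vec::clear` keeps the allocation and `Vec::shrink_to` can release it.

```rust
/// Hypothetical request handler that reuses one buffer across requests.
struct RequestBuffer {
    buf: Vec<u8>,
}

impl RequestBuffer {
    fn new() -> Self {
        Self { buf: Vec::new() }
    }

    /// Copies a request body into the shared buffer and returns the
    /// buffer's capacity afterwards.
    fn handle(&mut self, body: &[u8]) -> usize {
        self.buf.clear(); // drops the contents but keeps the allocation
        self.buf.extend_from_slice(body);
        self.buf.capacity()
    }

    /// Optional trim: release memory once the capacity exceeds `limit`.
    fn trim(&mut self, limit: usize) {
        if self.buf.capacity() > limit {
            self.buf.clear();
            self.buf.shrink_to(limit);
        }
    }
}

fn main() {
    let mut rb = RequestBuffer::new();
    let small = rb.handle(&[0u8; 1024]);
    let large = rb.handle(&vec![0u8; 1 << 20]);
    // A small request after the big one still sees the large capacity.
    let after = rb.handle(&[0u8; 1024]);
    println!("small={} large={} after={}", small, large, after);

    rb.trim(64 * 1024);
    println!("after trim={}", rb.buf.capacity());
}
```

Whether to trim at all is a trade-off: keeping the large buffer avoids reallocating on the next big request, at the cost of holding the peak size forever.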

That's generally fine; I even like it, because memory usage will stabilize after some time. My only concern is that memory should not grow indefinitely, as happens with my Java server, which I have to restart every month. So far the Rust server behaves much better, and I have never had to restart it because it ran out of memory. I did have another problem, a thread leak, but that is another story.

Most of the time, yeah. It can occasionally lead to misdiagnosing a leak, or to a denial of service if someone can max all those buffers out and exhaust your memory, but those are pretty minor issues compared with the performance gain over reallocating on every request.

I think a custom debug allocator like Zig has, that tracks allocations and shows the stack trace of allocations that are not freed when a test exits, is the best solution. But I don't know of such an allocator for Rust.
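As a rough approximation of that idea using only the standard library, a global allocator can wrap the system allocator and count live bytes. This is a sketch, not an equivalent of Zig's debug allocator: it records no stack traces, and `CountingAlloc` and `live_bytes` are names invented here for illustration.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Wraps the system allocator and counts the bytes currently allocated.
struct CountingAlloc;

static LIVE_BYTES: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            LIVE_BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        LIVE_BYTES.fetch_sub(layout.size(), Ordering::Relaxed);
    }
    // The default `realloc` and `alloc_zeroed` call the two methods
    // above, so the counter stays consistent without overriding them.
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Bytes allocated through the global allocator and not yet freed.
fn live_bytes() -> usize {
    LIVE_BYTES.load(Ordering::Relaxed)
}

fn main() {
    let before = live_bytes();
    // black_box keeps the optimizer from eliding the allocation.
    let data = std::hint::black_box(vec![0u8; 1 << 20]);
    println!(
        "growth while held: {} bytes",
        live_bytes().saturating_sub(before)
    );
    drop(data);
    println!("live after drop: {} bytes", live_bytes());
}
```

Logging `live_bytes()` periodically from a long-running server, or comparing it before and after a test, gives a first indication of whether allocations are accumulating; finding out where they come from still needs a profiler.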

With RAII in Rust, objects are automatically freed when no longer referenced, sort of like GC-based languages. So typically a memory leak will be caused when an object has been added to a collection but you've forgotten to remove it. So the collection grows. For that case you may be able to find the problem by adding debug assertions to the Drop for these collections to check whether they're empty when the program finishes. These will usually fire in tests when you've neglected to remove something, giving you a reproducible case you can narrow and instrument.
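A minimal sketch of such a Drop check follows. `SessionRegistry` is a hypothetical collection wrapper made up for illustration; the assertion only runs in builds with debug assertions enabled and is skipped while a panic is already unwinding, to avoid a double panic.

```rust
use std::collections::HashMap;

/// Hypothetical registry of active sessions. Every `open` is expected
/// to be matched by a `close` before the registry is dropped.
struct SessionRegistry {
    sessions: HashMap<u64, String>,
}

impl SessionRegistry {
    fn new() -> Self {
        Self { sessions: HashMap::new() }
    }

    fn open(&mut self, id: u64, user: &str) {
        self.sessions.insert(id, user.to_owned());
    }

    fn close(&mut self, id: u64) {
        self.sessions.remove(&id);
    }

    fn len(&self) -> usize {
        self.sessions.len()
    }
}

impl Drop for SessionRegistry {
    fn drop(&mut self) {
        // Skip the check while unwinding: a second panic would abort.
        if !std::thread::panicking() {
            debug_assert!(
                self.sessions.is_empty(),
                "{} session(s) were never closed",
                self.sessions.len()
            );
        }
    }
}

fn main() {
    let mut registry = SessionRegistry::new();
    registry.open(1, "alice");
    registry.open(2, "bob");
    registry.close(1);
    registry.close(2);
    println!("open sessions at shutdown: {}", registry.len());
    // `registry` is dropped here; the assertion passes because it is empty.
}
```

If a test forgets a `close`, the registry's drop panics at the end of that test in a debug build, which points directly at the code path that left the entry behind.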

Another possible source of leaks in RAII systems like Rust is circular references. So you could think about whether any of those exist.
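A small sketch of how such a cycle leaks with Rc, and how a Weak back-reference avoids it. The `Node` type and the two helper functions are illustrative only.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Node {
    /// Strong link: keeps the target alive.
    next: RefCell<Option<Rc<Node>>>,
    /// Weak link: does not keep the target alive.
    prev: RefCell<Weak<Node>>,
}

fn new_node() -> Rc<Node> {
    Rc::new(Node {
        next: RefCell::new(None),
        prev: RefCell::new(Weak::new()),
    })
}

/// Links two nodes with strong references in both directions and
/// returns a weak handle to the first. The cycle keeps both alive.
fn leaky_pair() -> Weak<Node> {
    let a = new_node();
    let b = new_node();
    *a.next.borrow_mut() = Some(Rc::clone(&b));
    *b.next.borrow_mut() = Some(Rc::clone(&a)); // closes the cycle
    Rc::downgrade(&a)
}

/// Same shape, but the back link is weak, so both nodes are freed
/// when the local handles go out of scope.
fn sound_pair() -> Weak<Node> {
    let a = new_node();
    let b = new_node();
    *a.next.borrow_mut() = Some(Rc::clone(&b));
    *b.prev.borrow_mut() = Rc::downgrade(&a);
    Rc::downgrade(&a)
}

fn main() {
    let leaked = leaky_pair();
    let freed = sound_pair();
    println!("cycle still alive: {}", leaked.upgrade().is_some());
    println!("weak back link still alive: {}", freed.upgrade().is_some());
}
```

The same applies to Arc in multi-threaded code: any back-pointer in a parent/child or observer structure is a candidate for Weak.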

If it can be encapsulated in a test, Miri will detect leaks. But if the leak is something that would be freed if the server shut down, and is a problem "only" because the server runs indefinitely long, it won't.

You can use the same tooling developed for C and C++ over the last decades. It can be easier to find tools to use if you refer to those languages in your queries.

Looks like meilisearch had fragmentation since a C dependency was using the system allocator. They needed LD_PRELOAD to work around it.

Thanks, an interesting read.

Try switching your allocator to jemalloc. It has metrics that you can track while your application is running that indicate heap fragmentation.

Currently memory has almost stopped growing after the 300 MB mark. Perhaps I need to wait another week before trying jemalloc.

If the memory grows, but the rate of growth decreases and approaches some upper limit, I think that is a sign of memory fragmentation rather than a leak.

jemalloc prefers to keep a larger steady-state pool of idle memory than mimalloc does, so if you care more about memory usage than about 99th-percentile latency, mimalloc may be the better choice.

Miri can detect memory leaks such as calling code not freeing a structure that was returned from unsafe code or an FFI library. Simply run your test suite under Miri. It takes a long time to run, but many non-trivial bugs have been found this way. Its GitHub repository has detailed instructions on how to get it running.

If you are able to test on Linux, Heaptrack and Bytehound are useful to investigate if you have allocations that aren't being freed (such as a cache or hash map that grows and never gets cleared). Even if you don't discover the specific issue you will most likely find other issues (needless allocations that could be optimised to reuse a buffer for example) and gain insight into how your program behaves.

If you are on another OS, there are probably similar tools, but I don't know what they would be. Generally you can use the same tools as you would for C or C++, but if the tool doesn't support demangling Rust symbols it might be a bit harder to parse the results.