Slow memory creep in long-running Tokio process

I've seen comments on a similar memory-leak post but I believe my case is much simpler. I have a loop in which I'm tokio::task::spawn()-ing lots of tasks, waiting for them and (hopefully) cleaning everything up.

I'd expect the peak memory usage to stabilise, but it looks like there is a small increase on each iteration. My test program:

#[tokio::main]
async fn main() {
    for i in 0..1000 {
        many_tasks().await;

        println!("Run {:3}: {:?}", i, procinfo::pid::statm_self());
    }
}

async fn many_tasks() {
    let handles = (0..5_000_000)
        .map(|_| tokio::task::spawn(my_task()))
        .collect::<Vec<_>>();

    // order doesn't matter, wait for everything
    for handle in handles {
        handle.await.unwrap();
    }
}

async fn my_task() {
    tokio::time::delay_for(std::time::Duration::from_secs(5)).await;
}

using

tokio = { version = "0.2", features = ["full"] }
procinfo = "0.4"

My output is:

Run   0: Ok(Statm { size: 771979, resident: 566614, share: 268, text: 131, data: 770801 })
Run   1: Ok(Statm { size: 961931, resident: 756610, share: 273, text: 131, data: 960753 })
Run   2: Ok(Statm { size: 978315, resident: 772994, share: 273, text: 131, data: 977137 })
Run   3: Ok(Statm { size: 994699, resident: 789378, share: 273, text: 131, data: 993521 })
Run   4: Ok(Statm { size: 1011083, resident: 805762, share: 273, text: 131, data: 1009905 })
...
Run  18: Ok(Statm { size: 1011083, resident: 805762, share: 273, text: 131, data: 1009905 })
Run  19: Ok(Statm { size: 1027467, resident: 822146, share: 273, text: 131, data: 1026289 })
Run  20: Ok(Statm { size: 1027467, resident: 822146, share: 273, text: 131, data: 1026289 })
Run  21: Ok(Statm { size: 1043851, resident: 838530, share: 273, text: 131, data: 1042673 })
...
Run  42: Ok(Statm { size: 1076619, resident: 871297, share: 275, text: 131, data: 1075441 })
...
Run  46: Ok(Statm { size: 1093003, resident: 887681, share: 275, text: 131, data: 1091825 })
...
Run  61: Ok(Statm { size: 1125771, resident: 920449, share: 275, text: 131, data: 1124593 })
...
Run  65: Ok(Statm { size: 1142155, resident: 936833, share: 275, text: 131, data: 1140977 })
...
Run  77: Ok(Statm { size: 1158539, resident: 953217, share: 275, text: 131, data: 1157361 })
Run  78: Ok(Statm { size: 1174923, resident: 969601, share: 275, text: 131, data: 1173745 })
...
Run  81: Ok(Statm { size: 1174923, resident: 969601, share: 275, text: 131, data: 1173745 })
^C

on Ubuntu 18.04.4 LTS (under WSL 1), but I've seen similar behaviour on Gentoo.

The interesting bit is that if I change the delay_for() duration to 2 seconds, the creep is less pronounced.

Does anyone know what is going on? How can I deal with this -- can I perform some manual cleanup or is it necessary to restart the runtime every now and then?


This is most likely not a memory leak, but a combination of memory fragmentation and your allocator not giving memory back to the OS. If I turn on jemalloc with

#[global_allocator]
static A: jemallocator::Jemalloc = jemallocator::Jemalloc;
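
This assumes the jemallocator crate is declared as a dependency; a minimal Cargo.toml entry (the exact version here is only my assumption) would be something like

jemallocator = "0.3"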

And configure it to aggressively release memory

JEMALLOC_SYS_WITH_MALLOC_CONF="background_thread:true,narenas:1,tcache:false,dirty_decay_ms:0,muzzy_decay_ms:0,abort_conf:true" ./target/release/creeptest

I get this

Run   0: Ok(Statm { size: 857854, resident: 532552, share: 647, text: 186, data: 725893 })
Run   1: Ok(Statm { size: 984830, resident: 461979, share: 647, text: 186, data: 852869 })
Run   2: Ok(Statm { size: 1038078, resident: 495914, share: 647, text: 186, data: 906117 })
Run   3: Ok(Statm { size: 1066750, resident: 483588, share: 647, text: 186, data: 934789 })
Run   4: Ok(Statm { size: 1066750, resident: 471642, share: 647, text: 186, data: 934789 })
Run   5: Ok(Statm { size: 1066750, resident: 426532, share: 647, text: 186, data: 934789 })
Run   6: Ok(Statm { size: 1081086, resident: 439575, share: 647, text: 186, data: 949125 })
Run   7: Ok(Statm { size: 1081086, resident: 443245, share: 647, text: 186, data: 949125 })
Run   8: Ok(Statm { size: 1081086, resident: 463154, share: 647, text: 186, data: 949125 })
Run   9: Ok(Statm { size: 1081086, resident: 427517, share: 647, text: 186, data: 949125 })
Run  10: Ok(Statm { size: 1081086, resident: 456839, share: 647, text: 186, data: 949125 })
Run  11: Ok(Statm { size: 1081086, resident: 466862, share: 647, text: 186, data: 949125 })
Run  12: Ok(Statm { size: 1081086, resident: 458861, share: 647, text: 186, data: 949125 })
Run  13: Ok(Statm { size: 1095422, resident: 434981, share: 647, text: 186, data: 963461 })
Run  14: Ok(Statm { size: 1095422, resident: 406095, share: 647, text: 186, data: 963461 })
Run  15: Ok(Statm { size: 1095422, resident: 460366, share: 647, text: 186, data: 963461 })
Run  16: Ok(Statm { size: 1095422, resident: 427295, share: 647, text: 186, data: 963461 })
Run  17: Ok(Statm { size: 1095422, resident: 465376, share: 647, text: 186, data: 963461 })
Run  18: Ok(Statm { size: 1095422, resident: 472402, share: 647, text: 186, data: 963461 })
Run  19: Ok(Statm { size: 1095422, resident: 393485, share: 647, text: 186, data: 963461 })
Run  20: Ok(Statm { size: 1095422, resident: 429607, share: 647, text: 186, data: 963461 })
Run  21: Ok(Statm { size: 1095422, resident: 466897, share: 647, text: 186, data: 963461 })
Run  22: Ok(Statm { size: 1095422, resident: 459701, share: 647, text: 186, data: 963461 })
Run  23: Ok(Statm { size: 1095422, resident: 456268, share: 647, text: 186, data: 963461 })
Run  24: Ok(Statm { size: 1095422, resident: 495478, share: 647, text: 186, data: 963461 })
Run  25: Ok(Statm { size: 1095422, resident: 470396, share: 647, text: 186, data: 963461 })
Run  26: Ok(Statm { size: 1095422, resident: 462738, share: 647, text: 186, data: 963461 })
Run  27: Ok(Statm { size: 1095422, resident: 416839, share: 647, text: 186, data: 963461 })
Run  28: Ok(Statm { size: 1095422, resident: 465722, share: 647, text: 186, data: 963461 })
Run  29: Ok(Statm { size: 1095422, resident: 421351, share: 647, text: 186, data: 963461 })
Run  30: Ok(Statm { size: 1095422, resident: 445116, share: 647, text: 186, data: 963461 })
Run  31: Ok(Statm { size: 1095422, resident: 418212, share: 647, text: 186, data: 963461 })
Run  32: Ok(Statm { size: 1095422, resident: 446528, share: 647, text: 186, data: 963461 })
Run  33: Ok(Statm { size: 1095422, resident: 440657, share: 647, text: 186, data: 963461 })
Run  34: Ok(Statm { size: 1095422, resident: 448054, share: 647, text: 186, data: 963461 })
Run  35: Ok(Statm { size: 1095422, resident: 437773, share: 647, text: 186, data: 963461 })
Run  36: Ok(Statm { size: 1095422, resident: 474733, share: 647, text: 186, data: 963461 })
Run  37: Ok(Statm { size: 1095422, resident: 425795, share: 647, text: 186, data: 963461 })
Run  38: Ok(Statm { size: 1095422, resident: 433262, share: 647, text: 186, data: 963461 })
Run  39: Ok(Statm { size: 1095422, resident: 473925, share: 647, text: 186, data: 963461 })
Run  40: Ok(Statm { size: 1095422, resident: 447174, share: 647, text: 186, data: 963461 })
Run  41: Ok(Statm { size: 1095422, resident: 430013, share: 647, text: 186, data: 963461 })
Run  42: Ok(Statm { size: 1095422, resident: 465132, share: 647, text: 186, data: 963461 })
Run  43: Ok(Statm { size: 1095422, resident: 491901, share: 647, text: 186, data: 963461 })
Run  44: Ok(Statm { size: 1095422, resident: 474051, share: 647, text: 186, data: 963461 })
Run  45: Ok(Statm { size: 1095422, resident: 474875, share: 647, text: 186, data: 963461 })
Run  46: Ok(Statm { size: 1095422, resident: 487053, share: 647, text: 186, data: 963461 })
Run  47: Ok(Statm { size: 1095422, resident: 469821, share: 647, text: 186, data: 963461 })
Run  48: Ok(Statm { size: 1095422, resident: 485762, share: 647, text: 186, data: 963461 })
Run  49: Ok(Statm { size: 1095422, resident: 465155, share: 647, text: 186, data: 963461 })
Run  50: Ok(Statm { size: 1095422, resident: 442107, share: 647, text: 186, data: 963461 })

Thank you, jemalloc helped somewhat -- it reduced the rate a little, but it never stabilised like in your case and memory consumption is still increasing.

I don't have the expertise to guess whether tweaking the jemalloc config would result in even better behaviour or whether the memory creep is there to stay.

Do you have any other suggestions for what else I could try, or how I could mitigate the problem? Maybe someone running tokio in production knows some tricks?

I'm not sure what to do about this exact issue. You can try asking on our discussions page to find someone who uses Tokio in production. You can also try our Discord.


If you think it's actually leaking memory, you can try Leak Sanitizer to track it down. The feature is experimental, though.
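
For reference, on a nightly toolchain that meant something like (the exact invocation may vary):

RUSTFLAGS="-Z sanitizer=leak" cargo +nightly run --target x86_64-unknown-linux-gnu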


Thanks @notriddle, I'll give it a go, but it doesn't seem to be a leak. I've played with valgrind and massif a bit and they report that all allocations are freed. It looks like some form of memory fragmentation, as @alice suggested.

In my use case I'd be happy to regularly call some form of "flush"/"shrink to fit" on some tokio internal buffers.

I mean, you can always try dropping the runtime every so often, but I don't think shrink_to_fit is the thing here, since that should show up in the memory usage?
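
To be concrete about dropping the runtime: a minimal sketch with tokio 0.2, reusing many_tasks() from the example above (the batch sizes are just for illustration):

use tokio::runtime::Runtime;

fn main() {
    for chunk in 0..10 {
        // Build a fresh runtime for each batch of work.
        let mut rt = Runtime::new().unwrap();
        rt.block_on(async {
            for i in 0..100 {
                many_tasks().await;
                println!("Chunk {} run {:3}: {:?}", chunk, i, procinfo::pid::statm_self());
            }
        });
        // The runtime is dropped here, so whatever it allocated internally
        // (timer entries, task queues, ...) is freed along with it.
    }
}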

What do you mean by showing up?

Yes, shrink to fit on a vector or similar wouldn't help on its own, but if there's some collection of boxed buffers and there was a way of re-allocating/moving those into a new collection, there'd be a good chance they'd get packed better in memory.
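
Roughly the idea, sketched on a plain Vec of boxed values (this isn't anything tokio actually exposes, just an illustration):

// Hypothetical "repack": move every boxed value into a fresh allocation so the
// allocator gets a chance to place them more densely. Whether this actually
// reduces fragmentation depends entirely on the allocator.
fn repack<T>(old: Vec<Box<T>>) -> Vec<Box<T>> {
    old.into_iter().map(|b| Box::new(*b)).collect()
}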

I've run 4 tests:

  • default jemalloc (twice)
  • aggressive jemalloc
  • default malloc

The difference in creep is not that huge. What surprised me was how different jemalloc1 and jemalloc2 (the two default-jemalloc runs) are -- there is no difference in the config or code, they are just two consecutive runs.

Also, the big jump in jemalloc1 just before run 200 doesn't fill me with confidence -- what if a similar jump happens on a run >700?

For now I'll probably have to add monitoring and restart the runtimes to make sure we won't run out of memory after a week or a month. A pity, since I tried to use my tokio project as a showcase of Rust at work -- everything is lovely, but "potentially running out of memory" is not the best selling point when talking to bosses. :smiley:

For me, using 500_000 tasks gives:

Run   0: Ok(Statm { size: 84371, resident: 17075, share: 592, text: 93, data: 83294 })
Run   1: Ok(Statm { size: 150897, resident: 86635, share: 592, text: 93, data: 87751 })
Run   2: Ok(Statm { size: 150897, resident: 85678, share: 592, text: 93, data: 89069 })
Run   3: Ok(Statm { size: 150897, resident: 84714, share: 592, text: 93, data: 91278 })
Run   4: Ok(Statm { size: 150897, resident: 84684, share: 592, text: 93, data: 92266 })
Run   5: Ok(Statm { size: 150897, resident: 85569, share: 592, text: 93, data: 92270 })
Run   6: Ok(Statm { size: 150897, resident: 85427, share: 592, text: 93, data: 93435 })
Run   7: Ok(Statm { size: 150897, resident: 87934, share: 592, text: 93, data: 93437 })
Run   8: Ok(Statm { size: 150897, resident: 88659, share: 592, text: 93, data: 94201 })
Run   9: Ok(Statm { size: 150897, resident: 87432, share: 592, text: 93, data: 94201 })
Run  10: Ok(Statm { size: 150897, resident: 85718, share: 592, text: 93, data: 94202 })

while 1_000_000 tasks gives:

Run   0: Ok(Statm { size: 231085, resident: 163100, share: 589, text: 93, data: 171344 })
Run   1: Ok(Statm { size: 233032, resident: 167215, share: 589, text: 93, data: 179025 })
Run   2: Ok(Statm { size: 233032, resident: 167250, share: 589, text: 93, data: 179070 })
Run   3: Ok(Statm { size: 233032, resident: 174242, share: 589, text: 93, data: 181832 })
Run   4: Ok(Statm { size: 233032, resident: 174374, share: 589, text: 93, data: 181833 })
Run   5: Ok(Statm { size: 233032, resident: 174374, share: 589, text: 93, data: 181836 })
Run   6: Ok(Statm { size: 233032, resident: 174374, share: 589, text: 93, data: 181838 })

and 2_000_000 tasks:

Run   0: Ok(Statm { size: 393408, resident: 325418, share: 585, text: 93, data: 340532 })
Run   1: Ok(Statm { size: 397302, resident: 347787, share: 585, text: 93, data: 349461 })
Run   2: Ok(Statm { size: 397302, resident: 348183, share: 585, text: 93, data: 349820 })
Run   3: Ok(Statm { size: 397302, resident: 348183, share: 585, text: 93, data: 349829 })
Run   4: Ok(Statm { size: 397302, resident: 343036, share: 585, text: 93, data: 353431 })
Run   5: Ok(Statm { size: 397302, resident: 348117, share: 585, text: 93, data: 353432 })
Run   6: Ok(Statm { size: 397302, resident: 348249, share: 585, text: 93, data: 353529 })
Run   7: Ok(Statm { size: 397302, resident: 348249, share: 585, text: 93, data: 353531 })
Run   8: Ok(Statm { size: 397302, resident: 348249, share: 585, text: 93, data: 353534 })
Run   9: Ok(Statm { size: 397302, resident: 343277, share: 585, text: 93, data: 358197 })
Run  10: Ok(Statm { size: 397302, resident: 338772, share: 585, text: 93, data: 365756 })

For 500_000 tasks memory consumption is stable after the second run; for 1_000_000 and 2_000_000 it is almost stable after the first run and completely stable after the second. The RSS likely fluctuates due to swapping. I can't use more than 2_000_000 tasks without excessive swapping. This is all using glibc 2.28-10 on Debian, with no jemalloc.

I tried on Arch Linux with the default glibc allocator. It slowly increases:

Run 0: Ok(Statm { size: 1009463, resident: 487162, share: 570, text: 134, data: 744731 })
...
Run 203: Ok(Statm { size: 2231647, resident: 1691396, share: 570, text: 134, data: 1963170 })

Sometimes it steps back down by a small amount of memory, but then it starts eating memory again.

Has anybody created a tokio issue for this? I suppose a memory pool inside tokio, or tuning its allocation sizes, could solve this problem.

You could try valgrind --tool=massif /path/to/program and then massif-visualizer to find where memory is being used if this is a problem in tokio. If it is a problem in the memory allocator itself, then massif won't show any memory usage growth.

I've tried massif before but the output is rather erratic. I'm not that familiar with massif so I'm not sure what to look for.

Here's an example of one of the peaks towards the end of the run:

[massif snapshot screenshot]

And this is what 55 runs look like in massif:

[massif graph screenshot]

About 1.5GB of memory seems to be used by the tokio::time::delay_for implementation.

https://github.com/tokio-rs/tokio/blob/d2f81b506a469481ab0b62aaeaf48fc8c60e4c66/tokio/src/time/driver/entry.rs#L107

A quick calculation shows that the fields of Entry add up to 65 bytes. Entry is also aligned to 128 bytes, so it takes up 128 bytes as a whole. This means that there are about 12_582_912 entries allocated (1.5 * 1024 * 1024 * 1024 / 128). This is a little over twice as many as the 5_000_000 spawned tasks.

Nb: all calculations are for x86_64
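
A stand-in struct makes it easy to sanity-check that arithmetic (Entry itself is internal to tokio, so this just mimics the layout described above):

// 65 bytes of fields forced to 128-byte alignment; the struct's size is
// rounded up to a multiple of its alignment, i.e. 128 bytes.
#[repr(align(128))]
struct FakeEntry {
    _fields: [u8; 65],
}

fn main() {
    assert_eq!(std::mem::align_of::<FakeEntry>(), 128);
    assert_eq!(std::mem::size_of::<FakeEntry>(), 128);
    // ~1.5 GiB divided by 128 bytes per entry:
    println!("{}", 1_610_612_736u64 / 128); // prints 12582912
}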

I am not familiar with tokio, so I don't know why this would be the case.

I created a tokio issue about this: memory fragmentation due to size of tokio/src/time/driver/entry::Entry · Issue #2619 · tokio-rs/tokio · GitHub

