Why is Vec::push taking an extremely long time?


#1
fn main() {
    let s: usize = 64 * 1024 * 1024 * 1024;
    let mut r: Vec<u8> = Vec::with_capacity(s);
    unsafe { r.set_len(s); }
    for i in 0..s {
        r[i] = 0;
    }
    r.clear();
    println!("pre alloc done");
    for _ in 0..s {
        r.push(1);
    }
    println!("Hello, world! {}", r[s-1]);
}

The first loop finishes pretty fast, in just a few seconds, and then “pre alloc done” appears on stdout.

Then my process ran away with 100% CPU usage:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
47306 dapeng    20   0 64.0g  64g  804 R 100.0 26.6  79:42.62 hello_world

perf top shows

Samples: 145K of event 'cpu-clock', Event count (approx.): 12863396913
 86.72%  [kernel]             [k] change_protection
 11.90%  [kernel]             [k] vm_normal_page

And the best part is … my process refused to respond to my “kill -9”.

What is going on?


#2

What is the reason for doing this “pre alloc” stuff? When you create a Vec with Vec::with_capacity, it has already preallocated space for all the elements you push later.
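To illustrate the point about with_capacity, here is a minimal sketch (the function name is made up for this example) that checks the buffer pointer never changes while pushing up to the requested capacity. It uses a small size so it runs anywhere.

```rust
/// Pushes `n` bytes into a Vec created with `Vec::with_capacity(n)` and
/// reports whether the buffer stayed at the same address the whole time.
/// (Hypothetical helper, not from the original post.)
fn pushes_without_realloc(n: usize) -> bool {
    let mut v: Vec<u8> = Vec::with_capacity(n);
    let start = v.as_ptr();
    for _ in 0..n {
        // No reallocation happens as long as len stays within the capacity.
        v.push(1);
    }
    v.len() == n && v.as_ptr() == start
}

fn main() {
    println!("no reallocation: {}", pushes_without_realloc(1024 * 1024));
}
```

So the extra write loop is not needed to make push avoid reallocating; the capacity is already there.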


#3

Interesting, when I run this code, I get “./main” terminated by signal SIGILL (Illegal instruction)


#4

Are you on a 32-bit system by any chance?


#5

No, if the question is addressed to me :smile:


#6

Hm, I was able to get a SIGILL without unsafe:

↪  cat main.rs
fn main() {
    let s: usize = 1024 * 1024 * 1024 * 8;
    let r: Vec<u8> = Vec::with_capacity(s);
}
↪  rustc main.rs
main.rs:3:9: 3:10 warning: unused variable: `r`, #[warn(unused_variables)] on by default
main.rs:3     let r: Vec<u8> = Vec::with_capacity(s);
                  ^
↪  ./main
fish: “./main” terminated by signal SIGILL (Illegal instruction)

I guess it is a bug in rustc.


#7

So on the playpen this code triggers a SIGILL:

fn main() {
    let s: usize = 64 * 1024 * 1024 * 1024;
    let mut r: Vec<u8> = Vec::with_capacity(s);
    for _ in 0..s {
        r.push(1);
    }
    println!("Hello, world! {}", r[s-1]);
}

However, if you replace the 64 with 1, “only” allocating 1 GiB instead of 64, it is killed via SIGKILL rather than SIGILL (in Release mode; in Debug mode it times out).


#8

SIGILL is normal: allocating a 64 GB vector is pretty likely to fail, and on out of memory you get an abort like that.

The main Rust bug in that situation is that there is no output message. It should say “Could not allocate memory, aborting!”.
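If you want the program itself to report the failure instead of aborting, newer Rust versions than the one discussed in this thread offer fallible reservation on Vec. A minimal sketch, assuming a toolchain where try_reserve_exact is stable; the helper name try_alloc is made up for illustration:

```rust
use std::collections::TryReserveError;

/// Tries to reserve `bytes` of capacity and hands back the error
/// instead of aborting the process. (Hypothetical helper.)
fn try_alloc(bytes: usize) -> Result<Vec<u8>, TryReserveError> {
    let mut v: Vec<u8> = Vec::new();
    v.try_reserve_exact(bytes)?;
    Ok(v)
}

fn main() {
    // A request of usize::MAX bytes can never be satisfied,
    // so this always takes the error path.
    match try_alloc(usize::MAX) {
        Ok(v) => println!("reserved {} bytes", v.capacity()),
        Err(e) => eprintln!("Could not allocate memory: {}", e),
    }
}
```

Whether a merely large request (say 64 GB) fails at reservation time or only later, when the pages are written, depends on the OS and its overcommit settings.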


#9

It does have an effect, though not one you really need to care about. Writing to the pages of allocated memory forces the OS to actually back them with physical memory, which it normally does not do until the first write. So in that sense, they aren’t truly “allocated” until written to. This happens completely transparently, and I don’t see a reason why you should care about it. Some kind of madvise syscall could be used instead.
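If someone does want to touch every page up front, it can be done without unsafe. This is only a sketch of the idea (the function name is hypothetical), and it assumes the compiler does not optimize the writes away:

```rust
/// Allocates `n` bytes of capacity and writes to all of them once, so the
/// OS has to back the pages with real memory, then empties the Vec again.
/// (Hypothetical helper, a safe stand-in for set_len plus a write loop.)
fn preallocate_and_touch(n: usize) -> Vec<u8> {
    let mut v: Vec<u8> = Vec::with_capacity(n);
    v.resize(n, 0); // writes n zero bytes, so len becomes n
    v.clear();      // len goes back to 0; the capacity is kept
    v
}

fn main() {
    let v = preallocate_and_touch(1024 * 1024);
    println!("len = {}, capacity >= 1 MiB: {}", v.len(), v.capacity() >= 1024 * 1024);
}
```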


#10

If you want to do this particular operation as efficiently as possible, vec![1u8; s] should be perfect.
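For reference, a small sketch comparing the two ways of building the vector. The function names are made up for this example, and the size is kept tiny so it runs anywhere:

```rust
/// Builds the vector in one step with the vec! macro.
fn filled(s: usize) -> Vec<u8> {
    vec![1u8; s]
}

/// Builds the same vector the way the original post does, one push at a time.
fn filled_by_push(s: usize) -> Vec<u8> {
    let mut r: Vec<u8> = Vec::with_capacity(s);
    for _ in 0..s {
        r.push(1);
    }
    r
}

fn main() {
    let s = 1024;
    // Both approaches produce identical contents.
    println!("same contents: {}", filled(s) == filled_by_push(s));
}
```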


#11

Won’t this first attempt to allocate [1u8; s] on the stack?


#12

I think you don’t have enough RAM. My server has 240G, so 64G isn’t particularly hard for it.


#13

The program is still running after several hours, lol.

top - 12:23:02 up 5 days, 20:50,  2 users,  load average: 1.27, 1.16, 1.11
Tasks: 257 total,   2 running, 255 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  3.1%sy,  0.0%ni, 96.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  251913536k total, 250977296k used,   936240k free,   131700k buffers
Swap:        0k total,        0k used,        0k free, 37237212k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
47306 dapeng    20   0 64.0g  64g  804 R 99.7 26.6 452:05.73 hello_world

#14

Yep, SIGILL comes up pretty regularly (I think stack overflows also result in it, or anything that lowers to an abort intrinsic), but I’m not sure why allocating 1G gets the program SIGKILLed (unless that’s a memory-limiting feature of the playpen, which seems likely).


#15

If kill -9 doesn’t work, the program is stuck in kernel mode and never returns to user mode (the pending SIGKILL would take effect as soon as it did). Did you compile in Debug or Release mode? Does something different happen if you use the other mode?


#16

Nope! (In particular, s is a runtime value, a let binding, and stack allocation cannot be based on non-constant sizes).


#17

I only tried release mode, as in cargo build --release.


#18

The thing that bothers me the most is: why the change_protection kernel call?

I noticed that it doesn’t kick in immediately, only after total RAM usage rises above a certain threshold.


#19

Since a Vec holds a buffer whose size is not known at compile time, that buffer has to be allocated dynamically, which means it will always be put on the heap, unless you’re allocating into a custom memory arena (custom allocator).
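One way to see this: the Vec value itself is just a small fixed-size handle (pointer, capacity, length), no matter how large the heap buffer behind it is. A minimal sketch, with a made-up helper name:

```rust
use std::mem::size_of_val;

/// Returns the size in bytes of the Vec handle itself (not its heap buffer)
/// for a Vec created with the given capacity. (Hypothetical helper.)
fn handle_size(capacity: usize) -> usize {
    let v: Vec<u8> = Vec::with_capacity(capacity);
    size_of_val(&v)
}

fn main() {
    // The handle has the same size regardless of how much it points to.
    println!("16 B buffer:  handle is {} bytes", handle_size(16));
    println!("1 MiB buffer: handle is {} bytes", handle_size(1024 * 1024));
}
```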


#20

The vec![1u8; s] syntax does look similar to the [1u8; s] syntax for creating a stack-allocated array, but they are completely different. vec![1u8; s] is actually a macro invocation, equivalent to vec!(1u8; s). The square brackets are just a choice of macro delimiter that makes it look nice.
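A short sketch of the difference (names are made up for this example): the array form needs a compile-time constant length, while the macro accepts a runtime value and gives the same result with either delimiter.

```rust
/// Builds the same Vec with both delimiter styles; `s` is a runtime value.
/// (Hypothetical helper.)
fn both_forms(s: usize) -> (Vec<u8>, Vec<u8>) {
    let with_brackets = vec![1u8; s];
    let with_parens = vec!(1u8; s);
    (with_brackets, with_parens)
}

fn main() {
    // An array length must be a constant known at compile time.
    const N: usize = 4;
    let on_stack: [u8; N] = [1u8; N];

    let (a, b) = both_forms(N);
    println!("array len = {}, vec forms equal: {}", on_stack.len(), a == b);
}
```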