Segmentation Fault when using Uninitalized Vec(s)

Here is the snippet of code.

let mut image = vec![Vec::<Luv>::with_capacity(width); height];
for strip in &mut image {
      unsafe {
         strip.set_len(width);
     }
    strip[0] = (0.0, 0.0, 0.0).into(); // Causes Segmentation Fault
}

I'd been trying to understand why there's an issue here for the past few hours. I'd like to use uninitialized Vec so that I store the correct values later but it doesn't seem to work as intended.

Assigning to a place drops the old value. If the old value is uninitialized, that's UB.

(By the way, even creating a reference to an uninitialized value is already UB.)

4 Likes

It probably causes a segmentation fault because assigning to an initialised location drops what was already in that location. More generally, using set_len() to set the length beyond the portion of the vector that has been initialised is undefined behaviour, as documented in the second point under "Safety" in the documentation for set_len().

If you want to work with the uninitialised portion of the Vec, you can gain access to it in a safer way with spare_capacity_mut(). Working with it will involve some unsafe operations, but MaybeUninit makes clear where the unsafety is.

However, there is an entirely safe way to do the initialisation you want here, and that is to just use push(), which shouldn't reallocate because you have already set the capacity.

10 Likes

I'm bit of a noob but I can understand what's the issue here, especially if we're going types that implement complex drop(s). But, I'm curious as to why it caused a segmentation fault when it was merely a tuple containing floats.

It depends on what is Luv, but in general you shouldn't try to reason with UB, anything could happen.

3 Likes

In general I agree that one usually should not bother reasoning about why a program with UB did something undesired in one particular way. Once UB has been encountered, anything can happen; the UB is defined at the language level, and not in terms of anything lower like asm or machine code. And in fact, you were lucky to get a segfault that unambiguously demonstrated a bug, in contrast with a garbage program that seems to work most of the time.

That being said, this time it is actually highlighting a logic flaw. The way vec![expr; N] works is to clone the expr as many times necessary to create N entries in the Vec. So if the height is 10 here, you create one Vec::with_capacity(width) and then clone it 9 times.

The problem is that when you clone a Vec, it doesn't create a new Vec with the same capacity, it creates a new Vec with the same elements. You're ending up with 9 Vecs with 0 capacity and one with width capacity.

Set the capacity inside the loop for each inner Vec instead.

(And then, just use push and avoid unsafe.)

14 Likes

That seems to be the reason for the segfault. I expected the first inner vector to have a capacity of width, but it seems to be the last one:

fn main() {
    let mut x = vec![Vec::<u8>::with_capacity(4); 3];
    
    // runs fine 
    
    let y = &mut x[2];
    
    unsafe {
        y.set_len(4);
    }
    
    y[0] = 0u8;
    
    // segfaults
    
    let y = &mut x[0];
    
    unsafe {
        y.set_len(4);
    }
    
    y[0] = 0u8;
}

Playground.

Miri
error: Undefined Behavior: dereferencing pointer failed: 0x1[noalloc] is a dangling pointer (it has no provenance)
   --> /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/slice/raw.rs:145:9
    |
145 |         &mut *ptr::slice_from_raw_parts_mut(data, len)
    |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ dereferencing pointer failed: 0x1[noalloc] is a dangling pointer (it has no provenance)
    |
    = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
    = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
    = note: BACKTRACE:
    = note: inside `std::slice::from_raw_parts_mut::<'_, u8>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/slice/raw.rs:145:9: 145:55
    = note: inside `<std::vec::Vec<u8> as std::ops::DerefMut>::deref_mut` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/vec/mod.rs:2617:18: 2617:72
note: inside `main`error: Undefined Behavior: dereferencing pointer failed: 0x1[noalloc] is a dangling pointer (it has no provenance)
   --> /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/slice/raw.rs:145:9
    |
145 |         &mut *ptr::slice_from_raw_parts_mut(data, len)
    |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ dereferencing pointer failed: 0x1[noalloc] is a dangling pointer (it has no provenance)
    |
    = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
    = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
    = note: BACKTRACE:
    = note: inside `std::slice::from_raw_parts_mut::<'_, u8>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/slice/raw.rs:145:9: 145:55
    = note: inside `<std::vec::Vec<u8> as std::ops::DerefMut>::deref_mut` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/vec/mod.rs:2617:18: 2617:72
note: inside `main`
   --> src/main.rs:22:5
    |
22  |     y[0] = 0u8;
    |     ^^^^

note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace

error: aborting due to previous error
   --> src/main.rs:22:5
    |
22  |     y[0] = 0u8;
    |     ^^^^

note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace

error: aborting due to previous error

That's because leaving the original item as the last one makes it possible to avoid a clone: you clone the first n-1 elements, and the last one will be moved by-value.

1 Like

Ah yes of course. We can't put it in the first element or it would've been moved before we created our clones.

It would be possible to move first and then clone from after the move…

fn repeat<T: Clone>(value: T, n: usize) -> Vec<T> {
    let mut vec = Vec::with_capacity(n);
    if n > 0 {
        vec.push(value);
        for _ in 1..n {
            vec.push(vec[0].clone());
        }
    }
    vec
}

…but there's no specific advantage to doing it that way unless you care about the move vs. clone, which is rare and not guaranteed, and I would guess it likely has a slower memory access pattern.

2 Likes

In general, the issue with UB is that the compiler, in the name of performant or reasonable code on the output side, is allowed to assume that it never happens. If you write a program where UB does happen, the compiler output is no longer guaranteed to be reasonable - optimizations may still be applied based on the behaviour staying in the defined regime, even when that isn't the case.

(Proving that the program only contains defined behaviour, without the help of the constraints of safe rust, is probably Turing-complete.)

It's fairly common for programmers to be surprised that this has real and material consequences for programs with "trivial" UB, where a rote translation of the code to an assembly language might produce a "reasonable"-seeming result. Compilers don't work that way - and haven't since the 70s, really, because the resulting programs from that kind of translation are massively too slow and too large. Real compilers, including rustc, take advantage of guarantees about valid programs to make complex and far-reaching changes to your code, safe in the knowledge that those changes will produce an equivalent output for any given valid input program. If you feed them an invalid program, well, caveat emptor.

1 Like

It's undecidable.

3 Likes

I stand thoroughly corrected. Thank you.

The problem here is that in the absence of formal definition it's very hard to even say what is reasonable or not.

Most often such developers try to push the idea that anything that works reliably with gcc -O0 should be supported. The next question to ask is to clarify how something like this would be supported:

int set_x(int x) {
    int stack_slot = x;
}

int add_y(int y) {
    int stack_slot;
    return stack_slot + y;
}

int main() {
    set_x(3);
    int result = add_y(7);
    printf("%d\n", result);
}

It works (and returns 10) with clang, gcc and many, many, MANY other compilers (with optimizations disabled), it have to be supported, right?

Usually at this point discussion degenerates into centithread where “we code for the hardware” folks explain how:

  1. Every “real” program which works with -O0 must be supported.
  2. The program I wrote is not not “real” and thus shouldn't be supported.

Attempts to find out how “real” programs are different from “unreal” either lead to nowhere, or, worse yet, end up with an offer to add bazillion options to the compiler to make sure everyone may pick their own definition.

Rust solves the problem in two steps:

  1. People like “we code for the hardware” C folks are kicked out.
  2. Compiler developers continously try to change both language definition and standard library to make it easier to write UB-free code, when possible and feasible and also provide tools to help detect UB.

#1 and #2 are both equally important. Without #1 we would have repeat of the C/C++ story (where you may never know if something which was tested and working for years would stop doing so), without #2 it's not really feasible to have #1.

3 Likes

To address the initial code snippet, I hope you have considered an iterator for generating the items.

1 Like

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.