Why do I get a segmentation fault?

Can anyone explain why I am getting a segmentation fault with the following code?

This is a learning exercise. I am poking around to understand the language at a more fundamental level.

I am trying to build a struct that holds both a value and a pointer to it. Why would I want to do that? So that I have a mutable pointer to something whose ownership is also linked to the ownership of the pointer. But again, I am really just looking to understand what is going on, not for alternatives. :)

Imports
use std::sync::Arc;
use std::sync::atomic::{AtomicPtr, Ordering};
use std::thread;
use std::thread::JoinHandle;
use std::time::Duration;
use rand::Rng;
The Struct
    struct Foo<T> {
        pointer: AtomicPtr<T>,  
        value: T,
    }

    impl<T> Foo<T> {
        fn new(mut t: T) -> Foo<T> {
            Foo {
                pointer: AtomicPtr::new(&mut t), // Attention here 
                value: t,                        // Attention here
            }
        }

        pub fn give(&self) -> &T {
            let res = self.pointer.load(Ordering::SeqCst);

            unsafe { &*res }
        }
    }
Main method
fn kaboom() -> JoinHandle<()> {
    let abc_vector = vec!["a".to_string(), "b".to_string(), "c".to_string()];
    let foo = Foo::new(abc_vector);
    let foo_arc = Arc::new(foo);

    let arc_to_thread = foo_arc.clone();
    let thread_handle = thread::Builder::new().name(format!("SomeThread"))
        .spawn(move || {
            let mut iter = 0;
            loop {
                iter += 1;
                let num = rand::thread_rng().gen_range(1..5);
                thread::sleep(Duration::from_millis(num * 1));
                let res = arc_to_thread.give();  // I am only reading!
                println!("The content is: {:?} ", res);      // I am only reading!
                if iter == 5 { break; }
            }
        }).expect("Can't create thread ...");
    thread::sleep(std::time::Duration::from_millis(10));
    thread_handle
}

fn main() {
    let thread_handler = kaboom();
    println!("I help debug....");
    std::thread::sleep(std::time::Duration::from_millis(200));
    thread_handler.join().unwrap();
}

Playground link

  • I have inserted a few thread::sleep and println! calls in order to force the segmentation fault to appear.
  • Because the code is non-deterministic, the segmentation fault doesn't always happen.

It feels like the abc vector I pass in is somehow being destroyed before the thread is done using it.
When I implemented the Drop trait (not shown) with a println, the code was failing (seg fault) before running the drop function.

This is my understanding:

  1. abc_vector is created [thread main]
  2. The vector is moved into our struct Foo. [thread main]
  3. The foo value is moved into the Arc [thread main]
  4. The Arc with our foo and our vector is cloned and moved inside the other thread [thread main].
  5. This other thread starts looping around and (only) reading the shared reference [thread SomeThread]
  6. Concurrently to 5, the method kaboom exits. [thread main]
    6.1 - Because the original vector was moved, the value should not be destroyed when the original variable goes out of scope.
    6.2 - foo_arc is destroyed when kaboom exits, however, its clone arc_to_thread is moved inside thread SomeThread and therefore should not run the drop function when kaboom exits.
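
Steps 3 to 6 can be checked in isolation with a small sketch like the one below. It leaves out the raw pointer entirely and only tracks when the shared value is dropped. `Tracked` and `drop_timeline` are hypothetical names made up for this illustration; they are not part of the code above.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical helper type: records in a shared flag when it is dropped.
struct Tracked {
    dropped: Arc<AtomicBool>,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        self.dropped.store(true, Ordering::SeqCst);
    }
}

/// Returns (was the value dropped after the first Arc handle went away,
/// was it dropped after the last handle went away).
fn drop_timeline() -> (bool, bool) {
    let flag = Arc::new(AtomicBool::new(false));
    let original = Arc::new(Tracked { dropped: flag.clone() });
    let clone_for_thread = Arc::clone(&original);

    // Stands in for foo_arc going out of scope at the end of kaboom.
    drop(original);
    let after_first = flag.load(Ordering::SeqCst);

    // Stands in for SomeThread finishing and releasing its clone.
    drop(clone_for_thread);
    let after_last = flag.load(Ordering::SeqCst);

    (after_first, after_last)
}

fn main() {
    let (after_first, after_last) = drop_timeline();
    println!("dropped after first handle: {after_first}, after last handle: {after_last}");
}
```

If this behaves as expected, the value stays alive as long as any Arc clone exists, so the reasoning in 6.1 and 6.2 holds and the problem has to be somewhere other than the Arc.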

Where is my understanding failing me ?

Please make a playground version. Then you can easily run it in Miri, which will immediately tell you the UB.

You saved a pointer to the parameter of the method. That's invalidated as soon as the function returns, so it's fundamentally not useful.

Self-referential structs in Rust basically don't work, because structs can be moved, and if they're moved the pointers get invalidated. You should generally try a different design.

(Or, if you must, there is the owning_ref crate, listed on Lib.rs.)
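
As one possible shape for a "different design", here is a minimal sketch that keeps the value in its own heap allocation, so that moving the struct only moves the pointer and never the value. `HeapFoo` and `read_len_on_thread` are hypothetical names for this illustration, and the `PhantomData` field is an assumption added so that the struct is only `Send`/`Sync` when `T` is. It is not the original poster's code and not the only way to do this.

```rust
use std::marker::PhantomData;
use std::sync::atomic::{AtomicPtr, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical variant of Foo: the value lives on the heap and the struct
/// stores only the pointer to it.
struct HeapFoo<T> {
    pointer: AtomicPtr<T>,
    _owns: PhantomData<T>, // marks logical ownership of a T
}

impl<T> HeapFoo<T> {
    fn new(t: T) -> HeapFoo<T> {
        HeapFoo {
            // Box::into_raw hands us the heap address and disables the Box's destructor.
            pointer: AtomicPtr::new(Box::into_raw(Box::new(t))),
            _owns: PhantomData,
        }
    }

    fn give(&self) -> &T {
        let res = self.pointer.load(Ordering::SeqCst);
        // SAFETY: the pointer came from Box::into_raw in `new`, is never replaced,
        // and is only freed in `drop`, which cannot run while `&self` is borrowed.
        unsafe { &*res }
    }
}

impl<T> Drop for HeapFoo<T> {
    fn drop(&mut self) {
        // SAFETY: reconstructs the Box created in `new` exactly once.
        unsafe { drop(Box::from_raw(*self.pointer.get_mut())) }
    }
}

/// Reads the vector's length from another thread.
fn read_len_on_thread(foo: Arc<HeapFoo<Vec<String>>>) -> usize {
    thread::spawn(move || foo.give().len())
        .join()
        .expect("reader thread panicked")
}

fn main() {
    let foo = HeapFoo::new(vec!["a".to_string(), "b".to_string(), "c".to_string()]);
    let shared = Arc::new(foo); // moves `foo`, but not the heap allocation it points to
    println!("The content is: {:?}", shared.give());
    println!("length seen by the thread: {}", read_len_on_thread(Arc::clone(&shared)));
}
```

The struct no longer points into itself, which is why moving it into the Arc is harmless here.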


Thanks @scottmcm .

Where can I read more about that?
I doubt this information is in "The Rust Programming Language"?
Maybe the Rustonomicon? What other sources, related to Rust or maybe C++, would you recommend?

(I updated the post with a playground link)

On a more critical note,

Why is the pointer invalidated as soon as the function returns?
The parameter t refers to a value which is being moved. If the address of such value hasn't changed, shouldn't the pointer still point to valid memory?

The address did change. That's what it means to move it.

Perhaps you're used to languages like Java where (almost all) types are actually just references to the heap?


t refers to a value which is being moved

If the address of such value hasn't changed

It has, though. Moving the value changes the address (or at least reserves the right to). Every move is effectively a memcpy that may or may not be elided by LLVM.

In fact, I believe your pointer is invalidated twice before the function returns. Once when t is moved into Foo and again when Foo is returned.
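
To watch this happen, a sketch like the following records the address of the value at each stage. The names `addr`, `Plain`, `build` and `boxed_addr_differs` are hypothetical helpers for this illustration. Whether the first three addresses differ depends on the build and on what the optimizer elides, so nothing is claimed about them. The only case the sketch relies on is a move from the stack to the heap, where the address has to change.

```rust
/// Address that `r` points at, as a plain integer so it can outlive the borrow.
fn addr<T>(r: &T) -> usize {
    r as *const T as usize
}

struct Plain<T> {
    value: T,
}

/// Mirrors the shape of Foo::new: returns the address of the parameter, the
/// address of the field after the first move, and the struct itself.
fn build(t: Vec<String>) -> (usize, usize, Plain<Vec<String>>) {
    let param_addr = addr(&t);
    let plain = Plain { value: t }; // first move: parameter into the field
    let field_addr = addr(&plain.value);
    (param_addr, field_addr, plain) // second move: struct out to the caller
}

/// Moves a stack value into a Box and reports whether its address changed.
fn boxed_addr_differs(t: Vec<String>) -> bool {
    let stack_addr = addr(&t);
    let boxed = Box::new(t); // move from the stack to the heap
    stack_addr != addr(&*boxed)
}

fn main() {
    let abc = || vec!["a".to_string(), "b".to_string(), "c".to_string()];

    let (param_addr, field_addr, plain) = build(abc());
    println!("parameter t inside build: {param_addr:#x}");
    println!("field inside build:       {field_addr:#x}");
    println!("field in the caller:      {:#x}", addr(&plain.value));

    println!("address changed when boxed: {}", boxed_addr_differs(abc()));
}
```

A pointer taken at the first stage, as in `AtomicPtr::new(&mut t)`, refers to the parameter's slot, not to wherever the value ends up afterwards.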


This information is a basic truth of how memory works. You aren't allowed to reference variables after they are out of scope (and potentially filled with garbage) in C or C++ either.

The same issue has been brought up many times here. The last one was a week ago. Please use the search.

I don't think this tone is necessary or helpful. If you don't want to answer redundant questions, then simply don't answer them. This isn't Stack Overflow.

Moreover, I frequently see you take this tone with people, and I personally don't appreciate it. Please try to be less abrasive.


Thanks all for the feedback. I now understand the issue much better.

Perhaps you're used to languages like Java where (almost all) types are actually just references to the heap?

Indeed, I come from Java.

It has, though. Moving the value changes the address (or at least reserves the right to). Every move is effectively a memcpy that may or may not be elided by LLVM.

This broke the mental model I had.
I thought one of the core ideas was that the compiler would be able to smartly 1) use the same memory address, and just 2) prevent the old variable that had ownership of the value from being used.
In other words, nothing would change at runtime; it was all just compile-time enforced constraints.

In that case, isn't a move not that different from a clone? The difference being that after a clone, the original variable is still valid.
Doesn't this mean that potentially a lot of data has to be copied on every move?
Well, I guess if the data being moved are all smart pointers (like Vec and String), then the amount of data copied is very small?

What "jargon/terms" would I need to search on the web to find resources that delve into this in more detail?

The compiler can choose to do this, but this is an optimization and is thus not guaranteed.

This would be the case if you didn't use unsafe. unsafe lets you perform operations the compiler cannot check, such as dereferencing a raw pointer, which is why it was unable to catch this issue.


Asking people to use the search is not abrasive. Searching before asking is basic netiquette.


This is where the optimizer comes in, as bjorn3 mentions. In the abstract machine, yes, it's moved every time. When it comes to the machine code that's actually generated, correct code can't tell whether it was moved, so the optimizer is allowed to not physically move it if possible.

And between inlining and SRoA and such, what the final machine code looks like often has very little resemblance to the original source code.

If you're curious to learn more about all the things LLVM can do -- it's the tech that does the optimizations for rustc -- you might like https://youtu.be/FnGCDLhaxKU

Yes. size_of::<String>() == 3 * size_of::<usize>(), so moving a string is just moving those three words, not moving the contents behind the pointer on the heap.
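
A small sketch to check this on your own machine follows; the function names are hypothetical helpers for this illustration. The three-word size reflects the current standard library layout rather than a documented guarantee. The sketch also contrasts a move, which keeps the same heap buffer, with a clone, which allocates a new one.

```rust
use std::mem::size_of;

/// Number of machine words in a String handle (pointer, length, capacity).
fn string_handle_words() -> usize {
    size_of::<String>() / size_of::<usize>()
}

/// Moves a Vec and reports whether its heap buffer stayed at the same address.
fn buffer_survives_move(v: Vec<String>) -> bool {
    let before = v.as_ptr();
    let moved = v; // move: only the handle is copied
    before == moved.as_ptr()
}

/// Clones a non-empty Vec and reports whether the clone got its own heap buffer.
fn clone_gets_own_buffer(v: Vec<String>) -> bool {
    let copy = v.clone(); // clone: new allocation plus a copy of every element
    copy.as_ptr() != v.as_ptr()
}

fn main() {
    let abc = || vec!["a".to_string(), "b".to_string(), "c".to_string()];

    println!("words in a String handle: {}", string_handle_words());
    println!("heap buffer unchanged by a move: {}", buffer_survives_move(abc()));
    println!("clone has its own heap buffer: {}", clone_gets_own_buffer(abc()));
}
```

So a move copies only the small handle, while a clone also duplicates everything behind the pointer.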

