Strange UB in slice::from_raw_parts()

I met a strange UB on my project and I can't fix it on my own.
In short, it seems to relate to slice::from_raw_parts() because the issue only occurs when the slice size is very large (about to 10,000,000 bytes). The strange point is that the deref results of two same raw pointers are different. However, in my codes, there is only a return between them ( I'm sure that the underlying data's lifetime is long enough).

The issue is detailed described on this PR, including some introduction and the steps to reproduce.

Edit: this has nothing to with slice::from_raw_parts. It's caused by dangling pointers.

Have you tried running the test in miri? That is a tool which interprets your code and detects a lot of UB while doing so.

@bjorn3 Thank you for the reply. I tried miri for several times but it seems not working well on my project ( maybe just because of too many unsafe usage on my code :frowning: )

What does "not work well" mean?

Can you point us to the offending code which creates the slice?

Also, can you describe exactly what you feel is unsound/UB? So, what are you seeing and how does it break safety invariants? (e.g. you've got a dangling pointer, you're treating an array of 50 elements as an array of 51 elements and can read past the end of the allocated memory, etc.)

From what I can tell that PR just shows a log message saying two slices aren't equal, but without knowing why we expect them to be equal it's hard to know what is going wrong.

Can you add something like assert_eq!(pointer_a, pointer_b) to make sure they're pointing at the same thing? Comparing raw pointers for equality will tell you whether their address is the same.

If the addresses and lengths passed to std::slice::from_raw_parts() are the same but the two slices aren't equal, there's a good chance your pointers are derived from uninitialized memory and LLVM has chosen to optimise the code however it chooses.

"What The Hardware Does" is not What Your Program Does: Uninitialized Memory has an example which shows how reading uninitialized memory (which is UB) can cause all sorts of funny things to happen:

use std::mem;

fn always_returns_true(x: u8) -> bool {
    x < 150 || x > 120
}

fn main() {
    let x: u8 = unsafe { mem::uninitialized() };
    assert!(always_returns_true(x));
}
2 Likes

@Michael-F-Bryan Thank you for the reply.

From what I can tell that PR just shows a log message saying two slices aren't equal, but without knowing why we expect them to be equal it's hard to know what is going wrong.

The "why" is that the codes are performing like below:

fn return_a() -> Slice {
    // do something
    dbg!(&a.as_slice());
    return a
}

fn main() {
    let a  = return_a();
    dbg!(a.as_slice())
}

Then I think the outputs of two dbg! should be the same.
But actually they are different when a is a very "large" Slice. The Slice is something to perform like an owned &[u8] without lifetime bound (I know, this is very dangerous) and it can be converted into a &[u8] by slice::from_raw_parts():

#[derive(Clone, Eq)]
pub struct Slice {
    data: *const u8,
    size: usize,
}

impl Slice {
  #[inline]
    pub fn as_slice(&self) -> &[u8] {
        if !self.data.is_null() {
            unsafe { slice::from_raw_parts(self.data, self.size) }
        } else {
            panic!("try to convert a empty(invalid) Slice as a &[u8] ")
        }
    
}

In the PR I mentioned in the description, the logs are printing the results of a.as_slice().

If the addresses and lengths passed to std::slice::from_raw_parts() are the same but the two slices aren't equal, there's a good chance your pointers are derived from uninitialized memory and LLVM has chosen to optimise the code however it chooses

I really agree with this and carefully checked whether the underlying data's lifetime of which the Slice is point to is not long enough to avoid referring uninitialized memory or encounter such a situation using dangling pointer. Maybe I just missed something.

Where does the data pointed to by a come from?

If the Slice that you return references a local variable then you'll be returning a dangling pointer to the caller (the data pointed to by a gets destroyed before return_a returns).

The underlying data is hold by a struct (using a Rc). So it's just like

struct B {
     data: Rc<Vec<u8>>
}

impl B {
    fn return_a(&self) -> Slice {
        // do something
        dbg!(&a.as_slice());
        return a // a pointe to B.data
    }
}

fn main() {
    let b  = B {data: Rc::new(vec![1u8, 2u8])}
    let a  = b.return_a();
    dbg!(a.as_slice())
}

So there should be no dangling pointer I think ?

Do you add items to data between calling B::return_a() and inspecting a? If so, it may be that the vector needed to expand and reallocate, invalidating any pointers to data's previous buffer. That could also explain why it only happens when the size gets large.

You've defined data as Rc<Vec<u8>> though, so it shouldn't be possible to mutate data...

@Michael-F-Bryan Hey! I think I found the "way" to get through this!
After reviewing my code one more time, I just found that the underlying Rc<Vec<u8>> is hold by a LRU cache with cap 8<<20. And when the data size is very large( 10_000_000 in that case), it's just evicted by that LRU, which means the underlying data is dropped. And after I increase my cache cap to the size bigger than 10_000_000, all things are green. As I said, I just missed "that thing" :joy:

Anyway thank you again for reminding me to analyze this problem step by step.

/// Slice is a simple structure containing a pointer into some external
/// storage and a size.  The user of a Slice must ensure that the slice
/// is not used after the corresponding external storage has been
/// deallocated.

That contract should require unsafe for anything that accesses the potentially-dangling storage.

9 Likes

I don't think that's the correct solution... Your Slice allows users to read from a pointer without providing any guarantees that the data being pointed to is valid. Your problem is that this abstraction is unsound, increasing the LRU cache's size just hides the symptoms.

If you want to do it properly, you might want Slice to contain a reference counted pointer instead. To take into account the fact that items may be removed from the cache (and therefore dropped), you should use something like std::sync::Weak.

Another option is to just use normal references and let the borrow checker make sure you don't try to access free'd memory.

7 Likes

@Michael-F-Bryan

Your problem is that this abstraction is unsound, increasing the LRU cache's size just hides the symptoms.

This abstraction is a pain in my project and Slice has already caused several issues before this one. And there is already an issue to track the progress removing this. I tried several times to remove this but just keep getting frustrated by the borrow checker (really guilty about this). And another parts were not tested well so I decided to remain Slice until tests become stronger.

Using references sounds like a good idea (maybe just try something like Bytes?). I'll reconsider the design from now.

1 Like

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.