Undefined Behavior: in-bounds pointer arithmetic failed: attempting to offset pointer by 20 bytes, but got alloc238 which is only 1 byte from the end of the allocation

Hello, I am confused by this error from Miri :

Compiling playground v0.0.1 (/playground)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 1.45s
     Running `/playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/cargo-miri runner target/miri/x86_64-unknown-linux-gnu/debug/playground`
error: Undefined Behavior: in-bounds pointer arithmetic failed: attempting to offset pointer by 20 bytes, but got alloc238 which is only 1 byte from the end of the allocation
   --> src/main.rs:124:24
    |
124 |             let next = ptr.add(size);
    |                        ^^^^^^^^^^^^^ Undefined Behavior occurred here
    |
    = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
    = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
help: alloc238 was allocated here:
   --> src/main.rs:22:25
    |
 22 |             let start = std::alloc::alloc(layout);
    |                         ^^^^^^^^^^^^^^^^^^^^^^^^^
    = note: stack backtrace:
            0: <&HeapArena as std::alloc::Allocator>::allocate
                at src/main.rs:124:24: 124:37
            1: alloc::raw_vec::RawVecInner::<&HeapArena>::try_allocate_in
                at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/raw_vec/mod.rs:465:41: 465:63
            2: alloc::raw_vec::RawVecInner::<&HeapArena>::with_capacity_in
                at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/raw_vec/mod.rs:434:15: 434:92
            3: alloc::raw_vec::RawVec::<i32, &HeapArena>::with_capacity_in
                at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/raw_vec/mod.rs:177:20: 177:77
            4: std::vec::Vec::<i32, &HeapArena>::with_capacity_in
                at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/vec/mod.rs:977:20: 977:61
            5: main
                at src/main.rs:143:23: 143:55

note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace

error: aborting due to 1 previous error

Standard Output

My code is like this :

#![feature(allocator_api)]

pub struct Block {
    ptr: *mut u8,
    cursor: *mut u8,
    end: *mut u8,
    layout: std::alloc::Layout,
    next: Option<Box<Block>>
}

pub struct HeapArena {
    head: std::cell::UnsafeCell<Block>
}

impl HeapArena {
    #[inline(always)]
    pub fn new(size: usize) -> Self {
        
        let layout = unsafe { std::alloc::Layout::from_size_align_unchecked(size, std::mem::align_of::<usize>()) };

        let start = unsafe { std::alloc::alloc(layout) };
            if start.is_null() {
                std::alloc::handle_alloc_error(layout);
            }

            Self {
                head: std::cell::UnsafeCell::new(Block {
                    ptr: start,
                    cursor: start,
                    end: unsafe { start.add(size) },
                    layout: layout,
                    next: None
                })
            }
        
    }
    
    unsafe fn grows(&self, min_size: usize) {
        println!("grow");
        let head = unsafe { &mut *self.head.get() };
        let new_size = min_size.max(head.layout.size() * 2);
        let layout = std::alloc::Layout::from_size_align(new_size, 8).unwrap();
        let ptr = unsafe { std::alloc::alloc(layout) };
        if ptr.is_null() { std::alloc::handle_alloc_error(layout); }

        let old_block = unsafe { std::ptr::replace(head, Block {
            ptr,
            layout,
            cursor: ptr,
            end: ptr.add(new_size),
            next: None,
        }) };
        
        head.next = Some(Box::new(old_block));
    }

    #[inline(always)]
    pub fn reset(&mut self) {
        let mut current = unsafe { &mut *self.head.get() };
        loop {
            current.cursor = current.ptr;
            if let Some(ref mut next_block) = current.next {
                current = next_block.as_mut();
            } else {
                break;
            }
        }
    }
}

impl Drop for HeapArena {
    #[inline(always)]
    fn drop(&mut self) {
        let mut current = unsafe { std::ptr::replace(
            self.head.get(),
            Block {
                ptr: std::ptr::null_mut(),
                cursor: std::ptr::null_mut(),
                end: std::ptr::null_mut(),
                layout: std::alloc::Layout::from_size_align(1, 1).unwrap(),
                next: None,
            },
        ) };
        loop {
            if !current.ptr.is_null() {
                unsafe { std::alloc::dealloc(current.ptr, current.layout) };
            }

            if let Some(next_block) = current.next {
                current = *next_block;
            } else {
                break;
            }
        }
        
    }
}

unsafe impl std::alloc::Allocator for &HeapArena {
    #[inline(always)]
    fn allocate(
        &self,
        layout: std::alloc::Layout,
    ) -> Result<std::ptr::NonNull<[u8]>, std::alloc::AllocError> {
        let head = unsafe { &mut *self.head.get() };
        let size = layout.size();
        let align = layout.align();

        let handle_addr = head.cursor as usize;
        let aligned_addr = (handle_addr + (align - 1)) & !(align - 1);
        /* let next = aligned_addr + size;
            if next > head.end as usize {
                self.grows(size);
                return self.allocate(layout);
            }
        */
        // let ptr = aligned_addr as *mut u8;
        let ptr = head.cursor.map_addr(|_| aligned_addr);
        let next = unsafe { ptr.add(size) };
        if next > head.end {
            unsafe { self.grows(size) };
            return self.allocate(layout);
        }
        head.cursor = next as *mut u8;
            
        return Ok(unsafe {
            std::ptr::NonNull::new_unchecked(
                std::ptr::slice_from_raw_parts_mut(ptr, size),
            )
        });
        
    }

    #[inline(always)]
    unsafe fn deallocate(&self, _ptr: std::ptr::NonNull<u8>, _layout: std::alloc::Layout) {}
}

fn main() {
    let mut arena = HeapArena::new(1);
    {let mut vector = Vec::with_capacity_in(5, &arena);
    for _ in 0..100 {
        vector.push(10);
        vector.push(20);
    }
    println!("{}", vector[0]);
    }
    arena.reset();
    //println!("{}", vector[0]);
}

What is and which line is the cause of why the ptr.add() failed? Why it is failed?

And what is the solution?

Thank you for the helps

as a writer of unsafe code it is your job to read the documentation of every unsafe function you call with the utmost caution, and fully respect their safety contract.
as the error message clearly state, you have violated the safety contact of add

and you know you might violate it because you litterally have

if next > head.end

on the line below, which is never true unless you do undefined behavior.
if you haven't read it already you probbly should do it now.

if you want to succeeed at writing unsafe code, you really should stop putting unsafe blocks that cover whole function bodies, and check each unsafe function you call.

grows is pretty good, although it could use some more safety comments

reset does not need any unsafe at all

I think you mistyped top (stop)?

Thank you for the answer and the link to the documentation. I didn't read it earlier :v that is new idea for me to read every unsafe function documentation. Now I already read it

I edited my code to put unsafe block separately for each unsafe function. After more reading and analysis I guess it is because, the result of ptr.add(size) must point within the same allocated block or be exactly 1 byte past the end , I allocated with size 1, but then I am requesting space for 5 i32 (20 bytes), the code performs ptr.add(20) on a 1 byte allocation, it attempts to point outside that tiny allocation, which is immediate Undefined Behavior. Correct my understanding of it is wrong

I changed the code to calculate the remaining capacity using integer math, and only if there is space then performing the pointer increment :

fn allocate(
        &self,
        layout: std::alloc::Layout,
    ) -> Result<std::ptr::NonNull<[u8]>, std::alloc::AllocError> {
        let head = unsafe { &mut *self.head.get() };
        let align = layout.align();

        let handle_addr = head.cursor as usize;
        let aligned_addr = (handle_addr + (align - 1)) & !(align - 1);
        let size = layout.size();
        let end_addr = head.end as usize;
        if aligned_addr.checked_add(size).unwrap_or(usize::MAX) > end_addr {
            unsafe { self.grows(size) };
            return self.allocate(layout);
        }
        
        let ptr = head.cursor.map_addr(|_| aligned_addr);
        let next = unsafe { ptr.add(size) };
        head.cursor = next as *mut u8;
            
        return Ok(unsafe {
            std::ptr::NonNull::new_unchecked(
                std::ptr::slice_from_raw_parts_mut(ptr, size),
            )
        });
        
    }

Now Miri does not report any Undefined Behaviour?

How is your review?

this is much better, this will avoid the issue. this is a pretty strange condition i will admit, but unsafe functions have strange preconditions.

checking miri is definitely a great thing to do which too many ignore.

Thank you for the help

I also learned other thing from this

  • using pointer arithmetic with (+) symbol makes Miri can not detect the Undefined Behaviour. Changing it to ptr.add() helps Miri to able to see the Undefined Behaviour and report it, because the Undefined Behaviour only occures after I changed from (+) to ptr.add()

  • casting int to pointer using as, eg as *mut u8 also makes Miri not able to get more info and check it. Changing it to .map_addr() makes Miri able to check it again

btw for alignement i would use .checked_next_multiple_of(align)

Thank you, that is another new knowledge for me. I changed the bitwise to it. It works

As an additional advice I would recommend to get in a habit of providing safety comments for each unsafe block that you use. I.e.:

// SAFETY: Thorough argument explaining safety invariants required
// by `some_unsafe_fn` and why they are always upheld.
unsafe {
    some_unsafe_fn();
}

Try to have as precise and pedantic description that you can. Don't cut corners with comments "I checked and this function is safe to call". Each such comment should be a case (almost a legal case) proving without a shadow of a doubt why calling unsafe function (or performing other unsafe operation) is sound.

To get a feel for how to write safety comments just look at the standard library. I just went into the implementation of Vec<T> and linked a couple of example safety comments. See this, and this, or this or maybe that.

There is even a clippy lint, which will warn you if you forget to add safety comment to an unsafe block.

Though, personally, I prefer to be even more pedantic than std, and it has plenty of uncommented unsafe blocks. (I forgive it on the basis that std is presumably the most audited Rust crate.)

Not only it's the most audited crate, but, more importantly, it includes pieces that were written way before current standards were established by necessity.

So that's std is both an example how to do things and also small amount of of “legacy” that couldn't be fixed “because compatibility” (it may even be problem with usafe comments: it's nice to have clear and concise comments for each use of unsafe function — but it's better to clearly have no comments than incorrect comments… at least when comments are absent it's obvious issue, when they are incorrect — it's much harder to notice and someone may rely on incorrect comments which would make situation even harder to fix).

Thank you, I will add safety comment for sure after all thing is done. I didn't do it earlier because it's still very experiment by myself that changes very fast plus I didn't know enough about the unsafe function

Hello, I get new question in this topic

If I have

let arena = HeapArena::new(1024 * 8);
let mut vec = Vec::with_capacity_in(2, &arena);
vec.push(String::from("a"));
vec.push(String::from("a"));

When the arena is dropped, will it also run the drop function of string despite string comes from global allocator?

If not, how to trigger the drop function of containers that are allocated from global allocator that are saved inside containers allocated in the arena when the arena is dropped?

I guess it could depend on the specific area you're using, but I think it's common that this isn't supported. Instead you use Strings, collections, boxes, etc, that are made for allocation in the arena. For example, for the popular Bumpalo arena see: bumpalo - Rust

With Bumpalo there is also a way to box a value that will be dropped: bumpalo::boxed - Rust

Thank you, that is nice info

Initially I want to also drop containers from global allocator if it is saved inside container from arena together when arena is dropped to have really a single drop run despite some of them come from global allocator. Do you think it can be achieved by saving the pointer of String::new() etc, then run drop on said pointer? Does single drop run together has higher performance than multiple drop run?

For now at least it does not leak memory. The system already prevent calling drop if there is container still point to arena. Eg :

let arena = HeapArena::new(1024);

let mut vec = Vec::with_capacity_in(2, &arena);
v.push(10);

drop(arena);

println!("{}", vec[0]);

I checked if the drop function from each std collection is still run after out of scope. It is indeed still run

#[derive(Debug)]
struct Tes(&'static str);

impl Drop for Tes {
    fn drop(&mut self) {
        println!("Drop function of container from global allocator is run");
    }
}

fn main() {
    let arena = HeapArena::new(1024 * 8);
    let mut vec = Vec::with_capacity_in(2, &arena);
    vec.push(Tes("a"));
    vec.push(Tes("a"));
    
    println!("{:?}", vec[0]);
}

And I added println to the arena's drop function

The output is :

Tes("a")
Drop function of container from global allocator is run
Drop function of container from global allocator is run
arena's drop function is run

The drop function of each std collection will also run the drop function of other std collection that is saved inside it

The drop function of each std collection is guaranted to run first before the drop function of the arena

So there is no memory leak

Is that correct?