Miri identified UB with MaybeInit::assume_init(), but I'm not sure why

I have ported the better part of the BBolt database into Rust. I challenged myself to seek as little help as I can and after almost 7000 loc I think I've done pretty well.

I need to track down some undefined behavior. I'm sure I violated rules somewhere. Miri says I broke a rule, but I don't see how.

test tx::test::test_tx_create_bucket ... error: Undefined Behavior: trying to retag from <326266> for SharedReadWrite permission at alloc99841[0x0], but that tag does not exist in the borrow stack for this location
    --> src/tx.rs:204:5
     |
204  |     self.split_r().db.page(id)
     |     ^^^^^^^^^^^^^^^^^
     |     |
     |     trying to retag from <326266> for SharedReadWrite permission at alloc99841[0x0], but that tag does not exist in the borrow stack for this location
     |     this error occurs as part of retag at alloc99841[0x0..0x18]
     |
     = help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
     = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
help: <326266> was created by a SharedReadWrite retag at offsets [0x0..0x18]
    --> src/common/memory.rs:63:11
     |
63   |     BCell(a.alloc((RefCell::new(t), b)))

My memory and code model works like this:

  • Transactions are strictly single threaded
  • All transaction memory is stored in a Bumpalo bump that's owned by the tx object
  • All nodes owned by the bump are available via a RefCell in wrapped in the BCell struct.
  • Each nodes with logic have a 'Cell' struct (example) wrapping it that provides the logic so I can manage the RefCell borrows. The BBolt database calls functions on both parent and child nodes and this was my workaround. I can easily copy the Go code without falling afoul of the borrow tracker.
  • The transaction is created using MaybeUninit since we need to pin the Bumpalo, database RwLock, and the transaction cell.
  • I have separate structs for the Read and RW transactions (an experiment that seemed like a good idea at the time!). To keep the code duplication to a minimum and follow the Go code as closely as possible, I use the SplitRef trait to give the internal API trait code access to the read data and optionally the write data in the same function.

I'm currently shaking down the code and fixing bugs. In the last day my tests have started to fail in nonsense ways. Today I rewrote the database backend so it's able to just use a big array so I can determine the cause with Miri.

Back to my error:

The transaction is created by pinning the bump first, the database lock second, and then it creates the transaction cell that will be used by the rest of the transaction. There is a cycle between the transaction cell and the root bucket cell so I'm using Rc.

What I've done runs afoul of Miri's rules when assume_init() is called to finish creating the transaction, but I don't see how.

I do have an unsafe version I can use to create the cycle, but I haven't tested it outside of a simple unit test and I'm not sure if that would cause the same problem or not.

No talk of code smells or re-architecting the code, please. I'm about out of PTO and rewriting several thousand lines of code isn't something I have time for.

Please help me save my holidays project!

1 Like

Looking at this

pub struct TxRwImpl<'tx> {
    bump: Pin<Box<LinearOwnedReusable<Bump>>>,
    db: Pin<Box<DbGuard<'tx>>>,
    pub(crate) tx: Pin<Rc<TxRwCell<'tx>>>,
}

impl<'tx> TxRwImpl<'tx> {
    pub(crate) fn get_ref(&self) -> TxRwRef<'tx> {
        TxRwRef {
            tx: TxRwCell { cell: self.tx.cell },
        }
    }

    pub(crate) fn new(
        bump: LinearOwnedReusable<Bump>, lock: RwLockWriteGuard<'tx, DBShared>,
    ) -> TxRwImpl<'tx> {
        let mut meta = lock.backend.meta();
        meta.set_txid(meta.txid() + 1);
        println!("trace~tx.new id: {:?}", meta.txid());
        let page_size = meta.page_size() as usize;
        let inline_bucket = meta.root();
        let mut uninit: MaybeUninit<TxRwImpl<'tx>> = MaybeUninit::uninit();
        let ptr = uninit.as_mut_ptr();
        unsafe {
            addr_of_mut!((*ptr).bump).write(Box::pin(bump));
            let bump = &(**addr_of!((*ptr).bump));
            addr_of_mut!((*ptr).db).write(Box::pin(DbGuard::Write(RefCell::new(lock))));
            let db = &(**addr_of!((*ptr).db));
            let tx = Rc::new_cyclic(|weak| {
                let r = TxR {
                    b: bump,
                    page_size,
                    db,
                    meta,
                    stats: Default::default(),
                    is_rollback: false,
                    p: Default::default(),
                };
                let w = TxW {
                    pages: HashMap::new_in(bump),
                    commit_handlers: BVec::new_in(bump),
                    p: Default::default(),
                };
                let bucket = BucketRwCell::new_in(bump, inline_bucket, weak.clone(), None);
                TxRwCell {
                    cell: BCell::new_in(TxRW { r, w }, bucket, bump),
                }
            });
            addr_of_mut!((*ptr).tx).write(Pin::new(tx));
            let uninit = uninit;
            let r = uninit.assume_init();
            r
        }
    }
    …
}

It looks like you are creating a self-referencing datatype. In this one, the tx field contains (indirectly, in the TxRwCell, then the BCell, in the TxRW in the TxR… the details don’t really matter…) a reference created with the line

let db = &(**addr_of!((*ptr).db));

which is starting with a &Pin<Box<DbGuard<'_>>> and dereferencing it to a &DbGuard<'_>.

Anyways, the problem now is that Rust’s memory model is such that Box<T> needs unique access to its target. The precise question of when this unique access is asserted is not something I know off the top of my head, but AFAIK, moving the Box (yet, just the Box, not its contents) surely does it, at least in the conservative and still a bit experimental “Stack Borrows” semantic / memory model that miri enforces, and your assume_init call does move (the struct containing) the Box!

Crates for self-referential struct will run into this issue, too, and will need to resort to workarounds, such as working with a raw pointer instead of a box. For something like this, e.g there’s a Box alternative using a raw pointer internally, that’s offered e.g. in the aliasable crate.

use core::ptr::addr_of;

fn main() {
    let x = Box::new(42);
    let y: &i32 = unsafe {
        &*addr_of!(*x)
    };
    let _read = *y; // OKAY
    let x = x; // invalidates the reference
    let _read = *y; // MIRI complains
}
use aliasable::boxed::AliasableBox;

use core::ptr::addr_of;

fn main() {
    let x = AliasableBox::from_unique(Box::new(42));
    let y: &i32 = unsafe {
        &*addr_of!(*x)
    };
    let _read = *y; // OKAY
    let x = x; // doesn't invalidate the reference
    let _read = *y; // OKAY
}

Sure enough, adding such a change…

--- a/src/tx.rs
+++ b/src/tx.rs
@@ -14,6 +14,7 @@ use crate::db::{DBShared, DbGuard, DbIAPI, DbRwIAPI};
 use crate::freelist::Freelist;
 use crate::node::NodeRwCell;
 use crate::{BucketApi, CursorApi, CursorRwApi};
+use aliasable::boxed::AliasableBox;
 use aligners::{alignment, AlignedBytes};
 use bumpalo::Bump;
 use lockfree_object_pool::LinearOwnedReusable;
@@ -776,7 +777,7 @@ impl<'tx> TxApi<'tx> for TxRef<'tx> {
 
 pub struct TxRwImpl<'tx> {
     bump: Pin<Box<LinearOwnedReusable<Bump>>>,
-    db: Pin<Box<DbGuard<'tx>>>,
+    db: Pin<AliasableBox<DbGuard<'tx>>>,
     pub(crate) tx: Pin<Rc<TxRwCell<'tx>>>,
 }
 
@@ -800,7 +801,7 @@ impl<'tx> TxRwImpl<'tx> {
         unsafe {
             addr_of_mut!((*ptr).bump).write(Box::pin(bump));
             let bump = &(**addr_of!((*ptr).bump));
-            addr_of_mut!((*ptr).db).write(Box::pin(DbGuard::Write(RefCell::new(lock))));
+            addr_of_mut!((*ptr).db).write(Pin::new(AliasableBox::from_unique(Box::new(DbGuard::Write(RefCell::new(lock))))));
             let db = &(**addr_of!((*ptr).db));
             let tx = Rc::new_cyclic(|weak| {
                 let r = TxR {

lets the miri test run into the next (at first glance seemingly unrelated) instance of Undefined Behavior in your code.


By the way, judging by the amount of Pin being used on Unpin types such as LinearOwnedReusable / DbGuard / TxRwCell, it seems like you are likely fundamentally misunderstanding what Pin does.

3 Likes

Hm. When you say unique, I assume you don't mean the borrow checker?

Most likely. A lot of this project has been violating rules I didn't know about and learning from my mistake.

Maybe “unique” wasn’t my best wording. The access is “exclusive”. Or “owned”. Not aliased. So many words… In any case, you should make sure that when the Box itself isn’t borrowed, its contents aren’t borrowed. If I recall correctly, this particular rule is also one that might realistically change in the future to something less strict.

1 Like

Wouldn't the RefCell have taken care of that then? Or by creating a permanent reference it is no longer a uniquely owned bit of memory?

Right now, Box<T>'s ownership is represented to LLVM as a noalias pointer. So roughly speaking, anytime you move a Box around, it's asserting that there are no other pointers to that T.

2 Likes

That makes a lot of sense. Thank you!

Does this work better? bbolt-rs/src/bin/db_playground.rs at main · ambaxter/bbolt-rs · GitHub

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.