My generic Rube Goldberg machine 'does not fulfill the required lifetime'

Hello all!

I'm having some lifetime issues that I'm not sure how to fix! I'd love some help, but at the very least I hope you enjoy my adventures in designing complex Rust architecture!

    Checking bbolt-nub v0.1.0 (/Users/abaxter/projects/better-bbolt/bbolt-nub)
error[E0477]: the type `RefTxCursor<'p, 'tx, T>` does not fulfill the required lifetime
   --> bbolt-nub/src/components/cursor.rs:667:1
    |
667 | / impl<'p, 'tx: 'p, T: TheTx<'tx, TxPageType = DirectPage<'tx, RefTxBytes...
668 | |   CursorSeekRefApi<'tx> for RefTxCursor<'p, 'tx, T>
669 | | where
670 | |   for<'b> Self::RefKv<'b>: PartialOrd<[u8]>,
...   |
677 | | }
    | |_^

error[E0477]: the type `RefTxCursor<'p, 'tx, T>` does not fulfill the required lifetime
   --> bbolt-nub/src/components/cursor.rs:668:3
    |
668 |   CursorSeekRefApi<'tx> for RefTxCursor<'p, 'tx, T>
    |   ^^^^^^^^^^^^^^^^^^^^^

error[E0477]: the type `RefTxCursor<'_, '_, T>` does not fulfill the required lifetime
   --> bbolt-nub/src/components/cursor.rs:668:29
    |
668 |   CursorSeekRefApi<'tx> for RefTxCursor<'p, 'tx, T>
    |                             ^^^^^^^^^^^^^^^^^^^^^^^

error[E0477]: the type `RefTxCursor<'p, 'tx, T>` does not fulfill the required lifetime
   --> bbolt-nub/src/components/cursor.rs:672:6
    |
672 |   fn seek_ref<'a>(
    |      ^^^^^^^^

I'm spending my spare time writing a database. One of the main goals is the ability to change low level aspects about how the data is access and organized so I can run experiments. Because of that this code is much much much more generic that any project should reasonably be. It's also fun (for some definition of fun) and I've learned a lot by doing this.

My generic Rube Goldberg machine is below.

Generic page access:

// I need to generically slice/subslice bytes that are either
// * completely loaded in memory (types beginning with Direct)
// * lazily loaded in memory (types beginning with Lazy)
pub trait GetKvRefSlice {
  type RefKv<'a>: GetKvRefSlice + 'a
  where
    Self: 'a;
  fn get_ref_slice<'a, R: RangeBounds<usize>>(&'a self, range: R) -> Self::RefKv<'a>;
}

// A single loaded array of memory, usually page_size in length
pub trait TxBytes<'tx>: Deref<Target = [u8]> + AsRef<[u8]> + Clone + Sized + Sync + Send {}

// A generic trait for both directly loaded pages and lazily loaded pages
pub trait TxPageType<'tx>: Page + GetKvTxSlice<'tx> + GetKvRefSlice + Clone + Sync + Send {
  type TxPageBytes: TxBytes<'tx>;
}

Generic byte comparisons:

// Since errors may occur during File read calls, 
//I created some Try variants for PartialEq, Eq, 
//and PartialOrd to be used by the Lazy loading key/value types
pub trait TryPartialEq<Rhs: ?Sized = Self> {
  type Error: Error + Send + Sync + 'static;
  fn try_eq(&self, other: &Rhs) -> crate::Result<bool, Self::Error>;
  // SNIP
}

pub trait TryEq: TryPartialEq<Self> {}

pub trait TryPartialOrd<Rhs: ?Sized = Self>: TryPartialEq<Rhs> {
  fn try_partial_cmp(&self, other: &Rhs) -> crate::Result<Option<Ordering>, Self::Error>;
  // SNIP
}

pub trait KvTryEq: TryEq + TryPartialEq<[u8]> {}

pub trait KvTryOrd: TryPartialOrd + TryPartialOrd<[u8]> + KvTryEq {}

pub trait KvTryDataType:
  KvTryOrd + TryHash + TryGet<u8> + RefIntoTryCopiedIter + RefIntoTryBuf + Sized
{
}

// Direct key/value types has similar requirements
pub trait KvEq: Eq + PartialEq<[u8]> {}

pub trait KvOrd: Ord + PartialOrd<[u8]> + KvEq {}

pub trait KvDataType: KvOrd + Hash + Get<u8> + RefIntoCopiedIter + RefIntoBuf + Sized {}

Most importantly both key/value types must be comparable to [u8]. The library interface will always use [u8] for user inputted data.

This design decision splits any search functionality into 2

// The generic Node interface for both leaf and branch nodes
pub trait HasElements<'tx>: Page + GetKvRefSlice + Sync + Send {
  type Element: HasKeyPosLen + Sync;

  // SNIP

  fn search<'a>(&'a self, v: &[u8]) -> Result<usize, usize>
  where
    <Self as GetKvRefSlice>::RefKv<'a>: PartialOrd<[u8]>,
  { 
    // SNIP
  }

  fn try_search<'a>(
    &'a self, v: &[u8],
  ) -> crate::Result<
    Result<usize, usize>,
    <<Self as GetKvRefSlice>::RefKv<'a> as TryPartialEq<[u8]>>::Error,
  >
  where
    <Self as GetKvRefSlice>::RefKv<'a>: TryPartialOrd<[u8]>,
  {
    // SNIP
  }
  // SNIP
}

This way search and try_search functions only exist for the key/value datatypes that support them. Neat!

You've probably noticed the 'tx lifetime everywhere. I want all resources to be limited to the transaction boundary which is identified by 'tx.

// When the backend reads a page, the 'tx lifetime
// is explicitly added by the transaction handler. To 
// simplify things, backend has no concept of transactions.
pub trait TxReadPageIO<'tx> {
  type TxPageType: TxPageType<'tx>;
 // SNIP 
  fn read_node_page(
    &'tx self, node_page_id: NodePageId,
  ) -> crate::Result<NodePage<'tx, Self::TxPageType>, PageError>;
}

pub trait TheTx<'tx>: TxReadPageIO<'tx> {
  fn stats(&self) -> &TxStats;
}

pub struct CoreTxHandle<'tx, IO> {
  io: RwLockReadGuard<'tx, IO>,
  stats: Arc<TxStats>,
  tx_id: TxId,
}

Each transaction handle defines the type of memory it will be passing around and whether or not it's direct or lazy loaded.

// The MemMap handler. It passes around a single
// &'tx [u8] that holds the entire page
pub struct RefTxHandle<'tx, IO> {
  handle: CoreTxHandle<'tx, IO>,
}

impl<'tx, IO> TxReadPageIO<'tx> for RefTxHandle<'tx, IO>
where
  IO: IOPageReader,
  IO::Bytes: IntoTxBytes<'tx, RefTxBytes<'tx>>,
{
  type TxPageType = DirectPage<'tx, RefTxBytes<'tx>>;
  // SNIP
}

impl<'tx, IO> TheTx<'tx> for RefTxHandle<'tx, IO>
where
  IO: IOPageReader,
  IO::Bytes: IntoTxBytes<'tx, RefTxBytes<'tx>>,
{
  // SNIP
}

// The lazy loading handler. It passes around Arc<[u8]>
pub struct LazyTxHandle<'tx, IO> {
  handle: CoreTxHandle<'tx, IO>,
}

impl<'tx, IO> TxReadPageIO<'tx> for LazyTxHandle<'tx, IO>
where
  IO: IOOverflowPageReader,
  IO::Bytes: IntoTxBytes<'tx, SharedTxBytes<'tx>>,
{
  type TxPageType = LazyPage<'tx, Self>;
  // SNIP
}

impl<'tx, IO> TheTx<'tx> for LazyTxHandle<'tx, IO>
where
  IO: IOOverflowPageReader,
  IO::Bytes: IntoTxBytes<'tx, SharedTxBytes<'tx>>,
{
  // SNIP
}

With that design we can generically access and compare slices in pages, searching within pages using that functionality, and none of that code needs to know how any of the bytes are read in or organized in memory. Very cool!

Where I'm currently stuck is on the Cursor where which is central to walking the database nodes.

// A CoreBucket is the database index we're walking.
// It holds the root node and the transaction.
// A StackEntry is a node page and the element we're
// currently referencing on that node.
pub struct CoreCursor<'p, 'tx: 'p, T: TheTx<'tx>> {
  bucket: &'p CoreBucket<'tx, T>,
  stack: Vec<StackEntry<'tx, T::TxPageType>>,
  location: CursorLocation,
}

impl<'p, 'tx, T: TheTx<'tx>> CoreCursor<'p, 'tx, T> {
  // We return keys with this function
  pub fn key_value_ref<'a>(
    &'a self,
  ) -> Option<(
    <T::TxPageType as GetKvRefSlice>::RefKv<'a>,
    <T::TxPageType as GetKvRefSlice>::RefKv<'a>,
  )> {
    // SNIP
  }
  
  // We move around the index with move functions
  fn move_to_first_element(&mut self) -> crate::Result<Option<LeafFlag>, CursorError> {
    // SNIP
  }
}

Searching across the index is where things get interesting. I found I had to explicitly set a different lifetime for<'b> for RefKv<'b>: PartialOrd<[u8]> or RefKv<'b>: TryPartialOrd<[u8]> otherwise the compiler would complain that I could only use a single mutable borrow at a time. Regardless of how I tried to resolved the issue, the borrows would never end. My original function definitions used 'a instead of 'b. I'm note exactly sure what I was telling the compiler there, but separating the lifetimes worked.

  fn seek<'a>(&'a mut self, v: &[u8]) -> crate::Result<Option<LeafFlag>, CursorError>
  where
    for<'b> <T::TxPageType as GetKvRefSlice>::RefKv<'b>: PartialOrd<[u8]>,
  {
    // SNIP
  }

  fn try_seek<'a>(&'a mut self, v: &[u8]) -> crate::Result<Option<LeafFlag>, CursorError>
  where
    for<'b> <T::TxPageType as GetKvRefSlice>::RefKv<'b>: TryPartialOrd<[u8]>,
  {
    // SNIP
  }

That brings us to my latest issue. I need to, in a single function both search for a [u8] and return the key/values.

The APIs look like this

pub trait CursorRefApi<'tx> {
  type RefKv<'a>: GetKvRefSlice + 'a
  where
    Self: 'a;
  // SNIP
}

pub trait CursorSeekRefApi<'tx>: CursorRefApi<'tx>
where
  for<'b> Self::RefKv<'b>: PartialOrd<[u8]>,
{
  fn seek_ref<'a>(
    &'a mut self, v: &[u8],
  ) -> crate::Result<Option<(Self::RefKv<'a>, Self::RefKv<'a>)>, CursorError>;
}

Finally we get to the issue at hand. The Cursor compiles right up until I add the search function

// This Cursor mirrors the RefTxHandle above.
// A MemMapCursor will wraps this and will explicitly define the backend for any library user
// No one needs to see the horror show I've created :D
// It's definitely better than the last architecture I've designed
pub struct RefTxCursor<'p, 'tx: 'p, T: TheTx<'tx, TxPageType = DirectPage<'tx, RefTxBytes<'tx>>>> {
  filter: LeafFilterCursor<'p, 'tx, T>,
}

// This works as expected
impl<'p, 'tx: 'p, T: TheTx<'tx, TxPageType = DirectPage<'tx, RefTxBytes<'tx>>>> CursorRefApi<'tx>
  for RefTxCursor<'p, 'tx, T>
{
  type RefKv<'a>
    = <T::TxPageType as GetKvRefSlice>::RefKv<'a>
  where
    Self: 'a;
  // SNIP
}

// This errors out
impl<'p, 'tx: 'p, T: TheTx<'tx, TxPageType = DirectPage<'tx, RefTxBytes<'tx>>>>
  CursorSeekRefApi<'tx> for RefTxCursor<'p, 'tx, T>
where
  for<'b> Self::RefKv<'b>: PartialOrd<[u8]>,
{
  fn seek_ref<'a>(
    &'a mut self, v: &[u8],
  ) -> error_stack::Result<Option<(Self::RefKv<'a>, Self::RefKv<'a>)>, CursorError> {
    todo!()
  }
}

I do not understand which lifetime the compiler is talking about. The error message is completely unhelpful in this regard. My code is available here.

At the very least I hope you enjoyed the tour! I'd love some help if you have any suggestions!

Bonus points if someone can explain to me why I needed to use for<'b> <T::TxPageType as GetKvRefSlice>::RefKv<'b> instead of <'a>

The problem can be minimized to the following:

use std::marker::PhantomData;

pub trait GetKvRefSlice {
    type RefKv<'a>: GetKvRefSlice
    where
        Self: 'a;
}

pub trait CursorRefApi<'tx> {
    type RefKv<'a>
    where
        Self: 'a;
}

pub struct RefTxCursor<'tx> {
    filter: PhantomData<&'tx ()>,
}

impl<'tx> CursorRefApi<'tx> for RefTxCursor<'tx> {
    type RefKv<'a>
        = ()
    where
        Self: 'a;
}

impl<'tx> RefTxCursor<'tx>
where
    for<'b> <Self as CursorRefApi<'tx>>::RefKv<'b>: PartialOrd<[u8]>,
{
    fn seek_ref<'a>(&'a mut self, _v: &[u8]) -> <Self as CursorRefApi<'tx>>::RefKv<'a> {
        todo!()
    }
}

A similar issue (and the solution) is described in The Better Alternative to Lifetime GATs - Sabrina Jewson (I really recommend reading this). In summary, using higher ranked trait bounds and lifetime GATs together makes the compiler think some types need to be 'static, in this case RefTxCursor. The solution is to change the GetKvRefSlice trait to

trait Gat<'a, Implied=&'a Self> {
    type RefKv;
}

trait CursorRefApi<'tx>: for<'a> Gat<'a> {}

And then implement it for RefTxCursor like

impl<'a, 'tx> Gat<'a> for RefTxCursor<'tx> {
    type RefKv = ();
}

impl<'tx> CursorRefApi<'tx> for RefTxCursor<'tx> {}

(By the way, why do you have 'tx on CursorRefApi?)

This makes the code compile:

impl<'tx> RefTxCursor<'tx>
where
    for<'b> <Self as Gat<'b>>::RefKv: PartialOrd<[u8]>,
{
    fn seek_ref<'a>(&'a mut self, _v: &[u8]) -> <Self as Gat<'a>>::RefKv {
        todo!()
    }
}

From what I can see, the same solution should work in the real code.

1 Like

Thank you! I'll mess with this at my next break!!

For compatibility reasons with the original Go library, the database needs to be able to return keys/values that are valid for the entire transaction and aren't limited by the Cursor's lifetime.

Edit: I see what you mean. The 'tx isn't used for the RefApi. I've removed it :slight_smile:

@branamn

Would the get_ref_slice function defined in GetKvRefSlice trait go in the new Gat trait or would that be a new trait?

Something like this?

trait GatRefKv<'a, Implied=&'a Self> {
  type RefKv;
}

pub trait GetGatKvRefSlice : for<'a> GatRefKv<'a> {
  fn get_ref_slice<'a, R: RangeBounds<usize>>(&'a self, range: R) -> <Self as GatRefKv<'a>>::RefKv;
}

I think you could put it on either trait. But it seems most ergonomic to do as you have done.

1 Like

I was able to get everything compiling and moving forward again. Thank you for helping me!