Code review for unsafe code in diesel

Reviewing:

#[allow(missing_debug_implementations)]
pub struct StatementUse<'a, 'b> {
    statement: MaybeCached<'a, Statement>,
    column_names: OnceCell<Vec<*const str>>,
    // we need to store the query here to ensure noone does
    // drop it till the end ot the statement
    // We use a boxed queryfragment here just to erase the
    // generic type
    _query: Box<dyn QueryFragment<Sqlite> + 'b>,
    binds_to_free: Vec<(i32, Option<SqliteBindValue<'static>>)>,
}

impl<'a, 'b> StatementUse<'a, 'b> {
    pub(super) fn bind<T>(mut statement: MaybeCached<'a, Statement>, query: T) -> QueryResult<Self>
    where
        T: QueryFragment<Sqlite> + QueryId + 'b,
  • Subjective nit: when using complex lifetimes, especially with unsafe code, despite popular belief, the Self alias ought to be replaced with what it mentions, so that the signature is fully readable on its own. This is especially meaningful if the associated function is far from the impl line:

    fn bind<T>(
        mut statement: MaybeCached<'a, Statement>,
        query: T,
    ) -> QueryResult<StatementUse<'a, 'b>>
    where
        T: QueryFragment<Sqlite> + QueryId + 'b,
    

    Also, since 'b represents the region of owned-usability of the given query, then maybe you should name it accordingly, as well as 'a -> 'stmt for the maybe-borrow of the Statement?

    impl<'stmt, 'query> StatementUse<'stmt, 'query> {
        fn bind<Query>(
            mut statement: MaybeCached<'stmt, Statement>,
            query: Query,
        ) -> QueryResult< StatementUse<'stmt, 'query> >
        where
            Query: 'query + QueryFragment<Sqlite> + QueryId,
    

    and ideally propagate the rename elsewhere.

impl<'a, 'b> StatementUse<'a, 'b> {
    pub(super) fn bind<T>(mut statement: MaybeCached<'a, Statement>, query: T) -> QueryResult<Self>
    where
        T: QueryFragment<Sqlite> + QueryId + 'b,
    {
        let mut bind_collector = SqliteBindCollector::new();
        query.collect_binds(&mut bind_collector, &mut ())?;

        let SqliteBindCollector { metadata, binds } = bind_collector;
        let mut binds_to_free = Vec::new();

        for (idx, (bind, tpe)) in binds.into_iter().zip(metadata).enumerate() {
            // Sqlite starts to count by 1
            let bind_idx = idx as i32 + 1;
  • Note that iterator.enumerate() is the same as (0 ..).zip(iterable), so if you want to start by 1, and use i32s, I highly recommend that you do:

    for (bind_idx, (bind, tpe))
        in (1_i32 ..).zip(binds.into_iter().zip(metadata))
    {
    
            // It's safe to call bind here as:
            // * The type and value matches
            // * We ensure that corresponding buffers lives long enough below
            // * The statement is not used yet by `step` or anything else
            unsafe { statement.bind(tpe, &bind, bind_idx)? };
  • I cannot judge of the safety of this call, since Statement::bind does not spell out its /// Safety requirements.

    After looking at this, it does look like the value parameter involves two lifetimes (you should try to enable the deny_elided_lifetimes_in_paths lint, especially if you are using unsafe code), so I'm assuming the lifetimes to live for at least as long as either the statement is dropped, or the bindings are "cleared" by being overwritten with bindings to "null placeholders" (which the drop impl does; for further clarity you could write the Drop inside this very function).

    EDIT: After reading this whole snippet as a whole, I realize it's a wrapper offering a non-unsafe API for two "orthogonal issues": "bound arguments should not dangle" (which is tackled by the Drop impl and the vec) and the "column names shan't dangle". From there, I'd recommend that two wrappers be used, one tackling only the first issue, and then a second wrapper, containing it, which adds the "column names" invariant.

            // We want to unbind the buffers later to ensure
            // that sqlite does not access uninitilized memory
            match bind {
                SqliteBindValue::BorrowedString(_) | SqliteBindValue::BorrowedBinary(_) => {
                    binds_to_free.push((bind_idx, None));
                }
                SqliteBindValue::Binary(b) => {
                    binds_to_free.push((bind_idx, Some(SqliteBindValue::Binary(b))));
                }
                SqliteBindValue::String(b) => {
                    binds_to_free.push((bind_idx, Some(SqliteBindValue::String(b))));
                }
                _ => (),
            }
        }

        Ok(Self {
            statement,
            column_names: OnceCell::new(),
            binds_to_free,
            _query: Box::new(query) as Box<_>,
        })
    }
    pub(in crate::sqlite::connection) fn run(self) -> QueryResult<()> {
        self.step().map(|_| ())
    }

    pub(in crate::sqlite::connection) fn step(self) -> QueryResult<Option<Self>> {
        let res = unsafe {
            match ffi::sqlite3_step(self.statement.inner_statement.as_ptr()) {
                ffi::SQLITE_DONE => Ok(None),
                ffi::SQLITE_ROW => Ok(Some(())),
                _ => Err(last_error(self.statement.raw_connection())),
            }
        }?;
        Ok(res.map(move |()| self))
    }
  • According to the docs of column_name, IIUC, this one invalidates the potentially-cached column_names since it calls sqlite3_step, and yet offers Self nonetheless, thus allowing for:

    let stmt_use = …;
    let _ = stmt_use.field_name(1); // calls `column_name` and caches a pointer
    if let Some(stmt_use) = stmt_use.step()? { // calls `sqlite3_step`
        if let Some(coll_name) = stmt_use.field_name(1) { // dereferences the cached pointer
            // According to the warnings written on the raw `column_name`
            // function, this could be a dangling reference!
        }
    }
    
    // The returned string pointer is valid until either the prepared statement is
    // destroyed by sqlite3_finalize() or until the statement is automatically
    // reprepared by the first call to sqlite3_step() for a particular run or
    // until the next call to sqlite3_column_name() or sqlite3_column_name16()
    // on the same column.
    //
    // https://sqlite.org/c3ref/column_name.html
    pub(super) unsafe fn column_name(&self, idx: i32) -> *const str {
        let name = {
            let column_name =
                ffi::sqlite3_column_name(self.statement.inner_statement.as_ptr(), idx);
            assert!(
                !column_name.is_null(),
                "The Sqlite documentation states that it only returns a \
                 null pointer here if we are in a OOM condition."
            );
            CStr::from_ptr(column_name)
        };
        name.to_str().expect(
            "The Sqlite documentation states that this is UTF8. \
             If you see this error message something has gone \
             horribliy wrong. Please open an issue at the \
             diesel repository.",
        ) as *const str
    }
  • Note: I was initially surprised by having a function yielding a raw pointer and also being an unsafe fn on its own, but since .field_name() is a non-unsafe function yielding a Rust reference which would be invalidated by this function, the unsafe fn is warranted (although I'd personally make this function private inside field_name to make it clearer).

    As it stands now, a /// Safety annotation regarding its misinteraction with .field_name() would be in order.

    pub(super) fn column_count(&self) -> i32 {
        unsafe { ffi::sqlite3_column_count(self.statement.inner_statement.as_ptr()) }
    }
    pub(super) fn index_for_column_name(&mut self, field_name: &str) -> Option<usize> {
        (0..self.column_count())
            .find(|idx| self.field_name(*idx) == Some(field_name))
            .map(|v| v as usize)
    }
    pub(super) fn field_name(&mut self, idx: i32) -> Option<&str> {
        if let Some(column_names) = self.column_names.get() {
            return column_names
                .get(idx as usize)
                .and_then(|c| unsafe { c.as_ref() });
        }
        let values = (0..self.column_count())
            .map(|idx| unsafe { self.column_name(idx) })
            .collect::<Vec<_>>();
        let ret = values.get(idx as usize).copied();
        let _ = self.column_names.set(values);
        ret.and_then(|p| unsafe { p.as_ref() })
    }
  • Nit: this uses a &mut self receiver as well as a &-friendly OnceCell; which is kind of an oxymoron.

    • Either embrace &mut here:

      column_names: Option<Vec<*const str>>,
      
      let column_names = self.column_names.get_or_insert_with(|| {
          /* define the raw `column_name` here (c.f. previous comment) */
          (0 .. self.column_count())
              .map(|idx| self.column_name(idx))
              .collect()
      });
      column_names
          .get(idx as usize)
          .and_then(|c| unsafe { c.as_ref() })
      
    • or feature & access (AFAIK, this won't cause soundness problems with the lifetime, provided column_name remained private (or else it being an unsafe fn would indeed be warranted)):

      -     column_names: Option<Vec<*const str>>,
      +     column_names: OnceCell<Vec<*const str>>,
      …
      - pub(super) fn field_name(&mut self, idx: i32) -> Option<&str> {
      + pub(super) fn field_name(&self, idx: i32) -> Option<&str> {
      -     let column_names = self.column_names.get_or_insert_with(|| {
      +     let column_names = self.column_names.get_or_init(|| {
      
    pub(super) fn copy_value(&self, idx: i32) -> Option<OwnedSqliteValue> {
        OwnedSqliteValue::copy_from_ptr(self.column_value(idx)?)
    }
    pub(super) fn column_value(&self, idx: i32) -> Option<NonNull<ffi::sqlite3_value>> {
        let ptr =
            unsafe { ffi::sqlite3_column_value(self.statement.inner_statement.as_ptr(), idx) };
        NonNull::new(ptr)
    }
}
impl<'a, 'b> Drop for StatementUse<'a, 'b> {
    fn drop(&mut self) {
        // First reset the statement, otherwise the bind calls
        // below will fails
        self.statement.reset();

        // Reset the binds that may point to memory that will be/needs to be freed
        for (idx, _buffer) in std::mem::take(&mut self.binds_to_free) {
            unsafe {
                // It's always safe to bind null values
                self.statement
                    .bind(SqliteType::Text, &SqliteBindValue::Null, idx)
                    .expect("Binding nulls shouldn't ever fail");
            }
        }
    }
}
  • I would:

            unsafe {
                // It's always safe to bind null values
                self.statement
                    .bind(SqliteType::Text, &SqliteBindValue::Null, idx)
                    .expect("Binding nulls shouldn't ever fail");
    +           drop(buffer);
            }
    

    to make the semantics involved explicit, so as to be more robust to refactorings.

    And, if the .expect were to be remotely reachable, which would cause the binds_to_free to drop the remaining elements without clearing them from the statement, I'd then suggest to use ManuallyDrop in this snippet:

    let mut binds_to_free = ManuallyDrop::new(mem::take(&mut self.binds_to_free));
    binds_to_free.iter().for_each(|&(idx, _)| unsafe {
        self.statement.bind…
    });
    // Now that all the stuff has been un-bound, we can finally free the buffers:
    drop(ManuallyDrop::into_inner(binds_to_free));
    

    And at that point, even more generally, you could be wrapping in a ManuallyDrop the .binds_to_free field altogether: it would avoid the .take(), as well as show the "subtle drop semantics" intent involved:

    self.binds_to_free.iter()…
    // Now that all the stuff has been un-bound, we can finally free the buffers:
    ManuallyDrop::drop(&mut self.binds_to_free);
    

    Way more unwind-safe this way, I'd say :slight_smile:

Other than these things (the most important of those being the potential unsound interaction between .field_name() and .step()), everything looks quite good to me —I really like the Box<dyn …> improvements over the manual free (the one from the previous commit quoted in this thread), for instance :ok_hand:

2 Likes