[FFI] Problem with mutable pointer in C-function call

Hi all,
I'm currently working on a refactoring of a C++ ODBC API library. I've run into a problem with a certain function argument (a mutable pointer) that should be updated by the ODBC driver to match the column's result set, but isn't.

Please have a look at the comments in the source code, as I describe my problem in them, too.

// Inside the extern C definitions:
pub(crate) fn SQLBindCol(
    statementHandle: SQLHSTMT,
    columnNumber: SQLUSMALLINT,
    targetType: SQLSMALLINT,
    // the ODBC interface stores the value of this 
    // DB column result set in this buffer (*mut c_void)
    targetValuePtr: SQLPOINTER,
    // tell the ODBC interface how big my buffer is
    bufferLength: SQLLEN,
    // this one will be set by the ODBC interface, indicating whether the
    // column data is NULL, or how many bytes are stored within
    // the targetValuePtr buffer.
    strLen_or_indPtr: *mut SQLLEN,
) -> SQLRETURN;

The problem is: when I call SQLBindCol with a desired column and then fetch the result set with SQLFetch, the value behind strLen_or_indPtr keeps its initialization value:

/// Stores the information retrieved by calling `SQLDescribeCol`
#[derive(Debug)]
pub struct ColumnDesc {
    name: String,
    data_type: ColDataType,
    size: u64,
    decimal_digits: i16,
    nullable: bool,
    col_idx: i16,
}

/// My type to store the fetched data from the ODBC interface 
#[derive(Debug)]
pub struct ColumnBuffer {
    // see above
    desc: ColumnDesc,
    // the actual data from the DB column
    buffer: Vec<u8>,
    // how many bytes are set in the buffer (-1 if none)
    data_len: SQLLEN,
}

pub fn bind_columns(&self, columns: Vec<ColumnDesc>) -> Result<Vec<ColumnBuffer>> {
    let mut column_buffers = Vec::new();

    for column in columns {
        let mut buffer = ColumnBuffer {
            buffer: vec![0; column.size as usize],
            desc: column,
            data_len: 0,
        };

        let res: OdbcResult = unsafe {
            ffi::SQLBindCol(
                *self.0,
                buffer.desc.col_idx as SQLUSMALLINT,
                buffer.desc.data_type.into(),
                buffer.buffer.as_mut_ptr() as *mut c_void,
                buffer.buffer.len() as SQLLEN,
                // here I pass a mutable pointer to data_len
                &mut buffer.data_len,
            )
        }
        .into();

        // error handling ... unimportant for now
        column_buffers.push(buffer);
    }
    Ok(column_buffers)
}

// ...and calling it
// ... SQLExecDirect(...);
// ... SQLNumResultCols(...);
// ... SQLDescribeCols(...);
let mut buffers = stmt.bind_columns(columns)?;
while !stmt.fetch().failed() {
    for buffer in &mut buffers {
        // here I can see that buffer.buffer changes, but buffer.data_len does not
        println!("{:?}", buffer);
        buffer.flush();
    }
}

Example output:

ColumnBuffer { 
    desc: ColumnDesc { name: "PARTGROUP", data_type: BigInt, size: 20, decimal_digits: 0, nullable: true, col_idx: 1 },
    buffer: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    data_len: 0 
}

// The column value for the fetched record actually is NULL, but I can't determine that here,
// because the buffer contains zeros either way, whether `PARTGROUP = 0` or `PARTGROUP = NULL`.
// I need `data_len` to be set to `-1` in this case.

Now I don't understand why ColumnBuffer.buffer gets mutated by the ODBC interface, but ColumnBuffer.data_len does not. Does this maybe have something to do with the heap-allocated value vs. the stack-allocated value?

Let me know if you need more source code.
Thanks in advance.

This example works btw:

// hardcoded test for a BIGINT column
let mut buf = vec![0u8; 20];
let mut ptr_ind: i64 = 0;

let res: OdbcResult = unsafe {
    ffi::SQLBindCol(
        *self.0,
        1,
        -25,
        buf.as_mut_ptr() as *mut c_void,
        buf.len() as i64,
        &mut ptr_ind,
    )
}
.into();

while !stmt.fetch().failed() {
    // prints: buf: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], -1
    println!("buf: {:?}, {}", buf, ptr_ind);
}

So it seems like my custom types ColumnBuffer / ColumnDesc and/or the loop have something to do with it.

Assuming this is the same SQLBindCol function, what you're doing is actually undefined behavior. The pointer you pass to the function gets stored and written to later, when you call fetch.

Since your buffer is on the stack when you take a reference to the field, the pointer you pass won't update the actual value; it will just clobber some random stack data when you call fetch.

You need to make sure the value is at a stable address before you take a reference to pass to SQLBindCol. Note that just pushing into the Vec first and accessing the last value to get the reference isn't necessarily sufficient, as the Vec could reallocate on a subsequent push. That would change the memory address again. I would suggest filling up the Vec first and then looping through the Vec afterwards to call SQLBindCol.

I think you may also technically have undefined behavior due to the fact that there's no way for the compiler to know that those mutable pointers to rust data "escape" and will be used to modify the data at a later point in the program. Switching to just accessing your buffer data through raw pointers[1] may be sufficient, I'm not entirely sure about that though.
Edit: This was wrong, see later comment.

Your second example works because the value you take a reference to doesn't get moved into another location later.

Edit: removed a mention of UnsafeCell that didn't make any sense, you're passing raw pointers and not shared references which is what UnsafeCell is concerned with.


  1. e.g. by converting the Vec into a raw pointer and never accessing data through the Vec itself


I assumed something like this. I can see in the ODBC trace file that the memory address for data_len is actually the same across multiple column bindings, which is not what anyone would expect.
I will make sure I create a "stable type" where no reallocations happen and addresses stay the same until fetching is finished.

Yes, in fact I was also a little surprised that the program/lib compiled as I wrote it. I was a little naive to think Rust can guide me everywhere and that I don't have to give any thought to mutable borrows. Stupid mistake, since I use unsafe here.

Thanks buddy, much appreciated!

Thinking about it some more, I think I was wrong about this. What you need to be careful of is creating references to the data you handed out raw pointers to. In your case you don't appear to be touching the buffer or len at all between when you call bind_columns and when you've called the fetch method. If that's the case I think you're fine.

Unfortunately the rules for how raw pointers interact with references are still a little bit up in the air, but not touching those fields at all between binding the columns and fetching the data puts you in a pretty safe position.

Playing around with an example a bit in Miri seemed to imply that reading data from those fields is fine but mutating it yourself via a &mut is not, but I wouldn't swear to that being true in all cases.

The problem is the move of the various ColumnBuffers into the vector. On initialization they are created somewhere on the stack, but once I move them after calling SQLBindCol, their addresses change. Here is a little test to demonstrate the address change caused by the move:

let mut buffers = Vec::new();
for column in columns {
    let buffer = ColumnBuffer {
        buffer: vec![0; column.size as usize],
        desc: column,
        data_len: 0,
    };
    println!("prev: &buffer.data_len={:p}, &buffer.buffer= {:p}, &buffer = {:p}", &buffer.data_len, buffer.buffer.as_ptr(), &buffer);

    buffers.push(buffer);
}

for buffer in &mut buffers {
    println!("after: &buffer.data_len={:p}, &buffer.buffer= {:p}, &buffer = {:p}", &buffer.data_len, buffer.buffer.as_ptr(), &buffer);
    let res: OdbcResult = unsafe {
        ffi::SQLBindCol(
            *self.0,
            buffer.desc.col_idx as SQLUSMALLINT,
            buffer.desc.data_type.into(),
            buffer.buffer.as_mut_ptr() as *mut c_void,
            buffer.buffer.len() as SQLLEN,
            &mut buffer.data_len,
        )
    }
    .into();
}

// prints
// prev:  &buffer.data_len=0x7633dacce0, &buffer.buffer= 0x185578ad940, &buffer = 0x7633dacca0
// after: &buffer.data_len=0x185578845e0, &buffer.buffer= 0x185578ad940, &buffer = 0x7633dace00

The address of the buffer itself changes, and of course so do all of its inline fields. Since the backing storage of the Vec<u8> is heap-allocated, it stays the same. This is why populating the buffer worked, but mutating the data_len field did not.

Yeah, and initially I thought I wasn't tampering with the buffer (except for printing and flushing it after each SQLFetch) and that this was fine, but as seen above, the move into the Vec is tampering with the buffer, so this assumption wasn't true.

Oh yes, I didn't mean my whole comment was wrong, just the section I quoted from, sorry if that was confusing!

I just meant that I don't think there was any other UB aside from the obvious move problem.


It's fine! 🙂 I just wanted to make it clear for readers of this topic who stumble over similar problems why this wasn't working.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.