Segmentation fault while building an interface for zlib

Ciao Rust community,

I am currently fiddling around with Rust's FFI. Specifically, I need to call a few C functions from the zlib C library. Right now I am focused on the compress function:

ZEXTERN int ZEXPORT compress OF((Bytef *dest,   uLongf *destLen,
                                 const Bytef *source, uLong sourceLen));

My C knowledge is very limited, which may be one of the reasons I am struggling with this.
The Rust code I have written so far is as follows:

use crate::prelude::*;
use std::convert::From;
use std::os::raw::{c_int, c_ulong};

type Bytef = u8;
type uLongf = c_ulong;
type uLong = c_ulong;

#[link(name = "z", kind = "static")]
extern "C" {
    fn compressBound(sourceLen: uLong) -> uLong;

    fn compress(
        dest: *mut Bytef,
        destLen: *mut uLongf,
        source: *const Bytef,
        sourceLen: uLong,
    ) -> c_int;

    fn uncompress(
        dest: *mut Bytef,
        destLen: *mut uLongf,
        source: *const Bytef,
        sourceLen: uLong,
    ) -> c_int;
}

pub struct Zipper;

impl Zipper {
    pub fn compress(buffer: &Buffer) -> AlsResult<Buffer> {
        let unzipped_data_len: uLong = buffer.data_len() as uLong;

        if unzipped_data_len == 0 {
            return Err(throw(
                3,
                AlsErrorKind::Unkown,
                format!("Passed a Buffer with a data length of `0` to Zipper::compress"),
            ));
        }

        let zipped_data_buf_len: uLong;
        unsafe {
            // this call to zlib is successful
            zipped_data_buf_len = compressBound(unzipped_data_len);
        }

        let mut buf = Buffer::new(zipped_data_buf_len as usize);
       
        // Creating raw pointers for `compress()` here
        let buf_ptr = buf.as_mut_ptr();
        let raw_data = buffer.as_ptr();

        unsafe {
            // this crashes
            let result = compress(
                buf_ptr,
                zipped_data_buf_len as *mut uLong,
                raw_data,
                unzipped_data_len,
            );

            match LibZCode::from(result) {
                LibZCode::Ok => (),
                any => {
                    return Err(throw(
                        0,
                        AlsErrorKind::Unkown,
                        format!("Error calling zlib::compress() = {:?}", any),
                    ))
                }
            }
        }
        Ok(buf)
    }
}

Buffer looks like this:

pub struct Buffer {
    data: Vec<u8>,
    capacity: usize,
    data_len: usize,
}

cargo test throws the following error for a unit test of Zipper::compress(buf):

 process didn't exit successfully: `/path/libbase/target/debug/deps/libbase-4b4bb560900cabe5` (signal: 11, SIGSEGV: invalid memory reference)

Do I have to change the in-memory representation of the data field of the Buffer struct - if so, in which way? Buffer implements AsRef<[u8]> and AsMut<[u8]>, referring to Buffer.data. I tried to call the zlib compress function with two Vec<u8>, but this didn't work either.
Is the mistake here on the Rust side, or do I have to check if I violate any contracts for the C pointers the compress() function uses?

Best regards and stay healthy!

The cast zipped_data_buf_len as *mut uLong is likely the problem. You are literally converting that "random" integer to a memory address. That's almost certainly not going to be a valid address, and it's certainly not how the function is meant to be used.

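This instead looks like an out parameter. Since C functions can only return a single value, if multiple values are to be returned (such as the compressed data and its length), only one of them can be the "real" return value, and the others need to be written through pointers to pre-allocated space. For zlib's compress(), destLen is in fact used in both directions: on entry it holds the size of the destination buffer, and on exit it holds the actual size of the compressed data.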

So what you want to do instead is declare zipped_data_buf_len as mutable, and pass a pointer to it:

let result = compress(
    buf_ptr,
    &mut zipped_data_buf_len,
    raw_data,
    unzipped_data_len,
);
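
To see the pattern in isolation, here is a minimal sketch that does not link zlib at all. fake_compress is a hypothetical stand-in written in Rust with the C ABI; it only copies its input, but it treats its length argument the way compress() does (capacity on entry, bytes written on exit). The helper name copy_via_out_param and the 16 bytes of spare capacity are assumptions made for illustration.

```rust
use std::os::raw::{c_int, c_ulong};

/// Hypothetical stand-in for a C function with an in/out length parameter.
/// Not part of zlib: it just copies `source` into `dest`.
unsafe extern "C" fn fake_compress(
    dest: *mut u8,
    dest_len: *mut c_ulong,
    source: *const u8,
    source_len: c_ulong,
) -> c_int {
    unsafe {
        // On entry, *dest_len holds the capacity of `dest`.
        if *dest_len < source_len {
            return -5; // same numeric value as zlib's Z_BUF_ERROR
        }
        std::ptr::copy_nonoverlapping(source, dest, source_len as usize);
        // On exit, *dest_len holds the number of bytes actually written.
        *dest_len = source_len;
    }
    0
}

/// Safe wrapper: owns the buffers and passes a pointer to a real,
/// mutable integer as the length argument.
fn copy_via_out_param(input: &[u8]) -> (c_int, Vec<u8>) {
    let mut out = vec![0u8; input.len() + 16];
    let mut out_len = out.len() as c_ulong;
    let code = unsafe {
        fake_compress(
            out.as_mut_ptr(),
            &mut out_len, // &mut c_ulong coerces to *mut c_ulong
            input.as_ptr(),
            input.len() as c_ulong,
        )
    };
    if code == 0 {
        // Trust the length the callee reported, not the capacity.
        out.truncate(out_len as usize);
    }
    (code, out)
}
```

Calling copy_via_out_param(b"hello") should return a status of 0 together with a five-byte vector. The same shape applies to the real compress(): the variable behind the pointer must outlive the call and be read back afterwards.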

As an aside, this is unnecessary and the uninitialized declaration is non-idiomatic:

let zipped_data_buf_len: uLong;
unsafe {
    zipped_data_buf_len = compressBound(unzipped_data_len);
}

Blocks are expressions, so you could (and should) write this instead:

let mut zipped_data_buf_len: uLong = unsafe {
    compressBound(unzipped_data_len)
};

Just in case you're not aware of it, the libz-sys crate provides all the extern "C" declarations you should need to interface with zlib via FFI. (See also: https://kornel.ski/rust-sys-crate.)

You're right, the use of zipped_data_buf_len as *mut uLong was indeed the root of the problem. The underlying u64 type had no .as_mut_ptr() method, so my thinking was to parse the variable into a raw u64 pointer - but this pointer should not point to the address of the value behind zipped_data_buf_len at all, right? I think I have some knowledge gaps around &mut - but I assume the mutable reference is behind the scenes the pointer to the memory address I was looking for?

Thanks for the great tip regarding the idiomatic use of the return value of a block, I often forget how simple it is to write idiomatic Rust code. :)

Yup, I am aware of that, but since I am in the process of learning the FFI in Rust, I have to do this by myself to get used to the syntax and the things you have to take into account. :)

That's why you should just pass a mutable reference to it. The vector has an as_mut_ptr() method because it returns the pointer to its heap buffer (which you actually want to overwrite) and not to the vec object itself (which you absolutely do not want to overwrite). But in the case of a primitive, there's no extra hidden backing storage to use – you really just want to have your u64 overwritten as-is.
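
The difference between the two pointers can be checked directly. This is a small sketch; the function name pointer_targets is made up for illustration.

```rust
/// Returns whether the Vec handle and its heap buffer live at different
/// addresses, plus the first element after a write through the data pointer.
fn pointer_targets() -> (bool, u8) {
    let mut v: Vec<u8> = vec![1, 2, 3];

    // Address of the Vec handle itself (the pointer/len/capacity triple).
    let handle_addr = &mut v as *mut Vec<u8> as usize;

    // Address of the first element in the heap allocation.
    let data_ptr: *mut u8 = v.as_mut_ptr();

    // Writing through data_ptr changes the contents, which is what a C
    // function filling a buffer is supposed to do.
    unsafe { *data_ptr = 42 };

    (handle_addr != data_ptr as usize, v[0])
}
```

A C function that received the handle address instead would write its output over the Vec's own pointer, length and capacity fields.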

Yes, Rust references are pointers. That is their fundamental property. They are just smarter than C pointers because they carry metadata at compile-time, so you can't use them incorrectly. But a &mut T can be coerced into a *mut T implicitly, so you can use it in FFI.
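
As a quick illustration of that coercion, here is a sketch in which write_through is a hypothetical helper standing in for any C function that writes through a pointer:

```rust
/// Hypothetical helper: writes `value` to the location `p` points to.
/// The caller must guarantee that `p` is valid for writes.
unsafe fn write_through(p: *mut u64, value: u64) {
    unsafe { *p = value };
}

fn coercion_demo() -> u64 {
    let mut n: u64 = 0;

    // Implicit coercion at the call site: &mut u64 becomes *mut u64.
    unsafe { write_through(&mut n, 7) };

    // The same coercion, written out with a type annotation.
    let p: *mut u64 = &mut n;
    unsafe { write_through(p, 8) };

    n
}
```

No cast is needed in either call; the compiler inserts the conversion because the parameter type is a raw pointer.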

Sorry but I have no idea what you are asking here. You don't need to parse anything. If the C function expects a pointer, just pass a pointer. Don't try to perform tricks, those never work out well.

I see. C doesn't care about the Vec itself and its fields (pointer to heap buffer, len, capacity), so with the as_mut_ptr() method I get the pointer to the actual data, which zlib can use to modify its contents. With the separate argument sourceLen, the C code will only access memory that lies within the allocation the Vec created (and of course the same goes for destLen).
In fact, at first I wrote let result = compress(..., &mut buf, ...) and wondered why this doesn't work, but now I know: this is a safe pointer to the vector (Buffer implements AsMut), which Rust then coerced to an unsafe pointer *mut Vec. But this is not the kind of type the C function expects.

In the case of the u64, I just didn't know how to get a raw pointer to it and gave x as *mut uLong a try (at least the type signatures for the arguments of the compress() function were satisfied that way). What I was trying to ask is the following: will the value that results from this cast be garbage, and will the subsequent dereferencing in zlib cause the said segmentation fault?

The problem is not even the type in itself; the problem is that the vector object itself (including its pointer and length data) would be overwritten.

It will not exactly be "garbage"; it will predictably have the same value as the integer itself (if you initialized it to 0, then it will be the address zero), just converted to a pointer. But it definitely won't be the address of the integer itself.
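
This can be verified without dereferencing anything. The following is a sketch; the function name and the value 13 are arbitrary choices for illustration (an odd value can never coincide with the aligned address of a usize).

```rust
/// Compares a pointer made by casting an integer with a pointer made by
/// taking the integer's address. Nothing is dereferenced here.
fn cast_vs_address() -> (usize, bool) {
    let len: usize = 13;

    // Integer-to-pointer cast: the pointer's address IS the integer value.
    let from_cast = len as *mut usize;

    // Reference-to-pointer cast: the pointer holds the location of `len`.
    let from_ref = &len as *const usize;

    (from_cast as usize, from_cast as usize != from_ref as usize)
}
```

Handing from_cast to a C function that writes through it means writing to address 13, which is why the original call crashed.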

And it's my responsibility to restrict the usage of the function to provide a safe interface to zlib.

Yeah, it will most probably result in undefined behavior then.

Thank you for the great and detailed answers, they helped me a lot to see things more clearly on this topic. :)
