Encapsulate global FFI access + interior mutability

I have a statically linked C library that reads a file and fills global variables with information from that file. I have managed to interface with the library through FFI, so I can load the contents of the file with a one-shot call to the C library, unsafe { read_file(file) }, and then read the external static global variables. There is also a C call, unsafe { free_file() }, that frees the memory behind the global pointers.

I am not competent enough, though, to determine how exactly I should build the Rust application that reads the file through the library and structures the data afterwards (because the output of the C code is just a bunch of globals), so I am asking the forum to help me figure out a proper program design.

I have found the recommendation to extract the external reading routines into a separate module that hides the FFI globals from the rest of the program. I accepted that and also decided that I should have some struct that absorbs all the globals (copies data from them into proper Rust data structures and then calls free_file()). The overall data structure I have is like this (pseudocode):

struct FileInformation {
    pub block1: Block1,
    pub block2: Block2,
}

struct Block2 {
    pub sub1: Block2_Subblock1,
    pub sub2: Block2_Subblock2,
}

where the blocks contain properly (from my point of view) structured data.

I assume that the overall constructor should look like this:

impl FileInformation {
    fn new(file: &Path) -> Self {
        unsafe { read_file(file) };
        let block1 = Block1::read();
        let block2 = Block2::read();
        unsafe { free_file() };
        Self { block1, block2 }
    }
}

where block reading functions know what globals they should access.

The problem is that there is nothing to stop Block1::read() from accessing the globals, some of which are pointers, when these globals are not valid (either the file hasn't been read yet or the file was already freed). The function Block1::read() may be called from any context, not only from FileInformation::new(), because block variables are public.

This 'reader' function cannot check by itself whether the pointers that will be read are valid or invalid; it will just read them and cause a segfault if the calling context is wrong. How do I enforce both encapsulation and proper context checking (=> error handling) in this case?


The second question. Suppose that Block2 has a field sub2 that perfectly belongs there by its very meaning, but its data supplier is another source, not the file we are reading. The program can live fine with empty sub2, but with restricted capabilities if sub2's source is not available. Rust teaches us that this is the place for Option:

struct FileInformation {
    pub block1: Block1,
    pub block2: Block2,
    pub block3: Block3,
}

struct Block2 {
    pub sub1: Block2_Subblock1,
    pub sub2: Option<Block2_Subblock2>,
}

However, if the sub2 source becomes available after the main struct FileInformation has already been constructed, I do not want to re-read everything; I would rather take the existing struct and add the missing information. Since I do not know when exactly that source will become available, I see these options: declare the instance of FileInformation as mut and carry that mut around, which I do not want to do; have interior mutability on just the field FileInformation::block2::sub2 while the FileInformation as a whole is not mutable; or something else I am unaware of. Which would you consider "the right version" for this problem?

For your first problem:

pub struct Block1 {
...
}

impl Block1 {
    fn read() -> Self { ... }
    pub fn is_awesome_block(&self) -> bool { ... }
    ...
}

Block1::read() is then only accessible in the module it's defined in, which should also contain struct FileInformation. That way, users of FileInformation or Block1 outside the defining module can't trigger a read from the globals that the FFI library fills, and if Block1::read is the only way to create a Block1, then a Block1 can only be created by code in your module. Because this module only does the FFI accesses, you can keep it nice and small, and have it be part of a bigger crate that knows how to do useful stuff. See "Visibility and privacy" in The Rust Reference for more on privacy options.
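A minimal sketch of that layout, under stated assumptions: the module name ffi_reader, the single i32 global, and the value accessor are hypothetical, and the C side is replaced by plain Rust stand-ins so the snippet compiles without the real library (with the real bindings, read_file, free_file and the globals would come from an extern "C" block and need unsafe).

```rust
mod ffi_reader {
    use std::sync::atomic::{AtomicI32, Ordering};

    // Stand-ins for the C library (assumption: the real items are declared
    // in an `extern "C"` block and can only be touched inside `unsafe`).
    static GLOBAL_VALUE: AtomicI32 = AtomicI32::new(0);

    fn read_file(_path: &str) {
        GLOBAL_VALUE.store(42, Ordering::SeqCst);
    }

    fn free_file() {
        GLOBAL_VALUE.store(0, Ordering::SeqCst);
    }

    pub struct Block1 {
        value: i32,
    }

    impl Block1 {
        // Private: only code inside `ffi_reader` can call this.
        fn read() -> Self {
            Block1 { value: GLOBAL_VALUE.load(Ordering::SeqCst) }
        }

        // Public accessor that never touches the globals.
        pub fn value(&self) -> i32 {
            self.value
        }
    }

    pub struct FileInformation {
        pub block1: Block1,
    }

    impl FileInformation {
        // The only public entry point; every read of the globals happens
        // between `read_file` and `free_file`.
        pub fn new(path: &str) -> Self {
            read_file(path);
            let block1 = Block1::read();
            free_file();
            Self { block1 }
        }
    }
}

// Code outside the module can only go through `FileInformation::new`.
fn load_value() -> i32 {
    let info = ffi_reader::FileInformation::new("data.bin");
    // ffi_reader::Block1::read(); // does not compile: `read` is private
    info.block1.value()
}
```

Outside ffi_reader, the already-copied data is freely readable, but nothing can reach the globals directly.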

For the second case, I'd use struct update syntax, and probably a helper function:

impl Block2 {
    fn set_sub2(self, sub2: Block2_Subblock2) -> Self {
        Self {
            sub2: Some(sub2),
            ..self
        }
    }
}

This allows you to create a Block2 that's filled with the content of self, but has some information overwritten (in this case, sub2). You rely on the compiler to optimize this, which is possible because you're passing in an owned Block2 and returning an owned Block2. The "Defining and Instantiating Structs" chapter of The Rust Programming Language has details on struct update syntax.
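As a small usage sketch of that helper: Subblock1, Subblock2, their contents, and the function fill_in_later are hypothetical stand-ins, not your real types. The point is that the caller never needs a mut binding, because the old value is consumed and the name is rebound by shadowing.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Subblock1(u32); // hypothetical stand-in for Block2_Subblock1

#[derive(Debug, Clone, PartialEq)]
struct Subblock2(String); // hypothetical stand-in for Block2_Subblock2

#[derive(Debug, Clone, PartialEq)]
struct Block2 {
    sub1: Subblock1,
    sub2: Option<Subblock2>,
}

impl Block2 {
    // Consumes the old value and returns a new one with `sub2` filled in;
    // all other fields are moved over unchanged by `..self`.
    fn set_sub2(self, sub2: Subblock2) -> Self {
        Self {
            sub2: Some(sub2),
            ..self
        }
    }
}

fn fill_in_later() -> Block2 {
    // Built first without the optional part.
    let block2 = Block2 {
        sub1: Subblock1(7),
        sub2: None,
    };
    // Later, once the second source is available: shadow the old binding
    // with the updated value instead of mutating in place.
    let block2 = block2.set_sub2(Subblock2("late data".to_string()));
    block2
}
```

The trade-off is that the updated value has to be handed back to whoever owns it; anything that only holds a shared reference to the old value cannot use this approach.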

You might also want a helper on FileInformation (I've written this in a different style that you could steal for impl Block2 above if you prefer it):

impl FileInformation {
    fn set_block2_sub2(self, sub2: Block2_Subblock2) -> Self {
        let Self { block1, block2 } = self;
        let block2 = block2.set_sub2(sub2);
        Self { block1, block2 }
    }
}

Thank you for the reply

Do I understand correctly that, as long as I stay inside the FFI module, I can be less defensive about pointer accesses? By the way, could you please say something about the is_awesome_block function: does it belong to the discussion, or is it just a dummy function?

Rust's privacy controls are granular, with the module being the smallest unit of granularity.

To use pointer accesses, you have to use unsafe - but unsafe can result in large areas of code that need to avoid breaking invariants that your unsafe block depends on. If you break an invariant that unsafe code depends on, you can end up in Undefined Behaviour, which in turn means that an apparently innocent piece of code can break your program in weird and wonderful ways.

By keeping all the things the unsafe code depends on in a single module, you ensure that only that module (the FFI module in your case) could potentially trigger UB. By careful design, you can ensure that the invariants that your unsafe code depends on (like "you must not access the globals before read_file or after free_file") are contained in the FFI module, and thus your code outside the FFI module cannot cause UB.
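One way to make that invariant hard to break even inside the FFI module is a private guard type. This is only a sketch under assumptions: the names guarded_reader, Loaded and is_loaded are hypothetical, and the C functions and globals are again replaced by Rust stand-ins so the snippet compiles on its own. Creating the guard calls read_file, dropping it calls free_file, and read demands a reference to it.

```rust
mod guarded_reader {
    use std::sync::atomic::{AtomicBool, AtomicI32, Ordering};

    // Stand-ins for the C globals and functions (assumption: the real ones
    // come from an `extern "C"` block and need `unsafe`).
    static FILE_LOADED: AtomicBool = AtomicBool::new(false);
    static GLOBAL_VALUE: AtomicI32 = AtomicI32::new(0);

    fn read_file(_path: &str) {
        GLOBAL_VALUE.store(42, Ordering::SeqCst);
        FILE_LOADED.store(true, Ordering::SeqCst);
    }

    fn free_file() {
        GLOBAL_VALUE.store(0, Ordering::SeqCst);
        FILE_LOADED.store(false, Ordering::SeqCst);
    }

    // Proof that `read_file` has run and `free_file` has not run yet.
    // The type is private, so it can only be created in this module.
    struct Loaded(());

    impl Loaded {
        fn open(path: &str) -> Self {
            read_file(path);
            Loaded(())
        }
    }

    impl Drop for Loaded {
        // Runs when the guard goes out of scope, including on early
        // returns and panics.
        fn drop(&mut self) {
            free_file();
        }
    }

    pub struct Block1 {
        value: i32,
    }

    impl Block1 {
        // Requires a `&Loaded`, so it cannot be called while the globals
        // are not valid.
        fn read(_proof: &Loaded) -> Self {
            Block1 { value: GLOBAL_VALUE.load(Ordering::SeqCst) }
        }

        pub fn value(&self) -> i32 {
            self.value
        }
    }

    pub struct FileInformation {
        pub block1: Block1,
    }

    impl FileInformation {
        pub fn new(path: &str) -> Self {
            let loaded = Loaded::open(path);
            let block1 = Block1::read(&loaded);
            Self { block1 }
            // `loaded` is dropped here, which calls `free_file`.
        }
    }

    // Demonstration helper only: reports the state of the stand-in flag.
    pub fn is_loaded() -> bool {
        FILE_LOADED.load(Ordering::SeqCst)
    }
}
```

This sketch does not protect against two threads calling new at the same time; since the C globals are shared, the whole read/copy/free sequence would need to be serialized (for example behind a Mutex) if that can happen.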

And is_awesome_block is just a dummy function to show that accessors that don't depend on what the unsafe code is doing (nor affect what unsafe code is run) can be pub, but that code like read which depends on invariants from unsafe code should not be publicly visible.
