How to simulate I/O errors when writing integration tests for a binary crate?

I'm writing integration tests for my binary crate (CLI application) and I'm trying to achieve full code coverage. Currently, I'm at 95% code coverage, which I think is quite good already, but the only remaining code paths that are not reached by any test are all related to error handling: specifically, to handling various kinds of I/O errors.

How can I "simulate" I/O errors in the integration tests? :thinking:


Here is what I have got, so far:

  • Simulating a failure of File::open() is trivial. Just pass a path to a non-existing file.

  • Simulating a failure of File.read() is not so trivial. I have found that, on Linux, I can pass the special path /proc/self/mem as the input file, because this file opens fine, while any read() attempt fails reliably. But this is a Linux-only solution. And I'm not even sure whether Linux guarantees this behavior for all distributions and for all future kernel versions. It seems a bit brittle. Is there a more general, platform-independent solution?

    I thought about writing a custom Linux device driver (kernel module) that creates a virtual file /proc/fail, which always opens successfully — the open() implementation would be just a NOP — and that fails on any read or write attempt — the read() and write() implementations would simply return EIO right away. But this doesn't really solve the platform-specific issue. And it makes the test setup even more complex.

  • Simulating a failure of ReadDir.next() seems to be very hard. I have not found a way to simulate the case where a directory can be opened successfully but iterating its entries then fails! Maybe this could be simulated with a custom device driver, but it seems to go beyond what procfs can do, and therefore I don't really have an idea how to implement it.

  • Simulating a failure of write!() to the stdout stream is easy again. I can simply create a pipe, close the "read" handle of the pipe immediately, and set up the remaining "write" handle of the half-closed pipe as the Command::stdout() of my child process. This causes any write!() to stdout inside the child process to fail reliably, with a "broken pipe" error.

  • Simulating a failure of read() from the stdin stream is difficult. Setting the Command::stdin() of the child process to the "read" handle of a pipe whose "write" handle has already been closed does not seem to trigger a failure of read() from stdin inside the child process. I have not found a way to make read() from stdin fail!


So, is there a simpler way to simulate these errors?

Or, is there at least some way to simulate the error cases that I currently could not simulate?

Regards.

Are you asking how to do this without stubbing out your IO layer? Because stubbing out the IO layer is the first thing I'd reach for.

If you don't want to touch your program's source code at all, then you can swap out the syscall library (libc.so on Linux, Kernel32.dll on Windows) with a custom one that returns errors from the appropriate system calls. Doing this will also let you simulate things like memory allocation failures.

Another option is to use the ptrace API to halt the program whenever there's a system call and change the return value to an error code before resuming the program. I don't know how to do this on Windows, but I guarantee you it has something equivalent.

On Linux there is also FUSE which I believe lets you emulate arbitrary filesystem behavior.

Thanks for your response! :slight_smile:

Yes, I think so.

My application uses I/O functions and structs directly from std. Of course, I could write my own I/O layer. This way I could "fake" the error inside the application itself. But I only need to "fake" the error for one particular test case. In all other cases (and certainly in the "release" binary!) this must not happen. Default behavior just forwards the call to the std I/O routines.

So how would I trigger the "fake" error, but just in one particular test case?

Environment variable ?!

What I don't like about this idea is that it adds a lot of complexity all over the application — because every I/O call now would have to go through the additional I/O layer instead of simply calling the "normal" std I/O routines directly. Also, this approach probably adds a runtime overhead to my application. All of this, just to make one specific integration test possible :worried:

I don't mind adding complexity and/or runtime overhead to my test suite (integration tests), but every overhead that is added to the actual application, just to make it "testable", is a concern.

This sounds very complicated. Yes, I could probably build, e.g., musl libc from the sources, which would then allow me to modify it as needed. But, assuming I wanted to "manipulate" a system function like read(), how would I know which particular read() invocation is supposed to fail?

Just making every read() fail unconditionally would probably break too much :sweat_smile:

Also, it would mean that my integration tests could only run successfully on a system with my "custom" libc library installed. That's quite a constraint. Not sure I like it.

Now, this sounds interesting! But I have no idea how to do it :thinking:

Also, what you describe sounds like a manual process. But what I would need is something that can be set up automatically from the test suite. And only for a particular test case. That plus: It would need to automatically catch and modify just the right syscall. Is that possible?

Regards.

I'd say it would be better to move the logic out of your binary and put it into lib, then parametrize lib with something it might read data from.

Not only does it make your code testable, it also gives you the ability to substitute stdin with sockets, add filters and so on.

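use std::io::{self, stdin, Read};

// Reads from the process's real stdin.
fn read_from_stdio(dest: &mut [u8]) -> io::Result<usize> {
    stdin().read(dest)
}

// All logic lives here; `get_data` decides where the bytes come from.
fn do_all(get_data: fn(&mut [u8]) -> io::Result<usize>) {
    let mut data = vec![0u8; 4096]; // read() needs a non-empty buffer to fill
    let n = get_data(&mut data).unwrap();
    data.truncate(n); // keep only the bytes that were actually read
}

fn main() {
    do_all(read_from_stdio);
}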

You can now parametrize do_all with anything you like.
A function pointer is a free (or almost free) abstraction here.
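A variation on the same idea, as a rough sketch only: instead of a function pointer, the logic can be generic over std::io::Read, and a test can pass in a reader that always fails. The names FailingReader and count_bytes are made up for illustration and are not from the code above.

```rust
use std::io::{self, Read};

/// Hypothetical test double: a reader whose every `read` call fails.
pub struct FailingReader;

impl Read for FailingReader {
    fn read(&mut self, _buf: &mut [u8]) -> io::Result<usize> {
        Err(io::Error::new(io::ErrorKind::Other, "injected read failure"))
    }
}

/// Hypothetical application logic, generic over its input source.
/// Returns the number of bytes consumed, or a message meant for the user.
pub fn count_bytes<R: Read>(mut input: R) -> Result<usize, String> {
    let mut data = Vec::new();
    input
        .read_to_end(&mut data)
        .map_err(|e| format!("failed to read input: {e}"))?;
    Ok(data.len())
}
```

In main you would call something like count_bytes(std::io::stdin().lock()). Since the function is generic, it is monomorphized for the real reader, so there should be no dynamic dispatch in the release binary. Note that this exercises the error path in a unit test, not through the spawned binary.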

the easiest way to cause libc::readdir() to fail is to remove the directory. this usually results in ENOENT. other ways to cause the filesystem failure include: unmount the underlying filesystem, kill the FUSE daemon process, interrupt the network connection for remote filesystems (e.g. sshfs), etc.

note, libc::readdir() buffers the directory entries in userspace, so NOT every readdir() call issues the getdents64 syscall. this means, for small directories, only a single syscall is needed the first time you call <ReadDir as Iterator>::next(); subsequent next() calls would not be affected even if the filesystem is unmounted.

so, if you don't want to use huge directories (or, huge entries in a small directory, e.g. very long file names) in your tests, you should inject the fault right after the std::fs::read_dir(), but before the returned iterator is being advanced.
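to make the ordering concrete, here is a minimal in-process sketch, assuming Linux: open the directory, remove it, and only then advance the iterator. the function name and the scratch directory name are made up. it does not show how to get the same timing against a separate child process, which would still need some form of synchronization.

```rust
use std::fs;
use std::io;

/// Opens a fresh directory, removes it, and only then advances the iterator.
/// Returns the error produced by the first `next()` call, if there was one.
pub fn read_dir_after_removal() -> io::Result<Option<io::Error>> {
    // Hypothetical scratch location; any empty, removable directory works.
    let dir = std::env::temp_dir().join(format!("readdir-fault-{}", std::process::id()));
    fs::create_dir_all(&dir)?;

    let mut entries = fs::read_dir(&dir)?; // the directory handle is open now
    fs::remove_dir(&dir)?; // inject the fault before the first next()

    Ok(match entries.next() {
        Some(Err(e)) => Some(e), // the iteration error path was reached
        _ => None,               // the platform reported an empty directory instead
    })
}
```

on Linux I would expect the returned error to be the ENOENT mentioned above, but that is platform behavior, so a test should probably not hard-code it for other systems.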

reading from a closed pipe returns EOF, that's well documented behavior. (on the other hand, writing to a closed pipe would return EPIPE).

to inject failure to stdin is no different from regular files. just redirect stdin into a non-readable file, instead of using pipes. alternatively, you can close the file descriptor of stdin, e.g. libc::close(0), but I would not recommend it, since I'm afraid this might cause the stdlib to fail in unexpected way, other than simply failing the stdin.read() call.

Actually, that is pretty much the design I already have: A library crate that implements all the "core" functions and a binary crate that implements the CLI application.

The "core" library is intentionally designed to not do any I/O by itself. It also does not write any error messages to the terminal. Put simply, all that the "core" library does is: it takes the input data from the source buffer (&[u8]), performs the desired computations, puts the output data into the target buffer (&mut [u8]) and returns a Result<(), ErrorCode>.

But this means that all actual I/O operations, like reading data from a file, iterating directory entries, printing results to the terminal, etc., happen in the CLI application (binary crate). This also means that there are a number of code paths for handling potential I/O errors in the CLI code.

It is these "error handling" branches in the CLI front-end where code coverage of the integration tests is lacking. Achieving full code coverage for the "core" library was easy :smiling_face:

The big question with this approach is: How would I orchestrate all of this — with the right timing — from "outside" the process under test, i.e., from the test code? :thinking:

Keep in mind that integration tests for a binary crate look something like this:

#[test]
fn test_without_extra_crates() {
    let output = Command::new(env!("CARGO_BIN_EXE_program"))
        .args(/* test-specific parameters go here */)
        .output()
        .expect("failed to execute process");

    assert!(output.status.success());
    let stdout = String::from_utf8_lossy(&output.stdout);
    assert!(stdout.contains("Expected Output"));
}

What would be a good "non-readable" file?

Since the GitHub runner runs everything (including tests) as "root" user, access permissions are ignored. So, using access permissions to make a file "read-only" won't work. However, in the meantime, I found that we can set stdin to a directory to make all reads from stdin fail :smiling_face_with_sunglasses:
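For reference, a minimal sketch of the directory trick, assuming a Unix-like system where opening a directory read-only succeeds and reading from it fails. The function names are made up for illustration.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::process::Stdio;

/// Builds a stdin handle for a child process from an opened directory.
/// Assumes a Unix-like system; opening a directory this way fails on Windows.
pub fn unreadable_stdin() -> io::Result<Stdio> {
    Ok(Stdio::from(File::open("/")?))
}

/// Shows the effect in-process: reading from a directory handle is an error.
pub fn read_from_directory() -> io::Result<usize> {
    let mut dir = File::open("/")?;
    let mut buf = [0u8; 16];
    dir.read(&mut buf)
}
```

In an integration test the handle would be passed along the lines of Command::new(env!("CARGO_BIN_EXE_program")).stdin(unreadable_stdin()?), after which the test can assert on the exit status and stderr of the child.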

if you want to test the binary as-is, without any instrumentation, then the only possible way is to use os-specific mechanism, and it would be VERY tedious and error prone, but the alternatives would be equally as complicated, or even worse.

for example, you can use FUSE to hook/mock filesystem operations, like opendir(), readdir(). essentially, you need a mock filesystem implementation in your test harness. or you can create a pty pair to mock user interactive input events, etc.

some alternatives may include:

  • to intercept at the syscall boundary, e.g. using ptrace, as suggested already. this is like implementing a mini debugger in your test harness

  • to intercept at the libc boundary, e.g. using LD_PRELOAD or similar hacks, this almost feels like creating malware tbh.

none of these methods is simple by any means. then the question becomes: is it worth all the effort? is the 100% coverage goal realistic to begin with? or, are there alternatives to achieve (almost) the same level of quality control without the need for uninstrumented tests? or at least, can the program be built in a way so that end-to-end tests don't need complicated os level mocking?

yes, that's also a good way to force the read() to fail, another method is to close the stdin file descriptor. they give different errno:

$ cat /
cat: /: Is a directory
$ cat 0<&-
cat: -: Bad file descriptor

Would something like this scratch your itch? Triggering fake errors on a test-by-test basis might get trickier, though you could still force your underlying override to branch into different kinds of failures depending on the input. As in: on "must-not-open" in the filename, fail at open(); on "must-not-read", fail at read(), etc. You would be somewhat defeating the whole purpose of integration tests, as there's no testing of any "integration" of your code with the underlying OS; but - given the target of "100% coverage" alone - it might be a cute little exercise, if nothing else.
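A rough sketch of what such name-based branching could look like inside the application, assuming a thin wrapper around File. The names FaultyFile and open_with_faults are hypothetical; only the "must-not-open" and "must-not-read" markers come from the description above.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Hypothetical wrapper around `File` that can be told to fail on read.
pub struct FaultyFile {
    inner: File,
    fail_read: bool,
}

fn injected(what: &str) -> io::Error {
    io::Error::new(io::ErrorKind::Other, format!("injected {what} failure"))
}

/// Opens `path`, unless the path contains the marker "must-not-open".
pub fn open_with_faults(path: &Path) -> io::Result<FaultyFile> {
    let name = path.to_string_lossy();
    if name.contains("must-not-open") {
        return Err(injected("open"));
    }
    Ok(FaultyFile {
        inner: File::open(path)?,
        // Remember the marker so that later reads can fail.
        fail_read: name.contains("must-not-read"),
    })
}

impl Read for FaultyFile {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.fail_read {
            return Err(injected("read"));
        }
        self.inner.read(buf)
    }
}
```

A test then only has to pick the file name to select the failure, which keeps the trigger per test case without environment variables.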

So compile out the stub implementation in release builds.

Doing I/O "all over" the application is usually a sign of bad design.

Assuming you don't compile out the stub implementation in release mode, the I/O itself is still likely going to be much more expensive than the overhead from your wrapper.

Your custom libc.so doesn't need to actually implement any of the syscalls, just forward them to the real libc.so

You don't need to install it, you just need to force the custom one to load using something like LD_PRELOAD.
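From the test suite, that could look roughly like the sketch below: the environment variable is set only on the one Command that needs it, so other test cases are unaffected. The helper name and both paths are placeholders, and this assumes Linux and a dynamically linked binary (a statically linked one would not load the preloaded library).

```rust
use std::path::Path;
use std::process::Command;

/// Builds a command that runs `program` with `override_lib` preloaded.
/// Both paths are placeholders supplied by the test case.
pub fn command_with_preload(program: &Path, override_lib: &Path) -> Command {
    let mut cmd = Command::new(program);
    // Only this child process sees the variable; the test runner does not.
    cmd.env("LD_PRELOAD", override_lib);
    cmd
}
```

The returned Command can then be given arguments and run with .output() like in any other integration test.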

You can read the arguments to the system call before deciding whether to change its behavior. If I were doing this I'd run the test multiple times and cause the Nth system call to fail where N increments with each run. That way you're testing all of your system calls. You could also make them fail at random (Tigerbeetle's test suite uses a more advanced version of this strategy).
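The "fail the Nth call, then increment N" loop can be illustrated in-process as well. This is only an analogue of the idea using a Read wrapper, not the ptrace version, and the names FailNth and sweep are made up.

```rust
use std::io::{self, Read};

/// Hypothetical wrapper that lets the first `remaining` reads through
/// and makes the next one fail exactly once.
pub struct FailNth<R> {
    pub inner: R,
    pub remaining: usize,
    pub tripped: bool,
}

impl<R: Read> Read for FailNth<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.remaining == 0 && !self.tripped {
            self.tripped = true;
            return Err(io::Error::new(io::ErrorKind::Other, "injected failure"));
        }
        self.remaining = self.remaining.saturating_sub(1);
        self.inner.read(buf)
    }
}

/// Runs `operation` repeatedly, failing read number 0, 1, 2, ... in turn,
/// until a run completes without reaching the injected fault.
/// Returns how many runs hit the injected failure.
pub fn sweep(data: &[u8], operation: impl Fn(&mut dyn Read) -> io::Result<()>) -> usize {
    let mut n = 0;
    loop {
        let mut reader = FailNth { inner: data, remaining: n, tripped: false };
        let result = operation(&mut reader);
        if !reader.tripped {
            return n; // the operation finished before the Nth read
        }
        // The operation must surface the injected error, not swallow it.
        assert!(result.is_err(), "injected failure on read {n} was swallowed");
        n += 1;
    }
}
```

The same loop structure would apply at the syscall level: the harness reruns the program, bumps N, and stops once a run finishes without the fault being triggered.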

The ptrace method can be implemented in a handful of lines of code.

I think what @throwable-one meant was move the "io" logic out of your binary and put it into lib.

I will probably look into this, but haven't used the ptrace API before.

Sure, if all the methods that could achieve full code coverage in the "error handling" code paths are excessively complex, then I will probably just have to ignore these.

I was hoping that there might be some "simple" solution that I hadn't been aware of, but this doesn't seem to be the case :neutral_face:

BTW: It's a bit unfortunate that #[coverage(off)] is still unstable.

This would mean that the application code being tested would actually be different from the "real" (release) application code. Wouldn’t that be kind of “cheating” the test :thinking:

Actually, as pointed out above, I already have a design where the "core" library and the CLI application are cleanly isolated as separate crates. All the I/O operations are done in the CLI front-end, whereas the "core" library only implements the pure computations.

But this means that the CLI application unavoidably needs to perform I/O operations — such as scanning a directory for input files, reading a specific input file, writing out the results, etc. — at various places, in order to implement all the CLI commands that need to be supported.

Rerouting all of these I/O calls through my own I/O layer would add a lot of code complexity to the CLI application, just to be able to "fake" I/O errors for a handful of integration tests...

:backhand_index_pointing_right: IMO, tests should not force unnecessary complexity into the production code.

...but "my" libc would have to implement those "forwarding" stubs for all syscalls.

I think it would be more straightforward to just build an existing complete libc, e.g. musl libc, from the sources and modify it in a few places as needed.

Anyway, I think that writing my integration tests in such a way that they will only work properly in conjunction with a "custom" libc... doesn't seem like a viable approach :pensive_face:

There are two aspects to this, which the whole endeavour (going off by your description) seems to implicitly mix and mash. For one, there's your application code, which in its (relative) isolation is meant to account for all the things that could go wrong. When testing for this, it's not your job to attempt to replicate the exact conditions under which those things do go wrong: your only job is to account for them. In Rust, that boils down to (among other things) making sure you don't accidentally unwrap() or expect() in a place where you should be match-ing or if let Err(..)-ing instead.

Mock interfaces are meant to help you do just that. Regardless of whether you implement them behind a bunch of #[cfg(test)] compilation guards or via your own I/O wrappers, the task itself doesn't change all that much. That's one part of the equation all done and over with, hopefully.

The other side is about how well your program interacts (integrates) with the world outside of itself. If you're querying a DB over the TCP and your connection freezes, or the server hangs, or something else goes haywire: does your program go haywire with it? "Integration testing" and "simulating I/O errors" are therefore, by definition, fundamentally at odds with each other. Either you're testing things out in a real-world use-case scenario, or you're bending over backwards in an attempt to consciously over-engineer increasingly obscure edge cases neither you nor your library nor its users will ever stumble upon; and you're doing it all in the name of what - a perfectly round figure of "100%"?

If you want to extend the extent to which your program is able to account for, and recover from, some implicitly assumed properties you're relying on when interfacing with std::io or std::fs or whatever else it might be, manually mocking/overriding the specific modules to (spuriously) produce the kinds of errors you aren't (and would never be) able to reproduce on your own, is enough.

Or you can keep on digging through every nook and cranny of every platform and OS under the sun along with all the ways in which each and every one of them could happen to violate any of the aforementioned properties, with no regards to how often it factually occurs 99.9% of the time.

Unless you're planning to pitch your CLI tool to NASA or any other contractor, requiring perfect 100% code coverage compliance alongside a whole array of additional requirements?

The goal with tests is to minimize false positives, false negatives, and test run time. Minimizing changes to application code helps reduce false positives. Refusing to instrument application code at all can increase all three.

Only if the difference between the instrumented and non-instrumented binaries would change the result of the test.

Complexity of the test code should also be a consideration.

this is a textbook case for a mock. you can use mockall
once you have a mock of all the stuff you need you can do

#[cfg(not(test))]
use std::io::{/* std io stuff */};

#[cfg(test)]
use mocks::{/* mock io stuff renamed to std io names */};

your code will compile with the mocks instead of the real structs during testing and you will be able to emulate whatever scenario you want

No, you only have to export the symbols you want to override. The rest automatically resolve to the original libc.

You are right! :sweat_smile:

The trick is to not replace the "libc" library completely, but use LD_PRELOAD for loading a custom library that contains just the functions we need to override. These will automatically take precedence over the "default" implementation (from libc), and we can even pass-through the call to the "default" implementation via dlsym(RTLD_NEXT, "function_name") as needed.

Another important thing to know is that we can use readlink() with /proc/self/fd/%d to translate a given FD back to the original file path, so that we can figure out (inside the "override" library) which calls we need to intercept and which ones we have to pass through as-is.
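The lookup itself is small. Just to illustrate it (in Rust for consistency with the rest of the thread; inside the override library it would be the C readlink() call), here is a minimal Linux-only sketch with a made-up function name:

```rust
use std::fs::{self, File};
use std::io;
use std::os::unix::io::AsRawFd;
use std::path::PathBuf;

/// Resolves the path behind an open file descriptor via procfs.
/// Linux-specific: relies on `/proc/self/fd/<n>` being a symbolic link.
pub fn path_of_fd(file: &File) -> io::Result<PathBuf> {
    fs::read_link(format!("/proc/self/fd/{}", file.as_raw_fd()))
}
```

The returned path can then be compared against whatever marker the test uses to decide whether a call should be intercepted or passed through.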

FWIW, here’s the “override” library I’ve come up with:

Works with opendir() and readdir() just as well :smiling_face_with_sunglasses: