How do you test binaries? (not libraries)


As part of my learning project of porting Apple’s shell_cmds to Rust, I would like to properly test things.

Here’s the thing. Some of the utilities are mostly just custom parsing of command-line arguments, which means I’m trying to test the binary itself.

In my project, I have library code like this: src/ I can put unit tests of library code right in this file. That’s fine.

Then I’ve got binaries that look something like this: /src/bin/ – this binary is where the command-line parsing happens.

I can make integration tests in tests/ to test inter-library stuff, but I don’t see any way to get at the binary’s code.

So, I’m thinking the answer is to spawn the binary from the integration test file and interact with it that way. But how do I know which binary has just been produced (i.e., should I call target/debug/command or…?)? I’m hoping I don’t have to spawn a shell and then call cargo run --bin command…because that seems awfully kludgy.

Ideas for writing portable tests executing external programs?

You might want to take a look at how cargo’s integration tests handle this - I’m not super familiar with it, but the logic should be in somewhere.


Good suggestion. So after some research, it appears that cargo has created an elaborate builder pattern for their own command handling, and cargo’s own tests that test cargo’s functionality use the builder to build up an internal representation of a cargo command and run that directly on projects that they generate on the filesystem just-in-time.

So, in essence, not even cargo itself tests its own actual binaries. It’s just libraries testing libraries. Turtles all the way down. 🙁

I suppose the lesson is that all the code should go in the library, and the binary should just be a line to call a function to process the arguments, and a line that passes the processed arguments to the library code. Then there’s really no point in testing the binary, eh? Perhaps the same assumption is behind binaries not being included in document generation. Whatever the case, I am getting the distinct impression that I ought to minimize the size of my binary source files. Then it’s a moot point. I’ll have nothing in the binaries worth testing. Well, except for their actual interaction with the outside world.
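As a rough sketch of that split (the `Options` struct, `parse_args`, `run`, and the `mylib` crate name are all made up for illustration, loosely modelled on an `echo`-like utility), the library side could look something like this, with the binary reduced to the two lines shown in the trailing comment:

```rust
// Library side (e.g. the crate's library source file): all parsing and
// logic lives here, so ordinary unit tests can reach it.

/// Hypothetical options for an `echo`-like utility.
#[derive(Debug, PartialEq)]
pub struct Options {
    pub no_newline: bool,
    pub words: Vec<String>,
}

/// Parse arguments (without the program name) into `Options`.
pub fn parse_args<I>(args: I) -> Options
where
    I: IntoIterator<Item = String>,
{
    let mut no_newline = false;
    let mut words = Vec::new();
    for arg in args {
        // In this sketch only a single leading `-n` is treated as a flag;
        // anything else, including a later `-n`, is an ordinary word.
        if arg == "-n" && words.is_empty() && !no_newline {
            no_newline = true;
        } else {
            words.push(arg);
        }
    }
    Options { no_newline, words }
}

/// Produce the output text; the binary only has to print it.
pub fn run(opts: &Options) -> String {
    let mut out = opts.words.join(" ");
    if !opts.no_newline {
        out.push('\n');
    }
    out
}

// Binary side then shrinks to roughly (assuming the library crate is
// called `mylib`, which is a placeholder name):
//
// fn main() {
//     let opts = mylib::parse_args(std::env::args().skip(1));
//     print!("{}", mylib::run(&opts));
// }
```

With this shape, the argument handling can be exercised from plain `#[test]` functions, and only the two-line `main` is left uncovered.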

I had an interesting ride through cargo’s code for about an hour! Clever stuff. Very hairy code.

I didn’t particularly like the pattern of looped-back module namespaces, which essentially made many things globally scoped. They have this ops module that pulls in pub stuff from all over, and they all reference each other through ops again. Is this a common pattern? It meant that every time anything used something from another module, I had to literally go look at the giant ops/ file to find its actual file location. I can see providing such a thing for external users of the library to use to have a more stable experience, but using it internally seems like it would encourage (or at least tolerate) poor code organization.


Hmm. I think I am mistaken. It looks like deep in the bowels of cargo the binaries are somehow being compiled with the tests and run externally. More research to do.


@alexcrichton may be able to give some more detail on how cargo’s test stuff works.

I am also not the biggest fan of exposing things in multiple places at the same time for the same reasons you’re finding 🙂


Yep, cargo’s test suite is definitely worth looking at! It’s one of the best test suites I’ve worked with.

The vast majority of tests literally call into the cargo binary and inspect its stdout/stderr together with the filesystem state. It’s interesting to note that the --verbose flag is used to check the internal states. For example, to check that only one binary is compiled with the --bin name flag, you check all rustc invocations, which are visible with -v (test source).

To see how a cargo project is actually executed, look at the cargo and cargo_dir functions.


I think this is mainly a tooling problem: if you have good “go to symbol by name” and moderate “go to definition for symbol at caret”, then the cargo code base is really easy to navigate.

I personally quite like this ops facade: in 25 lines it summarizes all the stuff you can do. Then, in implementation modules, you can write crazily complicated implementation without the fear of affecting the API.


I can think of another way you’d test your binaries. Just hack together a script which will pass a set of command line arguments to the binary and check that the output is what you expect. That’s only feasible if your program finishes in a reasonable amount of time, of course.
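The "script" doesn't have to be a shell script, either. Here is a minimal table-driven sketch in Rust; `Case` and `check_cases` are names invented for this example, and it only compares stdout (exit status and stderr are ignored to keep it short):

```rust
use std::process::Command;

/// One row of the table: arguments to pass and the stdout we expect back.
struct Case<'a> {
    args: &'a [&'a str],
    expected_stdout: &'a str,
}

/// Run every case against `program` and return a description of each mismatch.
/// An empty result means all cases passed.
fn check_cases(program: &str, cases: &[Case<'_>]) -> Vec<String> {
    let mut failures = Vec::new();
    for case in cases {
        match Command::new(program).args(case.args).output() {
            Ok(out) => {
                // Lossy conversion keeps the sketch simple for non-UTF-8 output.
                let actual = String::from_utf8_lossy(&out.stdout);
                if actual.as_ref() != case.expected_stdout {
                    failures.push(format!(
                        "{:?}: expected {:?}, got {:?}",
                        case.args, case.expected_stdout, actual
                    ));
                }
            }
            Err(err) => {
                failures.push(format!("{:?}: could not run: {}", case.args, err));
            }
        }
    }
    failures
}
```

You would point `program` at the built binary and fail the run if the returned list is non-empty.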

I personally prefer the approach you mention. Have everything in libraries; then your binary is just a couple of lines which process the inputs and call functions to generate outputs. Then you can run the binary manually to ensure the entire system works as you’d expect. If it’s only a handful of lines then it’s hard for bugs to slip in.


If Cargo is too overwhelming, here are a few more tools that run integration tests on a binary. All of them run tests that invoke the binary and inspect stdout.


This is the approach that I prefer. You see this same pattern at other layers of abstraction, for example, separating IO from logic, or patterns like CQRS.

In this case, the tough part is dealing with the binary aspect, so you make that layer as thin as possible and then just don’t test it. Or maybe you do, but I rarely find tests like that to have tons of value.


For your original question, testing binaries, I’d definitely recommend just executing the binary itself and asserting the stdout/stderr/status (Command::output is super useful for this). Cargo will automatically ensure that your binary is compiled when running integration tests (tests/*.rs) as well.
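A minimal sketch of what that could look like, with some assumptions spelled out: `run_binary` and `stdout_of` are helper names made up here, `mycmd` is a placeholder binary name, and the `CARGO_BIN_EXE_<name>` environment variable in the commented test is something reasonably recent Cargo versions set when building integration tests (older versions don't have it, in which case you'd have to locate the executable under target/ yourself):

```rust
use std::process::{Command, Output};

/// Run `program` with `args` and collect exit status, stdout and stderr.
fn run_binary(program: &str, args: &[&str]) -> Output {
    Command::new(program)
        .args(args)
        .output()
        .expect("failed to start the binary under test")
}

/// Captured stdout as text (lossy, so invalid UTF-8 won't panic).
fn stdout_of(output: &Output) -> String {
    String::from_utf8_lossy(&output.stdout).into_owned()
}

// In an integration test file under tests/, for a binary target named
// `mycmd` (hypothetical), a test might then read:
//
// #[test]
// fn prints_its_arguments() {
//     let out = run_binary(env!("CARGO_BIN_EXE_mycmd"), &["hello"]);
//     assert!(out.status.success());
//     assert_eq!(stdout_of(&out), "hello\n");
//     assert!(out.stderr.is_empty());
// }
```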

Also FWIW I think Cargo is a good example of what “almost all the bells and whistles” would look like, but it’s pretty complicated as others have mentioned. Cargo does exec itself (via Command, behind many layers), though.


Executing binaries seems fine and I quite like the concept of testing it as a whole, not just some layer, but how would one test a binary that never finishes (or at least in a reasonable time), like (HTTP) servers? Just spawn a thread with a binary, do a request, then kill the spawned thread? Are there some crates that do something like that?
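One possible shape for this, purely as a sketch: spawn the binary as a child process (rather than a thread) and wrap the handle in a guard that kills it when the test ends, even if an assertion panics first. `ChildGuard` and `spawn_guarded` are names invented here, and the request step in the middle is left out because it depends on the server:

```rust
use std::io;
use std::process::{Child, Command};

/// Owns a running child process and kills it when dropped,
/// so a failing test doesn't leave the server running.
struct ChildGuard(Child);

impl Drop for ChildGuard {
    fn drop(&mut self) {
        // Errors are ignored: the process may already have exited.
        let _ = self.0.kill();
        // Reap the process so it doesn't linger as a zombie.
        let _ = self.0.wait();
    }
}

/// Start `program` in the background and return a guard for it.
fn spawn_guarded(program: &str, args: &[&str]) -> io::Result<ChildGuard> {
    Command::new(program).args(args).spawn().map(ChildGuard)
}
```

A test would call `spawn_guarded`, wait until the server accepts connections (for example by retrying a connect for a bounded time), make its requests, and let the guard fall out of scope.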