Soft question: testing process?

Although I have tried unit testing a few times in Rust, I have largely taken "if it compiles, it probably works" to the logical conclusion of "I probably don't need unit tests." Unfortunately, this technical debt is now starting to slow down development time.

For those proficient with unit testing in Rust: what is your process / "art" of unit testing?

EDIT: I just realized "unit testing" might mean something specific to some people, so I want to restate my intentions: advice on testing in general is fine too.

2 Likes

Generally when I write tests, I try to write one to three tests of ordinary behavior, and then additional tests for situations I think are likely to be incorrect in some way.

As for whether my tests count as unit tests or not, well, I don't really know. I try to put tests in the tests/ folder when I can, but sometimes you need to test something private, and so I put it inside the library itself.
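For concreteness, here is a minimal sketch of the two placements; the crate, function, and file names are invented for illustration, not taken from this thread:

// src/lib.rs -- a private helper can only be tested from inside the crate:
fn clamp_to_percent(value: i32) -> i32 {
    value.clamp(0, 100)
}

pub fn score(value: i32) -> i32 {
    clamp_to_percent(value)
}

#[cfg(test)]
mod tests {
    use super::*;

    // A unit test in the same file can see the private function.
    #[test]
    fn clamps_out_of_range_values() {
        assert_eq!(clamp_to_percent(-5), 0);
        assert_eq!(clamp_to_percent(150), 100);
    }
}

// tests/score.rs -- an integration test in the tests/ folder is compiled as a
// separate crate, so it can only call the public API:
//
//     #[test]
//     fn score_is_bounded() {
//         assert_eq!(my_crate::score(150), 100);
//     }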

This sounds like a lot of work. What ratio of test LOC to code LOC do you typically achieve?

In Tokio we have 40k lines of code and 18k lines of tests.

But it's important to point out that the "density" varies a lot depending on which part of the codebase you're looking at. For example, the core of the runtime has a lot of unsafe code, with a lot of tests (including loom tests) to accompany it. The tokio::io module has far fewer tests.
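For readers unfamiliar with loom: it re-runs a small concurrent scenario under every thread interleaving the model allows. A generic sketch of what such a test looks like (not one of Tokio's actual tests; the counter scenario is invented):

#[test]
fn concurrent_increments_are_not_lost() {
    loom::model(|| {
        use loom::sync::atomic::{AtomicUsize, Ordering};
        use loom::sync::Arc;
        use loom::thread;

        let counter = Arc::new(AtomicUsize::new(0));
        let counter2 = counter.clone();

        let handle = thread::spawn(move || {
            counter2.fetch_add(1, Ordering::SeqCst);
        });
        counter.fetch_add(1, Ordering::SeqCst);
        handle.join().unwrap();

        // This must hold in every interleaving loom explores.
        assert_eq!(counter.load(Ordering::SeqCst), 2);
    });
}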

1 Like

Do you maintain the same level of testing discipline in solo projects, where there is no external social pressure of "you can't commit this change without providing a test for it"?

I've never quite found a testing setup where testing felt productive rather than a burden. The closest would be TDD, which almost induces a video-game-like addiction to iteration cycles; but those cycles tend to break down outside of small examples.

No.

1 Like

The professional developers on my team tell me that it's not unusual to have 2-3x as many lines of test code for code going into production. They tell me that the effort really pays off by reducing errors from creeping in under maintenance. I am much more lax about testing for early prototypes and code that only I will touch. I also find test code useful when figuring out how to use someone else's program.

I see tests as very helpful: they're the tool which lets me know that the code does what I intend.

I won't be executing or using every feature of my library when using it from a program, so it's hard to ensure that the features keep working. This is even more true for corner cases which might never be exercised by hand. Test cases let me automate this: I code the scenario once and I know it will be checked for me on every cargo test.

It is a lot of work to write tests. It's not at all unusual to have as much test code as normal code.

As you mention, there can be a bit of an art to it. For example, if you write unit tests for internal functions, you'll likely have to change the tests when you change these functions. This can be a huge maintenance burden which slows everything down. If you instead write tests at a higher level, then internal refactoring should not affect the tests. There is a judgement call there: perhaps your internal helpers are tricky and it's hard to tell if they're correct from the high level? Then they probably need their own battery of tests.

I once worked on the Mercurial version control system and there we almost exclusively relied on very high-level black box testing where we would execute the whole binary and look at things like hg log. Because of the cryptographic hashes used throughout, we could be quite sure that things worked correctly when we saw the expected hash in the commit log.

1 Like

This is precisely what I am suffering from. These tests -- are they a byproduct of TDD, or are they intentional tests?

I agree with you that tests are a net benefit. The problem is that their cost is obvious to present me, but their benefits (though tremendous) are not easy to measure. In particular, imagine a test catches a trivial refactoring bug that would otherwise have taken hours or days to debug down the line; the 'savings' the test provides are not so easy to measure.

This is definitely one area where I start losing tests. Another is when I have to mock objects -- that just feels silly: to have the actual code, and then 'fake code' alongside it.

Is this documented somewhere? I would be very interested in reading more about this.

Yes, the overall process is described in the Mercurial wiki. You can browse the tests in the repository if you like; look for the .t files.

The approach is really nice for command line tools, so people extracted the test runner. There is a Python version (Cram: It's test time) and I also wrote a version in Go (GitHub - mgeisler/cram: Go port of Cram).

For me, it's mostly a matter of convenience: sometimes it's too complicated, expensive, or time-consuming to set up the real thing. A mock can then be a great alternative. However, if I can run the real thing, then I normally prefer doing that.

1 Like

They tell me that they use TDD (when they use it) to define the happy path and specific tests for each unhappy case they can think of. It is a lot of work, but it pays off by keeping bugs out of production.
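As a concrete illustration of that shape (an invented example, not their actual code): one happy-path test, then a test for each unhappy case you can think of.

fn parse_port(s: &str) -> Result<u16, String> {
    let n: u32 = s.trim().parse().map_err(|_| format!("not a number: {s:?}"))?;
    if n == 0 || n > 65535 {
        return Err(format!("port out of range: {n}"));
    }
    Ok(n as u16)
}

#[cfg(test)]
mod tests {
    use super::*;

    // Happy path first.
    #[test]
    fn accepts_a_normal_port() {
        assert_eq!(parse_port("8080"), Ok(8080));
    }

    // Then the unhappy cases we could think of.
    #[test]
    fn rejects_non_numbers() {
        assert!(parse_port("eighty").is_err());
    }

    #[test]
    fn rejects_zero_and_out_of_range() {
        assert!(parse_port("0").is_err());
        assert!(parse_port("70000").is_err());
    }
}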

1 Like

I wouldn't pretend to be proficient, but I have written a couple of unit tests which aim to compare one implementation with another (simpler) implementation where the code is especially tricky (lots of cases). I would say it takes more thought and effort to think up useful tests than to write the code that is being tested!

Here is an example: I check that two implementations of a storage interface give the same results.
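Something along these lines -- a hypothetical sketch, not the actual code; the Store trait and both implementations are stand-ins invented for illustration:

use std::collections::BTreeMap;

trait Store {
    fn insert(&mut self, key: u32, value: String);
    fn get(&self, key: u32) -> Option<&String>;
}

// The simple, obviously-correct reference implementation.
struct MapStore(BTreeMap<u32, String>);

impl Store for MapStore {
    fn insert(&mut self, key: u32, value: String) {
        self.0.insert(key, value);
    }
    fn get(&self, key: u32) -> Option<&String> {
        self.0.get(&key)
    }
}

// Imagine FancyStore is the tricky implementation under test (paging,
// caching, etc.); here it is just another map so the sketch compiles.
struct FancyStore(BTreeMap<u32, String>);

impl Store for FancyStore {
    fn insert(&mut self, key: u32, value: String) {
        self.0.insert(key, value);
    }
    fn get(&self, key: u32) -> Option<&String> {
        self.0.get(&key)
    }
}

#[test]
fn implementations_agree() {
    let mut simple = MapStore(BTreeMap::new());
    let mut fancy = FancyStore(BTreeMap::new());

    // Drive both implementations with the same sequence of operations and
    // check that every observation matches.
    for i in 0..1000u32 {
        let key = i % 13;
        let value = format!("value-{}", i % 7);
        simple.insert(key, value.clone());
        fancy.insert(key, value);
        assert_eq!(simple.get(key), fancy.get(key));
    }
}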

Another idea is to compare the same implementation with itself, but using different parameters, e.g.
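A hypothetical, self-contained illustration of that idea (the function and the chunk-size parameter are invented): the answer must not depend on the tuning parameter chosen.

fn sum_chunked(data: &[u64], chunk_size: usize) -> u64 {
    data.chunks(chunk_size).map(|c| c.iter().sum::<u64>()).sum()
}

#[test]
fn chunk_size_does_not_change_the_sum() {
    let data: Vec<u64> = (0..1000).collect();
    // Same implementation, two very different parameters, same answer.
    assert_eq!(sum_chunked(&data, 1), sum_chunked(&data, 64));
}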

Unit test? The meaning of that rather depends on what you mean by "unit".

In my experience of avionic and military systems, a unit was typically the smallest part of a project that could be compiled by itself -- say, a C source file. Although we would not have dreamed of using C back in the day; more likely Ada. Anyway, such a unit/module/whatever could be built against a test harness and tested in isolation from the rest of the project.

Such unit tests were thorough. They required 100% code coverage. That is to say every statement/expression had to be tested at least once.

Then the testing of the logic went into minute detail. For example if one had something like:

if x >= y {
   bla();
}

One would exercise that condition for x equal to y, for x less than y by 1 and by more than 1, for x greater than y by 1 and by more than 1, and for both x and y at the minimum and maximum of their ranges in all combinations....

The idea being to check that the programmer had not accidentally used just ">" instead of ">=", or whatever the specification demanded. Oh, did I mention: all such units had a design specification and a test specification, and the specs, code and tests were all written and reviewed by different people. When writing tests one did not get to see the source code, only the specs.
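Translated into Rust test form, that boundary-value style might look like this minimal sketch (the function and its limits are invented for illustration):

fn ready(x: i32, y: i32) -> bool {
    x >= y
}

#[test]
fn boundary_values_around_the_comparison() {
    // Exactly equal: >= must include this case (catches an accidental >).
    assert!(ready(5, 5));
    // One above, and well above.
    assert!(ready(6, 5));
    assert!(ready(100, 5));
    // One below, and well below: must not trigger.
    assert!(!ready(4, 5));
    assert!(!ready(-100, 5));
    // Extremes of the types' ranges.
    assert!(ready(i32::MAX, i32::MIN));
    assert!(!ready(i32::MIN, i32::MAX));
}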

In that kind of environment unit tests can easily be far bigger than the source being tested.

Then there were the integration tests...

Typically we don't want to do all that. Often though, when I'm working on a "fiddly" part of some program I'm developing, I will develop and build it as a stand-alone program that I can write simple tests for. When it looks like it's working correctly and I can't break it with tests, it gets used in the main project, as a library or just cut and pasted in :slight_smile: In the Rust world it can be its own crate, I guess.

1 Like

Here is a recent example of insufficient testing (article date is 5 June 2020).

Yeah, I think there's an important discussion here about which level you're testing at. In Tokio, the majority of our tests operate on our public API directly, and internal functions and data-structures only have tests when there's something non-obvious going on, or when the test is something we added together with a bugfix in that internal code (we try to always add a test for the bug when we fix a bug).

I think there's also an important discussion about mocks. I generally try to avoid mocking as much as I possibly can. I would much rather just use the real thing and create files in temporary directories than mock out the file system. Similarly for TCP streams and such. The file system is plenty fast for our tests and it is not the bottleneck.
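A minimal sketch of that "use the real file system" approach, assuming the tempfile crate as a dev-dependency (the crate choice and the file contents are my own illustration, not something the post prescribes):

use std::fs;

#[test]
fn config_round_trips_through_a_real_file() -> std::io::Result<()> {
    // Everything under this directory is removed when `dir` is dropped.
    let dir = tempfile::tempdir()?;
    let path = dir.path().join("config.toml");

    fs::write(&path, "retries = 3\n")?;
    let contents = fs::read_to_string(&path)?;

    assert!(contents.contains("retries = 3"));
    Ok(())
}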

2 Likes

Warning, unusually strong opinions ahead :slight_smile:

I think testing is a bit of an art, in the sense that my experience tells me the overall state of knowledge about testing in the industry is confused. That is, there's a lot of very confident advice which turns out to be wrong or self-contradictory. As (one piece of) evidence of the confusion, the "Software Engineering at Google" book describes a Google-wide shift of opinion (oversimplifying) from "mocks are good" to "mocks are bad". I.e., even a Google-size/quality company can get testing methodology wrong.

There are a lot of ways specific testing approaches can go wrong and become a drag on productivity:

  • tests can ossify code, making even simple refactors 10x costlier, which over time leads to compound ossification.
  • tests can be brittle, requiring multiple tries to merge any, even trivial, bit of work, encouraging people to submit larger changes as a unit
  • tests can be slow -- running a test suite on a 12-core machine can take half an hour, with an average CPU utilization of half a core, because of pervasive timeouts and sleep-based synchronization.
  • tests can create direct burden -- it can be the case that testing a simple-sounding case like "when X happens in condition Y, the result is Z" requires hundreds of lines of boilerplate setup, which completely obscure X, Y, and Z.
  • testing can be very ineffective at uncovering defects -- you might have a thoroughly tested production database, and then aphyr (https://jepsen.io/) will come up with an utterly trivial setup which breaks any consistency guarantees you purportedly provide.

If you dodge all the pitfalls though, then yeah, testing becomes transformative, to the point where you don't care about the actual code at all, as it becomes trivial to change, refactor or throw away.

I've written some of the positive testing advice in How to Test.

1 Like

This one is certainly quite important. Tokio does have a feature for mocking out time for this reason.
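A minimal sketch of what that looks like, assuming tokio's "test-util" and "macros" features are enabled (the scenario itself is invented):

#[tokio::test(start_paused = true)]
async fn timeout_fires_without_real_waiting() {
    use std::time::Duration;
    use tokio::time::{sleep, timeout};

    // With the clock paused, the runtime auto-advances time whenever it is
    // otherwise idle, so this test finishes in microseconds of wall-clock time.
    let result = timeout(Duration::from_secs(5), sleep(Duration::from_secs(10))).await;
    assert!(result.is_err(), "the 10s sleep should exceed the 5s timeout");
}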

1 Like

My wife was a programmer at IBM. Each project she worked on had a person dedicated to testing. Some people were good at it, some not so much. Those who were good at testing were highly sought after. Perhaps surprisingly, they were also well regarded by the company.

1 Like