Unit testing approach. Forced to pick the lesser of two evils?

I am writing a simple queue interface backed by a vector. For reference, the implementation looks like this:

Code
pub struct ArrayQueue<T> {
    storage: Vec<Option<T>>,
    head: usize,
    size: usize,
}

impl<T> ArrayQueue<T> {
    pub fn initialize() -> Self {
        Self { storage: vec![None], head: 0, size: 0 }
    }

    pub fn size(&self) -> usize {
        self.size
    }

    pub fn add(&mut self, x: T) {
        if self.is_full() {
            self.grow(self.storage.len());
        }
        let index = self.storage_index(self.size);
        self.storage[index] = Some(x);
        self.size += 1;
    }

    pub fn remove(&mut self) -> Option<T> {
        if self.is_empty() {
            return None;
        }
        let element = std::mem::take(&mut self.storage[self.head]);
        self.head = self.storage_index(1);
        self.size -= 1;
        if self.is_too_large() {
            self.shrink(self.storage.len() / 2);
        }
        element
    }
}
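
The private helpers it relies on (is_empty, is_full, is_too_large, storage_index, grow, shrink) are omitted above; they treat storage as a circular buffer and look roughly like this (a sketch of their assumed behaviour, not the exact code):

// Sketch only: the real private helpers may differ in details such as the
// shrink threshold. `resize` is just a convenience used by this sketch.
impl<T> ArrayQueue<T> {
    fn is_empty(&self) -> bool {
        self.size == 0
    }

    fn is_full(&self) -> bool {
        self.size == self.storage.len()
    }

    // Heuristic: the backing vector is "too large" once it holds far more
    // slots than elements (the exact threshold is an implementation choice).
    fn is_too_large(&self) -> bool {
        self.size > 0 && self.storage.len() >= 3 * self.size
    }

    // Map a logical offset from the head to a physical index in `storage`.
    fn storage_index(&self, offset: usize) -> usize {
        (self.head + offset) % self.storage.len()
    }

    // Grow the backing vector by `additional` slots, re-packing elements so
    // that the head moves back to index 0.
    fn grow(&mut self, additional: usize) {
        self.resize(self.storage.len() + additional);
    }

    // Shrink the backing vector to `capacity` slots (capacity >= size).
    fn shrink(&mut self, capacity: usize) {
        self.resize(capacity);
    }

    fn resize(&mut self, capacity: usize) {
        let mut new_storage: Vec<Option<T>> = (0..capacity).map(|_| None).collect();
        for i in 0..self.size {
            let from = self.storage_index(i);
            new_storage[i] = self.storage[from].take();
        }
        self.storage = new_storage;
        self.head = 0;
    }
}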

I need advice about how to approach writing unit tests in the "best practices" way. I am aware of two common pieces of advice for writing tests:

  1. Don't test implementation details.
  2. Tests should fail for one and only one reason.

I often find these two pieces of advice in conflict. Say I want to test the remove method. To comply with 1, I first need to build a queue with add before I can start testing remove, and possibly even use iter to check that the queue is correct. Now my test violates 2, because I'm using add and iter to test remove, and they may fail because of their own bugs.

A similar argument applies to complying with 2. If I want my tests to fail for only one reason, I need to build a suitable initial state for my ArrayQueue before testing remove, so to avoid using add I have to write implementation details into my test, such as an initial value for storage like vec![Some(2), None, Some(0), Some(1)]. That will force me to rewrite the tests if I change the implementation in the future.

In short, I need advice about how to approach unit testing, and in particular how to comply with 1 and 2 at the same time, if that is even possible. Needless to say, I'm by no means a seasoned software engineer. Any advice will be much appreciated.

I just wouldn't worry about it too much. Make tests as small as possible, but not smaller. It's not possible to call only one function per test case. You can call add and remove in a single unit test.

If you really want to separate the tests, you can have test_add that only calls add, and test_remove that calls both, and then if test_add passes and test_remove fails, you can guess that the bug is probably in remove rather than in add.
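
For instance (a sketch, assuming the ArrayQueue from the original post):

#[test]
fn test_add() {
    // Only exercises add (plus size, which is needed to observe the result).
    let mut queue = ArrayQueue::initialize();
    queue.add(1);
    queue.add(2);
    assert_eq!(2, queue.size());
}

#[test]
fn test_remove() {
    // Relies on add as well; if test_add passes and this fails, the bug is
    // probably in remove.
    let mut queue = ArrayQueue::initialize();
    queue.add(1);
    queue.add(2);
    assert_eq!(Some(1), queue.remove());
    assert_eq!(Some(2), queue.remove());
    assert!(queue.remove().is_none());
}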

4 Likes

in particular how to comply with 1 and 2 at the same time

You don't "comply" with advice. Advice is a suggestion. If a given piece of advice is difficult to apply, set it aside for the time being. If it's applicable, it might be worth following the advice as given, but it's still a good idea to understand why, rather than following it blindly.

For types like this, whose side effects are constrained to self, I tend to start from a property-centric perspective:

  • An empty queue has a size of zero.
  • Removing from an empty queue returns None.
  • Items added to the queue should be removed in order until None is returned.
  • Removals should be in order even if calls to add and remove are interleaved with each other.
  • The size of the queue at all times should be no less than zero.
  • The size of the queue at all times should be exactly equal to the number of add calls minus the number of remove calls (ignoring the behaviour if add is called usize::MAX times without a remove).

Angsting over which specific set of methods you call in your tests to check these properties is probably not a good use of your time. The purpose of tests is to exercise your program, demonstrate the presence of bugs, and ideally to make it relatively easy to localize and fix those bugs, not to be intellectually or philosophically pure.

To put this into practice, here's one test I might write:

#[test]
fn new_queues_are_empty() {
    let mut queue = ArrayQueue::<()>::initialize();

    assert_eq!(0, queue.size());
    assert!(queue.remove().is_none());
}

Can this fail for more than one reason? Absolutely. Are you going to struggle to figure out where the bug is, if it does fail? Probably not - there are only three places it could be (initialize(), size(), or remove()), and the specific assertion that is violated will probably narrow it down enough to get you pointed towards the bug.

Here's a second, more complex example:

#[test]
fn queue_in_order() {
    let mut queue = ArrayQueue::initialize();

    queue.add(3);
    queue.add(5);
    queue.add(8);

    assert_eq!(Some(3), queue.remove());
    assert_eq!(Some(5), queue.remove());
    assert_eq!(Some(8), queue.remove());
    assert!(queue.remove().is_none());
}

Can this test fail for multiple reasons? Also yes. Will it be difficult for you to debug it?
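
And to cover the interleaving and size properties from the list above, a third sketch might look like this:

#[test]
fn interleaved_adds_and_removes_stay_in_order() {
    let mut queue = ArrayQueue::initialize();

    queue.add(1);
    queue.add(2);
    assert_eq!(2, queue.size());

    // Interleave a removal with further additions; order and size must
    // still be consistent.
    assert_eq!(Some(1), queue.remove());
    queue.add(3);
    assert_eq!(2, queue.size());

    assert_eq!(Some(2), queue.remove());
    assert_eq!(Some(3), queue.remove());
    assert_eq!(0, queue.size());
    assert!(queue.remove().is_none());
}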

5 Likes

There’s a lot of overly rigid testing advice out there, or perhaps this one may have lost some nuance along the way. How about this instead: "There should be one and only one reason for which a test is intended to fail." Nearly every test can be broken by some bug somewhere in the transitive dependencies, after all, and that should not cause us to conclude that nothing can be properly tested.

8 Likes

Like others above, I'm of the opinion that those are more guidelines than strict rules. It would be a shame to ignore implementation details when you know that some key values matter: e.g. boundaries where the code changes how it operates, or specific values that produced erroneous behaviour in the field (those you definitely want to keep in your tests after fixing the bug, to catch regressions).
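
For example, with the ArrayQueue from the original post, the point where the backing vector has to grow (and the head wraps around) is exactly such a boundary. A test aimed at it might look like this; it only uses the public API, since exactly when growth happens depends on the private helpers:

#[test]
fn queue_survives_growth_and_wraparound() {
    // Push enough elements to force the backing storage to grow at least
    // once; FIFO order must hold regardless of how growth is implemented.
    let mut queue = ArrayQueue::initialize();

    // Advance the head first so a later grow also has to handle a
    // non-zero head position.
    queue.add(1);
    assert_eq!(Some(1), queue.remove());

    for i in 2..=10 {
        queue.add(i);
    }
    for i in 2..=10 {
        assert_eq!(Some(i), queue.remove());
    }
    assert!(queue.remove().is_none());
}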

If you're curious, I encourage you to check out Test-Driven Development. In a nutshell, that methodology puts the tests first: you write a test to validate one goal of your software, then you write the code to make the test pass, and you iterate through manageable development steps, writing down additional goals or concerns as new test items and refactoring the code when necessary. The question of what to put in the unit tests becomes natural and drives your development quite efficiently, even if the idea is shocking at first for people who code from specs before they write the tests. I was in the sceptical camp until I tried it; since then I've stuck with TDD and haven't looked back.

3 Likes

I would go even beyond that. Good advice comes with a rationale so you can tell when it becomes bad advice.

If you only have the advice and no idea why it's good advice… treat it with a huge grain of salt, because you have no idea when that advice is applicable and when it isn't.

Tests don't exist to make some abstract god happy, or even to make your manager happy… they exist to reduce the number of bugs in production code. Who cares if your tests fail for one reason or ten? You do. And why do you care? Because it's easier to find and fix a bug if there is a 1-to-1 correspondence between test and bug. But does that mean we should contort the production code to achieve that? Heck, no. We don't look at tests all that often, but we deal with the API we are creating every day.

That means that if we can't have a 1-to-1 mapping between tests and potentially failing code, it's better to build a ladder: tests for foo work in isolation, tests for bar rely on foo being correct, and so on.

That means a bug in one line may trigger failures in hundreds of thousands of tests (I'm not exaggerating, I've seen it with my own eyes), but if you can look at a sorted list of failures and quickly find which line is faulty… mission accomplished.

Advice is just advice, and good advice comes with a rationale… don't forget that.

5 Likes

I've always liked this article for testing:

Your overall testing strategy should feel like “I get that computer to do so much work for me!”, not ”My RSI and bleeding fingers have hopefully appeased the testing gods and atoned for my previous omissions”.

~ Test smarter, not harder - lukeplant.me.uk

3 Likes

Thank you guys, great community as always. I see a very clear consensus and I will take your advice (and its rationale).

With testing, there's a lot of debate about testing private vs. public interfaces, and so on. But the way I see it, tests are for finding bugs and preventing regressions. When deciding what to test, my primary focus is on exercising whatever code is most likely to be buggy and easiest to break. If some implementation detail is fragile, I test the implementation detail anyway.

Tests should fail for one and only one reason.

I wouldn't worry about it in such a strict sense. IMHO tests that may fail for multiple reasons are great, because you're getting more stuff tested. You only need to avoid the extreme where you try to test everything at once and end up testing fewer things overall (e.g. because it's not possible to exercise both the successful and the failing code paths in the same test). Tests that test too much can be mildly annoying when they catch a regression, since you have to debug to find the root cause, but then you can add another test just for the cause you found.

2 Likes