What Algorithms should you test and how?

April 5, 2005 Roy Osherove

[Update: I've updated the description here to make it clearer. Some of the comments indicate that I did not explain myself as well as I should have. Perhaps this is better.]

Math algorithms are problematic. The biggest problem with testing them is that you either end up reproducing the calculation in your test, or you test inputs and outputs blindly. If at all viable, the latter is the more reasonable choice, for several reasons outlined below.

Assuming this method:

public int Add(int a, int b)
{
    return a + b;
}

You might be inclined to write such a test:

[Test]
public void Add()
{
    int result = Add(1, 2);
    Assert.AreEqual(1 + 2, result);
}

As you can see, there is some duplication here: the same calculation occurs in both the production code and the test code. Seeing this kind of duplication raises a number of red flags:

- How do you know your calculation does not have a bug?

- If the only thing in the method is the calculation, and you're repeating it in your tests, are you actually testing anything?

The only reason to test such a method would be if you had an IF or a SWITCH somewhere. Otherwise, there's no logic to test. Unit tests are more about logic than anything. If there's no logic, perhaps they shouldn't be there. If there's no logic but you're still testing something, perhaps what you're really doing is "integration testing". Such a test might call the math function under heavy load, or from multiple threads, and so on. I'll talk more about what I think makes for an integration test in a later post.

Of course, this might seem better:

[Test]
public void Add()
{
    int result = Add(1, 2);
    Assert.AreEqual(3, result);
}

Our result is known, so we can at least "smoke test" the algorithm.

As some of you have mentioned in the comments, there's still the pesky matter of edge cases and unvalidated input.

There's always room for edge case testing, I agree. In fact, you'd be a fool not to ask yourself what happens when you get two int.MaxValue values. Since this entry does not deal with edge cases, I'll just declare that yes, you would most definitely test edge cases. That's because edge cases have a particular requirement: they need business logic, or else they wouldn't be edge cases. If I wanted the method to just blow up on int.MaxValue, I'd just let it do that, and let the exception bubble up. Since there seems to be a staggering requirement for handling such an edge case, there's obviously some sort of non-math logic in the works here that needs to be written.

Here's a short example:

[Test]
[ExpectedException(typeof(CalcException), "Sum resulted in an overflow and is illegal")]
public void Add()
{
    // The call itself is the test: it must throw CalcException.
    Add(int.MaxValue, 1);
}

Now you're talking. There HAS to be some logic inside the method to make this work. That means it's no longer just a math algorithm. It's a math algorithm hidden underneath a business rule, and that is exactly what you would write these kinds of tests for.

The solution might look like:

public int Add(int a, int b)
{
    if (a >= int.MaxValue)
    {
        throw new CalcException("Sum resulted in an overflow and is illegal");
    }
    return a + b;
}

Now you'd have to write a new test to send b as a max value, but you get the idea. It's no longer a pure math function. You no longer test inputs and outputs; you test logic.

In fact, you could refactor it to show this distinction more clearly:

public int Add(int a, int b)
{
    // Business rule: reject input that would overflow.
    if (a >= int.MaxValue)
    {
        throw new CalcException("Sum resulted in an overflow and is illegal");
    }
    return sumNumbers(a, b);
}

// Pure math: no logic to unit test here.
private int sumNumbers(int a, int b)
{
    return a + b;
}
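The check above only looks at a, so a call such as Add(1, int.MaxValue) would slip past it. As a rough sketch of one way to cover both operands (not necessarily what the original code evolved into), C#'s checked operator can do the detection. The CalcException class here is a hypothetical stand-in, since the post never shows its definition, and OverflowSafeCalculator is a name invented for this example.

```csharp
using System;

// Hypothetical stand-in: the post uses CalcException but never defines it.
public class CalcException : Exception
{
    public CalcException(string message, Exception inner) : base(message, inner) { }
}

// Hypothetical class name, used only for this sketch.
public static class OverflowSafeCalculator
{
    public static int Add(int a, int b)
    {
        try
        {
            // checked makes int overflow throw OverflowException
            // instead of silently wrapping around.
            return checked(a + b);
        }
        catch (OverflowException ex)
        {
            // Translate the runtime error into the business-rule exception.
            throw new CalcException("Sum resulted in an overflow and is illegal", ex);
        }
    }
}
```

Note that this sketch behaves differently from the a >= int.MaxValue check: Add(int.MaxValue, 0) returns int.MaxValue here rather than throwing, because nothing actually overflows. Which behavior is right depends on the business rule, and that is exactly the kind of decision a unit test would pin down.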

For pure math functions:

The only other option is to call such algorithms with a known set of inputs and outputs.

That's usually more of a data-driven test kind of thing where you might have thousands of rows of input and expected output with many edge cases for the algorithm to go through.

That means you're assuming the expected outputs are known in advance and can be calculated beforehand (perhaps using some external tool such as MATLAB). If they are known in advance, what exactly are you testing? You are testing for regression: if at any point you want to refactor or optimize your algorithm code, you still want to know that all outputs stay the same.
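A data-driven regression check of this kind might look roughly like the sketch below. It is framework-agnostic on purpose; the AddRegressionTable class and its handful of rows are made up for illustration, and in practice the rows would more likely be loaded from a file of precomputed values than written inline.

```csharp
using System;

// Hypothetical helper, invented for this sketch.
public static class AddRegressionTable
{
    // Each row is { a, b, expected }. These few rows are illustrative only;
    // a real table could hold thousands of precomputed rows.
    private static readonly int[][] Rows =
    {
        new[] { 1, 2, 3 },
        new[] { 0, 0, 0 },
        new[] { -5, 5, 0 },
        new[] { int.MaxValue - 1, 1, int.MaxValue },
        new[] { int.MinValue + 1, -1, int.MinValue },
    };

    // Runs the supplied implementation against every row and
    // returns how many rows did not match the expected output.
    public static int CountMismatches(Func<int, int, int> add)
    {
        int mismatches = 0;
        foreach (int[] row in Rows)
        {
            int actual = add(row[0], row[1]);
            if (actual != row[2])
            {
                Console.WriteLine(
                    "Add({0}, {1}) returned {2}, expected {3}",
                    row[0], row[1], actual, row[2]);
                mismatches++;
            }
        }
        return mismatches;
    }
}
```

A regression run then comes down to asserting that CountMismatches returns zero for the current implementation, before and after any refactoring or optimization.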

To my mind, such math problems are usually not very fit for TDD and incremental creation ("make this pass as simply as possible" mentality) because the algorithm and execution are pre-known.

However, the second you think about performing an 'IF' clause, you'll need a unit test.

This brings up an interesting question: "how would you test sorting algorithms?". Surely there are lots of IFs and SWITCHes there. But would you test for an outcome or for logic trees inside the method? I'd go with testing the expected outcome (a sorted something) by invoking many, many tests with various inputs and expected outcomes. It's not math, and it can be done with TDD. You might even find your algorithm is simpler than you thought.
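As a small sketch of what outcome-based testing of a sort could look like: the Sort method below is a plain insertion sort standing in for whatever implementation is under test, and the SortOutcomeChecks class and its cases are assumptions made for this example. The checks never look at the branches inside Sort, only at what comes out.

```csharp
using System;
using System.Linq;

// Hypothetical class, invented for this sketch.
public static class SortOutcomeChecks
{
    // Stand-in implementation under test: a simple insertion sort
    // that returns a sorted copy and leaves the input untouched.
    public static int[] Sort(int[] input)
    {
        int[] result = (int[])input.Clone();
        for (int i = 1; i < result.Length; i++)
        {
            int current = result[i];
            int j = i - 1;
            // Shift larger elements one slot to the right.
            while (j >= 0 && result[j] > current)
            {
                result[j + 1] = result[j];
                j--;
            }
            result[j + 1] = current;
        }
        return result;
    }

    // True when sorting the input yields exactly the expected sequence.
    public static bool ProducesExpected(int[] input, int[] expected)
    {
        return Sort(input).SequenceEqual(expected);
    }

    // Various inputs and expected outcomes: empty, single element,
    // already sorted, reversed, duplicates, and negative numbers.
    public static bool AllCasesPass()
    {
        return ProducesExpected(new int[0], new int[0])
            && ProducesExpected(new[] { 7 }, new[] { 7 })
            && ProducesExpected(new[] { 1, 2, 3 }, new[] { 1, 2, 3 })
            && ProducesExpected(new[] { 3, 2, 1 }, new[] { 1, 2, 3 })
            && ProducesExpected(new[] { 2, 1, 2, 1 }, new[] { 1, 1, 2, 2 })
            && ProducesExpected(new[] { 0, -4, 9, -1 }, new[] { -4, -1, 0, 9 });
    }
}
```

Because only the outcome is checked, the insertion sort could be swapped for a different algorithm without touching any of the cases.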