Mutation Tests – 100% code coverage is not enough

Unit tests are written by programmers for the sole purpose of validating and ensuring that the production code does exactly what the programmer intended. Without unit tests, we may as well be guessing. This blog is specifically about unit tests, and how to get a little extra out of them.

Code coverage

Code coverage is an important metric that tells us how much of the production code is exercised by tests. It gives you a quick picture of where the untested areas in a project are and where potential bugs could be hiding.

100% code coverage is the utopia we strive towards, the idea that every line of code in your application has been covered by a test.

And theoretically, if you’re starting a new project and you practice TDD, you should obtain the elusive 100% coverage badge, or at least get close to it.

TDD (Test-driven development)

Test-driven development is a pretty simple idea: you write your tests before you write any code, and you write just enough code to make the failing test pass. It boils down to three rules:

1. You must write a failing test before you write any production code
2. You must not write more of a test than is sufficient to fail, or fail to compile
3. You must not write more production code than is sufficient to make the currently failing test pass

I’ve written in more detail about TDD in Coding Chess with TDD: https://codeheir.com/2021/02/13/coding-chess-with-tdd/

Bad tests

Okay, you’ve written all your code using TDD, you have 100% code coverage, so your application is bullet-proof? Not quite. Let’s assume we live in this Happy Valley world where your application doesn’t have any bugs. Now let’s imagine another programmer comes in, changes some existing code, the tests pass, but they’ve introduced a bug.

The programmer felt safe. 100% code coverage, safe. They know that whatever line of code they change, there’s a test that covers that line. So if they change a line of code and the tests still pass then chances are everything’s still working as expected.

Unless the tests are poorly written. Enter mutation testing.
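
To make “poorly written” concrete, here’s a minimal sketch. The discount rule and the names CoverageDemo, Total, BuggyTotal and TestsPass are all made up for illustration; they aren’t from any real project. The tests touch every line and both branches, yet a one-character bug slips straight through:

```csharp
using System;

public static class CoverageDemo
{
    // Hypothetical production code: orders of 100 or more get 10% off.
    public static decimal Total(decimal amount)
    {
        var discount = amount >= 100m ? 0.10m : 0m;
        return amount - amount * discount;
    }

    // The same code after a careless edit: >= became >.
    public static decimal BuggyTotal(decimal amount)
    {
        var discount = amount > 100m ? 0.10m : 0m;
        return amount - amount * discount;
    }

    // A test suite that executes every line and both branches,
    // but never checks the boundary value of exactly 100.
    public static bool TestsPass(Func<decimal, decimal> total)
    {
        return total(200m) == 180m && total(50m) == 50m;
    }

    public static void Main()
    {
        Console.WriteLine(TestsPass(Total));                  // True
        Console.WriteLine(TestsPass(BuggyTotal));             // True: the bug goes unnoticed
        Console.WriteLine(Total(100m) == BuggyTotal(100m));   // False: behaviour did change
    }
}
```

Coverage is identical for both versions, which is exactly why the second programmer felt safe.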

What is mutation testing?

Mutation testing is a technique in which a tool makes small modifications to your production code and then runs the tests. If the tests still pass after a mutation has taken place, chances are your tests aren’t sufficient.

In mutation testing, the injected changes to your production code are called mutants, and each one acts as a deliberate bug. If a test fails, the mutant is killed; if the tests still pass, the mutant has survived. Tools such as Stryker show you a handy percentage of mutants killed: the more mutants killed, the better your tests.

Here’s a quick example of what a mutation testing framework might do. Let’s say you have the following function:

public bool IsUserOldEnough(User user) {
  return user.age >= 18;
}

The production code would get altered in the following ways:

/* 1 */ return user.age > 18;
/* 2 */ return user.age < 18;
/* 3 */ return false;
/* 4 */ return true;

For each of the above mutants, the tests are run. If the tests still pass with a mutant in place, you have a problem.
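
The idea can be sketched without any framework at all. This is a hand-rolled illustration, not how Stryker is implemented: the names MutationDemo, IsKilled and CountSurvivors are hypothetical, ages are passed as plain ints rather than a user object, and a test suite is modelled as a list of (age, expected) pairs.

```csharp
using System;

public static class MutationDemo
{
    // The four mutants of `age >= 18` listed above, in the same order.
    public static readonly Func<int, bool>[] Mutants =
    {
        age => age > 18,
        age => age < 18,
        age => false,
        age => true,
    };

    // A mutant is killed when at least one test disagrees with it.
    public static bool IsKilled(Func<int, bool> mutant, (int Age, bool Expected)[] tests)
    {
        foreach (var (age, expected) in tests)
        {
            if (mutant(age) != expected)
            {
                return true; // a test failed, so the mutant was detected
            }
        }
        return false; // every test passed, so the mutant survived
    }

    public static int CountSurvivors((int Age, bool Expected)[] tests)
    {
        var survivors = 0;
        foreach (var mutant in Mutants)
        {
            if (!IsKilled(mutant, tests))
            {
                survivors++;
            }
        }
        return survivors;
    }

    public static void Main()
    {
        var weak = new[] { (Age: 30, Expected: true), (Age: 10, Expected: false) };
        var strong = new[] { (Age: 30, Expected: true), (Age: 10, Expected: false), (Age: 18, Expected: true) };

        Console.WriteLine(CountSurvivors(weak));   // 1: the `> 18` mutant survives
        Console.WriteLine(CountSurvivors(strong)); // 0: the boundary test kills it
    }
}
```

The weak suite would give 100% coverage of the one-line function, but only the test at exactly 18 can tell >= apart from >.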

Introducing Stryker

Stryker is an open-source mutation testing framework with support for JavaScript, .NET and Scala. For the following example I’m using C#, but the same premise applies regardless.

First of all, let me set the scene. I’m writing an application, a small part of which requires varying degrees of string manipulation. I’ve decided to make my own StringExtensions class.

    using System;
    using System.Globalization;

    public static class StringExtensions
    {
        public static string Reverse(this string s)
        {
            var charArray = s.ToCharArray();
            Array.Reverse(charArray);
            return new string(charArray);
        }

        public static bool Any(this string s, Func<char, bool> predicate)
        {
            if (string.IsNullOrEmpty(s))
            {
                return false;
            }

            foreach (var c in s)
            {
                if (predicate(c))
                {
                    return true;
                }
            }
            return false;
        }

        public static string ToCamelCase(this string s)
        {
            if (string.IsNullOrEmpty(s))
            {
                return s;
            }

            if (!char.IsUpper(s[0]))
            {
                return s;
            }

            var camelCase = char.ToLower(s[0], CultureInfo.InvariantCulture).ToString(CultureInfo.InvariantCulture);
            if (s.Length > 1)
            {
                camelCase += s.Substring(1);
            }

            return camelCase;
        }
    }

And, of course, I’ve covered this with tests:

        [TestCase("hh", "hh")]
        public void Should_Reverse_String(string input, string expected)
        {
            input.Reverse().Should().Be(expected);
        }
        
        [TestCase("camel", "camel")]
        [TestCase("", "")]
        [TestCase("D", "d")]
        [TestCase("Dog", "dog")]
        public void Should_Convert_To_Camel_Case(string input, string expected)
        {
            input.ToCamelCase().Should().Be(expected);
        }
        
        [Test]
        public void Any_Should_Return_True_When_Checking_For_A_Character()
        {
            var input = "one";
            input.Any((c) => c == 'o').Should().BeTrue();
        }
        
        [Test]
        public void Any_Should_Return_False_When_No_Value()
        {
            var input = "";
            input.Any((c) => c == 'o').Should().BeFalse();
        }
        
        [Test]
        public void Any_Should_Return_False_When_No_Value_Matches()
        {
            var input = "fsda";
            input.Any((c) => c == 'o').Should().BeFalse();
        }

And you better believe I have 100% code coverage:

Now, for the purposes of this blog, I’ve written pretty shoddy tests, but they’re sufficient to give me 100% code coverage.

Let’s add Stryker.

Create a tool manifest (a file called dotnet-tools.json, placed in a .config folder) by running the following in your project folder:

dotnet new tool-manifest

Then install Stryker by executing the following command in the project folder:

dotnet tool install dotnet-stryker

Then all you need to do is run the following in your test project:

dotnet stryker

Which creates this report:

80% mutation score: Out of 22 total mutants — changes to the code — only 4 survived. Let’s dive into this, and see what’s wrong with my tests.
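
A quick note on the arithmetic, since 18 killed out of 22 would be roughly 82% rather than 80%. The numbers in this post line up if two of the 22 mutants were left out of the score (for example because they were ignored or didn’t compile), leaving 16 killed and 4 survived. That is my assumption from the figures, not something shown in the report text here. The sketch below uses a simplified formula and a hypothetical MutationScore helper; Stryker’s own calculation also accounts for categories such as timeouts, which are left out here.

```csharp
using System;

public static class MutationScore
{
    // Simplified score: killed / (killed + survived), as a percentage.
    // Mutants that are excluded from scoring are simply not counted.
    public static double Percent(int killed, int survived)
    {
        return 100.0 * killed / (killed + survived);
    }

    public static void Main()
    {
        Console.WriteLine(Percent(16, 4)); // 80
        Console.WriteLine(Percent(17, 3)); // 85, after killing one more mutant
        Console.WriteLine(Percent(18, 2)); // 90, after killing another
    }
}
```

With 20 scored mutants, each additional kill is worth 5 points, which matches the steps in the rest of this walkthrough.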

For the Reverse method, the report is saying that if we completely removed Array.Reverse(charArray); the tests would still pass. That’s because the only test case, "hh", is a palindrome, so reversing it changes nothing. Let’s add another test to fix this.

[TestCase("hh", "hh")]
[TestCase("hello", "olleh")] // added this
public void Should_Reverse_String(string input, string expected)
{
    input.Reverse().Should().Be(expected);
}

Now we have an 85% mutation score. Let’s take a look at another example for our Any() method:

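This is similar to the last one, in that after removing the return false inside the string.IsNullOrEmpty check the tests still pass. With that guard gone, an empty string skips the foreach loop and reaches the final return false anyway, so the existing tests can’t tell the difference. A null input can, because iterating over it throws. Let’s add the following test: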

[Test]
public void Any_Should_Return_False_When_Input_Is_Null()
{
    string input = null;
    input.Any((c) => c == 'o').Should().BeFalse();
}

Now running Stryker gives us a 90% mutation score. Let’s take a look at the final method which has two mutants:

Mutant 1: similar to the mutants in the previous two examples
Mutant 2: an equality mutation, where changing > to >= gives the same result

For mutant 1, adding more tests isn’t going to fix it. Although in this case it was intentional, the mutant has highlighted that the block of code in question isn’t required; removing it and rerunning the tests confirms that.

For mutant 2, we have a very interesting scenario. It’s saying that even if camelCase += s.Substring(1) is executed when there’s only one character in the string, the result is the same. Take the example where I pass in "D", which gets converted to "d" by the char.ToLower call. When camelCase += s.Substring(1) then executes, s.Substring(1) returns "", so "d" + "" == "d". I would never have considered that myself. So the resulting function looks like this:

public static string ToCamelCase(this string s)
{
    if (string.IsNullOrEmpty(s))
    {
        return s;
    }
    var camelCase = char.ToLower(s[0], CultureInfo.InvariantCulture).ToString(CultureInfo.InvariantCulture);
    return camelCase + s.Substring(1);
}

That does everything I wanted from the function, and Stryker now reports a 100% mutation score.
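
If the Substring behaviour seems surprising, here’s a small standalone sketch to check it. I’ve copied the simplified logic into a plain static method on a hypothetical CamelCaseCheck class so the snippet runs on its own without NUnit; the inputs are the same four cases used in the tests earlier.

```csharp
using System;
using System.Globalization;

public static class CamelCaseCheck
{
    // Same logic as the simplified ToCamelCase, minus the extension-method syntax.
    public static string ToCamelCase(string s)
    {
        if (string.IsNullOrEmpty(s))
        {
            return s;
        }
        var first = char.ToLower(s[0], CultureInfo.InvariantCulture).ToString(CultureInfo.InvariantCulture);
        return first + s.Substring(1);
    }

    public static void Main()
    {
        // Substring with a start index equal to the length is legal
        // and returns the empty string rather than throwing.
        Console.WriteLine("D".Substring(1) == "");          // True

        // The original test cases still hold without the removed branches.
        Console.WriteLine(ToCamelCase("camel") == "camel"); // True
        Console.WriteLine(ToCamelCase("") == "");           // True
        Console.WriteLine(ToCamelCase("D") == "d");         // True
        Console.WriteLine(ToCamelCase("Dog") == "dog");     // True
    }
}
```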

Conclusion

Having 100% code coverage is awesome, but we need to ensure that the tests are actually sufficient. Code coverage doesn’t tell you much about the effectiveness of your tests. We want absolute certainty when changing production code that what we’re writing isn’t breaking existing functionality. Using a mutation testing framework such as Stryker gives us an extra layer of protection and confidence, which is paramount for a development team’s velocity.

If you haven’t tried mutation testing, I encourage you to do so. Your company and its customers will thank you.
