Line Coverage: Lessons from JUnit

In unit testing, achieving 100% statement coverage is not realistic. But what percentage do good testers achieve in practice? Which cases do they typically leave untested? And is it important to actually measure coverage?

To answer questions like these, I took a look at the test suite of JUnit itself. This is an interesting case, since it is created by some of the best developers around the world, who care a lot about testing. If they decide not to test a given case, what can we learn from that?

Coverage of JUnit as measured by Cobertura

Overall Coverage at First Sight

Applying Cobertura to the 600+ test cases of JUnit leads to the results shown above (I used the Maven-enabled version of JUnit 4.11). Overall, instruction coverage is almost 85%. In total, JUnit comprises around 13,000 lines of code and 15,000 lines of test code (both counted with wc). Thus, even a test suite that is larger than the framework itself leaves around 15% of the code untouched.

Covering Deprecated Code?

A first finding is that in JUnit the coverage of deprecated code tends to be lower. JUnit 4.11 contains 13 deprecated classes (more than 10% of the code base), which achieve only 65% line coverage.

JUnit includes another dozen or so deprecated methods spread over different classes. These tend to be small methods (just forwarding a call), which often are not tested.

Furthermore, JUnit 4.11 includes both the modern org.junit.* packages as well as the older junit.* packages from 3.8.x. These older packages constitute ~20% of the code base. Their coverage is 70%, whereas the newer packages have a coverage of almost 90%.

This lower coverage for deprecated code is somewhat surprising, since in a test-driven development process you would expect code to be well covered before it gets deprecated. The underlying mechanism may be that after deprecation there is no incentive to maintain the test cases: if I were to file a ticket to improve the test cases for a deprecated method in JUnit, I suspect it would not get high priority. (This calls for some repository mining research on deprecation and testing, in the spirit of our work on co-evolution of tests and code.)

Another implication is that when configuring coverage tools, it may be worth excluding deprecated code from the analysis. A coverage tool that recognizes the @Deprecated annotation would be ideal, but I am not aware of such a tool. If excluding deprecated code is impossible, an option is to adjust coverage warning thresholds in your continuous integration tools: for projects rich in deprecated code it will be harder to maintain high coverage percentages.

Ignoring deprecated code, the JUnit coverage is 93%.

An Untested Class!

In the non-deprecated code, there was one class not covered by any test:
runners.model.NoGenericTypeParametersValidator. This class validates that @Theories are not applied to generic types (which are problematic due to type erasure).

I easily found the pull request introducing the validator about a year ago. Interestingly, the pull request included a test class clearly aimed at testing the new validator. What happened?

  • Tests in JUnit are executed via suites (a minimal sketch of this mechanism follows below this list). The new test class, however, was not added to any suite, and hence never executed.
  • Once added to the proper suite, it turned out the new tests failed: the new validation code was never actually invoked.
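For readers unfamiliar with the mechanism: a JUnit test class only runs if an executed suite includes it. A minimal sketch, with illustrative class names rather than JUnit's actual suite definitions:

import org.junit.runner.RunWith;
import org.junit.runners.Suite;
import org.junit.runners.Suite.SuiteClasses;

// Minimal sketch of a suite; the listed class names are illustrative.
@RunWith(Suite.class)
@SuiteClasses({
        ExistingValidatorTest.class,
        NoGenericTypeParametersValidatorTest.class // forget this entry, and the test never runs
})
public class AllValidatorTests {
}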

I posted a comment on the (already closed) pull request. The original developer responded quickly, and provided a fix for the code and the tests within a day.

Note that finding this issue through coverage thresholds in a continuous integration server may not be so easy. The pull request in question causes a 1% increase in code size, and a 1% decrease in coverage. Alerts based on thresholds need to be sensitive to small changes like these. (And the current Ant-based CloudBees JUnit continuous integration server does not monitor coverage at all.)

What I’d really want is continuous integration alerts based on coverage deltas for the files modified in the pull request only. I am, however, not aware of tools supporting this at the moment.

The Usual Suspects: 6%

To understand the final 5-6% of uncovered code, I had a look at the remaining classes. For those, there was not a single method with more than 2 or 3 uncovered lines. For this uncovered code, various typical categories can be distinguished.

First, there is the category too simple to test. Here is an example from org.junit.Assume, in which an assumeTrue is turned into an assumeFalse by just adding a negation operator:

public static void assumeFalse(boolean b) {
  assumeTrue(!b);
}

Other instances of too simple to test include basic getters, or overrides for methods such as toString.

A special case of too simple to test is the empty method. These are typically used to provide (or override) default behavior in inheritance hierarchies:

/**
 * Override to set up your specific external resource.
 *
 * @throws Throwable if setup fails (which will disable {@code after})
 */
protected void before() throws Throwable {
    // do nothing
}

Another category is code that is dead by design. An example is a static-only class, which need not be instantiated. It is good Java practice (adopted selectively in JUnit too) to make this explicit by hiding the constructor, as JUnit's Assert does with a protected constructor:

/**
 * Protect constructor since it is a static only class
 */
protected Assert() {
}

In other cases, code that is dead by design reflects the assertion that a certain situation will never occur. An example is found in Request.java:

catch (InitializationError e) {
  throw new RuntimeException(
    "Bug in saff's brain: " +
    "Suite constructor, called as above, should always complete");
}

This is similar to a default case in a switch statement that can never be reached.
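For illustration, here is a hypothetical (non-JUnit) example of such an unreachable branch:

enum Direction { UP, DOWN }

int step(Direction d) {
    switch (d) {
    case UP:
        return -1;
    case DOWN:
        return 1;
    default:
        // dead by design: unreachable as long as every constant is handled
        throw new IllegalStateException("unexpected direction: " + d);
    }
}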

A final category consists of bad weather behavior that is unlikely to happen. This typically manifests itself in not explicitly testing that certain exceptions are caught:

try {
  ...
} catch (InitializationError e) {
  return new ErrorReportingRunner(null, e);
}

Here the catch clause is not covered by any test. Similar cases occur for example when raising an illegal argument exception if inputs do not meet simple validation criteria.
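A hypothetical (non-JUnit) example of such an untested validation branch:

// Hypothetical example: the throw below typically stays uncovered,
// because no test bothers to pass an invalid argument.
public void setTimeout(long millis) {
    if (millis < 0) {
        throw new IllegalArgumentException("timeout must be non-negative: " + millis);
    }
    this.timeout = millis; // normal ("good weather") path, which the tests do cover
}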

EclEmma and JaCoCo

While all of the above is based on Cobertura, I started out using EclEmma/JaCoCo 0.6.0 in Eclipse for the coverage analysis. There were two (small) surprises.

First, merely enabling EclEmma code coverage caused the JUnit test suite to fail. The issue at hand is that in JUnit, test methods can be sorted according to different criteria. This involves reflection, and the test outcomes were influenced by additional (synthetic) methods generated by JaCoCo. The solution is to configure JaCoCo so that instrumentation of certain classes is disabled, or to make the JUnit test suite more robust against instrumentation.

Second, JaCoCo does not report coverage of code raising exceptions. In contrast to Cobertura, JaCoCo does on-the-fly instrumentation using an agent attached to the Java class loader. Instructions in blocks that are not completed due to an exception are not reported as being covered.
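A hypothetical example (not JUnit code) of this effect: if every test calls the method below with input that is not a number, the parseInt line does execute, but the block is exited through the exception before it completes, so JaCoCo reports the line as uncovered, whereas Cobertura reports it as covered.

// Hypothetical example, not JUnit code.
static int parsePort(String input) {
    int port = Integer.parseInt(input); // executed, but throws for input "oops"
    return port;                        // never reached
}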

As a consequence, JaCoCo is not suitable for exception-intensive code. JUnit, however, is rich in exceptions, for example in the various Assert methods. Consequently, the code coverage for JUnit reported by JaCoCo is around 3% lower than by Cobertura.

Lessons Learned

Applying line coverage analysis to one of the best-tested projects in the world, we learned the following:

  1. Carefully analyzing coverage of code affected by your pull request is more useful than monitoring overall coverage trends against thresholds.
  2. It may be OK to lower your testing standards for deprecated code, but do not let this affect the rest of the code. If you use coverage thresholds on a continuous integration server, consider setting them differently for deprecated code.
  3. There is no reason to have methods with more than 2-3 untested lines of code.
  4. The usual suspects (simple code, dead code, bad weather behavior, …) correspond to around 5% of uncovered code.

In summary, should you monitor line coverage? Not all development teams do, and even in the JUnit project it does not seem to be a standard practice. However, if you want to be as good as the JUnit developers, there is no reason why your line coverage would be below 95%. And monitoring coverage is a simple first step to verify just that.

Teaching Testing in Year One

Starting in 2013/2014, TU Delft will run a revised curriculum for its computer science bachelor. The software testing course that I have been teaching to 2nd and 3rd year CS students will move to the (end of the) first year.

This is an exciting prospect. It confirms that testing is not an afterthought, but something that should be built into software development right from the start.

But can it be done? What should freshmen coming straight from high school be taught before they can start with testing? And what should a first year testing course contain?

To build up the required knowledge, the TU Delft curriculum anticipates three pre-testing courses.

  1. In the first, students learn about object-oriented programming, covering topics ranging from simple loops to inheritance, polymorphism, and interfaces. They will even learn a bit about the mechanics of testing their code automatically.
  2. Subsequently, they use the acquired programming skills in a simple project. They learn to work in teams, to write software according to requirements provided by others, and to share their (UML) design diagrams with other team members.
  3. As the third step, they learn about data structures such as linked lists and binary search trees, and learn to use recursion. These courses (object-oriented programming, a project, and data structures) are scheduled for the first three quarters of the first year.

Then, in the fourth quarter, a dedicated course on software testing comes in. The course should hook students on innovative forms of testing for the rest of their lives. Here's what I have in mind for that.

The practical basis will include exploratory testing, behavior-driven development, and the use of testable scenarios to specify requirements. With respect to unit testing, the students will learn JUnit, the use of build tools (Maven), coverage analysis, and the use of continuous integration tools (Jenkins). I even hope to get them to understand a mocking framework like Mockito. Students will apply these techniques to a small existing application (JPacman), which they will have to adapt and test.

The more theoretical basis will be provided by the systematic derivation of test cases from models, such as state machines or decision tables. Furthermore, I’ll elaborate on different adequacy models (beyond statement coverage!) as well as combinatorial testing techniques (e.g., pairwise testing).

This being an academic course, it will also include a critical reflection on the tools and techniques covered. We’ll identify strengths and weaknesses, and see how today’s hottest research aims at addressing these weaknesses.

Well, perhaps this is all too ambitious. I will try, and we will see. Luckily, TU Delft is not the first university to move testing to the first year: Eindhoven is another notable example, and I am sure there are more (although perhaps not many). “Test early, test often” — learn it early, apply it often.

Automated GUI Testing with Google’s WindowTester

One of the topics that popped up a couple of times in our “test confession” interviews with Eclipse developers is the tension between unit testing and GUI testing. Does an application with a thorough unit test suite require user interface testing? And what about the other way around? What does automated GUI testing add to standard unit testing? Is automated GUI testing a way to involve the end users in the testing process?

The WindowTester recorder

In order to fully understand all arguments, I decided to play around a little with WindowTester, a capture-and-playback automated GUI testing tool made available by Google, with support for Swing and SWT.

My case study is JPacman, a Java implementation of a game similar to Pacman I use for teaching software testing. The plan of attack is simple: take the existing use cases, turn each into a click trail, record the trail, and generate a (JUnit-based) test suite.

My first use case is simple: enter and exit the game. To that end, I open the recorder, launch JPacman, and press the red "Record" button in Eclipse. Then I press the start button in the game, followed by the exit button, which quits the application. WindowTester then prompts for the place to save this interaction:

Saving the recorded test case

After that, WindowTester generates the following JUnit test case:

import ...

public class StartAndExitUseCase extends UITestCaseSwing {

    public StartAndExitUseCase() {
        super(jpacman.controller.Pacman.class);
    }

    public void testStartAndExitUseCase() throws Exception {
        IUIContext ui = getUI();
        ui.click(new JButtonLocator("Start"));
        ui.click(new JButtonLocator("Exit"));
        ui.wait(new WindowDisposedCondition("JPacman"));
    }
}

The next use case requires me to make moves in several directions.
Unfortunately, WindowTester can’t record arrow keys. To resolve this, I decide to modify JPacman to use good old vi-navigation (‘hjkl’).

I then open JPacman and make moves in all directions. Unfortunately, I’m a bit slow, bump into one of the randomly moving monsters, and die.

This is a deeper issue: Parts of the application, in particular the random monsters, cannot be controlled via the GUI. Without such control, it is impossible to have test cases with reproducible results.

My solution is to create a slightly different version of Pacman, in which the monsters don’t move at all. In fact, I already had the code for this in my test harness, as I used such a version for unit testing.

This works, and the result is a test case that passes just fine:

public void testSimpleMove() throws Exception {
    IUIContext ui = getUI();
    ui.click(new JButtonLocator("Start"));
    ui.enterText("jlkhhh");
}

The test doesn’t assert much, though. Luckily, WindowTester has a mechanism to insert “hooks” while recording, prompting me for the name of the method to be called.

Inserting an assertion hook while recording

This results in the following code:

public void testSimpleMoveWithAsserts() throws Exception {
    IUIContext ui = getUI();
    ui.click(new JButtonLocator("Start"));
    ui.enterText("l");
    assertCorrectMoveToTheLeft();
    ui.enterText("j");
    assertCorrectMoveDown();
}

protected void assertCorrectMoveDown() throws Exception {
    // TODO Auto-generated method stub
}
...

WindowTester generates empty bodies for the two assert methods, leaving it to the developer to insert appropriate code. This raises two issues.

The first is that the natural way (at least for me as a tester) to verify that a move down was conducted correctly is to ask the appropriate objects for the position of the player. From the GUI, however, I don’t have access to these objects. My workaround is to adjust Pacman’s “main” method to make the underlying model available through a static reference. This results in the following code:

protected void assertCorrectMoveDown() {
    Pacman pm = SimplePacman.instance();
    assertEquals(1, pm.getEngine().getPlayer().getLastDy());
}

Writing such an assertion requires good knowledge of the underlying API, and a bit of luck that the API exposes a method to witness the desired effect.
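The static reference itself is a one-time change to the application; a minimal sketch of what it could look like (the structure of SimplePacman and the start() call are assumptions, only the instance() accessor is used above):

// Sketch of the workaround; names other than Pacman are assumptions.
public class SimplePacman {

    private static Pacman theGame;

    public static void main(String[] args) {
        theGame = new Pacman(); // keep a handle on the model behind the GUI
        theGame.start();        // assumed launch method
    }

    // Gives GUI test code access to the underlying game model.
    public static Pacman instance() {
        return theGame;
    }
}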

My next use case involves a hungry Pacman consuming a dot, which earns the player 10 points. Doing this in a way similar to the previous use case is simple enough. Would it also be possible to assert that the proper number of points is displayed correctly in the GUI? This requires getting hold of the appropriate JTextField, and checking its content before and after eating a dot.

To support this, WindowTester offers a number of widget locators. An example is the locator used above to find a JButton labeled with “Start”. Other types of locators make use of patterns, the hierarchical position in the GUI, or a unique name that a developer can give to widgets. I use this latter option, allowing me to retrieve the points displayed as follows:

private int getPointsDisplayed() throws WidgetSearchException {
    WidgetReference<JTextField> wrf = 
        (WidgetReference<JTextField>) 
        getUI().find(new NamedWidgetLocator("jpacman.points"));
    JTextField pointsField = (JTextField) wrf.getWidget();
    return Integer.parseInt(pointsField.getText());
}

Thus, I can test if the points actually displayed are correct.
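For this to work, the widget must be given its name in the application code. A small sketch (I assume NamedWidgetLocator matches on the standard Swing component name, set via setName; the surrounding GUI code is not shown):

// Sketch: give the points field a unique name so the locator above can find it.
JTextField pointsField = new JTextField(8);
pointsField.setName("jpacman.points");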

The remaining use cases can be handled in a similar way. Some use cases require moving monsters, for which access to the GUI alone is not enough. Another use case, winning the game, would require a lot of clever moves on the regular board; instead, I create a small, simple custom board on which winning is easy.

Over time, I make less and less use of the recording capabilities of WindowTester: instead, I program directly against its API. This also makes the test cases easier to maintain: I have a “setUp” for pushing the “Start” button, a “tearDown” for pushing “Exit”, and I can make use of other JUnit best practices. Moreover, it allows me to create a small layer of helper methods permitting more abstract test cases, such as the method above to obtain the points actually displayed.
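A sketch of what this organization could look like, following the style of the generated tests above (the class name is illustrative, imports elided as before):

import ...

public class PacmanScenarioTest extends UITestCaseSwing {

    public PacmanScenarioTest() {
        super(jpacman.controller.Pacman.class);
    }

    @Override
    protected void setUp() throws Exception {
        super.setUp();
        getUI().click(new JButtonLocator("Start")); // every scenario starts a game
    }

    @Override
    protected void tearDown() throws Exception {
        getUI().click(new JButtonLocator("Exit")); // and leaves it again
        super.tearDown();
    }
}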

Are the resulting test cases useful? I recently changed some of the GUI logic, turning a complex if-then-else to handle keyboard events into a much cleaner switch statement (inspired by a warning PMD was giving me). All unit test cases passed fine. I almost committed, but then decided to run the GUI test suite as well. It failed. I first blamed WindowTester, but then I realized it was my own fault: I had forgotten some breaks in my switch and the events were not handled correctly. The GUI test suite found a fault my unit test suite had not found.

In summary, automated GUI testing is a replacement neither for unit testing nor for acceptance testing. It is a useful tool for covering the GUI logic of your application. The recording capabilities can be helpful to try out certain scenarios. In the end, however, an explicitly programmed test suite, making use of the GUI framework’s API, seems easier to maintain. In this way, JUnit best practices related to test suite organization, as well as the application’s observability and controllability, can be applied directly to your GUI test suite.


(This post originally appeared in February 2011 as “Swinging Test Suites with WindowTester” on the Eclipse Study blog.)