Working effectively with legacy tests

A couple of years ago I read a book by Michael Feathers that kept being mentioned in the literature I was reading at the time. The title? Working Effectively with Legacy Code. Feathers shows how to get code without tests under test by breaking dependencies, introducing seams, and working towards more testable code. I loved that book.

If we take seriously the idea that test automation is software development, then there should also be a concept called legacy tests. To me, that concept is related to testing debt, a term I think I coined back in 2009. But what are legacy tests? How do they hurt your productivity? And what can we do about them?

Here are some of my (limited) experiences with legacy tests, and how I overcame them. I hope this blog entry will trigger more experience reports from others, so that fewer teams have to suffer from them.

Legacy Tests

Legacy Tests are automated tests that no one pays attention to. This happens from time to time, and it is surely a sign of test automation debt and probably of test automation failure, which, if left untreated, will end in the test automation being abandoned in the long run.

The team will lose all faith in the automated tests and eventually ignore them completely. The usual side effect is that people no longer fix broken builds. They then lose insight into their progress, and soon enough they become more conservative about changing the now effectively untested code. Over time, changes take longer and longer, until development grinds to a halt.

Legacy Tests come in several flavors. First, there are tests that are constantly red because the person responsible for them has left the company, is helping out another team, or simply fell ill.

Then there are long-running tests. These usually take hours to complete, so they only run overnight, and they turn red quite often. Since no one uses them effectively during development, they get less attention, and people learn less from them when they fail.

Finally, I have seen lots of tests turn into legacy tests once they start to blink. Blinking tests are tests that fail in one build, pass in the next, only to fail again in the build after that. They are usually hard to deal with and can waste a lot of your time.

Let’s look at each of these patterns, and see how to overcome them.

Overcoming red tests

Red tests are usually the easiest to treat. There is a root cause for the red test, and usually another root cause behind that root cause.

In most cases, a red test signals that someone has been unaware of the test so far. Maybe they didn't run all the necessary tests before checking in their changes. Maybe no one told them to. Maybe the team is missing a rule for that. Maybe the rule was not mentioned when the new team member was onboarded.

No matter what, we need to deal with two aspects of the issue. We need to treat the immediate problem, the red test, and the underlying root cause.

Either the red test can be fixed quickly, in which case you analyze the problem and fix it. Or the test is hard to understand, and fixing the underlying problem might take much longer. In that case, rather than investing too much time into the problem, I strive to delete the test and rewrite it from scratch so that it becomes easier to understand. Of course, throwing away tests can be scary, but it's usually the best option you have when faced with a hard-to-understand test that only one person knows about.

But what do you do when you don’t know what it’s supposed to test? Throw it away anyway. It’s of no use to you right now, so it’s better to get rid of it. You will face the underlying problem sooner or later, and then you can still write the test from scratch. But also remember that a safe assumption is that code without tests does not work.

But how do you decide which is which without investing too much of your time? Well, pick a timebox, say 10 minutes. If you can’t figure out the problem by then, either get help or, if no one who could help is reachable at the moment, delete the test. Ten minutes is too short? Yes, it’s short, but if it takes you more than 10 minutes to find a problem in your code base, that’s crucial information about its design and architecture, and keeping a broken test around won’t fix that.

Overcoming long-running tests

Long-running tests may eventually lead to red tests, so better make them faster now. What does long-running mean? That depends. For unit tests, I consider a test suite that runs for more than 12 minutes long-running. For acceptance tests, my threshold usually is 3 hours. At either of these points, I take a step back and look at what I can do to make those tests faster.
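A simple way to notice when a suite crosses one of these thresholds is to time every run. The sketch below assumes a Python unittest suite; run_timed, the threshold constants and QuickTests are hypothetical names, and the threshold values are just the numbers from this post.

```python
import io
import time
import unittest

# Thresholds from this post: 12 minutes for unit test suites, 3 hours for acceptance tests.
UNIT_THRESHOLD_S = 12 * 60
ACCEPTANCE_THRESHOLD_S = 3 * 60 * 60

def run_timed(suite, threshold_s):
    """Run a unittest suite and report (passed, elapsed seconds, long-running?)."""
    start = time.perf_counter()
    # Send the runner's output to a buffer so only the timing result matters here.
    result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
    elapsed_s = time.perf_counter() - start
    return result.wasSuccessful(), elapsed_s, elapsed_s > threshold_s

class QuickTests(unittest.TestCase):
    def test_addition(self):
        self.assertEqual(1 + 1, 2)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(QuickTests)
ok, elapsed, too_slow = run_timed(suite, UNIT_THRESHOLD_S)
print(f"passed={ok} elapsed={elapsed:.3f}s long-running={too_slow}")
```

In a build pipeline you might flag the build whenever too_slow is true, so the step back happens when the threshold is crossed rather than months later.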

xUnit Test Patterns has a lot to say about dealing with long-running tests. Slow tests usually stem from a combination of tests that exercise too much of the production code and tests that depend on slow subsystems. That is usually acceptable for acceptance tests, which have a broader scope, but not for micro-level unit tests.

You can treat both of these symptoms by mocking out the subsystem that is giving you a headache and causing the long-running tests. That usually is a first step. After that, you can drive the test automation down to the level where the actual behavior is implemented. Also make sure to reflect on your design here. Maybe you need to fix that as well, introducing new concepts that help you describe the intended behavior at a more abstract level.
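As a minimal sketch of mocking out a slow subsystem, assume a hypothetical ExchangeRateService whose calls take seconds (standing in for a remote service) and a hypothetical PriceCalculator that depends on it. The unit test replaces the service with a unittest.mock object built from the service's interface, so it runs in milliseconds and exercises only the calculation.

```python
import time
import unittest
from unittest import mock

class ExchangeRateService:
    """Hypothetical slow subsystem, e.g. a call to a remote service."""
    def rate(self, currency):
        time.sleep(5)  # stands in for network latency
        return 1.1

class PriceCalculator:
    def __init__(self, rates):
        self.rates = rates

    def convert(self, amount, currency):
        return round(amount * self.rates.rate(currency), 2)

class PriceCalculatorTest(unittest.TestCase):
    def test_converts_with_current_rate(self):
        # spec= keeps the mock honest: calling a method the real service lacks fails.
        rates = mock.Mock(spec=ExchangeRateService)
        rates.rate.return_value = 2.0
        calculator = PriceCalculator(rates)
        self.assertEqual(calculator.convert(10, "USD"), 20.0)
        rates.rate.assert_called_once_with("USD")
```

The price of this speed is that the test no longer tells you whether the real service still behaves the way the mock pretends it does.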

Note that all these tactics introduce new risks, mostly in the form of losing some of your previous confidence. Just because you tested all the pieces does not mean that the application is usable. You still need to tackle the situations where individual parts of the application talk to each other. So make sure to cover the integration of the individually tested components with tests of its own. Usually, though, you will need fewer tests for this integration effort if you tested each individual part to a reasonable level. (But don’t rely on this.)
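Here is a minimal sketch of such an integration test. It assumes two hypothetical production components, an OrderService and an InMemoryOrderRepository, each with its own unit tests that mock the other. The integration test deliberately wires the real objects together and checks only that they cooperate, leaving detailed behavior to the unit tests.

```python
import unittest

class InMemoryOrderRepository:
    """Hypothetical production component that stores order totals."""
    def __init__(self):
        self._orders = {}

    def save(self, order_id, total):
        self._orders[order_id] = total

    def total_of(self, order_id):
        return self._orders[order_id]

class OrderService:
    """Hypothetical production component that calculates and stores orders."""
    def __init__(self, repository):
        self.repository = repository

    def place(self, order_id, item_prices):
        total = sum(item_prices)
        self.repository.save(order_id, total)
        return total

class OrderIntegrationTest(unittest.TestCase):
    def test_placed_order_can_be_read_back(self):
        # No mocks here: the real components have to agree on how they talk.
        repository = InMemoryOrderRepository()
        service = OrderService(repository)
        service.place("order-1", [3, 4])
        self.assertEqual(repository.total_of("order-1"), 7)
```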

Overcoming blinking tests

Blinking tests are the worst of the problems mentioned in this blog entry. Blinking tests are green at some times and red at others, and there is usually a variety of reasons for this.

Mostly, the original test author overlooked some part of the environment that has since changed. For example, when daylight saving time starts or ends and a test suddenly stops passing, you might be facing a blinking test. If a test runs past midnight and then fails, it probably assumed that its run would never cross the day boundary.
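One common remedy for time-dependent blinking, sketched here under the assumption that the production code can accept its clock as a parameter, is to inject the clock instead of reading the system time directly. The function make_report_title is a hypothetical example. With a pinned clock the test gives the same result no matter when the build runs, and the midnight boundary can be tested on purpose.

```python
import unittest
from datetime import datetime

def make_report_title(clock=datetime.now):
    """Hypothetical production code that reads the time through an injected clock."""
    return f"Daily report {clock():%Y-%m-%d}"

class ReportTitleTest(unittest.TestCase):
    # A blinking variant would compare two separate reads of the real clock:
    #   self.assertEqual(make_report_title(), f"Daily report {datetime.now():%Y-%m-%d}")
    # It fails whenever midnight passes between the two calls.

    def test_title_uses_injected_clock(self):
        # Pin "now" one second before midnight; the result no longer
        # depends on when the build happens to run.
        fixed = datetime(2024, 3, 31, 23, 59, 59)
        self.assertEqual(make_report_title(clock=lambda: fixed), "Daily report 2024-03-31")

    def test_title_after_midnight(self):
        # The day boundary is now tested deliberately instead of by accident.
        fixed = datetime(2024, 4, 1, 0, 0, 1)
        self.assertEqual(make_report_title(clock=lambda: fixed), "Daily report 2024-04-01")
```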

In all these cases you need to decide whether you want to dive deeper into the root cause of the problem – or simply get rid of the test, and re-write it from scratch. There is a pitfall, though.

The test might blink because the system is unstable. In that case you should fix your system, because if your tests face such instabilities, then likely your users will do so, too.

Delete tests that no longer serve their purpose

Legacy tests are harmful. They can slow you down and make you feel unsure whether you are producing code that is reliable in the long run. Their impact creeps up on you: like the proverbial slowly boiled frog, you notice the threat only when it’s too late to do something about it.

Either way, automated tests are useless if they can’t provide you with feedback. These days I am quicker to throw away tests that no longer work. I find this does the most for trust within the development team, and it has some educational side effects, too. If your test is so unstable that I delete it, you might get mad at me. Maybe that anger will drive you to talk to me, so that we can overcome some of the problems we had.

Legacy Tests should not slow you down. If you find yourself slowed down by constantly having to pay attention to non-working tests, realize that they no longer fulfill the reason they are there: to provide you with feedback on your current development status. Delete these kinds of tests, and forget about the sunk costs that went into them. You will be better off without them anyway.


One thought on “Working effectively with legacy tests”

  1. Interesting post with lots of good ideas here.

    I have also pulled long running tests into a different system when they were providing value. The fast running tests worked as a safety net for checkins, etc. (more true unit tests), and then we would schedule the longer running tests at different intervals. I touched on that very briefly here where we decided to run long running tests every 10th time and eventually modified our scripts to do that:
    http://www.informit.com/articles/printerfriendly/462520

    I actually think it might have been Johanna Rothman who coined the term “testing debt”. I tried to find the earliest mention of it when I wrote this http://www.kohl.ca/2005/testing-debt/

    It’s nice to see people write about what happens to test code as it ages. Just like regular code, it has problems that need to be dealt with, but often automation is written with a “write once, forget about it” mentality, which means those of us who come later have to try to clean it up.

    -Jonathan
