Getting Along with Python

Write automated tests

Make a habit of writing automated tests and running them frequently as you code!

For a while it might seem like you can get away without doing it. For a while, you can, sort of. But learning how to use automated testing might be the most important thing you can do to improve the quality and reliability of your code. If you don’t use automated testing, you just aren’t working in a professional way.

Note

Testing is an area where there are lots of disagreements about what things should be named and what's best to do. For example, passionate and endless arguments rage about the exact meaning of "unit testing" and the correct way to practice it. Don't let this discourage you from getting started with testing. Please take what's said here with a grain of salt, and do some research on your own so you can form your own informed opinions about testing.

What are tests?

In their simplest form, tests are just bits of code you can run which raise an exception when something is broken or incomplete in the code you want to test. When a test is run and an exception escapes or an assertion fails, the test fails; and a test should fail any time the right things did not happen.

If you think about it, this isn’t much different from the little snippets of code you type into a Python interpreter shell, or into throwaway files, to try out new ideas and see if they work. You just save the code and structure it a little so you can easily run it over and over, to tell you if you are finished with something and to detect any new problems you might have caused. Instead of having to do it manually and inspect the results manually, you make it so it can run automatically and the failure can be detected automatically.
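
As a minimal sketch, the idea looks like this: here add is a hypothetical function under test, and the file doubles as a test you can rerun at any time.

def add(a, b):
    return a + b

# The simplest possible automated test: running this file raises
# AssertionError (and so fails) if add ever stops working.
assert add(2, 2) == 4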

Why write tests?

Whenever you check whether your programs work, you're testing them. At first, the easiest thing seems to be to try things out yourself and fix the problems you see on the fly. This can work pretty well for new projects which are small (and probably not very important yet). But as your programs do more, the time required to manually check their behavior grows, and it becomes easier to get lazy or simply forget to check some things. It also starts to waste your life on tedious waiting and repetition. And while you are fixing and testing one thing, you are often creating bugs somewhere else, in a potentially endless stream. This creates insane levels of frustration and is a terrible way to live. It's also bad for the software: you get broken software, abandoned projects, or both. The problems only get worse on multiple-person projects with real requirements and deadlines.

Since you are going to test anyway, you might as well do it in a way that is both easier and more thorough - by writing automated tests.

A silly example

So far this is all pretty abstract, and it may be unclear how to actually get started. So let’s look at an example to get a feel for what testing looks like.

The silly example task is to write a function which returns a sequence of strings containing the numbers 1 to 10, with "fizz" appended to multiples of three and "buzz" appended to multiples of five. (This problem is totally ridiculous; it exists solely to provide an example that is simple to understand.)

First you might write an example test (test_fizzbuzz.py):

from fizzbuzz import fizzbuzz


def test_fizzbuzz():
    strings = fizzbuzz()
    assert strings == [
        "1",
        "2",
        "3 fizz",
        "4",
        "5 buzz",
        "6 fizz",
        "7",
        "8",
        "9 fizz",
        "10 buzz"
    ]

This test is simple, but a bit silly. Often you cannot enumerate all the desired outputs, or you would want to organize them differently, but this will do to give you an idea, and it does test the code. Anyway: better a silly test than no test, right?

Now here’s some silly example code meant to pass the test (fizzbuzz.py):

def fizzbuzz():
    strings = []
    for i in range(1, 10 + 1):
        # Test the combined case first: a multiple of both 3 and 5
        # would otherwise be caught by the plain "fizz" branch.
        if i % 3 == 0 and i % 5 == 0:
            this_string = "{0} fizzbuzz".format(i)
        elif i % 3 == 0:
            this_string = "{0} fizz".format(i)
        elif i % 5 == 0:
            this_string = "{0} buzz".format(i)
        else:
            this_string = "{0}".format(i)
        strings.append(this_string)
    return strings

There are many possible solutions to this problem, and this one is a bit lame. The point is this: once you have adequate tests, it becomes much easier to rewrite without breaking anything. It’s better to get something simple that you have tests to verify, than to try to get something perfect the first time without any tests to check it.

Running tests with py.test or nose

The example test file can easily be found and run by utilities like py.test (pip install pytest) or nosetests (pip install nose). For example, from the top directory of your project:

py.test

or

nosetests

It can be pretty much that easy, if you have structured your project to make the tests easily discoverable. These tools typically search directories with names like test/ or tests/, finding files that start with test_ and looking inside them for test functions whose names start with test_ or test classes whose names start with Test. In some complex cases, automated test discovery of this kind can be confusing, and you might have to use verbose flags or consult the tool's documentation to get a handle on it. Usually, though, it should save some trouble and make testing a bit more fun.
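
For instance, a layout like this hypothetical one is typically found with no extra configuration (the empty tests/__init__.py keeps fizzbuzz importable when the tests run from the project root):

fizzbuzz/
    fizzbuzz.py
    tests/
        __init__.py
        test_fizzbuzz.py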

Several test runners also have useful extras, like running tests in parallel or distributing them across multiple machines.
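
For example, the pytest-xdist plugin (pip install pytest-xdist) adds an -n option that splits a test run across local processes:

py.test -n 4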

unittest

With minor modification, one could also write and run the test using Python's built-in unittest module. I find it verbose and lacking in some useful features, but that's just my opinion; lots of people use it, and it's a reasonable thing to use.

Here’s an example of a unittest test file containing one test case:

import unittest
from fizzbuzz import fizzbuzz


class TestFizzbuzz(unittest.TestCase):
    def test_fizzbuzz(self):
        strings = fizzbuzz()
        self.assertEqual(strings, [
            "1",
            "2",
            "3 fizz",
            "4",
            "5 buzz",
            "6 fizz",
            "7",
            "8",
            "9 fizz",
            "10 buzz"
        ])

Assuming you named this file test_fizzbuzz2.py, one way to run its tests from the same directory would be:

python -m unittest -v test_fizzbuzz2
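
On Python 2.7 or newer, unittest can also discover tests itself, in the same spirit as the third-party runners:

python -m unittest discover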

Actually you can use py.test or nosetests to run this style of test too, if you prefer. Even if you want to use unittest to write your tests, I recommend picking up py.test or nosetests just to run them.

doctest

The idea behind Python's old doctest module is basically to paste transcripts of interactive interpreter sessions into text files or docstrings, and then run doctest on these files to check that the results the code gives are the same as in the transcript.
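
For example, a docstring like this hypothetical one doubles as a test; running python -m doctest -v adder.py replays the session and compares the output:

def add(a, b):
    """Return the sum of a and b.

    >>> add(2, 3)
    5
    """
    return a + b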

Although it might initially seem like less work than writing real tests, there are several reasons you might not want to use doctest to test your Python modules.

  • The tests are brittle, with small changes affecting the text output. Making the tests succeed despite non-deterministic runtime variations and trivial changes across versions tends to require ridiculous contortions that (in the author's opinion) remove doctest's apparent ease-of-use advantage.
  • Many tests one should normally do are very awkward to render as text transcripts, greatly reducing the appeal in practice; for example, comprehensive doctests tend to result in many complex lines yielding True or False, interrupted periodically by lines which do nothing but set up or tear down pieces of the test environment. The format also encourages writing long lumps of doctest containing multiple tests.
  • Putting doctests in docstrings clutters up the docstrings, obscuring the kinds of documentation which really belong there. I don’t like to read long shell transcripts in docstrings, and I suspect many other Python programmers feel the same.
  • Whatever its merits, by now it looks pretty old-fashioned and sloppy to do most of your testing this way.

However, doctest is still useful for ensuring that the examples you give in documentation work properly. Normally you would configure Sphinx (via its sphinx.ext.doctest extension) to do this for you.

How to Test

This section contains some general and philosophical thoughts on how to test effectively with less unnecessary pain. It's aimed primarily at people with less exposure to testing, who do not necessarily understand or agree with the need for it. If that is not you, you probably don't need to read it. In particular, if you are a TDD diehard or a unit testing purist, you may find this section objectionably vague or weakly stated. (Oh well.)

Test Early

Writing tests can be boring. Probably the hardest thing about it is getting started. It gets harder the longer you wait after the beginning of a project. Once you get started, it’s usually not very hard to continue. Whatever arguments you encounter regarding the right way to write tests, remember that the core point is to have automated tests early on, and do not get discouraged. You can refine your approach, speed up your tests or add other types of tests as you go. Just get started, so you have something to fix and a reason to fix it.

Test Often

As you make changes, run your tests frequently so you know when something starts being broken. This can often let you catch smaller bugs before they compound and accumulate into giant, mysterious tangles which make the project hell to work on. At a minimum, this lets you know about impending issues earlier, when there is more chance of fixing them and avoiding big problems. Let too much time go between test runs and the tests are likely to become irrelevant, then ignored, and then you lose all the benefits of having tests at all.

It is reasonably common to use commit hooks (e.g., git commit hooks or mercurial commit hooks) to prevent code from being committed unless it passes the tests. This is intended to ensure that there are no outright broken versions of the code that someone might check out, which can save a lot of time when, for example, one needs to roll back from a deployed version containing a previously undetected bug to an earlier working version.
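
As a minimal sketch, a git pre-commit hook is just an executable script saved as .git/hooks/pre-commit; a nonzero exit status aborts the commit:

#!/bin/sh
# Run the whole test suite; refuse the commit if anything fails.
py.test || exit 1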

Continuous testing is also a big reason people use CI (Continuous Integration) servers like Jenkins or Travis, to monitor the health of committed code in version control repositories in addition to whatever testing developers are doing on their own machines.

Test Strategically

Though everyone will tell you to do it, don’t misunderstand writing tests as a matter of holiness: it’s really a practical matter of reducing pain, for you as much as others. The time and effort of writing tests is justified by the work that it saves. However, that time and effort shouldn’t be ignored, as bad practices can increase it and somewhat reduce the advantage of testing.

In extreme cases, highly overdone or stupidly written tests CAN cause more work than they save - but that's rare in practice. It's much more common that we tell ourselves that in order to rationalize our own laziness, when it's not really true. The tests you write, however imperfect, are almost certain to catch more bugs in your code than anybody will catch by hand, and to require much less effort than repeating the same manual testing over and over.

There is a trap for people who underestimate the trouble saved by tests. It’s so tempting to think: “these tests are taking too much time to write. I need that time to fix bugs and add features” - especially under pressure to deliver quickly. The tighter the deadline and the more urgently your fixes and features are needed, the worse it will be when you unexpectedly have to drop everything to fix a pile of emergency bugs. In that extreme urgency, you will have even less room to write tests to help dig out of this vicious cycle. And the fixes are very likely to take more work than writing the tests would have. Urgent and important goals are reasons to take a little time up front to write tests, not reasons to skip them. Code that matters needs tests. (Even for less critical code like quick prototypes, some basic tests can help a lot in getting something fully functional as directly as possible, or ensuring that something remains functional through large overhauls in the code.)

However, given that we are going to write tests, it’s only sensible to do it in ways that save unnecessary work. We want tests which are focused on important things. We don’t want tests which are irrelevant or redundant. We don’t want tests which require a lot of unnecessary maintenance. Here’s a simple triage strategy for reducing test-writing effort at the early stages of a project:

  • Start with tests relating directly to subgoals which are already clear. Add tests to specify more as the design becomes clearer.
  • Check a few simple cases before working on many complex or subtle cases. A few simple cases should be easy to test and provide a good basis for refinement - don’t get initially stuck generating endless cases or testing complex and subtle things unless you know they are likely to be vital.
  • Try to check that things do what they should without depending on the specific way they are done (see the sketch after this list). A good test doesn't need to be updated every time an implementation detail in the code is changed.
  • While writing tests, don’t obsess over perfection, and aim for clear over clever. Being too clever takes extra time and makes things harder later. Some repetition and lack of elegance can easily be refined if needed, and are much better than writing things you won’t understand in 6 months.
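
For instance, here is a sketch of the difference, reusing the fizzbuzz example from above:

from fizzbuzz import fizzbuzz


# Brittle: asserts an implementation detail (the exact return type),
# so it breaks if fizzbuzz is rewritten to return a tuple or generator.
def test_fizzbuzz_returns_list():
    assert type(fizzbuzz()) is list


# Robust: asserts the behavior that matters, and survives any
# implementation change that preserves it.
def test_fizzbuzz_labels_multiples_of_three():
    assert list(fizzbuzz())[2] == "3 fizz"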

In order to reach greater maturity, a project does need to commit to specific goals and design decisions and stop changing everything, so it can get important features working and keep them working. This work is reinforced with more comprehensive testing: more tested subgoals, and more carefully chosen cases (see Test Thoroughly).

Typically this does mean more time allocated to tests, and less flexibility for new development. That directly reflects the fact that more people are depending more heavily on the functionality of the project. There’s no way out of this without removing functionality: a new branch or project can set different expectations, but it can’t replace the older project perfectly until it passes its important tests. If we exhaust techniques to test more efficiently, we have to recognize that the costs of writing and maintaining tests are simply inherent to the process of developing functional software. More functionality means more test code.

Test Thoroughly

Consider two aspects of thorough testing: test quality, and test coverage.

The most important thing about your tests is that they really test whether your program is doing what it should. That’s test quality. High-quality tests are sensitive to the functioning of your program, and nothing else: they must be able to fail, they must fail when something happens that can hurt the program’s function, and they shouldn’t fail for completely incidental reasons.
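
For instance, compare these two sketches against the fizzbuzz example:

from fizzbuzz import fizzbuzz


# Low quality: this passes whenever fizzbuzz runs without raising,
# and says nothing about whether the output is right.
def test_fizzbuzz_runs():
    fizzbuzz()


# Higher quality: this fails exactly when the "buzz" behavior breaks.
def test_fizzbuzz_labels_multiples_of_five():
    assert fizzbuzz()[4] == "5 buzz"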

The other aspect of thorough testing is test coverage. Sometimes bugs escape notice for a long time because they are in parts of the code that are rarely run. Ideally, the code is not only run, but actually tested to see that it does what it should. When you need to make sure more of your code is getting run during tests, use Ned Batchelder's coverage tool (very well documented, and also used by plugins for py.test and nosetests). This tool records which code ran during your tests and generates a report of what you missed, so you can find the code that isn't being tested. It doesn't ensure that your tests are meaningful; that's still up to you. The right way to use a coverage tool is to find code you aren't testing, so you can write new high-quality tests for that code.
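
For example, with coverage installed (pip install coverage), a session might look like this; the -m flag on the report lists the line numbers that never ran:

coverage run -m pytest
coverage report -m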

Testing thoroughly isn’t incompatible with testing strategically. Tests which don’t really detect anything of interest are just slowing you down every time you have to run the tests.

Test Portability

Ideally, you would only ever have to develop programs against your favorite language and your favorite version of that language - whether an awesome new version or a beloved old one. Alas, programmers often need to support old users and stay up to date at the same time, whether we want to or not.

If you need to support multiple versions of Python (or similarly varied environments), then use Holger Krekel's testing utility, tox. It takes care of setting up virtualenvs and running your tests in those virtualenvs for you. So you just configure and run tox, and it runs your tests in each version of Python that you mean to support. If you ever work on porting a library or keeping it compatible with multiple versions of Python, this is incredibly useful. Otherwise, it can become unmanageably frustrating to make changes aimed at one version, only to find that they break another.
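
As a minimal sketch, a tox.ini like this hypothetical one (assuming you mean to support Python 2.7 and 3.6, and run your tests with py.test) is enough to get started:

[tox]
envlist = py27, py36

[testenv]
deps = pytest
commands = py.test

Then running tox from the project root builds a virtualenv for each listed interpreter and runs the test suite in each one.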