Testing: How Not to Fire Ze Missiles
(The following is an adaptation of a talk I gave on how to focus your testing efforts and how to better conceptualize the different kinds of tests.)
A lot of companies and a lot of developers like to talk about what kinds of testing they’re doing. But here’s a dirty little secret that the software industry won’t tell you: it’s much more common that you’ll be working at a company where very little of the code is regularly tested outside of production. I’m talking single-digit code coverage bad. Many developers neglect tests, and a big part of that is that they’re unsure how to properly approach testing.
End Goals
When approaching the concept of testing your code, you should ultimately be thinking about the end goals that you want your code to accomplish. I like to focus this into three areas:
- Your code should do what you think it does.
- Your code shouldn’t do what you don’t think it does.
- You shouldn’t allow your code to make the same mistakes twice.
While most software doesn’t act in a life-critical role, it’s easiest to break these down by considering:
“If I make a mistake, will my code ‘fire ze missiles’?”
That’s the software equivalent of nuclear war. We don’t like nuclear war! Let’s avoid that, shall we?
The first two points sound the same, but they’re really dual elements of one another: You should be able to verify that your code does the right things, while also ensuring that it does no more than what you think it’s doing.
The last point may be more controversial, or more aggressive, than you’ll hear elsewhere. But I find very high value in automating much of my work. When I build automation to solve a problem, I like to focus on making sure I get the automation aspects correct. The major advantage of doing so is that you can typically trust that it will continue working correctly in the future, or can more easily be corrected when it doesn’t. An additional advantage is that, by forcing your code to fit this mold from the start, it is typically more composable and maintainable going forward.
But, the focus of this post won’t be on automation. It will, however, cover how to leverage testing to ensure that your code doesn’t repeat the same mistakes.
To figure out how to write the right kinds of tests, it’s best to think of what kind of thing it is that you’re trying to test. The answer to that question will tell you what type of test you need to write.
Testing Definitions
A lot of people throw around a lot of loose definitions for different kinds of testing, so I’m going to be blunt from the start and spell out precisely what I mean. I mentally segregate testing into the following groups:
- Unit Testing: “Is my logical unit correct?”
- Property Testing: “Does my logic hold to certain rules?”
- Functional Testing: “Do my dependencies function correctly?”
- Stress Testing: “Does my functionality work under heavy load?”
- Integration Testing: “Does my code play nicely with others?”
- Acceptance Testing: “Does my code meet certain standards?”
Let’s go into each of these in detail.
Unit Testing
“Is my logical unit correct?”
I define unit testing as ensuring that logical units behave correctly. To break that down, ask yourself: What is a unit?
I define a unit as: “Any self-contained piece of code that has no external system or environment dependencies.”
If that definition sounds a little confusing, it may be easiest to explain by considering what isn’t a unit under this definition:
- Any I/O operations. Interacting with files, networking, databases, etc. These are environment dependencies. Depending on where you test your code, this same environment may not be available.
- Any threading or clock operations. These are system dependencies, as they depend very much upon the state of the system that is running the test. It might be hard to believe, but there are still systems out there that don’t support multithreading. Meanwhile, clocks are often highly inaccurate, or the timings are not easily reproducible. If you don’t believe me, just consider how much effort Google puts into timing accuracy at their data centers through the use of local atomic clocks.
- Any other code state that is not self-contained, passed as an argument, or otherwise fully mockable.
If you’re still confused about what makes up a unit: Aim for purity.
- Does calling your code always generate the exact same outputs from the exact same inputs?
- Does calling your code cause any side effects on the rest of the code, the system, or the environment that may affect other running code?
If you answer “no” to the first question, or “yes” to the second, then you’re not talking about unit testing.
To break it down more mathematically, here’s a suggestion: If you can define the set of all possible kinds of input and the set of all possible kinds of output, then you can effectively write complete test coverage by using the Cartesian product of those sets. In these circumstances, 100% test code coverage is not a unicorn.
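As a tiny sketch of this idea, consider a hypothetical pure helper (the function and its name are mine, purely for illustration): its kinds of input are enumerable, so the full Cartesian product of input kinds can be covered in a handful of assertions.

```python
def clamp(value, low, high):
    """A pure unit: same inputs always give the same output; no I/O, clocks,
    threads, or shared state are involved."""
    if low > high:
        raise ValueError("low must not exceed high")
    return max(low, min(value, high))

# Kinds of input: value below, inside, or above [low, high],
# plus the degenerate "inverted bounds" case. That's the whole space, by kind.
assert clamp(-5, 0, 10) == 0    # below range -> clamped to low
assert clamp(5, 0, 10) == 5     # inside range -> unchanged
assert clamp(15, 0, 10) == 10   # above range -> clamped to high
try:
    clamp(1, 10, 0)
except ValueError:
    pass
else:
    raise AssertionError("expected ValueError for inverted bounds")
```

Because the function is pure, these four cases genuinely are complete coverage: there is no hidden state that could make a fifth behavior appear.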
“Can I fire ze missiles?”
“No.”
From a unit, the answer is always an emphatic no. You can only ever “initiate a firing procedure on a missile-like structure”.
Property Testing
“Does my logic hold to certain rules?”
Again, let’s start off by asking ourselves, what is a property? My answer to that is: An implicit expectation about your code that must always hold true.
Properties are often not stated explicitly by your logic, but are often the “laws” around which your code has been structured.
Some simple examples:
- If I add one week to any given date, I will always expect the resulting date to land on the same day of the week (assuming a Gregorian calendar).
- If I reverse a list two times in a row, I should expect to get back exactly the list that I started with.
“When I fire ze missiles…”
“42 missiles will always launch.”
This is an example of a property: an invariant that the structure of the code should never allow to be violated, no matter which inputs it is given.
Before we move on, some additional “cool stuff” about the underutilized idea of properties:
- You can often write one test method that covers a wide range of outcomes in one pass. Again, think about date/time objects for examples.
- Some libraries exist that can auto-generate tests for you based on properties! Broad coverage, minimal effort! I think Haskell’s QuickCheck is the best example of this, which has now been ported into many other languages (to varying degrees of effectiveness).
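To make the idea concrete without pulling in a library, here is a hand-rolled sketch of a QuickCheck-style property runner (Hypothesis is the mature Python port of this idea and does all of this far better; everything below is illustrative):

```python
import random
from datetime import date, timedelta

def check_property(prop, gen, trials=200):
    """Generate random inputs and assert that the rule holds for every one."""
    for _ in range(trials):
        value = gen()
        assert prop(value), f"property violated for {value!r}"

# Hypothetical generators for the two example properties above.
random_list = lambda: [random.randint(-1000, 1000)
                       for _ in range(random.randint(0, 50))]
random_date = lambda: date(2000, 1, 1) + timedelta(days=random.randint(0, 36500))

# Reversing twice is the identity.
check_property(lambda xs: list(reversed(list(reversed(xs)))) == xs, random_list)
# Adding one week lands on the same day of the week.
check_property(lambda d: (d + timedelta(weeks=1)).weekday() == d.weekday(),
               random_date)
```

One test method, hundreds of cases per pass; a real property-testing library adds shrinking (minimizing failing inputs) on top of this.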
Functional Testing
“Do my dependencies function correctly?”
What is a dependency? Some external thing that your code relies upon.
In practice, functional tests can often be one-liners. They frequently answer the simple question: “Can I connect to the thing my code depends upon?” Sometimes those connections have constraints of their own that must additionally be tested. But if both your code and your dependency have been thoroughly unit tested, the connection point between them often narrows down to a single point of contact to test functionally, plus a handful of one-line failure conditions.
Some examples:
- A download manager: Should always download the file asked of it, or gracefully capture network failure.
- A file writer: Should always write an output file.
- Multi-threading: Should always gracefully handle concurrent access.
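A minimal sketch of the file-writer example (all names are hypothetical): the logic around the write can be unit tested separately, so the functional test shrinks to the single point of contact with the filesystem.

```python
import os
import tempfile

def write_report(path, lines):
    """The dependency side: touches the real filesystem, so this is NOT a unit."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines))

# Functional test: did the single point of contact with the environment work?
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "report.txt")
    write_report(target, ["alpha", "beta"])
    assert os.path.exists(target)  # the one-line functional assertion
```

What the report *contains*, and how callers react if the write fails, are logical questions that belong in unit tests around this function.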
What is not part of a functional test?

Most of the edge cases and artifacts caused by failures are logical considerations that can be vetted by unit testing your surrounding logic. For example: whether a download can fail gracefully at all is a functional concern, but how your code recovers *after* the fact of that failure is unit territory.
“When I fire ze missiles…”
“World War 3 begins.”
Functionally verified!
Stress Testing
“Does my functionality work under heavy load?”
What is a stress (or load) test? Taking an existing piece of functionality that acts on a collection of data or resources, scaling that collection up to thousands/millions/bazillions of concurrent data points, and then observing the failure conditions.
Some examples:
- Rendering: How well does the code handle being asked to render way too many elements?
- Networking: How does the code handle many requests? Requests that are too large? Requests that take too long to respond or that “trickle in” data?
These are conditions that are rarely considered until they are encountered in production. It’s best to include something along these lines in your test harness.
In the worst cases, you can’t always prevent these kinds of problems (often due to architectural or resource constraints). But, if you’ve encountered these situations through testing, you’ll at least have some idea of what it looks like in production, so that it can be more quickly acted upon.
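A bare-bones sketch of the idea (the handler is a hypothetical stand-in for real functionality): take something that already works, spin the collection size up, and watch what happens.

```python
import time

def handle_request(payload):
    # Stand-in for the functionality under load (hypothetical).
    return sum(payload)

# Stress: scale the collection up by orders of magnitude and observe
# timing and failure behavior rather than just correctness.
requests = [[i, i + 1, i + 2] for i in range(100_000)]
start = time.perf_counter()
results = [handle_request(r) for r in requests]
elapsed = time.perf_counter() - start

assert len(results) == len(requests)
print(f"handled {len(requests):,} requests in {elapsed:.2f}s")
```

Even when the numbers are ugly, the point is that you have seen the failure mode in a harness before it shows up in production.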
“When I fire lots of ze missiles…”
“Missile tubes don’t clog up or cause explosive backfires in the missile launch silo.”
The last thing you want to do is hurt yourself when it comes time to perform the “real thing”.
Integration Testing
“Does my code play nicely with others?”
What is integration testing? Putting the various pieces of your code together into one consistent whole. “Integration” and “functional” testing are often confused in the minds of developers, so I like to draw a clear distinction between the two:
Integration is where you connect your logical units together with your functional dependencies.
Some examples:
- Connecting multiple units together as a workflow or pipeline.
- Connecting functional dependencies into your code to verify that it still all works.
- These can be as small as a few modules…
- …Or as big as your entire application.
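As a small sketch (all names hypothetical), an integration test wires the independently tested pieces into one flow: two pure units plus one functional dependency, verified end to end.

```python
import os
import tempfile

# Units: pure, and unit-tested on their own.
def parse(raw):
    return [line.strip() for line in raw.splitlines() if line.strip()]

def summarize(items):
    return f"{len(items)} items"

# Functional dependency: the filesystem, functionally tested on its own.
def load(path):
    with open(path, encoding="utf-8") as f:
        return f.read()

# Integration test: the whole pipeline as one consistent whole.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "input.txt")
    with open(src, "w", encoding="utf-8") as f:
        f.write("alpha\nbeta\n\ngamma\n")
    assert summarize(parse(load(src))) == "3 items"
```

Each piece was already trusted in isolation; the integration test verifies only that they still agree with each other when connected.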
“When I fire ze missiles…”
“A countdown commences.”
“Rockets ignite.”
“42 missiles have liftoff.”
“World War 3 begins.”
Acceptance Testing
“Does my code meet certain standards?”
What is acceptance? The point where a “Go/No go” choice is made on any particular aspect of code.
The rules here can be somewhat more arbitrary or context-sensitive. But! That is not to take away from their importance. These rules are often put in place to ensure some high-profile failure isn’t repeated… or simply to ensure that a “bare minimum” standard has been met for the code to proceed into production.
Some examples:
- Build automation
  - If your code doesn’t compile…
  - …it won’t release.
  - If your code causes test failures…
  - …it won’t release.
- QA acceptance
  - If your code violates some basic expectations of QA…
  - …your code will be rejected from the “ready to release” state.
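The build-automation gates above can be sketched as a small “Go/No go” script (the directory layout and test command are assumptions for illustration, not any particular CI system):

```python
import compileall
import subprocess
import sys

def acceptance_gate(source_dir="src", test_cmd=(sys.executable, "-m", "pytest")):
    """Return 'Go' only if every gate passes; otherwise name the 'No go'."""
    # Gate 1: "If your code doesn't compile…" -- byte-compile the source tree.
    if not compileall.compile_dir(source_dir, quiet=1):
        return "No go: compile failure"
    # Gate 2: "If your code causes test failures…" -- run the suite.
    if subprocess.run(test_cmd).returncode != 0:
        return "No go: test failures"
    return "Go"
```

The individual rules are arbitrary-looking in isolation; the value is that nothing reaches the “ready to release” state without passing every one of them.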
“When I fire ze missiles…”
“A countdown commences.”
“Rockets ignite.”
“42 missiles have liftoff.”
“World War 3 begins.”
“Safety procedures kick in – missiles abort.”
Parting Thoughts
The different types of testing are inherently related: each layer builds on the confidence established by the layer beneath it, from units at the bottom, up through properties, functional and stress tests, integration, and finally acceptance at the top.
Hopefully, you’ve found these distinctions helpful. Of course, healthy disagreements about my various distinctions are always welcome! But, I think it’s important to at least draw those distinctions, rather than to play fast and loose with nomenclature, as this can adversely affect your testing habits and infrastructure.