Building APIs, Part 1: Halting-Driven Development

Far too often, we developers go to solve a problem and find ourselves in the mindless drudgery of writing boilerplate. Often, this boilerplate can distract us from the bigger picture of what we’re trying to accomplish.

This is a blog-ized version of a talk I gave recently. The talk was an interactive experience which doesn’t translate directly to text, but I’ll attempt to convey the same concepts.

If I fail, just remember: It’s December! Go play in the snow!

Inspiration

My original talk was inspired by this:

[Image: gumball warning]

What is the Halting Problem?

Most developers don’t like the halting problem. It can be a terrible thing. It takes your nice, pretty code and throws a wrench into everything, because, in general, you can’t tell whether your program will ever end.

It’s a problem, like Swiss cheese. Seriously, what are all those holes for? It’s disturbing!

On the other end of the spectrum, we have exceptions. Every programmer hates to see their code explode in a big pile of exceptional mush.

Most of the time, I find exceptions to be a terrible idea. They’re very often abused to drive control flow rather than to signal exceptional conditions. Personally, I cringe most of the time I see them in production code, because they can be difficult to control and difficult to test. But there’s a funny feature about exceptions: You can use them to tell your code, “Yes, you will end, regardless of what contract you’re expecting!” and the compiler will happily go on its bit-twiddling way. Because a function that throws never returns normally, it satisfies whatever return type it promises. And you can leverage this to rapidly (and safely!) construct an API.
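
Here is a minimal C++ sketch of that claim. Every function name in it is hypothetical, and std::logic_error is just one reasonable choice of exception. Each stub declares a different return type, yet none contains a return statement; because a throw never completes normally, the compiler accepts all of them, and a caller can be written against the promised types right away.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stubs: each promises a different return type, but none has a
// return statement. A throw expression never completes normally, so the
// compiler accepts each function as satisfying its declared return type.
int CountFriends() {
  throw std::logic_error("CountFriends: not implemented");
}

std::string GetUserName() {
  throw std::logic_error("GetUserName: not implemented");
}

std::vector<double> GetScores() {
  throw std::logic_error("GetScores: not implemented");
}

// A caller compiles against the promised types today; at run time the first
// stub it reaches halts the computation by throwing.
std::string DescribeUser() {
  std::string name = GetUserName();
  int friends = CountFriends();
  return name + " has " + std::to_string(friends) + " friends";
}
```

At run time, calling DescribeUser() halts at the first stub it reaches, GetUserName(), and the exception message says exactly which piece is still missing.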

But first, let’s have a little heart-to-heart about those language critters closest to us: Data types.

Credit Where Credit is Due

Data types are one of those things that we programmers talk about all the time, but far too often gloss over when we’re actually developing. When treated with diligence, they can do a lot more than we give them credit for. They give us a lot more information than we often think. And sometimes, they deceive us.

Let’s Talk About main()

You may have seen this guy before:

int main(int argc, char* argv[]) {
   // ...
}

Anyone who’s spent 15 minutes with C/C++ will recognize this. Let’s focus on the signature here. You’ve got an array of input arguments, a count of those arguments, and a return type. Simple, right?

Well, this is what I would call a poor specification.

There are a number of problems, some better known than others. For the benefit of those less familiar with pointers, let’s set aside the complications they introduce and just talk about the actual data types being used.

  • Nothing constrains the index into the input values. You can easily request a value from beyond the end of argv.
  • The int type on argc doesn’t make much sense, since argc is a count. What would a negative number of arguments even mean?
  • The argv structure is overloaded. The first entry is usually the name of the process, while the remaining values are the additional string arguments supplied.
  • The return value is open for programmer definitions, but is almost always abused as another overloaded value. Very, very rarely will you see an executable that treats positive return values as the same category of result as negative return values (or 0). For example, negative values might indicate errors, positive values warnings, and 0 success. And this assumes the developer created those definitions with any sort of consistency in mind.

We haven’t even gotten past the first line of code and we’re already full of questions about our data types!

(I actually quite enjoy programming in C. And, in fairness to the language, the main() function was written with the constraints of CPU architecture in mind, not developer friendliness.)

(Every time I hear the word “overloaded”, or find someone describing something in a way that indicates overloaded use, warning bells start going off in my head. It’s almost always the sign of a bad decision being made somewhere. I’ll probably write a post about this some time in the future, because it’s a big problem.)

Be Clear in Your Specification

The most important feature of any API is a clear specification. Remember, developers will spend a lot more time reading function signatures than writing them. If you have multiple nuances in mind, make them explicit to the user. Otherwise, you’re just asking for pain in its use.

If you were to rewrite a main-like function in your API, I’d recommend taking each of the points raised above and separating them out into distinct inputs or outputs:

  • Rather than having separate inputs for the size of the argument collection and the collection itself, use a “smarter” data type that provides safe access to both, without providing unbounded access.
  • For special values (e.g., “the process name”), supply them as distinct input values.
  • For multiple kinds of return types, define some mechanism that can distinguish them. There are lots of ways to do this, and the topic is fodder for more than one post, so I won’t go into it here.
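
A minimal sketch of what that might look like in C++, assuming a hypothetical RunProgram entry point and Outcome enum (neither comes from any real library). The process name gets its own parameter, the arguments arrive as a std::vector that knows its own size and offers bounds-checked at(), and the result is a named category rather than a sign convention on an int. The body of RunProgram is illustrative logic only.

```cpp
#include <string>
#include <vector>

// Hypothetical result type: each category of outcome gets its own name instead
// of hiding inside the sign of an int.
enum class Outcome { Success, Warning, Error };

// Hypothetical main-like entry point. The process name is a distinct input, and
// the arguments carry their own size (and bounds-checked access via at()).
Outcome RunProgram(const std::string& processName,
                   const std::vector<std::string>& args) {
  if (processName.empty()) {
    return Outcome::Error;    // illustrative: no usable process name
  }
  if (args.empty()) {
    return Outcome::Warning;  // illustrative: nothing to do, but not a failure
  }
  return Outcome::Success;
}

// Adapter from the raw C-style interface. argv[0] is conventionally the
// process name, so only argv[1] .. argv[argc - 1] become arguments.
std::vector<std::string> CollectArgs(int argc, char* argv[]) {
  std::vector<std::string> args;
  for (int i = 1; i < argc; ++i) {
    args.emplace_back(argv[i]);
  }
  return args;
}

// Adapter back to a numeric exit code; this particular mapping is an assumption.
int ToExitCode(Outcome outcome) {
  switch (outcome) {
    case Outcome::Success: return 0;
    case Outcome::Warning: return 1;
    case Outcome::Error:   return 2;
  }
  return 2;  // unreachable for valid enum values
}

// A real main() would then shrink to a one-line adapter, for example:
//   return ToExitCode(RunProgram(argc > 0 ? argv[0] : "", CollectArgs(argc, argv)));
```

Keeping the raw argc/argv handling in small adapters like CollectArgs and ToExitCode confines the poor specification to the edge of the program; the exit codes chosen there (0, 1, 2) are just one possible mapping.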

Halting-Driven Development

Great, so you’ve read me complaining about types and exceptions, but where am I going with this?

Let’s perform a little exercise. Let’s say you’re a kid in a dorm room and you want to write a little social app to help your peers connect.

First, you might think “I need to display all of the updates from my friends”:

void DisplayPosts() {}

So far, so good! But, here we hit a snag: How do we get our friends’ posts?

Many developers will immediately start thinking about how they’re going to pull the posts out of a remote database, sort them by time posted or some other algorithm, etc.

If you’re thinking of this, stop. You’re getting bogged down in implementation details now, rather than building an API. Let’s figure out what you want to accomplish first before getting lost in those details.

And this is where our data types and exceptions come in:

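#include <stdexcept>
#include <string>
#include <vector>

using Post = std::string;  // placeholder until we know what a Post looks like

std::vector<Post> GetSortedPosts();  // declared first so DisplayPosts can use it

void DisplayPosts() {
  std::vector<Post> posts = GetSortedPosts();
}

std::vector<Post> GetSortedPosts() {
  throw std::logic_error("GetSortedPosts: not implemented");  // halt for now
}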

Let’s carefully walk through what I’ve done here:

  • I don’t know what the end result of a Post looks like just yet, so I made a simple placeholder definition (in C++, a type alias). When we come up with a better definition in the future, we can easily swap that out.
  • We know that we want our posts to be sorted, so we’re explicit in our API: We name our function to describe its behavior.
  • We’ve also been provided with two extra pieces of information:
    • The GetSortedPosts() function doesn’t require any special inputs. This implies that it has access to all of the information that it needs to perform the operation.
    • It will return a data type with what we assume are sorted posts.
  • Lastly, because we don’t care about implementation details right now, we simply throw and move on. A function that throws never returns normally, so the compiler accepts it in place of a real std::vector<Post>, and we can keep running compile checks without worrying about those details just yet.

Just a few characters of extra detail, but when well-crafted, they contain a lot of information, both to us as API designers and to other developers who will consume our API.
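
To illustrate the promise that the Post placeholder can be swapped out later, here is a sketch that assumes (hypothetically) we eventually decide a post needs an author, a body, and a timestamp. The field names are made up for illustration. Only the definition of Post changes; DisplayPosts and the GetSortedPosts signature are untouched, and the stub still halts.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical richer definition replacing the earlier std::string placeholder.
struct Post {
  std::string author;
  std::string body;
  std::int64_t postedAt = 0;  // e.g., seconds since the Unix epoch (assumed)
};

std::vector<Post> GetSortedPosts();

// Unchanged from the placeholder version: callers only ever named Post.
void DisplayPosts() {
  std::vector<Post> posts = GetSortedPosts();
}

// Still a halting stub; the swap didn't force us to implement it yet.
std::vector<Post> GetSortedPosts() {
  throw std::logic_error("GetSortedPosts: not implemented");
}
```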

Now, let’s say you can imagine your little app becoming really popular, with the occasional spam robot hijacking a friend’s account due to poor security practices. Luck is with us, as we have a fabulous spam filter to get rid of these:

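#include <stdexcept>
#include <string>
#include <vector>

using Post = std::string;

std::vector<Post> GetSortedPosts();
std::vector<Post> FilterOutSpam(std::vector<Post> posts);

void DisplayPosts() {
  std::vector<Post> posts = GetSortedPosts();
  posts = FilterOutSpam(posts);
}

std::vector<Post> GetSortedPosts() {
  throw std::logic_error("GetSortedPosts: not implemented");
}

std::vector<Post> FilterOutSpam(std::vector<Post> posts) {
  throw std::logic_error("FilterOutSpam: not implemented");
}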

This new definition also tells us some useful things:

  • It expects a std::vector of Posts and it returns a new std::vector of Posts.
  • Based on the name and type signature, a developer would naturally assume that it filters out spam from a given std::vector and returns the remainder.
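
When the time comes to fill in FilterOutSpam, the signature already tells us most of what to write. One possible sketch uses the erase-remove idiom on the copy that the by-value parameter gives us. IsSpam here is a hypothetical, deliberately naive keyword check standing in for the fabulous spam filter, not a real classifier.

```cpp
#include <algorithm>
#include <string>
#include <vector>

using Post = std::string;  // still the placeholder from earlier

// Hypothetical stand-in for a real spam filter: flags any post containing a
// known spam phrase.
bool IsSpam(const Post& post) {
  return post.find("BUY NOW") != std::string::npos;
}

// Same signature as the stub: takes a vector of Posts and returns a new vector
// holding only the non-spam posts, in their original order.
std::vector<Post> FilterOutSpam(std::vector<Post> posts) {
  posts.erase(std::remove_if(posts.begin(), posts.end(), IsSpam), posts.end());
  return posts;
}
```

Because posts is taken by value, the caller’s vector is never modified; the function filters its own copy and returns it.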

Lastly, let’s render the first five posts in our collection:

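#include <stdexcept>
#include <string>
#include <vector>

using Post = std::string;

std::vector<Post> GetSortedPosts();
std::vector<Post> FilterOutSpam(std::vector<Post> posts);
void RenderNumPosts(std::vector<Post> posts, unsigned int numPosts);

void DisplayPosts() {
  std::vector<Post> posts = GetSortedPosts();
  posts = FilterOutSpam(posts);
  RenderNumPosts(posts, 5);
}

std::vector<Post> GetSortedPosts() {
  throw std::logic_error("GetSortedPosts: not implemented");
}

std::vector<Post> FilterOutSpam(std::vector<Post> posts) {
  throw std::logic_error("FilterOutSpam: not implemented");
}

void RenderNumPosts(std::vector<Post> posts, unsigned int numPosts) {
  throw std::logic_error("RenderNumPosts: not implemented");
}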

Again, this should be pretty clear. We have a well-named function that describes what it does, and we give it appropriate inputs. (We use an unsigned int because, as main() taught us, we shouldn’t allow silly things like a negative number of posts to render when we’re talking about natural numbers.)
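
Looking ahead to filling in RenderNumPosts, here is one sketch. It assumes that “rendering” simply means printing each post on its own line, and it introduces a hypothetical FormatNumPosts helper so the output can be checked without capturing standard output. Clamping numPosts to the size of the collection keeps us from reading past the end of the vector, the same unbounded-access trap we saw with argv.

```cpp
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

using Post = std::string;  // still the placeholder from earlier

// Hypothetical helper: builds the text for at most numPosts posts, one per
// line. Clamping to posts.size() means we never index past the end.
std::string FormatNumPosts(const std::vector<Post>& posts, unsigned int numPosts) {
  std::size_t count = std::min<std::size_t>(posts.size(), numPosts);
  std::ostringstream out;
  for (std::size_t i = 0; i < count; ++i) {
    out << posts[i] << '\n';
  }
  return out.str();
}

// Keeps the stub's signature; "rendering" is assumed to mean printing.
void RenderNumPosts(std::vector<Post> posts, unsigned int numPosts) {
  std::cout << FormatNumPosts(posts, numPosts);
}
```

Note that unsigned int documents intent more than it enforces it: a call such as RenderNumPosts(posts, -1) still compiles, because -1 is silently converted to a very large unsigned value. Here the clamp happens to absorb that mistake, but the type alone would not have prevented it.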

This is pretty elementary stuff. But notice what I’ve done with the throw statements. Through the clever use of halting, we can focus on achieving the end goal before worrying about the internal implementation details.

Once we’re satisfied with the final form of the API, we can go back and fill in these details. Sometimes the implementation will show us that we made a bad assumption about how our API needs to work, but it’s a lot easier to update the API for these kinds of details after the fact while still preserving the clarity of its functionality. That clarity is very easy to lose when writing in an implementation-first manner.
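
Filling in the details doesn’t have to happen all at once. In this sketch, GetSortedPosts has been implemented against a hypothetical in-memory stand-in for the remote database (the posts, timestamps, and newest-first ordering are all assumptions), while FilterOutSpam and RenderNumPosts are still halting stubs that throw std::logic_error. Everything still compiles; the halt simply moves one step further down the pipeline.

```cpp
#include <algorithm>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using Post = std::string;  // still the placeholder from earlier

// Now implemented: a hypothetical in-memory stand-in for the remote database,
// pairing each post with a timestamp and sorting newest first.
std::vector<Post> GetSortedPosts() {
  std::vector<std::pair<std::int64_t, Post>> stored = {
      {100, "first day of classes"},
      {300, "who wants pizza?"},
      {200, "library is packed"},
  };
  std::sort(stored.begin(), stored.end(),
            [](const auto& a, const auto& b) { return a.first > b.first; });
  std::vector<Post> posts;
  for (const auto& entry : stored) {
    posts.push_back(entry.second);
  }
  return posts;
}

// Not implemented yet: these still halt, but the code around them compiles.
std::vector<Post> FilterOutSpam(std::vector<Post> posts) {
  throw std::logic_error("FilterOutSpam: not implemented");
}

void RenderNumPosts(std::vector<Post> posts, unsigned int numPosts) {
  throw std::logic_error("RenderNumPosts: not implemented");
}

void DisplayPosts() {
  std::vector<Post> posts = GetSortedPosts();
  posts = FilterOutSpam(posts);
  RenderNumPosts(posts, 5);
}
```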

I’m sure there are other names for this process of introducing exceptions as placeholders, but I like to call it Halting-Driven Development (HDD). Everyone has their own style of development that they’re comfortable with, but I’ve found that the quickest and most correct code I’ve written often comes from this style.

Examples of Halting in Various Languages

C++

#include <stdexcept>

void DoAThing() {
  throw std::logic_error("DoAThing: not implemented");
}

C#

using System;

class Stubs {
  void DoSomethingElse() { throw new NotImplementedException(); }
}

Haskell

functionFoo = undefined

Haskell is especially delicious to work in, because you can use HDD inline as you go. This allows you to ignore the specific details until they’re needed. You can start with a simple undefined as a definition, then push it deeper and deeper into the implementation until everything is defined. The following example will quite happily compile where most other languages will complain:

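-- A function that takes a list of Ints, performs some operation on it, then returns a single Int result
letsDoAThing :: [Int] -> Int
letsDoAThing =
  sum                              -- Sum up the results
  . (undefined :: [Int] -> [Int])  -- We haven't decided how to do this part yet, so we leave it undefined
  . map (+1)                       -- Here we add 1 to each element
-- The annotation on undefined pins the intermediate value to a list. Because
-- sum accepts any Foldable, modern GHC would otherwise reject the definition
-- with an ambiguous type variable.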

You can do something close to the same in other languages, but it can be a bit tricky without really powerful type inference. For example, var in C# can infer the type of a local variable, but it won’t infer a method’s signature for you, whereas you can usually get away with dropping undefined almost anywhere you’d like in Haskell. That means you can use undefined both as a placeholder for a function’s later implementation details and as a stand-in value for a data type whose definition you haven’t settled on yet (like our Post above). The compiler can usually use type inference to work out the rest.
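
For comparison, here is roughly how letsDoAThing might look in C++, using a throwing lambda as an inline stand-in for undefined (LetsDoAThing and the lambda are hypothetical names for this sketch). Notice that the placeholder’s return type has to be written out by hand: a lambda whose body only throws would otherwise have its return type deduced as void, and the line that uses its result would fail to compile.

```cpp
#include <numeric>
#include <stdexcept>
#include <vector>

// Takes a list of ints, performs some operation on it, then returns a single
// int result, mirroring letsDoAThing above.
int LetsDoAThing(const std::vector<int>& xs) {
  // map (+1): add 1 to each element.
  std::vector<int> incremented;
  incremented.reserve(xs.size());
  for (int x : xs) {
    incremented.push_back(x + 1);
  }

  // The undecided middle step. Unlike Haskell's undefined, this placeholder
  // needs an explicit return type; without "-> std::vector<int>" it would be
  // deduced as void.
  auto undecided = [](const std::vector<int>&) -> std::vector<int> {
    throw std::logic_error("LetsDoAThing: middle step not decided yet");
  };
  std::vector<int> middle = undecided(incremented);

  // sum: add up the results.
  return std::accumulate(middle.begin(), middle.end(), 0);
}
```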

Conclusion

As you can see, HDD can be described a few different ways: exception-driven development, undefined-driven development, etc.

You don’t have to use it, but hopefully you’ve learned a thing or two or thought more deeply about your preferred development style (as well as the data types you use!).
