Snippet Sunday: Mapping Over Large Numbers of I/O Resources

(This is the first in what will hopefully be a series of useful code snippets to solve problems. For each snippet, I’ll try to provide a basic solution to the problem, then ask my readers to post alternatives in the comments.)

Haskell provides some highly composable functions in its standard Prelude (imported by default) that make performing operations on large lists of data very easy.

However, when dealing with certain resources (especially I/O resources, like file handles and network connections), these simple abstractions can get you into trouble by quickly exhausting available system resources.

What follows is a (very) simple method for mapping over large numbers of resources. It's really just intended as an example for newcomers to Haskell who may have encountered this problem.

The naive approach is a simple loop over a standard mapping abstraction such as mapM. In this case, we're creating background processes for a large list of shell commands; say each process runs some analysis on a different file. But if you run this over too many commands at once, you'll exhaust the system's available file handles (or its limit on running processes).
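For reference, a naive version might look like the following minimal sketch. It assumes the process package's System.Process module, and naiveRun is a hypothetical name. Because mapM runs every createProcess call before any waitForProcess, every process is alive at the same time.

```haskell
import System.Exit (ExitCode (..))
import System.Process (createProcess, shell, waitForProcess)

-- Hypothetical naive version: mapM starts EVERY command before any is
-- waited on, so all processes (and whatever they open) are live at once.
naiveRun :: [String] -> IO [ExitCode]
naiveRun cmds = do
  hs <- mapM start cmds
  mapM waitForProcess hs
  where
    -- Launch one shell command in the background and keep its handle.
    start cmd = do
      (_, _, _, h) <- createProcess (shell cmd)
      return h
```

With a handful of commands this is fine; with many thousands, the spawn phase alone can hit system limits before a single process has been waited on.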

Instead, we'll hand the work to a batching function, batchMap: we tell it the desired maximum batch size, give it an IO function that starts a process and returns its handle, and finally pass the large list of arguments to consume. Our calling function might look like this:

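import System.Process (ProcessHandle, createProcess, shell, waitForProcess)

-- Run each shell command in xs, with at most num processes per batch.
callBatchMap :: Int -> [String] -> IO ()
callBatchMap num xs = do
   -- Start one background process and return its handle.
   let f x = do
          (_, _, _, h) <- createProcess (shell x)
          return h
   batchMap f num xs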

And our actual batching function could look like this:

batchMap
   :: (Num a, Ord a)
   => (b -> IO ProcessHandle)
   -> a
   -> [b]
   -> IO ()
batchMap ioFunc maxBatch xs =
   runBatches ioFunc [] (size, size) xs
   where
      -- A batch size of 0 would loop forever, and a negative one would
      -- never reach 0, so clamp it to at least 1.
      size = max 1 maxBatch
      -- Finish processing the final accumulated handles.
      runBatches _ acc _ [] =
         mapM_ waitForProcess acc
      -- We've hit the limit of the current batch.
      -- Wait for the current set of accumulated handles,
      -- then move on to the next batch.
      runBatches f acc (0, m) ys = do
         mapM_ waitForProcess acc
         runBatches f [] (m, m) ys
      -- Start a new process and add its handle to the accumulated set.
      runBatches f acc (n, m) (y:ys) = do
         h <- f y
         runBatches f (h:acc) (n - 1, m) ys
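The same fixed-size batching can also be expressed by repeatedly splitting the list with splitAt instead of counting down. This is a minimal sketch rather than the post's definition: batchMapWith and peakOpen are hypothetical names, the release action is passed in as a parameter so the function isn't tied to ProcessHandle, and peakOpen is only a harness that checks no more than one batch is ever open at once.

```haskell
import Data.IORef (modifyIORef', newIORef, readIORef)

-- Hypothetical generic variant of batchMap: 'acquire' opens a resource for
-- each input and 'release' waits on or closes it. Inputs are handled in
-- consecutive batches of at most 'size', and results come back in order.
batchMapWith :: Int -> (b -> IO h) -> (h -> IO r) -> [b] -> IO [r]
batchMapWith size acquire release = go
  where
    n = max 1 size  -- splitAt with a non-positive size would never make progress
    go [] = return []
    go xs = do
      let (batch, rest) = splitAt n xs
      hs <- mapM acquire batch  -- open one batch of resources
      rs <- mapM release hs     -- wait for the whole batch to finish
      (rs ++) <$> go rest       -- only then start the next batch

-- Test harness: counts how many resources are open at the same time and
-- returns the peak observed while running batchMapWith over xs.
peakOpen :: Int -> [b] -> IO Int
peakOpen size xs = do
  open <- newIORef (0 :: Int)
  peak <- newIORef (0 :: Int)
  let acquire _ = do
        modifyIORef' open (+ 1)
        k <- readIORef open
        modifyIORef' peak (max k)
      release _ = modifyIORef' open (subtract 1)
  _ <- batchMapWith size acquire release xs
  readIORef peak
```

For the post's use case, batchMapWith n f waitForProcess cmds (with f as in callBatchMap) runs the same batches as batchMap n f cmds, and additionally returns the exit codes in input order.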

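Again, this is very simple, and it has one limitation: each batch waits for its slowest process before the next batch starts, so some slots can sit idle. But this post is just meant to serve as a small code snippet for solving the problem of mapping over a large list of limited resources.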
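Because a fixed batch can't start until its slowest member finishes, a common alternative keeps a sliding window of up to N running actions using a quantity semaphore (QSem, from base's Control.Concurrent.QSem). The sketch below is one way to do that, not the post's method: slidingMap and peakConcurrent are hypothetical names, it forks one lightweight thread per input up front (cheap in GHC, but still one thread and one MVar per element), and peakConcurrent is only a harness that checks the concurrency limit.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.QSem (newQSem, signalQSem, waitQSem)
import Control.Exception (SomeException, bracket_, throwIO, try)
import Data.IORef (atomicModifyIORef', newIORef, readIORef)

-- Hypothetical sliding-window variant: at most 'size' actions run at once,
-- and the next one starts as soon as any running action finishes.
slidingMap :: Int -> (b -> IO r) -> [b] -> IO [r]
slidingMap size action xs = do
  sem <- newQSem (max 1 size)  -- clamp so at least one action can run
  vars <- mapM (spawn sem) xs
  -- Collect results in input order, re-throwing any action's exception.
  mapM (\v -> takeMVar v >>= either throwIO return) vars
  where
    spawn sem x = do
      var <- newEmptyMVar
      -- Each thread blocks on the semaphore until a slot is free, always
      -- releases its slot, and stores either the result or the exception.
      _ <- forkIO $
        tryAny (bracket_ (waitQSem sem) (signalQSem sem) (action x)) >>= putMVar var
      return var
    tryAny :: IO a -> IO (Either SomeException a)
    tryAny = try

-- Test harness: each action counts itself in, sleeps briefly, counts itself
-- out, and doubles its input. Returns the results and the peak concurrency.
peakConcurrent :: Int -> [Int] -> IO ([Int], Int)
peakConcurrent size xs = do
  open <- newIORef (0 :: Int)
  peak <- newIORef (0 :: Int)
  let act x = do
        k <- atomicModifyIORef' open (\n -> (n + 1, n + 1))
        atomicModifyIORef' peak (\m -> (max m k, ()))
        threadDelay 1000  -- 1 ms, so that actions overlap
        atomicModifyIORef' open (\n -> (n - 1, ()))
        return (x * 2)
  rs <- slidingMap size act xs
  p <- readIORef peak
  return (rs, p)
```

To use it for the shell commands above, the action could start a process and wait on it, e.g. slidingMap n (\c -> do (_, _, _, h) <- createProcess (shell c); waitForProcess h) cmds. The process package notes that calling waitForProcess without blocking all other Haskell threads requires compiling with -threaded, so that flag matters here.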

Extra credit: Define a more elegant solution and post it in the comments! This snippet is meant to serve as a “base definition” of sorts. Let’s see what alternatives you might have in mind. 🙂