A Follow Up to C# Threading Mechanisms

A few days ago, I wrote about various kinds of C# threading mechanisms, along with some Do’s and Don’ts. That spun off an interesting discussion on Facebook, which I’d like to recap a bit here.

Reset Events

When I covered ManualResetEvent and AutoResetEvent last time, I said that most users probably want ManualResetEvent. I thought I’d clarify some of their uses here:

AutoResetEvent is great when you want to constrain the availability of a resource amongst consumers. For instance, you may be trying to control access to a connection pool (perhaps for a system device or service that requires exclusive access). Or, you might have multiple consumers waiting for the next output of a queue, but you only want the first consumer to act upon it.
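Here’s a minimal sketch of that second case (all names are illustrative): multiple consumers wait on the same AutoResetEvent, but each Set() wakes exactly one of them.

using System.Collections.Generic;
using System.Threading;

class WorkQueue
{
   private readonly Queue<string> _items = new Queue<string>();
   private readonly object _sync = new object();
   private readonly AutoResetEvent _itemAvailable = new AutoResetEvent(false);

   public void Produce(string item)
   {
      lock (_sync) { _items.Enqueue(item); }
      _itemAvailable.Set();        // releases exactly one waiting consumer
   }

   public string Consume()
   {
      _itemAvailable.WaitOne();    // only the first waiter gets through per Set()
      lock (_sync) { return _items.Dequeue(); }
   }
}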

ManualResetEvent is great for initialization routines. Maybe you want to signal that an SSL handshake has completed. Or, most perniciously, a scenario I ran into at work involved managed foreground threads accessing native C++ backends while the process was shutting down. Because the threads weren’t background threads, the memory access errors they hit occasionally caused user-visible crash prompts as well as potential data corruption on the backends. To block access to the native resources (because the threads themselves are spawned from many different sources), I used a combination of ReaderWriterLockSlim objects to reset a ManualResetEvent. When the remaining foreground threads came along, the event was no longer set and the threads were safely halted.
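A very simplified sketch of that shutdown-gate idea (names are illustrative; the real code involved more coordination): while the ManualResetEvent is set, callers pass straight through; once it’s reset, every subsequent caller sees the closed gate and bails out instead of touching the native code.

using System.Threading;

class NativeGate
{
   // Starts signaled: the native backends are available.
   private readonly ManualResetEvent _ready = new ManualResetEvent(true);

   public void BeginShutdown()
   {
      _ready.Reset();              // every thread that checks from now on sees a closed gate
   }

   public bool TryCallIntoNative()
   {
      if (!_ready.WaitOne(0))      // non-blocking check; false once the gate is closed
         return false;

      // ... call into the native resource here ...
      return true;
   }
}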

Which brings me to…

ReaderWriterLockSlim

I’d like to clarify that you effectively use ReaderWriterLockSlim objects the same way you’d use a lock, but it is, unfortunately, a slightly more manual process. The easiest way is something like this:

// Acquire the lock before the try block, so the finally only runs
// once the lock is actually held.
myReaderWriterLockSlimInstance.EnterReadLock();
try
{
   // ... do some foo/bar magic here ...
}
finally
{
   myReaderWriterLockSlimInstance.ExitReadLock();
}

Acquire the lock just before the try block and release it in the finally block; that way the lock is always released, but you never call Exit on a lock you failed to enter. Use of the EnterWriteLock/ExitWriteLock methods follows the same pattern, as shown below.
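For completeness, the write-side version looks like this (same hypothetical instance name as above):

myReaderWriterLockSlimInstance.EnterWriteLock();
try
{
   // ... mutate the shared state here ...
}
finally
{
   myReaderWriterLockSlimInstance.ExitWriteLock();
}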

One caution: unlike lock, this locking mechanism is not re-entrant by default. So, be careful not to acquire it again on a thread that already holds it, or you’ll get a LockRecursionException thrown.
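If you genuinely need recursive acquisition, the class can be constructed to allow it, at some performance cost (and it’s usually a sign the locking could be simplified):

using System.Threading;

// By default (LockRecursionPolicy.NoRecursion), re-acquiring on the same
// thread throws LockRecursionException. SupportsRecursion relaxes that.
var recursiveLock = new ReaderWriterLockSlim(LockRecursionPolicy.SupportsRecursion);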

(Also, ReaderWriterLockSlim is a terrible name… It really equates to “read = concurrent operations” and “write = synchronized operations”…)

Software Transactional Memory

This is something quite useful when made available to the language in an elegant way. Wikipedia has a good write-up about it, along with links to various implementations.

One of my friends pointed me to Java’s ConcurrentHashMap (referring to it as a “godsend”). While I can’t claim to be an expert on its underlying implementation, it sounds to me as if it is using STM under the hood, or something approximating it.

STM is heavily used in Haskell, although there are different ways (and constraints) of implementing it in different languages. The general concept is that you have a container with a bias (either read or write, but usually read) and you allow any operations of the biased kind to act quickly, without locks, on the “raw” container.

For the non-biased operations, there are different approaches you can take. It sounds like Java uses a lock on the elements being updated in the ConcurrentHashMap.

If you were to implement STM manually on a container, you might do it like this: the write operation first copies the elements being modified (a lockless read), modifies the copies, then re-inserts them. Only the re-insertion has a contention constraint. If one of the modified elements turns out to have been updated by another writer by the time the re-insertion happens, the whole copy/update/insert operation starts over – this is what makes it “transactional”. This repeats as often as necessary to perform the update, which is why my previous post mentions poor performance if there’s not a clear bias to the STM container.
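Here’s a minimal sketch of that optimistic retry loop in C# (all names are illustrative, and a production version would need more care around memory visibility): readers dereference an immutable snapshot with no lock at all, while writers copy it, modify the copy, and publish it back with a compare-and-swap, retrying if another writer got there first.

using System.Collections.Generic;
using System.Threading;

class OptimisticMap
{
   // Readers just dereference the current snapshot – a lockless read.
   private Dictionary<string, int> _snapshot = new Dictionary<string, int>();

   public bool TryGet(string key, out int value)
   {
      return _snapshot.TryGetValue(key, out value);
   }

   public void Set(string key, int value)
   {
      while (true)
      {
         var original = _snapshot;

         // Copy, then modify the copy – the "transaction".
         var updated = new Dictionary<string, int>(original);
         updated[key] = value;

         // Publish only if nobody else swapped in a new snapshot meanwhile;
         // otherwise start the copy/update/insert over.
         if (Interlocked.CompareExchange(ref _snapshot, updated, original) == original)
            return;
      }
   }
}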

C# replicates Java’s ConcurrentHashMap with ConcurrentDictionary. If you’re still below .NET 4 due to some business constraint (I am such a poor soul), you can approximate it with ReaderWriterLockSlim and still get decent performance.
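If you are on .NET 4, usage is pleasantly lock-free from the caller’s point of view (a quick illustrative sketch):

using System.Collections.Concurrent;

var counts = new ConcurrentDictionary<string, int>();

// GetOrAdd and AddOrUpdate handle the contention internally;
// no external locking is needed.
counts.GetOrAdd("widgets", 0);
counts.AddOrUpdate("widgets", 1, (key, existing) => existing + 1);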

STM also has an amazing capacity for composition, if the language/library structures it correctly. Simon Peyton Jones has a brilliant tutorial on this called Beautiful Concurrency.

Parting Thoughts on Thread Performance

I’m leaving this as a “parting thoughts” segment, because it really deserves a post all of its own.

Parallel threads work best when they don’t share data. You can really get a CPU burning hot if its threads aren’t forced to switch contexts or lock access to common containers too frequently. Some might call that “cheating”, but it’s really just better algorithm design for parallel computation. If you can structure your computation independently and aggressively partition your data set, you can really get the CPU cooking.
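As a quick sketch of that idea using .NET 4’s Parallel.For (the data source here is just illustrative): each worker accumulates into its own local subtotal, so the only write to shared state happens once per partition rather than once per element.

using System.Linq;
using System.Threading;
using System.Threading.Tasks;

int[] data = Enumerable.Range(1, 1000000).ToArray();
long total = 0;

Parallel.For(0, data.Length,
   () => 0L,                                            // per-partition local subtotal
   (i, loopState, subtotal) => subtotal + data[i],      // no shared writes in the hot loop
   subtotal => Interlocked.Add(ref total, subtotal));   // one shared write per partition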

Programming for a GPU works in a similar manner. You do have to be more careful there, as the GPU is much less forgiving of mistakes in the algorithm. And in both cases, if you get your CPU/GPU running at top speed, you’re likely to become constrained instead by the I/O speed of your storage solution, or the speed of your memory bus.

Haskell even has two functions baked into its core libraries to help with this; often, all you have to do is write par and seq and the compiler/runtime will do the rest (caveat: par actually uses lightweight threads, not OS threads). The real trick is your algorithm: if you can write the expression tree for your computation in what’s called weak head normal form (evaluated only to its outermost structure, without performing the full calculation), partitioning the data is usually really easy. You can then just hand the WHNF calculation and a partition of the data to each thread.

The problem is that many languages don’t easily allow you to structure the computation this way (functional languages are really good at it, as is GPU programming). Simon Marlow has a wickedly awesome book all about this stuff (free online!), for you Haskell programmers out there.