There are two hard things in computer science: cache invalidation, naming things, and off-by-one errors.
Phil Karlton
I have seen a number of variations on this quote, and it looks as though the original quote only included two of the three hard things. Martin Fowler quotes Phil Karlton here and there is more discussion of the quote here.
I don’t know whether the original quote concerned hardware caching or software caching. Since I’m not a hardware engineer, I’ll talk about software caching. When implementing software caches I have found it useful to apply the Single Responsibility Principle and divide things up along these lines:
- The thing that calculates (fetches, obtains, looks up etc.) the value we want.
- The thing that actually stores the cached value.
- The thing that decides whether the value should be cached or not.
- The thing that decides whether the cache (or a given cache entry) should be invalidated.
I wish I were better at naming things. Naming is important, good naming can show that similar things are connected and dissimilar things are separate. Good names can show patterns in the code that are otherwise difficult to see. When we’re trying to get acquainted with a new codebase, names can make make our lives heaven or hell.
It isn’t always easy to work out what “similar” means. In this comment on my tbb::concurrent_vector post, Adrian points out that std::vector guarantees contiguous storage whereas tbb::concurrent_vector doesn’t. What are the essential properties of a vector that must be maintained by anything calling itself a vector? The lack of contiguous storage in tbb::concurrent_vector hadn’t occurred to me as being noteworthy – hence I wrote a blog post that was confusing. Having a name that makes sense to you is one thing, having someone else look at it helps identify any assumptions you’ve made that you didn’t know you were making.
There is a particular joy in coming up with a good name. On the odd occasions I experience this joy it is usually because I have forgotten (in the sense of rote learning) the name of something but I’ve been able to reconstruct what the name is. At the very least it shows some consistency – my brain is working the same way now as it was when I originally came up with the name. Sometimes the quality of a name doesn’t come to light until we try and name something that’s similar – if the original name is good, we can easily derive a name for the new, similar, thing.
I am sure that off by one errors will always be with us. The half open ranges used by many languages (including C and C++) make the situation better, but not perfect. My first job was programming in FORTRAN 77 where arrays start at index 1. I had to write more +1 and -1 statements in FORTRAN than I have ever had to in C and C++. Even better was when we were calling between C and FORTRAN. Yes, we knew the array indices were off by one, but it wasn’t always easy to remember whether it was +1 or -1.
I have mentioned this piece by Dijkstra before. As well as his theoretical analysis, he also mentions that the programming language Mesa allowed for all four different range conventions, and that the “half-open, start at zero” convention led to fewer errors.