Calendars and the commandline

At work we share our iCal calendars using webdav. There are a number of graphical applications running on Linux which can handle this, such as Evolution or Thunderbird.

But who wants to use a big, slow, graphical mail-calendar-kitchen-sink thing for accessing their calendar? Not me, that’s for sure. As with most of the rest of my desktop environment, I want a nice command-line application which does this one job and does it well.

Here’s how I did it.

A command-line calendar application

First things first: a command line calendar application. The good news is that there are loads of options here! From the venerable old-school-unix tools such as GNU cal (which seem most suited to generating calendar output rather than scheduling appointments), to CLI interfaces to more modern cloud-based calendars (e.g. googlecl), you are really spoilt for choice.

Unless, it seems, you have my specific use case, which is (apart from a command line interface) the ability to read and write iCal files. This narrows the field somewhat.

My current choice is calcurse, which is an ncurses command line application with similar key bindings to vim. By default it stores calendar data in its own format, but it can import and export to iCal.

Although calcurse is working for me for now, it’s worth exploring the options. For example, this blog post about mutt and khal looks quite promising as it seems to solve a bunch of issues I’ve had to manually work around with my calcurse setup.


Webdav

The ability to read and write local iCal files is a good start, but our calendar data is shared across the company using webdav. This isn’t something that calcurse supports itself, so I had to find another solution.

Happily, the Linux davfs filesystem provides seamless access to remote webdav resources. Setup is simple: I just dropped the following into my /etc/fstab, which allowed me to mount the webdav filesystem locally as an unprivileged user:

# davfs /home/tom/docs/dav davfs user,noauto 0 1
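As an aside: a complete davfs fstab entry carries the remote server URL in its first field. The entry below uses a placeholder URL rather than our real server, and note that davfs2 also wants credentials in ~/.davfs2/secrets so the mount doesn’t prompt for a password.

```
# /etc/fstab entry for davfs, with a placeholder server URL
https://dav.example.com/shared /home/tom/docs/dav davfs user,noauto 0 0
```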

Once mounted, I can access my calendar (and those of my colleagues) as .ics files.

Fit and finish

Having got these two fundamental pieces working, I needed just a little bit of integration to make something convenient to use. I wanted an easy way to view other people’s calendars as well as my own, and to automatically import and export between calcurse’s native data format and ical. I also wanted to automate the mounting of the webdav directory to save me doing that by hand every time I wanted to access a calendar.

Naturally enough, I wrote a simple bash script to do these things. The script has some assumptions baked into it. Firstly, whatever is passed as the first argument to the script is used as the name for the ics file in the webdav repository. This aligns with how our company calendars are arranged. Secondly, the script assumes that the default user is me, that is: “tom” (this becomes important toward the end of the script).

Beyond that, the script’s main job is data-marshalling. On startup it uses calcurse to convert the remote ics file into a local temporary file in calcurse format. This is the file that gets used while working on calendar stuff in calcurse. Then at the end of the session calcurse is used to export the file back out to ics format and the script copies it back to the remote webdav repository — but only if it’s my calendar we’ve been looking at. I don’t want to try to overwrite my boss’s calendar by accident! There’s also a bit of handling for unclean shutdown, in the case that calcurse is shut down without flushing data out to webdav for whatever reason. In this case we don’t overwrite the local working data file, on the basis that it could conceivably contain unsynchronised calendar modifications that need flushing to webdav.

Here’s the script in its entirety:



#!/bin/bash

log() { echo "$@"; }
err() { log "ERROR: $@" 1>&2; false; }
die() { err "$@"; exit 1; }

on_exit() {
    # Unmount the webdav share once the last calcurse session is done
    if test $(pgrep calcurse | wc -l) -eq 0
    then
        log "umount $DAVPATH"
        umount $DAVPATH
    fi
}

trap "on_exit" EXIT

# Site-specific settings: these paths are baked into the script
DAVPATH=$HOME/docs/dav
USER=tom

mount | grep -q $DAVPATH || {
    err "webdav path $DAVPATH doesn't appear to be mounted"
    mount $DAVPATH
}

# The first argument, if any, names the calendar to view
test -n "$1" && USER=$1

icsfile=$DAVPATH/$USER.ics
calfile=/tmp/$USER.apts
todofile=/tmp/$USER.todo
pidfile=/tmp/$USER.pid

test -f $icsfile || die "cannot locate calendar file for user $USER"

# Detect unclean shutdown; don't overwrite local data
if test -f $pidfile
then
    if pgrep calcurse &> /dev/null
    then
        die "calcurse is already running"
    fi
    err "detected unclean calcurse shutdown, don't flush local data"
else
    rm -f $calfile $todofile
    echo -e "y\nn\n" | calcurse -c $calfile -i $icsfile &> /dev/null
fi

touch $pidfile
calcurse -c $calfile
rm -f $pidfile

if test "$USER" = "tom"
then
    echo "update dav"
    calcurse -c $calfile -xical > $icsfile
fi


This scripted solution is working well enough for my day-to-day use, but having to copy the calendar data between local and remote filesystems is a bit of a hack, and it does introduce some rough edges. The main problem is that calendar updates are only written out when the script exits (so you need to quit calcurse to push updates to webdav for colleagues to see). It’s not really a problem in practice since I don’t use the calendar heavily, but I can see it becoming tedious if I used it more.

In addition to the synchronisation issue, there is a slightly more theoretical problem of a startup race condition around the script’s checks for an existing calcurse process. It would be easy enough for multiple parallel runs of the script spawned in quick succession to fail to detect one another and lead to data loss or corruption. In my usage so far it’s not an issue as I tend to run only one instance of the script at once. In the worst case I might start up another calendar in another terminal if I forget I have one already open, but the script catches that. The main issue really is that I can’t view multiple calendars at once: I can see that if I needed to schedule a meeting between multiple people that could become quite painful quite quickly.

All this said, this script plus calcurse have been meeting my (admittedly very limited) calendar requirements for a good year or so now. And while it may have a few rough edges, it brought my calendar to the command line and out of the clutches of Evolution or Thunderbird, which can only be a good thing!

Happy New Year! An integer underflow bug in the debug code…

A colleague pinged me on IRC earlier to ask whether the constant 0x30303030 meant anything special to me.  He was trying to track down a crash which was popping up on one of the embedded platforms we support ProL2TP on.  The code was falling over in the rbtree code I’d added to our utility library a while ago.  gdb told us nothing useful: the stack looked completely bonkers:

#0  0x0808e916 in usl_rbtree_remove (tree=0x30303030, node=0x30303030) at usl_rbtree.c:435
#1  0x30303030 in ?? ()

I suggested turning on debug to see whether anything popped out in the logs.  He told me he already had it enabled, and sent me a syslog snippet from around the time of the crash.  It looked something like this:

Jan  6 11:42:12 OpenWrt prol2tpd: SYSTEM: usl_rbtree_remove : pre-removal:
Jan  6 11:42:12 OpenWrt prol2tpd: DEBUG: usl_rbtree_remove(414) : match : node 0x9010ef8
Jan  6 11:42:12 OpenWrt prol2tpd: DEBUG: usl_rbtree_remove(414)    color : BLACK
Jan  6 11:42:12 OpenWrt prol2tpd: DEBUG: usl_rbtree_remove(414)    key   : 0x02000000ffffffc0ffffffa801ffffffd3000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Jan  6 11:42:12 OpenWrt prol2tpd: DEBUG: usl_rbtree_remove(414)    left  : (nil)
Jan  6 11:42:12 OpenWrt prol2tpd: DEBUG: usl_rbtree_remove(414)    right : (nil)

This rather verbose logging comes from the rbtree code as it removes a node from the tree. The interesting thing was the key — it was huge, and consisted mainly of zeros. Did that make sense?

In the context of the rbtree code, the key is a value used to determine whether a given node is logically greater or lesser than another. The rbtree code doesn’t know or care about what the key actually is, since the user supplies a callback it can use to compare two keys.

So far so good. In this case the key was actually a 128-byte structure: large, but smaller than the 270-odd characters we were printing out, even allowing for the fact that each byte would be represented as two hex characters. But even more suspicious was the fact that the printed key consisted mainly of zeros. And the magic number 0x30 which was scrawled across the stack corresponds to the ASCII code for “0”. This looked like a smoking gun! The next question was: whose finger was on the trigger?

It was high time to look at the code. The function responsible for formatting the key as a string to print to the debug log was deceptively simple, and easy to extract into a test program. Here it is, as originally written. The function name bbfmt means “binary buffer format” in C-programmer speak, and the main function shows how it is called:

#include <stdio.h>
#include <string.h>

static int bbfmt(void *ain, size_t nelem, char *aout, size_t nchar)
{
        char *_ain = ain;
        memset(aout, 0, nchar);
        nchar--;
        while ((nchar >= 2) && nelem) {
                int n = sprintf(aout, "%02x", *_ain);
                if (n < 0) return 1;
                aout += n;
                nchar -= n;
                _ain++;
                nelem--;
        }
        return ((nchar < 2) && nelem);
}

int main (int argc, char **argv)
{
        char obuf[32] = {0};
        char key[] = {
                0xf1, 0x04, 0x11, 0x42, 0x59, 0x66, 0xe8, 0x3d,
                0xf1, 0x04, 0x11, 0x42, 0x59, 0x66, 0xe8, 0x3d,
                0xf1, 0x04, 0x11, 0x42, 0x59, 0x66, 0xe8, 0x3d,
        };
        bbfmt(key, sizeof(key), obuf, sizeof(obuf));
        printf("%s\n", obuf);
        return 0;
}
The idea of this code is pretty simple. We have an arbitrary array of bytes we want to print as hex. The bbfmt function takes a pointer to that array (ain), and prints each byte as two hex characters in aout. It is careful to ensure the output buffer is initialized with 0 before it starts work (the memset call), and it is likewise careful to retain a character at the end of the output buffer as a NUL terminator for the string (the initial nchar--). Each time around the while loop it ensures that there are at least two bytes left in the output buffer for sprintf to print into. It even makes sure to check sprintf’s return value, just in case!

So what could possibly go wrong?

Well, a gold star if you see it straight off. The secret is in sprintf’s return code, which is used to decrement nchar, which keeps track of the number of characters left in the output buffer. bbfmt is written assuming that sprintf always returns 2 (which it should do on success, because that’s how many characters we asked for in the format string), or less than zero in the case of an error. But what if it returns more than two?

Well, what indeed? Since nchar is of type size_t, it is unsigned. So if (for example) nchar is 3 and we subtract 6 from it, we get an integer underflow and nchar wraps to some large number. That would mean we’d overflow our output buffer and keep writing until we’d processed all the bytes in the input buffer. And furthermore, since the output buffer is a fixed-size array on the stack, overflowing it might well end up smashing the stack!

Sounds convincing, right? But what could possibly persuade sprintf to return something other than 2? The key (boom, boom) is in the signedness of the value passed to it. The %x format takes an unsigned integer, but we were passing it a signed char value promoted to an int. Depending on what was in the input buffer, the sprintf code might end up interpreting the number as a very large unsigned value, hence printing more than the two characters we’d asked for.

Around this point in the investigation, foreheads were being smacked from the chilly moorland of Bradford all the way to the blasted heath of Leicestershire. We modified the code to declare _ain as a pointer to an unsigned char, and lo, the bug was fixed.

Lessons learned

The lessons here, for me, are two-fold.

  1. Donald Knuth said: premature optimization is the root of all evil. I’ve no reason to argue, but I say that anything involving strings in C is probably the trunk of all evil, or at the very least one of the major limbs. Be very careful.
  2. Test code needs testing. There’s no such thing as “just” debug code, since even debug code has the capacity to soak up time if it doesn’t work correctly. bbfmt looked well written, but it hid a critical bug at its core, and had I but tested it sufficiently we might have saved a lot of head-scratching today.

Build farms and BASH queues revisited

Quite some time ago, in my post on build farms and BASH queues I described a simple continuous-integration style build server written in BASH. That post concentrated on the IPC queue mechanism I used in my scripting.

Well, since I wrote that post, the build server code has been made available and now resides on GitHub:

So now you can go and check it out in detail!

How it all works 1: theory

Now that babs can be viewed in its entirety, I thought I’d write a short overview of how the system works. For a BASH script it’s reasonably sophisticated, while as a program it is fairly simple, making it an interesting study.

The design of babs revolves around a few key concepts:

    • Configuration is read from a file written in the INI file format. All configuration lives in there, from specification of the repositories to monitor, to a definition of the machines to use for building code.
    • After the master server has been started (e.g. from an init script), it can be prompted to do work by adding events to its input queue. The babs script takes command line arguments to make that easy. For example, you ask the server to scan the repositories it is monitoring like this:
      babs scan
      Within the babs codebase, event queues are just files on disk. Write access to the files is serialised using lockfiles. The server can block on events using inotify to sleep until the event queue file changes.
    • The master server has a “build pool” which consists of a number of machines that will run build jobs. Jobs are shared out between machines on the basis of who is least busy (babs’ idea of “busy” is based on how many jobs each machine has outstanding).  The job runner machines each run a slave babs process, and are driven by the same input queue mechanism that the master babs process uses.  Communication between the master and slave machines is carried out over ssh using password-less keys.
    • babs maintains build state information in a number of lists.  As with the event queues, lists are just files on disk, protected using lockfiles.  Each list entry consists of a line of text, starting with a unique ID, and followed by space-delimited arbitrary text.  Different babs lists contain different sets of information.
    • The master server provides various sets of information to the user via a command line interface.  The babs scan command above is an example of one such interface — although in real life you would probably run this from a repository checkin hook or a cron job.  More useful for human consumption are commands such as:
      babs history : to show the build history to date
      babs poolinfo : to show information about the babs slave build pool
      babs requestbuild : to request a build of a particular code revision
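To give a flavour of the queue mechanism described above, here’s a stripped-down sketch of the idea: a file-backed queue serialised with a lockfile, with inotifywait providing the blocking. The file names and event format here are invented for illustration; babs’ real queue library is more involved:

```shell
# Toy version of the babs queue: the queue is a plain file, writers are
# serialised with a lockfile, and readers sleep on it with inotifywait.
QUEUE=/tmp/babs.queue
LOCK=$QUEUE.lock
rm -rf "$QUEUE" "$LOCK"
touch "$QUEUE"

post_event() {
    # mkdir is atomic, which makes it a serviceable lockfile primitive
    while ! mkdir "$LOCK" 2>/dev/null; do sleep 0.1; done
    echo "$@" >> "$QUEUE"
    rmdir "$LOCK"
}

wait_event() {
    # Sleep until the queue file is non-empty, then pop the head line
    while ! test -s "$QUEUE"; do
        inotifywait -qq -e modify -e create "$(dirname "$QUEUE")"
    done
    while ! mkdir "$LOCK" 2>/dev/null; do sleep 0.1; done
    head -n1 "$QUEUE"
    sed -i 1d "$QUEUE"
    rmdir "$LOCK"
}

post_event scan
wait_event      # prints "scan"
```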

How it all works 2: reality

So much for the theory.  How does the process of building a piece of software work in reality?  Let’s follow the process through from start to finish.

In the beginning, the administrator creates the babs .ini file, which details the information about the repositories to check for updates, the scripts to run to actually process a build, and the pool of machines babs can command to carry out the building.  While they’re at it, the administrator also sets up a cron job which is responsible for periodically running babs scan on the babs master machine.

When the babs master machine performs its first scan, it looks up the last revision of the code that it built in its list of builds.  Of course, it hasn’t carried out a build at all as yet, so it needs to build the latest code the repository has to offer.

Now that babs has a build to process, it needs to pick a machine to run the build on.  To do so, it iterates through its list of job runner machines, and asks each in turn how long its job queue is.  The machine with the shortest queue is picked for the task.  In the case of the first ever build, none of the job runner machines will be busy, so the first machine on the list will be picked.
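The machine-selection logic above can be sketched in a few lines of shell. The pool members and the queue_length stub below are invented so the example runs standalone; the real babs asks each machine for its queue length over ssh:

```shell
# Pick the least-busy machine from the build pool (illustrative names)
POOL="builder1 builder2 builder3"

queue_length() {
    # Stand-in for something like: ssh "$1" wc -l < /path/to/queue
    case "$1" in
        builder1) echo 3 ;;
        builder2) echo 1 ;;
        builder3) echo 2 ;;
    esac
}

pick_builder() {
    local best= best_len=
    for m in $POOL; do
        len=$(queue_length "$m")
        # First machine seen, or a shorter queue, becomes the candidate
        if [ -z "$best" ] || [ "$len" -lt "$best_len" ]; then
            best=$m best_len=$len
        fi
    done
    echo "$best"
}

pick_builder    # prints "builder2", the machine with the shortest queue
```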

Once a build machine is selected from the pool, babs creates a set of scripts to tell the build machine what to do.  These scripts consist of a build script, and a report script.  The former consists of the build script the administrator originally created for the project, plus a bit of babs boilerplate code to set up a build directory and some useful environment variables (e.g. for the code revision).  The latter is a simple “callback” script that the build machine can use to report its results back to the babs master server.  The master server then passes the build and report scripts to the build machine, makes a note of the fact that the build is in progress, and goes back to sleep.

At this point, the story moves over to the build machine.  Its job is very simple really: it runs the build script, capturing the output to a build log, and then reports the results back to the babs master using the report script.  The build itself may be arbitrarily complex, since it’s driven entirely by a project-specific build script.  Neither babs master nor the build machine care about this: they just run the script.

When the build machine reports back, the babs master updates its list of builds in progress to note that the build has been completed, and pulls the report from the build machine to save it for posterity.  Finally, it performs some internal housekeeping (for example, updating its build history file), and then (optionally) emails a build summary to one or more recipients.  Then it goes back to sleep again waiting for the next scan event.


There you have it!  A multi-process, event-driven, CI build server/client script, implemented in around 1600 lines of shell (1300 lines if you ignore comments), which provides somewhat-reusable libraries for queues, lists, events, and .ini file parsing.

I’m sad I never really got a chance to use babs in anger: the testing I performed during development on small pools of build machines suggested it would work fairly well.  However, it was a fun project to implement, and now that the code is available to everyone I hope someone will find it useful, either as a fun project to play with, or as a build server for when they absolutely need something quick, to the point, and implemented in BASH :-)

Do it, or don’t do it

Who can possibly remember all the tender words, bon mots, and loving phrases that make up even a single day of married life?  Not me, that’s for sure.  But sometimes certain sentences lodge in the brain, and I’ve had one echoing in my mind for a while now.  My wife and I were discussing some minutiae of the daily grind, possibly something about who was slated to change the next nappy, when Flora gifted me with this pithy insight: “Do it, or don’t do it”.

Although I’m being a bit frivolous in my presentation, this little sentence has stuck with me, and as I’ve thought about it more over time I do believe it to have a certain profound truth.  In life you may do something, or you may not do something: and all else is just an echo of your action or inaction.

Open Source, Raspberry Pi, Codecademy, and a determined 13-year-old

This video of 13-year-old Amy Mather’s presentation at the 2013 Raspberry Jamboree in Manchester is inspirational:

It’s all very well having iPads and iPhones… I just wanted to know how it works, and how I could get it to do what I wanted it to do rather than what the people who are working for Apple or for Android wanted it to do

That’s the Hacker Spirit, right there :-)

What makes a good tester?

On a recent project, a test engineer commented to me: “Sure, I’ll open a bug, but I won’t be able to include information on how to reproduce it because I don’t know what’s causing it”.

As a developer, that’s a frustrating thing to hear. Of course we don’t know what’s causing the problem at this stage, otherwise we’d probably have fixed it already. Surely it’s the tester’s job to figure out the conditions under which the bug occurs?

Well, maybe, and maybe not. A lot of testers I’ve worked with haven’t seemed to see things that way. They’ve simply run through the test procedure, noted which things worked and which didn’t, and called it a day. While that approach is useful to an extent (indeed, it is necessary to get a full picture of the state of the software at the point of testing), it stops the testing process at exactly the point I think a really good tester starts to come into their own.

When I’m working with a tester, I want to feel like they’re providing the project with some Yin to the Yang of development work. We should ideally be involved in a kind of a dance: I am trying to build something with no faults; the tester is trying to find the faults in the thing I build. There should be a kind of balance there; an ongoing collaboration to ensure that the output product is as good as we can make it.

Beyond that, the testers I want to work with are proactive. They’re the kind of people who will dig into a problem, carrying out experiments to discover more about the nature of the fault. They’re the kind of people who will look beyond the obvious to discover the subtleties of a bug. They might even dig into the source code to see if they can see anything wrong.

The testers I want to work with will actually be developers in their own right. They may not be working on the project code itself, but they should be working on their own test automation tools. Each piece of our project’s behaviour that is captured in an automated test case removes a little more manual test drudgery and frees a little more time for more creative work.

Finally, the testers I want to work with see each new bug in the database as a success, not a failure. If you see a bug as a bad thing, you tend to shy away from them and try to pretend they’re not there. But in reality, a bug captured under test is a bug that isn’t making it out to our customer. Far from being a bad thing, a bug captured and logged before the product is released is a success of the team and the process.

Neither a rockstar nor a ninja

I watched the largely-fantastic promotional video for the newly-launched Code.org earlier this week. If you’ve not watched the video or visited the website, Code.org is another project which aims to get more kids coding. It’s a great idea, and it heartens me to think that my daughter’s generation may grow up in a world where the ability to code is as common as the ability to write. That generation may truly be the master of the computer, rather than its slave.

But while I embrace the concept, I have some reservations about the tone of this promotion. Just as I’ve felt slightly uncomfortable over the last few years as I’ve skimmed job ads for various trendy startups calling for “code ninjas” or “rockstar developers”, I’m uncertain as to exactly what kind of message Code.org is giving out to kids. The promo video, alongside many fine and positive messages, also typifies what I think are some common, but misleading ideas.

Now, before I criticise, I should reiterate that Code.org seems to me a fine idea, and the video is largely great. I think it is inspirational, and it does give some idea of the excitement and the power of coding. It makes programming look like something that everyone and anyone can and should have a go at, and that’s fantastic.

Looking beyond those positive messages, though, I start to struggle. As the camera tracks through glossy shots of attractive young folk wheeling around on scooters in offices jam-packed full of video games, free food, band rooms, and tastefully graffiti’d walls, I’m forced to wonder when I last even saw an office like that, let alone worked in one. As Gabe Newell enthuses about the developers of the future having magical abilities, or Drew Houston describes coding skills as being akin to having “super powers”, I’m thinking maybe they’ve had too much coffee. And when will.i.am ends the piece with the observation that “Great coders are today’s rock stars”, he’s just making the same lazy comparison as the job ads I mentioned before.

Where these comparisons come from is difficult to determine. Rock stars deal in cocaine, groupies, and smashed hotel rooms. We all know this. Developers are more Apple Mac, obscure discussion about Python exceptions, and pints of real ale. There is really very little overlap. The same goes for the code ninjas that startups everywhere fantasise about hiring. While there are coders who wear a lot of black and obsess over martial arts, it doesn’t actually make them ninjas. None of these people have been called upon to assassinate prominent figures in feudal Japan.

The real frustration of these foolish and nonsensical labels is that we don’t need them. Coding can and should be promoted on its own merits, not by pretending to kids that they’ll be the next Led Zeppelin.

Here’s the truth, kids: life as a coder is grand. I get to build fascinating things that will be used by thousands, perhaps millions of people. My job has taken me all over the world, from California to New Zealand. I’ve never been out of work since graduating from University nearly ten years ago, and based on my standard-issue crystal ball there’s no reason to think of the future as anything but rosy. Right now, I’m working from my home office on one of the coolest Open Source projects there is, the Linux Kernel, keeping my own hours and playing with my daughter in my coffee breaks. I’m not a rock star or a ninja, but otherwise things are just fine.

So if you enjoy working with computers, and the kinds of creativity that affords you, then by all means think about a career as a programmer. At the very least, you should give some coding a go, whether in a class at school or using one of the websites linked above. Programming can be really good fun, and even if you don’t end up changing the world (or even just getting a job!) off the back of it, it will still be a useful skill to learn, and you’ll gain some insight into the world around you.

But don’t be fooled. Coding is cool and fun and amazing. But it’s not magic. It’s not a super power. It takes work, and effort, and dedication, just like anything else that is worth doing. The people that work in those glossy offices with the video games? They have to actually knuckle down and bang out saleable stuff, just like everyone else. At the end of the day, work is work, and some of it is less fun than other bits, and sometimes you just have to grit your teeth and get on with it. And no number of skateboarding ninja rockstars scoffing free food and jamming in the band room are going to change that.

How to debug in seven easy steps

OK, so that title is a lie.

This article won’t teach you how to debug, because the simple answer to the question “how can I debug this code?” is this: use your intuition. Intuition comes from years of practice and directed learning, not from a blog post.  But where intuition fails, science prevails, and that is what this post is really about: a scientific approach to debugging software. This approach may be slow, and it may be tedious, but it does work.

Step 1 : look, but don’t touch

When you first hit a bug, you’ve got a golden window of opportunity to make observations. Don’t waste it! Forget debuggers and text editors, your primary tools at this stage are a pen and a notebook. Put one of each in your hands and leave the keyboard alone while you look at what’s going on, and write everything down. I’m not even kidding! Write it down, on paper. This will force you to slow down and actually look at the system, as well as providing a record for later that you can use to sanity check your hypotheses about what’s causing the problem. The most important thing to do at this stage, besides careful observation, is to keep an open mind. Don’t even think about what the cause of the bug might be. If you do you’ll subconsciously ignore information that doesn’t correspond to your initial analysis, and you can’t afford to miss details at this stage.

The first step in the process of observation is to note down everything you can observe without interacting with the system. Did the code core dump? Did it output anything on stdout? Does it appear to have locked up? What happened just before the event? What state is the rest of the system in? Are there unusual peripherals attached? Are you running a special build? The details will depend on your system, of course, but you need to get it all down on paper.

Once you’ve captured all you can without touching the system, you can move on to making unobtrusive probes of system state. By this I mean you can run tools to show you what the CPU is doing or which processes are using memory. You can look at logfiles of other applications to attempt to capture any IPC interactions leading up to the bug. You can look at working files in the filesystem, examine timestamps, check open file descriptors, and so on. Use your imagination and your domain knowledge to capture more information without directly changing the state of the system. Just as before, make sure you write everything down!
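By way of example, here are the sorts of unobtrusive probes I might run on a Linux system (using the shell’s own PID, $$, as a stand-in for the process under investigation):

```shell
# Unobtrusive probes: none of these disturb the suspect process
ps aux --sort=-%mem | head -n 5   # which processes are hogging memory?
ls -l /proc/$$/fd                 # the process's open file descriptors
ls --full-time /tmp | head -n 5   # working files and their timestamps
dmesg 2>/dev/null | tail -n 5     # recent kernel messages (may need root)
```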

Finally, when you’ve captured absolutely everything you can extract without changing anything, you can start probing more invasively. At this point, you can try attaching to the process with a debugger, or analysing the coredump. You can try sending signals to applications that appear to have locked up. You can start removing lockfiles, unplugging hardware, and poking buttons. In short, the gloves are now off and you can do what you like. By the end of this the system state will be completely destroyed, and your notebook will be full of observations.

Step 2 : come up with a hypothesis

I hope you enjoyed that time at the keyboard, because now you’re going to walk away from the computer for a bit. Take your notebook and pen, find a quiet place, and sit down with a coffee for a quarter of an hour to review your notes. Maybe get a colleague in to bounce ideas off. Your goal now is to come up with an idea about what’s going on with your system. Once again, try to keep an open mind! Don’t shortcut the process by running with the first idea you have: instead, take the time to fully think out the whole of the domain and figure out what could possibly account for the observations you’ve made. You should be able to come up with loads of ideas. I want everything from the obvious (“I think we overran our string buffer”) to the outlandish (“I think the kernel’s corrupted the VM page table”). Get them all written down in your notebook.

When you’ve finished brainstorming all these ideas, it’s time to wander around the lab for a bit to decide which seems most likely. At this point you’re engaging your intuition, hopefully for the first time in this process. Use your experience to filter your previous suggestions into three categories, ranging from highly likely to highly unlikely. Then pick your favorite from the pool of highly likely causes. This idea is now your hypothesis, so write it in your book under the heading “Hypothesis 1”.

Step 3 : design an experiment to validate your hypothesis

Now you’ve figured out your hypothesis, please refrain from running back to your computer to hack the code. You’re not ready to code yet! Instead, go back to the quiet place you used to brainstorm possible causes, and turn to a fresh page in your notebook. What you need to do now is design an experiment. If this sounds a lot like a high-school physics lesson, you’re probably getting the hang of this ;-)

What you’re looking for from your experiment is a set of output data which will tell you something about your hypothesis. Ideally they should either prove or disprove it, so think about what you’d need to do to conclusively demonstrate your hypothesis to be true or to be false.  Write these things down. Think also about what sort of quantity of data you need to give you confidence in your results. At this point you don’t know whether the bug happens regularly or is a freak occurrence, so you’ll need to gather a “control” dataset to compare your experimental results against. Write down exactly what data you need to gather, and in what quantity. Write down how you’re going to capture the data and how you’ll store it. Make notes on any special equipment you’ll need, and what versions of software you propose to test.
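To make that concrete, here's the sort of trivial harness I might write for such an experiment. Everything here is hypothetical: assume the bug shows up when running `./myapp --selftest`, and that we want a fixed number of runs logged for later analysis.

```shell
# Run the (hypothetical) failing case a fixed number of times,
# logging each result so the data can be analysed later.
runs=100
log=experiment-1.log
: > "$log"

for i in $(seq 1 "$runs"); do
    if ./myapp --selftest >/dev/null 2>&1; then
        echo "run $i PASS" >>"$log"
    else
        echo "run $i FAIL" >>"$log"
    fi
done

echo "failures: $(grep -c FAIL "$log") / $runs"
```

The same script, pointed at a known-good build, gives you the control dataset to compare against.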

Step 4 : run the experiment, and capture the results

Happy days: it’s your chance to go back to your computer! But don’t get giddy at this point. All you need to do is run the experiment you just designed. Make sure you capture the data you need, and make sure you note any extra observations you make during the course of the experiment. But don’t do anything more. In particular, don’t start trying to test something else while you’re at it. Stick to the script: run the experiment.

Step 5 : analyze the results

Here’s the fun part. Take a look at the data you’ve gathered, and figure out whether they prove or disprove your hypothesis. If the results are conclusive, then you’re done! You can code up a fix, check it in (along with your experimental results!), and move on. But even if the results are not conclusive, that’s still a valid and useful outcome. You should now know more about the problem than you did previously, and you should be able to design a further experiment to glean yet more. Whatever your analysis, it will probably not surprise you to learn that I want you to write it down in your notebook. Read it back to yourself and check it makes sense. Remember you’re doing science here, and a key part of good science is peer review. You should be able to present your notebook to your most esteemed colleagues with pride and confidence, so make those notes good!
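By way of example, if your experiment logged one line per run with a PASS or FAIL marker, the analysis can start as simply as comparing failure counts between the runs. The file names here (`control.log`, `experiment-1.log`) are hypothetical.

```shell
# Compare failure counts between the control and experimental runs.
ctrl=$(grep -c FAIL control.log)
expt=$(grep -c FAIL experiment-1.log)
echo "control: $ctrl failures; experiment: $expt failures"
```

If the two counts differ markedly, that's evidence bearing on your hypothesis; if they don't, that non-result goes in the notebook too.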

Step 6 : rinse and repeat

By this stage in the process, you may have solved your problem. But if you have not, you will have at least uncovered something more about the nature of the problem. In this case, your next step is to retreat from the keyboard once more to review your previous list of hypotheses in light of your new knowledge. You may be able to strike some from the list, or add new ones. You’ll almost certainly be able to move some between the three categories of likelihood that I proposed earlier.

While you’re reviewing your progress, try your best to get out of the mindset of investigating your original hypothesis.  Instead, return to the open and objective outlook you had before when you were doing your brainstorming.  With this open mind, decide what hypothesis to explore next, and then head on back to Step 3 for the next stage of the investigation.

Step 7 : there is no step 7

There is no step 7 because steps 1 to 6, applied enough times, will solve your problem.  It may take you a long time — but eventually you’ll get there.  That’s the power of science!

Despite the certainty of this debugging method, I bet you a mixed bag of sweets that you don’t know anyone who actually debugs like this. That’s for one of two reasons. Either your colleagues are just running in circles and debugging by suspicion (“back that last change out, and see if that fixes it!”), or they’ve done enough debugging work to balance intuition and science in their heads all the time as they work. The latter is quite a trick to pull off, but it’s basically what all software engineers do. The trouble with it is that it’s really easy to lose time through failed intuition or biased thinking. In my experience, it’s rare to avoid these pitfalls entirely. Some wizards can do it, but mere mortals may struggle.

With that in mind, I recommend giving this method a try the next time you’re faced with a difficult bug. At the worst, it’ll take longer than a purely intuitive approach. But at best it’ll yield incremental progress as opposed to intuition’s random stabs in the dark. I know which one I prefer.

Home recording

When I was at secondary school I managed to get my hands on a four-track tape machine for the duration of the Easter holidays. I used it to write and record all of my GCSE music composition work over the course of the two-week holiday. It was a magical time: I had a recorder, a guitar, a computer for producing noises using simple tracker software (the venerable and most excellent FastTracker II), and lots of uninterrupted time. I loved the flexibility and immediacy of having a studio right there in my room, ready to go at a moment’s notice. It wasn’t exactly a high-tech setup, but the accessibility more than made up for that.

Fast forward a few years and I was at University, in the process of realising that computers had become powerful enough to easily take on the duties of the old four-track tape machine. In fact, they did even more. Without a great outlay on specialised hardware it was possible to assemble, in software, the kind of studio that would have made the early multitrack pioneers feel faint. I thought about that quite a lot, and came to the conclusion that this progress was bringing a kind of musical meritocracy. Multitrack audio and sophisticated mixing used to require an expensive studio. Now it required a modest PC. Worldwide distribution used to be the province of the large record companies. Now worldwide distribution was there for anyone with an Internet connection.

The revolution, however, was not entirely without its limitations. Sure, I could record very easily with a simple PC and a cheap microphone, and indeed, my first forays into completely acoustic recording employed my guitar tuner’s built-in microphone. But while the results had their charms, they weren’t going to blow any minds on the fidelity front. So I came to realise that one needed at least some hardware. At a minimum, a decent PC, a decent multi-channel soundcard, a few microphones, a mixer, and some monitors. With such a setup I was able to record at a quality far surpassing the demands made by my musical talents, and distribute the results on my website for a low outlay.

Over the course of not very many years, then, the world of recorded music had gone from being something only rich corporations could afford, to being something that hobbyists could afford.

The interesting thing is that the trend doesn’t seem to be stopping. Whereas five years ago a decent audio interface would cost multiple hundreds of pounds, these days the redoubtable Mackie sell high-quality USB interfaces (which could pretty much cover both sound card and mixer roles in a bedroom studio setup) for less than a hundred pounds. And you can upload your music to YouTube for instant exposure, with no monthly hosting costs necessary.

What this means for music, I’m not entirely sure. It’s not a new observation that the traditional mechanisms driving the music industry are losing their relevance. But lowering barriers to entry, and reducing the transaction cost of participation, can only encourage musicians to make more music, whether they’re world-famous megastars or unknown bedroom strummers. In five or ten years’ time we might be listening predominantly to music recorded right there in the homes of our favourite musicians, captured by chance simply because recording is so darn cheap now it doesn’t make any sense not to have tape rolling every time you play. I’m looking forward to hearing what that sounds like.