I recently wanted to implement a simple build farm for our build servers at work. Currently we have a pool of three machines whose job it is to watch the repository for checkins on various projects, and build each update to ensure that the build is still in working order.
Our present implementation has a per-project script which runs on each machine as a cronjob. This script is responsible for checking the SCM for updates, checking out updated code, running the build, and emailing the results out to the project team. So far, it works well, but it has a number of limitations. Firstly, a lot of the code for checking the SCM, checking out code, and emailing people is common across the per-project scripts. While this code doesn’t change much, it’s a shame to duplicate it unnecessarily. Worse than that, however, is that each machine in the pool runs only one project build. If one project has a busy day, then that machine may be swamped while all the others are idling. We could get a greater build coverage by sharing the load between all the build machines.
How best to do this? I decided early on that a master-slave configuration would be easiest to implement and administer. All the configuration for the builds could live on the build master, while the slaves could be relatively “dumb” machines easily added to (or removed from) the pool. I also decided that managing and running the build processes was essentially sequential: you check the SCM, check out some code, run a build, and email people.
Beyond this basic characterisation of the task at hand, I also had some restrictions which limited my implementation choices. I needed to use a language which would let me implement the build farm, meaning that a scripting language was going to be the best choice. I also wanted something that could be easily maintained by the whole team. These two restrictions combined to suggest shell script as the most sensible choice of language. This wasn’t such a bad idea: more or less every Linux box has Bash, and it’s easy to use Bash to leverage other common Linux command line tools.
With these restrictions in mind, I came up with the basic design of the system. The master machine would be configured via an ini file which would specify the projects to be built, and the machines in the build pool. The master would be an event-driven system centred around an input queue. Firstly, a “scan” request on the input queue would trigger the master to examine the project SCMs for code changes. If any changes were detected, the master would generate a build script for that project (based on shared library code combined with a small piece of project-specific code) and send the script to a job runner machine for processing. Once the build had completed, the runner machine would generate a “job complete” event on the master’s input queue, which would trigger report emails from the master to the project team. The inter-machine communications would be managed using ssh and scp.
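As a rough sketch of the event-driven master described above, the dispatch step might look something like the following. The event strings (`scan`, `job-complete <project>`) and the handler bodies are hypothetical placeholders chosen for illustration, not the real build farm code; the actual system would examine the SCMs and send email where this sketch just prints a message.

```shell
#!/bin/bash
# Dispatch a single event taken from the master's input queue.
# The event format ("scan", "job-complete <project>") is an assumption
# made for illustration only.
# $1 -- event string
handle_event() {
    case "$1" in
        scan)
            echo "scanning project SCMs for changes"
            ;;
        "job-complete "*)
            # Strip the event name to leave the project name
            echo "sending build report for ${1#job-complete }"
            ;;
        *)
            echo "unknown event: $1" >&2
            return 1
            ;;
    esac
}

handle_event "scan"
handle_event "job-complete myproject"
```

Keeping every event as a single line of text means the queue itself only ever has to deal with lines, which is what makes a plain file a plausible backing store.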
The tricky bit turned out to be implementing the queues, which are quite an essential element of the system. The canonical shell IPC mechanism is the named pipe (fifo). The trouble with fifos, insofar as my design goes, is that it is necessary to have a process listening on a pipe in order for a blocking write into the pipe to return. Try it for yourself!
# Create a fifo, then write some data into it
# This will block until you read from the fifo
tom@gibbon:~$ mkfifo /tmp/myfifo
tom@gibbon:~$ echo "Hello world" > /tmp/myfifo
# Now open a new terminal and read from the fifo
tom@gibbon:~$ cat /tmp/myfifo
Hello world
# Your original echo command will now return in your first terminal
This wouldn’t work too well in my scheme because I was looking to push events around the system using ssh. If the job runner wanted to report the results of a build to the master and had to wait around until the master was ready to read the result, that wouldn’t be ideal.
If fifos were out, then, perhaps a regular file would do. If the queue was a plain old file, it would be easy enough to append data to the end of the file, and also easy enough to read data from the front of the file. Sounds good so far. The trouble with a regular file, however, is two-fold. Firstly, you can’t tell when something has been written to the file without periodically polling it, something like this:
# Create a regular file, and wait for it to grow
tom@gibbon:~$ touch /tmp/myqueue.txt
tom@gibbon:~$ while true; do test $(stat -c %s /tmp/myqueue.txt) -gt 0 && break; done
# Now open a new terminal and echo into the file
tom@gibbon:~$ echo "Hello world" > /tmp/myqueue.txt
# Your while loop will now return in your first terminal
# Note your CPU use has rocketed because of the spinning while loop!
This works reasonably enough, especially if you add a short sleep in the while loop to prevent hammering the CPU. Polling is a bit ugly, though. It would be much nicer if you could somehow block on the file changing. Happily, there is a solution, in the form of the excellent inotify-tools, specifically inotifywait:
# Create a regular file, and wait for it to be modified
tom@gibbon:~$ touch /tmp/myqueue.txt
tom@gibbon:~$ inotifywait -e modify /tmp/myqueue.txt
# Now open a new terminal and echo into the file
tom@gibbon:~$ echo "Hello world" > /tmp/myqueue.txt
# inotifywait will now return. Note your CPU hasn't been
# working overtime :-)
The second part of the two-fold trouble with a regular file is that two processes can modify the same file at the same time with unpredictable results. We need some kind of locking infrastructure. The following approach works well, making use of the shell’s “noclobber” mode to reduce races between checking the file exists and writing to it:
# $1 -- filename
lock() {
    while ! ( set -o noclobber; echo "$$" > "${1}.lock" ) 2>/dev/null
    do
        sleep 1
    done
}
# $1 -- filename
unlock() {
    rm -f "${1}.lock"
}
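To see what noclobber is doing here, it can help to try a single, non-blocking attempt. The sketch below uses a hypothetical `try_lock` helper (it is not part of the scripts in this post) which tries once to create the lockfile and reports the outcome through its exit status. With noclobber set, the redirection fails if the lockfile already exists, so the check and the creation are a single step.

```shell
#!/bin/bash
# Hypothetical helper: make one attempt to take the lock, without waiting.
# Returns 0 if we created the lockfile, non-zero if it already existed.
# $1 -- filename
try_lock() {
    ( set -o noclobber; echo "$$" > "${1}.lock" ) 2>/dev/null
}

demo=$(mktemp)                      # scratch file to lock
try_lock "$demo" && echo "first attempt: got the lock"
try_lock "$demo" || echo "second attempt: lock already held"
rm -f "${demo}.lock"                # release the lock
try_lock "$demo" && echo "third attempt: got the lock again"
rm -f "${demo}.lock" "$demo"        # clean up
```

The blocking `lock` function is then just this attempt repeated in a loop until it succeeds.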
Note this is a “busy” lock — that is, we’re periodically doing something while waiting for the lock to become free. In this case, checking to see whether the lockfile exists. This can be improved upon by leveraging inotifywait once more:
# $1 -- filename
lock() {
    while ! ( set -o noclobber; echo "$$" > "${1}.lock" ) 2>/dev/null
    do
        inotifywait -e delete_self "${1}.lock" &> /dev/null
    done
}
# $1 -- filename
unlock() {
    rm -f "${1}.lock"
}
Now the locking process simply sleeps until the lockfile has been deleted by the process holding it. Insofar as interprocess locking goes, this is pretty good, but there is one remaining gotcha: it is possible to create deadlocks should the process holding the lock unexpectedly exit before releasing the lock. We can solve that with a trap:
# $1 -- filename
lock() {
    while ! ( set -o noclobber; echo "$$" > "${1}.lock" ) 2>/dev/null
    do
        inotifywait -e delete_self "${1}.lock" &> /dev/null
    done
    # We now hold the lock; make sure it is removed if we exit.
    # The filename is quoted so that paths containing spaces still work.
    trap "rm -f -- '${1}.lock'" EXIT
}
# $1 -- filename
unlock() {
    rm -f "${1}.lock"
}
The downside of this, of course, is that it rather rudely overrides any existing trap set for EXIT (which is a Bash pseudo-signal rather than a real signal such as SIGTERM). Is it possible to save the previous trap and restore it later? I leave this as an exercise for the reader.
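For readers who would rather skip the exercise, here is one possible approach, offered as a sketch rather than a tested part of the build farm. It relies on `trap -p EXIT`, which prints the current EXIT trap as a command that can be evaluated again later. The function names and the `SAVED_EXIT_TRAP` variable are made up for this example.

```shell
#!/bin/bash
# Remember whatever EXIT trap is currently installed.
# `trap -p EXIT` prints it in a form that can be fed back to the shell,
# or prints nothing if no EXIT trap is set.
save_exit_trap() {
    SAVED_EXIT_TRAP=$(trap -p EXIT)
}

# Reinstall the remembered EXIT trap, or clear the trap if there was none.
restore_exit_trap() {
    if [ -n "$SAVED_EXIT_TRAP" ]; then
        eval "$SAVED_EXIT_TRAP"
    else
        trap - EXIT
    fi
}

# Demonstration, run in a subshell so the traps do not leak out
(
    trap 'echo "original handler"' EXIT
    save_exit_trap
    trap 'echo "temporary handler"' EXIT   # e.g. the trap set by lock()
    restore_exit_trap                       # e.g. called from unlock()
)
```

In the locking code, `lock` would call `save_exit_trap` just before installing its own trap, and `unlock` would call `restore_exit_trap` after removing the lockfile. Note that this only remembers one saved trap at a time, so it would not cope with holding several locks at once.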
Putting all of this together, then, we can implement a fairly nice queue interface for Bash:
# Lock the queue to prevent access from another process
# $1 -- queue file
queue_lock() {
    while ! ( set -o noclobber; echo "$$" > "${1}.lock" ) 2>/dev/null
    do
        inotifywait -e delete_self "${1}.lock" &> /dev/null
    done
}
# Unlock the queue to enable writing again
# $1 -- queue file
queue_unlock() {
    rm -f "${1}.lock"
}
# Wait on the queue being modified
# $1 -- queue file
queue_wait() {
    test -f "$1" && inotifywait -e modify "${1}" &>/dev/null
}
# Add an entry to the queue
# $1 -- queue file
# $2 -- entry
queue_push() {
    echo "$2" >> "$1"
}
# Remove the first entry from the queue and print it
# $1 -- queue file
queue_pop() {
    local e
    e=$(head -n1 "$1")
    # Rewrite the file without its first line
    awk 'NR != 1 { print }' "$1" > "${1}.new" && mv "${1}.new" "$1"
    echo "$e"
}
# Get current length of the queue (0 if the file does not exist)
# $1 -- queue file
queue_length() {
    if test -f "$1"
    then
        wc -l < "$1"
    else
        echo 0
    fi
}
This test script demonstrates the queue in action:
#!/bin/bash
QUEUE=/tmp/queue
# include the queue interface
. "$(dirname "$0")/libqueue.sh"
# Start from an empty queue file so that queue_wait has something to watch
: > "$QUEUE"
#
# Some wrapper functions for queue addition/removal
#
# $1 -- n items
add_to_queue() {
    local i=0
    for ((i=0;i<$1;i++))
    do
        queue_lock "$QUEUE"
        queue_push "$QUEUE" "Item $i"
        queue_unlock "$QUEUE"
        echo "> pushed : Item $i"
    done
}
# $1 -- n items
remove_from_queue() {
    local i=0
    local err=
    local ent=
    for ((i=0;i<$1;i++))
    do
        test 0 -eq "$(queue_length "$QUEUE")" && queue_wait "$QUEUE"
        queue_lock "$QUEUE"
        ent="$(queue_pop "$QUEUE")"
        queue_unlock "$QUEUE"
        test "$ent" = "Item $i" && err="" || err="ERROR"
        echo "< popped : $ent $err"
    done
}
# Spawn subshell processes to add to/remove from the queue
( add_to_queue 1000 ) &
( remove_from_queue 1000 ) &
wait $!
And there you go! Multiprocess, block-free queues in Bash. Huzzah!