There can be only one! Referring to the one process doing a particular job at a time, of course.
When writing cron jobs that perform tasks at regular intervals, assumptions like "the run time will always be much shorter than the run interval" can be very dangerous in an environment that is scaling up rapidly. What runs for ten minutes out of every hour today can easily take longer than an hour a few months from now if your business grows quickly. When such assumptions break, the consequences range from deadlocks to data corruption to servers dying under load.
Thankfully, there's an almost universal answer to this problem: locking! Alas, locking in itself can be a very hard problem. For the above use case, a simple flock()-based solution commonly suffices. At Booking.com, we wrote and use the Perl module IPC::ConcurrencyLimit. By default, it uses a simple, machine-local flock() locking back-end:
use 5.14.2;
use warnings;
use IPC::ConcurrencyLimit;

run();
exit(0);

sub run {
    my $limit = IPC::ConcurrencyLimit->new(
        max_procs => 1,
        path      => '/var/run/myapp',
    );

    my $id = $limit->get_lock;
    if (not $id) {
        warn "Another process appears to be still running. Exiting.";
        exit(0);
    }
    else {
        do_work();
    }
    # lock released with $limit going out of scope here
}
This simple example assumes that the /var/run/myapp directory is writable by the current user and that warnings are appropriately logged. It shows the basic usage for a case such as the one above.
Other situations require a different setup. For example, the max_procs parameter could be set much higher to allow parallel execution while still putting a limit on the concurrency, so that the jobs cannot overwhelm the system entirely.
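For instance, a minimal sketch of this parallel setup (assuming, as in the previous example, that do_work() stands in for the actual job):

use 5.14.2;
use warnings;
use IPC::ConcurrencyLimit;

# Allow up to ten workers of this kind on this machine. An eleventh
# invocation will not get a lock and simply exits.
my $limit = IPC::ConcurrencyLimit->new(
    max_procs => 10,
    path      => '/var/run/myapp',
);

my $id = $limit->get_lock;
if (not $id) {
    warn "Already running at full concurrency. Exiting.";
    exit(0);
}

# $id is the number of the lock slot (1..10) this worker holds.
do_work($id);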
On the other hand, a distributed system might require that a particular task be performed only once at a time, globally across many machines. In this case, the machine-local flock() locking back-end is not sufficient.
Luckily, the locking back-ends are pluggable. CPAN sports lock implementations that support locking via NFS shares, locking via MySQL GET_LOCK(), or locking via an HA pair of Redis servers. An experimental implementation that uses Apache ZooKeeper for locking can be found on GitHub.
When choosing a locking strategy, keep in mind that there is no silver bullet: every approach involves trade-offs.
The MySQL locking back-end can be used for cross-machine locking as follows:
use 5.14.2;
use DBI;
use DBD::mysql;
use IPC::ConcurrencyLimit;
use IPC::ConcurrencyLimit::Lock::MySQL;

my $limit = IPC::ConcurrencyLimit->new(
    type         => 'MySQL',
    max_procs    => 1,
    timeout      => 2,
    make_new_dbh => sub {
        DBI->connect("DBI:mysql:database=$database;...")
    },
);

# as before:
my $id = $limit->get_lock;
if (not $id) {
    # fail
}
else {
    # success, do work
}
We use the type parameter to switch the locking strategy, specify with max_procs that we want only one process of the given type to run anywhere, and set the lock-acquisition time-out to two seconds (the time-out is not specific to the locking back-end). Finally, we provide the locking back-end with a way to create MySQL connections for the locks. Any given MySQL connection can hold only a single GET_LOCK() lock at a time, and acquiring a second one silently releases the first, so it is important to keep this connection separate from other uses of the database[1]. MySQL's GET_LOCK() semantics are not very friendly.
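To see why, here is a small demonstration (a sketch with placeholder connection details; note that MySQL 5.7.5 and later lift the one-lock-per-connection restriction):

use 5.14.2;
use warnings;
use DBI;

# Placeholder DSN and credentials.
my $dbh = DBI->connect('DBI:mysql:database=test', 'user', 'secret');

my ($got_a) = $dbh->selectrow_array(q{SELECT GET_LOCK('lock_a', 0)});
my ($got_b) = $dbh->selectrow_array(q{SELECT GET_LOCK('lock_b', 0)});
# Both calls return 1, but on older MySQL versions, taking 'lock_b'
# on this connection has silently released 'lock_a':
my ($holder) = $dbh->selectrow_array(q{SELECT IS_USED_LOCK('lock_a')});
# $holder is undef (SQL NULL): nobody holds 'lock_a' any more.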
The API for lock implementations is really rather simple. If your favourite strategy is not yet available as a back-end, then consider implementing it for others to use!
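As a rough illustration of the shape of such a back-end (the authoritative contract is documented with IPC::ConcurrencyLimit::Lock, so treat the details below as assumptions): the class implements a new method that receives the constructor options as a hash reference and returns undef when no free lock slot is available, an id method that reports which slot was obtained, and releases the lock when the object is destroyed. A deliberately useless toy back-end might look like this:

package IPC::ConcurrencyLimit::Lock::AlwaysFree;
use strict;
use warnings;
use parent 'IPC::ConcurrencyLimit::Lock';

# Toy back-end: always hands out slot 1. It illustrates the interface
# only and provides no actual mutual exclusion.
sub new {
    my ($class, $opt) = @_;
    # A real implementation would try to acquire one of the
    # $opt->{max_procs} slots here and return undef on failure.
    return bless { id => 1 } => $class;
}

sub id { $_[0]->{id} }

# A real back-end would release its lock here: close the lock file,
# issue RELEASE_LOCK(), delete the znode, and so on.
sub DESTROY { }

1;

Assuming the usual name-resolution convention, such a class could then be selected with type => 'AlwaysFree'.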
Next week, we'll consider advanced use cases of this tool that focus on implementing daemon-like functionality without all the drawbacks.
[1] Yes, this may also qualify as abuse.