SQL Zone is brought to you in partnership with:

I’m a software engineer. I got my first computer when I was 7 and have loved programming ever since. I’ve been developing corporate web applications since 1999, mostly using Java based technologies, but I like to try and explore a little bit of everything interesting going around the web/mobile technology scene. I like hard rock, RPG video games (all flavors), science fiction and fantasy books, and almost all movies (unless they star Sandra Bullock or Hugh Grant). Ricardo is a DZone MVB and is not an employee of DZone and has posted 16 posts at DZone. You can read more from them at their website. View Full User Profile

Using a Relational DBMS as a Multi Server Concurrency Control

03.28.2012
| 2859 views |
  • submit to reddit

Sometimes you just don’t want or need the complexity of a fancy distributed transaction manager or complicated RPCs to coordinate a concurrent job amongst a farm of servers.

Here is a very simple and efficient way to achieve a lock/release type of concurrency control in a farm-like architecture using nothing but a few SQL sentences and your relational database.

Imagine the following scenario: A web app that runs on multiple servers/instances where users activity is balanced. All servers are clones, and you want to keep it that way so adding/removing servers from the farm remains easy and safe; and you need to run some background scheduled job, not triggered by user activities.

To keep things simple (operations wise), you want to keep those batch jobs inside your app and you don’t want/can’t afford to integrate a fancy multi-server transaction manager.

For example, you want to implement a background job that periodically checks the price of stocks and sends an email to all subscribed users. Since all the servers in your farm are clones, this job will be running on all of them, and that’s a good thing, because if one fails any of the other can pick it up. But on the other hand you don’t want to send 25 emails to each user every time, if you have 25 servers in your farm.

So our goal is to ensure only one server performs the job, but if it doesn’t, any other can realize it and do it himself.

The solution is really simple, we define a table in our database with the following structure:

CREATE TABLE job_status (
    job_id  varchar(254) NOT NULL,
    running_at  varchar(254) NULL,
    last_run_started datetime NULL,
    next_run datetime NULL,
    PRIMARY KEY(job_id)
)

And pre-insert one row for each job (only one in our example, but it’s nice to have support for several from scratch, right?) with only the job_id field set (all other fields can start as NULL).

Assuming we code our batch job as an endless loop with a pause between runs, when each cycle starts we should execute the following statement:

UPDATE job_status
SET running_at=<NODE_IP_OR_ADDRESS>,
    last_run=<CURRENT_DATETIME>,
    next_run=<CALCULATED_DATETIME_NEXT_RUN>
WHERE job_id=<JOB_IDENTIFICATION> AND
      (
        (running_at IS NULL AND next_run <= <CURRENT_DATETTIME>)
        OR
        (running_at IS NOT NULL AND last_run_started <= <CALCULATED_TIMEOUT>)
      )

Where:

  • <NODE_IP_OR_ADDRESS>: Identification of the node in the farm where the job is running
  • <CURRENT_DATETIME>: Current system date and time
  • <CALCULATED_DATETIME_NEXT_RUN>: The date and time when the job should run again assuming this run finishes correctly.
  • <JOB_IDENTIFICATION>: A string that uniquely indentifies the job
  • <CALCULATED_TIMEOUT>: A calculated date and time (usually the current system time minus a pre-defined value) upon which the job is considered “dead” and should be picked up by another node.

In most relational DBMS you get the number of updated rows for each UPDATE command as a result of the UPDATE itself (or you can embed a second query in the same database transaction that returns it). If the number of updated rows was 1, the executing node should go ahead and execute the job now, if the number was 0 it should go back to sleep until the next iteration.

The UPDATE “locks” the job (setting the running_at field to the executing node address) only if the job was due to be executed and no other node had started it, or if the job was in fact being executed by another node, but so long ago it was considered timed out. Combine this with the ACID guarantees of a RDBMS and we got ourselves a multi-node semaphore.

To wrap things up, if the current node actually got to execute the job, it should finish the iteration by releasing the lock with:

UPDATE job_status SET running_at=NULL
WHERE job_id=<JOB_IDENTIFICATION>

Nice, isn’t it? No messy RPC, no fancy transaction manager and our app has scheduled jobs with farm support.

 

 

 

 

Published at DZone with permission of Ricardo Zuasti, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)