SQL Zone is brought to you in partnership with:

Dr. Michael Stonebraker, Adjunct Professor of Computer Science at MIT, is co-founder and CTO of VoltDB, Inc. Previously, Dr. Stonebraker was the main architect behind the Ingres and PostgreSQL databases, and was co-founder of Vertica Systems and Streambase Systems. He is a renowned researcher and prolific publisher on database systems topics. Mike is a DZone MVB and is not an employee of DZone and has posted 18 posts at DZone. You can read more from them at their website. View Full User Profile

VoltDB Beyond Multi-Partition Transactions: Run Everywhere

03.29.2014
| 4030 views |
  • submit to reddit

VoltDB supports single-partition transactions, which as the name implies, run on only one partition. VoltDB can run as many single partition transactions in parallel as there are unique partitions. This enables high transaction throughput as well as high transaction concurrency execution.

VoltDB also supports multi-partition transactions, which are transactions that involve all unique partitions within the database. When a multi-partition transaction is executing, it is the only transaction running in the database at that point in time. Thus, you can run fewer multi-part transactions as compared to single-part transactions

But what if you wanted to run an operation on all data (each partition), but you didn’t want to execute the operation as a multi-part transaction? For example, you may want to run a periodic operation across all the data to locate one or more particular records, as a scatter/gather type of operation. Or perhaps you want to age out or delete rows, or perhaps timeout session data. These types of operations likely don’t have to be transactionally consistent across the data set and thus do not need to be run as a multi-partition transaction.

Here’s how to do that.

In this example, we’ll modify the Voter sample application to periodically execute a stored procedure on each partition. This transaction will return the count of the number of voters in each partition.

Here’s the DDL to define the “Everywhere” stored procedure:

CREATE PROCEDURE FROM CLASS voter.procedures.Everywhere;
PARTITION PROCEDURE Everywhere ON TABLE votes COLUMN phone_number;

The Everywhere Java stored procedure is defined as follows:

package voter.procedures;

import org.voltdb.SQLStmt;
import org.voltdb.VoltProcedure;
import org.voltdb.VoltTable;

public class Everywhere extends VoltProcedure {

// Count the phone numbers who voted in this partition
public final SQLStmt count_votes = new SQLStmt(
        "SELECT count(*) FROM v_votes_by_phone_number;");

public VoltTable[] run(long id) {
    // Count the phone numbers who voted.
    voltQueueSQL(count_votes);
    return voltExecuteSQL(true);
  }
}

The client code uses a VoltDB System Procedure, @GetPartitionKeys, to retrieve a list of partition values, one for every partition in the database. Client programs can use this list of partition key values to invoke a single-part stored procedure on each partition individual, as shown in this client-side code:

/**
 * Run the a query (or stored procedure) on every partition, as a s
 * ingle-partition transaction.
 *
 * @throws Exception if anything unexpected happens.
 */
public void run_everywhere() throws Exception {
    // Get the partition key for each partition from the database.
    // Once we have partition ids for all partitions we'll loop through
    // and invoke our query or transaction as a single-part transaction.

    VoltTable results[] = client.callProcedure("@GetPartitionKeys", "INTEGER")
                              .getResults();
    VoltTable keys = results[0];
    for (int k = 0;k < keys.getRowCount(); k++) {
        long key = keys.fetchRow(k).getLong(1);
        VoltTable voter_count_table = client.callProcedure("Everywhere", key)
                                        .getResults()[0];
        System.out.println("Partition " + key + " row count = " + 
                              voter_count_table.fetchRow(0).getLong(0));
    }
}

When this code is added to the Voter application and run periodically via the statistics thread on a single-node cluster with 6 sites per host, the output looks similar to the following:

Running benchmark...
00:00:05 Throughput 137143/s, Aborts/Failures 0/0
Partition 2 row count = 185484
Partition 1 row count = 186041
Partition 0 row count = 186821
Partition 3 row count = 187131
Partition 5 row count = 186745
Partition 4 row count = 187408
00:00:10 Throughput 125519/s, Aborts/Failures 0/0
Partition 2 row count = 288592
Partition 1 row count = 289928
Partition 0 row count = 290540
Partition 3 row count = 290900
Partition 5 row count = 290025
Partition 4 row count = 290783


Published at DZone with permission of Mike Stonebraker, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)