Ilya Sterin is a software engineer with Nextrials, a clinical trials data management software company, and also consults for a variety of startups, specifically on scalable and distributed system architectures. Ilya is also a book and blog author and an avid user of and contributor to open source software. When not hacking on yet another software project, Ilya enjoys coaching his 10-year-old son’s ever-growing number of sports teams. Ilya is a DZone MVB and is not an employee of DZone and has posted 5 posts at DZone.

Scala Journey: Idioms, Concurrency and Other Rants

02.09.2010

I’ve been following Scala off and on for about two years now. Mostly in spurts. I liked the language, but due to the workload and other priorities I never had the time to take it for a full ride. Well, over the last 2 weeks, I decided to take the full plunge. I’m taking a highly concurrent production application, where power is a very critical component of our application, and rewriting it in Scala. I’m doing this for more than just fun.

This application has grown from a very cleanly architected one to one that is still rather nicely designed but has accumulated a lot of technical debt. With everything I’ve learned about Scala, I think I can redesign it to be cleaner, more concise, and probably more scalable. The other big reason I’m giving Scala a shot is its actor-based concurrency. I’ve worked with Java’s threading primitives for many years and have developed a love/hate relationship with them. The Java SE 5 concurrency package (java.util.concurrent) brought some nice gems to my world, but it didn’t eliminate the fact that you’re still programming to the imperative model of shared-state synchronization. Scala actors hide some of the ugliness of thread synchronization, though they don’t eliminate the issue completely. Because Scala mixes the imperative and functional styles, and because actors are implemented as a library, nothing stops you from running into the same issues as with more primitive thread state-sharing operations (i.e. race conditions, lock contention, deadlocks/livelocks).

Basically, if you’re using actors as just an abstraction layer over old practices, you’ll end up in the same boat you started in with Java. With all of that said, unlike Java, Scala gives you the facilities for designing cleaner and more thread-safe systems through its functional programming features. Mutable shared state is the leading cause of non-determinism in concurrent Java applications, so immutability and message passing are a way into a more deterministic world.

I’ve also looked at other concurrent programming models, like STM/MVCC. STM is the basis of concurrent programming in Clojure and it’s a different paradigm than actors. STM is a simpler model if you’re used to the old imperative threading, as it abstracts you from concurrency primitives by forcing state modifications to occur in a transactional context. When this occurs, the STM system takes care of ensuring the state modifications occur atomically and in isolation. In my opinion this system suits the multi-core paradigm very well and allows a smoother transition. The problem with it, at least in the context of MVCC, is that for each transaction and data structure being modified, a copy is made for the purposes of isolation (how the copying is implemented is system dependent, and some implementations are more efficient than others). For a system that has to handle numerous concurrent transactions involving many objects, this can become a bottleneck, and the creation of copies can overburden the system’s memory and performance. There are some debates about that in the STM world, mostly about finding the sweet spot for such systems, where the cost of MVCC is lower than the cost of constant synchronization through locking.
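To make the copy cost concrete, here is a minimal sketch in Scala. It is not an STM and not how Clojure implements one: it is a hand-rolled optimistic update loop over a single `java.util.concurrent.atomic.AtomicReference`, and the names `Versioned`, `transact`, and `VersionedDemo` are invented for illustration. Each attempt builds a new immutable version of the state and retries if another thread committed first, which is roughly the per-transaction copying described above.

```scala
import java.util.concurrent.atomic.AtomicReference

// Hypothetical helper: holds one immutable snapshot and swaps in new versions.
final class Versioned[A <: AnyRef](initial: A) {
  private val ref = new AtomicReference[A](initial)

  def get: A = ref.get()

  // Optimistic "transaction": read a snapshot, compute a modified copy,
  // and commit only if nobody else committed in the meantime.
  // f may run more than once, so it must be free of side effects.
  @annotation.tailrec
  def transact(f: A => A): A = {
    val before = ref.get()
    val after = f(before) // a new version; the old snapshot is untouched
    if (ref.compareAndSet(before, after)) after else transact(f)
  }
}

object VersionedDemo {
  // Four threads each move 25 units from "a" to "b" without any locks.
  def run(): Map[String, Long] = {
    val accounts = new Versioned(Map("a" -> 100L, "b" -> 0L))
    val workers = (1 to 4).map { _ =>
      new Thread(() => (1 to 25).foreach { _ =>
        accounts.transact(m => m.updated("a", m("a") - 1).updated("b", m("b") + 1))
      })
    }
    workers.foreach(_.start())
    workers.foreach(_.join())
    accounts.get
  }
}
```

Under heavy contention most attempts lose the race and are thrown away, which is where the memory and CPU overhead comes from. A real STM coordinates many references per transaction and is considerably more involved than this single-reference sketch.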

The actor model is different: it works in terms of isolated objects (actors) that communicate only by message passing. None can modify or query the state of another, short of requesting such an operation by sending a message to that particular actor. In Scala you can break that model, since you can send around mutable objects, but if you are to really benefit from the actor model, you should probably avoid doing that. Actors lend themselves better to concurrent applications that not only span multiple cores but can also be scaled to multiple physical nodes. Because the messages being passed are immutable data structures that can be easily synchronized and shared, the underlying actor system can share these messages across physically dispersed actors just as it can for actors within the same physical memory space.
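The idea can be sketched without any actor library at all. The toy below is an assumption-laden illustration, not the Scala actors API: `CounterActor`, its messages, and `CounterDemo` are made-up names, and the "actor" is just a mailbox (a `LinkedBlockingQueue`) drained by one dedicated thread. The point is that the mutable counter is never shared; other threads can only send immutable messages.

```scala
import java.util.concurrent.{CompletableFuture, LinkedBlockingQueue}

// Messages are immutable values (illustrative names only).
sealed trait CounterMsg
final case class Add(n: Int) extends CounterMsg
final case class Get(reply: Int => Unit) extends CounterMsg
case object Stop extends CounterMsg

// A toy actor: private mutable state, touched only by its own thread.
final class CounterActor {
  private val mailbox = new LinkedBlockingQueue[CounterMsg]()
  private var total = 0 // only the worker loop below reads or writes this

  private val worker = new Thread(() => {
    var running = true
    while (running) {
      mailbox.take() match {
        case Add(n)     => total += n
        case Get(reply) => reply(total)
        case Stop       => running = false
      }
    }
  })
  worker.setDaemon(true)
  worker.start()

  // Sending a message is the only way to interact with the actor.
  def !(msg: CounterMsg): Unit = mailbox.put(msg)
}

object CounterDemo {
  // Four sender threads each post 100 Add(1) messages, then we ask for the total.
  def run(): Int = {
    val counter = new CounterActor
    val senders = (1 to 4).map { _ =>
      new Thread(() => (1 to 100).foreach(_ => counter ! Add(1)))
    }
    senders.foreach(_.start())
    senders.foreach(_.join())
    val answer = new CompletableFuture[Int]()
    counter ! Get(n => answer.complete(n))
    counter ! Stop
    answer.get()
  }
}
```

No locks appear in the user code, yet the count comes out right because every update is serialized through the mailbox. If a message carried a mutable object instead, two threads could mutate it at once and the guarantee would be gone, which is the way of breaking the model mentioned above.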

So the world of concurrency is getting more exciting with these awesome paradigms. One thing to remember is that there is no one-size-fits-all concurrency model, and I don’t see any one of the above becoming the de facto standard any time soon. There is a sweet spot for each, so one should learn the ins and outs of each model.

Now that I’ve gotten concurrency out of the way, let’s get back to the actual syntax of Scala. Scala is very powerful (at least compared to Java). This power comes with responsibility. You can use Scala to write beautiful/concise programs, or you can use it to write obscure/illegible programs that no one, including the original author, will be able to comprehend. Personally, I prefer the power and can handle the responsibility. I’m a long-time Perl programmer (way before I started programming Java), and I’ve seen (and even written at times) programs that Larry Wall himself wouldn’t be able to comprehend.

Scala comes with operator overloading, but when not used judiciously, that power alone can be responsible for the illegibility of any system. This is one of the major reasons why languages like Java decided not to include it. Personally, I think operator overloading can be a beautiful addition to any API. It can make writing DSLs easier and using them more natural. Again, this power is great in the hands of experienced and responsible programmers.
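As a small illustration of the pleasant side of this, here is a sketch with a made-up `Money` type (not from the application discussed here). In Scala an operator is simply a method with a symbolic name, so "overloading" `+` means defining a method called `+`.

```scala
// Hypothetical value type used only for this example.
final case class Money(cents: Long) {
  def +(other: Money): Money = Money(cents + other.cents)
  def *(factor: Int): Money = Money(cents * factor)
}

object MoneyDemo {
  // Reads like arithmetic; * binds tighter than +, as with numbers.
  val total: Money = Money(250) + Money(175) * 2
}
```

The same mechanism lets someone define `~>`, `<:<`, or `!!` with whatever meaning they like, which is where the legibility concern comes from.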

After having experienced great power (Perl) and great restraint (Java), I’m leaning more towards power (who wouldn’t :-). On one hand, it’s nice to be able to read and comprehend anyone’s Java program, even when it’s not nicely written; on the other hand, it’s a pain to write a program while jumping through all the hoops and limitations imposed by the various constraints. In a perfect AI world, the compiler would infer the capabilities of the programmer and restrict its facilities based on those, in some way as to not offend anyone :-) So if a novice is inferred, ah, there go the operator overloading and implicit conversions, etc… But for now, I’d rather have a powerful tool to use when I write software, and Scala seems to push the right buttons for me at this point.

I’m going to start a series of posts, beginning with this one, about my experiences with Scala.

Here is a little something I came up with a few hours ago. Our software has some limited interoperability with a SQL database and requires a light abstraction. We chose not to use any 3rd party ORM or SQL abstraction, mostly because a dependency on one of these doesn’t really provide any benefit for our limited use of SQL. So I developed a simple SQL variant abstraction layer, which allows us to execute SQL queries that are defined in the SQLVariant implementation. Moving from one database to another just requires one to implement the SQLVariant interface to provide the proper abstraction. I initially wrote this in Java, and although it was decent, it required quite a bit more code and didn’t look as concise as I wanted. One issue was PreparedStatement and its interface for placeholder bindings. How would one bind Java’s primitive and wrapper types as placeholders, and how would the SQLVariant know which PreparedStatement.set* method to call? I resorted to an enumeration that defines these operations and to reflection for invoking them. I’m basically sidestepping static typing in a place where I’m not sure I really want or have to. Here is the Java implementation.

I got rid of a few methods, specifically dealing with resultset, statement, and connection cleanup, as they don’t really emphasize my point here.

 

import java.lang.reflect.Method;
import java.sql.*;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public abstract class SqlVariant {

    public abstract SqlSelectStatement getResultsNotYetNotifiedForStatement(NotificationType... types);

    public abstract SqlSelectStatement getResultsNotYetNotifiedForStatement(int limit, NotificationType... types);

    public abstract SqlUpdateStatement getUpdateWithNotificationsForStatement(Result result);

    // T is the type of result produced by execute()
    private abstract static class SqlStatement<T> {

        protected String sql;
        protected List<BindParameter<?>> bindParams = new ArrayList<BindParameter<?>>();
        protected PreparedStatement stmt;

        public SqlStatement(String sql) {
            this.sql = sql;
        }

        public SqlStatement<T> addBindParam(BindParameter<?> param) {
            bindParams.add(param);
            return this;
        }

        public String getSql() {
            return sql;
        }

        public List<BindParameter<?>> getBindParams() {
            return Collections.unmodifiableList(bindParams);
        }

        protected PreparedStatement prepareStatement(Connection conn) throws SQLException {
            PreparedStatement stmt = conn.prepareStatement(sql);
            for (int bindIdx = 0; bindIdx < bindParams.size(); bindIdx++) {
                BindParameter<?> p = bindParams.get(bindIdx);
                try {
                    // Look the setter up on the public interface; the driver's
                    // implementation class is not guaranteed to be accessible.
                    Method m = PreparedStatement.class.getMethod(p.type.method, Integer.TYPE, p.type.clazz);
                    // JDBC placeholders are 1-based
                    m.invoke(stmt, bindIdx + 1, p.value);
                }
                catch (Exception e) {
                    throw new RuntimeException("Couldn't execute method: " + p.type.method + " on " + stmt.getClass(), e);
                }
            }
            return stmt;
        }

        public abstract T execute(Connection conn) throws SQLException;
    }

    public static final class SqlSelectStatement extends SqlStatement<ResultSet> {

        public SqlSelectStatement(String sql) {
            super(sql);
        }

        @Override
        public ResultSet execute(Connection conn) throws SQLException {
            return prepareStatement(conn).executeQuery();
        }
    }

    public static final class SqlUpdateStatement extends SqlStatement<Boolean> {
        public SqlUpdateStatement(String sql) {
            super(sql);
        }

        @Override
        public Boolean execute(Connection conn) throws SQLException {
            stmt = prepareStatement(conn);
            return stmt.execute();
        }
    }


    public static final class BindParameter<T> {
        private final BindParameterType type;
        private final T value;

        public BindParameter(Class<T> type, T value) {
            this.type = BindParameterType.getTypeFor(type);
            this.value = value;
        }

        public BindParameter(BindParameterType type, T value) {
            this.type = type;
            this.value = value;
        }
    }

    // Maps a parameter class to the PreparedStatement setter that binds it.
    // INT and LONG are keyed by the primitive classes (int.class, long.class),
    // so a lookup with Integer.class or Long.class will not match.
    private static enum BindParameterType {
        STRING(String.class, "setString"),
        INT(Integer.TYPE, "setInt"),
        LONG(Long.TYPE, "setLong");

        private Class clazz;
        private String method;

        private BindParameterType(Class clazz, String method) {
            this.clazz = clazz;
            this.method = method;
        }

        private static BindParameterType getTypeFor(Class clazz) {
            for (BindParameterType t : BindParameterType.values()) {
                if (t.clazz.equals(clazz)) {
                    return t;
                }
            }
            // Report the class that was passed in (clazz.getClass() would always print java.lang.Class)
            throw new IllegalArgumentException("Type: " + clazz + " is not defined as a BindParameterType enum.");
        }
    }
}

Now, here is how one would implement the SQLVariant interface. The implementation below is in Groovy. I choose Groovy when I have to do lots of string interpolation, which Java and Scala, as of this writing, somehow refuse to support. The code was shortened to demonstrate just the bare minimum.

class MySqlVariant extends SqlVariant {
    @Override
    public SqlVariant.SqlSelectStatement getResultsNotYetNotifiedForStatement(int limit, NotificationType[] types) {
        SqlVariant.SqlSelectStatement stmt = new SqlVariant.SqlSelectStatement("SELECT ...")
        for (NotificationType t : types)
            stmt.addBindParam(new SqlVariant.BindParameter(String.class, t.name().toUpperCase()))
        return stmt
    }

    @Override
    public SqlVariant.SqlUpdateStatement getUpdateWithNotificationsForStatement(Result result) {
        SqlVariant.SqlUpdateStatement stmt = new SqlVariant.SqlUpdateStatement("INSERT INTO ....")
        result.notifications?.each { Notification n ->
            stmt.addBindParam(new SqlVariant.BindParameter(SqlVariant.BindParameterType.LONG, n.id))
            stmt.addBindParam(new SqlVariant.BindParameter(SqlVariant.BindParameterType.LONG, result.intervalId))
        }
        return stmt
    }
    ......
}

I started reimplementing the above in Scala and ran across a very powerful and beautiful Scala feature: implicit conversions. This allowed me to truly abstract the SQLVariant implementations from any binding-specific knowledge, through an implicit conversion facility of a kind you normally only get in dynamically typed languages. Scala gives us this ability while still checking the type safety of implicit conversions statically at compile time.
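For readers who want to try the mechanism without a database, here is a stripped-down sketch. Everything in it is invented for illustration (`RecordingStatement`, `Bindable`, `BindDemo`); a recording object stands in for a JDBC statement so the example runs on its own. The compiler picks the conversion from the static type of each argument, and a type with no conversion is rejected at compile time.

```scala
import scala.language.implicitConversions
import scala.collection.mutable.ArrayBuffer

// Stand-in for a JDBC statement: it only records what was bound (illustrative).
final class RecordingStatement {
  val bound: ArrayBuffer[String] = ArrayBuffer.empty[String]
}

trait Bindable {
  def bindTo(stmt: RecordingStatement, idx: Int): Unit
}

object Bindable {
  // Conversions placed in the companion object are found without any import.
  implicit def fromInt(v: Int): Bindable = new Bindable {
    def bindTo(stmt: RecordingStatement, idx: Int): Unit =
      stmt.bound += ("int@" + idx + "=" + v)
  }

  implicit def fromString(v: String): Bindable = new Bindable {
    def bindTo(stmt: RecordingStatement, idx: Int): Unit =
      stmt.bound += ("string@" + idx + "=" + v)
  }
}

object BindDemo {
  // Callers pass plain values; each one is converted to a Bindable implicitly.
  def bindAll(params: Bindable*): List[String] = {
    val stmt = new RecordingStatement
    params.zipWithIndex.foreach { case (p, i) => p.bindTo(stmt, i + 1) } // 1-based positions
    stmt.bound.toList
  }

  val result: List[String] = bindAll("abc", 42)
  // bindAll(3.14) would be rejected by the compiler: there is no
  // implicit conversion from Double to Bindable in scope.
}
```

Compared with the enum-plus-reflection approach, a missing binding shows up as a compile error at the call site rather than as a runtime exception.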

Another wonderful feature is lazy vals, which let us cleanly implement the lazy evaluation that we (Java programmers) are so used to doing by declaring a member field as null and then checking it before initializing it on the first accessor call. If you’ve seen a lot of code like the following, you’ll rejoice to find out that you no longer have to do this in Scala.

public class SomeClass {
    private SomeType type;

    public SomeType getSomeType() {
        if (type == null) type = new SomeType(); // Often more complex than that
        return type;
    }
}

The above, besides not being ideal, is also error prone if, say, type is used anywhere else in SomeClass and you don’t use the accessor method to retrieve it. You must ensure the use of the accessor through convention or deal with the fact that the field might not be initialized yet. This is no longer the case in Scala, as the compiler generates the lazy initialization for you. The Scala implementation later in this post uses one.
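As a stand-alone illustration of the behaviour described above, here is a minimal sketch. The `Report` class and the `loads` counter are made up for the example; the counter exists only to show how many times the initializer runs.

```scala
final class Report {
  var loads = 0 // demo-only counter: how often the initializer below has run

  // Evaluated on first access, then cached for every later access.
  lazy val data: List[Int] = {
    loads += 1
    List(1, 2, 3) // stands in for an expensive computation
  }

  // Code inside the class uses the same lazy val, so there is no
  // accessor-by-convention rule to remember.
  def size: Int = data.size
}

object LazyDemo {
  // Returns (loads before first access, loads after two accesses, same instance?)
  def run(): (Int, Int, Boolean) = {
    val r = new Report
    val before = r.loads
    val first = r.data
    val second = r.data
    (before, r.loads, first eq second)
  }
}
```

Nothing is computed when `Report` is constructed, and the initializer runs once no matter how many times `data` or `size` is used afterwards.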

Note: I still allow the client data access abstractions to work with a raw JDBC ResultSet returned from the SQLVariant. I don’t see this as an issue at this point, first because these abstractions are SQL specific and also because ResultSet is a standard interface for any JDBC SQL interaction. Here is my concise Scala implementation. I’m still learning, so this might change as I get more familiar with Scala idioms and start writing more idiomatic Scala code.

import javax.sql.DataSource
import java.sql.{ResultSet, Connection, PreparedStatement}
import com.bazusports.chipreader.sql.SqlVariant.{SqlSelectStatement, BindingValue}

abstract class SqlVariant(private val ds: DataSource) {

  def retrieveConfigurationStatementFor(eventTag: String): SqlSelectStatement

  protected final def connection: Connection = ds.getConnection
}

object SqlVariant {

  trait BindingValue { def >>(stmt: PreparedStatement, idx: Int): Unit }

  // This is how implicit bindings happen. This is beauty, we can now
  // bind standard types and have the compiler perform implicit conversions
  implicit final def bindingIntWrapper(v: Int): BindingValue = new BindingValue {
    def >>(stmt: PreparedStatement, idx: Int): Unit = stmt.setInt(idx, v)
  }

  implicit final def bindingLongWrapper(v: Long): BindingValue = new BindingValue {
    def >>(stmt: PreparedStatement, idx: Int): Unit = stmt.setLong(idx, v)
  }

  implicit final def bindingStringWrapper(v: String): BindingValue = new BindingValue {
    def >>(stmt: PreparedStatement, idx: Int): Unit = stmt.setString(idx, v)
  }

  abstract class SqlStatement[T](conn: Connection, sql: String, params: BindingValue*) {

    // Ah, another beautiful feature, lazy vals. Basically, it's
    // evaluated on initial call. This is great for the
    // so common lazy memoization technique, of checking for null.
    protected lazy val statement: PreparedStatement = {
      val stmt: PreparedStatement = conn.prepareStatement(sql)
      // JDBC placeholders are 1-based, so bind each parameter at its own position
      params.zipWithIndex.foreach { case (v, i) => v >> (stmt, i + 1) }
      stmt
    }

    def execute(): T
  }

  class SqlUpdateStatement(conn: Connection, sql: String, params: BindingValue*)
      extends SqlStatement[Boolean](conn, sql, params: _*) {
    def execute() = statement.execute()
  }

  class SqlSelectStatement(conn: Connection, sql: String, params: BindingValue*)
      extends SqlStatement[ResultSet](conn, sql, params: _*) {
    def execute() = statement.executeQuery()
  }
}

/* Implementation of the SQLVariant */

class MySqlVariant(private val dataSource: DataSource) extends SqlVariant(dataSource) {

  def retrieveConfigurationStatementFor(eventTag: String) =
    new SqlSelectStatement(connection, "SELECT reader_config FROM event WHERE tag = ?", eventTag)
}

And the obligatory unit test using the oh-so-awesome Scala Specs framework.

import org.specs.Specification

object MySqlVariantSpec extends Specification {
  val ds = getDataSource() // test helper that supplies a DataSource; not shown here

  "Requesting a configuration statement for a specific event" should {
    "return a SqlSelectStatement with properly bound parameters" in {
      val sqlVariant: SqlVariant = new MySqlVariant(ds)
      val stmt: SqlSelectStatement = sqlVariant.retrieveConfigurationStatementFor("abc")
      stmt must notBeNull
      // .... Other assertions go here
    }
  }
}

 

Although I’ve barely scratched the surface, I hope this helps you see some of what Scala has to offer. More to come as I progress.

You can see more blog posts/information at Ilya’s Blog

Published at DZone with permission of Ilya Sterin, author and DZone MVB.
