Enterprise Integration Zone is brought to you in partnership with:

I am the founder and CEO of Data Geekery GmbH, located in Zurich, Switzerland. With our company, we have been selling database products and services around Java and SQL since 2013. Ever since my Master's studies at EPFL in 2006, I have been fascinated by the interaction of Java and SQL. Most of this experience I have obtained in the Swiss E-Banking field through various variants (JDBC, Hibernate, mostly with Oracle). I am happy to share this knowledge at various conferences, JUGs, in-house presentations and on our blog. Lukas is a DZone MVB and is not an employee of DZone and has posted 255 posts at DZone. You can read more from them at their website. View Full User Profile

LINQ and Java

07.02.2013
| 14588 views |
  • submit to reddit

LINQ has been quite a successful, but also controversial addition to the.NET ecosystem. Many people are looking for a comparable solution in the Java world. To better understand what a comparable solution could be, let’s have a look at the main problem that LINQ solves:

Query languages are often declarative programming languages with many keywords. They offer few control-flow elements, yet they are highly descriptive. The most popular query language is SQL, the ISO/IEC standardised Structured Query Language, mostly used for relational databases.

Declarative programming means that programmers do not explicitly phrase out their algorithms. Instead, they describe the result they would like to obtain, leaving algorithmic calculus to their implementing systems. Some databases have become very good at interpreting large SQL statements, applying SQL language transformation rules based on language syntax and metadata. An interesting read is Tom Kyte’s metadata matters, hinting at the incredible effort that has been put into Oracle’s Cost-Based Optimiser. Similar papers can be found for SQL Server, DB2 and other leading RDBMS.

LINQ-to-SQL is not SQL

LINQ is an entirely different query language that allows to embed declarative programming aspects into .NET languages, such as C#, or ASP. The nice part of LINQ is the fact that a C# compiler can compile something that looks like SQL in the middle of C# statements. In a way, LINQ is to .NET what SQL is to PL/SQL, pgplsql or what jOOQ is to Java (see my previous article about PL/Java). But unlike PL/SQL, which embeds the actual SQL language, LINQ-to-SQL does not aim for modelling SQL itself within .NET. It is a higher-level abstraction that keeps an open door for attempting to unify querying against various heterogeneous data stores in a single language. This unification will create a similar impedance mismatch as ORM did before, maybe an even bigger one. While similar languages can be transformed into each other to a certain extent, it can become quite difficult for an advanced SQL developer to predict what actual SQL code will be generated from even very simple LINQ statements.

LINQ Examples

This gets more clear when looking at some examples given by the LINQ-to-SQL documentation. For example the Count() aggregate function:

System.Int32 notDiscontinuedCount =
    (from prod in db.Products
    where !prod.Discontinued
    select prod)
    .Count();
 
Console.WriteLine(notDiscontinuedCount);

In the above example, it is not immediately clear if the .Count() function is transformed into a SQL count(*) aggregate function within the parenthesised query (then why not put it into the projection?), or if it will be applied only after executing the query, in the application memory. The latter would be prohibitive, if a large number or records would need to be transferred from the database to memory. Depending on the transaction model, they would even need to be read-locked!

Another example is given here where grouping is explained:

var prodCountQuery =
    from prod in db.Products
    group prod by prod.CategoryID into grouping
    where grouping.Count() >= 10
    select new
    {
        grouping.Key,
        ProductCount = grouping.Count()
    };

In this case, LINQ models its language aspects entirely different from SQL. The above LINQ where clause is obviously a SQL HAVING clause. into grouping is an alias for what will be a grouped tuple, which is quite a nice idea. This does not directly map to SQL, though, and must be used by LINQ internally, to produce typed output. What’s awesome, of course, are the statically typed projections that can be reused afterwards, directly in C#!

Let’s look at another grouping example:

var priceQuery =
    from prod in db.Products
    group prod by prod.CategoryID into grouping
    select new
    {
        grouping.Key,
        TotalPrice = grouping.Sum(p => p.UnitPrice)
    };

In this example, C#’s functional aspects are embedded into LINQ’s Sum(p => p.UnitPrice) aggregate expression. TotalPrice = ... is just simple column aliasing. The above leaves me with lots of open questions. How can I control, which parts are really going to be translated to SQL, and which parts will execute in my application, after a SQL query returns a partial result set? How can I predict whether a lambda expression is suitable for a LINQ aggregate function, and when it will cause a huge amount of data to be loaded into memory for in-memory aggregation? And also: Will the compiler warn me that it couldn’t figure out how to generate a C#/SQL algorithm mix? Or will this simply fail at runtime?

To LINQ or not to LINQ

Don’t get me wrong. Whenever I look inside the LINQ manuals for some inspiration, I have a deep urge to try it in a project. It looks awesome, and well-designed. There are also lots of interesting LINQ questions on Stack Overflow. I wouldn’t mind having LINQ in Java, but I want to remind readers that LINQ is NOT SQL. If you want to stay in control of your SQL, LINQ or LINQesque APIs may be a bad choice for two reasons:

  1. Some SQL mechanisms cannot be expressed in LINQ. Just as with JPA, you may need to resort to plain SQL.
  2. Some LINQ mechanisms cannot be expressed in SQL. Just as with JPA, you may suffer from severe performance issues, and will thus resort again to plain SQL.

Beware of the above when choosing LINQ, or a “Java implementation” thereof! You may be better off, using SQL (i.e. JDBC, jOOQ, or MyBatis) for data fetching and Java APIs (e.g. Java 8′s Stream API) for in-memory post-processing

LINQ-like libraries modelling SQL in Java, Scala

LINQ-like libraries abstracting SQL syntax and data stores in Java, Scala

Published at DZone with permission of Lukas Eder, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)