
Jim King is an expert in the fields of BI, databases, and Java development. He has more than 10 years of experience in BI and in developing Java applications such as esProc and esCalc. He is passionate about applying simpler methods to solve users' complex problems. He is also an active blogger and has published a series of original articles on BI and Java development, covering topics such as database application development and report development with Java and SQL. Jim has posted 23 posts at DZone.

Drive Java for Structured Data Computing with Grid Style and Agile Syntax

06.20.2013

Java has no competitive advantage in data computing, particularly in computing over massive structured data. For example, given order detail records, suppose we need to find the salespeople whose sales grew by more than 10% in each of 3 consecutive months.

Java has no built-in high-level functions for this kind of computation, so it is hard to handle with Java's own capabilities alone. A large amount of time and effort goes into implementing the computational details by hand: first, define classes and represent every piece of data as an object; second, use a List to store the multiple records; third, use nested multi-level loops to compute. Apart from sorting, almost all of the data processing algorithms involved, such as aggregating, filtering, and grouping, have to be implemented manually. Such computations usually involve set operations and relational operations over massive data, or computations on the relative positions of objects and their attributes. Implementing the underlying logic for these computations takes great effort.
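
The three steps above can be sketched in plain Java as follows. This is only a minimal illustration, not a definitive implementation: the MonthlySale class, its field names, and the findConsistentGrowers method are hypothetical, and the sketch assumes the order details have already been summed into one row per salesperson per month, with no missing months.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Step 1: a hand-written class for each piece of data (hypothetical layout).
final class MonthlySale {
    final String salesMan;
    final int month;      // e.g. 201301; assumed to sort chronologically
    final double amount;  // total sales of that salesperson in that month

    MonthlySale(String salesMan, int month, double amount) {
        this.salesMan = salesMan;
        this.month = month;
        this.amount = amount;
    }
}

final class ConsecutiveGrowth {

    // Returns the salespeople whose month-over-month growth exceeded
    // minGrowth (0.10 means 10%) for runLength consecutive months.
    static List<String> findConsistentGrowers(List<MonthlySale> sales,
                                              double minGrowth,
                                              int runLength) {
        // Step 2: store the records in Lists, grouped by salesperson (manual grouping).
        Map<String, List<MonthlySale>> bySalesMan = new LinkedHashMap<>();
        for (MonthlySale s : sales) {
            List<MonthlySale> rows = bySalesMan.get(s.salesMan);
            if (rows == null) {
                rows = new ArrayList<>();
                bySalesMan.put(s.salesMan, rows);
            }
            rows.add(s);
        }

        // Step 3: nested loops, outer over groups, inner over adjacent months.
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<MonthlySale>> entry : bySalesMan.entrySet()) {
            List<MonthlySale> rows = entry.getValue();
            rows.sort((a, b) -> Integer.compare(a.month, b.month));

            int run = 0; // length of the current streak of qualifying months
            for (int i = 1; i < rows.size(); i++) {
                double previous = rows.get(i - 1).amount;
                // A non-positive previous amount yields NaN, which never qualifies.
                double growth = previous > 0
                        ? rows.get(i).amount / previous - 1
                        : Double.NaN;
                run = growth > minGrowth ? run + 1 : 0;
                if (run >= runLength) {
                    result.add(entry.getKey());
                    break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<MonthlySale> sales = Arrays.asList(
                new MonthlySale("Alice", 201301, 100), new MonthlySale("Alice", 201302, 120),
                new MonthlySale("Alice", 201303, 150), new MonthlySale("Alice", 201304, 200),
                new MonthlySale("Bob", 201301, 100), new MonthlySale("Bob", 201302, 120),
                new MonthlySale("Bob", 201303, 125), new MonthlySale("Bob", 201304, 160));
        System.out.println(findConsistentGrowers(sales, 0.10, 3)); // prints [Alice]
    }
}
```

Even in this reduced form, the grouping, the ordering, and the comparison of each month with the previous one are all written by hand, which is the effort described above.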

That is why Java's computational capability needs to be improved. We need a tool tailored to make structured data computation easy!

How about SQL? Not every Java application can use a database. In addition, a lot of data is stored in text or Excel files, and problems such as cross-database computation and code reuse may also arise. Moreover, SQL itself is not convenient for many computations. Taking the above computation as an example, the SQL is by no means easy to compose:

    -- A: month-over-month growth rate (ratio minus 1) for each salesperson
    WITH A AS
      (SELECT salesMan, month,
              amount / lag(amount) OVER (PARTITION BY salesMan ORDER BY month) - 1 rising_range
       FROM sales),
    -- B: flag months where this month and the two before it all grew by more than 10%
    B AS
      (SELECT salesMan,
              CASE WHEN rising_range > 0.1 AND
                        lag(rising_range) OVER (PARTITION BY salesMan ORDER BY month) > 0.1 AND
                        lag(rising_range, 2) OVER (PARTITION BY salesMan ORDER BY month) > 0.1
                   THEN 1 ELSE 0 END is_three_consecutive_month
       FROM A)
    SELECT DISTINCT salesMan FROM B WHERE is_three_consecutive_month = 1
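
When the order details are stored in a text file rather than in a database, no SQL is available at all, and even the preliminary step of rolling the order details up into monthly totals has to be hand-written in Java. The following is a minimal sketch of that step. The tab-separated layout (salesperson, order date as yyyy-MM-dd, amount) and the OrderTextAggregator class are assumptions made for illustration only; in practice the lines might come from java.nio.file.Files.readAllLines.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class OrderTextAggregator {

    // Sums order amounts per "salesMan|yyyy-MM" key.
    // Assumed line layout (hypothetical): salesMan<TAB>yyyy-MM-dd<TAB>amount
    static Map<String, Double> monthlyTotals(Iterable<String> lines) {
        Map<String, Double> totals = new TreeMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = line.split("\t");
            String key = fields[0] + "|" + fields[1].substring(0, 7); // keep yyyy-MM
            double amount = Double.parseDouble(fields[2]);
            totals.merge(key, amount, Double::sum); // manual group-and-sum
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Alice\t2013-01-05\t100",
                "Alice\t2013-01-20\t50",
                "Alice\t2013-02-03\t80",
                "Bob\t2013-01-11\t40");
        // prints {Alice|2013-01=150.0, Alice|2013-02=80.0, Bob|2013-01=40.0}
        System.out.println(monthlyTotals(lines));
    }
}
```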

In this case, esProc is the better choice.

esProc is a development tool for database computing that specializes in simplifying complex computations and is convenient to integrate with Java. The corresponding esProc script is shown below:

[Figure: esProc grid script for this computation]

esProc can retrieve data directly from multiple databases, text files, and Excel sheets and compute across them. Its grid style and agile syntax are designed specifically for massive structured data computation. It supports external parameters, and the result can be returned directly via JDBC. So, with esProc, the computational capability of Java is dramatically improved. In addition, esProc natively supports cross-database computation and code reuse, and it provides comprehensive debugging functions. As a result, development productivity with esProc is also higher than with SQL.


Published at DZone with permission of its author, Jim King.
