
Jim King is an expert in BI, databases, and Java development. He has more than 10 years of experience in BI and in developing Java applications such as esProc and esCalc. He is passionate about applying simpler methods to solve users' complex problems. He is also an active blogger and has published a series of original ideas on BI and Java development, covering topics such as BI, database application development, and report development with Java and SQL. Jim has posted 23 posts at DZone.

How to Improve Java’s Computing Ability for Various Data Sources?


Many Java applications do not have a database behind them. So what if such an application needs to run queries or computations on structured data? For example, given an Excel sheet downloaded from a finance website, find the shares that rose for N consecutive days within a certain period.

For computation on structured data, programmers usually embed SQL statements in the Java code and access the database server via JDBC. SQL has many structured-data algorithms built in, but Java lacks the high-level functions to perform these operations directly. Without a database, therefore, such computation is quite hard to implement with Java's own language capabilities.

It takes programmers a great deal of time and effort to implement every detail of the computation by hand. Apart from sorting, almost every algorithm for bulk data computing, such as aggregating, filtering, and grouping, has to be written manually. Typically, the programmer defines a class so that each record is represented as an object, stores the records in a List, and then computes through nested multi-level loops. Computations of this kind often also involve set operations and relations across large amounts of data, or the relative positions of objects or object properties. Implementing this underlying logic is quite cumbersome.
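To make the effort concrete, here is a minimal sketch of the consecutive-rise example written in plain Java in the manual style described above. The `Quote` class, its fields, and the `risingFor` method are hypothetical names chosen for illustration, and the sketch assumes the sheet has already been parsed into objects; reading the Excel file itself is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical record type: one row of the downloaded sheet.
final class Quote {
    final String code;   // share code
    final int day;       // trading-day index, used only for ordering
    final double close;  // closing price

    Quote(String code, int day, double close) {
        this.code = code;
        this.day = day;
        this.close = close;
    }
}

final class RisingShares {
    // Returns the codes of shares whose closing price rose on at least
    // n consecutive trading days (n rises need n + 1 rows).
    static List<String> risingFor(List<Quote> quotes, int n) {
        // Manual "group by code": collect the rows of each share.
        Map<String, List<Quote>> byCode = new LinkedHashMap<>();
        for (Quote q : quotes) {
            byCode.computeIfAbsent(q.code, k -> new ArrayList<>()).add(q);
        }

        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<Quote>> e : byCode.entrySet()) {
            List<Quote> rows = e.getValue();
            // Sorting is the one step the standard library does for us.
            rows.sort(Comparator.comparingInt((Quote q) -> q.day));

            // Relative-position logic: compare each row with the previous one.
            int streak = 0;
            for (int i = 1; i < rows.size(); i++) {
                streak = rows.get(i).close > rows.get(i - 1).close ? streak + 1 : 0;
                if (streak >= n) {
                    result.add(e.getKey());
                    break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data, not real market prices.
        List<Quote> quotes = Arrays.asList(
            new Quote("AAA", 1, 10.0), new Quote("BBB", 1, 20.0),
            new Quote("AAA", 2, 11.0), new Quote("BBB", 2, 21.0),
            new Quote("AAA", 3, 12.0), new Quote("BBB", 3, 20.5),
            new Quote("AAA", 4, 13.0), new Quote("BBB", 4, 22.0));
        System.out.println(risingFor(quotes, 3)); // prints [AAA]
    }
}
```

Even for this small query, the grouping, the ordering, and the comparison with the previous row all have to be spelled out by hand, and the period filter is not even included yet.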

Embedding a database and then loading the data into it through ETL is obviously an awkward approach. Is there a more agile and convenient method?

In this case, esProc is the best choice. It is a professional database computing and development tool.

esProc is good at simplifying complex computation, and it allows a Java application to access its results via JDBC. The esProc solution to this case is outlined below:

esProc can retrieve data directly from multiple databases, text files, and Excel sheets, and compute on it. It offers a grid-style interface and an agile syntax tailored to computation on large volumes of structured data. It supports external parameters, and its results can be exported via JDBC and consumed by Java code and reporting tools. In this way, esProc can boost Java's computational capability considerably. In addition, it enables cross-database computation and supports code reuse by design. Its debugging functionality is also well developed. Considering all these advantages, esProc is more efficient than SQL for this kind of computation.

Published at DZone with permission of its author, Jim King.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)