
Jim King is an expert in BI, databases, and Java development and programming. He has more than 10 years of experience in BI and in developing Java applications such as esProc and esCalc. He is passionate about applying simpler methods to solve users' complex problems. He is also an active blogger and has published a series of original ideas on BI and Java development, covering BI, database application development, and report development with Java and SQL. Jim has posted 23 posts at DZone.

How to Leverage Big Data like Google?

07.23.2013

Recently, I read "Why Big Data Projects Fail" by Stephen Brobst (http://data-informed.com/why-big-data-projects-fail). I could not agree more with his views, which expose a problem I have been worried about. In this article, I discuss the topic further to warn enterprises against falling into the same trap.

Let's look at a positive example. Google is a successful enterprise at leveraging big data. How does it make use of that data?

1.  Collect the raw data: capture the contents of each website, e-mail, or cookie, and extract the key information.

2.  Create complex, cross-referencing indexes for this information. Needless to say, advertisement-related indexes must also be created.

3.  Store these indexes and the corresponding contents on distributed servers.

4.  When users browse websites, run searches, or view e-mails, Google passes their requests through a complex translation procedure that locates the relevant index entries.

5.  Retrieve data from the servers according to the index entries, and return the search results or advertisements.
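
The five steps above can be sketched in a few lines of Python. This is only a toy illustration under loud assumptions: the sample documents, the function names, and the character-sum sharding rule are all hypothetical, the "servers" are in-memory dictionaries, and the query "translation" of step 4 is reduced to splitting the query into words. It says nothing about how Google actually does any of this.

```python
from collections import defaultdict

def extract_terms(text):
    """Step 1 (toy version): extract 'key information' as lowercase words."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return [w for w in words if w]

def build_index(documents):
    """Step 2: build an inverted index mapping each term to document ids."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in extract_terms(text):
            index[term].add(doc_id)
    return index

def shard_for(term, num_shards):
    """Pick a shard for a term (hypothetical rule: sum of code points)."""
    return sum(ord(c) for c in term) % num_shards

def distribute(index, num_shards):
    """Step 3: spread index entries over several 'servers' (plain dicts here)."""
    shards = [dict() for _ in range(num_shards)]
    for term, doc_ids in index.items():
        shards[shard_for(term, num_shards)][term] = doc_ids
    return shards

def search(query, shards):
    """Steps 4 and 5: map the query to index entries, then retrieve matches."""
    terms = extract_terms(query)
    if not terms:
        return []
    result = None
    for term in terms:
        shard = shards[shard_for(term, len(shards))]
        doc_ids = shard.get(term, set())
        # Keep only documents that contain every query term.
        result = doc_ids if result is None else result & doc_ids
    return sorted(result)

# Hypothetical sample data.
documents = {
    1: "Big data storage on Hadoop",
    2: "Business analysis of big data",
    3: "Email advertising index",
}
shards = distribute(build_index(documents), num_shards=3)
print(search("big data", shards))  # [1, 2]
```

Even in this toy, the storage and lookup parts (`distribute`, the dictionary reads in `search`) are mechanical. The parts that decide what is worth indexing and how a request maps to index entries (`extract_terms`, `build_index`, the matching rule in `search`) are where the real design effort would go.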

Which of the steps above relate to the Hadoop architecture? Steps 3 and 5, that is, data storage and data retrieval.

Can steps 3 and 5 be implemented easily? Yes. Hadoop-like solutions scale well and have a low purchase cost.

Can you operate like Google once you have implemented steps 3 and 5? No, because you have not yet implemented the key steps, 2 and 4.

What are steps 2 and 4? They are business analysis algorithms: algorithms meticulously designed by business experts on the basis of data, business knowledge, and market trends. For many enterprises, they are a core competency and the heart of the business decision-making procedure. This is the "Value" component of the 4V theory.

Why do big data projects fall into this trap? Because current big data technology only provides a solution for data storage and query. It lacks a good solution for the most crucial part: business analysis that enhances competitiveness. There is a great gap between the two. In fact, current big data technology is a tool for IT experts. They are able to implement MapReduce functions in C++ or Java, but they are unable to reach the ultimate goal of providing valuable business algorithms.
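
To make the gap concrete, here is a minimal single-process sketch of the MapReduce pattern in Python. It is not the Hadoop API; the helper names, the record fields (`region`, `amount`), and the sample data are all hypothetical. It shows the kind of function an IT expert can readily write: map each record to a key/value pair, group by key, and reduce each group.

```python
from collections import defaultdict

def map_phase(records, mapper):
    """Apply the mapper to every record and yield its (key, value) pairs."""
    for record in records:
        yield from mapper(record)

def shuffle(pairs):
    """Group values by key, as the framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    """Apply the reducer to each key and its list of values."""
    return {key: reducer(key, values) for key, values in groups.items()}

def run_job(records, mapper, reducer):
    """Run map, shuffle, and reduce in sequence on in-memory data."""
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)

# Hypothetical job: total sales amount per region.
def sales_mapper(record):
    yield record["region"], record["amount"]

def sum_reducer(key, values):
    return sum(values)

sales = [
    {"region": "east", "amount": 120},
    {"region": "west", "amount": 80},
    {"region": "east", "amount": 30},
]
totals = run_job(sales, sales_mapper, sum_reducer)
print(totals)  # {'east': 150, 'west': 80}
```

The plumbing is generic and reusable. What it cannot supply is the business question itself: which keys to group by, which measures matter, and what to do with the totals. That knowledge has to come from business experts.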

To avoid this trap, enterprises must use advanced analysis tools that are oriented to business experts, usable regardless of the user's technical background, and able to convert business logic into business algorithms rapidly, intuitively, and conveniently. How about NoSQL or SQL? Neither is ideal. They are for IT personnel only, because they require a strong technical background, involve complex operations, and offer comparatively weak computation capability.

What are the ideal tools for business experts? From a TCO (total cost of ownership) perspective, I would rather choose the lightweight R language and esProc Desktop than pin my hopes on the heavyweight Teradata Aster and SAP Visual Intelligence. esProc in particular is a desktop business computation tool designed for business experts: its syntax is easy to use and understand, and its technical requirements are low. The scripts are aligned automatically, allowing users to observe the results of each step clearly and visually. The results can be referenced directly by cell name, enabling users to compute freely according to business logic.

Published at DZone with permission of its author, Jim King.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)