Dmitriy Setrakyan manages daily operations of GridGain Systems and brings over 12 years of experience spanning all areas of application software development, from design and architecture to team management and quality assurance. His experience includes architecture and leadership in the development of distributed middleware platforms, financial trading systems, CRM applications, and more. Dmitriy is a DZone MVB, is not an employee of DZone, and has posted 57 posts at DZone.

Google App Engine - Where Does It Fit?

04.17.2009

This is yet another blog post about the recent release of Google App Engine (GAE) for Java. Naturally, as an interested party, I rushed to deploy my sample GridGain app on it, but quickly realized that you cannot deploy any sort of clustering or grid application on Google App Engine. This is simply not something they wish to support, happily leaving this portion of the market to Amazon EC2.

A good way to think about GAE for Java is as plain vanilla servlet container hosting. If you have a standard J2EE app that simply processes web requests and accesses DB data, then GAE is an ideal solution for you and will be much easier to use than Amazon EC2. But if your application needs to do anything beyond retrieving and showing DB data, then with GAE you will run into a brick wall. So, if GAE fits, it fits like a glove, but if it doesn't - then there is nothing you can do to help it.

Here is the list of some limitations you may wish to consider when starting with GAE:

1. You have no control over number of deployment instances.
This may be a non-issue for trivial use cases, but you immediately hit a wall if your application requires any knowledge of clustering or even of threads. For example, let's say you need to generate a web report that would normally take about 1 minute to generate in a single thread, which is too long for a web request (GAE will automatically time it out after 30 seconds). Ideally, to make it run faster, I would split it into multiple sub-parts, run them in parallel on several nodes, aggregate the results and produce a response to the user within, say, 5 seconds.

However, there is nothing you can do to speed it up on GAE. Firstly, you have no idea how many parts to split the report into, and even if you did, there is no way to tell GAE that every part should run on a separate CPU or on a separate server instance altogether. On top of that, what if 9 out of 10 sub-parts completed successfully and 1 failed? Again, you cannot instruct GAE to retry just the part that failed - the whole request will have to be retried.
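To make the split, aggregate and retry idea concrete, here is a minimal sketch of how it could look on an ordinary JVM where you are free to start threads, which is exactly what the GAE sandbox does not allow. It uses a local thread pool as a stand-in for separate nodes; the class name ParallelReport, its methods and the toy "report" (a sum of integers) are hypothetical and come from neither GAE nor GridGain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only: a thread pool plays the role of a grid.
final class ParallelReport {

    // Stand-in for one expensive slice of a report: sums the integers in [from, to).
    static long computePart(int from, int to) {
        long sum = 0;
        for (int i = from; i < to; i++) {
            sum += i;
        }
        return sum;
    }

    // Splits [0, total) into `parts` slices, one task per slice.
    static List<Callable<Long>> split(int total, int parts) {
        int chunk = (total + parts - 1) / parts;
        List<Callable<Long>> tasks = new ArrayList<>();
        for (int p = 0; p < parts; p++) {
            final int from = Math.min(total, p * chunk);
            final int to = Math.min(total, from + chunk);
            tasks.add(() -> computePart(from, to));
        }
        return tasks;
    }

    // Runs all tasks in parallel, retries each failed task once, and sums the results.
    static long aggregate(List<Callable<Long>> tasks, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = pool.invokeAll(tasks);
            long result = 0;
            for (int p = 0; p < tasks.size(); p++) {
                try {
                    result += futures.get(p).get();
                } catch (ExecutionException firstFailure) {
                    // Retry only the slice that failed, not the whole report.
                    try {
                        result += tasks.get(p).call();
                    } catch (Exception retryFailure) {
                        throw new IllegalStateException("Part " + p + " failed twice", retryFailure);
                    }
                }
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for parts", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

A call such as ParallelReport.aggregate(ParallelReport.split(1000, 10), 10) runs ten slices concurrently and re-runs only a slice that threw. A real grid would also have to deal with node discovery and failover, which this sketch leaves out.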

2. You have no control over load balancing
Not every application is a website and not every application can be measured in number of page hits. The underlying notion here is that if you are not a website, then GAE is not for you. However, even if you are a website, not being able to control load balancing can be quite limiting. If you take my report generation example above, GAE again would not be able to load balance it at all.

3. You cannot use any of the existing clustering infrastructure you have
If you already make use of clustering within your application (like discovery of different app server nodes, exchanging messages between app servers, using existing caching or compute grid products, etc.), you cannot do it in GAE. You are limited to the set of caching and data storage libraries provided by GAE only.

4. You cannot use full set of the JDK classes or services
Google published a JRE White List of the JDK classes supported by App Engine. The main limitation there is the lack of the AWT and Swing packages. Although not often used in web apps, this may be quite limiting for apps that use AWT for purposes other than thick UI, like dynamic image generation. I should mention that GAE does offer its own image manipulation service to compensate for this.
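For readers who have not used AWT outside of a desktop UI, the sketch below shows the kind of server-side image generation meant here: drawing a small bar into an in-memory image and encoding it as PNG. It assumes a standard JDK with the java.awt and javax.imageio packages available, which is precisely what the White List rules out on GAE; the class name BarImage and its methods are made up for illustration.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

// Hypothetical illustration of AWT used for image generation, not for a UI.
final class BarImage {

    // Renders a 100x10 white image with a blue bar `percent` pixels wide.
    static BufferedImage render(int percent) {
        int width = Math.max(0, Math.min(100, percent));
        BufferedImage image = new BufferedImage(100, 10, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = image.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, 100, 10);
            g.setColor(Color.BLUE);
            g.fillRect(0, 0, width, 10);
        } finally {
            g.dispose();
        }
        return image;
    }

    // Encodes the image as PNG bytes, ready to be written to an HTTP response.
    static byte[] toPng(BufferedImage image) {
        // Keep ImageIO from using a temporary cache file on disk (global setting).
        ImageIO.setUseCache(false);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            ImageIO.write(image, "png", out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

A servlet would typically write BarImage.toPng(BarImage.render(40)) to the response with content type image/png. Code like this, or a library that depends on it, is what would have to be ported to GAE's own image service.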

5. You cannot access or store files
The only way to access a file is to put it in the WAR and then get it off the classpath. Creation of files is prohibited - you can only store data using GAE datastore services. This alone can become a show-stopper for many applications, as even if your application does not access or store any files, it may very well depend on libraries that do.
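Reading a file "off the classpath" can be sketched with nothing but standard JDK calls, as below. This is an illustrative helper rather than anything GAE-specific: the class name ClasspathFiles is hypothetical, and the example uses InputStream.readAllBytes, which needs Java 9 or later and so postdates the runtime discussed in this post.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical helper: reads a text resource packaged with the application.
final class ClasspathFiles {

    // Returns the resource content as UTF-8 text, or empty if it is not on the classpath.
    static Optional<String> readText(String resourcePath) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        // getResourceAsStream returns null when the resource does not exist.
        try (InputStream in = loader.getResourceAsStream(resourcePath)) {
            if (in == null) {
                return Optional.empty();
            }
            return Optional.of(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For a file packaged under WEB-INF/classes, a call could look like ClasspathFiles.readText("reports/template.txt"), where the path is an invented example. Note that this only covers reading: there is no equivalent trick for writing files.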

6. You don't get any access to the box your app is deployed on
From GAE's standpoint, this limitation is justified. GAE is running multiple WARs from different users on the same app server, so by giving you direct access to the box, they would be giving you access to code that does not belong to you. On the other hand, not being able to access the box will scare many traditional sys admins who generally like to poke around production environments, look at OS-specific logs, do tuning, and other environment and runtime geeky stuff.

I will stop here as far as listing limitations. I think it is quite clear that GAE is not targeting enterprises as their customers. On the contrary, it looks like they are willingly leaving this portion of the market to Amazon EC2 and are concentrating on simple web application hosting, which is a huge market on its own. The way they are beating Amazon here is by being elegantly simpler to use and deploy - there is no image creation, nor should the user be concerned about the number of images that are started. GAE will automatically manage web load and add CPUs to your app as load changes.

So, if you are a small business owner and need a quick and cheap hosting solution - go with GAE!

From http://gridgain.blogspot.com/

Published at DZone with permission of Dmitriy Setrakyan, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Neeraj Vora replied on Fri, 2009/04/17 - 8:08am

GAE of course is a different league than EC2 and the purpose and architectures are different. There are limitations by design and I think you missed the point. The whole point is to strive towards giving the GAE users (developers) scalability without all the cloud mumbo-jumbo. Google is building a scalable infrastructure where the developers can play without knowing how the playground is built.

1. You have no control over number of deployment instances.
You don't have to, as long as your app functions.

2. You have no control over load balancing.
You don't have to, as long as your app scales.

3. You cannot use any of the existing clustering infrastructure you have.
If you have figured out scalability you don't need GAE.

4. You cannot use full set of the JDK classes or services.
Yes, this is something under debate but it's not a straightforward black and white thing.

5. You cannot access or store files.
Again something valid

6. You don't get any access to the box your app is deployed on.
You don't need to as long as your app functions and scales.

You are right in your concluding paragraph about elegance and simplicity and I think even enterprise customers can find it appealing as long as the application domain fits into the hammer GAE is providing.

Alessandro Santini replied on Fri, 2009/04/17 - 8:22am

Barev Dmitriy,

I understand you have - being part of GridGain - a personal interest in dismissing GAE as a toy for developers and not ready for the enterprise.

The reality is that many high-performance, clustered applications do not need to know that they are operating (and thus scaling) in a clustered environment. GAE is also providing a distributed cache facility (with some quota limitations) and persistent storage using JDO (the choice of JDO still slips from my understanding).

I instead accept your point when talking about *distributed* applications by design (e.g. queue networks).

In conclusion, I personally think that not being aware of a cloud/clustered environment is an advantage rather than a defect.

Fabrizio Giudici replied on Fri, 2009/04/17 - 10:46am in response to: Alessandro Santini

Alessandro, GAE/J supports JPA too in addition to JDO (see http://groups.google.com/group/google-appengine-java/web/will-it-play-in-app-engine).

While some of the points by Dmitriy are questionable (as discussed in the first comment), I don't think he's "dismissing" GAE/J as a toy. Clearly GAE/J serves a much smaller scope than EC2 and all the other similar things. For some it could be still interesting, if they just need a simple infrastructure. For others it could be considered as a "toy" if they can't do something because of the runtime restrictions. For me it's useless because of the lack of the imaging stuff.

BTW, somebody says that Google provides an "alternate" engine for imaging - I don't think I'll ever use it (yet another!) even though all my projects rely on a meta-imaging framework that could probably be adapted to GAE/J stuff. But, for the sake of curiosity, can somebody please give me a pointer? Thanks.

Alessandro Santini replied on Fri, 2009/04/17 - 12:22pm in response to: Fabrizio Giudici

Ciao Fabrizio,

thanks for your comment and for the pointer to JPA. I want to share this one also, http://code.google.com/appengine/docs/java/datastore/usingjpa.html, very informative as to limitations when using JPA.

Yes, the verb to dismiss may be inappropriate (hopefully won't offend anyone) but I still read that AppEngine does not scale because neither your apps nor yourself are in control of the clustering/cloud infrastructure; I still believe that this point is highly debatable, especially if it is not supported by a benchmark of some kind.

My conclusions are:

  • We are comparing apples and oranges - GAE/J and EC2 are different in goal, scope and cost
  • GAE/J is not necessarily unsuitable for enterprise applications - I still did not see any benchmark comparing a cloud instance with a GAE/J application. Most importantly, we do not know how it will scale in the future, considering that they may improve the underlying infrastructure.

Hope this clarifies my intent.

P.S.: as to the imaging API, the only link I know is http://code.google.com/appengine/docs/java/images/ but I am sure you have this already.

Neeraj Vora replied on Fri, 2009/04/17 - 1:07pm in response to: Alessandro Santini

>>We are comparing apples and oranges - GAE/J and EC2 are different in goal, scope and cost


Very, very true. EC2 is renting you land on which you build a playground, play in it and then tear it down. You are charged for the time you rent the land, whether or not you play in it all the time (it has a robot which you can instruct on how to build the playground each time). GAE/J gives you a ready custom playground to play in (you need to like the playground). It then charges you by the ride. If you rest all day in this playground, then you have a free pass.

Andy Jefferson replied on Fri, 2009/04/17 - 1:31pm

>> and a persistent storage using JDO (the choice of JDO still slips from my understanding).

Perhaps because it is not an RDBMS datastore behind it and that JPA was designed solely with RDBMS in mind, whereas JDO was and still is designed to be datastore agnostic.

Perhaps because JDO brings way more capabilities to the user's plate than JPA (1 or "2") does.

At the end of the day the user has both (as well as now also having a REST API to their datastore) and can choose for themselves. Regarding the limitations they currently have in their plugin, some of those may be removed in future revisions since it is early access.

--Andy (DataNucleus)

Alessandro Santini replied on Fri, 2009/04/17 - 4:15pm in response to: Andy Jefferson

Hi Andy,

thanks a lot for your comment. You are indeed right - that is the most probable reason.

Alex(JAlexoid) ... replied on Sun, 2009/04/19 - 2:00am

I am sorry, but you are exactly 1 year off with your overview. Since day one of GAE with Python, it was obvious what Google would provide for you and what they would limit for your own sake. All the same points are valid about Python, therefore valid about GAE as a whole.

GAE is a PaaS, while Amazon EC2 is an IaaS. Please know the distinctions before comparing the two.
