Distributed OSGi - Tilting at Windmills
I've been an enthusiastic champion of OSGi as a modularity standard for Java-based enterprise computing. However, I have a very definite boundary in mind for the role OSGi should play: I draw the line at a single running JVM instance. I don't want to use OSGi as a cornerstone for distributed computing modularity.
In a recent eWeek article, though, we see there is a movement afoot to do precisely that:
Distributed OSGi Effort Progresses
This excerpt makes it quite clear what this initiative is all about:
The Distributed OSGi effort also seeks to enable "a service running in one OSGi framework to invoke a service running in another, potentially remote, OSGi framework (meaning a framework in a JVM)," Newcomer wrote. As the current OSGi standard only defines how services talk to each other only within a single JVM, "extensions are needed to allow services to talk with each other across multiple JVMs — thus the requirements for distributed OSGi on which the design is based," he said.
I see such an effort as yet another attempt to reinvent what has already been tried several times before, namely distributed object computing, by the likes of such technologies as DCOM, CORBA, and even JEE/EJB (à la Session beans and RMI). What do these technologies all have in common? They have fallen into disuse after befuddling and frustrating many a programmer (I personally fell on the DCOM sword back in the mid-90s).
Interface Calling Behavior Semantics
I have a by-line that I sometimes sign my blog postings with:
Let me get right to it by lobbing some grenades: I recognize two arch evils in the software universe – synchronous RPC and remoted interfaces of distributed objects.
The sentiment being expressed here is that it is wrong-headed to try to make synchronous method invocation transparent over the network. Yet that is the grand illusion that all these distributed object solutions strive to accomplish.
The problem is that an object interface that may be perfectly usable in a local JVM context — where call chain invocation can be sub-millisecond — will not have the same behavior semantics when invoked over a network connection. The method invocation may take 15 to 50 milliseconds on a good day; or may fail due to low-level transport errors (which never existed when it was being invoked in a local JVM context); or just time out without completing the call; or even never return/complete at all in any acknowledged fashion.
The consuming software code that used a method in a local JVM context has to now be designed to anticipate a wide range of different situations, as the calling contract of the method is radically different depending on the context in which it is being invoked. The advocates of distributed object computing, however, want us to believe in a grand illusion: that modules that were written and proved out for use in a local JVM context can be shifted to be consumed in a distributed context — and where consuming code, presumably, doesn't have to be any the wiser (or at least not very much wiser).
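To make the change in calling contract concrete, here is a minimal Java sketch. It is an illustration under stated assumptions, not code from OSGi, RMI, or EJB: the class `RemoteCallSketch`, the `Outcome` enum, and the timeout values are all hypothetical. It shows the extra outcomes a caller has to plan for once the same method call crosses a network.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical sketch: the outcomes a caller must plan for once a method
// call crosses the network. None of these names come from a real framework.
final class RemoteCallSketch {

    enum Outcome { OK, TRANSPORT_FAILURE, TIMED_OUT }

    // 'remoteCall' stands in for whatever actually performs the network I/O.
    static Outcome invoke(Supplier<Double> remoteCall, long timeoutMillis) {
        CompletableFuture<Double> pending = CompletableFuture.supplyAsync(remoteCall);
        try {
            pending.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return Outcome.OK;
        } catch (TimeoutException e) {
            // No answer in time; the caller cannot tell whether the work happened.
            return Outcome.TIMED_OUT;
        } catch (ExecutionException e) {
            // The call itself failed, for example because the connection dropped.
            return Outcome.TRANSPORT_FAILURE;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Outcome.TRANSPORT_FAILURE;
        }
    }

    // Simulates a peer that takes 'millis' to answer.
    static Supplier<Double> slowCall(long millis) {
        return () -> {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return 1.0;
        };
    }

    public static void main(String[] args) {
        System.out.println(invoke(() -> 42.0, 1000));
        System.out.println(invoke(() -> { throw new IllegalStateException("connection reset"); }, 1000));
        System.out.println(invoke(slowCall(500), 50));
    }
}
```

A purely local call only ever lands in the first case (or throws a domain exception); the other two cases exist solely because of the transport, which is why code written against the local contract is not prepared for them.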
Of course, some may recall JEE/EJB Session beans, where methods invoked via RMI had the potential to raise exceptions related to I/O transport errors. The upshot is that one had to design software from the outset to be a consumer of a "distributed" object interface vs. just a consumer of its local interface. Also, it was not long before EJB developers discovered that an object interface that made sense for a local JVM calling context would become a performance liability in a distributed computing context. Using the interface as designed gave rise to chatty round trips over the network, and the latency and unreliability made the software visibly sluggish. It is most dismaying to see all the enterprise software systems that have sluggish and problematic user interface behavior because the application was written on a foundation of synchronous use of distributed object interfaces. That distributed object Kool-Aid that folks drank proved to be spiked heavily with toxic radiator fluid.
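The cost of chatty round trips is simple arithmetic, sketched below in Java. The figure of 20 fine-grained calls is a hypothetical example; the 15 and 50 millisecond latencies are the per-call figures quoted earlier in this article. The class and method names are made up for illustration.

```java
// Hypothetical sketch: latency cost of sequential synchronous calls.
final class RoundTripCost {

    // Total milliseconds spent waiting when 'calls' invocations are made
    // one after another and each pays 'latencyMillis' on the network.
    static long totalMillis(int calls, long latencyMillis) {
        return calls * latencyMillis;
    }

    public static void main(String[] args) {
        int fineGrainedCalls = 20; // assumed: getter-style calls needed to fill one screen

        System.out.println("20 calls at 15 ms each: " + totalMillis(fineGrainedCalls, 15) + " ms");
        System.out.println("20 calls at 50 ms each: " + totalMillis(fineGrainedCalls, 50) + " ms");
        System.out.println("1 coarse call at 50 ms: " + totalMillis(1, 50) + " ms");
    }
}
```

Under these assumptions an interface that costs effectively nothing locally costs between 300 ms and a full second per screen when used remotely as designed, while a single coarse-grained request stays at one round trip.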
In essence, EJB developers found out that an object-oriented approach to software design could not be transparently shifted to a distributed computing context. The OOP software systems that tried to make that leap devolved into a quagmire of issues that had to be battled.
Interface Version Management
On the basis of this one failing alone, distributed object computing has been one of the most colossal architectural mistakes of the last 15 years in the IT industry. Yet the failings don't stop there: the other equally perplexing obstacle to this undertaking is object interface version management.
I tend to think that the versioning dilemma is perhaps even more insidious than the synchronous distributed method call semantics problem. One encounters the issues of call semantics fairly early on. The interface versioning dilemma, however, arises gradually over time, and then mounts up and becomes one of the greatest headaches that one battles in trying to keep deployed distributed software systems coherent.
One of the popular agile OOP developer practices of recent years is frequent re-factoring of code. Indeed, all of the popular IDEs in use are adept in assisting the developer with re-factoring. Re-factoring may well be a good thing in a context where the development team gets to control all the deployment pieces that are impacted by such change. However, in a distributed computing context, which is usually heterogeneous, re-factoring would just be asking for misery.
Making changes to how distributed software systems interact, where multiple development teams and/or companies are involved, is a process akin to wisdom tooth extraction (the difficult kind where the dentist has to work for a few hours to break the tooth apart and bring it out in pieces). The simplest of changes can be tedious to negotiate, politics of some variety often intrudes, and it is often challenging to schedule releases so that they synchronize with one another and deployment can occur.
As such, versioning of distributed object interfaces has been proffered as the means for coping with this. One team can come out with a new and improved interface to an existing object and deploy it unilaterally. Other parties whose software consumes the interface of said object can catch up to the new interface as they are able. In the meantime the older interface remains in place so that existing deployed software keeps working.
On paper, versioning looks like a workable approach: the significant distributed object solutions have all had provision for versioning interfaces. In practice it can even be done a few times. However, for large, complex enterprise software systems, maintaining interface versions gets to be burdensome. One of the reasons is that by their very nature object interfaces are very brittle. The information state that is exchanged tends to be very explicitly coupled to the interface signature. It can be hard to significantly (or meaningfully) evolve the software implementation of an object without impacting the object's interface. Once that happens, the interface has to be versioned, and a sense of dread then sets in.
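A small Java sketch shows how this plays out. All names here (`CustomerLookupV1`, `CustomerLookupV2`, `CustomerLookup`, and the `region` parameter) are hypothetical, chosen only to illustrate the pattern, and do not come from any real system.

```java
// Hypothetical version 1 of a remoted interface, already deployed to consumers.
interface CustomerLookupV1 {
    String findName(String customerId);
}

// Hypothetical version 2: the implementation now needs a region,
// so the exchanged state changes and the signature changes with it.
interface CustomerLookupV2 {
    String findName(String customerId, String region);
}

// The implementation must keep answering V1 callers until all of them migrate.
final class CustomerLookup implements CustomerLookupV1, CustomerLookupV2 {

    @Override
    public String findName(String customerId, String region) {
        return region + ":" + customerId;
    }

    @Override
    public String findName(String customerId) {
        // A guessed default stands in for information V1 callers cannot send.
        return findName(customerId, "default");
    }

    public static void main(String[] args) {
        CustomerLookup lookup = new CustomerLookup();
        System.out.println(lookup.findName("42"));       // old-style caller
        System.out.println(lookup.findName("42", "eu")); // new-style caller
    }
}
```

One extra parameter already requires a second interface, an adapter method, and a guessed default. Every further change to the exchanged state repeats the exercise, which is how the maintenance burden accumulates.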
OOP Does Not Work Well In Distributed Context
As to distributed object computing, quagmire is the operative word here. Quite simply, OOP does not really work in a distributed computing context: too many difficult issues become entangled for it to be worth the while.
It is fascinating to see new generations of software engineers getting lured into re-inventing distributed object computing over and over and over again. And a lot of the same computer industry corporate players get right behind these endeavors every time. These kinds of systems become complex and grandiose, and thus they seem to be excellent sticky fly traps for luring in developers. Think of the legions of developers (and their companies) that have floundered on DCOM, CORBA, JEE/EJB, and WS-* (death star), and now let's add Distributed OSGi. Distributed object computing is our industry's Don Quixote tilting-at-windmills endeavor.
Asynchronous Messaging and Loose-Coupling Message Format Techniques
So what is the alternative?
When it comes to distributed computing interactions, try following these guidelines:
- Design the software from the outset around asynchronous interactions. (Retrofitting synchronous software designs to a distributed context is a doomed undertaking that will yield pathetic/problematic results.)
- Prefer messaging to RPC or RMI style interface invocation.
- Attempt to use messaging formats that are intrinsically non-brittle. If designed with forethought, messaging formats can later be enhanced without impacting existing deployed software systems. (The entire matter of versioning interfaces can be dodged.)
- Build in robust handling (and even auto-recovery) of transport related failure situations.
- Never let the user interface become unresponsive due to transport sluggishness or failure situations. A user interface needs to remain responsive even when over-the-wire operations are coughing and puking. (So distributed interaction I/O always needs to be done asynchronously to the application's GUI thread.)
- Keep transport error handling orthogonal to the semantics of messaging interactions. (Don't handle transport error and recovery at every place in the application where a messaging interaction is being done. Centralize that transport error handling code to one place and do it very well just one time.)
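As one way to picture a non-brittle message format, here is a minimal Java sketch of a consumer that reads a message as key/value pairs, ignores fields it does not know, and supplies defaults for fields that older senders omit. The field names (`orderId`, `currency`, `giftWrap`) and the `USD` default are illustrative assumptions, not part of any real protocol.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical consumer of a key/value order message.
final class OrderMessageReader {

    static String describe(Map<String, String> message) {
        // Read only the fields this consumer understands; ignore the rest.
        String orderId = message.getOrDefault("orderId", "unknown");
        // 'currency' is assumed to have been added in a later revision;
        // older senders omit it, so a default keeps them working.
        String currency = message.getOrDefault("currency", "USD");
        return orderId + " in " + currency;
    }

    public static void main(String[] args) {
        Map<String, String> oldStyle = new HashMap<>();
        oldStyle.put("orderId", "A1");

        Map<String, String> newStyle = new HashMap<>();
        newStyle.put("orderId", "A2");
        newStyle.put("currency", "EUR");
        newStyle.put("giftWrap", "true"); // unknown to this reader, silently skipped

        System.out.println(describe(oldStyle)); // A1 in USD
        System.out.println(describe(newStyle)); // A2 in EUR
    }
}
```

Because neither side depends on a fixed signature, a sender can add fields and a receiver can start using them on separate schedules, without an interface version bump.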
A key point to appreciate is that transport error handling and recovery is very different from the matter of business domain logic error processing. The two should not be muddied together. A lot of software written around RPC or RMI style interface method invocation does exactly that, though. Messaging-centric solutions usually permit a single transport error handler to be registered such that this need be coded just once. The rest of the application code can then concentrate on business domain logic processing.
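The idea of registering one transport error handler can be sketched in a few lines of Java. This is an assumption-laden illustration rather than the API of any real messaging product: `MessagingClient`, its `send` method, and the `Supplier` standing in for the wire-level operation are all hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical messaging client with a single, centrally registered
// transport error handler.
final class MessagingClient {

    private final Consumer<Throwable> transportErrorHandler;

    MessagingClient(Consumer<Throwable> transportErrorHandler) {
        this.transportErrorHandler = transportErrorHandler;
    }

    // Sends asynchronously. 'transportCall' stands in for the real wire I/O;
    // the business code supplies only 'onReply'.
    CompletableFuture<Void> send(Supplier<String> transportCall, Consumer<String> onReply) {
        return CompletableFuture.supplyAsync(transportCall)
                .handle((reply, error) -> {
                    if (error != null) {
                        // Transport failed: route to the one central handler.
                        transportErrorHandler.accept(error);
                    } else {
                        // Transport succeeded: only now does business logic run.
                        onReply.accept(reply);
                    }
                    return null;
                });
    }

    public static void main(String[] args) {
        MessagingClient client =
                new MessagingClient(error -> System.out.println("transport problem: " + error));

        client.send(() -> "pong", reply -> System.out.println("got " + reply)).join();
        client.send(() -> { throw new IllegalStateException("broker down"); },
                reply -> System.out.println("got " + reply)).join();
    }
}
```

The call sites contain no transport handling at all. An exception thrown by the business callback itself is not passed to the transport handler; it surfaces through the returned future, which keeps the two kinds of error apart.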
AJAX and Flex Remoting
A messaging approach is a sound basis for designing distributed application architecture — especially when one does not control all the end-points. More recently I have been designing architecture for Flex-based web RIA applications. In these apps, the client uses Flex asynchronous remote method invocation to access services on the server. Adobe's BlazeDS is embedded in the server to facilitate: remoting calls, marshaling objects between ActionScript3 and Java, message push to the client, and bridging to a JMS message broker.
You may think that I'm not exactly following my own advice. However, there are special circumstances at play:
- Flex I/O classes support asynchronous invocation, so the operation does not block the main GUI thread of the app.
- Flex I/O classes invoke closures to process return results; also, a fault closure can be supplied to handle transport related errors. Consequently a programmer can write one very robust fault handling closure and reuse it in all I/O operations. Thus Flex does an excellent job of segregating business logic processing from transport-related error handling.
- Flex client .swf files are bundled together with their Java services into the same .war deployment archive. Consequently, the client-tier and the server-tier are always delivered together and thus will not drift out of version compliance.
The last point is worth further remark: The way the two distributed end-point tiers are bound together into a single deployment unit makes for a situation that is nearly like delivering an old-school monolithic application. Flex supports Automation classes so that tools, such as RIATest, can be used to automate testing of the client UI. Consequently the client can be scripted to regression test against service call interactions. Thus the deployment unit is the subject of due QA attention, and even though the two tiers are not subjected to the same compiler, at least they actually get tested together prior to release.
If the service call interfaces are refactored, then the client can be refactored at the same time. Typically this is even done within the same IDE (such as Eclipse with the Flex Builder plugin). The Flex code and the Java code each have refactoring support. Flex unit tests could then be used within the development context to verify call interface validity.
Google GWT applications have a similar characteristic, where asynchronous method invocation is supported for invoking services on the server tier. Client tier Java code and services Java code are developed jointly and can be packaged into a single deployment unit.
AJAX web applications may be another case where the client tier and the server tier are often deployed together.
Conclusion
So the takeaways from this discussion are:
- If you can't control both end-points, stick with messaging and loosely coupled message format design. Be very mindful of the versioning dilemma from the outset and plan with forethought. The best outcome is to be able to evolve message formats without breaking end-points that have not yet been upgraded. Try very hard to dodge the burden of version management of installed end-points.
- If you can deliver both end-points from a common deployment unit, then method invocation of remote object interfaces can be okay. However, stick with the technologies that support asynchronous I/O. Separation of the concerns of business logic processing on return results from transport fault handling is the ideal.
Related Links
Building Effective Enterprise Distributed Software Systems
XML data "duck typing"
Flex Async I/O vs Java and C# Explicit Threading
(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)







Comments
enewc replied on Mon, 2008/06/16 - 3:42pm
Roger Voss replied on Mon, 2008/06/16 - 9:46pm
That which we call a rose by any other name would smell as sweet (or, more apt in this case, be just as prickly).
So in other words, Distributed OSGi will simply facilitate OSGi bundles to play within the distributed computing protocols, frameworks, etc., that already exist. Okay, that makes all the difference in the world.
BTW, there's no need for Distributed OSGi to do that. Today we create OSGi bundles where classes implement services invoked by web clients (i.e., the Flex web RIA case I cite).
So gee, just like EJB Session beans, exposed classes or interfaces in an OSGi bundle will be able to sport the notion of remote use and/or local use. Yeah, yeah - we've been there and done that several times over by now. In my case, I first implemented this concept for C++ classes back in 1993/94 - prior to the official advent of DCOM by my employer at the time.
Peter Kriens replied on Tue, 2008/06/17 - 4:57am
My biggest fear is that one day I tell people that it can not be done because I could not in the past. Seymour Cray always hired young people because they did not know it could not be done ... Over time things change and the general experience in the industry shifts what works and what does not work. The landscape evolves over time and one should take that into account. I built my first distributed system in 1980 and I have seen and worked/played with all the standards you mention and understand many of your objections.
However, I do think the distributed OSGi work has a fighting chance. OSGi has two crucial features that are lacking in any other environment I know. The first is that a service is not guaranteed to be there. The dynamics of a service are an intrinsic part of the specifications. This basic lack of guarantee maps very well to some of the 7 fallacies of distributed computing: we never promised reliable services. This has caused a flurry of programming models like Spring-DM, Declarative Services, iPOJO, SAT, etc. that all know how to handle the coming and going of services. The other feature, of course, is that OSGi is very strong on versioning; I do not know any other environment where you can actually ask the version of your code and match it to the remote site.
The other aspect is that by having strong modularization we hide many details from the programmers. This allows the distribution to mostly take place as a configuration. There is a huge timing difference between local and remote when you use a dial-up line, but with today's gigabit fiber in a data center it is much less of an issue. I think there are many practical use cases where the deployer will think distributed OSGi is a godsend.
Obviously, I am not saying we will solve all those problems you mention. However, no single environment does. OSGi provides an interesting programming model with Java, bundles, and services. The plan is to add an optional service to the mix that allows you to make a service available over any of the myriad of distributed protocols in the market and to use the function of a remote facility with little effort, at the choice of the deployer. Just as there is the optional Event Admin that lends itself very well to the asynchronous style you promote.
OSGi is about modularization. Modularization allows you to mix and match components at runtime, and a distribution service seems an important part of the toolbox. It is not our place to tell people to use asynchronous calls when in many cases the object-oriented remote procedure style is quite popular. I do believe that OSGi should try to provide components for many architectural styles and then companies can work with what works best for them. And let's face it, there is no single architectural style that suits all.
Personally, I would likely choose your proposed route if I were responsible for a large distributed system (hmm, that was exactly what we did in 1980). However, today web services, RMI, and CORBA are out there and used in lots of places. Making it easier to live in that world seems a commendable task?
Kind regards,
Peter Kriens
Richard Nicholson replied on Tue, 2008/06/17 - 7:34am
Roger,
At the start of Infiniflow presentations I frequently state, "the same old monolithic / brittle distributed systems can be built with the best / most agile of enablers" - OSGi included :(
Completely agree with your conclusions. Indeed, given "Deutsch's 8 Fallacies", these conclusions should be self-evident to any seasoned distributed engineer. Of course - they are not.
That said, I think Eric/Peter are correct (from my limited understanding); OSGi RFC119 does little to constrain the actual form of a distributed architecture; a solution complying with RFC119 may be asynchronous message based or synchronous RPC. Hence, to my mind OSGi neither prevents nor encourages the propagation of the same old ideas/mistakes by the same old ISVs; this propagation is rather a reflection of how impoverished / uninventive / unwilling to change the industry has become.
Hence the "old world" can happily continue with OSGi-d versions of static / stove-piped / stacks of EJB, ESB, CORBA, WS-*. Whilst - at the same time - RFC119 will not preclude systems like Infiniflow.
Cheers
Richard
www.paremus.com
tdiekmann replied on Sun, 2008/07/13 - 7:48pm
in response to: rv49649
Hi Roger,
It is not surprising that the title 'Distributed OSGi' of RFC 119 may be a little confusing to people who do not have access to the actual document. This document, like all other RFCs, is open to OSGi Alliance members only.
However, the good news is that we plan to give an updated version of this and other EEG related documents to the public this summer. This will allow experienced developers and bloggers like yourself to gain a better understanding and provide valuable feedback.
For what it is worth, I agree with pretty much everything you have listed in your blog with the exception of the references to RFC 119 of course. Having experienced the problems of distributed computing myself for several years as a professional makes me come to similar conclusions. We also recommend asynchronous message oriented middleware over RPC (RMI) style communication in our projects.
In RFC 119, we do not talk about the 'how'. What is appealing to users of OSGi is the service programming model as Peter pointed out. RFC 119 now talks about a standard way of integrating your existing communication infrastructure with the OSGi service programming model while allowing for interoperability of different vendor solutions. Sounds big, will be great :-)
Kind regards,
Tim Diekmann.
co-spec lead of RFC 119
co-chair Enterprise Expert Group, OSGi Alliance