Distributed OSGi - Tilting at Windmills

06.16.2008

I've been an enthusiastic champion of the advent of OSGi as a Java-based enterprise computing modularity standard. However, I have a very definite compartmentalization in mind as to what role OSGi should play — I draw the line at a single running JVM instance. I don't want to use OSGi as a cornerstone for distributed computing modularity.

In a recent eWeek article, though, we see there is a movement afoot to do precisely that:

Distributed OSGi Effort Progresses

This excerpt makes it quite clear what this initiative is all about:

The Distributed OSGi effort also seeks to enable "a service running in one OSGi framework to invoke a service running in another, potentially remote, OSGi framework (meaning a framework in a JVM)," Newcomer wrote. As the current OSGi standard only defines how services talk to each other only within a single JVM, "extensions are needed to allow services to talk with each other across multiple JVMs — thus the requirements for distributed OSGi on which the design is based," he said.


I see such an effort as yet another attempt to reinvent what has already been tried several times before — distributed object computing — by technologies such as DCOM, CORBA, and even JEE/EJB (à la Session beans and RMI). What do these technologies all have in common? They have fallen into disuse after befuddling and frustrating many a programmer (I personally fell on the DCOM sword back in the mid-'90s).

Interface Calling Behavior Semantics

I have a by-line that I sometimes sign my blog postings with:

Let me get right to it by lobbing some grenades: I recognize two arch evils in the software universe – synchronous RPC and remoted interfaces of distributed objects.


The sentiment being expressed here is that it is wrong-headed to try to make synchronous method invocation transparent over the network. Yet that is the grand illusion that all these distributed object solutions strive to accomplish.

The problem is that an object interface that may be perfectly usable in a local JVM context — where call chain invocation can be sub-millisecond — will not have the same behavior semantics when invoked over a network connection. The method invocation may take 15 to 50 milliseconds on a good day; or may fail due to low-level transport errors (which never existed when it was being invoked in a local JVM context); or just time out without completing the call; or even never return/complete at all in any acknowledged fashion.
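To make that concrete, here is a minimal sketch in Java. The InventoryService interface and its names are hypothetical, purely for illustration: the same interface that a local consumer can call without a second thought forces a remote consumer to anticipate latency, timeouts, and transport failures that simply did not exist in-process.

```java
// Hypothetical example: how the calling contract changes once the same
// interface is invoked across a network instead of within a local JVM.
import java.util.concurrent.*;

public class RemoteCallContract {

    interface InventoryService {
        int quantityOnHand(String sku);   // sub-millisecond when local
    }

    // In-process consumer: no timeouts, no transport failures to consider.
    static int localUsage(InventoryService svc) {
        return svc.quantityOnHand("ABC-123");
    }

    // The same call made over the wire forces the consumer to handle
    // conditions that never arose in the local JVM context.
    static int remoteUsage(InventoryService remoteProxy) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<Integer> f = pool.submit(() -> remoteProxy.quantityOnHand("ABC-123"));
            return f.get(50, TimeUnit.MILLISECONDS);   // 15 to 50 ms on a good day
        } catch (TimeoutException e) {
            return -1;   // the call may never complete in any acknowledged fashion
        } catch (ExecutionException e) {
            return -1;   // low-level transport error surfaced through the proxy
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        } finally {
            pool.shutdownNow();
        }
    }
}
```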

The consuming software code that used a method in a local JVM context now has to be designed to anticipate a wide range of different situations, as the calling contract of the method is radically different depending on the context in which it is being invoked. The advocates of distributed object computing, however, want us to believe in a grand illusion: that modules that were written and proved out for use in a local JVM context can be shifted to be consumed in a distributed context — and where consuming code, presumably, doesn't have to be any the wiser (or at least not very much wiser).

Of course, some may recall JEE/EJB Session beans, where methods invoked via RMI had the potential to raise exceptions related to I/O transport errors. The upshot is that one had to design software from the outset to be a consumer of a "distributed" object interface vs. just a consumer of its local interface. Also, it was not long before EJB developers discovered that an object interface that made sense for a local JVM calling context would become a performance liability in a distributed computing context. Using the interface as designed gave rise to chatty round trips over the network, where, due to latency and unreliability, the software became visibly sluggish. It is most dismaying to see all the enterprise software systems that have sluggish and problematic user interface behavior due to the application being written on a foundation of synchronous use of distributed object interfaces. That distributed object Kool-Aid that folks drank proved to be spiked heavily with toxic radiator fluid.
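The chattiness problem is easy to see in code. The sketch below uses hypothetical names and is only an illustration of the idea: a fine-grained interface that is harmless in-process turns into N round trips over the wire, whereas a coarse-grained call returns one snapshot in a single trip (the classic DTO workaround that EJB developers ended up adopting).

```java
// Hypothetical illustration of a "chatty" remote interface versus a
// coarse-grained alternative that collapses the exchange to one round trip.
public class ChattyVersusCoarse {

    // Fine-grained interface: fine locally, a latency liability remotely,
    // since each getter becomes its own network round trip.
    interface RemoteCustomer {
        String getName();      // round trip 1
        String getAddress();   // round trip 2
        String getPhone();     // round trip 3
    }

    // Coarse-grained alternative: one call returns a serializable snapshot.
    static final class CustomerDTO implements java.io.Serializable {
        final String name, address, phone;
        CustomerDTO(String name, String address, String phone) {
            this.name = name; this.address = address; this.phone = phone;
        }
    }

    interface CustomerService {
        CustomerDTO fetchCustomer(String customerId);   // single round trip
    }
}
```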

In essence, EJB developers found out that an object-oriented approach to software design could not be transparently shifted to a distributed computing context. The OOP software systems that tried to make that leap devolved into a quagmire of issues that had to be battled.

Interface Version Management

On the basis of this one failing alone, distributed object computing has been one of the most colossal architectural mistakes of the last 15 years in the IT industry. Yet the failings don't stop there — the other, equally perplexing, obstacle to this undertaking is object interface version management.

I tend to think that the versioning dilemma is perhaps even more insidious than the synchronous distributed method call semantics problem. One encounters the issues of call semantics fairly early on; the interface versioning dilemma, however, arises gradually over time, then mounts up and becomes one of the greatest headaches that one battles in trying to keep deployed distributed software systems coherent.

One of the popular agile OOP developer practices of recent years is frequent refactoring of code. Indeed, all of the popular IDEs in use are adept at assisting the developer with refactoring. Refactoring may well be a good thing in a context where the development team gets to control all the deployment pieces that are impacted by such change. However, in a distributed computing context, which is usually heterogeneous, refactoring is just asking for misery.

Making changes to how distributed software systems interact, where multiple development teams and/or companies are involved, is a process undertaking akin to wisdom tooth extraction (the difficult kind where the dentist has to work for a few hours to break the tooth apart and bring it out in pieces). The simplest of changes can be tedious to negotiate, politics of some variety often intrudes, and it is often challenging to schedule releases to synchronize well with one another so that deployment can occur.

As such, the notion of versioning of distributed object interfaces has been proffered as the means for coping with this. One team can come out with a new and improved interface to an existing object and deploy it unilaterally. Other parties that devise software consuming the interface of said object can catch up to the new interface as they are able. In the meantime, the older interface remains in place so that existing deployed software keeps working.

On paper, versioning looks like a workable approach — the significant distributed object solutions have all had provision for versioning interfaces. In practice, it can even be done a few times. However, for large, complex enterprise software systems, maintaining interface versions gets to be burdensome. One of the reasons is that, by their very nature, object interfaces are very brittle. The information state that is exchanged tends to be very explicitly coupled to the interface signature. It can be hard to significantly (or meaningfully) evolve the software implementation of an object without impacting the object's interface. Once that happens, the interface has to be versioned — and a sense of dread then sets in.
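A small sketch of that dread, using hypothetical names: because the exchanged state is welded to the signature, a seemingly modest evolution of the implementation forces a second interface version, and the provider is then stuck carrying both until every remote consumer has caught up — which, in a heterogeneous deployment, may be never.

```java
// Hypothetical example of why brittle interfaces force versioning.
public class InterfaceVersioning {

    // Version 1, already deployed to consumers we do not control.
    interface OrderService {
        double quoteOrder(String sku, int quantity);
    }

    // The implementation evolves to need a currency and a customer tier;
    // the state exchanged is coupled to the signature, so a new version appears...
    interface OrderServiceV2 {
        double quoteOrder(String sku, int quantity, String currency, String customerTier);
    }

    // ...and the provider must now maintain both interfaces indefinitely.
    static class OrderServiceImpl implements OrderService, OrderServiceV2 {
        public double quoteOrder(String sku, int quantity) {
            return quoteOrder(sku, quantity, "USD", "STANDARD");
        }
        public double quoteOrder(String sku, int quantity, String currency, String customerTier) {
            return 0.0;   // pricing logic elided
        }
    }
}
```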

OOP Does Not Work Well In Distributed Context

As to distributed object computing, quagmire is the operative word here. Quite simply OOP does not really work in a distributed computing context — there are too many difficult issues that entangle for it to be worth the while.

It is fascinating to see new generations of software engineers getting lured into re-inventing distributed object computing over and over and over again. And a lot of the same computer industry corporate players get right behind these endeavors every time. These kinds of systems become complex and grandiose — thus they seem to be excellent sticky fly traps for luring in developers. Think of the legions of developers (and their companies) that have floundered on DCOM, CORBA, JEE/EJB, WS-* (death star) — and now let's add Distributed OSGi. Distributed object computing is our industry's Don Quixote tilting-at-windmills endeavor.

Asynchronous Messaging and Loose-Coupling Message Format Techniques

So what is the alternative?

When it comes to distributed computing interactions, try following these guidelines:

  • Design the software from the outset around asynchronous interactions. (Retrofitting synchronous software designs to a distributed context is a doomed undertaking that will yield pathetic/problematic results.)
  • Prefer messaging to RPC or RMI style interface invocation
  • Attempt to use messaging formats that are intrinsically non-brittle. If designed with forethought, messaging formats can later be enhanced without impacting existing deployed software systems. (The entire matter of versioning interfaces can be dodged.)
  • Build in robust handling (and even auto-recovery) of transport related failure situations.
  • Never let the user interface become unresponsive due to transport sluggishness or failure situations. A user interface needs to remain responsive even when over-the-wire operations are coughing and puking. (So distributed interaction I/O always needs to be done asynchronously to the application's GUI thread.)
  • Keep transport error handling orthogonal to the semantics of messaging interactions. (Don't handle transport error and recovery at every place in the application where a messaging interaction is being done. Centralize that transport error handling code to one place and do it very well just one time.)


A key point to appreciate is that transport error handling and recovery is very different from the matter of business domain logic error processing. The two should not be muddled together. A lot of software written around RPC or RMI style interface method invocation does exactly that, though. Messaging-centric solutions usually permit a single transport error handler to be registered so that this need be coded just once. The rest of the application code can then concentrate on business domain logic processing.
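Here is a minimal sketch of that style using the JMS API. It is only an illustration: the ConnectionFactory is assumed to come from the environment (JNDI, a Spring context, etc.), and the queue name and payload are hypothetical. The transport error handler is registered once on the connection; the sending code deals purely with the business payload and never blocks waiting on a reply.

```java
// A minimal sketch (javax.jms) of asynchronous, fire-and-forget messaging
// with transport error handling centralized in one place.
import javax.jms.*;

public class AsyncOrderPublisher {

    public static void publish(ConnectionFactory factory, String orderXml) throws JMSException {
        Connection connection = factory.createConnection();

        // One centralized transport error handler for the whole connection;
        // the business code below never deals with transport failures directly.
        connection.setExceptionListener(ex ->
                System.err.println("transport failure, scheduling recovery: " + ex.getMessage()));

        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageProducer producer = session.createProducer(session.createQueue("orders.inbound"));

        // Loosely-coupled payload: a self-describing text/XML document rather than
        // a brittle method signature; receivers ignore elements they don't understand.
        TextMessage message = session.createTextMessage(orderXml);
        producer.send(message);   // fire-and-forget; the caller does not block on a reply

        connection.close();
    }
}
```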

AJAX and Flex Remoting

A messaging approach is a sound basis for designing distributed application architecture — especially when one does not control all the end-points. More recently I have been designing architecture for Flex-based web RIA applications. In these apps, the client uses Flex asynchronous remote method invocation to access services on the server. Adobe's BlazeDS is embedded in the server to facilitate remoting calls, marshaling of objects between ActionScript 3 and Java, message push to the client, and bridging to a JMS message broker.

You may think that I'm not exactly following my own advice. However, there are special circumstances at play:

  • Flex I/O classes support asynchronous invocation, so the operation does not block the main GUI thread of the app.
  • Flex I/O classes invoke closures to process return results; also, a fault closure can be supplied to handle transport related errors. Consequently a programmer can write one very robust fault handling closure and reuse it in all I/O operations. Thus Flex does an excellent job of segregating business logic processing from transport-related error handling.
  • Flex client .swf files are bundled together with their Java services into the same .war deployment archive. Consequently, the client-tier and the server-tier are always delivered together and thus will not drift out of version compliance.

The last point is worth further remark: the way the two distributed end-point tiers are bound together into a single deployment unit makes for a situation that is nearly like delivering an old-school monolithic application. Flex supports Automation classes so that tools, such as RIATest, can be used to automate testing of the client UI. Consequently the client can be scripted to regression test against service call interactions. Thus the deployment unit is the subject of due QA attention, and even though the two tiers are not processed by the same compiler, at least they actually get tested together prior to release.

If the service call interfaces are refactored, then the client can be refactored at the same time. Typically this is even done within the same IDE (such as Eclipse with the Flex Builder plugin). The Flex code and the Java code each have refactoring support. Flex unit tests could then be used within the development context to verify call interface validity.

Google GWT applications have a similar characteristic, where asynchronous method invocation is supported for invoking services on the server tier. Client-tier Java code and service-tier Java code are developed jointly and can be packaged into a single deployment unit.
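For readers unfamiliar with GWT, a brief sketch of that pattern follows; the service name and method are hypothetical. The generated async proxy never blocks the browser's UI thread, and transport failures are funneled into onFailure(), cleanly separated from the result-handling path — the same segregation of concerns described above for Flex.

```java
// A hedged sketch of GWT asynchronous RPC (hypothetical QuoteService).
import com.google.gwt.core.client.GWT;
import com.google.gwt.user.client.rpc.AsyncCallback;
import com.google.gwt.user.client.rpc.RemoteService;
import com.google.gwt.user.client.rpc.RemoteServiceRelativePath;

public class GwtAsyncExample {

    // Server-side contract...
    @RemoteServiceRelativePath("quote")
    interface QuoteService extends RemoteService {
        String getQuote(String symbol);
    }

    // ...and its client-side asynchronous counterpart.
    interface QuoteServiceAsync {
        void getQuote(String symbol, AsyncCallback<String> callback);
    }

    void fetchQuote() {
        QuoteServiceAsync service = GWT.create(QuoteService.class);
        service.getQuote("ADBE", new AsyncCallback<String>() {
            public void onSuccess(String result) {
                // business logic on the result; the UI thread was never blocked
            }
            public void onFailure(Throwable caught) {
                // transport-related failure, handled apart from business logic
            }
        });
    }
}
```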

AJAX web applications may be another case where the client tier and the server tier are often deployed together.

Conclusion

So the takeaways from this discussion are:

  • If you can't control both end-points, stick with messaging and loosely-coupled message format design. Be very mindful of the versioning dilemma from the outset and plan with forethought. The best outcome is to be able to evolve message formats without breaking end-points that have not yet been upgraded (a minimal sketch of this idea follows this list). Try very hard to dodge the burden of version management of installed end-points.
  • If you can deliver both end-points from a common deployment unit, then method invocation of remote object interfaces can be okay. However, stick with the technologies that support asynchronous I/O. Separating the concern of business logic processing of return results from transport fault handling is the ideal.
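As promised above, here is a minimal "tolerant reader" sketch of the loosely-coupled message format idea; the element names are hypothetical. The consumer pulls out only the elements it knows about, so a producer can add new elements in later message versions without breaking end-points that have not yet been upgraded.

```java
// Hypothetical tolerant-reader example: message formats evolve without
// breaking deployed consumers, so interface versioning is dodged entirely.
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

public class TolerantOrderReader {

    public static String readCustomerId(String orderXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(orderXml.getBytes(StandardCharsets.UTF_8)));

        // Look the element up by name rather than by position or exhaustive schema;
        // elements added in later message versions are simply never asked for.
        NodeList nodes = doc.getElementsByTagName("customerId");
        return nodes.getLength() > 0 ? nodes.item(0).getTextContent() : null;
    }

    public static void main(String[] args) throws Exception {
        // A newer producer added <priority/>; this older consumer is unaffected.
        String v2Message =
                "<order><customerId>42</customerId><priority>HIGH</priority></order>";
        System.out.println(readCustomerId(v2Message));   // prints 42
    }
}
```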


Related Links

Building Effective Enterprise Distributed Software Systems

XML data "duck typing"

Flex Async I/O vs Java and C# Explicit Threading

Published at DZone with permission of its author, Roger Voss.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Eric Newcomer replied on Mon, 2008/06/16 - 2:42pm

"It is fascinating to see new generations of software engineers getting lured into re-inventing distributed object computing over and over and over again. " This is a complete misunderstanding and mistaken characterization of the effort. It would be helpful to get the facts right before writing something like this. All the author had to do was follow the link to my blog entry about this http://blogs.iona.com/newcomer/archives/000569.html To quote the relevant paragraph here: "We did not want to invent a new distributed computing system, since so many already exist. (In fact we had pretty strong feedback on that point!) The design introduces some new OSGi properties to identify a service as remote and a discovery service through which a local service can find a remote service and obtain the metadata necessary to interact with that remote service. The design is intended to support any communication protocol and data format (with some constraints of course, having to do with the use of Java request/response interfaces as the service contract). Another goal of the design is to allow services in an OSGi framework to interact with services external to OSGi, both as client and server. " Nothing about new object models, tight coupling etc.

Roger Voss replied on Mon, 2008/06/16 - 8:46pm

That which we call a rose by any other name would smell as sweet (or, more apt in this case, be just as prickly).

So in other words, Distributed OSGi will simply facilitate OSGi bundles to play within the distributed computing protocols, frameworks, etc., that already exist. Okay, that makes all the difference in the world.

BTW, there's no need for Distributed OSGi to do that. Today we create OSGi bundles where classes implement services invoked by web clients (i.e., the Flex web RIA case I cite).

However, "the design introduces some new OSGi properties to identify a service as remote and a discovery service through which a local service can find a remote service and obtain the metadata necessary to interact with that remote service," Newcomer said in his blog.

So gee, just like EJB Session beans, exposed classes or interfaces in an OSGi bundle will be able to sport the notion of remote use and/or local use. Yeah, yeah - we've been there and done that several times over by now. In my case, I first implemented this concept for C++ classes back in 1993/94 - prior to the official advent of DCOM by my employer at the time.

Peter Kriens replied on Tue, 2008/06/17 - 3:57am

My biggest fear is that one day I will tell people that it cannot be done because I could not do it in the past. Seymour Cray always hired young people *because* they did not know it could not be done... Over time things change, and the general experience in the industry shifts what works and what does not work. The landscape evolves over time and one should take that into account. I built my first distributed system in 1980 and I have seen and worked/played with all the standards you mention and understand many of your objections.

However, I do think the distributed OSGi work has a fighting chance. OSGi has two crucial features that are lacking in any other environment I know. The first: a service is not guaranteed to be there. The dynamics of a service are an intrinsic part of the specifications. This basic lack of guarantee maps very well to some of the 7 fallacies of distributed computing: we never promised reliable services. This has caused a flurry of programming models like Spring-DM, Declarative Services, iPOJO, SAT, etc. that all know how to handle the coming and going of services. The other feature, of course, is that OSGi is very strong on versioning; I do not know any other environment where you can actually ask the version of your code and match it to the remote site.

The other aspect is that by having strong modularization we hide many details from the programmers. This allows the distribution to mostly take place as a configuration. There is a huge timing difference between local and remote when you use a dial-up line, but today's gigabit fiber in a data center is much less of an issue. I think there are many practical use cases where the deployer will think the distributed OSGi is a godsend.

Obviously, I am not saying we will solve all those problems you mention. However, no single environment does. OSGi provides an interesting programming model with Java, bundles, and services. The plan is to add an optional service to the mix that allows you to make a service available over any of the myriad of distributed protocols in the market and to use the function of a remote facility with little effort, at the choice of the deployer. Just as there is the optional Event Admin that lends itself very well to the asynchronous style you promote.

OSGi is about modularization. Modularization allows you to mix and match components at runtime; a distribution service seems an important part of the toolbox. It is not our place to tell people to use asynchronous calls when in many cases the object-oriented remote procedure style is quite popular. I do believe that OSGi should try to provide components for many architectural styles and then companies can work with what works best for them. And let's face it, there is no single architectural style that suits all.

Personally, I would likely choose your proposed route if I were responsible for a large distributed system (hmm, that was exactly what we did in 1980). However, today web services, RMI, and CORBA are out there and used in lots of places. Making it easier to live in that world seems a commendable task?

Kind regards,

Peter Kriens

Richard Nicholson replied on Tue, 2008/06/17 - 6:34am

Roger,

At the start of Infiniflow presentations I frequently state, "the same old monolithic / brittle distributed systems can be built with the best / most agile of enablers" - OSGi included :(

Completely agree with your conclusions. Indeed, given Deutsch's 8 fallacies, these conclusions should be self-evident to any seasoned distributed engineer. Of course - they are not.

That said, I think Eric/Peter are correct (from my limited understanding); OSGi RFC 119 does little to constrain the actual form of a distributed architecture; a solution complying with RFC 119 may be asynchronous message-based or synchronous RPC. Hence, to my mind OSGi neither prevents nor encourages the propagation of the same old ideas/mistakes by the same old ISVs; this propagation is rather a reflection of how impoverished / uninventive / unwilling to change the industry has become.

Hence the "old world" can happy continue to with OSGi-d versions of static / stove-piped / stacks of EBJ, ESB, Corba,WS-*. Whilst - at the same time - RFC119 will not preclude systems like Infiniflow.

 

Cheers

Richard 

www.paremus.com

Tim Diekmann replied on Sun, 2008/07/13 - 6:48pm in response to: Roger Voss

Hi Roger,

It is not surprising that the title 'Distributed OSGi' of RFC 119 may be a little confusing to people who do not have access to the actual document. This document, as well as all other RFCs, is open to OSGi Alliance members only.

 

However, the good news is that we plan to give an updated version of this and other EEG-related documents to the public this summer. This will allow experienced developers and bloggers like yourself to gain a better understanding and provide valuable feedback.

 

For what it is worth, I agree with pretty much everything you have listed in your blog, with the exception of the references to RFC 119, of course. Having experienced the problems of distributed computing myself for several years as a professional, I have come to similar conclusions. We also recommend asynchronous message-oriented middleware over RPC (RMI) style communication in our projects.

 

In RFC 119, we do not talk about the 'how'. What is appealing to users of OSGi is the service programming model as Peter pointed out. RFC 119 now talks about a standard way of integrating your existing communication infrastructure with the OSGi service programming model while allowing for interoperability of different vendor solutions. Sounds big, will be great :-)

 

Kind regards,

  Tim Diekmann.

co-spec lead of RFC 119

co-chair Enterprise Expert Group, OSGi Alliance

Darren Cruse replied on Sun, 2010/02/14 - 10:14am

Love the article, both in content and in style - you're absolutely right, we in this industry deserve a bit of sarcasm for "tilting at windmills" and attacking distributed programming problems the same way again and again, with just a bit of "marketecture" smeared on top of what *really* isn't fundamentally different from the past failed solutions.

(I independently came to the idea of using the phrase "tilting at windmills - like Don Quixote" recently - though in regard to my company's outsourcing policies assuming that knowledge can magically and easily and instantaneously transfer from one developer's head to another - but that's a different story :).

And I know I'm late to the party (I notice the article's been up for a long time), but the thing I'd like to add is that for me, the fundamental distinction is best summarized - at least in today's lingo - as a debate between the RPC model of distribution and the RESTful model of distribution.

And for me, in a nutshell this boils down to the idea that *caching* is fundamental to a successful distributed system (where I'm suggesting at heart the "state transfer" part of REST is really just another name for caching).

I think the excessive "chattiness" that RPC can lead to (too many fine grained network transfers) is another problem we should have learned from stuff like CORBA, but I'm willing to forgive that if the convenience of the RPC programming model is enough that we just need to teach developers to be careful not to overdo it.

But right before reading your article I'd read some stuff on SCA and a little on distributed OSGi and for these specs to *only* speak to standards for how you specify the invocation of procedures, without also speaking to fundamental changes in how we deal with the performance and reliability issues...

I agree with you - it's Deja Vu All Over Again (can I get a Homer Simpson DOH! please?)

For me to believe there's real progress would mean that these specs would also speak right up front to standards for how caching (and yes that includes the problems of cache invalidation) is going to work to allow these systems to perform well, and to have at least a *little* less sensitivity to intermittent network failures (if not address the "occasionally connected" user scenario - but I guess in today's world that idea is becoming less important).

I'd been thinking of a metaphor that "the network is the computer (but they forgot to install the cache)", meaning:

In the beginning there was circuitry that wasn't programmable.

Then some smart person came up with programmability with instructions stored in memory, and that was good.

And things evolved and other smart people invented microprocessors and those led to PCs.

And somewhere in there some smart person recognized microprocessors needed to have something called a "cache", because "cache" was fast, but memory was slow.

This was the early 70s.

There were peace signs, long hair, bell bottoms, and the first microprocessors with an L1 cache (the Intel 4004 - I just looked it up).

And all was good.

But PCs were standalone and didn't talk to other PCs.

But then came the network, and that was good.

But the network was slow.

But then the network became *popular*.

And people fell in love with it.

And they were no longer content just to load programs from disks on their local PCs.

Where the programs loaded over the short little wires on their PC's motherboard.

They wanted programs to load from the internet.

They wanted the *long* wires of the internet to feel as fast and reliable as the short little wires they were used to on their computer.

And they wanted stuff on all the other computers all around the world to be as accessible and convenient as the stuff on their computer.

In fact they really just wanted all those many computers to feel like one single giant computer.

And more wires were added and the connections got faster and more reliable.

But there was this annoying little problem called "the speed of light".

And the connections were never fast enough or reliable enough.

Fast enough or reliable enough to eliminate the need for caching.

The thing is - those smart people who invented those first microprocessors, those smart people back in the early 70s with long hair and bell bottoms who struggled and puzzled though the problems of needing to connect a fast CPU to a comparatively slow memory bus...

Those people would have had a deja vu feeling and realized "gosh I think I've seen this problem before... where was it... oh yeah the ALU was trying to load instructions from memory but it was slow... and then I put in that cache thing, cause the local cached stuff was fast compared to those wires out to memory... but the memory was really slow...".

Those people, upon seeing what was going on with all the computers being networked with "service orientation" and "the network is the computer"...

They would have realized the need for deep down to the bone caching of all that goodness being sucked down from the network (from the internet).

The problem was:

They were dead.

Darren Cruse replied on Thu, 2010/04/22 - 7:53am

My apologies to Roger: I remembered this morning that I'd written my overly long comment above some time ago, and I just noticed there were no new comments either to Roger's article or to my comment.

Where my comment was obviously meant to be provocative (and in that respect not unlike Roger's article).

The thinking behind what I wrote goes back years ago to when I first wrote a real-time multi-player tank game over dial-up modems as a student, to later doing client-server systems in the early 90s, to CORBA in the late 90s, to more recent activity using "SOA" with SOAP and REST pulling together "services" running on many machines at once.

And the feeling that, in all this time, developing distributed, networked software remains surprisingly complicated.

In my experience performance and reliability problems are still common with such systems.

And from my point of view, programming them remains surprisingly complicated.

The move to cross platform languages like java and common formats like xml and json are a big help of course.

But it feels like the support for proper caching too often remains an afterthought.

Something to be glued on after the fact instead of a deep down out-of-the-box piece of infrastructure that holds the programmer's hand and guides them to doing the right thing without a lot of thought.

i.e. Even today there's nothing popular that I would think of as a "network operating system" or "visual basic for the net".

Something that brings the complexity of network based programming back down to where things were when I started programming. Before the net became popular.

Of course this article was about Distributed OSGi - something I don't have a lot of experience about. I had looked a bit at Spring DM and I know that Spring does have cache components you can configure. And I'd also looked a little at things like Terracotta or Coherence and NetKernel.

Of these, NetKernel is the one I'd looked the closest at, and it's the one that comes closest to getting things right in my opinion. You treat everything in it as a resource which has a URI, and you write your systems using software components that feel much like collaborating CGI/servlets. But the beauty is you can co-locate them on one machine (JVM), or distribute them across the network, largely through nothing but configuration changes. i.e. There's almost a "plasticity" to your network architecture that comes if you've developed your system under NetKernel (or a combination of NetKernel along with other REST/SOAP/etc. services on other systems).

And related to my comments above, this is partly due to it having a "dependency cache" as infrastructure. i.e. Whenever you're consuming a resource with a URI you're encouraged to think about the appropriate caching strategy for that data because caching is fundamental to the system's api and the way the system works. It's called a "dependency cache" because it understands relationships and can invalidate appropriately. So e.g. if you create html by transforming xml via xsl, the system understands that the html depends on the xml and the xsl, and can automatically invalidate the cached html when the xml or xsl changes. This is possible even when the xml and xsl itself may be distributed and retrieved via services running on different machines on the network.

Strangely, at least in the circles I run in, NetKernel doesn't seem to be well known. It's like a cool local band that nobody's heard of.

Anyway, FWIW.
