Ola Bini is a Swedish developer working for ThoughtWorks. His daily job includes working on JRuby, starting up a Swedish ThoughtWorks office and mucking around with Java and Ruby. In his spare time he spends most of his time on his language Ioke, working on one of several other open source projects or reading science fiction. Ola has presented at numerous conferences, such as JavaOne, Javapolis, JAOO, RailsConf, TheServerSide Java Symposium and more. He is the author of the APress book Practical JRuby on Rails. Ola is a DZone MVB and is not an employee of DZone and has posted 45 posts at DZone.

Questioning the reality of Generics

07.27.2010

I’ve been meaning to write about this for a while, since I keep saying this and people keep getting surprised. Now maybe I’m totally wrong here, and if that’s the case it would be nice to hear some good arguments for that. Here’s my current point of view on the subject anyway.

A specter is haunting the Java community - the specter of generics.

Java introduced a feature called generics in Java 5 (this feature is generally known under the name of parametric polymorphism in the literature). Before Java 5 it wasn’t possible to create a reusable collection that would ensure, at compile time, the type safety of what you put into that collection. You could create a collection of, for example, Strings and have that working correctly, but if you wanted to have a collection of anything, as long as that anything was the same type, you were restricted to doing runtime checks, or just having good tests.
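To make the before and after concrete, here is a minimal sketch. The class and method names are made up for illustration and are not from any library. The raw list accepts anything and only fails when an element is cast on the way out, while the parameterized list pushes the same mistake to compile time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; all names here are illustrative only.
class RawVersusGeneric {

    // Pre-Java-5 style: a raw List accepts any object, so a mistake
    // only surfaces when an element is cast on the way out.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean rawListFailsAtRuntime() {
        List names = new ArrayList();
        names.add("Ola");
        names.add(Integer.valueOf(42)); // compiles fine, but is not a String
        try {
            for (Object o : names) {
                String s = (String) o; // explicit cast, checked only at runtime
                s.length();
            }
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    // Java 5 style: the element type is part of the declaration, so the
    // compiler rejects a bad insert and no explicit cast is needed.
    static int genericListTotalLength() {
        List<String> names = new ArrayList<String>();
        names.add("Ola");
        // names.add(Integer.valueOf(42)); // would not compile
        int total = 0;
        for (String s : names) {
            total += s.length();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(rawListFailsAtRuntime());
        System.out.println(genericListTotalLength());
    }
}
```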

Java 5 made it possible to add type parameters to any other type, which means you could create more specific collections. There are still problems with these - they interact badly with native arrays for example, and wildcards (Java’s way of implementing co- and contravariance) have ended up being very hard for Java developers to use correctly.
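A small sketch of both pain points, with hypothetical names chosen for this example. Arrays are covariant and checked at runtime, while generic types are invariant unless a wildcard says otherwise; `? extends` gives a read-only covariant view and `? super` a write-oriented contravariant one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; all names here are illustrative only.
class VarianceDemo {

    // Arrays are covariant and checked at runtime: a String[] can be
    // viewed as an Object[], and a bad store fails only when executed.
    static boolean arrayStoreFailsAtRuntime() {
        Object[] objects = new String[1];
        try {
            objects[0] = Integer.valueOf(1);
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    // Covariant use ("? extends"): elements can be read as Number,
    // but nothing except null can be added through this reference.
    static double sum(List<? extends Number> numbers) {
        double total = 0;
        for (Number n : numbers) {
            total += n.doubleValue();
        }
        return total;
    }

    // Contravariant use ("? super"): Integers can be added,
    // but elements can only be read back as Object.
    static void fill(List<? super Integer> sink, int count) {
        for (int i = 0; i < count; i++) {
            sink.add(i);
        }
    }

    public static void main(String[] args) {
        System.out.println(arrayStoreFailsAtRuntime());
        List<Number> numbers = new ArrayList<Number>();
        fill(numbers, 4);                 // List<Number> matches "? super Integer"
        System.out.println(sum(numbers)); // and also matches "? extends Number"
        // List<Object> bad = new ArrayList<String>(); // would not compile: generics are invariant
    }
}
```

The commented-out line at the end is the generic counterpart of the array assignment that compiles and then blows up, which is roughly why the two features sit so uneasily together.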

Java and C# both added generic types at roughly the same time. The C# version of generics differed in a few crucial ways, though. The most important difference in implementation is that C# generics are reified, while Java generics use type erasure. And this is really the gist of this blog post. Because over and over I hear people lament the lack of reified generics in Java, citing how good C# and the CLR are for having this feature. But is that really the case? Are reified generics a good thing? Of course, that always depends on who is asking the question. Reified might well be good for one person but not another. Here you will hear my view.

Reified? Huh?

So what does reified generics mean, anyway? It is probably easiest to explain compared to the Java implementation that uses type erasure. Slightly simplified: in Java generics doesn’t exist at runtime. It is purely a fiction that the compiler uses to handle type checking and make sure you don’t do anything bad with your collection. After the generics have been type checked, they are used to generate casts and type checks in the code using generics, some metadata is inserted into the class file format, and then the generic information is thrown away.
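A minimal sketch of what that looks like from inside a running program, using only the standard reflection API; the class and field names are made up for this example. Two lists with different type arguments share one runtime class, the compiler-generated cast can be made to fail through a raw reference, and yet the declared type argument of a field can still be read back from the class file metadata.

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; all names here are illustrative only.
class ErasureDemo {

    // A generic field declaration; its signature is kept as class file metadata.
    static List<String> names = new ArrayList<String>();

    // Both lists share one runtime class: the type argument is erased.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<String>();
        List<Integer> integers = new ArrayList<Integer>();
        return strings.getClass() == integers.getClass();
    }

    // The compiler inserts a cast where a String is read out; sneaking an
    // Integer in through a raw reference makes that hidden cast fail.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean hiddenCastFails() {
        List<String> strings = new ArrayList<String>();
        List raw = strings;
        raw.add(Integer.valueOf(1)); // unchecked: nothing verifies this at runtime
        try {
            String s = strings.get(0); // compiled with a cast to String
            return s == null;
        } catch (ClassCastException e) {
            return true;
        }
    }

    // The declared type argument of the field survives as metadata,
    // even though instances of the list carry no such information.
    static String declaredTypeArgument() {
        try {
            Field field = ErasureDemo.class.getDeclaredField("names");
            ParameterizedType type = (ParameterizedType) field.getGenericType();
            Type argument = type.getActualTypeArguments()[0];
            return ((Class<?>) argument).getName();
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());
        System.out.println(hiddenCastFails());
        System.out.println(declaredTypeArgument());
    }
}
```

The metadata describes declarations (fields, method signatures, superclasses), not objects, which is why the list instances themselves cannot tell you what they were declared to hold.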

In contrast, on the CLR, generic classes exist as specific versions of their class. Instantiations of the same class with different generic type arguments are really different classes. There are no casts happening at the implementation level, and the CLR will as a result generate more specific code for the generic code. Reflection and dynamic type checks are also possible on the CLR. Having reified generics means basically that they exist at runtime, that the virtual machine knows about them and handles them correctly.

Multi-language virtual machines

Over the last twenty years something interesting has happened. Both our hardware and software have gotten mature enough that a new generation of virtual machines has entered the market. Traditionally, virtual machines for languages were made for specific languages, such as Pascal, Lisp and Smalltalk, and possibly except for SECD and the Warren machine, there haven’t really been any virtual machines optimized for running more than one language well. The JVM didn’t start that way either, but it turned out to be better suited for it than expected, and there are lots of efforts to make it an even better platform. The CLR, Parrot, LLVM and Rubinius are other examples of things that seem to become environments rather than just implementation strategies for languages.

This is very exciting, and I think it’s a really good thing. We are solving very complex problems where the component problems are best solved in different ways. It seems like a weird assumption that one programming language is the best way of solving all problems. But there is also a cost associated with using more than one language. So having virtual machines act as platforms, where a shared chunk of libraries is available, and the cost of implementation is low, makes a lot of sense.

In summary, I feel that the JVM was the first step towards a real viable multi-language virtual machine, and we are currently in the middle of the evolution towards that point.

Solving the problems

So why not add reified generics to the JVM at this point? It could definitely be done, and using an approach similar to the CLR, where libraries are divided into pre- and post-reified, makes the path quite simple from an implementation standpoint. On the user side, there would be a new proliferation of libraries to learn - but maybe that’s a good thing. There is a lot of cruft in the Java standard libraries that could be cleaned up. There are some sticky details, like how to handle the APIs that were designed for erased generics, but those problems could definitely be solved. It would also solve some other problems, such as making it possible for Scala to pattern match on type parameters and solving part of the problem with abstracting over primitive types. And it’s absolutely possible to do. It would probably make the Java language into a better language.

But is it the only solution? At this point, making this kind of change would complicate the APIs to a large degree. The reflection libraries would have to be completely redesigned (but still kept around for backwards compatibility). The most probable result would be a parallel hierarchy of classes and interfaces, just like in the CLR.

Reified generics are generally being proposed in discussions about three different things: first, performance; second, making some features easier to implement in Scala and other statically typed languages on the JVM; and third, handling primitives and primitive arrays a bit better. Of these, the first one is the least common, and the least interesting by far. JVM performance is already nothing short of amazing. The second point I’ll come back to in the last section. The third point is the most interesting, since there are other solutions here, including unifying primitives with objects inside the JVM by creating value types. This would solve many other problems for language implementors on the JVM, and enable lots of interesting features.
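For the third point, here is a small sketch of how primitives and generics currently fail to meet, using only java.util; the class and method names are invented for this example. A type parameter can only stand for a reference type, so an int[] handed to a generic method is treated as one object, and a list of numbers has to box every element.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; all names here are illustrative only.
class PrimitiveGenericsDemo {

    // T cannot be int, so the whole int[] becomes the single element
    // of a List<int[]> rather than a list of three numbers.
    static int sizeOfPrimitiveArrayAsList() {
        int[] primitives = {1, 2, 3};
        List<int[]> wrapped = Arrays.asList(primitives);
        return wrapped.size();
    }

    // With a reference-typed array, T is Integer and each element is kept.
    static int sizeOfBoxedArrayAsList() {
        Integer[] boxed = {1, 2, 3};
        List<Integer> list = Arrays.asList(boxed);
        return list.size();
    }

    // List<int> is not legal Java; every int is boxed into an Integer
    // object when added and unboxed again when read.
    static int sumBoxed(int count) {
        List<Integer> values = new ArrayList<Integer>();
        for (int i = 1; i <= count; i++) {
            values.add(i); // autoboxing
        }
        int total = 0;
        for (int v : values) { // auto-unboxing
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sizeOfPrimitiveArrayAsList());
        System.out.println(sizeOfBoxedArrayAsList());
        System.out.println(sumBoxed(4));
    }
}
```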

The short stick

I believe in a multi language future, and I believe that the JVM will be a core part of that future. Interoperability is just too expensive over OS boundaries - you want to be on the same platform if possible. But for the JVM to be a good environment for more than one language, it’s really important that decisions are made with that in mind. The last few years of fantastic progress from languages like Rhino, Jython, JRuby, Groovy, Scala, Fantom and Clojure have shown that it’s not only possible, but beneficial for everyone involved to focus on JVM languages. JSR 223, 292 and several others also mean the JVM is more and more being viewed as a platform. This is good.

Generics is a complicated language feature. It becomes even more complicated when added to an existing language that already has subtyping. These two features don’t play very well together in the general case, and great care has to be taken when adding them to a language. Adding them to a virtual machine is simple if that machine only has to serve one language - and that language uses the same generics. But generics isn’t done. It isn’t completely understood how to handle them correctly, and new breakthroughs are happening (Scala is a good example of this). At this point, generics can’t be considered “done right”. There isn’t only one type of generics - they vary in implementation strategies, features and corner cases.

What this all means is that if you want to add reified generics to the JVM, you should be very certain that the implementation can encompass both all static languages that want to innovate in their own version of generics, and all dynamic languages that want to create a good implementation and a nice interfacing facility with Java libraries. Because if you add reified generics that don’t fulfill these criteria, you will stifle innovation and make it that much harder to use the JVM as a multi language VM.

I’m increasingly coming to the conclusion that multi language VMs benefit from being as dynamic as possible. Runtime properties can be extracted to get performance, while static properties can be used to prove interesting things about the static pieces of the language.

Just let generics be a compile time feature. If you don’t, there are two alternatives - you are an egoist who only cares about the needs of your own language, or you think you have a generic type system that can express all other generic type systems. I know which one I think is more likely.

 

From http://olabini.com/blog/2010/07/questioning-the-reality-of-generics/

Published at DZone with permission of Ola Bini, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Oliver Weiler replied on Tue, 2010/07/27 - 4:58am

Well generics are broken... I'd say I have a solid understanding of generics, still once in a while I run into cases where they bite me in the ass. The problem with generics: They were done wrong in the first place, and as Java puts emphasis on backward compatibility (which is a good thing), there is no real way to fix them. Same with primitives, Java IO, and other things. But that's okay, if you want those features, switch to another JVM language.

Arek Stryjski replied on Tue, 2010/07/27 - 5:06am

I got lost... will reified generics be good for other JVM languages like Scala or not?

Ronald Miura replied on Tue, 2010/07/27 - 7:13am

I completely agree.

What most people complain about isn't type erasure, but verbosity, which is solved by type inference, not reified generics.

People also usually say it's fine to throw backward compatibility away 'for the sake of evolution', but would probably cry like a little girl when all their programs stopped compiling. .Net could do it because it didn't have a 10-year history and a 10 million developer base.

And about just adding a parallel hierarchy of APIs, one must note that the .Net runtime is a 48Mb download. People already complain about JRE's 15Mb. 'Everybody' wants to take CORBA out of the standard API, but 'everybody' is happy about double the size of the API just to add reified generics?

Your opinion, as one with deep experience with both creating programming languages and the internals of the JVM, has most weight in a discussion like this. Thanks.

Rogerio Liesenfeld replied on Tue, 2010/07/27 - 7:47am

The article says "in Java generics doesn’t exist at runtime" *and* "some metadata is inserted into the class file format". Well, both things can't be true... Really, the reality of generics in Java is not nearly as bad as pictured here.

There is an extensive Reflection API to query generic type information that was added in Java 5, which lets you do *a lot* at runtime. Several third-party APIs use this, such as Guice 2.0 or mocking APIs such as JMockit and Unitils Mock.

I would be more interested in knowing which use cases people are interested in, which cannot be done today with Java generics, but can be done with, say, C#. Also, can anybody point me to an existing C# API which does take advantage of generic type information at runtime, beyond what can be done today with Java?

John J. Franey replied on Tue, 2010/07/27 - 8:29am

Isn't it possible to discuss, design and implement reification in the jvm (to support languages and compilers that may require it) without demanding or discussing a change to any particular existing language or compiler (like java's 1.6 and prior) or any particular language's API library? I'm not an expert in such matters, but the thought occurred: maybe some decoupling is in order.

replied on Tue, 2010/07/27 - 10:03am

imho, they crippled generics with erasure for the sake of compatibility. They could have added a jvm switch and allowed the developer the choice. We all write a complete new version at some point in time. Now, we are bogged down with erasure. They also need to consider newbies learning/considering the language. To force the rules and syntax of generics on you is a sin! I have been coding with generics for years and it still makes me think too much:) It's one thing to use a collection with generics, to design complex [abstract] classes with generics is another. It is needlessly complicated.

Slim Ouertani replied on Tue, 2010/07/27 - 2:36pm

I agree with you, I have these problems with the JVM and type erasure.
Scala is designed for the CLR and the JVM, and this problem may make implementation very hard.

Since 2.8, Scala has introduced manifests as a workaround for erasure in order to get matching on generics.

 

My question is, should Java 7 break compatibility with non-generic classes (as Scala has done with 2.8) or do as you have already proposed?

Thanks.

Eric Giese replied on Tue, 2010/08/03 - 7:08am

I used to say that erasure is a bad thing, but I've come to the conclusion it isn't.
Generics are there to help the programmer to write maintainable code and to map certain specialities in the type system. The java generics fulfill this aim.

The reason why people tend to bash the erasure feature are mostly the following:
- No class information --> this can be circumvented by passing class objects.
- Bad performance with primitives --> this is indeed the main problem of the generics, but it's rarely a real problem. Scala has even solved this by using annotations, and java could as well.
- Arrays are reified --> Indeed a problem, at least in java. Arrays are simply incompatible with generics, so you need to pass class objects for this usecase. It's a pity, but again, mostly a java problem. Scala has solved this partially.
- Method overload does not work with different generic types --> I am inclined to say that this one sucks indeed.

Advantages of erasure: the fact that they are only "sugar" enables some things which are not possible on .NET, esp. casts which would break in a static type system but work very well at runtime. I've used these "hacks" myself.
Also, covariance and contravariance are MUCH easier to implement if you do not care about them at runtime, yet another point which counts for erasure. I guess there are usecases which WILL fail in a .NET environment regarding this issue... ;-)

Rehman Khan replied on Sat, 2012/02/25 - 3:51am

Nice post! Just thinking if we could have a pluggable/on-demand reification scheme for cases where we need more performance or better interoperability with statically typed languages like Scala.
