Mr. Lott has been involved in over 70 software development projects in a career that spans 30 years. He has worked as an internet strategist, software architect, project leader, DBA and programmer. Since 1993 he has been focused on data warehousing and the associated e-business architectures that make the right data available to the right people to support their business decision-making. Steven is a DZone MVB, is not an employee of DZone, and has posted 138 posts at DZone.

Java PHP Python -- Which is "Faster In General"?

01.06.2011
Sigh. What a difficult question. There are numerous incarnations on StackOverflow. All nearly unanswerable. The worst part is questions where they add the "in general" qualifier. Which is "faster in general" is essentially impossible to answer. And yet, the question persists.

There are three rules for figuring out which is faster.

And there are three significant problems that make these rules inescapable.


Rule One. Languages don't have speeds. Implementations have speeds.

Info on benchmarking. The idea of a benchmark is to have a single, standard suite of source code, which can be used to compare compilers, run-time libraries or hardware.

Having a standard suite of source is essential because it provides a basis for comparison. A single benchmark source is the fixed reference. We don't compare the top of the Empire State Building with the top of the Stratosphere in Las Vegas without specifying whether we care about height above the ground or height above sea level. There has to be some fixed point of comparison, some common attribute, or the measurements devolve into mere numerosity.

Once we have a basis for comparison (one single body of source code), the other attributes are degrees of freedom; the measurements we make will include the other attributes. This will allow a rational statement of what the experimental results were. We can then compare these various free attributes against each other. For details look at something like the Java Micro Benchmark.
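As a rough illustration, here is a minimal Python sketch of a measurement that keeps the source fixed and records the free attributes next to the timing. The function `fixed_workload` and the layout of the result record are hypothetical, invented for this example; only `timeit` and `platform` from the standard library are assumed.

```python
import platform
import timeit


def fixed_workload(n=10_000):
    """The one fixed body of source: a sum of squares (hypothetical example)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def measure(repeats=5, number=20):
    """Time the fixed workload and record the free attributes beside the result."""
    timings = timeit.repeat(fixed_workload, repeat=repeats, number=number)
    return {
        # Degrees of freedom: these can change between runs; the source does not.
        "implementation": platform.python_implementation(),
        "version": platform.python_version(),
        "machine": platform.machine(),
        "os": platform.system(),
        # Best observed time per call, taken from the fastest repeat.
        "best_seconds_per_call": min(timings) / number,
    }


result = measure()
print(result)
```

A number reported without the other fields is just numerosity; with them, two runs can at least be compared on a stated basis.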

Rule Two. Statistics Aren't a Panacea.

The reason there's no "in general" comparison among languages is because there are too many degrees of freedom to make any kind of rational comparison. We can make irrational comparisons, but that's the trap of numerosity -- throwing numbers around. 1250 vs. 1149, 1300 vs. 3177. What do they mean? Height above ground? Height above sea level? What's being measured?

There's a huge problem with claiming that statistics will yield an answer to which language implementation is faster "in general". We need some population that we can sample and measure. Problem 1: What population are we measuring? It can't be "programs": we can't compare grep against Apache httpd. Those two programs have almost no common features.

What makes the population of programs difficult to define is the language differences. If we're trying to compare PHP, Python and Java, we need to find a program which somehow -- magically -- is common across all three languages.

The Basis For Comparison

Finding common programs degenerates into Problem 2: what programs could be comparable? For example, we have the Tomcat application, written in Java. We wouldn't want to write Tomcat in Python (since Tomcat is a Java Servlet container). We could probably write something Tomcat-like in PHP, but why try? So we can't just grab programs randomly.

At this point, we devolve to subjectivity. We need to find some kind of problem domain in which these languages overlap. This gets icky. Clearly, big servers aren't a good problem domain. Almost as clearly, command-line applications aren't the best idea. PHP does run from the command-line, but it's always contrived-looking because it doesn't exploit PHP's strengths. So we wind up looking at web applications because that's where PHP excels.

Web applications? Subjective? Correct. PHP is a language plus a web application framework bundled together. Java and Python -- by themselves -- are just languages and require a framework. Which Java (and Python) framework is identical to PHP's framework? Spring, Struts, Django, Pylons? None of these reflects a code base that's even remotely similar. Maybe Java JSP is similar enough to PHP. For Python there are several implementations. Sigh.

Crappy Program Problem

We can't easily compare programs because we're really comparing implementations of an algorithm. This leads to Problem 3: we picked a poor algorithm or did a lousy job of implementing it in the target language.

In order to be "comparable", we don't want to exploit highly-optimized or unique features of a language. So we tried to be generic. This is fraught with risks.

For example, Java and PHP don't have list comprehensions. Do we forbid them from our Python benchmark? In Python, every variable is a reference; assignment never copies a value. If we pick an algorithm implementation which depends on copying objects, Java may appear to excel. If we pick an algorithm implementation which depends on sharing references, Python may appear to excel.
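To make the two issues concrete, here is a small Python sketch. The function names are hypothetical and exist only for illustration. It shows the same computation written idiomatically and "generically", and then shows that assignment shares a reference while a copy is a separate, explicit step.

```python
import copy


def squares_comprehension(values):
    """Idiomatic Python: a list comprehension."""
    return [v * v for v in values]


def squares_generic(values):
    """'Generic' style: the explicit loop that Java or PHP would also use."""
    result = []
    for v in values:
        result.append(v * v)
    return result


data = [1, 2, 3]
assert squares_comprehension(data) == squares_generic(data)

# Assignment binds a second name to the same list; nothing is copied.
shared = data
shared.append(4)
print(data)  # [1, 2, 3, 4]

# A copy has to be requested explicitly.
independent = copy.copy(data)
independent.append(5)
print(data)         # [1, 2, 3, 4]
print(independent)  # [1, 2, 3, 4, 5]
```

Which of the two `squares_*` functions belongs in a "fair" benchmark is exactly the judgment call that makes the comparison subjective.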

Somehow we have to get past language differences and programmer mistakes. What to do?

Synthetic Benchmarks

Since we can't easily find comparable programs -- as whole programs -- we're left with the need to create some kind of benchmark based on language primitives. Statements or expressions or something. We can try to follow the Whetstone/Dhrystone approach of analyzing a bunch of programs to find the primitive constructs and their relative frequency.

Here's the plan. We'll take 100 PHP programs, 100 Java programs and 100 Python programs and analyze them to find the relative frequency of different kinds of statements. What then?

The goal is to create one source that reflects the statements actually used in the 300 programs we analyzed. In three different languages. Hmmm... Okay. We'll need to create a magical mapping among the statement constructs in the three languages. Well, that's hard. The languages aren't terribly comparable. A Python expression using a list comprehension is the same thing as a multi-statement Java loop. Rats.

The languages aren't very comparable at the statement level at all. And if we force them to be comparable, we're not comparing real programs, but an artificial mapping.
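For the Python third of such a survey, the frequency count might be sketched as follows. This is only an illustration using the standard `ast` module; `SAMPLE_SOURCE` and `statement_frequencies` are made-up names, and a real survey would read the 100 programs from disk.

```python
import ast
from collections import Counter

# A tiny stand-in for one of the sampled programs (hypothetical).
SAMPLE_SOURCE = '''
total = 0
for i in range(10):
    if i % 2 == 0:
        total += i
squares = [i * i for i in range(10)]
print(total, squares)
'''


def statement_frequencies(source):
    """Count how often each kind of statement node occurs in Python source."""
    tree = ast.parse(source)
    return Counter(
        type(node).__name__
        for node in ast.walk(tree)
        if isinstance(node, ast.stmt)
    )


freq = statement_frequencies(SAMPLE_SOURCE)
for name, count in freq.most_common():
    print(f"{name}: {count}")
```

Notice that the list comprehension never shows up as a statement of its own: it is buried inside a single assignment, while the equivalent Java would be counted as a loop plus several statements. That is the mapping problem in miniature.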

Virtual Machine Benchmarks

Since we can't compare the languages at the program level or the statement level, what's left? Clearly, the underlying interpreter is what we care about.

We're really comparing the Java Virtual Machine, the PHP interpreter and the Python interpreter. That's what we really care about.

And life is simple because we can compare Java, the Project Zero PHP interpreter (which is based on the JVM) and Jython. We can look at "compiled" PHP, Java class files and Python .pyc files to find the VM primitives used by each language and then -- what? Compare the run-time of the various VM primitives? No, that's silly, since the run-times are all JVM run-times.


What We're Left With


The very best we can do is to compare the statistical distribution of the VM instructions created by Java, PHP or Jython compilers. We could note that maybe PHP or Python uses too many "slow" VM instructions, where Java used more "fast" VM instructions. That would be an "in general" comparison. Right?

See? You can measure anything.

In this case, the compiler itself is a degree of freedom. Sadly, we're not comparing languages "in general". We're comparing the bytecodes created by various compilers. We're actually comparing compilers and compiler optimizations of the bytecode. Sigh.
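The same kind of instruction counting can be tried on a small scale. The sketch below uses CPython's `dis` module as a stand-in, which is an assumption of this example: it inspects CPython bytecode, not the JVM bytecode discussed above, and the two sample functions are hypothetical.

```python
import dis
from collections import Counter


def loop_version(values):
    """Explicit loop, the shape a Java programmer would write."""
    result = []
    for v in values:
        result.append(v * v)
    return result


def comprehension_version(values):
    """The same computation as a list comprehension."""
    return [v * v for v in values]


def opcode_histogram(func):
    """Count the bytecode instructions the compiler emitted for func."""
    return Counter(instr.opname for instr in dis.get_instructions(func))


for func in (loop_version, comprehension_version):
    print(func.__name__)
    for opname, count in opcode_histogram(func).most_common(5):
        print(f"  {opname}: {count}")
```

The histograms differ between the two functions, and they also differ between CPython releases for the very same source. The numbers describe a compiler, not a language.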

That's not what we were hoping for. We were hoping for some kind of "in general" comparison of the language, not the JVM compiler.

Java has pretty sophisticated optimization. Python, however, eschews optimization. PHP has its own weird issues. See this paper from Rob Nicholson of the Project Zero project on how to implement PHP in the JVM. PHP doesn't fit the JVM as well as Python does. So there's a weird bias.

Rule Three. Benchmarking Is Hard.

There is no "in general" comparison of programming languages. All that we can do is benchmark something specific.

It works like this.


  1. Stop quibbling about language performance "in general".
  2. Find something specific and concrete we plan to implement.
  3. Actually write the performance-critical piece in Java, PHP, Python, Ruby, whatever. Yes. Build it several times. Really. We don't want to use "language-independent" or "common" features. We want to optimize ruthlessly: use each language the way it was meant to be used, and use its unique features correctly and completely.
  4. Actually run the performance-critical piece to get actual timings.
  5. Since run-time libraries and hardware are degrees of freedom, we have to use multiple run-time libraries, multiple compiler optimization settings and multiple hardware configurations to make a proper decision on which language to use for our specific problem.
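Steps 3 to 5 can be driven by a very small harness. The sketch below is one possible shape, not a recommended tool: `time_command` and the `candidates` table are hypothetical, and only the Python entry is runnable as written. The Java and PHP command lines are placeholders for whatever the real builds turn out to be.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=5):
    """Run a command several times; return wall-clock seconds for each run."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return samples


# One entry per implementation of the same performance-critical piece.
candidates = {
    "python": [sys.executable, "-c", "print(sum(i * i for i in range(100000)))"],
    # Hypothetical placeholders; replace with the real builds:
    # "java": ["java", "-cp", "build", "Critical"],
    # "php": ["php", "critical.php"],
}

results = {}
for name, argv in candidates.items():
    samples = time_command(argv, runs=3)
    results[name] = (min(samples), statistics.median(samples))

for name, (best, median) in results.items():
    print(f"{name}: best {best:.3f}s, median {median:.3f}s")
```

These timings include process start-up, which matters a great deal for a JVM and much less for a long-running server. Whether start-up belongs in the measurement is one more decision that depends on the specific problem.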

Now we know something about our specific problem domain and the available languages. That's the best we can do.

We can only compare a specific problem, with a specific algorithm. That's the basis for all benchmark comparisons. Since each implementation was well-done and properly optimized, the degree of freedom is the language -- and the run-time implementation of that language -- and the selected OS and hardware.
Published at DZone with permission of Steven Lott, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Andy Leung replied on Thu, 2011/01/06 - 9:40am

Thanks for the article. I agree with problems #1 and #3 but not #2. IMHO, Java is the same as PHP in that it requires no web framework. You can write your own server listener for network requests (which we call web requests when they arrive over HTTP), and even a servlet by itself is not quite a framework. If the servlet API is a framework, then so are the libraries that help PHP translate network packets into web-friendly forms. So I don't really see the difference.

:)

Jilles Van Gurp replied on Thu, 2011/01/06 - 2:43pm

Sort of an open door that hardly needs kicking in but, rule 0: there is no such thing as in general.

That being said, you can learn a lot from how these languages are used. Typically php is used on web servers for stateless web applications. That means three things: single threaded code that is generally constrained by IO throughput and latency and generally not a whole lot of memory to manage. Python is also used for this and in addition it is used a lot for mostly single threaded work related to e.g. batch job or os level scripting (e.g. debian uses it a lot). Nothing wrong with that but single threaded and modest memory requirements stand out as characteristic. Neither language is commonly used for stuff that involves allocating and manipulating lots of memory or doing multi threaded work. That doesn't mean it's not possible, it's just not very common to do this kind of work with those languages.

Java on the other hand is most commonly used for server side stuff where it is very popular for implementing middleware components, including message brokers, databases, application and web servers, as well as stateless web applications. It is not uncommon for java applications to use gigabytes of RAM and have hundreds or even thousands of active threads. In addition it is also used for implementing end user applications, even though that is getting less common. And finally it is the language of choice for developing applications on Android (arguably one of the fastest growing platforms for mobile development currently). So there's apparently nothing stopping Java applications from running really well on anything from a modest mobile phone up to very large clustered server environments.

So, to put it mildly, single threaded and modest memory requirements are not the first thing that come to mind when I think about Java. And contrary to the popular belief, Java is actually quite good at doing multi threaded stuff and pushing around loads of objects in memory, which is probably why it is used for that kind of work a lot.

Java also happens to have an optimizing run-time compiler that generally beats the shit out of what passes for just in time compilation in either the python or php world. Additionally, the Java garbage collector is generally considered to be among the best possible implementations as well. So, if you have some code that doesn't use a lot of memory and boils down to doing simple algorithmic stuff, the Java version is likely to come out on top. Alternatively, you could use the java version of jython to come close to this level of performance and I believe there is a java based php interpreter as well that actually has quite nice performance and scalability characteristics (which is the primary reason it exists to begin with). The dynamic nature of php and python of course means there is a bit of overhead for certain things that are hard to optimize away, even for the JVM. So, in that sense, those languages have some disadvantage that is inherent to their dynamic nature.

But going back to rule 0, this doesn't mean Java is the best language for everything and I'd definitely love to use python again for some nice Django application. But performance is not a reason for that at all.

Artur Biesiadowski replied on Fri, 2011/01/07 - 5:10am

I do not really agree with #1. While true on an obvious level (a language cannot be compared without an implementation), the language is a very big factor in how fast an implementation CAN be - or alternatively, in how hard it is to write an implementation which performs well enough.

When people ask "Is java faster than python?", just read it as "Is best implementation of java faster than best implementation of python?". You design a benchmark and let people from both camps choose their favorite/best implementation to compete.

 

My personal pet peeve with 'benchmark wars' is that as soon as you start proving to python/groovy/whatever fans that their language/implementation is 100 times slower, they immediately jump in with "Oh, for performance-critical code, just call into a C/java subroutine, but you can still keep 99% of the rest of the code in python/groovy, gaining a gazillion times increase in productivity".

Nicolas Bousquet replied on Fri, 2011/01/07 - 10:36am

I agree that you can't directly compare JAVA to PHP to Python with accurate numbers.

But what you can do is make some sort of approximation and build some sort of "mental model" so that you can choose the best one for your needs:

PHP emphasizes using a relational database and letting it handle most of the heavy work through SQL requests as needed (with joins, subqueries and all). PHP is also used for "simple" apps where the language (or even the database) is not required to do really complex tasks like consolidating billions of rows or image processing. For the typical website, PHP is just fine and is really fast. Anyway, on a simple website, the HTTP protocol, the network and the database are the bottleneck. It is not the code that processes HTTP requests and performs text templating that is slow in a web application.

Nearly everybody will admit that JAVA is faster than PHP. Java can do exactly what PHP does (yes, with JSP you are mostly done: you can even include SQL requests in it if you like). Often, though, JAVA is not used for that but for big, fat server applications in big companies.

JAVA architects tend to like adding complex frameworks under their control and tend to consider the database as just a data container. These frameworks, being very generic, are complex and difficult to master, and the huge software stack restricts your productivity and the resulting performance of the application. You can be very fast, but it is very difficult to achieve. With some tasks, there are so many layers that overall the application simply feels "slow". It is simply too difficult to find good enough developers and enough time to optimize.

Because to do the simplest thing this way you need a thousand lines of code and configuration files, you add another layer of complexity with code generation. You use MDA to generate code that you would not even need if you were using a simpler software stack. There is so much to do that the simplest CRUD application can consume man-years to build.

I do not say that you can't do simple thing in a simple way in JAVA. Of course you can. You can be productive, but the fact is... everything is done in the java community to use the most complex thing to do the work just because in one case out of 3 thousand, the code you added could be necessary in the future.

Python is more of a scripting language. Sorry to say that, but it is indeed slow. Java is starting to be considered fast, with performance similar to C++, though many still consider C++ the better choice when performance is critical.

But python IS slow. Python doesn't play on the same level. It is a higher-level language than PHP or JAVA. It is dynamic, and a few lines of Python code can express complex things in an elegant way. You will not use it for heavy tasks like realtime 3D rendering or scientific simulations. But it is really great for hacking, dealing with high-level logic and handling everything where performance does not matter much.

I'm pretty sure everybody out there already knows that. We have all heard of the simplicity of PHP and how you can hack together even big sites with heavy web load without a hitch in PHP. We all know the history of some company that tried to use JAVA to do the same thing and failed, using too much of its time to overengineer everything rather than doing the real work.

And we all know that python code is beautiful to look at and elegant but slow.

Of course people will argue to make their preferred stack shine. Reading my post you would think that I am a PHP developer wanting to bash JAVA and Python. No, I'm a JAVA developer. That's why I know all the internals of the complex frameworks we like to use in JAVA.

Carla Brian replied on Sat, 2012/06/02 - 11:41am

I am not yet familiar with this. Good thing I saw this post, Now i know which of this is efficient to use. - Incredible Discoveries
