In a previous article I stated that deserialisation of objects was faster when recycled objects were used. This is potentially surprising for two reasons: 1) the common belief that creating objects is so fast these days that it doesn't matter, or is just as fast as recycling them yourself; 2) none of the serialisation libraries use recycling by default.
This article explores deserialisation with and without recycling objects, showing that creating objects is not only slower in itself, but also slows down the rest of your program by pushing data out of your CPU caches.
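To make the two approaches concrete, here is a minimal sketch. The `Price` class, its fields, and the binary layout are assumptions for illustration; the real class used in the benchmark may differ. The key difference is whether each message allocates a new object or overwrites the fields of one the caller supplies.

```java
import java.nio.ByteBuffer;

// Hypothetical Price class used for illustration; the real benchmark
// class may have different fields and encoding.
class Price {
    long instrumentId;
    double bid, ask;

    // Deserialise into *this* instance, so the caller can recycle it.
    void readFrom(ByteBuffer in) {
        instrumentId = in.getLong();
        bid = in.getDouble();
        ask = in.getDouble();
    }
}

public class RecyclingDemo {
    // Creates garbage: one new Price per message.
    static Price readNew(ByteBuffer in) {
        Price p = new Price();
        p.readFrom(in);
        return p;
    }

    // Recycles: overwrites the fields of an existing Price, no allocation.
    static Price readInto(ByteBuffer in, Price recycled) {
        recycled.readFrom(in);
        return recycled;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(24);
        buf.putLong(42).putDouble(99.5).putDouble(99.7).flip();

        Price p = new Price();
        readInto(buf, p);
        System.out.println(p.instrumentId + " " + p.bid + "/" + p.ask);
        // prints: 42 99.5/99.7
    }
}
```

In a hot loop, `readInto` can be called millions of times with a single `Price` instance, so the deserialiser produces no garbage at all.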
While this article talks about deserialisation, the same applies to parsing text or reading binary files, as the actions being performed are the same.
The test

In this test, I deserialise 1000 Price objects, but also time how long it takes to copy a block of data. The copy represents work which the application might have to perform after deserialising.
The test is timed one million times and the results sorted. The X-axis shows the percentile timing, e.g. the 90% value is the 90th percentile worst value (i.e. 10% of values are higher).
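A sort-then-index (nearest-rank) percentile over the timings can be computed as below. This is a sketch of the general technique; the article does not show the exact method it used.

```java
import java.util.Arrays;

public class PercentileDemo {
    // Nearest-rank percentile: the smallest value such that at least
    // pct% of the sorted samples are less than or equal to it.
    static long percentile(long[] sortedTimes, double pct) {
        int rank = (int) Math.ceil(pct / 100.0 * sortedTimes.length);
        return sortedTimes[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        long[] times = {5, 1, 9, 3, 7, 2, 8, 4, 6, 10};
        Arrays.sort(times);
        System.out.println("90th percentile: " + percentile(times, 90));
        // prints: 90th percentile: 9  (one of the ten values is higher)
    }
}
```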
As you can see, the deserialisation takes longer if it has to create objects as it goes, and sometimes it takes much, much longer. This is perhaps not so surprising, as creating objects means doing more work and possibly being delayed by a GC. However, the increase in the time to copy a block of data is surprising. It demonstrates that not only is the deserialisation slower, but any work which needs the data cache is also slower as a result (which is just about anything you might do in a real application).
Performance tests rarely show you the impact on the rest of your application.
In more detail

Examining the higher percentiles (the longest times), you can see that the performance is consistently bad if the deserialisation has to wait for the GC. The time to copy the block of data also increases significantly in the worst case.
The code

Recycling example code
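The harness below sketches the shape of the test described above: each run deserialises 1000 prices into a recycled instance, then copies a block of data, timing both phases separately before sorting the results. The `Price` layout, the 64 KB block size, and the reduced run count are assumptions for illustration, not the original benchmark's values.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class BenchmarkSketch {
    static final int PRICES = 1000;
    static final int RUNS = 10_000; // the article times 1,000,000 runs

    // Hypothetical Price class; the real benchmark class may differ.
    static final class Price {
        long id;
        double bid, ask;
        void readFrom(ByteBuffer in) {
            id = in.getLong();
            bid = in.getDouble();
            ask = in.getDouble();
        }
    }

    // One deserialisation pass: reads every price into the same
    // recycled instance, so the loop allocates nothing.
    static void deserialiseAll(ByteBuffer buf, Price recycled) {
        buf.rewind();
        for (int i = 0; i < PRICES; i++)
            recycled.readFrom(buf);
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(PRICES * 24);
        for (int i = 0; i < PRICES; i++)
            buf.putLong(i).putDouble(i).putDouble(i + 0.1);

        // Stand-in for work the application does after deserialising.
        byte[] block = new byte[64 * 1024];
        byte[] copy = new byte[block.length];
        long[] deserialiseTimes = new long[RUNS];
        long[] copyTimes = new long[RUNS];

        Price recycled = new Price();
        for (int r = 0; r < RUNS; r++) {
            long t0 = System.nanoTime();
            deserialiseAll(buf, recycled);
            long t1 = System.nanoTime();
            System.arraycopy(block, 0, copy, 0, block.length);
            long t2 = System.nanoTime();
            deserialiseTimes[r] = t1 - t0;
            copyTimes[r] = t2 - t1;
        }
        Arrays.sort(deserialiseTimes);
        Arrays.sort(copyTimes);

        // Report the 99th percentile (nearest-rank) of each phase.
        int p99 = (int) Math.ceil(0.99 * RUNS) - 1;
        System.out.println("deserialise 99%: " + deserialiseTimes[p99]
                + " ns, copy 99%: " + copyTimes[p99] + " ns");
    }
}
```

Swapping `deserialiseAll` for a version that calls `new Price()` per message is what produces the garbage, and with it the GC pauses and cache pollution that show up in the copy timings.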