Hazelcast Serialization Performance

Externalizer4J recently published a blog post entitled “Hazelcast Serialization Performance”. In the post you’ll see how Externalizer4J can create DataSerializable implementations with the same performance as hand-coded logic.

A few years ago Peter Veentjer wrote an interesting post which compares the performance of Hazelcast’s serialization APIs with that of plain Serializable.

This post shows the results obtained with Externalizer4J’s support for Hazelcast’s DataSerializable and IdentifiedDataSerializable APIs. You’ll see how Externalizer4J can create DataSerializable implementations for you that perform as well as hand-coded logic. Rather than inventing my own benchmark, I used the JMH benchmark that Peter created and posted on GitHub. There is one small difference: since we are in 2016, I used Hazelcast 3.6-RC1 instead of version 3.2 from back in 2014.

Original Results

Let’s start by running the benchmarks again. Since my machine is not the same as Peter’s, comparing the absolute numbers would be meaningless. Instead, the results shown in the chart below are all relative to the Serializable results.

As expected, the results are in line with those presented in Peter’s post. The specialized DataSerializable and IdentifiedDataSerializable implementations of the Order and OrderLine classes (shown below) clearly outperform the Serializable class. Deserialization is over 80% faster and serialization is over 30% faster than with the simple Serializable version.

Of course, these are the results on my machine. As they say, YMMV. If you want to give the benchmark a spin, take a look at the end of this post.

At this point it is worth mentioning that there is no noticeable change in performance between Hazelcast 3.2 and Hazelcast 3.6-RC1. I ran the benchmark with both versions and found no statistically significant difference.

‘Look ma no hand.. coded methods!’

The benefits of using the DataSerializable and IdentifiedDataSerializable APIs are clear. But this comes at a cost too: coding and maintenance. The DataSerializable API requires you to implement two methods to serialize and deserialize each class. That’s where Externalizer4J comes in! In 2016 we added support for Hazelcast to Externalizer4J. This means that Serializable classes can be converted into DataSerializable and IdentifiedDataSerializable classes automatically by Externalizer4J.
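To give a sense of what writing those two methods by hand involves, here is a minimal sketch. To keep it runnable with only the JDK, it uses a local stand-in interface (DataSerializableLike, a hypothetical name) built on java.io.DataOutput and java.io.DataInput; Hazelcast’s real DataSerializable interface passes ObjectDataOutput and ObjectDataInput instead. The OrderLine fields shown here are assumptions, not the benchmark’s actual class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for Hazelcast's DataSerializable interface, so the
// sketch compiles with only the JDK on the classpath.
interface DataSerializableLike {
    void writeData(DataOutput out) throws IOException;
    void readData(DataInput in) throws IOException;
}

// Illustrative class; the fields are assumptions, not the benchmark's OrderLine.
class OrderLine implements DataSerializableLike {
    long productId;
    int quantity;
    String description;

    // An empty instance is created first, then filled in by readData.
    OrderLine() { }

    OrderLine(long productId, int quantity, String description) {
        this.productId = productId;
        this.quantity = quantity;
        this.description = description;
    }

    // Every field must be written by hand, in a fixed order...
    @Override
    public void writeData(DataOutput out) throws IOException {
        out.writeLong(productId);
        out.writeInt(quantity);
        out.writeUTF(description);
    }

    // ...and read back in exactly the same order.
    @Override
    public void readData(DataInput in) throws IOException {
        productId = in.readLong();
        quantity = in.readInt();
        description = in.readUTF();
    }
}

class HandCodedDemo {
    static byte[] toBytes(DataSerializableLike value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            value.writeData(out);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static OrderLine fromBytes(byte[] bytes) {
        try {
            OrderLine line = new OrderLine();
            line.readData(new DataInputStream(new ByteArrayInputStream(bytes)));
            return line;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        OrderLine copy = fromBytes(toBytes(new OrderLine(42L, 3, "widget")));
        System.out.println(copy.productId + " x" + copy.quantity + " " + copy.description);
    }
}
```

Adding, removing, or reordering a field means touching both methods, which is the maintenance cost in question.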

Externalizer4J’s optimization process is simple. After compilation by the Java compiler, Externalizer4J analyses the byte code (not the source code) and generates the necessary methods and interface for you. It goes straight to byte code, with no intermediate source code generation. You refactor your class? No problem, Externalizer4J generates the methods anew each time. You no longer need to write or maintain any of this serialization-related code, while enjoying all the performance benefits.

Generating DataSerializable implementations

Externalizer4J, as its name implies, was originally designed to convert Serializable classes into Externalizable classes. In 2016 its functionality was extended to include support for Hazelcast. Externalizer4J can now generate the readData() and writeData() methods automatically, using Hazelcast’s serialization API to maximize performance.

To enable the generation of DataSerializable classes from plain Serializable classes, simply add the following line to your externalizer4j.properties file. To find out more about how to configure Externalizer4J, look at the resources below.

    # Optimize for Hazelcast DataSerializable API instead of Externalizable (default)
    optimizer=hazelcast.dataserializable

The chart below compares the performance of the handwritten methods with 3 different levels of optimization by Externalizer4J. These results were all obtained starting from the two Serializable classes shown above.

Automatically generated DataSerializable classes with different optimization options

A comparison of the performance obtained using different optimizations generated automatically by Externalizer4J. Using information about the nullability and uniqueness of object references, performance can be improved even further.

If you have only 2 minutes to spare: the performance of the automatically generated version is as good as that of handwritten code. If you have more than 2 minutes please keep reading.

The leftmost results, labelled DataSerializable (DS), serve as the reference. This is the optimized serialization written by hand. The results of the basic conversion, labelled SerializableAutoBasic, are almost on par with Peter’s code. Deserialization is 81% faster than Serializable, versus 85% for the reference DataSerializable implementation; for serialization that’s 25% versus 32% (for DS). These first results were obtained without a single change to the Order and OrderLine classes!

The results labelled SerializableAutoMedium and SerializableAutoOpt improve upon the performance of SerializableAutoBasic. In fact, the rightmost results are exactly the same as those of the handwritten code! What is the difference then? By default Externalizer4J generates the safest possible serialization and deserialization logic. This means that the logic takes into account the fact that non-primitive fields may be null. The result is logic that behaves just like the JDK’s serialization algorithm.

To guide the optimization process you have to provide additional information regarding:

  1. uniqueness of an object reference
  2. nullability of an object reference

Using annotations you can tell Externalizer4J which object references are unique and which references will never be null. This information allows Externalizer4J to generate the fastest yet still safe serialization logic for your class. An example of these annotations is shown in the screenshot in the next section.
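Why a non-null hint helps can be illustrated with a small sketch. This is not Externalizer4J’s generated code; it only contrasts a null-safe encoding, which writes a presence flag before the value, with the shorter encoding that becomes possible once a field is known never to be null. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration of null-safe versus non-null field encoding.
class NullabilitySketch {

    // Null-safe form: a one-byte presence flag precedes the value.
    static byte[] writeNullable(String value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeBoolean(value != null);
            if (value != null) {
                out.writeUTF(value);
            }
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String readNullable(byte[] bytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            // Read the flag first; only read the value if it was present.
            return in.readBoolean() ? in.readUTF() : null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Non-null form: no flag and no branch. Passing null here throws a
    // NullPointerException, so the "never null" promise has to be true.
    static byte[] writeNonNull(String value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeUTF(value);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String readNonNull(byte[] bytes) {
        try {
            return new DataInputStream(new ByteArrayInputStream(bytes)).readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("nullable bytes: " + writeNullable("EUR").length);
        System.out.println("non-null bytes: " + writeNonNull("EUR").length);
    }
}
```

The saving per field is small, one byte and one branch in this sketch, but it applies to every annotated reference of every object that is serialized.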

IdentifiedDataSerializable too

Externalizer4J can not only convert a good old Serializable class into a DataSerializable one; IdentifiedDataSerializable generation is supported too.

The IdentifiedDataSerializable API expects the getId() and getFactoryId() methods to be present. These methods will not be generated by Externalizer4J; you have to write them yourself. But if the Serializable class implements these two methods, then Externalizer4J can generate an IdentifiedDataSerializable instead of a DataSerializable class for you.

An example of such a Serializable Order class, which implements getId() and getFactoryId(), is shown below. This example also illustrates the use of annotations to optimize serialization even further. In this case the @Optimize annotation tells Externalizer4J that more advanced optimizations can be used. The types of optimization are defined through the externalizer4j.properties file. See the documentation for more details.

IMPORTANT: from the import statement you can see that the @Optimize annotation is NOT part of Externalizer4J! @Optimize is a custom annotation which is part of our benchmark project. We don’t want to introduce dependencies on Externalizer4J APIs in existing projects. Through the externalizer4j.properties file you can tell Externalizer4J which annotation it should use.
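As a rough sketch, a Serializable class prepared for this conversion might look like the following. The method names and int return types follow Hazelcast 3.x’s IdentifiedDataSerializable interface; the constant values, the fields, and the OrderDemo helper are assumptions made for illustration, not the benchmark’s actual code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// A plain Serializable class that already carries the two identifier
// methods. It has no dependency on Hazelcast or Externalizer4J.
class Order implements Serializable {
    private static final long serialVersionUID = 1L;

    // Illustrative values; a real project chooses its own ids.
    static final int FACTORY_ID = 1;
    static final int ORDER_TYPE_ID = 10;

    long orderId;
    String customer;

    Order(long orderId, String customer) {
        this.orderId = orderId;
        this.customer = customer;
    }

    // Identifies the factory that can create instances of this class.
    public int getFactoryId() {
        return FACTORY_ID;
    }

    // Identifies this class within that factory.
    public int getId() {
        return ORDER_TYPE_ID;
    }
}

class OrderDemo {
    // Round trip through standard JDK serialization, to show that the
    // class is still an ordinary Serializable before any conversion.
    static Order roundTrip(Order order) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(order);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return (Order) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Order copy = roundTrip(new Order(7L, "alice"));
        System.out.println(copy.orderId + " " + copy.customer
                + " factory=" + copy.getFactoryId() + " id=" + copy.getId());
    }
}
```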

[Screenshot: Serializable class with getId() and getFactoryId() methods]

The results of the conversion from this Serializable class are shown below. Again the automatically generated logic performs as well as the handwritten one.

Autogenerated IdentifiedDataSerializable class which performs as well as one written by hand. Maximizing Hazelcast serialization performance just got easier.

The performance of the generated class matches that of the handwritten one. Because the Serializable class already implemented the getId() and getFactoryId() methods needed by the IdentifiedDataSerializable implementation, Externalizer4J generated the latter instead of a DataSerializable class.

Don’t believe me, try it!

Please don’t believe me! Yes, that’s right. Be skeptical and put it to the test. Download the source for these benchmarks from GitHub and try it for yourself.

The original benchmark is built using Apache Maven. The pom.xml has been modified to use Externalizer4J’s Maven plugin. An externalizer4j.properties file has been added to the resources of the module called common.

  • NO strings attached
  • NO trial
  • NO email, no phone calls, etc…

Externalizer4J integrates with different build tools. Click the links below for the tool of your choice.

IntelliJ Users – Maven Users – Ant Users

Conclusion

If you are a happy Hazelcast user and would like to optimize your code, take a look at their serialization APIs. And when you have done that, give Externalizer4J a try and see what it can do for your code. It is serialization optimization done for you…

Please comment on this page or reach out on Twitter or Google+.

Resources

  • Peter Veentjer’s original post
  • Configuration basics for Externalizer4J
  • Hazelcast DataSerializable API