In-Memory Format

When a value is stored in a normal HashMap, the actual instance is stored. This works fine because the HashMap is not a distributed data structure, and therefore everything happens in the same memory space. The Hazelcast IMap is a distributed data structure, so storing a value created on one machine and reading it on a different machine is more complicated than with an ordinary HashMap. Hazelcast gives you control over the format used to store the value through the following 3 in-memory-formats:

  1. BINARY: the value is stored in serialized form.
  2. OBJECT: the value is not stored in serialized form, but as a normal object instance.
  3. CACHED: a combination of BINARY and OBJECT: the value is always stored in serialized form, but when the object form is needed it is cached, so it doesn’t need to be deserialized again the next time the object form is needed.

The in-memory-format can be set like this:

<map name="employees">
    <in-memory-format>OBJECT</in-memory-format>   
</map>

Which one?

The question, of course, is which in-memory-format you want to use:

  1. BINARY: more efficient than OBJECT if normal map operations like get/put are in the majority. This sounds counterintuitive, but the BINARY in-memory-format exactly matches the format required for reading and writing, since Hazelcast always serializes when writing and deserializes when reading. With the OBJECT format an additional step is required: when a value is written, it is first serialized (just like BINARY), but then it needs to be deserialized to be stored as an object. When the value is read, the stored object needs to be serialized before it can be deserialized on the reading side (the actual object instance is never returned).
  2. OBJECT: preferable if the majority of your operations are queries or entry processors. These operations can access the stored object directly and bypass serialization/deserialization, unlike with the BINARY format.
  3. CACHED: combines the advantages of the BINARY and OBJECT formats, but uses more memory since the value is potentially stored in 2 formats instead of 1.
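
To make the extra step of the OBJECT format concrete, the following is a toy model in plain Java, not Hazelcast code. It simply counts the conversions for one put followed by one get, following the steps described above. The class and method names (`FormatCostSketch`, `putThenGetBinary`, `putThenGetObject`, `countSteps`) are made up for this illustration, and Java serialization stands in for whatever serialization is actually configured.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Toy model, not Hazelcast code: counts the conversions one value goes through
// for a put followed by a get, following the steps described in the text.
class FormatCostSketch {
    static int serializations;
    static int deserializations;

    static byte[] serialize(Object value) {
        serializations++;
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            out.writeObject(value);
            out.close();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    static Object deserialize(byte[] data) {
        deserializations++;
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // BINARY: the bytes produced by the writer are stored and handed back as they are.
    static Object putThenGetBinary(Object value) {
        byte[] stored = serialize(value);      // put: writer serializes, bytes are stored
        return deserialize(stored);            // get: reader deserializes the stored bytes
    }

    // OBJECT: same steps, plus a conversion into and out of object form in the store.
    static Object putThenGetObject(Object value) {
        byte[] written = serialize(value);     // put: writer serializes
        Object stored = deserialize(written);  // put: store converts to an object instance
        byte[] read = serialize(stored);       // get: store serializes the instance again
        return deserialize(read);              // get: reader deserializes
    }

    // Returns {serializations, deserializations} for one put followed by one get.
    static int[] countSteps(boolean objectFormat) {
        serializations = 0;
        deserializations = 0;
        if (objectFormat) {
            putThenGetObject("some value");
        } else {
            putThenGetBinary("some value");
        }
        return new int[] {serializations, deserializations};
    }

    public static void main(String[] args) {
        int[] binary = countSteps(false);
        int[] object = countSteps(true);
        System.out.println("BINARY: " + binary[0] + " serialize, " + binary[1] + " deserialize");
        System.out.println("OBJECT: " + object[0] + " serialize, " + object[1] + " deserialize");
    }
}
```

In this model a put/get pair costs one serialization and one deserialization with BINARY, and two of each with OBJECT. The model does not cover queries and entry processors, which is where the OBJECT format saves work.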

Important: there are some design considerations I want to mention explicitly:

  1. When the OBJECT/CACHED in-memory-format is used and a value is written, a copy of the object is stored and not the actual instance. So a change made to a value after it is stored will not be reflected in the stored value.
  2. When the OBJECT/CACHED in-memory-format is used and a value is read, a copy of the stored object is returned and not the actual instance. So a change made to a value after it is read will not be reflected in the stored value. So the following idiom is broken:
Employee employee = employeeMap.get("123");
employee.fire();

If you want to make this change visible in the employeeMap, the value needs to be written back:

Employee employee = employeeMap.get("123");
employee.fire();
employeeMap.put("123", employee);
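
The copy semantics described above can be imitated in plain Java. The sketch below is not Hazelcast code: `CopyingMap` is a hypothetical stand-in that keeps serialized bytes purely as a convenient way to hand out copies, and the `Employee` class with its `fired` flag is an assumption made for the illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

class CopySemanticsSketch {
    // Hypothetical value type standing in for the Employee used in the text.
    static class Employee implements Serializable {
        private static final long serialVersionUID = 1L;
        boolean fired;

        void fire() {
            fired = true;
        }
    }

    // Toy stand-in for the map: put stores a copy, get returns a fresh copy.
    static class CopyingMap {
        private final Map<String, byte[]> store = new HashMap<>();

        void put(String key, Employee value) {
            try {
                ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                ObjectOutputStream out = new ObjectOutputStream(bytes);
                out.writeObject(value);
                out.close();
                store.put(key, bytes.toByteArray());
            } catch (IOException e) {
                throw new IllegalStateException(e);
            }
        }

        Employee get(String key) {
            byte[] data = store.get(key);
            if (data == null) {
                return null;
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
                return (Employee) in.readObject();
            } catch (IOException | ClassNotFoundException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    // The broken idiom: only the local copy is changed.
    static boolean firedAfterBrokenIdiom() {
        CopyingMap employeeMap = new CopyingMap();
        employeeMap.put("123", new Employee());
        Employee employee = employeeMap.get("123");
        employee.fire();
        return employeeMap.get("123").fired;
    }

    // The fix: write the changed copy back into the map.
    static boolean firedAfterWriteBack() {
        CopyingMap employeeMap = new CopyingMap();
        employeeMap.put("123", new Employee());
        Employee employee = employeeMap.get("123");
        employee.fire();
        employeeMap.put("123", employee);
        return employeeMap.get("123").fired;
    }

    public static void main(String[] args) {
        System.out.println("fired without write-back: " + firedAfterBrokenIdiom());
        System.out.println("fired with write-back: " + firedAfterWriteBack());
    }
}
```

Without the write-back the stored employee stays unfired, because `fire()` only touched the copy that `get` returned.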

Of course, you need to guard against a lost update, e.g. by using a lock or a non-blocking IMap.replace(key,oldValue,newValue).
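
A minimal sketch of the non-blocking variant is a read, modify and replace loop that retries when another writer got in between. Since IMap extends java.util.concurrent.ConcurrentMap, the loop is written against that interface and exercised here with a local ConcurrentHashMap, so it runs without a cluster. The immutable `Employee` class and the `fire` helper are assumptions made for the illustration. On a real IMap, how the old value is compared depends on the in-memory-format, as discussed in the next section.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class ReplaceLoopSketch {
    // Hypothetical immutable value; equals/hashCode let replace() compare old values.
    static final class Employee {
        final String name;
        final boolean fired;

        Employee(String name, boolean fired) {
            this.name = name;
            this.fired = fired;
        }

        Employee fire() {
            return new Employee(name, true);
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof Employee)) {
                return false;
            }
            Employee that = (Employee) other;
            return fired == that.fired && name.equals(that.name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, fired);
        }
    }

    // Retries until the swap succeeds against the value that was actually read.
    // Returns false if there is no entry for the key.
    static boolean fire(ConcurrentMap<String, Employee> employeeMap, String key) {
        while (true) {
            Employee oldValue = employeeMap.get(key);
            if (oldValue == null) {
                return false;
            }
            Employee newValue = oldValue.fire();
            if (employeeMap.replace(key, oldValue, newValue)) {
                return true;
            }
            // Another writer changed the entry in the meantime: read again and retry.
        }
    }

    public static void main(String[] args) {
        ConcurrentMap<String, Employee> employeeMap = new ConcurrentHashMap<>();
        employeeMap.put("123", new Employee("Peter", false));
        fire(employeeMap, "123");
        System.out.println("fired: " + employeeMap.get("123").fired);
    }
}
```

Compared to a lock, the loop never blocks other writers, at the price of repeating the work when the replace fails.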

Hashcode and equals

The in-memory-format is also closely tied to equals/hashcode. For the key, equals/hashcode is determined by the binary format. For the value, equals is based on the in-memory-format:

  1. BINARY: equals/hashcode is based on the serialized form, so equal objects need to have exactly the same binary representation. You need to be really careful when relying on normal serialization, since equal objects can have different binary representations.
  2. OBJECT/CACHED: equals/hashcode is based on the equals/hashcode of the object itself. So the in-memory-format should not be changed without considering the consequences for the equals/hashcode implementation.
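
The warning about normal serialization can be reproduced with nothing but the JDK. In the sketch below, two LinkedHashMap instances hold the same entries in a different insertion order: they are equal according to equals, but Java serialization writes the entries in insertion order, so the bytes differ. The class and helper names (`BinaryEqualitySketch`, `mapOf`, `serialize`) are made up for this illustration, and Java serialization is an assumption about the serialization in use.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;
import java.util.LinkedHashMap;

class BinaryEqualitySketch {
    static byte[] serialize(Serializable value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            out.writeObject(value);
            out.close();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds a two-entry map; the argument order is the insertion order.
    static LinkedHashMap<String, Integer> mapOf(String key1, int value1, String key2, int value2) {
        LinkedHashMap<String, Integer> map = new LinkedHashMap<>();
        map.put(key1, value1);
        map.put(key2, value2);
        return map;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, Integer> first = mapOf("age", 30, "level", 2);
        LinkedHashMap<String, Integer> second = mapOf("level", 2, "age", 30);

        // Object-based comparison, as used for OBJECT/CACHED values.
        System.out.println("equals: " + first.equals(second));

        // Byte-based comparison, as used for BINARY values and for keys.
        System.out.println("same bytes: " + Arrays.equals(serialize(first), serialize(second)));
    }
}
```

The two maps are equal as objects, yet a comparison of their serialized forms treats them as different values.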