Controlled Partitioning

One of the cool features of the soon-to-be-released Hazelcast 3.1 is the ability to control the partition of certain DistributedObjects, such as the IAtomicLong or ILock. This makes it easy to collocate data, which improves performance and scalability thanks to a stronger locality of reference. Let’s start with a basic example. Imagine there are two counters:

IAtomicLong counter1 = hz.getAtomicLong("counter1");
IAtomicLong counter2 = hz.getAtomicLong("counter2");

If these two counters are always used together in an operation, it is very likely that there will be unwanted remoting. The cause is that Hazelcast uses the name not only for identification, but also to determine the partition, so these two counters will probably end up in different partitions. To resolve this problem, Hazelcast 3.1 makes it possible to include the partition-key in the name:

IAtomicLong counter1 = hz.getAtomicLong("counter1@hazelcast");
IAtomicLong counter2 = hz.getAtomicLong("counter2@hazelcast");

By appending ‘@partition-key’, you tell Hazelcast to determine the partition based on the partition-key section. If you have configured a DistributedObject in the Hazelcast configuration, Hazelcast looks up that configuration using the base-name of the object, i.e. the name without the @partition-key section, so you don’t need to change your configuration to make use of @partition-key. It is important to understand that the full name of a DistributedObject includes the @partition-key section:

IAtomicLong counter1 = hz.getAtomicLong("counter1@hazelcast");
IAtomicLong counter2 = hz.getAtomicLong("counter1");
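To make the convention concrete, here is a plain-Java sketch of how such a name splits into a base-name and a partition-key. The helper names are mine, not Hazelcast’s actual implementation:

```java
// Hypothetical helpers (not Hazelcast's real code) illustrating the
// '@partition-key' naming convention.
public class PartitionNames {

    // the part before '@' is the base-name used for configuration lookup
    public static String baseName(String name) {
        int idx = name.indexOf('@');
        return idx == -1 ? name : name.substring(0, idx);
    }

    // the part after '@' determines the partition; without an '@',
    // the whole name is used
    public static String partitionKey(String name) {
        int idx = name.indexOf('@');
        return idx == -1 ? name : name.substring(idx + 1);
    }

    public static void main(String[] args) {
        System.out.println(baseName("counter1@hazelcast"));     // counter1
        System.out.println(partitionKey("counter1@hazelcast")); // hazelcast
        System.out.println(partitionKey("counter1"));           // counter1
    }
}
```

Both names above share the base-name ‘counter1’, but their partition-keys differ, so they identify different objects.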

Here counter1 and counter2 refer to different counters, even though their base-names are the same, because the full names ‘counter1@hazelcast’ and ‘counter1’ differ. In the previous examples the ‘hazelcast’ partition-key is used, but you are completely free to use any value you see fit, as long as the partition-keys are spread equally among the partitions. If you don’t care about the actual value, you can use the ‘randomPartitionKey’ method to generate a partition-key for you:

String partitionKey = hz.getPartitionService().randomPartitionKey();

Other good candidates are a random value, e.g. generated with the Random class, or a UUID, although the latter causes more overhead due to its length. In some cases you can’t create all DistributedObjects up front: you only have a reference to a DistributedObject, but you want to create a new one in the same partition. This is possible using the ‘getPartitionKey’ method:

String partitionKey = counter1.getPartitionKey();
IAtomicLong counter3 = hz.getAtomicLong("counter3@"+partitionKey);

As you can see, counter3 uses the same partition-key as counter1 and therefore ends up in the same partition. In practice you also want to send the function to be executed to the partition where the data is stored; otherwise your data is collocated, but you will still be making remote calls to access it. To send the function to the right partition, have a look at the executeOnKeyOwner method of the IExecutorService:

executor.executeOnKeyOwner(someTask,"hazelcast");
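The post doesn’t define ‘someTask’; a hypothetical sketch of such a task (class and method names are mine, assuming the Hazelcast 3.1 API) could obtain the local HazelcastInstance via HazelcastInstanceAware and then touch both counters without any remote calls:

```java
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.HazelcastInstanceAware;
import com.hazelcast.core.IAtomicLong;
import java.io.Serializable;

// A hypothetical task: once it runs on the member owning the 'hazelcast'
// partition-key, both counter lookups below are local operations.
public class SomeTask implements Runnable, Serializable, HazelcastInstanceAware {

    private transient HazelcastInstance hz;

    @Override
    public void setHazelcastInstance(HazelcastInstance hz) {
        // injected by Hazelcast on the executing member
        this.hz = hz;
    }

    @Override
    public void run() {
        // both counters share the 'hazelcast' partition-key, and this
        // member owns that partition, so no remoting is involved
        IAtomicLong counter1 = hz.getAtomicLong("counter1@hazelcast");
        IAtomicLong counter2 = hz.getAtomicLong("counter2@hazelcast");
        counter2.set(counter1.incrementAndGet());
    }
}
```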

In this example ‘someTask’ will be executed on the member owning the partition for partition-key ‘hazelcast’. The IMap can also be used in combination with controlled partitioning. Normally the key is used to determine the partition, but in some cases this is undesirable because you want different keys to be stored in the same partition. This can be done by letting the key implement the PartitionAware interface, for example:

public class PartitionAwareKey
        implements PartitionAware, DataSerializable {
    private String key;
    private String partitionKey;

    // required by DataSerializable
    public PartitionAwareKey() {
    }

    public PartitionAwareKey(String key, String partitionKey) {
        this.key = key;
        this.partitionKey = partitionKey;
    }

    @Override
    public String getPartitionKey() {
        return partitionKey;
    }

    @Override
    public void writeData(ObjectDataOutput out) throws IOException {
        out.writeUTF(key);
        out.writeUTF(partitionKey);
    }

    @Override
    public void readData(ObjectDataInput in) throws IOException {
        key = in.readUTF();
        partitionKey = in.readUTF();
    }
    // equals and hashCode (over both fields) are also needed for map keys
    ...
}

Now data can be inserted like this:

IMap map = hz.getMap("map");
map.put(new PartitionAwareKey("a","hazelcast"),"foo");
map.put(new PartitionAwareKey("b","hazelcast"),"foo");
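To check where the entries ended up, one could compare the partitions of the two keys. This is a sketch, assuming a running HazelcastInstance ‘hz’ and the PartitionAwareKey class above:

```java
// Both keys carry the same partition-key, so they should resolve to the
// same partition.
Partition p1 = hz.getPartitionService().getPartition(new PartitionAwareKey("a", "hazelcast"));
Partition p2 = hz.getPartitionService().getPartition(new PartitionAwareKey("b", "hazelcast"));
// p1.getPartitionId() and p2.getPartitionId() return the same id
```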

Now both map entries will be placed in the partition of partition-key ‘hazelcast’. We are thinking about making the partitioning schema for the map more flexible using an injectable partitioning strategy, so that you can say:

map.put("a@hazelcast","foo");

More about this in the near future. With the new @partition-key syntax, you now have full control over the partition where a DistributedObject is stored. In my experience, figuring out a correct partitioning schema is one of the most important choices to make if you want a high-performance and scalable system, so make sure you tackle it as soon as possible.


Further Reading

Partitioning vs. Sharding