Hazelcast Discovery SPI

Your Perception of the Network

Service Discovery or Network Exploration

Eureka, ZooKeeper, Consul and others: service discovery is an integral part of almost all new, elastically scaling software systems. Instances or nodes of the application can come, go or be restarted at any time, and the system has to adapt automatically and discover new members.

Service discovery often works in two or three easy steps. Either the runtime environment (e.g. Microsoft Azure, Amazon EC2 or Google Compute Engine) automatically registers the new virtual machine (VM) instance at startup time and pushes metadata, or the application registers itself while starting up. While the application is running, it might push a keep-alive notification to the service discovery system from time to time, and when being shut down it removes itself. The keep-alive is used to remove an instance after any kind of failure that prevents the application from deregistering gracefully.

To find all instances of an application, a client can then ask the service discovery system using metadata or an application name, and maybe a scope. Sometimes the service discovery system also holds additional information about the datacenter location, racks or availability zones (failure zones).

The Magic of Hazelcast

Ever since I started using Hazelcast, it offered some amazing black magic. The out-of-the-box experience definitely meets my 5-minute rule, which is “either you get a lib running in less than 5 minutes or it is too complicated”. Not a lot of software can meet that soft requirement. By default, Hazelcast uses multicast to detect other cluster members inside the same network. Since multicast and production operation do not seem to fit well together, at least in my experience, Hazelcast has always offered a way to configure fixed IP addresses.

To scale dynamically, not all IP addresses have to be named: the cluster network topology is shared as soon as the first connection succeeds. Even though it's not strictly necessary, it is highly recommended to set up 2 or 3 well-known, always reachable endpoints. Why 2 or 3, you ask? It's simple: if one of those nodes fails – it happens – there are still 1 or 2 other nodes to act as an entry point into the cluster.
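
Just to make that concrete, a fixed-IP (TCP/IP) join can be configured programmatically like this – a minimal sketch, with placeholder addresses of course:

Config config = new Config();
JoinConfig joinConfig = config.getNetworkConfig().getJoin();

// multicast is the default, so deactivate it explicitly
joinConfig.getMulticastConfig().setEnabled(false);

// name 2 or 3 well known members as entry points into the cluster
joinConfig.getTcpIpConfig()
    .setEnabled(true)
    .addMember("10.0.0.1")
    .addMember("10.0.0.2");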

So Why Service Discovery?

So aren't multicast and fixed IP addresses just enough? Don't they solve all the easy and more complex scenarios?

Unfortunately not, and the reason is quite simple: as modern runtime environments become more dynamic, discovery of members becomes more complex. In addition, most environments offer additional metadata for failure zones and regions. None of this information can be utilized with either of the two discovery methods above.

Furthermore, most environments provide their own service discovery solution (often based on REST API calls), or a preferred one is deployed into the private cloud. So why not just use the already existing information, including all the necessary metadata?

Introducing the Hazelcast Discovery SPI

With version 3.6, Hazelcast added the (Cloud) Discovery SPI, but what's the deal with it? The Discovery SPI offers exactly the entry point to plug in any existing or custom-built strategy that tells Hazelcast about other cluster members' addresses and additional metadata.

To achieve that the Discovery SPI consists of 3 basic parts:

  • Plugin Configuration
  • The member discovery itself
  • Additional features for failure zones and, in the future, auto-scaling or partitioning strategies

To make discovery happen, we need to implement at least the first two parts of the SPI or use one of the already existing implementations from either Hazelcast or the community. A list of available plugins can be found on the community plugins page, and further, non-listed plugins on GitHub.

DIY – Do It Yourself

Anyhow, here we want to learn how to implement our own, custom Discovery SPI implementation. As a base for our own discovery mechanism, we'll be working against an extremely simple but easy-to-understand custom REST API for service discovery. Still, the API is close enough to official solutions to make sense as a demonstration.

So what does the REST API look like? As mentioned, it is the simplest possible approach to service discovery: add, remove, list. The implementation uses JAX-RS, therefore a basic understanding of the JAX-RS specification is useful.

POST /api/services/{scope}: Registers a new node based on a scope (to make multiple applications available in parallel), the private IP address (host) and the TCP port of the node (port).

@POST
@Path("{scope}")
@Produces(MediaType.APPLICATION_JSON)
public Response register(@PathParam("scope") String scope, 
                         @QueryParam("host") String host, 
                         @QueryParam("port") int port)

DELETE /api/services/{scope}: Removes a listed node based on scope, IP address and TCP port, just the same information we used to register the node.

@DELETE
@Path("{scope}")
@Produces(MediaType.APPLICATION_JSON)
public Response delete(@PathParam("scope") String scope, 
                       @QueryParam("host") String host,
                       @QueryParam("port") int port)

GET /api/services/{scope}: Lists all registered nodes for the given scope, returned as a JSON array of endpoints.

@GET
@Path("{scope}")
@Produces(MediaType.APPLICATION_JSON)
public Response services(@PathParam("scope") String scope)

We're just looking at the method signatures here, but they should make the idea clear enough. To follow up on the implementation, please find the full source code of the REST API here.
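
Just to give an idea of what could sit behind such a signature, here is a purely illustrative, in-memory sketch of the register method – the real implementation linked above may store things differently:

// illustrative in-memory registry: scope -> set of "host:port" entries
private final ConcurrentMap<String, Set<String>> registry =
    new ConcurrentHashMap<>();

@POST
@Path("{scope}")
@Produces(MediaType.APPLICATION_JSON)
public Response register(@PathParam("scope") String scope,
                         @QueryParam("host") String host,
                         @QueryParam("port") int port) {
  registry.computeIfAbsent(scope, s -> ConcurrentHashMap.newKeySet())
          .add(host + ":" + port);
  return Response.ok().build();
}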

Apart from the two strings (scope and host) and the integer (port), we'll use a few more model classes on the client side to make the implementation easier to understand. Our two model classes represent the registered services and endpoints. So let's have a quick look at them:

Service:

class Service {
  private String service;
  private List<Endpoint> endpoints;

  public String getService() { return service; }
  public List<Endpoint> getEndpoints() { return endpoints; }
}

A Service defines all registered endpoints (cluster members) for a certain application scope, called service.

Endpoint:

class Endpoint {
  private String host;
  private int port;

  public String getHost() { return host; }
  public int getPort() { return port; }
}

An Endpoint defines a single registered member based on host and port.

To work against the REST API we use Retrofit, a library developed by Square that lets us call a remote REST API through a plain Java interface. That said, we need an interface representing the 3 methods of the REST API defined before. Since I'm not a master of creativity, the interface is called SomeRestService 😉

interface SomeRestService {
  @GET("/api/services/{scope}")
  Call<Service> services(@Path("scope") String scope);

  @POST("/api/services/{scope}")
  Call<Endpoint> register(@Path("scope") String scope,
                          @Query("host") String host,
                          @Query("port") int port);

  @DELETE("/api/services/{scope}")
  Call<Void> unregister(@Path("scope") String scope,
                        @Query("host") String host,
                        @Query("port") int port);
}

Looking at the code above, we see that it mirrors the parameter sets and HTTP verbs of the REST API. It's as simple as that, and since we're done with the preparations, let's jump right into the Discovery SPI part.

Configuration with Discovery in Mind

To begin with, we'll talk about how to configure Discovery SPI implementations. For this, Hazelcast provides a simple-to-use but powerful configuration facility that takes care of the most common needs, like data type conversion or value validation.

The basic interfaces and classes to work with are PropertyDefinition, which defines a configuration property by name and data type, and the type converter interface TypeConverter, or the predefined PropertyTypeConverter enum, which covers the most common cases.

Furthermore, there is the ValueValidator, which can be used to validate values after conversion to the expected data type. This is useful when certain values that are legal for the Java data type are not legal due to other constraints. The most common example is a TCP port number, which is only valid in the range 0-65535, whereas a Java int covers a much larger range, including negative values.
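
A port validator along those lines could look like the following sketch – for illustration only, we won't need it for our plugin:

// illustrative sketch: reject values outside the legal TCP port range
ValueValidator PORT_VALIDATOR = value -> {
  int port = (Integer) value;
  if (port < 0 || port > 65535) {
    throw new ValidationException("illegal port: " + port);
  }
};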

So let's go ahead and implement the validator we actually need, one for URLs:

// validates that the given configuration value is a well-formed URL
ValueValidator URL_VALIDATOR = value -> {
  String url = value.toString();
  try {
    new URL(url);
  } catch (MalformedURLException e) {
    throw new ValidationException(e);
  }
};

As we see, the implementation tries to validate that the given string is an actual URL, otherwise the validator throws a ValidationException.

If URL were directly assignable to Comparable, we could implement the conversion using a custom TypeConverter that fails when the string is not convertible to a URL. Since it is not, we keep the value as a string and just make sure that it would be convertible.
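
For comparison, such a converting approach could look like the following sketch: it validates during conversion but still returns the (Comparable) string, since java.net.URL itself does not implement Comparable.

// illustrative sketch: validate while converting, keep the value a String
TypeConverter URL_CONVERTER = value -> {
  String url = value.toString();
  try {
    new URL(url); // validation only
  } catch (MalformedURLException e) {
    throw new ValidationException(e);
  }
  return url;
};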

The next step is to define the available properties themselves. It has turned out to be very practical to have one XProperties class per plugin, documented with Javadoc, so that the generated Javadoc can serve as documentation of all available config properties. In our case it's obviously called SomeRestServiceProperties.

class SomeRestServiceProperties {
  public static final ValueValidator URL_VALIDATOR = { ... }
    
    
  /**
   * Defines a name for the application scope. All
   * instances registered using the same application
   * scope will automatically be discovered.
   * default: hazelcast-cluster
   */
  public static final PropertyDefinition APPLICATION_SCOPE =
    new SimplePropertyDefinition("application.scope", true, STRING);
    
    
  /**
   * Defines the url of the remote REST API URL for
   * service discovery.
   * default: http://localhost:12345/
   */
  public static final PropertyDefinition DISCOVERY_URL =
    new SimplePropertyDefinition("discovery.url", true, STRING, URL_VALIDATOR);
}

Nice, isn't it? So let's go through it bit by bit. We skipped the ValueValidator implementation, as it is already shown above. The next two property definitions are the interesting part, but wait, what's SimplePropertyDefinition? Easy: it's just a straightforward implementation of the corresponding PropertyDefinition interface, so we don't have to implement it ourselves.

Ok, so what's defined here? First, we've created a property to hold the "application scope", or service name. It is defined as optional, since we know a meaningful default value (and yes, startup would fail if it were defined as required but not set :-)), and as a string. Easy, right? So let's guess what the second property does. Right: it holds the URL of our REST API. It is again optional (we know a meaningful default here as well), it's a string, and it uses our custom-built ValueValidator implementation.

So far, so good. As mentioned before, the Javadoc acts as perfect documentation of all available properties of our Discovery SPI implementation, and people can simply look it up. But enough of all the fuss, let's get on with it and implement the actual discovery strategy.

Java and Industrialization – Factories

For the discovery itself we need two further classes. The first one is as simple as a few lines: just a factory to create our discovery strategy. For that, we provide an implementation of the DiscoveryStrategyFactory interface. There is another use for the factory, too: providing all legal properties – yes, the ones we designed above. To do that we create an unmodifiable collection to return later; again, the full source is available here:

Collection<PropertyDefinition> PROPERTY_DEFINITIONS =
    Collections.unmodifiableCollection(
        Arrays.asList(APPLICATION_SCOPE, DISCOVERY_URL));

To fulfill the SPI requirements, the factory provides the actual strategy class, the supported configuration properties, and the method that creates the strategy instance itself.

class SomeRestServiceDiscoveryStrategyFactory
    implements DiscoveryStrategyFactory {
    
  private static final Collection<PropertyDefinition> PROPERTY_DEFINITIONS = ...;
    
  @Override
  public Class<? extends DiscoveryStrategy> getDiscoveryStrategyType() {
    // The actual implementation type of our strategy
    return SomeRestServiceDiscoveryStrategy.class;
  }
    
  @Override
  public DiscoveryStrategy newDiscoveryStrategy(DiscoveryNode localNode,
                                                ILogger logger,
                                                Map<String, Comparable> properties) {
   
    return new SomeRestServiceDiscoveryStrategy(localNode, logger, properties);
  }
    
  @Override
  public Collection<PropertyDefinition> getConfigurationProperties() {
    return PROPERTY_DEFINITIONS;
  }
}

I expect the implementation to be straightforward enough to skip any further in-depth explanation; please bear with me 🙂

Still, to make it complete and offer the option to configure it declaratively, and not just pass pre-created instances to the configuration object, we need one more file. We need to register the DiscoveryStrategyFactory implementation for service lookup, which is as easy as putting the canonical class name into a file named com.hazelcast.spi.discovery.DiscoveryStrategyFactory under META-INF/services, just as with the standard Java ServiceLoader API.

The file’s content is simply:

com.hazelcast.example.SomeRestServiceDiscoveryStrategyFactory

Let’s Dis–wait for it–cover

We're almost there; this is the last step we need to take to have a fully working discovery mechanism. The last and final interface is the DiscoveryStrategy – who would've guessed. We're going to extend its abstract base class, AbstractDiscoveryStrategy, to be equipped for future changes to the SPI.

Before we can jump into the code, we need a bit more theory, since we're going to implement the full lifecycle: from startup (registration), through discovery of other members, to removing ourselves when the local member shuts down.

Looking back at the DiscoveryStrategyFactory's method newDiscoveryStrategy, we see that the first parameter is named localNode; you might already have wondered why. It represents the local, just-starting-up cluster member, and it includes the public / private addresses and all pre-configured member attributes. Sounds useful for registration? It is!

Besides the addresses, we also need our defined properties, or rather their configured values and, where none are set, their default values. So let's begin here:

String applicationScope =
    getOrDefault("discovery.rest", APPLICATION_SCOPE, "hazelcast-cluster");
String baseUrl =
    getOrDefault("discovery.rest", DISCOVERY_URL, "http://localhost:12345/");

What we see here is the method getOrDefault, and it looks pretty much self-explanatory; however, what is the first parameter? From the Javadoc we can see that it defines a prefix to be prepended when the property is passed as a system property (-Ddiscovery.rest.application.scope=...). It is simply used to separate similarly named properties of different plugins.

So far, so clear; let's move on. Our specific implementation also requires Retrofit which, as we remember, is our connector to the REST API. Since it's not a general requirement of the SPI, I'll just quickly show the code without further explanation; I think it's pretty self-explanatory anyway.

GsonConverterFactory converterFactory = GsonConverterFactory.create();
Retrofit retrofit = new Retrofit.Builder()
    .baseUrl(baseUrl)
    .addConverterFactory(converterFactory)
    .build();
SomeRestService someRestService = retrofit.create(SomeRestService.class);

That should be enough introduction to actually look at some more parts of our own implementation. As always, the source is available right here. Looking at the actual implementation we can find the configuration retrieval as well as the creation of the REST API client.

Based on the lifecycle the next step would be to register our member when it’s starting up. A couple of lines make this happen:

@Override
public void start() {
  Address address = localNode.getPrivateAddress();
  String host = address.getHost();
  int port = address.getPort();
  execute(() -> someRestService.register(applicationScope, host, port));
}

Alright, simple. We take the private address of our member, the one that other cluster members connect to, and extract the host and port. Equipped with those values and the previously retrieved applicationScope, we are ready to call the Retrofit client and register ourselves. That's it. No magic necessary to guess what the counterpart, the removal, looks like.

@Override
public void destroy() {
  Address address = localNode.getPrivateAddress();
  String host = address.getHost();
  int port = address.getPort();
  execute(() -> someRestService.unregister(applicationScope, host, port));
}
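
Both lifecycle methods delegate to a small execute helper which is not part of the SPI. A minimal sketch, assuming synchronous Retrofit execution, could look like this:

// synchronously executes the Retrofit call and rethrows
// IO failures as unchecked exceptions
private <T> T execute(Supplier<Call<T>> supplier) {
  try {
    return supplier.get().execute().body();
  } catch (IOException e) {
    throw new RuntimeException(e);
  }
}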

And now, finally, the most important part: the lookup. But as we already know by now, it cannot be complicated. A few more methods are necessary, but those are mostly mapping functions to map our endpoints to what the Discovery SPI expects to be returned from a discovery call.

@Override
public Iterable<DiscoveryNode> discoverNodes() {
  Service service = execute(() -> someRestService.services(applicationScope));
  List<Endpoint> endpoints = service.getEndpoints();
  return mapEndpoints(endpoints);
}

First of all, we call our Retrofit client to return a Service instance based on the configured applicationScope. That instance contains a list of all registered endpoints. This list contains at least ourselves, as start is called before other members are requested. Finally, we need to map the endpoints to DiscoveryNode instances.

To do this, we simply iterate over all endpoints and map each of them to a SimpleDiscoveryNode, a ready-to-use implementation of the aforementioned DiscoveryNode interface.

private Iterable<DiscoveryNode> mapEndpoints(List<Endpoint> endpoints) {
  List<DiscoveryNode> discoveryNodes = new ArrayList<>();
  for (Endpoint endpoint : endpoints) {
    discoveryNodes.add(new SimpleDiscoveryNode(mapEndpoint(endpoint)));
  }
  return discoveryNodes;
}

private Address mapEndpoint(Endpoint endpoint) {
  try {
    String host = endpoint.getHost();
    int port = endpoint.getPort();
    return new Address(host, port);
  } catch (UnknownHostException e) {
    throw new RuntimeException(e);
  }
}

And that’s it. We’ve created a full-blown Discovery SPI implementation in probably less than 30 mins. Only one more step remains.

Give it a shot

As the final step, we obviously need to understand how to configure and run the newly created discovery plugin, but first a quick note. At the moment, since the Discovery SPI is still in beta, it needs to be activated explicitly, either by using a system property (-Dhazelcast.discovery.enabled=true) or by defining the property inside the Hazelcast (programmatic or declarative) configuration.

Programmatic:

Config config = new XmlConfigBuilder().build();
config.setProperty("hazelcast.discovery.enabled", "true");

Declarative:

<hazelcast>
  <properties>
    <property name="hazelcast.discovery.enabled">true</property>
  </properties>
</hazelcast>

After we activated it, it’s time to configure a test run. Let’s begin with the more common case of declarative configuration of the discovery mechanism.

<hazelcast>
  <properties>
    <property name="hazelcast.discovery.enabled">true</property>
  </properties>
  <network>
    <join>
      <!-- deactivate multicast which is enabled by default -->
      <multicast enabled="false"/>
      <discovery-strategies>
        <discovery-strategy enabled="true"
            class="com.hazelcast.example.SomeRestServiceDiscoveryStrategyFactory">
          <properties>
            <property name="application.scope">hazelcast-test</property>
          </properties>
        </discovery-strategy>
      </discovery-strategies>
    </join>
  </network>
</hazelcast>

For Hazelcast users this should be easy to understand. As mentioned in the introduction, Hazelcast has multicast enabled by default, so we have to deactivate it in order to use the Discovery SPI. Afterwards, we configure our discovery strategy by its canonical class name and enable it. Why an explicit enabled flag? Because it is possible to keep multiple configurations (for different environments like test and production) in the same configuration file and use variables to activate one at a time, as shown below. Furthermore, we configure one of our two configuration properties defined earlier. As we remember, both are optional, so it is perfectly legal to configure both, just one, or none at all.
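
To make the multiple-configurations trick concrete: the declarative configuration supports ${...} variables, which can be resolved when building the configuration. A quick sketch, with a made-up variable name:

// e.g. <discovery-strategy enabled="${discovery.strategy.enabled}" ...>
Properties properties = new Properties();
properties.setProperty("discovery.strategy.enabled", "true");
Config config = new XmlConfigBuilder().setProperties(properties).build();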

And we're ready to run Hazelcast as we already know it. However, there's one more nice thing about the Discovery SPI we haven't mentioned yet: the Hazelcast client can use it as well. Just as a quick peek:

<hazelcast-client>
  <properties>
    <property name="hazelcast.discovery.enabled">true</property>
  </properties>
  <network>
    <discovery-strategies>
      <discovery-strategy enabled="true"
          class="com.hazelcast.example.SomeRestServiceDiscoveryStrategyFactory">
        <properties>
          <property name="application.scope">hazelcast-test</property>
        </properties>
      </discovery-strategy>
    </discovery-strategies>
  </network>
</hazelcast-client>

There's something to realize here: convenience wins 🙂

To close, we’ll have a quick look at how we can achieve the same using the programmatic configuration API.

Config config = new XmlConfigBuilder().build();
config.setProperty("hazelcast.discovery.enabled", "true");

JoinConfig joinConfig = config.getNetworkConfig().getJoin();
joinConfig.getMulticastConfig().setEnabled(false);

DiscoveryConfig discoveryConfig = joinConfig.getDiscoveryConfig();
DiscoveryStrategyFactory factory = new SomeRestServiceDiscoveryStrategyFactory();
DiscoveryStrategyConfig strategyConfig = new DiscoveryStrategyConfig(factory);
strategyConfig.addProperty("application.scope", "hazelcast-test");

discoveryConfig.addDiscoveryStrategyConfig(strategyConfig);
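
For completeness, the client side can be configured programmatically in much the same way. A sketch, assuming the client network configuration exposes the same DiscoveryConfig:

ClientConfig clientConfig = new ClientConfig();
clientConfig.setProperty("hazelcast.discovery.enabled", "true");

DiscoveryStrategyConfig clientStrategyConfig =
    new DiscoveryStrategyConfig(new SomeRestServiceDiscoveryStrategyFactory());
clientStrategyConfig.addProperty("application.scope", "hazelcast-test");

clientConfig.getNetworkConfig()
    .getDiscoveryConfig()
    .addDiscoveryStrategyConfig(clientStrategyConfig);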

And there we go, ready to run again by just passing the configuration right to the Hazelcast factory methods.
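
For instance:

HazelcastInstance member = Hazelcast.newHazelcastInstance(config);
HazelcastInstance client = HazelcastClient.newHazelcastClient(clientConfig);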

Closing Words

There are a few more bits left out, like the option to filter discovered members based on custom criteria with a NodeFilter, or the platform integrator API for users that integrate Hazelcast into their own framework or platform; however, there's no reason to bore everyone with the details 🙂 If you're interested in further details, you're welcome to read the Discovery SPI section of the Hazelcast documentation to find all the necessary information.

Given the above walk-through, everybody should be able to implement a custom discovery strategy now. The full source code is available on GitHub to follow up.