Simulator 0.7 released!

Today we released version 0.7 of the Hazelcast Simulator tool. It is our production simulator, used to test Hazelcast and Hazelcast-based applications in clustered environments.

Please read “Simulator 0.4 released” for a general introduction or have a look at the Hazelcast Simulator Documentation. You can download Hazelcast Simulator here.

Simulator Communication Protocol

The biggest change in this release is the new Simulator Communication Protocol. It makes each Simulator component (Coordinator, Agents, Workers) individually addressable and the communication between them language agnostic. This is a big milestone in the support for our Hazelcast clients in C#, C++, Python etc. We also created a Simulator Worker Implementation Guide which will be published soon.

The new SimulatorAddress for each component is also a prerequisite for script-based resilience testing. Once that upcoming task is done, we will be able to inject disturbances into an ongoing Simulator run to trigger migrations and split-brain handling.

An additional benefit is the bi-directional communication between all Simulator components. This allowed us to switch most internal systems from polling to pushing their data directly to the Coordinator, which reduced the latency of the failure detection and improved the continuous performance monitoring.
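To picture what "individually addressable" and "push instead of poll" can look like, here is a minimal Java sketch. It is not the Simulator's SimulatorAddress class: the names ComponentAddress and PushInbox, the index fields and the C_A1_W2 text format are all assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a component address; not the real SimulatorAddress. */
final class ComponentAddress {
    final int agentIndex;   // 0 means "not set", so the address is the Coordinator
    final int workerIndex;  // 0 means "not set", so the address is an Agent

    ComponentAddress(int agentIndex, int workerIndex) {
        this.agentIndex = agentIndex;
        this.workerIndex = workerIndex;
    }

    /** Address one level up: Worker -> Agent -> Coordinator. */
    ComponentAddress parent() {
        return new ComponentAddress(workerIndex == 0 ? 0 : agentIndex, 0);
    }

    @Override
    public String toString() {
        if (agentIndex == 0) {
            return "C";
        }
        if (workerIndex == 0) {
            return "C_A" + agentIndex;
        }
        return "C_A" + agentIndex + "_W" + workerIndex;
    }
}

/** Hypothetical Coordinator side: components push updates instead of being polled. */
final class PushInbox {
    private final List<String> received = new ArrayList<>();

    /** Called by an Agent or Worker as soon as it has something to report. */
    void push(ComponentAddress source, String message) {
        received.add(source + ": " + message);
    }

    List<String> received() {
        return received;
    }
}

final class AddressSketch {
    public static void main(String[] args) {
        ComponentAddress worker = new ComponentAddress(1, 2);
        PushInbox coordinator = new PushInbox();
        coordinator.push(worker, "failure detected");
        System.out.println(coordinator.received());
        System.out.println(worker.parent());
    }
}
```

Because every message carries the address of its sender, the receiving side knows where a report came from without having to ask each component in turn.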

Simulator Test Framework

We’ve cleaned up the Simulator Test Framework, which defines the API for a Simulator Test.

There are new annotations to inject common objects into your test class: @InjectTestContext, @InjectHazelcastInstance and @InjectProbe.

The @InjectProbe annotation is now mandatory to get a Probe injected. The old behavior was to inject a probe into every field of type Probe. The annotation also lets you override the Probe name (if it is not defined, the field name is used).

We merged all Probes into a single implementation, based on HdrHistogram. This removes the whole Probe configuration part from the TestSuite file, as well as the different Probe interfaces. Via the @InjectProbe annotation you can define whether a Probe should be used for the throughput calculation of the test. By default it is only used to record latency values.
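The injection rules described above (annotation required, field name as default Probe name, optional throughput flag) can be sketched with plain Java reflection. Everything in this block is a local stand-in so that it compiles on its own: the InjectProbe annotation and Probe class here are simplified copies, the attribute names name and useForThroughput are assumptions, and ProbeInjector is a hypothetical helper rather than Simulator code.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

/** Local stand-in for the annotation; the attribute names are assumptions. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface InjectProbe {
    String name() default "";                  // empty: fall back to the field name
    boolean useForThroughput() default false;  // default: latency recording only
}

/** Local stand-in for a Probe; it only remembers how it was configured. */
final class Probe {
    final String name;
    final boolean usedForThroughput;

    Probe(String name, boolean usedForThroughput) {
        this.name = name;
        this.usedForThroughput = usedForThroughput;
    }
}

/** Hypothetical injector: only annotated Probe fields receive an instance. */
final class ProbeInjector {
    static void inject(Object test) {
        for (Field field : test.getClass().getDeclaredFields()) {
            InjectProbe annotation = field.getAnnotation(InjectProbe.class);
            if (annotation == null || field.getType() != Probe.class) {
                continue; // unannotated fields are left untouched
            }
            String name = annotation.name().isEmpty() ? field.getName() : annotation.name();
            try {
                field.setAccessible(true);
                field.set(test, new Probe(name, annotation.useForThroughput()));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Could not inject " + field.getName(), e);
            }
        }
    }
}

final class ExampleTest {
    @InjectProbe(useForThroughput = true) Probe putProbe;  // named after the field
    @InjectProbe(name = "readLatency") Probe getProbe;     // name overridden
    Probe ignored;                                          // no annotation, stays null
}

final class ProbeInjectionSketch {
    public static void main(String[] args) {
        ExampleTest test = new ExampleTest();
        ProbeInjector.inject(test);
        System.out.println(test.putProbe.name + " " + test.getProbe.name + " " + test.ignored);
    }
}
```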

We also created more abstract IWorker classes for the @RunWithWorker annotation (which is the recommended way to write a Simulator Test).

Class name: AbstractMonotonicWorker
Abstract methods: timeStep()
Description: This is the simplest implementation, which supports just a single operation. Has a single, built-in Probe which automatically measures the execution time of the whole timeStep() method. Useful for simple Simulator Tests with a single operation and a fast timeStep() method.

Class name: AbstractMonotonicWorkerWithProbeControl
Abstract methods: timeStep(Probe probe)
Description: Supports a single operation like the AbstractMonotonicWorker. Has a single, built-in Probe as parameter for the timeStep() method, so you have to do the latency measurement on your own. Useful if you have a single operation and need more control over the measured code block, e.g. if you do expensive operations in your timeStep() method which you don’t want to capture.

Class name: AbstractWorker <O extends Enum>
Abstract methods: timeStep(O operation)
Description: This is the basic implementation for multiple operations. You have to pass an OperationSelectorBuilder to the constructor, which creates an OperationSelector for each IWorker instance. The timeStep() method gets a randomly selected operation as parameter, based on the probabilities defined in the OperationSelectorBuilder. You just have to implement a switch-case for your operation Enum within the timeStep() method. Useful for most Simulator Tests with multiple operations and a fast timeStep() method.

Class name: AbstractWorkerWithProbeControl <O extends Enum>
Abstract methods: timeStep(O operation, Probe probe)
Description: Supports multiple operations like the AbstractWorker. Has a single, built-in Probe as second parameter for the timeStep() method, like the AbstractMonotonicWorkerWithProbeControl. Useful if you have multiple operations and need more control over the measured code block, e.g. if you do expensive operations in your timeStep() method which you don’t want to capture.

Class name: AbstractWorkerWithMultipleProbes <O extends Enum>
Abstract methods: timeStep(O operation, Probe probe)
Description: Like the AbstractWorkerWithProbeControl, but has a separate Probe per operation. If you defined an operation Enum with PUT and GET, the TestContainer automatically generates a PutProbe and GetProbe. The Probe that belongs to the randomly selected operation is passed as second parameter of the timeStep() method. Useful if you have multiple operations and need a separate Probe for each operation.

Class name: AbstractAsyncWorker <O extends Enum, V>
Abstract methods: timeStep(O operation), handleResponse(V response), handleFailure(Throwable t)
Description: Like the AbstractWorker, but with support for asynchronous methods. Implements the Hazelcast interface ExecutionCallback. Useful if you have multiple operations for asynchronous map operations.

Class name: NoOperationWorker
Abstract methods: n/a
Description: Has no timeStep() method and will do nothing. Useful if you decide dynamically what your Simulator Worker should do at runtime and you need Workers which do nothing.
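The AbstractWorker pattern (a randomly selected operation, chosen by configured probabilities, handled in a switch-case inside timeStep()) can be sketched as follows. This is a self-contained illustration, not Simulator code: WeightedSelector is a hypothetical stand-in for the selector built by OperationSelectorBuilder, MapWorkerSketch does not extend the real AbstractWorker, and the 10% PUT / 90% GET split is an arbitrary example.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical stand-in for an operation selector; not the Simulator class. */
final class WeightedSelector<O extends Enum<O>> {
    private final List<O> operations = new ArrayList<>();
    private final List<Double> upperBounds = new ArrayList<>();
    private final Random random;
    private double total;

    WeightedSelector(Random random) {
        this.random = random;
    }

    /** Registers an operation with its probability; returns this for chaining. */
    WeightedSelector<O> add(O operation, double probability) {
        total += probability;
        operations.add(operation);
        upperBounds.add(total);
        return this;
    }

    /** Picks a registered operation at random, honoring the probabilities. */
    O select() {
        double r = random.nextDouble() * total;
        for (int i = 0; i < operations.size(); i++) {
            if (r < upperBounds.get(i)) {
                return operations.get(i);
            }
        }
        return operations.get(operations.size() - 1);
    }
}

final class MapWorkerSketch {
    enum Operation { PUT, GET }

    final Map<Operation, Integer> calls = new EnumMap<>(Operation.class);
    private final Map<Integer, String> map = new HashMap<>();
    private final Random random;
    private final WeightedSelector<Operation> selector;

    MapWorkerSketch(long seed) {
        random = new Random(seed);
        selector = new WeightedSelector<Operation>(random)
                .add(Operation.PUT, 0.1)
                .add(Operation.GET, 0.9);
    }

    /** Same shape as timeStep(O operation): one switch over the operation Enum. */
    void timeStep(Operation operation) {
        int key = random.nextInt(100);
        calls.merge(operation, 1, Integer::sum);
        switch (operation) {
            case PUT:
                map.put(key, "value-" + key);
                break;
            case GET:
                map.get(key);
                break;
            default:
                throw new UnsupportedOperationException("Unknown operation " + operation);
        }
    }

    /** Plays the role of the framework loop that calls timeStep() repeatedly. */
    void run(int iterations) {
        for (int i = 0; i < iterations; i++) {
            timeStep(selector.select());
        }
    }

    public static void main(String[] args) {
        MapWorkerSketch worker = new MapWorkerSketch(42L);
        worker.run(10_000);
        System.out.println(worker.calls);
    }
}
```

In the WithProbeControl variants the only difference for the test author would be the extra Probe parameter, so that the measured block can be narrowed to the operation itself.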

New Features

  • Added a new command line tool simulator-wizard to ease the installation and creation of working directories.
  • Added new property CLOUD_PROVIDER=local to run Simulator with a minimum setup on the local machine.
  • Added new property AGENT_PORT to configure the Agent port.
  • Added new properties HAZELCAST_PORT and HAZELCAST_PORT_RANGE to configure the Hazelcast cluster ports.
  • Added command line parameters --targetType and --targetCount to control the load generation.
  • Added command line parameter --licenseKey to set an Enterprise License key for a Simulator run.
  • Added configuration option to create Workers with different Hazelcast versions and configuration (via cluster.xml file).
  • Added KeyLocality.SHARED which generates random keys from the same range on all Workers.
  • Added Streamer implementation for ICache.
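As a rough illustration of the new properties, the following Java sketch parses an example configuration with java.util.Properties. Only the property names and CLOUD_PROVIDER=local come from the list above. The port values are placeholders, HAZELCAST_PORT_RANGE is left out because its value format is not described here, and the class SimulatorPropertiesSketch with its helper methods is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class SimulatorPropertiesSketch {
    // Example content of a properties file. The port numbers are placeholders,
    // not documented defaults.
    static final String EXAMPLE =
            "CLOUD_PROVIDER=local\n"
          + "AGENT_PORT=9000\n"
          + "HAZELCAST_PORT=5701\n";

    /** Parses properties from text, as they would be read from a file. */
    static Properties parse(String content) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return properties;
    }

    /** True if the run should use the minimal setup on the local machine. */
    static boolean isLocal(Properties properties) {
        return "local".equals(properties.getProperty("CLOUD_PROVIDER"));
    }

    /** Reads a port property, falling back to the given value if it is absent. */
    static int port(Properties properties, String key, int fallback) {
        String value = properties.getProperty(key);
        return value == null ? fallback : Integer.parseInt(value.trim());
    }

    public static void main(String[] args) {
        Properties properties = parse(EXAMPLE);
        System.out.println("local=" + isLocal(properties)
                + " agentPort=" + port(properties, "AGENT_PORT", -1)
                + " hazelcastPort=" + port(properties, "HAZELCAST_PORT", -1));
    }
}
```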

Test Changes

  • Resolved a lot of SonarQube issues in test classes.
  • Adapted test classes to changes in the Simulator Test Framework.
  • Optimized some tests to scale better with high number of Workers.
  • Added ExtractorMapTest to test queries with attribute extractors.
  • Added MapPutAllTest and MapPutAllOnTheFlyTest to test the performance of IMap.putAll().
  • Added NetworkTest to test the performance of the Hazelcast IO system.
  • Added PartitionServiceMBeanTest to test method calling via PartitionServiceMBean.
  • Added LongTestPhasesTest to test timeout detection of WorkerJvmFailureMonitor.
  • Fixed MapTimeToLiveTest which instantiated a wrong class if used with local TestRunner.
  • Created common base class for DomainObject classes to reduce code duplication.
  • Removed HTTP tests from the default test module, because they needed a lot of dependencies and didn’t work out of the box.

Improvements

  • Reduced the technical debt of the project.
  • Simulator is now compliant with the company code coverage requirements.
  • Simulator CheckStyle configuration is aligned with Hazelcast main project.
  • Added logging of which Worker is the “first worker”, which is used for the global test phases.
  • Cleanup of the user-lib directory to avoid clashes with previous runs in static setups.
  • Moved upload of Hazelcast JARs from Provisioner to Coordinator to support different Hazelcast versions on each Worker.
  • Symlinks are now resolved in file upload methods.
  • Allowed the timeStep() methods of the abstract IWorker implementations to throw checked exceptions.
  • Simplified the Probes to a single implementation based on HdrHistogram.
  • Added interval latency snapshots.
  • Moved the throughput calculation to the Workers, so that network latency is no longer part of the calculation.
  • Added latency values to continuous performance monitoring.
  • Switched continuous performance logging to milliseconds for latency values over one second. Removed the (performance not available) logging, since it makes the output of mixed TestSuites harder to read.
  • Agents are now started and stopped by the Coordinator, so there are no more failures if Agents were killed or are still running from another Simulator version.
  • Pulled out harakiri-monitor as standalone command line tool which is used by Provisioner and Coordinator.
  • Made use of ThreadSpawner to reduce internal code duplication and to parallelize some code paths.
  • Created CloudProviderUtils to have a single location for CLOUD_PROVIDER constants.
  • Implemented and improved the fail-fast behavior of TestCaseRunner, so a TestSuite is aborted properly on a critical failure.
  • Added fetching of tags to GitSupport so new versions can automatically be built.
  • Replaced SLF4J and Logback with a log4j bridge, to unify the log configuration (and enable easy logging of Netty on Agent and Worker).
  • Removed property logFrequency from AbstractWorker and TestContainer (we have the built-in probes to monitor progress).
  • Removed the --list command from Provisioner, which does the same as a simple cat agents.txt.
  • Added -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints to the JVM arguments for JFR profiler.
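Moving the throughput calculation to the Workers means a Worker can derive operations per second from its own counter and its own clock, so the time a report spends on the network does not distort the result. A minimal sketch of that idea, assuming a hypothetical ThroughputTracker class (the names and the snapshot approach are illustrative, not the Simulator implementation):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical Worker-side tracker; it only uses local counters and timestamps. */
final class ThroughputTracker {
    private final AtomicLong operations = new AtomicLong();
    private long lastCount;
    private long lastNanos;

    ThroughputTracker(long startNanos) {
        this.lastNanos = startNanos;
    }

    /** Called by the test threads after each finished operation. */
    void operationDone() {
        operations.incrementAndGet();
    }

    /**
     * Returns the operations per second since the previous snapshot.
     * The timestamp is taken on the Worker, so the result is already final
     * when it is sent to the Coordinator.
     */
    synchronized double snapshot(long nowNanos) {
        long count = operations.get();
        long elapsedNanos = nowNanos - lastNanos;
        double opsPerSecond = elapsedNanos <= 0
                ? 0.0
                : (count - lastCount) * (double) TimeUnit.SECONDS.toNanos(1) / elapsedNanos;
        lastCount = count;
        lastNanos = nowNanos;
        return opsPerSecond;
    }
}

final class ThroughputSketch {
    public static void main(String[] args) {
        ThroughputTracker tracker = new ThroughputTracker(System.nanoTime());
        for (int i = 0; i < 1_000; i++) {
            tracker.operationDone();
        }
        System.out.println(tracker.snapshot(System.nanoTime()) + " ops/s");
    }
}
```

Passing the timestamp in as a parameter keeps the calculation deterministic, which makes it easy to check with fixed values.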

Fixes

  • Fixed multiple issues with the live performance monitoring.
  • Fixed retrieval of Enterprise JARs when --enterpriseEnabled is used with Maven version specification.
  • Fixed mounting of ephemeral devices on EC2 instances with default AMIs.
  • Fixed GCE setup related files and settings.
  • Fixed an incompatibility between Hazelcast Simulator and Hazelcast version 3.2.
  • Removed the need to parse the hazelcast.xml file on the Coordinator to retrieve the Hazelcast port, since this failed if the file wasn’t compatible with the out-of-the-box version of Hazelcast in Simulator.
  • Removed the need for HostAddressPicker which failed in complex network setups.

Code Quality

We increased code coverage by 37.7% to 94.5% and added 681 new tests. We resolved all 340 SonarQube issues, reduced the technical debt by 38 days and reduced the code duplication by 0.2%. We aligned some CheckStyle rules between project XML and SonarQube.

[Screenshot: Simulator 0.7 code quality]

[Screenshot: Simulator 0.7 issues]

The figures in the screenshots are a bit lower, since the 0.6 tag was pushed some days after the release, when the code quality had already improved again. You can compare the exact numbers by having a look at the Simulator 0.6 Release. The drops in code coverage at the end were caused by a test configuration error.

Try Simulator for yourself, get started today!
