Simulator 0.7 released!
Today we released version 0.7 of the Hazelcast Simulator tool. It is our production simulator used to test Hazelcast and Hazelcast-based applications in clustered environments.
Please read “Simulator 0.4 released” for a general introduction or have a look at the Hazelcast Simulator Documentation. You can download Hazelcast Simulator here.
Simulator Communication Protocol
The biggest change in this release is the new Simulator Communication Protocol. It makes each Simulator component (Coordinator, Agents, Workers) individually addressable and the communication between them language agnostic. This is a big milestone in the support for our Hazelcast clients in C#, C++, Python etc. We also created a Simulator Worker Implementation Guide which will be published soon.
The new SimulatorAddress for each component is also a requirement for implementing script-based resilience testing. With this upcoming feature we will be able to inject disturbances into an ongoing Simulator run to trigger migrations and split-brain handling.
An additional benefit is the bi-directional communication between all Simulator components. This allowed us to switch most internal systems to push their data directly to the Coordinator instead of being polled, which reduced the latency of the failure detection and improved the continuous performance monitoring.
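To make the idea of individually addressable components more concrete, here is a minimal sketch of a hierarchical address. It is not the real SimulatorAddress class: the name ComponentAddress, the index fields, the string format and the parent() helper are all assumptions made for illustration.

```java
import java.util.Objects;

// Hypothetical illustration only: this is NOT the real SimulatorAddress class.
final class ComponentAddress {

    // In this sketch an index of 0 means "not set".
    private final int agentIndex;
    private final int workerIndex;

    private ComponentAddress(int agentIndex, int workerIndex) {
        this.agentIndex = agentIndex;
        this.workerIndex = workerIndex;
    }

    static ComponentAddress coordinator() {
        return new ComponentAddress(0, 0);
    }

    static ComponentAddress agent(int agentIndex) {
        if (agentIndex < 1) {
            throw new IllegalArgumentException("agentIndex must be positive");
        }
        return new ComponentAddress(agentIndex, 0);
    }

    static ComponentAddress worker(int agentIndex, int workerIndex) {
        if (agentIndex < 1 || workerIndex < 1) {
            throw new IllegalArgumentException("indices must be positive");
        }
        return new ComponentAddress(agentIndex, workerIndex);
    }

    // Returns the enclosing component in this assumed hierarchy:
    // Worker -> Agent -> Coordinator (the Coordinator is its own parent).
    ComponentAddress parent() {
        return workerIndex > 0 ? agent(agentIndex) : coordinator();
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder("C");
        if (agentIndex > 0) {
            sb.append("_A").append(agentIndex);
        }
        if (workerIndex > 0) {
            sb.append("_W").append(workerIndex);
        }
        return sb.toString();
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof ComponentAddress)) {
            return false;
        }
        ComponentAddress that = (ComponentAddress) other;
        return agentIndex == that.agentIndex && workerIndex == that.workerIndex;
    }

    @Override
    public int hashCode() {
        return Objects.hash(agentIndex, workerIndex);
    }

    public static void main(String[] args) {
        ComponentAddress worker = worker(2, 1);
        // prints "C_A2_W1 is reached via C_A2"
        System.out.println(worker + " is reached via " + worker.parent());
    }
}
```

Because such an address is plain data that can be rendered as a string, a client in any language could in principle produce and parse it, which is the property the protocol is aiming for.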
Simulator Test Framework
We’ve cleaned up the Simulator Test Framework, which defines the API for a Simulator Test.
There are new annotations to inject common objects into your test class: @InjectTestContext, @InjectHazelcastInstance and @InjectProbe.
The @InjectProbe annotation is now mandatory to get a Probe injected. The old behavior was to inject a probe into every field of the type Probe. The annotation also provides a field to override the Probe name (if it is not defined, the field name will be used).
We merged all Probes into a single implementation based on HdrHistogram. This removes the Probe configuration from the TestSuite file as well as the different Probe interfaces. Via the @InjectProbe annotation you can define whether a Probe should be used for the throughput calculation of the test. By default it is only used to record latency values.
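The injection rules described above can be sketched with a small, self-contained stand-in. The annotation and the Probe class below are simplified placeholders, not the Simulator classes; in particular the attribute names name and useForThroughput are assumptions chosen to mirror the description.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

class ProbeInjectionSketch {

    // Stand-in for the Simulator annotation; attribute names are assumptions.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface InjectProbe {
        String name() default "";
        boolean useForThroughput() default false;
    }

    // Stand-in for the Simulator Probe type.
    static final class Probe {
        final String name;
        final boolean usedForThroughput;

        Probe(String name, boolean usedForThroughput) {
            this.name = name;
            this.usedForThroughput = usedForThroughput;
        }
    }

    // Example test class with three Probe fields.
    static class ExampleTest {
        @InjectProbe
        Probe getLatency;           // probe name falls back to the field name

        @InjectProbe(name = "putProbe", useForThroughput = true)
        Probe put;                  // explicit name, counted for throughput

        Probe notInjected;          // no annotation, so it stays null
    }

    // Injects a Probe into every annotated Probe field and returns them by name.
    static Map<String, Probe> inject(Object test) {
        Map<String, Probe> probes = new LinkedHashMap<>();
        for (Field field : test.getClass().getDeclaredFields()) {
            InjectProbe annotation = field.getAnnotation(InjectProbe.class);
            if (annotation == null || field.getType() != Probe.class) {
                continue;
            }
            String name = annotation.name().isEmpty() ? field.getName() : annotation.name();
            Probe probe = new Probe(name, annotation.useForThroughput());
            try {
                field.setAccessible(true);
                field.set(test, probe);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Could not inject " + name, e);
            }
            probes.put(name, probe);
        }
        return probes;
    }

    public static void main(String[] args) {
        ExampleTest test = new ExampleTest();
        for (Probe probe : inject(test).values()) {
            System.out.println(probe.name + " usedForThroughput=" + probe.usedForThroughput);
        }
        System.out.println("notInjected is null: " + (test.notInjected == null));
    }
}
```

The unannotated field stays null, which corresponds to the change from the old behavior where every field of the type Probe was populated.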
We also created more abstract IWorker classes for the @RunWithWorker annotation (which is the recommended way to write a Simulator Test).
Class name | Abstract methods | Description |
---|---|---|
AbstractMonotonicWorker | timeStep() | This is the simplest implementation which supports just a single operation. Has a single, built-in Probe which automatically measures the execution time of the whole timeStep() method. Useful for simple Simulator Tests with a single operation and a fast |
AbstractMonotonicWorkerWithProbeControl | timeStep(Probe probe) | Supports a single operation like the AbstractMonotonicWorker. Has a single, built-in Probe as parameter for the timeStep() method, so you have to do the latency measurement on your own. Useful if you have a single operation and need more control over the measured code block, e.g. if you do expensive operations in your |
AbstractWorker<O extends Enum> | timeStep(O operation) | This is the basic implementation for multiple operations. You have to pass an OperationSelectorBuilder to the constructor, which creates an OperationSelector for each IWorker instance. The timeStep() method gets a randomly selected operation as parameter, based on the defined probabilities of the OperationSelectorBuilder. You just have to implement a switch-case for your operation Enum within the timeStep() method. Useful for most Simulator Tests with multiple operations and a fast |
AbstractWorkerWithProbeControl<O extends Enum> | timeStep(O operation, Probe probe) | Supports multiple operations like the AbstractWorker. Has a single, built-in Probe as second parameter for the timeStep() method, like the AbstractMonotonicWorkerWithProbeControl. Useful if you have multiple operations and need more control over the measured code block, e.g. if you do expensive operations in your |
AbstractWorkerWithMultipleProbes<O extends Enum> | timeStep(O operation, Probe probe) | Like the AbstractWorkerWithProbeControl, but has a separate Probe per operation. If you defined an operation Enum with PUT and GET, the TestContainer automatically generates a PutProbe and GetProbe. The Probe belonging to the randomly selected operation is given as second parameter of the timeStep() method. Useful if you have multiple operations and need a separate |
AbstractAsyncWorker<O extends Enum, V> | timeStep(O operation), handleResponse(V response) | Like the AbstractWorker, but with support for asynchronous methods. Implements the Hazelcast interface ExecutionCallback. Useful if you have multiple operations for asynchronous map operations. |
NoOperationWorker | n/a | Has no timeStep() method and will do nothing. Useful if you decide dynamically what your Simulator Worker should do at runtime and you need Workers which do nothing. |
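The multi-operation pattern from the table (an operation picked at random according to configured probabilities, then handled by a switch-case in timeStep()) can be sketched without the Simulator classes. WeightedSelector and ExampleWorker below are hypothetical stand-ins for OperationSelector and an AbstractWorker subclass; the 10% PUT and 90% GET split is just an example value.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class OperationSelectorSketch {

    enum Operation { PUT, GET }

    // Hypothetical stand-in for an operation selector with fixed probabilities.
    static final class WeightedSelector<O extends Enum<O>> {
        private final List<O> operations = new ArrayList<>();
        private final List<Double> cumulative = new ArrayList<>();
        private final Random random;

        WeightedSelector(Map<O, Double> probabilities, Random random) {
            double sum = 0;
            for (Map.Entry<O, Double> entry : probabilities.entrySet()) {
                sum += entry.getValue();
                operations.add(entry.getKey());
                cumulative.add(sum);
            }
            if (operations.isEmpty() || Math.abs(sum - 1.0) > 1e-9) {
                throw new IllegalArgumentException("probabilities must sum to 1");
            }
            this.random = random;
        }

        // Picks the first operation whose cumulative probability exceeds the draw.
        O select() {
            double draw = random.nextDouble();
            for (int i = 0; i < operations.size(); i++) {
                if (draw < cumulative.get(i)) {
                    return operations.get(i);
                }
            }
            return operations.get(operations.size() - 1);
        }
    }

    // Stand-in for a worker: one switch-case over the operation Enum.
    static final class ExampleWorker {
        final Map<Operation, Integer> calls = new EnumMap<>(Operation.class);
        private final Map<String, String> map = new HashMap<>();

        void timeStep(Operation operation) {
            calls.merge(operation, 1, Integer::sum);
            switch (operation) {
                case PUT:
                    map.put("key", "value");
                    break;
                case GET:
                    map.get("key");
                    break;
                default:
                    throw new UnsupportedOperationException("Unknown operation " + operation);
            }
        }
    }

    // Runs the worker loop and returns how often each operation was executed.
    static Map<Operation, Integer> run(int iterations, long seed) {
        Map<Operation, Double> probabilities = new EnumMap<>(Operation.class);
        probabilities.put(Operation.PUT, 0.1);
        probabilities.put(Operation.GET, 0.9);
        WeightedSelector<Operation> selector = new WeightedSelector<>(probabilities, new Random(seed));
        ExampleWorker worker = new ExampleWorker();
        for (int i = 0; i < iterations; i++) {
            worker.timeStep(selector.select());
        }
        return worker.calls;
    }

    public static void main(String[] args) {
        System.out.println(run(10_000, 42L));
    }
}
```

In the real framework the loop, the selector and the latency measurement are provided by the abstract worker classes, so a test would only contain the body of timeStep().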
New Features
- Added a new command line tool simulator-wizard to ease the installation and creation of working directories.
- Added new property CLOUD_PROVIDER=local to run Simulator with a minimum setup on the local machine.
- Added new property AGENT_PORT to configure the Agent port.
- Added new properties HAZELCAST_PORT and HAZELCAST_PORT_RANGE to configure the Hazelcast cluster ports.
- Added command line parameters --targetType and --targetCount to control the load generation.
- Added command line parameter --licenseKey to set an Enterprise License key for a Simulator run.
- Added configuration option to create Workers with different Hazelcast versions and configuration (via cluster.xml file).
- Added KeyLocality.SHARED which generates random keys from the same range on all Workers.
- Added Streamer implementation for ICache.
Test Changes
- Resolved a lot of SonarQube issues in test classes.
- Adapted test classes to changes in the Simulator Test Framework.
- Optimized some tests to scale better with a high number of Workers.
- Added ExtractorMapTest to test queries with attribute extractors.
- Added MapPutAllTest and MapPutAllOnTheFlyTest to test the performance of IMap.putAll().
- Added NetworkTest to test the performance of the Hazelcast IO system.
- Added PartitionServiceMBeanTest to test method calling via PartitionServiceMBean.
- Added LongTestPhasesTest to test timeout detection of WorkerJvmFailureMonitor.
- Fixed MapTimeToLiveTest which instantiated a wrong class if used with local TestRunner.
- Created common base class for DomainObject classes to reduce code duplication.
- Removed HTTP tests from the default test module, because they needed a lot of dependencies and didn’t work out of the box.
Improvements
- Reduced the technical debt of the project.
- Simulator is now compliant with the company code coverage requirements.
- Simulator CheckStyle configuration is aligned with Hazelcast main project.
- Added logging which worker is the “first worker”, which is used for the global test phases.
- Cleanup of the user-lib directory to avoid clashes with previous runs in static setups.
- Moved upload of Hazelcast JARs from Provisioner to Coordinator to support different Hazelcast versions on each Worker.
- Symlinks are now resolved in file upload methods.
- Allowed the timeStep() methods of the abstract IWorker implementations to throw checked exceptions.
- Simplified the Probes to a single implementation based on HdrHistogram.
. - Added interval latency snapshots.
- Moved the throughput calculation to the Workers, so network latency is no longer part of the calculation.
- Added latency values to continuous performance monitoring.
- Switched continuous performance logging to milliseconds for latency values over one second. Removed the (performance not available) logging, since it makes the output of mixed TestSuites harder to read.
- Agents are started and stopped by the Coordinator, so there are no more failures if Agents got killed or are still running from another Simulator version.
- Pulled out harakiri-monitor as standalone command line tool which is used by Provisioner and Coordinator.
- Made use of ThreadSpawner to reduce internal code duplication and to parallelize some code paths.
- Created CloudProviderUtils to have a single location for CLOUD_PROVIDER constants.
- Implemented and improved the fail-fast behavior of TestCaseRunner, so a TestSuite is aborted properly on a critical failure.
- Added fetching of tags to GitSupport so new versions can automatically be built.
- Replaced SLF4J and Logback with log4j bridge, to unify log configuration (and enable easy logging of Netty on Agent and Worker).
- Removed property logFrequency from AbstractWorker and TestContainer (we have the built-in probes to monitor progress).
- Removed the --list command from Provisioner, which does the same as a simple cat agents.txt.
- Added -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints to the JVM arguments for JFR profiler.
Fixes
- Fixed multiple issues with the live performance monitoring.
- Fixed retrieval of Enterprise JARs when --enterpriseEnabled is used with Maven version specification.
- Fixed mounting of ephemeral devices on EC2 instances with default AMIs.
- Fixed GCE setup related files and settings.
- Fixed an incompatibility between Hazelcast Simulator and Hazelcast version 3.2.
- Removed the need to parse the hazelcast.xml file on the Coordinator to retrieve the Hazelcast port, since this failed if the file wasn’t compatible with the out-of-the-box version of Hazelcast in Simulator.
- Removed the need for HostAddressPicker which failed in complex network setups.
Code Quality
We increased code coverage by 37.7% to 94.5% and added 681 new tests. We resolved all 340 SonarQube issues, reduced the technical debt by 38 days and lowered the code duplication by 0.2%. We aligned some CheckStyle rules between project XML and SonarQube.
The figures in the screenshots are a bit lower, since the 0.6 tag was pushed some days after the release when the code quality was already increased again. You can compare the exact numbers by having a look at the Simulator 0.6 Release. The drops in code coverage in the end were caused by a test configuration error.
Try Simulator for yourself, get started today!