Hazelcast and MongoDB

In this article, I will implement a sample (getting-started) project that uses MongoDB as the persistence layer for a Hazelcast distributed cache.

Hazelcast has a flexible persistence layer: you only need to implement one interface (MapStore) to store your in-memory grid in your preferred database. As of version 2.1, Hazelcast supports MongoDB persistence more smoothly through the Spring Data MongoDB library. Let’s implement a simple project step by step to illustrate this feature. Our project will have a single model class, and we will see that it is persisted to MongoDB when we put it into a Hazelcast distributed map.
1- Project Set-Up
I will use Maven. Here are the dependencies:
The dependencies are libraries for projects Spring, Spring-MongoDB, 
Hazelcast uses the Spring Data project to connect to MongoDB and to map objects to it.

2- MongoDB Set-Up
Install and run MongoDB on your local machine. One of the things that makes MongoDB attractive is that its quick start really is quick.
You can follow this guide:
3- Model
A simple POJO to store basic information about users. The only thing you need to take care of is that it must be Serializable.
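As a rough illustration, such a model class could look like the sketch below. The class name and the name and age fields are taken from the MongoDB record shown later in this article (where the class lives in the com.hazelmongo package); the constructors, accessors and the package-private visibility are assumptions made to keep the sketch self-contained.

```java
import java.io.Serializable;

// Minimal sketch of the model class. The name/age fields follow the MongoDB
// record shown in this article; everything else here is an assumption.
class User implements Serializable {
    private static final long serialVersionUID = 1L;

    private String name;
    private int age;

    // No-argument constructor, commonly expected by object-mapping frameworks.
    User() {
    }

    User(String name, int age) {
        this.name = name;
        this.age = age;
    }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }
}
```

Implementing Serializable is the part that matters here; the rest is ordinary JavaBean boilerplate.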
4- Configuration
As we use Spring, all configuration is bundled in the Spring configuration XML. I named the file beans.xml.
5- Run and Test
Now we can test the Mongo-Hazelcast integration. We will get the user map from the Spring context and put a new User object into it. We do not add any code related to Mongo or the database layer; the object should be saved to MongoDB automatically. There is also no Hazelcast code in this class. It looks as if it just puts an object into a map, but in fact the object is put into the distributed data grid and also persisted to MongoDB. The code is this clean thanks to Spring and Hazelcast’s standard Map implementation.
Here is the main class for that:
And let’s see if it is in Mongo:

MongoDB shell version: 2.0.2
connecting to: test
> db.user.find()
{ "_id" : "id-134", "_class" : "com.hazelmongo.User", "name" : "Enes", "age" : 29 }
As you can see, Mongo generates two fields in addition to the ones defined in the POJO. The _id field is assigned from the key you used when putting the object into the map, and _class is used to map the record to the corresponding Java class.

This sample illustrates the default usage of MongoDB with Hazelcast. You can override the default behaviour and the object mapping (by annotating the POJO) thanks to the Spring Data project. Have a look here for further details.

Gartner Selected Hazelcast as a “Cool Vendor”

Each year, Gartner, a leading industry analyst firm, selects the cool vendors in key technology areas and creates a report evaluating the innovative vendors. We are honored to be selected as a “Cool Vendor” by Gartner in Application and Integration Platforms for the year 2012. This confirms what we have been hearing from the […]

Distribute with Hazelcast, Persist into HBase

In this article I will implement a solution for a Big Data scenario.
I will use HBase as the persistence layer and Hazelcast as the distributed cache.
The resulting project will be a “Getting Started Sample” for those who want to use HBase as persistent storage for their Hazelcast application.

The Scenario
Suppose you have (or hope to have :) ) “User” data with billions of records. -> Big Data
People will reach the data from your web application; they will query it, search it… -> Real-time Access
Some records will be accessed more frequently -> Cache them in memory, serve them faster.
Columns can be added or removed; there is no strict schema -> Sparse data
Given these main requirements, the “NoSQL + Distributed Cache” solution fits our scenario.
I will persist the user data to HBase:
A NoSQL key-value datastore based on Hadoop technology and specialized for Big Data requirements.
It is modeled after Google’s Bigtable and used by Yahoo and Facebook.
Facebook preferred HBase over Cassandra for its messaging system.
To learn more 
I will cache and distribute the data with Hazelcast.

HBase Setup
HBase is intended to be used in a cluster, but it has a standalone mode that you can use for development purposes.
For HBase setup follow:
If you use Ubuntu, you will encounter problems.
Although Windows is not recommended for production, you can still try HBase on Windows.

Hazelcast Setup
Hazelcast is dead simple to use. Just download it and add hazelcast.jar to your classpath.
If you are new to Hazelcast, have a look at:

Project Setup
Create a Maven Java project with the dependencies:
Create a User POJO:
Create the user table in HBase:
Start HBase with:

HBASE_DIR> ./bin/start-hbase.sh

At this point it is a good idea to check the logs to be sure that HBase is installed and has started properly.
Then open the HBase shell with:

HBASE_DIR> ./bin/hbase shell

Create the user table

hbase(main):008:0> create 'user', 'cf_basic', 'cf_text'

Here I should say more about ‘cf_basic’ and ‘cf_text’. These are column families.
The columns of a column family are stored together on disk with the same storage specifications.
For example, if you want some type of data (e.g. images) to be compressed, put it in the same column family so that you can define the same storage rule for all of it.
Here we have two column families: ‘cf_basic’ is for simple types (numbers, strings) and ‘cf_text’ is for long text columns.
Notice that we have not defined anything about the schema, column types, etc.
As you may recall from the HBase intro video, Todd uses the term “datastore” instead of “database” when describing HBase.
HBase (and other key-value stores) is more like a persisted HashMap than a database.
You gain scalability but lose complex queries.

Create HBaseMapStore
This is the class that Hazelcast calls on each map operation.
And a singleton service for getting the HBase table.
On map.get, Hazelcast will look in HBase if it cannot find the key in memory. Similarly, when you put an element into the map, Hazelcast will persist it to HBase.
Why haven’t we implemented loadAll? The loadAll and loadAllKeys methods are for initially filling the Hazelcast map from the database. As we expect millions of records, it is not feasible to load the whole database into memory, so we left them empty.
Unfortunately, HTable is not thread-safe, so you have to handle concurrency yourself.
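The read-through and write-through behaviour described above can be sketched in plain Java, without Hazelcast or HBase on the classpath. Everything in this sketch is hypothetical: SimpleStore is a cut-down stand-in for a map store (the real interface has more methods, such as the loadAll and loadAllKeys mentioned above), InMemoryStore plays the role of the HBase table, and StoreBackedMap plays the role of the Hazelcast map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a map store; NOT the Hazelcast MapStore interface.
interface SimpleStore<K, V> {
    V load(K key);
    void store(K key, V value);
    void delete(K key);
}

// Plays the role of the database (HBase in this article) for the sketch.
class InMemoryStore<K, V> implements SimpleStore<K, V> {
    final Map<K, V> rows = new HashMap<>();
    int loadCalls = 0; // counts how often the "database" is read

    public V load(K key) { loadCalls++; return rows.get(key); }
    public void store(K key, V value) { rows.put(key, value); }
    public void delete(K key) { rows.remove(key); }
}

// A map that writes through to the store and reads through on a miss.
class StoreBackedMap<K, V> {
    private final Map<K, V> memory = new HashMap<>();
    private final SimpleStore<K, V> store;

    StoreBackedMap(SimpleStore<K, V> store) { this.store = store; }

    void put(K key, V value) {
        store.store(key, value); // write-through: persist synchronously
        memory.put(key, value);
    }

    V get(K key) {
        V value = memory.get(key);
        if (value == null) {
            value = store.load(key); // not in memory: ask the store
            if (value != null) {
                memory.put(key, value);
            }
        }
        return value;
    }

    void remove(K key) {
        store.delete(key);
        memory.remove(key);
    }

    // Drops the entry from memory only, as an eviction would.
    void evict(K key) { memory.remove(key); }
}
```

After an evict, the next get for the same key goes back to the store, which is the situation the map.get description above refers to.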

Configure Hazelcast
Here is the hazelcast.xml that we put on the classpath.
The first difference from the default one is that I have added a map-store declaration to the map config part.
Secondly, I have enabled eviction on the maps. Enabling eviction lets you use Hazelcast as a distributed cache, because Hazelcast then evicts (removes) expired entries. To enable eviction, set eviction-policy to LRU (or LFU) and set max-size. For more information about Hazelcast eviction see: 
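To make the idea of LRU eviction with a maximum size concrete, here is a small single-JVM analogy built on java.util.LinkedHashMap. It is only an illustration of the policy: the LruCache class is hypothetical, and Hazelcast’s own distributed eviction is configured in hazelcast.xml and is not implemented this way.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Local LRU cache built on LinkedHashMap's access order; an analogy only.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxSize;

    LruCache(int maxSize) {
        super(16, 0.75f, true); // true = keep entries in access order
        this.maxSize = maxSize;
    }

    // Called by LinkedHashMap after each insertion; returning true removes
    // the least recently used entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxSize;
    }
}
```

With a maximum size of 2, putting a third entry removes whichever of the first two was touched least recently.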

Run The Code
Now let’s test it. 
And see the records in the database:

hbase(main):055:0> get 'user', 'u-6'
COLUMN                CELL                                                      
 cf_basic:age         timestamp=1334320415281, value=\x00\x00\x00\x1D           
 cf_basic:location    timestamp=1334320415281, value=Istanbul                   
 cf_basic:name        timestamp=1334320415281, value=Mehmet Dogan               
 cf_text:details      timestamp=1334320415281, value=software developer …..   
4 row(s) in 0.0150 seconds

Write-Through and Write-Behind
The default configuration of map-store is write-through: records are persisted to the datastore synchronously.
If you set write-delay-seconds in hazelcast.xml to a positive value, the behaviour becomes write-behind.
The added entries are then persisted after n seconds, where n is the configured delay.
The deleteAll and storeAll methods implemented in the map store are used in write-behind mode.
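The difference from write-through can be sketched as follows. This is a simplified, hypothetical model rather than Hazelcast code: WriteBehindBuffer is an invented class, flush() stands in for the timer that fires after the write delay, and collapsing several writes to the same key into one is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical write-behind buffer: put() only touches memory and a dirty set;
// flush() later hands the whole batch to the store in a single call.
class WriteBehindBuffer<K, V> {
    private final Map<K, V> memory = new LinkedHashMap<>();
    private final Map<K, V> dirty = new LinkedHashMap<>();

    // Each element records one batch that reached the "database".
    final List<Map<K, V>> storedBatches = new ArrayList<>();

    void put(K key, V value) {
        memory.put(key, value);
        dirty.put(key, value); // a later write to the same key replaces the earlier one
    }

    V get(K key) { return memory.get(key); }

    // Stands in for the timer that fires after the configured delay.
    void flush() {
        if (dirty.isEmpty()) {
            return;
        }
        storeAll(new LinkedHashMap<>(dirty));
        dirty.clear();
    }

    // Batch write, analogous in spirit to a map store's storeAll.
    private void storeAll(Map<K, V> batch) {
        storedBatches.add(batch);
    }
}
```

Reads are served from memory immediately, while the store sees nothing until the flush, which is why the batch-oriented methods come into play in this mode.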

POJO Mapping
If you do not want to map your objects manually, you can use Kundera.
It is a JPA-compliant ORM for Big Data.

Source Code
You can reach the example project code:

Getting started with Spring and Hazelcast

This article is a getting-started tutorial on integrating Hazelcast into a Spring project.
Part 1: First create a Spring project (if you already have one, skip to Part 2)
1- Create a Maven project.
mvn archetype:generate -DgroupId=spring_hazelcast -DartifactId=spring_hazelcast  

2- Add the Spring dependencies to the pom.
3- Create a TestBean.
4- Create beans.xml in your source root (src/main/java/beans.xml)
5- Test your Spring app.
If you see “success” printed, we can now integrate Hazelcast.

Part 2: Integrate Hazelcast
1- Add the Hazelcast dependencies.
2- Add the hz namespace, the Hazelcast configuration and a Hazelcast map bean to beans.xml.
In the hz:map definition, “id” is the Spring bean id and “name” refers to the Hazelcast map’s name.

3- Test Hazelcast.

You can download and browse the getting started project here:

Distribute Grails with Hazelcast

In this article I will try to integrate my two favorite technologies: Grails and Hazelcast.
(Bias: I currently work for Hazelcast.)

Ruby on Rails gained popularity among people who seek productivity in web programming.
Java is often criticized for being too heavy for rapid development.
But the richness of the Java community has given birth to flexible and dynamic JVM languages like Groovy.
Grails is something of a synthesis of the power of Java (with the help of Groovy) and the productivity of Rails, following the philosophy of “convention over configuration”.
Another technology that amazes me is Hazelcast.
I remember the days when I first met socket programming and RMI at university.
And when I first tried Hazelcast, my first reaction was: “How could distributing your data over machines be so easy?”
A single jar, no dependencies; distribute your data over maps, queues, topics…

So I decided to integrate these two technologies and write a simple plugin, so that anyone can easily distribute data in memory with Hazelcast.
I called it hazelgrails and pushed it to GitHub: https://github.com/enesakar/hazelgrails
Here is an introduction to using this plugin.
How to Install Plugin

Run the command:

install-plugin hazelgrails

You will see hazelcast.xml in the conf directory under the plugins directory.
You can configure Hazelcast in detail there.
For the available options have a look at:

To see the Hazelcast logs, add the following to Config.groovy:

info 'com.hazelcast'

Use Hazelcast as Hibernate 2nd Level Cache
In DataSource.groovy, replace the corresponding line in the hibernate configuration block with the following:

cache.region.factory_class = 'com.hazelcast.hibernate.HazelcastCacheRegionFactory'

For more details about 2nd level cache configuration have a look at:

Test The Plugin

Create a Grails application and install the plugin. Then create a domain class and two controllers.

create-domain-class com.hazelgrails.Customer
create-controller com.hazelgrails.Server1
create-controller com.hazelgrails.Server2

As you can see, Customer is Serializable. Hazelcast requires objects to be serializable in order to distribute them across the cluster.

Now create the war file (with the command “grails war”) and make a copy of it under a different name (app2.war).
You may deploy the wars to different machines on the same network, to different servers (Tomcat, Jetty) on the same machine, or even to the same Tomcat.
For simplicity, I ran the current app with “grails run-app” and deployed the war to an external Tomcat.
And test them:

Cities:[2:New York, 1:London] 
Timestamps:[1333447087796, 1333447112863]
Cities:[2:New York, 1:London] 
First customer name:tom, age:20 
Timestamps:[1333447087796, 1333447112863]

In practice, if you see the following in the logs, you can conclude that the nodes formed a cluster successfully (you should add “info 'com.hazelcast'” to Config.groovy).

Members [2] {
        Member []
        Member [] this

Usage Examples
Two new methods are defined for domain classes.
The saveHz() method first persists the domain object (like the original save()) and then puts it into a Hazelcast map.
The getHz() method first tries to find the object with the given id in the Hazelcast map; if it is not found there, it looks in the datasource.
Hazelcast creates a distributed map for each domain class.
So by using saveHz() and getHz() you can get your objects from distributed memory instead of through database operations.
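The order of operations behind these two methods can be modelled in a few lines of plain Java. This is not the plugin’s code (the plugin adds the methods to Grails domain classes): CachedRepository is a hypothetical class, and two ordinary maps stand in for the database and for the distributed map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical Java model of the saveHz()/getHz() behaviour described above.
class CachedRepository<V> {
    final Map<Long, V> datasource = new HashMap<>(); // stands in for the database
    final Map<Long, V> hazelMap = new HashMap<>();   // stands in for the distributed map

    void saveHz(long id, V value) {
        datasource.put(id, value); // persist first, like the original save()
        hazelMap.put(id, value);   // then put the object into the map
    }

    V getHz(long id) {
        V cached = hazelMap.get(id);
        // Fall back to the datasource only when the map has no entry.
        return cached != null ? cached : datasource.get(id);
    }
}
```

An object saved through saveHz() is served from the map on later getHz() calls, while an object that exists only in the datasource is still found through the fallback.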
Also, by injecting hazelService, you can create Hazelcast instances.
Here are the usage examples:

Advantage of using MultiMap

One of the distributed data structures supported by Hazelcast is MultiMap.
It is a very useful container, as it stores a collection of values mapped to a given key.
But as you know, you can always use a Map as a MultiMap.
Just create a set or list for each new key, and add to this collection on each put operation.
But in Hazelcast’s distributed world, MultiMap has extra benefits.
Hazelcast optimizes the MultiMap operations (put, remove, containsValue and containsEntry) so that they do not serialize and deserialize the whole collection.
Instead, only the added or retrieved object is processed.
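For comparison, the do-it-yourself approach of using a Map as a MultiMap can be sketched like this. SimpleMultiMap is a hypothetical local class, not a Hazelcast type; the point is that every value for a key lives inside one collection object, so in a distributed map each update would have to ship that whole collection.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hand-rolled multimap over a plain Map: one Set of values per key.
class SimpleMultiMap<K, V> {
    private final Map<K, Set<V>> map = new HashMap<>();

    // Returns true if the value was not already present for the key.
    boolean put(K key, V value) {
        // Create the per-key set on first use, then add to it.
        return map.computeIfAbsent(key, k -> new LinkedHashSet<>()).add(value);
    }

    // Read-only view of the values for a key (empty if the key is unknown).
    Set<V> get(K key) {
        Set<V> values = map.get(key);
        return values == null
                ? Collections.<V>emptySet()
                : Collections.unmodifiableSet(values);
    }

    boolean remove(K key, V value) {
        Set<V> values = map.get(key);
        if (values == null || !values.remove(value)) {
            return false;
        }
        if (values.isEmpty()) {
            map.remove(key); // do not keep empty sets around
        }
        return true;
    }

    boolean containsEntry(K key, V value) {
        return get(key).contains(value);
    }
}
```

Locally this works fine; the cost only shows up once the per-key set has to be serialized and sent over the network on every change.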

Note: This information is taken from a conversation on the Hazelcast mailing list.
For more, you can join the group.

Hazelcast 2.0 Released: Big Data In-Memory

With 2.0 release, Hazelcast is ready for caching/sharing terabytes of data in-memory. Storing terabytes of data in-memory is not a problem but avoiding GC and being resilient to crashes are big challenges. Among several others, there are two major features added to tackle these challenges. Elastic Memory (off-heap storage) and Distributed Backups. 1. Elastic Memory […]

Hazelcast 2.0 is coming! What is new?

Hazelcast 2.0 is a huge step forward in building the best IMDG and making Hazelcast experience even more pleasant. As always this release contains many fixes, enhancements and improvements. But there are different reasons that make 2.0 very special. We have many big changes in the internals of Hazelcast. Many of these changes are made […]

Hazelcast USA Trip, EastBayJUG and Coffee

Fuad and I will be in USA. Our schedule is below. I am sure we can squeeze a coffee break somewhere if you are near-by and would like to meet. Please drop an email (talip@hazelcast.com). We will also speak at SF EastBayJUG on 13th. Here is the link to sign up: http://www.meetup.com/eastbayjug/events/42096972/ Dec 4-6 Atlanta, GA […]