Using Cassandra with JHipster

Introduction

This is the first part of an introductory series on using JHipster with Cassandra: it is aimed at people getting started with using both tools together, but some basic Cassandra and Spring Boot knowledge is expected to understand it fully.
It is written by Julien Dubois, the JHipster lead, who also coded the Cassandra support in JHipster.

This series is in 3 parts:

  1. Using Cassandra with JHipster
  2. Modeling data with Cassandra
  3. 10 Cassandra tips and tricks

Today we start with the first part, how Cassandra support works with JHipster.

What is JHipster?

JHipster is an application generator that focuses on Spring Boot and AngularJS. It is a very popular Open Source project hosted on GitHub, and you can find a lot of information on its website at http://jhipster.github.io/.

JHipster supports what Martin Fowler calls Polyglot Persistence: by default it uses a classical SQL datastore (using JPA/Hibernate), but it also supports MongoDB and Cassandra as options. The AngularJS front-end and most of the Java (Spring-based) code will work identically, but of course the data access layer is specific to the underlying datastore.

Cassandra support in JHipster

During application generation, JHipster lets you select Cassandra as the persistent datastore. This will generate:

  • A specific Spring Boot configuration.
  • A specific data access layer.
  • CQL scripts to generate the default Cassandra schema.

The Spring Boot configuration lets you configure the DataStax CQL driver using a Spring Boot YAML file. By default, this is the src/main/resources/config/application.yml file:

  • This is a standard Spring Boot configuration file, where all Spring Boot components can be parameterized.
  • Currently Spring Boot (version 1.2.x) does not support Cassandra out-of-the-box, but this support is planned for version 1.3.0: the good news is that this support comes from JHipster, so your configuration should be the same when this support is official.
  • This YAML file supports Spring profiles, so JHipster will create an application-dev.yml for development and an application-prod.yml for production.

For SQL and MongoDB datastores, JHipster uses the Spring Data project (namely Spring Data JPA and Spring Data MongoDB), which allows for very concise and efficient code. Unfortunately, Spring Data Cassandra does not have the same level of support (at the time of this writing), so the data access layer is coded directly using the DataStax CQL driver. At Ippon Technologies, we have already used this approach successfully for several clients, and for us this has proven to be the best choice when using Cassandra with Spring Boot.

How repositories access Cassandra

Here is a UserRepository generated by JHipster, and an example “Foo” entity generated with the JHipster entity sub-generator.

All repositories (for example the UserRepository) follow the same model:

  • Repositories are Spring Beans, annotated with Spring’s @Repository annotation.
  • The Session object, from the DataStax CQL driver, is injected using the standard @Inject annotation (JSR-330): this is a shared object for all repositories. As the DataStax documentation says “Session instances are thread-safe and usually a single instance is enough per application”, so this is a perfectly normal use of the Session, even if the name “session” might be misleading (in particular if you have a Hibernate/JPA background).
  • A com.datastax.driver.mapping.Mapper and several com.datastax.driver.core.PreparedStatement are declared as fields at the beginning of the repository: as the repository is a Spring singleton, they will all be accessed concurrently (which is fine, as they are all thread-safe objects).
  • An init() method, annotated with the standard @PostConstruct annotation (JSR-250), creates the Mapper and all PreparedStatement objects: this method is invoked once the Spring configuration is all set up, hence when the application’s CQL driver is correctly set up and ready to serve requests.
  • The usual CRUD methods are available, using the Mapper and the PreparedStatement instances to map the object to the Cassandra datastore.

Managing indexes

We generally advise people not to use Cassandra’s indexes (see “when not to use an index” from the official documentation), and we have had many issues with indexes when working on client projects at Ippon Technologies.

On the other hand, managing indexes manually, in a specific “index table”, has always been successful for us:

  • This gives us fine-grained knowledge of what is being indexed, and performance has always been good.
  • Of course, this comes at the cost of maintaining those tables, which means quite a lot of boilerplate code to write.

For JHipster, we have used this principle to handle indexes for the “User” table, so a user can be looked up by e-mail, login, or activation key. Here is the CQL script for the “User” table and its “index tables”.

Those three keys have a very high cardinality: if you looked at the section “when not to use an index” in the official documentation, you know those should not be indexed by Cassandra. That’s why we created separate index tables for those keys, and why we manage them manually in the repository.
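To make the bookkeeping concrete, here is a minimal in-memory sketch of the idea. It is not the repository generated by JHipster: plain Java maps stand in for the Cassandra tables, the class names (SketchUser, ManualIndexSketch) are hypothetical, and the activation key index is left out because it would follow the same pattern as login and e-mail.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Illustrative user row; NOT JHipster's full User entity.
final class SketchUser {
    final String id;
    final String login;
    final String email;

    SketchUser(String id, String login, String email) {
        this.id = id;
        this.login = login;
        this.email = email;
    }
}

// Hypothetical in-memory stand-in: each map plays the role of one table.
final class ManualIndexSketch {
    private final Map<String, SketchUser> userById = new HashMap<>(); // main "user" table
    private final Map<String, String> idByLogin = new HashMap<>();    // index table: login -> id
    private final Map<String, String> idByEmail = new HashMap<>();    // index table: e-mail -> id

    // Writes the user row and keeps both index tables in step with it.
    void save(SketchUser user) {
        SketchUser previous = userById.put(user.id, user);
        if (previous != null) {
            // Remove index rows that this update made stale.
            if (!previous.login.equals(user.login)) {
                idByLogin.remove(previous.login);
            }
            if (!previous.email.equals(user.email)) {
                idByEmail.remove(previous.email);
            }
        }
        idByLogin.put(user.login, user.id);
        idByEmail.put(user.email, user.id);
    }

    // Two reads: the index table gives the id, then the main table gives the row.
    Optional<SketchUser> findByEmail(String email) {
        return Optional.ofNullable(idByEmail.get(email)).map(userById::get);
    }

    Optional<SketchUser> findByLogin(String login) {
        return Optional.ofNullable(idByLogin.get(login)).map(userById::get);
    }

    // Deleting a user must also delete its rows in every index table.
    void delete(SketchUser user) {
        userById.remove(user.id);
        idByLogin.remove(user.login);
        idByEmail.remove(user.email);
    }

    public static void main(String[] args) {
        ManualIndexSketch repository = new ManualIndexSketch();
        repository.save(new SketchUser("1", "jdoe", "jdoe@example.com"));
        System.out.println(repository.findByEmail("jdoe@example.com").isPresent());
    }
}
```

In a real repository each map operation would be a CQL statement against the corresponding table, which is where the boilerplate comes from: every save, update and delete has to touch the main table and each index table.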

This design can of course be criticized:

  • There is quite a lot of code to write and manage. If you compare this code to JHipster’s JPA model, the difference is huge!
  • We use “batches”, as provided by the CQL driver, to ensure our changes are atomic: this is good for consistency, obviously, but on the other hand batches slow down our operations, as all their statements have to go through the same coordinator node.
  • Another design would be to duplicate the whole user data in each “index table”. Writes are cheap in Cassandra, so this would just cost some disk space and would clearly speed up our reads, as we could get all user data at once instead of doing two queries. At the moment we do not feel that those reads need to be optimized, but this might change depending on your own specific use case.
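The last alternative can be sketched in the same spirit. This is again a hypothetical in-memory model rather than generated code: the class names (UserRow, DenormalizedLookupSketch) are made up for illustration, and each map stands in for a table that would hold its own full copy of the user data.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative user row; the fields are assumptions, not JHipster's User entity.
final class UserRow {
    final String id;
    final String login;
    final String email;

    UserRow(String id, String login, String email) {
        this.id = id;
        this.login = login;
        this.email = email;
    }
}

// Hypothetical stand-in where every lookup table stores the whole row.
final class DenormalizedLookupSketch {
    private final Map<String, UserRow> userById = new HashMap<>();
    private final Map<String, UserRow> userByLogin = new HashMap<>();
    private final Map<String, UserRow> userByEmail = new HashMap<>();

    // One logical save becomes one write per table.
    void save(UserRow row) {
        UserRow previous = userById.put(row.id, row);
        if (previous != null) {
            // Remove the copies stored under the old keys before rewriting them.
            userByLogin.remove(previous.login);
            userByEmail.remove(previous.email);
        }
        userByLogin.put(row.login, row);
        userByEmail.put(row.email, row);
    }

    // A single read returns the full row, with no second lookup by id.
    UserRow findByEmail(String email) {
        return userByEmail.get(email);
    }

    UserRow findByLogin(String login) {
        return userByLogin.get(login);
    }

    public static void main(String[] args) {
        DenormalizedLookupSketch repository = new DenormalizedLookupSketch();
        repository.save(new UserRow("1", "jdoe", "jdoe@example.com"));
        System.out.println(repository.findByLogin("jdoe").email);
    }
}
```

The trade-off is visible in save(): reads become a single lookup, but any change to a user, even to a field that is not a lookup key, has to be rewritten in every table.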

Going further with Cassandra

Following this first article on using Cassandra with JHipster, we will publish another blog post tomorrow, discussing how to model data with Cassandra and how it is stored internally in the database.