Elasticsearch 5 Data Store - BloomReach Experience - Open Source CMS

This article covers Hippo CMS version 11. There's an updated version available that covers our most recent release.

31-08-2017

Elasticsearch 5 Data Store

This feature is available since Hippo CMS 11.2.0.

Introduction

To use the Trends panel and to see the experiment servings, visits must be stored in Elasticsearch.

Hippo DX 11.2 supports Elasticsearch versions 2 and 5. Out of the box, it is configured to use an Elasticsearch 2 data store. This page describes how to configure an Elasticsearch 5 data store.

Install Elasticsearch 5

Download and install Elasticsearch 5.

For guidance on installing, configuring, deploying and administering Elasticsearch, please refer to the Elasticsearch documentation. Hippo does not require any special or additional steps to set up Elasticsearch. For production environments, we recommend a cluster of at least two Elasticsearch nodes for high availability.

Choose a Stale Data Removal Strategy

To control the data volume of the Elasticsearch index, choose one of the following two strategies:

  • Use a scheduled cleanup job provided by the Relevance Module.
    With this strategy, the relevance engine automatically deletes entries older than a configured number of days. Configure the maximum age in the targeting data source in the application context configuration (see below).
  • Use the Rollover Index API provided by Elasticsearch 5.
    When the application connects to Elasticsearch, it uploads an index template containing the mapping for the visit type. When rolling over to a new index, Elasticsearch automatically applies this mapping and moves the alias to the new index. Configure the template name and alias name in the targeting data source in the application context configuration (see below).
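The page does not show how a rollover is triggered. As a minimal sketch, the Python snippet below builds (but does not send) a request to the Elasticsearch Rollover Index API for the alias. The function name `build_rollover_request`, the base URL and the `max_age` and `max_docs` values are assumptions chosen for illustration, not settings prescribed by the Relevance Module. Note that a rollover only creates a new index and moves the alias; removing older indices is a separate step.

```python
import json
import urllib.request


def build_rollover_request(base_url, alias, max_age="30d", max_docs=None):
    """Build (but do not send) a Rollover Index API request for the alias.

    The condition values are illustrative assumptions; pick values that
    match your own data retention needs.
    """
    conditions = {"max_age": max_age}
    if max_docs is not None:
        conditions["max_docs"] = max_docs
    body = json.dumps({"conditions": conditions}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{alias}/_rollover",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


request = build_rollover_request("http://localhost:9200", "visits", max_age="60d")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually perform the rollover, a scheduled task could call:
#   urllib.request.urlopen(request)
```

Elasticsearch only rolls over when at least one of the conditions is met, so a task like this can run on a fixed schedule.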

Configure Visits Data Store

A Relevance Elasticsearch Data Store connects to its database through a JNDI data source lookup, which must be defined at the container level, e.g. in Apache Tomcat.

Depending on your stale data removal strategy, add one of the following environment entries to conf/context.xml in your project.

When using the scheduled cleanup job stale data removal strategy:

<Environment name="elasticsearch/targetingDS" type="java.lang.String"
  value="{'indexName':'visits', 'maxAgeDays':'60', 'locations':['url-1','url-2',...]}" />

When using the rollover index stale data removal strategy:

<Environment name="elasticsearch/targetingDS" type="java.lang.String"
  value="{'templateName':'myproject-hippo_relevance_visit', 'aliasName':'visits', 'locations':['url-1','url-2',...]}" />

This registers a JNDI environment resource under java:comp/env/elasticsearch/targetingDS when the site web application is started. The JSON string contains the properties needed to instantiate a client that can connect to an Elasticsearch cluster.

Change ['url-1','url-2',...] to the list of URLs of your Elasticsearch cluster nodes. For local development, you can set locations to ['http://localhost:9200'].
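Hand-editing the quoted JSON string inside an XML attribute is error-prone. The following Python sketch generates the environment entry from a dictionary of properties. The helper name `environment_entry` is hypothetical, and the sketch assumes that no property value itself contains a quote character.

```python
import json
from xml.sax.saxutils import quoteattr


def environment_entry(properties):
    """Render a Tomcat <Environment> element for the targeting data source."""
    # The entries on this page use single quotes inside the XML attribute,
    # so convert the double quotes produced by json.dumps.
    # Assumption: no property value contains a quote character.
    value = json.dumps(properties, separators=(",", ":")).replace('"', "'")
    return (
        '<Environment name="elasticsearch/targetingDS" type="java.lang.String"\n'
        f"  value={quoteattr(value)} />"
    )


# Scheduled cleanup job strategy, local development cluster.
cleanup_entry = environment_entry({
    "indexName": "visits",
    "maxAgeDays": "60",
    "locations": ["http://localhost:9200"],
})

# Rollover index strategy, local development cluster.
rollover_entry = environment_entry({
    "templateName": "myproject-hippo_relevance_visit",
    "aliasName": "visits",
    "locations": ["http://localhost:9200"],
})

print(cleanup_entry)
print(rollover_entry)
```

The printed elements can be pasted into conf/context.xml after replacing the location with the URLs of your own cluster nodes.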

The table below lists all available JSON fields:

| Field | Type | Default | Description |
|---|---|---|---|
| indexName [1] | String | n/a | The name of the Elasticsearch index (use with the scheduled cleanup job stale data removal strategy). |
| templateName [2] | String | n/a | The name of the index template (use with the rollover index stale data removal strategy). You are free to choose any name, but a descriptive name is advised to prevent name collisions and confusion. |
| aliasName [2] | String | n/a | The name of the alias. |
| locations [3] | String array | n/a | URL locations of nodes in the Elasticsearch cluster to connect to. One location is enough to connect to the cluster. Specifying multiple locations makes the startup process more robust. |
| username | String | n/a | Optional. Username to use if Elasticsearch requires authenticated access. |
| password | String | n/a | Optional. Password to use if Elasticsearch requires authenticated access. |
| maxConnections | Long | 20 | Optional. Maximum number of client threads in the connection pool that will be used to connect to Elasticsearch. |
| maxAgeDays | Long | n/a | Optional. Maximum number of days request logs are stored. Not relevant when using the rollover functionality. |

[1] Required when using the scheduled cleanup job stale data removal strategy.
[2] Required when using the rollover index stale data removal strategy.
[3] Required regardless of stale data removal strategy.
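The required fields per strategy can be checked before deploying a configuration. The Python sketch below encodes the requirements from the notes above. The strategy labels `cleanup` and `rollover` and the function name `missing_fields` are hypothetical names used only in this example; they are not identifiers from the Relevance Module.

```python
# Required fields per stale data removal strategy, as listed in the notes above.
# The keys "cleanup" and "rollover" are labels invented for this sketch.
REQUIRED_BY_STRATEGY = {
    "cleanup": {"indexName", "locations"},
    "rollover": {"templateName", "aliasName", "locations"},
}


def missing_fields(strategy, properties):
    """Return the required fields that are absent or empty for the strategy."""
    required = REQUIRED_BY_STRATEGY[strategy]
    return sorted(name for name in required if not properties.get(name))


rollover_config = {
    "templateName": "myproject-hippo_relevance_visit",
    "locations": ["http://localhost:9200"],
}
print(missing_fields("rollover", rollover_config))  # ['aliasName']
```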

Configure this JNDI environment resource for the visits store, as in the default bootstrapped configuration below:

/targeting:targeting/targeting:datastores/targeting:visits
  - targeting:storefactoryclass =
      com.onehippo.cms7.targeting.storage.elastic5.ElasticStoreFactory
  - dataSource = elasticsearch/targetingDS

Configure Elasticsearch

When using the scheduled cleanup job stale data removal strategy, create the index configured above (referred to by the indexName property) in Elasticsearch, e.g. using curl:

curl -s -S -XPUT http://localhost:9200/visits

When using the rollover stale data removal strategy, create the aliased index configured above (referred to by the aliasName property) in Elasticsearch, e.g. using curl:

curl -XPUT 'localhost:9200/%3Cvisits-%7Bnow%2Fd%7D-000001%3E' -d '{
  "aliases": {
    "visits": {}
  }
}'

This creates an initial index named visits-YYYY.MM.dd-000001, where YYYY.MM.dd is the current year, month and day. The alias for this index is visits. Make sure that the name of the index you create starts with the alias, because the application uses ${alias}* for queries.
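The percent-encoded path in the curl command is the URL-encoded form of the date math index name <visits-{now/d}-000001>. The Python sketch below shows how that path is derived and what index name it resolves to on a given day. The function names are hypothetical, and the sketch assumes the default date format (YYYY.MM.dd) and the UTC time zone that Elasticsearch date math uses unless told otherwise.

```python
from datetime import datetime, timezone
from urllib.parse import quote


def initial_index_path(alias, sequence=1):
    """Return the URL-encoded date math name for the first rollover index."""
    date_math_name = f"<{alias}-{{now/d}}-{sequence:06d}>"
    # Encode every reserved character, including "/", so the name fits in a URL path.
    return quote(date_math_name, safe="")


def resolved_index_name(alias, when, sequence=1):
    """Return the concrete index name the date math name resolves to."""
    return f"{alias}-{when:%Y.%m.%d}-{sequence:06d}"


print(initial_index_path("visits"))
# %3Cvisits-%7Bnow%2Fd%7D-000001%3E

print(resolved_index_name("visits", datetime(2017, 8, 31, tzinfo=timezone.utc)))
# visits-2017.08.31-000001
```

Because the resolved name starts with the alias, it matches the ${alias}* pattern that the application uses for queries.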

The index must be readable and writable by the user configured through the username and password properties. How to set this up is outside the scope of this document because it depends on the deployment scenario of your Elasticsearch instance. Please consult your administrator to find out how you can create the index in your Elasticsearch instance.
