Elasticsearch Data Store - BloomReach Experience - Open Source CMS
20-09-2019

Elasticsearch Data Store

Introduction

The Relevance Module requires Elasticsearch.

Bloomreach Experience Manager 13 supports Elasticsearch version 6.x (see the system requirements for supported versions). Elasticsearch 5.x is no longer supported in Bloomreach Experience Manager 13; users must upgrade to Elasticsearch 6.

Install Elasticsearch

Download and install Elasticsearch.

For guidance on installing, configuring, deploying and administering Elasticsearch, please refer to the Elasticsearch documentation. Bloomreach Experience Manager does not require any special or additional steps to set up Elasticsearch. For production environments, we recommend a cluster of at least two Elasticsearch nodes for high availability.

Choose a Stale Data Removal Strategy

To control the data volume of the Elasticsearch index, choose one of the following two strategies:

  • Use a scheduled cleanup job provided by the Relevance Module.
    With this strategy, the relevance engine automatically deletes entries older than a configured number of days. Configure the maximum age (maxAgeDays) in the targeting data source in the application context configuration (see below).
  • Use the Rollover Index API provided by Elasticsearch (available since Elasticsearch 5).
    When the application connects to Elasticsearch, it uploads an index template containing the mapping for the visit type. When rolling over to a new index, Elasticsearch automatically applies this mapping and moves the alias to the new index. Configure the template name and alias name in the targeting data source in the application context configuration (see below).
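
As a rough sketch of the second strategy, the functions below wrap a call to the Elasticsearch Rollover Index API. The node URL, the alias name visits and the 30d age condition are illustrative assumptions, not values prescribed by the Relevance Module. The call only rolls over when the condition is met, so it would have to be invoked periodically (for example from an operating system scheduler), and deleting indices that have been rolled out of use would presumably remain a separate step.

```shell
# Sketch only: ES_URL, the alias and the max_age condition are assumed values.
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the request body for a rollover that triggers once the current
# write index is older than the given age (for example 30d).
rollover_body() {
  printf '{"conditions":{"max_age":"%s"}}' "$1"
}

# Ask Elasticsearch to roll the alias ($1) over to a new index when the
# age condition ($2) is met; otherwise the alias is left untouched.
rollover_alias() {
  curl -s -S -XPOST "$ES_URL/$1/_rollover" \
    -H 'Content-Type: application/json' \
    -d "$(rollover_body "$2")"
}

# Usage (requires a reachable cluster): rollover_alias visits 30d
```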

Configure Visits Data Store

A Relevance Elasticsearch Data Store connects to its database through a JNDI data source lookup, which needs to be defined at the container level, e.g. in Apache Tomcat.

Depending on your stale data removal strategy, add one of the following environment entries in conf/context.xml in your project.

When using the scheduled cleanup job stale data removal strategy:

<Environment name="elasticsearch/targetingDS" type="java.lang.String"
  value="{'indexName':'visits','maxAgeDays':'60','locations':['url-1','url-2',...]}" />

When using the rollover index stale data removal strategy:

<Environment name="elasticsearch/targetingDS" type="java.lang.String"
  value="{'templateName':'myproject-hippo_relevance_visit','aliasName':'visits','locations':['url-1','url-2',...]}" />

This registers a JNDI environment resource under java:comp/env/elasticsearch/targetingDS when the site web application is started. The JSON string contains the properties needed to instantiate a client that can connect to an Elasticsearch cluster.

Change ['url-1','url-2',...] to the list of URLs of your Elasticsearch cluster nodes. For local development, you can set locations to ['http://localhost:9200'].
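
The single-quoted array is easy to mistype by hand, so it can help to generate it from a list of node URLs. The following is a minimal sketch; the helper name build_locations and the host names in the usage comment are hypothetical.

```shell
# Build the single-quoted locations array from the node URLs passed as
# arguments, e.g. ['http://es1:9200','http://es2:9200'].
build_locations() {
  out=""
  for url in "$@"; do
    # Prefix a comma for every element except the first one.
    out="${out:+$out,}'$url'"
  done
  printf '[%s]' "$out"
}

# Usage: build_locations http://es1.example.com:9200 http://es2.example.com:9200
```

The printed array can be pasted as the value of the locations field in the Environment entry.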

The table below lists all available JSON fields:

Field | Type | Default | Description
indexName (1) | String | n/a | The name of the Elasticsearch index (use with the scheduled cleanup job stale data removal strategy).
templateName (2) | String | n/a | The name of the index template (use with the rollover index stale data removal strategy). You are free to choose any name, but a descriptive name is advised to prevent name collisions and confusion.
aliasName (2) | String | n/a | The name of the alias.
locations (3) | String array | n/a | URL locations of nodes in the Elasticsearch cluster to connect to. One location is enough to connect to the cluster. Specifying multiple locations makes the startup process more robust.
username | String | n/a | Optional. Username to use if Elasticsearch requires authenticated access.
password | String | n/a | Optional. Password to use if Elasticsearch requires authenticated access.
maxConnections | Long | 20 | Optional. Maximum number of client threads in the connection pool used to connect to Elasticsearch.
maxAgeDays | Long | 0 | Records older than this number of days are deleted. 0 means records are never deleted.
cleanupJobCronTrigger (4) | String | n/a | Valid cron expression defining the interval at which data store cleanup jobs run. The cleanup jobs only run if maxAgeDays is greater than 0. If the cleanupJobCronTrigger property is absent, the jobs execute with a fixed delay of one hour.

(1) Required when using the scheduled cleanup job stale data removal strategy.
(2) Required when using the rollover index stale data removal strategy.
(3) Required regardless of stale data removal strategy.
(4) Available since version 13.4.0.

Configure the visits store to use this JNDI environment resource, as in the default bootstrapped configuration below.

Elasticsearch 6:

/targeting:targeting/targeting:datastores/targeting:visits:
  targeting:storefactoryclass: com.onehippo.cms7.targeting.storage.elastic6.ElasticStoreFactory
  dataSource: elasticsearch/targetingDS

Elasticsearch 5.6:

/targeting:targeting/targeting:datastores/targeting:visits:
  targeting:storefactoryclass: com.onehippo.cms7.targeting.storage.elastic5.ElasticStoreFactory
  dataSource: elasticsearch/targetingDS

Configure Elasticsearch

When using the scheduled cleanup job stale data removal strategy, create the index configured above (referred to by the indexName property) in Elasticsearch, e.g. using curl:

curl -s -S -XPUT http://localhost:9200/visits

If the index does not exist when the CMS is started, the application attempts to create the configured index.
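
To avoid issuing the PUT request against an index that already exists, the check and the creation can be combined. This is a minimal sketch, assuming an unauthenticated node at the URL in ES_URL; the function names index_status and ensure_index are hypothetical.

```shell
# Sketch only: ES_URL is an assumed node URL.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print the HTTP status of a HEAD request for the index ($1):
# 200 if it exists, 404 if it does not.
index_status() {
  curl -s -o /dev/null -w '%{http_code}' -I "$ES_URL/$1"
}

# Create the index only when Elasticsearch reports it as missing.
ensure_index() {
  if [ "$(index_status "$1")" = "404" ]; then
    curl -s -S -XPUT "$ES_URL/$1"
  fi
}

# Usage (requires a reachable cluster): ensure_index visits
```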

When using the rollover stale data removal strategy, create the aliased index configured above (referred to by the aliasName property) in Elasticsearch, e.g. using curl:

curl -XPUT 'localhost:9200/%3Cvisits-%7Bnow%2Fd%7D-000001%3E' -H 'Content-Type: application/json' -d '{
 "aliases": {
   "visits": {}
 }
}'

This creates an initial index named visits-YYYY.MM.dd-000001 where YYYY.MM.dd are the current year, month and day. The alias for this index is visits. Make sure that the index you create is prefixed with the alias because the application will use ${alias}* for queries.
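
The percent-encoded path in the request is the date math expression <visits-{now/d}-000001>. The sketch below shows how that encoding can be produced and which concrete index name the expression resolves to today; the function names encode_date_math and initial_index_name are hypothetical, and the date is taken in UTC on the assumption that the default Elasticsearch date math settings are in use.

```shell
# Percent-encode the characters of a date math index name that are not
# safe in a URL path: < > { } and /.
encode_date_math() {
  printf '%s' "$1" | sed -e 's/</%3C/g' -e 's/>/%3E/g' \
    -e 's/{/%7B/g' -e 's/}/%7D/g' -e 's,/,%2F,g'
}

# Print the concrete index name that <alias-{now/d}-000001> resolves to
# today, using the UTC date in YYYY.MM.dd form.
initial_index_name() {
  printf '%s-%s-000001' "$1" "$(date -u +%Y.%m.%d)"
}

# Usage:
#   encode_date_math '<visits-{now/d}-000001>'
#   initial_index_name visits
```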

The index must be readable and writable by the user configured through the username and password properties. How to set this up is out of scope for this document because it depends on the deployment scenario of your Elasticsearch instance. Please consult your administrator to find out how to create the index in your Elasticsearch instance.
