Collectors - BloomReach Experience - Open Source CMS

This article covers Hippo CMS version 10. There's an updated version available that covers our most recent release.

14-11-2018

Collectors

Terminology

Collector

A collector extracts information from an HTTP request, and adds that to the targeting data of the current user. For example, a collector could check to see if the current request was a search action and if so, add the search terms to the targeting data. Another collector could check if the current request was for a document that defines keywords and if so add those keywords to the targeting data.

For every HTTP request, all collectors will be asked to update their targeting data. A collector can indicate that it should only be updated for new visitors, new visits, or based on existing targeting data.
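As a rough illustration of the two ideas above (extracting data from a request, and limiting updates to certain visitors), here is a minimal sketch in plain Java. It does not use the Relevance Module API: the class `UpdateFrequencySketch`, both method names, and the `query` parameter name are hypothetical and chosen only for this example.

```java
import java.util.Map;

// Hypothetical, simplified stand-in; this is not the Relevance Module API.
final class UpdateFrequencySketch {

    /** Runs on every request: returns the search terms, or null if the request was not a search. */
    static String collectSearchTerms(Map<String, String> requestParameters) {
        String query = requestParameters.get("query"); // "query" is an assumed parameter name
        return (query == null || query.trim().isEmpty()) ? null : query.trim();
    }

    /** Only collects for new visitors: returns null (nothing to add) for everyone else. */
    static String collectReferrer(String refererHeader, boolean newVisitor) {
        return newVisitor ? refererHeader : null;
    }
}
```

Returning null when there is nothing to add mirrors the contract described later on this page for getTargetingRequestData.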

Since Hippo CMS 10.1.0

Data collected by a collector is not automatically added to the user's targeting data. To have the collected data added you must add a characteristic that uses the collector.

Visitor

Someone visiting a channel. The Relevance Module gives each visitor an ID in a cookie to recognize which HTTP requests originate from the same visitor. In practice, different devices and/or browsers will be identified as different visitors, even when they are used by the same person.

Visit

One or more HTTP requests from the same visitor that are at most 30 minutes apart. When a visitor is inactive for a longer period before issuing a new request, a new visit starts.
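The 30-minute rule can be sketched as follows. This is only an illustration of the definition above, not the Relevance Module's own code; the class `VisitSketch` and the method `startsNewVisit` are hypothetical names.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper illustrating the definition of a visit.
final class VisitSketch {

    // Requests that are at most this far apart belong to the same visit.
    static final Duration MAX_GAP = Duration.ofMinutes(30);

    /** Returns true when the current request starts a new visit. */
    static boolean startsNewVisit(Instant previousRequest, Instant currentRequest) {
        if (previousRequest == null) {
            return true; // first request ever seen from this visitor
        }
        return Duration.between(previousRequest, currentRequest).compareTo(MAX_GAP) > 0;
    }
}
```

In this sketch, a request exactly 30 minutes after the previous one still belongs to the same visit, following the "at most 30 minutes apart" wording.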

Configuration

As an example, we'll configure a collector called 'documenttypes' that collects the document types a user has seen on the site (i.e. the JCR types of HST content beans for visited URLs).

Collectors are configured in the repository at

/targeting:targeting/targeting:collectors

Each child node specifies one collector. The node name is the ID of the collector. Each collector node has one mandatory property:

  • targeting:className The Java class name of the collector implementation

Depending on the collector, more configuration properties may be available to configure the collector.

Available collectors

The following collectors are available by default:

Class Name *) | Collected Targeting Data | Update Frequency
ChannelCollector | The IDs of all visited channels. | Every request
DayOfWeekCollector | The day of the week on which a channel is visited. The day is based on the time on the server side, not the time on the client side. | Every request
DocumentTypesCollector | The type of documents a user has seen (i.e. the relative content beans of visited pages). | Every request
GeoIPCollector | The city, country, latitude, and longitude of the visitor, based on its IP address. | Every new visit
GroupsCollector | The group a logged-in visitor belongs to. | Every request
PageViewsCollector | The visited page URLs. | Every request
ReferrerCollector | The webpage that led the visitor to our site. | The first visit
ReturningVisitorCollector | Whether the visitor has visited our site before or not. | Every new visit
TagsCollector | All tags on documents a user has seen (i.e. tags on relative content beans of visited pages). | Every request
SiteSearchKeywordsCollector | Keywords used to search the site. | Every request

*) All collectors in the table above are located in the package com.onehippo.cms7.targeting.collectors.
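To illustrate why it matters that the DayOfWeekCollector uses server-side time, the sketch below derives the day of the week from a clock that stands in for the server clock. This is not the collector's actual implementation; the class `ServerDaySketch` and its method are hypothetical, and passing a java.time.Clock is an assumption made so that the example can be run with a fixed time.

```java
import java.time.Clock;
import java.time.DayOfWeek;
import java.time.LocalDate;

// Hypothetical helper; the Clock parameter stands in for the server's clock and time zone.
final class ServerDaySketch {

    /** Returns the day of the week according to the given (server) clock. */
    static DayOfWeek currentDay(Clock serverClock) {
        return LocalDate.now(serverClock).getDayOfWeek();
    }
}
```

With a server clock in UTC, a request at 2018-11-14T23:30:00Z falls on a Wednesday, while the same instant on a server nine hours ahead of UTC already falls on a Thursday. A visitor's local day plays no role.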

See Collector configurations.

Writing a custom collector

Collectors have to be part of the site application. You can either add them to the 'site' module of a Hippo project, or create a separate Maven module for your own collector(s) and let the site module depend on that separate module.

First, add the following Maven dependency to the module that will contain your custom collector:

<dependency>
  <groupId>com.onehippo.cms7</groupId>
  <artifactId>hippo-addon-targeting-api</artifactId>
</dependency>

Second, add a class that implements the interface com.onehippo.cms7.targeting.Collector. The class should have a constructor that takes a String and a JCR Node object. The String is the configured ID of the collector (e.g. 'documenttypes' in the previous example). The node is the configuration JCR node of the collector (e.g. '/targeting:targeting/targeting:collectors/documenttypes').

An alternative to implementing the Collector interface directly is to extend the base class AbstractCollector. Extending this class saves you from writing your own JSON serialization code (more about that below). If your collector extends AbstractCollector, add the following Maven dependency too:

<dependency>
    <groupId>com.onehippo.cms7</groupId>
    <artifactId>hippo-addon-targeting-collectors</artifactId>
</dependency>

The example implementation below also extends AbstractCollector.

MyCollector.java:

import javax.jcr.Node;
import javax.jcr.RepositoryException;
import javax.servlet.http.HttpServletRequest;

import com.onehippo.cms7.targeting.collectors.AbstractCollector;

public class MyCollector extends AbstractCollector<MyTargetingData,
                                                   MyRequestData> {

    public MyCollector(String id, Node node) throws RepositoryException {
        super(id);
        // read any collector-specific configuration properties from the node
    }

    /**
     * Get the targeting data that this collector provides 
     * for the current request.
     * This allows decoupling of runtime request information 
     * and the generation of statistics.
     *
     * @param request               the <code>request</code> to inspect 
     *                              for new targeting information to add to the
     *                              data.
     * @param newVisitor
     * @param newVisit
     * @param previousTargetingData the previous collected data 
     *                              for this Collector for the 
     *                              current visitor, which can be null
     * @return processed request data, or {@code null} if 
     * no relevant data is available
     */
    @Override
    public MyRequestData getTargetingRequestData(HttpServletRequest request,
                              boolean newVisitor,
                              boolean newVisit,
                              MyTargetingData previousTargetingData) {
        // TODO: implement; null means no relevant data is available
        return null;
    }

    /**
     * Update the targeting data of this visitor 
     * with the request data gathered by
     * {@link #getTargetingRequestData(javax.servlet.http.HttpServletRequest, 
     * boolean, boolean, TargetingData)}
     *
     * @param targetingData the {@link TargetingData} to update. May be {@code null}
     *                      if this is the first time this collector is 
     *                      called for this visitor.
     * @param requestData   the requestData that resulted from processing 
     *                      the current request. May be {@code null}.
     * @return the updated {@link TargetingData}. Null if both 
     * the passed in {@link TargetingData} was null and there
     * was no new information to store.
     */
    @Override
    public MyTargetingData updateTargetingData(MyTargetingData targetingData,
                                        MyRequestData requestData)
                                        throws IllegalArgumentException {
        // TODO: implement; this placeholder leaves the existing data unchanged
        return targetingData;
    }
}

Your collector will most likely use its own objects to store targeting data and request data. In the example we'll call these MyTargetingData and MyRequestData. MyTargetingData needs to be a POJO that can be mapped from Java to JSON and vice versa via com.fasterxml.jackson.databind.ObjectMapper.

The targeting data bean stores all targeting data of a visitor. The collector is responsible for updating the data in the bean. The bean implements the TargetingData interface, which only defines the method getCollectorId.

MyTargetingData.java:

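import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonProperty;

public class MyTargetingData extends AbstractTargetingData {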

    @JsonCreator
    public MyTargetingData(@JsonProperty("collectorId") String collectorId,
                            ...) {
        super(collectorId);
    }

    // add any custom fields, getters, and setters.

}

Note that since the release of Hippo CMS 10, the JsonCreator annotation to use is com.fasterxml.jackson.annotation.JsonCreator. It replaces org.codehaus.jackson.annotate.JsonCreator, which was used in Hippo CMS 7.9 and should not be used in Hippo CMS 10 projects.

The request data bean contains all data collected from a single HTTP request. This bean will be stored in the request log of the targeting engine.

MyRequestData.java:

public class MyRequestData {

    public MyRequestData(...) {

    }

    // add any custom fields, getters, and setters.

}
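To make the division of labour between the two beans more concrete, the self-contained sketch below accumulates document types per visitor. It deliberately avoids the Relevance Module and Jackson classes so that it can run on its own: `DocTypeRequestData`, `DocTypeTargetingData`, and `DocTypeCollectorSketch` are hypothetical stand-ins, and only the null-handling follows the updateTargetingData contract documented above.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical request data: the document type seen in a single request.
final class DocTypeRequestData {
    final String documentType;

    DocTypeRequestData(String documentType) {
        this.documentType = documentType;
    }
}

// Hypothetical targeting data: all document types one visitor has seen so far.
final class DocTypeTargetingData {
    final String collectorId;
    final Set<String> documentTypes = new LinkedHashSet<>();

    DocTypeTargetingData(String collectorId) {
        this.collectorId = collectorId;
    }
}

// Hypothetical collector; it does not extend AbstractCollector.
final class DocTypeCollectorSketch {
    private final String id;

    DocTypeCollectorSketch(String id) {
        this.id = id;
    }

    /** Merges the data of one request into the visitor's targeting data. */
    DocTypeTargetingData updateTargetingData(DocTypeTargetingData targetingData,
                                             DocTypeRequestData requestData) {
        if (requestData == null) {
            return targetingData; // nothing new; stays null if it was null
        }
        DocTypeTargetingData result =
                targetingData != null ? targetingData : new DocTypeTargetingData(id);
        result.documentTypes.add(requestData.documentType);
        return result;
    }
}
```

The request data stays small and describes one request only, while the targeting data grows over the lifetime of the visitor.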

JSON Serialization

The targeting and request data are serialized to JSON when communicating with the CMS UI and when they are persisted. The default serialization is based on Jackson and can be tuned using its annotations, such as @JsonCreator and @JsonProperty. Because this may not give sufficient control, may be inconvenient, or because you may need to adapt data that was serialized in an older format, the actual serialization is delegated to the Collector implementation.

The methods that will be invoked for (de)serializing request & targeting data are (see the Collector interface):

T convertJsonToTargetingData(ObjectNode root, ObjectMapper objectMapper)
                                                        throws IOException;

JsonNode convertTargetingDataToJson(T data, ObjectMapper objectMapper)
                                                        throws IOException;

U convertJsonToRequestData(JsonNode root, ObjectMapper objectMapper)
                                                        throws IOException;

JsonNode convertRequestDataToJson(U data, ObjectMapper objectMapper)
                                                        throws IOException;

The AbstractCollector has default implementations of these methods. The MyTargetingData example demonstrates how the collector ID should be passed to the AbstractTargetingData base class. Other properties can be set with setters that conform to the JavaBeans convention, but they can also be initialized through additional @JsonProperty-annotated constructor parameters, which makes it possible, for instance, to create immutable data structures.

Alter Ego

The Alter Ego functionality allows a CMS user to impersonate a visitor with certain characteristics. The 'Show this page as' menu in the template composer always contains the option 'Alter Ego'. When the 'Alter Ego' option is selected, targeting data will be collected while previewing the channel, and targeted content will be shown. The 'Edit Alter Ego' button opens a window in which collected targeting data can be overridden with a specific value. For example, it is possible to select a specific location instead of the location collected by the Relevance Module.

To be able to override collector data, a collector plugin must be provided that can edit the JSON representation of the targeting data. Such a plugin is similar to a characteristic plugin, and provides the UI components shown in the 'Edit Alter Ego' window in the template composer.

The remainder of this page explains how to implement a collector plugin.

Configuration

Collector plugins are configured in the repository at:

/hippo:configuration/hippo:frontend/cms/hippo-targeting

Each collector plugin is configured in one child node of type 'frontend:pluginconfig'. The node name does not matter, but it is good practice to name it 'collector-<ID of your collector>'. Each collector plugin node can have the following properties:

  • collector (String, mandatory) The ID of the collector.

  • plugin.class (String, mandatory) The Java class name of the collector plugin

A collector plugin can define more configuration properties to customize the plugin.

Example: GroupsCollectorPlugin

The groups collector plugin allows you to alter the groups a user is a member of. The targeting data of the GroupsCollector simply returns the groups as a comma-separated string. The groups collector plugin consists of a checkbox group in which one or more groups can be selected.

The plugin consists of three files: a Java class, a .properties file, and a Javascript class. The code shown below is a slightly simplified version of the GroupsCollectorPlugin in the Relevance Module.

The Java class contains an ExtClass annotation that specifies the associated Javascript class of the plugin.

GroupsCollectorPlugin.java:

package com.onehippo.cms7.targeting.frontend.plugin.groups;

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

import javax.jcr.NodeIterator;
import javax.jcr.RepositoryException;
import javax.jcr.Session;
import javax.jcr.query.Query;

import com.onehippo.cms7.targeting.frontend.plugin.CollectorPlugin;

import org.hippoecm.frontend.plugin.IPluginContext;
import org.hippoecm.frontend.plugin.config.IPluginConfig;
import org.hippoecm.frontend.session.UserSession;
import org.json.JSONException;
import org.json.JSONObject;
import org.wicketstuff.js.ext.util.ExtClass;

/**
 * Plugin for the groups collector. Available plugin properties:
 * <ul>
 * <li>excludes: multi-value String property, each string is a regular
 *     expression; matching group names are not selectable</li>
 * </ul>
 */
@ExtClass("Hippo.Targeting.GroupsCollectorPlugin")
@SuppressWarnings("unused")
public class GroupsCollectorPlugin extends CollectorPlugin {

    private List<Pattern> excludes;

    public GroupsCollectorPlugin(final IPluginContext context,
                                 final IPluginConfig config) {
        super(context, config);
        final String[] excludesConfig = config.getStringArray("excludes");
        excludes = new ArrayList<Pattern>();
        if (excludesConfig != null) {
            for (String exclude : excludesConfig) {
                excludes.add(Pattern.compile(exclude));
            }
        }
    }

    @Override
    protected void onRenderProperties(final JSONObject properties)
                                                    throws JSONException {
        super.onRenderProperties(properties);
        try {
            properties.put("groups", listGroups());
        } catch (RepositoryException e) {
            throw new JSONException(e);
        }
    }

    private List<String> listGroups() throws RepositoryException {
        final Session session = UserSession.get().getJcrSession();

        final StringBuilder statement = new StringBuilder();
        statement.append("//element");
        statement.append("(*, ").append("hipposys:group").append(")");
        statement.append(" order by @jcr:name");

        final Query q = session.getWorkspace().getQueryManager()
                           .createQuery(statement.toString(), Query.XPATH);

        final List<String> groups = new ArrayList<String>();
        final NodeIterator nodes = q.execute().getNodes();
        while (nodes.hasNext()) {
            final String group = nodes.nextNode().getName();
            if (!isExcluded(group)) {
                groups.add(group);
            }
        }

        return groups;
    }

    private boolean isExcluded(final String group) {
        if (group.equals("everybody")) {
            return true;
        }
        for (Pattern exclude : excludes) {
            if (exclude.matcher(group).matches()) {
                return true;
            }
        }
        return false;
    }
}

The .properties file contains all i18n labels. The special key collector-description is shown as the description of the collector in the 'Edit Alter Ego' window.

GroupsCollectorPlugin.properties:

collector-description=is in the user group
groups-empty=No groups available
no-groups=<none>

All properties are automatically available in the Javascript class via the resources variable. For example, the renderGroups method shows '<none>' when the list of groups is empty.

(function() {
    "use strict";

    Ext.namespace('Hippo.Targeting');

    Hippo.Targeting.GroupsCollectorPlugin =
                            Ext.extend(Hippo.Targeting.CollectorPlugin, {

        constructor: function(config) {
            var editor;

            if (Ext.isEmpty(config.groups)) {
                editor = {
                    message: config.resources['groups-empty'],
                    xtype: 'Hippo.Targeting.TargetingDataMessage'
                };
            } else {
                editor = {
                    collector: config.collector,
                    groups: config.groups,
                    resources: config.resources,
                    xtype: 'Hippo.Targeting.GroupsTargetingDataEditor'
                };
            }

            Hippo.Targeting.GroupsCollectorPlugin.superclass.constructor
                                           .call(this, Ext.apply(config, {
                editor: editor,
                renderer: this.renderGroups
            }));
        },

        renderGroups: function(value) {
            var groups = value ? value.groups: [];
            if (Ext.isEmpty(groups)) {
                return this.resources['no-groups'];
            }
            return groups.join(', ');
        }

    });

    Hippo.Targeting.GroupsTargetingDataEditor =
                    Ext.extend(Hippo.Targeting.TargetingDataCheckboxGroup, {

        constructor: function(config) {
            var checkboxes = [];
            Ext.each(config.groups, function(group) {
                checkboxes.push({
                    boxLabel: group,
                    name: group
                });
            });
            Hippo.Targeting.GroupsTargetingDataEditor.superclass
                                .constructor.call(this, Ext.apply(config, {
                columns: 2,
                items: checkboxes,
                vertical: true
            }));
        },

        convertDataToCheckedArray: function(data) {
            var checkedArray = this.createBooleanArray(this.checkboxNames
                                                                   .length);

            if (!Ext.isEmpty(data.groups)) {

                Ext.each(data.groups, function(dataItem) {
                    var index = this.checkboxNames.indexOf(dataItem);
                    if (index >= 0) {
                        checkedArray[index] = true;
                    }
                }, this);
            }

            return checkedArray;
        },

        convertCheckedBoxesToData: function(checkedBoxes) {
            var checkedIds = Ext.pluck(checkedBoxes, 'name');
            return {
                collectorId: this.collector,
                groups: checkedIds
            };
        }

    });
    Ext.reg('Hippo.Targeting.GroupsTargetingDataEditor',
             Hippo.Targeting.GroupsTargetingDataEditor);
}());

The Javascript constructor specifies an editor and a renderer. The editor is the component used for editing the targeting data. In this case the editor is a checkbox group, but any Ext.form.Field is possible. The default editor is a textfield. The renderer is a function that converts the data returned by the collector to a value shown in the 'Edit Alter Ego' window. The groups renderer function joins the group names with commas, except when the list is empty, in which case it shows the 'no-groups' label.

Java API

com.onehippo.cms7.targeting.frontend.plugin.CollectorPlugin

Base class for collector plugins.

Plugin configuration properties:

  • collector (String, mandatory) The ID of the collector

  • plugin.class (String, mandatory) The Java class name of the collector plugin

com.onehippo.cms7.targeting.frontend.plugin.dayofweek.DayOfWeekCollectorPlugin

Plugin to alter the current day of the week.

com.onehippo.cms7.targeting.frontend.plugin.geo.GeoIPCollectorPlugin

Plugin to alter the location of the visitor.

Plugin configuration properties:

  • locations (multiple String) A list of location strings to show as selectable options in the editor. Each location string has the format "city | country | latitude | longitude".
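As an illustration of the location string format, the sketch below splits such a string into its four parts. It is not the GeoIPCollectorPlugin's own parsing code; the class `LocationSketch` is hypothetical, the sample coordinates are made up for the example, and trimming the whitespace around each '|' is an assumption.

```java
// Hypothetical value class for one entry of the 'locations' property.
final class LocationSketch {
    final String city;
    final String country;
    final double latitude;
    final double longitude;

    private LocationSketch(String city, String country, double latitude, double longitude) {
        this.city = city;
        this.country = country;
        this.latitude = latitude;
        this.longitude = longitude;
    }

    /** Parses "city | country | latitude | longitude"; whitespace around '|' is ignored. */
    static LocationSketch parse(String location) {
        String[] parts = location.split("\\|");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Expected 4 fields but got " + parts.length);
        }
        return new LocationSketch(
                parts[0].trim(),
                parts[1].trim(),
                Double.parseDouble(parts[2].trim()),
                Double.parseDouble(parts[3].trim()));
    }
}
```

For example, `LocationSketch.parse("Amsterdam | Netherlands | 52.37 | 4.89")` yields the city "Amsterdam" with latitude 52.37.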

com.onehippo.cms7.targeting.frontend.plugin.groups.GroupsCollectorPlugin

Plugin to alter the groups a visitor is a member of.

Plugin configuration properties:

  • excludes (multiple String) A list of regular expression patterns for group names to exclude from the selectable options in the editor.
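The effect of the excludes patterns can be tried out with the standalone sketch below. It mirrors the isExcluded logic of the simplified GroupsCollectorPlugin listing earlier on this page, including the hard-coded exclusion of the 'everybody' group, but the class `GroupExcludesSketch` and the sample group names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper that applies 'excludes' patterns to a list of group names.
final class GroupExcludesSketch {

    /** Returns the groups that remain selectable after applying the excludes. */
    static List<String> selectable(List<String> groups, List<String> excludes) {
        List<Pattern> patterns = new ArrayList<>();
        for (String exclude : excludes) {
            patterns.add(Pattern.compile(exclude));
        }
        List<String> result = new ArrayList<>();
        for (String group : groups) {
            boolean excluded = group.equals("everybody");
            for (Pattern pattern : patterns) {
                // matches() requires the pattern to match the whole group name
                if (pattern.matcher(group).matches()) {
                    excluded = true;
                }
            }
            if (!excluded) {
                result.add(group);
            }
        }
        return result;
    }
}
```

Because the whole name must match, the pattern `admin.*` excludes 'admin' and 'admins' but leaves 'webadmin' selectable.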

com.onehippo.cms7.targeting.frontend.plugin.referrer.ReferrerCollectorPlugin

Plugin to alter the referrer URL.

com.onehippo.cms7.targeting.frontend.plugin.returningvisitor.ReturningVisitorCollectorPlugin

Plugin to alter whether the visitor is new or returning.

Javascript API

Hippo.Targeting.CollectorPlugin

Base class for collector plugins. A collector plugin can define its own renderer and/or editor for targeting data.

Extends: Ext.util.Observable

Properties:

  • renderer (Mixed) Optional interceptor method that transforms the targeting data string to rendered data. See Ext.grid.Column.renderer for details.

  • editor (Ext.form.Field) Optional form field for editing the targeting data string.

Hippo.Targeting.TargetingDataCheckboxGroup

Checkbox group for editing targeting data. The default implementation iterates over a configurable property in the targeting data and assumes each element is the name of a checkbox in the group. The names of all checked checkboxes are again converted to an array and set in the targeting data. Subclasses can provide their own implementation of the methods convertDataToCheckedArray and convertCheckedBoxesToData to customize this behavior.

Extends: Ext.form.CheckboxGroup

Properties:

  • targetingDataProperty (String) The property in the targeting data object to iterate over. Must be serialized as a JSON array.

Methods:

  • convertDataToCheckedArray(Object targetingData) : Array
    Converts the targeting data to an array of booleans that indicates which checkboxes should be checked. The default implementation iterates over a configurable property of the targeting data and assumes each element is the name of a checkbox in the group.

    Parameters:
    targetingData (Object): the targeting data object serialized to JSON

    Returns:
    An array of booleans. The Nth boolean indicates whether the Nth checkbox should be checked or not.

  • convertCheckedBoxesToData(Array checkedBoxes) : Object
    Converts an array of Ext.form.Checkbox objects to a targeting data object. The default implementation adds the name of each checked box to an array and sets that array in the configured targeting data property.

    Parameters:
    checkedBoxes (Array): an array of Ext.form.Checkbox objects that are currently checked.

    Returns:
    A targeting data object

Hippo.Targeting.TargetingDataMessage

'Editor' for targeting data that only displays a string. Useful for only displaying a 'no options available' message instead of the normal editor.

Extends: Ext.form.DisplayField

Properties:

  • message (String)
    The message to show.

Hippo.Targeting.TargetingDataRadioGroup

Radio group for editing targeting data. The default implementation assumes that the targeting data string is the inputValue of the radio button in the group to select. Subclasses can provide their own implementation of the methods convertDataToInputValue and getValue to customize this behavior. Note that all radio buttons should have the same 'name' property to make them mutually exclusive. Also, commas in the radio button input values lead to incorrect behavior, so avoid those.

Extends: Ext.form.RadioGroup

Methods:

  • convertDataToInputValue(String data) : String
    Converts the targeting data string to the inputValue of the radio button that should be selected. The default implementation returns the data string as-is.

    Parameters:
    data (String): the data string as returned by the targeting data of the collector

    Returns:
    The input value of the radio button to select.

  • getValue(): String

    Returns the targeting data string that reflects the selected radio button. The default implementation returns the inputValue of the selected radio button, or an empty string if no radio button is selected.

 

This product includes GeoLite2 data created by MaxMind, available from http://www.maxmind.com.