Derived Data - BloomReach Experience - Open Source CMS

Derived Data

Description of the problem

Derived data are properties that are automatically calculated and set on a document during a session save. An example of derived data is the size of some (e.g. binary) property of a node. Such derived data might have to be stored on the node itself.

Since you don't control all the places where such properties might be set, the derived data needs to be calculated and set implicitly to prevent inconsistent data.

Another reason for this functionality is that the available query languages cannot express every realistic query. For example, XPath does not allow you to query for documents that have two properties equal to each other. Naively this could be written as //*[@a=@b], but this yields no results even when matching documents exist. Certain other queries are possible but have a huge performance impact. These are deliberate limitations of the query languages XPath and JCR-SQL, not bugs.
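
One way to picture how a derived property sidesteps the //*[@a=@b] limitation is to store the outcome of the comparison as a property of its own, so that a query only has to test that single property. The sketch below shows the idea in plain Java, with a Map standing in for a node's properties. The class name and the property name aEqualsB are hypothetical and not part of the repository API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class EqualityFlagSketch {
    // Hypothetical name for the derived property; not defined by the repository.
    static final String DERIVED = "aEqualsB";

    // Returns a copy of the properties with the derived flag added.
    static Map<String, Object> derive(Map<String, Object> properties) {
        Map<String, Object> result = new HashMap<>(properties);
        Object a = properties.get("a");
        Object b = properties.get("b");
        // The flag is true only when both properties exist and hold equal values.
        result.put(DERIVED, a != null && Objects.equals(a, b));
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> node = new HashMap<>();
        node.put("a", 2.0);
        node.put("b", 2.0);
        System.out.println(derive(node).get(DERIVED)); // prints true
    }
}
```

A query can then test the one derived property instead of comparing two properties with each other.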

Facility offered

As a solution for expressing efficient queries and for accessing information about the content without having to know and execute the procedure to obtain the data, Hippo Repository has the capability of triggering derived data functions. A derived data function computes properties that are derived from other properties of the document. Derived properties may be put on and computed from the JCR node that represents a document, or on any descendant node in a document.

When editing a document that should contain such a derived property, do not set the value of the derived property yourself. Instead, the repository automatically computes the value of the property during save(). Because the repository guarantees to recompute the property upon a save, the data will always be up to date.

In order for the repository to do this, it must be informed when and how to compute the properties.

  • "when" is determined by the JCR nodetype of the data. The repository can be configured to compute a property of a certain node type.
  • "how" to compute a property is to be implemented by a class implementing the derived data function interface.
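
The split between "when" and "how" can be pictured as a lookup from node type to function that is consulted at save time. The following is only a conceptual model in plain Java, not the repository's implementation: the class, the node type example:rectangle and the property names are all hypothetical, and a Map of doubles stands in for a node's properties.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

final class DerivationRegistrySketch {
    // "when": node type name, mapped to "how": the function that recomputes properties.
    private final Map<String, UnaryOperator<Map<String, Double>>> functions = new HashMap<>();

    void register(String nodeType, UnaryOperator<Map<String, Double>> function) {
        functions.put(nodeType, function);
    }

    // Stand-in for a save: applies the function registered for the node's type, if any.
    Map<String, Double> save(String nodeType, Map<String, Double> properties) {
        UnaryOperator<Map<String, Double>> function = functions.get(nodeType);
        return function == null ? properties : function.apply(new HashMap<>(properties));
    }

    public static void main(String[] args) {
        DerivationRegistrySketch registry = new DerivationRegistrySketch();
        // Hypothetical node type whose "area" is derived from "width" and "height".
        registry.register("example:rectangle", p -> {
            p.put("area", p.get("width") * p.get("height"));
            return p;
        });
        Map<String, Double> node = new HashMap<>();
        node.put("width", 2.0);
        node.put("height", 3.0);
        System.out.println(registry.save("example:rectangle", node).get("area")); // prints 6.0
    }
}
```

Nodes of a type without a registered function pass through the save unchanged, which mirrors the idea that only configured node types trigger a computation.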


We will outline how to define, configure and use derived data functions with a simple example that applies the Pythagorean theorem.

Defining the data for which to compute properties

We define a document type that is a core shape definition:

[sample:shape] > hippo:document
- sample:a (double)
- sample:b (double)

And subsequently a definition that can be added as mixin type to the shape definition to indicate the shape is a triangle:

[sample:triangle] > hippo:derived mixin
- sample:c (double)

To indicate that certain properties of the type sample:triangle are to be computed as derived data, the type must extend the hippo:derived mixin node type.

Configuring the repository to compute derived properties for this data

Now we need to configure in the repository how to compute the derived property of sample:triangle. These procedures are defined in the JCR repository under /hippo:configuration/hippo:derivatives. To compute the sample:c property we can enter the following JCR definition:

    jcr:primaryType: hipposys:derivativesfolder
      jcr:primaryType: hipposys:deriveddefinition
      hipposys:nodetype: sample:triangle
      hipposys:classname: sample.PythagoreanTheorem
      hipposys:serialver: 1
        jcr:primaryType: hipposys:propertyreferences
          jcr:primaryType: hipposys:relativepropertyreference
          hipposys:relPath: sample:a
          jcr:primaryType: hipposys:relativepropertyreference
          hipposys:relPath: sample:b
        jcr:primaryType: hipposys:propertyreferences
          jcr:primaryType: hipposys:relativepropertyreference
          hipposys:relPath: sample:c

First, the hipposys:nodetype property defines the node type that contains the properties to be derived. For any change to nodes of this type, this derived data definition indicates the function to be executed.

The hipposys:classname property contains the name of a class that must extend the base class org.hippoecm.repository.ext.DerivedDataFunction. The class PythagoreanTheorem must have a public no-argument constructor. The number stated in the hipposys:serialver property should match the serialVersionUID field in the implementing class sample.PythagoreanTheorem.

The definitions in the hippo:accessed and hippo:derived node structures indicate the input and output parameters of the derived data function. Here we indicate that, relative to the node of type sample:triangle, there are two input properties: sample:a and sample:b. The hipposys:relPath properties give the path of each property relative to the subject node for which the computation takes place. The values of these two properties are entered under the keys "a" and "b" (the names of the hipposys:relativepropertyreference nodes) in a Map that the compute method implemented by PythagoreanTheorem takes as input:

public Map<String,Value[]> compute(Map<String,Value[]> parameters);

As a result, the compute method should return a map in which the key "c" holds the value for the derived property sample:c. The definition also lists the (possibly multiple) results computed by the function as nodes under hippo:derived. The hipposys:relPath again indicates the relative path to the property. It may indicate any property below the document for which properties are computed, but it may not contain references to other documents.

Supplying the method that computes the derived property

The configuration indicates which class should be used to compute the data. This class must extend the org.hippoecm.repository.ext.DerivedDataFunction base class and implement the compute method. Since derived data is a repository function, add this class to the cms module of your project and not to the site module.

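package sample;

import java.util.Map;

import javax.jcr.RepositoryException;
import javax.jcr.Value;

import org.hippoecm.repository.ext.DerivedDataFunction;

public class PythagoreanTheorem extends DerivedDataFunction {
  static final long serialVersionUID = 1;

  @Override
  public Map<String,Value[]> compute(Map<String,Value[]> parameters) {
    try {
      // Inputs arrive under the names of the input property reference nodes.
      double a = parameters.get("a")[0].getDouble();
      double b = parameters.get("b")[0].getDouble();
      double c = Math.sqrt(a * a + b * b);
      // The result is returned under the name of the output property reference node.
      parameters.put("c", new Value[] { getValueFactory().createValue(c) });
    } catch (RepositoryException e) {
      // Value.getDouble() declares checked exceptions; no value is put for "c" in that case.
    }
    return parameters;
  }
}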

This class can be packaged in a normal plug-in. The properties are recomputed upon any change, with one exception due to a current limitation: imported data is not recomputed and must already be correct.
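
To try out the map contract of the compute method without a running repository, the arithmetic and the key names can be exercised in plain Java. This is a stand-in sketch only: double[] replaces javax.jcr.Value[], and the class name is hypothetical. It assumes the keys "a", "b" and "c" described above.

```java
import java.util.HashMap;
import java.util.Map;

final class ComputeContractSketch {
    // Same shape as compute(), with double[] standing in for javax.jcr.Value[].
    static Map<String, double[]> compute(Map<String, double[]> parameters) {
        double a = parameters.get("a")[0];
        double b = parameters.get("b")[0];
        // The result goes under the key "c", alongside the inputs.
        parameters.put("c", new double[] { Math.sqrt(a * a + b * b) });
        return parameters;
    }

    public static void main(String[] args) {
        Map<String, double[]> parameters = new HashMap<>();
        parameters.put("a", new double[] { 3.0 });
        parameters.put("b", new double[] { 4.0 });
        System.out.println(compute(parameters).get("c")[0]); // prints 5.0
    }
}
```

Each map entry holds an array because a property may be multi-valued; the example reads and writes only the first element.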

Deriving Data From Another Node

As stated above, derived properties may be put on and computed from the JCR node that represents a document, or on any descendant node in a document. In some use cases this is not sufficient. Take for example the following typical node structure representing a document:

  jcr:primaryType: hippo:handle
  hippo:name: "Pretty Name"
    jcr:primaryType: myproject:newsdocument
    myproject:title: "Pretty Name"
    hippostd:state: draft

(nodes and properties not relevant to the example are left out)

There is a hippo:handle node with one myproject:newsdocument child node of the same name, representing the draft variant of the document. In addition, the hippo:handle node has a property hippo:name (from the hippo:named mixin) holding the "pretty name" of the document.

The document's pretty name is entered by the user in the new document dialog when creating a document. Suppose you want to store the same pretty name in the myproject:title property of the new document draft so that the user does not have to enter it again. A derived data function would be a convenient way to implement this. However, the pretty name is not stored on the document node or one of its descendants, but on its parent node (the handle), so a regular hipposys:relativepropertyreference node can't be used. In such a use case you can use a hipposys:resolvepropertyreference node and reference the parent node's property as ../hippo:name:

    jcr:primaryType: hipposys:derivativesfolder
      jcr:primaryType: hipposys:deriveddefinition
      hipposys:nodetype: myproject:newsdocument
      hipposys:classname: org.example.NewsDocumentTitle
      hipposys:serialver: 1
        jcr:primaryType: hipposys:propertyreferences
          jcr:primaryType: hipposys:resolvepropertyreference
          hipposys:relPath: ../hippo:name
        jcr:primaryType: hipposys:propertyreferences
          jcr:primaryType: hipposys:relativepropertyreference
          hipposys:relPath: myproject:title
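
To make the effect of the ../ step concrete, the sketch below resolves a relative path against a node path using ordinary path rules. It is a plain Java illustration, not the repository's resolution code; the class name and the document path used in the usage note are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class RelPathSketch {
    // Resolves a relative path against an absolute node path; ".." steps to the parent.
    static String resolve(String nodePath, String relPath) {
        Deque<String> segments = new ArrayDeque<>();
        for (String s : nodePath.split("/")) {
            if (!s.isEmpty()) {
                segments.addLast(s);
            }
        }
        for (String s : relPath.split("/")) {
            if (s.equals("..")) {
                segments.pollLast();
            } else if (!s.isEmpty() && !s.equals(".")) {
                segments.addLast(s);
            }
        }
        return "/" + String.join("/", segments);
    }
}
```

For a hypothetical draft variant at /content/news/article/article, resolving ../hippo:name gives /content/news/article/hippo:name, the property on the handle, while myproject:title stays on the variant itself.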