This article covers Bloomreach Experience Manager version 13. There's an updated version available that covers our most recent release.

Derived Data

Description of the problem

Derived data are properties that are automatically calculated and set on a document during a session save. An example of derived data is the size of some (e.g. binary) property of a node. Such derived data might have to be stored on the node itself.

Since you don't control all the places where such properties might be set, the derived data needs to be calculated and set implicitly to prevent inconsistent data.

Another reason for this functionality is that the available query languages cannot express every realistic query. For example, XPath does not allow you to query for documents that have two properties equal to each other. Naively this could be written as //*[@a=@b], but this yields no results even when matching documents exist. Certain other queries are possible but have a huge performance impact. These are deliberate limitations of the query languages XPath and JCR-SQL, not bugs.
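
To illustrate the idea, the comparison that the query cannot express can be precomputed into a single flag that a query tests directly. The following is a minimal plain-Java sketch without any repository dependency. The class EqualityFlag, the property name sample:equal and the query string are hypothetical and only show what such a setup might look like.

```java
import java.util.Objects;

// Hypothetical helper, not part of the repository API: it precomputes the
// comparison that cannot be expressed as //*[@a=@b].
final class EqualityFlag {

    // Assumed shape of a query against the stored flag; sample:equal is a
    // made-up property name used for illustration only.
    static final String QUERY = "//*[@sample:equal='true']";

    // Returns the value that a derived boolean property would get for the
    // two given property values (null-safe comparison).
    static boolean derive(String a, String b) {
        return Objects.equals(a, b);
    }
}
```

With the flag stored on the node, the query only has to test one property instead of comparing two.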

Facility offered

As a solution for expressing efficient queries and for accessing information about the content without having to know and execute the procedure to obtain the data, Hippo Repository has the capability of triggering derived data functions. A derived data function computes properties that are derived from other properties of the document. Derived properties may be put on and computed from the JCR node that represents a document, or on any descendant node in a document.

When editing a document that should contain such a derived property, you should not set the value of the derived property yourself. Instead, the repository automatically computes the value of the property during save(). Because the repository guarantees to recompute the property upon a save, the data will always be up to date.

In order for the repository to do this, it must be informed when and how to compute the properties.

  • "when" is determined by the JCR nodetype of the data. The repository can be configured to compute a property of a certain node type.
  • "how" to compute a property is to be implemented by a class implementing the derived data function interface.

Usage

We will outline how to define, configure and use derived data functions based on a simple example that applies the Pythagorean theorem.

Defining the data for which to compute properties

We define a document type that is a core shape definition:

[sample:shape] > hippo:document
- sample:a (double)
- sample:b (double)

Subsequently, we define a mixin type that can be added to the shape definition to indicate that the shape is a triangle:

[sample:triangle] > hippo:derived mixin
- sample:c (double)

To indicate that certain properties of the type sample:triangle are to be computed as derived data, the type must extend the hippo:derived mixin node type.

Configuring the repository to compute derived properties for this data

Now we need to configure in the repository how to compute the derived property of sample:triangle. These procedures are defined in the JCR repository under /hippo:configuration/hippo:derivatives. To compute the c property we can enter the following JCR definition:

/hippo:configuration:
  /hippo:derivatives:
    jcr:primaryType: hipposys:derivativesfolder
    /pythagorean:
      jcr:primaryType: hipposys:deriveddefinition
      hipposys:nodetype: sample:triangle
      hipposys:classname: sample.PythagoreanTheorem
      hipposys:serialver: 1
      /hippo:accessed:
        jcr:primaryType: hipposys:propertyreferences
        /a:
          jcr:primaryType: hipposys:relativepropertyreference
          hipposys:relPath: sample:a
        /b:
          jcr:primaryType: hipposys:relativepropertyreference
          hipposys:relPath: sample:b
      /hippo:derived:
        jcr:primaryType: hipposys:propertyreferences
        /c:
          jcr:primaryType: hipposys:relativepropertyreference
          hipposys:relPath: sample:c

First, the hipposys:nodetype property defines the node type that contains the properties to be derived. Whenever a node of this type changes, this derived data definition indicates the function to be executed.

The hipposys:classname property contains the name of a class that must extend the base class org.hippoecm.repository.ext.DerivedDataFunction. The class PythagoreanTheorem must have a public no-argument constructor. The number stated in the hipposys:serialver property should match the serialVersionUID field in the implementing class sample.PythagoreanTheorem.

The hippo:accessed and hippo:derived node structures define the input and output parameters of the derived data function. Here we indicate that, relative to the node of type sample:triangle, there are two input properties: sample:a and sample:b. The hipposys:relPath properties hold the path of each property relative to the subject node for which the computation takes place. The values of these two properties are passed under the keys "a" and "b" (the names of the hipposys:relativepropertyreference nodes) in the Map that the compute method implemented by PythagoreanTheorem takes as input:

public Map<String,Value[]> compute(Map<String,Value[]> parameters);

As a result, the compute method should return a map in which the value for the derived property sample:c can be found under the key "c". The definition also lists the (possibly multiple) results computed by the function as nodes under hippo:derived. The hipposys:relPath again indicates the relative path to the property. It may point to any property below the document for which properties are computed, but it may not contain references to other documents.

Supplying the method that computes the derived property

The configuration indicates which class should be used to compute the data. This class must extend the org.hippoecm.repository.ext.DerivedDataFunction base class and implement the compute method. Since derived data is a repository function, add this class to the cms module of your project and not to the site module.

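package sample;

import java.util.Map;

import javax.jcr.RepositoryException;
import javax.jcr.Value;

import org.hippoecm.repository.ext.DerivedDataFunction;

public class PythagoreanTheorem extends DerivedDataFunction {
  // Must match hipposys:serialver in the derived data definition.
  private static final long serialVersionUID = 1L;

  public Map<String,Value[]> compute(Map<String,Value[]> parameters) {
    try {
      // "a" and "b" are the names of the nodes under hippo:accessed.
      double a = parameters.get("a")[0].getDouble();
      double b = parameters.get("b")[0].getDouble();
      double c = Math.sqrt(a * a + b * b);
      // "c" is the name of the node under hippo:derived.
      parameters.put("c", new Value[] { getValueFactory().createValue(c) });
    } catch (RepositoryException e) {
      // Value.getDouble() declares checked exceptions; fail loudly here.
      throw new IllegalStateException("sample:a and sample:b must be doubles", e);
    }
    return parameters;
  }
}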

This class can be packaged in a normal plug-in. The properties are recomputed upon any change, with one exception due to a current limitation: imported data is not recomputed and must already be correct.
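
The computation itself can be tried out without a repository. The following plain-Java sketch mirrors the compute step with double[] standing in for javax.jcr.Value[]. The class PythagoreanSketch is hypothetical, and the handling of missing input is an assumption: the sketch supposes that an unset input property shows up as a missing key or an empty array, which this page does not specify.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java stand-in for the compute step: double[] replaces
// javax.jcr.Value[] so the logic runs without the repository libraries.
final class PythagoreanSketch {

    static Map<String, double[]> compute(Map<String, double[]> parameters) {
        Map<String, double[]> result = new HashMap<>(parameters);
        double[] a = parameters.get("a");
        double[] b = parameters.get("b");
        // Assumption: an unset input appears as a missing key or an empty
        // array; in that case no value for "c" is returned.
        if (a == null || b == null || a.length == 0 || b.length == 0) {
            result.remove("c");
            return result;
        }
        result.put("c", new double[] { Math.sqrt(a[0] * a[0] + b[0] * b[0]) });
        return result;
    }
}
```

Guarding against missing input in this way avoids a NullPointerException when only one of the two sides has been filled in.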

Deriving Data From Another Node

As stated above, derived properties may be put on and computed from the JCR node that represents a document, or on any descendant node in a document. In some use cases this is not sufficient. Take for example the following typical node structure representing a document:

/document:
  jcr:primaryType: hippo:handle
  hippo:name: "Pretty Name"
  /document:
    jcr:primaryType: myproject:newsdocument
    myproject:title: "Pretty Name"
    hippostd:state: draft

(Nodes and properties not relevant to the example are left out.)

There is a hippo:handle node named document with one myproject:newsdocument child node of the same name, representing the draft variant of the document. In addition, the hippo:handle node has a property hippo:name (from the hippo:named mixin) holding the "pretty name" of the document.

The document's pretty name is entered by the user in the new document dialog when creating a document. Suppose you want to store the same pretty name in the myproject:title property of the new document draft so that the user does not have to enter it again. A derived data function would be a convenient way to implement this. However, the pretty name is not stored on the document node or one of its descendants, but rather on its parent node (the handle), so a regular hipposys:relativepropertyreference node can't be used. In such a use case you can use a hipposys:resolvepropertyreference node and reference the parent node's property as ../hippo:name.

/hippo:configuration:
  /hippo:derivatives:
    jcr:primaryType: hipposys:derivativesfolder
    /title:
      jcr:primaryType: hipposys:deriveddefinition
      hipposys:nodetype: myproject:newsdocument
      hipposys:classname: org.example.NewsDocumentTitle
      hipposys:serialver: 1
      /hippo:accessed:
        jcr:primaryType: hipposys:propertyreferences
        /message:
          jcr:primaryType: hipposys:resolvepropertyreference
          hipposys:relPath: ../hippo:name
      /hippo:derived:
        jcr:primaryType: hipposys:propertyreferences
        /title:
          jcr:primaryType: hipposys:relativepropertyreference
          hipposys:relPath: myproject:title
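
The configuration above refers to a class org.example.NewsDocumentTitle that is not shown on this page. As a rough sketch of what its compute step might do, the following plain-Java stand-in copies the value found under the key "message" to the key "title". String[] replaces javax.jcr.Value[] so that the sketch runs without the repository libraries, and the class name NewsDocumentTitleSketch is hypothetical. A real implementation would extend DerivedDataFunction as described earlier.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the logic of org.example.NewsDocumentTitle: String[] replaces
// javax.jcr.Value[] so the sketch runs without the repository libraries.
final class NewsDocumentTitleSketch {

    static Map<String, String[]> compute(Map<String, String[]> parameters) {
        Map<String, String[]> result = new HashMap<>(parameters);
        // "message" is the node under hippo:accessed; it resolves ../hippo:name.
        String[] name = parameters.get("message");
        if (name != null && name.length > 0) {
            // "title" is the node under hippo:derived; it writes myproject:title.
            result.put("title", new String[] { name[0] });
        }
        return result;
    }
}
```

Because derived properties are recomputed on save, a function written like this keeps myproject:title in line with the handle's hippo:name rather than setting it only once.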

Enforce Generating Multiple-Value Properties

This feature is available since Bloomreach Experience Manager 13.4.3.

When a derived data function runs for the first time on a document (for example when a new document is saved for the first time), new, derived properties will be written to the document node. The derived data function returns, for each such property, an array of Value instances, even for properties intended to be single-valued. 

By default, the derived data engine assumes that the new property is intended to be single-valued, so it creates a single-valued property and stores only the first Value instance from the returned array.

Forcing the created property to be multi-valued is possible through the (document) node type definition in the CND. Specifically, the derived properties can be registered with the 'multiple' modifier to mark them as multiple:

[myproject:mytypewithderiveddata] > hippo:document, hippostd:relaxed
  - myproject:mymultiplederivedproperty (string) multiple

This approach is not recommended, however, especially when using relaxed CNDs. In general it is advisable to keep node type definitions as simple as possible.

Since 14.2.0, 14.1.1, and 13.4.3, a new approach exists for marking derived properties as multiple. The boolean property hipposys:multivalue is available, which can be used on output (under /hipposys:derived) hipposys:relativepropertyreference nodes:

/myderiveddatafunction:
  ...
  /hipposys:accessed:
    ...
  /hipposys:derived:
    jcr:primaryType: hipposys:propertyreferences
    /mymultiplederivedproperty:
      jcr:primaryType: hipposys:relativepropertyreference
      hipposys:relPath: myproject:mymultiplederivedproperty
      hipposys:multivalue: true

The new configuration property is only applicable for new properties created via the derived data function. In other words, adding or changing the multivalue flag, during the lifetime of a project, will not result in changing existing properties on documents (from single to multiple or vice versa). This is also the case when the CND approach is used. In general, changing this behaviour in a running environment (via either approach) can result in inconsistencies in a project's content, where the same derived property is single in some documents and multiple in others.
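
The difference between the default behaviour and the multivalue flag for a newly created property can be pictured with the following plain-Java model. It is an illustration only and not the engine's code: the class CardinalitySketch is hypothetical, String stands in for javax.jcr.Value, and the handling of an empty result array is an assumption.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative model only, not the engine's code: which of the values
// returned by a derived data function end up in a newly created property.
final class CardinalitySketch {

    static List<String> valuesToStore(String[] computed, boolean multivalue) {
        if (multivalue) {
            // Multi-valued property: all returned values are kept.
            return Arrays.asList(computed);
        }
        if (computed.length == 0) {
            // Assumption: nothing to store when the function returns no values.
            return Collections.emptyList();
        }
        // Default: single-valued property holding only the first value.
        return Collections.singletonList(computed[0]);
    }
}
```

The model covers new properties only, since existing properties keep the cardinality they were created with.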
