Write an Updater Script - BloomReach Experience - Open Source CMS

This article covers Hippo CMS version 12. There's an updated version available that covers our most recent release.

02-08-2018

Write an Updater Script

Introduction

Goal

Write a Groovy Updater Script to perform bulk changes to repository content.

Background 

To perform bulk changes to existing content in a running repository, developers can write updater scripts in the Groovy language. Updater scripts have access to the full JCR API.

With Great Power Comes Great Responsibility

Updater scripts can modify large parts of your repository. Use them with care.

Security

The scripts are executed via a custom Groovy ClassLoader, which protects against obvious and trivial mistakes and misuse (for example, invoking System.exit()). However, this is not intended to provide a fully protected Groovy sandbox: technically, Groovy updater scripts can still be used to execute external programs, possibly compromising the server environment.
Therefore, protection against incorrect usage of Groovy updater scripts must be enforced by limiting access and usage to trusted developers and administrators only.

Create a New Script

Log into the CMS as admin.

Open the Admin perspective, select Updater Editor, and click on the New button.

Enter a Name for the script.

All other options are execution options; see Run an Updater Script for more information.

Implement NodeUpdateVisitor

Updater scripts are written in Groovy and must implement the interface NodeUpdateVisitor:

/**
 * Visitor for updating repository content. Replaces
 * {@link org.hippoecm.repository.ext.UpdaterModule}s for all update tasks
 * except backward incompatible node type changes.
 */
public interface NodeUpdateVisitor {

    /**
     * Allows initialization of this updater. Called before any other method is
     * called.
     *
     * @param session a JCR {@link Session} with system credentials
     * @throws RepositoryException when thrown, the updater will not be run by
     *         the framework
     */
    void initialize(Session session) throws RepositoryException;

    /**
     * Update the given node.
     *
     * @param node  the {@link Node} to be updated
     * @return  <code>true</code> if the node was changed, <code>false</code>
     *          if not
     * @throws RepositoryException  if an exception occurred while updating
     *         the node
     */
    boolean doUpdate(Node node) throws RepositoryException;

    /**
     * Revert the given node. This method is intended to be the reverse of the
     * {@link #doUpdate} method.
     * It allows update runs to be reverted in case a problem arises due to the
     * update. The method should throw an {@link UnsupportedOperationException}
     * when it is not implemented.
     *
     * @param node  the node to be reverted.
     * @return  <code>true</code> if the node was changed, <code>false</code>
     *          if not
     * @throws RepositoryException  if an exception occurred while reverting
     *         the node
     * @throws UnsupportedOperationException if the method is not implemented
     */
    boolean undoUpdate(Node node) throws RepositoryException,
                                         UnsupportedOperationException;

    /**
     * Allows cleanup of resources held by this updater. Called after an
     * updater run was completed.
     */
    void destroy();

}

Most scripts will extend the base class BaseNodeUpdateVisitor, which provides a logger and default (no-op) implementations of the methods initialize and destroy.

The updater engine uses the visitor pattern. For each visited node, the updater engine will call the script method doUpdate. When the script modifies the node in any way, it should notify the updater engine by returning true from that method.

The default updater script only logs the paths of all visited nodes:

package org.hippoecm.frontend.plugins.cms.admin.updater

import org.onehippo.repository.update.BaseNodeUpdateVisitor
import javax.jcr.Node
import javax.jcr.RepositoryException
import javax.jcr.Session

class UpdaterTemplate extends BaseNodeUpdateVisitor {

  boolean logSkippedNodePaths() {
    return false // don't log skipped node paths
  }

  boolean skipCheckoutNodes() {
    return false // return true for readonly visitors and/or updates unrelated to versioned content
  }

  Node firstNode(final Session session) throws RepositoryException {
    return null // implement when using custom node selection/navigation
  }

  Node nextNode() throws RepositoryException {
    return null // implement when using custom node selection/navigation
  }

  boolean doUpdate(Node node) {
    log.debug "Updating node ${node.path}"
    return false
  }

  boolean undoUpdate(Node node) {
    throw new UnsupportedOperationException('Updater does not implement undoUpdate method')
  }

}

The node parameter is a javax.jcr.Node object, which gives the script full JCR access to the repository.

See example 1 (Add a property) at Groovy Updater Scripts Examples for a basic implementation.
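
As a minimal sketch of the doUpdate contract described above, the following script trims surrounding whitespace from a string property and returns true only when it actually changed the node. The class name and the property name myhippoproject:title are hypothetical and used for illustration only; the sketch assumes the property is single-valued and defined by the visited node type.

```groovy
package org.hippoecm.frontend.plugins.cms.admin.updater

import org.onehippo.repository.update.BaseNodeUpdateVisitor
import javax.jcr.Node

class TrimTitleUpdater extends BaseNodeUpdateVisitor {

  // Hypothetical property name, used for illustration only
  static final String PROPERTY = "myhippoproject:title"

  boolean doUpdate(Node node) {
    if (!node.hasProperty(PROPERTY)) {
      return false // nothing to change on this node
    }
    def property = node.getProperty(PROPERTY)
    if (property.isMultiple()) {
      return false // this sketch only handles single-valued properties
    }
    String current = property.getString()
    String trimmed = current.trim()
    if (current == trimmed) {
      return false // value is already clean, node is left untouched
    }
    node.setProperty(PROPERTY, trimmed)
    log.debug "Trimmed ${PROPERTY} on ${node.path}"
    return true // tell the updater engine that this node was modified
  }

  boolean undoUpdate(Node node) {
    // The original whitespace is not recorded, so this change cannot be reverted
    throw new UnsupportedOperationException('Updater does not implement undoUpdate method')
  }

}
```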

Implement Optional Features

Parameters

If your updater script can be reused multiple times without modification of the source, it is useful to set parameters and let your script read the parameters instead of using hard-coded values.

Parameters can be specified in the execution options as a valid JSON string which defines a map of parameter name (String) and parameter value (Object) pairs.

In your script, you can access the parameters through the parametersMap variable. For example, if you set Parameters to { "basePath": "/content/documents/myhippoproject/news", "tag" : "gogreen" }, you can access those parameters anywhere in your updater script (e.g., in the initialize(Session) or doUpdate(Node) method) as follows:

def basePath = parametersMap["basePath"]
def tag = parametersMap["tag"]
log.debug "basePath: ${basePath}, tag: ${tag}"
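
The snippet above can be fleshed out into a reusable script along the following lines. This is a sketch under assumptions, not a reference implementation: the class name and the single-valued string property myhippoproject:tag are hypothetical, and the null-safe access to parametersMap is a defensive guess for the case where no parameters are configured. It relies on the documented behavior that a RepositoryException thrown from initialize prevents the updater from being run.

```groovy
package org.hippoecm.frontend.plugins.cms.admin.updater

import org.onehippo.repository.update.BaseNodeUpdateVisitor
import javax.jcr.Node
import javax.jcr.RepositoryException
import javax.jcr.Session

class TagUnderBasePathUpdater extends BaseNodeUpdateVisitor {

  // Hypothetical property name, used for illustration only
  static final String TAG_PROPERTY = "myhippoproject:tag"

  String basePath
  String tag

  void initialize(Session session) throws RepositoryException {
    // Read the parameters once, before any node is visited
    basePath = parametersMap?.get("basePath")
    tag = parametersMap?.get("tag")
    if (!basePath || !tag) {
      // Fail fast: the framework will not run the updater after this exception
      throw new RepositoryException('Parameters "basePath" and "tag" are required')
    }
    log.debug "basePath: ${basePath}, tag: ${tag}"
  }

  boolean doUpdate(Node node) {
    if (!node.path.startsWith(basePath + "/")) {
      return false // node lies outside the configured subtree
    }
    if (node.hasProperty(TAG_PROPERTY) && node.getProperty(TAG_PROPERTY).getString() == tag) {
      return false // already tagged with the requested value
    }
    node.setProperty(TAG_PROPERTY, tag)
    return true
  }

  boolean undoUpdate(Node node) {
    throw new UnsupportedOperationException('Updater does not implement undoUpdate method')
  }

}
```

With this shape, the same script source can be run against another folder or with another tag by changing only the Parameters JSON in the execution options.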

Undo

An updater script can support easy undo of its modifications by implementing the undoUpdate method. That method should revert a node back to the state before doUpdate was called.

Example 1 (Add a property) at Groovy Updater Scripts Examples implements undoUpdate.
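
For orientation, a reversible updater might look like the sketch below. It is not the example from the Groovy Updater Scripts Examples page, only an illustration of the same idea: the class name and the boolean marker property myhippoproject:migrated are hypothetical, and the property is assumed to be allowed by the visited node type.

```groovy
package org.hippoecm.frontend.plugins.cms.admin.updater

import org.onehippo.repository.update.BaseNodeUpdateVisitor
import javax.jcr.Node

class AddMarkerUpdater extends BaseNodeUpdateVisitor {

  // Hypothetical property name, used for illustration only
  static final String MARKER = "myhippoproject:migrated"

  boolean doUpdate(Node node) {
    if (node.hasProperty(MARKER)) {
      return false // marker already present, nothing to do
    }
    node.setProperty(MARKER, true)
    return true
  }

  boolean undoUpdate(Node node) {
    if (!node.hasProperty(MARKER)) {
      return false // nothing to revert on this node
    }
    // Assumes the marker was added by doUpdate; removing it restores the previous state
    node.getProperty(MARKER).remove()
    return true
  }

}
```

Adding and removing a property is easy to revert because no previous value is lost. An update that overwrites existing values would have to store the old value somewhere before undoUpdate could restore it.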

Custom Node Visiting Logic

This feature is available since Bloomreach Experience Manager v12.1.1 (also backported to v12.0.4, v11.2.5 and v10.2.9).

Typically, the nodes visited by the script are specified (in the execution options) by either an XPath query or a repository path. Alternatively, an updater script can provide its own logic for selecting and navigating the nodes to visit, by overriding the following two methods provided by BaseNodeUpdateVisitor, the base class of the UpdaterTemplate script:

/**
 * Initiates the retrieval of the nodes when using custom, instead of path or xpath (query) based, node
 * selection/navigation, returning the first node to visit. Intended to be overridden, default implementation returns null.
 * @param session
 * @return first node to visit, or null if none found
 * @throws RepositoryException
 */
public Node firstNode(final Session session) throws RepositoryException {
    return null;
}

/**
 * Return a following node, when using custom, instead of path or xpath (query) based, node selection/navigation.
 * Intended to be overridden, default implementation returns null.
 * @return next node to visit, or null if none left
 * @throws RepositoryException
 */
public Node nextNode() throws RepositoryException {
    return null;
}

A contrived example (visiting all nodes of type hippo:document, i.e. equivalent to specifying the XPath query //element(*, hippo:document)) is:

private NodeIterator nodeIterator;

Node firstNode(final Session session) throws RepositoryException {
    final javax.jcr.query.QueryManager queryManager = session.getWorkspace().getQueryManager();
    final javax.jcr.query.Query jcrQuery = queryManager.createQuery("//element(*, hippo:document)", "xpath");
    nodeIterator = jcrQuery.execute().getNodes();
    return nextNode();
}

Node nextNode() throws RepositoryException {
    return nodeIterator.hasNext() ? nodeIterator.next() : null;
}

The difference from a repository path or XPath query based updater is that those first query and iterate through all nodes to be visited before calling the script's doUpdate(Node) method, whereas in the above example that method is invoked during the query iteration, which may be more efficient in some use cases. In addition, a long-running updater script implemented this way can be cancelled both during the query iteration and during the node update process; otherwise, cancelling is only possible during the node update process.
A different approach that is sometimes used, but is not advisable, is to select the rep:root node with an XPath query and implement all custom processing within the single doUpdate method call. This works, but such a run cannot be cancelled.

Override Default Behavior

BaseNodeUpdateVisitor provides two boolean methods whose default behavior is sometimes worth overriding:

skipCheckoutNodes(): by default (returning false), a node is checked out if necessary before it is visited through the doUpdate method, to ensure that updating the node is actually allowed. However, if the updater script is only used for querying and reporting, or performs updates unrelated to versionable content, unnecessarily checking out nodes can cause substantial overhead. In that case, this method can be overridden to return true instead.

/**
 * Overridable boolean function to indicate if node checkout can be skipped (default false)
 * @return true if node checkout can be skipped (e.g. for readonly visitors and/or updates unrelated to versioned content)
 */
public boolean skipCheckoutNodes() {
    return false;
}

This feature is available since Bloomreach Experience Manager v12.1.1 (also backported to v12.0.4, v11.2.5 and v10.2.9).

logSkippedNodePaths(): by default (returning true), all visited node paths for which the doUpdate method returned false are also logged as a separate audit trail in the repository. If a substantial number of nodes is skipped and the audit trail is not needed, this method can be overridden to return false instead.

/**
 * Overridable boolean function to indicate if skipped node paths should be logged (default true)
 * @return true if skipped node paths should be logged
 */
public boolean logSkippedNodePaths() {
    return true;
}

This feature is available since Bloomreach Experience Manager v12.1.1 (also backported to v12.0.4, v11.2.5 and v10.2.9).
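
As an illustration of both overrides together, the following sketch shows a read-only reporting visitor. The class name is hypothetical; the script never modifies a node, so it skips the checkout and suppresses the skipped-paths audit trail, which would otherwise list every visited node.

```groovy
package org.hippoecm.frontend.plugins.cms.admin.updater

import org.onehippo.repository.update.BaseNodeUpdateVisitor
import javax.jcr.Node

class ReportOnlyVisitor extends BaseNodeUpdateVisitor {

  boolean skipCheckoutNodes() {
    return true // read-only visitor: no need to check out versioned nodes
  }

  boolean logSkippedNodePaths() {
    return false // every node is "skipped" here, so the audit trail adds nothing
  }

  boolean doUpdate(Node node) {
    // Only report; the node itself is never changed
    log.info "Visited ${node.path} (primary type ${node.primaryNodeType.name})"
    return false
  }

  boolean undoUpdate(Node node) {
    throw new UnsupportedOperationException('Updater does not implement undoUpdate method')
  }

}
```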

Manually Report Updated/Skipped/Failed Nodes

By default, the updater engine automatically records the updated, skipped or failed count on every invocation of the doUpdate(Node) method. So, if each unit of work in your updater script corresponds to one node of the path or query based iteration, this automatic recording and batch processing by the updater engine should be good enough.

However, if your updater script does not follow the path or query based node iteration but instead runs its own query and iterates over nodes manually, the generated report does not reflect what the script really did. Such a script cannot take advantage of the 'Dry run' option, and its execution is not controlled by the engine's batch processing and batch size configuration either. Even worse, it may cause significant system overhead (e.g., excessive memory consumption) due to uncontrolled batch updates.

To address this problem, an updater script may report the updated, skipped and failed nodes manually by using the visitorContext variable (of type org.onehippo.repository.update.NodeUpdateVisitorContext).

Here's an example using visitorContext to report the updated news document count after changing a field in a manual node iteration:

/**
 * ExampleNewsDocumentDateFieldUpdateDemoVisitor is a script that does manual node iteration
 * in an original iteration cycle and reports updated node manually in order to be aligned
 * with the built-in batch commit/revert feature of the updater engine for demonstration purpose.
 */
package org.hippoecm.frontend.plugins.cms.admin.updater

import org.onehippo.repository.update.BaseNodeUpdateVisitor
import java.util.*
import javax.jcr.*
import javax.jcr.query.*

class ExampleNewsDocumentDateFieldUpdateDemoVisitor extends BaseNodeUpdateVisitor {

  boolean doUpdate(Node node) {
    log.debug "Visiting node at ${node.path} just as an entry point in this demo."
    
    // new date field value from the current time
    def now = Calendar.getInstance()
    
    // do manual query and node iteration
    def query = node.session.workspace.queryManager.createQuery("//element(*,demosite:newsdocument)", "xpath")
    def result = query.execute()
    
    for (NodeIterator nodeIt = result.getNodes(); nodeIt.hasNext(); ) {
      def newsNode = nodeIt.nextNode()
      newsNode.setProperty("demosite:date", now)
      // report updated to the engine manually here.
      visitorContext.reportUpdated(newsNode.path)
    }
    
    return false
  }

  boolean undoUpdate(Node node) {
    throw new UnsupportedOperationException('Updater does not implement undoUpdate method')
  }

}

The example above invokes visitorContext.reportUpdated(path) after setting the "demosite:date" property. This makes the updater engine aware of how many nodes were updated, so that it can do the batch processing (either saving or discarding the session) properly based on the batch size configuration.
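
Skipped and failed nodes can presumably be reported in the same way. The variant below is a sketch that assumes NodeUpdateVisitorContext also offers reportSkipped(path) and reportFailed(path) alongside reportUpdated(path), as the updated/skipped/failed wording suggests; verify these method names against the Javadoc of your version before relying on them. The class name is hypothetical, and the rule of skipping documents that already have a date is made up for the demonstration.

```groovy
package org.hippoecm.frontend.plugins.cms.admin.updater

import org.onehippo.repository.update.BaseNodeUpdateVisitor
import javax.jcr.*
import javax.jcr.query.*

class ExampleManualReportingVisitor extends BaseNodeUpdateVisitor {

  boolean doUpdate(Node node) {
    def now = Calendar.getInstance()
    def query = node.session.workspace.queryManager.createQuery("//element(*,demosite:newsdocument)", "xpath")

    for (NodeIterator nodeIt = query.execute().getNodes(); nodeIt.hasNext(); ) {
      def newsNode = nodeIt.nextNode()
      def path = newsNode.path
      try {
        if (newsNode.hasProperty("demosite:date")) {
          // Assumed method: report a node that was left untouched
          visitorContext.reportSkipped(path)
          continue
        }
        newsNode.setProperty("demosite:date", now)
        visitorContext.reportUpdated(path)
      } catch (RepositoryException e) {
        log.warn("Failed to update ${path}", e)
        // Assumed method: report a node whose update threw an exception
        visitorContext.reportFailed(path)
      }
    }

    return false
  }

  boolean undoUpdate(Node node) {
    throw new UnsupportedOperationException('Updater does not implement undoUpdate method')
  }

}
```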

Remarks

Default Imports

By default all of the main JCR API packages are already imported by the script classloader: javax.jcr, javax.jcr.nodetype, javax.jcr.security, and javax.jcr.version. You should not have to import package members explicitly anymore.

Restrictions

Some basic restrictions apply to the calls you can make and the classes you can use from your script. Interaction with the local filesystem has been disabled. The following classes cannot be used: java.io.File, java.io.FileDescriptor, java.io.FileInputStream, java.io.FileOutputStream, java.io.FileWriter and java.io.FileReader. The following packages are off limits as well: java.nio.file, java.net, javax.net and javax.net.ssl. Reflection is not possible either: calling Class.forName is illegal and the package java.lang.reflect cannot be used. Calling System.exit is also prevented.

There can be additional limitations on the accessible classpath when an updater script is executed automatically at startup (see Run an Updater Script), depending on the environment in which it is executed.
In a delivery-tier-only environment, only the functionality provided by the Hippo Repository might be available on the classpath.

Portability

Scripts executed from within the Updater Editor use a classloader in the CMS application context. Therefore, all libraries packaged with your CMS application are available to your script. If, however, you wish to develop scripts that can be reused in multiple projects, take care not to use libraries that are packaged with only one project. The safest bet is to use only libraries and APIs that are available in the shared class loader, although the availability of libraries such as commons-collections and guava can be depended on with some confidence as well.
Furthermore, for scripts executed automatically during startup (see Run an Updater Script), possibly only classes in the Repository context are available in a delivery-tier-only environment.
