Node name encoding - BloomReach Experience - Open Source CMS

Node name encoding

Introduction

By default, documents and folders in the CMS have two name values:

  • display name - a translatable string value that is meant for display on a webpage (like a breadcrumb value, a link name or a page title) or in the document listing in the CMS.
  • node name - the actual node name as stored in the repository, which is also the value used by the HST for constructing a URL.

Both values are encoded using an implementation of the org.hippoecm.repository.api.StringCodec interface to ensure that unsupported characters are either removed or replaced with a supported character.

For display names, the class org.hippoecm.repository.api.StringCodecFactory$IdentEncoding is used, which simply returns the input value as is. For node names, the class org.hippoecm.repository.api.StringCodecFactory$UriEncoding is used, which, as its name implies, performs a one-way encoding (no decoding possible) that translates any UTF-8 String to a set of characters suitable for use in URIs. See this page for a detailed explanation.
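The difference between the two behaviours can be illustrated with a small, self-contained sketch. The nested Codec interface below is a hypothetical stand-in for org.hippoecm.repository.api.StringCodec, and the URI_LIKE rules (strip diacritics, lowercase, replace other characters with dashes) are an assumption chosen for illustration; they are not the actual UriEncoding algorithm.

```java
import java.text.Normalizer;
import java.util.Locale;

final class CodecSketch {
    // Hypothetical stand-in for org.hippoecm.repository.api.StringCodec.
    interface Codec {
        String encode(String plain);
        String decode(String encoded);
    }

    // Returns the input unchanged, like the display name behaviour described above.
    static final Codec IDENTITY = new Codec() {
        public String encode(String plain) { return plain; }
        public String decode(String encoded) { return encoded; }
    };

    // Lossy, one-way encoding. Assumed rules for illustration only,
    // NOT the real UriEncoding implementation.
    static final Codec URI_LIKE = new Codec() {
        public String encode(String plain) {
            // Split accented letters into base letter + combining mark, then drop the marks.
            String decomposed = Normalizer.normalize(plain, Normalizer.Form.NFD);
            String ascii = decomposed.replaceAll("\\p{M}", "");
            String lower = ascii.toLowerCase(Locale.ROOT);
            // Collapse every run of other characters into a single dash.
            String dashed = lower.replaceAll("[^a-z0-9]+", "-");
            return dashed.replaceAll("^-+|-+$", "");
        }
        public String decode(String encoded) {
            // The original characters are gone, so the input is returned as is.
            return encoded;
        }
    };

    public static void main(String[] args) {
        // "Uber uns: Haufige Fragen" with U-umlaut and a-umlaut, written as escapes.
        String title = "\u00dcber uns: H\u00e4ufige Fragen";
        System.out.println(IDENTITY.encode(title)); // prints the title unchanged
        System.out.println(URI_LIKE.encode(title)); // prints "uber-uns-haufige-fragen"
    }
}
```

Because the encoded form drops information (case, punctuation, diacritics), the original name cannot be recovered from it, which is why the display name is stored separately.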

Configuration

Both codecs can be configured in the repository. The repository location depends on the version of Hippo CMS.

Hippo CMS version   Repository location
12.1 and older      /hippo:configuration/hippo:frontend/cms/cms-services/settingsService/codecs
12.2 and newer      /hippo:configuration/hippo:modules/stringcodec/hippo:moduleconfig

Both locations accept two properties named encoding.display and encoding.node. The property value must be the name of the codec class as returned by Class.getName(), e.g. org.hippoecm.repository.api.StringCodecFactory$UriEncoding (note the $ separating the nested class from its outer class).

Different node name encoding per locale

In some cases it is desirable to use a different StringCodec for encoding node names per locale. This way, the URLs constructed by the HST will be in the format that users (and machines) expect for that locale. For example, 'ä' is generally encoded as 'a', but in German it should be 'ae'.

To support this, the configuration option for setting a node name codec has been extended. For example, a StringCodec for the German language can be configured with a property named encoding.node.de. If you need to be more specific (for example, separate StringCodecs for German as used in Germany and in Austria), add two properties named encoding.node.de_de and encoding.node.de_at.
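One way to picture how these property names relate to a locale is the sketch below. It builds the candidate property names for a locale and returns the first configured value. The fallback order (language plus country, then language, then plain encoding.node) is an assumption made for illustration, not documented behaviour, and NodeCodecLookup and com.example.GermanCodec are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class NodeCodecLookup {
    static final String BASE_KEY = "encoding.node";

    // Candidate property names for a locale, most specific first (assumed order).
    static List<String> candidateKeys(Locale locale) {
        List<String> keys = new ArrayList<>();
        if (locale != null && !locale.getLanguage().isEmpty()) {
            String language = locale.getLanguage().toLowerCase(Locale.ROOT);
            String country = locale.getCountry().toLowerCase(Locale.ROOT);
            if (!country.isEmpty()) {
                keys.add(BASE_KEY + "." + language + "_" + country); // e.g. encoding.node.de_at
            }
            keys.add(BASE_KEY + "." + language);                     // e.g. encoding.node.de
        }
        keys.add(BASE_KEY);                                          // locale-independent default
        return keys;
    }

    // Returns the first configured class name, or null if nothing is configured.
    static String resolveCodecClassName(Map<String, String> config, Locale locale) {
        for (String key : candidateKeys(locale)) {
            String className = config.get(key);
            if (className != null) {
                return className;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put("encoding.node", "org.hippoecm.repository.api.StringCodecFactory$UriEncoding");
        config.put("encoding.node.de", "com.example.GermanCodec"); // hypothetical custom codec

        // No encoding.node.de_at is configured, so the language-level entry is used.
        System.out.println(resolveCodecClassName(config, Locale.forLanguageTag("de-AT")));
        // No French entry is configured, so the default encoding.node value is used.
        System.out.println(resolveCodecClassName(config, Locale.forLanguageTag("fr")));
    }
}
```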

BloomReach Experience Manager ships with a default StringCodec implementation. If you decide to configure a custom StringCodec, you will have to implement it yourself. A good starting point is the class org.hippoecm.repository.api.StringCodecFactory$UriEncoding, which can be found in the Hippo Repository project.
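As a rough idea of what a German-specific codec might do, the sketch below transliterates umlauts and sharp s before applying a generic URI-style clean-up. It is written against a hypothetical nested Codec interface so that it runs on its own; a real implementation would implement org.hippoecm.repository.api.StringCodec instead, and the clean-up rules here are assumed, not copied from UriEncoding.

```java
import java.text.Normalizer;
import java.util.Locale;

final class GermanCodecSketch {
    // Hypothetical stand-in for org.hippoecm.repository.api.StringCodec.
    interface Codec {
        String encode(String plain);
        String decode(String encoded);
    }

    static final class GermanUriLikeCodec implements Codec {
        @Override
        public String encode(String plain) {
            // German transliteration first, so a-umlaut becomes "ae" rather than "a".
            String german = plain
                    .replace("\u00e4", "ae").replace("\u00f6", "oe").replace("\u00fc", "ue")
                    .replace("\u00c4", "Ae").replace("\u00d6", "Oe").replace("\u00dc", "Ue")
                    .replace("\u00df", "ss");
            // Generic clean-up (assumed rules): strip remaining diacritics,
            // lowercase, and turn every other run of characters into one dash.
            String ascii = Normalizer.normalize(german, Normalizer.Form.NFD)
                    .replaceAll("\\p{M}", "");
            String dashed = ascii.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]+", "-");
            return dashed.replaceAll("^-+|-+$", "");
        }

        @Override
        public String decode(String encoded) {
            // One-way encoding: the original cannot be restored.
            return encoded;
        }
    }

    public static void main(String[] args) {
        Codec codec = new GermanUriLikeCodec();
        // "Grosse andern" with o-umlaut, sharp s and a-umlaut, written as escapes.
        System.out.println(codec.encode("Gr\u00f6\u00dfe \u00e4ndern")); // prints "groesse-aendern"
    }
}
```

A class like this would then be registered through a locale-specific property such as encoding.node.de, using its Class.getName() value.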