HTML cleaning - BloomReach Experience - Open Source CMS

This article covers Hippo CMS version 10. There's an updated version available that covers our most recent release.

23-03-2018

HTML cleaning

The contents of HTML fields can be cleaned on both the client side and the server side.

Client-side

Client-side HTML cleaning is done by CKEditor itself, through a feature called Advanced Content Filter (ACF). Each plugin and command added to or removed from CKEditor influences the allowed HTML. For example, when there is no plugin to add an image, <img> tags are removed automatically. The filtering also applies to attributes, which can be marked as allowed or required. ACF can also be adjusted per editor instance via the configuration property 'extraAllowedContent'.

More information on ACF and how to configure it can be found at the CKEditor documentation website.
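As a minimal sketch of the 'extraAllowedContent' property mentioned above, the fragment below extends ACF with two extra rules, using CKEditor's allowed content rule syntax (elements, then [attributes], {styles} and (classes), with ! marking a required attribute). The specific rules, the "highlight" class name, and the choice of ckeditor.config.overlayed.json as the place to set the property are illustrative assumptions, not shipped defaults.

ckeditor.config.overlayed.json:

```json
{
  "extraAllowedContent": "span(highlight); a[!href,target]"
}
```

Under these assumed rules, <span class="highlight"> would be kept in addition to whatever the active plugins already allow, and <a> elements would be accepted only when they carry an href attribute, optionally with target.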

Disable client-side HTML cleaning

ACF is enabled by default. To disable ACF, set the CKEditor property 'allowedContent' to true:

ckeditor.config.overlayed.json:

{
  "allowedContent": true
}

Server-side

Server-side HTML cleaning is done by an HTML cleaner service. A CKEditor field uses the HTML cleaner service whose ID is set in the configuration property 'htmlcleaner.id'.

By default the HTML Cleaner is used. The HTML Cleaner checks, cleans and corrects the output of rich-text fields. Its configuration is based on a whitelist that defines which elements are allowed and which attributes they may contain. If an element or attribute is not configured as allowed, it is stripped from the output; the text nodes inside a stripped element are preserved.

Server-side HTML cleaning also removes any usage of the javascript: protocol and, as of Bloomreach Experience Manager 10.2.8, the data: protocol within <a> href and <object> data attributes.

Configuration

The configuration is located at

/hippo:configuration/hippo:frontend/cms/cms-services/filteringHtmlCleanerService

The properties of this node are:

  • charset: the character set of the output. Defaults to UTF-8.
  • serializer: the type of serializer to use. Valid values are pretty, compact, and simple. Defaults to simple.
  • service.id: the ID of the HTML cleaner service. Defaults to org.hippoecm.frontend.plugins.richtext.DefaultHtmlCleanerService.
  • omitComments: whether to strip comments from the HTML. Defaults to false.
  • filter: whether to apply whitelist filtering. Defaults to true.

A child node called whitelist contains a list of nodes that define the whitelisted HTML elements. The name of each such node corresponds to the name of the element to allow. A whitelist element node may contain a multivalued property called attributes that lists the HTML attributes allowed on that element.
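The node structure described above can be pictured roughly as follows. JSON is used here only as a notation for the repository tree; it is not the format in which the configuration is stored. The property values are the defaults listed above, while the whitelisted elements (p, a, img) and their attributes are hypothetical examples rather than the shipped whitelist.

```json
{
  "filteringHtmlCleanerService": {
    "charset": "UTF-8",
    "serializer": "simple",
    "service.id": "org.hippoecm.frontend.plugins.richtext.DefaultHtmlCleanerService",
    "omitComments": false,
    "filter": true,
    "whitelist": {
      "p": {},
      "a": { "attributes": ["href", "title"] },
      "img": { "attributes": ["src", "alt"] }
    }
  }
}
```

With a whitelist like this, and following the rules described earlier, an onclick attribute on an <a> element would be stripped while href is kept, and <font color="red">text</font> would be reduced to its text content because font is not whitelisted.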

The pretty and compact serializers add whitespace characters to the HTML source to make it human readable. This may result in unwanted spacing around superscripts and subscripts. For this reason, the default serializer is simple.

Disable server-side HTML cleaning

Change the configuration property 'htmlcleaner.id' to an empty string, or remove it altogether.

To verify whether the server-side HTML cleaner is enabled or disabled, set the log level of org.hippoecm.frontend.plugins.ckeditor.AbstractCKEditorPlugin to INFO.

 
