Sitemap Plugin Configuration: Advanced Sitemap - BloomReach Experience - Open Source CMS

Sitemap Plugin Configuration: Advanced Sitemap

Description

The Advanced Sitemap component is a more advanced alternative to the Basic Sitemap component. It crawls a channel's HST sitemap (URL space) configuration in order to include the channel's complete URL space in the sitemap feed.

The Advanced Sitemap component complies with the sitemap specifications provided by Google, so Google accepts and recognizes the generated sitemap feed. This should lead to better findability of the site's pages in Google (i.e. a higher percentage of pages found).

It is possible to exclude URLs from the sitemap feed (see Configuration below).

Only canonical URLs (i.e. the shortest path to the same content) are included in the sitemap feed.

If the sitemap contains more than 50,000 URLs (the maximum number of URLs allowed in a single sitemap file), it can be split into multiple files, and a sitemap index referring to those files can be generated.
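The splitting idea can be illustrated with a small, self-contained sketch. It is not the plugin's implementation: the class name SitemapSplitSketch, the file naming scheme sitemap-N.xml and the base URL are hypothetical. It only shows URLs being chunked into files of at most 50,000 entries (the per-file limit of the sitemaps protocol) and those files being listed in a sitemap index.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: split a URL list into sitemap files and build a sitemap index. */
final class SitemapSplitSketch {

    /** Maximum number of URLs per sitemap file, as defined by the sitemaps protocol. */
    static final int MAX_URLS_PER_FILE = 50_000;

    private SitemapSplitSketch() {
    }

    /** Splits the URLs into consecutive chunks of at most maxPerFile entries each. */
    static List<List<String>> split(List<String> urls, int maxPerFile) {
        List<List<String>> chunks = new ArrayList<>();
        for (int start = 0; start < urls.size(); start += maxPerFile) {
            int end = Math.min(start + maxPerFile, urls.size());
            chunks.add(new ArrayList<>(urls.subList(start, end)));
        }
        return chunks;
    }

    /** Builds a sitemap index pointing to one file per chunk (hypothetical file names). */
    static String buildIndex(String baseUrl, int fileCount) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<sitemapindex xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
        for (int i = 1; i <= fileCount; i++) {
            xml.append("  <sitemap><loc>")
               .append(baseUrl).append("/sitemap-").append(i).append(".xml")
               .append("</loc></sitemap>\n");
        }
        xml.append("</sitemapindex>\n");
        return xml.toString();
    }

    public static void main(String[] args) {
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < 120_000; i++) {
            urls.add("http://localhost:8080/site/page-" + i + ".html");
        }
        List<List<String>> chunks = split(urls, MAX_URLS_PER_FILE);
        System.out.println(chunks.size()); // 3 files: 50,000 + 50,000 + 20,000 URLs
        System.out.print(buildIndex("http://localhost:8080/site", chunks.size()));
    }
}
```

Chunking by index keeps the URL order stable, so each URL always lands in a predictable file for a given input list.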

Component configuration: /hst:hst/hst:configurations/hst:default/hst:components/forge-sitemap-based-on-hst-configuration-feed
Component class: org.onehippo.forge.sitemap.components.SitemapFeedBasedOnHstSitemap

Configuration

URLs can be excluded based on a reference ID (hst:refid), a component configuration ID (hst:componentconfigurationid) or an HST sitemap path (e.g. news/_default_/subject/_any_.html). These exclusions are specified as parameters in the component configuration (see Parameters below).
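The three kinds of exclusion can be pictured as a simple filter over sitemap items. This is a hypothetical sketch, not the plugin's code: the SitemapItem record is a stand-in for an HST sitemap item, and values are matched by exact string equality, which may differ from how the plugin compares them (for example, for wildcard paths).

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical sketch of excluding items by refId, component configuration ID or path. */
final class ExclusionFilterSketch {

    /** Stand-in for an HST sitemap item; these fields are assumptions for illustration. */
    record SitemapItem(String refId, String componentConfigurationId, String path) {
    }

    private ExclusionFilterSketch() {
    }

    /** Returns true if the item matches any configured exclusion (exact string match). */
    static boolean isExcluded(SitemapItem item,
                              Set<String> refIdExclusions,
                              Set<String> componentConfigurationIdExclusions,
                              Set<String> pathExclusions) {
        // Null checks first: immutable sets such as Set.of(...) reject contains(null)
        return (item.refId() != null && refIdExclusions.contains(item.refId()))
                || (item.componentConfigurationId() != null
                    && componentConfigurationIdExclusions.contains(item.componentConfigurationId()))
                || (item.path() != null && pathExclusions.contains(item.path()));
    }

    /** Keeps only the items that are not excluded, preserving their order. */
    static List<SitemapItem> filter(List<SitemapItem> items,
                                    Set<String> refIdExclusions,
                                    Set<String> componentConfigurationIdExclusions,
                                    Set<String> pathExclusions) {
        return items.stream()
                .filter(item -> !isExcluded(item, refIdExclusions,
                        componentConfigurationIdExclusions, pathExclusions))
                .collect(Collectors.toList());
    }
}
```

An item is dropped as soon as any one of the three rules matches, which mirrors the "or" in the sentence above.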

All items in the sitemap feed are canonical, i.e. the location of each item is the shortest path to that item. For instance, if there are two HST sitemap items news/_any_.html and archive/news/_any_.html, and a document 'hello-world' can be reached via both news/hello-world.html and archive/news/hello-world.html (i.e. two different URLs rendering the same document), then only news/hello-world.html is included in the sitemap feed, because it is the shortest path. Leaving out the redundant URLs saves time and storage. See also Sitemap Index.
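The canonical selection can be sketched as picking, for each document, the candidate URL with the shortest path. This is an illustration under assumptions, not the plugin's algorithm: documents are identified by a plain string ID, "shortest" is taken to mean fewest path segments with string length and then alphabetical order as tie-breakers, and the class name is hypothetical.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: keep only the shortest URL per document. */
final class CanonicalUrlSketch {

    private CanonicalUrlSketch() {
    }

    /** Number of '/'-separated segments, e.g. "archive/news/hello-world.html" -> 3. */
    static int segmentCount(String path) {
        return path.split("/").length;
    }

    /** Orders paths by segment count, then length, then alphabetically for a stable result. */
    static final Comparator<String> SHORTEST_FIRST =
            Comparator.comparingInt(CanonicalUrlSketch::segmentCount)
                      .thenComparingInt(String::length)
                      .thenComparing(Comparator.<String>naturalOrder());

    /** Maps each document ID to the shortest of the URLs that render that document. */
    static Map<String, String> canonicalUrls(Map<String, List<String>> urlsByDocument) {
        Map<String, String> result = new LinkedHashMap<>();
        urlsByDocument.forEach((documentId, urls) ->
                urls.stream().min(SHORTEST_FIRST).ifPresent(url -> result.put(documentId, url)));
        return result;
    }
}
```

With the example from the text, canonicalUrls maps 'hello-world' to news/hello-world.html and drops archive/news/hello-world.html.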

The Advanced Sitemap component is multi-threaded: it generates the sitemap feed using several threads (so-called workers). The number of workers can be set in the component configuration with the amountOfWorkers parameter; the default is 4.
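The worker model can be illustrated with a fixed-size thread pool from java.util.concurrent. This is a hypothetical sketch, not the plugin's code: renderEntry stands in for whatever work a worker does per sitemap item, and the pool size plays the role of amountOfWorkers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: render sitemap entries on a pool of worker threads. */
final class SitemapWorkerSketch {

    /** Mirrors the documented default value of amountOfWorkers. */
    static final int DEFAULT_AMOUNT_OF_WORKERS = 4;

    private SitemapWorkerSketch() {
    }

    /** Hypothetical per-item work: turn a sitemap path into a <url> entry. */
    static String renderEntry(String path) {
        return "<url><loc>http://localhost:8080/site/" + path + "</loc></url>";
    }

    /** Renders all paths using amountOfWorkers threads; the result keeps the input order. */
    static List<String> renderAll(List<String> paths, int amountOfWorkers)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(amountOfWorkers);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String path : paths) {
                futures.add(pool.submit(() -> renderEntry(path)));
            }
            List<String> entries = new ArrayList<>();
            for (Future<String> future : futures) {
                entries.add(future.get()); // blocks until the worker for this item is done
            }
            return entries;
        } finally {
            pool.shutdown(); // let the worker threads exit once all tasks are finished
        }
    }
}
```

Collecting the futures in submission order keeps the output deterministic even though the items are processed concurrently.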

Parameters

The Advanced Sitemap component provides the following configuration parameters:

Parameter | Required | Default value | Description
----------|----------|---------------|------------
sitemapRefIdExclusions | no | | The reference IDs (hst:refid) to exclude (comma separated)
splitter-enabled | no | | true: split the sitemap into multiple sitemaps if needed; false: otherwise
splitter-destination-foldername | no | | Name of the folder for the sitemap index file
informationProvider | no | | Fully qualified name of a custom class extending DefaultUrlInformationProvider (provided by the plugin)
sitemapComponentConfigurationIdExclusions | no | | The component configuration IDs (hst:componentconfigurationid) to exclude (comma separated), e.g. hst:pages/404
sitemapPathExclusions | no | | The sitemap paths to exclude (comma separated), e.g. news/_default_/subject/_any_.html
write-to-repository | no | | true: write the sitemap index file to the content repository; false: store it on the local file system instead
amountOfWorkers | no | 4 | The number of threads used to generate the sitemap
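Several of these parameters take comma-separated lists, and amountOfWorkers falls back to 4 when it is not set. The sketch below shows one way such values could be read from a plain map of parameter names to values. It is hypothetical: it does not use the HST parameter API, the helper names are made up for illustration, and treating an unset boolean as false is an assumption, since the table documents no default for splitter-enabled or write-to-repository.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical sketch of reading the sitemap parameters from name/value pairs. */
final class SitemapParametersSketch {

    private SitemapParametersSketch() {
    }

    /** Splits "admin, private" into {admin, private}, trimming blanks around the commas. */
    static Set<String> csv(Map<String, String> params, String name) {
        String value = params.get(name);
        if (value == null || value.isBlank()) {
            return Set.of();
        }
        return Arrays.stream(value.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    /** Reads amountOfWorkers, falling back to the documented default of 4. */
    static int amountOfWorkers(Map<String, String> params) {
        String value = params.get("amountOfWorkers");
        return value == null || value.isBlank() ? 4 : Integer.parseInt(value.trim());
    }

    /** Reads a true/false parameter; an unset value is treated as false (an assumption). */
    static boolean flag(Map<String, String> params, String name) {
        return Boolean.parseBoolean(params.getOrDefault(name, "false").trim());
    }
}
```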

Security

The Advanced Sitemap component writes the generated sitemap feed to an XML file and stores it either in the content repository (as an asset) or on the local file system.

To be able to write to the content repository, the sitewriter group needs to have readwrite permission on the following security domains:

  • defaultwrite
  • hippofolders
  • hippogallery

Example

Use the Console to configure the Advanced Sitemap component as follows.

Browse to the Sitemap URL configuration as provided by the Sitemap plugin at /hst:hst/hst:configurations/hst:default/hst:sitemap/sitemap.xml.

Change the property hst:componentconfigurationid to hst:components/forge-sitemap-based-on-hst-configuration-feed (replacing the reference to the Basic Sitemap component).

/hst:hst/hst:configurations/hst:default/hst:sitemap/sitemap.xml
  - hst:componentconfigurationid = hst:components/forge-sitemap-based-on-hst-configuration-feed

Because none of the configuration parameters is required, this minimal configuration is enough for a working sitemap feed.

Point your web browser to http://localhost:8080/site/sitemap.xml to see your site's sitemap feed.

Use the hst:parameternames and hst:parametervalues properties to specify any configuration parameters, e.g. if you want to:

  • exclude the sitemap items with the reference IDs (hst:refid) admin or private
  • use the sitemap splitter to split the sitemap into multiple sitemaps
  • set the destination folder for the splitter to sitemap-splitter-directory
  • set the custom information provider class (see Customization below) to UrlSitemapRenderer in the package org.example.site.sitemap
  • exclude the sitemap items that use the component configuration ID hst:pages/404
  • exclude the following sitemap paths:
    subject/_default_/_default_/news/_any_.html
    subject/_default_/_default_/faq/_any_.html
    info/_default_/_default_/news/_any_.html
    info/_default_/_default_/faq/_any_.html
    sitemap.xml
    draft
  • write to the repository
  • set the number of workers (threads) to 5

the configuration would look like this:

/hst:hst/hst:configurations/hst:default/hst:sitemap/sitemap.xml
  - hst:componentconfigurationid = hst:components/forge-sitemap-based-on-hst-configuration-feed
  - hst:parameternames = sitemapRefIdExclusions,
                         splitter-enabled,
                         splitter-destination-foldername,
                         informationProvider,
                         sitemapComponentConfigurationIdExclusions,
                         sitemapPathExclusions,
                         write-to-repository,
                         amountOfWorkers
  - hst:parametervalues = {
                            admin,
                            private
                          },
                          true,
                          sitemap-splitter-directory,
                          org.example.site.sitemap.UrlSitemapRenderer,
                          hst:pages/404,
                          {
                            subject/_default_/_default_/news/_any_.html,
                            subject/_default_/_default_/faq/_any_.html,
                            info/_default_/_default_/news/_any_.html,
                            info/_default_/_default_/faq/_any_.html,
                            sitemap.xml,
                            draft
                          },
                          true,
                          5

Point your web browser to http://localhost:8080/site/sitemap.xml to see your site's sitemap feed.

Note: your browser may not display the sitemap as XML. View the page source to see the sitemap XML.
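To inspect the feed outside the browser, the <loc> entries can be read with the JDK's built-in DOM parser. This sketch is hypothetical: the class name is made up, it works on an XML string (for example, the page source saved from http://localhost:8080/site/sitemap.xml), and it only assumes the standard sitemaps.org namespace.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

/** Hypothetical helper that lists the <loc> values of a sitemap or sitemap index. */
final class SitemapLocReader {

    /** Namespace defined by the sitemaps protocol. */
    static final String SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9";

    private SitemapLocReader() {
    }

    /** Returns the trimmed text of every <loc> element, in document order. */
    static List<String> readLocs(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for getElementsByTagNameNS
        // Refuse DOCTYPE declarations so external entities cannot be resolved
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document document = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList locs = document.getElementsByTagNameNS(SITEMAP_NS, "loc");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < locs.getLength(); i++) {
            result.add(locs.item(i).getTextContent().trim());
        }
        return result;
    }
}
```

The same method works for a sitemap index, because its <sitemap> entries also use <loc> in the same namespace.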

Customization

Site-specific behavior can be implemented in a new Java class in your project (e.g. UrlSitemapRenderer), which you register through the informationProvider parameter. This class must extend the DefaultUrlInformationProvider class provided by the Sitemap plugin. By overriding its methods, such as getLoc (the URL of an entry) and getPriority (the priority of an entry), you can adjust the generated sitemap entries to your own preferences. See the example below:

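package org.example.site.sitemap;

import java.math.BigDecimal;

import org.hippoecm.hst.content.beans.standard.HippoBean;
import org.hippoecm.hst.core.request.HstRequestContext;
// Package of DefaultUrlInformationProvider in the Sitemap plugin; verify it for your plugin version
import org.onehippo.forge.sitemap.generator.DefaultUrlInformationProvider;

// Registered with informationProvider = org.example.site.sitemap.UrlSitemapRenderer
public class UrlSitemapRenderer extends DefaultUrlInformationProvider {

    @Override
    public String getLoc(final HippoBean hippoBean, final HstRequestContext requestContext) {
        // Placeholder: return the URL to use as <loc> for this document
        return "Something else";
    }

    @Override
    public BigDecimal getPriority(final HippoBean hippoBean) {
        // The priority is a BigDecimal between 0.0 and 1.0, not a double literal
        return new BigDecimal("1.0");
    }
}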
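The class above cannot be compiled without the Hippo libraries, so the following dependency-free sketch only illustrates the kind of rule a getPriority override might compute: a priority that decreases with path depth and stays within the 0.0 to 1.0 range allowed by the sitemaps protocol. The depth-based rule, the step size and the class name are all hypothetical.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Hypothetical priority rule based on the depth of a relative sitemap path. */
final class DepthPrioritySketch {

    private static final BigDecimal MIN = new BigDecimal("0.1");
    private static final BigDecimal STEP = new BigDecimal("0.2");

    private DepthPrioritySketch() {
    }

    /**
     * 1.0 for a top-level page, 0.2 less per extra path segment, never below 0.1.
     * Returned with one decimal place, e.g. "news/hello-world.html" -> 0.8.
     */
    static BigDecimal priorityForPath(String relativePath) {
        int depth = relativePath.split("/").length; // "about.html" -> 1, "news/x.html" -> 2
        BigDecimal priority = BigDecimal.ONE.subtract(STEP.multiply(BigDecimal.valueOf(depth - 1)));
        return priority.max(MIN).setScale(1, RoundingMode.HALF_UP);
    }
}
```

Using BigDecimal with a fixed scale keeps values such as 0.8 exact, which matches the BigDecimal return type of getPriority.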