Analyzing Java heap dumps - BloomReach Experience - Open Source CMS

Hints and tips for analyzing Java heap dumps

Tools

There are a lot more!

Approaches

  • Examine threads to get an idea of what is going on. Your tool may show the thread in which the OOM happened. This can be a lead to the cause, but not necessarily: that thread may just be the tipping point. On Tomcat, a thread "catalina-exec-xyz" can be matched to a URL (see later).
  • Examine the biggest objects using the Dominators, Top Consumers or Leak Suspects views that the tool provides.

Some known objects

PersistenceManager / Bundle Cache 
E.g. org.apache.jackrabbit.core.persistence.pool.MySqlPersistenceManager.
Corresponds to the repository, also known as the bundle cache. Sizes up to 200MB are not unusual, depending on data size and settings, e.g. bundle cache size.

HippoLocalItemStateManager
org.hippoecm.repository.jackrabbit.HippoLocalItemStateManager objects correspond to JCR sessions. Sizes up to 20MB are not unusual. There should not be too many of them; some tens are normal.

HstManager
org.hippoecm.hst.configuration.model.HstManagerImpl objects contain the HST model. They can be a couple of tens of MBs, depending on the HST configuration involved. Expect only 1 or 2 instances, possibly one for live and one for preview.

HstRequestContext
org.hippoecm.hst.site.request.HstRequestContextImpl: one per HST request, typically from a browser. Commonly tied to a 'catalina-exec' thread. Some valuable properties are contained in the baseURL object within this class:
  requestContext.baseURL.hostName="www.mydomain.com"
  requestContext.baseURL.requestPath="/mypath"
  requestContext.baseURL.parameterMap.table.map.table.key=myparameter
  requestContext.baseURL.parameterMap.table.map.table.value=myparametervalue

org.hippoecm.frontend.Main
If present, a CMS is deployed; otherwise it is just a site/delivery tier.

Matching a big object to a URL

1) Match a big object to an HTTP request.
From an object, use the "Path to GC roots" function (or similar). If the object is part of an incoming request to Tomcat, typically a browser request, the root object shown should be a 'catalina-exec-xyz' thread.

2) Match HST request context to the same HTTP request.
Try to find an HstRequestContextImpl that has the same 'catalina-exec-xyz' thread as found earlier. Its baseURL property shows what this request was. Search for HstRequestContextImpl objects using OQL if that is supported, see below.

Using Object Query Language (OQL)

If supported by your tool, use OQL to list objects, see https://en.wikipedia.org/wiki/Object_Query_Language

Examples: all HST Request Contexts, HST Requests, HST Managers:
  select * from org.hippoecm.hst.site.request.HstRequestContextImpl
  select * from org.hippoecm.hst.core.component.HstRequestImpl
  select * from org.hippoecm.hst.configuration.model.HstManagerImpl

Some known out-of-memory cases

Exploding query
A thread with a lot of org.apache.lucene.search.* objects such as BooleanQuery, BooleanScorer2 and TermScorer can indicate an exploding query: one that has too many hits and takes up too much memory while processing them.

For analysis, find the URL involved from the HstRequestContext to map it back to the page and components involved. If it's an HST query, the actual JCR XPath queries themselves can be found within the thread.

An exploding query can also be the result of a misconfigured Faceted Navigation, e.g. when no limits are set or when an inefficient faceted query is used.

Custom cache with node references
Custom cache implementations have been seen to cause OOM when they cache JCR nodes or properties, because these will then never be cleaned up. The simple rule here is: never put JCR node or property references into a cache.
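
As an illustration of that rule, here is a minimal sketch of a cache that stores only plain values copied out of a node, never the node itself. The names TeaserSnapshot and SnapshotCache, the two fields and the entry limit are hypothetical and not part of the product; the sketch only assumes that the values a page needs can be copied into Strings while the JCR session is still in use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical immutable snapshot: plain Strings only, no JCR Node or Property. */
final class TeaserSnapshot {
    final String path;   // repository path, kept as a plain String
    final String title;  // property value copied out of the node

    TeaserSnapshot(String path, String title) {
        this.path = path;
        this.title = title;
    }
}

/** Hypothetical LRU cache with a hard entry limit, so it cannot grow without bound. */
final class SnapshotCache {
    private final int maxEntries;
    private final Map<String, TeaserSnapshot> entries;

    SnapshotCache(int maxEntries) {
        this.maxEntries = maxEntries;
        // accessOrder=true keeps the least recently used entry first.
        this.entries = new LinkedHashMap<String, TeaserSnapshot>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, TeaserSnapshot> eldest) {
                // Evict the least recently used entry once the limit is exceeded.
                return size() > SnapshotCache.this.maxEntries;
            }
        };
    }

    synchronized void put(TeaserSnapshot snapshot) {
        entries.put(snapshot.path, snapshot);
    }

    synchronized TeaserSnapshot get(String path) {
        return entries.get(path);
    }

    synchronized int size() {
        return entries.size();
    }
}
```

Because the cached objects hold no reference to a node, property or session, evicted entries can be garbage collected, and the entry limit bounds the cache even if eviction is the only cleanup.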

Too many JCR sessions
A lot of HippoLocalItemStateManager objects may mean an exhausted JCR session pool.

Groovy script with large JCR session
A custom Groovy script can take up a lot of memory if it tries to save a lot of changes at once.
Pointers: the thread view shows a thread involving org.onehippo.repository.update.UpdaterExecutor, and the bundle cache objects are big.
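
The usual remedy is to save in small batches rather than all at once. The following is a minimal sketch of that pattern in plain Java, not the updater API itself: BatchedUpdate and the Saver interface are hypothetical stand-ins, where Saver represents whatever persists the pending changes (for example a JCR session save).

```java
import java.util.List;
import java.util.function.Consumer;

final class BatchedUpdate {

    /** Hypothetical stand-in for the call that persists pending changes. */
    interface Saver {
        void save();
    }

    /**
     * Applies the change to every item and saves after each full batch,
     * so that at most batchSize unsaved changes are held in memory.
     * Returns the number of save calls made.
     */
    static <T> int run(List<T> items, Consumer<? super T> change, Saver saver, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int pending = 0;
        int saves = 0;
        for (T item : items) {
            change.accept(item);
            pending++;
            if (pending == batchSize) {
                saver.save();
                saves++;
                pending = 0;
            }
        }
        // Save the remainder that did not fill a whole batch.
        if (pending > 0) {
            saver.save();
            saves++;
        }
        return saves;
    }
}
```

With five items and a batch size of 2, the change is applied five times and save is called three times.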

URL rewriter with infinite redirects
When wrongly configured, e.g. with a redirect loop, the URL rewriter can fill memory with large String objects containing for example: "/https:/www.mydomain.nl/https:/www.mydomain.nl/https:/www.mydomain.nl..."

Large YAML (XML) download from console
If a user tries to download an oversized node structure from the console, this can cause OOM.
Pointers: the thread view shows a thread involving org.onehippo.cm.engine.ConfigurationServiceImpl#exportContent or similar.