JPPF Issue Tracker
CLOSED  Enhancement JPPF-475  -  JMXMP large notifications lead to OutOfMemoryError
Posted Sep 10, 2016 - updated Sep 26, 2016
This issue has been closed with status "Closed" and resolution "RESOLVED".
Issue details
  • Type of issue
    Enhancement
  • Status
    Closed
  • Assigned to
     lolo4j
  • Type of bug
    Not triaged
  • Likelihood
    Not triaged
  • Effect
    Not triaged
  • Posted by
     lolo4j
  • Owned by
    Not owned by anyone
  • Category
    JMX connector
  • Resolution
    RESOLVED
  • Priority
    High
  • Reproducibility
    Always
  • Severity
    Normal
  • Targeted for
    JPPF 5.2.2
Issue description
When sending large notifications via Task.fireNotification(largeObject, true), I can see that notifications are not sent. Eventually, the node throws an OutOfMemoryError (OOME). Analysis of the resulting heap dump shows that the notifications from multiple sequential jobs are still in the JMXMP connector server's notification buffer.
Steps to reproduce this issue
Using one driver and one node with "-Xmx128m -XX:+HeapDumpOnOutOfMemoryError", submit a sequence of jobs with one or more tasks like this:
import org.jppf.node.protocol.AbstractTask;

public class MyTask extends AbstractTask<String> {
  @Override
  public void run() {
    System.out.println("task executing");
    // emit a 20 MB payload as a JMX notification
    fireNotification(new byte[20 * 1024 * 1024], true);
  }
}
==> a heap dump will be generated

#3
Comment posted by
 lolo4j
Sep 24, 08:31
After analyzing the code of the JMXMP connector, it turns out that there is, in fact, no leak. The connector uses a notification buffer designed as a circular buffer where notification references are replaced when the buffer is full, but never explicitly released otherwise (apart from when the connector is closed). For instance, if the buffer size is 20, then the first notification will be kept until 20 more notifications are emitted, at which time it will be replaced with the 21st. The main goal of this design is to minimize the risk of losing notifications, at the expense of increased memory usage.
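
To illustrate the retention behavior described above, here is a minimal sketch of a circular buffer. The class RingBufferSketch and its methods are hypothetical names made up for this illustration; this is not the JMXMP connector's actual code, only the general technique.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative circular buffer (hypothetical, not the JMXMP connector's code).
 * A slot keeps a strong reference to its element until a later add() overwrites it.
 */
class RingBufferSketch<T> {
  private final Object[] slots;
  private long added; // total number of elements ever added

  RingBufferSketch(int capacity) {
    if (capacity <= 0) throw new IllegalArgumentException("capacity must be > 0");
    slots = new Object[capacity];
  }

  /** Stores the element, overwriting the oldest one once the buffer is full. */
  void add(T element) {
    slots[(int) (added % slots.length)] = element;
    added++;
  }

  /** Number of elements still strongly referenced, hence not garbage-collectable. */
  int retainedCount() {
    return (int) Math.min(added, slots.length);
  }

  /** Returns the retained elements, oldest first. */
  @SuppressWarnings("unchecked")
  List<T> retained() {
    List<T> result = new ArrayList<>();
    long first = Math.max(0, added - slots.length);
    for (long i = first; i < added; i++) {
      result.add((T) slots[(int) (i % slots.length)]);
    }
    return result;
  }
}
```

With a capacity of 20, the first element stays reachable until the 21st add() overwrites its slot. If each element carried a 20 MB payload, as in the reproduction task, a full buffer would retain about 400 MB, far more than a 128 MB heap.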

The notification buffer size can be specified as a system property with "-Djmx.remote.x.notification.buffer.size=<size>", which allows a mitigation of the impact on memory usage.

Consequently, I will change this bug report into an enhancement request and explore ways of minimizing the memory usage of the JMX notifications.

Currently, I'd like to explore these possibilities:
  • erase notifications as soon as they have been sent to at least one listener. Unfortunately, this doesn't preclude the possibility of an OOME: it depends on whether any listener has subscribed to the notifications and on the frequency at which notifications are emitted
  • use soft references in the notification buffer: this guarantees that all notifications are garbage-collected before an OOME is raised, but increases the risk of missing notifications
  • offload the notification's user data to external storage (disk, database, etc.) when memory usage reaches a configurable threshold. This is what we do for jobs received by the JPPF driver. This solution avoids OOMEs and does not increase the risk of missing notifications. Of course, offloaded notifications will have a performance impact since they will be much slower to write/read to/from external storage. No magic here. It also has the advantage that it can be implemented without changing the JMX connector code, since we can "simply" customize the handling of the user data in our notification class (org.jppf.management.TaskExecutionNotification).
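
As an illustration of the soft reference option listed above, here is a minimal sketch of a circular buffer whose slots hold soft references. The class SoftRingBufferSketch is a hypothetical name for this example and is not part of JPPF or of the JMXMP connector. It shows the trade-off: the garbage collector may clear an entry under memory pressure, in which case the lookup returns null and the notification is lost.

```java
import java.lang.ref.SoftReference;

/**
 * Hypothetical circular buffer holding its elements through soft references.
 * Softly reachable elements are cleared by the JVM before it throws an OutOfMemoryError.
 */
class SoftRingBufferSketch<T> {
  private final SoftReference<T>[] slots;
  private long added; // total number of elements ever added

  @SuppressWarnings({"unchecked", "rawtypes"})
  SoftRingBufferSketch(int capacity) {
    if (capacity <= 0) throw new IllegalArgumentException("capacity must be > 0");
    slots = (SoftReference<T>[]) new SoftReference[capacity];
  }

  /** Stores the element behind a soft reference, overwriting the oldest slot when full. */
  void add(T element) {
    slots[(int) (added % slots.length)] = new SoftReference<>(element);
    added++;
  }

  /**
   * Returns the element with the given 0-based sequence number,
   * or null if it was never added, was overwritten, or was reclaimed by the GC.
   */
  T get(long sequence) {
    if (sequence < 0 || sequence >= added || sequence < added - slots.length) return null;
    SoftReference<T> ref = slots[(int) (sequence % slots.length)];
    return ref == null ? null : ref.get();
  }
}
```

A caller has to treat a null result as a missed notification, which is the increased risk mentioned in the second option.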
#6
Comment posted by
 lolo4j
Sep 26, 09:36
Finally, I went with a solution which offloads the user data of a notification to disk when the used heap memory reaches a configurable threshold. In the process, I also noticed that the object wrapping defined for the JMXMP connector was needlessly serializing objects to memory, so I also changed that. It is now possible to emit notifications from the tasks such that the sum of their retained memory is higher than the heap size.
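
The general idea of offloading on a heap usage threshold can be sketched as follows. This is not the actual JPPF implementation: the class OffloadableDataSketch, its constructor parameters and the temporary file naming are assumptions made for this example, and the heap usage is estimated from java.lang.Runtime.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical holder for a notification's user data (not the JPPF code).
 * The data is written to a temporary file when heap usage is at or above a threshold.
 */
class OffloadableDataSketch {
  private Object inMemory; // the data itself, when it was kept on the heap
  private Path file;       // non-null only when the data was offloaded

  /** @param heapUsageThreshold ratio of used heap to max heap, e.g. 0.8 (assumed parameter). */
  OffloadableDataSketch(Serializable data, double heapUsageThreshold) throws IOException {
    if (heapUsageRatio() >= heapUsageThreshold) {
      file = Files.createTempFile("notif-", ".bin");
      file.toFile().deleteOnExit();
      try (ObjectOutputStream out =
          new ObjectOutputStream(new BufferedOutputStream(Files.newOutputStream(file)))) {
        out.writeObject(data);
      }
    } else {
      inMemory = data;
    }
  }

  /** Estimated used heap divided by the maximum heap the JVM will attempt to use. */
  static double heapUsageRatio() {
    Runtime rt = Runtime.getRuntime();
    return (double) (rt.totalMemory() - rt.freeMemory()) / rt.maxMemory();
  }

  boolean isOffloaded() {
    return file != null;
  }

  /** Returns the data, reading it back from disk if it was offloaded. */
  Object get() throws IOException, ClassNotFoundException {
    if (file == null) return inMemory;
    try (ObjectInputStream in =
        new ObjectInputStream(new BufferedInputStream(Files.newInputStream(file)))) {
      return in.readObject();
    }
  }
}
```

In this sketch, an offloaded holder keeps only a file path on the heap, so a buffer full of such holders stays small regardless of payload size. Each get() on an offloaded holder pays the cost of a disk read and deserialization.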

Implemented in: