JPPF Issue Tracker
CLOSED  Bug report JPPF-517  -  Deadlock in the driver during stress test
Posted Sep 30, 2017 - updated Sep 30, 2017
This issue has been closed with status "Closed" and resolution "RESOLVED".
Issue details
  • Type of issue
    Bug report
  • Status
    Closed
  • Assigned to
     lolo4j
  • Type of bug
    Not triaged
  • Likelihood
    Not triaged
  • Effect
    Not triaged
  • Posted by
     lolo4j
  • Owned by
    Not owned by anyone
  • Category
    Server
  • Resolution
    RESOLVED
  • Priority
    High
  • Reproducibility
    Always
  • Severity
    Critical
  • Targeted for
    JPPF 5.2.9
Issue description
While performing a stress test on the driver, I was monitoring it with the admin console, and the JVM Health view showed the following deadlock:
Deadlock detected
 
- thread id 32 "JPPF NIO-0008" is waiting to lock java.util.concurrent.locks.ReentrantLock$NonfairSync@7494873c which is held by thread id 30 "JPPF NIO-0006"
- thread id 30 "JPPF NIO-0006" is waiting to lock org.jppf.nio.SelectionKeyWrapper@20bda110 which is held by thread id 29 "JPPF NIO-0005"
- thread id 29 "JPPF NIO-0005" is waiting to lock java.util.concurrent.locks.ReentrantLock$NonfairSync@7494873c which is held by thread id 30 "JPPF NIO-0006"
 
Stack trace information for the threads listed above
 
"JPPF NIO-0008" - 32 - state: WAITING - blocked count: 5932 - blocked time: 2065 - wait count: 247708 - wait time: 864535
  at sun.misc.Unsafe.park(Native Method)
  - waiting on java.util.concurrent.locks.ReentrantLock$NonfairSync@7494873c
  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
  at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
  at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
  at org.jppf.server.queue.JPPFPriorityQueue.addBundle(JPPFPriorityQueue.java:99)
  at org.jppf.server.nio.client.WaitingJobState.performTransition(WaitingJobState.java:87)
  at org.jppf.server.nio.client.WaitingJobState.performTransition(WaitingJobState.java:34)
  at org.jppf.nio.StateTransitionTask.run(StateTransitionTask.java:79)
  - locked org.jppf.nio.SelectionKeyWrapper@358ddcb3
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
 
  Locked ownable synchronizers:
  - java.util.concurrent.ThreadPoolExecutor$Worker@7074c91e
 
"JPPF NIO-0006" - 30 - state: BLOCKED - blocked count: 6280 - blocked time: 212246 - wait count: 263218 - wait time: 669040
  at org.jppf.server.nio.client.CompletionListener.taskCompleted(CompletionListener.java:85)
  - waiting on org.jppf.nio.SelectionKeyWrapper@20bda110
  at org.jppf.server.protocol.ServerTaskBundleClient.fireTasksCompleted(ServerTaskBundleClient.java:393)
  at org.jppf.server.protocol.ServerTaskBundleClient.resultReceived(ServerTaskBundleClient.java:245)
  at org.jppf.server.protocol.ServerJob.postResultsReceived(ServerJob.java:165)
  at org.jppf.server.protocol.ServerJob.resultsReceived(ServerJob.java:132)
  at org.jppf.server.protocol.ServerTaskBundleNode.resultsReceived(ServerTaskBundleNode.java:197)
  at org.jppf.server.nio.nodeserver.WaitingResultsState.processResults(WaitingResultsState.java:151)
  at org.jppf.server.nio.nodeserver.WaitingResultsState.process(WaitingResultsState.java:87)
  at org.jppf.server.nio.nodeserver.WaitingResultsState.performTransition(WaitingResultsState.java:67)
  at org.jppf.server.nio.nodeserver.WaitingResultsState.performTransition(WaitingResultsState.java:43)
  at org.jppf.nio.StateTransitionTask.run(StateTransitionTask.java:79)
  - locked org.jppf.nio.SelectionKeyWrapper@41f49664
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
 
  Locked ownable synchronizers:
  - java.util.concurrent.locks.ReentrantLock$NonfairSync@7494873c
  - java.util.concurrent.ThreadPoolExecutor$Worker@1c3590e
 
"JPPF NIO-0005" - 29 - state: WAITING - blocked count: 6000 - blocked time: 1502 - wait count: 256419 - wait time: 877563
  at sun.misc.Unsafe.park(Native Method)
  - waiting on java.util.concurrent.locks.ReentrantLock$NonfairSync@7494873c
  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
  at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
  at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
  at org.jppf.server.queue.JPPFPriorityQueue.addBundle(JPPFPriorityQueue.java:99)
  at org.jppf.server.nio.client.WaitingJobState.performTransition(WaitingJobState.java:87)
  at org.jppf.server.nio.client.WaitingJobState.performTransition(WaitingJobState.java:34)
  at org.jppf.nio.StateTransitionTask.run(StateTransitionTask.java:79)
  - locked org.jppf.nio.SelectionKeyWrapper@20bda110
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
 
  Locked ownable synchronizers:
  - java.util.concurrent.ThreadPoolExecutor$Worker@7c52859c
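The cycle above is a classic lock-ordering inversion: one thread holds the queue's ReentrantLock while waiting for a channel's monitor, while another thread holds a channel's monitor while waiting for the queue's lock. The following self-contained sketch reproduces the same pattern with hypothetical stand-in names (`queueLock`, `channelMonitor` are illustrative, not JPPF classes) and detects it the same way the JVM Health view does, via `ThreadMXBean.findDeadlockedThreads()`:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class DeadlockSketch {
  static final ReentrantLock queueLock = new ReentrantLock(); // stands in for the job queue's lock
  static final Object channelMonitor = new Object();          // stands in for a SelectionKeyWrapper monitor

  public static void main(String[] args) throws Exception {
    // Latch guarantees each thread holds its first resource before trying the second.
    CountDownLatch bothHold = new CountDownLatch(2);

    Thread a = new Thread(() -> {          // like "JPPF NIO-0006": holds the lock, wants the monitor
      queueLock.lock();
      try {
        bothHold.countDown();
        awaitQuietly(bothHold);
        synchronized (channelMonitor) { }  // blocks forever: thread b holds the monitor
      } finally {
        queueLock.unlock();
      }
    });
    Thread b = new Thread(() -> {          // like "JPPF NIO-0005": holds the monitor, wants the lock
      synchronized (channelMonitor) {
        bothHold.countDown();
        awaitQuietly(bothHold);
        queueLock.lock();                  // blocks forever: thread a holds the lock
        queueLock.unlock();
      }
    });
    a.setDaemon(true);                     // daemon threads let the JVM exit despite the deadlock
    b.setDaemon(true);
    a.start();
    b.start();

    Thread.sleep(500);                     // give both threads time to block on their second resource
    ThreadMXBean mx = ManagementFactory.getThreadMXBean();
    // findDeadlockedThreads() covers cycles mixing monitors and ownable synchronizers (ReentrantLock).
    long[] ids = mx.findDeadlockedThreads();
    System.out.println("deadlocked threads: " + (ids == null ? 0 : ids.length));
  }

  static void awaitQuietly(CountDownLatch latch) {
    try { latch.await(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
  }
}
```

The usual fix for this class of bug is to impose a single acquisition order (or to release one resource before acquiring the other), so that no cycle can form.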
Steps to reproduce this issue
1) configure a driver with "rl2" load-balancer, with the following parameters:
performanceCacheSize = 5000
performanceVariationThreshold = 0.92
minSamples = 1500
maxSamples = 5000
maxRelativeSize = 0.9
2) start 10 nodes

3) start the admin console

4) start a client with a pool of 10 connections to the driver and use a job streaming pattern to submit 100,000 jobs of 100 tasks each, where each task just sleeps for 1 millisecond

==> after a few thousand jobs are processed, the admin console shows a deadlock in the driver
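For reference, the load-balancer settings from step 1 would look roughly like this in the driver's configuration file, assuming JPPF 5.x property names (the profile name `rl2_profile` is illustrative):

```properties
# driver configuration - "rl2" adaptive load-balancing algorithm
jppf.load.balancing.algorithm = rl2
jppf.load.balancing.profile = rl2_profile
jppf.load.balancing.profile.rl2_profile.performanceCacheSize = 5000
jppf.load.balancing.profile.rl2_profile.performanceVariationThreshold = 0.92
jppf.load.balancing.profile.rl2_profile.minSamples = 1500
jppf.load.balancing.profile.rl2_profile.maxSamples = 5000
jppf.load.balancing.profile.rl2_profile.maxRelativeSize = 0.9
```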

#6
Comment posted by lolo4j on Sep 30, 18:31
Fixed in: