On the server side, the feedback(int nbTasks, double time) method is invoked on the load-balancer, where nbTasks is the size of the batch of tasks and time is the round-trip time between the server and the node for the whole batch.
This is where the problem lies. With this feedback information, all we can do is compute an approximate mean time per task, which is incorrect when the number of tasks is not a strict multiple of the number of threads in the node. In our example above, we have actually captured the elapsed time for 3 tasks executed in sequence (3 = 5 tasks / 2 threads, rounded up to the next integer when the remainder is > 0). The false assumption we make here is that this represents the node's performance capability, when in fact it merely represents the elapsed time for the tasks. A more accurate assumption is that it represents the node's performance for 2.5 tasks instead, or more formally nbTasks / nbThreads + (nbTasks % nbThreads) / nbThreads, using integer division for the first term.
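To make the arithmetic concrete, here is a small illustration of the two measures (a hypothetical helper written for this post, not part of JPPF):

```java
// Hypothetical helper illustrating the arithmetic above; not part of JPPF.
public final class TaskRounds {
  // What the raw round-trip time actually measures: the number of
  // sequential "waves" of task execution, i.e. ceil(nbTasks / nbThreads).
  public static int observedRounds(int nbTasks, int nbThreads) {
    return nbTasks / nbThreads + (nbTasks % nbThreads > 0 ? 1 : 0);
  }

  // The more accurate measure of the node's capability:
  // nbTasks / nbThreads + (nbTasks % nbThreads) / nbThreads.
  public static double effectiveRounds(int nbTasks, int nbThreads) {
    return nbTasks / nbThreads + (double) (nbTasks % nbThreads) / nbThreads;
  }
}
```

For 5 tasks on a node with 2 threads, observedRounds gives 3 while effectiveRounds gives 2.5, matching the example above.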
So, we can already see that the load-balancer is missing one piece of information: the number of threads in the node. This can be fixed easily enough, by making the load-balancer node-aware. Another problem here is that the time parameter includes the whole server-to-node-and-back round-trip, and there's no way to know which part of it represents the grid overhead (i.e. serialization/deserialization + network transport) and which part is the actual task execution time.
Thus, to provide as much accuracy as possible, I added code to the node and server so that the overhead time is communicated separately to the load-balancer, along with a new piece of data: the accumulated elapsed execution time of the tasks in the batch received by the node. I added the following interface, to be implemented by any load-balancer that wishes to receive this information:
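Based on the data described above, such an interface would receive the batch size, the full round-trip time, the accumulated task execution time and the grid overhead time. A minimal sketch (the exact parameter names and signature are my assumptions):

```java
// Sketch of the extended feedback interface described above; parameter
// names and the exact signature are assumptions, not the JPPF source.
public interface BundlerEx {
  /**
   * Extended feedback from the server to the load-balancer.
   * @param nbTasks number of tasks in the batch sent to the node.
   * @param totalTime full server-to-node-and-back round-trip time.
   * @param accumulatedElapsed sum of the execution times of the tasks in the batch.
   * @param overheadTime grid overhead: serialization/deserialization + network transport.
   */
  void feedback(int nbTasks, double totalTime, double accumulatedElapsed, double overheadTime);
}
```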
Then, I made all adaptive load-balancers (i.e. the "proportional", "autotuned" and "rl" algorithms) implement the NodeAware and BundlerEx interfaces, with the following implementation for the new feedback() method:
Note how we simply recompute the totalTime before feeding it to the "old" feedback() method, so we don't have to rewrite the code of the algorithms.
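The recomputation can be sketched as follows, applying the formula derived earlier (class, field and method names are assumptions, not the actual JPPF source):

```java
// Sketch of the recomputation described above; class and member names
// are assumptions, not the actual JPPF source.
public abstract class AdaptiveBundlerSketch {
  // Number of processing threads in the node, known because the
  // load-balancer is node-aware.
  protected final int nbThreads;

  protected AdaptiveBundlerSketch(int nbThreads) {
    this.nbThreads = nbThreads;
  }

  // The "old" feedback method implemented by the existing algorithms.
  public abstract void feedback(int nbTasks, double totalTime);

  // New extended feedback: recompute an effective total time that accounts
  // for the node's parallelism, then delegate to the old method.
  public void feedback(int nbTasks, double totalTime, double accumulatedElapsed, double overheadTime) {
    double mean = accumulatedElapsed / nbTasks;  // mean execution time per task
    // nbTasks / nbThreads + (nbTasks % nbThreads) / nbThreads "task-times",
    // as derived above, plus the grid overhead.
    double t = mean * (nbTasks / nbThreads + (double) (nbTasks % nbThreads) / nbThreads);
    feedback(nbTasks, t + overheadTime);
  }
}
```

With 5 tasks on 2 threads, an accumulated elapsed time of 50 and an overhead of 3, this feeds 10 × 2.5 + 3 = 28 to the old method, rather than the raw round-trip time.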
With this technique, I observed a much more balanced load on the nodes from the execution time perspective. In general, the number of tasks sent to the nodes is not exactly evenly balanced: for instance, with 3 nodes and the "proportional" algorithm, you will see a 4/5/6 task distribution for a job with 15 tasks. However, this provides near-optimal throughput for the overall job execution, and I also observed that the number of tasks executed by each node over time is much closer to an optimal distribution than with the former implementation of the algorithms.
So the plan is as follows: