Configuring a JPPF server
1 Basic network configuration
The server network communication mechanism uses TCP/IP to do its basic work of receiving jobs and dispatching them for execution, over plain connections, secure connections, or both. Each type of connection requires the configuration of a dedicated TCP port. In the configuration file, these properties are defined as follows, with their default values:
# JPPF server port for plain connections; default value is 11111
jppf.server.port = 11111
# JPPF server port for secure connections via SSL/TLS; default value is -1
jppf.ssl.server.port = 11143
Note 1: secure connectivity is disabled by default, therefore you must explicitly configure the secure port to enable it.
Note 2: when the port number is set to 0, JPPF will dynamically allocate a valid port number. This feature is mostly useful when server discovery is enabled, since the port number will not be known in advance to connecting nodes and clients.
2 Server JVM options
A JPPF server is in fact made of two processes: a “controller” process and a “server” process. The controller launches the server as a separate process and watches its exit code. If the exit code has the pre-defined value 2, the controller restarts the server process; otherwise it simply terminates. This mechanism allows the remote (and possibly delayed) restart of a server using the management APIs or the management console. It also ensures that, if either of the two processes dies unexpectedly, the other process dies as well, leaving no lingering Java process in the OS.
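The watch-and-restart loop can be pictured with the following minimal sketch. It is not the actual JPPF controller code; in particular, the command line used to launch the server process is only a placeholder.

// Minimal sketch, NOT the actual JPPF controller: launch the server process,
// wait for it to exit, and relaunch it whenever the exit code is 2.
import java.util.Arrays;
import java.util.List;

public class ControllerSketch {
  private static final int RESTART_EXIT_CODE = 2;

  public static void main(String[] args) throws Exception {
    // placeholder command line for the server process
    List<String> command = Arrays.asList("java", "-cp", "config:lib/*", "org.jppf.server.JPPFDriver");
    int exitCode;
    do {
      Process server = new ProcessBuilder(command)
          .inheritIO()   // forward the server's standard output and error to the controller's
          .start();
      exitCode = server.waitFor();
    } while (exitCode == RESTART_EXIT_CODE); // any other exit code terminates the controller as well
  }
}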
The server process inherits the following parameters from the controller process:
- location of the JPPF configuration (-Djppf.config or -Djppf.config.plugin)
- location of Log4j configuration (-Dlog4j.configuration)
- current directory
- environment variables
- Java class path
It is possible to specify additional JVM parameters for the server process, using the configuration property jppf.jvm.options, as in this example:
jppf.jvm.options = -Xms64m -Xmx512m
Here is another example with remote debugging options:
jppf.jvm.options = -server -Xmx512m \
  -Xrunjdwp:transport=dt_socket,address=localhost:8000,server=y,suspend=n
It is possible to specify additional class path elements through this property, by adding one or more “-cp” or “-classpath” options (unlike the Java command which only accepts one). For example:
jppf.jvm.options = -cp lib/myJar1.jar:lib/myJar2.jar -Xmx512m -classpath lib/external/externalJar.jar
This syntax allows configuring multiple paths in an OS-independent way, in particular with regards to the path separator character (e.g. ':' on Linux, ';' on Windows).
If a classpath element contains one or more spaces, the path(s) it defines must be surrounded with double quotes:
jppf.jvm.options = -Xmx512m -cp "dir with spaces/myJar1.jar" -cp NoSpaces/myJar2.jar
3 Specifying the path to the JVM
It is possible to choose which JVM will run a driver, by specifying the full path to the Java executable with the following property:
# Full path to the java executable
jppf.java.path = <path_to_java_executable>
# Linux example
jppf.java.path = /opt/jdk1.8.0/bin/java
# Windows example
jppf.java.path = C:/java/jdk1.7.0/bin/java.exe
This property is used by the startup scripts of the driver distribution (startDriver.sh or startDriver.bat), as well as when a driver is restarted with the JPPFDriverAdminMBean.restartShutdown() management method or from the administration console.
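As an illustration, such a restart can be triggered programmatically through the management API. The sketch below is hedged: it assumes the JMXDriverConnectionWrapper class is used to reach the driver's MBean, and "driver_host" and 11198 are placeholders for the driver's management host and port.

// Hedged sketch: trigger a driver restart through the JPPF management API.
import org.jppf.management.JMXDriverConnectionWrapper;

public class RestartDriver {
  public static void main(String[] args) throws Exception {
    // "driver_host" and 11198 are placeholders for the driver's management host and port
    JMXDriverConnectionWrapper jmx = new JMXDriverConnectionWrapper("driver_host", 11198);
    jmx.connectAndWait(5_000L);       // wait up to 5 seconds for the JMX connection
    jmx.restartShutdown(0L, 1_000L);  // shut down immediately, restart 1 second later
    jmx.close();
  }
}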
4 Server discovery through UDP multicast
By default, JPPF nodes and clients are configured to automatically discover active servers on the network. This is made possible because, by default, a JPPF server will broadcast the required information (i.e. host address and port numbers) using the UDP multicast mechanism.
4.1 Enabling and disabling UDP multicast
This is done with the following property, which defaults to true (enabled):
# Enable or disable broadcast of the JPPF driver's information via UDP multicast
jppf.discovery.enabled = true
4.2 Configuration of UDP multicast
The configuration is done by defining a multicast group and port number, as in this example showing their default values:
# UDP multicast group to which the driver broadcasts its connection parameters
jppf.discovery.group = 230.0.0.1
# UDP multicast port to which the driver broadcasts its connection parameters
jppf.discovery.port = 11111
4.3 Broadcast interval
Since the UDP protocol offers no guarantee of delivery, the JPPF driver will periodically broadcast its connection information, at regular intervals defined with the following property:
# How long a driver should wait between 2 broadcasts, in millis
jppf.discovery.broadcast.interval = 1000
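To make the mechanism concrete, the following standalone sketch joins the default multicast group and waits for one broadcast datagram. This is not a JPPF API: the payload format is internal to JPPF and is decoded by nodes and clients, so the sketch only prints the packet size and sender.

// Illustrative sketch of the UDP multicast mechanism, using the default group and port.
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.MulticastSocket;

public class DiscoveryListenerSketch {
  public static void main(String[] args) throws Exception {
    try (MulticastSocket socket = new MulticastSocket(11111)) {
      socket.joinGroup(InetAddress.getByName("230.0.0.1"));
      byte[] buffer = new byte[1024];
      DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
      socket.receive(packet); // blocks until a driver broadcast is received
      System.out.println("received " + packet.getLength() + " bytes from " + packet.getAddress());
    }
  }
}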
4.4 Inclusion and exclusion patterns
The driver can be configured to allow or exclude broadcasting on specific network interfaces, according to their IP addresses. The following properties define inclusion and exclusion patterns for IPv4 and IPv6 addresses, and thus provide a means of controlling which network interfaces the driver broadcasts on. Each of these properties defines a list of comma- or semicolon-separated patterns. The IPv4 patterns can be expressed either in CIDR notation or in the syntax defined in the Javadoc for the class IPv4AddressPattern. Similarly, IPv6 patterns can be expressed in CIDR notation or in the syntax defined in IPv6AddressPattern. This enables filtering out unwanted IP addresses: the discovery mechanism will only allow addresses that are included and not excluded.
# IPv4 address inclusion patterns
jppf.discovery.broadcast.include.ipv4 =
# IPv4 address exclusion patterns
jppf.discovery.broadcast.exclude.ipv4 =
# IPv6 address inclusion patterns
jppf.discovery.broadcast.include.ipv6 =
# IPv6 address exclusion patterns
jppf.discovery.broadcast.exclude.ipv6 =
Let's take for instance the following pattern specifications:
jppf.discovery.include.ipv4 = 192.168.1.
jppf.discovery.exclude.ipv4 = 192.168.1.128-
The equivalent patterns in CIDR notation would be:
jppf.discovery.include.ipv4 = 192.168.1.0/24
jppf.discovery.exclude.ipv4 = 192.168.1.128/25
The inclusion pattern only allows IP addresses in the range 192.168.1.0 ... 192.168.1.255. The exclusion pattern filters out IP addresses in the range 192.168.1.128 ... 192.168.1.255. Thus, we actually defined a filter that only accepts addresses in the range 192.168.1.0 ... 192.168.1.127.
These 2 patterns can in fact be rewritten as a single inclusion pattern:
jppf.discovery.include.ipv4 = 192.168.1.-127
or, in CIDR notation:
jppf.discovery.include.ipv4 = 192.168.1.0/25
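For a quick sanity check, a pattern can also be evaluated programmatically. The sketch below is an assumption-laden illustration: it assumes the org.jppf.net.IPv4AddressPattern class mentioned above exposes a constructor taking the pattern string and a matches(InetAddress) method, as described in its Javadoc.

// Hedged sketch: verifying which addresses the single inclusion pattern accepts.
import java.net.InetAddress;
import org.jppf.net.IPv4AddressPattern;

public class PatternCheck {
  public static void main(String[] args) throws Exception {
    IPv4AddressPattern include = new IPv4AddressPattern("192.168.1.-127");
    System.out.println(include.matches(InetAddress.getByName("192.168.1.64")));   // expected: true
    System.out.println(include.matches(InetAddress.getByName("192.168.1.200")));  // expected: false
  }
}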
5 Connecting to other servers
We have seen in the "reminder" section that servers can connect to each other, up to a full-fledged peer-to-peer topology. When a server A connects to another server B, A will act as a node attached to B (from B's perspective). The benefit is that, when server A is connected to server B, B will be able to offload some of its workload to server A, for example when all nodes attached to B are already busy.
There are 4 possible kinds of connectivity between 2 servers:
- A and B are not connected at all
- A is connected to B (i.e. A acts as a node attached to B)
- B is connected to A (i.e. B acts as a node attached to A)
- A and B are connected to each other
Because of this flexibility, it is possible to define any type of topology made of JPPF drivers, up to fully connected P2P topologies.
5.1 Orphan servers
By default, when a server doesn't have any attached node (i.e. it is an orphan server), the peer servers it is connected to will not send it any job. Servers are automatically notified of the number of nodes attached to their peers, and will start sending jobs their way as soon as they have at least one node.
To force a server to send jobs to its peers even when they don't have any node, for instance if you wish to use a server as a router to other servers, you can set the following property in its configuration:
# ignore the fact that peer servers may not have any node; defaults to false
jppf.peer.allow.orphans = true
5.2 Configuring peer connections manually
This will be best illustrated with an example configuration:
# define a space-separated list of peers to connect to
jppf.peers = server_1 server_2

# connection to server_1
jppf.peer.server_1.server.host = host_1
jppf.peer.server_1.server.port = 11111
jppf.peer.server_1.pool.size = 2
# enable heartbeat-based connection failure detection
jppf.peer.server_1.recovery.enabled = true

# connection to server_2
jppf.peer.server_2.server.host = host_2
jppf.peer.server_2.server.port = 11111
jppf.peer.server_2.pool.size = 2
To connect to each peer, we must define its IP address or host name as well as a port number. Please note that the value we have defined for "jppf.peer.server_1.server.port" must be the same as the one defined for "jppf.server.port" in server_1's configuration, and likewise the value of "jppf.peer.server_2.server.port" must be equal to that of "jppf.server.port" in server_2's configuration.
As with auto-discovered servers, it is possible to specify the number of connections to each manually configured peer server with the "jppf.peer.<peer_name>.pool.size" property, which defaults to 1 if unspecified.
5.3 Discovering peer drivers via UDP multicast
In this scenario, we must enable the discovery of peer servers:
# Enable or disable auto-discovery of other peer servers (defaults to false)
jppf.peer.discovery.enabled = true
# number of connections to establish with each discovered server, aka pool size; defaults to 1
jppf.peer.pool.size = 2
# enable heartbeat-based connection failure detection
jppf.peer.recovery.enabled = true
For this to work, the server broadcast must be enabled on the peer server(s), and the properties defined in the previous "server discovery" section will be used, hence they must be set to the same values on the other server(s). A server can discover other servers without having to broadcast its own connection information (i.e. without being "discoverable").
Please note that the default value for "jppf.peer.discovery.enabled" is "false". Setting the default to "true" would cause each server to connect to all other servers accessible on the network, with a high risk of unwanted side effects.
It is also possible to define more than one connection with each discovered peer driver, by setting the property "jppf.peer.pool.size" to the desired number of connections. If this property is unspecified, it will default to 1.
5.4 Using manual configuration and server discovery together
It is possible to use the manual configuration together with the UDP multicast discovery, by adding a special driver name, “jppf_discovery”, to the list of manually configured peers:
# enable auto-discovery of other peer servers
jppf.peer.discovery.enabled = true
# specify both discovery and manually configured drivers
jppf.peers = jppf_discovery server_1
# connection to server_1
jppf.peer.server_1.server.host = host_1
jppf.peer.server_1.server.port = 11111
5.5 Peer drivers load-balancing threshold
It is possible to configure a driver, such that it will start load-balancing its workload to other drivers only when it has less than a specified number of attached nodes. This is done with the following configuration property:
# Load-balance to peer drivers when the number of connected nodes is less than 3
jppf.peers.load.balance.threshold = 3
The default value of this property is Integer.MAX_VALUE, which is the closest equivalent to an infinite threshold. This default value means that the driver will always load-balance to other drivers.
On the other hand, a value of 1 or less means that the driver will never load-balance to other peer drivers, which is another way of saying that the other drivers are there for failover only.
6 JMX management configuration
JPPF uses JMX to provide remote management capabilities for the servers, and uses its own JMX connector for communication. The management features are enabled by default; this behavior can be changed by setting the following property:
# Enable or disable management of this server
jppf.management.enabled = true
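When management is enabled, the driver can be queried or controlled remotely. The hedged sketch below assumes the JMXDriverConnectionWrapper API is used for this purpose and that the driver MBean exposes an nbNodes() method; "driver_host" and 11198 are placeholders for the driver's management host and port.

// Hedged sketch: query a managed driver for the number of attached nodes.
import org.jppf.management.JMXDriverConnectionWrapper;

public class DriverInfo {
  public static void main(String[] args) throws Exception {
    JMXDriverConnectionWrapper jmx = new JMXDriverConnectionWrapper("driver_host", 11198);
    jmx.connectAndWait(5_000L);  // wait up to 5 seconds for the JMX connection
    System.out.println("nodes attached to the driver: " + jmx.nbNodes());
    jmx.close();
  }
}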
7 Load-balancing
The distribution of the tasks to the nodes is performed by the JPPF driver. This work is actually the main factor in the observed performance of the framework. It consists essentially in determining how many tasks will go to each node for execution, out of a set of tasks, or job, sent by the client application. Each set of tasks sent to a node is called a "task bundle", and the role of the load-balancing (or task scheduling) algorithm is to optimize the performance by adjusting the number of tasks sent to each node.
7.1 General configuration
The algorithm to use is configured with the following property:
jppf.load.balancing.algorithm = <algorithm_name>
The algorithm name can be one of those predefined in JPPF, or a user-defined one. JPPF has a number of predefined load-balancing algorithms to compute the distribution of tasks to the nodes, each with its own configuration parameters.
The predefined possible values for the property jppf.load.balancing.algorithm are: manual, autotuned, proportional, rl2 and nodethreads. If not specified, the algorithm defaults to manual. For example:
jppf.load.balancing.algorithm = proportional
Each algorithm uses its own set of parameters, which together define a profile for the algorithm. A profile has a name that serves to identify a group of parameters and their values, using the following pattern:
jppf.load.balancing.profile = <profile_name>
jppf.load.balancing.profile.<profile_name>.<parameter_1> = <value_1>
...
jppf.load.balancing.profile.<profile_name>.<parameter_n> = <value_n>
Using this, you can define multiple profiles and easily switch from one to the other, by simply changing the value of jppf.load.balancing.profile. It is also possible to mix, in a single profile, the parameters for multiple algorithms; however this is not recommended, as there may be name collisions.
7.2 Predefined algorithms
7.2.1 “manual” algorithm
With this algorithm, each bundle has a fixed number of tasks, meaning that each node will receive at most this number of tasks. This is equivalent to performing a round-robin assignment of the tasks to the nodes.
# algorithm name
jppf.load.balancing.algorithm = manual
# name of the set of parameter values or profile for the algorithm
jppf.load.balancing.profile = manual_profile
# "manual" profile
jppf.load.balancing.profile.manual_profile.size = 1
7.2.2 “autotuned” algorithm
This is an adaptive heuristic algorithm based on a simulated annealing technique. "Adaptive" means that the number of tasks sent to each node varies, depending on the node's past performance and the nature of the workload.
# algorithm name
jppf.load.balancing.algorithm = autotuned
# name of the set of parameter values or profile for the algorithm
jppf.load.balancing.profile = autotuned_profile
# "autotuned" profile
jppf.load.balancing.profile.autotuned_profile.size = 5
jppf.load.balancing.profile.autotuned_profile.minSamplesToAnalyse = 100
jppf.load.balancing.profile.autotuned_profile.minSamplesToCheckConvergence = 50
jppf.load.balancing.profile.autotuned_profile.maxDeviation = 0.2
jppf.load.balancing.profile.autotuned_profile.maxGuessToStable = 50
jppf.load.balancing.profile.autotuned_profile.sizeRatioDeviation = 1.5
jppf.load.balancing.profile.autotuned_profile.decreaseRatio = 0.2
7.2.3 “proportional” algorithm
This is an adaptive algorithm based on the contribution of each node to the overall mean task execution time.
# algorithm name
jppf.load.balancing.algorithm = proportional
# name of the set of parameter values or profile for the algorithm
jppf.load.balancing.profile = proportional_profile
# "proportional" profile
jppf.load.balancing.profile.proportional_profile.initialSize = 5
jppf.load.balancing.profile.proportional_profile.performanceCacheSize = 1000
jppf.load.balancing.profile.proportional_profile.proportionalityFactor = 1
7.2.4 “rl2” algorithm
This is an adaptive algorithm based on an artificial intelligence technique called “reinforcement learning”.
# algorithm name
jppf.load.balancing.algorithm = rl2
# name of the set of parameter values or profile for the algorithm
jppf.load.balancing.profile = rl2_profile
# "rl2" profile
jppf.load.balancing.profile.rl2_profile.performanceCacheSize = 1000
jppf.load.balancing.profile.rl2_profile.performanceVariationThreshold = 0.75
jppf.load.balancing.profile.rl2_profile.minSamples = 20
jppf.load.balancing.profile.rl2_profile.maxSamples = 100
jppf.load.balancing.profile.rl2_profile.maxRelativeSize = 0.5
7.2.5 “nodethreads” algorithm
With this algorithm, each node will receive at most n * m tasks, where n is the number of processing threads in the node and m is a user-defined parameter named "multiplicator". Note that the number of processing threads of a node can be changed dynamically through the JPPF management features, in which case the algorithm will be notified and adapt accordingly.
# algorithm name
jppf.load.balancing.algorithm = nodethreads
# name of the set of parameter values or profile for the algorithm
jppf.load.balancing.profile = nodethreads_profile
# means that multiplicator * nbThreads tasks will be sent to each node
jppf.load.balancing.profile.nodethreads_profile.multiplicator = 1
7.3 Load-balancing documentation references
For a detailed explanation of the load-balancing in JPPF, its APIs and the predefined algorithms, please refer to the Load Balancing section of the manual.
Defining a custom algorithm is described in the "creating a custom load-balancer" section of this manual.
8 Configuring a local node
Each JPPF driver can run a single node in its own JVM, called a “local node”. The main advantage is that the communication between server and node is much faster, since the network overhead is removed. This is particularly useful if you intend to create a pure P2P topology, where all servers communicate with each other and only one node is attached to each server.
To enable a local node in the driver, use the following configuration property, which defaults to “false”:
jppf.local.node.enabled = true
Note: since the communication between the driver and a local node does not go over the network, the SSL configuration does not apply to a local node.
9 Heartbeat-based connection failure detection
Network disconnections due to hardware failures are notoriously difficult to detect, let alone recover from. JPPF implements a configurable heartbeat mechanism that enables detecting such failures, and recovering from them, in a reasonable time frame. This mechanism works as follows:
- the JPPF node or client - designated here as heartbeat client - establishes a specific connection to the server, dedicated to failure detection
- at connection time, a handshake protocol takes place, where the heartbeat client communicates a unique id to the server, that can be correlated to other connections for this heartbeat client (job data channel, distributed class loader)
- at regular intervals (heartbeats), the server will send a very short message to the heartbeat client, who will acknowledge it by sending a short response of its own
- if the heartbeat client's response is not received within a specified time frame (heartbeat timeout) a specified number of times in a row (heartbeat retries), the server will consider the connection broken, close it cleanly along with the associated connections, and handle the recovery, such as requeuing the tasks that were being executed
- on the heartbeat client side, if no message is received from the server for a time greater than heartbeat_timeout * heartbeat_retries, then it will close its connection to the server and attempt to reconnect.
In practice, the polling of the heartbeat clients is performed by a “reaper” object that handles the querying of the nodes, using a pool of dedicated threads rather than one thread per node. This enables a higher scalability with a large number of nodes or clients.
The ability to specify multiple attempts at getting a response from the node is useful to handle situations where the network is slow, or when the node or server is busy with a high CPU utilization level. On the server side, the parameters of this mechanism are configurable via the following properties:
# Enable recovery from hardware failures on the nodes. Defaults to false (disabled).
jppf.recovery.enabled = true
# Maximum number of attempts to get a response from the node before the
# connection is considered broken. Default value is 3.
jppf.recovery.max.retries = 3
# Maximum time in milliseconds allowed for each attempt to get a response
# from the node. Default value is 15000 (15 seconds).
jppf.recovery.read.timeout = 15000
# Number of threads allocated to the reaper; defaults to the number of available CPUs
jppf.recovery.reaper.pool.size = 8
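The sketch below is purely illustrative of the timeout and retries logic described above; it is not the JPPF implementation, and the HeartbeatConnection interface is a hypothetical stand-in for a heartbeat channel to a node or client.

// Illustrative sketch only: a scheduled "reaper" checks every heartbeat connection,
// and declares a connection broken after max.retries consecutive missed responses,
// each awaited for at most read.timeout milliseconds.
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ReaperSketch {
  // hypothetical abstraction of one heartbeat channel to a node or client
  interface HeartbeatConnection {
    boolean pingAndAwaitResponse(long timeoutMillis); // send a heartbeat, true if acknowledged in time
    void closeAndRecover();                           // close the channel and requeue its tasks
  }

  static final int MAX_RETRIES = 3;          // mirrors jppf.recovery.max.retries
  static final long READ_TIMEOUT = 15_000L;  // mirrors jppf.recovery.read.timeout

  static void startReaper(List<HeartbeatConnection> connections, int poolSize) {
    // poolSize mirrors jppf.recovery.reaper.pool.size: a small pool instead of one thread per node
    ScheduledExecutorService reaper = Executors.newScheduledThreadPool(poolSize);
    for (HeartbeatConnection conn : connections) {
      reaper.scheduleAtFixedRate(() -> {
        int misses = 0;
        while (misses < MAX_RETRIES && !conn.pingAndAwaitResponse(READ_TIMEOUT)) misses++;
        // a real implementation would also cancel the task once the connection is closed
        if (misses >= MAX_RETRIES) conn.closeAndRecover();
      }, 0L, READ_TIMEOUT, TimeUnit.MILLISECONDS);
    }
  }
}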
10 Redirecting the console output
In some situations, it might be desirable to redirect the standard and error output of the driver, that is, the output of System.out and System.err, to files. This can be accomplished with the following properties:
# file on the file system where System.out is redirected
jppf.redirect.out = /some/path/someFile.out.log
# whether to append to an existing file or to create a new one
jppf.redirect.out.append = false
# file on the file system where System.err is redirected
jppf.redirect.err = /some/path/someFile.err.log
# whether to append to an existing file or to create a new one
jppf.redirect.err.append = false
By default, a new file is created each time the driver is started, unless “jppf.redirect.out.append = true” or “jppf.redirect.err.append = true” are specified. If a file path is not specified, then the corresponding output is not redirected.
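For reference, the redirection boils down to replacing the JVM's standard streams, along the lines of the sketch below. This is not code you need to write in a driver; it only illustrates what the properties above achieve, reusing the same example file paths.

// Illustrative sketch: what redirecting System.out and System.err amounts to.
import java.io.FileOutputStream;
import java.io.PrintStream;

public class RedirectSketch {
  public static void main(String[] args) throws Exception {
    boolean append = false; // corresponds to jppf.redirect.out.append / jppf.redirect.err.append
    System.setOut(new PrintStream(new FileOutputStream("/some/path/someFile.out.log", append), true));
    System.setErr(new PrintStream(new FileOutputStream("/some/path/someFile.err.log", append), true));
    System.out.println("this line now goes to someFile.out.log");
  }
}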
11 Resolution of the nodes' IP addresses
You can switch on or off the DNS name resolution for the nodes connecting to this driver, with the following property:
# whether to resolve the nodes' ip addresses into host names
# defaults to true (resolve the addresses)
org.jppf.resolve.addresses = true