
Configuring a JPPF server

From JPPF 6.0 Documentation



1 Basic network configuration

The server network communication mechanism uses TCP/IP to do its basic work of receiving jobs and dispatching them for execution, over plain connections, secure connections, or both. Each type of connection requires the configuration of a dedicated TCP port. In the configuration file, these ports are defined as follows:

# JPPF server port for plain connections; default value is 11111
jppf.server.port = 11111

# JPPF server port for secure connections via SSL/TLS; default value is -1
jppf.ssl.server.port = 11143
Note 1: to disable either plain or secure connectivity, set the corresponding port value to -1

Note 2: secure connectivity is disabled by default, therefore you must explicitly configure the secure port to enable it

Note 3: when the port number is set to 0, JPPF will dynamically allocate a valid port number. Note that this feature is mostly useful when server discovery is enabled, since the port number will not be known in advance to connecting nodes and clients.
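
For instance, a minimal sketch of a driver restricted to secure connections only, based on the notes above (the port value is illustrative):

# disable plain connections
jppf.server.port = -1
# accept only secure connections via SSL/TLS
jppf.ssl.server.port = 11143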

2 Server JVM options

A JPPF server is in fact made of two processes: a “controller” process and a “server” process. The controller launches the server as a separate process and watches its exit code. If the exit code has the pre-defined value of 2, the controller restarts the server process; otherwise it simply terminates. This mechanism allows the remote (and possibly delayed) restart of a server using the management APIs or the management console. It is also designed so that, if either of the two processes dies unexpectedly, the other process dies as well, leaving no lingering Java process in the OS.

The server process inherits the following parameters from the controller process:

  • location of jppf configuration (-Djppf.config or -Djppf.config.plugin)
  • location of Log4j configuration (-Dlog4j.configuration)
  • current directory
  • environment variables
  • Java class path


It is possible to specify additional JVM parameters for the server process, using the configuration property jppf.jvm.options, as in this example:

jppf.jvm.options = -Xms64m -Xmx512m

Here is another example with remote debugging options:

jppf.jvm.options = -server -Xmx512m \
  -Xrunjdwp:transport=dt_socket,address=localhost:8000,server=y,suspend=n

It is possible to specify additional class path elements through this property, by adding one or more “-cp” or “-classpath” options (unlike the Java command which only accepts one). For example:

jppf.jvm.options = -cp lib/myJar1.jar -Xmx512m -classpath lib/external/externalJar.jar

This syntax allows configuring multiple paths in an OS-independent way, in particular with regard to the path separator character (e.g. ':' on Linux, ';' on Windows).

If a classpath element contains one or more spaces, the path(s) it defines must be surrounded with double quotes:

jppf.jvm.options = -Xmx512m -cp "dir with spaces/myJar1.jar" -cp NoSpaces/myJar2.jar

3 Specifying the path to the JVM

It is possible to choose which JVM will run a driver, by specifying the full path to the Java executable with the following property:

# Full path to the java executable
jppf.java.path = <path_to_java_executable>
# Linux example
jppf.java.path = /opt/jdk1.8.0/bin/java
# Windows example
jppf.java.path = C:/java/jdk1.7.0/bin/java.exe

This property is used by the shell script from the driver distribution that launches the driver (startDriver.sh or startDriver.bat) and when a driver is restarted with the JPPFDriverAdminMBean.restartShutdown() management method or from the administration console.

4 Server discovery through UDP multicast

By default, JPPF nodes and clients are configured to automatically discover active servers on the network. This is made possible because, by default, a JPPF server will broadcast the required information (i.e. host address and port numbers) using the UDP multicast mechanism.

4.1 Enabling and disabling UDP multicast

This is done with the following property, which defaults to true (enabled):

# Enable or disable broadcast of the JPPF driver's connection information via UDP multicast
jppf.discovery.enabled = true

4.2 Configuration of UDP multicast

The configuration is done by defining a multicast group and port number, as in this example showing their default values:

# UDP multicast group to which the driver broadcasts its connection parameters
jppf.discovery.group = 230.0.0.1
# UDP multicast port to which the driver broadcasts its connection parameters
jppf.discovery.port = 11111

4.3 Broadcast interval

Since the UDP protocol offers no guarantee of delivery, the JPPF driver will periodically broadcast its connection information, at regular intervals defined with the following property:

# How long a driver should wait between 2 broadcasts, in millis
jppf.discovery.broadcast.interval = 1000

4.4 Inclusion and exclusion patterns

The driver can be configured to allow or exclude broadcasting on specific network interfaces, according to their IP addresses. The following properties define inclusion and exclusion patterns for IPv4 and IPv6 addresses, and thus provide a means of controlling which network interfaces the driver broadcasts on. Each of these properties defines a list of comma- or semicolon-separated patterns. The IPv4 patterns can be expressed either in CIDR notation, or in a syntax defined in the Javadoc for the class IPv4AddressPattern. Similarly, IPv6 patterns can be expressed in CIDR notation or in a syntax defined in IPv6AddressPattern. This enables filtering out unwanted IP addresses: the discovery mechanism will only broadcast on addresses that are included and not excluded.

# IPv4 address inclusion patterns
jppf.discovery.broadcast.include.ipv4 = 
# IPv4 address exclusion patterns
jppf.discovery.broadcast.exclude.ipv4 = 
# IPv6 address inclusion patterns
jppf.discovery.broadcast.include.ipv6 = 
# IPv6 address exclusion patterns
jppf.discovery.broadcast.exclude.ipv6 = 

Let's take for instance the following pattern specifications:

jppf.discovery.broadcast.include.ipv4 = 192.168.1.
jppf.discovery.broadcast.exclude.ipv4 = 192.168.1.128-

The equivalent patterns in CIDR notation would be:

jppf.discovery.broadcast.include.ipv4 = 192.168.1.0/24
jppf.discovery.broadcast.exclude.ipv4 = 192.168.1.128/25

The inclusion pattern only allows IP addresses in the range 192.168.1.0 ... 192.168.1.255. The exclusion pattern filters out IP addresses in the range 192.168.1.128 ... 192.168.1.255. Thus, we actually defined a filter that only accepts addresses in the range 192.168.1.0 ... 192.168.1.127.

These 2 patterns can in fact be rewritten as a single inclusion pattern:

jppf.discovery.broadcast.include.ipv4 = 192.168.1.-127

or, in CIDR notation:

jppf.discovery.broadcast.include.ipv4 = 192.168.1.0/25
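
Putting these settings together, a minimal sketch of a driver's UDP discovery configuration, reusing the values from the examples above:

# broadcast this driver's connection information
jppf.discovery.enabled = true
# multicast group and port (default values)
jppf.discovery.group = 230.0.0.1
jppf.discovery.port = 11111
# broadcast once per second
jppf.discovery.broadcast.interval = 1000
# only broadcast on interfaces in the range 192.168.1.0 ... 192.168.1.127
jppf.discovery.broadcast.include.ipv4 = 192.168.1.0/25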

5 Connecting to other servers

We have seen in the "reminder" section that servers can connect to each other, up to a full-fledged peer-to-peer topology. When a server A connects to another server B, A will act as a node attached to B (from B's perspective). The benefit is that, when server A is connected to server B, B will be able to offload some of its workload to server A, for example when all nodes attached to B are already busy.

There are 4 possible kinds of connectivity between 2 servers:

  • A and B are not connected at all
  • A is connected to B (i.e. A acts as a node attached to B)
  • B is connected to A (i.e. B acts as a node attached to A)
  • A and B are connected to each other


Because of this flexibility, it is possible to define any type of topology made of JPPF drivers, up to fully connected P2P topologies.

5.1 Orphan servers

By default, when a server has no attached node (i.e. an orphan server), the peer servers it is connected to will not send it any jobs. Servers are automatically notified of the number of nodes attached to their peers, and will start sending jobs their way as soon as they have at least one node.

To force a server to send jobs to its peers even when they have no attached nodes, for instance if you wish to set up a server as a router to other servers, you can set the following property in its configuration:

# ignore the fact that peer servers may not have any node; defaults to false
jppf.peer.allow.orphans = true

5.2 Configuring peer connections manually

This will be best illustrated with an example configuration:

# define a space-separated list of peers to connect to
jppf.peers = server_1 server_2

# connection to server_1
jppf.peer.server_1.server.host = host_1
jppf.peer.server_1.server.port = 11111
jppf.peer.server_1.pool.size = 2
# enable heartbeat-based connection failure detection
jppf.peer.server_1.recovery.enabled = true

# connection to server_2
jppf.peer.server_2.server.host = host_2
jppf.peer.server_2.server.port = 11111
jppf.peer.server_2.pool.size = 2

To connect to each peer, we must define its IP address or host name as well as a port number. Please note that the value we have defined for "jppf.peer.server_1.server.port" must be the same as the one defined for "jppf.server.port" in server_1's configuration, and similarly the value of "jppf.peer.server_2.server.port" must be equal to that of "jppf.server.port" in server_2's configuration.

As with auto-discovered servers, it is possible to specify the number of connections to each manually configured peer server with the "jppf.peer.<peer_name>.pool.size" property, which defaults to 1 if unspecified.
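
As an illustration of the fourth connectivity case listed earlier (two servers connected to each other), here is a sketch where each driver declares the other as a peer; the driver names, host names and the second port number are illustrative:

# driver A, running on host_a and listening on port 11111
jppf.server.port = 11111
jppf.peers = driver_b
jppf.peer.driver_b.server.host = host_b
# must match jppf.server.port in driver B's configuration
jppf.peer.driver_b.server.port = 11112

# driver B, running on host_b and listening on port 11112
jppf.server.port = 11112
jppf.peers = driver_a
jppf.peer.driver_a.server.host = host_a
# must match jppf.server.port in driver A's configuration
jppf.peer.driver_a.server.port = 11111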

5.3 Discovering peer drivers via UDP multicast

In this scenario, we must enable the discovery of peer servers:

# Enable or disable auto-discovery of other peer servers (defaults to false)
jppf.peer.discovery.enabled = true
# number of connections to establish with each discovered server, aka pool size; defaults to 1
jppf.peer.pool.size = 2
# enable heartbeat-based connection failure detection
jppf.peer.recovery.enabled = true

For this to work, the server broadcast must be enabled on the peer server(s), and the properties defined in the previous "server discovery" section will be used, hence they must be set to the same values on the other server(s). A server can discover other servers without having to broadcast its own connection information (i.e. without being "discoverable").

Please note that the default value for "jppf.peer.discovery.enabled" is "false". Setting the default to "true" would cause each server to connect to all other servers accessible on the network, with a high risk of unwanted side effects.

It is also possible to define more than one connection with each discovered peer driver, by setting the property "jppf.peer.pool.size" to the desired number of connections. If this property is unspecified, it will default to 1.
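
For instance, a driver can discover and connect to broadcasting peers while remaining non-discoverable itself, as a sketch combining the properties of sections 4 and 5.3:

# do not broadcast this driver's own connection information
jppf.discovery.enabled = false
# still discover and connect to peer drivers that do broadcast
jppf.peer.discovery.enabled = true
jppf.peer.pool.size = 1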

5.4 Using manual configuration and server discovery together

It is possible to use the manual configuration together with the UDP multicast discovery, by adding a special driver name, “jppf_discovery”, to the list of manually configured peers:

# enable auto-discovery of other peer servers
jppf.peer.discovery.enabled = true
# specify both discovery and manually configured drivers
jppf.peers = jppf_discovery server_1
# connection to server_1
jppf.peer.server_1.server.host = host_1
jppf.peer.server_1.server.port = 11111

5.5 Peer drivers load-balancing threshold

It is possible to configure a driver such that it will start load-balancing its workload to other drivers only when it has fewer than a specified number of attached nodes. This is done with the following configuration property:

# Load-balance to peer drivers when the number of connected nodes is less than 3
jppf.peers.load.balance.threshold = 3

The default value of this property is Integer.MAX_VALUE, which is the closest equivalent to an infinite threshold. This default value means that the driver will always load-balance to other drivers.

On the other hand, a value of 1 or less means that the driver will never load-balance to other peer drivers, which is another way of saying that the other drivers are there for failover only.
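
For example, a sketch of a driver that uses a single peer purely for failover (the peer name backup_driver and host name host_backup are illustrative):

# declare a single peer, used for failover only
jppf.peers = backup_driver
jppf.peer.backup_driver.server.host = host_backup
jppf.peer.backup_driver.server.port = 11111
# never load-balance to the peer while this driver has at least one attached node
jppf.peers.load.balance.threshold = 1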

6 JMX management configuration

JPPF uses JMX to provide remote management capabilities for the servers, and uses its own JMX connector for communication. The management features are enabled by default; this behavior can be changed by setting the following property:

# Enable or disable management of this server
jppf.management.enabled = true

7 Load-balancing

The distribution of the tasks to the nodes is performed by the JPPF driver. This work is actually the main factor in the observed performance of the framework. It consists essentially in determining how many tasks will go to each node for execution, out of a set of tasks, or job, sent by the client application. Each set of tasks sent to a node is called a "task bundle", and the role of the load-balancing (or task scheduling) algorithm is to optimize performance by adjusting the number of tasks sent to each node.

7.1 General configuration

The algorithm to use is configured with the following property:

jppf.load.balancing.algorithm = <algorithm_name>

The algorithm name can be one of those predefined in JPPF, or a user-defined one. JPPF has a number of predefined load-balancing algorithms to compute the distribution of tasks to the nodes, each with its own configuration parameters.

The predefined possible values for the property jppf.load.balancing.algorithm are: manual, autotuned, proportional, rl2 and nodethreads. If not specified, the algorithm defaults to manual. For example:

jppf.load.balancing.algorithm = proportional

Each algorithm uses its own set of parameters, which together define a profile for the algorithm. A profile has a name that serves to identify a group of parameters and their values, using the following pattern:

jppf.load.balancing.profile = <profile_name>

jppf.load.balancing.profile.<profile_name>.<parameter_1> = <value_1>
...
jppf.load.balancing.profile.<profile_name>.<parameter_n> = <value_n>

Using this, you can define multiple profiles and easily switch from one to the other, by simply changing the value of jppf.load.balancing.profile. It is also possible to mix, in a single profile, the parameters for multiple algorithms; however, this is not recommended, as there may be name collisions.
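
For example, with the “manual” algorithm described below, two alternative profiles can be kept side by side and selected by changing a single line (the profile names small_bundles and large_bundles are illustrative):

jppf.load.balancing.algorithm = manual
# switch between the profiles defined below by changing this single value
jppf.load.balancing.profile = small_bundles

# "small_bundles" profile: 1 task per bundle
jppf.load.balancing.profile.small_bundles.size = 1
# "large_bundles" profile: 50 tasks per bundle
jppf.load.balancing.profile.large_bundles.size = 50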

7.2 Predefined algorithms

7.2.1 “manual” algorithm

With this algorithm, each bundle has a fixed number of tasks, meaning that each node will receive at most this number of tasks. This is equivalent to performing a round-robin assignment of the tasks to the nodes.

# algorithm name
jppf.load.balancing.algorithm = manual
# name of the set of parameter values or profile for the algorithm
jppf.load.balancing.profile = manual_profile
# "manual" profile
jppf.load.balancing.profile.manual_profile.size = 1

7.2.2 “autotuned” algorithm

This is an adaptive heuristic algorithm based on a simulated annealing technique. "Adaptive" means that the number of tasks sent to each node varies, depending on the node's past performance and the nature of the workload.

# algorithm name
jppf.load.balancing.algorithm = autotuned
# name of the set of parameter values or profile for the algorithm
jppf.load.balancing.profile = autotuned_profile
# "autotuned" profile
jppf.load.balancing.profile.autotuned_profile.size = 5
jppf.load.balancing.profile.autotuned_profile.minSamplesToAnalyse = 100
jppf.load.balancing.profile.autotuned_profile.minSamplesToCheckConvergence = 50
jppf.load.balancing.profile.autotuned_profile.maxDeviation = 0.2
jppf.load.balancing.profile.autotuned_profile.maxGuessToStable = 50
jppf.load.balancing.profile.autotuned_profile.sizeRatioDeviation = 1.5
jppf.load.balancing.profile.autotuned_profile.decreaseRatio = 0.2

7.2.3 “proportional” algorithm

This is an adaptive algorithm based on the contribution of each node to the overall mean task execution time.

# algorithm name
jppf.load.balancing.algorithm = proportional
# name of the set of parameter values or profile for the algorithm
jppf.load.balancing.profile = proportional_profile
# "proportional" profile
jppf.load.balancing.profile.proportional_profile.initialSize = 5
jppf.load.balancing.profile.proportional_profile.performanceCacheSize = 1000
jppf.load.balancing.profile.proportional_profile.proportionalityFactor = 1

7.2.4 “rl2” algorithm

This is an adaptive algorithm based on an artificial intelligence technique called “reinforcement learning”.

# algorithm name
jppf.load.balancing.algorithm = rl2
# name of the set of parameter values or profile for the algorithm
jppf.load.balancing.profile = rl2_profile
# "rl2" profile
jppf.load.balancing.profile.rl2_profile.performanceCacheSize = 1000
jppf.load.balancing.profile.rl2_profile.performanceVariationThreshold = 0.75
jppf.load.balancing.profile.rl2_profile.minSamples = 20
jppf.load.balancing.profile.rl2_profile.maxSamples = 100
jppf.load.balancing.profile.rl2_profile.maxRelativeSize = 0.5

7.2.5 “nodethreads” algorithm

With this algorithm, each node will receive at most n * m tasks, where n is the number of processing threads in the node and m is a user-defined parameter named "multiplicator". Note that the number of processing threads of a node can be changed dynamically through the JPPF management features, in which case the algorithm will be notified and adapt accordingly.

# algorithm name
jppf.load.balancing.algorithm = nodethreads
# name of the set of parameter values or profile for the algorithm
jppf.load.balancing.profile = nodethreads_profile
# means that multiplicator * nbThreads tasks will be sent to each node
jppf.load.balancing.profile.nodethreads_profile.multiplicator = 1

7.3 Load-balancing documentation references

For a detailed explanation of the load-balancing in JPPF, its APIs and the predefined algorithms, please refer to the Load Balancing section of the manual.

Defining a custom algorithm is described in the "creating a custom load-balancer" section of this manual.

8 Configuring a local node

Each JPPF driver can run a single node in its own JVM, called a “local node”. The main advantage is that the communication between server and node is much faster, since the network overhead is removed. This is particularly useful if you intend to create a pure P2P topology, where all servers communicate with each other and only one node is attached to each server.

To enable a local node in the driver, use the following configuration property, which defaults to “false”:

jppf.local.node.enabled = true
Note 1: the local node can be configured using the same properties as described in the Node Configuration section, except for the network-related properties, since no network is involved between driver and local node.

Note 2: for the same reason, the SSL configuration does not apply to a local node
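
For example, a sketch of a driver for a pure P2P topology, with its local node enabled and tuned via a node-side property; we assume here that the node property jppf.processing.threads applies as described in the Node Configuration section:

# run a single node within the driver's JVM
jppf.local.node.enabled = true
# node-side property: number of processing threads for the local node
jppf.processing.threads = 4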

9 Heartbeat-based connection failure detection

Network disconnections due to hardware failures are notoriously difficult to detect, let alone recover from. JPPF implements a configurable heartbeat mechanism that enables detecting such failures, and recovering from them, in a reasonable time frame. This mechanism works as follows:

  • the JPPF node or client - designated here as heartbeat client - establishes a specific connection to the server, dedicated to failure detection
  • at connection time, a handshake protocol takes place, where the heartbeat client communicates a unique id to the server, that can be correlated to other connections for this heartbeat client (job data channel, distributed class loader)
  • at regular intervals (heartbeats), the server will send a very short message to the heartbeat client, who will acknowledge it by sending a short response of its own
  • if the heartbeat client's response is not received within a specified time frame (heartbeat timeout) a specified number of times in a row (heartbeat retries), the server will consider the connection to be broken, close it cleanly, close the associated connections, and handle the recovery, such as requeuing the tasks that were being executed
  • on the heartbeat client side, if no message is received from the server for a time greater than heartbeat_timeout * heartbeat_retries, then it will close its connection to the server and attempt to reconnect.

In practice, the polling of the heartbeat clients is performed by a “reaper” object that handles the querying of the nodes, using a pool of dedicated threads rather than one thread per node. This enables a higher scalability with a large number of nodes or clients.

The ability to specify multiple attempts at getting a response from the node is useful to handle situations where the network is slow, or when the node or server is busy with a high CPU utilization level.

On the server side, the parameters of this mechanism are configurable via the following properties:

# Enable recovery from hardware failures on the nodes. Defaults to false (disabled).
jppf.recovery.enabled = true
# Maximum number of attempts to get a response from the node before the
# connection is considered broken. Default value is 3.
jppf.recovery.max.retries = 3
# Maximum time in milliseconds allowed for each attempt to get a response
# from the node. Default value is 15000 (15 seconds)
jppf.recovery.read.timeout = 15000
# Number of threads allocated to the reaper, defaults to the number of available CPUs
jppf.recovery.reaper.pool.size = 8

10 Redirecting the console output

In some situations, it might be desirable to redirect the standard and error output of the driver, that is, the output of System.out and System.err, to files. This can be accomplished with the following properties:

# file on the file system where System.out is redirected
jppf.redirect.out = /some/path/someFile.out.log
# whether to append to an existing file or to create a new one
jppf.redirect.out.append = false
# file on the file system where System.err is redirected
jppf.redirect.err = /some/path/someFile.err.log
# whether to append to an existing file or to create a new one
jppf.redirect.err.append = false

By default, a new file is created each time the driver is started, unless “jppf.redirect.out.append = true” or “jppf.redirect.err.append = true” are specified. If a file path is not specified, then the corresponding output is not redirected.

11 Resolution of the nodes IP addresses

You can switch on or off the DNS name resolution for the nodes connecting to this driver, with the following property:

# whether to resolve the nodes' ip addresses into host names
# defaults to true (resolve the addresses)
org.jppf.resolve.addresses = true