Sharing data among tasks: the DataProvider API
From JPPF 6.2 Documentation
After a job is submitted, the server distributes its tasks among the nodes of the JPPF grid, and more than one task is generally sent to each node. Given the communication and serialization protocols implemented in JPPF, objects referenced by multiple tasks at submission time are deserialized as multiple distinct instances at execution time in the node. This means that, if n tasks reference object A at submission time, the node will actually deserialize n copies of A: task 1 references A1, …, task n references An. If the shared object is very large, this quickly leads to excessive memory consumption.
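The duplication can be reproduced with plain Java serialization. The sketch below is not JPPF's actual protocol: it simply assumes that each task is serialized in its own stream, and it uses hypothetical Shared and Task classes. Under that assumption, every deserialized task gets its own copy of the shared object, whereas serializing all the tasks in a single stream preserves the shared reference.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Plain-Java illustration (NOT JPPF's wire protocol): a "task" serialized on its
// own stream yields its own copy of every object it references.
class SerializationCopies {

  // Hypothetical stand-in for a large object referenced by several tasks.
  static class Shared implements Serializable {
    private static final long serialVersionUID = 1L;
    final byte[] payload = new byte[1024];
  }

  // Hypothetical stand-in for a task holding a reference to the shared object.
  static class Task implements Serializable {
    private static final long serialVersionUID = 1L;
    final Shared shared;
    Task(Shared shared) { this.shared = shared; }
  }

  // Serializes obj into a fresh stream and reads it back.
  @SuppressWarnings("unchecked")
  static <T> T roundTrip(T obj) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(obj);
      }
      try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (T) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  // Counts distinct Shared instances (by identity) referenced by the tasks.
  static int countDistinct(List<Task> tasks) {
    Set<Shared> seen = Collections.newSetFromMap(new IdentityHashMap<>());
    for (Task t : tasks) seen.add(t.shared);
    return seen.size();
  }

  // Each task serialized separately: n tasks give n copies of the shared object.
  static int copiesWhenSerializedSeparately(int n) {
    Shared a = new Shared();
    List<Task> received = new ArrayList<>();
    for (int i = 0; i < n; i++) received.add(roundTrip(new Task(a)));
    return countDistinct(received);
  }

  // All tasks in one stream: Java serialization keeps a single shared copy.
  static int copiesWhenSerializedTogether(int n) {
    Shared a = new Shared();
    ArrayList<Task> batch = new ArrayList<>();
    for (int i = 0; i < n; i++) batch.add(new Task(a));
    return countDistinct(roundTrip(batch));
  }

  public static void main(String[] args) {
    System.out.println(copiesWhenSerializedSeparately(3)); // 3
    System.out.println(copiesWhenSerializedTogether(3));   // 1
  }
}
```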
To resolve this problem, JPPF provides a mechanism called data provider that enables sharing common objects among tasks in the same job. A data provider is an instance of a class that implements the interface DataProvider. Here is the definition of this interface:
public interface DataProvider extends Metadata {
  // @deprecated: use getParameter(Object) instead
  <T> T getValue(final Object key) throws Exception;
  // @deprecated: use setParameter(Object, Object) instead
  void setValue(Object key, Object value) throws Exception;
}
As we can see, both methods of the interface are deprecated; they are kept only to preserve compatibility with applications written for JPPF versions prior to 4.0. The actual API is defined in the Metadata interface, as follows:
public interface Metadata extends Serializable {
  // Retrieve a parameter in the metadata
  <T> T getParameter(Object key);
  // Return a parameter in the metadata, or a default value if not found
  <T> T getParameter(Object key, T def);
  // Set or replace a parameter in the metadata
  void setParameter(Object key, Object value);
  // Remove a parameter from the metadata
  <T> T removeParameter(Object key);
  // Get the metadata map
  Map<Object, Object> getAll();
  // Clear all the metadata
  void clear();
}
This is indeed a basic object map interface: you can store objects and associate them with a key, then retrieve these objects using the associated key.
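To make these map semantics concrete, here is a minimal sketch of a class with the same method shapes as Metadata. SimpleMetadata is a hypothetical name, not a JPPF class, and it deliberately does not implement the real interface so that it compiles without the JPPF jars. Backing it with a java.util.Hashtable is an assumption chosen to mirror MemoryMapDataProvider.

```java
import java.util.Hashtable;
import java.util.Map;

// Hypothetical class, NOT part of JPPF: a map-backed object exposing the same
// method shapes as the Metadata interface described above.
class SimpleMetadata {
  private final Map<Object, Object> map = new Hashtable<>();

  // Returns the value for key, or null if absent (unchecked cast to the caller's type).
  @SuppressWarnings("unchecked")
  public <T> T getParameter(Object key) {
    return (T) map.get(key);
  }

  // Returns the value for key, or def if absent.
  public <T> T getParameter(Object key, T def) {
    T value = getParameter(key);
    return (value != null) ? value : def;
  }

  // Sets or replaces a value; note that Hashtable rejects null keys and values.
  public void setParameter(Object key, Object value) {
    map.put(key, value);
  }

  // Removes a value and returns it, or null if it was absent.
  @SuppressWarnings("unchecked")
  public <T> T removeParameter(Object key) {
    return (T) map.remove(key);
  }

  public Map<Object, Object> getAll() {
    return map;
  }

  public void clear() {
    map.clear();
  }

  public static void main(String[] args) {
    SimpleMetadata md = new SimpleMetadata();
    md.setParameter("myKey", "shared value");
    String v = md.getParameter("myKey");
    Integer missing = md.getParameter("other", 42);
    System.out.println(v + " " + missing); // shared value 42
  }
}
```

The generic return type lets callers skip the explicit cast, at the price of an unchecked conversion: a wrong target type only fails with a ClassCastException at the call site.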
Here is an example of using a data provider:
In the application:
MyLargeObject myLargeObject = ...;
// create a data provider backed by a Hashtable
DataProvider dataProvider = new MemoryMapDataProvider();
// store the shared object in the data provider
dataProvider.setParameter("myKey", myLargeObject);
// associate the data provider with the job
JPPFJob job = new JPPFJob(dataProvider);
job.add(new MyTask());
In the task:
import org.jppf.node.protocol.AbstractTask;
import org.jppf.task.storage.DataProvider;

public class MyTask extends AbstractTask<Object> {
  @Override
  public void run() {
    // get a reference to the job's data provider
    DataProvider dataProvider = getDataProvider();
    // retrieve the shared data
    MyLargeObject myLargeObject = dataProvider.getParameter("myKey");
    // ... use the data ...
  }
}
Note 1: the association of a data provider to each task is done automatically by JPPF and is totally transparent to the application.
Note 2: from each task's perspective, the data provider should be considered read-only. Modifications to the data provider, such as adding or modifying values, will NOT be propagated beyond the scope of the node. Hence, a data provider cannot be used as a common data store for the tasks. Its only purpose is to avoid excessive memory consumption and to improve the performance of job serialization.
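The following plain-Java simulation illustrates Note 2. It assumes, for illustration only, that the node works on a deserialized copy of the provider's content, and it uses a HashMap as a hypothetical stand-in for the data provider. Whatever a task adds to or changes in the node-side copy never reaches the client-side original.

```java
import java.io.*;
import java.util.HashMap;

// Plain-Java simulation, NOT JPPF code: the "node" receives a serialized copy
// of the client's map, so changes made on the node stay local to that copy.
class ReadOnlyCopyDemo {

  // Simulates shipping the provider content to a node by a serialization round trip.
  @SuppressWarnings("unchecked")
  static HashMap<String, String> copyToNode(HashMap<String, String> clientSide) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(clientSide);
      }
      try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (HashMap<String, String>) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    HashMap<String, String> client = new HashMap<>();
    client.put("myKey", "original");
    HashMap<String, String> node = copyToNode(client);
    node.put("myKey", "changed on node"); // a task "modifies" its provider
    node.put("newKey", "added on node");
    System.out.println(client); // {myKey=original}
  }
}
```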
In the next sub-sections, we detail the implementations of DataProvider available in the JPPF API.
1 MemoryMapDataProvider: map-based provider
MemoryMapDataProvider is a very simple implementation of the DataProvider interface. It is backed by a java.util.Hashtable<Object, Object> and can therefore be used safely from multiple concurrent threads.
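The thread safety comes from Hashtable itself, which synchronizes its individual operations. The sketch below uses a plain Hashtable rather than MemoryMapDataProvider (so it runs without JPPF): several threads store distinct keys concurrently, and no entry is lost. Only single calls are atomic, though; a compound sequence such as "get, then put if absent" would still need external synchronization.

```java
import java.util.Hashtable;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Plain-Java illustration: concurrent puts into a shared Hashtable.
class HashtableConcurrencyDemo {

  // Each of `threads` threads stores `perThread` distinct keys; returns the final size.
  static int concurrentPuts(int threads, int perThread) {
    Map<Object, Object> shared = new Hashtable<>();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int t = 0; t < threads; t++) {
      final int id = t;
      pool.execute(() -> {
        for (int i = 0; i < perThread; i++) {
          shared.put(id + ":" + i, i); // keys never collide across threads
        }
      });
    }
    pool.shutdown();
    try {
      if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
        throw new IllegalStateException("timed out");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return shared.size();
  }

  public static void main(String[] args) {
    System.out.println(concurrentPuts(4, 10_000)); // 40000
  }
}
```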
2 Data provider for non-JPPF tasks
By default, tasks whose class does not extend AbstractTask do not have access to the DataProvider set on a job. This includes tasks that implement Runnable or Callable (including those submitted through a JPPFExecutorService), tasks annotated with @JPPFRunnable, and POJO tasks.
JPPF provides a mechanism that gives such non-JPPF tasks access to the DataProvider. To this effect, the task must implement the interface DataProviderHolder, defined as follows:
package org.jppf.client.taskwrapper;

import org.jppf.task.storage.DataProvider;

// This interface must be implemented by tasks that do not extend
// AbstractTask when they need access to the job's DataProvider
public interface DataProviderHolder {
  // Set the data provider for the task
  void setDataProvider(DataProvider dataProvider);
}
Here is an example implementation:
import java.io.Serializable;
import java.util.concurrent.Callable;
import org.jppf.client.taskwrapper.DataProviderHolder;
import org.jppf.task.storage.DataProvider;

public class MyTask implements Callable<String>, Serializable, DataProviderHolder {
  // DataProvider set onto this task
  private transient DataProvider dataProvider;

  @Override
  public String call() throws Exception {
    String result = dataProvider.getParameter("myKey");
    System.out.println("got value " + result);
    return result;
  }

  // Called by the node when the task is received from the server
  @Override
  public void setDataProvider(final DataProvider dataProvider) {
    this.dataProvider = dataProvider;
  }
}
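This page does not show how the node calls setDataProvider(). One plausible shape, sketched below purely as an assumption, is an instanceof check on each task before it runs. ProviderSketch, ProviderHolderSketch and InjectionSketch are hypothetical stand-ins, not JPPF types, so that the example compiles without the JPPF library.

```java
import java.util.Hashtable;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;

// Hypothetical stand-ins, NOT the JPPF types: minimal look-alikes of
// DataProvider and DataProviderHolder so the sketch is self-contained.
interface ProviderSketch {
  Object get(Object key);
}

interface ProviderHolderSketch {
  void setDataProvider(ProviderSketch provider);
}

class InjectionSketch {

  // Assumed node-side step: hand the job's provider to every task that
  // declares, by implementing the holder interface, that it wants one.
  static void injectProvider(List<?> tasks, ProviderSketch provider) {
    for (Object task : tasks) {
      if (task instanceof ProviderHolderSketch) {
        ((ProviderHolderSketch) task).setDataProvider(provider);
      }
    }
  }

  // A non-JPPF task, analogous to MyTask above.
  static class EchoTask implements Callable<Object>, ProviderHolderSketch {
    private transient ProviderSketch provider;

    @Override
    public void setDataProvider(ProviderSketch provider) { this.provider = provider; }

    @Override
    public Object call() { return provider.get("myKey"); }
  }

  public static void main(String[] args) {
    Map<Object, Object> data = new Hashtable<>();
    data.put("myKey", "shared value");
    ProviderSketch provider = data::get;
    EchoTask task = new EchoTask();
    injectProvider(List.of(task, "not a holder"), provider); // the String is skipped
    System.out.println(task.call()); // shared value
  }
}
```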
Note that the “dataProvider” field is declared transient, to prevent the DataProvider from being serialized along with the task when the task is sent back to the server after execution. Another way to achieve this is to set the field to null at the end of the call() method, for instance in a try {} finally {} block.
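Both options can be compared with plain Java serialization. In this sketch, TransientTask and ClearingTask are hypothetical classes, and a plain Object stands in for the DataProvider. The transient field is skipped by serialization; the regular field is cleared in a finally block. In both cases, the deserialized copy carries no provider reference.

```java
import java.io.*;
import java.util.concurrent.Callable;

// Plain-Java illustration, NOT JPPF code: a java.lang.Object stands in for the
// task's DataProvider reference.
class TransientFieldDemo {

  // Option 1: the reference is transient, so serialization skips it.
  static class TransientTask implements Callable<String>, Serializable {
    private static final long serialVersionUID = 1L;
    transient Object provider = new Object();

    @Override
    public String call() { return "done"; }
  }

  // Option 2: a regular field, cleared in a finally block. Object is not
  // Serializable, so without the clearing, writeObject would throw
  // NotSerializableException.
  static class ClearingTask implements Callable<String>, Serializable {
    private static final long serialVersionUID = 1L;
    Object provider = new Object();

    @Override
    public String call() {
      try {
        return "done";
      } finally {
        provider = null; // drop the reference before the task is sent back
      }
    }
  }

  // Serializes obj into a fresh stream and reads it back.
  @SuppressWarnings("unchecked")
  static <T> T roundTrip(T obj) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(obj);
      }
      try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (T) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    TransientTask t1 = new TransientTask();
    t1.call();
    System.out.println(roundTrip(t1).provider); // null

    ClearingTask t2 = new ClearingTask();
    t2.call();
    System.out.println(roundTrip(t2).provider); // null
  }
}
```

The transient approach is harder to get wrong, since it does not depend on call() completing its finally block before the task is serialized.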