Jonathan Passerat-Palmbach, Romain Reuillon, Mathieu Leclaire, Antonios Makropoulos, Emma C. Robinson, Sarah Parisot and Daniel Rueckert, Reproducible Large-Scale Neuroimaging Studies with the OpenMOLE Workflow Management System, published in Frontiers in Neuroinformatics Vol 11, 2017.
The CARETask runs external applications packaged with CARE. The CARE site offers an outdated version of CARE for now, but great documentation. CARE makes it possible to package your application from any Linux computer, and then re-execute it on any other Linux computer. The CARE / OpenMOLE pair is a very efficient way to distribute your application at very large scale with very little effort. Please note that this packaging step is only necessary if you plan to distribute your workflow to a heterogeneous computing environment such as the EGI grid. If you target local clusters running the same operating system and sharing a network file system, you can jump directly to the SystemExecTask.
You should first install CARE and make the binary executable:
chmod +x care
The CARETask was designed to embed native binaries such as programs compiled from C, C++ or Fortran, as well as applications written in Python, R, Scilab... Embedding an application in a CARETask happens in two steps:
First you should package your application using the CARE binary you just installed, so that it executes on any Linux environment. This usually consists in prepending your command line with:
care -o /path/to/myarchive.tgz.bin -r ~ -p /path/to/mydata1 -p /path/to/mydata2 mycommand myparam1 myparam2
Before going any further, here are a few notes about the options accepted by CARE:
-o indicates where to store the archive. At the moment, OpenMOLE prefers to work with archives carrying the .tgz.bin extension, so please don't toy with the extension ;-)
-r ~ is not compulsory, but it has proved necessary in some cases. As a rule of thumb, if you encounter problems when packaging your application, try adding or removing it.
-p /path asks CARE not to archive /path. This is particularly useful for input data that will change with your parameters. You probably do not want to embed this data in the archive, and we'll see further down how to inject the necessary input data in the archive from OpenMOLE.
Second, just provide the resulting package along with some other information to OpenMOLE. Et voilà! If you encounter any problem packaging your application, please refer to the corresponding entry in the FAQ.
One very important aspect of CARE is that you only need to package your application once. As long as the execution you use to package your application makes use of all the dependencies (libraries, packages, ...), you should not have any problem re-executing this archive with other parameters.
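The packaging command described above can be assembled in a small dry-run helper. This is only a sketch: the function name build_care_command is hypothetical, the paths are the placeholders used earlier in this page, and the helper prints the command instead of running it so that it works even where care is not installed. It only uses the -o, -r and -p options discussed above.

```shell
#!/bin/sh
# Hypothetical helper: prints a care command line without executing it.
build_care_command() {
  archive=$1
  shift
  # OpenMOLE prefers archives ending in .tgz.bin, so reject anything else early
  case "$archive" in
    *.tgz.bin) ;;
    *) echo "archive name must end in .tgz.bin" >&2; return 1 ;;
  esac
  # The tilde stays literal here because it sits inside double quotes
  cmd="care -o $archive -r ~"
  # Placeholder data paths that should NOT be embedded in the archive
  for data in /path/to/mydata1 /path/to/mydata2; do
    cmd="$cmd -p $data"
  done
  # Remaining arguments are the application's own command line
  echo "$cmd $*"
}

care_cmd=$(build_care_command /path/to/myarchive.tgz.bin mycommand myparam1 myparam2)
echo "$care_cmd"
```

Since the helper only builds a string, paths containing spaces would need proper quoting before the command is actually run.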
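Setting errorOnReturnValue to false will prevent OpenMOLE from re-scheduling a CARETask that has reported a return code different from 0. You can also get the return code in a variable using the returnValue setting. The standard and error outputs can likewise be assigned to OpenMOLE variables using the stdOut and stdErr actions on the task.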
Variables from the dataflow can be injected as environment variables using the environmentVariable += (variable, "variableName") field. If no name is specified, the environment variable is named after the OpenMOLE variable. Environment variables injected from the dataflow are added to the pre-existing set of environment variables from the execution host. This proves particularly useful to preserve the behaviour of some toolkits when executed on local environments (ssh, clusters, ...) where users control their work environment. The following snippet creates a task that employs the features described in this section:
// Declare the variables
val output = Val[String]
val error = Val[String]
val value = Val[Int]

// Any task
val pythonTask =
  CARETask("hello.tgz.bin", "python hello.py") set (
    stdOut := output,
    stdErr := error,
    returnValue := value,
    environmentVariable += (value, "I_AM_AN_ENV_VAR")
  )

You will note that options holding a single value are set using the := operator. Also, the OpenMOLE variables containing the standard and error outputs are automatically marked as outputs of the task, and must not be added to the task's outputs. Options are applied with the set operator on a freshly defined task.
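The way an injected variable joins the environment already present on the execution host can be pictured with a plain shell analogy. This is not OpenMOLE code: HOST_TOOLKIT_HOME is a hypothetical variable standing in for something the user configured on the host, and env plays the role of the injection.

```shell
#!/bin/sh
# Hypothetical variable already defined on the execution host
export HOST_TOOLKIT_HOME="/opt/toolkit"

# Inject one extra variable for the child process only;
# the child still sees everything the host had defined.
child_env=$(env I_AM_AN_ENV_VAR=42 sh -c 'printf "%s %s" "$HOST_TOOLKIT_HOME" "$I_AM_AN_ENV_VAR"')

echo "$child_env"   # prints "/opt/toolkit 42"
```

The child sees both values, which is the behaviour that lets toolkits relying on a user-controlled environment keep working.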
The available options are described hereafter:
val out = Val[Int]

val careTask =
  CARETask("care.tgz.bin", "executable arg1 arg2 /path/to/my/file /virtual/path arg4") set (
    hostFiles += ("/path/to/my/file"),
    customWorkDirectory := "/tmp",
    returnValue := out
  )
hostFiles += ("/etc/hosts"), or with a specific path: hostFiles += ("/etc/bash.bashrc", "/home/foo/.bashrc")
environmentVariables += ("VARIABLE1", "42")
workDirectory := "/tmp"
returnValue := out
errorOnReturnValue := false
stdOut := output
stdErr := error
Multiple hostFiles entries can be used within the same set block.
The hostFiles option takes the path of a file on the execution host and binds it to the same path in the CARE filesystem. Optionally, you can provide a second argument to specify the path explicitly. For instance:
val careTask =
  CARETask("care.tgz.bin", "executable arg1 arg2 /path/to/my/file /virtual/path arg4") set (
    hostFiles += ("/path/to/my/file"),
    hostFiles += ("/path/to/another/file", "/virtual/path")
  )
The CARETask will thus have access to /path/to/my/file and /virtual/path.
The SystemExecTask is made to launch native commands on the execution host. There are two modes for using this task:
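calling a command that is assumed to be available on the execution host, or copying a local script or application to the execution host using the resources field. Please note that, contrary to the CARETask, there is no guarantee that an application passed as a resource to a SystemExecTask will re-execute successfully on a remote environment.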
The SystemExecTask accepts an arbitrary number of commands. These commands will be executed sequentially on the same execution node where the task is instantiated. In other words, it is not possible to split the execution of multiple commands grouped in the same SystemExecTask. The following example first copies and runs a bash script on the remote host, before calling the remote host's /bin/hostname. Both commands' standard and error outputs are gathered and concatenated, each into a single OpenMOLE variable: respectively output and error.
// Declare the variables
val output = Val[String]
val error = Val[String]

// Any task
val scriptTask =
  SystemExecTask("bash script.sh", "hostname") set (
    resources += workDirectory / "script.sh",
    stdOut := output,
    stdErr := error
  )

scriptTask hook ToStringHook()

In this case the bash script might depend on applications installed on the remote host. Similarly, we assume the presence of /bin/hostname on the execution node. Therefore this task cannot be considered portable. Note that each execution is isolated in a separate folder on the execution host, and that the task execution is considered failed if the script returns a value different from 0. If you need another behaviour, you can use the same advanced options as the CARETask regarding the return code.
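The sequential execution, the concatenation of outputs and the role of the return code can be mimicked in plain shell. This is only an illustration of the behaviour described above, not what OpenMOLE runs internally: the run_all function is hypothetical, the echo commands stand in for real work, and stopping at the first failing command is an assumption made for this sketch.

```shell
#!/bin/sh
# Temporary files collecting the standard and error outputs of all commands
out_file=$(mktemp)
err_file=$(mktemp)
status=0

# Hypothetical helper: run each command in order on the same machine
run_all() {
  for cmd in "$@"; do
    sh -c "$cmd" >>"$out_file" 2>>"$err_file"
    status=$?
    # Assumption for this sketch: a non-zero return code stops the sequence
    [ "$status" -eq 0 ] || return "$status"
  done
}

run_all 'echo first' 'echo warning >&2' 'echo second'

# One variable per stream, each holding the concatenated output
output=$(cat "$out_file")
error=$(cat "$err_file")
rm -f "$out_file" "$err_file"

echo "return code: $status"
echo "stdout: $output"
echo "stderr: $error"
```

Here output ends up holding the lines first and second, error holds warning, and the return code is 0 because every command succeeded.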