Jonathan Passerat-Palmbach, Romain Reuillon, Mathieu Leclaire, Antonios Makropoulos, Emma C. Robinson, Sarah Parisot and Daniel Rueckert, Reproducible Large-Scale Neuroimaging Studies with the OpenMOLE Workflow Management System, published in Frontiers in Neuroinformatics Vol 11, 2017.
The CARETask runs external applications packaged with CARE. The CARE site (which currently offers an outdated version of CARE, but excellent documentation) can be found here. CARE makes it possible to package your application from any Linux computer, and then re-execute it on any other Linux computer. The CARE / OpenMOLE pair is a very efficient way to distribute your application at very large scale with very little effort. Please note that this packaging step is only necessary if you plan to distribute your workflow to a heterogeneous computing environment such as the EGI grid. If you target local clusters running the same operating system and sharing a network file system, you can jump directly to the SystemExecTask.
You should first install CARE:
make the care binary executable (chmod +x care)
add the folder containing it to your PATH (export PATH=/path/to/the/care/folder:$PATH)
The CARETask was designed to embed native binaries such as programs written in C, C++, Fortran, Python, R, Scilab... Embedding an application in a CARETask happens in 2 steps:
First you should package your application using the CARE binary you just installed, so that it executes on any Linux environment. This usually consists of prepending your command line with:
care -o /path/to/myarchive.tgz.bin -r ~ -p /path/to/mydata1 -p /path/to/mydata2 mycommand myparam1 myparam2
Before going any further, here are a few notes about the options accepted by CARE:
-o indicates where to store the archive. At the moment, OpenMOLE prefers to work with archives stored in .tgz.bin, so please don't toy with the extension ;-)
-r ~ is not compulsory but it has proved mandatory in some cases. As a rule of thumb, if you encounter problems when packaging your application, try adding or removing it.
-p /path asks CARE not to archive /path. This is particularly useful for input data that will change with your parameters. You probably do not want to embed this data in the archive, and we'll see further down how to inject the necessary input data in the archive from OpenMOLE.
Second, just provide the resulting package along with some other information to OpenMOLE. Et voilà! If you encounter any problem packaging your application, please refer to the corresponding entry in the FAQ.
One very important aspect of CARE is that you only need to package your application once. As long as the execution you use to package your application makes use of all the dependencies (libraries, packages, ...), you should not have any problem re-executing this archive with other parameters.
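The packaging step described above can be wrapped in a small shell helper. The sketch below is an illustration only: the function name package_with_care and the DRY_RUN variable are hypothetical, and only the -o, -r and -p options mentioned above are used. With DRY_RUN=1 the helper prints the command line instead of running it, which is a convenient way to check it before packaging.

```shell
#!/bin/sh
# Hypothetical helper (not part of CARE or OpenMOLE).
# Usage: package_with_care ARCHIVE [EXCLUDED_PATH...] -- COMMAND [ARGS...]
#   ARCHIVE        where to store the archive (keep the .tgz.bin extension)
#   EXCLUDED_PATH  paths that CARE should not archive (passed with -p)
package_with_care() {
    archive=$1
    shift
    n=$#          # number of arguments still to process
    seen_sep=0    # becomes 1 once the "--" separator has been read
    while [ "$n" -gt 0 ]; do
        arg=$1
        shift
        n=$((n - 1))
        if [ "$seen_sep" -eq 0 ] && [ "$arg" = "--" ]; then
            seen_sep=1
        elif [ "$seen_sep" -eq 0 ]; then
            set -- "$@" -p "$arg"   # excluded path
        else
            set -- "$@" "$arg"      # part of the packaged command
        fi
    done
    # -r "$HOME" is the expanded form of "-r ~".
    set -- care -o "$archive" -r "$HOME" "$@"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        command -v care >/dev/null 2>&1 || {
            echo "care not found in PATH" >&2
            return 127
        }
        "$@"
    fi
}

# Dry run: print the command without executing it.
DRY_RUN=1
package_with_care /path/to/myarchive.tgz.bin /path/to/mydata1 /path/to/mydata2 \
    -- mycommand myparam1 myparam2
```

Unsetting DRY_RUN (or setting it to 0) runs the real command, provided care is on the PATH.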
Setting errorOnReturnValue to false will prevent OpenMOLE from re-scheduling a CARETask that has reported a return code different from 0. You can also get the return code in a variable using the returnValue setting.
The standard and error outputs of a task can be assigned to OpenMOLE variables using the stdOut and stdErr actions on the task.
Variables from the dataflow can be injected as environment variables using the environmentVariable += (variable, "variableName") field. If no name is specified, the environment variable is named after the OpenMOLE variable. Environment variables injected from the dataflow are added to the pre-existing set of environment variables from the execution host. This proves particularly useful to preserve the behaviour of some toolkits when executed on local environments (ssh, clusters, ...) where users control their work environment.
     The following snippet creates a task that employs the features described in this section:
// Declare the variables that will receive the outputs and the return code
val output = Val[String]
val error  = Val[String]
val value = Val[Int]
// Any task
val pythonTask =
  CARETask("hello.tgz.bin", "python hello.py") set (
    stdOut := output,
    stdErr := error,
    returnValue := value,
    environmentVariable += (value, "I_AM_AN_ENV_VAR")
  )
Note that single-valued options are set using the := operator. Also, the OpenMOLE variables containing the standard and error outputs are automatically marked as outputs of the task, and must not be added to the outputs list.
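For readers more familiar with the shell, these settings have rough command-line counterparts. The sketch below is an analogy only and says nothing about how OpenMOLE implements them: run_captured is a hypothetical helper that runs a command with one extra environment variable added on top of the existing environment, and stores the standard output, the error output and the return code in separate files.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only.
# Usage: run_captured OUT_DIR NAME=VALUE COMMAND [ARGS...]
run_captured() {
    out_dir=$1
    assignment=$2
    shift 2
    status=0
    # env adds NAME=VALUE to the inherited environment before running COMMAND.
    env "$assignment" "$@" >"$out_dir/stdout" 2>"$out_dir/stderr" || status=$?
    echo "$status" >"$out_dir/return_value"
    # A non-zero code is recorded rather than treated as a failure
    # (compare with setting errorOnReturnValue to false).
    return 0
}

work=$(mktemp -d)
run_captured "$work" I_AM_AN_ENV_VAR=42 \
    sh -c 'echo "got $I_AM_AN_ENV_VAR"; echo "oops" >&2; exit 3'

cat "$work/stdout"        # got 42
cat "$work/stderr"        # oops
cat "$work/return_value"  # 3
```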
The options of a CARETask are set using the set operator on a freshly defined task.
   val out = Val[Int]
val careTask = CARETask("care.tgz.bin", "executable arg1 arg2 /path/to/my/file /virtual/path arg4") set (
  hostFiles += ("/path/to/my/file"),
  customWorkDirectory := "/tmp",
  returnValue := out
)
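hostFiles: binds a file from the execution host into the CARE filesystem. Example: hostFiles += ("/etc/hosts") or with a specific path hostFiles += ("/etc/bash.bashrc", "/home/foo/.bashrc")
environmentVariables. Example: environmentVariables += ("VARIABLE1", "42"). Multiple hostFiles entries can be used within the same set block.
workDirectory. Example: workDirectory := "/tmp"
returnValue: stores the return code in a Val[Int] variable. Example: returnValue := out
errorOnReturnValue. Example: errorOnReturnValue := false
stdOut: stores the standard output in a Val[String] variable. Example: stdOut := output
stdErr: stores the error output in a Val[String] variable. Example: stdErr := error
The CARETask offers one option to access files from the execution host: hostFiles. This option takes the path of a file on the execution host and binds it to the same path in the CARE filesystem. Optionally, you can provide a second argument to specify the path explicitly. For instance: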
    val careTask = CARETask("care.tgz.bin", "executable arg1 arg2 /path/to/my/file /virtual/path arg4") set (
  hostFiles += ("/path/to/my/file"),
  hostFiles += ("/path/to/another/file", "/virtual/path")
)
The CARETask will thus have access to /path/to/my/file and /virtual/path.
The SystemExecTask is made to launch native commands on the execution host. There are two modes for using this task:
calling a command that is assumed to be available on the execution host;
copying a local application or script to the execution host using the resources field. Please note that, contrary to the CARETask, there is no guarantee that an application passed as a resource to a SystemExecTask will re-execute successfully on a remote environment.
The SystemExecTask accepts an arbitrary number of commands. These commands will be executed sequentially on the same execution node where the task is instantiated. In other words, it is not possible to split the execution of multiple commands grouped in the same SystemExecTask.
The following example first copies and runs a bash script on the remote host, before calling the remote host's /bin/hostname. The standard and error outputs of both commands are gathered and concatenated into a single OpenMOLE variable each, using stdOut and stdErr respectively:
// Declare the variables that will receive the outputs
val output = Val[String]
val error  = Val[String]
// Any task
val scriptTask =
  SystemExecTask("bash script.sh", "hostname") set (
    resources += workDirectory / "script.sh",
    stdOut := output,
    stdErr := error
  )
scriptTask hook ToStringHook()
This example assumes the presence of /bin/hostname on the execution node. Therefore this task cannot be considered as portable.
     Note that each execution is isolated in a separate folder on the execution host and that the task execution is considered as failed if the script returns a value different from 0. If you need another behaviour you can use the same advanced options as the CARETask regarding the return code.
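Since the task is considered failed when the script returns a value different from 0, the script itself should exit with a meaningful code. The following is a minimal sketch of what a script.sh could look like and how its return value behaves when run by hand. The script content, the input file name and the exit code 2 are all hypothetical: the original example does not show the script.

```shell
#!/bin/sh
# Illustration only: write a hypothetical script.sh into a scratch folder.
work=$(mktemp -d)
cat >"$work/script.sh" <<'EOF'
#!/bin/sh
# Stop at the first failing command so that errors are not silently ignored.
set -e
input=${1:-input.txt}
if [ ! -f "$input" ]; then
    echo "missing input file: $input" >&2
    exit 2
fi
echo "processing $input"
EOF

# First run: the input file does not exist, so the script returns 2.
fail_status=0
sh "$work/script.sh" "$work/input.txt" 2>/dev/null || fail_status=$?

# Second run: the input file exists, so the script returns 0.
printf 'line\n' >"$work/input.txt"
ok_status=0
sh "$work/script.sh" "$work/input.txt" >/dev/null || ok_status=$?

echo "without input: $fail_status, with input: $ok_status"
# prints "without input: 2, with input: 0"
```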