Linux Executable

Embed your Linux Executable model

When your model is written in a language for which no specific OpenMOLE task exists, or if it relies on an assembly of tools, libraries, binaries, etc., you have to use a container task to make it portable, so that you can send it to any machine (with potentially varying OS installations).

Packaging with Docker

The ContainerTask runs Docker containers in OpenMOLE. It is based on udocker, a container engine that runs without administration privileges. Using this task, you can run containers from the Docker Hub, or containers exported from Docker.

Containers from the docker hub

A simple task running a python container would look like:

val container =
  ContainerTask(
    "python:3.6-stretch",
    """python -c 'print("youpi")'"""
  )

You can run this task. At launch time, it downloads the python image from the docker hub in order to be able to run it afterwards.

Let's imagine a slightly more complete example: we will use the following python script, which uses the numpy library to multiply a matrix (stored in a csv file) by a scalar number.

import csv
import sys

import numpy

# Usage: python matrix.py <input csv> <scalar> <output file>
n = float(sys.argv[2])

print("reading the matrix")
with open(sys.argv[1], "r") as input_file:
    data = csv.reader(input_file)
    next(data, None)  # skip the header row
    matrix = numpy.array(list(data)).astype(float)

print(matrix)
print(n)
mult = matrix * n

print("saving the matrix")
numpy.savetxt(sys.argv[3], mult, fmt="%g")

An example input file would look like this:

col1,col2,col3
31,82,80
4,48,7
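
As a quick check of what the script should produce, the following dependency-free sketch redoes the same computation on the example file above. The scalar 2 and the helper name scale_rows are arbitrary choices made for this illustration; the output formatting mimics what numpy.savetxt with fmt='%g' and its default space delimiter is expected to write.

```python
import csv
import io

# The example input file shown above, inlined as a string.
SAMPLE = "col1,col2,col3\n31,82,80\n4,48,7\n"

def scale_rows(text, n):
    """Skip the header row, then multiply every cell by n."""
    rows = csv.reader(io.StringIO(text))
    next(rows, None)  # header
    return [[float(cell) * n for cell in row] for row in rows]

scaled = scale_rows(SAMPLE, 2.0)  # 2.0 is a hypothetical scalar argument

# One line per matrix row, values separated by spaces, formatted with %g.
lines = [" ".join("%g" % value for value in row) for row in scaled]
print("\n".join(lines))
```

With these inputs the two printed lines are "62 164 160" and "8 96 14".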

For this example, we assume that the workDirectory contains a data directory holding a set of csv files. We want to run the python script on each of these csv files, for a set of values of the script's second argument. The OpenMOLE workflow would then look like this:

val dataFile = Val[File]
val dataFileName = Val[String]
val i = Val[Int]
val resultFile = Val[File]

val container =
  ContainerTask(
    "python:3.6-stretch",
    """python matrix.py data.csv ${i} out.csv""",
    install = Seq("pip install numpy")
  ) set (
    resources += workDirectory / "matrix.py",
    inputFiles += (dataFile, "data.csv"),
    outputFiles += ("out.csv", resultFile),
    (inputs, outputs) += (i, dataFileName)
  )

DirectSampling(
  sampling =
    (dataFile in (workDirectory / "data") withName dataFileName) x
    (i in (0 to 3)),
  evaluation =
    container hook CopyFileHook(resultFile, workDirectory / "results/${dataFileName.dropRight(4)}_${i}.csv")
)

The install parameter contains a set of commands used to install some components in the container once and for all, when the task is instantiated.
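
To see how many runs the sampling above generates, and what the result files are called, here is a small Python sketch of the same enumeration. The file names a.csv and b.csv are hypothetical; the range and the naming pattern are taken from the workflow (0 to 3 is inclusive in Scala, and dropRight(4) strips the .csv extension).

```python
from itertools import product

data_files = ["a.csv", "b.csv"]  # hypothetical content of the data directory
scalars = range(0, 4)            # Scala's `0 to 3` includes 3

# One run per (file, i) pair, as in the cross product `x` of the sampling.
runs = list(product(data_files, scalars))

# Mirrors "results/${dataFileName.dropRight(4)}_${i}.csv".
result_names = [f"results/{name[:-4]}_{i}.csv" for name, i in runs]

print(len(runs))
print(result_names[0], result_names[-1])
```

Two input files and four values of i give eight runs, from results/a_0.csv to results/b_3.csv.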

Advanced Options

Return value

Some applications disregard standards and might not return the expected 0 value upon completion. OpenMOLE uses the return value of the application to determine whether the task has been successfully executed or needs to be re-executed. Setting the boolean flag errorOnReturnValue to false prevents OpenMOLE from re-scheduling a ContainerTask that has reported a return code different from 0. You can also get the return code in a variable using the returnValue setting. If you set this option, a return code different from 0 is not considered an error and the task produces an output containing the value of the return code.

val ret = Val[Int]

val container =
  ContainerTask(
    "python:3.6-stretch",
    """python matrix.py""",
    returnValue = ret
  )
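
On the model side, the return code is simply the exit status of the process. The sketch below illustrates the rule described above; it is not OpenMOLE's implementation. The helper is_failure and its parameter error_on_return_value are hypothetical names that mirror the errorOnReturnValue flag.

```python
import subprocess
import sys

def run_model(exit_code):
    """Run a tiny child process that exits with the given status."""
    child = f"import sys; sys.exit({exit_code})"
    return subprocess.run([sys.executable, "-c", child]).returncode

def is_failure(return_code, error_on_return_value=True):
    """A non-zero code counts as a failure unless the check is disabled."""
    return error_on_return_value and return_code != 0

code = run_model(3)
print(code, is_failure(code), is_failure(code, error_on_return_value=False))
```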

Standard and error outputs

Another default behaviour is to print the standard and error outputs of each task in the OpenMOLE console. Some processes might display useful results on these outputs. A ContainerTask's standard and error outputs can be assigned to OpenMOLE variables, and thus injected in the data flow, using respectively the stdOut and stdErr settings of the task.

val myOut = Val[String]

val container =
  ContainerTask(
    "debian:testing-slim",
    """echo 'great !'""",
    stdOut = myOut
  )

val parse =
  ScalaTask("""myOut = myOut.toUpperCase""") set (
    (inputs, outputs) += myOut
  )

container -- (parse hook display)
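
For a model to benefit from this, results and diagnostics should go to the right stream. A minimal sketch, assuming a Python model: the result is printed on the standard output and a message on the error output, and the two are captured separately, much like the variables given to stdOut and stdErr would receive them. The model string and the warning text are made up for the illustration.

```python
import subprocess
import sys

# A stand-in model: the result goes to stdout, the diagnostics to stderr.
MODEL = 'import sys; print("great !"); print("warning: demo", file=sys.stderr)'

completed = subprocess.run(
    [sys.executable, "-c", MODEL],
    capture_output=True,
    text=True,
)

std_out = completed.stdout.strip()
std_err = completed.stderr.strip()
print(std_out.upper())
```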

Environment variables

Like any other process, the applications contained in OpenMOLE's native tasks accept environment variables to influence their behaviour. Variables from the data flow can be injected as environment variables using the environmentVariables parameter. If no name is specified, the environment variable is named after the OpenMOLE variable. Environment variables injected from the data flow are added to the pre-existing set of environment variables from the execution host. This proves particularly useful to preserve the behaviour of some toolkits when executed on local environments (ssh, clusters...) where users control their work environment.

val name = Val[String]

val container =
  ContainerTask(
    "debian:testing-slim",
    """env""",
    environmentVariables = Seq("NAME" -> "${name.toUpperCase}")
  ) set (inputs += name)


DirectSampling(
  sampling = name in List("Bender", "Fry", "Leila"),
  evaluation = container
)
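
Inside the container, the model reads the injected value like any other environment variable. A minimal sketch, assuming a Python model; the fallback value "unknown" and the simulated environment are illustrative only.

```python
import os
import subprocess
import sys

# What the model would do: read NAME, with a fallback if it is not set.
MODEL = 'import os; print("hello " + os.environ.get("NAME", "unknown"))'

# Simulate the injection by adding NAME to the existing environment.
env = dict(os.environ, NAME="Bender".upper())

greeting = subprocess.run(
    [sys.executable, "-c", MODEL],
    env=env,
    capture_output=True,
    text=True,
).stdout.strip()
print(greeting)
```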

Using local Resources

To access data present on the execution node (outside the container filesystem), use the dedicated hostFiles option of the ContainerTask. This option takes the path of a file or directory on the execution host and binds it into the container filesystem, either at the same path or, as in the example below, at a different one (/tmp on the host is bound to /bind in the container).

ContainerTask(
  "debian:testing-slim",
  """ls /bind""",
  hostFiles = Seq("/tmp" -> "/bind")
)

WorkDirectory

You may set the directory within the container where to start the execution from.

ContainerTask(
  "debian:testing-slim",
  """ls""",
  workDirectory = "/bin"
)

Containers from an archive

You can use the ContainerTask to run a container exported from Docker:

val container =
  ContainerTask(
    workDirectory / "container.tgz",
    """python -c 'print("youpi")'"""
  )

Calling a local program

The ContainerTask was designed to be portable from one machine to another. However, some use cases require executing specific commands installed on a given cluster. To achieve that, you should use another task called SystemExecTask. This task is made to launch native commands on the execution host. There are two modes for using this task:

Calling a command that is assumed to be available on any execution node of the environment. The command is looked up on the system as it would be from a traditional command line: by searching the default PATH or by using an absolute location.
Copying a local script that is not installed on the remote environment. Applications and scripts can be copied to the task's work directory using the resources field. Please note that, contrary to the CARETask, there is no guarantee that an application passed as a resource to a SystemExecTask will re-execute successfully on a remote environment.

The SystemExecTask accepts an arbitrary number of commands. These commands will be executed sequentially on the same execution node where the task is instantiated. In other words, it is not possible to split the execution of multiple commands grouped in the same SystemExecTask.

The following example first copies and runs a bash script on the remote host, before calling the remote host's /bin/hostname. The standard and error outputs of both commands are gathered and concatenated into two OpenMOLE variables, output and error, through stdOut and stdErr respectively:

// Declare the variables
val output = Val[String]
val error  = Val[String]

// Any task
val scriptTask =
  SystemExecTask("bash script.sh", "hostname") set (
    resources += workDirectory / "script.sh",
    stdOut := output,
    stdErr := error
  )

scriptTask hook DisplayHook()

In this case the bash script might depend on applications installed on the remote host. Similarly, we assume the presence of /bin/hostname on the execution node. Therefore this task cannot be considered as portable.
Note that each execution is isolated in a separate folder on the execution host and that the task execution is considered as failed if the script returns a value different from 0. If you need another behaviour you can use the same advanced options as the ContainerTask regarding the return code.

File management

To provide files as input of a ContainerTask or SystemExecTask and to get files produced by these tasks, you should use the inputFiles and outputFiles keywords. See the documentation on file management.
