RTask 🔗
R is a scripting language initially designed for statistics, but whose application range is much broader today (for example GIS, operational research, linear algebra, web applications, etc.), thanks to its large community and the variety of its packages. It may be convenient to use specific R libraries within a workflow, and therefore OpenMOLE provides a dedicated RTask.
Preliminary remarks 🔗
The RTask uses the Singularity container system. You must install Singularity on your system, otherwise you won't be able to use the task.
The RTask supports files and directories, both as inputs and as outputs. Get some help on how to handle them by reading this page.
RTask syntax 🔗
The RTask relies on an underlying ContainerTask but is designed to be transparent and takes only R-related arguments.
The current version of R used is 4.0.2.
It takes the following arguments:
- script: String, mandatory. The R script to be executed, either R code directly or an R script file.
- libraries: Sequence of strings or tuple, optional (default = empty). The names of the R libraries that will be used by the script and need to be installed beforehand (note: as detailed below, installations are only done during the first execution of the R script, and then stored in a cached docker image). Dependencies for R libraries can be automatically resolved and installed; for that you can write ("ggraph", true) instead of "ggraph".
- clearContainerCache: Boolean, optional (default = false). Should the R image and libraries be cleared and reinstalled (to ensure an update for example)? If true, the task will perform the installation (and thus the update) even if the library was already installed.
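As a minimal sketch of the libraries and clearContainerCache arguments described above, the following task requests ggraph with automatic dependency resolution and forces a reinstallation. The names rTaskLibs and loaded are hypothetical, chosen for illustration, and the sketch assumes that clearContainerCache is passed as a named argument in the same way as libraries.

```scala
// Hypothetical OpenMOLE variable receiving a flag computed in R
val loaded = Val[Int]

val rTaskLibs = RTask("""
  library(ggraph)
  # 1 if the ggraph namespace is loaded, 0 otherwise
  loaded <- as.integer("ggraph" %in% loadedNamespaces())
""",
  // Tuple form: install ggraph and resolve its dependencies automatically
  libraries = Seq(("ggraph", true)),
  // Force the image and libraries to be reinstalled, even if already cached
  clearContainerCache = true
) set (
  outputs += loaded.mapped
)

// Workflow
rTaskLibs hook display
```

Setting clearContainerCache back to false (the default) after the update avoids paying the installation cost at every run.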
The following properties must be defined using set:
- input/output: similar to any other task,
- mapped input: the syntax inputs += om-variable mapped "r-variable" establishes a link between the workflow variable om-variable (Val) and the corresponding R variable named r-variable (as a String). If the variables have the same name, you can use the short syntax inputs += my-variable.mapped,
- mapped output: same syntax as for inputs, used to collect the outputs of the model.
The following arguments are optional and intended for advanced usage:
- install: Sequence of strings, optional (default = empty). System commands to be executed prior to any R package installation and R script execution. This can be used to install system packages using apt.
- image: String, optional (default = "openmole/r2u:4.3.0"). Changes the docker image used by the RTask. OpenMOLE uses r2u.
- prepare: Sequence of strings, optional (default = empty). System commands to be executed just before the execution of R on the execution node.
- errorOnReturnValue, returnValue, stdOut, stdErr, hostFiles, workDirectory, environmentVariables, containerSystem, installContainerSystem.
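The advanced arguments can be sketched as follows, assuming they are passed as named arguments like libraries and install elsewhere on this page. The image value is simply the documented default made explicit, while the task name rTaskPinned and the echo command given to prepare are hypothetical and only there for illustration.

```scala
// Declare variables
val result = Val[Int]

val rTaskPinned = RTask(
  """
  j <- 2 + 1
  """,
  // Pin the docker image explicitly (this is the documented default value)
  image = "openmole/r2u:4.3.0",
  // Hypothetical system command, executed just before R on the execution node
  prepare = Seq("echo 'about to run R'")
) set (
  outputs += result mapped "j"
)

// Workflow
rTaskPinned hook display
```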
We develop below a detailed example of how to use an RTask, from a very simple use case to a more elaborate one, with system libraries and R libraries.
Execute R code 🔗
The toy R script for this first test case is the following:
# Define the function
f <- function(x) {
  x + 1
}
# Use the function
j <- f(2)
This script creates a function f that takes a parameter (a number) and adds 1 to it.
It then applies the function to the number 2.
We save this to a file named myRScript.R in our OpenMOLE workspace.
Write R code in the RTask 🔗
For our first example, we write the R script directly in the RTask.
// Declare variables
val result = Val[Int]
// Task
val rTask1 = RTask("""
# Here you write your R code
# Define the function
f <- function(x) {
x + 1
}
# Use the function
j <- f(2)
""") set (
outputs += result mapped "j"
)
// Workflow
rTask1 hook display
We provide the result variable to store the value of the R variable j computed by the function, and we display its value in the standard output through hook display.
Running R code from a script 🔗
Instead of writing the R code in the RTask, we can call an external R script containing the code to be executed.
We will use the file myRScript.R created earlier.
It needs to be uploaded in the OpenMOLE workspace.
All the code is in the R script
If all the R code you need is written in your R script, you just need to provide the path to this script.
// Declare variables
val result = Val[Int]
// Task
val rTask2 = RTask(script = workDirectory / "myRScript.R") set (
outputs += result mapped "j"
)
// Workflow
rTask2 hook display
This workflow should return the exact same result as the previous example.
Additional R code is needed
If you need additional R code besides what is included in your script, you need a mix of the first two examples. We write R code using the syntax from the first example, while also providing an external R script.
In order to use the R script, we need to use the resources field with the precise location of the file in our work directory.
It will then be imported in the RTask by the R primitive source("myRScript.R").
// Declare variables
val result = Val[Int]
// Task
val rTask3 = RTask("""
# Import the external R script
source("myRScript.R")
# Add some code
k <- j + 2
""") set (
resources += (workDirectory / "myRScript.R"),
outputs += result mapped "k"
)
// Workflow
rTask3 hook display
This time, we modify the output of the R script (by adding 2 to the result) before returning a value to OpenMOLE.
Provide input values 🔗
We want to be able to define inputs to the RTask externally, and to store the output values.
Mapped values 🔗
It is possible to do so through the inputs and outputs parameters in the set part of the task.
// Declare variables
val myInput = Val[Int]
val myOutput = Val[Int]
// Task
val rTask4 = RTask("""
# Define the function
f <- function(x) {
x + 1
}
# Use the function
j <- f(i)
""") set (
inputs += myInput mapped "i",
outputs += myOutput mapped "j",
// Default value for the input
myInput := 3
)
// Workflow
rTask4 hook display
Here, i and j are R variables defined and used in the R code, while myInput and myOutput are OpenMOLE variables.
The syntax om-variable mapped "r-variable" creates a link between the two, indicating that these should be considered the same in the workflow.
If your OpenMOLE variable and R variable have the same name (say my-variable for instance), you can use the following shortcut syntax: my-variable.mapped.
Combine mapped and classic inputs/outputs 🔗
If you have several outputs, you can combine mapped outputs with classic outputs that are not part of the RTask:
// Declare variables
val i = Val[Int]
val j = Val[Double]
val c = Val[Double] // c is not used in the RTask
// Task
val rTask5 =
RTask("""
# Define the function
f <- function(x) {
x + 1
}
# Use the function
j <- f(i)
""") set (
inputs += i.mapped,
inputs += c,
outputs += i, // i doesn't need to be mapped again, it was done just above
outputs += j.mapped,
outputs += c,
// Default values
i := 3,
c := 2
)
// Workflow
rTask5 hook display
This technique can be used when you have a chain of tasks and you want to use a hook. The hook only captures the outputs of the last executed task, so a variable of interest must be added to the inputs and outputs of that task to reach the hook, even if the task itself does not use it.
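The pass-through idea can be sketched with a two-task chain, under the assumption that a ScalaTask produces c upstream. The names init and rTaskChain are hypothetical. The RTask never uses c in its R code but declares it as a classic input and output so that it is still available to the hook.

```scala
// Declare variables
val i = Val[Int]
val j = Val[Double]
val c = Val[Double] // produced upstream, not used in the R code

// Upstream task producing c
val init = ScalaTask("""
  val c = 2.0
""") set (
  outputs += c
)

// The RTask forwards c untouched alongside its own mapped variables
val rTaskChain = RTask("""
  j <- i + 1
""") set (
  inputs += i.mapped,
  inputs += c,
  outputs += j.mapped,
  outputs += c,
  // Default value, used because the upstream task does not provide i
  i := 3
)

// Workflow: the hook on the last task sees both j and c
init -- (rTaskChain hook display)
```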
Working with files 🔗
It is possible to use files as arguments of an RTask.
The inputFiles keyword is used.
We emphasize that inputFiles is different from resources, which was used to import external R scripts.
inputFiles is used to provide OpenMOLE variables of type File that can be acted upon in a workflow.
In this example workflow, we first have a ScalaTask writing numbers in a file.
The file is created through the OpenMOLE variable myFile of type java.io.File.
In order to have access to this file in the RTask, we add myFile as an output of the ScalaTask and an input of the RTask.
// Declare variable
val myFile = Val[File]
val resR = Val[Array[Double]]
// ScalaTask creating the file myFile
val task1 = ScalaTask("""
val myFile = newFile()
myFile.content = "3 6 4"
""") set (
outputs += myFile
)
// RTask using myFile as an input
val rTask5 = RTask("""
myData <- read.table("fileForR.txt", sep = " ")
myVector <- as.vector(myData, mode = "numeric")
f <- function(x) {
x + 1
}
k <- f(myVector)
""") set(
inputFiles += (myFile, "fileForR.txt"),
outputs += resR mapped "k"
)
// Workflow
task1 -- (rTask5 hook display)
The R script in the RTask reads a file named fileForR.txt (in the R script presented here, it is expected to contain numeric values separated by a single space), and creates an R variable myVector, which is a vector containing the values of the file fileForR.txt.
We then apply the function f to that vector.
The fileForR.txt file is set as an input file of the RTask following the syntax: inputFiles += (om-fileVariable, "filename-in-R-code").
For more information about file management in OpenMOLE, see this page.
The end of the workflow simply tells OpenMOLE to chain the two tasks and to display the outputs of the last task (here the OpenMOLE variable resR) in the standard output.
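When the file already exists in the workspace, the upstream ScalaTask may not be needed. The sketch below assumes that a File variable can be given a default value with := and a workDirectory path. The file name numbers.txt and the task name rTaskFromWorkspace are hypothetical, and the file is assumed to contain several numbers separated by spaces.

```scala
// Declare variables
val myFile = Val[File]
val resR = Val[Array[Double]]

val rTaskFromWorkspace = RTask("""
  # scan() reads whitespace-separated numbers into a numeric vector
  myVector <- scan("fileForR.txt")
  k <- myVector + 1
""") set (
  // The file is exposed to the R code under the name fileForR.txt
  inputFiles += (myFile, "fileForR.txt"),
  outputs += resR mapped "k",
  // Hypothetical file, assumed to be present in the OpenMOLE workspace
  myFile := workDirectory / "numbers.txt"
)

// Workflow
rTaskFromWorkspace hook display
```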
Using libraries 🔗
Here we give an example of how to use a library in an RTask.
We use the function CHullArea of the library GeoRange to compute the area of the convex hull of a set of points.
We need to write the names of the libraries we need in the libraries field, as a sequence, and they will be installed from the CRAN repository.
The RTask is based on a Debian container, therefore you can use any Debian command here, including the apt installation tool.
See advanced usage below for examples of custom commands in specific use cases.
Note: the first time you use R with libraries or packages, it takes some time to install them; on subsequent uses the libraries are already stored, so the execution is quicker.
// Declare variable
val area = Val[Double]
// Task
val rTask6 = RTask("""
library(GeoRange)
n <- 40
x <- rexp(n, 5)
y <- rexp(n, 5)
# Compute the convex hull of the set of points we created
liste <- chull(x, y)
hull <- cbind(x, y) [liste,]
# require GeoRange
area <- CHullArea(hull[, 1], hull[, 2])
""",
libraries = Seq("GeoRange")
) set(
outputs += area.mapped
)
// Workflow
rTask6 hook display
Advanced RTask usage 🔗
Use a library within Docker 🔗
If you are starting OpenMOLE within docker, installing R packages in an RTask might require a specific parameter setting.
The install field must be used with particular commands: we prefix install commands with fakeroot to get the permissions to use the Debian command apt for installation.
// Declare variable
val area = Val[Double]
// Task
val rTask7 = RTask("""
library(GeoRange)
n <- 40
x <- rexp(n, 5)
y <- rexp(n, 5)
# Compute the convex hull of the set of points we created
liste <- chull(x, y)
hull <- cbind(x, y) [liste,]
# require GeoRange
area <- CHullArea(hull[, 1], hull[, 2])
""",
install = Seq("fakeroot apt-get update", "fakeroot apt-get install -y libgdal-dev libproj-dev"),
libraries = Seq("GeoRange")
) set(
outputs += area.mapped
)
// Workflow
rTask7 hook display
Use of HTTP proxy 🔗
If you start OpenMOLE behind an HTTP proxy, you are probably already familiar with the --proxy parameter you can add to the OpenMOLE command line, which makes OpenMOLE use your proxy when downloading anything from the web.
You can use it like openmole --proxy http://myproxy:3128.
This proxy will also be used by OpenMOLE to download any container, including the containers used behind the scenes to run an RTask.
This proxy will also be used by the RTask to download packages from the web.
Use alternative Debian repositories 🔗
We showed how the install parameter of an RTask makes it possible to use Debian installation tools such as apt to install packages in the container running R.
This downloads Debian packages from the default international repositories (servers) for Debian.
In some cases, you might want to use alternative repositories.
A first reason might be speed: downloading and installing packages might require hundreds of megabytes of download, leading to a significant consumption of data and a slower construction of the container (only at the first execution, as the container is reused for further executions). If your institution is running a local Debian repository, you would save data and time by using this repository. You might also need packages which are not part of the default Debian repositories.
You can do so by using the install parameter to define your own repositories, as shown in the example below.
// Declare variable
val area = Val[Double]
// Task
val rTask8 = RTask("""
library(ggplot2)
library(gganimate)
# your R script here
# [...]
""",
install = Seq(
// replace the initial Debian repositories by my repository
"fakeroot sed -i 's/deb.debian.org/linux.myinstitute.org/g' /etc/apt/sources.list",
// display the list on the console so I can double check what happens
"fakeroot cat /etc/apt/sources.list",
// update the list of available packages (here I disable HTTP proxy as this repository is in my network)
"fakeroot apt-get -o Acquire::http::proxy=false update ",
// install required R packages in their binary version (quicker and much more stable!)
"DEBIAN_FRONTEND=noninteractive fakeroot apt-get -o Acquire::http::proxy=false install -y r-cran-ggplot2",
"DEBIAN_FRONTEND=noninteractive fakeroot apt-get -o Acquire::http::proxy=false install -y r-cran-gganimate",
"DEBIAN_FRONTEND=noninteractive fakeroot apt-get -o Acquire::http::proxy=false install -y r-cran-plotly",
"DEBIAN_FRONTEND=noninteractive fakeroot apt-get -o Acquire::http::proxy=false install -y r-cran-ggally",
// install the libs required for the compilation of R packages
"DEBIAN_FRONTEND=noninteractive fakeroot apt-get -o Acquire::http::proxy=false install -y libssl-dev libcurl4-openssl-dev libudunits2-dev",
// install ffmpeg to render videos
"DEBIAN_FRONTEND=noninteractive fakeroot apt-get -o Acquire::http::proxy=false install -y ffmpeg"
),
libraries = Seq("ggplot2", "gganimate", "plotly", "GGally")
) set(
outputs += area.mapped
)
// Workflow
rTask8 hook display