Plug your R model

RTask

R is a scripting language initially designed for statistics, whose range of applications is now much broader (for example GIS, operational research, linear algebra, web applications), thanks to its large community and the variety of its packages. Since it may be convenient to use specific R libraries within a workflow, OpenMOLE provides a dedicated RTask.

Preliminary remark

The RTask uses the Singularity container system. You must install Singularity on your system, otherwise you will not be able to use the RTask.

RTask syntax

The RTask relies on an underlying ContainerTask but is designed to be transparent and takes only R-related arguments. The current version of R used is 3.3.3. It takes the following arguments:
  • script String, mandatory. The R script to be executed.
  • install Sequence of strings, optional (default = empty). The commands to be executed prior to any R package installation and R script execution (see the example below: some R libraries have system dependencies that must be installed first).
  • libraries Sequence of strings, optional (default = empty). The names of the R libraries that are used by the script and need to be installed beforehand (note: as detailed below, installations are only done during the first execution of the R script, and then stored in a docker image in cache. To force an update, use the forceUpdate argument).
  • forceUpdate Boolean, optional (default = false). Whether the installation of the libraries should be forced (to ensure an update, for example). If true, the task performs the installation (and thus the update) even if the library is already installed.

The following properties must be defined using set:
  • input/output: similar to any other task
  • mapped input: the syntax inputs += prototype mapped "r-variable" establishes a link between the workflow variable prototype (Val) and the corresponding R variable name "r-variable" (String). If both variables have the same name, you can use the short syntax inputs += prototype.mapped
  • mapped output: same syntax as for inputs, used to collect the outputs of the model
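As a minimal sketch of how these arguments and properties fit together, the script below combines all of them in one task. It assumes that forceUpdate is passed as a named argument in the same way as install and libraries in the examples further down this page; the variable names x and y and the task name sketch are illustrative only, and the install commands and the GeoRange library are simply borrowed from Example 3.

```scala
// Workflow variables (illustrative names)
val x = Val[Double]
val y = Val[Double]

// Task
val sketch = RTask("""
    library(GeoRange)   # loaded only to illustrate the libraries argument
    y = x * 2
    """,
    // system commands executed before the R libraries are installed
    install = Seq("apt update", "apt install -y libgdal-dev libproj-dev"),
    // R libraries installed during the first execution, then cached
    libraries = Seq("GeoRange"),
    // assumption: named argument, forces the installation even if already cached
    forceUpdate = true
) set (
    inputs += x.mapped,    // workflow variable x is available as R variable x
    outputs += y.mapped,   // R variable y is collected as workflow variable y
    x := 1.5
)

// Workflow
sketch hook display
```

Once the libraries have been updated, forceUpdate would typically be set back to false (or removed) so that later executions reuse the cached installation.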

We develop below a detailed example of how to use an RTask, from a very simple use case to a more elaborate one with system libraries and R libraries.

Example 1: A simple RTask

The toy R script for this first test case is:

f = function(x){
    x+1
}
j = f(2)

We save this script as Rscript1.R; it will be used in the second part.

Here we create a function f and a variable j in which we store the result of the evaluation of the function. For this first script example, we write the R script directly in the RTask. We will see below how to import it from a file.R, deal with libraries, and plug inputs and outputs between OpenMOLE and R.

// Task
val rTask1 = RTask("""
    # Here you write code in R
    f = function(x){
        x+1
    }
    j = f(2)
""")

// Workflow
rTask1

Running a script from a file

You can now upload Rscript1.R to your OpenMOLE workspace.
Here is the OpenMOLE script to use in the RTask. In the resources field, you have to provide the precise location of the R file, which is then imported in the R script using the R primitive source().

// Task
val rTask2 = RTask("""
    source("Rscript1.R")
""") set (
    resources += workDirectory / "Rscript1.R"
)

// Workflow
rTask2

Input and output values

In this script we want to pass the OpenMOLE variable i to the RTask. It is possible to do so through an input in the set of the task. i can be a variable whose value is given by a previous task, but here we choose to set it manually to 3.
Remark: here the OpenMOLE variable has the same name as the R variable i, but it is not mandatory as we will see below.

// Declare variable
val i = Val[Int]

// Task
val rTask3 = RTask("""
    f = function(x){
        x+1
    }
    j = f(i)
""") set (
    inputs += i.mapped,
    i := 3
)

// Workflow
rTask3

In the script below (rTask4), we add an output variable j, and we change the name of the R variable (now varRi) which is mapped to the OpenMOLE variable i.

// Declare variables
val i = Val[Int]
val j = Val[Int]

// Task
val rTask4 = RTask("""
    f= function(x){
        x+1
    }
    j = f(varRi)
""") set(
    inputs += i mapped "varRi",
    outputs += j.mapped,
    i := 3
)

// Workflow
rTask4 hook display

Remark: if you have several outputs, you can combine mapped outputs with classic outputs that are not part of the R task (for example, the variable c in rTask5 below).

// Declare variables
val i = Val[Int]
val j = Val[Double]
val c = Val[Double]

// Task
val rTask5 =
RTask("""
    f = function(x){
         x+1
    }
    j = f(i)
""") set (
    inputs += i.mapped,
    (inputs, outputs) += c,
    outputs +=  j.mapped,
    outputs += i.mapped,
    i := 3,
    c := 2.0
)

// Workflow
rTask5 hook display

This technique can be used when you have a chain of tasks and you want to use a hook. Indeed, the hook only captures outputs of the last task, thus we can add a variable of interest in the output of the task even if it does not appear in this task. Note that the last section presents an alternative.
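The chaining situation described above can be sketched as follows. This is only an illustration: the task names first and second and the ScalaTask that produces c are hypothetical, while the constructs themselves ((inputs, outputs) += c, the -- transition and hook display) are the ones already used on this page.

```scala
// Declare variables
val i = Val[Int]
val j = Val[Double]
val c = Val[Double]

// First task: produces c (hypothetical computation)
val first = ScalaTask("""
    val c = 2.0
""") set (
    outputs += c
)

// Second task: the R script never uses c, but c is declared as both
// input and output so that it passes through and reaches the hook
val second = RTask("""
    j = i + 1
""") set (
    inputs += i.mapped,
    (inputs, outputs) += c,
    outputs += j.mapped,
    i := 3
)

// Workflow: the hook is attached to the last task and sees both j and c
first -- (second hook display)
```

Without the (inputs, outputs) += c line, c would stop at the first task and would not be available to the hook.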

Example 2: Working with files

It is also possible to pass files as arguments of the RTask. However, we cannot pass them as mapped inputs as seen before, since this results in a type mismatch with an error message like type class java.io.File is not convertible to JSON. We therefore use inputFiles, as illustrated in the following workflow (rTask). We emphasize that inputFiles must be used here (and not resources), since the file is carried by a Val and can be acted upon in the workflow, whereas resources are fixed.

We first have a ScalaTask which writes numbers to a file. The file is the OpenMOLE variable g of type java.io.File. In order to have access to this file in the RTask, we add g as an output of the ScalaTask. The R script in the RTask reads a file named fileForR (in the R script presented here, it is expected to contain numeric values separated by a single space), and creates an R variable temp2, which is a vector containing the values of the file fileForR. We then apply the function f to that vector. The end of the workflow simply tells OpenMOLE to chain the two tasks and to display the outputs of the last task (here the OpenMOLE variable resR).

Note that g is an OpenMOLE variable. If you want to see the file created in your workspace, you can use a hook. In that case, you have to declare g as an output of the RTask (see the section "A complete workflow" below for a workflow example).

// Declare variable
val g = Val[File]

// Task
val task1 = ScalaTask("""
    val g = newFile()
    g.content = "3 6 4"
""") set (
    outputs += g
)

/////////////////////////////

// Declare variable
val resR =  Val[Array[Double]]

// Task
val rTask = RTask("""
    temp1=read.table("fileForR", sep="")
    temp2=as.vector(temp1,mode = "numeric")

    f= function(x) {
        x+1
    }
    k=f(temp2)
""") set(
    inputFiles += (g, "fileForR"),
    outputs += resR mapped "k"
)

// Workflow
task1 -- (rTask hook display)
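To keep the file g in the workspace, as mentioned in the note above, a variant of this workflow can be sketched as follows. It reuses the CopyFileHook construct from the "A complete workflow" section; the hook name copyG and the target file name numbers.txt are arbitrary choices made for this illustration.

```scala
// Declare variables
val g = Val[File]
val resR = Val[Array[Double]]

// Same ScalaTask as above: writes numbers to the file g
val task1 = ScalaTask("""
    val g = newFile()
    g.content = "3 6 4"
""") set (
    outputs += g
)

// Same RTask as above, with g added as an output so that a hook can access it
val rTask = RTask("""
    temp1 = read.table("fileForR", sep="")
    temp2 = as.vector(temp1, mode = "numeric")
    k = temp2 + 1
""") set (
    inputFiles += (g, "fileForR"),
    outputs += resR mapped "k",
    outputs += g
)

// Copy g to the workspace (arbitrary file name)
val copyG = CopyFileHook(g, workDirectory / "numbers.txt")

// Workflow
task1 -- (rTask hook copyG)
```

In this sketch the copy hook replaces the display hook, so resR is no longer printed; only the file is saved.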

Example 3: Use a library

Here we give an example of how to use a library in an RTask. We use the function CHullArea of the library GeoRange to compute the area of the convex hull of a set of points.

Write the names of the libraries you need in the libraries field, and adapt the install field accordingly. The install argument is a sequence of system commands which are executed prior to the installation of the R libraries. It can be used to install the system packages required by the R libraries. The RTask is based on a Debian container, so you can use any Debian command here, including the apt installation tool.
The libraries argument is a sequence of libraries that are installed from the CRAN repository.

Remark: the first time you use R with libraries or packages, it takes some time to install them, but for the next uses those libraries will be stored, and the execution will be quicker.

// Declare variable
val area = Val[Double]

// Task
val rTask3 = RTask("""
    library(GeoRange)
    n=40
    x = rexp(n, 5)
    y = rexp(n, 5)

    # compute the convex hull of the set of points we created
    liste = chull(x,y)
    hull <- cbind(x,y) [liste,]

    # requires GeoRange
    area=CHullArea(hull[,1],hull[,2])
    """,
    install = Seq("apt update", "apt install -y libgdal-dev libproj-dev"),
    libraries = Seq("GeoRange")
) set(
    outputs += area.mapped
)

// Workflow
rTask3 hook display

Example 4: A complete workflow

Here is an example of an (almost complete) workflow using an RTask. It uses mapped inputs, inputFiles and install (you can also add your R script using resources).

The first task is a ScalaTask whose aim is to create the OpenMOLE variables that will be used in the RTask. We provide them to the RTask using the inputFiles and inputs keywords.
Notice that the conversion from the OpenMOLE (Scala) type Array[Double] to the R vector type is performed directly by the mapped inputs.
The hook stores the file g in your workspace.

// Declare variables
val n = Val[Int]
val g = Val[File]
val y =  Val[Array[Double]]

val seed = Val[Long]

// Task
val task1 = ScalaTask("""
    val rng = Random(seed)

    val dist1 = List.fill(n)(rng.nextDouble).mkString(" ")

    val g = newFile()
    g.content = dist1

    val y =  List.fill(n)(rng.nextDouble).toArray
""") set (
    inputs += (n, seed),
    n := 10 ,
    outputs += (g, n, y, seed),
)

// Declare variable
val res =  Val[Double]

// Task
val rTask = RTask("""
    library(GeoRange)

    # Read the file created by the first scala task
    temp1=read.table("fileForR", sep="")
    x=as.vector(temp1,mode = "numeric")

    # y is the array created in the first task (the ScalaTask)

    # convex hull, required by the function CHullArea
    liste = chull(x,y)
    hull <- cbind(x,y) [liste,]

    # requires GeoRange
    area=CHullArea(hull[,1],hull[,2])

    """,
    install = Seq("apt update", "apt install -y libgdal-dev libproj-dev"),
    libraries = Seq("GeoRange")
) set(
    inputs += (n, seed),
    inputFiles += (g, "fileForR"),
    inputs += y.mapped,
    outputs += res mapped "area",
    outputs += (g, n, seed)
)

// Define hook
val h1 = CopyFileHook(g, workDirectory / "random_points${seed}.txt")

// Workflow
Replication(
    evaluation = task1 -- (rTask hook h1),
    seed = seed,
    sample = 100
)