Run your R model

RTask

R is a scripting language initially designed for statistics, but its range of applications is much broader today (for example GIS, operational research, linear algebra, web applications, etc.), thanks to its large community and the variety of available packages. Because it can be convenient to use specific R libraries within a workflow, OpenMOLE provides a dedicated RTask.

RTask syntax

The RTask relies on an underlying ContainerTask, but it is designed to be transparent and takes only R-related arguments. The current version of R used is 3.3.3. It takes the following arguments:
  • script String, mandatory. The R script to be executed.
  • install Sequence of strings, optional (default = empty). The system commands to be executed before any R package installation and before the R script runs (some R libraries have system dependencies that must be installed first; see Example 3 below).
  • libraries Sequence of strings, optional (default = empty). The names of the R libraries that are used by the script and must be installed beforehand. As detailed below, installation only happens during the first execution of the R script, and the result is then stored in a cached Docker image. To force an update, use the forceUpdate argument.
  • forceUpdate Boolean, optional (default = false). Whether to force the installation of the libraries (to ensure an update, for example). If true, the task performs the installation (and thus the update) even if the library is already installed.
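
As a rough illustration of how these arguments fit together, the sketch below reuses the GeoRange script from Example 3 of this page and adds forceUpdate. It assumes that forceUpdate is passed as a named argument in the same way as install and libraries; the task name rTaskForced is hypothetical.

```scala
val area = Val[Double]

// Sketch only: same script as the GeoRange example on this page.
// Assumption: forceUpdate is a named argument like install and libraries.
val rTaskForced =
  RTask("""
    library(GeoRange)
    n = 40
    x = rexp(n, 5)
    y = rexp(n, 5)
    liste = chull(x, y)
    hull <- cbind(x, y)[liste, ]
    area = CHullArea(hull[,1], hull[,2])
    """,
    // system packages installed before the R libraries
    install = Seq("apt update", "apt install -y libgdal-dev libproj-dev"),
    // R libraries to install
    libraries = Seq("GeoRange"),
    // reinstall the libraries even if a cached image already contains them
    forceUpdate = true
  ) set (
    outputs += area.mapped
  )

rTaskForced hook ToStringHook()
```

Since a forced installation repeats the slow first-run step on every execution, it is probably best to switch forceUpdate back to false once the update has been done.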

The following properties must be defined using set:
  • input/output: similar to any other task
  • mapped input: the syntax inputs += prototype mapped "r-variable" establishes a link between the workflow variable prototype (Val) and the corresponding R variable name "r-variable" (String). If both variables have the same name, you can use the short syntax inputs += prototype.mapped
  • mapped output: same syntax as for inputs, used to collect the outputs of the model

We develop below a detailed example of how to use an RTask, from a very simple use to a more elaborate one with system libraries and R libraries.

Example 1: A simple RTask

The toy R script for this first test case is:

f = function(x){
      x+1
    }
j = f(2)
Save this script as Rscript1.R; it will be used in the section on running a script from a file.

Here we create a function f and a variable j in which we store the result of the evaluation of the function. For this first script example, we write the R script directly in the RTask. We will see below how to import it from a file.R, deal with libraries, and plug inputs and outputs between OpenMOLE and R.

val rTask1 =
 RTask("""
   # Here you write code in R
   f = function(x){
         x+1
       }
   j = f(2)
   """)

rTask1

Running a script from a file

You can now upload Rscript1.R to your OpenMOLE workspace.
Here is the OpenMOLE script that runs it in an RTask. In the resources field of the RTask, you have to provide the precise location of the .R file, which is then imported in the R script using the R primitive source().

val rTask2 =
  RTask("""
    source("Rscript1.R")
  """) set (
    resources += workDirectory / "Rscript1.R"
  )

rTask2

Input and output values

In this script we want to pass the OpenMOLE variable i to the RTask. It is possible to do so through an input in the set of the task. i can be a variable whose value is given by a previous task, but here we choose to set it manually to 3.
Remark: here the OpenMOLE variable has the same name as the R variable i, but this is not mandatory, as we will see below.

val i = Val[Int]

val rTask3 =
  RTask("""
    f = function(x){
          x+1
        }
    j = f(i)
  """) set (
    inputs += i.mapped,
    i := 3
  )

  rTask3
    

In the script below (rTask4), we add an output variable j, and we change the name of the R variable (now varRi) which is mapped to the OpenMOLE variable i.

val i = Val[Int]
val j = Val[Int]

val rTask4 =
  RTask("""
    f= function(x){
         x+1
       }
    j = f(varRi)
  """) set(
    inputs += i mapped "varRi",
    outputs += j.mapped,
    i := 3
  )


rTask4 hook ToStringHook()
 

Remark: if you have several outputs, you can combine mapped outputs with classic outputs that are not part of the R task (for example, the variable c in rTask5 below).

val i = Val[Int]
val j = Val[Double]
val c = Val[Double]

val rTask5 =
  RTask("""
    f = function(x){
          x+1
        }
    j = f(i)
    """) set (
      inputs += i.mapped,
      (inputs, outputs) += c,
      outputs +=  j.mapped,
      outputs += i.mapped,
      i := 3 ,
      c:=2
    )
rTask5 hook ToStringHook()

This technique is useful when you have a chain of tasks and want to use a hook. Since the hook only captures the outputs of the last task, you can add a variable of interest to the outputs of that task even if the task itself does not use it. The last section presents an alternative.

Example 2: Working with files

It is also possible to pass files as arguments of the RTask. However, files cannot be passed as mapped inputs as seen before: this results in a type mismatch, with an error message like type class java.io.File is not convertible to JSON. We thus use inputFiles, as illustrated in the following workflow (rTask). We emphasize that inputFiles must be used here (and not resources), since the file is a Val that can be acted upon in a workflow, whereas resources are fixed.

We first have a ScalaTask which writes numbers to a file. The file is the OpenMOLE variable g of type java.io.File. In order to have access to this file in the RTask, we add g as an output of the ScalaTask. The R script in the RTask reads a file named fileForR (in the R script presented here, it is supposed to contain numeric values separated by a single space), and creates an R variable temp2, which is a vector containing the values of the file fileForR. We then apply the function f to that vector. The end of the workflow simply tells OpenMOLE to chain the two tasks and to display the outputs of the last task (here the OpenMOLE variable resR).

Note that g is an OpenMOLE variable. If you want to see the created file in your workspace, you can use a hook. In that case, you have to add g as an output of the RTask (see the section "A complete workflow" below for a workflow example).

val g = Val[File]

val task1 =
  ScalaTask("""
    val g = newFile()
    g.content = "3 6 4"
  """) set (
    outputs += g
  )

/////////////////////////////

val resR =  Val[Array[Double]]

val rTask =
  RTask("""
    temp1=read.table("fileForR", sep="")
    temp2=as.vector(temp1,mode = "numeric")

    f= function(x) {
         x+1
        }
    k=f(temp2)
  """) set(
    inputFiles += (g, "fileForR"),
    outputs += resR mapped "k"
  )

(task1 -- rTask ) hook ToStringHook(resR)
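
The remark about saving the file g could look like the sketch below. It reuses g, resR and task1 from the workflow above, and assumes that g can be declared as an output of the RTask in the same way as in the complete workflow of Example 4. The task name rTaskKeepFile, the hook name copyG and the file name numbers.txt are hypothetical.

```scala
// Reuses g, resR and task1 defined in the workflow above.
val rTaskKeepFile =
  RTask("""
    temp1=read.table("fileForR", sep="")
    temp2=as.vector(temp1,mode = "numeric")

    f= function(x) {
         x+1
        }
    k=f(temp2)
  """) set(
    inputFiles += (g, "fileForR"),
    outputs += resR mapped "k",
    // g is forwarded as an output so that the hook can access it
    outputs += g
  )

// Hypothetical target file name in the workspace
val copyG = CopyFileHook(g, workDirectory / "numbers.txt")

(task1 -- rTaskKeepFile) hook (copyG, ToStringHook(resR))
```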

Example 3: Use a library

Here we give an example of how to use a library in an RTask. We use the function CHullArea of the library GeoRange to compute the area of the convex hull of a set of points.

Write the names of the libraries you need in the libraries field, and adapt the install field accordingly. The install argument is a sequence of system commands which are executed prior to the installation of the R libraries. It can be used to install the system packages required by the R libraries. The RTask is based on a Debian container, so you can use any Debian command here, including the apt installation tool.
The libraries argument is a sequence of libraries that are installed from the CRAN repository.

Remark: the first time you use R with libraries or packages, it takes some time to install them. For subsequent uses, those libraries are already stored and the execution is quicker.

val area = Val[Double]

val rTask3 =
  RTask("""
    library(GeoRange)
    n=40
    x = rexp(n, 5)
    y = rexp(n, 5)

    # compute the convex hull of the set of points we created
    liste = chull(x,y)
    hull <- cbind(x,y) [liste,]

    #require GeoRange
    area=CHullArea(hull[,1],hull[,2])
    """,
    install = Seq("apt update", "apt install -y libgdal-dev libproj-dev"),
    libraries = Seq("GeoRange")
  ) set(
    outputs += area.mapped
  )

rTask3 hook ToStringHook()

Example 4: A complete workflow

Here is an example of an (almost complete) workflow using an RTask. It uses mapped inputs, inputFiles, and install (you can also add your R script using resources).

The first two tasks are ScalaTasks whose aim is to create OpenMOLE variables that will be used in the RTask. task1 creates a file g and task2 creates y, an Array[Double]. Both are passed to the RTask, using inputFiles and inputs respectively.
Notice that the conversion from the OpenMOLE (Scala) type Array[Double] to the R type vector is made directly by the mapped inputs.
The hook stores the file g in your workspace, and displays the area.

//////  Create a file for the RTask

val n = Val[Int]
val g = Val[File]

val task1 =
  ScalaTask("""
   import org.apache.commons.math3.distribution._
   import scala.util.Random

   val dist_Unif = new UniformRealDistribution()
   val dist1=  List.fill(n)(dist_Unif.sample()).mkString(" ")

   val g = newFile()
   g.content = dist1
   """) set (
     inputs += n ,
     n := 10 ,
     outputs += (g, n)
   )


   //////  Create a variable for the Rtask
   val y =  Val[Array[Double]]

   val task2 =
     ScalaTask("""
       import org.apache.commons.math3.distribution._
       val dist_Unif = new UniformRealDistribution()
       val y =  List.fill(n)(dist_Unif.sample()).toArray
     """) set (
        inputs += (n,g),
        outputs += (y,n,g)
     )

//////////////////////////

val res =  Val[Double]

val rTask =
  RTask("""
    library(GeoRange)

    # Read the file created by the first scala task
    temp1=read.table("fileForR", sep="")
    x=as.vector(temp1,mode = "numeric")

    # y is the variable created in the second task

    # required for the function CHullArea
    liste = chull(x,y)
    hull <- cbind(x,y) [liste,]

    #require GeoRange
    area=CHullArea(hull[,1],hull[,2])

    """,
    install = Seq("apt update", "apt install -y libgdal-dev libproj-dev"),
    libraries = Seq("GeoRange")
    ) set(
      inputs += n ,
      inputFiles += (g, "fileForR"),
      inputs += y.mapped,
      outputs += res mapped "area",
      outputs += (g,n)
    )


val h1 = CopyFileHook(g, workDirectory / "random_points.txt")

(task1 -- task2 -- rTask ) hook (h1,ToStringHook(res,n) )

Remarks about this workflow

Here is a variant of such a workflow that avoids passing every variable through the inputs and outputs of intermediate tasks that do not need them. It uses Slot and Capsule.

val a = Val[Int]
val b = Val[Int]
val j = Val[Int]
val res = Val[Int]

val task1 =
  ScalaTask("""
    val b = a+1
  """) set (
    inputs += a ,
    a := 10 ,
    outputs += (b,a)
  )


val task2 =
  ScalaTask("""
    val j = b + 2
  """) set (
    inputs += b,
    outputs += (j)
  )


val task3 =
  ScalaTask("""
    val res = b + a + j
  """) set (
    inputs += (a, b, j),
    outputs += res
  )


val objSlot = Slot(task3)  // we create a slot over the task3
val task1Capsule = Capsule(task1)


((task1Capsule --  objSlot) & (task1Capsule -- task2 -- objSlot)) hook ToStringHook()
 

Notice that a is not an output of task2. If you try a classical chain task1 -- task2 -- task3, OpenMOLE will inform you that:

Input (a: Int) is missing when reaching the slot270273838:task3-1057250483.
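
For comparison, a plain chain can also be made to work by forwarding the missing variables through the intermediate task, using the (inputs, outputs) += c pattern shown for rTask5. The sketch below reuses a, b, j, res, task1 and task3 from the workflow above; task2Forward is a hypothetical name, and the sketch assumes that task3 needs both a and b to be forwarded, since task2 outputs neither.

```scala
// Reuses a, b, j, res, task1 and task3 defined above.
// Variant of task2 that forwards a and b unchanged to the next task.
val task2Forward =
  ScalaTask("""
    val j = b + 2
  """) set (
    (inputs, outputs) += a,
    (inputs, outputs) += b,
    outputs += j
  )

// With the original task2, this chain fails with the missing-input message above.
(task1 -- task2Forward -- task3) hook ToStringHook()
```

The Slot and Capsule version avoids this kind of forwarding, which becomes tedious as the number of intermediate tasks grows.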