Explore: Advanced Samplings

Suggest edits
Documentation > Explore > Direct Sampling

Contents

Advanced sampling 🔗

Sampling are tools for exploring a space of parameter. The term parameter is understood in a very broad acceptation in OpenMOLE. It may concern numbers, files, random streams, images, etc...

Complete sampling


The most common way of exploring a model is by using a "complete" sampling, or DirectSampling, which generates every combination of parameters values and then use the DirectSampling method:
val i = Val[Int]
val j = Val[Double]
val k = Val[String]
val l = Val[Long]

val exploration = DirectSampling (
  sampling =
    (i in (0 to 10 by 2)) x
    (j in (0.0 to 5.0 by 0.5)) x
    (k in List("hello", "world")) x
    (l in (UniformDistribution[Long]() take 10)),
  evaluation = myModel
)
Using the x combinator means that all the domains are unrolled before being combined with each other.

Combine samplings 🔗

To define samplings, you can combine them with each other. As we've previously seen, the complete sampling is a way to achieve that. Many composition functions are implemented in OpenMOLE.

The "x" combinator also enables domain bounds to depend on each others. Notice how the upper bound of the second factor depends on the value of the first one.

val i = Val[Int]
val j = Val[Double]

val explo =
 DirectSampling (
   sampling =
     (i in (0 to 10 by 2)) x
     (j in Range[Double]("0.0", "2 * i", "0.5")),
   evaluation = myModel
 )
Samplings can also be combined using variants of the zip operator.

Zip samplings 🔗

Zip Samplings come in three declinations in OpenMOLE.

The first one is the ZipSampling. It combines the elements of corresponding indices from two samplings. ZipSampling mimics the traditional zip operation from functional programming that combining elements from two lists. OpenMOLE implements the ZipSampling through the keyword zip.

The second sampling from the Zip family is the ZipWithIndexSampling. Again, this is inspired from a common functional programming operation called zipWithIndex. Applying zipWithIndex to a list would create a new list of pairs formed by the elements of the original list and the index of their position in the list. For instance List('A', 'B', 'C') zipWithIndex would returns the new list List(('A',0), ('B',1), ('C',2)). ZipWithIndexSampling performs a similar operation in the dataflow. An integer variable from the dataflow is filled with the index instead of generating a new pair. OpenMOLE implements the ZipWithIndexSampling through the keyword withIndex.

The following code snippet gives an example of how to use these two first Zip samplings.
val p1 = Val[Int]
val p2 = Val[Int]

val s1 = p1 in (0 to 100) // Code to build sampling 1
val s2 = p2 in (0 to 100) // Code to build sampling 2

// Create a sampling by zipping line by line s1 and s2
val s3 = s1 zip s2

// Create a sampling containing an id for each experiment in a variable called id
val id = Val[Int]
val s4 = s2 withIndex id

The third and last sampling from the Zip family is the ZipWithNameSampling. It maps the name the files from a FileDomain (see the next section for more details about exploring files) to a String variable in the dataflow. In the following excerpt, we map the name of the file and print it along to its size. In OpenMOLE file variables generally don't preserve the name of the file from which it was originally created. In order to save some output results depending on the input filename the filename should be transmitted in a variable of type String. When running this snippet, the file is renamed by the ScalaTask however, its name is saved in the name variable.
val file = Val[File]
val name = Val[String]
val size = Val[Long]

val t = ScalaTask("val size = new java.io.File(workDir, \"file\").length") set (
  inputFiles += (file, "file"),
  inputs += name,
  outputs += (name, size)
)

DirectSampling(
  sampling = file in (workDirectory / "dir") withName name,
  evaluation = (t hook ToStringHook())
)

If you need to go through several levels of files you may use a sampling like this one:
val dir = Val[File]
val dirName = Val[String]
val file = Val[File]
val fileName = Val[String]
val name = Val[String]
val size = Val[Long]

val t = ScalaTask("val size = file.length") set (
  inputs += file,
  outputs += size,
  (inputs, outputs) += (fileName, dirName)
)

DirectSampling(
  sampling =
    (dir in (workDirectory / "test") withName dirName) x
    (file in dir withName fileName),
  evaluation = t hook ToStringHook()
)

Take, filter, sample samplings 🔗

You can modify a Sampling using various operations in OpenMOLE.

When calling take N on a Sampling, along with N an integer, OpenMOLE will generate a new Sampling from the first N values of the initial Sampling.

Similarly, you can use sample N to create a new Sampling with N random values picked up at random from the initial Sampling.

More advanced Sampling reductions happen through filter ("predicate"). It filters out all the values from the initial Sampling for which the given predicate is wrong.

The 3 sampling operations presented in this section are put into play in the following example:
val p1 = Val[Int]
val p2 = Val[Int]

val s1 = p1 in (0 to 100) // Code to build sampling 1
val s2 = p2 in (0 to 100) // Code to build sampling 2

// Create a sampling containing the 10 first values of s1
val s3 = s1 take 10

// Create a new sampling containing only the lines of s1 for which the given predicate is true
val s4 = (s1 x s2) filter ("p1 + p2 < 100")

// Sample 5 values from s1
val s5 = s1 sample 5

Random samplings 🔗

OpenMOLE can generate random samplings from an initial sampling using shuffle that creates a new sampling which is a randomly shuffled version of the initial one.

OpenMOLE can also generate a fresh new Sampling made of random numbers using UniformDistribution[T], with T the type of random numbers to be generated.

Check the following script to discover how to use these random-based operations in a workflow:
val p1 = Val[Int]
val p2 = Val[Int]

val s1 = p1 in (0 to 100) // Code to build sampling 1
val s2 = p2 in (0 to 100) // Code to build sampling 2
// Create a sampling containing the values of s1 in a random order
val s6 = s1.shuffle

// Replicate 100 times the sampling s1 and provide seed for each experiment
val seed = Val[Int]
val s7 = s1 x (seed in (UniformDistribution[Int]() take 100))

Higher level samplings 🔗

Some sampling combinations generates higher level samplings such as repeat and bootstrap:
val i = Val[Int]

val s1 = i in (0 to 100)

// Re-sample 10 times s1, the output is an array of array of values
val s2 = s1 repeat 10

// Create 10 samples of 5 values from s1, it is equivalent to "s1 sample 5 repeat 10", the output type is an
// array of array of values
val s3 = s1 bootstrap (5, 10)

Here is how such higher level samplings would be used within a Mole:
// This code compute 10 couples (for f1 and f2) of medians among 5 samples picked at random in f1 x f2
val p1 = Val[Double]
val p2 = Val[Double]

val f1 = p1 in (0.0 to 1.0 by 0.1)
val f2 = p2 in (0.0 to 1.0 by 0.1)

val stat = ScalaTask("val p1 = input.p1.median; val p2 = input.p2.median") set (
  inputs += (p1.toArray, p2.toArray),
  outputs += (p1, p2)
)

DirectSampling(
  sampling = (f1 x f2) bootstrap (5, 10),
  evaluation = stat hook ToStringHook()
)

The is keyword 🔗

The is keyword can be use to assigned a value to a variable in a sampling. For instance:
val i = Val[Int]
val j = Val[Int]
val k = Val[Int]

DirectSampling(
  sampling =
    (i in (0 until 10)) x
    (j is "i * 2") x
    (k in Range[Int]("j", "j + 7")),
  evaluation = myModel
)