Explore with Elementary Samplings


Content:

1 - Grid sampling
2 - One factor at a time sampling


Grid sampling

A grid sampling (also called complete sampling) consists of evaluating every possible combination of the provided input values. It is practical only for a reasonable number of dimensions and discretisation steps.

Method's score

[Figure: method score chart rating grid sampling on Output Exploration, Input Exploration, Sensitivity and Optimisation]

Grid sampling is a good way of getting a first glimpse of the output space of your model when you don't know anything about the structure of your input space. However, it will not give you any information on the structure of the output space, as there is no reason for evenly spaced inputs to lead to evenly spaced outputs.
Grid sampling is hampered by input space dimensionality: high-dimensional spaces need many samples to be covered, as well as a lot of memory to store them.
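To make the dimensionality issue concrete, here is a minimal sketch in plain Scala (no OpenMOLE dependency) that enumerates a full grid over two inputs and counts the runs it would require. The names iValues, jValues, grid and gridSize are hypothetical and chosen for illustration only; this is not how OpenMOLE builds its samplings internally.

```scala
// Plain Scala sketch: enumerate a full grid over two inputs.
val iValues: Seq[Int] = 0 to 10 by 2                 // 6 values: 0, 2, ..., 10
val jValues: Seq[Double] = (0 to 10).map(_ * 0.5)    // 11 values: 0.0, 0.5, ..., 5.0

// Cartesian product: every (i, j) combination corresponds to one model run.
val grid: Seq[(Int, Double)] =
  for (i <- iValues; j <- jValues) yield (i, j)

// The number of runs is the product of the number of values per input.
def gridSize(valuesPerInput: Seq[Int]): BigInt =
  valuesPerInput.map(n => BigInt(n)).product

println(grid.size)                  // 66 runs for two inputs
println(gridSize(Seq.fill(6)(11)))  // 1771561 runs for six inputs with 11 values each
```

Going from two inputs to six, with 11 values each, raises the number of runs from 121 to more than 1.7 million, which is why grid sampling is best kept to a handful of dimensions.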

Use within OpenMOLE

A grid sampling is declared via the DirectSampling constructor, in which the bounds and discretisation steps of each input to vary are declared:

val input_i = Val[Int]
val input_j = Val[Double]
val output1 = Val[Double]
val output2 = Val[Double]

DirectSampling(
  evaluation = my_own_model,
  sampling =
    (input_i in (0 to 10 by 2)) x
    (input_j in (0.0 to 5.0 by 0.5)),
  aggregation = Seq(output1 evaluate median, output2)
) hook display

with
  • evaluation is the task (or composition of tasks) that uses your inputs, typically your model task and a hook,
  • sampling is the sampling task,
  • aggregation (optional) is a list of aggregation functions to be called on the outputs of your evaluation task. The format is variable evaluate function. OpenMOLE provides some aggregation functions such as: median, medianAbsoluteDeviation, average, meanSquaredError, rootMeanSquaredError. If a variable is listed and no aggregation function is provided, its values are aggregated in an array.
For Double sequence samplings, a convenient primitive provides logarithmic ranges the following way: input_j in LogRangeDomain(min,max,number_of_steps) where the third argument is the number of steps in the range. The syntax can be simplified with the logSteps keyword like this: input_j in (Range(1e-2,0.1) logSteps 4).

The hook keyword is used to save or display results generated during the execution of a workflow. The generic way to use it is to write either hook(workDirectory / "path/of/a/file.csv") to save the results in a CSV file, or hook display to display the results in the standard output. See the hook documentation for more details.
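As an illustration of what a logarithmic range looks like, the plain Scala sketch below generates values that are evenly spaced on a log scale. The helper logSpaced is hypothetical, and the sketch assumes that the step count is the number of sampled values, bounds included; the exact convention used by LogRangeDomain and logSteps may differ, so check the OpenMOLE reference before relying on it.

```scala
// Plain Scala sketch (not the OpenMOLE implementation): n values between
// min and max, evenly spaced on a logarithmic scale, bounds included.
// Assumption: n counts sampled values, which may differ from OpenMOLE's convention.
def logSpaced(min: Double, max: Double, n: Int): Seq[Double] = {
  require(min > 0 && max > min && n >= 2, "need 0 < min < max and n >= 2")
  val logMin = math.log(min)
  val logMax = math.log(max)
  (0 until n).map(k => math.exp(logMin + k * (logMax - logMin) / (n - 1)))
}

val values = logSpaced(1e-2, 0.1, 4)
values.foreach(println)
```

Under that assumption, the four values are roughly 0.01, 0.0215, 0.0464 and 0.1: each one is the previous one multiplied by the same factor, rather than increased by the same amount.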

Use example

Here is a dummy workflow showing the exploration of a model written as a ScalaTask, which takes an integer value as input and generates a Double as output:

// Inputs and outputs declaration
val i = Val[Int]
val o = Val[Double]
// Defines the model
val myModel =
  ScalaTask("val o = i * 2") set (
    inputs += i,
    outputs += (i, o)
  )


DirectSampling(
  evaluation = myModel hook display,
  sampling = i in (0 to 10 by 1),
  aggregation = Seq(o evaluate average)
) hook display

Some details:
  • myModel is the task that multiplies the input by 2,
  • the evaluation attribute of the DirectSampling method is the composition of myModel and a hook,
  • the aggregation attribute of the DirectSampling method is set to compute the average of the values of o,
  • DirectSampling generates parallel executions of myModel, one for each sample generated by the sampling.
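To see what values this workflow is expected to produce, here is a plain Scala sketch that replays the same computation sequentially, outside OpenMOLE. The helpers average and median are local, hypothetical stand-ins written for this example; they are not the OpenMOLE aggregation functions of the same name.

```scala
// Plain Scala sketch: replay the dummy workflow without OpenMOLE.
val inputs: Seq[Int] = 0 to 10 by 1                    // same values as the sampling of i
val outputs: Seq[Double] = inputs.map(i => i * 2.0)    // same computation as myModel

// Local stand-in for an averaging aggregation.
def average(xs: Seq[Double]): Double = xs.sum / xs.size

// Local stand-in for a median aggregation.
def median(xs: Seq[Double]): Double = {
  val sorted = xs.sorted
  val n = sorted.size
  if (n % 2 == 1) sorted(n / 2)
  else (sorted(n / 2 - 1) + sorted(n / 2)) / 2.0
}

println(average(outputs))  // 10.0
println(median(outputs))   // 10.0
```

The 11 sampled inputs give the outputs 0, 2, ..., 20, so an averaging aggregation over o should yield 10.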

One factor at a time sampling

In the case of models requiring a long time to run, or for preliminary experiments, one may want to use a sampling similar to the grid sampling, but with a reduced number of total runs. For this, one can vary each factor successively in its domain, the others being fixed to a nominal value. Note that this type of sampling will, first, necessarily miss potential interactions between factors and, second, explore only a very small fraction of the parameter space. The computational load in terms of number of model runs is then only the sum of the sizes of the factor domains, instead of their product in the case of a full grid. For example, with two factors x1 and x2 each varying between 0 and 1 with a step of 0.1, and a nominal value of 0.5 for both, the one factor sampling will first take x1 in {0, 0.1, ..., 1} while x2 = 0.5, and then the reverse.
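The two-factor example can be sketched in plain Scala to compare the number of runs with a full grid. The names domain, varyX1, varyX2, oneFactor and fullGrid are hypothetical, and this sketch only illustrates the principle; it does not claim to reproduce how OneFactorSampling is implemented, in particular how it treats the nominal point.

```scala
// Plain Scala sketch of one factor at a time sampling for two factors.
val domain: Seq[Double] = (0 to 10).map(_ / 10.0)   // 0.0, 0.1, ..., 1.0 (11 values)
val nominal = 0.5

// Vary x1 over its domain while x2 stays at its nominal value, then the reverse.
val varyX1: Seq[(Double, Double)] = domain.map(x1 => (x1, nominal))
val varyX2: Seq[(Double, Double)] = domain.map(x2 => (nominal, x2))
val oneFactor: Seq[(Double, Double)] = varyX1 ++ varyX2

// Full grid over the same domains, for comparison.
val fullGrid: Seq[(Double, Double)] =
  for (x1 <- domain; x2 <- domain) yield (x1, x2)

println(oneFactor.size)           // 22 = 11 + 11 (sum of the domain sizes)
println(oneFactor.distinct.size)  // 21: the nominal point (0.5, 0.5) appears in both sweeps
println(fullGrid.size)            // 121 = 11 * 11 (product of the domain sizes)
```

All one-factor samples lie on the two lines x1 = 0.5 and x2 = 0.5, which is why interactions between the factors cannot be observed with this design.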

Use within OpenMOLE

The sampling primitive OneFactorSampling implements this scheme. It takes as arguments any number of factors, each followed by the keyword nominal and its nominal value. It is used as follows with a DirectSampling:

val x1 = Val[Double]
val x2 = Val[Double]
val o = Val[Double]

val myModel = ScalaTask("val o = x1 + x2") set (
    inputs += (x1,x2),
    outputs += (x1,x2, o)
  )

DirectSampling(
  evaluation = myModel hook display,
  sampling = OneFactorSampling(
    (x1 in (0.0 to 1.0 by 0.2)) nominal 0.5,
    (x2 in (0.0 to 1.0 by 0.2)) nominal 0.5
  )
)
