Grid sampling
A grid sampling (also called complete sampling) evaluates every possible combination of the provided input values. It is only practical for a reasonable number of dimensions and discretisation steps.
Method's score
Grid sampling is a good way of getting a first glimpse at the output space of your model when you don't know anything about the structure of your input space. However, it will not give you any information on the structure of the output space, as there is no reason for evenly spaced inputs to lead to evenly spaced outputs.
Grid sampling is hampered by input space dimensionality: the number of samples is the product of the number of values taken by each input, so high-dimensional spaces need a lot of samples to be covered, as well as a lot of memory to store them.
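To make the dimensionality issue concrete, here is a minimal sketch in plain Scala (not OpenMOLE code) that counts the runs of a full grid. The object and function names are hypothetical and chosen for illustration only.

```scala
object GridSize {
  // Number of model runs in a full grid:
  // the product of the number of values taken by each input.
  def gridSize(valuesPerInput: Seq[Int]): BigInt =
    valuesPerInput.map(BigInt(_)).product

  def main(args: Array[String]): Unit = {
    // 10 values per input, for 2, 5 and 10 inputs
    for (d <- Seq(2, 5, 10))
      println(s"$d inputs -> ${gridSize(Seq.fill(d)(10))} runs")
  }
}
```

With ten values per input, two inputs need 100 runs, while ten inputs already need 10^10 runs.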
Use within OpenMOLE
A grid sampling is declared via the DirectSampling constructor, in which the bounds and discretisation steps of each input to vary are declared:
val input_i = Val[Int]
val input_j = Val[Double]
val output1 = Val[Double]
val output2 = Val[Double]
DirectSampling(
evaluation = my_own_model,
sampling =
(input_i in (0 to 10 by 2)) x
(input_j in (0.0 to 5.0 by 0.5)),
aggregation = Seq(output1 evaluate median, output2)
) hook display
with
- evaluation is the task (or composition of tasks) that uses your inputs, typically your model task and a hook,
- sampling is the sampling task,
- aggregation (optional) is a list of aggregation functions to be called on the outputs of your evaluation task. The format is variable evaluate function. OpenMOLE provides some aggregation functions such as: median, medianAbsoluteDeviation, average, meanSquaredError, rootMeanSquaredError. If a variable is listed and no aggregation function is provided, the values are aggregated in an array.
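As an illustration of what the sampling and aggregation arguments describe, the following plain Scala sketch (not OpenMOLE code) enumerates the grid declared above and applies a median to a list of values. The object name and the median helper are hypothetical; in particular, the convention used for an even number of values (mean of the two middle values) is an assumption and may differ from OpenMOLE's own median.

```scala
object GridEnumeration {
  // Values taken by each input, mirroring (0 to 10 by 2) and (0.0 to 5.0 by 0.5)
  val inputI: Seq[Int] = 0 to 10 by 2
  // Built from integers to avoid accumulating floating point error
  val inputJ: Seq[Double] = (0 to 10).map(_ * 0.5)

  // Full grid: every (input_i, input_j) pair
  val grid: Seq[(Int, Double)] =
    for { i <- inputI; j <- inputJ } yield (i, j)

  // Conventional median: middle value, or mean of the two middle values
  def median(values: Seq[Double]): Double = {
    require(values.nonEmpty, "median of an empty sequence is undefined")
    val sorted = values.sorted
    val mid = sorted.size / 2
    if (sorted.size % 2 == 1) sorted(mid)
    else (sorted(mid - 1) + sorted(mid)) / 2.0
  }

  def main(args: Array[String]): Unit = {
    println(s"${grid.size} combinations")
    grid.take(3).foreach(println)
  }
}
```

The grid holds 6 x 11 = 66 combinations, one model run each; an aggregation such as the median then reduces the 66 values of an output to a single value.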
An input can also be sampled over a logarithmic range:
input_j in LogRangeDomain(min,max,number_of_steps)
where the third argument is the number of steps in the range. The syntax can be simplified with the logSteps keyword like this: input_j in (Range(1e-2,0.1) logSteps 4).
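For intuition, a logarithmic range can be sketched in plain Scala as values evenly spaced on a logarithmic scale. This is not OpenMOLE's implementation: the logRange function is hypothetical, and it assumes that the number of steps counts the generated points with both bounds included, which may differ from OpenMOLE's convention.

```scala
object LogRangeSketch {
  // n values from min to max (both included), evenly spaced on a log scale.
  // Assumption: n counts the generated points; OpenMOLE's convention may differ.
  def logRange(min: Double, max: Double, n: Int): Seq[Double] = {
    require(min > 0 && max > min && n >= 2, "need 0 < min < max and at least 2 points")
    val logMin = math.log(min)
    val logMax = math.log(max)
    (0 until n).map { k =>
      math.exp(logMin + k * (logMax - logMin) / (n - 1))
    }
  }

  def main(args: Array[String]): Unit =
    logRange(1e-2, 0.1, 4).foreach(println)
}
```

Consecutive values share a constant ratio rather than a constant difference, which is what makes such a range useful for inputs spanning several orders of magnitude.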
The hook keyword is used to save or display results generated during the execution of a workflow.
The generic way to use it is to write either hook(workDirectory / "path/of/a/file") to save the results in a file, or hook display to display the results in the standard output.
See this page for more details about this hook.
Use example
Here is a dummy workflow showing the exploration of a Scala model that takes an integer value as input and generates a Double as output:
// Inputs and outputs declaration
val i = Val[Int]
val o = Val[Double]
// Defines the model
val myModel =
ScalaTask("val o = i * 2") set (
inputs += i,
outputs += (i, o)
)
DirectSampling(
evaluation = myModel hook display,
sampling = i in (0 to 10 by 1),
aggregation = Seq(o evaluate average)
) hook display
Some details:
- myModel is the task that multiplies the input by 2,
- the evaluation attribute of the DirectSampling method is the composition of myModel and a hook,
- the aggregation attribute of the DirectSampling method is set to compute the average of the values of o,
- the task declared under the name DirectSampling is a DirectSampling task, which means it will generate parallel executions of myModel, one for each sample generated by the sampling task.
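The details above can be mirrored in plain Scala to check what values to expect. This is only a sequential sketch, not OpenMOLE code: the object name is hypothetical, and myModel, samples and average are stand-ins for the task, the sampling and the aggregation of the workflow.

```scala
object DirectSamplingSketch {
  // Stand-in for myModel: multiplies the input by 2
  def myModel(i: Int): Double = i * 2.0

  // Stand-in for the sampling: i in (0 to 10 by 1)
  val samples: Seq[Int] = 0 to 10 by 1

  // One evaluation per sample (OpenMOLE would run these in parallel)
  val outputs: Seq[Double] = samples.map(myModel)

  // Stand-in for "o evaluate average"
  val average: Double = outputs.sum / outputs.size

  def main(args: Array[String]): Unit = {
    outputs.foreach(o => println(s"o = $o"))
    println(s"average(o) = $average")
  }
}
```

The sampling yields 11 values of i, hence 11 evaluations of the model, and the average of the outputs 0, 2, ..., 20 is 10.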
One factor at a time sampling
In the case of models requiring a long time to run, or for preliminary experiments, one may want to use a sampling similar to the grid sampling, but with a reduced total number of runs. For this, one can vary each factor successively over its domain, the others being fixed to a nominal value. Note that this type of sampling will, first, necessarily miss potential interactions between factors and, second, explore only a very small fraction of the parameter space. The computational load, in terms of number of model runs, is then only the sum of the sizes of the factor domains, instead of their product in the case of a full grid. For example, with two factors x1 and x2 each varying between 0 and 1 with a step of 0.1, and a nominal value of 0.5 for both, the one factor sampling will first take x1 = 0, 0.1, ..., 1 while x2 = 0.5, and then the converse.
Use within OpenMOLE
The sampling primitive OneFactorSampling does so and takes as arguments any number of factors decorated by the keyword nominal and the nominal value.
It is used as follows in an example with a DirectSampling:
val x1 = Val[Double]
val x2 = Val[Double]
val o = Val[Double]
val myModel = ScalaTask("val o = x1 + x2") set (
inputs += (x1,x2),
outputs += (x1,x2, o)
)
DirectSampling(
evaluation = myModel hook display,
sampling = OneFactorSampling(
(x1 in (0.0 to 1.0 by 0.2)) nominal 0.5,
(x2 in (0.0 to 1.0 by 0.2)) nominal 0.5
)
)
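To see which points a one factor at a time sampling visits, here is a plain Scala sketch (not OpenMOLE code) for two factors varying between 0 and 1 with a step of 0.1 and a nominal value of 0.5. The Factor case class and the oneFactorAtATime function are hypothetical. The sketch keeps the nominal point once per factor; whether OpenMOLE removes such duplicates is not assumed here.

```scala
object OneFactorSketch {
  // A factor: its name, the values of its domain, and its nominal value
  final case class Factor(name: String, domain: Seq[Double], nominal: Double)

  // For each factor in turn, vary it over its domain while the others stay nominal
  def oneFactorAtATime(factors: Seq[Factor]): Seq[Map[String, Double]] = {
    val nominalPoint = factors.map(f => f.name -> f.nominal).toMap
    for {
      f <- factors
      v <- f.domain
    } yield nominalPoint.updated(f.name, v)
  }

  // 0.0, 0.1, ..., 1.0 built from integers to avoid floating point drift
  val domain: Seq[Double] = (0 to 10).map(_ / 10.0)

  val points: Seq[Map[String, Double]] =
    oneFactorAtATime(Seq(Factor("x1", domain, 0.5), Factor("x2", domain, 0.5)))

  def main(args: Array[String]): Unit = {
    println(s"one factor at a time: ${points.size} runs")
    println(s"full grid: ${domain.size * domain.size} runs")
  }
}
```

With 11 values per factor, this gives 11 + 11 = 22 runs, against 11 x 11 = 121 runs for the full grid, and every visited point has at least one factor at its nominal value.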