Explore: Calibration

Using Genetic Algorithms (GA), OpenMOLE finds the input set matching one or several criteria: this is called calibration. In practice, calibration is used to target one specific scenario or dynamic. Usually, the fitness function measures a distance to your target, so if your model is not able to match the objective dynamic exactly, the calibration will find the parameterization leading to the closest possible dynamic. For more details on calibration using genetic algorithms, see the GA detailed page.

Single criterion Calibration 🔗

(Method score chart: grades for Optimisation, Input Exploration, Output Exploration and Sensitivity.)

Method scores 🔗

The single criterion calibration method is designed to solve an optimization problem, so unsurprisingly it performs well regarding the optimization grade. Since it focuses solely on discovering the best performing individual (parameter set), this method doesn't provide insights about the model sensitivity to its input parameters, as it does not keep full records of the past input samplings leading to the optimal solution.
For the same reason, this method does not intend to cover the entirety of the input and output spaces, and thus does not perform well regarding the input and output exploration grades. It concentrates the sampling of the input space on the part which minimizes the fitness, and therefore intentionally neglects the part of the input space leading to high fitness values. Calibration can handle stochasticity, using a specific method.
The dimensionality of the model input space is generally not an issue for this method, as it can handle up to a hundred dimensions in some cases. However, the number of objectives (output dimensionality) should be limited to a maximum of 2 or 3 compromise objectives.


Single criterion calibration answers the following question: for a given target value of output o1, what is(are) the parameter set(s) (i, j, k) that produce(s) the closest values of o1 to the target?

Typed signature 🔗

The single criterion calibration of a model can be seen as a general function, whose signature can be typed and noted as:

Calibration_single : (X → Y) → Y → X

With X the input space, and Y the output space.

In other words, this function takes a model M whose signature is (X→Y) (since it transforms inputs into outputs), an element y of Y representing the criterion value to reach, and finds an element x of X such that M(x) is the "closest" possible value to y.
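
As a purely illustrative sketch (these helpers are hypothetical and not part of the OpenMOLE API), this signature can be written in Scala as a higher-order function. A genetic algorithm searches the input space instead of enumerating candidates, but the shape of the function is the same:

// Hypothetical sketch of Calibration_single: given a model, a distance on Y
// and a target, return the candidate input whose image is closest to the target.
def calibrateSingle[X, Y](model: X => Y, distance: (Y, Y) => Double)(target: Y)(candidates: Seq[X]): X =
  candidates.minBy(x => distance(model(x), target))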

Multi-criteria Calibration 🔗

(Method score chart: grades for Optimisation, Input Exploration, Output Exploration and Sensitivity.)

Method scores 🔗

The multi-criteria calibration method slightly differs from the single criterion version. It suffers from the same limitations regarding the input and output spaces. However, since it may reveal a Pareto frontier and the underlying trade-off, it reveals a little bit of the model sensitivity, showing that the model performance regarding one criterion is impacted by its performances regarding the others. This is not genuine sensitivity as in sensitivity analysis, but still, it reveals a variation of your model outputs, which is not bad after all!


Multi-criteria calibration answers the following question: for a given target pattern (o1, o2), what are the parameter sets (i, j) that produce the closest output values to the target pattern?
Sometimes a Pareto frontier may appear!

Differences between single and multi-criteria calibration 🔗

Calibration boils down to minimizing a distance measure between the model output and some data. When only a single distance measure is considered, it is single criterion calibration. When more than one distance matters, it is multi-criteria calibration. For example, one may study a prey-predator model and want to find parameter values for which the model reproduces some expected sizes of both the prey and predator populations.

The single criterion case is simpler, because we can always tell which of two distances is smaller. Thus, we can always select the best set of parameter values.
In the multi-criteria case, it may not always be possible to tell which simulation output has the smallest distance to the data. For example, consider a pair (d1, d2) representing the differences between the model output and the data for the prey population size (d1) and the predator population size (d2). Two pairs such as (10, 50) and (50, 10) each have one element smaller than the other and one bigger. There is no natural way to tell which pair represents the smaller distance between the model and the data. Thus, in the multi-criteria case, we keep all the parameter sets (e.g. {(i1, j1, k1), (i2, j2, k2), ...}) yielding such equivalent distances (e.g. {(d11, d21), (d12, d22), ...}). The set of all these parameter sets is called the Pareto front.
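
The dominance test behind this reasoning can be sketched in a few lines of plain Scala (this is only an illustration, not OpenMOLE API): a distance vector dominates another when it is no worse on every criterion and strictly better on at least one.

// Illustrative sketch of Pareto dominance between two distance vectors.
def dominates(a: Seq[Double], b: Seq[Double]): Boolean =
  a.zip(b).forall { case (ai, bi) => ai <= bi } &&
    a.zip(b).exists { case (ai, bi) => ai < bi }

dominates(Seq(10.0, 50.0), Seq(50.0, 10.0)) // false: neither pair dominates the other,
dominates(Seq(50.0, 10.0), Seq(10.0, 50.0)) // false: so both stay in the Pareto front
dominates(Seq(10.0, 10.0), Seq(50.0, 10.0)) // true: at least as good everywhere, better somewhere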

Typed signature 🔗

The typed signature of multi-criteria calibration can be noted as:

Calibration_multi : (X → Y) → Y → [X]

With X the input space, and Y the output space.

In other words, this function takes a model M whose signature is (X→Y), an element y of Y representing the list of criteria values to reach, and finds a list of elements of X whose images by M are not Pareto-dominated by the image of any other element of X, with respect to the criteria y.
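
Continuing the illustrative Scala sketches above (hypothetical helpers, not OpenMOLE API), the multi-criteria version returns the Pareto front instead of a single best candidate:

// Illustrative sketch of Calibration_multi: keep every candidate whose distance
// vector to the targets is not dominated by another candidate's distance vector.
def calibrateMulti[X](model: X => Seq[Double], targets: Seq[Double])(candidates: Seq[X]): Seq[X] = {
  def distances(x: X): Seq[Double] =
    model(x).zip(targets).map { case (o, t) => math.abs(o - t) }
  def dominates(a: Seq[Double], b: Seq[Double]): Boolean =
    a.zip(b).forall { case (ai, bi) => ai <= bi } && a.zip(b).exists { case (ai, bi) => ai < bi }
  candidates.filter(x => !candidates.exists(other => dominates(distances(other), distances(x))))
}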

Calibration with OpenMOLE 🔗

Single and multi-criteria calibration in OpenMOLE are both done with the NSGA2 algorithm. It takes the following parameters:
  • mu: the population size,
  • genome: a list of the model parameters and their respective variation intervals,
  • objectives: a list of the distance measures (which in the single criterion case will contain only one measure),
  • algorithm: the nsga2 algorithm defining the evolution,
  • evaluation: the OpenMOLE task that runs the simulation,
  • parallelism: the number of simulations that will be run in parallel,
  • termination: the total number of evaluations (executions of the task passed to the parameter "evaluation") to run.

In your OpenMOLE script, the NSGA2 algorithm scheme is defined like so:

val param1 = Val[Double]
val param2 = Val[Double]
val param3 = Val[Int]
val param4 = Val[String]

val nsga2 =
  NSGA2Evolution(
    evaluation = modelTask,
    parallelism = 10,
    termination = 100,
    genome = Seq(
      param1 in (0.0, 99.0),
      param2 in (0.0, 99.0),
      param3 in (0, 5),
      param4 in List("apple", "banana", "strawberry")),
    objectives = Seq(distance1, distance2)
  )

Where param1, param2, param3 and param4 are inputs of the modelTask, and distance1 and distance2 are its outputs.
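
As noted in the method scores above, calibration can also handle stochastic models. Below is a hedged sketch of what this looks like, assuming the stochastic parameter of NSGA2Evolution and a Val[Long] seed as described in the OpenMOLE documentation on stochastic models; mySeed is an illustrative name, and you should check the GA detailed page for the exact syntax in your OpenMOLE version:

val mySeed = Val[Long]

// Hypothetical sketch: the same calibration, with the model declared stochastic
// so that the algorithm manages replications and noisy objective values.
val nsga2Stochastic =
  NSGA2Evolution(
    evaluation = modelTask,
    parallelism = 10,
    termination = 100,
    genome = Seq(
      param1 in (0.0, 99.0),
      param2 in (0.0, 99.0),
      param3 in (0, 5),
      param4 in List("apple", "banana", "strawberry")),
    objectives = Seq(distance1, distance2),
    stochastic = Stochastic(seed = mySeed)
  )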

The output of the genetic algorithm must be captured with a specific hook which saves the current optimal population: SavePopulationHook. The arguments for SavePopulationHook are:
  • evolution: the genetic algorithm,
  • dir: the directory in which to store the population files,
  • frequency: (Optional, Long) the frequency at which the generations should be saved.
If you only want the final population and do not care about the dynamics of the algorithm, you can use a SaveLastPopulationHook, which saves the last population only and takes as arguments the algorithm and the file in which it will be saved, as sketched below.
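
For example, a sketch of this last-population variant (the file name finalPopulation.csv is only an illustration; the two-argument form, evolution then file, follows the description above):

// Hypothetical sketch: save only the last population of the nsga2 evolution.
val saveLast = SaveLastPopulationHook(nsga2, workDirectory / "finalPopulation.csv")

(nsga2 hook saveLast)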

A minimal working workflow is given below:

val param1 = Val[Double]
val param2 = Val[Double]
val param3 = Val[Int]
val param4 = Val[String]

val distance1 = Val[Double]
val distance2 = Val[Double]

val modelTask =
  ScalaTask("""
    val distance1 = math.abs(param1 - 10.0) + math.abs(param2 - 50.0)
    val distance2 = (if (param3 == 2) 1.0 else 0.0) + (if (param4 == "apple") 1.0 else 0.0)
    """) set (
    inputs += (param1, param2, param3, param4),
    outputs += (distance1, distance2)
  )

val nsga2 =
  NSGA2Evolution(
    evaluation = modelTask,
    parallelism = 10,
    termination = 100,
    genome = Seq(
      param1 in (0.0, 99.0),
      param2 in (0.0, 99.0),
      param3 in (0, 5),
      param4 in List("apple", "banana", "strawberry")),
    objectives = Seq(distance1, distance2)
  )

val savePopulation = SavePopulationHook(nsga2, workDirectory / "evolution/")

(nsga2 hook savePopulation on LocalEnvironment(4))


More details and advanced notions can be found on the GA detailed page.