Pattern Space Exploration

Explore with PSE

PSE description

The Pattern Space Exploration (PSE) method is used to explore the diversity of a model's outputs. Input parameter values are selected to produce new output values, so that the covered region of the output space grows as the exploration progresses. PSE reveals the potential of your model: the variety of dynamics it is able to produce, even those you were not investigating in the first place!

Method's score

The PSE method is designed to cover the output space, so it gets the highest possible score in output exploration and, correspondingly, low scores in optimization and input space exploration. As the method discovers new patterns in the output space, you can gain some insight into the sensitivity of the model by looking at the input values that lead to these patterns. Unlike calibration-based methods, PSE is sensitive to the dimensionality of the output space, because it records every location covered during the exploration. This can quickly become costly beyond three or four dimensions.
PSE handles stochasticity in the sense that each selected pattern is estimated by the median of the output values from several model executions.
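
To make the median estimate concrete, here is a minimal sketch in plain Scala, independent of OpenMOLE. It assumes each model execution returns a vector of output values and that the median is taken component by component. That layout, and the names MedianSketch, median and medianPattern, are illustrative assumptions rather than OpenMOLE's internals.

```scala
object MedianSketch {
  // Median of a non-empty sample of replicated output values.
  def median(values: Seq[Double]): Double = {
    require(values.nonEmpty, "median of an empty sample is undefined")
    val sorted = values.sorted
    val n = sorted.length
    if (n % 2 == 1) sorted(n / 2)
    else (sorted(n / 2 - 1) + sorted(n / 2)) / 2.0
  }

  // Component-wise median over replications (assumed layout):
  // each inner Seq is the output vector of one model execution.
  def medianPattern(replications: Seq[Seq[Double]]): Seq[Double] =
    replications.transpose.map(median)
}
```

For instance, MedianSketch.medianPattern(Seq(Seq(1.0, 10.0), Seq(3.0, 30.0), Seq(2.0, 20.0))) summarises three replications of a two-output model as a single pattern.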

How it works

The PSE method searches for diverse output values. As with all evolutionary algorithms, PSE generates new individuals through a combination of genetic inheritance from the parent individuals and mutation. PSE (inspired by novelty search) selects the parents whose output values are rare compared to the rest of the population and to the previous generations. To evaluate the rarity of an output value, PSE discretises the output space, dividing it into cells. Each time a simulation is run and its output is known, a counter is incremented in the corresponding cell. PSE preferentially selects the parents whose associated cells have low counters. Selecting parents with rare output values increases the chances of producing new individuals with previously unobserved behaviours.
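
The cell counters and the preference for rare cells can be sketched in plain Scala as follows. This is an illustration under stated assumptions, not OpenMOLE's implementation: the regular grid described by lows and steps, the HitMap class, and the binary tournament used to pick a parent are all hypothetical choices made for the example.

```scala
import scala.collection.mutable
import scala.util.Random

object HitMapSketch {
  // Map an output vector to its cell: one integer index per dimension.
  // `lows` and `steps` describe a hypothetical regular grid.
  def cellOf(output: Seq[Double], lows: Seq[Double], steps: Seq[Double]): Seq[Int] =
    output.zip(lows).zip(steps).map { case ((x, low), step) =>
      math.floor((x - low) / step).toInt
    }

  // One counter per cell, incremented each time an output lands in it.
  final class HitMap(lows: Seq[Double], steps: Seq[Double]) {
    private val counts = mutable.Map.empty[Seq[Int], Int].withDefaultValue(0)

    def record(output: Seq[Double]): Unit = {
      val cell = cellOf(output, lows, steps)
      counts(cell) = counts(cell) + 1
    }

    def hits(output: Seq[Double]): Int = counts(cellOf(output, lows, steps))
  }

  // Binary tournament (an assumed selection scheme): draw two candidates
  // at random and keep the one whose cell has the lower counter.
  def selectParent[A](population: IndexedSeq[A], output: A => Seq[Double],
                      hitMap: HitMap, rng: Random): A = {
    val a = population(rng.nextInt(population.size))
    val b = population(rng.nextInt(population.size))
    if (hitMap.hits(output(a)) <= hitMap.hits(output(b))) a else b
  }
}
```

Because the counters accumulate over every recorded simulation, a cell visited in an earlier generation stays "common" later on, which is what pushes the search towards unexplored regions of the output space.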

Run

PSE within OpenMOLE

Specific constructor

The OpenMOLE constructor for PSE is PSEEvolution. It takes the following parameters:

evaluation: the OpenMOLE task that runs the simulation, i.e. the model,
parallelism: the number of simulations that will be run in parallel,
termination: the total number of evaluations to be executed,
genome: a list of the model parameters and their respective variation intervals,
objective: a list of the indicators measured at each evaluation of the model, within which diversity is sought, each with its range and discretization step,
stochastic: the seed generator, which generates suitable seeds for the method. Mandatory if your model contains randomness. The generated seed for the model task is transmitted through the variable given as an argument of Stochastic (myseed in the example below),
accept: (optional) a predicate which must be true for genomes that can be accepted by the genome sampler (for instance "i1 > 50").

Hook

The outputs of PSE must be captured with a hook. The generic way to use it is to write either hook(workDirectory / "path/of/a/file") to save the results in an OMR file, or hook display to print the results to the standard output.
The hook arguments for the PSEEvolution are:

output: the file in which to store the results,
keepHistory: optional, Boolean, keep the history of the results for future analysis,
frequency: optional, Long, the interval, in generations, at which the results are saved; it is generally set to avoid using too much disk space,
keepAll: optional, Boolean, save all the individuals of the population, not only the optimal ones.

For more details about hooks, check the corresponding Language page.

Use example

Here is a use example of the PSE method in an OpenMOLE script:

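// Seed declaration for random number generation
val myseed = Val[Int]

// Model inputs (explored through the genome)
val param1 = Val[Double]
val param2 = Val[Double]
// Model outputs (the space in which diversity is sought)
val output1 = Val[Double]
val output2 = Val[Double]

// PSE method
// modelTask is assumed to be defined earlier in the script: it takes
// param1, param2 and myseed as inputs and produces output1 and output2.
PSEEvolution(
  evaluation = modelTask,
  parallelism = 10,
  termination = 100,
  // Variation interval of each input parameter
  genome = Seq(
    param1 in (0.0, 1.0),
    param2 in (-10.0, 10.0)),
  // Range and discretization step of each output
  objective = Seq(
    output1 in (0.0 to 40.0 by 5.0),
    output2 in (0.0 to 4000.0 by 50.0)),
  stochastic = Stochastic(seed = myseed)
) hook (workDirectory / "results", frequency = 100)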

Here, param1 and param2 are inputs of the task running the model, and output1 and output2 are outputs of that same task. There is no fixed limit on the number of inputs and outputs.

Note that this method is subject to the curse of dimensionality on the output space: the number of possible output patterns (cells) grows exponentially with the number of output variables. With more than a few output variables, the search space may become so large that the search takes too long to complete and the results require more memory than a modern computer can handle. Restricting the number of output variables to 2 or 3 also makes the results easier to interpret and to visualise.
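
As a rough illustration of this growth, the following plain Scala sketch counts the cells of a regular grid as the product of the number of cells along each output axis. It assumes that only cells inside the stated bounds are counted; how OpenMOLE treats values outside those bounds is not covered here, and the names GridSizeSketch, cellsPerAxis and totalCells are hypothetical.

```scala
object GridSizeSketch {
  // Number of cells along one axis of a regular grid from `low` to `high`
  // with the given discretization step.
  def cellsPerAxis(low: Double, high: Double, step: Double): Long =
    math.ceil((high - low) / step).toLong

  // Total number of cells: product of the per-axis counts.
  // Each tuple is (low, high, step) for one output variable.
  def totalCells(axes: Seq[(Double, Double, Double)]): Long =
    axes.map { case (low, high, step) => cellsPerAxis(low, high, step) }.product
}
```

With the two objectives of the example above, GridSizeSketch.totalCells(Seq((0.0, 40.0, 5.0), (0.0, 4000.0, 50.0))) gives 8 × 80 = 640 cells. Five outputs discretised like output2 would already give 80^5, more than three billion cells.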

The PSE method is described in the following scientific paper:
Guillaume Chérel, Clémentine Cottineau and Romain Reuillon, "Beyond Corroboration: Strengthening Model Validation by Looking for Unexpected Patterns", PLOS ONE 10(9), 2015.

Stochastic models

You can check additional options to run PSE on stochastic models on this page.