Explore with Data Processing

Data processing often involves heavy computation over large sets of files. Model inputs come in many shapes, which is why OpenMOLE provides file-handling functions to explore them as easily as possible.

Exploring a set of files


OpenMOLE introduces the concept of Domains, which let a variable range over a set of files. For instance, to run a program over the set of files in a subdirectory, you may use:
val f = Val[File]
val explo = ExplorationTask (f in (workDirectory / "dir"))

To explore files located in several directories:
val i = Val[Int]
val f = Val[File]

val explo =
  ExplorationTask (
    (i in (0 to 10)) x
    (f in (workDirectory / "dir").files("subdir${i}", recursive = true).filter(f => f.isDirectory && f.getName.startsWith("exp")))
  )

To keep only the files that match some rule, use the filter function. It accepts any function from File (see javadoc) to Boolean, for example a test on the file name built with startsWith(), contains(), or endsWith().
val f = Val[File]

val explo =
  ExplorationTask ( (f in (workDirectory / "dir") filter(_.getName.endsWith(".nii.gz")) ) )
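
Outside of OpenMOLE, the kind of function expected by filter can be sketched in plain Scala. The predicate names and sample file names below are hypothetical and only illustrate the File => Boolean shape; they are not part of the OpenMOLE API.

```scala
import java.io.File

// Hypothetical predicates, each of type File => Boolean
val isNifti: File => Boolean = _.getName.endsWith(".nii.gz")
val isExperiment: File => Boolean = _.getName.startsWith("exp")

// Predicates combine with the usual Boolean operators
val isExperimentScan: File => Boolean = f => isExperiment(f) && isNifti(f)

// Sample names only: getName inspects the path string, so the files need not exist
val candidates = Seq("exp01.nii.gz", "exp02.txt", "control.nii.gz").map(new File(_))
val kept = candidates.filter(isExperimentScan).map(_.getName)
```

Any such function could presumably be passed to filter in place of the inline lambda used above.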

Searching deep file trees can be very time-consuming, and it is unnecessary when you already know how your data is organised. By default, the files selector only explores the level directly under the directory passed as a parameter. To explore the whole file tree, set the option recursive to true, as in files(recursive = true).
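
The difference between a direct listing and a recursive one can be illustrated with the standard Java file API. This is a minimal plain-Scala sketch (Scala 2.13 or later), not OpenMOLE's implementation; the helper fileNames and the temporary directory layout are assumptions made for the example.

```scala
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._

// Build a small throwaway tree: root/a.txt and root/sub/b.txt
val root: Path = Files.createTempDirectory("explore")
Files.createFile(root.resolve("a.txt"))
val sub: Path = Files.createDirectory(root.resolve("sub"))
Files.createFile(sub.resolve("b.txt"))

// Hypothetical helper: names of the regular files found at most `depth` levels below `dir`
def fileNames(dir: Path, depth: Int): Set[String] = {
  val stream = Files.walk(dir, depth)
  try stream.iterator().asScala
    .filter(p => Files.isRegularFile(p))
    .map(_.getFileName.toString)
    .toSet
  finally stream.close()
}

val direct = fileNames(root, 1)            // direct level only
val deep   = fileNames(root, Int.MaxValue) // whole tree
```

Here direct contains only a.txt, while deep also contains b.txt from the subdirectory. On a large tree, the second call visits every directory, which is the cost the default setting avoids.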


As its name suggests, the files selector manipulates File instances and injects them directly into the dataflow. If you plan to delegate your workflow to a local cluster environment equipped with a file system shared across all nodes, you don't need OpenMOLE to copy the data automatically. In this case, you might prefer the paths selector instead. The paths selector works exactly like files and accepts the very same options. The only difference between the two selectors is that paths injects Path variables into the dataflow. A Path describes a file's location but not its content. The explored files won't be automatically copied by OpenMOLE in this case, so this selector does not fit a grid environment, for instance.
More details on the difference between manipulating Files and Paths can be found in the dedicated entry of the FAQ.
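
To illustrate the idea that a path carries a location rather than content, here is a minimal plain-Scala sketch using java.nio.file. It assumes that a Path value behaves like a java.nio.file.Path; the location shown is hypothetical and nothing is read from or copied to disk.

```scala
import java.nio.file.{Path, Paths}

// Hypothetical location on a shared file system; the file does not need to exist
val location: Path = Paths.get("shared", "data", "subject01.nii.gz")

// Only the location is manipulated here: no content is opened, read or copied
val name: String = location.getFileName.toString
val depth: Int = location.getNameCount
```

A job receiving such a value can only use it if the same location is reachable from the node it runs on, which is why a shared file system is required.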

If you wish to select a single file for each value of i, you may use the select operation:
val i = Val[Int]
val f = Val[File]

val explo =
  ExplorationTask (
    (i in (0 to 10)) x
    (f in File("/path/to/a/dir").select("file${i}.txt"))
  )
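
The file names targeted by this exploration can be previewed with a plain-Scala sketch. The helper expand is hypothetical and only mimics the substitution of i into the name template; it is not how OpenMOLE performs the expansion.

```scala
// Plain (non-interpolated) string: the characters ${i} are kept literally
val template = "file${i}.txt"

// Hypothetical helper replacing the ${i} placeholder with a concrete value
def expand(template: String, i: Int): String =
  template.replace("${i}", i.toString)

// One file name per value of i, for i ranging from 0 to 10 inclusive
val names = (0 to 10).map(i => expand(template, i))
```

Since the range 0 to 10 is inclusive, the sketch yields eleven names, from file0.txt to file10.txt.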

Files can also be injected in the dataflow through Sources. They provide more powerful file filtering possibilities using regular expressions and can also target directories only.