Exploring a set of files
Data processing often involves manipulating and browsing through a large number of files. This is why OpenMOLE features file exploration functions that make your datasets as easy to manipulate as possible.
Explore files in one directory
For instance, to run a model over a set of files in the subdirectory dir of your workspace, you may use:
val f = Val[File]
DirectSampling(
evaluation = myModel,
sampling = f in (workDirectory / "dir")
)
The filter modifier filters the initial set of files according to a predicate (in the example below, the name of f needs to end with .nii.gz). You can filter using any function taking a File and computing a Boolean (see the corresponding javadoc or create your own). Some predicate functions available out of the box are startsWith(), contains(), or endsWith():
val f = Val[File]
DirectSampling(
evaluation = myModel,
sampling = (f in (workDirectory / "dir").filter(_.getName.endsWith(".nii.gz")) )
)
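To make the "create your own" option concrete, here is a minimal plain-Scala sketch of a predicate of type File => Boolean. It does not use any OpenMOLE API: the object FilterSketch, the predicate isVisibleNifti and the file names are hypothetical and serve only as an illustration.

```scala
import java.io.File

object FilterSketch {
  // Hypothetical predicate: keep compressed NIfTI files that are not hidden.
  val isVisibleNifti: File => Boolean =
    f => f.getName.endsWith(".nii.gz") && !f.getName.startsWith(".")

  // Hypothetical file names; only the names are inspected, nothing is read from disk.
  val candidates: Seq[File] =
    Seq("subject1.nii.gz", ".hidden.nii.gz", "notes.txt").map(new File(_))

  // Names of the files accepted by the predicate.
  val kept: Seq[String] = candidates.filter(isVisibleNifti).map(_.getName)
}
```

Since filter is described as accepting any function from File to Boolean, a predicate written this way could presumably be passed in place of the inline lambda, for instance as .filter(isVisibleNifti).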
Explore files in several subdirectories
Searching in deep file trees can be very time-consuming, and is unnecessary if you know how your data is organised. By default, the file selector only explores the direct level under the directory you have passed as a parameter. If you want it to explore the whole file tree, you can set the recursive option to true.
To explore files located in several directories of your workspace, use:
val i = Val[Int]
val f = Val[File]
DirectSampling(
evaluation = myModel,
sampling =
(i in (0 to 10)) x
(f in (workDirectory / "dir").files("subdir${i}", recursive = true).filter(f => f.isDirectory && f.getName.startsWith("exp")))
)
"subdir${i}" allows you to select one single file for each value of i.
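The difference between exploring only the direct level and exploring the whole tree can be illustrated outside OpenMOLE with the standard java.nio API. The sketch below is not OpenMOLE's implementation: RecursiveSketch, fileNames and demoTree are hypothetical names, and the depth limit of 1 is only an assumption about what "direct level" means.

```scala
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._

object RecursiveSketch {
  // Sorted names of the regular files found under `root`.
  // Depth 1 visits only the direct children; Int.MaxValue walks the whole tree.
  def fileNames(root: Path, recursive: Boolean): Seq[String] = {
    val depth = if (recursive) Int.MaxValue else 1
    val stream = Files.walk(root, depth)
    try {
      stream.iterator().asScala
        .filter(p => Files.isRegularFile(p))
        .map(_.getFileName.toString)
        .toList
        .sorted
    } finally stream.close()
  }

  // Build a small throwaway tree: root/a.txt and root/sub/b.txt.
  def demoTree(): Path = {
    val root = Files.createTempDirectory("sampling-sketch")
    Files.createFile(root.resolve("a.txt"))
    Files.createFile(Files.createDirectory(root.resolve("sub")).resolve("b.txt"))
    root
  }
}
```

On the demo tree, the non-recursive call only sees a.txt, while the recursive call also reaches b.txt inside sub. The deeper the tree, the more entries the recursive walk has to visit, which is why it is worth keeping the default when the data layout is known.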
Files vs Paths
As its name suggests, the files selector manipulates File instances and directly injects them in the dataflow.
If you plan to delegate your workflow to a local cluster environment equipped with a shared file system across all nodes, you don't need data to be automatically copied by OpenMOLE.
In this case, you might prefer the paths selector instead. paths works exactly like files and accepts the very same options. The only difference between the two selectors is that paths will inject Path variables in the dataflow.
However, a path describes a file's location but not its content. The explored files won't be automatically copied by OpenMOLE when using Path, so this selector does not fit a grid environment, for instance:
import java.nio.file.Path
val dataDir = "/vol/vipdata/data/HCP100"
val subjectPath = Val[Path]
val subjectID = Val[String]
DirectSampling(
evaluation = myModel,
sampling = subjectPath in File(dataDir).paths(filter=".*\\.nii\\.gz") withName subjectID
)
More details on the difference between manipulating Files and Paths can be found in the dedicated entry of the FAQ.
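The statement that a path describes a location but not a content can be checked with plain java.nio, independently of OpenMOLE. In the minimal sketch below, PathSketch and the path itself are hypothetical; the path is assumed not to exist on the machine running the code.

```scala
import java.nio.file.{Files, Path, Paths}

object PathSketch {
  // A Path can be built for a location that is not present on this machine:
  // constructing it never touches the file system.
  val remote: Path = Paths.get("/hypothetical/shared/data/subject1.nii.gz")

  // Purely syntactic information is always available.
  val name: String = remote.getFileName.toString

  // Anything involving the content requires the file to be reachable from here.
  val reachable: Boolean = Files.exists(remote)
}
```

A job that receives only such a Path can read the data only if the same location is mounted on the node where it runs, which is the case on a cluster with a shared file system but generally not on a grid.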
Going further
You can find full examples using OpenMOLE's capabilities to process a dataset in the following entries of the marketplace:
- FSL-Fast
- Random Forest