Scale up on Cluster


Content:

1 - Using clusters with OpenMOLE
2 - PBS
3 - SGE
4 - Slurm
5 - Condor
6 - OAR


Using clusters with OpenMOLE 🔗

Batch systems 🔗

Many distributed computing environments offer batch processing capabilities. OpenMOLE supports most common batch systems.
Batch systems generally work by exposing an entry point on which the user can log in and submit jobs. OpenMOLE accesses this entry point using SSH. Different environments can be assigned to delegate the workload resulting from different tasks or groups of tasks. However, not all clusters expose the same features, so the available options may vary from one environment to another.

Before being able to use a batch system, you should first provide your authentication information to OpenMOLE.
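
For reference, authentication can also be declared from an OpenMOLE script. The sketch below is an assumption about the exact form of the SSHAuthentication API (the PrivateKey constructor and its argument order may differ in your OpenMOLE version); the graphical interface or the authentication documentation remains the authoritative way to set this up.

// Hypothetical sketch: register an SSH private-key authentication for the cluster head node.
// The PrivateKey constructor and its argument order are assumptions to check against your version.
SSHAuthentication +=
  PrivateKey(
    "/home/user/.ssh/id_rsa", // path to the private key
    "login",                  // user name on the cluster head node
    "password",               // passphrase protecting the key
    "machine.domain"          // address of the cluster head node
  )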

Grouping 🔗

Also note that a batch environment is generally not suited to short tasks, i.e. shorter than about one minute on a cluster. If your tasks are short, you can group several executions into each job with the keyword by in your workflow. For instance, the workflow below groups the executions of model by 100 in each job submitted to the environment:

// Define the variables that are transmitted between the tasks
val i = Val[Double]
val res = Val[Double]

// Define the model, here it is a simple task executing "res = i * 2", but it can be your model
val model =
  ScalaTask("val res = i * 2") set (
    inputs += i,
    outputs += (i, res)
  )

// Define a local environment
val env = LocalEnvironment(10)

// Make the model run on the local environment
DirectSampling(
  evaluation = model on env by 100 hook display,
  sampling = i in (0.0 to 1000.0 by 1.0)
)
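
The same grouping keyword works unchanged with the cluster environments described below. As a sketch, assuming a Slurm cluster reachable at machine.domain (the login and host are placeholders), the evaluation would simply target that environment instead:

// Sketch: delegate the grouped jobs to a Slurm cluster instead of the local environment
// ("login" and "machine.domain" are placeholders for your own credentials and head node)
val cluster = SLURMEnvironment("login", "machine.domain")

DirectSampling(
  evaluation = model on cluster by 100 hook display,
  sampling = i in (0.0 to 1000.0 by 1.0)
)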

PBS 🔗

PBS is a venerable batch system for clusters. It is also referred to as Torque. You may use a PBS computing environment as follows:

val env =
  PBSEnvironment(
    "login",
    "machine.domain"
  )

You can also set options by providing additional parameters to the environment (..., option = value, ...); a combined example follows this list:
  • port: the port number used by the ssh server, 22 by default,
  • sharedDirectory: the directory OpenMOLE uses to communicate between the head node of the cluster and the worker nodes (defaults to sharedDirectory = "/home/user/.openmole/.tmp/ssh"),
  • storageSharedLocally: when set to true, OpenMOLE uses symbolic links instead of physically copying files to the remote environment. This assumes that the OpenMOLE instance has access to the same storage space as the remote environment (think same NFS filesystem on desktop machine and cluster). Defaults to false and shouldn't be used unless you're 100% sure of what you're doing!,
  • workDirectory: the directory in which OpenMOLE will execute on the remote server, for instance workDirectory = "${TMP}",
  • queue: the name of the queue on which jobs will be submitted, for instance queue = "longjobs",
  • wallTime: the maximum time a job is permitted to run before being killed, for instance wallTime = 1 hour,
  • memory: the memory for the job, for instance memory = 2 gigabytes,
  • openMOLEMemory: the memory attributed to the OpenMOLE runtime on the execution node. If you run external tasks, you can reduce the memory for the OpenMOLE runtime to 256MB in order to leave more memory for your program on the execution node, for instance openMOLEMemory = 256 megabytes,
  • nodes: the number of nodes requested,
  • threads: the number of threads for concurrent execution of tasks on the worker node, for instance threads = 4,
  • coreByNodes: an alternative to specifying the number of threads; coreByNodes takes the value of threads when not specified, or 1 if neither is specified,
  • flavour: the variant of PBS installed on your cluster. You can choose between Torque (for the open-source PBS/Torque) or PBSPro. Defaults to flavour = Torque,
  • localSubmission: set to true if you are running OpenMOLE from a node of the cluster (useful, for example, if you can only ssh to the cluster from behind a VPN and cannot set up the VPN where your OpenMOLE is running); user and host are not mandatory in this case.
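
As a sketch combining several of the options above (the queue name and resource values are placeholders to adapt to your cluster):

val env =
  PBSEnvironment(
    "login",
    "machine.domain",
    // placeholder values, adapt to your cluster
    queue = "longjobs",
    wallTime = 1 hour,
    memory = 2 gigabytes,
    openMOLEMemory = 256 megabytes,
    flavour = Torque
  )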

SGE 🔗

To delegate some computation load to an SGE-based cluster, you can use the SGEEnvironment as follows:

val env =
  SGEEnvironment(
    "login",
    "machine.domain"
  )

You can also set options by providing additional parameters to the environment (..., option = value, ...); a combined example follows this list:
  • port: the port number used by the ssh server, 22 by default,
  • sharedDirectory: the directory OpenMOLE uses to communicate between the head node of the cluster and the worker nodes (defaults to sharedDirectory = "/home/user/.openmole/.tmp/ssh"),
  • storageSharedLocally: when set to true, OpenMOLE uses symbolic links instead of physically copying files to the remote environment. This assumes that the OpenMOLE instance has access to the same storage space as the remote environment (think same NFS filesystem on desktop machine and cluster). Defaults to false and shouldn't be used unless you're 100% sure of what you're doing!,
  • workDirectory: the directory in which OpenMOLE will execute on the remote server, for instance workDirectory = "${TMP}",
  • queue: the name of the queue on which jobs will be submitted, for instance queue = "longjobs",
  • wallTime: the maximum time a job is permitted to run before being killed, for instance wallTime = 1 hour,
  • memory: the memory for the job, for instance memory = 2 gigabytes,
  • openMOLEMemory: the memory attributed to the OpenMOLE runtime on the execution node. If you run external tasks, you can reduce the memory for the OpenMOLE runtime to 256MB in order to leave more memory for your program on the execution node, for instance openMOLEMemory = 256 megabytes,
  • threads: the number of threads for concurrent execution of tasks on the worker node, for instance threads = 4,
  • localSubmission: set to true if you are running OpenMOLE from a node of the cluster (useful, for example, if you can only ssh to the cluster from behind a VPN and cannot set up the VPN where your OpenMOLE is running); user and host are not mandatory in this case.
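
As a sketch combining several of the options above (the queue name and resource values are placeholders to adapt to your cluster):

val env =
  SGEEnvironment(
    "login",
    "machine.domain",
    // placeholder values, adapt to your cluster
    queue = "longjobs",
    wallTime = 1 hour,
    memory = 2 gigabytes
  )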

Slurm 🔗

To delegate the workload to a Slurm-based cluster, you can use the SLURMEnvironment as follows:

val env =
  SLURMEnvironment(
    "login",
    "machine.domain",
    // optional parameters
    partition = "short-jobs",
    time = 1 hour
  )

You can also set options by providing additional parameters to the environment (..., option = value, ...); a combined example follows this list:
  • port: the port number used by the ssh server, 22 by default,
  • sharedDirectory: the directory OpenMOLE uses to communicate between the head node of the cluster and the worker nodes (defaults to sharedDirectory = "/home/user/.openmole/.tmp/ssh"),
  • storageSharedLocally: when set to true, OpenMOLE uses symbolic links instead of physically copying files to the remote environment. This assumes that the OpenMOLE instance has access to the same storage space as the remote environment (think same NFS filesystem on desktop machine and cluster). Defaults to false and shouldn't be used unless you're 100% sure of what you're doing!,
  • workDirectory: the directory in which OpenMOLE will execute on the remote server, for instance workDirectory = "${TMP}",
  • partition: the name of the partition (queue) on which jobs will be submitted, for instance partition = "longjobs",
  • time: the maximum time a job is permitted to run before being killed, for instance time = 1 hour,
  • memory: the memory for the job, for instance memory = 2 gigabytes,
  • openMOLEMemory: the memory attributed to the OpenMOLE runtime on the execution node. If you run external tasks, you can reduce the memory for the OpenMOLE runtime to 256MB in order to leave more memory for your program on the execution node, for instance openMOLEMemory = 256 megabytes,
  • nodes: the number of nodes requested,
  • threads: the number of threads for concurrent execution of tasks on the worker node, for instance threads = 4; it automatically sets the cpuPerTask entry,
  • cpuPerTask: an alternative to specifying the number of threads; cpuPerTask takes the value of threads when not specified, or 1 if neither is specified,
  • reservation: the name of a SLURM reservation,
  • qos: the Quality of Service (QOS) as defined in the Slurm database,
  • gres: a list of Generic Resources (GRES) requested. A Gres is a pair defined by the name of the resource and the number of resources requested (scalar), for instance gres = List(Gres("resource", 1)),
  • constraints: a list of SLURM-defined constraints which selected nodes must match,
  • localSubmission: set to true if you are running OpenMOLE from a node of the cluster (useful, for example, if you can only ssh to the cluster from behind a VPN and cannot set up the VPN where your OpenMOLE is running); user and host are not mandatory in this case.
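
As a sketch combining several of the options above (the partition, QOS, GRES and constraint names are placeholders to adapt to your cluster):

val env =
  SLURMEnvironment(
    "login",
    "machine.domain",
    // placeholder values, adapt to your cluster configuration
    partition = "longjobs",
    time = 1 hour,
    memory = 2 gigabytes,
    qos = "normal",
    gres = List(Gres("gpu", 1)),
    constraints = List("haswell")
  )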

Condor 🔗

Condor clusters can be leveraged using the following syntax:

val env =
  CondorEnvironment(
    "login",
    "machine.domain"
  )

You can also set options by providing additional parameters to the environment (..., option = value, ...); a combined example follows this list:
  • port: the port number used by the ssh server, 22 by default,
  • sharedDirectory: the directory OpenMOLE uses to communicate between the head node of the cluster and the worker nodes (defaults to sharedDirectory = "/home/user/.openmole/.tmp/ssh"),
  • storageSharedLocally: when set to true, OpenMOLE uses symbolic links instead of physically copying files to the remote environment. This assumes that the OpenMOLE instance has access to the same storage space as the remote environment (think same NFS filesystem on desktop machine and cluster). Defaults to false and shouldn't be used unless you're 100% sure of what you're doing!,
  • workDirectory: the directory in which OpenMOLE will execute on the remote server, for instance workDirectory = "${TMP}",
  • memory: the memory for the job, for instance memory = 2 gigabytes,
  • openMOLEMemory: the memory attributed to the OpenMOLE runtime on the execution node. If you run external tasks, you can reduce the memory for the OpenMOLE runtime to 256MB in order to leave more memory for your program on the execution node, for instance openMOLEMemory = 256 megabytes,
  • threads: the number of threads for concurrent execution of tasks on the worker node, for instance threads = 4,
  • localSubmission: set to true if you are running OpenMOLE from a node of the cluster (useful, for example, if you can only ssh to the cluster from behind a VPN and cannot set up the VPN where your OpenMOLE is running); user and host are not mandatory in this case.
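
As a sketch combining several of the options above (the memory and thread values are placeholders to adapt to your needs):

val env =
  CondorEnvironment(
    "login",
    "machine.domain",
    // placeholder values, adapt to your cluster
    memory = 2 gigabytes,
    openMOLEMemory = 256 megabytes,
    threads = 4
  )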

OAR 🔗

Similarly, OAR clusters are reached as follows:

val env =
  OAREnvironment(
    "login",
    "machine.domain"
  )

You can also set options by providing additional parameters to the environment (..., option = value, ...); a combined example follows this list:
  • port: the port number used by the ssh server, 22 by default,
  • sharedDirectory: the directory OpenMOLE uses to communicate between the head node of the cluster and the worker nodes (defaults to sharedDirectory = "/home/user/.openmole/.tmp/ssh"),
  • storageSharedLocally: when set to true, OpenMOLE uses symbolic links instead of physically copying files to the remote environment. This assumes that the OpenMOLE instance has access to the same storage space as the remote environment (think same NFS filesystem on desktop machine and cluster). Defaults to false and shouldn't be used unless you're 100% sure of what you're doing!,
  • workDirectory: the directory in which OpenMOLE will execute on the remote server, for instance workDirectory = "${TMP}",
  • queue: the name of the queue on which jobs will be submitted, for instance queue = "longjobs",
  • wallTime: the maximum time a job is permitted to run before being killed, for instance wallTime = 1 hour,
  • openMOLEMemory: the memory attributed to the OpenMOLE runtime on the execution node. If you run external tasks, you can reduce the memory for the OpenMOLE runtime to 256MB in order to leave more memory for your program on the execution node, for instance openMOLEMemory = 256 megabytes,
  • threads: the number of threads for concurrent execution of tasks on the worker node, for instance threads = 4,
  • core: the number of cores allocated for each job,
  • cpu: the number of CPUs allocated for each job,
  • bestEffort: a boolean enabling the best effort mode (true by default),
  • localSubmission: set to true if you are running OpenMOLE from a node of the cluster (useful, for example, if you can only ssh to the cluster from behind a VPN and cannot set up the VPN where your OpenMOLE is running); user and host are not mandatory in this case.
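
As a sketch combining several of the options above (the queue name and resource values are placeholders to adapt to your cluster):

val env =
  OAREnvironment(
    "login",
    "machine.domain",
    // placeholder values, adapt to your cluster
    queue = "longjobs",
    wallTime = 1 hour,
    core = 4,
    bestEffort = true
  )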