
Slurm memory specification - Main Thread #2198

@rexcsn

Description


Opening this issue as the main thread to collect Slurm memory-related info, concerns, and workarounds.

Issue

As mentioned in previously opened issues such as #1517 and #1714, due to changes in Slurm, nodes in pcluster >= v2.5.0 are not configured with RealMemory information.
As a result, ParallelCluster does not currently support scheduling with memory options in Slurm.
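
For context, the typical symptom (with RealMemory left at Slurm's 1 MB default) is that any job submitted with a memory request is rejected. The exact messages vary by Slurm version, but they look similar to:
$ sbatch --mem=4G --wrap "sleep 60"
sbatch: error: Memory specification can not be satisfied
sbatch: error: Batch job submission failed: Requested node configuration is not available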

Workarounds

For pcluster >= v2.5.0 and < v2.9.0, the workaround outlined here can be used to configure memory for a cluster containing only 1 compute instance type.

For pcluster >= v2.9.0, multiple queue mode was introduced, and a cluster can now have multiple compute instance types.
The old workaround can still be used for a cluster with only 1 compute instance type.
Here are the updated instructions on how to configure memory for multiple instance types in pcluster >= v2.9.0:

  • Determine the RealMemory available on the compute instance. We can get this by SSHing into an available/online compute node and running /opt/slurm/sbin/slurmd -C; we should see something like RealMemory=<SOME_NUMBER> in the output (a sample of this output is shown after this list).
  • Note that since we have multiple compute instance types, we will need to repeat step 1 for every instance type to get the RealMemory information for each one.
  • Once we have the RealMemory information, we need to add it to the corresponding nodes in each queue/partition. We can do this by modifying the partition configuration file, located at /opt/slurm/etc/pcluster/slurm_parallelcluster_<PARTITION_NAME>_partition.conf.
  • Append RealMemory=<CHOSEN_MEMORY> to the NodeName=<YOUR_NODE_NAME> ... entry for each instance type in each queue/partition.
  • For example, if I want to configure RealMemory=60000 for my nodes queue1-dy-m54xlarge-[1-10], I would modify /opt/slurm/etc/pcluster/slurm_parallelcluster_queue1_partition.conf, and the modified file should look like:
$ cat /opt/slurm/etc/pcluster/slurm_parallelcluster_queue1_partition.conf 
# This file is automatically generated by pcluster

NodeName=queue1-dy-m54xlarge-[1-10] CPUs=16 State=CLOUD Feature=dynamic,m5.4xlarge RealMemory=60000
...
  • Note that ideally we should just use the RealMemory value reported by /opt/slurm/sbin/slurmd -C, but RealMemory might differ between individual machines of the same type. If the configured RealMemory is larger than the actual value seen by /opt/slurm/sbin/slurmd -C when a new node launches, Slurm will automatically place the node into DRAIN state. To be safe, round the value down.
  • In /opt/slurm/etc/slurm.conf, change SelectTypeParameters from CR_CPU to CR_CPU_Memory (see the snippet after this list).
  • [Optional] pcluster's clustermgtd process will replace/terminate DRAINED nodes automatically. To disable this functionality and avoid nodes getting terminated automatically while setting up memory, add terminate_drain_nodes = False to the clustermgtd configuration file at /etc/parallelcluster/slurm_plugin/parallelcluster_clustermgtd.conf (see the snippet after this list). Once setup is finished, remove the option or set terminate_drain_nodes = True to restore full clustermgtd functionality.
  • Restart slurmd on the compute nodes and slurmctld on the head node, and we should see that memory is configured in scontrol show nodes (example commands after this list).
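
For reference, step 1 on a compute node looks similar to the following (node name and values are illustrative; the exact fields printed by slurmd -C vary by instance type and Slurm version):
$ ssh queue1-dy-m54xlarge-1
$ /opt/slurm/sbin/slurmd -C
NodeName=queue1-dy-m54xlarge-1 CPUs=16 Boards=1 SocketsPerBoard=1 CoresPerSocket=8 ThreadsPerCore=2 RealMemory=63466
UpTime=0-00:12:34
In this example we would round down and use something like RealMemory=60000 in the partition configuration.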
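
The two configuration changes above end up looking like this (only the changed lines are shown; the [clustermgtd] section name is based on the pcluster 2.9 layout of that file and may differ in other versions):
$ grep SelectTypeParameters /opt/slurm/etc/slurm.conf
SelectTypeParameters=CR_CPU_Memory

$ cat /etc/parallelcluster/slurm_plugin/parallelcluster_clustermgtd.conf
[clustermgtd]
...
terminate_drain_nodes = False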
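
Finally, a sketch of the restart and verification step (assuming the Slurm daemons are managed by systemd, which may not be the case on every pcluster version/OS; adjust to however the daemons are run on your cluster, and note the sample outputs below are illustrative):
# On each running compute node
$ sudo systemctl restart slurmd
# On the head node
$ sudo systemctl restart slurmctld
# Verify that memory is now configured
$ /opt/slurm/bin/scontrol show nodes queue1-dy-m54xlarge-1 | grep RealMemory
   RealMemory=60000 AllocMem=0 FreeMem=58356 Sockets=16 Boards=1
# Memory-based scheduling should now work, e.g.:
$ sbatch --mem=32G --wrap "sleep 60"
Submitted batch job 2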

Further discussion

We understand that the workarounds for this feature may be difficult to set up manually.
Official support for this feature is not currently planned because there is no good way to retrieve RealMemory from nodes and configure this information prior to launching the cluster. In addition, there is currently no way for Slurm to configure this information for nodes automatically.
We will continue to evaluate ways to add support for this feature.

Thank you!
