Opening this issue as the main thread to collect Slurm memory-related info, concerns, and workarounds.
Issue
As mentioned in previously opened issues such as #1517 and #1714, due to changes in Slurm, nodes for pcluster>=v2.5.0 are not configured with `RealMemory` information.
As a result, ParallelCluster currently does not support scheduling with memory options in Slurm.
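For illustration, this is the kind of submission that is affected. When `RealMemory` is not configured, Slurm falls back to its default of 1 MB per node, so any job that asks for a meaningful amount of memory cannot be placed (the submission below is a made-up example):

```
# Hypothetical job: request 8000 MB of memory for the job.
# With unconfigured nodes advertising RealMemory=1 (Slurm's default),
# this request exceeds every node's memory and cannot be scheduled.
$ sbatch --mem=8000 --wrap "sleep 60"
```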
Workarounds
For pcluster>=v2.5.0 and <v2.9.0, the workaround outlined here can be used to configure memory for a cluster containing only one compute instance type.
For pcluster>=v2.9.0, multiple queue mode was introduced, and a cluster can now have multiple compute instance types.
The old workaround can still be used for a cluster with only one compute instance type.
Here are the updated instructions on how to configure memory for multiple instance types in pcluster>=v2.9.0:
- Determine the `RealMemory` available on the compute instance. We can get this by SSHing into an available/online compute node and running `/opt/slurm/sbin/slurmd -C`; we should see something like `RealMemory=<SOME_NUMBER>` in the output.
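
  For illustration, the output looks roughly like this (the values below are invented for an m5.4xlarge-sized node, not real measurements):

  ```
  $ /opt/slurm/sbin/slurmd -C
  NodeName=queue1-dy-m54xlarge-1 CPUs=16 Boards=1 SocketsPerBoard=1 CoresPerSocket=8 ThreadsPerCore=2 RealMemory=63308
  UpTime=0-00:12:34
  ```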
- Note that since we have multiple compute instance types, we will need to repeat the first step for every instance type to get the `RealMemory` information for each instance type.
- Once we have the `RealMemory` information, we need to add it to the corresponding nodes in each queue/partition. We can do this by modifying the partition configuration file, located at `/opt/slurm/etc/pcluster/slurm_parallelcluster_<PARTITION_NAME>_partition.conf`.
- Append `RealMemory=<CHOSEN_MEMORY>` to the `NodeName=<YOUR_NODE_NAME> ...` entry for each instance type in each queue/partition (see the sed sketch after the example below).
- For example, if I want to configure `RealMemory=60000` for my nodes `queue1-dy-m54xlarge-[1-10]`, I would modify `/opt/slurm/etc/pcluster/slurm_parallelcluster_queue1_partition.conf`, and the modified file should look like:

  ```
  $ cat /opt/slurm/etc/pcluster/slurm_parallelcluster_queue1_partition.conf
  # This file is automatically generated by pcluster
  NodeName=queue1-dy-m54xlarge-[1-10] CPUs=16 State=CLOUD Feature=dynamic,m5.4xlarge RealMemory=60000
  ...
  ```
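
  If you have many NodeName lines to update, a one-liner like this can append the value (a sketch, not an official pcluster step; the node-name pattern and memory value are taken from the example above, so adjust them to your cluster and back up the file first):

  ```
  # Append RealMemory=60000 to every NodeName line for queue1's m5.4xlarge nodes.
  # The pattern below matches the example node names; substitute your own.
  $ sudo sed -i '/^NodeName=queue1-dy-m54xlarge-/ s/$/ RealMemory=60000/' \
      /opt/slurm/etc/pcluster/slurm_parallelcluster_queue1_partition.conf
  ```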
- Note that ideally we should just use the `RealMemory` info we got from `/opt/slurm/sbin/slurmd -C`, but `RealMemory` might be different for different machines. If the configured `RealMemory` is larger than the actual value seen by `/opt/slurm/sbin/slurmd -C` when a new node launches, the node will be placed into `DRAIN` state by Slurm automatically. To be safe, we want to round the value down.
- In `/opt/slurm/etc/slurm.conf`, change `SelectTypeParameters` from `CR_CPU` to `CR_CPU_Memory` (a sketch of this edit follows).
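
  Assuming the line appears verbatim as `SelectTypeParameters=CR_CPU` (an assumption; check your file before editing), the change can be made like this:

  ```
  # Switch consumable-resource selection from CPUs only to CPUs + memory.
  # Assumes the line reads exactly "SelectTypeParameters=CR_CPU".
  $ sudo sed -i 's/^SelectTypeParameters=CR_CPU$/SelectTypeParameters=CR_CPU_Memory/' /opt/slurm/etc/slurm.conf
  $ grep SelectTypeParameters /opt/slurm/etc/slurm.conf
  SelectTypeParameters=CR_CPU_Memory
  ```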
- [Optional] pcluster's `clustermgtd` process will replace/terminate `DRAINED` nodes automatically. To disable this functionality and avoid nodes getting terminated automatically while setting up memory, add `terminate_drain_nodes = False` to the `clustermgtd` configuration file at `/etc/parallelcluster/slurm_plugin/parallelcluster_clustermgtd.conf`. Once setup is finished, we can remove the setting or set `terminate_drain_nodes = True` to restore full `clustermgtd` functionality.
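
  For reference, the edited file might look like this (a sketch; the `[clustermgtd]` section header and the keys elided as `...` are assumptions, so keep whatever your file already contains and only add the new line):

  ```
  $ cat /etc/parallelcluster/slurm_plugin/parallelcluster_clustermgtd.conf
  [clustermgtd]
  ...
  terminate_drain_nodes = False
  ```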
- Restart `slurmd` on the compute nodes and `slurmctld` on the head node, and we should see that memory is configured in `scontrol show nodes`.
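
  How the daemons are restarted depends on how your AMI manages Slurm services; assuming systemd units named `slurmd` and `slurmctld` (an assumption, verify on your cluster), something like:

  ```
  # On each compute node (assumed systemd unit name):
  $ sudo systemctl restart slurmd
  # On the head node (assumed systemd unit name):
  $ sudo systemctl restart slurmctld
  # Verify that the configured memory is now visible:
  $ /opt/slurm/bin/scontrol show nodes | grep RealMemory
  ```
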
Further discussion
We understand that workarounds for this feature may be difficult to set up manually.
Official support for this feature is not currently planned because there is not a good way to retrieve `RealMemory`
from nodes and configure this information prior to launching the cluster. In addition, there is currently no way for Slurm to configure this information for nodes automatically.
We will continue to evaluate ways to add support for this feature.
Thank you!