[torchx/schedulers] Add more runopts for SLURM #389

@kiukchung

Description

Currently only a handful of SLURM options are exposed as runopts. Users are asking for more.

Motivation/Background

FAIR users typically set these configs: https://github.com/facebookresearch/pycls/blob/8c79a8e2adfffa7cae3a88aace28ef45e52aa7e5/pycls/core/distributed.py#L120-L130

Some of them, especially resource-related ones (mem, gpu, cpu, etc.), can be set via the AppDef. Others, like "email", either need to be offered directly as runopts or require a more dynamic way to pass them (see Detailed Proposal).
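To illustrate the split described above, here is a minimal sketch, assuming a hypothetical `Resource` stand-in (not the actual torchx class) and a hypothetical `mail_user` runopt. Resource-related sbatch flags can be derived from the resource request, while an option like email has no resource counterpart and has to arrive some other way. The flags used (`--cpus-per-task`, `--mem`, `--gpus-per-node`, `--mail-user`) are standard sbatch options.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    """Hypothetical stand-in for a role's resource request (not the torchx class)."""
    cpu: int
    gpu: int
    memMB: int


def sbatch_args(resource: Resource, mail_user: Optional[str] = None) -> List[str]:
    # Resource-related options can be derived from the AppDef-style resource spec.
    args = [
        f"--cpus-per-task={resource.cpu}",
        f"--mem={resource.memMB}M",
    ]
    if resource.gpu > 0:
        args.append(f"--gpus-per-node={resource.gpu}")
    # Email has no resource counterpart, so it must come from a (hypothetical) runopt.
    if mail_user is not None:
        args.append(f"--mail-user={mail_user}")
    return args
```

For example, `sbatch_args(Resource(cpu=8, gpu=2, memMB=16000), mail_user="me@example.com")` yields the CPU, memory, and GPU flags followed by `--mail-user=me@example.com`. Every non-resource option added this way needs its own parameter, which is the "need-to" growth described in option 1 below.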

Detailed Proposal

One of the following:

  1. Keep adding user-requested sbatch options on an as-needed basis.
  2. Support dynamic key-value pairs (e.g. "--cfg sbatch_options=k:v,k:v,k:v").
  3. Support SLURM-specific options via AppDef.metadata (we do this for our internal schedulers to let users set Thrift fields directly from the metadata as JSON).
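Options 2 and 3 could be sketched roughly as follows. This is a minimal illustration, not an existing torchx API: `parse_sbatch_options`, `from_metadata`, `to_sbatch_directives`, and the metadata key `slurm_sbatch_options` are all hypothetical names. The key-value parser splits each pair on the first ":" only, so values such as a time limit (`01:00:00`) survive.

```python
import json
from typing import Dict, List


def parse_sbatch_options(value: str) -> Dict[str, str]:
    """Option 2: parse a hypothetical "k:v,k:v" runopt value into a dict."""
    opts: Dict[str, str] = {}
    for pair in value.split(","):
        pair = pair.strip()
        if not pair:
            continue
        # Split on the first ":" only, so "time:01:00:00" keeps its full value.
        key, sep, val = pair.partition(":")
        if not sep or not key.strip():
            raise ValueError(f"expected key:value, got {pair!r}")
        opts[key.strip()] = val.strip()
    return opts


def from_metadata(metadata: Dict[str, str]) -> Dict[str, str]:
    """Option 3: read options as a JSON object stored under a hypothetical metadata key."""
    raw = metadata.get("slurm_sbatch_options", "{}")
    return {str(k): str(v) for k, v in json.loads(raw).items()}


def to_sbatch_directives(opts: Dict[str, str]) -> List[str]:
    # One "#SBATCH" header line per option; an empty value becomes a bare flag.
    return [f"#SBATCH --{k}={v}" if v else f"#SBATCH --{k}" for k, v in opts.items()]
```

One trade-off shows up quickly: the "k:v,k:v" format cannot carry values that themselves contain commas (e.g. `mail-type` set to `BEGIN,END`), whereas a JSON object in the metadata can.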

Alternatives

(discussed in the proposal above)

Additional context/links

N/A
