|
| 1 | +# Slurm Fairshare User Manual |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +Fairshare scheduling in Slurm (specifically, Fair Tree fairshare) is a mechanism that allocates |
| 6 | +resources equitably among users and accounts based on their historical resource usage. |
| 7 | +This gives all users a fair opportunity to access cluster resources and promotes balanced |
| 8 | +use of the cluster's GPU compute capacity. |
| 9 | + |
| 10 | +## Key Concepts |
| 11 | + |
| 12 | +### Accounts |
| 13 | + |
| 14 | +- **Accounts:** In Slurm, accounts are used to group users for resource allocation. Each account |
| 15 | +can have associated priorities and resource limits. On Beatrix, all NRC users are |
| 16 | +equally prioritized for resource allocation. |
| 17 | + |
| 18 | +- **Fairshare Weighting:** Each account is assigned a fairshare factor that influences job |
| 19 | +prioritization. The factor is determined by the account's historical usage relative to its |
| 20 | +allocated shares. For details, see the |
| 21 | +[Slurm Fair Tree documentation](https://slurm.schedmd.com/fair_tree.html). |
| 22 | + |
| 23 | +### Partitions |
| 24 | + |
| 25 | +- **Partitions:** These are logical divisions within the cluster, allowing different sets of resources |
| 26 | +to be allocated for different types of jobs. Partitions can have different configurations, such as |
| 27 | +node types and resource limits. |
| 28 | + |
| 29 | +- The following partitions are available on the Beatrix cluster: |
| 30 | + |
| 31 | + | Partition | Time limit (D-HH:MM:SS) | Nodes | Note | |
| 32 | + | ----------- | ----------------------- | ----- | --------------------------------------------------- | |
| 33 | + | Trixiemain* | 12:00:00 | 34 | Default partition for all users | |
| 34 | + | Trixielong | 2-00:00:00 | 24 | Users must be authorized by the user representative | |
| 35 | + | JobTesting | 6:00:00 | 2 | Job testing partition | |
| 36 | + | Preemptible | 12:00:00 | 30 | Jobs submitted here may be preempted | |
| 37 | + | Larus | 7-00:00:00 | 34 | Industry SME partition, not available to NRC users | |
| 38 | + |
| 39 | +## Fair-Tree Fairshare Scheduling |
| 40 | + |
| 41 | +### How It Works |
| 42 | + |
| 43 | +- **Historical Usage:** Slurm tracks each account's historical resource usage and adjusts |
| 44 | +priorities based on this data. Accounts that have used fewer resources within the fairshare window |
| 45 | +receive a higher priority than accounts that have used more. This calculation is done at |
| 46 | +multiple levels, such that if accounts A and B are siblings and A has a higher fairshare factor |
| 47 | +than B, all children of A will have higher fairshare factors than all children of B. |
| 48 | + |
| 49 | +- **Priority Calculation:** Job priorities are calculated as a weighted combination of the fairshare |
| 50 | +factor, job age, and other configurable factors such as job size, partition, and QOS. |
| 51 | + |
| 52 | +- **Dynamic Adjustment:** Slurm recalculates priorities as new usage accrues and older usage ages |
| 53 | +out of the fairshare window, so resources are distributed fairly over time. |
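
The sibling rule described under **Historical Usage** can be illustrated with a small Python sketch. This is not Slurm's implementation. It is a simplified model of the ranking described in the Slurm Fair Tree documentation: siblings are sorted by Level Fairshare (normalized shares divided by normalized usage), the tree is walked depth-first, and each user gets `rank / user_count`. The `Assoc` class, the account and user names, and the usage numbers are all hypothetical, and ties between siblings are ignored for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Assoc:
    """Hypothetical tree node: an account (has children) or a user (leaf)."""
    name: str
    shares: float                 # raw shares assigned to this association
    usage: float = 0.0            # raw usage; for accounts, summed from children
    children: List["Assoc"] = field(default_factory=list)

    @property
    def is_user(self) -> bool:
        # Simplification: any node without children is treated as a user.
        return not self.children


def roll_up_usage(node: Assoc) -> float:
    """An account's usage is the sum of the usage of everything below it."""
    if node.children:
        node.usage = sum(roll_up_usage(c) for c in node.children)
    return node.usage


def level_fairshare(node: Assoc, siblings: List[Assoc]) -> float:
    """Level Fairshare = S / U, each normalized over the node and its siblings."""
    s = node.shares / sum(x.shares for x in siblings)
    total_usage = sum(x.usage for x in siblings)
    u = node.usage / total_usage if total_usage else 0.0
    return float("inf") if u == 0 else s / u


def count_users(node: Assoc) -> int:
    return 1 if node.is_user else sum(count_users(c) for c in node.children)


def fair_tree_factors(root: Assoc) -> Dict[str, float]:
    """Rank users in one depth-first pass; the best-ranked user gets factor 1.0."""
    roll_up_usage(root)
    n_users = count_users(root)
    rank = n_users
    factors: Dict[str, float] = {}

    def visit(account: Assoc) -> None:
        nonlocal rank
        # Visit siblings from highest to lowest Level Fairshare.
        ordered = sorted(
            account.children,
            key=lambda c: level_fairshare(c, account.children),
            reverse=True,
        )
        for child in ordered:
            if child.is_user:
                factors[child.name] = rank / n_users
                rank -= 1
            else:
                visit(child)  # finish this whole subtree before the next sibling

    visit(root)
    return factors


# Hypothetical tree: sibling accounts A and B with equal shares.
root = Assoc("root", shares=1, children=[
    Assoc("A", shares=1, children=[
        Assoc("alice", shares=1, usage=10),
        Assoc("bob", shares=1, usage=70),
    ]),
    Assoc("B", shares=1, children=[
        Assoc("carol", shares=1, usage=100),
        Assoc("dave", shares=1, usage=60),
    ]),
])

factors = fair_tree_factors(root)
print(factors)
```

In this made-up example account A has used less than account B, so both of A's users outrank both of B's users: `alice` gets 1.0, `bob` 0.75, `dave` 0.5, and `carol` 0.25. Note that `bob` ranks above `dave` even though `bob` has used more individually, which is the sibling property described above.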