Commit b3b603a

Docs: Added slurm fairshare page
1 parent 99c1534 commit b3b603a

2 files changed

+54
-0
lines changed

docs/Slurm-Fairshare.md

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
# Slurm Fairshare User Manual

## Overview

Fairshare scheduling in Slurm (specifically, the Fair Tree algorithm) allocates resources
equitably among users and accounts based on their historical resource usage. This gives every
user a fair opportunity to access cluster resources and promotes balanced use of the cluster's
GPU compute capacity.

## Key Concepts

### Accounts

- **Accounts:** In Slurm, accounts group users for resource allocation. Each account can have
  its own priorities and resource limits. On Beatrix, all NRC users are prioritized equally
  for resource allocation.

- **Fairshare Weighting:** Each account is assigned a fairshare factor that influences job
  prioritization. The factor is determined from the account's historical usage relative to its
  allocated share of resources. For details, see
  [the Slurm Fair Tree documentation](https://slurm.schedmd.com/fair_tree.html).

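
As a rough illustration of how usage and shares combine, the Fair Tree documentation describes a per-level value, Level Fairshare, as normalized shares divided by normalized usage among sibling associations. The sketch below assumes that definition; the account names and numbers are hypothetical and are not taken from the Beatrix configuration.

```python
import math


def level_fairshare(raw_shares, raw_usage):
    """Return {name: LF} for one group of sibling associations.

    LF = S / U, where S is the association's shares normalized to its
    siblings and U is its usage normalized to its siblings' total usage.
    Assumes at least one sibling has a positive share value.
    """
    total_shares = sum(raw_shares.values())
    total_usage = sum(raw_usage.values())
    result = {}
    for name, shares in raw_shares.items():
        s_norm = shares / total_shares
        u_eff = raw_usage[name] / total_usage if total_usage > 0 else 0.0
        # An association with no recorded usage ranks ahead of all others.
        result[name] = math.inf if u_eff == 0 else s_norm / u_eff
    return result


# Hypothetical sibling accounts with equal shares but different usage.
shares = {"team_a": 1, "team_b": 1}
usage = {"team_a": 100.0, "team_b": 300.0}
lf = level_fairshare(shares, usage)
print(lf["team_a"])  # 2.0
print(lf["team_a"] > lf["team_b"])  # True
```

With equal shares, the account that has used less ends up with the larger value and is therefore favored.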
### Partitions

- **Partitions:** These are logical divisions of the cluster that allow different sets of
  resources to be allocated to different types of jobs. Partitions can differ in configuration,
  such as node types and resource limits.

- The following partitions are available on the Beatrix cluster:

| Partition   | Time limit (D-HH:MM:SS) | Nodes | Note                                                |
| ----------- | ----------------------- | ----- | --------------------------------------------------- |
| Trixiemain* | 12:00:00                | 34    | Default partition for all users                     |
| Trixielong  | 2-00:00:00              | 24    | Users must be authorized by the user representative |
| JobTesting  | 6:00:00                 | 2     | Job testing partition                               |
| Preemptible | 12:00:00                | 30    | Jobs submitted here may be preempted                |
| Larus       | 7-00:00:00              | 34    | Industry SME partition, not available to NRC users  |

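
To compare an expected runtime against the limits in the table, the time-limit strings can be converted to durations. The following is a minimal sketch that handles only the two forms used above (`HH:MM:SS` and `D-HH:MM:SS`); the helper names `parse_time_limit` and `partitions_allowing` are hypothetical, and the check covers time limits only, not whether a user is authorized on a partition.

```python
from datetime import timedelta


def parse_time_limit(text):
    """Parse 'D-HH:MM:SS' or 'HH:MM:SS' into a timedelta."""
    days = 0
    if "-" in text:
        day_part, text = text.split("-", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


# Time limits copied from the partition table.
TIME_LIMITS = {
    "Trixiemain": "12:00:00",
    "Trixielong": "2-00:00:00",
    "JobTesting": "6:00:00",
    "Preemptible": "12:00:00",
    "Larus": "7-00:00:00",
}


def partitions_allowing(runtime):
    """List partitions whose time limit is at least `runtime`."""
    return [
        name
        for name, limit in TIME_LIMITS.items()
        if parse_time_limit(limit) >= runtime
    ]


print(partitions_allowing(timedelta(hours=10)))
# ['Trixiemain', 'Trixielong', 'Preemptible', 'Larus']
```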
## Fair-Tree Fairshare Scheduling

### How It Works

- **Historical Usage:** Slurm tracks each account's historical resource usage and adjusts
  priorities accordingly. Accounts that have used fewer resources within the fairshare window
  receive higher priority than accounts that have used more. The calculation is applied at
  every level of the account hierarchy: if accounts A and B are siblings and A has a higher
  fairshare factor than B, all children of A have higher fairshare factors than all children
  of B.

- **Priority Calculation:** Job priority is calculated from a combination of the fairshare
  factor, job age, and other configurable factors.

- **Dynamic Adjustment:** Slurm adjusts priorities dynamically as usage changes, so resources
  are distributed fairly over time.
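
The sibling rule described above can be sketched in a few lines. This is a simplified illustration of the Fair Tree ranking as outlined in the Slurm documentation (sort siblings by Level Fairshare, descend depth-first, assign each user a factor of rank divided by user count), not Slurm's actual implementation. The `Assoc` class, the tree, and all usage numbers are hypothetical, and ties are not handled.

```python
from dataclasses import dataclass, field


@dataclass
class Assoc:
    """A node in the account tree; a node without children is a user."""
    name: str
    shares: float
    usage: float = 0.0
    children: list = field(default_factory=list)


def total_usage(node):
    """Roll user usage up into parent accounts and return the node's usage."""
    if node.children:
        node.usage = sum(total_usage(child) for child in node.children)
    return node.usage


def level_fairshare(node, siblings):
    """Normalized shares divided by normalized usage among siblings."""
    s_norm = node.shares / sum(s.shares for s in siblings)
    usage_sum = sum(s.usage for s in siblings)
    u_eff = node.usage / usage_sum if usage_sum > 0 else 0.0
    return float("inf") if u_eff == 0 else s_norm / u_eff


def fairshare_factors(root):
    """Return {user: factor}, where 1.0 is the most favored user."""
    total_usage(root)
    ordered = []

    def visit(account):
        ranked = sorted(
            account.children,
            key=lambda child: level_fairshare(child, account.children),
            reverse=True,
        )
        for child in ranked:
            if child.children:
                visit(child)  # all users under this account are ranked next
            else:
                ordered.append(child.name)

    visit(root)
    count = len(ordered)
    return {name: (count - i) / count for i, name in enumerate(ordered)}


# Hypothetical tree: two sibling accounts with equal shares.
root = Assoc("root", 1, children=[
    Assoc("A", 1, children=[Assoc("a1", 1, usage=10), Assoc("a2", 1, usage=40)]),
    Assoc("B", 1, children=[Assoc("b1", 1, usage=5), Assoc("b2", 1, usage=145)]),
])
factors = fairshare_factors(root)
print(factors)  # {'a1': 1.0, 'a2': 0.75, 'b1': 0.5, 'b2': 0.25}
```

In this example `b1` has the lowest individual usage, yet it ranks below both users of account A because account B as a whole has consumed more than its sibling.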

mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -19,6 +19,7 @@ nav:
  - Transfer Files: File-Transfers.md
  - TMP FS Backup: Temporary-Filesystem-Backups.md
  - Network and Connection: Networking-and-connectivity.md
+ - Slurm Fairshare: Slurm-Fairshare.md
  - FAQ & Example Workflows:
    - Quickstart: Running-jobs.md
    - Requeue Job: Automatically-Resuming-Requeueing.md
