Description of the bug
I am seeing repeated failures in:
NFCORE_MAG:MAG:BINNING:SEMIBIN_SINGLEEASYBIN
with exit status 137. The task stderr shows only "Killed", and dmesg confirms it is being OOM-killed inside the Docker memory cgroup.
Examples from dmesg:
Memory cgroup out of memory: Killed process 1739457 (SemiBin2) total-vm:472319276kB, anon-rss:36381296kB, file-rss:113872kB, shmem-rss:0kB, UID:1007 pgtables:145912kB oom_score_adj:0
Memory cgroup out of memory: Killed process 3556228 (SemiBin2) total-vm:477333324kB, anon-rss:104084912kB, file-rss:112924kB, shmem-rss:0kB, UID:1007 pgtables:304708kB oom_score_adj:0
Memory cgroup out of memory: Killed process 2722988 (SemiBin2) total-vm:473160272kB, anon-rss:364931412kB, file-rss:113376kB, shmem-rss:0kB, UID:1007 pgtables:812484kB oom_score_adj:0
I set:
    withName: 'NFCORE_MAG:MAG:BINNING:SEMIBIN_SINGLEEASYBIN' {
        cpus = 8
        memory = 360.GB
        time = 96.h
    }
but inspecting .command.run shows the container was actually launched with:
--memory 147456m
which is 144 GiB (147456 MiB / 1024), not the 360 GB I requested. So the process-specific memory request appears to be capped somewhere before the Docker launch.
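For reference, here is a quick unit check of the numbers above. This is only a minimal sketch: the helper names are my own, and it assumes that Docker's `m` suffix means MiB and that the `kB` values in dmesg are 1024-byte units.

```python
import re

KIB_PER_GIB = 1024 * 1024  # assumption: dmesg "kB" values are KiB


def docker_mem_to_gib(flag_value: str) -> float:
    """Convert a Docker --memory value such as '147456m' to GiB (assumes m = MiB)."""
    if not flag_value.endswith("m"):
        raise ValueError(f"expected a value ending in 'm', got {flag_value!r}")
    return int(flag_value[:-1]) / 1024


def anon_rss_gib(dmesg_line: str) -> float:
    """Extract the anon-rss field from an OOM-killer dmesg line and return it in GiB."""
    match = re.search(r"anon-rss:(\d+)kB", dmesg_line)
    if match is None:
        raise ValueError("no anon-rss field found")
    return int(match.group(1)) / KIB_PER_GIB


# Values copied from .command.run and the dmesg lines in this report.
container_limit_gib = docker_mem_to_gib("147456m")
dmesg_lines = [
    "Killed process 1739457 (SemiBin2) anon-rss:36381296kB",
    "Killed process 3556228 (SemiBin2) anon-rss:104084912kB",
    "Killed process 2722988 (SemiBin2) anon-rss:364931412kB",
]

print(f"container limit: {container_limit_gib:.1f} GiB")
for line in dmesg_lines:
    print(f"anon-rss at kill: {anon_rss_gib(line):.1f} GiB")
```

The container limit comes out to exactly 144 GiB, well below the 360 GB set in the `withName` block.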
Command used and terminal output
Relevant files
No response
System information
No response