Wrong memory ceiling in cgroup v2 environments #47259

@martindisch

Description

Since #27508 Node.js is able to automatically limit its heap size based on cgroup limits instead of the host's physical memory. This is very important for containerized workloads, where we have many pods with smaller memory limits running on a node with a large amount of memory. Node.js has to consider the cgroup limit and not the amount of physical memory, in order to avoid allocating more than allowed and getting OOM killed.

This seems to have broken with cgroup v2, because libuv's uv_get_constrained_memory does not support cgroup v2, at least in the currently released version 1.44.2. cgroup v2 support was implemented in libuv/libuv#3744 in September 2022, but there has not been a libuv release since.
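To illustrate the difference, here is a minimal Node.js sketch of a cgroup-version-aware limit lookup, reading the files mentioned in the repro below. The helper name readCgroupLimit is mine, not a Node.js or libuv API; on cgroup v2 the limit lives in /sys/fs/cgroup/memory.max, which libuv 1.44.2 does not read.

```js
// Sketch: read the container memory limit in a cgroup-version-aware way.
// readCgroupLimit is a hypothetical helper, not part of Node.js or libuv.
const fs = require("fs");

function readCgroupLimit() {
  // cgroup v2 (unified hierarchy): limit is in memory.max, "max" means unlimited
  try {
    const raw = fs.readFileSync("/sys/fs/cgroup/memory.max", "utf8").trim();
    return raw === "max" ? Infinity : Number(raw);
  } catch {}
  // cgroup v1 (legacy hierarchy): limit is in memory/memory.limit_in_bytes
  try {
    return Number(
      fs.readFileSync("/sys/fs/cgroup/memory/memory.limit_in_bytes", "utf8").trim()
    );
  } catch {}
  return Infinity; // no cgroup limit found, fall back to physical memory
}

console.log("cgroup memory limit (bytes):", readCgroupLimit());
```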

For us the issue manifested itself in many Node.js pods getting OOM killed after a Kubernetes cluster upgrade to 1.25, which introduced cgroup v2 support. Because the solution depends on the next libuv release, I don't expect a fix here; this report is intended mainly as a tracking issue for people affected by it.

At the moment I can see several options:

  • Wait and work around it by manually setting --max-old-space-size (see the sketch after this list)
  • Backport cgroup v2 support from libuv
  • Switch to a different mechanism for detecting available memory (unattractive, given that we already depend on libuv for this)
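For the first option, a sketch of how the flag could be derived from the cgroup v2 limit at container start. The wrapper itself, the 0.75 headroom factor, and the server.js entry point are my own illustrative choices, not something Node.js or Kubernetes provides:

```js
// Sketch: launch the app with --max-old-space-size derived from the cgroup v2 limit.
// The 0.75 headroom factor is an arbitrary example, not an official recommendation.
const { readFileSync } = require("fs");
const { spawn } = require("child_process");

const raw = readFileSync("/sys/fs/cgroup/memory.max", "utf8").trim();
const limitBytes = raw === "max" ? null : Number(raw);

const args = [];
if (limitBytes) {
  // Leave some of the pod limit for non-heap memory (buffers, stack, native code).
  const oldSpaceMiB = Math.floor((limitBytes / (1024 * 1024)) * 0.75);
  args.push(`--max-old-space-size=${oldSpaceMiB}`);
}
args.push("server.js"); // hypothetical entry point of the application

spawn(process.execPath, args, { stdio: "inherit" });
```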

Version

v13.0.0 and upward

Platform

5.15.0-1034-azure

Subsystem

src/api/environment.cc

What steps will reproduce the bug?

I reproduced this by comparing the behavior on two AKS clusters, one on 1.24.9 (cgroup v1) and the other on 1.25.5 (cgroup v2). Both had a single node with 4 GiB of memory. You should be able to observe the behavior in any environment using cgroup v2; you can verify that is the case by running stat -fc %T /sys/fs/cgroup/, which should output cgroup2fs.

My repro:

  1. Create a pod with a limit of 600Mi
  2. From within that pod run node -e "console.log(v8.getHeapStatistics())" and check the output (an expanded version of this check is sketched below)
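To make the comparison from step 2 explicit, the check can be extended into a small script that prints the heap ceiling next to the pod's cgroup limit. This assumes a cgroup v2 environment where a memory limit is actually set, so memory.max contains a number rather than "max":

```js
// Sketch: compare V8's heap ceiling with the pod's cgroup v2 memory limit.
const v8 = require("v8");
const fs = require("fs");

const heapLimit = v8.getHeapStatistics().heap_size_limit;
const cgroupLimit = Number(
  fs.readFileSync("/sys/fs/cgroup/memory.max", "utf8").trim()
);

console.log("heap_size_limit:  ", (heapLimit / 1024 / 1024).toFixed(0), "MiB");
console.log("cgroup memory.max:", (cgroupLimit / 1024 / 1024).toFixed(0), "MiB");
console.log(
  heapLimit < cgroupLimit
    ? "OK: heap ceiling is below the cgroup limit"
    : "BUG: heap ceiling exceeds the cgroup limit"
);
```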

How often does it reproduce? Is there a required condition?

Always when using cgroup v2

What is the expected behavior? Why is that the expected behavior?

heap_size_limit should be less than the cgroup maximum reported by cat /sys/fs/cgroup/memory.max (v2) or cat /sys/fs/cgroup/memory/memory.limit_in_bytes (v1). In my specific repro with the 600 MiB pod limit, heap_size_limit was 312 MiB in the cgroup v1 environment.

What do you see instead?

heap_size_limit is more than the cgroup maximum reported by cat /sys/fs/cgroup/memory.max, it seems influenced by the host's physical memory instead. In my specific repro with the 600 MiB pod limit, heap_size_limit was 1.96 GiB in the cgroup v2 environment, which will eventually lead to the pod being OOM killed.
