Wrong memory ceiling in cgroup v2 environments #47259

@martindisch

Description

Since #27508 Node.js is able to automatically limit its heap size based on cgroup limits instead of the host's physical memory. This is very important for containerized workloads, where we have many pods with smaller memory limits running on a node with a large amount of memory. Node.js has to consider the cgroup limit and not the amount of physical memory, in order to avoid allocating more than allowed and getting OOM killed.

This seems to have broken with cgroup v2, because libuv's uv_get_constrained_memory does not support cgroup v2, at least in the currently released version 1.44.2. cgroup v2 support was implemented in libuv/libuv#3744 in September 2022, but there has not been a libuv release since.
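To illustrate the difference, here is a minimal Node.js sketch of a cgroup-version-aware limit lookup, reading the files mentioned in the repro below. The helper name readCgroupLimit is mine, not a Node.js or libuv API; on cgroup v2 the limit lives in /sys/fs/cgroup/memory.max, which libuv 1.44.2 does not read.

```js
// Sketch: read the container memory limit in a cgroup-version-aware way.
// readCgroupLimit is a hypothetical helper, not part of Node.js or libuv.
const fs = require("fs");

function readCgroupLimit() {
  // cgroup v2 (unified hierarchy): limit is in memory.max, "max" means unlimited
  try {
    const raw = fs.readFileSync("/sys/fs/cgroup/memory.max", "utf8").trim();
    return raw === "max" ? Infinity : Number(raw);
  } catch {}
  // cgroup v1 (legacy hierarchy): limit is in memory/memory.limit_in_bytes
  try {
    return Number(
      fs.readFileSync("/sys/fs/cgroup/memory/memory.limit_in_bytes", "utf8").trim()
    );
  } catch {}
  return Infinity; // no cgroup limit found, fall back to physical memory
}

console.log("cgroup memory limit (bytes):", readCgroupLimit());
```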

For us the issue manifested itself in many Node.js pods getting OOM killed after a Kubernetes cluster upgrade to 1.25, which introduced cgroup v2 support. Because the solution depends on the next libuv release, I don't expect a fix here; this report is intended mainly as a tracking issue for people affected by it.

At the moment I can see several options:

  • Wait and work around it by manually setting --max-old-space-size (see the sketch after this list)
  • Backport cgroup v2 support from libuv
  • Switch to a different mechanism for detecting available memory (unattractive, given that we already depend on libuv for this)
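For the first option, a sketch of how the flag could be derived from the cgroup v2 limit at container start. The wrapper itself, the 0.75 headroom factor, and the server.js entry point are my own illustrative choices, not something Node.js or Kubernetes provides:

```js
// Sketch: launch the app with --max-old-space-size derived from the cgroup v2 limit.
// The 0.75 headroom factor is an arbitrary example, not an official recommendation.
const { readFileSync } = require("fs");
const { spawn } = require("child_process");

const raw = readFileSync("/sys/fs/cgroup/memory.max", "utf8").trim();
const limitBytes = raw === "max" ? null : Number(raw);

const args = [];
if (limitBytes) {
  // Leave some of the pod limit for non-heap memory (buffers, stack, native code).
  const oldSpaceMiB = Math.floor((limitBytes / (1024 * 1024)) * 0.75);
  args.push(`--max-old-space-size=${oldSpaceMiB}`);
}
args.push("server.js"); // hypothetical entry point of the application

spawn(process.execPath, args, { stdio: "inherit" });
```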

Version

v13.0.0 and upward

Platform

5.15.0-1034-azure

Subsystem

src/api/environment.cc

What steps will reproduce the bug?

I reproduced this by comparing the behavior on two AKS clusters, one on 1.24.9 (cgroup v1) and the other on 1.25.5 (cgroup v2). Both had a single node with 4 GiB of memory. You should be able to observe the behavior in any environment using cgroup v2; you can verify that is the case by running stat -fc %T /sys/fs/cgroup/, which should output cgroup2fs.

My repro:

  1. Create a pod with a limit of 600Mi
  2. From within that pod run node -e "console.log(v8.getHeapStatistics())" and check the output (an expanded version of this check is sketched below)
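To make the comparison from step 2 explicit, the check can be extended into a small script that prints the heap ceiling next to the pod's cgroup limit. This assumes a cgroup v2 environment where a memory limit is actually set, so memory.max contains a number rather than "max":

```js
// Sketch: compare V8's heap ceiling with the pod's cgroup v2 memory limit.
const v8 = require("v8");
const fs = require("fs");

const heapLimit = v8.getHeapStatistics().heap_size_limit;
const cgroupLimit = Number(
  fs.readFileSync("/sys/fs/cgroup/memory.max", "utf8").trim()
);

console.log("heap_size_limit:  ", (heapLimit / 1024 / 1024).toFixed(0), "MiB");
console.log("cgroup memory.max:", (cgroupLimit / 1024 / 1024).toFixed(0), "MiB");
console.log(
  heapLimit < cgroupLimit
    ? "OK: heap ceiling is below the cgroup limit"
    : "BUG: heap ceiling exceeds the cgroup limit"
);
```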

How often does it reproduce? Is there a required condition?

Always when using cgroup v2

What is the expected behavior? Why is that the expected behavior?

heap_size_limit should be less than the cgroup maximum reported by cat /sys/fs/cgroup/memory.max (v2) or cat /sys/fs/cgroup/memory/memory.limit_in_bytes (v1). In my specific repro with the 600 MiB pod limit, heap_size_limit was 312 MiB in the cgroup v1 environment.

What do you see instead?

heap_size_limit is more than the cgroup maximum reported by cat /sys/fs/cgroup/memory.max, it seems influenced by the host's physical memory instead. In my specific repro with the 600 MiB pod limit, heap_size_limit was 1.96 GiB in the cgroup v2 environment, which will eventually lead to the pod being OOM killed.
