Version Information
Server Version:
- Hasura Enterprise v2.48.12
CLI Version (for CLI related issue):
Environment
- Hasura Enterprise Edition
- Kubernetes deployment
- Pod resources:
  - requests:
    - memory: 4608Mi
    - cpu: 1000m
  - limits:
    - memory: 8048Mi
    - cpu: 8000m
Database source configuration:

- name: default
  kind: postgres
  configuration:
    connection_info:
      database_url:
        from_env: HASURA_GRAPHQL_DATABASE_URL
      isolation_level: read-committed
      pool_settings:
        connection_lifetime: 120
        idle_timeout: 180
        max_connections: 5
        retries: 1
      use_prepared_statements: false
    read_replicas:
      - database_url:
          from_env: HASURA_GRAPHQL_READ_REPLICA_URL
        isolation_level: read-committed
        pool_settings:
          connection_lifetime: 120
          max_connections: 20
        use_prepared_statements: false
  tables: "!include default/tables/tables.yaml"
  functions: "!include default/functions/functions.yaml"
What is the current behaviour?
We are seeing sustained memory growth over time in Hasura Enterprise v2.48.12.
Observed behaviour:
- pod memory rises steadily over time
- latency worsens as memory rises
- restarting the pod (OOM) drops memory back down
- the pattern then repeats
We initially wanted to determine whether this was just container working-set / page-cache growth, but inspection inside the pod suggests the memory is genuinely in the graphql-engine process itself.
Evidence collected from inside the pod:
cgroup memory:
$ cat /sys/fs/cgroup/memory.current
5790588928
$ cat /sys/fs/cgroup/memory.peak
7086899200
cgroup breakdown:
$ cat /sys/fs/cgroup/memory.stat
anon 5637496832
file 77258752
kernel 29048832
shmem 0
inactive_anon 5637431296
active_anon 28672
inactive_file 43024384
active_file 34234368
...
process memory:
$ cat /proc/101/smaps_rollup
Rss: 5520072 kB
Pss: 5518638 kB
Pss_Dirty: 5507744 kB
Pss_Anon: 5507744 kB
Pss_File: 10894 kB
Private_Dirty: 5507744 kB
Anonymous: 5507744 kB
Swap: 0 kB
Interpretation of the above:
- container memory is about 5.79 GB at sample time
- Hasura process RSS is about 5.52 GB
- almost all of that is anonymous private dirty memory
- file cache is small
- shmem is zero
- this does not look like page cache, tmpfs, kernel memory, or another helper process
So this appears to be real in-process memory growth in graphql-engine, not just a misleading container metric.
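The arithmetic behind that conclusion can be checked directly from the captured numbers. A small shell sketch, with the values hard-coded from the samples above (note the unit mismatch: cgroup files report bytes, smaps_rollup reports kB):

```shell
# Compare process-anonymous memory against total cgroup memory,
# using the sampled values from this report.
cgroup_current=5790588928   # bytes, from /sys/fs/cgroup/memory.current
smaps_anon_kb=5507744       # kB, "Anonymous:" from /proc/101/smaps_rollup

awk -v cur="$cgroup_current" -v anon_kb="$smaps_anon_kb" 'BEGIN {
  anon = anon_kb * 1024     # convert kB to bytes
  printf "process anonymous = %.2f GiB (%.1f%% of cgroup current)\n",
         anon / (1024 ^ 3), 100 * anon / cur
}'
# → process anonymous = 5.25 GiB (97.4% of cgroup current)
```

So roughly 97% of the container's memory at sample time is anonymous memory private to the graphql-engine process, which is why page cache and helper processes can be ruled out.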
What is the expected behaviour?
We would expect Hasura memory usage to remain broadly stable under steady workload, or at least not show repeated sustained growth that:
- increases latency over time
- requires pod restarts to recover memory
- appears as mostly anonymous private dirty process memory
How to reproduce the issue?
- Run Hasura Enterprise v2.48.12 in Kubernetes with the resource settings and database pool configuration above.
- Allow normal production-like workload to run over time.
- Observe that pod/process memory rises steadily, latency worsens as memory rises, and restarting the pod resets memory usage.
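For anyone reproducing this, a minimal sampling loop makes the trend easy to capture and graph. This is a sketch only: it assumes cgroup v2 and that graphql-engine is the target PID inside the container (PID 101 in our pod); adjust the PID and output path as needed.

```shell
# Append one CSV row per minute: epoch seconds, cgroup memory (bytes),
# and process RSS (kB). Assumes cgroup v2; PID is the graphql-engine process.
PID=101
OUT=/tmp/mem_samples.csv
echo "timestamp,cgroup_current_bytes,rss_kb" >> "$OUT"
while true; do
  cur=$(cat /sys/fs/cgroup/memory.current)
  rss=$(awk '/^Rss:/ {print $2}' /proc/"$PID"/smaps_rollup)
  echo "$(date +%s),$cur,$rss" >> "$OUT"
  sleep 60
done
```

Plotting rss_kb against timestamp over several hours should show the steady climb described above, with a reset at each pod restart.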
Screenshots or Screencast
Please provide any traces or logs that could help here.
Memory diagnostics gathered from inside the pod:
$ cat /sys/fs/cgroup/memory.current
5790588928
$ cat /sys/fs/cgroup/memory.peak
7086899200
$ cat /sys/fs/cgroup/memory.stat
anon 5637496832
file 77258752
kernel 29048832
kernel_stack 1130496
pagetables 17580032
sec_pagetables 0
percpu 2448
sock 0
vmalloc 65536
shmem 0
file_mapped 14290944
file_dirty 0
file_writeback 0
swapcached 0
anon_thp 0
file_thp 0
shmem_thp 0
inactive_anon 5637431296
active_anon 28672
inactive_file 43024384
active_file 34234368
unevictable 0
slab_reclaimable 4948768
slab_unreclaimable 5222592
slab 10171360
workingset_refault_anon 0
workingset_refault_file 10913
workingset_activate_anon 0
workingset_activate_file 0
workingset_restore_anon 0
workingset_restore_file 0
workingset_nodereclaim 0
pgscan 0
pgsteal 0
pgscan_kswapd 0
pgscan_direct 0
pgscan_khugepaged 0
pgsteal_kswapd 0
pgsteal_direct 0
pgsteal_khugepaged 0
pgfault 59022249
pgmajfault 191
pgrefill 0
pgactivate 9843
pgdeactivate 0
pglazyfree 0
pglazyfreed 0
thp_fault_alloc 26
thp_collapse_alloc 0
$ cat /proc/101/smaps_rollup
00200000-7ffdcf679000 ---p 00000000 00:00 0 [rollup]
Rss: 5520072 kB
Pss: 5518638 kB
Pss_Dirty: 5507744 kB
Pss_Anon: 5507744 kB
Pss_File: 10894 kB
Pss_Shmem: 0 kB
Shared_Clean: 2064 kB
Shared_Dirty: 0 kB
Private_Clean: 10264 kB
Private_Dirty: 5507744 kB
Referenced: 5520072 kB
Anonymous: 5507744 kB
KSM: 0 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
FilePmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 0 kB
Any possible solutions/workarounds you're aware of?
Current workaround:
- restarting the pod drops memory back down temporarily
Keywords
hasura enterprise 2.48.12 memory growth
hasura graphql-engine memory leak
hasura high anonymous memory
hasura private dirty memory
hasura kubernetes memory rises over time
hasura latency increases with memory
hasura smaps_rollup anonymous memory
hasura cgroup memory anon