OOM - Segmentation fault (not ulimit, not cgroups, not max-space, not exhausted RAM) #54692

Closed
riverego opened this issue Sep 1, 2024 · 13 comments
Labels
memory: Issues and PRs related to the memory management or memory footprint.
wrong repo: Issues that should be opened in another repository.

Comments

@riverego

riverego commented Sep 1, 2024

Version

v16.20.2, v20.17.0, v22.7.0

Platform

Linux ip-10-8-1-229 6.1.0-23-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.99-1 (2024-07-15) x86_64 GNU/Linux

The same happens on Ubuntu and Debian 11.

Subsystem

No response

What steps will reproduce the bug?

const bufs = []
let i = 0
while (true) {
  ++i
  bufs.push(Array.from({ length: 10*1024 * 1024 }, () => Math.random().toString()))
  // console.log(i)
}

The code just has to reach the OOM point.

node --max-old-space-size=32000 --trace-gc index.js
[12808:0x6f27120]   146468 ms: Scavenge 19279.2 (19571.3) -> 19263.9 (19571.3) MB, 50.10 / 0.00 ms  (average mu = 0.831, current mu = 0.831) allocation failure;
[12808:0x6f27120]   146787 ms: Scavenge 19317.6 (19610.3) -> 19302.1 (19610.5) MB, 35.85 / 0.00 ms  (average mu = 0.831, current mu = 0.831) allocation failure;
Segmentation fault
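
A variant of the same loop with heap statistics printed each iteration (just a diagnostic sketch; v8.getHeapStatistics() is part of Node's built-in v8 module) makes it easier to see how far the heap actually gets before the crash:

const v8 = require('node:v8')

const bufs = []
let i = 0
while (true) {
  ++i
  bufs.push(Array.from({ length: 10 * 1024 * 1024 }, () => Math.random().toString()))
  // Print used heap vs. the configured limit so the crash point is visible.
  const { used_heap_size, heap_size_limit } = v8.getHeapStatistics()
  console.error(`iteration ${i}: ${(used_heap_size / 2 ** 30).toFixed(1)} GiB used / ${(heap_size_limit / 2 ** 30).toFixed(1)} GiB limit`)
}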

How often does it reproduce? Is there a required condition?

On Outscale VMs

What is the expected behavior? Why is that the expected behavior?

An OOM crash when the heap reaches --max-old-space-size.

What do you see instead?

An OOM (segmentation fault) when the heap reaches only ~20 GB.

Additional information

The code works as expected on my own computer and crashes when --max-old-space-size is reached...
But on Outscale cloud VMs it always goes OOM at around 20 GB.

$ cat /proc/<pid>/limits
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             257180               257180               processes
Max open files            1048576              1048576              files
Max locked memory         unlimited            unlimited            bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       257180               257180               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

I checked ulimits and cgroups (even when cgroups kills a process with the OOM reaper, it doesn't throw a segfault) and found nothing...

I tried setting a fixed value of 50 GB with ulimit, in case "unlimited" hides a low default value; it's the same.
I tried /proc/sys/vm/overcommit_memory with the values 0, 1 and 2; it's the same.
I tried recompiling Node.js on the VM... same...
I have exhausted ChatGPT's ideas...
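
To rule out a limit that applies only to the Node.js process itself, a quick sketch like this (standard Linux procfs paths; the cgroup lookup assumes cgroup v2) can dump the relevant limits from inside the running process:

// limits.js - print the limits that apply to *this* process
const { readFileSync } = require('node:fs')

const read = (p) => { try { return readFileSync(p, 'utf8').trim() } catch { return 'n/a' } }

console.log(read('/proc/self/limits'))
console.log('vm.overcommit_memory =', read('/proc/sys/vm/overcommit_memory'))
// Resolve this process's cgroup and print its memory limit ("max" means unlimited).
const cgroup = read('/proc/self/cgroup').split('\n').pop().split('::')[1] || '/'
console.log('cgroup memory.max =', read(`/sys/fs/cgroup${cgroup}/memory.max`))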

I thought maybe this is a host limit applied by the cloud provider to the processes of my VM, so I tried this:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void){
        /* Probe available memory: step size is 1 GiB, starting at 25 GiB. */
        size_t oneGiB = 1024 * 1048576UL;
        size_t maxMem = 25 * oneGiB;
        void *memPointer = NULL;
        do{
                if(memPointer != NULL){
                        printf("Max Tested Memory = %zu\n", maxMem);
                        /* Touch every byte so the pages are actually committed. */
                        memset(memPointer, 0, maxMem);
                        free(memPointer);
                }
                maxMem += oneGiB;
                memPointer = malloc(maxMem);
        }while(memPointer != NULL);
        maxMem -= oneGiB;
        printf("Max Usable Memory approx = %zu\n", maxMem);

        memPointer = malloc(maxMem);
        if(memPointer != NULL){
                memset(memPointer, 1, maxMem);
                sleep(30);
        }

        return 0;
}

But this program can reach the VM's RAM limit (64 GB or 128 GB) without any problem.
Same for the stress command...

So I'm running out of ideas...
I hope someone here has a clue about what is happening....

Thank you.

@avivkeller
Member

avivkeller commented Sep 1, 2024

I can't reproduce the segfault in v22.7.0:

// repro.js
const bufs = []
let i = 0
while (true) {
  ++i
  bufs.push(Array.from({ length: 10*1024 * 1024 }, () => Math.random().toString()))
  // console.log(i)
}
$ node --max-old-space-size=32000 --trace-gc repro.js 
[144303:0x6bb7000]       88 ms: Scavenge 85.7 (87.0) -> 85.0 (88.0) MB, pooled: 0 MB, 13.39 / 0.00 ms  (average mu = 1.000, current mu = 1.000) allocation failure; 
[144303:0x6bb7000]      113 ms: Scavenge 87.4 (89.7) -> 86.7 (92.2) MB, pooled: 0 MB, 2.53 / 0.00 ms  (average mu = 1.000, current mu = 1.000) allocation failure; 
[144303:0x6bb7000]      170 ms: Scavenge 92.8 (96.0) -> 91.1 (96.0) MB, pooled: 0 MB, 1.89 / 0.00 ms  (average mu = 1.000, current mu = 1.000) allocation failure; 
[144303:0x6bb7000]      223 ms: Scavenge 97.1 (100.5) -> 95.3 (100.5) MB, pooled: 0 MB, 1.39 / 0.00 ms  (average mu = 1.000, current mu = 1.000) allocation failure; 
[144303:0x6bb7000]      282 ms: Scavenge 101.4 (104.7) -> 99.6 (104.7) MB, pooled: 0 MB, 1.74 / 0.00 ms  (average mu = 1.000, current mu = 1.000) allocation failure; 
[144303:0x6bb7000]      334 ms: Scavenge (interleaved) 105.6 (109.2) -> 103.9 (109.2) MB, pooled: 0 MB, 1.72 / 0.00 ms  (average mu = 1.000, current mu = 1.000) allocation failure; 
[144303:0x6bb7000]      397 ms: Mark-Compact 104.0 (109.2) -> 103.9 (109.0) MB, pooled: 0 MB, 61.02 / 0.00 ms  (+ 0.2 ms in 0 steps since start of marking, biggest step 0.0 ms, walltime since start of marking 71 ms) (average mu = 0.846, current mu = 0.846) finalize incremental marking via stack guard; GC in old space requested
[... similar messages ...]

Additionally, please specify a valid Node.js version to make this easier to reproduce.

@avivkeller avivkeller added the memory Issues and PRs related to the memory management or memory footprint. label Sep 1, 2024
@riverego
Author

riverego commented Sep 1, 2024

Thank you.
Yes, I know; on my computer I don't have the issue...
It's only on Outscale VMs

@avivkeller
Member

It's only on Outscale VMs

Maybe this isn't an issue with Node.js, but rather how the VM's memory is managed? AFAICT the program will only segfault when --max-old-space-size is reached.

@riverego
Author

riverego commented Sep 1, 2024

If you want, I can provide a VM for you to look at.

I know it's due to this VM context, but since I can't reproduce the segfault with a C program, I don't understand how the system context is killing a JS app.

@avivkeller avivkeller added the wrong repo Issues that should be opened in another repository. label Sep 1, 2024
@avivkeller
Member

I know it's due to this VM context

I'm not sure there is much that can be done in this regard. Could a collaborator transfer this to nodejs/help?

@riverego
Author

riverego commented Sep 1, 2024

Thank you for your answers, I'll do that.

@benz0li

benz0li commented Mar 7, 2025

I can reproduce this with Docker CE on Linux/x86_64 [1] using the image glcr.b-data.ch/jupyterlab/python/base:3.12.8-devtools:

docker run --runtime runc --rm -ti glcr.b-data.ch/jupyterlab/python/base:3.12.8-devtools bash
nano index.js
node --max-old-space-size=32000 --trace-gc index.js
[...]
[37:0x43468000]   154290 ms: Scavenge 19276.9 (19569.0) -> 19261.4 (19569.0) MB, 36.75 / 0.00 ms  (average mu = 0.833, current mu = 0.834) allocation failure; 
[37:0x43468000]   154642 ms: Scavenge 19315.4 (19608.3) -> 19300.0 (19608.5) MB, 22.29 / 0.00 ms  (average mu = 0.833, current mu = 0.834) allocation failure; 
Segmentation fault (core dumped)

I understand why this happens with Deno [2] in limited environments, but I have no idea why it happens with Node.js in a fairly unlimited one.

I wonder what (hidden? Docker?) limit causes this.


node --version
v20.18.1

uname -a
Linux 1b3dd9ed848f 6.1.0-31-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.128-1 (2025-02-07) x86_64 GNU/Linux

prlimit
RESOURCE   DESCRIPTION                             SOFT      HARD UNITS
AS         address space limit                unlimited unlimited bytes
CORE       max core file size                 unlimited unlimited bytes
CPU        CPU time                           unlimited unlimited seconds
DATA       max data size                      unlimited unlimited bytes
FSIZE      max file size                      unlimited unlimited bytes
LOCKS      max number of file locks held      unlimited unlimited locks
MEMLOCK    max locked-in-memory address space   8388608   8388608 bytes
MSGQUEUE   max bytes in POSIX mqueues            819200    819200 bytes
NICE       max nice prio allowed to raise             0         0 
NOFILE     max number of open files             1048576   1048576 files
NPROC      max number of processes            unlimited unlimited processes
RSS        max resident set size              unlimited unlimited bytes
RTPRIO     max real-time priority                     0         0 
RTTIME     timeout for real-time tasks        unlimited unlimited microsecs
SIGPENDING max number of pending signals        1029509   1029509 signals
STACK      max stack size                       8388608 unlimited bytes

Footnotes

  1. Can also be reproduced with Docker CE on Linux/AArch64. Cannot be reproduced with Docker Desktop for Mac on Apple Silicon (macOS/arm64, aka AArch64).

  2. Both Deno and Node.js use the V8 JavaScript engine

@benz0li

benz0li commented Mar 16, 2025

I am most certainly running into an OOM at ~20 GB when building code-server for Linux/RISC-V (64-bit) using the unofficial Linux/RISC-V Node.js binaries and Docker emulation with QEMU (tonistiigi/binfmt:qemu-v8.1.5 [1]): https://gitlab.b-data.ch/coder/code-server/-/jobs/157888

Cross reference:

Footnotes

  1. A build using tonistiigi/binfmt:qemu-v9.2.2 is currently ongoing, staying just under 20 GB of memory usage.

@benz0li

benz0li commented May 9, 2025

As I cannot reproduce this with Docker Desktop for Mac on Apple Silicon, there must be some memory limitation in Docker CE or Debian.

I will open a discussion at https://github.com/moby/moby/discussions and point to my reproduction using Docker CE on Linux/x86_64.

@polarathene
Copy link

EDIT: The advice below is probably not the cause for you, but it might help you identify where the difference is coming from.


Max open files            1048576              1048576              files

You are possibly affected by this. The Docker Desktop for Mac will not show such a high number for ulimit -Sn if I am right?

It might not be this limit specifically, but it has been known to cause various services running in containers to regress in performance or allocate large amounts of memory due to excessive file descriptors (because of LimitNOFILE=infinity). Normally this is a problem on other Docker hosts where the limit is over a billion; when it's over a million, as on Debian, it's still a regression, but it shouldn't be significant.

You can force the container itself to run with lower limits to see if that resolves the issue?

For compose.yaml, use the ulimits setting:

# Add this to your service's settings; it resets the soft limit to 1024
ulimits:
  nofile:
    soft: 1024

For docker run, use --ulimit option:

# Soft limit:
$ docker run --ulimit nofile=1024:524288 --rm -it alpine ash -c 'ulimit -Sn'
1024

# Hard limit:
$ docker run --ulimit nofile=1024:524288 --rm -it alpine ash -c 'ulimit -Hn'
524288

For context, the soft limit is how many file descriptors a process may have. Each process has its own individual count; it is not a cumulative limit across processes.
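
If you want to see how many descriptors a given process actually holds at runtime (plain procfs, nothing Docker-specific; <pid> is whatever process you're inspecting):

$ ls /proc/<pid>/fd | wc -l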

That limit and others can be configured in the main Docker daemon config + systemd drop-in overrides for docker.service + containerd.service as detailed here.
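
For example, a systemd drop-in along these lines (purely illustrative, reusing the 1024:524288 values suggested above) pins the limits the daemon runs with:

# /etc/systemd/system/docker.service.d/override.conf
[Service]
LimitNOFILE=1024:524288

followed by systemctl daemon-reload and a restart of the docker service. The same kind of drop-in applies to containerd.service.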


If it's not that, then look at what systemd config is for both Docker Engine and containerd:

For Docker Engine v25, LimitNOFILE=infinity was removed (as can be seen from the link to docker.service above). For containerd with containerd.service, LimitNOFILE=infinity was also removed, but the change did not land until the 2.0 release. Docker Engine (presently v28) still uses containerd 1.x, so if your FD limits are still high with Docker Engine 25+, it's probably due to that.

In both cases, since neither project wanted to set a default of LimitNOFILE=1024:524288 as I suggested, a host with a systemd release prior to v240 (which added the new hard limit) will instead have the kernel defaults of 1024:4096, which can be too low for more demanding software.
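
To see what your host is actually applying (standard systemctl queries; the unit names assume a stock Docker install):

$ systemctl show -p DefaultLimitNOFILE
$ systemctl show docker.service containerd.service -p LimitNOFILE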

For changes to docker.service, if you can identify problems with the current settings, you can refer to these issues to share your findings and how to resolve them:

@benz0li

benz0li commented May 16, 2025

You are possibly affected by this. The Docker Desktop for Mac will not show such a high number for ulimit -Sn if I am right?

@polarathene It [Docker Desktop for Mac on Apple Silicon] shows the same numbers for NOFILE:

docker run --rm -ti debian prlimit
RESOURCE   DESCRIPTION                             SOFT      HARD UNITS
AS         address space limit                unlimited unlimited bytes
CORE       max core file size                         0 unlimited bytes
CPU        CPU time                           unlimited unlimited seconds
DATA       max data size                      unlimited unlimited bytes
FSIZE      max file size                      unlimited unlimited bytes
LOCKS      max number of file locks held      unlimited unlimited locks
MEMLOCK    max locked-in-memory address space unlimited unlimited bytes
MSGQUEUE   max bytes in POSIX mqueues            819200    819200 bytes
NICE       max nice prio allowed to raise             0         0 
NOFILE     max number of open files             1048576   1048576 files
NPROC      max number of processes            unlimited unlimited processes
RSS        max resident set size              unlimited unlimited bytes
RTPRIO     max real-time priority                     0         0 
RTTIME     timeout for real-time tasks        unlimited unlimited microsecs
SIGPENDING max number of pending signals         192348    192348 signals
STACK      max stack size                       8388608 unlimited bytes

You can force the container itself to run with lower limits to see if that resolves the issue?

I will give it a try.

@benz0li

benz0li commented May 17, 2025

You can force the container itself to run with lower limits to see if that resolves the issue?

I will give it a try.

@polarathene When I use --ulimit nofile=65536:65536, the code-server build peaks at 16.5 GB and succeeds.

However, this does not explain why a segmentation fault occurs when the heap reaches ~20 GB.

Cross references:

@polarathene

Follow-up response from: moby/moby#49945 (reply in thread)


Cross references:

You might want to instead experiment with this comment from your first referenced issue.

It's been a long time since I had that issue, but years ago the Linux defaults for that tunable were often so low that it was very easy for a developer to trigger the errors cited there.

I'm specifically referring to sysctl fs.inotify.max_user_watches, as I believe that was the culprit. Adjusting your FD limits shouldn't be necessary, other than reducing the soft limit to what you'd have on a regular host outside of a container (1024).

That tunable belongs to the kernel, so it should be whatever your host system has set, unless your container runtime has modified it (which does happen; a common modification is setting sysctl net.ipv4.ip_unprivileged_port_start=0 instead of requiring the CAP_NET_BIND_SERVICE capability for a non-root user to bind ports below 1024).
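
A quick way to compare (ordinary commands, nothing project-specific) is to read the tunable on the host and from inside a container:

$ sysctl fs.inotify.max_user_watches
$ docker run --rm debian cat /proc/sys/fs/inotify/max_user_watches

If it turns out to be low, it can be raised with, for example, sysctl -w fs.inotify.max_user_watches=524288 (or persisted via /etc/sysctl.d/).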

The other potential cause comes from Debian. IIRC the systemd v240 change that introduced the 1024:524288 FD limits also adjusted a related kernel setting for the maximum number of files you can have open (it's covered in one of my prior links, likely the moby PR where I detail the historical context).

Debian, unlike other distros, chose to patch out that systemd change to keep the old behaviour, which I believe was due to their own PAM patch. They may still be carrying the PAM- and systemd-related patches, so it's very possible that contributes to your experience if you're unable to reproduce the issue in other distros like Fedora / Arch Linux / openSUSE.

However, this does not explain why a segmentation fault occurs when the heap reaches ~20 GB.

If lowering the FD limit prevented that from occurring, it's likely that the higher limit either hit another bottleneck that exhausted a resource, as described above, or, as per my earlier comment, introduced a regression in how much memory had to be allocated.

You'd have to investigate further if you want to track it down; the easiest approach is to switch out components, such as the Docker host distro, given that you're using Debian.


As for the 2nd reference, there is very little context on their choice of limit there. It is very likely they arrived at it the way Docker did ("works for me" problem solving) or containerd did (copying what Docker did). If you go through my history tracking from the Docker PR, you'll see there is very little information on what a correct value should be, and IIRC no real discussion about soft vs hard limits; the focus was on resolving an issue and moving forward quickly due to limited bandwidth/budget, as is common with projects 😅 (and it was not as problematic until the systemd v240 change).
