CPU load spikes

Salamander · edited 2 years ago

sunaurus@lemm.ee · 2 years ago

ERROR: could not resize shared memory segment "/PostgreSQL.3267719818" to 8388608 bytes: No space left on device

Is your disk full?

Salamander · 2 years ago

No, the disk is about 30% full. I think this message is about the RAM being full. My guess is that holding the table in memory is taking up all of the RAM, but I don’t see why this started happening so suddenly in the last day.

One explanation may be that whatever causes the CPU spikes is only triggered when the RAM is depleted during the operation. Once the database grew to the point that holding this table in RAM takes over 2 GB, filling up the RAM somehow causes the CPU spikes. In that case, I can resolve the problem by increasing the available RAM… However, I would like to make sure that this is really the problem before increasing the RAM.
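A rough way to check the "RAM is full" hypothesis before resizing the server is to read the kernel's own counters. The sketch below parses `/proc/meminfo` and reports used memory, shared memory, and swap. Because the error names a shared memory segment, it also prints the usage of `/dev/shm`. Whether that mount is the limiting factor on this server is an assumption, not something confirmed in this thread. The function names are made up for illustration.

```python
import os
import shutil


def parse_meminfo(text):
    """Parse /proc/meminfo text into a dict of {field name: value in KiB}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def summarize(info):
    """Reduce the parsed counters to a few numbers, in MiB."""
    total = info.get("MemTotal", 0)
    available = info.get("MemAvailable", 0)
    return {
        "used_mib": (total - available) // 1024,
        "total_mib": total // 1024,
        "shmem_mib": info.get("Shmem", 0) // 1024,
        "swap_total_mib": info.get("SwapTotal", 0) // 1024,
    }


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        print(summarize(parse_meminfo(f.read())))
    # Assumption: shared memory segments are backed by /dev/shm on this host
    # (or inside the container, if this is run there).
    if os.path.isdir("/dev/shm"):
        usage = shutil.disk_usage("/dev/shm")
        print(f"/dev/shm: {usage.used // 2**20} MiB used of {usage.total // 2**20} MiB")
```

Running this every few seconds while a spike is happening (both on the host and inside the database container) should show whether memory or `/dev/shm` is the one that fills up.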

Salamander · 2 years ago

I suppose an easy way to test is to increase the RAM to 3 GB and see if the spikes stop… I’m going to do that now.

Salamander · edited 2 years ago

The spikes disappeared after I increased the RAM from 2 GB to 3 GB, and they have not re-appeared over the past few hours.

It appears that some process was hitting the 2 GB RAM limit, even though under normal use only about 800 MB of RAM is allocated. At first I thought that the high number of read IOPS might be due to swap kicking in, but the server has no allocated swap.

The PostgreSQL container appears to fail when all of the RAM is used up, and it may be that the high CPU usage is somehow related to repopulating the database when it is restarted… But I would think that if this were the case I would see similar spikes whenever I reboot, and I don’t.

Conclusion: I am not sure why this happens.

But if anyone else notices these spikes it may be a RAM issue. Try increasing the RAM and see if they go away.

TauZero · 2 years ago

I’m on Linux with no swap, and I can experience CPU spikes when running out of RAM. The 100% CPU usage is illusory: the CPU isn’t actually doing any calculations. When I tried using a profiler at such a time, 100% of the CPU usage was something like “waiting on input/output”, which htop counts as usage.
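To see how much of a spike is really I/O wait, one option is to sample the aggregate `cpu` line of `/proc/stat` twice and compare the counters. This is a minimal sketch, assuming a Linux kernel that reports the `iowait` field (the fifth counter on that line). The function names and the 5-second interval are arbitrary choices for illustration.

```python
import time


def parse_cpu_line(stat_text):
    """Return the first eight counters of the aggregate 'cpu' line:
    user, nice, system, idle, iowait, irq, softirq, steal."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(x) for x in fields[1:9]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_fraction(before, after):
    """Fraction of CPU time spent in iowait between two /proc/stat snapshots."""
    deltas = [b - a for a, b in zip(parse_cpu_line(before), parse_cpu_line(after))]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return deltas[4] / total  # index 4 is iowait


if __name__ == "__main__":
    with open("/proc/stat") as f:
        first = f.read()
    time.sleep(5)  # sampling interval, chosen arbitrarily
    with open("/proc/stat") as f:
        second = f.read()
    print(f"iowait share over the interval: {iowait_fraction(first, second):.1%}")
```

A high iowait share during a spike would be consistent with the thrashing explanation, while a high user or system share would point to actual computation.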

Why is it doing input/output? Linux has a “feature” where under memory starvation it evicts pages of executable code (like shared libraries) from memory, because it knows it can load them from disk when needed. But what ends up happening instead is that the kernel will run one line of code from one thread, evict everything, load the code and shared libraries for the other thread from disk (takes a loooong time!), run one line of code, evict everything, switch/repeat… This leads to disk thrashing (back when we still had spinning disks) and makes the system unusable.

Is there any way, like via config or command line options, to set a hard limit on PostgreSQL memory usage, such that it is guaranteed not to consume more than 1.5 GB, say? Barring that (or adding more RAM indefinitely), look into the “OOM-killer” Linux feature. There is some way to configure the “ferocity” level of the watchdog inside the kernel so that it kills the process with the largest memory consumption sooner, instead of trying to thrash around by evicting even more shared memory. That will kill the Postgres process and force it to restart, but you say it works fine normally at around 0.8 GB? Then the spike of runaway memory consumption is either a bug/memory leak, or a rare special event like rearranging/compressing the database somehow.
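For the OOM-killer part, Linux exposes a per-process badness score in `/proc/<pid>/oom_score` and a tunable offset in `/proc/<pid>/oom_score_adj` (from -1000 to 1000, where higher means killed sooner). Below is a minimal sketch of reading and adjusting them. The helper names and the `proc_root` parameter are invented for illustration, and lowering the value generally requires elevated privileges. Whether making Postgres a preferred OOM victim is a good idea for this server is a judgment call, not a recommendation.

```python
from pathlib import Path


def read_oom_info(pid, proc_root="/proc"):
    """Read the kernel's OOM score and its adjustment for one process."""
    base = Path(proc_root) / str(pid)
    return {
        "oom_score": int((base / "oom_score").read_text().strip()),
        "oom_score_adj": int((base / "oom_score_adj").read_text().strip()),
    }


def set_oom_score_adj(pid, value, proc_root="/proc"):
    """Write a new oom_score_adj; higher values make the process a likelier victim."""
    if not -1000 <= value <= 1000:
        raise ValueError("oom_score_adj must be between -1000 and 1000")
    (Path(proc_root) / str(pid) / "oom_score_adj").write_text(f"{value}\n")


if __name__ == "__main__":
    # "self" resolves to the current process, so this only inspects the script itself.
    print(read_oom_info("self"))
```

To apply it to the database, pass the PID of the Postgres process as seen from wherever the script runs (host or container).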