My system seems to crash from time to time. I still don’t know what causes it.

Sometimes, if I leave it untouched for a few hours, it crashes.

To recover, I have to force a reboot by unplugging the power cable (even holding the power button for N seconds doesn’t seem to work).

After that, it seems to work just fine (apart from some error messages about lost or orphaned inodes at boot) until, one day, it happens again. I never know when. It seems to follow some strange and unpredictable pattern.

Where should I start investigating?

  • Shdwdrgn
    7 months ago

    Certainly seems like a hardware issue, but there’s no easy answer to this one. It could be your power supply, motherboard, CPU, memory, even the video card. The power-button issue makes me lean towards power supply or motherboard though (assuming you’ve verified the power button works after a fresh boot).

    If you have other parts on hand (even from another running system) you could swap components until you identify the culprit. If you find it’s your power supply, make sure you replace it with a decent quality one, NOT one of those $25 units you find everywhere, or you’ll have even more problems followed by a rapid failure in another year.

      • Shdwdrgn
        7 months ago

        Then it could still be the power supply or motherboard. It takes a lot to override the hardware power switch, and the power supply itself is usually one of the biggest culprits behind random lockups. Beyond that I can’t offer much; I don’t know of any way to test the components short of spending thousands of dollars on specialized equipment. I’ve always had multiple machines available to swap parts with, so I don’t have a different troubleshooting strategy.

        • Ninguém@lemmy.ptOP
          7 months ago

          Thanks.

          I thought there might be somewhere in the logs where the system writes error messages, even hardware-derived ones.

          I’ve been going through /var/log/* and found thousands of scary messages, but I can’t really make sense of any of them.
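
One way to cut thousands of log lines down to something readable is to keep only the lines that look hardware-related and count them by category. The following is a rough sketch, not a diagnostic tool: the category names and the keyword patterns are assumptions chosen for illustration, they are far from exhaustive, and a match does not prove a hardware fault.

```python
import re
from collections import Counter

# Illustrative, non-exhaustive keyword patterns. The category names are
# invented for this sketch; adjust both to whatever your logs contain.
PATTERNS = {
    "machine_check": re.compile(r"machine check|\bmce\b|hardware error", re.IGNORECASE),
    "disk_io": re.compile(r"i/o error", re.IGNORECASE),
    "filesystem": re.compile(r"ext4-fs error|orphan", re.IGNORECASE),
    "memory": re.compile(r"out of memory|oom-killer", re.IGNORECASE),
}


def classify(line):
    """Return the name of the first category whose pattern matches, else None."""
    for name, pattern in PATTERNS.items():
        if pattern.search(line):
            return name
    return None


def summarize(lines):
    """Return (per-category counts, list of (category, line)) for matching lines."""
    counts = Counter()
    hits = []
    for line in lines:
        category = classify(line)
        if category is not None:
            counts[category] += 1
            hits.append((category, line.rstrip("\n")))
    return counts, hits
```

To use it, pass any iterable of lines, for example an open log file, to `summarize` and look at the last few entries of `hits`, since the messages written just before a crash tend to be the most interesting. Which file under /var/log holds kernel messages varies by distribution, so the choice of file is left to the reader.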

          • Shdwdrgn
            7 months ago

            Something like a hard drive could generate recognizable errors in the logs, but once you get out into the motherboard or power supply, there’s simply not enough monitoring available to detect problems. You might be able to record the voltage output from the power supply rails, depending on your hardware, but that still might not tell you anything.
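
As a minimal sketch of recording the voltage rails, assuming a Linux system whose sensor driver exposes them through the kernel's hwmon sysfs files (`in*_input`, reported in millivolts): the function names, the key format, and the log path in the usage note are made up for this example, and many boards expose no voltage channels at all, in which case the result is simply empty.

```python
import glob
import os
import time


def _read_text(path):
    """Return the stripped contents of a small sysfs file, or None on failure."""
    try:
        with open(path) as handle:
            return handle.read().strip()
    except OSError:
        return None


def read_voltages(hwmon_root="/sys/class/hwmon"):
    """Return {"hwmonN:chip:label": volts} for every readable in*_input file."""
    readings = {}
    pattern = os.path.join(hwmon_root, "hwmon*", "in*_input")
    for path in sorted(glob.glob(pattern)):
        chip_dir = os.path.dirname(path)
        channel = os.path.basename(path)[: -len("_input")]  # e.g. "in0"
        chip = _read_text(os.path.join(chip_dir, "name")) or "unknown"
        # Labels are optional; fall back to the channel name.
        label = _read_text(os.path.join(chip_dir, channel + "_label")) or channel
        try:
            millivolts = int(_read_text(path))
        except (TypeError, ValueError):
            continue  # unreadable or non-numeric value, skip it
        key = f"{os.path.basename(chip_dir)}:{chip}:{label}"
        readings[key] = millivolts / 1000.0
    return readings


def append_sample(log_path, hwmon_root="/sys/class/hwmon"):
    """Append one timestamped line of readings and force it to disk."""
    readings = read_voltages(hwmon_root)
    fields = " ".join(f"{k}={v:.3f}V" for k, v in sorted(readings.items()))
    with open(log_path, "a") as handle:
        handle.write(time.strftime("%Y-%m-%d %H:%M:%S") + " " + fields + "\n")
        handle.flush()
        os.fsync(handle.fileno())  # so the last samples survive a hard lockup
    return readings
```

Calling `append_sample("voltages.log")` every few seconds from a loop or a timer would leave a record of the rails up to the moment of the lockup. As noted above, a clean log still might not tell you anything, since a brief dip can fall between samples.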