Is there a daemon that will kill any process using more than a specified % of CPU? I’m having issues where a system sometimes grinds to a halt due to high CPU usage. I’m not sure which process is responsible (I can’t run htop because the system is frozen). Ideally I’d like a daemon that automatically kills processes using more than a given % of CPU, and then logs which process it was for me to look back on later. Alternatively, something that just logs processes using a given % of CPU would let me investigate after restarting the system.

The system is being used as a server so it’s unattended a lot of the time; it’s not a situation where I did something on the computer and then CPU usage went up.

Edit: Thanks to the comments pointing out it might be a memory leak rather than CPU usage that’s the issue. I’ve set up earlyoom, which seems to have diagnosed the problem as a clamd memory leak. I’ve been running clamd on the server for ages without problems, so it might be the result of an update; I’ve disabled it for now and will keep monitoring to see if earlyoom catches anything else. If the problem keeps occurring I’ll try some of the other tools people have suggested.
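For the “just log it” fallback the OP describes, a stdlib-only Python sketch is enough to run from cron or a systemd service. Everything here (threshold, interval, log path, function names) is illustrative, not an existing tool, and ps’s pcpu column is a lifetime average rather than an instantaneous rate:

```python
#!/usr/bin/env python3
"""Log (and optionally kill) processes above a CPU threshold -- a sketch."""
import subprocess
import time

def processes_over(threshold_pct):
    """Return (pid, cpu_pct, command) for every process above threshold_pct."""
    out = subprocess.run(
        ["ps", "-eo", "pid,pcpu,comm", "--no-headers"],
        capture_output=True, text=True, check=True,
    ).stdout
    offenders = []
    for line in out.splitlines():
        pid, pcpu, comm = line.split(None, 2)
        if float(pcpu) > threshold_pct:
            offenders.append((int(pid), float(pcpu), comm.strip()))
    return offenders

def watchdog(threshold_pct=90.0, interval=10, logfile="/var/log/cpu-hogs.log"):
    """Poll forever, appending offenders to logfile for post-mortem reading."""
    while True:
        for pid, pcpu, comm in processes_over(threshold_pct):
            with open(logfile, "a") as f:
                f.write(f"{time.ctime()} pid={pid} cpu={pcpu:.1f}% cmd={comm}\n")
            # To kill instead of just log (dangerous -- import os, signal first):
            # os.kill(pid, signal.SIGTERM)
        time.sleep(interval)
```

Writing the log to disk rather than watching a live htop is the point: the record survives a hard reboot of a frozen box.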

  • ferret@sh.itjust.works · 8 hours ago

    CPU pressure generally isn’t crippling; the scheduler is pretty clever. I would look into other causes.

  • DigitalDilemma@lemmy.ml · 8 hours ago

    Never heard of something like that, and I suspect anyone who started creating it soon filed it under “Really bad ideas” alongside “Whoops, why did my kernel just stop?”

    sar is the traditional way to watch for high-load processes, but it’s not exactly trivial to get going, so do the basics first: run htop. Not only will that give you a simple breakdown of memory usage (others have already pointed out swap load, which is very likely), it also lets you sort by CPU usage. htop is more than just a Linux taskmgr; it’s a first-step triage tool for situations like this.
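    If you do go the sar route, the pieces live in the sysstat package. Roughly, with package names and paths varying by distro (this is a setup sketch, not a tested transcript):

```shell
# Debian/Ubuntu: apt install sysstat, then enable collection in /etc/default/sysstat
sudo systemctl enable --now sysstat   # periodic sampling via the sysstat timer

sar -u 1 5       # live CPU utilization: five one-second samples
pidstat -u 5 1   # per-process CPU over a five-second window
sar -q           # load-average history from today's collected log
```

pidstat is the per-process view that answers “which process was it?” after the fact, which plain sar’s system-wide counters can’t.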

  • ragica@lemmy.ml · 10 hours ago (edited)

    I used to use earlyoom on an old laptop and it worked well for my purposes.

    I hear there is a systemd-oomd, but I never tried it.

    Edit: sorry, I misread your post as being about memory rather than CPU. Too early in the morning for my brain to work.
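    For reference, systemd-oomd is driven by configuration rather than flags; a sketch with illustrative limits (the drop-in path and percentages are placeholders, see systemd-oomd.service(8) and systemd.resource-control(5) for your version’s exact keys):

```ini
# /etc/systemd/oomd.conf -- global defaults
[OOM]
DefaultMemoryPressureLimit=60%
DefaultMemoryPressureDurationSec=20s

# Drop-in for a slice you want policed, e.g.
# /etc/systemd/system/system.slice.d/10-oomd.conf
[Slice]
ManagedOOMMemoryPressure=kill
ManagedOOMMemoryPressureLimit=50%
```

Unlike the kernel OOM killer, it acts on PSI memory-pressure stall time, so it can step in while the box is thrashing rather than only when allocation actually fails.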

    • communism@lemmy.ml (OP) · 10 hours ago

      Thanks. I’ve had a couple of comments suggesting it might be a memory leak rather than CPU usage anyway, so I’ve installed earlyoom and we’ll see if that can diagnose the problem; if not, I’ll look into CPU solutions.

  • nyan@sh.itjust.works · 9 hours ago

    If you dare, you can try temporarily killing the system’s swap (using the swapoff command) and see what happens. With no swap, the standard OOM reaper should trigger within a couple of minutes at most if it’s needed, and it should write an entry to the system log indicating which process it killed.

    Note that the process killed is not necessarily the one causing the problem. I haven’t had the OOM trigger on me in many years (I normally run without swap), but the last time it did, it killed my main browser instance (which was holding a large but not increasing amount of memory at the time) rather than the gcc instance that was causing the memory pressure.
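    The relevant commands, for the record (both need root, and the exact kernel log phrasing varies by version):

```shell
sudo swapoff -a      # disable all swap until reboot; undo with: sudo swapon -a

# After a suspected OOM event, look for the killer's log entries:
journalctl -k | grep -i -e "out of memory" -e "killed process"
```

The “Killed process” line names the victim plus its memory counters, which is exactly the post-mortem record the OP is after.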

  • Jerkface (any/all)@lemmy.ca · 10 hours ago

    high cpu usage isn’t going to make your system unusable. it’s probably consuming all your wired ram and thrashing your swap.

  • just_another_person@lemmy.world · 10 hours ago

    1. Get some sort of resource monitor running on the machine to collect time-series data about your procs, preferably sent to another machine. Prometheus is simple enough, but SigNoz and Uptrace are DataDog alternatives if you want to go there.
    2. Identify what’s running out of control. Check CPU and Memory (most likely a memory leak)
    3. Check logs to see if something is obviously wrong
    4. Look and see if there is an update for whatever the proc is that addresses this issue
    5. If it’s a system process, set proper limits

    In general, it’s not an out-of-control CPU that’s going to halt your machine, it’s memory exhaustion. If you have an out-of-control process taking too much memory, it should get OOM-killed by the kernel, but if you don’t have proper swap configured and not enough memory, the kernel may not react in time to keep the machine from running out of memory and halting.
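    Step 5 above, for a systemd-managed service, is a resource-control drop-in along these lines (the unit name and limit values are placeholders, not a recommendation):

```ini
# /etc/systemd/system/example.service.d/limits.conf  -- hypothetical unit
[Service]
MemoryMax=2G      # hard cap: the unit is OOM-killed above this
MemoryHigh=1500M  # soft cap: the kernel throttles/reclaims above this
CPUQuota=150%     # at most 1.5 cores' worth of CPU time
```

Apply with `systemctl daemon-reload` and a restart of the unit; the limits are enforced via cgroups, so a leaky service takes itself down instead of the whole box.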

  • custard_swollower@lemmy.world · 10 hours ago

    Open a console with top/htop and check if it will be visible when the system halts.

    In my experience it looks like an out-of-memory situation where some process starts swapping like crazy, or a faulty HDD that tries to read some part of the disk over and over again without success.

    • communism@lemmy.ml (OP) · 10 hours ago (edited)

      Open a console with top/htop and check if it will be visible when the system halts.

      That would require me to have a second machine up all the time, sshed in with htop open, no? Sometimes this happens on the server while I’m asleep, and I don’t really want a second machine running 24/7.
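      A middle ground that avoids the second machine: run top in batch mode on the server itself and append to a local log, e.g. under nohup or a small systemd unit (path and interval here are arbitrary):

```shell
# One sample every 60s, sorted by CPU, appended to disk; survives SSH logout
nohup top -b -d 60 -o %CPU >> /var/log/top-batch.log 2>/dev/null &
```

After a freeze and reboot, the tail of that file shows what was on top just before the hang.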