We have a machine running some stuff on Docker, and little by little it has become important to keep an eye on it. However, everything I find about monitoring a Docker server seems to assume you’re running it in Swarm mode, which is not and WILL NOT be the case for this machine. Swarm adds a layer of complexity that isn’t needed here.

What do you recommend for this case? I for one would love it if the thing didn’t just give you a view of what is running on it but also sent notifications when something went wrong (for example, if a container had to be restarted, or if one suddenly started eating all the CPU).
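
For reference, the restart part of this can be watched without Swarm through the `docker events` stream. Below is a minimal sketch, assuming the `docker` CLI is on the PATH. The names `alert_for` and `watch` are made up for illustration, and the set of actions is just a guess at what is worth alerting on, not an official list. As far as I understand, a container that crashes and comes back through a restart policy shows up as a `die` followed by a `start`. CPU spikes are not covered here.

```python
import json
import subprocess

# Container actions treated as alert-worthy in this sketch (an assumption).
# The health_status action only appears for containers with a health check.
ALERT_ACTIONS = {"die", "oom", "restart", "health_status: unhealthy"}


def alert_for(line):
    """Return an alert string for one JSON event line, or None to ignore it."""
    try:
        event = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(event, dict) or event.get("Type") != "container":
        return None
    action = event.get("Action", "")
    if action not in ALERT_ACTIONS:
        return None
    name = event.get("Actor", {}).get("Attributes", {}).get("name", "unknown")
    return f"container {name}: {action}"


def watch(notify=print):
    """Stream `docker events` as JSON lines and call notify() on each alert."""
    cmd = ["docker", "events", "--format", "{{json .}}"]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            message = alert_for(line)
            if message:
                notify(message)
```

Calling `watch()` blocks and prints one line per matching event. Passing a different `notify` function (mail, chat webhook, whatever you already use) would turn it into a notifier.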

  • wjs018@lemmy.ml
    1 year ago

    I will be keeping an eye on this thread to see what other people do, but what I have done in the past is combine a couple of different health-checking strategies.

    • For web-accessible services I am running, I usually run something like Uptime Kuma or Gatus on a different box to make sure those web endpoints are available and performant. Lately I have been really digging how Gatus can check not only the response header but also latency and certificate validity.
    • For the host machine, you can set up custom alerts within netdata for stuff like CPU utilization and memory, with custom thresholds. The only other solution I have used for this in the past is setting up alerts through my VPS provider (if it is a VPS, that is).
      • On really low-spec machines I have had trouble with netdata, though, and I don’t have a good solution in those cases. I have resorted to just using dashdot as a PWA so that I can check it easily on my phone when I am on the go. Interested to see if there are less demanding options.
    • For some custom services that run on set schedules, I have used healthchecks.io (which you can self-host) to send alerts if they don’t run for some reason.
    • As for the containers being restarted, I actually don’t have experience with that, so I am interested to see what others have done.
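
For the scheduled-job case in the list above, the ping pattern can be sketched roughly like this. It assumes a healthchecks.io-style ping URL that accepts `/start` and `/fail` suffixes. The names `send_ping` and `run_and_report` are made up for this example, and the URL you pass in would be the one from your own check.

```python
import subprocess
import urllib.request


def send_ping(url, timeout=10):
    """Best-effort HTTP GET; network errors are swallowed so that a
    monitoring hiccup does not break the job itself."""
    try:
        urllib.request.urlopen(url, timeout=timeout).close()
    except OSError:
        pass


def run_and_report(command, ping_url, send=send_ping):
    """Run command, pinging /start before it and success or /fail after."""
    send(ping_url + "/start")
    exit_code = subprocess.run(command).returncode
    send(ping_url if exit_code == 0 else ping_url + "/fail")
    return exit_code
```

If the job never runs at all, no ping arrives, and the missed schedule is what triggers the alert on the monitoring side.
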
    • Lupec@lemm.ee
      1 year ago

      Gatus sounds pretty cool, I’ll definitely give it a closer look later. Maybe it’s the push I needed to go ahead and look into proper observability as a whole, log ingestion and whatnot. My homelab setup is sorely lacking in that department if I’m being honest lol

  • Toribor@corndog.social
    1 year ago

    Uptime Kuma for web monitoring.

    I’m experimenting with both Zabbix and Netdata to see which one I want to keep for monitoring resources on my hosts.

    I use healthchecks.io to monitor backup scripts and cronjobs.

    I’m using Autoheal to restart containers that are in an unhealthy state. For some containers this means I need to write my own health check. I mostly did this to resolve a rare issue where Plex would lock up, but it’s helped in other scenarios too.
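
A hand-written health check along those lines could look something like this. It is only a sketch, assuming the container image ships Python and the service answers HTTP on some local URL. `is_healthy` is a made-up name, and the `opener` parameter exists only so the function can be tried without a real service.

```python
import urllib.request


def is_healthy(url, timeout=5, opener=urllib.request.urlopen):
    """True if url answers with a 2xx status within timeout seconds.

    A hung service ends in a timeout, which counts as unhealthy, as do
    connection errors and HTTP error statuses.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        return False
```

Used as the container's health check command, the script would end with something like `sys.exit(0 if is_healthy("http://localhost:8080/health") else 1)`, where the URL is a placeholder. Docker treats exit code 0 as healthy and 1 as unhealthy, and the unhealthy state is what a tool like Autoheal reacts to.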