After being home for weeks, I went away on business. On the first night away there was a brief power cut, and the firewall (on a UPS) seemed to get stuck.

So that meant no DNS, no DHCP, and no connectivity between wifi and LAN… all due to an (admittedly aging) hardware issue.

Since then my entire home system has had issues whilst it all settles down.

It made me think about getting some redundancy into the system to handle a single failure.

So, can you give me any insights into High Availability options like CARP (for pfSense), VM failover (on Incus?), mesh wifi, Home Assistant, etc?

Of course there are going to be single points of failure, like the ISP line, but it seems like something to test out.

  • just_another_person@lemmy.world
    14 hours ago

    There are a lot of layers here, so let me work backwards from the edge, inward:

    1. You lost power, so you probably lost internet if your endpoint hardware was not also on a UPS. Nothing is going to stop that unless you get a multi-WAN router, and an LTE backup on standby. Probably not worth the cost.

    2. You shouldn’t have lost DNS or DHCP for your local network just because of a reboot. Something is wrong with your setup, and we’d need more info about it to say more. Generally these services are stateful for the most part, and shouldn’t lose state on reboot IF you have them configured properly for your local domains, e.g. a DNS forwarder and static DHCP reservations for local devices.

    3. You don’t need HA for all your services. You need to fix the issues with your services not coping with interruptions. The specific services you mentioned don’t behave poorly if they die and come back in properly configured environments.

    4. If you have a UPS in your home, all devices connected to the UPS should be getting information about its status and shut down cleanly when thresholds are met. Install NUT somewhere, and upsmon on all your hosts, to properly issue shutdown signals when you lose power and the UPS starts discharging. The thresholds you set for this are up to you.
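
    To make the threshold idea in point 4 concrete, here is a minimal sketch of the decision a host could make from the text that NUT's `upsc` client prints (`key: value` lines such as `ups.status` and `battery.charge`). The function names and the default thresholds (40% charge, 300 seconds of runtime) are assumptions for illustration only; in a real setup upsmon makes this call for you.

```python
def parse_upsc(text: str) -> dict[str, str]:
    """Parse `upsc`-style 'key: value' lines into a dict."""
    values = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that have no 'key: value' shape
            values[key.strip()] = value.strip()
    return values


def should_shut_down(values: dict[str, str],
                     min_charge: float = 40.0,       # assumed threshold, percent
                     min_runtime_s: float = 300.0    # assumed threshold, seconds
                     ) -> bool:
    """Decide whether this host should start a clean shutdown."""
    status = values.get("ups.status", "").split()
    if "OB" not in status:   # not on battery: mains power is present
        return False
    if "LB" in status:       # the UPS itself reports low battery
        return True
    charge = float(values.get("battery.charge", "100"))
    runtime = float(values.get("battery.runtime", "inf"))
    return charge < min_charge or runtime < min_runtime_s


sample = "battery.charge: 35\nbattery.runtime: 900\nups.status: OB DISCHRG\n"
print(should_shut_down(parse_upsc(sample)))
```

    Hosts that matter less can use higher thresholds so they drop off early and leave more runtime for the firewall.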

    In general, you don’t need to overthink HA, you need to focus instead on your services recovering gracefully in these situations. Spending insane amounts of time and money to make highly available services for your media and home automation will only leave you having spent resources and realizing there is no way to ever get to 100% uptime without flaws somewhere.

    • SayCyberOnceMore@feddit.ukOP
      13 hours ago

      Good points there.

      1. The ISP router is a Fritz one set to bridge mode, powered via a PoE adapter from the same UPS the firewall is using. It stayed up the whole time (looking back at the logs).

      2. Not sure what happened here, but the firewall is the DNS resolver and when everything else powered back up, nothing got an IP address. Now, whether the service failed or the WAPs took longer to start than the devices could wait, I’m not sure, but as Scotty said: it’s dead, Jim.

      3. Good point. I don’t need it ALL to be redundant.

      4. Also good. The UPS is directly connected to the firewall (which has NUT in), but it doesn’t inform anything else… I’ll look into that too.

      Nice mental reset for me about overthinking it… thanks

      • just_another_person@lemmy.world
        11 hours ago
        1. Okay, so no issues there
        2. DHCP handles the address assignments in your network, not DNS. DNS answers queries for named hosts. If no devices got IP addresses, that’s a DHCP problem. If you couldn’t resolve public hosts like www.news.com, that’s a DNS problem. If you couldn’t resolve INTERNAL named hosts you refer to around your network, then that’s also DNS, but a different problem.

        My hunch here is that you MIGHT be using a named host as your DNS resolver instead of an IP address in your network, OR, for some reason your DNS resolver doesn’t have a static address. Never use named hosts to point to network services, and all network services need a static IP, so go and check all of that.
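
        The DHCP-versus-DNS split above can be sketched as a small triage helper, together with a check that a configured resolver is an IP literal and not a hostname. The function names and return labels are made up for illustration, and the three booleans are things you would observe yourself on an affected device.

```python
import ipaddress


def is_ip_literal(value: str) -> bool:
    """True if the configured resolver is an IPv4/IPv6 address, not a hostname."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def triage(got_lease: bool, resolves_public: bool, resolves_internal: bool) -> str:
    """Map observed symptoms to the service worth checking first."""
    if not got_lease:
        return "dhcp"                 # no address at all: look at DHCP first
    if not resolves_public:
        return "dns"                  # has an address but cannot resolve public names
    if not resolves_internal:
        return "dns-local-records"    # public names work, local names do not
    return "ok"


print(triage(got_lease=False, resolves_public=False, resolves_internal=False))
print(is_ip_literal("firewall.lan"))
```

        A resolver entry that fails `is_ip_literal` is the chicken-and-egg case: the client would need DNS to find its DNS server.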

    • Prove_your_argument@piefed.social
      13 hours ago

      I have a multi-WAN SMB router. 945 Mbit/s throughput. $60 new.

      TP-Link Omada or Ubiquiti tier gear is all you really need for a small business. The redundant ISP connections cost way more, but it’s still a tiny cost per month for something that can get the job done in a pinch, like a hotspot.

      Battery backups are only useful if you have a generator to take over the utility load imo. Not a common thing in small business unless you’re leasing somewhere with generators provided for the whole building.

      Redundant servers are not that hard to have. You just need Proxmox. It’s not as intuitive as old VMware, but it’s more than enough for an SMB. Some kind of storage shelf and three little servers get you a ton of redundancy. If a tiny budget is necessary and a little downtime is fine, you really only need a couple of hosts that are each beefy enough to run everything you need.

      • just_another_person@lemmy.world
        13 hours ago

        Well…no, and this is what I’m saying.

        Every downstream issue you try to solve with redundancy has a doubled, duplicate cost upstream of it: internet links, load balancers for web services, and in this specific situation, UPSes.

        Throwing more servers at a homelab with no power is just wasting money without more UPS power in the mix. If you have 4 servers and want HA for everything on your network, expect to have two of everything, including UPS units.
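
        As a rough back-of-the-envelope illustration of that upstream cost, runtime on battery scales roughly inversely with load, so doubling the hardware on one UPS about halves how long it lasts. The numbers below (a 200 Wh battery, 0.85 inverter efficiency, 60 W per server) are made-up assumptions, and real batteries deliver less at higher loads, so treat this as an optimistic estimate.

```python
def runtime_minutes(battery_wh: float, load_w: float, efficiency: float = 0.85) -> float:
    """Idealised UPS runtime: usable energy divided by load, in minutes."""
    if load_w <= 0:
        raise ValueError("load_w must be positive")
    return battery_wh * efficiency / load_w * 60


# Hypothetical figures: same UPS, two servers versus four
two_servers = runtime_minutes(battery_wh=200, load_w=2 * 60)
four_servers = runtime_minutes(battery_wh=200, load_w=4 * 60)
print(f"2 servers: {two_servers:.0f} min, 4 servers: {four_servers:.0f} min")
```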

        This is the n× sunk cost of redundancy at its core. In your example, you’re assuming this person even has a generator or whatever, but even if they did, they’d need an even BIGGER generator to run all this stuff.

        That’s why my points deal with solving for what they have and making it work better, rather than immediately jumping to adding more and more and more to the stack. It’s just not necessary when all they want is a graceful recovery from power loss.