Making setups resilient to outages

jobbies@lemmy.zip · 5 hours ago

hperrin@lemmy.ca · 3 hours ago

Reduce the number of single failure points. How you choose to do that is up to you and what you can afford.
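
To put rough numbers on that, here is a minimal sketch of the usual availability arithmetic. It assumes component failures are independent (often not true at home, where one power cut takes out everything), and the availability figures in the example are invented for illustration, not measurements.

```python
from math import prod

def series_availability(parts):
    """Availability of a chain where every part must be up (each one is a single point of failure)."""
    return prod(parts)

def redundant_availability(replicas):
    """Availability of a group where any one replica being up is enough."""
    return 1 - prod(1 - a for a in replicas)

if __name__ == "__main__":
    # Hypothetical figures: home ISP, mains power, one server.
    chain = [0.99, 0.999, 0.995]
    print(f"single chain:      {series_availability(chain):.4f}")

    # Same chain, but with a second (equally hypothetical) uplink as backup.
    uplink = redundant_availability([0.99, 0.99])
    print(f"with backup link:  {series_availability([uplink, 0.999, 0.995]):.4f}")
```

The chain is always worse than its weakest link, which is why doubling up on the least reliable part usually buys the most.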

solrize@lemmy.ml · edited 4 hours ago

Maybe you could describe what you mean by self-hosted and resilient. If you mean stuff running on a box in your house connected through a home ISP, then the home internet connection is an obvious point of failure that makes your box’s internet connection way less reliable than AWS despite the occasional AWS problems. On the other hand, if you are only trying to use the box from inside your house over a LAN, then it’s ok if the internet goes out.

You do need backup power. You can possibly have backup internet through a mobile phone or the like.

Next thing after that is redundant servers with failover and all that. I think once you’re there and not doing an academic-style exercise, you want to host your stuff in actual data centers, preferably geo-separated ones with anycast. And for that you start needing enough infrastructure, like routable IP blocks, that you’re not really self-hosting any more.

A less hardcore approach would be to use something like haproxy, maybe several instances on round-robin DNS, to shuffle traffic between servers when individual ones go down. This again gets out of self-hosting territory though, I would say.
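
The core idea behind that kind of proxy is just a health check plus a fallback order. The sketch below is not haproxy configuration and not how haproxy is implemented; it only illustrates the concept in Python. The hostnames and ports are hypothetical, and a plain TCP connect is assumed to be a good enough health check.

```python
import socket
from typing import Callable, Iterable, Optional, Tuple

Backend = Tuple[str, int]  # (host, port)

def tcp_alive(backend: Backend, timeout: float = 1.0) -> bool:
    """Treat a backend as healthy if a TCP connection can be opened in time."""
    try:
        with socket.create_connection(backend, timeout=timeout):
            return True
    except OSError:
        return False

def pick_backend(
    backends: Iterable[Backend],
    is_alive: Callable[[Backend], bool] = tcp_alive,
) -> Optional[Backend]:
    """Return the first healthy backend in priority order, or None if all are down."""
    for backend in backends:
        if is_alive(backend):
            return backend
    return None

if __name__ == "__main__":
    # Hypothetical servers: primary at home, fallback on a rented VPS.
    servers = [("home.example.net", 443), ("vps.example.net", 443)]
    print(pick_backend(servers))
```

Note that whatever runs this check is itself a single point of failure, which is why the proxies end up being duplicated as well.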

Finally, at the end of the day, you need humans (that probably means yourself) available 24/7 to handle when something inevitably breaks. There have been various products like Heroku that try to encapsulate service applications so they can reliably restart automatically, but stuff still goes wrong.

Every small but growing web site has to face these issues and it’s not that easy for one person. I think the type of people who consider running self-hosted services that way have already done it at work and been woken up by PagerDuty in the middle of the night, so they know what it’s about, and are gluttons for punishment.

I don’t attempt anything like this with my own stuff. If it goes down, I sometimes get around to fixing it whenever, but not always. I do try to keep the software stable though. Avoid the latest shiny.

frongt@lemmy.zip · 3 hours ago

Either figure out a way to host across multiple clouds, or just don’t use cloud providers, because any one of them is a single point of failure.

atzanteol@sh.itjust.works · 3 hours ago

How much money are you willing to spend? Resiliency is expensive.

slazer2au@lemmy.world · 5 hours ago

Consider whether you actually need three or five nines of uptime. I know I don’t, so I don’t worry if Jellyfin or tickdone are down for a few hours.
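
For a sense of scale, this small sketch converts "N nines" into a yearly downtime budget. It assumes a 365-day year and counts all downtime, planned or not.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years

def downtime_hours_per_year(nines: int) -> float:
    """Yearly downtime budget in hours for N nines of availability (3 -> 99.9%)."""
    unavailability = 10 ** (-nines)
    return HOURS_PER_YEAR * unavailability

if __name__ == "__main__":
    for n in (2, 3, 4, 5):
        hours = downtime_hours_per_year(n)
        print(f"{n} nines: {hours:.3f} h/year ({hours * 60:.1f} min)")
```

Three nines allows roughly 8.8 hours of downtime a year; five nines allows a little over 5 minutes, which a single reboot can use up.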

irmadlad@lemmy.world · 4 hours ago

Implement fallbacks. If your self-hosted services are public and mission critical, you should have something in your trick bag to fall back to.