Load bearing Tupperware

Track_Shovel@slrpnk.net · 5 months ago

Load bearing Tupperware

CallMeAnAI@lemmy.world · 5 months ago

It’s like all you people forgot how much shit broke before AWS. They have one major outage every few years and people lose their shit pretending they aren’t hitting the SLA or coming close.

naught101@lemmy.world · 5 months ago

Because hosting was more diverse before, so when shit happened it took out a couple of sites, not a quarter of the internet

CallMeAnAI@lemmy.world · 5 months ago

And? God forbid we touch grass for 6 hours a year.

AmbitiousProcess (they/them)@piefed.social · 5 months ago

The outage also took down people’s banks, which stopped many of them from doing things like buying groceries 💀

I don’t think saying it’s good for us “touching grass” is a good argument here when AWS hosts such a substantial portion of all online services.

CallMeAnAI@lemmy.world · edit-2 5 months ago

How many banks didn’t work? Which ones? You have a source? Visa and MC were good all day here in the real world on the East Coast.

Sounds like you’re just trying to exaggerate around an edge case that frankly isn’t the end of the world even if it were common for 4 hours a year

Why aren’t you blaming the bank for not having redundancy outside a single DC? How many banks do you know of that are successfully using other providers that have a higher SLO/SLA?

jj4211@lemmy.world · 5 months ago

I’m also skeptical that any payment processing networks were impacted. I would be surprised by that, but less so if people couldn’t manage their accounts online, which might have a similar effect. I’m not surprised at all if grocery stores or restaurants were significantly impacted. I know a lot of the apps were broken, and I could imagine someone used to apping everything leaving their cards at home and being unable to get lunch. There might be some aggressively “modern” establishments that are kiosk-only, and I could imagine them getting downed by an AWS outage.

outside a single DC?

I’m told that a lot of the companies did all the right things but still got taken down because some dependent Amazon services are tethered to that single DC and only Amazon has the power to change that.

CallMeAnAI@lemmy.world · 5 months ago

I’ll wait for the final root cause but…

We mitigated most of it by swapping to secondary DNS and completely taking out anything related to AWS DNS and services in us-east-1. If a company didn’t have secondary DNS and was heavily reliant on AWS internal DNS, this might be what they experienced.
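A minimal sketch of the failover idea described above: try an ordered list of DNS providers and use the first one that answers. This is not the poster's actual setup. The resolver functions, the `ResolutionError` type, and the returned address are all hypothetical stand-ins, and in practice a secondary DNS provider is usually configured at the nameserver level rather than in application code.

```python
from typing import Callable, Sequence


class ResolutionError(Exception):
    """Raised when a provider cannot answer a lookup (hypothetical type)."""


# A resolver takes a hostname and returns an IP address string.
Resolver = Callable[[str], str]


def resolve_with_failover(hostname: str, resolvers: Sequence[Resolver]) -> str:
    """Try each resolver in order and return the first successful answer."""
    errors = []
    for resolver in resolvers:
        try:
            return resolver(hostname)
        except ResolutionError as exc:
            # Remember the failure and move on to the next provider.
            errors.append(str(exc))
    raise ResolutionError(f"all providers failed for {hostname}: {errors}")


def primary_down(hostname: str) -> str:
    """Stand-in for a primary provider that is currently unavailable."""
    raise ResolutionError("primary provider unavailable")


def secondary_ok(hostname: str) -> str:
    """Stand-in for a healthy secondary provider."""
    return "192.0.2.10"  # documentation-only address range (RFC 5737)


if __name__ == "__main__":
    print(resolve_with_failover("example.com", [primary_down, secondary_ok]))
```

With only the primary in the list, the same call raises `ResolutionError`, which is the situation of having no secondary DNS at all.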

jj4211@lemmy.world · 5 months ago

I’m not familiar with AWS myself, but they seemed to be referencing something they vaguely characterized as ‘security infrastructure’, as a kind of handwave for why they thought it made sense for it to be a single point of failure, because enabling distribution of it would somehow be insecure…

I frankly wasn’t interested in delving deeper, because that excuse sounds pretty stupid, but I’d be trying to get details I don’t personally need about something I probably shouldn’t be arguing about. I’ve gotten burned too much by someone championing something stupid ostensibly in the name of ‘security’ to try to sign up for another one of those arguments.

AmbitiousProcess (they/them)@piefed.social · 5 months ago

I can see why your account is marked with two red marks on PieFed for low reputation, because man do you come off confrontational.

How many banks didn’t work? Which ones? You have a source?

Search engines exist. Use them before acting as if I’m making shit up.

The list of financial institutions that had issues, as far as I can tell from industry reporting and Downdetector graphs, is Navy Federal Credit Union (~15 million members), Truist (~15 million customers), Chime (~8-9 million customers), Venmo (~60 million users), Ally Bank (~10 million customers), and Lloyds Banking Group (~30 million customers).

Assuming no overlap, that’s nearly 140 million people that lost banking and money transfer access.
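As a quick check of that total, here is a small sketch that sums the approximate figures listed above. The numbers are the rough estimates from this comment, not verified counts, and Chime is given as a range, so the sum is a range too.

```python
# Approximate customer counts in millions, as quoted in the comment above.
# Chime is stated as ~8-9 million, so it is kept as a (low, high) pair.
INSTITUTIONS_MILLIONS = {
    "Navy Federal Credit Union": (15, 15),
    "Truist": (15, 15),
    "Chime": (8, 9),
    "Venmo": (60, 60),
    "Ally Bank": (10, 10),
    "Lloyds Banking Group": (30, 30),
}


def total_range_millions(counts: dict) -> tuple:
    """Sum the low and high estimates separately, assuming no overlap."""
    low = sum(lo for lo, _ in counts.values())
    high = sum(hi for _, hi in counts.values())
    return low, high


if __name__ == "__main__":
    low, high = total_range_millions(INSTITUTIONS_MILLIONS)
    print(f"{low}-{high} million people, assuming no overlap")
```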

Sounds like you’re just trying to exaggerate around an edge case that frankly isn’t the end of the world even if it were common for 4 hours a year

The outage lasted for 15 hours in some cases, due to many AWS services recovering after the outage, yet having a backlog to work through, which took many more hours. Many services also depend on AWS in a manner where AWS coming back online doesn’t instantaneously restart service. These systems are complex, and not every company that relied on them could instantly start back up the moment the main outage was resolved, let alone when many services were still marked as impacted for hours and hours later as they worked through their backlog.

Why aren’t you blaming the bank for not having redundancy outside a single DC? How many banks do you know of that are successfully using other providers that have a higher SLO/SLA?

I also blame them for not having additional redundancy. I blame both them for not having a fallback, and AWS for allowing such a major outage to happen. Shockingly, more than one party can be at fault.

lengau@midwest.social · 5 months ago

I touch grass every day. I want to do it on my own terms, not Amazon’s.

MajinBlayze@lemmy.world · 5 months ago

I keep some potted grass by my desk so I can touch it whenever I want

tomiant@piefed.social · 5 months ago

“Outages are a GOOD THING!” / FOX talking head

CallMeAnAI@lemmy.world · 5 months ago

Sure that’s what I said.

Go ahead, go to Rackspace or SAP, I’m sure you’ll have a much more reliable experience. Or just run your own. I’m sure it’ll be easy peasy and super reliable.

Cousin Mose@lemmy.hogru.ch · 5 months ago

Yeah, when I can’t access my bank account the first thing I do is “touch grass.” 🥴

CallMeAnAI@lemmy.world · 5 months ago

Yup, I’m sure your bank would never go down on another provider. Never.

Cousin Mose@lemmy.hogru.ch · edit-2 5 months ago

Well lucky for me they don’t use AWS so I haven’t seen an outage in 10+ years.

CallMeAnAI@lemmy.world · 5 months ago

balance8873@lemmy.myserv.one · 5 months ago

Some of us have jobs. I mean I guess you have a job, but in your case losing network just means those pesky humans stop bothering you and go to a real therapist.

CallMeAnAI@lemmy.world · 5 months ago

I’m a staff engineer who has been dealing with the results of SLAs before Amazon was an idea.

God forbid I have a P0 where I have to message a bunch of non-technical directors that it’s AWS, not us. Much, much worse than having to figure out and then pull in the team that pushed whatever untested shit made its way into production on a Friday afternoon.

Unless you’ve been responsible for a SaaS with SLAs in a B2B setting, I know more about the consequences of a provider outage than you.

balance8873@lemmy.myserv.one · 5 months ago

I don’t know what you’re responding to but it doesn’t seem to be me. Either that or you forgot the username you picked for yourself in which case: whoosh

naught101@lemmy.world · 5 months ago

Agree. That’s not related to the point I was responding to though…

mitram@lemmy.pt · 5 months ago

It’s pretty funny to argue in favour of centralised services in a decentralised platform

CallMeAnAI@lemmy.world · 5 months ago

I never argued that. I provided the reality of what they did. I’m sorry the reality doesn’t align with how you think things should be.

You think everyone trying to make money is just stupid and has ignored some super reliable and cheap hosting because they want to gobble Bezos cock? No, AWS solved challenging problems and made it a lot easier to stand up a reliable app.

balance8873@lemmy.myserv.one · 5 months ago

Very unique take on how businesses make technical decisions. I’ve never heard of anyone describe the decision making process as logical before. Or even grounded in facts.

CallMeAnAI@lemmy.world · 5 months ago

I know. Everyone making money and decisions are just idiots.

balance8873@lemmy.myserv.one · 5 months ago

Usually

JcbAzPx@lemmy.world · 5 months ago

That is usually the case, yes.

mitram@lemmy.pt · 5 months ago

I feel that you’re very jaded over this subject. I truly felt it was a funny situation. No judgement from me.

Yes, AWS has a lot of advantages and I do believe they usually provide a reliable service, but as with all centralised services, when they go down a bunch of other stuff goes with them, and that should be avoided. That doesn’t make all the incredible engineers currently working at AWS stupid.

shalafi@lemmy.world · 5 months ago

No shit. I was DevOps at my last company and they were all in on AWS. In those 5 years we had one major outage. There was one other case of a particular service going down, forgot which one, but it mainly screwed DevOps and the db guys.

You’re talking to a bunch of young people who hate Bezos and by extension AWS. They have no idea what the internet was like before.

Personally I think the cost is outrageous, rather have my own hardware mirrored in geographically distant colos, but that doesn’t mean AWS isn’t amazing.

wolframhydroxide@sh.itjust.works · 5 months ago

The problem is not that any outage occurred. This still happens often. Things just refuse to work sometimes. The issue is that SO MANY eggs were in ONE basket.

Phoenixz@lemmy.ca · edit-2 5 months ago

Eeehh, you’re literally suggesting that AWS added to the general stability and dependability of the internet in general

You have NO idea what you’re talking about

The internet was designed to survive nuclear war (talking about being dependable), and the entire idea was (and should continue to be) that you don’t rely on a single point of failure. Traffic should automatically route around dead nodes so that everything continues to flow. Decentralisation is key.

But of course with companies being companies and corporate doing what corporate does best (enshittify everything so that we make more monies) everything got centralized.

shalafi@lemmy.world · 5 months ago

The centralization is an issue, but AWS is stable as hell. When I was first in IT, tech support, I had to explain to customers daily that, “No, your internet is fine, it’s just that particular website that’s down.”

And the centralization wouldn’t be a thing if AWS didn’t route all IAM services through us-east-1. My Lightsail in us-west-1 was fine yesterday.

Phoenixz@lemmy.ca · 5 months ago

So your argument is that AWS centralization is good because Amazon is a good provider?

You do understand that there are loads of providers out there that are perfectly stable, but that are not Amazon?

I’ve never used it because I know how to manage a server, something you might want to expect from IT personnel who do development for companies, but these days let’s just ask Amazon to do it for us, we’re too lazy.

1984@lemmy.today · edit-2 5 months ago

Companies need a hell of a lot more than virtual machines today. I don’t use it personally either, but would I recommend a company buy their own hardware? No. I would say they should use AWS because they can afford it and it gives them access to hundreds of services. It’s rare to see technical issues.

The value for a company is actually enormous, to have something like that at their fingertips.

Today’s downtime will be forgotten in a few days, and it was a big one.

Phoenixz@lemmy.ca · 5 months ago

Who says companies need to buy their own hardware?

We have datacenters for that, you rent the hardware one way or the other.

I’m saying that nobody should put all their eggs in one basket because if that basket breaks, you’re all fucked.

If you have the need for high availability then you don’t put all your servers in a single datacenter, or with a single provider.

If everyone and their mother is with one provider, you’ll first notice that said provider gets expensive pretty quick and you’ll also notice that when shit goes down that half the fucking internet follows.

My services weren’t down, and never have been. I don’t use AWS, because I don’t need it

CallMeAnAI@lemmy.world · edit-2 5 months ago

Yeah, Rackspace was killing it! Sites NEVER went down, especially under dynamic load. Never.

Phoenixz@lemmy.ca · 5 months ago

You understand that that has nothing to do with this? So there are shitty providers out there, find a good one that is not “just amazon”

Cousin Mose@lemmy.hogru.ch · 5 months ago

Once per year? I had outages much more often than that on AWS.

CallMeAnAI@lemmy.world · 5 months ago

Nah. I haven’t had a service we use miss an SLA or cost more than its SLO budget in 2 years.

What specific services have they missed your SLA on and what incidents were they tied to? I understand that not every team has a guy to monitor that stuff and bitch for credits, but I do, and AWS is one of our most reliable vendors.

Look, the fact that AWS, Azure, and more recently Google are the only choices sucks.

But the reality is most companies and projects don’t have the business case to justify multi-region failover, much less vendor failover. They are all built on single points of failure and will always have outages.

Everyone just notices it more when it’s AWS, and that’s a stupid reason to base decisions on. Visa/MC was working. Reddit and Facebook were mostly working once they started routing through their multi-cloud nodes. Maybe you couldn’t get to your bank’s web app; that’s on them for using a single cloud with no way to route to alternate cloud nodes and services. And for them to double infrastructure costs at best, unless they are BoA, Chase, Morgan, etc., is dumb for 99.99%, which is the SLA.

The world isn’t ending, emergency services are working, visa/mc failed over, I was still on Reddit and slack most of the day. It wasn’t the end of the world.

Anyway, I now realize I have summoned my frustrations with this entire thread and gone wildly off topic and ranted with full force at you.

I just don’t think it’s important when there is a major outage on AWS/Azure/Cloudflare. It was going to happen elsewhere, and you wouldn’t have an excuse to tell your PM “not my problem”; instead you’d be digging into your app for 2 hours to find out that x portion of your very distributed vendor list failed and you still have a single point of failure. I’d rather be able to point to AWS, say shit is fucked for everyone, and if you want multi-cloud it’s going to cost at least 1.5x as much as we’re spending 🤷‍♂️.
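For anyone who wants to put numbers on the figures thrown around in this thread, here is a rough sketch that converts an availability percentage into a downtime budget and back. It takes the 99.99% figure quoted above at face value and assumes a single annual window of 365 days; many real SLAs are measured per service and per billing month, so treat the output as a ballpark only. The function names are made up for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years (assumption)


def allowed_downtime_hours(availability_pct: float,
                           window_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime permitted by an availability target over a window."""
    return window_hours * (1 - availability_pct / 100)


def availability_pct(downtime_hours: float,
                     window_hours: float = HOURS_PER_YEAR) -> float:
    """Availability percentage implied by a given amount of downtime."""
    return 100 * (1 - downtime_hours / window_hours)


if __name__ == "__main__":
    budget_minutes = allowed_downtime_hours(99.99) * 60
    print(f"99.99% over a year allows about {budget_minutes:.1f} minutes down")
    # Outage durations mentioned in this thread: 4, 6, and 15 hours.
    for hours in (4, 6, 15):
        print(f"{hours} h down in a year is {availability_pct(hours):.3f}%")
```

On these assumptions, 99.99% works out to under an hour of downtime per year, so a 4-hour outage and a 15-hour outage land at different points below that target.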

Cousin Mose@lemmy.hogru.ch · edit-2 5 months ago

I haven’t used AWS in years. No IPv6 support in S3 in 2017 was the last straw for me. I have to deal with it at work (sometimes) and always laugh when they introduce “new” features like HTTPS records in Route53 like two years late.

Why do you say AWS, Azure and Google are the only options? I don’t use any of those greedy companies’ platforms.