• CallMeAnAI@lemmy.world
    arrow-up
    17
    arrow-down
    15
    ·
    2 days ago

    It’s like all you people forgot how much shit broke before AWS. They have one major outage every few years and people lose their shit pretending they aren’t hitting the SLA or coming close.

    • naught101@lemmy.world
      arrow-up
      38
      ·
      2 days ago

      Because hosting was more diverse before, so when shit happened it took out a couple of sites, not a quarter of the internet

        • AmbitiousProcess (they/them)@piefed.social
          English
          arrow-up
          18
          ·
          2 days ago

          The outage also took down people’s banks, which stopped many of them from doing things like buying groceries 💀

          I don’t think saying it’s good for us “touching grass” is a good argument here when AWS hosts such a substantial portion of all online services.

          • CallMeAnAI@lemmy.world
            arrow-up
            4
            arrow-down
            12
            ·
            edit-2
            2 days ago

            How many banks didn’t work? Which ones? You have a source? Visa and MC were good all day here in the real world on the East Coast.

            Sounds like you’re just trying to exaggerate around an edge case that frankly isn’t the end of the world even if it were common for 4 hours a year

            Why aren’t you blaming the bank for not having redundancy outside a single DC? How many banks do you know of that are successfully using other providers that have a higher SLO/SLA?

            • AmbitiousProcess (they/them)@piefed.social
              English
              arrow-up
              3
              ·
              2 days ago

              I can see why your account is marked with two red marks on PieFed for low reputation, because man do you come off confrontational.

              How many banks didn’t work? Which ones? You have a source?

              Search engines exist. Use them before acting as if I’m making shit up.

              The list of financial institutions that had issues, as far as I can tell from industry reporting and Downdetector graphs, is Navy Federal Credit Union (~15 million members), Truist (~15 million customers), Chime (~8-9 million customers), Venmo (~60 million users), Ally Bank (~10 million customers), and Lloyds Banking Group (~30 million customers).

              Assuming no overlap, that’s nearly 140 million people that lost banking and money transfer access.

              Sounds like you’re just trying to exaggerate around an edge case that frankly isn’t the end of the world even if it were common for 4 hours a year

              The outage lasted for 15 hours in some cases, due to many AWS services recovering after the outage, yet having a backlog to work through, which took many more hours. Many services also depend on AWS in a manner where AWS coming back online doesn’t instantaneously restart service. These systems are complex, and not every company that relied on them could instantly start back up the moment the main outage was resolved, let alone when many services were still marked as impacted for hours and hours later as they worked through their backlog.

              Why aren’t you blaming the bank for not having redundancy outside a single DC? How many banks do you know of that are successfully using other providers that have a higher SLO/SLA?

              I also blame them for not having additional redundancy. I blame both them for not having a fallback, and AWS for allowing such a major outage to happen. Shockingly, more than one party can be at fault.

            • jj4211@lemmy.world
              arrow-up
              4
              ·
              2 days ago

              I’m also skeptical that any payment processing networks were impacted. I would be surprised by that, but less so if people couldn’t manage their accounts online, which might have a similar effect. I’m not surprised at all if grocery stores or restaurants were significantly impacted. I know a lot of the apps were broken, and I could imagine someone used to apping everything leaving their cards at home and being unable to get lunch. There might be some aggressively “modern” establishments that are kiosk-only, and I could imagine them getting downed by an AWS outage.

              outside a single DC?

              I’m told that a lot of the companies did all the right things but still got taken down because some dependent Amazon services are tethered to that single DC and only Amazon has the power to change that.

              • CallMeAnAI@lemmy.world
                arrow-up
                2
                arrow-down
                2
                ·
                2 days ago

                I’ll wait for the final root cause but…

                We mitigated most of it by swapping to secondary DNS and completely taking out anything related to AWS DNS and services in us-east-1. If you didn’t have secondary DNS and were heavily reliant on AWS internal DNS, this might be what they experienced.
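
The "swap to secondary DNS" idea above can be sketched roughly as follows. This is only an illustration of ordered failover, not anyone's actual setup: `resolve_with_fallback`, `primary_dns`, and `secondary_dns` are hypothetical names, and the stand-in resolvers simulate a provider outage instead of querying real nameservers.

```python
from typing import Callable, Iterable, Optional

# Hypothetical resolver type: takes a hostname, returns an IP string or raises.
Resolver = Callable[[str], str]


def resolve_with_fallback(hostname: str, resolvers: Iterable[Resolver]) -> str:
    """Try each resolver in order and return the first successful answer."""
    last_error: Optional[Exception] = None
    for resolver in resolvers:
        try:
            return resolver(hostname)
        except Exception as exc:  # a real client would catch narrower errors
            last_error = exc
    raise RuntimeError(f"all resolvers failed for {hostname}") from last_error


def primary_dns(hostname: str) -> str:
    # Stand-in for a provider that is currently down.
    raise TimeoutError("primary DNS timed out")


def secondary_dns(hostname: str) -> str:
    # Stand-in for an independent secondary provider. The address comes from
    # the 192.0.2.0/24 documentation range, not from a real service.
    return "192.0.2.10"


answer = resolve_with_fallback("app.example.com", [primary_dns, secondary_dns])
print(answer)
```

The fallback only helps if the secondary provider shares no dependency with the primary one; if both sit behind the same failing service, every resolver in the list fails and the function raises.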

                • jj4211@lemmy.world
                  arrow-up
                  1
                  ·
                  2 days ago

                  I’m not familiar with AWS myself, but they seemed to be referencing something they vaguely characterized as ‘security infrastructure’, kind of as a handwaving for why they thought it made sense to be single point of failure because to enable distribution of it would somehow be insecure…

                  I frankly wasn’t interested in delving deeper, because that excuse sounds pretty stupid, but I’d be trying to get details I don’t personally need about something I probably shouldn’t be arguing about. I’ve gotten burned too much by someone championing something stupid ostensibly in the name of ‘security’ to try to sign up for another one of those arguments.

          • CallMeAnAI@lemmy.world
            arrow-up
            3
            arrow-down
            5
            ·
            2 days ago

            Sure that’s what I said.

            Go ahead to Rackspace, or SAP, I’m sure you’ll have a much more reliable experience. Or just run your own. I’m sure it’ll be easy peasy and super reliable.

        • balance8873@lemmy.myserv.one
          arrow-up
          2
          ·
          2 days ago

          Some of us have jobs. I mean I guess you have a job, but in your case losing network just means those pesky humans stop bothering you and go to a real therapist.

          • CallMeAnAI@lemmy.world
            arrow-up
            3
            arrow-down
            3
            ·
            2 days ago

            I’m a staff engineer who has been dealing with the results of SLAs before Amazon was an idea.

            God forbid I have a P0 where I have to message a bunch of non-technical directors that it’s AWS, not us. Much, much worse than having to figure out and then pull in the team that pushed whatever untested shit made its way into production on a Friday afternoon.

            Unless you’ve been responsible for a SaaS with SLAs in a B2B setting, I know more about the consequences of a provider outage than you.

            • balance8873@lemmy.myserv.one
              arrow-up
              4
              arrow-down
              2
              ·
              2 days ago

              I don’t know what you’re responding to but it doesn’t seem to be me. Either that or you forgot the username you picked for yourself in which case: whoosh

    • shalafi@lemmy.world
      English
      arrow-up
      12
      arrow-down
      1
      ·
      2 days ago

      No shit. I was DevOps at my last company and they were all in on AWS. In those 5 years we had one major outage. There was one other case of a particular service going down, forgot which one, but it mainly screwed DevOps and the db guys.

      You’re talking to a bunch of young people who hate Bezos and by extension AWS. They have no idea what the internet was like before.

      Personally I think the cost is outrageous, rather have my own hardware mirrored in geographically distant colos, but that doesn’t mean AWS isn’t amazing.

      • wolframhydroxide@sh.itjust.works
        arrow-up
        5
        ·
        2 days ago

        The problem is not that any outage occurred. This still happens often. Things just refuse to work sometimes. The issue is that SO MANY eggs were in ONE basket.

    • Phoenixz@lemmy.ca
      arrow-up
      12
      arrow-down
      2
      ·
      edit-2
      2 days ago

      Eeehh, you’re literally suggesting that AWS added to the general stability and dependability of the internet in general

      You have NO idea what you’re talking about

      The internet was designed to survive nuclear war (talk about being dependable), and the entire idea was (and should continue to be) that you don’t rely on a single point of failure. Traffic should automatically route around dead nodes so that everything continues to flow. Decentralisation is key.

      But of course with companies being companies and corporate doing what corporate does best (enshittify everything so that we make more monies) everything got centralized.
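
As a toy illustration of "route around dead nodes", here is a minimal sketch using breadth-first search over a small made-up graph. The node names and the `find_route` helper are hypothetical, and real internet routing (BGP and friends) is far more involved than this.

```python
from collections import deque


def find_route(links, source, target, dead=frozenset()):
    """Breadth-first search for a shortest path that avoids dead nodes."""
    if source in dead or target in dead:
        return None
    previous = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the predecessor chain back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in links.get(node, ()):
            if neighbour not in dead and neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None  # no surviving path


# Hypothetical four-node network with two independent paths from A to D.
links = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C"],
}

print(find_route(links, "A", "D"))                   # via B
print(find_route(links, "A", "D", dead={"B"}))       # rerouted via C
print(find_route(links, "A", "D", dead={"B", "C"}))  # None: no path left
```

With two independent paths, losing one node still leaves a route. Once every path runs through the same node, that node is the single point of failure the comment is describing.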

      • shalafi@lemmy.world
        English
        arrow-up
        5
        arrow-down
        1
        ·
        2 days ago

        The centralization is an issue, but AWS is stable as hell. When I was first in IT, tech support, I had to explain to customers daily that, “No, your internet is fine, it’s just that particular website that’s down.”

        And the centralization wouldn’t be a thing if AWS didn’t route all IAM services through us-east-1. My Lightsail in us-west-1 was fine yesterday.

        • Phoenixz@lemmy.ca
          arrow-up
          3
          arrow-down
          1
          ·
          2 days ago

          So your argument is that AWS centralization is good because Amazon is a good provider?

          You do understand that there are loads of providers out there that are perfectly stable, but that are not Amazon?

          I’ve never used it because I know how to manage a server, something you might want to expect from IT personnel that does development for companies, but these days let’s just ask Amazon to do it for us, we’re too lazy

          • 1984@lemmy.today
            arrow-up
            1
            arrow-down
            1
            ·
            edit-2
            2 days ago

            Companies need a hell of a lot more than virtual machines today. I don’t use it personally either, but would I recommend a company buy their own hardware? No. I would say they should use AWS because they can afford it and it gives them access to hundreds of services. It’s rare to see technical issues.

            The value for a company is actually enormous, to have something like that at their fingertips.

            Today’s downtime will be forgotten in a few days, and it was a big one.

            • Phoenixz@lemmy.ca
              arrow-up
              2
              ·
              2 days ago

              Who says companies need to buy their own hardware?

              We have datacenters for that, you rent the hardware one way or the other.

              I’m saying that nobody should put all their eggs in one basket because if that basket breaks, you’re all fucked.

              If you have the need for high availability then you don’t put all your servers in a single datacenter, or with a single provider

              If everyone and their mother is with one provider, you’ll first notice that said provider gets expensive pretty quick and you’ll also notice that when shit goes down that half the fucking internet follows.

              My services weren’t down, and never have been. I don’t use AWS, because I don’t need it
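
The "eggs in one basket" point can be put into rough numbers. The sketch below assumes that provider failures are fully independent, which is a strong assumption that shared dependencies can break; the 99.9% figures are illustrative, not any vendor's published numbers, and `combined_availability` is a hypothetical helper.

```python
def combined_availability(availabilities):
    """Availability of a service that is up if at least one provider is up.

    Assumes provider failures are statistically independent.
    """
    all_down = 1.0
    for availability in availabilities:
        all_down *= 1.0 - availability
    return 1.0 - all_down


single = combined_availability([0.999])
dual = combined_availability([0.999, 0.999])

print(f"one provider:  {single:.6f}")
print(f"two providers: {dual:.6f}")
```

Under the independence assumption, two 99.9% providers give roughly 99.9999%. If both providers lean on the same upstream service, the real figure falls back toward that shared component's availability.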

      • CallMeAnAI@lemmy.world
        arrow-up
        5
        arrow-down
        3
        ·
        edit-2
        2 days ago

        Yeah, Rackspace was killing it! Sites NEVER went down, especially under dynamic load. Never.

        • Phoenixz@lemmy.ca
          arrow-up
          6
          ·
          2 days ago

          You understand that that has nothing to do with this? So there are shitty providers out there, find a good one that is not “just amazon”

    • mitram@lemmy.pt
      arrow-up
      14
      arrow-down
      1
      ·
      2 days ago

      It’s pretty funny to argue in favour of centralised services in a decentralised platform

      • CallMeAnAI@lemmy.world
        arrow-up
        3
        arrow-down
        12
        ·
        2 days ago

        I never argued that. I provided the reality of what they did. I’m sorry the reality doesn’t align with how you think things should be.

        You think everyone trying to make money is just stupid and has ignored some super reliable and cheap hosting because they want to gobble bezos cock? No, they solved challenging problems and made it a lot easier to stand up a reliable app.

        • balance8873@lemmy.myserv.one
          arrow-up
          7
          arrow-down
          1
          ·
          2 days ago

          Very unique take on how businesses make technical decisions. I’ve never heard of anyone describe the decision making process as logical before. Or even grounded in facts.

        • mitram@lemmy.pt
          arrow-up
          2
          ·
          2 days ago

          I feel that you are very jaded over this subject; I truly felt it was a funny situation. No judgement from me

          Yes, AWS has a lot of advantages and I do believe they usually provide a reliable service, but as with all centralised services, when they go down a bunch of other stuff goes with them, and that should be avoided. Doesn’t make all the incredible engineers currently working at AWS stupid

      • CallMeAnAI@lemmy.world
        arrow-up
        4
        arrow-down
        2
        ·
        2 days ago

        Nah. I haven’t had a service we use miss an SLA or cost more than its SLO budget in 2 years.

        What specific services have they missed your SLA on, and what incidents were they tied to? I understand that not every team has a guy to monitor that stuff and bitch for credits, but I do, and AWS is one of our most reliable vendors.

        Look the fact that AWS, Azure, and more recently Google are the only choices sucks.

        But the reality is most companies and projects don’t have the business case to justify multi region fail over much less vendor fail over. They are all built on single points of failures and will always have outages.

        Everyone just notices it more when it’s AWS. And that’s a stupid reason to base decisions off of. Visa/MC was working. Reddit and Facebook were mostly working once they started routing through their multi-cloud nodes. Maybe you couldn’t get to your bank’s web app; that’s on them for using a single cloud with no way to route to alternate cloud nodes and services. And for them to double their infrastructure costs at best, unless they are BoA, Chase, Morgan, etc., is dumb for 99.99%, which is the SLA.

        The world isn’t ending, emergency services are working, visa/mc failed over, I was still on Reddit and slack most of the day. It wasn’t the end of the world.

        Anyway, I now realize I have summoned my frustrations with this entire thread and gone wildly off topic and ranted with full force at you.

        I just don’t think it’s that important when there is a major outage on AWS/Azure/Cloudflare. It was going to happen elsewhere, and you wouldn’t have an excuse to tell your PM “not my problem”; instead you’d be digging into your app for 2 hours to find out x portion of your very distributed vendor list failed and you still have a single point of failure. I’d rather be able to point to AWS, say shit is fucked for everyone, and if you want multi-cloud it’s going to cost at least 1.5x as much as we’re spending 🤷‍♂️.
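
For a sense of scale on the SLA figures thrown around in this thread (99.99%, "4 hours a year", a 15-hour outage), here is a small back-of-the-envelope sketch. It measures over a full calendar year for simplicity, while real SLAs are often defined per service and per billing period, so treat the numbers as rough; both helper names are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def allowed_downtime_hours(sla_percent, period_hours=HOURS_PER_YEAR):
    """Downtime budget implied by an availability target over a period."""
    return period_hours * (1 - sla_percent / 100)


def achieved_availability_percent(downtime_hours, period_hours=HOURS_PER_YEAR):
    """Availability actually delivered given total downtime in the period."""
    return 100 * (1 - downtime_hours / period_hours)


budget_minutes = allowed_downtime_hours(99.99) * 60
print(f"99.99% over a year allows about {budget_minutes:.1f} minutes of downtime")
print(f"4 hours down  -> {achieved_availability_percent(4):.3f}% for the year")
print(f"15 hours down -> {achieved_availability_percent(15):.3f}% for the year")
```

On a yearly basis, 99.99% leaves a budget of a little under an hour, so a single multi-hour outage overshoots it. Whether that counts as a contractual SLA miss depends on how the specific agreement defines the service and the measurement window.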

        • Cousin Mose@lemmy.hogru.ch
          arrow-up
          2
          ·
          edit-2
          2 days ago

          I haven’t used AWS in years. No IPv6 support in S3 in 2017 was the last straw for me. I have to deal with it at work (sometimes) and always laugh when they introduce “new” features like HTTPS records in Route53 like two years late.

          Why do you say AWS, Azure and Google are the only options? I don’t use any of those greedy companies’ platforms.