• vermaterc@lemmy.ml
    link
    fedilink
    English
    arrow-up
    7
    arrow-down
    3
    ·
    4 hours ago

    So are we assuming here that LLMs won’t become more efficient over time? GPT-3 has been a frontier model just a few years ago and it’s performance blew everyone’s mind at that time. I can now run equivalent LLM on my personal computer. Why can’t we expect that after a few years Claude Sonnet level of capability won’t be possible to accomplish locally?

    • ag10n@lemmy.world
      link
      fedilink
      English
      arrow-up
      5
      ·
      4 hours ago

      What’s the cost of the compute you have to run something locally?

      Majority of people don’t have 32G of vram to run something remotely as capable

      • MrQuallzin@pie.eyeofthestorm.place
        link
        fedilink
        English
        arrow-up
        8
        arrow-down
        1
        ·
        3 hours ago

        I’ve got an old 1060ti in my server. Ollama shares it with just a couple other containers. Electricity here is majority hydro with some natural gas, $0.08/kWh.

        It’s a little slow, but I can comfortably run qwen3:14b. Of course that’s not all done on the GPU, a large part is offloaded to server ram (generally 32GB available so more than enough headroom)

        My server and my gaming PC combined last month came out to $13.32

        • ag10n@lemmy.world
          link
          fedilink
          English
          arrow-up
          3
          ·
          edit-2
          3 hours ago

          How does that compare to closed models that Anthropic offers, at the context and scale they offer.

          I run Qwen3.6 27B locally and it’s usable with 16G vram but still not the same as a data centre of Blackwell clusters.

        • ag10n@lemmy.world
          link
          fedilink
          English
          arrow-up
          1
          ·
          3 hours ago

          Describe greased lightning, because it’s much slower and needs to handle compression for context

          We’re moving in that direction but an M5 is not what the majority of people are running at home

          • greyscale@lemmy.grey.ooo
            link
            fedilink
            English
            arrow-up
            1
            ·
            37 minutes ago

            I dunno man, I’m not a slopjockey so I don’t know the minutiae of the addiction.

            All of our devs appear to have M5s right now. All of those copilot+ laptops have NPUs too.

            • ag10n@lemmy.world
              link
              fedilink
              English
              arrow-up
              1
              arrow-down
              1
              ·
              15 minutes ago

              Your company has bought you the latest and greatest and likely supports commercial token usage too

              You can’t compare LLMs at scale to running it locally; same experience and capabilities

              • greyscale@lemmy.grey.ooo
                link
                fedilink
                English
                arrow-up
                1
                ·
                6 minutes ago

                “Latest and greatest” my fucking sides lmao

                My company gave me some US shitware and I’ve got some local shitware instead.

                If you can’t make that work and are dependent on the teat of the slopgenerators, that’s a skill issue on you, buddy.

      • blackbeans@lemmy.zip
        link
        fedilink
        English
        arrow-up
        3
        arrow-down
        2
        ·
        edit-2
        3 hours ago

        I remember my computer not being fast enough to even play an MP3 file. Two years later, my computer was capable of running 3D accelerated games, browsing the internet at broadband speeds and playing videos.

        Sometimes technology advances fast. We could be entering such an era as there are major investments taking place and global competitors will rise to the occasion to market these to a broader audience.

        I think it will be entirely possible for consumers to use a decent LLM on their computer in a few years time.

        • ag10n@lemmy.world
          link
          fedilink
          English
          arrow-up
          3
          arrow-down
          1
          ·
          3 hours ago

          It’s not the 90s anymore. Unless there’s a compression algorithm putting billions of relationships into a manageable size, local AI is highly specific under 8G vram (text-to-speech as an example is under 1G) let alone the context required for keeping a conversation or writing code.

          • blackbeans@lemmy.zip
            link
            fedilink
            English
            arrow-up
            1
            arrow-down
            2
            ·
            3 hours ago

            To be clear, I wasn’t talking about a leap in LLM design. I was talking about a leap in hardware capabilities…

            • KRAW@linux.community
              link
              fedilink
              English
              arrow-up
              2
              ·
              1 hour ago

              Improved hardware capabilities used to come very quickly (see Moore’s Law and Dennard Scaling). However that trend is basically over, so getting higher performance hardware takes a lot of effort to make hardware specialized for certain tasks. That’s why you see there inference accelerators like Groq, SambaNova, Cerebrus, etc. However this is hardware that still is gonna go into data centers. Something innovative has to happen on the AI side for commercial-grade models to be runnable on consumer hardware.

            • ag10n@lemmy.world
              link
              fedilink
              English
              arrow-up
              2
              ·
              3 hours ago

              Which are increasingly out of reach for a normal person. Phones let alone PC hardware have increased exponentially in recent history

    • greyscale@lemmy.grey.ooo
      link
      fedilink
      English
      arrow-up
      4
      arrow-down
      1
      ·
      4 hours ago

      It already happened, small language models are busy dragging their nutsack on frontier models, running on a macbook and costing nothing

      Where’s the fucking product, Sam?

    • Imperious_melange@lemmy.world
      link
      fedilink
      English
      arrow-up
      4
      arrow-down
      3
      ·
      3 hours ago

      I’m still pretty new to Lemmy and the fediverse although I really enjoy it. I’ve noticed some strong dislike of anything and everything AI to the point I think it’s clouding some peoples ability to really see the situation at hand. That said I get a lot of people skepticism, a lot of AI projects are nonsense and things have been over promised. On top of that there’s the more than problematic issue of data centers and the environment. I think people don’t fully grasp how insane some of the achievements of neural nets are, how fast it’s developing, that having models that pretty much pass the Turing test was pure sci-fi just a few years ago, much less are solving legitimate mathematical conjectures as well as other hard problems in science.

      • Jakeroxs@sh.itjust.works
        link
        fedilink
        English
        arrow-up
        2
        arrow-down
        2
        ·
        3 hours ago

        A large majority definitely hate it to the point of having blinders on for sure.

        On one side you have corpo hype/lies, and the other is LLM is slop garbage and terrible for anything, also developers wrote perfect code before LLMs and now everything that breaks is AI slop caused.

        • Imperious_melange@lemmy.world
          link
          fedilink
          English
          arrow-up
          1
          ·
          2 hours ago

          As a programmer I can assure you there were plenty of bugs before AI and not all bugs now are AI caused. That said yeah we’re in the awkward teen years of AI. From 2015 to 2020 was like the baby years and people going “omg that’s amazing” and right now we’re in the “I hate everything and everyone” phase and it will emerge into either “omg the world is ending” or “this is utopia” or “alright thing is damn useful for good and bad endeavours”.

    • givesomefucks@lemmy.world
      link
      fedilink
      English
      arrow-up
      1
      arrow-down
      2
      ·
      3 hours ago

      Why can’t we expect that after a few years Claude Sonnet level of capability won’t be possible to accomplish locally?

      Because when you’re old enough to remember what AIM chat it’s could do 25 years ago, it stops being impressive what today’s chatbots can do…

      It’s seems “new” because everyone hated it and it was just a novelty back then.

      But if you read up on them, they did 90% of what modern ones do. And if they had access to today’s computing, the only explanation for why they still suck so much, is that no one has ever wanted them.

      The oligarchs just decided it didn’t matter

      • unpossum@sh.itjust.works
        link
        fedilink
        English
        arrow-up
        1
        ·
        1 hour ago

        Because when you’re old enough to remember what AIM chat it’s could do 25 years ago, it stops being impressive what today’s chatbots can do…

        C’mon, that’s just silly.