Recent post re: AI as utility

https://www.tomsguide.com/ai/people-will-buy-intelligence-from-us-on-a-meter-chatgpts-ceo-sam-altman-has-critics-worried-with-his-ai-vision

Myself, I’m a fan of local LLMs / self-hosted ML… but if you ever needed a clarion call that a hard pivot is coming (soon) for online/cloud-based AI… Altman et al. are making some concerning mouth noises (to say nothing of broader concerns with OAI, Anthropic, etc.).

Right now, I’m sketching out a plan where my Raspberry Pi (always on, 2-3 W) sends a magic packet to wake up my modest AI server (Lenovo P330 with Tesla P4) if/when needed (Qwen 3.6-35B-A3B); no point in chugging down 80-100 W, 24/7, for no good reason.
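
For anyone wanting to try the same thing, here is a minimal sketch of the magic-packet part in plain Python, which should run fine on a Pi. A Wake-on-LAN magic packet is just six 0xFF bytes followed by the target's MAC address repeated 16 times, usually sent as a UDP broadcast. The function names, the MAC address in the usage comment, and the choice of UDP port 9 are my own assumptions for illustration, not details from the setup described above.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Build the 102-byte Wake-on-LAN payload for a MAC like 'aa:bb:cc:dd:ee:ff'."""
    # Accept ':' or '-' separators; bytes.fromhex raises ValueError on bad hex.
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError(f"expected a 6-byte MAC address, got {mac!r}")
    # Six 0xFF bytes, then the MAC repeated 16 times.
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet over UDP (port 9 is a common convention, assumed here)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Broadcasting must be enabled explicitly on the socket.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))


# Hypothetical usage (placeholder MAC, replace with the server's own):
# send_magic_packet("aa:bb:cc:dd:ee:ff")
```

The target machine still needs Wake-on-LAN enabled on its network adapter (and ideally in the BIOS) for the packet to do anything.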

If the trend continues in the direction it appears to be heading (increasing costs, environmental impacts, etc.), then I’d feel a lot better hosting my own as the first port of call and replacing simpler tasks with more traditional programs. YMMV.
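
To put rough numbers on the always-on versus wake-on-demand question, here is a back-of-envelope sketch using the power figures from the post (2-3 W for the Pi, 80-100 W for the server). The midpoints, the 2 hours per day of on-demand use, and ignoring the server's standby draw while asleep are all assumptions for illustration.

```python
def annual_kwh(watts: float, hours_per_day: float = 24.0) -> float:
    """Energy used in a year, in kWh, for a device drawing `watts` for `hours_per_day`."""
    return watts * hours_per_day * 365 / 1000


pi_always_on = annual_kwh(2.5)                      # midpoint of 2-3 W, running 24/7
server_always_on = annual_kwh(90)                   # midpoint of 80-100 W, running 24/7
server_on_demand = annual_kwh(90, hours_per_day=2)  # assumed 2 h/day of actual use

print(f"Pi, always on:        {pi_always_on:.1f} kWh/year")
print(f"Server, always on:    {server_always_on:.1f} kWh/year")
print(f"Server, on demand:    {server_on_demand:.1f} kWh/year")
print(f"Pi + on-demand total: {pi_always_on + server_on_demand:.1f} kWh/year")
```

Under those assumptions the Pi plus an on-demand server comes to roughly a tenth of what the server alone would use if left running all the time.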

  • Auli@lemmy.ca
    English
    arrow-up
    3
    ·
    6 hours ago

    Sure, but all these self-hosted AIs are still made by companies that used massive amounts of power and water to train them.

    • KatherinaReichelt@feddit.org
      English
      arrow-up
      6
      ·
      5 hours ago

      Which is an interesting dilemma: those AIs are already trained. That power and water has already been used. If you use them, you will not pollute anything. But you may encourage those companies to train another AI.

  • sobchak@programming.dev
    English
    arrow-up
    1
    ·
    6 hours ago

    I think they know it’s a somewhat viable option, and that’s part of the reason they’re doing the hardware cartel/circlejerk thing.

  • pogmommy@lemmy.ml
    English
    arrow-up
    12
    arrow-down
    5
    ·
    13 hours ago

    My issue with the orphan-crushing machine isn’t only that it’s not in my children’s bedroom

  • brucethemoose@lemmy.world
    English
    arrow-up
    33
    arrow-down
    1
    ·
    edit-2
    20 hours ago

    Yeah.

    It’s not even about efficiency, really, but independence from corporations, privacy, and principle. Kind of like Lemmy.

  • Noxy@pawb.social
    English
    arrow-up
    17
    arrow-down
    12
    ·
    16 hours ago

    not gonna self host bullshit that wastes resources and makes me dumber.

  • irmadlad@lemmy.world
    English
    arrow-up
    14
    arrow-down
    1
    ·
    24 hours ago

    ‘People will buy intelligence from us on a meter’

    We have governmental surveillance and we have surveillance capitalism. Surveillance capitalism works so well that governments are now very interested in the data it collects, which is alarming. Unfounded conspiracy theory: it’s probably one of the reasons that governments don’t seem interested in regulating AI. If I had the proper equipment to run AI entirely locally, and efficiently enough that the expenditure would be justified, I would.

    • SuspiciousCarrot78@aussie.zoneOP
      English
      arrow-up
      2
      ·
      15 hours ago

      You probably could. A Tesla P4 or P40 (old data centre cards) is more than up to the job. My Lenovo Tiny hosts a P4 (the card cost $100 on eBay; the Lenovo itself was $200ish) and runs Qwen3.5-35B-A3B at about 20 tok/s. Smaller models are even faster.

      https://www.youtube.com/watch?v=8F_5pdcD3HY

      If you’re not bound by the one liter shoebox design, then the P40 is still a great and inexpensive card.

      I think I mentioned elsewhere but right now I’m trying to figure out if I can use a magic packet from the Raspberry Pi to wake up the Lenovo as needed rather than leaving it on all the time.

      • irmadlad@lemmy.world
        English
        arrow-up
        2
        ·
        8 hours ago

        Thing is, if I were going to do in house AI, I’d want to do it up right and from what I can gather, a system like that is going to cost me some jack.

      • klangcola@reddthat.com
        English
        arrow-up
        2
        ·
        edit-2
        15 hours ago

        If you’re already using node-red, the Wake On Lan node works well, and with node-red it’s easy to trigger the magic packet based on whatever trigger condition you want.

        The only limitation I know of is that WOL doesn’t work after a power outage, because the switch and RPi don’t know where to find the target machine.

        Thanks for the tips on reusable enterprise cards btw

        • WhyJiffie@sh.itjust.works
          English
          arrow-up
          2
          ·
          edit-2
          4 hours ago

          The only limitation I know of is that WOL doesn’t work after a power outage, because the switch and RPi don’t know where to find the target machine.

          Maybe, but the Pi does not need to know that, only the MAC address and the interface. The switch doesn’t need to know either, because it’s a broadcast frame: it’s forwarded out of every port. The problem sometimes is that if you configure WOL from Linux, the network adapter will probably forget on power cycling that it is supposed to react to magic packets. I think not all hardware is susceptible to that, but even then it could help to configure WOL in the BIOS.


          • klangcola@reddthat.com
            English
            arrow-up
            1
            ·
            2 hours ago

            Maybe something else is going on then, but I’ve never gotten WOL to work after a blackout when there are two switches between sender and receiver. After powering up the receiver once, WOL works again.

        • SuspiciousCarrot78@aussie.zoneOP
          English
          arrow-up
          1
          ·
          edit-2
          14 hours ago

          Good tips - thanks!

          PS: sad to report the 24GB Tesla P40s are now around $250 USD on eBay, so not quite as cheap as I remembered. P4s are still cheap, though frankly if you’re going that end of town, a 1080 is about on par, less fussy and probably cheaper; it just won’t fit in a uSFF.

  • superglue@lemmy.dbzer0.com
    English
    arrow-up
    4
    arrow-down
    1
    ·
    20 hours ago

    Does anyone have a recommendation for a local model that can run well on a 5070 12GB? It pretty much would only get used for help with homelabbing and simple scripts.

    • brucethemoose@lemmy.world
      English
      arrow-up
      4
      ·
      edit-2
      13 hours ago

      Depends on how much CPU RAM you have, and how fast it is.

      As others said, Qwen 35B at the very least. But you can get better models with more CPU RAM.

        • brucethemoose@lemmy.world
          English
          arrow-up
          2
          ·
          edit-2
          3 hours ago

          Probably Qwen 35B then. ~9GB free VRAM + (let’s say) ~16GB of free CPU RAM is a good size for that, and squeezing bigger models in would be hard unless it’s a headless linux server.

    • SuspiciousCarrot78@aussie.zoneOP
      English
      arrow-up
      5
      ·
      16 hours ago

      There’s an argument to be had regarding a MoE versus a small dense model. I guess it depends on what exactly you need doing with it. I would be tempted to run a smaller dense model (like a Qwen 3-14B or a Qwen 3.5 9B) as at a reasonable quant, it might fit mostly or entirely on the GPU, thereby giving you excellent speeds.

      PS: I’m actually in the process of designing an expert system (not an LLM) for pretty much the task you described. The intention is that you would still interact with it like a large language model, but the actual brains underneath it would be something more traditional.

      • brucethemoose@lemmy.world
        English
        arrow-up
        1
        ·
        edit-2
        3 hours ago

        MoEs can be very fast with hybrid inference. I run Xiaomi Mimo 2.5 (a 310B model, 116GB weights) on my single 3090 + 7800 CPU, and it outputs faster than I can read it.

        It’s also easier to fit long context, if you need that.

        It’s best to use the ik_llama.cpp fork for that, though. It gives a huge boost to hybrid MoE speeds.

    • monoboy@lemmy.zip
      English
      arrow-up
      6
      arrow-down
      1
      ·
      19 hours ago

      Qwen 3.6-35B-A3B (which OP mentioned) would work great as long as you have some system RAM to offload it.

  • commander@lemmy.world
    English
    arrow-up
    3
    arrow-down
    1
    ·
    22 hours ago

    Altman can try to hype up how everyone’s going to subscribe to them someday, all the while their subscriber base is being eaten up by competitors.

    https://www.wheresyoured.at/openai-projects-chatgpt-plus-subscriptions-to-drop-by-80-from-44-million-in-2025-to-9-million-in-2026-made-up-using-cheaper-subscriptions-somehow/

    Local stuff. I still believe the small-parameter (~1B), free, local ones will suffice for the vast majority of how people use LLMs, and there’s still going to be a few years of improvements there until investments dry up. Eventually I bet more and more phone companies will include one of these small ones out of the box. Pretty much like a nice search engine that works offline, like if you’re out on a major hike. Cloud stuff, there’ll be stuff like Proton’s Lumo, where they’re taking free open-weight stuff and piecing it together for users.

    OpenAI’s thing is they’ll make up for falling subscribers with advertising. So pretty much we’re advancing fast in the search engine race of the 90s/early aughts. We’ll at least have Gemini. ChatGPT maybe ends up crashing in value someday and gets bought up by Microsoft or some other company. Deepseek, Qwen, Kimi. Claude, like ChatGPT, maybe survives or crashes and gets absorbed by another company. Proton continues to exist as the company making AI products out of free stuff. Eventually the pace of improvement slows to a crawl and it’s pointless to be paying for the best paywalled stuff. Just use the free stuff, like how everyone mostly uses free search engines.

    • SuspiciousCarrot78@aussie.zoneOP
      English
      arrow-up
      4
      arrow-down
      1
      ·
      edit-2
      15 hours ago

      Agree. And re small models - very agree. In fact I made an ablated version of Qwen 3.5-2B for use with my Pi, before thinking a bit harder and realising I can probably code something bespoke that doesn’t need a stochastic parrot as a squawk box at all.

      https://huggingface.co/BobbyLLM/polaris-heretic-Q4_K_M-GGUF

      Still, as an SLM, it’s perfectly cromulent and does well with tool calling etc., which is what I wanted it for.

  • Hiro8811@lemmy.world
    English
    arrow-up
    10
    arrow-down
    10
    ·
    edit-2
    21 hours ago

    You’re still paying for electricity, and a big part of the world is in an electricity crisis. “AI” has few real uses and LLMs are not one of them.

    • brucethemoose@lemmy.world
      English
      arrow-up
      17
      arrow-down
      2
      ·
      edit-2
      20 hours ago

      This is a “feel guilty about missing recycling” kind of complaint.

      Having a server run for an hour or two (?) a day is negligible. You use more energy running a fridge, or leaving a few lights on, or browsing Lemmy for a while. Or running a docker container for other services. You release more greenhouse gasses eating beef, or driving anywhere, or even opening your front door a few times, and individual industries are going to use vastly more electricity than a few self hosters ever would. If you own an EV, you’ve probably blown out your entire zip code of self hosters.

      But if it still bothers you, you can find an ewaste smartphone(s) and host on that. This is actually a very neat use case IMO.


      However, if you get to the homelab scale of “an EPYC + 3090s running all the time”, that electricity use does start to add up. But that’s quite a rare hobbyist tier, I’d say, and it really shouldn’t be running 24/7.

  • litchralee@sh.itjust.works
    English
    arrow-up
    4
    arrow-down
    9
    ·
    24 hours ago

    I’d like to draw a comparison: a cozy wood fire versus central heating. In the right time and place (eg camping in the woods), a wood fire is both very practical and very useful. Meanwhile, most homes built in the past 70+ years in the USA have central heating (or are somewhere that doesn’t need heating at all) and the benefits are quite obvious: automatic temperature regulation, supplied by a utility, and low or no local emissions. And yet, there will still be rural homes that are heated exclusively by a wood stove, located in the middle of the living room, whose iron construction stores and radiates heat well after the fire has gone out.

    Do I bemoan individual homes that use a wood fire? No, not really. The reality is that a grand, overwhelming majority of people don’t have wood fires anymore. Even when air quality is poor, prohibiting wood fires in a few rural homes isn’t exactly what would clear up the air.

    Now, it would be a vastly different story if city-dwellers all had wood fires. When every home in a neighborhood is building and burning a wood fire, the results are disastrous: horrific PM2.5 in the air, soot coating everything, substantially reduced energy efficiency, and mass logging just to keep the wood supply. A mole-hill quickly becomes a mountain of problems when it’s at scale.

    So to that end, I would very much like to see commercial-scale AI reined in, as the external costs have already gotten out of hand. What they have built is more correctly called a wildfire, not a wood fire. But where does that leave small-scale AI/LLM users? They can weigh the costs and benefits for themselves, provided that they don’t harm other people or resources in the process.

    But that brings us back to a cozy wood fire versus central heating: at small scale, a wood fire struggles to heat an entire modern American home (ie 2500 sq ft; or 232 sq m). Yet central heating does it with ease. Who then will be interested in this endeavor? Probably only those with a love for the camping aesthetic, and other enthusiasts.

    At this point, it has become more clear what the utility of small LLM models is, and they do pale in comparison to larger LLM models. If small LLMs are what sensibly survives into the future, then that’s essentially a cap on their capabilities, given a want to avoid burning the planet to run anything larger. The only way out would be for substantial developments in the energy efficiency of small LLM models, but that’s not where the interest is.

    No one is seeking to build a more efficient wood fire.

    • pound_heap@lemmy.dbzer0.com
      English
      arrow-up
      4
      arrow-down
      1
      ·
      16 hours ago

      People are downvoting you, but I like your idea of drawing an analogy with heating, because it is something most of us rely on, and if LLMs and related technology keep evolving as they do, most of us will probably rely on them to some degree sooner or later. Regardless of what AI haters would say.

      But your wood fire/central heating analogy is bad. I would compare large LLM vendors to the hot-water heating utilities common in Eastern Europe, and small LLMs to various heating devices. Utility companies can set prices, decide who gets connected to the hot-water pipe, and set the water temperature. There are regulations that limit the power of such utility companies, allow customers to choose the supplier, etc. The same should happen with LLM providers: competition and anti-monopoly laws should protect customers who choose to use them.

      Alternatively, customers may choose not to use utility-supplied heating. They can purchase space heaters or hand warmers, install split systems, or burn wood; they are free to pick the technology, power source, size, and appearance of such devices. They can take responsibility for heating their homes, willing to invest their time and money in order to be independent of the central heating utility. Small LLMs are like that: people can run their own, with capabilities dependent on investment, or they can pay smaller providers or resellers to get more flexibility and/or privacy and avoid capital investments. They could spend time tuning small models and harnesses to do some simple tasks, and they wouldn’t need to “buy intelligence” from OpenAI and others.