Another reason to self host your own AI

SuspiciousCarrot78@aussie.zone · 1 day ago

Another reason to self host your own AI

Auli@lemmy.ca · 6 hours ago

Sure, but all these self-hosted AIs are still made by companies that used massive amounts of power and water to train them.

KatherinaReichelt@feddit.org · 5 hours ago

Which is an interesting dilemma: Those AIs are already trained. That power and water has already been used. If you use them, you will not pollute anything. But you may encourage those companies to train another AI.

GreenBottles@lemmy.world · 6 hours ago

P100s are dirt cheap on ebay fyi

SuspiciousCarrot78@aussie.zone · 4 hours ago

Huh - cheaper than the P40s (though less VRAM) but larger bandwidth due to HBM2. Good looking out

GreenBottles@lemmy.world · 13 minutes ago

They rip

sobchak@programming.dev · 6 hours ago

I think they know it’s a somewhat viable option, and that’s part of the reason they’re doing the hardware cartel/circlejerk thing.

pogmommy@lemmy.ml · 13 hours ago

My issue with the orphan-crushing machine isn’t only that it’s not in my children’s bedroom

brucethemoose@lemmy.world · 20 hours ago

Yeah.

It’s not even about efficiency, really, but independence from corporations, privacy, and principle. Kind of like Lemmy.

Noxy@pawb.social · 16 hours ago

not gonna self host bullshit that wastes resources and makes me dumber.

toor@lemmy.world · 14 hours ago

Me, looking at my Jellyfin server…

Oh. Ok.

Noxy@pawb.social · 13 hours ago

NO that makes you dumber in a GOOD WAY THO.

irmadlad@lemmy.world · 24 hours ago

‘People will buy intelligence from us on a meter’

We have governmental surveillance and we have surveillance capitalism. Surveillance capitalism works so well that governments are now very interested in the data it collects, which is alarming. Unfounded conspiracy theory: It’s probably one of the reasons that governments don’t seem interested in regulating AI. If I had the proper equipment to run AI entirely locally, and efficiently enough that the expenditure would be justified, I would.

SuspiciousCarrot78@aussie.zone · 15 hours ago

You probably could. A Tesla P4 or P40 (old data centre cards) is more than up to the job. My Lenovo Tiny hosts a P4 (card cost $100 on eBay; the Lenovo itself was $200ish) and runs Qwen3.5-35B-A3B at about 20 tok/s. Smaller models are even faster.

https://www.youtube.com/watch?v=8F_5pdcD3HY

If you’re not bound by the one liter shoebox design, then the P40 is still a great and inexpensive card.

I think I mentioned elsewhere but right now I’m trying to figure out if I can use a magic packet from the Raspberry Pi to wake up the Lenovo as needed rather than leaving it on all the time.

irmadlad@lemmy.world · 8 hours ago

Thing is, if I were going to do in house AI, I’d want to do it up right and from what I can gather, a system like that is going to cost me some jack.

klangcola@reddthat.com · 15 hours ago

If you’re already using node-red, the Wake On Lan node works well, and with node-red it’s easy to trigger the magic packet based on whatever trigger condition you want.

The only limitation I know is WOL doesn’t work after a power outage, because the switch and RPI don’t know where to find the target machine

Thanks for the tips on reusable enterprise cards btw

WhyJiffie@sh.itjust.works · 4 hours ago

The only limitation I know is WOL doesn’t work after a power outage, because the switch and RPI don’t know where to find the target machine

Maybe, but the Pi does not need to know that, only the MAC address and the interface. The switch doesn’t need to know either, because it’s a broadcast frame, so it’s forwarded out of every port. The problem sometimes is that if you configure WOL from Linux, the network adapter will probably forget, after a power cycle, that it is supposed to react to magic packets. I think not all hardware is susceptible to that, but even then it could help to configure WOL in the BIOS.
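For what it’s worth, here is a minimal Python sketch of what the Pi would send, assuming the standard magic packet layout (six 0xFF bytes followed by the target MAC repeated 16 times) delivered as a UDP broadcast. The function names are made up for illustration, the MAC in the comment is a placeholder, and the broadcast address and port 9 are common defaults rather than anything specific to this setup.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Build the 102-byte payload: 6 x 0xFF, then the MAC repeated 16 times."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"expected a 6-byte MAC address, got {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str,
                      broadcast: str = "255.255.255.255",
                      port: int = 9) -> None:
    """Send the magic packet as a UDP broadcast (address and port are assumed defaults)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Broadcasting must be enabled explicitly on the socket.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))


# Example (placeholder MAC, not a real machine):
# send_magic_packet("00:11:22:33:44:55")
```

Note the sender only needs the MAC. If the Pi has more than one interface, passing the subnet’s own broadcast address instead of 255.255.255.255 may be necessary.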

klangcola@reddthat.com · 2 hours ago

Maybe something else is going on then, but I’ve never gotten WOL to work after a blackout when there are two switches between sender and receiver. After powering up the receiver once, WOL works again.

SuspiciousCarrot78@aussie.zone · 14 hours ago

Good tips - thanks!

PS: sad to report the 24GB Tesla P40s are now around $250 USD on eBay, so not quite as cheap as I remembered. P4s are still cheap, though frankly if you’re going that end of town, a 1080 is about on par, less fussy and probably cheaper - it just won’t fit in a uSFF.

superglue@lemmy.dbzer0.com · 20 hours ago

Does anyone have a recommendation for a local model that can run well on a 5070 12GB? It pretty much would only get used for help with homelabbing and simple scripts.

brucethemoose@lemmy.world · 13 hours ago

Depends on how much CPU RAM you have, and how fast it is.

As others said, Qwen 35B at the very least. But you can get better models with more CPU RAM.

superglue@lemmy.dbzer0.com · 7 hours ago

I’ve got 32GB DDR5 6000MHz

brucethemoose@lemmy.world · 3 hours ago

Probably Qwen 35B then. ~9GB free VRAM + (let’s say) ~16GB of free CPU RAM is a good size for that, and squeezing bigger models in would be hard unless it’s a headless Linux server.
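A rough way to sanity-check that sizing, as a sketch only: weight size is roughly parameters times bits per weight divided by 8. The 4.5 bits per weight (ballpark for a 4-bit-ish quant) and the 2GB allowance for context and runtime overhead are my assumptions, not measured numbers, and the helper names are made up.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate quantized weight size in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         vram_gb: float, ram_gb: float, overhead_gb: float = 2.0) -> bool:
    """True if weights plus an assumed overhead fit in free VRAM + free RAM combined."""
    needed = weights_gb(params_billion, bits_per_weight) + overhead_gb
    return needed <= vram_gb + ram_gb


# 35B at an assumed 4.5 bits/weight against ~9GB VRAM + ~16GB RAM
print(weights_gb(35, 4.5))      # 19.6875
print(fits(35, 4.5, 9, 16))     # True
print(fits(70, 4.5, 9, 16))     # False
```

So under those assumptions a 35B model lands around 20GB and fits the ~25GB budget with a little room, while something twice the size clearly does not.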

SuspiciousCarrot78@aussie.zone · 16 hours ago

There’s an argument to be had regarding a MoE versus a small dense model. I guess it depends on what exactly you need doing with it. I would be tempted to run a smaller dense model (like a Qwen 3-14B or a Qwen 3.5 9B) since, at a reasonable quant, it might fit mostly or entirely on the GPU, thereby giving you excellent speeds.

PS: I’m actually in the process of designing an expert system (not an LLM) for pretty much the task you described. The intention is that you would still interact with it like a large language model, but the actual brains underneath it would be something more traditional.

brucethemoose@lemmy.world · 3 hours ago

MoEs can be very fast with hybrid inference. I run Xiaomi Mimo 2.5 (a 310B model, 116GB weights) on my single 3090 + 7800 CPU, and it outputs faster than I can read it.

It’s also easier to fit long context, if you need that.

It’s best to use the ik_llama.cpp fork for that, though. It gives a huge boost to hybrid MoE speeds.
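Back-of-envelope on the 310B / 116GB figures above, assuming decimal GB and that the 116GB counts weights only (both assumptions on my part): that works out to roughly 3 bits per weight on average.

```python
def bits_per_weight(size_gb: float, params_billion: float) -> float:
    """Average bits per weight implied by a file size in GB (1e9 bytes) and a parameter count."""
    return size_gb * 8 / params_billion


print(round(bits_per_weight(116, 310), 2))  # 2.99
```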

monoboy@lemmy.zip · 19 hours ago

Qwen 3.6-35B-A3B (which OP mentioned) would work great as long as you have some system RAM to offload it to.

commander@lemmy.world · 22 hours ago

Altman can try to hype up how everyone’s going to subscribe to them someday, all the while their subscriber base is being eaten up by competitors.

https://www.wheresyoured.at/openai-projects-chatgpt-plus-subscriptions-to-drop-by-80-from-44-million-in-2025-to-9-million-in-2026-made-up-using-cheaper-subscriptions-somehow/

Local stuff: I still believe the small, free, local models (~1B parameters) will suffice for the vast majority of how people use LLMs, and there are still going to be a few years of improvements there until investments dry up. Eventually I bet more and more phone companies will include one of these small ones out of the box. Pretty much like a nice search engine that works offline, like if you’re out on a major hike. Cloud stuff: there’ll be things like Proton’s Lumo, where they’re taking free open-weight stuff and piecing it together for users.

OpenAI’s thing is they’ll make up for falling subscribers with advertising. So pretty much we’re advancing fast through the search engine race of the 90s/early aughts. We’ll at least have Gemini. ChatGPT maybe ends up crashing in value someday and gets bought up by Microsoft or some other company. Deepseek, Qwen, Kimi. Claude, like ChatGPT, maybe survives, or crashes and gets absorbed by another company. Proton continues to exist as the company making AI products out of free stuff. Eventually the pace of improvements slows to a crawl and it’s pointless to be paying for the best paywalled stuff. Just use the free stuff, like how everyone mostly uses free search engines.

SuspiciousCarrot78@aussie.zone · 15 hours ago

Agree. And re small models - very agree. In fact I made an ablated version of Qwen 3.5-2B for use with my Pi, before thinking a bit harder and realising I can probably code something bespoke that doesn’t need a stochastic parrot as a squawk box at all.

https://huggingface.co/BobbyLLM/polaris-heretic-Q4_K_M-GGUF

Still, as an SLM, it’s perfectly cromulent and does well with tool calling etc., which is what I wanted it for.

Hiro8811@lemmy.world · 21 hours ago

You’re still paying for electricity and a big part of the world is in an electricity crisis. “AI” has few real uses and LLMs are not one of them.

brucethemoose@lemmy.world · 20 hours ago

This is a “feel guilty about missing recycling” kind of complaint.

Having a server run for an hour or two (?) a day is negligible. You use more energy running a fridge, or leaving a few lights on, or browsing Lemmy for a while. Or running a docker container for other services. You release more greenhouse gasses eating beef, or driving anywhere, or even opening your front door a few times, and individual industries are going to use vastly more electricity than a few self hosters ever would. If you own an EV, you’ve probably blown out your entire zip code of self hosters.

…But if it still bothers you, you can find an ewaste smartphone(s) and host on that. This is actually a very neat use case IMO.

However, if you get to the homelab scale of “an EPYC + 3090s running all the time” that electricity use does start to add up. But that’s quite a rare hobbyist tier, I’d say, and it really shouldn’t be running 24/7.

litchralee@sh.itjust.works · 24 hours ago

I’d like to draw a comparison: a cozy wood fire versus central heating. In the right time and place (eg camping in the woods), a wood fire is both very practical and very useful. Meanwhile, most homes built in the past 70+ years in the USA have central heating (or are somewhere that doesn’t need heating at all) and the benefits are quite obvious: automatic temperature regulation, supplied by a utility, and low or no local emissions. And yet, there will still be rural homes that are heated exclusively by a wood stove, located in the middle of the living room, whose iron construction stores and radiates heat well after the fire has gone out.

Do I bemoan individual homes that use a wood fire? No, not really. The reality is that a grand, overwhelming majority of people don’t have wood fires anymore. Even when air quality is poor, prohibiting wood fires in a few rural homes isn’t exactly what would clear up the air.

Now, it would be a vastly different story if city-dwellers all had wood fires. When every home in a neighborhood is building and burning a wood fire, the results are disastrous: horrific PM2.5 in the air, soot coating everything, substantially reduced energy efficiency, and mass logging just to keep the wood supply. A mole-hill quickly becomes a mountain of problems when it’s at scale.

So to that end, I would very much like to see commercial-scale AI reined in, as the external costs have already gotten out of hand. What they have built is more correctly called a wildfire, not a wood fire. But where does that leave small-scale AI/LLM users? They can weigh the cost/benefits for themselves, provided that they don’t harm other people or resources in the process.

But that brings us back to a cozy wood fire versus central heating: at small scale, a wood fire struggles to heat an entire modern American home (ie 2500 sq ft; or 232 sq m). Yet central heating does it with ease. Who then will be interested in this endeavor? Probably only those with a love for the camping aesthetic, and other enthusiasts.

At this point, it has become more clear what the utility of small LLM models is, and they do pale in comparison to larger LLM models. If small LLMs are what sensibly survives into the future, then that’s essentially a cap on their capabilities, given a want to avoid burning the planet to run anything larger. The only way out would be for substantial developments in the energy efficiency of small LLM models, but that’s not where the interest is.

No one is seeking to build a more efficient wood fire.

pound_heap@lemmy.dbzer0.com · 16 hours ago

People are downvoting you, but I like your idea of drawing an analogy with heating, because it is something most of us rely on, and if LLMs and related technology keep evolving as they do, probably most of us will rely on them more or less, sooner or later. Regardless of what AI haters would say.

But your wood fire/central heating analogy is bad. I would compare large LLM vendors to the hot water heating utilities common in Eastern Europe, and small LLMs to various heating devices. Utility companies can set prices, decide who gets connected to the hot water pipe, and set the water temperature. There are regulations that limit the power of such utility companies, allow customers to choose the supplier, etc. The same should happen with LLM providers - competition and anti-monopoly laws should protect customers who choose to use them.

Alternatively, customers may choose not to use utility-supplied heating. They can purchase space heaters, hand warmers, install split systems, burn wood - they are free to pick the technology, power source, size, and appearance of such devices. They can take responsibility for heating their homes, willingly investing their time and money in order to be independent of the central heating utility. Small LLMs are like that - people can run their own, with capabilities dependent on investment, or they can pay smaller providers or resellers to get more flexibility and/or privacy and avoid capital investments. They could spend time tuning small models and harnesses to do some simple tasks, and they wouldn’t need to “buy intelligence” from OpenAI and others.

irmadlad@lemmy.world · 24 hours ago

(ie 2500 sq ft; or 232 sq m)

Damn, y’all livin’ lavvy.
