EndOfLine@lemmy.world · 1 day ago

Might not be efficient, but at least it... Uhhh, wait, what good does it provide again?

affenlehrer@feddit.org · 1 day ago

I hope analog hardware or some other trick will help us in the future to make at least local inference fast and low power.

fonix232@fedia.io · 24 hours ago

Local inference isn’t really the issue. Relatively low-power hardware can already manage a passable tokens-per-second rate on medium-to-large models (40b to 270b). Of course it won’t compare to an AWS Bedrock instance, but it is passable.

The reason you won’t get local AI systems, at least not completely, is the restrictive nature of the best models. Most of the actually good models are not open source. At best you’ll get a locally runnable GGUF, but not open weights, meaning re-training potential is lost. Not to mention that most of the good, usable solutions are complex interconnected systems, so you’re not just talking to an LLM but to a series of models chained together.

But that doesn’t mean that local inference (not hyperlocal, aka “always on your device”, but local to your LAN) is impossible or hard. I have a £400 node running 3-4b models at lightning speed at sub-100W (really sub-60W) power usage. For around £1500-2000 you can get a node that gets similar performance with 32-40b models. For about £4000, you can get a node that does the same with 120b models. Mind you, I’m talking about lightning-fast performance here, not passable.
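For a rough sense of what those model sizes mean in hardware terms, here is a back-of-envelope sketch of the memory needed for the weights alone. The 4-bit and 16-bit precisions are assumptions for illustration, not details of the nodes described above, and the estimate ignores the KV cache and runtime overhead, so real requirements are higher.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Model sizes taken from the thread; the precisions are assumed.
for size_b in (4, 40, 120):
    q4 = weight_memory_gb(size_b, 4)     # assumed 4-bit quantization
    fp16 = weight_memory_gb(size_b, 16)  # assumed 16-bit weights
    print(f"{size_b}b: ~{q4:.0f} GB at 4-bit, ~{fp16:.0f} GB at 16-bit")
```

Under these assumptions a 120b model needs roughly 60 GB for the weights at 4-bit, which is one way to see why larger models call for nodes with far more memory.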

affenlehrer@feddit.org · edit-2 10 hours ago

At least for me, the small 4-8b models turned out to be pretty useless: extremely prone to hallucinations, not good at multiple languages and, worst of all, still pretty slow on my machine.

I tried to create a simple note-taking agent with just file I/O tools available. Without reasoning, the models fucked up even the simplest tasks in very creative ways, and with reasoning they thought about it for 7 before finally doing it.
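To make "just file I/O tools" concrete, here is a minimal sketch of what such a tool set might look like. The class and method names (`NoteTools`, `write_note`, `read_note`, `list_notes`) are hypothetical and not taken from the setup described above; the model call and tool-dispatch loop are left out.

```python
from pathlib import Path


class NoteTools:
    """Hypothetical file I/O tools for a note-taking agent.

    Every path is confined to a single notes directory.
    """

    def __init__(self, root: str) -> None:
        self.root = Path(root).resolve()
        self.root.mkdir(parents=True, exist_ok=True)

    def _safe(self, name: str) -> Path:
        # Resolve the requested path and refuse anything outside the root.
        path = (self.root / name).resolve()
        if not path.is_relative_to(self.root):  # Python 3.9+
            raise ValueError(f"path escapes notes directory: {name}")
        return path

    def write_note(self, name: str, text: str) -> str:
        path = self._safe(name)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text, encoding="utf-8")
        return f"wrote {len(text)} characters to {name}"

    def read_note(self, name: str) -> str:
        return self._safe(name).read_text(encoding="utf-8")

    def list_notes(self) -> list[str]:
        # Relative paths of all files under the notes directory, sorted.
        return sorted(
            p.relative_to(self.root).as_posix()
            for p in self.root.rglob("*")
            if p.is_file()
        )
```

Keeping the tools this small and rejecting paths outside the notes directory limits how much damage a confused model can do when it calls them with bad arguments.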

The larger ones require pretty power-hungry and/or expensive hardware.

I hope for analog hardware to change this.