• affenlehrer@feddit.org · 4 upvotes · 1 day ago

    I hope analog hardware or some other trick will eventually make at least local inference fast and low-power.

    • fonix232@fedia.io · 2 upvotes · 24 hours ago

      Local inference isn’t really the issue. Relatively low-power hardware can already manage passable tokens per second on medium to large models (40b to 270b). Of course it won’t compare to an AWS Bedrock instance, but it is passable.

      The reason you won’t get local AI systems, at least not completely, is the restrictive nature of the best models. Most of the actually good models are not open source. At best you’ll get a locally runnable GGUF, usually quantized, rather than the original full-precision weights, so much of the re-training potential is lost. On top of that, most of the good, usable solutions are complex interconnected systems, so you’re not just talking to an LLM but to a series of models chained together.

      But that doesn’t mean that local inference is impossible or hard (local to your LAN, that is, not hyperlocal, aka “always on your device”). I have a £400 node running 3-4b models at lightning speed, at sub-100W (really sub-60W) power usage. For around £1500-2000 you can get a node that gets similar performance with 32-40b models. For about £4000, you can get a node that does the same with 120b models. Mind you, I’m talking about lightning-fast performance here, not passable.
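
      For scale, the wattage figures above convert to daily energy roughly as follows. This is only a back-of-the-envelope sketch: it assumes the node draws its quoted ceiling (60 W or 100 W) around the clock, which is a worst case not stated above, and `daily_kwh` is just an illustrative helper name.

```python
def daily_kwh(watts, hours=24.0):
    """Energy in kWh for a device drawing `watts` continuously for `hours`."""
    return watts * hours / 1000.0


# Worst case: the node sits at its quoted ceiling all day.
for watts in (60, 100):
    print(f"{watts} W around the clock: {daily_kwh(watts):.2f} kWh/day")
```

      A node that idles between requests would land below these numbers.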

      • affenlehrer@feddit.org · 1 upvote · edited · 10 hours ago

        At least for me, the small 4-8b models turned out to be pretty useless: extremely prone to hallucinations, not good at multiple languages and, worst of all, still pretty slow on my machine.

        I tried to create a simple note-taking agent with just file I/O tools available. Without reasoning, the models fucked up even the simplest tasks in very creative ways, and with reasoning they thought about it for 7 before finally doing it.
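
        For readers wondering what "just file I/O tools" can look like, here is a minimal sketch. It is not the setup described above: the class `NoteTools`, the tool names `list_notes`, `read_note` and `write_note`, and the single notes directory are all hypothetical, and the model-facing side (prompting, parsing tool calls) is left out. It needs Python 3.9+ for `Path.is_relative_to`.

```python
from pathlib import Path


class NoteTools:
    """Hypothetical file-I/O tool set, confined to one notes directory."""

    def __init__(self, root):
        self.root = Path(root).resolve()
        self.root.mkdir(parents=True, exist_ok=True)

    def _resolve(self, name):
        # Reject paths that escape the notes directory (e.g. "../secrets").
        path = (self.root / name).resolve()
        if not path.is_relative_to(self.root):
            raise ValueError(f"path escapes notes directory: {name}")
        return path

    def list_notes(self):
        return sorted(p.name for p in self.root.iterdir() if p.is_file())

    def read_note(self, name):
        return self._resolve(name).read_text(encoding="utf-8")

    def write_note(self, name, text):
        self._resolve(name).write_text(text, encoding="utf-8")
        return f"wrote {len(text)} characters to {name}"

    def call(self, tool, **args):
        # Dispatch a model-issued tool call. Failures are returned as
        # strings so the model can see what went wrong and retry.
        handlers = {
            "list_notes": self.list_notes,
            "read_note": self.read_note,
            "write_note": self.write_note,
        }
        if tool not in handlers:
            return f"error: unknown tool {tool!r}"
        try:
            return handlers[tool](**args)
        except (OSError, ValueError, TypeError) as exc:
            return f"error: {exc}"
```

        Returning errors as plain strings instead of raising is a deliberate choice in this sketch: a small model that picks a wrong tool name or a bad path at least gets feedback it can act on in the next turn.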

        The larger ones require pretty power-hungry and/or expensive hardware.

        I hope for analog hardware to change this.