Do you host your own ML / AI / LLM? What do you use, and what do you use it for?

  • brucethemoose@lemmy.world
    1 day ago

    Yep.

    I have an RTX 3090 + 128GB of CPU RAM.

    Currently I run my own custom IQ3_KT quantization of MiMo 2.5 300B, and it’s crazy good. It’s better than API models from not that long ago, and it’s served at about reading speed.

    Never thought I’d ever run such a thing on my lowly desktop.
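
    As a rough sanity check on why a model of this size can fit on this hardware, here is a back-of-envelope sketch. The bits-per-weight figure is an assumption for illustration (the post does not state the exact size of the quant), and the estimate ignores KV cache, activations, and OS overhead.

```python
def quantized_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes); excludes KV cache and runtime overhead."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption (not from the post): roughly 3.1 bits per weight for a 3-bit-class quant.
ASSUMED_BPW = 3.1
N_PARAMS = 300e9   # "300B" parameters, as stated in the post
VRAM_GB = 24       # RTX 3090 VRAM
RAM_GB = 128       # CPU RAM, as stated in the post

weights_gb = quantized_size_gb(N_PARAMS, ASSUMED_BPW)
budget_gb = VRAM_GB + RAM_GB

print(f"weights: ~{weights_gb:.0f} GB, combined memory: {budget_gb} GB")
print("fits" if weights_gb < budget_gb else "does not fit")
```

    Under that assumption the weights take up most of the combined VRAM and system RAM, so the model only fits when it is split across GPU and CPU memory.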

    For quick scripts or code assistance, I sometimes use Qwen 27B (another custom quant; I’m currently experimenting with exllama), or Gemini 12B for messing with image/audio input. But TBH, MiMo 2.5 with thinking disabled is smarter than Qwen 27B with thinking enabled.

    …And honestly, I use the GLM 5.2 API a good bit.

    I was lucky enough to get a yearly subscription for about $30 six months ago. I do self-host the UIs, or whatever takes the prompts, though.

    • SuspiciousCarrot78@aussie.zoneOP
      18 hours ago

      That’s impressive and probably within reach of most serious home labs.

      I quite like MiMo and I agree with your assessment of its capability.

      • brucethemoose@lemmy.world
        14 hours ago

        Mind you, I’m running MiMo, not the big MiMo Pro.

        But yeah, I really like the model, even for its size. And as a trellis quant, it hardly feels quantized.