Yep.
I have an RTX 3090 + 128GB CPU RAM.
Currently I run my own custom IQ3_KT quantization of MiMo 2.5 300B, and it’s crazy good. It’s better than API models from not that long ago, and it’s served at about reading speed.
Never thought I’d ever run such a thing on my lowly desktop.
For quick scripts or as a code assistant, I sometimes use Qwen 27B (another custom quant, currently experimenting with exllama), or Gemini 12B for messing with image/audio input. But TBH MiMo 2.5 with thinking disabled is smarter than the 27B with thinking enabled.
…And honestly, I use GLM 5.2 API a good bit.
I was lucky enough to get a yearly subscription for like $30 six months ago. I do self-host the UIs or whatever takes the prompts, though.
That’s impressive and probably within reach of most serious home labs.
I quite like MiMo and I agree with your assessment of its capability.
Mind you, I’m running MiMo, not the big MiMo Pro.
But yeah. I really like the model, even for one of its size. And as a trellis quant, it hardly feels quantized.
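For anyone wondering how a 300B model fits on a 3090 + 128GB box, here is a rough back-of-the-envelope sketch. The 3.125 bits per weight figure for IQ3_KT is an assumption (a custom quant mixes tensor types, so the real average will differ), and it ignores KV cache, context buffers, and OS overhead. The helper name `quant_size_gb` is made up for illustration.

```python
def quant_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 300e9  # total parameters, taken from the "300B" in the model name
BPW = 3.125       # ASSUMED average bits per weight for an IQ3_KT quant
VRAM_GB = 24      # RTX 3090
RAM_GB = 128      # system RAM

weights_gb = quant_size_gb(N_PARAMS, BPW)
budget_gb = VRAM_GB + RAM_GB

print(f"weights: ~{weights_gb:.1f} GB")
print(f"VRAM + RAM budget: {budget_gb} GB")
print(f"headroom before cache/overhead: ~{budget_gb - weights_gb:.1f} GB")
```

Under those assumptions the weights land a bit under 120 GB, which leaves some room for cache and the OS but not a lot, so most of the model necessarily sits in system RAM rather than on the GPU.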