ShapeLearn: Qwen3-30B-A3B-Instruct-2507 device-optimized quant variants without output quality falling off a cliff.
The 30B model runs on a Raspberry Pi 5 (16GB) at 8.03 TPS with a 2.70 BPW quant, while retaining 94.18% of BF16 quality. ShapeLearn tends to find better TPS/quality tradeoffs than the alternatives.
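For a rough sense of why a 2.70 BPW quant fits on a 16GB board at all, here's a back-of-the-envelope footprint calculation. This is a sketch, not a real measurement: the ~30.5B total parameter count is an assumption, and KV cache, activations, and runtime overhead are ignored.

```python
# Rough weight-memory estimate for a quantized model (illustrative only).
# Assumes ~30.5B total parameters for Qwen3-30B-A3B and ignores KV cache,
# activations, and runtime overhead.
PARAMS = 30.5e9
BPW = 2.70  # bits per weight of the quant variant

weight_bytes = PARAMS * BPW / 8
print(f"~{weight_bytes / 1e9:.1f} GB of weights at {BPW} BPW")          # ~10.3 GB
print(f"vs ~{PARAMS * 16 / 8 / 1e9:.1f} GB of weights at BF16 (16 bpw)")  # ~61.0 GB
```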
What’s new/interesting in this one
- CPU behavior is mostly sane
On CPUs, once you're past "it fits," smaller tends to be faster in a fairly monotonic way. The tradeoff curve behaves like you'd expect (see the benchmark sketch after this list).
- GPU behavior is quirky
On GPUs, performance depends as much on kernel choice as on memory footprint, so you often get sweet spots (especially around ~4-bit) where the kernels hit the "golden path," and pushing to lower bit-widths can get weird.
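To reproduce the kind of TPS-vs-size comparison behind those two bullets, a minimal decode-throughput loop could look like this. This is a sketch under assumptions: llama-cpp-python is assumed as the runtime, the GGUF file names are placeholders (not the repo's actual names), and `n_gpu_layers=0` pins it to CPU; flip that to offload layers when comparing GPU behavior.

```python
# Minimal decode-throughput comparison across quant variants (sketch).
# Assumes llama-cpp-python is installed and the GGUF files exist locally;
# the file names below are placeholders, not the repo's actual names.
import time
from llama_cpp import Llama

VARIANTS = ["qwen3-30b-a3b-2.70bpw.gguf", "qwen3-30b-a3b-4bit.gguf"]  # hypothetical paths
PROMPT = "Explain speculative decoding in one paragraph."

for path in VARIANTS:
    llm = Llama(model_path=path, n_ctx=2048, n_threads=8, n_gpu_layers=0, verbose=False)
    start = time.perf_counter()
    out = llm(PROMPT, max_tokens=128)
    elapsed = time.perf_counter() - start
    tokens = out["usage"]["completion_tokens"]
    print(f"{path}: {tokens / elapsed:.2f} TPS (decode, CPU only)")
```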
models: https://huggingface.co/byteshape/Qwen3-30B-A3B-Instruct-2507-GGUF
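If you want to try a variant from that repo, something like the following should work. It's a sketch: the exact GGUF filename is a placeholder (check the repo's file list for the real names), and full offload via `n_gpu_layers=-1` assumes a CUDA/Metal build of llama-cpp-python.

```python
# Pull one quant variant from the Hugging Face repo and load it (sketch).
# The filename below is hypothetical; check the repo's file list for real names.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

path = hf_hub_download(
    repo_id="byteshape/Qwen3-30B-A3B-Instruct-2507-GGUF",
    filename="Qwen3-30B-A3B-Instruct-2507-2.70bpw.gguf",  # placeholder filename
)

# n_gpu_layers=-1 offloads all layers if llama-cpp-python was built with GPU support;
# set it to 0 for a pure-CPU run (e.g. on the Pi).
llm = Llama(model_path=path, n_ctx=4096, n_gpu_layers=-1, verbose=False)
print(llm("Say hello in five words.", max_tokens=32)["choices"][0]["text"])
```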

