• brucethemoose@lemmy.world
    12 hours ago

    Doesn’t matter (for this, specifically) if it’s not performant on LLM inference engines.

    And I’m not just talking about CUDA. Even GGUF Vulkan (for example) has all sorts of vendor quirks that can absolutely trash performance. vLLM is often a joke on AMD with certain models on certain cards, even with dev support.