• brucethemoose@lemmy.world
    2 hours ago

    Gemini actually has a really interesting architecture, which is why it responds so fast, and it’s easily the best long-context model out there.

    And outside of benchmaxxing or pure coding, Gemma is very good for its size. The 12B is an incredible multimodal LLM, the only one natively trained for image/text ingestion without an mmproj hacked on at the end.

    …But it sure feels like executive meddling kills it.

    The pattern I see is:

    • Gemini preview is released.

    • It’s genuinely good! It’s smart, it’s straight.

    • Then they “refine” it, and it gets more and more sycophantic, more deep-fried. Long-context performance degrades… Benchmark scores go up, but anyone who actually uses it can immediately tell it’s gotten worse.

    • Only then is it released for mass use.

    It’s obvious they took a good model, then enshittified it to make their bosses happy and tech bros on Twitter excited.

    Gemma has the same pattern. Researchers tease the local community, delay it, and then when a new Gemma finally comes out, it turns out to use some old sliding-window attention (SWA) architecture. The biggest model gets cut, and only a smaller one gets the multimodal training.

    It’s obvious it was neutered so it wouldn’t “threaten” the Gemini API or be too “unsafe.”


    Another thing I’ve noticed is that both Gemini and Gemma are awful at their default sampling settings of temperature 1.0 and top-p 0.95; those settings completely screw them up. They do much better with low temperature plus min-p, and Gemma loves constrained sampling.

    But 99% of users don’t know anything about sampling, so that’s going to leave a bad impression.
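    To make the temperature/top-p versus min-p point concrete, here is a minimal Python sketch on a made-up distribution: two plausible tokens followed by a long tail of 50 junk tokens. The logits and the min-p value of 0.05 are illustrative assumptions, not Gemini or Gemma internals, and real inference engines implement these filters with their own details.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities; temperature must be > 0."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def renormalize(probs, kept):
    """Zero out every token not in `kept` and rescale the rest to sum to 1."""
    total = sum(probs[i] for i in kept)
    return [p / total if i in kept else 0.0 for i, p in enumerate(probs)]

def top_p_filter(probs, top_p=0.95):
    """Keep the smallest set of most likely tokens whose cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return renormalize(probs, kept)

def min_p_filter(probs, min_p=0.05):
    """Keep tokens whose probability is at least min_p times that of the most likely token."""
    threshold = min_p * max(probs)
    kept = {i for i, p in enumerate(probs) if p >= threshold}
    return renormalize(probs, kept)

# Toy distribution (an assumption): two plausible tokens, then 50 junk tokens.
logits = [5.0, 3.0] + [0.0] * 50

default = top_p_filter(softmax(logits, temperature=1.0), top_p=0.95)
tuned = min_p_filter(softmax(logits, temperature=0.7), min_p=0.05)

print("top-p 0.95 @ T=1.0 keeps", sum(p > 0 for p in default), "tokens")
print("min-p 0.05 @ T=0.7 keeps", sum(p > 0 for p in tuned), "tokens")
```

    With these toy numbers, top-p at temperature 1.0 keeps 42 candidates, 40 of them junk, because the tail collectively holds enough probability mass to be needed to reach 0.95. Min-p at temperature 0.7 keeps only the two plausible tokens, since its cutoff is relative to the top token rather than cumulative.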

      • brucethemoose@lemmy.world
        25 minutes ago

        I use top-nσ (“sigma N”) sampling with n = 1.0, a banlist of slop phrases, and maybe a little repetition penalty.
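        A minimal sketch of the two samplers named above, assuming their usual definitions: top-nσ keeps only tokens whose logit lies within n standard deviations of the maximum logit, and the repetition penalty is the common CTRL-style rule (divide positive logits, multiply negative ones, for tokens already in the context). The function names, penalty value, and toy logits are hypothetical. The slop-phrase banlist is left out because banning multi-token phrases needs backtracking.

```python
import math

def top_n_sigma(logits, n=1.0):
    """Mask every logit that is more than n standard deviations below the maximum logit."""
    mean = sum(logits) / len(logits)
    sigma = math.sqrt(sum((x - mean) ** 2 for x in logits) / len(logits))
    threshold = max(logits) - n * sigma
    return [x if x >= threshold else -math.inf for x in logits]

def repetition_penalty(logits, context, penalty=1.05):
    """CTRL-style penalty: make tokens already present in the context less likely."""
    out = list(logits)
    for tok in set(context):
        # Dividing a positive logit or multiplying a negative one both push it down.
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out

logits = [5.0, 4.5, 3.5] + [0.0] * 50  # toy values, not from a real model
filtered = top_n_sigma(logits, n=1.0)
print("kept token ids:", [i for i, x in enumerate(filtered) if x != -math.inf])
```

        Because the threshold is expressed in units of the logits’ own spread, dividing all logits by a temperature scales the maximum and σ equally, so the set of surviving tokens does not depend on temperature. That is one reason top-nσ tolerates a temperature of 1.0 better than top-p does.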

        Beyond that it depends on the usage.

        For scripts or “questioning a document,” I keep the temperature as low as it can go before the output starts looping; I start at zero. But I don’t really use Gemma for coding, TBH, and it’s not good with longer documents.
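        Temperature 0 is usually implemented as greedy decoding (always take the highest-scoring token), and “until it loops” means the output starts repeating itself. A rough sketch of both, where the loop check is a hypothetical heuristic (a short token cycle repeated back to back), not something taken from Mikupad or any particular backend:

```python
def greedy_pick(logits):
    """Temperature 0 in practice: deterministically take the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])

def is_looping(tokens, max_period=8, repeats=3):
    """Hypothetical heuristic: True if the output ends with some short token
    cycle (length 1..max_period) repeated `repeats` times back to back."""
    n = len(tokens)
    for period in range(1, max_period + 1):
        if n < period * repeats:
            break
        tail = tokens[n - period:]
        if all(tokens[n - (k + 1) * period : n - k * period] == tail for k in range(repeats)):
            return True
    return False

# Workflow from the comment: start greedy, and only raise the
# temperature if the output starts cycling.
print(is_looping([5, 8, 1, 2, 1, 2, 1, 2]))  # the "1 2" cycle repeats 3 times
```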

        If it’s for a specific language or a very specific script, I sometimes constrain the output with a grammar for that language.
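        Grammar-constrained sampling works by masking: at each step, any token the grammar would reject gets its logit set to -inf, so it can never be picked. Real backends compile a formal grammar (llama.cpp uses its GBNF format); the toy vocabulary and hand-written `allowed` rule below are hypothetical stand-ins for that, targeting a comma-separated list of integers.

```python
import math

# Hypothetical toy vocabulary; a real tokenizer has tens of thousands of entries.
VOCAB = ["1", "23", ",", "hello", " ", "7"]

def allowed(prefix, token):
    """Hypothetical toy grammar: integers separated by single commas, e.g. '1,23,7'."""
    if token == ",":
        return prefix != "" and not prefix.endswith(",")
    return token.isdigit()

def constrain(logits, prefix, vocab=VOCAB):
    """Set the logit of every token the grammar rejects to -inf."""
    return [x if allowed(prefix, tok) else -math.inf for x, tok in zip(logits, vocab)]

logits = [0.0, 0.5, 5.0, 9.0, 1.0, 0.2]  # the model "wants" to say "hello"
masked = constrain(logits, prefix="1")
best = max(range(len(masked)), key=lambda i: masked[i])
print(repr(VOCAB[best]))  # "hello" is masked, so the comma wins
```

        Masking happens before any temperature or truncation step, so even a high temperature can only choose among tokens the grammar permits.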

        For more “general” writing, like brainstorming or RP or whatever, I start at a temperature of around 0.7 with minimal DRY sampling and watch the token probabilities in the Mikupad UI, especially for “important” tokens like names or recalled information. If the probability of getting the correct answer is too low, I turn the temperature down.

        …But honestly, I tend to use big MoEs instead of Gemma for that, too.
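        DRY (“Don’t Repeat Yourself”) sampling, mentioned above, penalizes a candidate token when choosing it would extend a token sequence that already occurred earlier in the context, with the penalty growing exponentially in the length of the repeat. The sketch below follows that idea using the multiplier/base/allowed-length parameters and the defaults commonly suggested for DRY (0.8, 1.75, 2), but it is a simplified assumption: it omits the sequence breakers and penalty range that real implementations add, and it is quadratic in context length. A “minimal” setting would presumably just use a smaller multiplier.

```python
def dry_penalties(context, multiplier=0.8, base=1.75, allowed_length=2):
    """Map each candidate token to the DRY penalty to subtract from its logit."""
    n = len(context)
    longest = {}  # candidate token -> longest repeated sequence it would extend
    for i in range(n - 1):
        # Count how many tokens ending at position i match the end of the context.
        length = 0
        while length <= i and context[i - length] == context[n - 1 - length]:
            length += 1
        if length > 0:
            candidate = context[i + 1]  # the token that followed the earlier occurrence
            longest[candidate] = max(longest.get(candidate, 0), length)
    return {
        tok: multiplier * base ** (run - allowed_length)
        for tok, run in longest.items()
        if run >= allowed_length
    }

def apply_dry(logits, context, **kwargs):
    """Subtract the DRY penalty from the logits of tokens that would extend a repeat."""
    out = list(logits)
    for tok, penalty in dry_penalties(context, **kwargs).items():
        out[tok] -= penalty
    return out

# Token ids: "A B C ... A B", so picking C next would repeat "A B C".
context = [1, 2, 3, 9, 1, 2]
print(dry_penalties(context))
```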


        And if none of this makes any sense…

        Yeah. That’s the problem.

        Sampling was supposed to be a temporary stopgap until looping and the like were figured out, but the big LLM devs never addressed it in production. There are all sorts of interesting papers, including one from Google about sampling logits per layer, but none of them are implemented in the API models.