I use top-n-sigma (“sigma N”) sampling at 1.0, a banlist of slop phrases, and maybe a little repetition penalty.
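For anyone who hasn't met it, here is a rough sketch of what a "sigma N" sampler (usually written top-n-sigma or top-nσ) does, as I understand it: keep only the tokens whose logit is within n standard deviations of the best logit, then sample from what's left. The function names are made up for illustration, and using the population standard deviation is an assumption of this sketch, not something taken from a particular backend.

```python
import math
import random

def top_n_sigma_filter(logits, n=1.0):
    """Mask every logit that is more than n standard deviations below the max."""
    mean = sum(logits) / len(logits)
    # Population standard deviation of the raw logits (assumption of this sketch).
    sigma = math.sqrt(sum((x - mean) ** 2 for x in logits) / len(logits))
    threshold = max(logits) - n * sigma
    return [x if x >= threshold else float("-inf") for x in logits]

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; temperature must be > 0 here."""
    m = max(logits)
    exps = [math.exp((x - m) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample(logits, n=1.0, temperature=1.0, rng=random):
    """Filter with top-n-sigma, then draw one token index from the survivors."""
    probs = softmax(top_n_sigma_filter(logits, n), temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Hypothetical logits for a five-token vocabulary.
example_logits = [5.0, 4.5, 1.0, 0.0, -1.0]
filtered = top_n_sigma_filter(example_logits, n=1.0)
picked = sample(example_logits, n=1.0)
```

With these made-up numbers only the first two tokens survive the filter, so the long tail can never be sampled no matter what the temperature is.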
Beyond that it depends on the usage.
For scripts or “questioning a document,” I keep the temperature as low as it can go before the output starts looping. I start at zero temperature and only raise it if it loops. But I don’t really use Gemma for coding, TBH, and it’s not good for longer documents.
If it’s for a specific language or a very specific script, I sometimes constrain the output with a grammar for that language.
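The idea behind grammar-constrained output, very roughly: before picking a token, throw out every token the grammar would reject. This is a toy sketch with made-up names and a made-up "grammar" (digits only). Real grammar samplers track parser state across tokens, which this does not do.

```python
def mask_disallowed(logits, vocab, is_allowed):
    """Set the logit of every token the grammar rejects to -inf."""
    return [
        logit if is_allowed(token) else float("-inf")
        for logit, token in zip(logits, vocab)
    ]

def greedy_pick(logits, vocab):
    """Zero-temperature decoding: take the single highest-logit token."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return vocab[best]

# Hypothetical toy vocabulary and logits; the "grammar" only accepts digit tokens.
vocab = ["yes", "7", "maybe", "42"]
logits = [3.2, 1.1, 2.5, 0.4]

constrained = mask_disallowed(logits, vocab, str.isdigit)
choice = greedy_pick(constrained, vocab)
```

Unconstrained, greedy decoding would pick "yes" here. With the mask in place the best token that is still legal wins instead.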
For more “general” writing, like brainstorming or RP or whatever, I start at around 0.7 temperature with minimal DRY sampling and look at the per-token probabilities in the Mikupad UI, especially for “important” tokens like names or recalled information. If the probability of getting the correct token is too low, I turn the temperature down.
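To show why turning the temperature down helps, here is a small sketch of that check done by hand: compute the probability of the token you know is right at a given temperature, and step the temperature down until it clears a threshold. The logits, the 0.9 threshold, and the function names are all made up for illustration.

```python
import math

def token_probs(logits, temperature):
    """Softmax over logits at a given temperature (must be > 0)."""
    m = max(logits)
    exps = [math.exp((x - m) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def tune_temperature(logits, correct_index, min_prob=0.9,
                     start=0.7, step=0.1, floor=0.1):
    """Lower the temperature until the correct token is likely enough."""
    t = start
    while t > floor and token_probs(logits, t)[correct_index] < min_prob:
        t = round(t - step, 10)  # round to avoid float drift like 0.6000000001
    return t

# Hypothetical logits for three candidate names; index 0 is the right one.
name_logits = [3.0, 2.0, 0.5]

p_at_start = token_probs(name_logits, 0.7)[0]
tuned = tune_temperature(name_logits, correct_index=0)
p_at_tuned = token_probs(name_logits, tuned)[0]
```

One catch: this only works when the correct token is already the most likely one. Lowering the temperature sharpens the distribution toward the top token, so if the model's favorite is wrong, a lower temperature makes the wrong answer more certain, not less.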
…But honestly, I tend to use big MoEs instead of Gemma for that, too.
And if none of this makes any sense…
Yeah. That’s the problem.
Sampling was supposed to be a temporary stopgap until looping and similar problems were figured out, but the big LLM devs just never addressed it in production. There are all sorts of interesting papers, including one from Google about sampling logits per-layer, but none of it gets implemented in the API models.