• 4 Posts
  • 1.45K Comments
Joined 2 years ago
Cake day: March 22nd, 2024


  • It’s not so much about English as it is about writing patterns. Like others said, it has a “stilted college essay prompt” feel because that’s what instruct-finetuned LLMs are trained to do.

    Another quirk of LLMs is that they overuse specific phrases, which stems from technical issues (training on their own output, training on other LLMs’ output, training on human SEO junk, artifacts of whole-word tokenization, inheriting style from their own previous output as they write the prompt, just to start).

    “Slop” is an overused term, but this is precisely what people in the LLM tinkerer/self-hosting community mean by it. It’s also what the “temperature” setting you may see in some UIs is supposed to combat, though that’s crude and ineffective, if you ask me.
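To show why temperature is such a blunt tool, here is a minimal sketch of what it does to next-token probabilities, assuming the usual "divide the logits by the temperature, then softmax" approach. The function name and the three logits are made up for illustration. Note that it reshapes the whole distribution at once, so it has no way to single out one overused phrase.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities, scaled by a sampling temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate next tokens (not from a real model).
logits = [2.0, 1.0, 0.1]
for t in (0.5, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures pile more probability onto the top token, higher ones flatten everything. The favorite stays the favorite either way, which is roughly why pet phrases survive it.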

    Anyway, if you stare at these LLMs long enough, you learn to see a lot of individual models’ signatures. Some of it is… hard to convey in words. But “Embodies,” “landmark achievement,” and such just set off alarm bells in my head, specifically for ChatGPT/Claude. If you ask an LLM to write a story, “shivers down the spine” is another phrase so common it’s a meme, as are specific names they tend to choose for characters.
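A rough way to put numbers on this kind of phrase overuse is to count how often the tell-tale phrases show up per 1,000 words. This is only a toy sketch: the `phrase_rate` helper and the sample texts are hypothetical, and it is not how any benchmark computes its scores.

```python
import re
from collections import Counter

def phrase_rate(texts, phrases):
    """Return occurrences of each phrase per 1,000 words across all texts."""
    counts = Counter()
    total_words = 0
    for text in texts:
        lowered = text.lower()
        total_words += len(re.findall(r"\w+", lowered))
        for phrase in phrases:
            # Plain substring count, case-insensitive.
            counts[phrase] += lowered.count(phrase.lower())
    if total_words == 0:
        return {phrase: 0.0 for phrase in phrases}
    return {phrase: 1000 * counts[phrase] / total_words for phrase in phrases}

# Made-up sample outputs, just to exercise the function.
samples = [
    "Shivers down the spine. It embodies hope.",
    "No tells here.",
]
watchlist = ["shivers down the spine", "embodies", "landmark achievement"]
print(phrase_rate(samples, watchlist))
```

Comparing these rates between a model's output and a pile of human-written text would be the obvious next step, though picking a fair baseline is its own problem.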

    If you asked an LLM to write in your native language, you’d run into similar issues, though the translation should soften them some. Hence when I use Chinese open-weights models, I get them to “think” in Chinese and answer in English, and get a MUCH better result.

    All this is quantifiable, by the way. Check out EQBench’s slop profiles for individual models:

    https://eqbench.com/creative_writing_longform.html

    https://eqbench.com/creative_writing.html

    And its best guess at inbreeding “family trees” for models:

    [image: inbreed]

  • It’s not just you.

    Zooming in, I feel like the “camera jpeg” lost sharpness to recompression.

    It’s kind of insane that cameras either dump raw data, or do all this magic only to throw so much away to an ancient image codec that loses even more when recompressed.

    Newer ones can save a HEIF or a “lossy RAW” in some circumstances (which is an infinite improvement), but still, I eagerly await the day cameras can save a JPEG-XL all by themselves, and I can post those files on the Fediverse.