• halcyoncmdr@piefed.social
    15 hours ago

    They are, at best, an unreliable tool you need to babysit. And because of that, you need to be able to do what you’re asking the tool to do so you can verify it actually did what you asked.

    That’s not at all how these tools are being marketed, or how many people are using them. So the current result is that 90% of usage is hype and slop.

    • plateee@piefed.social
      13 hours ago

      so you can verify it actually did what you asked.

      Nah, just build a harness that validates the output of one model by running it through the same model again to check for hallucinations… And to make sure that second pass isn’t hallucinating, uh… run it through a model a third time to check the second isn’t hallucinating.

      /s