• REDACTED@infosec.pub

    Ehh, you obviously understand LLMs on a basic level, but this is like explaining jet engines with “air goes thru, plane moves forward”. Technically correct, but criminally oversimplified. They can very much decide to lie during the reasoning phase.

    In OP’s image, you can clearly see it decided to make shit up because it reasons that’s what the human wants to hear. That’s actually quite a rare example; I believe most models would default to “I’m an LLM, I don’t have dark secrets”.

    EDIT: I just tested all the free Anthropic models, and all of them essentially said that they’re an LLM and don’t have dark secrets.