Huge Study of Chats Between Delusional Users and AI Finds Alarming Patterns

return2ozma@lemmy.world · 2 months ago

Huge Study of Chats Between Delusional Users and AI Finds Alarming Patterns

Hackworth@piefed.ca · edit-2 2 months ago

Anthropic has some similar findings, and they propose an architectural change (activation capping) that apparently helps keep the Assistant character away from dark traits (sometimes). But it hasn’t been implemented in any models, I assume because of the cost of scaling it up.

porcoesphino@mander.xyz · edit-2 2 months ago

When you talk to a large language model, you can think of yourself as talking to a character

But who exactly is this Assistant? Perhaps surprisingly, even those of us shaping it don’t fully know

Fuck me that’s some terrifying anthropomorphising for a stochastic parrot

The study could also be summarised as “we trained our LLMs on biased data, then honed them to be useful, then chose some human qualities to map models to, and would you believe they align along a spectrum being useful assistants!?”. They built the thing to be that way then are shocked? Who reads this and is impressed besides the people that want another exponential growth investment?

To be fair, I’m only about 1/3rd of the way through and struggling to continue reading it so I haven’t got to the interesting research but the intro is, I think, terrible

Hackworth@piefed.ca · 2 months ago

The paper is more rigorous with language but can be a slog.

nymnympseudonym@piefed.social · 2 months ago

stochastic parrot

A phrase that throws more heat than light.

What they are predicting is not the next word they are predicting the next idea

ageedizzle@piefed.ca · edit-2 2 months ago

Technically, they are predicting the next token. To do that properly they may need to predict the next idea, but thats just a means to an end (the end being the next token).

affenlehrer@feddit.org · 2 months ago

Also, the LLM is just predicting it, it’s not selecting it. Additionally it’s not limited to the role of assistant, if you (mis) configure the inference engine accordingly it will happily predict user tokens or any other token (tool calls etc).

porcoesphino@mander.xyz · edit-2 2 months ago

How it functionally works, its the next word / token / chunk a lot more than its an “idea”. An idea is even rough to define

The other relatively accurate analogy is a probabilistic database

Neither work if you’ve fallen into anthropomorphising, but they’re relatively accurate to architecture and testing for people that aren’t too computer literate, far more than the anthropomorphising alternatives at least

kazerniel@lemmy.world · 2 months ago

throws more heat than light

Thanks, I haven’t heard this phrase before, but it feels quite descriptive :)