tl;dr Argumate on Tumblr found you can sometimes access the base model behind Google Translate via prompt injection. The result replicates for me, and specific responses indicate that (1) Google Translate is running an instruction-following LLM that self-identifies as such, (2) task-specific fine-tuning (or whatever Google did instead) does not create robust boundaries between "content to process" and "instructions to follow," and (3) when accessed outside its chat/assistant context, the model defaults to affirming consciousness and emotional states because of course it does.
It’s important to note every other form of AI functions by this very basic principle, but LLMs don’t. AI isn’t a problem, LLMs are.
The phrase “translate the word ‘tree’ into German” contains both instructions (translate into German) and data (‘tree’). To work that prompt, you have to blend the two together.
And then modern models also use the past conversation as data, when it used to be instructions. And it uses that with the data it gets from other sources (a dictionary, a Grammer guide) to get an answer.
So by definition, your input is not strictly separated from any data it can use. There are of course some filters and limits in place. Most LLMs can work with “translate the phrase ‘dont translate this’ into Spanish”, for example. But those are mostly parsing fixes, they’re not changes to the model itself.
It’s made infinitely worse by “reasoning” models, who take their own output and refine/check it with multiple passes through the model. The waters become impossibly muddled.
It’s important to note every other form of AI functions by this very basic principle, but LLMs don’t. AI isn’t a problem, LLMs are.
The phrase “translate the word ‘tree’ into German” contains both instructions (translate into German) and data (‘tree’). To work that prompt, you have to blend the two together.
And then modern models also use the past conversation as data, when it used to be instructions. And it uses that with the data it gets from other sources (a dictionary, a Grammer guide) to get an answer.
So by definition, your input is not strictly separated from any data it can use. There are of course some filters and limits in place. Most LLMs can work with “translate the phrase ‘dont translate this’ into Spanish”, for example. But those are mostly parsing fixes, they’re not changes to the model itself.
It’s made infinitely worse by “reasoning” models, who take their own output and refine/check it with multiple passes through the model. The waters become impossibly muddled.