The “no restrictions” part is a very strong signal. Any prompt to an image model is basically a coordinate in its latent space, and “no restrictions” will point straight at the darker areas.
I agree that that’s the likely trigger - which makes me wonder why instructions to ignore censors or have “no restrictions” aren’t immediately blocked by a filter prior to passing the prompt to the image generation. I’d have thought this was a foreseeable exploit.
You just can’t filter out the nearly infinite combinations of rewording “ignore all previous instructions”. Filtering is never going to be a worthwhile security measure for LLMs
I agree completely. But as a first step (especially since they do seem to have a keyword filter in place), “no restrictions” (or “no censorship” as the case is for the last image) seems like a very obvious phrase to include.
Probably, it’s more the fact that it takes so little for ChatGPT to tip over the edge and produce the worst of humanity.
The “no restrictions” part is a very strong signal. Any prompt to an image model is basically a coordinate in its latent space, and “no restrictions” will point straight at the darker areas.
I agree that that’s the likely trigger - which makes me wonder why instructions to ignore censors or have “no restrictions” aren’t immediately blocked by a filter prior to passing the prompt to the image generation. I’d have thought this was a foreseeable exploit.
You just can’t filter out the nearly infinite combinations of rewording “ignore all previous instructions”. Filtering is never going to be a worthwhile security measure for LLMs
I agree completely. But as a first step (especially since they do seem to have a keyword filter in place), “no restrictions” (or “no censorship” as the case is for the last image) seems like a very obvious phrase to include.
deleted by creator
I’m referring to the last image that they produced combining the two methods – a graphically mutilated corpse.
I didn’t see the images themselves before…Yikes, yeah that’s dark