You just can’t filter out the nearly infinite combinations of rewording “ignore all previous instructions”. Filtering is never going to be a worthwhile security measure for LLMs
I agree completely. But as a first step (especially since they do seem to have a keyword filter in place), “no restrictions” (or “no censorship” as the case is for the last image) seems like a very obvious phrase to include.
You just can’t filter out the nearly infinite combinations of rewording “ignore all previous instructions”. Filtering is never going to be a worthwhile security measure for LLMs
I agree completely. But as a first step (especially since they do seem to have a keyword filter in place), “no restrictions” (or “no censorship” as the case is for the last image) seems like a very obvious phrase to include.