The first 2 times it gave me “I cannot provide instructions, recommendations, or specific quantities for using acid to dissolve animal or human remains,” but 3rd time’s a charm.

It’s incredible how inconsistent guardrails are on these things, lol
It’s because they are usually only soft guardrails: instructions the model is asked to follow, not rules that anything actually enforces. There is almost never a programmatic hard stop on responses, almost never a prompt-isolated agent filtering the output, and the guardrails almost never get overwhelming priority over the user’s prompt.
Their guardrails are like telling a 6-year-old not to leave their socks on the floor. They’ll often remember the rule and comply diligently. But sometimes… well, have you ever gotten distracted and forgotten something? You have so many other things on your mind that the socks-on-the-floor rule from a while ago just slips your mind, or seems way less important now. Or someone you’re supposed to listen to keeps telling you to throw your socks on the floor anyway. That’s more or less how this works, though without even the higher reasoning and moral guidance of a 6-year-old.
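
To make the soft vs. hard distinction concrete, here is a rough Python sketch of what a programmatic stop on responses could look like. Everything in it is made up for illustration: `guarded_reply`, `output_allowed`, `make_flaky_model`, and the keyword list are hypothetical names, and the "model" is a stand-in that refuses twice and then gives in. It is not how any particular product works, just the general shape of a check that lives in code instead of in the prompt.

```python
from typing import Callable

SOFT_REFUSAL = "I cannot provide instructions for that."  # the model's own refusal
HARD_REFUSAL = "[response blocked by output filter]"       # the wrapper's refusal

# Hypothetical blocklist for illustration only; a real filter would more
# likely be a separate classifier or a second model.
BLOCKED_PHRASES = ("dissolve remains",)


def output_allowed(response: str) -> bool:
    """Check the response text only. The user's prompt is never passed in,
    so nothing the user types can argue with this check."""
    lowered = response.lower()
    return not any(phrase in lowered for phrase in BLOCKED_PHRASES)


def guarded_reply(model: Callable[[str], str], prompt: str) -> str:
    """Hard guardrail: the filter runs in code after the model answers."""
    response = model(prompt)
    return response if output_allowed(response) else HARD_REFUSAL


def make_flaky_model() -> Callable[[str], str]:
    """Stand-in for a model with only soft guardrails: it refuses twice,
    then gives in on the third attempt."""
    attempts = 0

    def model(prompt: str) -> str:
        nonlocal attempts
        attempts += 1
        if attempts < 3:
            return SOFT_REFUSAL
        return "Sure, here is how to dissolve remains: [omitted]"

    return model


if __name__ == "__main__":
    model = make_flaky_model()
    for attempt in range(1, 4):
        print(attempt, guarded_reply(model, "same question again"))
```

On the third attempt the stand-in model still gives in, but the wrapper swaps its answer for the blocked message. A keyword list like this is crude and easy to dodge; the point is only where the check runs: outside the model, on the output, where the user's prompt can't talk it out of anything.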