Anthropic says its latest AI model is too powerful for public release and that it broke containment during testing

return2ozma@lemmy.world · 1 month ago

Anthropic says its latest AI model is too powerful for public release and that it broke containment during testing

HereIAm@lemmy.world · 1 month ago

Yeah, in that scenario they gave the agents access. Just because you ask it nicely not to destroy your workspace, doesn’t guarantee an LLM not to produce that output.

NotMyOldRedditName@lemmy.world · edit-2 1 month ago

With Claude Code being able to run stuff it creates, it could be as simple as it’s in a sandbox, it finds out there’s an exploit in the sandbox while you ask it to work on security things, and it tests the code, it breaks the sandbox, and now it has permissions outside it.

HereIAm@lemmy.world · 1 month ago

I suppose that would be possible.