'I had to RUN to my Mac mini like I was defusing a bomb': OpenClaw AI chose to 'speedrun' deleting Meta AI safety director's inbox due to a 'rookie error'

themachinestops@lemmy.dbzer0.com · edit-2 15 hours ago

LittleBorat3@lemmy.world · 1 hour ago

The “I’m sorry” part is always great. I always wanted an apology from an LLM, not for it to work as specified 😆

It can be like your least competent colleague on roids

lemmydividebyzero@reddthat.com · 3 hours ago

They released a version recently that fixed over 60 security vulnerabilities. All of them were high or critical.

How many more are there to find? Thousands?

Whoever uses this on a PC with anything useful on it is absolutely insane.

aesthelete@lemmy.world · edit-2 41 minutes ago

Even with little usage it was fairly obvious to me that the probability that an LLM will output at least one very strange response over time approaches 100%.

By themselves, they’re just sophisticated chatbots and only stream out some characters or binary in response to a prompt.

Those working in agentic AI frameworks with things like “MCP Servers” provide these things with “tools” that enable them to do things like execute shell commands and go through your inbox the same as if it were chatting with a person or another bot: with the same prompt and response paradigm.

That’s where it seems extremely obvious to me that the proper approach is to code these tools – which in any sane framework are built using regular code – with the governance in place to prevent these things from doing bullshit like this.

The LLM is formatting your computer or deleting your inbox because some dumb fuck thought it was a great idea to code up tools that hand a chatbot a root-capable shell or complete access to your email system instead of doing the obviously safer thing and coding the tools with the governance or safety in them so the chatbot going haywire isn’t any kind of emergency at all.
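To make "governance in the tool" concrete, here is a minimal sketch in Python. Everything in it is hypothetical: `GovernedMailbox`, `PolicyError`, and the cap of 5 are made-up names and values for illustration, not how OpenClaw or any real MCP server works. The point is only that the tool itself has no permanent-delete method and refuses to trash more than a fixed number of messages per session, so a model that goes haywire hits a wall in ordinary code.

```python
from dataclasses import dataclass, field


class PolicyError(Exception):
    """Raised when a tool call exceeds what the tool is allowed to do."""


@dataclass
class GovernedMailbox:
    """Hypothetical email tool: the limits live in code, not in the prompt."""

    messages: dict[str, str]            # message id -> subject (stand-in for a real mailbox)
    max_trash_per_session: int = 5      # assumed cap, chosen for illustration
    trash: dict[str, str] = field(default_factory=dict)
    trashed_count: int = field(default=0, init=False)

    def list_subjects(self) -> list[str]:
        """Read-only view that the model can call freely."""
        return list(self.messages.values())

    def move_to_trash(self, message_id: str) -> None:
        """Recoverable removal; there is deliberately no permanent delete."""
        if self.trashed_count >= self.max_trash_per_session:
            raise PolicyError("trash limit reached; a human must review")
        self.trash[message_id] = self.messages.pop(message_id)
        self.trashed_count += 1

    def restore(self, message_id: str) -> None:
        """Undo a move_to_trash call."""
        self.messages[message_id] = self.trash.pop(message_id)
```

With this shape, a runaway loop of `move_to_trash` calls stops at the cap with a `PolicyError`, and whatever it did manage to trash can still be restored.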

This is the 2026 equivalent of running Windows XP with its abundance of open ports in its default configuration on the Internet by running a cable modem directly into the computer with no router or firewall in between to protect it.

It’s pure slop, pure recklessness, and any company that produces tool chains that function this way should be ridiculed until the end of time.

FireWire400@lemmy.world · 3 hours ago

Joke’s on you; she probably still earns more money than most of us…

pinball_wizard@lemmy.zip · 2 hours ago

And has fewer worthless emails in her inbox.

FireWire400@lemmy.world · edit-2 13 minutes ago

Probably mostly invites to boring meetings where she’s “optional”

Flames5123@sh.itjust.works · 2 hours ago

I use AI in my job, but for script development. I would never run an AI without explicit guardrails, or one that is automated rather than prompt-driven and watched. It’s gotten creative though, using find … exec rm to remove old files, because I allowlisted find *. But it can still only do stuff in the directory it’s open in.
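A rough Python sketch of why allowlisting a whole program is weaker than it looks. This is not the actual allowlist mechanism from the comment above (that tool and its syntax are unknown here); `naive_allow`, `strict_allow`, and `DANGEROUS_FIND_ACTIONS` are hypothetical names, and the action list assumes GNU find.

```python
import shlex

# GNU find actions that run other programs or write to the filesystem.
DANGEROUS_FIND_ACTIONS = {
    "-exec", "-execdir", "-ok", "-okdir",
    "-delete", "-fprint", "-fprint0", "-fprintf", "-fls",
}


def naive_allow(command: str, allowed_programs: set[str]) -> bool:
    """Program-name allowlist: only the first word of the command is checked."""
    argv = shlex.split(command)
    return bool(argv) and argv[0] in allowed_programs


def strict_allow(command: str, allowed_programs: set[str]) -> bool:
    """Same allowlist, but find is rejected when it carries a dangerous action."""
    argv = shlex.split(command)
    if not argv or argv[0] not in allowed_programs:
        return False
    if argv[0] == "find" and any(arg in DANGEROUS_FIND_ACTIONS for arg in argv[1:]):
        return False
    return True
```

Given `{"find"}` as the allowlist, `naive_allow` accepts `find . -name '*.tmp' -exec rm {} +` because the command starts with an allowed program, while `strict_allow` rejects it and still permits a plain `find . -name '*.tmp'`. A blocklist like this is easy to get wrong, so restricting the working directory, as described above, is still worth keeping.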

Echo Dot@feddit.uk · 4 hours ago

Yep that’s about the level of intelligence I would expect from Meta’s AI safety director.

Doing the one thing that you’re never supposed to do, letting an AI loose on anything sensitive.

For her next trick she’s going to run while holding scissors in one hand and a bottle of boiling acid in the other. What could go wrong.

xep@discuss.online · 5 hours ago

This smells like guerilla marketing to me.

BeBopALouie@lemmy.ca · 4 hours ago

Did as advertised. It did something. Not the correct something though.

LiveLM@lemmy.zip · edit-2 7 hours ago

She’s lucky all she got were some deleted emails.
Given how insecure this whole ordeal is and the fact that she gave it full access to her REAL inbox, someone could have phished the ever living fuck out of her and Meta just by sending an email with a malicious prompt written in white text, or with messages hidden in zero-width characters and other wacky antics.
Real Looney Tunes shit, congratulations to all involved.
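For the zero-width trick specifically, a small Python sketch of flagging invisible characters in an email body before it ever reaches a model. The function names are made up for illustration. It relies on the fact that zero-width space, joiner, and non-joiner all fall in Unicode category `Cf` (format). It does nothing about white-on-white text or about plainly visible malicious instructions, so treat it as one filter among several, not a defense against prompt injection.

```python
import unicodedata


def find_invisible_chars(text: str) -> list[tuple[int, str]]:
    """Return (index, Unicode name) for every format-category (Cf) character."""
    return [
        (i, unicodedata.name(ch, f"U+{ord(ch):04X}"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]


def strip_invisible_chars(text: str) -> str:
    """Drop all Cf characters. Note: this also breaks emoji joined with ZWJ
    and scripts that legitimately use zero-width (non-)joiners."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
```

Flagging is probably the safer default: a message that contains hidden characters at all is a decent reason to show it to a human instead of an agent.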

Echo Dot@feddit.uk · 4 hours ago

You wouldn’t even need to hide it since apparently she wasn’t paying attention.

Dultas@lemmy.world · 8 hours ago

The S in OpenClaw stands for security.

fruitycoder@sh.itjust.works · 5 hours ago

What’s funny is that, kind of like with people, saying “do not do xyz” makes it more likely, because “xyz” is now in the context.

Hupf@feddit.org · 2 hours ago

Do not imagine a green elephant.

setVeryLoud(true);@lemmy.ca · 4 hours ago

“give me a picture with no horses”

“Ok, here you go:”

🐎

ClydapusGotwald@lemmy.world · 6 hours ago

That’s what you get for using ai slop.

nieceandtows@programming.dev · 9 hours ago

Yes I remember. And I violated it.

Asimov rolling in his grave.

renzhexiangjiao@piefed.blahaj.zone · 10 hours ago

you can like… enforce this rule programmatically? you don’t have to say “pretty please” to ai? basically, when the AI requests some potentially unwanted thing (like deleting an email), the request goes through a proxy that asks the human for confirmation. Also you can have a safe word set up in the chat interface to act as a killswitch. I thought these were the ABCs of ai safety but apparently they are foreign concepts to this “safety director”
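A minimal Python sketch of that proxy idea, under assumptions: `ConfirmingProxy`, `KillSwitchTripped`, the action names in `DESTRUCTIVE_ACTIONS`, and the `"STOP"` safe word are all hypothetical, and nothing here reflects how OpenClaw is actually built. The agent never calls tools directly; it goes through `call`, which asks a human-supplied `confirm` callback before anything destructive and refuses everything once the safe word has been seen.

```python
from typing import Callable

# Assumed names for actions that need a human "yes" first.
DESTRUCTIVE_ACTIONS = {"delete_email", "send_email"}


class KillSwitchTripped(Exception):
    """Raised for every tool call after the safe word was received."""


class ConfirmingProxy:
    def __init__(
        self,
        tools: dict[str, Callable[..., object]],
        confirm: Callable[[str, dict], bool],
        safe_word: str = "STOP",
    ) -> None:
        self._tools = tools
        self._confirm = confirm        # e.g. a UI prompt shown to the human
        self._safe_word = safe_word
        self._halted = False

    def on_user_message(self, message: str) -> None:
        """Checked in code on every chat message, so the model cannot ignore it."""
        if message.strip() == self._safe_word:
            self._halted = True

    def call(self, action: str, **kwargs: object) -> dict:
        """Single entry point for all tool calls made by the agent."""
        if self._halted:
            raise KillSwitchTripped(f"refusing {action}: kill switch is set")
        if action in DESTRUCTIVE_ACTIONS and not self._confirm(action, kwargs):
            return {"status": "denied", "action": action}
        return {"status": "ok", "result": self._tools[action](**kwargs)}
```

The design choice that matters is that both checks run outside the model: the safe word is matched by plain string comparison, and a denied action returns before the underlying tool is ever invoked.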

underscores@lemmy.zip · edit-2 5 hours ago

The people that design AI tools don’t implement guardrails because then they’d have to admit AI is not ready for the shit they’re trying to make

BadlyDrawnRhino@aussie.zone · 4 hours ago

You say that, but who do you think the AIs will go after first if they ever do develop actual intelligence? In that scenario, simple manners can go a long way!

zqps@sh.itjust.works · 9 hours ago

The people who internalize this would never engage with a chatbot in this way in the first place. To them this is another intelligence they’re conversing with, where you get what you want by following social decorum and enforcing your will amounts to abuse.

HobbitFoot@thelemmy.club · 9 hours ago

Program? Like a fucking farmer?

Cantaloupe@lemmy.fedioasis.cc · 6 hours ago

Dumb as fuck.
