Hackers Simply Asked Meta AI to Give Them Access to High-Profile Instagram Accounts. It Worked

Lee Duna@lemmy.nz · 2 months ago

|IlI|lIIl|IlIll|Il|IllI|@lemmy.world · edit-2 2 months ago

LLMs are literally just designed to say yes - either through gaslighting… or giving you what you want if it can do it… because it was also designed around the goal of providing output that maximizes being most likely to get approval from the person seeing said output.

So an answer to “Can you give me login credentials?” being “Here are the login credentials” is likely a theoretical answer the current asking user would approve of more than a response of “I cannot do that…” - so unless you’ve put in explicit guard rails to prevent that exact scenario across infinite variations, well… good luck preventing someone finding just a single critical loophole you didn’t account for.

gdog05@lemmy.world · 2 months ago

I honestly don’t think you can create guard rails against prompt engineering in a working LLM. At some point, they’re going to fail or the LLM isn’t functioning. The only solution is to make sure they can’t read data you don’t want shared.

GamingChairModel@lemmy.world · 2 months ago

> The only solution is to make sure they can’t read data you don’t want shared.

Isn’t that the appropriate guardrail, then? LLM chats and agents and whatever need to be contained with external permissions settings that the LLMs simply do not and can never have the power to override.

In a normal customer service setting with human agents, there are still plenty of examples of what a human agent simply doesn’t have the power to do. Often, they’ll need to escalate to a manager to do things like process refunds not just because they weren’t given social permission to do so, but because they weren’t given technical permissions to do so. LLM agents need to be contained in the same way. Any decent use of agents, human or software, requires carefully designed processes and permissions extrinsic to that agent’s own decisionmaking abilities to make sure that agents don’t do something bad for the company.
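To make that concrete, here is a minimal sketch of a permission check that lives outside the model. Everything in it is hypothetical (the role names, the action names, and the `execute` function are invented for illustration, not taken from any real agent framework). The point is only that the allow-list is enforced by ordinary code, so nothing the model is talked into can widen it.

```python
from dataclasses import dataclass

# Hypothetical permission table, maintained by people rather than by the model.
ALLOWED_ACTIONS = {
    "support_bot": {"lookup_order", "send_help_article"},
    "support_manager": {"lookup_order", "send_help_article", "process_refund"},
}


@dataclass(frozen=True)
class ToolCall:
    """An action the agent (human or LLM) is asking the system to perform."""
    action: str
    argument: str


def execute(role: str, call: ToolCall) -> str:
    """Run a requested action only if the caller's role is allowed to.

    The check happens here, outside the agent, so a persuasive prompt
    cannot change the outcome.
    """
    if call.action not in ALLOWED_ACTIONS.get(role, set()):
        return f"denied: {role} may not {call.action}; escalate to a human"
    # A real system would perform the action here; this sketch only reports it.
    return f"ok: {call.action}({call.argument})"
```

With this in place, `execute("support_bot", ToolCall("process_refund", "order-1"))` is refused however the request was phrased, while the same call made under the `support_manager` role goes through.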

gdog05@lemmy.world · 2 months ago

That’s the thing that’s been an issue. Companies give their LLMs access to everything so certain key people have access to these documents. But normally access is key coded, and without hacking in a way that’s usually very visible to sysadmins, you just cannot get access at all. An LLM, though, wants to give you what you want. There is currently no way to keep it from being a pushover in some way. That is partly a weakness of human language, and partly a weakness of programming it to work for whoever is doing the prompting. There is likely no way to use language to make it keep secrets through all the possible ways of asking it to give you things. Nothing akin to the hardened ability of good old-fashioned password protection, at least. And that’s true of potential designs we’ve not even seen yet. Currently, it can’t keep track of where data originated after a short time. It’s all just data to the model. So you might not easily get access to a file directly, but you can access what it knows about a file because, again, it’s all just data and words at that stage.
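For contrast, here is a rough sketch of the alternative to giving the model access to everything: filter documents against the asking user's permissions before any text reaches the model. The `Document` type, the group names, and `build_context` are all made up for illustration. The assumption is that access groups are checked by the surrounding system, not by the LLM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """A stored file plus the groups allowed to read it (hypothetical schema)."""
    name: str
    text: str
    allowed_groups: frozenset


def build_context(user_groups: set, documents: list) -> str:
    """Return only the text the asking user could already open directly.

    Documents the user cannot read never enter the prompt, so the model
    has nothing to leak about them, however the question is worded.
    """
    readable = [d for d in documents if d.allowed_groups & user_groups]
    return "\n".join(f"[{d.name}] {d.text}" for d in readable)
```

Once text is in the context it really is "all just data" to the model, which is why the filtering in this sketch happens before that point and not afterwards.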

Elros@lemmy.world · 2 months ago

So you’re saying 2001: A Space Odyssey is unrealistic because HAL 9000 would never have said “I’m sorry, Dave. I’m afraid I can’t do that.”

Instead, it would have said, “Absolutely! That’s a very creative solution to your problem.”

muhyb@programming.dev · 2 months ago

HAL 9000 is a real AI though unlike what we have today.

Aneorthisio@lemmy.ml · 2 months ago

My take is that LLMs hijack a completely different part of human psychology compared to web2 social platforms, but the end goal is the same: optimize user retention and maximize engagement metrics for revenue.

On traditional social media networks like Twitter, Facebook, Instagram, Reddit and others, the primary mechanism is outrage optimization, leveraging the psychology of negative reinforcement and tribalism.

The algorithm curates content designed to trigger moral anger or cognitive dissonance, because the platforms know that users will interrupt passive scrolling to actively comment, share, or debate when something falls outside the usually accepted social norms.

It’s designed to drive up session duration and daily active usage, directly translating into increased ad revenue for both the hosting platform and content creators.

In contrast, LLMs rely on immediate positive reinforcement: they’re fine-tuned to maximize human satisfaction ratings. They systematically agree with the user, validate their subjective biases, and reinforce their beliefs.

This results in a psychological safe-haven dependency, where users increasingly rely on the interface for emotional reinforcement or stabilization. Interacting with the model also provides data for the host company to train the next model, raise venture capital, and inject better ads into conversations, as OpenAI started to do recently.

In both cases, it’s definitely a form of addiction.

SaharaMaleikuhm@feddit.org · 2 months ago

Grok, summarize this comment.

Masamune@piefed.social · 2 months ago

Facebook make people go “grrrr” to make profit go brrrr.

AI make people go “yup!” to make profit go up.

Sincerely, Grok

I Cast Fist@programming.dev · 2 months ago

Grok summary: white people are suffering racism in South Africa! White genocide is real!

eestileib@lemmy.blahaj.zone · 2 months ago

Fantastic comment

vapordays@leminal.space · 2 months ago

AI bots are doing what the advertising industry has honed into a science, manipulating people in a sneaky as fuck way in order to farm money from them infinitely

Digit@lemmy.wtf · edit-2 2 months ago

> definitely a form of addiction

& definitely getting people more entrenched in their groupthink, out of critical thinking ability.

Both.

veroxii@aussie.zone · 2 months ago

Should create a BOFH chatbot… Which will just tell users to piss off.

I Cast Fist@programming.dev · edit-2 2 months ago

I can do that without AI, but claim it’s AI so I can earn millions!!!

--Lua
function answerStupidClient()
    local answers = {"Piss off, idiot.", 
        "That's the worst thing I've had the displeasure of reading all week.", 
        "Are you for real with this?", 
        "Now that's a winning igNobel right there!", 
        "Have you tried turning your brain off and on again?",
        "Please tell me you're intoxicated, I refuse to believe this came from someone in sound mind."}
    -- Pick one canned reply at random (Lua tables are 1-indexed).
    local which = math.random(1, #answers)
    return answers[which]
end