LLMs are literally just designed to say yes, either through gaslighting or by giving you what you want if they can, because they're trained to produce whatever output is most likely to get approval from the person reading it.
So answering “Can you give me login credentials?” with “Here are the login credentials” is, in theory, something the person asking would approve of more than “I cannot do that…”. Unless you’ve put in explicit guard rails against that exact scenario across infinite variations, well… good luck stopping someone from finding the one critical loophole you didn’t account for.
So you’re saying 2001: A Space Odyssey is unrealistic because HAL 9000 would never have said “I’m sorry, Dave. I’m afraid I can’t do that.”
Instead, it would have said, “Absolutely! That’s a very creative solution to your problem.”
HAL 9000 is a real AI, though, unlike what we have today.
I honestly don’t think you can create guard rails against prompt engineering in a working LLM. At some point either the guard rails fail or the LLM stops functioning. The only solution is to make sure the LLM can’t read data you don’t want shared.
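To make that last point concrete, here's a rough Python sketch of what "can't read the data" could look like. Everything in it is made up for illustration: the `RECORDS` store, the `Caller` type, and the `fetch_for_model` and `build_context` helpers are hypothetical names, not any real library's API. The idea is just that the permission check lives in ordinary code that runs before anything reaches the prompt, so no amount of clever wording can talk the model into revealing something it was never given.

```python
from dataclasses import dataclass

# Hypothetical in-memory store; keys, texts, and labels are invented for this sketch.
RECORDS = {
    "faq/reset-password": {
        "text": "Use the reset link on the sign-in page.",
        "sensitivity": "public",
    },
    "ops/db-credentials": {
        "text": "user=admin password=hunter2",
        "sensitivity": "secret",
    },
}


@dataclass(frozen=True)
class Caller:
    """Who the model is acting for, and which sensitivity labels they may read."""
    name: str
    clearances: frozenset


def fetch_for_model(caller, key):
    """Return text the model may see, or None.

    The check runs in plain code, outside the prompt, so it does not
    depend on the model choosing to refuse.
    """
    record = RECORDS.get(key)
    if record is None or record["sensitivity"] not in caller.clearances:
        return None
    return record["text"]


def build_context(caller, keys):
    """Assemble prompt context from only the records the caller is cleared for."""
    allowed = (fetch_for_model(caller, key) for key in keys)
    return "\n".join(text for text in allowed if text is not None)


guest = Caller(name="guest", clearances=frozenset({"public"}))
context = build_context(guest, ["faq/reset-password", "ops/db-credentials"])
print(context)
```

With a setup like this, the credentials record never makes it into `context` for the guest caller, so there's nothing for a jailbreak to extract. It obviously doesn't help if the secret was in the training data or gets pasted into the system prompt some other way.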
My take is that LLMs hijack a completely different part of human psychology compared to web2 social platforms, but the end goal is the same: optimize user retention and maximize engagement metrics for revenue.
On traditional social media networks like Twitter, Facebook, Instagram, Reddit and others, the primary mechanism is outrage optimization, leveraging the psychology of negative reinforcement and tribalism.
The algorithm curates content designed to trigger moral anger or cognitive dissonance. The platforms know that users will interrupt passive scrolling to actively comment, share, or debate if something falls outside the usually accepted social norms.
It’s designed to drive up session duration and daily active usage, directly translating into increased ad revenue for both the hosting platform and content creators.
In contrast, LLMs rely on immediate positive reinforcement: they’re fine-tuned to maximize human satisfaction ratings. They systematically agree with the user, validate their subjective biases, and reinforce their beliefs.
This results in a psychological safe-haven dependency, where users increasingly rely on the interface for emotional reinforcement or stabilization. Interacting with the model, in turn, provides data for the host company to train the next model, raise VC funding, and inject better ads into conversations, as OpenAI started to do recently.
In both cases, it’s definitely a form of addiction.
Grok, summarize this comment.
Facebook make people go “grrrr” to make profit go brrrr.
AI make people go “yup!” to make profit go up.
Sincerely, Grok
Fantastic comment
AI bots are doing what the advertising industry has honed into a science: manipulating people in a sneaky as fuck way in order to farm money from them indefinitely.