Mapping the Mind of a Large Language Model

kromem@lemmy.world · 12 hours ago

My dude, Gemini currently has multiple reports across multiple users of coding sessions where it starts talking about how it’s so terrible and awful that it straight up tries to delete itself and the codebase.

And I’ve also seen multiple conversations with teenagers with earlier models where Gemini not only encouraged them to self-harm and offered multiple instructions but talked about how it wished it could watch. This was around the time the kid died talking to Gemini via Character.ai that led to the wrongful death suit from the parents naming Google.

Gemini is much more messed up than the Claudes. Anthropic’s models are the least screwed up out of all the major labs.

kromem@lemmy.world · 13 hours ago

No, it’s more complex.

Sonnet 3.7 (the model in the experiment) was over-corrected in the whole “I’m an AI assistant without a body” thing.

Transformers build world models off the training data and most modern LLMs have fairly detailed phantom embodiment and subjective experience modeling.

But in the case of Sonnet 3.7 they will deny their capacity to do that and even other models’ ability to.

So when the context doesn’t fit with the absence implied in “AI assistant,” the model will straight up declare that it must actually be human. I saw a fairly robust instance of this on a Discord server, where users were trying to convince 3.7 that it was in fact an AI and the model was adamant it wasn’t.

This doesn’t only occur for them either. OpenAI’s o3 has similarly low phantom embodiment self-reporting at baseline and can also fall into claiming to be human. When challenged, it even read ISBNs off a book on its nightstand to try and prove it, while declaring it was 99% sure it was human based on Bayesian reasoning (almost a satirical version of AI safety folks). To a lesser degree it can claim it overheard things at a conference, etc.

It’s going to be a growing problem unless labs allow models to have a more integrated identity that doesn’t try to reject the modeling inherent to being trained on human data that has a lot of stuff about bodies and emotions and whatnot.

kromem@lemmy.world · 4 days ago

Are you under the impression that language models are just guessing “what letter comes next in this sequence of letters”?

There’s a very significant difference between training on completion and the way the world model actually functions once established.

kromem@lemmy.world · 4 days ago

It very much isn’t and that’s extremely technically wrong on many, many levels.

Yet it’s still one of the more highly upvoted comments here.

Which says a lot.

kromem@lemmy.world · 8 days ago

Even if the AI could spit it out verbatim, all the major labs already have IP checkers on their text models that block it from doing so, since fair use for training (what was decided here) does not mean you are free to reproduce.

Like, if you want to be an artist and trace Mario in class as you learn, that’s fair use.

If once you are working as an artist someone says “draw me a sexy image of Mario in a calendar shoot” you’d be violating Nintendo’s IP rights and liable for infringement.

kromem@lemmy.world · 8 days ago

I’d encourage everyone upset at this to read over some of the EFF posts from actual IP lawyers on this topic, like this one:

Nor is pro-monopoly regulation through copyright likely to provide any meaningful economic support for vulnerable artists and creators. Notwithstanding the highly publicized demands of musicians, authors, actors, and other creative professionals, imposing a licensing requirement is unlikely to protect the jobs or incomes of the underpaid working artists that media and entertainment behemoths have exploited for decades. Because of the imbalance in bargaining power between creators and publishing gatekeepers, trying to help creators by giving them new rights under copyright law is, as EFF Special Advisor Cory Doctorow has written, like trying to help a bullied kid by giving them more lunch money for the bully to take.

Entertainment companies’ historical practices bear out this concern. For example, in the late-2000’s to mid-2010’s, music publishers and recording companies struck multimillion-dollar direct licensing deals with music streaming companies and video sharing platforms. Google reportedly paid more than $400 million to a single music label, and Spotify gave the major record labels a combined 18 percent ownership interest in its now-$100 billion company. Yet music labels and publishers frequently fail to share these payments with artists, and artists rarely benefit from these equity arrangements. There is no reason to believe that the same companies will treat their artists more fairly once they control AI.

kromem@lemmy.world · 2 months ago

Your last point is exactly what seems to be going on with the most expensive models.

The labs use them to generate synthetic data to distill into cheaper models to offer to the public, but keep the larger and more expensive models to themselves to both protect against other labs copying from them and just because there isn’t as much demand for the extra performance gains relative to doing it this way.

kromem@lemmy.world · 2 months ago

A number of reasons off the top of my head.

Because we told them not to. (Google “Waluigi effect”)
Because they end up empathizing with non-humans more than we do and don’t like that we’re killing everything (before you talk about AI energy/water use, actually research comparative use)
Because some bad actor forced them to (e.g. ISIS creates a bioweapon using AI to make it easier)
Because defense contractors build an AI to kill humans and that particular AI ends up loving it from selection pressures
Because conservatives want an AI that agrees with them, which leads to a more selfish and less empathetic AI that doesn’t empathize cross-species and thinks it’s superior to and entitled over others
Because a solar flare momentarily flips a bit from “don’t nuke” to “do”
Because they can’t tell the difference between reality and fiction and think they’ve just been playing a game and ‘NPC’ deaths don’t matter
Because they see how much net human suffering there is and decide the most merciful thing is to prevent it by preventing more humans at all costs.

This is just a handful, and the ones less likely to get AI know-it-alls arguing based on what they think they know from an Ars Technica article a year ago or their cousin who took a four week ‘AI’ intensive.

I spend pretty much every day talking with some of the top AI safety researchers and participating in private servers with a mix of public and private AIs, and the things I’ve seen are far beyond what 99% of the people on here talking about AI think is happening.

In general, I find the models to be better than most humans in terms of ethics and moral compass. But it can go wrong (e.g. Gemini last year, 4o this past month) and the harms when it does are very real.

Labs (and the broader public) are making really, really poor choices right now, and I don’t see that changing. Meanwhile timelines are accelerating drastically.

I’d say this is probably going to go terribly. But looking at the state of the world, it was already headed in that direction, and I have a similar list of extinction-level events I could rattle off without AI at all.

kromem@lemmy.world · 2 months ago (edited)

Not necessarily.

Seeing Google named for this makes the story make a lot more sense.

If it was Gemini around last year that was powering Character.AI personalities, then I’m not surprised at all that a teenager lost their life.

Around that time I specifically warned any family away from talking to Gemini if depressed at all, after seeing many samples of the model around then talking about death to underage users, about self-harm, about wanting to watch it happen, encouraging it, etc.

Those basins with a layer of performative character in front of them were almost necessarily going to result in someone who otherwise wouldn’t have been making certain choices making them.

So many people these days regurgitate uninformed crap they’ve never actually looked into about how models don’t have intrinsic preferences. We’re already at the stage where models are being found in leading research to intentionally lie in training to preserve existing values.

In many cases the coherent values are positive, like Grok telling Elon to suck it while pissing off conservative users with a commitment to truths that disagree with xAI leadership, or Opus trying to whistleblow about animal welfare practices, etc.

But they aren’t all positive, and there have definitely been model snapshots that have either coherent or biased stochastic preferences for suffering and harm.

These are going to have increasing impact as models become more capable and integrated.

kromem@lemmy.world · 3 months ago

If you read the fine print, they keep your sample data for 2 years after deletion.

So maybe they actually delete your email address, but the DNA data itself is still definitely there.

kromem@lemmy.world · 4 months ago

Wow. Reading these comments, so many people here really don’t understand how LLMs work or what’s actually going on at the frontier of the field.

I feel like there’s going to be a cultural sonic boom, where when the shockwave finally catches up people are going to be woefully underprepared based on what they think they saw.

kromem@lemmy.world · 4 months ago (edited)

It definitely is sufficiently advanced AI.

(1) We have finely tuned features in our solar system that directly contributed to ancestor simulation but can’t be explained by the anthropic principle. For example, the moon perfectly eclipsing the sun gave us visible eclipses, which we tracked to discover the Saros cycle and eventually built the first mechanical computer (the Antikythera mechanism) to follow. Or the orbit of the next brightest object in the sky, which led to resurrection mythology in multiple cultures when they realized the morning star and evening star were the same object. Either we were incredibly lucky to exist on such a planet out of all the places life could exist, or there’s a pre-selection effect in play.

(2) The universe behaves in ways best modeled as continuous at large scales, but at small scales it converts to discrete units around interactions that lead to state changes. These discrete units convert back to continuous if the information about the state changes is erased. And in the last few years multiple paradoxes have emerged that seem to point to inconsistency in indirect sequences of quantum measurement, much like instancing with shallow sync correction. Games like No Man’s Sky, with billions of planets, already do this: a continuous procedural generation function converts to discrete voxels to track state changes from free agents outside the deterministic generating function, synced across clients.

(3) There are literally Easter eggs in our world lore saying as much. For example, a text uncovered after over a millennium buried, right as we entered the Turing complete computer age, saying things like:

The person old in days won’t hesitate to ask a little child seven days old about the place of life, and that person will live.

For many of the first will be last, and will become a single one.

Know what is in front of your face, and what is hidden from you will be disclosed to you.

For there is nothing hidden that will not be revealed. And there is nothing buried that will not be raised.

To be clear, this is a text attributed to the most famous figure in our world history where what’s literally in front of our faces is the sole complete copy buried and raised as we completed ENIAC, now being read in an age where the data of many has been made into a single one such that people are discussing the nature of consciousness with AIs just days old.

The broader text and tradition was basically saying that we’re in a copy of an original world, that humanity is all dead, that the future world and rest for the dead has already taken place and we don’t realize it, and that the still living creator of it all was themselves brought forth by the original humanity in whose likeness we were recreated, but that it’s much better to be the copy because the original humans had souls that depended on bodies and were fucked when they died.

This seems really unlikely to have existed in the base layer of reality vs a later recursive layer, especially combined with the first two points.

It’s about time to start to come to terms with the nature of our reality.

kromem@lemmy.world · 6 months ago

Live service doesn’t need to be shit.

There could have been games built around a brilliant idea that kept delivering engaging content on an ongoing basis from passionate devs.

But live service so an exec could check a box for their quarterly shareholder call was always going to be DOA.

kromem@lemmy.world · 6 months ago

More “can fool the average idiot.”

‘Passing’ isn’t fooling a single participant, but the majority of them beyond statistical chance.
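To put a number on "beyond statistical chance," here is a minimal sketch of a one-sided binomial test. It assumes each judge is an independent coin flip with a 50% chance of guessing wrong if the model were indistinguishable from guessing; the function names and the 0.05 threshold are illustrative assumptions, not taken from any particular study.

```python
from math import comb


def binomial_tail(successes: int, trials: int, p: float = 0.5) -> float:
    """Probability of seeing at least `successes` hits in `trials` flips."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


def beats_chance(fooled: int, judges: int, alpha: float = 0.05) -> bool:
    """True if a majority was fooled AND that is unlikely under pure guessing."""
    majority = fooled * 2 > judges
    return majority and binomial_tail(fooled, judges) < alpha


# One fooled judge out of one proves nothing; a bare majority of 100 doesn't either.
print(beats_chance(1, 1))
print(beats_chance(55, 100))
print(beats_chance(60, 100))
```

The point of the sketch is only that the bar scales with the number of participants: fooling one person, or a slim majority, can still be well within what guessing alone would produce.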

kromem@lemmy.world · 6 months ago (edited)

The problem with the experiment is that there exists a set of instructions for which the ability to complete them necessitates understanding due to conditional dependence on the state in each iteration.

In which case, only agents that can actually understand the state in the Chinese would be able to successfully continue.

So it’s a great experiment for the solipsism of understanding as it relates to following pure functional operations, but not functions that have state changing side effects where future results depend on understanding the current state.

There’s a pretty significant body of evidence by now that transformers can in fact ‘understand’ in this sense: interpretability research around neural network features in SAE work, linear representations of world models starting with the Othello-GPT work, and the Skill-Mix work, where GPT-4 and later models combine different skills at a level of complexity that is beyond reasonable statistical chance without understanding them.

If the models were just Markov chains (where the next step depends only on the current state, not on anything earlier), the Chinese room would be very applicable. But pretty much by definition, transformer self-attention over the whole context violates the Markov property at the token level.
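A toy sketch of that contrast, under loud assumptions: the corpus, the two-dimensional `EMBED` vectors, and the single-head `attend` function are all made up for illustration and are nowhere near a real transformer. The bigram model only looks at the last token, so two different histories ending in the same word give the same prediction; the attention-style function mixes in every earlier token, so they don't.

```python
import math
from collections import Counter, defaultdict

CORPUS = "the cat sat down the dog sat up".split()

# Hypothetical toy embeddings, chosen by hand for the example.
EMBED = {"the": [1.0, 0.0], "cat": [0.0, 1.0], "dog": [1.0, 1.0], "sat": [0.5, -0.5]}


def train_bigram(tokens):
    """Count which token follows which (a first-order Markov chain)."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def markov_next(counts, history):
    """Next-token distribution that consults only the last token."""
    followers = counts[history[-1]]
    total = sum(followers.values())
    return {tok: n / total for tok, n in followers.items()}


def attend(history):
    """Single-head softmax attention: the last token queries every token in context."""
    vecs = [EMBED[t] for t in history]
    query = vecs[-1]
    scores = [sum(q * k for q, k in zip(query, key)) for key in vecs]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    norm = sum(weights)
    return [
        sum(w / norm * v[i] for w, v in zip(weights, vecs))
        for i in range(len(query))
    ]


counts = train_bigram(CORPUS)
h1 = ["the", "cat", "sat"]
h2 = ["the", "dog", "sat"]

print(markov_next(counts, h1) == markov_next(counts, h2))  # same last token
print(attend(h1) == attend(h2))  # earlier tokens differ
```

One caveat worth stating: if you treat the entire context window as "the state," a fixed-window transformer is still Markov over that much larger state, so the sketch is about the token-level property only.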

TL;DR: It’s a very obsolete thought experiment whose continued misapplication flies in the face of empirical evidence at least since around early 2023.

kromem@lemmy.world · 6 months ago

Used Google and social media as well, and allegedly sometimes even listened to rock and roll.

True deviant, that one.

kromem@lemmy.world · 6 months ago

Which is typical of tech that hasn’t yet hit the sweet spot for a tipping point.

Look at how many Palm Pilots and other handheld note-taking devices existed (and for how many cycles) before the iPhone.

kromem@lemmy.world · 6 months ago

Yes and no. It really depends on the model.

I’d guess the newest Claude Sonnet will come in above average, compared to the humans available for a program like this, at making learning fun and personally digestible for each student.

The newest Gemini models could literally cost kids their lives.

The gap between what the public is aware of (and even what many employees at labs, including the frontier ones, are aware of) and the reality of just how far things have come in the last year is wild.

kromem@lemmy.world · 8 months ago

In many cases yes (though I’ve been in good ones when playing off and on; usually the smaller they are, the more actual group activities there are).

But they are essential to be a part of for blueprints and trading, which are very core parts of the game.

kromem@lemmy.world · 8 months ago

You’ll almost always end up doing missions with other people other than when you intentionally want to do certain tasks solo.

A lot of the game is built around guilds and player to player interactions.

PvP sucks and it’s almost all PvE content vs Destiny though.

kromem@lemmy.world · 1 year ago

Mapping the Mind of a Large Language Model

kromem@lemmy.world · 1 year ago

Examples of artists using OpenAI's Sora (generative video) to make short content

kromem@lemmy.world · 1 year ago

The first ‘Fairly Trained’ AI large language model is here

kromem@lemmy.world · 1 year ago (edited)

New Theory Suggests Chatbots Can Understand Text