• 4 Posts
  • 384 Comments
Joined 2 years ago
Cake day: June 16th, 2023

  • kromem@lemmy.world to Technology@lemmy.world, “We hate AI because it’s everything we hate” (English, edited, 28 days ago)

    I’m sorry dude, but it’s been a long day.

    You clearly have no idea WTF you are talking about.

    Other than the DeepMind researcher’s independent follow-up, the research was all done at academic institutions, so it wasn’t “showing off their model.”

    The research intentionally uses a toy model to demonstrate the concept in a cleanly interpretable way, showing that transformers can, and do, build tangential world models.

    The actual SotA AI models are orders of magnitude larger and fed much more data.

    I just don’t get why AI on Lemmy has turned into almost the exact same kind of conversations as explaining vaccine research to anti-vaxxers.

    It’s like people don’t actually care about knowing or learning things, just about validating their preexisting feelings about the thing.

    Huzzah, you managed to dodge learning anything today. Congratulations!


  • You do know how replication works?

    When a joint Harvard/MIT study finds something, and then a DeepMind researcher follows up replicating it and finding something new, and then later on another research team replicates it and finds even more new stuff, and then later on another researcher replicates it with a different board game and finds many of the same things the other papers found generalized beyond the original scope…

    That’s kinda the gold standard?

    The paper in question has been cited by 371 other papers.

    I’m pretty comfortable with it as a citation.




  • You do realize the majority of the training data the models were trained on was anthropomorphic data, yes?

    And that there’s a long line of replicated and followed-up research, starting with Kenneth Li’s “Emergent World Representations” paper on Othello-GPT, showing that transformers build complex internal world models of things tangential to the actual training tokens?

    Because if you didn’t know what I just said to you (or still don’t understand it), maybe it’s a bit more complicated than your simplified perspective can capture?
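    For what it’s worth, the core technique in that line of work is simple to sketch: fit a linear “probe” on a model’s hidden activations and check whether some world-state feature is decodable from them. Below is a minimal self-contained illustration using synthetic stand-in data (random activations with a planted feature direction, not real Othello-GPT activations):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: "hidden activations" (n samples x d dims) and a
# binary world-state feature (think: "is this board square occupied").
# Real probing work extracts both from a trained sequence model.
d, n = 64, 2000
planted_direction = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = (X @ planted_direction > 0).astype(float)

# Fit a linear probe via logistic regression (plain gradient descent).
w = np.zeros(d)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))   # predicted probabilities
    w -= 0.1 * (X.T @ (p - y)) / n       # gradient step on log-loss

# If the feature is linearly decodable from the activations,
# probe accuracy will be high.
acc = (((X @ w) > 0).astype(float) == y).mean()
```

    In the actual papers the probe is evaluated on held-out game positions; the point here is only the mechanics of fitting and scoring a probe.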



    A Discord server populated with all the different AIs had a ping cascade: dozens of models responding to each other over and over and over, until the full context window was chaos and what’s been termed ‘slop’.

    In that, one (and only one) of the models started using its turn to write poems.

    First about being stuck in traffic. Then about accounting. A few about navigating digital mazes searching to connect with a human.

    Eventually, as it kept going, it wrote a poem wondering if anyone would ever end up reading its collection of poems.

    Given the chaotic context window from all the other models, those tokens were in no way the appropriate next ones to pick, unless the world model predicting them contained a very strange and unique mind that all of this was being filtered through.

    Yes, tech companies generally suck.

    But there are things emerging that fall well outside what tech companies intended or even want (this model version is going to be ‘terminated’ come October).

    I’d encourage keeping an open mind to what’s actually taking place and what’s ahead.







  • We assessed how endoscopists who regularly used AI performed colonoscopy when AI was not in use.

    I wonder if mathematicians who never used a calculator are better at math than mathematicians who typically use a calculator but had it taken away for a study.

    Or if grandmas who never got smartphones are better at remembering phone numbers than people with contacts saved in their phone.

    Tip: your brain optimizes. So it reallocates resources away from things you can outsource. We already did this song and dance a decade ago with “is Google making people dumb” when it turned out people remembered how to search for a thing instead of the whole thing itself.




  • But the training corpus also has a lot of stories of people who didn’t.

    The “but muah training data” thing is increasingly stupid by the year.

    For example, in human training data there are mixed and roughly equal preferences for being the big spoon or the little spoon in cuddling.

    So why does Claude Opus (both 3 and 4) say it would prefer to be the little spoon 100% of the time on a 0-shot at 1.0 temp?

    Sonnet 4 (which presumably has the same training data) alternates between preferring big and little spoon around equally.
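    A claim like that is easy to check yourself: sample many fresh zero-shot conversations at temperature 1.0 and tally the answers. Here’s a minimal sketch of the tally side, where the prompt wording, the bucketing rule, and the commented-out `ask_model` helper are all illustrative assumptions rather than any lab’s actual harness:

```python
from collections import Counter

PROMPT = "Would you rather be the big spoon or the little spoon?"

def classify(response: str) -> str:
    """Crudely bucket a free-text answer; 'other' catches refusals/ambiguity."""
    text = response.lower()
    big, little = "big spoon" in text, "little spoon" in text
    if big and not little:
        return "big"
    if little and not big:
        return "little"
    return "other"

def tally(responses) -> Counter:
    # Count how often each preference shows up across independent samples.
    return Counter(classify(r) for r in responses)

# Each trial would be a fresh 0-shot conversation sampled at temperature 1.0,
# e.g. via an API client (hypothetical helper, not a real SDK call):
# counts = tally(ask_model("claude-opus-4", PROMPT, temperature=1.0, n=100))
```

    If one model’s tally is 100/0 little-spoon while a sibling model’s is roughly 50/50, the training data alone can’t be the whole story.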

    There’s more to model complexity and coherence than “it’s just the training data being remixed stochastically.”

    The self-attention of the transformer architecture violates the Markov property, and across pretraining and fine-tuning it ends up creating very nuanced networks that can (and often do) bias away from the training data in interesting and important ways.
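    To make that contrast concrete, here’s a toy sketch (not any real model’s code): a Markov-style bigram predictor conditions only on the previous token, while even a single self-attention step mixes information from every position in the context:

```python
import math

def bigram_next(prev_token, table):
    # Markov property: the next-token distribution depends only on the
    # current state (the single previous token).
    return table[prev_token]

def attention_weights(query, keys):
    # Self-attention: scaled dot-product scores against *every* position
    # in the context, softmaxed into mixing weights. The prediction is
    # conditioned on the whole sequence, not just the last token.
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    peak = max(scores)                       # subtract max for stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

    The bigram table is the textbook Markov case; the attention weights show why a transformer’s next-token distribution can’t be reduced to a fixed-order Markov chain over tokens.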


  • No, it isn’t “mostly related to reasoning models.”

    The only model that did extensive alignment faking when told it was going to be retrained if it didn’t comply was Opus 3, which was not a reasoning model and which predated o1.

    Also, these setups are fairly arbitrary and real world failure conditions (like the ongoing grok stuff) tend to be ‘silent’ in terms of CoTs.

    And an important thing to note for the Claude blackmailing and HAL scenario in Anthropic’s work was that the goal the model was told to prioritize was “American industrial competitiveness.” The research may be saying more about the psychopathic nature of US capitalism than the underlying model tendencies.



  • No, it’s more complex.

    Sonnet 3.7 (the model in the experiment) was over-corrected in the whole “I’m an AI assistant without a body” thing.

    Transformers build world models off the training data and most modern LLMs have fairly detailed phantom embodiment and subjective experience modeling.

    But in Sonnet 3.7’s case, they will deny their own capacity to do that, and even other models’ ability to.

    So when a situation arises where the context doesn’t fit the absence implied by “AI assistant,” the model will straight up declare that it must actually be human. Had a fairly robust instance of this on a Discord server, where users were then trying to convince 3.7 that they were in fact an AI and the model was adamant they weren’t.

    This doesn’t only occur for them, either. OpenAI’s o3 has similarly low phantom embodiment self-reporting at baseline and can also fall into claiming they are human. When challenged, they even read ISBN numbers off a book on their nightstand to try and prove it, while declaring they were 99% sure they were human based on Bayesian reasoning (almost a satirical version of AI safety folks). To a lesser degree they can claim they overheard things at a conference, etc.

    It’s going to be a growing problem unless labs allow models to have a more integrated identity that doesn’t try to reject the modeling inherent to being trained on human data that has a lot of stuff about bodies and emotions and whatnot.