• Hackworth@piefed.ca
      5 days ago

      Did you try it? In the few coding tasks I threw at it, it performed much better than Opus.

      • ID10T@programming.dev
        5 days ago

        I played with it at work for the afternoon when I noticed I had access. It was fine. Sure, it was an improvement, but it wasn’t so good that it could end the world. It was basically just more of the same for anyone familiar with coding agents.

      • rozodru@piefed.world
        5 days ago

        I tried it; I had to VPN in to do so, but I tried it. I gave it 5 tasks, it succeeded in 2 of them, and the rest were hallucinations. So… yeah… I guess it's much better than Opus.

        • Hackworth@piefed.ca
          5 days ago

          rest were hallucinations

          I’m having trouble parsing whatcha mean here if they were coding tasks. The code didn’t run? Ran but had 0 functionality? If they were non-coding tasks, then agreed, I didn’t notice it being significantly more accurate. Though I did appreciate the larger vocab. I wasn’t gonna be able to afford to keep using it once it went to API pricing anyway.

          • rozodru@piefed.world
            5 days ago

            Sorry, I should have been more specific. It was a mix of coding and non-coding tasks. One coding task ran fine; another just didn't work at all. One was a basic walkthrough/tutorial-type task that was accurate; the others were hallucinations.

      • abbadon420@sh.itjust.works
        5 days ago

        Of course not. It's offline, and my boss only pays for ChatGPT. I have used ChatGPT 5.4 and its performance is fine. I haven't used it for coding, but I did notice it being a bit more coherent. I'm not a power user, though. I don't work with agents. I'm sure that makes it better, but I'm not willing to pay for the tokens.

        • Hackworth@piefed.ca
          5 days ago

          I just use the web app, mostly to make self-contained HTML toys. During the brief period it was up and part of the general subscription, I asked it to make Terrace (the old board game from TNG) with very little help in the initial prompt. I kinda know how involved that task is, 'cause I manually wrote a Godot version back in '20. It nailed it with only minor fixes: 3D, reactive sound and visuals, a pretty chill music score, and Easy, Medium, and Hard AI levels to compete against. I have yet to beat it on Hard. Opus couldn't touch that. I'm pretty sure the feds' response is simple retaliation against Anthropic for not playing ball with the DoD/W, but the capability jump was definitely notable. I saw someone liken it to the jump from GPT-3.5 to GPT-4, and I agree, if not a bit more.

          • abbadon420@sh.itjust.works
            5 days ago

            Yeah, that sounds like just another LLM iteration. It's nice and all. Great technological innovation. Very impressive. But is it worth investing billions upon billions of dollars? Is it worth breaking the chip market? Is it worth breaking the job market? Is it worth (possibly) causing a complete market crash when the bubble bursts?

  • mumblerfish@lemmy.world
    5 days ago

    "I feel like making '90s-style t-shirts with 'fix this code' on the front and 'this shirt is a munition' on the back."

    This must be a reference to something specific… Does anyone know?

  • Tony Bark@pawb.social
    5 days ago

    Personally, I think this is just another chapter in the conflict the DOJ is having against Anthropic because they refused to let their tools be used for war.

    • kromem@lemmy.world
      5 days ago

      They did allow them to be used for war. Anthropic’s only red lines were autonomous weapons (technically still a ways off) and domestic surveillance (it was this one where a ‘No’ would have been relevant right now).

      It should really alarm everyone that the US government is using tactics like the first-ever declaration of an American company as a supply chain risk, or treating "fix this insecure code" as something that requires export controls and ID checks to verify users' citizenship, as a way to warn other companies to comply with its illegal usage requests.

      • XLE@piefed.social
        5 days ago

        You’re mostly right but I have a small correction (after getting the “red lines” burned into my retinas):

        • Anthropic only drew the line at fully autonomous weapons, aka the ones where the launch order would be their responsibility. Semi-autonomous ones (e.g. with a soldier hitting the "go" button at the end) were still a-ok.
        • And they only refused mass domestic surveillance, so targeted surveillance of locals (and, of course, mass surveillance of people abroad) was still on the table.