idk if it is serious or not, but it is what I saw in the Indeed newsletter today.

  • pixxelkick@lemmy.world
    ↑10 ↓17 · edited · 5 hours ago

    It’s serious, and this is going to become more and more normal.

    My entire workflow has become more and more Agile Sprint TDD (but with agents) as I improve.

    Literally setting up agents to yell at each other genuinely improves their output. I have created and harnessed the power of a very toxic robot work environment. My “manager” agent swears and yells at my dev agent. My code review agent swears at the dev agent and calls their code garbage and shit.

    And the crazy thing is it’s working: the optimal way to genuinely prompt engineer these stupid robots is by swearing at them.

    It’s weird, but it overrides the “maybe the human is wrong/mistaken” stuff they’ll fall back to if they run into an issue, and instead they’ll go “no, I’m probably being fucking stupid” and keep trying.

    I create “sprint” markdown files that the “tech lead” agent converts into technical requirements, then I review that, then the manager+dev+tester agents execute on it.

    You do, truly, end up focusing more on higher level abstract orchestration now.

    Opus 4.6 is genuinely pretty decent at programming now if you give it a good backbone to build off of.

    • LSP MCPs so it gets code feedback
    • debugger MCPs so it can set debug breakpoints and inspect call stacks
    • explicit whitelisting of CLI stuff it can do to prevent it from chasing rabbits down holes with the CLI and getting lost
    • Test driven development to keep it on the rails
    • Leveraging a “manager” orchestrating overhead agent to avoid context pollution
    • designated reviewer agent that has a shit list of known common problems the agents make
    • benchmark project to get heat traces of problem areas on the code (if you care about performance)

    This sort of stuff can carry you really far in terms of improving the agent’s efficacy.
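    For anyone who wants to try the CLI whitelisting idea from the list above, here is a minimal sketch in Python. It is not the setup described in this comment: the `ALLOWED_COMMANDS` table, the `FORBIDDEN_CHARS` set and the `is_allowed` function are all hypothetical names, and a real agent harness would have its own permission mechanism. It only shows the general shape of "check the command against an explicit list before running it".

```python
import shlex

# Hypothetical allowlist: program name -> permitted subcommands.
# None means any arguments are accepted for that program.
ALLOWED_COMMANDS = {
    "git": {"status", "diff", "log"},
    "pytest": None,
    "ls": None,
}

# Characters that could chain, redirect or substitute commands in a shell.
# Rejecting them outright is deliberately conservative.
FORBIDDEN_CHARS = set(";|&<>`$\n")


def is_allowed(command_line: str) -> bool:
    """Return True only if the command matches the allowlist."""
    if any(ch in FORBIDDEN_CHARS for ch in command_line):
        return False
    try:
        parts = shlex.split(command_line)
    except ValueError:
        # Unbalanced quotes and similar parse errors are rejected.
        return False
    if not parts:
        return False
    program, args = parts[0], parts[1:]
    if program not in ALLOWED_COMMANDS:
        return False
    subcommands = ALLOWED_COMMANDS[program]
    if subcommands is None:
        return True
    # Programs with a subcommand set must name one of them first.
    return bool(args) and args[0] in subcommands
```

    With this table, `is_allowed("git status")` passes while `is_allowed("git push origin main")` and `is_allowed("ls && rm -rf /")` are refused, which is the "don't chase rabbits with the CLI" behaviour in miniature.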

    • Bane_Killgrind@lemmy.dbzer0.com
      English · ↑1 · 19 minutes ago

      Dude this boils down to “moving a hundred people is simple, I am a trained pilot and I used this 747 to move them”

      Like great, you have the thousands of hours of training time required to understand a machine of that complexity and produce results.

      Joe Dirt has 8000 hours in his puddle jumper, and that’s the majority of the people these 747s are being foisted upon. They know how to fly, and they provide that service reliably.

      Telling them to move 5 people with a machine whose volume and range they don’t need is irresponsible.

    • MangoCats@feddit.it
      English · ↑1 · 1 hour ago

      What I have found: all that stuff that was evolving over the last 30 years: roadmap definition, sprint planning, unit tests, regular independent code reviews, etc. etc. etc. that those of us who “knew what we were doing” mostly looked down on as the waste of time that it was (for us), well… now you’ve got these tools that spew out 6 man-months of code in a few hours, and all those time-wasting code quality improvement / development management techniques… yeah, they apply, in spades. If you do all that stuff, and iterate at each quality gate until you’ve got what you’re supposed to have before proceeding, those tools actually can produce quality code - and starting around Opus 4.6 I’m not feeling the sort of complexity ceiling that I was feeling with its predecessors.

      Transparency is key. Your code should provide insight into how it is running: insights the agent can understand (log files) and insights you can understand (graphs and images, where applicable). If it’s just a mystery box, it’s unlikely to ever do anything complex successfully, but if it’s a collection of highly visible white boxes in a nice logical hierarchical structure, Opus 4.6 can do that.
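      As one possible illustration of "log files the agent can understand", here is a small sketch using Python's standard `logging` module to emit one JSON object per line. The `JsonFormatter` class, the `make_logger` helper and the `data` field name are assumptions made for this example, not something from the comment above; any consistent, machine-readable format would serve the same purpose.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "component": record.name,
            "message": record.getMessage(),
        }
        # Passing extra={"data": {...}} on a log call attaches structured fields.
        payload.update(getattr(record, "data", {}))
        return json.dumps(payload, sort_keys=True)


def make_logger(name, stream):
    """Build a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    return logger
```

      A call such as `log.info("batch finished", extra={"data": {"rows": 120, "skipped": 3}})` then produces a line an agent can parse field by field instead of guessing at free-form text.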

      Unit tests seem to be well worth the extra time invested - though they do slow down progress significantly, they’re faster than recovering from off-rails adventures.

      Independent reviewer agents (a clear context window, at a minimum) are a must.

      If your agent can exercise the code on the target system, and read all the system log files as well as the log files it generates, that helps tremendously.

      My latest “vibe tool” is the roadmap. It used to be “the plan” - but now the roadmap lays out where a series of plans will be deployed. As the agent works through a plan, each stage of the plan seems to get a to-do list… Six months ago, it was just to-do lists, and agents like Sonnet 3.5 would sometimes get lost in those. Including documentation (developer-facing architecture and specifications for the tests, plus user-facing docs), and updating that documentation along with removing technical debt in the code at the end of each roadmap plan stage, also slows things down, but it keeps development on track much better than just “going for delivery.” So, instead of 6 months of output in a day, maybe we’re making 2 months of progress in a day, and generating about 10x the tests and documentation we would have in those 2 months traditionally, all in a day of “realtime” with the tool. 40:1 speedup, buried under 500:1 volume of documents created.
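      A quick sanity check of the 40:1 figure, as a sketch. The 20 working days per month is an assumption of this example (the comment does not say how a month is counted), and `speedup` is just a hypothetical helper name.

```python
# Assumption for this sketch: a working month is about 20 working days.
WORKING_DAYS_PER_MONTH = 20


def speedup(months_of_progress, days_spent=1.0):
    """Ratio of traditional working days delivered per day actually spent."""
    return months_of_progress * WORKING_DAYS_PER_MONTH / days_spent


print(speedup(2))  # 40.0  -> "2 months of progress in a day"
print(speedup(6))  # 120.0 -> the raw "6 months of output in a day"
```

      Under that assumption the quality gates cost roughly two thirds of the raw output rate, which matches the trade-off described above.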

      • Oriel Jutty :hhHHHAAAH:@infosec.exchange
        ↑1 ↓1 · 53 minutes ago

        > roadmap definition, sprint planning, unit tests, regular independent code reviews, etc. etc. etc. that those of us who “knew what we were doing” mostly looked down on as the waste of time that it was

        You sound insane.

        • MangoCats@feddit.it
          English · ↑1 · 38 minutes ago

          Insane, yet reliably employed in the field for 30+ years - first and current job for more than a decade.

    • cub Gucci@lemmy.today
      ↑13 · 4 hours ago

      I am genuinely trying to keep up with things, but what I see is completely different from what you’ve been describing

      1. My recent experience with launching a swarm (3-4 Claude Opus agents) ended in a fiasco: a simple task ate $15-20 Claude credits in less than ten minutes. It does look like science fiction, but it doesn’t produce anything.

      2. In my current role as a team lead, I had to review a lot of code and I do what I haven’t ever done: decline the whole PRs as they contain a lot of architectural changes that complexify the system in order to achieve the goal.

      3. I write much less code with Claude Code these days, mostly because I don’t trust it and have to recheck every single scenario. I trust junior engineer in our team more than I trust this instrument.

      • MangoCats@feddit.it
        English · ↑1 · 54 minutes ago

        > ate $15-20 Claude credits in less than ten minutes.

        Lay off of MAX mode.

        Also, if you’re paying API rates, look into the subscription options - I can’t burn the $200 subscription plan down much below 50% without pushing prompts into Claude every waking hour (unless I turn on MAX mode). At API rates? I can burn $50 in a few hours.

        > do what I haven’t ever done: decline the whole PRs as they contain a lot of architectural changes that complexify the system in order to achieve the goal.

        If you’re accepting the first thing the agent gives you, you’re almost certainly “doing it wrong” - gate it before it goes down a bad rabbit hole and redirect it, in writing, in architecture documents (which it can draft for you, and correct based on your guidance) - and when it ignores those architecture documents, which it will do when things get big and complex, break the architecture documents down into smaller chunks that apply to the various tasks at hand - yes, it can do this breakdown for you too and that’s another opportunity for you to guide the process. I try to frame the output I get from AI in my mind as: usually about 80% correct / useful, and it’s my job to identify that other 20% (which, in reality, is getting a lot smaller lately), and beef up the specifications and descriptions of the job until it can get everything to an acceptable state.

        > I don’t trust it and have to recheck every single scenario. I trust junior engineer in our team more than I trust this instrument.

        That would depend entirely on which junior engineer you are talking about, for me. I don’t trust Claude, either. But for the most part I have Claude check itself, at an appropriately granular level. If you’ve got more than 2000 lines of Claude’s code that doesn’t have good visibility into what it’s doing, why it’s doing it, and what the outputs should look like… you’re trusting it too much. But it can write that documentation and testing for you, you just have to review it - at an appropriate level. If you’re trying to do it line by line of code for a big project, maybe you should still be writing it yourself instead.

    • UnderpantsWeevil@lemmy.world
      English · ↑6 · 4 hours ago

      > Opus 4.6 is genuinely pretty decent at programming now if you give it a good backbone to build off of.

      Soup from a Stone.

      • MangoCats@feddit.it
        English · ↑1 · 40 minutes ago

        > Opus 4.6 is genuinely pretty decent at programming now if you give it a good backbone to build off of.

        > Soup from a Stone.

        To an extent, yes. The more “broth base” I feed Claude, the better it does. If I just vaguely describe a program, I get a vague implementation of my description. If I have a big, feature-rich example (or better, examples) of what I want the program to do, Claude can iterate until the output of the program it makes actually matches the examples.
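        One way to picture "iterate until the output matches the examples" is a tiny example-driven check, sketched below. Everything here is hypothetical and only for illustration: the `failing_examples` helper, the `EXAMPLES` table and the two toy programs are not from my actual workflow, they just show the loop condition an agent would be working against.

```python
from typing import Callable, Dict, List


def failing_examples(program: Callable[[str], str], examples: Dict[str, str]) -> List[str]:
    """Return the inputs whose actual output differs from the expected output."""
    return [
        given
        for given, expected in examples.items()
        if program(given).strip() != expected.strip()
    ]


# Toy "examples of what I want the program to do": input text -> expected output.
EXAMPLES = {"3 4": "7", "10 -2": "8"}


def add_program(text: str) -> str:
    """Correct toy implementation: add two whitespace-separated integers."""
    a, b = map(int, text.split())
    return str(a + b)


def buggy_program(text: str) -> str:
    """Toy implementation with a bug: mishandles negative numbers."""
    a, b = map(int, text.split())
    return str(a + abs(b))


print(failing_examples(add_program, EXAMPLES))    # []
print(failing_examples(buggy_program, EXAMPLES))  # ['10 -2']
```

        The richer the example table, the less room there is for a "vague implementation" to slip through with an empty failure list.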

    • tracyspcy@lemmy.ml (OP)
      ↑10 ↓5 · 4 hours ago

      nah such narratives are mostly pushed by Ai companies (it is obvious they need to sell it as a business tool, not a personal buddy). Of course some managers/companies are buying into this narrative, and it is also understandable bc the idea sounds like a panacea, especially if they can sell it further to investors :) and we see the whole circle of snake oil sales

      • FishFace@piefed.social
        English · ↑2 · 1 hour ago

        It’s not a “narrative”; it’s their experience. I don’t have the same experience, but do have experience of myself and colleagues using LLM agents effectively and doing more work reviewing their output than writing lines of code. Some colleagues are pretty much AI boosters, but most are very aware of its limitations.

      • PabloSexcrowbar@piefed.social
        English · ↑6 ↓4 · 4 hours ago

        > nah such narratives are mostly pushed by Ai companies

        Someone’s personal experience is an AI company narrative now?

        • tracyspcy@lemmy.ml (OP)
          ↑6 ↓3 · 4 hours ago

          it always was. look at people trying to automate everything with the help of ai bots. and before ai companies started pushing this, none of these folks spoke about it or tried to reach the same goal with IFTTT or other tools that have been here for decades.

          • limer@lemmy.ml
            ↑5 · 4 hours ago

            Some people do stuff the AI is good for: simple tasks that have been done a lot online already.

            I hate AI for coding; AI cannot work for me. I would never trust it to do anything.

          • PabloSexcrowbar@piefed.social
            English · ↑5 ↓4 · edited · 4 hours ago

            I don’t think you understand the words you’re using…

            Someone said “this is how I managed to make this work,” provided detailed explanations of it, and you’re dismissing it as propaganda rather than testing it for yourself. That is an unbelievably stupid stance.

            • MangoCats@feddit.it
              English · ↑1 · 46 minutes ago

              I don’t think a lot of people have a feel for the velocity of change… this time last year I evaluated the tools and they still felt like a waste of time for me. I looked again in August 2025 and things were… different. Not great, but you could see the potential, and the velocity of change. When Claude 4.6 dropped - whoa… not just code, it has been helping me draft plans for a new building (personal use) - I need to submit some paperwork to the county, they just hit me with a requirement for architectural elevation drawings, Claude is chewing on that problem for me right now, working from basic information about the roofline and a 2D floorplan. Oop - and it’s done, first pass took maybe 20 minutes, aaand… it’s not too bad, side elevations are quite good, I just need to remind it about the 6" roof overhangs. Front and rear are a little more funky looking, I’m guessing these will be ready after another couple of rounds of prompts, maybe 1 hour in total, as opposed to hiring an architect for the permit application… (now, will the county push back because I didn’t hire an architect? I sincerely hope not, they said photos or drawings - how am I supposed to get photos of a building that hasn’t been built yet?)

            • tracyspcy@lemmy.ml (OP)
              ↑6 ↓2 · 4 hours ago

              you are escalating it too fast, taking it to a personal level. I feel you are close to bringing moms into this. So relax, let your ai buddy play with your parts. This chat is over.

              • PabloSexcrowbar@piefed.social
                English · ↑2 ↓3 · edited · 2 hours ago

                Sorry you can’t handle someone telling you that what you’re saying doesn’t make sense. Hopefully someday you’ll grow up enough to have your words challenged.

                Edit: Oh, lemmy.ml. That explains everything.