Basically a deer with a human face. Despite probably being some sort of magical nature spirit, his interests are primarily in technology and politics and science fiction.

Spent many years on Reddit before joining the Threadiverse as well.

  • 0 Posts
  • 1.19K Comments
Joined 2 years ago
Cake day: March 3rd, 2024

  • Yeah, I wouldn’t use a framework that didn’t let you select the base model. I’m just thinking about having it automatically switch to a different one during the review “phase”. It’s not as popular a coding agent these days, but I like using Google’s Antigravity, and it’s capable of being told to go through the sequence of steps “plan -> write documentation -> implement the plan -> run unit tests -> do a code review” automatically without needing to be prompted at each step. That’s where it would be nice to have it switch models automatically for the review.

    “Wear the reviewer hat now” does seem to work quite well with the same model, but if more models from different lineages are available it just seems like the right thing to do to switch to another one.


  • I’ve become rather disillusioned with Gemini’s use of search tools lately. It’s odd given that it’s a Google model; you’d think Google would be at the top of the search engine game. But honestly, Deepseek’s been my go-to lately when I want an answer that’s likely to be synthesized from a lot of web searches. I’ve had it search over a hundred different pages for a generic “how does this work?” sort of query. It didn’t read them all, but it casts a wide net and it lets me actually see the details. Gemini seems more willing to just tell me what it “thinks” the answer to a question is based on its training data, which is not a particularly reliable thing for an LLM to do.


  • A thing I found quite amusing about the AI agents I’ve toyed with is that they have a step where they do a code review of their changelist, usually switching to a different “persona” when they write it so that they’re not seeing it as “their own” code. It’s funny reading the critiques and compliments it gives the “other agent” it’s checking the changes for.

    I haven’t seen this feature yet, but it might be a good future enhancement to have the harness literally use a different model for the code review than the one that wrote the code in the first place. If Claude wrote the code, have GPT do the review, and vice versa, for example. Wouldn’t be surprised if the feature exists and I just haven’t spotted it yet though, things change fast.


  • Indeed. My usual analogy is “it’s a team of junior devs at your beck and call, who will do a huge amount of work quickly when you tell them to so make sure you told them to do the right work.”

    As for the documentation, commenting, unit tests, and so forth - AI is very handy for getting that stuff written too, just make sure to check up on it. For large pieces of work I will often tell the AI to write up an architecture document first, as a separate step, both to make sure it understands what I asked for and to store as future reference for the AI to make sure it doesn’t “forget” how the code is supposed to work or why it exists.

    There was a fun Python application I worked with an AI on a few months back where most of the code ran as a conventional Python program in a conventional Python environment, but one particular file was being “injected” into an entirely separate and very locked-down Python sandbox to use as a bridge between that sandbox and the rest of the application out in the outside world. That particular file would be the only one that was able to import key modules and access key data, and it couldn’t import any other modules that the “outside” application might have access to. Two very different execution environments bundled together in the same repository. But I made sure the file had comments explaining this strange setup and that there were system architecture documents explaining how it worked, and I only recall having to remind the AI once or twice that the change it wanted to make would run afoul of that. It otherwise managed the separation of functionality just fine. And that was with the previous generation of models, the current ones are even better.
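    A minimal sketch of the “bridge file” pattern described above, under stated assumptions: the original project’s sandbox details aren’t given, so all names here (`ALLOWED`, `run_bridge`, `handle`) are hypothetical illustrations of the general idea, not the actual code. The key point is that the bridge source runs in a locked-down namespace whose import hook only permits a small whitelist of modules:

    ```python
    # Hypothetical sketch: one file runs inside a restricted exec() sandbox
    # and acts as the sole bridge between that sandbox and the outer app.

    ALLOWED = {"json", "math"}  # the only modules the bridge may import

    def _restricted_import(name, *args, **kwargs):
        # Block anything outside the whitelist, mirroring the "can't import
        # modules the outside application has access to" constraint.
        if name.split(".")[0] not in ALLOWED:
            raise ImportError(f"module {name!r} is blocked in the sandbox")
        return __import__(name, *args, **kwargs)

    def run_bridge(bridge_source: str, payload: dict) -> dict:
        # Fresh namespace whose __builtins__ exposes only the restricted
        # importer and a couple of safe names.
        env = {
            "__builtins__": {
                "__import__": _restricted_import,
                "len": len,
                "isinstance": isinstance,
            },
        }
        exec(bridge_source, env)       # the bridge file defines handle()
        return env["handle"](payload)  # hand data across the boundary

    # Example bridge source: may import json (allowed), nothing else.
    bridge = """
    import json
    def handle(payload):
        return {"echo": json.dumps(payload)}
    """

    result = run_bridge(bridge, {"x": 1})
    ```

    Comments in the bridge file explaining this split, plus an architecture document, are what keep both human and AI contributors from “fixing” the bridge by adding an import that the sandbox would reject at runtime.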




  • In this case the evidence is literally first-hand experience. There is nothing that will change my mind on this because it’s my direct personal experience from actual use.

    I honestly don’t care what marketing says, and if other people have different experiences then that’s just them. In my personal actual real-world experience I found that they let me get tons more done and their quality of work is perfectly fine as long as you’re using the right tools and giving them the right instructions.

    The article says that developers are disagreeing with that in situations where they are “forced” to use AI, and that’s fair, it doesn’t make sense to force a tool to be used for something it’s not good at. They might be using it wrong. I use it whenever it’s better than not using it, and that ends up being quite often in my workflow.



  • Well, I’ve seen large projects without extensive unit tests before. In the last big project I remember that had them, before coding agents, they were largely a checkbox: developers implemented them with a grumble when first deploying a new system, and then they were slowly disabled one by one as later changes broke them.

    These were stand-alone projects, though, with a large QA department and without an expectation of future versions directly descended from them once deployed. If it worked then it worked, that was all that was needed at the end of the day.


  • Since you brought up the notion that we might be doing different styles of development, I was giving you context as to the kinds of development that I do. Sounds like we might not be doing such different scales of development after all, but I couldn’t have known that until you gave that information just now.

    This isn’t supposed to be some kind of duel or argument, I don’t see the point of that. I’m just explaining my usage of coding agents and specifically unit tests in that context. Since that’s what you were questioning.


  • Could be. I’m a professional programmer whose usage runs the whole gamut - large applications with hundreds of programmers working on them for years, smaller apps that I make for my own use, and one-off scripts to do some particular task and then generally throw away afterwards.

    I don’t do unit tests for that last category, of course. I don’t even use coding agents for those, generally speaking - a bit of back-and-forth in a chat interface is usually enough there.



  • Have you tried giving it coding standards and other such preferences about how you like your code to be organized? I’ve found that coding agents can be quite adaptable to various styles, you can put stuff like “try to keep functions less than 100 lines long” or “include assertions validating all function inputs” into your coding agent’s general instructions and it’ll follow them.

    For me, one of the things that’s a huge fundamental improvement is telling the agent to create and run unit tests for everything. That way when it does mess up accidentally it can immediately catch the problem and usually fixes it in the same session without further intervention. Unit tests used to be more trouble than they were worth most of the time, now I love them.
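    A small illustration of what following those kinds of rules looks like, assuming hypothetical names throughout: a short function with assertion-based input validation (per the example instructions above), followed by the sort of tiny unit test an agent can generate and run immediately to catch its own mistakes:

    ```python
    # Illustrative only: a function written under rules like "keep functions
    # short" and "include assertions validating all function inputs".

    def normalize_scores(scores: list[float]) -> list[float]:
        # Input validation per the "assert all function inputs" rule.
        assert isinstance(scores, list), "scores must be a list"
        assert scores, "scores must be non-empty"
        assert all(isinstance(s, (int, float)) for s in scores), "scores must be numeric"
        total = sum(scores)
        assert total > 0, "scores must sum to a positive value"
        return [s / total for s in scores]

    # A unit test the agent can run right after writing the function; a
    # regression here gets caught in the same session, before any human review.
    def test_normalize_scores():
        out = normalize_scores([1.0, 3.0])
        assert out == [0.25, 0.75]
        assert abs(sum(out) - 1.0) < 1e-9

    test_normalize_scores()
    ```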


  • Developers who are told to use AI whether they like it or not, however, tell a different story.

    Well there’s the problem.

    I’m a software developer and I say that AI is the greatest force-multiplier that’s been introduced into the field since the compiler. I love using it, it handles the most tedious and annoying parts of the process. But there are situations I don’t want to use it in, and of course being forced to use it would give me a more negative opinion of it. Obviously.





  • As a Canadian who holds negative views of both the American and Chinese governments, I think to myself: which am I more likely to visit someday, and which will therefore have the opportunity to stick me in an ICE detention center when they look up my profile and discover those views? Which of the two governments is a more direct threat to my own country’s security and sovereignty?

    I get an answer that would perhaps surprise Americans.