Doing the Lord’s work in the Devil’s basement

  • 0 Posts
  • 222 Comments
Joined 2 years ago
Cake day: May 8th, 2024


  • I think this kind of claim really lies in a sour spot.

    On the one hand, it's trivial to fire up an IDE, plug it into GLM 4.5 or some other smaller, more efficient model, and see how it fares on a project. But that's just anecdotal. On the other hand, model creators do this thing called benchmaxing, where they fine-tune their model to hell and back to respond well to specific benchmarks. And the whole culture around benchmarks is… I don't know, I don't like the vibe: it's all AGI maximalists wanking to percent changes in performance. Not fun. So yeah, evidence is hard to come by when there are so many snake oil salesmen around.

    That said, it's pretty easy to check on your own. Install opencode, get $20 of GLM credit, make it write, deploy, and monitor a simple SaaS product, and see how you like it. Then do another one. And do a third one with Claude Code as a control if you can get a guest pass (I have some, hit me up if you're interested).

    What is certain from casual observation is that yes, small models have improved tremendously in the last year, to the point where they’re starting to get usable. Code generation is a much more constrained world than generalist text gen, and can be tested automatically, so progress is expected to continue at breakneck pace. Large models are still categorically better but this is expected to change rapidly.



  • I totally share that perspective. My controversial example is always Fury Road, because it fits those criteria so well. It delivers exactly what it says on the tin. If you come expecting something else, you're going to have a lousy time. But if you come excited about what it offers, you'll start noticing that it is engineered to near perfection with that one objective in mind.


  • There's a lot of questionable methodology and straight-up LARPing in these communities. Sure, you can probably make Opus hallucinate a crystal meth or bomb-making recipe if you get it in a roleplaying mood, but that's a far cry from actual prompt injection in live workflows.

    Anecdotally, I've been experimenting with those AI robocallers that have been spamming my phone, and even on the shitty models they use, it's non-trivial to get them to deviate from their script. I hope I can pull it off though, as it would let me hold them on the line for potentially hours doing bullshit tasks, costing their operator hundreds.












  • Kind of a tangent: this depends a lot on your use case, but I've found that GPU transcoding is not necessarily a good thing. You generally get larger files, because hardware encoders trade compression efficiency for speed, and it's not always faster than CPU either. That's because ffmpeg can distribute a software encode across all your CPU cores, so if you've got enough of those you'll get better speed multipliers than on an old GPU.
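
    To make the comparison concrete, here's a minimal sketch of the two kinds of invocation being compared. The encoder names (libx265, hevc_nvenc) are real ffmpeg encoders, but the filenames, presets, and quality values are illustrative assumptions, not tuned recommendations, and the GPU path assumes an NVIDIA card:

    ```python
    import shlex

    # CPU path: libx265 spreads the encode across all cores by default,
    # and its -crf rate control usually gives smaller files at a given quality.
    cpu_cmd = shlex.split(
        "ffmpeg -i input.mkv -c:v libx265 -preset medium -crf 22 cpu_out.mkv"
    )

    # GPU path (NVIDIA): hevc_nvenc is a fixed-function hardware encoder.
    # It's fast, but typically needs more bitrate to match libx265 quality,
    # which is where the larger files come from.
    gpu_cmd = shlex.split(
        "ffmpeg -hwaccel cuda -i input.mkv -c:v hevc_nvenc -preset p5 -cq 28 gpu_out.mkv"
    )

    # To actually run one: subprocess.run(cpu_cmd, check=True)
    ```

    A fair test is to run both on the same source, then compare file sizes and wall-clock time; on a many-core CPU versus an older GPU, the software encode often wins on both.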