AI costs spike as subscriptions hit pricing wall — firms turn towards Chinese LLMs, open-source models to extend budget

sanitation@lemmy.today · 1 month ago

AI costs spike as subscriptions hit pricing wall — firms turn towards Chinese LLMs, open-source models to extend budget

Ilovethebomb@sh.itjust.works · 1 month ago

They don’t explain what you’d need to do to actually maximise one of these plans, would you be hammering it with prompts 24/7 or something?

setsubyou@lemmy.world · 1 month ago

Nowadays agents like Claude Code can run autonomously for hours just given a goal description. It doesn’t take a lot of human effort at all to set up a bunch of sessions, and these companies don’t limit how many instances you run in parallel. Agents can also spawn sub-agents that run in parallel if a task calls for parallelization. Whether all this produces good results is a different story, especially if you don’t put enough effort into the goal description. But burning tokens as such is not difficult.

Even workflows where you’re just chatting with an agent can burn a lot of tokens. When you’re chatting with an LLM, the entire history becomes part of the input each time you send something. This also applies to tool calls, so if the agent decides to read 20 files before it can work on your request that’s 20 times a file gets added to the history and 20 times that entire growing history is then sent back as input to drive the agent’s next step.

Coding is more affected by this than many other applications because even a new conversation tends to start with the agent gathering a bunch of source code files, and then the response to a task is not just a bunch of text once, but a sequence of tool calls to make edits across files, build, run tests, react to test failures, and so on, all for one actual human prompt - but in reality a back-and-forth between the LLM and the harness with a quickly growing history.

tal@lemmy.today · edit-2 1 month ago

I assume that you’d have some sort of massive workload that you span over multiple plans. You just have software to switch you from one plan to the next once you saturate the plan.

Probably not all that hard to write some kind of software that tries to make massive use of LLMs. Like, oh, I don’t know. Getting all abstract here, any problem in computer science where you have a problem that you don’t know how to solve directly, but you can easily check whether an answer is correct. Then you just keep trying to solve it, and repeatedly check whether the generated answer is correct or not.

Another possibility is that you have a problem where you can quickly check the quality of a given solution (either via human assistance or software, even though you don’t know how to solve the problem yourself), and want to generate a number of solutions and pick the best.

I’ve certainly seen that with image-generating diffusion models, rather than LLMs — stuff like “batch-generate me N images using this prompt, and I’ll pick the best”. It’s an algorithmically-simple, brute-force way of improving quality, by just throwing more compute time at the problem. The human “quality evaluation” is cheap to do compared to the human time required to generate an image. Burns a lot of compute time, but the alternative to improve quality is improving the model, and if we don’t know how to do that yet…shrugs

altkey (he\him)@lemmy.dbzer0.com · 1 month ago

Not even that. A business can “implement” AI agent on their website by forwarding client’s inputs to someone else’s API, adding a prompt pointing back at them.

grumpy_cat@thelemmy.club · 1 month ago

I use cursor for work, and boy I can easily burn 100-300$ a day