DeepSeek Permanently Reduces The Price Of Its Flagship V4 Model By 75 Percent

jaykrown@lemmy.world · 2 months ago

boatswain@infosec.pub · 2 months ago

My understanding is that tokens are basically words, and that when you ask a question it charges for all the tokens it consumes, produces, or processes. There’s a lot of internal processing for each request, where the input text is summarized in different ways and combined with previous parts of the conversation, so it’s not as straightforward as “word count of what you say plus what it says”.

iamthetot@piefed.ca · edit-2 2 months ago

Worth noting that a token is not necessarily a word, though it can be. One word can also take multiple tokens. The split also varies from LLM to LLM, depending on the tokenization method each one uses.
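
To make that concrete, here is a toy sketch of how one word can end up as several tokens. The vocabulary and the greedy longest-match rule are made up for illustration; real tokenizers (BPE, SentencePiece and so on) learn their vocabularies from data and will split text differently.

```python
# Toy illustration only. TOY_VOCAB is a hypothetical vocabulary, not the
# vocabulary of any real model.
TOY_VOCAB = frozenset({"token", "ization", "un", "believ", "able", "the", "cat"})


def toy_tokenize(word: str, vocab: frozenset = TOY_VOCAB) -> list:
    """Greedy longest-match split of one word into vocabulary pieces.

    Characters not covered by any vocabulary entry become
    single-character tokens.
    """
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, shrinking until something matches.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # No vocabulary entry starts here: fall back to one character.
            tokens.append(word[i])
            i += 1
    return tokens


for w in ("cat", "tokenization", "unbelievable"):
    print(w, "->", toy_tokenize(w))
```

With this made-up vocabulary, "cat" is one token while "tokenization" is two and "unbelievable" is three, which is why word count is only a rough guide to token count.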

teft@piefed.social · 2 months ago

> There’s a lot of internal processing for each request, where the input text is summarized in different ways and combined with previous parts of the conversation, so it’s not as straightforward as “word count of what you say plus what it says”.

In other words, obfuscation, so they can charge whatever they want using some obscure formula that only they know.

Eager Eagle@lemmy.world · edit-2 2 months ago

Not really; there are ways to count tokens before running inference. Some providers make their tokenizers public, so counting even works offline. APIs also usually report usage (and therefore cost) with each response and support budget limits, though some could offer more fine-grained limits.
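
As a rough sketch of the arithmetic, assuming the usual per-million-token pricing with separate input and output rates: once you have token counts (from a public tokenizer beforehand, or from the usage reported in the response), the cost is a simple product. The function name and the prices below are placeholders for illustration, not DeepSeek's actual rates.

```python
def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimate the cost of one request from its token counts.

    Prices are in USD per one million tokens, billed separately
    for input and output.
    """
    input_cost = input_tokens / 1_000_000 * input_price_per_million
    output_cost = output_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


# Hypothetical prices, chosen only to keep the numbers easy to follow.
OLD_INPUT_PRICE, OLD_OUTPUT_PRICE = 1.00, 2.00

# A 75 percent reduction leaves 25 percent of the original price.
NEW_INPUT_PRICE = OLD_INPUT_PRICE * 0.25
NEW_OUTPUT_PRICE = OLD_OUTPUT_PRICE * 0.25

before = estimate_cost_usd(1_000_000, 500_000, OLD_INPUT_PRICE, OLD_OUTPUT_PRICE)
after = estimate_cost_usd(1_000_000, 500_000, NEW_INPUT_PRICE, NEW_OUTPUT_PRICE)
print(f"before: ${before:.2f}, after: ${after:.2f}")
```

Nothing in the formula is hidden: the inputs that vary are the token counts, which is why being able to count tokens ahead of time matters.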

Users who are surprised by the bill are usually not paying attention to each call, or are using autonomous subagents, or have a setup where they have little or no control over what is sent to the provider.

So the problem isn’t really the API provider so much as the tooling around it, which makes it too easy to overspend.
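
For what it's worth, a client-side guard is not much code. Below is a minimal sketch, assuming you can estimate the cost of each call before sending it. `BudgetGuard`, `BudgetExceeded` and `reserve` are hypothetical names, not part of any provider's SDK.

```python
class BudgetExceeded(Exception):
    """Raised when a call would push spending past the configured limit."""


class BudgetGuard:
    """Tracks estimated spend and refuses calls that would exceed a limit."""

    def __init__(self, limit_usd: float) -> None:
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def reserve(self, estimated_cost_usd: float) -> None:
        """Record the estimated cost of a call, or raise before it is sent."""
        if self.spent_usd + estimated_cost_usd > self.limit_usd:
            raise BudgetExceeded(
                f"call would bring spend to "
                f"{self.spent_usd + estimated_cost_usd:.2f} USD, "
                f"over the {self.limit_usd:.2f} USD limit"
            )
        self.spent_usd += estimated_cost_usd


guard = BudgetGuard(limit_usd=1.00)
guard.reserve(0.40)  # first call fits in the budget
guard.reserve(0.40)  # second call fits too
try:
    guard.reserve(0.40)  # third call would exceed the limit
except BudgetExceeded as exc:
    print("blocked:", exc)
```

An agent loop or subagent framework would call `reserve` before every request, so a runaway loop stops at the limit instead of at the invoice.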