There’s a lot of internal processing for each request, where the input text is summarized in different ways and combined with previous parts of the conversation, so it’s not as straightforward as “word count of what you say plus what it says”.
In other words, obfuscation, so they can charge whatever they want using some obscure formula that only they know.
Not really. There are ways to count tokens before running inference. Some providers publish their tokenizers, so you can even count offline. APIs also usually report token usage or cost with each response and offer budget limits, though some of those limits could be more fine-grained.
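To make the “count before you call” point concrete, here’s a rough sketch of a pre-flight cost estimate. Everything in it is an assumption for illustration: `rough_token_count` is a crude 4-characters-per-token stand-in rather than any provider’s real tokenizer, and the prices are made-up per-million-token rates. In practice you would pass in the provider’s own token counter as `count_tokens`.

```python
from typing import Callable


def rough_token_count(text: str) -> int:
    """Hypothetical stand-in: roughly 4 characters per token.

    This is NOT a real tokenizer. Swap in the provider's published
    tokenizer or token-counting endpoint for accurate numbers.
    """
    return len(text) // 4


def estimate_cost_usd(
    prompt: str,
    max_output_tokens: int,
    input_price_per_mtok: float,   # assumed price in USD per 1M input tokens
    output_price_per_mtok: float,  # assumed price in USD per 1M output tokens
    count_tokens: Callable[[str], int] = rough_token_count,
) -> float:
    """Upper-bound estimate: counted input plus the full output allowance."""
    input_tokens = count_tokens(prompt)
    total = (
        input_tokens * input_price_per_mtok
        + max_output_tokens * output_price_per_mtok
    )
    return total / 1_000_000


if __name__ == "__main__":
    prompt = "Summarize the following report. " * 100
    # Prices below are placeholders, not any provider's actual rates.
    cost = estimate_cost_usd(prompt, 1000, 3.0, 15.0)
    print(f"Estimated worst-case cost: ${cost:.4f}")
```

Because it charges for the full `max_output_tokens`, this is a worst case; the actual bill for a call should come in at or below it, assuming the token counter is accurate.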
Users who are surprised by the bill are usually not paying attention to each call, or are running autonomous subagents, or are using a setup where they have little or no control over what is sent to the provider.
So the problem isn’t so much the API provider as the tooling around it, which makes it too easy to overspend.
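For what it’s worth, the kind of guardrail the tooling could add is small. Here’s a minimal sketch of a client-side spend cap; `BudgetGuard` and `BudgetExceeded` are hypothetical names invented for this example, not part of any real SDK, and the costs passed in are assumed to come from your own estimate and from whatever usage figures the API returns.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a call would push spending past the configured limit."""


class BudgetGuard:
    """Hypothetical client-side spend cap for a session of API calls."""

    def __init__(self, limit_usd: float) -> None:
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def check(self, estimated_cost_usd: float) -> None:
        """Call before a request; refuses if the estimate would exceed the cap."""
        if self.spent_usd + estimated_cost_usd > self.limit_usd:
            raise BudgetExceeded(
                f"would spend {self.spent_usd + estimated_cost_usd:.4f} USD, "
                f"limit is {self.limit_usd:.4f} USD"
            )

    def record(self, actual_cost_usd: float) -> None:
        """Call after a response, using the cost derived from reported usage."""
        self.spent_usd += actual_cost_usd


if __name__ == "__main__":
    guard = BudgetGuard(limit_usd=1.00)
    guard.check(0.25)   # allowed: nothing spent yet
    guard.record(0.20)  # the real call turned out cheaper than estimated
    print(f"Spent so far: ${guard.spent_usd:.2f}")
```

An agent loop or subagent runner would call `check` before every request and `record` after it, so a runaway loop stops at the cap instead of at the invoice.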