• boatswain@infosec.pub
    English
    arrow-up
    13
    ·
    7 hours ago

    My understanding is that tokens are basically words, and that when you ask a question it charges for all the tokens it consumes, produces, or processes. There’s a lot of internal processing for each request, where the input text is summarized in different ways and combined with previous parts of the conversation, so it’s not as straightforward as “word count of what you say plus what it says”.
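
    A rough sketch of why the bill is not just "my words plus its words", assuming a stateless chat API where every request resends the whole conversation so far. The function name and the token counts are made up for illustration; real providers may also apply caching or other adjustments that this ignores.

```python
# Sketch: cumulative billing in a multi-turn chat.
# Assumption: each request carries the full history plus the new message.

def billed_tokens(turns):
    """turns: list of (user_tokens, reply_tokens) pairs, one per exchange.

    Returns (total_input_tokens, total_output_tokens) billed across the chat.
    """
    history = 0        # tokens already in the conversation
    total_input = 0
    total_output = 0
    for user_tokens, reply_tokens in turns:
        # The request contains the whole history plus the new user message.
        request_tokens = history + user_tokens
        total_input += request_tokens
        total_output += reply_tokens
        # The message and the reply both become history for the next turn.
        history = request_tokens + reply_tokens
    return total_input, total_output


# Three exchanges of 100 tokens in and 200 tokens out (made-up numbers).
total_input, total_output = billed_tokens([(100, 200)] * 3)
print(total_input, total_output)  # 1200 600
```

    A naive count would give 300 input tokens for those three messages; resending the history makes it 1200.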

    • iamthetot@piefed.ca
      English
      arrow-up
      11
      ·
      edit-2
      6 hours ago

      Worth noting that a token is not necessarily a word, though it can be. A single word can also take multiple tokens. The split varies from LLM to LLM, depending on the tokenization method each one uses.
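
      To illustrate, here is a toy greedy longest-match tokenizer. The vocabulary is invented for the example and is not any real model's; real tokenizers learn their vocabulary from data, so the same text splits differently from model to model.

```python
# Toy greedy longest-match tokenizer.
# VOCAB is hypothetical, chosen only to show one word becoming several tokens.
VOCAB = {"token", "ization", "un", "believ", "able", "the", " "}

def tokenize(text, vocab=VOCAB):
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest vocabulary entry that matches at position i.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Nothing in the vocabulary matches: emit a single character.
            tokens.append(text[i])
            i += 1
    return tokens

print(tokenize("the tokenization"))  # ['the', ' ', 'token', 'ization']
print(tokenize("unbelievable"))      # ['un', 'believ', 'able']
```

      Here "the" is one token, while "tokenization" takes two and "unbelievable" takes three.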

    • teft@piefed.social
      English
      arrow-up
      2
      arrow-down
      5
      ·
      6 hours ago

      > There’s a lot of internal processing for each request, where the input text is summarized in different ways and combined with previous parts of the conversation, so it’s not as straightforward as “word count of what you say plus what it says”.

      In other words, obfuscation, so they can charge whatever they want using some obscure formula that only they know.

      • Eager Eagle@lemmy.world
        English
        arrow-up
        6
        arrow-down
        1
        ·
        edit-2
        6 hours ago

        Not really. There are ways to count tokens before running inference. Some providers make their tokenizers public, so counting even works offline. APIs also usually return per-response usage figures, so you can track a running cost, and they offer budget limits, though some of those limits could be more fine-grained.
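
        As a minimal sketch of that kind of pre-flight check: given a token count obtained beforehand (from a public tokenizer or a provider's counting facility), you can bound the cost of a call before sending it. The prices, function names, and budget below are all hypothetical; real rates depend on the provider and model.

```python
# Hypothetical prices in USD per million tokens; not any provider's real rates.
PRICE_PER_MTOK_INPUT = 3.00
PRICE_PER_MTOK_OUTPUT = 15.00

def estimate_cost(input_tokens, max_output_tokens):
    """Upper-bound cost of one call, assuming the reply uses its full output cap."""
    return (input_tokens * PRICE_PER_MTOK_INPUT
            + max_output_tokens * PRICE_PER_MTOK_OUTPUT) / 1_000_000

def within_budget(spent_so_far, input_tokens, max_output_tokens, budget):
    """True if the call's worst-case cost still fits in the remaining budget."""
    return spent_so_far + estimate_cost(input_tokens, max_output_tokens) <= budget

# A 2,000-token prompt with a 1,000-token output cap, against a 1.00 USD budget.
print(within_budget(0.50, 2000, 1000, budget=1.00))  # True
print(within_budget(0.99, 2000, 1000, budget=1.00))  # False
```

        Checking the worst case (full output cap) before each call is conservative, but it means an agent loop stops before the budget is exceeded instead of after.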

        Users who are surprised by the bill are usually not paying attention to each call, or are using autonomous subagents, or have a setup that gives them little or no control over what is sent to the provider.

        So the problem isn’t really the API provider so much as the tooling around it, which makes it too easy to overspend.