Ask ChatGPT to estimate the carbs in your lunch. Now ask it again. And again. Five hundred times. You'd expect the same answer each time. It's the same photo, the same model, the same question. But you won't get the same answer. Not even close, and the differences are large enough to matter.
AI doesn't see individual characters; it sees tokens, and most tokens are a word or part of a word. That's why per-character questions have such a high failure rate.
If it doesn't understand the simple concept of the number of letters and spaces, it needs to be reprogrammed.
How many letters are there in 令牌? It's a simple question, right? You wouldn't need to search for it to find out, would you?
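The tokenization point above can be sketched in a few lines. This is a toy greedy longest-match tokenizer with a made-up vocabulary and invented token IDs, not the BPE scheme any real model uses, but it shows why the letters inside a token become invisible to the model:

```python
# Toy greedy longest-match tokenizer. The vocabulary and IDs below are
# invented for illustration only.
VOCAB = {"straw": 101, "berry": 102, "str": 103, "aw": 104, "b": 105,
         "e": 106, "r": 107, "y": 108, "s": 109, "t": 110, "a": 111, "w": 112}

def tokenize(text):
    """Greedily match the longest vocabulary entry at each position."""
    tokens = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest piece first
            piece = text[i:j]
            if piece in VOCAB:
                tokens.append(piece)
                i = j
                break
        else:
            raise ValueError(f"no token for {text[i]!r}")
    return tokens

pieces = tokenize("strawberry")
print(pieces)                      # ['straw', 'berry']
print([VOCAB[p] for p in pieces])  # [101, 102]
```

The model only ever receives the ID sequence `[101, 102]`. Nothing in that input says how many r's are in "strawberry", so a character-counting question has to be answered from patterns in the training data rather than by actually counting.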
It doesn’t understand anything though? It never will. It’s a probability machine. If you choose to believe its output, that’s on you. I use it as a coding assistant to get boring things done faster. Fire a prompt at claude code, grab a coffee, check out the diff. But that last step is crucial. Can’t trust AI output blindly.
The embedding layer post tokenization is not just a probability machine the way you're suggesting. You can argue that it is probabilistic with inferred sentiment, but too many people think it works the way text prediction on your phone does, and that is just factually inaccurate.
Verify output of course, but saying “it doesn’t understand anything” and “probability machine” is a borderline erroneous short sell. At the level of tokens it “understands” relationships, and those relationships are not probabilistic, though they are fundamentally approximated based on a training corpus.
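The "relationships at the level of tokens" claim can be made concrete with embeddings. Below is a minimal sketch using invented 3-dimensional vectors (real models learn hundreds or thousands of dimensions from a corpus): related tokens end up geometrically close, measured here by cosine similarity:

```python
# Sketch: token "relationships" as geometry in embedding space.
# These 3-D vectors are invented for illustration; real embeddings
# are learned from a training corpus, not hand-written.
import math

EMBED = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

def cosine(u, v):
    """Cosine similarity: 1.0 for parallel vectors, near 0 for unrelated ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(cosine(EMBED["king"], EMBED["queen"]))  # close to 1: related tokens
print(cosine(EMBED["king"], EMBED["apple"]))  # much smaller: unrelated tokens
```

The point of the sketch is that these distances are fixed properties of the learned vectors, not random draws, which is why "it's all just probability" undersells what the embedding layer encodes.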
Can you explain how it’s more than probability? It’s using a neural network to guess the most likely next token, isn’t it?
You could also say that it chooses what will be the next word it will say to you.

It has a few words to choose from, which it has selected in relation to the previously spoken words, your question, and previous interactions (the context).

The probability you're talking about (a number) could also be seen as its preference among those words.

I'm not sure the probability vocabulary/analogy is necessarily the best one. The best might be to not employ any analogy at all, but then you have to dig deeper into the subject to form an informed opinion.

This series of videos explains it better than I do: https://www.youtube.com/watch?v=aircAruvnKk&list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi
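The "choosing among a few candidate words" picture above can be sketched directly: turn scores into a probability distribution with a softmax, then sample by preference. The candidate words and their scores below are invented for illustration:

```python
# Sketch: next-word choice as softmax over candidate scores, then
# weighted sampling. The context and scores are hypothetical.
import math
import random

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {w: math.exp(s - m) for w, s in scores.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}

# Hypothetical scores for next-word candidates given the context
# "The cat sat on the".
scores = {"mat": 3.1, "floor": 2.4, "roof": 0.8}
probs = softmax(scores)

random.seed(0)
next_word = random.choices(list(probs), weights=list(probs.values()))[0]
print(probs)      # "mat" gets the largest share of the probability mass
print(next_word)
```

Seen this way, the number attached to each word really does behave like a preference ranking: "mat" is usually picked, but "floor" and "roof" remain live options, which is also why repeated runs of the same prompt can produce different answers.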