While I agree on the second paragraph, I’m gonna argue about the first, partially because I think the second invalidates the first.
These models do have some form of understanding though. There are features for bugs and typos, and general features that map descriptions and pieces of code.
The models don’t understand anything; they have rules that let them find tokens that don’t belong and fuzzy-match them to the correct tokens (typos), plus the ability to find code that breaks known rules for a language. That is no more understanding the problem than my spelling or grammar checker understands the comment I’m writing. ‘Understanding’ something requires intelligence and the ability to learn something, incorporate that knowledge into itself, and use it to better process that information, not just finding tokens that break rules.
It understands the code insofar as it helps with next token prediction.
And this is the crux of my beef, I think, because stochastic pattern matching is not understanding; it’s a mathematical representation of how the model processes your input tokens. The fact that it has to start over every time you provide it input, using the previous input/output tokens as context, is why this is not ‘understanding’. It’s just fancy token prediction that gives a middling-to-passable facsimile of intelligence and understanding.
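The "start over every time" point can be sketched with a toy next-token predictor. This is only an illustration under loud assumptions: a bigram count table stands in for the model (a real LLM is vastly more capable), and the names `train_bigrams`, `next_token`, and `generate` are hypothetical. What it shows is the loop shape: the learned table is frozen, and each step sees nothing except the context passed back in.

```python
import random
from collections import defaultdict


def train_bigrams(text):
    """Count how often each token follows another (the toy 'weights')."""
    counts = defaultdict(lambda: defaultdict(int))
    tokens = text.split()
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_token(counts, context, rng):
    """Sample one token given only the frozen counts and the context.

    Nothing is remembered between calls: all 'memory' lives in `context`.
    """
    if not context:
        return None
    options = counts.get(context[-1])  # .get() does not mutate the table
    if not options:
        return None
    words = list(options)
    weights = [options[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]


def generate(counts, prompt, max_new, seed=0):
    """Autoregressive loop: append each sampled token and feed it all back."""
    rng = random.Random(seed)
    context = prompt.split()
    for _ in range(max_new):
        tok = next_token(counts, context, rng)
        if tok is None:
            break
        context.append(tok)
    return " ".join(context)


if __name__ == "__main__":
    counts = train_bigrams("the model predicts the next token")
    print(generate(counts, "the", 4))
```

Generating text never updates `counts`; the only thing that changes from step to step is the growing context.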
The problems you note in your second paragraph fundamentally undermine the argument that there is any form of understanding to the AI, because those are basic mistakes that a trivial understanding of the problem would prevent.
That is no more understanding the problem than my spelling or grammar checker understands the comment I’m writing.
My general point would be that even a grammar checker can have some form of understanding of the text, no matter how shallow. The checker probably has a rule for when a is used versus an. If this rule generalizes to previously unseen words, that is a form of ‘understanding’ of the language being used in my view, despite being overly simplistic, while rote memorization - having a list of words that are preceded by an - may not be.
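A minimal sketch of that distinction, assuming a deliberately naive spelling-based rule (real checkers are more involved, and the word list and function names here are hypothetical): the lookup table only covers words it has already seen, while the rule extends to new words but fails where spelling and sound diverge.

```python
from typing import Optional

VOWELS = set("aeiou")

# Rote memorization: a fixed lookup of words already seen (hypothetical list).
MEMORIZED = {"apple": "an", "essay": "an", "model": "a", "token": "a"}


def article_by_list(word: str) -> Optional[str]:
    """Return the memorized article, or None for a word never seen before."""
    return MEMORIZED.get(word.lower())


def article_by_rule(word: str) -> str:
    """Naive rule: 'an' before a vowel letter, 'a' otherwise.

    Generalizes to unseen words, but only looks at spelling, not sound.
    """
    return "an" if word[:1].lower() in VOWELS else "a"


if __name__ == "__main__":
    for w in ["essay", "algorithm", "hour", "university"]:
        print(w, article_by_list(w), article_by_rule(w))
```

The rule handles the unseen "algorithm" where the list has nothing to say, yet it gets "hour" and "university" wrong, which is the shallow-but-generalizing behaviour described above.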
LLMs are a weird case, because their internal representations for many concepts generalize even across new settings / inputs - in that sense the model has a form of understanding of what it is being given - while for many other concepts the patterns break down even in the simplest of cases. It may ‘know’ that the preceding text is an essay, and that it should autocomplete accordingly, as defined in its weights, but this understanding is shallow - it does not know why it knows, or how it knows. It cannot self-reflect, as it does not see or understand its own internal workings, and cannot account for them. Yet the internal representations constitute a form of text understanding that can be useful nonetheless - it is a language model after all.
My comment was intended to show this duality, hence the contrast between the two paragraphs.
And this is the crux of my beef, I think, because stochastic pattern matching is not understanding; it’s a mathematical representation of how the model processes your input tokens. The fact that it has to start over every time you provide it input, using the previous input/output tokens as context, is why this is not ‘understanding’. It’s just fancy token prediction that gives a middling-to-passable facsimile of intelligence and understanding.
The problems you note in your second paragraph fundamentally undermine the argument that there is any form of understanding to the AI, because those are basic mistakes that a trivial understanding of the problem would prevent.
I am not entirely grasping the point you are trying to make here. I am certainly not arguing that it is conscious, self-aware, or in any way not a mechanical procedure that is being performed (I would not argue for that!). My key point is that it is not a simple black-or-white ‘it understands’ / ‘it does not understand’. It may have internal representations that relate many concepts together, allowing it to draw upon these links when generating text and giving it a certain semantic understanding of the language and text it is using, while simultaneously not having a bit of self-awareness.
My point is that saying an LLM understands anything is anthropomorphizing the LLM and leads people into thought patterns that give it an inordinate amount of authority because people equate the simulacrum of understanding/comprehension with actual understanding.
I think we just fundamentally disagree on the concept of LLMs being able to understand a topic, rather than it being a shallow statistical prediction of the correct answer, and I just can’t equate understanding with statistical predictions. The fact that the underlying math is able to generalize the prediction in novel ways lends weight to the misbelief that it understands concepts, but the decoherence that happens over long conversations should shatter the illusion.
That’s fair. I actually don’t think we disagree that much - I just think I have trouble conveying what I am trying to say. Whenever someone talks about ‘shallow statistical predictions’, I think about older techniques like Statistical Machine Translation, which even had trouble with things like word order. LLMs handle text on a higher level of abstraction (which I described as a form of textual understanding) - and hence handle things like word order better - but are still inherently statistical predictors. The model stores info about how words interact and relate to one another, but it does not ‘understand’ what the words actually (physically?) represent beyond these interactions, nor does it ‘understand’ what it is doing. That said, those interactions are modeled well enough to give a convincing replica of doing so.
That makes more sense, thanks for expanding on your point.
Like I said, I mainly take issue with describing it as ‘understanding’ due to the connotations it gives off. I’m used to AI glazers using the same wording and actually trying to make the argument that there is an understanding behind the statistical probabilities.
@Passerby6497 @8uurg #imho people believe #computers are #god like perfect machines, free of any #errors. Nothing could be further from the truth: basically every #cpu, because of its complexity, has #errors or even #security #flaws that need to be corrected afterwards via #software #microcode #updates. Yes, current #llm #ai does not understand anything. It is just very good at guessing the next token to output.