It appears there is a slight misunderstanding of the acronym: Yann LeCun’s architecture is called JEPA (Joint Embedding Predictive Architecture), not JASP.
JEPA differs from Large Language Models (LLMs) primarily in how it learns, what it predicts, and how it represents the world. While LLMs are “word models” designed to generate sequences of tokens, JEPA is intended to create “world models” that understand the underlying physics and logic of reality [3][4].
The key differences are summarized below:
Generative vs. Predictive (Non-Generative)
LLMs are Generative: They operate by predicting the next token in a sequence (generative AI) [2]. This approach often leads to hallucinations because the model focuses on statistical probability rather than factual ground truth [6].
JEPA is Predictive: Instead of generating every single pixel or word, JEPA predicts latent representations (embeddings) in a hidden space [5]. It tries to learn what is “plausible” rather than attempting to reconstruct every single detail of the input.
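To make the token-space vs. latent-space distinction a bit more concrete, here is a minimal sketch in plain Python. It is only an illustration, not JEPA's actual training code: the function names, the toy numbers, and the choice of mean squared error for the latent loss are all assumptions made for this example, and a real system would also need encoders, a predictor network, and some way to stop the embeddings from collapsing.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution over the vocabulary."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_loss(logits, target_index):
    """Cross-entropy on the observed next token: the error is measured in token space."""
    return -math.log(softmax(logits)[target_index])

def latent_prediction_loss(predicted_embedding, target_embedding):
    """Mean squared distance between embeddings: the error is measured in latent space.
    (Squared error is an assumption for this sketch.)"""
    n = len(target_embedding)
    return sum((p - t) ** 2 for p, t in zip(predicted_embedding, target_embedding)) / n

# Toy values, purely illustrative.
token_loss = next_token_loss([2.0, 0.5, -1.0], target_index=0)
latent_loss = latent_prediction_loss([0.9, 0.1], [1.0, 0.0])
print(token_loss, latent_loss)
```

The point of the contrast: the first loss forces the model to commit to a distribution over every possible output symbol, while the second only asks that the predicted embedding land near the embedding of what actually happened, so details the encoder discards never have to be predicted.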
Word Models vs. World Models
LLMs are “Word Models”: They learn from text and treat intelligence as a language manipulation task [4]. LeCun argues that language captures only a small subset of human thinking and cannot represent high-dimensional physical spaces [2][7].
JEPA aims for “World Models”: It is designed to understand cause and effect, physics, and the physical environment [1]. This allows the system to reason from first principles and plan sequences of actions, which is a prerequisite for autonomous AI [1].
System 1 vs. System 2 Thinking
LLMs (System 1): LeCun describes LLMs as “System 1” processes: they are reactive and perform a fixed amount of computation to produce each token [2].
JEPA (Path to System 2): By incorporating world models, JEPA is intended to enable “System 2” thinking, meaning the ability to plan, reason, and deliberate before acting [1].
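As a rough illustration of what "planning with a world model" can mean, here is a toy sketch in plain Python. Everything in it is hypothetical: `world_model` is a made-up one-line dynamics function (a position on a number line), and `plan` is a brute-force search over action sequences. It is not LeCun's proposed method, just the simplest example of deliberating over imagined futures before acting.

```python
from itertools import product

def world_model(state, action):
    """Hypothetical dynamics: the state is a position on a line, the action is a step."""
    return state + action

def plan(start, goal, actions=(-1, 0, 1), horizon=3):
    """Imagine every action sequence of length `horizon` with the world model
    and keep the one whose predicted end state is closest to the goal."""
    best_sequence, best_distance = None, float("inf")
    for sequence in product(actions, repeat=horizon):
        state = start
        for action in sequence:
            state = world_model(state, action)  # roll the model forward, no real action taken
        distance = abs(goal - state)
        if distance < best_distance:
            best_sequence, best_distance = sequence, distance
    return best_sequence, best_distance

sequence, distance = plan(start=0, goal=2)
print(sequence, distance)
```

Note how the amount of computation depends on the horizon and the number of candidate actions rather than being fixed per output, which is the contrast with the "fixed amount of computation per token" description above.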
Summary Comparison Table
| Feature | Large Language Models (LLMs) | JEPA / World Models |
| --- | --- | --- |
| Core Goal | Predict next token (Text/Code) | Predict latent state (Reality/Physics) |
| Method | Generative (Pixel/Token by pixel/token) | Joint Embedding (Non-generative) |
| Domain | Linguistic/Statistical patterns | Physical/Causal understanding |
| Weakness | Hallucinations, lacks physical grounding | Limited fluency in natural language |
| Cognition | Reactive (System 1) | Planning/Reasoning (System 2) |
Wait, this is my first time reading about this. Got an ELI5 or TL;DR?
Courtesy of Kagi’s search AI:
Don’t fucking post chatbot vomit.
https://letmegooglethat.com/?q=Yann+lecun+jasp