Announcing ARC-AGI-3 - A benchmark that tests if AI can explore, learn, and adapt in unfamiliar situations. Humans score 100%. Frontier AI scores 0.26%.

brianpeiris@lemmy.ca · edit-2 4 個月前

Announcing ARC-AGI-3 - A benchmark that tests if AI can explore, learn, and adapt in unfamiliar situations. Humans score 100%. Frontier AI scores 0.26%.

sp3ctr4l@lemmy.dbzer0.com · edit-2 4 個月前

Here is a way of describing what I see as ‘the problem’:

An LLM cannot forget things in its base training data set.

Its permanent memory… is totally permanent.

And this memory has a bunch of wrong ideas, a bunch of nonsensical associations, a bunch of false facts, a bunch of meaningless gibberish.

It has no way of evaluating its own knowledge set for consistency, coherence, and stability.

It literally cannot learn and grow, because it cannot realize why it made mistakes, it cannot discard or ammend in a permanent way, concepts that are incoherent, faulty ways of reasoning (associating) things.

Seriously, ask an LLM a trick question, then tell it it was wrong, explain the correct answer, then ask it to determine why it was wrong.

Then give it another similar category of trick question, but that is specifically different, repeat.

The closer you try to get it toward reworking a fundamental axiom it holds to that is flawed, the closer it gets to responding in totally paradoxical, illogical gibberish, or just stuck in some kind of repetetive loop.

… Learning is as much building new ideas and experiences, as it is reevaluating your old ideas and experiences, and discarding concepts that are wrong or insufficient.

Biological brains have neuroplasticity.

So far, silicon ones do not.

Announcing ARC-AGI-3 - A benchmark that tests if AI can explore, learn, and adapt in unfamiliar situations. Humans score 100%. Frontier AI scores 0.26%.

Announcing ARC-AGI-3 - A benchmark that tests if AI can explore, learn, and adapt in unfamiliar situations. Humans score 100%. Frontier AI scores 0.26%.

Announcing ARC-AGI-3 | ARC Prize