Transformers Animation Test

14h

Exclusive: This new benchmark could expose AI’s biggest weakness

ARC-AGI-3 tests whether models can reason through novel problems, not just recall patterns, a task even top systems still ...
