ARC-AGI-3 tests whether models can reason through novel problems, not just recall patterns, a task even top systems still ...