Google researchers introduce ‘Internal RL,’ a technique that steers a model's hidden activations to solve long-horizon tasks ...
“I was curious to establish a baseline for when LLMs are effectively able to solve open math problems compared to where they ...