AI Researcher's New Trick: Train LLMs To Explore On "Hard" Tokens
