next BIG future

Microsoft and China AI Research Possible Reinforcement Pre-Training Breakthrough

NextBigFuture
Jun 11, 2025
∙ Paid

Reinforcement Pre-Training (RPT) is a new method for training large language models (LLMs) that reframes the standard task of predicting the next token in a sequence as a reasoning problem solved with reinforcement learning (RL). Unlike traditional RL methods for LLMs, which rely on expensive human data or a limited supply of annotated data, RPT uses verifiable rewards …
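
To make the idea of a verifiable reward concrete, here is a minimal sketch. It assumes the reward is simply whether the model's final prediction matches the token that actually follows in the training text. The function names, the exact-match rule and the toy sentence are illustrative assumptions, not details confirmed by the post.

```python
def next_token_reward(predicted: str, actual: str) -> float:
    """Assumed reward rule: 1.0 if the prediction equals the real next token, else 0.0."""
    return 1.0 if predicted == actual else 0.0


def score_rollouts(predictions: list[str], actual: str) -> list[float]:
    """Score several sampled predictions made for the same context."""
    return [next_token_reward(p, actual) for p in predictions]


# Toy example: the context and its true continuation come straight from ordinary text,
# so no human labeling is needed to check the answer.
context = ["The", "capital", "of", "France", "is"]
actual_next = "Paris"

# Hypothetical final answers from four sampled reasoning attempts.
sampled = ["Paris", "Lyon", "Paris", "London"]

rewards = score_rollouts(sampled, actual_next)
print(rewards)
```

The point of the sketch is that the check is mechanical: the text itself supplies the right answer, so any plain corpus can serve as RL training signal under this assumption.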

© 2025 Nextbigfuture