Reinforcement Learning

DeepSeek's R1 model has recently become hugely popular, and before it OpenAI's o1 model also drew a great deal of attention. Both rely on a similar technique: reinforcement learning (RL).

RL Basics

PPO

Last update: 2025-03-27 23:10:53
Created: 2025-03-13 00:28:33