https://github.com/naidezhujimo/exploring-the-limit-of-outcome-reward-for-learning-mathematical-reasoning
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
- Host: GitHub
- URL: https://github.com/naidezhujimo/exploring-the-limit-of-outcome-reward-for-learning-mathematical-reasoning
- Owner: naidezhujimo
- Created: 2025-04-29T10:07:46.000Z
- Default Branch: main
- Last Pushed: 2025-04-29T10:08:13.000Z
- Last Synced: 2025-04-29T11:22:59.686Z
- Topics: llm, rl, testtime
- Language: Python
- Homepage:
- Size: 6.84 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Exploring-the-Limit-of-Outcome-Reward-for-Learning-Mathematical-Reasoning
https://arxiv.org/abs/2502.06781