https://github.com/naidezhujimo/exploring-the-limit-of-outcome-reward-for-learning-mathematical-reasoning

Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning

llm rl testtime

Host: GitHub
URL: https://github.com/naidezhujimo/exploring-the-limit-of-outcome-reward-for-learning-mathematical-reasoning
Owner: naidezhujimo
Created: 2025-04-29T10:07:46.000Z
Default Branch: main
Last Pushed: 2025-04-29T10:08:13.000Z
Last Synced: 2025-04-29T11:22:59.686Z
Topics: llm, rl, testtime
Language: Python
Homepage:
Size: 6.84 KB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Exploring-the-Limit-of-Outcome-Reward-for-Learning-Mathematical-Reasoning
https://arxiv.org/abs/2502.06781