https://github.com/RyanLiu112/GenPRM
Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".
- Host: GitHub
- URL: https://github.com/RyanLiu112/GenPRM
- Owner: RyanLiu112
- License: MIT
- Created: 2025-03-30T07:40:43.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2025-04-24T03:17:38.000Z (6 months ago)
- Last Synced: 2025-04-24T04:20:27.311Z (6 months ago)
- Topics: large-language-model, o1, process-reward-model, r1, test-time-scaling
- Language: Python
- Homepage: https://ryanliu112.github.io/GenPRM
- Size: 16.2 MB
- Stars: 66
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- StarryDivineSky - RyanLiu112/GenPRM - Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning". The project aims to scale the test-time compute of process reward models (PRMs) through generative reasoning. GenPRM's core idea is to use a generative model to produce multiple reasoning processes, evaluate them with a process reward model, and select the best reasoning path. This allows deeper exploration at test time and improves model performance. The project focuses on generating and evaluating these reasoning processes efficiently, and it likely contains code for training the generative model, implementing the process reward model, and running generative reasoning (see the sketch after this entry). It provides a valuable platform for research on improving process reward models with generative methods. (A01 Text Generation / Text Dialogue / Large Language Dialogue Models and Data)
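The entry above describes a test-time selection loop: sample several step-by-step solutions, score each step with a process reward model, and keep the highest-scoring path. The following is a minimal, runnable sketch of that loop under stated assumptions; `generate_candidates`, `score_steps`, `select_best_solution`, and the min-over-steps aggregation are hypothetical stand-ins for illustration, not the repository's actual API.

```python
# Minimal sketch of best-of-N selection with a process reward model.
# The callables below are hypothetical placeholders, not GenPRM's real
# interfaces; consult the repository's scripts for the actual entry points.
from typing import Callable, List, Sequence


def select_best_solution(
    problem: str,
    generate_candidates: Callable[[str, int], List[List[str]]],
    score_steps: Callable[[str, Sequence[str]], List[float]],
    num_candidates: int = 8,
) -> List[str]:
    """Generate several step-by-step solutions and keep the one whose
    process-level rewards are highest.

    Each candidate is a list of reasoning steps. The PRM assigns a reward
    to every step; here we aggregate by taking the minimum step reward,
    one common choice (an assumption) that penalizes any flawed step.
    """
    candidates = generate_candidates(problem, num_candidates)
    best_steps: List[str] = []
    best_score = float("-inf")
    for steps in candidates:
        rewards = score_steps(problem, steps)
        score = min(rewards) if rewards else float("-inf")
        if score > best_score:
            best_steps, best_score = list(steps), score
    return best_steps


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without any model weights.
    def fake_generate(problem: str, n: int) -> List[List[str]]:
        return [
            [f"step 1 of candidate {i}", f"step 2 of candidate {i}"]
            for i in range(n)
        ]

    def fake_score(problem: str, steps: Sequence[str]) -> List[float]:
        # Deterministic dummy rewards in [0, 1] based on step length.
        return [min(1.0, len(step) / 40.0) for step in steps]

    print(select_best_solution("1 + 1 = ?", fake_generate, fake_score, num_candidates=4))
```

In practice the candidate generator would be a policy model and the scorer a generative PRM that reasons about each step before emitting a reward; the aggregation rule (min, mean, or last-step reward) is a design choice rather than something fixed by this sketch.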