https://github.com/NineAbyss/S2R
This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning"
Last synced: 9 months ago
- Host: GitHub
- URL: https://github.com/NineAbyss/S2R
- Owner: NineAbyss
- License: MIT
- Created: 2025-02-18T16:56:50.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2025-03-12T07:19:55.000Z (11 months ago)
- Last Synced: 2025-03-12T07:32:18.992Z (11 months ago)
- Language: Python
- Size: 14.9 MB
- Stars: 45
- Watchers: 2
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
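- StarryDivineSky - NineAbyss/S2R - Provides the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning". The core idea is to train LLMs with reinforcement learning so that they can identify errors in their own output and correct them, improving the quality and reliability of generated content. S²R aims to address the tendency of LLMs to make mistakes on complex tasks: through self-reflection and iterative refinement, the model completes tasks more accurately. The repository contains the tools and scripts needed to train and evaluate S²R models, making it easier for researchers to reproduce the experimental results and build on them. The project's highlight is its use of a reinforcement learning framework to give LLMs the ability to self-correct, an innovative approach to improving LLM performance. (A01_Text Generation_Text Dialogue / Large language dialogue models and data)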