https://github.com/NineAbyss/S2R

This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning"



Host: GitHub
URL: https://github.com/NineAbyss/S2R
Owner: NineAbyss
License: MIT
Created: 2025-02-18T16:56:50.000Z (11 months ago)
Default Branch: main
Last Pushed: 2025-03-12T07:19:55.000Z (11 months ago)
Last Synced: 2025-03-12T07:32:18.992Z (11 months ago)
Language: Python
Size: 14.9 MB
Stars: 45
Watchers: 2
Forks: 2
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

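StarryDivineSky - NineAbyss/S2R - verify and Self-correct via Reinforcement Learning", providing the official implementation code. Its core idea is to train LLMs with reinforcement learning so that they can identify errors in their own output and correct them, improving the quality and reliability of the generated content. The S²R method aims to address the tendency of LLMs to make mistakes on complex tasks; through self-reflection and iterative refinement, it lets LLMs complete tasks more accurately. The repository contains the tools and scripts needed to train and evaluate S²R models, so researchers can reproduce the experimental results and build on them. The project's highlight is its use of a reinforcement learning framework to give LLMs the ability to correct themselves, an innovative approach to improving LLM performance. (A01_Text Generation_Text Dialogue / Large Language Dialogue Models and Data)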
