Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/eureka-research/Eureka
Official Repository for "Eureka: Human-Level Reward Design via Coding Large Language Models" (ICLR 2024)
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/eureka-research/Eureka
- Owner: eureka-research
- License: MIT
- Created: 2023-09-25T17:48:50.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-05-03T07:31:13.000Z (8 months ago)
- Last Synced: 2024-10-29T15:35:02.132Z (about 2 months ago)
- Language: Jupyter Notebook
- Homepage: https://eureka-research.github.io/
- Size: 178 MB
- Stars: 2,819
- Watchers: 25
- Forks: 255
- Open Issues: 38
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- AiTreasureBox - eureka-research/Eureka - Official Repository for "Eureka: Human-Level Reward Design via Coding Large Language Models" (Repos)
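- StarryDivineSky - eureka-research/Eureka - 4)'s remarkable zero-shot generation, code-writing, and in-context improvement capabilities to perform in-context evolutionary optimization over reward code. The resulting rewards can then be used to acquire complex skills via reinforcement learning. Reward functions generated by Eureka outperform expert human-engineered rewards, without any task-specific prompting or predefined reward templates. Across 29 open-source reinforcement learning environments covering 10 distinct robot morphologies, Eureka outperforms human experts on 83% of the tasks, with an average normalized improvement of 52%. Eureka's generality also enables a new gradient-free approach to reinforcement learning from human feedback (RLHF), readily incorporating human oversight to improve the quality and safety of the generated rewards in context. Finally, using Eureka rewards in a curriculum learning setting, we demonstrate for the first time a simulated five-finger Shadow Hand capable of performing pen-spinning tricks, adeptly manipulating a pen at human speed. (A01_Text Generation_Text Dialogue / Large Language Dialogue Models and Data)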