https://github.com/lucidrains/llama-qrlhf
Implementation of the Llama architecture with RLHF + Q-learning
- Host: GitHub
- URL: https://github.com/lucidrains/llama-qrlhf
- Owner: lucidrains
- License: mit
- Created: 2023-11-23T15:28:31.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2023-12-22T16:47:23.000Z (almost 1 year ago)
- Last Synced: 2024-12-10T09:50:11.893Z (12 days ago)
- Topics: artificial-intelligence, attention, deep-learning, q-learning
- Language: Python
- Homepage:
- Size: 19.5 KB
- Stars: 158
- Watchers: 21
- Forks: 6
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
## Llama - QRLHF (wip)
Implementation of the Llama (or any language model) architecture with RLHF + Q-learning.
This is experimental / independent open research, built off nothing but speculation. But I'll throw some of my brain cycles at the problem in the coming month, just in case the rumors have any basis. Anything you PhD students can get working is up for grabs.
Will start off by adapting the autoregressive discrete Q-learning formulation in the cited paper below and run a few experiments on arithmetic, using a symbolic solver as reward generator.
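As a rough illustration of what "a symbolic solver as reward generator" could look like for arithmetic, here is a minimal sketch. It is not code from this repository: the names `evaluate` and `arithmetic_reward` are hypothetical, the binary 0/1 reward is an assumption, and a small exact evaluator built on the standard library's `ast` and `fractions` modules stands in for a full symbolic solver.

```python
import ast
import operator
from fractions import Fraction

# Binary operators the toy evaluator accepts (an assumption for this sketch).
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expr: str) -> Fraction:
    """Exactly evaluate an integer arithmetic expression such as '3 * (4 + 5)'."""

    def walk(node: ast.AST) -> Fraction:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, int)
            and not isinstance(node.value, bool)
        ):
            return Fraction(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        # Anything else (names, calls, floats, ...) is rejected.
        raise ValueError(f"unsupported expression: {expr!r}")

    return walk(ast.parse(expr, mode="eval"))


def arithmetic_reward(prompt: str, completion: str) -> float:
    """Return 1.0 if the completion equals the exact value of the prompt, else 0.0."""
    try:
        target = evaluate(prompt)
        answer = Fraction(completion.strip())
    except (ValueError, SyntaxError, ZeroDivisionError):
        return 0.0  # a malformed prompt or answer earns no reward
    return 1.0 if answer == target else 0.0
```

Calling `arithmetic_reward("3 * (4 + 5)", "27")` would score a model's completion against the exactly computed answer. Exact rational arithmetic avoids floating point mismatches such as `0.1 + 0.2`, which is one reason a solver is attractive as a reward source.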
Yannic Kilcher's educational Q-learning video
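For readers who prefer code, the textbook one-step Q-learning target can be sketched for a generated token sequence as follows. This is a generic illustration, not the Q-Transformer formulation and not this repository's code. It assumes each token is one action, the only reward arrives at the end of the sequence, and `q_values[t]` holds the predicted Q-value of every vocabulary token at step `t`. The function name and arguments are hypothetical.

```python
from typing import List, Sequence


def q_learning_targets(
    q_values: Sequence[Sequence[float]],
    final_reward: float,
    gamma: float = 1.0,
) -> List[float]:
    """One-step Q-learning targets for a sequence with a single terminal reward.

    For every non-final step t the target is gamma * max_a Q(s_{t+1}, a),
    since the intermediate reward is assumed to be zero. For the final step
    the target is the terminal reward itself.
    """
    last = len(q_values) - 1
    targets = []
    for t in range(len(q_values)):
        if t == last:
            targets.append(final_reward)
        else:
            targets.append(gamma * max(q_values[t + 1]))
    return targets


def td_errors(
    q_values: Sequence[Sequence[float]],
    actions: Sequence[int],
    final_reward: float,
    gamma: float = 1.0,
) -> List[float]:
    """Temporal-difference error (target minus Q of the chosen token) per step."""
    targets = q_learning_targets(q_values, final_reward, gamma)
    return [
        target - step_q[action]
        for target, step_q, action in zip(targets, q_values, actions)
    ]
```

In a training loop, the squared TD errors would typically be averaged into a loss, with the targets computed from a frozen copy of the network.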
## Citations
```bibtex
@inproceedings{qtransformer,
title = {Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions},
author = {Yevgen Chebotar and Quan Vuong and Alex Irpan and Karol Hausman and Fei Xia and Yao Lu and Aviral Kumar and Tianhe Yu and Alexander Herzog and Karl Pertsch and Keerthana Gopalakrishnan and Julian Ibarz and Ofir Nachum and Sumedh Sontakke and Grecia Salazar and Huong T Tran and Jodilyn Peralta and Clayton Tan and Deeksha Manjunath and Jaspiar Singh and Brianna Zitkovich and Tomas Jackson and Kanishka Rao and Chelsea Finn and Sergey Levine},
booktitle = {7th Annual Conference on Robot Learning},
year = {2023}
}
```

```bibtex
@inproceedings{Wang2015DuelingNA,
title = {Dueling Network Architectures for Deep Reinforcement Learning},
author = {Ziyu Wang and Tom Schaul and Matteo Hessel and H. V. Hasselt and Marc Lanctot and Nando de Freitas},
booktitle = {International Conference on Machine Learning},
year = {2015},
url = {https://api.semanticscholar.org/CorpusID:5389801}
}
```