https://github.com/lucidrains/llama-qrlhf

Implementation of the Llama architecture with RLHF + Q-learning

Host: GitHub
URL: https://github.com/lucidrains/llama-qrlhf
Owner: lucidrains
License: MIT
Created: 2023-11-23T15:28:31.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2025-02-01T19:45:11.000Z (8 months ago)
Last Synced: 2025-03-29T06:03:45.122Z (7 months ago)
Topics: artificial-intelligence, attention, deep-learning, q-learning
Language: Python
Homepage:
Size: 26.4 KB
Stars: 163
Watchers: 21
Forks: 8
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

## Llama - QRLHF (wip)

Implementation of the Llama (or any language model) architecture with RLHF + Q-learning.

This is experimental / independent open research, built off nothing but speculation. But I'll throw some of my brain cycles at the problem in the coming month, just in case the rumors have any basis. Anything you PhD students can get working is up for grabs.

The plan is to start by adapting the autoregressive discrete Q-learning formulation from the Q-Transformer paper cited below, then run a few experiments on arithmetic, using a symbolic solver as the reward generator.
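As a rough illustration of that setup, here is a minimal sketch, not code from this repository. It assumes each generated token is one action, the reward is sparse and only arrives after the last token, and targets are plain one-step Q-learning targets. The names `solve`, `reward` and `q_targets` are hypothetical, and the exact-arithmetic evaluator is only a stand-in for a real symbolic solver. The additional components of the cited paper are left out.

```python
import ast
import operator
from fractions import Fraction

# Hypothetical stand-in for the "symbolic solver": exact evaluation of + - * /
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def solve(expr: str) -> Fraction:
    """Exactly evaluate an integer arithmetic expression as a rational number."""
    def ev(node):
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return Fraction(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -ev(node.operand)
        raise ValueError("unsupported expression")

    return ev(ast.parse(expr, mode="eval").body)


def reward(prompt: str, answer: str) -> float:
    """Return 1.0 if the model's answer equals the solver's result, else 0.0."""
    try:
        return 1.0 if Fraction(answer.strip()) == solve(prompt) else 0.0
    except (ValueError, ZeroDivisionError, SyntaxError):
        # Malformed prompts or answers simply earn no reward.
        return 0.0


def q_targets(next_max_q: list[float], final_reward: float, gamma: float = 1.0) -> list[float]:
    """One-step Q-learning targets, one per generated token.

    next_max_q[t] is max over a of Q(s_{t+1}, a) for the state reached after
    token t. The last entry is ignored because the episode ends there, so the
    last token's target is just the terminal reward.
    """
    last = len(next_max_q) - 1
    return [
        final_reward if t == last else gamma * next_max_q[t]
        for t in range(len(next_max_q))
    ]


if __name__ == "__main__":
    r = reward("2*(3+4)", "14")
    print(r)
    print(q_targets([0.2, 0.5, 0.9], r, gamma=0.5))
```

In a real training loop the `next_max_q` values would come from a (target) network's Q-value head over the vocabulary. They are passed in as plain floats here to keep the sketch dependency-free.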

Yannic Kilcher's educational Q-learning video

## Citations

```bibtex
@inproceedings{qtransformer,
title = {Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions},
author = {Yevgen Chebotar and Quan Vuong and Alex Irpan and Karol Hausman and Fei Xia and Yao Lu and Aviral Kumar and Tianhe Yu and Alexander Herzog and Karl Pertsch and Keerthana Gopalakrishnan and Julian Ibarz and Ofir Nachum and Sumedh Sontakke and Grecia Salazar and Huong T Tran and Jodilyn Peralta and Clayton Tan and Deeksha Manjunath and Jaspiar Singh and Brianna Zitkovich and Tomas Jackson and Kanishka Rao and Chelsea Finn and Sergey Levine},
booktitle = {7th Annual Conference on Robot Learning},
year = {2023}
}
```

```bibtex
@inproceedings{Wang2015DuelingNA,
title = {Dueling Network Architectures for Deep Reinforcement Learning},
author = {Ziyu Wang and Tom Schaul and Matteo Hessel and H. V. Hasselt and Marc Lanctot and Nando de Freitas},
booktitle = {International Conference on Machine Learning},
year = {2015},
url = {https://api.semanticscholar.org/CorpusID:5389801}
}
```
