https://github.com/lucidrains/llama-qrlhf

Implementation of the Llama architecture with RLHF + Q-learning

artificial-intelligence attention deep-learning q-learning

## Llama - QRLHF (wip)

Implementation of the Llama (or any language model) architecture with RLHF + Q-learning.

This is experimental / independent open research, built off nothing but speculation. But I'll throw some of my brain cycles at the problem in the coming month, just in case the rumors have any basis. Anything you PhD students can get working is up for grabs.

Will start off by adapting the autoregressive discrete Q-learning formulation from the Q-Transformer paper cited below, then run a few experiments on arithmetic, using a symbolic solver as the reward generator.
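To make the planned arithmetic experiment a bit more concrete, here is a minimal sketch, not code from this repository. It assumes a sparse reward of 1 for an exactly correct answer and a plain one-step Q-learning backup over generated tokens. The names `solve`, `reward` and `q_targets` are hypothetical, and the small `ast`-based evaluator only stands in for a real symbolic solver.

```python
import ast
import operator
from fractions import Fraction

# Hypothetical stand-in for a symbolic solver: exact evaluation of + - * / expressions.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def solve(expr: str) -> Fraction:
    """Evaluate an arithmetic expression exactly, using rationals instead of floats."""
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return Fraction(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -ev(node.operand)
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval"))

def reward(prompt: str, answer: str) -> float:
    """Assumed sparse reward: 1.0 if the answer matches the solver, else 0.0."""
    try:
        return 1.0 if Fraction(answer.strip()) == solve(prompt) else 0.0
    except (ValueError, ZeroDivisionError):
        return 0.0  # unparsable or undefined answers earn nothing

def q_targets(q_values, final_reward, gamma=1.0):
    """One-step Q-learning targets for each generated token.

    q_values[t] holds Q(s_t, a) for every candidate token a at position t.
    Non-final tokens bootstrap from the best next-token value; the final
    token receives the solver reward directly.
    """
    targets = []
    for t in range(len(q_values)):
        if t + 1 < len(q_values):
            targets.append(gamma * max(q_values[t + 1]))
        else:
            targets.append(final_reward)
    return targets

r = reward("2 * (3 + 4)", "14")
print(r, q_targets([[0.1, 0.5], [0.2, 0.9]], final_reward=r))
```

Treating every token as one action keeps the max over a single vocabulary-sized dimension. Whether that matches the final formulation used here is an open assumption.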

Yannic Kilcher's educational Q-learning video

## Citations

```bibtex
@inproceedings{qtransformer,
title = {Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions},
author = {Yevgen Chebotar and Quan Vuong and Alex Irpan and Karol Hausman and Fei Xia and Yao Lu and Aviral Kumar and Tianhe Yu and Alexander Herzog and Karl Pertsch and Keerthana Gopalakrishnan and Julian Ibarz and Ofir Nachum and Sumedh Sontakke and Grecia Salazar and Huong T Tran and Jodilyn Peralta and Clayton Tan and Deeksha Manjunath and Jaspiar Singh and Brianna Zitkovich and Tomas Jackson and Kanishka Rao and Chelsea Finn and Sergey Levine},
booktitle = {7th Annual Conference on Robot Learning},
year = {2023}
}
```

```bibtex
@inproceedings{Wang2015DuelingNA,
title = {Dueling Network Architectures for Deep Reinforcement Learning},
author = {Ziyu Wang and Tom Schaul and Matteo Hessel and Hado van Hasselt and Marc Lanctot and Nando de Freitas},
booktitle = {International Conference on Machine Learning},
year = {2015},
url = {https://api.semanticscholar.org/CorpusID:5389801}
}
```