
https://github.com/genji970/llm_rl_fine_tuning_for_solving_rl_problem

This repo follows three steps. First, run an actor-critic method and collect trajectories. Second, use these trajectories to LoRA fine-tune a pretrained LLM (Meta Llama). Third, use the fine-tuned model for RL training.
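
A minimal sketch of the first step and the data preparation for the second, using only the Python standard library. It assumes a hypothetical toy 1-D chain environment, a one-step tabular actor-critic (softmax actor, TD(0) critic), and a hypothetical prompt/completion text format for the fine-tuning records. None of these names or formats come from the repository, which may collect and serialize trajectories differently.

```python
import json
import math
import random

# Hypothetical toy environment: a 1-D chain of N_STATES cells.
# The agent starts at cell 0; action 1 moves right, action 0 moves left.
# Reaching the last cell ends the episode with reward +1; other steps give 0.
N_STATES = 5
N_ACTIONS = 2
GAMMA = 0.9


def step(state, action):
    """Return (next_state, reward, done) for the toy chain."""
    next_state = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def softmax(prefs):
    """Numerically stable softmax over action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def run_actor_critic(episodes=300, alpha_actor=0.1, alpha_critic=0.1, max_steps=50, seed=0):
    """One-step tabular actor-critic; returns trajectories, actor prefs, critic values."""
    rng = random.Random(seed)
    prefs = [[0.0] * N_ACTIONS for _ in range(N_STATES)]  # actor: softmax preferences
    values = [0.0] * N_STATES                              # critic: state-value estimates
    trajectories = []
    for _ in range(episodes):
        state, traj = 0, []
        for _ in range(max_steps):
            probs = softmax(prefs[state])
            action = rng.choices(range(N_ACTIONS), weights=probs)[0]
            next_state, reward, done = step(state, action)
            # TD error: the critic's one-step estimate of the advantage.
            target = reward + (0.0 if done else GAMMA * values[next_state])
            td_error = target - values[state]
            values[state] += alpha_critic * td_error
            # Softmax policy gradient: d log pi(a|s) / d pref[b] = 1[b == a] - pi(b|s).
            for b in range(N_ACTIONS):
                grad = (1.0 if b == action else 0.0) - probs[b]
                prefs[state][b] += alpha_actor * td_error * grad
            traj.append({"state": state, "action": action, "reward": reward})
            state = next_state
            if done:
                break
        trajectories.append(traj)
    return trajectories, prefs, values


def to_finetune_records(trajectories):
    """Serialize each (state, action) step as a hypothetical prompt/completion pair."""
    records = []
    for traj in trajectories:
        for t in traj:
            records.append({
                "prompt": f"State: {t['state']} of {N_STATES - 1}. Choose action (0=left, 1=right):",
                "completion": f" {t['action']}",
            })
    return records


if __name__ == "__main__":
    trajs, prefs, values = run_actor_critic()
    records = to_finetune_records(trajs)
    print(len(trajs), "episodes,", len(records), "records")
    print(json.dumps(records[0]))  # one JSONL-style line of the fine-tuning dataset
```

Records in this shape are the kind of text dataset that a LoRA library such as Hugging Face PEFT could consume for the second step; the actual Llama fine-tuning and the third step (RL training with the fine-tuned model) are not sketched here.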