https://github.com/genji970/llm_rl_fine_tuning_for_solving_rl_problem

This repo follows three steps. First, run an actor-critic method and collect trajectories. Second, feed these trajectories to a pretrained LLM (Meta Llama) for LoRA fine-tuning. Third, use the fine-tuned model for RL training.
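
The description does not show how trajectories are handed to the LLM, so the following is only a minimal sketch of what the second step might look like: turning actor-critic trajectories into prompt/completion text records that a LoRA fine-tuning script could consume. The `Step` class, the `trajectory_to_examples` and `to_jsonl` helpers, the text template, and the toy state values are all hypothetical and are not taken from the repository.

```python
import json
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Step:
    """One transition from a trajectory (hypothetical structure)."""
    state: List[float]  # observation seen by the actor
    action: int         # discrete action chosen by the actor
    reward: float       # reward received after taking the action


def trajectory_to_examples(trajectory: List[Step]) -> List[Dict[str, str]]:
    """Turn each step into a prompt/completion pair.

    The text template is an assumption made for illustration only.
    """
    examples = []
    for step in trajectory:
        state_text = ", ".join(f"{x:.2f}" for x in step.state)
        examples.append({
            "prompt": f"State: [{state_text}]\nAction:",
            "completion": f" {step.action}",
        })
    return examples


def to_jsonl(examples: List[Dict[str, str]]) -> str:
    """Serialize the examples as JSON Lines, one record per line."""
    return "\n".join(json.dumps(example) for example in examples)


# Toy trajectory with made-up values, standing in for actor-critic rollouts.
toy_trajectory = [
    Step(state=[0.01, -0.02, 0.03, 0.04], action=1, reward=1.0),
    Step(state=[0.02, 0.17, 0.03, -0.25], action=0, reward=1.0),
]
examples = trajectory_to_examples(toy_trajectory)
jsonl = to_jsonl(examples)
```

Under these assumptions, `jsonl` holds one JSON record per trajectory step. Rewards are carried on each `Step` but not written into the text here; whether to filter or weight steps by reward is a design choice the description leaves open.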


Host: GitHub
URL: https://github.com/genji970/llm_rl_fine_tuning_for_solving_rl_problem
Owner: genji970
License: mit
Created: 2024-11-25T15:48:50.000Z (6 months ago)
Default Branch: main
Last Pushed: 2024-11-25T16:00:22.000Z (6 months ago)
Last Synced: 2024-11-25T16:39:30.850Z (6 months ago)
Language: Jupyter Notebook
Size: 0 Bytes
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
