https://github.com/genji970/llm_rl_fine_tuning_for_solving_rl_problem
This repo follows three steps. First, run an actor-critic method and collect trajectories. Second, feed these trajectories to a pretrained LLM (Meta Llama) for LoRA fine-tuning. Third, run RL training with the fine-tuned model.
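How the trajectories from the first step are handed to the LLM in the second step is not described on this page. The following is only a minimal sketch of one possible approach, assuming each trajectory is a list of (state, action, reward) steps that gets serialized into prompt/completion text pairs. The names `Transition`, `discounted_returns`, and `trajectory_to_examples`, as well as the prompt layout, are hypothetical and not taken from the repository.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Transition:
    """One environment step recorded while running the actor-critic agent (hypothetical schema)."""
    state: List[float]
    action: int
    reward: float


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Compute the discounted return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk the episode backwards so each step can reuse the return of the next one.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def trajectory_to_examples(
    trajectory: List[Transition], gamma: float = 0.99
) -> List[Dict[str, str]]:
    """Serialize one trajectory into prompt/completion pairs (assumed text format)."""
    returns = discounted_returns([tr.reward for tr in trajectory], gamma)
    examples = []
    for tr, g in zip(trajectory, returns):
        state_text = ", ".join(f"{x:.3f}" for x in tr.state)
        prompt = f"State: [{state_text}]\nReturn-to-go: {g:.3f}\nAction:"
        examples.append({"prompt": prompt, "completion": f" {tr.action}"})
    return examples
```

Pairs in this shape could then be tokenized and passed to whichever LoRA fine-tuning setup is used. Conditioning the prompt on the return-to-go is a design choice of this sketch, not something the description above confirms.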
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/genji970/llm_rl_fine_tuning_for_solving_rl_problem
- Owner: genji970
- License: MIT
- Created: 2024-11-25T15:48:50.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-11-25T16:00:22.000Z (6 months ago)
- Last Synced: 2024-11-25T16:39:30.850Z (6 months ago)
- Language: Jupyter Notebook
- Size: 0 Bytes
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE