Projects in Awesome Lists tagged with trl
A curated list of projects in awesome lists tagged with trl.
https://github.com/argilla-io/notus
Notus is a collection of LLMs fine-tuned using SFT, DPO, SFT+DPO, and/or other RLHF techniques, always with a data-first approach.
alignment-handbook dpo fine-tuning lm-alignment preference-data trl zephyr
Last synced: 02 May 2025
https://github.com/akshint0407/nano-r1
This project demonstrates fine-tuning the Qwen2.5-3B-Instruct model with GRPO (Group Relative Policy Optimization) on the GSM8K dataset.
adapters grpo huggingface python qwen2-5 safetensors text-generation-inference transformer trl unsloth
Last synced: 09 Apr 2025
https://github.com/mikesterner87/nano-r1
This project demonstrates fine-tuning the Qwen2.5-3B-Instruct model with GRPO (Group Relative Policy Optimization) on the GSM8K dataset.
adapters build grpo huggingface nanopi nanopi-r1 nanopi-r1s openwrt python safetensors text-generation-inference transformer trl unsloth
Last synced: 10 Apr 2025