An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with trl

A curated list of projects in awesome lists tagged with trl.

https://github.com/argilla-io/notus

Notus is a collection of LLMs fine-tuned using SFT, DPO, SFT+DPO, and/or other RLHF techniques, while always keeping a data-first approach.

alignment-handbook dpo fine-tuning lm-alignment preference-data trl zephyr

Last synced: 02 May 2025

https://github.com/ssbuild/llm_rlhf

Implements reinforcement learning training for GPT-2, LLaMA, BLOOM, and other LLMs.

llm llm-rlhf lora reward rlhf trl trlx

Last synced: 24 Apr 2025

https://github.com/sofiakhutsieva/llm_experiments

Experiments with LLMs (inference, RAG, fine-tuning)

langchain llamacpp llm mistral peft rag trl

Last synced: 08 Apr 2025

https://github.com/akshint0407/nano-r1

This project demonstrates fine-tuning the Qwen2.5-3B-Instruct model with GRPO (Group Relative Policy Optimization) on the GSM8K dataset.

adapters grpo huggingface python qwen2-5 safetensors text-generation-inference transformer trl unsloth

Last synced: 09 Apr 2025

https://github.com/mikesterner87/nano-r1

This project demonstrates fine-tuning the Qwen2.5-3B-Instruct model with GRPO (Group Relative Policy Optimization) on the GSM8K dataset.

adapters build grpo huggingface nanopi nanopi-r1 nanopi-r1s openwrt python safetensors text-generation-inference transformer trl unsloth

Last synced: 10 Apr 2025