Projects in Awesome Lists tagged with rlhf

https://github.com/laion-ai/open-assistant

OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and can retrieve information dynamically to do so.

ai assistant chatgpt discord-bot language-model machine-learning nextjs python rlhf

Last synced: 16 Dec 2024

https://github.com/LAION-AI/Open-Assistant

OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and can retrieve information dynamically to do so.

ai assistant chatgpt discord-bot language-model machine-learning nextjs python rlhf

Last synced: 26 Oct 2024

https://github.com/hiyouga/llama-factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)

agent ai chatglm fine-tuning gpt instruction-tuning language-model large-language-models llama llama3 llm lora mistral moe peft qlora quantization qwen rlhf transformers

Last synced: 22 Dec 2024

https://github.com/hiyouga/LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)

agent ai chatglm fine-tuning gpt instruction-tuning language-model large-language-models llama llama3 llm lora mistral moe peft qlora quantization qwen rlhf transformers

Last synced: 25 Oct 2024

https://github.com/hiyouga/LLaMA-Efficient-Tuning

Efficiently Fine-Tune 100+ LLMs in WebUI (ACL 2024)

agent ai chatglm fine-tuning gpt instruction-tuning language-model large-language-models llama llama3 llm lora mistral moe peft qlora quantization qwen rlhf transformers

Last synced: 05 Sep 2024

https://github.com/rucaibox/llmsurvey

The official GitHub page for the survey paper "A Survey of Large Language Models".

chain-of-thought chatgpt in-context-learning instruction-tuning large-language-models llm llms natural-language-processing pre-trained-language-models pre-training rlhf

Last synced: 16 Dec 2024

https://github.com/RUCAIBox/LLMSurvey

The official GitHub page for the survey paper "A Survey of Large Language Models".

chain-of-thought chatgpt in-context-learning instruction-tuning large-language-models llm llms natural-language-processing pre-trained-language-models pre-training rlhf

Last synced: 25 Oct 2024

https://github.com/ymcui/chinese-llama-alpaca-2

Phase two of the Chinese LLaMA-2 & Alpaca-2 LLM project, plus 64K long-context models

64k alpaca alpaca-2 alpaca2 flash-attention large-language-models llama llama-2 llama2 llm nlp rlhf yarn

Last synced: 22 Dec 2024

https://github.com/ymcui/Chinese-LLaMA-Alpaca-2

Phase two of the Chinese LLaMA-2 & Alpaca-2 LLM project, plus 64K long-context models

64k alpaca alpaca-2 alpaca2 flash-attention large-language-models llama llama-2 llama2 llm nlp rlhf yarn

Last synced: 29 Oct 2024

https://github.com/InternLM/InternLM

Official release of InternLM2 7B and 20B base and chat models. 200K context support

chatbot chinese fine-tuning-llm flash-attention gpt large-language-model llm long-context pretrained-models rlhf

Last synced: 27 Oct 2024

https://github.com/internlm/internlm

Official release of InternLM2 7B and 20B base and chat models. 200K context support

chatbot chinese fine-tuning-llm flash-attention gpt large-language-model llm long-context pretrained-models rlhf

Last synced: 17 Dec 2024

https://github.com/huggingface/alignment-handbook

Robust recipes to align language models with human and AI preferences

llm rlhf transformers

Last synced: 16 Dec 2024

https://github.com/argilla-io/argilla

Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets

active-learning ai annotation-tool developer-tools gpt-4 human-in-the-loop langchain llm machine-learning mlops natural-language-processing nlp rlhf text-annotation text-labeling weak-supervision weakly-supervised-learning

Last synced: 16 Dec 2024

https://github.com/hiyouga/chatglm-efficient-tuning

Efficient fine-tuning of ChatGLM-6B with PEFT

alpaca chatglm chatglm2 chatgpt fine-tuning huggingface language-model lora peft pytorch qlora rlhf transformers

Last synced: 26 Sep 2024

https://github.com/hiyouga/ChatGLM-Efficient-Tuning

Efficient fine-tuning of ChatGLM-6B with PEFT

alpaca chatglm chatglm2 chatgpt fine-tuning huggingface language-model lora peft pytorch qlora rlhf transformers

Last synced: 31 Oct 2024

https://github.com/docta-ai/docta

A Doctor for your data

data data-centric-ai data-centric-machine-learning data-curation data-diagnosis language-model rlhf

Last synced: 17 Dec 2024

https://github.com/Docta-ai/docta

A Doctor for your data

data data-centric-ai data-centric-machine-learning data-curation data-diagnosis language-model rlhf

Last synced: 30 Oct 2024

https://github.com/thudm/webglm

WebGLM: An Efficient Web-enhanced Question Answering System (KDD 2023)

chatgpt llm rlhf webglm

Last synced: 22 Dec 2024

https://github.com/THUDM/WebGLM

WebGLM: An Efficient Web-enhanced Question Answering System (KDD 2023)

chatgpt llm rlhf webglm

Last synced: 29 Oct 2024

https://github.com/tatsu-lab/alpaca_eval

An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.

deep-learning evaluation foundation-models instruction-following large-language-models leaderboard nlp rlhf

Last synced: 17 Dec 2024

https://github.com/pku-alignment/safe-rlhf

Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback

ai-safety alpaca beaver datasets deepspeed gpt large-language-models llama llm llms reinforcement-learning reinforcement-learning-from-human-feedback rlhf safe-reinforcement-learning safe-reinforcement-learning-from-human-feedback safe-rlhf safety transformer transformers vicuna

Last synced: 19 Dec 2024

https://github.com/PKU-Alignment/safe-rlhf

Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback

ai-safety alpaca beaver datasets deepspeed gpt large-language-models llama llm llms reinforcement-learning reinforcement-learning-from-human-feedback rlhf safe-reinforcement-learning safe-reinforcement-learning-from-human-feedback safe-rlhf safety transformer transformers vicuna

Last synced: 16 Nov 2024

https://github.com/thudm/imagereward

[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation

diffusion-models generative-model human-preferences rlhf

Last synced: 18 Dec 2024

https://github.com/THUDM/ImageReward

[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation

diffusion-models generative-model human-preferences rlhf

Last synced: 31 Oct 2024

https://tatsu-lab.github.io/alpaca_eval/

An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.

deep-learning evaluation foundation-models instruction-following large-language-models leaderboard nlp rlhf

Last synced: 28 Oct 2024

https://github.com/xtreme1-io/xtreme1

Xtreme1 is an all-in-one data labeling and annotation platform for multimodal training data; it supports 3D LiDAR point cloud, image, and LLM data.

3d-annotation annotation annotation-tool computer-vision image-annotation image-classification image-labelling-tool labeling-tool multimodal point-cloud rlhf

Last synced: 28 Oct 2024

https://github.com/argilla-io/distilabel

Distilabel is a framework for synthetic data and AI feedback for AI engineers that require high-quality outputs, full data ownership, and overall efficiency.

ai huggingface llms openai python rlaif rlhf synthetic-data synthetic-dataset-generation

Last synced: 18 Dec 2024

https://github.com/GaryYufei/AlignLLMHumanSurvey

Aligning Large Language Models with Human: A Survey

awesome chatgpt chinese-llama gpt-4 large-language-models llama llama2 llms rlhf supervised-finetuning survey

Last synced: 17 Nov 2024

https://github.com/garyyufei/alignllmhumansurvey

Aligning Large Language Models with Human: A Survey

awesome chatgpt chinese-llama gpt-4 large-language-models llama llama2 llms rlhf supervised-finetuning survey

Last synced: 03 Dec 2024

https://github.com/jerry1993-tech/Cornucopia-LLaMA-Fin-Chinese

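Cornucopia (聚宝盆): a series of open-source, commercially usable Chinese financial LLMs, together with an efficient, lightweight training framework for vertical-domain LLMs (pretraining, SFT, RLHF, quantization, etc.)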

chinese finance large-language-models llama nlp qa rlhf sft text-generation transformers

Last synced: 02 Nov 2024

https://github.com/jianzhnie/llamatuner

Easy and efficient fine-tuning of LLMs (supports Llama, Llama2, Llama3, Qwen, Baichuan, GLM, Falcon). Efficient quantized training and deployment of large models.

chatgpt dpo llama llama3 mixtral ppo qlora qwen rlhf

Last synced: 21 Dec 2024

https://github.com/jianzhnie/LLamaTuner

Easy and efficient fine-tuning of LLMs (supports Llama, Llama2, Llama3, Qwen, Baichuan, GLM, Falcon). Efficient quantized training and deployment of large models.

chatgpt dpo llama llama3 mixtral ppo qlora qwen rlhf

Last synced: 10 Nov 2024

https://github.com/voidful/textrl

Implementation of ChatGPT RLHF (Reinforcement Learning with Human Feedback) on any generation model in Hugging Face Transformers (bloomz-176B/bloom/gpt/bart/T5/MetaICL)

chatgpt controlled-nlg gpt-2 gpt-3 language-model nlg nlp pytorch reinforcement-learning rlhf

Last synced: 09 Nov 2024

https://github.com/voidful/TextRL

Implementation of ChatGPT RLHF (Reinforcement Learning with Human Feedback) on any generation model in Hugging Face Transformers (bloomz-176B/bloom/gpt/bart/T5/MetaICL)

chatgpt controlled-nlg gpt-2 gpt-3 language-model nlg nlp pytorch reinforcement-learning rlhf

Last synced: 31 Oct 2024

https://github.com/mindspore-courses/step_into_llm

MindSpore online courses: Step into LLM

bert chatglm chatglm2 chatgpt codegeex gpt gpt2 instruction-tuning large-language-models llama llama2 llm mindspore moe natural-language-processing nlp parallel-computing peft prompt-tuning rlhf

Last synced: 21 Dec 2024

https://github.com/CambioML/pykoi-rlhf-finetuned-transformers

pykoi: Active learning in one unified interface

ai chatbot feedback language-model llm machine-learning rlhf

Last synced: 05 Nov 2024

https://github.com/rlhflow/online-rlhf

A recipe for online RLHF and online iterative DPO.

llama3 llm rlhf

Last synced: 21 Dec 2024

https://github.com/transformerlab/transformerlab-app

Open Source Application for Advanced LLM Engineering: interact, train, fine-tune, and evaluate large language models on your own computer.

electron llama llms lora mlx rlhf transformers

Last synced: 21 Dec 2024

https://github.com/WangRongsheng/MedQA-ChatGLM

🛰️ Fine-tuning ChatGLM on real medical dialogue data with LoRA, P-Tuning V2, Freeze, RLHF, and other methods; our vision goes beyond medical question answering

chatglm-6b chatgpt dataset fine-tuning freeze huggingface large-language-models llms lora medical rlhf transformer

Last synced: 09 Nov 2024

https://github.com/tudb-labs/mlora

An Efficient "Factory" to Build Multiple LoRA Adapters

baichuan chatglm dpo finetune gpu llama llama2 llm lora mlora peft rlhf

Last synced: 21 Dec 2024

https://github.com/haoliuhl/chain-of-hindsight

Chain-of-Hindsight, A Scalable RLHF Method

large-language-models learning-from-human-feedback rlhf

Last synced: 17 Nov 2024

https://github.com/forhaoliu/chain-of-hindsight

Chain-of-Hindsight, A Scalable RLHF Method

large-language-models learning-from-human-feedback rlhf

Last synced: 10 Oct 2024

https://github.com/jackaduma/vicuna-lora-rlhf-pytorch

A full pipeline to finetune Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the Vicuna architecture. Basically ChatGPT but with Vicuna

chatgpt finetune gpt llama llm lora peft ppo pytorch reward-models rlhf vicuna vicuna-7b

Last synced: 11 Nov 2024

https://github.com/tomekkorbak/pretraining-with-human-feedback

Code accompanying the paper Pretraining Language Models with Human Preferences

ai-alignment ai-safety decision-transformers gpt language-models pretraining reinforcement-learning rlhf

Last synced: 19 Dec 2024

https://github.com/jianzhnie/open-chatgpt

An open-source implementation of ChatGPT, Alpaca, Vicuna, and an RLHF pipeline. Implement a ChatGPT from scratch.

chatgpt gpt llama llm lora peft ppo rlhf stanford-alpaca

Last synced: 18 Dec 2024

https://github.com/xrsrke/instructGOOSE

Implementation of Reinforcement Learning from Human Feedback (RLHF)

chatgpt human-feedback instructgpt reinforcement-learning rlhf

Last synced: 31 Oct 2024

https://github.com/xrsrke/instructgoose

Implementation of Reinforcement Learning from Human Feedback (RLHF)

chatgpt human-feedback instructgpt reinforcement-learning rlhf

Last synced: 17 Dec 2024

https://github.com/allenai/reward-bench

RewardBench: the first evaluation tool for reward models.

preference-learning rlhf

Last synced: 10 Nov 2024

https://github.com/rlhflow/rlhf-reward-modeling

A recipe to train reward models for RLHF.

llm reward-functions rlhf

Last synced: 19 Dec 2024

https://github.com/liziniu/ReMax

Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models)

large-language-models policy-gradient reinforcement-learning rlhf

Last synced: 16 Nov 2024

https://github.com/mihirp1998/VADER

Video Diffusion Alignment via Reward Gradients. We improve a variety of video diffusion models such as VideoCrafter, OpenSora, ModelScope and StableVideoDiffusion by finetuning them using various reward models such as HPS, PickScore, VideoMAE, VJEPA, YOLO, Aesthetics etc.

alignment diffusion reinforcement-learning reinforcement-learning-human-feedback rl rlhf vader video-diffusion video-diffusion-alignment

Last synced: 31 Oct 2024

https://github.com/jackaduma/chatglm-lora-rlhf-pytorch

A full pipeline to finetune ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the ChatGLM architecture. Basically ChatGPT but with ChatGLM

chatglm chatglm-6b chatgpt deepspeed finetune gpt llama llm lora peft ppo pytorch reward-models rlhf

Last synced: 11 Nov 2024

https://github.com/l294265421/alpaca-rlhf

Finetuning LLaMA with RLHF (Reinforcement Learning with Human Feedback) based on DeepSpeed Chat

alpaca chatgpt language-model large-language-models llama llm reinforcement-learning rlhf

Last synced: 31 Oct 2024

https://github.com/PKU-Alignment/align-anything

Align Anything: Training All-modality Model with Feedback

chameleon dpo large-language-models multimodal rlhf vision-language-model

Last synced: 02 Nov 2024

https://github.com/nlp-uoregon/okapi

Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback

bloom chatbot dataset instruction-tuning language-model large-language-models llama multilingual natural-language-processing nlp question-answering reinforcement-learning reinforcement-learning-from-human-feedback rlhf

Last synced: 11 Nov 2024

https://github.com/nlp-uoregon/Okapi

Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback

bloom chatbot dataset instruction-tuning language-model large-language-models llama multilingual natural-language-processing nlp question-answering reinforcement-learning reinforcement-learning-from-human-feedback rlhf

Last synced: 05 Oct 2024

https://github.com/opening-up-chatgpt/opening-up-chatgpt.github.io

Tracking instruction-tuned LLM openness. Paper: Liesenfeld, Andreas, Alianda Lopez, and Mark Dingemanse. 2023. “Opening up ChatGPT: Tracking Openness, Transparency, and Accountability in Instruction-Tuned Text Generators.” In Proceedings of the 5th International Conference on Conversational User Interfaces. doi:10.1145/3571884.3604316.

chatgpt chatgpt-free llm open-source rlhf transparency

Last synced: 08 Nov 2024

https://github.com/cogment/cogment-verse

Research platform for Human-in-the-loop learning (HILL) & Multi-Agent Reinforcement Learning (MARL)

cogment human-in-the-loop-learning reinforcement-learning rlhf

Last synced: 31 Oct 2024

https://github.com/log10-io/log10

Python client library for improving your LLM app accuracy

agents ai anthropic artificial-intelligence autonomous-agents debugging evaluations feedback fine-tuning llmops llms logging monitoring openai python rlhf

Last synced: 22 Dec 2024

https://github.com/niutrans/vision-llm-alignment

This repository contains the code for SFT, RLHF, and DPO, designed for vision-based LLMs, including the LLaVA models and the LLaMA-3.2-vision models.

alignment dpo llama3-vision llava llm mllm multi-model ppo reward rlhf sft vision

Last synced: 18 Nov 2024

https://github.com/natolambert/rlhf-book

Textbook on reinforcement learning from human feedback

ai alignment rlhf

Last synced: 10 Dec 2024

https://github.com/jackaduma/alpaca-lora-rlhf-pytorch

A full pipeline to finetune Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the Alpaca architecture. Basically ChatGPT but with Alpaca

alpaca chatgpt deepspeed finetune gpt llama llm lora peft ppo pytorch reward-models rlhf

Last synced: 11 Nov 2024

https://github.com/rlhflow/directional-preference-alignment

Directional Preference Alignment

ai-alignment large-language-models rlhf

Last synced: 15 Nov 2024

https://github.com/ssbuild/chatglm_rlhf

chatglm_rlhf_finetuning

chat chatglm finetuning lora qlora reward rlhf

Last synced: 08 Nov 2024

https://github.com/quentinwach/image-ranker

Rank images using TrueSkill by comparing them against each other in the browser. 🖼📊

data-analysis deep-learning fine-tuning finetune flux generative-ai image-analysis image-annotation image-annotation-tool image-classification image-rank rank ranking ranking-algorithm reinforcement-learning rlhf stable-diffusion trueskill trueskill-algorithm web-ui

Last synced: 13 Nov 2024

https://github.com/ssbuild/llm_rlhf

Reinforcement learning training for LLMs such as GPT-2, LLaMA, and BLOOM

llm llm-rlhf lora reward rlhf trl trlx

Last synced: 08 Nov 2024

https://github.com/AmirMotefaker/Create-your-own-ChatGPT

Create your own ChatGPT with Python

ai artificial-intelligence chatgpt chatgpt-api chatgpt3 large-language-model llm machine-learning ml openai openai-api python rlhf

Last synced: 31 Oct 2024

https://github.com/arunprsh/ChatGPT-Decoded-GPT2-FAQ-Bot-RLHF-PPO

A Practical Guide to Developing a Reliable FAQ Chatbot with Reinforcement Learning and Human Feedback using GPT-2 on AWS

aws chatbot gpt-2 gpt2 question-answering reinforcement-learning rlhf sagemaker transformers

Last synced: 31 Oct 2024

https://github.com/log10-io/log10js

JavaScript client library for managing your LLM data in one place

ai artificial-intelligence autonomous-agents debugging javascript langchain langchain-js llmops logging monitoring openai openai-api rlhf

Last synced: 14 Dec 2024

https://github.com/DaehanKim/EasyRLHF

EasyRLHF aims to provide an easy and minimal interface to train aligned language models, using off-the-shelf solutions and datasets

dpo instruction-tuning ipo language-model rlhf rrhf sft

Last synced: 31 Oct 2024

https://github.com/jeremy-collins/robot-rlhf

Robot Learning from Human Feedback. Inspired by advancements in NLP, we train a robot policy via reinforcement learning using a reward function learned exclusively from human preferences.

alignment chatgpt reinforcement-learning rlhf robotics

Last synced: 31 Oct 2024

https://github.com/codename-detective/prompt-to-song-generation-using-large-language-models

This project uses LLMs to generate music from text by understanding prompts, creating lyrics, determining genre, and composing melodies. It harnesses LLM capabilities to create songs based on text inputs through a multi-step approach.

deep-learning deep-reinforcement-learning flan-t5 genre-classification llama3 llms natural-language-processing policy-gradient rlhf seq-to-seq transformers

Last synced: 15 Nov 2024

https://github.com/pku-alignment/llms-resist-alignment

Repo for paper "Language Models Resist Alignment"

ai-safety alignment alpaca llama llama2 llama3 llm llms rlhf safe safe-rlhf vicuna

Last synced: 03 Dec 2024

https://github.com/ZiyiZhang27/tdpo

[ICML 2024] Code for the paper "Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases"

alignment diffusion-models human-feedback reinforcement-learning rlhf stable-diffusion text-to-image

Last synced: 09 Nov 2024

https://github.com/shreyansh26/llm-activation-steering-experiments

Some experiments with activation steering in LLMs

llama2 llama2-7b red-teaming rlhf

Last synced: 14 Nov 2024

https://github.com/li-plus/nanorlhf

Train a tiny LLaMA model from scratch to repeat your words using Reinforcement Learning from Human Feedback (RLHF)

deep-reinforcement-learning llama llm ppo reinforcement-learning rlhf

Last synced: 06 Nov 2024

https://github.com/himanshuvnm/foundation-model-large-language-model-fm-llm

This repository covers key tasks that underpin modern generative AI concepts. In particular, it focuses on three coding tasks with large language models. Further details are given in the README.md file.

attention-is-all-you-need aws fine-tuning flan-t5 foundation-models generative-ai hate-speech-detection huggingface huggingface-transformers large-language-models lora low-rank-ada ml-m5-2xlarge peft-fine-tuning-llm python3 pytorch rlhf rnn-pytorch

Last synced: 06 Nov 2024

https://github.com/ssbuild/t5_rlhf

chatyuan_rlhf_training

adalora lora ppo qlora reward rlhf t5

Last synced: 08 Nov 2024

https://github.com/augustasmacijauskas/mlmi-thesis

Code for my thesis titled "Eliciting latent knowledge from language reward models" for the MPhil in Machine Learning and Machine Intelligence at the University of Cambridge

alignment interpretability rlhf

Last synced: 24 Nov 2024

https://github.com/neugence/acehub

AI Champions for Excellence: Fresh, informative courses and content designed to help developers, researchers, and leaders advance in the field of AI.

ai cuda cv ml mlops nlp pytorch rl rlhf tensorflow

Last synced: 13 Oct 2024

https://github.com/soheil-mp/rlhf-in-llama

Reinforcement Learning from Human Feedback (RLHF) in Llama 2

google-cloud llama rlhf

Last synced: 09 Dec 2024

https://github.com/esmail-ibraheem/axon

AI research lab🔬: implementations of AI papers and theoretical research: InstructGPT, llama, transformers, diffusion models, RLHF, etc...

llama llms paper-implementations pytorch rlhf transformers

Last synced: 13 Nov 2024