Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with rlhf
A curated list of projects in awesome lists tagged with rlhf.
https://github.com/laion-ai/open-assistant
OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
ai assistant chatgpt discord-bot language-model machine-learning nextjs python rlhf
Last synced: 16 Dec 2024
https://github.com/hiyouga/llama-factory
Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
agent ai chatglm fine-tuning gpt instruction-tuning language-model large-language-models llama llama3 llm lora mistral moe peft qlora quantization qwen rlhf transformers
Last synced: 22 Dec 2024
https://github.com/hiyouga/LLaMA-Efficient-Tuning
Efficiently Fine-Tune 100+ LLMs in WebUI (ACL 2024)
agent ai chatglm fine-tuning gpt instruction-tuning language-model large-language-models llama llama3 llm lora mistral moe peft qlora quantization qwen rlhf transformers
Last synced: 05 Sep 2024
https://github.com/rucaibox/llmsurvey
The official GitHub page for the survey paper "A Survey of Large Language Models".
chain-of-thought chatgpt in-context-learning instruction-tuning large-language-models llm llms natural-language-processing pre-trained-language-models pre-training rlhf
Last synced: 16 Dec 2024
https://github.com/ymcui/chinese-llama-alpaca-2
Chinese LLaMA-2 & Alpaca-2 large language models, phase II of the project, plus 64K long-context models
64k alpaca alpaca-2 alpaca2 flash-attention large-language-models llama llama-2 llama2 llm nlp rlhf yarn
Last synced: 22 Dec 2024
https://github.com/InternLM/InternLM
Official release of InternLM2 7B and 20B base and chat models, with 200K context support.
chatbot chinese fine-tuning-llm flash-attention gpt large-language-model llm long-context pretrained-models rlhf
Last synced: 27 Oct 2024
https://github.com/huggingface/alignment-handbook
Robust recipes to align language models with human and AI preferences
Last synced: 16 Dec 2024
https://github.com/argilla-io/argilla
Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
active-learning ai annotation-tool developer-tools gpt-4 human-in-the-loop langchain llm machine-learning mlops natural-language-processing nlp rlhf text-annotation text-labeling weak-supervision weakly-supervised-learning
Last synced: 16 Dec 2024
https://github.com/hiyouga/chatglm-efficient-tuning
Fine-tuning ChatGLM-6B with PEFT | Efficient PEFT-based ChatGLM fine-tuning
alpaca chatglm chatglm2 chatgpt fine-tuning huggingface language-model lora peft pytorch qlora rlhf transformers
Last synced: 26 Sep 2024
https://github.com/docta-ai/docta
A Doctor for your data
data data-centric-ai data-centric-machine-learning data-curation data-diagnosis language-model rlhf
Last synced: 17 Dec 2024
https://github.com/thudm/webglm
WebGLM: An Efficient Web-enhanced Question Answering System (KDD 2023)
Last synced: 22 Dec 2024
https://github.com/tatsu-lab/alpaca_eval
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
deep-learning evaluation foundation-models instruction-following large-language-models leaderboard nlp rlhf
Last synced: 17 Dec 2024
https://github.com/pku-alignment/safe-rlhf
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
ai-safety alpaca beaver datasets deepspeed gpt large-language-models llama llm llms reinforcement-learning reinforcement-learning-from-human-feedback rlhf safe-reinforcement-learning safe-reinforcement-learning-from-human-feedback safe-rlhf safety transformer transformers vicuna
Last synced: 19 Dec 2024
https://github.com/thudm/imagereward
[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation
diffusion-models generative-model human-preferences rlhf
Last synced: 18 Dec 2024
https://tatsu-lab.github.io/alpaca_eval/
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
deep-learning evaluation foundation-models instruction-following large-language-models leaderboard nlp rlhf
Last synced: 28 Oct 2024
https://github.com/xtreme1-io/xtreme1
Xtreme1 is an all-in-one data labeling and annotation platform for multimodal data training and supports 3D LiDAR point cloud, image, and LLM.
3d-annotation annotation annotation-tool computer-vision image-annotation image-classification image-labelling-tool labeling-tool multimodal point-cloud rlhf
Last synced: 28 Oct 2024
https://github.com/argilla-io/distilabel
Distilabel is a framework for synthetic data and AI feedback for AI engineers that require high-quality outputs, full data ownership, and overall efficiency.
ai huggingface llms openai python rlaif rlhf synthetic-data synthetic-dataset-generation
Last synced: 18 Dec 2024
https://github.com/GaryYufei/AlignLLMHumanSurvey
Aligning Large Language Models with Human: A Survey
awesome chatgpt chinese-llama gpt-4 large-language-models llama llama2 llms rlhf supervised-finetuning survey
Last synced: 17 Nov 2024
https://github.com/jerry1993-tech/Cornucopia-LLaMA-Fin-Chinese
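Cornucopia (聚宝盆): a series of open-source, commercially usable Chinese financial large language models, together with an efficient, lightweight training framework for domain-specific LLMs (Pretraining, SFT, RLHF, Quantize, etc.)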
chinese finance large-language-models llama nlp qa rlhf sft text-generation transformers
Last synced: 02 Nov 2024
https://github.com/voidful/textrl
Implementation of ChatGPT RLHF (Reinforcement Learning with Human Feedback) on any generation model in Hugging Face's transformers (bloomz-176B/bloom/gpt/bart/T5/MetaICL)
chatgpt controlled-nlg gpt-2 gpt-3 language-model nlg nlp pytorch reinforcement-learning rlhf
Last synced: 09 Nov 2024
https://github.com/mindspore-courses/step_into_llm
MindSpore online courses: Step into LLM
bert chatglm chatglm2 chatgpt codegeex gpt gpt2 instruction-tuning large-language-models llama llama2 llm mindspore moe natural-language-processing nlp parallel-computing peft prompt-tuning rlhf
Last synced: 21 Dec 2024
https://github.com/CambioML/pykoi-rlhf-finetuned-transformers
pykoi: Active learning in one unified interface
ai chatbot feedback language-model llm machine-learning rlhf
Last synced: 05 Nov 2024
https://github.com/rlhflow/online-rlhf
A recipe for online RLHF and online iterative DPO.
Last synced: 21 Dec 2024
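Online iterative DPO repeatedly samples responses from the current policy, ranks them (for example with a reward model), and updates the policy on the resulting preference pairs with the DPO objective. A minimal sketch of that per-pair DPO loss follows, assuming the summed log-probabilities of each response under the policy and under a frozen reference model are already computed; the function name `dpo_loss` and the default β = 0.1 are illustrative choices, not taken from the repository, and the sampling and labeling loop is not shown.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one (chosen, rejected) pair (hypothetical helper).

    Each argument is the summed log-probability of a full response given the
    prompt, under either the policy being trained or the frozen reference model.
    """
    # Implicit reward of each response: beta * log(pi / pi_ref).
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)), written as softplus(-margin) for numerical stability.
    return math.log1p(math.exp(-margin)) if margin > -30 else -margin
```

While the policy still matches the reference, the margin is zero and the loss equals log 2; raising the chosen response's log-probability relative to the reference (or lowering the rejected one's) pushes the loss below log 2.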
https://github.com/transformerlab/transformerlab-app
Open Source Application for Advanced LLM Engineering: interact, train, fine-tune, and evaluate large language models on your own computer.
electron llama llms lora mlx rlhf transformers
Last synced: 21 Dec 2024
https://github.com/WangRongsheng/MedQA-ChatGLM
🛰️ Fine-tuning ChatGLM on real medical dialogue data with LoRA, P-Tuning V2, Freeze, RLHF, and more; our ambitions go beyond medical question answering
chatglm-6b chatgpt dataset fine-tuning freeze huggingface large-language-models llms lora medical rlhf transformer
Last synced: 09 Nov 2024
https://github.com/haoliuhl/chain-of-hindsight
Chain-of-Hindsight, A Scalable RLHF Method
large-language-models learning-from-human-feedback rlhf
Last synced: 17 Nov 2024
https://github.com/forhaoliu/chain-of-hindsight
Chain-of-Hindsight, A Scalable RLHF Method
large-language-models learning-from-human-feedback rlhf
Last synced: 10 Oct 2024
https://github.com/jackaduma/vicuna-lora-rlhf-pytorch
A full pipeline to finetune Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the Vicuna architecture. Basically ChatGPT but with Vicuna
chatgpt finetune gpt llama llm lora peft ppo pytorch reward-models rlhf vicuna vicuna-7b
Last synced: 11 Nov 2024
https://github.com/tomekkorbak/pretraining-with-human-feedback
Code accompanying the paper Pretraining Language Models with Human Preferences
ai-alignment ai-safety decision-transformers gpt language-models pretraining reinforcement-learning rlhf
Last synced: 19 Dec 2024
https://github.com/jianzhnie/open-chatgpt
An open-source implementation of ChatGPT, Alpaca, Vicuna, and an RLHF pipeline. Build a ChatGPT from scratch.
chatgpt gpt llama llm lora peft ppo rlhf stanford-alpaca
Last synced: 18 Dec 2024
https://github.com/xrsrke/instructGOOSE
Implementation of Reinforcement Learning from Human Feedback (RLHF)
chatgpt human-feedback instructgpt reinforcement-learning rlhf
Last synced: 31 Oct 2024
https://github.com/allenai/reward-bench
RewardBench: the first evaluation tool for reward models.
Last synced: 10 Nov 2024
https://github.com/rlhflow/rlhf-reward-modeling
A recipe to train reward models for RLHF.
Last synced: 19 Dec 2024
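Reward models for RLHF are commonly trained on (chosen, rejected) response pairs with a Bradley-Terry objective, which maximizes log σ(r(chosen) − r(rejected)). The sketch below illustrates only that objective, using a hypothetical linear reward over hand-made 2-d feature vectors and plain gradient descent; it is not the repository's recipe, and real recipes typically place a scalar reward head on a language model instead.

```python
import math

def neg_log_sigmoid(m: float) -> float:
    """-log(sigmoid(m)), computed stably as softplus(-m)."""
    return math.log1p(math.exp(-m)) if m > -30 else -m

def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def reward(w: list[float], x: list[float]) -> float:
    """Hypothetical linear reward r(x) = w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))

def bt_loss(w: list[float], pairs) -> float:
    """Mean Bradley-Terry negative log-likelihood over (chosen, rejected) pairs."""
    return sum(neg_log_sigmoid(reward(w, c) - reward(w, r)) for c, r in pairs) / len(pairs)

def train_reward_model(pairs, dim: int, lr: float = 0.5, steps: int = 200) -> list[float]:
    w = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for chosen, rejected in pairs:
            margin = reward(w, chosen) - reward(w, rejected)
            # d/dw [-log sigmoid(margin)] = -(1 - sigmoid(margin)) * (chosen - rejected)
            coef = -(1.0 - sigmoid(margin))
            for i in range(dim):
                grad[i] += coef * (chosen[i] - rejected[i])
        w = [wi - lr * gi / len(pairs) for wi, gi in zip(w, grad)]
    return w

# Toy preference data (hypothetical): each response is a 2-d feature vector.
pairs = [
    ([1.0, 0.2], [0.0, 0.9]),
    ([0.8, 0.5], [0.1, 0.4]),
    ([0.9, 0.1], [0.3, 0.8]),
]
w = train_reward_model(pairs, dim=2)
```

At w = 0 every pair has margin 0 and the loss is log 2; training drives the margins positive, so the learned reward scores each chosen response above its rejected counterpart.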
https://github.com/liziniu/ReMax
Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models)
large-language-models policy-gradient reinforcement-learning rlhf
Last synced: 16 Nov 2024
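ReMax is a REINFORCE-style policy-gradient method that uses the reward of the greedily decoded response as the baseline for each sampled response, which removes the need for a learned value model. The sketch below shows that update on a toy problem: a hypothetical softmax policy choosing among three canned responses with fixed scores, standing in for an LLM and a reward model. The names and hyperparameters are illustrative, not the paper's code.

```python
import math
import random

def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def greedy(logits: list[float]) -> int:
    """Greedy 'decoding': the most likely response (ties go to the lowest index)."""
    return max(range(len(logits)), key=lambda i: logits[i])

def remax_train(rewards: list[float], lr: float = 0.5, steps: int = 500,
                seed: int = 0) -> list[float]:
    rng = random.Random(seed)
    logits = [0.0] * len(rewards)
    for _ in range(steps):
        probs = softmax(logits)
        sampled = rng.choices(range(len(rewards)), weights=probs)[0]
        # ReMax baseline: reward of the greedy response under the current policy,
        # so no learned value function is needed.
        advantage = rewards[sampled] - rewards[greedy(logits)]
        # REINFORCE step: d log softmax(logits)[sampled] / d logits[i]
        # equals 1[i == sampled] - probs[i].
        for i in range(len(logits)):
            indicator = 1.0 if i == sampled else 0.0
            logits[i] += lr * advantage * (indicator - probs[i])
    return logits

# Hypothetical reward-model scores for three candidate responses to one prompt.
rewards = [0.1, 1.0, 0.3]
logits = remax_train(rewards)
best = greedy(logits)
```

Because the greedy response's reward is subtracted, sampling the greedy response itself yields a zero update, and only samples that beat or fall short of it move the policy.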
https://github.com/mihirp1998/VADER
Video Diffusion Alignment via Reward Gradients. We improve a variety of video diffusion models such as VideoCrafter, OpenSora, ModelScope and StableVideoDiffusion by finetuning them using various reward models such as HPS, PickScore, VideoMAE, VJEPA, YOLO, Aesthetics, etc.
alignment diffusion reinforcement-learning reinforcement-learning-human-feedback rl rlhf vader video-diffusion video-diffusion-alignment
Last synced: 31 Oct 2024
https://github.com/jackaduma/chatglm-lora-rlhf-pytorch
A full pipeline to finetune ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the ChatGLM architecture. Basically ChatGPT but with ChatGLM
chatglm chatglm-6b chatgpt deepspeed finetune gpt llama llm lora peft ppo pytorch reward-models rlhf
Last synced: 11 Nov 2024
https://github.com/l294265421/alpaca-rlhf
Finetuning LLaMA with RLHF (Reinforcement Learning with Human Feedback) based on DeepSpeed Chat
alpaca chatgpt language-model large-language-models llama llm reinforcement-learning rlhf
Last synced: 31 Oct 2024
https://github.com/PKU-Alignment/align-anything
Align Anything: Training All-modality Model with Feedback
chameleon dpo large-language-models multimodal rlhf vision-language-model
Last synced: 02 Nov 2024
https://github.com/nlp-uoregon/okapi
Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback
bloom chatbot dataset instruction-tuning language-model large-language-models llama multilingual natural-language-processing nlp question-answering reinforcement-learning reinforcement-learning-from-human-feedback rlhf
Last synced: 11 Nov 2024
https://github.com/opening-up-chatgpt/opening-up-chatgpt.github.io
Tracking instruction-tuned LLM openness. Paper: Liesenfeld, Andreas, Alianda Lopez, and Mark Dingemanse. 2023. “Opening up ChatGPT: Tracking Openness, Transparency, and Accountability in Instruction-Tuned Text Generators.” In Proceedings of the 5th International Conference on Conversational User Interfaces. doi:10.1145/3571884.3604316.
chatgpt chatgpt-free llm open-source rlhf transparency
Last synced: 08 Nov 2024
https://github.com/cogment/cogment-verse
Research platform for Human-in-the-loop learning (HILL) & Multi-Agent Reinforcement Learning (MARL)
cogment human-in-the-loop-learning reinforcement-learning rlhf
Last synced: 31 Oct 2024
https://github.com/log10-io/log10
Python client library for improving your LLM app accuracy
agents ai anthropic artificial-intelligence autonomous-agents debugging evaluations feedback fine-tuning llmops llms logging monitoring openai python rlhf
Last synced: 22 Dec 2024
https://github.com/niutrans/vision-llm-alignment
This repository contains the code for SFT, RLHF, and DPO, designed for vision-based LLMs, including the LLaVA models and the LLaMA-3.2-vision models.
alignment dpo llama3-vision llava llm mllm multi-model ppo reward rlhf sft vision
Last synced: 18 Nov 2024
https://github.com/natolambert/rlhf-book
Textbook on reinforcement learning from human feedback
Last synced: 10 Dec 2024
https://github.com/jackaduma/alpaca-lora-rlhf-pytorch
A full pipeline to finetune Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the Alpaca architecture. Basically ChatGPT but with Alpaca
alpaca chatgpt deepspeed finetune gpt llama llm lora peft ppo pytorch reward-models rlhf
Last synced: 11 Nov 2024
https://github.com/rlhflow/directional-preference-alignment
Directional Preference Alignment
ai-alignment large-language-models rlhf
Last synced: 15 Nov 2024
https://github.com/ssbuild/chatglm_rlhf
chatglm_rlhf_finetuning
chat chatglm finetuning lora qlora reward rlhf
Last synced: 08 Nov 2024
https://github.com/quentinwach/image-ranker
Rank images using TrueSkill by comparing them against each other in the browser. 🖼📊
data-analysis deep-learning fine-tuning finetune flux generative-ai image-analysis image-annotation image-annotation-tool image-classification image-rank rank ranking ranking-algorithm reinforcement-learning rlhf stable-diffusion trueskill trueskill-algorithm web-ui
Last synced: 13 Nov 2024
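TrueSkill models each item's quality as a Gaussian N(μ, σ²) and updates both parameters after every pairwise comparison. A minimal sketch of the standard two-player, no-draw TrueSkill update follows, using the conventional defaults μ = 25, σ = 25/3, β = σ/2 and dynamics τ = σ/100; image-ranker may configure these differently or rely on an existing library, and ranking by the conservative score μ − 3σ is an assumption, not necessarily what the tool displays.

```python
import math
from dataclasses import dataclass

# Conventional TrueSkill defaults (assumed; image-ranker may use other values).
MU0 = 25.0
SIGMA0 = MU0 / 3
BETA = SIGMA0 / 2     # performance noise per comparison
TAU = SIGMA0 / 100    # dynamics noise added before each update

@dataclass
class Rating:
    mu: float = MU0
    sigma: float = SIGMA0

def _pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def _cdf(x: float) -> float:
    # erfc keeps precision in the lower tail, where 1 + erf(x) would round to 0.
    return 0.5 * math.erfc(-x / math.sqrt(2))

def update(winner: Rating, loser: Rating) -> tuple[Rating, Rating]:
    """Two-player TrueSkill update for a decisive comparison (no draws)."""
    w_var = winner.sigma ** 2 + TAU ** 2
    l_var = loser.sigma ** 2 + TAU ** 2
    c2 = 2 * BETA ** 2 + w_var + l_var
    c = math.sqrt(c2)
    t = (winner.mu - loser.mu) / c
    v = _pdf(t) / _cdf(t)  # mean shift: large when the winner was expected to lose
    w = v * (v + t)        # variance shrink factor, always in (0, 1)
    new_winner = Rating(winner.mu + w_var / c * v,
                        math.sqrt(w_var * (1 - w_var / c2 * w)))
    new_loser = Rating(loser.mu - l_var / c * v,
                       math.sqrt(l_var * (1 - l_var / c2 * w)))
    return new_winner, new_loser

def conservative_score(r: Rating) -> float:
    """Rank by mu - 3*sigma so rarely compared images are not over-rated."""
    return r.mu - 3 * r.sigma

# Hypothetical comparisons made in the browser: (preferred image, other image).
ratings = {name: Rating() for name in ["a.png", "b.png", "c.png"]}
for winner, loser in [("a.png", "b.png"), ("a.png", "c.png"), ("b.png", "c.png")]:
    ratings[winner], ratings[loser] = update(ratings[winner], ratings[loser])
ranking = sorted(ratings, key=lambda n: conservative_score(ratings[n]), reverse=True)
```

Each comparison moves the two means in opposite directions and shrinks both uncertainties, so a consistent set of preferences yields a stable ordering after only a few clicks.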
https://github.com/AmirMotefaker/Create-your-own-ChatGPT
Create your own ChatGPT with Python
ai artificial-intelligence chatgpt chatgpt-api chatgpt3 large-language-model llm machine-learning ml openai openai-api python rlhf
Last synced: 31 Oct 2024
https://github.com/arunprsh/ChatGPT-Decoded-GPT2-FAQ-Bot-RLHF-PPO
A Practical Guide to Developing a Reliable FAQ Chatbot with Reinforcement Learning and Human Feedback using GPT-2 on AWS
aws chatbot gpt-2 gpt2 question-answering reinforcement-learning rlhf sagemaker transformers
Last synced: 31 Oct 2024
https://github.com/log10-io/log10js
JavaScript client library for managing your LLM data in one place
ai artificial-intelligence autonomous-agents debugging javascript langchain langchain-js llmops logging monitoring openai openai-api rlhf
Last synced: 14 Dec 2024
https://github.com/DaehanKim/EasyRLHF
EasyRLHF aims to provide an easy and minimal interface to train aligned language models, using off-the-shelf solutions and datasets
dpo instruction-tuning ipo language-model rlhf rrhf sft
Last synced: 31 Oct 2024
https://github.com/jeremy-collins/robot-rlhf
Robot Learning from Human Feedback. Inspired by advancements in NLP, we train a robot policy via reinforcement learning using a reward function learned exclusively from human preferences.
alignment chatgpt reinforcement-learning rlhf robotics
Last synced: 31 Oct 2024
https://github.com/codename-detective/prompt-to-song-generation-using-large-language-models
This project uses LLMs to generate music from text by understanding prompts, creating lyrics, determining genre, and composing melodies. It harnesses LLM capabilities to create songs based on text inputs through a multi-step approach.
deep-learning deep-reinforcement-learning flan-t5 genre-classification llama3 llms natural-language-processing policy-gradient rlhf seq-to-seq transformers
Last synced: 15 Nov 2024
https://github.com/ZiyiZhang27/tdpo
[ICML 2024] Code for the paper "Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases"
alignment diffusion-models human-feedback reinforcement-learning rlhf stable-diffusion text-to-image
Last synced: 09 Nov 2024
https://github.com/shreyansh26/llm-activation-steering-experiments
Some experiments with activation steering in LLMs
llama2 llama2-7b red-teaming rlhf
Last synced: 14 Nov 2024
https://github.com/li-plus/nanorlhf
Train a tiny LLaMA model from scratch to repeat your words using Reinforcement Learning from Human Feedback (RLHF)
deep-reinforcement-learning llama llm ppo reinforcement-learning rlhf
Last synced: 06 Nov 2024
https://github.com/himanshuvnm/foundation-model-large-language-model-fm-llm
This repository implements key tasks on which modern generative AI concepts are built. In particular, it focuses on three coding exercises involving large language models. Further details are given in the README.md file.
attention-is-all-you-need aws fine-tuning flan-t5 foundation-models generative-ai hate-speech-detection huggingface huggingface-transformers large-language-models lora low-rank-ada ml-m5-2xlarge peft-fine-tuning-llm python3 pytorch rlhf rnn-pytorch
Last synced: 06 Nov 2024
https://github.com/augustasmacijauskas/mlmi-thesis
Code for my thesis titled "Eliciting latent knowledge from language reward models" for the MPhil in Machine Learning and Machine Intelligence at the University of Cambridge
alignment interpretability rlhf
Last synced: 24 Nov 2024
https://github.com/soheil-mp/rlhf-in-llama
Reinforcement Learning from Human Feedback (RLHF) in Llama 2
Last synced: 09 Dec 2024
https://github.com/esmail-ibraheem/axon
AI research lab🔬: implementations of AI papers and theoretical research: InstructGPT, llama, transformers, diffusion models, RLHF, etc.
llama llms paper-implementations pytorch rlhf transformers
Last synced: 13 Nov 2024