An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with human-feedback

A curated list of projects in awesome lists tagged with human-feedback.

https://github.com/lucidrains/palm-rlhf-pytorch

Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM

artificial-intelligence attention-mechanisms deep-learning human-feedback reinforcement-learning transformers

Last synced: 14 May 2025

https://github.com/conceptofmind/lamda-rlhf-pytorch

Open-source pre-training implementation of Google's LaMDA in PyTorch. Adding RLHF similar to ChatGPT.

artificial-intelligence attention-mechanism deep-learning human-feedback machine-learning reinforcement-learning transformers

Last synced: 21 Apr 2025

https://github.com/huggingface/data-is-better-together

Let's build better datasets, together!

community datasets human-feedback machine-learning

Last synced: 14 Oct 2025

https://github.com/wxjiao/ParroT

The ParroT framework to enhance and regulate the Translation Abilities during Chat based on open-sourced LLMs (e.g., LLaMA-7b, Bloomz-7b1-mt) and human-written translation and evaluation data.

bloomz chatgpt contrastive error-guided gpt-4 human-feedback instruction-tuning llama lora machine-translation

Last synced: 17 Apr 2025

https://github.com/xrsrke/instructgoose

Implementation of Reinforcement Learning from Human Feedback (RLHF)

chatgpt human-feedback instructgpt reinforcement-learning rlhf

Last synced: 09 Apr 2025

https://github.com/pku-alignment/beavertails

BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).

ai-safety beaver datasets gpt human-feedback human-feedback-data language-model large-language-model llama llm llms rlhf safe-rlhf safety

Last synced: 09 Aug 2025

https://github.com/davidberenstein1957/dataset-viber

Dataset Viber is your chill repo for data collection, annotation and vibe checks.

data-collection data-quality evaluation human-feedback

Last synced: 06 Mar 2025

https://github.com/ZiyiZhang27/tdpo

[ICML 2024] Code for the paper "Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases"

alignment diffusion-models human-feedback reinforcement-learning rlhf stable-diffusion text-to-image

Last synced: 19 Apr 2025

https://github.com/cluebbers/dpo-rlhf-paraphrase-types

Enhancing paraphrase-type generation using Direct Preference Optimization (DPO) and Reinforcement Learning from Human Feedback (RLHF), with large-scale HPC support. This project aligns model outputs to human-ranked data for robust, safety-focused NLP.

alignment deep-learning direct-preference-optimization human-feedback paraphrase-generation paraphrase-type-generation reinforcement-learning transformers

Last synced: 29 Apr 2026

https://github.com/auraoneai/open

Open tools for the human-judgment layer of AI evaluation: EvalKit (Python package + CLI), Robotics ReviewKit, and the Buying Toolkit.

ai-safety auraone evals evaluation human-feedback lerobot llm openx rlds robotics rubrics teleoperation

Last synced: 28 May 2026

https://github.com/sunwang-ai-linguist/bilingual-rlhf-semantic-repair-corpus

Daily Mandarin-English semantic alignment corpus for RLHF training, tone repair, AI metaphor translation, and OpenAI contributor tracking. #SamPickMe #RLHF #TSMC

bilingual bilingual-corpora chatgpt crosslingual crosslingual-transfer gpt-training human-feedback openai rlhf sam-altman semantic-alignment tone-correction

Last synced: 28 Apr 2026