Projects in Awesome Lists by PKU-Alignment
A curated list of projects in awesome lists by PKU-Alignment.
https://github.com/pku-alignment/align-anything
Align Anything: Training All-modality Model with Feedback
chameleon dpo large-language-models multimodal rlhf vision-language-model
Last synced: 14 May 2025
https://github.com/PKU-Alignment/align-anything
Align Anything: Training All-modality Model with Feedback
chameleon dpo large-language-models multimodal rlhf vision-language-model
Last synced: 01 Apr 2025
https://github.com/pku-alignment/safe-rlhf
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
ai-safety alpaca beaver datasets deepspeed gpt large-language-models llama llm llms reinforcement-learning reinforcement-learning-from-human-feedback rlhf safe-reinforcement-learning safe-reinforcement-learning-from-human-feedback safe-rlhf safety transformer transformers vicuna
Last synced: 16 May 2025
https://github.com/PKU-Alignment/safe-rlhf
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
ai-safety alpaca beaver datasets deepspeed gpt large-language-models llama llm llms reinforcement-learning reinforcement-learning-from-human-feedback rlhf safe-reinforcement-learning safe-reinforcement-learning-from-human-feedback safe-rlhf safety transformer transformers vicuna
Last synced: 09 May 2025
https://github.com/PKU-Alignment/omnisafe
JMLR: OmniSafe is an infrastructural framework for accelerating SafeRL research.
benchmark constraint-rl constraint-satisfaction-problem deep-learning deep-reinforcement-learning machine-learning pytorch reinforcement-learning safe-reinforcement-learning safe-rl saferl safety-critical safety-gym safety-gymnasium
Last synced: 30 Jul 2025
https://github.com/pku-alignment/omnisafe
JMLR: OmniSafe is an infrastructural framework for accelerating SafeRL research.
benchmark constraint-rl constraint-satisfaction-problem deep-learning deep-reinforcement-learning machine-learning pytorch reinforcement-learning safe-reinforcement-learning safe-rl saferl safety-critical safety-gym safety-gymnasium
Last synced: 14 May 2025
https://github.com/pku-alignment/safety-gymnasium
NeurIPS 2023: Safety-Gymnasium: A Unified Safe Reinforcement Learning Benchmark
constraint-rl constraint-satisfaction-problem reinforcement-learning safe-policy-optimization safe-reinforcement-learning safe-reinforcement-learning-environments safety-critical safety-critical-systems
Last synced: 14 May 2025
https://github.com/PKU-Alignment/safety-gymnasium
NeurIPS 2023: Safety-Gymnasium: A Unified Safe Reinforcement Learning Benchmark
constraint-rl constraint-satisfaction-problem reinforcement-learning safe-policy-optimization safe-reinforcement-learning safe-reinforcement-learning-environments safety-critical safety-critical-systems
Last synced: 30 Jul 2025
https://github.com/pku-alignment/safe-policy-optimization
NeurIPS 2023: Safe Policy Optimization: A benchmark repository for safe reinforcement learning algorithms
benchmarks constrained-reinforcement-learning reinforcement-learning-algorithms safe safe-reinforcement-learning
Last synced: 07 May 2025
https://github.com/pku-alignment/aligner
[NeurIPS 2024 Oral] Aligner: Efficient Alignment by Learning to Correct
aisafety aligner alignment interpretability llm mecinterp rlhf weak-to-strong
Last synced: 07 May 2025
https://github.com/pku-alignment/beavertails
BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).
ai-safety beaver datasets gpt human-feedback human-feedback-data language-model large-language-model llama llm llms rlhf safe-rlhf safety
Last synced: 09 Aug 2025
https://github.com/pku-alignment/alignmentsurvey
AI Alignment: A Comprehensive Survey
ai alignment awesome deep-learning interpretability large-language-models papers red-teaming reinforcement-learning survey
Last synced: 07 May 2025
https://github.com/PKU-Alignment/AlignmentSurvey
AI Alignment: A Comprehensive Survey
ai alignment awesome deep-learning interpretability large-language-models papers red-teaming reinforcement-learning survey
Last synced: 07 May 2025
https://github.com/pku-alignment/proagent
AAAI24(Oral) ProAgent: Building Proactive Cooperative Agents with Large Language Models
cooperative cooperative-ai human-ai human-ai-interaction language-model llm-agent overcooked
Last synced: 07 May 2025
https://github.com/pku-alignment/safedreamer
ICLR 2024: SafeDreamer: Safe Reinforcement Learning with World Models
constraint-rl constraint-satisfaction-problem reinforcement-learning safe-policy-optimization safe-reinforcement-learning safety-critical-systems
Last synced: 07 May 2025
https://github.com/pku-alignment/safe-sora
SafeSora is a human preference dataset designed to support safety alignment research in the text-to-video generation field, aiming to enhance the helpfulness and harmlessness of Large Vision Models (LVMs).
alignment human-preferences large-vision-models text-to-video-generation
Last synced: 07 May 2025
https://github.com/pku-alignment/progressgym
Alignment with a millennium of moral progress. Spotlight@NeurIPS 2024 Track on Datasets and Benchmarks.
Last synced: 07 May 2025
https://github.com/pku-alignment/redman
ReDMan is an open-source simulation platform that provides a standardized implementation of safe RL algorithms for Reliable Dexterous Manipulation.
Last synced: 07 May 2025
https://github.com/pku-alignment/sae-v
[ICML 2025 Poster] SAE-V: Interpreting Multimodal Models for Enhanced Alignment
Last synced: 19 Feb 2026