An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with ppo

A curated list of projects in awesome lists tagged with ppo.

https://github.com/datawhalechina/easy-rl

Chinese-language reinforcement learning tutorial (the "Mushroom Book" 🍄). Read online at: https://datawhalechina.github.io/easy-rl/

a3c ddpg deep-reinforcement-learning double-dqn dqn dueling-dqn easy-rl imitation-learning policy-gradient ppo q-learning reinforcement-learning sarsa td3

Last synced: 10 May 2025

https://github.com/thu-ml/tianshou

An elegant PyTorch deep reinforcement learning library.

a2c atari bcq cql ddpg double-dqn dqn drl imitation-learning mujoco npg policy-gradient ppo pytorch rl sac td3 transferlab trpo

Last synced: 13 May 2025

https://github.com/vwxyzjn/cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)

a2c actor-critic advantage-actor-critic ale atari deep-learning deep-reinforcement-learning gym machine-learning phasic-policy-gradient ppo proximal-policy-optimization python pytorch reinforcement-learning wandb

Last synced: 14 May 2025

https://github.com/andri27-ts/reinforcement-learning

Learn Deep Reinforcement Learning in 60 days! Lectures & Code in Python. Reinforcement Learning + Deep Learning

a2c artificial-intelligence deep-learning deep-reinforcement-learning deepmind dqn evolution-strategies machine-learning policy-gradients ppo qlearning reinforcement-learning

Last synced: 15 May 2025

https://github.com/andri27-ts/Reinforcement-Learning

Learn Deep Reinforcement Learning in 60 days! Lectures & Code in Python. Reinforcement Learning + Deep Learning

a2c artificial-intelligence deep-learning deep-reinforcement-learning deepmind dqn evolution-strategies machine-learning policy-gradients ppo qlearning reinforcement-learning

Last synced: 15 Mar 2025

https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).

a2c acktr actor-critic advantage-actor-critic ale atari continuous-control deep-learning deep-reinforcement-learning hessian kfac kronecker-factored-approximation mujoco natural-gradients ppo proximal-policy-optimization pytorch reinforcement-learning roboschool second-order

Last synced: 13 Apr 2025

https://github.com/seungeunrho/minimalrl

Implementations of basic RL algorithms in minimal lines of code! (PyTorch-based)

a2c a3c acer ddpg deep-learning deep-reinforcement-learning dqn machine-learning policy-gradients ppo pytorch reinforce reinforcement-learning sac simple

Last synced: 15 May 2025

https://github.com/seungeunrho/minimalRL

Implementations of basic RL algorithms in minimal lines of code! (PyTorch-based)

a2c a3c acer ddpg deep-learning deep-reinforcement-learning dqn machine-learning policy-gradients ppo pytorch reinforce reinforcement-learning sac simple

Last synced: 03 Apr 2025

https://github.com/XinJingHao/DRL-Pytorch

Clean, Robust, and Unified PyTorch implementation of popular Deep Reinforcement Learning (DRL) algorithms (Q-learning, Duel DDQN, PER, C51, Noisy DQN, PPO, DDPG, TD3, SAC, ASL)

asl c51 categorical-dqn ddpg deep-reinforcement-learning double-dqn dueling-dqn machine-learning noisynet-dqn ppo prioritized-experience-replay pytorch q-learning reinforcement-learning sac td3

Last synced: 04 Mar 2025

https://github.com/marlbenchmark/on-policy

This is the official implementation of Multi-Agent PPO (MAPPO).

algorithms hanabi mappo mpes multi-agent ppo smac starcraftii

Last synced: 15 May 2025

https://github.com/kengz/slm-lab

Modular Deep Reinforcement Learning framework in PyTorch. Companion library of the book "Foundations of Deep Reinforcement Learning".

a2c a3c benchmark deep-reinforcement-learning dqn policy-gradient ppo pytorch reinforcement-learning sac

Last synced: 14 May 2025

https://github.com/kengz/SLM-Lab

Modular Deep Reinforcement Learning framework in PyTorch. Companion library of the book "Foundations of Deep Reinforcement Learning".

a2c a3c benchmark deep-reinforcement-learning dqn policy-gradient ppo pytorch reinforcement-learning sac

Last synced: 01 Apr 2025

https://github.com/khrylx/pytorch-rl

PyTorch implementation of Deep Reinforcement Learning: Policy Gradient methods (TRPO, PPO, A2C) and Generative Adversarial Imitation Learning (GAIL). Fast Fisher vector product TRPO.

a2c deep-reinforcement-learning fisher-vectors generative-adversarial-network policy-gradient ppo proximal-policy-optimization pytorch pytorch-rl reinforcement-learning trpo

Last synced: 12 Apr 2025

https://github.com/Khrylx/PyTorch-RL

PyTorch implementation of Deep Reinforcement Learning: Policy Gradient methods (TRPO, PPO, A2C) and Generative Adversarial Imitation Learning (GAIL). Fast Fisher vector product TRPO.

a2c deep-reinforcement-learning fisher-vectors generative-adversarial-network policy-gradient ppo proximal-policy-optimization pytorch pytorch-rl reinforcement-learning trpo

Last synced: 29 Apr 2025

https://github.com/lcswillems/rl-starter-files

RL starter files to immediately train, visualize and evaluate an agent without writing a single line of code

a2c a3c minigrid multi-process ppo preprocessed-observations pytorch reward-shaping

Last synced: 26 Oct 2025

https://github.com/TianhongDai/reinforcement-learning-algorithms

This repository contains PyTorch implementations of most classic deep reinforcement learning algorithms, including DQN, DDQN, Dueling Network, DDPG, SAC, A2C, PPO, and TRPO. (More algorithms are still in progress.)

a2c actor-critic algorithm atari2600 ddpg deep-learning deep-reinforcement-learning dqn dueling-dqn flappy-bird ppo proximal-policy-optimization pytorch sac soft-actor-critic trpo trust-region-policy-optimization

Last synced: 19 Jul 2025

https://github.com/archsyscall/deeprl-tensorflow2

🐋 Simple implementations of various popular Deep Reinforcement Learning algorithms using TensorFlow2

a2c a3c ddpg deep-learning deep-reinforcement-learning double-dqn dqn dueling-dqn machine-learning ppo rainbow-dqn reinforce reinforcement-learning sac tensorflow tensorflow2 trpo

Last synced: 05 Apr 2025

https://github.com/archsyscall/DeepRL-TensorFlow2

🐋 Simple implementations of various popular Deep Reinforcement Learning algorithms using TensorFlow2

a2c a3c ddpg deep-learning deep-reinforcement-learning double-dqn dqn dueling-dqn machine-learning ppo rainbow-dqn reinforce reinforcement-learning sac tensorflow tensorflow2 trpo

Last synced: 15 Oct 2025

https://github.com/jianzhnie/LLamaTuner

Easy and efficient fine-tuning of LLMs (supports LLama, LLama2, LLama3, Qwen, Baichuan, GLM, Falcon). Efficient quantized training and deployment of large models.

chatgpt dpo llama llama3 mixtral ppo qlora qwen rlhf

Last synced: 23 Apr 2025

https://github.com/jianzhnie/llamatuner

Easy and efficient fine-tuning of LLMs (supports LLama, LLama2, LLama3, Qwen, Baichuan, GLM, Falcon). Efficient quantized training and deployment of large models.

chatgpt dpo llama llama3 mixtral ppo qlora qwen rlhf

Last synced: 15 May 2025

https://github.com/rohanpsingh/learninghumanoidwalking

Training a humanoid robot for locomotion using Reinforcement Learning

bipedal-robots cassie humanoids jvrc-1 mujoco ppo reinforcement-learning

Last synced: 04 Oct 2025

https://github.com/dongminlee94/deep_rl

PyTorch implementation of deep reinforcement learning algorithms

a2c ddpg ddqn deep-reinforcement-learning dqn model-free-rl npg ppo pytorch sac sac-aea td3 trpo vpg

Last synced: 05 Apr 2025

https://github.com/skylark0924/machine-learning-is-all-you-need

🔥🌟 "Machine Learning 格物志": basic ML + DL + RL code and notes using sklearn, PyTorch, TensorFlow, Keras and, most importantly, from scratch! 💪 This repository is ALL You Need!

actor-critic convolutional-neural-networks ddpg decision-trees deep-reinforcement-learning dqn gan k-nearest-neighbours keras logistic-regression lstm naive-bayes-classifier ppo pytorch qlearning random-forest resnet support-vector-machine tensorflow trpo

Last synced: 05 Apr 2025

https://github.com/iffix/machin

Reinforcement learning library (framework) designed for PyTorch; implements DQN, DDPG, A2C, PPO, SAC, MADDPG, A3C, APEX, IMPALA ...

a3c-pytorch ddpg deep-learning distributed dqn ppo prioritized-experience-replay python pytorch pytorch-lightning pytorch-reinforcement-learning reinforcement-learning sac td3

Last synced: 04 Apr 2025

https://github.com/sail-sg/oat

🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc.

alignment distributed-rl distributed-training dpo dueling-bandits grpo llm llm-aligment llm-exploration online-alignment online-rl ppo r1-zero reasoning rlhf thompson-sampling

Last synced: 08 May 2025

https://github.com/rlgraph/rlgraph

RLgraph: Modular computation graphs for deep reinforcement learning

deep-learning deep-reinforcement-learning dqn machine-learning neural-networks ppo pytorch reinforcement-learning tensorflow

Last synced: 03 Oct 2025

https://github.com/huawei-noah/xingtian

xingtian is a componentized library for the development and verification of reinforcement learning algorithms

dqn impala muzero ppo qmix reinforcement-learning-algorithms

Last synced: 05 Apr 2025

https://github.com/miroblog/tf_deep_rl_trader

Trading Environment(OpenAI Gym) + PPO(TensorForce)

ppo proximal-policy-optimization stock-market tensorflow tensorforce trading

Last synced: 24 Mar 2025

https://github.com/taherfattahi/ppo-rocket-landing

Proximal Policy Optimization (PPO) algorithm using PyTorch to train an agent for a rocket landing task in a custom environment

ai machine-learning ppo ppo-pytorch pytorch reinforcement-learning

Last synced: 04 Apr 2025

https://github.com/jackaduma/vicuna-lora-rlhf-pytorch

A full pipeline to finetune Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the Vicuna architecture. Basically ChatGPT but with Vicuna.

chatgpt finetune gpt llama llm lora peft ppo pytorch reward-models rlhf vicuna vicuna-7b

Last synced: 13 Apr 2025

https://github.com/liuzuxin/fsrl

🚀 A fast safe reinforcement learning library in PyTorch

cpo cvpo decision-making library ppo pytorch reinforcement-learning robotics sac safe-rl safety-critical trpo trustworthy-ai

Last synced: 03 Apr 2025

https://github.com/jianzhnie/Open-R1

The open source implementation of ChatGPT, Alpaca, Vicuna and RLHF Pipeline. Implement a ChatGPT from scratch.

chatgpt gpt llama llm lora peft ppo rlhf stanford-alpaca

Last synced: 05 Oct 2025

https://github.com/yongzhuo/chatglm-maths

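chatglm-6b fine-tuning / LoRA / PPO / inference. Training samples are automatically generated integer and decimal arithmetic problems (addition, subtraction, multiplication, division). Runs on GPU or CPU.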

chatglm chatgpt fine-tuning maths maths-problem ppo

Last synced: 11 Oct 2025

https://github.com/gordicaleksa/pytorch-learn-reinforcement-learning

A collection of various RL algorithms like policy gradients, DQN and PPO. The goal of this repo will be to make it a go-to resource for learning about RL. How to visualize, debug and solve RL problems. I've additionally included playground.py for learning more about OpenAI gym, etc.

deep-learning deep-q-network dqn jupyter policy-gradient ppo python pytorch pytorch-dqn pytorch-implementation pytorch-policy-gradient pytorch-ppo reinforcement-learning reinforcement-learning-algorithms rl

Last synced: 12 Sep 2025

https://github.com/jackaduma/chatglm-lora-rlhf-pytorch

A full pipeline to finetune ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the ChatGLM architecture. Basically ChatGPT but with ChatGLM.

chatglm chatglm-6b chatgpt deepspeed finetune gpt llama llm lora peft ppo pytorch reward-models rlhf

Last synced: 27 Apr 2025

https://github.com/niutrans/vision-llm-alignment

This repository contains the code for SFT, RLHF, and DPO, designed for vision-based LLMs, including the LLaVA models and the LLaMA-3.2-vision models.

alignment dpo llama3-vision llava llm mllm multi-model ppo reward rlhf sft vision

Last synced: 06 Apr 2025

https://github.com/NiuTrans/Vision-LLM-Alignment

This repository contains the code for SFT, RLHF, and DPO, designed for vision-based LLMs, including the LLaVA models and the LLaMA-3.2-vision models.

alignment dpo llama3-vision llava llm mllm multi-model ppo reward rlhf sft vision

Last synced: 07 May 2025

https://github.com/lorenmt/minimal-isaac-gym

A Minimal Example of Isaac Gym with DQN and PPO.

dqn isaac-gym ppo pytorch

Last synced: 02 Apr 2025

https://github.com/chendrag/mujoco-benchmark

Provide full reinforcement learning benchmark on mujoco environments, including ddpg, sac, td3, pg, a2c, ppo, library

baseline benchmark ddpg drl mujoco performance ppo pytorch results rl sac tianshou

Last synced: 17 Mar 2025

https://github.com/godka/pensieve-ppo

The simplest implementation of Pensieve (SIGCOMM '17) via state-of-the-art RL algorithms, including PPO, DQN, and SAC, with support for both TensorFlow and PyTorch.

a2c deep-learning dqn pensieve ppo pytorch reinforcement-learning tensorflow

Last synced: 07 Apr 2025

https://github.com/zhaoyingjun/general

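Alignment has become an essential step in fine-tuning GPT-style large models, and deep reinforcement learning is at the core of alignment. This project is a deep reinforcement learning application programming framework that supports training in non-gym environments and visual configuration, so you can get started with reinforcement learning programming in 30 minutes.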

ddpg deep-reinforcement-learning dqn gui gym ppo tensorflow2

Last synced: 13 Apr 2025

https://github.com/datvodinh/recurrent-ppo

A Reinforcement Learning Project using PPO + LSTM

lstm ppo reinforcement-learning

Last synced: 14 Oct 2025

https://github.com/CN-UPB/DeepCoMP

Dynamic multi-cell selection for cooperative multipoint (CoMP) using (multi-agent) deep reinforcement learning

cell-selection cellular comp mobile multi-agent-reinforcement-learning ppo python ray reinforcement-learning rllib simulation wireless

Last synced: 13 May 2025

https://github.com/jackaduma/alpaca-lora-rlhf-pytorch

A full pipeline to finetune Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the Alpaca architecture. Basically ChatGPT but with Alpaca.

alpaca chatgpt deepspeed finetune gpt llama llm lora peft ppo pytorch reward-models rlhf

Last synced: 16 Jun 2025

https://github.com/instadeepai/sebulba

🪐 The Sebulba architecture to scale reinforcement learning on Cloud TPUs in JAX

ai deep-learning hpc jax machine-learning podracer ppo reinforcement-learning sebulba tpu

Last synced: 03 Nov 2025

https://github.com/seungeunrho/football-paris

The exact code used by the team "liveinparis" in the Kaggle football competition, ranked 6th/1141

gfootball kaggle liveinparis ppo pytorch reinforcement-learning self-play

Last synced: 28 Apr 2025

https://github.com/ugurkanates/neurirs2019dronechallengerl

Long-Term Planning with Deep Reinforcement Learning on Autonomous Drones

autonomous autonomousdrones deeplearning drl drones pathplanning ppo reinforcementlearning

Last synced: 26 Apr 2025

https://github.com/chagmgang/pytorch_ppo_rl

PyTorch implementation of intrinsic curiosity module with proximal policy optimization

breakout curiosity deep-learning icm intrinsic-curiosity-module multi-process ppo pytorch reinforcement-learning

Last synced: 22 Apr 2025

https://github.com/guichristmann/thormang3-gogoro-PPO

Steering-based control of a two-wheeled vehicle using RL-PPO and NVIDIA Isaac Gym.

isaac-gym ppo pytorch

Last synced: 02 Apr 2025

https://github.com/datvodinh/ppo-transformer

A Reinforcement Learning Project using PPO + Transformer

ppo reinforcement-learning transformer

Last synced: 27 Sep 2025

https://github.com/bilalkabas/DRL-Nav

Deep Reinforcement Learning based autonomous navigation in realistic simulation environments.

airsim deep-reinforcement-learning gym ppo unreal-engine

Last synced: 11 Mar 2025

https://github.com/jason-cky/deeprl-pytorch

PyTorch implementations of various Deep Reinforcement Learning algorithms on pybullet environments.

ddpg ppo pybullet-environments python3 pytorch-implementation reinforcement-learning-algorithms rlbench td3 trpo

Last synced: 10 Jul 2025

https://github.com/yeyupiaoling/pytorch-ppo

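A PPO reinforcement learning model implemented in PyTorch that supports training on various games, such as Super Mario, Snow Bros., Contra, and more.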

ppo pytroch

Last synced: 14 Aug 2025

https://github.com/jianzhnie/rltoolkit

RLToolkit is a flexible and highly efficient reinforcement learning framework. Includes implementations of DQN, AC, ACER, A2C, A3C, PG, DDPG, TRPO, PPO, SAC, TD3 and ....

a2c actor-critic ddpg ddqn dqn maddpg mappo ppo qmix rl sac td3 trpo

Last synced: 03 Aug 2025

https://github.com/mindspore-courses/deep-reinforcement-learning-algorithms-with-mindspore

MindSpore implementations of deep reinforcement learning algorithms and environments

deep-learning deep-reinforcement-learning mindspore ppo reinforcement-learning tutorial

Last synced: 28 Apr 2025