Projects in Awesome Lists tagged with ppo
A curated list of projects in awesome lists tagged with ppo.
https://github.com/datawhalechina/easy-rl
Reinforcement learning tutorial in Chinese (the "Mushroom Book" 🍄); read online at https://datawhalechina.github.io/easy-rl/
a3c ddpg deep-reinforcement-learning double-dqn dqn dueling-dqn easy-rl imitation-learning policy-gradient ppo q-learning reinforcement-learning sarsa td3
Last synced: 10 May 2025
https://github.com/morvanzhou/reinforcement-learning-with-tensorflow
Simple reinforcement learning tutorials; 莫烦Python (Morvan Python), AI teaching in Chinese
a3c actor-critic asynchronous-advantage-actor-critic ddpg deep-deterministic-policy-gradient deep-q-network double-dqn dqn dueling-dqn machine-learning policy-gradient ppo prioritized-replay proximal-policy-optimization q-learning reinforcement-learning sarsa sarsa-lambda tensorflow-tutorials tutorial
Last synced: 13 May 2025
https://github.com/MorvanZhou/Reinforcement-learning-with-tensorflow
Simple reinforcement learning tutorials; 莫烦Python (Morvan Python), AI teaching in Chinese
a3c actor-critic asynchronous-advantage-actor-critic ddpg deep-deterministic-policy-gradient deep-q-network double-dqn dqn dueling-dqn machine-learning policy-gradient ppo prioritized-replay proximal-policy-optimization q-learning reinforcement-learning sarsa sarsa-lambda tensorflow-tutorials tutorial
Last synced: 30 Mar 2025
https://github.com/thu-ml/tianshou
An elegant PyTorch deep reinforcement learning library.
a2c atari bcq cql ddpg double-dqn dqn drl imitation-learning mujoco npg policy-gradient ppo pytorch rl sac td3 transferlab trpo
Last synced: 13 May 2025
https://github.com/vwxyzjn/cleanrl
High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
a2c actor-critic advantage-actor-critic ale atari deep-learning deep-reinforcement-learning gym machine-learning phasic-policy-gradient ppo proximal-policy-optimization python pytorch reinforcement-learning wandb
Last synced: 14 May 2025
https://github.com/udacity/deep-reinforcement-learning
Repo for the Deep Reinforcement Learning Nanodegree program
cross-entropy ddpg deep-reinforcement-learning dqn dynamic-programming hill-climbing ml-agents neural-networks openai-gym openai-gym-solutions ppo pytorch pytorch-rl reinforcement-learning reinforcement-learning-algorithms rl-algorithms
Last synced: 19 Jul 2025
https://github.com/andri27-ts/reinforcement-learning
Learn Deep Reinforcement Learning in 60 days! Lectures & Code in Python. Reinforcement Learning + Deep Learning
a2c artificial-intelligence deep-learning deep-reinforcement-learning deepmind dqn evolution-strategies machine-learning policy-gradients ppo qlearning reinforcement-learning
Last synced: 15 May 2025
https://github.com/andri27-ts/Reinforcement-Learning
Learn Deep Reinforcement Learning in 60 days! Lectures & Code in Python. Reinforcement Learning + Deep Learning
a2c artificial-intelligence deep-learning deep-reinforcement-learning deepmind dqn evolution-strategies machine-learning policy-gradients ppo qlearning reinforcement-learning
Last synced: 15 Mar 2025
https://github.com/sweetice/deep-reinforcement-learning-with-pytorch
PyTorch implementation of DQN, AC, ACER, A2C, A3C, PG, DDPG, TRPO, PPO, SAC, TD3 and ....
a2c a3c actor-critic actor-critic-algorithm algorithm alphago deep-learning deep-reinforcement-learning dqn policy-gradient ppo pytorch reinforce resnet sac sarsa td3 trpo
Last synced: 14 May 2025
https://github.com/ai4finance-foundation/elegantrl
Massively Parallel Deep Reinforcement Learning. 🔥
a2c bipedalwalkerhardcore ddpg dqn drl-pytorch efficient gae lightweight model-free-rl multiple-gpu per ppo pytorch reinforcement-learning sac stable td3
Last synced: 13 May 2025
https://github.com/AI4Finance-Foundation/ElegantRL
Massively Parallel Deep Reinforcement Learning. 🔥
a2c bipedalwalkerhardcore ddpg dqn drl-pytorch efficient gae lightweight model-free-rl multiple-gpu per ppo pytorch reinforcement-learning sac stable td3
Last synced: 02 Apr 2025
https://github.com/sweetice/Deep-reinforcement-learning-with-pytorch
PyTorch implementation of DQN, AC, ACER, A2C, A3C, PG, DDPG, TRPO, PPO, SAC, TD3 and ....
a2c a3c actor-critic actor-critic-algorithm algorithm alphago deep-learning deep-reinforcement-learning dqn policy-gradient ppo pytorch reinforce resnet sac sarsa td3 trpo
Last synced: 01 May 2025
https://github.com/simoninithomas/deep_reinforcement_learning_course
Implementations from the free course Deep Reinforcement Learning with Tensorflow and PyTorch
a2c actor-critic deep-learning deep-q-learning deep-q-network deep-reinforcement-learning ppo pytorch qlearning tensorflow tensorflow-tutorials unity
Last synced: 14 May 2025
https://github.com/simoninithomas/Deep_reinforcement_learning_Course
Implementations from the free course Deep Reinforcement Learning with Tensorflow and PyTorch
a2c actor-critic deep-learning deep-q-learning deep-q-network deep-reinforcement-learning ppo pytorch qlearning tensorflow tensorflow-tutorials unity
Last synced: 19 Jul 2025
https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail
PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
a2c acktr actor-critic advantage-actor-critic ale atari continuous-control deep-learning deep-reinforcement-learning hessian kfac kronecker-factored-approximation mujoco natural-gradients ppo proximal-policy-optimization pytorch reinforcement-learning roboschool second-order
Last synced: 13 Apr 2025
https://github.com/shangtongzhang/deeprl
Modularized Implementation of Deep RL Algorithms in PyTorch
a2c categorical-dqn ddpg deep-reinforcement-learning deeprl double-dqn dqn dueling-network-architecture option-critic option-critic-architecture ppo prioritized-experience-replay pytorch quantile-regression rainbow td3
Last synced: 13 Apr 2025
https://github.com/ShangtongZhang/DeepRL
Modularized Implementation of Deep RL Algorithms in PyTorch
a2c categorical-dqn ddpg deep-reinforcement-learning deeprl double-dqn dqn dueling-network-architecture option-critic option-critic-architecture ppo prioritized-experience-replay pytorch quantile-regression rainbow td3
Last synced: 01 Apr 2025
https://github.com/seungeunrho/minimalrl
Implementations of basic RL algorithms with minimal lines of code! (PyTorch based)
a2c a3c acer ddpg deep-learning deep-reinforcement-learning dqn machine-learning policy-gradients ppo pytorch reinforce reinforcement-learning sac simple
Last synced: 15 May 2025
https://github.com/seungeunrho/minimalRL
Implementations of basic RL algorithms with minimal lines of code! (PyTorch based)
a2c a3c acer ddpg deep-learning deep-reinforcement-learning dqn machine-learning policy-gradients ppo pytorch reinforce reinforcement-learning sac simple
Last synced: 03 Apr 2025
https://github.com/ai4finance-foundation/finrl-trading
For trading. Please star.
a2c-algorithm automated-stock-trading ddpg deep-reinforcement-learning ensemble-strategy openai-gym ppo sharpe-ratio stock-trading stock-trading-strategy
Last synced: 15 May 2025
https://github.com/AI4Finance-Foundation/FinRL-Trading
For trading. Please star.
a2c-algorithm automated-stock-trading ddpg deep-reinforcement-learning ensemble-strategy openai-gym ppo sharpe-ratio stock-trading stock-trading-strategy
Last synced: 05 May 2025
https://github.com/nikhilbarhate99/ppo-pytorch
Minimal implementation of clipped objective Proximal Policy Optimization (PPO) in PyTorch
deep-learning deep-reinforcement-learning policy-gradient ppo ppo-pytorch proximal-policy-optimization pytorch pytorch-implmention pytorch-tutorial reinforcement-learning reinforcement-learning-algorithms
Last synced: 15 May 2025
https://github.com/XinJingHao/DRL-Pytorch
Clean, Robust, and Unified PyTorch implementation of popular Deep Reinforcement Learning (DRL) algorithms (Q-learning, Duel DDQN, PER, C51, Noisy DQN, PPO, DDPG, TD3, SAC, ASL)
asl c51 categorical-dqn ddpg deep-reinforcement-learning double-dqn dueling-dqn machine-learning noisynet-dqn ppo prioritized-experience-replay pytorch q-learning reinforcement-learning sac td3
Last synced: 04 Mar 2025
https://github.com/nikhilbarhate99/PPO-PyTorch
Minimal implementation of clipped objective Proximal Policy Optimization (PPO) in PyTorch
deep-learning deep-reinforcement-learning policy-gradient ppo ppo-pytorch proximal-policy-optimization pytorch pytorch-implmention pytorch-tutorial reinforcement-learning reinforcement-learning-algorithms
Last synced: 29 Apr 2025
https://github.com/marlbenchmark/on-policy
This is the official implementation of Multi-Agent PPO (MAPPO).
algorithms hanabi mappo mpes multi-agent ppo smac starcraftii
Last synced: 15 May 2025
https://github.com/kengz/slm-lab
Modular Deep Reinforcement Learning framework in PyTorch. Companion library of the book "Foundations of Deep Reinforcement Learning".
a2c a3c benchmark deep-reinforcement-learning dqn policy-gradient ppo pytorch reinforcement-learning sac
Last synced: 14 May 2025
https://github.com/kengz/SLM-Lab
Modular Deep Reinforcement Learning framework in PyTorch. Companion library of the book "Foundations of Deep Reinforcement Learning".
a2c a3c benchmark deep-reinforcement-learning dqn policy-gradient ppo pytorch reinforcement-learning sac
Last synced: 01 Apr 2025
https://github.com/khrylx/pytorch-rl
PyTorch implementation of Deep Reinforcement Learning: Policy Gradient methods (TRPO, PPO, A2C) and Generative Adversarial Imitation Learning (GAIL). Fast Fisher vector product TRPO.
a2c deep-reinforcement-learning fisher-vectors generative-adversarial-network policy-gradient ppo proximal-policy-optimization pytorch pytorch-rl reinforcement-learning trpo
Last synced: 12 Apr 2025
https://github.com/vietnh1009/super-mario-bros-ppo-pytorch
Proximal Policy Optimization (PPO) algorithm for Super Mario Bros
ai deep-learning gym mario openai openai-gym ppo ppo2 proximal-policy-optimization python python3 pytorch reinforcement-learning super-mario-bros
Last synced: 16 May 2025
https://github.com/Khrylx/PyTorch-RL
PyTorch implementation of Deep Reinforcement Learning: Policy Gradient methods (TRPO, PPO, A2C) and Generative Adversarial Imitation Learning (GAIL). Fast Fisher vector product TRPO.
a2c deep-reinforcement-learning fisher-vectors generative-adversarial-network policy-gradient ppo proximal-policy-optimization pytorch pytorch-rl reinforcement-learning trpo
Last synced: 29 Apr 2025
https://github.com/qfettes/deeprl-tutorials
Contains high quality implementations of Deep Reinforcement Learning algorithms written in PyTorch
a2c actor-critic advantage-actor-critic categorical-dqn deep-q-network deep-recurrent-q-network deep-reinforcement-learning deeprl-tutorials double-dqn dueling-dqn gae multi-step-learning noisy-networks ppo prioritized-experience-replay python3 pytorch quantile-regression rainbow reinforcement-learning
Last synced: 16 May 2025
https://github.com/qfettes/DeepRL-Tutorials
Contains high quality implementations of Deep Reinforcement Learning algorithms written in PyTorch
a2c actor-critic advantage-actor-critic categorical-dqn deep-q-network deep-recurrent-q-network deep-reinforcement-learning deeprl-tutorials double-dqn dueling-dqn gae multi-step-learning noisy-networks ppo prioritized-experience-replay python3 pytorch quantile-regression rainbow reinforcement-learning
Last synced: 01 May 2025
https://github.com/agi-brain/xuance
XuanCe: A Comprehensive and Unified Deep Reinforcement Learning Library
a2c atari ddpg decision-making dqn google-research-football maddpg magent mappo mindspore mpe mujoco multi-agent-reinforcement-learning ppo pytorch qmix reinforcement-learning reinforcement-learning-library starcraft2 tensorflow2
Last synced: 12 Jun 2025
https://github.com/sudharsan13296/hands-on-reinforcement-learning-with-python
Master Reinforcement and Deep Reinforcement Learning using OpenAI Gym and TensorFlow
asynchronous-advantage-actor-critic deep-deterministic-policy-gradient deep-learning-algorithms deep-q-network deep-recurrent-q-network deep-reinforcement-learning double-dqn drqn dueling-dqn hindsight-experience-replay markov-decision-processes monte-carlo openai-gym policy-gradient policy-gradients ppo q-learning reinforcement-learning sarsa trpo
Last synced: 04 Apr 2025
https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python
Master Reinforcement and Deep Reinforcement Learning using OpenAI Gym and TensorFlow
asynchronous-advantage-actor-critic deep-deterministic-policy-gradient deep-learning-algorithms deep-q-network deep-recurrent-q-network deep-reinforcement-learning double-dqn drqn dueling-dqn hindsight-experience-replay markov-decision-processes monte-carlo openai-gym policy-gradient policy-gradients ppo q-learning reinforcement-learning sarsa trpo
Last synced: 02 Apr 2025
https://github.com/lcswillems/rl-starter-files
RL starter files in order to immediately train, visualize and evaluate an agent without writing any line of code
a2c a3c minigrid multi-process ppo preprocessed-observations pytorch reward-shaping
Last synced: 26 Oct 2025
https://github.com/TianhongDai/reinforcement-learning-algorithms
This repository contains PyTorch implementations of most classic deep reinforcement learning algorithms, including DQN, DDQN, Dueling Network, DDPG, SAC, A2C, PPO, TRPO. (More algorithms are still in progress)
a2c actor-critic algorithm atari2600 ddpg deep-learning deep-reinforcement-learning dqn dueling-dqn flappy-bird ppo proximal-policy-optimization pytorch sac soft-actor-critic trpo trust-region-policy-optimization
Last synced: 19 Jul 2025
https://github.com/luchris429/purejaxrl
Really Fast End-to-End Jax RL Implementations
deep-reinforcement-learning jax ppo reinforcement-learning reinforcement-learning-algorithms
Last synced: 20 Mar 2025
https://github.com/cpnota/autonomous-learning-library
A PyTorch library for building deep reinforcement learning agents.
a2c advantage-actor-critic ddpg deep-deterministic-policy-gradient deep-q-learning deep-reinforcement-learning dqn dqn-pytorch ppo proximal-policy-optimization reinforcement-learning reinforcement-learning-algorithms sac soft-actor-critic
Last synced: 11 Sep 2025
https://github.com/archsyscall/deeprl-tensorflow2
🐋 Simple implementations of various popular Deep Reinforcement Learning algorithms using TensorFlow2
a2c a3c ddpg deep-learning deep-reinforcement-learning double-dqn dqn dueling-dqn machine-learning ppo rainbow-dqn reinforce reinforcement-learning sac tensorflow tensorflow2 trpo
Last synced: 05 Apr 2025
https://github.com/archsyscall/DeepRL-TensorFlow2
🐋 Simple implementations of various popular Deep Reinforcement Learning algorithms using TensorFlow2
a2c a3c ddpg deep-learning deep-reinforcement-learning double-dqn dqn dueling-dqn machine-learning ppo rainbow-dqn reinforce reinforcement-learning sac tensorflow tensorflow2 trpo
Last synced: 15 Oct 2025
https://github.com/chenglongchen/pytorch-drl
PyTorch implementations of various Deep Reinforcement Learning (DRL) algorithms for both single agent and multi-agent.
a2c acktr actor-critic advantage-actor-critic ddpg deep-deterministic-policy-gradient deep-q-network deep-reinforcement-learning dqn drl madrl multi-agent ppo proximal-policy-optimization pytorch reinforcement-learning rl
Last synced: 05 Apr 2025
https://github.com/rohanpsingh/learninghumanoidwalking
Training a humanoid robot for locomotion using Reinforcement Learning
bipedal-robots cassie humanoids jvrc-1 mujoco ppo reinforcement-learning
Last synced: 04 Oct 2025
https://github.com/Omegastick/pytorch-cpp-rl
PyTorch C++ Reinforcement Learning
a2c actor-critic advantage-actor-critic continuous-control cplusplus cpp libtorch ppo proximal-policy-optimization pytorch pytorch-cpp-frontend pytorch-rl reinforcement-learning reinforcement-learning-algorithms
Last synced: 07 May 2025
https://github.com/dongminlee94/deep_rl
PyTorch implementation of deep reinforcement learning algorithms
a2c ddpg ddqn deep-reinforcement-learning dqn model-free-rl npg ppo pytorch sac sac-aea td3 trpo vpg
Last synced: 05 Apr 2025
https://github.com/mishalaskin/rad
RAD: Reinforcement Learning with Augmented Data
codebase data- data-augmentations deep-learning deep-learning-algorithms deep-neural-networks deep-q-learning deep-q-network deep-reinforcement-learning deeplearning-ai dm-control model-free mujoc off-policy ppo rad reinforcement-learning rl sac soft-actor-critic
Last synced: 06 Apr 2025
https://github.com/sudharsan13296/deep-reinforcement-learning-with-python
Master classic RL, deep RL, distributional RL, inverse RL, and more using OpenAI Gym and TensorFlow with extensive Math
a2c a3c actor-critic bellman-equation c51 ddpg deep-learning deep-reinforcement-learning double-dqn dqn inverse-reinforcement-learning openai-gym policy-gradient ppo q-learning reinforcement-learning sac td3 trpo
Last synced: 05 Apr 2025
https://github.com/skylark0924/machine-learning-is-all-you-need
🔥🌟 "Machine Learning 格物志" (roughly, "Machine Learning Investigation Notes"): basic ML + DL + RL code and notes using sklearn, PyTorch, TensorFlow, Keras and, most importantly, from scratch! 💪 This repository is ALL You Need!
actor-critic convolutional-neural-networks ddpg decision-trees deep-reinforcement-learning dqn gan k-nearest-neighbours keras logistic-regression lstm naive-bayes-classifier ppo pytorch qlearning random-forest resnet support-vector-machine tensorflow trpo
Last synced: 05 Apr 2025
https://github.com/iffix/machin
Reinforcement learning library (framework) designed for PyTorch, implements DQN, DDPG, A2C, PPO, SAC, MADDPG, A3C, APEX, IMPALA ...
a3c-pytorch ddpg deep-learning distributed dqn ppo prioritized-experience-replay python pytorch pytorch-lightning pytorch-reinforcement-learning reinforcement-learning sac td3
Last synced: 04 Apr 2025
https://github.com/pythonlessons/reinforcement_learning
Reinforcement learning tutorials
a2c a3c actor-critic-algorythm bipedalwalker d3qn ddqn dqn dueling-dqn lunarlander policy-gradient ppo ppo-agent reinforcement-learning
Last synced: 06 Oct 2025
https://github.com/zuoxingdong/lagom
lagom: A PyTorch infrastructure for rapid prototyping of reinforcement learning algorithms.
artificial-intelligence cem cmaes ddpg deep-deterministic-policy-gradient deep-learning deep-reinforcement-learning evolution-strategies machine-learning mujoco policy-gradient ppo proximal-policy-optimization python pytorch reinforcement-learning research sac soft-actor-critic td3
Last synced: 02 Aug 2025
https://github.com/sail-sg/oat
🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc.
alignment distributed-rl distributed-training dpo dueling-bandits grpo llm llm-aligment llm-exploration online-alignment online-rl ppo r1-zero reasoning rlhf thompson-sampling
Last synced: 08 May 2025
https://github.com/rlgraph/rlgraph
RLgraph: Modular computation graphs for deep reinforcement learning
deep-learning deep-reinforcement-learning dqn machine-learning neural-networks ppo pytorch reinforcement-learning tensorflow
Last synced: 03 Oct 2025
https://github.com/huawei-noah/xingtian
xingtian is a componentized library for the development and verification of reinforcement learning algorithms
dqn impala muzero ppo qmix reinforcement-learning-algorithms
Last synced: 05 Apr 2025
https://github.com/idreesshaikh/Autonomous-Driving-in-Carla-using-Deep-Reinforcement-Learning
Deep Reinforcement Learning (PPO) in Autonomous Driving (Carla) [from scratch]
autonomous-driving carla-driving-simulator carla-environment carla-simulator ddqn deep-learning deep-learning-algorithms deep-reinforcement-learning openai ppo proximal-policy-optimization pytorch reinforcement-learning self-driving self-driving-car self-driving-car-simulation self-driving-cars
Last synced: 26 Mar 2025
https://github.com/miroblog/tf_deep_rl_trader
Trading Environment (OpenAI Gym) + PPO (TensorForce)
ppo proximal-policy-optimization stock-market tensorflow tensorforce trading
Last synced: 24 Mar 2025
https://github.com/taherfattahi/ppo-rocket-landing
Proximal Policy Optimization (PPO) algorithm using PyTorch to train an agent for a rocket landing task in a custom environment
ai machine-learning ppo ppo-pytorch pytorch reinforcement-learning
Last synced: 04 Apr 2025
https://github.com/jackaduma/vicuna-lora-rlhf-pytorch
A full pipeline to finetune Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the Vicuna architecture. Basically ChatGPT but with Vicuna
chatgpt finetune gpt llama llm lora peft ppo pytorch reward-models rlhf vicuna vicuna-7b
Last synced: 13 Apr 2025
https://github.com/lcswillems/torch-ac
Recurrent and multi-process PyTorch implementation of deep reinforcement Actor-Critic algorithms A2C and PPO
a2c a3c actor-critic advantage-actor-critic deep-reinforcement-learning minigrid multi-process ppo proximal-policy-optimization pytorch recurrent recurrent-neural-networks reinforcement-learning reward-shaping
Last synced: 10 Oct 2025
https://github.com/liuzuxin/fsrl
🚀 A fast safe reinforcement learning library in PyTorch
cpo cvpo decision-making library ppo pytorch reinforcement-learning robotics sac safe-rl safety-critical trpo trustworthy-ai
Last synced: 03 Apr 2025
https://github.com/jianzhnie/Open-R1
The open source implementation of ChatGPT, Alpaca, Vicuna and RLHF Pipeline. Implement a ChatGPT from scratch.
chatgpt gpt llama llm lora peft ppo rlhf stanford-alpaca
Last synced: 05 Oct 2025
https://github.com/marcometer/episodic-transformer-memory-ppo
Clean baseline implementation of PPO using an episodic TransformerXL memory
actor-critic deep-reinforcement-learning episodic-memory gated-transformer-xl gtrxl memory-gym on-policy policy-gradient pomdp ppo proximal-policy-optimization pytorch transformer transformer-xl trxl
Last synced: 09 Aug 2025
https://github.com/yongzhuo/chatglm-maths
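chatglm-6b fine-tuning/LORA/PPO/inference; the samples are automatically generated integer/decimal addition, subtraction, multiplication, and division problems; runs on GPU or CPU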
chatglm chatgpt fine-tuning maths maths-problem ppo
Last synced: 11 Oct 2025
https://github.com/gordicaleksa/pytorch-learn-reinforcement-learning
A collection of various RL algorithms like policy gradients, DQN and PPO. The goal of this repo will be to make it a go-to resource for learning about RL. How to visualize, debug and solve RL problems. I've additionally included playground.py for learning more about OpenAI gym, etc.
deep-learning deep-q-network dqn jupyter policy-gradient ppo python pytorch pytorch-dqn pytorch-implementation pytorch-policy-gradient pytorch-ppo reinforcement-learning reinforcement-learning-algorithms rl
Last synced: 12 Sep 2025
https://github.com/marcometer/recurrent-ppo-truncated-bptt
Baseline implementation of recurrent PPO using truncated BPTT
actor-critic bptt deep-learning deep-reinforcement-learning gru lstm on-policy policy-gradient pomdp ppo proximal-policy-optimization pytorch recurrence recurrent recurrent-neural-networks truncated
Last synced: 16 Mar 2025
https://github.com/jackaduma/chatglm-lora-rlhf-pytorch
A full pipeline to finetune ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the ChatGLM architecture. Basically ChatGPT but with ChatGLM
chatglm chatglm-6b chatgpt deepspeed finetune gpt llama llm lora peft ppo pytorch reward-models rlhf
Last synced: 27 Apr 2025
https://github.com/adik993/ppo-pytorch
Proximal Policy Optimization (PPO) with Intrinsic Curiosity Module (ICM)
cartpole-v1 deep-learning generalized-advantage-estimation icm intrinsic-curiosity-module mountaincar-v0 pendulum-v0 ppo proximal-policy-optimization pytorch reinforcement-learning
Last synced: 19 Jul 2025
https://github.com/vietnh1009/contra-ppo-pytorch
Proximal Policy Optimization (PPO) algorithm for Contra
ai contra contra-nes deep-learning gym openai ppo ppo2 proximal-policy-optimization reinforcement-learning
Last synced: 04 May 2025
https://github.com/niutrans/vision-llm-alignment
This repository contains the code for SFT, RLHF, and DPO, designed for vision-based LLMs, including the LLaVA models and the LLaMA-3.2-vision models.
alignment dpo llama3-vision llava llm mllm multi-model ppo reward rlhf sft vision
Last synced: 06 Apr 2025
https://github.com/NiuTrans/Vision-LLM-Alignment
This repository contains the code for SFT, RLHF, and DPO, designed for vision-based LLMs, including the LLaVA models and the LLaMA-3.2-vision models.
alignment dpo llama3-vision llava llm mllm multi-model ppo reward rlhf sft vision
Last synced: 07 May 2025
https://github.com/urinx/reinforcementlearning
Reinforcing Your Learning of Reinforcement Learning
advantage-actor-critic alphago alphago-zero atari-2600 cartpole ddpg doom dqn frozenlake gomoku mcts policy-gradient ppo q-learning reinforcement-learning space-invaders tic-tac-toe
Last synced: 14 Jul 2025
https://github.com/lorenmt/minimal-isaac-gym
A Minimal Example of Isaac Gym with DQN and PPO.
Last synced: 02 Apr 2025
https://github.com/scitator/run-skeleton-run
Reason8.ai PyTorch solution for NIPS RL 2017 challenge
actor-critic ddpg ddpg-agent nips nips-2017 physics-based ppo pytorch pytorch-solution reinforcement-learning tensorflow trpo
Last synced: 05 Sep 2025
https://github.com/godka/pensieve-ppo
The simplest implementation of Pensieve (SIGCOMM' 17) via state-of-the-art RL algorithms, including PPO, DQN, SAC, and support for both TensorFlow and PyTorch.
a2c deep-learning dqn pensieve ppo pytorch reinforcement-learning tensorflow
Last synced: 07 Apr 2025
https://github.com/zhaoyingjun/general
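Alignment has become a required step in fine-tuning GPT-style large models, and deep reinforcement learning is at the core of alignment. This project is a deep reinforcement learning application programming framework that supports training in non-gym environments and visual configuration; get started with reinforcement learning programming in 30 minutes.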
ddpg deep-reinforcement-learning dqn gui gym ppo tensorflow2
Last synced: 13 Apr 2025
https://github.com/datvodinh/recurrent-ppo
A Reinforcement Learning Project using PPO + LSTM
lstm ppo reinforcement-learning
Last synced: 14 Oct 2025
https://github.com/jcwleo/mario_rl
a2c actor-critic curiosity-driven deep-learning icm ppo pytorch reinforcement-learning supermariobros
Last synced: 03 Apr 2025
https://github.com/CN-UPB/DeepCoMP
Dynamic multi-cell selection for cooperative multipoint (CoMP) using (multi-agent) deep reinforcement learning
cell-selection cellular comp mobile multi-agent-reinforcement-learning ppo python ray reinforcement-learning rllib simulation wireless
Last synced: 13 May 2025
https://github.com/tanyuqian/redco
NAACL '24 (Best Demo Paper Runner-Up) / MLSys @ NeurIPS '23 - RedCoast: A Lightweight Tool to Automate Distributed Training and Inference
differential-privacy diffusion-models distributed-training fedavg federated-learning flan-t5-xxl gemma image-captioning jax large-language-models llama maml meta-learning mixed-precision mlsys model-parallelism ppo reinforcement-learning seq2seq stable-diffusion
Last synced: 06 Apr 2025
https://github.com/jackaduma/alpaca-lora-rlhf-pytorch
A full pipeline to finetune Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the Alpaca architecture. Basically ChatGPT but with Alpaca
alpaca chatgpt deepspeed finetune gpt llama llm lora peft ppo pytorch reward-models rlhf
Last synced: 16 Jun 2025
https://github.com/instadeepai/sebulba
🪐 The Sebulba architecture to scale reinforcement learning on Cloud TPUs in JAX
ai deep-learning hpc jax machine-learning podracer ppo reinforcement-learning sebulba tpu
Last synced: 03 Nov 2025
https://github.com/seungeunrho/football-paris
The exact code used by the team "liveinparis" in the Kaggle football competition, ranked 6th/1141
gfootball kaggle liveinparis ppo pytorch reinforcement-learning self-play
Last synced: 28 Apr 2025
https://github.com/ugurkanates/neurirs2019dronechallengerl
Long-Term Planning with Deep Reinforcement Learning on Autonomous Drones
autonomous autonomousdrones deeplearning drl drones pathplanning ppo reinforcementlearning
Last synced: 26 Apr 2025
https://github.com/chagmgang/pytorch_ppo_rl
PyTorch implementation of intrinsic curiosity module with proximal policy optimization
breakout curiosity deep-learning icm intrinsic-curiosity-module multi-process ppo pytorch reinforcement-learning
Last synced: 22 Apr 2025
https://github.com/guichristmann/thormang3-gogoro-PPO
Steering-based control of a two-wheeled vehicle using RL-PPO and NVIDIA Isaac Gym.
Last synced: 02 Apr 2025
https://github.com/datvodinh/ppo-transformer
A Reinforcement Learning Project using PPO + Transformer
ppo reinforcement-learning transformer
Last synced: 27 Sep 2025
https://github.com/bilalkabas/DRL-Nav
Deep Reinforcement Learning based autonomous navigation in realistic simulation environments.
airsim deep-reinforcement-learning gym ppo unreal-engine
Last synced: 11 Mar 2025
https://github.com/jason-cky/deeprl-pytorch
PyTorch implementations of various Deep Reinforcement Learning algorithms on pybullet environments.
ddpg ppo pybullet-environments python3 pytorch-implementation reinforcement-learning-algorithms rlbench td3 trpo
Last synced: 10 Jul 2025
https://github.com/vietnh1009/sonic-ppo-pytorch
Proximal Policy Optimization (PPO) algorithm for Sonic the Hedgehog
ai deep-learning gym openai openai-gym ppo ppo2 proximal-policy-optimization reinforcement-learning sonic sonic-the-hedgehog
Last synced: 04 May 2025
https://github.com/kachayev/gym-microrts-paper-sb3
RL agent to play μRTS with Stable-Baselines3 and PyTorch
gym-environment ppo pytorch real-time-strategy reinforcement-learning reinforcement-learning-agent
Last synced: 12 Apr 2025
https://github.com/yeyupiaoling/pytorch-ppo
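A PPO reinforcement learning model implemented in PyTorch that supports training on a variety of games, such as Super Mario, Snow Bros, Contra, and more.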
Last synced: 14 Aug 2025
https://github.com/yeyupiaoling/reinforcementlearning
Reinforcement learning tutorial
dqn flappybird gym gym-retro paddlepaddle ppo visualdl
Last synced: 09 Jul 2025
https://github.com/hcnoh/rl-collection-pytorch
A collection of Reinforcement Learning implementations with PyTorch
actor-critic continuous-control deep-learning deep-reinforcement-learning gae generalized-advantage-estimation openai-gym policy-gradient ppo proximal-policy-optimization pytorch reinforcement-learning trpo trust-region-policy-optimization
Last synced: 30 Apr 2025
https://github.com/ai-glimpse/toyrl
Reinforce learning is awesome!
a2c aiglimpse build-your-own-x double-dqn dqn ppo python3 reinforce reinforcement-learning sarsa toyrl
Last synced: 10 Jun 2025
https://github.com/mindspore-courses/deep-reinforcement-learning-algorithms-with-mindspore
MindSpore implementations of deep reinforcement learning algorithms and environments
deep-learning deep-reinforcement-learning mindspore ppo reinforcement-learning tutorial
Last synced: 28 Apr 2025