{"id":23940462,"url":"https://github.com/pockerman/py_cube_ai","last_synced_at":"2026-05-07T14:32:44.040Z","repository":{"id":42189261,"uuid":"383052503","full_name":"pockerman/py_cube_ai","owner":"pockerman","description":"Reinforcement learning algorithms with Python","archived":false,"fork":false,"pushed_at":"2023-12-24T07:50:00.000Z","size":2369,"stargazers_count":3,"open_issues_count":21,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-01-06T03:17:10.880Z","etag":null,"topics":["openai-gym","python","pytorch","reinforcement-learning","reinforcement-learning-algorithms","ros2"],"latest_commit_sha":null,"homepage":"https://pockerman-py-cubeai.readthedocs.io/en/latest/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pockerman.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2021-07-05T07:28:31.000Z","updated_at":"2024-05-05T17:31:48.000Z","dependencies_parsed_at":"2023-12-16T20:32:49.073Z","dependency_job_id":"6ec27b45-8559-4565-85eb-bb659895a7cf","html_url":"https://github.com/pockerman/py_cube_ai","commit_stats":null,"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pockerman%2Fpy_cube_ai","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pockerman%2Fpy_cube_ai/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pockerman%2Fpy_cube_ai/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pockerman%2Fpy_cube_ai/manifests","owner_url":"htt
ps://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pockerman","download_url":"https://codeload.github.com/pockerman/py_cube_ai/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240415463,"owners_count":19797637,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["openai-gym","python","pytorch","reinforcement-learning","reinforcement-learning-algorithms","ros2"],"created_at":"2025-01-06T03:17:15.398Z","updated_at":"2026-05-07T14:32:38.991Z","avatar_url":"https://github.com/pockerman.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## PyCubeAI\n\n[![Documentation Status](https://readthedocs.org/projects/pockerman-py-cubeai/badge/?version=latest)](https://pockerman-py-cubeai.readthedocs.io/en/latest/?badge=latest) [![Python application](https://github.com/pockerman/rl_python/actions/workflows/python-app.yml/badge.svg?branch=master)](https://github.com/pockerman/rl_python/actions/workflows/python-app.yml)\n\nPyCubeAI is an effort to create an environment for the design, development, simulation, and deployment of reinforcement learning algorithms \nthat target robotic platforms. \n\nThe project documentation can be found at \u003ca href=\"https://pockerman-py-cubeai.readthedocs.io/en/latest/\"\u003eCubeAI documentation\u003c/a\u003e.\nThe C++ flavor of the project can be found at \u003ca href=\"https://github.com/pockerman/cubeai\"\u003eCubeAI\u003c/a\u003e.\n\n\n\n\nImplementation of reinforcement learning algorithms. 
Algorithms have been refactored/reimplemented\nfrom various resources such as:\n\n- \u003ca href=\"https://github.com/udacity/deep-reinforcement-learning\"\u003eUdacity DRL repository\u003c/a\u003e\n- \u003ca href=\"https://livevideo.manning.com/module/56_8_7/reinforcement-learning-in-motion/\"\u003eReinforcement learning in motion\u003c/a\u003e\n- \u003ca href=\"#\"\u003eDeep Reinforcement Learning in Action\u003c/a\u003e\n\n## Dependencies\n\n- \u003ca href=\"#\"\u003eOpenAI Gym\u003c/a\u003e\n- \u003ca href=\"#\"\u003ePyTorch\u003c/a\u003e\n- \u003ca href=\"#\"\u003eNumPy\u003c/a\u003e\n- \u003ca href=\"https://cyberbotics.com/#cyberbotics\"\u003eWebots\u003c/a\u003e\n\n## Installation\nTODO \n### Installing Webots and getting started\nSee the instructions \u003ca href=\"webots_howto.md\"\u003ehere\u003c/a\u003e on how to install and get started with Webots.\n\n## Documentation\nTODO\n\n## Examples\n\n### Reinforcement learning basic algorithms\n\n- \u003ca href=\"src/examples/dummy/dummy_gym_agent_example.py\"\u003eDummy agent on ```MountainCar-v0```\u003c/a\u003e\n- \u003ca href=\"src/examples/armed_bandit_epsilon_greedy.py\"\u003eArmed-bandit with epsilon-greedy policy\u003c/a\u003e\n- \u003ca href=\"#\"\u003eArmed-bandit with softmax policy\u003c/a\u003e\n- \u003ca href=\"src/examples/pytorch_examples/advertisement_placement.py\"\u003eContextual bandits\u003c/a\u003e\n\n#### Dynamic programming\n\n- \u003ca href=\"src/examples/dp/iterative_policy_evaluation_frozen_lake.py\"\u003eIterative policy evaluation on ```FrozenLake-v0```\u003c/a\u003e\n- \u003ca href=\"src/examples/dp/policy_improvement_frozen_lake.py\"\u003ePolicy improvement on ```FrozenLake-v0```\u003c/a\u003e\n- \u003ca href=\"src/examples/dp/policy_iteration_frozen_lake.py\"\u003ePolicy iteration on ```FrozenLake-v0```\u003c/a\u003e\n- \u003ca href=\"src/examples/dp/value_iteration_frozen_lake.py\"\u003eValue iteration on ```FrozenLake-v0```\u003c/a\u003e\n\n#### Monte Carlo\n\n- \u003ca 
href=\"src/examples/mc/mc_prediction_black_jack.py\"\u003eMonte Carlo prediction on ```Blackjack-v0```\u003c/a\u003e\n- \u003ca href=\"src/examples/mc/mountain_car_approximate_monte_carlo.py\"\u003eApproximate Monte Carlo on ```MountainCar-v0```\u003c/a\u003e\n- \u003ca href=\"src/examples/mc/mc_tree_search_taxi_v3.py.py\"\u003eMonte Carlo tree search on ```Taxi-v3```\u003c/a\u003e\n\n#### Temporal differencing\n\n- \u003ca href=\"src/examples/td/td_zero_cart_pole_v0.py\"\u003eTD(0) on ```CartPole-v0```\u003c/a\u003e\n- \u003ca href=\"src/examples/td/cliff_walking_q_learning.py\"\u003eSARSA on ```CliffWalking-v0```\u003c/a\u003e\n- \u003ca href=\"src/examples/td/sarsa_cart_pole_v0.py\"\u003eSARSA on ```CartPole-v0```\u003c/a\u003e\n- \u003ca href=\"src/examples/td/cliff_walking_q_learning.py\"\u003eQ-learning on ```CliffWalking-v0```\u003c/a\u003e\n- \u003ca href=\"src/examples/td/q_learning_cart_pole_v0.py\"\u003eQ-learning on ```CartPole-v0```\u003c/a\u003e\n- \u003ca href=\"#\"\u003eExpected SARSA\u003c/a\u003e (TODO)\n- \u003ca href=\"#\"\u003eSARSA lambda\u003c/a\u003e (TODO)\n- \u003ca href=\"src/examples/td/td_zero_semi_gradient_mountain_car.py\"\u003eTD(0) semi-gradient on ```MountainCar-v0```\u003c/a\u003e\n- \u003ca href=\"src/examples/td/sarsa_semi_gradient_mountain_car_v0.py\"\u003eSARSA semi-gradient on ```MountainCar-v0```\u003c/a\u003e\n- \u003ca href=\"src/examples/td/q_learning_moutain_car_v0.py\"\u003eQ-learning on ```MountainCar-v0```\u003c/a\u003e\n- \u003ca href=\"src/examples/td/double_q_learning_cart_pole_v0.py\"\u003eDouble Q-learning on ```CartPole-v0```\u003c/a\u003e\n\n#### DQN\n\n- \u003ca href=\"src/examples/dqn/dqn_grid_world.py\"\u003eVanilla DQN on ```Gridworld```\u003c/a\u003e\n- \u003ca href=\"src/examples/dqn/dqn_with_experience_replay_on_grid_world.py\"\u003eDQN with experience replay on ```Gridworld```\u003c/a\u003e\n- \u003ca href=\"#\"\u003eDQN with target network on ```Gridworld```\u003c/a\u003e\n- \u003ca 
href=\"src/examples/dqn/dqn_lunar_lander.py\"\u003eVanilla DQN on ```CartPole-v0```\u003c/a\u003e\n- \u003ca href=\"src/examples/dqn/dqn_lunar_lander.py\"\u003eVanilla DQN on ```LunarLander-v2```\u003c/a\u003e\n\n#### Approximate methods\n\n- \u003ca href=\"#\"\u003eSimple gradient descent solver\u003c/a\u003e\n- \u003ca href=\"src/examples/pg/reinforce_cart_pole.py\"\u003eREINFORCE on ```CartPole-v0```\u003c/a\u003e\n- \u003ca href=\"src/examples/ac/a2c_cart_pole_v1.py\"\u003eA2C on ```CartPole-v1```\u003c/a\u003e\n\n## Robotics simulations\n\n- \u003ca href=\"src/examples/webots/epuck_robot/controllers/epuck_q_learn_simple_controller.py\"\u003eQ-learning with epuck robot.\u003c/a\u003e\n\n## References\n\n- ```Deep Reinforcement Learning in Action```\n- \u003ca href=\"https://www.youtube.com/watch?v=2GEfuYoSXqg\"\u003etinyML Talks: Deploying AI to Embedded Systems\u003c/a\u003e\n- \u003ca href=\"https://www.youtube.com/watch?v=ZDGlx2ulJv0\"\u003etinyML Talks: Exploring techniques to build efficient and robust TinyML deployments\u003c/a\u003e\n\n\n\n\n \n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpockerman%2Fpy_cube_ai","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpockerman%2Fpy_cube_ai","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpockerman%2Fpy_cube_ai/lists"}