{"id":21850597,"url":"https://github.com/freakwill/skinner","last_synced_at":"2025-09-09T15:46:01.911Z","repository":{"id":57468059,"uuid":"288159949","full_name":"Freakwill/skinner","owner":"Freakwill","description":"🐁 Skinner, a new framework of reinforcement learning by Python","archived":false,"fork":false,"pushed_at":"2021-09-20T07:47:26.000Z","size":913,"stargazers_count":0,"open_issues_count":0,"forks_count":2,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-04-26T06:02:59.336Z","etag":null,"topics":["gym","python","qlearning","reinforcement-learning","skinner"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Freakwill.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-08-17T11:17:13.000Z","updated_at":"2021-09-20T07:47:28.000Z","dependencies_parsed_at":"2022-09-19T09:01:51.596Z","dependency_job_id":null,"html_url":"https://github.com/Freakwill/skinner","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Freakwill%2Fskinner","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Freakwill%2Fskinner/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Freakwill%2Fskinner/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Freakwill%2Fskinner/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Freakwill","download_url":"https://codeload.github.com/Freakwill/skinner/tar.gz/refs/heads/master","host":{"name":"GitHub",
"url":"https://github.com","kind":"github","repositories_count":244843184,"owners_count":20519778,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["gym","python","qlearning","reinforcement-learning","skinner"],"created_at":"2024-11-28T00:18:28.532Z","updated_at":"2025-03-21T17:49:03.235Z","avatar_url":"https://github.com/Freakwill.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# skinner\nSkinner, a new reinforcement learning framework in Python\n\nIt is built for beginners in RL.\n\n\n\nIt is under development: the APIs are not yet polished, but it runs stably. For grid worlds, it is mature enough.\n\n\n\nEnjoy `skinner`!\n\n![](rat.gif)\n\n## Requirements\n\n- gym\n- numpy\n\n## Download\n\nDownload from GitHub, or install from PyPI with `pip install skinner`.\n\n## Design\n\nWe follow the **observer design pattern**. The env and the agents in it generally observe each other. The agents observe the env to decide how to act and to get the reward; the env observes the agents and other objects to render the viewer and record information.\n\n## Feature\n\nso easy\n\n## Use\n\n### Quick start\n\nRun `demo.py` in examples. There are other examples: `demo1.py`, `demo2.py`.\n\nAlso, one can watch animations on [`bilibili`](https://www.bilibili.com/video/bv1ca4y1E7Dr)\n\n\n\n### Examples\n\nThe author made 3 examples. Users are encouraged to review the code. 
Define objects in `objects.py`, define new envs in `simple_grid.py`, then write a demonstration program in a script (see `demo.py`).\n\n### Define envs\n\nIf you just want to build a simple env, a grid world like the following is one option.\n\n```python\n#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n\n\"\"\"Demo of RL\n\nAn env with some traps and a gold.\n\"\"\"\n\nfrom skinner import *\nfrom objects import *\n\nclass MyGridWorld(GridMaze, SingleAgentEnv):\n    \"\"\"Grid world\n    \n    A robot playing the grid world tries to find the gold (yellow circle); meanwhile\n    it has to avoid the traps (black circles)\n    Extends:\n        GridMaze: grid world with walls\n        SingleAgentEnv: there is only one agent\n    \"\"\"\n    \n    # configure the env\n    \n    # get the positions of the objects (done automatically)\n    CHARGER = ...\n    TRAPS = ...\n    DEATHTRAPS = ...\n    GOLD = ...\n\n    def __init__(self, *args, **kwargs):\n        super(MyGridWorld, self).__init__(*args, **kwargs)\n        self.add_walls(conf['walls'])\n        self.add_objects((*traps, *deathtraps, charger, gold))\n\n    # Define the condition under which the RL demo stops.\n    def is_terminal(self):\n        return self.agent.position in self.DEATHTRAPS or self.agent.position == self.GOLD or self.agent.power\u003c=0\n\n    def is_successful(self):\n        return self.agent.position == self.GOLD\n\n    # The following methods are optional; they only record the RL process\n    def post_process(self):\n        if self.is_successful():\n            self.history['n_steps'].append(self.agent.n_steps)\n        else:\n            self.history['n_steps'].append(self.max_steps)\n        self.history['reward'].append(self.agent.total_reward)\n        self.agent.post_process()\n\n    def begin_process(self):\n        self.history['n_steps'] = []\n        self.history['reward'] = []\n\n    def end_process(self):\n        import pandas as pd\n        data = 
pd.DataFrame(self.history)\n        data.to_csv('history.csv')\n\n\n```\n\n\n\n#### Configure env and its objects\n\nsee `conf.yaml` for an example. The object classes would be defined in `objects.py`.\n\n```yaml\n# Grid Maze: \n# n_cols * n_rows: size of the maze, the number of squares\n# edge: the length of the edge of each square\n# walls: the positions of walls as the components of the environment\n\n\n## number of grids\nn_cols: 7\nn_rows: 7\n## size of every grid\nedge: 80\n\n\n## positions of walls\nwalls: !!set\n  {\n  !!python/tuple [2, 6],\n  !!python/tuple [3, 6],\n  ...\n  !!python/tuple [4, 2]}\n\n\n## objects in environment (excluding the agent)\n## the significant attrs of objects are position color and size, the size will be calculated\n## automatically according to proportion (size = proportion * edge)\n## traps, not terminal\ntraps: !!python/object:objects.ObjectGroup\n  name: 'traps'\n  members:\n    - !!python/object:objects.Trap\n      position: !!python/tuple [3, 5]\n      color: [1,0.5,0]\n\n    - !!python/object:objects.Trap\n      position: !!python/tuple [1, 3]\n      color: [1,0.5,0]\n\n    - !!python/object:objects.Trap\n      position: !!python/tuple [7, 1]\n      color: [1,0.5,0]\n\n\n## deathtraps, terminal\ndeathtraps: !!python/object:objects.ObjectGroup\n  name: 'deathtraps'\n  members:\n    - !!python/object:objects.DeathTrap\n      position: !!python/tuple [6, 5]\n      color: [.8,0,0.5]\n\n    - !!python/object:objects.DeathTrap\n      position: !!python/tuple [2, 1]\n      color: [.8,0,0.5]\n\n## gold, terminal\ngold: !!python/object:objects.Gold\n  name: 'gold'\n  position: !!python/tuple\n    [7, 7]\n  color: [1,0.8,0]\n```\n\n\n\n### Define objects\n\n1. the shape of object (circle by default)\n2. 
the method to plot (don't override it if the shape is simple)\n\n```python\nclass _Object(Object):\n    props = ('name', 'position', 'color', 'size')\n    default_position=(0, 0)  # set a default value to reduce the code needed when creating an object\n\nclass Gold(_Object):\n    def draw(self, viewer):\n        '''this method most directly determines how to plot the object\n        You should define the shape and coordinates\n        '''\n        ...\n\nclass Charger(_Object):\n    def create_shape(self):\n        '''redefine the shape; here we define a square with edge length 40.\n        The default shape is a circle\n        '''\n        a = 20\n        self.shape = rendering.make_polygon([(-a,-a), (a,-a), (a,a), (-a,a)])\n        self.shape.set_color(*self.color)\n```\n\n\n\n### Define agents\n\n1. transition function $f(s,a)$\n2. reward function $r(s,a,s')$\n\n```python\nfrom skinner import *\n\nclass MyRobot(StandardAgent):\n    actions = Discrete(4)\n    \n    # define the shape\n    size = 30\n    color = (0.8, 0.6, 0.4)\n\n    def _reset(self):\n        # define the initial state\n        ...\n        \n    def _next_state(self, state, action):\n        \"\"\"transition function: s, a -\u003e s'\n        \"\"\"\n        ...\n\n\n    def _get_reward(self, state0, action, state1):\n        \"\"\"reward function: s,a,s'-\u003er\n        \"\"\"\n        ...\n\n\n# define parameters\nagent = MyRobot(alpha = 0.3, gamma = 0.9)\n```\n\n\n## Example\n\n### codes\n\nsee scripts in `examples`\n\n### results\n\n![](performance.png)\n\n\n\n## Commemoration\n\nIn memory of [B. F. Skinner](https://www.bfskinner.org/) (1904-1990), a great American psychologist. RL is mainly inspired by his behaviorism. 
Among the many contributors to the history of behaviorist psychology, he may be the most famous.\n\n ![](skinner.jpg)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffreakwill%2Fskinner","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffreakwill%2Fskinner","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffreakwill%2Fskinner/lists"}